Date: 2025-09-10

Summary:

I analyzed 19,291 US death reports and 16,865 foreign death reports using LLM scoring (0-10 scale for vaccine causation likelihood)

Used the foreign dataset from the October 2022 version and the US dataset from the May 2025 version

Observed that LLMs are trained to give all the benefit of the doubt to vaccines, instead of using the precautionary principle

I also extracted other useful information, summarized below

Found that 6,551 US deaths (34%) and 1,860 foreign deaths (11%) were attributed to COVID-19, with 6,149 of the US COVID deaths (93.9%) and 1,419 of the foreign COVID deaths (76.3%) occurring in fully vaccinated individuals

Identified that 1,063 US patients (5.5%) and 2,301 foreign patients (13.6%) were considered healthy before vaccination

Found that autopsies were done in 475 US reports (2.5%), with 345 received, compared to 1,549 foreign reports (9.2%), with 1,033 received; 17 US autopsies and 53 foreign autopsies blamed the vaccine

8,512 US reporters (44.1%) and 11,497 foreign reporters (68.2%) were contactable for follow-up, while 2,960 US deaths (15.3%) and 1,128 foreign deaths (6.7%) occurred in senior care facilities

I used the Sonoma Dusk Alpha LLM to extract information from a set of US and foreign VAERS death reports.

My goal is to see

a) whether LLMs can do a reasonable job of assessing if the death was due to the vaccine and

b) how the two datasets compare in terms of the level of detail (more detailed reports would obviously allow better data extraction)

I ask the LLM to predict a score between 0 and 10 for whether the vaccine caused the death (0 = unlikely, 10 = almost certainly due to vaccine).

There were also many other fields of interest which I extracted:

was it reported by a healthcare professional?

is the reporter contactable?

was the patient healthy before vaccination?

was the patient in a senior care facility?

were they fully vaccinated?

stated medical reason for death

was an autopsy done?

was the autopsy received?

does the autopsy blame the vaccine?

autopsy results

did patient die from COVID19?

did they test positive for COVID19?

Finally, the last two columns are:

vaccine_caused_death (a score between 0 and 10)

explanation (for the score assigned in the previous column)

Click here to see the US dataset

Click here to see the foreign dataset

Dataset cutoff dates

The foreign dataset is based on the version from around October 2022 which still had the clinical narrative (SYMPTOM_TEXT) field for all the reports.

The US dataset is based on the May 2025 version.

contains_valid_json vs is_valid_schema

Sometimes you will see columns with empty values. If is_valid_schema is False for that row (i.e. that VAERS report), it usually means the column values could not be extracted from the LLM's response.

Without going into too much technical detail, this is because the code expects the data to follow a certain “structure”, and it is not possible to extract the value if that structure is incorrect.
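To make the difference between the two flags concrete, here is a minimal Python sketch of how they could be computed. This is not the code I actually ran: the REQUIRED_FIELDS subset, the check_response function, and the assumption that the whole response is a single JSON object are all hypothetical and only for illustration.

```python
import json

# Hypothetical subset of the expected fields and their types (assumed, not the real schema).
REQUIRED_FIELDS = {
    "autopsy_done": bool,
    "vaccine_caused_death": int,
    "explanation": str,
}

def check_response(raw: str) -> dict:
    """Compute both flags for one raw LLM response (illustrative only)."""
    result = {"contains_valid_json": False, "is_valid_schema": False, "data": None}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return result  # the response is not parseable as JSON at all
    result["contains_valid_json"] = True

    # Valid JSON is not enough: it must also have the expected structure.
    if not isinstance(parsed, dict):
        return result
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in parsed or not isinstance(parsed[name], expected_type):
            return result

    # bool is a subclass of int in Python, so reject True/False as a score,
    # and require the score to be within the 0-10 range.
    score = parsed["vaccine_caused_death"]
    if isinstance(score, bool) or not 0 <= score <= 10:
        return result

    result["is_valid_schema"] = True
    result["data"] = parsed
    return result
```

A response can therefore have contains_valid_json set to True while is_valid_schema is False, for example when a field is missing or has the wrong type.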

In practice, this means that if you are doing aggregate calculations, you can either:

a) ignore ALL rows where is_valid_schema is False and do your calculations based on the rest

b) give the vaccine the benefit of the doubt for all the rows where is_valid_schema is False
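The two options can be sketched as follows. This is only an illustration, assuming each row is a dict with an is_valid_schema flag and a vaccine_caused_death score; the function name share_attributed and the default threshold of 5 are my own choices for the example.

```python
def share_attributed(rows, threshold=5):
    """Return (option_a, option_b) shares of rows scoring at or above threshold.

    rows: list of dicts with 'is_valid_schema' and 'vaccine_caused_death'
    (a hypothetical layout, used only for this sketch).
    """
    valid = [r for r in rows if r["is_valid_schema"]]
    hits = sum(1 for r in valid if r["vaccine_caused_death"] >= threshold)

    # Option a: drop invalid rows from both numerator and denominator.
    option_a = hits / len(valid) if valid else 0.0
    # Option b: keep invalid rows in the denominator, i.e. treat them as
    # "vaccine did not cause the death" (benefit of the doubt).
    option_b = hits / len(rows) if rows else 0.0
    return option_a, option_b


sample = [
    {"is_valid_schema": True, "vaccine_caused_death": 7},
    {"is_valid_schema": True, "vaccine_caused_death": 1},
    {"is_valid_schema": False, "vaccine_caused_death": None},
    {"is_valid_schema": True, "vaccine_caused_death": 5},
]
print(share_attributed(sample))
```

Option b will always give a share that is less than or equal to option a, since the same number of hits is divided by a larger (or equal) number of rows.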

Scores

Here are the scores assigned by the LLM for the US reports

And these are the scores for the foreign dataset

As you can see, LLMs are usually trained to give as much benefit of the doubt as possible to the vaccine.

For example, this LLM is very happy to assign a score less than 3 if any of these things are true:

the company promises that the “benefit-risk relationship is not affected”

the patient already has any pre-existing health condition

if the autopsy information is not provided

What happened to the “precautionary principle”?

But even if you consider a score of 5 and above as an over 50% chance that the vaccine caused the death, 1248 out of 19291 US reports and 1628 out of 16865 foreign reports fall within that range. While people always talk about the denominator of “billions of doses”, the fact remains that if a vaccine is shown to cause thousands of deaths, it means vaccine manufacturers are permitted to have third-rate safety standards.

And this is before you consider the under-reporting factor, and the possibility that a lot of vaccine related deaths were wrongly reported as COVID19 deaths.
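For anyone who wants to see the counts above as percentages, here is a small Python sketch of the arithmetic. The counts are the ones quoted in this section; the helper name pct is just for illustration.

```python
# Reports with a score of 5 or above, and total reports, as quoted above.
counts = {
    "US": (1248, 19291),
    "foreign": (1628, 16865),
}

def pct(part, total):
    """Percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)

for name, (hits, total) in counts.items():
    print(f"{name}: {hits} of {total} = {pct(hits, total)}%")
```

This works out to about 6.5% of the US reports and about 9.7% of the foreign reports.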

Autopsy blames vaccine

Perhaps the most clear-cut case is when the autopsy blamed the vaccine for the death.

There are 17 such reports for the US

There are 53 such reports in the foreign dataset

It is interesting to note that just because an autopsy blamed the vaccine, the LLM does not automatically assign even a high “score”, let alone the highest possible score of 10.

In fact, reading the LLM explanation versus the report conclusions suggests that the LLM is able to identify additional factors that the autopsy may not have considered.

Deaths due to COVID19

I have already mentioned before here and on my Twitter account that I don’t believe all VAERS deaths are due to the vaccine, and a reasonable fraction were almost certainly due to COVID19.

According to the LLM:

In the US 6551 out of the 19291 deaths were due to COVID19

In the foreign dataset, 1860 out of 16865 deaths were due to COVID19

Healthy before vaccination

In the US, 1063 out of 19291 were considered to be healthy before vaccination.

In the foreign dataset, 2301 out of 16865 were considered to be healthy before vaccination.

Is the reporter contactable?

If the reporter is contactable, it allows the CDC to do follow up investigations.

In the US dataset, 8512 out of 19291 reporters are contactable.

In the foreign dataset, 11497 out of 16865 reporters are contactable.

Was the patient in senior care?

On Twitter, some folks have suggested almost all VAERS deaths are just people who live in senior care facilities. That is an absurd take, given that people living in senior care do not make up even 20% of the VAERS deaths in either the US or in the foreign datasets.

In the US, 2960 out of 19291 patients were in senior care facilities.

In the foreign dataset, 1128 out of 16865 patients were in senior care facilities.

Was the patient fully vaccinated?

One of the more idiotic reasons given by the vaccine pushers was the following: many people who died actually died from COVID19 because they were not “fully vaccinated” by the time of symptom onset.

This is also untrue.

In the US, out of the 6551 people who died from COVID19, 6149 were fully vaccinated by the time of symptom onset!

In the foreign deaths, 1419 of the 1860 COVID19 deaths were in fully vaccinated people.
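The counts above come from combining two of the extracted fields. Here is a rough Python sketch of that kind of calculation; the column names died_from_covid and fully_vaccinated are hypothetical stand-ins, and the sample rows are made up for the example.

```python
def fully_vaccinated_share(rows):
    """Share of COVID19 deaths that were in fully vaccinated people.

    Column names `died_from_covid` and `fully_vaccinated` are hypothetical.
    Rows where the value is missing (None) are not counted as vaccinated.
    """
    covid_deaths = [r for r in rows if r.get("died_from_covid") is True]
    if not covid_deaths:
        return None
    vaccinated = sum(1 for r in covid_deaths if r.get("fully_vaccinated") is True)
    return vaccinated / len(covid_deaths)


# Made-up rows: 3 COVID19 deaths, 1 of them in a fully vaccinated person.
sample = [
    {"died_from_covid": True, "fully_vaccinated": True},
    {"died_from_covid": True, "fully_vaccinated": False},
    {"died_from_covid": False, "fully_vaccinated": True},
    {"died_from_covid": True, "fully_vaccinated": None},
]
print(fully_vaccinated_share(sample))

# The same ratio using the counts reported above.
us_share = round(100 * 6149 / 6551, 1)
foreign_share = round(100 * 1419 / 1860, 1)
print(us_share, foreign_share)
```

Using the reported counts, this gives 93.9% for the US and 76.3% for the foreign dataset.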

Autopsy results

Here is a breakdown for US reports:

Autopsy done: 475

Autopsy received: 345

Autopsy blames vaccine: 17

This is the breakdown for foreign reports:

Autopsy done: 1549

Autopsy received: 1033

Autopsy blames vaccine: 53

Bloopers

Personally, I don’t think LLMs can “reason” in the same way that humans can.

So you sometimes do see very strange results from the LLMs and that is why it makes a lot of sense for a human to still review the work.

If you search the “vaccine_caused_death_explanation” field for the word “before”, you will see that sometimes the LLM concludes that the vaccine did not cause the death because the patient died before the date of vaccination!

Here is the link to the actual report:

If you have already done analysis of VAERS reports, you know that this just means the month of death is known but the exact date is unknown. In fact this is stated in the symptom text, so this is clearly a big blunder by the LLM.

However, LLMs do produce such nonsensical conclusions occasionally. You can mitigate this somewhat by adding specific instructions about this failure mode, but it is not possible to anticipate all failure modes before you run your experiments.
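One cheap way to catch this particular blooper after the fact is a keyword search over the explanation field, so a human can review the matching rows. Here is a minimal sketch, assuming the rows are dicts and the field is named vaccine_caused_death_explanation as in the spreadsheet; the function name flag_for_review is made up for the example.

```python
import re

# Match "before" as a whole word, ignoring case.
BEFORE_PATTERN = re.compile(r"\bbefore\b", re.IGNORECASE)

def flag_for_review(rows, field="vaccine_caused_death_explanation"):
    """Return the rows whose explanation mentions 'before'.

    This only produces a list for a human to check; many matches will be
    harmless (for example "healthy before vaccination").
    """
    return [r for r in rows if BEFORE_PATTERN.search(r.get(field) or "")]


sample = [
    {"id": 1, "vaccine_caused_death_explanation": "Patient died before the vaccination date."},
    {"id": 2, "vaccine_caused_death_explanation": "Death attributed to COVID19 pneumonia."},
    {"id": 3, "vaccine_caused_death_explanation": None},
]
print([r["id"] for r in flag_for_review(sample)])
```

A keyword search like this only catches failure modes you already know about, which is the same limitation as adding instructions to the prompt.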

Results

As you can see, not only are we quite far from “AGI”, even best-in-class large language models occasionally produce major bloopers :-)

But you can also see that the LLMs are able to do an enormous amount of grunt work and drastically reduce the time needed for doing these kinds of analyses.

These are my takeaways:

Quite clearly, the scores (did the vaccine cause the death) are very biased towards absolving vaccines of any blame, and I am not sure they provide any real insight. It is especially concerning that the LLM is willing to simply take the company’s word that the “benefit-risk relationship is not affected by this report”!

I think LLMs can be especially useful to flag VAERS death reports for further analysis. There are thousands of people who were deemed to be healthy and then died right after vaccination. This creates a smaller list which needs to be investigated further to see what really happened.

They can also be useful for getting a high-level picture of some of these metrics as long as you do not expect them to be 100% accurate. For example, you can see pretty quickly that very few of the VAERS death reports are “due to COVID19” in “not yet fully vaccinated” people.

Finally, they can be really useful for trimming down a large set of reports into a much smaller and more manageable set for follow-up. A good example would be the reports which mention that an autopsy was done, but where autopsy_received is set to “waiting”. This produces a small set of VAERS reports where someone can immediately follow up and inquire about the current status of these autopsy reports.
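The autopsy follow-up list could be produced with a filter along these lines. This is a sketch under assumptions: autopsy_received is the field mentioned above, while the autopsy_done column name, the dict-per-row layout, and the pending_autopsies function are hypothetical.

```python
def pending_autopsies(rows):
    """Return VAERS_IDs where an autopsy was done but the result is still awaited.

    Assumes each row is a dict with 'VAERS_ID', a boolean 'autopsy_done'
    (hypothetical name) and a text 'autopsy_received' value.
    """
    pending = []
    for r in rows:
        received = str(r.get("autopsy_received") or "").strip().lower()
        if r.get("autopsy_done") is True and received == "waiting":
            pending.append(r["VAERS_ID"])
    return pending


# Made-up rows for illustration only.
sample = [
    {"VAERS_ID": 101, "autopsy_done": True, "autopsy_received": "waiting"},
    {"VAERS_ID": 102, "autopsy_done": True, "autopsy_received": "yes"},
    {"VAERS_ID": 103, "autopsy_done": False, "autopsy_received": "waiting"},
    {"VAERS_ID": 104, "autopsy_done": True, "autopsy_received": " Waiting "},
]
print(pending_autopsies(sample))
```

The lowercase and strip steps are there because LLM output is not always consistent about capitalization or whitespace.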