VAERS deletions are not nefarious
But they are still very problematic, because they understate the true level of vaccine dangers
Jessica Rose recently shared an article of mine (thanks!) about VAERS deletions.
A warm welcome to all my new subscribers! I will state up front that I don’t have much bandwidth to moderate even the few comments I receive, and prefer to focus my spare time on the vaccine data analysis. So I would very much appreciate it if the comments are focused on the topic being discussed. Also, if you provide links to your own data analysis please also provide a little summary blurb of what you discovered if you want me to read the full article.
Since the article was published on 1st Nov 2022, and I have written quite a lot of articles on VAERS deletions afterwards, I have decided to summarize what I have learnt into an FAQ article.
Why does the CDC delete VAERS reports?
Someone asked me this question in the comments:
It is actually the CDC which deletes these reports. And there is a non-nefarious explanation for this: they are only deleting follow-up reports.
In other words, the CDC knows that all these deletions have corresponding original reports in the VAERS database.
As a quick aside, the CDC collects a lot of contact information about the reporter which does not get published for obvious reasons. This means they have access to additional information which they can use to identify follow-up reports more easily than outsiders can.
But that is not the end of the story.
Does this mean we don't have to worry about deletions?
No, that’s not the takeaway message.
We do need to worry about deletions, and in fact they are very problematic.
I estimate that 80% of the deleted (followup) reports make the original report more complete, more serious, or more conclusive (in this case, conclusive means it is more likely that the vaccine caused the injury).
As a result, most of the research done on VAERS (for example all those papers which do “data mining” on VAERS) is incomplete at best and maybe even outright wrong in the worst case[1].
And this is probably the worst aspect of deletions: if the vaccine injury becomes more serious after the original report is filed, the followup information is not captured in any aggregate analysis. For example, if someone dies between the original and followup report, their death is not added to public-facing VAERS.
In other words, if the original report describes a serious injury which then worsens and the patient dies, the serious injury is recorded but the death is not recorded in public-facing VAERS. It follows that any aggregate calculation undercounts even something as serious as a death.
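To make the undercounting concrete, here is a minimal Python sketch. The three reports are invented purely for illustration, the patient labels are hypothetical, and the `DIED` and `HOSPITAL` fields are simplified to booleans (the published CSV uses "Y" or blank).

```python
# Illustrative only: invented reports keyed by a made-up patient label.
originals = {
    "patient_a": {"HOSPITAL": True, "DIED": False},
    "patient_b": {"HOSPITAL": False, "DIED": False},
    "patient_c": {"HOSPITAL": True, "DIED": True},
}

# Followup reports that were removed from the public file.
deleted_followups = {
    "patient_a": {"HOSPITAL": True, "DIED": True},  # patient died after the original report
}


def count_deaths(reports):
    """Count reports whose DIED flag is set."""
    return sum(1 for fields in reports.values() if fields["DIED"])


def merge_followups(originals, followups):
    """Return a copy of the originals with each outcome flag OR-ed with the followup's flag."""
    merged = {pid: dict(fields) for pid, fields in originals.items()}
    for pid, update in followups.items():
        if pid not in merged:
            continue  # no original to attach this followup to
        for field, value in update.items():
            merged[pid][field] = merged[pid].get(field, False) or value
    return merged


public_deaths = count_deaths(originals)
actual_deaths = count_deaths(merge_followups(originals, deleted_followups))
print(public_deaths, actual_deaths)
```

In this toy example the public-facing count is one death, while the count after merging the deleted followup is two.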
How did you identify the original reports?
It is a multi-step process, and fairly easy if you are familiar with the basics of data science.
I look for reports which match the age, gender, vaccination date and the state (this is for US reports, I haven’t done a similar analysis for foreign deletions yet).
There are three possible scenarios:
1. In some cases, this gives me exactly one result. Then I do some additional sanity checks: for example, there should be at least one matching symptom[2].
2. Sometimes, there are multiple matches for the same set of (age, gender, vaccination date, state) values. In those cases, I sort the candidates in descending order of the number of matching symptoms and choose the best match.
3. Finally, in some cases there are no matches for the given set of values. This can be quite tricky to solve, and happens because of incomplete or incorrectly coded values. Unfortunately, since there is no easy way to match these deletions, it makes people wonder whether all of them are genuinely followup reports, or whether the CDC is just deleting them to make vaccines look safer. Personally, I have spent quite a few hours trying to decode these, and I noticed that nearly every suspicious deletion eventually did have a matching original report. But this process is laborious because it relies on brute force, and it is not easy to scale to hundreds or thousands of VAERS deletions.
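The first two scenarios can be sketched in Python roughly as follows. This is a simplified illustration rather than the exact code used for the analysis: the sample reports are invented, each report is assumed to be a dict with the VAERS columns `VAERS_ID`, `AGE_YRS`, `SEX`, `VAX_DATE` and `STATE`, and `SYMPTOMS` is a hypothetical field holding the report's symptom terms as a set. The function names and the `min_shared` parameter are also assumptions.

```python
from collections import defaultdict

KEY_FIELDS = ("AGE_YRS", "SEX", "VAX_DATE", "STATE")


def match_key(report):
    """Build the (age, sex, vaccination date, state) tuple used for matching."""
    return tuple(report[field] for field in KEY_FIELDS)


def build_index(published):
    """Group the published reports by their match key."""
    index = defaultdict(list)
    for report in published:
        index[match_key(report)].append(report)
    return index


def find_original(deleted, index, min_shared=1):
    """Return the VAERS_ID of the best candidate original, or None.

    Candidates must share the match key and at least `min_shared` symptoms.
    With several candidates, the one sharing the most symptoms wins.
    """
    candidates = index.get(match_key(deleted), [])
    scored = [
        (len(deleted["SYMPTOMS"] & cand["SYMPTOMS"]), cand["VAERS_ID"])
        for cand in candidates
    ]
    scored = [item for item in scored if item[0] >= min_shared]
    if not scored:
        return None  # scenario 3: needs manual investigation
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[0][1]


# Invented sample data, for illustration only.
published = [
    {"VAERS_ID": 1001, "AGE_YRS": 45.0, "SEX": "F", "VAX_DATE": "03/01/2021",
     "STATE": "CA", "SYMPTOMS": {"Headache", "Pyrexia"}},
    {"VAERS_ID": 1002, "AGE_YRS": 45.0, "SEX": "F", "VAX_DATE": "03/01/2021",
     "STATE": "CA", "SYMPTOMS": {"Rash"}},
    {"VAERS_ID": 1003, "AGE_YRS": 30.0, "SEX": "M", "VAX_DATE": "04/15/2021",
     "STATE": "TX", "SYMPTOMS": {"Chest pain"}},
]
deleted = {"VAERS_ID": 2001, "AGE_YRS": 45.0, "SEX": "F", "VAX_DATE": "03/01/2021",
           "STATE": "CA", "SYMPTOMS": {"Headache", "Pyrexia", "Myocarditis"}}

index = build_index(published)
best_match = find_original(deleted, index)
print(best_match)
```

Here two published reports share the match key, and the one with two shared symptoms is selected over the one with none. A deletion for which `find_original` returns `None` falls into the third scenario.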
How should the CDC handle these deletions?
The CDC has to do two things:
a) be upfront about the number of deletions, and
b) provide a mapping between the deleted and the original reports as a separate file.
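As a rough idea of how small such a mapping file could be, here is a Python sketch that writes and reads one as CSV. No such file is published today, so the column names `DELETED_VAERS_ID` and `ORIGINAL_VAERS_ID` are hypothetical, as are the ID values and function names.

```python
import csv
import io

# Hypothetical column names for the proposed mapping file.
FIELDNAMES = ["DELETED_VAERS_ID", "ORIGINAL_VAERS_ID"]


def write_mapping(pairs):
    """Serialize (deleted_id, original_id) pairs as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for deleted_id, original_id in pairs:
        writer.writerow({
            "DELETED_VAERS_ID": deleted_id,
            "ORIGINAL_VAERS_ID": original_id,
        })
    return buffer.getvalue()


def read_mapping(text):
    """Parse the CSV text into a dict of deleted ID -> original ID."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        int(row["DELETED_VAERS_ID"]): int(row["ORIGINAL_VAERS_ID"])
        for row in reader
    }


# Made-up IDs, for illustration only.
mapping_csv = write_mapping([(2000001, 1000001), (2000002, 1000002)])
mapping = read_mapping(mapping_csv)
print(len(mapping), "deletions mapped")
```

With a file like this, the number of deletions is simply the number of rows, and any analyst could look up the original report for a given deleted ID.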
But there are also other ways that the CDC can handle this to avoid duplicates in the first place.
A reader who has worked with the CDC in the past believes the CDC has enough resources to ensure the CSV file is always kept up-to-date with the latest information, and that all this happens because the incentives in place are just very bad (emphasis mine):
I didn't argue for favoring the original report over the follow-up report, or vice versa. I wholeheartedly agree with you that the best policy is to have ALL of the data, even if they include inconsistencies between original and follow-up reports.
The results of your most recent analysis are consistent with my point that there is no consistent bias in the deletion of reports. Follow-up reports naturally will tend to have somewhat more information than original reports, as symptoms develop and clinicians perform more diagnostic procedures.
My question of how the duplicate reports enter the public version of the database remains. It is very easy to design a database and entry interface to prevent this. Are the VAERS contractors so incompetent that they haven't done so? Maybe they have an incentive to be sloppy, if they negotiate the contract with FDA/CDC based on amount of work. Creating easily preventable errors and then fixing them adds up to a lot of time. And CDC/FDA probably don't care -- they want something that meets just the letter of the law, and might like that VAERS looks shoddy.
A greater concern is deleted reports that aren't duplicates. Albert Benavides has investigated this, and while they represent a very small proportion of all reports, it is very troubling.
While I don’t have any special insight into this, I do think that if the CDC were simply more transparent about these deletions and also provided the mapping file[3], it would go a long way toward rebuilding trust.
What can we infer from these deletions?
The first thing we can infer is that all the research on COVID-19 VAERS reports will need to be updated or rewritten to take the deletions into account.
Another thing we can infer is that the people who are disparaging VAERS analysis are not just unqualified to comment on the topic; I don't think they are even qualified to judge who is and isn't qualified.
Here is a perfect example where David Gorski is commenting on Jessica Rose’s work (emphasis mine):
Rose then goes on to claim—of course!—that the nefarious CDC is “deleting” VAERS reports, particularly for children. This is a more difficult conspiracy theory to look at because there could be any of a number of reasons why VAERS ID entries are deleted. Even Rose admits that it could be because more than one report was filed for the same vaccine recipient (e.g., by the doctor and the parents) or for other reasons. A version of this conspiracy theory was going around a few months ago, and the CDC responded by explaining that some 6,000 VAERS reports had been removed from the database because they came from outside the US:
Just because David Gorski and his coterie of technically-challenged vaccine cheerleaders don’t know how to investigate this does not make it a “conspiracy theory”.
But such remarks do one thing: they keep the entire tech community from doing more analysis of the VAERS dataset.
I refer to people who call such analysis “dumpster diving” as the anti-VAERS community[4], and I think people who still have any common sense should be ignoring the anti-VAERS community.
For example, it is very obvious that creating a custom Machine Learning (ML) model for VAERS will improve its analysis in many ways[5]. And it will probably take just a few weeks for a well-funded team to build such an ML model. Are they being dissuaded because they don’t want to be called “anti-vaxxers”?
In fact, a senior manager at Amazon died 2 weeks after taking the Pfizer vaccine (June 2021), and the CDC stonewalled his mother so they don’t have to investigate his autopsy report. So it is not as if people in the tech community have somehow been spared either, and this makes the whole situation even more ridiculous!
[1] Some people may argue that the deletions are only a small percentage of the overall reports. While that may be true (and ironically ALSO IMPLIES that the mRNA vaccine is causing a lot of vaccine injuries), it still makes my case: no one has quantified the effect of the deletions on the VAERS reports, because the people who write these data mining papers don’t even KNOW about VAERS deletions.
[2] Note: this can and does produce the occasional false positive, but it is very rare.
[3] And importantly, the CDC can do both these things without requiring any significant re-architecting of the VAERS database schema.
[4] Amazingly, members of the anti-VAERS community are often remarkably deficient in the skill-sets which are needed to do any sophisticated VAERS analysis. In fact, they seem to prefer to bad-mouth the VAERS analysts instead of having to do any actual deep thinking on the subject. Maybe they already know they don’t have the capacity to do it? In that case, maybe this is just a case of sour grapes?
[5] In fact, even without such a custom Machine Learning model, I was able to take an off-the-shelf ML model for generic biomedical use cases and show that over 90% of the VAERS reports will not qualify as “crazy” reports. This means you can set a threshold for the number of ‘medical jargon’ terms (I refer to this as ENTITY_SCORE in my articles), ignore the reports which don’t meet that threshold, and so automate the process of selecting only genuine VAERS reports.
Great write up. When I run my code to check for duplicates, I check duplicate IDs and I search for duplicate entries by matching a number of variable fields simultaneously. The search algorithm I use is actually quite strict: I check for dups based on exact field matches in the AGE_YRS, STATE, SEX, VAX_LOT, VAX_SITE, VAX_ROUTE, VAX_DATE, VAX_NAME, VAX_MANU, RECVDATE, CAGE_YR, CAGE_MO, DIED, HOSPITAL, ER_ED_VISIT, ALLERGIES, ONSET_DATE, PRIOR_VAX, VAX_DOSE_SERIES, L_THREAT, DISABLE, BIRTH_DEFECT variables. I really think you might be onto something with the idea that the follow-up filings might be the ones removed due to the fact that there are multiple field matches. If I got that wrong from your write-up, I think it may be right. For example, the algorithm I made to suss out duplicates checks for both ID dups and multiple and simultaneous variable field dups (variables listed above) since the same person can remain with permanent ID in VAERS under two different IDs. So theoretically, the VAERS data wranglers might do the same thing to remove duplicates, but perhaps their 'algorithm' to seek and remove duplicates based on duplicate simultaneous multiple field entries is always 'running' (for lack of a better way to express this) and thus maybe that's precisely why people's follow-ups don't get into the front end of VAERS. We have to assume that the temporary VAERS ID entries would be cross-checked against permanent VAERS IDs: how else would they know if 'someone' had previously filed a report? So if matches were found across many variables, maybe their 'algorithm' removes it prematurely? The only weird part about this idea is that, well, we would see none of it. But, I agree, this would not be nefariousness, but it would be stupid and something that needs to be fixed. The shitty part is there's no way for us to check this theory since we don't have access to the uncooked books with the temp IDs. 
We need a study: people who receive temp IDs and subsequent perm IDs trying to submit a follow-up. Like hundreds of these. Just to see what happens. Would any of their perm IDs get updated? Would they get a new perm ID? Jess
I would agree with the additional new file by CDC with a column explaining each deletion.
The discussion above must be about reports that were visible but then deleted.
But there seem to be 60K that have never been published. 230K at one time so at least 170K of those were eventually released. That's the effect in action we all know that each week some are held back, showing up later, with others being held back.
See the never published chart:
https://deepdots.substack.com/p/new-vaers-flat-file-easy-data-mining