Why deleting followup reports automatically understates vaccine danger signals
Assessing the quality and seriousness of deleted VAERS reports
Some recent comments by readers prompted me to look into the quality of deleted VAERS reports.
We now know that the reason for the deletion is not nefarious (at least I have no reason to think so at this point): they keep the original reports but delete followup reports.
But this is not a good idea.
For one, this means all outside researchers using VAERS data are not getting the complete picture.
And in the worst case, this includes death reports.
But I had an intuition in a previous article that the deleted reports are also qualitatively different from the original reports.
My view is that if you move beyond the deleted death reports to also include serious adverse events, and then also take into account the full narrative text of both the original and the followup report, the analysis you can do ends up being drastically different.
The reasons are pretty obvious:
a) VAERS is not easy to report to - despite what the pro-vaccine community wants us to believe, it is actually vetted. You first get assigned a temporary VAERS_ID, and only get a permanent one after your information is at least nominally verified.
b) if someone still filed a followup report, they were likely already familiar with the process and were still willing to go through the whole thing once again. The additional motivation alone should probably be a factor in the overall report being better.
This also suggests that the more serious a report, the more likely there will be a followup (if needed). Is this intuition correct?
Are followup reports more serious?
There were 882863 US VAERS reports related to COVID19 as of the first week of Feb 2023, when I downloaded the dataset.
We will define a report to be serious if one of the following is true:
DIED, HOSPITAL, DISABLE, L_THREAT
Click here to see the definition of these fields
There are 94766 such reports.
So 94766/882863 = 10.73% of all reports are serious. Note that this matches the expected estimate.
There were 31098 deleted reports as of the first week of Feb 2023, when I downloaded the dataset.
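The classification above can be sketched in a few lines of Python. This is only an illustration, not the exact query used for the numbers in this article: it assumes the four fields hold the value Y when they apply and are blank otherwise, and the sample rows are made up purely to show the shape of the calculation.

```python
import csv
import io

# The four fields used in this article to define a "serious" report.
SERIOUS_FIELDS = ("DIED", "HOSPITAL", "DISABLE", "L_THREAT")


def is_serious(row):
    """Return True if any seriousness field is 'Y' (assumed encoding)."""
    return any((row.get(field) or "").strip().upper() == "Y"
               for field in SERIOUS_FIELDS)


def serious_share(rows):
    """Return (serious_count, total_count, percent_serious) for the rows."""
    total = 0
    serious = 0
    for row in rows:
        total += 1
        if is_serious(row):
            serious += 1
    percent = 100.0 * serious / total if total else 0.0
    return serious, total, percent


# Made-up sample data, NOT real VAERS rows.
SAMPLE_CSV = """VAERS_ID,DIED,HOSPITAL,DISABLE,L_THREAT
1,,,,
2,Y,,,
3,,Y,,Y
4,,,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
serious, total, percent = serious_share(rows)
print(f"{serious} of {total} sample reports are serious ({percent:.2f}%)")
```

To run it on a real download, the in-memory sample would be replaced by an opened data file passed to csv.DictReader. A report with several fields set (like sample row 3) is still counted once.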
There are 6173 reports which can be classified as serious using the exact same query as before.
That is, 6173/31098 = 19.85% of deleted reports can be classified as serious.
To summarize, 10.73% of original VAERS reports can be classified as serious, while 19.85% of followup VAERS reports can be classified as serious.
In other words, the share of serious reports among the deleted reports is nearly twice the share among the original reports (19.85% vs. 10.73%, a ratio of about 1.85).
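For anyone who wants to check the arithmetic, here is a small sketch that recomputes the two shares and their ratio from the counts quoted above. The function name is just an illustrative helper.

```python
def share(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Counts quoted in this article (first week of Feb 2023 download).
original_share = share(94766, 882863)   # serious / all reports
deleted_share = share(6173, 31098)      # serious / deleted reports
ratio = deleted_share / original_share

print(f"serious share, all reports:     {original_share:.2f}%")
print(f"serious share, deleted reports: {deleted_share:.2f}%")
print(f"ratio of the two shares:        {ratio:.2f}")
```

The ratio comes out a little under 2, which is why "nearly twice" is the right way to put it.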
Are followup reports more descriptive?
The average length of original reports is 703 characters.
The average length of followup reports is 1103 characters, meaning the followup reports are on average more than 50% longer ((1103-703)/703=56.9%).
In other words, going through the followup report is likely to be much more useful if you are trying to figure out if the vaccine caused an injury.
Of course, it is not an either/or scenario. It makes a lot more sense to merge the information together. However, I am doing these comparisons to show what we could be missing out when we ignore followup reports.
Interestingly, the average length of the narrative text for original death reports is only 604 characters, while the average length of the narrative text for followup death reports is more than double at 1407 characters.
But these are averages over the whole dataset and not based on matched reports.
I also tried to look at the difference in the average length of the SYMPTOM_TEXT in the reports which I was able to match (remember I could only match about 50% of the deleted death reports). There is also a jump here, from an average length of 701 characters to an average length of 910 characters (about 30% longer). This is not as pronounced. I will look into this in more detail when I get a chance.
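The length comparisons in this section boil down to a percentage increase between two averages. Below is a minimal sketch of that calculation, plus a helper for the matched case. It assumes the original and followup narratives have already been paired up somehow (the matching itself is not shown), and the sample pairs are invented placeholders.

```python
from statistics import mean


def percent_increase(before, after):
    """Return how much larger `after` is than `before`, in percent."""
    return 100.0 * (after - before) / before


def compare_lengths(pairs):
    """Given (original_text, followup_text) pairs, return the mean length
    of each side and the percentage increase between the two means."""
    pairs = list(pairs)
    original_mean = mean(len(original) for original, _ in pairs)
    followup_mean = mean(len(followup) for _, followup in pairs)
    return original_mean, followup_mean, percent_increase(original_mean, followup_mean)


# Averages quoted in this article.
print(f"all reports:           {percent_increase(703, 1103):.1f}% longer")
print(f"death reports:         {percent_increase(604, 1407):.1f}% longer")
print(f"matched death reports: {percent_increase(701, 910):.1f}% longer")

# Invented placeholder pairs, only to show how matched texts would be fed in.
sample_pairs = [("abcd", "abcdef"), ("ab", "abcd")]
original_mean, followup_mean, increase = compare_lengths(sample_pairs)
print(f"sample pairs: {original_mean} -> {followup_mean} characters ({increase:.1f}% longer)")
```

Comparing matched pairs this way keeps the same underlying event on both sides, which is why it is a fairer test than comparing averages over two different sets of reports.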
There are probably other ways to prove that deleting (and thus ignoring) followup reports is a bad idea, and if someone has any suggestions please let me know in the comments.
I want people to understand ... There are two types of followups.
1. People and medical staff make additional reports in VAERS on individuals. Those are being referred to in the article above.
2. CDC says they follow up on serious VAERS reports, and no, you can't have them; these are kept secret from the public.
There are two possible ways to resolve that:
A. We slip into an alternate universe where CDC is honest and interested in health (only became possible in that alternate universe because we were not growing at 385,000 new babies per day, that's the real emergency in our current universe here which they are forced to try to address to save planet earth, they do so arguably obeying the law but meanwhile avoiding an increase in public health, by publishing poor data)
B. FOIA request
We humans can't have the discussion about our population growth, a topic that is too emotional, and that's why we get number 2 (pun acknowledged).
Does anyone have a theory on why the CDC injects (or allows) multiple repetitions of sentences in reports? Random copy and paste are not how people write. One doesn't have to look through very many to notice them; they are surprisingly common. A cynical mind might figure it is their effort to flip off readers' brains (pun intended). To illustrate the point, random copy and paste are not how people write and random copy and paste are not how people write.
Is there some code that can remove them?
Some candidates:
https://www.google.com/search?q=python+remove+repeat+sentences+site%3Astackoverflow.com
One possibility among the results:
https://stackoverflow.com/questions/53181784/how-to-remove-duplicate-phrases-in-python
I'd change that to r'\1 [repeated]' to document them, but it didn't work right anyway.
With the repeats removed, how would the shorter vs. longer text signals on the veracity of the reports tend to differ from what we see now?
Might not be possible with regex. Some suggestions venture into AI.
Another at https://stackoverflow.com/a/64201821/962391, not tested and frankly doesn't make sense to me. :(
No answer at this point.
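One possible starting point, offered as a rough sketch rather than a tested solution for real VAERS narratives: split the text into sentences, remember each sentence already seen, and replace later copies with a [repeated] marker. The function names and the sample text are made up for illustration. It only catches whole-sentence repeats, so a phrase repeated inside one sentence (like the "and"-joined example above) would slip through, and the naive splitting will stumble on abbreviations such as "Dr.".

```python
import re

# Split after ., ! or ? when followed by whitespace (a naive sentence splitter).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def normalize(sentence):
    """Lowercase and collapse whitespace so trivial differences do not hide a repeat."""
    return re.sub(r"\s+", " ", sentence).strip().lower()


def mark_repeated_sentences(text, marker="[repeated]"):
    """Return (cleaned_text, repeat_count).

    Sentences already seen earlier in the text are replaced by `marker`;
    consecutive markers are collapsed into one. Pass marker=None to drop
    the repeats without leaving a trace.
    """
    seen = set()
    kept = []
    repeats = 0
    for sentence in SENTENCE_END.split(text.strip()):
        if not sentence:
            continue
        key = normalize(sentence)
        if key in seen:
            repeats += 1
            if marker and (not kept or kept[-1] != marker):
                kept.append(marker)
            continue
        seen.add(key)
        kept.append(sentence)
    return " ".join(kept), repeats


# Invented sample text, not taken from a real report.
sample = ("Patient had fever. Patient had fever. "
          "Patient was admitted. Patient had fever.")
cleaned, count = mark_repeated_sentences(sample)
print(cleaned)
print(f"{count} repeated sentences found")
```

Keeping the repeat count alongside the cleaned text would also allow the average-length comparison to be rerun on de-duplicated narratives.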