Do deleted reports simply follow the "Keep the first" policy?
This seems like a pretty good explanation
I got this question in a comment on my article about deleted VAERS reports.
Thank you for your very helpful and informative Substack.
I wonder whether the patterns you observe are due to VAERS deleting more recent reports of a case, given this explanation in the VAERS Data Use Guide: "when multiple reports of a single case or event are received, only the first report received is included in the publicly accessible dataset. Subsequent reports may contain additional or conflicting data, and there is no assurance that the data provided in the public dataset is the most accurate or current available" (p. 3). The negative interpretation of this policy is that it allows VAERS to eliminate some of the worst outcomes, such as when a case's condition deteriorates. Are the dates received of the retained and deleted reports consistent with the "keep the first report" policy?
At first glance, almost all of the duplicates I could identify do follow the “keep the first report” policy. (Keep in mind, though, that I could not identify the duplicate for 50% of the deleted reports in my previous analysis.)
The relevant field is RECVDATE, the date on which VAERS received the report.
Since I already have the underlying data, I ran another query to list the pairs where the retained report is not the first one received. In other words, I checked for pairs where the RECVDATE in RETAINED_VALUE is later than the RECVDATE in DELETED_VALUE.
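-- All field-level rows for pairs where the retained report arrived after the deleted one
SELECT *
FROM matches
WHERE DELETED_URL IN
(
    SELECT DELETED_URL
    FROM matches
    WHERE FIELDNAME = 'RECVDATE'
    -- Assuming MySQL-style DATEDIFF(a, b): positive when a is later than b
    AND DATEDIFF(RETAINED_VALUE, DELETED_VALUE) > 0
)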
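For readers who want to try the same check without a MySQL-style server, here is a minimal sketch in Python using the built-in sqlite3 module. It is not my actual pipeline. The column layout of `matches` is inferred from the query above, the rows are invented, the dates are assumed to be stored as ISO `YYYY-MM-DD` strings, and `julianday()` stands in for `DATEDIFF` because SQLite has no such function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches ("
    "DELETED_URL TEXT, FIELDNAME TEXT, RETAINED_VALUE TEXT, DELETED_VALUE TEXT)"
)

# Invented rows for illustration only; "pair-N" stands in for a real URL.
rows = [
    ("pair-1", "RECVDATE", "2021-03-10", "2021-03-01"),  # retained arrived later
    ("pair-1", "RECOVD", "N", "Y"),
    ("pair-2", "RECVDATE", "2021-04-01", "2021-04-20"),  # retained arrived first
    ("pair-2", "DIED", "", "Y"),
    ("pair-3", "RECVDATE", "2021-05-05", "2021-05-05"),  # same day
]
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?)", rows)

# Same shape as the query above; the day difference is computed with julianday().
QUERY = """
SELECT *
FROM matches
WHERE DELETED_URL IN (
    SELECT DELETED_URL
    FROM matches
    WHERE FIELDNAME = 'RECVDATE'
      AND julianday(RETAINED_VALUE) - julianday(DELETED_VALUE) > 0
)
"""

violations = conn.execute(QUERY).fetchall()
violating_pairs = sorted({row[0] for row in violations})
print(violating_pairs)
```

The outer query returns every field-level row for a violating pair, not just the RECVDATE row, which is what makes it possible to see which other fields differ between the two reports.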
I found only 289 such report-pairs out of the ~10000 duplicates I identified, or roughly 3%.
So the number of such reports is small, and few of their rows carry ⚠️ flags.
Duplicate reports received on the same day
To complete the analysis, I also changed the DATEDIFF condition to = 0 to see how many duplicate reports were received on the same day.
There are over 3000 such report-pairs.
And for most of these, the ⚠️ flag is only seen in the RECOVD field, which could well be an unintentional side-effect of the way these reports get updated.
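The three cases discussed so far (retained report received later, same day, or first) can be tallied in one pass. The sketch below assumes the RECVDATE values have already been parsed into `date` objects. The `classify` helper and the sample pairs are hypothetical and not taken from the real data.

```python
from collections import Counter
from datetime import date


def classify(retained: date, deleted: date) -> str:
    """Label a report pair by the order in which its two reports arrived."""
    delta = (retained - deleted).days
    if delta > 0:
        return "retained_later"   # contradicts "keep the first report"
    if delta == 0:
        return "same_day"         # order cannot be decided from RECVDATE alone
    return "retained_first"       # consistent with "keep the first report"


# Invented (retained RECVDATE, deleted RECVDATE) pairs.
pairs = [
    (date(2021, 3, 10), date(2021, 3, 1)),
    (date(2021, 4, 1), date(2021, 4, 20)),
    (date(2021, 5, 5), date(2021, 5, 5)),
    (date(2021, 6, 1), date(2021, 6, 15)),
]

counts = Counter(classify(retained, deleted) for retained, deleted in pairs)
print(counts)
```

Same-day pairs get their own bucket because RECVDATE has only day resolution, so for those pairs it cannot say which report came first.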
If someone knows why duplicate same-day reports get added into VAERS in the first place, please let me know in the comments.
Summary
I got two takeaways from this analysis.
One, I think “Keep the first report” is a very good explanation for the missing information. VAERS is still not providing complete, up-to-date information, but the pattern does not point to a deliberate choice to keep whichever report looks better.
Two, based on my previous analysis, many follow-up reports describe worse outcomes than the originals they duplicate. So I am not sure the “Keep the first report” policy is actually a good idea.
Similarly, as anyone who has read through VAERS reports knows, there are an overwhelming number of repeated sentences. One might theorize that this is intentional, meant to switch the reader's brain off, since an agency would surely want a reputation for quality and would not choose to present sloppy work.
I would be interested in a regular expression or method to replace repeated strings of 50 or more characters with something like [echo87].
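One possible approach, as a rough sketch rather than a tested tool: a regular expression with a backreference can find a run of at least 50 characters that recurs later in the text, and each later copy can be replaced with a marker. The function name `collapse_repeats` is hypothetical, and reading the number in `[echo87]` as the length of the replaced run is an assumption.

```python
import re


def collapse_repeats(text: str, min_len: int = 50) -> str:
    """Replace later copies of any run of >= min_len characters with [echoN],
    where N is the length of the repeated run."""
    # Group 1: the run; group 2: whatever lies between the run and its repeat.
    pattern = re.compile(r"(.{%d,})(.*?)\1" % min_len, re.DOTALL)
    while True:
        text, replaced = pattern.subn(
            lambda m: f"{m.group(1)}{m.group(2)}[echo{len(m.group(1))}]",
            text,
            count=1,
        )
        if replaced == 0:
            return text


sentence = "The patient reported a headache and fatigue after the second dose. "
narrative = sentence + "No fever. " + sentence
collapsed = collapse_repeats(narrative)
print(collapsed)
```

The first copy of each run is kept, so the text stays readable. Backreference matching of this kind gets slow on long inputs, so it is better suited to one report narrative at a time than to a whole data file.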
Thank you, Aravind, for the follow-up investigation!
The "keep the first report" policy is definitely a bad one. No good scientific or investigative study ends with just the initial information. I think there are likely at least two reasons for the policy.
First, the CDC/FDA never wanted the VAERS system. Congress forced it on them. I suspect they despise it, especially because it's open: anyone can submit reports and anyone can view reports. This also means that reporters can check whether their reports have been included and verify their accuracy (unlike almost any other human-reported scientific data set). The openness of the system allows an independent assessment of adverse effects that CDC/FDA can't control, and it allows skilled analysts like you to investigate matters they would like kept in the dark. The "keep the first report" policy allows VAERS to minimize the negative outcomes recorded in the public version of the database. Indeed, any reporter who makes more than one report is truly committed and likely not only believes the symptoms are genuine adverse effects but may also have good evidence to support their claims.
A probable second reason for the policy is that it creates less work for the contractor that administers VAERS (contracting the work out shows how little CDC/FDA care about VAERS). The contractor doesn't want to do anything extra, and CDC/FDA don't want to pay more for a feature that they don't want.