As I was doing my VAERS analysis, I noticed something interesting.
Some entries had both VAX_DATE (date of vaccination) and ONSET_DATE (date of symptom onset) but were missing NUMDAYS, the number of days between the two. Since NUMDAYS is merely a value computed from those two dates, I wasn’t sure how this was possible.
Then I filtered the data in pandas and noticed that they all fell into the same pattern:
What you notice here is that the ONSET_DATE seems to be earlier than the VAX_DATE. How is that possible?
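The filtering step described above can be sketched in pandas roughly as follows. This is only an illustration: the three rows are made-up placeholders rather than real VAERS records, the MM/DD/YYYY date format is an assumption about the export, and RAW_DIFF is a hypothetical helper column that is not part of VAERS.

```python
import pandas as pd

# Hypothetical rows for illustration only; these are NOT real VAERS reports.
df = pd.DataFrame({
    "VAERS_ID": [1, 2, 3],
    "VAX_DATE": ["01/15/2021", "02/20/2021", "03/05/2021"],
    "ONSET_DATE": ["01/17/2021", "02/01/2021", "03/01/2021"],
    "NUMDAYS": [2.0, None, None],
})

# Parse the date columns (assumed MM/DD/YYYY); unparseable values become NaT.
for col in ("VAX_DATE", "ONSET_DATE"):
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y", errors="coerce")

# Keep rows where both dates are present but NUMDAYS is missing.
missing = df[
    df["VAX_DATE"].notna() & df["ONSET_DATE"].notna() & df["NUMDAYS"].isna()
].copy()

# RAW_DIFF (hypothetical name): what NUMDAYS would be if computed naively.
missing["RAW_DIFF"] = (missing["ONSET_DATE"] - missing["VAX_DATE"]).dt.days

print(missing[["VAERS_ID", "VAX_DATE", "ONSET_DATE", "RAW_DIFF"]])
```

In this toy data, every row that survives the filter has a negative RAW_DIFF, which is the pattern described here.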
I clicked through into one of these VAERS reports and noticed the following:
As you can see, while the exact vaccination date is known, the exact onset date is not.
VAERS records the precise VAX_DATE but stores the ONSET_DATE as the 1st of the month.
Once the onset date is stored that way, there isn’t any way to compute the NUMDAYS field.
But NUMDAYS is clearly important information, because the timing of onset relative to vaccination bears on temporality, one of the Bradford Hill criteria used to evaluate whether vaccination was the cause of a specific adverse event.
As you can see from the data screenshot, this seems to happen for cases where the vax date and onset date are in the same month. (I haven’t done an analysis on the full dataset to verify that this is always true, but if the onset date is stored as the 1st of its month, a same-month vaccination is the only way the calculated value for NUMDAYS could be negative.)
Even so, that STILL means we can bound NUMDAYS: it can be at most 31, which is the maximum number of days in a given month.
In other words, adding a NUMDAYS_BOUND as a new field into the dataset will help people who are analyzing the dataset. That is my suggestion to the CDC.
Of course, if the CDC is doing this intentionally, they are probably going to ignore my suggestion. Let us suppose they are not doing it intentionally.
Adding a NUMDAYS_BOUND can also help folks who just want to do this analysis without relying on the CDC to change their years-old practices, and I will discuss this topic in a future article.
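As a rough sketch of what deriving such a field locally might look like, the function below fills NUMDAYS_BOUND for rows that match the pattern described above. Several things here are assumptions rather than anything VAERS documents: the function name add_numdays_bound, the MM/DD/YYYY date format, the rule used to detect a placeholder onset date, and the optional tight mode, which uses the days remaining in the vaccination month instead of the flat 31. The sample rows are made up.

```python
import pandas as pd


def add_numdays_bound(df, tight=False):
    """Return a copy of df with a NUMDAYS_BOUND column (hypothetical field)."""
    out = df.copy()
    vax = pd.to_datetime(out["VAX_DATE"], format="%m/%d/%Y", errors="coerce")
    onset = pd.to_datetime(out["ONSET_DATE"], format="%m/%d/%Y", errors="coerce")

    same_month = (vax.dt.year == onset.dt.year) & (vax.dt.month == onset.dt.month)

    # Assumed signature of a placeholder onset date: NUMDAYS is missing, the
    # onset is stored as the 1st, and it precedes a same-month vaccination.
    placeholder = (
        out["NUMDAYS"].isna() & same_month & (onset.dt.day == 1) & (onset < vax)
    )

    # Where NUMDAYS is known, the bound is simply NUMDAYS itself.
    out["NUMDAYS_BOUND"] = out["NUMDAYS"]

    if tight:
        # Tighter assumption: onset fell between the vax date and month end.
        bound = vax.dt.days_in_month - vax.dt.day
        out.loc[placeholder, "NUMDAYS_BOUND"] = bound[placeholder]
    else:
        # The flat upper bound suggested in this article.
        out.loc[placeholder, "NUMDAYS_BOUND"] = 31
    return out


# Hypothetical rows for illustration only; these are NOT real VAERS reports.
sample = pd.DataFrame({
    "VAX_DATE": ["01/15/2021", "02/20/2021"],
    "ONSET_DATE": ["01/17/2021", "02/01/2021"],
    "NUMDAYS": [2.0, None],
})
result = add_numdays_bound(sample)
print(result)
```

Calling add_numdays_bound(sample, tight=True) instead narrows the bound for the second row to the days left in February after the 20th. Whether that tighter reading is justified depends on the same-month pattern holding across the full dataset, which has not been verified here.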