How NLU-based information extraction makes VAERS more searchable
For example, we can derive the age from the clinical narrative in a significant percentage of foreign reports even if it is missing in the AGE_YRS field.
Recently, the CDC removed the clinical narrative field (SYMPTOM_TEXT) from the foreign VAERS dataset.
Pro-vaccine folks might ask: if someone at the CDC reads each report and copies the extracted information (age, vaccination date and so on) into the remaining columns, what is the big deal if the clinical narrative text is not made public?
But I have already noticed a pretty concerning trend after I started looking at the foreign dataset.
This is the problem: the age is often stated in the SYMPTOM_TEXT but is not filled out in the AGE_YRS field of the actual CSV file.
And unless you use NLU-based information extraction, it is too laborious and time-consuming to fill out this information by hand, especially when the SYMPTOM_TEXT field is verbose and hard to read.
Note: I don’t actually know if it is OK to post the actual pre-deletion dataset online. So I will not be posting any of the reports for UK and EU at the moment unless I understand the rules around posting it online. Until then I will use other regions for my article datasets.
For example, the three commonly used tools for searching VAERS (CDC Wonder, OpenVAERS and MedAlerts) will not return the full list of reports if you search by age (or age range), because they rely on the AGE_YRS field.
New Zealand
Let us consider the example of New Zealand, which I discussed in a previous article.
You can see something very interesting if you look at the AGE_YRS and the SYMPTOM_TEXT field in conjunction.
Filter for rows where AGE_YRS is null (meaning it was not filled out in the report) but where the SYMPTOM_TEXT field contains the word “old”. More often than not, this word appears as part of a phrase which provides the patient’s age.
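The filter described above can be sketched in a few lines of Python. This is only an illustration, not the script from my previous article: the sample rows are made up, and the regular expression is a simple stand-in for NLU-based extraction that only catches phrases like "27-year-old" or "61 years old". The helper names are hypothetical. Note that a plain substring check for "old" will also hit words like "cold" or "told", which is one reason the regex step returns nothing for some candidate rows.

```python
import csv
import io
import re

# Made-up sample rows for illustration only; these are not real VAERS reports.
SAMPLE_CSV = """VAERS_ID,AGE_YRS,SYMPTOM_TEXT
1,34,Patient reported chest pain.
2,,A 27-year-old male experienced palpitations.
3,,Headache two days after vaccination.
4,,This 61 year old female reported fatigue.
"""

# Hypothetical pattern: matches "27-year-old", "61 year old", "27 years old".
AGE_PATTERN = re.compile(r"\b(\d{1,3})[\s-]*years?[\s-]*old\b", re.IGNORECASE)


def rows_missing_age_but_mentioning_old(rows):
    """Keep rows where AGE_YRS is empty but SYMPTOM_TEXT contains 'old'."""
    return [
        row
        for row in rows
        if not row["AGE_YRS"].strip() and "old" in row["SYMPTOM_TEXT"].lower()
    ]


def age_from_text(text):
    """Return the first age found by the regex, or None if there is no match."""
    match = AGE_PATTERN.search(text)
    return int(match.group(1)) if match else None


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
candidates = rows_missing_age_but_mentioning_old(rows)
for row in candidates:
    print(row["VAERS_ID"], age_from_text(row["SYMPTOM_TEXT"]))
```

To try this on a real export, you would replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the CSV you downloaded.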
Myocarditis
Suppose you want to know how many people in New Zealand under the age of 40 got Myocarditis after taking the vaccine.
CDC Wonder does not allow you to search for this.
On OpenVAERS, you can search through the foreign reports, but you cannot restrict by country.
On MedAlerts, you can actually put NZ into the SPLTTYPE field and it will only return the results where the SPLTTYPE field contains the string NZ, which is what we want. These are all New Zealand reports. So I added it and also searched for the word Myocarditis in the SYMPTOM_TEXT.
Using this search query on MedAlerts, you see that only 25 results are returned.
To compare, I took the New Zealand reports from the previous article and filtered for the word Myocarditis in the SYMPTOM_TEXT (to match with the above search results) and created a new, smaller dataset.
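Here is a minimal sketch of that kind of filter, assuming the reports are loaded as rows with SPLTTYPE and SYMPTOM_TEXT columns. The rows and the SPLTTYPE values below are invented for illustration, the function name is hypothetical, and matching "myocarditis" case-insensitively is my assumption rather than a statement about how MedAlerts does it.

```python
import csv
import io

# Made-up rows for illustration only; the SPLTTYPE values here are invented.
SAMPLE_CSV = """VAERS_ID,SPLTTYPE,SYMPTOM_TEXT
10,NZ-EXAMPLE-001,Diagnosed with myocarditis after the second dose.
11,NZ-EXAMPLE-002,Reported a sore arm.
12,AU-EXAMPLE-003,Myocarditis confirmed by MRI.
13,NZ-EXAMPLE-004,Suspected Myocarditis; troponin elevated.
"""


def is_nz_myocarditis(row):
    """True if SPLTTYPE contains 'NZ' and the narrative mentions myocarditis."""
    in_nz = "NZ" in row["SPLTTYPE"]
    mentions = "myocarditis" in row["SYMPTOM_TEXT"].lower()
    return in_nz and mentions


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
subset = [row for row in rows if is_nz_myocarditis(row)]
print([row["VAERS_ID"] for row in subset])
```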
If you now filter only by AGE_YRS < 40, you get 26 results, which nearly matches the MedAlerts result. The difference is probably because the MedAlerts data is up to date, while mine is about a month old.
But if you use the calculated DERIVED_AGE instead (see the article I linked above for the Python script I used to calculate DERIVED_AGE), you get 120 results, which is nearly a five-fold increase!
(You can verify this for yourself using the link above)
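The comparison itself is just two counts over the same rows, one per age column. The sketch below uses made-up rows and a hypothetical helper, `count_under`, and it assumes DERIVED_AGE has already been computed and stored alongside AGE_YRS. The last line only redoes the arithmetic on the two counts quoted above (26 and 120).

```python
def to_float(value):
    """Convert a CSV cell to float, returning None for empty or bad values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def count_under(rows, field, limit):
    """Count rows whose numeric value in `field` is below `limit`."""
    count = 0
    for row in rows:
        age = to_float(row.get(field))
        if age is not None and age < limit:
            count += 1
    return count


# Made-up rows for illustration only; not real VAERS reports.
rows = [
    {"AGE_YRS": "35", "DERIVED_AGE": "35"},
    {"AGE_YRS": "", "DERIVED_AGE": "22"},   # age only found in the narrative
    {"AGE_YRS": "", "DERIVED_AGE": "58"},   # age found, but not under 40
    {"AGE_YRS": "", "DERIVED_AGE": ""},     # no age anywhere
]

by_field = count_under(rows, "AGE_YRS", 40)
by_derived = count_under(rows, "DERIVED_AGE", 40)
print(by_field, by_derived)

# Ratio of the two counts reported in the text above.
ratio = 120 / 26
print(round(ratio, 1))
```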
Summary
NLU-powered information extraction gives us a much more accurate picture of vaccine safety, because it lets us analyze the SYMPTOM_TEXT field in VAERS.
The fact that none of the health authorities seem to be concerned about this speaks very poorly of them.
Missing ages in the AGE_YRS field occur at a significantly higher rate in the foreign dataset than in the US dataset. This is probably because there are many more eyes looking at the US dataset than at the foreign dataset.