This is Part 1 of my Case for Vaccine Data Science series.
I generally don’t like to read articles written by David Gorski because his articles often seem like “argumentum ad antivax”.
Maybe he is just tired of all the anti-vaccine rhetoric and thinks he is “balancing the scales”. Who knows?
On top of that, his favorite approach is to open his articles with a mind-numbingly long conflict-of-interest (CoI) analysis built on the age-old guilt-by-association technique. By the time he actually gets to the substance, most readers’ brains are too fried to follow it.
For example, this article is about the misuse of VAERS (the Vaccine Adverse Event Reporting System), but it first launches into another long CoI analysis of Dr Jessica Rose.
By the way, I find it laughable that David Gorski gets agitated about anti-vaxxers who “primed the pump”, even while doing the same thing routinely in nearly every single article.
Bringing up CoI is fair game, given that CoI exists on both sides of this debate. But why doesn’t David Gorski put it at the end of his articles, AFTER making the substantive argument?
However, there is actually another statement about Jessica Rose’s “qualifications” later in the article which caught my eye.
Still, it’s not enough simply to show that Jessica Rose lacks the qualifications to undertake the analyses she’s done and that she works for a disinformation institute that’s been spreading FUD about vaccines for years now. Let’s look at her claims and activity, the most famous of which is that COVID-19 vaccines are killing tens of thousands of people without being detected and that the government is covering it up.
Really?
Jessica Rose lacks the qualifications?
As opposed to who?
David Gorski, who is stuck in 2005/2006 and seems to have completely missed the recent improvements in fields like Natural Language Understanding?
Does David have the qualifications necessary to do VAERS analysis?
More importantly, given all the gaps in his own knowledge, is he even qualified to know who is really qualified to do the analysis?
For example, here is how he describes the use of tools for VAERS data analysis:
This is exactly what you want: A whole bunch of people with no knowledge of epidemiology or, even more importantly, the limitations of VAERS “playing” with the whole VAERS dataset in Excel or, if they’re more sophisticated at computer programming, R. R, for those of you not familiar with it, is a programming language and free software environment for statistical computing and graphing the results. As you might imagine, learning to use R involves a bit of a learning curve, certainly more than Excel does.
Whatever one’s skill level, without knowledge of how VAERS works and what its raw data can be used for, you will fall victim to GIGO or “garbage in, garbage out” no matter how good you are with Excel or at statistical analysis using R. That is, of course, exactly what happened to Rose.
Except neither Excel nor R is the appropriate tool for VAERS analysis, and pretty much any data scientist can tell you why.
Excel is too limited (it is not really a full-fledged relational database), and R is too statistics-focused: it is great for statistical analysis, but its text-analysis ecosystem lags well behind Python’s, and text analysis is a large chunk of VAERS analysis.
If anyone doing VAERS analysis had actually paid attention to how Machine Learning has improved Natural Language Understanding (a subfield of Natural Language Processing, or NLP), they would realize that Python is a far better tool for VAERS analysis, for these reasons:
a) the pandas Python library can open large CSV files. In fact, that’s one of the main reasons data scientists use it. Opening large CSV files suddenly became a necessity for VAERS analysis in 2021. This fact alone should make people in the data science community very curious, but the data science community was “the dog which did not bark” during the vaccine rollout.
b) pandas also lets you run relational queries (joins, filters, group-bys) over the CSV data, for example joining the separate report, vaccine and symptom files on VAERS_ID, which you need to do when processing hundreds of thousands of reports.
c) Python has become the de facto language of Natural Language Understanding, which is actually very helpful for parsing the write-ups. In turn, this means you can easily use an NLP library like spaCy as part of your analysis.
d) even though the VAERS CSV files contain “tabular” data, so much of the data in VAERS is missing or poorly translated that using SQL for your analysis will make you want to pull your hair out (which is not good for me, because I am already missing a lot of my hair). Instead, you need to be able to set breakpoints and step through and debug the code to understand what is actually going on.
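Points (a) and (b) can be sketched in a few lines of pandas. This is a minimal sketch, not a full VAERS pipeline: the two small in-memory CSVs are hypothetical stand-ins for the real per-year report-data and vaccine-data files, their rows are abbreviated from four of the records quoted at the end of this post, and the column names (VAERS_ID, AGE_YRS, SEX, STATE, SYMPTOM_TEXT, VAX_TYPE, VAX_MANU) are assumed to match the public downloads, so check them against the VAERS data-use guide before relying on them.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two VAERS download files. The real data ships as
# separate per-year CSVs (report data, vaccine data, symptoms) sharing a
# VAERS_ID key; the column names here are assumed to mirror them.
DATA_CSV = """VAERS_ID,AGE_YRS,SEX,STATE,SYMPTOM_TEXT
1216617,66,F,UT,arm felt like I was slugged by the hulk
1722161,60,F,PA,Arm was swollen ... like the incredible hulk
928864,49,M,MI,headache developed
1394389,55,M,SC,heart would race up and I felt like hulk
"""

# Manufacturer labels as written in the records quoted in this post; the real
# vaccine file may spell them differently.
VAX_CSV = """VAERS_ID,VAX_TYPE,VAX_MANU
1216617,COVID19,MODERNA
1722161,COVID19,MODERNA
928864,COVID19,PFIZER
1394389,COVID19,JANSSEN
"""


def load(csv_text: str) -> pd.DataFrame:
    # For the real files, pass a path instead of StringIO; an explicit
    # encoding such as "latin-1" is often needed (an assumption, check yours).
    return pd.read_csv(io.StringIO(csv_text))


data = load(DATA_CSV)
vax = load(VAX_CSV)

# Relational join: attach vaccine details to each report via VAERS_ID.
# In the real data one report can list several vaccines (one-to-many).
reports = data.merge(vax, on="VAERS_ID", how="inner")

# Relational-style query: Moderna reports whose write-up mentions "hulk".
mask = (reports["VAX_MANU"] == "MODERNA") & reports["SYMPTOM_TEXT"].str.contains(
    "hulk", case=False, na=False
)
moderna_hulk = reports.loc[mask, ["VAERS_ID", "AGE_YRS", "STATE"]]
print(moderna_hulk.to_string(index=False))
```

For the real files, swap `io.StringIO(...)` for a file path; if a file is too big to load at once, `pd.read_csv` also accepts a `chunksize` argument and returns an iterator of smaller DataFrames. The merge is the relational query from point (b); the `str.contains` filter is only a crude first step toward the text analysis described in point (c).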
The case for (better) vaccine data science
For the rest of this series, I will be making the case for introducing new and better tools for analyzing vaccine data sets. As you can see, I am not being particularly subtle about it, because that is the name of this entire Substack :-)
People like David Gorski spend so much time demonizing anyone who criticizes vaccines that people who actually know how to use data science tools are not even willing to do any kind of proper analysis of the topic. I think the vaccine promoters have now thrown the baby out with the bathwater and missed the real dangers of the mRNA vaccine.
(By the way, my default position on nearly any issue is to assume stupidity over malice, and horribly misaligned incentives rather than a centralized conspiracy. Since collusion does sometimes happen, I am willing to break this rule if someone gives me an overwhelming amount of evidence.)
For example, by doing nothing more than using an off-the-shelf Machine Learning model for biomedical NLP, I was able to immediately point out that the people who still use the “Incredible Hulk” anecdote should not be taken seriously.
In other words, you can use these off-the-shelf ML models to filter out the fake reports automatically. But this will likely lead to a lot of false negatives: since the off-the-shelf models are not optimized for the task at hand, they will also mark some real VAERS reports as fake, and those genuine reports get thrown out along with the fake ones.
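To make that trade-off concrete, here is a toy tally of what happens when every report a model flags as fake is discarded. Everything in it is hypothetical: the `Report` class, its fields, and the five labelled reports are invented for illustration, and no real model or VAERS data is involved. The "real_removed" bucket is the false negative described above: a genuine report the filter throws away.

```python
from dataclasses import dataclass


@dataclass
class Report:
    vaers_id: int       # hypothetical IDs, not real VAERS reports
    is_real: bool       # ground truth, e.g. from manual review (assumed)
    flagged_fake: bool  # output of some off-the-shelf model (assumed)


def filter_outcomes(reports):
    """Tally outcomes when every report flagged as fake is discarded."""
    counts = {"fake_removed": 0, "fake_kept": 0, "real_kept": 0, "real_removed": 0}
    for r in reports:
        kind = "real" if r.is_real else "fake"
        action = "removed" if r.flagged_fake else "kept"
        counts[f"{kind}_{action}"] += 1
    return counts


def real_report_recall(counts):
    """Share of genuine reports that survive the filter."""
    real_total = counts["real_kept"] + counts["real_removed"]
    return counts["real_kept"] / real_total if real_total else float("nan")


reports = [
    Report(1, is_real=True, flagged_fake=False),
    Report(2, is_real=True, flagged_fake=True),   # real report wrongly discarded
    Report(3, is_real=False, flagged_fake=True),
    Report(4, is_real=True, flagged_fake=False),
    Report(5, is_real=False, flagged_fake=False),
]
counts = filter_outcomes(reports)
print(counts)
print(f"recall on real reports: {real_report_recall(counts):.2f}")
```

Tuning a model for the task is, in these terms, an attempt to shrink "real_removed" without letting "fake_kept" grow.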
But it gets more interesting.
I found some research, still cited by David Gorski in his recent articles, which clearly shows that not one person has applied well-known data science principles to VAERS.
In the next part of this series, I will go through this example in detail.
I'll add, regarding the supposed Incredible Hulk reports in VAERS that critics like to use to vilify the system: their deception becomes obvious once one looks into it. Their claim is that people can enter ridiculous reports (and that the CDC supposedly allows them).
There are seven reports mentioning 'hulk', and all are from people using colorful language to describe actual pain and suffering, such as a lady saying her arm swelled up like the Hulk.
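They seem to enjoy tossing both the value of health and truth down the toilet while laughing.
The seven reports (VAERS ID _ age _ sex _ state _ excerpt from the write-up _ vaccine manufacturer _ date):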
1216617 _ 66 _ F _ UT _ arm felt like I was slugged by the hulk _ MODERNA _ 04/16/2021
1309505 _ 39 _ F _ IN _ She said she tried to "Hulk-out" (contract) her left arm muscle, and that didn't work for the twitching, muscle spasm. _ PFIZER _ 05/12/2021
1722161 _ 60 _ F _ PA _ Arm was swollen ,gigantic, humongous like the incredible hulk _ MODERNA _ 09/22/2021
2269033 _ 43 _ F _ CT _ Within 3.5 hours ... intense heaviness crawl up my arm and to my head. ... like heavy weights were resting on my head and shoulders. ... My arms were hulky looking, with the veins bulging out, my muscles ached ... worst health crisis of my life. _ PFIZER _ 05/10/2022
928864 _ 49 _ M _ MI _ approximately 5 [minutes] after receiving injection, headache developed. ... "felt like the Hulk was squeezing the back of my neck". _ PFIZER _ 01/08/2021
1394389 _ 55 _ M _ SC _ Day one was headache ... right arm pain ... joint pain ... loud ringing ears. shortness of breathe, left lung pain. ... heart would race up and I felt like hulk _ JANSSEN _ 06/12/2021
1788772 _ Unk _ Unknown _ Foreign _ My heart is hurting really bad ... chest grew like incredible hulk right after jab; ... ANGINA PECTORIS ... (seriousness criterion medically significant) _ MODERNA _ 10/15/2021
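The keyword check behind the "seven reports" claim can be reproduced with a few lines of Python. This sketch only parses the seven records exactly as they are quoted above (ID _ age _ sex _ state _ text _ manufacturer _ date) and runs a case-insensitive search for "hulk"; on the full dataset you would run the same pattern over the write-up column of the VAERS data files instead, which is an assumption about your setup. On its own it cannot confirm that VAERS contains exactly seven such reports; it only shows the mechanics.

```python
import re
from collections import Counter

# The seven records quoted above, copied in the post's own layout:
# VAERS ID _ age _ sex _ state _ write-up excerpt _ manufacturer _ date
RECORDS = '''\
1216617 _ 66 _ F _ UT _ arm felt like I was slugged by the hulk _ MODERNA _ 04/16/2021
1309505 _ 39 _ F _ IN _ She said she tried to "Hulk-out" (contract) her left arm muscle, and that didn't work for the twitching, muscle spasm. _ PFIZER _ 05/12/2021
1722161 _ 60 _ F _ PA _ Arm was swollen ,gigantic, humongous like the incredible hulk _ MODERNA _ 09/22/2021
2269033 _ 43 _ F _ CT _ Within 3.5 hours ... intense heaviness crawl up my arm and to my head. ... like heavy weights were resting on my head and shoulders. ... My arms were hulky looking, with the veins bulging out, my muscles ached ... worst health crisis of my life. _ PFIZER _ 05/10/2022
928864 _ 49 _ M _ MI _ approximately 5 [minutes] after receiving injection, headache developed. ... "felt like the Hulk was squeezing the back of my neck". _ PFIZER _ 01/08/2021
1394389 _ 55 _ M _ SC _ Day one was headache ... right arm pain ... joint pain ... loud ringing ears. shortness of breathe, left lung pain. ... heart would race up and I felt like hulk _ JANSSEN _ 06/12/2021
1788772 _ Unk _ Unknown _ Foreign _ My heart is hurting really bad ... chest grew like incredible hulk right after jab; ... ANGINA PECTORIS ... (seriousness criterion medically significant) _ MODERNA _ 10/15/2021
'''

# Case-insensitive substring match: also catches "Hulk-out" and "hulky".
HULK = re.compile(r"hulk", re.IGNORECASE)


def parse(line: str) -> dict:
    """Split one record; the free-text field sits between fixed columns."""
    vaers_id, age, sex, state, rest = line.split(" _ ", 4)
    text, manufacturer, date = rest.rsplit(" _ ", 2)
    return {"vaers_id": vaers_id, "age": age, "sex": sex, "state": state,
            "text": text, "manufacturer": manufacturer, "date": date}


rows = [parse(line) for line in RECORDS.splitlines() if line.strip()]
hits = [r for r in rows if HULK.search(r["text"])]
by_manufacturer = Counter(r["manufacturer"] for r in hits)

print(len(hits), "of", len(rows), "records mention 'hulk'")
print(dict(by_manufacturer))
```

A regex like this cannot tell figurative language from a literal claim; every hit still has to be read, or classified with an NLP library such as spaCy, which is exactly the gap that keyword-based "Incredible Hulk" anecdotes ignore.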