Peer-reviewed Lancet paper highlights the CDC's v-safe text mining failures
How did this paper even clear peer review?
Summary:
A CDC paper published in August 2022 is the only one (I know of) which published sample verbatim free-text responses from v-safe
The paper tries to filter free-text responses related to menstruation, and to further classify them along dimensions such as severity, using a pretrained machine-learning classifier
The sample code used in the paper shows that the v-safe free-text information is already in a format that can be easily analyzed, if the CDC wanted to do it
Before the v-safe app was released, a list of symptoms known from the vaccine clinical trials was omitted from its check-the-box options. This meant people who experienced these symptoms could only report them using the free-text option.
To date, the CDC has not done any text mining of v-safe free text for these omitted symptoms
The paper cleared peer review because
a) it was written by the CDC, and
b) the left hand of the CDC does not know what the right hand does
Here is the paper (published August 2022):
Let us call this Paper 2. (Paper 1 will be introduced later in the article)
So what is so unique about this paper?
As far as I am aware, it is the only peer reviewed paper which provides sample verbatim free-text responses from v-safe1.
They look like this:
So how does this paper highlight CDC’s text mining failures?
First of all, here is an excerpt from the paper:
Data sharing
Individual participant text responses in v-safe are not available to others because they might contain personally identifying information.
And this is another excerpt from the paper:
Implications of all the available evidence
Menstrual irregularities and vaginal bleeding after COVID-19 vaccination are being reported, although this study is unable to assess whether these events are caused by COVID-19 vaccination. Text analytic methods can speed up the identification of potential vaccine reactions that are not prespecified in structured data collection. Further studies are needed to better understand their prevalence and pathophysiology.
When you combine these two excerpts, a few things become clear:
1. Only the CDC has access to the v-safe free-text entries, and hence only the CDC can text mine them. If someone is to be blamed for the failure to do proper text mining, it is the CDC.
2. They acknowledge that “text analytic methods” can “speed up” the identification of potential vaccine reactions that are “not prespecified in structured data collection”. They are referring to unsolicited responses, which we already know the CDC has not analyzed to date2
3. The paper also shares some sample code, which tells me that all the free-text data has already been compiled into an easy-to-analyze CSV format (which I will discuss in a future article)
4. Given the clear admission from the CDC that it has not done text analytics on unsolicited symptoms, it is pretty clear that they are trying to pass off absence of evidence as evidence of absence (of vaccine danger).
If it wanted to be certain there was actually an absence of evidence in the nearly 6 million v-safe free-text responses it collected, the CDC should have done a much more thorough job of text mining (explained later in the article).
And if the CDC felt it did not have the necessary skillset in-house, it should have asked for help from people in the tech industry instead of declaring that the mRNA vaccines were completely safe.
What kind of skillsets should they have been looking for?
I would say an “AI Scientist” working at one of the tech giants should have been able to sort this out for the CDC in a matter of a few days.
But I am not sure if the tech giants would have cooperated3.
Publication Timeline
Now I will provide another heavily cited paper which looked at pregnancy outcomes, so you can get an idea of the timeline of how things evolved. This was published in June 2021.
Preliminary Findings of mRNA Covid-19 Vaccine Safety in Pregnant Persons
We will call this Paper 1. It has almost 1000 citations according to Google Scholar.
This paper (and the timeline) is important because the two papers share two authors: Tanya Myers and Christine Olson.
In other words, at least two co-authors of Paper 1 were on the team writing Paper 2, and they could have asked the rest of that team to read Paper 1.
On top of that, Paper 2 does not even cite Paper 1, even though
a) they are talking about a related topic, and
b) Paper 1 explicitly states that more follow-up using v-safe will be needed (emphasis mine). In other words, Paper 2 would have been a natural vehicle for that follow-up:
We were unable to evaluate adverse outcomes that might occur in association with exposures earlier in pregnancy, such as congenital anomalies, because no pregnant persons who were vaccinated early in pregnancy have had live births captured in the v-safe pregnancy registry to date; follow-up is ongoing.
Why I think CDC’s v-safe text mining was a failure
Is it too strong to suggest CDC’s text mining of v-safe was a failure?
I don’t think so. Here is why.
Let me first give you a summary of the text mining that Paper 2 tried to do.
In Paper 2, you see the following excerpt:
We examined reports to v-safe from Dec 14, 2020, to Jan 9, 2022, among adults aged 18 years and older. To identify text responses potentially related to menstrual irregularities or vaginal bleeding, we filtered all 6 168 832 free-text responses to select those containing the word fragments (“menses”, “menst”, “spotting”, “period”, “cycle”, “miscarr”, “menorrh”, or “metrorrh”) or (“bleed” or “blood” and “menop”, “uter”, “vag”, “breakthrough”, “break through”, “endomet”, “gest”, “term”, or “trimester”; appendix p 1).
In other words, they use string matching to flag a response if it contains any fragment from the first list (Pattern 1), or if it contains “bleed” or “blood” (Pattern 2) together with one of the context fragments such as “menop” or “uter” (Pattern 3).
See the figure below.
About 95,000 free-text responses, out of the roughly 6.17 million total, matched the pattern described.
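As a concrete illustration, here is a minimal sketch of that filter in Python with pandas. The file name responses.csv and the column name text are my assumptions (we only know the data is in CSV form); the fragment lists are copied from the excerpt above, and the paper's actual code may differ in details:

import pandas as pd

# Pattern 1: any of these fragments alone flags a response
pattern1 = ["menses", "menst", "spotting", "period", "cycle",
            "miscarr", "menorrh", "metrorrh"]
# Pattern 2 only flags a response when a Pattern 3 fragment also appears
pattern2 = ["bleed", "blood"]
pattern3 = ["menop", "uter", "vag", "breakthrough", "break through",
            "endomet", "gest", "term", "trimester"]

def matches(text: str) -> bool:
    t = text.lower()
    if any(frag in t for frag in pattern1):
        return True
    return (any(frag in t for frag in pattern2)
            and any(frag in t for frag in pattern3))

# Hypothetical file and column names: one free-text response per row
df = pd.read_csv("responses.csv")
filtered = df[df["text"].astype(str).map(matches)]
print(f"{len(filtered)} of {len(df)} responses matched")

Substring matching like this is deliberately crude (“period” would also match “periodically”, for example), which is presumably why a classifier step comes afterwards.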
Then they used a pretrained machine-learning model called BART, which can score these free-text responses against a set of hypothesis statements (seen at the bottom of the picture above):
We applied zero-shot text classification to these filtered responses with the Bidirectional and Auto-Regressive Transformers (known as BART)13 model for natural language inference, pretrained for sentence pair classification on the Multi-Genre Natural Language Inference14 corpus
The classifier was able to label ~86K out of the 95K responses from the search string filtering.
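For readers who want to see what this looks like in practice, here is a minimal sketch using the Hugging Face transformers pipeline with the facebook/bart-large-mnli checkpoint (BART pretrained on the Multi-Genre NLI corpus, matching the setup the excerpt describes). The example response is invented, and the paper's actual code may differ:

from transformers import pipeline

# BART fine-tuned on Multi-Genre NLI, used as a zero-shot classifier
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

# A small subset of the paper's hypothesis statements (full list shown later)
topics = ["This mentions a period that came early",
          "This mentions spotting",
          "This mentions a heavy menstrual period"]

# Invented example response, for illustration only
response = "My period came a week early and was much heavier than usual."

# multi_label=True scores each topic independently,
# since one response can match several topics
result = classifier(response, topics, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{score:.2f}  {label}")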
Then they did a manual review of a subset of the classifier's output, to see how many responses it got right. You can see that the classifier had different levels of accuracy for different topics.
The accuracy, however, is not the focus or even concern of my critique.
Why wasn’t miscarriage included as one of the classifier topics?
We can see that the word fragment ‘miscarr’ was already part of the search strings.
Here is the list of topics, taken from the paper's code sample, that they wanted the BART classifier to assign:
topics = ['This mentions a period that came early',
'This mentions menstruation',
'This mentions a period that came late',
'This mentions spotting',
'This mentions a heavy menstrual period',
'This mentions vaginal bleeding',
'This mentions uterine bleeding',
'This mentions a painful menstrual period',
'This mentions prolonged bleeding specifically',
'This mentions an irregular period specifically',
'This specifically mentions missing or skipping a period',
'This mentions not having a period for years']
core_top = ['This mentions menstruation',
'This mentions spotting',
'This mentions vaginal bleeding',
'This mentions uterine bleeding',
'This specifically mentions missing or skipping a period']
timing_top = ['This mentions a period that came early',
'This mentions a period that came late',
'This mentions spotting',
'This mentions an irregular period specifically',
'This specifically mentions missing or skipping a period']
severe_top = ['This mentions a heavy menstrual period',
'This mentions a painful menstrual period',
'This mentions prolonged bleeding specifically']
yrs_top = ['This mentions not having a period for years']
As you can see, miscarriage is not one of the topics they wanted BART to classify.
Why was it omitted?
Were there ANY miscarriage reports in v-safe?
One possibility is that there were no miscarriage reports at all, so there was no need to analyze them.
But miscarriages had already been reported in Paper 1.
With so much of the groundwork already done in the code, it would not have taken much more effort to run topic classification for miscarriages (see the sketch below).
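For example, extending the paper's own topic lists (quoted above) would have been roughly a two-line change. The group name preg_top is my invention, in the style of core_top and severe_top:

# Hypothetical additions in the style of the paper's existing topic lists
topics.append('This mentions a miscarriage')
preg_top = ['This mentions a miscarriage']  # analogous to core_top, severe_top, etc.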
Or was it omitted because they did not want the public to see sample free-text responses about this topic?
But it actually gets much worse.
At least the CDC established a pregnancy-specific v-safe registry which they could rely on for their assessments.
As an aside, Arkmedic is clearly not impressed with the results from the v-safe pregnancy registry, but I don’t have enough of a background in biology or medicine to comment on it.
But the CDC did not publish any text mining for the other known serious adverse reactions for the mRNA vaccines.
Omitted solicited symptoms from the original v-safe
If you are new to this topic, you might not know that the original design of v-safe also solicited some additional, serious symptoms (here, ‘solicited’ means there was a check-the-box option for them).
This was not some random list; it was based on symptoms that the vaccine manufacturers identified during the clinical trials.
These symptoms were then omitted before v-safe was released, so that only the more benign symptoms were offered to patients as check-the-box options.
Here is how Aaron Siri describes it:
And what happened as a result of moving these solicited symptoms to the free-text boxes? This is how Aaron describes it:
Had the CDC included the check-the-box options for these adverse events, it could have clearly calculated a rate for each harm for the 10 million v-safe users. For example, if 400,000 reported myocarditis, then that would be around a 4% reported rate for this condition. The CDC, however, chose not to include these harms as check-the-box options. It instead relegated them to only potentially be captured in free-text fields!
So what is the next logical step if you omit these symptoms from the solicited list?
You would do the same kind of text mining analysis that they did in Paper 2, but based on the omitted symptoms. A sketch of what that could look like follows.
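Here is a minimal sketch reusing the fragment-matching approach from Paper 2. The fragment lists below are my own illustrative guesses (myocarditis is the only omitted symptom named in the excerpt above), not the actual list from the original v-safe design, and the file and column names are the same assumptions as in the earlier sketch:

import pandas as pd

# Illustrative fragments only; the real list should come from the solicited
# symptoms that were dropped from the original v-safe design
omitted_fragments = {
    "myocarditis/pericarditis": ["myocard", "pericard"],
    "seizure": ["seizure", "convulsion"],
    "anaphylaxis": ["anaphyla"],
}

def flag_omitted(text: str) -> list[str]:
    """Return the omitted-symptom buckets whose fragments appear in the text."""
    t = text.lower()
    return [name for name, frags in omitted_fragments.items()
            if any(frag in t for frag in frags)]

df = pd.read_csv("responses.csv")  # hypothetical file name, as before
df["flags"] = df["text"].astype(str).map(flag_omitted)
hits = df[df["flags"].map(bool)]
print(f"{len(hits)} responses mention an omitted symptom")

The flagged subset could then be run through the same kind of zero-shot classification shown earlier, exactly as Paper 2 did for the menstruation-related responses.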
And as far as I know, the CDC has not published any paper mining the v-safe free-text entries for any of these omitted symptoms4.
Why wasn’t this done?
Conclusion
How many times have you heard vaccine pushers point to v-safe as an example of how well the COVID-19 vaccines have been studied and analyzed?
Even allowing for the fact that only the CDC has access to the v-safe free text entries (a red flag all by itself), why has there been only ONE paper written on the topic of text mining v-safe free text entries in 2+ years?
If I am wrong, please provide the link to the peer-reviewed publication in the comments. Not a blog or Substack article, but an actual peer reviewed paper.
I am somewhat glad I found this paper. In my previous article about v-safe text mining, I made some assertions that would have been hard to prove without seeing at least a few sample verbatim free-text responses.
I suppose they could still have asked someone in a smaller organization. But this is the kind of quick-and-dirty work that is probably beyond the capability of a giant, slow-moving bureaucracy.
If I am wrong about this, please provide a link to the paper.