What happened to the v-safe free text entries?
A paper published by CDC and Oracle in Feb 2023 suggests no one has analyzed the free text entries yet
This is not part of the “Case for Vaccine Data Science” series, but it provides some background information for a future article.
I recently read a paper about v-safe, jointly authored by the CDC and Oracle’s v-safe team, which makes some very interesting points.
One of the complaints about the v-safe system is that it was not really designed to capture free form text descriptions from its participants (the way VAERS does), and you can read more about that on Aaron Siri’s substack.
Why did v-safe use only benign symptoms in their check-the-box fields?
The terminology they use in the paper is “solicited” reactions (or sometimes they refer to it as solicited symptoms).
Participants who report fever are also asked to record their highest temperature for that day (if known); for local reactions and solicited reactions (except fever), they are asked to characterize severity for each reported symptom.
I don’t think Aaron’s article is completely right that limiting the solicited symptoms to benign vaccine side effects was an intentional design choice, and you can see that by reading the paper. It looks like most design decisions were focused on enabling quick, frequent check-ins.
For 82,944,094 surveys completed March 7, 2021 – December 13, 2021 (the study timeframe during which v-safe was available in multiple languages), English was the most frequently utilized language by participants (99.50 % of all surveys), followed by Spanish (0.34 %) and Chinese (0.13 %). Irrespective of language selected for the health check-in survey, the median time for completion of surveys was less than one minute.
It looks like a lot of decisions were made to ensure higher completion rates, and two of those decisions were
a) to use well-known but benign symptoms, at the cost of missing more severe ones, in order to keep the survey simple
b) to optimize for speed of survey completion, which pushes the design towards check-the-box input fields rather than free text fields
This is a good example of being so metric-focused that you end up creating a poor product, but that is not nefarious by itself; it does not indicate intentional crippling of the v-safe system. On top of that, you can see that the Oracle team put in quite a lot of effort to get quick, frequent responses from participants instead of long, infrequent ones, which tend to cause a lot of dropout (and, if you are being objective, you will notice that they did succeed on that front). So this was probably a good choice in terms of quantifiable metrics, but a poor choice in terms of system design.
In fact, there is a phrase in text analytics, “quantified qualitative”, which refers to the process of converting free form text into structured data that can be used for analysis. The phrase exists precisely because it is well known that qualitative data adds a lot of friction to quantitative analysis.
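To make that concrete, here is a minimal sketch of what quantifying free text might look like. The symptom vocabulary and the sample entries are invented for illustration (this is not v-safe data or anyone’s actual pipeline); the point is that free text gets reduced to the same kind of 0/1 flags a check-the-box survey produces directly.

```python
import re

# Toy symptom vocabulary; a real pipeline would use a curated dictionary
# (e.g. MedDRA terms) rather than this illustrative list.
SYMPTOMS = ["headache", "fever", "chills", "fatigue", "nausea"]

def quantify(entry: str) -> dict:
    """Turn one free text entry into a row of 0/1 symptom flags."""
    text = entry.lower()
    return {s: int(bool(re.search(rf"\b{s}\b", text))) for s in SYMPTOMS}

# Made-up example entries, purely for illustration.
entries = [
    "Mild headache and chills the next morning, better by evening.",
    "Felt fine, just some fatigue after my shift.",
]

for e in entries:
    print(quantify(e))
# {'headache': 1, 'fever': 0, 'chills': 1, 'fatigue': 0, 'nausea': 0}
# {'headache': 0, 'fever': 0, 'chills': 0, 'fatigue': 1, 'nausea': 0}
```

Anything that does not match the vocabulary is simply lost, which is exactly the friction described above.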
When does a v-safe issue get reported to VAERS?
The paper also describes a process for severe symptoms, so it is not quite accurate to suggest that v-safe ignored them.
They call it “outreach” (emphasis mine):
All surveys ask participants to indicate whether any of the symptoms or health conditions they reported caused a health impact (defined as being unable to work or attend school, being unable to do normal daily activities, or seeking care from a doctor or other healthcare professional). Any combination of these health impacts can be selected. Participants who indicate they sought medical care are asked to identify the source of care received (virtual, outpatient or urgent care, emergency department, or hospitalization) and receive telephone outreach from a live agent at a v-safe call center to encourage them to complete a report to VAERS; the call center agents offer to complete the report during the phone call. In cases where the participant does not wish to complete the report during the phone call but indicates willingness to report on their own, the agents provide information so that the participant can complete the report at a later time. No clinical guidance is provided during the follow-up call; participants who have specific questions about how a symptom should be treated are advised to seek guidance from their doctor or healthcare provider.
So there was a process in place to help people who experienced severe and/or unsolicited symptoms.
And the paper also mentions that this process led to the filing of over 21,000 VAERS reports by December 2021.
As of December 13, 2021, over 21,000 VAERS reports had been completed during outreach from the v-safe call center; reports to VAERS from v-safe participants are not summarized further here, as they are incorporated into analyses of all reports received to VAERS after COVID-19 vaccine.
I think most or all of these are actually in VAERS, with the SPLTTYPE field marked as ‘vsafe’.
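If you want to check this yourself, something like the sketch below should work against the downloadable VAERS CSV files. The file name and the exact ‘vsafe’ string are my assumptions; SPLTTYPE is the manufacturer/project report number column in the standard VAERS data files.

```python
import pandas as pd

# Assumes the standard VAERS data download for 2021, e.g. 2021VAERSDATA.csv.
df = pd.read_csv("2021VAERSDATA.csv", encoding="latin-1", low_memory=False)

# Keep rows whose SPLTTYPE mentions v-safe (case-insensitive, NaN-safe).
vsafe_reports = df[df["SPLTTYPE"].str.contains("vsafe", case=False, na=False)]

print(len(vsafe_reports), "reports appear to originate from v-safe outreach")
print(vsafe_reports[["VAERS_ID", "RECVDATE", "SPLTTYPE"]].head())
```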
Oracle has not analyzed the free form text
The paper also mentions that no one has analyzed the free form text input yet (emphasis mine):
The free text fields included in each health survey have posed some challenges analytically. Surveys completed on days 0–7 after vaccination ask participants to record any other symptoms or health events beyond the solicited local and systemic reactions captured in the first week, while surveys completed at day 14 and later ask participants to record any new symptoms or health events. Early reviews of the text fields indicated that while most participants used these fields as intended, some included content unrelated to the capture of symptoms or health events.
While the Oracle team makes it sound as if content “unrelated to the capture of symptoms or health events” is somehow a bad thing, I would now bet it was mostly just a lot of material similar to what you see in VAERS write-ups. It is likely a lot of useful information; we just don’t yet have the tools to analyze it.
And it continues:
Approximately 5 % of health surveys overall contain free text entries but the number of surveys completed to date precludes individual review of each entry. We are currently utilizing natural language processing techniques to evaluate patterns of responses in these fields and planning to explore the utility of coding of symptoms reported in free text fields to the Medical Dictionary for Regulatory Activities (MedDRA) terms.
Elsewhere in the paper, they also state that over 121 million health surveys were completed, which means there were around 6 million free text responses (roughly 5% of 121 million). While that is a lot of text to analyze, it is not exactly beyond the realm of data science analysis in 2023, especially if you have a dedicated team looking into this (and I am certain Oracle has one).
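For a sense of what a first pass over a few million short entries could look like, here is a minimal sketch using scikit-learn. This is not Oracle’s pipeline and the example entries are invented, but TF-IDF plus MiniBatchKMeans is a standard, laptop-scale way to surface patterns of responses before attempting anything like MedDRA coding.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import MiniBatchKMeans

# Stand-in for the ~6 million free text entries; invented, illustrative strings.
entries = [
    "sharp pain in left arm for two days",
    "no new symptoms since last check-in",
    "question about scheduling my second dose",
]

# Vectorize the text; TF-IDF scales comfortably to millions of short documents.
vectorizer = TfidfVectorizer(stop_words="english", max_features=50_000)
X = vectorizer.fit_transform(entries)

# Cluster the entries; MiniBatchKMeans processes the matrix in small batches.
kmeans = MiniBatchKMeans(n_clusters=2, random_state=0, n_init=3).fit(X)

# Inspect the top terms per cluster to see what kinds of entries dominate.
terms = vectorizer.get_feature_names_out()
for i, centroid in enumerate(kmeans.cluster_centers_):
    top = centroid.argsort()[::-1][:5]
    print(f"cluster {i}:", ", ".join(terms[j] for j in top))
```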
But you can see that they are still doing NLP (Natural Language Processing) analysis on the free form text, which is more or less an admission that every publication on v-safe to date is based on the sanitized, check-the-box version of the data.