How to design a good search interface for VAERS
A proof-of-concept demonstrating the benefits of faceted search
Summary:
I have built a proof-of-concept for a VAERS search interface using Algolia. There is a chance they might pull it down because they don’t like the content, so it is a good idea to try it soon if you want to see it in action :-)
Individual search results show summary info as the title, the MedAlerts URL as the subheading, and the first few sentences of the writeup as the result snippet
You can search within the text of the writeup and it is very fast
You can filter or search on the following fields
Age
Outcome
Days to Symptom onset
Country
Vaccine Lot Number
Symptom
Whether reporter was a healthcare professional
Vaccination date and symptom onset date
I have built a proof-of-concept[1] based on the last fully published foreign VAERS dataset from Nov 2022.
I decided to build this for two reasons:
I don’t think most VAERS analysts fully understand how far search UIs have evolved in the past 10 years or so
I will use this concept of faceted search[2] for another, more important dataset, which I will discuss in my next post, provided Algolia does not pull it down
As a reminder, this was just before the CDC stopped publishing writeups for EU VAERS reports, citing “privacy” reasons.
I used the excellent search product Algolia for building this, but I am not sure if they will allow me to keep this up and running[3]. Go and try it out soon in case they shut it down!
The tech community is at best ignoring vaccine injuries, and at worst outright censoring all information about them.
I don’t know Algolia’s own stance on this topic, and I hope they are not censorious.
Included reports
I included only serious injury reports, but I also added RECOVD != ’Y’ (not recovered) into the list because
a) I am not sure how serious those reports are
b) it allows us to search for followup reports more easily
According to Algolia’s index, there are over 467K such reports.
And as I will show later, including the ‘unrecovered’ reports also provides a way to search for followup reports.
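As a rough sketch of this inclusion rule (not the exact script behind the demo), the filter could look like the following. The choice of flag columns that count as "serious" is an assumption made for illustration, and the sample rows are made up.

```python
import csv
import io

# Assumption: these VAERS flag columns mark a report as "serious".
SERIOUS_FLAGS = ("DIED", "L_THREAT", "HOSPITAL", "DISABLE")


def include_report(row):
    """Return True if the report is serious or is not marked as recovered."""
    is_serious = any((row.get(flag) or "").strip() == "Y" for flag in SERIOUS_FLAGS)
    # RECOVD != 'Y' also matches blank and 'N' values, not only explicit 'N'.
    not_recovered = (row.get("RECOVD") or "").strip() != "Y"
    return is_serious or not_recovered


# Made-up sample rows with only the columns this sketch needs.
SAMPLE = """VAERS_ID,DIED,L_THREAT,HOSPITAL,DISABLE,RECOVD
1,,,Y,,Y
2,,,,,Y
3,,,,,N
4,,,,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
kept_ids = [row["VAERS_ID"] for row in rows if include_report(row)]
print(kept_ids)  # ['1', '3', '4']
```

Report 2 is the only one dropped: it is not serious and is marked as recovered.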
Search for text inside the writeup
You can search for keywords within the SYMPTOM_TEXT (that is, the writeup) of the VAERS reports.
Since Algolia has a built-in limit on the total number of characters within a single record (10K), I had to trim some of the writeups to fit within that limit. There were only about 150 such reports, so you can search across the entire writeup for almost all of the reports.
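A minimal sketch of that trimming step, assuming the limit is measured on the JSON-serialized record and using the 10K figure quoted above. The function name `trim_writeup` and the `objectID` sample values are hypothetical; check the real limit for your own plan.

```python
import json

MAX_RECORD_CHARS = 10_000  # figure quoted in the post; treat as an assumption


def trim_writeup(record, text_key="SYMPTOM_TEXT", limit=MAX_RECORD_CHARS):
    """Shorten record[text_key] until the JSON-serialized record fits the limit."""
    trimmed = dict(record)
    overflow = len(json.dumps(trimmed)) - limit
    while overflow > 0 and trimmed[text_key]:
        # Drop at least `overflow` characters from the end of the writeup.
        trimmed[text_key] = trimmed[text_key][:-overflow]
        overflow = len(json.dumps(trimmed)) - limit
    return trimmed


long_record = {"objectID": "1", "SYMPTOM_TEXT": "x" * 20_000}
short_record = {"objectID": "2", "SYMPTOM_TEXT": "short writeup"}

fitted = trim_writeup(long_record)
untouched = trim_writeup(short_record)
```

Only the tail of the writeup is cut, so the other fields used for the facets stay intact.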
Notice that Algolia’s search is instant, and that is one of the big selling points of their search service. Algolia will also highlight search results where it makes sense.
Individual search results
There are three parts to the search result - the title, the subheading and the result snippet.
The title of the search result is Age | Sex | Country
As you can see, the title in the search results contains Age, Sex and Country separated by the pipe (|) symbol, so you can see this information at a glance without reading the full report.
If the Age is -1 it means the age was not provided and I was not able to infer it from the writeup using my Python script either.
The subheading is the URL on MedAlerts Wayback Machine
I use Algolia’s built-in user interface demo, so I just populated the subheading of the search result with the corresponding URL in MedAlerts Wayback Machine.
You can copy-paste this URL into another browser tab (it is not clickable) and see the full report.
The result snippet is the first few lines from the writeup
The result snippet is the first few lines from the SYMPTOM_TEXT field. While the full report is not visible in the search results, the tool does search across the full VAERS report writeup when displaying results.
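The three parts of a search result can be assembled along these lines. This is only a sketch: the field names `title`, `subheading` and `snippet`, the sentence-splitting rule, and the placeholder URL are all assumptions made for illustration.

```python
import re


def build_display_fields(age, sex, country, url, writeup, max_sentences=3):
    """Build the title, subheading and snippet shown for one search result."""
    title = f"{age} | {sex} | {country}"
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", writeup.strip())
    snippet = " ".join(sentences[:max_sentences])
    return {"title": title, "subheading": url, "snippet": snippet}


fields = build_display_fields(
    age=-1,
    sex="F",
    country="France",
    url="https://example.invalid/report",  # placeholder, not a real MedAlerts URL
    writeup="First sentence. Second sentence. Third sentence. Fourth sentence.",
)
print(fields["title"])  # -1 | F | France
```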
Filter and Search by Age
A lot of foreign VAERS reports are missing the AGE_YRS in the CSV file, but you can infer the age by writing some code to parse the text writeup.
If my Python script cannot infer the age, I just report it as -1.
You can also filter within the facet to search for a specific age.
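The age inference described above can be approximated with a regular expression. The sketch below is not the actual script; the pattern only covers phrasings such as "45-year-old" or "7 years old", which is an assumption about how the writeups state the age.

```python
import re

# Assumed phrasing: "<number> year(s) old", with spaces or hyphens in between.
AGE_PATTERN = re.compile(r"\b(\d{1,3})[\s-]*(?:years?|yrs?)[\s-]*old\b", re.IGNORECASE)


def infer_age(age_yrs, writeup):
    """Use AGE_YRS when present, else parse the writeup, else return -1."""
    if age_yrs and age_yrs.strip():
        return int(float(age_yrs))
    match = AGE_PATTERN.search(writeup or "")
    if match:
        return int(match.group(1))
    return -1


print(infer_age("", "A 45-year-old female patient received the vaccine."))  # 45
print(infer_age("", "Age was not reported."))  # -1
```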
Filter by Days to Symptom Onset
I also report the days to symptom onset, and you can filter and search on it. Note that when the exact value is not available (for example, when the symptom onset date is unknown), I sometimes calculate an upper bound on the number of days to symptom onset instead.
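One way to compute this value, including an upper bound when the onset date is missing, is sketched below. Using the date the report was received (RECVDATE) as the latest possible onset is an assumption for illustration, not necessarily the rule used in the demo.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # date layout used in the VAERS CSV files


def parse_date(value):
    """Parse a VAERS date string, returning None when it is blank or invalid."""
    try:
        return datetime.strptime(value.strip(), DATE_FORMAT).date()
    except (ValueError, AttributeError):
        return None


def days_to_onset(vax_date, onset_date, received_date):
    """Return (days, is_upper_bound), or (None, False) if nothing can be computed."""
    vax = parse_date(vax_date)
    if vax is None:
        return None, False
    onset = parse_date(onset_date)
    if onset is not None:
        return (onset - vax).days, False
    # Assumption: onset cannot be later than the day the report was received.
    received = parse_date(received_date)
    if received is not None:
        return (received - vax).days, True
    return None, False


print(days_to_onset("01/10/2021", "01/13/2021", "02/01/2021"))  # (3, False)
print(days_to_onset("01/10/2021", "", "02/01/2021"))  # (22, True)
```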
Filter by Outcome
You can filter by outcome. For example, to see only reports where the Outcome is Death, simply click the corresponding checkbox.
There were 16000 reports of deaths in foreign VAERS by November 2022.
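Behind the checkboxes, Algolia's search API accepts a `facetFilters` parameter in which values inside a nested list are OR'd together and the top-level entries are AND'd. A small sketch of building that structure from the ticked checkboxes follows; the attribute names `Outcome` and `Country` are hypothetical and may not match the names in the demo index.

```python
def build_facet_filters(selected):
    """Turn {attribute: [ticked values]} into Algolia's facetFilters structure.

    Values of the same attribute are OR'd (inner list); different
    attributes are AND'd (outer list). Attributes with no ticked
    values are skipped.
    """
    return [
        [f"{attribute}:{value}" for value in values]
        for attribute, values in selected.items()
        if values
    ]


facet_filters = build_facet_filters(
    {"Outcome": ["Death"], "Country": ["France", "Germany"], "Age": []}
)
print(facet_filters)
# [['Outcome:Death'], ['Country:France', 'Country:Germany']]
```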
Filter by country
You can also filter by country name. I parse the first two characters of the SPLTTYPE field and map them to the appropriate country name. If this information is not provided, I report the country name as ‘Unknown’.
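The country lookup can be sketched as a simple prefix mapping. The table below is deliberately partial, and the SPLTTYPE value in the example is made up; a real mapping would cover every two-letter country code that appears in the data.

```python
# Partial mapping for illustration only.
COUNTRY_NAMES = {
    "FR": "France",
    "DE": "Germany",
    "GB": "United Kingdom",
    "IT": "Italy",
}


def country_from_splttype(splttype):
    """Map the first two characters of SPLTTYPE to a country name."""
    code = (splttype or "").strip()[:2].upper()
    # Missing values and codes not in the table both fall back to 'Unknown'.
    return COUNTRY_NAMES.get(code, "Unknown")


print(country_from_splttype("FRPFIZER INC2021123456"))  # France
print(country_from_splttype(""))  # Unknown
```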
Filter by vaccine lot number
You can also filter by vaccine lot number.
On top of that, you can also specify the first two characters and see if there are some misspelled lot numbers.
For example, knowing that the lot number has to be two characters followed by four digits, you might be able to infer that the highlighted lot numbers quite likely refer to the same lot.
I know some folks who have done a lot of analysis in trying to autocorrect these lot numbers, and this provides another way to deal with the same issue[4].
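To make the prefix idea concrete, here is a sketch that takes all lot numbers starting with a given two-character prefix, groups the ones that match the "two characters followed by four digits" shape, and sets aside the rest for a human to look at. The lot numbers are made up, and the normalization rule (uppercase, strip everything except letters and digits) is an assumption.

```python
import re
from collections import defaultdict

# Expected shape from the post: two characters followed by four digits.
LOT_PATTERN = re.compile(r"^[A-Z0-9]{2}\d{4}$")


def normalize_lot(raw):
    """Uppercase the lot number and drop spaces, hyphens and other punctuation."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())


def group_by_prefix(raw_lots, prefix):
    """Split lots with the given prefix into well-formed groups and suspects."""
    well_formed = defaultdict(list)
    suspect = []
    for raw in raw_lots:
        lot = normalize_lot(raw)
        if not lot.startswith(prefix.upper()):
            continue
        if LOT_PATTERN.match(lot):
            well_formed[lot].append(raw)
        else:
            suspect.append(raw)  # left for manual review
    return dict(well_formed), suspect


lots = ["AB1234", "ab 1234", "AB-1234", "AB12345", "AB123A", "CD5678"]
grouped, suspects = group_by_prefix(lots, "AB")
print(grouped)   # {'AB1234': ['AB1234', 'ab 1234', 'AB-1234']}
print(suspects)  # ['AB12345', 'AB123A']
```

The suspects are not corrected automatically; they are only surfaced so that a person can decide whether they refer to the same lot.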
Filter by Symptom
This field just uses the symptoms reported in the SYMPTOMS CSV file for a given VAERS report.
You can search within this list to find specific symptoms you are interested in.
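A sketch of collecting the symptom list per report is shown below. It assumes the usual layout of the SYMPTOMS CSV, where each row carries a VAERS_ID plus columns SYMPTOM1 to SYMPTOM5 and a report with more than five symptoms spans several rows. The sample rows are made up and omit the version columns.

```python
import csv
import io
from collections import defaultdict


def collect_symptoms(rows):
    """Return {VAERS_ID: [symptom, ...]} with duplicates removed, order kept."""
    symptoms = defaultdict(list)
    for row in rows:
        for i in range(1, 6):
            name = (row.get(f"SYMPTOM{i}") or "").strip()
            if name and name not in symptoms[row["VAERS_ID"]]:
                symptoms[row["VAERS_ID"]].append(name)
    return dict(symptoms)


SAMPLE = """VAERS_ID,SYMPTOM1,SYMPTOM2,SYMPTOM3,SYMPTOM4,SYMPTOM5
100,Headache,Pyrexia,Fatigue,Nausea,Myalgia
100,Dizziness,,,,
200,Pyrexia,,,,
"""

by_report = collect_symptoms(csv.DictReader(io.StringIO(SAMPLE)))
print(by_report["100"])
# ['Headache', 'Pyrexia', 'Fatigue', 'Nausea', 'Myalgia', 'Dizziness']
```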
Filter by Reporter (Healthcare Professional or not)
You can filter by the status of the reporter: was the report filed by a healthcare professional or by a non-healthcare professional (usually the consumer)? Sometimes it is not possible to infer this from the VAERS writeup, in which case I leave it as ‘Unknown’.
Reports filed by healthcare professionals tend to be much higher quality. In fact, over 90% of foreign VAERS reports are presented in third person (so it is unlikely to be the consumer), and over 60% are actually filed by HCPs.
This is in stark contrast to the claim that “anyone” can file a VAERS report. This certainly does not seem to be true for foreign VAERS reports.
Filter by vaccination date and symptom onset date
Both of these are represented in YYYYMMDD format, and unfortunately the search within the facet is not that great, so my suggestion would be to filter on other facets first and reduce the size of this list, and then choose the date[5].
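For reference, converting the MM/DD/YYYY dates found in the VAERS CSV files into YYYYMMDD values can be sketched as follows. Returning 'Unknown' for blank or invalid dates is an assumption, chosen to mirror how other missing values are labelled in this post.

```python
from datetime import datetime


def to_yyyymmdd(vaers_date):
    """Convert 'MM/DD/YYYY' to 'YYYYMMDD', or 'Unknown' if it cannot be parsed."""
    try:
        parsed = datetime.strptime((vaers_date or "").strip(), "%m/%d/%Y")
    except ValueError:
        return "Unknown"
    return parsed.strftime("%Y%m%d")


facet_values = [to_yyyymmdd(d) for d in ["12/01/2021", "03/07/2021", ""]]
print(facet_values)  # ['20211201', '20210307', 'Unknown']
```

One benefit of this format is that plain string sorting of the values is also chronological order.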
[1] I built a proof-of-concept based on old data for two reasons: I don’t have the bandwidth to maintain a tool which is based on real-time data, and the snapshot I chose represents “peak VAERS insight” IMO, given that immediately after that the CDC stopped publishing a large chunk of previously available information.
[2] You have in fact used “faceted search” many times already if you have used online stores like Amazon, and OpenVAERS is also an example of faceted search. Algolia simply provides the best product for creating faceted search on a custom dataset, plus it provides very low latency results.
[3] A quick background on how I got free access to Algolia’s paid search product. In mid-2020, they gave free access to their search tools provided you were building something COVID-related. I built a decent search engine for COVID19 literature based on the CORD19 dataset, but abandoned the project after a while because I realized that the “RIGHT THING TO DO” is to just passively wait for the vaccine, or die in case you caught COVID19 before that. I am being sarcastic of course, but I did not realize back then that there were a lot of for-profit entities trying their best to make sure people did not pay any attention to those who were volunteering their time in trying to help find solutions. In any case, I wiped all the papers from my old search tool (to make space) and used it to create this tool. Technically, this new search tool is also still related to COVID19, but I am not sure if Algolia agrees with me :-)
[4] In addition, it also highlights the importance of searchable facets for vaccine lot analysis. Sometimes the best way to do these things is for the human to take a glance at the end result. As powerful as software is, it is still pretty hard to beat human intuition.
[5] Unfortunately, the out-of-the-box user interface demo from Algolia does not provide good user controls for filtering on dates.