At first glance, the LAB_DATA field in the VAERS CSV file looks impossible to parse into a machine-readable format.
But you can use the concept of a state machine to parse the LAB_DATA field.
The basic idea is to remember the last seen “test name” and “test date”, and use them as the name and date for each test value that follows.
The code below is a very crude implementation of the state machine: crude in the sense that it will break under unexpected scenarios. A full state machine would usually handle such errors robustly, but implementing that is overkill for something like VAERS.
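def parse_lab_data(lab_data: str, vaers_url, death_reason):
    # LAB_DATA is a ';'-separated list of "key: value" items
    list_items = [l.strip() for l in lab_data.split(';')]
    tests = []
    # State: the most recently seen test date and test name
    curr_date = ''
    curr_name = ''
    for list_item in list_items:
        if list_item.lower().startswith('test name'):
            # Raises IndexError if the item has no ':' (part of the crudeness)
            parts = list_item.split(':')
            curr_name = parts[1].strip()
        elif list_item.lower().startswith('test date'):
            parts = list_item.split(':')
            curr_date = parts[1].strip()
        else:
            # Anything else is treated as a value belonging to the current test
            parts = [l.strip() for l in list_item.split(':')]
            part_name = ''
            part_val = ''
            if len(parts) > 3:
                # More colons than expected: flagged, and the row is stored with empty fields
                print('Came here')
            else:
                if parts[0].lower() == 'Result Unstructured Data'.lower():
                    # Expected shape: "Result Unstructured Data: Test Result: <value>"
                    if parts[1].lower() == 'Test Result'.lower():
                        part_name = 'Test Result'
                        part_val = parts[2].strip()
                else:
                    if len(parts) > 1:
                        # Generic "metric: value" item, e.g. "Comments: ..."
                        part_name = parts[0]
                        part_val = parts[1]
                    else:
                        print('Error')
            tests.append((vaers_url, death_reason, curr_date, curr_name, part_name, part_val))
    return tests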
What we are doing here is parsing the lab data text the way a human would: keep track of the last seen “test name” and “test date” and use them as the basis for populating all the fields.
Since these values are sometimes missing or in the wrong order, this kind of parsing is not a robust state machine (as I mentioned before), but it does the trick for a good 90+% of the VAERS reports.
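For readers who want something a little more forgiving, here is a minimal sketch of the same idea that does not raise on items with a missing colon. It is not the code used to build the table in this article: the function name parse_lab_data_defensive, its return shape, and the sample string are all made up for illustration, and the sample only imitates the “Test Date / Test Name / Result Unstructured Data” layout that the code above expects.

```python
from typing import List, Tuple


def parse_lab_data_defensive(lab_data: str) -> List[Tuple[str, str, str, str]]:
    """Return (test_date, test_name, metric, value) rows from a LAB_DATA string.

    Hypothetical helper: same "remember the last name and date" idea as above,
    but items without a colon are kept as raw text instead of raising.
    """
    rows = []
    curr_date = ''
    curr_name = ''
    for item in lab_data.split(';'):
        item = item.strip()
        if not item:
            continue  # skip empty fragments such as a trailing ';'
        # Split on the first colon only, so values that contain ':' survive.
        key, sep, rest = item.partition(':')
        key_lower = key.strip().lower()
        rest = rest.strip()
        if key_lower == 'test name':
            curr_name = rest
        elif key_lower == 'test date':
            curr_date = rest
        elif key_lower == 'result unstructured data':
            # Usually "Result Unstructured Data: Test Result: <value>"
            metric, sep2, value = rest.partition(':')
            if sep2:
                rows.append((curr_date, curr_name, metric.strip(), value.strip()))
            else:
                rows.append((curr_date, curr_name, 'Result Unstructured Data', rest))
        elif sep:
            # Generic "metric: value" item
            rows.append((curr_date, curr_name, key.strip(), rest))
        else:
            # No colon at all: keep the raw text rather than dropping it.
            rows.append((curr_date, curr_name, '', item))
    return rows


# Made-up sample input, not taken from a real report.
sample = ("Test Date: 20210615; Test Name: D-dimer; "
          "Result Unstructured Data: Test Result: 14; "
          "Test Name: Platelet count; "
          "Result Unstructured Data: Test Result: 121")
rows = parse_lab_data_defensive(sample)
for row in rows:
    print(row)
```

Note that the second test in the sample has no date of its own, so it inherits the last seen date. That is exactly the “last seen” behaviour described above, and it is also where a wrong date can sneak in when a report lists items in an unexpected order.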
Getting the lab data for death reports
I started with all the foreign reports where DIED=’Y’.
Over 90% of them provide a reason for death. The reason is usually provided at the end, and almost always has the same prefix.
Here is an example:
Gastrointestinal hemorrhage; pulmonary oedema; hemoglobin decreased; fibrin d dimer increased; pleural effusion; Deep vein thrombosis; pulmonary embolism; thrombocytopenia; Initial information received on 14-Jan-2022 regarding an unsolicited valid serious case received from non-healthcare professional via Regulatory Authorities under reference: AU-TGA-0000664116 . This case involves a 76 years old female patient who experienced deep vein thrombosis, pleural effusion, fibrin d dimer increased, pulmonary embolism, gastrointestinal hemorrhage, hemoglobin decreased and pulmonary oedema while receiving vaccine influenza quadrival a-b vaccine [FLUQUADRI] and while treated with tozinameran [COMIRNATY]. As medical history it was reported, patient had CT ( computerized tomography ) abdomen 12/03: large bulky confluent lymph node masses both above and below diaphragm in the scanned lower segments of the lungs; possibility of lymphoma; PMHx: CLL (Chronic lymphocytic leukemia) (mesenteric lymphadenopathy; nil treatment; last haem review 2015); lived in nursing home; Alzheimer's dementia (concurrent condition); chronic gastric ulcer The patient's past medical treatment(s), vaccination(s) and family history were not provided. On 01-Apr-2021, the patient started taking 1st dose of COMIRNATY (tozinameran) Unknown dosage, route, strength, formulation, batch number, expiry date for not reported indication and on 22-Apr-2021 patient took its second dose. On 10-Jun-2021, the patient received a dose of suspect INFLUENZA QUADRIVAL A-B VACCINE Suspension for injection (lot number and expiry date not reported) via unknown route in unknown administration site for not reported indication. 
Patient presented to GP (general physician) around 15/06 with a 2/52 history of right lower limb swelling (onset approximately 01/06 which was 40 days post Pfizer dose 2); GP ordered US (ultrasound) showing extensive right lower limb DVT (deep vein thrombosis); patient was referred into hospital that day; Right lower limb US 15/06: extensive occlusive thrombus in the femoral, poplietal, peroneal and posterior tibial veins; D-dimer: 14 (15/06) PLT 121 (15/06) CTPA (computerized tomography pulmonary angiogram) 15/06: extensive pulmonary emboli (pulmonary embolism); some signs of right heart strain; background changes of lymphoproliferative disorder; free fluid upper abdomen; moderate right pleural effusion; patient commenced anticoagulation; patient found to have significantly elevated d-dimer (fibrin d dimer increased) and low platelets (thrombocytopenia) (all events till here, onset: 15-Jun-2021, latency: 5 days following the administration of INFLUENZA QUADRIVAL A-B VACCINE and 2 months 14 days following the first dose intake and 1 month 23 days following the last dose intake of TOZINAMERAN) so VITT considered (treating team mistakenly thought patient had AZ); VITT (vaccine-induced immune thrombotic thrombocytopenia) ELISA (enzyme-linked immunosorbent assay) done and negative; it's notable patient's DVT and PE occurred in the context of lymphoproliferative disorder (likely malignancy related); discharged home on apixaban on 17/06; re-presented to hospital 01/07 with low haemoglobin (haemoglobin decreased); GI bleed (gastrointestinal haemorrhage) considered likely; patient deteriorated in hospital, developing acute pulmonary oedema (pulmonary oedema) (onset: 01-Jul-2021, latency: 22 days following the administration of INFLUENZA QUADRIVAL A-B VACCINE, 3 months following the first dose intake and 2 months 9 days following the last dose intake of TOZINAMERAN); decision made to pursue palliative care - patient passed away 08/07 (08-Jul-2021). 
Action taken with quadrivalent influenza vaccine (FLUQUADRI) and tozinameran (COMIRNATY) was not applicable. The patient was treated with APIXABAN and anticoagulation for Deep vein thrombosis, not specified for other events. At time of reporting, the outcome was Fatal for the event deep vein thrombosis, was Fatal for the event pleural effusion, was Fatal for the event gastrointestinal hemorrhage, was Fatal for the event fibrin d dimer increased, was Fatal for the event hemoglobin decreased, was Fatal for the event pulmonary embolism, was Fatal for the event pulmonary oedema and was Fatal for the event thrombocytopenia. It is unknown if an autopsy was done. The cause of death was reported as Thrombocytopenia. Seriousness criteria: fatal (death), caused hospitalization for all events additionally medically significant for deep vein thrombosis, pulmonary embolism, thrombocytopenia, Gastrointestinal haemorrhage, pulmonary oedema.; Sender's Comments: Sanofi company comment dated 20-Jan-2022: This case involves a 76 years old female nursing home patient who died due to pulmonary embolism, 5 days post FluQuadri vaccination. Patient experienced deep vein thrombosis, pleural effusion, pulmonary oedema and gastrointestinal hemorrhage, Patient laboratory test showed fibrin d dimer increased and hemoglobin decreased. Patient had significant medical history of Lymphoma, Alzheimer's disease and gastric ulcers. Time to onset of the event was compatible with vaccination, however, further information such as previous vaccination and allergic history, concomitant medications are required for more comprehensive assessment. Also, the patient's history of possible lymphoma could be a confounder for the events. Based upon the reported information, the role of the individual suspected vaccine cannot be assessed.; Reported Cause(s) of Death: Thrombocytopenia
As you can see, the sentence “Reported Cause(s) of Death: Thrombocytopenia” is the one we are looking for.
I did a simple string match to find the prefix “Reported Cause(s) of Death:”, and then split on the “:” character to get the rest of the sentence, which provides the reason for death.
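The prefix match can be sketched roughly as follows. The function name extract_death_reason and the parameter name report_text are assumptions for illustration; the only thing taken from the reports is the prefix itself. The sketch searches from the end of the text, since the reason is usually the last thing in the narrative.

```python
from typing import Optional

DEATH_PREFIX = 'Reported Cause(s) of Death:'


def extract_death_reason(report_text: str) -> Optional[str]:
    """Return the text after the last death-reason prefix, or None if absent."""
    idx = report_text.rfind(DEATH_PREFIX)
    if idx == -1:
        return None  # the report does not use the usual prefix
    # Split once on ':' so the remainder of the sentence is kept whole.
    return report_text[idx:].split(':', 1)[1].strip()


# Tail of the example report shown above.
tail = ("the role of the individual suspected vaccine cannot be assessed.; "
        "Reported Cause(s) of Death: Thrombocytopenia")
print(extract_death_reason(tail))
```

Reports that do not use the prefix return None, so they are easy to count or inspect separately.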
And I created a table with the following columns:
VAERS_URL: URL to the MedAlerts Wayback Machine for those who wish to verify the report
DEATH_REASON: The death reason parsed using the technique above
TEST_NAME: Name of the test
TEST_DATE: Date of the test where available
TEST_METRIC: This is usually “Test Result” or “Comments”
TEST_VALUE: The actual value for whatever is being tested
Suppose you want to find all the tests associated with a particular death reason. You can filter on the DEATH_REASON column.
Deduping
There is scope for some deduplication: different death reasons that could be normalized to the same value, and different test names that could be normalized to the same value.
I don’t have enough background in medicine or biology to do this task, and I don’t want to do it manually. If any reader has any insight into how this task can be automated, please let me know in the comments.
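One cheap first pass that needs no medical knowledge is to group labels that differ only in case, punctuation or spacing. The sketch below is an assumption about what might help, not something that was run on this dataset, and the names normalize_label and group_labels are made up. It will not merge true synonyms or spelling variants such as “haemorrhage” and “hemorrhage”; that part still needs a medical vocabulary or a human.

```python
import re
from collections import defaultdict


def normalize_label(label):
    """Lowercase, replace anything that is not a-z or 0-9 with a space, collapse spaces.

    Caveat: accented or other non-ASCII letters are also replaced by spaces.
    """
    text = label.lower()
    text = re.sub(r'[^a-z0-9]+', ' ', text)
    return ' '.join(text.split())


def group_labels(labels):
    """Map each normalized key to the set of original labels that share it."""
    groups = defaultdict(set)
    for label in labels:
        groups[normalize_label(label)].add(label)
    return dict(groups)


# Made-up labels for illustration.
labels = ['Cardiac arrest', 'cardiac  arrest', 'CARDIAC-ARREST', 'Thrombocytopenia']
groups = group_labels(labels)
for key, originals in groups.items():
    print(key, '<-', sorted(originals))
```

Any key that maps to more than one original label is a candidate for merging, which at least shrinks the list that has to be reviewed by hand.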
Test data patterns
Also, I would be interested to know if there is some way to analyze the lab data to identify patterns between Cause of Death → Test Results. A brute-force approach like “most commonly associated keywords for cardiac arrest” might be possible, but I would like to know if there are better or more nuanced approaches. Once again, if any reader has any insight into this, please let me know in the comments section.
Other patterns?
I would also like to know if there is something else which can be analyzed using the lab data. Note that the table that I have created for this article is only for foreign VAERS death reports.
In other words, there is plenty of scope for analyzing other stuff using this technique. But I am not quite sure what to analyze.
To give you an idea:
(Note: all of these numbers are for the most recent dataset I downloaded before the CDC data purge)
Total number of foreign VAERS reports for COVID19 vaccines = 563442
Total number of foreign VAERS reports for C19 where LAB_DATA is not null = 326964
Total number of foreign VAERS reports for C19 where DIED=’Y’ = 16864
Total number of foreign VAERS reports for C19 where DIED=’Y’ and LAB_DATA is not null (in case you are wondering) = 6784
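To put those counts in proportion, here is a small sketch that turns them into percentages. The numbers are the ones listed above; the dictionary keys and the pct helper are just names chosen for this example.

```python
# Counts copied from the list above.
counts = {
    'foreign_reports': 563442,
    'with_lab_data': 326964,
    'died': 16864,
    'died_with_lab_data': 6784,
}


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


lab_share = pct(counts['with_lab_data'], counts['foreign_reports'])
died_share = pct(counts['died'], counts['foreign_reports'])
died_lab_share = pct(counts['died_with_lab_data'], counts['died'])

print(f"{lab_share:.1f}% of foreign reports have lab data")
print(f"{died_share:.1f}% of foreign reports are death reports")
print(f"{died_lab_share:.1f}% of death reports have lab data")
```

So roughly 58% of all foreign reports carry lab data, but only about 40% of the death reports do.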
Given that we can extract the lab data using the method I have described, is there anything else which can be inferred from VAERS by doing an analysis of the lab data? If you have any suggestion please leave a comment below.
How to make test results quantifiable?
As you can see, numerical values are stored as strings in the approach I have used, making it hard to quantify them for filtering and sorting.
While you can technically use SQL (Zoho Analytics provides a feature called Query Tables) to parse and cast the string into a numerical value, I would like to know if there is a systematic way to create multiple CSV files grouped by data type of the test result (int, float, bool) to improve this process on the whole.
If yes, how should I go about it?
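One possible starting point, sketched under several assumptions: tag each TEST_VALUE string with a type bucket first, and only then decide which file or column it goes into. The function name classify_value, the regular expression, and the idea of treating “positive”/“negative” as booleans are illustrative choices, not something derived from the VAERS documentation.

```python
import re

# Leading number (optional sign, optional decimal part), then whatever follows.
NUMBER_RE = re.compile(r'^\s*([-+]?\d+(?:\.\d+)?)\s*(.*)$')

# Assumption: these words are treated as boolean results.
BOOL_WORDS = {'positive': True, 'negative': False, 'yes': True, 'no': False}


def classify_value(raw):
    """Return (bucket, parsed_value, trailing_text) for a TEST_VALUE string."""
    text = raw.strip()
    lowered = text.lower()
    if lowered in BOOL_WORDS:
        return ('bool', BOOL_WORDS[lowered], '')
    match = NUMBER_RE.match(text)
    if match:
        number, rest = match.group(1), match.group(2).strip()
        if '.' in number:
            return ('float', float(number), rest)
        return ('int', int(number), rest)
    return ('text', text, '')


# Made-up values for illustration.
for value in ['14', '121 x10^9/L', '7.5 g/dL', 'Negative', 'elevated']:
    print(repr(value), '->', classify_value(value))
```

The bucket can then be used as the key for writing separate CSV files, one per type. Be aware that this naive pattern will misread values such as “12/03” (a date) as the integer 12 with trailing text, and it ignores decimal commas, so the trailing text is worth keeping and checking.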
Download the CSV file
If you want to do your own analysis on the data, you can download the CSV file here.
Yes. Similar but in the same vein. The basic objective is reasonably matching all deleted reports to published reports. As you know, the two conditions under which VAERS can ethically “delete” reports after publication are fake/false reports and duplicate reports. Keep in mind that VAERS has up to 4-6 weeks to rigorously authenticate incoming submissions and request additional records if needed to adjudicate a report before publication. Even with that, they have still needed to delete 25,644 reports to date, including 680 death reports. I’ve manually spot-checked about 2K reports over time and could not find even 10% (~200 reports) that could be considered a match or even a reasonable match. A death deletion I refer to routinely is a ~31-year-old female who died in Maine back in ~June 2021. In the live data there is not even a 31-year-old female who has died in Maine at all. I’ve heard someone else tell me this type of matching is called a “de-duping” process. Most medical systems I’ve worked in have this type of feature built in, with an additional merging feature, as duplicate patient medical records are the bane of every doctor’s existence, it seems. Every doctor has about thirty different patients named Maria Hernandez, at least in my neck of the woods. Lol
I’m the only entity in the world that has a complete set of all deleted c19 jab reports (25,644). The next objective would be to determine, based on reasonable matches over many fields, which deleted reports have a reasonably matching counterpart in the live data. Which have absolutely no match and are also strong candidates for being genuine, sincere, honest reports, based on being submitted by a “contactable physician or HCW” and/or other criteria? I’m already in discussions with some VIPs who are very interested in this project and might even be willing to fund the project or at least pay the right people or person to get the job done. If you think you have the skill set for this project, please reach out. If not for any reason, no worries. You’ll see it on the news. God Bless
https://www.vaersaware.com/deleted-reports-2007-2022