I think these are sudden: https://imgur.com/gallery/8XQEK4F
Maybe these: https://www.vaersaware.com/3-hour-deaths
It's ~2100 sudden deaths, not ~350 (Aravind) or ~1300 (Eagle).
http://univaers.com/other/vaers_sudden_death_2023-06-30.csv
I'm using the flat file which retains the gigabyte of purged data and also notes text changes.
That is here:
https://deepdots.substack.com/p/new-vaers-flat-file-easy-data-mining
I'm also catching not only "sudden cardiac" or "sudden death" but also things like ... "suddenly died" and "died suddenly", plus the word "sudden" within 20 characters of "died" or "death", whether before or after those words.
You can see that here with the first 20 from 'text_columns':
https://regex101.com/r/nOGr9b/1
Your job now is to see how many of mine are false positives. For example, the one that says this in the 'text_columns' column (a concatenation of the text columns, excluding HISTORY) is a false positive:
"fear of death; suddenly rising blood"
It's 2,571 using the regex (?:sudden|unexpect) ... that is, "sudden" OR "unexpect" as the root, as in "unexpectedly died".
86% of them are foreign.
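For anyone who wants to try the proximity idea themselves, here is a minimal sketch in Python. It is a guess at the kind of pattern described above, not the actual regex behind the 2,571 count (that one is at the regex101 link). The names `PROXIMITY`, `flag_sudden_death` and `find_snippet` are made up for illustration.

```python
import re

# Assumption: "sudden" or "unexpect" as a root, with any word ending.
ROOT = r"(?:sudden|unexpect)\w*"
# Assumption: only these two words count as a death mention.
FATAL = r"(?:died|death)"

# Root within 20 characters of a death word, on either side.
PROXIMITY = re.compile(
    rf"\b{ROOT}.{{0,20}}?\b{FATAL}\b|\b{FATAL}\b.{{0,20}}?\b{ROOT}",
    re.IGNORECASE,
)

def flag_sudden_death(text: str) -> bool:
    """Return True if the text matches the proximity pattern."""
    return PROXIMITY.search(text) is not None

def find_snippet(text: str):
    """Return the matched span for manual review, or None if no match."""
    match = PROXIMITY.search(text)
    return match.group(0) if match else None
```

Note that this sketch also flags "fear of death; suddenly rising blood", which is exactly the kind of false positive mentioned above, so the matched snippets still need a manual pass.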
I don't know if this helps, but I think it was Peter McCullough who said there are many ways to say heart attack. I think he went over that in one of his videos. All of the terminology was correct, in that people in the medical field would understand it to mean heart attack. Since the terminology is not uniform, you will definitely miss a lot of people who died from heart attacks in your count.
Yes this is correct. We worked a long time on our lists for different things. Heart attack and stroke are some of the most difficult to query.
I need that list
I couldn't find the list I wanted from Dr. Peter McCullough, but try these terms: cardiovascular disease, heart attack, angina pectoris, myocardial infarction (MI), acute myocardial infarction, acute coronary syndrome (ACS), infarct, unstable angina, coronary artery disease, chest pains, cardiac arrest, coronary thrombosis, heart failure, cardiopulmonary arrest, ischaemic cardiomyopathy (IC), congestive heart failure, ischaemic heart disease, ischaemic heart failure, coronary infarction, coronary heart disease, coronary insufficiency, atherosclerosis, ischemic heart disease, angina, tachycardia, STEMI (ST-elevation myocardial infarction), aorta, myocarditis, pericarditis
https://www.bhf.org.uk/informationsupport/heart-matters-magazine/medical/medical-terms
https://columbiasurgery.org/heart/glossary
https://www.texasheart.org/heart-health/heart-information-center/topics/a-z/
https://www.webmd.com/heart-disease/heart-disease-glossary-terms
https://www.thesaurus.com/browse/heart%20attack
https://www.powerthesaurus.org/heart_attack/synonyms
https://www.wordhippo.com/what-is/another-word-for/heart_attack.html
https://thesaurus.plus/synonyms/heart_attack
https://www.dictionary.com/browse/heart-attack
https://www.powerthesaurus.org/heart_attacks
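A term list like the one above can be turned into a single search pattern. This is only a rough sketch using a handful of the terms, and it makes two assumptions of its own: phrases are matched case-insensitively, while short abbreviations (MI, ACS, STEMI) are matched case-sensitively to cut down on accidental hits. `build_term_regex` is a hypothetical helper name.

```python
import re

# A small subset of the terms listed above, for illustration only.
PHRASES = [
    "heart attack",
    "myocardial infarction",
    "acute myocardial infarction",
    "acute coronary syndrome",
    "cardiac arrest",
    "cardiopulmonary arrest",
    "coronary thrombosis",
]
ABBREVIATIONS = ["MI", "ACS", "STEMI"]

def build_term_regex(phrases, abbreviations):
    """Compile one pattern: phrases ignore case, abbreviations do not."""
    # Longest first, so longer phrases are tried before shorter ones.
    phrase_alts = "|".join(
        r"\s+".join(re.escape(word) for word in phrase.split())
        for phrase in sorted(phrases, key=len, reverse=True)
    )
    abbr_alts = "|".join(
        re.escape(a) for a in sorted(abbreviations, key=len, reverse=True)
    )
    # (?i:...) switches on case-insensitivity for the phrase group only.
    return re.compile(rf"\b(?:(?i:{phrase_alts})|(?:{abbr_alts}))\b")

HEART_PATTERN = build_term_regex(PHRASES, ABBREVIATIONS)

def mentions_heart_event(text: str) -> bool:
    """Return True if any listed term appears in the text."""
    return HEART_PATTERN.search(text) is not None
```

Broad terms from the list such as "aorta" or "tachycardia" would pull in many reports that are not heart attacks, so which terms go into `PHRASES` is a judgment call.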
Thanks
Helpful.
Sometimes I write this stuff down. Sometimes I don't. I will look for it. There can be like ten different ways or more to say heart attack. Again all legit ways. If I find it I will let you know.
Very clearly written. One question: does the VAERS foreign report issue mean that a 2023 search will fail to return foreign reports from 2021 or 2022? Or does it mean that those reports are still there, but that there have been no updates?
There isn't a simple answer to this question.
Do you mean a search on CDC Wonder? It is almost completely useless at this point. And it is not very easy either.
A search on OpenVAERS is a lot easier, but I can already see that they are heavily crippled by having to accurately reflect CDC Wonder data, and everything became much harder for them after the EU data purge.
MedAlerts does something interesting: it appears to combine data from the original report with some later updates (I did not fully study it, though), so it is probably the search engine that will yield the most information while also being up to date.
VAERSAware has a more dynamic dashboard-based search, but I don't think it is well suited for fast full-text search.
And my Algolia search demos are all based on out-of-date information, although they are only intended to be prototypes to help people see the patterns in the data.
If you know SQL, you can also get JetBrains DataGrip, load all the VAERS CSV files into it, and use something like PostgreSQL to query them.
There isn't one single search engine which can provide comprehensive search at the moment. You need to tie pieces of information together from multiple sources.
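As a rough idea of what the SQL route looks like, here is a sketch that swaps in Python's built-in sqlite3 for PostgreSQL/DataGrip so it runs with no setup. The rows are invented sample data, and the column names (VAERS_ID, STATE, DIED, SYMPTOM_TEXT) are assumed to match the headers in the VAERS CSV files; check them against the real files before relying on this.

```python
import csv
import io
import sqlite3

# Invented sample rows; a real run would open the VAERS CSV files instead.
SAMPLE_CSV = """VAERS_ID,STATE,DIED,SYMPTOM_TEXT
1,TX,Y,Patient died suddenly at home
2,FR,Y,Unexpected death two days after dose
3,CA,,Headache and fatigue
"""

def load_reports(conn, csv_file):
    """Create a reports table and fill it from a CSV file object."""
    conn.execute(
        "CREATE TABLE reports ("
        "vaers_id INTEGER PRIMARY KEY, state TEXT, died TEXT, symptom_text TEXT)"
    )
    rows = [
        (int(r["VAERS_ID"]), r["STATE"], r["DIED"], r["SYMPTOM_TEXT"])
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO reports VALUES (?, ?, ?, ?)", rows)

def search_deaths(conn, keyword):
    """Return IDs of death reports whose symptom text contains the keyword."""
    cursor = conn.execute(
        "SELECT vaers_id FROM reports "
        "WHERE died = 'Y' AND symptom_text LIKE ? ORDER BY vaers_id",
        (f"%{keyword}%",),
    )
    return [row[0] for row in cursor.fetchall()]

conn = sqlite3.connect(":memory:")
load_reports(conn, io.StringIO(SAMPLE_CSV))
sudden_ids = search_deaths(conn, "sudden")
```

SQLite's LIKE is case-insensitive for ASCII by default; in PostgreSQL the equivalent would be ILIKE.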
So this is interesting, but at least one of your assumptions is not necessarily correct. Yes, FR reports have much more text, but frequently this is due to repetition: the same content is populated into the symptom text field twice, and sometimes more. Other times many reports are ganged into one record with many descriptions. Frequently the distinct field info is ganged into the symptom text in FR reports. Finally, the FR reports are often from manufacturers, making the quality lower, not higher.
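One crude way to check the repetition point is to measure how much of a symptom text is made of repeated sentences. This is only a sketch under simple assumptions (sentences end in ".", ";", "!" or "?", and repeats are exact apart from case); `repeated_fraction` is a hypothetical helper, not something from any VAERS tool.

```python
import re

def repeated_fraction(text: str) -> float:
    """Fraction of sentences that are exact repeats of an earlier sentence."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [
        s.strip().lower()
        for s in re.split(r"(?<=[.;!?])\s+", text)
        if s.strip()
    ]
    if not sentences:
        return 0.0
    return 1 - len(set(sentences)) / len(sentences)
```

A report whose text is pasted in twice scores close to 0.5, so comparing this score between FR and domestic reports would show how much of the extra length is duplication.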
I agree, and in fact this is also why I feel the NLP community has really let everyone down when it comes to VAERS.
I can imagine all of this being reasonably easy to answer if someone created a robust custom spaCy model for parsing VAERS reports. It is not hard to do, but does require significant upfront time investment.
It's unclear whether this has already been done with the real dataset.