Date: 2025-08-21
Can LLMs help analyze VAERS autism reports?
I have built a tool to visualize structured data extracted from VAERS autism reports
On 7th August 2025, OpenAI announced GPT-5.
GPT-5 is notable because it is an LLM capable of extracting fairly complex structured data from clinical narratives like VAERS reports.
But there is an important detail that many people in the US missed - OpenAI were forced to reduce prices (compared to previous flagship versions) not just to get more customers, but also because there are now open-weight LLMs which produce equally good results at a fraction of the cost of even GPT-5.
Note: “equally good” is not the same as “much better than”. I think GPT-5 is still overall a better LLM for this task, but
a) its lead has shrunk dramatically compared to other LLMs and
b) this is soon going to turn into an even closer contest, which is great news for end users who are working on problems which don’t have much commercial demand, like proper vaccine pharmacovigilance :-)
I have built a tool [1] to visualize structured data extracted from VAERS autism [2] reports. Being able to visualize the data is an important part of the analysis itself, especially when you want to present the results to other people. I will be working on the analysis and writing up my findings over the coming weeks.
This is what the tool looks like:
I am using a different LLM called Mistral for this screenshot to demonstrate that my ideas are not GPT-5-specific but apply broadly to any LLM which incorporates some basic “reasoning” skills.
I put reasoning in quotes because I don’t mean it in the sense people normally give the word. Personally, I don’t think LLMs - at least the ones which the riffraff [3] are allowed to use - are even capable of human-like reasoning [4]. But there are notable improvements in the ability of these LLMs to “put 2 and 2 together” and do basic inference, and that’s what I am referring to as reasoning in this context.
I will also be using this tool to teach a course on Udemy, where I will explain the technical [5] aspects of various techniques you can use for automated data extraction using LLMs - these techniques are often called prompt engineering.
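To give a rough idea of what prompt-based extraction can look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the prompt wording, and the `fake_llm` stand-in are hypothetical and are not the schema or prompts used by the tool. The model call is passed in as a plain function, so the same code would work with any LLM that returns JSON text.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Optional


# Hypothetical schema: illustrative field names, not the tool's real schema.
@dataclass
class AutismReportFields:
    age_at_onset_months: Optional[int]
    symptoms: List[str]
    regression_reported: Optional[bool]


# Hypothetical prompt: spell out the fields and ask for JSON only.
PROMPT_TEMPLATE = """Extract the following fields from the report narrative.
Reply with a single JSON object and nothing else.
Use null when the narrative does not state a value.

Fields:
- age_at_onset_months (integer or null)
- symptoms (list of short strings)
- regression_reported (true, false, or null)

Narrative:
{narrative}
"""


def build_prompt(narrative: str) -> str:
    """Insert the narrative text into the prompt template."""
    return PROMPT_TEMPLATE.format(narrative=narrative)


def parse_response(raw: str) -> AutismReportFields:
    """Parse the model's JSON reply and reject values of the wrong type."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    age = data.get("age_at_onset_months")
    symptoms = data.get("symptoms") or []
    regression = data.get("regression_reported")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if age is not None and (isinstance(age, bool) or not isinstance(age, int)):
        raise ValueError("age_at_onset_months must be an integer or null")
    if not isinstance(symptoms, list) or not all(isinstance(s, str) for s in symptoms):
        raise ValueError("symptoms must be a list of strings")
    if regression is not None and not isinstance(regression, bool):
        raise ValueError("regression_reported must be true, false, or null")
    return AutismReportFields(age, symptoms, regression)


def extract(narrative: str, call_llm: Callable[[str], str]) -> AutismReportFields:
    """Run one narrative through whichever LLM function is supplied."""
    return parse_response(call_llm(build_prompt(narrative)))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply for the demo."""
    return ('{"age_at_onset_months": 18, "symptoms": ["loss of speech"], '
            '"regression_reported": true}')


if __name__ == "__main__":
    # Made-up narrative for the demo, not taken from a VAERS report.
    print(extract("Child stopped speaking at 18 months.", fake_llm))
```

The validation step matters as much as the prompt: an LLM can return a string where a number was requested, and it is better to catch that at extraction time than to discover it later in the visualization.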
If there is something specific that you would want me to analyze within VAERS Autism reports (and especially if it is based on the data that you see in the visualization tool), please leave a comment and I will try to incorporate it into my analysis.
[1] Interesting side note: the tool was also built primarily using AI. I used an AI code editor called Cursor to do this.
[2] It is also possible to build similar tools for other kinds of data - such as the LAB_DATA diagnostic tests - and for other kinds of reports - such as death reports. They would all require different data schemas, and those schemas will decide the visualization UI.
[3] Some people say that the powers that be have always had access to much more powerful AI than what the general public is allowed to see. Looking at the level of stupidity and sheer incompetence we saw over the COVID years, I am very very skeptical. But even if I am wrong to be skeptical, that doesn’t change anything given that I can only report on what I have been able to test empirically. The stuff that I HAVE tested is quite underwhelming.
[4] Remember, the AI industry has a good reason to hype up its own capabilities, since it makes more money if it can persuade you that LLMs can accomplish amazing things with no supervision.
[5] The course is intended for Python programmers. If you are non-technical and simply curious about the topic, you might not be able to follow the code demos and walkthroughs, but the visualization tool will still let you see the output and get a good idea of the current state of the art.
A pre-programmed, algorithm-based database of pre-selected records with inaccurate or missing data, badly assigned to time periods, is going to be analysed by a pre-programmed, algorithm-based software created by nobody knows who and nobody knows for what ultimate purposes. Sounds promising :-) Additionally, nobody can access full structure and programming principles of the so called AI (obviously, company secrets), which means that the unknown bot can be modified at will at any time by its owners or those who rule over it. And the whole humanity, especially the intelligentsia elites, have suddenly become avid believers and worship these AI gods like never before in history.