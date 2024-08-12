I recently got a question from a Twitter friend on whether vaccine lot numbers are mentioned in the Pfizer CRF documents.

The search tool I built covers only the Pfizer 16+ CRFs, but it extracts ALL the text that appears below the metadata (the header at the top of each page) for its free-form text search.

The code splits each page into two sections and uses the Header information for the filters. In other words, the text inside the Header section can only be searched using the filters on the left.

All the text below the Header is added to the search tool as free-form text.
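As a rough illustration of this split (not the tool's actual code), here is a minimal Python sketch. It assumes the header occupies a fixed number of lines at the top of the page and that header lines look like `Key: Value`. The names `HEADER_LINES` and `split_page`, and the sample field names, are all hypothetical.

```python
# Hypothetical assumption: the header occupies the first 3 non-empty lines of a page.
HEADER_LINES = 3


def split_page(page_text: str, header_lines: int = HEADER_LINES) -> dict:
    """Split one page of extracted text into filterable header fields and body text."""
    lines = [line.strip() for line in page_text.splitlines() if line.strip()]
    header, body = lines[:header_lines], lines[header_lines:]

    # Header lines of the form "Key: Value" become filter attributes.
    filters = {}
    for line in header:
        key, sep, value = line.partition(":")
        if sep:
            filters[key.strip()] = value.strip()

    # Everything below the header is joined into one free-form searchable string.
    return {"filters": filters, "text": " ".join(body)}


# Made-up sample page, for illustration only.
sample = "Subject: 1001\nSite: 1234\nVisit: V1\nDose 1\nLeft deltoid"
record = split_page(sample)
print(record["filters"])  # {'Subject': '1001', 'Site': '1234', 'Visit': 'V1'}
print(record["text"])     # Dose 1 Left deltoid
```

In a record shaped like this, the `filters` values would drive the filters on the left, and only the `text` value would be reachable through the search box.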

However, since this information is usually in tabular format, the extracted text is not well structured: it does not flow the way humans read, but rather the way a printer might print out lines of text.

In other words, it will be a bit of a jumbled mess if you try to read the text, but you will still be able to do keyword searches on it.

There is another challenge as well. Words which are “wrapped” at a table border get split across two lines. If the word you are searching for gets split like this, it will not show up in an exact-match search.
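If you have the extracted text on hand and want to check for a wrapped word yourself, one simple workaround is to allow optional whitespace between every pair of characters in the keyword. The sketch below is only an illustration of that idea in Python; the function name `find_wrapped` is hypothetical and this is not how the search tool itself handles the problem.

```python
import re


def find_wrapped(keyword: str, text: str) -> bool:
    """Return True if keyword occurs in text, even when split by line breaks or spaces."""
    if not keyword:
        return False
    # Allow any amount of whitespace (including newlines) between consecutive characters.
    pattern = r"\s*".join(re.escape(ch) for ch in keyword)
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Made-up sample text where "Deltoid" was wrapped onto two lines.
text = "Injection site: Left Del\ntoid muscle"
print("deltoid" in text.lower())      # False: the plain substring search misses it
print(find_wrapped("deltoid", text))  # True: the whitespace-tolerant pattern finds it
```

The trade-off is that this can also match across genuine word boundaries, so any hit is still worth checking against the original page.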

However, since Algolia is pretty good at doing “fuzzy” (approximate) searches, you should still be able to find matches if they are actually present in the document text.

For example, notice the actual search keyword and the approximate matches:

In other words, Algolia can take partial or misspelled keyword input and still return “close enough” matches. So you should be able to search on the lot number prefix to see if there are any matches.
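To make the idea of prefix and approximate matching concrete, here is a small Python sketch using the standard library's `difflib`. This is a stand-in for illustration only: it is not Algolia's algorithm and does not call Algolia. The helper names (`tokenize`, `prefix_matches`, `close_matches`) and the lot-style string `AB1234` are made up.

```python
import difflib
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and return its unique alphanumeric tokens, sorted."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


def prefix_matches(prefix: str, text: str) -> list[str]:
    """Return tokens that start with the given prefix (case-insensitive)."""
    return [t for t in tokenize(text) if t.startswith(prefix.lower())]


def close_matches(query: str, text: str, cutoff: float = 0.8) -> list[str]:
    """Return tokens that are similar to the query, tolerating small typos."""
    return difflib.get_close_matches(query.lower(), tokenize(text), n=5, cutoff=cutoff)


# Made-up sample text; "AB1234" is a placeholder, not a real lot number.
text = "Vaccination lot AB1234 administered"
print(prefix_matches("AB12", text))       # ['ab1234']
print(close_matches("vacination", text))  # ['vaccination']
```

The `cutoff` value controls how forgiving the approximate match is; lowering it returns more candidates at the cost of more false positives.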

Generally speaking, you should be able to do free-form keyword searches inside the tool for many different kinds of queries, but I would recommend opening the corresponding PDF page to verify each result.

I haven’t been able to find any mentions of lot numbers so far.