Great initiative!
I made something similar with the Pfizer documents a while back. A good tactic if you want to get around Algolia's limitations is to pre-parse all the files and build a JSON index of the words and their total occurrences in the documents. That keeps the search very fast at query time.
See here if interested:
https://github.com/OpenVaet/openvaet/blob/main/tasks/pfizer_documents/get_documents.pl#L293
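For anyone who wants the general idea without reading the Perl, here is a rough Python sketch of that kind of pre-built word index. It is not a port of the linked script: the sample documents, the function names `build_index` and `search`, and the tokenizing rule (lowercase runs of letters and digits) are all assumptions made for illustration.

```python
import json
import re
from collections import Counter, defaultdict

# Assumed tokenizer: lowercase runs of ASCII letters and digits.
WORD_RE = re.compile(r"[a-z0-9]+")


def build_index(documents):
    """Map each word to {document name: occurrences in that document}."""
    index = defaultdict(dict)
    for name, text in documents.items():
        counts = Counter(WORD_RE.findall(text.lower()))
        for word, count in counts.items():
            index[word][name] = count
    return dict(index)


def search(index, query):
    """Return (document, total occurrences) pairs for documents that
    contain every query word, highest total first."""
    words = WORD_RE.findall(query.lower())
    if not words:
        return []
    postings = [index.get(word, {}) for word in words]
    # Keep only documents present in the postings of every query word.
    common = set(postings[0]).intersection(*postings[1:])
    scored = {doc: sum(p[doc] for p in postings) for doc in common}
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents standing in for the parsed files.
documents = {
    "report_a.txt": "Adverse event reported. Adverse reaction followed the second dose.",
    "report_b.txt": "No adverse event was reported after the first dose.",
    "report_c.txt": "Shipping and storage conditions for each batch.",
}

index = build_index(documents)

# The index is built once, serialized to JSON, and reloaded at query time.
index_json = json.dumps(index)
loaded = json.loads(index_json)

results = search(loaded, "adverse event")
print(results)
```

The expensive part (reading and tokenizing every file) happens once when the index is built. A query then only needs a few dictionary lookups, which is why it stays fast however large the document set gets.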