National Citizens Inquiry Canada: Searchable video transcripts
How AI will help bypass censorship Part 2
TLDR: I used an AI-powered audio transcription service called Turboscribe to generate transcripts for all 20 videos of the National Citizens Inquiry Canada 2024 hearings. You can download the PDF files here and search the text across the entire set of videos using your computer’s search feature.
In a recent article I mentioned a software service called AssemblyAI, which uses AI to generate high-quality audio transcripts. I built a wrapper tool around it [1] and asked a Twitter friend for feedback. He pointed me to a much more affordable [2] service called Turboscribe, and I realized that it made more sense to use and recommend Turboscribe for my use case [3]. Turboscribe allows “unlimited” audio uploads for $20/month or $120/year.
Here is how I used it.
National Citizens Inquiry Canada 2024: Video transcripts
I uploaded all the Rumble videos from the 2024 National Citizens Inquiry Canada hearings to my Turboscribe account under a single folder called nci_2024.
Once the transcription is complete, you can click and see individual transcripts.
But Turboscribe does not know who the speaker is, so it just marks each one as “Speaker 1”, “Speaker 2”, etc. [4]
But you can edit the transcript and rename the speakers. This is what the Rename Speakers feature looks like:
This will change the speaker name across the entire transcript. You can also edit the transcript at the level of individual sentences if you want to fix typos.
After uploading all 20 videos, I spent about an hour updating the speaker names across them.
TurboScribe allows you to download the transcripts of an entire folder in multiple formats. I exported all the transcripts in PDF format and also chose the “Section Timestamps” option.
You can simply search across this entire folder on your computer and find all the videos which contain a specific keyword.
Suppose you are looking for all the videos which discuss the topic of DNA contamination.
Here is the list of four videos which mention this keyword:
Then you can search again within each PDF to find the appropriate location.
For example, Dr Pierre Kory discusses DNA contamination around the 1 hour 18 minute mark.
Now you can just go to that specific location in the video and watch that segment.
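The lookup from keyword to video position can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it presumes the transcript text is available as a plain string and that section timestamps look like "(H:MM:SS)" or "(M:SS)". The actual format of a Turboscribe export may differ, so the pattern would need adjusting. The names `TIMESTAMP`, `to_seconds` and `locate`, and the sample text, are made up for this example.

```python
import re

# Assumed timestamp format: "(H:MM:SS)" or "(M:SS)". Adjust to match the real export.
TIMESTAMP = re.compile(r"\((?:(\d+):)?(\d{1,2}):(\d{2})\)")


def to_seconds(match):
    """Convert a timestamp match into a number of seconds from the start."""
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)


def locate(transcript, keyword):
    """For each keyword hit, return the last section timestamp before it, in seconds."""
    stamps = [(m.start(), to_seconds(m)) for m in TIMESTAMP.finditer(transcript)]
    hits = []
    for m in re.finditer(re.escape(keyword), transcript, flags=re.IGNORECASE):
        preceding = [secs for pos, secs in stamps if pos < m.start()]
        if preceding:
            hits.append(preceding[-1])
    return hits


# Made-up sample text, not taken from a real transcript.
sample = (
    "(1:17:40) Speaker 1: Let us move on to the next topic. "
    "(1:18:05) Speaker 2: The topic here is DNA contamination."
)
print(locate(sample, "dna contamination"))  # [4685]
```

The result is the number of seconds from the start of the video (4685 seconds is 1 hour, 18 minutes and 5 seconds), which is the position to seek to in the player.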
So you have to do all these steps:
search within your computer’s folder (which can usually only do keyword matches and is not good at phrase matches)
then open the PDF file to find the location of the keyword
then navigate to the appropriate location in the video manually
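The first step, including phrase matching, could be approximated with a short script. The following is a rough sketch, not the tooling described in this article: it assumes the transcripts have also been exported as plain-text (.txt) files into one folder, since reading PDFs directly would need an extra library. The function name `find_phrase` is hypothetical.

```python
import re
from pathlib import Path


def find_phrase(folder, phrase):
    """Return {filename: hit_count} for .txt transcripts in `folder` containing `phrase`."""
    words = [re.escape(w) for w in phrase.split()]
    if not words:
        return {}
    # Allow any run of whitespace between words so a line break does not hide a match.
    pattern = re.compile(r"\s+".join(words), flags=re.IGNORECASE)
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        count = len(pattern.findall(path.read_text(encoding="utf-8")))
        if count:
            results[path.name] = count
    return results


# Example call, assuming a local folder named nci_2024 holding the .txt exports:
# find_phrase("nci_2024", "DNA contamination")
```

Matching the whole phrase, rather than each word separately, avoids returning files that merely contain both words somewhere.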
In my next article, I will explain how to speed up the whole process by using additional tools.
[1] For proofreading and editing the transcript, and for making it easier to use and share.
[2] The accuracy of Turboscribe seems to be better than that of AssemblyAI. But AssemblyAI takes an LLM approach, which makes it better in some ways; it is also better suited for developers (because it has an API) and provides some NLP features such as entity detection. So in many ways these are very different products as of June 2024. For our use case, though, Turboscribe is sufficient and a better choice because it is much more affordable.
[3] In the context of this Substack, my use case is to make it easy for people to watch and remember audio and video content produced by COVID19 dissidents.
[4] Note: Since I had already completed labeling the speakers for the NCI videos, the screenshots I use for explaining the Turboscribe “Edit transcript” features are actually based on a different video.
Wow! Super powerful, it's beyond potential, it's already here! Thanks for taking the mystery out of how it all works. I hope your app works just as good and even better. You deserve it. God Bless
Great! Thanks for this work regarding Vaccine Data Science and also for sharing this useful info about TurboScribe.