What if you could search the transcript of every video of a Rumble channel?
How AI can help bypass censorship - Part 3
Summary:
I built a search app for the 2024 hearings[1] of the Canadian National Citizens Inquiry
I use Airtable as a content management system to manage the video transcription workflow
My system can be used to create distributed search applications that bypass[2] censorship of audio and video content
In my previous article, I mentioned how a service like Turboscribe, which offers unlimited[3] audio transcriptions for a fixed price, is a game changer for audio and video search.
(Previous article: National Citizens Inquiry Canada: Searchable video transcripts)
And how accurate is Turboscribe?
Here is a comment from
Wow! Super powerful, it's beyond potential, it's already here! Thanks for taking the mystery out of how it all works. I hope your app works just as good and even better. You deserve it. God Bless
But a fully functional search experience still involves a lot of manual work:
upload the videos to Turboscribe for transcription
after transcription is complete, update speaker labels
download the entire folder of PDF transcripts to local machine
search the folder for your keyword
open each search result file and find the keyword within the file
jump to that location in the corresponding Rumble video
That is a lot of steps for what I think is a fairly simple end result[4].
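For comparison, here is a minimal Python sketch of the last three manual steps (searching the transcripts, finding the keyword, and jumping to that point in the video). It assumes the transcripts have already been converted from PDF into plain-text files with one line per utterance in a hypothetical `[HH:MM:SS] Speaker: text` format, and that Rumble video links honour a `?start=<seconds>` query parameter. Both are assumptions, and the function names are made up for illustration.

```python
import re
from pathlib import Path

# Hypothetical transcript line format: "[HH:MM:SS] Speaker: text".
# Turboscribe's real export format may differ, so adjust this regex to match.
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$")


def find_keyword(transcript: str, keyword: str) -> list[tuple[int, str]]:
    """Return (offset_in_seconds, line_text) for each timestamped line containing keyword."""
    needle = keyword.lower()
    hits = []
    for line in transcript.splitlines():
        match = LINE_RE.match(line.strip())
        if match and needle in match.group(4).lower():
            hours, minutes, seconds = (int(g) for g in match.group(1, 2, 3))
            hits.append((hours * 3600 + minutes * 60 + seconds, match.group(4)))
    return hits


def timestamped_url(video_url: str, seconds: int) -> str:
    """Build a link that starts playback at `seconds`.

    ASSUMPTION: the video player honours a `start=<seconds>` query parameter.
    """
    separator = "&" if "?" in video_url else "?"
    return f"{video_url}{separator}start={seconds}"


def search_folder(folder: Path, urls: dict[str, str], keyword: str) -> list[str]:
    """Search every .txt transcript in `folder`; `urls` maps file name (without .txt) to video URL."""
    results = []
    for path in sorted(folder.glob("*.txt")):
        if path.stem not in urls:
            continue  # no known video for this transcript, so no link can be built
        text = path.read_text(encoding="utf-8")
        for seconds, line in find_keyword(text, keyword):
            link = timestamped_url(urls[path.stem], seconds)
            results.append(f"{path.stem} @ minute {seconds // 60}: {link} | {line}")
    return results
```

For example, `search_folder(Path("transcripts"), {"day-1": "https://rumble.com/v-example.html"}, "vaccine")` would return one line per match with a link to that point in the video. Even so, the upload, relabeling and download steps remain manual.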
Using Airtable as a content management system, I can automate many of these steps and create a video search application that is fairly easy to maintain.
This is what my Airtable base looks like at the moment.
For now, I use a bunch of Python scripts for the automation.
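As an illustration of what such a script might look like (a hypothetical sketch, not the scripts behind this app), the code below splits a timestamped transcript into one record per minute and speaker, then pushes the records to an Airtable table through the Airtable Web API, which accepts at most 10 records per create request. The field names (`Video`, `Minute`, `Speaker`, `Text`) and the transcript line format are assumptions; the base ID, table name and personal access token come from your own Airtable account.

```python
import json
import re
import time
import urllib.parse
import urllib.request

API_URL = "https://api.airtable.com/v0/{base_id}/{table}"
BATCH_SIZE = 10  # Airtable's create-records endpoint accepts at most 10 records per request
# Hypothetical transcript line format: "[HH:MM:SS] Speaker: text"
LINE_RE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*([^:]+):\s*(.*)$")


def minute_records(video_title: str, transcript: str) -> list[dict]:
    """Group timestamped lines into one record per (minute, speaker), in order of appearance."""
    chunks: dict[tuple[int, str], list[str]] = {}
    for line in transcript.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines without a timestamp and speaker label
        hours, minutes, _ = (int(g) for g in match.group(1, 2, 3))
        key = (hours * 60 + minutes, match.group(4).strip())
        chunks.setdefault(key, []).append(match.group(5))
    # Field names below are hypothetical; they must match the columns in your table.
    return [
        {"fields": {"Video": video_title, "Minute": minute, "Speaker": speaker, "Text": " ".join(parts)}}
        for (minute, speaker), parts in chunks.items()
    ]


def batches(records: list, size: int = BATCH_SIZE) -> list[list]:
    """Split records into consecutive chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def upload(records: list[dict], base_id: str, table: str, token: str) -> None:
    """POST records to the Airtable Web API (requires network access and a personal access token)."""
    url = API_URL.format(base_id=base_id, table=urllib.parse.quote(table))
    for batch in batches(records):
        request = urllib.request.Request(
            url,
            data=json.dumps({"records": batch}).encode("utf-8"),
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            response.read()
        time.sleep(0.25)  # stay under Airtable's limit of 5 requests per second per base
```

Storing one row per minute is what makes jumping to a specific minute possible: each row can carry its own timestamped link, and Airtable's filters then work on minutes rather than on whole videos.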
Features
Here is the search application I built:
You can search for a keyword or phrase and jump directly to the specific minute in the video where the topic is being discussed
You can click on the video title in the left pane and read the video transcript (the minutes are sorted in ascending order).
Remember to clear out the search box before you do this. Otherwise you will see only the minutes where that search term is discussed.
You can click on the Speaker name and the video title to see questions from the Commissioners.
You can also click on the keywords for a given video to see where each one is discussed
You can also add other filters to the search app as per your requirements. For example, if you are building a podcast search engine, the published date would obviously be a very helpful filter.
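The features above boil down to filtering per-minute rows. Here is a minimal in-memory Python sketch of that filtering logic, assuming a hypothetical `MinuteRow` structure and a made-up `search` function; it illustrates the idea and is not how the actual app is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MinuteRow:
    """One minute of one speaker in one video (hypothetical structure)."""
    video: str
    minute: int
    speaker: str
    text: str


def search(
    rows: list[MinuteRow],
    keyword: str = "",
    speaker: Optional[str] = None,
    video: Optional[str] = None,
) -> list[MinuteRow]:
    """Return matching rows sorted by video title, then minute in ascending order.

    An empty keyword matches every row, so clearing the search box
    shows the full transcript of the selected video.
    """
    needle = keyword.strip().lower()
    hits = [
        row for row in rows
        if (not needle or needle in row.text.lower())
        and (speaker is None or row.speaker == speaker)
        and (video is None or row.video == video)
    ]
    return sorted(hits, key=lambda row: (row.video, row.minute))
```

A published-date filter for a podcast search engine would simply be one more condition in the same list comprehension.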
Benefits
Video search applications benefit individuals, content creators, and the community at large.
For individuals
You can build your own personal search app based on the Rumble channels you are interested in. You don’t have to rely on a search engine like Google or Bing to find the content you are looking for, and you don’t have to watch an entire video to find specific topics either. As a bonus, you can easily share notes from a video and quickly review what you have already watched.
For content creators
If you are a video producer or podcast host[5], you can transcribe all your videos and
create a search application for all the videos in your video library
add the transcript under each video on your website
Once your videos are searchable, people will find your website more easily through search engines[6].
It also gives people a reason to come to your website when they are looking for videos about a particular topic.
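For the transcript-under-each-video idea, here is a minimal sketch that renders per-minute rows as a plain HTML fragment to paste below the embedded video. The `transcript_html` function, the CSS class and the `minute-N` anchor IDs are hypothetical; the point is that the transcript ends up as ordinary text on the page, where search engines can index it.

```python
import html


def transcript_html(rows: list[tuple[int, str, str]]) -> str:
    """Render (minute, speaker, text) rows as an HTML fragment, one paragraph per row."""
    parts = ['<section class="transcript">']
    anchored: set[int] = set()
    for minute, speaker, text in sorted(rows):  # tuples sort by minute first
        # Only the first paragraph of each minute gets the id, so ids stay unique.
        anchor = f' id="minute-{minute}"' if minute not in anchored else ""
        anchored.add(minute)
        parts.append(
            f'<p{anchor}><a href="#minute-{minute}">{minute:02d}:00</a> '
            f"<strong>{html.escape(speaker)}</strong>: {html.escape(text)}</p>"
        )
    parts.append("</section>")
    return "\n".join(parts)
```

Escaping the speaker names and text with `html.escape` keeps characters like `<` or `&` in a transcript from breaking the page markup.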
For the community
Besides using an Airtable base to manage their video transcription workflow, the content creator should also share that base with their audience.
This is the key to the whole system.
Here is how the system can bypass censorship:
When content creators share Airtable bases containing information about all their videos and the associated transcripts, members of their communities can mix and match these video libraries and build their own custom search applications for specific topics.
It is best if the content creator does this on their end: besides giving their material more visibility and exposure, it can also prevent a lot of duplicate work[7].
In other words, using a simple software tool (in this case Airtable[8]), we can create decentralized audio and video metadata libraries that people all over the world can use to find the information they are interested in.
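To make "mix and match" concrete, here is a minimal sketch that merges video tables exported as CSV from several shared bases, keeps only the videos tagged with a given topic, and drops duplicates of the same video URL. The column names (`Title`, `Video URL`, `Keywords`), the use of CSV exports, and the function names are all assumptions for illustration.

```python
import csv
import io
from typing import Iterable


def load_library(csv_text: str, source: str) -> list[dict]:
    """Parse one creator's exported video table (hypothetical columns: Title, Video URL, Keywords)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["Source"] = source  # remember which shared base each video came from
    return rows


def merge_libraries(libraries: Iterable[list[dict]], topic: str) -> list[dict]:
    """Combine several libraries, keep videos tagged with `topic`, and drop duplicate URLs."""
    needle = topic.strip().lower()
    seen: set[str] = set()
    merged = []
    for library in libraries:
        for row in library:
            keywords = [k.strip().lower() for k in (row.get("Keywords") or "").split(",")]
            url = row.get("Video URL") or ""
            if needle in keywords and url not in seen:
                seen.add(url)
                merged.append(row)
    return merged
```

Because duplicates are dropped by URL, the first library listed wins, so a creator's own base should come first when it is the most authoritative source for those videos.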
AudioJots Toolkit
I am creating a paid product[9] called the AudioJots Toolkit: a collection of Python scripts that automate these tasks, along with training material that explains how to use them.
I will be releasing this product next week.
[1] A few videos were uploaded after I began writing this article, so the full list of videos will not be searchable until I update the search application.
[2] The set of ideas I am describing here was heavily inspired by Encyclosphere, which I mentioned in a previous article. However, I realized a few things after I wrote that article:
Encyclosphere is intended for a very different use case (articles you see in Wikipedia)
there is a lot of audio and video content which does not fit the Encyclosphere concept
I would argue that podcasting was the one major communication system which Big Tech companies could not censor. If podcasts were somehow centralized, things could have turned out much worse. Building decentralized search for audio and video content is going to be very important in the future.
[3] The creator of Turboscribe is upfront that unlimited means 720 hours a month (the number of hours in a 30-day month). While not strictly unlimited, it is unlimited in the Netflix sense: no one can watch more than 720 hours of movies a month either (I hope!)
[4] In fact, I think everything I am describing in this article will become very common once unlimited audio/video transcription (with very high accuracy) becomes a commodity.
[5] Everything I have described here for Rumble video channels works just as well for podcasts and for videos from other hosting services like BitChute and Odysee. And yes, also YouTube.
[6] Unfortunately, if a search engine is deliberately trying to censor your website, adding a transcript is not going to help.
[7] That is, people who know that the content creator publishes transcripts will go directly to the source and won’t have to create transcripts themselves.
[8] There are alternatives that also work for this use case, but many people (including me) think that Airtable’s user experience is superior to theirs.
[9] Tentatively, it will be priced at around $29 (one-time payment).