Please note: once I publish the second part of this article, I will start focusing on vaccine dataset analysis again on this Substack. If you are interested in the software I am building, I suggest you follow my other Substack.
Summary:
AI-generated transcripts have become very accurate, and with the help of my software tool, they can be quickly converted into an easy-to-skim article format (for example, see this article based on Bret Weinstein’s recent DarkHorse podcast interview with Joomi Kim and Bret Swanson)
The Encyclosphere project provides a framework¹ to bypass online censorship
Creating a searchable encyclopedia about podcast episodes will not only be a practical proof-of-concept, but will also become extremely handy if the censorship gets worse
Podcast creators should publish an article for each podcast episode using the AI-generated transcript. This can bring more traffic to their websites and make it easier to get credit for ad referrals
These transcripts-turned-into-articles will become the cornerstones for creating searchable encyclopedias for podcast episodes. I will discuss some ideas for that in my next article.
Global Disinformation Index
A few days back, Freddie Sayers from UnHerd posted this video to Twitter:
It confirmed what many people have suspected since the Ivermectin fiasco, but it is still worth the 22 minutes if you want to learn about the core strategy: the easiest way to censor is to attack potential funding sources, using the bottomless reserves of taxpayer money, in a non-transparent way if possible.
Can we do something about the censorship?
In this article, I will discuss a paid software tool I am building which can potentially help bypass online censorship.
I started building it for the following reasons:
a) obviously, it is better than complaining about the censorship
b) it is a field I am quite familiar with
c) if the Encyclosphere project (explained below) takes off, it will probably be much more impactful than vaccine dataset analysis
This article is both educational and promotional, but I urge you to read the whole thing whether or not you plan to use the tool I am building. This is especially true if you are not very familiar with how much AI has evolved in the last year or two.
The Encyclosphere project
Larry Sanger, a cofounder of Wikipedia, now says he no longer trusts the website he helped to create.
Importantly, he started building something to fix the problem.
The Encyclosphere is a bit hard to summarize in a paragraph, but there are two basic ideas:
it combines a search engine and an encyclopedia. There are many moving parts in the project, and the Encyclosphere team has taken care to make each of them decentralized. This decentralization is what makes it censorship resistant.
it uses a single, shared standard for what constitutes an encyclopedia article. This also makes it more resistant to censorship (in contrast, having a lot of competing standards often helps entrenched oligopolies).
If you are a software developer, it should be easy to read Larry’s article and think of ways to contribute to the code.
What if I am not a developer?
Even if you are not a software developer, you can still help in two ways:
donate money to the project and help it stay alive for longer
use my tool to contribute content :-)
The tool itself is actually really simple.
It will help you quickly convert an AI-generated podcast transcript into an easy-to-skim article.
I will be providing more details over the next two weeks. I hope to launch the early access version of my software tool in the first or second week of May.
Podcast Discoverability
I recently read a very interesting Twitter thread from February 2022 between Larry Sanger and a commenter.
This was around the time when Joe Rogan was interviewing people like Dr Robert Malone, and Spotify was facing a lot of external pressure to cancel Rogan.
The thread explains three concepts at once.
There are no open podcast networks
Larry makes this comment in the first tweet. At the moment, there isn’t really any “aggregator” which collects information about all podcasts.
This actually turned out to be a good thing: it meant that there was no easy target for podcast censorship during COVID-19.
Podcasts are not centralized
The main reason podcasts were not censored during COVID-19 is that there was no way to do it.
As the commenter says,
“Podcasts are just RSS feeds. Video "podcasts" are just videos. Plenty of hosting for both.”
Once you had found a podcast talking about Ivermectin, the powers-that-be could do nothing to stop you from getting updates from its creators.
But they could still make it hard for you to discover that information in the first place.
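To make the commenter's point concrete: a podcast feed is ordinary RSS 2.0 XML, where each <item> is an episode and its <enclosure> element points at the audio file. The minimal sketch below parses such a feed with Python's standard library; the feed contents and URLs are made up for illustration, and real feeds carry many more elements.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# A tiny, made-up RSS 2.0 feed. Real feeds have many more elements,
# but the <item>/<enclosure> structure is the same.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 2: Follow-up</title>
      <enclosure url="https://example.com/audio/ep2.mp3" type="audio/mpeg" length="12345"/>
    </item>
    <item>
      <title>Episode 1: Introduction</title>
      <enclosure url="https://example.com/audio/ep1.mp3" type="audio/mpeg" length="67890"/>
    </item>
  </channel>
</rss>"""


@dataclass
class Episode:
    title: str
    audio_url: str


def parse_feed(xml_text: str) -> list[Episode]:
    """Return one Episode per <item> that has an audio <enclosure>."""
    root = ET.fromstring(xml_text)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # skip items without an attached media file
        episodes.append(
            Episode(
                title=item.findtext("title", default=""),
                audio_url=enclosure.get("url", ""),
            )
        )
    return episodes


for ep in parse_feed(SAMPLE_FEED):
    print(ep.title, "->", ep.audio_url)
```

Anyone who knows the feed URL can run this kind of code, which is exactly why there is no single chokepoint to censor.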
Discoverability is a big problem
But this openness comes at a price. Podcast episode discoverability is a big problem.
Even today, for example, there isn’t any easy way to find all the podcast episodes which are talking about the upcoming WHO pandemic treaty. There are many scattered podcast search engines, but none of them are comprehensive.
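At its simplest, searching across many podcasts means building an inverted index: map each word to the episodes whose title or description contains it, for as many feeds as you can collect. The sketch below is a deliberately naive illustration (no stemming, ranking, or decentralization), and the podcast names and episode text are invented.

```python
import re
from collections import defaultdict

# Invented episode metadata, as it might be collected from several RSS feeds:
# (podcast, episode title, episode description)
EPISODES = [
    ("Podcast A", "Ep 10", "What the WHO pandemic treaty would change"),
    ("Podcast B", "Ep 55", "Interview about vaccine dataset analysis"),
    ("Podcast C", "Ep 3", "The pandemic treaty and national sovereignty"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(episodes):
    """Map each word to the set of episode ids (list positions) containing it."""
    index = defaultdict(set)
    for ep_id, (_podcast, title, description) in enumerate(episodes):
        for word in tokenize(title + " " + description):
            index[word].add(ep_id)
    return index


def search(index, episodes, query: str):
    """Return (podcast, title) for episodes containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return [episodes[i][:2] for i in sorted(matches)]


index = build_index(EPISODES)
print(search(index, EPISODES, "pandemic treaty"))
# prints [('Podcast A', 'Ep 10'), ('Podcast C', 'Ep 3')]
```

The hard part is not this code but collecting comprehensive, trustworthy episode metadata in the first place, which is the gap described above.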
We can create an Encyclosphere of podcast episodes
The Encyclosphere can help solve this problem.
We can create an open network of podcast episodes using the same technology that powers the Encyclosphere, so there is no need to reinvent the wheel.
How AI generated transcripts can help podcast creators
AssemblyAI is a service which can produce AI generated transcripts for audio files.
It has very good accuracy (above 95%). But that is quite standard these days.
There are three aspects of AssemblyAI that help it stand out from its competitors:
if there are multiple speakers, it provides excellent speaker diarization (that is, who spoke during which time segment?)
it provides fairly good automatic chapters (that is, it can automatically split the audio into logical chapters along with the corresponding time segments)
it does all this for about 60 cents per hour of audio, which makes it one of the most affordable services
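As a quick worked example of what that price means in practice, the helper below estimates the cost for a given amount of audio. It assumes the roughly 60-cents-per-hour figure quoted above (check AssemblyAI's current pricing before relying on it) and uses Decimal so the cents come out exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumed rate taken from the text above; actual pricing may differ or change.
RATE_PER_HOUR = Decimal("0.60")


def transcription_cost(minutes_of_audio: int, rate_per_hour: Decimal = RATE_PER_HOUR) -> Decimal:
    """Estimated cost in dollars, rounded to the nearest cent."""
    hours = Decimal(minutes_of_audio) / Decimal(60)
    return (hours * rate_per_hour).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(transcription_cost(90))           # one 90-minute episode -> 0.90
print(transcription_cost(52 * 2 * 60))  # a year of weekly two-hour episodes -> 62.40
```

In other words, a full year of a weekly two-hour show would cost on the order of tens of dollars to transcribe at that rate.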
Given the cost and effort which already goes into producing podcast episodes, I think podcast creators should make the small extra effort and use AssemblyAI to get this AI-generated transcript.
AssemblyAI has a free playground which allows you to get the raw transcript if you upload an mp3 file. You CAN just directly publish this raw transcript on your website, but there are a lot more benefits if you use their paid service to get a more programmer-friendly transcript file (in JSON format) and plug it into a software tool which turns it into an easy-to-skim article.
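To give a rough idea of what "plugging the JSON into a tool" involves, here is a minimal illustrative sketch, not the tool described in this article. It assumes a transcript requested with speaker labels and automatic chapters enabled, and relies on only a few fields: an `utterances` list (each with `speaker`, `start`, `end`, `text`) and a `chapters` list (each with `headline`, `start`, `end`), with times in milliseconds. Treat that field layout as an assumption and check it against AssemblyAI's API reference. Each utterance is placed under the chapter in which it starts.

```python
# Hypothetical, trimmed-down transcript JSON (already parsed into a dict).
# Field names assume AssemblyAI's speaker-label and auto-chapter output;
# times are in milliseconds.
TRANSCRIPT = {
    "chapters": [
        {"headline": "Introductions", "start": 0, "end": 60_000},
        {"headline": "Why transcripts matter", "start": 60_000, "end": 180_000},
    ],
    "utterances": [
        {"speaker": "A", "start": 0, "end": 20_000, "text": "Welcome to the show."},
        {"speaker": "B", "start": 20_000, "end": 60_000, "text": "Thanks for having me."},
        {"speaker": "A", "start": 61_000, "end": 90_000, "text": "Let's talk about transcripts."},
    ],
}


def fmt_time(ms: int) -> str:
    """Format milliseconds as H:MM:SS."""
    total_seconds = ms // 1000
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def to_article(transcript: dict) -> str:
    """Render chapters as headings, with each chapter's utterances underneath."""
    lines = []
    for chapter in transcript["chapters"]:
        lines.append(f"{chapter['headline']} [{fmt_time(chapter['start'])}]")
        for utt in transcript["utterances"]:
            # An utterance belongs to the chapter in which it starts.
            if chapter["start"] <= utt["start"] < chapter["end"]:
                lines.append(f"Speaker {utt['speaker']}: {utt['text']}")
        lines.append("")  # blank line between chapters
    return "\n".join(lines).rstrip()


print(to_article(TRANSCRIPT))
```

A real tool would also map speaker letters to names and let you edit chapter headlines, but the core idea is this join between the diarized utterances and the chapter time ranges.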
Suppose you are a podcast creator.
Once you have the JSON transcript for an episode, you can use my software tool to turn it into an article-like format and publish that article on your website.
Here is an example of what this article would look like.
My software tool will allow you to configure the output to your liking. For example, by default my tool adds a deep link to each minute of the audio to make it easier for people to jump to that location. But these links also make the article quite cluttered, so you may prefer to hide them. All of this is easily configurable in the tool.
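How a per-minute deep link works depends on where the audio is hosted. One widely recognized convention is the W3C Media Fragments syntax, where appending `#t=SECONDS` to a media URL asks a compatible player to start at that offset (video platforms such as YouTube use a `t` query parameter instead). The sketch below assumes the Media Fragments form and a made-up audio URL; it is an illustration, not how my tool necessarily builds its links.

```python
def minute_links(audio_url: str, duration_seconds: int) -> list[tuple[str, str]]:
    """Return (label, url) pairs, one per started minute of audio.

    Assumes the player honours W3C Media Fragments ("#t=<seconds>").
    """
    links = []
    for start in range(0, duration_seconds, 60):
        label = f"{start // 60}:00"
        links.append((label, f"{audio_url}#t={start}"))
    return links


# Hypothetical episode URL and a 3.5-minute duration.
for label, url in minute_links("https://example.com/audio/ep42.mp3", 210):
    print(label, url)
```

Because the links are generated rather than typed by hand, showing or hiding them is just a rendering switch.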
How does it help you if you publish such articles for each episode?
Give people a reason to visit your website
If you mention these transcripts at the end of each episode, some listeners will visit your website when they are trying to recall something they heard on your podcast.
If you are selling any products on your website, this also gives you an easy way to surface those to your podcast listeners.
Get more inbound links
If you have a website associated with your podcast, the website itself will get more inbound links, because people usually prefer to link to a transcript of the episode if one exists.
And if you are not being penalized by search engines for your “dissident” content, this should usually help you with SEO.
Highlight your sponsors
AssemblyAI can sometimes suggest some strange chapter headings.
This is one of those examples:
But then, when I read the article, I realized: this is actually a pretty good way to highlight your sponsors!
The article format allows you to do this - so why not take advantage? :-)
Add attribution-friendly links to the transcript
If you have sponsors for your podcast episodes, a transcript makes click attribution much easier. This is usually quite challenging with the audio format.
You can add a link to the sponsor’s website and embed your coupon or referral code into the URL directly. This will help your sponsors to attribute the purchase to you.
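Mechanically, embedding a referral or coupon code just means adding query parameters to the sponsor's URL. The parameter names a sponsor recognizes are set by the sponsor (or its affiliate platform), so `code` and `utm_source` below are placeholders, as are the URLs; `utm_source` follows the common Google Analytics convention. The sketch uses Python's urllib.parse so that any parameters already in the URL are preserved.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_referral(url: str, **params: str) -> str:
    """Return url with params merged into its query string (new values win)."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical sponsor URL and parameter names.
print(add_referral("https://sponsor.example/shop?lang=en", code="MYPODCAST", utm_source="podcast"))
# prints https://sponsor.example/shop?lang=en&code=MYPODCAST&utm_source=podcast
```

Merging into the existing query string, rather than blindly appending "?code=...", avoids producing a broken URL when the sponsor's link already has parameters.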
If you consider the transcript example from the beginning of the article, you can simply add a hyperlink with the correct URL:
Link to past episodes
If you mention past episodes, you can link to those articles directly from your transcript.
This will help increase the time that people spend on your website, and also help them quickly find more of your podcast content.
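If your episode articles follow a predictable URL pattern, mentions like "episode 123" in a transcript can be linked automatically. A minimal sketch, assuming a hypothetical `https://example.com/episodes/<number>` pattern and HTML output; you would still want to review the result, since a speaker may be referring to another show's episode.

```python
import re

# Hypothetical URL pattern for episode articles on your own site.
EPISODE_URL = "https://example.com/episodes/{number}"

# Matches "episode 123" or "Episode #123", case-insensitively.
EPISODE_MENTION = re.compile(r"\bepisode\s+#?(\d+)\b", re.IGNORECASE)


def link_episode_mentions(text: str) -> str:
    """Wrap each 'episode N' mention in an HTML link to that episode's article."""

    def to_link(match: re.Match) -> str:
        label, number = match.group(0), match.group(1)
        return f'<a href="{EPISODE_URL.format(number=number)}">{label}</a>'

    return EPISODE_MENTION.sub(to_link, text)


print(link_episode_mentions("As we discussed in Episode #12, and again in episode 40."))
```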
Link to a glossary page
Adding high-quality glossaries related to your podcast topic can help you attract inbound links to your website.
If you do have one, you can also add links to the glossary from your transcript itself.
And if you are using WordPress, there are even plugins which can auto-suggest glossary terms by analyzing all your content.
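The basic idea behind linking a transcript to a glossary is simple: link the first occurrence of each glossary term. The sketch below is a hypothetical stand-alone version (not how any particular WordPress plugin works) with an invented glossary whose keys are assumed to be lowercase. It matches whole words case-insensitively, tries longer terms first so that a multi-word term wins over a shorter term it contains, and does everything in a single pass so it never re-links text inside a link it just created.

```python
import re

# Invented glossary: lowercase term -> URL of its glossary entry.
GLOSSARY = {
    "pandemic treaty": "https://example.com/glossary/pandemic-treaty",
    "ivermectin": "https://example.com/glossary/ivermectin",
    "treaty": "https://example.com/glossary/treaty",
}


def link_glossary_terms(text: str, glossary: dict[str, str]) -> str:
    """Link the first occurrence of each glossary term (whole words, any case)."""
    if not glossary:
        return text
    # Longest terms first, so "pandemic treaty" is preferred over "treaty".
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    linked = set()

    def replace(match: re.Match) -> str:
        term = match.group(1).lower()
        if term in linked:
            return match.group(0)  # only the first occurrence gets a link
        linked.add(term)
        return f'<a href="{glossary[term]}">{match.group(0)}</a>'

    # A single substitution pass, so inserted links are never rescanned.
    return pattern.sub(replace, text)


print(link_glossary_terms("The Pandemic Treaty, the treaty, and the treaty again.", GLOSSARY))
```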
In the next part of this article, I will discuss how to create encyclopedia articles for each podcast episode, as well as some ideas on how to create decentralized search engines for the podcast collections you are interested in.
In the meantime, if you have any questions you would like me to address in the next article, please ask them in the comments.
I would say that the Encyclosphere is still a somewhat theoretical framework. The existing tools are certainly functional, but they are not yet easy to use.
You might be interested in podcastindex.org and podcasting2.org.