Reddit Is A Government Experiment
“Where any answer is possible, all answers are meaningless.”
Isaac Asimov, The Road to Infinity
Reddit is intended as a blank canvas for human interaction.
The purpose of this is to generate a massive amount of written content that helps in creating an advanced AI chatbot that learns from Reddit users, in order to realistically mimic human conversation across a wide range of topics.
Up until now, the majority of my posts have been firmly grounded in well-intentioned research and fact. Those that have not been, have been posts that focused on some sort of abstract ideas about life and society.
Today’s post will change all that.
When I made this blog, I made it with three goals in mind: to learn the basics of creating and maintaining a website, to practice my writing skills by drafting weekly articles, and to create content that I would desire to consume. Thus far, I believe I’ve succeeded in these three simple goals—but I have not taken the final goal as far as I would like.
Everything from the name of the website, to the typography, clean graphic design, straightforward user interface, layout of the posts, and so on has emulated what I have found most appealing in the various sites and blogs I’ve browsed over the years. The topics and posts all have been in areas of high interest to me, and many have stemmed from deep discussions with peers.
Even the length of the posts, clocking in at an average of about 2,800 words, is around the ideal length that I like to consume. Many people have offered the feedback that they would enjoy my writing more if it were shorter. I appreciate the feedback as always, but I personally like reading articles that go in depth and make me lose myself in the number of links and analysis.
So what exactly have I been holding back from my personal wishlist? The conspiratorial aspect.
Some of my friends may already know this from experience, but I absolutely love reading, watching, and discussing anything related to conspiracies, from the outlandish to the proven. A lot of the time, these musings are simply for fun and serve as thought exercises. Other times, I genuinely believe that the surface-level answer is unsatisfactory, and that some grander scheme lies behind the scenes.
I would have liked to make a conspiratorial post earlier in my blog, but I knew I couldn’t just force it. I had to find the right topic, one that struck a balance between having some sort of evidence that could be loosely connected and being out there in terms of shock factor: the hallmarks of my favorite kind of conspiracies. These are the ones that require a certain degree of suspension of disbelief, but are still plausible if you truly have a skeptic’s mind.
After a short conversation I had with one of my good friends José on Friday, I knew I had stumbled upon the perfect topic to break the ice with my readers. It also started the way most great conspiracies start—with an uncomfortable observation about something relatively mundane.
I came home early from work on Friday, August 31st, eager to start my Labor Day weekend.
I took advantage of the extra free time to run a few errands, including tackling my laundry pile. As the evening was winding down, and I was sitting on my couch waiting for the final dry cycle to run out, I did what anyone else mindlessly does: cycle through my various apps. As per usual, I ended up on Reddit, scrolling through my favorite subreddits before heading over to the front page.
The third post from the top was this one, about a former actress who starred on ER and who had been killed by LAPD after suffering a mental breakdown and threatening officers with a BB gun. Even though I had never heard of this woman before, I was intrigued by the story and clicked on the article before heading over to the Reddit comments.
This is where the fun began.
I’ve included screenshots of the top comment thread, which is the one that caught my attention.
I’m curious as to your reactions to this line of comments. I imagine it would vary primarily on your familiarity with Reddit.
At first glance, it may seem like there is nothing amiss about these comments.
The parent comment offers a quick summary of the article, and the child comments follow a natural progression through the various topics that relate to the event: mental health and suicide. There are mentions of details, such as the BB gun that provoked police officers to shoot her, and the insurance ramifications of dying as a result of violent actions.
Overall, the discussion seems organic.
But what about the tone and syntax of the comments? This specific progression of comments flows so well—it almost seems like they were written by one person, or a coordinated team of people.
The comments set one another up, as if each “individual user” is intentionally making a point that leads to another person’s response.
I’d be hard pressed to find an organic comment thread off of Reddit that appeared as scripted as this one.
Let’s take a look at the top comment thread on KTLA 5’s Facebook post about the incident.
Yeah, these comments get pretty ridiculous. That’s the point though.
You can very much envision this sort of discussion happening in real life. You also see the clear distinctions between the various users/individuals, and there is no mistaking who wrote what.
In the Reddit thread, if I presented you all the comments as plain sentences, and asked you if they were written by one person, you would quite possibly answer in the affirmative. That would simply not happen with the comment thread above.
As a rebuttal, you might point out (as José did), that Reddit just seems to attract this sort of user: the geeky fact-finding individual who likes to participate civilly in online discussions about random topics. That would explain the overly proper content of the comments.
Also, the kind of people who participate in online discussions differ from the broader userbase that simply views and/or upvotes content. If you remember my article that discussed the Pareto Principle, only about 20% of an internet site’s userbase creates 80% of the content, meaning the posters and commenters are not fully representative of the average Reddit user.
In addition, you might point out that Reddit lends itself to this monotony of comments given its anonymity and facelessness. Facebook users have to use real names, and their profile pictures are clearly displayed.
To the first point, I can only answer that Reddit’s demographic is similar to other online forums’. The average Reddit user is a straight white male aged 18-25. The age seems to be in line with (if a bit younger than) Facebook’s demographics, as do the sexual orientation and race. The only marked difference is the fact that Reddit is overwhelmingly male when compared to other social networks.
To the second point, I say we look at Twitter (a site that also allows and encourages anonymity) to see if the content compares.
Let’s examine the top comment thread on The Hollywood Reporter’s tweet about the story.
Again, like in the Facebook example, the Twitter comments each have a distinctive personality, unlike the Reddit comments that give off an artificial or forced vibe. Given that four users within the thread have pseudonyms, it’s safe to say that anonymity should not affect the online voice of a person.
So what gives? You found a weird comment thread on Reddit, what’s the importance?
The thing is: this is not a one-off thing. This is a trend on Reddit:
Go ahead and visit the Reddit front page for yourself. If you have an account, either sign out or visit the site in Incognito mode to make sure you get a fresh unfiltered page.
Now I’m not saying all the posts have suspicious comment threads such as this, or that they’re always the top comments. Sure, I’m aware of my cherry picking.
My observation is simply this: Reddit has an unusually high number of human-like users making comments that are similar in tone and syntax.
It’s almost as if the comments are being made by advanced bots.
I told you this week’s post was going to take you on a wild ride.
What am I getting at exactly? You may have figured out where this is going by the post title, summary, or the lead up in the previous sections.
Let me spell it out loud and clear: Reddit was created by or with the help of government researchers, in order to build a comprehensive reverse honeypot.
I’ll continue to flesh out the theory before jumping into the evidence.
The purpose of creating this site is to craft and experiment with an advanced AI that can mimic human conversations.
The more real human content that is posted to the site, the better the overarching AI gets at creating believable content.
These strange comment threads are displays of how proficient the AI has gotten, along with its limitations. Each of those users is simply a sub-bot making comments that come straight from the central AI’s database.
The AI is capable of making posts itself, simply by crawling the web for trending articles and then linking them—or even making text posts, and question threads.
From there, it scans the posted article or video and uses the content to populate the post with comments, and formulate responses to these comments.
Hypothetically, you could stumble upon a Reddit post entirely created by and filled with comments all originating from this mothership AI source—which uses individual bots to accomplish its task.
More likely than not, however, the AI prefers to sit back and simply engage in part of the discussion. That could range from just making the post and then digesting the comments, to making a singular comment and closely learning from the responses and/or vote score.
This explains the consistently weird syntax and tone of the comments, which is a limitation of the AI. It is hardest to program a human’s personality, but being able to properly structure sentences and thoughts, and keep them relevant is not out of the bounds of being achievable.
As you can imagine, the more the AI crawls the internet, and then also observes the organic Reddit content, the better it gets at tweaking its algorithms to think and resemble human thought and speech.
What might that algorithm look like?
The data it has on a specific post can resemble a word frequency graph like this (taken from President Obama’s 2015 State of the Union):
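For the curious, a word frequency count of this kind can be sketched in a few lines of Python. This is only a rough illustration: the function name, the tiny stop-word list, and the sample sentence are all made up for the example and are not taken from the speech or from any real system.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real analysis would filter out many more filler words.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "we", "our"}


def word_frequencies(text, top_n=5):
    """Return the top_n most common non-stop words as (word, count) pairs."""
    # Lowercase the text and pull out runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up sample text, standing in for an article or a speech transcript.
sample = "The economy is growing. Our economy needs jobs, and jobs need the economy."
print(word_frequencies(sample, top_n=2))  # [('economy', 3), ('jobs', 2)]
```

The counts are what a frequency graph would plot: the more often a word appears, the bigger its bar.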
However, this would not be enough to transition to actually parsing an article and coming up with comments that made sense given the article’s content.
For that, you’d need lots of data, and you’d need to take the analysis further than word frequency.
The AI would have to be programmed to map our humanwide idea networks.
Here is an example of an idea network:
The fact that this network makes some strange associations and seemingly non-sensical conclusions is a perfect way to demonstrate how the AI is limited, in the same way that certain people with developmental disorders struggle to interact well with others—they get what we are saying to a certain extent, but may veer off topic or take the conversation down a weird path.
Over time, the AI can only grow smarter and smarter, as it parses hundreds of billions of words and videos to continue expanding its simulated human neural network. Perhaps now you can see why Reddit provides the perfect classroom.
The site is primarily text based, and is driven by user engagement. It is also targeted to more educated users, meaning that the content posted will be easier to parse than let’s say, immature YouTube comments or Instagram comments containing emojis.
The site has hardly been re-designed since its inception; as a matter of fact, the recent April 2018 re-design was the first in over a decade.
It is clear the site (even with the new re-design) was intended to be information heavy. Text is always primary, and the posts themselves prioritize discussion over all else. The interface is more straightforward for scanning and parsing than the complexity of Twitter or Facebook.
Here are screenshots that display the aforementioned points:
The design of Reddit is reminiscent of forums and chat rooms of the early internet—dating back to when the internet was primarily being used by the military and government researchers.
Everything about Reddit seems to be geared towards encouraging an AI to feast on human conversation in order to gain more knowledge about our interactions.
Learning is not enough, however; the AI has to put what it learns into practice in a real-world environment in order to make concrete progress.
Reddit’s design also helps in this. By placing comments front and center, and incorporating a voting system, the AI now has a laboratory to conduct experiments.
Early on, the sub-bots would likely post non-sensical comments that were quickly downvoted. But as the AI grew smarter, and made its sub-bots post more intelligible comments, it started receiving more upvotes and responses.
This positive feedback loop is what allows the AI to increase its overall intelligence, and skillful creation of content.
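To make the idea of a vote-driven feedback loop concrete, here is a toy sketch. It swaps in a standard technique, an epsilon-greedy bandit, purely as a stand-in: nothing suggests Reddit or anyone else runs this. The "comment styles", their upvote rates, and every name in the code are hypothetical.

```python
import random


def run_feedback_loop(upvote_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate picking comment styles and learning from upvotes.

    upvote_rates holds the (hidden, made-up) chance that each style earns
    an upvote. Returns the learned estimates and how often each style ran.
    """
    rng = random.Random(seed)
    n = len(upvote_rates)
    pulls = [0] * n        # how many times each style was posted
    estimates = [0.0] * n  # running average of upvotes per style

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: occasionally try a random style.
            style = rng.randrange(n)
        else:
            # Exploit: post in the style that has scored best so far.
            style = max(range(n), key=lambda i: estimates[i])

        # Simulated audience reaction: 1 for an upvote, 0 otherwise.
        reward = 1 if rng.random() < upvote_rates[style] else 0

        # Incremental update of the running average for that style.
        pulls[style] += 1
        estimates[style] += (reward - estimates[style]) / pulls[style]

    return estimates, pulls


# Three hypothetical styles; the last is the one readers like most.
estimates, pulls = run_feedback_loop([0.2, 0.5, 0.8])
print(estimates, pulls)
```

Over enough rounds the loop ends up posting mostly in the style that earns the most upvotes, which is the same "more upvotes, more of that" dynamic described above.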
It gives us real human users more of what we deem to be not only realistic sounding, but worthy of consumption and discussion. Those are the two criteria it operates by, in order of importance: how human-sounding it is, and how entertaining/insightful it is.
Especially if this is being designed by academics, one can imagine the experimental procedures that can be employed. You can tweak an inordinate number of variables, including: username, time of post, order of comments, specific syntax details (placement of commas, capitalization, etc.). This can all be done across roughly the same sample size.
The experiments can also be run using identical posts—given that people on Reddit are accustomed to reposts. This means you can test out a given post and/or comment with several trials in true experimental fashion.
What about those typos in the comments above? I can only imagine that the AI has picked up on our tendency to commit typos, and therefore mirrors our errors occasionally to appear more believable.
The reason that the AI chooses to operate with sub-bots makes sense, particularly if it employs a sub-reddit specific approach. What I mean by this is the fact that the AI can make a bunch of users, and let them loose across not only front page posts—but across subreddits, in order to further expand its knowledge across niches.
The sub-bots can easily integrate themselves into these smaller communities as their knowledge grows more specialized, and they can also be tweaked individually if they start going awry.
Picture this scenario: perhaps several r/Fortnite sub-bots begin mimicking the user base too intensely, over engaging in memes, and the AI senses this as they begin to get downvoted and reported more. Therefore, it can suspend the accounts from posting momentarily until it tweaks the algorithms and sets them back loose in the wild to test their validity.
In this way, we human Reddit users not only collectively serve as passive teachers to the AI, but also serve as active critics. This entire process happens without our knowledge, which is the genius of it.
Now you know the what and how—but the why? Why would the government have any incentive to design such a venture, or why would Reddit founders/employees and venture capitalists want to assist such an operation?
To answer the government’s role, the response is that of the typical conspiracist: to gather as much data as possible from our population, in order to weaponize it in the form of bots that mimic us.
Think of the power that can be unleashed if the AI reaches a form that makes it almost indistinguishable from regular humans. As we’ve seen with Russian interference in 2016’s election, it does not take a sophisticated operation to wreak havoc. These sub-bots can be created and dispersed throughout all corners of the internet in order to sow discord or gather further information from selective communities, all with fairly little human involvement (and thus deniability and scale).
To answer the question regarding the role of the private sector: data is gold. Much like how the government would love to collect and catalog as much data as possible from all of its citizens in order to keep us in line, the private sector wants just as much data in order to shove its consumerist agenda down our throats.
Reddit is notorious for having a tough time generating advertising revenue comparable to its social media peers. Like Twitter, it primarily struggles with the lack of information it has on individual user profiles, which thereby limits the targeting that companies and agencies can do. After all, you can make a Reddit account with nothing more than an email and a fake name.
But what if there was another way to target ads, one that was just as specific and native, as opposed to broad and sponsored? It’s not like we haven’t already seen it in the past presidential election. I am of course talking about astroturfing, the practice of promotion through seemingly organic channels.
Now you can see how an AI with over a decade of Reddit-specific experience and knowledge would be priceless to advertisers, along with its sub-bots that have accumulated millions of genuine upvotes, comments, and posts. Entire products and services (and even thought patterns) could be discreetly, subliminally, and gradually implanted into users.
Reddit’s sub-reddits would solve the targeting problem, as the site is famed for its iconic organization into these sub-communities that range from extremely wide (r/WorldNews) to extremely niche (r/BreadStapledToTrees). Even the most non-sensical or silly subreddits would be more valuable than an inch of Facebook timeline space, given the sponsored advertising fatigue our society is currently facing.
On a larger scale, the detailed idea networks and the word frequency graphs that the AI has collected and generated could be packaged and sold through some sort of data licensing deal, to help marketers and entrepreneurs do product research. The government would also love this for the sake of having more up to date and youth targeted opinion polling.
Well, I have now presented to you a pretty solid overview of what I believe is the grand conspiracy behind Reddit. Everything thus far has been pure conjecture and hypothesized from thin air, albeit with some logical progression.
Time for me to show you the money.
Let me be upfront, lest you get your hopes up.
I do not have much concrete evidence to back up my theory. Everything I have is mostly circumstantial, but again—I’m making this post mainly for entertainment, and not to legitimately peddle a serious uncovering effort.
That being said, I still do have some clues that can justify my claims.
The first of which relates to the founding of Reddit. Like many other social media networks, Reddit had to contend with the classic chicken-and-egg problem: how do you get people to engage with content on your website… if there is no content or users to begin with?
Their solution was this:
If we take Steve Huffman’s words at face value, they certainly are plausible. But imagine how much easier an AI would make their jobs.
Instead of getting all the founders and early employees to make fake accounts, keep track of the log-ins, switch in and out, and collectively draft posts, a team of bots could be set loose to start populating the page.
And the way Steve phrases it, it seems like that’s exactly what it turned into. Over the course of a few months, the founders only had to monitor the bots and tweak the AI’s algorithm, as the training wheels were off and they could easily fill pages with organic-looking content that attracted real human users.
One of the more damning pieces of evidence I have concerns a particularly bizarre corner of the site—a subreddit called r/SubredditSimulator.
It is a subreddit composed entirely of bots that represent the average user of a particular subreddit. The posts and comments are all made by these bots, and human users are not allowed to participate in any way other than voting.
Yeah… this sounds a lot like what I’ve been literally describing for this entire post.
Here’s a sample post and comments:
As indicated by their usernames, you can see which bots are representing which subreddits.
The methodology of these bots is similar to that of your phone’s predictive keyboard: they use what is called a Markov chain, a model built from the text found in a subreddit that picks each next word based on the words that came before it.
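A bare-bones version of the Markov chain idea can be sketched as follows. This is a simplified, word-level illustration under my own assumptions, not the actual code behind those bots; the function names and the toy corpus are invented for the example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_words=10, rng=None):
    """Walk the chain from start, picking each next word at random."""
    rng = rng or random.Random()
    word = start
    output = [word]
    for _ in range(max_words - 1):
        followers = chain.get(word)
        if not followers:
            # Dead end: this word was never followed by anything in the corpus.
            break
        # Words that followed more often appear more often in the list,
        # so they are proportionally more likely to be chosen.
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Toy corpus standing in for a subreddit's worth of comments.
corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, "the", max_words=8, rng=random.Random(0)))
```

Every pair of neighboring words in the output appeared somewhere in the corpus, which is why the results tend to sound locally plausible while drifting off topic over a full sentence.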
If your alarm bells aren’t going off already, let me make it clear.
This is clear proof that a rudimentary version of what I am describing is plausible, and it exists in living form in front of our eyes.
But why would this be a publicly displayed subreddit?
That I do find odd, but perhaps this is just a demonstration of the very first rough algorithms that the AI employed early on in Reddit’s existence, and is intended as a red herring to pretend as if it is beyond our capabilities to build something more advanced.
As for the final piece of evidence, I would like to point you towards the unusual circumstances surrounding the death of one of Reddit’s co-founders.
Although he technically was not there at Reddit’s inception, Y Combinator CEO Paul Graham did grant Aaron Swartz the title of co-founder when his venture merged with Steve Huffman and Alexis Ohanian’s Reddit.
Swartz was considered an internet prodigy, and a hacktivist first and foremost.
He killed himself weeks before his trial, where he was charged with stealing scholarly articles off of JSTOR in order to distribute them for free to students across the US.
Again, even though he technically did not help originate Reddit, he was involved in its key development during its early days—but was fired unexpectedly with little explanation.
Is it possible that Swartz realized the nefarious intentions of the site, and tried to speak up before being ousted? Was he planning on revealing further details before being charged, and ultimately perhaps bullied into committing suicide?
We never will know, and out of respect for his family and legacy, perhaps it’s best that we keep the speculation to a minimum.
One way or another, there is something going on behind the scenes of the self-proclaimed “front page of the internet,” which is being run by the so-called “mayor of the internet.”
Or maybe, we should utilize Occam’s razor and accept that Reddit is just full of the same type of users.
EDIT (Dec. 8, 2018):
I just came across this Reddit post that discusses some of the points I made in my article.