The Reddit algorithm: a recap

Posted on December 10, 2013

Yesterday, I published a modest post detailing a flaw in the Reddit ranking system, the consequences, and the response to it. A few hours later, it was #1 on Reddit and #2 on Hacker News. I was not expecting this, to say the least.

Rejoinder

The Reddit developers spoke up again in the comments to explain themselves a little further. Notably, for the first time that I’m aware of, there is some indication that they do consider this a bug and might be planning to fix it at some point.

A number of helpful people also sent me links to previous discussions on this subject, some of which I had not dug up on my own. This thread contains a fairly thorough discussion of the implementation, and a proposal from a developer to change the algorithm – surprisingly, in a different way than what I believe should be done.

I want to make clear while we’re on the subject that I was never looking to be hard on the Reddit developers. I do believe that I am right and they are wrong, and I do believe that they have done a poor job of communicating the actual reasons behind this decision, whatever those reasons are. But I don’t think that the Reddit devs are jerks or idiots or anything else. Managing a community is tough work, and all the tougher when your code is open for people to pick through and criticize. I do want this issue fixed, but I wasn’t really looking to start a crusade for it¹.

Agreement

My discussion of vote gaming struck a nerve. If there’s one lesson to take away from this article’s popularity, it’s that vote manipulation is something Redditors are thinking about and are worried about - not just the programmer types, but everyone. Many people have pet theories about what kind of widespread vote manipulation is taking place². All sorts of comments poured in about this, everything from the reasonable (“I wish there were more safeguards against trolls”) to the full-blown conspiracy theories (“Wake up sheeple”). Several moderators of subreddits chimed in to say that they have struggled with the vote banishment in their subreddits.

Another kind person, rubicks, sent me a graph he created based on my article that provides an abstract representation of the function curves generated³ by the existing calculation (purple) and my proposed solution (green). It’s a powerfully intuitive way to make the argument: the existing version spikes in a strange and discontinuous way.
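The shape of those two curves can be reproduced with a minimal sketch. It assumes the hot formula as it appeared in Reddit's open-source code at the time (a base-10 log of the net score plus a time term divided by 45000), and it assumes the proposed change amounts to moving the sign from the time term to the log term. The function names and the sample timestamp are illustrative only and do not come from either codebase.

```python
from math import log10

# Assumption: divisor taken from Reddit's open-source ranking code of the era.
SECONDS_PER_ORDER = 45000


def _order_and_sign(ups, downs):
    """Return (log10 magnitude of the net score, sign of the net score)."""
    s = ups - downs
    order = log10(max(abs(s), 1))
    sign = (s > 0) - (s < 0)  # 1, 0, or -1
    return order, sign


def hot_existing(ups, downs, seconds):
    """Existing behavior (as assumed here): the sign multiplies the time term."""
    order, sign = _order_and_sign(ups, downs)
    return round(order + sign * seconds / SECONDS_PER_ORDER, 7)


def hot_proposed(ups, downs, seconds):
    """Proposed behavior (as assumed here): the sign multiplies the log term."""
    order, sign = _order_and_sign(ups, downs)
    return round(sign * order + seconds / SECONDS_PER_ORDER, 7)


if __name__ == "__main__":
    # Hypothetical post age: seconds elapsed since the ranking epoch.
    t = 250_000_000
    for ups, downs in [(10, 0), (1, 0), (0, 0), (0, 1), (0, 10)]:
        print(
            f"net {ups - downs:+3d}  "
            f"existing {hot_existing(ups, downs, t):14.4f}  "
            f"proposed {hot_proposed(ups, downs, t):14.4f}"
        )
```

Under the existing version, a post that slips from a net score of +1 to -1 drops by roughly twice the time term, which is the spike in the graph. With the sign moved to the log term, the same change barely shifts the value, and a newer post with a negative score still outranks an older one with the same score.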

Disagreement

A number of people also disagreed with my stance, proposing alternate explanations for the behavior I described. I am not entirely convinced by any of the theories I have seen so far, but some of them are interesting and I did enjoy reading them. Click through for my responses and further discussion.

That a post with more downvotes in a shorter period of time is worse because that indicates it is more hated.
That many negative votes demonstrate lots of activity on a post, and are thus still a measure of “hotness”.
That banishment helps with spam/noise control, because spam will quickly receive several downvotes.

Action

In response to my article, /r/Chicago is launching a month-long experiment disabling downvotes in their subreddit. I’ll be very interested to see how it turns out for them.

Most importantly of all, /r/BirdPics held a Puffin Day in my honor.

Puffin Day at /r/BirdPics

You guys… you really shouldn’t have. What an honor. I don’t have the words to tell you how much this means to me. No, no I’m fine. I just got something in my eye. It’s fine. Just an eyelash. Runny nose. It’s fine. Excuse me for a moment.

Corrections

A number of people helpfully pointed out that the fluctuating vote numbers I was seeing were due to vote fuzzing, an anti-spam feature. I have corrected that footnote⁴.

I linked to the wrong piece of code when discussing the bug, leading some people to believe it had been fixed in production. The link went to my fork of the Reddit code, in which I had fixed the bug, not to the official Reddit repo. I’ve now changed that link to point to a pre-fix commit to avoid confusion.

Someone else corrected my statement that Reddit has “tons of cash flowing in” by pointing out that they’re still not profitable. I haven’t amended that because that’s just mean.

Press

Anthony Wing Kosner did a solid writeup on the Forbes blog, and advances his own theory on the reasons behind the ranking behavior.

A few other writeups have happened in the business press. I won’t vouch for their quality, but here they are.

Miscellany

Did you know that Randall Munroe has taken an interest in Reddit’s ranking algorithms? Well, now you do.

Another user contends that “Controversial” is the real worst sort implementation. Some good discussion ensues.

And then there’s this story. I don’t have a lot to say about that.

¹ You should realize that I never, ever imagined that a technical post written for a technical audience on my quiet little blog would net this much attention. The absolute ceiling on my expectations was having it do well on /r/programming.

² Due to the uncertainty introduced by vote fuzzing, I assume these theories are mostly speculation rather than hard observation. However, some of the theories are, like mine, quite plausible.

³ Don’t look at the actual numbers on the graph, as they won’t meet up. It’s the shape of the curves that can help visualize how scores will relate.

⁴ These corrections also provide an answer as to whether anyone reads the footnotes.