The Reddit algorithm: a recap
Yesterday, I published a modest post detailing a flaw in the Reddit ranking system, the consequences, and the response to it. A few hours later, it was #1 on Reddit and #2 on Hacker News. I was not expecting this, to say the least.
The Reddit developers spoke up again in the comments to explain themselves a little further. Notably, for the first time that I’m aware of, there is some indication that they do consider this a bug and might be planning to fix it at some point.
A number of helpful people also sent me links to previous discussions on this subject, some of which I had not dug up on my own. This thread contains a fairly thorough discussion of the implementation, and a proposal from a developer to change the algorithm – surprisingly, in a different way than what I believe should be done.
I want to make clear while we’re on the subject that I was never looking to be hard on the Reddit developers. I do believe that I am right and they are wrong, and I do believe that they have done a poor job of communicating the actual reasons behind this decision, whatever those reasons are. But I don’t think that the Reddit devs are jerks or idiots or anything else. Managing a community is tough work, and all the tougher when your code is open for people to pick through and criticize. I do want this issue fixed, but I wasn’t really looking to start a crusade for it[1].
My discussion of vote gaming struck a nerve. If there’s one lesson to take away from this article’s popularity, it’s that vote manipulation is something Redditors are thinking about and are worried about – not just the programmer types, but everyone. Many people have pet theories about what kind of widespread vote manipulation is taking place[2]. All sorts of comments poured in about this, everything from the reasonable (“I wish there were more safeguards against trolls”) to the full-blown conspiracy theories (“Wake up sheeple”). Several moderators of subreddits chimed in to say that they have struggled with the vote banishment in their subreddits.
Another kind person, rubicks, sent me a graph he created based on my article that provides an abstract representation of the function curves generated[3] by the existing calculation (purple) and my proposed solution (green). It’s a powerfully intuitive way to make the argument – the existing version spikes in a strange and discontinuous way.
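For anyone who would rather poke at numbers than squint at curves, here is a minimal sketch of the two calculations. It is written from memory of the publicly available Reddit ranking code, not copied from it, so treat the constants (`EPOCH_OFFSET`, `DECAY`) and the exact form of both functions as assumptions. The names `hot_existing` and `hot_proposed` are made up for this illustration. The only difference between them is which term the sign of the score gets multiplied into.

```python
from math import log10

# Assumed constants, recalled from Reddit's open-source ranking code.
EPOCH_OFFSET = 1134028003  # seconds subtracted from the submission timestamp
DECAY = 45000              # seconds of age that are "worth" one order of magnitude


def _parts(ups, downs, submitted_at):
    """Split a post into the pieces both formulas share."""
    score = ups - downs
    order = log10(max(abs(score), 1))    # magnitude of the net score
    sign = (score > 0) - (score < 0)     # 1, 0, or -1
    seconds = submitted_at - EPOCH_OFFSET
    return order, sign, seconds


def hot_existing(ups, downs, submitted_at):
    """Sketch of the existing behavior: the sign flips the *time* term."""
    order, sign, seconds = _parts(ups, downs, submitted_at)
    return round(order + sign * seconds / DECAY, 7)


def hot_proposed(ups, downs, submitted_at):
    """Sketch of the proposed change: the sign flips the *score* term."""
    order, sign, seconds = _parts(ups, downs, submitted_at)
    return round(sign * order + seconds / DECAY, 7)


if __name__ == "__main__":
    t = EPOCH_OFFSET + 1000 * DECAY  # an arbitrary submission time
    for ups, downs in [(10, 0), (0, 0), (0, 1), (0, 10)]:
        print(ups - downs, hot_existing(ups, downs, t), hot_proposed(ups, downs, t))
```

Under these assumptions, a single net downvote moves a post from a large positive value to a large negative one in the existing version, while the proposed version only nudges it. The existing version also ranks an older negative post above a newer one with the same score, which is the opposite of how positive posts are treated.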
A number of people also disagreed with my stance, proposing alternate explanations for the behavior I described. I am not entirely convinced by any of the theories I have seen so far, but some of them are interesting and I did enjoy reading them. Click through for my responses and further discussion.
- That a post with more downvotes in a shorter period of time is worse because that indicates it is more hated.
- That many negative votes demonstrate lots of activity on a post, and are thus still a measure of “hotness”.
- That banishment helps with spam/noise control, because spam will quickly receive several downvotes.
Most importantly of all, /r/BirdPics held a Puffin Day in my honor.
You guys… you really shouldn’t have. What an honor. I don’t have the words to tell you how much this means to me. No, no I’m fine. I just got something in my eye. It’s fine. Just an eyelash. Runny nose. It’s fine. Excuse me for a moment.
I linked to the wrong piece of code when discussing the bug, leading some people to believe it has been fixed in production. This was my fork of the Reddit code in which I fixed it, not the official Reddit repo. I’ve now changed that link to point to a pre-fix commit so as not to be confusing.
Someone else corrected my statement that Reddit has “tons of cash flowing in” by pointing out that they’re still not profitable. I haven’t amended that because that’s just mean.
Anthony Wing Kosner did a solid writeup on the Forbes blog and advanced his own theory on the reasons behind the ranking behavior.
A few other writeups have happened in the business press. I won’t vouch for their quality, but here they are.
Did you know that Randall Munroe has taken an interest in Reddit’s ranking algorithms? Well, now you do.
Another user contends that “Controversial” is the real worst sort implementation. Some good discussion ensues.
And then there’s this story. I don’t have a lot to say about that.
[1] You should realize that I never, ever imagined that a technical post written for a technical audience on my quiet little blog would net this much attention. The absolute ceiling on my expectations was having it do well on /r/programming.
[2] Due to the uncertainty introduced by vote fuzzing, I assume these theories are mostly speculation rather than hard observation. However, some of the theories are, like mine, quite plausible.
[3] Don’t look at the actual numbers on the graph, as they won’t match up. It’s the shape of the curves that can help visualize how scores will relate.
[4] These corrections also provide an answer as to whether anyone reads the footnotes.