About this Dataroom
As a longtime redditor, I've often wondered about the formula for front-page success. Nothing I've ever submitted has even come close!
I decided to write a crawler that would use the Reddit API to gather information about the text of the front-page stories and store it in a local database for processing. After a couple of weeks, the data started to level out.
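The crawler itself isn't reproduced here. As a rough idea of the "gather and store" step, here is a minimal sketch. It assumes the crawler reads Reddit's JSON listings and keeps one row per story in SQLite. Reddit does serve listings as JSON when ".json" is appended to a page URL, but the network fetch is left out. The sample listing, the `stories` table, and `store_front_page` are hypothetical, not the author's code:

```python
import json
import sqlite3

# Hypothetical sample shaped like a Reddit "Listing" response. Real posts
# carry many more fields than the three used here.
SAMPLE_LISTING = json.loads("""
{"kind": "Listing",
 "data": {"children": [
   {"kind": "t3", "data": {"id": "abc1", "title": "First example title", "score": 1200}},
   {"kind": "t3", "data": {"id": "abc2", "title": "Second example title", "score": 950}}
 ]}}
""")


def store_front_page(conn: sqlite3.Connection, listing: dict) -> None:
    """Save each post's id, title and score, ignoring posts already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stories ("
        "id TEXT PRIMARY KEY, title TEXT NOT NULL, score INTEGER)"
    )
    rows = [
        (post["data"]["id"], post["data"]["title"], post["data"]["score"])
        for post in listing["data"]["children"]
    ]
    # INSERT OR IGNORE means repeated crawls never duplicate a story.
    conn.executemany(
        "INSERT OR IGNORE INTO stories (id, title, score) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_front_page(conn, SAMPLE_LISTING)
store_front_page(conn, SAMPLE_LISTING)  # simulate a second crawl
print(conn.execute("SELECT COUNT(*) FROM stories").fetchone()[0])  # prints 2
```

The primary key on the post id is what lets a crawler run repeatedly over a couple of weeks without inflating the counts.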
This dataset contains a bunch of fun stats I extracted from the titles, including:
- the number of characters
- number of words
- readability indices (for example, the grade-school reading level!)
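The exact readability formulas used for the dataset aren't stated. The sketch below shows one common way to compute stats like these for a single title. It uses the Flesch-Kincaid grade level together with a crude vowel-group syllable counter. Both helper names are hypothetical, and real readability tools use better syllable rules:

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, minus a likely silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def title_stats(title: str) -> dict:
    """Character count, word count and Flesch-Kincaid grade for one title."""
    words = re.findall(r"[A-Za-z0-9']+", title)
    # Most titles have no terminal punctuation, so count at least one sentence.
    sentences = max(len([s for s in re.split(r"[.!?]+", title) if s.strip()]), 1)
    n_words = max(len(words), 1)  # avoid dividing by zero on empty titles
    syllables = sum(count_syllables(w) for w in words)
    grade = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
    return {"characters": len(title), "words": len(words), "fk_grade": round(grade, 2)}


print(title_stats("The cat sat on the mat."))
# {'characters': 23, 'words': 6, 'fk_grade': -1.45}
```

Very simple text can produce a grade below zero, as it does here. Short headlines therefore tend to land at the low end of the scale.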
Click on Data Preview to see the results!
I've also included the source code to the crawler. I was surprised how easy it was to create. Check the attachments tab.
and the Bible. If you ran the Bible through the same reading-score summary, you'd get similar, or even lower, reading levels. That's a guess, anyway. (Hint: someone should do this with the Bible.)
I tried submitting it to reddit but nobody even voted it up or down. Sometimes I wonder if my account is flagged as spam and anything I submit just goes invisible.
I think @momoko has been having similar Reddit issues too.
Maybe I should ask someone else on a different network to re-submit it.
I just popped it on programming.
As I said on Reddit, I'd love to see an analysis of the actual content that appears on the front page, to find out whether there are any noticeable or surprising patterns. It's tricky to suggest that Flesch-Kincaid reading scales influence or cause viral content. But I do think lots of great stuff could be getting lost under long, unattractive headlines and poorly written summaries that push people away. Also, great writing has long been associated with short, declarative sentences, which would help boost the scores you tracked. Just think Hemingway.
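The standard Flesch formulas are given here for reference. They show why short sentences move the scores. The dataset doesn't say which readability indices it reports, so treat these as background rather than as its definitions:

```latex
\text{Reading Ease} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}

\text{Grade Level} = 0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59
```

For example, splitting a 20-word, one-sentence headline into two 10-word sentences cuts words per sentence by 10. With syllables per word unchanged, that lowers the grade level by 0.39 × 10 = 3.9 and raises Reading Ease by 1.015 × 10 = 10.15.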