This is What Data Cleaning Looks Like | Data Collective: Blog
Recent Activity
heyo, I actually found the original blog post for which they compiled this data ... adding to Articles tab now.
Hmm, I can't find a description of primary or secondary fraction in the article. Also, can't find the data. Unless the data is the stuff being sold in the Infochimps marketplace for $1? Namely: http://www.infochimps.com/datasets/candidate-wordbags ?
Hey.. sorry about that.. it's been a while since I worked with this data :-) Let me try and dig back through what I did and explain a bit more about what exactly the data means...
Btw.. that data set at Infochimps is the *raw* wordbag data -- I've made it free now, so feel free to download.
Here's what I did..
Using the candidate list to split by party:
http://www.infochimps.com/datasets/2010-political-candidates
I then ran an analysis of word incidence in tweets that contained the full name of either a Republican or a Democratic candidate.
"primary fraction" and "secondary fraction" were actually "democrats" and "republicans" -- sorry, that's confusing, and I should probably just get rid of those two columns. The reason I made a separate "democrats" and "republicans" column was so that it would plot nicely on a bar graph (so.. the "democrats" column is just a negative value of the "primary" column).
The p/s is just a derived column, dividing primary/(primary+secondary) to try and get an interesting sort of the data -- so, low values of p/s mean that they are more republican-ish terms, and values closer to 1 mean that those are more democrat terms, and p/s of 0.5 means that the terms are relatively equal in their incidence by party.
So, in plain English.. the term "god" is in 0.185% of tweets that mention a republican name, and 0.038% of tweets that mention a democrat name.
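For anyone who wants to check the p/s column by hand, here is a rough sketch of the arithmetic described above, using the "god" figures. The function and variable names are made up for illustration and are not taken from the original analysis code.

```python
def ps_ratio(primary: float, secondary: float) -> float:
    """Share of a term's combined incidence that falls on the primary (Democrat) side."""
    total = primary + secondary
    if total == 0:
        raise ValueError("term does not occur in either set of tweets")
    return primary / total


# Figures quoted above for "god": percent of tweets mentioning a candidate name
god_democrat = 0.038    # "primary fraction"
god_republican = 0.185  # "secondary fraction"

ratio = ps_ratio(god_democrat, god_republican)

# The separate "democrats" column is the primary value negated,
# so the two parties plot on opposite sides of a bar graph.
bar_value_democrats = -god_democrat
bar_value_republicans = god_republican

print(f"p/s for 'god': {ratio:.3f}")
```

That works out to roughly 0.17, well below 0.5, which matches the reading that "god" is a more republican-ish term in this data.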
Can you give a link to the source data? One place to put it would be in the Overview.
(You can use Markdown (http://en.wikipedia.org/wiki/Markdown#Links) to make the links look like normal web links)
I would love to add the current population, % of world pop in 2011, % of world pop projected in 2100, and implied annual growth rate to this data. Once I have the original source that'll be a little easier. The Google Fusion map is really cool, it'll be fun to see another one with the annual growth rates.
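In case it helps, here is a minimal sketch of how the implied annual growth rate and the share of world population could be derived once the source figures are in hand. It assumes simple compound growth between two dates; the function names and the sample numbers are placeholders, not values from the dataset.

```python
def implied_annual_growth(start_pop: float, end_pop: float, years: float) -> float:
    """Constant yearly growth rate that takes start_pop to end_pop over `years` years."""
    if start_pop <= 0 or end_pop <= 0 or years <= 0:
        raise ValueError("populations and years must be positive")
    return (end_pop / start_pop) ** (1.0 / years) - 1.0


def share_of_world(pop: float, world_pop: float) -> float:
    """Population as a percentage of the world total."""
    if world_pop <= 0:
        raise ValueError("world population must be positive")
    return 100.0 * pop / world_pop


# Placeholder numbers only: a country going from 100 to 200 (any unit)
# between 2011 and 2100, against a made-up world total of 7000 in 2011.
rate = implied_annual_growth(100.0, 200.0, 2100 - 2011)
share_2011 = share_of_world(100.0, 7000.0)

print(f"implied annual growth: {rate:.4%}")
print(f"share of world population in 2011: {share_2011:.2f}%")
```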
Very interesting, thanks!
Done! Thanks for the tip. I created the visualisation in order to write the following guide, rather than for a story on population predictions
http://www.journalism.co.uk/skills/how-to-get-started-using-google-fusion-tables/s7/a544215/
Smallest gini award goes to Beijing! Can someone who has lived in Beijing comment on the plausibility of this?
Also, one other factor: it's almost nonsensical to compare income Ginis with consumption Ginis. You can get a general picture, but that's another factor that goes into it.
I noticed this too. I'm actually a bit surprised there are so few over such a long period of time. I wonder how many people actually ride motorcycles in Toronto ...
Oooh, just realized that this is a clone from azad2002. Why was it cloned? What's different about it? I wish that azad2002 saw the viz and article that I wrote about all this.
I know, I noticed that! I cloned it because I was going to try playing around with it later, but I haven't altered anything yet. If you like you can post your viz and article to Azad's original and I'll delete this one ...
I could do that, but anyway I'm more interested in the general implications of cloning and what that does to the stream of comments, articles and visualizations that are linked to the clone or the original. For now, I'll just be careful to comment on the original from now on.
Sure — our cloning/version control is in active development, so keep your eyes peeled for what we deploy at commercial launch :)
oooh didn't know it was cloned. I had a link to the original file (with viz!) in my google docs for more than a year now and it was the first thing I threw up on buzzdata. Should have searched to see if anyone else had done it before me.
oh it was cloned _from_ me. I see. Also take a look at this amazing motion chart that was made by the original compiler of the data: https://spreadsheets.google.com/spreadsheet/ccc?key=0Aq3K-CZwPWxOdEx2SXh4VDFDYzA5UWhYczlCdWZ2UGc&hl=en_US#gid=15
Very cool!
Nice job!
What do primary fraction and secondary fraction mean? Can you give a link to the code or any other explanation of how this was gathered?