Deadline for contest submissions is tonight!

If you’re planning to submit to our first data-storytelling contest, the deadline is midnight EST tonight! There are specific submission guidelines (they’re easy, but you do have to follow them). The original blog post is here, and here’s a re-post of the submission guidelines:

HOW TO SUBMIT YOUR ENTRY:

Submit your project by inviting me (Momoko on BuzzData) and the original publisher of your chosen dataset (see the original blog post) to check out your project on BuzzData when it’s done.

If you know how to use BuzzData, submit as follows:

1. Clone one of the contest datasets (listed in the original blog post) directly from the publisher on BuzzData. Make it private if you don’t want people to see it until it’s ready.

2. Build your project on BuzzData (posting links and viz’s as appropriate).

3. When it’s done, note in the Overview which visualization/article/attachment is the final product(s). Then, before the deadline, invite me and the data publisher to check it out. And don’t forget to make it public if you want to show it to the world!

4. Tweet and link to your project elsewhere if you want to build interest in it (optional, but always a good idea).

If you’ve never used BuzzData before, here’s a quick video that shows how to start, build and submit your data project (don’t worry, it’s super easy!):

THE WINNER WILL BE VOTED ON AT OUR NEXT MEETUP ON OCT. 24! 

(Short notice, we know, but don’t worry: this will happen monthly.)

Want to attend BuzzData’s workshop next week? Here are the details:

Where: Room 120 at the Centre for Social Innovation, 215 Spadina Ave. (north of Queen St. West in Toronto)

When: Drop in from 6pm onward on Monday, Oct. 24. We’ll be closing up shop by 9pm.

RSVP to our Hacks/Hackers meetup group here, please!

Hope to see you there!

-Momoko Price

 (Have you tried BuzzData yet? What are you waiting for, silly?)

Visualizing Toronto’s water usage: a tutorial

Earlier this month I attended and spoke at News:Rewired, a popular digital journalism conference in London, U.K. The journalists there were top-drawer: from Reuters, the BBC, the Guardian, the Telegraph, and others. My talk, on how data curation will be key in driving digital journalism forward, appeared to resonate with quite a few people, which was great.

However, more often than not, attendees came up to me, thanked me for the talk and then prefaced their clear enthusiasm for data journalism with an almost bashful admission that they lack the data exploration, analysis and visualization skills to actually do it. 

This is no surprise. Journalism has always been a narrative craft, and largely still is. But this experience did make me think it might help to publish step-by-step tutorial posts showing how to probe and visualize data.

(Are you a data geek with a tip or tool to show off? Ping me about guest-posting or contributing to a BD tutorial)

This first data-tutorial post — a very basic one for data newbies — will begin exploring some Canadian government open data recently published on BuzzData: the City of Toronto’s water billing data over the last 11 years. A city journalist’s spidey-sense should tell them right away there’s a budding story to be had in this data, namely: 

Which wards are the most water-efficient and water-wasteful in Toronto? 

Let’s follow this 5-step process to find out.

(Want to follow the evolution of this data and stay updated on results as I go? Follow my dataset on BuzzData.)

Step 1: Get the data

First you have to get your hands on the data you want. This particular dataset is easy to get: just clone the data from the original publisher here: www.buzzdata.com/opento, and then download the xls file to your desktop. In this video I show you how to clone the data and make it private so you can build your project around it without others seeing what you’re working on:

Step 2: Pick a question to answer

In future posts we’ll get more sophisticated with our exploration and visualization. For this first exercise we’re going to pick a very specific, simple question:

“Which wards had the highest and lowest average water consumption last year?”

If you open up the dataset you downloaded, you’ll see that it actually splits water billing accounts into two types: residential and commercial. Let’s stick with residential. (Feel free to repeat this exercise on your own to find out which wards had the highest and lowest average commercial water consumption, and then see if there’s a correlation between the two types …)
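If you do try the residential-versus-commercial comparison, one common way to put a number on “is there a correlation?” is the Pearson correlation coefficient. The Python sketch below is not part of the Excel workflow in this tutorial; it assumes you have already worked out two per-ward averages, and the dictionaries and their numbers are invented for illustration, not real Toronto figures.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances; the 1/n factors cancel out, so they are omitted.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either series is constant (zero variance).
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-ward averages: invented numbers, not Toronto data.
residential = {1: 300.0, 2: 250.0, 3: 410.0}
commercial = {1: 5000.0, 2: 4200.0, 3: 6100.0}

# Only compare wards that appear in both tables, in a fixed order.
wards = sorted(residential.keys() & commercial.keys())
r = pearson([residential[w] for w in wards], [commercial[w] for w in wards])
print(f"r = {r:.3f}")
```

A value near 1 or -1 points to a strong linear relationship between the two kinds of consumption; a value near 0 suggests little linear relationship.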

Step 3: Pick your visualization method (and use K.I.S.S. — Keep It Simple, Stupid)

I’m going to make a bar chart in Excel. I know, it sounds boring, but here’s why:

To answer my question, I’m going to visualize discrete data (Toronto’s wards) and compare only one kind of value (each ward’s average residential water consumption). A fancier kind of graph would probably be less clear, because its extra bells and whistles would just add noise to the image.

However, if I wanted to highlight water consumption trends over multiple years, a line graph or time series chart would likely work best.

If I wanted to see whether neighbouring wards have similar consumption levels, a heat map using GIS data would be great. We’ll get to those in the future.

As a rule: pick the method that would best highlight the answer to your question!*

*You may not know which one works best without a little trial & error first.

Step 4: Format your data

Now it’s time to look at the data:

That’s a lot of data. Graphing this entire spreadsheet would be pointless; in fact, the graph would probably be harder to understand than the spreadsheet itself. You have to think about what information pertains to your question. I want to know which wards were the most efficient and the most wasteful last year, so I need the following data:

Average residential water consumption for each of 44 wards in the year 2010

Everything else — commercial accounts, # of accounts, total consumption, etc. — would be noise on the page. So how do we get just this data? There are lots of ways, but in this instance we’ll make a Pivot Table: 
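Under the hood, a pivot table like this boils down to filter, group and average. For anyone who would rather script this step than click through Excel, here is a minimal Python sketch. It assumes each spreadsheet row has already been read into a dict with the hypothetical keys "ward", "year", "account_type" and "consumption" (rename them to match the real column headers), and the sample rows are made up.

```python
from collections import defaultdict

def average_by_ward(rows, year=2010, account_type="Residential"):
    """Mimic a pivot table: filter rows, group them by ward, average consumption.

    Assumes hypothetical keys "ward", "year", "account_type", "consumption".
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        # Keep only the slice of the data that answers our question.
        if row["year"] != year or row["account_type"] != account_type:
            continue
        totals[row["ward"]] += row["consumption"]
        counts[row["ward"]] += 1
    # Mean of all matching rows per ward, like a pivot table's "Average of" field.
    return {ward: totals[ward] / counts[ward] for ward in totals}

# Made-up sample rows, not real Toronto billing data.
sample = [
    {"ward": 1, "year": 2010, "account_type": "Residential", "consumption": 300.0},
    {"ward": 1, "year": 2010, "account_type": "Commercial", "consumption": 5000.0},
    {"ward": 2, "year": 2010, "account_type": "Residential", "consumption": 250.0},
    {"ward": 2, "year": 2009, "account_type": "Residential", "consumption": 900.0},
]
print(average_by_ward(sample))  # {1: 300.0, 2: 250.0}
```

Averaging every matching row mirrors Excel’s “Average of” summary. If the sheet instead gives total consumption and number of accounts per row, dividing the summed totals by the summed accounts would give a per-account average that isn’t skewed by averaging averages.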

Now we have a nice Pivot Table, but we’ll want to do just a little more formatting and organization before we make our graph. In this video clip I show how to prep your table for graphing, as well as how to sort your data to get an idea of what your findings will be:

Step 5: Graph it and get your answer!

Before going on, let’s recap what I’m trying to find out here. My original question was:

“Which wards had the highest and lowest average residential water consumption last year?”

By sorting the data earlier, we already know our answer. Now we just want to visualize it. Because we already sorted and formatted our data, graphing is now a piece of cake with Excel’s chart wizard. Here’s how you do it:
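The same sort-then-graph idea can also be sketched outside Excel. The rough Python example below ranks wards by average consumption and prints a plain-text bar chart. It assumes a dict mapping ward number to average consumption (for example, the output of a pivot-style summary); the numbers are invented, and for a publishable chart you would use Excel or a plotting library such as Matplotlib rather than “#” characters.

```python
def rank_wards(averages):
    """Return (ward, average) pairs sorted from lowest to highest consumption."""
    return sorted(averages.items(), key=lambda item: item[1])

def text_bar_chart(averages, width=40):
    """Draw a rough horizontal bar chart in plain text, scaled to the largest value."""
    ranked = rank_wards(averages)
    if not ranked or ranked[-1][1] <= 0:
        return ""  # nothing sensible to draw
    largest = ranked[-1][1]
    lines = []
    for ward, value in ranked:
        bar = "#" * round(width * value / largest)
        lines.append(f"Ward {ward:>2} | {bar} {value:g}")
    return "\n".join(lines)

averages = {1: 300.0, 2: 250.0, 3: 410.0}  # made-up numbers, not real ward figures
ranked = rank_wards(averages)
print("Lowest:", ranked[0], "Highest:", ranked[-1])  # Lowest: (2, 250.0) Highest: (3, 410.0)
print(text_bar_chart(averages, width=20))
```

Sorting before charting is what makes the answer jump out: the most water-efficient ward sits at one end of the chart and the most wasteful at the other.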

        

And that’s how you make a nice clean bar chart (and start to explore data with a journalistic frame of mind). What other trends can you find in this dataset?

One more note: visualizing data one way to answer a question often prompts new questions! In this case, I can’t help but wonder whether city wards with similar water consumption levels cluster together geographically. To answer this, we’ll need to map the data, so stay tuned for the next tutorial post to learn how!

NEXT UP: Visualizing Toronto’s water consumption with GIS (geographic information systems) data. In other words, shapefiles and mapping. Woohoo!

-Momoko Price

Enjoyed this but know you could do better? Great! Ping me about guest-posting or contributing to a BD tutorial post!

Want to follow the evolution of the data as we go? Follow my dataset on BuzzData!

Data-journalism reunion, anyone?

Well, we said we’d try to do these every month, so we’re back! BuzzData’s putting on another data-journalism workshop next week. Can you make it? Be there or, well, suck! Just kidding. But you should come.

 

A lot of things have happened in the last month, most notably: BuzzData is now public (and pretty awesome, if we do say so ourselves). So in addition to having the chance to learn more data-wrangling tools, this time you’ll have the opportunity to start using BuzzData and get connected to great data journalists and hackers around the world who are already using it. 

The details:

WHAT: BuzzData’s Data-Journalism Fun-Times (Vol. 2)

WHO: You, silly. RSVP to momoko@buzzdata.com 

(Space limited to 20 attendees. Don’t worry, we’ll do another round the following week if need be.)

WHEN: Wednesday, August 17, 6pm – 9pm

WHERE: The BuzzData office – find us at 174 Spadina Ave, Suite #204 (just north of Queen and Spadina)

DETAILS: This workshop will be decidedly less intense and more focused on individual projects than last time. Needless to say, bring your laptop and power cord. We’ll teach a few tools: more Google Refine wizardry (we barely got started on GRefine last time) and an introduction to ScraperWiki. We’ll also show you how to find data and collaborate on it on BuzzData. A number of BuzzData users have already started publishing weird, wild data worth mining, so this is also a chance to practice your viz and analytical skills on clean, machine-readable, interesting data.

We also highly recommend you bring in either a) a project you’re working on, b) the basic pitch of a project you want to start, or c) a story/topic you would like to tackle from a data/quantitative angle. After all, much of data journalism is problem-solving. So pick your problem and let’s get to solving it!

Anything else? No? Great! See you there!

Data-driven journalism, done faster

From the start, we went out of our way to enlist the participation of groups and businesses for the BuzzData beta — after all, BuzzData is all about improving group collaboration around data, right?

Having said that, bringing businesses on board is no small feat for a funky, outside-the-box app like BuzzData even after a commercial launch, let alone at the beta stage. The concept of open data is still relatively new, and simple workflow tools for data wrangling and sharing are rare. Finding organizations that were hip to the movement and up for trying a new, untested digital app was, needless to say, a fun challenge.

Lucky for us, a small number of influential, forward-thinking organizations came forward to test the beta right from the start, including:

The Economist Intelligence Unit 

The Globe and Mail (Canada’s national newspaper)

Global News (Canadian broadcast and online news)

The City of Vancouver

And while the beta’s only been active for less than a week, we’ve already witnessed instances of unscripted cross-pollination between media, government and data-literate citizens. This is hugely exciting to us.

The Globe and Mail’s account in particular, hosted by Toronto Hacks/Hackers organizer and Globe mobile editor Mason Wright, has been off to a promising start, largely because Wright clearly gets the give-and-take of social networking: he posts Globe articles to other users’ datasets and makes an effort to put the Globe’s data in context with accompanying articles and visualizations.

It’s fascinating to watch this happen in the context of data. We’re so used to static catalogues and repositories that appear to move at a glacial pace. In contrast, on BuzzData you tell a user something — whether it’s your best friend or a national newspaper — and they talk back to you as a visible, dynamic, listening entity, a single degree of separation away. Not a new phenomenon to social media, certainly, but a refreshing change of pace for data communication. 

As an example, last week the Globe uploaded food price index data to accompany a recent Report on Business article. The article itself focused on short-term food prices, but New York-based beta tester David Joerg took the data and, simply by plotting it over time in Excel, uncovered a startling spike in sugar prices that no one had yet noticed:

Even Wright was surprised to see this. So the question remains: what’s driving the price inflation of sugar? Perhaps Joerg’s cursory data-viz will trigger an entirely new business investigation by the Globe in the near future. That would be incredibly cool, and a truly unique example of collaborative data journalism — one that, in an instant, transcended national boundaries and professional disciplines.

Not bad for the first five days of a beta. 

Got data? Get on the beta (like, right now!)

BuzzData’s beta is officially underway! We’re bringing a first wave of users on board today and more every few days from here on. We’re so excited (and exhausted)!

But first: We at BuzzData built this platform not just for people who are interested in data, but for people who have it. More and more people collect, use and wrangle data these days. They need a place to show off their work and collaborate with others in a way that’s truly efficient, dynamic and fun.

If you’re listening, please know: We built BuzzData for you.

We still have a lot of people to bring on board, but we’d like to let those with data onto BuzzData as soon as possible. You’ve waited long enough (and deserve better than Google Docs, for god’s sake!).

So if you’ve got data, let us know and we’ll hook you up with an account immediately. Email us directly at blog@buzzdata.com with the subject line “Got Data” and we’ll take care of you :)

Oh yeah! Wondering what BuzzData looks like? Check out the demo video for a taste: 

One last thing: if you haven’t signed up for the beta at all yet, seriously: what are you waiting for? Get on the bus, yo!

25 great links for data-lovin’ journalists

Knowing how to avoid errors like this is just one reason to love being a data journalist:

In case you missed it — everything we worked on last weekend (and plenty more)!

WORKSHOP PART 1: Intro to ScraperWiki and ManyEyes w/ Momoko Price

For the first half we worked on visualizing data with ManyEyes. We used arms import/export data courtesy of our friend and my doppelganger at ScraperWiki, data journalist Nicola Hughes:

Scraped data (see the icon that says “Download the Spreadsheet (CSV)”? Yeah, do that.):
http://scraperwiki.com/scrapers/arms_imports_database/
http://scraperwiki.com/scrapers/arms_exports_database/

If the source of the data isn’t apparent, open the scraper script (click on the tab that says “Edit”) and look for the source URL. Like so:

http://scraperwiki.com/scrapers/arms_imports_database/edit/

(Did you find the source? Good.)

You can check out a few of the visualizations we made in ManyEyes (for teaching purposes only. I don’t actually think these are great viz’s):

http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/total-arms-exporting-volume-per-na

http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/arms-importing-and-exporting-natio

http://www-958.ibm.com/software/data/cognos/manyeyes/visualizations/top-10-arms-exporting-nations-stac
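A “top 10 exporting nations” chart like the last one boils down to summing a volume column per country and keeping the biggest totals. Here is a minimal Python sketch of that step, assuming a CSV downloaded from one of the ScraperWiki scrapers whose header includes columns named “country” and “volume” (hypothetical names; check the real header row). A tiny made-up sample stands in for the real download.

```python
import csv
import io
from collections import Counter

def top_exporters(csv_text, n=10, country_col="country", value_col="volume"):
    """Sum a numeric column per country and return the n largest totals.

    The default column names are hypothetical; pass the real header names.
    """
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[value_col].strip()
        if value:  # skip blank cells instead of crashing on float("")
            totals[row[country_col]] += float(value)
    return totals.most_common(n)

# Made-up sample standing in for a downloaded CSV.
sample_csv = """country,year,volume
A,2009,10
B,2009,4
A,2010,5
C,2010,
"""
print(top_exporters(sample_csv, n=2))  # [('A', 15.0), ('B', 4.0)]
```

In practice you would read the downloaded file with open() instead of a string; the aggregated result is what you would then paste into ManyEyes (or any charting tool).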

We also used Google Refine to start cleaning up data taken from the new Canadian International Development Agency open data portal. Did you know they just launched one? Well, they did:

CIDA Database Source:
http://www.acdi-cida.gc.ca/cidaweb/cpo.nsf/fWebprojDataEn?Readform

Google Refine:
[Data manipulation and cleaning tool]
http://code.google.com/p/google-refine/wiki/Downloads

(Keep in mind, GRefine keeps track of every single alteration you make to a dataset, so don’t ever worry about doing something “wrong.” You can always go back. Version control, what an amazing thing.)

WORKSHOP PART 2: Mapping, FusionTables and FusionTables Layers with Joey Coleman

All of Joey’s workshop materials can be found on his data page:

http://data.joeycoleman.ca/

I believe he’ll be posting slides of his presentation soon …

OTHER COOL DATA-JOURNALISM REFERENCES:

Paul Bradshaw’s online journalism blog (amazing resource):
http://onlinejournalismblog.com/

NICAR-L discussion mailing list (National Institute for Computer-Assisted Reporting)
http://www.ire.org/membership/subscribe/nicar-l.html

Toronto’s open-data catalogue:
http://www1.toronto.ca/wps/portal/open_data/open_data_home?vgnextoid=b3886aa8cc819210VgnVCM10000067d60f89RCRD

Data Visualization Blogs:

Stephen Few’s Perceptual Edge
http://www.perceptualedge.com/examples.php

David McCandless’s Information is Beautiful
http://www.informationisbeautiful.net/

Doug McCune’s Adobe Flex- and ActionScript-focused blog
http://dougmccune.com/blog/


OTHER FUN STUFF (COURTESY OF DATA HACKER ROB MEDEIROS):

Google Public Data Explorer:
[Online data visualization tool]
http://www.google.com/publicdata/home

R Project
[statistics and visualization tool]
http://www.r-project.org/

SQLite
[Small, fast, embeddable SQL database]
http://sqlite.org

Matplotlib
[Python graphing and visualization]
http://matplotlib.sourceforge.net/

OpenDX
[Hard-core old skool data visualization tool]
http://www.opendx.org

Blender
[3-D modelling and rendering application; scriptable w/ Python; great for 3-D static or interactive visualizations]
http://www.blender.org/

NumPy
[Scientific computing package for Python; fun w/ numbers]
http://numpy.scipy.org/

GNU Octave
[Largely MATLAB-compatible numerical computing language; great for numerical calculations, visualizations]
http://www.gnu.org/software/octave/

Linked Data
[Slightly esoteric vision of the future web in which data is much easier to get and work with]
http://linkeddata.org

Semantic Web
[Official home of the future, data-centric web]
http://www.w3.org/standards/semanticweb/

REQUIRED READING

Run, don’t walk, to the nearest bookstore and buy anything written by Edward Tufte, e.g.:

* The Visual Display of Quantitative Information
http://www.amazon.com/Visual-Display-Quantitative-Information/dp/0961392142/

* Envisioning Information
http://www.amazon.com/Envisioning-Information-Edward-R-Tufte/dp/0961392118/

* Visual Explanations: Images and Quantities, Evidence and Narrative
http://www.amazon.com/Visual-Explanations-Quantities-Evidence-Narrative/dp/0961392126/

* Beautiful Evidence
http://www.amazon.com/Beautiful-Evidence-Edward-R-Tufte/dp/0961392177/

* Visual & Statistical Thinking: Displays of Evidence for Decision Making
http://www.amazon.com/Visual-Statistical-Thinking-Displays-Evidence/dp/0961392134/

BuzzData gets hands-on with Hacks/Hackers

Heyooo! We put on our first data journalism workshop last Saturday at the Centre for Social Innovation in Toronto and it went grrrrrrrrrrreat!

Still can’t believe we had a full house despite the incredible weekend weather. We attracted a horde of enthusiastic, geeky hacks and hackers ready to learn some new skills.

Special props go to mapping workshop presenter Joey Coleman and the Spectator’s open-data reporter Bill Dunphy, who commuted in from Hamilton. We were glad to see The Hammer represent; they have admirably strong, active digital journalism and civic engagement circles. 

Although a number of people had to go home after the first half (understandable; six hours of full-on geekery is a tall order on a sunny Saturday), we still had a pretty full room all the way to the end. Here, Joey Coleman (centre left) leads us even deeper down the rabbit hole of hacker journalism.

Clearly in the zone. 

Nerd Alert! (Pete Forde would correct me: “Ahem, that’s ‘Geek Alert’ ” …)

The collaborative energy on Saturday was really exceptional. In light of the trouble Open File Toronto has had recently getting FOI data from Toronto Police Services in digital format (the TPS mailed hundreds of pages of data, refusing to hand over a digital version), we asked everyone to lend a hand with some manual data entry, grassroots style, and they all pitched in. 

[UPDATE: I’ve made a note to contact Carole Moore, chief librarian at U of T and colleague of Open Library/Internet visionary Brewster Kahle, for OCR software recommendations to circumvent bureaucratic blocks like this in the future. Stay tuned!]

One data enthusiast and freelance hacker in attendance was especially helpful: Robert Medeiros took our original workshop reference list and expanded it into a veritable treasure trove of data-driven journalism resources.

We’re planning another workshop in early August — if you want in, by all means let me know at momoko@buzzdata.com and I’ll get you plugged in right-quick. 

BD hosts data journalism workshop July 9

Back from an enlightening trip to Berlin; now to plow ahead with more data-related fun!

We had such a good time presenting BuzzData to Toronto’s Hacks/Hackers community back in May that we decided to get even more involved: with the support of the #hhTO organizers, BuzzData will be hosting a hands-on data-journalism workshop here in downtown Toronto!

Details below (sign up here):

A hands-on data-journalism workshop!

Start learning how to wrangle data and turn it into stories!

This first workshop will focus on cleaning and augmenting data with the power of Google Refine, mapping data with the easy-click magic of Fusion Tables, and visualizing in myriad ways with ManyEyes. 

Fusion Tables/Mapping: led by Joey Coleman (OpenHamilton)

Google Refine/ManyEyes: led by Momoko Price (BuzzData)

Got ideas for data stories? Bring them! Share them! Let’s brainstorm and get started together. Are you a wizard with web tools? Come impart your wisdom to an eager group. Sitting on a mountain of dirty, poorly-formatted data? Bring it and crowdsource that sh*t already!

Who: Hacks/Hackers Toronto and BuzzData invite anyone — hack or hacker — who wants to get serious about telling stories with data to come on down. 

Where: The Centre for Social Innovation, 215 Spadina Avenue (in the Alterna Boardroom on the 4th Floor)

When: Saturday, July 9, from 11am to 5pm

How much: Free! (how can you say no?) 

Attendance limit: 35 spots — if it fills up, let us know that you were interested; we can always set up another workshop later!

Want to join in the fun? Sign up!

http://meetupto.hackshackers.com/events/24252121/

BuzzData to present at Hacks/Hackers!

Mark your calendars, y’all: BuzzData will be hitting up Radiolaria (1166 Dundas St. W., Toronto) on Monday evening (May 30, 7pm) to present our vision and a demonstration of the BuzzData app at Toronto’s Hacks/Hackers meetup. We’ll also talk about how we hope BuzzData will change the landscape of data journalism and open data for the better.

In the last few months we’ve gotten invaluable support and advice from knowledgeable members of the U.K.-based Open Knowledge Foundation, Creative Commons, and other policy and data experts, both Canadian and international. Because of this, we’re really excited to finally share what we’ve been working on and explain how we’re different from other data startups, and why different is good.

One of the best things about BuzzData is how inclusive and community-focused the app is. From the start, BuzzData has been guided by a vision that data can be thought-provoking and accessible to, as well as consumed and created by, people of all professional fields and technical skill levels. If you can upload and update on Facebook, you can use BuzzData.

Not only is BuzzData a great tool with which data professionals and hacker enthusiasts can showcase their work and discuss nitty-gritty data-processing details with their peers, it’s also a place where curious newbies can sign in, look up a general interest like, say, climate change or Canadian politics, and begin exploring and discussing how the world of public data applies to that topic.

BuzzData was built with the belief that, as Vodafone U.K. CEO Guy Laurence once said, “data, on its own, is impotent.” Data ensconced in community and context, however, can be transformative.

You can check out all the details of the May 30 meetup here, and if you’d like to learn more about Hacks/Hackers in general, check out the home page for the global community here.

See you soon!

-Momoko Price