Our contest is over, check out the talent!

Our Best City in the World Contest has come to an end! We’re so excited to start compiling all the submissions and seeing what people came up with. (Backgrounder: in this contest, the Economist Intelligence Unit challenged the world to devise and visualize new ways to rank cities and measure urban liveability.)

You can check out — and in the case of interactive submissions, play with — many of the contest entries which have now been made public on our Best City Contest Topic page (which aggregates all the submissions). 

Some screenshots of the most recent submissions (picked for no specific reason):

Very impressive — great job, everyone!

Promote your data with our new badges

Hello and Happy Holidays from everyone at BuzzData!

The new year is fast approaching, but we couldn’t help but push a few more new features to the site before 2011 officially comes to an end. Without further ado, here are the latest and greatest improvements we’d like you to know about:

Dataset and user profile badges to add to your websites

We have just released a set of customizable publishing badges that you can embed in other websites, instantly increasing the visibility and reach of the data you generate.

Whether you want to use them on a personal blog or a company homepage, BuzzData badges make it easier than ever to share your data with the world.

Adding badges to your website is as simple as copying and pasting a few code snippets into your webpage’s source code. You can learn all about how to add badges to your website in our new FAQ & Knowledge Base. Try them out and let us know what you think!

Tweet, share and ‘like’ datasets

Now sharing data with your social circle is as easy as a single mouse click. Want your social media followers and friends to be able to download cool data you find on BuzzData? Get the data into the Twittersphere in an instant.

Community Tasks

This is an early-stage community feature we’re quite excited about. Now whenever you create a new dataset on BuzzData, you can use the dataset’s Overview tab to write in tasks that you’d like help with from others.

So what? Well, if your dataset is public, it will show up in our new global “Tasks” webpage, located just to the left of the BuzzData search box, along with the tasks you need help with.

The Tasks page is where users can peruse unfinished data projects from around the world that they can contribute to, thus helping the data community work together to achieve their goals.

To illustrate, when creating a dataset:

The global Tasks page is an early-stage feature that will continue to evolve in 2012. We hope you make use of it often and let us know how we can make it better by emailing us at support@buzzdata.com.

Now accepting direct image uploads and video URLs

Have you made a visualization of your dataset on your desktop and want to upload an image or video of it in action? No problem, you can now upload image files directly to the site and view them in our new visualization viewer.

In addition, BuzzData’s visualization viewer now allows you to post and stream videos from a variety of popular content providers such as YouTube, Vimeo and Brightcove, so you can showcase videos of interactive visualizations and other media on BuzzData as well.

Alright, that’s it for now! We hope you have a lovely, stress-free winter break and a fun-filled New Year’s Eve!

The BuzzData Team

Mournbots, Poppy Files and Veterans’ Day-ta

Digital newsrooms at the Ottawa Citizen and OpenFile decided to use technology to help connect their readers to our war-torn past this Remembrance Day.

@Wearethedead is a bot created by Ottawa Citizen data journalist Glen McGregor that tweets the name of one fallen Canadian soldier at the 11th minute of every hour. Excluding any updates from today onward, it will take 13 years to tweet the entire database.

You can read more about how McGregor came up with the idea in the Ottawa Citizen blog post here.

Each tweet includes the name, position, date and location of death, and age of the soldier who died. 
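
For the curious, the bot’s schedule and the 13-year figure are easy to sanity-check with a few lines of Python. This is only a rough sketch, not McGregor’s actual code: the function name `next_tweet_time` is made up for illustration, and the record count is simply what one tweet per hour for 13 years implies, not a number taken from the database itself.

```python
from datetime import datetime, timedelta


def next_tweet_time(now: datetime) -> datetime:
    """Return the next moment that falls on minute 11 of an hour (hypothetical helper)."""
    candidate = now.replace(minute=11, second=0, microsecond=0)
    if candidate <= now:
        # Minute 11 of this hour has already passed, so wait for the next hour.
        candidate += timedelta(hours=1)
    return candidate


# One tweet per hour means 24 tweets per day.
TWEETS_PER_DAY = 24

# Back-of-the-envelope: how many names does 13 years of hourly tweets cover?
# (Ignores leap days; the 13-year figure is presumably rounded anyway.)
implied_records = TWEETS_PER_DAY * 365 * 13

print(next_tweet_time(datetime(2011, 11, 11, 10, 5)))
print(implied_records)
```

In other words, a 13-year run at one name per hour works out to a database on the order of 114,000 records.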

OpenFile’s Poppy File is a data-driven historical retrospective on Canadian veterans. It has blossomed from a simple map only a year ago into a beautiful, popular interactive series that lets viewers discover the identities of soldiers killed in war who once lived in their neighbourhoods, alongside touching personal narratives and summary charts.

Lovely to see news media using technology creatively to help us connect to an increasingly distant past.

-Momoko Price

Deadline for contest submissions is tonight!

If you planned on submitting to our first data-storytelling contest, the deadline is midnight EST tonight! To submit, there are specific guidelines (they’re easy but you do have to follow them). The original blog post is here, and here’s a re-post of submission guidelines:

HOW TO SUBMIT YOUR ENTRY:

Submit your project by inviting me (Momoko on BuzzData) and the original publisher of your chosen dataset (see original blog post) to check out your project on BuzzData when it’s done. 

If you know how to use BuzzData, submit as follows:

1. Clone one of the datasets above directly from the publisher on BuzzData. Make it private if you don’t want people to see it until it’s ready

2. Build your project on BuzzData (posting links and viz’s as appropriate).

3. When it’s done, note in the Overview which visualization/article/attachment is the final product(s), then, before deadline, invite me and the data publisher to check it out. And don’t forget to make it public if you want to show it to the world!

4. Tweet and link to your project elsewhere if you want to build interest in it (optional, but always a good idea)

If you’ve never used BuzzData before, here’s a quick video that shows how to start, build and submit your data project (don’t worry, it’s super easy!):

THE WINNER WILL BE VOTED ON AT OUR NEXT MEETUP ON OCT. 24! 

(Short notice, but no worries, this will happen monthly)

Want to attend BuzzData’s workshop next week? Here are the details:

Where: Room 120 at the Centre for Social Innovation, 215 Spadina Ave. (north of Queen St. West in Toronto)

When: Drop in from 6pm onward on Monday, Oct. 24. We’ll be closing up shop by 9pm

RSVP to our Hacks/Hackers meetup group here, please!

Hope to see you there!

-Momoko Price

 (Have you tried BuzzData yet? What are you waiting for, silly?)

Can you spin stories out of data? Prove it.

BUZZDATA’S FIRST DATA-STORYTELLING CONTEST IS ON!

It was only a matter of time until I tried something like this. Hopefully this will be the start of something awesome here in Toronto (and perhaps abroad, if others want in): people evaluating and competing to tell the best stories with data.

THE GOAL:

To tell the story behind the data through your own BuzzData project

THE PRIZE:

$100 iTunes gift card, inspired by this guy’s prize for his “Pop and Lock-toberfest” circa 2010 (Sorry. Couldn’t help myself.)

THE RULES:

The number of data sources you can include in your project is unlimited, but you must use at least one of the following, and you have to include all data sources used in your final submission.

1) Toronto Water Billing Data for the last 10 years

2) EIU’s democracy index rankings 2008 and/or 2010

3) Canada Revenue Agency Contracts

4) World Bank Development Indicators

THE DEADLINE: 

Midnight EST, Sunday October 23, 2011

HOW TO SUBMIT YOUR ENTRY:

Submit your project by inviting me (Momoko on BuzzData) and the original publisher of your chosen dataset (as above) to check out your project on BuzzData when it’s done. 

If you know how to use BuzzData, submit as follows:

1. Clone one of the datasets above directly from the publisher on BuzzData. Make it private if you don’t want people to see it until it’s ready

2. Build your project on BuzzData (posting links and viz’s as appropriate).

3. When it’s done, note in the Overview which visualization/article/attachment is the final product(s), then, before deadline, invite me and the data publisher to check it out. And don’t forget to make it public if you want to show it to the world!

4. Tweet and link to your project elsewhere if you want to build interest in it (optional, but always a good idea)

If you’ve never used BuzzData before, here’s a quick video that shows how to start, build and submit your data project (don’t worry, it’s super easy!):

THE WINNER WILL BE VOTED ON AT OUR NEXT MEETUP ON OCT. 24! 

(Short notice, but no worries, this will happen monthly)

Want to attend BuzzData’s workshop next week? Here are the details:

Where: Room 120 at the Centre for Social Innovation, 215 Spadina Ave. (north of Queen St. West in Toronto)

When: Drop in from 6pm onward on Monday, Oct. 24. We’ll be closing up shop by 9pm

RSVP to our Hacks/Hackers meetup group here, please!

Hope to see you there!

-Momoko Price

 (Have you tried BuzzData yet? What are you waiting for, silly?)

Visualizing Toronto’s water usage: a tutorial

Earlier this month I attended and spoke at News:Rewired, a popular digital journalism conference in London, U.K. The journalists there were top-drawer: from Reuters, the BBC, the Guardian, the Telegraph, and others. My talk, on how data curation will be key in driving digital journalism forward, appeared to resonate with quite a few people, which was great.

However, more often than not, attendees came up to me, thanked me for the talk and then prefaced their clear enthusiasm for data journalism with an almost bashful admission that they lack the data exploration, analysis and visualization skills to actually do it. 

This is no surprise. Journalism has historically always been a narrative craft and largely still is. But this experience did make me think that perhaps it might be helpful to make step-by-step tutorial posts showing how to probe and visualize data. 

(Are you a data geek with a tip or tool to show off? Ping me about guest-posting or contributing to a BD tutorial)

This first data-tutorial post — a very basic one for data newbies — will begin exploring some Canadian government open data recently published on BuzzData: the City of Toronto’s water billing data over the last 11 years. A city journalist’s spidey-sense should tell them right away there’s a budding story to be had in this data, namely: 

Which wards are the most water-efficient and water-wasteful in Toronto? 

Let’s follow this 5-step process to find out.

(Want to follow the evolution of this data and stay updated on results as I go? Follow my dataset on BuzzData.)

Step 1: Get the data

First you have to get your hands on the data you want. This particular dataset is easy to get: just clone the data from the original publisher here: www.buzzdata.com/opento, and then download the xls file to your desktop. In this video I show you how to clone the data and make it private so you can build your project around it without others seeing what you’re working on:

 Step 2: Pick a question to answer

In future posts we’ll get more sophisticated with our exploration and visualization. For this first exercise we’re going to pick a very specific, simple question:

 “Which wards had the highest and lowest average water consumption last year?”

If you open up the dataset you downloaded, you’ll see that it actually splits water billing accounts into two types: residential and commercial. Let’s stick with residential. (Feel free to repeat this exercise on your own to find out which wards had the highest and lowest average commercial water consumption, and then see if there’s a correlation between the two types …)

Step 3: Pick your visualization method (and use K.I.S.S. — Keep It Simple, Stupid)

I’m going to make a bar chart in Excel. I know, it sounds boring, but here’s why:

To answer my question, I’m going to visualize discrete data (Toronto wards), and only compare one kind of value (the wards’ average residential water consumption). Any other kind of graph would probably be less clear in the long run, because the extra bells and whistles of the method would just add noise to the image.

However, if I wanted to highlight water consumption trends over multiple years, a line graph or time series chart would likely work best.

If I wanted to know whether wards with similar water consumption were close to each other geographically, a heat map using GIS data would be great. We’ll get to those in the future.

As a rule: pick the method that would best highlight the answer to your question!*

*You may not know which one works best without a little trial & error first.

 Step 4: Format your data

 Now it’s time to look at the data:

That’s a lot of data. Graphing this entire spreadsheet would be pointless; in fact, it would probably be harder to understand than the spreadsheet itself. You have to think about what information pertains to your question. I want to know which ward was the most water-efficient last year and which was the most wasteful. So I need the following data:

Average residential water consumption for each of 44 wards in the year 2010

Everything else — commercial accounts, # of accounts, total consumption, etc. — would be noise on the page. So how do we get just this data? There are lots of ways, but in this instance we’ll make a Pivot Table: 

Now we have a nice Pivot Table, but we’ll want to do just a little more formatting and organization before we make our graph. In this video clip I show how to prep your table for graphing, as well as how to sort your data to get an idea of what your findings will be:

Step 5: Graph it & get your answer!

Before going on, let’s recap what I’m trying to find out here. My original question was:

“Which wards had the highest and lowest average residential water consumption last year?”

By sorting the data earlier, we already know our answer. Now we just want to visualize it. Because we already sorted and formatted our data, graphing is now a piece of cake with Excel’s chart wizard. Here’s how you do it:

        

And that’s how you make a nice clean bar chart (and start to explore data with a journalistic frame of mind). What other trends can you find in this dataset?
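
If you’d rather script it than click through Excel, the filter, sort and chart from Steps 4 and 5 can be roughly sketched in Python. Fair warning: this is an illustration under assumptions, not the workflow shown in the videos. The column names (`ward`, `year`, `account_type`, `avg_consumption`) are hypothetical, and the numbers are invented sample values, not real Toronto figures.

```python
# Invented sample rows standing in for the downloaded spreadsheet.
# Column names and values are assumptions for illustration only.
ROWS = [
    {"ward": 1, "year": 2010, "account_type": "Residential", "avg_consumption": 250.0},
    {"ward": 1, "year": 2010, "account_type": "Commercial", "avg_consumption": 900.0},
    {"ward": 2, "year": 2010, "account_type": "Residential", "avg_consumption": 190.5},
    {"ward": 2, "year": 2009, "account_type": "Residential", "avg_consumption": 200.0},
    {"ward": 3, "year": 2010, "account_type": "Residential", "avg_consumption": 310.2},
]


def residential_by_ward(rows, year):
    """Keep only residential rows for one year, keyed by ward (the 'pivot')."""
    table = {}
    for row in rows:
        if row["year"] == year and row["account_type"] == "Residential":
            table[row["ward"]] = row["avg_consumption"]
    return table


def sorted_wards(table):
    """Return (ward, consumption) pairs from lowest to highest consumption."""
    return sorted(table.items(), key=lambda item: item[1])


def text_bar_chart(ranked, width=40):
    """Draw a crude horizontal bar chart, scaled so the largest bar is `width` wide."""
    top = max(value for _, value in ranked)
    lines = []
    for ward, value in ranked:
        bar = "#" * round(value / top * width)
        lines.append(f"Ward {ward:>2} {bar} {value}")
    return "\n".join(lines)


table = residential_by_ward(ROWS, 2010)
ranked = sorted_wards(table)
lowest, highest = ranked[0], ranked[-1]

print(text_bar_chart(ranked))
print("Lowest:", lowest, "Highest:", highest)
```

Same idea as the spreadsheet version: throw away everything that isn’t residential 2010 data, sort what’s left, and the first and last rows answer the question.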

One more note: visualizing data one way to answer a question often prompts new questions! In this case, I can’t help but wonder whether city wards with similar water consumption levels cluster together geographically. To answer this, we’ll need to map the data, so stay tuned for the next tutorial post to learn how!

NEXT UP: Visualizing Toronto’s water consumption with GIS (geographic information systems) data. In other words, shapefiles and mapping. Woohoo!

-Momoko Price

Enjoyed this but know you could do better? Great!  Ping me about guest-posting or contributing to a BD tutorial post!

Want to follow the evolution of the data as we go? Follow my dataset on BuzzData!

Our next data-journalism workshop is nigh …

The Guardian's data journalism workflow

(Part of the Guardian’s popular Prezi presentation of its data-journalism workflow) 

… and well, it’s going to be awesome. 

In case you missed my most recent email update, here’s the rundown of tomorrow afternoon:

WHERE: The Marketcrashers Hackernest, 231 Wallace Avenue, Toronto, ON

WHEN: Saturday, September 24, 1 pm – 4 pm

WHAT TO BRING: Laptop, power cord, and a determined, enthusiastic attitude

HOW TO RSVP: Please do so on this Meetup

TENTATIVE EVENT SYLLABUS: 

The workshop itself will have two “streams” happening simultaneously:

STREAM 1 (for hackers): 

  1. A hackathon-style brainstorm-fest for the more advanced hackers

  Like a hackathon, I’ll pick some specific datasets to hack on, as well as offer a prize for best project. (iTunes gift card? value ~ $100.)

  Considering this meetup is data-journalism-focused, the goal of the contest will be: Can you find and convey a story from this data? The finished project can be anything: an essay, a data-viz/infographic, an app, but it has to be web-publishable.

  The non-competing “hacks” of the group will act as the final jury in selecting the winner, since they have the narrative expertise and editorial sense to evaluate projects on their clarity, novelty and story cohesiveness.

  Attendees would have the meetup time to brainstorm ideas and/or pick collaborators, and then have the following week to code the h*** out of it. 

  I will be sending out links to the candidate datasets and other rules of the contest to the RSVP list on Meetup, so be sure you’re on it if you plan to participate.

STREAM 2 (for hacking newbies): 

  A tutorial/skill-learning stream, with a planned step-by-step curriculum of exercises.

I was going to plan this around simple data-wrangling tricks I hear about through my job, but a data-hacker friend of mine made a really good point: journalists shouldn’t try to avoid coding if they really want to mine data.

  In his words: 

  “To my mind, all of these various sites and tools are great so long as you have to have a problem / data set small enough to use with them … and have a problem that the tool is actually suited to address. For all of the effort spent, why not learn a fully general programming language and, having obtained mastery, wield great power over mere mortals?”

Frankly, I agree, and I think most journalists who want to hack are actually eager to learn, as long as they have some kind of curriculum to follow and someone to coordinate. So that’s what I’m going to help with. 

So, the suggested general curriculum for newbies:

PART I: 

- install Python before workshop

- intro to Python syntax

- basic data types: primitives, tuples, lists, dictionaries
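
To give a flavour of Part I, here is a minimal sketch of those basic data types in action. It is not the actual workshop material; the variable names, the `summarize` helper and all the numbers are made up for illustration.

```python
# Primitives: single values.
ward_number = 27             # int
avg_consumption = 215.4      # float (invented number)
ward_name = "Example Ward"   # str (hypothetical name)
is_residential = True        # bool

# Tuple: a fixed-size, unchangeable group of values.
reading = (ward_number, 2010, avg_consumption)

# List: an ordered collection you can grow and change.
years = [2008, 2009, 2010]
years.append(2011)

# Dictionary: look values up by key (here, year -> consumption).
consumption_by_year = {2009: 220.1, 2010: 215.4}
consumption_by_year[2011] = 210.0


def summarize(readings):
    """Return (lowest, highest, mean) of a year -> value dictionary."""
    values = list(readings.values())
    return min(values), max(values), sum(values) / len(values)


# Tuples can be unpacked straight into separate variables.
lowest, highest, mean = summarize(consumption_by_year)
print(ward_name, reading, years)
print(lowest, highest, round(mean, 1))
```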

PART II: 

- fetching a JSON file via HTTP

- working with JSON (much nicer than CSV if you can get it)

- storing data in a SQLite database

- querying the database
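
And a rough sketch of where Part II is headed, using only Python’s standard library. Treat it as an illustration under assumptions rather than the workshop’s actual exercises: the table layout, the field names and the sample JSON are invented, and `fetch_json` is a hypothetical helper that is defined but never called here, so the example runs without a network connection.

```python
import json
import sqlite3
from urllib.request import urlopen


def fetch_json(url, timeout=10):
    """Download a URL and parse the response body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Stand-in for a document you would normally fetch with fetch_json(...).
# Field names and numbers are invented for illustration.
SAMPLE_JSON = """
[
  {"ward": 1, "year": 2010, "avg_consumption": 250.0},
  {"ward": 2, "year": 2010, "avg_consumption": 190.5},
  {"ward": 3, "year": 2010, "avg_consumption": 310.2}
]
"""


def load_into_sqlite(records, connection):
    """Create the table if needed and insert one row per JSON record."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS consumption ("
        "ward INTEGER, year INTEGER, avg_consumption REAL)"
    )
    connection.executemany(
        "INSERT INTO consumption (ward, year, avg_consumption) "
        "VALUES (:ward, :year, :avg_consumption)",
        records,
    )
    connection.commit()


def thirstiest_wards(connection, year, limit=2):
    """Query the wards with the highest consumption for a given year."""
    cursor = connection.execute(
        "SELECT ward, avg_consumption FROM consumption "
        "WHERE year = ? ORDER BY avg_consumption DESC LIMIT ?",
        (year, limit),
    )
    return cursor.fetchall()


records = json.loads(SAMPLE_JSON)          # JSON text -> list of dictionaries
connection = sqlite3.connect(":memory:")   # throwaway in-memory database
load_into_sqlite(records, connection)
top_two = thirstiest_wards(connection, 2010)
print(top_two)
```

Notice that the parsed JSON is just lists and dictionaries, which is why Part I comes first.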

Tackling these head-on, from the ground up, will empower you to:

  1. write and customize scripts – which is what data-scraping is all about.
  2. make full use of public APIs and actually know how to use the data once you get it
  3. be able to explore data more flexibly without relying on outdated, proprietary software like Microsoft Access

I’m getting Part I ready for this workshop, and maybe we can set up a followup workshop in a week or so to tackle Part II. Python is actually a really fun, easy language to learn, by the way. You’re going to love it!

And of course, everyone is free to discuss other projects they want to tackle, etc.

Again, if this sounds like something you’d like to join, please RSVP! (And if you can’t make it this time, please un-RSVP, so those on the wait-list can get in on it.)

-Momoko Price

Stanford wants to teach you about data

Wow, data enthusiast’s deal of the day right here: The Stanford School of Engineering is offering a free online course on databases from Oct. 10-Dec. 12. Nicola Hughes at ScraperWiki and Joey Coleman of OpenHamilton have already started organizing study groups. Thinking BuzzData should start one too for Toronto peeps. Talk about a find!

Sign up for the course here: 

http://db-class.org/

Nearly 60,000 people have already registered. Here’s an excerpt of what they’ll be offering:

A bold experiment in distributed education, “Introduction to Databases” will be offered free and online to students worldwide during the fall of 2011. Students will have access to lecture videos, receive regular feedback on progress, and receive answers to questions. When you successfully complete this class, you will also receive a statement of accomplishment.

I think this is a great opportunity to start a study group through BuzzData, since we already do workshops and now have access to a great venue at the startup hackerspace MarketCrashers, thanks to MC staff member Shaharris Beh.

If you are interested in taking this course and joining a BuzzData-organized study group, let me know at momoko@buzzdata.com and I’ll start putting it together for early October. 

UPDATE: I’ve set up a tentative weekly meetup for Toronto-based folks (and anyone else who wants to join) here, on Hacks Hackers Toronto. Please RSVP there and I’ll sort out venue/time details to suit everyone (hopefully!).

-Momoko Price


Data-journalism reunion, anyone?

Well, we said we’d try to do these every month, so we’re back! BuzzData’s putting on another data-journalism workshop next week — can you make it? Be there or well, suck! Just kidding. But you should come.

 

A lot of things have happened in the last month, most notably: BuzzData is now public (and pretty awesome, if we do say so ourselves). So in addition to having the chance to learn more data-wrangling tools, this time you’ll have the opportunity to start using BuzzData and get connected to great data journalists and hackers around the world who are already using it. 

The details:

WHAT: BuzzData’s Data-Journalism Fun-Times (Vol. 2)

WHO: You, silly. RSVP to momoko@buzzdata.com 

(Space limited to 20 attendees. Don’t worry, we’ll do another round the following week if need be)

WHEN: Wednesday, August 17, 6pm – 9pm

WHERE: The BuzzData office – find us at 174 Spadina Ave, Suite #204 (just north of Queen and Spadina)

DETAILS: This workshop will be decidedly less intense & more individual project-focused than last time. Needless to say, bring your laptop and power cord. We will teach some tools, including more Google Refine wizardry (we barely got started on GRefine last time) and an introduction to ScraperWiki, as well as show you how to look for data and collaborate on it on BuzzData. A number of users on BuzzData have already started publishing weird, wild data worth mining, so it will give you an opportunity to practice viz/analytical skills with clean, machine-readable, interesting data, too.

We also highly recommend you bring in either a) a project you’re working on, b) the basic pitch of a project you want to start, or c) a story/topic you would like to tackle from a data/quantitative angle. After all, much of data journalism is problem-solving. So pick your problem and let’s get to solving it!

Anything else? No? Great! See you there!

BuzzData, live and uncensored

BuzzData has now been public for one whole week. Time for an update! First, the community snapshot. What groups stand out on BuzzData so far?

BuzzData’s community is bustling: we’ve got close to 1,000 users registered on the site, many of them developers (obviously) but also a surprising and exciting number of data-loving journalists hailing from Canada, the U.K., the U.S. and Europe.


We’ve been extremely impressed with the government open-data curators who jumped on test-driving us early. The City of Vancouver is already publishing data on BuzzData, while Toronto just signed up for an official City of Toronto account as well. We’re really hoping to see Ottawa and B.C. on BuzzData in the near future, too!

Most recently we were pleasantly surprised to discover a developer from Digital Science and the director of IT and Web from the Public Library of Science trying out the platform, each of them posting some fascinating datasets on impact metrics of science papers (here and here, respectively). 

[We recently had a deep talk with fellow Torontonian and quantum-computing superscientist/author Michael Nielsen about what kind of metrics might compel scientists to publish their data, and the challenge is a nuanced one, involving a host of technical, cultural and political considerations (blog post on this to come). So it was especially encouraging to know that news of our platform had already reached the science-data community.] 

All in all, a very promising first week. One thing I’d personally love to see: more avatar pics! Web 2.0 101: Without a pic, you don’t exist!

Next Up: The BD community visualized and BuzzData’s site superstars! Stay tuned …