Data analysis and visualization tools are everywhere. Derek Willis, an interactive developer for The New York Times, uses plenty of them and the surprising thing is that—despite the fact he has the resources of a large, global newsroom—his favorite tools are all free. His message to freelance journalists and content creators: you don’t have to break the bank to make the most of data storytelling opportunities.
In fact, Willis says most tools used by colleagues at The Times are free. “Especially programs that store and manipulate data…most of the programs we use are free and open source, and that’s part of why we use them,” he says.
His main suite of tools: spreadsheet software of some kind, a free database manager, add-ons like Google Fusion Tables, and programming languages for higher-level tasks.
Spreadsheets — the Gateway Drug to Data
Once you have data, you need a way to keep track of it. Willis recommends starting with Excel or another spreadsheet program, such as OpenOffice or Google Sheets.
Excel isn’t free, but it is ubiquitous and has a few more features than its competitors. It’s also been around long enough that nearly everyone has toyed with it at some point, creating some familiarity.
“When I teach, usually I end up with students who have used Excel for something, maybe not for reporting but either personally or for something—maybe they had an office job, some kind of data entry job and they used Excel for that. It’s sort of like a gateway drug to working with data,” says Willis. “You can kind of get into it that way: it feels somewhat comfortable, and then if you want to go further there are plenty of tools and choices out there for you.”
To Scrape or Not to Scrape
If you’ve ever copied and pasted tables from a website into Excel or another spreadsheet, you’ve done the very low-level work that a computer scraper does: pulling in data from one or more public sources and storing it in a secondary location and format.
Learning how to use computer programming to scrape data from public websites requires at least a basic level of technical skills, and it’s something that can be accomplished more easily after learning a coding language such as Python or Ruby.
Not everyone needs to learn how to scrape data, says Willis, but it may save you some time if you find yourself cutting and pasting information from the same website or source over and over.
“Learning how to do it programmatically can really help take some of the anxiety out of that process. There’s certain things that computers are built for, and one of them is to do repetitive tasks,” he says.
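The repetitive copy-and-paste work Willis describes is exactly what a scraper automates. The sketch below, written in Python (one of the languages the article mentions), pulls the rows out of an HTML table using only the standard library. The table here is hard-coded and hypothetical; a real scraper would fetch a page from a specific public site first, for example with urllib.request.

```python
# A minimal sketch of what a scraper does: walk a page's HTML and collect
# table rows into a list. The table below is invented for illustration;
# in practice the HTML would be downloaded from a public website.
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = None      # row currently being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


# Stand-in for a fetched page; column names are made up.
html = """
<table>
  <tr><th>College</th><th>Volunteers</th></tr>
  <tr><td>Example State</td><td>42</td></tr>
  <tr><td>Sample University</td><td>17</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(html)
print(scraper.rows)
```

Each repetitive copy-and-paste session becomes a single function call, which is exactly the kind of task computers are built for.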
To store the data, The New York Times uses databases like MySQL or PostgreSQL (which is better for spatial data).
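Once scraped, the data needs somewhere to live. The Times uses MySQL or PostgreSQL, but the same idea can be sketched with SQLite, which ships with Python and needs no server. The table name, columns, and figures below are invented for illustration.

```python
# A sketch of storing scraped data in a database. SQLite stands in for the
# MySQL/PostgreSQL setups mentioned in the article; the schema and numbers
# are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path instead would persist the data
conn.execute(
    "CREATE TABLE volunteers (college TEXT, count INTEGER, enrollment INTEGER)"
)
conn.executemany(
    "INSERT INTO volunteers VALUES (?, ?, ?)",
    [
        ("Example State", 42, 30000),
        ("Sample University", 17, 4000),
    ],
)
conn.commit()

# Once the data is in a database, questions become queries:
for college, count in conn.execute(
    "SELECT college, count FROM volunteers ORDER BY count DESC"
):
    print(college, count)
```

The payoff of a database over a spreadsheet is that sorting, filtering, and joining become one-line queries rather than manual steps.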
Analyzing Public Data
Each year, the Peace Corps publishes a list of universities with alumni who are volunteering. “In order to maximize the coverage, they create three top-25 lists, one for large colleges, one for mid-sized colleges and one for small colleges,” Willis says. Although he sees nothing wrong with that, a journalist might approach the same data set a little differently.
It would be much more interesting, not to mention more fair, says Willis, to calculate a per capita ranking of universities that contribute volunteers.
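The per capita calculation Willis suggests is simple once the data is in hand. In the sketch below, the schools, volunteer counts, and enrollments are invented, not actual Peace Corps figures; the point is only the arithmetic of normalizing by school size.

```python
# Hypothetical data: (name, volunteers, enrollment). Real reporting would
# use the Peace Corps list plus enrollment figures from another source.
schools = [
    ("Example State", 300, 45000),
    ("Sample University", 60, 4000),
    ("Midsize College", 120, 15000),
]

# A raw count rewards big schools; volunteers per 1,000 students levels
# the playing field between large and small colleges.
per_capita = sorted(
    ((name, vols / enrollment * 1000) for name, vols, enrollment in schools),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, rate in per_capita:
    print(f"{name}: {rate:.1f} volunteers per 1,000 students")
```

In this made-up example the largest school sends the most volunteers in absolute terms but ranks last per capita, which is why the normalized ranking tells a different, arguably fairer, story.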
Whether you’re using a simple spreadsheet and Fusion Tables or a complicated Web scraper and your own database, this is just one example of taking information and rearranging it so it’s useful for you and your readers.
If your ability and technical knowledge limit the type of story you can do, that’s when it’s time to pick up additional skills. “I wouldn’t want to be in that position where I want to do a story but can’t technically manage it, because that would be frustrating and disappointing,” Willis says, “…and also there are stories that need to be done regardless of how difficult they are to do, and that makes a pretty good case for actually learning how to do it.”
Data Storytelling in Action
Want to get a taste of some of Willis’ favorite projects that he’s worked on, using the skills he urges others to learn? Here’s a small sample:
The New York Times compiled data provided by the Environmental Working Group on over 200,000 facilities that have permits to discharge pollutants, and collected information from states on compliance. Users can see whether contaminants in their water meet legal limits.
This project not only allows users to look at 2012 presidential polls (or the percentage of change from 2008), but also shows specific data based on voter age, gender, race & ethnicity, income, education level, ideology, political party, sexual orientation, views on the economy and other factors.
In addition to an elegant visual representation of a running tally of money raised and spent by presidential candidates, The New York Times also displayed spending by independent groups (including super PACs) per week.
Senate Vote 242
This chart looks at the vote to change the rules limiting filibusters for most nominations, showing readers how the votes hinged on geography, party affiliation and individual senators.