Finally completed the Data Journalism course, and was a bit surprised that I aced the final quiz: I did most of the course in the first week, but then didn’t get back to complete it until the final day.
As mentioned in a previous post, there were five modules:-
- Data journalism in the newsroom
- Finding data to support stories
- Finding story ideas with data analysis
- Dealing with messy data
- Telling stories with visualisation
The first module had a bit of fluff, and the end-of-module quiz included some questions that I thought had no real value. I mean, is it really essential to know how many data journalists work for the Wall Street Journal? It was good for setting the context of the course, though.
The second module covered much ground I was already familiar with:-
- major websites that publish data you are likely to be interested in, e.g. government, EU and UN sites
- using RSS newsfeeds, email alerts and Google Alerts to stay informed of new data published by sites you are interested in
- using Google’s advanced search features
But then there was a section on scraping. Google Sheets has a really neat ImportHTML function that can pull the contents of an HTML table into a spreadsheet. Some other tools were also suggested: OutWit Hub, import.io, the Scraper Chrome extension and ScraperWiki.
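To make that concrete, here is a rough sketch of the same trick done in code rather than a spreadsheet, using Python’s pandas library. This is my own example, not from the course, and the URL is just a placeholder:

```python
# A rough Python equivalent of the ImportHTML approach, using pandas.
# In Google Sheets the formula itself looks like:
#   =IMPORTHTML("http://example.com/page", "table", 1)
import pandas as pd

url = "http://example.com/page"   # placeholder URL, not a real data source
tables = pd.read_html(url)        # parses every <table> element on the page
first_table = tables[0]           # Sheets' index argument is 1-based; Python's is 0-based

print(first_table.head())         # quick look at the scraped rows
first_table.to_csv("scraped.csv", index=False)
```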
Apparently I’m not alone in being a data journalist who doesn’t like maths: the third module provided a really handy cheatsheet, and the rest of it gave a pretty good grounding in Excel, covering filtering, sorting, formulae, variables and pivot tables. So it was a good refresher.
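Pivot tables are probably the most useful of those to have in your toolkit, so here is a minimal sketch of the same idea outside Excel, again in pandas with made-up spending figures (my example, not the course’s):

```python
# The Excel pivot-table idea in pandas, with made-up spending data.
import pandas as pd

spending = pd.DataFrame({
    "department": ["Health", "Health", "Education", "Education"],
    "year":       [2013, 2014, 2013, 2014],
    "amount":     [120.5, 130.2, 80.0, 85.4],
})

# Rows = department, columns = year, values = summed amount:
# the equivalent of dragging fields around in an Excel pivot table.
pivot = spending.pivot_table(index="department",
                             columns="year",
                             values="amount",
                             aggfunc="sum")
print(pivot)
```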
With my background in databases and SQL, I was impressed with the fourth module: it covered many of the data issues I’ve come across supporting commercial databases, such as duplicates, missing data, typos and date formats. Unsurprisingly, Excel was highlighted as a useful tool – database management systems and SQL were mentioned at the end if you wanted to take it further and/or needed to work with bigger data sets.

This module was also where I picked up one of the most interesting of the tools covered: OpenRefine (formerly Google Refine). It has an impressively wide range of features for importing, formatting and cleansing data from many different formats. It can do a lot of things that I don’t think you can do with a spreadsheet, and it has a number of algorithms that can help identify and correct typos. The downside of OpenRefine is that it is a web application that you run on your own machine through a web browser, so if you’re dealing with a big data set on a mediocre computer you may start to have performance problems.
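For a flavour of the kind of cleansing the module deals with, here is a minimal clean-up sketch in pandas on a small, made-up messy data set (my own example; the course itself used Excel and OpenRefine for this):

```python
# Common clean-up steps on a small, made-up messy data set.
import pandas as pd

messy = pd.DataFrame({
    "name":  ["Smith", "Smith", "Jones", None],
    "date":  ["05/01/2014", "05/01/2014", "12/01/2014", "10/02/2014"],
    "value": [10.0, 10.0, None, 7.0],
})

clean = messy.drop_duplicates()             # remove exact duplicate rows
clean = clean.dropna(subset=["name"])       # drop rows missing a key field
clean["value"] = clean["value"].fillna(0)   # or impute whatever is appropriate

# Turn day-first date strings into proper datetime values
clean["date"] = pd.to_datetime(clean["date"], dayfirst=True)
print(clean)
```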
The final module gave pause for thought about creating appropriate graphics to help tell the news story. The tool of choice was Adobe Illustrator, so for me it was more about seeing what could be done and what you should consider when doing it.
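Illustrator is a design tool rather than a programming one, so there is nothing from the module itself to show in code, but for the code-minded, a basic chart of the kind discussed can also be drafted in Python with matplotlib. A sketch with made-up figures:

```python
# A basic news chart drawn in code instead of Illustrator, with made-up figures.
import matplotlib.pyplot as plt

years = [2011, 2012, 2013, 2014]
rate = [8.1, 8.0, 7.6, 6.9]   # hypothetical unemployment figures

plt.plot(years, rate, marker="o")
plt.xticks(years)
plt.title("Unemployment rate (made-up data)")
plt.xlabel("Year")
plt.ylabel("Rate (%)")
plt.tight_layout()
plt.savefig("unemployment.png")
```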
As a whole, the course does indeed appear to cover everything a budding one-man data journalist should know, while opening up areas for further exploration. I started the course knowing it was a basic introduction to data journalism, so it would cover things that I already had much deeper experience of. It was well worth doing though – I certainly found some new tools and knowledge that I’m sure will help me as a data blogger, and it will no doubt help many others who have taken the time to do the course. They are planning to run the course again, and I highly recommend it.