Some time ago I was working on a new blog on the sharing economy, lets-share.de. The blog specialized in assessing global and local sharing platforms and providing an overview of events based on the sharing model. After helping to develop the digital strategy and the exact focus of the blog, I decided it was high time to add some data-driven stories visualizing important issues of the sharing economy that are changing our lives.
Mitfahrgelegenheit.de is the German version of Carpooling.com and is highly popular in Germany. I decided to create a visualization that would show carpooling patterns between cities in Germany and possibly reveal hidden connections.
I called the German office of Mitfahrgelegenheit.de and asked them for the data. What I needed was the database of all trips in the past years. Well, I didn’t get it. After that, I had to rethink the concept and take my limitations more seriously. After all, I had no developer and no designer to support me. That’s how I came up with the idea of limiting my visualization to a single day: May 1, a public holiday in Germany.
Extracting the data was the most difficult part: I first resorted to add-ons and the Google Spreadsheets formula that extracts tables from a website. However, the sheer amount of data points made me look for a different solution. I needed a tool that could extract successive tables from numerous consecutive pages.
OutWit Hub proved to be the tool I wanted. In the free version, it can extract up to 100 rows. After buying the full version, I was able to extract all the rows I needed by pasting the URL of the first page and clicking the arrow symbol, which means “extract all following tables as well”. I then saved the data as an Excel file. Since I didn’t have many rows, it was quite feasible to use Google Spreadsheets (which can only work with a limited number of rows).
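For readers who do have a little coding experience, the “extract all following tables” step can be reproduced with a short script. This is only a sketch: the page markup and column names below are invented, and in a real run each page’s HTML would come from fetching the next results URL rather than from inline strings.

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""
    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows
        self._row = None      # row currently being built
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row.append(data.strip())

def scrape_pages(pages):
    """Parse each results page and concatenate the rows,
    keeping the header from the first page only."""
    all_rows = []
    for i, html in enumerate(pages):
        parser = TableRows()
        parser.feed(html)
        all_rows.extend(parser.rows if i == 0 else parser.rows[1:])
    return all_rows

# Two tiny stand-in pages; really these would be fetched one by one.
page1 = ("<table><tr><th>From</th><th>To</th><th>Price</th></tr>"
         "<tr><td>Berlin</td><td>Hamburg</td><td>15</td></tr></table>")
page2 = ("<table><tr><th>From</th><th>To</th><th>Price</th></tr>"
         "<tr><td>Munich</td><td>Cologne</td><td>29</td></tr></table>")

rides = scrape_pages([page1, page2])
print(rides)
```

The loop over pages is the whole trick: each results page repeats the same table layout, so you parse each one the same way and drop the repeated header rows.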
After that, I cleaned the data using Excel and started to make sense of it using Google Pivot Tables.
Limitations: I had no coding/scraping experience, no coder at hand, and very limited time.
Tools I used:
- Google Spreadsheets (=importHTML(url, query, index))
- Google scraper add-on
- OutWit Hub (paid version)
- Google Pivot Tables
- Google Fusion Tables: creating the network diagram and embedding it on the blog
- Datawrapper: creating bar charts with top-10 cities
Outcome and outlook
This was the first data journalism piece I ever created, and I created it entirely on my own. I noticed how important it was to have a basic knowledge of statistics (median, mode, correlations) and to be able to write a couple of lines of code on your own. But, most importantly, you need a clear idea of what you want to tell your readers with your story. If I had had more time and resources, I would have done the following:
- Connect the visualization to the API of Mitfahrgelegenheit.de, should it have one.
- Create a feed that would update the visualization automatically, so users would always see real-time information.
- Create a historical view of travels to different cities. Ideally, every city would have its own travel history and be comparable with other cities (how many people arrived in the city and how many left on a particular day; compare two or more cities; add events like Oktoberfest to put this into a larger context).
- Personalize the story and let users change their destination city to see the most common price they would have to pay or the times with the largest number of offers. Ideally, there would be a link taking users directly to the particular offer on mitfahrgelegenheit.de. You could even integrate the booking form into the visualization.