
Data for the 2016 Olympic Games in Rio de Janeiro

Making stats on the athletes and events available in a structured format

I’ve written before about the short-lived nature of pages on the World Wide Web and how valuable information disappears every day. Well, here’s another recent example of useful data being lost.

Back in August 2016, when the Olympic Games were in full flow, the official website for the event, rio2016.com, listed every sport, every athlete, every event, and the winners of every gold, silver, and bronze medal.

It was a fantastic resource, and because I wanted a copy of the data I wrote a web scraper, a computer program that downloads web pages and then extracts information for later analysis. I didn’t have any particular use in mind but I thought it would be nice to provide a public copy of the information in a structured format.
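As a rough illustration of what a scraper does (not the program used for this dataset), here is a minimal sketch in Python of the extraction step. The HTML sample, the class names `athlete-name`, `event` and `medal`, and the helper `scrape_athlete` are all made up for the example; the real rio2016.com pages were laid out differently. The download step is left out so the sketch runs offline.

```python
from html.parser import HTMLParser

# Made-up markup standing in for one downloaded athlete page.
# A real scraper would first fetch this text over HTTP.
SAMPLE_PAGE = """
<html><body>
  <h1 class="athlete-name">Example Athlete</h1>
  <table class="medals">
    <tr><td class="event">Example Event A</td><td class="medal">Gold</td></tr>
    <tr><td class="event">Example Event B</td><td class="medal">Silver</td></tr>
  </table>
</body></html>
"""


class AthletePageParser(HTMLParser):
    """Collects the athlete name and (event, medal) pairs from one page."""

    def __init__(self):
        super().__init__()
        self.name = None
        self.medals = []
        self._field = None  # class of the element whose text we expect next
        self._event = None  # event cell waiting for its medal cell

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in ("athlete-name", "event", "medal"):
            self._field = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return  # whitespace, or text we are not interested in
        if self._field == "athlete-name":
            self.name = text
        elif self._field == "event":
            self._event = text
        elif self._field == "medal":
            self.medals.append((self._event, text))
        self._field = None


def scrape_athlete(html):
    """Turn one page of HTML into a structured record."""
    parser = AthletePageParser()
    parser.feed(html)
    return {"name": parser.name, "medals": parser.medals}


record = scrape_athlete(SAMPLE_PAGE)
print(record)
```

Running the same function over every athlete page and saving the records is what turns a browsable website into a dataset that can be analysed.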

I’m glad I did, because as soon as the Games finished, the International Olympic Committee (IOC), the organisation that runs the Games, deleted the website and removed all the information that was there. It was a very short-sighted move.

The original website now redirects to a completely different website.

The Internet Archive, an organisation that provides access to archives of the Web, has a copy of the original site. For technical reasons (the IOC’s fault) it’s hard to use, but you can see an example of the information that was available during the Games on the archived page for Jason Kenny.

I can’t emphasise enough how unimaginative I think the IOC have been in removing the website. It would have been a goldmine for researchers.

Fortunately, my structured copy of the Rio 2016 data is publicly available. The dataset contains the official statistics on the 11,538 athletes (6,333 men and 5,205 women) and 306 events at the 2016 Olympic Games in Rio de Janeiro. If you need to verify the information you can delve into the archived website.
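To show why a structured copy matters, here is a small sketch in Python of the kind of question it makes easy to answer. The rows and the column names `name`, `sex` and `sport` are placeholders invented for the example, not the layout of the published files, so adjust them to match whatever the real files contain.

```python
import csv
import io
from collections import Counter

# Placeholder rows; the column names are assumed, not taken from the
# published data files.
SAMPLE_CSV = """name,sex,sport
Athlete A,female,Cycling
Athlete B,male,Cycling
Athlete C,female,Rowing
"""


def count_by(rows, column):
    """Count how many rows share each value in the given column."""
    return Counter(row[column] for row in rows)


# csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_sex = count_by(rows, "sex")
by_sport = count_by(rows, "sport")
print(by_sex)
print(by_sport)
```

With the real files, the same few lines would let you check totals such as the number of men and women quoted above.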

Update, 29 December 2017

Since I first wrote this, the IOC have put the 2016 results on their own website, and most, but not all, of the data is available again. It’s also unstructured (you can browse it but not analyse it), so the data files I saved are still a better source. But at least they’re making the results available, for now.
