Open data – use it for fun or serious business decisions

Data are (yes, it’s plural) everywhere, and the amount available as ‘open data’ is vast.
“Open data is data that anyone can access, use or share. Simple as that. When big companies or governments release non-personal data, it enables small businesses, citizens and medical researchers to develop resources which make crucial improvements to their communities.” (The Open Data Institute, https://theodi.org/what-is-open-data)

Where can you find open data? Local authorities, governing bodies and government-funded research projects provide a wealth of open data ready for reuse by others. For example, the European Data Portal, https://www.europeandataportal.eu/en, has the strategic objective of improving accessibility and increasing the value of open data. UK Government data can be searched at https://data.gov.uk/.

Subject-specific data is widely available. For example, the Office for National Statistics (ONS), https://www.ons.gov.uk/, provides a range of economic, demographic, educational and other data. Official Labour Market Statistics can be found at https://www.nomisweb.co.uk/default.asp, whilst data for qualifications can be analysed at the Register of Regulated Qualifications, https://register.ofqual.gov.uk/, with the full data set available to download.

Sports facilities data for England can be searched at the Sport England Active Places web site, https://www.activeplacespower.com/opendata. A folder containing a range of data sets, as csv (‘comma separated values’) files, can be downloaded for analysis.

If you use open data from a single source then you will usually attribute the data to that organisation. If, however, you use information from several providers and multiple attributions are not practical in your product or application, you will typically use a statement such as “Contains public sector information licensed under the Open Government Licence v3.0”.

I was particularly interested in investigating the Sport England data for sports pitches alongside the ONS data on population sizes (available as csv or xls files, as well as other formats). I wanted to identify how the different regions of England compared in sports provision, especially in relation to grass pitches (football, rugby and cricket) and tennis courts.

I had to remember that the data available were essentially raw figures, and that I needed to convert them into something that could be readily understood, i.e. information.

I created a MySQL database (I use www.oneandone.co.uk as my host) from selected data in the Sport England dataset, significantly reducing the number of fields to focus on what I would need and renaming some of them to make them more user friendly. A database is easier to query than a range of .csv files imported into an Excel spreadsheet. That said, .csv files are ideal for uploading to a database, although you do need to have the tables set up beforehand to allow a suitable import to take place.
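
As a rough illustration of the "set up the table first, then import the csv" step, here is a minimal Python sketch. It is not the author's actual setup: it uses Python's built-in SQLite module as a stand-in for MySQL so that it runs anywhere, and the table name, column names and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical, cut-down extract; the real Sport England files have many more columns.
SAMPLE_CSV = """FacilityID,SiteName,Region,FacilityType
1001,Example Park,North East,Grass Pitches
1002,Sample Tennis Club,South East,Tennis Courts
"""


def load_facilities(conn, csv_file):
    """Create the (assumed) facilities table, then insert the rows of a csv file object."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS facilities (
               facility_id   INTEGER PRIMARY KEY,
               site_name     TEXT NOT NULL,
               region        TEXT NOT NULL,
               facility_type TEXT NOT NULL
           )"""
    )
    reader = csv.DictReader(csv_file)
    # Keep only the fields needed and map them onto friendlier column names.
    rows = [
        (int(r["FacilityID"]), r["SiteName"], r["Region"], r["FacilityType"])
        for r in reader
    ]
    conn.executemany("INSERT INTO facilities VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
loaded = load_facilities(conn, io.StringIO(SAMPLE_CSV))
print(loaded, "rows loaded")
```

With a real download you would pass an opened file instead of the in-memory sample, and on MySQL the same idea applies with that database's own connector and import tools.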

Database structure

I did find one issue with the tennis court data: the playing surface type was not directly linked to the courts themselves, so I had to create a separate table within the database and link the two via what was coded the ‘facilityID’. To speed up processing times I then merged this data into a single table.
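
The link-then-merge step might look something like the sketch below. Again this is only an illustration under assumptions: SQLite stands in for MySQL, and the table names, column names and rows are made up. Only the idea of joining on a facility ID and storing the joined result as a single table comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: courts and surfaces arrive separately,
    -- sharing only a facility ID.
    CREATE TABLE tennis_courts (facility_id INTEGER, region TEXT, courts INTEGER);
    CREATE TABLE tennis_surfaces (facility_id INTEGER, surface TEXT);

    -- Made-up rows, purely for illustration.
    INSERT INTO tennis_courts VALUES (1, 'North East', 4), (2, 'South East', 6);
    INSERT INTO tennis_surfaces VALUES (1, 'Macadam'), (2, 'Grass');

    -- Store the join once as its own table so later queries
    -- do not have to repeat it.
    CREATE TABLE tennis_merged AS
        SELECT c.facility_id, c.region, c.courts, s.surface
        FROM tennis_courts AS c
        JOIN tennis_surfaces AS s ON s.facility_id = c.facility_id;
""")

merged = conn.execute(
    "SELECT facility_id, region, courts, surface "
    "FROM tennis_merged ORDER BY facility_id"
).fetchall()
print(merged)
```

The trade-off is the usual one: the merged table is quicker to query but has to be rebuilt whenever the source tables are refreshed.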

Tennis courts

So, the data available indicate a range from 15 courts per 100,000 of population in the North-East to 36 courts per 100,000 in the South-East, a ratio of 2.4:1 in favour of the South-East. Is this equitable? Is there a good reason for the difference? Analysing such data does start to promote a debate, as what is appropriate or not is a value judgement.
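
For anyone wanting to reproduce this kind of figure, the arithmetic is simple. The sketch below uses the two rates quoted above for the ratio; the helper function and the round numbers passed to it are hypothetical and only show how a per-100,000 rate is derived from a count and a population.

```python
def per_100k(count, population):
    """Return the number of facilities per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Rates quoted in the text (tennis courts per 100,000 of population).
north_east_rate = 15
south_east_rate = 36

ratio = south_east_rate / north_east_rate
print(f"{ratio:.1f}:1")

# Hypothetical round numbers, only to show the helper at work:
# 150 courts serving 1,000,000 people.
example_rate = per_100k(150, 1_000_000)
print(example_rate)
```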

The different types of surface would also need to be considered, and this is readily analysed from the open data. Here’s a screenshot of an extract of that analysis:

Tennis surfaces

For the grass pitches I queried each region separately, because trying to cover the whole of England in a single query resulted in the system timing out.

Grass pitches query for a region
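
The actual query appears only in the screenshot, so what follows is a hedged sketch of the general approach rather than a copy of it: run one small parameterised query per region and collect the results, instead of one large query across the whole of England. SQLite stands in for MySQL, and the table, columns and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and made-up rows, for illustration only.
    CREATE TABLE grass_pitches (
        facility_id INTEGER, region TEXT, sport TEXT, pitches INTEGER
    );
    INSERT INTO grass_pitches VALUES
        (1, 'London', 'Football', 3),
        (2, 'London', 'Cricket', 1),
        (3, 'North East', 'Football', 5),
        (4, 'North East', 'Football', 2);
""")


def pitches_by_sport(conn, region):
    """Total the pitches per sport for a single region."""
    rows = conn.execute(
        "SELECT sport, SUM(pitches) FROM grass_pitches "
        "WHERE region = ? GROUP BY sport ORDER BY sport",
        (region,),
    ).fetchall()
    return dict(rows)


# One small query per region rather than one large query for England.
regions = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT region FROM grass_pitches ORDER BY region"
    )
]
results = {region: pitches_by_sport(conn, region) for region in regions}
print(results)
```

Whether this avoids a timeout on a shared host depends on the data volume and indexing; an index on the region column would be an obvious thing to try as well.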

The overall results of the grass pitches data are given in the image below:

Grass pitches comparison

The number of football pitches available ranges from 79 to 102 per 100,000 of population (a ratio of 1.29:1), except for London, which has 45 per 100,000; considering the built-up nature of that region, this is no surprise. One question that would then need to be answered is: ‘Does the number of artificial pitches, especially the newer 3G surface pitches, help to compensate for the reduced number of grass pitches available for football?’

Clearly a lot more analysis is needed, breaking the data into market segments: age groups, gender, quality of the grass pitches and how many games they can sustain without deteriorating below a certain standard, … but again data analysis opens the way for an informed debate.

Chris Gray, 18th November 2017