Data Science and Machine Learning

Vast amounts of data are available, much of it open data, but how might these datasets be linked together, and what correlations might exist between them?

Are there any interesting clusters that machine learning can identify among the hundreds of thousands, or even millions, of data points available?

Without a doubt there are. It’s a matter of thinking about what you want to achieve, so a bit of visionary thinking or brainstorming will often be the first stage. Otherwise you can find yourself thrashing around without a clear idea, and may give up because a focus for your efforts hasn’t been formed.

Collecting relevant data, and then preparing it so that it can be adequately analysed and used with a range of relevant algorithms, typically takes 80% or so of the time in data analysis. Once the data has been cleansed and prepared, analysis can take place to find any patterns, predictions and so on.
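To give a flavour of what this preparation can involve, here is a minimal sketch in Python. It assumes a hypothetical CSV extract with `name`, `surface`, `latitude` and `longitude` columns (these names and the sample rows are made up for illustration, not taken from any real dataset). It drops rows with missing or impossible coordinates and tidies up the surface labels.

```python
import csv
import io

# Hypothetical raw extract: column names and values are illustrative only.
RAW_CSV = """name,surface,latitude,longitude
Park Road,Grass ,50.70,-1.29
Town Ground,grass,,
Leisure Centre,SYNTHETIC,50.69,-1.31
Bad Row,grass,95.0,-1.30
"""


def clean_rows(text):
    """Return rows with usable coordinates and a normalised surface label."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except ValueError:
            continue  # missing or non-numeric coordinates
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # coordinates outside the valid range
        cleaned.append({
            "name": row["name"].strip(),
            "surface": row["surface"].strip().lower(),
            "latitude": lat,
            "longitude": lon,
        })
    return cleaned


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Real open data will usually need rather more than this (duplicate records, inconsistent naming, different coordinate systems), but the principle of filtering and normalising before analysis is the same.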

A relatively simple aim might be to find out where all outdoor sports surfaces are located within the UK and relate this to population (perhaps within a defined age group). This can be readily achieved with the range of open data from Government organisations and on its own is not too difficult a challenge. But what purpose could this be used for? Quite simply, to work out supply and demand and user ratios, which is probably one of the commonest and simplest analyses to carry out, and one which is already carried out.
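As an illustration of a supply and demand ratio, the sketch below counts pitches per region and divides by the population of a chosen age group. The region names, pitch records and population figures are invented for the example and are not real statistics. In practice they would come from the open datasets mentioned above.

```python
from collections import Counter

# Hypothetical inputs: region names, pitch records and population figures
# are made up for illustration and are not real statistics.
pitches = [
    {"name": "Pitch A", "region": "North"},
    {"name": "Pitch B", "region": "North"},
    {"name": "Pitch C", "region": "South"},
]
population_in_age_group = {"North": 4000, "South": 1000}


def pitches_per_thousand(pitch_rows, population):
    """Pitches per 1,000 people in the chosen age group, by region."""
    counts = Counter(row["region"] for row in pitch_rows)
    return {
        region: 1000 * counts.get(region, 0) / people
        for region, people in population.items()
        if people > 0  # skip regions with no recorded population
    }


ratios = pitches_per_thousand(pitches, population_in_age_group)
for region, ratio in sorted(ratios.items()):
    print(f"{region}: {ratio:.2f} pitches per 1,000 people")
```

A low ratio hints at under-supply relative to demand, though it says nothing on its own about the quality or capacity of the pitches counted.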

Other ideas spring to mind straight away which can be of more use to the grounds care industry. For example, the value and totals of the materials needed to maintain the surfaces could be grouped into sales areas, helping to focus on untapped areas or prime areas where a high density of surfaces implies potentially higher sales. A similar principle could apply to training courses: there’s probably only a small potential return from promoting certain types of courses in areas which have few or no surfaces, yet higher potential returns in areas with high concentrations of those surfaces.

Here’s a fairly simple analysis of all grass football and rugby pitches, along with synthetic pitches, within England, plotted using latitude and longitude data (the data was pulled together into a suitable form from Government open data, which needed a fair bit of adapting to meet my needs):

Sample of the Python code used to extract relevant data to create a suitable visual (shown below)

Grass Football & Rugby and Synthetic grass pitches in England

Zoom in on the IOW area pitches
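Zooming in on an area like this amounts to keeping only those points whose coordinates fall inside a bounding box. The sketch below is not the code used for the visuals above. It assumes that IOW refers to the Isle of Wight, uses a rough, hand-picked bounding box for that area, and the pitch records are hypothetical.

```python
# Hypothetical pitch records; coordinates are rough, illustrative values.
pitches = [
    {"name": "Island pitch", "latitude": 50.70, "longitude": -1.29},
    {"name": "Mainland pitch", "latitude": 51.50, "longitude": -0.12},
]

# Assumed bounding box roughly covering the Isle of Wight area
# (approximate values chosen for illustration only).
IOW_BOX = {"south": 50.55, "north": 50.80, "west": -1.60, "east": -1.05}


def within_box(row, box):
    """True if the row's latitude and longitude lie inside the box."""
    return (
        box["south"] <= row["latitude"] <= box["north"]
        and box["west"] <= row["longitude"] <= box["east"]
    )


iow_pitches = [row for row in pitches if within_box(row, IOW_BOX)]
print([row["name"] for row in iow_pitches])
```

The filtered rows can then be passed to whatever plotting library is in use to produce the zoomed-in view.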

How might typical weather or climate affect potential material sales? For example, potential fertiliser sales are likely to be related to the length of the growing season.

Combining climate data and location data of sports surfaces can also help to indicate likely mowing requirements (e.g. time, frequencies, fuel requirements).
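As a very crude sketch of this idea, the code below counts the days on which the mean temperature is above a base temperature and converts that into a rough number of cuts. The 5 °C base and the seven-day mowing interval are assumptions chosen for illustration, and the temperature series is invented. A realistic model would also need rainfall, grass species and the standard of play.

```python
# Assumed base temperature above which grass is taken to be growing.
BASE_TEMP_C = 5.0


def growing_days(daily_mean_temps_c, base=BASE_TEMP_C):
    """Count the days whose mean temperature is above the base."""
    return sum(1 for temp in daily_mean_temps_c if temp > base)


def estimated_mowings(daily_mean_temps_c, days_between_cuts=7, base=BASE_TEMP_C):
    """Rough number of cuts, assuming one cut per fixed interval of growing days."""
    return growing_days(daily_mean_temps_c, base) // days_between_cuts


# Hypothetical daily mean temperatures (°C) for one location.
temps = [3.0] * 10 + [8.0] * 21 + [4.0] * 5

print(growing_days(temps))       # days above the base temperature
print(estimated_mowings(temps))  # rough number of cuts over the period
```

Multiplying the estimated cuts by a time or fuel figure per cut would give a first approximation of the mowing requirement for each surface.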

Combining data on surface requirements (i.e. the sport and the standard of play) with rainfall, growing season and soil type (for surfaces that are not specially constructed) can provide an informed estimate of how many games might realistically be playable in that locality. This can then be used, along with other data (including realistic expectations given the available resources), to determine more accurately what actions might be needed to improve surfaces without the need to physically inspect each pitch.

Building on the data, an informed estimate of material and resource inputs can be provided for the complete maintenance of a turfgrass surface. What about more accurate probabilities of disease incidence during the year? More accurate water requirements? Is the labour force adequately distributed, so that suitably qualified workers are located in sufficient numbers to match the sports surfaces?

What is the total environmental impact of the maintenance and management of outdoor sports surfaces? We often hear about how sustainable something is supposed to be, but with little explanation from those making the claim of what this actually means. Data science and machine learning offer the opportunity to provide well-informed analytical data with which to promote a sustainable approach to grounds management more formally, openly and transparently. An open and honest discourse supported by statistically sound data analytics will provide the lever needed to project a more professional and environmentally sympathetic image for the industry.

Is there ultimately a model of grounds management which pulls together a huge range of data points to help maintain and manage the many outdoor sports surfaces in the UK, or globally, more effectively, for the benefit of all stakeholders? I’m convinced there is. With the advances in technology, especially sensor technology capturing billions of data points (the ‘Internet of Things’), a truly comprehensive digital grounds management system is coming, if it is not already here.

Chris Gray, 27th January 2019