The corporate information assets of INAIL (National Institute for Insurance against Accidents at Work) are enormous and grow exponentially, fed daily from various sources. Finding the data needed to interpret the various phenomena correctly is therefore complicated, and a search does not always yield the accurate overall picture that data governance requires.
“One of the questions that we wanted to answer – says Giuseppe Morinelli, central prevention coordinator at INAIL – relates to the effectiveness, for the health of workers, of the prevention policies implemented thanks to INAIL contributions. The answer, obtained through the text mining and data mining carried out by the working group composed of INAIL experts and data engineering scientists, has led us to understand that the trend in accidents within companies that had participated in tenders for INAIL incentives is three times lower than within others.”
The Institute applied text and data mining techniques to analyse the impact that issuing and implementing the ISI notices has had on the prevention of workplace accidents. This has opened up a new scenario: the data can be explored thoroughly and exhaustively, and significant areas of analysis stand out.
The context
Since 2010, INAIL has offered economic support mechanisms for businesses to encourage the upgrading of facilities, machinery, equipment and organisational models to comply with workplace health and safety regulations, in implementation of Legislative Decrees 81/2008 and 106/2009.
Every year a notice is published setting out the procedures for accessing the funds made available by the Institute itself and/or by external bodies; the related data are managed by a software application.
Each request to participate in the tender thus generates a significant amount of data and of technical and administrative documentation, which makes it possible to characterise each individual project on the basis of numerous parameters.
Data analysis
Given the need to monitor the effects on accidents of the interventions carried out following the award of the ISI tender, an INAIL-Engineering working group was established, which made data central to the analysis process. It started with the data for 2010, the first year in which the tender was launched, because enough time had elapsed since then to monitor accidents statistically.
The phases of the analysis process
The data processing went through several phases:
- Data Preparation, which aimed to extract data from the databases, document the various sources and link the different data structures to one another.
- OCR Preprocessing, namely the extraction of text from documents of interest and the identification of textual elements useful for analysis.
- General Preprocessing, which prepared the data for analysis by specific software.
- Descriptive analysis, with the objective of describing the data with statistical techniques to highlight the basic features.
- Data Mining, which identified the underlying causes of the phenomena that characterise the ISI system, making predictions of quantities of interest and identifying corrective actions.
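As an illustration of the descriptive analysis phase listed above, the following is a minimal sketch, not the working group's actual method. It assumes each company can be reduced to a participation flag and an accident count for a period before and after the intervention. The records, field layout and function names are hypothetical and invented for the example; they are not INAIL data.

```python
from statistics import mean

# Hypothetical toy records, invented for illustration only (not INAIL data):
# (company_id, took_part_in_ISI_notice, accidents_before, accidents_after)
records = [
    ("A", True, 10, 7),
    ("B", True, 8, 6),
    ("C", False, 10, 9),
    ("D", False, 8, 8),
]


def relative_change(before, after):
    """Relative change in accident counts between two periods.

    Assumes `before` is non-zero; real data would need a rule for that case.
    """
    return (after - before) / before


def mean_change_by_group(rows):
    """Average relative change, split by participation in the notice."""
    groups = {True: [], False: []}
    for _company, participated, before, after in rows:
        groups[participated].append(relative_change(before, after))
    return {flag: mean(values) for flag, values in groups.items()}


changes = mean_change_by_group(records)
print(f"participants: {changes[True]:+.1%}")
print(f"others:       {changes[False]:+.1%}")
```

A comparison of this kind only describes the two groups. Attributing a difference to the incentives would require the controls and longer observation windows that the data mining phase is meant to address.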
Extracting previously unknown, implicit information and applying techniques that recognise significant patterns in the structured data brought out recurring models, as well as some that were significant precisely because they were exceptional. These guided the user towards new decision-making scenarios.
“For the first time – continues Morinelli – we used text and data mining techniques to extract data from different databases and documents, and by means of text mining, we detected internal links, not immediately evident otherwise, between the data components.”
All this has made it possible to extract information from different, seemingly unrelated sources, to understand the effects of a measure and to provide useful tools for making decisions based on data.
As Stefano Tomasini, head of Central Management for the INAIL Digital Organisation, stated in an interview for Ingenium, “PA [Public Administration] data and services should seize this objective, which is technologically within reach, to change people’s lives for the better and also, in the future, the PA”.