SOCIETY | Apr 10, 2020

The value of data to understand the Coronavirus Pandemic

Thanks to Open Data and Open Source, in Knowage you can easily create graphs and make comparisons on the COVID-19 emergency

Numbers told at press conferences, announced on the news or represented in maps or tables. Numbers about the COVID-19 pandemic that rain down on citizens every day, yet do not always give listeners or readers a clear overview of what is happening in their province, region, country or in the world.

“The collection and presentation of data has become an even more essential skill for those who provide information, starting with Italian Public Administrations” – comments Grazia Cazzin, Offering Manager Data & Analytics in Engineering’s Research and Innovation Division.

“However, in light of the availability of open data on people who are sick, recovered or hospitalized, there is a need for tools that let us give meaning and immediate use to the numbers available. For this reason we decided to create on Knowage, an open source business analytics platform, some dashboards that organize the official national and international numbers, while letting users personalize their perspective and easily choose the most interesting comparisons (between regions or countries, or between indicators)”.

How important are open data in an emergency?

The emergency has probably made it possible to understand the great value of open data, used not only to understand the phenomenon but also to make forecasts on the pandemic trend. These data were immediately made openly available by the Public Administrations and, as far as Italy is concerned, were published by the Civil Protection in a public repository on GitHub.

“It was not a foregone conclusion that Open Data would be distributed by a reliable source so quickly” – continues Cazzin. “Clearly, much more could be done, both for analyzing the current phenomenon and for assessing its impacts on the path towards a hoped-for “normal” situation, if such data were available with a greater level of detail. For example, it could be significant, not only for the analysis of the emergency but also for the evaluation of its repercussions in the various areas of our life, to have data broken down by Municipality and not only by Province and Region; by age group and gender; by type of job carried out by those infected; by average length of hospital stay and so on.

The more the data are able, whilst respecting privacy of course, to provide detailed information, albeit aggregated, the more we will be able to use advanced analysis techniques, for example to enrich the reading of the situation with future estimates. This is something we intend to do and which we are already working on, using Knowage and the data available at the moment”.

What is the role of openness?

“COVID-19 has probably helped us towards an increased understanding of the importance of openness and the value of sharing, of cooperation even when it may be co-competition” – explains Grazia Cazzin. “Following the open and collaborative model typical of open source communities, innovative organizations in the digital craftsmanship sector have also begun initiatives to 3D-print masks, respirator valves and much more thanks to project sharing, allowing multiple subjects to participate in the creation of value for all in an area which can be both collaborative and positively competitive.

As active members of the open source community, we wanted to use our expertise in data visualization and analysis to make the open data on COVID-19 more usable, allowing everyone to analyze the aspect they consider most relevant and to make the comparisons they deem most interesting. Unlocking knowledge is a basic principle of free software, of open data, of open science and today more than ever we need knowledge”.

What is the role of data?

Many initiatives use open data, i.e. data that are published and can also be read by machines, to interpret the pandemic phenomenon. There are just as many options for representing the information. But how long can these be considered reliable?

“When talking of open data, I think it is fundamental” – explains Cazzin – “that we can be sure they are reliable and kept up to date. For this reason, at Knowage as well, we have so far used the data made available by Public Administrations, which certify and explain the data they release. This does not mean that information cannot be enriched by data from other sources, provided these are objective and actually collected, and obviously not based on “feelings” or “hearsay”.

Unfortunately, the available open data often have too limited a territorial (and sometimes temporal) coverage to be used effectively in national or international analyses. If, for example, in a virtuous Province more than one organization were to collaborate in releasing open data, this would be truly valuable for analyzing that specific territory, but of little value for an analysis at a national level, which would require the same data to be available for all the other Italian Provinces as well.

In addition to this, data must always be described so that there can be no errors of interpretation in the published figures. Just to give an example, if a company reported the number of masks manufactured, the average price and the geographical distribution of the mask deliveries and, in doing so, certified these as collected and up-to-date data, these too could be used to better analyze a certain angle of the phenomenon. Conversely, data that are incorrect or insufficiently explained (leaving open questions such as: Does the number also include masks already used by employees? Does it include those discarded due to quality problems? Does it include those still stored in the warehouse?) and whose source is unreliable could lead us to create misinformation and to feed easily manipulated opinions. Something that we certainly don’t need at the moment”.

Sonia Montegiove