TECH | Jul 25, 2017

Open Data are not sufficient

What are the best open data strategies for the Public Administration?

Open Data are increasingly discussed as a central element in driving innovation and transparency in the Public Administration.

It is true: Open Data are a formidable tool that allows citizens to access public information in a new way, through apps and software platforms created by third parties. They are certainly necessary for achieving that goal but, unfortunately, not sufficient.

Open Data are not sufficient, and that has to be accepted

Open Data are not sufficient because they are almost always raw data, published as a file (in CSV format, at best) that the citizen or the programmer has to download before being able to do anything with it.

In 99% of cases, a citizen who downloads a CSV file will open it with Excel to find the information he/she needs directly. It would therefore be more useful to create a dedicated web page, perhaps powered by the same data source that generated the CSV file: fewer complications, easier access to useful information, and the ability to index the content and insert links and images.
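To make the idea concrete, here is a minimal sketch of such a page generator, assuming the records are already available as Python dictionaries read from the body’s data source. The record fields, the example URLs and the `render_page` function are hypothetical illustrations, not part of any existing portal.

```python
import html

# Hypothetical records: in practice they would come from the same data
# source that generates the CSV file.
RECORDS = [
    {"name": "Town Hall", "address": "Piazza Roma 1", "url": "https://example.org/aed/1"},
    {"name": "Library", "address": "Via Verdi 12", "url": "https://example.org/aed/2"},
]

ROW = '<tr><td><a href="{url}">{name}</a></td><td>{address}</td></tr>'


def render_page(title, records):
    """Render the records as a simple, indexable HTML page with links."""
    rows = "\n".join(
        ROW.format(
            url=html.escape(r["url"], quote=True),   # escape to keep the markup valid
            name=html.escape(r["name"]),
            address=html.escape(r["address"]),
        )
        for r in records
    )
    t = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{t}</title></head><body>\n"
        f"<h1>{t}</h1>\n"
        "<table>\n<tr><th>Location</th><th>Address</th></tr>\n"
        f"{rows}\n"
        "</table>\n</body></html>\n"
    )
```

Because the page is generated from the live data rather than from a copy, it changes whenever the source changes, and search engines can index it like any other page.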

A programmer downloads a CSV file because the data will be used in an application: an app that analyzes the financial statement of a public body, a complex system that makes museums and architectural assets available, a map listing the defibrillators in a given area, or a weather alert service based on real-time weather data. Fortunately, public bodies make numerous data sets (data sources) available, and many more applications could be built on top of those data.

The problem is that programmers cannot focus only on the functionality of their applications. Instead, they have to create a dedicated database and load into it data from different CSV files, perhaps downloaded from several different public bodies. They also have to keep those data up to date, constantly monitoring the source bodies for changes, corrections or new versions of the data sets, and every new version has to be loaded and managed from scratch.
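A rough sketch of that extra plumbing, assuming the programmer keeps a local SQLite copy of each downloaded file and detects new versions by hashing the file contents. The table names and the `load_csv` and `has_changed` helpers are hypothetical; the point is how much code exists only to copy and re-check data that already lives somewhere else.

```python
import csv
import hashlib
import io
import sqlite3


def load_csv(conn, table, csv_text):
    """(Re)load one downloaded CSV file into a local table, replacing old rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0
    cols = list(rows[0].keys())
    # Identifiers are quoted naively: this assumes trusted, well-formed headers.
    conn.execute('DROP TABLE IF EXISTS "{}"'.format(table))
    conn.execute('CREATE TABLE "{}" ({})'.format(
        table, ", ".join('"{}" TEXT'.format(c) for c in cols)))
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, ", ".join("?" * len(cols))),
        [[row[c] for c in cols] for row in rows],
    )
    conn.commit()
    return len(rows)


def has_changed(csv_text, last_digest):
    """Spot a new version of a file by comparing SHA-256 content hashes."""
    digest = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    return digest != last_digest, digest
```

Every data set from every body needs its own download, polling schedule, stored digest and reload, and none of this work adds anything to what the application actually does for its users.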

This model is out of date and must be overcome

I already wrote about this in an article for Nova 100 in April 2012: it is time (indeed, it was already time back then) to switch from the logic of Open Data to the logic of Open Services, from the “download and use” model to the Application Programming Interface (API) model, which allows applications to access data directly and in real time on the systems where the data originate.

This can be achieved by following two different strategies and a few simple steps:

Strategy 1: Implement an in-house Open Services infrastructure

Stop producing CSV files. Of the two words in “Open Data”, the harder one to accept culturally is Open but, once a body decides to make some of its data available to citizens, most of the work is done. At this point, the thing to do is NOT to create lots of splendid CSV files and NOT to publish them in a special section of the body’s portal, but to move on to the next step.

Expose data through a service and API software layer. The programmer wants services to access data, secure and certified APIs, standard formats and protocols; in two words: integration and interoperability. When information is accessible through services and APIs, the programmer no longer has to bear the burden of creating, managing and updating a complete database that combines data from different sources, sometimes from different bodies and possibly in different formats, and can focus on the functionality of his/her application. In this case the CSV file is still available, not as a physical file stored on the administration’s servers, but as one of the (many) formats in which individual data sets can be exported, possibly after being filtered, sorted or aggregated as needed.
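The core of such a service layer can be sketched in a few functions, assuming the data set is held in memory as a list of records (a real body would query its own systems of origin instead). The data set contents and the `query`, `aggregate` and `export` names are hypothetical; the sketch shows CSV becoming just one output format of a filtered, sorted or aggregated query rather than a stored file.

```python
import csv
import io
import json

# Hypothetical data set standing in for the body's system of origin.
DATASET = [
    {"id": 1, "municipality": "Torino", "type": "school", "seats": 300},
    {"id": 2, "municipality": "Milano", "type": "museum", "seats": 120},
    {"id": 3, "municipality": "Torino", "type": "museum", "seats": 80},
]


def query(records, filters=None, sort_by=None, descending=False):
    """Keep records whose fields match every filter exactly, then optionally sort."""
    filters = filters or {}
    result = [r for r in records if all(r.get(k) == v for k, v in filters.items())]
    if sort_by is not None:
        result.sort(key=lambda r: r[sort_by], reverse=descending)
    return result


def aggregate(records, group_by, field):
    """Sum a numeric field per group, preserving first-seen group order."""
    totals = {}
    for r in records:
        totals[r[group_by]] = totals.get(r[group_by], 0) + r[field]
    return [{group_by: k, field: v} for k, v in totals.items()]


def export(records, fmt="json"):
    """Serialize a query result as JSON (default) or, on request, as CSV."""
    if fmt == "json":
        return json.dumps(records)
    if fmt == "csv":
        if not records:
            return ""
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError("unsupported format: " + fmt)
```

For instance, `export(query(DATASET, {"municipality": "Torino"}, sort_by="seats"), "csv")` produces a CSV with only the Torino records, ordered by seats, generated on demand from current data. Putting these functions behind HTTP endpoints is what would turn them into an actual API.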

Strategy 2: Use an existing Open Services infrastructure

When the body does not intend to equip itself with its own Open Services infrastructure, it can instead rely on an external, dedicated, ready-made platform, such as the American Data.World, which aims to become the world’s largest hub for collecting public and private data, or the Italian OpenRecordz, which is more oriented towards Open Data as a tool for building an authentic smart city.

With such a platform, a body can create its own data repository, as Data.World allows, or its own Smart City, as OpenRecordz allows, defining, publishing and populating its data sets and then using them in applications through standard APIs. For example, OpenRecordz provides direct access to data through a REST API that exchanges data in JSON, while Data.World allows direct integration with BI tools and dashboards.
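From the application’s side, consuming a REST API that returns JSON takes only a few lines. The sketch below uses Python’s standard `urllib`; the base URL and the `/datasets/<name>/records` path are purely hypothetical and are not the documented endpoints of OpenRecordz or Data.World, whose own API references should be consulted for the real resource layout and authentication.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: NOT a documented OpenRecordz or Data.World URL.
BASE_URL = "https://api.example-opendata.org/v1"


def build_url(dataset, **params):
    """Compose the URL of a data set's records plus optional query parameters."""
    query = "?" + urlencode(sorted(params.items())) if params else ""
    return "{}/datasets/{}/records{}".format(BASE_URL, dataset, query)


def fetch_records(dataset, opener=urlopen, **params):
    """GET the records and decode the JSON body.

    `opener` defaults to urllib's urlopen and can be swapped for a stub in tests.
    """
    with opener(build_url(dataset, **params)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as `fetch_records("defibrillators", city="Torino")` would always return the platform’s current data, with no local database to create or refresh.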

What are the advantages of Strategy 2?

It will be easier for the public body to release data as Open Data. There will no longer be any need to prepare CSV files, list them, publish them on a specific page of the body’s site and keep them updated over time. If end users need CSVs, they can still find them among the many formats in which individual data sets can be exported.

It will be easier to build applications that use Open Data. Programmers, start-ups, SMEs or large companies wanting to create applications based on Open Data only have to think about their application, without taking care of the technical aspects of managing and updating the data.

Data always up to date. The data “freed” by public bodies will be updated automatically; it will no longer be necessary to update data streams manually or to go through intermediate physical formats. In some cases, custom alignment functions may be needed to update the data on third-party Open Services platforms. Applications that use the data will no longer have to worry about how current they are, because they can be certain that the data are always up to date.
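One way such a custom alignment function might work is sketched below, assuming every record carries a stable `id` and that both the body’s records and the platform copy can be read as lists of dictionaries. The `plan_alignment` and `apply_alignment` names are hypothetical, and the in-memory “platform” stands in for whatever write API a real third-party platform exposes.

```python
def plan_alignment(source, platform, key="id"):
    """Compare the body's records with the platform copy and list the changes."""
    src = {r[key]: r for r in source}
    dst = {r[key]: r for r in platform}
    return {
        "create": [src[k] for k in src if k not in dst],
        "update": [src[k] for k in src if k in dst and src[k] != dst[k]],
        "delete": [k for k in dst if k not in src],
    }


def apply_alignment(platform, plan, key="id"):
    """Apply a plan to an in-memory stand-in for the platform's data set."""
    store = {r[key]: r for r in platform}
    for r in plan["create"] + plan["update"]:
        store[r[key]] = r
    for k in plan["delete"]:
        store.pop(k, None)
    return list(store.values())
```

Computing a plan first, instead of re-uploading everything, means only the records that actually changed are sent to the platform, and the plan itself can be logged for transparency.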

Control of use by the body. Bodies that wish to do so will be able to grant access to their Open Services through a dedicated access key, similar to the keys used by applications living in the Facebook and Twitter ecosystems. This “controlled access” could be very useful for understanding which external applications are using the data, how much and how.
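A minimal sketch of key-based access with usage tracking, assuming one key per registered application and a simple per-data-set call counter. The `KeyRegistry` class is hypothetical; a production system would persist keys, store only hashes of them, and serve requests over HTTPS.

```python
import secrets
from collections import Counter


class KeyRegistry:
    """Issue access keys to applications and count their calls per data set."""

    def __init__(self):
        self._apps = {}          # access key -> application name
        self.usage = Counter()   # (application, data set) -> number of calls

    def issue_key(self, app_name):
        """Generate a random 32-character hex key for a registered application."""
        key = secrets.token_hex(16)
        self._apps[key] = app_name
        return key

    def revoke_key(self, key):
        self._apps.pop(key, None)

    def authorize(self, key, dataset):
        """Return the calling application's name, or raise if the key is unknown."""
        app = self._apps.get(key)
        if app is None:
            raise PermissionError("invalid or revoked access key")
        self.usage[(app, dataset)] += 1
        return app
```

Calling `authorize` at the start of every API request lets the body answer exactly the questions mentioned above: which application is using which data set, and how often.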

Collaborative Open Data. The information contained in a record of an Open Data data set is often not sufficient for the particular needs of some applications. In these cases the problem is upstream: the body that publishes the information simply does not have it.

Yet this is information that could make the difference between a useful datum and a very useful one. In such cases, applications that use Open Data often take on the job of enriching the database themselves, adding information from external sources or asking users to complete a datum with additional details. By applying this model, a body could expose APIs that allow applications not only to read a datum but also to add information to it.

This would be information that does not come directly from the body, and therefore, in a sense, not certified. Often, however, it is better to have uncertified information than to have no information.
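A sketch of how certified and contributed information might coexist in one record, assuming user contributions are stored in a separate layer and every value returned to applications is labelled with its provenance. The `Record` and `Contribution` classes are hypothetical; in this version a contribution can add new fields but never overwrite a field published by the body.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    field_name: str
    value: str
    author: str


@dataclass
class Record:
    certified: dict                                      # fields published by the body
    contributions: list = field(default_factory=list)   # user-supplied, uncertified

    def contribute(self, field_name, value, author):
        """Add information without touching the body's certified fields."""
        self.contributions.append(Contribution(field_name, value, author))

    def view(self):
        """Merge both layers, labelling each value as certified or not."""
        merged = {k: {"value": v, "certified": True} for k, v in self.certified.items()}
        for c in self.contributions:
            if c.field_name not in self.certified:   # certified data always wins
                # Later contributions to the same field replace earlier ones.
                merged[c.field_name] = {"value": c.value, "certified": False}
        return merged
```

Applications can then decide for themselves how to present uncertified values, for example by showing them with a “reported by users” label.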

An example of this functionality? Among the data sets published by some municipalities there is often a list of the defibrillators in the area, with geographic details and indications of where each one can be found. Imagine needing a defibrillator, going to where it should be and discovering that it is broken and out of service. Or it is there, clearly visible, but the office in charge is closed without you knowing it, and you do not even know whom to contact to open the doors, retrieve the defibrillator and perhaps save a life. Giving the users of these services the ability to enter additional information, such as whether the defibrillator is available, its opening hours and a contact person, enriches the database from the outset, and these data also become available to every other application that uses that data set, all without spending one euro of public money. The quality of the database could then be safeguarded through data versioning tools which, in the event of vandalism or careless changes, allow the correct datum to be restored, in a way that is completely transparent to the applications using it.
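The versioning idea can be sketched as an append-only history of revisions, assuming each change is stored as a full copy of the record together with its author. The `VersionedRecord` class is hypothetical. A revert does not erase anything: it appends the earlier revision as a new one, so the history stays auditable and applications simply keep reading `current`.

```python
import copy


class VersionedRecord:
    """Keep every revision of a record so that a bad edit can be rolled back."""

    def __init__(self, data, author="body"):
        self._history = [(copy.deepcopy(data), author)]

    @property
    def current(self):
        """The latest revision; a copy, so callers cannot alter the history."""
        return copy.deepcopy(self._history[-1][0])

    @property
    def version(self):
        return len(self._history) - 1

    def update(self, changes, author):
        """Apply field changes on top of the current revision."""
        new = self.current
        new.update(changes)
        self._history.append((new, author))
        return self.version

    def revert_to(self, version, author="moderator"):
        """Restore an earlier revision by appending it as a new one."""
        restored = copy.deepcopy(self._history[version][0])
        self._history.append((restored, author))
        return self.version

    def log(self):
        """List (version, author) pairs, in order, for transparency and audits."""
        return [(i, author) for i, (_, author) in enumerate(self._history)]
```

In the defibrillator scenario, a user could mark a unit as broken and add a contact; if someone later vandalizes the record, a moderator reverts to the last good revision and every application immediately sees the restored data.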

That is why Open Data are not sufficient: we need to start thinking about Open Services and interoperability. This will make it easier for the public administration to produce the data in the first place, and will provide new, more modern tools for building applications of real use to citizens.

Massimo Canducci