SOCIETY | Nov 21, 2017

National guidelines and licenses for Open Data

What is the proper license for publication of open data?

Does the publication of Open Data by public sector entities require an “attribution” license (like those recommended by the national guidelines, even in their latest version in 2016), or is a more “liberal” license allowable? I have already expressed my preference in public documents, but it’s worth restating the issue, given that even recently I’ve had to contradict well-informed friends on what is allowable.

Brief summary

Open Data is defined as datasets produced by the public sector and made available to everyone under conditions which allow reuse of the data for any purpose, including commercial purposes. The fundamental provision is Article 52 of the Code of Digital Administration (“CAD”).

In order to be available for the stated purposes, they must have established characteristics regarding quality, usability and – for what we’re concerned with here – legal rights. We shall see below that the legal rights which impede full reusability are those called “sui generis” or “database rights”. Copyright as it is normally understood is almost entirely absent from the picture.

The two alternatives that are proposed, as I have already described with Simone Aliprandi in the Freegis.net study, are:

  • permissive licenses (“attribution”)
  • waiver

For a better understanding, I refer you to that article and to the guidelines for the choice of licenses in the Freegis.net project.

2014 Guidelines

Regarding Open Data, Article 52 of the CAD refers to the definition contained in Article 68 of the CAD, making reference to the national guidelines to be issued, which those who wish to release a dataset as Open Data must conform to:

The Agency establishes and annually updates the national guidelines which identify the technical standards, including the determination of the types of services and data, the procedures and methods for realization of the provisions of Title V of this Code with the goal of making the process standardized, efficient and effective at a national level. Public sector entities as per Article 2, paragraph 2, of this Code shall align themselves with the aforesaid guidelines.

The binding nature of the guidelines thus refers, by regulation, only to the technical part and not (expressly) to the legal part of the guidelines. The 2014 guidelines work to establish a legal framework, and thus they deal – in my opinion correctly – with the subject of the preferred licenses to adopt. Except that they make a choice which is incorrect in my opinion, i.e. they recommend using a Creative Commons Attribution license (CC-by). They do this for reasons which I find incredible:

In the end, one must remember that the majority of the data and documents necessary for performing the typical functions of public organizations are not appropriate for application of CC0, given that this involves the release of moral rights which are inalienable, unassignable and non-expiring according to national and European rules.

I have previously criticized at length the idea that there is a moral right to protect here, and I think there’s little to add. Actually, no. In the new version of the guidelines, this part has vanished. Apparently someone became aware of the issue. I’m glad; I don’t take credit for it, but I hope I have made a useful contribution to the discussion.

Is that settled then? No. We’re halfway across the bridge.

The 2016/2017 guidelines

The new guidelines continue to recommend the CC-by 4.0 license, although at this point it is in effect expressed more as a recommendation than as an instruction. Yet, from discussing it with the persons involved, it is interpreted as more than that. The guidelines retain a holdover of the old reasoning, which now gives no longer three but two reasons to choose an attribution license:

Regarding the above, keeping in mind the regulatory context in question, the indications on the subject of licenses contained in Commission Notice 2014/C – 240/01 and the principles on the inalienability of goods in the cultural public domain expressed in Articles 10 and 53 of the Code on cultural heritage (Legislative Decree 22 January 2004, n. 42), it is deemed appropriate to refer to a single open license, which guarantees freedom of reuse, which is internationally recognized and which allows for the attribution of datasets’ authorship (source attribution). Therefore, one suggests the general adoption of the 4.0 version of the CC-BY license, further presupposing the automatic attribution of said license in the case of application of the “Open Data by default” principle, set forth in the provisions contained in Article 52 of the CAD. [emphasis added]

Thus the suggestion of an “attribution” license rests on the following two grounds:

  • Recommendation of the notice
  • Cultural public domain

I say this without trying to offend: neither of these two reasons meets the minimum threshold of acceptability for a legal argument.

Notice 2014/C – 240/01

The plain text of the cited notice (in reference to publication of documents, but this is the only part which deals with licensing) is already clear enough that it does not need too much explanation:

Of these, the CC0 public domain dedication (7) is of particular interest. As a legal tool which allows waiving copyright and database rights on public sector information, it ensures full flexibility for re-users and reduces the complications associated with handling numerous licenses, with possibly conflicting provisions. If the CC0 public domain dedication cannot be used, public sector bodies are encouraged to use open standard licenses [emphasis added]

The cultural public domain

Frankly, scanning through the list of “cultural goods” I don’t see data, databases and datasets. Goods of the cultural public domain are those cultural goods which belong to the State. Here too I see nothing which enlightens us.

Let me give an example: can someone explain to me what connection the database containing the list of veterinary medicines authorized for sale or with a suspended license, or the databases containing analyses of personal income tax returns, have to the cultural public domain?

That alone should suffice. In the majority of cases the cultural public domain sticks out like a sore thumb. One doesn’t legislate starting from an exceptional case and establish regulations for the entire subject based on that. One creates the most general rules possible and then one introduces exceptions, if necessary. But here they’re not even necessary!

Indeed it is clear that a cultural good is one thing and a datum or a collection of data (and even a digital reproduction! we’ll get to that next) is something else entirely. A dataset containing a collection of information on cultural goods:

  1. is in no way representative of all the types of data made available. If one takes the trouble to examine the catalogues at various levels, one sees that this is a very specific type, of negligible significance, meriting if anything a specific provision, not something to dictate the governance of all data issued.
  2. may contain data, reproductions or other material actually covered by copyright, but those rights do not extend to the license on the dataset itself.

There is a profound confusion here between the concept of a dataset, which is in substance that which is protected by the sui generis right, and the individual components of the dataset, the information, which may for example be photographs. The dataset is not a derivative work of the individual components. The distinction has been made once and for all by the Directive on database rights itself, in its recitals:

  26. whereas works protected by copyright and the services protected by some connected rights that are inserted in a database nevertheless benefit from their respective exclusive rights and therefore cannot be inserted in or reproduced from a database without authorization of the rights’ owner or assignee.
  27. whereas the existence of a distinct right in the selection or arrangement of works and services in a database leaves intact the copyrights of said works and the connected rights of the services inserted in a database
  39. whereas, in addition to protecting the copyright in the original selection or arrangement of a database’s contents, the present directive intends to safeguard the compilers of databases from the improper appropriation of the result of the financial and professional investment made to obtain and gather the contents, protecting the entirety or the substantial parts of the database from certain acts committed by the user or by a competitor

When one speaks of databases, one speaks of objects on which three distinct rights may rest:

  1. the copyright on the individual content elements
  2. a copyright on the original database for selection or arrangement of works and services
  3. a sui generis right

Granted that Open Data licenses have serious difficulty handling databases that are original by reason of the selection or arrangement of works (recital 27), and that the rights regarding individual content elements of a database are in any case covered by their individual licenses, which are not prejudiced by their insertion in a database (recital 26), what we are dealing with is only, or at least predominantly, the sui generis right. This is data, information, which has its substance solely in that it has been collected, not when considered individually (unless this is under another form of protection).

From reading the original version of the guidelines it is clear that there has been some confusion between the three points indicated above. If there is something in the dataset which merits special protection (and this is the exception), then for that datum or category of data it would be appropriate to give an informational note on the associated license (and attribution of authorship if subject to copyright, something which is not guaranteed per se by the simple attribution of a CC-by license to all datasets).

The aim of Open Data

One last assessment on why a “waiver” license is preferable to an “attribution” license. The answer does not come from me. It is given by the provisions of Article 52 of the CAD (which does not mention attribution). The regulation is not a masterpiece of clarity, partly because it contains cross-references to other regulations, conflating the field of licenses for rights and the field of safeguarding privacy (which are two radically different elements and which are handled by radically different tools). The regulation provides (by default) that data should be freely reusable by anyone, even for commercial purposes, even in a disaggregated form. One day we will have a lecture on the misleading use of “even” in regulations, but that will not be today.

The answer is also given by the Commission notice cited in the guidelines:

[…] offers full flexibility for re-users and reduces the complications associated with handling numerous licenses, with possibly conflicting provisions.

Furthermore, when dealing with all possibilities:

It is recommended that, depending on the law applicable, the obligations be kept to a minimum, requiring at most: a) a statement identifying the source of the document; b) a link to relevant licensing information (where practicable).

The phrase “tutt’al più” used in the Italian version (“at most” in the English version, “tout au plus en” in the French) clearly doesn’t mean “as a minimum”, but “as a maximum”, emphasizing that the tendency should be to minimize complications, establishing attribution as a ceiling. In line with this, precedence should be given to licenses which, compatibly with the rights that interfere with release, favor minimizing friction.

This license is the waiver model: it is CC0.

My advice is always, or at any rate in all cases in which it is possible, to use CC0 as the preferred license for Open Data.

Carlo Piana