TECH | May 2, 2017

Scraping of data from sites: can it be done?

What case law says about the collection of data from websites, or web scraping

The topic we are going to talk about is undoubtedly a much-debated issue: web scraping, which carries the million-dollar question: is scraping data from sites legitimate or not?

As often happens, the interlocutor/reader is surprised that there is no one unambiguous answer, but several responses. Law – unfortunately? luckily? – is not an exact science, but we can try to understand something more together.

What is scraping?

Web scraping is a technique that uses software to automate the retrieval of data of interest from specific websites. It is not, and it is best to clarify this from the outset, theft of confidential data: the data retrieved are already published and available on the source sites; the software is simply programmed to access those sites systematically and automatically at scheduled intervals.

It is, therefore, a technique similar to that used by search engines, which through bots, scrapers, crawlers and spiders, retrieve the information they will then use to provide services to their users.

Not only that: other sites – the so-called aggregators of which the web is full – use web scraping to offer users the possibility of comparing information present on different sites. But we will come back to this specific point later.
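To make the mechanism concrete, here is a minimal sketch in Python of what an aggregator-style scraper does: it reads publicly visible HTML, extracts the values of interest and compares them across sites. Everything in it is an assumption made for illustration: the site names, the markup, the routes and the prices are invented, and the pages are hard-coded strings instead of being downloaded at scheduled intervals as a real scraper would do.

```python
from html.parser import HTMLParser

# Hypothetical data: two made-up pages that publish the same kind of information.
SAMPLE_PAGES = {
    "site-a.example": '<ul><li class="fare" data-route="MXP-DUB">49.90</li>'
                      '<li class="fare" data-route="MXP-STN">35.00</li></ul>',
    "site-b.example": '<ul><li class="fare" data-route="MXP-DUB">44.50</li></ul>',
}


class FareParser(HTMLParser):
    """Collects {route: price} from <li class="fare" data-route="..."> elements."""

    def __init__(self):
        super().__init__()
        self.fares = {}
        self._route = None  # route of the <li> currently being read, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "fare":
            self._route = attributes.get("data-route")

    def handle_data(self, data):
        # Text inside a fare element is the published price.
        if self._route is not None and data.strip():
            self.fares[self._route] = float(data)

    def handle_endtag(self, tag):
        if tag == "li":
            self._route = None


def scrape(pages):
    """Extract the fares published on each page: {site: {route: price}}."""
    results = {}
    for site, html in pages.items():
        parser = FareParser()
        parser.feed(html)
        parser.close()
        results[site] = parser.fares
    return results


def cheapest(results, route):
    """Return (price, site) for the lowest fare on a route, or None if unlisted."""
    offers = [(fares[route], site) for site, fares in results.items() if route in fares]
    return min(offers) if offers else None


if __name__ == "__main__":
    print(cheapest(scrape(SAMPLE_PAGES), "MXP-DUB"))
```

Note that nothing here bypasses any protection: the program only reads what any visitor's browser would receive. Whether doing so systematically is lawful is a separate question, and it is the one the rest of this article addresses.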

Scraping is a legitimate activity in itself, but it can take on different meanings depending on the context, the specific purpose of the collection and the use made of the data retrieved: in the abstract, it may give rise to various violations, from copyright infringement to breaches of the confidentiality of personal data.

In looking at the phenomenon, we cannot fail to consider that a plurality of interests is at stake. The legal system mainly deals with those of site owners and managers, who are “deprived” of the economic value of the data and databases in their possession, which in any case were built by investing considerable resources in legitimately acquiring consent from the data subjects.

Since the early 2000s there have been various lawsuits of this kind overseas (remember the eBay v. Bidder’s Edge case?), many of which have involved big players. In one 2009 case, for example, carefully drafted Terms of Service allowed Facebook to bring action for copyright infringement against Power Ventures. These cases gave rise to the most disparate rulings, ranging from a tendency to consider scraping lawful to treating it as a brute-force attack, even when carried out where no access authentication was required and regardless of the commercial purpose of the “scraping” operation. In this respect, the case of Andrew Auernheimer, alias “weev”, who was charged and convicted for having created in 2010 a database of 114,000 email addresses from the AT&T site, has now become legal history.

An important signal in Europe came from the judgment of the Court of Justice in Case C-30/14, in which Ryanair brought action against PR Aviation. The judgment confirmed that contractual clauses prohibiting third parties from using information extracted from other sites and aggregated for commercial purposes are compatible with EU law.

PR Aviation had been systematically and automatically extracting detailed flight information from the Ryanair site, even though the site's general terms of use explicitly prohibited scraping. According to Ryanair, that practice constituted, among other things, an infringement of copyright and of the sui generis database right.

First the Court of Utrecht and then the Amsterdam Court of Appeal rejected Ryanair’s claim on the grounds that screen scraping constituted a completely normal and legitimate use of the site, falling within the free use provided for by Articles 6 and 8 of Directive 96/9/EC.

However, in accepting the airline’s request, the Court of Justice found that the Directive was not applicable to a database protected neither by copyright nor by the sui generis right, and that free use “does not preclude the creator of a database from laying down contractual limitations”. It then referred back to the national court the task of verifying which form of protection applied to Ryanair’s database: copyright or the sui generis right of the database creator.

The same matter was also dealt with by the Court of Milan in the case of Viaggiare Srl vs Ryanair. Its first decision of 4 June 2013 (No. 7825), whose epilogue came only a few months ago, established the legality of screen scraping Ryanair’s database from the airline’s site, holding that the database could not be protected under Directive 96/9/EC. In that case, even assuming the sui generis right of the creator of Ryanair’s database were recognized, Viaggiare Srl’s business was not considered such as to harm the commercial interests of the air carrier.

On the other hand, how can data subjects be protected?

Legislative Decree No. 196/2003, the “Privacy” Code, provides that every processing operation, except those provided for by law or regulation, must be based on the valid consent of the data subject, and that such consent is in any case valid and effective only for that specific processing. Often, instead, scraping is used precisely to use the data beyond the scope of that consent: to collect and republish online (and thus also disseminate …) the personal data of unsuspecting users, who handed their data over to specific sites for specific purposes.

That is why, with Provision No. 4 of 14 January 2016, the Italian Data Protection Authority intervened to block what it called “trawling on the web”, namely the systematic “scraping” of data and information concerning millions of users carried out for telemarketing purposes by Develhop Srl through the site: the ultimate goal was to create actual phone directories outside the consolidated database (DBU) of all customers of Italian telephone operators and, of course, without individually acquiring the consent of the data subjects. The Authority did not declare data scraping illegal as such; what it found unlawful was the activity aimed at creating online phone databases, and the so-called “reverse search”, where the source was not the DBU.

Operating recommendations

It is advisable to take two precautions:

  • in the Terms of Service (TOS), prohibit users/visitors from using scraping techniques for the systematic retrieval of data and information. In the case of breach of the contractual terms, this makes it easier to act in court to protect one’s rights, obtain an injunction and claim compensation for any damage suffered; it is preferable that the clause be highlighted, or specifically and separately “flagged” (accepted) by the user
  • create a restricted section, allowing access only after registration. In such cases, scraping would become an authentic abuse of access to a computer system that could be prosecuted under Article 615-ter of the Italian Criminal Code (“Anyone who introduces him/herself abusively into a computer or telematic system protected by security measures or remains there against the express or silent will of whoever has the right to exclude him/her, is punished with imprisonment up to three years”).

Never forget that these activities should not be a mere exercise, but should only be carried out after appropriate assessments or in the presence of the necessary authorizations.

Remember, too, that case law, as the activity of the Data Protection Authority also shows, is constantly evolving. Therefore, pay the utmost attention.

Morena Rangone