“Everything should be made as simple as possible, but not simpler.”
Albert Einstein
The terms Cloud and Big Data have been in vogue for some time and are often mentioned together. This pairing generates uncertainty and misunderstanding, sometimes unintentionally and sometimes deliberately, when suppliers of products or services use it for promotional and marketing purposes. It is therefore worth pausing to reflect on the real potential this combination offers.
The phenomenon of Big Data is supported by technologies whose progenitor is the Hadoop open source project, which recently celebrated its 10th anniversary. It is so well established that for some years the research firm Gartner has classified it as a practice rather than as an emerging technology. In reality, however, adopting it still means overcoming several difficulties. These include identifying reasonable and profitable analytical scenarios and finding the many specialized skills needed by those who develop solutions. They also include choosing among the many software tools available and, finally, selecting the hardware infrastructure and the supporting software.
Big Data and Cloud
According to Forbes, by 2020 every human being will generate 1.7 MB of information per second, and one third of this will be managed in the cloud. A recent survey of 1,400 companies in 77 countries was conducted by a vendor of Hadoop-based analytics solutions. It showed that 53% of respondents had already deployed their Big Data platform in the cloud and that 72% planned to do so in the near future.
So far the cause-and-effect reasoning seems obvious and unequivocal: we move Big Data to the cloud to make it easier to manage. But there are several aspects to consider, and not all of them offer guaranteed advantages.
The cloud has intrinsic characteristics that are particularly desirable for managing Big Data. Resources are available on request and can be supplied quickly (on demand/fast provisioning). It offers elasticity, meaning the ability to adapt to changing workloads, along with flexible infrastructure and a potentially faster time-to-market. Finally, spending is based on the actual use of resources (pay-as-you-go capacity).
The cloud makes it possible to acquire virtual machines very quickly for testing one’s Big Data solution. This is effective: it reduces project start-up times and makes it easier to run future technology experiments or demonstrations (proofs of concept). By carefully selecting technology suppliers (cloud providers), we can know costs in advance and contain them, avoiding waste and unpleasant surprises.
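To make the pay-as-you-go argument concrete, here is a minimal Python sketch. It compares two clusters over one day. The first is kept at peak size around the clock. The second is elastic and is billed only for the nodes it actually uses in each period. The hourly rate, the daily demand profile and the function names are hypothetical assumptions chosen for illustration. They are not figures from any cloud provider or from the survey cited here.

```python
"""Hypothetical comparison of a cluster sized for peak load versus an elastic,
pay-as-you-go cluster. All prices and workloads are illustrative assumptions."""

# Hypothetical price per node-hour (not any real provider's rate)
HOURLY_RATE_PER_NODE = 0.50

# Hypothetical number of nodes needed in each 4-hour block of one day (6 blocks = 24 h)
nodes_needed = [2, 2, 10, 10, 4, 2]
HOURS_PER_BLOCK = 4


def fixed_cost(demand, rate, hours_per_block):
    """Cost of keeping enough nodes for the peak block running for the whole day."""
    peak = max(demand)
    return peak * rate * hours_per_block * len(demand)


def elastic_cost(demand, rate, hours_per_block):
    """Cost when paying only for the nodes actually used in each block."""
    return sum(n * rate * hours_per_block for n in demand)


if __name__ == "__main__":
    fixed = fixed_cost(nodes_needed, HOURLY_RATE_PER_NODE, HOURS_PER_BLOCK)
    elastic = elastic_cost(nodes_needed, HOURLY_RATE_PER_NODE, HOURS_PER_BLOCK)
    print(f"Sized for peak: {fixed:.2f} per day")
    print(f"Elastic (pay-as-you-go): {elastic:.2f} per day")
```

With this made-up profile, the elastic setup costs half as much as the peak-sized one (60.00 versus 120.00 per day). The gap depends entirely on how bursty the workload is: a cluster that runs near peak all day gains little from elasticity.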
It should also be recognized that there are different ways of using the cloud, which is one more reason to choose carefully.
Big Data as a Service?
Cloud providers such as IBM, Microsoft and Amazon offer Big Data services, adopting open source or proprietary solutions that meet diverse project needs. The offering is broad and sound. It relieves customers of responsibility for the choice, which is de facto delegated to the authority of the supplier’s brand. It also relieves them of managing and maintaining the software. Nevertheless, the range of options made available by cloud providers remains limited compared with what is available to a project that defines its own architecture. Paradoxically, this may be comforting and reassuring for IT architects, given the sheer number of tools among which they would otherwise have to choose.
A further consideration mainly concerns Big Data consumed as a service (Big Data-as-a-Service). This is so-called lock-in, namely the difficulty of moving away from a technology provider without incurring substantial costs. The Big Data landscape is currently very lively and offers many opportunities. At such a time, binding oneself to a single technological choice could prove extremely counterproductive, even in the short term.
What about data protection?
One of the main issues in adopting cloud infrastructure to host Big Data is data protection. Many sectors, particularly the financial sector, often forgo the cloud for fear of data breaches. These fears concern both network traffic and the physical location of the data. Another aspect to consider is compliance with privacy regulations, which vary from country to country. Storing sensitive information in the cloud may cost less than other solutions, but it means relying on security practices dictated by the service provider.
One possible solution is to rely on providers of private or mixed (hybrid) clouds, although this does not allow taking full advantage of the economic benefits of the public cloud. The hybrid option makes it possible to keep the most delicate activities and sensitive data on a private cloud. In an on-premises deployment, this private cloud may even reside with the owner of the information. Activities that use external data can then be placed on the public cloud, since those data require less protection even when their volume is large. This arrangement still allows worthwhile cost savings.
Monica Franceschini