MARKET | Apr 20, 2017

Is Data Lake a new resource?

A Bloor Research report identifies the Data Lake as a potential tool for predictive analysis

Organisations are increasingly relying on knowledge derived from data to improve profitability, discover new opportunities, accelerate product and service innovation, and ensure a good customer experience.

As is well known, however, Big Data requires new capabilities for correlating information, with the aim of approaching data visually and through new interpretation models. These are considerable challenges for companies that, approaching the idea with a traditional stance, risk taking on costly processes that demand a heavy commitment of human resources and do not deliver the expected results.

Data Lake: what are we talking about?

A Data Lake is a working method that simplifies the storage, management and analysis of Big Data through a single collaborative environment where the supply of and demand for data are managed and explored, and where any type of data, from any source inside or outside the company, can be acquired and correlated so that it can subsequently be prepared and delivered for analysis. In summary, without getting into technical details, a Data Lake can be considered the data management platform for the entire company: data flows in from different sources, and the lake's various users can examine its content, dive in or draw samples. In practice, it is a huge volume of raw data (structured, semi-structured and unstructured) held in its native format, ready to be studied and analysed.

The idea is simple: instead of putting the data in an ad hoc warehouse, it is moved into a Data Lake in its original format, thus eliminating the initial costs of data entry and transformation and providing the opportunity for it to be used by anyone within the company.
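As a rough illustration of this idea, the Python sketch below stores incoming files byte for byte and applies structure only when someone reads them. It is a minimal sketch under stated assumptions, not a description of any particular product: the function names `land_raw` and `read_records`, the folder-per-source layout and the sample payloads are all hypothetical.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Store a payload unchanged, grouped by the (hypothetical) source system."""
    target = lake_root / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # no parsing and no transformation on the way in
    return target


def read_records(path: Path) -> list[dict]:
    """Apply structure only at read time, chosen by file extension."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return json.loads(text)
    if path.suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"no reader registered for {path.suffix}")


# Illustrative data only: two sources land files in their native formats.
lake = Path(tempfile.mkdtemp())
crm_file = land_raw(lake, "crm", "customers.json", b'[{"id": 1, "name": "Ada"}]')
shop_file = land_raw(lake, "webshop", "orders.csv", b"order_id,customer_id\n100,1\n")

customers = read_records(crm_file)
orders = read_records(shop_file)
```

The cost of interpreting each format is paid by the reader, at the moment of analysis, rather than up front when the data arrives.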

The critical issue in using this tool is the difficulty of building a “lake” that collects data in a way that makes it effectively exploitable, thus preventing it from becoming a “swamp.”

Advantages and main features:

According to a recent Bloor Research report, an intelligent Data Lake enriches Big Data’s useful information and correlates it with customers, products and other business-critical entities. In addition to retrospective data metrics and analyses, and the core advantages of business intelligence and data warehousing, Data Lakes will offer new possibilities for predictive analysis.

According to Bloor, Data Lake management should support productivity and collaboration, and make it easier to find data quickly through controlled access to the platform. Furthermore, a well-built Data Lake can:

  • ensure that all types of data can be included, prepared for analysis and delivered to users quickly and automatically
  • comply with operational data governance guidelines, following the rules for accessing and analysing data set in advance by staff from different departments and functions
  • maintain data quality, including through machine learning, which helps to simplify and improve the automation of the analysis process and so limits the possibility of error
  • record data lineage by storing all the information needed to determine the source of the data and to maximise its reuse
  • protect the data by controlling access through security measures such as access control, encryption and data masking
  • maintain semantic consistency by capturing and retaining metadata, so that data sets are simple to find and easy to understand
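
Three of the points above (lineage, protection and metadata) can be sketched as a small catalog record in Python. This is a hypothetical illustration rather than anything taken from the Bloor report: the `CatalogEntry` class, its field names and the hash-based masking are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record for one data set in the lake."""
    name: str
    description: str  # metadata that makes the data set findable
    source: str  # lineage: the system the data came from
    derived_from: list[str] = field(default_factory=list)  # lineage: parent data sets
    masked_fields: set[str] = field(default_factory=set)  # fields hidden from most readers


def mask(value: str) -> str:
    """Replace a sensitive value with a short, stable digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def read_with_policy(entry: CatalogEntry, rows: list[dict], may_see_raw: bool) -> list[dict]:
    """Return rows, masking protected fields unless the reader is cleared."""
    if may_see_raw:
        return rows
    return [
        {k: mask(str(v)) if k in entry.masked_fields else v for k, v in row.items()}
        for row in rows
    ]


# Illustrative values only.
entry = CatalogEntry(
    name="customers_clean",
    description="Deduplicated CRM customers",
    source="crm",
    derived_from=["crm/customers.json"],
    masked_fields={"email"},
)
rows = [{"id": 1, "email": "ada@example.com"}]
masked = read_with_policy(entry, rows, may_see_raw=False)
```

An unsalted digest is used here only to keep the sketch short; it should not be read as adequate protection for real personal data.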

Only experience will tell whether the Data Lake is the most effective methodology for inferential analysis. What is certain is that analysis techniques based on Big Data require a genuine change in the mindset of analysts: a change that translates into new operating procedures, new analysis models and a new way of seeing things, one that looks at an increasingly complex world from new interpretative perspectives and poses ever more complex technological challenges.

As we face the challenge of constructing meaning through Big Data analysis, however, we must never lose sight of the fact that inferential statistics look at the what, while we must always remember to look at the why.