MARKET | Mar 27, 2018

New horizons for Business Intelligence

Logical Data Warehouse and data virtualisation: the new frontier in BI

Historically, Business Intelligence (BI) infrastructure was very complex, comprising data sources and their normalisation, ETL processes, OLAP cubes and a Data Warehouse on a physical server. Technology has developed considerably, and the picture is changing because of two factors in particular: the cloud and virtualisation. But there is more… Trends in Business Intelligence point to other significant factors that will matter in the coming months.

New technologies always lead to new possibilities. In Business Intelligence, now widely referred to as Analytics, we are experiencing a fundamental revolution. The trend is towards a scenario in which most business users will have access to self-service tools to prepare data for analysis, without necessarily having to go through IT. Most stand-alone self-service data preparation solutions will either be extended into end-to-end analytics platforms or integrated as features of existing Front-End tools. Intelligent data discovery on Hadoop, based on semantic, visual and search-based exploration, will converge into a single form of next-generation data discovery.

According to a recent Gartner study, “Organisations are embracing self-service analytics and Business Intelligence to deliver these capabilities to business users at all levels. This trend is so marked that Gartner Inc. expects that by 2019 the analytics output of business users with self-service capabilities will surpass that of professional data scientists”. In a nutshell, very soon everyone will be able to perform data analysis without any special training in data processing.

Logical Data Warehouse: a new approach

What is a logical or virtual data warehouse? To understand the logic behind the Logical Data Warehouse, it helps to first examine the traditional enterprise Data Warehouse (DW) in detail. Barry Devlin defines it as follows: “A Data Warehouse is simply a single, complete and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use in a business context” (Data Warehouse: From Architecture to Implementation).

A Data Warehouse is a single, physical database. It can represent a heterogeneous set of data sources, each of which contains part of the business data used for transactions or business analyses.

The Logical Data Warehouse is an architectural style that presents data from various sources without that data necessarily being physically consolidated in any one place.

In the traditional scenario, with an Enterprise Data Warehouse (EDW), data is generally derived from transactional databases, applications, CRM systems, ERP systems or other data sources. This data is standardised, cleaned and transformed by an ETL (Extract, Transform, Load) process to ensure reliability, consistency and accuracy across the company before it is loaded into the Data Warehouse. This process provides a stable and secure data platform on which Data Scientists and information workers can perform complex analyses and generate reports.
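To make the ETL step concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in warehouse. The source rows, the dim_customer table and the cleaning rules are all hypothetical; a real EDW would extract from live CRM/ERP systems and apply far richer transformations.

```python
import sqlite3

# Hypothetical source extracts: in a real EDW these would come from CRM/ERP systems.
CRM_ROWS = [
    {"customer_id": 1, "name": "  Acme S.p.A. ", "country": "it"},
    {"customer_id": 2, "name": "Globex", "country": "ES"},
]
ERP_ROWS = [
    {"customer_id": 2, "name": "Globex", "country": "es"},  # duplicate of a CRM record
    {"customer_id": 3, "name": "Initech", "country": "de "},
]


def extract():
    """Extract: read raw rows from every source."""
    return CRM_ROWS + ERP_ROWS


def transform(rows):
    """Transform: clean and standardise, then de-duplicate on the business key."""
    seen = {}
    for row in rows:
        clean = (
            row["customer_id"],
            row["name"].strip(),
            row["country"].strip().upper(),
        )
        seen.setdefault(clean[0], clean)  # keep the first occurrence of each key
    return list(seen.values())


def load(conn, rows):
    """Load: persist the conformed rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
print(warehouse.execute("SELECT * FROM dim_customer ORDER BY customer_id").fetchall())
# [(1, 'Acme S.p.A.', 'IT'), (2, 'Globex', 'ES'), (3, 'Initech', 'DE')]
```

Note that the warehouse only sees the data as of the last load: any change in the sources stays invisible until the pipeline runs again, which is the latency that the ETL “funnel” introduces.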

Today the concept of the EDW is obsolete and inefficient because of the volume, variety and speed of data coming from the cloud, social networks, mobile devices and IoT devices: data, often unstructured, that is distributed across global sites in a multitude of formats. The assumption, and the expectation, is that all of this data will be accessible, meaningful and ready to be consumed by any real-time or near-real-time self-service BI application. By the time an EDW project of the kind described above is implemented, it has often lost its relevance to current business needs. Between planning, design and initial testing, the structure is unlikely to remain unchanged and, more importantly, it is very often necessary to take a few steps back to reshape the data structure. But there is more… To read this data efficiently, it has to pass through further tools such as OLAP cubes, and additional ETL processes have to be built to feed them.

As a BI consultant, I have seen well-designed projects that proved very complex and took a long time to implement because of the large “funnel” called ETL. Before loading, data normalisation is another critical point for every project. An LDW can deliver a 75% time saving because it requires neither an ETL process nor normalisation of the data. But there is more… In theory, with an LDW the Data Warehouse itself can even be dispensed with, since the logical layer can connect the data sources directly to the Front-End.

More and more business organisations are trying to gain control of this avalanche of raw data with a logical architecture that abstracts away the intrinsic complexities of Big Data through a combination of data virtualisation, metadata management and distributed computing. The Logical Data Warehouse architecture combines all of these elements, and so both includes and goes beyond the capabilities of the EDW.

The new Logical Data Warehouse concept will allow IT departments to carry out their BI tasks and responsibilities. Finally, the era of the true CIO (Chief Information Officer) has arrived.

The logical layer of an LDW provides (among other things) various mechanisms for viewing the data in the DW without having to move and transform it ahead of query time. In other words, the Logical Data Warehouse combines the traditional central warehouse (and its main function of aggregating, transforming and persisting data a priori) with real-time data retrieval and transformation functions.

The big advantage of the logical layer is that the data is fresher, as time-sensitive business processes require, and the structure of the data provided is created on the fly (as data- and model-driven analysis requires), rather than being limited to pre-built DW structures. Achieving these advantages was difficult in the past simply because software, hardware and networks lacked the speed, scalability and reliability that such deployments require.
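By contrast with a load-then-query pipeline, the query-time behaviour of a logical layer can be sketched as a function that reads its sources on every call instead of reading a persisted copy. This is a deliberately simplified illustration, not how any particular LDW product works: the orders table, the crm_customers dictionary and customer_revenue_view are hypothetical stand-ins for real source systems and a virtual view.

```python
import sqlite3

# Source 1 (hypothetical): an operational orders database.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
orders_db.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 250.0), (11, 2, 90.0)])

# Source 2 (hypothetical): customer master data held by another system, e.g. a CRM.
crm_customers = {1: "Acme S.p.A.", 2: "Globex"}


def customer_revenue_view():
    """A 'virtual view': nothing is copied or persisted.
    Each call reads both sources at query time and joins them on the fly."""
    totals = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
    return {crm_customers.get(cid, "unknown"): total for cid, total in totals}


print(customer_revenue_view())  # {'Acme S.p.A.': 250.0, 'Globex': 90.0}

# A new order lands in the source system; no reload job is needed to see it.
orders_db.execute("INSERT INTO orders VALUES (12, 2, 60.0)")
print(customer_revenue_view())  # {'Acme S.p.A.': 250.0, 'Globex': 150.0}
```

The trade-off is that every query now hits the sources, which is why the speed and reliability of software, hardware and networks matter so much for this approach.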

Data virtualisation provides a single, integrated view, in real or near-real time, of data from distributed sources, regardless of its type or location and whether it is structured, semi-structured or unstructured. When a Logical Data Warehouse powered by a complete data virtualisation product is combined with distributed processing that pushes the work down to the source systems where the data resides, the liberated data “dance” begins.
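The idea of pushing processing down to the source can be illustrated by comparing two ways of answering the same filtered aggregate. In this hypothetical sketch an in-memory SQLite table plays the remote source: the first function pulls every row into the virtual layer and filters there, while the second sends the filter and the SUM to the source so only one row comes back. Real virtualisation engines make this choice through a query optimiser; the function names here are illustrative only.

```python
import sqlite3

# Hypothetical source system holding a sales table with 1,000 rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount REAL)")
source.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north" if i % 4 == 0 else "south", float(i)) for i in range(1000)],
)


def total_without_pushdown(region):
    """Naive federation: fetch every row, then filter and aggregate centrally."""
    rows = source.execute("SELECT region, amount FROM sales").fetchall()
    total = sum(amount for r, amount in rows if r == region)
    return total, len(rows)  # (result, rows transferred from the source)


def total_with_pushdown(region):
    """Pushdown: the filter and the aggregation run inside the source system,
    so only a single result row travels back to the virtual layer."""
    rows = source.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchall()
    return rows[0][0], len(rows)


print(total_without_pushdown("north"))  # (124500.0, 1000)
print(total_with_pushdown("north"))     # (124500.0, 1)
```

Both functions return the same total, but the pushed-down version moves one row instead of a thousand; with real remote sources and millions of rows, that difference is what makes query-time integration practical.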

The Logical Data Warehouse in current terms

The need for self-service BI in modern data management cannot be overstated, so the ability to have a self-service Logical Data Warehouse to which up to 100 different sources can be connected within minutes of installation is decidedly important.

What does this mean in practice? There are many areas in which companies want to perform data analysis: customer information, order status, pretty much anything. If you want to generate reports to gather this information, there are two problems: first, buying a licence for suitable software to load the data; second, buying a database to store it. It is therefore not that easy. Even after investing in a database and a licence, it will take 6 to 8 months before the report is generated, because the data must first be loaded into the central database, which in turn requires developers experienced in Back-End processes. The end result is that the report arrives six months later, when the company has already forgotten what it was for.

Seriously “Data Driven”

Today there is much talk of data-driven businesses, but very few tools are designed to avoid tedious ETL procedures; it is easy to focus on the Front-End, but much harder to know how to deal with the Back-End. Organisations need a robust Front-End Business Intelligence tool capable of connecting to the Logical Data Warehouse. QlikView, TARGIT, Power BI, Tableau and even common spreadsheets can return results from an SAP source in a few minutes with a couple of clicks. There is no ETL; the only requirement is data modelling, that is, telling the Front-End which fields of the table are measures and which are dimensions, and with tools like Querona this can be achieved quite simply.
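Data modelling in this sense amounts to declaring, once, which fields are dimensions and which are measures, so that a front-end can generate aggregate queries without any ETL. The sketch below assumes a hypothetical SemanticModel class and a hypothetical sales_orders table; it is not the API of Querona or of any of the tools named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticModel:
    """Hypothetical minimal model: which columns are dimensions, which are measures."""
    table: str
    dimensions: tuple[str, ...]
    measures: tuple[str, ...]

    def query(self, dims, meas):
        """Build an aggregate SQL query from the model, rejecting undeclared fields."""
        unknown = (set(dims) - set(self.dimensions)) | (set(meas) - set(self.measures))
        if unknown:
            raise ValueError(f"fields not in model: {sorted(unknown)}")
        # Dimensions are selected as-is; measures are always aggregated.
        select = list(dims) + [f"SUM({m}) AS {m}" for m in meas]
        sql = f"SELECT {', '.join(select)} FROM {self.table}"
        if dims:
            sql += f" GROUP BY {', '.join(dims)}"
        return sql


sales = SemanticModel(
    table="sales_orders",
    dimensions=("region", "product"),
    measures=("quantity", "net_amount"),
)
print(sales.query(["region"], ["net_amount"]))
# SELECT region, SUM(net_amount) AS net_amount FROM sales_orders GROUP BY region
```

Because the model only describes fields, the generated query can be sent to the logical layer as-is; summing every measure is a simplifying assumption, and real tools also support averages, counts and other aggregations.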

Decisions can thus be based on current data. Furthermore, a new-generation Logical Data Warehouse will let you specify the source of the data, whether in the cloud or anywhere else, and decide whether it is loaded once a day, at night or in the morning. Everyone (as authorised by the CIO) can always obtain secure access from any location.
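Per-source load scheduling of this kind can be thought of as a small policy attached to each source: read it live, or serve a cached copy refreshed at a set time of day. The SourcePolicy class and needs_refresh function below are hypothetical, written only to illustrate the decision; real products expose such settings through their own configuration.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class SourcePolicy:
    """Hypothetical per-source setting: query live, or serve a cached copy
    refreshed once a day at a chosen time (e.g. at night or in the morning)."""
    name: str
    live: bool
    refresh_at: Optional[time] = None


def needs_refresh(policy, last_loaded, now):
    """Return True if the cached copy is older than the most recent scheduled refresh."""
    if policy.live:
        return False  # live sources are read at query time; there is no cache to refresh
    scheduled_today = datetime.combine(now.date(), policy.refresh_at)
    # If today's slot has not come yet, the latest slot was yesterday's.
    if now >= scheduled_today:
        latest_slot = scheduled_today
    else:
        latest_slot = scheduled_today - timedelta(days=1)
    return last_loaded < latest_slot


crm = SourcePolicy("crm_cloud", live=True)
erp = SourcePolicy("erp_nightly", live=False, refresh_at=time(2, 0))

now = datetime(2018, 3, 27, 9, 30)
print(needs_refresh(crm, datetime(2018, 3, 20), now))         # False: live source
print(needs_refresh(erp, datetime(2018, 3, 27, 2, 15), now))  # False: loaded after today's 02:00 slot
print(needs_refresh(erp, datetime(2018, 3, 26, 23, 0), now))  # True: missed today's 02:00 slot
```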

In practice, the new-generation LDW allows Data Scientists to manage all information without having to rely on the technological infrastructure. It is a dream come true.

Michele Iurillo