For some years, Data Science has been applied in companies, with significant social and economic impact. But how can the costs and benefits of Data-Driven Management be measured, and how should the most appropriate investment strategy be chosen?
A recent study by the University of Notre Dame has developed an experimental model that is useful for calculating the ROI of Data-Driven Governance.
Data Science: why?
Companies are rapidly adopting Data Science to support their decision-making and management processes and to improve business and operations: increasing revenues, reducing costs and raising efficiency. For these reasons, companies are increasingly introducing new analytical programs to create greater value, using not only their own internal data but also combining these business data with external data sources. However, acquiring external data is not a low-cost activity and almost always requires an investment by the organisation. Moreover, the more complex and advanced the data analysis models that are developed and implemented, the greater the investment required in human resources and technological infrastructure.
How, then, can analyses be optimised to reduce costs and so achieve a greater return on investment (ROI) from Data Science? Is there an objective way to compare the value of different strategies, such as data acquisition or modelling, in order to guide companies towards the choice that best fits their needs?
How to optimise investments
Researchers at the University of Notre Dame set out to provide a model for evaluating the ROI of Data-Driven Governance, that is, a way for companies to set budgets by developing suitable, diversified strategies for different market situations while balancing investment costs against benefits and revenues.
The NPV (net present value) model developed in the study suggests the best business practices for analysis and strategy. The proposed framework unifies the costs of model development, the costs of acquiring external data and the time value of forecasts; this makes it easier to develop and implement strategies that exploit the synergy between model development and external data acquisition.
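To make the idea concrete, the standard net present value formula discounts each period's net cash flow back to today. The sketch below is not the study's implementation: the function name `npv`, the discount rate and every figure are hypothetical, chosen only to illustrate how model development costs, data acquisition costs and the time value of benefits can be placed on a single scale.

```python
def npv(rate, cash_flows):
    """Net present value of a series of cash flows.

    cash_flows[t] is the net cash flow at period t, with t = 0 meaning today,
    so the first entry is not discounted.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical figures in USD, for illustration only.
model_dev_cost = 50_000         # one-off cost of developing the model (period 0)
data_acquisition_cost = 20_000  # yearly cost of external data
annual_benefit = 60_000         # yearly benefit attributed to the forecasts
rate = 0.08                     # assumed yearly discount rate

# Period 0: pay for model development; periods 1-3: benefit minus data cost.
cash_flows = [-model_dev_cost] + [annual_benefit - data_acquisition_cost] * 3
project_npv = npv(rate, cash_flows)

print(f"NPV over 3 years: {project_npv:,.2f} USD")
```

A positive result means that, under these assumed figures, the discounted benefits outweigh the combined modelling and data costs; changing the rate or the horizon shows how sensitive that conclusion is.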
In short, the model is predictive and based on machine learning: starting from traditional empirical measurements, it translates the combination of data acquisition, operating costs and investment parameters into an economic measure (in US dollars). The use of machine learning improves the model's accuracy by minimising false negatives, which usually cost companies more than false positives.
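The point about false negatives being costlier than false positives can be illustrated with a small cost-sensitive evaluation. This is a minimal sketch under assumed values, not the study's method: the labels, scores, candidate thresholds and the two cost figures are all invented, and `total_cost` and `best_threshold` are hypothetical helper names.

```python
def total_cost(y_true, scores, threshold, cost_fn, cost_fp):
    """Monetary cost of the errors made when predicting 1 for score >= threshold."""
    fn = fp = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 0:
            fn += 1  # missed positive (false negative)
        elif y == 0 and pred == 1:
            fp += 1  # false alarm (false positive)
    return fn * cost_fn + fp * cost_fp


def best_threshold(y_true, scores, cost_fn, cost_fp, candidates):
    """Candidate threshold with the lowest total error cost (first one on ties)."""
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, cost_fn, cost_fp),
    )


# Invented example data: true labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.35, 0.4, 0.2, 0.1, 0.55, 0.5]
candidates = [0.3, 0.5, 0.7]

# Assumed costs in USD: a false negative is five times as costly as a false positive.
COST_FN, COST_FP = 500, 100

for t in candidates:
    print(t, total_cost(y_true, scores, t, COST_FN, COST_FP))

chosen = best_threshold(y_true, scores, COST_FN, COST_FP, candidates)
print("chosen threshold:", chosen)
```

With asymmetric costs like these, the cheapest threshold tends to be a low one, accepting a few more false positives in exchange for fewer false negatives.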
In the framework, cost-sensitive classification also takes account of the costs of acquiring external data and sources, modelling costs and operational costs, each of which is essential when applying the model to concrete cases. If companies want to explore acquiring external data for their operations, the model lets them assess whether these data, compared with the internal data the company produces in-house, deliver greater value net of transition costs. If so, it would be worthwhile for the company to keep investing in external data because, in the medium term, the benefit and revenue from using the data would be higher, and the ROI would therefore be positive.
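The comparison between in-house and external data can be sketched as a simple net-value calculation over different horizons. Everything here is an assumption made for illustration: the `DataOption` class, its fields and all figures are hypothetical, and the calculation is left undiscounted to keep it short.

```python
from dataclasses import dataclass


@dataclass
class DataOption:
    """A data strategy with hypothetical yearly benefit, yearly cost and one-off cost."""
    name: str
    annual_benefit: float
    annual_cost: float
    upfront_cost: float  # e.g. a one-off transition cost

    def net_value(self, years: int) -> float:
        """Undiscounted benefit minus all costs over the given number of years."""
        return years * (self.annual_benefit - self.annual_cost) - self.upfront_cost


# Invented figures in USD.
in_house = DataOption("in-house data only", 100_000, 30_000, 0)
external = DataOption("in-house + external data", 150_000, 55_000, 40_000)

for years in (1, 3):
    better = max((in_house, external), key=lambda o: o.net_value(years))
    print(f"{years} year(s): "
          f"in-house = {in_house.net_value(years):,.0f}, "
          f"external = {external.net_value(years):,.0f}, "
          f"better: {better.name}")
```

Under these assumed numbers, the external data option loses over one year because of the transition cost but wins over three, which is the kind of medium-term trade-off the paragraph above describes.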
Overall, the study provides a framework for developing a strategy to improve the allocation of budgets and resources in the medium and long term by analysing Data Science processes. There are three main questions that analysts can and should consider when developing lines of business and investment:
1) whether external data is the better investment
2) whether model development is the better investment
3) when one or both of these should be deployed, and specifically to which business sectors for integration.