TECH | Mar 28, 2017

Data Science: what competences are necessary?

About 1 million new Data Scientist jobs are expected in European industries by 2020, but training is needed

A bit of statistics, a bit of maths, a bit of hacking: this is the classic view of the Data Scientist, the professional most sought after by industries that want to properly address the challenges of the digital economy and the increasing availability of large amounts of data, the so-called Big Data.

The Data Scientist, with the knowledge and competences to analyze and interpret the Big Data that become available every day, is set to become a central figure in every business of the digital economy and in all socio-economic activities: commerce (both online and local), financial services, healthcare, the energy sector, telecommunications, the web and social media, corporate management and strategic consultancy, up to industrial robotics and the so-called Intelligent Factory envisaged by the National Industry Plan 4.0.

Some time ago we tried to clarify what is meant by Data Science, a field of knowledge that has emerged in recent years and brings together the expertise, competences and abilities of various other disciplines: the inquisitiveness of the world of research, the economic perspective of the world of business, and the well-known competences of Statistics, Mathematics and Computer Science.

The benefits of introducing specialized data professionals into corporate organizational charts are estimated at an average 40% increase in revenue and reduction of costs (McKinsey, February 2016). Companies are aware of this need, and it is estimated that there will be about 1 million new Data Scientist jobs in European industries by 2020 (PwC, April 2016). But when businesses and governments turn to universities and training centers to bridge the competence gaps in their workforce, they struggle to say what exactly they should ask for. Beyond the job title, it is clear that the processes in one domain are not always the same as those in another; the role of the person analyzing data in a large company is more specialized and focused than in a small or medium business, where people often get by doing a bit of everything.

How to train a Data Scientist?

Hence the need for a common language, with clear and shared concepts, rules and definitions, to avoid ambiguity in both the supply of and the demand for training. This has been addressed in the recent past by a group of scholars whose efforts converged in an educational framework called the e-Competence Framework (e-CF, now in version 3.0). This framework identifies 23 different professional profiles within the ICT world (divided into 4 families) and, for each profile, defines the related competences and proficiency levels. Competences are grouped into five areas corresponding to the five phases of an ICT process (Plan, Build, Run, Enable and Manage). Each competence is described in operational terms, as the know-how needed to address a task: for each competence the framework specifies the skills and knowledge that support that know-how, as well as the required proficiency level, which can vary according to profile or seniority. The e-CF has proved to be such a useful tool that the expert group that regularly updates it is evolving into a structure that also takes care of its commercial exploitation.
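The structure just described (competences placed in an area, supported by knowledge and skills, with a proficiency level that depends on the profile) can be pictured as a small data model. The following Python sketch is only an illustration under our own assumptions: the class names, the identifier "X.1" and the sample values are hypothetical and are not taken from the e-CF documents.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Competence:
    """One competence entry (all sample values used below are hypothetical)."""
    code: str                          # illustrative identifier, not a real e-CF code
    title: str
    area: str                          # one of the five process areas, e.g. "Build"
    knowledge: tuple[str, ...] = ()    # what the person needs to know
    skills: tuple[str, ...] = ()       # what the person needs to be able to do


@dataclass
class Profile:
    """A professional profile: the competences it needs and the level of each."""
    name: str
    required_levels: dict[str, int] = field(default_factory=dict)

    def require(self, competence: Competence, level: int) -> None:
        # The same competence may be required at different levels by different profiles.
        self.required_levels[competence.code] = level


analytics = Competence(
    code="X.1",
    title="Data analytics",
    area="Build",
    knowledge=("statistics",),
    skills=("build predictive models",),
)

junior = Profile(name="Junior Data Scientist")
senior = Profile(name="Senior Data Scientist")
junior.require(analytics, 2)
senior.require(analytics, 4)
```

Keeping the level on the profile rather than on the competence reflects the point made above: the competence is the same, while the expected proficiency varies with profile or seniority.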

The same framework was used by a group of academics and businesses, including Engineering, in the EDISON project to identify the competences of the Data Scientist. After a first attempt to analyze this professional figure through surveys, desk research and interviews with experts in the field, it became clear that the complexity of the domains involved (including the world of research) is mirrored in the complexity of the profession. The mantra of the various blogs, papers and informative studies (not always based on statistical evidence) calls on the Data Scientist to be the repository of all the competences of the various domains. This is not strictly true and, as the saying goes, “the truth lies somewhere in the middle”.

The EDISON team’s analysis identified 22 other professions, each worthy of note and consideration, placing the concept of Data Scientist (in the structure and language of the e-CF) at the level of a new “family”. In this way it is possible to formalize the range of competences and proficiency levels of a figure that has attracted criticism from many sides, with the search for a Data Scientist likened to the hunt for a unicorn.

What are the challenges for those who want to use the e-Competence Framework?

  • In the rapidly evolving world of data, how to ensure that new patterns, new techniques and new tools do not quickly make the competences, as currently described, outdated and unsuited to the needs of businesses and research centers?
  • How to translate the list of competences and skills into a useful tool for teachers and professors for structuring courses suited to different profiles?
  • How to support the mapping of the competences of each worker, researcher or student onto the various professions in order to identify any gaps and, above all, how to recommend effective ways to fill those gaps?
  • How to help managers understand, starting with the activities of the companies or projects in which they work, which figures they really need?
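
The mapping question in the list above can be made concrete with a small sketch. The Python below is only an illustration under our own assumptions: it supposes that a profile is a set of competences with a required level and that a person holds some of them at some level. The function name, the competence names and the numeric levels are hypothetical and do not come from EDISON or the e-CF.

```python
def competence_gaps(required: dict[str, int], held: dict[str, int]) -> dict[str, int]:
    """Return, for each competence, how many levels the person falls short.

    A competence the person does not hold at all counts as level 0.
    Competences already at or above the required level are omitted.
    """
    gaps = {}
    for code, needed in required.items():
        shortfall = needed - held.get(code, 0)
        if shortfall > 0:
            gaps[code] = shortfall
    return gaps


# Hypothetical profile and person, for illustration only.
profile = {"data-analytics": 3, "data-management": 2, "programming": 3}
person = {"data-analytics": 3, "programming": 1}

gaps = competence_gaps(profile, person)
print(gaps)  # {'data-management': 2, 'programming': 2}
```

A list of gaps like this is only the starting point: recommending how to fill them, for instance by pointing to suitable courses, is the harder part of the question.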

EDISON is working with universities and businesses to promote this framework and to understand how to answer these and other questions. The second conference of “champions” – that is, teachers and trainers who, in various contexts, from applied sciences to libraries, train the data professionals of tomorrow – was held recently at the Universidad Carlos III de Madrid in Spain.

On Tuesday, April 4, a workshop will be held in the Main Hall of the Department of Engineering at the University of Perugia with industry and company experts, including Engineering and other local entities, to share ongoing experiences. In particular, the Department will present the work it has carried out in applying the EDISON framework to the organization of a Master’s in Data Science, and how this work can be replicated in the future.

The Perugia event will be the first at the national level to launch the university-business dialogue on the theme of Data Science training. Everyone hopes that this effort will not stop at intentions or isolated experiences, but that tools will be found to engage and inspire trainers and teachers throughout the country, so as to ensure a consistent training offer and make it possible to rapidly bridge the expected gap in digital competences in the future digital economy.

Andrea Manieri, Francesco Saverio Nucci