*“I never guess. It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.”*
*Sir Arthur Conan Doyle, author of the Sherlock Holmes stories*

In a previous article *in Ingenium* we saw how management choices in a *data-driven* project are supported by information extracted from data. In a data-driven project, the action strategy depends on the available data and is expressed by “rules” that implement the decisions already taken. The *data-driven* approach to project management includes the following stages:

- Choice of project measurement values (e.g. variance measurements such as those foreseen in the Earned Value method, quality measurements such as the number or density of defects, etc.)
- Gathering of project data
- Data analysis to obtain information (Project Management Analytics)
- Use of information to make decisions.

In this article we want to concentrate in particular on stage 3, identifying the available solutions and current trends.

## A little “data science” for Project Managers

The computational complexity of project analytics spans a potentially broad spectrum, crossing the paths of *big data* (characterized by the **V**olume, **V**ariety and **V**elocity of change of data) and AI (*machine learning*, *deep learning*, predictive systems). *Analytics* do more than allow you to capture project data and mark off completed tasks: they help you understand patterns and trends.

You can use this understanding in various ways, for example to measure the performance of a project and, if it is not in line with the overall goals, to decide what actions to take to improve the chances of success.

The basic steps of the general *Project Management Analytics* process are the following:

- Identification of data *patterns*
- Derivation of significant inferences from those *patterns*
- Use of inferences to develop regressive/predictive models
- Use of predictive models to support decision making.
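
As a toy illustration of these steps, here is a hypothetical sketch in Python; the weekly task counts and the choice of a simple linear regression are assumptions for illustration, not taken from the article. It spots a linear pattern in the data, fits a regressive model to it and uses the model’s prediction to support a decision:

```python
import numpy as np

# Hypothetical observations: cumulative completed tasks over 6 weeks.
weeks = np.arange(1, 7)
completed = np.array([4, 9, 15, 19, 26, 30])

# Pattern/inference: progress looks roughly linear, so fit a
# least-squares regression line (a simple regressive model).
slope, intercept = np.polyfit(weeks, completed, deg=1)

# Prediction to support a decision: expected completed tasks at week 10.
forecast = slope * 10 + intercept
print(f"~{slope:.1f} tasks/week, forecast at week 10: {forecast:.0f} tasks")
```

If the forecast falls short of the project’s commitments, that is an objective signal that corrective action is needed.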

One of the best known families of analysis tools is statistical methods. If you manage a project, you continuously have to deal with uncertainty; risks are the representation of that uncertainty. All the project baselines (WBS, schedule, budget) that you built during the planning phase exist to address the uncertain future of the project. If we assume that, in the long run, similar quantitative measurements have a high probability of recurring within a given process, statistical analytics help you address uncertainty, as they include tools and techniques for interpreting specific patterns in project management processes and for forecasting future trends.

Statistical analysis of data uses the so-called **probability distribution functions**, for which Project Management offers several usage scenarios.

The **“normal” distribution**, or Gaussian, which distributes values symmetrically around the mean, covers several PM processes, as it involves so-called “normal events” such as score criteria for the selection of projects in a portfolio, stakeholders’ opinions, task durations, the probability associated with a risk, etc.
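
For instance, if a task’s duration can reasonably be modelled as normal, the cumulative distribution function gives the probability of staying within a deadline. A minimal sketch; the mean and standard deviation below are invented for illustration:

```python
from math import erf, sqrt

def normal_cdf(x: float, mean: float, std: float) -> float:
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + erf((x - mean) / (std * sqrt(2))))

# Hypothetical: a task's duration is roughly normal,
# mean 10 days, standard deviation 2 days.
p = normal_cdf(13, mean=10, std=2)
print(f"P(duration <= 13 days) = {p:.1%}")  # about 93.3%
```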

The **Poisson distribution** describes counting processes of discrete, independent events. You can use it to count the number of successes or occurrences in a series of attempts over a given period of time. It may be useful to evaluate the number of human resources acquired by the project in a two-month period, the number of milestones completed in one month, the number of tasks completed in a week, the number of change requests processed in a given month, etc.
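
A sketch of the change-request example, assuming an average rate of 6 requests per month (an invented figure):

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(seed=42)

# Hypothetical: on average ~6 change requests are processed per month.
mean_rate = 6

# Simulate 12 months of change-request counts.
monthly_counts = rng.poisson(lam=mean_rate, size=12)
print(monthly_counts)

# Probability of processing at most 8 requests in a month,
# summing the Poisson probability mass function up to k = 8.
p_at_most_8 = sum(exp(-mean_rate) * mean_rate**k / factorial(k)
                  for k in range(9))
print(f"P(X <= 8) = {p_at_most_8:.2f}")  # about 0.85
```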

The **“triangular” distribution**, based on three underlying values (minimum, maximum and peak), is used in *three-point estimation*: it is useful when you need to estimate the costs and durations of activities, considering the most likely value alongside the optimistic and pessimistic ones.
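
A minimal sampling sketch, with invented three-point estimates for a single activity:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical estimates (in days) for one activity.
optimistic, most_likely, pessimistic = 8, 10, 16

# Draw 10,000 random durations from the triangular distribution.
samples = rng.triangular(left=optimistic, mode=most_likely,
                         right=pessimistic, size=10_000)

# The mean of a triangular distribution is (min + mode + max) / 3.
print(f"theoretical mean: {(optimistic + most_likely + pessimistic) / 3:.2f}")
print(f"simulated mean:   {samples.mean():.2f}")
```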

Similarly to the triangular distribution, the **“beta” distribution** allows you to model events that occur within an interval bounded by a minimum and a maximum value. If you are an expert Project Manager you will probably have already used it with the PERT method (*Program Evaluation and Review Technique*) and the CPM method (*Critical Path Method*) for *three-point estimation*.
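
The classic PERT approximation of the beta distribution reduces the three points to an expected value and a standard deviation. A sketch, with the same invented estimates used above:

```python
def pert_estimate(optimistic: float, most_likely: float, pessimistic: float):
    """PERT (beta) expected duration and standard deviation."""
    expected = (optimistic + 4 * most_likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return expected, std_dev

# Hypothetical activity: 8 days best case, 10 most likely, 16 worst case.
e, s = pert_estimate(8, 10, 16)
print(f"expected = {e:.2f} days, std dev = {s:.2f} days")
# prints: expected = 10.67 days, std dev = 1.33 days
```

Note how the pessimistic tail pulls the expected value above the most likely 10 days.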

## The Monte Carlo analysis: how can project completion times be calculated?

Once you have chosen the probability distribution function best suited to your scenario, how can you use it to develop a predictive project model? A simple example is the Monte Carlo analysis for forecasting project completion times.

A good method for producing estimates (of time, effort, resources) is three-point estimation, which evaluates three possible scenarios: the best case (optimistic), the worst case (pessimistic) and the most likely one, so as to take into account the risks that exist in any undertaking.

Take a very simple project plan with three activities, for which you have somehow estimated the three values (best, worst, most likely). To simplify, imagine that the activities are sequential: each activity starts when the previous one has ended, without overlaps or interleaving. How much time will it take to complete the project?

Since the three activities are in sequence, you can add the estimates together to obtain the three estimates of total project time. You immediately understand that this is an unrealistic scenario: it is unlikely that all three activities will be completed exactly according to the best case, worst case or most likely scenario. It is more likely that one activity will finish earlier than estimated and another will be slightly delayed.

The Monte Carlo analysis produces a predictive model of the project’s total duration by processing data from a simulation with a sufficient number of iterations (at least in the hundreds). At each iteration a random duration between the best case and worst case values is assigned to each of the three activities, and the partial results are recorded each time.

If you run 1000 iterations in your spreadsheet you will obtain a result like the one below.
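
The same simulation can be sketched outside a spreadsheet in a few lines of Python. The three-point estimates below are hypothetical, chosen only so that the best, most likely and worst cases of the three activities add up to 25, 50 and 80 days respectively:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
N = 1000  # number of Monte Carlo iterations

# Hypothetical (best, most likely, worst) estimates in days
# for three sequential activities.
activities = [(8, 15, 25), (10, 20, 30), (7, 15, 25)]

# At each iteration, draw a random duration for every activity
# from a triangular distribution and sum them into the project total.
totals = np.zeros(N)
for best, likely, worst in activities:
    totals += rng.triangular(best, likely, worst, size=N)

print(f"min = {totals.min():.1f}, mean = {totals.mean():.1f}, "
      f"max = {totals.max():.1f}")
```

The simulated totals stay well inside the 25-80 day interval, because the extreme cases of all three activities almost never coincide.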

The three columns on the right show the simulation results. The last row (total project duration) is not obtained by simply adding together the summary values for the three activities, but by applying the same statistics to the totals: the total is computed at each iteration from the randomly varied activity durations. The same activity lasts longer in some iterations and shorter in others, combining with the variations of the other two. In essence, the minimum and maximum duration of the entire project reflect the occurrence of random events that impact one or more activities (a simulation of risk), more or less as happens in any real project.

As you can see, the average value obtained from the simulation is almost the same (51 vs. 50) as the initial most likely estimate. You may have partly expected this, but the result depends on the structure of the proposed problem and on the assumption that the three activities were in sequence. In more complex situations (a high number of activities and more complex dependencies between them), the difference between the most likely value and the simulated average may be greater.

Another observation is that the simulation gives a total project duration of between 32 and 70 days, a tighter interval than the 25-80 days obtained by adding together the best case and worst case estimates. Considering that the extreme values are also the least probable ones, this should be no surprise: it is practically impossible for all three activities to take their minimum or maximum duration at the same time.

However, the real value of these analytics lies not just in the final result, but also in the examination of partial and detailed data. Your simulation calculates and records all the partial values at each iteration. The graph below shows the probability (calculated as the number of occurrences divided by the total number of iterations) of completing the project within a given number of days.

As you can see, you have only a 5% probability of completing the project within 40 days and a 92% probability of completing it within 60 days. The analysis provides decision support that is more useful than the simple evaluation expressed by a three-point estimate. The most interesting finding is that, according to this model, in 50 days (the most likely total estimated initially) you have “only” a 50% probability of completing the project.

You can also turn the question around and ask what duration guarantees completion of the project with a probability above a given threshold (the confidence level). The lower it is, the more risk you are willing to accept; the higher it is, the more you wish to “play safe”. For a 75% probability of completion you must allow 55 days; to rise to 85% you must plan for 58 days. Since nothing is free, of course, the 3 extra days of activity on the work plan will generate more costs and therefore a higher budget.
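
The inverted question corresponds to reading a percentile of the simulated totals. A sketch, again built on invented three-point estimates rather than the article’s actual data:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Simulated total project durations (hypothetical three-point estimates).
totals = sum(rng.triangular(b, m, w, size=1000)
             for b, m, w in [(8, 15, 25), (10, 20, 30), (7, 15, 25)])

# Direct question: probability of finishing within a given deadline.
days = 60
p_within = (totals <= days).mean()
print(f"P(finish within {days} days) = {p_within:.0%}")

# Inverted question: duration needed for a given confidence level.
confidence = 0.85
duration = np.percentile(totals, confidence * 100)
print(f"duration at {confidence:.0%} confidence: {duration:.0f} days")
```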

You can apply analytics and statistical simulations to costs or other measurements too, in relation not just to the project as a whole but also to parts of it: single activities, groups of activities or specific deliverables. This lets you answer questions such as: *“What is the probability that the project cost comes under a pre-set budget?” “What cost guarantees us a 90% probability of carrying out the entire project?” “What probability is there that a certain activity will finish by a given date?” “When will we have a 90% probability of starting a certain activity?”*

## The AHP model: weighting the decision factors

Another analytical model that directly guides *data-driven decision making* based on several evaluation criteria is AHP (*Analytic Hierarchy Process*). It can be used in any Project Management scenario that includes N factors in the decision-making process, such as deciding which projects to include in a portfolio or whether a deliverable, or part of one, should be produced internally or procured externally (*make or buy* analysis).

The method requires framing the scenario by identifying the criteria on which the decision is based and the possible alternatives. For example, in a Project Management scenario the criteria may include three factors: flexibility in scope, flexibility in time and flexibility in costs. To make a decision, you must consider the relative importance of these three factors in your project context.

If, for example you have to organize a birthday party, you won’t have flexibility in time (the date of the party cannot be moved), but, to a different extent, you will have flexibility in costs and characteristics (scope) of the party, choosing whether to organize the event at home, perhaps preparing the buffet yourself, or hiring a room and a catering service.

The AHP model first requires defining the relative importance of the N decision factors. This is done through a series of pairwise comparisons, producing an N x N matrix that expresses the relative weights of the decision criteria as numerical coefficients.

Next, the M possible alternatives are evaluated against each of the N decision factors in a similar way, by pairwise comparison, obtaining N matrices of size M x M, to which linear algebra operations are applied to obtain a ranking of the alternatives according to the decision factors and their relative importance.
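
As a sketch of the first step (criteria weighting) with N = 3, here is the common column-normalization approximation of the AHP priority vector. The pairwise judgements on Saaty’s 1-9 scale are invented for illustration, and the full method uses the principal eigenvector plus a consistency check:

```python
import numpy as np

# Hypothetical pairwise comparisons of three decision criteria:
# scope flexibility, time flexibility, cost flexibility.
# A[i, j] = how much more important criterion i is than criterion j.
A = np.array([
    [1.0, 3.0, 5.0],   # scope vs. (scope, time, cost)
    [1/3, 1.0, 3.0],   # time vs. (scope, time, cost)
    [1/5, 1/3, 1.0],   # cost vs. (scope, time, cost)
])

# Approximate the priority vector: normalize each column so it sums
# to 1, then average across the rows.
weights = (A / A.sum(axis=0)).mean(axis=1)
print(dict(zip(["scope", "time", "cost"], weights.round(3))))
```

The weights sum to 1 and rank the criteria; the same pairwise procedure applied to the M alternatives, factor by factor, yields the final scores.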

We do not have space here for a full example, even a simple one, of an application of the AHP method.

## Conclusions

Is project management an art or a science? Managing a project means continuously making decisions. The contribution of experience and intuition is important, but your choices should rest on a solid basis of objective measurement of facts. Project Management Analytics tools and techniques, i.e. the systematic analysis of quantitative project data, allow you to obtain important information and to see patterns and trends useful for the decision-making process emerge. You can adopt statistical processing (probability distributions and Monte Carlo simulations to represent the uncertainty and risks of a real project with random variables) or algebraic processing (matrix computations in AHP models). What you must always do is start from objectively measured data and, in the case of estimates, from reliable evaluations drawn where possible from the examination of historical data.

*Marco Caressa*