TECH | Oct 18, 2016

Data Driven Project Management

When does data help project management? How do you analyze, discuss and evaluate data in order to produce useful information? The vision of Marco Caressa

What you need to know to read this article

Let’s talk about Project Management, which concerns everybody, as everybody’s life is full of projects. To understand this article, you do not need to be a certified project manager with 20 years’ experience (although we hope that even they find it useful); you just need to know, or have experience of, a few basic concepts. Namely, that a project is a coordinated effort to achieve something (a product or service) that did not exist before, with a given level of quality. That what is produced takes the form of tangible deliverables (e.g. a building, a rail line, a car prototype…) or intangible ones (e.g. software, digital documentation…). That each project has a start and an end date. That each project has someone responsible for achieving its goals, called the Project Manager. That Project Management, as the name says, is the discipline that gathers contributions, ideas, techniques, tools and good practices to manage any type of project, from sending the first man to Mars to the new video game for your smartphone.

If we have data, let’s look at data. If all we have are opinions, let’s go with mine.
Jim Barksdale, former Netscape CEO


Being guided by information taken from objective, measurable data is a good way to make decisions. Managerial action is a continuous series of choices, and managing projects means deciding, at any time of the day, whether to include a new requirement, whether to adopt a given planning approach, whether to resort to external resources rather than trying to do everything with your own, and so on. However, gathering data on project activities is just a first important step, necessary but not sufficient for making good decisions. The data must be analyzed, discussed and assessed to produce useful information. Even this is still not enough to call project management truly data-driven if, instead of “automatically” implementing actions decided beforehand according to the processed information, every choice remains the result of further case-by-case assessment. So, from planning to estimates and progress control, how can you foster a truly data-driven Project Management approach in your organization?

Are our decisions data-driven?

No. You almost certainly know at least one person who fears flying and nobody who is frightened of getting into a car, in spite of the fact that current statistics (i.e. information processed from measurable data) show that planes are a safer means of transport. In the decision-making process, emotional and subjective perceptions play a decisive role. This applies to people as well as to teams and organizations.

A company can decide not to invest in a project to create a new product, even if it has the technical, financial and commercial means and in spite of the fact that analyses and reports show demand from the market. Why? Because companies make “gut” decisions too, based on habit. If the business historically originates from “on order” activities, the company will be reluctant to commit costs and resources unless there is a specific order from a customer, thus limiting its inclination to innovate and risking the loss of opportunities that the available data analysis suggests are worth taking.

Managing a project is no different. As a Project Manager, you will take into account what the PMBOK calls work performance information (i.e. progress measurement of the work carried out), but you will inevitably be influenced by your subjective perceptions and by past experiences that you cling to because they look similar, which will make you act “by sensation” more often than you think. Decisions in project management should rest on a solid basis of quantitative data and objective measurements of facts; your experience and intuition should come in only afterwards, applied to that specific context.

Use of data: from data-informed PM to data-driven PM

The evolution towards a data-driven approach to Project Management is gradual and passes through intermediate stages of maturity. Using data to support your decisions usually requires three steps:

  1. The gathering of “raw” data about a set of metrics which, if measured, can show whether and how the project goals are being achieved
  2. The processing of data to obtain information (filter, aggregate, summarize, present) and sharing it with the other players in the decision-making process
  3. The use of this information to make decisions.

Nothing new here, nothing more than simple common sense, and no need, for now, to bring BI/Big Data frameworks or robotic technical solutions into the picture. On the contrary, you could object that Project Managers have always collected data and processed information on the progress of their own projects to understand how to proceed, using various tools to find, for example, the actual start and end dates of activities compared to forecasts, to record actual costs and compare them with estimates, to assess the percentage of completion of project deliverables, to measure the quality of the process and product and compare it with expected levels, and so on.

However, even companies with a deep-rooted project culture, certified professionals and structured work processes tend to confuse data-driven management with simply data-informed management.

The difference lies in the way in which step 3 is carried out. Once the raw data have been gathered (step 1) and processed into information (step 2):

  1. If you stop and call a good meeting to make the decision, you are managing the project in a data-informed manner
  2. If instead, depending on the information gathered and a set of “rules” that you have defined beforehand, specific management actions are started automatically, then you are managing the project in a data-driven manner.

To be clearer, in a data-driven project there is an action strategy that depends on the available data, expressed by “rules” that implement decisions already taken.

Let’s take an example and imagine a project to create a software system, where you must decide whether and when to authorize release to test, depending on the number and type of bugs currently found in the code. With data-informed management, release to test will be subject to a decision taken after measuring the bugs. With data-driven management, release to test will be automatically authorized (along with any other conditions), and perhaps also carried out, when the number of bugs found falls below certain threshold values, defined at the outset for each type of error. These values can naturally be parameters (e.g. the maximum number of faults allowed for each type of bug to give the “green light” for release), modifiable via a suitable configuration dashboard.
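A rule of this kind could be sketched in Python roughly as follows. The bug types, the threshold values and the function name `release_to_test_authorized` are hypothetical, chosen only to illustrate a decision taken in advance and then applied automatically; treating "at or below the maximum" as acceptable is also an assumption.

```python
# Hypothetical thresholds: maximum number of open bugs allowed per type.
# In a real project these would be parameters editable from a dashboard.
MAX_OPEN_BUGS = {"blocking": 0, "major": 3, "minor": 10}


def release_to_test_authorized(open_bugs, thresholds=MAX_OPEN_BUGS):
    """Return True when every bug type is at or below its threshold.

    A bug type with no configured threshold blocks the release, so an
    unforeseen category is never let through silently.
    """
    for bug_type, count in open_bugs.items():
        if bug_type not in thresholds or count > thresholds[bug_type]:
            return False
    return True


measured = {"blocking": 0, "major": 2, "minor": 7}
print(release_to_test_authorized(measured))  # True: all counts within limits
```

The decision itself (which thresholds are acceptable) is still made by people; the function only applies it each time new measurements arrive.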

The described scenario has two advantages:

  1. Rapid reaction to change, due to the application of a decision already taken beforehand.
  2. Greater effectiveness of the Project Manager’s action: released from operational details, the Project Manager can concentrate more on the overview of the project to be managed.

Before you start worrying that, as a Project Manager, you risk being deprived of your authority by decision-making automation, relax: all you do is gain efficiency. It is still you who makes the decisions; the difference is that you make them, wherever possible, in advance, without having to reinvent an assessment for the same scenario each time.

As always, everything works in theory. In practice, what are the phases of a hypothetical roadmap, as the experts would say, to make your approach to Project Management truly data-driven?

STAGE 1 – Choice of measurement metrics for the project (aka, what you are interested in measuring) 

There are basically two criteria for choosing among the hundreds of potential metrics that can be used in project management:

  1. Rely on metrics that have a solid link to the project’s goals
  2. Do not reinvent the wheel, and always give precedence to metrics that are well established in the literature (someone else before us has surely addressed and solved our problems in a brilliant manner).

There are no set rules, but a “minimum” list of metrics to be considered may be as follows.

1 – Predictability metrics are needed to measure the differences between the actual trend of a project and the planned one (baseline). First among them are the ones provided by the Earned Value method, which allows integrated control of the project by placing a regular progress check on deliverables and incurred costs alongside the time schedule.

In particular:

  • The SPI (Schedule Performance Index), which shows how quickly your deliverables “are progressing” in terms of earned value compared to planned value. Values below 1 (or below 100%, if the measurement is expressed as a percentage) show a trend to deliver later than planned. Values above 1 show a trend to deliver ahead of schedule. A value of 1 shows a trend to deliver according to the planned schedule.
  • The CPI (Cost Performance Index), which shows how many Euros of planned work you are actually completing for each Euro spent. Values below 1 (or below 100%, if the measurement is expressed as a percentage) show a trend to spend more than planned. Values above 1 show a trend to spend less than planned. A value of 1 shows a trend to spend exactly the estimated budget.

The performance indexes are derived from simple measurements, such as the percentage of physical progress of the deliverables and the costs incurred on the project.
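The two indexes can be computed with a few lines of Python. The sketch below uses the standard Earned Value definitions (SPI = EV / PV and CPI = EV / AC); the project figures and function names are invented for illustration.

```python
def schedule_performance_index(earned_value, planned_value):
    """SPI = EV / PV: below 1 means behind schedule, above 1 ahead of it."""
    return earned_value / planned_value


def cost_performance_index(earned_value, actual_cost):
    """CPI = EV / AC: below 1 means over budget, above 1 under budget."""
    return earned_value / actual_cost


# Invented figures, in Euros, for a project with a 100,000 Euro budget:
planned_value = 50_000  # PV: value of the work planned to date (50%)
earned_value = 40_000   # EV: value of the work physically completed (40%)
actual_cost = 44_000    # AC: costs actually incurred to date

spi = schedule_performance_index(earned_value, planned_value)
cpi = cost_performance_index(earned_value, actual_cost)
print(f"SPI = {spi:.2f}, CPI = {cpi:.2f}")
```

With these figures both indexes are below 1, so the project is trending both late and over budget.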

2 – Quality metrics are used to find defects or non-compliance with requirements. In particular:

  • Number of faults found in the project deliverables, grouped by status (e.g. open, deferred, closed, fix available, etc.), and their trend over time
  • Density of defects, given by the ratio between the number of defects and the size of the project (e.g. for a software application it can be the number of bugs per function point). It is useful for making comparisons between different projects; the defects can be “weighted” in relation to their severity or priority of resolution (e.g. distinguishing between defects that prevent the system from functioning and defects that only cause a deterioration of performance).
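As an illustration of why weighting matters, here is a minimal Python sketch of plain and weighted defect density. The severity categories, the weights and the two sample projects are assumptions made up for the example.

```python
# Hypothetical severity weights: a blocking defect counts more than one
# that only degrades performance.
SEVERITY_WEIGHTS = {"blocking": 5.0, "degrading": 1.0}


def defect_density(defects_by_severity, size, weights=None):
    """Defects per unit of size (e.g. per function point).

    Without weights every defect counts as 1; with weights each defect
    counts as the weight of its severity.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if weights is None:
        total = sum(defects_by_severity.values())
    else:
        total = sum(weights[sev] * n for sev, n in defects_by_severity.items())
    return total / size


# Two invented projects of 400 function points each, 20 defects each.
project_a = {"blocking": 2, "degrading": 18}
project_b = {"blocking": 8, "degrading": 12}

for name, defects in (("A", project_a), ("B", project_b)):
    plain = defect_density(defects, 400)
    weighted = defect_density(defects, 400, SEVERITY_WEIGHTS)
    print(f"Project {name}: plain {plain:.3f}, weighted {weighted:.3f}")
```

Both projects have the same plain density (0.050 defects per function point), but the weighted figure (0.070 against 0.130) shows that project B is in worse shape.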

3 – Response metrics are needed to measure the speed with which problems raised by the project stakeholders are answered and solved, using help desk systems that open a “ticket” for a problem or an issue and trace the resolution process. In particular:

  • Age of an open problem: the time elapsed since a problem was identified and the corresponding ticket opened. Combined with the average age of all open problems, it provides an indication of waiting times for solving a problem
  • Resolution rate: the number of problems/tickets closed in the reference period.

Normally, even less sophisticated help desk systems provide several other useful measurements, such as the ratio between the number of solved problems and the effort/cost used to close them (average turnaround time).
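The two response metrics listed above can be derived from a plain list of tickets. The following Python sketch assumes a hypothetical ticket structure with `opened` and `closed` dates; a real help desk system would expose its own data model.

```python
from datetime import date


def open_ticket_ages(tickets, today):
    """Age in days of every ticket that is still open on `today`."""
    return [(today - t["opened"]).days for t in tickets if t["closed"] is None]


def resolution_rate(tickets, period_start, period_end):
    """Number of tickets closed within the reference period (inclusive)."""
    return sum(
        1
        for t in tickets
        if t["closed"] is not None and period_start <= t["closed"] <= period_end
    )


# Invented tickets; `closed` is None while the problem is still open.
tickets = [
    {"opened": date(2016, 10, 3), "closed": date(2016, 10, 5)},
    {"opened": date(2016, 10, 4), "closed": None},
    {"opened": date(2016, 10, 10), "closed": date(2016, 10, 14)},
    {"opened": date(2016, 10, 12), "closed": None},
]

today = date(2016, 10, 18)
ages = open_ticket_ages(tickets, today)
average_age = sum(ages) / len(ages) if ages else 0.0
closed_in_october = resolution_rate(tickets, date(2016, 10, 1), date(2016, 10, 31))
print(ages, average_age, closed_in_october)
```

Here the two open tickets are 14 and 6 days old (10 days on average), and two tickets were closed in the reference month.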

4 – Efficiency metrics are used to assess the use of available resources (human and non-human). In particular:

  • Resource use rate, as the ratio between the planned effort and the effort spent on the project (e.g. in person-days), taken from specific reporting systems (e.g. timesheets)

5 – Productivity metrics are used to assess how much output can be produced by using the time of one person or one team on a particular activity. For software development some examples of quantification of the concept of output are:

  • Functions added to an application
  • Number of defects found/solved
  • Number of use cases developed/tested
  • Lines of code produced
  • Number of function points developed per unit of time (e.g. per person-day)

Metrics like the last one are especially useful because they are based on size measures (in this case function points) that are independent of the project’s technological context and are therefore objective enough to allow comparisons between different projects.

STAGE 2 – Collection of data (aka, how you will make the measurements)

Once the metrics have been identified, to collect the data you must define the “measurement process”, which in practical terms means deciding:

  • Which tools you will use to capture and store data (e.g. timesheets for recording working days, trouble ticketing systems for tracking faults in the project deliverables, …)
  • How frequently you will measure data (e.g. you can have different measurement frequencies depending on the data: you might measure people’s work data on a monthly basis and the faults in what has been built on a daily basis)
  • Who will be responsible for each measurement and for the processing and presentation of results (the members of the work team, the client, the Project Manager, …).

STAGE 3 – Data analysis (aka, which information you are searching for)

This is where the best and most difficult part comes in. The data you have collected are mostly raw, numerous and extremely detailed. To obtain information that can be useful in the decision-making process, you must filter, group and process them. This processing can simply be aimed at understanding the current project situation (analysis) or, more and more frequently and using statistical techniques and tools, at identifying trends and patterns and developing predictive models of events and behavior (data analytics).

Another aspect is uncertainty: any numerical measurement of processes and systems follows a probability distribution determined by their underlying nature. Risk Management introduces the topic of uncertainty into project management and into the related measurements. Risks are uncertain events that have a certain probability of occurring and that affect all aspects of a project, planning in particular. In this case, a data-driven approach means expressing the project metrics in terms of “analytic certainty”. For example, when communicating an estimate, instead of saying:

“It will take 3 person-months to carry out the activity and complete the deliverable”

you must manage to say:

“We have a 95% probability of completing the deliverable with an effort of between 2.5 and 3.5 person-months”.

Contrary to first impressions, the second formulation is richer in information and therefore more useful in supporting decisions. It tells you that you could manage to complete the work with less than 3 person-months (even if there is a risk that you might need more) and gives you a measure of the reliability of the estimate, which is totally lacking in the first formulation. Naturally, both the percentage and the width of the effort interval are not the result of a “gut” feeling (remember, you are data-driven…) but of statistical processing that applies a probability distribution to the gathered data (in this case estimation data for time and cost planning), for example via a Monte Carlo simulation, which you can run in your preferred spreadsheet.
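A spreadsheet is enough, but the same idea fits in a few lines of Python. The sketch below is only one possible setup: it assumes three-point estimates (minimum, most likely, maximum) per activity and a triangular distribution for each, neither of which is prescribed by the text, and the figures are invented.

```python
import random


def simulate_total_effort(tasks, runs=10_000, seed=42):
    """Monte Carlo simulation of total effort.

    Each task is a (minimum, most_likely, maximum) estimate in
    person-months; every run draws one value per task from a triangular
    distribution and sums them. Returns the sorted totals.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    totals = []
    for _ in range(runs):
        totals.append(
            sum(rng.triangular(lo, hi, mode) for lo, mode, hi in tasks)
        )
    totals.sort()
    return totals


def central_interval(sorted_values, coverage=0.95):
    """Bounds containing the central `coverage` share of the values."""
    tail = (1.0 - coverage) / 2.0
    last = len(sorted_values) - 1
    lower = sorted_values[round(tail * last)]
    upper = sorted_values[round((1.0 - tail) * last)]
    return lower, upper


# Invented estimates for the activities behind one deliverable.
tasks = [(0.8, 1.0, 1.5), (0.5, 0.8, 1.2), (0.9, 1.2, 1.6)]
totals = simulate_total_effort(tasks)
low, high = central_interval(totals)
print(f"95% of simulated outcomes: {low:.2f} to {high:.2f} person-months")
```

The interval printed at the end is exactly the kind of statement recommended above: an effort range with an explicit probability attached.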

Thus, the statistical analysis of data will allow you to answer questions such as:

“What is the probability that the project cost comes under budget?”

“What cost gives us a 90% probability of completing the entire project?”

“What is the probability that a certain activity will finish by a given date?”

“When will we have a 90% probability of starting a certain activity?”

“What cost gives us an 80% probability of completing the entire project?”
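Once a set of simulated outcomes is available, the cost questions above reduce to counting and reading percentiles. The following Python sketch assumes invented cost estimates per work package and a triangular distribution; the helper names `probability_within` and `cost_at_confidence` are hypothetical.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Invented cost estimates per work package, in thousands of Euros:
# (optimistic, most likely, pessimistic).
work_packages = [(40, 50, 70), (20, 25, 40), (30, 35, 45)]

# Each simulated project cost is the sum of one draw per work package.
simulated_costs = sorted(
    sum(rng.triangular(lo, hi, mode) for lo, mode, hi in work_packages)
    for _ in range(10_000)
)


def probability_within(costs, budget):
    """Share of simulated outcomes that cost no more than `budget`."""
    return sum(1 for c in costs if c <= budget) / len(costs)


def cost_at_confidence(sorted_costs, confidence):
    """A cost that at least `confidence` of the outcomes do not exceed."""
    index = min(len(sorted_costs) - 1, int(confidence * len(sorted_costs)))
    return sorted_costs[index]


budget = 120
p_budget = probability_within(simulated_costs, budget)
print(f"P(cost <= {budget}k): {p_budget:.0%}")
print(f"Cost at 80%: {cost_at_confidence(simulated_costs, 0.80):.1f}k")
print(f"Cost at 90%: {cost_at_confidence(simulated_costs, 0.90):.1f}k")
```

The schedule questions can be handled the same way by simulating activity durations instead of costs.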

STAGE 4 – Support for decisions (aka, how to use the information obtained)

What we have said so far was necessary to reach the true goal of data-driven Project Management: using the information obtained by processing data to support decisions, so that management actions decided beforehand are implemented wherever possible.

Introducing elements that trigger management automation is anything but simple and must be approached gradually. The earlier example of releasing an application to test, started automatically depending on the measured level of faults in the code, is an idealized simplification. In reality, a number of bugs per type below suitable thresholds can be one of the release conditions, but not the only one, of course. There may be other conditions of a contractual nature (e.g. verification of the existence and completeness of associated documents such as the inspection plan, test schedule, etc.) or of an organizational nature (e.g. formalization by an accompanying letter signed by a manager, etc.).

The Project Management actions that can most easily be modeled in advance on the basis of data are communication actions, which push supporting information more rapidly and precisely to a “strategic” decision level that must remain “human” and cannot be eliminated. The starting point can be a dashboard, like the one shown in the figure, that summarizes all the essential metrics for measuring the project’s “health”.


The upper part is fed by applying the Earned Value method (predictability metrics), with the performance indicators that summarize the current project situation in terms of time and costs. From these, it is possible to obtain forecast indicators (such as ETC, Estimate To Complete) that, based on available data, show the costs still to be incurred and the residual work time required to complete the project. Drill-down is enabled on these metrics to analyze the measurements for individual parts of the project. By clicking on the aggregate value it is possible, for example, to navigate the project’s WBS and check the performance indicator values down to the elementary work units (work packages).
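As an illustration of how a cost forecast can be derived from the performance indicators, the Python sketch below uses one common Earned Value formula, which assumes that the cost efficiency observed so far (the CPI) will persist until the end of the project. Other forecasting formulas exist, and the figures are invented.

```python
def estimate_at_completion(budget_at_completion, cpi):
    """EAC = BAC / CPI: projected total cost if current efficiency persists."""
    return budget_at_completion / cpi


def estimate_to_complete(budget_at_completion, actual_cost, cpi):
    """ETC = EAC - AC: cost still to be incurred to finish the project."""
    return estimate_at_completion(budget_at_completion, cpi) - actual_cost


# Invented figures, in Euros.
bac = 100_000          # budget at completion
earned_value = 40_000  # value of the work actually completed
actual_cost = 50_000   # cost incurred so far

cpi = earned_value / actual_cost
etc = estimate_to_complete(bac, actual_cost, cpi)
print(f"CPI = {cpi:.2f}, ETC = {etc:,.0f} Euros")
```

With a CPI of 0.80, the remaining cost is forecast at about 75,000 Euros, against the 50,000 Euros of budget still unspent.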

For example, an SPI (Schedule Performance Index) value of 0.9 shows a delay in project activity (in terms of produced value) compared to the plan. On its own, however, this information may be useful but is not real support for a decision. Does the overall schedule performance reflect a “structural” delay across most activities, or is it due to problems in a single part of the project? By drilling down you could, for example, find that out of 10 WBS activities, eight have SPI = 1 and two have SPI = 0.5. This allows a rapid, confident decision about which specific areas to intervene in or whether to acquire further information.

What type of advance decision can be implemented in this case? For example, a multi-channel push communication of messages and alarms to project managers or owners of individual activities, when the performance measurement values show problems in schedule and/or costs in specific project areas.
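The drill-down example with an overall value of 0.9 and the alert rule just described can be put together in a short Python sketch. The activity names, owners, equal planned values and the alert threshold are assumptions for illustration; printing stands in for whatever delivery channel is actually used.

```python
# Hypothetical alert threshold: an SPI below this value triggers a message.
SPI_ALERT_THRESHOLD = 0.9

# Invented WBS: ten activities with the same planned value; eight are on
# schedule (EV == PV) and two have earned only half of what was planned.
activities = [
    {"name": f"WP{i}", "owner": f"owner{i}", "pv": 100.0, "ev": 100.0}
    for i in range(1, 9)
] + [
    {"name": "WP9", "owner": "owner9", "pv": 100.0, "ev": 50.0},
    {"name": "WP10", "owner": "owner10", "pv": 100.0, "ev": 50.0},
]


def spi(earned_value, planned_value):
    """Schedule Performance Index: earned value over planned value."""
    return earned_value / planned_value


def project_spi(items):
    """Aggregate SPI: total earned value over total planned value."""
    return spi(sum(a["ev"] for a in items), sum(a["pv"] for a in items))


def schedule_alerts(items, threshold=SPI_ALERT_THRESHOLD):
    """Drill down: one alert per activity whose SPI is below the threshold."""
    return [
        f"To {a['owner']}: {a['name']} has SPI {spi(a['ev'], a['pv']):.2f}"
        for a in items
        if spi(a["ev"], a["pv"]) < threshold
    ]


print(f"Project SPI = {project_spi(activities):.2f}")
for message in schedule_alerts(activities):
    print(message)  # stand-in for e-mail, chat or SMS delivery
```

The aggregate SPI is 0.90, yet only the owners of WP9 and WP10 receive an alert: the rule directs attention to the two activities that actually cause the delay.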

The lower part of the dashboard shows quality, risk (a risk score calculated for the entire project and navigable by individual area) and productivity metrics, for which further “advance decisions” can be defined, concerning both communication and implementation, such as the release to test in the earlier example.


In a data-driven project, the Project Manager’s decisions are always supported by information taken from data using analytical processing, which is increasingly statistical. Where possible, these decisions should be defined in advance and modeled using sets of rules that are reviewed regularly, to allow the automatic implementation of actions supporting the management. In this way, fewer decisions will be taken off the cuff, with a series of advantages:

  • Moving towards a better quality of decisions, assessed according to objective criteria
  • Decisions taken in advance can be discussed without the urgency of an impending situation
  • Reliance on experience and intuition in decisions is limited to emergency situations
  • Emergency situations decrease and become exceptions, while a planning culture aimed at anticipating implementation problems and scenarios is favored.

So, are you ready to take your project management towards the area of objective factors?

Marco Caressa