Construction cost managers often need to produce an accurate cost estimate quickly, but substantiating an accurate estimate takes a lot of time. Being able to quickly calculate a cost indication and its associated bandwidth (the range within which the actual cost is expected to fall) gives a construction cost manager a good starting point and an advantage in acquisition and advice. We would like to know whether it is possible to estimate construction costs using historical data and machine learning algorithms.
Objectives
The main objectives of our research are:
• Being able to estimate the costs of a construction project (or a part thereof) using historical (construction cost) data and machine learning.
• The final model that will be used in practice will have to be at least 90% accurate.
• Insight into the result must be provided by calculating the estimates per cost component or project property.
Our research
Recently, we investigated whether it is possible to make a good construction cost estimate using historical data and machine learning techniques such as a neural network, multiple regression, or a regression decision tree. Each of these techniques produces a function or model that maps project properties to an estimated cost. We collected a dataset of apartment construction projects and got to work. The data contains input parameters such as the total gross floor area, the number of houses, and the open and closed facade area. We also use parameter ratios in our dataset, such as the total gross facade area divided by the total gross floor area.
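To make the idea of input parameters and ratios concrete, here is a minimal sketch of how one project record could be turned into a feature list. The class name, field names, and example values are hypothetical and chosen only for illustration; they are not taken from our dataset.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical field names; the real dataset may use different ones.
    gross_floor_area_m2: float
    gross_facade_area_m2: float
    num_houses: int

    def facade_to_floor_ratio(self) -> float:
        # Ratio feature: total gross facade area / total gross floor area.
        return self.gross_facade_area_m2 / self.gross_floor_area_m2


def feature_vector(p: Project) -> list:
    # Combine raw input parameters with the derived ratio.
    return [p.gross_floor_area_m2, float(p.num_houses), p.facade_to_floor_ratio()]


# Made-up example project, for illustration only.
example = Project(gross_floor_area_m2=4000.0, gross_facade_area_m2=2400.0, num_houses=40)
features = feature_vector(example)
print(features)
```

A ratio like this describes the shape of a building independently of its size, which is one reason it can be a useful addition to the raw areas.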
Neural Networks
The estimates from our neural network seem promising at first sight, but the amount of available data does not seem to be enough to train the network properly. Generating statistical data helps, but we are still left with a black box: there seems to be no insight into why a certain estimate is given.
Regression
The multi-regression method has given good results so far. Several weeks of experimenting with and testing the data have given us a good feeling for which input properties matter and which projects can be seen as outliers. After splitting the dataset by building type, we can make cost estimates with a mean error of a few percent and a maximum error within the 10% margin. To estimate the construction costs we use only 7 input parameters and 3 input ratios. The regression coefficients of these 10 inputs form a regression function with which we can estimate the cost, and they give some insight into how the result arises.
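The sketch below shows how a regression function and a percentage error can be computed. It is not our actual model: it assumes ordinary least squares with an intercept, uses only two inputs instead of ten, and runs on made-up numbers. The function names (`fit_linear`, `predict`, `percentage_error`) are hypothetical.

```python
def fit_linear(X, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for k in range(n):
            if k != col:
                f = A[k][col]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[col])]
    return [A[i][n] for i in range(n)]  # [intercept, coef_1, coef_2, ...]


def predict(coef, x):
    # The regression function: intercept plus weighted sum of the inputs.
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def percentage_error(estimate, actual):
    return abs(estimate - actual) / actual * 100.0


# Made-up training data: (gross floor area in m2, number of houses).
X_train = [(1000, 10), (2000, 25), (3000, 28), (4000, 45), (5000, 40)]
# Costs constructed to be exactly linear, purely for illustration.
y_train = [100_000 + 1_200 * gfa + 50_000 * units for gfa, units in X_train]

coef = fit_linear(X_train, y_train)
train_errors = [percentage_error(predict(coef, x), t) for x, t in zip(X_train, y_train)]

# A made-up project that was not used for fitting.
holdout_error = percentage_error(predict(coef, (3500, 35)), 6_300_000)
print(f"max training error: {max(train_errors):.4f}%")
print(f"holdout error: {holdout_error:.2f}%")
```

Because the toy costs are exactly linear, the training errors are essentially zero by construction; real project data will not behave this way. Each fitted coefficient can be read as the cost contribution per unit of its input, which is where the insight into the estimate comes from.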
Regression with decision trees
By using decision trees we can break down our data, making estimates based on asking a series of questions. Each question leads to a separation of the data into two or more subsets. A number of questions together form a tree. The leaves of the tree contain projects with similar properties. An estimate is made by answering the questions and calculating the mean of the construction costs of the projects in the resulting leaf.
Our first attempt at estimating the building cost using decision trees immediately gave us more insight into which properties are important for our estimate and how to divide our dataset. The accuracy of this model needs further investigation.
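The question-and-leaf mechanism can be sketched in a few lines. This is a simplified illustration, not our model: it assumes binary questions of the form "is property f at most t?", chooses each question by minimising the squared error within the two subsets, and uses made-up data. The names `build_tree` and `estimate` are hypothetical.

```python
from statistics import mean


def sse(values):
    # Sum of squared deviations from the mean: how dissimilar the costs are.
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, costs, min_leaf=2):
    """Recursively split projects until no split keeps min_leaf projects per side."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse([costs[i] for i in left]) + sse([costs[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        # Leaf: the estimate is the mean cost of the projects that ended up here.
        return {"leaf": mean(costs)}
    _, f, threshold, left, right = best
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [costs[i] for i in left], min_leaf),
        "right": build_tree([rows[i] for i in right], [costs[i] for i in right], min_leaf),
    }


def estimate(tree, row):
    # Answer the questions from the root down until a leaf is reached.
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


# Made-up projects: (gross floor area in m2, number of houses) and their costs.
rows = [(1000, 10), (1200, 12), (4000, 40), (4400, 44)]
costs = [1_000_000.0, 1_200_000.0, 5_000_000.0, 5_400_000.0]

tree = build_tree(rows, costs)
print(estimate(tree, (1100, 11)))  # 1100000.0, the mean of the two small projects
print(estimate(tree, (4200, 42)))  # 5200000.0, the mean of the two large projects
```

The stored feature index and threshold at each node show directly which property was used to divide the data, which is the kind of insight a tree provides.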
What’s next?
Research on this topic is still in progress. In the near future we are going to investigate the following concerning construction cost estimates:
• Estimating per cost component
• Finding out why a project is an outlier
• Diving into the data for more details concerning parameters or ratios
• Seeing how accurate the decision tree model is
We are also wondering: who has an estimation challenge we can help with?