May 12th, 2021
How Much is My Property Worth?
Category: Community, Deep Learning, Explainable AI, H2O, Open Source, R
By: Jo-Fai Chow
Note: this is a guest blog post by Jaafar Almusaad.
This is the million-dollar question – both figuratively and literally.
Traditionally, qualified property valuers are tasked with answering this question. It’s a lengthy and costly process, but more critically, it’s inconsistent and largely subjective. Mind you, valuation is an “art,” not a “science.”
In reality, “qualified” valuers often end up having different “opinions” about the value of properties, and it’s up to the customer to pick the “opinion” that best serves their interests.
To address this issue, AVMs (Automated Valuation Models) were developed. The aim was to use data and hand-crafted computer algorithms to estimate the market value of properties instantly and consistently. However, a major caveat of AVMs is that the human biases built into those hand-crafted rules can propagate to the final product, potentially resulting in biased valuations.
A more recent approach is to rely on Machine Learning and AI. Indeed, AI has outperformed human experts in many fields, and more data is now available than we can ever digest.
I was curious whether AI could replace current AVMs, so I curated a dataset from different sources. It comprises tens of millions of UK property transactions, along with data about locations (safety, income, education, etc.).
I typically use R and the data.table library for my data workflows, and these work flawlessly with H2O-3. So, I used H2O to train Deep Learning models to predict property values.
As you may realize, Deep Learning models can be highly accurate, but they are pretty much a “black box.” Therefore, it’s crucial to validate the models, not only mathematically but also with a human’s “common sense”. H2O has a whole arsenal of tools for validating and interpreting trained models. In addition to the common statistical metrics (RMSE, deviance, etc.), H2O has recently added Residual Analysis. In layman’s terms, residual analysis takes observations with known target values, predicts them, and compares each prediction with the actual value; the difference is the residual, and a residual of zero means a perfect prediction. For additional convenience, the residuals are plotted against the predicted values: a random scatter around zero suggests the model is well behaved, while a visible pattern points to errors the model makes systematically.
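To make the idea concrete, here is a minimal sketch in plain Python (not the H2O API, and not the model behind AccuVal) that computes residuals as actual minus predicted, plus the RMSE derived from them. The function `toy_model`, its price per square metre, and the three transactions are hypothetical values chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def residuals(model: Callable[[dict], float],
              rows: Sequence[dict],
              actuals: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (predicted, residual) pairs, where residual = actual - predicted."""
    pairs = []
    for row, actual in zip(rows, actuals):
        predicted = model(row)
        pairs.append((predicted, actual - predicted))
    return pairs


def rmse(pairs: Sequence[Tuple[float, float]]) -> float:
    """Root mean squared error, computed from the residuals."""
    return math.sqrt(sum(res * res for _, res in pairs) / len(pairs))


# Hypothetical toy model: price is 3,000 per square metre of floor area.
def toy_model(row: dict) -> float:
    return 3000.0 * row["floor_area"]


# Hypothetical transactions with known (actual) sale prices.
rows = [{"floor_area": 50}, {"floor_area": 80}, {"floor_area": 120}]
actual_prices = [160_000.0, 235_000.0, 360_000.0]

pairs = residuals(toy_model, rows, actual_prices)
for predicted, res in pairs:
    # Positive residual: the model undervalued the property.
    print(f"predicted={predicted:>9.0f}  residual={res:>+8.0f}")
print(f"RMSE = {rmse(pairs):.1f}")
```

In H2O, the residual analysis chart does this for a whole frame and plots the (predicted, residual) pairs; the sketch only shows the arithmetic behind it.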
Two other tools that I find very helpful are Variable Importance and Partial Dependence Plots (PDPs). As the name suggests, Variable Importance tells us which variables (i.e., predictors) are the most influential. In my case, since the variables are well defined and understood, Variable Importance helps validate the models further. For instance, we know from experience that the total floor area has the biggest impact on property value. Reassuringly, floor area sits at the top of the chart; in other words, what the model relies on most agrees with domain knowledge.
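H2O calculates its own importance scores, and how it does so depends on the algorithm. As a swapped-in, model-agnostic illustration, the sketch below uses permutation importance instead: shuffle one feature's column at a time and measure how much the RMSE worsens. The toy model, the feature names (`floor_area`, `bedrooms`, `house_number`) and the data are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def rmse(model: Callable[[dict], float],
         rows: Sequence[dict],
         actuals: Sequence[float]) -> float:
    """Root mean squared error of the model on the given rows."""
    squared = [(actual - model(row)) ** 2 for row, actual in zip(rows, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def permutation_importance(model: Callable[[dict], float],
                           rows: Sequence[dict],
                           actuals: Sequence[float],
                           features: List[str],
                           n_repeats: int = 20,
                           seed: int = 42) -> Dict[str, float]:
    """Average RMSE increase when each feature's column is shuffled."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    baseline = rmse(model, rows, actuals)
    scores = {}
    for feature in features:
        increases = []
        for _ in range(n_repeats):
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            # Build shuffled copies; the original rows are left untouched.
            shuffled = [{**row, feature: value} for row, value in zip(rows, column)]
            increases.append(rmse(model, shuffled, actuals) - baseline)
        scores[feature] = sum(increases) / n_repeats
    # Most influential first, like a variable importance chart.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical model: floor area matters most, bedrooms a little,
# and house_number is ignored entirely.
def toy_model(row: dict) -> float:
    return 3000.0 * row["floor_area"] + 1000.0 * row["bedrooms"]


rows = [
    {"floor_area": a, "bedrooms": b, "house_number": n}
    for a, b, n in [(40, 1, 7), (60, 2, 12), (80, 3, 3),
                    (100, 3, 21), (120, 4, 9), (140, 5, 15)]
]
actuals = [toy_model(row) for row in rows]  # noise-free targets for clarity

importances = permutation_importance(
    toy_model, rows, actuals, ["floor_area", "bedrooms", "house_number"])
for name, score in importances.items():
    print(f"{name:<13} {score:>10.1f}")
```

A feature the model ignores, such as `house_number` here, scores zero, which is the same kind of sanity check as seeing floor area at the top of the chart.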
However, if we want to dive deeper to see exactly how floor area affects the price, we can use a Partial Dependence Plot. A PDP shows the impact of a specific variable (here, floor area) on the target (here, price). Under the hood, it sets the variable of interest to each value on a grid for every observation, predicts, and averages the predictions, so the effects of all other variables are averaged out. In our case, we can see that the relationship between floor area and price is pretty much linear, which is what we would expect.
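The averaging step behind a PDP can be sketched in a few lines of Python. This is a generic illustration of partial dependence, not H2O's implementation; `toy_model`, the `has_garden` feature and the grid values are hypothetical. Because the toy model is additive in floor area, the resulting curve is exactly linear.

```python
from typing import Callable, List, Sequence, Tuple


def partial_dependence(model: Callable[[dict], float],
                       rows: Sequence[dict],
                       feature: str,
                       grid: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (grid value, mean prediction) pairs for one feature."""
    curve = []
    for value in grid:
        # Set the feature to `value` in every row; all other columns keep
        # their observed values, so their effect is averaged over the data.
        modified = [{**row, feature: value} for row in rows]
        predictions = [model(row) for row in modified]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


# Hypothetical model: 3,000 per square metre plus 20,000 for a garden.
def toy_model(row: dict) -> float:
    return 3000.0 * row["floor_area"] + 20000.0 * row["has_garden"]


rows = [
    {"floor_area": 50, "has_garden": 1},
    {"floor_area": 80, "has_garden": 0},
    {"floor_area": 120, "has_garden": 1},
]

curve = partial_dependence(toy_model, rows, "floor_area", [50, 100, 150])
for area, mean_price in curve:
    print(f"floor_area={area:>4}  mean predicted price={mean_price:>10.0f}")
```

Every row is re-predicted once per grid value, so the cost grows with rows × grid points, which is why PDP tools typically evaluate a modest grid.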
As my dataset continues to grow, both in terms of the number of observations and features, I find myself spending more time on the technical side when I should be focusing on growing the business. Thankfully, H2O.ai has automated most of this work with its Driverless AI, which I am considering exploring in the next phase.
AccuVal, the property valuation platform, is publicly and freely available at https://accuval.co.uk/
Please let me know if you want to talk about your H2O use cases. We welcome all kinds of community contributions (e.g., blog posts, tech talks, apps).