Speedml Machine Learning Speed Start

The author of Speedml explains data science solutions for beginners in the Data Science Solutions book, which includes a chapter on one of the top 20 most-voted Kaggle solutions.

Property Listing Optimization

Property portals can list thousands of properties at a time. How do you, as the portal manager, analyze the performance of your listings to increase user conversion? How do you, as a real-estate agent, optimize a property listing so it delivers the best results?

We use Speedml to analyze this problem by taking part in the Kaggle competition hosted by Two Sigma Connect for their RentHop property listing portal. As the competition website puts it, the problem is: "How much interest will a new rental listing on RentHop receive?"

The Speedml solution ranks in the top 20% of around 2,500 participating teams and data scientists, while using more than 70% fewer iterations than the top solutions.

The datasets for this competition are available from the Kaggle website, or from RentHop on request for academic or research use.

Multi-notebook workflow

This project involves a significant number of features (200+) and a lot of data: nearly 125,000 samples across the train and test datasets. Processing a dataset this large on a laptop requires significant compute, so we split the workflow into stage-specific notebooks and save interim datasets at each stage.
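A minimal sketch of the hand-off between stage notebooks, assuming each stage pickles its train and test datasets into a shared folder. The helper names `save_interim` and `load_interim` and the `<stage>_<split>.pkl` file naming scheme are hypothetical; with pandas, a DataFrame could be saved the same way using `DataFrame.to_pickle`.

```python
import pickle
import tempfile
from pathlib import Path


def save_interim(stage, train, test, folder):
    """Persist the train and test datasets produced by one workflow stage."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for split, data in (("train", train), ("test", test)):
        # Hypothetical naming scheme: one file per stage and split.
        with open(folder / f"{stage}_{split}.pkl", "wb") as fh:
            pickle.dump(data, fh)


def load_interim(stage, folder):
    """Reload the train and test datasets saved by an earlier stage."""
    folder = Path(folder)
    loaded = []
    for split in ("train", "test"):
        with open(folder / f"{stage}_{split}.pkl", "rb") as fh:
            loaded.append(pickle.load(fh))
    return tuple(loaded)


# Example: the EDA notebook saves its output and the next notebook reloads it.
with tempfile.TemporaryDirectory() as interim_dir:
    save_interim("eda", [{"price": 3000}], [{"price": 2500}], interim_dir)
    train, test = load_interim("eda", interim_dir)
```

Pickle is fast to write and read but Python-specific; a format such as CSV trades speed for portability between tools.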

EDA and Wrangle. In this stage we get to know the datasets through exploratory data analysis (EDA). We also visualize location-based features, such as the latitude and longitude of each property, using clustering techniques.

Figure: latitude and longitude clusters of property locations.
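The source does not name the clustering technique. The sketch below assumes plain k-means on (latitude, longitude) pairs, using straight-line distance in degrees as a rough approximation that is reasonable within a single city. The function name `kmeans` and the sample coordinates are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster (latitude, longitude) pairs with plain k-means."""
    rng = random.Random(seed)
    # Start from k randomly chosen points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Hypothetical listings: three near one neighbourhood, three near another.
locations = [
    (40.78, -73.97), (40.79, -73.96), (40.77, -73.98),
    (40.68, -73.94), (40.69, -73.95), (40.67, -73.93),
]
labels, centroids = kmeans(locations, k=2)
```

The resulting cluster labels can be plotted by colour on a latitude/longitude scatter chart, or kept as a categorical feature for later stages.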

We also perform basic data pre-processing and wrangling: engineering density features for high-cardinality features, encoding categorical text features as labels, and fixing outliers, among other steps.
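The three wrangling techniques can be illustrated in plain Python. This is a sketch of the ideas, not Speedml's own implementation: the function names `density_feature`, `label_encode` and `fix_outliers` are hypothetical, and outlier fixing is assumed here to mean capping values above a chosen percentile.

```python
import math
from collections import Counter


def density_feature(values):
    """Replace each value with how often it occurs (a frequency feature)."""
    counts = Counter(values)
    return [counts[v] for v in values]


def label_encode(values):
    """Map each distinct text category to an integer code (sorted order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


def fix_outliers(values, upper_pct=99):
    """Cap values above the given percentile at that percentile's value."""
    ordered = sorted(values)
    # Nearest-rank percentile: the smallest value covering upper_pct percent.
    rank = max(1, math.ceil(upper_pct * len(ordered) / 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


# Hypothetical sample columns.
manager_density = density_feature(["m1", "m2", "m1", "m3", "m1"])
feature_codes = label_encode(["Elevator", "Doorman", "Elevator"])
capped_prices = fix_outliers([2400, 2600, 3100, 2800, 1_000_000], upper_pct=80)
```

Density turns an identifier with thousands of distinct values into a single numeric column, which tree-based models can split on directly.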

This stage saves feature-engineered interim datasets for the train and test scenarios.

High-cardinality. The next stage further processes selected high-cardinality features based on their correlation with the target variable. Because this stage is processing-intensive, we run it separately and save the results as train and test datasets with the high-cardinality features processed.
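The source only says these features are processed based on their correlation with the target. One common approach, assumed here, is smoothed mean target encoding: each category (for example a manager ID) is replaced by the average interest level of its training listings, blended with the global average. The numeric mapping of interest levels, the `target_encode` name and the `smoothing` parameter are hypothetical.

```python
from collections import defaultdict

# Assumed numeric mapping of the competition's interest levels.
INTEREST = {"low": 0, "medium": 1, "high": 2}


def target_encode(train_cats, train_targets, apply_cats, smoothing=10):
    """Encode categories by their smoothed mean target value from training data."""
    y = [INTEREST[t] for t in train_targets]
    prior = sum(y) / len(y)  # global mean interest level
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, val in zip(train_cats, y):
        sums[cat] += val
        counts[cat] += 1
    encoded = []
    for cat in apply_cats:
        n = counts.get(cat, 0)
        # Blend the category mean with the prior; rare or unseen categories
        # lean towards the global mean.
        encoded.append((sums.get(cat, 0.0) + smoothing * prior) / (n + smoothing))
    return encoded


# Hypothetical managers "a" and "b"; "c" never appears in training.
encoded = target_encode(
    ["a", "a", "a", "b"],
    ["high", "high", "medium", "low"],
    ["a", "b", "c"],
    smoothing=2,
)
```

In a real pipeline, training rows are usually encoded out-of-fold so that a listing's own target does not leak into its feature.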

Text processing. We dedicate a separate notebook to processing free-form text fields such as the property listing description.
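The source does not describe which text techniques are used. As an assumption, the sketch below turns descriptions into a word count plus bag-of-words counts over the most frequent words; a production pipeline might instead use scikit-learn's `CountVectorizer` or `TfidfVectorizer`. The names `tokenize`, `text_features` and `vocab_size` are hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase a description and split it into alphabetic words."""
    return TOKEN.findall(text.lower())


def text_features(descriptions, vocab_size=3):
    """Build word-count and bag-of-words features for listing descriptions."""
    docs = [tokenize(d) for d in descriptions]
    # Vocabulary: the most frequent words across all descriptions
    # (ties keep the order in which words were first seen).
    totals = Counter(word for doc in docs for word in doc)
    vocab = [w for w, _ in totals.most_common(vocab_size)]
    rows = []
    for doc in docs:
        counts = Counter(doc)
        # One row per listing: total words, then a count per vocabulary word.
        rows.append([len(doc)] + [counts[w] for w in vocab])
    return vocab, rows


vocab, rows = text_features([
    "Spacious loft, doorman building",
    "Doorman and elevator, no fee",
    "No fee! Spacious studio",
])
```

Keeping the vocabulary small bounds the number of new columns, which matters when the dataset already has 200+ features.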

Model prediction. The final stage builds on the interim datasets from the earlier stages to train the model and generate predictions.
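The source does not say which model or metric is used. A minimal sketch of scoring predictions, assuming each listing gets a probability for high, medium and low interest and that predictions are scored with multi-class log loss, which as we understand it is how the Kaggle competition evaluated submissions. The function name `multiclass_log_loss` and the sample values are hypothetical.

```python
import math

CLASSES = ("high", "medium", "low")


def multiclass_log_loss(actual, predicted, eps=1e-15):
    """Average negative log probability assigned to each listing's true class."""
    total = 0.0
    for label, probs in zip(actual, predicted):
        # Rescale the row to sum to 1, then clip to avoid log(0).
        row_sum = sum(probs[c] for c in CLASSES)
        p = probs[label] / row_sum
        p = min(max(p, eps), 1 - eps)
        total += math.log(p)
    return -total / len(actual)


loss = multiclass_log_loss(
    ["high", "low"],
    [
        {"high": 0.7, "medium": 0.2, "low": 0.1},
        {"high": 0.1, "medium": 0.3, "low": 0.6},
    ],
)
```

Lower is better; a confident wrong prediction is penalised heavily, which rewards well-calibrated probabilities over hard class labels.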

Speedml solution

Speedml experts have significant experience working on leading property portals, including one of the UK’s largest property listing portals.

The Speedml property listing optimization solution takes as input a sample dataset (simple CSV format) describing your property portal listings, or we can provide a sample dataset.

The result will be a set of data analytics reports, charts describing your dataset, and of course machine learning model results based on certain assumptions around user behaviour. You can change these assumptions later to tailor the solution for your business as your user base grows.

The deliverable includes a custom data science solution notebook that you can run on your laptop or host from a private GitHub repository.

As your user base grows, you can feed actual data to the model (as simple as copying a CSV into a folder) and see how this changes the model predictions.
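A minimal sketch of the "copy a CSV into a folder" step, assuming every CSV in the folder shares the same header row. The function name `load_listings`, the file names and the `listing_id`/`price` columns are hypothetical; the loaded rows would then feed the notebook's training stage.

```python
import csv
import tempfile
from pathlib import Path


def load_listings(folder):
    """Read every CSV file in a folder into one list of row dictionaries."""
    rows = []
    # Sorted so files are read in a stable, predictable order.
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as fh:
            rows.extend(csv.DictReader(fh))
    return rows


# Example: two monthly exports copied into the same folder.
with tempfile.TemporaryDirectory() as data_dir:
    for name, body in (("2017-01.csv", "listing_id,price\n1,3000\n"),
                       ("2017-02.csv", "listing_id,price\n2,2500\n")):
        Path(data_dir, name).write_text(body)
    listings = load_listings(data_dir)
```

`csv.DictReader` returns every value as a string, so numeric columns such as price still need converting before modelling.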

We will develop the solution using the open-source Speedml library and best-of-breed machine learning packages, which are well documented and community supported.

If you require a custom solution for your property portal, please send us a message and we will respond.