
Case Overview

Figure 1: Amazon Delivery - Photo from Amazon News

Dataset

This is a real Amazon delivery dataset of 45k samples collected over a period of 3 months. A day is labeled Delayed when at least 10 items from a single truck were delivered late on that day.

Objective

As a business analyst working for a company in the logistics sector, you have the following tasks:

  1. Analyze the data we already prepared for this demo.

  2. Build your first model and analyze its prediction score.

  3. Improve the accuracy of your delivery predictions.

Correctly predict package delivery on-time status for your customers with an accuracy of 80% or better.

Evaluation

This dataset is used for a tutorial and does not have a formal evaluation metric. However, the goal is to achieve an accuracy of 80% or better in predicting on-time delivery status.
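The metric itself is plain classification accuracy: the fraction of deliveries whose predicted on-time status matches the actual one. A minimal sketch (the labels below are made up for illustration; in the demo they would come from the dataset's delayed column and your model's predictions):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]  # 1 = on time, 0 = delayed (illustrative)
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]

print(accuracy(y_true, y_pred))  # 8 of 10 correct -> 0.8
```

A model hitting the target would score 0.80 or higher on held-out data, not just on the data it was trained on.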

Gradient Boosted Trees

A gradient boosted tree is a machine learning model that builds many small decision trees sequentially, where each new tree is trained to correct the mistakes of the previous ones.

It combines two ideas:

  1. Decision trees

  2. Gradient boosting

Decision Tree (the base learner)

A decision tree:

  - Splits the data using simple if/else rules on feature thresholds.
  - Is easy to interpret and fast to train.

But a single tree often:

  - Overfits the training data when grown deep.
  - Is too weak on its own when kept shallow.
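The smallest possible decision tree is a single split, often called a stump. The sketch below fits one on a single invented feature (delivery distance), choosing the threshold that misclassifies the fewest samples; the feature values and labels are illustrative, not from the real dataset:

```python
def fit_stump(x, y):
    """Try every midpoint threshold; keep the one with the fewest errors."""
    best = (None, float("inf"))
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        errors = sum((xi > thr) != yi for xi, yi in zip(x, y))
        if errors < best[1]:
            best = (thr, errors)
    return best[0]

distance_km = [2, 3, 5, 8, 12, 15, 20]   # made-up feature
delayed     = [0, 0, 0, 0,  1,  1,  1]   # 1 = delayed

thr = fit_stump(distance_km, delayed)
print(thr)  # 10.0 -- the midpoint between 8 and 12 separates the classes
```

Boosting builds hundreds of weak learners like this and combines them, which is how it overcomes the weaknesses of a single tree.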

Boosting

Boosting means:

Build models sequentially, where each new model focuses on fixing the errors of the previous model.

Unlike random forests (which build trees independently), boosting builds trees one after another.

Step-by-step Intuition

For binary classification:

Step 1: Start simple

Begin with a very basic prediction, such as the log-odds of the overall positive rate (so every sample starts at the average probability).

Step 2: Measure mistakes

Calculate how wrong those predictions are (using log loss).
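Log loss (binary cross-entropy) penalizes confident wrong predictions heavily. A quick sketch, with made-up probabilities:

```python
import math

def log_loss(y_true, p_pred):
    """Average binary cross-entropy between labels and predicted probabilities."""
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

y_true = [1, 0, 1]
good = [0.9, 0.1, 0.8]   # confident and correct -> small loss
bad  = [0.4, 0.6, 0.3]   # wrong side of 0.5     -> larger loss

print(log_loss(y_true, good) < log_loss(y_true, bad))  # True
```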

Step 3: Fit a small tree

Train a shallow tree to predict those errors (the residuals).

Step 4: Update predictions

Add that tree’s predictions (scaled by a learning rate) to the running prediction.

Step 5: Repeat

Build another tree to fix the new residual errors.

Do this hundreds of times. The final prediction is the sum of all the trees’ predictions.
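The five steps above can be sketched end to end as a toy gradient-boosted model. The base learner here is a one-split regression stump on a single invented feature; all data and hyperparameters (learning rate, number of trees) are made up for illustration:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_stump(x, residuals):
    """Find the split that best predicts the residuals (least squares)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lmean, rmean)
    _, thr, lmean, rmean = best
    return lambda xi: lmean if xi <= thr else rmean

x = [2, 3, 5, 8, 12, 15, 20]   # e.g. delivery distance (km), made up
y = [0, 0, 0, 0, 1, 1, 1]      # 1 = delayed

# Step 1: start simple -- log-odds of the average positive rate.
p0 = sum(y) / len(y)
scores = [math.log(p0 / (1 - p0))] * len(x)

learning_rate = 0.5
for _ in range(50):                                    # Step 5: repeat
    probs = [sigmoid(s) for s in scores]
    residuals = [yi - pi for yi, pi in zip(y, probs)]  # Step 2: measure mistakes
    tree = fit_stump(x, residuals)                     # Step 3: fit a small tree
    scores = [s + learning_rate * tree(xi)             # Step 4: update predictions
              for s, xi in zip(scores, x)]

preds = [int(sigmoid(s) > 0.5) for s in scores]
print(preds)  # [0, 0, 0, 0, 1, 1, 1] -- fits the toy data
```

Each round, the stump chases what the ensemble so far still gets wrong (`y - probability`), and the final score is the initial guess plus the sum of all the scaled tree outputs, exactly as the steps describe. Production libraries like XGBoost or LightGBM follow this same recipe with deeper trees and many refinements.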