Demand and Price Forecasting Using Amazon DeepAR Algorithm
In consumer electronics, forecasting demand and selling price is crucial for planning the inventory of each product across each store. In this blog post we walk through a specific use case to show how the DeepAR algorithm can be used to solve this kind of industry problem.
Use Case
Predicting mobile phone demand and selling price is difficult for retailers because the shelf life of a handset is only about 16 to 18 months. From the date of launch, the product price follows a declining trend driven by market dynamics. Between the 12th and 18th month, retailers tend to clear remaining stock from their inventories to make room for new model releases, and between the 14th and 18th month they usually offer heavy discounts to move any outdated inventory. If we can forecast which models will sell, at what time and at what price, we can optimize purchasing, shipping, inventory and pricing costs.
From the use case we observe that mobile device prices follow a general declining trend after the day of launch. During the product life cycle, prices may also be influenced by related time series, such as the sales quantity of each transaction (a large purchase order typically leads to a better negotiated price), and by seasonality factors such as holidays and sales events, which can cause sharp price fluctuations. DeepAR is well suited here because it can capture the trend, the seasonality, these fluctuations, the cold-start problem and the influence of other models' time series.
Now let us see how the DeepAR algorithm can be used to solve this kind of use case. For this example I am using simulated mobile phone sales data across different retailers; a sample of the data is shown below. The field sku_id is a unique ID for the combination of model, storage and colour.
Amazon DeepAR Algorithm
DeepAR is a supervised learning algorithm for time series forecasting that uses recurrent neural networks (RNNs) to produce both point and probabilistic forecasts. Unlike classical forecasting techniques that train a separate model for each time series, DeepAR trains a single model across all the time series and learns the similarities between them.
Implementation
Launch a SageMaker notebook instance to load and preprocess the data and to build the model.
Step 1 - Data Loading and Preprocessing
After cleaning, preprocessing and feature engineering, we need to prepare the data in the format required by the DeepAR algorithm. First, import all the required libraries.
%matplotlib inline

import sys
import json
import random
import datetime
import os
from dateutil.parser import parse
from random import shuffle

import boto3
import s3fs
import sagemaker
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from Lib import utils  # helper functions (write_dicts_to_file, copy_to_s3) shipped with the notebook

# Create the connection to S3 to access the training and testing data
s3 = boto3.client('s3')

# get_execution_role gives the IAM role of the notebook instance
role = sagemaker.get_execution_role()

# Initializing bucket properties
bucket = ''  # provide the bucket name
prefix = ''  # provide the folder name inside the bucket
input_data = "s3://{}/{}/data".format(bucket, prefix)
output_data = "s3://{}/{}/output".format(bucket, prefix)
All string features need to be encoded as integers, with values starting from 0. Hence the "sku_id", "model", "storage_vol", "colour", "customer" and "country" features need to be transformed.
transform_json = {
    "model": {"model1": 0, "model2": 1, "model3": 2, "model4": 3, "model5": 4, "model6": 5},
    "storage_vol": {"64 gb": 3, "256 gb": 1, "512 gb": 2, "128 gb": 0},
    "colour": {"black": 0, "blue": 1, "coral": 2, "gold": 3, "green": 4, "silver": 5, "space grey": 6},
    "customer": {"retailers1": 0, "retailers2": 1, "retailers3": 2, "retailers4": 3},
    "country": {"country1": 0, "country2": 1, "country3": 2}
}

df_cat = df.copy()  # work on a copy so the raw data is left unchanged

# Convert all string column values to lower case
str_cols = ['model', 'storage_vol', 'colour', 'customer', 'country']
for col in str_cols:
    df_cat[col] = df_cat[col].str.lower()

# Replace the strings with the corresponding encoded values
df_cat.replace(transform_json, inplace=True)

# Create a mapping table between sku_id and the categorical data
dict_mapping = df_cat[['sku_id', 'model', 'storage_vol', 'colour']].drop_duplicates()
dict_mapping = dict_mapping.sort_values('sku_id')
dict_mapping.head()
The next step is to assemble the features into three arrays, "target", "cat" and "dynamic_feat", in JSON format. You may notice that the terminology differs from Amazon Forecast datasets; these fields are equivalent to the "target time series", "metadata" and "related time series", respectively. Below is a sample of the training data, where each line represents one time series, i.e. one SKU's historical record.
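For illustration, a single record in the DeepAR JSON Lines format looks roughly like the sketch below. The values are made up; in our data "cat" holds [model, storage, colour, country, customer_group] and "dynamic_feat" holds the daily sales quantity.

{"start": "2018-09-20 00:00:00", "target": [799.0, 795.5, "NaN", 790.0], "cat": [0, 3, 6, 1, 2], "dynamic_feat": [[12, 8, 0, 15]]}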
Preparing the target time series data in the above format:
# Select the required columns
df1 = df_cat[['timestamp', 'sku_id', 'price', 'quantity', 'country', 'customer']]

# Time series naming: sku_id + country + customer_group (predictions will be at this granularity)
df1['group_id'] = df1.sku_id.astype(str) + '_' + df1.country.astype(str) + '_' + df1.customer.astype(str)

# Pivot so that each combination becomes one column indexed by date
columns = df1['group_id'].unique()
columns.sort()
df1['timestamp'] = pd.to_datetime(df1['timestamp'], errors='coerce', dayfirst=True)
df1 = df1.set_index('timestamp')

df_target = pd.DataFrame(columns=columns)
df_target['timestamp'] = pd.date_range(start='2018-09-20', end='2020-01-31', freq='D')
df_target = df_target.set_index('timestamp')
df_target = df_target.asfreq(freq='D')
# In df_target each column is one combination's time series

# Fill every time series with the target price
num_columns = len(columns)
for i in range(num_columns):
    columns_split = df_target.columns[i].replace('_', ' ').split(' ')
    temp = df1.loc[(df1['sku_id'] == int(columns_split[0])) &
                   (df1['country'] == int(columns_split[1])) &
                   (df1['customer'] == int(columns_split[2]))].resample('D').mean()
    # take the mean price if there are multiple records for the same day
    df_target.iloc[:, i] = temp['price']

# Collect all the time series into the target_price list
target_price = []
num_ts = df_target.shape[1]
for i in range(num_ts):
    target_price.append(np.trim_zeros(df_target.iloc[:, i], trim='f'))
Data transformation to create dynamic feature data
# Data transformation to create the related (dynamic feature) time series
df_dynamic = pd.DataFrame(columns=columns)
df_dynamic['timestamp'] = pd.date_range(start='2018-09-20', end='2020-01-31', freq='D')
df_dynamic = df_dynamic.set_index('timestamp')
df_dynamic = df_dynamic.asfreq(freq='D')

num_columns = len(columns)
for i in range(num_columns):
    columns_split = df_dynamic.columns[i].replace('_', ' ').split(' ')
    temp = df1.loc[(df1['sku_id'] == int(columns_split[0])) &
                   (df1['country'] == int(columns_split[1])) &
                   (df1['customer'] == int(columns_split[2]))].resample('D').sum()
    # take the sum of quantity if there are multiple records for the same day
    df_dynamic.iloc[:, i] = temp['quantity']

df_dynamic = df_dynamic.fillna(0)  # fill all missing values with 0

# Append all the related time series to the dynamic_quantity list
dynamic_quantity = []
num_ts = df_dynamic.shape[1]
for i in range(num_ts):
    dynamic_quantity.append(df_dynamic.iloc[:, i])
Missing data in "target" and "dynamic_feat" is handled differently. DeepAR allows missing values in "target", but they have to be encoded specifically as the string "NaN" for the model to train properly. "dynamic_feat" cannot contain missing values, so those gaps need to be imputed manually based on business logic. With this, the data preparation, which is the heavy-lifting part, is complete.
Now split the prepared data into training and testing datasets and upload them to an S3 bucket.
freq = '1D'
prediction_length = 30  # forecast horizon in days
context_length = 30

start_dataset = pd.Timestamp("2018-09-20", freq=freq)
end_training = pd.Timestamp("2020-01-31", freq=freq)

# cat structure: [model, storage, colour, country, customer_group]
# use dict_mapping to find the cat data for a specific sku_id
target_cat = {}
for i in range(len(target_price)):
    column_name = target_price[i].name.replace('_', ' ').split(' ')
    cat_name = dict_mapping.loc[dict_mapping['sku_id'] == int(column_name[0])][['model', 'storage_vol', 'colour']].to_numpy()
    target_cat[i] = []
    target_cat[i].append(int(cat_name[0][0]))
    target_cat[i].append(int(cat_name[0][1]))
    target_cat[i].append(int(cat_name[0][2]))
    target_cat[i].append(int(column_name[1]))
    target_cat[i].append(int(column_name[2]))

FREQ = 'D'
training_data = [
    {
        "start": str(start_dataset),
        # We subtract 1 day because pandas slicing includes the upper bound
        "target": [i for i in ts[start_dataset:end_training - pd.Timedelta(1, unit=FREQ)].tolist()],
        "cat": target_cat[index],
        "dynamic_feat": [[j for j in dynamic_quantity[index][start_dataset:end_training - pd.Timedelta(1, unit=FREQ)].tolist()]]
    }
    for index, ts in enumerate(target_price)
]
print(len(training_data))

# Encode missing target values as the string "NaN"
for i in range(len(training_data)):
    training_data[i]['target'] = [x if np.isfinite(x) else "NaN" for x in training_data[i]['target']]

# Create the test dataset with 4 back-test windows
FREQ = 'D'
num_test_windows = 4
test_data = [
    {
        "start": str(start_dataset),
        "target": [i for i in ts[start_dataset:end_training + pd.Timedelta(k * prediction_length, unit=FREQ)].tolist()],
        "cat": target_cat[index],
        "dynamic_feat": [[j for j in dynamic_quantity[index][start_dataset:end_training + pd.Timedelta(k * prediction_length, unit=FREQ)].tolist()]]
    }
    for k in range(1, num_test_windows + 1)
    for index, ts in enumerate(target_price)
]
print(len(test_data))

for i in range(len(test_data)):
    test_data[i]['target'] = [x if np.isfinite(x) else "NaN" for x in test_data[i]['target']]

# Convert training and testing data to JSON Lines files
utils.write_dicts_to_file("train.json", training_data)
utils.write_dicts_to_file("test.json", test_data)

# Transfer the data to S3
utils.copy_to_s3("train.json", input_data + "/train/train.json")
utils.copy_to_s3("test.json", input_data + "/test/test.json")
Step 2 - Training the Model
Next, we use the sagemaker.estimator.Estimator class to initialize an estimator instance. In the estimator configuration, you select the built-in model image, in this case the DeepAR Docker image, and you can set multiple compute instances to speed up the training process; GPU instances can also be used to train a DeepAR model. Different hyperparameters can be set to tune the model's accuracy.
region = boto3.Session().region_name
sagemaker_session = sagemaker.Session()

image_name = sagemaker.amazon.amazon_estimator.get_image_uri(region, "forecasting-deepar", "latest")

estimator = sagemaker.estimator.Estimator(
    sagemaker_session=sagemaker_session,
    image_name=image_name,
    role=role,
    train_instance_count=1,
    train_instance_type='ml.c4.2xlarge',
    base_job_name='deep-ar-testing-price-prediction',
    output_path=output_data
)

hyperparameters = {
    "time_freq": freq,
    "epochs": "400",
    "early_stopping_patience": "40",  # stop the job if the loss hasn't improved in 40 epochs
    "mini_batch_size": "64",
    "learning_rate": "5E-4",
    "likelihood": "gaussian",
    "context_length": str(context_length),
    "prediction_length": str(prediction_length)
}
estimator.set_hyperparameters(**hyperparameters)
Then call the estimator.fit() method to start the training job. We used a single ml.c4.2xlarge training instance. The model accuracy metrics reported by the training job are shown below.
data_channels = {
    "train": "{}/train/".format(input_data),
    "test": "{}/test/".format(input_data)
}
estimator.fit(inputs=data_channels, wait=True)
Step 3 - Model Deployment
After the model is trained, its artifacts are automatically saved to the S3 output location. In order to retrieve real-time inference results, we need to set up and launch an endpoint that hosts the model. The code below creates the endpoint.
predictor = estimator.deploy( initial_instance_count=1, instance_type='ml.m4.xlarge')
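Once the endpoint status is InService, it can be queried with a JSON request in the DeepAR inference format. The following is a minimal sketch, not production code: the series values are made up, it reuses the prediction_length variable defined earlier, and because the model was trained with a dynamic feature the quantity series must also cover the forecast horizon. It requests the 10th, 50th and 90th percentile price forecasts.

import json
import boto3

runtime = boto3.client('sagemaker-runtime')

# Illustrative history for one sku_id + country + customer_group combination
recent_prices = [741.0, 739.5, 738.0]
# dynamic_feat must cover the history plus the forecast horizon (constant quantity assumed here)
future_quantity = [10.0] * (len(recent_prices) + prediction_length)

request = {
    "instances": [
        {
            "start": "2020-01-01 00:00:00",
            "target": recent_prices,
            "cat": [0, 3, 6, 1, 2],          # [model, storage, colour, country, customer_group]
            "dynamic_feat": [future_quantity]
        }
    ],
    "configuration": {
        "num_samples": 100,
        "output_types": ["quantiles"],
        "quantiles": ["0.1", "0.5", "0.9"]
    }
}

response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint,          # endpoint created by estimator.deploy() above
    ContentType='application/json',
    Body=json.dumps(request)
)
result = json.loads(response['Body'].read())
print(result['predictions'][0]['quantiles']['0.5'])  # median price forecast for the next 30 days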
The endpoint can be integrated with web apps to get predictions. You can clone the end-to-end code from the GitHub URL below: https://github.com/Citrus-Consulting-Services/DNA-AWS-DeepAR-algorithm-implementation-python.git
DeepAR Advantages
- The DeepAR algorithm can learn trends and seasonality across a group of similar time series, because it is based on training a single auto-regressive recurrent network on all of them.
- DeepAR produces probabilistic forecasts rather than only point forecasts: the result is a probability distribution with prediction intervals, not just a single best realization, which enables better decision making under uncertainty in real-life scenarios.
- DeepAR has cold-start forecasting capability. A cold-start scenario occurs when you want to generate a forecast for a product that has just launched and has little or no historical data.