Deep Learning for Time Series

In this section, three deep learning models (a simple RNN, an LSTM, and a GRU) will be implemented on the Capital Bikeshare daily ridership data. The goal remains the same as in the previous sections: to accurately forecast future ridership.
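
The code throughout this section assumes standard imports along these lines (a sketch; exact Keras import paths can vary slightly across TensorFlow versions):

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, SimpleRNN, GRU, Dense
from tensorflow.keras import regularizers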

Data Preparation

Train/Test Sizes

Code
# Read in Data
df = pd.read_csv("./Data/CleanData/cabi.csv")
df = df.rename(columns={"date": "t", "count": "y"})
df = df[["t", "y"]]
X = np.array(df["y"].values.astype("float32")).reshape(df.shape[0], 1)


# Parameter split_percent defines the ratio of training examples
def get_train_test(data, split_percent=0.8):
    scaler = MinMaxScaler(feature_range=(0, 1))
    data = scaler.fit_transform(data).flatten()
    n = len(data)
    # Point for splitting data into train and test
    split = int(n * split_percent)
    train_data = data[:split]
    test_data = data[split:]
    return train_data, test_data, data


train_data, test_data, data = get_train_test(X)

print("train shape:", train_data.shape)
print("test shape:", test_data.shape)
train shape: (2334,)
test shape: (584,)
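
One caveat: the fitted MinMaxScaler is local to get_train_test, so all RMSE values reported below are on the 0-1 scale. To report errors in actual ridership counts, the function could also return the scaler; a minimal sketch of the inversion, assuming that modification, is shown here.

Code
# Sketch: map scaled values back to daily ridership counts.
# Assumes get_train_test is modified to also return the fitted scaler:
#     return train_data, test_data, data, scaler
counts = scaler.inverse_transform(test_data.reshape(-1, 1)).flatten()
print(counts[:5])  # first five test-set daily ridership counts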

Train/Test Visualization

Code
# Visualize the Training and Test Sets

fig, ax = plt.subplots()
ax.plot(range(0, len(train_data)), train_data, "-")
ax.plot(range(len(train_data), len(X)), test_data, "-")
ax.set(xlabel="Time (days)", ylabel="Bikeshare Ridership Scaled")
plt.show()

Deep Learning Models

Data Preparation

This code was adapted from Professor Hickman’s Sunspot Prediction Example.

Code
# PREPARE THE INPUT X AND TARGET Y
def get_XY(dat, time_steps, plot_data_partition=False):
    global X_ind, X, Y_ind, Y  # use for plotting later

    # INDICES OF THE TARGET ARRAY (e.g. time_steps=30 -> [30, 60, 90, ...])
    Y_ind = np.arange(time_steps, len(dat), time_steps)
    Y = dat[Y_ind]

    # PREPARE X: every index in each window except the target positions
    rows_x = len(Y)
    X_ind = [*range(time_steps * rows_x)]
    del X_ind[::time_steps]  # drop every time_steps-th entry, leaving time_steps-1 inputs per window
    X = dat[X_ind]

    # PLOT
    if plot_data_partition:
        plt.figure(figsize=(15, 6), dpi=80)
        plt.plot(Y_ind, Y, "o", X_ind, X, "-")
        plt.show()

    # RESHAPE INTO KERAS FORMAT: (samples, time steps, features)
    X1 = np.reshape(X, (rows_x, time_steps - 1, 1))

    return X1, Y


# PARTITION DATA
p = 30  # window length: each target is predicted from the preceding 29 values
testX, testY = get_XY(test_data, p)
trainX, trainY = get_XY(train_data, p)
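
To make the windowing concrete, here is a small illustrative run on a toy array (note that get_XY writes the module-level globals used for plotting, so the two partition calls above should be re-run after trying this):

Code
# Toy example: time_steps=4 on a 12-point series
toy = np.arange(12, dtype="float32")
toyX, toyY = get_XY(toy, 4)
print(toyY)           # [4. 8.] -> targets at indices 4 and 8
print(toyX[:, :, 0])  # [[1. 2. 3.]  window preceding target 4
                      #  [5. 6. 7.]] window preceding target 8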

Input Array Shapes (Test/Train)

Code
# USER PARAM
recurrent_hidden_units = 3
epochs = 200
f_batch = 0.2  # fraction used for batch size
optimizer = "RMSprop"
validation_split = 0.2
print("Testing Array Shape:", testX.shape, testY.shape)
print("Training Array Shape:", trainX.shape, trainY.shape)
Testing Array Shape: (19, 29, 1) (19,)
Training Array Shape: (77, 29, 1) (77,)
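
These shapes follow directly from the windowing: each sample carries p - 1 = 29 inputs, the 2334 training points yield 77 windows, and the 584 test points yield 19. A quick sanity check:

Code
# Expected window counts and the batch size passed to model.fit
print(len(np.arange(30, 2334, 30)))  # 77 training windows
print(len(np.arange(30, 584, 30)))   # 19 test windows
print(int(0.2 * 77))                 # batch size of 15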

LSTM Model

Code
# CREATE MODEL
model = Sequential()
# RECURRENT LAYER: LSTM here; SimpleRNN and GRU variants follow in later sections
model.add(
    LSTM(
        recurrent_hidden_units,
        return_sequences=False,
        input_shape=(trainX.shape[1], trainX.shape[2]),
        recurrent_regularizer=regularizers.L2(1e-2),
        activation="tanh",
    )
)

# MAP THE RECURRENT OUTPUT TO A SCALAR PREDICTION
model.add(Dense(units=1, activation="linear"))

# COMPILE THE MODEL
model.compile(loss="MeanSquaredError", optimizer=optimizer)
model.summary()


# TRAIN MODEL
history = model.fit(
    trainX,
    trainY,
    epochs=epochs,
    batch_size=int(f_batch * trainX.shape[0]),
    validation_split=validation_split,
    verbose=0,
)

# MAKE PREDICTIONS
train_predict = model.predict(trainX).squeeze()
test_predict = model.predict(testX).squeeze()
print(
    trainX.shape,
    train_predict.shape,
    trainY.shape,
    testX.shape,
    test_predict.shape,
    testY.shape,
)
# COMPUTE RMSE
print(trainY.shape, train_predict.shape)
train_rmse = np.sqrt(mean_squared_error(trainY, train_predict))
test_rmse = np.sqrt(mean_squared_error(testY, test_predict))
print(np.mean((trainY - train_predict) ** 2.0))
print(np.mean((testY - test_predict) ** 2.0))

print("Train MSE = %.5f RMSE = %.5f" % (train_rmse**2.0, train_rmse))
print("Test MSE = %.5f RMSE = %.5f" % (test_rmse**2.0, test_rmse))


# PLOT THE RESULT
def plot_result(trainY, testY, train_predict, test_predict):
    plt.figure(figsize=(15, 6), dpi=80)
    # ORIGINAL DATA
    print(X.shape, Y.shape)
    plt.plot(Y_ind, Y, "o", label="target")
    plt.plot(X_ind, X, ".", label="training points")
    plt.plot(Y_ind, train_predict, "r.", label="prediction")
    plt.plot(Y_ind, train_predict, "-")
    plt.legend()
    plt.xlabel("Observation number after given time steps")
    plt.ylabel("Bikeshare Users Scaled")
    plt.title("Actual and Predicted Values")
    plt.show()


plot_result(trainY, testY, train_predict, test_predict)
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 lstm (LSTM)                 (None, 3)                 60        
                                                                 
 dense (Dense)               (None, 1)                 4         
                                                                 
=================================================================
Total params: 64
Trainable params: 64
Non-trainable params: 0
_________________________________________________________________
(77, 29, 1) (77,) (77,) (19, 29, 1) (19,) (19,)
(77,) (77,)
0.014871944
0.01070964
Train MSE = 0.01487 RMSE = 0.12195
Test MSE = 0.01071 RMSE = 0.10349
(2233,) (77,)
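
The History object returned by model.fit is captured but never inspected; plotting the loss curves is a natural convergence check (a sketch, not part of the original run):

Code
# Training vs. validation loss across epochs
plt.figure(figsize=(10, 4))
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.legend()
plt.show()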

RNN Model

Code
# CREATE MODEL
model = Sequential()
# SAME ARCHITECTURE, WITH A SimpleRNN CELL
model.add(
    SimpleRNN(
        recurrent_hidden_units,
        return_sequences=False,
        input_shape=(trainX.shape[1], trainX.shape[2]),
        recurrent_regularizer=regularizers.L2(1e-2),
        activation="tanh",
    )
)

# MAP THE RECURRENT OUTPUT TO A SCALAR PREDICTION
model.add(Dense(units=1, activation="linear"))

# COMPILE THE MODEL
model.compile(loss="MeanSquaredError", optimizer=optimizer)
model.summary()

# TRAIN MODEL
history = model.fit(
    trainX,
    trainY,
    epochs=epochs,
    batch_size=int(f_batch * trainX.shape[0]),
    validation_split=validation_split,
    verbose=0,
)

# MAKE PREDICTIONS
train_predict = model.predict(trainX).squeeze()
test_predict = model.predict(testX).squeeze()
print(
    trainX.shape,
    train_predict.shape,
    trainY.shape,
    testX.shape,
    test_predict.shape,
    testY.shape,
)
# COMPUTE RMSE
print(trainY.shape, train_predict.shape)
train_rmse = np.sqrt(mean_squared_error(trainY, train_predict))
test_rmse = np.sqrt(mean_squared_error(testY, test_predict))
print(np.mean((trainY - train_predict) ** 2.0))
print(np.mean((testY - test_predict) ** 2.0))


# PLOT THE RESULT (plot_result is defined in the LSTM section above)


print("Train MSE = %.5f RMSE = %.5f" % (train_rmse**2.0, train_rmse))
print("Test MSE = %.5f RMSE = %.5f" % (test_rmse**2.0, test_rmse))
plot_result(trainY, testY, train_predict, test_predict)
Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 simple_rnn (SimpleRNN)      (None, 3)                 15        
                                                                 
 dense_1 (Dense)             (None, 1)                 4         
                                                                 
=================================================================
Total params: 19
Trainable params: 19
Non-trainable params: 0
_________________________________________________________________
(77, 29, 1) (77,) (77,) (19, 29, 1) (19,) (19,)
(77,) (77,)
0.016305383
0.012164743
Train MSE = 0.01631 RMSE = 0.12769
Test MSE = 0.01216 RMSE = 0.11029
(2233,) (77,)

GRU Model

Code
# CREATE MODEL
model = Sequential()
# SAME ARCHITECTURE, WITH A GRU CELL
model.add(
    GRU(
        recurrent_hidden_units,
        return_sequences=False,
        input_shape=(trainX.shape[1], trainX.shape[2]),
        recurrent_regularizer=regularizers.L2(1e-2),
        activation="tanh",
    )
)

# MAP THE RECURRENT OUTPUT TO A SCALAR PREDICTION
model.add(Dense(units=1, activation="linear"))

# COMPILE THE MODEL
model.compile(loss="MeanSquaredError", optimizer=optimizer)
model.summary()

# TRAIN MODEL
history = model.fit(
    trainX,
    trainY,
    epochs=epochs,
    batch_size=int(f_batch * trainX.shape[0]),
    validation_split=validation_split,
    verbose=0,
)

# MAKE PREDICTIONS
train_predict = model.predict(trainX).squeeze()
test_predict = model.predict(testX).squeeze()
print(
    trainX.shape,
    train_predict.shape,
    trainY.shape,
    testX.shape,
    test_predict.shape,
    testY.shape,
)
# COMPUTE RMSE
print(trainY.shape, train_predict.shape)
train_rmse = np.sqrt(mean_squared_error(trainY, train_predict))
test_rmse = np.sqrt(mean_squared_error(testY, test_predict))
print(np.mean((trainY - train_predict) ** 2.0))
print(np.mean((testY - test_predict) ** 2.0))

print("Train MSE = %.5f RMSE = %.5f" % (train_rmse**2.0, train_rmse))
print("Test MSE = %.5f RMSE = %.5f" % (test_rmse**2.0, test_rmse))


# PLOT THE RESULT (plot_result is defined in the LSTM section above)


plot_result(trainY, testY, train_predict, test_predict)
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 gru (GRU)                   (None, 3)                 54        
                                                                 
 dense_2 (Dense)             (None, 1)                 4         
                                                                 
=================================================================
Total params: 58
Trainable params: 58
Non-trainable params: 0
_________________________________________________________________
(77, 29, 1) (77,) (77,) (19, 29, 1) (19,) (19,)
(77,) (77,)
0.018047255
0.012979274
Train MSE = 0.01805 RMSE = 0.13434
Test MSE = 0.01298 RMSE = 0.11393
(2233,) (77,)
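
The trainable parameter counts in the three summaries (60, 15, and 54 recurrent weights, plus a 4-parameter dense head) can be reproduced from the standard Keras formulas with u = 3 hidden units and d = 1 input feature (the GRU count assumes TensorFlow 2's default reset_after=True):

Code
u, d = 3, 1
print(4 * (u * (u + d) + u))      # LSTM: 4 gates             -> 60
print(1 * (u * (u + d) + u))      # SimpleRNN: 1 gate         -> 15
print(3 * (u * (u + d) + 2 * u))  # GRU: 3 gates, double bias -> 54
print(u * 1 + 1)                  # Dense head                -> 4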

Model Comparisons

Each of the three deep learning architectures performed well at predicting daily Capital Bikeshare ridership. The best performing model was the LSTM, with a test RMSE of 0.1035, followed by the simple RNN at 0.1103 and the GRU at 0.1139 (all errors are on the min-max scaled data). These low errors indicate strong predictive power for each model. Including recurrent regularization was key to avoiding overfitting to the training data and keeping the test RMSE low. With p = 30, each model predicts a day's ridership from the preceding 29 days of observations.
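
Since the three model cells above differ only in the choice of recurrent layer, the comparison could also be written as a single loop. The following is a possible refactor using the same hyperparameters, not how the results above were generated:

Code
# Sketch: fit all three cell types with identical settings
results = {}
for name, Cell in [("LSTM", LSTM), ("RNN", SimpleRNN), ("GRU", GRU)]:
    m = Sequential()
    m.add(Cell(recurrent_hidden_units,
               input_shape=(trainX.shape[1], trainX.shape[2]),
               recurrent_regularizer=regularizers.L2(1e-2),
               activation="tanh"))
    m.add(Dense(1, activation="linear"))
    m.compile(loss="MeanSquaredError", optimizer=optimizer)
    m.fit(trainX, trainY, epochs=epochs,
          batch_size=int(f_batch * trainX.shape[0]),
          validation_split=validation_split, verbose=0)
    pred = m.predict(testX, verbose=0).squeeze()
    results[name] = np.sqrt(mean_squared_error(testY, pred))
print(results)  # test RMSE per cell type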

Deep Learning vs Traditional Model Comparison

Each of the deep learning methods (RNN, LSTM, GRU) substantially outperformed the traditional time series methods such as ARIMA and SARIMA. Of the traditional methods, SARIMA was the best model, which makes sense given the seasonality of the series. Even so, SARIMA was unable to accurately account for the variance in the data, which led to wide confidence bounds and generic future predictions; its forecasts were not much better than what a person could estimate from the previous trend and seasonality. The LSTM model, on the other hand, which performed best of all the deep learning models, did a much better job of capturing small variations in the data and therefore fit the series more closely, leading to far better predictive ability. The downside of these deep learning models is that much of the interpretability of why a given prediction was made is lost. In the case of predicting bikeshare ridership, we are willing to overlook this because the deep learning models perform so much better.