Financial Time Series Models (ARCH/GARCH)

LYFT Data

Capital Bikeshare is operated by Lyft, a popular ridesharing company. While its main business is ridesharing rather than bikesharing, Lyft provides bikesharing and electric scooter services for many large cities in the US. Without its investment in bikesharing, it would be difficult for these cities to offer the service to their residents. I therefore found it interesting to model the volatility clustering in Lyft's stock returns (ticker: LYFT) to get a better understanding of its financial behavior.

Code
# Set the start and end dates for the time period you want to analyze
start_date <- "2018-01-01"
end_date <- "2023-04-04"

# Download the Lyft stock data
getSymbols("LYFT", src = "yahoo", from = start_date, to = end_date)
[1] "LYFT"
Code
# Calculate the daily returns
lyft_returns <- dailyReturn(LYFT)

# Print the first few rows of the daily returns data
head(lyft_returns)
           daily.returns
2019-03-29 -0.1035154104
2019-04-01 -0.1185336426
2019-04-02 -0.0005796406
2019-04-03  0.0149340146
2019-04-04  0.0285714286
2019-04-05  0.0340277361

Plot Daily Returns

The daily returns plot shows clear volatility clustering: there are multiple periods in the time frame where high volatility is followed by lower volatility. This makes the series a good candidate for ARCH/GARCH modelling.

Code
returns = lyft_returns$daily.returns
chart_Series(returns)
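
As a quick visual check of this clustering, the sketch below (not part of the original analysis; it assumes the zoo package is available) plots a 21-day rolling standard deviation of the returns, which should rise and fall with the volatility clusters described above.

Code
# Rolling 21-day standard deviation of the daily returns (a hedged sketch using zoo)
library(zoo)
roll_vol <- rollapply(returns, width = 21, FUN = sd, fill = NA, align = "right")
plot(roll_vol, main = "21-Day Rolling Volatility of LYFT Returns")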

Modelling

ACF/PACF

The ACF and PACF of the returns data both show significant correlation at the first few lags, indicating that an ARMA model is likely suitable for this data. The series doesn't appear to need differencing, but we will also check whether ARIMA models with differencing perform better; a unit-root test is sketched after the PACF plot as an additional check.

Code
acf(returns, main = "ACF of Returns")

Code
pacf(returns, main = "PACF of Returns")
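
To back up the claim that no differencing is needed, a unit-root test can be run on the returns. This is a sketch that was not part of the original analysis; it uses adf.test() from the tseries package (the same package that provides the garch() function used later on).

Code
# Augmented Dickey-Fuller test on the returns (sketch); a small p-value supports stationarity
library(tseries)
adf.test(as.numeric(returns))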

Optimal ARIMA Model

By comparing different values of p, d, and q, the optimal model based on AIC was found to be an ARMA(3,3), i.e. an ARIMA(3,0,3).

Code
optimal_arima <- function(data, max.p, max.d, max.q) {
  # data: a vector or time series object of your data
  # max.p: the maximum order of the autoregressive (AR) term
  # max.d: the maximum order of differencing
  # max.q: the maximum order of the moving average (MA) term

  # Create a dataframe to store the results
  results <- data.frame(p = integer(), d = integer(), q = integer(), AIC = double(), BIC = double())

  # Calculate the AIC and BIC for each ARIMA(p,d,q) model with p <= max.p, d <= max.d, and q <= max.q
  for (i in 0:max.p) {
    for (j in 0:max.d) {
      for (k in 0:max.q) {
        if (i == 0 && j == 0 && k == 0) {
          next
        }
        if (i + j + k > 8) {
          next
        }
        model <- Arima(data, order = c(i, j, k))
        aic <- AIC(model)
        bic <- BIC(model)
        results <- rbind(results, data.frame(p = i, d = j, q = k, AIC = aic, BIC = bic))
      }
    }
  }

  # Return the results dataframe
  return(results)
}

results = optimal_arima(returns, 5,3,5)

results[which.min(results$AIC),]
   p d q       AIC       BIC
71 3 0 3 -3375.366 -3336.016
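
As an optional sanity check on the manual grid search (a sketch, using the forecast package that the Arima() call above already relies on), auto.arima() can search the same (p, d, q) space automatically; its pick may differ slightly because it ranks candidates by AICc rather than AIC.

Code
# Cross-check of the grid search with auto.arima (sketch; exhaustive search, ranked by AICc)
library(forecast)
auto.arima(returns, max.p = 5, max.d = 3, max.q = 5,
           stepwise = FALSE, approximation = FALSE)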

Optimal Model Diagnostics

The diagnostics for this model are not great: the residuals still show signs of significant autocorrelation, and most of the coefficients are not individually significant.

Code
fit = Arima(returns, order = c(3,0,3))
summary(fit)
Series: returns 
ARIMA(3,0,3) with non-zero mean 

Coefficients:
         ar1      ar2      ar3      ma1     ma2     ma3     mean
      0.8664  -0.0034  -0.6061  -0.8523  0.0454  0.5924  -0.0012
s.e.  0.4229   0.6271   0.3986   0.4353  0.6351  0.4148   0.0015

sigma^2 = 0.002058:  log likelihood = 1695.68
AIC=-3375.37   AICc=-3375.22   BIC=-3336.02

Training set error measures:
                       ME       RMSE        MAE MPE MAPE      MASE       ACF1
Training set 2.197294e-05 0.04521095 0.03146807 Inf  Inf 0.6945023 0.01280739
Code
sarima(returns, 3,0,3)
initial  value -3.089163 
iter   2 value -3.089626
iter   3 value -3.093328
iter   4 value -3.093348
iter   5 value -3.093377
iter   6 value -3.093539
iter   7 value -3.093697
iter   8 value -3.093833
iter   9 value -3.093892
iter  10 value -3.093911
iter  11 value -3.093936
iter  12 value -3.093984
iter  13 value -3.094011
iter  14 value -3.094024
iter  15 value -3.094034
iter  16 value -3.094060
iter  17 value -3.094071
iter  18 value -3.094083
iter  19 value -3.094096
iter  20 value -3.094131
iter  21 value -3.094173
iter  22 value -3.094216
iter  23 value -3.094245
iter  24 value -3.094264
iter  25 value -3.094306
iter  26 value -3.094367
iter  27 value -3.094428
iter  28 value -3.094455
iter  29 value -3.094462
iter  30 value -3.094472
iter  31 value -3.094524
iter  32 value -3.094544
iter  33 value -3.094549
iter  34 value -3.094559
iter  35 value -3.094567
iter  36 value -3.094651
iter  37 value -3.094754
iter  38 value -3.094874
iter  39 value -3.094927
iter  40 value -3.094960
iter  41 value -3.095024
iter  42 value -3.095085
iter  43 value -3.095128
iter  44 value -3.095143
iter  45 value -3.095150
iter  46 value -3.095163
iter  47 value -3.095169
iter  48 value -3.095171
iter  49 value -3.095175
iter  50 value -3.095195
iter  51 value -3.095209
iter  52 value -3.095219
iter  53 value -3.095223
iter  54 value -3.095225
iter  55 value -3.095229
iter  56 value -3.095233
iter  57 value -3.095236
iter  58 value -3.095237
iter  59 value -3.095239
iter  60 value -3.095243
iter  61 value -3.095249
iter  62 value -3.095252
iter  63 value -3.095253
iter  64 value -3.095254
iter  65 value -3.095254
iter  66 value -3.095254
iter  67 value -3.095255
iter  68 value -3.095255
iter  69 value -3.095255
iter  70 value -3.095255
iter  71 value -3.095256
iter  72 value -3.095257
iter  73 value -3.095258
iter  74 value -3.095258
iter  75 value -3.095258
iter  76 value -3.095258
iter  77 value -3.095258
iter  77 value -3.095258
iter  77 value -3.095258
final  value -3.095258 
converged
initial  value -3.090645 
iter   2 value -3.090694
iter   3 value -3.090714
iter   4 value -3.090751
iter   5 value -3.090787
iter   6 value -3.090923
iter   7 value -3.091012
iter   8 value -3.091065
iter   9 value -3.091069
iter  10 value -3.091074
iter  11 value -3.091087
iter  12 value -3.091096
iter  13 value -3.091103
iter  14 value -3.091110
iter  15 value -3.091131
iter  16 value -3.091335
iter  17 value -3.091362
iter  18 value -3.091379
iter  19 value -3.091491
iter  20 value -3.091561
iter  21 value -3.091583
iter  22 value -3.091600
iter  23 value -3.091687
iter  24 value -3.091796
iter  25 value -3.092028
iter  26 value -3.092302
iter  27 value -3.093118
iter  28 value -3.093396
iter  29 value -3.095274
iter  30 value -3.095467
iter  31 value -3.095812
iter  32 value -3.095871
iter  33 value -3.095891
iter  34 value -3.096063
iter  35 value -3.096087
iter  36 value -3.096112
iter  37 value -3.096157
iter  38 value -3.096163
iter  39 value -3.096164
iter  40 value -3.096164
iter  41 value -3.096166
iter  42 value -3.096169
iter  43 value -3.096171
iter  44 value -3.096171
iter  45 value -3.096172
iter  46 value -3.096172
iter  46 value -3.096172
final  value -3.096172 
converged

$fit

Call:
arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S), 
    xreg = xmean, include.mean = FALSE, transform.pars = trans, fixed = fixed, 
    optim.control = list(trace = trc, REPORT = 1, reltol = tol))

Coefficients:
         ar1      ar2      ar3      ma1     ma2     ma3    xmean
      0.8664  -0.0034  -0.6061  -0.8523  0.0454  0.5924  -0.0012
s.e.  0.4229   0.6271   0.3986   0.4353  0.6351  0.4148   0.0015

sigma^2 estimated as 0.002044:  log likelihood = 1695.68,  aic = -3375.37

$degrees_of_freedom
[1] 1004

$ttable
      Estimate     SE t.value p.value
ar1     0.8664 0.4229  2.0487  0.0408
ar2    -0.0034 0.6271 -0.0054  0.9957
ar3    -0.6061 0.3986 -1.5207  0.1287
ma1    -0.8523 0.4353 -1.9583  0.0505
ma2     0.0454 0.6351  0.0716  0.9430
ma3     0.5924 0.4148  1.4282  0.1535
xmean  -0.0012 0.0015 -0.8030  0.4222

$AIC
[1] -3.338641

$AICc
[1] -3.33853

$BIC
[1] -3.299719
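
A quick numerical companion to the diagnostic plots above (a sketch, not in the original code) is a Ljung-Box test on the ARIMA(3,0,3) residuals; fitdf accounts for the six estimated ARMA coefficients.

Code
# Ljung-Box test on the ARIMA(3,0,3) residuals (sketch); lag must exceed fitdf
Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 6)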

Determining Need for Additional Model

The standardized residuals plot shows that there is still significant volatility not yet captured by the ARMA model. The ACF and PACF plots of the squared residuals both have significant correlation at the first few lags, indicating that a GARCH model would be a good fit for the data. In addition, the ARCH LM test has a p-value less than 0.05, so we can conclude that there are ARCH effects in the residuals.

Code
resids = fit$residuals
plot(resids)

Code
acf(resids^2, main = "ACF of Squared Residuals")

Code
pacf(resids^2, main= "PACF of Squared Residuals")

Code
ArchTest(resids)

    ARCH LM-test; Null hypothesis: no ARCH effects

data:  resids
Chi-squared = 46.815, df = 12, p-value = 5.019e-06

Optimal Garch Model

The GARCH(3,1) model has the lowest AIC; however, after examining the significance of its coefficients, I determined that a GARCH(1,1) is the best model for this series, since all of its variance-equation parameters are statistically significant. The Ljung-Box test p-values are all above 0.05, indicating no significant autocorrelation in the standardized residuals (or their squares) at the lags tested, which is a sign of a well-fitting model.

Code
model <- list()  ## container for candidate models
cc <- 1          ## counter
for (p in 1:7) {
  for (q in 1:7) {
    model[[cc]] <- garch(resids, order = c(q, p), trace = F)
    cc <- cc + 1
  }
}

## get AIC values for model evaluation
GARCH_AIC <- sapply(model, AIC) ## model with lowest AIC is the best
which(GARCH_AIC == min(GARCH_AIC))
[1] 15
Code
model[[which(GARCH_AIC == min(GARCH_AIC))]]

Call:
garch(x = resids, order = c(q, p), trace = F)

Coefficient(s):
       a0         a1         a2         a3         b1  
3.936e-04  6.417e-15  1.924e-01  1.049e-01  5.322e-01  
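
The winning list index can be mapped back to its (p, q) pair from the loop structure above, where the inner q index moves fastest; the small sketch below confirms that index 15 corresponds to p = 3 and q = 1, i.e. three ARCH coefficients (a1-a3) and one GARCH coefficient (b1), matching the printout above.

Code
# Map the winning list index back to the loop's (p, q) pair (sketch)
best <- which.min(GARCH_AIC)
p_best <- (best - 1) %/% 7 + 1  # outer-loop index
q_best <- (best - 1) %% 7 + 1   # inner-loop index
c(index = best, p = p_best, q = q_best)  # 15 -> p = 3, q = 1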
Code
summary(garchFit(~garch(3,1), resids,trace = F)) 

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(3, 1), data = resids, trace = F) 

Mean and Variance Equation:
 data ~ garch(3, 1)
<environment: 0x7fd13088f778>
 [data = resids]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1      alpha2      alpha3       beta1  
0.00021973  0.00049046  0.00000001  0.25938593  0.14251622  0.41223271  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     2.197e-04   1.200e-03    0.183 0.854733    
omega  4.905e-04   2.589e-04    1.895 0.058146 .  
alpha1 1.000e-08   2.743e-02    0.000 1.000000    
alpha2 2.594e-01   6.904e-02    3.757 0.000172 ***
alpha3 1.425e-01   1.260e-01    1.131 0.257950    
beta1  4.122e-01   2.122e-01    1.943 0.052071 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 1770.165    normalized:  1.750905 

Description:
 Fri Apr 28 09:28:59 2023 by user:  


Standardised Residuals Tests:
                                Statistic p-Value  
 Jarque-Bera Test   R    Chi^2  1928.613  0        
 Shapiro-Wilk Test  R    W      0.9469361 0        
 Ljung-Box Test     R    Q(10)  10.86777  0.3679126
 Ljung-Box Test     R    Q(15)  14.21316  0.5094282
 Ljung-Box Test     R    Q(20)  21.82276  0.350205 
 Ljung-Box Test     R^2  Q(10)  1.403864  0.9992048
 Ljung-Box Test     R^2  Q(15)  7.212787  0.9514737
 Ljung-Box Test     R^2  Q(20)  9.580263  0.9751603
 LM Arch Test       R    TR^2   1.740023  0.9997125

Information Criterion Statistics:
      AIC       BIC       SIC      HQIC 
-3.489940 -3.460749 -3.490010 -3.478851 
Code
summary(garchFit(~garch(2,1), resids,trace = F)) 

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(2, 1), data = resids, trace = F) 

Mean and Variance Equation:
 data ~ garch(2, 1)
<environment: 0x7fd133057190>
 [data = resids]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1      alpha2       beta1  
0.00021973  0.00026140  0.00000001  0.28164988  0.62978682  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     2.197e-04   1.189e-03    0.185  0.85333    
omega  2.614e-04   8.447e-05    3.095  0.00197 ** 
alpha1 1.000e-08   2.450e-02    0.000  1.00000    
alpha2 2.816e-01   6.851e-02    4.111 3.94e-05 ***
beta1  6.298e-01   8.059e-02    7.814 5.55e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 1768.96    normalized:  1.749714 

Description:
 Fri Apr 28 09:29:00 2023 by user:  


Standardised Residuals Tests:
                                Statistic p-Value  
 Jarque-Bera Test   R    Chi^2  1961.216  0        
 Shapiro-Wilk Test  R    W      0.945671  0        
 Ljung-Box Test     R    Q(10)  9.65394   0.4713606
 Ljung-Box Test     R    Q(15)  13.02855  0.6000929
 Ljung-Box Test     R    Q(20)  21.0047   0.3968558
 Ljung-Box Test     R^2  Q(10)  1.57615   0.998678 
 Ljung-Box Test     R^2  Q(15)  6.131716  0.9774493
 Ljung-Box Test     R^2  Q(20)  8.108509  0.9911252
 LM Arch Test       R    TR^2   2.044822  0.999334 

Information Criterion Statistics:
      AIC       BIC       SIC      HQIC 
-3.489536 -3.465210 -3.489585 -3.480295 
Code
summary(garchFit(~garch(1,1), resids,trace = F)) 

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 1), data = resids, trace = F) 

Mean and Variance Equation:
 data ~ garch(1, 1)
<environment: 0x7fd12dc1b030>
 [data = resids]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1       beta1  
0.00021973  0.00010901  0.14387707  0.81895886  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     2.197e-04   1.226e-03    0.179 0.857770    
omega  1.090e-04   4.151e-05    2.626 0.008639 ** 
alpha1 1.439e-01   3.955e-02    3.638 0.000275 ***
beta1  8.190e-01   4.551e-02   17.996  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 1758.877    normalized:  1.73974 

Description:
 Fri Apr 28 09:29:00 2023 by user:  


Standardised Residuals Tests:
                                Statistic p-Value  
 Jarque-Bera Test   R    Chi^2  3668.633  0        
 Shapiro-Wilk Test  R    W      0.9321536 0        
 Ljung-Box Test     R    Q(10)  9.353274  0.4989366
 Ljung-Box Test     R    Q(15)  13.49579  0.5640607
 Ljung-Box Test     R    Q(20)  22.67548  0.3050089
 Ljung-Box Test     R^2  Q(10)  1.702324  0.9981545
 Ljung-Box Test     R^2  Q(15)  3.851164  0.9981852
 Ljung-Box Test     R^2  Q(20)  5.610623  0.9993304
 LM Arch Test       R    TR^2   2.271835  0.9988612

Information Criterion Statistics:
      AIC       BIC       SIC      HQIC 
-3.471566 -3.452105 -3.471597 -3.464174 
Code
## Optimal Model

fit = Arima(returns, order = c(3,0,3))
summary(fit)
Series: returns 
ARIMA(3,0,3) with non-zero mean 

Coefficients:
         ar1      ar2      ar3      ma1     ma2     ma3     mean
      0.8664  -0.0034  -0.6061  -0.8523  0.0454  0.5924  -0.0012
s.e.  0.4229   0.6271   0.3986   0.4353  0.6351  0.4148   0.0015

sigma^2 = 0.002058:  log likelihood = 1695.68
AIC=-3375.37   AICc=-3375.22   BIC=-3336.02

Training set error measures:
                       ME       RMSE        MAE MPE MAPE      MASE       ACF1
Training set 2.197294e-05 0.04521095 0.03146807 Inf  Inf 0.6945023 0.01280739
Code
resids = fit$residuals
fit = garchFit(~garch(1,1), resids,trace = F)
summary(fit)

Title:
 GARCH Modelling 

Call:
 garchFit(formula = ~garch(1, 1), data = resids, trace = F) 

Mean and Variance Equation:
 data ~ garch(1, 1)
<environment: 0x7fd131cb9c08>
 [data = resids]

Conditional Distribution:
 norm 

Coefficient(s):
        mu       omega      alpha1       beta1  
0.00021973  0.00010901  0.14387707  0.81895886  

Std. Errors:
 based on Hessian 

Error Analysis:
        Estimate  Std. Error  t value Pr(>|t|)    
mu     2.197e-04   1.226e-03    0.179 0.857770    
omega  1.090e-04   4.151e-05    2.626 0.008639 ** 
alpha1 1.439e-01   3.955e-02    3.638 0.000275 ***
beta1  8.190e-01   4.551e-02   17.996  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Log Likelihood:
 1758.877    normalized:  1.73974 

Description:
 Fri Apr 28 09:29:00 2023 by user:  


Standardised Residuals Tests:
                                Statistic p-Value  
 Jarque-Bera Test   R    Chi^2  3668.633  0        
 Shapiro-Wilk Test  R    W      0.9321536 0        
 Ljung-Box Test     R    Q(10)  9.353274  0.4989366
 Ljung-Box Test     R    Q(15)  13.49579  0.5640607
 Ljung-Box Test     R    Q(20)  22.67548  0.3050089
 Ljung-Box Test     R^2  Q(10)  1.702324  0.9981545
 Ljung-Box Test     R^2  Q(15)  3.851164  0.9981852
 Ljung-Box Test     R^2  Q(20)  5.610623  0.9993304
 LM Arch Test       R    TR^2   2.271835  0.9988612

Information Criterion Statistics:
      AIC       BIC       SIC      HQIC 
-3.471566 -3.452105 -3.471597 -3.464174 
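
For a visual check of the final fit (a sketch, assuming the fGarch fit object from above), the estimated conditional standard deviation can be plotted; its peaks should line up with the high-volatility episodes seen in the daily returns plot.

Code
# Estimated conditional volatility from the fitted GARCH(1,1) (sketch)
sigma_t <- volatility(fit, type = "sigma")
plot(sigma_t, type = "l", xlab = "Observation", ylab = "sigma_t",
     main = "Estimated Conditional Volatility of LYFT Returns")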

Model Equation

\(y_t = \mu + 0.866(y_{t-1}-\mu) - 0.0034(y_{t-2}-\mu) - 0.606(y_{t-3}-\mu) - 0.852w_{t-1} + 0.045w_{t-2} + 0.592w_{t-3} + w_t, \quad \mu = -0.0012\)

\(w_t = \sigma_t\epsilon_t, \quad \epsilon_t \sim N(0,1)\)

\(\sigma_t^2 = 0.000109 + 0.144\,w_{t-1}^2 + 0.819\,\sigma_{t-1}^2\)
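
As a final illustration (a hedged sketch, not part of the original write-up), the fitted GARCH(1,1) object can also produce short-horizon volatility forecasts through fGarch's predict method; the standardDeviation column is the forecast of \(\sigma_t\).

Code
# Ten-day-ahead forecast of the conditional standard deviation (sketch)
predict(fit, n.ahead = 10)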

Model Conclusions

The GARCH(1,1) model was the best fit for this data. It performs well on the Ljung-Box tests, with no significant autocorrelation remaining in the standardized residuals at any of the lags tested. The estimates still carry considerable uncertainty, indicating the need for more data. LYFT has only been publicly traded since March 2019, so the sample is relatively small. Once LYFT has traded for longer, this analysis will be better able to forecast the volatility of its returns.