• Data Science Tasks
  • Intro
  • Statistical Methods
    • Fitting Distributions
      • Continuous Distributions
      • Discrete Distributions
    • Hypothesis Testing
    • Parameter Estimation
    • Sample Size and Power
      • Proportions
      • T-test
      • Chi-square
      • ANOVA
    • Survival Analysis
    • Experimental Design
      • Completely Random Design
      • Random Complete Block Design
    • Contingency Tables
    • Principal Components
    • Eigen Values and Statistical Distance
  • Regression Methods
    • Matrix Regression
    • Logistic Regression
    • Multinomial Logistic Regression
    • Poisson Regression
  • Clustering Methods
    • Kmeans Clustering
    • Hierarchical Clustering
  • Forecasting
    • Basic Graphical Methods for Time Series
    • Generate Time Series Data
      • 0.0.1 Manually Generated Series
      • Auto Generated Series
    • ARIMA Modeling
    • VAR Modeling
  • Machine Learning
    • Neural Network
    • Random Forest
    • Gradient Boosting
    • Support Vector Machines
    • Overfitting
  • Simulation
    • Bootstrap Simulation
  • Visualization
    • Building Maps with choropleth and ggplot

Data Science Tasks

ARIMA Modeling

library(forecast)

# Generated Time Series
set.seed(1)
x  = 100 + arima.sim(n = 100,
  model = list(order = c(1, 1, 1), ar = .1, ma = .2))

# Plot the series
plot(x, main = "Simulated ARIMA(1,1,1)")

# Check if there is auto correlation, there is
Acf(x, main = "ACF"); Pacf(x, main = "PACF")

# How many differences does it take for the data to be stationary?
# 1
plot(diff(x), main = "First Order Difference")

# Check the Acf/Pacf with differences
Acf(diff(x), main = "ACF 1 diff")  # One significant spike, suggests MA(1)

Pacf(diff(x), main = "PACF 1 diff") # One significant spike, indicates AR(1)

# Suggested model ARIMA(1,1,1)
mdl1 = arima(x, c(1,1,1))

# How do the ARMA terms compare with the terms used in the sim?
# AR1: .18 vs .1 in the sim
# MA1: .14 vs .2 in the sim
summary(mdl1)

Call:
arima(x = x, order = c(1, 1, 1))

Coefficients:
         ar1     ma1
      0.1838  0.1449
s.e.  0.2544  0.2499

sigma^2 estimated as 0.7854:  log likelihood = -129.87,  aic = 265.74

Training set error measures:
                     ME      RMSE       MAE        MPE      MAPE     MASE
Training set 0.07980263 0.8818683 0.6973907 0.07372578 0.6471377 0.936633
                     ACF1
Training set -0.005918057
# What does the forecast package auto.arima function suggest?
(mdl2 = auto.arima(x)) # No AR term
Series: x 
ARIMA(0,1,1) 

Coefficients:
         ma1
      0.3043
s.e.  0.0882

sigma^2 estimated as 0.7972:  log likelihood=-130.1
AIC=264.2   AICc=264.32   BIC=269.41
# How does the manual model compare with the auto.arima model?
AIC(mdl1, mdl2)
     df      AIC
mdl1  3 265.7356
mdl2  2 264.1965