{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Problem Set 4: VAR models [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/tobiasraabe/time_series/master?filepath=docs%2Fproblem_sets%2Fproblem_set_4.ipynb)\n",
    "\n",
    "## Exercise 1\n",
    "\n",
    "Consider the following VAR\n",
    "\n",
    "$$\\begin{align}\n",
    "    y_t &= (1 + \\beta) y_{t-1} - \\beta \\alpha x_{t-1} + \\epsilon_{1t}\\\\\n",
    "    x_t &= \\gamma y_{t-1} + (1 - \\gamma\\alpha) x_{t-1} + \\epsilon_{2t}\n",
    "\\end{align}$$\n",
    "\n",
    "Show that this VAR is non-stationary.\n",
    "\n",
    "To show this, we first rewrite the system of equations in matrix notation, with $A_1$ denoting the coefficient matrix.\n",
    "\n",
    "$$\\begin{align}\n",
    "    \\begin{pmatrix} y_t \\\\ x_t \\end{pmatrix} = \\begin{pmatrix}1 + \\beta & -\\alpha\\beta\\\\\\gamma & 1 - \\alpha\\gamma\\end{pmatrix} \\begin{pmatrix}y_{t-1}\\\\x_{t-1}\\end{pmatrix} + \\begin{pmatrix}\\epsilon_{1t}\\\\\\epsilon_{2t}\\end{pmatrix}\n",
    "\\end{align}$$\n",
    "\n",
    "For stationarity, all roots of the determinant have to lie outside the unit circle, that is, it has to hold that\n",
    "\n",
    "$$\n",
    "    \\det(A(z)) = \\det(I_2 - A_1 z) \\neq 0 \\quad \\text{for all } |z| \\leq 1.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Inserting the values from the previous formula yields\n",
    "\n",
    "$$\\begin{align}\n",
    "    \\det(A(z)) &= \\det(I_2 - A_1 z)\\\\\n",
    "               &= \\det\\begin{pmatrix}\n",
    "                           1 - (1 + \\beta)z & \\alpha\\beta z\\\\\n",
    "                           -\\gamma z & 1 - (1 - \\alpha\\gamma)z\n",
    "                       \\end{pmatrix}\\\\\n",
    "               &= (1 - (1 + \\beta)z)(1 - (1 - \\alpha\\gamma)z) + \\alpha\\beta\\gamma z^2\\\\\n",
    "               &= 1 - (1 - \\alpha\\gamma) z - (1 + \\beta)z + (1 + \\beta)(1 - \\alpha\\gamma)z^2 + \\alpha\\beta\\gamma z^2\\\\\n",
    "               &= 1 - z + \\alpha\\gamma z - z - \\beta z + (1 - \\alpha\\gamma + \\beta - \\alpha\\beta\\gamma)z^2 + \\alpha\\beta\\gamma z^2\\\\\n",
    "               &= 1 + (\\alpha\\gamma - 2 - \\beta)z + (1 - \\alpha\\gamma + \\beta)z^2\n",
    "\\end{align}$$\n",
    "\n",
    "Evaluating this polynomial at $z = 1$ gives\n",
    "\n",
    "$$\n",
    "    \\det(A(1)) = 1 + (\\alpha\\gamma - 2 - \\beta) + (1 - \\alpha\\gamma + \\beta) = 0.\n",
    "$$\n",
    "\n",
    "Equivalently, the polynomial factors as $\\det(A(z)) = (1 - z)(1 - (1 - \\alpha\\gamma + \\beta)z)$. Hence $z = 1$ is a root on the unit circle for any values of $\\alpha$, $\\beta$, and $\\gamma$, the stationarity condition is violated, and the VAR is non-stationary."
   ]
  },
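  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a numerical sanity check on the determinant above, the following sketch plugs in arbitrary parameter values (the numbers chosen for `α`, `β`, and `γ` are assumptions for illustration only) and confirms that $\\det(I_2 - A_1 z)$ vanishes at $z = 1$, or equivalently that the coefficient matrix has an eigenvalue equal to one.\n",
    "\n",
    "```julia\n",
    "using LinearAlgebra\n",
    "\n",
    "# Illustrative parameter values (assumed, not taken from the exercise)\n",
    "α, β, γ = 0.5, -0.3, 0.4\n",
    "\n",
    "# Coefficient matrix of the VAR(1)\n",
    "A1 = [(1 + β) (-α * β); γ (1 - α * γ)]\n",
    "\n",
    "# det(I - A1 * z) evaluated at z = 1; zero up to rounding error\n",
    "det_at_one = det(I - A1)\n",
    "\n",
    "# Eigenvalues of A1 sorted by size; the largest one equals 1\n",
    "λ = sort(real.(eigvals(A1)))\n",
    "```\n",
    "\n",
    "The trace of the coefficient matrix is $2 + \\beta - \\alpha\\gamma$, so the remaining eigenvalue is $1 + \\beta - \\alpha\\gamma$, which equals $0.5$ for the values used here."
   ]
  },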
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 2\n",
    "\n",
    "Consider a stationary vector-autoregressive process $A(L)X_t = \\epsilon_t$ and its corresponding moving average representation $X_t = C(L)\\epsilon_t$, where $C(L) = \\sum^\\infty_{i=0} C_i L^i$.\n",
    "\n",
    "1. Find the moving average coefficients for a $VAR(1)$ process.\n",
    "\n",
    "   Given a $VAR(1)$ with $X_t = \\nu + A_1 X_{t-1} + \\epsilon_t$, we can rewrite the process in the following way:\n",
    "   \n",
    "   $$\\begin{align}\n",
    "                   X_t &= \\nu + A_1 X_{t-1} + \\epsilon_t\\\\\n",
    "       (I_K - A_1 L) X_t &= \\nu + \\epsilon_t\n",
    "   \\end{align}$$\n",
    "   \n",
    "   Substituting the moving average (Wold) representation $X_t = C(L)\\epsilon_t$ into $A(L)X_t = \\epsilon_t$ shows that we need $(I_K - A_1 L)C(L) = I_K$ with $C(L) = \\sum^\\infty_{i=0} C_i L^i$. The constant $\\nu$ only shifts the mean and does not affect the coefficients $C_i$. Multiplying out yields:\n",
    "   \n",
    "   $$\\begin{align}\n",
    "       I_K &= (I_K - A_1 L) \\sum^\\infty_{i=0} C_i L^i\\\\\n",
    "           &= C_0 + C_1 L + C_2 L^2 + \\dots\\\\\n",
    "           &- A_1 C_0 L - A_1 C_1 L^2 - \\dots\n",
    "   \\end{align}$$\n",
    "   \n",
    "   Matching coefficients on the powers of the lag operator yields the following relations:\n",
    "   \n",
    "   $$\\begin{align}\n",
    "       L^0: && C_0 &= I_K\\\\\n",
    "       L^1: && C_1 &= A_1 C_0 = A_1\\\\\n",
    "       L^2: && C_2 &= A_1 C_1 = A^2_1\\\\\n",
    "       \\vdots && \\vdots\\\\\n",
    "       L^i: && C_i &= A^i_1\n",
    "   \\end{align}$$\n",
    "   \n",
    "2. Show that the moving average coefficients for a $VAR(2)$ can be found recursively by $C_0 = I$, $C_1 = A_1$, and $C_i = A_1 C_{i-1} + A_2 C_{i-2}$ for $i = 2, \\dots$.\n",
    "\n",
    "   First, let us define a $VAR(2)$ process as $X_t = \\nu + A_1 X_{t-1} + A_2 X_{t-2} + \\epsilon_t$. Then, rewrite the process with the lag operator and apply the Wold representation.\n",
    "\n",
    "   $$\\begin{align}\n",
    "                             X_t &= \\nu + A_1 X_{t-1} + A_2 X_{t-2} + \\epsilon_t\\\\\n",
    "       (I_K - A_1 L - A_2 L^2) X_t &= \\nu + \\epsilon_t\n",
    "   \\end{align}$$\n",
    "   \n",
    "   Substituting $X_t = C(L)\\epsilon_t$ as before, the Wold representation requires that $(I_K - A_1 L - A_2 L^2)C(L) = I_K$.\n",
    "   \n",
    "   $$\\begin{align}\n",
    "       I_K &= (I_K - A_1 L - A_2 L^2) \\sum^\\infty_{i=0} C_i L^i\\\\\n",
    "           &= C_0 + C_1 L + C_2 L^2 + C_3 L^3 + \\dots\\\\\n",
    "           &- A_1 C_0 L - A_1 C_1 L^2 - A_1 C_2 L^3 - \\dots\\\\\n",
    "           &- A_2 C_0 L^2 - A_2 C_1 L^3 - \\dots\n",
    "   \\end{align}$$\n",
    "   \n",
    "   Matching coefficients on the powers of the lag operator yields the following relations:\n",
    "   \n",
    "   $$\\begin{align}\n",
    "       L^0: && C_0 &= I_K\\\\\n",
    "       L^1: && C_1 &= A_1 C_0 = A_1\\\\\n",
    "       L^2: && C_2 &= A_1 C_1 + A_2 C_0\\\\\n",
    "       L^3: && C_3 &= A_1 C_2 + A_2 C_1\\\\\n",
    "       \\vdots && \\vdots\\\\\n",
    "       L^i: && C_i &= A_1 C_{i-1} + A_2 C_{i-2}\n",
    "   \\end{align}$$\n",
    "   \n",
    "   The coefficient matrices $A_j$ multiply from the left because the operator $A(L)$ is applied to $X_t = C(L)\\epsilon_t$, which gives $A(L)C(L) = I_K$. Since $C(L)$ is the inverse of $A(L)$, the identity $C(L)A(L) = I_K$ holds as well, so the mirrored recursion $C_i = C_{i-1} A_1 + C_{i-2} A_2$ generates the same sequence of coefficients."
   ]
  },
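  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The recursion stated in part 2 can be sketched in Julia as follows. The function name `ma_coefficients` and the coefficient values are assumptions made for illustration. The last line checks numerically that the mirrored ordering, with the $A_j$ multiplying from the right, produces the same sequence.\n",
    "\n",
    "```julia\n",
    "using LinearAlgebra\n",
    "\n",
    "# Returns [C_0, C_1, ..., C_n] for a VAR(2); entry i + 1 holds C_i.\n",
    "function ma_coefficients(A1, A2, n)\n",
    "    K = size(A1, 1)\n",
    "    Cs = [Matrix{Float64}(I, K, K), Matrix{Float64}(A1)]\n",
    "    for i in 2:n\n",
    "        # C_i = A1 * C_{i-1} + A2 * C_{i-2}\n",
    "        push!(Cs, A1 * Cs[i] + A2 * Cs[i-1])\n",
    "    end\n",
    "    return Cs[1:n+1]\n",
    "end\n",
    "\n",
    "# Illustrative coefficient matrices (assumed values)\n",
    "A1 = [0.5 0.1; 0.2 0.3]\n",
    "A2 = [0.1 0.0; 0.05 0.1]\n",
    "\n",
    "Cs = ma_coefficients(A1, A2, 5)\n",
    "\n",
    "# Check the mirrored ordering C_i = C_{i-1} * A1 + C_{i-2} * A2\n",
    "mirrored_ok = all(Cs[i+1] ≈ Cs[i] * A1 + Cs[i-1] * A2 for i in 2:5)\n",
    "```\n",
    "\n",
    "Setting `A2` to a zero matrix reduces the recursion to the $VAR(1)$ case, where entry `i + 1` equals `A1^i`."
   ]
  },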
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 3\n",
    "\n",
    "**Data Collection**\n",
    "\n",
    "Get the data from the data folder or go to https://fred.stlouisfed.org and download the following quarterly data series (code in parentheses) for the years 1947Q1-2015Q4:\n",
    "- Real GDP (GDPC96)\n",
    "- GDP Deflator (GDPDEF)\n",
    "- Government consumption (GCE)\n",
    "- Population (B230RC0Q173SBEA)\n",
    "- Personal Consumption: Nondurable Goods (PCND)\n",
    "- Personal Consumption: Services (PCESV)\n",
    "\n",
    "Construct the three time series: real per capita GDP $Y$, real per capita government consumption\n",
    "$G$, and real per capita private consumption $C$ (consisting of nondurables and services consumption).\n"
   ]
  },
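  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to write the construction, mirroring the code in the following cells, is given below. The series codes stand in for the corresponding data columns, $\\text{Pop}_t$ is shorthand for the population series, and scaling constants such as the base year of the deflator are ignored because they only shift the logged series by a constant.\n",
    "\n",
    "$$\\begin{align}\n",
    "    Y_t &= \\frac{\\text{GDPC96}_t}{\\text{Pop}_t}\\\\\n",
    "    G_t &= \\frac{\\text{GCE}_t}{\\text{Pop}_t \\cdot \\text{GDPDEF}_t}\\\\\n",
    "    C_t &= \\frac{\\text{PCND}_t + \\text{PCESV}_t}{\\text{Pop}_t \\cdot \\text{GDPDEF}_t}\n",
    "\\end{align}$$\n",
    "\n",
    "Real GDP is already deflated, so only the nominal series are divided by the GDP deflator."
   ]
  },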
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "using LinearAlgebra, XLSX"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "xf = XLSX.readxlsx(\"problem_set_4_data/us_data.xlsx\")\n",
    "\n",
    "gdp_real = xf[\"Tabelle1!B8:B283\"]\n",
    "gdp_deflator = xf[\"Tabelle1!D8:D283\"]\n",
    "nom_gov_cons_inv = xf[\"Tabelle1!F8:F283\"]\n",
    "nom_priv_cons_ndg = xf[\"Tabelle1!J8:J283\"]\n",
    "nom_priv_cons_services = xf[\"Tabelle1!L8:L283\"]\n",
    "nipa_pop = xf[\"Tabelle1!N8:N283\"]\n",
    "\n",
    "close(xf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "Y = gdp_real ./ nipa_pop\n",
    "G = nom_gov_cons_inv ./ nipa_pop ./ gdp_deflator\n",
    "C = (nom_priv_cons_ndg + nom_priv_cons_services) ./ nipa_pop ./ gdp_deflator\n",
    "\n",
    "timeline = (1947:0.25:2015.75);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**VAR Estimation**\n",
    "\n",
    "Use the following function to estimate a $VAR(4)$ on the vector of observables $x_t = [\\log(G_t), \\log(Y_t), \\log(C_t)]$ via OLS. Also include a constant and a linear time trend."
   ]
  },
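  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "function olsvar(y, p, trend_dummy)\n",
    "    # y: t x K matrix of observations, p: lag order,\n",
    "    # trend_dummy: 1 to include a linear time trend, 0 for a constant only\n",
    "\n",
    "    t, K = size(y)\n",
    "    y = y'\n",
    "\n",
    "    # stack the lags x_{t-1}, ..., x_{t-p} on top of each other\n",
    "    Z = y[:, p:t-1]\n",
    "    for i = 1:p-1\n",
    "        Z = [Z; y[:, p-i:t-i-1]]\n",
    "    end\n",
    "\n",
    "    # prepend the deterministic terms (constant and, optionally, trend)\n",
    "    if trend_dummy == 1\n",
    "        Z = [ones(1, t-p); (1:t-p)'; Z]\n",
    "    else\n",
    "        Z = [ones(1, t-p); Z]\n",
    "    end\n",
    "\n",
    "    Y = y[:, p+1:end]\n",
    "    # Lecture 8, Equations 2 and following\n",
    "    A = (Y * Z') / (Z * Z')\n",
    "    U = Y - A * Z\n",
    "    # degrees-of-freedom correction: each equation has p*K lag coefficients\n",
    "    # plus 1 + trend_dummy deterministic terms\n",
    "    dof = t - p - p*K - 1 - trend_dummy\n",
    "    Σ = U * U' / dof\n",
    "    V = A[:, 1:1+trend_dummy]\n",
    "    A = A[:, 2+trend_dummy:K*p+1+trend_dummy]\n",
    "\n",
    "    # the information criteria use the ML covariance estimate and count\n",
    "    # only the lag coefficients\n",
    "    np = length(A)\n",
    "    Σ_ML = U * U' / (t - p)\n",
    "    AIC = logdet(Σ_ML) + 2 * np / (t - p)\n",
    "    BIC = logdet(Σ_ML) + log(t - p) * np / (t - p)\n",
    "    HQ  = logdet(Σ_ML) + 2 * log(log(t - p)) * np / (t - p)\n",
    "\n",
    "    return A, Σ, V, U, Y, Z, AIC, BIC, HQ\n",
    "end;"
   ]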
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function for the OLS estimation of the $VAR(4)$ takes three inputs: `y` is the matrix containing the log series of the three entities (government spending, GDP, and private consumption), `p` is the number of lags (here 4), and `trend_dummy` indicates whether the model should contain a linear time trend."
   ]
  },
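  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In compact notation, the estimator computed inside `olsvar` can be written as below. The symbols $B$, $T$, and $Z_s$ are shorthand introduced here and may differ from the notation in the lecture: $T = t - p$ is the number of usable observations, $B = [V, A_1, \\dots, A_p]$ collects all coefficients, and column $s$ of $Z$ holds the regressors for observation $x_{p+s}$.\n",
    "\n",
    "$$\\begin{align}\n",
    "    Y &= B Z + U, \\qquad Z_s = (1, s, x_{p+s-1}', \\dots, x_{s}')', \\quad s = 1, \\dots, T\\\\\n",
    "    \\hat{B} &= Y Z' (Z Z')^{-1}\\\\\n",
    "    \\hat{U} &= Y - \\hat{B} Z\n",
    "\\end{align}$$\n",
    "\n",
    "Without the trend, the entry $s$ is dropped from $Z_s$. The first $1 + $ `trend_dummy` columns of $\\hat{B}$ are returned as `V` and the remaining $Kp$ columns as `A`."
   ]
  },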
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = [log.(G) log.(Y) log.(C)]\n",
    "lags = 4\n",
    "trend = 1;"
   ]
  },
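  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Y_reg and Z_reg hold the regressand and regressor matrices; the distinct\n",
    "# names avoid overwriting the per capita GDP series Y defined above\n",
    "A, Σ, V, U, Y_reg, Z_reg, AIC, BIC, HQ = olsvar(x, lags, trend);"
   ]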
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "lag_maximum = 8\n",
    "\n",
    "aic_array = Vector{Float64}(undef, lag_maximum)\n",
    "bic_array = Vector{Float64}(undef, lag_maximum)\n",
    "hq_array = Vector{Float64}(undef, lag_maximum)\n",
    "\n",
    "for i = 1 : lag_maximum\n",
    "    _, _, _, _, _, _, aic_array[i], bic_array[i], hq_array[i] = olsvar(x[lag_maximum + 1 - i:end, :], i, 1)\n",
    "end"
   ]
  },
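  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three criteria returned by `olsvar` and collected in the loop above can be written as follows. The notation is introduced here for illustration: $T = t - p$ is the number of usable observations, $\\tilde{\\Sigma}(p) = \\hat{U}\\hat{U}'/T$ is the covariance estimate without degrees-of-freedom correction, and $n_p = K^2 p$ counts only the lag coefficients.\n",
    "\n",
    "$$\\begin{align}\n",
    "    AIC(p) &= \\log\\det\\tilde{\\Sigma}(p) + \\frac{2 n_p}{T}\\\\\n",
    "    BIC(p) &= \\log\\det\\tilde{\\Sigma}(p) + \\frac{\\log(T)\\, n_p}{T}\\\\\n",
    "    HQ(p)  &= \\log\\det\\tilde{\\Sigma}(p) + \\frac{2 \\log(\\log(T))\\, n_p}{T}\n",
    "\\end{align}$$\n",
    "\n",
    "Because the loop drops the first `lag_maximum - i` rows of `x` before estimating the model with `i` lags, every lag order is fitted on the same $T$ observations, which keeps the criteria comparable across models."
   ]
  },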
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2, 2, 2)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "argmin(aic_array), argmin(bic_array), argmin(hq_array)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.0.0",
   "language": "julia",
   "name": "julia-1.0"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.0.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}