## 2012-08-27

### Sweave Rnw For Copula on EGX

I am quite impressed by SyntaxHighlighter. Kudos Alex Gorbatchev!

This was the sweave file from the previous post. Enjoy!

% Sweatex("Cop")
% Sweave("Cop.Rnw")

\documentclass[a4paper,11pt]{article}
\usepackage{epsfig,fancyhdr}
\usepackage{amsmath,amssymb}
\usepackage[british]{babel}
\usepackage{natbib}

\oddsidemargin  0in
\evensidemargin  0in
\textwidth   6.3in
\textheight  9.5in
\topmargin  -0.7in
\renewcommand{\baselinestretch}{1.2}

%defs

\begin{document}
\thispagestyle{empty}
\title{Example of Copula Based Dependence Analysis on the \\ Egyptian Stock Market}
\author{
Mohamed~A.~Elgeneidy\footnote{Cario, Egypt. {\tt elgeneidy@gmail.com}}
}
\date{\today}
%Still need to write summary
\maketitle
{
\bigskip\centerline{\bf Summary}

\noindent This report describes a copula based dependence analysis between a market index and a stock from the index family. We discuss the choice of an appropriate copula function aimed at adequately capturing the dependence between the return time series of EGX30 index and OCIC over the last decade. By comparing with the Gaussian, Student's \textit{t}, Gumbel, Clayton and Frank copula, we concluded that the Gaussian copula function can provide a better fit to the empirical data.
}

\newpage

% set options for LaTeX and Sweave
<>=
options(width=70)
@
\setkeys{Gin}{width=\textwidth}
%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
rm(list = ls(1))
setwd('C:/Books/EBooks/2012/Cop/CopSweatex')

library(copula)#copula fxns
library(fGarch)#standardized t density
library(MASS)#fitdistr and kde2d
library(fCopulae)#pempiricalCopula and ellipticalCopulaFit

#Data
source("C:/R/AMG/DFSPROTO.R")#SymbolHistory Table
source("C:/R/AMG/CASEIDXHISTCHOP.R")#CaseIndexHistory Table

#attach
lcase = list(caseLmat = case.Lmat, caseRmat = case.Rmat,
egxL = egx30Lev, egxR = egx30Ret)
attach(lcase)
rm(list=ls(1))#clean workspace

ocicL = caseLmat[,"OCIC.CA"]
ocicR = caseRmat[,"OCIC.CA"]
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Initial data analysis}

Linear correlation between financial asset returns has been a staple approach in modern
finance since the birth of Harry Markowitz's portfolio theory. If the returns are non-normal,
correlation is not an appropriate measure of dependence between multiple asset returns. The theory
of copulas provides a flexible methodology for the general modeling of multivariate dependence.

We consider modeling the bivariate distribution of the daily log returns of Orascom Construction Industries (OCIC) and the EGX30 index (EGX)
over the period May 8, 2003 through May 6, 2010.\footnote{Extracted from egID Oracle data base. Available upon request.}

%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
#Merging Returns
mergRet = na.omit(merge(ocicR, egxR))
ocic = as.numeric(unclass(mergRet[,1]))#unclass -> numeric
egx = as.numeric(unclass(mergRet[,2]))
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
str(mergRet)
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%

Time series plots daily levels and returns for EGX and OCIC are given in Figures~\ref{egxPlot.fig}~\&~\ref{ocicPlot.fig}.
The two return series exhibit different time dependent dynamics. EGX exhibiting ARCH effects while OCIC displaying covariance stationary properties.
Both returns take on extremely large negative returns. For the present analysis, the time dependent nature of the return volatility will be ignored.

Regarding the marginal distribution of returns, we assume non-normal fat-tailed behavior.
To be sure, the Jarque-Bera test

%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
jarqueberaTest(egx)@test$p.value jarqueberaTest(ocic)@test$p.value
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%

\noindent clearly rejects the null hypothesis that the individual returns are normally distributed.

Given that the marginal distribution of the two return series are not normal, it would
be surprising if the bivariate normal is a good characterization of the joint distribution.
To see this, Figure~\ref{scatter.fig}, shows a scatter plot of the actual returns together with
the scatter plot of simulated bivariate normal data calibrated to the actual data. The simulated
normal data does not capture the observed positive dependence in the tails of the distribution.
Since the bivariate normal distribution does not adequately describe the joint behavior of returns, the
correlation coefficient \textit{may not} be the proper measure of dependence.

\begin{figure}[t]
\begin{center}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
#Exploratory analysis
#TS plots
par(mfrow = c(2,1))
plot(egxL, main = "EGX30")
plot(egxR, main = "EGX Daily Returns")
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{center}
\caption{EGX30 Closing Prices and Returns.}
\label{egxPlot.fig}
\end{figure}

\begin{figure}[t]
\begin{center}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
#Exploratory analysis
#TS plots
#Case symbols of interest
#Plots
par(mfrow = c(2,1))
plot(ocicL, main = "Orascom Construction (OCIC)")
plot(ocicR, main = "OCIC Daily Returns")
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{center}
\caption{OCIC.CA Closing Prices and Returns.}
\label{ocicPlot.fig}
\end{figure}

\begin{figure}[t]
\begin{center}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=

mu.hat = colMeans(mergRet)
Sigma.hat = var(mergRet)
nobs = dim(mergRet)
set.seed(1234)
egoc.sim = rmvnorm(nobs, mean = mu.hat, sigma = Sigma.hat)
names(egoc.sim) <- names(mergRet)

par(mfrow = c(1,2))
plot.zoo(na.omit(as.zoo(ocicR)), na.omit(as.zoo(egxR)), main = "Actual Returns",xlab = "OCIC", ylab = "EGX", xlim = c(-0.2,0.2), ylim = c(-0.10,0.10))#, xlim = c(-0.2,0.2), ylim = c(-0.10,0.10)
lines(stats::lowess(x = mergRet[,1], y = mergRet[,2]))
abline(h = 0, v = 0)
plot(egoc.sim[,1], egoc.sim[,2], main = "Simulated Normal Returns", xlab = "OCIC", ylab = "EGX",, xlim = c(-0.2,0.2), ylim = c(-0.10,0.10))#, xlim = c(-0.2,0.2), ylim = c(-0.10,0.10)
lines(stats::lowess(x = egoc.sim[,1], y = egoc.sim[,2]))
abline(h = 0, v = 0)
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{center}
\caption{Scatter plots of actual returns and simulated bivariate normal returns for OCIC and EGX.}
\label{scatter.fig}
\end{figure}

%-------------------------------------------------------------------------------
\section{Model-fitting}
We fit copulas to a bivariate data set of returns on OCIC and the EGX index.
First, we fit a model with univariate \textit{t}-distributions and a \textit{t}-copula. The model
has three degrees-of-freedom parameters, one for each of the two univariate
models and a third for the copula. Thus the univariate distributions can have
different tail weights and that their tail weights are independent of the tail
dependence in the copula.

The univariate estimates are used as starting values when the meta \textit{t}-distribution
is fit by maximum likelihood. The estimate of the correlation coefficient in the
\textit{t}-copula is obtained using Kendall's tau ($\sin(\pi\hat{\tau}/2)$).

%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
est.ocic = as.numeric(fitdistr(ocic, "t")$estimate) est.egx = as.numeric(fitdistr(egx, "t")$estimate)
est.ocic = est.ocic*sqrt(est.ocic/(est.ocic-2))
est.egx = est.egx*sqrt(est.egx/(est.egx-2))
cor_tau = cor(ocic, egx, method="kendall")
omega = sin((pi/2)*cor_tau)    #cor_spear: 0.8193051
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
omega
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%

We then define a meta \textit{t}-distribution by specifying its \textit{t}-copula
and its univariate marginal distribution. After fitting the meta \textit{t}-distribution,
five parametric copulas were fit to the uniform-transformed returns:
\textit{t}, Gaussian, Gumbel, Frank, and Clayton. Since we used parametric estimates
to transform the returns, we are fitting the copulas by parametric maximum
pseudo-likelihood.

The results are in Table~\ref{AIC.table}. Looking at the maximized log-likelihood
values, we see that the Clayton and \textit{t} copulas fit poorly. The Gaussian copula
fits best since it minimizes AIC, but the Gumbel fits well with respect to the
positive dependence shown in the data. Figure~\ref{cdf.fig} plots the transformed
returns and contours of the distribution function of six copulas: the empirical
copula and five estimated parametric copulas. The Gaussian copula fits best in
the sense that its contours are closest to those of the empirical copula. This is
in agreement with the AIC values. Figure~\ref{kde.fig} gives the respective densities.

%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
#define t-copula with omega the correlation param
cop_t_dim2 = tCopula(omega, dim = 2, dispstr = "un", df = 4)

#fit copulas to uniform-transformed data
n = length(ocic)
data1 = cbind(pstd(ocic, mean = est.ocic, sd = est.ocic, nu = est.ocic),
pstd(egx, mean = est.egx, sd = est.egx, nu = est.egx))
data2 = cbind(rank(as.numeric(unclass(ocic)))/(n+1), rank(as.numeric(unclass(egx)))/(n+1))
ft1 = fitCopula(cop_t_dim2, data1, optim.method = "L-BFGS-B", method = "ml", start = c(omega, 5), lower = c(0,2.5), upper = c(.5,15))
ft2 = fitCopula(cop_t_dim2, data2, optim.method = "L-BFGS-B", method = "ml", start = c(omega, 5), lower = c(0,2.5), upper = c(.5,15))

#defining the meta t-dist by specifying t-copula and its univariate marginal dists.
mvdc_t_t = mvdc(cop_t_dim2, c("std","std"),
list(list(mean = est.ocic, sd = est.ocic, nu = est.ocic),
list(mean = est.egx, sd = est.egx, nu = est.egx)))

#Fitting meta t-dist...

start = c(est.ocic, est.egx, ft1@estimate)
objFn <- function(param)
{
-loglikMvdc(param, cbind(ocic,egx), mvdc_t_t)
}
t1 = proc.time()
fit_cop = optim(start, objFn, method="L-BFGS-B",
lower = c(-.1,.001,2.5, -.1, .001, 2.5,   .2, 2.5),
upper = c(.1,.03,15,    .1,.03,15,        .8,15)
)
t2 = proc.time()
total_time = t2-t1
total_time.min = total_time/60

#Fitting Normal, Gumbel, Frank, and Clayton copulas to the data
fnorm = fitCopula(data = data1, copula = normalCopula(-.3,dim=2), method="mpl",
optim.method = "BFGS", start=.5)
fgumbel = fitCopula(data = data1, optim.method = "BFGS", method = "mpl",
copula = gumbelCopula(3, dim = 2))
ffrank = fitCopula(data = data1, optim.method = "BFGS", method = "mpl",
copula = frankCopula(3, dim = 2), start = 1)
fclayton = fitCopula(data = data1, optim.method = "BFGS", method = "mpl",
copula = claytonCopula(1, dim = 2), start = 1)

@
%%%%%%%%%%%%%%%%%%%%%%%%%%%

%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=
#Extract AIC & parameter estimates for table

aicT = -2*log(ft2@loglik) + 2*(length(ft2@estimate))
aicNorm = -2*log(fnorm@loglik) + 2*(length(fnorm@estimate))
aicGum = -2*log(fgumbel@loglik) + 2*(length(fgumbel@estimate))
aicFra = -2*log(ffrank@loglik) + 2*(length(ffrank@estimate))
aicCla = -2*log(fclayton@loglik) + 2*(length(fclayton@estimate))
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{table}
\begin{center}
\begin{tabular}{l||c|c|c}
\hline
Copula family & Estimates & Log-Likelihood & AIC\\
\hline
\textit{t} & $\hat{\rho}$ = 0.50 & 809.10 & -9.39 \\
&  $\hat{\nu}$ =  2.50  &                      &                \\
Gaussian  & $\hat{\rho}$ = 0.83 & 1008.53 & -11.83 \\
Gumbel & $\hat{\theta}$ = 2.62 & 1000.69 & -11.82 \\
Frank & $\hat{\theta}$ =  8.83 & 949.73 & -11.71    \\
Clayton & $\hat{\theta}$ = 2.25 & 849.11 & -11.49 \\
%\hline
\end{tabular}
\end{center}
\caption{Estimates of copula parameters using transformed returns data.}
\label{AIC.table}
\end{table}

\begin{figure}[p]
\begin{center}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=

#Estimated Copulas (CDFs) compared with the empirical copula
u1 = data1[,1]
u2 = data1[,2]
dem = pempiricalCopula(u1, u2)
par(mfrow = c(3,2))
contour(dem$x, dem$y, dem\$z, main = "Empirical")
contour(tCopula(param = ft2@estimate, df = ft2@estimate),
pcopula, main = "t")
contour(normalCopula(fnorm@estimate), pcopula, main = "Normal")
contour(gumbelCopula(fgumbel@estimate, dim = 2), pcopula,
main = "Gumbel")
contour(frankCopula(ffrank@estimate, dim = 2), pcopula, main = "Frank")
contour(claytonCopula(fclayton@estimate, dim = 2), pcopula,
main = "Clayton")
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{center}
\caption{Estimated Copulas (CDFs) compared with empirical copula}
\label{cdf.fig}
\end{figure}

\begin{figure}[p]
\begin{center}
%%%%%%%%%%%%%%%%%%%%%%%%%%%
<>=

#2-d KDE of copula density compared with parametric density estimates
par(mfrow = c(3,2))
contour(kde2d(u1, u2), main = "KDE")
contour(tCopula(param = ft2@estimate, df = ft2@estimate),
dcopula, main = "t", nlevels = 25)
contour(normalCopula(fnorm@estimate), dcopula,
main = "Normal", nlevels = 25)
contour(gumbelCopula(fgumbel@estimate, dim = 2), dcopula,
main = "Gumbel", nlevels = 25)
contour(frankCopula(ffrank@estimate, dim = 2), dcopula,
main = "Frank", nlevels = 25)
contour(claytonCopula(fclayton@estimate, dim = 2),
dcopula, main = "Clayton", nlevels = 25)
@
%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{center}
\caption{2-D KDE of copula density compared with parametric density estimates}
\label{kde.fig}
\end{figure}

\section{Discussion}
In view of these results, one must withhold confidence. These findings are for one stock
out of 30 in the main index of the Egyptian stock market. They are, however, a nominal step
in understanding risks to a market that has experienced both exogenous and endogenous shocks
to the economy over the last decade. If more data were available, it would be interesting to investigate if holding a hypothetical portfolio on
25^th January 2011 could be insulated.

A alternative and perhaps preferable approach would be to construct a suitable nested archimedean copula.

\nocite{*}
\bibliographystyle{plain}
\bibliography{EGXcopula}

\end{document}
}
}