Non-parametric estimation of archimedean's radial parts -- a summer story

Background

The estimation of Archimedean copulas has been bothering me for quite some time. My first shot at research, back when I was still a master’s student, was a project with a few friends using these models in a Solvency II and reinsurance pricing problem. I was always amazed by the fact that the Archimedean models we actually use are very constrained, often one- or two-parameter families, while the class of Archimedean copulas itself is extremely rich: generators are functions, hence infinite-dimensional objects.

This summer, while spending some time in a white zone without internet, I found myself once again thinking about the non-parametric estimation problem. This blog post summarizes those wanderings.

Theoretical setup

Let $C$ be a $d$-dimensional Archimedean copula $(d\ge 2)$ with $d$-monotone generator $\varphi$. By the Williamson representation, there exists a nonnegative random variable $R$ such that

$$ \varphi(t) = \mathbb{E}\left[\left(1-\frac{t}{R}\right)_{+}^{d-1}\right] =: \varphi_R(t). $$

Let $U\sim C$, and define the Kendall distribution

$$ K_R(x) = \mathbb{P}\left(C(U)\le x\right), \qquad x\in[0,1]. $$

For empirical work, given a sample ${U_i}_{i=1}^n$ from the copula and the Deheuvels empirical copula $C_n$, the Kendall distribution can be estimated by

$$ K_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}{C_n(U_i)\le x}. $$

Assume now that $R$ is discrete, with

$$ R \sim \sum_{j=1}^N w_j \delta_{r_j}, $$

where the atoms are ordered as $0<r_1<\cdots<r_N$. The associated generator becomes

$$ \varphi_R(t) = \sum_{j=1}^N w_j \left(1-\frac{t}{r_j}\right)_{+}^{d-1}, $$

Knowing that, for Archimedean copulas,

$$ K_R(x) = \mathbb P\left(\varphi_R(R) \le x\right), $$

a natural path toward estimation is to invert this equation to find $R$ in terms of $K_n$. Note that $K_R$ has jumps precisely at the points $x_j = \varphi_R(r_j)$ with masses $w_j$.

Smart inversion

From the Kendall pseudo-sample ${C_n(U_i)}_{i=1}^n$, collect the distinct values and their empirical frequencies, defining

$$ x_1 > x_2 > \cdots > x_N,\qquad w_j = \frac{1}{n}\sum_{i=1}^n \mathbf{1}{C_n(U_i) = x_j}, j=1,\dots,N, $$

so that $\sum_{j=1}^N w_j = 1$ and $K_n(x) = \sum_{j=1}^N w_j \mathbf{1}{x_j\le x}$. Recover support points $0<r_1<\cdots<r_N$ by solving the system $x_j = \varphi_R(r_j)$. Because the positive parts in $\varphi_R$ switch on only when $r_j$ exceeds the evaluation point, the system is triangular:

$$ r_N := 1, $$

$$ r_{N-1} := 1 - \left(\frac{x_{N-1}}{w_N}\right)^{1/(d-1)}, $$

$$ \text{and, for } k = N-2,\dots,1:\quad x_k = \sum_{j=k+1}^N w_j\left(1-\frac{r_k}{r_j}\right)^{d-1}. $$

The last equation can be solved by 1D root finding in $r_k\in[0, r_{k+1})$. The estimate is then

$$ \hat{R} \sim \sum_{j=1}^N w_j \delta_{r_j}. $$

The associated generator $\varphi_{\hat{R}}$ is continuous and piecewise polynomial between the recovered radii.

Examples

We generate $n=2000$ samples from Archimedean copulas with the following radial parts:

  • $R \sim \texttt{Dirac}(1)$, the Dirac mass at point $1$, corresponding to the lower-bound Clayton copula, in dimension $d=3$.
  • $R \sim \frac{\texttt{Dirac}(1) + \texttt{Dirac}(4) + \texttt{Dirac}(8)}{3}$, a simple mixture of point masses, in dimensions $d=2$ and $d=3$.
  • $R \sim \texttt{Gamma}(0.2)$ in dimension $d=10$.
  • $R \sim \texttt{LogNormal}(1,3)$, which is heavy tailed, in dimension $d=10$.
  • $R \sim \texttt{Pareto}(1/2)$, which does not even have an expectation, in dimension $d=10$.

From each sample, we estimate $\hat{R}$ using the proposed procedure. The graphs below give a graphical comparison of the empirical Kendall function and the fitted one, scatterplots of samples from the original copula and from the fitted copula, and histograms of samples from the estimated vs. true $R$.

Each estimator took only a few milliseconds to compute on a standard laptop, thanks to the power of Julia :)

Case $R \sim \delta_1, d=3$. Diagnostic plots: Differences between empirical and fitted Kendall functions on the left,  scatterplots of original vs.\ fitted copula (first two dimensions) samples in the middle, and histograms comparing estimated vs.\ true radial laws on the right.

Case $R \sim \frac{\delta_1 + \delta_4 +\delta_8}{3}, d=2$.

Case $R \sim \frac{\delta_1 + \delta_4 +\delta_8}{3}, d=3$.

Case $R \sim \texttt{Gamma}(0.2), d=10$.

Case $R \sim \texttt{LogNormal}(1,3), d=10$.

Case $R \sim \texttt{Pareto}(1/2), d=10$.

It already exists!

When coming back to civilization, I started looking around to see if something like that already existed, and in fact it does. The method was proposed in 2011 by the following paper:

@article{,
  title = {Inference in Multivariate {{Archimedean}} Copula Models},
  author = {Genest, Christian and Ne{\v s}lehov{\'a}, Johanna and Ziegel, Johanna},
  year = {2011},
  month = aug,
  journal = {TEST},
  volume = {20},
  number = {2},
  pages = {223--256},
  issn = {1133-0686, 1863-8260}
}

The paper also contains proofs and explanations that are much more thorough than mine. In the end, I recycled my code as an example page in the Copulas.jl documentation, and the code used to produce the graphics above was merged into the package as the default non-parametric estimator for Archimedean generators.

Looking ahead, one could even allow Liouville models to be fitted in two parts, with the Dirichlet component treated independently. But that will probably be the topic of another summer story.

Oskar Laverny
Oskar Laverny
Maître de Conférence

What would be the dependence structure between quality of code and quantity of coffee ?

comments powered by Disqus

Related