### Finite-sample analysis of $M$-estimators via self-concordance, II

This is the second of two posts in which I present **my recent work with Francis Bach** on the optimal finite-sample rates for $M$-estimators.
Recall that in the previous post we proved the Localization Lemma, which states the following: stability of the empirical risk Hessian $\mathbf{H}_n(\theta)$ on the Dikin ellipsoid of radius $r$ around $\theta_*$,
\[
\mathbf{H}_n(\theta) \asymp \mathbf{H}_n(\theta_*), \, \forall \theta \in \Theta_{r}(\theta_*),
\]
guarantees that once the *score* $\nabla L_n(\theta_*)$ is small enough, namely $\Vert\nabla L_n(\theta_*) \Vert_{\mathbf{H}^{-1}}^2 \lesssim r^2,$
one has the desired excess risk bound:
\[
L(\widehat \theta_n) - L(\theta_*) \lesssim \Vert\widehat \theta_n - \theta_*\Vert_{\mathbf{H}}^2 \lesssim \Vert\nabla L_n(\theta_*) \Vert_{\mathbf{H}^{-1}}^2.
\]
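This chain of inequalities can be checked numerically on a toy model. Below is a minimal sketch for a well-specified logistic regression, where we compute the $M$-estimator by Newton's method and compare the three quantities; the setup (dimensions, seed, solver, and the large-sample proxy for the population risk $L$) is my own illustration, not taken from the paper.

```python
import numpy as np

# Toy check of the excess-risk chain on a well-specified logistic model.
# All choices below (n, d, theta_star, seed) are illustrative only.
rng = np.random.default_rng(1)
n, d = 5000, 4
theta_star = np.array([0.5, -0.3, 0.2, 0.1])  # population minimizer

def sample(m):
    X = rng.standard_normal((m, d))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_star)))
    return X, y

X, y = sample(n)           # training sample defining L_n
Xt, yt = sample(200_000)   # large sample to approximate the population risk L

def risk(theta, X, y):
    z = X @ theta          # numerically stable log(1 + e^z) - y*z
    return np.mean(np.maximum(z, 0) + np.log1p(np.exp(-np.abs(z))) - y * z)

def grad(theta):           # gradient of the empirical risk L_n
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return X.T @ (p - y) / n

def hess(theta, X):        # Hessian of the averaged logistic loss on sample X
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / X.shape[0]

# Newton's method computes the M-estimator hat{theta}_n = argmin L_n
theta = np.zeros(d)
for _ in range(20):
    theta = theta - np.linalg.solve(hess(theta, X), grad(theta))

H = hess(theta_star, Xt)   # large-sample proxy for H at theta_*
score2 = grad(theta_star) @ np.linalg.solve(H, grad(theta_star))
dist2 = (theta - theta_star) @ H @ (theta - theta_star)
excess = risk(theta, Xt, yt) - risk(theta_star, Xt, yt)
print(f"excess {excess:.2e}  <~  ||theta_hat - theta_*||_H^2 {dist2:.2e}"
      f"  <~  score^2 {score2:.2e}")
```

In this regime one observes the classical asymptotic picture: $\Vert\widehat\theta_n - \theta_*\Vert_{\mathbf{H}}^2$ is close to the squared score, and the excess risk is about half of it.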
I will now show how self-concordance allows us to obtain such guarantees for $\mathbf{H}_n(\theta)$.
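Before turning to the proof, the stability condition itself is easy to probe numerically. The sketch below (a toy logistic-regression setup of my own, with arbitrary sizes and radius) samples points on the boundary of the Dikin ellipsoid $\Theta_r(\theta_*)$ and inspects the spectrum of $\mathbf{H}_n(\theta_*)^{-1/2}\,\mathbf{H}_n(\theta)\,\mathbf{H}_n(\theta_*)^{-1/2}$, which quantifies the constant hidden in $\asymp$.

```python
import numpy as np

# Illustrative check of Hessian stability on the Dikin ellipsoid for
# logistic regression; n, d, r and the seed are arbitrary choices.
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.standard_normal((n, d))
theta_star = rng.standard_normal(d) / np.sqrt(d)

def hessian(theta):
    """Empirical risk Hessian H_n(theta) for the logistic loss."""
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    w = p * (1.0 - p)                   # per-sample curvature
    return (X * w[:, None]).T @ X / n

H_star = hessian(theta_star)
L = np.linalg.cholesky(H_star)          # H_star = L L^T

# Points on the boundary of Theta_r(theta_*):
# theta = theta_* + delta with ||delta||_{H_star} = r.
r = 0.5
lo, hi = np.inf, -np.inf
for _ in range(50):
    u = rng.standard_normal(d)
    delta = r * np.linalg.solve(L.T, u / np.linalg.norm(u))
    H = hessian(theta_star + delta)
    # Spectrum of L^{-1} H L^{-T}: relative deviation of H_n(theta)
    # from H_n(theta_*) in the sense of the ordering of quadratic forms.
    A = np.linalg.solve(L, np.linalg.solve(L, H).T)
    eig = np.linalg.eigvalsh((A + A.T) / 2)
    lo, hi = min(lo, eig[0]), max(hi, eig[-1])

print(f"relative spectrum of H_n(theta) over the Dikin ball: [{lo:.2f}, {hi:.2f}]")
```

For small $r$ the reported interval stays close to $[1, 1]$, which is exactly the stability condition in the Localization Lemma; the self-concordance argument below explains why.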

### Finite-sample analysis of $M$-estimators via self-concordance, I

In this series of posts, I will present **my recent work with Francis Bach** on the optimal rates for $M$-estimators with self-concordant-like losses. The term "$M$-estimator" is standard in the statistics community; in learning theory, one more often speaks of empirical risk minimization.
I will mostly use the statistical terminology to stress the connections to the classical asymptotic results.