Finite-sample analysis of $M$-estimators via self-concordance, II

This is the second of two posts where I present our recent work with Francis Bach on the optimal finite-sample rates for $M$-estimators. Recall that in the previous post we proved the Localization Lemma, which states the following: stability of the empirical risk Hessian $\mathbf{H}_n(\theta)$ on the Dikin ellipsoid of radius $r$,
\[
\mathbf{H}_n(\theta) \asymp \mathbf{H}_n(\theta_*), \quad \forall \theta \in \Theta_{r}(\theta_*),
\]
guarantees that once the squared score norm satisfies $\Vert\nabla L_n(\theta_*) \Vert_{\mathbf{H}^{-1}}^2 \lesssim r^2$, one has the desired excess risk bound:
\[
L(\widehat \theta_n) - L(\theta_*) \lesssim \Vert\widehat \theta_n - \theta_*\Vert_{\mathbf{H}}^2 \lesssim \Vert\nabla L_n(\theta_*) \Vert_{\mathbf{H}^{-1}}^2.
\]
I will now show how self-concordance allows us to obtain such guarantees for $\mathbf{H}_n(\theta)$.
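Before that, a small aside: the bound above is easy to play with numerically. Below is a minimal sketch (not from the paper; the setup and all names are of my choosing) for well-specified logistic regression, where $\theta_*$ is the true parameter and hence the population risk minimizer, and the population Hessian $\mathbf{H} = \mathbf{H}(\theta_*)$ and the excess risk are both approximated on a large hold-out sample. It computes the two sides of the bound, $L(\widehat\theta_n) - L(\theta_*)$ and $\Vert\nabla L_n(\theta_*)\Vert_{\mathbf{H}^{-1}}^2$, so that they can be compared on a given draw of the data.

```python
# Minimal numerical sketch of the excess-risk bound for well-specified
# logistic regression. Hypothetical setup: theta_star is the true parameter
# (= population risk minimizer); H is the population Hessian at theta_star,
# approximated on a large hold-out sample.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, n, n_test = 5, 2000, 200_000

theta_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
X_test = rng.normal(size=(n_test, d))

def sample_labels(X, theta):
    # y = +1 with probability sigmoid(<x, theta>), else -1
    p = 1.0 / (1.0 + np.exp(-X @ theta))
    return np.where(rng.random(len(X)) < p, 1.0, -1.0)

y, y_test = sample_labels(X, theta_star), sample_labels(X_test, theta_star)

def risk(theta, X, y):
    # average logistic loss: log(1 + exp(-y <x, theta>))
    return np.mean(np.logaddexp(0.0, -y * (X @ theta)))

def grad(theta, X, y):
    s = 1.0 / (1.0 + np.exp(y * (X @ theta)))   # sigmoid(-y <x, theta>)
    return -(X.T @ (y * s)) / len(y)

# Empirical score at theta_star and hold-out approximation of H(theta_star)
g_n = grad(theta_star, X, y)
p = 1.0 / (1.0 + np.exp(-X_test @ theta_star))
H = (X_test * (p * (1 - p))[:, None]).T @ X_test / n_test

score_sq = g_n @ np.linalg.solve(H, g_n)        # ||grad L_n(theta_star)||_{H^{-1}}^2

# Empirical risk minimizer on the n training points
theta_hat = minimize(risk, np.zeros(d), args=(X, y), jac=grad, method="BFGS").x

# Hold-out approximation of the excess population risk
excess_risk = risk(theta_hat, X_test, y_test) - risk(theta_star, X_test, y_test)
print(f"score^2 = {score_sq:.2e},  excess risk = {excess_risk:.2e}")
```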

Finite-sample analysis of $M$-estimators via self-concordance, I

In this series of posts, I will present our recent work with Francis Bach on the optimal rates for $M$-estimators with self-concordant-like losses. The term “$M$-estimator” is more commonly used in the statistical community; in the learning theory community, one more often speaks of empirical risk minimization. I will mostly use the statistical terminology to stress the connections to the classical asymptotic results.