(Nearly) Optimal Private Linear Regression for Sub-Gaussian Data via Adaptive Clipping

Abstract

We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution. We propose and analyze a one-pass mini-batch stochastic gradient descent method (DP-AMBSSGD) in which the points in each iteration are sampled without replacement. Noise is added to guarantee DP, but the noise standard deviation is estimated online. Compared to existing $(\epsilon,\delta)$-DP techniques, which have sub-optimal error bounds, DP-AMBSSGD provides nearly optimal error bounds in terms of the key parameters: the dimensionality $d$, the number of points $N$, and the standard deviation $\sigma$ of the noise in the observations. For example, when the $d$-dimensional covariates are sampled i.i.d. from the normal distribution, the excess error of DP-AMBSSGD due to privacy is $\frac{\sigma^2 d}{N}\big(1+\frac{d}{\epsilon^2 N}\big)$, i.e., the error is meaningful whenever the number of samples satisfies $N = \Omega(d \log d)$, which is the standard operative regime for linear regression. In contrast, error bounds for existing efficient methods in this setting are $\mathcal{O}\big(\frac{d^3}{\epsilon^2 N^2}\big)$, even for $\sigma = 0$. That is, for constant $\epsilon$, existing techniques require $N = \Omega(d^{1.5})$ to provide a non-trivial result.
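
The sketch below illustrates the general shape of the procedure described in the abstract: a single pass over the data in disjoint mini-batches (sampling without replacement), per-example gradient clipping, Gaussian noise added to each averaged gradient, and an online update of the clipping norm. It is not the paper's exact DP-AMBSSGD algorithm; the function name, step size, the clipping-norm estimator, and the noise calibration are simplified placeholders, and the formal $(\epsilon,\delta)$ privacy accounting is omitted.

```python
# Minimal illustrative sketch (NOT the paper's exact DP-AMBSSGD procedure).
import numpy as np

def dp_adaptive_minibatch_sgd(X, y, noise_mult=1.0, batch_size=64, lr=0.1, seed=0):
    """One pass over (X, y): disjoint mini-batches, per-example gradient
    clipping with an online-estimated clipping norm, and Gaussian noise
    added to each averaged gradient. All constants are placeholders."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    w = np.zeros(d)
    perm = rng.permutation(N)              # one pass, sampled without replacement
    clip_norm = 1.0                        # initial guess; updated online below
    for start in range(0, N - batch_size + 1, batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        # Per-example gradients of the squared loss: (x^T w - y) x
        residual = Xb @ w - yb
        grads = residual[:, None] * Xb     # shape (batch_size, d)
        # Clip each per-example gradient to the current clipping norm.
        norms = np.linalg.norm(grads, axis=1)
        scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
        clipped = grads * scale[:, None]
        # Noisy averaged gradient; noise std scales with the clipping norm.
        noise = rng.normal(0.0, noise_mult * clip_norm, size=d)
        g = clipped.mean(axis=0) + noise / batch_size
        w -= lr * g
        # Online (adaptive) update of the clipping norm from observed gradient
        # norms; a crude stand-in for the paper's adaptive estimate.
        clip_norm = 0.9 * clip_norm + 0.1 * np.median(norms)
    return w
```

For instance, calling `dp_adaptive_minibatch_sgd(X, y)` on covariates $X \in \mathbb{R}^{N \times d}$ with responses $y = Xw^* + $ noise returns a private estimate of $w^*$ after a single pass; the adaptive clipping norm is what lets the noise scale track the actual gradient magnitudes rather than a worst-case bound.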

Publication
Proceedings of Thirty Fifth Conference on Learning Theory
Prateek Varshney