Privacy-preserving logistic regression
Kamalika Chaudhuri
Information Theory and Applications
University of California, San Diego
kamalika@soe.ucsd.edu
Claire Monteleoni∗
Center for Computational Learning Systems
Columbia University
cmontel@ccls.columbia.edu
Abstract
This paper addresses the important tradeoff between privacy and learnability,
when designing algorithms for learning from private databases. We focus on
privacy-preserving logistic regression. First we apply an idea of Dwork et al. [6]
to design a privacy-preserving logistic regression algorithm. This involves bound-
ing the sensitivity of regularized logistic regression, and perturbing the learned
classifier with noise proportional to the sensitivity.
We then provide a privacy-preserving regularized logistic regression algorithm
based on a new privacy-preserving technique: solving a perturbed optimization
problem. We prove that our algorithm preserves privacy in the model due to [6].
We provide learning guarantees for both algorithms, which are tighter for our new
algorithm, in cases in which one would typically apply logistic regression. Ex-
periments demonstrate improved learning performance of our method, versus the
sensitivity method. Our privacy-preserving technique does not depend on the sen-
sitivity of the function, and extends easily to a class of convex loss functions. Our
work also reveals an interesting connection between regularization and privacy.
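To make the new technique concrete before the technical development, the following is a minimal sketch of the perturbed-optimization idea for regularized logistic regression: a random linear term is added to the objective before minimizing, rather than adding noise to the learned classifier. The function name, the use of numpy/scipy, and the exact noise scaling are illustrative assumptions; the precise construction and its privacy guarantee are given in the body of the paper.

```python
import numpy as np
from scipy.optimize import minimize

def objective_perturbed_lr(X, y, lam, eps, rng=None):
    """Illustrative sketch: regularized logistic regression with a perturbed objective.

    X   : (n, d) array of examples, assumed scaled so that ||x_i|| <= 1
    y   : (n,) array of labels in {-1, +1}
    lam : regularization parameter
    eps : privacy parameter
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape

    # Draw a random vector b with density proportional to exp(-eps * ||b|| / 2):
    # uniform direction, Gamma(d, 2/eps)-distributed norm (illustrative construction).
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    b = rng.gamma(shape=d, scale=2.0 / eps) * direction

    def objective(w):
        # average logistic loss + L2 regularizer + random linear perturbation
        return (np.mean(np.logaddexp(0.0, -y * (X @ w)))
                + 0.5 * lam * (w @ w)
                + (b @ w) / n)

    return minimize(objective, np.zeros(d), method="L-BFGS-B").x
```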
1 Introduction
Privacy-preserving machine learning is an emerging problem, due in part to the increased reliance on
the internet for day-to-day tasks such as banking, shopping, and social networking. Moreover, pri-
vate data such as medical and financial records are increasingly being digitized, stored, and managed
by independent companies. In the cryptography and information security literature, formal definitions
of data privacy have been proposed; however, designing machine learning algorithms that adhere to
them has not been well explored. On the other hand, the data-mining community has introduced
algorithms that aim to respect other, less formally justified notions of privacy.
Our goal is to bridge the gap between approaches in the cryptography and information security com-
munity, and those in the data-mining community. This is necessary, as there is a tradeoff between
the privacy of a protocol and the learnability of functions that respect the protocol. In addition to
the specific contributions of our paper, we hope to encourage the machine learning community to
embrace the goals of privacy-preserving machine learning, as it is still a fledgling endeavor.
In this work, we provide algorithms for learning in a privacy model introduced by Dwork et al. [6].
The ε-differential privacy model limits how much information an adversary can gain about a par-
ticular private value, by observing a function learned from a database containing that value, even if
she knows every other value in the database. An initial positive result [6] in this setting depends on
the sensitivity of the function to be learned, which is the maximum amount the function value can
change due to an arbitrary change in one input. Using this method requires bounding the sensitivity
of the function class to be learned and then adding noise proportional to that sensitivity; such bounds
can be difficult to obtain for some functions that are important in machine learning.
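Concretely, a randomized mechanism M is ε-differentially private if, for any two databases D and D' that differ in a single record and any set S of outputs, P[M(D) ∈ S] ≤ e^ε · P[M(D') ∈ S]. The sketch below illustrates the sensitivity method for regularized logistic regression: fit the classifier as usual, then add random noise whose magnitude is proportional to the sensitivity. This is a sketch under stated assumptions: the function name, the use of numpy/scipy, and the sensitivity bound 2/(nλ) used for the noise scale are illustrative; the actual bound and privacy argument appear later in the paper.

```python
import numpy as np
from scipy.optimize import minimize

def sensitivity_method_lr(X, y, lam, eps, rng=None):
    """Illustrative sketch of the sensitivity method for regularized logistic
    regression: train as usual, then add noise scaled to the sensitivity.

    X   : (n, d) array of examples, assumed scaled so that ||x_i|| <= 1
    y   : (n,) array of labels in {-1, +1}
    lam : regularization parameter
    eps : privacy parameter
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape

    # Step 1: ordinary L2-regularized logistic regression.
    def objective(w):
        return np.mean(np.logaddexp(0.0, -y * (X @ w))) + 0.5 * lam * (w @ w)

    w_hat = minimize(objective, np.zeros(d), method="L-BFGS-B").x

    # Step 2: perturb the learned classifier. The sensitivity of the minimizer is
    # taken here to be 2 / (n * lam); a bound of this form is derived in the paper.
    sensitivity = 2.0 / (n * lam)
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    noise = rng.gamma(shape=d, scale=sensitivity / eps) * direction
    return w_hat + noise
```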
∗ The majority of this work was done while at UC San Diego.