Differentially-Private Learning
of Low Dimensional Manifolds
Anna Choromanska^1, Krzysztof Choromanski^2, Geetha Jagannathan^3, and Claire Monteleoni^4
^1 Department of Electrical Engineering, Columbia University, NY, USA
^2 Department of Industrial Engineering and Operations Research, Columbia University, NY, USA
^3 Department of Computer Science, Columbia University, NY, USA
^4 Department of Computer Science, George Washington University, DC, USA
{aec2163,kmc2178}@columbia.edu, geetha@cs.columbia.edu, cmontel@gwu.edu
Abstract. In this paper, we study the problem of differentially-private
learning of low dimensional manifolds embedded in high dimensional
spaces. The problems one faces in learning in high dimensional spaces are
compounded in differentially-private learning. We achieve the dual goals
of learning the manifold while maintaining the privacy of the dataset
by constructing a differentially-private data structure that adapts to the
doubling dimension of the dataset. Our differentially-private manifold
learning algorithm extends random projection trees of Dasgupta and
Freund. A naive construction of differentially-private random projection
trees could involve queries with high global sensitivity that would affect
the usefulness of the trees. Instead, we present an alternate way of
constructing differentially-private random projection trees that uses
low-sensitivity queries that are precise enough for learning the low
dimensional manifolds. We prove that the size of the tree depends only on the
doubling dimension of the dataset and not its extrinsic dimension.
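To make the distinction concrete, the following sketch (in Python with NumPy; the function name, the publicly known diameter bound, and the per-split privacy budget are assumptions introduced here for illustration, not taken from the paper) shows the kind of naive noisy-median split on a random projection that the abstract warns against: calibrating Laplace noise to the worst-case sensitivity of the median forces a noise scale on the order of the data diameter, which is precisely what a low-sensitivity construction is meant to avoid.

import numpy as np

def naive_noisy_rp_split(X, epsilon, diameter_bound, rng=None):
    """One random-projection split with a Laplace-noised median (naive DP sketch).

    X              : (n, D) array of data points.
    epsilon        : privacy budget spent on this single split (assumed).
    diameter_bound : assumed publicly known bound on the diameter of the data,
                     so every projection onto a unit vector lies in an interval
                     of at most this length.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, D = X.shape
    # Pick a random unit direction, as in ordinary random projection trees.
    u = rng.normal(size=D)
    u /= np.linalg.norm(u)
    proj = X @ u
    # Laplace mechanism on the median.  Changing one point can move the median
    # by up to the length of the projected interval in the worst case, so the
    # noise is calibrated to diameter_bound -- a large scale, which is what
    # degrades the usefulness of the resulting tree.
    threshold = np.median(proj) + rng.laplace(scale=diameter_bound / epsilon)
    left, right = X[proj <= threshold], X[proj > threshold]
    return u, threshold, left, right

A call such as naive_noisy_rp_split(X, epsilon=0.5, diameter_bound=1.0) would typically be repeated recursively on each half, with the overall privacy budget divided across the splits by standard composition.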
1 Introduction
Many real-world datasets are measured at extremely high dimension. Analyzing
data in high dimension affects learning algorithms in many ways. Most existing
algorithms have time complexities that depend super-polynomially on the
dimension of the dataset. Some algorithms need enormous amounts of data to
obtain meaningful results in high dimensional spaces. This phenomenon is
referred to as the curse of dimensionality in the machine learning literature.
One way to address it is through dimensionality reduction (Bishop (2006); Cox
and Cox (2000)). In many cases, although a dataset may appear high dimensional,
the data may actually lie on a low dimensional manifold.
Non-linear dimensionality reduction techniques (Lee and Verleysen (2007))
provide ways to construct mappings from a given high dimensional space onto
the low dimensional manifold on which the data actually lie.