Quantification of Integrity
Michael R. Clarkson and Fred B. Schneider
Department of Computer Science
Cornell University
Ithaca, NY, USA
{clarkson, fbs}@cs.cornell.edu
Abstract: Two kinds of integrity measures—contamination
and suppression—are introduced. Contamination measures
how much untrusted information reaches trusted outputs; it
is the dual of information-flow confidentiality. Suppression
measures how much information is lost from outputs; it does
not have a confidentiality dual. Two forms of suppression
are considered: programs and channels. Program suppression
measures how much information about the correct output
of a program is lost because of attacker influence and
implementation errors. Channel suppression measures how
much information about inputs to a noisy channel is missing
from channel outputs. The relationship between quantitative
integrity, confidentiality, and database privacy is examined.
Keywords: Integrity, quantitative information flow, information theory, database privacy
I. INTRODUCTION
Many integrity requirements for computer systems are
qualitative, but quantitative requirements can also be valu-
able. For example, a system might combine data from
trusted and untrusted sensors if the untrusted sensors cannot
corrupt the result too much. As another example, we might
add noise to a database, thereby protecting privacy, if the
resulting anonymized database still contains enough uncor-
rupted information to be useful for statistical analysis. Yet
methods for quantification of corruption—that is, damage
to integrity—have received little attention to date, whereas
quantification of information leakage has been a topic of
research for over twenty years [1], [2].
To quantify corruption, a formal definition of “integrity”
is required. We know of no such widely accepted definition,
although the widely accepted informal definition seems to
be “prevention of unauthorized modification of informa-
tion” [3]–[8]. So we take two distinct notions of informa-
tion modification as points of departure: taint analysis and
program correctness. These lead to two distinct measures of
corruption that we name “contamination” and “suppression.”
Contamination generalizes taint analysis [9]–[13], which
tracks information flow from tainted inputs to outputs that
are supposed to be untainted—or, using alternative terminol-
ogy, from untrusted inputs to outputs that are supposed to
be trusted. This flow results in contamination of the trusted
outputs. Trusted outputs are not supposed to be influenced by
untrusted information, so contamination corrupts integrity.
We might be willing to deem a program secure if it allows
only limited contamination, even though taint analysis would
deem the same program to be insecure. So quantification of
contamination would be useful.
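As a hypothetical illustration (ours, introduced only for this sketch, not part of the paper's formal development), consider a program that mixes a trusted sensor reading with an untrusted flag. A qualitative taint analysis marks the result as tainted, yet at most one bit of untrusted information can reach it:

    # Hypothetical example: the trusted output depends on the untrusted
    # input, but only through its low-order bit, so the contamination is
    # bounded by one bit.

    def combine(trusted_reading: int, untrusted_flag: int) -> int:
        """Round the trusted reading up by one when the untrusted flag's
        low-order bit is set; all other bits of the flag are ignored."""
        return trusted_reading + (untrusted_flag & 1)

A taint analysis rejects combine outright; a quantitative measure could accept it whenever one bit of contamination is tolerable.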
Flow between untrusted and trusted objects was first
studied by Biba [14], who identified a duality between
models of integrity and confidentiality. The confidentiality
dual to contamination is leakage, which is information
flow from secret inputs to public outputs. Previous work
has developed measures of leakage based on information
theory [15] and on beliefs [16]. This paper adapts those
measures to contamination.¹ Through the Biba duality, we
obtain a measure for corruption from a measure for leakage.
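As a sketch of what that dual measure looks like (the symbols S_in, P_in, P_out, U_in, T_in, and T_out are introduced here only for illustration; the paper's formal definitions appear later), an information-theoretic leakage measure conditions public outputs on public inputs, and its Biba dual conditions trusted outputs on trusted inputs:

    % Leakage: information that public outputs convey about secret inputs,
    % beyond what the public inputs already convey.
    \mathcal{L} \;=\; \mathcal{I}(S_{\mathrm{in}};\, P_{\mathrm{out}} \mid P_{\mathrm{in}})

    % Contamination (Biba dual): exchange secret for untrusted and
    % public for trusted.
    \mathcal{C} \;=\; \mathcal{I}(U_{\mathrm{in}};\, T_{\mathrm{out}} \mid T_{\mathrm{in}})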
Suppression, our other measure for corruption, is derived
from program correctness. For a given input, a correct
implementation should produce an output o permitted by a
specification. The output might be permitted to differ from
o provided the output conveys the same information as o.
An implementation might, for example, produce all the bits
in the binary representation of o but in reverse order. Or the
implementation might produce o xor k, where k is a known
constant. Any knowledgeable user of these implementations
could recover o from the implementation’s output.
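These two implementations can be made concrete with a hypothetical sketch (the function names and the known constant K are our assumptions, not the paper's); both outputs differ from o, yet a user who knows the encoding recovers o exactly:

    # Hypothetical sketch: outputs that differ from o but convey the same
    # information, so a knowledgeable user can invert them and recover o.

    K = 0b10110101                     # known constant, assumed public

    def impl_reversed(o: int, width: int = 8) -> int:
        """Output the bits of o in reverse order."""
        return int(f"{o:0{width}b}"[::-1], 2)

    def impl_xor_known(o: int) -> int:
        """Output o xor k for the known constant k."""
        return o ^ K

    # Recovery: reversing the bits again, or xoring with K again, yields o.
    assert int(f"{impl_reversed(42):08b}"[::-1], 2) == 42
    assert impl_xor_known(42) ^ K == 42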
Conversely, the output of an incorrect implementation
would fail to convey all the information about o. For ex-
ample, a (somewhat) incorrect implementation might output
only the first few bits of o; or it might output o with
probability p and output garbage with probability 1−p; or it
might output o xor u, where u is an untrusted input. In each
case, we say that program suppression of information about
the correct output has occurred. Implementation outputs are
not supposed to omit information about correct outputs, so
program suppression corrupts integrity. Yet we might be
willing to use an implementation that produces sufficient
information about the correct output, hence exhibits little
program suppression, even though a traditional verification
methodology (e.g., Hoare logic) would deem the imple-
mentation to be incorrect. So quantification of program
suppression would be useful.
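The three scenarios above can likewise be sketched hypothetically (the function names and the 8-bit value range are our assumptions, not the paper's):

    import random

    # Hypothetical incorrect implementations that suppress information
    # about the correct 8-bit output o.

    def impl_truncated(o: int) -> int:
        """Output only the first (high-order) four bits of o."""
        return o >> 4                  # the low four bits are lost

    def impl_lossy(o: int, p: float = 0.9) -> int:
        """Output o with probability p and garbage with probability 1 - p."""
        return o if random.random() < p else random.randrange(256)

    def impl_xor_untrusted(o: int, u: int) -> int:
        """Output o xor u, where u is an untrusted input."""
        return o ^ u                   # without knowing u, o cannot be recovered

In each case a user of the output cannot reconstruct o with certainty; the measures developed later quantify how much information about o is suppressed.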
The echo specification “o := t” gives rise to an im-
portant form of program suppression that we call channel
¹ Newsome et al. [17] adapt the same information-theoretic metric to measure what they call quantitative influence (cf. §VI).