Jonathan Katz: Cryptographic Perspectives on the Future of Privacy


(relaxing music)
– Good afternoon.
Thank you for coming out on this rainy day
to our Distinguished
Scholar-Teacher lecture.
My name is Laura Rosenthal, and I’m the
director for faculty leadership
for the Office of Faculty Affairs.
I actually chaired the selection committee
for this award, and I’m here on behalf
of the Provost’s Office.
John Bertot sends his apologies.
He couldn’t be here
because of a Senate meeting
at the same time.
The Distinguished Scholar-Teacher Program
was established in 1978, and has had more
than 200 awardees since its inception.
Now in its 40th year,
the Distinguished Scholar-Teacher Program
is sponsored by the
Office of Academic Affairs
and administered by the
associate provost for Faculty Affairs.
Selected by previous DST recipients,
the award honors tenured faculty members
who combine outstanding scholarship
with teaching excellence.
It is a recognition by the university
of the commitment of faculty who pursue
unfailing excellence in the classroom
and in their research efforts.
I’m very pleased, on
behalf of Provost Rankin,
to recognize Dr. Jonathan Katz as one
of our newest Distinguished
Scholar-Teachers.
As we will hear more about shortly,
Dr. Katz’s work focuses on cryptography
and cybersecurity, with his recent work
focusing largely on
techniques for computing
on personal data while preserving privacy.
He currently serves as the director
of the Maryland Cybersecurity Center.
He has demonstrated
excellence in teaching,
both in the classroom and
through the mentorship
of students, as well as
innovative instruction.
For example, through
his MOOC on cryptography
offered via Coursera
that reaches thousands
of students globally.
It is my pleasure now to
introduce Samir Khuller,
who will formally introduce Dr. Katz.
Thank you.
(applause)
(people talking)
– [Woman] The person who’s recording this.
– Thank you very much,
and welcome to everyone.
15 years ago when we interviewed Jon Katz,
he was a nervous young graduate student,
and I remember taking him
to Georgetown for dinner.
But a lot has happened in the last 15 years.
Jonathan received his Ph.D. in 2002. He became a professor of computer science and UMIACS in 2013,
and this year was named as
Distinguished Scholar-Teacher.
He’s also, for the last several years,
been the director of the
Maryland Cybersecurity Center,
where he leads a team of
interdisciplinary professors,
research scientists and students to solve
large-scale, complicated
problems all related
to cybersecurity.
I think the biggest impact he’s had
through his research
is by shaping our study
and understanding of
cryptography, his main area.
He’s a brilliant and prolific leader
in this dynamic field.
He’s an academic giant
in computer science,
and we are fortunate to
have him as our colleague.
Jonathan has published over 30 journal papers and 150 peer-reviewed conference papers, and he’s co-authored and edited multiple books, the latest one being Advances in Cryptology.
He can frequently be found interviewed
by various public media outlets.
Jonathan is currently
advising three Ph.D students.
He’s already graduated 11 Ph.Ds
and has mentored about 15 postdocs,
most of whom are leading research scientists and professors at different universities.
His most popular undergraduate class, which he’s taught several times, is CMSC 456, Introduction to Cryptography.
He said that he especially enjoys moments
when they finally understand something
that he’s teaching them.
He said, “My favorite thing is when
“you can see in their eyes that a concept
“is clicking at some
point in the semester,
“I hope soon, early in the semester,
“and they get really
excited about cryptography.”
That spark, to him, is very motivational.
He’s been the recipient of multiple awards, including the Humboldt Research Award and membership in the IEEE Cybersecurity Initiative. He was also named one of the top 50 influential Marylanders. He’s won an NSF CAREER award, as well as a DARPA Computer Science Study Group selection.
Without further ado, I wanna welcome
Professor Jon Katz,
both of computer science
and UMIACS, and director of the
Maryland Cybersecurity Center.
(applause)
– Thank you very much, Samir.
Actually, I remember the dinner like
it was yesterday, so it’s amazing that
15 years went by.
Thank you all for coming today, actually,
especially given the
horrible weather outside,
although I guess I can at least claim
that people prefer to be in here listening
to my talk rather than
spending the day outside.
(laughter)
That’s a good thing.
I’m really very honored,
actually, to receive
this award, especially because
we’ve had other honorees
in our department,
in the Department of Computer Science.
I know Samir himself was an honoree.
Mike Hicks was also an honoree.
Jack Minker, before my time, I think
was an honoree, as well.
I really don’t feel worthy.
I know what excellent teachers they are,
and I hope I can just
measure up to some extent.
I’d like to thank the committee, also, for choosing me, and Mike in particular for nominating me.
I was trying to think about the title
of the talk, and I was trying to come up
with some way to distinguish my title
from all the other titles of the
other DST honorees chosen this year.
It turns out I was able to do that
by avoiding the colon, but if you really
like to have a colon in your talk,
you can also think about this
as The Future of Privacy: A
Cryptographic Perspective.
(laughter)
But hopefully you’ll be able
to understand it anyway.
I’d like to buck tradition a little bit
and actually start with
the acknowledgements.
That way they won’t get shoved at the end
where I may run out of time.
First of all, I just want to acknowledge
all the love and support of my family.
In particular, my wife
and father are here today
to listen to this talk,
so thank you for coming.
I literally could not have
been here without you.
(laughter)
I would also like to take
the opportunity to thank
the Department of Computer Science
for really providing
a very nurturing place for me.
I couldn’t have asked for anything better; it’s been the best place, I guess, to grow a career over the first 15 years or so of my career.
In particular, the two
chairs of computer science
that I had the
fortune actually of working for,
Larry Davis and Samir
Khuller, so thank you
for everything you’ve done to help create
that environment for the department.
I also wanna thank my
students and postdocs.
Actually, when I was writing out this list, it was amazing for me to see the number of names, and also just to remember some of the Ph.D. students I’ve had, now going back several years.
I’ve had the opportunity over the years
to catch up with them and
hear how they’re doing.
I’d really say that, for the most part,
it’s been the best kind of
student-advisor relationship,
where, and I think it’s very interesting, I certainly learned more from my students and from my postdocs than they ever learned from me.
I wanna highlight in particular, actually,
some of the students
and postdocs whose work
I’m gonna be referring
to at various points
in the talk today.
Let’s see, Xiao Wang, for sure.
Alex Malozemoff.
Sam Ranellucci
and Yan Huang
and Raef Bassily.
I don’t wanna leave anybody out; Adam Groce, as well. These are, essentially, the people behind some of the work I’m gonna be talking about today.
I’d also like to take the opportunity
to acknowledge some of my collaborators
over the years.
I’ve had the fortune of working with many
great collaborators, but in particular
on this topic, I’d like to highlight
Mike Hicks, again, as well as Adam Smith,
a professor at Penn State,
and Dave Evans at UVA.
I thought it would be
good to start actually
with a little bit of
background about cryptography.
I don’t know how many people here
are very familiar with cryptography.
I’m guessing that most of the students
who have taken my cryptography class
have since graduated, and so are probably
not in the room today, maybe a handful
are here today, so I hope
this won’t bore you too much.
But I figured it’s worth talking about it
because many people, even if they do
know something about cryptography,
they probably know
about it from the media,
and they may not have a good picture
of what exactly it is
that cryptographers do
and what kind of problems cryptographers
are working on nowadays.
I think that if you were to ask
the general public what they know,
what they think about cryptography,
more likely than not, the first thing
they would hit upon is
the idea of encryption.
In the context of encryption, we have
two parties, let’s say,
who wanna communicate
a message, and they wanna communicate
that message privately.
They wanna make sure that that message
remains secret from anybody else.
The way that encryption has been done
historically is basically through
some process that takes
this original message
and scrambles it in some way,
replacing the original message with some
random-looking text that presumably
anybody eavesdropping on the communication
won’t be able to make much sense of.
The picture down here is actually Vigenère. This was somebody who developed an encryption scheme in the 1500s.
This is supposed to be a representation
of what you might get if you encrypted
an English language plain text using
the Vigenère cipher to get some
unintelligible cipher text that could
be sent across a channel.
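To make that concrete, here’s a minimal sketch of the Vigenère cipher in Python; the message and key are the standard textbook example, not anything from the slides, and it assumes the text contains only letters:

```python
# Minimal Vigenere cipher sketch: shift each letter of the message by
# the corresponding letter of a repeating key, working over A-Z only.
def vigenere(text, key, decrypt=False):
    key = key.upper()
    out = []
    for i, ch in enumerate(text.upper()):
        shift = ord(key[i % len(key)]) - ord('A')
        if decrypt:
            shift = -shift
        out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
    return ''.join(out)

ct = vigenere("ATTACKATDAWN", "LEMON")
print(ct)                                    # LXFOPVEFRNHR
print(vigenere(ct, "LEMON", decrypt=True))   # ATTACKATDAWN
```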
Cryptography has been used, or encryption, I should say, has been used for many hundreds of years, actually. It predates even the 1500s, and goes back even to classical times.
It also had a very large role to play
in World War II.
This is the Enigma machine
that the Germans used
to encrypt communication in World War II.
The other thing people might think about
when they think about cryptography
and they think about encryption is that
you have these smaller teams, small groups
of smart people who are
studying how to generate,
how to develop encryption schemes.
You might have other
people on the other side
who are sitting in a room
trying to figure out,
of course, how to break
that encryption scheme.
This is a picture of the so-called bombe that was developed in Britain during the Second World War, as well, in an effort to try to defeat Enigma.
The idea there is that
maybe by being more clever
than the other side, you can figure out
something about what it
is they’re communicating,
figure out how to crack the code.
Now, looking at this in
a little bit more detail,
the problem of encryption
is particularly clean.
In the case of encryption,
we have two parties
who are communicating over a channel.
Like I said before, they want to be able
to communicate while ensuring secrecy
of their communication.
They’re worried in particular about
an adversary, an attacker,
who’s eavesdropping, let’s say, on the
communication channel between them.
Again, they’d like to make sure that they
can communicate some message
between the two of them while ensuring
that the attacker doesn’t know anything
about what it is they’re communicating.
What’s particularly
nice about this picture
is that there’s a very clean and obvious
distinction between who the good guys are
and who the bad guys are.
We can draw a (laughter) clean separation
between, on the one hand, the two parties
who share a key, let’s
say, and are agreeing
to communicate.
On the other hand, the attacker,
who is the bad guy who’s listening in,
trying to figure out what
it is they’re communicating.
It’s a very simple,
very clean mental model.
This gives, like I said,
a very clean distinction
between who’s good and who’s bad,
who should be able to learn the message
being communicated, and who should not
be able to learn anything about
what’s being communicated.
Now, everything I’ve
said is an accurate view
of cryptography, up until about the 1980s.
There were, indeed, schemes that were
being developed by people over time,
throughout the centuries, developed in a
mainly heuristic way, involving
smart people thinking about
how to construct schemes.
Other people thinking
about how to break them.
Then sort of a back and forth interplay
between the people developing schemes
and the people breaking schemes
always trying to outdo the other.
Even in the public key revolution
in the 1970s, we had other smart people,
this is Rivest, Shamir and Adleman,
who developed the RSA algorithm,
who were, again, trying
to think of schemes
that other people perhaps couldn’t break,
but mainly working in these small groups
and mainly not having any formal way
to reason about the schemes, but, again,
always trying to be more
clever than the other side.
Also, into the ’80s, you
had this mental model
continuing, where there’s
this clean division
between who should learn everything,
who’s allowed to see
everything, and on the
other side, the bad guy, who’s supposed
to learn absolutely nothing.
Now, what about modern cryptography?
Let’s say roughly since the 1980s,
or mid-1980s, perhaps, or maybe putting it
another way, what are
we cryptographers doing?
Haven’t we already solved the problem
of secure communication?
Isn’t this a done deal?
We all know how to encrypt.
We all encrypt every day.
What’s left to do?
Why are they paying me to do my job?
Well, in fact, modern
cryptography is distinguished
in several ways from this picture that I
presented a few minutes ago.
It’s distinguished in many different ways from this classical view of cryptography.
First of all, it turns
out that cryptography
is about much more than encryption.
The encryption is only a very small part
of what cryptography offers.
Secondly, what’s
especially developed since
the 1980s is a much more rigorous approach
to the subject.
Rather than having this process of people
trying to be more clever
than their attacker,
we have a more rigorous
approach to the subject
as a whole, as I’ll talk
about in a few slides.
What I think is most interesting,
and what’s perhaps most
relevant to what I’m gonna
be speaking about today, is the fact
that cryptographers now deal with a
much richer class of trust relationships.
We no longer have this
very simple picture,
like I presented on the previous slide.
Let me elaborate on each
of these a little bit.
First of all, modern
cryptography, as I said,
now has a much broader scope than
shared key encryption.
I mentioned already
public key cryptography.
This is the idea, actually, that enables
secure communication over the internet,
where parties don’t
even need to necessarily
share any keys in advance in order
to communicate securely.
But that’s just another
example of encryption,
of ways of achieving
secret communications.
But cryptography goes
beyond that, as well.
Cryptography is also concerned with things
like data integrity, making sure that data
isn’t modified by an active attacker
who’s trying, perhaps, to make one
of the parties receive a message that
the other party didn’t actually send.
It also relates to things
like entity authentication
and proving that you’re
the person you claim
to be at the other end of
a communication channel.
Also, handling much
more complex protocols,
like the ones I’m gonna
be talking about today.
It’s a bit difficult for
me actually to come up
with a good definition for the full scope
of what modern cryptography
nowadays deals with,
but the best definition that I’ve been able to come up with, and I’m open to suggestions for better ones, is that it’s about
the design, analysis and implementation
of mathematical techniques
for securing information,
systems and distributed computations
against adversarial attack.
Again, this is much, much broader than encryption because we’re not only
than encryption because we’re not only
talking about, number one, privacy,
we’re talking about general
notions of security,
including things like
integrity and other things.
But it’s also not only
talking about communication
of information.
It’s also referring to systems as a whole.
It’s also referring to
the underlying computation
going on within that system, and providing
protections for that
computation as it proceeds.
The other thing that’s been amazing about
cryptography, since the 1980s or often
maybe a little bit earlier,
is that classically,
encryption, and
cryptography more generally,
was primarily the focus
of people in the military.
It was used primarily
in a military context.
It was not widely used
outside of that domain.
But nowadays, cryptography
is really ubiquitous.
I can say almost for
sure, and I’d be surprised
if there’s an exception, that everybody
in this room has used cryptography
multiple times today.
Cryptography is used, for example,
as part of password-based authentication.
If you typed in a password today to log into any system, I’m sure it’s using cryptography to do that. Password hashing at the back end of the server also relies on cryptography.
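Just as a hedged illustration of the password-hashing idea (the parameters and salt handling here are illustrative choices, not a production recipe):

```python
import hashlib, hmac, os

# The server never stores the password itself -- only a random salt and
# a deliberately slow, salted hash; login recomputes and compares.
def hash_password(password, salt=None):
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored)  # constant-time comparison

salt, stored = hash_password("hunter2")
print(verify_password("hunter2", salt, stored))   # True
print(verify_password("hunter3", salt, stored))   # False
```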
If you’ve ever bought anything online, that’s a process that involves a way
to securely transmit
your credit card number
from yourself to the merchant without
allowing an eavesdropper to figure out
or learn your credit card number.
That relies a lot on cryptography.
Encrypted wifi.
That relies on cryptography, as well.
If you’ve downloaded a software update,
an update from Microsoft
Windows, for example,
that’s digitally signed so that it can
be verified by your computer before
the software patch is installed.
There are even more complex examples
that maybe people in
this room haven’t used,
but that go beyond these
relatively simple examples.
Things like full disk encryption.
That’s available now.
You can download it and you
can encrypt your hard drive.
Or even much more complicated things.
Has anyone heard of Bitcoin?
Bitcoin is a great example of a
distributed protocol that relies on
very sophisticated cryptography underneath
to ensure the integrity of the computation
going on underneath, that
underlies the protocol.
But this is something else that’s changed a lot. None of these uses of cryptography existed before 1980 or so.
Now, I mentioned also that,
historically speaking,
cryptography was largely
this art where there
was this heuristic design and
analysis phase that went on
where people would
build protocols and they
would build encryption schemes,
and then they would release them,
they would, perhaps, talk about them
with their friends.
In turn, others would try to break the resulting cryptosystem.
Like I said before, seeing if they
were more clever, if they
could come up with an attack.
If they found an attack, then the scheme
might be tweaked a little bit to prevent
that attack, and then the
process would iterate.
In the late 1970s and early 1980s,
especially, the field began to develop
into much more of a rigorous science.
I think there are really three principles
that underlie this development,
from the heuristic approach to this
more of a scientific
foundational approach,
which are, first of all, an emphasis
on formal definitions.
This means coming up with precise
mathematical models of what the
scheme you’re developing should be doing,
along with mathematical definitions
of what it means precisely for the scheme
to qualify as being secure.
This is actually, I
think, a big leap forward.
People often underestimate the value
and the importance of definitions,
but I think actually
a lot of the fuzziness
that was present in cryptography before,
say, the 1980s was addressed
exactly by people coming up with these
rigorous definitions that allowed people
to pin down exactly what
it was they were studying,
and thereby make progress
on developing schemes
that could meet those definitions.
Cryptography is also, a
little bit unfortunately,
but this is the way it is, relies on
computational assumptions.
These are assumptions about the hardness
of certain problems.
You may all have heard
about the assumptions
that factoring large numbers is hard.
This is something that we
don’t know how to prove.
We don’t know any way to prove that you can’t factor large numbers in a short amount of time, but even after intensive study over many years, over many decades, we still have no efficient algorithm for factoring large numbers.
So we take it as an
assumption that, indeed,
this is a hard mathematical problem,
and that there’s no efficient algorithm
for, indeed, solving this.
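As a rough illustration of why the assumption is plausible, here is trial division, the naive factoring algorithm, in Python. It takes roughly √N steps, which is exponential in the bit-length of N; the toy primes below are for illustration only:

```python
import time

# Trial division: about sqrt(N) steps, i.e. roughly 2^(n/2) for an
# n-bit N -- hopeless for the ~2048-bit moduli used in practice.
def trial_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # n itself is prime

p, q = 999_983, 1_000_003   # toy primes; real RSA primes are ~1024 bits
start = time.time()
print(trial_factor(p * q), f"({time.time() - start:.2f}s)")  # 999983
```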
We can then rely on those assumptions
to construct schemes that
meet the definitions.
In turn, we can come up with proofs
of security showing that these schemes
actually meet the
definitions that we’ve given
under the stated assumptions.
This also is a very powerful technique,
a very powerful approach
that went a long way toward getting out
of this design-break-patch cycle
that I talked about earlier.
Now rather than just trying to be clever
and coming up with a scheme that
you weren’t able to break
in a couple of weeks,
you now have a rigorous mathematical proof
that your scheme is secure.
That tells you that within the definition
that you were working with, and assuming
that the assumption you
stated indeed holds,
there’s gonna be no way for anybody
to come up and attack the scheme.
This is real progress, I think.
Actually, it’s been recognized as such
by standard bodies, who
now essentially require
proof of security for any schemes
that are being proposed
for standardization.
These are the themes I stress, actually,
when I teach courses in cryptography.
Actually, writing my textbook forced me to think about really what the three principles are, or what the principles are in general, that distinguish modern from classical cryptography, and going through that process helped me a lot to refine them.
If you’re interested,
you can buy the textbook.
(laughter) You can also
watch the free MOOC.
As we said earlier, you
can go online and sign up
for free and watch the videos online
and listen to me talk
for 20 hours straight.
What could be better? (laughter)
I wanna come back to this other issue,
which is the classical trust model,
where there’s a clean division between
the good guys and the bad guys.
It’s a very nice model,
but it doesn’t capture the reality
in which cryptographic
protocols are being run today.
First of all, we may
have protocols being run
in much larger systems
involving more than two parties.
Now, that’s not necessarily new.
People did talk about group communication
before 1980, but nevertheless, it’s much
more common nowadays to have protocols
involving multiple people.
Once you involve multiple
people in a protocol,
these trust relationships between them
get a lot more complicated.
For example, we might have Alice
and Carol, who perfectly trust each other.
They’re friends, they know each other,
and they perfectly trust each other.
They wouldn’t mind sharing
everything they know
with each other.
Maybe it’s also the
case that Carol and Bob
know each other, and
they also may be willing
to trust each other with
everything that they know.
But this doesn’t imply that Alice and Bob
know each other, and it doesn’t imply
at all that Alice and
Bob trust each other.
It may very well be the case that Alice
is not willing to have her data
be revealed to Bob.
This is a little bit more complicated
than what we had before
because now we no longer
have this clean boundary between who
the good guys are and
who the bad guys are.
Actually, in this
picture, maybe there’s no
attacker, per se, there’s no devil,
but it’s the case that
certain people wanna
protect certain information
from other people.
The requirements or the desires of the
different people can
vary, and they can vary
very much from person to person.
You have more complicated
examples, as well.
For example, maybe it’s the case
that we’re worried about security
or a particular user
running this protocol.
For example, this user
down at the bottom left.
From her point of view in executing
this protocol, maybe she knows everybody; they may not be best friends, but she knows everybody running the protocol from work or what have you.
Nevertheless, she’s not
really willing to assume
that everybody is completely honest.
Now, she may be willing to assume that it’s very likely that half of them, say, are honest, but she doesn’t know which half.
It could be the case
that these three people
indicated here are willing
to act dishonestly.
Or maybe it’s the case that
they’re perfectly willing
to act honestly, but
their machine is corrupted
by a virus.
Even though they’re perfectly willing
and they want to act honestly,
something going on in their computer
that they’re even unaware of is making
their computer deviate from the protocol
as the protocol was running.
As I said, the problem is that
this person at the lower left has no idea
which of those three parties that
she’s interacting with might be malicious.
It may be a different
set of three parties.
All she knows, or all she’s willing to assume, is that some three out of the six other people she’s interacting with are honest, but she has no idea which ones.
This is another example of a more complex
trust relationship that we’d like
to be able to deal with.
In fact, these issues, these ideas
of having more complicated relationships
are not specific to the group setting.
They come up even in
the two-party setting.
Imagine, maybe it’s easier this way,
rather than having two people interacting,
we have maybe two companies interacting.
These companies are trying to negotiate
some contract.
In the classical picture of encryption,
the person you trust,
you would like to share
everything with, but in this context,
that may no longer be the case.
Maybe one company is willing to share
their net revenue with the other company
as part of this
negotiation, but they’re not
willing to reveal their customer list.
It’s not the case that
just because I trust,
in some sense, the other person that
I’m interacting with
that I’m willing to share
everything with them.
There’s certain things
I’m willing to share,
and certain things I’m
not willing to share.
In addition, this trust relationship may
be asymmetric.
Maybe the other company
is not even willing
to share their net revenue.
They need to decide for themselves
what they’re comfortable sharing
with the other party.
These kinds of issues, this richer class of trust models between people, are something that cryptographers nowadays spend a lot of time thinking about, designing protocols that can deal with them.
Now, what does this
have to do with privacy?
I guess it’s already
clear because I mentioned
privacy a number of times.
I just wanna say briefly a couple of words
to set the stage about privacy,
especially nowadays.
To note, first of all, that concerns about
privacy, in general, are not new.
They’re traditionally dated to about 1890,
actually, to a Harvard Law Review article
published by Warren and Brandeis.
Louis Brandeis actually was the one
who was supposed to be more involved
in writing the article, but anyway,
he went on to be a Supreme Court justice.
This was before he was
a Supreme Court justice.
He wrote an article trying to understand,
basically, what right to privacy there was
and to what extent the right to privacy
followed from the US Constitution.
It’s interesting, it’s almost a little quaint, right, to look back at the time and see what he was concerned about.
He was very concerned about the advent,
the widespread use of photography.
Now it meant that people
could take pictures
of other people in the street.
Who could imagine?
The growing popularity of newspapers,
where people would publish, perhaps,
salacious stories about others.
Coming kind of full circle, in 2015,
DARPA kicked off the Brandeis program,
named after Louis Brandeis,
whose goal, actually, was to provide users
with some measure of control
over their own privacy.
I have a particularly nice quote from the program’s BAA, actually, that I pulled out here.
Saying that, fundamentally,
“Democracy depends
“on creativity and free
interchange of diverse ideas.
“Constant observation,”
lack of privacy, “has a dampening effect
“on individuality, promotes conformance
“and inhibits personal development,
“freedom of thought and speech.”
It’s amazing, actually, to think about
this coming from the DoD.
DARPA is part of the DoD hierarchy.
This is the Department
of Defense talking about
how important privacy
is to a free society.
Now, privacy itself is
actually quite a fuzzy term.
When people talk about
privacy in the media,
or even if you’re thinking about privacy,
it’s not always clear to what exactly
they’re referring and what the limits
on privacy should be.
There are a ton of really
interesting questions
in this area, actually.
Just to name one, like I
was hinting at earlier,
there’s this fundamental
question of what right
to privacy we have.
In particular, as US citizens,
what constitutional right
to privacy do we have?
What privacy rights do
we have that actually
flow from the Constitution itself?
Even if we concede, or once we
figure out exactly what those rights are,
we might ask, “Well, in what conditions
“can those rights be overridden?”
That’s a particularly
important debate that
should be happening today.
You can also ask, from
a societal perspective,
are our expectations of privacy changing?
A lot of people argue that now that
everyone’s putting everything they do
on social media and
publicizing, essentially,
where they are at every
moment, they actually
don’t want privacy.
They prefer something that’s certainly
very different from the kind of privacy
people might have thought about
15 or 20 years ago.
Now, I can
shortcut this by saying,
well, I’m not gonna answer
any of these questions.
I’m not a philosopher.
I’m not a lawyer.
I don’t work in public policy, per se,
but actually, let me say in particular
about the second, well, maybe the first
and second bullets, regarding what right
to privacy we may have today and under
what conditions they can be overridden.
I said earlier that it’s
a very important debate
that we, as a country, should be having.
As a cryptographer, I don’t really view it
as my role, per se, to come out with a
policy recommendation
one way or the other,
but what I do view it as my goal to do
is to present the options and to make sure
that people on both sides of the debate
are technically informed.
Now, a lotta people are worried nowadays
about privacy, in particular because
of government surveillance and because
of the revelations that
came out a few years back.
It’s not hard to find examples of this,
of reports of NSA collecting phone records
of US citizens.
People taking people’s cellphones
and looking through the
contents of those phones.
This happened; this particular article is talking about taking the phones of people who were attending a rally.
There are other examples
of people’s phones
being taken when they’re
entering the United States
at a border crossing.
But, in fact, I would say that
while it’s certainly fair game to worry
about this sort of thing,
and it’s certainly,
like I said earlier, the kind of thing
that we need to be having
a public discussion about,
from everything I’ve seen,
the government is doing their best
to do things within the law.
There are certainly many people within
government who take
these legal constraints
very seriously.
Again, while it’s something
to be concerned about,
what concerns me much more, actually,
is corporate surveillance.
We all know that corporations nowadays
are collecting tons of information
about each of us.
The scariest way that I
know how to express it
to myself, I think, is that Facebook,
Google and Apple probably know more
about you than your family does,
than your spouse or your partner does.
Google, for example, knows every
internet search you’ve done, assuming you
were logged in, let’s say, at the time.
If you have an Android phone, they also
know everywhere you’ve been.
Does everyone in your family know
everything you search for
and everywhere you’ve been?
Probably not.
That’s a scary thought.
It’s really eye opening
to think about things
in that way.
Do you really want, do
you make a decision,
a conscious decision, to trust any
of these corporations more than you trust
some of your own family members?
Actually, I grabbed
this interesting article
last year.
In fact, Google may know you better
than you know yourself.
This is an interesting
read if you wanna read
an article about it.
Basically, arguing that we have these,
delusion is a strong
term, but we have these
delusions about what we do.
I’m a hard worker.
But Google knows exactly how much time
you spend working, how much time you spend
reading your mail. (laughter)
It’s an interesting thought, as well.
There are many other examples.
You see these things all the time.
Actually, what’s great about giving a talk
like this is that as I
was preparing my talk,
I kept on seeing news articles
coming out daily just
reinforcing this idea
that companies are
collecting more and more data
about us.
There’s a famous story about how Target
exposed a teen girl’s pregnancy.
This was many years back.
Basically,
the girl’s father found out that she
was pregnant from Target before the girl
told her father.
That’s amazing.
The collection of data can also have negative impacts on people, negative economic impacts.
For example, differential pricing
is a real concern now.
The idea here being
that if a company knows
exactly how much you
spend and on what items,
and not only that, but they know how long
you’re spending in a particular site,
how long you click around
looking for a good deal,
they can use all that information to now
give you, essentially, the
maximum price at which you’ll buy an item.
It might be different
for different people.
You have different
people looking for, say,
an airline ticket.
They’ll get displayed different prices
because the companies
have so much information
about us that they
essentially know in advance
how much we’re willing to pay.
This also is something that, like I said, just came out in the news yesterday, I suppose: Verizon wanting to collect more data from people in order to do better advertising, of course.
The problem’s only getting worse.
The problem’s getting
worse as we have more
and more devices being
installed in our houses,
and as we’re wearing devices on our bodies
that are continually collecting
information about us.
You can see this even,
again, some fascinating
cases about whether Alexa,
for example, whether the data from Alexa
can be used as evidence against somebody
at a murder trial.
These are kind of things
that just didn’t happen
even a few years back
because people didn’t
have these devices in their houses,
essentially recording everything that
they’re saying and doing.
Now, what’s cryptography’s
role, or what can
cryptography’s role be in all of this?
I think there are really
two different roles
that cryptography can play.
The first, as I said earlier,
is with this emphasis
on definitions, I think cryptography
can really help us
to replace this fuzzy model
we have about privacy.
There are many different
concerns, and many
different people have different things
they’re worried about.
What cryptography can
help us do a little bit
is to reason about that
in a more formal manner.
It doesn’t mean we have to necessarily
write down mathematical definitions,
it doesn’t mean we have to give proofs, but at least it makes us think a little bit
in a more rational
and focused manner about what exactly
we mean by privacy in
a particular context.
Perhaps more interestingly, at
least from my point of view,
is that cryptography can help us achieve
certain levels of
privacy that we currently
are not achieving by
developing new algorithms
and protocols that can allow people to do
or to achieve things that
they’re already achieving,
while also adding additional
privacy guarantees
on top of that.
This is really becoming
increasingly important,
as I indicated with the news articles
on the previous slide,
with this huge volume
of personal data that’s being collected,
aggregated, stored, shared and used about
every one of us in this room.
What I wanna focus on in particular here
is the data collection aspect.
The fact that these
companies are collecting
all of this personal
data about each of us,
and also the aspect of how this data
is, in turn, being used
by other companies.
I’ll focus on the collection first,
and then I’ll come back to
the issue of data usage.
Again, I found this wonderful article,
wonderful for me because it gave me
a springboard, something to tie it into.
This was in just last week’s newspaper, an op-ed talking about the 21st Century Cures Act, which Congress actually passed some time before last week. Essentially, the law tries to create
what they call an information commons
that’s gonna be a
government-regulated pool
of medical data about individuals
that’s going to be accessible to all
health researchers.
I wanna just try to model that.
This is not actually how
the information commons
is working, but we can think about it
as being something like the following.
We may have some hospitals, some local
or regional hospitals,
that patients come to.
During the course of their medical care,
data will be collected about them.
Then we may have researchers at NIH
who want to perform some studies,
perhaps using some data that they’ve already collected from subjects they’ve worked with themselves, but also using the data that’s been collected about other people at various hospitals around the country.
The way this might work,
again, at a rough level,
is that under this act, those hospitals
would be required to
share, somehow or another,
data about their patients with the
medical researchers who
wanna conduct the study.
They would then take that
data that’s being sent
to them from the hospitals,
pool it with their
own data that they hold, and compute
some scientific report.
The issues here are twofold.
There’s data collection, the fact that now
NIH at some moment in
time has collected data
about hundreds or thousands of patients
that it’s storing and computing over, and also this data usage.
When they publish this
report, how do I know
that they’re not gonna highlight some
medical condition that
I have, or how do I know
that the report won’t
reveal to somebody else
that I came in for a certain
test on a certain date?
It’s easy to understand why people wanna
collect data in general,
’cause it goes not only
for the medical researchers, but also for
these companies who ultimately
wanna sell you things
or sell advertising.
As the volume of data collected goes up, the utility to those companies also goes up.
If they know more about you, they can
charge you a higher price.
If they know more about your habits
and what you’re interested
in, they can charge more
to advertisers to display
a certain ad to you.
But, of course, at the same time
as the amount of data collected goes up,
the privacy of each of us goes down.
There’s a fundamental tension here,
or there seems to be
a fundamental tension,
between the utility and the privacy.
But the real question is do we need
to collect all this data in one location
in order to get the utility
that these companies want
or that the medical researchers want?
Is collecting the data the only way
to derive the utility that
we want from that data?
Or can we come up with other approaches
that avoid the need for
central data collection
in the first place?
This brings us to a favorite
research topic of mine,
multiparty computation.
A multiparty computation protocol
is gonna be a distributed
protocol run between
some number of different entities,
some number of different parties or people
or computers, where each of those people
holds their own local private input.
It’s an interactive
protocol, so they’ll all
exchange messages with each other
as part of running the protocol.
At the end of the protocol,
say, one of the parties
will output some function
of everyone else’s data.
You can view this, for example,
as taking data from several patients,
or maybe these people
are actually hospitals,
and they’ve aggregated some data locally.
Then they wanna run some study, they wanna
run, perhaps, some
machine learning algorithm
over that data.
The function f might, in that case, be a function that learns a particular model.
The goal of this protocol
is to learn that model.
So far, I haven’t said
anything about privacy
or anything about security.
But how might we define
a notion of security
in this setting?
What might it mean if
there were a protocol
that allows these people
to collaboratively compute
over their local data?
What might it mean for that protocol
to be called secure?
Well, we could go ahead and list
several desired properties that we want,
where you could talk
about, well, I want this
person’s input to be
secret from this person.
I wanna make sure that they
compute the correct output
or something like that,
but it turns out to be, number one,
very difficult to do that.
Also, maybe more important, by doing that,
you might end up missing some properties,
and you might never be convinced that
you didn’t leave something off your list
that you actually care about.
In addition, some of these properties
are rather subtle to define.
If I just say something
like, if I’m player one,
I wanna make sure that
nobody learns anything
about my input, well, it turns out that
in general, you’re not
going to achieve that
because by the very
nature of the computation,
you learn the output.
The output depended on my input.
There’s gonna be some correlation between
my input and that output.
It’s very difficult, actually, to sit down
and think about how you might define
what it means to learn only that value
and nothing else.
Now, cryptographers, over
the course of many years,
have developed, actually,
a way of defining
security in a setting that addresses
all of these concerns.
In particular, what cryptographers imagine
is what would be the ideal case?
What would we like to achieve if we
didn’t have to live in the real world?
(speaks too low to hear)
If we didn’t have to
live in the real world
and we could imagine that ideal model,
what would that ideal model look like?
The ideal model, what it might look like
is that we have a central
trusted authority.
I have to tell you that
it’s not easy these days
coming up with a picture
of something that people
might even consider
a central trusted authority. (laughter)
I chose the Supreme Court,
but I chose that maybe three years ago.
(laughter)
That’s a discussion for another time.
But imagine we did have a
central trusted authority
that we all agreed was trusted to do
what it’s supposed to.
Then what we might do in that case
is we might actually be willing to trust
that entity with our data.
What we would do is we
would all send our data
to that single trusted entity.
That trusted entity would then perform
the computation over the data,
and then send back the result to either
all of us or one of the parties.
We’ll define a secure
protocol in the real world,
where parties are exchanging messages
between each other, and
there is no trusted party.
We’ll define a protocol
like that to be secure
if it provides a simulation of what we
achieve in that world
with a trusted party.
Now, this word simulation has a very
technical definition in cryptography.
I’m not gonna go into
it, but the basic idea
is just telling you that any property
you achieve in this ideal world
with the trusted entity will also
be achieved by the real-world
execution of the protocol.
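In symbols, one common way the simulation-based definition is written (this notation is a standard convention, not taken verbatim from the talk):

```latex
% Protocol \pi securely computes f if, for every efficient real-world
% adversary A, there is an efficient ideal-world simulator S such that
\[
  \bigl\{ \mathrm{IDEAL}_{f,\mathcal{S}}(x_1,\dots,x_n) \bigr\}
  \;\approx_c\;
  \bigl\{ \mathrm{REAL}_{\pi,\mathcal{A}}(x_1,\dots,x_n) \bigr\},
\]
% where \approx_c denotes computational indistinguishability of the
% adversary's view together with the honest parties' outputs
% in the two worlds.
```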
In particular, if we
go back to this picture
of the people communicating
over the network,
and one party learning
this output value y,
well, just like we said, that’s going
to be giving you the
same security guarantees,
even if all these other
people are corrupted.
We’re concerned about
the security guarantees
being provided for this
person on the lower left.
It’s gonna provide that
person, that party,
with the same exact guarantees that they
would have in an ideal
model where everybody
sends their inputs to the trusted party, which computes the output itself and sends it back to, in this case, the person on the upper right.
If you think about what that means, it’s that the execution of the protocol
about the input of the
party on the lower left,
beyond what’s revealed by the inputs
of the corrupted parties and the output
that they learn from the protocol.
If you wanna compute
f, then that’s the best
you can hope to achieve.
We have this beautiful
definition of what it means
to have a secure computation protocol.
Are these protocols feasible?
Do they exist?
Well, it turns out, actually,
that a long line of research has shown
that under a small set of
reasonable assumptions,
in fact, secure computation
of any function is possible.
You can take any function you’d like,
the most complicated
machine learning algorithm,
the most complicated
statistics or even maybe
the most complicated advertising
generation algorithm,
you could plug it into
one of these protocols,
and you could design a protocol that would
exactly learn the function
you’re trying to compute
without unduly harming the privacy
of any individual party’s input.
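To give a feel for the flavor of these protocols, here’s a classic textbook toy: a secure-sum protocol based on additive secret sharing, in the semi-honest model. The inputs and modulus are made up for illustration; this is not any specific protocol from the talk:

```python
import secrets

P = 2**61 - 1   # a public prime modulus; all arithmetic is mod P

def share(x, n):
    """Split x into n random additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

inputs = [42, 17, 99]                       # the parties' private inputs
n = len(inputs)
all_shares = [share(x, n) for x in inputs]  # all_shares[i][j] goes to party j

# Party i sends its j-th share to party j; each individual share is
# uniformly random, so no single share reveals anything about an input.
received = [[all_shares[i][j] for i in range(n)] for j in range(n)]

# Each party announces the sum of the shares it received; adding the
# announcements yields f(x1, x2, x3) = x1 + x2 + x3, and nothing more.
announced = [sum(r) % P for r in received]
print(sum(announced) % P)                   # 158
```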
If we go back to this example of the
medical researchers, that would mean
that rather than having the status quo
where, say, the hospitals are sending over
all their data to the NIH, and then
the NIH suddenly has information about
hundreds or thousands of patients,
what they could do instead is they could
design a secure computation protocol
to carry out whatever study the NIH
is interested in conducting.
They would replace sending that data with an interactive protocol, in this case, between the two hospitals and NIH,
that would allow NIH to compute,
to run the study and generate the report,
without ever having
their hands on any data.
In particular, again, just like in the
ideal world, giving the guarantees that
the only thing the NIH
learns is what’s generated
in the report, and nothing else.
Now, we said that these protocols exist.
What can we say about efficiency?
Do we have any hope of running them
or using them in practice?
Now, secure computation, the general idea,
was introduced in the 1980s.
Like I said, there was a long line
of research following up on that,
continuing until today,
developing better and better protocols.
I wasn’t around or I wasn’t active
at the time this work was going on,
but I think if you read
the papers at the time,
you get the sense that
in the ’80s and ’90s,
people viewed this as
being very interesting
from a theoretical point of view,
but as being hopelessly impractical.
In fact, it wasn’t until 2004 that there was the first implementation of generic two-party computation, even in the very simple case where we assume the parties act honestly, but only try to learn information after the fact from the interaction.
Even that implementation
was quite inefficient,
but the main point, and what resulted in a paper, and quite a nice paper, was just the fact that they were able to do it at all.
But over time, of course,
progress in this area
has continued.
Some of the work done by myself, in collaboration with students and postdocs, has helped to push the boundary and make things more efficient beyond what they previously had been.
If you look, there’s been steady progress
over the years.
The Fairplay paper was that one in 2004
that I mentioned, and there’s some work
in 2009 and 2010.
The charts show, basically, the performance: on the left, in terms of the number of gates of a circuit that you can compute in a certain period of time; on the right, essentially, how large a program, how large a circuit, they were able to compute.
We had some ideas in 2011 that led
to a dramatic improvement, actually,
in performance and scalability, and I think it maybe gave people the impression that this thing actually had a possibility of being more practical, and encouraged a lot of followup work, actually, pushing this even further.
Now, this is in the semi-honest setting
where, as I said before, this assumes
that the parties are following
the protocol honestly,
they’re not deviating at all.
All they’re trying to do is potentially
learn information that
they shouldn’t be learning
from the execution of the protocol.
Work continues on developing protocols
with stronger guarantees, as well.
The model we’d actually
really like to be using
in practice is the
so-called malicious model,
where we make no assumptions about how
the parties are gonna be behaving during
the protocol, and we make no assumption
that they’re gonna be following
the protocol honestly.
They can deviate arbitrarily from what we
tell them to do.
You can see here, a steady march,
this is on a logarithmic scale, actually,
of the time required to perform
an AES computation.
AES is a particular cryptographic block cipher.
This is a secure two-party
computation of AES,
going from 2009.
The last two works are by myself
and students of mine, postdocs of mine,
over the last year, showing
really dramatic improvements
in the running time over the course of,
I guess, almost not quite 10 years.
It’s really an improvement that could have
a significant impact in practice because
it goes from something, the computation
took about 18 minutes
(clears throat) in 2009,
whereas today it takes
about 37 milliseconds.
This is a really dramatic improvement.
Again, people are now really seriously
starting to think about
applications of this
in the real world.
I’ll just mention that
we have also some work
extending this to the multiparty setting,
running with hundreds of
parties all over the world,
and essentially running
a secure computation
of AES, like we had before,
among 100-plus parties
across five continents in a
time of about two seconds.
I guess that’ll be improved, maybe Xiao has already improved it. It’ll be improved, I’m sure, in the next couple of years. (clears throat)
Now, I wanna talk
briefly, I wanna come back
to this issue of privacy of data use.
We’ve been talking a lot about
how secure computation
and how these protocols
can reduce the need to
collect personal data.
Basically, because the protocols allow you
to do computations over private data
without the need to centrally collect
all the data, to collect
that data in any one place.
But we still need to reason about the computation itself.
I said earlier that the guarantee
provided by secure computation is that
the execution of the
protocol in the real world
is as secure as the
execution of the protocol
in the ideal world.
But the ideal world does leak something. The ideal world leaks the value of this function over everybody’s input.
Depending on the function,
that may actually
reveal quite a bit of information
about an individual party’s input.
We’ll come back to this example again
of the medical researchers.
Even if we’re doing everything
using a secure protocol, and so the data
itself is never sitting in one location
at any point in time, the problem
or the concern is that the report itself
may leak information about
particular people’s information.
There are some techniques that have
been introduced over the past decade or so
to try to address this.
I wanna focus on one called
differential privacy.
This was not introduced by me.
This was by Dwork, McSherry, Nissim
and Smith in 2006.
They came up with a definition,
which is again illustrating both
part of the difficulty of coming up
with these definitions, that it took until
2006 for people to formulate something,
but also the utility of having these
definitions in place
once you’ve gone through
the work of developing them.
Basically, the idea they
were trying to capture
is that a particular
computation is private
if the output of the computation is not
too sensitive on any single user’s data.
A little bit more formally, what they did
was they would compare what you would get
by computing the function
over the original dataset.
Here I have a table
with some medical data,
perhaps, of six different patients.
If we run some computation over that data
and learn something
from it and publish it,
what we’re gonna do is
we’re gonna compare that
to what we would get if we eliminated
one user’s data, and then ran the same
computation over that.
Only if those two outputs are close
in a particularly defined way
will we say that the
computation is private.
‘Cause then, by
definition, it’ll mean that
the computation was not too sensitive
to the input of any particular user.
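Formally, the standard definition of Dwork, McSherry, Nissim and Smith reads as follows, where ε is the privacy parameter (this is the usual textbook statement, not a slide from the talk):

```latex
% A randomized mechanism M is \epsilon-differentially private if, for
% every pair of datasets D, D' differing in one person's record, and
% every set S of possible outputs,
\[
  \Pr[\,M(D) \in S\,] \;\le\; e^{\epsilon} \cdot \Pr[\,M(D') \in S\,].
\]
% Small \epsilon means the output distribution barely depends on any
% single person's data.
```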
Now, differential privacy has become quite a popular (mumbles).
It provides very strong guarantees.
The main drawback of differential privacy
is that it’s nowadays viewed, perhaps,
as being too strong.
In particular, an example
that highlights this
is that if Google wanted to publish
the number of visitors to its website
in a month, that would actually not
be differentially private.
It’s kind of easy to see why.
If I take one user out of that sample,
that will shift, or if
I take one search query
out of that sample,
perhaps, that will shift
the output that I release by one.
That’s a change, that’s
a noticeable change
in the output, depending on whether or not
you visited Google that month or not.
But on the other hand, that
seems a little bit ridiculous.
I don’t think anyone
would claim this is really
a violation of privacy
because there’s no way
that an attacker would possibly be able
to know the search
habits of everybody else
in the world over that period of time,
and thereby be able to
learn anything about
whether or not you made a particular
search query or not.
Now, a little bit more rigorously,
a lot of people have been showing bounds
on the best achievable accuracy
that differentially private
mechanisms can achieve.
Essentially, we know
that while it provides
a very strong guarantee of privacy,
you take a big hit in the accuracy that
you’re able to obtain in the
model, essentially, that
you compute over the data.
In work in 2013, along with Adam Smith, and a student and a postdoc of mine, Raef Bassily and Adam Groce,
we looked at ways to relax
this notion of differential privacy
and come up with a more
workable definition,
one that would allow for better protocols,
but that would still achieve a reasonable
measure of privacy.
We continue to have a
definition to compare
the outputs of some mechanism with
and without some individual user’s data.
The big difference is that we explicitly
took into account
an external observer’s, or an attacker’s,
if you like, uncertainty
about the rest of the data.
In the Google example, that would mean we explicitly take into account, in the definition, the fact that it’s unlikely that any single observer will know the exact statistics of who visited Google that month.
We can kinda take that into account
when designing mechanisms
while still ensuring
that they’re private.
In particular, we showed
also that our definition,
in addition to various
properties that it had,
allows, actually, for
noiseless mechanisms.
This is important because
differential privacy
tends to add noise to whatever
computation you’re doing,
basically blurring the answer somewhat.
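Here’s a minimal sketch of the standard Laplace mechanism for a counting query, showing where that noise comes from; the dataset and epsilon are made up for illustration:

```python
import numpy as np

# The Laplace mechanism for a counting query: the count has sensitivity
# 1 (one person changes it by at most 1), so adding Laplace noise of
# scale 1/epsilon gives epsilon-differential privacy.
def dp_count(data, predicate, epsilon):
    true_count = sum(1 for row in data if predicate(row))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

patients = [{"age": 70, "diabetic": True},
            {"age": 34, "diabetic": False},
            {"age": 58, "diabetic": True}]
print(dp_count(patients, lambda r: r["diabetic"], epsilon=0.5))  # ~2 + noise
```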
But we showed that with
respect to our definition,
you could actually not give up anything
in terms of accuracy and
not add any noise at all,
while still achieving a reasonable notion
of privacy in some settings.
I wanted to just conclude in the last
five minutes or so by talking about
the real-world impact
of some of this work.
I should be clear, I’m
not talking directly
about my own work, I’m
talking more broadly
about the field of
privacy-preserving computation
as a whole.
It’s been really
amazingly gratifying to me
to see something that I’ve been working on
for 15 years or so now just starting to make its way into the consciousness
of the public, as well as
real-world applications.
I can mention for starters that there’s a lot of interest nowadays in this, over the last five years or so in particular, from different funding agencies.
DARPA and IARPA have very large programs
dedicated to
privacy-preserving computation.
The Air Force has looked into using
secure computation protocols for
the computations that they run
with other countries.
They’re actually interested in, or at least exploring the possibility of, using these protocols for real-world applications.
I have also been interacting with people at the Department of Treasury and the Office of Financial Research who are interested in trying to use these ideas to perform better, privacy-preserving computation of various economic indicators.
They’re very concerned about privacy
of the people that they’re studying.
They’re concerned both about individuals’ privacy, and also the businesses and companies that they interact with are concerned about their own privacy, about not giving a competitive edge to the other people working in their sector.
The Department of Treasury was interested
in exploring the application
of these techniques
to some of their problems, which would allow them to get better accuracy in the studies that they’re conducting.
What was really interesting was a
statement that came out of
Senator Ron Wyden’s office
in May of this year.
It was really like music to my ears.
He had in his statement
an explicit call for using
privacy-preserving algorithms
in evidence-based policymaking.
He says, “I strongly urge the commission,”
this is a commission on
evidence-based policymaking,
“to recommend that
privacy-enhancing technologies,
“such as secure multiparty computation
“and differential
privacy, must be utilized
“by agencies and organizations that seek
“to draw public policy-related insights
“from the private data of Americans.”
When you have a senator talking about
multiparty computations
and differential privacy,
it’s a little bit scary.
I’m not sure who put
that language in there.
But anyway, it’s really
gratifying to see that,
in his mind, number one,
that these technologies
are available, and number
two, that they should
be used, if possible,
by these commissions.
In the last few years, we’ve also seen
a number of startup
companies in this space.
I’ll just highlight, well, some of them
are startups and some of them are beyond
startups, maybe, at this point.
I’ll just mention Sharemind in particular
has been around for a little while.
They’re doing a three-party computation.
They’re based in Estonia.
They’ve been using it over the last couple
of years to actually
do statistical analysis
of financial data for
the Estonian government.
They’re using this on real problems
with reasonable performance, as well.
I’ll highlight also the fact that Google
has recently begun using a form
of two-party computation
in order to do ad tracking.
Google interacts basically with various
advertising companies in order to allow
the advertising company
to compute how many
visitors or how many
successful conversions
their ads had on Google’s platform.
Differential privacy has also made its way
into many devices.
Actually, Apple has a
differentially private mechanism
built into the latest phones.
Google has a differentially
private mechanism
built into its Chrome
browser to collect statistics
from the users using their browser.
In general, I will just say that these
are really exciting times in the field.
We’re seeing more and more interest,
and protocols are getting
much, much better.
Really, I hope that we’ll continue to see this increase in people’s knowledge about these technologies, and also in applying them in different contexts.
With that, I will end, and thank you
for your attention.
I’m happy to stay and take questions.
As far as I’m concerned, we can go beyond
the five o’clock limit,
but I’m happy (coughing).
(applause)
– [Samir] Thank you, Jonathan.
We just quickly have time for questions
if people have questions or comments.
If not–
(crosstalk)
Please go ahead.
– [Man] How real is multiparty
computation protocol?
How will you implement it?
(speaks too low to hear)
– You should come talk to me.
One of the things that’s also been great
is in addition to exploring the
cryptographic protocols themselves,
there’s also been a lotta
focus on the usability
of these techniques, and making them
available to non-experts.
We, (speaks too low to
hear) for example, have a
library that’s built on GitHub.
You can download (speaks too
low to hear) computation.
There’s some effort involved
in writing the program,
but it’s not beyond
whatever it would be for writing
any other program.
It wouldn’t be (speaks too low to hear).
– [Samir] Question.
– [Man] Just to be clear, when you
started talking about
real-world applications,
so only just recently has
(speaks too low to hear),
– Yeah, I mean, it depends a little bit
what you mean by recently.
Some of the companies that I had mentioned
go back almost 10 years,
but it’s increasing
over the last few years.
First there was one example,
then there was another example.
Over the last year or so,
there are now several examples.
– [Samir] We also have a reception outside
if you can stick around and have some
food and drink (speaks too low to hear).
Thank you so much for coming.
(applause)
(relaxing music)
