21. Cryptography: Hash Functions


The following content is
provided under a Creative Commons license. Your support will help
MIT OpenCourseWare continue to offer high quality
educational resources for free. To make a donation, or
view additional materials from hundreds of MIT courses,
visit MIT OpenCourseWare at ocw.mit.edu. SRINIVAS DEVADAS: All
right, let’s get started. Good morning everyone. I see a lot of tired faces. I’m not tired. Why are you tired? [LAUGHTER] I only lecture half the time. You guys take the
class all the time. So today’s lecture is
about hash functions. And you may think that you know
a lot about hash functions, and you probably do. But what we’re going to do
today is talk about really a completely different
application of hash functions, and a new set of
properties that we’re going to require
of hash functions that I’ll elaborate on. And we’re going to see a bunch
of different applications to things like
password protection, checking the integrity
of files, auctions, and so on and so forth. So a little bit of
a different lecture. Both today and on
Thursday I’m going to be going to be doing
cryptography and applications, not too much of algorithms. But we will do a little bit
of analysis with respect to whether properties
are satisfied, in this case by hash
functions or not. So let’s just dive right in. You all know what
hash functions are. There’s no real change
in the definition. But the kinds of hash
functions that we’re going to be looking at
today are quite different from the simple hash
functions, like taking a mod with a prime number that
we’ve looked at in the past. And the notion of
collisions is going to come up again,
except that again we’re going to raise the
stakes a little bit. So a hash function
maps arbitrary strings– let me do this right. So you’re not making a statement
about the length of the string. You will break it up, even if
you had a string of length 512, or maybe it was 27, you do
want to get a number out of it. In a specific
range there’s going to be a number of
bits associated with our hash functions. And previously we
had a number of slots associated with the output
of the hash function. But the input
could be arbitrary. And these arbitrary
strings of data are going to get
mapped, as I just said, to a fixed length output. And we’re going to
think about this fixed length as being a
number of bits today, as opposed to slots
in the hash table. Because we really aren’t
going to be storing a dictionary or a hash
table in the applications we’re going to look at today. It’s simply a question
of computing a hash. And because the
fixed length output is going to be something on the
order of 160-bits, or 256-bits, there’s no way that you
could store two arrays to 160 elements in a hash
table, or even two arrays to 64 really. And so we’re going
to just assume that we’re computing
these hashes and using them for
certain applications. I just wrote output
twice I guess. So map it to a
fixed length output. We want to do this in a
deterministic fashion. So once we’ve computed the
hash of a particular arbitrary string that is
given to us, we want to be able to repeat
that process to get the same hash every time. We want to do this
in a public fashion. So everything is public. There’s no secrecy. There’s keyed hash functions
that we won’t actually look at today, but
maybe in passing I’ll mention it next time. We’re not looking at
keyed hash functions here. There’s no secrets in any of
the descriptions of algorithms or techniques I’m going
to be describing today. And we want this to be random. We want it to look random. True randomness is going to
be impossible to achieve, given our other constraints. But we’re going to
try and approximate it with pseudo-randomness. But we’d want it to
look random, because we are interested– as we were
in the case of dictionaries and the regular application
of hash functions– we are interested in
minimizing collisions. And in fact we’re going to
raise the stakes really high with respect to collisions. We want it to be impossible
for you, or anyone else, to discover collisions. And that’s going to be an
important property of collision resistance that obviously is
going to require randomness. And those are the
three things we want, deterministic,
public, and random. And so just from a function
description standpoint you have 0, 1 star here,
which implies that it’s an arbitrary length strength. And we want to go to 0, 1 d. And this is a
string of length d. So that means that
you’re getting d-bits out from your hash function. And here the length is
greater than or equal to 0. So that’s it. Not a lot that’s new here. But a few things that are going
to be a little bit different. And there’s some subtleties
here that we’ll get to. I want to emphasize two things,
one of which I just said. There’s no secrecy, no secret
keys here in the hash functions that we are describing. All operations are public. So just like you had your hash
function, which was k mod p, and p was a prime and p was
public and known to everyone who used the dictionary,
everything here we are going to be
talking about is public. So anyone can compute h. And we’re going to
assume that this is poly-time computation–
not too surprising– but I’m being quite flexible here. When you look at
dictionaries, and you think about using dictionaries,
and using it to implement efficient algorithms,
what is the assumption we kind of implicitly made–
are perhaps explicitly in some cases– with respect
to computing the hash? Anybody? Yeah? AUDIENCE: Constant time? SRINIVAS DEVADAS: Constant time. We assumed– so this is not
necessarily order 1, right? So that’s important. So we’re going to– I want
to make sure you’re watching. So you’re going to raise
the stakes even with respect to the complexity of the hash. And as you’ll see, because
of the desirable properties, we’re going to have to do that. We’re going to ask for
really a lot with respect to these hash functions. Nobody can find a
collision, right? And if you have something
as simple as k mod p, it’s going to be trivial
to find a collision. And so these order
1 hash functions that you’re familiar
with aren’t going to make the grade with respect
to any of the properties that we’ll discuss
in a few minutes. All right, so remember this
is poly-time computation. And there’s lots of examples
of these hash functions. And for those of you who are
kind of into computer security and cryptography already,
you might have heard of examples like MD4 and MD5. These are versions. MD stands for message digest. These were functions that were
invented by Professor Rivest. And they had d equals
128 way back when– 1992, if I recall– when
they were proposed. And these algorithms have
since been broken in the sense that it was conjectured that
they had particular properties of collision resistance that
it would take exponential time for anybody to find collisions. And it still kind of
takes exponential time, but 2 raised to 37 is
exponential at one level, but constant in another level. So you can kind of do
it in a few seconds now. So a little bit of history. I’m not going to spend
a lot of time on this. MD5 was used to create what was
called a secure hash algorithm. This is 160-bits. And this is not quite
broken at this point. But that people consider it
broken, or soon to be broken. Right now the
recommended algorithm is called SHA-3, secure hash
algorithm version three. And there was a contest
that ran for like 18 months, or maybe even longer,
that eventually was won by what turned into the SHA-3. And they had a different name
for it that I can’t recall. But it turned into SHA-3. And what happened along the
way, as we went from MD4, MD5, SHA-1 to SHA-3, is that
this amount of computation that you had to do increased. And the complexity of
operations that you had to do in order to compute
the hash of an arbitrary string increased to the
point where– you want to think about this as
100 rounds of computation. And certainly order
d computation, where d is the number of bits. And perhaps even more. So it’s definitely not order 1. So as I said a little bit
of context with respect to the things that
are out there. At the end of the
lecture I’ll give you a sense for how these
hash functions are built. We’re not going to
spend a lot of time on creating these
hash functions. It’s really a research topic
onto itself and not really in the slope of 6.046. What is in the scope
of 6.046, and what I think is more
interesting, which is what we’ll focus
our energy and time on, is the properties of
these hash functions. And why these properties
are useful in a bunch of different apps. And so what is it that we want? We want a random oracle. We want to essentially
build something that looks like that,
deterministic, public, random. And we’re going to
claim that what we want is this random
oracle which has all of these wonderful properties
that I’m going to describe. I’m going to describe
the random oracle to you, and then I’m going to tell you
about what the properties are. And then unfortunately
this is an ideal world and we can’t build
this in the real world. And so we’re going to
have to approximate it. And that’s where the MD4’s and
the MD5’s and the SHA-1’s came in, OK? So this is not
achievable in practice. So what is this oracle? This oracle is on input
x, belonging to 0,1 star. So that could be an
arbitrary string. If x not in the book–
so there’s this the book, all right? And there’s this
infinite capacity book that has all of the computations
that were ever done prior. And they’re always
stored in the book. And that’s how we’re
going to get determinism. Because this book
initially gets filled in. All of the entries in
the book are filled in using pure randomness. So you flip a coin d
times to determine h of x. So that’s basically it. And you just keep flipping. You have to flip d times. And so if x was 0, you
flip d times, d was 160. You flipped a coin 160
times and got a string. If x were 1, flip 160 times,
you get a different string with very high
probability, obviously. And so on and so forth. But what you do is
you have this book. So you’re going to record
x h of x in the book, OK? So at some level
your hash function is this giant look-up
table in the sky, right? Actually not giant, infinite
capacity look-up table in the sky. Because you can put
arbitrary strings into this. And if it’s in the book– this
is obviously the important part that gives you determinism–
then you return y, where x and y are
in the book, OK? So you get a random
answer every time, except as required
for consistency with previous answers. So the very first
time you see a string, or– and the whole world
can create this book. It’s public. So if I created the book at
first with a particular string, let’s say Eric. I was the string. And I’m the one who put
the entry– x equals Eric, and h of x, h of Eric equals
some random 160-bit string– into the book, I get
credit for that, right? But if you come a nanosecond
later and ask for h of Eric, you should get exactly
what got put into the book when I asked for h of Eric. And so on and so forth. So this is true for everybody. So this is like– I mean
basically impossible to get. Because not only can
anybody and everybody query, you have to have this
ordering associated with people querying the book. And you have to
have consistency. All right. So everyone convinced
that we can’t build this? All right. If you took anything
out of this lecture, that’s what you should take. No, no. There’s a lot more. So we want to approximate
the random oracle. And we’re going to get to that. Obviously we’re going to have to
do this in poly-space as well. So what’s wrong with this? Of course this picture is
I didn’t actually say this, but you’d like things to be
poly-time in terms of space. You don’t want to store
an infinite number– this is worse than poly-time,
worse than exponential time, because it’s arbitrary strings
that we’re talking about here, right? So you can’t possibly do that. So we have to do
something better. But before I get into how we’d
actually build this, and give you a sense of how SHA-1
and MD5 were built– and that’s going to come
a little bit later– I want to spend a lot of time
on the what is interesting, which are the
desirable properties. Which you can kind of see
using the random oracle. So what is cool about
the random oracle is that it’s a simple algorithm. You can understand it. You can’t implement it. But now you can see what
wonderful properties it gives you. And these properties are
going to be important for our applications, OK? And so let’s get started with a
bunch of different properties. And these are all
properties that are going to be useful for
verification or computer security applications. The first one, it’s not ow,
it’s O, W. It’s one-wayness, all right? So one-way, or one-wayness. And it’s also called– you’re
not going to call it this– but perhaps this is a more
technical term, a more precise term, pre-image resistance. And so what does this mean? Well this is a very
strong requirement. I mean a couple of
other ones are also going to be perhaps stronger. But this is a pretty
strong requirement which says it’s
infeasible, given y, which is in the– it’s basically
a d-bit vector, to find any x such that h of x equals y. And so this is x is
the pre-image of y. So what does this say? It says that I want to
create a hash function such that if I give you
a specific– we call it a 160-bit string, because
we’re talking SHA-1 here, and that’s the hash–
I’m going to have, it’s going to have
to be impossible for me to discover an x that
produced that 160-bit string, OK? Now if you go look
at our random oracle, you realize that if you
had a 160-bit string, and perhaps you
have the entire book and you can read
the entire book. It’s an infinite capacity book. It’s got a bunch of stuff in it. And know that any time anyone
queried the book the first time for a given x, that there was
this random 160-bit number that was generated and
put into the book. And there’s a whole lot
of these numbers, right? So what’s going to
happen is, you’re going to have to look
through the entire book, this entire potentially
infinite capacity book, in order to figure out if this
particular y is in the book or not. And that’s going to take a long
time to do, potentially, OK? So in the case where you
have a random oracle you’d have to go through and find–
looking at the output hash corresponding to each of the
entries in the random oracle, you’re going to start matching,
match, match, match, match, it’s going to take
you exponential time. Well actually worse than that,
given the infinite capacity of the book. So this clearly gives you that. Now you may not be a completely
satisfied with that answer because you say well,
you can’t implement that. But we’ll talk a
little bit, as I said, about how you could
actually get this. But what’s– I should be
clear– is that the simple hash functions that we’ve looked
at in the past just to build dictionaries do not
satisfy this, right? So suppose I had h of x
equals x square mod p. Is this one-way,
given a public p? No of course not, right? Because I’m going to be–
it’s going to be easy for me to do something. Even though this is discrete
arithmetic I could do something like, well, I know that what
I have here– actually let’s do it with something
that’s simpler, and then I’ll talk
about the x squared. If I had something
as simple as x mod p, I mean that’s trivially broken
in terms of one-wayness. Because I know that h of x could
be viewed as the remainder. So anything– if this
is h of x, and let’s just call that y for a
second, because that’s what we had it out there. Something that’s a multiple
of y plus the remainder– so I could have a– is that right? Is that what I want? Yeah. No, plus y. So I want a of– well since
I can’t figure it out, why can’t you? What do I need to
put in there in order to discover an x that
would produce a y? Can I write an equation? Yeah? AUDIENCE: Could you
just write y itself? SRINIVAS DEVADAS: Just y itself. That’s right. Good point. Just y itself in this case. Good. I knew you guys were
smarter than me. This proves it. So if you just take
y– and y remember is going to be something
that’s 0 to p minus 1, right? And that’s it. It just goes through, right? So that’s a trivial
example, right? Now if I put x squared in
here, obviously it’s not y, but I could start looking
at– what I have here is I’m going to get y that
looks like x squared. But I could take
the y that I have, take the square root
of that, and then start looking for x’s that give
me the y that I have. Actually it’s not a complicated
process to try and figure out, through trial and
error potentially, what an x is that
produces a particular y for the kinds of hash
functions that we’ve looked at, all right? Now as you complicate this
equation it gets harder. Because you have to invert
this set of equations. And that’s what
the game is going to be when you go create
one-way hash functions. The amount of computation
that you do in order to compute the y is going
to increase to the point where, as I mentioned, you have
80, 100 rounds of computation, things getting mixed in. And the hope is that you create
this circuit, if you will, that has all this
computation in that. Going forwards is
easy, because you’ve specified the
multiplications and the mods and so on and so forth. But not all of these operations
have simple inverses. And going backwards,
which is what you need to do in order
to break one-wayness, or discover the x
given a y, is going to be harder and harder
as the computations get more complex, OK? So everyone have a sense
of what one-wayness is? So that’s one-wayness. There’s four other properties,
two of which are very related. CR and TCR. So CR is collision resistance. It’s infeasible to find x and
x prime such that x not equal to x prime, and h of
x equals h of x prime, which is of course a collision. OK? And that just says you have
this crazy hash function where you can’t discover collisions. Well it would be
absolutely wonderful. In fact that’s what we wanted
when we built dictionaries. But why don’t we use
SHA-3 in dictionaries? Why don’t we use
SHA-3 in dictionaries? Yeah? AUDIENCE: Because it’s more
complicated than we need. SRINIVAS DEVADAS: Yeah,
it’s horribly slow, right? It would take longer to
compute the hash than access the dictionary,
when you actually had a reasonable dictionary
that maybe had some collisions. I mean you just go off and
you have a linked list, you can afford a few collisions,
what’s the big deal, right? So it just doesn’t
make any sense to use this level of
heavyweight hash function, even if it satisfies
collision resistance– which some of these are conjectured to
do– for the applications we’ve looked at. But there’ll be other apps
where collision resistance is going to be important. So that’s collision resistance. And then there’s– TCR is
target collision resistance. It’s a weaker form–
so sometimes people CR strong collision resistance,
and TCR weak occlusion resistance. We’ll use CR and TCR here. And this says it’s
infeasible, given x– so there’s a
specific x that you want to find a collision
for, as opposed to just finding a pair that
goes once to x and x prime. And any pair would suffice to
break the collision resistance property. But TCR says is I’m going
to give you a specific x. And I want you to
find an x prime who’s hash collides with
the hash of x, OK? That’s TCR. OK that’s TCR for you. And that just to be clear,
I think you probably all got this, obviously
we want this here because we have a
deterministic hash function. And it’s a trivial thing
to say that if you had x, and you had x again, that you
get the same hash back from it. That’s a requirement, really. So we want two distinct x
and x primes that are not equal that end up colliding. That’s really what
a collision is. And so you see the difference
between CR and TCR? Yup? Yeah? AUDIENCE: Are we to
assume that given an x it’s very easy to
get the h of x back? SRINIVAS DEVADAS:
So the question was, given an x, it’s poly-time
computation to get h of x. Absolutely. Public poly-time computation
given an x to get h of x. So going this way is easy. Going this way– I ran
out of room– hard. OK? AUDIENCE: So does that mean that
TCR is basically the same as 1? SRINIVAS DEVADAS: No,
no, no, absolutely not. TCR says it’s OK. You can compute this. You can get x. And you can get h of x. So given x, you know
that you can get h of x. I didn’t actually put
that in the definition. And maybe I should have. So given x you can
always get h of x. Remember that. It’s easy to get h of x. So any time I say given
x, you can always add it, saying given x and h of x. So I’m given x. I’m given h of x. I obviously need to
map– I need to discover an x prime such that h of
x prime equals h of x, OK? Now you have situations
where for– it may be the case that
for particular x’s you can actually do this. And that’s enough to break TCR. So you have to have
this strong property that you really don’t want to
find collisions are for some– even if there’s a constant
fraction of x’s that break the TCR property, you
don’t like your hash function, OK? Because you might end
up picking those and go build security applications
using those properties. I want to talk a little
bit about the relationship between OW, CR, and TCR. So I’m going to
get back to that. And we’re going to talking
about hash functions that satisfy one property but
don’t satisfy the other. And I think your
question will probably be answered better, OK? Thanks for the question. So those are the main ones. And really quickly, if you
want to spend a lot of time on this– but I do
want to put up– I think I’ll leave
these properties up here for the duration. Because it’s important for you
to see these definitions as we look at the
applications where we require these properties, or
a subset of these properties. But that we have
pseudo randomness. And this is simply a
function of the fact that– so this is PRF– we know
we can’t build a random oracle. And so we’re going to have to do
something that’s pseudo-random. And basically what
we’re saying here is the behavior is
indistinguishable from random. So we’re going to have to use
non-linearity, things that are called non-linear
feedback shift registers, to create pseudo-random
functions. There’s many ways that we can
create pseudo-random functions. We won’t really get into that. But obviously
that’s what we want. And then the last
one is a bit tricky. And we will have an app that
requires this way at the end. But this is infeasible
given h of x to produce h of x prime, where
x and x prime are– and it gets a little bit fuzzy here– are
related in some fashion, right? So a concrete
example of this is, let’s say that x
prime is x plus 1. So this is a reasonable
example of this. So what this says is
you’re just given h of x. It doesn’t actually say
anything about one-wayness yet. But you could
assume, for example, that if this was a
one-way hash function, that it would be possible to
get x from h of x, correct? And let’s keep that though. Hold that thought, all right? We’re going to get back to it. So if I’m just given the hash
through some computation, it may be possible for me
to create another hash, h of x prime, such that
there’s some relationship that I can prove or argue
for between the strings that created the hashes,
namely x and x prime, OK? That’s what
malleability is, right? Now you might just go off and
say here’s an x, here’s a y, here’s h of x,
and here’s h of y. These look completely random. And you might go off– I’m
being facetious here– I say that y is x’s third cousin’s
roommate’s brother-in-law or something, right? I mean just make
something up, right? So clearly there’s got to be
a strong, precise relationship between x and y. If in fact you could
do this and get y equals x plus 1, that’d
be a problem, right? But if you are–
and then you can do this sort of consistently
for different x’s and y’s, that would absolutely be
a problem, right? But what you’re really
asking for– and typically when you want
non-malleability– it’s things where you have
auctions, for example, where you are to be careful about
making sure that you don’t want to expose your bid. And so maybe what you’re
doing is exposing h of x. You don’t want somebody
to look at your h of x and figure out how
they could beat your bid by just a little bit. Or in case of Vickrey auctions,
where the second highest bidder wins, now just be a little
bit below you, right? So that’s the kind
of thing that you want to think about when it
comes to non-malleability, or malleability, where you
want a strong relationship between two strings
that are related in some ordered fashion,
like x equals– x prime equals x plus 1, or just
x prime equals 2 times x. And you don’t want
to be able to– you don’t want the adversary
to be able to discover these new strings. Because that would be
the system, all right? So any questions
about properties? Are we all good on
these properties? All right, because I’m
going to start asking you how to use them for
particular applications, or what properties are required
for certain applications, OK? One last thing
before we get there. I promised a slightly
more detailed analysis of the relationships
between these properties. So let’s do that. Now if your just look
at it, eyeball it, and you look at collision
resistance and TCR, what can I say about
the relationship between CR and TCR? If h is CR, it’s going
to be TCR, right? It’s got to be TCR. It’s a strictly
stronger requirement. But not reverse. And you can actually
give a concrete example of a particular hash
function that is TCR. I’m not going to go there. It’s actually a
little more involved than you might think it is,
where a TCR hash function is not collision resistant. But you can see that
examples such as these should exist, simply because I
have a more stringent property corresponding to
collision resistance as opposed to TCR, right? So if you’re interested in
that particular example, you’re not responsible for
it, get in touch with me and I’ll point you to a,
like a three-page description of an example. So I didn’t really
want to go in there. But what I do want to do is talk
about one-wayness and collision resistance. Because I think that’s
actually much more interesting, all right? So if h is one-way–
any conjectures as to what the question
mark is in the middle? Can I make strong statements
about the collision resistance of a hash function,
if I’m guaranteed that the hash function I have
is a one-way hash function, or vice versa? Another way of
putting it is, can you give me an example of,
just to start with, a hash function which is
one-way but not TCR, not target collision resistant? So I’m going to try and
extract this out of you. This is somewhat subtle. But the way you want
to think about this is, let’s say that h
of x is OW and TCR, OK? And so I have a bunch of inputs. And this is the output. And I get d-bits out. And I’ve got x1, x2, to xn, OK? Now I’ve given this h–
I’ve been given this h which is one-way and TCR. It satisfies those properties
that you have up there. In the case of one-way, I give
you an arbitrary d-bit string. You can’t go backwards and
find a bunch of the xi’s that produce exactly that
d-bit string, all right? So it’s going to be
hard to get here. But you’re allowed now
to give me an example. So this is some hash
function that you can create, which may use h as well. And h is kind of nice because
it has this one-way property. So let’s say that we want
to discover something where one-way does not imply TCR. So I want to cook up a
hash function h prime such that h prime is one-way,
but it’s not TCR, OK? The way you want to think about
this is you want to add to h. And you want to add something
to h such that it’s still hard– if you add h it’s still hard
to go from here to there. Because you’ve got to go deeper. If you add to, for
example, the inputs of h. Or you could add to the
outputs of h as well, or the outputs of the current h. But you can basically go deeper,
or need to go deeper in order to find the break
one-wayness, in order to find an x, whatever you have,
that produces the d-bit string that you have, right? So what’s a simple way of
creating an h prime such that it’s going to be pretty easy to
find targeted collisions even, not necessarily collisions,
it’s pretty easy to find targeted collisions,
without breaking the one-way property of h? Yeah? AUDIENCE: So if you have
x sub i, if i odd then return h of x of i. So that’s minus 1. So return the even group. SRINIVAS DEVADAS: Sure. Yep. AUDIENCE: Given
x any x of i, you can usually find another x of
i that was the same output? You can go backwards. SRINIVAS DEVADAS: You
can’t go backwards. Yeah, that’s good. That’s good. I’m going to do something that’s
almost exactly what you said. But I’m going to
draw it pictorially. And what you can do, you can
do a parity, like odd and even that was just described. And all I’ll do is add
a little [? XNOR ?] gate, which is a parity
gate, to one of the inputs. So you have and b here. So I’ve taken x1, and
I have a and b here. So I’ve added– I can
add as many inputs as I want to this function. Oh I should mention
by the way, h of x is working on arbitrary strings. And obviously I
put in some number here that corresponds to
n, which is a fixed number. So you might ask, what the
heck happened here with respect to arbitrary strings? And there’s two answers. The first answer is,
well, ignore arbitrary. And assume that you
only have n-bit strings. And n this is really
large number, right? And that may not be
particularly satisfying. The other answer is,
which is more practical, which is what’s
used in practice, is that typically
what happens is, you do have particular
implementations of hash functions that
obviously need to have fixed inputs, n, for example. And n is typically 512. It’s usually the block size. And you chunk the input up
into five 12-bit blocks. And typically what
you do is, you take the first five 12-bits,
compute the hash for it. And then you can do it
for the remaining blocks. And then you can hash all
of them together, all right? So there’s typically
more invocations. I don’t really want
to get into it. But there’s typically
more invocations of h when the input would be 2 times
n, or 3 times n, all right? So we don’t really
need to go there for the purposes
of this lecture. But keep that in mind. So we’ll still stick with our
arbitrary string requirement. So having said that, take
a look at this picture. And see what this
picture implies. I have an h prime that
I’ve constructed, right? Now if I look at h
prime, and I give you an output for h prime–
so h prime now has, it’s a function of a and b, and
x2 all the way to xn, right? So it’s got an extra input. If I look at h prime, and I look
at the output of h prime that is given to me, and I need
to discover something that produces that, it is pretty
clear that I need to figure out what these values
are, all right? And I need to know what
the parity of a and b is. And maybe I don’t need to
know exactly what a and b are, but I absolutely need to know
what the parity of a and b are, because that’s x1. And the one-way I’d
break would require me to tell you what
the value of x1 is, and the value of x2,
and so on and so forth. So it’s pretty clear that
h prime is one-way, right? Everybody buy that? h prime is one-way. But you know what? I’ve got target
collisions galore, right? All I have to do is flip– I
have a equals 1 and b equals 1. And I have a equals
0 and b equals 0. They’re going to give
me the same hash, right? So trivial example,
but that gets to the essence of the difference
between collision resistance and one-wayness, target
collision resistance and one-wayness, all right? So this is one-way but not TCR,
simply because a equals 0, b equals 0 for
arbitrary x’s produce the same thing as a equals
1 and b equals 1, right? So those are collisions. So admittedly contrived. But it’s a counterexample. Counterexamples
can be contrived. It’s OK. All right. So that was what
happens with that. Let’s look at one
more interesting thing that corresponds to
the other way, right? So what I want to show is that a
TCR does not imply one-wayness. OK, so now I want an
example where it is clear that I have target collision
resistance, because I can just assume that. And we’re going to
use the same strategy. I’m just going assume
that I have an h that’s target collision resistant. And I’m going to try and cook up
an h prime that is not one-way. So I’m going to assume that
in fact h is TCR and OW. And I’m going to take away
one of the properties. And if I take it one
of the properties I have a counterexample, right? So think about how
you could do this. You have h as before. And I want to add
some stuff around it such that it’s going to be
easy to discover– for a large, for a constant
fraction of hashes that I’ve given to me,
not for any old hash. Because you can always
claim that one-wayness is broken by saying I have
x, I computed h of x, now I know what– given h
of x I know what x is. I mean you can’t do that, right? So that’s not breaking
the one-wayness of it. It’s when you have
an h of x and this is the first time
you’ve seen it, you’re trying to find
what x is, right? So how would you– how
would you set it up so you break the
one-wayness of h without necessarily breaking
the target collision resistance of the overall hash
function that you’re creating? And you have to do something
with the outputs, OK? You have to do something. This is a little more involved. It’s not as easy
as this example. It’s a little more involved. But any ideas? Yeah, go ahead. AUDIENCE: So x is
less than b returns x. If x is greater than
b, return [INAUDIBLE]. SRINIVAS DEVADAS: Beautiful. Right. What color did
you get last time? AUDIENCE: Blue. SRINIVAS DEVADAS: You
got a blue last time? All right. Well you get a purple. You have a set. Actually we have these red ones
that are precious, that are– no, we don’t. We chose not to do red. I don’t know. There was some
subliminal message I think with throwing red
Frisbees that we didn’t like. But OK. So thank you. And h of x is simply
something where I’m going to concatenate
a zero to the x value and just put it out. And clearly this is
breaking one-wayness because I’m just taking the
input, I’m adding a zero to it, and shipping it out. So it’s going to be easy
to go backwards, right? And this only happens
if x is less than n, as the gentleman just said. Less than or equal to n in
terms of the input length, OK? Otherwise I’m
going to do h of x. So this is good news. Because I’m actually using
the hash function in the case where I have a
longer input string. This is bad news for
one-wayness because I’m just piping out the input. And so if I get an x, and I
see what the x is out here, and let’s just say
for argument’s sake that– you could
even say that n is going to be something
that is less than d, which is the final
output, which has d-bits. And so if you see something
that h prime produces that’s less than
d-bits you instantly know that you can go
backwards and discover what input produced that
for the h prime, right? Because you just go off
and you go backwards. This is what it tells you. Now on the other
hand if it’s larger obviously you can’t do that. But there’s a whole
lot of combinations that you can do that for. So this breaks one-wayness, OK? Now you think about TCR. And what you want
a show of course is that this maintains TCR. So that’s the last thing
that we have to show. We know that it
breaks one-wayness. But if it broke TCR we don’t
quite have our example. So we want to show
that it actually maintains TCR, which is
kind of a weakish property that we need to maintain. And the reason
this maintains TCR is that there’s really only
two cases here obviously, corresponding to
the if statement. And it’s pretty clear that if
x is less than or equal to n, clearly different x’s produce
different h prime x’s, correct? Because I’m just passing
along the x out to the output. So if x is less than n I am
going to get different hashes at the output. I’m just passing them out. So that’s easy. And for the other case,
well I assume that h of x was CCR, correct? Because that was the original
assumption, that I had h, which was CCR. So in both cases TCR is
maintained because else h of x maintains TCR, all right? So again, a bit of
a contrived example to show you the
difference between these different properties so
you know not to mix them up. You know what you
want to ask for, what is required
when you actually implement an
application that depends on particular properties. All right? Any questions so
far about properties or any of these examples? We’re going to dive
in to using them. OK. So start thinking
computer security. Start thinking hackers,
protecting yourself against the bad guys
that are out there who are trying to discover
your passwords, trying to corrupt
your files, generally make your life miserable. And we’ll start out with
fairly simple examples, where the properties are
somewhat obvious, and graduate to this auction
bidding example which should be sort of the
culmination of at least this part of the lecture. And depending on
how much time I have I’ll tell you a
little bit about how to implement hash functions. But I think these
things are more important from a
standpoint of giving you a sense of cryptographic hashes. All right. Password storage. How many of you write your
password in an unencrypted text file and store it in
a readable location? There you go, man. Thank you for being honest. And I do worse. Not only do I do that, I
use my first daughter’s name for four passwords. I won’t tell you
what the name is. So that’s something that
we’d like to fix, right? So what do real systems do? Real systems cannot protect
against me using my first daughter’s name as
a password, right? So there’s no way you
can protect against that. But if I had a reasonable
password, which had reasonable
entropy in it– so let’s assume here that we
have reasonable entropy in the password. And you can just say 128-bits. And it’s not a lot, right? 128-bits is 16 characters, OK? And you don’t have to answer
this– how many of you have 16 characters
in your password? Oh I’m impressed. OK. So you’ve got
128-bits of entropy. But the rest of you, forget it. This is not going
to help you, OK? But what I want,
assuming you have significant entropy in your
password– because otherwise, if there’s not
enough entropy you can just enumerate all possible
passwords of eight letters. And it’s not that much. It’s 2 raised to
50, what have you. And you can just go off. And none of these
properties matter. You just– you have your h of x. It’s public. We’ll talk about how we
use that in a second. But clearly if the
domain is small you can just
enumerate the domain. So keep that in mind. I talked about h of
x, and it’s obviously going to be relevant here. But suppose I wanted
to build a system, and this is how
systems are built, ETC slash password
file, assuming you have long passwords
it does it this way, otherwise it needs something
that’s called a salt. But that’s 6, 8, 57
and we won’t go there. So we just assume
a large entropy. What is it that a system can do? What can it store in order
to let you in, and only let you in when you
type your password, and not let some bogus
password into the system? Or somebody with a bogus
password into the system. Yeah, go ahead. AUDIENCE: If you capture the
password when you enter it and compare it to
what’s stored– SRINIVAS DEVADAS: Yes. AUDIENCE: If it’s a one-way
hash you know you have what the correct password is. SRINIVAS DEVADAS:
That’s exactly right. That’s exactly right. So it’s a really simple
idea, a very powerful idea. It, as I said, assumed that the
entropy– and I’m belaboring the obvious now–
but it is important when you talk about security
to state your assumptions. But you do not store
password on your computer. And you store the
hash of the password. Now why do I store my
password on the computer? Because this is so
inconvenient, right? So this is what the
system does for me. But the fact of the matter
is, if I lose my password, this doesn’t help me. Because what the system wants
you to do is choose a password that is long enough,
and the h is one-way. So anybody who discovers h of
PW that is publicly readable cannot discover PW, all right? That’s what’s cool about this. How do you let
the person log in? Use h of PW to compare
against h of PW prime, which is what is entered, where
PW prime is the typed password. And clearly what we need is
the disclosure of h of PW should not reveal PW. So we definitely
need one-wayness. What about– what about
collision resistance? Our target collision resistance? Think practitioner now, right? Are we interested in
this hash function being collision resistant? What does that
mean in this case? Give me the context in this
particular application? Yeah, go ahead. AUDIENCE: It means that someone
entering a different password will have the same
hash [INAUDIBLE]. SRINIVAS DEVADAS: Exactly. So it means that what you have
is a situation where you do not reveal– and so what might
happen is that h of PW prime equals h of PW. But h of PW equals
h of PW prime. But PW is not equal to PW prime. What you have is
a false positive. Someone who didn’t
know your password but guessed right– and
this is a 128-bit value, and they guessed right–
is going to get it. You don’t particularly
care of the probability of this occurrence. It’s really small. Typically you’re going to
have systems that lock you out if you try 10 tries that occurs
one, two, wrong passwords, right? So really in systems
you do not require– you do want to
build systems that have minimal
properties with respect to the perimeters that are used. So from a system building
standpoint just require OW. Don’t go overboard. Don’t require collision
resistance or TCR, OK? Let’s do a slightly
different example. Also a bit of a
warm-up for what’s coming next, which is a
file modification detector. So for each file F, I’m going to
store h of F. And as securely. So you assume that this means
that h of F cannot be modified by anybody, h of F itself. And now we want to
check if F is modified by re-computing h of
F. Which could be, this could be modified. So this could
actually be F prime. You don’t know that. You have a file. It’s a gigabyte. And somebody might
have tampered with one of the bits in the file. All you have is a
d-bit digest that corresponds to h of F that you
stored in a secure location. And you want to check
to see, by re-computing h of F, the file
that is given to you, and comparing it with what
you’ve stored, the h of F that you’ve stored. And so what property do we
need in order to pull this off? Of hash functions. What precisely do we
need to pull this off? What is the adversary
trying to do? And what is a successful break? A successful break is if an
adversary can modify the file and keep h of F the same, right? That would be a
successful break, right? Yup. Go ahead. AUDIENCE: TCR. SRINIVAS DEVADAS: TCR? Yeah, absolutely. You need TCR. So you want to modify the file. So you’re given that
the file– the adversary is given the file, which
is the input to the hash, and is going to try and
modify– modify the file, right? So let’s do a couple more. And we’re going to advance our
requirements here a little bit. So those two are
basic properties. I want to leave this up there. We’re going to do
something that corresponds to digital signatures. So digital signatures are
this wonderful invention that came out of MIT in a
computer science laboratory– again, Ron Rivest and
collaborators– which are a way of digitally
signing a document using a secret key, a private key. But anybody who has
access to a public key, so it could be
pretty much anybody, could verify the authenticity
of that signature, right? So that’s what a
digital signature is. So we’re going to talk
about public cryptography on Thursday, in terms
of how you could build systems or encryption algorithms
that are public key algorithms. But here I’ll just tell you
what we want out of them. Essentially what we have here
in the case of signatures, we actually want to talk
about encryption here, are– there’s two
keys associated with a public key system. Anybody and everybody
in the system would have a public key that
you can put on your website. And you also have a secret key–
that’s like your password– that you don’t
want to write down, you don’t want to give away,
because that’s effectively your identity. And what digital
signatures respond to are that you have
two operations. You have signing
and verification. So signing means that you
create a signature sigma that is the sign using your
private key, your secret key, off a message M. So you’re
saying this is this message, it came from me, right? That’s what signing means. You have this long message
and you sign it at the bottom. You’re taking responsibility for
the contents of that message. And then verification is you
have M sigma and a public key. And this is simply going
to output true or false. And so the public key should
not reveal any information about the secret key. And that’s the challenge
of building PKI systems, that we’ll talk about in
some detail next time. But we don’t need to
think about that other than acknowledging it today. So the public and private
key are two distinct things, neither one of which reveals
anything about the other. Think of them as completely
distinct passwords. But they happen to be
mathematically related. That’s why this
whole thing works. And that mathematical
relationship we’ll look at in some
detail on Thursday. But having said
that, take a look at what this app is
doing for us, right? This is a security application. And I haven’t quite gotten
to hash functions yet. But I’ll get to it
in just a minute. But what I want to do is
emphasize that there’s two operations going on. One of which is a
signature, which is a private signature, in the
sense that it’s private to me, if I’m Alice. Or private to Alice. And you’re using
secret information on this public message,
M, because that’s going to be publicized. And you’re going to
sign the public message. And then anybody in the
world who has access to Alice’s public key is
going to be able to say, oh I’m looking at the signature,
which is a bunch of bits. I’m looking at the message,
which is a whole lot of bits. And I have this public key,
which is a bunch of bits. And I’m going to be
able to tell for sure that either Alice
signed this message, or Alice did not
sign this message. And the assumption
here is that Alice kept her private key secret. And of course, what
I just wrote there, that the public key
does not reveal anything about the secret key, OK? So that’s digital signatures
for you, in a nutshell. And when you do MIT
certificates you’re using digital signatures a la
Rivest-Shamir-Adleman, the RSA algorithm. So you’re using
this all the time, when you click on 6.046
links, for example. And what happens is M is
typically really large. I mean it could
be a file, right? It could be a large file. And you don’t necessarily want
to compute these operations on large files. So for convenience, what happens
is you end up hashing the file. And for large M it’s
easier to sign h of M. And so replace the M’s that
you see here with h of M, all right? So now that we’re given that
we’re going to be doing h of M in here, think
about what we wanted to accomplish with M, right? I told you what we wanted
to accomplish with M. There’s a particular message. I’m Alice. I’m going to keep my
secret key secret. But I want to commit to signing
this message M, all right? And I want to make
sure that nobody can pretend to be me who
doesn’t know my secret key. And nobody does. So if I’m going to be signing
the hash of the message, now it comes down
to today’s lecture. I’m signing the hash
of the message h of M. What property do I require of
h in order for this whole thing to work out? Yeah, go ahead. AUDIENCE: Is it
non-malleability? SRINIVAS DEVADAS: Non
malleability, but even before that– suppose– absolutely,
but non-malleability is kind of beyond one of these
properties over on the right. You’re on the
right track, right? So do you want to give
me a different answer? You can give me a
different answer. AUDIENCE: Oh, I’m not sure. SRINIVAS DEVADAS: OK. What? Yeah, back there. AUDIENCE: I think you wanted to
one-way because otherwise you could take that signature and
find another message that you could credit. SRINIVAS DEVADAS: I
can make M public. I can make M– M can be public. And h of M is public. So one-wayness is not
interesting for this example if M is public. And we can assume that M
eventually gets public. Because that’s the message
I’m signing, right? I can also put M out. So I want the
relationship– I want you to focus on the relationship
between h of M and M and tell me what would
break this system. And you’re on the right track. Yeah, go ahead. Or way back there. Yeah, sorry about that. AUDIENCE: TCR. SRINIVAS DEVADAS: TCR. Why TCR? AUDIENCE: [INAUDIBLE]. SRINIVAS DEVADAS: So I have
M. So what happens here– I should write this out. I’m given– as an adversary I
have M and h of M. It is bad if Alice signs h of M, but Bob
claims Alice signed M prime. Because h of M equals
h of M prime, right? That is bad. So the M is public–
could you stand up? M is given. There’s a specific
M, and a specific h of M in particular,
that has been exposed. And h of M is what was
used for the signature. So you want to keep
h of M the same. It’s a specific one. So it’s not
collision resistance, it’s target
collision resistance, because that’s given to you. And you want to
keep that the same. But you want to claim that oh,
you promised me $10,000, not $20, right? If you can do that,
you signed saying you want to pay $10,000, not
$20, then you’ve got a problem. So your thing is very close. It’s just that it doesn’t need
to be a strong relationship between the 10,000 or the 20. I mean I give you a
concrete example of that. But it could be more,
it could be less. Anything that is different
from what you signed, be it with the numerical
relationship or not, would cause a problem and
break this scheme, all right? Are we good? All right, one last example,
the most interesting one. And as I guessed I’m
probably not going to get to saying very much
about how cache functions are implemented. But maybe I’ll spend
a minute or two on it. So let’s do this example that
has to do with commitments. Commitment is important, right? You want to commit
to doing things. You want to keep your promises. And in this case we
have a legal requirement that you want to be able to make
people honor their commitments, and not weasel their way
out of commitments, right? And we want to deal with
this computationally. And let’s think about auctions. So Alice has value x,
e.g. an auction bid. Alice computes what
we’re going to call C of x, which is a commitment
of x, and cements it, right? C of x, C of x is– let’s
assume that the auctioneer, and perhaps other auctionees
as well, see C of x. You have to submit it
to somebody, right? So you can assume
that that’s exposed. And what is going to happen
is, when bidding is over Alice is going to open–
so this is– C of x can be thought
of as sealing the bid. So that’s the commitment. You’re sealing the–
you’re making a bid and you’re sealing
it in an envelope. You’ve committed to that. That’s obviously, what
happens in real life without cryptography,
but we want to do this with cryptography,
with hash functions. And so now Alice opens
C of x to reveal x. So she has to prove that
in fact x was her bid. And that it matches
what she sealed. When you open it up, think
about it conceptually from a standpoint of
what happens with paper, and then we have to think
about this computationally and what this implies, right? So again I’ll do a
little bit of set up. And then we have start
talking about the properties that we want for this
particular application. So there are a bunch
of people who are doing bidding for this auction. I don’t– I want
to be the first– I don’t want to
spend a lot of money. But I want to win. All of us are like that, right? If I know information
about your bid, that is obviously a
tremendous advantage. So clearly that
can’t happen, right? If I know one other person’s
bid I just do plus 1 on that. If I know everybody else’s I
just do plus 1 on the maximum. So clearly there’s some secrecy
that’s required here, correct? So C of x is going to
have to do two things. It can’t reveal x. Because then even maybe
the auctioneer is bad. Or other people are
looking at this. And you can just assume that C
of x is– the C of x’s are all public. But I also need a
constraint that’s associated with C of x
that corresponds to making sure Alice is honest, correct? So I need to make Alice
commit to something, right? So what are the different
properties of the hash function that if I use h of
x here, that I’d want h to satisfy in order
for this whole process to work like it’s supposed to
work with paper and envelopes? Yeah, go ahead. AUDIENCE: It has to be
one-way [INAUDIBLE]. SRINIVAS DEVADAS: It
has to be one-way. And explain to me– so I
want a description of it has to be one-way, because why? AUDIENCE: Because
you want all the c x’s to be hidden from
all the other options. SRINIVAS DEVADAS: Right. C of x should not
reveal x, all right? All right. That’s good. Do you have more? It has to be
collision resistant. OK. I guess. A little bit more. You’re getting there. What– why is it
collision resistant? AUDIENCE: Because you want
to make sure that Alice, when she makes a bid that
she commits that bid. If she’s not going to resist
it then she could bid $100 and then find something else. SRINIVAS DEVADAS:
That’s exactly right. So CR, because
Alice should not be able to open this in
multiple ways, right? And in this case it’s
not TCR in the sense that Alice controls
what her bids are. And so she might find a pair
of bids that collide, correct? She might realize that in
this particular hash function, you know $10,000 and a billion
dollars collide, right? And so she figures
depending on what happens, she’s a billionaire,
let’s assume. She’s going to open
the right thing. She’s a billionaire, but
she doesn’t necessarily want to spend the billion, OK? So that’s that, right? But I want more. Go ahead. AUDIENCE: You don’t
want it to be malleable. Assuming that the
auctioneer is not honest because you don’t want to
accept a bribe from someone and then change
everyone else’s bid to square root of
whatever they bid. SRINIVAS DEVADAS:
That’s exactly right. Or plus 1, which is a
great example, right? So there you go. I ran out of Frisbees. You can get one next time. So yeah, I don’t
need this anymore. You’re exactly right. There’s another– it turns out
it’s even more subtle than what you just described. And I think I might be able
to point that out to you. But let me just first
describe this answer, which gives us non-malleability. So the claim is that you
also want non-malleability in your hash function. And the simple reason is,
given C of x– and let’s assume that this is public. It’s certainly public
to the auctioneer, and it could be public to
the other bidders as well. Because the notion of
sealing is that you’ve sealed it using C of x. But people can see the
outside of the envelope, which is C of x. So everyone can see C of x. You still want this to work,
even though all other bidders can see C of x. So given C of x, should
not be possible to produce C of x plus 1. You don’t know x is. But if you can produce C of
x plus 1, you win, all right? And so that’s the problem. Now it turns out you
now say OK, am I done? I want these three properties. And I’m done, right? There’s a little
subtlety here which these properties don’t capture. So that’s why there’s more here. And I don’t mean to
titillate, because I’ll tell you what is missing here. But let’s say that I have a hash
function that looks like this. And this here is non-malleable. It is collision resistant. And it’s one-way, all right? So h of x has all these
wonderful properties, all right? I’m creating an h
prime x that looks like this, which is
a concatenation of h of x, and giving away the most
significant bit of x, which is my bid, right? I’m just giving
that away, right? The problem here is
that we haven’t really made our properties
broad enough to solve this particular
application to the extent that there’s contrived cases
where these properties aren’t enough, OK? And the reason is simple. h prime x is arguably
NM, CR, and OW. And I won’t go into to
each of those arguments. But you can think
about it, right? If I’m just giving you one
bit, there’s 159 others, there’s a couple of
hundred others, whatever it is that I have in the domain. It’s not going to be invertible. h prime x is not going to
be invertible if h of x is not invertible. h prime x is not going to be
breakable in terms of collision resistance if h of
x is not breakable, and so on and so forth. But if I had a hash
function like that, is it a good hash function
for my commitment application? No, obviously not. Because if I publicize this
hash function– remember, everything is public
here with respect to h and h prime– you
are giving away the most significant that
corresponds to your bid in this particular
hash function, right? So you really need a little bit
more than these for secrecy, for true secrecy. But in the context
of this example, I mean it’s common
sense that you would not use the hash function
like that, right? So it’s not that there’s
anything profound here. It’s just that I
want to make sure that you understand the
nuances of the properties that we’re requiring. We had all the
requirements corresponding to the definitions
of NM and CR and OW. And you need a little bit
more for this example, where you have to say something,
perhaps informally, like the bits of your auction
are scrambled in the final hash output, which most hash
functions should do anyway, and h of x will definitely do. But you kind of unscrambled
it by adding this little thing in here, corresponding to
the most significant thing, all right? So I’ll stop with that. Let me just say that the
operation– or sorry, the work involved in
creating hash functions that are poly-time computable
is research work. People put up hash functions
and they get broken, like MD4 was put up in ’92 and
then got broken, SHA-1 and so on and so forth. And so I just encourage you
to look up SHA-3 and just take a quick scan and what
the complexity of SHA-3 is with respect to computing the
hash given an arbitrary string, all right? I’ll stick around for questions.

Add a Comment

Your email address will not be published. Required fields are marked *