L07 – Hashing and Ethereum (CBDE Course)

In this lecture we are touching on the term
Hashing.
According to Wikipedia, and that is a
general definition, a hash function is any
function that can be used to map data of arbitrary
size to data of fixed size.
The values returned by a hash function are
called hash values, hash codes, digests or
simply hashes.
Let’s see an example.
The easiest way to picture hashing is with
a hash table.
So we could, for example, hash names to integer
values.
John Smith could be mapped to 02, Lisa Smith
could be mapped to 01, Sam Doe could be mapped
to 04 and Sandra Dee could be mapped to 02
as well, which would represent a collision,
because John Smith is already mapped to 02.
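To make the collision idea concrete, here is a
minimal sketch in Python. The function toy_hash and
the bucket count of 16 are assumptions made up for
this illustration, and so is the extra name
"Smith John", which is added only because it is
guaranteed to collide with "John Smith" under this
toy function. The bucket numbers it produces will
not match the 01, 02 and 04 from the example above.

```python
def toy_hash(name: str, buckets: int = 16) -> int:
    """Toy hash: sum the bytes of the name, then wrap into the bucket range."""
    return sum(name.encode("utf-8")) % buckets


def build_table(names, buckets: int = 16):
    """Group names by their bucket number, like a very simple hash table."""
    table = {}
    for name in names:
        table.setdefault(toy_hash(name, buckets), []).append(name)
    return table


# "Smith John" is a hypothetical entry: it has the same bytes as
# "John Smith" in a different order, so the byte sum is identical.
names = ["John Smith", "Lisa Smith", "Sam Doe", "Sandra Dee", "Smith John"]
table = build_table(names)

# Any bucket holding more than one name is a collision.
collisions = {bucket: found for bucket, found in table.items() if len(found) > 1}
print(collisions)
```

Summing bytes ignores the order of the characters,
which is exactly why such a simple function collides
so easily.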
Hashing is often confused with other, very
similar topics, for example checksums,
fingerprints or lossy compression.
Those are all very similar and may overlap,
but they are different and have different
requirements.
Let’s look at some use-cases of hashing:
The first use-case of hashing is the hash table,
used to quickly look up a data record.
Caching is another use case, where the hash
is used to build up a cache for large data
sets stored on slow media.
Hashes are also well suited to finding
duplicate entries in large datasets.
And lastly, hashes are used to protect data
with cryptographic hash functions.
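The duplicate-finding use case can be sketched in
a few lines of Python. This is only an illustration
under assumptions: the function name find_duplicates
and the sample records are made up, and SHA-256 from
the standard hashlib module is just one possible
choice of hash function.

```python
import hashlib


def find_duplicates(records):
    """Return (first_index, duplicate_index) pairs for records with equal content."""
    seen = {}        # digest -> index of the first record with that digest
    duplicates = []
    for index, record in enumerate(records):
        digest = hashlib.sha256(record).digest()
        if digest in seen:
            duplicates.append((seen[digest], index))
        else:
            seen[digest] = index
    return duplicates


# Hypothetical sample data for the sketch.
records = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]
print(find_duplicates(records))  # [(0, 2), (1, 4)]
```

Only the fixed-size digests are kept in memory and
compared, not the records themselves, which is what
makes this practical for large datasets.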
Hashes must meet some requirements and have
some important properties:
A hash procedure must be deterministic.
That means the same input must always produce
the same output.
A hash function should be uniform, which means
that the inputs should be spread evenly over
the range of possible outputs.
For that, all of the input data should be used
to produce the output, not just the first few
bytes for example.
It is desirable that the output has a defined
range, meaning the output has a fixed size.
Especially in cryptographic applications it
is desired that hash functions are non-invertible.
That means that given an output of the hash
function it is not realistic to reconstruct
the input.
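These properties can be checked quickly in Python.
The sketch below assumes SHA-256 from the standard
hashlib module as the example hash function. The
helper name digest_hex is made up for this
illustration.

```python
import hashlib


def digest_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


short_input = b"a"
long_input = b"a" * 1_000_000

# Deterministic: the same input always gives the same output.
print(digest_hex(short_input) == digest_hex(short_input))      # True

# Fixed size: 64 hex characters (32 bytes) regardless of input size.
print(len(digest_hex(short_input)), len(digest_hex(long_input)))  # 64 64

# A one-character change in the input gives a different digest.
print(digest_hex(b"hello") == digest_hex(b"hellp"))            # False
```

Non-invertibility cannot be demonstrated by running
code. It is a statement about how infeasible it is
to search for an input that matches a given digest.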
Then there is a large number of hash functions
available.
From universal hash functions, such as
Zobrist hashing, to non-cryptographic hash
functions, such as Pearson hashing, to
unkeyed cryptographic hash functions, such
as MD5, SHA-1, SHA-256, SHA-512 or SHA-3.
SHA-3 is based on an algorithm called Keccak,
which is used by Ethereum.
The final SHA-3 standard differs from the
original Keccak, so what Ethereum calls SHA3
is the original Keccak-256, while the
official SHA-3 version from NIST does different
input padding.
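The padding difference can be shown with a small
pure-Python sketch. It is a slow teaching version
and not how Ethereum clients implement it. The names
keccak_f1600, sponge_256, keccak256 and
sha3_256_sketch are made up for this illustration.
The only difference between the two variants here is
the suffix byte mixed in during padding: 0x01 for the
original Keccak and 0x06 for NIST SHA-3. Python's
hashlib provides the NIST version only, so it is
used to cross-check the sketch.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit integer left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(state):
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1
    for _ in range(24):
        # theta: mix each column parity into the neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: add the round constant, generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def sponge_256(data, suffix):
    """Sponge with a 136-byte rate and 32-byte output; suffix selects the padding."""
    rate = 136
    state = bytearray(200)
    offset = 0
    while len(data) - offset >= rate:          # absorb full blocks
        for i in range(rate):
            state[i] ^= data[offset + i]
        state = keccak_f1600(state)
        offset += rate
    tail = data[offset:]                       # last, partial block
    for i, byte in enumerate(tail):
        state[i] ^= byte
    state[len(tail)] ^= suffix                 # the padding that differs
    state[rate - 1] ^= 0x80
    state = keccak_f1600(state)
    return bytes(state[:32])


def keccak256(data):
    return sponge_256(data, 0x01)              # original Keccak padding


def sha3_256_sketch(data):
    return sponge_256(data, 0x06)              # NIST SHA-3 padding


print(keccak256(b"abc").hex())
print(sha3_256_sketch(b"abc").hex())
print(sha3_256_sketch(b"abc") == hashlib.sha3_256(b"abc").digest())
```

The first two printed digests differ for the same
input, although the permutation is identical.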
In the case of Ethereum mining, Dagger-Hashimoto
hashing is used, in its later version known
as Ethash.
This tries to satisfy two goals:
1. It is ASIC resistant.
That means the typical specialized miners
for Bitcoin mining won’t work with Ethereum.
2. It should be possible to verify blocks with
light clients.
But that’s all way too detailed.
You are probably more interested in Ethereum
development.
Mining is a completely different topic.
Here is your take-away: By now you should
know that hashing is mapping data of arbitrary
size to data of a fixed size.
It is deterministic, which means that the same
inputs produce the same outputs.
And, especially for cryptography use cases,
it shouldn’t realistically be possible to
reconstruct the input given only the output.
