2024-11-17 19:23:00
nrk.neocities.org
If you need a small and fast 64 bit hash function that can be copy-pasted
easily, then here’s one that I cooked up in an hour or so: chibihash64.h.
Some key features:
- Small: ~60 loc in C
- Fast: See benchmark table below
- Portable: Doesn’t use hardware specific instructions (e.g SSE)
- Good Quality: Passes smhasher, so should be good quality (I think)
- Unencumbered: Released into the public domain
- Free of undefined behavior and gives same result regardless of host system’s endianness.
- Non-cryptographic
Here’s some benchmark against other similar hash functions:
Name | Large input (GiB/sec) | Small input (Cycles/Hash) |
---|---|---|
chibihash64 | 18.08 | 49 |
xxhash64 | 12.59 | 50 |
city64 | 14.95 | 35 |
spooky64 | 13.83 | 59 |
It’s the fastest of the bunch for large string throughput.
For small string (hardcoded special cases for input below or equal 32 bytes.
Design overview
The function starts by declaring some constants and initializing a 32 byte
state.
const uint64_t P1 = UINT64_C(0x2B7E151628AED2A5);
const uint64_t P2 = UINT64_C(0x9E3793492EEDC3F7);
const uint64_t P3 = UINT64_C(0x3243F6A8885A308D);
uint64_t h[4] = { P1, P2, P3, seed };
The constants are nothing special, it’s just e (Euler’s number), pi (π) and
golden ratio written out in hex with the last digit adjusted to be odd (because
multiply by even number isn’t invertable).
for (; l >= 32; l -= 32) {
for (int i = 0; i > 24));
}
}
This is the meat of the algorithm.
It processes 8 bytes at a time mixing it in via xor
and multiply.
However, multiplication is an “upwards” operation where the low bits affect the
high bits but the high bits don’t get to affect the low ones.
This creates some biases.
So you’d want to mix the high bits into the lower ones to reduce the bias.
This commonly achieved via an xor-shift: x ^= x >> 32;
.
A bitwise rotation bringing the high bits down also works,
which is what I opted for.
Initially I was mixing it into h[i]
but I noticed that mixing it into the next
index sped things up noticeably.
Probably due breaking up some dependency and allowing for better
instruction-level-parallelism (ILP).
Speaking of ILP, with compiler optimization, the inner loop should get unrolled,
which would open up more avenue for utilizing ILP better.
This is similar principle to how xxhash achieves it’s high speed.
The way it loads the 8 bytes is also important.
The correct way is to load via shift+or:
static inline uint64_t
chibihash64__load64le(const uint8_t *p)
{
return (uint64_t)p[0]
This is free of any UB, works on any alignment and on any machine regardless of
it’s endianness.
It’s also fast, gcc
and clang
recognize this pattern and optimize it into
a single mov
instruction on x86
targets.
h[0] += ((uint64_t)len > 32);
if (l & 1) {
h[0] ^= k[0];
--l, ++k;
}
h[0] *= P2; h[0] ^= h[0] >> 31;
Now we mix in the length along with a byte if the remaining length is odd
(you’ll see why in a minute).
for (int i = 1; l >= 8; l -= 8, k += 8, ++i) {
h[i] ^= chibihash64__load64le(k);
h[i] *= P2; h[i] ^= h[i] >> 31;
}
Keep chopping of 8 bytes until there’s less than that remaining.
for (int i = 0; l > 0; l -= 2, k += 2, ++i) {
h[i] ^= (k[0] | ((uint64_t)k[1] > 31;
}
Now we chop off the last remaining bytes, 2 bytes at a time.
Since we mixed in a byte in case of an odd length already, the remaining length
now must be even, thus it’s safe to read 2 bytes at a time.
This gives a decent dump in small string performance compared to doing it one
byte at a time.
uint64_t x = seed;
x ^= h[0] * ((h[2] >> 32)|1);
x ^= h[1] * ((h[3] >> 32)|1);
x ^= h[2] * ((h[0] >> 32)|1);
x ^= h[3] * ((h[1] >> 32)|1);
Now we combine all the state into a single 64 bit value.
I’m not sure if this is a good way to do it, but it seems to work fine.
x ^= x >> 27; x *= UINT64_C(0x3C79AC492BA7B653);
x ^= x >> 33; x *= UINT64_C(0x1C69B3F74AC4AE35);
x ^= x >> 27;
And finally mix it in well for some good avalancing behavior.
The finisher used here is a direct copy of moremur, which should be a good
finisher with decent speed.
Mathemetical foundation
There are none.
Everything here is “empirically derived”
(I kept making changes until the tests passed).
When NOT to use
The introduction should make it clear on why you’d want to use this.
Here are some reasons to avoid using this:
- For cryptographic purposes.
- For protecting against collision attacks on hash-tables
(SipHash is the recommended one for this purpose). - When you need very strong probability against collisions: ChibiHash does very
minimal amount of mixing compared to other hashes (e.g xxhash64). And so
chances of collision should in theory be higher.
Conclusion
I started writing this because all the 64 bit hash functions I came across were
either too slow (FNV-1a, one byte at a time processing), or too large spanning
hundreds of lines of code, or non-portable due to using hardware specific
instructions.
Being small and portable, the goal is to be able to use ChibiHash as a good
“default” for non-cryptographic 64-bit hashing needs.
Support Techcratic
If you find value in Techcratic’s insights and articles, consider supporting us with Bitcoin. Your support helps me, as a solo operator, continue delivering high-quality content while managing all the technical aspects, from server maintenance to blog writing, future updates, and improvements. Support Innovation! Thank you.
Bitcoin Address:
bc1qlszw7elx2qahjwvaryh0tkgg8y68enw30gpvge
Please verify this address before sending funds.
Bitcoin QR Code
Simply scan the QR code below to support Techcratic.
Please read the Privacy and Security Disclaimer on how Techcratic handles your support.
Disclaimer: As an Amazon Associate, Techcratic may earn from qualifying purchases.