Landon Curt Noll Landon Curt Noll's picture

Fowler / Noll / Vo (FNV) Hash


[Mathematics / Cryptology / Cryptography pages]  [Computer / Algorithm pages]  [Technology pages]  [chongo's home page]
 
Search the entire web   Search only www.isthe.com
 
 

FNV quick index

  • Portable FNV hashing of numeric values
  • Extended FNV type hashing


(top)

FNV hash history

The basis of the FNV hash algorithm was taken from an idea sent as reviewer comments to the IEEE POSIX P1003.2 committee by Glenn Fowler and Phong Vo. In a subsequent ballot round: Landon Curt Noll improved on their algorithm. Some people tried this hash and found that it worked rather well. In an EMail message to Landon, they named it the ``Fowler/Noll/Vo'' or FNV hash.

FNV hashes are designed to be fast while maintaining a low collision rate. The FNV speed allows one to quickly hash lots of data while maintaining a reasonable collision rate. The high dispersion of the FNV hashes makes them well suited for hashing nearly identical strings such as URLs, hostnames, filenames, text, IP addresses, etc.

The FNV hash is in wide spread use:

  • calc
  • Domain Name Servers
  • mdbm key/value data lookup functions
  • Database indexing hashes
  • major web search / indexing engines
  • high performance EMail servers
  • Netnews history file Message-ID lookup functions
  • Anti-spam filters
  • NFS implementations (e.g., FreeBSD 4.3, IRIX, Linux (NFS v4))
  • Cohesia MASS project server collision avoidance
  • spellchecker programmed in Ada 95
  • flatassembler's open source x86 assembler - user-defined symbol hashtree
  • PowerBASIC inline assembly routine
  • text based referenced resources for video games on the PS2, Gamecube and XBOX
  • non-cryptographic file fingerprints
  • FRET - a tool to identify file data structures / helps to understand file formats
  • used to in the process of computing Unique IDs in DASM (DTN Applications for Symbian Mobile-phones)
  • Used by Microsoft in their hash_map implementation for VC++ 2005
  • Used in an implementation of libketama for use in items such as memcache.
  • Used in the realpath cache in PHP 5.x (php-5.2.3/TSRM/tsrm_virtual_cwd.c).

FNV hash algorithms and source code have been released into the public domain. The authors of the FNV algorithmm look deliberate steps to disclose the algorhtm in a public forum soon after it was invented. More than a year passed after this public disclosure and the authors deliberatly took no steps to patent the FNV algorithm. Therefore it is safe to say that the FNV authors have no patent claims on the FNV algorithm as published.

If you use an FNV function in an application, tells us about it by sending EMail to:

fnv EMail addr

We will be happy to add your application to the list.

Comments are welcome.


(top)

The core of the FNV hash

The core of the FNV-1 hash algorithm is as follows:

hash = offset_basis
for each octet_of_data to be hashed
	hash = hash * FNV_prime
	hash = hash xor octet_of_data
return hash

The offset_basis and FNV_prime can be found in the parameters of the FNV-1 hash section below.


(top)

FNV-1a alternate algorithm

There is a minor variation of the FNV hash algorithm known as FNV-1a:

hash = offset_basis
for each octet_of_data to be hashed
	hash = hash xor octet_of_data
	hash = hash * FNV_prime
return hash

The only difference between the FNV-1a hash and the FNV-1 hash is the order of the xor and multiply. The FNV-1a hash uses the same FNV_prime and offset_basis as the FNV-1 hash of the same n-bit size.

Some people use FNV-1a instead of FNV-1 because they see slightly better dispersion for tiny (<4 octets) chunks of memory.

Either FNV-1 or FNV-1a make a fine hash. (Try it with with just a dash of Sage and ground Cloves :-))


(top)

Parameters of the FNV-1 hash

The FNV-1 hash parameters are as follows:

  • hash is an n bit unsigned integer, where n is the bit length of hash.

  • The multiplication is performed modulo 2n where n is the bit length of hash.

  • The xor is performed on the low order octet (8 bits) of hash.

  • The FNV_prime is dependent on n, the size of the hash:

    32 bit FNV_prime = 16777619

    64 bit FNV_prime = 1099511628211

    128 bit FNV_prime = 309485009821345068724781401

    256 bit FNV_prime = 374144419156711147060143317175368453031918731002211

    512 bit FNV_prime =
    35835915874844867368919076489095108449946327955754392558399825615420669938882575
    126094039892345713852759


    1024 bit FNV_prime =
    50164565101131186554345988110352789550307653454047907443030175238311120551081474
    51509157692220295382716162651878526895249385292291816524375083746691371804094271
    873160484737966720260389217684476157468082573

    Part of the magic of FNV is the selection of the FNV_prime for a given sized unsigned integer. Some primes do hash better than other primes for a given integer size. The theory behind which primes make good FNV_prime's is beyond the scope of this web page.

  • The offset_basis for FNV-1 is dependent on n, the size of the hash:

    32 bit offset_basis = 2166136261

    64 bit offset_basis = 14695981039346656037

    128 bit offset_basis = 275519064689413815358837431229664493455

    256 bit offset_basis =
    100029257958052580907070968620625704837092796014241193945225284501741471925557


    512 bit offset_basis =
    96593031294966694980094354007163104660904187456726378961083743294344626579945829
    32197716438449813051892206539805784495328239340083876191928701583869517785


    1024 bit offset_basis =
    14197795064947621068722070641403218320880622795441933960878474914617582723252296
    73230371772215086409652120235554936562817466910857181476047101507614802975596980
    40773201576924585630032153049571501574036444603635505054127112859663616102678680
    82893823963790439336411086884584107735010676915

    These non-zero integers are the FNV-0 hashes of the following 32 octets:

    chongo <Landon Curt Noll> /\../\

    The \'s in the above string are not C-style escape characters. In C-string notation, these 32 octets are:

    "chongo <Landon Curt Noll> /\\../\\"

    The following calc script was used to compute the offset_basis for FNV-1 hashes:

    hash_bits = insert_the_hash_size_in_bits_here;
    FNV_prime = insert_the_FNV_prime_here;
    offset_basis = 0;
    offset_str = "chongo <Landon Curt Noll> /\\../\\";
    hash_mod = 2^hash_bits;
    
    str_len = strlen(offset_str);
    for (i=1; i <= str_len; ++i) {
        offset_basis = (offset_basis * FNV_prime) % hash_mod;
        offset_basis = xor(offset_basis, ord(substr(offset_str,i,1)));
    }
    
    print hash_bits, "bit offset_basis =", offset_basis;
    

    NOTE: The above code fragment example is written in the calc language, not in C.

    FNV-0 Historic note: The FNV-0 is the historic FNV algorithm that is now deprecated. It has an offset_basis of 0. Unless the FNV-0 hash is required for historical purposes, the FNV-1 or FNV-1a should be used in place of the FNV-0 hash. Use FNV-1 and FNV-1a hashes, with their non-zero offset_basis instead. The FNV-0 hashes all buffers that contain only 0 octets to a hash value of 0. The FNV-1 and FNV-1a hash do not suffer from this minor problem.


(top)

Changing the FNV hash size - xor-folding

If you need an x-bit hash where x is not a power of 2, then we recommend that you compute the FNV hash that is just larger than x-bits and xor-fold the result down to x-bits. By xor-folding we mean shift the excess high order bits down and xor them with the lower x-bits.

For example to produce a 24 bit FNV-1 hash in C we xor-fold fold a 32 bit FNV-1 hash:

#define MASK_24 (((u_int32_t)1<<24)-1)	/* i.e., (u_int32_t)0xffffff */
#define FNV1_32_INIT ((u_int32_t)2166136261)
u_int32_t hash;
void *data;
size_t data_len;

hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
hash = (hash>>24) ^ (hash & MASK_24);

To produce a 16 bit FNV-1 hash in C we xor-fold fold a 32 bit FNV-1 hash:

#define MASK_16 (((u_int32_t)1<<16)-1)	/* i.e., (u_int32_t)0xffff */
#define FNV1_32_INIT ((u_int32_t)2166136261)
u_int32_t hash;
void *data;
size_t data_len;

hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
hash = (hash>>16) ^ (hash & MASK_16);

To produce a 56 bit FNV-1 hash in C (on a machine with 64 bit unsigned values) we xor-fold fold a 64 bit FNV-1 hash:

#define MASK_56 (((u_int64_t)1<<56)-1)	/* i.e., (u_int64_t)0xffffffffffffff */
#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
u_int64_t hash;
void *data;
size_t data_len;

hash = fnv_64_buf(data, data_len, FNV1_64_INIT);
hash = (hash>>56) ^ (hash & MASK_56);

The above process assumes that you are using an FNV hash that at most twice as large as the x-bits that you need. For x < 16, there is no 16 bit (or less) FNV-1 hash to use.

For tiny x < 16 bit values, we recommend using a 32 bit FNV-1 hash as follows:

/* NOTE: for 0 < x < 16 ONLY!!! */
#define TINY_MASK(x) (((u_int32_t)1<<(x))-1)
#define FNV1_32_INIT ((u_int32_t)2166136261)
u_int32_t hash;
void *data;
size_t data_len;

hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
hash = (((hash>>x) ^ hash) & TINY_MASK(x));

At the expense of CPU performance, one may use a larger FNV-1 hash that is normally required in any of the above xor-folding examples. This will not produce the same standard output, but it may provide slightly better dispersion at the expense of more CPU time. All that is needed is to use a larger FNV-1 hash and to move the mask outside of the expression on the final statement.

For example, to produce a 24 bit FNV-1 hash could have used a 64 bit FNV-1 hash, in a non-standard way, as follows:

/* NOTE: non-standard use of a larger hash */
#define MASK_24 (((u_int64_t)1<<24)-1)	/* i.e., (u_int64_t)0xffffff */
#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
u_int64_t hash;
void *data;
size_t data_len;

hash = fnv_64_buf(data, data_len, FNV1_64_INIT);
hash = (((hash>>24) ^ hash) & MASK_24);

Replacing a 32 bit FNV-1 hash with a 64 bit FNV-1 hash during xor-folding might yield better dispersion at the expense of CPU time. However using an even larger FNV-1 hash is almost certainly a waste of CPU time. If you are going to use this non-standard xor-folding method, we recommend that you only do it for x < 32 bits, and only replace the 32 bit FNV-1 hash with a 64 bit FNV-1 hash.

NOTE: One may substitute the FNV-1a hash for the FNV-1 hash in any of the xor-folding examples. Some people believe that FNV-1a xor-folding gives then slightly better dispersion without any impact on CPU performance. See the FNV-1a hash description for more information.

If you really need an x-bit hash for x > 1024 bits, send us EMail.


(top)

Changing the FNV hash size - non-powers of 2

The FNV hash is designed for hash sizes that are a power of 2. If you need a hash size that is not a power of two, then you have two choices. One method id called the lazy mod mapping method and the other is called the retry method. Both involve mapping a range that is a power of 2 onto an arbitrary range.

  • Lazy mod mapping method: The lazy mod mapping method uses a simple mod on an n-bit hash to yield an arbitrary range. To produce a hash range between 0 and X use a n-bit FNV hash where n is smallest FNV hash that will produce values larger than X without the need for xor-folding.

    For example, to produce a value between 0 and 2142779559 using the lazy mod mapping method, we select a 32-bit FNV hash because:

    232 > 2142779559

    We compute the 32-bit FNV hash value and then perform a final mod:

    #define TRUE_HASH_SIZE ((u_int32_t)2142779560) /* range top plus 1 */
    #define FNV1_32_INIT ((u_int32_t)2166136261)
    u_int32_t hash;
    void *data;
    size_t data_len;
    
    hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
    hash %= TRUE_HASH_SIZE;
    

    An advantage of the lazy mod mapping method is that it requires only 1 more operation: only an additional mod is performed at the end. The disadvantage of the lazy mod mapping method is that there is a bias against the larger values.

    To understand this bias consider the a need to produce a value between 0 and 999999. We will compute a 32-bit FNV hash value because:

    232 > 999999

    We compute the 32-bit FNV hash value using the and then perform the final mod:

    #define TRUE_HASH_SIZE ((u_int32_t)1000000) /* range top plus 1 */
    #define FNV1_32_INIT ((u_int32_t)2166136261)
    u_int32_t hash;
    void *data;
    size_t data_len;
    
    hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
    hash %= TRUE_HASH_SIZE;
    

    The bias introduced by the final mod is slight. The values 0 through 967295 will be created by 4295 different 32-bit FNV hash values whereas the values 967296 through 999999 will be created by only 4294 different 32-bit FNV hash values. In other words, the values 0 through 967295 will occur ~1.0002328 times as often as the values 967296 through 999999.

    The bias can be larger when the range is nearly as large as the range of values produced by the FNV hash. Consider using the lazy mod mapping method to produce values between 0 and 9999999999999999999. We use a 64-bit FNV hash because:

    264 > 9999999999999999999

    We compute the 64-bit FNV hash value using the and then perform the final mod:

    #define TRUE_HASH_SIZE ((u_int64_t)10000000000000000000) /* range top plus 1 */
    #define FNV1_64_INIT ((u_int64_t)14695981039346656037)
    u_int64_t hash;
    void *data;
    size_t data_len;
    
    hash = fnv_64_buf(data, data_len, FNV1_64_INIT);
    hash %= TRUE_HASH_SIZE;
    

    Here the bias introduced by the final mod is more noticeable. The values 0 through 9999999999999999999 will be created by 2 different 64-bit FNV hash values whereas the values 10000000000000000000 through 18446744073709551615 will be created by only 1 64-bit FNV hash value.

    NOTE: This bias issue may not be of concern to you, but we thought we should point out this issue just in case you care. Many applications should / will not care about this bias. Most applications can use the lazy mod mapping method without any problems. Your application, may vary however.

    NOTE: One may substitute the FNV-1a hash for the FNV-1 hash in any of the lazy mod mapping method examples. Some people believe that FNV-1a lazy mod mapping method gives then slightly better dispersion without any impact on CPU performance. See the FNV-1a hash description for more information.

  • Retry method: The retry method also performs a final mod in order to produce a hash range between 0 and X. Unlike lazy mod mapping method, the retry method avoids the bias by additional computation.

    To produce a hash range between 0 and X use a n-bit FNV hash where n is smallest FNV hash that will produce values larger than X without the need for xor-folding.

    For example, to produce a value between 0 and 49999 using the retry method, we select a 32-bit FNV hash because:

    232 > 49999

    Before the final mod 50000 is performed, we check to see if the 32-bit FNV hash value is one of the upper biased values. If it is, we perform additional loop cycles until is below the bias level. For example:

    #define TRUE_HASH_SIZE ((u_int32_t)50000) /* range top plus 1 */
    #define FNV_32_PRIME ((u_int32_t)16777619)
    #define FNV1_32_INIT ((u_int32_t)2166136261)
    #define MAX_32BIT ((u_int32_t)0xffffffff) /* largest 32 bit unsigned value */
    #define RETRY_LEVEL ((MAX_32BIT / TRUE_HASH_SIZE) * TRUE_HASH_SIZE)
    u_int32_t hash;
    void *data;
    size_t data_len;
    
    hash = fnv_32_buf(data, data_len, FNV1_32_INIT);
    while (hash >= RETRY_LEVEL) {
        hash = (hash * FNV_32_PRIME) + FNV1_32_INIT;
    }
    hash %= TRUE_HASH_SIZE;
    

    The disadvantage of the retry method is that it sometimes requires additional calculations. An advantage of the retry method it avoids slightly biased values.

    For another example, we will produce a value between 0 and 999999999999 using the retry method, we select a 64-bit FNV hash because:

    264 > 999999999999

    Before the final mod 1000000000000 is performed, we check to see if the 64-bit FNV hash value is one of the upper biased value. If it is, we perform additional loop cycles until it is not.

    #define TRUE_HASH_SIZE ((u_int64_t)1000000000000) /* range top plus 1 */
    #define FNV_64_PRIME ((u_int64_t)1099511628211)
    #define FNV1_64_INIT ((u_int64_t)14695981039346656037)
    #define MAX_64BIT ((u_int64_t)0xffffffffffffffff) /* largest 64 bit unsigned value */
    #define RETRY_LEVEL ((MAX_64BIT / TRUE_HASH_SIZE) * TRUE_HASH_SIZE)
    u_int64_t hash;
    void *data;
    size_t data_len;
    
    hash = fnv_64_buf(data, data_len, FNV1_64_INIT);
    while (hash >= RETRY_LEVEL) {
        hash = (hash * FNV_64_PRIME) + FNV1_64_INIT;
    }
    hash %= TRUE_HASH_SIZE;
    

    NOTE: One may substitute the FNV-1a hash for the FNV-1 hash in any of the retry method examples. Some people believe that FNV-1a retry method gives then slightly better dispersion without any impact on CPU performance. See the FNV-1a hash description for more information.

  • To summarize: When dealing with an application that needs to generate a hash value over an arbitrary range, one can do one of the following:

    1. Change the application to use hash values that range between 0 and 2n-1. Use a n-bit FNV hash or xor-folding if needed.

      Pro: Yields the best results in the shortest amount of CPU time.
      Con: Requires source code change to force hash range to be a power of 2 in size.

    2. Use the lazy mod mapping method if one does not care about the slight hash bias and does not want (or cannot change) the hash range.

      Pro: Yields the fastest results for a non-power of 2 range.
      Con: Produces a slight bias against larger hash values. However if one does not care about the slight bias, then there is no problem using this technique.

    3. Use the retry method if one wants to avoid the hash bias and does not want / cannot change the hash range.

      Pro: Produces non-biased values for a non-power of 2 range.
      Con: Requires slightly more CPU time in some cases.


(top)

Portable FNV hashing of numeric values

Some people have asked how one might FNV hash objects such as integers, structures and other non-octet based data. We recommend that you FNV hash such objects, an octet at a time in a fixed byte order. To ensure that the FNV hash values are the same on both Big Endian and Little Endian CPUs, we recommend that objects you always hash object such as integers from low to high byte significance order. To ensure that structures (due to architecture based alignment differences) are always hashed the same, we recommend that you FNV hash each element in a structure in order.

If interoperability is not an issue, you can always FNV hash objects as if they were just an array of octets:

/* WARNING: You really do not want hash structures this way!!! */
struct stuff {
    int data;		/* 32 bits? 64 bits? */
   			/* how many octets of padding? */
    char byte;
    			/* how many octets of padding? */
    short value;	/* 16 bits? */
    			/* how many octets of padding? */
} foo;

/* WARNING: This FNV hash value may not be portable!!! */
/* WARNING: This FNV hash value may be wrong due to pad byte values!!! */
hash = fnv_64a_buf(&foo, sizeof(foo), FNV1_64_INIT);

The hash above may produce different values on Big Endian 32bit, Big Endian 64bit, Little Endian 32bit, and Little Endian 64bit architectures just to name a few. Worse yet, the padding bytes may be non-zero due to the way the structure was initialized in memory. The padding bytes may contain whatever malloc might have picked up from memory, which could have been set to any value. You should really, really, really sure that FNV value portability will NEVER be an issue, and you should really be sure that the padding octets were set to 0 before using the above hack.

To put the issue into different terms, consider a collection servers that use FNV to hash SNMP identifiers (collections of integers). If all that is needed is for a given server to perform a fast table lookup, then treating SNMP identifiers as just an array of octets (bytes) to produce a non-portable FNV value will work provided that any padding octets were initialized to the same (say 0-filled) value. However, if a server needs to send an SNMP identifier FNV hash to another server, then one will need to generate portable FNV hash values.

Consider two servers use FNV hashing to look up numeric and/or compound values in a table. If each server builds their own FNV hash table its own use, then one does not need to produce portable FNV values. The simple non-portable method of FNV hashing all of the octets will work fine. On the other hand, if one server sends a FNV hashes to the other (instead of sending values to save network bandwidth), then both servers need to agree on how to they will FNV hash numeric values. Use of portable FNV values is recommended. When one of the servers is upgraded from a 32 bit AMD CPU to a 64 bit AMD CPU (potentially changing the memory footprint of values), you will be glad you used portable FNV values!

You will be better off if use the portable hash method below. We recommend that you FNV hash integers an octet (byte) at a time. You should FNV hash them from low to high significance order.

Lets give an example of portable hashing, where any padding octets are ignored, using the FNV-1a hash:

hash = offset_basis
for each octet_of_data to be hashed
	hash = hash xor octet_of_data
	hash = hash * FNV_prime
return hash

as a set of C pre-processor macros. For clarify, we will ignore gcc optimization and other performance issues.

The 64 bit FNV-1a hash can be expressed as a set of C pre-processor macros:

/*
 * NOTE: This is for illustration purposes.  Optimization,
 *       if needed, is left as an exercise for the reader.  :-)
 */
/* NOTE: u_int64_t is a 64 bit unsigned type */
/* NOTE: u_int32_t is a 32 bit unsigned type */
/* NOTE: u_int16_t is a 16 bit unsigned type */
/* NOTE: u_int8_t is a 8 bit unsigned type */

#define FNV1_64_INIT ((u_int64_t)14695981039346656037)
#define FNV1_64_PRIME ((u_int64_t)1099511628211)

#define FNV_64A_OP(hash, octet) \
    (((u_int64_t)(hash) ^ (u_int8_t)(octet)) * FNV1_64_PRIME)

As an example, the 64 bit FNV-1a hash of a C-style string would become:

/*
 * NOTE: This is for illustration purposes.  Optimization,
 *       if needed, is left as an exercise for the reader.  :-)
 */
char *string;		/* the string to 64 bit FNV-1a hash */
u_int64_t hash;		/* will hold the final value of the hash */
char *p;

hash = FNV1_64_INIT;
for (p=string; *p; ++p) {
    hash = FNV_64A_OP(hash, *p);
}

The 64 bit FNV-1a hash of an n octet buffer would become:

/*
 * NOTE: This is for illustration purposes.  Optimization,
 *       if needed, is left as an exercise for the reader.  :-)
 */
u_int8_t *buffer;	/* octet data buffer of n octets to hash */
u_int64_t hash;		/* will hold the final value of the hash */
int n;			/* length of the buffer in octets (bytes) */
int i;

hash = FNV1_64_INIT;
for (i=0; i < n; ++i) {
    hash = FNV_64A_OP(hash, buffer[i]);
}

All of the above operations are the standard, portable ways to hash a collection of octets. To hash a 32 bit integer in a portable way, we hash it an octet at a time from low to high order:

/*
 * NOTE: This is for illustration purposes.  Optimization,
 *       if needed, is left as an exercise for the reader.  :-)
 */
u_int32 value;		/* 32 bit integer to hash */
u_int64_t hash;		/* will hold the final value of the hash */

hash = FNV1_64_INIT;
hash = FNV_64A_OP(hash, value);
value >>= 8;
hash = FNV_64A_OP(hash, value);
value >>= 8;
hash = FNV_64A_OP(hash, value);
value >>= 8;
hash = FNV_64A_OP(hash, value);

Note that the FNV_64A_OP macros above only hash the low octet of value. By shifting value left 8 bits we effectively hash the value from the low to high byte significance order.

We can create a general integer hashing macro (and function) as follows:

/*
 * NOTE: This is for illustration purposes.  Optimization,
 *       if needed, is left as an exercise for the reader.  :-)
 */
#define FNV_64A_INT(hash, integer) \
    fnv_64a_int((u_int64_t)(hash), (u_int64_t)(integer), sizeof(integer))

/*
 * fnv_64a_int - hash an integer of up to 8 octets in size
 *
 * given:
 *	hash		current 64 bit FNV-1a hash value
 *	integer		integer value to hash
 *	size		size of the integer in octets (bytes)
 *
 * returns:
 *	64 bit FNV-1a hash of the integer in low to high byte order
 */
u_int64_t
fnv_64a_int(u_int64_t hash, u_int64_t integer, size_t size)
{
    switch (size) {
    case 8: hash = FNV_64A_OP(hash, integer);
    	    integer >>= 8;
	    /*FALLTHRU*/
    case 7: hash = FNV_64A_OP(hash, integer);
    	    integer >>= 8;
	    /*FALLTHRU*/
    case 6: hash = FNV_64A_OP(hash, integer);
    	    integer >>= 8;
	    /*FALLTHRU*/
    case 5: hash = FNV_64A_OP(hash, integer);
    	    integer >>= 8;
	    /*FALLTHRU*/
    case 4: hash = FNV_64A_OP(hash, integer);
    	    integer >>= 8;
	    /*FALLTHRU*/
    case 3: hash = FNV_64A_OP(hash, integer);
    	    integer >>= 8;
	    /*FALLTHRU*/
    case 2: hash = FNV_64A_OP(hash, integer);
    	    integer >>= 8;
	    /*FALLTHRU*/
    case 1: hash = FNV_64A_OP(hash, integer);
	    break;
    default:
    	fprintf(stderr, "fnv_64a_int: BOGUS SIZE: %d\n", size);
	exit(1);
    }
    return hash;
}

Returning to our original struct stuff, we can FNV hash it in a portable way as follows:

struct stuff {
    int data;		/* 32 bits? 64 bits? */
   			/* how many octets of padding? */
    char byte;
    			/* how many octets of padding? */
    short value;	/* 16 bits? */
    			/* how many octets of padding? */
} foo;

/*
 * NOTE: This is for illustration purposes.  Optimization,
 *       if needed, is left as an exercise for the reader.  :-)
 */
hash = FNV1_64_INIT;
hash = FNV_64A_INT(hash, foo.data);
hash = FNV_64A_INT(hash, foo.byte);
hash = FNV_64A_INT(hash, foo.value);


(top)

Extended FNV type hashing

One can extend the concept of Portable FNV hashing of numeric values to the FNV hashing of data types. Let us say you wish for the following two values, whey and bar typically hash to different values:
struct curds {
    u_int16_t left;
    u_int16_t right;
} whey = {0x31, 0x31};

u_int32_t bar = 0x3131;

Or let us say that you want these to numeric values to typically hash to different values:

int32_t value1 = 23209;
u_int32_t value2 = 23209;

If the Portable FNV hashing method were used, both whey and bar would usually FNV hash to the same value. And value1 and value2 would certainly hash to the same value.

Let is say, for some reason, you wanted whey and bar (and value1 and value2) usually FNV hash different values even though their bit patterns in memory look the same. You can accomplish this by hashing in a FNV type value before hashing its numeric value.

Consider the following FNV type values enum:

/*
 * This is an example of how FNV type values could be
 * implemented.  This is not an exhaustive list.
 *
 * The intent of these enum values is that they all bit into a
 * single octet (i.e., < 256).
 */
enum fnv_type {
	/* Common C types */
	FNV_VOID = 0,
	FNV_CHAR, FNV_UCHAR, FNV_SHORT, FNV_USHORT, FNV_INT,
	FNV_UINT, FNV_LONG, FNV_ULONG, FNV_FLOAT, FNV_DOUBLE, FNV_LONGDOUBLE,

	/* Common C type extension */
	FNV_LONGLONG, FNV_ULONGLONG,

	/* Explicit intXY_t and u_intXY_t typedefs */
	FNV_INT8_T, FNV_U_INT8_T, FNV_INT16_T, FNV_U_INT16_T,
	FNV_INT32_T, FNV_U_INT32_T, FNV_INT64_T, FNV_U_INT64_T,

	/* bit fields by length,  ctype foo : 7 ==> BITFLD_7 */
		      FNV_BITFLD1,  FNV_BITFLD2,  FNV_BITFLD3,  FNV_BITFLD4,
        FNV_BITFLD5,  FNV_BITFLD6,  FNV_BITFLD7,  FNV_BITFLD8,  FNV_BITFLD9,
	FNV_BITFLD10, FNV_BITFLD11, FNV_BITFLD12, FNV_BITFLD13, FNV_BITFLD14,
	FNV_BITFLD15, FNV_BITFLD16, FNV_BITFLD17, FNV_BITFLD18, FNV_BITFLD19,
	FNV_BITFLD20, FNV_BITFLD21, FNV_BITFLD22, FNV_BITFLD23, FNV_BITFLD24,
	FNV_BITFLD25, FNV_BITFLD26, FNV_BITFLD27, FNV_BITFLD28, FNV_BITFLD29,
	FNV_BITFLD30, FNV_BITFLD31, FNV_BITFLD32, FNV_BITFLD33, FNV_BITFLD34,
	FNV_BITFLD35, FNV_BITFLD36, FNV_BITFLD37, FNV_BITFLD38, FNV_BITFLD39,
	FNV_BITFLD40, FNV_BITFLD41, FNV_BITFLD42, FNV_BITFLD43, FNV_BITFLD44,
	FNV_BITFLD45, FNV_BITFLD46, FNV_BITFLD47, FNV_BITFLD48, FNV_BITFLD49,
	FNV_BITFLD50, FNV_BITFLD51, FNV_BITFLD52, FNV_BITFLD53, FNV_BITFLD54,
	FNV_BITFLD55, FNV_BITFLD56, FNV_BITFLD57, FNV_BITFLD58, FNV_BITFLD59,
	FNV_BITFLD60, FNV_BITFLD61, FNV_BITFLD62, FNV_BITFLD63, FNV_BITFLD64,

	/* pointer to some C type */
	FNV_POINTER,

	/* ... */
};

/*
 * FNV macro to hash FNV type values
 *
 * NOTE: Assumes that the FNV type value < 256.
 */
#define FNV_64A_TYPE(hash, type) \
    FNV_64A_OP((hash), (enum fnv_type)(type))

If we precede the FNV hashing of each data type with an FNV hash of the octet that represents the FNV type values, we can distinguish among different data types, even when their bit pattern in memory is identical.

Example:

struct stuff {
    int data;
    char byte;
    short value;
    unsigned long longvalue;
    int *words;
    void *pointer;
    char **list;
    u_int16_t a : 3;
    u_int16_t b : 2;
    u_int16_t c : 4;
} baz;

/*
 * NOTE: This is for illustration purposes.  Optimization,
 *       if needed, is left as an exercise for the reader.  :-)
 */
hash = FNV1_64_INIT;

/* int data */
hash = FNV_64A_TYPE(hash, FNV_INT);
hash = FNV_64A_INT(hash, baz.data);

/* char byte */
hash = FNV_64A_TYPE(hash, FNV_CHAR);
hash = FNV_64A_INT(hash, baz.byte);

/* short value */
hash = FNV_64A_TYPE(hash, FNV_SHORT);
hash = FNV_64A_INT(hash, baz.value);

/* unsigned long longvalue */
hash = FNV_64A_TYPE(hash, FNV_ULONG);
hash = FNV_64A_INT(hash, baz.longvalue);

/* int *words */
hash = FNV_64A_TYPE(hash, FNV_POINTER);
hash = FNV_64A_TYPE(hash, FNV_INT);
hash = FNV_64A_INT(hash, baz.words);

/* void *pointer */
hash = FNV_64A_TYPE(hash, FNV_POINTER);
hash = FNV_64A_TYPE(hash, FNV_VOID);
hash = FNV_64A_INT(hash, baz.pointer);

/* char **list */
hash = FNV_64A_TYPE(hash, FNV_POINTER);
hash = FNV_64A_TYPE(hash, FNV_POINTER);
hash = FNV_64A_TYPE(hash, FNV_CHAR);
hash = FNV_64A_INT(hash, baz.list);

/* u_int16_t a : 3 */
hash = FNV_64A_TYPE(hash, FNV_U_INT16_T);
hash = FNV_64A_TYPE(hash, FNV_BITFLD3);
hash = FNV_64A_INT(hash, baz.a);

/* u_int16_t b : 2 */
hash = FNV_64A_TYPE(hash, FNV_U_INT16_T);
hash = FNV_64A_TYPE(hash, FNV_BITFLD2);
hash = FNV_64A_INT(hash, baz.b);

/* u_int16_t c : 4 */
hash = FNV_64A_TYPE(hash, FNV_U_INT16_T);
hash = FNV_64A_TYPE(hash, FNV_BITFLD4);
hash = FNV_64A_INT(hash, baz.c);

Again, if you do NOT care about distinguishing between different data types even when you have the same numeric values, you do not need to use something like the method above. And if you do not need portable FNV files, then:

hash = fnv_64a_buf(&baz, sizeof(baz), FNV1_64_INIT);

will work just fine.


(top)

FNV source

In the C FNV source below, primes are provided for 32 bit and 64 bit unsigned integers. For compilers that do not implement the unsigned long long type, code is provided to quickly simulate the 64 bit multiply by the particular FNV_prime.

Bonelli Nicola has an implementation of FNV using the iovec interface:

Wayne Diamond implemented 32-bit FNV algorithm in PowerBASIC inline x86 assembly:


FUNCTION FNV32(BYVAL dwOffset AS DWORD, BYVAL dwLen AS DWORD, BYVAL offset_basis AS DWORD) AS DWORD
#REGISTER NONE
! mov esi, dwOffset      ;esi = ptr to buffer
! mov ecx, dwLen         ;ecx = length of buffer (counter)
! mov eax, offset_basis  ;set to 2166136261 for FNV-1
! mov edi, &h01000193    ;FNV_32_PRIME = 16777619
! xor ebx, ebx           ;ebx = 0
nextbyte:
! mul edi                ;eax = eax * FNV_32_PRIME
! mov bl, [esi]          ;bl = byte from esi
! xor eax, ebx           ;al = al xor bl
! inc esi                ;esi = esi + 1 (buffer pos)
! dec ecx                ;ecx = ecx - 1 (counter)
! jnz nextbyte           ;if ecx is 0, jmp to NextByte
! mov FUNCTION, eax      ;else, function = eax
END FUNCTION

Wayne said:

''Just thought I should let you know that I've ported the 32-bit FNV algorithm over to inline assembly. It's actually in PowerBASIC (www.powerbasic.com) format - a compiler I use, but the main function is all assembly. It could be optimized further in terms of saving a couple of clock cycles, but it's fairly optimized al ready - only 6 instructions in the main loop, plus 5 setup instructions, and compiles to just 33 bytes.''

M.S.Schulte sent us these 32-bit FNV-1 and FNV-1a x86 assembler implementations (written in flat assembler), half of which were optimized for speed, the other half were optimized for size:


small_fnv32:    ;FNV1 32bit (size: 31 bytes)
;                Intel Core 2 Duo E6600: 354.20 mb/s
   push    esi
   push    edi
   mov     esi, [esp + 0ch] ;buffer
   mov     ecx, [esp + 10h] ;length
   mov     eax, [esp + 14h] ;basis
   mov     edi, 01000193h   ;fnv_32_prime
 next:
   mul     edi
   xor     al, [esi]
   inc     esi
   loop    snext
   pop     edi
   pop     esi
   retn    0ch

 small_fnv32a:   ;FNV1a 32bit (size: 31 bytes)
;                 Intel Core 2 Duo E6600: 327.68 mb/s
   push    esi
   push    edi
   mov     esi, [esp + 0ch] ;buffer
   mov     ecx, [esp + 10h] ;length
   mov     eax, [esp + 14h] ;basis
   mov     edi, 01000193h   ;fnv_32_prime
 nexta:
   xor     al, [esi]
   mul     edi
   inc     esi
   loop    nexta
   pop     edi
   pop     esi
   retn    0ch

fast_fnv32:    ;FNV1 32bit (size: 36 bytes)
;               Intel Core 2 Duo E6600: 565.12 mb/s
   push    ebx
   push    esi
   push    edi
   mov     esi, [esp + 10h] ;buffer
   mov     ecx, [esp + 14h] ;length
   mov     eax, [esp + 18h] ;basis
   mov     edi, 01000193h   ;fnv_32_prime
   xor     ebx, ebx
 next:
   mul     edi
   mov     bl, [esi]
   xor     eax, ebx
   inc     esi
   dec     ecx
   jnz     next
   pop     edi
   pop     esi
   pop     ebx
   retn    0ch

 fast_fnv32a:   ;FNV1a 32bit (size: 36 bytes)
;                Intel Core 2 Duo E6600: 574.95 mb/s
   push    ebx
   push    esi
   push    edi
   mov     esi, [esp + 10h] ;buffer
   mov     ecx, [esp + 14h] ;length
   mov     eax, [esp + 18h] ;basis
   mov     edi, 01000193h   ;fnv_32_prime
   xor     ebx, ebx
 nexta:
   mov     bl, [esi]
   xor     eax, ebx
   mul     edi
   inc     esi
   dec     ecx
   jnz     nexta
   pop     edi
   pop     esi
   pop     ebx
   retn    0ch


(top)

gcc optimization

It has been reported by several people that under the gcc compiler with -O3 on many AMD & Intel CPUs, that replacing the FNV_prime multiply with a expression of shifts and adds will improve the performance.

Limited testing on our part confirmed that one can gain a few % in speed on an 1.6GHz AMD Athlon using gcc version 3.2.2 with -O3 optimization.

For a 32 bit FNV-1, we used:

while (bp < be) {

    /* multiply by the 32 bit FNV magic prime mod 2^32 */
#if defined(NO_FNV_GCC_OPTIMIZATION)
    hval *= FNV_32_PRIME;
#else
    hval += (hval<<1) + (hval<<4) + (hval<<7) + (hval<<8) + (hval<<24);
#endif

    /* xor the bottom with the current octet */
    hval ^= (Fnv32_t)*bp++;
}

For a 32 bit FNV-1a, we used:

while (bp < be) {

    /* xor the bottom with the current octet */
    hval ^= (Fnv32_t)*bp++;

    /* multiply by the 32 bit FNV magic prime mod 2^32 */
#if defined(NO_FNV_GCC_OPTIMIZATION)
    hval *= FNV_32_PRIME;
#else
    hval += (hval<<1) + (hval<<4) + (hval<<7) + (hval<<8) + (hval<<24);
#endif
}

For a 64 bit FNV-1, we used:

while (bp < be) {

    /* multiply by the 64 bit FNV magic prime mod 2^64 */
#if defined(NO_FNV_GCC_OPTIMIZATION)
    hval *= FNV_64_PRIME;
#else
    hval += (hval << 1) + (hval << 4) + (hval << 5) +
	    hval << 7) + (hval << 8) + (hval << 40);
#endif

    /* xor the bottom with the current octet */
    hval ^= (Fnv64_t)*bp++;
}

For a 64 bit FNV-1a, we used:

while (bp < be) {

    /* xor the bottom with the current octet */
    hval ^= (Fnv64_t)*bp++;

    /* multiply by the 64 bit FNV magic prime mod 2^64 */
#if defined(NO_FNV_GCC_OPTIMIZATION)
    hval *= FNV_64_PRIME;
#else
    hval += (hval << 1) + (hval << 4) + (hval << 5) +
	    hval << 7) + (hval << 8) + (hval << 40);
#endif
}

Landon Curt Noll
chongo <was here> /\oo/\