Hash functions can be used for deterministic pseudo-random shuffling of elements.
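As a minimal sketch of this idea outside the database, elements can be ordered by the hash of their value. The helper name `hash_shuffle` and the `salt` parameter are hypothetical, and MD5 from Python's `hashlib` is only a stand-in for any hash function described in this section.

```python
import hashlib

def hash_shuffle(items, salt=""):
    """Return items in a pseudo-random order that depends only on their values and the salt."""
    def key(item):
        # The sort key is the hash of the element, so the resulting order
        # looks random but is fully reproducible.
        return hashlib.md5((salt + str(item)).encode("utf-8")).digest()
    return sorted(items, key=key)

items = ["a", "b", "c", "d", "e"]
first = hash_shuffle(items)
second = hash_shuffle(list(reversed(items)))  # same order, regardless of input order
```

Changing the salt yields a different but equally reproducible order.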
## halfMD5
## halfMD5 {#hash_functions-halfmd5}
Calculates the MD5 from a string. Then it takes the first 8 bytes of the hash and interprets them as UInt64 in big endian.
Accepts a String-type argument. Returns UInt64.
This function works fairly slowly (5 million short strings per second per processor core).
If you don't need MD5 in particular, use the 'sipHash64' function instead.
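The single-string behavior described above can be modeled in a few lines. This is a sketch of the description only (MD5, first 8 bytes, big-endian), written with Python's `hashlib`; the name `half_md5` is hypothetical, and the sketch does not cover how multiple arguments are combined.

```python
import hashlib

def half_md5(data: bytes) -> int:
    """First 8 bytes of the MD5 digest, read as a big-endian unsigned 64-bit integer."""
    digest = hashlib.md5(data).digest()  # MD5 digests are always 16 bytes long
    return int.from_bytes(digest[:8], byteorder="big")

value = half_md5(b"abc")
```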
[Interprets](../../query_language/functions/type_conversion_functions.md#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the MD5 hash value for each of them. It then combines the hashes, takes the first 8 bytes of the hash of the resulting string, and interprets them as `UInt64` in big-endian byte order.
## MD5
```
halfMD5(par1, ...)
```
The function is relatively slow (5 million short strings per second per processor core).
Consider using the [sipHash64](#hash_functions-siphash64) function instead.
**Parameters**
The function takes a variable number of input parameters. Parameters can be any of the [supported data types](../../data_types/index.md).
**Returned Value**
A hash value of the [UInt64](../../data_types/int_uint.md) data type.
This function [interprets](../../query_language/functions/type_conversion_functions.md#type_conversion_functions-reinterpretAsString) all the input parameters as strings and calculates the hash value for each of them. It then combines the hashes.
This is a cryptographic hash function. It works at least three times faster than the [MD5](#hash_functions-md5) function.
**Parameters**
The function takes a variable number of input parameters. Parameters can be any of the [supported data types](../../data_types/index.md).
**Returned Value**
A hash value of the [UInt64](../../data_types/int_uint.md) data type.
@@ -30,11 +77,41 @@ Differs from sipHash64 in that the final xor-folding state is only done up to 12
## cityHash64
Calculates CityHash64 from a string, or a similar hash function from any number of arguments of any type.
For String-type arguments, CityHash is used. This is a fast non-cryptographic hash function for strings with decent quality.
For other types of arguments, a decent implementation-specific fast non-cryptographic hash function is used.
If multiple arguments are passed, the function is calculated using the same rules, and the results are chained together using the CityHash combinator.
For example, you can compute a checksum of an entire table that does not depend on the row order: `SELECT sum(cityHash64(*)) FROM table`.
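The checksum ignores row order because addition is commutative: summing per-row hashes gives the same total for any permutation of the rows. The sketch below illustrates this with a stand-in 64-bit hash (BLAKE2b truncated to 8 bytes, not CityHash). The names `row_hash` and `table_checksum` are hypothetical, and wrapping the sum at 64 bits is an assumption that mirrors unsigned 64-bit arithmetic.

```python
import hashlib

MASK64 = (1 << 64) - 1  # keep the running sum within 64 bits

def row_hash(row) -> int:
    # Stand-in 64-bit hash of a row; this is NOT CityHash.
    data = repr(row).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

def table_checksum(rows) -> int:
    total = 0
    for row in rows:
        # Wrapping addition is commutative, so row order does not matter.
        total = (total + row_hash(row)) & MASK64
    return total

rows = [(1, "apple"), (2, "banana"), (3, "cherry")]
same = table_checksum(rows) == table_checksum(rows[::-1])
```

Two tables with the same rows in a different order produce the same checksum, while adding, removing, or changing a row almost certainly changes it.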
Produces a 64-bit hash value.
```
cityHash64(par1,...)
```
This is a fast non-cryptographic hash function. It uses the [CityHash](https://github.com/google/cityhash) algorithm for string parameters and an implementation-specific fast non-cryptographic hash function for parameters of other data types. To get the final result, the function uses the CityHash combinator.
**Parameters**
The function takes a variable number of input parameters. Parameters can be any of the [supported data types](../../data_types/index.md).
**Returned Value**
A hash value of the [UInt64](../../data_types/int_uint.md) data type.
@@ -97,7 +97,7 @@ SELECT toFixedString('foo\0bar', 8) AS s, toStringCutToZero(s) AS s_cut
These functions accept a string and interpret the bytes placed at the beginning of the string as a number in host order (little endian). If the string isn't long enough, the functions work as if the string is padded with the necessary number of null bytes. If the string is longer than needed, the extra bytes are ignored. A date is interpreted as the number of days since the beginning of the Unix Epoch, and a date with time is interpreted as the number of seconds since the beginning of the Unix Epoch.
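The padding and truncation rules can be sketched as follows, assuming a little-endian host as stated above. The helper name `reinterpret_as_uint` and its `width` parameter (the size of the target type in bytes) are hypothetical and only model the description.

```python
def reinterpret_as_uint(data: bytes, width: int) -> int:
    """Read the first `width` bytes of `data` as a little-endian unsigned integer."""
    head = data[:width]                    # bytes beyond the target width are ignored
    padded = head.ljust(width, b"\x00")    # short input is padded with null bytes
    return int.from_bytes(padded, byteorder="little")

short_input = reinterpret_as_uint(b"\xff", 4)                 # padded to 4 bytes
long_input = reinterpret_as_uint(b"\x01\x00\x00\x00\xff", 4)  # fifth byte ignored
```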
This function accepts a number, a date, or a date with time, and returns a string containing the bytes that represent the corresponding value in host order (little endian). Null bytes are dropped from the end. For example, a UInt32 type value of 255 is converted to a string that is one byte long.
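The reverse direction can be sketched in the same way, again assuming a little-endian host. The helper name `reinterpret_as_string` and its `width` parameter (the size of the source type in bytes) are hypothetical.

```python
def reinterpret_as_string(value: int, width: int) -> bytes:
    """Little-endian bytes of an unsigned integer, with trailing null bytes dropped."""
    raw = value.to_bytes(width, byteorder="little")
    return raw.rstrip(b"\x00")  # only trailing nulls are removed, inner ones stay

one_byte = reinterpret_as_string(255, 4)   # a UInt32 value of 255 becomes one byte
two_bytes = reinterpret_as_string(256, 4)  # the low null byte is kept
```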