- 04 4月, 2014 4 次提交
-
-
由 antirez 提交于
Using a seed of zero has the side effect of having the empty string hashing to what is a very special case in the context of HyperLogLog: a very long run of zeroes. This did not influenced the correctness of the result with 16k registers because of the harmonic mean, but still it is inconvenient that a so obvious value maps to a so special hash. The seed 0xadc83b19 is used instead, which is the first 64 bits of the SHA1 of the empty string. Reference: issue #1657.
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
We need to guarantee that the last bit is 1, otherwise an element may hash to just zeroes with probability 1/(2^64) and trigger an infinite loop. See issue #1657.
-
- 03 4月, 2014 5 次提交
- 02 4月, 2014 5 次提交
-
-
由 antirez 提交于
The function to generate graphs is also more flexible as now includes step and max value. The step of the samples generation function is no longer limited to min step of 1000.
-
由 antirez 提交于
This will allow future changes like compressed representations. Currently the magic is not checked for performance reasons but this may change in the future, for example if we add new types encoded in strings that may have the same size of HyperLogLogs.
-
由 Salvatore Sanfilippo 提交于
Change HLL* to PF* in a few spots
-
由 Raymond Myers 提交于
-
由 Raymond Myers 提交于
-
- 01 4月, 2014 4 次提交
- 31 3月, 2014 12 次提交
-
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
Better results can be achieved by compensating for the bias of the raw approximation just after 2.5m (when LINEARCOUNTING is no longer used) by using a polynomial that approximates the bias at a given cardinality. The curve used was found using this web page: http://www.xuru.org/rt/PR.asp That performs polynomial regression given a set of values.
-
由 antirez 提交于
Merge N HLL data structures by selecting the max value for every M[i] register among the set of HLLs.
-
由 antirez 提交于
When we update the cached value, we need to propagate the command and signal the key as modified for WATCH.
-
由 antirez 提交于
The following form is given: HLLADD myhll No element is provided in the above case so if 'myhll' var does not exist the result is to just create an empty HLL structure, and no update will be performed on the registers. In this case, the DB should still be set dirty and the command propagated.
-
由 antirez 提交于
-
由 antirez 提交于
Those are generated to trace graphs using gnuplot.
-
由 antirez 提交于
The HyperLogLog original paper suggests using LINEARCOUNTING for cardinalities < 2.5m, however for P=14 the median / max error curves show that a value of '3' is the best pick for m = 16384.
-
由 antirez 提交于
-
由 antirez 提交于
The more we add elements to an HyperLogLog counter, the smaller is the probability that we actually update some register. From this observation it is easy to see how it is possible to use caching of a previously computed cardinality and reuse it to serve HLLCOUNT queries as long as no register was updated in the data structure. This commit does exactly this by using just additional 8 bytes for the data structure to store a 64 bit unsigned integer value cached cardinality. When the most significant bit of the 64 bit integer is set, it means that the value computed is no longer usable since at least a single register was modified and we need to recompute it at the next call of HLLCOUNT. The value is always stored in little endian format regardless of the actual CPU endianess.
-
由 antirez 提交于
All the Redis functions that need to modify the string value of a key in a destructive way (APPEND, SETBIT, SETRANGE, ...) require to make the object unshared (if refcount > 1) and encoded in raw format (if encoding is not already REDIS_ENCODING_RAW). This was cut & pasted many times in multiple places of the code. This commit puts the small logic needed into a function called dbUnshareStringValue().
-
- 30 3月, 2014 1 次提交
-
-
由 antirez 提交于
We need to be sure that you can save a dataset in a Redis instance, reload it in a different architecture, and continue to count in the same HyperLogLog structure. So 32 and 64 bit, little or bit endian, must all guarantee to output the same hash for the same element.
-
- 29 3月, 2014 9 次提交
-
-
由 antirez 提交于
The new representation is more obvious, starting from the LSB of the first byte and using bits going to MSB, and passing to next byte as needed. There was also a subtle error: first two bits were unused, everything was carried over on the right of two bits, even if it worked because of the code requirement of always having a byte more at the end. During the rewrite the code was made safer trying to avoid undefined behavior due to shifting an uint8_t for more than 8 bits.
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
This speedups the macros by a noticeable factor.
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
-
由 antirez 提交于
There was an error in the computation of 2^register, and the sequence of zeroes computed after the hashing did not included the "1".
-
由 antirez 提交于
-