1. 11 Oct 2018, 1 commit
    • support OnCompactionBegin (#4431) · 09814f2c
      Committed by Peter Pei
      Summary:
      fix #4288
      
      Add `OnCompactionBegin` support to `rocksdb::EventListener`.
      
      Currently, we only have these three callbacks:
      
      - OnFlushBegin
      - OnFlushCompleted
      - OnCompactionCompleted
      
      As paolococchi requested in #4288, and ajkr agreed, we should also support `OnCompactionBegin`.

      This PR attempts to implement support for `OnCompactionBegin`.

      Hope it is useful to you.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4431
      
      Differential Revision: D10055515
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 39c0f95f8e9ff1c7ca3a10787502a17f258d2334
      09814f2c
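The hook added above can be sketched as follows — a minimal Python mock of the listener pattern (RocksDB's `EventListener` is a C++ interface; the driver function, the `info` dict, and its fields are illustrative, not the real API):

```python
class EventListener:
    """No-op hooks, mirroring the callback names listed in the commit."""
    def OnFlushBegin(self, db, info): pass
    def OnFlushCompleted(self, db, info): pass
    def OnCompactionBegin(self, db, info): pass       # the hook this PR adds
    def OnCompactionCompleted(self, db, info): pass

class RecordingListener(EventListener):
    def __init__(self):
        self.events = []
    def OnCompactionBegin(self, db, info):
        self.events.append(("begin", info["job_id"]))
    def OnCompactionCompleted(self, db, info):
        self.events.append(("done", info["job_id"]))

def run_compaction(listeners, job_id):
    """Hypothetical driver: notify listeners around a compaction job."""
    info = {"job_id": job_id}
    for l in listeners:
        l.OnCompactionBegin(None, info)
    # ... compaction work would happen here ...
    for l in listeners:
        l.OnCompactionCompleted(None, info)

listener = RecordingListener()
run_compaction([listener], job_id=7)
print(listener.events)  # [('begin', 7), ('done', 7)]
```

With the new hook, a listener observes both ends of each compaction job instead of only its completion.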
  2. 09 Oct 2018, 1 commit
    • Fix DBImpl::GetColumnFamilyHandleUnlocked race condition (#4391) · 27090ae8
      Committed by DorianZheng
      Summary:
      - Fix DBImpl API race condition
      
      The timeline of the execution flow is as follows:
      ```
      timeline              user_thread1                      user_thread2
      t1   |     cfh = GetColumnFamilyHandleUnlocked(0)
      t2   |     id1 = cfh->GetID()
      t3   |                                                GetColumnFamilyHandleUnlocked(1)
      t4   |     id2 = cfh->GetID()
           V
      ```
      The original implementation returned a pointer to a stateful member variable, so the returned `ColumnFamilyHandle` would change when another thread called `GetColumnFamilyHandleUnlocked` with a different column family id.
      
      - Expose ColumnFamily ID to compaction event listener
      
      - Fix the return status of `DBImpl::GetLatestSequenceForKey`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4391
      
      Differential Revision: D10221243
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: dec60ee9ff0c8261a2f2413a8506ec1063991993
      27090ae8
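The race in the timeline above can be reproduced in a simplified, single-threaded Python sketch (class and method names mirror the commit for illustration only): returning a pointer to one shared mutable handle means a second lookup silently changes what the first caller holds.

```python
class ColumnFamilyHandle:
    def __init__(self, cf_id=0):
        self.cf_id = cf_id
    def GetID(self):
        return self.cf_id

class BuggyDB:
    def __init__(self):
        self._shared = ColumnFamilyHandle()   # one stateful object
    def GetColumnFamilyHandleUnlocked(self, cf_id):
        self._shared.cf_id = cf_id            # mutates the shared state
        return self._shared                   # every caller gets the same object

class FixedDB:
    def __init__(self):
        self._handles = {}
    def GetColumnFamilyHandleUnlocked(self, cf_id):
        # one stable handle per id, as after the fix
        return self._handles.setdefault(cf_id, ColumnFamilyHandle(cf_id))

buggy = BuggyDB()
cfh = buggy.GetColumnFamilyHandleUnlocked(0)
id1 = cfh.GetID()                             # t2: reads 0
buggy.GetColumnFamilyHandleUnlocked(1)        # t3: "user_thread2"
id2 = cfh.GetID()                             # t4: now reads 1 — the race
assert (id1, id2) == (0, 1)

fixed = FixedDB()
cfh = fixed.GetColumnFamilyHandleUnlocked(0)
fixed.GetColumnFamilyHandleUnlocked(1)        # other lookup has no effect
assert cfh.GetID() == 0
```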
  3. 15 Sep 2018, 1 commit
  4. 14 Aug 2018, 1 commit
  5. 11 Aug 2018, 1 commit
  6. 01 Aug 2018, 1 commit
    • Trace and Replay for RocksDB (#3837) · 12b6cdee
      Committed by Sagar Vemuri
      Summary:
      A framework for tracing and replaying RocksDB operations.
      
      A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench.
      
      - Column-families are supported
      - Multi-threaded tracing is supported.
      - TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it.
      - This is not yet ideal to be enabled in production due to large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations.
      
      Currently supported DB operations:
      - Writes:
      -- Put
      -- Merge
      -- Delete
      -- SingleDelete
      -- DeleteRange
      -- Write
      - Reads:
      -- Get (point lookups)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837
      
      Differential Revision: D7974837
      
      Pulled By: sagar0
      
      fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72
      12b6cdee
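The record-then-replay idea can be sketched with a toy in-memory tracer (the real framework writes a binary trace file and replays through db_bench; the names here are hypothetical):

```python
class InMemoryTraceWriter:
    """Stand-in for a TraceWriter: collects (op, args) records."""
    def __init__(self):
        self.records = []
    def Write(self, record):
        self.records.append(record)

def traced_put(db, tracer, key, value):
    tracer.Write(("Put", key, value))   # capture the op before applying it
    db[key] = value

def traced_delete(db, tracer, key):
    tracer.Write(("Delete", key))
    db.pop(key, None)

def replay(trace_records, db):
    """Re-apply captured operations, in order, against another store."""
    for op, *args in trace_records:
        if op == "Put":
            db[args[0]] = args[1]
        elif op == "Delete":
            db.pop(args[0], None)

src, tracer = {}, InMemoryTraceWriter()
traced_put(src, tracer, "k1", "v1")
traced_put(src, tracer, "k2", "v2")
traced_delete(src, tracer, "k1")

dst = {}
replay(tracer.records, dst)
assert dst == src == {"k2": "v2"}   # replay reproduces the source state
```

Because `TraceReader`/`TraceWriter` are user-pluggable in the real framework, the same capture loop could target a file, a message queue, or a logging service.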
  7. 21 Jul 2018, 1 commit
    • BlockBasedTableReader: automatically adjust tail prefetch size (#4156) · 8425c8bd
      Committed by Siying Dong
      Summary:
      Right now we use one hard-coded prefetch size to prefetch data from the tail of SST files. However, this may waste reads for some use cases while being insufficient for others.
      Introduce a way to adjust this prefetch size by tracking the sizes of the 32 most recent tail reads, and pick a value with which the wasted read is less than 10%.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4156
      
      Differential Revision: D8916847
      
      Pulled By: siying
      
      fbshipit-source-id: 8413f9eb3987e0033ed0bd910f83fc2eeaaf5758
      8425c8bd
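One way the heuristic above could work is sketched below (a guess at the logic, not RocksDB's actual code): remember the last 32 tail sizes actually needed, then pick the largest candidate prefetch size whose wasted bytes stay under 10% of what would be prefetched.

```python
from collections import deque

class TailPrefetchStats:
    def __init__(self, history=32):
        self.recent = deque(maxlen=history)   # 32 most recent tail sizes

    def RecordEffectiveSize(self, size):
        self.recent.append(size)

    def SuggestedPrefetchSize(self, default=512 * 1024, max_waste=0.10):
        if not self.recent:
            return default                    # no history yet: fall back
        best = min(self.recent)               # smallest observed size wastes nothing
        for candidate in sorted(set(self.recent)):
            total = candidate * len(self.recent)
            wasted = sum(max(candidate - s, 0) for s in self.recent)
            if wasted / total <= max_waste:
                best = candidate              # largest size still under the cap
        return best

stats = TailPrefetchStats()
for size in [100, 110, 105, 4000]:            # one outlier tail
    stats.RecordEffectiveSize(size)
# Prefetching 4000 bytes would waste ~73% on the three small tails,
# so a small prefetch size is suggested instead.
assert stats.SuggestedPrefetchSize() == 110
```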
  8. 14 Jul 2018, 1 commit
    • Per-thread unique test db names (#4135) · 8581a93a
      Committed by Maysam Yabandeh
      Summary:
      The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
      Example: ``` ~/gtest-parallel/gtest-parallel ./table_test```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135
      
      Differential Revision: D8846653
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
      8581a93a
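The idea can be sketched as follows (illustrative only — the actual patch derives per-thread paths inside the C++ test harness): each worker builds its db path from its process and thread id, so parallel runs never collide on a directory.

```python
import os
import threading

def per_thread_db_path(base="/tmp/rocksdb_test"):
    # pid + thread id keeps paths unique per concurrent worker
    return f"{base}_{os.getpid()}_{threading.get_ident()}"

paths = set()
lock = threading.Lock()
barrier = threading.Barrier(4)   # keep all four threads alive at once

def worker():
    barrier.wait()               # thread ids are only unique among live threads
    with lock:
        paths.add(per_thread_db_path())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(paths) == 4           # one distinct db path per test thread
```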
  9. 28 Jun 2018, 3 commits
    • Remove ReadOnly part of PinnableSliceAndMmapReads from Lite (#4070) · 0a5b5d88
      Committed by Maysam Yabandeh
      Summary:
      Lite does not support readonly DBs.
      Closes https://github.com/facebook/rocksdb/pull/4070
      
      Differential Revision: D8677858
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 536887d2363ee2f5d8e1ea9f1a511e643a1707fa
      0a5b5d88
    • Pin mmap files in ReadOnlyDB (#4053) · 235ab9dd
      Committed by Maysam Yabandeh
      Summary:
      https://github.com/facebook/rocksdb/pull/3881 fixed a bug where PinnableSlice pinned mmap files that could be deleted by background compaction. This is, however, a non-issue for a ReadOnlyDB when there is no compaction running and max_open_files is -1. This patch re-enables the pinning feature for that case.
      Closes https://github.com/facebook/rocksdb/pull/4053
      
      Differential Revision: D8662546
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 402962602eb0f644e17822748332999c3af029fd
      235ab9dd
    • WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955) · a16e00b7
      Committed by Manuel Ung
      Summary:
      This is implemented by extending ReadCallback with another function `MaxUnpreparedSequenceNumber` which returns the largest visible sequence number for the current transaction, if there is uncommitted data written to DB. Otherwise, it returns zero, indicating no uncommitted data.
      
      These are the places where reads had to be modified:
      - Get and Seek/Next were updated to seek to max(snapshot_seq, MaxUnpreparedSequenceNumber()) instead, and iterate until a key is visible.
      - Prev did not need updates since it does not use the seek-to-sequence-number optimization. Assuming that locks were held when writing unprepared keys, and ValidateSnapshot runs, there should only be committed keys and unprepared keys of the current transaction, all of which are visible. Prev will simply iterate to get the last visible key.
      - The reseeking-to-skip-keys optimization was also disabled for write unprepared, since it's possible to hit the max_skip condition even while reseeking. There needs to be some way to resolve infinite looping in this case.
      Closes https://github.com/facebook/rocksdb/pull/3955
      
      Differential Revision: D8286688
      
      Pulled By: lth
      
      fbshipit-source-id: 25e42f47fdeb5f7accea0f4fd350ef35198caafe
      a16e00b7
  10. 02 Jun 2018, 1 commit
    • Copy Get() result when file reads use mmap · fea2b1df
      Committed by Andrew Kryczka
      Summary:
      For iterator reads, a `SuperVersion` is pinned to preserve a snapshot of SST files, and `Block`s are pinned to allow `key()` and `value()` to return pointers directly into a RocksDB memory region. This works for both non-mmap reads, where the block owns the memory region, and mmap reads, where the file owns the memory region.
      
      For point reads with `PinnableSlice`, only the `Block` object is pinned. This works for non-mmap reads because the block owns the memory region, so even if the file is deleted after compaction, the memory region survives. However, for mmap reads, file deletion causes the memory region to which the `PinnableSlice` refers to be unmapped.   The result is usually a segfault upon accessing the `PinnableSlice`, although sometimes it returned wrong results (I repro'd this a bunch of times with `db_stress`).
      
      This PR copies the value into the `PinnableSlice` when it comes from mmap'd memory. We can tell whether the `Block` owns its memory using `Block::cachable()`, which is unset when reads do not use the provided buffer as is the case with mmap file reads. When that is false we ensure the result of `Get()` is copied.
      
      This feels like a short-term solution as ideally we'd have the `PinnableSlice` pin the mmap'd memory so we can do zero-copy reads. It seemed hard so I chose this approach to fix correctness in the meantime.
      Closes https://github.com/facebook/rocksdb/pull/3881
      
      Differential Revision: D8076288
      
      Pulled By: ajkr
      
      fbshipit-source-id: 31d78ec010198723522323dbc6ea325122a46b08
      fea2b1df
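The pin-vs-copy decision can be sketched in Python (a simplified model — the real `PinnableSlice` and `Block::cachable()` are C++; the mutation below stands in for the file being unmapped):

```python
class PinnableSlice:
    def __init__(self):
        self.data = None
        self.pinned = False
    def PinSlice(self, buf):
        # zero-copy: keep a reference into someone else's memory
        self.data, self.pinned = buf, True
    def PinSelf(self, buf):
        # owned copy: safe even if the source memory goes away
        self.data, self.pinned = bytes(buf), False

def fill_get_result(slice_, block_data, block_cachable):
    # block_cachable=False models an mmap-backed block: copy the value,
    # since the file (not the block) owns the memory region.
    if block_cachable:
        slice_.PinSlice(block_data)
    else:
        slice_.PinSelf(block_data)

mmap_backed = bytearray(b"value-from-mmap")
s = PinnableSlice()
fill_get_result(s, mmap_backed, block_cachable=False)
mmap_backed[:] = b"XXXXXXXXXXXXXXX"   # simulate the mapping being torn down
assert s.data == b"value-from-mmap"   # the copied result survives
```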
  11. 10 May 2018, 1 commit
    • Apply use_direct_io_for_flush_and_compaction to writes only · 072ae671
      Committed by Andrew Kryczka
      Summary:
      Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used.
      
      This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to apply only to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of background direct writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously.
      Closes https://github.com/facebook/rocksdb/pull/3829
      
      Differential Revision: D7915443
      
      Pulled By: ajkr
      
      fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279
      072ae671
  12. 13 Apr 2018, 1 commit
  13. 06 Apr 2018, 1 commit
    • Support for Column family specific paths. · 446b32cf
      Committed by Phani Shekhar Mantripragada
      Summary:
      In this change, an option to set different paths for different column families is added.
      This option is set via the cf_paths setting of ColumnFamilyOptions and works in a similar fashion to the db_paths setting. cf_paths is a vector of DbPath values, each containing an absolute path and a target size. Multiple levels in a column family can go to different paths if cf_paths has more than one path.
      To maintain backward compatibility, if cf_paths is not specified for a column family, the db_paths setting is used. Note that if db_paths is also not specified, RocksDB already has code to use db_name as the only path.
      
      Changes :
      1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions.  This member is used to identify the path information whenever files are accessed.
      2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting.
      3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths.
      4) Unit tests are added appropriately.
      Closes https://github.com/facebook/rocksdb/pull/3102
      
      Differential Revision: D6951697
      
      Pulled By: ajkr
      
      fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d
      446b32cf
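The fallback order described above can be sketched as a small function (names and the `(path, target_size)` tuple shape are illustrative, not the real `DbPath` struct):

```python
def effective_cf_paths(cf_paths, db_paths, db_name):
    """cf_paths if set, else db_paths, else the db name as the only path."""
    if cf_paths:
        return cf_paths
    if db_paths:
        return db_paths
    return [(db_name, 0)]   # 0 stands in for "no target size limit"

db_paths = [("/fast/ssd", 10 << 30), ("/big/hdd", 100 << 30)]

# A column family with its own paths uses them:
assert effective_cf_paths([("/cf1", 1 << 30)], db_paths, "/db") == [("/cf1", 1 << 30)]
# Without cf_paths, it falls back to db_paths (backward compatible):
assert effective_cf_paths([], db_paths, "/db") == db_paths
# With neither, the db name is the only path:
assert effective_cf_paths([], [], "/db") == [("/db", 0)]
```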
  14. 07 Mar 2018, 1 commit
    • Windows cumulative patch · c364eb42
      Committed by Dmitri Smirnov
      Summary:
      This patch addresses several issues:
        Portability, including db_test std::thread -> port::Thread Cc: @
        and %z to a ROCKSDB portable macro. Cc: maysamyabandeh

        Implement Env::AreFilesSame.

        Make the implementation of file unique numbers more robust.

        Get rid of the C runtime and go directly to the Windows API when dealing
        with file primitives.

        Implement GetSectorSize() and align unbuffered reads on that value if
        available.

        Adjust the Windows Logger for the new interface, implement CloseImpl(). Cc: anand1976

        Fix a test running script issue where the $status var was of incorrect scope,
        so failures were swallowed and not reported.

        DestroyDB() creates a logger and opens a LOG file in the directory
        being cleaned up. This holds a lock on the folder, so the cleanup is
        prevented, which fails one of the checkpoint tests. We observe the same in production.
        We close the log file in this change.

        Fix the DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure, where the test
        attempts to open a directory with NewRandomAccessFile, which does not
        work on Windows.

        Fix DBTest.SoftLimit as it is dependent on thread timing. Cc: yiwu-arbug
      Closes https://github.com/facebook/rocksdb/pull/3552
      
      Differential Revision: D7156304
      
      Pulled By: siying
      
      fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b
      c364eb42
  15. 06 Mar 2018, 1 commit
  16. 23 Feb 2018, 2 commits
  17. 31 Jan 2018, 1 commit
  18. 18 Jan 2018, 1 commit
    • fix live WALs purged while file deletions disabled · 46e599fc
      Committed by Andrew Kryczka
      Summary:
      When calling `DisableFileDeletions` followed by `GetSortedWalFiles`, we guarantee the files returned by the latter call won't be deleted until after file deletions are re-enabled. However, `GetSortedWalFiles` didn't omit files already planned for deletion via `PurgeObsoleteFiles`, so the guarantee could be broken.
      
      We fix it by making `GetSortedWalFiles` wait for the number of pending purges to hit zero if file deletions are disabled. This condition is eventually met since `PurgeObsoleteFiles` is guaranteed to be called for the existing pending purges, and new purges cannot be scheduled while file deletions are disabled. Once the condition is met, `GetSortedWalFiles` simply returns the content of DB and archive directories, which nobody can delete (except for deletion scheduler, for which I plan to fix this bug later) until deletions are re-enabled.
      Closes https://github.com/facebook/rocksdb/pull/3341
      
      Differential Revision: D6681131
      
      Pulled By: ajkr
      
      fbshipit-source-id: 90b1e2f2362ea9ef715623841c0826611a817634
      46e599fc
  19. 10 Nov 2017, 1 commit
    • use bottommost compression when base level is bottommost · 93f69cb9
      Committed by Andrew Kryczka
      Summary:
      The previous compression type selection caused unexpected behavior when the base level was also the bottommost level. The following sequence of events could happen:
      
      - full compaction generates files with the `bottommost_compression` type
      - now the base level is the bottommost level, since all files are in the same level
      - any compaction causes files to be rewritten with the `compression_per_level` type, since bottommost compression didn't apply to the base level
      
      I changed the code to make bottommost compression apply to base level.
      Closes https://github.com/facebook/rocksdb/pull/3141
      
      Differential Revision: D6264614
      
      Pulled By: ajkr
      
      fbshipit-source-id: d7aaa8675126896684154a1f2c9034d6214fde82
      93f69cb9
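The selection rule after this change can be sketched as follows (a simplified model with illustrative parameter names — the real decision also accounts for per-level compression vectors):

```python
def pick_compression(output_level, bottommost_level,
                     default_compression, bottommost_compression):
    """bottommost_compression wins whenever the output level is bottommost,
    including the previously broken case where the base level is bottommost."""
    if bottommost_compression is not None and output_level >= bottommost_level:
        return bottommost_compression
    return default_compression

# Base level == bottommost level (e.g. after a full compaction into L6):
# files are no longer rewritten with the default type.
assert pick_compression(6, 6, "lz4", "zstd") == "zstd"
# Non-bottommost output keeps the default compression type:
assert pick_compression(1, 6, "lz4", "zstd") == "lz4"
# With no bottommost_compression configured, behavior is unchanged:
assert pick_compression(6, 6, "lz4", None) == "lz4"
```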
  20. 03 Nov 2017, 1 commit
    • pass key/value samples through zstd compression dictionary generator · 24ad4306
      Committed by Andrew Kryczka
      Summary:
      Instead of using samples directly, we now support passing the samples through zstd's dictionary generator when `CompressionOptions::zstd_max_train_bytes` is set to nonzero. If set to zero, we will use the samples directly as the dictionary -- same as before.
      
      Note this is the first step of #2987, extracted into a separate PR per reviewer request.
      Closes https://github.com/facebook/rocksdb/pull/3057
      
      Differential Revision: D6116891
      
      Pulled By: ajkr
      
      fbshipit-source-id: 70ab13cc4c734fa02e554180eed0618b75255497
      24ad4306
  21. 17 Oct 2017, 1 commit
    • fix lite build · 8e63cad0
      Committed by Yi Wu
      Summary:
      * make `checksum_type_string_map` available for lite
      * comment out `FilesPerLevel` in lite mode.
      * Travis and legocastle lite builds also build the `all` target and run tests
      Closes https://github.com/facebook/rocksdb/pull/3015
      
      Differential Revision: D6069822
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 9fe92ac220e711e9e6ed4e921bd25ef4314796a0
      8e63cad0
  22. 12 Sep 2017, 1 commit
    • write-prepared txn: call IsInSnapshot · f46464d3
      Committed by Maysam Yabandeh
      Summary:
      This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot.
      Closes https://github.com/facebook/rocksdb/pull/2850
      
      Differential Revision: D5787375
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c
      f46464d3
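The instrumented read path can be sketched like this (a Python model of the idea — the real `ReadCallback` is a C++ class, and the visibility rule shown for unprepared sequence numbers is a simplification):

```python
class ReadCallback:
    """Accepts or rejects a value by its sequence number."""
    def __init__(self, snapshot_seq, excluded=frozenset()):
        self.snapshot_seq = snapshot_seq
        self.excluded = excluded      # e.g. seqs not in the read snapshot
    def IsVisible(self, seq):
        return seq <= self.snapshot_seq and seq not in self.excluded

def get(versions, key, callback=None):
    # versions: {key: [(seq, value), ...]} ordered newest-first.
    # If the callback rejects a version, the reader moves on to the next one.
    for seq, value in versions.get(key, []):
        if callback is None or callback.IsVisible(seq):
            return value
    return None

versions = {"k": [(12, "v3"), (8, "v2"), (5, "v1")]}
cb = ReadCallback(snapshot_seq=10, excluded={8})
assert get(versions, "k", cb) == "v1"   # 12 is too new, 8 is excluded
assert get(versions, "k") == "v3"       # no callback: newest version wins
```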
  23. 31 Aug 2017, 1 commit
    • Extend property map with compaction stats · 8a6708f5
      Committed by Artem Danilov
      Summary:
      This branch extends the existing property map, which keeps values as doubles, to also keep values as strings, so that it can provide a wider range of properties. The immediate need is to provide IO stall stats in an easily parseable way to MyRocks, which is also part of this branch.
      Closes https://github.com/facebook/rocksdb/pull/2794
      
      Differential Revision: D5717676
      
      Pulled By: Tema
      
      fbshipit-source-id: e34ba5b79ba774697f7b97ce1138d8fd55471b8a
      8a6708f5
  24. 25 Aug 2017, 1 commit
    • Allow DB reopen with reduced options.num_levels · 3c840d1a
      Committed by Yi Wu
      Summary:
      Allow the user to reduce the number of levels in the LSM by issuing a full CompactRange() and putting the result in a lower level, and then reopening the DB with a reduced options.num_levels. Previously this would fail on reopen when recovery, replaying the previous MANIFEST, found a historical file on a level higher than the new options.num_levels. The workaround was, after CompactRange(), to reopen the DB with the old num_levels, which creates a new MANIFEST, and then reopen the DB again with the new num_levels.

      This patch relaxes the check of levels during recovery. It allows the DB to open if there was a historical file on a level > options.num_levels, as long as that file was also deleted.
      Closes https://github.com/facebook/rocksdb/pull/2740
      
      Differential Revision: D5629354
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 545903f6b36b6083e8cbaf777176aef2f488021d
      3c840d1a
  25. 12 Aug 2017, 1 commit
    • Support prefetch last 512KB with direct I/O in block based file reader · 666a005f
      Committed by Siying Dong
      Summary:
      Right now, if direct I/O is enabled, prefetching the last 512KB cannot be applied, except for compaction inputs or when readahead is enabled for iterators. This can create a lot of I/O for HDD cases. To solve the problem, the 512KB is prefetched in the block based table if direct I/O is enabled. The prefetched buffer is passed in together with the random access file reader, so that we try to read from the buffer before reading from the file. This can be extended in the future to support flexible user iterator readahead too.
      Closes https://github.com/facebook/rocksdb/pull/2708
      
      Differential Revision: D5593091
      
      Pulled By: siying
      
      fbshipit-source-id: ee36ff6d8af11c312a2622272b21957a7b5c81e7
      666a005f
  26. 27 Jul 2017, 1 commit
  27. 22 Jul 2017, 2 commits
  28. 16 Jul 2017, 1 commit
  29. 27 Jun 2017, 1 commit
    • Encryption at rest support · 51778612
      Committed by Ewout Prangsma
      Summary:
      This PR adds support for encrypting data stored by RocksDB when written to disk.
      
      It adds an `EncryptedEnv` override of the `Env` class with matching overrides for sequential&random access files.
      The encryption itself is done through a configurable `EncryptionProvider`. This class is asked to create a `BlockAccessCipherStream` for a file. This is where the actual encryption/decryption is done.
      Currently there is a Counter mode implementation of `BlockAccessCipherStream` with a `ROT13` block cipher (NOTE the `ROT13` is for demo purposes only!!).
      
      The Counter operation mode uses an initial counter & random initialization vector (IV).
      Both are created randomly for each file and stored in a 4K (default size) block that is prefixed to that file. The `EncryptedEnv` implementation is such that clients of the `Env` class do not see this prefix (neither in the data nor in the file size).
      The largest part of the prefix block is also encrypted, and there is room left for implementation specific settings/values/keys in there.
      
      To test the encryption, the `DBTestBase` class has been extended to consider a new environment variable called `ENCRYPTED_ENV`. If set, the test will set up an encrypted instance of the `Env` class to use for all tests.
      Typically you would run it like this:
      
      ```
      ENCRYPTED_ENV=1 make check_some
      ```
      
      There is also an added test that checks that some data inserted into the database is or is not "visible" on disk. With `ENCRYPTED_ENV` active it must not find plain text strings, with `ENCRYPTED_ENV` unset, it must find the plain text strings.
      Closes https://github.com/facebook/rocksdb/pull/2424
      
      Differential Revision: D5322178
      
      Pulled By: sdwilsh
      
      fbshipit-source-id: 253b0a9c2c498cc98f580df7f2623cbf7678a27f
      51778612
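The counter operation mode can be sketched with a toy keystream (the "block cipher" below is a stand-in PRF, in the same spirit as the demo ROT13 mentioned above — do not use anything like this for real encryption). The key property: the keystream depends only on counter and IV, so XORing twice round-trips.

```python
def toy_block_cipher(block_index, iv, key=0x5C):
    # Stand-in PRF: derive 16 keystream bytes from the counter block + IV.
    return bytes(((block_index * 31 + iv + i) ^ key) & 0xFF for i in range(16))

def ctr_xor(data, iv, init_counter=0):
    """Counter mode: XOR each byte with its keystream byte.
    Encryption and decryption are the same operation."""
    out = bytearray()
    for i, b in enumerate(data):
        keystream = toy_block_cipher(init_counter + i // 16, iv)
        out.append(b ^ keystream[i % 16])
    return bytes(out)

plain = b"rocksdb sst file contents"
cipher = ctr_xor(plain, iv=42)       # iv/counter would live in the 4K prefix block
assert cipher != plain               # on-disk bytes are not the plain text
assert ctr_xor(cipher, iv=42) == plain   # same operation decrypts
```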
  30. 14 Jun 2017, 1 commit
  31. 06 Jun 2017, 1 commit
  32. 03 Jun 2017, 1 commit
    • Improve write buffer manager (and allow the size to be tracked in block cache) · 95b0e89b
      Committed by Siying Dong
      Summary:
      Improve write buffer manager in several ways:
      1. Size is tracked when an arena block is allocated, rather than on every allocation, so that it better tracks actual memory usage and the tracking overhead is slightly lower.
      2. We start to trigger memtable flush when memory usage hits 7/8 of the memory cap, instead of 100%, and make 100% much harder to hit.
      3. Allow a cache object to be passed into the buffer manager, so the memory allocated by memtables can be costed there. This can help users have one single memory cap across block cache and memtables.
      Closes https://github.com/facebook/rocksdb/pull/2350
      
      Differential Revision: D5110648
      
      Pulled By: siying
      
      fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
      95b0e89b
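Point 2 above can be sketched as follows (thresholds match the 7/8 rule stated in the summary; the class shape is illustrative, and RocksDB's real flush condition is more involved):

```python
class WriteBufferManager:
    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.memory_used = 0

    def ReserveMem(self, n):
        # in the real manager, tracked per arena block, not per allocation
        self.memory_used += n

    def FreeMem(self, n):
        self.memory_used -= n

    def ShouldFlush(self):
        # trigger at 7/8 of the cap instead of waiting for 100%
        return self.memory_used >= self.buffer_size * 7 // 8

wbm = WriteBufferManager(buffer_size=8 << 20)   # 8 MB cap
wbm.ReserveMem(6 << 20)
assert not wbm.ShouldFlush()                    # 6 MB < 7 MB trigger point
wbm.ReserveMem(1 << 20)
assert wbm.ShouldFlush()                        # 7 MB hits the 7/8 mark
```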
  33. 25 May 2017, 1 commit
    • Introduce max_background_jobs mutable option · bb01c188
      Committed by Andrew Kryczka
      Summary:
      - `max_background_flushes` and `max_background_compactions` are still supported for backwards compatibility
      - `base_background_compactions` is completely deprecated. Now we just throttle to one background compaction when there's no pressure.
      - `max_background_jobs` is added to automatically partition the concurrent background jobs into flushes vs compactions. Currently it's very simple as we just allocate one-fourth of the jobs to flushes, and the remaining can be used for compactions.
      - The test cases that set `base_background_compactions > 1` needed to be updated. I just grab the pressure token such that the desired number of compactions can be scheduled.
      Closes https://github.com/facebook/rocksdb/pull/2205
      
      Differential Revision: D4937461
      
      Pulled By: ajkr
      
      fbshipit-source-id: df52cbbd497e13bbc9a60560a5ac2a2526b3f1f9
      bb01c188
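The partitioning rule can be sketched like this (the one-fourth split matches the summary; the rounding and the minimum-of-one details are guesses for illustration):

```python
def partition_jobs(max_background_jobs):
    """Allocate one-fourth of the jobs to flushes, the rest to compactions."""
    flushes = max(1, max_background_jobs // 4)
    compactions = max(1, max_background_jobs - flushes)
    return flushes, compactions

assert partition_jobs(8) == (2, 6)
assert partition_jobs(4) == (1, 3)
assert partition_jobs(1) == (1, 1)   # never starve either job type entirely
```

A single `max_background_jobs` knob replaces tuning `max_background_flushes` and `max_background_compactions` separately, which remain only for backwards compatibility.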
  34. 11 May 2017, 2 commits
    • fix readampbitmap tests · 492fc49a
      Committed by Aaron Gao
      Summary:
      Fix test failures of ReadAmpBitmap and ReadAmpBitmapLiveInCacheAfterDBClose.
      Tested ReadAmpBitmapLiveInCacheAfterDBClose individually and with make check.
      Closes https://github.com/facebook/rocksdb/pull/2271
      
      Differential Revision: D5038133
      
      Pulled By: lightmark
      
      fbshipit-source-id: 803cd6f45ccfdd14a9d9473c8af311033e164be8
      492fc49a
    • portable sched_getcpu calls · be421b0b
      Committed by Andrew Kryczka
      Summary:
      - added a feature test in build_detect_platform to check whether sched_getcpu() is available. glibc offers it only on some platforms (e.g., linux but not mac); this way should be easier than maintaining a list of platforms on which it's available.
      - refactored PhysicalCoreID() to be simpler / less repetitive, and ordered the conditional compilation clauses from most-to-least preferred
      Closes https://github.com/facebook/rocksdb/pull/2272
      
      Differential Revision: D5038093
      
      Pulled By: ajkr
      
      fbshipit-source-id: 81d7db3cc620250de220bdeb3194b2b3d7673de7
      be421b0b
  35. 10 May 2017, 1 commit
    • unbiase readamp bitmap · 259a00ea
      Committed by Aaron Gao
      Summary:
      Consider BlockReadAmpBitmap with bytes_per_bit = 32. Suppose bytes [a, b) were used, while bytes [a-32, a)
       and [b+1, b+33) weren't used; more formally, the union of ranges passed to BlockReadAmpBitmap::Mark() contains [a, b) and doesn't intersect with [a-32, a) and [b+1, b+33). Then bits [floor(a/32), ceil(b/32)] will be set, and so the number of useful bytes will be estimated as (ceil(b/32) - floor(a/32)) * 32, which is on average equal to b-a+31.
      
      An extreme example: if we use 1 byte from each block, it'll be counted as 32 bytes from each block.
      
      It's easy to remove this bias by slightly changing the semantics of the bitmap. Currently each bit represents a byte range [i*32, (i+1)*32).
      
      This diff makes each bit represent a single byte: i*32 + X, where X is a random number in [0, 31] generated when bitmap is created. So, e.g., if you read a single byte at random, with probability 31/32 it won't be counted at all, and with probability 1/32 it will be counted as 32 bytes; so, on average it's counted as 1 byte.
      
      *But there is one exception: the last bit will always be set, as in the old way.*
      
      (*) - assuming read_amp_bytes_per_bit = 32.
      Closes https://github.com/facebook/rocksdb/pull/2259
      
      Differential Revision: D5035652
      
      Pulled By: lightmark
      
      fbshipit-source-id: bd98b1b9b49fbe61f9e3781d07f624e3cbd92356
      259a00ea
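The unbiased scheme can be sketched as follows (a simplified model, without the last-bit exception noted above): bit i samples exactly one byte, i*32 + X, with X fixed at bitmap creation; marking a range sets a bit only if it covers that sampled byte, and each set bit then counts as 32 bytes, which is unbiased on average.

```python
import random

class ReadAmpBitmap:
    def __init__(self, size, bytes_per_bit=32, seed=0):
        self.bpb = bytes_per_bit
        self.offset = random.Random(seed).randrange(bytes_per_bit)  # X in [0, 31]
        self.bits = [False] * ((size + bytes_per_bit - 1) // bytes_per_bit)

    def Mark(self, start, end):
        """Record that bytes [start, end] (inclusive) were used."""
        for i in range(len(self.bits)):
            sampled = i * self.bpb + self.offset  # the one byte this bit samples
            if start <= sampled <= end:
                self.bits[i] = True

    def EstimatedUsefulBytes(self):
        return sum(self.bits) * self.bpb          # each set bit counts as 32 bytes

bm = ReadAmpBitmap(size=1024, bytes_per_bit=32, seed=1)
bm.Mark(0, 1023)                          # read the whole block range
assert bm.EstimatedUsefulBytes() == 1024  # full reads are still counted exactly
```

A single random byte read is counted as 0 bytes with probability 31/32 and as 32 bytes with probability 1/32, so the expected count is 1 byte, removing the old up-to-31-byte overestimate per range.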