    Shared dictionary compression using reference block · 843d2e31
    Committed by Andrew Kryczka
    Summary:
    This adds a new metablock containing a shared dictionary that is used
    to compress all data blocks in the SST file. The size of the shared dictionary
    is configurable in CompressionOptions and defaults to 0. It's currently only
    used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
    the compression type if the user chooses a nonzero dictionary size.
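
The rule in the paragraph above can be sketched as two small predicates. This is an illustration only: `SketchCompression`, `SketchCompressionOptions`, `WritesDictionaryBlock`, and `UsesDictionary` are hypothetical stand-ins rather than RocksDB types, and the field name `max_dict_bytes` is an assumed name for the dictionary size setting the summary mentions.

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only; not RocksDB's real types.
enum class SketchCompression { kNone, kSnappy, kZlib, kBZip2, kLZ4, kLZ4HC };

struct SketchCompressionOptions {
  // Assumed field name. Zero (the default) disables the shared dictionary.
  std::uint32_t max_dict_bytes = 0;
};

// The dictionary metablock is stored whenever a nonzero size is configured,
// regardless of which compression type is selected.
bool WritesDictionaryBlock(const SketchCompressionOptions& opts) {
  return opts.max_dict_bytes > 0;
}

// Only zlib, lz4 and lz4hc actually consume the dictionary when compressing.
bool UsesDictionary(SketchCompression type,
                    const SketchCompressionOptions& opts) {
  if (opts.max_dict_bytes == 0) return false;
  switch (type) {
    case SketchCompression::kZlib:
    case SketchCompression::kLZ4:
    case SketchCompression::kLZ4HC:
      return true;
    default:
      return false;
  }
}
```

With a nonzero size and, say, Snappy selected, the first predicate is true and the second is false: the block is written but no data block benefits from it.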
    
    During compaction, the dictionary is computed by randomly sampling the first
    output file in each subcompaction. The intervals to sample are pre-computed
    by assuming the output file will have the maximum allowable length. If the
    file is smaller, some of the pre-computed sampling intervals can lie beyond
    end-of-file; those samples are skipped and the dictionary ends up smaller
    than the configured size. After the dictionary is generated from the first
    file in a subcompaction, it is loaded into the compression library before
    writing each block in each subsequent file of that subcompaction.
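
A minimal sketch of the sampling scheme described above, not the code from this change. It assumes fixed 64-byte samples (`kSampleLen`), a seeded `std::mt19937`, and a whole file held in memory; `PlanSampleOffsets` and `BuildDictionary` are hypothetical names chosen for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <string>
#include <vector>

// Assumed sample length; each sampling interval covers this many bytes.
constexpr std::size_t kSampleLen = 64;

// Pick the start offsets of the sampling intervals up front, assuming the
// output file reaches max_file_size. Offsets are distinct multiples of
// kSampleLen and are returned in ascending order.
std::vector<std::size_t> PlanSampleOffsets(std::size_t max_file_size,
                                           std::size_t dict_bytes,
                                           std::uint32_t seed) {
  const std::size_t num_intervals = max_file_size / kSampleLen;
  const std::size_t wanted = std::min(dict_bytes / kSampleLen, num_intervals);
  if (wanted == 0) return {};
  std::set<std::size_t> chosen;  // keeps offsets unique and sorted
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> pick(0, num_intervals - 1);
  while (chosen.size() < wanted) {
    chosen.insert(pick(rng) * kSampleLen);
  }
  return std::vector<std::size_t>(chosen.begin(), chosen.end());
}

// Concatenate the planned samples that fall inside the file actually written.
// Intervals starting at or beyond end-of-file are skipped, so a short file
// yields a dictionary smaller than the configured size.
std::string BuildDictionary(const std::string& file_contents,
                            const std::vector<std::size_t>& offsets) {
  assert(std::is_sorted(offsets.begin(), offsets.end()));
  std::string dict;
  for (std::size_t off : offsets) {
    if (off >= file_contents.size()) break;  // all later offsets are past EOF
    dict.append(file_contents, off, kSampleLen);  // clamps at end-of-file
  }
  return dict;
}
```

Planning the offsets before any data is written means the samples can be collected in a single pass as the file grows, at the cost of losing the samples that land past the real end of a short file.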
    
    On the read path, the reader gets the dictionary from the metablock, if it
    exists, and loads it into the compression library before reading each block.
    
    Test Plan: new unit test
    
    Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
    
    Reviewed By: sdong
    
    Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
    
    Differential Revision: https://reviews.facebook.net/D52287