    RDB modules values serialization format version 2. · 5af0fc0c
    Committed by antirez
    The original RDB serialization format was not parsable without the
    module loaded, because the structure was managed only by the module
    itself. Moreover, RDB is a streaming protocol in the sense that it is
    both produced in an append-only fashion and sometimes sent directly
    to the socket (in the case of diskless replication).
    
    The fact that module values cannot be parsed without the relevant
    module loaded is a problem in many ways: RDB checking tools must have
    the modules loaded even to do things that do not involve the value at
    all, like splitting an RDB into N RDBs by key or similar, or just
    checking the RDB for sanity.
    
    In theory, module values could just be a blob of data with a length
    prefix, so that we could skip them. However, prefixing the values
    with a length would mean one of the following:
    
    1. Being able to write some data at a previous offset. This breaks
    streaming.
    2. Buffering values before outputting them. This hurts performance.
    3. Having some chunked RDB output format. This breaks simplicity.
    
    Moreover, the above solution still makes module values a totally opaque
    matter, with the following problems:
    
    1. The RDB check tool can only skip the value, without being able to
    check even its general structure. For datasets composed mostly of
    module values, this means checking just the outer level of the RDB
    without actually checking most of the data itself.
    2. It is not possible to do any recovery or processing of data whose
    module may no longer exist in the future, or is unknown.
    
    So this commit implements a different solution. The modules RDB
    serialization API is composed of well-defined calls to store integers,
    floats, doubles or strings. After this commit, each part emitted by
    the module API has a one-byte prefix identifying its type, and there
    is a final EOF byte as well. So even if we don't know exactly how to
    interpret a module value, we can always parse it at a high level,
    check the overall structure, understand the types used to store the
    information, and easily skip the whole value.
    
    The change is backward compatible: older RDB files can still be loaded,
    since the new encoding uses a new RDB type, MODULE_2 (value 7).
    The commit also implements the ability to check RDB files for sanity,
    taking advantage of the new feature.