    untracked cache: record .gitignore information and dir hierarchy · 0dcb8d7f
    Nguyễn Thái Ngọc Duy committed
    The idea is if we can capture all input and (non-recursive) output of
    read_directory_recursive(), and can verify later that all the input is
    the same, then the second r_d_r() should produce the same output as in
    the first run.
    
    The requirement for this to work is that stat info of a directory MUST
    change if an entry is added to or removed from that directory (and
    should not change often otherwise). If your OS and filesystem do not
    meet this requirement, untracked cache is not for you. Most file
    systems on *nix should be fine. On Windows, NTFS is fine while FAT may
    not be [1] even though FAT on Linux seems to be fine.
    
    The list of input of r_d_r() is in the big comment block in dir.h. In
    short, the output of a directory (not counting subdirs) mainly depends
    on stat info of the directory in question, all .gitignore leading to
    it and the check_only flag when r_d_r() is called recursively. This
    patch records all this info (and the output) as r_d_r() runs.
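
    As a rough illustration of the paragraph above, the recorded input
    and output of one directory, and the check that decides whether the
    output can be reused, might be sketched as follows. All type, field
    and function names here are hypothetical and simplified; they are
    not the definitions in dir.h. The sketch also assumes that only the
    directory's own .gitignore is compared here, with .gitignore files
    higher up handled separately.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types for illustration; not the ones in dir.h. */
struct stat_sketch {
    long mtime_sec;       /* directory modification time */
    unsigned long ino;    /* inode number */
    unsigned long dev;    /* device number */
};

struct cached_dir_sketch {
    /* Recorded input */
    struct stat_sketch dir_stat;        /* stat info of the directory */
    unsigned char gitignore_sha1[20];   /* SHA-1 of this directory's .gitignore */
    int check_only;                     /* flag used when r_d_r() recursed here */
    /* Recorded (non-recursive) output */
    const char **untracked;             /* untracked entries, subdirs not included */
    size_t untracked_nr;
};

/*
 * Return 1 if every recorded input still matches, so the recorded
 * output can be reused; return 0 if the directory must be read again.
 */
int cached_dir_is_valid(const struct cached_dir_sketch *cached,
                        const struct stat_sketch *now,
                        const unsigned char *gitignore_sha1_now,
                        int check_only_now)
{
    if (cached->dir_stat.mtime_sec != now->mtime_sec ||
        cached->dir_stat.ino != now->ino ||
        cached->dir_stat.dev != now->dev)
        return 0;
    if (memcmp(cached->gitignore_sha1, gitignore_sha1_now, 20) != 0)
        return 0;
    if (cached->check_only != check_only_now)
        return 0;
    return 1;
}
```

    Any single mismatch is enough to discard the recorded output, which
    is why the directory's stat info has to change reliably whenever an
    entry is added or removed.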
    
    Two hash_sha1_file() calls are required for $GIT_DIR/info/exclude and
    core.excludesfile unless their stat data matches. For .gitignore files
    in the worktree, hash_sha1_file() is only needed when they are
    modified; otherwise their SHA-1 in the index is used (see the previous
    patch).
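
    The two rules for when hashing is needed can be written down as a
    small decision sketch. The enum and function names are hypothetical
    and exist only for illustration; the boolean arguments stand in for
    the stat comparison and the index comparison, which are not shown.

```c
/* Hypothetical names for illustration only; not part of the patch. */
enum sha1_source_sketch {
    SHA1_REUSE_RECORDED,   /* stat data matches: keep the recorded SHA-1 */
    SHA1_FROM_INDEX,       /* .gitignore unchanged: take SHA-1 from the index */
    SHA1_REHASH            /* content may differ: hash_sha1_file() is needed */
};

/* $GIT_DIR/info/exclude and core.excludesfile: rehash unless stat matches. */
enum sha1_source_sketch global_exclude_sha1_source(int stat_matches)
{
    return stat_matches ? SHA1_REUSE_RECORDED : SHA1_REHASH;
}

/* .gitignore in the worktree: rehash only if it differs from the index. */
enum sha1_source_sketch worktree_gitignore_sha1_source(int matches_index)
{
    return matches_index ? SHA1_FROM_INDEX : SHA1_REHASH;
}
```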
    
    We could store stat data for .gitignore files so we don't have to
    rehash them if their content is different from the index, but I think
    .gitignore files are rarely modified, so it is not worth the extra
    cache data (and the hashing penalty in read-cache.c:verify_hdr(), as
    we will be storing this as an index extension).
    
    The implication is, if you change .gitignore, you had better add it to
    the index soon, or you lose all the benefit of the untracked cache,
    because a modified .gitignore invalidates all subdirs recursively.
    This is especially bad for .gitignore at root.
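
    To make the cost concrete, recursive invalidation can be pictured
    with a minimal tree sketch. The struct and function below are
    hypothetical and only model a per-directory "valid" flag; they are
    not the data structures added by this patch.

```c
#include <stddef.h>

/* Hypothetical directory node: a validity flag plus child directories. */
struct dir_node_sketch {
    int valid;                          /* 1 if recorded output is reusable */
    struct dir_node_sketch **subdirs;   /* child directories */
    size_t subdirs_nr;
};

/* A modified .gitignore in "dir" invalidates dir and everything below it. */
void invalidate_recursively(struct dir_node_sketch *dir)
{
    size_t i;

    dir->valid = 0;
    for (i = 0; i < dir->subdirs_nr; i++)
        invalidate_recursively(dir->subdirs[i]);
}
```

    Calling this on a leaf directory touches one node, while calling it
    on the root clears every node in the tree, which is why a modified
    .gitignore at root is the worst case.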
    
    This cached output is about untracked files only, not ignored files,
    because the number of untracked files is usually small, so the cache
    overhead is small, while the number of ignored files could go really
    high (e.g. *.o files mixed with source code).
    
    [1] "Description of NTFS date and time stamps for files and folders"
        http://support.microsoft.com/kb/299648
    
    Helped-by: Torsten Bögershausen <tboegi@web.de>
    Helped-by: David Turner <dturner@twopensource.com>
    Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
dir.c 45.8 KB