1. 11 Oct, 2016 (1 commit)
    • ls-files: add pathspec matching for submodules · 75a6315f
      Brandon Williams committed
      Pathspecs can be a bit tricky when trying to apply them to submodules.
      The main challenge is that the pathspecs will be with respect to the
      superproject and not with respect to paths in the submodule.  The
      approach this patch takes is to pass in the identical pathspec from the
      superproject to the submodule in addition to the submodule-prefix, which
      is the path from the root of the superproject to the submodule, and then
      we can compare an entry in the submodule prepended with the
      submodule-prefix to the pathspec in order to determine if there is a
      match.
      
      This patch also permits the pathspec logic to perform a prefix match against
      submodules since a pathspec could refer to a file inside of a submodule.
      Due to limitations in the wildmatch logic, a prefix match is only done
      literally.  If any wildcard character is encountered we'll simply punt
      and produce a false positive match.  More accurate matching will be done
      once inside the submodule.  This is due to the superproject not knowing
      what files could exist in the submodule.

      Signed-off-by: Brandon Williams <bmwill@google.com>
      Reviewed-by: Stefan Beller <sbeller@google.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
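The two halves of the matching described above can be sketched in C. This is a minimal illustration, not git's code: the names `submodule_may_match()` and `entry_matches()` are hypothetical, only a single pathspec is handled, and POSIX `fnmatch()` without `FNM_PATHNAME` stands in for git's wildmatch. The superproject side compares literally and punts with a match at the first wildcard; the submodule side prepends the submodule prefix to each entry and matches the full path.

```c
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Characters treated as wildcards in this sketch (glob-style). */
static int is_glob_special(char c)
{
    return c == '*' || c == '?' || c == '[' || c == '\\';
}

/* Superproject side: could `pathspec` refer to the submodule at `sub_path`
 * or to something inside it? Compare literally; on the first wildcard we
 * punt and report a (possibly false-positive) match. */
int submodule_may_match(const char *pathspec, const char *sub_path)
{
    size_t i;
    for (i = 0; pathspec[i]; i++) {
        if (is_glob_special(pathspec[i]))
            return 1;                   /* punt: the submodule decides later */
        if (!sub_path[i])
            return pathspec[i] == '/';  /* pathspec reaches inside the submodule */
        if (pathspec[i] != sub_path[i])
            return 0;
    }
    /* pathspec exhausted: exact match, or a leading directory of sub_path */
    return sub_path[i] == '\0' || sub_path[i] == '/';
}

/* Submodule side: prepend the submodule prefix (e.g. "lib/sub/") to an entry
 * and compare the result with the superproject's pathspec. */
int entry_matches(const char *pathspec, const char *prefix, const char *entry)
{
    char full[4096];
    int n = snprintf(full, sizeof(full), "%s%s", prefix, entry);
    if (n < 0 || (size_t)n >= sizeof(full))
        return 0;                       /* path too long for this sketch */
    size_t len = strlen(pathspec);
    /* literal match of the path itself or of one of its leading directories */
    if (!strncmp(full, pathspec, len) &&
        (full[len] == '\0' || full[len] == '/' || (len && pathspec[len - 1] == '/')))
        return 1;
    return fnmatch(pathspec, full, 0) == 0;
}
```

For example, with a submodule at `lib/sub`, the pathspec `lib/s*` passes the superproject check as soon as the `*` is reached, without knowing whether any file inside matches; that question is only settled per entry by `entry_matches()`.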
  2. 05 May, 2016 (1 commit)
  3. 23 Apr, 2016 (2 commits)
  4. 19 Mar, 2016 (1 commit)
  5. 02 Mar, 2016 (1 commit)
  6. 16 Feb, 2016 (1 commit)
  7. 26 Jan, 2016 (3 commits)
  8. 25 Mar, 2015 (1 commit)
    • report_path_error(): move to dir.c · 777c55a6
      Junio C Hamano committed
      The expected call sequence is for the caller to use match_pathspec()
      repeatedly on a set of pathspecs, accumulating the "hits" in a
      separate array, and then call this function to diagnose a pathspec
      that never matched anything, as that can indicate a typo from the
      command line, e.g. "git commit Maekfile".
      
      Many builtin commands use this function from builtin/ls-files.c,
      which is not a very healthy arrangement.  ls-files might have been
      the first command to feel the need for such a helper, but the need
      is shared by everybody who uses the "match and then report" pattern.
      
      Move it to dir.c where match_pathspec() is defined.

      Signed-off-by: Junio C Hamano <gitster@pobox.com>
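The "match and then report" pattern can be illustrated with a small C sketch. The helpers `match_and_record()` and `report_unmatched()` are hypothetical stand-ins for `match_pathspec()` and `report_path_error()`, POSIX `fnmatch()` replaces git's pathspec matching, and the error wording is illustrative only. The point is that `seen[]` outlives the loop over paths, so a pathspec that never matched anything (such as a typo like `Maekfile`) is reported once at the end.

```c
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Mark every pathspec that matches `path` in seen[]; return 1 if any matched. */
int match_and_record(const char **pathspecs, int n, const char *path, char *seen)
{
    int hit = 0;
    for (int i = 0; i < n; i++) {
        if (fnmatch(pathspecs[i], path, 0) == 0) {
            seen[i] = 1;
            hit = 1;
        }
    }
    return hit;
}

/* After all paths were tried, complain about pathspecs that never matched.
 * Returns the number of unmatched pathspecs. */
int report_unmatched(const char **pathspecs, int n, const char *seen)
{
    int errors = 0;
    for (int i = 0; i < n; i++) {
        if (!seen[i]) {
            fprintf(stderr, "error: pathspec '%s' did not match any file\n",
                    pathspecs[i]);
            errors++;
        }
    }
    return errors;
}
```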
  9. 13 Mar, 2015 (9 commits)
    • untracked cache: guard and disable on system changes · 1e8fef60
      Nguyễn Thái Ngọc Duy committed
      If the user enables untracked cache, then
      
       - move worktree to an unsupported filesystem
       - or simply upgrade OS
       - or move the whole (portable) disk from one machine to another
       - or access a shared fs from another machine
      
      there's no guarantee that untracked cache can still function properly.
      Record the worktree location and OS footprint in the cache. If it
      changes, err on the safe side and disable the cache. The user can
      'update-index --untracked-cache' again to make sure all conditions are
      met.
      
      This adds a new requirement that setup_git_directory* must be called
      before read_cache() because we need worktree location by then, or the
      cache is dropped.
      
      This change does not cover all bases; you can fool it if you try
      hard. The point is to stop accidents.

      Helped-by: Eric Sunshine <sunshine@sunshineco.com>
      Helped-by: brian m. carlson <sandals@crustytoothpaste.net>
      Helped-by: Torsten Bögershausen <tboegi@web.de>
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
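A minimal sketch of the guard, assuming the recorded "footprint" is just a string built from the worktree path plus `uname()`'s `sysname` and `release`. The `struct cache_ident` type and both functions are hypothetical, and git's actual record format is not reproduced here. Any mismatch makes the cache unusable instead of attempting a repair, which is the "err on the safe side" policy described above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical footprint stored alongside the cache. */
struct cache_ident {
    char text[1024];
};

/* Build the footprint from the worktree path and the OS name/release.
 * Returns 0 on success, -1 if uname() fails or the text does not fit. */
int build_ident(struct cache_ident *id, const char *worktree)
{
    struct utsname uts;
    if (uname(&uts) < 0)
        return -1;
    int n = snprintf(id->text, sizeof(id->text), "location %s system %s %s",
                     worktree, uts.sysname, uts.release);
    return (n < 0 || (size_t)n >= sizeof(id->text)) ? -1 : 0;
}

/* Any difference from the recorded footprint means: disable the cache. */
int cache_still_usable(const struct cache_ident *recorded, const char *worktree)
{
    struct cache_ident now;
    if (build_ident(&now, worktree) < 0)
        return 0;
    return strcmp(recorded->text, now.text) == 0;
}
```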
    • untracked cache: invalidate at index addition or removal · e931371a
      Nguyễn Thái Ngọc Duy committed
      Ideally we should implement untracked_cache_remove_from_index() and
      untracked_cache_add_to_index() so that they update untracked cache
      right away instead of invalidating it and waiting for the next
      read_directory() to deal with it. But that may need some more work in
      unpack-trees.c. So stay simple as the first step.
      
      The new call in add_index_entry_with_check() may look strange because
      new calls usually stay close to cache_tree_invalidate_path(). We do it
      a bit later than c_t_i_p() in this function because if it's about
      replacing the entry with the same name, we don't care (but cache-tree
      does).

      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
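The invalidation rule, where only the directory that directly contains the added or removed path loses its cached result, can be sketched as follows. The flat `struct cached_dir` array is a hypothetical simplification of the untracked cache's tree (a real implementation would walk the tree), and `""` stands for the worktree root.

```c
#include <string.h>

/* Hypothetical flat view of cached directories; "" is the worktree root. */
struct cached_dir {
    const char *path;
    int valid;
};

/* Adding or removing `path` in the index invalidates only the directory that
 * directly contains it; its subdirectories keep their cached data.
 * Returns 1 if a cached directory was invalidated, 0 otherwise. */
int invalidate_containing_dir(struct cached_dir *dirs, int n, const char *path)
{
    const char *slash = strrchr(path, '/');
    size_t len = slash ? (size_t)(slash - path) : 0;
    for (int i = 0; i < n; i++) {
        if (strlen(dirs[i].path) == len && !strncmp(dirs[i].path, path, len)) {
            dirs[i].valid = 0;
            return 1;
        }
    }
    return 0; /* containing directory not cached: nothing to invalidate */
}
```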
    • untracked cache: mark what dirs should be recursed/saved · 26cb0182
      Nguyễn Thái Ngọc Duy committed
      If we redo this thing in a functional style, we would have one struct
      untracked_dir as input tree and another as output. The input is used
      for verification. The output is a brand new tree, reflecting current
      worktree.
      
      But that means recreating a lot of dir nodes even if many could be
      shared between input and output trees in good cases. So we go with the
      messy but efficient way, combining both input and output trees into
      one. We need a way to know which node in this combined tree belongs to
      the output. This is the purpose of this "recurse" flag.
      
      The "valid" bit can't be used for this because it only covers the
      node's own data, not its subdirs. When we invalidate a directory, we
      want to keep cached data of the subdirs intact even though we don't
      really know which subdirs still exist (yet). Then we check the worktree
      to see which subdirs actually remain on disk. Those will have the
      'recurse' bit set again. If cached data for those are still valid, we
      may be able to avoid computing exclude files for them. Subdirs that
      have been deleted will keep 'recurse' cleared, and their 'valid' bits
      do not matter.

      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
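A sketch of the combined input/output tree, assuming a fixed fan-out per node. The `struct udir` type and the helper names are hypothetical. At the start of a run every `recurse` bit is cleared; each subdirectory found on disk gets its bit set again (keeping whatever cached data it has), and only nodes reachable through set `recurse` bits count as output.

```c
#include <string.h>

#define MAX_SUBDIRS 8

/* Hypothetical node of the combined input/output tree. */
struct udir {
    const char *name;
    unsigned valid : 1;   /* cached listing of this node itself is trusted */
    unsigned recurse : 1; /* node belongs to the output (seen on disk this run) */
    int nr;
    struct udir *sub[MAX_SUBDIRS];
};

/* Start of a run: nothing is known to be part of the output yet. */
void clear_recurse(struct udir *d)
{
    d->recurse = 0;
    for (int i = 0; i < d->nr; i++)
        clear_recurse(d->sub[i]);
}

/* A subdirectory was found on disk: keep its node (and any still-valid cached
 * data) by setting its 'recurse' bit. Returns the node, or NULL if unknown. */
struct udir *mark_seen(struct udir *parent, const char *name)
{
    for (int i = 0; i < parent->nr; i++) {
        if (!strcmp(parent->sub[i]->name, name)) {
            parent->sub[i]->recurse = 1;
            return parent->sub[i];
        }
    }
    return NULL; /* a real implementation would allocate a new node here */
}

/* Count the nodes that would be saved: the root plus every node reached
 * through set 'recurse' bits. 'valid' is never consulted. */
int count_saved(const struct udir *d)
{
    int n = 1;
    for (int i = 0; i < d->nr; i++)
        if (d->sub[i]->recurse)
            n += count_saved(d->sub[i]);
    return n;
}
```

For instance, if a cached root has subdirs `a` and `b` and only `a` still exists on disk, calling `clear_recurse()` and then `mark_seen(root, "a")` leaves `b` unmarked, so `count_saved()` counts only the root and `a`, regardless of `b`'s `valid` bit.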
    • untracked cache: record/validate dir mtime and reuse cached output · 91a2288b
      Nguyễn Thái Ngọc Duy committed
      The main readdir loop in read_directory_recursive() is replaced with a
      new one that checks whether the cached results of a directory are still valid.
      
      If a file is added or removed from the index, the containing directory
      is invalidated (but not its subdirs). If a directory's mtime changes,
      the same happens. If a .gitignore is updated, the containing directory
      and all subdirs are invalidated recursively. If dir_struct#flags or
      other conditions change, the cache is ignored.
      
      If a directory is invalidated, we opendir/readdir/closedir and run the
      exclude machinery on that directory listing as usual. If untracked
      cache is also enabled, we'll update the cache along the way. If a
      directory is validated, we simply pull the untracked listing out from
      the cache. The cache also records the list of direct subdirs that we
      have to recurse in. Fully excluded directories are seen as "untracked
      files".
      
      In the best case when no dirs are invalidated, read_directory()
      becomes a series of
      
        stat(dir), open(.gitignore), fstat(), read(), close() and optionally
        hash_sha1_file()
      
      For comparison, standard read_directory() is a sequence of
      
        opendir(), readdir(), open(.gitignore), fstat(), read(), close(), the
        expensive last_exclude_matching() and closedir().
      
      We already try not to open(.gitignore) if we know it does not exist,
      so open/fstat/read/close sequence does not apply to every
      directory. The sequence could be reduced further, as noted in
      prep_exclude() in another patch. So in theory, the entire best-case
      read_directory sequence could be reduced to a series of stat() and
      nothing else.
      
      This is not a silver bullet approach. When you compile a C file, for
      example, the old .o file is removed and a new one with the same name
      created, effectively invalidating the containing directory's cache
      (but not its subdirectories). If your build process touches every
      directory, this cache adds extra overhead for nothing, so it's a good
      idea to separate generated files from tracked files. Editors may use
      the same strategy for saving files. And of course you're out of luck
      running your repo on an unsupported filesystem and/or operating system.

      Helped-by: Eric Sunshine <sunshine@sunshineco.com>
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
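The best case, a single `stat()` per directory, can be sketched like this. The `struct dir_stat` type and helpers are hypothetical, and only a few `struct stat` fields are compared, with `st_mtime` at one-second resolution, which is coarser than a real cache should rely on. The approach is only sound if the filesystem updates a directory's stat data whenever an entry is added or removed.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical subset of directory stat data kept in the cache. */
struct dir_stat {
    time_t mtime;
    ino_t ino;
    off_t size;
};

/* Record the current stat data of `path`; returns 0 on success. */
int record_dir_stat(const char *path, struct dir_stat *out)
{
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    out->mtime = st.st_mtime;
    out->ino = st.st_ino;
    out->size = st.st_size;
    return 0;
}

/* Best case: one stat() decides whether the cached untracked listing of this
 * directory can be reused without opendir()/readdir()/closedir(). */
int dir_cache_valid(const char *path, const struct dir_stat *cached)
{
    struct dir_stat now;
    if (record_dir_stat(path, &now) < 0)
        return 0;
    return now.mtime == cached->mtime && now.ino == cached->ino &&
           now.size == cached->size;
}
```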
    • untracked cache: initial untracked cache validation · ccad261f
      Nguyễn Thái Ngọc Duy committed
      Make sure the starting conditions and all global exclude files are
      good to go. If not, either disable untracked cache completely, or wipe
      out the cache and start fresh.

      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • untracked cache: record .gitignore information and dir hierarchy · 0dcb8d7f
      Nguyễn Thái Ngọc Duy committed
      The idea is if we can capture all input and (non-recursive) output of
      read_directory_recursive(), and can verify later that all the input is
      the same, then the second r_d_r() should produce the same output as in
      the first run.
      
      The requirement for this to work is stat info of a directory MUST
      change if an entry is added to or removed from that directory (and
      should not change often otherwise). If your OS and filesystem do not
      meet this requirement, untracked cache is not for you. Most file
      systems on *nix should be fine. On Windows, NTFS is fine while FAT may
      not be [1] even though FAT on Linux seems to be fine.
      
      The list of inputs to r_d_r() is in the big comment block in dir.h. In
      short, the output of a directory (not counting subdirs) mainly depends
      on stat info of the directory in question, all .gitignore files leading
      to it and the check_only flag when r_d_r() is called recursively. This
      patch records all this info (and the output) as r_d_r() runs.
      
      Two hash_sha1_file() calls are required for $GIT_DIR/info/exclude and
      core.excludesfile unless their stat data matches. hash_sha1_file() is
      only needed when .gitignore files in the worktree are modified,
      otherwise their SHA-1 in index is used (see the previous patch).
      
      We could store stat data for .gitignore files so we don't have to
      rehash them if their content is different from index, but I think
      .gitignore files are rarely modified, so not worth extra cache data
      (and the hashing penalty in read-cache.c:verify_hdr(), as we will be storing
      this as an index extension).
      
      The implication is, if you change .gitignore, you had better add it to the
      index soon or you lose all the benefit of untracked cache because a
      modified .gitignore invalidates all subdirs recursively. This is
      especially bad for .gitignore at root.
      
      This cached output is about untracked files only, not ignored files,
      because the number of untracked files is usually small, so the cache
      overhead is small, while the number of ignored files could go really
      high (e.g. *.o files mixing with source code).
      
      [1] "Description of NTFS date and time stamps for files and folders"
          http://support.microsoft.com/kb/299648

      Helped-by: Torsten Bögershausen <tboegi@web.de>
      Helped-by: David Turner <dturner@twopensource.com>
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • dir.c: optionally compute sha-1 of a .gitignore file · 55fe6f51
      Nguyễn Thái Ngọc Duy committed
      This is not used anywhere yet. But the goal is to compare quickly if a
      .gitignore file has changed when we have the SHA-1 of both old (cached
      somewhere) and new (from index or a tree) versions.

      Helped-by: Junio C Hamano <gitster@pobox.com>
      Helped-by: Torsten Bögershausen <tboegi@web.de>
      Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  10. 15 Jul, 2014 (2 commits)
  11. 25 Feb, 2014 (4 commits)
  12. 16 Aug, 2013 (1 commit)
    • ls-files -k: a directory only can be killed if the index has a non-directory · 2eac2a4c
      Junio C Hamano committed
      "ls-files -o" and "ls-files -k" both traverse the working tree down
      to find either all untracked paths or those that will be "killed"
      (removed from the working tree to make room) when the paths recorded
      in the index are checked out.  It is necessary to traverse the
      working tree fully when enumerating all the "other" paths, but when
      we are only interested in "killed" paths, we can take advantage of
      the fact that paths that do not overlap with entries in the index
      can never be killed.
      
      The treat_one_path() helper function, which is called during the
      recursive traversal, is the ideal place to implement an
      optimization.
      
      When we are looking at a directory P in the working tree, there are
      three cases:
      
       (1) P exists in the index.  Everything inside the directory P in
           the working tree needs to go when P is checked out from the
           index.
      
       (2) P does not exist in the index, but there is P/Q in the index.
           We know P will stay a directory when we check out the contents
           of the index, but we do not know yet if there is a directory
           P/Q in the working tree to be killed, so we need to recurse.
      
       (3) P does not exist in the index, and there is no P/Q in the index
           to require P to be a directory, either.  Only in this case, we
           know that everything inside P will not be killed without
           recursing.
      
      Note that this helper is called by treat_leading_path() that decides
      if we need to traverse only subdirectories of a single common
      leading directory, which is essential for this optimization to be
      correct.  This caller checks each level of the leading path
      component from shallower directory to deeper ones, and that is what
      allows us to only check if the path appears in the index.  If the
      call to treat_one_path() weren't there, given a path P/Q/R, the real
      traversal may start from directory P/Q/R, even when the index
      records P as a regular file, and we would end up having to check if
      any leading subpath in P/Q/R, e.g. P, appears in the index.

      Signed-off-by: Junio C Hamano <gitster@pobox.com>
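The three cases can be decided with one binary search over the index, assuming (hypothetically) that the index is a plain array of path names sorted with `strcmp()`. Every entry whose name starts with `P` sorts into one contiguous run beginning at the first entry not less than `P`, and entries under `P/` lie inside that run, so only that run needs to be inspected. The enum and function names are illustrative, not git's.

```c
#include <string.h>

enum dir_kill_case {
    DIR_IN_INDEX = 1,     /* (1) P itself is an index entry: all of P goes */
    DIR_HAS_CHILDREN = 2, /* (2) some P/Q is in the index: must recurse */
    DIR_UNRELATED = 3     /* (3) neither: nothing inside P can be killed */
};

/* First index position whose name is >= key (index sorted by strcmp). */
static int lower_bound(const char **index, int n, const char *key)
{
    int lo = 0, hi = n;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (strcmp(index[mid], key) < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Classify working-tree directory `dir` against the sorted index. */
enum dir_kill_case classify_dir(const char **index, int n, const char *dir)
{
    size_t len = strlen(dir);
    int pos = lower_bound(index, n, dir);
    if (pos < n && !strcmp(index[pos], dir))
        return DIR_IN_INDEX;
    /* Walk the run of entries sharing the prefix `dir` ("dir-x", "dir.c",
     * "dir/Q", ...) looking for one directly under "dir/". */
    for (; pos < n && !strncmp(index[pos], dir, len); pos++)
        if (index[pos][len] == '/')
            return DIR_HAS_CHILDREN;
    return DIR_UNRELATED;
}
```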
  13. 16 Jul, 2013 (5 commits)
  14. 16 Apr, 2013 (3 commits)
    • dir.c: git-status --ignored: don't scan the work tree twice · 0aaf62b6
      Karsten Blees committed
      'git-status --ignored' still scans the work tree twice to collect
      untracked and ignored files, respectively.
      
      fill_directory / read_directory already supports collecting untracked and
      ignored files in a single directory scan. However, the DIR_COLLECT_IGNORED
      flag to enable this has some git-add specific side-effects (e.g. it
      doesn't recurse into ignored directories, so listing ignored files with
      --untracked=all doesn't work).
      
      The DIR_SHOW_IGNORED flag doesn't list untracked files and returns ignored
      files in dir_struct.entries[] (instead of dir_struct.ignored[], as
      DIR_COLLECT_IGNORED does). DIR_SHOW_IGNORED is used all throughout git.
      
      We don't want to break the existing API, so let's introduce a new flag
      DIR_SHOW_IGNORED_TOO that lists untracked as well as ignored files similar
      to DIR_COLLECT_IGNORED, but will recurse into sub-directories based on the
      other flags as DIR_SHOW_IGNORED does.
      
      In dir.c::read_directory_recursive, add ignored files to either
      dir_struct.entries[] or dir_struct.ignored[] based on the flags. Also move
      the DIR_COLLECT_IGNORED case here so that filling result lists is in a
      common place.
      
      In wt-status.c::wt_status_collect_untracked, use the new flag and read
      results from dir_struct.ignored[]. Remove the extra fill_directory call.
      
      builtin/check-ignore.c doesn't call fill_directory, so setting the git-add
      specific DIR_COLLECT_IGNORED flag has no effect there. Remove it for clarity.
      
      Update API documentation to reflect the changes.
      
      Performance: with this patch, 'git-status --ignored' is typically as fast
      as 'git-status'.

      Signed-off-by: Karsten Blees <blees@dcon.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
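The routing decision described above can be summarized in a small C function. The `SKETCH_*` flag names and bit values are placeholders chosen for this example, not the real constants from `dir.h`, and the sketch models only which list a path lands in. Recursion differences between the flags are not modeled, which is why `SKETCH_COLLECT_IGNORED` and `SKETCH_SHOW_IGNORED_TOO` route identically here.

```c
/* Which result list a path ends up in. */
enum list_dest { DEST_NONE, DEST_ENTRIES, DEST_IGNORED };

/* Placeholder bit values mirroring the roles described above (not git's). */
#define SKETCH_SHOW_IGNORED     (1u << 0)
#define SKETCH_COLLECT_IGNORED  (1u << 1)
#define SKETCH_SHOW_IGNORED_TOO (1u << 2)

enum list_dest route_path(unsigned flags, int is_ignored)
{
    if (flags & SKETCH_SHOW_IGNORED_TOO)      /* one scan fills both lists */
        return is_ignored ? DEST_IGNORED : DEST_ENTRIES;
    if (flags & SKETCH_SHOW_IGNORED)          /* ignored only, in entries[] */
        return is_ignored ? DEST_ENTRIES : DEST_NONE;
    if (flags & SKETCH_COLLECT_IGNORED)       /* git-add style collection */
        return is_ignored ? DEST_IGNORED : DEST_ENTRIES;
    return is_ignored ? DEST_NONE : DEST_ENTRIES; /* plain untracked listing */
}
```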
    • dir.c: unify is_excluded and is_path_excluded APIs · 95c6f271
      Karsten Blees committed
      The is_excluded and is_path_excluded APIs are very similar, except for a
      few noteworthy differences:
      
      is_excluded doesn't handle ignored directories: results for paths within
      ignored directories are incorrect. This is probably based on the premise
      that recursive directory scans should stop at ignored directories, which
      is no longer true (in certain cases, read_directory_recursive currently
      calls is_excluded *and* is_path_excluded to get correct ignored state).
      
      is_excluded caches parsed .gitignore files of the last directory in struct
      dir_struct. If the directory changes, it finds a common parent directory
      and is very careful to drop only as much state as necessary. On the other
      hand, is_excluded will also read and parse .gitignore files in already
      ignored directories, which are completely irrelevant.
      
      is_path_excluded correctly handles ignored directories by checking if any
      component in the path is excluded. As it uses is_excluded internally, this
      unfortunately forces is_excluded to drop and re-read all .gitignore files,
      as there is no common parent directory for the root dir.
      
      is_path_excluded tracks state in a separate struct path_exclude_check,
      which is essentially a wrapper of dir_struct with two more fields. However,
      as is_path_excluded also modifies dir_struct, it is not possible to e.g.
      use multiple path_exclude_check structures with the same dir_struct in
      parallel. The additional structure just unnecessarily complicates the API.
      
      Teach is_excluded / prep_exclude about ignored directories: whenever
      entering a new directory, first check if the entire directory is excluded.
      Remember the excluded state in dir_struct. Don't traverse into already
      ignored directories (i.e. don't read irrelevant .gitignore files).
      
      Directories could also be excluded by exclude patterns specified on the
      command line or .git/info/exclude, so we cannot simply skip prep_exclude
      entirely if there's no .gitignore file name (dir_struct.exclude_per_dir).
      Move this check to just before actually reading the file.
      
      is_path_excluded is now equivalent to is_excluded, so we can simply
      redirect to it (the public API is cleaned up in the next patch).
      
      The performance impact of the additional ignored check per directory is
      hardly noticeable when reading directories recursively (e.g. 'git status').
      However, performance of git commands using the is_path_excluded API (e.g.
      'git ls-files --cached --ignored --exclude-standard') is greatly improved
      as this no longer re-reads .gitignore files on each call.
      
      Here's some performance data from the linux and WebKit repos (best of 10
      runs on a Debian Linux on SSD, core.preloadIndex=true):
      
             | ls-files -ci   |    status      | status --ignored
             | linux | WebKit | linux | WebKit | linux | WebKit
      -------+-------+--------+-------+--------+-------+---------
      before | 0.506 |  6.539 | 0.212 |  1.555 | 0.323 |  2.541
      after  | 0.080 |  1.191 | 0.218 |  1.583 | 0.321 |  2.579
      gain   | 6.325 |  5.490 | 0.972 |  0.982 | 1.006 |  0.985

      Signed-off-by: Karsten Blees <blees@dcon.de>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
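A sketch of the "check the directory first" idea: walk the path from the top and stop as soon as a leading directory is excluded, since nothing below it (including its `.gitignore` files) can change the answer. This is a simplification under stated assumptions: patterns are matched with POSIX `fnmatch()` against single path components only, and negated patterns, directory-only patterns and the last-match-wins rule are ignored. All names are hypothetical.

```c
#include <fnmatch.h>
#include <string.h>

/* Does a single path component match any pattern? */
static int name_excluded(const char **patterns, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (fnmatch(patterns[i], name, 0) == 0)
            return 1;
    return 0;
}

/* Walk `path` component by component from the top. Returns the length of
 * the shortest excluded prefix, or 0 if no component is excluded. */
size_t path_excluded(const char **patterns, int n, const char *path)
{
    char comp[256];
    const char *start = path;
    for (;;) {
        const char *slash = strchr(start, '/');
        size_t len = slash ? (size_t)(slash - start) : strlen(start);
        if (len >= sizeof(comp))
            return 0; /* component too long for this sketch */
        memcpy(comp, start, len);
        comp[len] = '\0';
        if (name_excluded(patterns, n, comp))
            return (size_t)(start - path) + len; /* stop: no deeper lookups */
        if (!slash)
            return 0;
        start = slash + 1;
    }
}
```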
  15. 07 Jan, 2013 (4 commits)
    • dir.c: improve docs for match_pathspec() and match_pathspec_depth() · 52ed1894
      Adam Spiers committed
      Fix a grammatical issue in the description of these functions, and
      make it more obvious how and why seen[] can be reused across multiple
      invocations.

      Signed-off-by: Adam Spiers <git@adamspiers.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • dir.c: provide clear_directory() for reclaiming dir_struct memory · 270be816
      Adam Spiers committed
      By the end of a directory traversal, a dir_struct instance will
      typically contain pointers to various data structures on the heap.
      clear_directory() provides a convenient way to reclaim that memory.

      Signed-off-by: Adam Spiers <git@adamspiers.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
    • dir.c: keep track of where patterns came from · c04318e4
      Adam Spiers committed
      For exclude patterns read in from files, the filename is stored in the
      exclude list, and the originating line number is stored in the
      individual exclude (counting starting at 1).
      
      For exclude patterns provided on the command line, a string describing
      the source of the patterns is stored in the exclude list, and the
      sequence number assigned to each exclude pattern is negative, with
      counting starting at -1.  So for example the 2nd pattern provided via
      --exclude would be numbered -2.  This allows any future consumers of
      that data to easily distinguish between exclude patterns from files
      vs. from the CLI.

      Signed-off-by: Adam Spiers <git@adamspiers.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
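A sketch of the numbering convention, assuming one hypothetical fixed-capacity `struct pattern_list` per source: patterns read from a file keep their 1-based line number, while command-line patterns are numbered -1, -2, and so on, so the sign alone tells the two origins apart. `add_from_cli()` assumes the list holds only command-line patterns; all names and the source description string are illustrative.

```c
#include <stdio.h>
#include <string.h>

#define MAX_PATTERNS 16

/* Hypothetical per-source list: `src` is a file name or a CLI description. */
struct pattern_list {
    const char *src;
    int nr;
    const char *pattern[MAX_PATTERNS];
    int srcpos[MAX_PATTERNS]; /* > 0: line in file; < 0: CLI sequence number */
};

/* Add a pattern read from line `lineno` (1-based) of a file. */
int add_from_file(struct pattern_list *pl, const char *pat, int lineno)
{
    if (pl->nr >= MAX_PATTERNS)
        return -1;
    pl->pattern[pl->nr] = pat;
    pl->srcpos[pl->nr++] = lineno;
    return 0;
}

/* Add a command-line pattern; the k-th one (1-based) is numbered -k. */
int add_from_cli(struct pattern_list *pl, const char *pat)
{
    if (pl->nr >= MAX_PATTERNS)
        return -1;
    pl->pattern[pl->nr] = pat;
    pl->srcpos[pl->nr] = -(pl->nr + 1);
    pl->nr++;
    return 0;
}

/* Format "source:line" for file patterns, "source #k" for CLI patterns. */
void describe(const struct pattern_list *pl, int i, char *buf, size_t size)
{
    if (pl->srcpos[i] > 0)
        snprintf(buf, size, "%s:%d", pl->src, pl->srcpos[i]);
    else
        snprintf(buf, size, "%s #%d", pl->src, -pl->srcpos[i]);
}
```

With a file list whose source is `.gitignore`, a pattern from line 3 is described as `.gitignore:3`; with a command-line list described as `--exclude option`, the second pattern is described as `--exclude option #2`.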
    • dir.c: use a single struct exclude_list per source of excludes · c082df24
      Adam Spiers committed
      Previously each exclude_list could potentially contain patterns
      from multiple sources.  For example dir->exclude_list[EXC_FILE]
      would typically contain patterns from .git/info/exclude and
      core.excludesfile, and dir->exclude_list[EXC_DIRS] could contain
      patterns from multiple per-directory .gitignore files during
      directory traversal (i.e. when dir->exclude_stack was more than
      one item deep).
      
      We split these composite exclude_lists up into three groups of
      exclude_lists (EXC_CMDL / EXC_DIRS / EXC_FILE as before), so that each
      exclude_list now contains patterns from a single source.  This will
      allow us to cleanly track the origin of each pattern simply by adding
      a src field to struct exclude_list, rather than to struct exclude,
      which would make memory management of the source string tricky in the
      EXC_DIRS case where its contents are dynamically generated.
      
      Similarly, by moving the filebuf member from struct exclude_stack to
      struct exclude_list, it allows us to track and subsequently free
      memory buffers allocated during the parsing of all exclude files,
      rather than only tracking buffers allocated for files in the EXC_DIRS
      group.

      Signed-off-by: Adam Spiers <git@adamspiers.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
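A sketch of the "one list per source" layout, in which each list owns both its dynamically built source name and the buffer its patterns were parsed from, so a single clear routine can free everything regardless of which group the list belongs to. The `struct src_list` and `struct list_group` types and helpers are hypothetical; `copy_string()` is a portable stand-in for `strdup()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list for one source, owning its source name and file buffer. */
struct src_list {
    char *src;     /* e.g. a .gitignore path built during traversal */
    char *filebuf; /* file contents; patterns would point into it */
};

/* Hypothetical group (e.g. command line, per-directory, or global files). */
struct list_group {
    int nr, alloc;
    struct src_list *lists;
};

/* Portable strdup() replacement for this example. */
char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, s, n);
    return p;
}

/* Append a list for one source; takes ownership of src and filebuf. */
struct src_list *group_add(struct list_group *g, char *src, char *filebuf)
{
    if (g->nr == g->alloc) {
        int alloc = g->alloc ? 2 * g->alloc : 4;
        struct src_list *p = realloc(g->lists, alloc * sizeof(*p));
        if (!p)
            return NULL;
        g->lists = p;
        g->alloc = alloc;
    }
    g->lists[g->nr].src = src;
    g->lists[g->nr].filebuf = filebuf;
    return &g->lists[g->nr++];
}

/* Every buffer is tracked by its own list, so clearing frees them all. */
void group_clear(struct list_group *g)
{
    for (int i = 0; i < g->nr; i++) {
        free(g->lists[i].src);
        free(g->lists[i].filebuf);
    }
    free(g->lists);
    g->lists = NULL;
    g->nr = g->alloc = 0;
}
```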
  16. 29 Dec, 2012 (1 commit)