1. 12 Feb, 2008 1 commit
  2. 07 Feb, 2008 1 commit
    • safecrlf: Add mechanism to warn about irreversible crlf conversions · 21e5ad50
      Committed by Steffen Prohaska
      CRLF conversion bears a slight chance of corrupting data.
      autocrlf=true will convert CRLF to LF during commit and LF to
      CRLF during checkout.  A file that contains a mixture of LF and
      CRLF before the commit cannot be recreated by git.  For text
      files this is the right thing to do: it corrects line endings
      such that we have only LF line endings in the repository.
      But for binary files that are accidentally classified as text the
      conversion can corrupt data.
      
      If you recognize such corruption early you can easily fix it by
      setting the conversion type explicitly in .gitattributes.  Right
      after committing you still have the original file in your work
      tree and this file is not yet corrupted.  You can explicitly tell
      git that this file is binary and git will handle the file
      appropriately.
      
      Unfortunately, the desired effect of cleaning up text files with
      mixed line endings and the undesired effect of corrupting binary
      files cannot be distinguished.  In both cases CRLFs are removed
      in an irreversible way.  For text files this is the right thing
      to do because CRLFs are line endings, while for binary files
      converting CRLFs corrupts data.
      
      This patch adds a mechanism that can either warn the user about
      an irreversible conversion or can even refuse to convert.  The
      mechanism is controlled by the variable core.safecrlf, with the
      following values:
      
       - false: disable safecrlf mechanism
       - warn: warn about irreversible conversions
       - true: refuse irreversible conversions
      
      The default is to warn.  Users are only affected by this default
      if core.autocrlf is set.  But the current default of git is to
      leave core.autocrlf unset, so users will not see warnings unless
      they deliberately chose to activate the autocrlf mechanism.
      
      The safecrlf mechanism's details depend on the git command.  The
      general principles when safecrlf is active (not false) are:
      
       - we warn/error out if files in the work tree can be modified in an
         irreversible way without giving the user a chance to back up the
         original file.
      
       - for read-only operations that do not modify files in the work tree
         we do not print annoying warnings.
      
      There are exceptions.  Even though...
      
       - "git add" itself does not touch the files in the work tree, the
         next checkout would, so the safety triggers;
      
       - "git apply" to update a text file with a patch does touch the files
         in the work tree, but the operation is about text files and CRLF
         conversion is about fixing the line ending inconsistencies, so the
         safety does not trigger;
      
       - "git diff" itself does not touch the files in the work tree, it is
         often run to inspect the changes you intend to next "git add".  To
         catch potential problems early, safety triggers.
      
      The concept of a safety check was originally proposed in a similar
      way by Linus Torvalds.  Thanks to Dmitry Potapov for insisting
      on getting the naked LF/autocrlf=true case right.
      Signed-off-by: Steffen Prohaska <prohaska@zib.de>
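The irreversibility test described in the safecrlf message can be sketched in C as follows. This is a minimal illustration, not the code from the patch: the function name `crlf_roundtrip_is_lossy` and the `autocrlf_mode` enum are hypothetical, and the rule is inferred from the message (CRLF is stripped on commit; LF becomes CRLF on checkout only when autocrlf is true).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding of core.autocrlf for this sketch. */
enum autocrlf_mode { AUTOCRLF_FALSE, AUTOCRLF_TRUE, AUTOCRLF_INPUT };

/*
 * Return 1 if committing buf and checking it out again would not
 * reproduce the same bytes, 0 otherwise.
 */
int crlf_roundtrip_is_lossy(const char *buf, size_t len,
                            enum autocrlf_mode mode)
{
    size_t i, crlf = 0, lone_lf = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        if (i > 0 && buf[i - 1] == '\r')
            crlf++;       /* LF that is part of a CRLF pair */
        else
            lone_lf++;    /* naked LF */
    }
    switch (mode) {
    case AUTOCRLF_INPUT:
        /* CRLF is stripped on commit and never restored on checkout. */
        return crlf > 0;
    case AUTOCRLF_TRUE:
        /* Checkout writes every LF as CRLF, so a naked LF is lost. */
        return lone_lf > 0;
    default:
        /* No conversion at all, so nothing can be lost. */
        return 0;
    }
}
```

A caller would warn when core.safecrlf is "warn" and refuse the operation when it is "true" whenever this function returns 1.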
  3. 17 Jan, 2008 1 commit
    • treat any file with NUL as binary · 28624193
      Committed by Dmitry Potapov
      There are two heuristics in Git to detect whether a file is binary
      or text. One in xdiff-interface.c (which is taken from GNU diff)
      relies on existence of the NUL byte at the beginning. However,
      convert.c used a different heuristic, which relied on the percent
      of non-printable symbols (less than 1% for text files).
      
      Due to differences in detection whether a file is binary or not,
      it was possible that a file that diff treats as binary could be
      treated as text by CRLF conversion. This is very confusing for a
      user who sees that 'git diff' shows the file as binary and expects
      it to be added as binary.
      
      This patch makes is_binary consider any file that contains at
      least one NUL character as binary, to ensure that the heuristic
      used for CRLF conversion is tighter than what is used by diff.
      Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
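A combined heuristic of the kind this message describes might look like the sketch below. The function name `looks_binary`, the exact set of bytes counted as printable, and the way the 1% threshold is computed are assumptions made for illustration; only the "any NUL means binary" rule and the rough 1% figure come from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the buffer should be treated as binary, 0 if as text. */
int looks_binary(const char *buf, size_t len)
{
    size_t i, printable = 0, nonprintable = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];

        if (c == '\0')
            return 1;   /* a single NUL byte is enough */
        /* Assumption: whitespace controls and bytes >= 32 (except DEL)
         * count as printable, so UTF-8 text is not penalized. */
        if (c == '\n' || c == '\r' || c == '\t' || (c >= 32 && c != 127))
            printable++;
        else
            nonprintable++;
    }
    /* Assumed threshold: 1% or more non-printable bytes means binary. */
    return nonprintable > 0 &&
           nonprintable * 100 >= printable + nonprintable;
}
```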
  4. 21 Oct, 2007 3 commits
  5. 16 Oct, 2007 1 commit
  6. 29 Sep, 2007 1 commit
    • strbuf change: be sure ->buf is never ever NULL. · b315c5c0
      Committed by Pierre Habouzit
      For that purpose, the ->buf is always initialized with a char * buf living
      in the strbuf module. It is made a char * so that we can sloppily accept
      things that perform: sb->buf[0] = '\0', and because you can't pass "" as an
      initializer for ->buf without making gcc unhappy for very good reasons.
      
      strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf
      anymore.
      
      As a consequence, strbuf_detach is _mandatory_ to detach a buffer;
      copying ->buf isn't an option anymore if ->buf is going to escape
      from the scope and eventually be free'd.
      
      API changes:
        * strbuf_setlen now always works, so just make strbuf_reset a convenience
          macro.
        * strbuf_detach takes an optional size_t * argument (meaning it can
          be NULL) to copy the buffer's len, as it was needed for this refactor
          to make the code more readable, and working like the callers.
      Signed-off-by: Pierre Habouzit <madcoder@debian.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
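The never-NULL buffer idea can be illustrated with a stripped-down string buffer. The names below (`mini_strbuf`, `slopbuf`, and the three functions) are hypothetical and this is not the strbuf module itself; the sketch only shows why the code must trust `alloc` rather than `buf`, and why detaching has to go through a dedicated function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Shared one-byte buffer that every empty buffer points at. */
static char slopbuf[1];

struct mini_strbuf {
    size_t alloc;   /* 0 means buf points at slopbuf, not at heap memory */
    size_t len;
    char *buf;      /* never NULL */
};

void mini_strbuf_init(struct mini_strbuf *sb)
{
    sb->alloc = 0;
    sb->len = 0;
    sb->buf = slopbuf;
}

void mini_strbuf_add(struct mini_strbuf *sb, const char *data, size_t n)
{
    assert(sb != NULL && data != NULL);
    if (sb->len + n + 1 > sb->alloc) {
        size_t want = (sb->len + n + 1) * 2;
        /* Trust alloc, not buf: only heap memory may go to realloc. */
        char *grown = realloc(sb->alloc ? sb->buf : NULL, want);

        if (!grown)
            abort();
        sb->buf = grown;
        sb->alloc = want;
    }
    memcpy(sb->buf + sb->len, data, n);
    sb->len += n;
    sb->buf[sb->len] = '\0';
}

/* Hand ownership of the contents to the caller, who must free() them. */
char *mini_strbuf_detach(struct mini_strbuf *sb, size_t *len)
{
    /* An empty buffer must not leak slopbuf to a caller that will free it. */
    char *res = sb->alloc ? sb->buf : calloc(1, 1);

    if (!res)
        abort();
    if (len)
        *len = sb->len;
    mini_strbuf_init(sb);
    return res;
}
```

Copying `sb->buf` directly instead of calling the detach function would hand out the shared `slopbuf` for an empty buffer, and freeing that pointer would be undefined behavior.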
  7. 19 Sep, 2007 1 commit
  8. 17 Sep, 2007 2 commits
  9. 04 Sep, 2007 1 commit
  10. 26 May, 2007 1 commit
    • Fix mishandling of $Id$ expanded in the repository copy in convert.c · c23290d5
      Committed by Andy Parkins
      If the repository contained an expanded ident keyword (i.e. $Id:XXXX$),
      then the wrong bytes were discarded, and the Id keyword was not
      expanded.  The fault was in convert.c:ident_to_worktree().
      
      Previously, when a "$Id:" was found in the repository version,
      ident_to_worktree() would search for the next "$" after this, and
      discarded everything it found until then.  That was done with the loop:
      
          do {
              ch = *cp++;
              if (ch == '$')
                  break;
              rem--;
          } while (rem);
      
      The above loop left cp pointing one character _after_ the final "$"
      (because of ch = *cp++).  This was different from the non-expanded case,
      where cp is left pointing at the "$", and was different from the comment
      which stated "discard up to but not including the closing $".  This
      patch fixes that by making the loop:
      
          do {
              ch = *cp;
              if (ch == '$')
                  break;
              cp++;
              rem--;
          } while (rem);
      
      That is, cp is tested _then_ incremented.
      
      This loop exits if it finds a "$" or if it runs out of bytes in the
      source.  After this loop, if there was no closing "$" the expansion is
      skipped, and the outer loop is allowed to continue leaving this
      non-keyword as it was.  However, when the "$" is found, size is
      corrected, before running the expansion:
      
          size -= (cp - src);
      
      This is wrong; size is going to be corrected anyway after the expansion,
      so there is no need to do it here.  This patch removes that redundant
      correction.
      
      To help find this bug, I heavily commented the routine; those comments
      are included here as a bonus.
      Signed-off-by: Andy Parkins <andyparkins@gmail.com>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
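The test-then-increment behavior of the corrected loop can be isolated in a small standalone function. The name `find_closing_dollar` is hypothetical; the sketch only demonstrates that the returned pointer stays on the "$" itself rather than one past it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan at most rem bytes starting at cp.  Return a pointer to the first
 * '$' found, or NULL if the bytes run out before one is seen.
 */
const char *find_closing_dollar(const char *cp, size_t rem)
{
    assert(cp != NULL || rem == 0);
    while (rem) {
        if (*cp == '$')
            return cp;   /* test first, so cp stays on the '$' */
        cp++;
        rem--;
    }
    return NULL;
}
```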
  11. 19 May, 2007 2 commits
    • Fix crlf attribute handling to match documentation · 760f0c62
      Committed by Andy Parkins
      gitattributes.txt says, of the crlf attribute:
      
       Set::
          Setting the `crlf` attribute on a path is meant to mark
          the path as a "text" file.  'core.autocrlf' conversion
          takes place without guessing the content type by
          inspection.
      
      That is to say that the crlf attribute does not force the file to have
      CRLF line endings, instead it removes the autocrlf guesswork and forces
      the file to be treated as text.  Then, whatever line ending is defined
      by the autocrlf setting is applied.
      
      However, that is not what convert.c was doing.  The conversion to CRLF
      was being skipped in crlf_to_worktree() when the following condition was
      true:
      
       action == CRLF_GUESS && auto_crlf <= 0
      
      That is to say conversion took place when not in guess mode (crlf attribute
      not specified) or core.autocrlf set to true.  This was wrong.  It meant
      that the crlf attribute being on for a given file _forced_ CRLF
      conversion, when actually it should force the file to be treated as
      text, and converted accordingly.  The real test should simply be
      
       auto_crlf <= 0
      
      That is to say, if core.autocrlf is false (or input), conversion from
      LF to CRLF is never done.  When core.autocrlf is true, conversion from
      LF to CRLF is done only when in CRLF_GUESS (and the guess is "text"), or
      CRLF_TEXT mode.
      
      Similarly for crlf_to_worktree(), if core.autocrlf is false, no conversion
      should _ever_ take place.  In reality it was only not taking place if
      core.autocrlf was false _and_ the crlf attribute was unspecified.
      Signed-off-by: Andy Parkins <andyparkins@gmail.com>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
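The corrected checkout-side rule can be written as a small decision function. `CRLF_GUESS` and `CRLF_TEXT` are named in the message; the third enum value `CRLF_BINARY`, the function name `should_add_cr`, and the encoding of auto_crlf as 1 (true), 0 (false), -1 (input) are assumptions consistent with the `auto_crlf <= 0` test quoted above.

```c
#include <assert.h>

enum crlf_action { CRLF_GUESS, CRLF_TEXT, CRLF_BINARY };

/*
 * Decide whether LF should be written as CRLF on checkout.
 * auto_crlf: 1 = true, 0 = false, -1 = input (assumed encoding).
 * looks_like_text: result of the content heuristic, used only when guessing.
 */
int should_add_cr(enum crlf_action action, int auto_crlf, int looks_like_text)
{
    assert(auto_crlf >= -1 && auto_crlf <= 1);
    if (auto_crlf <= 0)
        return 0;               /* false or input: never add CR */
    switch (action) {
    case CRLF_TEXT:
        return 1;               /* attribute says text: no guessing */
    case CRLF_GUESS:
        return looks_like_text != 0;
    default:
        return 0;               /* marked binary: leave untouched */
    }
}
```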
    • git-archive: convert archive entries like checkouts do · 5e6cfc80
      Committed by René Scharfe
      As noted by Johan Herland, git-archive is a kind of checkout and needs
      to apply any checkout filters that might be configured.
      
      This patch adds the convenience function convert_sha1_file which returns
      a buffer containing the object's contents, after converting, if necessary
      (i.e. it's a combination of read_sha1_file and convert_to_working_tree).
      Direct calls to read_sha1_file in git-archive are then replaced by calls
      to convert_sha1_file.
      
      Since convert_sha1_file expects its path argument to be NUL-terminated --
      a convention it inherits from convert_to_working_tree -- the patch also
      changes the path handling in archive-tar.c to always NUL-terminate the
      string.  It used to solely rely on the len field of struct strbuf before.
      
      archive-zip.c already NUL-terminates the path and thus needs no such
      change.
      Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  12. 15 May, 2007 1 commit
  13. 25 Apr, 2007 2 commits
    • Add 'filter' attribute and external filter driver definition. · aa4ed402
      Committed by Junio C Hamano
      The interface is similar to the custom low-level merge drivers.
      
      First you configure your filter driver by defining 'filter.<name>.*'
      variables in the configuration.
      
      	filter.<name>.clean	filter command to run upon checkin
      	filter.<name>.smudge	filter command to run upon checkout
      
      Then you assign filter attribute to each path, whose name
      matches the custom filter driver's name.
      
      Example:
      
      	(in .gitattributes)
      	*.c	filter=indent
      
      	(in config)
      	[filter "indent"]
      		clean = indent
      		smudge = cat
      Signed-off-by: Junio C Hamano <junkio@cox.net>
    • Add 'ident' conversion. · 3fed15f5
      Committed by Junio C Hamano
      The 'ident' attribute set to path squashes "$ident:<any bytes
      except dollar sign>$" to "$ident$" upon checkin, and expands it
      to "$ident: <blob SHA-1> $" upon checkout.
      
      As we have two conversions that affect checkin/checkout paths,
      clarify how they interact with each other.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
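The checkin direction of the 'ident' conversion can be sketched as a string rewrite. The function name `squash_ident` and the caller-supplied output buffer are hypothetical choices for this illustration; the keyword spelling "$ident:" follows this commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src to dst, collapsing every "$ident:<bytes without '$'>$" into
 * "$ident$".  dst must have room for strlen(src) + 1 bytes, which is
 * always enough because the collapsed form is never longer.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
size_t squash_ident(const char *src, char *dst)
{
    static const char key[] = "$ident:";
    const size_t keylen = sizeof(key) - 1;
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src) {
        const char *end;

        if (strncmp(src, key, keylen) == 0 &&
            (end = strchr(src + keylen, '$')) != NULL) {
            memcpy(dst + out, "$ident$", 7);
            out += 7;
            src = end + 1;      /* skip past the closing '$' */
            continue;
        }
        dst[out++] = *src++;    /* ordinary byte, or keyword with no close */
    }
    dst[out] = '\0';
    return out;
}
```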
  14. 23 Apr, 2007 1 commit
  15. 22 Apr, 2007 1 commit
  16. 21 Apr, 2007 1 commit
  17. 20 Apr, 2007 1 commit
    • Update 'crlf' attribute semantics. · 163b9591
      Committed by Junio C Hamano
      This updates the semantics of 'crlf' so that .gitattributes file
      can say "this is text, even though it may look funny".
      
      Setting the `crlf` attribute on a path is meant to mark the path
      as a "text" file.  'core.autocrlf' conversion takes place
      without guessing the content type by inspection.
      
      Unsetting the `crlf` attribute on a path is meant to mark the
      path as a "binary" file.  The path never goes through line
      endings conversion upon checkin/checkout.
      
      Unspecified `crlf` attribute tells git to apply the
      `core.autocrlf` conversion when the file content looks like
      text.
      
      Setting the `crlf` attribute to string value "input" is similar
      to setting the attribute to `true`, but also forces git to act
      as if `core.autocrlf` is set to `input` for the path.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  18. 19 Apr, 2007 1 commit
    • Fix funny types used in attribute value representation · a5e92abd
      Committed by Junio C Hamano
      It was bothering me a lot that I abused small integer values
      cast to (void *) to represent non-string values in
      gitattributes.  This corrects it by making the type of attribute
      values (const char *), and using the addresses of a few statically
      allocated character buffers to denote true/false.  Unset attributes
      are represented as having NULLs as their values.
      
      Added in-header documentation to explain how git_checkattr()
      routine should be called.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
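The sentinel-address representation described in this message can be sketched as follows. All identifiers here (`attr_value_true`, `attr_value_false`, `classify_attr_value`, and the enum) are hypothetical; the point is only that true and false are recognized by pointer identity, NULL means unset, and anything else is an ordinary string.

```c
#include <assert.h>
#include <stddef.h>

/* Two distinct static objects whose addresses stand for true and false. */
const char attr_value_true[] = "";
const char attr_value_false[] = "";

enum attr_kind {
    ATTR_KIND_TRUE,
    ATTR_KIND_FALSE,
    ATTR_KIND_UNSET,
    ATTR_KIND_STRING
};

/* Classify an attribute value by pointer identity, not by content. */
enum attr_kind classify_attr_value(const char *value)
{
    /* The scheme relies on the two sentinels being different objects. */
    assert(&attr_value_true[0] != &attr_value_false[0]);
    if (value == NULL)
        return ATTR_KIND_UNSET;
    if (value == attr_value_true)
        return ATTR_KIND_TRUE;
    if (value == attr_value_false)
        return ATTR_KIND_FALSE;
    return ATTR_KIND_STRING;
}
```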
  19. 17 Apr, 2007 1 commit
    • Allow more than true/false to attributes. · 515106fa
      Committed by Junio C Hamano
      This allows you to define three values (and possibly more) to
      each attribute: true, false, and unset.
      
      Typically the handlers that notice and act on attribute values
      treat "unset" attribute to mean "do your default thing"
      (e.g. crlf that is unset would trigger "guess from contents"),
      so being able to override a setting to an unset state is
      actually useful.
      
       - If you want to set the attribute value to true, have an entry
         in .gitattributes file that mentions the attribute name; e.g.
      
      	*.o	binary
      
       - If you want to set the attribute value explicitly to false,
         use '-'; e.g.
      
      	*.a	-diff
      
       - If you want to make the attribute value _unset_, perhaps to
         override an earlier entry, use '!'; e.g.
      
      	*.a	-diff
      	c.i.a	!diff
      
      This also allows string values to attributes, with the natural
      syntax:
      
      	attrname=attrvalue
      
      but you cannot use it, as nobody takes notice and acts on
      it yet.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
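The attribute syntax listed in this message (bare name, '-' prefix, '!' prefix, and name=value) could be parsed along these lines. The struct, enum, and function name are hypothetical, and the fixed 64-byte fields are an arbitrary limit chosen to keep the sketch short.

```c
#include <assert.h>
#include <string.h>

enum attr_state { ATTR_SET_TRUE, ATTR_SET_FALSE, ATTR_UNSET, ATTR_STRING };

struct parsed_attr {
    enum attr_state state;
    char name[64];
    char value[64];   /* only meaningful when state == ATTR_STRING */
};

/* Parse one attribute token; return 0 on success, -1 on a bad token. */
int parse_attr_token(const char *tok, struct parsed_attr *out)
{
    const char *eq;
    size_t namelen;

    assert(tok != NULL && out != NULL);
    out->value[0] = '\0';
    out->state = ATTR_SET_TRUE;
    if (*tok == '-') {
        out->state = ATTR_SET_FALSE;
        tok++;
    } else if (*tok == '!') {
        out->state = ATTR_UNSET;
        tok++;
    }

    eq = strchr(tok, '=');
    namelen = eq ? (size_t)(eq - tok) : strlen(tok);
    if (namelen == 0 || namelen >= sizeof(out->name))
        return -1;
    memcpy(out->name, tok, namelen);
    out->name[namelen] = '\0';

    if (eq) {
        /* A value cannot be combined with a '-' or '!' prefix here. */
        if (out->state != ATTR_SET_TRUE ||
            strlen(eq + 1) >= sizeof(out->value))
            return -1;
        out->state = ATTR_STRING;
        strcpy(out->value, eq + 1);
    }
    return 0;
}
```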
  20. 16 Apr, 2007 1 commit
    • Fix 'crlf' attribute semantics. · 201ac8ef
      Committed by Junio C Hamano
      Earlier we said 'crlf lets the path go through core.autocrlf
      process while !crlf disables it altogether'.  This fixes the
      semantics to:
      
       - Lack of 'crlf' attribute makes core.autocrlf apply
         (i.e. we guess based on the contents and if platform
         expresses its desire to have CRLF line endings via
         core.autocrlf, we do so).
      
       - Setting 'crlf' attribute to true forces CRLF line endings in
         working tree files, even if blob does not look like text
         (e.g. contains NUL or other bytes we consider binary).
      
       - Setting 'crlf' attribute to false disables conversion.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  21. 14 Apr, 2007 1 commit
    • Define 'crlf' attribute. · 35ebfd6a
      Committed by Junio C Hamano
      This defines the semantics of 'crlf' attribute as an example.
      When a path has this attribute unset (i.e. '!crlf'), autocrlf
      line-end conversion is not applied.
      
      Eventually we would want to let users build a pipeline of
      processing to munge blob data to filesystem format (and in the
      other direction) based on combination of attributes, and at that
      point the mechanism in convert_to_{git,working_tree}() that
      looks at 'crlf' attribute needs to be enhanced.  Perhaps the
      existing 'crlf' would become the first step in the input chain,
      and the last step in the output chain.
      Signed-off-by: Junio C Hamano <junkio@cox.net>
  22. 15 Feb, 2007 2 commits
    • Make AutoCRLF ternary variable. · d7f46334
      Committed by Linus Torvalds
      This allows you to do:
      
      	[core]
      		AutoCRLF = input
      
      and it should do only the CRLF->LF translation (ie it simplifies CRLF only
      when reading working tree files, but when checking out files, it leaves
      the LF alone, and doesn't turn it into a CRLF).
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
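The three-way setting can be modeled with a small parser and two predicates. This is a simplified sketch: the function names are hypothetical, the integer encoding (1 true, 0 false, -1 input) is an assumption, and only the literal strings "true" and "input" are recognized, whereas a real configuration parser accepts more boolean spellings.

```c
#include <assert.h>
#include <string.h>

/* Map a config string to 1 (true), -1 (input) or 0 (anything else). */
int parse_autocrlf(const char *value)
{
    assert(value != NULL);
    if (strcmp(value, "input") == 0)
        return -1;
    if (strcmp(value, "true") == 0)
        return 1;
    return 0;
}

/* CRLF -> LF when reading working tree files: both "true" and "input". */
int converts_on_checkin(int auto_crlf)
{
    return auto_crlf != 0;
}

/* LF -> CRLF when checking out: only "true". */
int converts_on_checkout(int auto_crlf)
{
    return auto_crlf > 0;
}
```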
    • Lazy man's auto-CRLF · 6c510bee
      Committed by Linus Torvalds
      It currently does NOT know about file attributes, so it does its
      conversion purely based on content. Maybe that is more in the "git
      philosophy" anyway, since content is king, but I think we should try to do
      the file attributes to turn it off on demand.
      
      Anyway, BY DEFAULT it is off regardless, because it requires a
      
      	[core]
      		AutoCRLF = true
      
      in your config file to be enabled. We could make that the default for
      Windows, of course, the same way we do some other things (filemode etc).
      
      But you can actually enable it on UNIX, and it will cause:
      
       - "git update-index" will write blobs without CRLF
       - "git diff" will diff working tree files without CRLF
       - "git checkout" will write files to the working tree _with_ CRLF
      
      and things work fine.
      
      Funnily, it actually shows an odd file in git itself:
      
      	git clone -n git test-crlf
      	cd test-crlf
      	git config core.autocrlf true
      	git checkout
      	git diff
      
      shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
      actually checked in that file *with* CRLF! So when "core.autocrlf" is
      true, we'll always generate a *different* hash for it in the index,
      because the index hash will be for the content _without_ CRLF.
      
      Is this complete? I dunno. It seems to work for me. It doesn't use the
      filename at all right now, and that's probably a deficiency (we could
      certainly make the "is_binary()" heuristics also take standard filename
      heuristics into account).
      
      I don't pass in the filename at all for the "index_fd()" case
      (git-update-index), so that would need to be passed around, but this
      actually works fine.
      
      NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
      truly. I will not guarantee that they work at all reasonably. Caveat
      emptor. But it _is_ simple, and it _is_ safe, since it's all off by
      default.
      
      The patch is pretty simple - the biggest part is the new "convert.c" file,
      but even that is really just basic stuff that anybody can write in
      "Teaching C 101" as a final project for their first class in programming.
      Not to say that it's bug-free, of course - but at least we're not talking
      about rocket surgery here.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Junio C Hamano <junkio@cox.net>
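The two byte-level conversions this message talks about can be sketched in a few lines. The function names and buffer conventions are hypothetical and the binary-detection step is left out; the sketch only shows stripping CR from CRLF pairs in one direction and adding CR before naked LFs in the other.

```c
#include <assert.h>
#include <stddef.h>

/* In place: turn every CRLF pair into LF.  Returns the new length. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t in, out = 0;

    assert(buf != NULL || len == 0);
    for (in = 0; in < len; in++) {
        /* Drop a CR only when an LF follows; a lone CR is kept. */
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue;
        buf[out++] = buf[in];
    }
    return out;
}

/*
 * Copy src to dst, writing CRLF for every LF that is not already
 * preceded by CR.  dst must have room for 2 * len bytes.
 * Returns the number of bytes written.
 */
size_t lf_to_crlf(const char *src, size_t len, char *dst)
{
    size_t i, out = 0;
    char last = '\0';

    assert((src != NULL && dst != NULL) || len == 0);
    for (i = 0; i < len; i++) {
        if (src[i] == '\n' && last != '\r')
            dst[out++] = '\r';
        dst[out++] = src[i];
        last = src[i];
    }
    return out;
}
```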