1. 25 7月, 2016 1 次提交
  2. 30 6月, 2016 1 次提交
    • M
      vfs: merge .d_select_inode() into .d_real() · 2d902671
      Miklos Szeredi 提交于
      The two methods essentially do the same: find the real dentry/inode
      belonging to an overlay dentry.  The difference is in the usage:
      
      vfs_open() uses ->d_select_inode() and expects the function to perform
      copy-up if necessary based on the open flags argument.
      
      file_dentry() uses ->d_real() passing in the overlay dentry as well as the
      underlying inode.
      
      vfs_rename() uses ->d_select_inode() but passes zero flags.  ->d_real()
      with a zero inode would have worked just as well here.
      
      This patch merges the functionality of ->d_select_inode() into ->d_real()
      by adding an 'open_flags' argument to the latter.
      
      [Al Viro] Make the signature of d_real() match that of ->d_real() again.
      And constify the inode argument, while we are at it.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2d902671
  3. 08 6月, 2016 1 次提交
  4. 06 6月, 2016 1 次提交
    • E
      devpts: Make each mount of devpts an independent filesystem. · eedf265a
      Eric W. Biederman 提交于
      The /dev/ptmx device node is changed to lookup the directory entry "pts"
      in the same directory as the /dev/ptmx device node was opened in.  If
      there is a "pts" entry and that entry is a devpts filesystem /dev/ptmx
      uses that filesystem.  Otherwise the open of /dev/ptmx fails.
      
      The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that
      userspace can now safely depend on each mount of devpts creating a new
      instance of the filesystem.
      
      Each mount of devpts is now a separate and equal filesystem.
      
      Reserved ttys are now available to all instances of devpts where the
      mounter is in the initial mount namespace.
      
      A new vfs helper path_pts is introduced that finds a directory entry
      named "pts" in the directory of the passed in path, and changes the
      passed in path to point to it.  The helper path_pts uses a function
      path_parent_directory that was factored out of follow_dotdot.
      
      In the implementation of devpts:
       - devpts_mnt is killed as it is no longer meaningful if all mounts of
         devpts are equal.
       - pts_sb_from_inode is replaced by just inode->i_sb as all cached
         inodes in the tty layer are now from the devpts filesystem.
       - devpts_add_ref is rolled into the new function devpts_ptmx.  And the
         unnecessary inode hold is removed.
       - devpts_del_ref is renamed devpts_release and reduced to just a
         deacrivate_super.
       - The newinstance mount option continues to be accepted but is now
         ignored.
      
      In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as
      they are never used.
      
      Documentation/filesystems/devices.txt is updated to describe the current
      situation.
      
      This has been verified to work properly on openwrt-15.05, centos5,
      centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3,
      ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1,
      slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01.  With the
      caveat that on centos6 and on slackware-14.1 that there wind up being
      two instances of the devpts filesystem mounted on /dev/pts, the lower
      copy does not end up getting used.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Aurelien Jarno <aurelien@aurel32.net>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Jann Horn <jann@thejh.net>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Florian Weimer <fw@deneb.enyo.de>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eedf265a
  5. 05 6月, 2016 1 次提交
    • A
      autofs braino fix for do_last() · e6ec03a2
      Al Viro 提交于
      It's an analogue of commit 7500c38a (fix the braino in "namei:
      massage lookup_slow() to be usable by lookup_one_len_unlocked()").
      The same problem (->lookup()-returned unhashed negative dentry
      just might be an autofs one with ->d_manage() that would wait
      until the daemon makes it positive) applies in do_last() - we
      need to do follow_managed() first.
      
      Fortunately, remaining callers of follow_managed() are OK - only
      autofs has that weirdness (negative dentry that does not mean
      an instant -ENOENT)) and autofs never has its negative dentries
      hashed, so we can't pick one from a dcache lookup.
      
      ->d_manage() is a bloody mess ;-/
      
      Cc: stable@vger.kernel.org # v4.6
      Spotted-by: NIan Kent <raven@themaw.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e6ec03a2
  6. 04 6月, 2016 1 次提交
    • A
      fix EOPENSTALE bug in do_last() · fac7d191
      Al Viro 提交于
      EOPENSTALE occuring at the last component of a trailing symlink ends up
      with do_last() retrying its lookup.  After the symlink body has been
      discarded.  The thing is, all this retry_lookup logics in there is not
      needed at all - the upper layers will do the right thing if we simply
      return that -EOPENSTALE as we would with any other error.  Trying to
      microoptimize in do_last() is a lot of headache for no good reason.
      
      Cc: stable@vger.kernel.org # v4.2+
      Tested-by: NOleg Drokin <green@linuxhacker.ru>
      Reviewed-and-Tested-by: NJeff Layton <jlayton@poochiereds.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      fac7d191
  7. 29 5月, 2016 5 次提交
    • G
      hash_string: Fix zero-length case for !DCACHE_WORD_ACCESS · e0ab7af9
      George Spelvin 提交于
      The self-test was updated to cover zero-length strings; the function
      needs to be updated, too.
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Fixes: fcfd2fbf ("fs/namei.c: Add hashlen_string() function")
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e0ab7af9
    • G
      Rename other copy of hash_string to hashlen_string · f2a031b6
      George Spelvin 提交于
      The original name was simply hash_string(), but that conflicted with a
      function with that name in drivers/base/power/trace.c, and I decided
      that calling it "hashlen_" was better anyway.
      
      But you have to do it in two places.
      
      [ This caused build errors for architectures that don't define
        CONFIG_DCACHE_WORD_ACCESS   - Linus ]
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Fixes: fcfd2fbf ("fs/namei.c: Add hashlen_string() function")
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2a031b6
    • G
      <linux/hash.h>: Add support for architecture-specific functions · 468a9428
      George Spelvin 提交于
      This is just the infrastructure; there are no users yet.
      
      This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
      the existence of <asm/hash.h>.
      
      That file may define its own versions of various functions, and define
      HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
      
      Included is a self-test (in lib/test_hash.c) that verifies the basics.
      It is NOT in general required that the arch-specific functions compute
      the same thing as the generic, but if a HAVE_* symbol is defined with
      the value 1, then equality is tested.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Philippe De Muyter <phdm@macq.eu>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: Alistair Francis <alistai@xilinx.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      468a9428
    • G
      fs/namei.c: Improve dcache hash function · 2a18da7a
      George Spelvin 提交于
      Patch 0fed3ac8 improved the hash mixing, but the function is slower
      than necessary; there's a 7-instruction dependency chain (10 on x86)
      each loop iteration.
      
      Word-at-a-time access is a very tight loop (which is good, because
      link_path_walk() is one of the hottest code paths in the entire kernel),
      and the hash mixing function must not have a longer latency to avoid
      slowing it down.
      
      There do not appear to be any published fast hash functions that:
      1) Operate on the input a word at a time, and
      2) Don't need to know the length of the input beforehand, and
      3) Have a single iterated mixing function, not needing conditional
         branches or unrolling to distinguish different loop iterations.
      
      One of the algorithms which comes closest is Yann Collet's xxHash, but
      that's two dependent multiplies per word, which is too much.
      
      The key insights in this design are:
      
      1) Barring expensive ops like multiplies, to diffuse one input bit
         across 64 bits of hash state takes at least log2(64) = 6 sequentially
         dependent instructions.  That is more cycles than we'd like.
      2) An operation like "hash ^= hash << 13" requires a second temporary
         register anyway, and on a 2-operand machine like x86, it's three
         instructions.
      3) A better use of a second register is to hold a two-word hash state.
         With careful design, no temporaries are needed at all, so it doesn't
         increase register pressure.  And this gets rid of register copying
         on 2-operand machines, so the code is smaller and faster.
      4) Using two words of state weakens the requirement for one-round mixing;
         we now have two rounds of mixing before cancellation is possible.
      5) A two-word hash state also allows operations on both halves to be
         done in parallel, so on a superscalar processor we get more mixing
         in fewer cycles.
      
      I ended up using a mixing function inspired by the ChaCha and Speck
      round functions.  It is 6 simple instructions and 3 cycles per iteration
      (assuming multiply by 9 can be done by an "lea" instruction):
      
      		x ^= *input++;
      	y ^= x;	x = ROL(x, K1);
      	x += y;	y = ROL(y, K2);
      	y *= 9;
      
      Not only is this reversible, two consecutive rounds are reversible:
      if you are given the initial and final states, but not the intermediate
      state, it is possible to compute both input words.  This means that at
      least 3 words of input are required to create a collision.
      
      (It also has the property, used by hash_name() to avoid a branch, that
      it hashes all-zero to all-zero.)
      
      The rotate constants K1 and K2 were found by experiment.  The search took
      a sample of random initial states (I used 1023) and considered the effect
      of flipping each of the 64 input bits on each of the 128 output bits two
      rounds later.  Each of the 8192 pairs can be considered a biased coin, and
      adding up the Shannon entropy of all of them produces a score.
      
      The best-scoring shifts also did well in other tests (flipping bits in y,
      trying 3 or 4 rounds of mixing, flipping all 64*63/2 pairs of input bits),
      so the choice was made with the additional constraint that the sum of the
      shifts is odd and not too close to the word size.
      
      The final state is then folded into a 32-bit hash value by a less carefully
      optimized multiply-based scheme.  This also has to be fast, as pathname
      components tend to be short (the most common case is one iteration!), but
      there's some room for latency, as there is a fair bit of intervening logic
      before the hash value is used for anything.
      
      (Performance verified with "bonnie++ -s 0 -n 1536:-2" on tmpfs.  I need
      a better benchmark; the numbers seem to show a slight dip in performance
      between 4.6.0 and this patch, but they're too noisy to quote.)
      
      Special thanks to Bruce fields for diligent testing which uncovered a
      nasty fencepost error in an earlier version of this patch.
      
      [checkpatch.pl formatting complaints noted and respectfully disagreed with.]
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Tested-by: NJ. Bruce Fields <bfields@redhat.com>
      2a18da7a
    • G
      fs/namei.c: Add hashlen_string() function · fcfd2fbf
      George Spelvin 提交于
      We'd like to make more use of the highly-optimized dcache hash functions
      throughout the kernel, rather than have every subsystem create its own,
      and a function that hashes basic null-terminated strings is required
      for that.
      
      (The name is to emphasize that it returns both hash and length.)
      
      It's actually useful in the dcache itself, specifically d_alloc_name().
      Other uses in the next patch.
      
      full_name_hash() is also tweaked to make it more generally useful:
      1) Take a "char *" rather than "unsigned char *" argument, to
         be consistent with hash_name().
      2) Handle zero-length inputs.  If we want more callers, we don't want
         to make them worry about corner cases.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      fcfd2fbf
  8. 17 5月, 2016 1 次提交
  9. 11 5月, 2016 2 次提交
    • M
      vfs: add lookup_hash() helper · 3c9fe8cd
      Miklos Szeredi 提交于
      Overlayfs needs lookup without inode_permission() and already has the name
      hash (in form of dentry->d_name on overlayfs dentry).  It also doesn't
      support filesystems with d_op->d_hash() so basically it only needs
      the actual hashed lookup from lookup_one_len_unlocked()
      
      So add a new helper that does unlocked lookup of a hashed name.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      3c9fe8cd
    • M
      vfs: rename: check backing inode being equal · 9409e22a
      Miklos Szeredi 提交于
      If a file is renamed to a hardlink of itself POSIX specifies that rename(2)
      should do nothing and return success.
      
      This condition is checked in vfs_rename().  However it won't detect hard
      links on overlayfs where these are given separate inodes on the overlayfs
      layer.
      
      Overlayfs itself detects this condition and returns success without doing
      anything, but then vfs_rename() will proceed as if this was a successful
      rename (detach_mounts(), d_move()).
      
      The correct thing to do is to detect this condition before even calling
      into overlayfs.  This patch does this by calling vfs_select_inode() to get
      the underlying inodes.
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Cc: <stable@vger.kernel.org> # v4.2+
      9409e22a
  10. 03 5月, 2016 19 次提交
  11. 01 5月, 2016 2 次提交
    • M
      ima: add support for creating files using the mknodat syscall · 05d1a717
      Mimi Zohar 提交于
      Commit 3034a146 "ima: pass 'opened' flag to identify newly created files"
      stopped identifying empty files as new files.  However new empty files
      can be created using the mknodat syscall.  On systems with IMA-appraisal
      enabled, these empty files are not labeled with security.ima extended
      attributes properly, preventing them from subsequently being opened in
      order to write the file data contents.  This patch defines a new hook
      named ima_post_path_mknod() to mark these empty files, created using
      mknodat, as new in order to allow the file data contents to be written.
      
      In addition, files with security.ima xattrs containing a file signature
      are considered "immutable" and can not be modified.  The file contents
      need to be written, before signing the file.  This patch relaxes this
      requirement for new files, allowing the file signature to be written
      before the file contents.
      
      Changelog:
      - defer identifying files with signatures stored as security.ima
        (based on Dmitry Rozhkov's comments)
      - removing tests (eg. dentry, dentry->d_inode, inode->i_size == 0)
        (based on Al's review)
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: Al Viro <<viro@zeniv.linux.org.uk>
      Tested-by: NDmitry Rozhkov <dmitry.rozhkov@linux.intel.com>
      05d1a717
    • A
      atomic_open(): fix the handling of create_error · 10c64cea
      Al Viro 提交于
      * if we have a hashed negative dentry and either CREAT|EXCL on
      r/o filesystem, or CREAT|TRUNC on r/o filesystem, or CREAT|EXCL
      with failing may_o_create(), we should fail with EROFS or the
      error may_o_create() has returned, but not ENOENT.  Which is what
      the current code ends up returning.
      
      * if we have CREAT|TRUNC hitting a regular file on a read-only
      filesystem, we can't fail with EROFS here.  At the very least,
      not until we'd done follow_managed() - we might have a writable
      file (or a device, for that matter) bound on top of that one.
      Moreover, the code downstream will see that O_TRUNC and attempt
      to grab the write access (*after* following possible mount), so
      if we really should fail with EROFS, it will happen.  No need
      to do that inside atomic_open().
      
      The real logics is much simpler than what the current code is
      trying to do - if we decided to go for simple lookup, ended
      up with a negative dentry *and* had create_error set, fail with
      create_error.  No matter whether we'd got that negative dentry
      from lookup_real() or had found it in dcache.
      
      Cc: stable@vger.kernel.org # v3.6+
      Acked-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      10c64cea
  12. 11 4月, 2016 1 次提交
  13. 06 4月, 2016 1 次提交
  14. 31 3月, 2016 2 次提交
  15. 28 3月, 2016 1 次提交