1. 03 9月, 2013 1 次提交
    • L
      vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock() · 15570086
      Linus Torvalds 提交于
      This moves __d_rcu_to_refcount() from <linux/dcache.h> into fs/namei.c
      and re-implements it using the lockref infrastructure instead.  It also
      adds a lot of comments about what is actually going on, because turning
      a dentry that was looked up using RCU into a long-lived reference
      counted entry is one of the more subtle parts of the rcu walk.
      
      We also used to be _particularly_ subtle in unlazy_walk() where we
      re-validate both the dentry and its parent using the same sequence
      count.  We used to do it by nesting the locks and then verifying the
      sequence count just once.
      
      That was silly, because nested locking is expensive, but the sequence
      count check is not.  So this just re-validates the dentry and the parent
      separately, avoiding the nested locking, and making the lockref lookup
      possible.
      Acked-by: NWaiman Long <waiman.long@hp.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      15570086
  2. 29 8月, 2013 2 次提交
    • W
      vfs: make the dentry cache use the lockref infrastructure · 98474236
      Waiman Long 提交于
      This just replaces the dentry count/lock combination with the lockref
      structure that contains both a count and a spinlock, and does the
      mechanical conversion to use the lockref infrastructure.
      
      There are no semantic changes here, it's purely syntactic.  The
      reference lockref implementation uses the spinlock exactly the same way
      that the old dcache code did, and the bulk of this patch is just
      expanding the internal "d_count" use in the dcache code to use
      "d_lockref.count" instead.
      
      This is purely preparation for the real change to make the reference
      count updates be lockless during the 3.12 merge window.
      
      [ As with the previous commit, this is a rewritten version of a concept
        originally from Waiman, so credit goes to him, blame for any errors
        goes to me.
      
        Waiman's patch had some semantic differences for taking advantage of
        the lockless update in dget_parent(), while this patch is
        intentionally a pure search-and-replace change with no semantic
        changes.     - Linus ]
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      98474236
    • L
      Revert "fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink" · f0cc6ffb
      Linus Torvalds 提交于
      This reverts commit bb2314b4.
      
      It wasn't necessarily wrong per se, but we're still busily discussing
      the exact details of this all, so I'm going to revert it for now.
      
      It's true that you can already do flink() through /proc and that flink()
      isn't new.  But as Brad Spengler points out, some secure environments do
      not mount proc, and flink adds a new interface that can avoid path
      lookup of the source for those kinds of environments.
      
      We may re-do this (and even mark it for stable backporting back in 3.11
      and possibly earlier) once the whole discussion about the interface is done.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0cc6ffb
  3. 05 8月, 2013 1 次提交
    • A
      fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink · bb2314b4
      Andy Lutomirski 提交于
      Every now and then someone proposes a new flink syscall, and this spawns
      a long discussion of whether it would be a security problem.  I think
      that this is missing the point: flink is *already* allowed without
      privilege as long as /proc is mounted -- it's called AT_SYMLINK_FOLLOW.
      
      Now that O_TMPFILE is here, the ability to create a file with O_TMPFILE,
      write it, and link it in is very convenient.  The only problem is that
      it requires that /proc be mounted so that you can do:
      
      linkat(AT_FDCWD, "/proc/self/fd/<tmpfd>", dfd, path, AT_SYMLINK_NOFOLLOW)
      
      This sucks -- it's much nicer to do:
      
      linkat(tmpfd, "", dfd, path, AT_EMPTY_PATH)
      
      Let's allow it.
      
      If this turns out to be excessively scary, it we could instead require
      that the inode in question be I_LINKABLE, but this seems pointless given
      the /proc situation
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      bb2314b4
  4. 13 7月, 2013 1 次提交
    • A
      Safer ABI for O_TMPFILE · bb458c64
      Al Viro 提交于
      [suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE;
      that will fail on old kernels in a lot more cases than what I came up with.
      And make sure O_CREAT doesn't get there...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      bb458c64
  5. 29 6月, 2013 5 次提交
  6. 15 6月, 2013 1 次提交
  7. 08 5月, 2013 1 次提交
    • J
      audit: vfs: fix audit_inode call in O_CREAT case of do_last · 33e2208a
      Jeff Layton 提交于
      Jiri reported a regression in auditing of open(..., O_CREAT) syscalls.
      In older kernels, creating a file with open(..., O_CREAT) created
      audit_name records that looked like this:
      
      type=PATH msg=audit(1360255720.628:64): item=1 name="/abc/foo" inode=138810 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255720.628:64): item=0 name="/abc/" inode=138635 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      
      ...in recent kernels though, they look like this:
      
      type=PATH msg=audit(1360255402.886:12574): item=2 name=(null) inode=264599 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255402.886:12574): item=1 name=(null) inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      type=PATH msg=audit(1360255402.886:12574): item=0 name="/abc/foo" inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
      
      Richard bisected to determine that the problems started with commit
      bfcec708, but the log messages have changed with some later
      audit-related patches.
      
      The problem is that this audit_inode call is passing in the parent of
      the dentry being opened, but audit_inode is being called with the parent
      flag false. This causes later audit_inode and audit_inode_child calls to
      match the wrong entry in the audit_names list.
      
      This patch simply sets the flag to properly indicate that this inode
      represents the parent. With this, the audit_names entries are back to
      looking like they did before.
      
      Cc: <stable@vger.kernel.org> # v3.7+
      Reported-by: NJiri Jaburek <jjaburek@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Test By: Richard Guy Briggs <rbriggs@redhat.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      33e2208a
  8. 09 3月, 2013 1 次提交
  9. 02 3月, 2013 1 次提交
  10. 26 2月, 2013 1 次提交
    • J
      vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op · ecf3d1f1
      Jeff Layton 提交于
      The following set of operations on a NFS client and server will cause
      
          server# mkdir a
          client# cd a
          server# mv a a.bak
          client# sleep 30  # (or whatever the dir attrcache timeout is)
          client# stat .
          stat: cannot stat `.': Stale NFS file handle
      
      Obviously, we should not be getting an ESTALE error back there since the
      inode still exists on the server. The problem is that the lookup code
      will call d_revalidate on the dentry that "." refers to, because NFS has
      FS_REVAL_DOT set.
      
      nfs_lookup_revalidate will see that the parent directory has changed and
      will try to reverify the dentry by redoing a LOOKUP. That of course
      fails, so the lookup code returns ESTALE.
      
      The problem here is that d_revalidate is really a bad fit for this case.
      What we really want to know at this point is whether the inode is still
      good or not, but we don't really care what name it goes by or whether
      the dcache is still valid.
      
      Add a new d_op->d_weak_revalidate operation and have complete_walk call
      that instead of d_revalidate. The intent there is to allow for a
      "weaker" d_revalidate that just checks to see whether the inode is still
      good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
      special casing.
      
      [AV: changed method name, added note in porting, fixed confusion re
      having it possibly called from RCU mode (it won't be)]
      
      Cc: NeilBrown <neilb@suse.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ecf3d1f1
  11. 23 2月, 2013 6 次提交
  12. 21 12月, 2012 12 次提交
  13. 30 11月, 2012 1 次提交
  14. 27 10月, 2012 1 次提交
    • L
      VFS: don't do protected {sym,hard}links by default · 561ec64a
      Linus Torvalds 提交于
      In commit 800179c9 ("This adds symlink and hardlink restrictions to
      the Linux VFS"), the new link protections were enabled by default, in
      the hope that no actual application would care, despite it being
      technically against legacy UNIX (and documented POSIX) behavior.
      
      However, it does turn out to break some applications.  It's rare, and
      it's unfortunate, but it's unacceptable to break existing systems, so
      we'll have to default to legacy behavior.
      
      In particular, it has broken the way AFD distributes files, see
      
        http://www.dwd.de/AFD/
      
      along with some legacy scripts.
      
      Distributions can end up setting this at initrd time or in system
      scripts: if you have security problems due to link attacks during your
      early boot sequence, you have bigger problems than some kernel sysctl
      setting. Do:
      
      	echo 1 > /proc/sys/fs/protected_symlinks
      	echo 1 > /proc/sys/fs/protected_hardlinks
      
      to re-enable the link protections.
      
      Alternatively, we may at some point introduce a kernel config option
      that sets these kinds of "more secure but not traditional" behavioural
      options automatically.
      Reported-by: NNick Bowler <nbowler@elliptictech.com>
      Reported-by: NHolger Kiehl <Holger.Kiehl@dwd.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org # v3.6
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      561ec64a
  15. 13 10月, 2012 5 次提交