1. 31 7月, 2012 2 次提交
  2. 30 7月, 2012 2 次提交
    • K
      fs: add link restrictions · 800179c9
      Kees Cook 提交于
      This adds symlink and hardlink restrictions to the Linux VFS.
      
      Symlinks:
      
      A long-standing class of security issues is the symlink-based
      time-of-check-time-of-use race, most commonly seen in world-writable
      directories like /tmp. The common method of exploitation of this flaw
      is to cross privilege boundaries when following a given symlink (i.e. a
      root process follows a symlink belonging to another user). For a likely
      incomplete list of hundreds of examples across the years, please see:
      http://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=/tmp
      
      The solution is to permit symlinks to only be followed when outside
      a sticky world-writable directory, or when the uid of the symlink and
      follower match, or when the directory owner matches the symlink's owner.
      
      Some pointers to the history of earlier discussion that I could find:
      
       1996 Aug, Zygo Blaxell
        http://marc.info/?l=bugtraq&m=87602167419830&w=2
       1996 Oct, Andrew Tridgell
        http://lkml.indiana.edu/hypermail/linux/kernel/9610.2/0086.html
       1997 Dec, Albert D Cahalan
        http://lkml.org/lkml/1997/12/16/4
       2005 Feb, Lorenzo Hernández García-Hierro
        http://lkml.indiana.edu/hypermail/linux/kernel/0502.0/1896.html
       2010 May, Kees Cook
        https://lkml.org/lkml/2010/5/30/144
      
      Past objections and rebuttals could be summarized as:
      
       - Violates POSIX.
         - POSIX didn't consider this situation and it's not useful to follow
           a broken specification at the cost of security.
       - Might break unknown applications that use this feature.
         - Applications that break because of the change are easy to spot and
           fix. Applications that are vulnerable to symlink ToCToU by not having
           the change aren't. Additionally, no applications have yet been found
           that rely on this behavior.
       - Applications should just use mkstemp() or O_CREATE|O_EXCL.
         - True, but applications are not perfect, and new software is written
           all the time that makes these mistakes; blocking this flaw at the
           kernel is a single solution to the entire class of vulnerability.
       - This should live in the core VFS.
         - This should live in an LSM. (https://lkml.org/lkml/2010/5/31/135)
       - This should live in an LSM.
         - This should live in the core VFS. (https://lkml.org/lkml/2010/8/2/188)
      
      Hardlinks:
      
      On systems that have user-writable directories on the same partition
      as system files, a long-standing class of security issues is the
      hardlink-based time-of-check-time-of-use race, most commonly seen in
      world-writable directories like /tmp. The common method of exploitation
      of this flaw is to cross privilege boundaries when following a given
      hardlink (i.e. a root process follows a hardlink created by another
      user). Additionally, an issue exists where users can "pin" a potentially
      vulnerable setuid/setgid file so that an administrator will not actually
      upgrade a system fully.
      
      The solution is to permit hardlinks to only be created when the user is
      already the existing file's owner, or if they already have read/write
      access to the existing file.
      
      Many Linux users are surprised when they learn they can link to files
      they have no access to, so this change appears to follow the doctrine
      of "least surprise". Additionally, this change does not violate POSIX,
      which states "the implementation may require that the calling process
      has permission to access the existing file"[1].
      
      This change is known to break some implementations of the "at" daemon,
      though the version used by Fedora and Ubuntu has been fixed[2] for
      a while. Otherwise, the change has been undisruptive while in use in
      Ubuntu for the last 1.5 years.
      
      [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/linkat.html
      [2] http://anonscm.debian.org/gitweb/?p=collab-maint/at.git;a=commitdiff;h=f4114656c3a6c6f6070e315ffdf940a49eda3279
      
      This patch is based on the patches in Openwall and grsecurity, along with
      suggestions from Al Viro. I have added a sysctl to enable the protected
      behavior, and documentation.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      800179c9
    • A
      consolidate pipe file creation · e4fad8e5
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e4fad8e5
  3. 23 7月, 2012 3 次提交
  4. 14 7月, 2012 10 次提交
  5. 04 6月, 2012 1 次提交
    • L
      vfs: move inode stat information closer together · 2f9d3df8
      Linus Torvalds 提交于
      The comment above it says "Stat data, not accessed from path walking",
      but in fact some of inode fields we use for the common stat data was way
      down at the end of the inode, causing unnecessary cache misses for the
      common stat operations.
      
      The inode structure is pretty big, and this can change padding depending
      on field width, but at least on the common 64-bit configurations this
      doesn't change the size.  Some of our inode layout has historically been
      to tro to avoid unnecessary padding fields, but cache locality is at
      least as important for layout, if not more.
      
      Noticed by looking at kernel profiles, and noticing that the "i_blkbits"
      access stood out like a sore thumb.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f9d3df8
  6. 02 6月, 2012 1 次提交
    • J
      fs: introduce inode operation ->update_time · c3b2da31
      Josef Bacik 提交于
      Btrfs has to make sure we have space to allocate new blocks in order to modify
      the inode, so updating time can fail.  We've gotten around this by having our
      own file_update_time but this is kind of a pain, and Christoph has indicated he
      would like to make xfs do something different with atime updates.  So introduce
      ->update_time, where we will deal with i_version an a/m/c time updates and
      indicate which changes need to be made.  The normal version just does what it
      has always done, updates the time and marks the inode dirty, and then
      filesystems can choose to do something different.
      
      I've gone through all of the users of file_update_time and made them check for
      errors with the exception of the fault code since it's complicated and I wasn't
      quite sure what to do there, also Jan is going to be pushing the file time
      updates into page_mkwrite for those who have it so that should satisfy btrfs and
      make it not a big deal to check the file_update_time() return code in the
      generic fault path. Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      c3b2da31
  7. 01 6月, 2012 2 次提交
  8. 30 5月, 2012 1 次提交
  9. 11 5月, 2012 1 次提交
    • J
      block: don't mark buffers beyond end of disk as mapped · 080399aa
      Jeff Moyer 提交于
      Hi,
      
      We have a bug report open where a squashfs image mounted on ppc64 would
      exhibit errors due to trying to read beyond the end of the disk.  It can
      easily be reproduced by doing the following:
      
      [root@ibm-p750e-02-lp3 ~]# ls -l install.img
      -rw-r--r-- 1 root root 142032896 Apr 30 16:46 install.img
      [root@ibm-p750e-02-lp3 ~]# mount -o loop ./install.img /mnt/test
      [root@ibm-p750e-02-lp3 ~]# dd if=/dev/loop0 of=/dev/null
      dd: reading `/dev/loop0': Input/output error
      277376+0 records in
      277376+0 records out
      142016512 bytes (142 MB) copied, 0.9465 s, 150 MB/s
      
      In dmesg, you'll find the following:
      
      squashfs: version 4.0 (2009/01/31) Phillip Lougher
      [   43.106012] attempt to access beyond end of device
      [   43.106029] loop0: rw=0, want=277410, limit=277408
      [   43.106039] Buffer I/O error on device loop0, logical block 138704
      [   43.106053] attempt to access beyond end of device
      [   43.106057] loop0: rw=0, want=277412, limit=277408
      [   43.106061] Buffer I/O error on device loop0, logical block 138705
      [   43.106066] attempt to access beyond end of device
      [   43.106070] loop0: rw=0, want=277414, limit=277408
      [   43.106073] Buffer I/O error on device loop0, logical block 138706
      [   43.106078] attempt to access beyond end of device
      [   43.106081] loop0: rw=0, want=277416, limit=277408
      [   43.106085] Buffer I/O error on device loop0, logical block 138707
      [   43.106089] attempt to access beyond end of device
      [   43.106093] loop0: rw=0, want=277418, limit=277408
      [   43.106096] Buffer I/O error on device loop0, logical block 138708
      [   43.106101] attempt to access beyond end of device
      [   43.106104] loop0: rw=0, want=277420, limit=277408
      [   43.106108] Buffer I/O error on device loop0, logical block 138709
      [   43.106112] attempt to access beyond end of device
      [   43.106116] loop0: rw=0, want=277422, limit=277408
      [   43.106120] Buffer I/O error on device loop0, logical block 138710
      [   43.106124] attempt to access beyond end of device
      [   43.106128] loop0: rw=0, want=277424, limit=277408
      [   43.106131] Buffer I/O error on device loop0, logical block 138711
      [   43.106135] attempt to access beyond end of device
      [   43.106139] loop0: rw=0, want=277426, limit=277408
      [   43.106143] Buffer I/O error on device loop0, logical block 138712
      [   43.106147] attempt to access beyond end of device
      [   43.106151] loop0: rw=0, want=277428, limit=277408
      [   43.106154] Buffer I/O error on device loop0, logical block 138713
      [   43.106158] attempt to access beyond end of device
      [   43.106162] loop0: rw=0, want=277430, limit=277408
      [   43.106166] attempt to access beyond end of device
      [   43.106169] loop0: rw=0, want=277432, limit=277408
      ...
      [   43.106307] attempt to access beyond end of device
      [   43.106311] loop0: rw=0, want=277470, limit=2774
      
      Squashfs manages to read in the end block(s) of the disk during the
      mount operation.  Then, when dd reads the block device, it leads to
      block_read_full_page being called with buffers that are beyond end of
      disk, but are marked as mapped.  Thus, it would end up submitting read
      I/O against them, resulting in the errors mentioned above.  I fixed the
      problem by modifying init_page_buffers to only set the buffer mapped if
      it fell inside of i_size.
      
      Cheers,
      Jeff
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NNick Piggin <npiggin@kernel.dk>
      
      --
      
      Changes from v1->v2: re-used max_block, as suggested by Nick Piggin.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      080399aa
  10. 06 5月, 2012 2 次提交
    • J
      writeback: Avoid iput() from flusher thread · 169ebd90
      Jan Kara 提交于
      Doing iput() from flusher thread (writeback_sb_inodes()) can create problems
      because iput() can do a lot of work - for example truncate the inode if it's
      the last iput on unlinked file. Some filesystems depend on flusher thread
      progressing (e.g. because they need to flush delay allocated blocks to reduce
      allocation uncertainty) and so flusher thread doing truncate creates
      interesting dependencies and possibilities for deadlocks.
      
      We get rid of iput() in flusher thread by using the fact that I_SYNC inode
      flag effectively pins the inode in memory. So if we take care to either hold
      i_lock or have I_SYNC set, we can get away without taking inode reference
      in writeback_sb_inodes().
      
      As a side effect of these changes, we also fix possible use-after-free in
      wb_writeback() because inode_wait_for_writeback() call could try to reacquire
      i_lock on the inode that was already free.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      169ebd90
    • J
      vfs: Rename end_writeback() to clear_inode() · dbd5768f
      Jan Kara 提交于
      After we moved inode_sync_wait() from end_writeback() it doesn't make sense
      to call the function end_writeback() anymore. Rename it to clear_inode()
      which well says what the function really does - set I_CLEAR flag.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      dbd5768f
  11. 03 5月, 2012 1 次提交
  12. 08 4月, 2012 1 次提交
    • E
      userns: Replace the hard to write inode_userns with inode_capable. · 1a48e2ac
      Eric W. Biederman 提交于
      This represents a change in strategy of how to handle user namespaces.
      Instead of tagging everything explicitly with a user namespace and bulking
      up all of the comparisons of uids and gids in the kernel,  all uids and gids
      in use will have a mapping to a flat kuid and kgid spaces respectively.  This
      allows much more of the existing logic to be preserved and in general
      allows for faster code.
      
      In this new and improved world we allow someone to utiliize capabilities
      over an inode if the inodes owner mapps into the capabilities holders user
      namespace and the user has capabilities in their user namespace.  Which
      is simple and efficient.
      
      Moving the fs uid comparisons to be comparisons in a flat kuid space
      follows in later patches, something that is only significant if you
      are using user namespaces.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      1a48e2ac
  13. 06 4月, 2012 1 次提交
  14. 02 4月, 2012 1 次提交
  15. 22 3月, 2012 1 次提交
    • A
      vfs: remove unused superblock helpers · 9d547c35
      Artem Bityutskiy 提交于
      Remove the 'sb_mark_dirty()', 'sb_mark_clean()' and 'sb_is_dirty()'
      helpers which are not used. I introduced them 2 years and the
      intention was to make all file-systems use them in order to be able to
      optimize 'sync_supers()'.  However, Al Viro vetoed my patches at the
      end and asked me to push superblock management down to file-systems
      and get rid of the 's_dirt' flag completely, as well as kill
      'sync_supers()' altogether. Thus, remove the helpers.
      Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9d547c35
  16. 21 3月, 2012 3 次提交
  17. 14 3月, 2012 1 次提交
  18. 05 3月, 2012 1 次提交
    • P
      BUG: headers with BUG/BUG_ON etc. need linux/bug.h · 187f1882
      Paul Gortmaker 提交于
      If a header file is making use of BUG, BUG_ON, BUILD_BUG_ON, or any
      other BUG variant in a static inline (i.e. not in a #define) then
      that header really should be including <linux/bug.h> and not just
      expecting it to be implicitly present.
      
      We can make this change risk-free, since if the files using these
      headers didn't have exposure to linux/bug.h already, they would have
      been causing compile failures/warnings.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      187f1882
  19. 14 2月, 2012 1 次提交
  20. 24 1月, 2012 2 次提交
  21. 13 1月, 2012 2 次提交