1. 12 12月, 2011 1 次提交
  2. 09 12月, 2011 6 次提交
  3. 08 12月, 2011 4 次提交
  4. 07 12月, 2011 3 次提交
    • A
      fix apparmor dereferencing potentially freed dentry, sanitize __d_path() API · 02125a82
      Al Viro 提交于
      __d_path() API is asking for trouble and in case of apparmor d_namespace_path()
      getting just that.  The root cause is that when __d_path() misses the root
      it had been told to look for, it stores the location of the most remote ancestor
      in *root.  Without grabbing references.  Sure, at the moment of call it had
      been pinned down by what we have in *path.  And if we raced with umount -l, we
      could have very well stopped at vfsmount/dentry that got freed as soon as
      prepend_path() dropped vfsmount_lock.
      
      It is safe to compare these pointers with pre-existing (and known to be still
      alive) vfsmount and dentry, as long as all we are asking is "is it the same
      address?".  Dereferencing is not safe and apparmor ended up stepping into
      that.  d_namespace_path() really wants to examine the place where we stopped,
      even if it's not connected to our namespace.  As the result, it looked
      at ->d_sb->s_magic of a dentry that might've been already freed by that point.
      All other callers had been careful enough to avoid that, but it's really
      a bad interface - it invites that kind of trouble.
      
      The fix is fairly straightforward, even though it's bigger than I'd like:
      	* prepend_path() root argument becomes const.
      	* __d_path() is never called with NULL/NULL root.  It was a kludge
      to start with.  Instead, we have an explicit function - d_absolute_root().
      Same as __d_path(), except that it doesn't get root passed and stops where
      it stops.  apparmor and tomoyo are using it.
      	* __d_path() returns NULL on path outside of root.  The main
      caller is show_mountinfo() and that's precisely what we pass root for - to
      skip those outside chroot jail.  Those who don't want that can (and do)
      use d_path().
      	* __d_path() root argument becomes const.  Everyone agrees, I hope.
      	* apparmor does *NOT* try to use __d_path() or any of its variants
      when it sees that path->mnt is an internal vfsmount.  In that case it's
      definitely not mounted anywhere and dentry_path() is exactly what we want
      there.  Handling of sysctl()-triggered weirdness is moved to that place.
      	* if apparmor is asked to do pathname relative to chroot jail
      and __d_path() tells it we it's not in that jail, the sucker just calls
      d_absolute_path() instead.  That's the other remaining caller of __d_path(),
      BTW.
              * seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
      the normal seq_file logics will take care of growing the buffer and redoing
      the call of ->show() just fine).  However, if it gets path not reachable
      from root, it returns SEQ_SKIP.  The only caller adjusted (i.e. stopped
      ignoring the return value as it used to do).
      Reviewed-by: NJohn Johansen <john.johansen@canonical.com>
      ACKed-by: NJohn Johansen <john.johansen@canonical.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      02125a82
    • C
      xfs: fix the logspace waiting algorithm · 9f9c19ec
      Christoph Hellwig 提交于
      Apply the scheme used in log_regrant_write_log_space to wake up any other
      threads waiting for log space before the newly added one to
      log_regrant_write_log_space as well, and factor the code into readable
      helpers.  For each of the queues we have add two helpers:
      
       - one to try to wake up all waiting threads.  This helper will also be
         usable by xfs_log_move_tail once we remove the current opportunistic
         wakeups in it.
       - one to sleep on t_wait until enough log space is available, loosely
         modelled after Linux waitqueues.
       
      And use them to reimplement the guts of log_regrant_write_log_space and
      log_regrant_write_log_space.  These two function now use one and the same
      algorithm for waiting on log space instead of subtly different ones before,
      with an option to completely unify them in the near future.
      
      Also move the filesystem shutdown handling to the common caller given
      that we had to touch it anyway.
      
      Based on hard debugging and an earlier patch from
      Chandra Seetharaman <sekharan@us.ibm.com>.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Tested-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      9f9c19ec
    • C
      xfs: fix nfs export of 64-bit inodes numbers on 32-bit kernels · c29f7d45
      Christoph Hellwig 提交于
      The i_ino field in the VFS inode is of type unsigned long and thus can't
      hold the full 64-bit inode number on 32-bit kernels.  We have the full
      inode number in the XFS inode, so use that one for nfs exports.  Note
      that I've also switched the 32-bit file handles types to it, just to make
      the code more consistent and copy & paste errors less likely to happen.
      Reported-by: NGuoquan Yang <ygq51@hotmail.com>
      Reported-by: NHank Peng <pengxihan@gmail.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c29f7d45
  5. 03 12月, 2011 1 次提交
    • D
      xfs: fix allocation length overflow in xfs_bmapi_write() · a99ebf43
      Dave Chinner 提交于
      When testing the new xfstests --large-fs option that does very large
      file preallocations, this assert was tripped deep in
      xfs_alloc_vextent():
      
      XFS: Assertion failed: args->minlen <= args->maxlen, file: fs/xfs/xfs_alloc.c, line: 2239
      
      The allocation was trying to allocate a zero length extent because
      the lower 32 bits of the allocation length was zero. The remaining
      length of the allocation to be done was an exact multiple of 2^32 -
      the first case I saw was at 496TB remaining to be allocated.
      
      This turns out to be an overflow when converting the allocation
      length (a 64 bit quantity) into the extent length to allocate (a 32
      bit quantity), and it requires the length to be allocated an exact
      multiple of 2^32 blocks to trip the assert.
      
      Fix it by limiting the extent lenth to allocate to MAXEXTLEN.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      a99ebf43
  6. 02 12月, 2011 1 次提交
    • A
      ocfs2: avoid unaligned access to dqc_bitmap · 93925579
      Akinobu Mita 提交于
      The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
      but not 64-bit aligned.  The dqc_bitmap is accessed by ocfs2_set_bit(),
      ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit().  These
      are wrapper macros for ext2_*_bit() which need to take an unsigned long
      aligned address (though some architectures are able to handle unaligned
      address correctly)
      
      So some 64bit architectures may not be able to access the dqc_bitmap
      correctly.
      
      This avoids such unaligned access by using another wrapper functions for
      ext2_*_bit().  The code is taken from fs/ext4/mballoc.c which also need to
      handle unaligned bitmap access.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Acked-by: NJoel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJoel Becker <jlbec@evilplan.org>
      93925579
  7. 01 12月, 2011 10 次提交
  8. 30 11月, 2011 2 次提交
    • C
      xfs: fix attr2 vs large data fork assert · 4c393a60
      Christoph Hellwig 提交于
      With Dmitry fsstress updates I've seen very reproducible crashes in
      xfs_attr_shortform_remove because xfs_attr_shortform_bytesfit claims that
      the attributes would not fit inline into the inode after removing an
      attribute.  It turns out that we were operating on an inode with lots
      of delalloc extents, and thus an if_bytes values for the data fork that
      is larger than biggest possible on-disk storage for it which utterly
      confuses the code near the end of xfs_attr_shortform_bytesfit.
      
      Fix this by always allowing the current attribute fork, like we already
      do for the attr1 format, given that delalloc conversion will take care
      for moving either the data or attribute area out of line if it doesn't
      fit at that point - or making the point moot by merging extents at this
      point.
      
      Also document the function better, and clean up some loose bits.
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4c393a60
    • C
      xfs: force buffer writeback before blocking on the ilock in inode reclaim · 4dd2cb4a
      Christoph Hellwig 提交于
      If we are doing synchronous inode reclaim we block the VM from making
      progress in memory reclaim.  So if we encouter a flush locked inode
      promote it in the delwri list and wake up xfsbufd to write it out now.
      Without this we can get hangs of up to 30 seconds during workloads hitting
      synchronous inode reclaim.
      
      The scheme is copied from what we do for dquot reclaims.
      Reported-by: NSimon Kirby <sim@hostway.ca>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NSimon Kirby <sim@hostway.ca>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      4dd2cb4a
  9. 29 11月, 2011 1 次提交
  10. 25 11月, 2011 1 次提交
    • T
      ext4: fix racy use-after-free in ext4_end_io_dio() · 4c81f045
      Tejun Heo 提交于
      ext4_end_io_dio() queues io_end->work and then clears iocb->private;
      however, io_end->work calls aio_complete() which frees the iocb
      object.  If that slab object gets reallocated, then ext4_end_io_dio()
      can end up clearing someone else's iocb->private, this use-after-free
      can cause a leak of a struct ext4_io_end_t structure.
      
      Detected and tested with slab poisoning.
      
      [ Note: Can also reproduce using 12 fio's against 12 file systems with the
        following configuration file:
      
        [global]
        direct=1
        ioengine=libaio
        iodepth=1
        bs=4k
        ba=4k
        size=128m
      
        [create]
        filename=${TESTDIR}
        rw=write
      
        -- tytso ]
      
      Google-Bug-Id: 5354697
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NKent Overstreet <koverstreet@google.com>
      Tested-by: NKent Overstreet <koverstreet@google.com>
      Cc: stable@kernel.org
      4c81f045
  11. 24 11月, 2011 3 次提交
    • T
      eCryptfs: Extend array bounds for all filename chars · 0f751e64
      Tyler Hicks 提交于
      From mhalcrow's original commit message:
      
          Characters with ASCII values greater than the size of
          filename_rev_map[] are valid filename characters.
          ecryptfs_decode_from_filename() will access kernel memory beyond
          that array, and ecryptfs_parse_tag_70_packet() will then decrypt
          those characters. The attacker, using the FNEK of the crafted file,
          can then re-encrypt the characters to reveal the kernel memory past
          the end of the filename_rev_map[] array. I expect low security
          impact since this array is statically allocated in the text area,
          and the amount of memory past the array that is accessible is
          limited by the largest possible ASCII filename character.
      
      This patch solves the issue reported by mhalcrow but with an
      implementation suggested by Linus to simply extend the length of
      filename_rev_map[] to 256. Characters greater than 0x7A are mapped to
      0x00, which is how invalid characters less than 0x7A were previously
      being handled.
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Reported-by: NMichael Halcrow <mhalcrow@google.com>
      Cc: stable@kernel.org
      0f751e64
    • T
      eCryptfs: Flush file in vma close · 32001d6f
      Tyler Hicks 提交于
      Dirty pages weren't being written back when an mmap'ed eCryptfs file was
      closed before the mapping was unmapped. Since f_ops->flush() is not
      called by the munmap() path, the lower file was simply being released.
      This patch flushes the eCryptfs file in the vm_ops->close() path.
      
      https://launchpad.net/bugs/870326Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Cc: stable@kernel.org [2.6.39+]
      32001d6f
    • T
      eCryptfs: Prevent file create race condition · b59db43a
      Tyler Hicks 提交于
      The file creation path prematurely called d_instantiate() and
      unlock_new_inode() before the eCryptfs inode info was fully
      allocated and initialized and before the eCryptfs metadata was written
      to the lower file.
      
      This could result in race conditions in subsequent file and inode
      operations leading to unexpected error conditions or a null pointer
      dereference while attempting to use the unallocated memory.
      
      https://launchpad.net/bugs/813146Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      Cc: stable@kernel.org
      b59db43a
  12. 23 11月, 2011 1 次提交
  13. 22 11月, 2011 2 次提交
  14. 21 11月, 2011 1 次提交
    • D
      VFS: Log the fact that we've given ELOOP rather than creating a loop · dd179946
      David Howells 提交于
      To prevent an NFS server from being used to create a directory loop in an NFS
      superblock on the client, the following patch was committed:
      
      	commit 18367501
      	Author: Al Viro <viro@zeniv.linux.org.uk>
      	Date:   Tue Jul 12 21:42:24 2011 -0400
      	Subject: fix loop checks in d_materialise_unique()
      
      This causes ELOOP to be reported to anyone trying to access the dentry that
      would otherwise cause the kernel to complete the loop.
      
      However, no indication is given to the caller as to why an operation that ought
      to work doesn't.  The fault is with the kernel, which doesn't want to try and
      solve the problem as it gets horrendously messy if there's another mountpoint
      somewhere in the trees being spliced that can't be moved[*].
      
      [*] The real problem is that we don't handle the excision of a subtree that
      gets moved _out_ of what we can see.  This can happen on the server where a
      directory is merely moved between two other dirs on the same filesystem, but
      where destination dir is not accessible by the client.
      
      So, given the choice to return ELOOP rather than trying to reconfigure the
      dentry tree, we should give the caller some indication of why they aren't being
      allowed to make what should be a legitimate request and log a message.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NSachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      dd179946
  15. 20 11月, 2011 3 次提交
    • J
      Btrfs: sectorsize align offsets in fiemap · 4d479cf0
      Josef Bacik 提交于
      We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere
      else that calls btrfs_get_extent while running xfstests 13 in a loop.  This is
      because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets,
      which will end up adding mappings that are not sectorsize aligned, which will
      cause problems in some cases for subsequent calls to btrfs_get_extent for
      similar areas that are sectorsize aligned.  With this patch I ran xfstests 13 in
      a loop for a couple of hours and didn't hit the problem that I could previously
      hit in at most 20 minutes.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      4d479cf0
    • J
      Btrfs: clear pages dirty for io and set them extent mapped · f7d61dcd
      Josef Bacik 提交于
      When doing the io_ctl helpers to clean up the free space cache stuff I stopped
      using our normal prepare_pages stuff, which means I of course forgot to do
      things like set the pages extent mapped, which will cause us all sorts of
      wonderful propblems.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      f7d61dcd
    • J
      Btrfs: wait on caching if we're loading the free space cache · 291c7d2f
      Josef Bacik 提交于
      We've been hitting panics when running xfstest 13 in a loop for long periods of
      time.  And actually this problem has always existed so we've been hitting these
      things randomly for a while.  Basically what happens is we get a thread coming
      into the allocator and reading the space cache off of disk and adding the
      entries to the free space cache as we go.  Then we get another thread that comes
      in and tries to allocate from that block group.  Since block_group->cached !=
      BTRFS_CACHE_NO it goes ahead and tries to do the allocation.  We do this because
      if we're doing the old slow way of caching we don't want to hold people up and
      wait for everything to finish.  The problem with this is we could end up
      discarding the space cache at some arbitrary point in the future, which means we
      could very well end up allocating space that is either bad, or when the real
      caching happens it could end up thinking the space isn't in use when it really
      is and cause all sorts of other problems.
      
      The solution is to add a new flag to indicate we are loading the free space
      cache from disk, and always try to cache the block group if cache->cached !=
      BTRFS_CACHE_FINISHED.  That way if we are loading the space cache anybody else
      who tries to allocate from the block group will have to wait until it's finished
      to make sure it completes successfully.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      291c7d2f