1. 02 Dec, 2011 1 commit
    • A
      ocfs2: avoid unaligned access to dqc_bitmap · 93925579
      Akinobu Mita committed
      The dqc_bitmap field of struct ocfs2_local_disk_chunk is 32-bit aligned,
      but not 64-bit aligned.  The dqc_bitmap is accessed by ocfs2_set_bit(),
      ocfs2_clear_bit(), ocfs2_test_bit(), or ocfs2_find_next_zero_bit().  These
      are wrapper macros for ext2_*_bit(), which need to take an unsigned long
      aligned address (though some architectures are able to handle unaligned
      addresses correctly).
      
      So some 64-bit architectures may not be able to access the dqc_bitmap
      correctly.
      
      This avoids such unaligned access by using other wrapper functions for
      ext2_*_bit().  The code is taken from fs/ext4/mballoc.c, which also needs to
      handle unaligned bitmap access.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
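The address correction described above can be sketched in user-space C. This is a minimal illustration, not the kernel code: the helper names (`correct_addr_and_bit`, `set_bit_unaligned`, `test_bit_unaligned`) are hypothetical, and the bit operations are done byte-wise with little-endian bit numbering so the sketch stays portable.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Round addr down to an unsigned long boundary and move the bit index
 * forward by the number of bits that were skipped, so that (addr, bit)
 * still names the same bit. Hypothetical helper for illustration.
 */
static void *correct_addr_and_bit(size_t *bit, void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    uintptr_t misalign = a & (sizeof(unsigned long) - 1);

    *bit += misalign * 8;
    return (void *)(a - misalign);
}

/* Set a bit in a bitmap whose start may not be long-aligned. */
static void set_bit_unaligned(size_t bit, void *addr)
{
    unsigned char *base = correct_addr_and_bit(&bit, addr);

    base[bit >> 3] |= (unsigned char)(1u << (bit & 7));
}

/* Test a bit in a bitmap whose start may not be long-aligned. */
static int test_bit_unaligned(size_t bit, void *addr)
{
    unsigned char *base = correct_addr_and_bit(&bit, addr);

    return (base[bit >> 3] >> (bit & 7)) & 1;
}
```

The caller keeps passing the original, possibly unaligned address; only the wrapper sees the aligned base and the shifted bit index.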
  2. 01 Dec, 2011 10 commits
  3. 25 Nov, 2011 1 commit
    • T
      ext4: fix racy use-after-free in ext4_end_io_dio() · 4c81f045
      Tejun Heo committed
      ext4_end_io_dio() queues io_end->work and then clears iocb->private;
      however, io_end->work calls aio_complete() which frees the iocb
      object.  If that slab object gets reallocated, then ext4_end_io_dio()
      can end up clearing someone else's iocb->private.  This use-after-free
      can cause a leak of a struct ext4_io_end_t structure.
      
      Detected and tested with slab poisoning.
      
      [ Note: Can also reproduce using 12 fio's against 12 file systems with the
        following configuration file:
      
        [global]
        direct=1
        ioengine=libaio
        iodepth=1
        bs=4k
        ba=4k
        size=128m
      
        [create]
        filename=${TESTDIR}
        rw=write
      
        -- tytso ]
      
      Google-Bug-Id: 5354697
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Reported-by: Kent Overstreet <koverstreet@google.com>
      Tested-by: Kent Overstreet <koverstreet@google.com>
      Cc: stable@kernel.org
  4. 24 Nov, 2011 3 commits
    • T
      eCryptfs: Extend array bounds for all filename chars · 0f751e64
      Tyler Hicks committed
      From mhalcrow's original commit message:
      
          Characters with ASCII values greater than the size of
          filename_rev_map[] are valid filename characters.
          ecryptfs_decode_from_filename() will access kernel memory beyond
          that array, and ecryptfs_parse_tag_70_packet() will then decrypt
          those characters. The attacker, using the FNEK of the crafted file,
          can then re-encrypt the characters to reveal the kernel memory past
          the end of the filename_rev_map[] array. I expect low security
          impact since this array is statically allocated in the text area,
          and the amount of memory past the array that is accessible is
          limited by the largest possible ASCII filename character.
      
      This patch solves the issue reported by mhalcrow but with an
      implementation suggested by Linus to simply extend the length of
      filename_rev_map[] to 256. Characters greater than 0x7A are mapped to
      0x00, which is how invalid characters less than 0x7A were previously
      being handled.
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Reported-by: Michael Halcrow <mhalcrow@google.com>
      Cc: stable@kernel.org
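The idea of sizing the table to 256 entries can be sketched as follows. This is a minimal illustration under assumptions: the alphabet and the names `demo_alphabet`, `build_rev_map` and `rev_lookup` are hypothetical and do not reproduce the actual eCryptfs table. The point is only that any byte value becomes a valid index, and that bytes outside the alphabet map to 0x00.

```c
#include <string.h>

/* Illustrative alphabet; its highest character is 'z' (0x7A). */
static const char demo_alphabet[] =
    "-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/* Fill a 256-entry reverse map; bytes outside the alphabet stay 0x00. */
static void build_rev_map(unsigned char map[256])
{
    size_t i;

    memset(map, 0, 256);
    for (i = 0; demo_alphabet[i] != '\0'; i++)
        map[(unsigned char)demo_alphabet[i]] = (unsigned char)i;
}

/*
 * Look up one filename byte. The cast to unsigned char keeps the index
 * in 0..255, so no access can fall past the end of the table.
 */
static unsigned char rev_lookup(const unsigned char map[256], char c)
{
    return map[(unsigned char)c];
}
```

With a full 256-entry table, the decoder needs no separate range check for characters above 0x7A.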
    • T
      eCryptfs: Flush file in vma close · 32001d6f
      Tyler Hicks committed
      Dirty pages weren't being written back when an mmap'ed eCryptfs file was
      closed before the mapping was unmapped. Since f_ops->flush() is not
      called by the munmap() path, the lower file was simply being released.
      This patch flushes the eCryptfs file in the vm_ops->close() path.
      
      https://launchpad.net/bugs/870326
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Cc: stable@kernel.org [2.6.39+]
    • T
      eCryptfs: Prevent file create race condition · b59db43a
      Tyler Hicks committed
      The file creation path prematurely called d_instantiate() and
      unlock_new_inode() before the eCryptfs inode info was fully
      allocated and initialized and before the eCryptfs metadata was written
      to the lower file.
      
      This could result in race conditions in subsequent file and inode
      operations leading to unexpected error conditions or a null pointer
      dereference while attempting to use the unallocated memory.
      
      https://launchpad.net/bugs/813146
      Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
      Cc: stable@kernel.org
  5. 23 Nov, 2011 1 commit
  6. 22 Nov, 2011 2 commits
  7. 21 Nov, 2011 1 commit
    • D
      VFS: Log the fact that we've given ELOOP rather than creating a loop · dd179946
      David Howells committed
      To prevent an NFS server from being used to create a directory loop in an NFS
      superblock on the client, the following patch was committed:
      
      	commit 18367501
      	Author: Al Viro <viro@zeniv.linux.org.uk>
      	Date:   Tue Jul 12 21:42:24 2011 -0400
      	Subject: fix loop checks in d_materialise_unique()
      
      This causes ELOOP to be reported to anyone trying to access the dentry that
      would otherwise cause the kernel to complete the loop.
      
      However, no indication is given to the caller as to why an operation that ought
      to work doesn't.  The fault is with the kernel, which doesn't want to try and
      solve the problem as it gets horrendously messy if there's another mountpoint
      somewhere in the trees being spliced that can't be moved[*].
      
      [*] The real problem is that we don't handle the excision of a subtree that
      gets moved _out_ of what we can see.  This can happen on the server where a
      directory is merely moved between two other dirs on the same filesystem, but
      where the destination dir is not accessible by the client.
      
      So, given the choice to return ELOOP rather than trying to reconfigure the
      dentry tree, we should give the caller some indication of why they aren't being
      allowed to make what should be a legitimate request and log a message.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Sachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 20 Nov, 2011 12 commits
    • J
      Btrfs: sectorsize align offsets in fiemap · 4d479cf0
      Josef Bacik committed
      We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere
      else that calls btrfs_get_extent while running xfstests 13 in a loop.  This is
      because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets,
      which will end up adding mappings that are not sectorsize aligned, which will
      cause problems in some cases for subsequent calls to btrfs_get_extent for
      similar areas that are sectorsize aligned.  With this patch I ran xfstests 13 in
      a loop for a couple of hours and didn't hit the problem that I could previously
      hit in at most 20 minutes.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
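Sector alignment of a byte range can be sketched as below. This is a generic illustration, not the change made by the patch: the helper names are hypothetical, and the choice to round the start down and the end up (so the aligned range still covers the request) is an assumption made for the example.

```c
#include <stdint.h>

/* Round x down to a multiple of align; align must be a power of two. */
static uint64_t align_down(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Widen [*start, *start + *len) so both ends sit on sector boundaries. */
static void align_range(uint64_t *start, uint64_t *len, uint64_t sectorsize)
{
    uint64_t end = align_up(*start + *len, sectorsize);

    *start = align_down(*start, sectorsize);
    *len = end - *start;
}
```

A range that is already aligned passes through unchanged, so the helper can be applied unconditionally before a lookup.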
    • J
      Btrfs: clear pages dirty for io and set them extent mapped · f7d61dcd
      Josef Bacik committed
      When doing the io_ctl helpers to clean up the free space cache stuff I stopped
      using our normal prepare_pages stuff, which means I of course forgot to do
      things like set the pages extent mapped, which will cause us all sorts of
      wonderful problems.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: wait on caching if we're loading the free space cache · 291c7d2f
      Josef Bacik committed
      We've been hitting panics when running xfstest 13 in a loop for long periods of
      time.  And actually this problem has always existed so we've been hitting these
      things randomly for a while.  Basically what happens is we get a thread coming
      into the allocator and reading the space cache off of disk and adding the
      entries to the free space cache as we go.  Then we get another thread that comes
      in and tries to allocate from that block group.  Since block_group->cached !=
      BTRFS_CACHE_NO it goes ahead and tries to do the allocation.  We do this because
      if we're doing the old slow way of caching we don't want to hold people up and
      wait for everything to finish.  The problem with this is we could end up
      discarding the space cache at some arbitrary point in the future, which means we
      could very well end up allocating space that is either bad, or when the real
      caching happens it could end up thinking the space isn't in use when it really
      is and cause all sorts of other problems.
      
      The solution is to add a new flag to indicate we are loading the free space
      cache from disk, and always try to cache the block group if cache->cached !=
      BTRFS_CACHE_FINISHED.  That way if we are loading the space cache anybody else
      who tries to allocate from the block group will have to wait until it's finished
      to make sure it completes successfully.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • A
      Btrfs: prefix resize related printks with btrfs: · 5bb14682
      Arnd Hannemann committed
      For the user it is confusing to find something like:
      [10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
      in the kernel log, because it doesn't point directly to btrfs.
      
      This patch prefixes those messages with "btrfs:" like other btrfs
      related printks.
      Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • D
      btrfs: fix stat blocks accounting · fadc0d8b
      David Sterba committed
      Round inode bytes and delalloc bytes up to real blocksize before
      converting to sector size. Otherwise, e.g., files smaller than 512 bytes
      are reported with zero blocks due to incorrect rounding.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
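The rounding issue can be shown with a small calculation. This is a sketch with hypothetical helper names, assuming a power-of-two block size and the usual 512-byte unit for the stat block count.

```c
#include <stdint.h>

/* Naive conversion: shifting right by 9 truncates anything under 512 bytes. */
static uint64_t blocks_truncated(uint64_t bytes)
{
    return bytes >> 9;
}

/*
 * Round up to the filesystem block size first, then convert to
 * 512-byte units. blocksize must be a power of two.
 */
static uint64_t blocks_rounded(uint64_t bytes, uint64_t blocksize)
{
    uint64_t aligned = (bytes + blocksize - 1) & ~(blocksize - 1);

    return aligned >> 9;
}
```

For a 100-byte file on a 4096-byte block size, the naive conversion reports 0 blocks, while rounding first reports 8.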
    • L
      Btrfs: avoid unnecessary bitmap search for cluster setup · 52621cb6
      Li Zefan committed
      setup_cluster_no_bitmap() searches all the extents and bitmaps starting
      from offset. Therefore if it returns -ENOSPC, all the bitmaps starting
      from offset are in the bitmaps list, so it's sufficient to search from
      this list in setup_cluster_bitmap().
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • L
      Btrfs: fix to search one more bitmap for cluster setup · 0f0fbf1d
      Li Zefan committed
      Suppose there are two bitmaps [0, 256], [256, 512] and one extent
      [100, 120] in the free space cache, and we want to setup a cluster
      with offset=100, bytes=50.
      
      In this case, there will be only one bitmap [256, 512] in the temporary
      bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256].
      
      The cause is that the list is constructed in setup_cluster_no_bitmap(),
      and only bitmaps with bitmap_entry->offset >= offset will be added
      into the list, and the very bitmap that covers offset has
      bitmap_entry->offset <= offset.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
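The example from the message can be checked with a small predicate. This is a sketch only: `struct demo_bitmap` and the function names are hypothetical, and each bitmap is modelled as a half-open range [offset, offset + span).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a free-space bitmap entry. */
struct demo_bitmap {
    uint64_t offset; /* first byte described by the bitmap */
    uint64_t span;   /* number of bytes it describes */
};

/* The condition described as the cause: only entries at or after offset. */
static bool starts_at_or_after(const struct demo_bitmap *b, uint64_t offset)
{
    return b->offset >= offset;
}

/* True when the bitmap's range contains offset. */
static bool covers(const struct demo_bitmap *b, uint64_t offset)
{
    return b->offset <= offset && offset < b->offset + b->span;
}

/* Search a bitmap if it covers offset or begins after it. */
static bool should_search(const struct demo_bitmap *b, uint64_t offset)
{
    return covers(b, offset) || starts_at_or_after(b, offset);
}
```

With bitmaps [0, 256] and [256, 512] and offset=100, the first predicate alone skips the bitmap at 0 even though it covers the offset; the combined check includes it.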
    • J
      btrfs: mirror_num should be int, not u64 · 32240a91
      Jan Schmidt committed
      My previous patch introduced some u64 for failed_mirror variables; this one
      makes it consistent again.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • J
      btrfs: Fix up 32/64-bit compatibility for new ioctls · 745c4d8e
      Jeff Mahoney committed
      This patch casts to unsigned long before casting to a pointer and fixes
      the following warnings:
      fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
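The two-step cast can be sketched in user-space C. The kernel code casts through unsigned long; this illustration uses `uintptr_t`, which plays the same pointer-sized role in portable C. The helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Convert a 64-bit integer field to a pointer. Going through a
 * pointer-sized integer first avoids the "cast to pointer from integer
 * of different size" warning on 32-bit builds.
 */
static void *u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}

/* Convert a pointer to a 64-bit integer field, again via uintptr_t. */
static uint64_t ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```

A pointer survives the round trip on both 32-bit and 64-bit targets, because the 64-bit field is at least as wide as the pointer.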
    • C
      Btrfs: fix barrier flushes · 387125fc
      Chris Mason committed
      When btrfs is writing the super blocks, it sends barrier flushes to make
      sure writeback caching drives get all the metadata on disk in the
      right order.
      
      But, we have two bugs in the way these are sent down.  When doing
      full commits (not via the tree log), we are sending the barrier down
      before the last super when it should be going down before the first.
      
      In multi-device setups, we should be waiting for the barriers to
      complete on all devices before writing any of the supers.
      
      Both of these bugs can cause corruptions on power failures.  We fix it
      with some new code to send down empty barriers to all devices before
      writing the first super.
      
      Alexandre Oliva found the multi-device bug.  Arne Jansen did the async
      barrier loop.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
    • A
      minixfs: kill manual hweight(), simplify · f1fd306a
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • J
      fs/minix: Verify bitmap block counts before mounting · 016e8d44
      Josh Boyer committed
      Newer versions of MINIX can create filesystems that allocate an extra
      bitmap block.  Mounting of this succeeds, but doing a statfs call will
      result in an oops in count_free because of a negative number being used
      for the bh index.
      
      Avoid this by verifying the number of allocated blocks at mount time,
      erroring out if there are not enough and making statfs ignore the extras
      if there are too many.
      
      This fixes https://bugzilla.kernel.org/show_bug.cgi?id=18792
      Signed-off-by: Josh Boyer <jwboyer@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
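The mount-time check can be sketched as a comparison between the bitmap blocks a filesystem needs and the number it claims to have. This is an illustration under assumptions: the function names and the error convention (0 for success, -1 for failure) are hypothetical and not taken from the minix driver.

```c
#include <stddef.h>

/* Bitmap blocks needed to hold `bits` bits, one bit per tracked object. */
static size_t bitmap_blocks_needed(size_t bits, size_t blocksize)
{
    size_t bits_per_block = blocksize * 8;

    return (bits + bits_per_block - 1) / bits_per_block;
}

/* Mount-time check: fail when fewer blocks are allocated than needed. */
static int check_bitmap_blocks(size_t allocated, size_t bits, size_t blocksize)
{
    return allocated >= bitmap_blocks_needed(bits, blocksize) ? 0 : -1;
}

/* Counting code scans only the needed blocks and ignores any extras. */
static size_t blocks_to_scan(size_t allocated, size_t bits, size_t blocksize)
{
    size_t needed = bitmap_blocks_needed(bits, blocksize);

    return allocated < needed ? allocated : needed;
}
```

Bounding the scan by the needed count keeps the block index from running past the bitmap data that actually describes the filesystem.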
  9. 18 Nov, 2011 1 commit
    • K
      pstore: pass allocated memory region back to caller · f6f82851
      Kees Cook committed
      The buf_lock cannot be held while populating the inodes, so make the backend
      pass forward an allocated and filled buffer instead. This solves the following
      backtrace. The effect is that "buf" is only ever used to notify the backends
      that something was written to it, and shouldn't be used in the read path.
      
      To replace the buf_lock during the read path, isolate the open/read/close
      loop with a separate mutex to maintain serialized access to the backend.
      
      Note that it is up to the pstore backend to cope if the (*write)() path is
      called in the middle of the read path.
      
      [   59.691019] BUG: sleeping function called from invalid context at .../mm/slub.c:847
      [   59.691019] in_atomic(): 0, irqs_disabled(): 1, pid: 1819, name: mount
      [   59.691019] Pid: 1819, comm: mount Not tainted 3.0.8 #1
      [   59.691019] Call Trace:
      [   59.691019]  [<810252d5>] __might_sleep+0xc3/0xca
      [   59.691019]  [<810a26e6>] kmem_cache_alloc+0x32/0xf3
      [   59.691019]  [<810b53ac>] ? __d_lookup_rcu+0x6f/0xf4
      [   59.691019]  [<810b68b1>] alloc_inode+0x2a/0x64
      [   59.691019]  [<810b6903>] new_inode+0x18/0x43
      [   59.691019]  [<81142447>] pstore_get_inode.isra.1+0x11/0x98
      [   59.691019]  [<81142623>] pstore_mkfile+0xae/0x26f
      [   59.691019]  [<810a2a66>] ? kmem_cache_free+0x19/0xb1
      [   59.691019]  [<8116c821>] ? ida_get_new_above+0x140/0x158
      [   59.691019]  [<811708ea>] ? __init_rwsem+0x1e/0x2c
      [   59.691019]  [<810b67e8>] ? inode_init_always+0x111/0x1b0
      [   59.691019]  [<8102127e>] ? should_resched+0xd/0x27
      [   59.691019]  [<8137977f>] ? _cond_resched+0xd/0x21
      [   59.691019]  [<81142abf>] pstore_get_records+0x52/0xa7
      [   59.691019]  [<8114254b>] pstore_fill_super+0x7d/0x91
      [   59.691019]  [<810a7ff5>] mount_single+0x46/0x82
      [   59.691019]  [<8114231a>] pstore_mount+0x15/0x17
      [   59.691019]  [<811424ce>] ? pstore_get_inode.isra.1+0x98/0x98
      [   59.691019]  [<810a8199>] mount_fs+0x5a/0x12d
      [   59.691019]  [<810b9174>] ? alloc_vfsmnt+0xa4/0x14a
      [   59.691019]  [<810b9474>] vfs_kern_mount+0x4f/0x7d
      [   59.691019]  [<810b9d7e>] do_kern_mount+0x34/0xb2
      [   59.691019]  [<810bb15f>] do_mount+0x5fc/0x64a
      [   59.691019]  [<810912fb>] ? strndup_user+0x2e/0x3f
      [   59.691019]  [<810bb3cb>] sys_mount+0x66/0x99
      [   59.691019]  [<8137b537>] sysenter_do_call+0x12/0x26
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  10. 17 Nov, 2011 8 commits
    • J
      ocfs2: Use filemap_write_and_wait() instead of write_inode_now() · 249ec93c
      Jan Kara committed
      Since ocfs2 has no ->write_inode method, there's no point in calling
      write_inode_now() from ocfs2_cleanup_delete_inode().  Use
      filemap_write_and_wait() instead. This helps us clean up the inode writing
      interfaces...
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
    • M
      ocfs2: honor O_(D)SYNC flag in fallocate · df295d4a
      Mark Fasheh committed
      We need to sync the transaction which updates i_size if the file is marked
      as needing sync semantics.
      Signed-off-by: Mark Fasheh <mfasheh@suse.de>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
    • X
      ocfs2: Add a missing journal credit in ocfs2_link_credits() -v2 · 0393afea
      Xiaowei.Hu committed
      With indexed_dir enabled, ocfs2 maintains a list of dirblocks having
      space.
      
      The credit calculation in ocfs2_link_credits() did not correctly account
      for adding an entry that exactly fills a dirblock.  Filling the dirblock
      triggers its removal from the list, which is done by changing the pointer
      in the previous block in the list, and the credit calculation did not
      account for that previous block.
      
      To expose, do:
      
      mkfs.ocfs2 -b 512 -M local /dev/sdX
      mount /dev/sdX /ocfs2
      mkdir /ocfs2/linkdir
      touch /ocfs2/linkdir/file1
      for i in `seq 1 29` ; do link /ocfs2/linkdir/file1 \
      /ocfs2/linkdir/linklinklinklinklinklink$i; done
      rm -f /ocfs2/linkdir/linklinklinklinklinklink10
      sleep 8
      link /ocfs2/linkdir/file1 \
      /ocfs2/linkdir/linklinklinklinklinklinkaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
      
      Note:
      The link names have been crafted for a 512 byte blocksize. Reproducing
      with a larger blocksize will require longer (or more) links. The sleep
      is important. We want jbd2 to commit the transaction so that the missing
      block does not piggy back on account of the previous transaction.
      
      Signed-off-by: XiaoweiHu <xiaowei.hu at oracle.com>
      Reviewed-by: WengangWang <wen.gang.wang at oracle.com>
      Reviewed-by: Sunil.Mushran <sunil.mushran at oracle.com>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
    • D
      ocfs2: send correct UUID to cleancache initialization · e41d33af
      Dan Magenheimer committed
      ocfs2: Fix cleancache initialization call to correctly pass uuid
      
      As reported by Steven Whitehouse in https://lkml.org/lkml/2011/5/27/221
      the ocfs2 volume UUID is incorrectly passed to cleancache.
      As a result, shared-ephemeral tmem pools will not actually
      be created; instead they will be private (unshared) which
      misses out on a major benefit of tmem.
      Reported-by: Steven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
    • W
      ocfs2: Commit transactions in error cases -v2 · b8a0ae57
      Wengang Wang committed
      Three error paths were found in which journal transactions are neither
      committed nor aborted. Handle these cases by committing the transactions.
      Otherwise a journal handle is left behind, and the next
      ocfs2_start_trans() in the same process context gets the wrong credits.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
    • W
      ocfs2: make direntry invalid when deleting it · 82985248
      Wengang Wang committed
      When we delete a direntry from a directory, if it is the first in a block we
      invalidate it by setting inode to 0; otherwise, we merge the deleted one into
      the prior, contiguous direntry. And we don't truncate directories.
      
      There is a problem in the latter case, since inode is not set to 0.
      The problem shows up when the caller passes a file position as a parameter to
      ocfs2_dir_foreach_blk(). If the position happens to point to a stale direntry
      (not the first in its block, and deleted between calls to
      ocfs2_dir_foreach_blk()), we cannot recognize its staleness, so we wrongly
      treat it as a live one.
      
      The fix is to set inode to 0 in both cases, indicating that the direntry is
      stale. This does not introduce additional I/O.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
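The two deletion cases can be sketched with a toy directory entry. This is a simplified illustration, not the ocfs2 on-disk format: `struct demo_dirent` and `demo_delete_entry` are hypothetical, and an entry is reduced to an inode number and a record length.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy directory entry: inode 0 marks the entry as unused or stale. */
struct demo_dirent {
    uint64_t inode;
    uint16_t rec_len;
};

/*
 * Delete `de`. When it has a predecessor in the same block, fold its
 * record length into that predecessor. In both cases clear the inode,
 * so a reader that lands on the old entry can tell it is stale.
 */
static void demo_delete_entry(struct demo_dirent *prev, struct demo_dirent *de)
{
    if (prev != NULL)
        prev->rec_len = (uint16_t)(prev->rec_len + de->rec_len);
    de->inode = 0;
}
```

Clearing the inode in the merged case touches only the entry already being modified, which is consistent with the note that no additional I/O is introduced.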
    • J
      fs/ocfs2/dlm/dlmlock.c: free kmem_cache_zalloc'd data using kmem_cache_free · fc9f8994
      Julia Lawall committed
      Memory allocated using kmem_cache_zalloc should be freed using
      kmem_cache_free, not kfree.
      
      The semantic patch that fixes this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression x,e,e1,e2;
      @@
      
      x = kmem_cache_zalloc(e1,e2)
      ... when != x = e
      ?-kfree(x)
      +kmem_cache_free(e1,x)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
    • A
      new helper: mount_subtree() · ea441d11
      Al Viro committed
      takes vfsmount and relative path, does lookup within that vfsmount
      (possibly triggering automounts) and returns the result as root
      of subtree suitable for return by ->mount() (i.e. a reference to
      dentry and an active reference to its superblock grabbed, superblock
      locked exclusive).
      
      btrfs and nfs switched to it instead of open-coding the sucker.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>