1. 22 June 2012, 5 commits
  2. 21 June 2012, 9 commits
    • Btrfs: delay iput with async extents · cb77fcd8
      Committed by Josef Bacik
      There is some concern that these iput()'s could be the final iputs and could
      induce lockups on people waiting on writeback.  This would happen in the
      rare case that we don't create ordered extents because of an error, but it
      is theoretically possible and we already have a mechanism to deal with this
      so just make them delayed iputs to negate any worry.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      cb77fcd8
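      For context, the change replaces the direct iput() in this error path with
      btrfs' existing delayed-iput mechanism.  A minimal before/after sketch (the
      surrounding error handling is omitted and only illustrative):

        /* before: this may be the final reference, and a final iput() here
         * can stall tasks that are waiting on writeback */
        iput(inode);

        /* after: defer the drop so the final iput() happens later, outside
         * this writeback path */
        btrfs_add_delayed_iput(inode);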
    • Btrfs: add a missing spin_lock · e18fca73
      Committed by Josef Bacik
      When fixing up the locking in the delayed ref destruction work I accidentally
      broke the locking myself ;(.  Add back a spin_lock that should be there and
      we are now all set.  Thanks,
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      e18fca73
    • Btrfs: don't assume to be on the correct extent in add_all_parents · 69bca40d
      Committed by Alexander Block
      add_all_parents assumed that the path is already at a correct extent data
      item, which may not be true in the case of data extents that were partly
      rewritten and split.
      
      We need to check whether we're on a matching extent for every item, not
      only for the first one.  The loop is changed to do this now.
      
      This patch also fixes a bug introduced with commit 3b127fd8 ("Btrfs:
      remove obsolete btrfs_next_leaf call from __resolve_indirect_ref").
      The removal of next_leaf sometimes resulted in slot==nritems when the
      case described above happens, thus producing invalid values
      (e.g. wanted_objectid) in add_all_parents and leading to missed backrefs
      or even crashes.
      Signed-off-by: Alexander Block <ablock84@googlemail.com>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      69bca40d
    • Btrfs: introduce btrfs_next_old_item · 1c8f52a5
      Committed by Alexander Block
      We introduce btrfs_next_old_item that uses btrfs_next_old_leaf instead
      of btrfs_next_leaf.
      
      btrfs_next_item is also changed to simply call btrfs_next_old_item with
      time_seq being 0.
      Signed-off-by: Alexander Block <ablock84@googlemail.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      1c8f52a5
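      A sketch of the wrapper pair described above (simplified; the real helpers
      live in btrfs' ctree header and may differ slightly):

        static inline int btrfs_next_old_item(struct btrfs_root *root,
                                              struct btrfs_path *p, u64 time_seq)
        {
                ++p->slots[0];
                if (p->slots[0] >= btrfs_header_nritems(p->nodes[0]))
                        return btrfs_next_old_leaf(root, p, time_seq);
                return 0;
        }

        static inline int btrfs_next_item(struct btrfs_root *root,
                                          struct btrfs_path *p)
        {
                /* the plain variant is just the time_seq == 0 case */
                return btrfs_next_old_item(root, p, 0);
        }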
    • mm: correctly synchronize rss-counters at exit/exec · 4fe7efdb
      Committed by Konstantin Khlebnikov
      do_exit() and exec_mmap() call sync_mm_rss() before mm_release() does
      put_user(clear_child_tid) which can update task->rss_stat and thus make
      mm->rss_stat inconsistent.  This triggers the "BUG:" printk in check_mm().
      
      Let's fix this bug in the safest way, and optimize/cleanup this later.
      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4fe7efdb
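      A rough sketch of the ordering that triggers the check_mm() complaint
      (call sites simplified from do_exit()/exec_mmap()):

        sync_mm_rss(mm);      /* task->rss_stat flushed into mm->rss_stat     */
        mm_release(tsk, mm);  /* put_user(0, tsk->clear_child_tid) can fault
                               * and bump task->rss_stat again ...            */
        mmput(mm);            /* ... so check_mm() later sees a stale delta   */

      The straightforward fix is to synchronize the counters again after that
      put_user() can no longer dirty them.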
    • nilfs2: ensure proper cache clearing for gc-inodes · fbb24a3a
      Committed by Ryusuke Konishi
      A gc-inode is a pseudo inode used to buffer the blocks to be moved by
      garbage collection.
      
      Block caches of gc-inodes must be cleared every time a garbage collection
      function (nilfs_clean_segments) completes.  Otherwise, stale blocks
      buffered in the caches may be wrongly reused in successive calls of the GC
      function.
      
      For user files, this is not a problem because their gc-inodes are
      distinguished by a checkpoint number as well as an inode number.  They
      never buffer different blocks if either an inode number, a checkpoint
      number, or a block offset differs.
      
      However, gc-inodes of the sufile, cpfile and DAT file can store different
      data for the same block offset.  Thus, the nilfs_clean_segments function
      can move an incorrect block for these metadata files if an old block is
      cached.  I found this was actually causing metadata corruption in nilfs.
      
      This fixes the issue by ensuring that the caches of gc-inodes are cleared,
      and it resolves reported GC problems including checkpoint file corruption,
      b-tree corruption, and the following warning during GC.
      
        nilfs_palloc_freev: entry number 307234 already freed.
        ...
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>	[2.6.37+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fbb24a3a
    • xfs: fix debug_object WARN at xfs_alloc_vextent() · 3b876c8f
      Committed by Jeff Liu
      Fengguang reports:
      
      [  780.529603] XFS (vdd): Ending clean mount
      [  781.454590] ODEBUG: object is on stack, but not annotated
      [  781.455433] ------------[ cut here ]------------
      [  781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1()
      [  781.455433] Hardware name: Bochs
      [  781.455433] Modules linked in:
      [  781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51
      [  781.455433] Call Trace:
      [  781.455433]  [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b
      [  781.455433]  [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c
      [  781.455433]  [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1
      [  781.455433]  [<ffffffff81491c65>] debug_object_init+0x14/0x16
      [  781.455433]  [<ffffffff8108842a>] __init_work+0x20/0x22
      [  781.455433]  [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5
      
      Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK.
      Reported-by: Wu Fengguang <wfg@linux.intel.com>
      Signed-off-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      3b876c8f
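      The pattern in question, sketched with placeholder names (my_worker and
      my_wq are illustrative, not the actual xfs symbols):

        struct work_struct work;              /* lives on this stack frame */

        INIT_WORK_ONSTACK(&work, my_worker);  /* annotates the on-stack object
                                               * for CONFIG_DEBUG_OBJECTS_WORK;
                                               * plain INIT_WORK() triggers the
                                               * warning quoted above */
        queue_work(my_wq, &work);
        flush_work(&work);                    /* must finish before the stack
                                               * frame goes away */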
    • xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2) · 66f93113
      Committed by Alain Renaud
      On filesystems with a block size smaller than PAGE_SIZE we currently have
      a problem with unwritten extents.  If we have a multi-block page for
      which an unwritten extent has been allocated, and only some of the
      buffers have been written to, and they are not contiguous, we can expose
      stale data from disk in the blocks between the writes after extent
      conversion.
      
      Example of a page with unwritten and real data.
      buffer  content
      0       empty  b_state = 0
      1       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
      2       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
      3       empty  b_state = 0
      4       empty  b_state = 0
      5       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
      6       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
      7       empty  b_state = 0
      
      Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7
      empty.  Currently buffers 1, 2, 5, and 6 are added to a single ioend,
      and when IO has completed, extent conversion creates a real extent from
      block 1 through block 6, leaving 0 and 7 unwritten.  However buffers 3
      and 4 were not written to disk, so stale data is exposed from those
      blocks on a subsequent read.
      
      Fix this by setting iomap_valid = 0 when we find a buffer that is not
      Uptodate.  This ensures that buffers 5 and 6 are not added to the same
      ioend as buffers 1 and 2.  Later these blocks will be converted into two
      separate real extents, leaving the blocks in between unwritten.
      Signed-off-by: Alain Renaud <arenaud@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      66f93113
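      Roughly, in the writepage buffer walk (variable names approximate):

        if (!buffer_uptodate(bh)) {
                /* a gap in the written range: invalidate the cached mapping
                 * so the next written buffer starts a new ioend instead of
                 * being glued to the previous one across the gap */
                imap_valid = 0;
                continue;
        }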
    • NFS: Force the legacy idmapper to be single threaded · b1027439
      Committed by Bryan Schumaker
      It was initially coded under the assumption that there would only be one
      request at a time, so use a lock to enforce this requirement.
      Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
      CC: stable@vger.kernel.org [3.4+]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      b1027439
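      The serialization amounts to something like the sketch below; idmap_mutex
      and the upcall helper are placeholder names for illustration:

        static DEFINE_MUTEX(idmap_mutex);

        /* the legacy idmapper was written assuming one request at a time,
         * so let only one name<->id lookup through at once */
        mutex_lock(&idmap_mutex);
        ret = legacy_idmap_upcall(idmap, &msg);   /* hypothetical helper */
        mutex_unlock(&idmap_mutex);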
  3. 20 June 2012, 3 commits
  4. 18 June 2012, 3 commits
  5. 16 June 2012, 3 commits
  6. 15 June 2012, 17 commits
    • Btrfs: destroy the items of the delayed inodes in error handling routine · 67cde344
      Committed by Miao Xie
      The items of the delayed inodes were not being freed in the error handling
      routine; this patch fixes that.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      67cde344
    • Btrfs: make sure that we've made everything in pinned tree clean · ed0eaa14
      Committed by Liu Bo
      Since we have two trees for recording pinned extents, we need to go through
      both of them to make sure that everything has been cleaned up.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      ed0eaa14
    • Btrfs: avoid memory leak of extent state in error handling routine · 6e841e32
      Committed by Liu Bo
      We've forgotten to clear extent states in the pinned tree, which results in
      a space counter mismatch and a memory leak:
      
      WARNING: at fs/btrfs/extent-tree.c:7537 btrfs_free_block_groups+0x1f3/0x2e0 [btrfs]()
      ...
      space_info 2 has 8380416 free, is not full
      space_info total=12582912, used=4096, pinned=4096, reserved=0, may_use=0, readonly=4194304
      btrfs state leak: start 29364224 end 29376511 state 1 in tree ffff880075f20090 refs 1
      ...
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      6e841e32
    • Btrfs: do not resize a seeding device · 4e42ae1b
      Committed by Liu Bo
      Seeding devices are not supposed to change any more.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      4e42ae1b
    • Btrfs: fix missing inherited flag in rename · bc178237
      Committed by Liu Bo
      When we move a file into a directory with the compression flag set, we need
      to inherit BTRFS_INODE_COMPRESS and clear BTRFS_INODE_NOCOMPRESS as well.
      But if we move a file into a directory without the compression flag, we
      need to clear both of them.

      This is how our setflags ioctl deals with the compression flag, so keep
      the same behaviour here.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      bc178237
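      The inheritance rule described above, in sketch form (field access is
      schematic; the real code lives in a small fixup helper in btrfs' rename
      path):

        if (BTRFS_I(dir)->flags & BTRFS_INODE_COMPRESS) {
                /* target dir has +c: compress, and drop any explicit
                 * "never compress" marker */
                BTRFS_I(inode)->flags |= BTRFS_INODE_COMPRESS;
                BTRFS_I(inode)->flags &= ~BTRFS_INODE_NOCOMPRESS;
        } else {
                /* target dir is plain: clear both flags */
                BTRFS_I(inode)->flags &=
                        ~(BTRFS_INODE_COMPRESS | BTRFS_INODE_NOCOMPRESS);
        }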
    • Btrfs: fix incompat flags setting · 69e380d1
      Committed by Li Zefan
      It's a bug, but it happens to work, as BTRFS_COMPRESS_LZO == 2, which
      has only one bit set.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      69e380d1
    • Btrfs: fix defrag regression · 6c282eb4
      Committed by Li Zefan
      If a file has 3 small extents:
      
      | ext1 | ext2 | ext3 |
      
      Running "btrfs fi defrag" will only defrag the last two extents if those
      extent mappings haven't been read into memory from disk.
      
      This bug was introduced by commit 17ce6ef8
      ("Btrfs: add a check to decide if we should defrag the range")
      
      The cause is that this commit looked up the previous and next extents using
      lookup_extent_mapping() only.
      
      While at it, remove the code that checks the previous extent, since
      it's sufficient to check the next extent.
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      6c282eb4
    • Btrfs: call filemap_fdatawrite twice for compression · 7ddf5a42
      Committed by Josef Bacik
      I removed this in an earlier commit and I was wrong.  Because compression
      can return from filemap_fdatawrite() without having actually set any of its
      pages as writeback, it can make filemap_fdatawait() do essentially nothing,
      and then we won't find any ordered extents because they may not have been
      created yet.  So not only does this make fsync() completely useless, but it
      will also screw up if you truncate on a non-page-aligned offset, since we
      zero out the end, then wait on ordered extents, and then drop caches.  If
      we drop the cache before the IO completes, then when we try to unpin the
      extent we just wrote we won't find it and everything goes sideways.  So fix
      this by putting it back, with a giant comment to keep me from trying to
      remove it in the future.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      7ddf5a42
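      A hedged sketch of the resulting write-and-wait pattern (the exact call
      site and helpers used by the patch may differ):

        /* first pass kicks off writeback; with compression this may hand the
         * pages to async workers and return before any ordered extent exists */
        filemap_fdatawrite_range(inode->i_mapping, start, end);
        filemap_fdatawait_range(inode->i_mapping, start, end);

        /* second pass, so that by the time we look up ordered extents the
         * compressed writeback has actually been submitted and waited on */
        filemap_fdatawrite_range(inode->i_mapping, start, end);
        filemap_fdatawait_range(inode->i_mapping, start, end);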
    • Btrfs: keep inode pinned when compressing writes · 8180ef88
      Committed by Josef Bacik
      A user reported lots of problems using compression on the new code and it
      turns out part of the problem was that igrab() was failing when we added a
      new ordered extent.  This is because when writing out an inode under
      compression we immediately return without actually doing anything to the
      pages, and then in another thread at some point down the line actually do
      the ordered dance.  The problem is that between the point where we start
      writeback and the point where we actually add the ordered extent, the
      inode could be reclaimed, which makes igrab() return NULL.  So we need to
      do an igrab() when we create the async extent and then drop it when we are
      done with it.  This makes sure the inode stays pinned in memory until the
      ordered extent can get a
      reference on it and we are good to go.  With this patch we no longer panic
      in btrfs_finish_ordered_io().  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      8180ef88
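      The reference-counting part boils down to this (struct and field names
      approximate):

        /* pin the inode before handing it to the async compression worker;
         * otherwise reclaim can evict it and a later igrab() returns NULL */
        async_cow->inode = igrab(inode);

        /* ... and once the ordered extent holds its own reference, drop ours
         * (turned into a delayed iput by commit cb77fcd8 above) */
        iput(async_cow->inode);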
    • Btrfs: implement ->show_devname · 9c5085c1
      Committed by Josef Bacik
      Because btrfs can remove the device that was mounted, we need to have a
      ->show_devname so that in this case we can print out some other device in
      the file system to /proc/mounts.  So if there are multiple devices in a
      btrfs file system we will just print the device with the lowest devid that
      we can find.  This will make everything consistent and deal with device
      removal properly.  The drawback is that if you mount with a device whose
      devid is higher than the lowest one, it won't show up as the mounted device
      in /proc/mounts, but this is a small price to pay.  This was inspired by
      Miao Xie's patch.
      Thanks,
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      9c5085c1
    • Btrfs: use rcu to protect device->name · 606686ee
      Committed by Josef Bacik
      Al pointed out that we can just toss out the old name on a device and add a
      new one arbitrarily, so anybody who uses device->name in printk could
      possibly use freed memory.  Instead of adding locking around all of this he
      suggested doing it with RCU, so I've introduced a struct rcu_string that
      does just that and have gone through and protected all accesses to
      device->name that aren't under the uuid_mutex with rcu_read_lock().  This
      protects us and I will use it for dealing with removing the device that we
      used to mount the file system in a later patch.  Thanks,
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <josef@redhat.com>
      606686ee
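      In sketch form, the wrapper is a tiny RCU-managed string plus the usual
      read/update pattern (simplified; the patch introduces a struct rcu_string
      along these lines):

        struct rcu_string {
                struct rcu_head rcu;
                char str[0];
        };

        /* read side: any printk-style use of the name */
        rcu_read_lock();
        printk(KERN_INFO "btrfs: device %s\n",
               rcu_dereference(device->name)->str);
        rcu_read_unlock();

        /* update side: publish the new name, free the old one only after a
         * grace period instead of immediately */
        rcu_assign_pointer(device->name, new_name);
        kfree_rcu(old_name, rcu);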
    • Btrfs: unlock everything properly in the error case for nocow · 17ca04af
      Committed by Josef Bacik
      I was getting hung on umount when a transaction was aborted because a range
      of one of the free space inodes was still locked.  This is because the nocow
      stuff doesn't unlock anything on error.  This fixes the problem, and I
      verified that this was what was happening.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      17ca04af
    • Btrfs: fix btrfs_destroy_marked_extents · ee670f0a
      Committed by Josef Bacik
      We were forcing the ebs to have their ref count set to 1 so that
      invalidatepage works, but this breaks lots of things (root nodes, for
      example) and is just plain wrong; we don't need to just evict all of this
      stuff.  Also drop the invalidatepage altogether and add a
      page_cache_release().  With this patch we no longer hang when trying to
      access the root nodes after an aborted transaction and we no longer leak
      memory.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      ee670f0a
    • Btrfs: abort the transaction if the commit fails · 7b8b92af
      Committed by Josef Bacik
      If a transaction commit fails we don't abort it, so we don't set an error
      on the file system.  This patch fixes that by actually calling the abort
      code and then adding a check for a fs error in the transaction start code
      to make sure it is caught properly.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      7b8b92af
    • Btrfs: wake up transaction waiters when aborting a transaction · d7096fc3
      Committed by Josef Bacik
      I was getting lots of hung tasks and a NULL pointer dereference because we
      are not cleaning up the transaction properly when it aborts.  First we need
      to reset the running_transaction to NULL so we don't get a bad dereference
      for any start_transaction callers after this.  Also we cannot rely on
      waitqueue_active() since it's just a list_empty(), so just call wake_up()
      directly since that will do the barrier for us and such.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      d7096fc3
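      The wake-up part, roughly (field names as in btrfs_transaction of that era):

        /* don't gate this on waitqueue_active(): that is just a list_empty()
         * check with no memory barrier, so a waiter that is in the middle of
         * going to sleep can be missed; wake_up() does its own locking */
        wake_up(&cur_trans->writer_wait);
        wake_up(&cur_trans->commit_wait);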
    • Btrfs: fix locking in btrfs_destroy_delayed_refs · b939d1ab
      Committed by Josef Bacik
      The transaction abort stuff was throwing warnings from the list debugging
      code because we do a list_del_init outside of the delayed_refs spin lock.
      The delayed refs locking makes baby Jesus cry so it's not hard to get wrong,
      but we need to take the ref head mutex to make sure it's not being processed
      currently, and so if it is we need to drop the spin lock and then take and
      drop the mutex and do the search again.  If we can take the mutex then we
      can safely remove the head from the list and carry on.  Now when the
      transaction aborts I don't get the list debugging warnings.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      b939d1ab
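      The retry dance described above looks roughly like this (simplified;
      error handling and the enclosing loop are omitted):

        head = btrfs_delayed_node_to_head(ref);
        if (!mutex_trylock(&head->mutex)) {
                atomic_inc(&ref->refs);
                spin_unlock(&delayed_refs->lock);

                /* wait for whoever owns the head to finish with it */
                mutex_lock(&head->mutex);
                mutex_unlock(&head->mutex);
                btrfs_put_delayed_ref(ref);

                spin_lock(&delayed_refs->lock);
                continue;       /* and redo the search under the lock */
        }
        /* nobody is processing this head, so it is safe to unlink it */
        list_del_init(&head->cluster);
        mutex_unlock(&head->mutex);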
    • Btrfs: pass locked_page into extent_clear_unlock_delalloc if there's an error · beb42dd7
      Committed by Josef Bacik
      While doing my enospc work I got a transaction abort that resulted in a
      panic when we tried to unlock_page() an already unlocked page.  This is
      because we aren't calling extent_clear_unlock_delalloc with the locked
      page, so it was unlocking all the pages in the range.  This is wrong since
      __extent_writepage expects to have the page locked still unless we return
      *page_started as 1.  This should keep us from panicking.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      beb42dd7