1. 13 Jan, 2012 2 commits
    • mm: compaction: introduce sync-light migration for use by compaction · a6bc32b8
      Committed by Mel Gorman
      This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
      mode that avoids writing back pages to backing storage.  Async compaction
      maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
      For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
      used.
      
      This avoids sync compaction stalling for an excessive length of time,
      particularly when copying files to a USB stick where there might be a
      large number of dirty pages backed by a filesystem that does not support
      ->writepages.
      
      [aarcange@redhat.com: This patch is heavily based on Andrea's work]
      [akpm@linux-foundation.org: fix fs/nfs/write.c build]
      [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
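The three modes named in the message above can be sketched in plain C as follows. This is a userspace illustration, not the kernel source: the enum member names come from the commit message, while `mode_for_compaction()` and `may_write_back()` are hypothetical helpers introduced only to show which caller gets which mode.

```c
#include <stdbool.h>

/* Mode names as given in the commit message. */
enum migrate_mode {
    MIGRATE_ASYNC,      /* never block */
    MIGRATE_SYNC_LIGHT, /* may block, but avoids writing pages back */
    MIGRATE_SYNC,       /* may block, including on writeback */
};

/* Hypothetical helper: async compaction maps to MIGRATE_ASYNC,
 * sync compaction maps to MIGRATE_SYNC_LIGHT. */
enum migrate_mode mode_for_compaction(bool sync_compaction)
{
    return sync_compaction ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC;
}

/* Hypothetical helper: only full MIGRATE_SYNC (used by callers such as
 * memory hotplug) is allowed to write a dirty page to backing storage. */
bool may_write_back(enum migrate_mode mode)
{
    return mode == MIGRATE_SYNC;
}
```

Under these assumptions, `may_write_back(mode_for_compaction(true))` is false, which is the property that keeps sync compaction from stalling on slow devices.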
    • mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage · b969c4ab
      Committed by Mel Gorman
      Asynchronous compaction is used when allocating transparent hugepages to
      avoid blocking for long periods of time.  Due to reports of stalling,
      there was a debate on disabling synchronous compaction but this severely
      impacted allocation success rates.  Part of the reason was that many dirty
      pages are skipped in asynchronous compaction by the following check:
      
      	if (PageDirty(page) && !sync &&
      		mapping->a_ops->migratepage != migrate_page)
      			rc = -EBUSY;
      
      This skips over all mapping aops using buffer_migrate_page() even though
      it is possible to migrate some of these pages without blocking.  This
      patch updates the ->migratepage callback with a "sync" parameter.  It is
      the responsibility of the callback to fail gracefully if migration would
      block.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
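The idea of a callback that fails gracefully instead of blocking might look roughly like the userspace sketch below. Everything in it is hypothetical: `struct fake_buffer`, `fake_trylock()`, and `fake_migratepage()` are stand-ins invented for illustration, and returning `-EAGAIN` is an assumption, since the message only says the callback must fail gracefully.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer that another task may hold locked. */
struct fake_buffer {
    bool locked;
};

/* Take the lock only if it is free; never waits. */
bool fake_trylock(struct fake_buffer *b)
{
    if (b->locked)
        return false;
    b->locked = true;
    return true;
}

/*
 * Hypothetical migration callback with a "sync" parameter.
 * sync == false: a contended lock is reported as -EAGAIN (assumed code).
 * sync == true:  a real implementation would sleep until the holder
 *                releases the lock; the sketch only records that a wait
 *                would have happened and then proceeds.
 */
int fake_migratepage(struct fake_buffer *b, bool sync, bool *waited)
{
    if (!fake_trylock(b)) {
        if (!sync)
            return -EAGAIN;
        *waited = true;
    }
    /* ... the page contents would be moved here ... */
    b->locked = false;
    return 0;
}
```

The point of the design is that the decision to skip a page moves from a blanket check in the caller into the callback, which knows whether this particular page can be moved without waiting.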
  2. 16 Dec, 2011 1 commit
    • Btrfs: fix num_workers_starting bug and other bugs in async thread · 0dc3b84a
      Committed by Josef Bacik
      Al pointed out we have some random problems with the way we account for
      num_workers_starting in the async thread stuff.  First of all we need to make
      sure to decrement num_workers_starting if we fail to start the worker, so make
      __btrfs_start_workers do this.  Also fix __btrfs_start_workers so that it
      doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we
      failed to create a worker.  Also check_pending_worker_creates needs to call
      __btrfs_start_work in its work function since it already increments
      num_workers_starting.
      
      People only start one worker at a time, so get rid of the num_workers argument
      everywhere, and make btrfs_queue_worker a void since it will always succeed.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  3. 22 Nov, 2011 1 commit
    • freezer: unexport refrigerator() and update try_to_freeze() slightly · a0acae0e
      Committed by Tejun Heo
      There is no reason to export two functions for entering the
      refrigerator.  Calling refrigerator() instead of try_to_freeze()
      doesn't save anything noticeable or remove any race condition.
      
      * Rename refrigerator() to __refrigerator() and make it return bool
        indicating whether it scheduled out for freezing.
      
      * Update try_to_freeze() to return bool and relay the return value of
        __refrigerator() if freezing().
      
      * Convert all refrigerator() users to try_to_freeze().
      
      * Update documentation accordingly.
      
      * While at it, add might_sleep() to try_to_freeze().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Christoph Hellwig <hch@infradead.org>
  4. 20 Nov, 2011 2 commits
    • btrfs: mirror_num should be int, not u64 · 32240a91
      Committed by Jan Schmidt
      My previous patch introduced some u64 for failed_mirror variables, this one
      makes it consistent again.
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix barrier flushes · 387125fc
      Committed by Chris Mason
      When btrfs is writing the super blocks, it sends barrier flushes to make
      sure writeback caching drives get all the metadata on disk in the
      right order.
      
      But, we have two bugs in the way these are sent down.  When doing
      full commits (not via the tree log), we are sending the barrier down
      before the last super when it should be going down before the first.
      
      In multi-device setups, we should be waiting for the barriers to
      complete on all devices before writing any of the supers.
      
      Both of these bugs can cause corruptions on power failures.  We fix it
      with some new code to send down empty barriers to all devices before
      writing the first super.
      
      Alexandre Oliva found the multi-device bug.  Arne Jansen did the async
      barrier loop.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>
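The ordering described in the message above (flush every device, wait for all flushes, only then write the first super) can be modeled with a small event log. This is a sketch of the ordering rule only, with no real I/O; `commit_supers_sketch()` and `supers_follow_all_flushes()` are hypothetical names, not btrfs functions.

```c
#include <stdbool.h>
#include <stddef.h>

enum step { FLUSH_SENT, FLUSH_DONE, SUPER_WRITTEN };

struct event {
    enum step step;
    int dev;
};

/* Record the intended order for ndevs devices: send a flush to every
 * device, wait for every flush, then write the supers. Returns the
 * number of events stored (at most cap). */
size_t commit_supers_sketch(int ndevs, struct event *log, size_t cap)
{
    size_t n = 0;

    for (int d = 0; d < ndevs && n < cap; d++)
        log[n++] = (struct event){ FLUSH_SENT, d };
    for (int d = 0; d < ndevs && n < cap; d++)
        log[n++] = (struct event){ FLUSH_DONE, d };
    for (int d = 0; d < ndevs && n < cap; d++)
        log[n++] = (struct event){ SUPER_WRITTEN, d };
    return n;
}

/* Check the rule: no super may be written before the flushes on all
 * ndevs devices have completed. */
bool supers_follow_all_flushes(const struct event *log, size_t n, int ndevs)
{
    int done = 0;

    for (size_t i = 0; i < n; i++) {
        if (log[i].step == FLUSH_DONE)
            done++;
        else if (log[i].step == SUPER_WRITTEN && done < ndevs)
            return false;
    }
    return true;
}
```

A log in which device 0's super is written before device 1's flush completes fails the check, which corresponds to the multi-device bug the commit describes.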
  5. 10 Nov, 2011 2 commits
  6. 07 Nov, 2011 1 commit
  7. 06 Nov, 2011 6 commits
  8. 02 Nov, 2011 1 commit
  9. 20 Oct, 2011 3 commits
    • Btrfs: allow us to overcommit our enospc reservations · 2bf64758
      Committed by Josef Bacik
      One of the things that kills us is the fact that our ENOSPC reservations are
      horribly over the top in most normal cases.  There isn't too much that can be
      done about this because when we are completely full we really need them to work
      like this so we don't under reserve.  However if there is plenty of unallocated
      chunks on the disk we can use that to gauge how much we can overcommit.  So this
      patch adds chunk free space accounting so we always know how much unallocated
      space we have.  Then if we fail to make a reservation within our allocated
      space, check to see if we can overcommit.  In the normal flushing case (like
      with delalloc metadata reservations) we'll take the free space and divide it by
      2 if our metadata profile is setup for DUP or any of those, and then divide it
      by 8 to make sure we don't overcommit too much.  Then if we're in a non-flushing
      case (we really need this reservation now!) we only limit ourselves to half of
      the free space.  This makes this fio test
      
      [torrent]
      filename=torrent-test
      rw=randwrite
      size=4g
      ioengine=sync
      directory=/mnt/btrfs-test
      
      go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB
      file system.  This doesn't seem to break my other enospc tests, but could really
      use some more testing as this is a super scary change.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
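The arithmetic in the message above can be written out as a small function. This is a sketch of the described rule, not the btrfs code: `overcommit_limit()` and its parameters are hypothetical names, and applying the DUP halving in the non-flushing case as well is an assumption, since the message states it explicitly only for the flushing case.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: how far a reservation may overcommit, given the
 * amount of unallocated chunk space in bytes.
 *  - DUP-style metadata profiles store two copies, so halve the space.
 *  - Flushing case: allow only 1/8 of what remains.
 *  - Non-flushing case: allow up to 1/2 of what remains.
 */
uint64_t overcommit_limit(uint64_t unallocated, bool metadata_dup, bool flushing)
{
    uint64_t avail = unallocated;

    if (metadata_dup)
        avail /= 2;
    if (flushing)
        avail /= 8;
    else
        avail /= 2;
    return avail;
}
```

For example, with 1600 units unallocated and a DUP metadata profile, the flushing case allows 100 units of overcommit and the non-flushing case allows 400.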
    • Btrfs: put the block group cache after we commit the super · 300e4f8a
      Committed by Josef Bacik
      In moving some enospc stuff around I noticed that when we unmount we are often
      evicting the free space cache inodes before we do our last commit.  This isn't
      bad, but it makes us constantly have to re-read the inodes back.  So instead
      don't evict the cache until after we do our last commit, this will make things a
      little less crappy and makes a future enospc change work properly.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: kill the durable block rsv stuff · 37be25bc
      Committed by Josef Bacik
      This is confusing code and isn't used by anything anymore, so delete it.
      Signed-off-by: Josef Bacik <josef@redhat.com>
  10. 02 Oct, 2011 4 commits
    • btrfs: hooks for readahead · 4bb31e92
      Committed by Arne Jansen
      This adds the hooks needed for readahead. In the readpage_end_io_hook,
      the extent state is checked for the EXTENT_READAHEAD flag. Only in this
      case the readahead hook is called, to keep the impact on non-ra as low
      as possible.
      Additionally, a hook for a failed IO is added, otherwise readahead would
      wait indefinitely for the extent to finish.
      
      Changes for v2:
       - eliminate race condition
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • btrfs: state information for readahead · 90519d66
      Committed by Arne Jansen
      Add state information for readahead to btrfs_fs_info and btrfs_device.
      
      Changes v2:
       - don't wait in radix_trees
       - add own set of workers for readahead
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • btrfs: add READAHEAD extent buffer flag · ab0fff03
      Committed by Arne Jansen
      Add a READAHEAD extent buffer flag.
      Add a function to trigger a read with this flag set.
      
      Changes v2:
       - use extent buffer flags instead of extent state flags
      
      Changes v5:
       - adapt to changed read_extent_buffer_pages interface
       - don't return eb from reada_tree_block_flagged if it has CORRUPT flag set
      Signed-off-by: Arne Jansen <sensille@gmx.net>
    • btrfs: add an extra wait mode to read_extent_buffer_pages · bb82ab88
      Committed by Arne Jansen
      read_extent_buffer_pages currently has two modes, either trigger a read
      without waiting for anything, or wait for the I/O to finish. The former
      also bails when it's unable to lock the page. This patch now adds an
      additional parameter to allow it to block on page lock, but not wait
      for completion.
      
      Changes v5:
       - merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and
         WAIT_PAGE_LOCK
      
      Change v6:
       - fix bug introduced in v5
      Signed-off-by: Arne Jansen <sensille@gmx.net>
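The three wait modes listed in the changelog above can be tabulated in a short sketch. The constant names come from the message; which behavior each name selects is inferred from the description, and `struct read_plan` and `plan_for()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Mode names as listed in the commit message. */
enum wait_mode { WAIT_NONE, WAIT_COMPLETE, WAIT_PAGE_LOCK };

/* Hypothetical summary of what a read does in each mode. */
struct read_plan {
    bool block_on_page_lock; /* sleep for the page lock, or bail if busy? */
    bool wait_for_io;        /* wait until the read I/O has finished? */
};

struct read_plan plan_for(enum wait_mode mode)
{
    struct read_plan p = { false, false };

    switch (mode) {
    case WAIT_NONE:      /* trigger the read, wait for nothing */
        break;
    case WAIT_COMPLETE:  /* wait for the I/O to finish */
        p.block_on_page_lock = true;
        p.wait_for_io = true;
        break;
    case WAIT_PAGE_LOCK: /* new mode: block on the lock, not on completion */
        p.block_on_page_lock = true;
        break;
    }
    return p;
}
```

Merging the two earlier boolean parameters into one enum, as the v5 note says, leaves only these three combinations, and "wait for I/O but bail on the page lock" cannot be expressed at all.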
  11. 29 Sep, 2011 1 commit
  12. 28 Jul, 2011 3 commits
    • Btrfs: make a lockdep class for each root · 85d4e461
      Committed by Chris Mason
      This patch was originally from Tejun Heo.  lockdep complains about the btrfs
      locking because we sometimes take btree locks from two different trees at the
      same time.  The current classes are based only on level in the btree, which
      isn't enough information for lockdep to figure out if the lock is safe.
      
      This patch makes a class for each type of tree, and lumps all the FS trees that
      actually have files and directories into the same class.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: stop using highmem for extent_buffers · a6591715
      Committed by Chris Mason
      The extent_buffers have a very complex interface where
      we use HIGHMEM for metadata and try to cache a kmap mapping
      to access the memory.
      
      The next commit adds reader/writer locks, and concurrent use
      of this kmap cache would make it even more complex.
      
      This commit drops the ability to use HIGHMEM with extent buffers,
      and rips out all of the related code.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: use a worker thread to do caching · bab39bf9
      Committed by Josef Bacik
      A user reported a deadlock when copying a bunch of files.  This is because they
      were low on memory and kthreadd got hung up trying to migrate pages for an
      allocation when starting the caching kthread.  The page was locked by the person
      starting the caching kthread.  To fix this we just need to use the async thread
      stuff so that the threads are already created and we don't have to worry about
      deadlocks.  Thanks,
      Reported-by: Roman Mamedov <rm@romanrm.ru>
      Signed-off-by: Josef Bacik <josef@redhat.com>
  13. 20 Jul, 2011 1 commit
  14. 18 Jun, 2011 3 commits
  15. 13 Jun, 2011 1 commit
  16. 11 Jun, 2011 1 commit
  17. 10 Jun, 2011 1 commit
  18. 27 May, 2011 2 commits
  19. 24 May, 2011 4 commits
    • Btrfs: using rcu lock in the reader side of devices list · 1f78160c
      Committed by Xiao Guangrong
      fs_devices->devices is only updated on remove and add device paths, so we can
      use rcu to protect it in the reader side
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix the race between reading and updating devices · c9513edb
      Committed by Xiao Guangrong
      On btrfs_congested_fn and __unplug_io_fn paths, we should hold
      device_list_mutex to avoid remove/add device path to
      update fs_devices->devices
      
      On __btrfs_close_devices and btrfs_prepare_sprout paths, the devices in
      fs_devices->devices or fs_devices->devices is updated, so we should hold
      the mutex to avoid the reader side to reach them
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • BTRFS: Remove unused node_lock · 0956c798
      Committed by Andi Kleen
      240f62c8 replaced the node_lock with rcu_read_lock, but forgot
      to remove the actual lock in the data structure. Remove it here.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: kill trans_mutex · a4abeea4
      Committed by Josef Bacik
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really the serializing trans_handles joining is not too hard, and can really get
      bogged down in acquiring a reference to the transaction.  So replace the
      trans_mutex with a trans_lock spinlock and use it to do the following
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much, basically
      it just holds the current transactions, which will usually just be the currently
      committing transaction and the currently running transaction at most.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit we need to have a way to block new people from
      creating a new transaction while we're doing our work.  So we set no_trans_join
      and in join_transaction we test to see if that is set, and if it is we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
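Item 6 in the list above, an atomic use count that needs no lock when references are dropped, can be illustrated with C11 atomics. This is a userspace sketch and not the btrfs implementation: `struct txn_sketch`, `txn_get()`, and `txn_put()` are hypothetical names, and the kernel has its own atomic API rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical transaction object carrying only a reference count. */
struct txn_sketch {
    atomic_int use_count;
};

/* Take a reference; no lock is needed because the increment is atomic. */
void txn_get(struct txn_sketch *t)
{
    atomic_fetch_add(&t->use_count, 1);
}

/* Drop a reference. Returns true when the caller dropped the last one
 * and is therefore responsible for freeing the transaction. */
bool txn_put(struct txn_sketch *t)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    return atomic_fetch_sub(&t->use_count, 1) == 1;
}
```

Because the decrement and the "was I last?" test happen in one atomic step, two tasks dropping references at the same time cannot both conclude that they were the last holder.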