1. 30 5月, 2012 2 次提交
    • J
      Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d
      Josef Bacik 提交于
      Miao pointed this out while I was working on an orphan problem that messing
      with a bitfield where different ranges are protected by different locks
      doesn't work out right.  Turns out we've been doing this forever where we
      have different parts of the bit field protected by either no lock at all or
      different locks which could cause all sorts of weird problems including the
      issue I was hitting.  So instead make a runtime_flags thing that we use the
      normal bit operations on that are all atomic so we can keep having our
      no/different locking for the different flags and then make force_compress
      it's own thing so it can be treated normally.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      72ac3c0d
    • J
      Btrfs: finish ordered extents in their own thread · 5fd02043
      Josef Bacik 提交于
      We noticed that the ordered extent completion doesn't really rely on having
      a page and that it could be done independantly of ending the writeback on a
      page.  This patch makes us not do the threaded endio stuff for normal
      buffered writes and direct writes so we can end page writeback as soon as
      possible (in irq context) and only start threads to do the ordered work when
      it is actually done.  Compression needs to be reworked some to take
      advantage of this as well, but atm it has to do a find_get_page in its endio
      handler so it must be done in its own thread.  This makes direct writes
      quite a bit faster.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      5fd02043
  2. 06 5月, 2012 1 次提交
    • C
      Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919
      Chris Mason 提交于
      verify_parent_transid needs to lock the extent range to make
      sure no IO is underway, and so it can safely clear the
      uptodate bits if our checks fail.
      
      But, a few callers are using it with spinlocks held.  Most
      of the time, the generation numbers are going to match, and
      we don't want to switch to a blocking lock just for the error
      case.  This adds an atomic flag to verify_parent_transid,
      and changes it to return EAGAIN if it needs to block to
      properly verifiy things.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      b9fab919
  3. 19 4月, 2012 2 次提交
  4. 30 3月, 2012 1 次提交
  5. 29 3月, 2012 3 次提交
  6. 27 3月, 2012 6 次提交
    • J
      Btrfs: deal with read errors on extent buffers differently · ea466794
      Josef Bacik 提交于
      Since we need to read and write extent buffers in their entirety we can't use
      the normal bio_readpage_error stuff since it only works on a per page basis.  So
      instead make it so that if we see an io error in endio we just mark the eb as
      having an IO error and then in btree_read_extent_buffer_pages we will manually
      try other mirrors and then overwrite the bad mirror if we find a good copy.
      This works with larger than page size blocks.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      ea466794
    • C
      Btrfs: don't use threaded IO completion helpers for metadata writes · f3f266ab
      Chris Mason 提交于
      The metadata write IO completion code is now simple enough that we
      don't need the threaded helpers anymore.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      f3f266ab
    • J
      Btrfs: ensure an entire eb is written at once · 0b32f4bb
      Josef Bacik 提交于
      This patch simplifies how we track our extent buffers.  Previously we could exit
      writepages with only having written half of an extent buffer, which meant we had
      to track the state of the pages and the state of the extent buffers differently.
      Now we only read in entire extent buffers and write out entire extent buffers,
      this allows us to simply set bits in our bflags to indicate the state of the eb
      and we no longer have to do things like track uptodate with our iotree.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0b32f4bb
    • J
      Btrfs: introduce free_extent_buffer_stale · 3083ee2e
      Josef Bacik 提交于
      Because btrfs cow's we can end up with extent buffers that are no longer
      necessary just sitting around in memory.  So instead of evicting these pages, we
      could end up evicting things we actually care about.  Thus we have
      free_extent_buffer_stale for use when we are freeing tree blocks.  This will
      make it so that the ref for the eb being in the radix tree is dropped as soon as
      possible and then is freed when the refcount hits 0 instead of waiting to be
      released by releasepage.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      3083ee2e
    • J
      Btrfs: set page->private to the eb · 4f2de97a
      Josef Bacik 提交于
      We spend a lot of time looking up extent buffers from pages when we could just
      store the pointer to the eb the page is associated with in page->private.  This
      patch does just that, and it makes things a little simpler and reduces a bit of
      CPU overhead involved with doing metadata IO.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      4f2de97a
    • C
      Btrfs: allow metadata blocks larger than the page size · 727011e0
      Chris Mason 提交于
      A few years ago the btrfs code to support blocks lager than
      the page size was disabled to fix a few corner cases in the
      page cache handling.  This fixes the code to properly support
      large metadata blocks again.
      
      Since current kernels will crash early and often with larger
      metadata blocks, this adds an incompat bit so that older kernels
      can't mount it.
      
      This also does away with different blocksizes for nodes and leaves.
      You get a single block size for all tree blocks.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      727011e0
  7. 22 3月, 2012 8 次提交
  8. 23 2月, 2012 1 次提交
    • C
      Btrfs: make sure we update latest_bdev · a6b0d5c8
      Chris Mason 提交于
      When we are setting up the mount, we close all the
      devices that were not actually part of the metadata we found.
      
      But, we don't make sure that one of those devices wasn't
      fs_devices->latest_bdev, which means we can do a use after free
      on the one we closed.
      
      This updates latest_bdev as it goes.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      a6b0d5c8
  9. 15 2月, 2012 1 次提交
    • K
      btrfs: Sector Size check during Mount · 941b2ddf
      Keith Mannthey 提交于
      Gracefully fail when trying to mount a BTRFS file system that has a
      sectorsize smaller than PAGE_SIZE.
      
      On PPC it is possible to build a FS while using a 4k PAGE_SIZE kernel
      then boot into a 64K PAGE_SIZE kernel.  Presently open_ctree fails in an
      endless loop and hangs the machine in this situation.
      
      My debugging has show this Sector size < Page size to be a non trivial
      situation and a graceful exit from the situation would be nice for the
      time being.
      Signed-off-by: NKeith Mannthey <kmannth@us.ibm.com>
      941b2ddf
  10. 27 1月, 2012 1 次提交
    • D
      btrfs: mask out gfp flags in releasepage · 0c4e538b
      David Sterba 提交于
      btree_releasepage is a callback and can be passed unknown gfp flags and then
      they may end up in kmem_cache_alloc called from alloc_extent_state, slab
      allocator will BUG_ON when there is HIGHMEM or DMA32 flag set.
      
      This may happen when btrfs is mounted from a loop device, which masks out
      __GFP_IO flag. The check in try_release_extent_state
      
      3399                 if ((mask & GFP_NOFS) == GFP_NOFS)
      3400                         mask = GFP_NOFS;
      
      will not work and passes unfiltered flags further resulting in crash at
      mm/slab.c:2963
      
       [<000000000024ae4c>] cache_alloc_refill+0x3b4/0x5c8
       [<000000000024c810>] kmem_cache_alloc+0x204/0x294
       [<00000000001fd3c2>] mempool_alloc+0x52/0x170
       [<000003c000ced0b0>] alloc_extent_state+0x40/0xd4 [btrfs]
       [<000003c000cee5ae>] __clear_extent_bit+0x38a/0x4cc [btrfs]
       [<000003c000cee78c>] try_release_extent_state+0x9c/0xd4 [btrfs]
       [<000003c000cc4c66>] btree_releasepage+0x7e/0xd0 [btrfs]
       [<0000000000210d84>] shrink_page_list+0x6a0/0x724
       [<0000000000211394>] shrink_inactive_list+0x230/0x578
       [<0000000000211bb8>] shrink_list+0x6c/0x120
       [<0000000000211e4e>] shrink_zone+0x1e2/0x228
       [<0000000000211f24>] shrink_zones+0x90/0x254
       [<0000000000213410>] do_try_to_free_pages+0xac/0x420
       [<0000000000213ae0>] try_to_free_pages+0x13c/0x1b0
       [<0000000000204e6c>] __alloc_pages_nodemask+0x5b4/0x9a8
       [<00000000001fb04a>] grab_cache_page_write_begin+0x7e/0xe8
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      0c4e538b
  11. 17 1月, 2012 5 次提交
    • I
      Btrfs: allow for canceling restriper · a7e99c69
      Ilya Dryomov 提交于
      Implement an ioctl for canceling restriper.  Currently we wait until
      relocation of the current block group is finished, in future this can be
      done by triggering a commit.  Balance item is deleted and no memory
      about the interrupted balance is kept.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      a7e99c69
    • I
      Btrfs: allow for pausing restriper · 837d5b6e
      Ilya Dryomov 提交于
      Implement an ioctl for pausing restriper.  This pauses the relocation,
      but balance is still considered to be "in progress": balance item is
      not deleted, other volume operations cannot be started, etc.  If paused
      in the middle of profile changing operation we will continue making
      allocations with the target profile.
      
      Add a hook to close_ctree() to pause restriper and free its data
      structures on unmount.  (It's safe to unmount when restriper is in
      "paused" state, we will resume with the same parameters on the next
      mount)
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      837d5b6e
    • I
      Btrfs: recover balance on mount · 59641015
      Ilya Dryomov 提交于
      On mount, if balance item is found, resume balance in a separate
      kernel thread.
      
      Try to be smart to continue roughly where previous balance (or convert)
      was interrupted.  For chunk types that were being converted to some
      profile we turn on soft convert, in case of a simple balance we turn on
      usage filter and relocate only less-than-90%-full chunks of that type.
      These are just heuristics but they help quite a bit, and can be improved
      in future.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      59641015
    • I
      Btrfs: add basic restriper infrastructure · c9e9f97b
      Ilya Dryomov 提交于
      Add basic restriper infrastructure: extended balancing ioctl and all
      related ioctl data structures, add data structure for tracking
      restriper's state to fs_info, etc.  The semantics of the old balancing
      ioctl are fully preserved.
      
      Explicitly disallow any volume operations when balance is in progress.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      c9e9f97b
    • I
      Btrfs: get rid of *_alloc_profile fields · 6fef8df1
      Ilya Dryomov 提交于
      {data,metadata,system}_alloc_profile fields have been unused for a long
      time now.  Get rid of them.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      6fef8df1
  12. 13 1月, 2012 2 次提交
    • M
      mm: compaction: introduce sync-light migration for use by compaction · a6bc32b8
      Mel Gorman 提交于
      This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
      mode that avoids writing back pages to backing storage.  Async compaction
      maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
      For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
      used.
      
      This avoids sync compaction stalling for an excessive length of time,
      particularly when copying files to a USB stick where there might be a
      large number of dirty pages backed by a filesystem that does not support
      ->writepages.
      
      [aarcange@redhat.com: This patch is heavily based on Andrea's work]
      [akpm@linux-foundation.org: fix fs/nfs/write.c build]
      [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6bc32b8
    • M
      mm: compaction: determine if dirty pages can be migrated without blocking within ->migratepage · b969c4ab
      Mel Gorman 提交于
      Asynchronous compaction is used when allocating transparent hugepages to
      avoid blocking for long periods of time.  Due to reports of stalling,
      there was a debate on disabling synchronous compaction but this severely
      impacted allocation success rates.  Part of the reason was that many dirty
      pages are skipped in asynchronous compaction by the following check;
      
      	if (PageDirty(page) && !sync &&
      		mapping->a_ops->migratepage != migrate_page)
      			rc = -EBUSY;
      
      This skips over all mapping aops using buffer_migrate_page() even though
      it is possible to migrate some of these pages without blocking.  This
      patch updates the ->migratepage callback with a "sync" parameter.  It is
      the responsibility of the callback to fail gracefully if migration would
      block.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andy Isaacson <adi@hexapodia.org>
      Cc: Nai Xia <nai.xia@gmail.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b969c4ab
  13. 11 1月, 2012 1 次提交
    • L
      Btrfs: fix possible deadlock when opening a seed device · b367e47f
      Li Zefan 提交于
      The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex,
      but when we mount a filesystem which has backing seed devices, we have
      this lock chain:
      
          open_ctree()
              lock(chunk_mutex);
              read_chunk_tree();
                  read_one_dev();
                      open_seed_devices();
                          lock(uuid_mutex);
      
      and then we hit a lockdep splat.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      b367e47f
  14. 09 1月, 2012 6 次提交