1. 25 11月, 2013 1 次提交
  2. 12 11月, 2013 7 次提交
  3. 11 10月, 2013 1 次提交
    • J
      Btrfs: limit delalloc pages outside of find_delalloc_range · 7bf811a5
      Josef Bacik 提交于
      Liu fixed part of this problem and unfortunately I steered him in slightly the
      wrong direction and so didn't completely fix the problem.  The problem is we
      limit the size of the delalloc range we are looking for to max bytes and then we
      try to lock that range.  If we fail to lock the pages in that range we will
      shrink the max bytes to a single page and re loop.  However if our first page is
      inside of the delalloc range then we will end up limiting the end of the range
      to a period before our first page.  This is illustrated below
      
      [0 -------- delalloc range --------- 256mb]
                                        [page]
      
      So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
      and then we will notice that delalloc_start < *start and adjust it up, but not
      adjust delalloc_end up, so things go sideways.  To fix this we need to not limit
      the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that
      way we don't end up with this confusion.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      7bf811a5
  4. 05 10月, 2013 2 次提交
    • D
      btrfs: Fix crash due to not allocating integrity data for a bioset · b208c2f7
      Darrick J. Wong 提交于
      When btrfs creates a bioset, we must also allocate the integrity data pool.
      Otherwise btrfs will crash when it tries to submit a bio to a checksumming
      disk:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
       IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       PGD 2305e4067 PUD 23063d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
       Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
      sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
      raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
       CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
       RIP: 0010:[<ffffffff8111e28a>]  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       RSP: 0018:ffff880230699688  EFLAGS: 00010286
       RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
       RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
       RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
       R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
       R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
       FS:  00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
       Stack:
        ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
        ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
        0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
       Call Trace:
        [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0
        [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360
        [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150
        [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs]
        [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460
        [<ffffffff8123e58a>] generic_make_request+0xca/0x100
        [<ffffffff8123e639>] submit_bio+0x79/0x160
        [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs]
        [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs]
        [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs]
        [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs]
        [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0
        [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260
        [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120
      [btrfs]
        [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs]
        [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs]
        [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs]
        [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0
        [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0
        [<ffffffff81176ba0>] mount_fs+0x20/0xd0
        [<ffffffff81191096>] vfs_kern_mount+0x76/0x120
        [<ffffffff81193320>] do_mount+0x200/0xa40
        [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80
        [<ffffffff81193bf0>] SyS_mount+0x90/0xe0
        [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f
       Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
      89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7
      44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
       RIP  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
        RSP <ffff880230699688>
       CR2: 0000000000000018
       ---[ end trace 7a96042017ed21e2 ]---
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b208c2f7
    • L
      Btrfs: fix crash of compressed writes · 385fe0be
      Liu Bo 提交于
      The crash[1] is found by xfstests/generic/208 with "-o compress",
      it's not reproduced everytime, but it does panic.
      
      The bug is quite interesting, it's actually introduced by a recent commit
      (573aecaf,
      Btrfs: actually limit the size of delalloc range).
      
      Btrfs implements delay allocation, so during writeback, we
      (1) get a page A and lock it
      (2) search the state tree for delalloc bytes and lock all pages within the range
      (3) process the delalloc range, including find disk space and create
          ordered extent and so on.
      (4) submit the page A.
      
      It runs well in normal cases, but if we're in a racy case, eg.
      buffered compressed writes and aio-dio writes,
      sometimes we may fail to lock all pages in the 'delalloc' range,
      in which case, we need to fall back to search the state tree again with
      a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).
      
      The mentioned commit has a side effect, that is, in the fallback case,
      we can find delalloc bytes before the index of the page we already have locked,
      so we're in the case of (delalloc_end <= *start) and return with (found > 0).
      
      This ends with not locking delalloc pages but making ->writepage still
      process them, and the crash happens.
      
      This fixes it by just thinking that we find nothing and returning to caller
      as the caller knows how to deal with it properly.
      
      [1]:
      ------------[ cut here ]------------
      kernel BUG at mm/page-writeback.c:2170!
      [...]
      CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
      [...]
      RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [...]
      [ 4934.248731] Stack:
      [ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
      [ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
      [ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
      [ 4934.248731] Call Trace:
      [ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
      [ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
      [ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
      [ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
      [ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
      [ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
      [ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
      [ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [ 4934.248731]  RSP <ffff8801869f9c48>
      [ 4934.280307] ---[ end trace 36f06d3f8750236a ]---
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      385fe0be
  5. 21 9月, 2013 1 次提交
    • J
      Btrfs: actually limit the size of delalloc range · 573aecaf
      Josef Bacik 提交于
      So forever we have had this thing to limit the amount of delalloc pages we'll
      setup to be written out to 128mb.  This is because we have to lock all the pages
      in this range, so anything above this gets a bit unweildly, and also without a
      limit we'll happily allocate gigantic chunks of disk space.  Turns out our check
      for this wasn't quite right, we wouldn't actually limit the chunk we wanted to
      write out, we'd just stop looking for more space after we went over the limit.
      So if you do a giant 20gb dd on my box with lots of ram I could get 2gig
      extents.  This is fine normally, except when you go to relocate these extents
      and we can't find enough space to relocate these moster extents, since we have
      to be able to allocate exactly the same sized extent to move it around.  So fix
      this by actually enforcing the limit.  With this patch I'm no longer seeing
      giant 1.5gb extents.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      573aecaf
  6. 01 9月, 2013 14 次提交
  7. 10 8月, 2013 1 次提交
    • J
      Btrfs: do not offset physical if we're compressed · b76bb701
      Josef Bacik 提交于
      xfstest btrfs/276 was freaking out on slower boxes partly because fiemap was
      offsetting the physical based on the extent offset.  This is perfectly fine with
      uncompressed extents, however the extent offset is into the uncompressed area,
      not the compressed.  So we can return a physical value that isn't at all within
      the area we have allocated on disk.  Fix this by returning the start of the
      extent if it is compressed no matter what the offset.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b76bb701
  8. 02 7月, 2013 1 次提交
    • J
      Btrfs: check if we can nocow if we don't have data space · 7ee9e440
      Josef Bacik 提交于
      We always just try and reserve data space when we write, but if we are out of
      space but have prealloc'ed extents we should still successfully write.  This
      patch will try and see if we can write to prealloc'ed space and if we can go
      ahead and allow the write to continue.  With this patch we now pass xfstests
      generic/274.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      7ee9e440
  9. 01 7月, 2013 1 次提交
    • J
      Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate · a71754fc
      Josef Bacik 提交于
      This has plagued us forever and I'm so over working around it.  When we truncate
      down to a non-page aligned offset we will call btrfs_truncate_page to zero out
      the end of the page and write it back to disk, this will keep us from exposing
      stale data if we truncate back up from that point.  The problem with this is it
      requires data space to do this, and people don't really expect to get ENOSPC
      from truncate() for these sort of things.  This also tends to bite the orphan
      cleanup stuff too which keeps people from mounting.  To get around this we can
      just move this into btrfs_cont_expand() to make sure if we are truncating up
      from a non-page size aligned i_size we will zero out the rest of this page so
      that we don't expose stale data.  This will give ENOSPC if you try to truncate()
      up or if you try to write past the end of isize, which is much more reasonable.
      This fixes xfstests generic/083 failing to mount because of the orphan cleanup
      failing.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      a71754fc
  10. 14 6月, 2013 1 次提交
  11. 22 5月, 2013 1 次提交
    • L
      mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner 提交于
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way .Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
  12. 18 5月, 2013 3 次提交
    • C
      Btrfs: use a btrfs bioset instead of abusing bio internals · 9be3395b
      Chris Mason 提交于
      Btrfs has been pointer tagging bi_private and using bi_bdev
      to store the stripe index and mirror number of failed IOs.
      
      As bios bubble back up through the call chain, we use these
      to decide if and how to retry our IOs.  They are also used
      to count IO failures on a per device basis.
      
      Recently a bio tracepoint was added lead to crashes because
      we were abusing bi_bdev.
      
      This commit adds a btrfs bioset, and creates explicit fields
      for the mirror number and stripe index.  The plan is to
      extend this structure for all of the fields currently in
      struct btrfs_bio, which will mean one less kmalloc in
      our IO path.
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      Reported-by: NTejun Heo <tj@kernel.org>
      9be3395b
    • A
      btrfs: do away with non-whole_page extent I/O · 17a5adcc
      Alexandre Oliva 提交于
      end_bio_extent_readpage computes whole_page based on bv_offset and
      bv_len, without taking into account that blk_update_request may modify
      them when some of the blocks to be read into a page produce a read
      error.  This would cause the read to unlock only part of the file
      range associated with the page, which would in turn leave the entire
      page locked, which would not only keep the process blocked instead of
      returning -EIO to it, but also prevent any further access to the file.
      
      It turns out that btrfs always issues whole-page reads and writes.
      The special handling of non-whole_page appears to be a mistake or a
      left-over from a time when this wasn't the case.  Indeed,
      end_bio_extent_writepage distinguished between whole_page and
      non-whole_page writes but behaved identically in both cases!
      
      I've replaced the whole_page computations with warnings, just to be
      sure that we're not issuing partial page reads or writes.  The
      warnings should probably just go away some time.
      Signed-off-by: NAlexandre Oliva <oliva@gnu.org>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      17a5adcc
    • L
      Btrfs: fix off-by-one in fiemap · a52f4cd2
      Liu Bo 提交于
      lock_extent/unlock_extent expect an exclusive end.
      Tested-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      a52f4cd2
  13. 07 5月, 2013 6 次提交