1. 25 10月, 2013 32 次提交
  2. 05 10月, 2013 8 次提交
    • D
      btrfs: Fix crash due to not allocating integrity data for a bioset · b208c2f7
      Darrick J. Wong 提交于
      When btrfs creates a bioset, we must also allocate the integrity data pool.
      Otherwise btrfs will crash when it tries to submit a bio to a checksumming
      disk:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
       IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       PGD 2305e4067 PUD 23063d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
       Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
      sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
      raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
       CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
       RIP: 0010:[<ffffffff8111e28a>]  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       RSP: 0018:ffff880230699688  EFLAGS: 00010286
       RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
       RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
       RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
       R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
       R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
       FS:  00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
       Stack:
        ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
        ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
        0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
       Call Trace:
        [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0
        [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360
        [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150
        [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs]
        [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460
        [<ffffffff8123e58a>] generic_make_request+0xca/0x100
        [<ffffffff8123e639>] submit_bio+0x79/0x160
        [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs]
        [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs]
        [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs]
        [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs]
        [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0
        [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260
        [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120
      [btrfs]
        [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs]
        [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs]
        [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs]
        [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0
        [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0
        [<ffffffff81176ba0>] mount_fs+0x20/0xd0
        [<ffffffff81191096>] vfs_kern_mount+0x76/0x120
        [<ffffffff81193320>] do_mount+0x200/0xa40
        [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80
        [<ffffffff81193bf0>] SyS_mount+0x90/0xe0
        [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f
       Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
      89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7
      44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
       RIP  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
        RSP <ffff880230699688>
       CR2: 0000000000000018
       ---[ end trace 7a96042017ed21e2 ]---
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      b208c2f7
    • I
      Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing · 1357272f
      Ilya Dryomov 提交于
      free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
      can be processed before btrfs_scratch_superblock is called, which would
      result in a use-after-free on btrfs_device contents.  Fix this by
      zeroing the superblock before the rcu callback is registered.
      
      Cc: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      1357272f
    • I
      Btrfs: eliminate races in worker stopping code · 964fb15a
      Ilya Dryomov 提交于
      The current implementation of worker threads in Btrfs has races in
      worker stopping code, which cause all kinds of panics and lockups when
      running btrfs/011 xfstest in a loop.  The problem is that
      btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
      check_busy_worker and __btrfs_start_workers.
      
      E.g., check_idle_worker race flow:
      
             btrfs_stop_workers():            check_idle_worker(aworker):
      - grabs the lock
      - splices the idle list into the
        working list
      - removes the first worker from the
        working list
      - releases the lock to wait for
        its kthread's completion
                                        - grabs the lock
                                        - if aworker is on the working list,
                                          moves aworker from the working list
                                          to the idle list
                                        - releases the lock
      - grabs the lock
      - puts the worker
      - removes the second worker from the
        working list
                                    ......
              btrfs_stop_workers returns, aworker is on the idle list
                       FS is umounted, memory is freed
                                    ......
                    aworker is waken up, fireworks ensue
      
      With this applied, I wasn't able to trigger the problem in 48 hours,
      whereas previously I could reliably reproduce at least one of these
      races within an hour.
      Reported-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      964fb15a
    • L
      Btrfs: fix crash of compressed writes · 385fe0be
      Liu Bo 提交于
      The crash[1] is found by xfstests/generic/208 with "-o compress",
      it's not reproduced everytime, but it does panic.
      
      The bug is quite interesting, it's actually introduced by a recent commit
      (573aecaf,
      Btrfs: actually limit the size of delalloc range).
      
      Btrfs implements delay allocation, so during writeback, we
      (1) get a page A and lock it
      (2) search the state tree for delalloc bytes and lock all pages within the range
      (3) process the delalloc range, including find disk space and create
          ordered extent and so on.
      (4) submit the page A.
      
      It runs well in normal cases, but if we're in a racy case, eg.
      buffered compressed writes and aio-dio writes,
      sometimes we may fail to lock all pages in the 'delalloc' range,
      in which case, we need to fall back to search the state tree again with
      a smaller range limit(max_bytes = PAGE_CACHE_SIZE - offset).
      
      The mentioned commit has a side effect, that is, in the fallback case,
      we can find delalloc bytes before the index of the page we already have locked,
      so we're in the case of (delalloc_end <= *start) and return with (found > 0).
      
      This ends with not locking delalloc pages but making ->writepage still
      process them, and the crash happens.
      
      This fixes it by just thinking that we find nothing and returning to caller
      as the caller knows how to deal with it properly.
      
      [1]:
      ------------[ cut here ]------------
      kernel BUG at mm/page-writeback.c:2170!
      [...]
      CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
      [...]
      RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [...]
      [ 4934.248731] Stack:
      [ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
      [ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
      [ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
      [ 4934.248731] Call Trace:
      [ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
      [ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
      [ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
      [ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
      [ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
      [ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
      [ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
      [ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [ 4934.248731]  RSP <ffff8801869f9c48>
      [ 4934.280307] ---[ end trace 36f06d3f8750236a ]---
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      385fe0be
    • J
      Btrfs: fix transid verify errors when recovering log tree · 60e7cd3a
      Josef Bacik 提交于
      If we crash with a log, remount and recover that log, and then crash before we
      can commit another transaction we will get transid verify errors on the next
      mount.  This is because we were not zero'ing out the log when we committed the
      transaction after recovery.  This is ok as long as we commit another transaction
      at some point in the future, but if you abort or something else goes wrong you
      can end up in this weird state because the recovery stuff says that the tree log
      should have a generation+1 of the super generation, which won't be the case of
      the transaction that was started for recovery.  Fix this by removing the check
      and _always_ zero out the log portion of the super when we commit a transaction.
      This fixes the transid verify issues I was seeing with my force errors tests.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      60e7cd3a
    • T
      xfs: Use kmem_free() instead of free() · b2a42f78
      Thierry Reding 提交于
      This fixes a build failure caused by calling the free() function which
      does not exist in the Linux kernel.
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit aaaae980)
      b2a42f78
    • T
      xfs: fix memory leak in xlog_recover_add_to_trans · 9b3b77fe
      tinguely@sgi.com 提交于
      Free the memory in error path of xlog_recover_add_to_trans().
      Normally this memory is freed in recovery pass2, but is leaked
      in the error path.
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 519ccb81)
      9b3b77fe
    • D
      xfs: dirent dtype presence is dependent on directory magic numbers · 6d313498
      Dave Chinner 提交于
      The determination of whether a directory entry contains a dtype
      field originally was dependent on the filesystem having CRCs
      enabled. This meant that the format for dtype beign enabled could be
      determined by checking the directory block magic number rather than
      doing a feature bit check. This was useful in that it meant that we
      didn't need to pass a struct xfs_mount around to functions that
      were already supplied with a directory block header.
      
      Unfortunately, the introduction of dtype fields into the v4
      structure via a feature bit meant this "use the directory block
      magic number" method of discriminating the dirent entry sizes is
      broken. Hence we need to convert the places that use magic number
      checks to use feature bit checks so that they work correctly and not
      by chance.
      
      The current code works on v4 filesystems only because the dirent
      size roundup covers the extra byte needed by the dtype field in the
      places where this problem occurs.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 367993e7)
      6d313498