1. 26 11月, 2015 3 次提交
  2. 25 11月, 2015 14 次提交
    • H
      btrfs: fix balance range usage filters in 4.4-rc · dba72cb3
      Holger Hoffstätte 提交于
      There's a regression in 4.4-rc since commit bc309467
      (btrfs: extend balance filter usage to take minimum and maximum) in that
      existing (non-ranged) balance with -dusage=x no longer works; all chunks
      are skipped.
      
      After staring at the code for a while and wondering why a non-ranged
      balance would even need min and max thresholds (..which then were not
      set correctly, leading to the bug) I realized that the only problem
      was the fact that the filter functions were named wrong, thanks to
      patching copypasta. Simply renaming both functions lets the existing
      btrfs-progs call balance with -dusage=x and now the non-ranged filter
      function is invoked, properly using only a single chunk limit.
      Signed-off-by: NHolger Hoffstätte <holger.hoffstaette@googlemail.com>
      Fixes: bc309467 ("btrfs: extend balance filter usage to take minimum and maximum")
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      dba72cb3
    • M
      btrfs: qgroup: account shared subtree during snapshot delete · 82bd101b
      Mark Fasheh 提交于
      Commit 0ed4792a ('btrfs: qgroup: Switch to new extent-oriented qgroup
      mechanism.') removed our qgroup accounting during
      btrfs_drop_snapshot(). Predictably, this results in qgroup numbers
      going bad shortly after a snapshot is removed.
      
      Fix this by adding a dirty extent record when we encounter extents during
      our shared subtree walk. This effectively restores the functionality we had
      with the original shared subtree walking code in 1152651a (btrfs: qgroup:
      account shared subtrees during snapshot delete).
      
      The idea with the original patch (and this one) is that shared subtrees can
      get skipped during drop_snapshot. The shared subtree walk then allows us a
      chance to visit those extents and add them to the qgroup work for later
      processing. This ultimately makes the accounting for drop snapshot work.
      
      The new qgroup code nicely handles all the other extents during the tree
      walk via the ref dec/inc functions so we don't have to add actions beyond
      what we had originally.
      Signed-off-by: NMark Fasheh <mfasheh@suse.de>
      Signed-off-by: NChris Mason <clm@fb.com>
      82bd101b
    • J
      Btrfs: use btrfs_get_fs_root in resolve_indirect_ref · 2d9e9776
      Josef Bacik 提交于
      The backref code will look up the fs_root we're trying to resolve our indirect
      refs for, unfortunately we use btrfs_read_fs_root_no_name, which returns -ENOENT
      if the ref is 0.  This isn't helpful for the qgroup stuff with snapshot delete
      as it won't be able to search down the snapshot we are deleting, which will
      cause us to miss roots.  So use btrfs_get_fs_root and send false for check_ref
      so we can always get the root we're looking for.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.de>
      Signed-off-by: NChris Mason <clm@fb.com>
      2d9e9776
    • J
      btrfs: qgroup: fix quota disable during rescan · 967ef513
      Justin Maggard 提交于
      There's a race condition that leads to a NULL pointer dereference if you
      disable quotas while a quota rescan is running.  To fix this, we just need
      to wait for the quota rescan worker to actually exit before tearing down
      the quota structures.
      Signed-off-by: NJustin Maggard <jmaggard@netgear.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      967ef513
    • F
      Btrfs: fix race between cleaner kthread and space cache writeout · 036a9348
      Filipe Manana 提交于
      When a block group becomes unused and the cleaner kthread is currently
      running, we can end up getting the current transaction aborted with error
      -ENOENT when we try to commit the transaction, leading to the following
      trace:
      
        [59779.258768] WARNING: CPU: 3 PID: 5990 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]()
        [59779.272594] BTRFS: Transaction aborted (error -2)
        (...)
        [59779.291137] Call Trace:
        [59779.291621]  [<ffffffff812566f4>] dump_stack+0x4e/0x79
        [59779.292543]  [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8
        [59779.293435]  [<ffffffffa04cb81f>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
        [59779.295000]  [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50
        [59779.296138]  [<ffffffffa04c2721>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs]
        [59779.297663]  [<ffffffffa04cb81f>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]
        [59779.299141]  [<ffffffffa0549b0d>] commit_cowonly_roots+0x1de/0x261 [btrfs]
        [59779.300359]  [<ffffffffa04dd5b6>] btrfs_commit_transaction+0x4c4/0x99c [btrfs]
        [59779.301805]  [<ffffffffa04b5df4>] btrfs_sync_fs+0x145/0x1ad [btrfs]
        [59779.302893]  [<ffffffff81196634>] sync_filesystem+0x7f/0x93
        (...)
        [59779.318186] ---[ end trace 577e2daff90da33a ]---
      
      The following diagram illustrates a sequence of steps leading to this
      problem:
      
             CPU 1                                             CPU 2
      
                                 <at transaction N>
      
                                                              adds bg A to list
                                                              fs_info->unused_bgs
      
                                                              adds bg B to list
                                                              fs_info->unused_bgs
      
                                 <transaction kthread
                                  commits transaction N
                                  and wakes up the
                                  cleaner kthread>
      
        cleaner kthread
          delete_unused_bgs()
      
            sees bg A in list
            fs_info->unused_bgs
      
            btrfs_start_transaction()
      
                                 <transaction N + 1 starts>
      
            deletes bg A
      
                                                              update_block_group(bg C)
      
                                                                --> adds bg C to list
                                                                    fs_info->unused_bgs
      
            deletes bg B
      
            sees bg C in the list
            fs_info->unused_bgs
      
            btrfs_remove_chunk(bg C)
              btrfs_remove_block_group(bg C)
      
                --> checks if the block group
                    is in a dirty list, and
                    because it isn't now, it
                    does nothing
      
                --> the block group item
                    is deleted from the
                    extent tree
      
                                                                --> adds bg C to list
                                                                    transaction->dirty_bgs
      
                                                               some task calls
                                                               btrfs_commit_transaction(t N + 1)
                                                                 commit_cowonly_roots()
                                                                   btrfs_write_dirty_block_groups()
                                                                     --> sees bg C in cur_trans->dirty_bgs
                                                                     --> calls write_one_cache_group()
                                                                         which returns -ENOENT because
                                                                         it did not find the block group
                                                                         item in the extent tree
                                                                     --> transaction aborte with -ENOENT
                                                                         because write_one_cache_group()
                                                                         returned that error
      
      So fix this by adding a block group to the list of dirty block groups
      before adding it to the list of unused block groups.
      
      This happened on a stress test using fsstress plus concurrent calls to
      fallocate 20G and truncate (releasing part of the space allocated with
      fallocate).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      036a9348
    • F
      Btrfs: fix scrub preventing unused block groups from being deleted · 758f2dfc
      Filipe Manana 提交于
      Currently scrub can race with the cleaner kthread when the later attempts
      to delete an unused block group, and the result is preventing the cleaner
      kthread from ever deleting later the block group - unless the block group
      becomes used and unused again. The following diagram illustrates that
      race:
      
                    CPU 1                                 CPU 2
      
       cleaner kthread
         btrfs_delete_unused_bgs()
      
           gets block group X from
           fs_info->unused_bgs and
           removes it from that list
      
                                                   scrub_enumerate_chunks()
      
                                                     searches device tree using
                                                     its commit root
      
                                                     finds device extent for
                                                     block group X
      
                                                     gets block group X from the tree
                                                     fs_info->block_group_cache_tree
                                                     (via btrfs_lookup_block_group())
      
                                                     sets bg X to RO
      
           sees the block group is
           already RO and therefore
           doesn't delete it nor adds
           it back to unused list
      
      So fix this by making scrub add the block group again to the list of
      unused block groups if the block group is still unused when it finished
      scrubbing it and it hasn't been removed already.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      758f2dfc
    • F
      Btrfs: fix race between scrub and block group deletion · 020d5b73
      Filipe Manana 提交于
      Scrub can race with the cleaner kthread deleting block groups that are
      unused (and with relocation too) leading to a failure with error -EINVAL
      that gets returned to user space.
      
      The following diagram illustrates how it happens:
      
                    CPU 1                                 CPU 2
      
       cleaner kthread
         btrfs_delete_unused_bgs()
      
           gets block group X from
           fs_info->unused_bgs
      
           sets block group to RO
      
             btrfs_remove_chunk(bg X)
      
               deletes device extents
      
                                               scrub_enumerate_chunks()
      
                                                 searches device tree using
                                                 its commit root
      
                                                 finds device extent for
                                                 block group X
      
                                                 gets block group X from the tree
                                                 fs_info->block_group_cache_tree
                                                 (via btrfs_lookup_block_group())
      
                                                 sets bg X to RO (again)
      
                btrfs_remove_block_group(bg X)
      
                  deletes block group from
                  fs_info->block_group_cache_tree
      
                  removes extent map from
                  fs_info->mapping_tree
      
                                                     scrub_chunk(offset X)
      
                                                       searches fs_info->mapping_tree
                                                       for extent map starting at
                                                       offset X
      
                                                          --> doesn't find any such
                                                              extent map
                                                          --> returns -EINVAL and scrub
                                                              errors out to userspace
                                                              with -EINVAL
      
      Fix this by dealing with an extent map lookup failure as an indicator of
      block group deletion.
      Issue reproduced with fstest btrfs/071.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      020d5b73
    • D
      btrfs: fix rcu warning during device replace · 31388ab2
      David Sterba 提交于
      The test btrfs/011 triggers a rcu warning
      Reviewed-by: NAnand Jain <anand.jain@oracle.com>
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      4.4.0-rc1-default+ #286 Tainted: G        W
      -------------------------------
      fs/btrfs/volumes.c:1977 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      4 locks held by btrfs/28786:
      
      0:  (&fs_info->dev_replace.lock_finishing_cancel_unmount){+.+...}, at: [<ffffffffa00bc785>] btrfs_dev_replace_finishing+0x45/0xa00 [btrfs]
      1:  (uuid_mutex){+.+.+.}, at: [<ffffffffa00bc84f>] btrfs_dev_replace_finishing+0x10f/0xa00 [btrfs]
      2:  (&fs_devs->device_list_mutex){+.+.+.}, at: [<ffffffffa00bc868>] btrfs_dev_replace_finishing+0x128/0xa00 [btrfs]
      3:  (&fs_info->chunk_mutex){+.+...}, at: [<ffffffffa00bc87d>] btrfs_dev_replace_finishing+0x13d/0xa00 [btrfs]
      
      stack backtrace:
      CPU: 0 PID: 28786 Comm: btrfs Tainted: G        W       4.4.0-rc1-default+ #286
      Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS ASNBCPT1.86C.0031.B00.1006301607 06/30/2010
      0000000000000001 ffff8800a07dfb48 ffffffff8141d47b 0000000000000001
      0000000000000001 0000000000000000 ffff8801464a4f00 ffff8800a07dfb78
      ffffffff810cd883 ffff880146eb9400 ffff8800a3698600 ffff8800a33fe220
      Call Trace:
      [<ffffffff8141d47b>] dump_stack+0x4f/0x74
      [<ffffffff810cd883>] lockdep_rcu_suspicious+0x103/0x140
      [<ffffffffa0071261>] btrfs_rm_dev_replace_remove_srcdev+0x111/0x130 [btrfs]
      [<ffffffff810d354d>] ? trace_hardirqs_on+0xd/0x10
      [<ffffffff81449536>] ? __percpu_counter_sum+0x66/0x80
      [<ffffffffa00bcc15>] btrfs_dev_replace_finishing+0x4d5/0xa00 [btrfs]
      [<ffffffffa00bc96e>] ? btrfs_dev_replace_finishing+0x22e/0xa00 [btrfs]
      [<ffffffffa00a8795>] ? btrfs_scrub_dev+0x415/0x6d0 [btrfs]
      [<ffffffffa003ea69>] ? btrfs_start_transaction+0x9/0x20 [btrfs]
      [<ffffffffa00bda79>] btrfs_dev_replace_start+0x339/0x590 [btrfs]
      [<ffffffff81196aa5>] ? __might_fault+0x95/0xa0
      [<ffffffffa0078638>] btrfs_ioctl_dev_replace+0x118/0x160 [btrfs]
      [<ffffffff811409c6>] ? stack_trace_call+0x46/0x70
      [<ffffffffa007c914>] ? btrfs_ioctl+0x24/0x1770 [btrfs]
      [<ffffffffa007ce43>] btrfs_ioctl+0x553/0x1770 [btrfs]
      [<ffffffff811409c6>] ? stack_trace_call+0x46/0x70
      [<ffffffff811d6eb1>] ? do_vfs_ioctl+0x21/0x5a0
      [<ffffffff811d6f1c>] do_vfs_ioctl+0x8c/0x5a0
      [<ffffffff811e3336>] ? __fget_light+0x86/0xb0
      [<ffffffff811e3369>] ? __fdget+0x9/0x20
      [<ffffffff811d7451>] ? SyS_ioctl+0x21/0x80
      [<ffffffff811d7483>] SyS_ioctl+0x53/0x80
      [<ffffffff81b1efd7>] entry_SYSCALL_64_fastpath+0x12/0x6f
      
      This is because of unprotected use of rcu_dereference in
      btrfs_scratch_superblocks. We can't add rcu locks around the whole
      function because we read the superblock.
      
      The fix will use the rcu string buffer directly without the rcu locking.
      Thi is safe as the device will not go away in the meantime. We're
      holding the device list mutexes.
      
      Restructuring the code to narrow down the rcu section turned out to be
      impossible, we need to call filp_open (through update_dev_time) on the
      buffer and this could call kmalloc/__might_sleep. We could call kstrdup
      with GFP_ATOMIC but it's not absolutely necessary.
      
      Fixes: 12b1c263 (Btrfs: enhance btrfs_scratch_superblock to scratch all superblocks)
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      31388ab2
    • Z
      btrfs: Continue replace when set_block_ro failed · 76a8efa1
      Zhaolei 提交于
      xfstests/011 failed in node with small_size filesystem.
      Can be reproduced by following script:
        DEV_LIST="/dev/vdd /dev/vde"
        DEV_REPLACE="/dev/vdf"
      
        do_test()
        {
            local mkfs_opt="$1"
            local size="$2"
      
            dmesg -c >/dev/null
            umount $SCRATCH_MNT &>/dev/null
      
            echo  mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}"
            mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1
            mount "${DEV_LIST[0]}" $SCRATCH_MNT
      
            echo -n "Writing big files"
            dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M count=1 >/dev/null 2>&1
            for ((i = 1; i <= size; i++)); do
                echo -n .
                /bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return 1
            done
            echo
      
            echo Start replace
            btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" $SCRATCH_MNT || {
                dmesg
                return 1
            }
            return 0
        }
      
        # Set size to value near fs size
        # for example, 1897 can trigger this bug in 2.6G device.
        #
        ./do_test "-d raid1 -m raid1" 1897
      
      System will report replace fail with following warning in dmesg:
       [  134.710853] BTRFS: dev_replace from /dev/vdd (devid 1) to /dev/vdf started
       [  135.542390] BTRFS: btrfs_scrub_dev(/dev/vdd, 1, /dev/vdf) failed -28
       [  135.543505] ------------[ cut here ]------------
       [  135.544127] WARNING: CPU: 0 PID: 4080 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x398/0x440()
       [  135.545276] Modules linked in:
       [  135.545681] CPU: 0 PID: 4080 Comm: btrfs Not tainted 4.3.0 #256
       [  135.546439] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
       [  135.547798]  ffffffff81c5bfcf ffff88003cbb3d28 ffffffff817fe7b5 0000000000000000
       [  135.548774]  ffff88003cbb3d60 ffffffff810a88f1 ffff88002b030000 00000000ffffffe4
       [  135.549774]  ffff88003c080000 ffff88003c082588 ffff88003c28ab60 ffff88003cbb3d70
       [  135.550758] Call Trace:
       [  135.551086]  [<ffffffff817fe7b5>] dump_stack+0x44/0x55
       [  135.551737]  [<ffffffff810a88f1>] warn_slowpath_common+0x81/0xc0
       [  135.552487]  [<ffffffff810a89e5>] warn_slowpath_null+0x15/0x20
       [  135.553211]  [<ffffffff81448c88>] btrfs_dev_replace_start+0x398/0x440
       [  135.554051]  [<ffffffff81412c3e>] btrfs_ioctl+0x1d2e/0x25c0
       [  135.554722]  [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
       [  135.555506]  [<ffffffff8111ab36>] ? current_kernel_time64+0x56/0xa0
       [  135.556304]  [<ffffffff81201e3d>] do_vfs_ioctl+0x30d/0x580
       [  135.557009]  [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
       [  135.557855]  [<ffffffff810011d1>] ? do_audit_syscall_entry+0x61/0x70
       [  135.558669]  [<ffffffff8120d1c1>] ? __fget_light+0x61/0x90
       [  135.559374]  [<ffffffff81202124>] SyS_ioctl+0x74/0x80
       [  135.559987]  [<ffffffff81809857>] entry_SYSCALL_64_fastpath+0x12/0x6f
       [  135.560842] ---[ end trace 2a5c1fc3205abbdd ]---
      
      Reason:
       When big data writen to fs, the whole free space will be allocated
       for data chunk.
       And operation as scrub need to set_block_ro(), and when there is
       only one metadata chunk in system(or other metadata chunks
       are all full), the function will try to allocate a new chunk,
       and failed because no space in device.
      
      Fix:
       When set_block_ro failed for metadata chunk, it is not a problem
       because scrub_lock paused commit_trancaction in same time, and
       metadata are always cowed, so the on-the-fly writepages will not
       write data into same place with scrub/replace.
       Let replace continue in this case is no problem.
      
      Tested by above script, and xfstests/011, plus 100 times xfstests/070.
      
      Changelog v1->v2:
      1: Add detail comments in source and commit-message.
      2: Add dmesg detail into commit-message.
      3: Limit return value of -ENOSPC to be passed.
      All suggested by: Filipe Manana <fdmanana@gmail.com>
      Suggested-by: NFilipe Manana <fdmanana@gmail.com>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      76a8efa1
    • D
      btrfs: fix clashing number of the enhanced balance usage filter · da02c689
      David Sterba 提交于
      I've accidentally picked an already used number for the enhanced usage
      filter represented by BTRFS_BALANCE_ARGS_USAGE_RANGE, clashing with
      BTRFS_BALANCE_ARGS_CONVERT. Introduced during the development phase,
      no backward compatibility issues.
      Reported-by: NHolger Hoffstätte <holger.hoffstaette@googlemail.com>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Fixes: bc309467 ("btrfs: extend balance filter usage to take minimum and maximum")
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      da02c689
    • F
      Btrfs: fix the number of transaction units needed to remove a block group · 7fd01182
      Filipe Manana 提交于
      We were using only 1 transaction unit when attempting to delete an unused
      block group but in reality we need 3 + N units, where N corresponds to the
      number of stripes. We were accounting only for the addition of the orphan
      item (for the block group's free space cache inode) but we were not
      accounting that we need to delete one block group item from the extent
      tree, one free space item from the tree of tree roots and N device extent
      items from the device tree.
      
      While one unit is not enough, it worked most of the time because for each
      single unit we are too pessimistic and assume an entire tree path, with
      the highest possible heigth (8), needs to be COWed with eventual node
      splits at every possible level in the tree, so there was usually enough
      reserved space for removing all the items and adding the orphan item.
      
      However after adding the orphan item, writepages() can by called by the VM
      subsystem against the btree inode when we are under memory pressure, which
      causes writeback to start for the nodes we COWed before, this forces the
      operation to remove the free space item to COW again some (or all of) the
      same nodes (in the tree of tree roots). Even without writepages() being
      called, we could fail with ENOSPC because these items are located in
      multiple trees and one of them might have a higher heigth and require
      node/leaf splits at many levels, exhausting all the reserved space before
      removing all the items and adding the orphan.
      
      In the kernel 4.0 release, commit 3d84be79 ("Btrfs: fix BUG_ON in
      btrfs_orphan_add() when delete unused block group"), we attempted to fix
      a BUG_ON due to ENOSPC when trying to add the orphan item by making the
      cleaner kthread reserve one transaction unit before attempting to remove
      the block group, but this was not enough. We had a couple user reports
      still hitting the same BUG_ON after 4.0, like Stefan Priebe's report on
      a 4.2-rc6 kernel for example:
      
          http://www.spinics.net/lists/linux-btrfs/msg46070.html
      
      So fix this by reserving all the necessary units of metadata.
      Reported-by: NStefan Priebe <s.priebe@profihost.ag>
      Fixes: 3d84be79 ("Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      7fd01182
    • F
      Btrfs: use global reserve when deleting unused block group after ENOSPC · 8eab77ff
      Filipe Manana 提交于
      It's possible to reach a state where the cleaner kthread isn't able to
      start a transaction to delete an unused block group due to lack of enough
      free metadata space and due to lack of unallocated device space to allocate
      a new metadata block group as well. If this happens try to use space from
      the global block group reserve just like we do for unlink operations, so
      that we don't reach a permanent state where starting a transaction for
      filesystem operations (file creation, renames, etc) keeps failing with
      -ENOSPC. Such an unfortunate state was observed on a machine where over
      a dozen unused data block groups existed and the cleaner kthread was
      failing to delete them due to ENOSPC error when attempting to start a
      transaction, and even running balance with a -dusage=0 filter failed with
      ENOSPC as well. Also unmounting and mounting again the filesystem didn't
      help. Allowing the cleaner kthread to use the global block reserve to
      delete the unused data block groups fixed the problem.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      8eab77ff
    • D
      Btrfs: tests: checking for NULL instead of IS_ERR() · 89b6c8d1
      Dan Carpenter 提交于
      btrfs_alloc_dummy_root() return an error pointer on failure, it never
      returns NULL.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      89b6c8d1
    • D
      btrfs: fix signed overflows in btrfs_sync_file · 9dcbeed4
      David Sterba 提交于
      The calculation of range length in btrfs_sync_file leads to signed
      overflow. This was caught by PaX gcc SIZE_OVERFLOW plugin.
      
      https://forums.grsecurity.net/viewtopic.php?f=1&t=4284
      
      The fsync call passes 0 and LLONG_MAX, the range length does not fit to
      loff_t and overflows, but the value is converted to u64 so it silently
      works as expected.
      
      The minimal fix is a typecast to u64, switching functions to take
      (start, end) instead of (start, len) would be more intrusive.
      
      Coccinelle script found that there's one more opencoded calculation of
      the length.
      
      <smpl>
      @@
      loff_t start, end;
      @@
      * end - start
      </smpl>
      
      CC: stable@vger.kernel.org
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9dcbeed4
  3. 24 11月, 2015 13 次提交
  4. 21 11月, 2015 5 次提交
  5. 20 11月, 2015 1 次提交
    • D
      block: protect rw_page against device teardown · 2e6edc95
      Dan Williams 提交于
      Fix use after free crashes like the following:
      
       general protection fault: 0000 [#1] SMP
       Call Trace:
        [<ffffffffa0050216>] ? pmem_do_bvec.isra.12+0xa6/0xf0 [nd_pmem]
        [<ffffffffa0050ba2>] pmem_rw_page+0x42/0x80 [nd_pmem]
        [<ffffffff8128fd90>] bdev_read_page+0x50/0x60
        [<ffffffff812972f0>] do_mpage_readpage+0x510/0x770
        [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
        [<ffffffff811d86dc>] ? lru_cache_add+0x1c/0x50
        [<ffffffff81297657>] mpage_readpages+0x107/0x170
        [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
        [<ffffffff8128fd20>] ? I_BDEV+0x20/0x20
        [<ffffffff8129058d>] blkdev_readpages+0x1d/0x20
        [<ffffffff811d615f>] __do_page_cache_readahead+0x28f/0x310
        [<ffffffff811d6039>] ? __do_page_cache_readahead+0x169/0x310
        [<ffffffff811c5abd>] ? pagecache_get_page+0x2d/0x1d0
        [<ffffffff811c76f6>] filemap_fault+0x396/0x530
        [<ffffffff811f816e>] __do_fault+0x4e/0xf0
        [<ffffffff811fce7d>] handle_mm_fault+0x11bd/0x1b50
      
      Cc: <stable@vger.kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Acked-by: NMatthew Wilcox <willy@linux.intel.com>
      [willy: symmetry fixups]
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      2e6edc95
  6. 17 11月, 2015 3 次提交
    • D
      dax: disable pmd mappings · ee82c9ed
      Dan Williams 提交于
      While dax pmd mappings are functional in the nominal path they trigger
      kernel crashes in the following paths:
      
       BUG: unable to handle kernel paging request at ffffea0004098000
       IP: [<ffffffff812362f7>] follow_trans_huge_pmd+0x117/0x3b0
       [..]
       Call Trace:
        [<ffffffff811f6573>] follow_page_mask+0x2d3/0x380
        [<ffffffff811f6708>] __get_user_pages+0xe8/0x6f0
        [<ffffffff811f7045>] get_user_pages_unlocked+0x165/0x1e0
        [<ffffffff8106f5b1>] get_user_pages_fast+0xa1/0x1b0
      
       kernel BUG at arch/x86/mm/gup.c:131!
       [..]
       Call Trace:
        [<ffffffff8106f34c>] gup_pud_range+0x1bc/0x220
        [<ffffffff8106f634>] get_user_pages_fast+0x124/0x1b0
      
       BUG: unable to handle kernel paging request at ffffea0004088000
       IP: [<ffffffff81235f49>] copy_huge_pmd+0x159/0x350
       [..]
       Call Trace:
        [<ffffffff811fad3c>] copy_page_range+0x34c/0x9f0
        [<ffffffff810a0daf>] copy_process+0x1b7f/0x1e10
        [<ffffffff810a11c1>] _do_fork+0x91/0x590
      
      All of these paths are interpreting a dax pmd mapping as a transparent
      huge page and making the assumption that the pfn is covered by the
      memmap, i.e. that the pfn has an associated struct page.  PTE mappings
      do not suffer the same fate since they have the _PAGE_SPECIAL flag to
      cause the gup path to fault.  We can do something similar for the PMD
      path, or otherwise defer pmd support for cases where a struct page is
      available.  For now, 4.4-rc and -stable need to disable dax pmd support
      by default.
      
      For development the "depends on BROKEN" line can be removed from
      CONFIG_FS_DAX_PMD.
      
      Cc: <stable@vger.kernel.org>
      Cc: Jan Kara <jack@suse.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ee82c9ed
    • G
      FS-Cache: Add missing initialization of ret in cachefiles_write_page() · cf897526
      Geert Uytterhoeven 提交于
      fs/cachefiles/rdwr.c: In function ‘cachefiles_write_page’:
      fs/cachefiles/rdwr.c:882: warning: ‘ret’ may be used uninitialized in
      this function
      
      If the jump to label "error" is taken, "ret" will indeed be
      uninitialized, and random stack data may be printed by the debug code.
      
      Fixes: 102f4d90 ("FS-Cache: Handle a write to the page immediately beyond the EOF marker")
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cf897526
    • D
      ext2, ext4: warn when mounting with dax enabled · ef83b6e8
      Dan Williams 提交于
      Similar to XFS warn when mounting DAX while it is still considered under
      development.  Also, aspects of the DAX implementation, for example
      synchronization against multiple faults and faults causing block
      allocation, depend on the correct implementation in the filesystem.  The
      maturity of a given DAX implementation is filesystem specific.
      
      Cc: <stable@vger.kernel.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: linux-ext4@vger.kernel.org
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: NDave Chinner <david@fromorbit.com>
      Acked-by: NJan Kara <jack@suse.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      ef83b6e8
  7. 14 11月, 2015 1 次提交
    • A
      f2fs: xattr simplifications · 29608d20
      Andreas Gruenbacher 提交于
      Now that the xattr handler is passed to the xattr handler operations, we
      have access to the attribute name prefix, so simplify
      f2fs_xattr_generic_list.
      
      Also, f2fs_xattr_advise_list is only ever called for
      f2fs_xattr_advise_handler; there is no need to double check for that.
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Changman Lee <cm224.lee@samsung.com>
      Cc: Chao Yu <chao2.yu@samsung.com>
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      29608d20