1. 06 5月, 2016 3 次提交
  2. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  3. 12 3月, 2016 1 次提交
  4. 23 2月, 2016 1 次提交
    • L
      Btrfs: fix lockdep deadlock warning due to dev_replace · 73beece9
      Liu Bo 提交于
      Xfstests btrfs/011 complains about a deadlock warning,
      
      [ 1226.649039] =========================================================
      [ 1226.649039] [ INFO: possible irq lock inversion dependency detected ]
      [ 1226.649039] 4.1.0+ #270 Not tainted
      [ 1226.649039] ---------------------------------------------------------
      [ 1226.652955] kswapd0/46 just changed the state of lock:
      [ 1226.652955]  (&delayed_node->mutex){+.+.-.}, at: [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
      [ 1226.652955] but this lock took another, RECLAIM_FS-unsafe lock in the past:
      [ 1226.652955]  (&fs_info->dev_replace.lock){+.+.+.}
      
      and interrupts could create inverse lock ordering between them.
      
      [ 1226.652955]
      other info that might help us debug this:
      [ 1226.652955] Chain exists of:
        &delayed_node->mutex --> &found->groups_sem --> &fs_info->dev_replace.lock
      
      [ 1226.652955]  Possible interrupt unsafe locking scenario:
      
      [ 1226.652955]        CPU0                    CPU1
      [ 1226.652955]        ----                    ----
      [ 1226.652955]   lock(&fs_info->dev_replace.lock);
      [ 1226.652955]                                local_irq_disable();
      [ 1226.652955]                                lock(&delayed_node->mutex);
      [ 1226.652955]                                lock(&found->groups_sem);
      [ 1226.652955]   <Interrupt>
      [ 1226.652955]     lock(&delayed_node->mutex);
      [ 1226.652955]
       *** DEADLOCK ***
      
      Commit 084b6e7c ("btrfs: Fix a lockdep warning when running xfstest.") tried
      to fix a similar one that has the exactly same warning, but with that, we still
      run to this.
      
      The above lock chain comes from
      btrfs_commit_transaction
        ->btrfs_run_delayed_items
          ...
          ->__btrfs_update_delayed_inode
            ...
            ->__btrfs_cow_block
               ...
               ->find_free_extent
                  ->cache_block_group
                    ->load_free_space_cache
                      ->btrfs_readpages
                        ->submit_one_bio
                          ...
                          ->__btrfs_map_block
                            ->btrfs_dev_replace_lock
      
      However, with high memory pressure, tasks which hold dev_replace.lock can
      be interrupted by kswapd and then kswapd is intended to release memory occupied
      by superblock, inodes and dentries, where we may call evict_inode, and it comes
      to
      
      [ 1226.652955]  [<ffffffff81458735>] __btrfs_release_delayed_node+0x45/0x1d0
      [ 1226.652955]  [<ffffffff81459e74>] btrfs_remove_delayed_node+0x24/0x30
      [ 1226.652955]  [<ffffffff8140c5fe>] btrfs_evict_inode+0x34e/0x700
      
      delayed_node->mutex may be acquired in __btrfs_release_delayed_node(), and it leads
      to a ABBA deadlock.
      
      To fix this, we can use "blocking rwlock" used in the case of extent_buffer, but
      things are simpler here since we only needs read's spinlock to blocking lock.
      
      With this, btrfs/011 no more produces warnings in dmesg.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      73beece9
  5. 11 2月, 2016 1 次提交
    • D
      btrfs: scrub: use GFP_KERNEL on the submission path · 58c4e173
      David Sterba 提交于
      Scrub is not on the critical writeback path we don't need to use
      GFP_NOFS for all allocations. The failures are handled and stats passed
      back to userspace.
      
      Let's use GFP_KERNEL on the paths where everything is ok, ie. setup the
      global structures and the IO submission paths.
      
      Functions that do the repair and fixups still use GFP_NOFS as we might
      want to skip any other filesystem activity if we encounter an error.
      This could turn out to be unnecessary, but requires more review compared
      to the easy cases in this patch.
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      58c4e173
  6. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  7. 20 1月, 2016 1 次提交
  8. 16 1月, 2016 1 次提交
  9. 07 1月, 2016 3 次提交
  10. 03 12月, 2015 1 次提交
  11. 25 11月, 2015 3 次提交
    • F
      Btrfs: fix scrub preventing unused block groups from being deleted · 758f2dfc
      Filipe Manana 提交于
      Currently scrub can race with the cleaner kthread when the later attempts
      to delete an unused block group, and the result is preventing the cleaner
      kthread from ever deleting later the block group - unless the block group
      becomes used and unused again. The following diagram illustrates that
      race:
      
                    CPU 1                                 CPU 2
      
       cleaner kthread
         btrfs_delete_unused_bgs()
      
           gets block group X from
           fs_info->unused_bgs and
           removes it from that list
      
                                                   scrub_enumerate_chunks()
      
                                                     searches device tree using
                                                     its commit root
      
                                                     finds device extent for
                                                     block group X
      
                                                     gets block group X from the tree
                                                     fs_info->block_group_cache_tree
                                                     (via btrfs_lookup_block_group())
      
                                                     sets bg X to RO
      
           sees the block group is
           already RO and therefore
           doesn't delete it nor adds
           it back to unused list
      
      So fix this by making scrub add the block group again to the list of
      unused block groups if the block group is still unused when it finished
      scrubbing it and it hasn't been removed already.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      758f2dfc
    • F
      Btrfs: fix race between scrub and block group deletion · 020d5b73
      Filipe Manana 提交于
      Scrub can race with the cleaner kthread deleting block groups that are
      unused (and with relocation too) leading to a failure with error -EINVAL
      that gets returned to user space.
      
      The following diagram illustrates how it happens:
      
                    CPU 1                                 CPU 2
      
       cleaner kthread
         btrfs_delete_unused_bgs()
      
           gets block group X from
           fs_info->unused_bgs
      
           sets block group to RO
      
             btrfs_remove_chunk(bg X)
      
               deletes device extents
      
                                               scrub_enumerate_chunks()
      
                                                 searches device tree using
                                                 its commit root
      
                                                 finds device extent for
                                                 block group X
      
                                                 gets block group X from the tree
                                                 fs_info->block_group_cache_tree
                                                 (via btrfs_lookup_block_group())
      
                                                 sets bg X to RO (again)
      
                btrfs_remove_block_group(bg X)
      
                  deletes block group from
                  fs_info->block_group_cache_tree
      
                  removes extent map from
                  fs_info->mapping_tree
      
                                                     scrub_chunk(offset X)
      
                                                       searches fs_info->mapping_tree
                                                       for extent map starting at
                                                       offset X
      
                                                          --> doesn't find any such
                                                              extent map
                                                          --> returns -EINVAL and scrub
                                                              errors out to userspace
                                                              with -EINVAL
      
      Fix this by dealing with an extent map lookup failure as an indicator of
      block group deletion.
      Issue reproduced with fstest btrfs/071.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      020d5b73
    • Z
      btrfs: Continue replace when set_block_ro failed · 76a8efa1
      Zhaolei 提交于
      xfstests/011 failed in node with small_size filesystem.
      Can be reproduced by following script:
        DEV_LIST="/dev/vdd /dev/vde"
        DEV_REPLACE="/dev/vdf"
      
        do_test()
        {
            local mkfs_opt="$1"
            local size="$2"
      
            dmesg -c >/dev/null
            umount $SCRATCH_MNT &>/dev/null
      
            echo  mkfs.btrfs -f $mkfs_opt "${DEV_LIST[*]}"
            mkfs.btrfs -f $mkfs_opt "${DEV_LIST[@]}" || return 1
            mount "${DEV_LIST[0]}" $SCRATCH_MNT
      
            echo -n "Writing big files"
            dd if=/dev/urandom of=$SCRATCH_MNT/t0 bs=1M count=1 >/dev/null 2>&1
            for ((i = 1; i <= size; i++)); do
                echo -n .
                /bin/cp $SCRATCH_MNT/t0 $SCRATCH_MNT/t$i || return 1
            done
            echo
      
            echo Start replace
            btrfs replace start -Bf "${DEV_LIST[0]}" "$DEV_REPLACE" $SCRATCH_MNT || {
                dmesg
                return 1
            }
            return 0
        }
      
        # Set size to value near fs size
        # for example, 1897 can trigger this bug in 2.6G device.
        #
        ./do_test "-d raid1 -m raid1" 1897
      
      System will report replace fail with following warning in dmesg:
       [  134.710853] BTRFS: dev_replace from /dev/vdd (devid 1) to /dev/vdf started
       [  135.542390] BTRFS: btrfs_scrub_dev(/dev/vdd, 1, /dev/vdf) failed -28
       [  135.543505] ------------[ cut here ]------------
       [  135.544127] WARNING: CPU: 0 PID: 4080 at fs/btrfs/dev-replace.c:428 btrfs_dev_replace_start+0x398/0x440()
       [  135.545276] Modules linked in:
       [  135.545681] CPU: 0 PID: 4080 Comm: btrfs Not tainted 4.3.0 #256
       [  135.546439] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
       [  135.547798]  ffffffff81c5bfcf ffff88003cbb3d28 ffffffff817fe7b5 0000000000000000
       [  135.548774]  ffff88003cbb3d60 ffffffff810a88f1 ffff88002b030000 00000000ffffffe4
       [  135.549774]  ffff88003c080000 ffff88003c082588 ffff88003c28ab60 ffff88003cbb3d70
       [  135.550758] Call Trace:
       [  135.551086]  [<ffffffff817fe7b5>] dump_stack+0x44/0x55
       [  135.551737]  [<ffffffff810a88f1>] warn_slowpath_common+0x81/0xc0
       [  135.552487]  [<ffffffff810a89e5>] warn_slowpath_null+0x15/0x20
       [  135.553211]  [<ffffffff81448c88>] btrfs_dev_replace_start+0x398/0x440
       [  135.554051]  [<ffffffff81412c3e>] btrfs_ioctl+0x1d2e/0x25c0
       [  135.554722]  [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
       [  135.555506]  [<ffffffff8111ab36>] ? current_kernel_time64+0x56/0xa0
       [  135.556304]  [<ffffffff81201e3d>] do_vfs_ioctl+0x30d/0x580
       [  135.557009]  [<ffffffff8114c7ba>] ? __audit_syscall_entry+0xaa/0xf0
       [  135.557855]  [<ffffffff810011d1>] ? do_audit_syscall_entry+0x61/0x70
       [  135.558669]  [<ffffffff8120d1c1>] ? __fget_light+0x61/0x90
       [  135.559374]  [<ffffffff81202124>] SyS_ioctl+0x74/0x80
       [  135.559987]  [<ffffffff81809857>] entry_SYSCALL_64_fastpath+0x12/0x6f
       [  135.560842] ---[ end trace 2a5c1fc3205abbdd ]---
      
      Reason:
       When big data writen to fs, the whole free space will be allocated
       for data chunk.
       And operation as scrub need to set_block_ro(), and when there is
       only one metadata chunk in system(or other metadata chunks
       are all full), the function will try to allocate a new chunk,
       and failed because no space in device.
      
      Fix:
       When set_block_ro failed for metadata chunk, it is not a problem
       because scrub_lock paused commit_trancaction in same time, and
       metadata are always cowed, so the on-the-fly writepages will not
       write data into same place with scrub/replace.
       Let replace continue in this case is no problem.
      
      Tested by above script, and xfstests/011, plus 100 times xfstests/070.
      
      Changelog v1->v2:
      1: Add detail comments in source and commit-message.
      2: Add dmesg detail into commit-message.
      3: Limit return value of -ENOSPC to be passed.
      All suggested by: Filipe Manana <fdmanana@gmail.com>
      Suggested-by: NFilipe Manana <fdmanana@gmail.com>
      Signed-off-by: NZhao Lei <zhaolei@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      76a8efa1
  12. 11 11月, 2015 6 次提交
  13. 08 10月, 2015 3 次提交
  14. 01 9月, 2015 2 次提交
  15. 14 8月, 2015 1 次提交
  16. 09 8月, 2015 11 次提交