1. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  2. 07 1月, 2016 3 次提交
  3. 16 12月, 2015 1 次提交
    • C
      Btrfs: check for empty bitmap list in setup_cluster_bitmaps · 1b9b922a
      Chris Mason 提交于
      Dave Jones found a warning from kasan in setup_cluster_bitmaps()
      
      ==================================================================
      BUG: KASAN: stack-out-of-bounds in setup_cluster_bitmap+0xc4/0x5a0 at
      addr ffff88039bef6828
      Read of size 8 by task nfsd/1009
      page:ffffea000e6fbd80 count:0 mapcount:0 mapping:          (null)
      index:0x0
      flags: 0x8000000000000000()
      page dumped because: kasan: bad access detected
      CPU: 1 PID: 1009 Comm: nfsd Tainted: G        W
      4.4.0-rc3-backup-debug+ #1
       ffff880065647b50 000000006bb712c2 ffff88039bef6640 ffffffffa680a43e
       0000004559c00000 ffff88039bef66c8 ffffffffa62638d1 ffffffffa61121c0
       ffff8803a5769de8 0000000000000296 ffff8803a5769df0 0000000000046280
      Call Trace:
       [<ffffffffa680a43e>] dump_stack+0x4b/0x6d
       [<ffffffffa62638d1>] kasan_report_error+0x501/0x520
       [<ffffffffa61121c0>] ? debug_show_all_locks+0x1e0/0x1e0
       [<ffffffffa6263948>] kasan_report+0x58/0x60
       [<ffffffffa6814b00>] ? rb_last+0x10/0x40
       [<ffffffffa66f8af4>] ? setup_cluster_bitmap+0xc4/0x5a0
       [<ffffffffa6262ead>] __asan_load8+0x5d/0x70
       [<ffffffffa66f8af4>] setup_cluster_bitmap+0xc4/0x5a0
       [<ffffffffa66f675a>] ? setup_cluster_no_bitmap+0x6a/0x400
       [<ffffffffa66fcd16>] btrfs_find_space_cluster+0x4b6/0x640
       [<ffffffffa66fc860>] ? btrfs_alloc_from_cluster+0x4e0/0x4e0
       [<ffffffffa66fc36e>] ? btrfs_return_cluster_to_free_space+0x9e/0xb0
       [<ffffffffa702dc37>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffffa666a1a1>] find_free_extent+0xba1/0x1520
      
      Andrey noticed this was because we were doing list_first_entry on a list
      that might be empty.  Rework the tests a bit so we don't do that.
      Signed-off-by: NChris Mason <clm@fb.com>
      Reprorted-by: NAndrey Ryabinin <ryabinin.a.a@gmail.com>
      Reported-by: NDave Jones <dsj@fb.com>
      1b9b922a
  4. 10 12月, 2015 1 次提交
  5. 03 12月, 2015 1 次提交
  6. 07 11月, 2015 1 次提交
  7. 22 10月, 2015 5 次提交
  8. 08 10月, 2015 1 次提交
  9. 29 7月, 2015 1 次提交
  10. 03 6月, 2015 1 次提交
  11. 11 5月, 2015 1 次提交
    • F
      Btrfs: fix crash after inode cache writeback failure · e43699d4
      Filipe Manana 提交于
      If the writeback of an inode cache failed we were unnecessarilly
      attempting to release again the delalloc metadata that we previously
      reserved. However attempting to do this a second time triggers an
      assertion at drop_outstanding_extent() because we have no more
      outstanding extents for our inode cache's inode. If we were able
      to start writeback of the cache the reserved metadata space is
      released at btrfs_finished_ordered_io(), even if an error happens
      during writeback.
      
      So make sure we don't repeat the metadata space release if writeback
      started for our inode cache.
      
      This issue was trivial to reproduce by running the fstest btrfs/088
      with "-o inode_cache", which triggered the assertion leading to a
      BUG() call and requiring a reboot in order to run the remaining
      fstests. Trace produced by btrfs/088:
      
      [255289.385904] BTRFS: assertion failed: BTRFS_I(inode)->outstanding_extents >= num_extents, file: fs/btrfs/extent-tree.c, line: 5276
      [255289.388094] ------------[ cut here ]------------
      [255289.389184] kernel BUG at fs/btrfs/ctree.h:4057!
      [255289.390125] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      (...)
      [255289.392068] Call Trace:
      [255289.392068]  [<ffffffffa035e774>] drop_outstanding_extent+0x3d/0x6d [btrfs]
      [255289.392068]  [<ffffffffa0364988>] btrfs_delalloc_release_metadata+0x54/0xe3 [btrfs]
      [255289.392068]  [<ffffffffa03b4174>] btrfs_write_out_ino_cache+0x95/0xad [btrfs]
      [255289.392068]  [<ffffffffa036f5c4>] btrfs_save_ino_cache+0x275/0x2dc [btrfs]
      [255289.392068]  [<ffffffffa03e2d83>] commit_fs_roots.isra.12+0xaa/0x137 [btrfs]
      [255289.392068]  [<ffffffff8107d33d>] ? trace_hardirqs_on+0xd/0xf
      [255289.392068]  [<ffffffffa037841f>] ? btrfs_commit_transaction+0x4b1/0x9c9 [btrfs]
      [255289.392068]  [<ffffffff814351a4>] ? _raw_spin_unlock+0x32/0x46
      [255289.392068]  [<ffffffffa037842e>] btrfs_commit_transaction+0x4c0/0x9c9 [btrfs]
      (...)
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e43699d4
  12. 07 5月, 2015 1 次提交
  13. 26 4月, 2015 1 次提交
  14. 25 4月, 2015 1 次提交
  15. 24 4月, 2015 1 次提交
    • C
      Btrfs: fix inode cache writeout · 85db36cf
      Chris Mason 提交于
      The code to fix stalls during free spache cache IO wasn't using
      the correct root when waiting on the IO for inode caches.  This
      is only a problem when the inode cache is enabled with
      
      mount -o inode_cache
      
      This fixes the inode cache writeout to preserve any error values and
      makes sure not to override the root when inode cache writeout is done.
      Reported-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      85db36cf
  16. 11 4月, 2015 5 次提交
    • C
      Btrfs: allow block group cache writeout outside critical section in commit · 1bbc621e
      Chris Mason 提交于
      We loop through all of the dirty block groups during commit and write
      the free space cache.  In order to make sure the cache is currect, we do
      this while no other writers are allowed in the commit.
      
      If a large number of block groups are dirty, this can introduce long
      stalls during the final stages of the commit, which can block new procs
      trying to change the filesystem.
      
      This commit changes the block group cache writeout to take appropriate
      locks and allow it to run earlier in the commit.  We'll still have to
      redo some of the block groups, but it means we can get most of the work
      out of the way without blocking the entire FS.
      Signed-off-by: NChris Mason <clm@fb.com>
      1bbc621e
    • C
      Btrfs: don't use highmem for free space cache pages · 2b108268
      Chris Mason 提交于
      In order to create the free space cache concurrently with FS modifications,
      we need to take a few block group locks.
      
      The cache code also does kmap, which would schedule with the locks held.
      Instead of going through kmap_atomic, lets just use lowmem for the cache
      pages.
      Signed-off-by: NChris Mason <clm@fb.com>
      2b108268
    • C
      Btrfs: two stage dirty block group writeout · c9dc4c65
      Chris Mason 提交于
      Block group cache writeout is currently waiting on the pages for each
      block group cache before moving on to writing the next one.  This commit
      switches things around to send down all the caches and then wait on them
      in batches.
      
      The end result is much faster, since we're keeping the disk pipeline
      full.
      Signed-off-by: NChris Mason <clm@fb.com>
      c9dc4c65
    • C
      btrfs: move struct io_ctl into ctree.h and rename it · 4c6d1d85
      Chris Mason 提交于
      We'll need to put the io_ctl into the block_group cache struct, so
      name it struct btrfs_io_ctl and move it into ctree.h
      Signed-off-by: NChris Mason <clm@fb.com>
      4c6d1d85
    • C
      btrfs: actively run the delayed refs while deleting large files · 28ed1345
      Chris Mason 提交于
      When we are deleting large files with large extents, we are building up
      a huge set of delayed refs for processing.  Truncate isn't checking
      often enough to see if we need to back off and process those, or let
      a commit proceed.
      
      The end result is long stalls after the rm, and very long commit times.
      During the commits, other processes back up waiting to start new
      transactions and we get into trouble.
      Signed-off-by: NChris Mason <clm@fb.com>
      28ed1345
  17. 04 3月, 2015 7 次提交
  18. 21 2月, 2015 1 次提交
  19. 03 2月, 2015 1 次提交
  20. 22 1月, 2015 1 次提交
    • J
      Btrfs: track dirty block groups on their own list · ce93ec54
      Josef Bacik 提交于
      Currently any time we try to update the block groups on disk we will walk _all_
      block groups and check for the ->dirty flag to see if it is set.  This function
      can get called several times during a commit.  So if you have several terabytes
      of data you will be a very sad panda as we will loop through _all_ of the block
      groups several times, which makes the commit take a while which slows down the
      rest of the file system operations.
      
      This patch introduces a dirty list for the block groups that we get added to
      when we dirty the block group for the first time.  Then we simply update any
      block groups that have been dirtied since the last time we called
      btrfs_write_dirty_block_groups.  This allows us to clean up how we write the
      free space cache out so it is much cleaner.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      ce93ec54
  21. 11 12月, 2014 2 次提交
  22. 03 12月, 2014 2 次提交
    • F
      Btrfs: fix memory leak after block remove + trimming · 946ddbe8
      Filipe Manana 提交于
      There was a free space entry structure memeory leak if a block
      group is remove while a free space entry is being trimmed, which
      the following diagram explains:
      
                 CPU 1                                          CPU 2
      
        btrfs_trim_block_group()
            trim_no_bitmap()
                remove free space entry from
                block group cache's rbtree
                do_trimming()
      
                                                      btrfs_remove_block_group()
                                                          btrfs_remove_free_space_cache()
      
                    add back free space entry to
                    block group's cache rbtree
        btrfs_put_block_group()
      
                                                          (...)
                                                          btrfs_put_block_group()
                                                              kfree(bg->free_space_ctl)
                                                              kfree(bg)
      
      The free space entry added after doing the discard of its respective
      range ends up never being freed.
      Detected after doing an "rmmod btrfs" after running the stress test
      recently submitted for fstests:
      
      [ 8234.642212] kmem_cache_destroy btrfs_free_space: Slab cache still has objects
      [ 8234.642657] CPU: 1 PID: 32276 Comm: rmmod Tainted: G        W    L 3.17.0-rc5-btrfs-next-2+ #1
      [ 8234.642660] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [ 8234.642664]  0000000000000000 ffff8801af1b3eb8 ffffffff8140c7b6 ffff8801dbedd0c0
      [ 8234.642670]  ffff8801af1b3ed0 ffffffff811149ce 0000000000000000 ffff8801af1b3ee0
      [ 8234.642676]  ffffffffa042dbe7 ffff8801af1b3ef0 ffffffffa0487422 ffff8801af1b3f78
      [ 8234.642682] Call Trace:
      [ 8234.642692]  [<ffffffff8140c7b6>] dump_stack+0x4d/0x66
      [ 8234.642699]  [<ffffffff811149ce>] kmem_cache_destroy+0x4d/0x92
      [ 8234.642731]  [<ffffffffa042dbe7>] btrfs_destroy_cachep+0x63/0x76 [btrfs]
      [ 8234.642757]  [<ffffffffa0487422>] exit_btrfs_fs+0x9/0xbe7 [btrfs]
      [ 8234.642762]  [<ffffffff810a76a5>] SyS_delete_module+0x155/0x1c6
      [ 8234.642768]  [<ffffffff8122a7eb>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [ 8234.642773]  [<ffffffff814122d2>] system_call_fastpath+0x16/0x1b
      
      This applies on top (depends on) of my previous patch titled:
      "Btrfs: fix race between fs trimming and block group remove/allocation"
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      946ddbe8
    • F
      Btrfs: fix race between writing free space cache and trimming · 55507ce3
      Filipe Manana 提交于
      Trimming is completely transactionless, and the way it operates consists
      of hiding free space entries from a block group, perform the trim/discard
      and then make the free space entries visible again.
      Therefore while a free space entry is being trimmed, we can have free space
      cache writing running in parallel (as part of a transaction commit) which
      will miss the free space entry. This means that an unmount (or crash/reboot)
      after that transaction commit and mount again before another transaction
      starts/commits after the discard finishes, we will have some free space
      that won't be used again unless the free space cache is rebuilt. After the
      unmount, fsck (btrfsck, btrfs check) reports the issue like the following
      example:
      
              *** fsck.btrfs output ***
              checking extents
              checking free space cache
              There is no free space entry for 521764864-521781248
              There is no free space entry for 521764864-1103101952
              cache appears valid but isnt 29360128
              Checking filesystem on /dev/sdc
              UUID: b4789e27-4774-4626-98e9-ae8dfbfb0fb5
              found 1235681286 bytes used err is -22
              (...)
      
      Another issue caused by this race is a crash while writing bitmap entries
      to the cache, because while the cache writeout task accesses the bitmaps,
      the trim task can be concurrently modifying the bitmap or worse might
      be freeing the bitmap. The later case results in the following crash:
      
      [55650.804460] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [55650.804835] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop parport_pc parport i2c_piix4 psmouse evdev pcspkr microcode processor i2ccore serio_raw thermal_sys button ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif sr_mod cdrom crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata virtio_pci virtio_ring virtio scsi_mod e1000 [last unloaded: btrfs]
      [55650.806169] CPU: 1 PID: 31002 Comm: btrfs-transacti Tainted: G        W      3.17.0-rc5-btrfs-next-1+ #1
      [55650.806493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [55650.806867] task: ffff8800b12f6410 ti: ffff880071538000 task.ti: ffff880071538000
      [55650.807166] RIP: 0010:[<ffffffffa037cf45>]  [<ffffffffa037cf45>] write_bitmap_entries+0x65/0xbb [btrfs]
      [55650.807514] RSP: 0018:ffff88007153bc30  EFLAGS: 00010246
      [55650.807687] RAX: 000000005d1ec000 RBX: ffff8800a665df08 RCX: 0000000000000400
      [55650.807885] RDX: ffff88005d1ec000 RSI: 6b6b6b6b6b6b6b6b RDI: ffff88005d1ec000
      [55650.808017] RBP: ffff88007153bc58 R08: 00000000ddd51536 R09: 00000000000001e0
      [55650.808017] R10: 0000000000000000 R11: 0000000000000037 R12: 6b6b6b6b6b6b6b6b
      [55650.808017] R13: ffff88007153bca8 R14: 6b6b6b6b6b6b6b6b R15: ffff88007153bc98
      [55650.808017] FS:  0000000000000000(0000) GS:ffff88023ec80000(0000) knlGS:0000000000000000
      [55650.808017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [55650.808017] CR2: 0000000002273b88 CR3: 00000000b18f6000 CR4: 00000000000006e0
      [55650.808017] Stack:
      [55650.808017]  ffff88020e834e00 ffff880172d68db0 0000000000000000 ffff88019257c800
      [55650.808017]  ffff8801d42ea720 ffff88007153bd10 ffffffffa037d2fa ffff880224e99180
      [55650.808017]  ffff8801469a6188 ffff880224e99140 ffff880172d68c50 00000003000000b7
      [55650.808017] Call Trace:
      [55650.808017]  [<ffffffffa037d2fa>] __btrfs_write_out_cache+0x1ea/0x37f [btrfs]
      [55650.808017]  [<ffffffffa037d959>] btrfs_write_out_cache+0xa1/0xd8 [btrfs]
      [55650.808017]  [<ffffffffa033936b>] btrfs_write_dirty_block_groups+0x4b5/0x505 [btrfs]
      [55650.808017]  [<ffffffffa03aa98e>] commit_cowonly_roots+0x15e/0x1f7 [btrfs]
      [55650.808017]  [<ffffffff813eb9c7>] ? _raw_spin_lock+0xe/0x10
      [55650.808017]  [<ffffffffa0346e46>] btrfs_commit_transaction+0x411/0x882 [btrfs]
      [55650.808017]  [<ffffffffa03432a4>] transaction_kthread+0xf2/0x1a4 [btrfs]
      [55650.808017]  [<ffffffffa03431b2>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
      [55650.808017]  [<ffffffff8105966b>] kthread+0xb7/0xbf
      [55650.808017]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
      [55650.808017]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
      [55650.808017]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
      [55650.808017] Code: 4c 89 ef 8d 70 ff e8 d4 fc ff ff 41 8b 45 34 41 39 45 30 7d 5c 31 f6 4c 89 ef e8 80 f6 ff ff 49 8b 7d 00 4c 89 f6 b9 00 04 00 00 <f3> a5 4c 89 ef 41 8b 45 30 8d 70 ff e8 a3 fc ff ff 41 8b 45 34
      [55650.808017] RIP  [<ffffffffa037cf45>] write_bitmap_entries+0x65/0xbb [btrfs]
      [55650.808017]  RSP <ffff88007153bc30>
      [55650.815725] ---[ end trace 1c032e96b149ff86 ]---
      
      Fix this by serializing both tasks in such a way that cache writeout
      doesn't wait for the trim/discard of free space entries to finish and
      doesn't miss any free space entry.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      55507ce3