1. 28 2月, 2017 1 次提交
  2. 17 2月, 2017 1 次提交
  3. 06 12月, 2016 3 次提交
  4. 27 9月, 2016 1 次提交
  5. 25 8月, 2016 1 次提交
    • W
      btrfs: update btrfs_space_info's bytes_may_use timely · 18513091
      Wang Xiaoguang 提交于
      This patch can fix some false ENOSPC errors, below test script can
      reproduce one false ENOSPC error:
      	#!/bin/bash
      	dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=128
      	dev=$(losetup --show -f fs.img)
      	mkfs.btrfs -f -M $dev
      	mkdir /tmp/mntpoint
      	mount $dev /tmp/mntpoint
      	cd /tmp/mntpoint
      	xfs_io -f -c "falloc 0 $((64*1024*1024))" testfile
      
      Above script will fail for ENOSPC reason, but indeed fs still has free
      space to satisfy this request. Please see call graph:
      btrfs_fallocate()
      |-> btrfs_alloc_data_chunk_ondemand()
      |   bytes_may_use += 64M
      |-> btrfs_prealloc_file_range()
          |-> btrfs_reserve_extent()
              |-> btrfs_add_reserved_bytes()
              |   alloc_type is RESERVE_ALLOC_NO_ACCOUNT, so it does not
              |   change bytes_may_use, and bytes_reserved += 64M. Now
              |   bytes_may_use + bytes_reserved == 128M, which is greater
              |   than btrfs_space_info's total_bytes, false enospc occurs.
              |   Note, the bytes_may_use decrease operation will be done in
              |   end of btrfs_fallocate(), which is too late.
      
      Here is another simple case for buffered write:
                          CPU 1              |              CPU 2
                                             |
      |-> cow_file_range()                   |-> __btrfs_buffered_write()
          |-> btrfs_reserve_extent()         |   |
          |                                  |   |
          |                                  |   |
          |    .....                         |   |-> btrfs_check_data_free_space()
          |                                  |
          |                                  |
          |-> extent_clear_unlock_delalloc() |
      
      In CPU 1, btrfs_reserve_extent()->find_free_extent()->
      btrfs_add_reserved_bytes() do not decrease bytes_may_use, the decrease
      operation will be delayed to be done in extent_clear_unlock_delalloc().
      Assume in this case, btrfs_reserve_extent() reserved 128MB data, CPU2's
      btrfs_check_data_free_space() tries to reserve 100MB data space.
      If
      	100MB > data_sinfo->total_bytes - data_sinfo->bytes_used -
      		data_sinfo->bytes_reserved - data_sinfo->bytes_pinned -
      		data_sinfo->bytes_readonly - data_sinfo->bytes_may_use
      btrfs_check_data_free_space() will try to allcate new data chunk or call
      btrfs_start_delalloc_roots(), or commit current transaction in order to
      reserve some free space, obviously a lot of work. But indeed it's not
      necessary as long as decreasing bytes_may_use timely, we still have
      free space, decreasing 128M from bytes_may_use.
      
      To fix this issue, this patch chooses to update bytes_may_use for both
      data and metadata in btrfs_add_reserved_bytes(). For compress path, real
      extent length may not be equal to file content length, so introduce a
      ram_bytes argument for btrfs_reserve_extent(), find_free_extent() and
      btrfs_add_reserved_bytes(), it's becasue bytes_may_use is increased by
      file content length. Then compress path can update bytes_may_use
      correctly. Also now we can discard RESERVE_ALLOC_NO_ACCOUNT, RESERVE_ALLOC
      and RESERVE_FREE.
      
      As we know, usually EXTENT_DO_ACCOUNTING is used for error path. In
      run_delalloc_nocow(), for inode marked as NODATACOW or extent marked as
      PREALLOC, we also need to update bytes_may_use, but can not pass
      EXTENT_DO_ACCOUNTING, because it also clears metadata reservation, so
      here we introduce EXTENT_CLEAR_DATA_RESV flag to indicate btrfs_clear_bit_hook()
      to update btrfs_space_info's bytes_may_use.
      
      Meanwhile __btrfs_prealloc_file_range() will call
      btrfs_free_reserved_data_space() internally for both sucessful and failed
      path, btrfs_prealloc_file_range()'s callers does not need to call
      btrfs_free_reserved_data_space() any more.
      Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      18513091
  6. 26 7月, 2016 2 次提交
  7. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  8. 12 3月, 2016 1 次提交
  9. 16 1月, 2016 1 次提交
    • C
      Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots · f32e48e9
      Chandan Rajendra 提交于
      The following call trace is seen when btrfs/031 test is executed in a loop,
      
      [  158.661848] ------------[ cut here ]------------
      [  158.662634] WARNING: CPU: 2 PID: 890 at /home/chandan/repos/linux/fs/btrfs/ioctl.c:558 create_subvol+0x3d1/0x6ea()
      [  158.664102] BTRFS: Transaction aborted (error -2)
      [  158.664774] Modules linked in:
      [  158.665266] CPU: 2 PID: 890 Comm: btrfs Not tainted 4.4.0-rc6-g511711af #2
      [  158.666251] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [  158.667392]  ffffffff81c0a6b0 ffff8806c7c4f8e8 ffffffff81431fc8 ffff8806c7c4f930
      [  158.668515]  ffff8806c7c4f920 ffffffff81051aa1 ffff880c85aff000 ffff8800bb44d000
      [  158.669647]  ffff8808863b5c98 0000000000000000 00000000fffffffe ffff8806c7c4f980
      [  158.670769] Call Trace:
      [  158.671153]  [<ffffffff81431fc8>] dump_stack+0x44/0x5c
      [  158.671884]  [<ffffffff81051aa1>] warn_slowpath_common+0x81/0xc0
      [  158.672769]  [<ffffffff81051b27>] warn_slowpath_fmt+0x47/0x50
      [  158.673620]  [<ffffffff813bc98d>] create_subvol+0x3d1/0x6ea
      [  158.674440]  [<ffffffff813777c9>] btrfs_mksubvol.isra.30+0x369/0x520
      [  158.675376]  [<ffffffff8108a4aa>] ? percpu_down_read+0x1a/0x50
      [  158.676235]  [<ffffffff81377a81>] btrfs_ioctl_snap_create_transid+0x101/0x180
      [  158.677268]  [<ffffffff81377b52>] btrfs_ioctl_snap_create+0x52/0x70
      [  158.678183]  [<ffffffff8137afb4>] btrfs_ioctl+0x474/0x2f90
      [  158.678975]  [<ffffffff81144b8e>] ? vma_merge+0xee/0x300
      [  158.679751]  [<ffffffff8115be31>] ? alloc_pages_vma+0x91/0x170
      [  158.680599]  [<ffffffff81123f62>] ? lru_cache_add_active_or_unevictable+0x22/0x70
      [  158.681686]  [<ffffffff813d99cf>] ? selinux_file_ioctl+0xff/0x1d0
      [  158.682581]  [<ffffffff8117b791>] do_vfs_ioctl+0x2c1/0x490
      [  158.683399]  [<ffffffff813d3cde>] ? security_file_ioctl+0x3e/0x60
      [  158.684297]  [<ffffffff8117b9d4>] SyS_ioctl+0x74/0x80
      [  158.685051]  [<ffffffff819b2bd7>] entry_SYSCALL_64_fastpath+0x12/0x6a
      [  158.685958] ---[ end trace 4b63312de5a2cb76 ]---
      [  158.686647] BTRFS: error (device loop0) in create_subvol:558: errno=-2 No such entry
      [  158.709508] BTRFS info (device loop0): forced readonly
      [  158.737113] BTRFS info (device loop0): disk space caching is enabled
      [  158.738096] BTRFS error (device loop0): Remounting read-write after error is not allowed
      [  158.851303] BTRFS error (device loop0): cleaner transaction attach returned -30
      
      This occurs because,
      
      Mount filesystem
      Create subvol with ID 257
      Unmount filesystem
      Mount filesystem
      Delete subvol with ID 257
        btrfs_drop_snapshot()
          Add root corresponding to subvol 257 into
          btrfs_transaction->dropped_roots list
      Create new subvol (i.e. create_subvol())
        257 is returned as the next free objectid
        btrfs_read_fs_root_no_name()
          Finds the btrfs_root instance corresponding to the old subvol with ID 257
          in btrfs_fs_info->fs_roots_radix.
          Returns error since btrfs_root_item->refs has the value of 0.
      
      To fix the issue the commit initializes tree root's and subvolume root's
      highest_objectid when loading the roots from disk.
      Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f32e48e9
  10. 07 1月, 2016 3 次提交
  11. 22 10月, 2015 2 次提交
  12. 01 7月, 2015 2 次提交
    • F
      Btrfs: fix race between caching kthread and returning inode to inode cache · ae9d8f17
      Filipe Manana 提交于
      While the inode cache caching kthread is calling btrfs_unpin_free_ino(),
      we could have a concurrent call to btrfs_return_ino() that adds a new
      entry to the root's free space cache of pinned inodes. This concurrent
      call does not acquire the fs_info->commit_root_sem before adding a new
      entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem
      because the caching kthread calls btrfs_unpin_free_ino() after setting
      the caching state to BTRFS_CACHE_FINISHED and therefore races with
      the task calling btrfs_return_ino(), which is adding a new entry, while
      the former (caching kthread) is navigating the cache's rbtree, removing
      and freeing nodes from the cache's rbtree without acquiring the spinlock
      that protects the rbtree.
      
      This race resulted in memory corruption due to double free of struct
      btrfs_free_space objects because both tasks can end up doing freeing the
      same objects. Note that adding a new entry can result in merging it with
      other entries in the cache, in which case those entries are freed.
      This is particularly important as btrfs_free_space structures are also
      used for the block group free space caches.
      
      This memory corruption can be detected by a debugging kernel, which
      reports it with the following trace:
      
      [132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected
      [132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G        W       4.1.0-rc5-btrfs-next-10+ #1
      [132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014
      [132408.505075]  ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce
      [132408.505075]  ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68
      [132408.505075]  ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f
      [132408.505075] Call Trace:
      [132408.505075]  [<ffffffff8145eec7>] dump_stack+0x4f/0x7b
      [132408.505075]  [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2
      [132408.505075]  [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36
      [132408.505075]  [<ffffffff81155733>] __cache_free+0xe2/0x4b6
      [132408.505075]  [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs]
      [132408.505075]  [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28
      [132408.505075]  [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf
      [132408.505075]  [<ffffffff811563a1>] ? kfree+0xb6/0x14e
      [132408.505075]  [<ffffffff811563d0>] kfree+0xe5/0x14e
      [132408.505075]  [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs]
      [132408.505075]  [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs]
      [132408.505075]  [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs]
      [132408.505075]  [<ffffffff8106698f>] kthread+0xef/0xf7
      [132408.505075]  [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075]  [<ffffffff814653d2>] ret_from_fork+0x42/0x70
      [132408.505075]  [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad
      [132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
      [132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320
      [132409.503355] ------------[ cut here ]------------
      [132409.504241] kernel BUG at mm/slab.c:2571!
      
      Therefore fix this by having btrfs_unpin_free_ino() acquire the lock
      that protects the rbtree while doing the searches and removing entries.
      
      Fixes: 1c70d8fb ("Btrfs: fix inode caching vs tree log")
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      ae9d8f17
    • F
      Btrfs: use kmem_cache_free when freeing entry in inode cache · c3f4a168
      Filipe Manana 提交于
      The free space entries are allocated using kmem_cache_zalloc(),
      through __btrfs_add_free_space(), therefore we should use
      kmem_cache_free() and not kfree() to avoid any confusion and
      any potential problem. Looking at the kfree() definition at
      mm/slab.c it has the following comment:
      
        /*
         * (...)
         *
         * Don't free memory not originally allocated by kmalloc()
         * or you will run into trouble.
         */
      
      So better be safe and use kmem_cache_free().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      c3f4a168
  13. 11 4月, 2015 1 次提交
    • C
      Btrfs: allow block group cache writeout outside critical section in commit · 1bbc621e
      Chris Mason 提交于
      We loop through all of the dirty block groups during commit and write
      the free space cache.  In order to make sure the cache is currect, we do
      this while no other writers are allowed in the commit.
      
      If a large number of block groups are dirty, this can introduce long
      stalls during the final stages of the commit, which can block new procs
      trying to change the filesystem.
      
      This commit changes the block group cache writeout to take appropriate
      locks and allow it to run earlier in the commit.  We'll still have to
      redo some of the block groups, but it means we can get most of the work
      out of the way without blocking the entire FS.
      Signed-off-by: NChris Mason <clm@fb.com>
      1bbc621e
  14. 03 12月, 2014 1 次提交
    • F
      Btrfs: fix race between writing free space cache and trimming · 55507ce3
      Filipe Manana 提交于
      Trimming is completely transactionless, and the way it operates consists
      of hiding free space entries from a block group, perform the trim/discard
      and then make the free space entries visible again.
      Therefore while a free space entry is being trimmed, we can have free space
      cache writing running in parallel (as part of a transaction commit) which
      will miss the free space entry. This means that an unmount (or crash/reboot)
      after that transaction commit and mount again before another transaction
      starts/commits after the discard finishes, we will have some free space
      that won't be used again unless the free space cache is rebuilt. After the
      unmount, fsck (btrfsck, btrfs check) reports the issue like the following
      example:
      
              *** fsck.btrfs output ***
              checking extents
              checking free space cache
              There is no free space entry for 521764864-521781248
              There is no free space entry for 521764864-1103101952
              cache appears valid but isnt 29360128
              Checking filesystem on /dev/sdc
              UUID: b4789e27-4774-4626-98e9-ae8dfbfb0fb5
              found 1235681286 bytes used err is -22
              (...)
      
      Another issue caused by this race is a crash while writing bitmap entries
      to the cache, because while the cache writeout task accesses the bitmaps,
      the trim task can be concurrently modifying the bitmap or worse might
      be freeing the bitmap. The later case results in the following crash:
      
      [55650.804460] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
      [55650.804835] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop parport_pc parport i2c_piix4 psmouse evdev pcspkr microcode processor i2ccore serio_raw thermal_sys button ext4 crc16 jbd2 mbcache sg sd_mod crc_t10dif sr_mod cdrom crct10dif_generic crct10dif_common ata_generic virtio_scsi floppy ata_piix libata virtio_pci virtio_ring virtio scsi_mod e1000 [last unloaded: btrfs]
      [55650.806169] CPU: 1 PID: 31002 Comm: btrfs-transacti Tainted: G        W      3.17.0-rc5-btrfs-next-1+ #1
      [55650.806493] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [55650.806867] task: ffff8800b12f6410 ti: ffff880071538000 task.ti: ffff880071538000
      [55650.807166] RIP: 0010:[<ffffffffa037cf45>]  [<ffffffffa037cf45>] write_bitmap_entries+0x65/0xbb [btrfs]
      [55650.807514] RSP: 0018:ffff88007153bc30  EFLAGS: 00010246
      [55650.807687] RAX: 000000005d1ec000 RBX: ffff8800a665df08 RCX: 0000000000000400
      [55650.807885] RDX: ffff88005d1ec000 RSI: 6b6b6b6b6b6b6b6b RDI: ffff88005d1ec000
      [55650.808017] RBP: ffff88007153bc58 R08: 00000000ddd51536 R09: 00000000000001e0
      [55650.808017] R10: 0000000000000000 R11: 0000000000000037 R12: 6b6b6b6b6b6b6b6b
      [55650.808017] R13: ffff88007153bca8 R14: 6b6b6b6b6b6b6b6b R15: ffff88007153bc98
      [55650.808017] FS:  0000000000000000(0000) GS:ffff88023ec80000(0000) knlGS:0000000000000000
      [55650.808017] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [55650.808017] CR2: 0000000002273b88 CR3: 00000000b18f6000 CR4: 00000000000006e0
      [55650.808017] Stack:
      [55650.808017]  ffff88020e834e00 ffff880172d68db0 0000000000000000 ffff88019257c800
      [55650.808017]  ffff8801d42ea720 ffff88007153bd10 ffffffffa037d2fa ffff880224e99180
      [55650.808017]  ffff8801469a6188 ffff880224e99140 ffff880172d68c50 00000003000000b7
      [55650.808017] Call Trace:
      [55650.808017]  [<ffffffffa037d2fa>] __btrfs_write_out_cache+0x1ea/0x37f [btrfs]
      [55650.808017]  [<ffffffffa037d959>] btrfs_write_out_cache+0xa1/0xd8 [btrfs]
      [55650.808017]  [<ffffffffa033936b>] btrfs_write_dirty_block_groups+0x4b5/0x505 [btrfs]
      [55650.808017]  [<ffffffffa03aa98e>] commit_cowonly_roots+0x15e/0x1f7 [btrfs]
      [55650.808017]  [<ffffffff813eb9c7>] ? _raw_spin_lock+0xe/0x10
      [55650.808017]  [<ffffffffa0346e46>] btrfs_commit_transaction+0x411/0x882 [btrfs]
      [55650.808017]  [<ffffffffa03432a4>] transaction_kthread+0xf2/0x1a4 [btrfs]
      [55650.808017]  [<ffffffffa03431b2>] ? btrfs_cleanup_transaction+0x3d8/0x3d8 [btrfs]
      [55650.808017]  [<ffffffff8105966b>] kthread+0xb7/0xbf
      [55650.808017]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
      [55650.808017]  [<ffffffff813ebeac>] ret_from_fork+0x7c/0xb0
      [55650.808017]  [<ffffffff810595b4>] ? __kthread_parkme+0x67/0x67
      [55650.808017] Code: 4c 89 ef 8d 70 ff e8 d4 fc ff ff 41 8b 45 34 41 39 45 30 7d 5c 31 f6 4c 89 ef e8 80 f6 ff ff 49 8b 7d 00 4c 89 f6 b9 00 04 00 00 <f3> a5 4c 89 ef 41 8b 45 30 8d 70 ff e8 a3 fc ff ff 41 8b 45 34
      [55650.808017] RIP  [<ffffffffa037cf45>] write_bitmap_entries+0x65/0xbb [btrfs]
      [55650.808017]  RSP <ffff88007153bc30>
      [55650.815725] ---[ end trace 1c032e96b149ff86 ]---
      
      Fix this by serializing both tasks in such a way that cache writeout
      doesn't wait for the trim/discard of free space entries to finish and
      doesn't miss any free space entry.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      55507ce3
  15. 12 11月, 2014 1 次提交
  16. 18 9月, 2014 1 次提交
  17. 10 6月, 2014 1 次提交
  18. 25 4月, 2014 2 次提交
    • M
      Btrfs: fix inode caching vs tree log · 1c70d8fb
      Miao Xie 提交于
      Currently, with inode cache enabled, we will reuse its inode id immediately
      after unlinking file, we may hit something like following:
      
      |->iput inode
      |->return inode id into inode cache
      |->create dir,fsync
      |->power off
      
      An easy way to reproduce this problem is:
      
      mkfs.btrfs -f /dev/sdb
      mount /dev/sdb /mnt -o inode_cache,commit=100
      dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
      inode_id=`ls -i /mnt/data | awk '{print $1}'`
      rm -f /mnt/data
      
      i=1
      while [ 1 ]
      do
              mkdir /mnt/dir_$i
              test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
              if [ $test1 -eq $inode_id ]
              then
      		dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
      		echo b > /proc/sysrq-trigger
      	fi
      	sleep 1
              i=$(($i+1))
      done
      
      mount /dev/sdb /mnt
      umount /dev/sdb
      btrfs check /dev/sdb
      
      We fix this problem by adding unlinked inode's id into pinned tree,
      and we can not reuse them until committing transaction.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      1c70d8fb
    • W
      Btrfs: avoid triggering bug_on() when we fail to start inode caching task · e60efa84
      Wang Shilong 提交于
      When running stress test(including snapshots,balance,fstress), we trigger
      the following BUG_ON() which is because we fail to start inode caching task.
      
      [  181.131945] kernel BUG at fs/btrfs/inode-map.c:179!
      [  181.137963] invalid opcode: 0000 [#1] SMP
      [  181.217096] CPU: 11 PID: 2532 Comm: btrfs Not tainted 3.14.0 #1
      [  181.240521] task: ffff88013b621b30 ti: ffff8800b6ada000 task.ti: ffff8800b6ada000
      [  181.367506] Call Trace:
      [  181.371107]  [<ffffffffa036c1be>] btrfs_return_ino+0x9e/0x110 [btrfs]
      [  181.379191]  [<ffffffffa038082b>] btrfs_evict_inode+0x46b/0x4c0 [btrfs]
      [  181.387464]  [<ffffffff810b5a70>] ? autoremove_wake_function+0x40/0x40
      [  181.395642]  [<ffffffff811dc5fe>] evict+0x9e/0x190
      [  181.401882]  [<ffffffff811dcde3>] iput+0xf3/0x180
      [  181.408025]  [<ffffffffa03812de>] btrfs_orphan_cleanup+0x1ee/0x430 [btrfs]
      [  181.416614]  [<ffffffffa03a6abd>] btrfs_mksubvol.isra.29+0x3bd/0x450 [btrfs]
      [  181.425399]  [<ffffffffa03a6cd6>] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
      [  181.435059]  [<ffffffffa03a6e3b>] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
      [  181.444148]  [<ffffffffa03a9656>] btrfs_ioctl+0xf76/0x2b90 [btrfs]
      [  181.451971]  [<ffffffff8117e565>] ? handle_mm_fault+0x475/0xe80
      [  181.459509]  [<ffffffff8167ba0c>] ? __do_page_fault+0x1ec/0x520
      [  181.467046]  [<ffffffff81185b35>] ? do_mmap_pgoff+0x2f5/0x3c0
      [  181.474393]  [<ffffffff811d4da8>] do_vfs_ioctl+0x2d8/0x4b0
      [  181.481450]  [<ffffffff811d5001>] SyS_ioctl+0x81/0xa0
      [  181.488021]  [<ffffffff81680b69>] system_call_fastpath+0x16/0x1b
      
      We should avoid triggering BUG_ON() here, instead, we output warning messages
      and clear inode_cache option.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e60efa84
  19. 07 4月, 2014 1 次提交
    • J
      Btrfs: remove transaction from send · 9e351cc8
      Josef Bacik 提交于
      Lets try this again.  We can deadlock the box if we send on a box and try to
      write onto the same fs with the app that is trying to listen to the send pipe.
      This is because the writer could get stuck waiting for a transaction commit
      which is being blocked by the send.  So fix this by making sure looking at the
      commit roots is always going to be consistent.  We do this by keeping track of
      which roots need to have their commit roots swapped during commit, and then
      taking the commit_root_sem and swapping them all at once.  Then make sure we
      take a read lock on the commit_root_sem in cases where we search the commit root
      to make sure we're always looking at a consistent view of the commit roots.
      Previously we had problems with this because we would swap a fs tree commit root
      and then swap the extent tree commit root independently which would cause the
      backref walking code to screw up sometimes.  With this patch we no longer
      deadlock and pass all the weird send/receive corner cases.  Thanks,
      Reportedy-by: NHugo Mills <hugo@carfax.org.uk>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9e351cc8
  20. 12 11月, 2013 5 次提交
    • D
      btrfs: Use WARN_ON()'s return value in place of WARN_ON(1) · fae7f21c
      Dulshani Gunawardhana 提交于
      Use WARN_ON()'s return value in place of WARN_ON(1) for cleaner source
      code that outputs a more descriptive warnings. Also fix the styling
      warning of redundant braces that came up as a result of this fix.
      Signed-off-by: NDulshani Gunawardhana <dulshani.gunawardhana89@gmail.com>
      Reviewed-by: NZach Brown <zab@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      fae7f21c
    • S
      Btrfs: Don't allocate inode that is already in use · ff76b056
      Stefan Behrens 提交于
      Due to an off-by-one error, it is possible to reproduce a bug
      when the inode cache is used.
      
      The same inode number is assigned twice, the second time this
      leads to an EEXIST in btrfs_insert_empty_items().
      
      The issue can happen when a file is removed right after a subvolume
      is created and then a new inode number is created before the
      inodes in free_inode_pinned are processed.
      unlink() calls btrfs_return_ino() which calls start_caching() in this
      case which adds [highest_ino + 1, BTRFS_LAST_FREE_OBJECTID] by
      searching for the highest inode (which already cannot find the
      unlinked one anymore in btrfs_find_free_objectid()). So if this
      unlinked inode's number is equal to the highest_ino + 1 (or >= this value
      instead of > this value which was the off-by-one error), we mustn't add
      the inode number to free_ino_pinned (caching_thread() does it right).
      In this case we need to try directly to add the number to the inode_cache
      which will fail in this case.
      
      When this inode number is allocated while it is still in free_ino_pinned,
      it is allocated and still added to the free inode cache when the
      pinned inodes are processed, thus one of the following inode number
      allocations will get an inode that is already in use and fail with EEXIST
      in btrfs_insert_empty_items().
      
      One example which was created with the reproducer below:
      Create a snapshot, work in the newly created snapshot for the rest.
      In unlink(inode 34284) call btrfs_return_ino() which calls start_caching().
      start_caching() calls add_free_space [34284, 18446744073709517077].
      In btrfs_return_ino(), call start_caching pinned [34284, 1] which is wrong.
      mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
      btrfs_unpin_free_ino calls add_free_space [34284, 1].
      mkdir() call btrfs_find_ino_for_alloc() which returns the number 34284.
      EEXIST when the new inode is inserted.
      
      One possible reproducer is this one:
       #!/bin/sh
       # preparation
      TEST_DEV=/dev/sdc1
      TEST_MNT=/mnt
      umount ${TEST_MNT} 2>/dev/null || true
      mkfs.btrfs -f ${TEST_DEV}
      mount ${TEST_DEV} ${TEST_MNT} -o \
       rw,relatime,compress=lzo,space_cache,inode_cache
      btrfs subv create ${TEST_MNT}/s1
      for i in `seq 34027`; do touch ${TEST_MNT}/s1/${i}; done
      btrfs subv snap ${TEST_MNT}/s1 ${TEST_MNT}/s2
      FILENAME=`find ${TEST_MNT}/s1/ -inum 4085 | sed 's|^.*/\([^/]*\)$|\1|'`
      rm ${TEST_MNT}/s2/$FILENAME
      touch ${TEST_MNT}/s2/$FILENAME
       # the following steps can be repeated to reproduce the issue again and again
      [ -e ${TEST_MNT}/s3 ] && btrfs subv del ${TEST_MNT}/s3
      btrfs subv snap ${TEST_MNT}/s2 ${TEST_MNT}/s3
      rm ${TEST_MNT}/s3/$FILENAME
      touch ${TEST_MNT}/s3/$FILENAME
      ls -alFi ${TEST_MNT}/s?/$FILENAME
      touch ${TEST_MNT}/s3/_1 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/_1
      touch ${TEST_MNT}/s3/_2 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/_2
      touch ${TEST_MNT}/s3/__1 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/__1
      touch ${TEST_MNT}/s3/__2 || logger FAILED
      ls -alFi ${TEST_MNT}/s?/__2
       # if the above is not enough, add the following loop:
      for i in `seq 3 9`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
       #for i in `seq 3 34027`; do touch ${TEST_MNT}/s3/__${i} || logger FAILED; done
       # one of the touch(1) calls in s3 fail due to EEXIST because the inode is
       # already in use that btrfs_find_ino_for_alloc() returns.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      ff76b056
    • F
      Btrfs: remove path arg from btrfs_truncate_free_space_cache · 74514323
      Filipe David Borba Manana 提交于
      Not used for anything, and removing it avoids caller's need to
      allocate a path structure.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      74514323
    • F
      Btrfs: remove duplicated ino cache's inode lookup · 53645a91
      Filipe David Borba Manana 提交于
      We're doing a unnecessary extra lookup of the ino cache's
      inode when we already have it (and holding a reference)
      during the process of saving the ino cache contents to disk.
      Therefore remove this extra lookup.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      53645a91
    • S
      Btrfs: eliminate the exceptional root_tree refs=0 · 69e9c6c6
      Stefan Behrens 提交于
      The fact that btrfs_root_refs() returned 0 for the tree_root caused
      bugs in the past, therefore it is set to 1 with this patch and
      (hopefully) all affected code is adapted to this change.
      
      I verified this change by temporarily adding WARN_ON() checks
      everywhere where btrfs_root_refs() is used, checking whether the
      logic of the code is changed by btrfs_root_refs() returning 1
      instead of 0 for root->root_key.objectid == BTRFS_ROOT_TREE_OBJECTID.
      With these added checks, I ran the xfstests './check -g auto'.
      
      The two roots chunk_root and log_root_tree that are only referenced
      by the superblock and the log_roots below the log_root_tree still
      have btrfs_root_refs() == 0, only the tree_root is changed.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      69e9c6c6
  21. 18 5月, 2013 2 次提交
  22. 12 12月, 2012 1 次提交
    • M
      Btrfs: improve the noflush reservation · 08e007d2
      Miao Xie 提交于
      In some places(such as: evicting inode), we just can not flush the reserved
      space of delalloc, flushing the delayed directory index and delayed inode
      is OK, but we don't try to flush those things and just go back when there is
      no enough space to be reserved. This patch fixes this problem.
      
      We defined 3 types of the flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
      If we can in the transaction, we should not flush anything, or the deadlock
      would happen, so use NO_FLUSH. If we flushing the reserved space of delalloc
      would cause deadlock, use FLUSH_LIMIT. In the other cases, FLUSH_ALL is used,
      and we will flush all things.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      08e007d2
  23. 29 3月, 2012 1 次提交
  24. 22 3月, 2012 1 次提交
  25. 24 2月, 2012 1 次提交
  26. 17 1月, 2012 1 次提交
  27. 11 11月, 2011 1 次提交
    • M
      Btrfs: fix no reserved space for writing out inode cache · ba38eb4d
      Miao Xie 提交于
      I-node cache forgets to reserve the space when writing out it. And when
      we do some stress test, such as synctest, it will trigger WARN_ON() in
      use_block_rsv().
      
      WARNING: at fs/btrfs/extent-tree.c:5718 btrfs_alloc_free_block+0xbf/0x281 [btrfs]()
      ...
      Call Trace:
       [<ffffffff8104df86>] warn_slowpath_common+0x80/0x98
       [<ffffffff8104dfb3>] warn_slowpath_null+0x15/0x17
       [<ffffffffa0369c60>] btrfs_alloc_free_block+0xbf/0x281 [btrfs]
       [<ffffffff810cbcb8>] ? __set_page_dirty_nobuffers+0xfe/0x108
       [<ffffffffa035c040>] __btrfs_cow_block+0x118/0x3b5 [btrfs]
       [<ffffffffa035c7ba>] btrfs_cow_block+0x103/0x14e [btrfs]
       [<ffffffffa035e4c4>] btrfs_search_slot+0x249/0x6a4 [btrfs]
       [<ffffffffa036d086>] btrfs_lookup_inode+0x2a/0x8a [btrfs]
       [<ffffffffa03788b7>] btrfs_update_inode+0xaa/0x141 [btrfs]
       [<ffffffffa036d7ec>] btrfs_save_ino_cache+0xea/0x202 [btrfs]
       [<ffffffffa03a761e>] ? btrfs_update_reloc_root+0x17e/0x197 [btrfs]
       [<ffffffffa0373867>] commit_fs_roots+0xaa/0x158 [btrfs]
       [<ffffffffa03746a6>] btrfs_commit_transaction+0x405/0x731 [btrfs]
       [<ffffffff810690df>] ? wake_up_bit+0x25/0x25
       [<ffffffffa039d652>] ? btrfs_log_dentry_safe+0x43/0x51 [btrfs]
       [<ffffffffa0381c5f>] btrfs_sync_file+0x16a/0x198 [btrfs]
       [<ffffffff81122806>] ? mntput+0x21/0x23
       [<ffffffff8112d150>] vfs_fsync_range+0x18/0x21
       [<ffffffff8112d170>] vfs_fsync+0x17/0x19
       [<ffffffff8112d316>] do_fsync+0x29/0x3e
       [<ffffffff8112d348>] sys_fsync+0xb/0xf
       [<ffffffff81468352>] system_call_fastpath+0x16/0x1b
      
      Sometimes it causes BUG_ON() in the reservation code of the delayed inode
      is triggered.
      
      So we must reserve enough space for inode cache.
      
      Note: If we can not reserve the enough space for inode cache, we will
      give up writing out it.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      ba38eb4d