1. 15 Jan, 2013 1 commit
    • J
      Btrfs: add orphan before truncating pagecache · f3fe820c
      Josef Bacik committed
      Running xfstests 83 in a loop would sometimes fail the fsck.  This happens
      because if we invalidate a page that already has an ordered extent setup for
      it we will complete the ordered extent ourselves, assuming that the truncate
      will clean everything up.  The problem with this is there is plenty of time
      for the truncate to fail after we've done this work.  So to fix this we need
      to add the orphan item first to make sure the cleanup gets done properly,
      and then we can truncate the pagecache and all that stuff and be safe.  This
      fixes the btrfsck failures I was seeing while running 83 in a loop.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      f3fe820c
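The ordering argument in this commit can be illustrated with a toy user-space model. This is only a sketch under stated assumptions, not the kernel code: `ToyFS`, `truncate`, `recover`, and the `pending` set are hypothetical names, and the model assumes that before the fix the orphan item was recorded only after the pagecache had been truncated.

```python
class ToyFS:
    """Toy stand-in for on-disk state (hypothetical, not a btrfs structure)."""
    def __init__(self):
        self.orphans = set()   # inodes that have an orphan item recorded
        self.pending = set()   # inodes whose state relies on a truncate finishing

def truncate(fs, ino, orphan_first, fail_midway):
    """Model a truncate that may fail after the pagecache step."""
    if orphan_first:
        fs.orphans.add(ino)      # fixed order: record the orphan first
    fs.pending.add(ino)          # pagecache truncated, ordered extent completed
    if fail_midway:
        return False             # the truncate fails before finishing
    if not orphan_first:
        fs.orphans.add(ino)      # assumed old order: orphan recorded only now
    fs.pending.discard(ino)      # the truncate finished the cleanup itself
    fs.orphans.discard(ino)
    return True

def recover(fs):
    """Orphan cleanup: finish the work for every recorded orphan."""
    for ino in list(fs.orphans):
        fs.pending.discard(ino)
        fs.orphans.discard(ino)

def consistent_after_failure(orphan_first):
    """Fail a truncate midway, run orphan cleanup, report whether state is clean."""
    fs = ToyFS()
    truncate(fs, ino=257, orphan_first=orphan_first, fail_midway=True)
    recover(fs)
    return not fs.pending
```

In this model, `consistent_after_failure(True)` leaves nothing pending because the orphan item exists before anything can fail, while `consistent_after_failure(False)` leaves state that no cleanup pass knows about.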
  2. 18 Dec, 2012 2 commits
    • L
      Btrfs: fix a bug of per-file nocow · 213490b3
      Liu Bo committed
      Users reported a bug; the reproducer is:
      $ mkfs.btrfs /dev/loop0
      $ mount /dev/loop0 /mnt/btrfs/
      $ mkdir /mnt/btrfs/dir
      $ chattr +C /mnt/btrfs/dir/
      $ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=10;
      $ lsattr /mnt/btrfs/dir/foo
      ---------------C- /mnt/btrfs/dir/foo
      $ filefrag /mnt/btrfs/dir/foo
      /mnt/btrfs/dir/foo: 1 extent found    ---> an extent
      $ dd if=/dev/zero of=/mnt/btrfs/dir/foo bs=4K count=1 seek=5 conv=notrunc,nocreat; sync
      $ filefrag /mnt/btrfs/dir/foo
      /mnt/btrfs/dir/foo: 3 extents found   ---> with nocow, btrfs breaks the extent into three parts
      
      A newly created file should not only inherit the NODATACOW flag but also
      honor the NODATASUM flag, because we must do COW on a file extent that has a checksum.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      213490b3
    • C
      Btrfs: fix hash overflow handling · 9c52057c
      Chris Mason committed
      The handling for directory crc hash overflows was fairly obscure:
      split_leaf returns EOVERFLOW when we try to extend the item and that is
      supposed to bubble up to userland.  For a while it did so, but along the
      way we added better handling of errors and forced the FS readonly if we
      hit IO errors during the directory insertion.
      
      Along the way, we started testing only for EEXIST and the EOVERFLOW case
      was dropped.  The end result is that we may force the FS readonly if we
      catch a directory hash bucket overflow.
      
      This fixes a few problem spots.  First I add tests for EOVERFLOW in the
      places where we can safely just return the error up the chain.
      
      btrfs_rename is harder though, because it tries to insert the new
      directory item only after it has already unlinked anything the rename
      was going to overwrite.  Rather than adding very complex logic, I added
      a helper to test for the hash overflow case early while it is still safe
      to bail out.
      
      Snapshot and subvolume creation had a similar problem, so they are using
      the new helper now too.
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      Reported-by: Pascal Junod <pascal@junod.info>
      9c52057c
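The "check early while it is still safe to bail out" idea from the hash overflow commit can be sketched with a toy directory model. Everything here is an assumption made for illustration: `ToyDirectory`, `check_collision`, and `rename` are hypothetical names, the bucket capacity is counted in bytes of names, and `len` is used as a deliberately weak hash so that collisions are easy to produce.

```python
import errno

class ToyDirectory:
    """Names are grouped by hash; each bucket holds a limited number of bytes."""
    def __init__(self, bucket_capacity, hash_fn):
        self.bucket_capacity = bucket_capacity
        self.hash_fn = hash_fn
        self.buckets = {}            # hash value -> list of names

    def check_collision(self, name):
        """Return 0 if `name` can be inserted, else a negative errno."""
        bucket = self.buckets.get(self.hash_fn(name), [])
        if name in bucket:
            return -errno.EEXIST
        if sum(len(n) for n in bucket) + len(name) > self.bucket_capacity:
            return -errno.EOVERFLOW
        return 0

    def insert(self, name):
        ret = self.check_collision(name)
        if ret == 0:
            self.buckets.setdefault(self.hash_fn(name), []).append(name)
        return ret

    def unlink(self, name):
        self.buckets[self.hash_fn(name)].remove(name)

def rename(directory, old, new):
    """Check for overflow before touching anything, then do the rename."""
    ret = directory.check_collision(new)
    if ret == -errno.EOVERFLOW:
        return ret                   # still safe to bail out: nothing changed
    if ret == -errno.EEXIST:
        directory.unlink(new)        # the rename overwrites the existing target
    directory.unlink(old)
    return directory.insert(new)
```

Because the check runs before any unlink, a rename that would overflow a bucket returns `-EOVERFLOW` to the caller and leaves the directory exactly as it was.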
  3. 17 Dec, 2012 12 commits
  4. 13 Dec, 2012 6 commits
  5. 12 Dec, 2012 2 commits
    • M
      Btrfs: make delalloc inodes be flushed by multi-task · 8ccf6f19
      Miao Xie committed
      This patch introduces a new worker pool named "flush_workers". When we
      want to force all inodes with pending delalloc to disk, we queue those
      inodes onto the work queue of this worker pool, so that they are
      flushed by multiple tasks.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      8ccf6f19
    • M
      Btrfs: improve the noflush reservation · 08e007d2
      Miao Xie committed
      In some places (such as when evicting an inode), we cannot flush the reserved
      space of delalloc, but flushing the delayed directory index and delayed inode
      is OK. However, we don't try to flush those things and just give up when there
      is not enough space to reserve. This patch fixes this problem.
      
      We define 3 types of flush operations: NO_FLUSH, FLUSH_LIMIT and FLUSH_ALL.
      If we are in a transaction, we must not flush anything, or a deadlock
      would happen, so use NO_FLUSH. If flushing the reserved space of delalloc
      would cause a deadlock, use FLUSH_LIMIT. In all other cases FLUSH_ALL is used,
      and we flush everything.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      08e007d2
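The three flush modes described in the noflush reservation commit can be summarised as a small decision function. This is a sketch in Python rather than the kernel's C, and `FlushMode`, `choose_flush_mode`, and `flush_targets` are hypothetical names; only the three mode names and the rules for choosing between them come from the commit message.

```python
from enum import Enum

class FlushMode(Enum):
    NO_FLUSH = 0
    FLUSH_LIMIT = 1
    FLUSH_ALL = 2

def choose_flush_mode(in_transaction, delalloc_flush_may_deadlock):
    """Pick a flush mode following the rules in the commit message."""
    if in_transaction:
        return FlushMode.NO_FLUSH        # flushing anything could deadlock
    if delalloc_flush_may_deadlock:
        return FlushMode.FLUSH_LIMIT     # e.g. while evicting an inode
    return FlushMode.FLUSH_ALL

def flush_targets(mode):
    """List what may be flushed to reclaim space in each mode."""
    if mode is FlushMode.NO_FLUSH:
        return []
    targets = ["delayed directory index", "delayed inode"]
    if mode is FlushMode.FLUSH_ALL:
        targets.append("delalloc reserved space")
    return targets
```

For example, `flush_targets(choose_flush_mode(False, True))` covers the inode-eviction case: the delayed items may be flushed, but delalloc may not.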
  6. 26 Oct, 2012 1 commit
  7. 09 Oct, 2012 4 commits
  8. 04 Oct, 2012 1 commit
  9. 03 Oct, 2012 1 commit
  10. 02 Oct, 2012 10 commits
    • M
      Btrfs: fix unnecessary warning when the fragments make the space alloc fail · 962197ba
      Miao Xie committed
      When we wrote data in compress mode to a btrfs filesystem whose free space
      was heavily fragmented, the kernel would report:
      	BTRFS warning (device xxx): Aborting unused transaction.
      
      The reason is:
      We could not find a free space extent long enough to store the compressed data
      because the free space was fragmented, and the compressed data cannot be split,
      so the kernel output the above message.
      
      In fact, btrfs can deal with this problem very well: it falls back to
      uncompressed IO, splits the uncompressed data into small pieces, and then
      stores them in the fragmented free space. So we shouldn't output the
      above warning message.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      962197ba
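The fallback described in this commit (compressed data needs one contiguous extent, uncompressed data can be split) can be modelled with plain sizes. This is a toy sketch, not the btrfs allocator: `store` is a hypothetical function, and free space is represented simply as a list of free extent lengths in bytes.

```python
import errno

def store(data_len, compressed_len, free_extents):
    """Place data into free extents; return a list of (kind, length) pieces.

    `free_extents` is a list of free extent lengths and is updated in place.
    """
    # Compressed data cannot be split, so it needs one extent that is large enough.
    for i, free in enumerate(free_extents):
        if free >= compressed_len:
            free_extents[i] -= compressed_len
            return [("compressed", compressed_len)]
    # Fallback: uncompressed data may be split across the fragments.
    if sum(free_extents) < data_len:
        raise OSError(errno.ENOSPC, "not enough free space")
    pieces, remaining = [], data_len
    for i, free in enumerate(free_extents):
        take = min(free, remaining)
        if take:
            free_extents[i] -= take
            pieces.append(("uncompressed", take))
            remaining -= take
    return pieces
```

With eight 1024-byte fragments, `store(4096, 2048, fragments)` cannot place the 2048-byte compressed blob anywhere, so it falls back to four uncompressed 1024-byte pieces instead of failing.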
    • J
      Btrfs: create a pinned em when writing to a prealloc range in DIO · 69ffb543
      Josef Bacik committed
      Wade Cline reported a problem where he was getting garbage and warnings when
      writing to a preallocated range via O_DIRECT.  This is because we weren't
      creating our normal pinned extent_map for the range we were writing to,
      which was causing all sorts of issues.  This patch fixes the problem and
      makes his testcase much happier.  Thanks,
      Reported-by: Wade Cline <clinew@linux.vnet.ibm.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      69ffb543
    • M
      Btrfs: fix corrupted metadata in the snapshot · 8407aa46
      Miao Xie committed
      When we delete an inode, we remove all the delayed items, including the delayed
      inode update, and then truncate all the related metadata. If there is a lot of
      metadata, we end the current transaction and start a new transaction to
      truncate the remaining metadata. In this way, we leave an inode item whose
      link counter is > 0, and may also leave some directory index items in the fs/file tree
      after the current transaction ends. In other words, the metadata in this fs/file tree
      is inconsistent. If we create a snapshot of this tree now, we will find an inode with
      corrupted metadata in the new snapshot, and we won't continue to drop the remaining metadata,
      because its link counter is not 0.
      
      We fix this problem by updating the inode item before the current transaction ends.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      8407aa46
    • D
      btrfs: polish names of kmem caches · 837e1972
      David Sterba committed
      Use case:
      
        watch 'grep btrfs < /proc/slabinfo'
      
      This makes it easy to watch all caches in one go.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      837e1972
    • L
      Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag · 9e8a4a8b
      Liu Bo committed
      We're going to use this flag EXTENT_DEFRAG to indicate which ranges
      belong to a defragment operation so that we can implement snapshot-aware defrag:
      
      We set the EXTENT_DEFRAG flag when dirtying the extents that need to be
      defragmented, so that later on the writeback thread can differentiate between
      normal writeback and writeback started by defragmentation.
      Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      9e8a4a8b
    • M
      Btrfs: add a new "type" field into the block reservation structure · 66d8f3dd
      Miao Xie committed
      Sometimes we need to choose the reservation method according to the type
      of the block reservation, such as the reservation for the delayed inode update.
      Currently we identify the type just by comparing the address of the reservation
      variable. This is very ugly for a temporary reservation, because we need to compare
      it with all the common reservation variables. So we add a new "type" field to record
      the type of the reservation variable.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      66d8f3dd
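The difference between identifying a reservation by address and by an explicit type field can be sketched as follows. The names are hypothetical (`RsvType`, `BlockRsv`, and both helper functions are made up for illustration, and the set of types shown is not claimed to match the kernel's); the sketch only contrasts the two approaches the commit message describes.

```python
from enum import Enum

class RsvType(Enum):
    GLOBAL = 0
    DELALLOC = 1
    DELOPS = 2
    TEMP = 3

class BlockRsv:
    """Toy block reservation carrying an explicit type tag."""
    def __init__(self, rsv_type):
        self.type = rsv_type

def is_temporary_by_address(rsv, common_rsvs):
    """Old approach: temporary means 'is none of the shared reservations'."""
    return all(rsv is not common for common in common_rsvs)

def is_temporary_by_type(rsv):
    """New approach: ask the reservation itself."""
    return rsv.type is RsvType.TEMP
```

The address-based check must be handed every shared reservation and silently gives the wrong answer if one is forgotten, whereas the type-based check needs only the reservation in question.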
    • S
      Btrfs: do not take cleanup_work_sem in btrfs_run_delayed_iputs() · ac14aed6
      Sage Weil committed
      Josef has suggested that this is not necessary.  Removing it also avoids
      this lockdep splat (after the new sb_internal locking stuff was added):
      
      [  604.090449] ======================================================
      [  604.114819] [ INFO: possible circular locking dependency detected ]
      [  604.139262] 3.6.0-rc2-ceph-00144-g463b030 #1 Not tainted
      [  604.162193] -------------------------------------------------------
      [  604.186139] btrfs-cleaner/6669 is trying to acquire lock:
      [  604.209555]  (sb_internal#2){.+.+..}, at: [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
      [  604.257100]
      [  604.257100] but task is already holding lock:
      [  604.300366]  (&fs_info->cleanup_work_sem){.+.+..}, at: [<ffffffffa0048002>] btrfs_run_delayed_iputs+0x72/0x130 [btrfs]
      [  604.352989]
      [  604.352989] which lock already depends on the new lock.
      [  604.352989]
      [  604.427104]
      [  604.427104] the existing dependency chain (in reverse order) is:
      [  604.478493]
      [  604.478493] -> #1 (&fs_info->cleanup_work_sem){.+.+..}:
      [  604.529313]        [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
      [  604.559621]        [<ffffffff81632b69>] down_read+0x39/0x4e
      [  604.589382]        [<ffffffffa004db98>] btrfs_lookup_dentry+0x218/0x550 [btrfs]
      [  604.596161] btrfs: unlinked 1 orphans
      [  604.675002]        [<ffffffffa006aadd>] create_subvol+0x62d/0x690 [btrfs]
      [  604.708859]        [<ffffffffa006d666>] btrfs_mksubvol.isra.52+0x346/0x3a0 [btrfs]
      [  604.772466]        [<ffffffffa006d7f2>] btrfs_ioctl_snap_create_transid+0x132/0x190 [btrfs]
      [  604.842245]        [<ffffffffa006d8ae>] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs]
      [  604.912852]        [<ffffffffa00708ae>] btrfs_ioctl+0x138e/0x1990 [btrfs]
      [  604.951888]        [<ffffffff8118e9b8>] do_vfs_ioctl+0x98/0x560
      [  604.989961]        [<ffffffff8118ef11>] sys_ioctl+0x91/0xa0
      [  605.026628]        [<ffffffff8163d569>] system_call_fastpath+0x16/0x1b
      [  605.064404]
      [  605.064404] -> #0 (sb_internal#2){.+.+..}:
      [  605.126832]        [<ffffffff810b25e8>] __lock_acquire+0x1ac8/0x1b90
      [  605.163671]        [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
      [  605.200228]        [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0
      [  605.236818]        [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
      [  605.274029]        [<ffffffffa00431a3>] btrfs_start_transaction+0x13/0x20 [btrfs]
      [  605.340520]        [<ffffffffa004ccfa>] btrfs_evict_inode+0x19a/0x330 [btrfs]
      [  605.378720]        [<ffffffff811972c8>] evict+0xb8/0x1c0
      [  605.416057]        [<ffffffff811974d5>] iput+0x105/0x210
      [  605.452373]        [<ffffffffa0048082>] btrfs_run_delayed_iputs+0xf2/0x130 [btrfs]
      [  605.521627]        [<ffffffffa003b5e1>] cleaner_kthread+0xa1/0x120 [btrfs]
      [  605.560520]        [<ffffffff810791ee>] kthread+0xae/0xc0
      [  605.598094]        [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
      [  605.636499]
      [  605.636499] other info that might help us debug this:
      [  605.636499]
      [  605.736504]  Possible unsafe locking scenario:
      [  605.736504]
      [  605.801931]        CPU0                    CPU1
      [  605.835126]        ----                    ----
      [  605.867093]   lock(&fs_info->cleanup_work_sem);
      [  605.898594]                                lock(sb_internal#2);
      [  605.931954]                                lock(&fs_info->cleanup_work_sem);
      [  605.965359]   lock(sb_internal#2);
      [  605.994758]
      [  605.994758]  *** DEADLOCK ***
      [  605.994758]
      [  606.075281] 2 locks held by btrfs-cleaner/6669:
      [  606.104528]  #0:  (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b5d5>] cleaner_kthread+0x95/0x120 [btrfs]
      [  606.165626]  #1:  (&fs_info->cleanup_work_sem){.+.+..}, at: [<ffffffffa0048002>] btrfs_run_delayed_iputs+0x72/0x130 [btrfs]
      [  606.231297]
      [  606.231297] stack backtrace:
      [  606.287723] Pid: 6669, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00144-g463b030 #1
      [  606.347823] Call Trace:
      [  606.376184]  [<ffffffff8162a77c>] print_circular_bug+0x1fb/0x20c
      [  606.409243]  [<ffffffff810b25e8>] __lock_acquire+0x1ac8/0x1b90
      [  606.441343]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
      [  606.474583]  [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
      [  606.505934]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
      [  606.539429]  [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0
      [  606.571719]  [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0
      [  606.603498]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
      [  606.637405]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
      [  606.670165]  [<ffffffff81172e75>] ? kmem_cache_alloc+0xb5/0x160
      [  606.702144]  [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
      [  606.735562]  [<ffffffffa00256a6>] ? block_rsv_add_bytes+0x56/0x80 [btrfs]
      [  606.769861]  [<ffffffffa00431a3>] btrfs_start_transaction+0x13/0x20 [btrfs]
      [  606.804575]  [<ffffffffa004ccfa>] btrfs_evict_inode+0x19a/0x330 [btrfs]
      [  606.838756]  [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40
      [  606.872010]  [<ffffffff811972c8>] evict+0xb8/0x1c0
      [  606.903800]  [<ffffffff811974d5>] iput+0x105/0x210
      [  606.935416]  [<ffffffffa0048082>] btrfs_run_delayed_iputs+0xf2/0x130 [btrfs]
      [  606.970510]  [<ffffffffa003b5d5>] ? cleaner_kthread+0x95/0x120 [btrfs]
      [  607.005648]  [<ffffffffa003b5e1>] cleaner_kthread+0xa1/0x120 [btrfs]
      [  607.040724]  [<ffffffffa003b540>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs]
      [  607.104740]  [<ffffffff810791ee>] kthread+0xae/0xc0
      [  607.137119]  [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10
      [  607.169797]  [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
      [  607.202472]  [<ffffffff81635430>] ? retint_restore_args+0x13/0x13
      [  607.235884]  [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0
      [  607.268731]  [<ffffffff8163e740>] ? gs_change+0x13/0x13
      Signed-off-by: Sage Weil <sage@inktank.com>
      ac14aed6
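The lockdep report above boils down to a cycle in the lock-order graph: one path takes cleanup_work_sem and then sb_internal, another takes them in the opposite order. A minimal cycle check, written as a Python sketch, shows why dropping cleanup_work_sem from btrfs_run_delayed_iputs() removes the warning. `has_cycle` is a hypothetical helper and is far simpler than what lockdep actually does; the two edges are read off the "Possible unsafe locking scenario" in the report.

```python
def has_cycle(edges):
    """Return True if the directed graph of (held, wanted) lock pairs has a cycle."""
    graph = {}
    for held, wanted in edges:
        graph.setdefault(held, set()).add(wanted)
        graph.setdefault(wanted, set())

    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current path, finished
    color = dict.fromkeys(graph, WHITE)

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True          # back edge: lock-order inversion
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

# Lock-order edges taken from the report (held lock -> lock being acquired).
edges_before = [
    ("cleanup_work_sem", "sb_internal"),   # btrfs_run_delayed_iputs -> start_transaction
    ("sb_internal", "cleanup_work_sem"),   # snapshot creation -> btrfs_lookup_dentry
]
edges_after = edges_before[1:]             # delayed iputs no longer take the semaphore
```

Here `has_cycle(edges_before)` is true and `has_cycle(edges_after)` is false, which matches the reasoning in the commit message.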
    • J
      Btrfs: add hole punching · 2aaa6655
      Josef Bacik committed
      This patch adds hole punching via fallocate.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      2aaa6655
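From user space, hole punching is requested through the Linux fallocate(2) system call with the FALLOC_FL_PUNCH_HOLE and FALLOC_FL_KEEP_SIZE flags. The following Python sketch calls it through ctypes. It assumes a 64-bit Linux system whose C library exports `fallocate`; the flag values are copied from `<linux/falloc.h>`, and `punch_hole` is a hypothetical wrapper name.

```python
import ctypes
import errno
import os

FALLOC_FL_KEEP_SIZE = 0x01    # value from <linux/falloc.h>
FALLOC_FL_PUNCH_HOLE = 0x02   # value from <linux/falloc.h>

# Assumption: 64-bit Linux, where off_t is a 64-bit integer.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int

def punch_hole(fd, offset, length):
    """Deallocate [offset, offset + length) in fd without changing the file size.

    Returns True on success and False if the filesystem does not support it.
    """
    mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    if _libc.fallocate(fd, mode, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))
```

After a successful call the punched range reads back as zeros and the file size reported by `os.fstat` is unchanged.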
    • J
      Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d
      Josef Bacik committed
      I audited all users of btrfs_drop_extents and found that nobody actually uses
      the hint_byte argument.  I'm sure it was used for something at some point but
      it's not used now, and the way the pinning works the disk bytenr would never be
      immediately useful anyway so let's just remove it.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      2671485d
    • L
      Btrfs: fix a bug in checking whether a inode is already in log · 46d8bc34
      Liu Bo committed
      This is based on Josef's "Btrfs: turbo charge fsync".
      
      Currently btrfs checks whether an inode is in the log by comparing
      the root's last_log_commit to the inode's last_sub_trans[1].
      
      But the problem is that this root->last_log_commit is shared among
      inodes.
      
      Say we have N inodes to be logged: after the first inode is logged,
      the root's last_log_commit is updated and the remaining N-1 files will
      be skipped.
      
      This fixes the bug by keeping a local copy of the root's last_log_commit
      inside each inode; each inode maintains its own copy.
      
      [1]: we regard each log transaction as a subset of btrfs's transaction,
      i.e. sub_trans
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      46d8bc34
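The bug with the shared last_log_commit can be reproduced in a small user-space model. The field names `last_sub_trans` and `last_log_commit` come from the commit message; everything else (`Root`, `Inode`, `modify`, `fsync`, and the two `in_log_*` checks) is a hypothetical simplification and does not claim to match the kernel's fsync path.

```python
class Root:
    def __init__(self):
        self.log_transid = 1        # id of the current log sub-transaction
        self.last_log_commit = 0    # last sub-transaction committed to the log

class Inode:
    def __init__(self):
        self.last_sub_trans = 0     # sub-transaction of the last modification
        self.last_log_commit = 0    # per-inode copy introduced by the fix

def modify(root, inode):
    inode.last_sub_trans = root.log_transid

def in_log_shared(root, inode):
    """Buggy check: compares against the value shared by all inodes."""
    return inode.last_sub_trans <= root.last_log_commit

def in_log_local(root, inode):
    """Fixed check: compares against the inode's own copy."""
    return inode.last_sub_trans <= inode.last_log_commit

def fsync(root, inode, in_log):
    """Log the inode unless the check says it is already in the log."""
    if in_log(root, inode):
        return False                # skipped
    inode.last_log_commit = inode.last_sub_trans
    root.last_log_commit = root.log_transid
    root.log_transid += 1
    return True                     # logged

def run(in_log, n=3):
    """Modify n inodes in one sub-transaction, then fsync each of them."""
    root, inodes = Root(), [Inode() for _ in range(n)]
    for inode in inodes:
        modify(root, inode)
    return [fsync(root, inode, in_log) for inode in inodes]
```

With the shared check, `run(in_log_shared)` logs only the first inode and skips the rest, which is the behaviour the commit describes; with the per-inode copy, `run(in_log_local)` logs all of them.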