1. 09 7月, 2014 19 次提交
  2. 23 6月, 2014 10 次提交
  3. 20 6月, 2014 9 次提交
    • M
      Btrfs: fix wrong error handle when the device is missing or is not writeable · 8408c716
      Miao Xie 提交于
      The original bio might be submitted, so we shoud increase bi_remaining to
      account for it when we deal with the error that the device is missing or
      is not writeable, or we would skip the endio handle.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      8408c716
    • M
      Btrfs: fix deadlock when mounting a degraded fs · c55f1396
      Miao Xie 提交于
      The deadlock happened when we mount degraded filesystem, the reproduced
      steps are following:
       # mkfs.btrfs -f -m raid1 -d raid1 <dev0> <dev1>
       # echo 1 > /sys/block/`basename <dev0>`/device/delete
       # mount -o degraded <dev1> <mnt>
      
      The reason was that the counter -- bi_remaining was wrong. If the missing
      or unwriteable device was the last device in the mapping array, we would
      not submit the original bio, so we shouldn't increase bi_remaining of it
      in btrfs_end_bio(), or we would skip the final endio handle.
      
      Fix this problem by adding a flag into btrfs bio structure. If we submit
      the original bio, we will set the flag, and we increase bi_remaining counter,
      or we don't.
      
      Though there is another way to fix it -- decrease bi_remaining counter of the
      original bio when we make sure the original bio is not submitted, this method
      need add more check and is easy to make mistake.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c55f1396
    • M
      Btrfs: use bio_endio_nodec instead of open code · e990f167
      Miao Xie 提交于
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e990f167
    • W
      Btrfs: fix NULL pointer crash when running balance and scrub concurrently · 298a8f9c
      Wang Shilong 提交于
      While running balance, scrub, fsstress concurrently we hit the
      following kernel crash:
      
      [56561.448845] BTRFS info (device sde): relocating block group 11005853696 flags 132
      [56561.524077] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
      [56561.524237] IP: [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs]
      [56561.524297] PGD 9be28067 PUD 7f3dd067 PMD 0
      [56561.524325] Oops: 0000 [#1] SMP
      [....]
      [56561.527237] Call Trace:
      [56561.527309]  [<ffffffffa038980e>] scrub_enumerate_chunks+0x24e/0x490 [btrfs]
      [56561.527392]  [<ffffffff810abe00>] ? abort_exclusive_wait+0x50/0xb0
      [56561.527476]  [<ffffffffa038add4>] btrfs_scrub_dev+0x1a4/0x530 [btrfs]
      [56561.527561]  [<ffffffffa0368107>] btrfs_ioctl+0x13f7/0x2a90 [btrfs]
      [56561.527639]  [<ffffffff811c82f0>] do_vfs_ioctl+0x2e0/0x4c0
      [56561.527712]  [<ffffffff8109c384>] ? vtime_account_user+0x54/0x60
      [56561.527788]  [<ffffffff810f768c>] ? __audit_syscall_entry+0x9c/0xf0
      [56561.527870]  [<ffffffff811c8551>] SyS_ioctl+0x81/0xa0
      [56561.527941]  [<ffffffff815707f7>] tracesys+0xdd/0xe2
      [...]
      [56561.528304] RIP  [<ffffffffa038956d>] scrub_chunk.isra.12+0xdd/0x130 [btrfs]
      [56561.528395]  RSP <ffff88004c0f5be8>
      [56561.528454] CR2: 0000000000000078
      
      This is because in btrfs_relocate_chunk(), we will free @bdev directly while
      scrub may still hold extent mapping, and may access freed memory.
      
      Fix this problem by wrapping freeing @bdev work into free_extent_map() which
      is based on reference count.
      Reported-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      298a8f9c
    • Q
      btrfs: Skip scrubbing removed chunks to avoid -ENOENT. · ced96edc
      Qu Wenruo 提交于
      When run scrub with balance, sometimes -ENOENT will be returned, since
      in scrub_enumerate_chunks() will search dev_extent in *COMMIT_ROOT*, but
      btrfs_lookup_block_group() will search block group in *MEMORY*, so if a
      chunk is removed but not committed, -ENOENT will be returned.
      
      However, there is no need to stop scrubbing since other chunks may be
      scrubbed without problem.
      
      So this patch changes the behavior to skip removed chunks and continue
      to scrub the rest.
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      ced96edc
    • M
      Btrfs: fix broken free space cache after the system crashed · e570fd27
      Miao Xie 提交于
      When we mounted the filesystem after the crash, we got the following
      message:
        BTRFS error (device xxx): block group xxxx has wrong amount of free space
        BTRFS error (device xxx): failed to load free space cache for block group xxx
      
      It is because we didn't update the metadata of the allocated space (in extent
      tree) until the file data was written into the disk. During this time, there was
      no information about the allocated spaces in either the extent tree nor the
      free space cache. when we wrote out the free space cache at this time (commit
      transaction), those spaces were lost. In fact, only the free space that is
      used to store the file data had this problem, the others didn't because
      the metadata of them is updated in the same transaction context.
      
      There are many methods which can fix the above problem
      - track the allocated space, and write it out when we write out the free
        space cache
      - account the size of the allocated space that is used to store the file
        data, if the size is not zero, don't write out the free space cache.
      
      The first one is complex and may make the performance drop down.
      This patch chose the second method, we use a per-block-group variant to
      account the size of that allocated space. Besides that, we also introduce
      a per-block-group read-write semaphore to avoid the race between
      the allocation and the free space cache write out.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e570fd27
    • M
      Btrfs: make free space cache write out functions more readable · 5349d6c3
      Miao Xie 提交于
      This patch makes the free space cache write out functions more readable,
      and beisdes that, it also reduces the stack space that the function --
      __btrfs_write_out_cache uses from 194bytes to 144bytes.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5349d6c3
    • F
      Btrfs: remove unused wait queue in struct extent_buffer · 46fefe41
      Filipe Manana 提交于
      The lock_wq wait queue is not used anywhere, therefore just remove it.
      On a x86_64 system, this reduced sizeof(struct extent_buffer) from 320
      bytes down to 296 bytes, which means a 4Kb page can now be used for
      13 extent buffers instead of 12.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      46fefe41
    • C
      Btrfs: fix deadlocks with trylock on tree nodes · ea4ebde0
      Chris Mason 提交于
      The Btrfs tree trylock function is poorly named.  It always takes
      the spinlock and backs off if the blocking lock is held.  This
      can lead to surprising lockups because people expect it to really be a
      trylock.
      
      This commit makes it a pure trylock, both for the spinlock and the
      blocking lock.  It also reworks the nested lock handling slightly to
      avoid taking the read lock while a spinning write lock might be held.
      Signed-off-by: NChris Mason <clm@fb.com>
      ea4ebde0
  4. 18 6月, 2014 2 次提交
    • K
      NFSD: fix bug for readdir of pseudofs · f41c5ad2
      Kinglong Mee 提交于
      Commit 561f0ed4 (nfsd4: allow large readdirs) introduces a bug
      about readdir the root of pseudofs.
      
      Call xdr_truncate_encode() revert encoded name when skipping.
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      f41c5ad2
    • N
      NFSD: Don't hand out delegations for 30 seconds after recalling them. · 6282cd56
      NeilBrown 提交于
      If nfsd needs to recall a delegation for some reason it implies that there is
      contention on the file, so further delegations should not be handed out.
      
      The current code fails to do so, and the result is effectively a
      live-lock under some workloads: a client attempting a conflicting
      operation on a read-delegated file receives NFS4ERR_DELAY and retries
      the operation, but by the time it retries the server may already have
      given out another delegation.
      
      We could simply avoid delegations for (say) 30 seconds after any recall, but
      this is probably too heavy handed.
      
      We could keep a list of inodes (or inode numbers or filehandles) for recalled
      delegations, but that requires memory allocation and searching.
      
      The approach taken here is to use a bloom filter to record the filehandles
      which are currently blocked from delegation, and to accept the cost of a few
      false positives.
      
      We have 2 bloom filters, each of which is valid for 30 seconds.   When a
      delegation is recalled the filehandle is added to one filter and will remain
      disabled for between 30 and 60 seconds.
      
      We keep a count of the number of filehandles that have been added, so when
      that count is zero we can bypass all other tests.
      
      The bloom filters have 256 bits and 3 hash functions.  This should allow a
      couple of dozen blocked  filehandles with minimal false positives.  If many
      more filehandles are all blocked at once, behaviour will degrade towards
      rejecting all delegations for between 30 and 60 seconds, then resetting and
      allowing new delegations.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      6282cd56