1. 11 2月, 2015 23 次提交
  2. 06 2月, 2015 1 次提交
    • R
      nilfs2: fix deadlock of segment constructor over I_SYNC flag · 7ef3ff2f
      Ryusuke Konishi 提交于
      Nilfs2 eventually hangs in a stress test with fsstress program.  This
      issue was caused by the following deadlock over I_SYNC flag between
      nilfs_segctor_thread() and writeback_sb_inodes():
      
        nilfs_segctor_thread()
          nilfs_segctor_thread_construct()
            nilfs_segctor_unlock()
              nilfs_dispose_list()
                iput()
                  iput_final()
                    evict()
                      inode_wait_for_writeback()  * wait for I_SYNC flag
      
        writeback_sb_inodes()
           * set I_SYNC flag on inode->i_state
          __writeback_single_inode()
            do_writepages()
              nilfs_writepages()
                nilfs_construct_dsync_segment()
                  nilfs_segctor_sync()
                     * wait for completion of segment constructor
          inode_sync_complete()
             * clear I_SYNC flag after __writeback_single_inode() completed
      
      writeback_sb_inodes() calls do_writepages() for dirty inodes after
      setting I_SYNC flag on inode->i_state.  do_writepages() in turn calls
      nilfs_writepages(), which can run segment constructor and wait for its
      completion.  On the other hand, segment constructor calls iput(), which
      can call evict() and wait for the I_SYNC flag on
      inode_wait_for_writeback().
      
      Since segment constructor doesn't know when I_SYNC will be set, it
      cannot know whether iput() will block or not unless inode->i_nlink has a
      non-zero count.  We can prevent evict() from being called in iput() by
      implementing sop->drop_inode(), but it's not preferable to leave inodes
      with i_nlink == 0 for long periods because it even defers file
      truncation and inode deallocation.  So, this instead resolves the
      deadlock by calling iput() asynchronously with a workqueue for inodes
      with i_nlink == 0.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ef3ff2f
  3. 05 2月, 2015 1 次提交
  4. 04 2月, 2015 1 次提交
    • D
      aio: annotate aio_read_event_ring for sleep patterns · 9c9ce763
      Dave Chinner 提交于
      Under CONFIG_DEBUG_ATOMIC_SLEEP=y, aio_read_event_ring() will throw
      warnings like the following due to being called from wait_event
      context:
      
       WARNING: CPU: 0 PID: 16006 at kernel/sched/core.c:7300 __might_sleep+0x7f/0x90()
       do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810d85a3>] prepare_to_wait_event+0x63/0x110
       Modules linked in:
       CPU: 0 PID: 16006 Comm: aio-dio-fcntl-r Not tainted 3.19.0-rc6-dgc+ #705
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        ffffffff821c0372 ffff88003c117cd8 ffffffff81daf2bd 000000000000d8d8
        ffff88003c117d28 ffff88003c117d18 ffffffff8109beda ffff88003c117cf8
        ffffffff821c115e 0000000000000061 0000000000000000 00007ffffe4aa300
       Call Trace:
        [<ffffffff81daf2bd>] dump_stack+0x4c/0x65
        [<ffffffff8109beda>] warn_slowpath_common+0x8a/0xc0
        [<ffffffff8109bf56>] warn_slowpath_fmt+0x46/0x50
        [<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110
        [<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110
        [<ffffffff810bdfcf>] __might_sleep+0x7f/0x90
        [<ffffffff81db8344>] mutex_lock+0x24/0x45
        [<ffffffff81216b7c>] aio_read_events+0x4c/0x290
        [<ffffffff81216fac>] read_events+0x1ec/0x220
        [<ffffffff810d8650>] ? prepare_to_wait_event+0x110/0x110
        [<ffffffff810fdb10>] ? hrtimer_get_res+0x50/0x50
        [<ffffffff8121899d>] SyS_io_getevents+0x4d/0xb0
        [<ffffffff81dba5a9>] system_call_fastpath+0x12/0x17
       ---[ end trace bde69eaf655a4fea ]---
      
      There is not actually a bug here, so annotate the code to tell the
      debug logic that everything is just fine and not to fire a false
      positive.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      9c9ce763
  5. 28 1月, 2015 3 次提交
    • J
      quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Jan Kara 提交于
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Upto now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck
      number of bytes into d_bcount field of struct fd_disk_quota). Only if
      you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
      / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
      tried this but reportedly some Samba users hit the problem in practice.
      So when we want interfaces compatible we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
      to have more conversion routines in fs/quota/quota.c and another copying
      of quota structure slows down getting of quota information by about 2%
      but it seems cleaner than overloading e.g. units of d_bcount to bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      14bf61ff
    • J
      udf: Release preallocation on last writeable close · b07ef352
      Jan Kara 提交于
      Commit 6fb1ca92 "udf: Fix race between write(2) and close(2)"
      changed the condition when preallocation is released. The idea was that
      we don't want to release the preallocation for an inode on close when
      there are other writeable file descriptors for the inode. However the
      condition was written in the opposite way so we released preallocation
      only if there were other writeable file descriptors. Fix the problem by
      changing the condition properly.
      
      CC: stable@vger.kernel.org
      Fixes: 6fb1ca92Reported-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NJan Kara <jack@suse.cz>
      b07ef352
    • G
      btrfs: fix raid56 scrub failed in xfstests btrfs/072 · 063c54dc
      Gui Hecheng 提交于
      The xfstests btrfs/072 reports uncorrectable read errors in dmesg,
      because scrub forgets to use commit_root for parity scrub routine
      and scrub attempts to scrub those extents items whose contents are
      not fully on disk.
      
      To fix it, we just add the @search_commit_root flag back.
      Signed-off-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: NMiao Xie <miaoxie@huawei.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      063c54dc
  6. 27 1月, 2015 1 次提交
  7. 22 1月, 2015 3 次提交
  8. 21 1月, 2015 2 次提交
    • Q
      btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. · a53f4f8e
      Qu Wenruo 提交于
      Commit 6b5fe46d (btrfs: do commit in sync_fs if there are pending
      changes) will call btrfs_start_transaction() in sync_fs(), to handle
      some operations needed to be done in next transaction.
      
      However this can cause deadlock if the filesystem is frozen, with the
      following sys_r+w output:
      [  143.255932] Call Trace:
      [  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
      [  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
      [  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
      [btrfs]
      [  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
      [btrfs]
      [  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
      [  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
      [  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
      [  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
      [  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
      [  143.256111] Call Trace:
      [  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
      [  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
      [  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
      [  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
      [  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
      [  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
      [  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
      
      The reason is like the following:
      (Holding s_umount)
      VFS sync_fs staff:
      |- btrfs_sync_fs()
         |- btrfs_start_transaction()
            |- sb_start_intwrite()
            (Waiting thaw_fs to unfreeze)
      					VFS thaw_fs staff:
      					thaw_fs()
      					(Waiting sync_fs to release
      					 s_umount)
      
      So deadlock happens.
      This can be easily triggered by fstest/generic/068 with inode_cache
      mount option.
      
      The fix is to check if the fs is frozen, if the fs is frozen, just
      return and waiting for the next transaction.
      
      Cc: David Sterba <dsterba@suse.cz>
      Reported-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      [enhanced comment, changed to SB_FREEZE_WRITE]
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      a53f4f8e
    • Q
      btrfs: Fix the bug that fs_info->pending_changes is never cleared. · 6c9fe14f
      Qu Wenruo 提交于
      Fs_info->pending_changes is never cleared since the original code uses
      cmpxchg(&fs_info->pending_changes, 0, 0), which will only clear it if
      pending_changes is already 0.
      
      This will cause a lot of problem when mount it with inode_cache mount
      option.
      If the btrfs is mounted as inode_cache, pending_changes will always be
      1, even when the fs is frozen.
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      6c9fe14f
  9. 20 1月, 2015 5 次提交