1. 11 2月, 2015 24 次提交
  2. 06 2月, 2015 1 次提交
    • R
      nilfs2: fix deadlock of segment constructor over I_SYNC flag · 7ef3ff2f
      Ryusuke Konishi 提交于
      Nilfs2 eventually hangs in a stress test with fsstress program.  This
      issue was caused by the following deadlock over I_SYNC flag between
      nilfs_segctor_thread() and writeback_sb_inodes():
      
        nilfs_segctor_thread()
          nilfs_segctor_thread_construct()
            nilfs_segctor_unlock()
              nilfs_dispose_list()
                iput()
                  iput_final()
                    evict()
                      inode_wait_for_writeback()  * wait for I_SYNC flag
      
        writeback_sb_inodes()
           * set I_SYNC flag on inode->i_state
          __writeback_single_inode()
            do_writepages()
              nilfs_writepages()
                nilfs_construct_dsync_segment()
                  nilfs_segctor_sync()
                     * wait for completion of segment constructor
          inode_sync_complete()
             * clear I_SYNC flag after __writeback_single_inode() completed
      
      writeback_sb_inodes() calls do_writepages() for dirty inodes after
      setting I_SYNC flag on inode->i_state.  do_writepages() in turn calls
      nilfs_writepages(), which can run segment constructor and wait for its
      completion.  On the other hand, segment constructor calls iput(), which
      can call evict() and wait for the I_SYNC flag on
      inode_wait_for_writeback().
      
      Since segment constructor doesn't know when I_SYNC will be set, it
      cannot know whether iput() will block or not unless inode->i_nlink has a
      non-zero count.  We can prevent evict() from being called in iput() by
      implementing sop->drop_inode(), but it's not preferable to leave inodes
      with i_nlink == 0 for long periods because it even defers file
      truncation and inode deallocation.  So, this instead resolves the
      deadlock by calling iput() asynchronously with a workqueue for inodes
      with i_nlink == 0.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ef3ff2f
  3. 05 2月, 2015 1 次提交
  4. 04 2月, 2015 1 次提交
    • D
      aio: annotate aio_read_event_ring for sleep patterns · 9c9ce763
      Dave Chinner 提交于
      Under CONFIG_DEBUG_ATOMIC_SLEEP=y, aio_read_event_ring() will throw
      warnings like the following due to being called from wait_event
      context:
      
       WARNING: CPU: 0 PID: 16006 at kernel/sched/core.c:7300 __might_sleep+0x7f/0x90()
       do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810d85a3>] prepare_to_wait_event+0x63/0x110
       Modules linked in:
       CPU: 0 PID: 16006 Comm: aio-dio-fcntl-r Not tainted 3.19.0-rc6-dgc+ #705
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
        ffffffff821c0372 ffff88003c117cd8 ffffffff81daf2bd 000000000000d8d8
        ffff88003c117d28 ffff88003c117d18 ffffffff8109beda ffff88003c117cf8
        ffffffff821c115e 0000000000000061 0000000000000000 00007ffffe4aa300
       Call Trace:
        [<ffffffff81daf2bd>] dump_stack+0x4c/0x65
        [<ffffffff8109beda>] warn_slowpath_common+0x8a/0xc0
        [<ffffffff8109bf56>] warn_slowpath_fmt+0x46/0x50
        [<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110
        [<ffffffff810d85a3>] ? prepare_to_wait_event+0x63/0x110
        [<ffffffff810bdfcf>] __might_sleep+0x7f/0x90
        [<ffffffff81db8344>] mutex_lock+0x24/0x45
        [<ffffffff81216b7c>] aio_read_events+0x4c/0x290
        [<ffffffff81216fac>] read_events+0x1ec/0x220
        [<ffffffff810d8650>] ? prepare_to_wait_event+0x110/0x110
        [<ffffffff810fdb10>] ? hrtimer_get_res+0x50/0x50
        [<ffffffff8121899d>] SyS_io_getevents+0x4d/0xb0
        [<ffffffff81dba5a9>] system_call_fastpath+0x12/0x17
       ---[ end trace bde69eaf655a4fea ]---
      
      There is not actually a bug here, so annotate the code to tell the
      debug logic that everything is just fine and not to fire a false
      positive.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      9c9ce763
  5. 28 1月, 2015 3 次提交
    • J
      quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Jan Kara 提交于
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Upto now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck
      number of bytes into d_bcount field of struct fd_disk_quota). Only if
      you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
      / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
      tried this but reportedly some Samba users hit the problem in practice.
      So when we want interfaces compatible we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
      to have more conversion routines in fs/quota/quota.c and another copying
      of quota structure slows down getting of quota information by about 2%
      but it seems cleaner than overloading e.g. units of d_bcount to bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      14bf61ff
    • J
      udf: Release preallocation on last writeable close · b07ef352
      Jan Kara 提交于
      Commit 6fb1ca92 "udf: Fix race between write(2) and close(2)"
      changed the condition when preallocation is released. The idea was that
      we don't want to release the preallocation for an inode on close when
      there are other writeable file descriptors for the inode. However the
      condition was written in the opposite way so we released preallocation
      only if there were other writeable file descriptors. Fix the problem by
      changing the condition properly.
      
      CC: stable@vger.kernel.org
      Fixes: 6fb1ca92Reported-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NJan Kara <jack@suse.cz>
      b07ef352
    • G
      btrfs: fix raid56 scrub failed in xfstests btrfs/072 · 063c54dc
      Gui Hecheng 提交于
      The xfstests btrfs/072 reports uncorrectable read errors in dmesg,
      because scrub forgets to use commit_root for parity scrub routine
      and scrub attempts to scrub those extents items whose contents are
      not fully on disk.
      
      To fix it, we just add the @search_commit_root flag back.
      Signed-off-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: NMiao Xie <miaoxie@huawei.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      063c54dc
  6. 27 1月, 2015 1 次提交
  7. 22 1月, 2015 3 次提交
  8. 21 1月, 2015 2 次提交
    • Q
      btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. · a53f4f8e
      Qu Wenruo 提交于
      Commit 6b5fe46d (btrfs: do commit in sync_fs if there are pending
      changes) will call btrfs_start_transaction() in sync_fs(), to handle
      some operations needed to be done in next transaction.
      
      However this can cause deadlock if the filesystem is frozen, with the
      following sys_r+w output:
      [  143.255932] Call Trace:
      [  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
      [  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
      [  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
      [btrfs]
      [  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
      [btrfs]
      [  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
      [  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
      [  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
      [  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
      [  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
      [  143.256111] Call Trace:
      [  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
      [  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
      [  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
      [  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
      [  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
      [  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
      [  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
      
      The reason is like the following:
      (Holding s_umount)
      VFS sync_fs staff:
      |- btrfs_sync_fs()
         |- btrfs_start_transaction()
            |- sb_start_intwrite()
            (Waiting thaw_fs to unfreeze)
      					VFS thaw_fs staff:
      					thaw_fs()
      					(Waiting sync_fs to release
      					 s_umount)
      
      So deadlock happens.
      This can be easily triggered by fstest/generic/068 with inode_cache
      mount option.
      
      The fix is to check if the fs is frozen, if the fs is frozen, just
      return and waiting for the next transaction.
      
      Cc: David Sterba <dsterba@suse.cz>
      Reported-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      [enhanced comment, changed to SB_FREEZE_WRITE]
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      a53f4f8e
    • Q
      btrfs: Fix the bug that fs_info->pending_changes is never cleared. · 6c9fe14f
      Qu Wenruo 提交于
      Fs_info->pending_changes is never cleared since the original code uses
      cmpxchg(&fs_info->pending_changes, 0, 0), which will only clear it if
      pending_changes is already 0.
      
      This will cause a lot of problem when mount it with inode_cache mount
      option.
      If the btrfs is mounted as inode_cache, pending_changes will always be
      1, even when the fs is frozen.
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      6c9fe14f
  9. 20 1月, 2015 4 次提交
    • S
      Complete oplock break jobs before closing file handle · ca7df8e0
      Sachin Prabhu 提交于
      Commit
      c11f1df5
      requires writers to wait for any pending oplock break handler to
      complete before proceeding to write. This is done by waiting on bit
      CIFS_INODE_PENDING_OPLOCK_BREAK in cifsFileInfo->flags. This bit is
      cleared by the oplock break handler job queued on the workqueue once it
      has completed handling the oplock break allowing writers to proceed with
      writing to the file.
      
      While testing, it was noticed that the filehandle could be closed while
      there is a pending oplock break which results in the oplock break
      handler on the cifsiod workqueue being cancelled before it has had a
      chance to execute and clear the CIFS_INODE_PENDING_OPLOCK_BREAK bit.
      Any subsequent attempt to write to this file hangs waiting for the
      CIFS_INODE_PENDING_OPLOCK_BREAK bit to be cleared.
      
      We fix this by ensuring that we also clear the bit
      CIFS_INODE_PENDING_OPLOCK_BREAK when we remove the oplock break handler
      from the workqueue.
      
      The bug was found by Red Hat QA while testing using ltp's fsstress
      command.
      Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
      Acked-by: NShirish Pargaonkar <shirishpargaonkar@gmail.com>
      Signed-off-by: NJeff Layton <jlayton@samba.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NSteve French <steve.french@primarydata.com>
      ca7df8e0
    • G
      cifs: use memzero_explicit to clear stack buffer · f99dbfa4
      Giel van Schijndel 提交于
      When leaving a function use memzero_explicit instead of memset(0) to
      clear stack allocated buffers. memset(0) may be optimized away.
      
      This particular buffer is highly likely to contain sensitive data which
      we shouldn't leak (it's named 'passwd' after all).
      Signed-off-by: NGiel van Schijndel <me@mortis.eu>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Reported-at: http://www.viva64.com/en/b/0299/
      Reported-by: Andrey Karpov
      Reported-by: Svyatoslav Razmyslov
      Signed-off-by: NSteve French <steve.french@primarydata.com>
      f99dbfa4
    • S
      btrfs: fix state->private cast on 32 bit machines · 6e1103a6
      Satoru Takeuchi 提交于
      Suppress the following warning displayed on building 32bit (i686) kernel.
      
      ===============================================================================
      ...
         CC [M]  fs/btrfs/extent_io.o
      fs/btrfs/extent_io.c: In function ‘btrfs_free_io_failure_record’:
      fs/btrfs/extent_io.c:2193:13: warning: cast to pointer from integer of
      different size [-Wint-to-pointer-cast]
          failrec = (struct io_failure_record *)state->private;
      ...
      ===============================================================================
      Signed-off-by: NSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Reported-by: NChris Murphy <chris@colorremedies.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      6e1103a6
    • F
      Btrfs: fix race deleting block group from space_info->ro_bgs list · 75c68e9f
      Filipe Manana 提交于
      When removing a block group we were deleting it from its space_info's
      ro_bgs list without the correct protection - the space info's spinlock.
      Fix this by doing the list delete while holding the spinlock of the
      corresponding space info, which is the correct lock for any operation
      on that list.
      
      This issue was introduced in the 3.19 kernel by the following change:
      
          Btrfs: move read only block groups onto their own list V2
          commit 633c0aad
      
      I ran into a kernel crash while a task was running statfs, which iterates
      the space_info->ro_bgs list while holding the space info's spinlock,
      and another task was deleting it from the same list, without holding that
      spinlock, as part of the block group remove operation (while running the
      function btrfs_remove_block_group). This happened often when running the
      stress test xfstests/generic/038 I recently made.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      75c68e9f