1. 06 Sep, 2011: 3 commits
  2. 01 Sep, 2011: 2 commits
    • xfs: fix ->write_inode return values · 58d84c4e
      Christoph Hellwig committed
      Currently we always redirty an inode that was attempted to be written out
      synchronously but has been cleaned by an AIL push internally, which is
      rather bogus.  Fix that by doing the i_update_core check early on and
      returning 0 for it.  Also include async calls for it, as doing any work for
      those is just as pointless.  While we're at it also fix the sign for the
      EIO return in case of a filesystem shutdown, and fix the completely
      nonsensical locking around xfs_log_inode.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      (cherry picked from commit 297db93bb74cf687510313eb235a7aec14d67e97)
      Signed-off-by: Alex Elder <aelder@sgi.com>
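A minimal user-space model of the control flow this commit describes, for illustration only. All names (model_inode, model_write_inode, the flush callback) are hypothetical and this is not the XFS source. It shows the ordering described above: the shutdown check returns a negative EIO, and a clean inode (i_update_core not set) returns 0 for both sync and async calls. Redirtying only when the write-out step actually fails is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the XFS in-core inode; not the real structure. */
struct model_inode {
    bool update_core; /* models ip->i_update_core: unlogged changes pending */
    bool redirtied;   /* set when we ask the VFS to retry writeback later */
};

/* Hypothetical write-out step; returns 0 or a positive errno, mirroring
 * the positive internal error convention XFS used at the time. */
typedef int (*model_flush_fn)(struct model_inode *ip);

/* Sketch of the fixed control flow: returns 0 or a negative errno. */
static int model_write_inode(struct model_inode *ip, bool shutdown,
                             model_flush_fn flush)
{
    int error;

    if (shutdown)
        return -EIO;           /* the VFS expects a negative errno */
    if (!ip->update_core)
        return 0;              /* already clean, e.g. after an AIL push;
                                  checked for sync and async calls alike */

    error = flush(ip);
    if (error)
        ip->redirtied = true;  /* assumption: redirty only on a real failure */
    return -error;
}
```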
    • xfs: fix xfs_mark_inode_dirty during umount · 866e4ed7
      Christoph Hellwig committed
      During umount we do not add a dirty inode to the lru and wait for it to
      become clean first, but force writeback of data and metadata with
      I_WILL_FREE set.  Currently there is no way for XFS to detect that the
      inode has been redirtied for metadata operations, as we skip the
      mark_inode_dirty call during teardown.  Fix this by setting i_update_core
      manually in that case, so that the inode gets flushed during inode reclaim.
      
      Alternatively we could enable calling mark_inode_dirty for inodes in
      I_WILL_FREE state, and let the VFS dirty tracking handle this.  I decided
      against this as we will get better I/O patterns from reclaim compared to
      the synchronous writeout in write_inode_now, and always marking the inode
      dirty in some way from xfs_mark_inode_dirty is a better safety net in
      either case.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
      (cherry picked from commit da6742a5a4cc844a9982fdd936ddb537c0747856)
      Signed-off-by: Alex Elder <aelder@sgi.com>
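The idea can be sketched as a hypothetical model (names and the flag value are invented for illustration, not taken from the kernel): during teardown the VFS-level dirtying is skipped, but the stand-in for the XFS-private i_update_core is always set, so reclaim still sees that the inode needs a flush.

```c
#include <stdbool.h>

/* Hypothetical flag value for this model only. */
#define MODEL_I_WILL_FREE (1u << 0)

struct model_inode {
    unsigned state;   /* models inode->i_state */
    bool vfs_dirty;   /* models VFS dirty tracking (mark_inode_dirty) */
    bool update_core; /* models ip->i_update_core, seen by inode reclaim */
};

static void model_xfs_mark_inode_dirty(struct model_inode *ip)
{
    /* During teardown (I_WILL_FREE) the VFS dirtying is skipped ... */
    if (!(ip->state & MODEL_I_WILL_FREE))
        ip->vfs_dirty = true;
    /* ... so always record the change where reclaim will see it. */
    ip->update_core = true;
}
```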
  3. 31 Aug, 2011: 1 commit
    • ext4: remove i_mutex lock in ext4_evict_inode to fix lockdep complaining · 8c0bec21
      Jiaying Zhang committed
      The i_mutex lock and flush_completed_IO() added by commit 2581fdc8
      in ext4_evict_inode() cause lockdep to complain about potential
      deadlock in several places.  In most/all of these LOCKDEP complaints
      it looks like it's a false positive, since many of the potential
      circular locking cases can't take place by the time the
      ext4_evict_inode() is called; but since at the very least it may mask
      real problems, we need to address this.
      
      This change removes the flush_completed_IO() and i_mutex lock in
      ext4_evict_inode().  Instead, we take a different approach to resolve
      the software lockup that commit 2581fdc8 intends to fix.  Rather
      than having the ext4-dio-unwritten thread wait to grab the i_mutex
      lock of an inode, we use mutex_trylock() instead, and simply requeue
      the work item if we fail to grab the inode's i_mutex lock.
      
      This should speed up work queue processing in general and also
      prevent the following deadlock scenario: a page fault happens while
      some process is writing inode A (holding its i_mutex).  During the page
      fault, shrink_icache_memory is called, which in turn evicts another
      inode B.  Inode B has some pending io_end work, so it calls
      ext4_ioend_wait(), which waits for inode B's i_ioend_count to become
      zero.  However, inode B's ioend work was queued behind some of inode
      A's ioend work on the same cpu's ext4-dio-unwritten workqueue.  As the
      ext4-dio-unwritten thread on that cpu is processing inode A's ioend
      work, it tries to grab inode A's i_mutex lock.  Since inode A's
      i_mutex lock is still held from before the page fault happened, we
      enter a deadlock.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
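A sketch of the trylock-and-requeue approach, modelled in user space with POSIX mutexes standing in for i_mutex. The work-item structure and function names are hypothetical. Note that pthread_mutex_trylock() returns 0 on success, whereas the kernel's mutex_trylock() returns 1 on success.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical io_end work item for this model. */
struct model_io_end {
    pthread_mutex_t *inode_lock; /* models the owning inode's i_mutex */
    bool converted;              /* models the unwritten-extent conversion */
};

/* Returns true if the item was handled, false if it must be requeued. */
static bool model_end_io_work(struct model_io_end *io)
{
    if (pthread_mutex_trylock(io->inode_lock) != 0)
        return false;            /* lock busy: do not block the worker */
    io->converted = true;
    pthread_mutex_unlock(io->inode_lock);
    return true;
}

/* One pass over a queue: handled items are dropped, busy ones are moved
 * (in order) to the front for the next pass. Returns the items left. */
static size_t model_run_queue(struct model_io_end **q, size_t n)
{
    size_t left = 0;

    for (size_t i = 0; i < n; i++)
        if (!model_end_io_work(q[i]))
            q[left++] = q[i];    /* requeue */
    return left;
}
```

In this model, inode B's item completes even while inode A's lock is held, instead of waiting behind A's item; A's item simply goes around again once its lock is free.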
  4. 27 Aug, 2011: 1 commit
  5. 26 Aug, 2011: 1 commit
    • lockdep: Add helper function for dir vs file i_mutex annotation · e096d0c7
      Josh Boyer committed
      Purely in-memory filesystems do not use the inode hash as the dcache
      tells us if an entry already exists.  As a result, they do not call
      unlock_new_inode, and thus directory inodes do not get put into a
      different lockdep class for i_sem.
      
      We need the different lockdep classes, because the locking order for
      i_mutex is different for directory inodes and regular inodes.  Directory
      inodes can do "readdir()", which takes i_mutex *before* possibly taking
      mm->mmap_sem (due to a page fault while copying the directory entry to
      user space).
      
      In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
      before accessing i_mutex.
      
      The two cases can never happen for the same inode, so no real deadlock
      can occur, but without the different lockdep classes, lockdep cannot
      understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
      can lead to false positives from lockdep like below:
      
          find/645 is trying to acquire lock:
           (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac
      
          but task is already holding lock:
           (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
          vfs_readdir+0x5b/0xb4
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
                [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
                [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
                [<ffffffff81111557>] mmap_region+0x258/0x432
                [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
                [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
                [<ffffffff8100c858>] sys_mmap+0x22/0x24
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
          -> #0 (&mm->mmap_sem){++++++}:
                [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff81109541>] might_fault+0x89/0xac
                [<ffffffff81149cff>] filldir+0x6f/0xc7
                [<ffffffff811586ea>] dcache_readdir+0x67/0x205
                [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
                [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
      This patch moves the directory vs file lockdep annotation into a helper
      function that can be called by in-memory filesystems and has hugetlbfs
      call it.
      Signed-off-by: Josh Boyer <jwboyer@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
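The effect of separate lock classes can be illustrated with a toy dependency recorder. This is a hypothetical model, not lockdep: it only checks for direct two-lock inversions, while real lockdep searches the whole dependency graph. Without the annotation, the hugetlbfs mmap path (mmap_sem, then i_mutex) and the readdir path (i_mutex, then mmap_sem) fall into the same class and look like an inversion; with directories in their own class, they do not.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical lock classes for this model. */
enum model_class { CLS_MMAP_SEM, CLS_I_MUTEX, CLS_I_MUTEX_DIR, CLS_COUNT };

/* dep[a][b] is true once some task acquired class b while holding class a. */
static bool dep[CLS_COUNT][CLS_COUNT];

static void model_reset(void)
{
    memset(dep, 0, sizeof(dep));
}

/* Class assigned to an inode's i_mutex; "annotated" models whether the
 * filesystem called the dir-vs-file helper. */
static enum model_class model_i_mutex_class(mode_t mode, bool annotated)
{
    return (annotated && S_ISDIR(mode)) ? CLS_I_MUTEX_DIR : CLS_I_MUTEX;
}

/* Record "held -> next" and report whether the reverse edge already exists,
 * i.e. whether this toy checker would flag a circular dependency. */
static bool model_acquire(enum model_class held, enum model_class next)
{
    dep[held][next] = true;
    return dep[next][held];
}
```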
  6. 25 Aug, 2011: 1 commit
  7. 24 Aug, 2011: 1 commit
  8. 23 Aug, 2011: 1 commit
  9. 21 Aug, 2011: 1 commit
  10. 20 Aug, 2011: 1 commit
    • ext4: flush any pending end_io requests before DIO reads w/dioread_nolock · dccaf33f
      Jiaying Zhang committed
      There is a race between ext4 buffered writes and direct I/O reads with
      the dioread_nolock mount option enabled. The problem is that we clear
      the PageWriteback flag at end_io time, but with dioread_nolock the
      uninitialized-to-initialized extent conversion is only done later.
      If an O_DIRECT read request comes in during this window, ext4 will return
      zeros instead of the recently written data.
      
      This patch checks whether there are any pending uninitialized-to-initialized
      extent conversion requests before doing an O_DIRECT read, to close the race.
      Note that this is just a band-aid fix. The fundamental issue is that we
      clear the PageWriteback flag before we really complete an I/O, which is
      error-prone. To fix the fundamental issue, we may need to implement an
      extent tree cache that we can use to look up pending to-be-converted extents.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
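A single-block model of the race and of the band-aid, with hypothetical names: an extent still flagged uninitialized reads back as zeros, and flushing pending conversions before the direct read returns the written data instead.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_BLKSZ 8

/* Hypothetical single-block file for this model. */
struct model_file {
    char disk[MODEL_BLKSZ];  /* data already written to the block */
    bool unwritten;          /* extent still flagged uninitialized */
    int pending_end_io;      /* queued uninit-to-init conversions */
};

/* Models flushing completed end_io work: convert the extent. */
static void model_flush_pending_end_io(struct model_file *f)
{
    if (f->pending_end_io > 0) {
        f->unwritten = false;
        f->pending_end_io = 0;
    }
}

/* Direct read: an extent still marked unwritten reads back as zeros. */
static void model_dio_read(struct model_file *f, char *out, bool flush_first)
{
    if (flush_first)
        model_flush_pending_end_io(f);
    if (f->unwritten)
        memset(out, 0, MODEL_BLKSZ);
    else
        memcpy(out, f->disk, MODEL_BLKSZ);
}
```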
  11. 19 Aug, 2011: 2 commits
  12. 18 Aug, 2011: 4 commits
  13. 17 Aug, 2011: 13 commits
  14. 16 Aug, 2011: 1 commit
    • cifs: demote cERROR in build_path_from_dentry to cFYI · fa71f447
      Jeff Layton committed
      Running the cthon tests on a recent kernel caused this message to pop
      up occasionally:
      
          CIFS VFS: did not end path lookup where expected namelen is 0
      
      Some added debugging showed that namelen and dfsplen were both 0 when
      this occurred. That means that the read_seqretry returned true.
      
      Assuming that the comment inside the if statement is true, this should
      be harmless and just means that we raced with a rename. If that is the
      case, then there's no need for alarm and we can demote this to cFYI.
      
      While we're at it, print the dfsplen too so that we can see what
      happened here if the message pops during debugging.
      
      Cc: stable@kernel.org
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
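The read_seqretry() check follows the usual sequence-counter reader pattern. The sketch below is a hypothetical C11 model of that pattern (not the kernel's seqlock API), with a retry loop standing in for the harmless rename race. The data the counter would protect is omitted; a real concurrent implementation would also need those reads to be race-free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal sequence counter in the style of a seqlock; hypothetical. */
struct model_seqcount {
    atomic_uint seq;   /* odd while a writer (e.g. a rename) is in progress */
};

static void model_write_begin(struct model_seqcount *s)
{
    atomic_fetch_add(&s->seq, 1);   /* becomes odd */
}

static void model_write_end(struct model_seqcount *s)
{
    atomic_fetch_add(&s->seq, 1);   /* becomes even again */
}

static unsigned model_read_begin(struct model_seqcount *s)
{
    unsigned v;

    while ((v = atomic_load(&s->seq)) & 1u)
        ;                           /* wait for the writer to finish */
    return v;
}

/* True if a writer ran since model_read_begin(): the snapshot is stale. */
static bool model_read_retry(struct model_seqcount *s, unsigned start)
{
    return atomic_load(&s->seq) != start;
}

/* Hypothetical reader loop: snapshot, build, and retry if a rename raced.
 * Returns how many retries were needed. */
static int model_build_with_retry(struct model_seqcount *s,
                                  void (*build)(void *arg), void *arg)
{
    int retries = 0;
    unsigned start;

    for (;;) {
        start = model_read_begin(s);
        build(arg);                 /* e.g. walk the dentry chain */
        if (!model_read_retry(s, start))
            return retries;
        retries++;                  /* raced with a rename: debug-level, retry */
    }
}
```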
  15. 14 Aug, 2011: 3 commits
    • ext4: fix nomblk_io_submit option so it correctly converts uninit blocks · 9dd75f1f
      Theodore Ts'o committed
      Bug discovered by Jan Kara:
      
      Finally, commit 1449032b restored
      the old IO submission code, but apparently it forgot to restore the old
      handling of uninitialized buffers, so we unconditionally call
      block_write_full_page() without specifying an end_io function. So AFAICS
      we never convert unwritten extents to written in some cases. For
      example, when I mount the fs as

          mount -t ext4 -o nomblk_io_submit,dioread_nolock /dev/ubdb /mnt

      and do
              int fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0600);
              char buf[1024];
              memset(buf, 'a', sizeof(buf));
              fallocate(fd, 0, 0, 16384);
              write(fd, buf, sizeof(buf));
      
      I get a file full of zeros (after remounting the filesystem so that
      pagecache is dropped) instead of seeing the first KB contain 'a's.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
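The fragment from the report, completed into two helper functions with error checking. The helper names are hypothetical, and the remount between writing and checking (needed to drop the page cache) has to be done by hand. Per the report, on an affected kernel mounted with -o nomblk_io_submit,dioread_nolock, check_first_kib() would return 0 after the remount; elsewhere it should return 1.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Steps from the report: preallocate 16 KiB (an unwritten extent on ext4),
 * then write 1 KiB of 'a' into it through the page cache.
 * Returns 0 on success, -1 on any failed system call. */
static int reproduce_write(const char *path)
{
    char buf[1024];
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(buf, 'a', sizeof(buf));
    if (fallocate(fd, 0, 0, 16384) != 0 ||
        write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Run after remounting: 1 if the first KiB reads back as 'a's,
 * 0 if it does not (the reported symptom), -1 on I/O error. */
static int check_first_kib(const char *path)
{
    char buf[1024];
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = read(fd, buf, sizeof(buf));
    close(fd);
    if (n != (ssize_t)sizeof(buf))
        return -1;
    for (size_t i = 0; i < sizeof(buf); i++)
        if (buf[i] != 'a')
            return 0;
    return 1;
}
```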
    • ext4: Resolve the hang of direct i/o read in handling EXT4_IO_END_UNWRITTEN. · 32c80b32
      Tao Ma committed
      Setting the EXT4_IO_END_UNWRITTEN flag and incrementing i_aiodio_unwritten
      should be done together, since ext4_end_io_nolock always clears the
      flag and decrements the counter at the same time.
      
      Currently we don't increment i_aiodio_unwritten when setting
      EXT4_IO_END_UNWRITTEN, so the counter can go negative and cause some
      processes to wait forever.
      
      Part of the patch came from Eric in his e-mail, but that part alone
      doesn't actually fix the problem Michael ran into.
      
      http://marc.info/?l=linux-ext4&m=131316851417460&w=2
      
      Reported-and-Tested-by: Michael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
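A sketch of keeping the flag and the counter in step, with hypothetical names and flag value: the counter is bumped only when the flag is newly set and dropped only when the flag is cleared, so repeated calls on either side cannot drive it negative.

```c
#include <stdatomic.h>

#define MODEL_IO_END_UNWRITTEN 0x1u  /* hypothetical flag value */

struct model_inode {
    atomic_int aiodio_unwritten;     /* models i_aiodio_unwritten */
};

struct model_io_end {
    unsigned flag;
    struct model_inode *inode;
};

/* Set the flag and take the count together, once per io_end. */
static void model_set_io_unwritten(struct model_io_end *io)
{
    if (!(io->flag & MODEL_IO_END_UNWRITTEN)) {
        io->flag |= MODEL_IO_END_UNWRITTEN;
        atomic_fetch_add(&io->inode->aiodio_unwritten, 1);
    }
}

/* End-io side as described above: clear the flag and drop the count together. */
static void model_end_io(struct model_io_end *io)
{
    if (io->flag & MODEL_IO_END_UNWRITTEN) {
        io->flag &= ~MODEL_IO_END_UNWRITTEN;
        atomic_fetch_sub(&io->inode->aiodio_unwritten, 1);
    }
}
```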
    • ext4: call ext4_ioend_wait and ext4_flush_completed_IO in ext4_evict_inode · 2581fdc8
      Jiaying Zhang committed
      Flush the inode's i_completed_io_list before calling ext4_ioend_wait to
      prevent the following deadlock scenario: a page fault happens while
      some process is writing inode A. During the page fault,
      shrink_icache_memory is called, which in turn evicts another inode
      B. Inode B has some pending io_end work, so it calls ext4_ioend_wait(),
      which waits for inode B's i_ioend_count to become zero. However, inode
      B's ioend work was queued behind some of inode A's ioend work on the
      same cpu's ext4-dio-unwritten workqueue. As the ext4-dio-unwritten
      thread on that cpu is processing inode A's ioend work, it tries to
      grab inode A's i_mutex lock. Since inode A's i_mutex lock is
      still held from before the page fault happened, we enter a deadlock.
      
      This also moves ext4_flush_completed_IO and ext4_ioend_wait from
      ext4_destroy_inode() to ext4_evict_inode(). During inode deletion,
      ext4_evict_inode() is called before ext4_destroy_inode(), and in
      ext4_evict_inode() we may call ext4_truncate() without holding the
      i_mutex lock. As a result, there is a race between flush_completed_IO,
      which is called from ext4_ext_truncate(), and ext4_end_io_work, which
      may corrupt an io_end structure. Doing the flush and wait in
      ext4_evict_inode() resolves this race between ext4_truncate() and
      ext4_end_io_work during inode deletion.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
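A single-threaded model of why flushing first helps, with hypothetical names. It assumes, as a simplification, that finishing and freeing a completed io_end drops i_ioend_count. If eviction only waits, the count stays nonzero until the (possibly blocked) workqueue reaches the inode's io_ends; flushing the inode's own list in-line first lets the wait finish without depending on the workqueue.

```c
#include <stdbool.h>

/* Hypothetical per-inode I/O state for this model. */
struct model_inode {
    int completed_io; /* io_ends sitting on i_completed_io_list */
    int ioend_count;  /* models i_ioend_count: io_ends not yet freed */
};

/* Models ext4_flush_completed_IO(): finish and free our own io_ends. */
static void model_flush_completed_io(struct model_inode *ino)
{
    ino->ioend_count -= ino->completed_io; /* assumption: freeing drops the count */
    ino->completed_io = 0;
}

/* Nonzero count means ext4_ioend_wait() would have to sleep until some
 * other context (the workqueue) frees the io_ends. */
static bool model_evict_would_block(struct model_inode *ino, bool flush_first)
{
    if (flush_first)
        model_flush_completed_io(ino);
    return ino->ioend_count != 0;
}
```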
  16. 13 Aug, 2011: 4 commits
    • ext4: Fix ext4_should_writeback_data() for no-journal mode · 441c8508
      Curt Wohlgemuth committed
      ext4_should_writeback_data() had an incorrect sequence of
      tests to determine if it should return 0 or 1: in
      particular, even in no-journal mode, 0 was being returned
      for a non-regular-file inode.
      
      This meant that, in non-journal mode, we would use
      ext4_journalled_aops for directories, symlinks, and other
      non-regular files.  However, calling journalled aop
      callbacks when there is no valid handle can cause problems.
      
      This would cause a kernel crash with Jan Kara's commit
      2d859db3 ("ext4: fix data corruption in inodes with
      journalled data"), because we now dereference 'handle' in
      ext4_journalled_write_end().
      
      I also added BUG_ONs to check for a valid handle in the
      obviously journal-only aops callbacks.
      
      I tested this running xfstests with a scratch device in
      these modes:
      
         - no-journal
         - data=ordered
         - data=writeback
         - data=journal
      
      All work fine; the data=journal run has many failures and a
      crash in xfstests 074, but this is no different from a
      vanilla kernel.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
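The corrected test order can be modelled as a pure function. The parameters are hypothetical stand-ins for the journal pointer, the inode mode, a per-inode data-journalling flag, and the data= mount mode. The point of the fix is that the no-journal check comes before the regular-file check; the remaining checks are assumptions of this sketch.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/stat.h>

enum model_data_mode { MODEL_DATA_ORDERED, MODEL_DATA_WRITEBACK, MODEL_DATA_JOURNAL };

static int model_should_writeback_data(bool has_journal, mode_t mode,
                                       bool journal_data_flag,
                                       enum model_data_mode data_mode)
{
    if (!has_journal)
        return 1;      /* no journal: never select the journalled aops */
    if (!S_ISREG(mode))
        return 0;      /* with a journal, non-regular files are not "writeback" */
    if (journal_data_flag)
        return 0;      /* assumption: per-inode data journalling wins */
    return data_mode == MODEL_DATA_WRITEBACK;
}
```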
    • xfs: remove subdirectories · c59d87c4
      Christoph Hellwig committed
      Use the move from Linux 2.6 to Linux 3.x as an excuse to kill the
      annoying subdirectories in the XFS source code.  Besides the large
      number of file renames, the only changes are to the Makefile, a few
      files including headers with the subdirectory prefix, and the binary
      sysctl compat code that includes a header under fs/xfs/ from
      kernel/.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
    • xfs: don't expect xfs headers to be in subdirectories · 06f8e2d6
      Alex Elder committed
      Fix up some #include directives in preparation for moving a few
      header files out of xfs source subdirectories.
      
      Note that "xfs_linux.h" also got its quoting convention for included
      files switched.
      Signed-off-by: Alex Elder <aelder@sgi.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
    • xfs: replace xfs_buf_geterror() with bp->b_error · e5702805
      Chandra Seetharaman committed
      Since we just checked bp for NULL, it is ok to replace
      xfs_buf_geterror() with bp->b_error in these places.
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: Alex Elder <aelder@sgi.com>
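The reasoning behind this change, as a hypothetical model: a NULL-tolerant accessor in the style of xfs_buf_geterror() gives the same answer as reading b_error directly once the caller has already checked the buffer for NULL. Treating a NULL buffer as ENOMEM is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xfs_buf. */
struct model_buf {
    int b_error;
};

/* NULL-tolerant accessor in the style of xfs_buf_geterror(). */
static int model_buf_geterror(const struct model_buf *bp)
{
    return bp ? bp->b_error : ENOMEM;   /* assumption: NULL reported as ENOMEM */
}

/* Caller that has already checked for NULL: the direct field read
 * returns the same value the accessor would. */
static int model_read_error(const struct model_buf *bp)
{
    if (!bp)
        return ENOMEM;
    return bp->b_error;
}
```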