1. 30 Apr, 2012 10 commits
  2. 24 Apr, 2012 1 commit
  3. 17 Apr, 2012 2 commits
  4. 13 Apr, 2012 1 commit
  5. 30 Mar, 2012 1 commit
    • Revert "ext4: don't release page refs in ext4_end_bio()" · 6268b325
      Committed by Linus Torvalds
      This reverts commit b43d17f3.
      
      Dave Jones reports that it causes lockups on his laptop, and his debug
      output showed a lot of processes hung waiting for page_writeback (or
      more commonly - processes hung waiting for a lock that was held during
      that writeback wait).
      
      The page_writeback hint made Ted suggest that Dave look at this commit,
      and Dave verified that reverting it makes his problems go away.
      
      Ted says:
       "That commit fixes a race which is seen when you write into fallocated
        (and hence uninitialized) disk blocks under *very* heavy memory
        pressure.  Furthermore, although theoretically it could trigger under
        normal direct I/O writes, it only seems to trigger if you are issuing
        a huge number of AIO writes, such that a just-written page can get
        evicted from memory, and then read back into memory, before the
        workqueue has a chance to update the extent tree.
      
        This race has been around for a little over a year, and no one noticed
        until two months ago; it only happens under fairly exotic conditions,
        and in fact even after trying very hard to create a simple repro under
        lab conditions, we could only reproduce the problem and confirm the
        fix on production servers running MySQL on very fast PCIe-attached
        flash devices.
      
        Given that Dave was able to hit this problem pretty quickly, if we
        confirm that this commit is at fault, the only reasonable thing to do
        is to revert it IMO."
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Acked-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 22 Mar, 2012 8 commits
  7. 21 Mar, 2012 3 commits
  8. 20 Mar, 2012 8 commits
  9. 19 Mar, 2012 1 commit
  10. 12 Mar, 2012 1 commit
  11. 05 Mar, 2012 4 commits
    • ext4: add comments to definition of ext4_io_end_t · 4188188b
      Committed by Curt Wohlgemuth
      This should make it more clear what this structure is used
      for, and how some of the (mutually exclusive) fields are
      used to keep page cache references.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: don't release page refs in ext4_end_bio() · b43d17f3
      Committed by Curt Wohlgemuth
      We can clear PageWriteback on each page when the IO
      completes, but we can't release the references on the page
      until we convert any uninitialized extents.
      
      Without this patch, the use of the dioread_nolock mount
      option can break buffered writes, because extents may
      not be converted by the time a subsequent buffered read
      comes in; if the page is not in the page cache, a read
      will return zeros if the extent is still uninitialized.
      
      I tested this with a (temporary) patch that adds a call
      to msleep(1000) at the start of ext4_end_io_work(), to delay
      processing of each DIO-unwritten work queue item.  With this
      msleep(), a simple workload of
      
        fallocate
        write
        fadvise
        read
      
      will fail without this patch and succeed with it.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
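
The fallocate / write / fadvise / read workload named in this commit message can be approximated from user space. The following is only a sketch: the commit message does not give file sizes, the fadvise advice value, or the test program itself, so the 1 MiB size, `POSIX_FADV_DONTNEED`, and the helper name `run_workload` are all assumptions. It also does not add the temporary `msleep(1000)` kernel patch, so on an unmodified kernel it is expected to simply read back what it wrote.

```python
import os
import tempfile


def run_workload(directory, size=1 << 20):
    """Run fallocate, write, fadvise, read on a scratch file.

    Returns True when the data read back equals the data written.
    `run_workload` and its parameters are hypothetical names for this sketch.
    """
    payload = os.urandom(size)  # random data, so a zero-filled read is detectable
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # 1. fallocate: reserve blocks (uninitialized extents on ext4).
        os.posix_fallocate(fd, 0, size)

        # 2. write: buffered write into the preallocated range.
        written = 0
        while written < size:
            written += os.pwrite(fd, payload[written:], written)

        # 3. fadvise: hint that cached pages are no longer needed
        #    (assumed advice value; the commit message does not name one).
        os.posix_fadvise(fd, 0, size, os.POSIX_FADV_DONTNEED)

        # 4. read: read the range back and compare.
        chunks = []
        offset = 0
        while offset < size:
            chunk = os.pread(fd, size - offset, offset)
            if not chunk:
                break  # unexpected EOF
            chunks.append(chunk)
            offset += len(chunk)
        return b"".join(chunks) == payload
    finally:
        os.close(fd)
        os.unlink(path)
```

To exercise the case the commit describes, `directory` would need to sit on an ext4 file system mounted with `dioread_nolock`; a `False` return would correspond to the read of zeros described above.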
    • ext4: fix race between sync and completed io work · 491caa43
      Committed by Jeff Moyer
      The following command line will leave the aio-stress process unkillable
      on an ext4 file system (in my case, mounted on /mnt/test):
      
      aio-stress -t 20 -s 10 -O -S -o 2 -I 1000 /mnt/test/aiostress.3561.4 /mnt/test/aiostress.3561.4.20 /mnt/test/aiostress.3561.4.19 /mnt/test/aiostress.3561.4.18 /mnt/test/aiostress.3561.4.17 /mnt/test/aiostress.3561.4.16 /mnt/test/aiostress.3561.4.15 /mnt/test/aiostress.3561.4.14 /mnt/test/aiostress.3561.4.13 /mnt/test/aiostress.3561.4.12 /mnt/test/aiostress.3561.4.11 /mnt/test/aiostress.3561.4.10 /mnt/test/aiostress.3561.4.9 /mnt/test/aiostress.3561.4.8 /mnt/test/aiostress.3561.4.7 /mnt/test/aiostress.3561.4.6 /mnt/test/aiostress.3561.4.5 /mnt/test/aiostress.3561.4.4 /mnt/test/aiostress.3561.4.3 /mnt/test/aiostress.3561.4.2
      
      This is using the aio-stress program from the xfstests test suite.
      That particular command line tells aio-stress to do random writes to
      20 files from 20 threads (one thread per file).  The files are NOT
      preallocated, so you will get writes to random offsets within the
      file, thus creating holes and extending i_size.  It also opens the
      file with O_DIRECT and O_SYNC.
      
      On to the problem.  When an I/O requires unwritten extent conversion,
      it is queued onto the completed_io_list for the ext4 inode.  Two code
      paths will pull work items from this list.  The first is the
      ext4_end_io_work routine, and the second is ext4_flush_completed_IO,
      which is called via the fsync path (and O_SYNC handling, as well).
      There are two issues I've found in these code paths.  First, if the
      fsync path beats the work routine to a particular I/O, the work
      routine will free the io_end structure!  It does not take into account
      the fact that the io_end may still be in use by the fsync path.  I've
      fixed this issue by adding yet another IO_END flag, indicating that
      the io_end is being processed by the fsync path.
      
      The second problem is that the work routine will make an assignment to
      io->flag outside of the lock.  I have witnessed this result in a hang
      at umount.  Moving the flag setting inside the lock resolved that
      problem.
      
      The problem was introduced by commit b82e384c ("ext4: optimize
      locking for end_io extent conversion"), which first appeared in 3.2.
      As such, the fix should be backported to that release (probably along
      with the unwritten extent conversion race fix).
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      CC: stable@kernel.org
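
The two fixes described in this commit message (a flag marking an io_end as owned by the fsync path, and flag updates made only while holding the lock) can be illustrated with a small user-space model. This is a sketch of the pattern, not the kernel patch: the class names, the flag name `IN_FSYNC`, and the retry loop in the worker are assumptions made for illustration.

```python
import threading
import time
from collections import deque

IN_FSYNC = 0x1  # hypothetical stand-in for the new IO_END flag


class IoEnd:
    """Toy io_end: tracks conversions and whether it has been freed."""

    def __init__(self, ident):
        self.ident = ident
        self.flag = 0
        self.converted = 0
        self.freed = False

    def convert(self):
        # Touching a freed io_end is the bug the flag is meant to prevent.
        assert not self.freed, "io_end used after free"
        self.converted += 1


class Inode:
    """Toy inode with a lock-protected list of completed I/Os."""

    def __init__(self, count):
        self.lock = threading.Lock()
        self.completed = deque(IoEnd(i) for i in range(count))


def fsync_path(inode):
    """Model of the fsync-side consumer of the completed list."""
    while True:
        with inode.lock:
            if not inode.completed:
                return
            io = inode.completed.popleft()
            io.flag |= IN_FSYNC       # claim the io_end under the lock
        io.convert()                  # conversion happens outside the lock
        with inode.lock:
            io.flag &= ~IN_FSYNC      # release the claim under the lock


def work_routine(inode, io):
    """Model of the per-io_end work item; it frees the io_end when done."""
    while True:
        with inode.lock:
            busy = bool(io.flag & IN_FSYNC)
            mine = False
            if not busy and io in inode.completed:
                inode.completed.remove(io)
                mine = True
        if busy:
            time.sleep(0)             # fsync path owns it: retry later
            continue
        if mine:
            io.convert()
        io.freed = True               # safe: the fsync path is not using it
        return


def run(count=50):
    """Race one fsync thread against one worker per io_end."""
    inode = Inode(count)
    items = list(inode.completed)
    threads = [threading.Thread(target=work_routine, args=(inode, io))
               for io in items]
    threads.append(threading.Thread(target=fsync_path, args=(inode,)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return items
```

In this model every io_end is converted exactly once by whichever path claims it first, and the worker never frees an io_end while `IN_FSYNC` is set.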
    • ext4: clean up the flags passed to __blockdev_direct_IO · 93ef8541
      Committed by Jeff Moyer
      For extent-based files, you can perform DIO to holes, as mentioned in
      the comments in ext4_ext_direct_IO.  However, that function passes
      DIO_SKIP_HOLES to __blockdev_direct_IO, which is *really* confusing to
      the uninitiated reader.  The key, here, is that the get_block function
      passed in, ext4_get_block_write, completely ignores the create flag
      that is passed to it (the create flag is passed in from the direct I/O
      code, which uses the DIO_SKIP_HOLES flag to determine whether or not
      it should be cleared).
      
      This is a long-winded way of saying that the DIO_SKIP_HOLES flag is
      ultimately ignored.  So let's remove it.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
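
The reasoning in this commit message, that a flag which only influences the create argument has no effect when the callback ignores that argument, can be shown with a toy model. Everything below is a simplification for illustration: the flag value, the function names, and the rule used to clear `create` (only for blocks inside the current file size) are assumptions, not the kernel's code.

```python
DIO_SKIP_HOLES = 0x02  # arbitrary value for this sketch


def direct_io_write(flags, block, blocks_in_file, get_block):
    """Toy model of the direct I/O layer choosing 'create' for a write.

    Assumed rule: with DIO_SKIP_HOLES set, 'create' is cleared for blocks
    that lie inside the current file size.
    """
    create = True
    if flags & DIO_SKIP_HOLES and block < blocks_in_file:
        create = False
    return get_block(block, create)


def get_block_honoring_create(block, create):
    """A callback that respects 'create' (hypothetical, for contrast)."""
    return "allocate" if create else "skip hole"


def get_block_write(block, create):
    """Models ext4_get_block_write as the commit describes it:
    the 'create' argument is ignored entirely."""
    return "allocate"


# With a callback that ignores 'create', the flag changes nothing.
with_flag = direct_io_write(DIO_SKIP_HOLES, 3, 10, get_block_write)
without_flag = direct_io_write(0, 3, 10, get_block_write)
print(with_flag == without_flag)
```

With `get_block_honoring_create` in place of `get_block_write`, the two calls would differ, which is why the flag looks meaningful at first glance.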