1. 01 5月, 2011 1 次提交
    • C
      ext4: don't set PageUptodate in ext4_end_bio() · 39db00f1
      Curt Wohlgemuth 提交于
      In the bio completion routine, we should not be setting
      PageUptodate at all -- it's set at sys_write() time, and is
      unaffected by success/failure of the write to disk.
      
      This can cause a page corruption bug when the file system's
      block size is less than the architecture's VM page size.
      
      if we have only written a single block -- we might end up
      setting the page's PageUptodate flag, indicating that page
      is completely read into memory, which may not be true.
      This could cause subsequent reads to get bad data.
      
      This commit also takes the opportunity to clean up error
      handling in ext4_end_bio(), and remove some extraneous code:
      
         - fixes ext4_end_bio() to set AS_EIO in the
           page->mapping->flags on error, which was left out by
           mistake.  This is needed so that fsync() will
           return an error if there was an I/O error.
         - remove the clear_buffer_dirty() call on unmapped
           buffers for each page.
         - consolidate page/buffer error handling in a single
           section.
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reported-by: NJim Meyering <jim@meyering.net>
      Reported-by: NHugh Dickins <hughd@google.com>
      Cc: Mingming Cao <cmm@us.ibm.com>
      39db00f1
  2. 19 4月, 2011 1 次提交
    • T
      ext4: check for ext[23] file system features when mounting as ext[23] · 2035e776
      Theodore Ts'o 提交于
      Provide better emulation for ext[23] mode by enforcing that the file
      system does not have any unsupported file system features as defined
      by ext[23] when emulating the ext[23] file system driver when
      CONFIG_EXT4_USE_FOR_EXT23 is defined.
      
      This causes the file system type information in /proc/mounts to be
      correct for the automatically mounted root file system.  This also
      means that "mount -t ext2 /dev/sda /mnt" will fail if /dev/sda
      contains an ext3 or ext4 file system, just as one would expect if the
      original ext2 file system driver were in use.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2035e776
  3. 17 4月, 2011 1 次提交
  4. 11 4月, 2011 4 次提交
  5. 06 4月, 2011 1 次提交
  6. 05 4月, 2011 3 次提交
  7. 31 3月, 2011 1 次提交
  8. 24 3月, 2011 5 次提交
  9. 22 3月, 2011 3 次提交
  10. 21 3月, 2011 4 次提交
  11. 17 3月, 2011 1 次提交
    • T
      ext4: Initialize fsync transaction ids in ext4_new_inode() · 688f869c
      Theodore Ts'o 提交于
      When allocating a new inode, we need to make sure i_sync_tid and
      i_datasync_tid are initialized.  Otherwise, one or both of these two
      values could be left initialized to zero, which could potentially
      result in BUG_ON in jbd2_journal_commit_transaction.
      
      (This could happen by having journal->commit_request getting set to
      zero, which could wake up the kjournald process even though there is
      no running transaction, which then causes a BUG_ON via the 
      J_ASSERT(j_ruinning_transaction != NULL) statement.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      688f869c
  12. 15 3月, 2011 2 次提交
  13. 10 3月, 2011 2 次提交
    • J
      block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe 提交于
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      721a9602
    • J
      block: remove per-queue plugging · 7eaceacc
      Jens Axboe 提交于
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So lets kill off the old plugging along with aops->sync_page().
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      7eaceacc
  14. 06 3月, 2011 1 次提交
  15. 01 3月, 2011 1 次提交
    • T
      ext4: optimize ext4_bio_write_page() when no extent conversion is needed · b6168443
      Theodore Ts'o 提交于
      If no extent conversion is required, wake up any processes waiting for
      the page's writeback to be complete and free the ext4_io_end structure
      directly in ext4_end_bio() instead of dropping it on the linked list
      (which requires taking a spinlock to queue and dequeue the io_end
      structure), and waiting for the workqueue to do this work.
      
      This removes an extra scheduling delay before process waiting for an
      fsync() to complete gets woken up, and it also reduces the CPU
      overhead for a random write workload.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b6168443
  16. 28 2月, 2011 6 次提交
    • A
      ext4: skip orphan cleanup if fs has unknown ROCOMPAT features · d39195c3
      Amir Goldstein 提交于
      Orphan cleanup is currently executed even if the file system has some
      number of unknown ROCOMPAT features, which deletes inodes and frees
      blocks, which could be very bad for some RO_COMPAT features,
      especially the SNAPSHOT feature.
      
      This patch skips the orphan cleanup if it contains readonly compatible
      features not known by this ext4 implementation, which would prevent
      the fs from being mounted (or remounted) readwrite.
      Signed-off-by: NAmir Goldstein <amir73il@users.sf.net>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d39195c3
    • A
      ext4: use the nblocks arg to ext4_truncate_restart_trans() · 8e8eaabe
      Amir Goldstein 提交于
      nblocks is passed into ext4_truncate_restart_trans() from
      ext4_ext_truncate_extend_restart() with a value different from the default
      blocks_for_truncate(), but is being ignored.
      
      The two other calls to ext4_truncate_restart_trans() already pass the
      default value, which is then being recalculated inside the function.
      
      Fix the problem by using the passed argument.
      Signed-off-by: NAmir Goldstein <amir73il@users.sf.net>
      8e8eaabe
    • M
      ext4: fix missing iput of root inode for some mount error paths · 32a9bb57
      Manish Katiyar 提交于
      This assures that the root inode is not leaked, and that sb->s_root is
      NULL, which will prevent generic_shutdown_super() from doing extra
      work, including call sync_filesystem, which ultimately results in
      ext4_sync_fs() getting called with an uninitialized struct super,
      which is the cause of the crash noted in Kernel Bugzilla #26752.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=26752Signed-off-by: NManish Katiyar <mkatiyar@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      32a9bb57
    • Y
      ext4: make FIEMAP and delayed allocation play well together · 6d9c85eb
      Yongqiang Yang 提交于
      Fix the FIEMAP ioctl so that it returns all of the page ranges which
      are still subject to delayed allocation.  We were missing some cases
      if the file was sparse.
      
      Reported by Chris Mason <chris.mason@oracle.com>:
      >We've had reports on btrfs that cp is giving us files full of zeros
      >instead of actually copying them.  It was tracked down to a bug with
      >the btrfs fiemap implementation where it was returning holes for
      >delalloc ranges.
      >
      >Newer versions of cp are trusting fiemap to tell it where the holes
      >are, which does seem like a pretty neat trick.
      >
      >I decided to give xfs and ext4 a shot with a few tests cases too, xfs
      >passed with all the ones btrfs was getting wrong, and ext4 got the basic
      >delalloc case right.
      >$ mkfs.ext4 /dev/xxx
      >$ mount /dev/xxx /mnt
      >$ dd if=/dev/zero of=/mnt/foo bs=1M count=1
      >$ fiemap-test foo
      >ext:   0 logical: [       0..     255] phys:        0..     255
      >flags: 0x007 tot: 256
      >
      >Horray!  But once we throw a hole in, things go bad:
      >$ mkfs.ext4 /dev/xxx
      >$ mount /dev/xxx /mnt
      >$ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
      >$ fiemap-test foo
      >< no output >
      >
      >We've got a delalloc extent after the hole and ext4 fiemap didn't find
      >it.  If I run sync to kick the delalloc out:
      >$sync
      >$ fiemap-test foo
      >ext:   0 logical: [     256..     511] phys:    34048..   34303
      >flags: 0x001 tot: 256
      >
      >fiemap-test is sitting in my /usr/local/bin, and I have no idea how it
      >got there.  It's full of pretty comments so I know it isn't mine, but
      >you can grab it here:
      >
      >http://oss.oracle.com/~mason/fiemap-test.c
      >
      >xfsqa has a fiemap program too.
      
      After Fix, test results are as follows:
      ext:   0 logical: [     256..     511] phys:        0..     255
      flags: 0x007 tot: 256
      ext:   0 logical: [     256..     511] phys:    33280..   33535
      flags: 0x001 tot: 256
      
      $ mkfs.ext4 /dev/xxx
      $ mount /dev/xxx /mnt
      $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=1
      $ sync
      $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=3
      $ dd if=/dev/zero of=/mnt/foo bs=1M count=1 seek=5
      $ fiemap-test foo
      ext:   0 logical: [     256..     511] phys:    33280..   33535
      flags: 0x000 tot: 256
      ext:   1 logical: [     768..    1023] phys:        0..     255
      flags: 0x006 tot: 256
      ext:   2 logical: [    1280..    1535] phys:        0..     255
      flags: 0x007 tot: 256
      Tested-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NYongqiang Yang <xiaoqiangnk@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      6d9c85eb
    • T
      ext4: suppress verbose debugging information if malloc-debug is off · 4dd89fc6
      Theodore Ts'o 提交于
      If CONFIG_EXT4_DEBUG is enabled, then if a block allocation fails due
      to disk being full, a verbose debugging message is printed, even if
      the malloc-debug switch has not been enabled.  Suppress the debugging
      message so that nothing is printed unless malloc-debug has been turned
      on.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      4dd89fc6
    • T
      ext4: don't leave PageWriteback set after memory failure · a54aa761
      Theodore Ts'o 提交于
      In ext4_bio_write_page(), if the memory allocation for the struct
      ext4_io_page fails, it returns with the page's PageWriteback flag set.
      This will end up causing the page not to skip writeback in
      WB_SYNC_NONE mode, and in WB_SYNC_ALL mode (i.e., on a sync, fsync, or
      umount) the writeback daemon will get stuck forever on the
      wait_on_page_writeback() function in write_cache_pages_da().
      
      Or, if journalling is enabled and the file gets deleted, it the
      journal thread can get stuck in journal_finish_inode_data_buffers()
      call to filemap_fdatawait().
      
      Another place where things can get hung up is in
      truncate_inode_pages(), called out of ext4_evict_inode().
      
      Fix this by not setting PageWriteback until after we have successfully
      allocated the struct ext4_io_page.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a54aa761
  17. 27 2月, 2011 3 次提交
    • T
      ext4: move setup of the mpd structure to write_cache_pages_da() · 168fc022
      Theodore Ts'o 提交于
      Move the initialization of all of the fields of the mpd structure to
      write_cache_pages_da().  This simplifies the code considerably.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      168fc022
    • T
      ext4: don't lock the next page in write_cache_pages if not needed · 78aaced3
      Theodore Ts'o 提交于
      If we have accumulated a contiguous region of memory to be written
      out, and the next page can added to this region, don't bother locking
      (and then unlocking the page) before writing out the memory.  In the
      unlikely event that the next page was being written back by some other
      CPU, we can also skip waiting that page to finish writeback.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      78aaced3
    • T
      ext4: remove page_skipped hackery in ext4_da_writepages() · ee6ecbcc
      Theodore Ts'o 提交于
      Because the ext4 page writeback codepath had been prematurely calling
      clear_page_dirty_for_io(), if it turned out that a particular page
      couldn't be written out during a particular pass of
      write_cache_pages_da(), the page would have to get redirtied by
      calling redirty_pages_for_writeback().  Not only was this wasted work,
      but redirty_page_for_writeback() would increment wbc->pages_skipped to
      signal to writeback_sb_inodes() that buffers were locked, and that it
      should skip this inode until later.
      
      Since this signal was incorrect in ext4's case --- which was caused by
      ext4's historically incorrect use of write_cache_pages() ---
      ext4_da_writepages() saved and restored wbc->skipped_pages to avoid
      confusing writeback_sb_inodes().
      
      Now that we've fixed ext4 to call clear_page_dirty_for_io() right
      before initiating the page I/O, we can nuke the page_skipped
      save/restore hackery, and breathe a sigh of relief.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ee6ecbcc