1. 23 3月, 2016 1 次提交
  2. 14 3月, 2016 3 次提交
  3. 13 3月, 2016 2 次提交
    • A
      ext4: print ext4 mount option data_err=abort correctly · 7915a861
      Ales Novak 提交于
      If data_err=abort option is specified for an ext3/ext4 mount,
      /proc/mounts does show it as "(null)". This is caused by token2str()
      returning NULL for Opt_data_err_abort (due to its pattern containing
      '=').
      Signed-off-by: NAles Novak <alnovak@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7915a861
    • E
      ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() · 5e1021f2
      Eryu Guan 提交于
      ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on
      error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is
      ignored in the following "if" condition and ext4_expand_extra_isize()
      might be called with NULL iloc.bh set, which triggers NULL pointer
      dereference.
      
      This is uncovered by commit 8b4953e1 ("ext4: reserve code points for
      the project quota feature"), which enlarges the ext4_inode size, and
      run the following script on new kernel but with old mke2fs:
      
        #/bin/bash
        mnt=/mnt/ext4
        devname=ext4-error
        dev=/dev/mapper/$devname
        fsimg=/home/fs.img
      
        trap cleanup 0 1 2 3 9 15
      
        cleanup()
        {
                umount $mnt >/dev/null 2>&1
                dmsetup remove $devname
                losetup -d $backend_dev
                rm -f $fsimg
                exit 0
        }
      
        rm -f $fsimg
        fallocate -l 1g $fsimg
        backend_dev=`losetup -f --show $fsimg`
        devsize=`blockdev --getsz $backend_dev`
      
        good_tab="0 $devsize linear $backend_dev 0"
        error_tab="0 $devsize error $backend_dev 0"
      
        dmsetup create $devname --table "$good_tab"
      
        mkfs -t ext4 $dev
        mount -t ext4 -o errors=continue,strictatime $dev $mnt
      
        dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
        echo 3 > /proc/sys/vm/drop_caches
        ls -l $mnt
        exit 0
      
      [ Patch changed to simplify the function a tiny bit. -- Ted ]
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      5e1021f2
  4. 10 3月, 2016 8 次提交
  5. 09 3月, 2016 6 次提交
    • J
      ext4: remove i_ioend_count · 600be30a
      Jan Kara 提交于
      Remove counter of pending io ends as it is unused.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      600be30a
    • J
      ext4: simplify io_end handling for AIO DIO · 109811c2
      Jan Kara 提交于
      When mapping blocks for direct IO, we allocate io_end structure before
      mapping blocks and store pointer to it in the inode. This creates a
      requirement that any AIO DIO using io_end must be protected by i_mutex.
      This created problems in the past with dioread_nolock mode which was
      corrupting io_end pointers. Also io_end is allocated unnecessarily in
      case where we don't need to convert any extents (which is a common case
      for example when overwriting file).
      
      We fix the problem by allocating io_end only once we return unwritten
      extent from block mapping function for AIO DIO (so we can save some
      pointless io_end allocations) and we pass pointer to it in bh->b_private
      which generic DIO code later passes to our end IO callback. That way we
      remove any need for global pointer to io_end structure and thus fix the
      races.
      
      The downside of this change is that the checking for unwritten IO in
      flight in ext4_extents_can_be_merged() is more racy since we now
      increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
      i_data_sem. However the check has been racy already before because
      ext4_writepages() already increment i_unwritten after dropping
      i_data_sem and reserved blocks save us from hitting ENOSPC in the worst
      case.
      Signed-off-by: NJan Kara <jack@suse.cz>
      109811c2
    • J
      ext4: move trans handling and completion deferal out of _ext4_get_block · efe70c29
      Jan Kara 提交于
      There is no need to handle starting of a transaction and deferal of DIO
      completion in _ext4_get_block() function. We can move this out to get
      block functions for direct IO that need it. That way we can add stricter
      checks verifying things work as we expect.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      efe70c29
    • J
      ext4: rename and split get blocks functions · 705965bd
      Jan Kara 提交于
      Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better
      describe what it does. Also split out get blocks functions for direct
      IO. Later we move functionality from _ext4_get_blocks() there. There's no
      functional change in this patch.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      705965bd
    • J
      ext4: use i_mutex to serialize unaligned AIO DIO · e142d052
      Jan Kara 提交于
      Currently we've used hashed aio_mutex to serialize unaligned AIO DIO.
      However the code cleanups that happened after 2011 when the lock was
      introduced made aio_mutex acquired at almost the same places where we
      already have exclusion using i_mutex. So just use i_mutex for the
      exclusion of unaligned AIO DIO.
      
      The change moves waiting for pending unwritten extent conversion under
      i_mutex. That makes special handling of O_APPEND writes unnecessary and
      also avoids possible livelocking of unaligned AIO DIO with aligned one
      (nothing was preventing contiguous stream of aligned AIO DIOs to let
      unaligned AIO DIO wait forever).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      e142d052
    • J
      ext4: pack ioend structure better · 3bd6ad7b
      Jan Kara 提交于
      On 64-bit architectures we have two 4-byte holes in struct ext4_io_end.
      Order entries better to avoid this and thus make the structure occupy
      64 instead of 72 bytes for 64-bit architectures.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      3bd6ad7b
  6. 29 2月, 2016 1 次提交
  7. 28 2月, 2016 4 次提交
    • R
      ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite() · 1e9d180b
      Ross Zwisler 提交于
      As it is currently written ext4_dax_mkwrite() assumes that the call into
      __dax_mkwrite() will not have to do a block allocation so it doesn't create
      a journal entry.  For a read that creates a zero page to cover a hole
      followed by a write that actually allocates storage this is incorrect.  The
      ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
      get_blocks() to allocate storage.
      
      Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
      as this function already has all the logic needed to allocate a journal
      entry and call __dax_fault().
      
      Also update the ext2 fault handlers in this same way to remove duplicate
      code and keep the logic between ext2 and ext4 the same.
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      1e9d180b
    • R
      dax: move writeback calls into the filesystems · 7f6d5b52
      Ross Zwisler 提交于
      Previously calls to dax_writeback_mapping_range() for all DAX filesystems
      (ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().
      
      dax_writeback_mapping_range() needs a struct block_device, and it used
      to get that from inode->i_sb->s_bdev.  This is correct for normal inodes
      mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
      block devices and for XFS real-time files.
      
      Instead, call dax_writeback_mapping_range() directly from the filesystem
      ->writepages function so that it can supply us with a valid block
      device.  This also fixes DAX code to properly flush caches in response
      to sync(2).
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f6d5b52
    • R
      ext4: online defrag not supported with DAX · 73f34a5e
      Ross Zwisler 提交于
      Online defrag operations for ext4 are hard coded to use the page cache.
      See ext4_ioctl() -> ext4_move_extents() -> move_extent_per_page()
      
      When combined with DAX I/O, which circumvents the page cache, this can
      result in data corruption.  This was observed with xfstests ext4/307 and
      ext4/308.
      
      Fix this by only allowing online defrag for non-DAX files.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73f34a5e
    • R
      ext2, ext4: only set S_DAX for regular inodes · 0a6cf913
      Ross Zwisler 提交于
      When S_DAX is set on an inode we assume that if there are pages attached
      to the mapping (mapping->nrpages != 0), those pages are clean zero pages
      that were used to service reads from holes.  Any dirty data associated
      with the inode should be in the form of DAX exceptional entries
      (mapping->nrexceptional) that is written back via
      dax_writeback_mapping_range().
      
      With the current code, though, this isn't always true.  For example,
      ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
      data stored as dirty page cache entries.  For these types of inodes,
      having S_DAX set doesn't really make sense since their I/O doesn't
      actually happen through the DAX code path.
      
      Instead, only allow S_DAX to be set for regular inodes for ext2 and
      ext4.  This allows us to have strict DAX vs non-DAX paths in the
      writeback code.
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0a6cf913
  8. 23 2月, 2016 6 次提交
  9. 22 2月, 2016 2 次提交
  10. 20 2月, 2016 1 次提交
  11. 19 2月, 2016 2 次提交
    • J
      ext4: fix crashes in dioread_nolock mode · 74dae427
      Jan Kara 提交于
      Competing overwrite DIO in dioread_nolock mode will just overwrite
      pointer to io_end in the inode. This may result in data corruption or
      extent conversion happening from IO completion interrupt because we
      don't properly set buffer_defer_completion() when unlocked DIO races
      with locked DIO to unwritten extent.
      
      Since unlocked DIO doesn't need io_end for anything, just avoid
      allocating it and corrupting pointer from inode for locked DIO.
      A cleaner fix would be to avoid these games with io_end pointer from the
      inode but that requires more intrusive changes so we leave that for
      later.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      74dae427
    • J
      ext4: fix bh->b_state corruption · ed8ad838
      Jan Kara 提交于
      ext4 can update bh->b_state non-atomically in _ext4_get_block() and
      ext4_da_get_block_prep(). Usually this is fine since bh is just a
      temporary storage for mapping information on stack but in some cases it
      can be fully living bh attached to a page. In such case non-atomic
      update of bh->b_state can race with an atomic update which then gets
      lost. Usually when we are mapping bh and thus updating bh->b_state
      non-atomically, nobody else touches the bh and so things work out fine
      but there is one case to especially worry about: ext4_finish_bio() uses
      BH_Uptodate_Lock on the first bh in the page to synchronize handling of
      PageWriteback state. So when blocksize < pagesize, we can be atomically
      modifying bh->b_state of a buffer that actually isn't under IO and thus
      can race e.g. with delalloc trying to map that buffer. The result is
      that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
      corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.
      
      Fix the problem by always updating bh->b_state bits atomically.
      
      CC: stable@vger.kernel.org
      Reported-by: NNikolay Borisov <kernel@kyup.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      ed8ad838
  12. 16 2月, 2016 1 次提交
  13. 12 2月, 2016 3 次提交
    • E
      ext4: remove unused parameter "newblock" in convert_initialized_extent() · 56263b4c
      Eryu Guan 提交于
      The "newblock" parameter is not used in convert_initialized_extent(),
      remove it.
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      56263b4c
    • E
      ext4: don't read blocks from disk after extents being swapped · bcff2488
      Eryu Guan 提交于
      I notice ext4/307 fails occasionally on ppc64 host, reporting md5
      checksum mismatch after moving data from original file to donor file.
      
      The reason is that move_extent_per_page() calls __block_write_begin()
      and block_commit_write() to write saved data from original inode blocks
      to donor inode blocks, but __block_write_begin() not only maps buffer
      heads but also reads block content from disk if the size is not block
      size aligned.  At this time the physical block number in mapped buffer
      head is pointing to the donor file not the original file, and that
      results in reading wrong data to page, which get written to disk in
      following block_commit_write call.
      
      This also can be reproduced by the following script on 1k block size ext4
      on x86_64 host:
      
          mnt=/mnt/ext4
          donorfile=$mnt/donor
          testfile=$mnt/testfile
          e4compact=~/xfstests/src/e4compact
      
          rm -f $donorfile $testfile
      
          # reserve space for donor file, written by 0xaa and sync to disk to
          # avoid EBUSY on EXT4_IOC_MOVE_EXT
          xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile
      
          # create test file written by 0xbb
          xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile
      
          # compute initial md5sum
          md5sum $testfile | tee md5sum.txt
          # drop cache, force e4compact to read data from disk
          echo 3 > /proc/sys/vm/drop_caches
      
          # test defrag
          echo "$testfile" | $e4compact -i -v -f $donorfile
          # check md5sum
          md5sum -c md5sum.txt
      
      Fix it by creating & mapping buffer heads only but not reading blocks
      from disk, because all the data in page is guaranteed to be up-to-date
      in mext_page_mkuptodate().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      bcff2488
    • I
      ext4: fix potential integer overflow · 46901760
      Insu Yun 提交于
      Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
      integer overflow could be happened.
      Therefore, need to fix integer overflow sanitization.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NInsu Yun <wuninsu@gmail.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      46901760