1. 13 Mar, 2016: 1 commit
    • ext4: fix NULL pointer dereference in ext4_mark_inode_dirty() · 5e1021f2
      Eryu Guan committed
      ext4_reserve_inode_write() in ext4_mark_inode_dirty() can fail on
      error (e.g. EIO), in which case iloc.bh is NULL. However, the error is
      ignored in the following "if" condition, so ext4_expand_extra_isize()
      might be called with a NULL iloc.bh, which triggers a NULL pointer
      dereference.
      
      This was uncovered by commit 8b4953e1 ("ext4: reserve code points for
      the project quota feature"), which enlarges the ext4_inode size, and can
      be reproduced by running the following script on a new kernel with an
      old mke2fs:
      
        #!/bin/bash
        mnt=/mnt/ext4
        devname=ext4-error
        dev=/dev/mapper/$devname
        fsimg=/home/fs.img
      
        trap cleanup 0 1 2 3 9 15
      
        cleanup()
        {
                umount $mnt >/dev/null 2>&1
                dmsetup remove $devname
                losetup -d $backend_dev
                rm -f $fsimg
                exit 0
        }
      
        rm -f $fsimg
        fallocate -l 1g $fsimg
        backend_dev=`losetup -f --show $fsimg`
        devsize=`blockdev --getsz $backend_dev`
      
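        # A pass-through "linear" table and an "error" table that fails all I/O
        good_tab="0 $devsize linear $backend_dev 0"
        error_tab="0 $devsize error $backend_dev 0"
      
        dmsetup create $devname --table "$good_tab"
      
        mkfs -t ext4 $dev
        mkdir -p $mnt
        mount -t ext4 -o errors=continue,strictatime $dev $mnt
      
        # Switch to the error table and drop caches, so that accessing the
        # mount (strictatime forces an atime update) dirties an inode while
        # every read fails with EIO
        dmsetup load $devname --table "$error_tab" && dmsetup resume $devname
        echo 3 > /proc/sys/vm/drop_caches
        ls -l $mnt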
        exit 0
      
      [ Patch changed to simplify the function a tiny bit. -- Ted ]
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
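The failure mode can be modeled in a few lines of user-space C. This is a minimal sketch, not the ext4 code: reserve_inode_write(), mark_inode_dirty(), disk_bh and the two-field buffer_head are hypothetical stand-ins, assuming only what the message states (the reserve step can fail with EIO and leave iloc.bh NULL). The point is that the error has to be checked before anything dereferences iloc.bh.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct buffer_head and struct ext4_iloc. */
struct buffer_head { int expanded; int dirty; };
struct iloc { struct buffer_head *bh; };

static struct buffer_head disk_bh;   /* the "inode table block" */

/* Models ext4_reserve_inode_write(): on an I/O error it returns -EIO and
 * leaves iloc->bh NULL. */
static int reserve_inode_write(int io_error, struct iloc *iloc)
{
	if (io_error) {
		iloc->bh = NULL;
		return -EIO;
	}
	iloc->bh = &disk_bh;
	return 0;
}

/* Models the fixed ext4_mark_inode_dirty(): return the error right away
 * instead of letting a later step dereference a NULL iloc.bh. */
static int mark_inode_dirty(int io_error, int needs_expand)
{
	struct iloc iloc;
	int err = reserve_inode_write(io_error, &iloc);

	if (err)
		return err;                 /* the check the fix relies on */
	if (needs_expand)
		iloc.bh->expanded = 1;      /* stand-in for ext4_expand_extra_isize() */
	iloc.bh->dirty = 1;             /* stand-in for marking the iloc dirty */
	return 0;
}
```

Without the `if (err)` check, the call with `io_error` set would dereference a NULL pointer in the `needs_expand` branch.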
  2. 10 Mar, 2016: 3 commits
    • ext4: more efficient SEEK_DATA implementation · 2d90c160
      Jan Kara committed
      Using SEEK_DATA in a huge sparse file can easily lead to soft lockups,
      as ext4_seek_data() iterates over a hole block by block. Fix the problem
      by using the hole size returned from ext4_map_blocks() to skip the hole
      in one go.
      
      Also update the SEEK_HOLE implementation to follow the same pattern as
      SEEK_DATA, to make future maintenance easier.
      
      Furthermore, add cond_resched() to both ext4_seek_data() and
      ext4_seek_hole() to avoid soft lockups in case a malicious user creates
      a huge fragmented file and we have to go through lots of extents.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
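The hole-skipping idea can be sketched over a toy extent list. Everything here is hypothetical: map_blocks() stands in for ext4_map_blocks() and assumes it returns 0 for a hole while still reporting the hole length (the behavior added by facab4d9), and seek_data() counts how many mapping calls it makes so the difference from a block-by-block scan is visible.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent: logical blocks [start, start + len) hold data. */
struct extent { uint32_t start, len; };

/* Stand-in for ext4_map_blocks(): returns the number of mapped blocks at
 * lblk, or 0 for a hole while still reporting the hole length in *len
 * (up to the next extent or end_blk). Extents must be sorted. */
static uint32_t map_blocks(const struct extent *ext, int n, uint32_t end_blk,
			   uint32_t lblk, uint32_t *len)
{
	for (int i = 0; i < n; i++) {
		if (lblk < ext[i].start) {            /* hole before ext[i] */
			*len = ext[i].start - lblk;
			return 0;
		}
		if (lblk < ext[i].start + ext[i].len) {
			*len = ext[i].start + ext[i].len - lblk;
			return *len;
		}
	}
	*len = end_blk - lblk;                    /* trailing hole */
	return 0;
}

/* SEEK_DATA sketch: jump over each hole in one step using the reported
 * length. Returns the first data block >= lblk, or end_blk if none;
 * *steps counts map_blocks() calls. */
static uint32_t seek_data(const struct extent *ext, int n, uint32_t end_blk,
			  uint32_t lblk, int *steps)
{
	*steps = 0;
	while (lblk < end_blk) {
		uint32_t len;

		(*steps)++;
		if (map_blocks(ext, n, end_blk, lblk, &len) > 0)
			return lblk;
		lblk += len;      /* skip the whole hole; the commit also adds
				   * cond_resched() to loops of this kind */
	}
	return end_blk;
}
```

For a file with data at block 10 and at block 1000000, seeking from block 15 takes two map_blocks() calls instead of close to a million single-block probes.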
    • ext4: cleanup handling of bh->b_state in DAX mmap · e3fb8eb1
      Jan Kara committed
      ext4_dax_mmap_get_block() updates bh->b_state directly instead of using
      ext4_update_bh_state(). This is mostly a cosmetic issue, since the DAX
      code always passes an on-stack buffer_head, but clean it up to make the
      code more uniform.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: return hole from ext4_map_blocks() · facab4d9
      Jan Kara committed
      Currently, ext4_map_blocks() just returns 0 when it finds a hole and
      allocation is not requested. However, we have all the information
      needed to tell how large the hole actually is, and some callers of
      ext4_map_blocks() could avoid block-by-block hole iteration if they
      knew it. So fill in struct ext4_map_blocks with the information we
      have even for holes. We keep returning 0 for holes to maintain
      backward compatibility of the function.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  3. 09 Mar, 2016: 4 commits
    • ext4: remove i_ioend_count · 600be30a
      Jan Kara committed
      Remove the counter of pending io_ends, as it is unused.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: simplify io_end handling for AIO DIO · 109811c2
      Jan Kara committed
      When mapping blocks for direct IO, we allocate an io_end structure
      before mapping blocks and store a pointer to it in the inode. This
      requires any AIO DIO using io_end to be protected by i_mutex, which
      caused problems in the past with dioread_nolock mode, where io_end
      pointers got corrupted. Also, io_end is allocated unnecessarily when we
      don't need to convert any extents (a common case, for example when
      overwriting a file).
      
      Fix the problem by allocating io_end only once the block mapping
      function returns an unwritten extent for AIO DIO (saving pointless
      io_end allocations), and pass a pointer to it in bh->b_private, which
      the generic DIO code later passes to our end IO callback. That way we
      remove any need for a global pointer to the io_end structure and thus
      fix the races.
      
      The downside of this change is that the check for unwritten IO in
      flight in ext4_extents_can_be_merged() is more racy, since we now
      increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping
      i_data_sem. However, the check was already racy before, because
      ext4_writepages() already increments i_unwritten after dropping
      i_data_sem, and reserved blocks save us from hitting ENOSPC in the worst
      case.
      Signed-off-by: Jan Kara <jack@suse.cz>
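A sketch of the allocation-site change, under heavy simplification: struct io_end, get_block_dio() and dio_end_io() are hypothetical, and the sketch omits the reference counting, i_unwritten accounting and workqueue deferral the real patch has to handle. It only shows that an io_end is allocated when the mapping is unwritten and travels through bh->b_private rather than through a pointer in the inode.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins. */
struct io_end { int unwritten_blocks; };
struct buffer_head { int unwritten; void *b_private; };

static int io_end_allocs;      /* how many io_ends were allocated */
static int extents_converted;  /* how many conversions end IO performed */

/* DIO get_block sketch: allocate an io_end only for an unwritten extent
 * and hand it to the DIO core through bh->b_private. */
static int get_block_dio(struct buffer_head *bh, int extent_unwritten)
{
	bh->unwritten = extent_unwritten;
	bh->b_private = NULL;
	if (extent_unwritten) {
		struct io_end *io = calloc(1, sizeof(*io));

		if (!io)
			return -ENOMEM;
		io->unwritten_blocks = 1;
		io_end_allocs++;
		bh->b_private = io;
	}
	return 0;
}

/* End IO callback sketch: the DIO core passes b_private back to us. */
static void dio_end_io(void *private)
{
	struct io_end *io = private;

	if (!io)
		return;            /* plain overwrite: nothing to convert */
	extents_converted += io->unwritten_blocks;
	free(io);
}
```

An overwrite of already-written blocks never touches the allocator, which is the "pointless io_end allocations" saving the message mentions.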
    • ext4: move trans handling and completion deferal out of _ext4_get_block · efe70c29
      Jan Kara committed
      There is no need to handle starting a transaction and deferral of DIO
      completion in the _ext4_get_block() function. We can move this out to
      the get block functions for direct IO that need it. That way we can
      add stricter checks verifying that things work as we expect.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: rename and split get blocks functions · 705965bd
      Jan Kara committed
      Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better
      describe what it does. Also split out the get blocks functions for
      direct IO; a later patch moves functionality from _ext4_get_blocks()
      there. There is no functional change in this patch.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  4. 19 Feb, 2016: 2 commits
    • ext4: fix crashes in dioread_nolock mode · 74dae427
      Jan Kara committed
      A competing overwrite DIO in dioread_nolock mode will simply overwrite
      the pointer to io_end in the inode. This may result in data corruption,
      or in extent conversion happening from the IO completion interrupt,
      because we don't properly set buffer_defer_completion() when an
      unlocked DIO races with a locked DIO to an unwritten extent.
      
      Since unlocked DIO doesn't need io_end for anything, just avoid
      allocating it and corrupting the inode's io_end pointer for locked DIO.
      A cleaner fix would be to avoid these games with the io_end pointer in
      the inode, but that requires more intrusive changes, so we leave that
      for later.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix bh->b_state corruption · ed8ad838
      Jan Kara committed
      ext4 can update bh->b_state non-atomically in _ext4_get_block() and
      ext4_da_get_block_prep(). Usually this is fine, since bh is just
      temporary on-stack storage for mapping information, but in some cases
      it can be a fully live bh attached to a page. In that case, a
      non-atomic update of bh->b_state can race with an atomic update, which
      then gets lost. Usually, when we are mapping a bh and thus updating
      bh->b_state non-atomically, nobody else touches the bh, so things work
      out fine, but there is one case to worry about in particular:
      ext4_finish_bio() uses BH_Uptodate_Lock on the first bh in the page to
      synchronize handling of the PageWriteback state. So when blocksize <
      pagesize, we can be atomically modifying bh->b_state of a buffer that
      actually isn't under IO, and can thus race, e.g., with delalloc trying
      to map that buffer. As a result, we can mistakenly set or clear the
      BH_Uptodate_Lock bit, corrupting the PageWriteback state or missing an
      unlock of BH_Uptodate_Lock.
      
      Fix the problem by always updating bh->b_state bits atomically.
      
      CC: stable@vger.kernel.org
      Reported-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
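The race is a lost update on a shared word of flag bits. One way to replace only the mapping bits without clobbering a concurrent atomic set or clear of another bit is a compare-and-swap loop; the sketch below does this with C11 atomics. The bit names, the MAP_FLAGS mask and update_bh_state() are hypothetical, and the sketch is not claimed to match the kernel helper line for line.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit layout: "mapping" bits owned by get_block, plus a lock
 * bit (in the role of BH_Uptodate_Lock) another context may flip. */
#define BH_MAPPED     (1UL << 0)
#define BH_NEW        (1UL << 1)
#define BH_UNWRITTEN  (1UL << 2)
#define BH_LOCKBIT    (1UL << 3)
#define MAP_FLAGS     (BH_MAPPED | BH_NEW | BH_UNWRITTEN)

struct buffer_head { _Atomic unsigned long b_state; };

/* Replace only the MAP_FLAGS bits. If another context changes b_state
 * between our load and our store, the CAS fails, old_state is refreshed
 * with the current value, and we recompute, so its update is not lost. */
static void update_bh_state(struct buffer_head *bh, unsigned long flags)
{
	unsigned long old_state = atomic_load(&bh->b_state);
	unsigned long new_state;

	flags &= MAP_FLAGS;
	do {
		new_state = (old_state & ~MAP_FLAGS) | flags;
	} while (!atomic_compare_exchange_weak(&bh->b_state, &old_state,
					       new_state));
}
```

A plain `bh->b_state = (bh->b_state & ~MAP_FLAGS) | flags` is a separate load and store, which is exactly the window in which a concurrent change to the lock bit can be overwritten.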
  5. 23 Jan, 2016: 1 commit
    • wrappers for ->i_mutex access · 5955102c
      Al Viro committed
      Parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please use those for access to ->i_mutex; over the coming cycle,
      ->i_mutex will become an rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
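The wrapper pattern is simple enough to sketch with POSIX mutexes. This user-space version assumes a stand-in struct inode whose i_mutex is a pthread_mutex_t (the kernel wrappers wrap a struct mutex instead), and shows only lock, unlock and trylock.

```c
#include <assert.h>
#include <pthread.h>

/* Simplified stand-in for struct inode; only the mutex matters here. */
struct inode { pthread_mutex_t i_mutex; };

/* inode_foo(inode) is mutex_foo(&inode->i_mutex): callers never name
 * i_mutex directly, so its type can change in one place later. */
static inline void inode_lock(struct inode *inode)
{
	pthread_mutex_lock(&inode->i_mutex);
}

static inline void inode_unlock(struct inode *inode)
{
	pthread_mutex_unlock(&inode->i_mutex);
}

/* Returns 1 if the lock was taken, 0 if it is already held. */
static inline int inode_trylock(struct inode *inode)
{
	return pthread_mutex_trylock(&inode->i_mutex) == 0;
}
```

Because callers only go through inode_lock() and friends, turning i_mutex into a reader/writer lock later means editing the wrappers rather than every call site.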
  6. 09 Jan, 2016: 1 commit
  7. 09 Dec, 2015: 1 commit
    • don't put symlink bodies in pagecache into highmem · 21fc61c7
      Al Viro committed
      kmap() in page_follow_link_light() needed to go: allowing an arbitrary
      number of kmaps to be held for a long time is a great way to deadlock
      the system.
      
      A new helper, inode_nohighmem(inode), needs to be used for pagecache
      symlink inodes; this is done for all in-tree cases.
      page_follow_link_light() is instrumented to yell about anything missed.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 08 Dec, 2015: 6 commits
    • ext4: use pre-zeroed blocks for DAX page faults · ba5843f5
      Jan Kara committed
      Make the DAX fault path use pre-zeroed blocks to avoid races with extent
      conversion and zeroing when two page faults to the same block happen.
      Signed-off-by: Jan Kara <jack@suse.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: implement allocation of pre-zeroed blocks · c86d8db3
      Jan Kara committed
      The DAX page fault path needs to get blocks that are pre-zeroed to avoid
      races when two concurrent page faults happen in the same block of a
      file. Implement support for this in ext4_map_blocks().
      Signed-off-by: Jan Kara <jack@suse.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: provide ext4_issue_zeroout() · 53085fac
      Jan Kara committed
      Create a new function, ext4_issue_zeroout(), to zero out a contiguous
      (both logically and physically) part of inode data. We will need to
      issue zeroout when an extent structure is not readily available, and
      this function allows us to do so without making up fake extent
      structures.
      Signed-off-by: Jan Kara <jack@suse.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flag · 2dcba478
      Jan Kara committed
      When dioread_nolock mode is enabled, we grab i_data_sem in
      ext4_ext_direct_IO(), and therefore we need to use
      EXT4_GET_BLOCKS_NO_LOCK to instruct _ext4_get_block() not to grab
      i_data_sem again. However, holding i_data_sem over an overwrite direct
      IO isn't needed these days. We have exclusion against truncate / hole
      punching because we increase i_dio_count under i_mutex in
      ext4_ext_direct_IO(), so once ext4_file_write_iter() verifies that
      blocks are allocated and written, they are guaranteed to stay so
      during the whole direct IO, even after we drop i_mutex.
      
      So we can just remove this locking abuse and the no-longer-necessary
      EXT4_GET_BLOCKS_NO_LOCK flag.
      Signed-off-by: Jan Kara <jack@suse.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix races of writeback with punch hole and zero range · 01127848
      Jan Kara committed
      With delayed allocation, the update of the on-disk inode size is
      postponed until IO submission time. However, hole punch or zero range
      fallocate calls can end up discarding the tail page cache page, so the
      on-disk inode size would never be properly updated.
      
      Make sure the on-disk inode size is updated before truncating the page
      cache.
      Signed-off-by: Jan Kara <jack@suse.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix races between page faults and hole punching · ea3d7209
      Jan Kara committed
      Currently, page faults and hole punching are completely unsynchronized.
      This can result in a page fault faulting a page into a range that we
      are punching after truncate_pagecache_range() has been called, so we
      can end up with a page mapped to disk blocks that will shortly be
      freed. Filesystem corruption will follow soon after. Note that the same
      race is avoided for truncate by checking the page fault offset against
      i_size, but there is no similar mechanism available for punching holes.
      
      Fix the problem by creating a new rw semaphore, i_mmap_sem, in the
      inode, and grabbing it for writing over truncate, hole punching, and
      other functions removing blocks from the extent tree, and for reading
      over page faults. We cannot easily use i_data_sem for this, since it
      ranks below transaction start, and we need something ranking above it
      so that it can be held over the whole truncate / hole punching
      operation. Also remove various workarounds we had in the code to reduce
      the race window where a page fault could have created pages with stale
      mapping information.
      Signed-off-by: Jan Kara <jack@suse.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  9. 11 Nov, 2015: 1 commit
    • vfs: remove unused wrapper block_page_mkwrite() · 5c500029
      Ross Zwisler committed
      The function currently called "__block_page_mkwrite()" used to be called
      "block_page_mkwrite()" until a wrapper for this function was added by:
      
      commit 24da4fab ("vfs: Create __block_page_mkwrite() helper passing error values back")
      
      This wrapper, the current "block_page_mkwrite()", is currently unused.
      __block_page_mkwrite() is used directly by ext4, nilfs2 and xfs.
      
      Remove the unused wrapper, rename __block_page_mkwrite() back to
      block_page_mkwrite() and update the comment above block_page_mkwrite().
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  10. 07 Nov, 2015: 1 commit
  11. 18 Oct, 2015: 2 commits
  12. 15 Oct, 2015: 1 commit
    • ext4: use private version of page_zero_new_buffers() for data=journal mode · b90197b6
      Theodore Ts'o committed
      If there is an error while copying data from userspace into the page
      cache during a write(2) system call in data=journal mode,
      ext4_journalled_write_end() was using page_zero_new_buffers() from
      fs/buffer.c.  Unfortunately, this sets the buffer dirty flag, which is
      no good if journalling is enabled.  This is a long-standing bug that
      goes back years in ext3, but the combination of (a) data=journal not
      being very common, (b) in many cases it only resulting in a warning
      message, and (c) it only very rarely causing a kernel hang, means that
      we only really noticed it as a problem when commit 998ef75d caused this
      failure to happen frequently enough to make generic/208 fail when run
      in data=journal mode.
      
      The fix is to have our own version of this function that doesn't call
      mark_dirty_buffer(), since we will end up calling
      ext4_handle_dirty_metadata() on the buffer head(s) in question very
      shortly afterwards in ext4_journalled_write_end().
      
      Thanks to Dave Hansen and Linus Torvalds for helping to identify the
      root cause of the problem.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.com>
  13. 03 Oct, 2015: 2 commits
    • ext4 crypto: ext4_page_crypto() doesn't need a encryption context · 3684de8c
      Theodore Ts'o committed
      Since ext4_page_crypto() doesn't need an encryption context (at least
      not any more), we can simplify a number of function signatures and
      avoid allocating a context in ext4_block_write_begin().  It also means
      we no longer need a separate ext4_decrypt_one() function.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: optimize ext4_writepage() for attempted 4k delalloc writes · cccd147a
      Theodore Ts'o committed
      In cases where the file system block size is the same as the page
      size, and ext4_writepage() is asked to write out a page which either
      has the unwritten bit set in the extent tree or does not yet have a
      block assigned due to delayed allocation, we can bail out early,
      unlocking the page earlier and avoiding a round trip through
      ext4_bio_write_page() with the attendant calls to
      set_page_writeback() and redirty_page_for_writeback().
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
  14. 09 Sep, 2015: 2 commits
  15. 29 Jul, 2015: 1 commit
  16. 24 Jul, 2015: 1 commit
  17. 04 Jul, 2015: 1 commit
    • ext4: fix reservation release on invalidatepage for delalloc fs · 9705acd6
      Lukas Czerner committed
      On a delalloc-enabled file system, during the invalidatepage operation
      in ext4_da_page_release_reservation(), we want to clear the delayed
      buffer and remove the extent covering the delayed buffer from the
      extent status tree.
      
      However, there is currently a bug where, on systems with page size >
      block size, we always remove extents from the start of the page,
      regardless of where the actual delayed buffers are positioned in the
      page. This leads to errors like this:
      
      EXT4-fs warning (device loop0): ext4_da_release_space:1225:
      ext4_da_release_space: ino 13, to_free 1 with only 0 reserved data
      blocks
      
      This, however, can cause data loss at writeback time if the file system
      is in an ENOSPC condition, because we're releasing the reservation for
      someone else's delayed buffer.
      
      Fix this by only removing extents that correspond to the part of the
      page we want to invalidate.
      
      This problem is reproducible with the following fio recipe (however, I
      was only able to reproduce it with fio-2.1 or older):
      
      [global]
      bs=8k
      iodepth=1024
      iodepth_batch=60
      randrepeat=1
      size=1m
      directory=/mnt/test
      numjobs=20
      [job1]
      ioengine=sync
      bs=1k
      direct=1
      rw=randread
      filename=file1:file2
      [job2]
      ioengine=libaio
      rw=randwrite
      direct=1
      filename=file1:file2
      [job3]
      bs=1k
      ioengine=posixaio
      rw=randwrite
      direct=1
      filename=file1:file2
      [job5]
      bs=1k
      ioengine=sync
      rw=randread
      filename=file1:file2
      [job7]
      ioengine=libaio
      rw=randwrite
      filename=file1:file2
      [job8]
      ioengine=posixaio
      rw=randwrite
      filename=file1:file2
      [job10]
      ioengine=mmap
      rw=randwrite
      bs=1k
      filename=file1:file2
      [job11]
      ioengine=mmap
      rw=randwrite
      direct=1
      filename=file1:file2
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
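The corrected per-page walk can be sketched as follows. The inputs (page_index, page_shift, blkbits and a delay[] array marking delayed buffers) and the release_runs() helper are hypothetical; the sketch reports each run of contiguous delayed blocks lying inside [offset, offset + length) as a logical block range, instead of assuming the range begins at the first block of the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A logical block range whose delalloc reservation should be released. */
struct run { uint32_t lblk, len; };

/* Walk the blocks of one page (2^page_shift bytes, 2^blkbits-byte blocks)
 * and record every run of contiguous delayed blocks fully inside
 * [offset, offset + length). Returns the number of runs written. */
static int release_runs(uint64_t page_index, unsigned page_shift,
			unsigned blkbits, const bool *delay,
			unsigned offset, unsigned length, struct run *out)
{
	unsigned blocks_per_page = 1u << (page_shift - blkbits);
	unsigned blksize = 1u << blkbits;
	unsigned stop = offset + length;
	/* first logical block of this page */
	uint32_t first = (uint32_t)(page_index << (page_shift - blkbits));
	uint32_t contig = 0;
	int nruns = 0;
	unsigned i;

	for (i = 0; i < blocks_per_page; i++) {
		unsigned curr_off = i * blksize;

		if (curr_off + blksize > stop)
			break;                  /* block reaches past the range */
		if (curr_off >= offset && delay[i]) {
			contig++;               /* extend the current run */
		} else if (contig) {
			/* the run ended at block i - 1 */
			out[nruns++] = (struct run){ first + i - contig, contig };
			contig = 0;
		}
	}
	if (contig)
		out[nruns++] = (struct run){ first + i - contig, contig };
	return nruns;
}
```

With 1 KiB blocks in a 4 KiB page at index 1 and only blocks 5 and 6 delayed, the sketch reports a single run starting at block 5, whereas starting from the beginning of the page would target block 4.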
  18. 02 Jul, 2015: 1 commit
    • ext4: fix fencepost error in lazytime optimization · 0f0ff9a9
      Theodore Ts'o committed
      Commit 8f4d8558 ("ext4: fix lazytime optimization") was not a
      complete fix.  In the case where the inode number is a multiple of 16,
      we could still end up writing an inode's dirty timestamps to the
      wrong inode on disk.  Oops.
      
      This can be easily reproduced by using generic/005 with a file system
      with metadata_csum and lazytime enabled.
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
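The fencepost comes from inode numbers being one-based. Below is a minimal sketch of finding the first inode that shares an inode-table block with a given inode, assuming a power-of-two number of inodes per block (16 with 4 KiB blocks and 256-byte inodes). first_ino_in_block() is a hypothetical name; the sketch illustrates the arithmetic and is not claimed to be the exact expression in the patch.

```c
#include <assert.h>

/* With 16 inodes per block, block 0 holds inodes 1..16, block 1 holds
 * 17..32, and so on. Subtract 1 to get a zero-based index, round down to
 * a multiple of inodes_per_block, then add 1 back.
 * inodes_per_block must be a power of two. */
static unsigned long first_ino_in_block(unsigned long ino,
					 unsigned long inodes_per_block)
{
	return ((ino - 1) & ~(inodes_per_block - 1)) + 1;
}
```

Inode 16 is the last inode of the first block, not the first inode of the second; masking the one-based number directly, without the subtract and add, would put it at the start of a block.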
  19. 22 Jun, 2015: 2 commits
  20. 21 Jun, 2015: 1 commit
    • ext4: prevent ext4_quota_write() from failing due to ENOSPC · c5e298ae
      Theodore Ts'o committed
      To prevent quota block tracking from becoming inaccurate when
      ext4_quota_write() fails with ENOSPC, we make two changes.  The quota
      file can now use the reserved blocks (since the quota file is arguably
      file system metadata), and ext4_quota_write() now uses
      ext4_should_retry_alloc() to retry the block allocation after a commit
      has completed and released some blocks for allocation.
      
      This fixes failures of xfstests generic/270:
      
      Quota error (device vdc): write_blk: dquota write failed
      Quota error (device vdc): qtree_write_dquot: Error -28 occurred while creating quota
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
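A sketch of the retry pattern, assuming a toy allocator that starts succeeding after some number of journal commits. struct fs, alloc_block(), should_retry_alloc(), quota_write_block() and MAX_RETRIES are all hypothetical; the real ext4_should_retry_alloc() has its own conditions and limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical file system state: allocation fails with -ENOSPC until
 * `free_after` journal commits have released blocks. */
struct fs { int commits; int free_after; };

static int alloc_block(struct fs *fs)
{
	return fs->commits >= fs->free_after ? 0 : -ENOSPC;
}

#define MAX_RETRIES 3   /* assumed bound for this sketch */

/* On ENOSPC: give up after MAX_RETRIES, otherwise "wait for a journal
 * commit" (which may free blocks) and ask the caller to retry. */
static bool should_retry_alloc(struct fs *fs, int *retries)
{
	if ((*retries)++ >= MAX_RETRIES)
		return false;
	fs->commits++;      /* stand-in for forcing and waiting on a commit */
	return true;
}

static int quota_write_block(struct fs *fs)
{
	int retries = 0;
	int err;

	do {
		err = alloc_block(fs);
	} while (err == -ENOSPC && should_retry_alloc(fs, &retries));
	return err;
}
```

Blocks freed by a transaction only become allocatable once that transaction commits, which is why a bounded retry after a commit can turn a transient ENOSPC into success.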
  21. 13 Jun, 2015: 1 commit
    • ext4: fix race between truncate and __ext4_journalled_writepage() · bdf96838
      Theodore Ts'o committed
      Commit cf108bca ("ext4: Invert the locking order of page_lock
      and transaction start") caused __ext4_journalled_writepage() to drop
      the page lock before the page was written back, as part of changing
      the locking order to jbd2_journal_start -> page_lock.  However, this
      introduced a potential race between truncate and writeback in
      data=journalled mode.
      
      Fix this by grabbing the page lock after starting the journal handle,
      and then checking whether the page was truncated out from under
      us.
      
      This fixes a number of different warnings and BUG_ONs when running
      xfstests generic/086 in data=journalled mode, including:
      
      jbd2_journal_dirty_metadata: vdc-8: bad jh for block 115643: transaction (ee3fe7c0, 164), jh->b_transaction (  (null), 0), jh->b_next_transaction (  (null), 0), jlist 0
      
      - and -
      
      kernel BUG at /usr/projects/linux/ext4/fs/jbd2/transaction.c:2200!
          ...
      Call Trace:
       [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
       [<c02b2de5>] __ext4_journalled_invalidatepage+0x10f/0x117
       [<c02b2ded>] ? __ext4_journalled_invalidatepage+0x117/0x117
       [<c027d883>] ? lock_buffer+0x36/0x36
       [<c02b2dfa>] ext4_journalled_invalidatepage+0xd/0x22
       [<c0229139>] do_invalidatepage+0x22/0x26
       [<c0229198>] truncate_inode_page+0x5b/0x85
       [<c022934b>] truncate_inode_pages_range+0x156/0x38c
       [<c0229592>] truncate_inode_pages+0x11/0x15
       [<c022962d>] truncate_pagecache+0x55/0x71
       [<c02b913b>] ext4_setattr+0x4a9/0x560
       [<c01ca542>] ? current_kernel_time+0x10/0x44
       [<c026c4d8>] notify_change+0x1c7/0x2be
       [<c0256a00>] do_truncate+0x65/0x85
       [<c0226f31>] ? file_ra_state_init+0x12/0x29
      
      - and -
      
      WARNING: CPU: 1 PID: 1331 at /usr/projects/linux/ext4/fs/jbd2/transaction.c:1396
      irty_metadata+0x14a/0x1ae()
          ...
      Call Trace:
       [<c01b879f>] ? console_unlock+0x3a1/0x3ce
       [<c082cbb4>] dump_stack+0x48/0x60
       [<c0178b65>] warn_slowpath_common+0x89/0xa0
       [<c02ef2cf>] ? jbd2_journal_dirty_metadata+0x14a/0x1ae
       [<c0178bef>] warn_slowpath_null+0x14/0x18
       [<c02ef2cf>] jbd2_journal_dirty_metadata+0x14a/0x1ae
       [<c02d8615>] __ext4_handle_dirty_metadata+0xd4/0x19d
       [<c02b2f44>] write_end_fn+0x40/0x53
       [<c02b4a16>] ext4_walk_page_buffers+0x4e/0x6a
       [<c02b59e7>] ext4_writepage+0x354/0x3b8
       [<c02b2f04>] ? mpage_release_unused_pages+0xd4/0xd4
       [<c02b1b21>] ? wait_on_buffer+0x2c/0x2c
       [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
       [<c02b5a5b>] __writepage+0x10/0x2e
       [<c0225956>] write_cache_pages+0x22d/0x32c
       [<c02b5a4b>] ? ext4_writepage+0x3b8/0x3b8
       [<c02b6ee8>] ext4_writepages+0x102/0x607
       [<c019adfe>] ? sched_clock_local+0x10/0x10e
       [<c01a8a7c>] ? __lock_is_held+0x2e/0x44
       [<c01a8ad5>] ? lock_is_held+0x43/0x51
       [<c0226dff>] do_writepages+0x1c/0x29
       [<c0276bed>] __writeback_single_inode+0xc3/0x545
       [<c0277c07>] writeback_sb_inodes+0x21f/0x36d
          ...
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
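The shape of the fix, sketched with hypothetical stand-ins for the page, its mapping and the journal calls: because the page lock is dropped while the handle is started, the page has to be rechecked once it is locked again. A NULL page->mapping is how a page looks after truncation has removed it from the page cache; racing_truncate() simulates that here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the page cache objects involved. */
struct address_space { int unused; };
struct page { struct address_space *mapping; int locked; };

static int pages_journalled;

static void lock_page(struct page *page)   { page->locked = 1; }
static void unlock_page(struct page *page) { page->locked = 0; }
static void journal_start(void) { /* would start a jbd2 handle */ }
static void journal_stop(void)  { /* would stop the handle */ }

/* Simulates a truncate that runs while the page is unlocked. */
static void racing_truncate(struct page *page)
{
	page->mapping = NULL;
}

/* Drop the page lock, start the handle (journal start ranks above the
 * page lock), re-take the page lock, and only then check that the page
 * still belongs to this file. `race` injects a truncate in the window. */
static int journalled_writepage(struct page *page,
				struct address_space *mapping, int race)
{
	unlock_page(page);
	journal_start();
	if (race)
		racing_truncate(page);
	lock_page(page);
	if (page->mapping != mapping) {
		/* the page got truncated from under us: nothing to write */
		journal_stop();
		unlock_page(page);
		return 0;
	}
	pages_journalled++;   /* stand-in for journalling the page's buffers */
	journal_stop();
	unlock_page(page);
	return 0;
}
```

Skipping the recheck would journal buffers of a page that truncate has already invalidated, which matches the kind of jbd2 warnings and BUG_ONs quoted above.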
  22. 04 Jun, 2015: 1 commit
    • dax: don't abuse get_block mapping for endio callbacks · e842f290
      Dave Chinner committed
      dax_fault() currently relies on the get_block callback to attach an
      io completion callback to the mapping buffer head so that it can
      run unwritten extent conversion after zeroing allocated blocks.
      
      Instead of this hack, pass the conversion callback directly into
      dax_fault() similar to the get_block callback. When the filesystem
      allocates unwritten extents, it will set the buffer_unwritten()
      flag, and hence the dax_fault code can call the completion function
      in the contexts where it is necessary without overloading the
      mapping buffer head.
      
      Note: The changes to ext4 to use this interface are suspect at best.
      In fact, the way ext4 did this end_io assignment in the first place
      looks suspect because it only set a completion callback when there
      wasn't already some other write() call taking place on the same
      inode. The ext4 end_io code looks rather intricate and fragile with
      all its reference counting and passing to different contexts for
      modification via inode private pointers that aren't protected by
      locks...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
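A sketch of the interface change, assuming simplified callback types: get_block_fn, iodone_fn, fault_sketch() and the example filesystem callbacks are hypothetical names. The fault path receives the completion function as an explicit argument and calls it only when get_block reported an unwritten extent, rather than finding an end_io callback stashed in the mapping buffer_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified buffer_head carrying only mapping state. */
struct buffer_head { bool mapped; bool unwritten; };

typedef int (*get_block_fn)(struct buffer_head *bh, bool create);
typedef void (*iodone_fn)(struct buffer_head *bh, bool uptodate);

/* The completion callback is passed in directly and invoked only for
 * unwritten extents, after the newly allocated blocks are zeroed. */
static int fault_sketch(get_block_fn get_block, iodone_fn complete_unwritten)
{
	struct buffer_head bh = { false, false };
	int err = get_block(&bh, true);

	if (err)
		return err;
	/* ... zero newly allocated blocks and map the page here ... */
	if (bh.unwritten && complete_unwritten)
		complete_unwritten(&bh, true);
	return 0;
}

/* Example filesystem side. */
static int conversions;

static int get_block_unwritten(struct buffer_head *bh, bool create)
{
	bh->mapped = true;
	bh->unwritten = create;   /* allocations come back unwritten */
	return 0;
}

static int get_block_written(struct buffer_head *bh, bool create)
{
	(void)create;
	bh->mapped = true;
	bh->unwritten = false;
	return 0;
}

/* Stand-in for unwritten extent conversion. */
static void end_io_convert(struct buffer_head *bh, bool uptodate)
{
	if (uptodate) {
		bh->unwritten = false;
		conversions++;
	}
}
```

Keeping the callback out of the buffer_head means get_block only has to report the mapping, and the caller decides when conversion is needed.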
  23. 15 May, 2015: 1 commit
  24. 11 May, 2015: 2 commits