1. 07 5月, 2014 1 次提交
  2. 04 4月, 2014 1 次提交
    • J
      mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner 提交于
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      91b0abe3
  3. 18 3月, 2014 1 次提交
  4. 13 3月, 2014 1 次提交
    • J
      ext3: Speedup WB_SYNC_ALL pass · 2299432e
      Jan Kara 提交于
      When doing filesystem wide sync, there's no need to force transaction
      commit separately for each inode because ext3_sync_fs() takes care of
      forcing commit at the end. Most of the time this slowness doesn't
      manifest because previous WB_SYNC_NONE writeback doesn't leave much to
      write but when there are processes aggressively creating new files and
      several filesystems to sync, the sync slowness can be noticeable. In the
      following test script sync(1) takes around 6 minutes when there are two
      ext3 filesystems mounted on a standard SATA drive. After this patch sync
      is about twice as fast in the default data=ordered mode. For
      data=writeback mode we have even bigger speedup.
      
      function run_writers
      {
        for (( i = 0; i < 10; i++ )); do
          mkdir $1/dir$i
          for (( j = 0; j < 40000; j++ )); do
            dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
          done &
        done
      }
      
      for dir in "$@"; do
        run_writers $dir
      done
      
      sleep 40
      time sync
      Signed-off-by: NJan Kara <jack@suse.cz>
      2299432e
  5. 04 3月, 2014 2 次提交
  6. 03 3月, 2014 1 次提交
  7. 26 1月, 2014 1 次提交
  8. 04 7月, 2013 1 次提交
    • M
      mm: vmscan: take page buffers dirty and locked state into account · b4597226
      Mel Gorman 提交于
      Page reclaim keeps track of dirty and under writeback pages and uses it
      to determine if wait_iff_congested() should stall or if kswapd should
      begin writing back pages.  This fails to account for buffer pages that
      can be under writeback but not PageWriteback which is the case for
      filesystems like ext3 ordered mode.  Furthermore, PageDirty buffer pages
      can have all the buffers clean and writepage does no IO so it should not
      be accounted as congested.
      
      This patch adds an address_space operation that filesystems may
      optionally use to check if a page is really dirty or really under
      writeback.  An implementation is provided for for buffer_heads is added
      and used for block operations and ext3 in ordered mode.  By default the
      page flags are obeyed.
      
      Credit goes to Jan Kara for identifying that the page flags alone are
      not sufficient for ext3 and sanity checking a number of ideas on how the
      problem could be addressed.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Zlatko Calusic <zcalusic@bitsync.net>
      Cc: dormando <dormando@rydia.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4597226
  9. 22 5月, 2013 2 次提交
    • L
      jbd: change journal_invalidatepage() to accept length · d8c8900a
      Lukas Czerner 提交于
      ->invalidatepage() aop now accepts range to invalidate so we can make
      use of it in journal_invalidatepage() and all the users in ext3 file
      system. Also update ext3 trace point to print out length argument.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      d8c8900a
    • L
      mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner 提交于
      Currently there is no way to truncate partial page where the end
      truncate point is not at the end of the page. This is because it was not
      needed and the functionality was enough for file system truncate
      operation to work properly. However more file systems now support punch
      hole feature and it can benefit from mm supporting truncating page just
      up to the certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept range to be invalidated and update all the instances
      for it.
      
      We also change the block_invalidatepage() in the same way and actually
      make a use of the new length argument implementing range invalidation.
      
      Actual file system implementations will follow except the file systems
      where the changes are really simple and should not change the behaviour
      in any way .Implementation for truncate_page_range() which will be able
      to accept page unaligned ranges will follow as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      d47992f8
  10. 08 5月, 2013 1 次提交
  11. 20 3月, 2013 1 次提交
    • J
      ext3: fix data=journal fast mount/umount hang · e6436921
      Jan Kara 提交于
      In data=journal mode, if we unmount the file system before a
      transaction has a chance to complete, when the journal inode is being
      evicted, we can end up calling into log_wait_commit() for the
      last transaction, after the journalling machinery has been shut down.
      That triggers the WARN_ONCE in __log_start_commit().
      
      Arguably we should adjust ext3_should_journal_data() to return FALSE
      for the journal inode, but the only place it matters is
      ext3_evict_inode(), and so it's to save a bit of CPU time, and to make
      the patch much more obviously correct by inspection(tm), we'll fix it
      by explicitly not trying to waiting for a journal commit when we are
      evicting the journal inode, since it's guaranteed to never succeed in
      this case.
      
      This can be easily replicated via:
      
           mount -t ext3 -o data=journal /dev/vdb /vdb ; umount /vdb
      
      This is a port of ext4 fix from Ted Ts'o.
      Signed-off-by: NJan Kara <jack@suse.cz>
      e6436921
  12. 21 1月, 2013 3 次提交
  13. 13 12月, 2012 1 次提交
  14. 04 9月, 2012 1 次提交
  15. 02 9月, 2012 1 次提交
  16. 04 8月, 2012 1 次提交
  17. 16 5月, 2012 1 次提交
  18. 06 5月, 2012 1 次提交
  19. 01 4月, 2012 1 次提交
  20. 01 3月, 2012 1 次提交
  21. 09 1月, 2012 3 次提交
  22. 02 12月, 2011 1 次提交
  23. 22 11月, 2011 1 次提交
    • D
      ext3: NULL dereference in ext3_evict_inode() · bcdd0c16
      Dan Carpenter 提交于
      This is an fsfuzzer bug.  ->s_journal is set at the end of
      ext3_load_journal() but we try to use it in the error handling from
      ext3_get_journal() while it's still NULL.
      
      [  337.039041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
      [  337.040380] IP: [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
      [  337.041687] PGD 0
      [  337.043118] Oops: 0002 [#1] SMP
      [  337.044483] CPU 3
      [  337.044495] Modules linked in: ecb md4 cifs fuse kvm_intel kvm brcmsmac brcmutil crc8 cordic r8169 [last unloaded: scsi_wait_scan]
      [  337.047633]
      [  337.049259] Pid: 8308, comm: mount Not tainted 3.2.0-rc2-next-20111121+ #24 SAMSUNG ELECTRONICS CO., LTD. RV411/RV511/E3511/S3511    /RV411/RV511/E3511/S3511
      [  337.051064] RIP: 0010:[<ffffffff816e6539>]  [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
      [  337.052879] RSP: 0018:ffff8800b1d11ae8  EFLAGS: 00010282
      [  337.054668] RAX: 0000000000000100 RBX: 0000000000000000 RCX: ffff8800b77c2000
      [  337.056400] RDX: ffff8800a97b5c00 RSI: 0000000000000000 RDI: 0000000000000024
      [  337.058099] RBP: ffff8800b1d11ae8 R08: 6000000000000000 R09: e018000000000000
      [  337.059841] R10: ff67366cc2607c03 R11: 00000000110688e6 R12: 0000000000000000
      [  337.061607] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800a78f06e8
      [  337.063385] FS:  00007f9d95652800(0000) GS:ffff8800b7180000(0000) knlGS:0000000000000000
      [  337.065110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  337.066801] CR2: 0000000000000024 CR3: 00000000aef2c000 CR4: 00000000000006e0
      [  337.068581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  337.070321] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  337.072105] Process mount (pid: 8308, threadinfo ffff8800b1d10000, task ffff8800b1d02be0)
      [  337.073800] Stack:
      [  337.075487]  ffff8800b1d11b08 ffffffff811f48cf ffff88007ac9b158 0000000000000000
      [  337.077255]  ffff8800b1d11b38 ffffffff8119405d ffff88007ac9b158 ffff88007ac9b250
      [  337.078851]  ffffffff8181bda0 ffffffff8181bda0 ffff8800b1d11b68 ffffffff81131e31
      [  337.080284] Call Trace:
      [  337.081706]  [<ffffffff811f48cf>] log_start_commit+0x1f/0x40
      [  337.083107]  [<ffffffff8119405d>] ext3_evict_inode+0x1fd/0x2a0
      [  337.084490]  [<ffffffff81131e31>] evict+0xa1/0x1a0
      [  337.085857]  [<ffffffff81132031>] iput+0x101/0x210
      [  337.087220]  [<ffffffff811339d1>] iget_failed+0x21/0x30
      [  337.088581]  [<ffffffff811905fc>] ext3_iget+0x15c/0x450
      [  337.089936]  [<ffffffff8118b0c1>] ? ext3_rsv_window_add+0x81/0x100
      [  337.091284]  [<ffffffff816df9a4>] ext3_get_journal+0x15/0xde
      [  337.092641]  [<ffffffff811a2e9b>] ext3_fill_super+0xf2b/0x1c30
      [  337.093991]  [<ffffffff810ddf7d>] ? register_shrinker+0x4d/0x60
      [  337.095332]  [<ffffffff8111c112>] mount_bdev+0x1a2/0x1e0
      [  337.096680]  [<ffffffff811a1f70>] ? ext3_setup_super+0x210/0x210
      [  337.098026]  [<ffffffff8119a770>] ext3_mount+0x10/0x20
      [  337.099362]  [<ffffffff8111cbee>] mount_fs+0x3e/0x1b0
      [  337.100759]  [<ffffffff810eda1b>] ? __alloc_percpu+0xb/0x10
      [  337.102330]  [<ffffffff81135385>] vfs_kern_mount+0x65/0xc0
      [  337.103889]  [<ffffffff8113611f>] do_kern_mount+0x4f/0x100
      [  337.105442]  [<ffffffff811378fc>] do_mount+0x19c/0x890
      [  337.106989]  [<ffffffff810e8456>] ? memdup_user+0x46/0x90
      [  337.108572]  [<ffffffff810e84f3>] ? strndup_user+0x53/0x70
      [  337.110114]  [<ffffffff811383fb>] sys_mount+0x8b/0xe0
      [  337.111617]  [<ffffffff816ed93b>] system_call_fastpath+0x16/0x1b
      [  337.113133] Code: 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b 5d c3 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 0f b6 d4 38 c2 74 0c 0f 1f 00 f3 90 0f b6 07 38
      [  337.116588] RIP  [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
      [  337.118260]  RSP <ffff8800b1d11ae8>
      [  337.119998] CR2: 0000000000000024
      [  337.188701] ---[ end trace c36d790becac1615 ]---
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      bcdd0c16
  24. 02 11月, 2011 1 次提交
  25. 23 8月, 2011 2 次提交
  26. 23 7月, 2011 1 次提交
    • J
      ext3: Fix data corruption in inodes with journalled data · b22570d9
      Jan Kara 提交于
      When journalling data for an inode (either because it is a symlink or
      because the filesystem is mounted in data=journal mode), ext3_evict_inode()
      can discard unwritten data by calling truncate_inode_pages(). This is
      because we don't mark the buffer / page dirty when journalling data but only
      add the buffer to the running transaction and thus mm does not know there
      are still unwritten data.
      
      Fix the problem by carefully tracking transaction containing inode's data,
      committing this transaction, and writing uncheckpointed buffers when inode
      should be reaped.
      Signed-off-by: NJan Kara <jack@suse.cz>
      b22570d9
  27. 21 7月, 2011 2 次提交
  28. 25 6月, 2011 3 次提交
    • J
      ext3: Improve truncate error handling · ee3e77f1
      Jan Kara 提交于
      New truncate calling convention allows us to handle errors from
      ext3_block_truncate_page(). So reorganize the code so that
      ext3_block_truncate_page() is called before we change inode size.
      
      This also removes unnecessary block zeroing from error recovery after failed
      buffered writes (zeroing isn't needed because we could have never written
      non-zero data to disk). We have to be careful and keep zeroing in direct IO
      write error recovery because there we might have already overwritten end of the
      last file block.
      Signed-off-by: NJan Kara <jack@suse.cz>
      ee3e77f1
    • J
      ext3: Convert ext3 to new truncate calling convention · 40680f2f
      Jan Kara 提交于
      Mostly trivial conversion. We fix a bug that IS_IMMUTABLE and IS_APPEND files
      could not be truncated during failed writes as we change the code.  In fact the
      test is not needed at all because both IS_IMMUTABLE and IS_APPEND is tested in
      upper layers in do_sys_[f]truncate(), may_write(), etc.
      Signed-off-by: NJan Kara <jack@suse.cz>
      40680f2f
    • L
      ext3: Add fixed tracepoints · 785c4bcc
      Lukas Czerner 提交于
      This commit adds fixed tracepoints to the ext3 code. It is based on ext4
      tracepoints, however due to the differences of both file systems, there
      are some tracepoints missing (those for delaloc and for multi-block
      allocator) and there are some ext3 specific as well (for reservation
      windows).
      
      Here is a list:
      
      ext3_free_inode
      ext3_request_inode
      ext3_allocate_inode
      ext3_evict_inode
      ext3_drop_inode
      ext3_mark_inode_dirty
      ext3_write_begin
      ext3_ordered_write_end
      ext3_writeback_write_end
      ext3_journalled_write_end
      ext3_ordered_writepage
      ext3_writeback_writepage
      ext3_journalled_writepage
      ext3_readpage
      ext3_releasepage
      ext3_invalidatepage
      ext3_discard_blocks
      ext3_request_blocks
      ext3_allocate_blocks
      ext3_free_blocks
      ext3_sync_file_enter
      ext3_sync_file_exit
      ext3_sync_fs
      ext3_rsv_window_add
      ext3_discard_reservation
      ext3_alloc_new_reservation
      ext3_reserved
      ext3_forget
      ext3_read_block_bitmap
      ext3_direct_IO_enter
      ext3_direct_IO_exit
      ext3_unlink_enter
      ext3_unlink_exit
      ext3_truncate_enter
      ext3_truncate_exit
      ext3_get_blocks_enter
      ext3_get_blocks_exit
      ext3_load_inode
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NJan Kara <jack@suse.cz>
      785c4bcc
  29. 27 5月, 2011 1 次提交
    • C
      fs: pass exact type of data dirties to ->dirty_inode · aa385729
      Christoph Hellwig 提交于
      Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
      anything else, so that the filesystem can track internally if it
      needs to push out a transaction for fdatasync or not.
      
      This is just the prototype change with no user for it yet.  I plan
      to push large XFS changes for the next merge window, and getting
      this trivial infrastructure in this window would help a lot to avoid
      tree interdependencies.
      
      Also remove incorrect comments that ->dirty_inode can't block.  That
      has been changed a long time ago, and many implementations rely on it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      aa385729
  30. 31 3月, 2011 1 次提交