1. 11 May, 2015 1 commit
  2. 16 Apr, 2015 1 commit
  3. 12 Apr, 2015 3 commits
  4. 26 Mar, 2015 1 commit
  5. 22 May, 2014 1 commit
    • ext3: Fix deadlock in data=journal mode when fs is frozen · 166418cc
      Jan Kara committed
      When ext3 is used in data=journal mode, syncing the filesystem makes
      sure all the data is committed to the journal, but the data doesn't
      have to be checkpointed. ext3_freeze() then takes care of checkpointing
      all the data, so all buffer heads are clean, but pages can still have
      dangling dirty bits. So when the flusher thread comes along later while
      the filesystem is frozen, it tries to write back dirty pages;
      ext3_journalled_writepage() tries to start a transaction and hangs
      waiting for the frozen fs. This causes a deadlock, because a holder of
      the s_umount semaphore may be waiting for the flusher thread to complete.
      
      The fix is luckily relatively easy. We don't have to start a transaction
      in ext3_journalled_writepage() when a page is just dirty (and doesn't
      have PageChecked set) because in that case all buffers should be already
      mapped (mapping must happen before writing a buffer to the journal) and
      it is enough to write them out. This optimization also solves the deadlock
      because block_write_full_page() will just find out there's no buffer to
      write and do nothing.
      Signed-off-by: Jan Kara <jack@suse.cz>
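      The decision described in this commit message can be sketched as a
      small userspace model. This is only an illustration under stated
      assumptions, not the kernel code: every type and function name below
      (model_fs, model_page, model_journalled_writepage) is hypothetical, and
      "would block" stands in for the task sleeping on a frozen filesystem.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical userspace model; these are not kernel definitions. */
      struct model_fs {
          bool frozen;          /* starting a transaction would block */
          int  transactions;    /* transactions started so far */
          int  buffers_written; /* buffers submitted for I/O so far */
      };

      struct model_page {
          bool checked;         /* stands in for PageChecked */
          int  dirty_buffers;   /* buffers that still need writing */
      };

      enum wp_result { WP_WROTE, WP_JOURNALLED, WP_WOULD_BLOCK };

      static enum wp_result model_journalled_writepage(struct model_fs *fs,
                                                       struct model_page *pg)
      {
          if (!pg->checked) {
              /*
               * The page is merely dirty: its buffers are already mapped,
               * so write out whatever is dirty without a transaction.
               * After a freeze checkpointed everything, dirty_buffers is 0
               * and this path does nothing.
               */
              fs->buffers_written += pg->dirty_buffers;
              pg->dirty_buffers = 0;
              return WP_WROTE;
          }
          if (fs->frozen)
              return WP_WOULD_BLOCK; /* the real code would sleep here */
          fs->transactions++;        /* journal the page data */
          pg->checked = false;
          return WP_JOURNALLED;
      }
      ```

      In this model a dirty page without the checked bit never reaches the
      transaction path, so a frozen filesystem cannot stall the caller.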
  6. 07 May, 2014 3 commits
  7. 04 Apr, 2014 1 commit
    • mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner committed
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
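      The flag check described above might look roughly like the following
      single-threaded model. It is a sketch under assumptions: the slot
      array stands in for the page cache radix tree, MODEL_AS_EXITING and
      all other names are hypothetical, and the tree lock that the real
      code holds around both operations is omitted.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      #define MODEL_AS_EXITING 0x1u /* hypothetical bit value */
      #define MODEL_SLOTS 4

      enum slot_kind { SLOT_EMPTY, SLOT_PAGE, SLOT_SHADOW };

      struct model_mapping {
          unsigned int flags;
          enum slot_kind slots[MODEL_SLOTS];
      };

      /* Inode-freeing side: announce that the final truncate is coming. */
      static void model_mark_exiting(struct model_mapping *m)
      {
          m->flags |= MODEL_AS_EXITING;
      }

      /*
       * Reclaim side: evict the page in slot idx. A shadow entry is left
       * behind only if the mapping is not exiting; otherwise the slot is
       * simply emptied.
       */
      static void model_reclaim_evict(struct model_mapping *m, size_t idx)
      {
          assert(idx < MODEL_SLOTS);
          if (m->slots[idx] != SLOT_PAGE)
              return;
          m->slots[idx] = (m->flags & MODEL_AS_EXITING) ? SLOT_EMPTY
                                                        : SLOT_SHADOW;
      }

      /* True when no page and no shadow entry remains. */
      static bool model_tree_empty(const struct model_mapping *m)
      {
          for (size_t i = 0; i < MODEL_SLOTS; i++)
              if (m->slots[i] != SLOT_EMPTY)
                  return false;
          return true;
      }
      ```

      Once model_mark_exiting() has run, any later eviction leaves the slot
      empty, which is the property the inode freeing code relies on.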
  8. 18 Mar, 2014 1 commit
  9. 13 Mar, 2014 1 commit
    • ext3: Speedup WB_SYNC_ALL pass · 2299432e
      Jan Kara committed
      When doing a filesystem-wide sync, there's no need to force a
      transaction commit separately for each inode because ext3_sync_fs()
      takes care of forcing a commit at the end. Most of the time this
      slowness doesn't manifest, because the previous WB_SYNC_NONE writeback
      doesn't leave much to write, but when there are processes aggressively
      creating new files and several filesystems to sync, the sync slowness
      can be noticeable. In the following test script, sync(1) takes around
      6 minutes when there are two ext3 filesystems mounted on a standard
      SATA drive. After this patch, sync is about twice as fast in the
      default data=ordered mode. For data=writeback mode the speedup is even
      bigger.
      
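      # Spawn 10 background writers, each creating 40000 small files under $1.
      function run_writers
      {
        for (( i = 0; i < 10; i++ )); do
          mkdir "$1/dir$i"
          for (( j = 0; j < 40000; j++ )); do
            dd if=/dev/zero of="$1/dir$i/$j" bs=4k count=4 &>/dev/null
          done &
        done
      }

      # Start writers on every filesystem given on the command line.
      for dir in "$@"; do
        run_writers "$dir"
      done

      sleep 40
      time sync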
      Signed-off-by: Jan Kara <jack@suse.cz>
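      To see why skipping the per-inode commit helps, the following model
      counts forced commits for one writeback pass. It is a sketch, not the
      patch itself: the for_sync field is an assumed way of telling a
      sync(2) pass apart from other data-integrity writeback, and all
      names are hypothetical.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      enum model_sync_mode { MODEL_WB_SYNC_NONE, MODEL_WB_SYNC_ALL };

      /* Hypothetical stand-in for the writeback control structure. */
      struct model_wbc {
          enum model_sync_mode sync_mode;
          bool for_sync; /* assumed: set when writeback is part of sync(2) */
      };

      /* A per-inode commit is needed only for integrity writes outside sync. */
      static bool model_inode_needs_commit(const struct model_wbc *wbc)
      {
          return wbc->sync_mode == MODEL_WB_SYNC_ALL && !wbc->for_sync;
      }

      /* Count commits forced while writing nr_inodes inodes in one pass. */
      static unsigned int model_commits_for_pass(unsigned int nr_inodes,
                                                 const struct model_wbc *wbc)
      {
          unsigned int commits = 0;

          for (unsigned int i = 0; i < nr_inodes; i++)
              if (model_inode_needs_commit(wbc))
                  commits++;
          if (wbc->for_sync)
              commits++; /* the single commit forced at the end of sync */
          return commits;
      }
      ```

      With 1000 inodes, the model forces 1000 commits for integrity
      writeback outside of sync but only one for a sync pass.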
  10. 04 Mar, 2014 2 commits
  11. 03 Mar, 2014 1 commit
  12. 26 Jan, 2014 1 commit
  13. 04 Jul, 2013 1 commit
    • mm: vmscan: take page buffers dirty and locked state into account · b4597226
      Mel Gorman committed
      Page reclaim keeps track of dirty and under writeback pages and uses it
      to determine if wait_iff_congested() should stall or if kswapd should
      begin writing back pages.  This fails to account for buffer pages that
      can be under writeback but not PageWriteback which is the case for
      filesystems like ext3 ordered mode.  Furthermore, PageDirty buffer pages
      can have all the buffers clean and writepage does no IO so it should not
      be accounted as congested.
      
      This patch adds an address_space operation that filesystems may
      optionally use to check if a page is really dirty or really under
      writeback.  An implementation for buffer_heads is added and used for
      block operations and ext3 in ordered mode.  By default the page flags
      are obeyed.
      
      Credit goes to Jan Kara for identifying that the page flags alone are
      not sufficient for ext3 and sanity checking a number of ideas on how the
      problem could be addressed.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Cc: Zlatko Calusic <zcalusic@bitsync.net>
      Cc: dormando <dormando@rydia.net>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
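      A buffer-aware dirty/writeback check of the kind described above
      could be modelled as follows. This is a sketch under assumptions, not
      the kernel implementation: the structures and function names are
      hypothetical, a locked buffer is treated as "I/O in flight", and
      falling back to the page flags for pages without buffers is a choice
      made by this sketch.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      struct model_buffer {
          bool dirty;
          bool locked; /* treated here as "I/O in flight" */
      };

      struct model_page {
          bool page_dirty;     /* stands in for PageDirty */
          bool page_writeback; /* stands in for PageWriteback */
          const struct model_buffer *buffers; /* NULL if the page has none */
          size_t nr_buffers;
      };

      /* Default behaviour: obey the page flags. */
      static void model_page_flags_state(const struct model_page *pg,
                                         bool *dirty, bool *writeback)
      {
          *dirty = pg->page_dirty;
          *writeback = pg->page_writeback;
      }

      /* Buffer-aware variant: look at what the buffers actually say. */
      static void model_buffer_state(const struct model_page *pg,
                                     bool *dirty, bool *writeback)
      {
          if (pg->buffers == NULL) {
              model_page_flags_state(pg, dirty, writeback);
              return;
          }
          *dirty = false;
          *writeback = pg->page_writeback;
          for (size_t i = 0; i < pg->nr_buffers; i++) {
              if (pg->buffers[i].locked)
                  *writeback = true; /* under I/O without PageWriteback */
              if (pg->buffers[i].dirty)
                  *dirty = true;
          }
      }
      ```

      A PageDirty page whose buffers are all clean is then reported as
      neither dirty nor under writeback, so it would not count towards
      congestion.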
  14. 22 May, 2013 2 commits
    • jbd: change journal_invalidatepage() to accept length · d8c8900a
      Lukas Czerner committed
      ->invalidatepage() aop now accepts range to invalidate so we can make
      use of it in journal_invalidatepage() and all the users in ext3 file
      system. Also update ext3 trace point to print out length argument.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • mm: change invalidatepage prototype to accept length · d47992f8
      Lukas Czerner committed
      Currently there is no way to truncate a partial page where the end
      truncate point is not at the end of the page. This is because it was
      not needed and the existing functionality was enough for the file
      system truncate operation to work properly. However, more file systems
      now support the punch hole feature, and it can benefit from mm
      supporting truncating a page just up to a certain point.
      
      Specifically, with this functionality truncate_inode_pages_range() can
      be changed so it supports truncating partial page at the end of the
      range (currently it will BUG_ON() if 'end' is not at the end of the
      page).
      
      This commit changes the invalidatepage() address space operation
      prototype to accept the range to be invalidated and updates all the
      instances of it.
      
      We also change block_invalidatepage() in the same way and actually
      make use of the new length argument, implementing range invalidation.
      
      Actual file system implementations will follow, except for the file
      systems where the changes are really simple and should not change the
      behaviour in any way. An implementation of truncate_page_range(), which
      will be able to accept page-unaligned ranges, will follow as well.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
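      Range invalidation with an (offset, length) pair can be illustrated
      with a small model of a page split into fixed-size blocks. This is a
      sketch under assumptions: the page and block sizes, the structure and
      the function name are hypothetical, and only blocks lying entirely
      inside the range are discarded.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      #define MODEL_PAGE_SIZE  4096u
      #define MODEL_BLOCK_SIZE 1024u
      #define MODEL_BLOCKS     (MODEL_PAGE_SIZE / MODEL_BLOCK_SIZE)

      struct model_page {
          bool valid[MODEL_BLOCKS]; /* one entry per block in the page */
      };

      /*
       * Discard every block that lies entirely inside
       * [offset, offset + length). Returns the number of blocks discarded.
       */
      static unsigned int model_invalidate_range(struct model_page *pg,
                                                 unsigned int offset,
                                                 unsigned int length)
      {
          unsigned int stop, discarded = 0;

          assert(offset <= MODEL_PAGE_SIZE);
          assert(length <= MODEL_PAGE_SIZE - offset);
          stop = offset + length;

          for (unsigned int i = 0; i < MODEL_BLOCKS; i++) {
              unsigned int start = i * MODEL_BLOCK_SIZE;
              unsigned int end = start + MODEL_BLOCK_SIZE;

              if (end > stop)
                  break; /* this block extends past the range */
              if (start >= offset && pg->valid[i]) {
                  pg->valid[i] = false;
                  discarded++;
              }
          }
          return discarded;
      }
      ```

      Passing offset 0 and the full page size reproduces the old
      whole-page behaviour, while a shorter length leaves the tail of the
      page intact.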
  15. 08 May, 2013 1 commit
  16. 20 Mar, 2013 1 commit
    • ext3: fix data=journal fast mount/umount hang · e6436921
      Jan Kara committed
      In data=journal mode, if we unmount the file system before a
      transaction has a chance to complete, when the journal inode is being
      evicted, we can end up calling into log_wait_commit() for the
      last transaction, after the journalling machinery has been shut down.
      That triggers the WARN_ONCE in __log_start_commit().
      
      Arguably we should adjust ext3_should_journal_data() to return FALSE
      for the journal inode, but the only place it matters is
      ext3_evict_inode(). So, to save a bit of CPU time and to make the
      patch much more obviously correct by inspection(tm), we'll fix it by
      explicitly not trying to wait for a journal commit when we are
      evicting the journal inode, since it's guaranteed to never succeed in
      this case.
      
      This can be easily replicated via:
      
           mount -t ext3 -o data=journal /dev/vdb /vdb ; umount /vdb
      
      This is a port of ext4 fix from Ted Ts'o.
      Signed-off-by: Jan Kara <jack@suse.cz>
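      The eviction-time check described above can be sketched as a single
      predicate. This is an illustrative model, not the patched function:
      the structure, the field names and the helper are hypothetical, the
      journal inode number is an assumption of the sketch, and the NULL
      test on the journal pointer is an extra guard added here for the
      case where no journal has been set up yet.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      #define MODEL_JOURNAL_INO 8u /* assumed inode number of the journal */

      struct model_journal { int unused; };

      struct model_inode {
          unsigned int ino;
          unsigned int nlink;
          bool journals_data;            /* data=journal applies to it */
          struct model_journal *journal; /* NULL until a journal exists */
      };

      /* Should eviction wait for a journal commit before dropping pages? */
      static bool model_should_wait_for_commit(const struct model_inode *inode)
      {
          return inode->nlink > 0 &&
                 inode->journals_data &&
                 inode->journal != NULL &&
                 inode->ino != MODEL_JOURNAL_INO;
      }
      ```

      Evicting the journal inode itself therefore never waits on a commit
      that can no longer happen.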
  17. 21 Jan, 2013 3 commits
  18. 13 Dec, 2012 1 commit
  19. 04 Sep, 2012 1 commit
  20. 02 Sep, 2012 1 commit
  21. 04 Aug, 2012 1 commit
  22. 16 May, 2012 1 commit
  23. 06 May, 2012 1 commit
  24. 01 Apr, 2012 1 commit
  25. 01 Mar, 2012 1 commit
  26. 09 Jan, 2012 3 commits
  27. 02 Dec, 2011 1 commit
  28. 22 Nov, 2011 1 commit
    • ext3: NULL dereference in ext3_evict_inode() · bcdd0c16
      Dan Carpenter committed
      This is an fsfuzzer bug.  ->s_journal is set at the end of
      ext3_load_journal() but we try to use it in the error handling from
      ext3_get_journal() while it's still NULL.
      
      [  337.039041] BUG: unable to handle kernel NULL pointer dereference at 0000000000000024
      [  337.040380] IP: [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
      [  337.041687] PGD 0
      [  337.043118] Oops: 0002 [#1] SMP
      [  337.044483] CPU 3
      [  337.044495] Modules linked in: ecb md4 cifs fuse kvm_intel kvm brcmsmac brcmutil crc8 cordic r8169 [last unloaded: scsi_wait_scan]
      [  337.047633]
      [  337.049259] Pid: 8308, comm: mount Not tainted 3.2.0-rc2-next-20111121+ #24 SAMSUNG ELECTRONICS CO., LTD. RV411/RV511/E3511/S3511    /RV411/RV511/E3511/S3511
      [  337.051064] RIP: 0010:[<ffffffff816e6539>]  [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
      [  337.052879] RSP: 0018:ffff8800b1d11ae8  EFLAGS: 00010282
      [  337.054668] RAX: 0000000000000100 RBX: 0000000000000000 RCX: ffff8800b77c2000
      [  337.056400] RDX: ffff8800a97b5c00 RSI: 0000000000000000 RDI: 0000000000000024
      [  337.058099] RBP: ffff8800b1d11ae8 R08: 6000000000000000 R09: e018000000000000
      [  337.059841] R10: ff67366cc2607c03 R11: 00000000110688e6 R12: 0000000000000000
      [  337.061607] R13: 0000000000000000 R14: 0000000000000000 R15: ffff8800a78f06e8
      [  337.063385] FS:  00007f9d95652800(0000) GS:ffff8800b7180000(0000) knlGS:0000000000000000
      [  337.065110] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  337.066801] CR2: 0000000000000024 CR3: 00000000aef2c000 CR4: 00000000000006e0
      [  337.068581] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  337.070321] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  337.072105] Process mount (pid: 8308, threadinfo ffff8800b1d10000, task ffff8800b1d02be0)
      [  337.073800] Stack:
      [  337.075487]  ffff8800b1d11b08 ffffffff811f48cf ffff88007ac9b158 0000000000000000
      [  337.077255]  ffff8800b1d11b38 ffffffff8119405d ffff88007ac9b158 ffff88007ac9b250
      [  337.078851]  ffffffff8181bda0 ffffffff8181bda0 ffff8800b1d11b68 ffffffff81131e31
      [  337.080284] Call Trace:
      [  337.081706]  [<ffffffff811f48cf>] log_start_commit+0x1f/0x40
      [  337.083107]  [<ffffffff8119405d>] ext3_evict_inode+0x1fd/0x2a0
      [  337.084490]  [<ffffffff81131e31>] evict+0xa1/0x1a0
      [  337.085857]  [<ffffffff81132031>] iput+0x101/0x210
      [  337.087220]  [<ffffffff811339d1>] iget_failed+0x21/0x30
      [  337.088581]  [<ffffffff811905fc>] ext3_iget+0x15c/0x450
      [  337.089936]  [<ffffffff8118b0c1>] ? ext3_rsv_window_add+0x81/0x100
      [  337.091284]  [<ffffffff816df9a4>] ext3_get_journal+0x15/0xde
      [  337.092641]  [<ffffffff811a2e9b>] ext3_fill_super+0xf2b/0x1c30
      [  337.093991]  [<ffffffff810ddf7d>] ? register_shrinker+0x4d/0x60
      [  337.095332]  [<ffffffff8111c112>] mount_bdev+0x1a2/0x1e0
      [  337.096680]  [<ffffffff811a1f70>] ? ext3_setup_super+0x210/0x210
      [  337.098026]  [<ffffffff8119a770>] ext3_mount+0x10/0x20
      [  337.099362]  [<ffffffff8111cbee>] mount_fs+0x3e/0x1b0
      [  337.100759]  [<ffffffff810eda1b>] ? __alloc_percpu+0xb/0x10
      [  337.102330]  [<ffffffff81135385>] vfs_kern_mount+0x65/0xc0
      [  337.103889]  [<ffffffff8113611f>] do_kern_mount+0x4f/0x100
      [  337.105442]  [<ffffffff811378fc>] do_mount+0x19c/0x890
      [  337.106989]  [<ffffffff810e8456>] ? memdup_user+0x46/0x90
      [  337.108572]  [<ffffffff810e84f3>] ? strndup_user+0x53/0x70
      [  337.110114]  [<ffffffff811383fb>] sys_mount+0x8b/0xe0
      [  337.111617]  [<ffffffff816ed93b>] system_call_fastpath+0x16/0x1b
      [  337.113133] Code: 38 c2 74 0f 66 0f 1f 44 00 00 f3 90 0f b6 03 38 c2 75 f7 48 83 c4 08 5b 5d c3 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 0f b6 d4 38 c2 74 0c 0f 1f 00 f3 90 0f b6 07 38
      [  337.116588] RIP  [<ffffffff816e6539>] _raw_spin_lock+0x9/0x30
      [  337.118260]  RSP <ffff8800b1d11ae8>
      [  337.119998] CR2: 0000000000000024
      [  337.188701] ---[ end trace c36d790becac1615 ]---
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  29. 02 Nov, 2011 1 commit
  30. 23 Aug, 2011 1 commit