1. 17 Sep, 2009 4 commits
    • ext4: Fix the alloc on close after a truncate heuristic · 5534fb5b
      Theodore Ts'o committed
      In an attempt to avoid doing an unneeded flush after opening a
      (previously non-existent) file with O_CREAT|O_TRUNC, the code only
      triggered the heuristic if ei->disksize was non-zero.  Turns out that
      the VFS doesn't call ->truncate() if the file doesn't exist, and
      ei->disksize is always zero even if the file previously existed.  So
      remove the test, since it isn't necessary and in fact disabled the
      heuristic.
      
      Thanks to Clemens Eisserer for reporting that he was seeing problems with files
      written using kwrite and eclipse after sudden crashes caused by a
      buggy Intel video driver.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
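The effect of removing the test can be pictured with a small user-space model. This is only a sketch: `struct demo_inode`, `demo_truncate_old()` and `demo_truncate_new()` are hypothetical names, not the kernel's code, and the model assumes (as the message states) that the on-disk size is already zero by the time truncate runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the in-memory inode state. */
struct demo_inode {
    uint64_t i_size;        /* current file size */
    uint64_t i_disksize;    /* size recorded on disk */
    bool     da_alloc_close;/* "allocate delayed blocks on close" marker */
};

/* Old rule: the marker also required a non-zero on-disk size. */
static void demo_truncate_old(struct demo_inode *ei)
{
    if (ei->i_disksize != 0 && ei->i_size == 0)
        ei->da_alloc_close = true;
}

/* New rule: truncating to zero length is enough to set the marker. */
static void demo_truncate_new(struct demo_inode *ei)
{
    if (ei->i_size == 0)
        ei->da_alloc_close = true;
}

/* An existing file that has just been truncated to zero length. */
static struct demo_inode demo_truncated_existing_file(void)
{
    struct demo_inode ei = { 0, 0, false };
    return ei;
}
```

Under these assumptions the old rule never sets the marker for a truncated file, which matches the message's remark that the test disabled the heuristic.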
    • ext4: Add a tracepoint for ext4_alloc_da_blocks() · fb40ba0d
      Theodore Ts'o committed
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: store EXT4_EXT_MIGRATE in i_state instead of i_flags · 1b9c12f4
      Theodore Ts'o committed
      EXT4_EXT_MIGRATE is only intended to be used for an in-memory flag,
      and the hex value assigned to it collides with FS_DIRECTIO_FL (which
      is also stored in i_flags).  There's no reason for the
      EXT4_EXT_MIGRATE bit to be stored in i_flags, so we switch it to use
      i_state instead.
      
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
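A minimal sketch of why a separate in-memory field avoids the collision. The struct, the helper names and the bit values are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the real constants live in kernel headers. */
#define DEMO_DIRECTIO_FL   0x00100000u  /* persistent flag kept in i_flags */
#define DEMO_STATE_MIGRATE 0x00000001u  /* in-memory marker kept in i_state */

struct demo_inode {
    uint32_t i_flags;  /* persistent flags, written back to disk */
    uint32_t i_state;  /* in-memory state, never written to disk */
};

/* Mark the inode as being migrated without touching persistent flags. */
static void demo_begin_migrate(struct demo_inode *ei)
{
    ei->i_state |= DEMO_STATE_MIGRATE;
}

static void demo_end_migrate(struct demo_inode *ei)
{
    ei->i_state &= ~DEMO_STATE_MIGRATE;
}

static int demo_uses_directio(const struct demo_inode *ei)
{
    return (ei->i_flags & DEMO_DIRECTIO_FL) != 0;
}
```

Because the two markers live in different words, setting the migrate bit can no longer be misread as a direct I/O flag.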
    • ext4: limit block allocations for indirect-block files to < 2^32 · fb0a387d
      Eric Sandeen committed
      Today, the ext4 allocator will happily allocate blocks past
      2^32 for indirect-block files, which results in the block
      numbers getting truncated, and corruption ensues.
      
      This patch limits such allocations to < 2^32, and adds
      BUG_ONs if we do get blocks larger than that.
      
      This should address RH Bug 519471, ext4 bitmap allocator 
      must limit blocks to < 2^32
      
      * ext4_find_goal() is modified to choose a goal < UINT_MAX,
        so that our starting point is in an acceptable range.
      
      * ext4_xattr_block_set() is modified such that the goal block
        is < UINT_MAX, as above.
      
      * ext4_mb_regular_allocator() is modified so that the group
        search does not continue into groups which are too high
      
      * ext4_mb_use_preallocated() has a check that we don't use
        preallocated space which is too far out
      
      * ext4_alloc_blocks() and ext4_xattr_block_set() add some BUG_ONs
      
      No attempt has been made to limit inode locations to < 2^32,
      so we may wind up with blocks far from their inodes.  Doing
      this much already will lead to some odd ENOSPC issues when the
      "lower 32" gets full, and further restricting inodes could
      make that even weirder.
      
      For high inodes, choosing a goal of the original, % UINT_MAX,
      may be a bit odd, but then we're in an odd situation anyway,
      and I don't know of a better heuristic.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
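The goal heuristic and the range check described above can be sketched in user space as follows. `demo_clamp_goal()` and `demo_block_ok()` are hypothetical helpers, the `% UINT_MAX` step follows the wording of the message rather than the actual patch, and a 32-bit `unsigned int` is assumed.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_fsblk_t;  /* 64-bit physical block number */

/* Pick a goal that an indirect-mapped file can actually address. */
static demo_fsblk_t demo_clamp_goal(demo_fsblk_t goal, bool extent_mapped)
{
    if (!extent_mapped)
        goal %= UINT_MAX;  /* "the original, % UINT_MAX" heuristic */
    return goal;
}

/*
 * Check a freshly allocated block. A false return marks the spot
 * where the kernel patch would trip a BUG_ON.
 */
static bool demo_block_ok(demo_fsblk_t blk, bool extent_mapped)
{
    return extent_mapped || blk < UINT_MAX;
}
```

Extent-mapped files are left alone in this sketch, since they can store block numbers wider than 32 bits.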
  2. 10 Sep, 2009 1 commit
    • ext4: Make non-journal fsync work properly · 91ac6f43
      Frank Mayhar committed
      Teach ext4_write_inode() and ext4_do_update_inode() about non-journal
      mode:  If we're not using a journal, ext4_write_inode() now calls
      ext4_do_update_inode() (after getting the iloc via ext4_get_inode_loc())
      with a new "do_sync" parameter.  If that parameter is nonzero _and_ we're
      not using a journal, ext4_do_update_inode() calls sync_dirty_buffer()
      instead of ext4_handle_dirty_metadata().
      
      This problem was found in power-fail testing, checking the amount of
      loss of files and blocks after a power failure when using fsync() and
      when not using fsync().  It turned out that using fsync() was actually
      worse than not doing so, possibly because it increased the likelihood
      that the inodes would remain unflushed and would therefore be lost at
      the power failure.
      Signed-off-by: Frank Mayhar <fmayhar@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
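The decision described in the first paragraph reduces to a small truth table. The sketch below models only that decision; `demo_update_inode()` and the enum are hypothetical names, and no real I/O is performed.

```c
#include <assert.h>
#include <stdbool.h>

/* Which write-out path the inode update would take. */
enum demo_action {
    DEMO_HANDLE_DIRTY_METADATA,  /* models ext4_handle_dirty_metadata() */
    DEMO_SYNC_DIRTY_BUFFER       /* models sync_dirty_buffer() */
};

/* Synchronous write-out is chosen only when asked for AND no journal exists. */
static enum demo_action demo_update_inode(bool has_journal, int do_sync)
{
    if (do_sync && !has_journal)
        return DEMO_SYNC_DIRTY_BUFFER;
    return DEMO_HANDLE_DIRTY_METADATA;
}
```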
  3. 08 Sep, 2009 1 commit
  4. 10 Sep, 2009 1 commit
  5. 01 Sep, 2009 2 commits
  6. 18 Aug, 2009 1 commit
    • ext4: Fix possible deadlock between ext4_truncate() and ext4_get_blocks() · 487caeef
      Jan Kara committed
      During truncate we are sometimes forced to start a new transaction as
      the amount of blocks to be journaled is both quite large and hard to
      predict. So far we restarted a transaction while holding i_data_sem
      and that violates lock ordering because i_data_sem ranks below a
      transaction start (and it can lead to a real deadlock with
      ext4_get_blocks() mapping blocks in some page while having a
      transaction open).
      
      We fix the problem by dropping i_data_sem before restarting the
      transaction and reacquiring it afterwards. It's slightly subtle that this
      works:
      
      1) By the time ext4_truncate() is called, all the page cache for the
      truncated part of the file is dropped so get_block() should not be
      called on it (we only have to invalidate extent cache after we
      reacquire i_data_sem because some extent from not-truncated part could
      extend also into the part we are going to truncate).
      
      2) Writes, migrate or defrag hold i_mutex so they are blocked for the
      whole duration of the truncate.
      
      This bug has been found and analyzed by Theodore Tso <tytso@mit.edu>.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
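The drop, restart, reacquire sequence can be modelled without real locks. In this sketch the lock is a boolean and the functions are hypothetical stand-ins; the asserts encode the ordering rule that a transaction must not be (re)started while i_data_sem is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for one truncate in progress. */
struct demo_ctx {
    bool data_sem_held;              /* models holding i_data_sem */
    int  restarts;                   /* transactions restarted so far */
    int  extent_cache_invalidations; /* extent cache flushes so far */
};

/* Restarting a transaction with the semaphore held would invert lock order. */
static void demo_restart_transaction(struct demo_ctx *c)
{
    assert(!c->data_sem_held);
    c->restarts++;
}

/* The fixed sequence: drop the semaphore, restart, take it again. */
static void demo_truncate_restart(struct demo_ctx *c)
{
    assert(c->data_sem_held);
    c->data_sem_held = false;        /* release i_data_sem */
    demo_restart_transaction(c);
    c->data_sem_held = true;         /* reacquire i_data_sem */
    /* A cached extent may reach into the range still being truncated. */
    c->extent_cache_invalidations++;
}
```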
  7. 11 Aug, 2009 1 commit
  8. 13 Jul, 2009 1 commit
    • ext4: Fix buffer head reference leak in no-journal mode · e6b5d301
      Curt Wohlgemuth committed
      We found a problem with buffer head reference leaks when using an ext4
      partition without a journal.  In particular, calls to ext4_forget() would
      not do a brelse() on the input buffer head, which causes the pages they
      belong to to not be reclaimable.
      
      Further investigation showed that all places where ext4_journal_forget() and
      ext4_journal_revoke() are called are subject to the same problem.  The patch
      below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
      release of the buffer head when the journal handle isn't valid.
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
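A reference-count model of the fix, as a sketch only. `struct demo_bh`, `struct demo_handle` and `demo_journal_forget()` are hypothetical, and the journal branch simply assumes the journaling layer consumes the caller's reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_bh     { int refcount; };  /* stand-in for a buffer head */
struct demo_handle { bool valid; };    /* stand-in for a journal handle */

/* Drop one reference, as brelse() would. */
static void demo_brelse(struct demo_bh *bh)
{
    if (bh != NULL) {
        assert(bh->refcount > 0);
        bh->refcount--;
    }
}

/* Either way, the caller's reference must be gone on return. */
static int demo_journal_forget(const struct demo_handle *h, struct demo_bh *bh)
{
    if (h != NULL && h->valid) {
        /* Journal present: assume the journaling layer drops the reference. */
        bh->refcount--;
        return 0;
    }
    /* No journal: release explicitly so the page stays reclaimable. */
    demo_brelse(bh);
    return 0;
}
```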
  9. 17 Jul, 2009 1 commit
    • ext4: More buffer head reference leaks · 6487a9d3
      Curt Wohlgemuth committed
      After the patch I posted last week regarding buffer head ref leaks in
      no-journal mode, I looked at all the code that uses buffer heads and
      searched for more potential leaks.
      
      The patch below fixes the issues I found; these can occur even when a
      journal is present.
      
      The change to inode.c fixes a double release if
      ext4_journal_get_create_access() fails.
      
      The changes to namei.c are more complicated.  add_dirent_to_buf() will
      release the input buffer head EXCEPT when it returns -ENOSPC.  There are
      some callers of this routine that don't always do the brelse() in the event
      that -ENOSPC is returned.  Unfortunately, putting this fix into ext4_add_entry()
      required capturing the return value of make_indexed_dir() and
      add_dirent_to_buf().
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
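The ownership contract described for add_dirent_to_buf() can be sketched with a reference count. All `demo_` names are hypothetical, and the `has_room` parameter stands in for whatever makes the real function return -ENOSPC.

```c
#include <assert.h>
#include <errno.h>

struct demo_bh { int refcount; };  /* stand-in for a buffer head */

static void demo_brelse(struct demo_bh *bh)
{
    assert(bh->refcount > 0);
    bh->refcount--;
}

/* Contract: releases bh on every outcome EXCEPT -ENOSPC. */
static int demo_add_dirent_to_buf(struct demo_bh *bh, int has_room)
{
    if (!has_room)
        return -ENOSPC;   /* the caller still owns the reference */
    demo_brelse(bh);
    return 0;
}

/* A caller that captures the return value and honours the contract. */
static int demo_add_entry(struct demo_bh *bh, int has_room)
{
    int err = demo_add_dirent_to_buf(bh, has_room);

    if (err == -ENOSPC)
        demo_brelse(bh);  /* without this, the reference would leak */
    return err;
}
```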
  10. 24 Jun, 2009 1 commit
  11. 15 Jun, 2009 5 commits
  12. 13 Jun, 2009 2 commits
  13. 09 Jun, 2009 1 commit
  14. 05 Jun, 2009 2 commits
  15. 09 Jun, 2009 1 commit
    • ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle() · 03f5d8bc
      Jan Kara committed
      Get rid of the EXTEND_DISKSIZE flag of ext4_get_blocks_handle(). This
      seems to be a relic from some old days and setting disksize in this
      function does not make much sense.  It was set only by
      ext4_getblk().  Since the parameter has some effect only if create ==
      1, it is easy to check by grepping through the sources that the three
      callers which end up calling ext4_getblk() with create == 1
      (ext4_append, ext4_quota_write, ext4_mkdir) do the right thing and set
      disksize themselves.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  16. 04 Jun, 2009 1 commit
  17. 14 Jul, 2009 1 commit
    • ext4: Fix truncation of symlinks after failed write · ffacfa7a
      Jan Kara committed
      The contents of long symlinks are written via standard write methods. So
      when the write fails, we add the inode to the orphan list. But symlinks don't
      have a .truncate method defined, so nobody properly removes them from the
      on-disk orphan list.
      
      Fix this by calling ext4_truncate() directly instead of calling
      vmtruncate() (which is saner anyway since we don't need anything
      vmtruncate() does apart from calling .truncate in these paths).  We
      also add inode to orphan list only if ext4_can_truncate() is true
      (currently, it can be false for symlinks when there are no blocks
      allocated) - otherwise orphan list processing will complain and
      ext4_truncate() will not remove inode from on-disk orphan list.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
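A compact model of the fixed cleanup path. It collapses the real sequence (orphan add in the write path, truncate later) into one hypothetical function, and `demo_can_truncate()` encodes only the one case the message mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified inode state. */
struct demo_inode {
    bool          is_symlink;
    unsigned long blocks;          /* allocated blocks */
    bool          on_orphan_list;
};

/* Per the message: can be false for symlinks with no blocks allocated. */
static bool demo_can_truncate(const struct demo_inode *inode)
{
    return !(inode->is_symlink && inode->blocks == 0);
}

/* Models a direct truncate call, which also removes the orphan entry. */
static void demo_truncate(struct demo_inode *inode)
{
    inode->blocks = 0;
    inode->on_orphan_list = false;
}

/* Cleanup after a failed write. */
static void demo_failed_write_cleanup(struct demo_inode *inode)
{
    if (!demo_can_truncate(inode))
        return;                    /* never put it on the orphan list */
    inode->on_orphan_list = true;
    demo_truncate(inode);
}
```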
  18. 06 Jul, 2009 1 commit
    • ext4: Fix potential reclaim deadlock when truncating partial block · f4a01017
      Theodore Ts'o committed
      The ext4_block_truncate_page() function previously called
      grab_cache_page(), which called find_or_create_page() with the
      __GFP_FS flag potentially set.  This could cause a deadlock if the
      system is low on memory and it attempts a memory reclaim, which could
      potentially call back into ext4.  So we need to call
      find_or_create_page() directly, and remove the __GFP_FS flag to avoid
      this potential deadlock.
      
      Thanks to Roland Dreier for reporting a lockdep warning which showed
      this problem.
      
      [20786.363249] =================================
      [20786.363257] [ INFO: inconsistent lock state ]
      [20786.363265] 2.6.31-2-generic #14~rbd4gitd960eea9
      [20786.363270] ---------------------------------
      [20786.363276] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      [20786.363285] http/8397 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [20786.363291]  (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150
      [20786.363314] {IN-RECLAIM_FS-W} state was registered at:
      [20786.363320]   [<ffffffff8108bef6>] mark_irqflags+0xc6/0x1a0
      [20786.363334]   [<ffffffff8108d347>] __lock_acquire+0x287/0x430
      [20786.363345]   [<ffffffff8108d595>] lock_acquire+0xa5/0x150
      [20786.363355]   [<ffffffff812008da>] jbd2_journal_start+0xfa/0x150
      [20786.363365]   [<ffffffff811d98a8>] ext4_journal_start_sb+0x58/0x90
      [20786.363377]   [<ffffffff811cce85>] ext4_delete_inode+0xc5/0x2c0
      [20786.363389]   [<ffffffff81146fa3>] generic_delete_inode+0xd3/0x1a0
      [20786.363401]   [<ffffffff81147095>] generic_drop_inode+0x25/0x30
      [20786.363411]   [<ffffffff81145ce2>] iput+0x62/0x70
      [20786.363420]   [<ffffffff81142878>] dentry_iput+0x98/0x110
      [20786.363429]   [<ffffffff81142a00>] d_kill+0x50/0x80
      [20786.363438]   [<ffffffff811444c5>] dput+0x95/0x180
      [20786.363447]   [<ffffffff8120de4b>] ecryptfs_d_release+0x2b/0x70
      [20786.363459]   [<ffffffff81142978>] d_free+0x28/0x60
      [20786.363468]   [<ffffffff81142a18>] d_kill+0x68/0x80
      [20786.363477]   [<ffffffff81142ad3>] prune_one_dentry+0xa3/0xc0
      [20786.363487]   [<ffffffff81142d61>] __shrink_dcache_sb+0x271/0x290
      [20786.363497]   [<ffffffff81142e89>] prune_dcache+0x109/0x1b0
      [20786.363506]   [<ffffffff81142f6f>] shrink_dcache_memory+0x3f/0x50
      [20786.363516]   [<ffffffff810f6d3d>] shrink_slab+0x12d/0x190
      [20786.363527]   [<ffffffff810f97d7>] balance_pgdat+0x4d7/0x640
      [20786.363537]   [<ffffffff810f9a57>] kswapd+0x117/0x170
      [20786.363546]   [<ffffffff810773ce>] kthread+0x9e/0xb0
      [20786.363558]   [<ffffffff8101430a>] child_rip+0xa/0x20
      [20786.363569]   [<ffffffffffffffff>] 0xffffffffffffffff
      [20786.363598] irq event stamp: 15997
      [20786.363603] hardirqs last  enabled at (15997): [<ffffffff81125f9d>] kmem_cache_alloc+0xfd/0x1a0
      [20786.363617] hardirqs last disabled at (15996): [<ffffffff81125f01>] kmem_cache_alloc+0x61/0x1a0
      [20786.363628] softirqs last  enabled at (15966): [<ffffffff810631ea>] __do_softirq+0x14a/0x220
      [20786.363641] softirqs last disabled at (15861): [<ffffffff8101440c>] call_softirq+0x1c/0x30
      [20786.363651] 
      [20786.363653] other info that might help us debug this:
      [20786.363660] 3 locks held by http/8397:
      [20786.363665]  #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<ffffffff8112ed24>] do_truncate+0x64/0x90
      [20786.363685]  #1:  (&sb->s_type->i_alloc_sem_key#5){+++++.}, at: [<ffffffff81147f90>] notify_change+0x250/0x350
      [20786.363707]  #2:  (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150
      [20786.363724] 
      [20786.363726] stack backtrace:
      [20786.363734] Pid: 8397, comm: http Tainted: G         C 2.6.31-2-generic #14~rbd4gitd960eea9
      [20786.363741] Call Trace:
      [20786.363752]  [<ffffffff8108ad7c>] print_usage_bug+0x18c/0x1a0
      [20786.363763]  [<ffffffff8108b0c0>] ? check_usage_backwards+0x0/0xb0
      [20786.363773]  [<ffffffff8108bad2>] mark_lock_irq+0xf2/0x280
      [20786.363783]  [<ffffffff8108bd97>] mark_lock+0x137/0x1d0
      [20786.363793]  [<ffffffff8108c03c>] mark_held_locks+0x6c/0xa0
      [20786.363803]  [<ffffffff8108c11f>] lockdep_trace_alloc+0xaf/0xe0
      [20786.363813]  [<ffffffff810efbac>] __alloc_pages_nodemask+0x7c/0x180
      [20786.363824]  [<ffffffff810e9411>] ? find_get_page+0x91/0xf0
      [20786.363835]  [<ffffffff8111d3b7>] alloc_pages_current+0x87/0xd0
      [20786.363845]  [<ffffffff810e9827>] __page_cache_alloc+0x67/0x70
      [20786.363856]  [<ffffffff810eb7df>] find_or_create_page+0x4f/0xb0
      [20786.363867]  [<ffffffff811cb3be>] ext4_block_truncate_page+0x3e/0x460
      [20786.363876]  [<ffffffff812008da>] ? jbd2_journal_start+0xfa/0x150
      [20786.363885]  [<ffffffff812008bb>] ? jbd2_journal_start+0xdb/0x150
      [20786.363895]  [<ffffffff811c6415>] ? ext4_meta_trans_blocks+0x75/0xf0
      [20786.363905]  [<ffffffff811e8d8b>] ext4_ext_truncate+0x1bb/0x1e0
      [20786.363916]  [<ffffffff811072c5>] ? unmap_mapping_range+0x75/0x290
      [20786.363926]  [<ffffffff811ccc28>] ext4_truncate+0x498/0x630
      [20786.363938]  [<ffffffff8129b4ce>] ? _raw_spin_unlock+0x5e/0xb0
      [20786.363947]  [<ffffffff81107306>] ? unmap_mapping_range+0xb6/0x290
      [20786.363957]  [<ffffffff8108c3ad>] ? trace_hardirqs_on+0xd/0x10
      [20786.363966]  [<ffffffff811ffe58>] ? jbd2_journal_stop+0x1f8/0x2e0
      [20786.363976]  [<ffffffff81107690>] vmtruncate+0xb0/0x110
      [20786.363986]  [<ffffffff81147c05>] inode_setattr+0x35/0x170
      [20786.363995]  [<ffffffff811c9906>] ext4_setattr+0x186/0x370
      [20786.364005]  [<ffffffff81147eab>] notify_change+0x16b/0x350
      [20786.364014]  [<ffffffff8112ed30>] do_truncate+0x70/0x90
      [20786.364021]  [<ffffffff8112f48b>] T.657+0xeb/0x110
      [20786.364021]  [<ffffffff8112f4be>] sys_ftruncate+0xe/0x10
      [20786.364021]  [<ffffffff81013132>] system_call_fastpath+0x16/0x1b
      Reported-by: Roland Dreier <roland@digitalvampire.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
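The fix described at the top of this message amounts to masking one bit out of the allocation flags before asking for a page. The sketch below shows only that masking; the `DEMO_GFP_*` values are made up for illustration and do not match the kernel's definitions.

```c
#include <assert.h>

/* Illustrative bit values only; the real flags live in kernel headers. */
#define DEMO_GFP_IO   0x1u  /* allocation may start block I/O */
#define DEMO_GFP_FS   0x2u  /* allocation may call back into the filesystem */
#define DEMO_GFP_WAIT 0x4u  /* allocation may sleep */

/*
 * Flags to use while a journal handle is held: whatever the mapping
 * normally allows, minus permission to re-enter the filesystem.
 */
static unsigned demo_truncate_page_gfp(unsigned mapping_gfp)
{
    return mapping_gfp & ~DEMO_GFP_FS;
}
```

With the filesystem bit cleared, reclaim triggered by this allocation cannot recurse into the filesystem while the handle is open, which is the inconsistency lockdep reported.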
  19. 25 May, 2009 1 commit
  20. 18 May, 2009 1 commit
    • ext4: Add a comprehensive block validity check to ext4_get_blocks() · 6fd058f7
      Theodore Ts'o committed
      To catch filesystem bugs or corruption which could lead to the
      filesystem getting severely damaged, this patch adds a facility for
      tracking all of the filesystem metadata blocks by contiguous regions
      in a red-black tree.  This allows quick searching of the tree to
      locate extents which might overlap with filesystem metadata blocks.
      
      This facility is also used by the multi-block allocator to assure that
      it is not allocating blocks out of the system zone, as well as by the
      routines used when reading indirect blocks and extents information
      from disk to make sure their contents are valid.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
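The overlap query can be sketched in user space. Note the swap: instead of a red-black tree, this sketch keeps the metadata regions in a sorted, non-overlapping array and uses binary search, which answers the same question for a static set. All names are hypothetical, and the extent length is assumed to be at least 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One contiguous run of metadata blocks: [start, start + count). */
struct demo_region {
    uint64_t start;
    uint64_t count;
};

/*
 * Return true if the extent [blk, blk + n) touches no metadata region.
 * `regions` must be sorted by start and must not overlap each other.
 */
static bool demo_data_block_valid(const struct demo_region *regions,
                                  size_t nr, uint64_t blk, uint64_t n)
{
    size_t lo = 0, hi = nr;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (blk + n <= regions[mid].start)
            hi = mid;        /* extent lies entirely before this region */
        else if (blk >= regions[mid].start + regions[mid].count)
            lo = mid + 1;    /* extent lies entirely after this region */
        else
            return false;    /* extent overlaps a metadata region */
    }
    return true;
}
```

A tree is the better fit when regions are added and removed at run time; the array form is used here only to keep the example short.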
  21. 15 May, 2009 2 commits
    • ext4: Clear the unwritten buffer_head flag after the extent is initialized · 2a8964d6
      Aneesh Kumar K.V committed
      The BH_Unwritten flag indicates that the buffer is allocated on disk
      but has not been written; that is, the block is part of a persistent
      preallocation area.  That flag should only be set when a get_blocks()
      function is looking up an inode's logical to physical block mapping.
      
      When ext4_get_blocks_wrap() is called with create=1, the uninitialized
      extent is converted into an initialized one, so the BH_Unwritten flag
      is no longer appropriate.  Hence, we need to make sure the
      BH_Unwritten is not left set, since the combination of BH_Mapped and
      BH_Unwritten is not allowed; among other things, it will result in ext4's
      get_block() being called over and over again during the write_begin
      phase of write(2).
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
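The invariant in the second paragraph can be written down as a bit test. This is a sketch with made-up flag values and hypothetical helper names; it models only the state bits, not the extent conversion itself.

```c
#include <assert.h>

/* Illustrative bit values; the real BH_* bits are defined by the kernel. */
#define DEMO_BH_MAPPED    0x1u
#define DEMO_BH_UNWRITTEN 0x2u

/* State after a mapping call: with create set, the extent is initialized. */
static unsigned demo_after_get_blocks(unsigned b_state, int create)
{
    if (create) {
        b_state |= DEMO_BH_MAPPED;
        b_state &= ~DEMO_BH_UNWRITTEN;  /* must not stay set once mapped */
    }
    return b_state;
}

/* Mapped and Unwritten together is the forbidden combination. */
static int demo_state_is_legal(unsigned b_state)
{
    unsigned both = DEMO_BH_MAPPED | DEMO_BH_UNWRITTEN;

    return (b_state & both) != both;
}
```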
    • ext4: Clean up ext4_get_blocks() so it does not depend on bh_result->b_state · 2ac3b6e0
      Theodore Ts'o committed
      The ext4_get_blocks() function was depending on the value of
      bh_result->b_state as an input parameter to decide whether or not to
      update the delalloc accounting statistics by calling
      ext4_da_update_reserve_space().  We now use a separate flag,
      EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE, to request this update, so that
      all callers of ext4_get_blocks() can clear map_bh.b_state before
      calling ext4_get_blocks() without worrying about any consistency
      issues.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
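A sketch of the cleaned-up contract: the accounting decision comes from an explicit flag, and the buffer state is treated as output only. The `DEMO_` flag values, the result struct and `demo_get_blocks()` are hypothetical.

```c
#include <assert.h>

/* Illustrative flag values, not the kernel's. */
#define DEMO_GET_BLOCKS_CREATE               0x1
#define DEMO_GET_BLOCKS_UPDATE_RESERVE_SPACE 0x2

struct demo_result {
    unsigned b_state;          /* output only: never read as input */
    int      reserve_updated;  /* 1 if delalloc accounting was updated */
};

static void demo_get_blocks(int flags, struct demo_result *res)
{
    /* Whatever the caller left in b_state is overwritten, not consulted. */
    res->b_state = 0x1u;  /* pretend the block is now mapped */
    res->reserve_updated =
        (flags & DEMO_GET_BLOCKS_UPDATE_RESERVE_SPACE) != 0;
}
```

Since stale buffer state no longer influences the result, callers can zero it before every call.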
  22. 14 May, 2009 1 commit
  23. 13 May, 2009 1 commit
  24. 14 May, 2009 1 commit
  25. 13 May, 2009 1 commit
  26. 14 May, 2009 3 commits
    • ext4: Add documentation to the ext4_*get_block* functions · b920c755
      Theodore Ts'o committed
      This adds more documentation to various internal functions in
      fs/ext4/inode.c, most notably ext4_ind_get_blocks(),
      ext4_da_get_block_write(), ext4_da_get_block_prep(),
      ext4_normal_get_block_write().
      
      In addition, the static function ext4_normal_get_block_write() has
      been renamed noalloc_get_block_write(), since it is used in many
      places far beyond ext4_normal_writepage().
      
      Plenty of warnings have been added to the noalloc_get_block_write()
      function, since the way it is used is amazingly fragile.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Define a new set of flags for ext4_get_blocks() · c2177057
      Theodore Ts'o committed
      The functions ext4_get_blocks(), ext4_ext_get_blocks(), and
      ext4_ind_get_blocks() used an ad-hoc set of integer variables as
      boolean flags, passed in as arguments.  Use a single flags parameter
      and a standard set of bitfield flags instead.  This saves space on
      the call stack, and it also makes the code a bit more understandable.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
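The conversion from several boolean arguments to one flags word can be sketched like this. The flag names, their values and the two boolean parameters are assumptions chosen for illustration; the commit text does not list the old arguments.

```c
#include <assert.h>

/* Illustrative flag bits, not the kernel's definitions. */
#define DEMO_FLAG_CREATE          0x1
#define DEMO_FLAG_EXTEND_DISKSIZE 0x2

/* Old style: each option travels as its own int argument. */
static int demo_flags_from_bools(int create, int extend_disksize)
{
    int flags = 0;

    if (create)
        flags |= DEMO_FLAG_CREATE;
    if (extend_disksize)
        flags |= DEMO_FLAG_EXTEND_DISKSIZE;
    return flags;
}

/* New style: callees test bits in a single flags parameter. */
static int demo_wants_create(int flags)
{
    return (flags & DEMO_FLAG_CREATE) != 0;
}

static int demo_wants_extend(int flags)
{
    return (flags & DEMO_FLAG_EXTEND_DISKSIZE) != 0;
}
```

Adding another option then means defining one more bit rather than widening every function signature along the call path.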
    • ext4: Rename ext4_get_blocks_wrap() to be ext4_get_blocks() · 12b7ac17
      Theodore Ts'o committed
      Another function rename for clarity's sake.  The _wrap suffix simply
      confuses people, and doesn't add much for people trying to follow the
      code paths.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  27. 12 May, 2009 1 commit