1. 13 7月, 2009 1 次提交
    • C
      ext4: Fix buffer head reference leak in no-journal mode · e6b5d301
      Curt Wohlgemuth 提交于
      We found a problem with buffer head reference leaks when using an ext4
      partition without a journal.  In particular, calls to ext4_forget() would
      not to a brelse() on the input buffer head, which will cause pages they
      belong to to not be reclaimable.
      
      Further investigation showed that all places where ext4_journal_forget() and
      ext4_journal_revoke() are called are subject to the same problem.  The patch
      below changes __ext4_journal_forget/__ext4_journal_revoke to do an explicit
      release of the buffer head when the journal handle isn't valid.
      Signed-off-by: NCurt Wohlgemuth <curtw@google.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e6b5d301
  2. 15 6月, 2009 3 次提交
  3. 04 6月, 2009 1 次提交
  4. 14 7月, 2009 1 次提交
    • J
      ext4: Fix truncation of symlinks after failed write · ffacfa7a
      Jan Kara 提交于
      Contents of long symlinks is written via standard write methods. So
      when the write fails, we add inode to orphan list. But symlinks don't
      have .truncate method defined so nobody properly removes them from the
      on disk orphan list.
      
      Fix this by calling ext4_truncate() directly instead of calling
      vmtruncate() (which is saner anyway since we don't need anything
      vmtruncate() does except from calling .truncate in these paths).  We
      also add inode to orphan list only if ext4_can_truncate() is true
      (currently, it can be false for symlinks when there are no blocks
      allocated) - otherwise orphan list processing will complain and
      ext4_truncate() will not remove inode from on-disk orphan list.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      ffacfa7a
  5. 06 7月, 2009 1 次提交
    • T
      ext4: Fix potential reclaim deadlock when truncating partial block · f4a01017
      Theodore Ts'o 提交于
      The ext4_block_truncate_page() function previously called
      grab_cache_page(), which called find_or_create_page() with the
      __GFP_FS flag potentially set.  This could cause a deadlock if the
      system is low on memory and it attempts a memory reclaim, which could
      potentially call back into ext4.  So we need to call
      find_or_create_page() directly, and remove the __GFP_FP flag to avoid
      this potential deadlock.
      
      Thanks to Roland Dreier for reporting a lockdep warning which showed
      this problem.
      
      [20786.363249] =================================
      [20786.363257] [ INFO: inconsistent lock state ]
      [20786.363265] 2.6.31-2-generic #14~rbd4gitd960eea9
      [20786.363270] ---------------------------------
      [20786.363276] inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
      [20786.363285] http/8397 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [20786.363291]  (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150
      [20786.363314] {IN-RECLAIM_FS-W} state was registered at:
      [20786.363320]   [<ffffffff8108bef6>] mark_irqflags+0xc6/0x1a0
      [20786.363334]   [<ffffffff8108d347>] __lock_acquire+0x287/0x430
      [20786.363345]   [<ffffffff8108d595>] lock_acquire+0xa5/0x150
      [20786.363355]   [<ffffffff812008da>] jbd2_journal_start+0xfa/0x150
      [20786.363365]   [<ffffffff811d98a8>] ext4_journal_start_sb+0x58/0x90
      [20786.363377]   [<ffffffff811cce85>] ext4_delete_inode+0xc5/0x2c0
      [20786.363389]   [<ffffffff81146fa3>] generic_delete_inode+0xd3/0x1a0
      [20786.363401]   [<ffffffff81147095>] generic_drop_inode+0x25/0x30
      [20786.363411]   [<ffffffff81145ce2>] iput+0x62/0x70
      [20786.363420]   [<ffffffff81142878>] dentry_iput+0x98/0x110
      [20786.363429]   [<ffffffff81142a00>] d_kill+0x50/0x80
      [20786.363438]   [<ffffffff811444c5>] dput+0x95/0x180
      [20786.363447]   [<ffffffff8120de4b>] ecryptfs_d_release+0x2b/0x70
      [20786.363459]   [<ffffffff81142978>] d_free+0x28/0x60
      [20786.363468]   [<ffffffff81142a18>] d_kill+0x68/0x80
      [20786.363477]   [<ffffffff81142ad3>] prune_one_dentry+0xa3/0xc0
      [20786.363487]   [<ffffffff81142d61>] __shrink_dcache_sb+0x271/0x290
      [20786.363497]   [<ffffffff81142e89>] prune_dcache+0x109/0x1b0
      [20786.363506]   [<ffffffff81142f6f>] shrink_dcache_memory+0x3f/0x50
      [20786.363516]   [<ffffffff810f6d3d>] shrink_slab+0x12d/0x190
      [20786.363527]   [<ffffffff810f97d7>] balance_pgdat+0x4d7/0x640
      [20786.363537]   [<ffffffff810f9a57>] kswapd+0x117/0x170
      [20786.363546]   [<ffffffff810773ce>] kthread+0x9e/0xb0
      [20786.363558]   [<ffffffff8101430a>] child_rip+0xa/0x20
      [20786.363569]   [<ffffffffffffffff>] 0xffffffffffffffff
      [20786.363598] irq event stamp: 15997
      [20786.363603] hardirqs last  enabled at (15997): [<ffffffff81125f9d>] kmem_cache_alloc+0xfd/0x1a0
      [20786.363617] hardirqs last disabled at (15996): [<ffffffff81125f01>] kmem_cache_alloc+0x61/0x1a0
      [20786.363628] softirqs last  enabled at (15966): [<ffffffff810631ea>] __do_softirq+0x14a/0x220
      [20786.363641] softirqs last disabled at (15861): [<ffffffff8101440c>] call_softirq+0x1c/0x30
      [20786.363651] 
      [20786.363653] other info that might help us debug this:
      [20786.363660] 3 locks held by http/8397:
      [20786.363665]  #0:  (&sb->s_type->i_mutex_key#8){+.+.+.}, at: [<ffffffff8112ed24>] do_truncate+0x64/0x90
      [20786.363685]  #1:  (&sb->s_type->i_alloc_sem_key#5){+++++.}, at: [<ffffffff81147f90>] notify_change+0x250/0x350
      [20786.363707]  #2:  (jbd2_handle){+.+.?.}, at: [<ffffffff812008bb>] jbd2_journal_start+0xdb/0x150
      [20786.363724] 
      [20786.363726] stack backtrace:
      [20786.363734] Pid: 8397, comm: http Tainted: G         C 2.6.31-2-generic #14~rbd4gitd960eea9
      [20786.363741] Call Trace:
      [20786.363752]  [<ffffffff8108ad7c>] print_usage_bug+0x18c/0x1a0
      [20786.363763]  [<ffffffff8108b0c0>] ? check_usage_backwards+0x0/0xb0
      [20786.363773]  [<ffffffff8108bad2>] mark_lock_irq+0xf2/0x280
      [20786.363783]  [<ffffffff8108bd97>] mark_lock+0x137/0x1d0
      [20786.363793]  [<ffffffff8108c03c>] mark_held_locks+0x6c/0xa0
      [20786.363803]  [<ffffffff8108c11f>] lockdep_trace_alloc+0xaf/0xe0
      [20786.363813]  [<ffffffff810efbac>] __alloc_pages_nodemask+0x7c/0x180
      [20786.363824]  [<ffffffff810e9411>] ? find_get_page+0x91/0xf0
      [20786.363835]  [<ffffffff8111d3b7>] alloc_pages_current+0x87/0xd0
      [20786.363845]  [<ffffffff810e9827>] __page_cache_alloc+0x67/0x70
      [20786.363856]  [<ffffffff810eb7df>] find_or_create_page+0x4f/0xb0
      [20786.363867]  [<ffffffff811cb3be>] ext4_block_truncate_page+0x3e/0x460
      [20786.363876]  [<ffffffff812008da>] ? jbd2_journal_start+0xfa/0x150
      [20786.363885]  [<ffffffff812008bb>] ? jbd2_journal_start+0xdb/0x150
      [20786.363895]  [<ffffffff811c6415>] ? ext4_meta_trans_blocks+0x75/0xf0
      [20786.363905]  [<ffffffff811e8d8b>] ext4_ext_truncate+0x1bb/0x1e0
      [20786.363916]  [<ffffffff811072c5>] ? unmap_mapping_range+0x75/0x290
      [20786.363926]  [<ffffffff811ccc28>] ext4_truncate+0x498/0x630
      [20786.363938]  [<ffffffff8129b4ce>] ? _raw_spin_unlock+0x5e/0xb0
      [20786.363947]  [<ffffffff81107306>] ? unmap_mapping_range+0xb6/0x290
      [20786.363957]  [<ffffffff8108c3ad>] ? trace_hardirqs_on+0xd/0x10
      [20786.363966]  [<ffffffff811ffe58>] ? jbd2_journal_stop+0x1f8/0x2e0
      [20786.363976]  [<ffffffff81107690>] vmtruncate+0xb0/0x110
      [20786.363986]  [<ffffffff81147c05>] inode_setattr+0x35/0x170
      [20786.363995]  [<ffffffff811c9906>] ext4_setattr+0x186/0x370
      [20786.364005]  [<ffffffff81147eab>] notify_change+0x16b/0x350
      [20786.364014]  [<ffffffff8112ed30>] do_truncate+0x70/0x90
      [20786.364021]  [<ffffffff8112f48b>] T.657+0xeb/0x110
      [20786.364021]  [<ffffffff8112f4be>] sys_ftruncate+0xe/0x10
      [20786.364021]  [<ffffffff81013132>] system_call_fastpath+0x16/0x1b
      Reported-by: NRoland Dreier <roland@digitalvampire.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f4a01017
  6. 24 6月, 2009 1 次提交
  7. 15 6月, 2009 2 次提交
  8. 13 6月, 2009 2 次提交
  9. 09 6月, 2009 1 次提交
  10. 05 6月, 2009 2 次提交
  11. 09 6月, 2009 1 次提交
    • J
      ext4: Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle() · 03f5d8bc
      Jan Kara 提交于
      Get rid of EXTEND_DISKSIZE flag of ext4_get_blocks_handle(). This
      seems to be a relict from some old days and setting disksize in this
      function does not make much sense.  Currently it was set only by
      ext4_getblk().  Since the parameter has some effect only if create ==
      1, it is easy to check by grepping through the sources that the three
      callers which end up calling ext4_getblk() with create == 1
      (ext4_append, ext4_quota_write, ext4_mkdir) do the right thing and set
      disksize themselves.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      03f5d8bc
  12. 25 5月, 2009 1 次提交
  13. 18 5月, 2009 1 次提交
    • T
      ext4: Add a comprehensive block validity check to ext4_get_blocks() · 6fd058f7
      Theodore Ts'o 提交于
      To catch filesystem bugs or corruption which could lead to the
      filesystem getting severly damaged, this patch adds a facility for
      tracking all of the filesystem metadata blocks by contiguous regions
      in a red-black tree.  This allows quick searching of the tree to
      locate extents which might overlap with filesystem metadata blocks.
      
      This facility is also used by the multi-block allocator to assure that
      it is not allocating blocks out of the system zone, as well as by the
      routines used when reading indirect blocks and extents information
      from disk to make sure their contents are valid.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      6fd058f7
  14. 15 5月, 2009 2 次提交
    • A
      ext4: Clear the unwritten buffer_head flag after the extent is initialized · 2a8964d6
      Aneesh Kumar K.V 提交于
      The BH_Unwritten flag indicates that the buffer is allocated on disk
      but has not been written; that is, the disk was part of a persistent
      preallocation area.  That flag should only be set when a get_blocks()
      function is looking up a inode's logical to physical block mapping.
      
      When ext4_get_blocks_wrap() is called with create=1, the uninitialized
      extent is converted into an initialized one, so the BH_Unwritten flag
      is no longer appropriate.  Hence, we need to make sure the
      BH_Unwritten is not left set, since the combination of BH_Mapped and
      BH_Unwritten is not allowed; among other things, it will result ext4's
      get_block() to be called over and over again during the write_begin
      phase of write(2).
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2a8964d6
    • T
      ext4: Clean up ext4_get_blocks() so it does not depend on bh_result->b_state · 2ac3b6e0
      Theodore Ts'o 提交于
      The ext4_get_blocks() function was depending on the value of
      bh_result->b_state as an input parameter to decide whether or not
      update the delalloc accounting statistics by calling
      ext4_da_update_reserve_space().  We now use a separate flag,
      EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE, to requests this update, so that
      all callers of ext4_get_blocks() can clear map_bh.b_state before
      calling ext4_get_blocks() without worrying about any consistency
      issues.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      2ac3b6e0
  15. 14 5月, 2009 1 次提交
  16. 13 5月, 2009 1 次提交
  17. 14 5月, 2009 1 次提交
  18. 13 5月, 2009 1 次提交
  19. 14 5月, 2009 3 次提交
    • T
      ext4: Add documentation to the ext4_*get_block* functions · b920c755
      Theodore Ts'o 提交于
      This adds more documentation to various internal functions in
      fs/ext4/inode.c, most notably ext4_ind_get_blocks(),
      ext4_da_get_block_write(), ext4_da_get_block_prep(),
      ext4_normal_get_block_write().
      
      In addition, the static function ext4_normal_get_block_write() has
      been renamed noalloc_get_block_write(), since it is used in many
      places far beyond ext4_normal_writepage().
      
      Plenty of warnings have been added to the noalloc_get_block_write()
      function, since the way it is used is amazingly fragile.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      b920c755
    • T
      ext4: Define a new set of flags for ext4_get_blocks() · c2177057
      Theodore Ts'o 提交于
      The functions ext4_get_blocks(), ext4_ext_get_blocks(), and
      ext4_ind_get_blocks() used an ad-hoc set of integer variables used as
      boolean flags passed in as arguments.  Use a single flags parameter
      and a setandard set of bitfield flags instead.  This saves space on
      the call stack, and it also makes the code a bit more understandable.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c2177057
    • T
      ext4: Rename ext4_get_blocks_wrap() to be ext4_get_blocks() · 12b7ac17
      Theodore Ts'o 提交于
      Another function rename for clarity's sake.  The _wrap prefix simply
      confuses people, and didn't add much people trying to follow the code
      paths.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      12b7ac17
  20. 12 5月, 2009 2 次提交
  21. 17 6月, 2009 1 次提交
  22. 01 5月, 2009 1 次提交
    • T
      ext4: Avoid races caused by on-line resizing and SMP memory reordering · 8df9675f
      Theodore Ts'o 提交于
      Ext4's on-line resizing adds a new block group and then, only at the
      last step adjusts s_groups_count.  However, it's possible on SMP
      systems that another CPU could see the updated the s_group_count and
      not see the newly initialized data structures for the just-added block
      group.  For this reason, it's important to insert a SMP read barrier
      after reading s_groups_count and before reading any (for example) the
      new block group descriptors allowed by the increased value of
      s_groups_count.
      
      Unfortunately, we rather blatently violate this locking protocol
      documented in fs/ext4/resize.c.  Fortunately, (1) on-line resizes
      happen relatively rarely, and (2) it seems rare that the filesystem
      code will immediately try to use just-added block group before any
      memory ordering issues resolve themselves.  So apparently problems
      here are relatively hard to hit, since ext3 has been vulnerable to the
      same issue for years with no one apparently complaining.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      8df9675f
  23. 13 5月, 2009 1 次提交
    • A
      ext4: Mark the unwritten buffer_head as mapped during write_begin · 29fa89d0
      Aneesh Kumar K.V 提交于
      Setting BH_Unwritten buffer_heads as BH_Mapped avoids multiple
      (unnecessary) calls to get_block() during the call to the write(2)
      system call.  Setting BH_Unwritten buffer heads as BH_Mapped requires
      that the writepages() functions can handle BH_Unwritten buffer_heads.
      
      After this commit, things work as follows:
      
      ext4_ext_get_block() returns unmapped, unwritten, buffer head when
      called with create = 0 for prealloc space. This makes sure we handle
      the read path and non-delayed allocation case correctly.  Even though
      the buffer head is marked unmapped we have valid b_blocknr and b_bdev
      values in the buffer_head.
      
      ext4_da_get_block_prep() called for block resrevation will now return
      mapped, unwritten, new buffer_head for prealloc space. This avoids
      multiple calls to get_block() for write to same offset. By making such
      buffers as BH_New, we also assure that sub-block zeroing of buffered
      writes happens correctly.
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      29fa89d0
  24. 14 5月, 2009 1 次提交
  25. 25 4月, 2009 3 次提交
    • T
      ext4: Do not try to validate extents on special files · c4b5a614
      Theodore Ts'o 提交于
      The EXTENTS_FL flag should never be set on special files, but if it
      is, don't bother trying to validate that the extents tree is valid,
      since only files, directories, and non-fast symlinks will ever have an
      extent data structure.  We perhaps should flag the filesystem as being
      corrupted if we see a special file (named pipes, device nodes, Unix
      domain sockets, etc.) with the EXTENTS_FL flag, but e2fsck doesn't
      currently check this case, so we'll just ignore this for now, since
      it's harmless.
      
      Without this fix, a special device with the extents flag is flagged as
      an error by the kernel, so it is impossible to access or delete the
      inode, but e2fsck doesn't see it as a problem, leading to
      confused/frustrated users.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c4b5a614
    • T
      ext4: Ignore i_file_acl_high unless EXT4_FEATURE_INCOMPAT_64BIT is present · a9e81742
      Theodore Ts'o 提交于
      Don't try to look at i_file_acl_high unless the INCOMPAT_64BIT feature
      bit is set.  The field is normally zero, but older versions of e2fsck
      didn't automatically check to make sure of this, so in the spirit of
      "be liberal in what you accept", don't look at i_file_acl_high unless
      we are using a 64-bit filesystem.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      a9e81742
    • T
      ext4: Fix softlockup caused by illegal i_file_acl value in on-disk inode · 485c26ec
      Theodore Ts'o 提交于
      If the block containing external extended attributes (which is stored
      in i_file_acl and i_file_acl_high) is larger than the on-disk
      filesystem, the process which tried to access the extended attributes
      will endlessly issue kernel printks complaining that
      "__find_get_block_slow() failed", locking up that CPU until the system
      is forcibly rebooted.
      
      So when we read in the inode, make sure the i_file_acl value is legal,
      and if not, flag the filesystem as being corrupted.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      485c26ec
  26. 08 4月, 2009 1 次提交
  27. 01 4月, 2009 1 次提交
    • N
      mm: page_mkwrite change prototype to match fault · c2ec175c
      Nick Piggin 提交于
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information to
      the VM (and also can provide more information eg.  virtual_address to the
      driver, which might be important in some special cases).
      
      This is required for a subsequent fix.  And will also make it easier to
      merge page_mkwrite() with fault() in future.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2ec175c
  28. 31 3月, 2009 1 次提交
  29. 26 3月, 2009 1 次提交