1. 24 9月, 2012 1 次提交
  2. 20 9月, 2012 3 次提交
    • T
      ext4: remove erroneous ext4_superblock_csum_set() in update_backups() · bef53b01
      Tao Ma 提交于
      The update_backups() function is used to backup all the metadata
      blocks, so we should not take it for granted that 'data' is pointed to
      a super block and use ext4_superblock_csum_set to calculate the
      checksum there.  In case where the data is a group descriptor block,
      it will corrupt the last group descriptor, and then e2fsck will
      complain about it it.
      
      As all the metadata checksums should already be OK when we do the
      backup, remove the wrong ext4_superblock_csum_set and it should be
      just fine.
      Reported-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      bef53b01
    • T
      ext4: fix potential deadlock in ext4_nonda_switch() · 00d4e736
      Theodore Ts'o 提交于
      In ext4_nonda_switch(), if the file system is getting full we used to
      call writeback_inodes_sb_if_idle().  The problem is that we can be
      holding i_mutex already, and this causes a potential deadlock when
      writeback_inodes_sb_if_idle() when it tries to take s_umount.  (See
      lockdep output below).
      
      As it turns out we don't need need to hold s_umount; the fact that we
      are in the middle of the write(2) system call will keep the superblock
      pinned.  Unfortunately writeback_inodes_sb() checks to make sure
      s_umount is taken, and the VFS uses a different mechanism for making
      sure the file system doesn't get unmounted out from under us.  The
      simplest way of dealing with this is to just simply grab s_umount
      using a trylock, and skip kicking the writeback flusher thread in the
      very unlikely case that we can't take a read lock on s_umount without
      blocking.
      
      Also, we now check the cirteria for kicking the writeback thread
      before we decide to whether to fall back to non-delayed writeback, so
      if there are any outstanding delayed allocation writes, we try to get
      them resolved as soon as possible.
      
         [ INFO: possible circular locking dependency detected ]
         3.6.0-rc1-00042-gce894ca #367 Not tainted
         -------------------------------------------------------
         dd/8298 is trying to acquire lock:
          (&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
      
         but task is already holding lock:
          (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3
      
         which lock already depends on the new lock.
      
         2 locks held by dd/8298:
          #0:  (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3
          #1:  (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3
      
         stack backtrace:
         Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
         Call Trace:
          [<c015b79c>] ? console_unlock+0x345/0x372
          [<c06d62a1>] print_circular_bug+0x190/0x19d
          [<c019906c>] __lock_acquire+0x86d/0xb6c
          [<c01999db>] ? mark_held_locks+0x5c/0x7b
          [<c0199724>] lock_acquire+0x66/0xb9
          [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
          [<c06db935>] down_read+0x28/0x58
          [<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
          [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
          [<c026f3b2>] ext4_nonda_switch+0xe1/0xf4
          [<c0271ece>] ext4_da_write_begin+0x27/0x193
          [<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb
          [<c01ddc47>] __generic_file_aio_write+0x1dd/0x205
          [<c01ddce7>] generic_file_aio_write+0x78/0xd3
          [<c026d336>] ext4_file_write+0x480/0x4a6
          [<c0198c1d>] ? __lock_acquire+0x41e/0xb6c
          [<c0180944>] ? sched_clock_cpu+0x11a/0x13e
          [<c01967e9>] ? trace_hardirqs_off+0xb/0xd
          [<c018099f>] ? local_clock+0x37/0x4e
          [<c0209f2c>] do_sync_write+0x67/0x9d
          [<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44
          [<c020a7b9>] vfs_write+0x7b/0xe6
          [<c020a9a6>] sys_write+0x3b/0x64
          [<c06dd4bd>] syscall_call+0x7/0xb
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      00d4e736
    • A
      ext4: speed up truncate/unlink by not using bforget() unless needed · 18888cf0
      Andrey Sidorov 提交于
      Do not iterate over data blocks scanning for bh's to forget as they're
      never exist. This improves time taken by unlink / truncate syscall.
      Tested by continuously truncating file that is being written by dd.
      Another test is rm -rf of linux tree while tar unpacks it. With
      ordered data mode condition unlikely(!tbh) was always met in
      ext4_free_blocks. With journal data mode tbh was found only few times,
      so optimisation is also possible.
      
      Unlinking fallocated 60G file after doing sync && echo 3 >
      /proc/sys/vm/drop_caches && time rm --help
      
      X86 before (linux 3.6-rc4):
      # time rm -f test1
      real    0m2.710s
      user    0m0.000s
      sys     0m1.530s
      
      X86 after:
      # time rm -f test1
      real    0m0.644s
      user    0m0.003s
      sys     0m0.060s
      
      MIPS before (linux 2.6.37):
      # time rm -f test1
      real    0m 4.93s
      user    0m 0.00s
      sys     0m 4.61s
      
      MIPS after:
      # time rm -f test1
      real    0m 0.16s
      user    0m 0.00s
      sys     0m 0.06s
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrey Sidorov <qrxd43@motorola.com>
      18888cf0
  3. 19 9月, 2012 3 次提交
    • T
      ext4: fix online resizing when the # of block groups is constant · 59e31c15
      Theodore Ts'o 提交于
      Commit 1c6bd717 introduced a regression where an online resize
      operation which did not change the number of block groups would fail,
      i.e:
      
      	mke2fs -t /dev/vdc 60000
      	mount /dev/vdc
      	resize2fs /dev/vdc 60001
      
      This was due to a bug in the logic regarding when to try converting
      the filesystem to use meta_bg.
      
      Also fix up a number of other minor issues with the online resizing
      code: (a) Fix a sparse warning; (b) only check to make sure the device
      is large enough once, instead of multiple times through the resize
      loop.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      59e31c15
    • A
      ext4: make orphan functions be no-op in no-journal mode · c9b92530
      Anatol Pomozov 提交于
      Instead of checking whether the handle is valid, we check if journal
      is enabled. This avoids taking the s_orphan_lock mutex in all cases
      when there is no journal in use, including the error paths where
      ext4_orphan_del() is called with a handle set to NULL.
      Signed-off-by: NAnatol Pomozov <anatol.pomozov@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c9b92530
    • T
      ext4: re-enable -o discard functionality in no-journal mode · b5e2368b
      Theodore Ts'o 提交于
      This is a revert of commit b56ff9d3, which removed the call to
      ext4_issue_discard() to fix a BUG reported because
      ext4_issue_discard() was being called from inside a block group
      spinlock.  As it turns out this bug had already been fixed by Lukas
      Czerner in commit 53fdcf99 by the simple expedient of moving when
      we call ext4_issue_discard() outside the spinlock.
      
      So it should be safe to re-enable this functionality, which I tested
      by putting an BUG_ON(in_atomic) just after the restored callsite to
      ext4_issue_discard().
      
      Addresses-Google-Bug: #6750518
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Anatol Pomozov <anatol.pomozov@gmail.com>
      b5e2368b
  4. 18 9月, 2012 2 次提交
    • C
      ext4: fix possible non-initialized variable in htree_dirblock_to_tree() · 90b0a973
      Carlos Maiolino 提交于
      htree_dirblock_to_tree() declares a non-initialized 'err' variable,
      which is passed as a reference to another functions expecting them to
      set this variable with their error codes.
      
      It's passed to ext4_bread(), which then passes it to ext4_getblk(). If
      ext4_map_blocks() returns 0 due to a lookup failure, leaving the
      ext4_getblk() buffer_head uninitialized, it will make ext4_getblk()
      return to ext4_bread() without initialize the 'err' variable, and
      ext4_bread() will return to htree_dirblock_to_tree() with this variable
      still uninitialized.  htree_dirblock_to_tree() will pass this variable
      with garbage back to ext4_htree_fill_tree(), which expects a number of
      directory entries added to the rb-tree. which, in case, might return a
      fake non-zero value due the garbage left in the 'err' variable, leading
      the kernel to an Oops in ext4_dx_readdir(), once this is expecting a
      filled rb-tree node, when in turn it will have a NULL-ed one, causing an
      invalid page request when trying to get a fname struct from this NULL-ed
      rb-tree node in this line:
      
      fname = rb_entry(info->curr_node, struct fname, rb_hash);
      
      The patch itself initializes the err variable in
      htree_dirblock_to_tree() to avoid usage mistakes by the called
      functions, and also fix ext4_getblk() to return a initialized 'err'
      variable when ext4_map_blocks() fails a lookup.
      Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      90b0a973
    • T
      ext4: do not enable delalloc by default for ext2 · bc0b75f7
      Theodore Ts'o 提交于
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      bc0b75f7
  5. 14 9月, 2012 1 次提交
  6. 13 9月, 2012 3 次提交
  7. 05 9月, 2012 7 次提交
  8. 20 8月, 2012 1 次提交
  9. 19 8月, 2012 3 次提交
  10. 18 8月, 2012 1 次提交
  11. 17 8月, 2012 11 次提交
  12. 06 8月, 2012 2 次提交
    • T
      ext4: avoid kmemcheck complaint from reading uninitialized memory · 7e731bc9
      Theodore Ts'o 提交于
      Commit 03179fe9 introduced a kmemcheck complaint in
      ext4_da_get_block_prep() because we save and restore
      ei->i_da_metadata_calc_last_lblock even though it is left
      uninitialized in the case where i_da_metadata_calc_len is zero.
      
      This doesn't hurt anything, but silencing the kmemcheck complaint
      makes it easier for people to find real bugs.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=45631
      (which is marked as a regression).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      7e731bc9
    • T
      ext4: make sure the journal sb is written in ext4_clear_journal_err() · d796c52e
      Theodore Ts'o 提交于
      After we transfer set the EXT4_ERROR_FS bit in the file system
      superblock, it's not enough to call jbd2_journal_clear_err() to clear
      the error indication from journal superblock --- we need to call
      jbd2_journal_update_sb_errno() as well.  Otherwise, when the root file
      system is mounted read-only, the journal is replayed, and the error
      indicator is transferred to the superblock --- but the s_errno field
      in the jbd2 superblock is left set (since although we cleared it in
      memory, we never flushed it out to disk).
      
      This can end up confusing e2fsck.  We should make e2fsck more robust
      in this case, but the kernel shouldn't be leaving things in this
      confused state, either.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      
      d796c52e
  13. 31 7月, 2012 2 次提交