1. 09 2月, 2013 2 次提交
    • T
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o 提交于
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9924a92a
    • T
      ext4: move the jbd2 wrapper functions out of super.c · 722887dd
      Theodore Ts'o 提交于
      Move the jbd2 wrapper functions which start and stop handles out of
      super.c, where they don't really logically belong, and into
      ext4_jbd2.c.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      722887dd
  2. 05 2月, 2013 1 次提交
    • T
      ext4: optimize mballoc for large allocations · 40ae3487
      Theodore Ts'o 提交于
      The ext4 block allocator only maintains buddy bitmaps for chunks which
      are less than or equal to one quarter of a block group.  That is, for
      a file aystem with a 1k blocksize, and where the number of blocks in a
      block group is 8192 blocks, the largest chunk size tracked by buddy
      bitmaps is 2048 blocks.
      
      For a file system with a 4k blocksize, and where the number of blocks
      in a block group is 32768 blocks, the largest chunk size tracked by
      buddy bitmaps is 8192 blocks.
      
      To work around this code, mballoc.c before this commit would truncate
      allocation requests to the number of blocks in a block group minus 10.
      Why 10?  Aside from being a completely arbitrary number, it avoids
      block allocation to be a power of two larger than 25% of the block
      group.  If you try to explicitly fallocate 50% of the block group
      size, this will demonstrate the problem; the block allocation code
      will scan the all of the blocks in the file system with cr==0 (since
      the request is for a natural power of two), but then completely fail
      for all blocks groups, since the buddy bitmaps don't track chunk sizes
      of 50% of the block group.
      
      To fix this, in these we use ext4_mb_complex_scan_group() instead of
      ext4_mb_simple_scan_group().
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger@dilger.ca>
      40ae3487
  3. 03 2月, 2013 4 次提交
  4. 02 2月, 2013 4 次提交
  5. 30 1月, 2013 1 次提交
  6. 29 1月, 2013 10 次提交
  7. 28 1月, 2013 6 次提交
  8. 25 1月, 2013 2 次提交
  9. 17 1月, 2013 1 次提交
  10. 13 1月, 2013 4 次提交
    • T
      ext4: trigger the lazy inode table initialization after resize · 7f511862
      Theodore Ts'o 提交于
      After we have finished extending the file system, we need to trigger a
      the lazy inode table thread to zero out the inode tables.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7f511862
    • E
      ext4: check bh in ext4_read_block_bitmap() · 15b49132
      Eryu Guan 提交于
      Validate the bh pointer before using it, since
      ext4_read_block_bitmap_nowait() might return NULL.
      
      I've seen this in fsfuzz testing.
      
       EXT4-fs error (device loop0): ext4_read_block_bitmap_nowait:385: comm touch: Cannot get buffer for block bitmap - block_group = 0, block_bitmap = 3925999616
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffff8121de25>] ext4_wait_block_bitmap+0x25/0xe0
       ...
       Call Trace:
        [<ffffffff8121e1e5>] ext4_read_block_bitmap+0x35/0x60
        [<ffffffff8125e9c6>] ext4_free_blocks+0x236/0xb80
        [<ffffffff811d0d36>] ? __getblk+0x36/0x70
        [<ffffffff811d0a5f>] ? __find_get_block+0x8f/0x210
        [<ffffffff81191ef3>] ? kmem_cache_free+0x33/0x140
        [<ffffffff812678e5>] ext4_xattr_release_block+0x1b5/0x1d0
        [<ffffffff812679be>] ext4_xattr_delete_inode+0xbe/0x100
        [<ffffffff81222a7c>] ext4_free_inode+0x7c/0x4d0
        [<ffffffff812277b8>] ? ext4_mark_inode_dirty+0x88/0x230
        [<ffffffff8122993c>] ext4_evict_inode+0x32c/0x490
        [<ffffffff811b8cd7>] evict+0xa7/0x1c0
        [<ffffffff811b8ed3>] iput_final+0xe3/0x170
        [<ffffffff811b8f9e>] iput+0x3e/0x50
        [<ffffffff812316fd>] ext4_add_nondir+0x4d/0x90
        [<ffffffff81231d0b>] ext4_create+0xeb/0x170
        [<ffffffff811aae9c>] vfs_create+0xac/0xd0
        [<ffffffff811ac845>] lookup_open+0x185/0x1c0
        [<ffffffff8129e3b9>] ? selinux_inode_permission+0xa9/0x170
        [<ffffffff811acb54>] do_last+0x2d4/0x7a0
        [<ffffffff811af743>] path_openat+0xb3/0x480
        [<ffffffff8116a8a1>] ? handle_mm_fault+0x251/0x3b0
        [<ffffffff811afc49>] do_filp_open+0x49/0xa0
        [<ffffffff811bbaad>] ? __alloc_fd+0xdd/0x150
        [<ffffffff8119da28>] do_sys_open+0x108/0x1f0
        [<ffffffff8119db51>] sys_open+0x21/0x30
        [<ffffffff81618959>] system_call_fastpath+0x16/0x1b
      
      Also fix comment for ext4_read_block_bitmap_nowait()
      Signed-off-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      15b49132
    • W
      ext4: use unlikely to improve the efficiency of the kernel · aebf0243
      Wang Shilong 提交于
      Because the function 'sb_getblk' seldomly fails to return NULL
      value,it will be better to use 'unlikely' to optimize it.
      Signed-off-by: NWang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      aebf0243
    • T
      ext4: return ENOMEM if sb_getblk() fails · 860d21e2
      Theodore Ts'o 提交于
      The only reason for sb_getblk() failing is if it can't allocate the
      buffer_head.  So ENOMEM is more appropriate than EIO.  In addition,
      make sure that the file system is marked as being inconsistent if
      sb_getblk() fails.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      860d21e2
  11. 07 1月, 2013 3 次提交
  12. 27 12月, 2012 2 次提交
    • T
      ext4: avoid hang when mounting non-journal filesystems with orphan list · 0e9a9a1a
      Theodore Ts'o 提交于
      When trying to mount a file system which does not contain a journal,
      but which does have a orphan list containing an inode which needs to
      be truncated, the mount call with hang forever in
      ext4_orphan_cleanup() because ext4_orphan_del() will return
      immediately without removing the inode from the orphan list, leading
      to an uninterruptible loop in kernel code which will busy out one of
      the CPU's on the system.
      
      This can be trivially reproduced by trying to mount the file system
      found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs
      source tree.  If a malicious user were to put this on a USB stick, and
      mount it on a Linux desktop which has automatic mounts enabled, this
      could be considered a potential denial of service attack.  (Not a big
      deal in practice, but professional paranoids worry about such things,
      and have even been known to allocate CVE numbers for such problems.)
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Cc: stable@vger.kernel.org
      0e9a9a1a
    • T
      ext4: lock i_mutex when truncating orphan inodes · 721e3eba
      Theodore Ts'o 提交于
      Commit c278531d added a warning when ext4_flush_unwritten_io() is
      called without i_mutex being taken.  It had previously not been taken
      during orphan cleanup since races weren't possible at that point in
      the mount process, but as a result of this c278531d, we will now see
      a kernel WARN_ON in this case.  Take the i_mutex in
      ext4_orphan_cleanup() to suppress this warning.
      Reported-by: NAlexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Cc: stable@vger.kernel.org
      721e3eba