1. 18 Feb, 2013 1 commit
  2. 09 Feb, 2013 1 commit
    • ext4: pass context information to jbd2__journal_start() · 9924a92a
      Committed by Theodore Ts'o
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms (the
      interval field is in jiffies, and 5 jiffies is 20ms at HZ=250).
      Longer-running handles are bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
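The interval in each trace record is counted in jiffies, so turning it into wall-clock time needs the kernel's HZ value. The following is a minimal Python sketch for post-processing a file such as /tmp/problem-handles. It assumes HZ=250, which is the value implied by "311 jiffies, or over 1.2 seconds" above; `parse_handle_stats` and `jiffies_to_ms` are hypothetical helper names for this illustration, not part of any kernel tool.

```python
# Assumption: HZ=250 (4 ms per jiffy). Kernels built with another HZ
# need a different value here.
ASSUMED_HZ = 250

# The example record from the commit message, joined onto one line.
EXAMPLE = ("postmark-2917  [000] ....   196.435786: jbd2_handle_stats: "
           "dev 254,32 tid 570 type 2 line_no 2541 interval 311 sync 0 "
           "requested_blocks 1 dirtied_blocks 0")


def parse_handle_stats(line):
    """Return the fields of one jbd2_handle_stats record, or None."""
    _, marker, payload = line.partition("jbd2_handle_stats:")
    if not marker:
        return None  # header or unrelated trace line
    tokens = payload.split()
    # The payload is a flat "name value name value ..." sequence.
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    # "dev" is "major,minor" and stays a string; the rest are integers.
    return {k: (v if k == "dev" else int(v)) for k, v in fields.items()}


def jiffies_to_ms(jiffies, hz=ASSUMED_HZ):
    """Convert a jiffies count to whole milliseconds."""
    return jiffies * 1000 // hz


record = parse_handle_stats(EXAMPLE)
print(record["line_no"], jiffies_to_ms(record["interval"]), "ms")  # 2541 1244 ms
```

The `line_no` field is the context information this commit adds, so sorting records by interval and grouping them by `line_no` points at the call sites that start the long-running handles.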
  3. 02 Feb, 2013 1 commit
  4. 29 Nov, 2012 1 commit
    • ext4: rationalize ext4_extents.h inclusion · 4a092d73
      Committed by Theodore Ts'o
      Previously, ext4_extents.h was being included at the end of ext4.h,
      which was bad for a number of reasons: (a) it was not being included
      in the expected place, and (b) it caused the header to be included
      multiple times.  There were #ifdef's to prevent this from causing any
      problems, but it still was unnecessary.
      
      By moving the function declarations that were in ext4_extents.h to
      ext4.h, which is standard practice for where the function declarations
      for the rest of ext4.h can be found, we can remove ext4_extents.h from
      being included in ext4.h at all, and then we can only include
      ext4_extents.h where it is needed in ext4's source files.
      
      It should be possible to move a few more things into ext4.h, and
      further reduce the number of source files that need to #include
      ext4_extents.h, but that's a cleanup for another day.
      Reported-by: Sachin Kamat <sachin.kamat@linaro.org>
      Reported-by: Wei Yongjun <weiyj.lk@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  5. 29 Sep, 2012 1 commit
  6. 27 Sep, 2012 6 commits
    • ext4: convert to use leXX_add_cpu() · ba39ebb6
      Committed by Wei Yongjun
      Convert cpu_to_leXX(leXX_to_cpu(E1) + E2) to use leXX_add_cpu().
      
      dpatch engine is used to auto generate this patch.
      (https://github.com/weiyj/dpatch)
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
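The conversion above is behaviour-preserving: the kernel's le32_add_cpu() helper performs the same convert, add, convert-back sequence internally. A small user-space model in Python is sketched below. The function names mirror the kernel helpers, but this is only an illustration: the model returns a new value, whereas the kernel helper updates the field through a pointer.

```python
import struct


def cpu_to_le32(value):
    """Encode an integer as 4 little-endian bytes, wrapping at 32 bits."""
    return struct.pack("<I", value & 0xFFFFFFFF)


def le32_to_cpu(raw):
    """Decode 4 little-endian bytes into an integer."""
    return struct.unpack("<I", raw)[0]


def le32_add_cpu(raw, delta):
    """Model of le32_add_cpu(): add delta to a little-endian field."""
    return cpu_to_le32(le32_to_cpu(raw) + delta)


field = cpu_to_le32(41)
# Open-coded form that the patch replaces ...
old_style = cpu_to_le32(le32_to_cpu(field) + 1)
# ... and the helper form it converts to.
new_style = le32_add_cpu(field, 1)
print(old_style == new_style, le32_to_cpu(new_style))  # True 42
```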
    • ext4: remove redundant offset check in mext_check_arguments() · cbb4ee83
      Committed by Wang Sheng-Hui
      In the check code above, if orig_start != donor_start, we would
      return -EINVAL. So here, orig_start must be equal to donor_start.
      Remove the redundant check here.
      Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: reimplement uninit extent optimization for move_extent_per_page() · 8c854473
      Committed by Dmitry Monakhov
      An uninitialized extent may become initialized (by a parallel writeback
      task) at any moment after we drop i_data_sem, so we have to recheck the
      extent's state after we take the page's lock and i_data_sem.

      If we are about to change a page's mapping we must hold the page's lock
      in order to serialize against other users.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
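The rule described above is the general "peek without the lock, then recheck under the lock" pattern. A minimal Python sketch of that pattern follows. The `Extent` class, its single lock (standing in for the page lock plus i_data_sem), and the `fast_path`/`slow_path` callbacks are all assumptions made for this illustration and do not mirror the kernel's data structures.

```python
import threading


class Extent:
    """Toy extent whose state may be flipped by a concurrent writer."""

    def __init__(self):
        self.uninitialized = True
        # Assumption: one lock stands in for page lock + i_data_sem.
        self.lock = threading.Lock()


def move_extent(extent, fast_path, slow_path):
    """Take the fast path only if the extent is still uninitialized
    once the lock is actually held."""
    if not extent.uninitialized:      # unlocked peek: only a hint
        return slow_path()
    with extent.lock:
        if extent.uninitialized:      # recheck now that state is stable
            return fast_path()
    # The state changed between the peek and the lock: fall back.
    return slow_path()


ext = Extent()
print(move_extent(ext, lambda: "fast", lambda: "slow"))  # fast
ext.uninitialized = False
print(move_extent(ext, lambda: "fast", lambda: "slow"))  # slow
```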
    • ext4: clean up online defrag bugs in move_extent_per_page() · bb557488
      Committed by Dmitry Monakhov
      Non-exhaustive list of bugs:
      1) The uninitialized extent optimization does not hold the page's lock
         and simply replaces branches; after that the writeback code goes
         crazy because the block mapping changed under its feet:
         kernel BUG at fs/ext4/inode.c:1434!  (288'th xfstress)

      2) An uninitialized extent may become initialized right after we
         drop i_data_sem, so the extent state must be rechecked

      3) Locked pages go uptodate via the following sequence:
         ->readpage(page); lock_page(page); use_that_page(page)
         But after readpage() the page may be invalidated, because it is
         uptodate and unlocked (the reclaimer does that).
         As a result: kernel bug at include/linux/buffer_head.c:133!

      4) We call write_begin() with an already opened transaction, which
         results in the following deadlock:
      ->move_extent_per_page()
        ->ext4_journal_start()-> hold journal transaction
        ->write_begin()
          ->ext4_da_write_begin()
            ->ext4_nonda_switch()
              ->writeback_inodes_sb_if_idle()  --> will wait for journal_stop()
      
      5) try_to_release_page() may fail, and it does fail if one of the page's
         bhs was pinned by the journal

      6) If we are about to change a page's mapping we MUST hold its lock during
         the entire remapping procedure; this is true for both pages (original
         and donor)

      Fixes:

      - Avoid (1) and (2) simply by temporarily dropping the uninitialized
        extent handling optimization; this will be reimplemented later.

      - Fix (3) by manually forcing the page to uptodate state w/o dropping its lock

      - Fix (4) by rearranging the existing locking:
        from: journal_start(); ->write_begin
        to: write_begin(); journal_extend()
      - Fix (5) simply by checking the return value
      - Fix (6) by locking both pages (original and donor) during extent swap
        with the help of mext_page_double_lock()
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: online defrag is not supported for journaled files · f066055a
      Committed by Dmitry Monakhov
      Proper block swap for inodes with full journaling enabled is
      a truly non-obvious task. In order to be on the safe side let's
      explicitly disable it for now.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
    • ext4: move_extent code cleanup · 03bd8b9b
      Committed by Dmitry Monakhov
      - Remove useless checks, because it is too late to check that inode != NULL
        after it has already been referenced several times.
      - The double lock routines look very ugly, and their lock ordering relies
        on the order of i_ino, while other kernel code relies on the order of
        pointers.  Let's make them simple and clean.
      - Check that the inodes belong to the same SB as soon as possible.
      Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
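The point about lock ordering is that any two callers locking the same pair of inodes must take the locks in the same global order, otherwise they can deadlock against each other. A minimal Python sketch of a double-lock helper ordered by object identity follows. The `Inode` class and `double_lock` are hypothetical names, and `id()` merely stands in for the pointer comparison mentioned in the commit message.

```python
import threading
from contextlib import contextmanager


class Inode:
    """Toy inode with a number and a per-inode lock."""

    def __init__(self, ino):
        self.ino = ino
        self.lock = threading.Lock()


@contextmanager
def double_lock(a, b):
    """Lock two distinct inodes in one global order, regardless of
    the order in which the caller passes them."""
    if a is b:
        raise ValueError("double_lock() needs two distinct inodes")
    # Assumption: id() plays the role of the pointer value.
    first, second = sorted((a, b), key=id)
    with first.lock:
        with second.lock:
            yield


orig, donor = Inode(12), Inode(34)
with double_lock(orig, donor):
    print(orig.lock.locked(), donor.lock.locked())  # True True
with double_lock(donor, orig):  # same acquisition order as above
    pass
```

Ordering by identity rather than by inode number keeps the rule the same for every pair of objects, which is the consistency argument the commit message makes.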
  7. 10 Sep, 2011 1 commit
  8. 06 Jun, 2011 1 commit
    • ext4: Fix max file size and logical block counting of extent format file · f17722f9
      Committed by Lukas Czerner
      Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
      in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
      format and filling the tail of the file up to its end. We will hit the
      BUG_ON when we write the last block (2^32-1) into the sparse file.

      The root cause of the problem lies in the fact that we specifically set
      s_maxbytes so that the block at s_maxbytes fits into the on-disk extent
      format, which is 32 bits long. However, we are not storing start and end
      block numbers, but rather a start block number and a length in blocks. It
      means that in order to cover an extent from 0 to EXT_MAX_BLOCK we need
      EXT_MAX_BLOCK+1 to fit into len (because we are counting block 0 as
      well) - and it does not.
      
      The only way to fix it without changing the meaning of the struct
      ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
      by one fs block so we can cover the whole extent we can get by the
      on-disk extent format.
      
      Also in many places EXT_MAX_BLOCK is used as a length instead of the
      maximum logical block number that the name suggests; it is all a bit
      messy. So this commit renames it to EXT_MAX_BLOCKS and changes its usage
      in some places to actually be the maximum number of blocks in the extent.
      
      The bug which this commit fixes can be reproduced as follows:
      
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
       sync
       dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))
      Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
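The off-by-one at the heart of this fix can be checked with plain arithmetic. The Python sketch below assumes that both the logical block number and the block count are held in 32-bit fields, as the commit message describes; the helper names are hypothetical.

```python
U32_MAX = 2**32 - 1  # largest value a 32-bit field can hold


def blocks_needed(first_lblk, last_lblk):
    """Number of blocks in the inclusive range [first_lblk, last_lblk]."""
    return last_lblk - first_lblk + 1


def fits_in_u32(value):
    return 0 <= value <= U32_MAX


# Before the fix: the last addressable block is 2^32-1, so covering
# blocks 0..2^32-1 needs a length of 2^32, which does not fit.
print(fits_in_u32(blocks_needed(0, 2**32 - 1)))  # False

# After lowering the size limit by one block: the last block is 2^32-2
# and the length 2^32-1 fits.
print(fits_in_u32(blocks_needed(0, 2**32 - 2)))  # True
```

This matches the reproducer above: the second dd writes block index 2^32-1, which is exactly the block that the lowered s_maxbytes no longer allows.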
  9. 19 May, 2011 1 commit
  10. 28 Oct, 2010 1 commit
  11. 27 Jul, 2010 1 commit
  12. 03 Jun, 2010 1 commit
  13. 17 May, 2010 2 commits
  14. 11 May, 2010 1 commit
    • ext4: Fix coding style in fs/ext4/move_extent.c · c26d0bad
      Committed by liuqi_123
      Making sure ee_block is initialized to zero to prevent gcc from
      kvetching.  It's harmless (although it's not obvious that it's
      harmless) from code inspection:
      
      fs/ext4/move_extent.c:478: warning: 'start_ext.ee_block' may be used
      uninitialized in this function
      
      Thanks to Stefan Richter for first bringing this to the attention of
      linux-ext4@vger.kernel.org.
      Signed-off-by: LiuQi <lingjiujianke@gmail.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
  15. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Committed by Tejun Heo
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the following
      script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
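The first rule in the list above (pick slab.h if slab facilities are used, otherwise gfp.h if only gfp is used) can be sketched roughly as follows. This is not the slabh-sweep.py script, which is not reproduced here; the identifier patterns and the `needed_include` helper are assumptions chosen for illustration, and the real script may use different rules.

```python
import re

# Assumption: a small, illustrative set of slab and gfp identifiers.
SLAB_RE = re.compile(r"\b(kmalloc|kzalloc|kcalloc|kfree|kmem_cache_\w+)\s*\(")
GFP_RE = re.compile(r"\b(GFP_\w+|__GFP_\w+|gfp_t)\b")


def needed_include(source):
    """Return the header one C source text should include directly,
    or None if it uses neither slab nor gfp facilities."""
    if SLAB_RE.search(source):
        # slab.h includes gfp.h, so it covers gfp usage as well.
        return "linux/slab.h"
    if GFP_RE.search(source):
        return "linux/gfp.h"
    return None


print(needed_include("p = kmalloc(8, GFP_KERNEL);"))  # linux/slab.h
print(needed_include("gfp_t flags = GFP_ATOMIC;"))    # linux/gfp.h
print(needed_include("int x = 0;"))                   # None
```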
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  16. 04 Mar, 2010 3 commits
  17. 16 Feb, 2010 1 commit
  18. 09 Feb, 2010 1 commit
  19. 07 Dec, 2009 1 commit
    • ext4: Fix insufficient checks in EXT4_IOC_MOVE_EXT · 4a58579b
      Committed by Akira Fujita
      This patch fixes three problems in the handling of the
      EXT4_IOC_MOVE_EXT ioctl:
      
      1. In current EXT4_IOC_MOVE_EXT, there are read access mode checks for
      original and donor files, but they allow the illegal write access to
      donor file, since donor file is overwritten by original file data.  To
      fix this problem, change access mode checks of original (r->r/w) and
      donor (r->w) files.
      
      2.  Disallow the use of donor files that have the setuid or setgid bit set.

      3.  Call mnt_want_write() and mnt_drop_write() before and after
      calling ext4_move_extents() to get write access to the mount.
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
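The first two checks can be summarized in a few lines. The Python sketch below is a user-space model only: `check_move_ext` is a hypothetical helper, and the `FMODE_READ`/`FMODE_WRITE` constants are illustrative flag values named after the kernel's open-mode flags. The third item (mnt_want_write()/mnt_drop_write()) is a mount-level step and is not modelled.

```python
import stat

# Assumption: illustrative open-mode flag values for this sketch.
FMODE_READ = 0x1
FMODE_WRITE = 0x2


def check_move_ext(orig_fmode, donor_fmode, donor_inode_mode):
    """Return None if the request passes the checks, else a reason."""
    # Original file: read access alone is no longer enough (r -> r/w).
    if not (orig_fmode & FMODE_READ and orig_fmode & FMODE_WRITE):
        return "original file must be open for reading and writing"
    # Donor file: it is overwritten, so it needs write access (r -> w).
    if not donor_fmode & FMODE_WRITE:
        return "donor file must be open for writing"
    # Donor file must not carry setuid or setgid bits.
    if donor_inode_mode & (stat.S_ISUID | stat.S_ISGID):
        return "donor file must not be setuid or setgid"
    return None


rw = FMODE_READ | FMODE_WRITE
print(check_move_ext(rw, FMODE_WRITE, 0o100644))          # None
print(check_move_ext(FMODE_READ, FMODE_WRITE, 0o100644))  # original file must be ...
```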
  20. 24 Nov, 2009 3 commits
    • ext4: move_extent_per_page() cleanup · ac48b0a1
      Committed by Akira Fujita
      Integrate duplicate lines (acquire/release semaphore and invalidate
      extent cache in move_extent_per_page()) into mext_replace_branches(),
      to reduce source and object code size.
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: initialize moved_len before calling ext4_move_extents() · 446aaa6e
      Committed by Kazuya Mio
      The move_extent.moved_len field is used to pass back the number of
      exchanged blocks to user space.  Currently the caller must clear this
      field; but we spend more code space checking for this requirement than
      simply zeroing the field ourselves, so let's just make life easier for
      everyone all around.
      Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT · 94d7c16c
      Committed by Akira Fujita
      At the beginning of ext4_move_extents(), we call
      ext4_discard_preallocations() to discard inode PAs of orig and donor
      inodes.  But in the following case, blocks can be double freed, so
      move ext4_discard_preallocations() to the end of ext4_move_extents().
      
      1. Discard inode PAs of orig and donor inodes with
         ext4_discard_preallocations() in ext4_move_extents().
      
         orig : [ DATA1 ]
         donor: [ DATA2 ]
      
      2. While data blocks are being exchanged between orig and donor inodes,
         new inode PAs are created for orig by another process's block
         allocation.  (Since there are semaphore gaps in ext4_move_extents().)
         And the new inode PAs are used partially (2-1).
      
         2-1 Create new inode PAs to orig inode
         orig : [ DATA1 | used PA1 | free PA1 ]
         donor: [ DATA2 ]
      
      3. Donor inode which has old orig inode's blocks is deleted after
         EXT4_IOC_MOVE_EXT finished (3-1, 3-2).  So the block bitmap bits that
         correspond to old orig inode's blocks are freed.
      
         3-1 After EXT4_IOC_MOVE_EXT finished
         orig : [ DATA2 |  free PA1 ]
         donor: [ DATA1 |  used PA1 ]
      
         3-2 Delete donor inode
         orig : [ DATA2 |  free PA1 ]
         donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]
      
      4. The double-free of blocks occurs when close() is called on the
         orig inode, because ext4_discard_preallocations() for orig inode
         frees used PA1 and free PA1, though used PA1 is already freed in 3.

         4-1 Double-free of blocks occurs
         orig : [ DATA2 |  FREE SPACE(free PA1) ]
         donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  21. 23 Nov, 2009 4 commits
    • ext4: fix spelling typos in move_extent.c · 92c28159
      Committed by Akira Fujita
      Fix a few spelling typos in move_extent.c
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.co.jp>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix possible recursive locking warning in EXT4_IOC_MOVE_EXT · 49bd22bc
      Committed by Akira Fujita
      If CONFIG_PROVE_LOCKING is enabled, the double_down_write_data_sem()
      will trigger a false-positive warning of a recursive lock.  Since we
      take i_data_sem for the two inodes ordered by their inode numbers,
      this isn't a problem.  Use of down_write_nested() will notify the lock
      dependency checker machinery that there is no problem here.
      
      This problem was reported by Brian Rogers:
      
      	http://marc.info/?l=linux-ext4&m=125115356928011&w=1
      Reported-by: Brian Rogers <brian@xyzw.org>
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix lock order problem in ext4_move_extents() · fc04cb49
      Committed by Akira Fujita
      ext4_move_extents() checks the logical block contiguousness
      of the original file with ext4_ext_find_extent() and mext_next_extent().
      Therefore the extent which the ext4_ext_path structure indicates
      must not be changed between the above functions.

      But in the current implementation, there is no i_data_sem protection
      between ext4_ext_find_extent() and mext_next_extent().  So the extent
      which the ext4_ext_path structure indicates may be overwritten by
      delalloc.  As a result, ext4_move_extents() will exchange wrong blocks
      between original and donor files.  I changed the place where
      i_data_sem is acquired and released to solve this problem.
      
      Moreover, I changed move_extent_per_page() to start transaction first,
      and then acquire i_data_sem.  Without this change, there is a
      possibility of the deadlock between mmap() and ext4_move_extents():
      
      * NOTE: "A", "B" and "C" mean different processes
      
      A-1: ext4_move_extents() acquires i_data_sem of two inodes.

      B:   do_page_fault() starts the transaction (T),
           and then tries to acquire i_data_sem.
           But process "A" is already holding it, so it is kept waiting.

      C:   While "A" and "B" are running, kjournald2 tries to commit
           transaction (T), but it is still being updated, so kjournald2
           waits for it.

      A-2: Calls ext4_journal_start() while holding i_data_sem,
           but transaction (T) is locked.
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: fix the returned block count if EXT4_IOC_MOVE_EXT fails · f868a48d
      Committed by Akira Fujita
      If the EXT4_IOC_MOVE_EXT ioctl fails, the number of blocks that were
      exchanged before the failure should be returned to the userspace
      caller.  Unfortunately, currently if the block size is not the same as
      the page size, the block count that is returned is the
      page-aligned block count instead of the actual block count.  This
      commit addresses this bug.
      Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
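To see why a page-aligned count can differ from the actual one, consider a file system whose blocks are smaller than a page. The Python sketch below is only an arithmetic illustration under that assumption; `page_aligned_count` is a hypothetical helper and does not reproduce the kernel's calculation.

```python
def page_aligned_count(first_block, moved_blocks, blocks_per_page):
    """Blocks covered by every page that the moved range touches."""
    if moved_blocks == 0:
        return 0
    first_page = first_block // blocks_per_page
    last_page = (first_block + moved_blocks - 1) // blocks_per_page
    return (last_page - first_page + 1) * blocks_per_page


# Assumption: 1 KiB blocks on 4 KiB pages, so 4 blocks per page.
# Six blocks starting at block 2 touch two whole pages (8 blocks).
print(page_aligned_count(2, 6, 4))  # 8, although only 6 blocks moved

# With block size equal to page size the two counts agree.
print(page_aligned_count(2, 6, 1))  # 6
```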
  22. 29 Sep, 2009 2 commits
  23. 17 Sep, 2009 4 commits