1. 02 9月, 2014 3 次提交
  2. 13 7月, 2014 1 次提交
    • T
      ext4: fix a potential deadlock in __ext4_es_shrink() · 3f1f9b85
      Theodore Ts'o 提交于
      This fixes the following lockdep complaint:
      
      [ INFO: possible circular locking dependency detected ]
      3.16.0-rc2-mm1+ #7 Tainted: G           O  
      -------------------------------------------------------
      kworker/u24:0/4356 is trying to acquire lock:
       (&(&sbi->s_es_lru_lock)->rlock){+.+.-.}, at: [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0
      
      but task is already holding lock:
       (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180
      
      which lock already depends on the new lock.
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&ei->i_es_lock);
                                     lock(&(&sbi->s_es_lru_lock)->rlock);
                                     lock(&ei->i_es_lock);
        lock(&(&sbi->s_es_lru_lock)->rlock);
      
       *** DEADLOCK ***
      
      6 locks held by kworker/u24:0/4356:
       #0:  ("writeback"){.+.+.+}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560
       #1:  ((&(&wb->dwork)->work)){+.+.+.}, at: [<ffffffff81071d00>] process_one_work+0x180/0x560
       #2:  (&type->s_umount_key#22){++++++}, at: [<ffffffff811a9c74>] grab_super_passive+0x44/0x90
       #3:  (jbd2_handle){+.+...}, at: [<ffffffff812979f9>] start_this_handle+0x189/0x5f0
       #4:  (&ei->i_data_sem){++++..}, at: [<ffffffff81247062>] ext4_map_blocks+0x132/0x550
       #5:  (&ei->i_es_lock){++++-.}, at: [<ffffffff81286961>] ext4_es_insert_extent+0x71/0x180
      
      stack backtrace:
      CPU: 0 PID: 4356 Comm: kworker/u24:0 Tainted: G           O   3.16.0-rc2-mm1+ #7
      Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      Workqueue: writeback bdi_writeback_workfn (flush-253:0)
       ffffffff8213dce0 ffff880014b07538 ffffffff815df0bb 0000000000000007
       ffffffff8213e040 ffff880014b07588 ffffffff815db3dd ffff880014b07568
       ffff880014b07610 ffff88003b868930 ffff88003b868908 ffff88003b868930
      Call Trace:
       [<ffffffff815df0bb>] dump_stack+0x4e/0x68
       [<ffffffff815db3dd>] print_circular_bug+0x1fb/0x20c
       [<ffffffff810a7a3e>] __lock_acquire+0x163e/0x1d00
       [<ffffffff815e89dc>] ? retint_restore_args+0xe/0xe
       [<ffffffff815ddc7b>] ? __slab_alloc+0x4a8/0x4ce
       [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
       [<ffffffff810a8707>] lock_acquire+0x87/0x120
       [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
       [<ffffffff8128592d>] ? ext4_es_free_extent+0x5d/0x70
       [<ffffffff815e6f09>] _raw_spin_lock+0x39/0x50
       [<ffffffff81285fff>] ? __ext4_es_shrink+0x4f/0x2e0
       [<ffffffff8119760b>] ? kmem_cache_alloc+0x18b/0x1a0
       [<ffffffff81285fff>] __ext4_es_shrink+0x4f/0x2e0
       [<ffffffff812869b8>] ext4_es_insert_extent+0xc8/0x180
       [<ffffffff812470f4>] ext4_map_blocks+0x1c4/0x550
       [<ffffffff8124c4c4>] ext4_writepages+0x6d4/0xd00
      	...
      Reported-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reported-by: NMinchan Kim <minchan@kernel.org>
      Cc: stable@vger.kernel.org
      Cc: Zheng Liu <gnehzuil.liu@gmail.com>
      3f1f9b85
  3. 13 5月, 2014 1 次提交
  4. 21 4月, 2014 1 次提交
    • L
      ext4: rename uninitialized extents to unwritten · 556615dc
      Lukas Czerner 提交于
      Currently in ext4 there is quite a mess when it comes to naming
      unwritten extents. Sometimes we call it uninitialized and sometimes we
      refer to it as unwritten.
      
      The right name for the extent which has been allocated but does not
      contain any written data is _unwritten_. Other file systems are
      using this name consistently, even the buffer head state refers to it as
      unwritten. We need to fix this confusion in ext4.
      
      This commit changes every reference to an uninitialized extent (meaning
      allocated but unwritten) to unwritten extent. This includes comments,
      function names and variable names. It even covers abbreviation of the
      word uninitialized (such as uninit) and some misspellings.
      
      This commit does not change any of the code paths at all. This has been
      confirmed by comparing md5sums of the assembly code of each object file
      after all the function names were stripped from it.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      556615dc
  5. 07 4月, 2014 1 次提交
  6. 21 2月, 2014 1 次提交
  7. 20 2月, 2014 1 次提交
  8. 11 9月, 2013 1 次提交
    • D
      fs: convert fs shrinkers to new scan/count API · 1ab6c499
      Dave Chinner 提交于
      Convert the filesystem shrinkers to use the new API, and standardise some
      of the behaviours of the shrinkers at the same time.  For example,
      nr_to_scan means the number of objects to scan, not the number of objects
      to free.
      
      I refactored the CIFS idmap shrinker a little - it really needs to be
      broken up into a shrinker per tree and keep an item count with the tree
      root so that we don't need to walk the tree every time the shrinker needs
      to count the number of objects in the tree (i.e.  all the time under
      memory pressure).
      
      [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
      [assorted fixes folded in]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1ab6c499
  9. 29 8月, 2013 1 次提交
    • Z
      ext4: isolate ext4_extents.h file · d7b2a00c
      Zheng Liu 提交于
      After applied the commit (4a092d73), we have reduced the number of
      source files that need to #include ext4_extents.h.  But we can do
      better.
      
      This commit defines ext4_zeroout_es() in extents.c and move
      EXT_MAX_BLOCKS into ext4.h in order not to include ext4_extents.h in
      indirect.c and ioctl.c.  Meanwhile we just need to include this file in
      extent_status.c when ES_AGGRESSIVE_TEST is defined.  Otherwise, this
      commit removes a duplicated declaration in trace/events/ext4.h.
      
      After applied this patch, we just need to include ext4_extents.h file
      in {super,migrate,move_extents,extents}.c, and it is easy for us to
      define a new extent disk layout.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d7b2a00c
  10. 17 8月, 2013 3 次提交
    • T
      ext4: add support for extent pre-caching · 7869a4a6
      Theodore Ts'o 提交于
      Add a new fiemap flag which forces the all of the extents in an inode
      to be cached in the extent_status tree.  This is critically important
      when using AIO to a preallocated file, since if we need to read in
      blocks from the extent tree, the io_submit(2) system call becomes
      synchronous, and the AIO is no longer "A", which is bad.
      
      In addition, for most files which have an external leaf tree block,
      the cost of caching the information in the extent status tree will be
      less than caching the entire 4k block in the buffer cache.  So it is
      generally a win to keep the extent information cached.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      7869a4a6
    • T
      ext4: cache all of an extent tree's leaf block upon reading · 107a7bd3
      Theodore Ts'o 提交于
      When we read in an extent tree leaf block from disk, arrange to have
      all of its entries cached.  In nearly all cases the in-memory
      representation will be more compact than the on-disk representation in
      the buffer cache, and it allows us to get the information without
      having to traverse the extent tree for successive extents.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      107a7bd3
    • T
      ext4: use unsigned int for es_status values · 3be78c73
      Theodore Ts'o 提交于
      Don't use an unsigned long long for the es_status flags; this requires
      that we pass 64-bit values around which is painful on 32-bit systems.
      Instead pass the extent status flags around using the low 4 bits of an
      unsigned int, and shift them into place when we are reading or writing
      es_pblk.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      3be78c73
  11. 15 7月, 2013 1 次提交
    • T
      ext4: make the extent_status code more robust against ENOMEM failures · e15f742c
      Theodore Ts'o 提交于
      Some callers of ext4_es_remove_extent() and ext4_es_insert_extent()
      may not be completely robust against ENOMEM failures (or the
      consequences of reflecting ENOMEM back up to userspace may lead to
      xfstest or user application failure).
      
      To mitigate against this, when trying to insert an entry in the extent
      status tree, try to shrink the inode's extent status tree before
      returning ENOMEM.  If there are entries which don't record information
      about extents under delayed allocations, freeing one of them is
      preferable to returning ENOMEM.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      e15f742c
  12. 13 7月, 2013 1 次提交
  13. 01 7月, 2013 1 次提交
    • Z
      ext4: improve extent cache shrink mechanism to avoid to burn CPU time · d3922a77
      Zheng Liu 提交于
      Now we maintain an proper in-order LRU list in ext4 to reclaim entries
      from extent status tree when we are under heavy memory pressure.  For
      keeping this order, a spin lock is used to protect this list.  But this
      lock burns a lot of CPU time.  We can use the following steps to trigger
      it.
      
        % cd /dev/shm
        % dd if=/dev/zero of=ext4-img bs=1M count=2k
        % mkfs.ext4 ext4-img
        % mount -t ext4 -o loop ext4-img /mnt
        % cd /mnt
        % for ((i=0;i<160;i++)); do truncate -s 64g $i; done
        % for ((i=0;i<160;i++)); do cp $i /dev/null &; done
        % perf record -a -g
        % perf report
      
      This commit tries to fix this problem.  Now a new member called
      i_touch_when is added into ext4_inode_info to record the last access
      time for an inode.  Meanwhile we never need to keep a proper in-order
      LRU list.  So this can avoid to burns some CPU time.  When we try to
      reclaim some entries from extent status tree, we use list_sort() to get
      a proper in-order list.  Then we traverse this list to discard some
      entries.  In ext4_sb_info, we use s_es_last_sorted to record the last
      time of sorting this list.  When we traverse the list, we skip the inode
      that is newer than this time, and move this inode to the tail of LRU
      list.  When the head of the list is newer than s_es_last_sorted, we will
      sort the LRU list again.
      
      In this commit, we break the loop if s_extent_cache_cnt == 0 because
      that means that all extents in extent status tree have been reclaimed.
      
      Meanwhile in this commit, ext4_es_{un}register_shrinker()'s prototype is
      changed to save a local variable in these functions.
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      d3922a77
  14. 03 5月, 2013 1 次提交
    • Y
      ext4: fix fio regression · e30b5dca
      Yan, Zheng 提交于
      We (Linux Kernel Performance project) found a regression introduced
      by commit:
      
        f7fec032 ext4: track all extent status in extent status tree
      
      The commit causes about 20% performance decrease in fio random write
      test. Profiler shows that rb_next() uses a lot of CPU time. The call
      stack is:
      
        rb_next
        ext4_es_find_delayed_extent
        ext4_map_blocks
        _ext4_get_block
        ext4_get_block_write
        __blockdev_direct_IO
        ext4_direct_IO
        generic_file_direct_write
        __generic_file_aio_write
        ext4_file_write
        aio_rw_vect_retry
        aio_run_iocb
        do_io_submit
        sys_io_submit
        system_call_fastpath
        io_submit
        td_io_getevents
        io_u_queued_complete
        thread_main
        main
        __libc_start_main
      
      The cause is that ext4_es_find_delayed_extent() doesn't have an
      upper bound, it keeps searching until a delayed extent is found.
      When there are a lots of non-delayed entries in the extent state
      tree, ext4_es_find_delayed_extent() may uses a lot of CPU time.
      Reported-by: NLKP project <lkp@linux.intel.com>
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      e30b5dca
  15. 11 3月, 2013 3 次提交
  16. 02 3月, 2013 1 次提交
  17. 01 3月, 2013 1 次提交
    • T
      ext4: optimize ext4_es_shrink() · 24630774
      Theodore Ts'o 提交于
      When the system is under memory pressure, ext4_es_srhink() will get
      called very often.  So optimize returning the number of items in the
      file system's extent status cache by keeping a per-filesystem count,
      instead of calculating it each time by scanning all of the inodes in
      the extent status cache.
      
      Also rename the slab used for the extent status cache to be
      "ext4_extent_status" so it's obviousl the slab in question is created
      by ext4.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Zheng Liu <gnehzuil.liu@gmail.com>
      24630774
  18. 23 2月, 2013 1 次提交
  19. 18 2月, 2013 6 次提交
    • Z
      ext4: reclaim extents from extent status tree · 74cd15cd
      Zheng Liu 提交于
      Although extent status is loaded on-demand, we also need to reclaim
      extent from the tree when we are under a heavy memory pressure because
      in some cases fragmented extent tree causes status tree costs too much
      memory.
      
      Here we maintain a lru list in super_block.  When the extent status of
      an inode is accessed and changed, this inode will be move to the tail
      of the list.  The inode will be dropped from this list when it is
      cleared.  In the inode, a counter is added to count the number of
      cached objects in extent status tree.  Here only written/unwritten/hole
      extent is counted because delayed extent doesn't be reclaimed due to
      fiemap, bigalloc and seek_data/hole need it.  The counter will be
      increased as a new extent is allocated, and it will be decreased as a
      extent is freed.
      
      In this commit we use normal shrinker framework to reclaim memory from
      the status tree.  ext4_es_reclaim_extents_count() traverses the lru list
      to count the number of reclaimable extents.  ext4_es_shrink() tries to
      reclaim written/unwritten/hole extents from extent status tree.  The
      inode that has been shrunk is moved to the tail of lru list.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan kara <jack@suse.cz>
      74cd15cd
    • Z
      ext4: adjust some functions for reclaiming extents from extent status tree · bdedbb7b
      Zheng Liu 提交于
      This commit changes some interfaces in extent status tree because we
      need to use inode to count the cached objects in a extent status tree.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan kara <jack@suse.cz>
      bdedbb7b
    • Z
      ext4: lookup block mapping in extent status tree · d100eef2
      Zheng Liu 提交于
      After tracking all extent status, we already have a extent cache in
      memory.  Every time we want to lookup a block mapping, we can first
      try to lookup it in extent status tree to avoid a potential disk I/O.
      
      A new function called ext4_es_lookup_extent is defined to finish this
      work.  When we try to lookup a block mapping, we always call
      ext4_map_blocks and/or ext4_da_map_blocks.  So in these functions we
      first try to lookup a block mapping in extent status tree.
      
      A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks
      in order not to put a hole into extent status tree because this hole
      will be converted to delayed extent in the tree immediately.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan kara <jack@suse.cz>
      d100eef2
    • Z
      ext4: rename and improbe ext4_es_find_extent() · be401363
      Zheng Liu 提交于
      This commit renames ext4_es_find_extent with ext4_es_find_delayed_extent
      and improve this function.  First, we split input and output parameter.
      Second, this function never return the first block of the next delayed
      extent after 'es'.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: Jan kara <jack@suse.cz>
      be401363
    • Z
      ext4: add physical block and status member into extent status tree · fdc0212e
      Zheng Liu 提交于
      This commit adds two members in extent_status structure to let it record
      physical block and extent status.  Here es_pblk is used to record both
      of them because physical block only has 48 bits.  So extent status could
      be stashed into it so that we can save some memory.  Now written,
      unwritten, delayed and hole are defined as status.
      
      Due to new member is added into extent status tree, all interfaces need
      to be adjusted.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      fdc0212e
    • Z
      ext4: refine extent status tree · 06b0c886
      Zheng Liu 提交于
      This commit refines the extent status tree code.
      
      1) A prefix 'es_' is added to to the extent status tree structure
      members.
      
      2) Refactored es_remove_extent() so that __es_remove_extent() can be
      used by es_insert_extent() to remove the old extent entry(-ies) before
      inserting a new one.
      
      3) Rename extent_status_end() to ext4_es_end()
      
      4) ext4_es_can_be_merged() is define to check whether two extents can
      be merged or not.
      
      5) Update and clarified comments.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      06b0c886
  20. 09 11月, 2012 2 次提交