1. 12 5月, 2013 1 次提交
  2. 06 5月, 2013 1 次提交
    • L
      ext4: limit group search loop for non-extent files · e6155736
      Lachlan McIlroy 提交于
      In the case where we are allocating for a non-extent file,
      we must limit the groups we allocate from to those below
      2^32 blocks, and ext4_mb_regular_allocator() attempts to
      do this initially by putting a cap on ngroups for the
      subsequent search loop.
      
      However, the initial target group comes in from the 
      allocation context (ac), and it may already be beyond
      the artificially limited ngroups.  In this case,
      the limit
      
      	if (group == ngroups)
      		group = 0;
      
      at the top of the loop is never true, and the loop will
      run away.
      
      Catch this case inside the loop and reset the search to
      start at group 0.
      
      [sandeen@redhat.com: add commit msg & comments]
      Signed-off-by: NLachlan McIlroy <lmcilroy@redhat.com>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      e6155736
  3. 03 5月, 2013 1 次提交
    • Y
      ext4: fix fio regression · e30b5dca
      Yan, Zheng 提交于
      We (Linux Kernel Performance project) found a regression introduced
      by commit:
      
        f7fec032 ext4: track all extent status in extent status tree
      
      The commit causes about 20% performance decrease in fio random write
      test. Profiler shows that rb_next() uses a lot of CPU time. The call
      stack is:
      
        rb_next
        ext4_es_find_delayed_extent
        ext4_map_blocks
        _ext4_get_block
        ext4_get_block_write
        __blockdev_direct_IO
        ext4_direct_IO
        generic_file_direct_write
        __generic_file_aio_write
        ext4_file_write
        aio_rw_vect_retry
        aio_run_iocb
        do_io_submit
        sys_io_submit
        system_call_fastpath
        io_submit
        td_io_getevents
        io_u_queued_complete
        thread_main
        main
        __libc_start_main
      
      The cause is that ext4_es_find_delayed_extent() doesn't have an
      upper bound, it keeps searching until a delayed extent is found.
      When there are a lots of non-delayed entries in the extent state
      tree, ext4_es_find_delayed_extent() may uses a lot of CPU time.
      Reported-by: NLKP project <lkp@linux.intel.com>
      Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      e30b5dca
  4. 23 4月, 2013 1 次提交
  5. 22 4月, 2013 5 次提交
  6. 21 4月, 2013 2 次提交
  7. 20 4月, 2013 5 次提交
    • T
      ext4: fix readdir error in case inline_data+^dir_index. · c4d8b023
      Tao Ma 提交于
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..'. And a
      getdents will fail if the user only want to get '.'. And what's
      worse, we may meet with duplicate dir entries as the offset
      for inline dir and non-inline one is quite different.
      
      This patch just try to resolve this problem if dir_index
      is disabled. In this case, f_pos is the real offset with
      the dir block, so for inline dir, we just pretend as if
      we are a dir block and returns the offset like a norml
      dir block does.
      Reported-by: NZach Brown <zab@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c4d8b023
    • T
      ext4: fix readdir error in the case of inline_data+dir_index · 8af0f082
      Tao Ma 提交于
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..'. And a
      getdents will fail if the user only want to get '.' and what's worse,
      if there is a conversion happens when the user calls getdents
      many times, he/she may get the same entry twice.
      
      In theory, a dir block would also fail if it is converted to a
      hashed-index based dir since f_pos will become a hash value, not the
      real one, but it doesn't happen.  And a deep investigation shows that
      we uses a hash based solution even for a normal dir if the dir_index
      feature is enabled.
      
      So this patch just adds a new htree_inlinedir_to_tree for inline dir,
      and if we find that the hash index is supported, we will do like what
      we do for a dir block.
      Reported-by: NZach Brown <zab@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      8af0f082
    • Z
      jbd2: use kmem_cache_zalloc instead of kmem_cache_alloc/memset · 28daf4fa
      Zheng Liu 提交于
      The jbd2_alloc_handle() function is only called by new_handle().  So
      this commit uses kmem_cache_zalloc() instead of
      kmem_cache_alloc()/memset().
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      28daf4fa
    • D
    • J
      ext4: move quota initialization out of inode allocation transaction · eb9cc7e1
      Jan Kara 提交于
      Inode allocation transaction is pretty heavy (246 credits with quotas
      and extents before previous patch, still around 200 after it).  This is
      mostly due to credits required for allocation of quota structures
      (credits there are heavily overestimated but it's difficult to make
      better estimates if we don't want to wire non-trivial assumptions about
      quota format into filesystem).
      
      So move quota initialization out of allocation transaction. That way
      transaction for quota structure allocation will be started only if we
      need to look up quota structure on disk (rare) and furthermore it will
      be started for each quota type separately, not for all of them at once.
      This reduces maximum transaction size to 34 is most cases and to 73 in
      the worst case.
      
      [ Modified by tytso to clean up the cleanup paths for error handling.
        Also use a separate call to ext4_std_error() for each failure so it
        is easier for someone who is debugging a problem in this function to
        determine which function call failed. ]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      eb9cc7e1
  8. 19 4月, 2013 1 次提交
  9. 12 4月, 2013 6 次提交
  10. 11 4月, 2013 2 次提交
  11. 10 4月, 2013 7 次提交
  12. 09 4月, 2013 4 次提交
    • E
      ext4: fix free space estimate in ext4_nonda_switch() · 5c1ff336
      Eric Whitney 提交于
      Values stored in s_freeclusters_counter and s_dirtyclusters_counter
      are both in cluster units.  Remove the cluster to block conversion
      applied to s_freeclusters_counter causing an inflated estimate of
      free space because s_dirtyclusters_counter is not similarly
      converted.  Rename free_blocks and dirty_blocks to better reflect
      the units these variables contain to avoid future confusion.  This
      fix corrects ENOSPC failures for xfstests 127 and 231 on bigalloc
      file systems.
      Signed-off-by: NEric Whitney <enwlinux@gmail.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      5c1ff336
    • J
      ext4: fix deadlock with quota feature · bcb13850
      Jan Kara 提交于
      We didn't mark hidden quota files with S_NOQUOTA flag and thus quota was
      accounted even for quota files. Thus we could recurse back to quota code
      when adding new blocks to quota file which can easily deadlock. Mark
      hidden quota files properly.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      bcb13850
    • D
      ext4: fix incorrect lock ordering for ext4_ind_migrate · e8238f9a
      Dmitry Monakhov 提交于
      existing locking ordering: journal-> i_data_sem, but
      ext4_ind_migrate() grab locks in opposite order which may result in
      deadlock.
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      e8238f9a
    • D
      ext4: implementation of a new ioctl called EXT4_IOC_SWAP_BOOT · 393d1d1d
      Dr. Tilmann Bubeck 提交于
      Add a new ioctl, EXT4_IOC_SWAP_BOOT which swaps i_blocks and
      associated attributes (like i_blocks, i_size, i_flags, ...) from the
      specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
      typically used to store a boot loader in a secure part of the
      filesystem, where it can't be changed by a normal user by accident.
      The data blocks of the previous boot loader will be associated with
      the given inode.
      
      This usercode program is a simple example of the usage:
      
      int main(int argc, char *argv[])
      {
        int fd;
        int err;
      
        if ( argc != 2 ) {
          printf("usage: ext4-swap-boot-inode FILE-TO-SWAP\n");
          exit(1);
        }
      
        fd = open(argv[1], O_WRONLY);
        if ( fd < 0 ) {
          perror("open");
          exit(1);
        }
      
        err = ioctl(fd, EXT4_IOC_SWAP_BOOT);
        if ( err < 0 ) {
          perror("ioctl");
          exit(1);
        }
      
        close(fd);
        exit(0);
      }
      
      [ Modified by Theodore Ts'o to fix a number of bugs in the original code.]
      Signed-off-by: NDr. Tilmann Bubeck <t.bubeck@reinform.de>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      393d1d1d
  13. 04 4月, 2013 4 次提交
    • L
      ext4: print more info in ext4_print_free_blocks() · f78ee70d
      Lukas Czerner 提交于
      Additionally print i_allocated_meta_blocks information as well.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      f78ee70d
    • L
      ext4: try to prepend extent to the existing one · be8981be
      Lukas Czerner 提交于
      Currently when inserting extent in ext4_ext_insert_extent() we would
      only try to to see if we can append new extent to the found extent. If
      we can not, then we proceed with adding new extent into the extent tree,
      but then possibly merging it back again.
      
      We can avoid this situation by trying to append and prepend new extent
      to the existing ones. However since the new extent can be on either
      sides of the existing extent, we have to pick the right extent to try to
      append/prepend to.
      
      This patch adds the conditions to pick the right extent to
      append/prepend to and adds the actual prepending condition as well. This
      will also eliminate the need to use "reserved" block for possibly
      growing extent tree.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      be8981be
    • L
      ext4: Transfer initialized block to right neighbor if possible · bc2d9db4
      Lukas Czerner 提交于
      Currently when converting extent to initialized we attempt to transfer
      initialized block to the left neighbour if possible when certain
      criteria are met. However we do not attempt to do the same for the
      right neighbor.
      
      This commit adds the possibility to transfer initialized block to the
      right neighbour if:
      
      1. We're not converting the whole extent
      2. Both extents are stored in the same extent tree node
      3. Right neighbor is initialized
      4. Right neighbor is logically abutting the current one
      5. Right neighbor is physically abutting the current one
      6. Right neighbor would not overflow the length limit
      
      This is basically the same logic as with transferring to the left. This
      will gain us some performance benefits since it is faster than inserting
      extent and then merging it.
      
      It would also prevent some situation in delalloc patch when we might run
      out of metadata reservation. This is due to the fact that we would
      attempt to split the extent first (possibly allocating new metadata
      block) even though we did not counted for that because it can (and will)
      be merged again. This commit fix that scenario, because we no longer
      need to split the extent in such case.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      bc2d9db4
    • L
      ext4: introduce ext4_get_group_number() · bd86298e
      Lukas Czerner 提交于
      Currently on many places in ext4 we're using
      ext4_get_group_no_and_offset() even though we're only interested in
      knowing the block group of the particular block, not the offset within
      the block group so we can use more efficient way to compute block
      group.
      
      This patch introduces ext4_get_group_number() which computes block
      group for a given block much more efficiently. Use this function
      instead of ext4_get_group_no_and_offset() everywhere where we're only
      interested in knowing the block group.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      bd86298e