1. 23 2月, 2016 4 次提交
    • A
      mbcache: add reusable flag to cache entries · 6048c64b
      Andreas Gruenbacher 提交于
      To reduce amount of damage caused by single bad block, we limit number
      of inodes sharing an xattr block to 1024. Thus there can be more xattr
      blocks with the same contents when there are lots of files with the same
      extended attributes. These xattr blocks naturally result in hash
      collisions and can form long hash chains and we unnecessarily check each
      such block only to find out we cannot use it because it is already
      shared by too many inodes.
      
      Add a reusable flag to cache entries which is cleared when a cache entry
      has reached its maximum refcount.  Cache entries which are not marked
      reusable are skipped by mb_cache_entry_find_{first,next}. This
      significantly speeds up mbcache when there are many same xattr blocks.
      For example for xattr-bench with 5 values and each process handling
      20000 files, the run for 64 processes is 25x faster with this patch.
      Even for 8 processes the speedup is almost 3x. We have also verified
      that for situations where there is only one xattr block of each kind,
      the patch doesn't have a measurable cost.
      
      [JK: Remove handling of setting the same value since it is not needed
      anymore, check for races in e_reusable setting, improve changelog,
      add measurements]
      Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      6048c64b
    • J
      ext4: shortcut setting of xattr to the same value · 3fd16462
      Jan Kara 提交于
      When someone tried to set xattr to the same value (i.e., not changing
      anything) we did all the work of removing original xattr, possibly
      breaking references to shared xattr block, inserting new xattr, and
      merging xattr blocks again. Since this is not so rare operation and it
      is relatively cheap for us to detect this case, check for this and
      shortcut xattr setting in that case.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      3fd16462
    • J
      mbcache2: rename to mbcache · 7a2508e1
      Jan Kara 提交于
      Since old mbcache code is gone, let's rename new code to mbcache since
      number 2 is now meaningless. This is just a mechanical replacement.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      7a2508e1
    • J
      ext4: convert to mbcache2 · 82939d79
      Jan Kara 提交于
      The conversion is generally straightforward. The only tricky part is
      that xattr block corresponding to found mbcache entry can get freed
      before we get buffer lock for that block. So we have to check whether
      the entry is still valid after getting buffer lock.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      82939d79
  2. 07 1月, 2016 1 次提交
  3. 14 12月, 2015 1 次提交
  4. 14 11月, 2015 1 次提交
  5. 18 10月, 2015 2 次提交
  6. 16 4月, 2015 1 次提交
  7. 03 4月, 2015 2 次提交
  8. 13 10月, 2014 1 次提交
  9. 17 9月, 2014 1 次提交
    • D
      ext4: check EA value offset when loading · a0626e75
      Darrick J. Wong 提交于
      When loading extended attributes, check each entry's value offset to
      make sure it doesn't collide with the entries.
      
      Without this check it is easy to crash the kernel by mounting a
      malicious FS containing a file with an EA wherein e_value_offs = 0 and
      e_value_size > 0 and then deleting the EA, which corrupts the name
      list.
      
      (See the f_ea_value_crash test's FS image in e2fsprogs for an example.)
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      a0626e75
  10. 05 9月, 2014 1 次提交
    • T
      ext4: prepare to drop EXT4_STATE_DELALLOC_RESERVED · e3cf5d5d
      Theodore Ts'o 提交于
      The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented
      because it was too hard to make sure the mballoc and get_block flags
      could be reliably passed down through all of the codepaths that end up
      calling ext4_mb_new_blocks().
      
      Since then, we have mb_flags passed down through most of the code
      paths, so getting rid of EXT4_STATE_DELALLOC_RESERVED isn't as tricky
      as it used to.
      
      This commit plumbs in the last of what is required, and then adds a
      WARN_ON check to make sure we haven't missed anything.  If this passes
      a full regression test run, we can then drop
      EXT4_STATE_DELALLOC_RESERVED.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      e3cf5d5d
  11. 13 5月, 2014 2 次提交
  12. 12 5月, 2014 1 次提交
  13. 07 4月, 2014 1 次提交
    • J
      ext4: fix jbd2 warning under heavy xattr load · ec4cb1aa
      Jan Kara 提交于
      When heavily exercising xattr code the assertion that
      jbd2_journal_dirty_metadata() shouldn't return error was triggered:
      
      WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237
      jbd2_journal_dirty_metadata+0x1ba/0x260()
      
      CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G    W 3.10.0-ceph-00049-g68d04c9 #1
      Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
       ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
       ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
       0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
      Call Trace:
       [<ffffffff816311b0>] dump_stack+0x19/0x1b
       [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
       [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
       [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
       [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
       [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
       [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
       [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
       [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
       [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
       [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
       [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
       [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
       [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
       [<ffffffff8119421e>] setxattr+0x13e/0x1e0
       [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
       [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
       [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
       [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
       [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
       [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
       [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
      
      The reason for the warning is that buffer_head passed into
      jbd2_journal_dirty_metadata() didn't have journal_head attached. This is
      caused by the following race of two ext4_xattr_release_block() calls:
      
      CPU1                                CPU2
      ext4_xattr_release_block()          ext4_xattr_release_block()
      lock_buffer(bh);
      /* False */
      if (BHDR(bh)->h_refcount == cpu_to_le32(1))
      } else {
        le32_add_cpu(&BHDR(bh)->h_refcount, -1);
        unlock_buffer(bh);
                                          lock_buffer(bh);
                                          /* True */
                                          if (BHDR(bh)->h_refcount == cpu_to_le32(1))
                                            get_bh(bh);
                                            ext4_free_blocks()
                                              ...
                                              jbd2_journal_forget()
                                                jbd2_journal_unfile_buffer()
                                                -> JH is gone
        error = ext4_handle_dirty_xattr_block(handle, inode, bh);
        -> triggers the warning
      
      We fix the problem by moving ext4_handle_dirty_xattr_block() under the
      buffer lock. Sadly this cannot be done in nojournal mode as that
      function can call sync_dirty_buffer() which would deadlock. Luckily in
      nojournal mode the race is harmless (we only dirty already freed buffer)
      and thus for nojournal mode we leave the dirtying outside of the buffer
      lock.
      Reported-by: NSage Weil <sage@inktank.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      ec4cb1aa
  14. 19 3月, 2014 1 次提交
    • T
      ext4: each filesystem creates and uses its own mb_cache · 9c191f70
      T Makphaibulchoke 提交于
      This patch adds new interfaces to create and destory cache,
      ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and remove
      the cache creation and destory calls from ex4_init_xattr() and
      ext4_exitxattr() in fs/ext4/xattr.c.
      
      fs/ext4/super.c has been changed so that when a filesystem is mounted
      a cache is allocated and attched to its ext4_sb_info structure.
      
      fs/mbcache.c has been changed so that only one slab allocator is
      allocated and used by all mbcache structures.
      Signed-off-by: NT. Makphaibulchoke <tmac@hp.com>
      9c191f70
  15. 20 2月, 2014 1 次提交
  16. 26 1月, 2014 1 次提交
  17. 01 11月, 2013 1 次提交
  18. 13 10月, 2013 1 次提交
    • D
      ext4: fix memory leak in xattr · 6e4ea8e3
      Dave Jones 提交于
      If we take the 2nd retry path in ext4_expand_extra_isize_ea, we
      potentionally return from the function without having freed these
      allocations.  If we don't do the return, we over-write the previous
      allocation pointers, so we leak either way.
      
      Spotted with Coverity.
      
      [ Fixed by tytso to set is and bs to NULL after freeing these
        pointers, in case in the retry loop we later end up triggering an
        error causing a jump to cleanup, at which point we could have a double
        free bug. -- Ted ]
      Signed-off-by: NDave Jones <davej@fedoraproject.org>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Cc: stable@vger.kernel.org
      6e4ea8e3
  19. 10 4月, 2013 1 次提交
  20. 19 2月, 2013 1 次提交
    • L
      ext4: fix xattr block allocation/release with bigalloc · 1231b3a1
      Lukas Czerner 提交于
      Currently when new xattr block is created or released we we would call
      dquot_free_block() or dquot_alloc_block() respectively, among the else
      decrementing or incrementing the number of blocks assigned to the
      inode by one block.
      
      This however does not work for bigalloc file system because we always
      allocate/free the whole cluster so we have to count with that in
      dquot_free_block() and dquot_alloc_block() as well.
      
      Use the clusters-to-blocks conversion EXT4_C2B() when passing number of
      blocks to the dquot_alloc/free functions to fix the problem.
      
      The problem has been revealed by xfstests #117 (and possibly others).
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Cc: stable@vger.kernel.org
      1231b3a1
  21. 10 2月, 2013 1 次提交
    • T
      ext4: fix the number of credits needed for acl ops with inline data · 95eaefbd
      Theodore Ts'o 提交于
      Operations which modify extended attributes may need extra journal
      credits if inline data is used, since there is a chance that some
      extended attributes may need to get pushed to an external attribute
      block.
      
      Changes to reflect this was made in xattr.c, but they were missed in
      fs/ext4/acl.c.  To fix this, abstract the calculation of the number of
      credits needed for xattr operations to an inline function defined in
      ext4_jbd2.h, and use it in acl.c and xattr.c.
      
      Also move the function declarations used in inline.c from xattr.h
      (where they are non-obviously hidden, and caused problems since
      ext4_jbd2.h needs to use the function ext4_has_inline_data), and move
      them to ext4.h.
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Reviewed-by: NTao Ma <boyu.mt@taobao.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      95eaefbd
  22. 09 2月, 2013 1 次提交
    • T
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o 提交于
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9924a92a
  23. 13 1月, 2013 2 次提交
  24. 11 12月, 2012 2 次提交
  25. 05 12月, 2012 1 次提交
  26. 09 11月, 2012 1 次提交
  27. 10 7月, 2012 1 次提交
    • T
      ext4: use s_csum_seed instead of i_csum_seed for xattr block · 41eb70dd
      Tao Ma 提交于
      In xattr block operation, we use h_refcount to indicate whether the
      xattr block is shared among many inodes. And xattr block csum uses
      s_csum_seed if it is shared and i_csum_seed if it belongs to
      one inode. But this has a problem. So consider the block is shared
      first bewteen inode A and B, and B has some xattr update and CoW
      the xattr block. When it updates the *old* xattr block(because
      of the h_refcount change) and calls ext4_xattr_release_block, we
      has no idea that inode A is the real owner of the *old* xattr
      block and we can't use the i_csum_seed of inode A either in xattr
      block csum calculation. And I don't think we have an easy way to
      find inode A.
      
      So this patch just removes the tricky i_csum_seed and we now uses
      s_csum_seed every time for the xattr block csum. The corresponding
      patch for the e2fsprogs will be sent in another patch.
      
      This is spotted by xfstests 117.
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Acked-by: NDarrick J. Wong <djwong@us.ibm.com>
      41eb70dd
  28. 30 4月, 2012 1 次提交
  29. 20 3月, 2012 1 次提交
  30. 21 2月, 2012 2 次提交
    • E
      ext4: avoid deadlock on sync-mounted FS w/o journal · c1bb05a6
      Eric Sandeen 提交于
      Processes hang forever on a sync-mounted ext2 file system that
      is mounted with the ext4 module (default in Fedora 16).
      
      I can reproduce this reliably by mounting an ext2 partition with
      "-o sync" and opening a new file an that partition with vim. vim
      will hang in "D" state forever.  The same happens on ext4 without
      a journal.
      
      I am attaching a small patch here that solves this issue for me.
      In the sync mounted case without a journal,
      ext4_handle_dirty_metadata() may call sync_dirty_buffer(), which
      can't be called with buffer lock held.
      
      Also move mb_cache_entry_release inside lock to avoid race
      fixed previously by 8a2bfdcb ext[34]: EA block reference count racing fix
      Note too that ext2 fixed this same problem in 2006 with
      b2f49033 [PATCH] fix deadlock in ext2
      
      Signed-off-by: Martin.Wilck@ts.fujitsu.com
      [sandeen@redhat.com: move mb_cache_entry_release before unlock, edit commit msg]
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c1bb05a6
    • Z
      ext4: remove unneeded variable in ext4_xattr_check_block() · f1b3a2a7
      Zheng Liu 提交于
      We could return directly from ext4_xattr_check_block(). Thus, we
      shouldn't need to define a 'error' variable.
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      f1b3a2a7
  31. 29 10月, 2011 1 次提交
    • E
      ext4: fix race in xattr block allocation path · 6d6a4351
      Eric Sandeen 提交于
      Ceph users reported that when using Ceph on ext4, the filesystem
      would often become corrupted, containing inodes with incorrect
      i_blocks counters.
      
      I managed to reproduce this with a very hacked-up "streamtest"
      binary from the Ceph tree.
      
      Ceph is doing a lot of xattr writes, to out-of-inode blocks.
      There is also another thread which does sync_file_range and close,
      of the same files.  The problem appears to happen due to this race:
      
      sync/flush thread               xattr-set thread
      -----------------               ----------------
      
      do_writepages                   ext4_xattr_set
      ext4_da_writepages              ext4_xattr_set_handle
      mpage_da_map_blocks             ext4_xattr_block_set
              set DELALLOC_RESERVE
                                      ext4_new_meta_blocks
                                              ext4_mb_new_blocks
                                                      if (!i_delalloc_reserved_flag)
                                                              vfs_dq_alloc_block
      ext4_get_blocks
      	down_write(i_data_sem)
              set i_delalloc_reserved_flag
      	...
      	up_write(i_data_sem)
                                              if (i_delalloc_reserved_flag)
                                                      vfs_dq_alloc_block_nofail
      
      
      In other words, the sync/flush thread pops in and sets
      i_delalloc_reserved_flag on the inode, which makes the xattr thread
      think that it's in a delalloc path in ext4_new_meta_blocks(),
      and add the block for a second time, after already having added
      it once in the !i_delalloc_reserved_flag case in ext4_mb_new_blocks
      
      The real problem is that we shouldn't be using the DELALLOC_RESERVED
      state flag, and instead we should be passing
      EXT4_GET_BLOCKS_DELALLOC_RESERVE down to ext4_map_blocks() instead of
      using an inode state flag.  We'll fix this for now with using
      i_data_sem to prevent this race, but this is really not the right way
      to fix things.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@kernel.org
      6d6a4351