1. 11 7月, 2016 1 次提交
  2. 27 4月, 2016 1 次提交
  3. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  4. 10 3月, 2016 1 次提交
  5. 09 3月, 2016 1 次提交
  6. 09 1月, 2016 1 次提交
  7. 18 10月, 2015 1 次提交
  8. 19 5月, 2015 1 次提交
    • T
      ext4 crypto: optimize filename encryption · 5b643f9c
      Theodore Ts'o 提交于
      Encrypt the filename as soon it is passed in by the user.  This avoids
      our needing to encrypt the filename 2 or 3 times while in the process
      of creating a filename.
      
      Similarly, when looking up a directory entry, encrypt the filename
      early, or if the encryption key is not available, base-64 decode the
      file syystem so that the hash value and the last 16 bytes of the
      encrypted filename is available in the new struct ext4_filename data
      structure.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      5b643f9c
  9. 16 4月, 2015 1 次提交
  10. 12 4月, 2015 2 次提交
  11. 03 4月, 2015 1 次提交
    • R
      ext4: fix transposition typo in format string · 80cfb71e
      Rasmus Villemoes 提交于
      According to C99, %*.s means the same as %*.0s, in other words, print as
      many spaces as the field width argument says and effectively ignore the
      string argument. That is certainly not what was meant here. The kernel's
      printf implementation, however, treats it as if the . was not there,
      i.e. as %*s. I don't know if de->name is nul-terminated or not, but in
      any case I'm guessing the intention was to use de->name_len as precision
      instead of field width.
      
      [ Note: this is debugging code which is commented out, so this is not
        security issue; a developer would have to explicitly enable
        INLINE_DIR_DEBUG before this would be an issue. ]
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      80cfb71e
  12. 06 12月, 2014 1 次提交
    • D
      ext4: ext4_da_convert_inline_data_to_extent drop locked page after error · 50db71ab
      Dmitry Monakhov 提交于
      Testcase:
      xfstests generic/270
      MKFS_OPTIONS="-q -I 256 -O inline_data,64bit"
      
      Call Trace:
       [<ffffffff81144c76>] lock_page+0x35/0x39 -------> DEADLOCK
       [<ffffffff81145260>] pagecache_get_page+0x65/0x15a
       [<ffffffff811507fc>] truncate_inode_pages_range+0x1db/0x45c
       [<ffffffff8120ea63>] ? ext4_da_get_block_prep+0x439/0x4b6
       [<ffffffff811b29b7>] ? __block_write_begin+0x284/0x29c
       [<ffffffff8120e62a>] ? ext4_change_inode_journal_flag+0x16b/0x16b
       [<ffffffff81150af0>] truncate_inode_pages+0x12/0x14
       [<ffffffff81247cb4>] ext4_truncate_failed_write+0x19/0x25
       [<ffffffff812488cf>] ext4_da_write_inline_data_begin+0x196/0x31c
       [<ffffffff81210dad>] ext4_da_write_begin+0x189/0x302
       [<ffffffff810c07ac>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff810ddd13>] ? read_seqcount_begin.clone.1+0x9f/0xcc
       [<ffffffff8114309d>] generic_perform_write+0xc7/0x1c6
       [<ffffffff810c040e>] ? mark_held_locks+0x59/0x77
       [<ffffffff811445d1>] __generic_file_write_iter+0x17f/0x1c5
       [<ffffffff8120726b>] ext4_file_write_iter+0x2a5/0x354
       [<ffffffff81185656>] ? file_start_write+0x2a/0x2c
       [<ffffffff8107bcdb>] ? bad_area_nosemaphore+0x13/0x15
       [<ffffffff811858ce>] new_sync_write+0x8a/0xb2
       [<ffffffff81186e7b>] vfs_write+0xb5/0x14d
       [<ffffffff81186ffb>] SyS_write+0x5c/0x8c
       [<ffffffff816f2529>] system_call_fastpath+0x12/0x17
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      50db71ab
  13. 03 12月, 2014 2 次提交
  14. 13 10月, 2014 1 次提交
  15. 11 9月, 2014 1 次提交
  16. 29 7月, 2014 1 次提交
  17. 15 7月, 2014 1 次提交
    • Z
      ext4: make ext4_has_inline_data() as a inline function · 83447ccb
      Zheng Liu 提交于
      Now ext4_has_inline_data() is used in wide spread codepaths.  So we need
      to make it as a inline function to avoid burning some CPU cycles.
      
      Change in text size:
      
               text     data      bss     dec     hex filename
      before: 326110    19258    5528  350896   55ab0 fs/ext4/ext4.o
      after:  326227    19258    5528  351013   55b25 fs/ext4/ext4.o
      
      I use the following script to measure the CPU usage.
      
        #!/bin/bash
      
        shm_base='/dev/shm'
        img=${shm_base}/ext4-img
        mnt=/mnt/loop
      
        e2fsprgs_base=$HOME/e2fsprogs
        mkfs=${e2fsprgs_base}/misc/mke2fs
        fsck=${e2fsprgs_base}/e2fsck/e2fsck
      
        sudo umount $mnt
        dd if=/dev/zero of=$img bs=4k count=3145728
        ${mkfs} -t ext4 -O inline_data -F $img
        sudo mount -t ext4 -o loop $img $mnt
      
        # start testing...
        testdir="${mnt}/testdir"
        mkdir $testdir
        cd $testdir
      
        echo "start testing..."
        for ((cnt=0;cnt<100;cnt++)); do
      
        for ((i=0;i<5;i++)); do
        	for ((j=0;j<5;j++)); do
        		for ((k=0;k<5;k++)); do
        			for ((l=0;l<5;l++)); do
        				mkdir -p $i/$j/$k/$l
        				echo "$i-$j-$k-$l" > $i/$j/$k/$l/testfile
        			done
        		done
        	done
        done
      
        ls -R $testdir > /dev/null
        rm -rf $testdir/*
      
        done
      
      The result of `perf top -G -U` is as below.
      
      vanilla:
       13.92%  [ext4]  [k] ext4_do_update_inode
        9.36%  [ext4]  [k] __ext4_get_inode_loc
        4.07%  [ext4]  [k] ftrace_define_fields_ext4_writepages
        3.83%  [ext4]  [k] __ext4_handle_dirty_metadata
        3.42%  [ext4]  [k] ext4_get_inode_flags
        2.71%  [ext4]  [k] ext4_mark_iloc_dirty
        2.46%  [ext4]  [k] ftrace_define_fields_ext4_direct_IO_enter
        2.26%  [ext4]  [k] ext4_get_inode_loc
        2.22%  [ext4]  [k] ext4_has_inline_data
        [...]
      
      After applied the patch, we don't see ext4_has_inline_data() because it
      has been inlined and perf couldn't sample it.  Although it doesn't mean
      that the CPU cycles can be saved but at least the overhead of function
      calls can be eliminated.  So IMHO we'd better inline this function.
      
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Signed-off-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      83447ccb
  18. 13 5月, 2014 1 次提交
  19. 12 5月, 2014 1 次提交
  20. 12 1月, 2014 1 次提交
  21. 08 1月, 2014 1 次提交
  22. 07 1月, 2014 2 次提交
  23. 30 10月, 2013 2 次提交
  24. 01 7月, 2013 1 次提交
  25. 29 6月, 2013 1 次提交
  26. 01 6月, 2013 1 次提交
  27. 20 4月, 2013 2 次提交
    • T
      ext4: fix readdir error in case inline_data+^dir_index. · c4d8b023
      Tao Ma 提交于
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..'. And a
      getdents will fail if the user only want to get '.'. And what's
      worse, we may meet with duplicate dir entries as the offset
      for inline dir and non-inline one is quite different.
      
      This patch just try to resolve this problem if dir_index
      is disabled. In this case, f_pos is the real offset with
      the dir block, so for inline dir, we just pretend as if
      we are a dir block and returns the offset like a norml
      dir block does.
      Reported-by: NZach Brown <zab@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      c4d8b023
    • T
      ext4: fix readdir error in the case of inline_data+dir_index · 8af0f082
      Tao Ma 提交于
      Zach reported a problem that if inline data is enabled, we don't
      tell the difference between the offset of '.' and '..'. And a
      getdents will fail if the user only want to get '.' and what's worse,
      if there is a conversion happens when the user calls getdents
      many times, he/she may get the same entry twice.
      
      In theory, a dir block would also fail if it is converted to a
      hashed-index based dir since f_pos will become a hash value, not the
      real one, but it doesn't happen.  And a deep investigation shows that
      we uses a hash based solution even for a normal dir if the dir_index
      feature is enabled.
      
      So this patch just adds a new htree_inlinedir_to_tree for inline dir,
      and if we find that the hash index is supported, we will do like what
      we do for a dir block.
      Reported-by: NZach Brown <zab@redhat.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      8af0f082
  28. 23 2月, 2013 1 次提交
  29. 09 2月, 2013 1 次提交
    • T
      ext4: pass context information to jbd2__journal_start() · 9924a92a
      Theodore Ts'o 提交于
      So we can better understand what bits of ext4 are responsible for
      long-running jbd2 handles, use jbd2__journal_start() so we can pass
      context information for logging purposes.
      
      The recommended way for finding the longer-running handles is:
      
         T=/sys/kernel/debug/tracing
         EVENT=$T/events/jbd2/jbd2_handle_stats
         echo "interval > 5" > $EVENT/filter
         echo 1 > $EVENT/enable
      
         ./run-my-fs-benchmark
      
         cat $T/trace > /tmp/problem-handles
      
      This will list handles that were active for longer than 20ms.  Having
      longer-running handles is bad, because a commit started at the wrong
      time could stall for those 20+ milliseconds, which could delay an
      fsync() or an O_SYNC operation.  Here is an example line from the
      trace file describing a handle which lived on for 311 jiffies, or over
      1.2 seconds:
      
      postmark-2917  [000] ....   196.435786: jbd2_handle_stats: dev 254,32 
         tid 570 type 2 line_no 2541 interval 311 sync 0 requested_blocks 1
         dirtied_blocks 0
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      9924a92a
  30. 13 1月, 2013 1 次提交
  31. 11 12月, 2012 5 次提交