1. 28 Oct, 2010 6 commits
    • ext4: use bio layer instead of buffer layer in mpage_da_submit_io · bd2d0210
      Theodore Ts'o committed
      Call the block I/O layer directly instead of going through the buffer
      layer.  This should give us much better performance and scalability,
      as well as lowering our CPU utilization when doing buffered writeback.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: remove unused ext4_sb_info members · 640e9396
      Eric Sandeen committed
      Not that these take up a lot of room, but the structure is long enough
      as it is, and there's no need to confuse people with these various
      undocumented & unused structure members...
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: improve llseek error handling for overly large seek offsets · e0d10bfa
      Toshiyuki Okajima committed
      The llseek system call should return EINVAL if passed a seek offset
      at which a write would fail.  What the maximum offset should be
      depends on whether or not the huge_file file system feature is set,
      and on whether or not the file is extent based.
      
      
      If the file does not have the "EXT4_EXTENTS_FL" flag set, the maximum size
      that can be written (write system call) differs from the maximum offset
      that can be sought to (lseek system call).
      
      For example, the following 2 cases demonstrate the differences
      between the maximum size which can be written, versus the seek offset
      allowed by the llseek system call:
      
      #1: mkfs.ext3 <dev>; mount -t ext4 <dev>
      #2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>
      
      Table: the maximum file size that can be written or sought to
             for each combination of filesystem features and file flag
      +============+===============================+===============================+
      | \ File flag|                               |                               |
      |      \     |     !EXT4_EXTENTS_FL          |        EXT4_EXTENTS_FL        |
      |case       \|                               |                               |
      +------------+-------------------------------+-------------------------------+
      | #1         |   write:      2194719883264   | write:       --------------   |
      |            |   seek:       2199023251456   | seek:        --------------   |
      +------------+-------------------------------+-------------------------------+
      | #2         |   write:      4402345721856   | write:       17592186044415   |
      |            |   seek:      17592186044415   | seek:        17592186044415   |
      +------------+-------------------------------+-------------------------------+
      
      The differences exist because ext4 has 2 maxbytes values: sb->s_maxbytes
      (= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped
      maxbytes).  However, generic_file_llseek uses only the extent-mapped maxbytes.
      (The llseek of ext4_file_operations is generic_file_llseek, which uses
      sb->s_maxbytes.)
      
      Therefore we create an ext4 llseek function which uses both maxbytes values.
      
      The new function is derived from generic_file_llseek().
      If the file flag "EXT4_EXTENTS_FL" is not set, the function uses
      EXT4_SB(inode->i_sb)->s_bitmap_maxbytes in place of inode->i_sb->s_maxbytes.
      Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
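The limit selection described in this commit can be modelled in user space. The following is a minimal sketch, not the kernel code: it hard-codes the two limits from case #2 of the table, and the helper names `seek_limit` and `llseek_set` are hypothetical. It covers only an absolute (SEEK_SET) seek.

```python
import errno

# Limits taken from case #2 of the table in the commit message
# (mkfs.ext3, then tune2fs -Oextent,huge_file).
S_MAXBYTES = 17592186044415         # sb->s_maxbytes (extent-mapped)
S_BITMAP_MAXBYTES = 4402345721856   # EXT4_SB(sb)->s_bitmap_maxbytes (block-mapped)


def seek_limit(uses_extents):
    """Pick the limit by file flag (hypothetical helper)."""
    return S_MAXBYTES if uses_extents else S_BITMAP_MAXBYTES


def llseek_set(offset, uses_extents):
    """Model of an absolute seek: reject offsets past the per-file limit."""
    if offset < 0 or offset > seek_limit(uses_extents):
        raise OSError(errno.EINVAL, "Invalid argument")
    return offset
```

With these limits, an offset of 4402345721857 is accepted for an extent-mapped file but raises EINVAL for a block-mapped one, which matches the behaviour the commit message asks for.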
    • ext4: add interface to advertise ext4 features in sysfs · 857ac889
      Lukas Czerner committed
      User space should be able to check which features ext4 supports
      in each particular copy. This adds an easy interface by creating a new
      "features" directory in /sys/fs/ext4/. Files advertising feature
      names can be created in that directory.
      
      Add lazy_itable_init to the feature list.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
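From user space, checking the advertised features amounts to listing that directory. A minimal sketch, assuming sysfs is mounted at /sys and that each feature appears as one file name; the helper name `ext4_features` is hypothetical:

```python
import os

# Directory named in the commit message, assuming sysfs is mounted at /sys.
FEATURES_DIR = "/sys/fs/ext4/features"


def ext4_features(path=FEATURES_DIR):
    """Return the advertised feature names, or [] if the directory is absent."""
    try:
        return sorted(os.listdir(path))
    except OSError:
        # Kernel without this interface, ext4 not loaded, or no sysfs.
        return []


def supports(feature, path=FEATURES_DIR):
    """True if the running kernel advertises the given feature name."""
    return feature in ext4_features(path)
```

A tool such as mke2fs could call `supports("lazy_itable_init")` before deciding whether to skip zeroing the inode tables.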
    • ext4: add support for lazy inode table initialization · bfff6873
      Lukas Czerner committed
      When the lazy_itable_init extended option is passed to mke2fs, it
      considerably speeds up filesystem creation because inode tables are
      not zeroed out.  The fact that parts of the inode table are
      uninitialized is not a problem so long as the block group descriptors,
      which contain information regarding how much of the inode table has
      been initialized, have not been corrupted.  However, if the block group
      checksums are not valid, e2fsck must scan the entire inode table, and
      the old, uninitialized data could potentially cause e2fsck to
      report false problems.
      
      Hence, it is important for the inode tables to be initialized as soon
      as possible.  This commit adds this feature so that mke2fs can safely
      use the lazy inode table initialization feature to speed up formatting
      file systems.
      
      This is done via a new kernel thread called ext4lazyinit, which is
      created on demand and destroyed when it is no longer needed.  There
      is only one thread for all ext4 filesystems in the system. When the
      first filesystem with the init_itable mount option is mounted, the
      ext4lazyinit thread is created; the filesystem can then register its
      request in the request list.
      
      This thread then walks through the list of requests, picking up
      scheduled requests and invoking ext4_init_inode_table(). The next
      schedule time for a request is computed by multiplying the time it
      took to zero out the last inode table by a wait multiplier, which can
      be set with the (init_itable=n) mount option (default is 10).  We do
      this so that we do not take the whole I/O bandwidth. When the thread
      is no longer necessary (the request list is empty), it frees the
      appropriate structures and exits (and can be created again later by
      another filesystem).
      
      We do not disturb regular inode allocations in any way; the allocator
      simply does not care whether the inode table is zeroed or not. But when
      zeroing, we have to skip used inodes, obviously. We should also prevent
      new inode allocations from the group while zeroing is under way. For
      that we take the alloc_sem lock for writing in ext4_init_inode_table()
      and for reading in ext4_claim_inode(), so when we are unlucky and the
      allocator hits the group which is currently being zeroed, it just has
      to wait.
      
      This can be suppressed using the mount option no_init_itable.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
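The throttling rule in this commit message (next delay = time spent zeroing the last inode table, times the wait multiplier) is easy to check numerically. The sketch below is a user-space model only; the function names and the millisecond units are assumptions made for illustration.

```python
DEFAULT_WAIT_MULT = 10  # default multiplier stated in the commit message


def next_delay_ms(zeroing_ms, wait_mult=DEFAULT_WAIT_MULT):
    """Delay before the request is scheduled again: zeroing time * multiplier."""
    return zeroing_ms * wait_mult


def io_share(zeroing_times_ms, wait_mult=DEFAULT_WAIT_MULT):
    """Fraction of wall-clock time spent zeroing under this rule."""
    busy = sum(zeroing_times_ms)
    if busy == 0:
        return 0.0
    idle = sum(next_delay_ms(t, wait_mult) for t in zeroing_times_ms)
    return busy / (busy + idle)
```

Under this model the thread is busy for 1 / (1 + multiplier) of the time, so the default of 10 keeps background zeroing to roughly 9% of wall-clock time regardless of how fast the device is.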
    • ext4: use dedicated slab caches for group_info structures · fb1813f4
      Curt Wohlgemuth committed
      ext4_group_info structures are currently allocated with kmalloc().
      With a typical 4K block size, these are 136 bytes each -- meaning
      they'll each consume a 256-byte slab object.  On a system with many
      large ext4 partitions, that's a lot of wasted kernel slab space.
      (E.g., a single 1TB partition will have about 8000 block groups, using
      about 2MB of slab, of which nearly 1MB is wasted.)
      
      This patch creates an array of slab pointers created as needed --
      depending on the superblock block size -- and uses these slabs to
      allocate the group info objects.
      
      Google-Bug-Id: 2980809
      Signed-off-by: Curt Wohlgemuth <curtw@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
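The figures quoted in this commit message can be reproduced with a short calculation. This sketch assumes that "1TB" means 1 TiB and that each block group spans 8 × block size blocks (one bitmap block with one bit per block); the 136-byte and 256-byte sizes come from the message itself, and the function name is hypothetical.

```python
BLOCK_SIZE = 4096        # typical block size named in the commit message
GROUP_INFO_SIZE = 136    # bytes per ext4_group_info at 4K blocks (from the message)
KMALLOC_OBJECT = 256     # slab object each one consumes (from the message)


def slab_usage(partition_bytes, block_size=BLOCK_SIZE):
    """Return (block groups, slab bytes used, slab bytes wasted)."""
    blocks_per_group = 8 * block_size             # assumption: one bitmap block per group
    group_bytes = blocks_per_group * block_size   # 128 MiB at 4K blocks
    groups = partition_bytes // group_bytes
    used = groups * KMALLOC_OBJECT
    wasted = groups * (KMALLOC_OBJECT - GROUP_INFO_SIZE)
    return groups, used, wasted


groups, used, wasted = slab_usage(1 << 40)  # assumption: 1 TiB partition
```

For 1 TiB this gives 8192 block groups, 2097152 bytes (2 MiB) of slab, and 983040 bytes wasted, in line with the "about 8000", "about 2MB" and "nearly 1MB" in the message.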
  2. 10 Aug, 2010 1 commit
  3. 05 Aug, 2010 1 commit
    • ext4: re-inline ext4_rec_len_(to|from)_disk functions · 0cfc9255
      Eric Sandeen committed
      commit 3d0518f4, "ext4: New rec_len encoding for very
      large blocksizes" made several changes to this path, but from
      a perf perspective, un-inlining ext4_rec_len_from_disk() seems
      most significant.  This function is called from ext4_check_dir_entry(),
      which on a file-creation workload is called extremely often.
      
      I tested this with bonnie:
      
      # bonnie++ -u root -s 0 -f -x 200 -d /mnt/test -n 32
      
      (this does 200 iterations) and got this for the file creations:
      
      ext4 stock:   Average =  21206.8 files/s
      ext4 inlined: Average =  22346.7 files/s  (+5%)
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  4. 02 Aug, 2010 1 commit
  5. 27 Jul, 2010 7 commits
  6. 30 Jun, 2010 1 commit
  7. 29 Jun, 2010 2 commits
  8. 15 Jun, 2010 1 commit
  9. 12 Jun, 2010 1 commit
    • ext4: Clean up s_dirt handling · a0375156
      Theodore Ts'o committed
      We don't need to set s_dirt in most of the ext4 code when journaling
      is enabled.  In ext3/4 some of the summary statistics for # of free
      inodes, blocks, and directories are calculated from the per-block
      group statistics when the file system is mounted or unmounted.  As a
      result the superblock doesn't have to be updated, either via the
      journal or by setting s_dirt.  There are a few exceptions, most
      notably when resizing the file system, where the superblock needs to
      be modified --- and in that case it should be done as a journalled
      operation if possible, and s_dirt set only in no-journal mode.
      
      This patch will optimize out some unneeded disk writes when using ext4
      with a journal.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  10. 28 May, 2010 1 commit
  11. 17 May, 2010 8 commits
  12. 06 Mar, 2010 1 commit
  13. 04 Mar, 2010 1 commit
  14. 03 Mar, 2010 1 commit
  15. 05 Mar, 2010 1 commit
    • ext4: use ext4_get_block_write in buffer write · 744692dc
      Jiaying Zhang committed
      Allocate uninitialized extent before ext4 buffer write and
      convert the extent to initialized after io completes.
      The purpose is to make sure an extent can only be marked
      initialized after it has been written with new data so
      we can safely drop the i_mutex lock in ext4 DIO read without
      exposing stale data. This helps to improve multi-threaded DIO
      read performance on high-speed disks.
      
      Skip the nobh and data=journal mount cases to make things simple for now.
      Signed-off-by: Jiaying Zhang <jiayingz@google.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
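The ordering this commit relies on (allocate uninitialized, write, convert only after the I/O completes) can be illustrated with a toy model. This is a sketch and not the kernel implementation: the class and method names are hypothetical, and a reader is assumed to see zeros for any extent not yet marked initialized.

```python
class Extent:
    """Toy model of one extent whose blocks held old ("stale") data."""

    def __init__(self, stale):
        self.data = stale          # whatever was on disk before allocation
        self.initialized = False   # allocated as uninitialized

    def read(self):
        # An uninitialized extent reads back as zeros, never as stale bytes.
        return self.data if self.initialized else bytes(len(self.data))

    def write(self, new):
        # The new data reaches the disk; the extent is still uninitialized.
        self.data = new

    def end_io(self):
        # Only once the I/O has completed is the extent converted.
        self.initialized = True
```

A concurrent reader that runs between allocation and `end_io()` sees zeros; it never sees the previous contents of the blocks, which is why the read path no longer needs to hold i_mutex.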
  16. 03 Mar, 2010 1 commit
  17. 24 Feb, 2010 1 commit
  18. 17 Feb, 2010 1 commit
    • percpu: add __percpu sparse annotations to fs · 003cb608
      Tejun Heo committed
      Add __percpu sparse annotations to fs.
      
      These annotations are to make sparse consider percpu variables to be
      in a different address space and warn if accessed without going
      through percpu accessors.  This patch doesn't affect normal builds.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
  19. 16 Feb, 2010 1 commit
  20. 25 Jan, 2010 2 commits