1. 29 Jan, 2008 13 commits
    • A
      ext4: Convert truncate_mutex to read write semaphore. · 0e855ac8
      Aneesh Kumar K.V committed
      We currently take the truncate_mutex for every read, which hurts
      performance on large CPU configurations. Convert the lock to a read-write
      semaphore and take the read lock when reading the file.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Change the default behaviour on error · bb4f397a
      Aneesh Kumar K.V committed
      The ext4 file system was by default ignoring errors and continuing. This
      is not a good default, as continuing on error could lead to file system
      corruption. Change the default to mark the file system
      read-only. Debian and Ubuntu already do this by default in their
      fstab.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    • E
      ext4: fix oops on corrupted ext4 mount · e7c95593
      Eric Sandeen committed
      When mounting an ext4 filesystem with corrupted s_first_data_block, things
      can go very wrong and oops.
      
      Because blocks_count in ext4_fill_super is a u64, and we must use do_div,
      the calculation of db_count is done differently than on ext3.  If
      first_data_block is corrupted such that it is larger than ext4_blocks_count,
      for example, then the intermediate blocks_count value may go negative,
      but sign-extend to a very large value:
      
              blocks_count = (ext4_blocks_count(es) -
                              le32_to_cpu(es->s_first_data_block) +
                              EXT4_BLOCKS_PER_GROUP(sb) - 1);
      
      This is then assigned to s_groups_count which is an unsigned long:
      
              sbi->s_groups_count = blocks_count;
      
      This may result in a value of 0xFFFFFFFF which is then used to compute
      db_count:
      
              db_count = (sbi->s_groups_count + EXT4_DESC_PER_BLOCK(sb) - 1) /
                         EXT4_DESC_PER_BLOCK(sb);
      
      and in this case db_count will wind up as 0 because the addition overflows
      32 bits.  This in turn causes the kmalloc for group_desc to be of 0 size:
      
              sbi->s_group_desc = kmalloc(db_count * sizeof (struct buffer_head *),
                                          GFP_KERNEL);
      
      and eventually in ext4_check_descriptors, dereferencing
      sbi->s_group_desc[desc_block] will result in a NULL pointer dereference.
      
      The simplest test seems to be to sanity check s_first_data_block,
      EXT4_BLOCKS_PER_GROUP, and ext4_blocks_count values to be sure
      their combination won't result in a bad intermediate value for
      blocks_count.  We could just check for db_count == 0, but
      catching it at the root cause seems like it provides more info.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
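The overflow described in this commit message can be reproduced in user space. The following is a minimal sketch, not the kernel code: `struct geom`, `db_count_unchecked` and `geometry_is_sane` are hypothetical names, `s_groups_count` is modelled as a 32-bit value (as it would be on a 32-bit kernel), and plain division stands in for `do_div`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock geometry discussed above. */
struct geom {
    uint64_t blocks_count;     /* ext4_blocks_count(es) */
    uint32_t first_data_block; /* le32_to_cpu(es->s_first_data_block) */
    uint32_t blocks_per_group; /* EXT4_BLOCKS_PER_GROUP(sb) */
    uint32_t desc_per_block;   /* EXT4_DESC_PER_BLOCK(sb) */
};

/* Unchecked calculation: a corrupted first_data_block makes the
 * intermediate value wrap around, and db_count can end up as 0. */
uint32_t db_count_unchecked(const struct geom *g)
{
    assert(g->blocks_per_group != 0 && g->desc_per_block != 0);

    /* Wraps modulo 2^64 when first_data_block > blocks_count. */
    uint64_t blocks = g->blocks_count - g->first_data_block +
                      g->blocks_per_group - 1;
    blocks /= g->blocks_per_group;          /* do_div() in the kernel */

    /* Assumption: s_groups_count is 32 bits wide, so it truncates. */
    uint32_t groups = (uint32_t)blocks;

    /* This addition wraps modulo 2^32 when groups is 0xFFFFFFFF. */
    return (groups + g->desc_per_block - 1) / g->desc_per_block;
}

/* Sanity check at the root cause: reject the geometry before any
 * arithmetic is done on it. Returns 1 if sane, 0 otherwise. */
int geometry_is_sane(const struct geom *g)
{
    return g->blocks_per_group != 0 &&
           g->desc_per_block != 0 &&
           g->first_data_block < g->blocks_count;
}
```

With blocks_count = 1000, first_data_block = 10000, 8192 blocks per group and 128 descriptors per block, the unchecked path truncates the group count to 0xFFFFFFFF and returns a db_count of 0, while the sanity check rejects the geometry up front.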
    • A
      ext4/super.c: fix #ifdef's (CONFIG_EXT4_* -> CONFIG_EXT4DEV_*) · 07620f69
      Adrian Bunk committed
      Based on a report by Robert P. J. Day.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
    • E
      ext4: store maxbytes for bitmapped files and return EFBIG as appropriate · e2b46574
      Eric Sandeen committed
      Calculate and store the max offset for bitmapped files, and
      catch too-large seeks, truncates, and writes in ext4, shortening
      or rejecting as appropriate.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    • E
      ext4: different maxbytes functions for bitmap & extent files · cd2291a4
      Eric Sandeen committed
      Use two different maxbytes functions for bitmapped and extent-based
      files.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    • A
      ext4: Support large files · 8180a562
      Aneesh Kumar K.V committed
      This patch converts ext4_inode i_blocks to represent the total
      blocks occupied by the inode in units of the file system block size.
      Previously the field counted 512-byte
      units, which limited the total size of the file.
      
      The feature is enabled transparently when we write an inode
      whose i_blocks cannot be represented as 512-byte units in a
      48-bit variable.
      
      Inode flag: EXT4_HUGE_FILE_FL
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Add support for 48 bit inode i_blocks. · 0fc1b451
      Aneesh Kumar K.V committed
      Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
      to represent the upper 16 bits of i_blocks. With this change the maximum
      file size becomes (2**48 - 1) * 512 bytes.
      
      We add a RO_COMPAT feature to the super block to indicate that inodes
      have i_blocks represented as a split 48-bit value. A super block with this
      feature set cannot be mounted read-write on a kernel with CONFIG_LSF
      disabled.
      
      Super block flag: EXT4_FEATURE_RO_COMPAT_HUGE_FILE
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
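The split 48-bit representation can be illustrated in isolation. This is only a sketch of the arithmetic: the helper names `i_blocks_combine`, `i_blocks_split` and `i_blocks_fits_48` are hypothetical, and endianness conversion (the on-disk fields are little-endian) is deliberately left out.

```c
#include <stdint.h>

/* Combine the low 32 bits and the high 16 bits into one 48-bit count. */
uint64_t i_blocks_combine(uint32_t lo, uint16_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Split a count into the two fields; bits above 47 are discarded,
 * so callers should check i_blocks_fits_48() first. */
void i_blocks_split(uint64_t blocks, uint32_t *lo, uint16_t *hi)
{
    *lo = (uint32_t)blocks;
    *hi = (uint16_t)(blocks >> 32);
}

/* Returns 1 if the count is representable in 48 bits. */
int i_blocks_fits_48(uint64_t blocks)
{
    return (blocks >> 48) == 0;
}

/* Largest byte count expressible as 48 bits of 512-byte units. */
uint64_t max_bytes_512_units(void)
{
    return ((1ULL << 48) - 1) * 512;
}
```

A count that does not pass `i_blocks_fits_48` is the case the "Support large files" commit handles by switching the unit to the file system block size.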
    • A
      ext4: Fix sparse warnings. · 1d03ec98
      Aneesh Kumar K.V committed
      Fix sparse warnings related to static functions
      and local variables.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Introduce ext4_update_*_feature · 99e6f829
      Aneesh Kumar K.V committed
      Introduce ext4_update_*_feature and use them instead
      of open-coding.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: add ext4_group_t, and change all group variables to this type. · fd2d4291
      Avantika Mathur committed
      In many places variables for block groups are of type int, which limits the
      maximum number of block groups to 2^31.  With a 4K block size, each block
      group can have up to 2^15 blocks, so the max filesystem size is limited to
      2^31 * (2^15 * 2^12) = 2^58 bytes, or 256 PB.
      
      This patch introduces a new type ext4_group_t, of type unsigned long, to
      represent block group numbers in ext4.
      All occurrences of block group variables are converted to type ext4_group_t.
      Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
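The size limit quoted in this commit message follows from simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes one bitmap block per group, so that a group spans block_size * 8 blocks (2^15 blocks for a 4K block size).

```c
#include <assert.h>
#include <stdint.h>

/* One bitmap block has block_size * 8 bits, one bit per block. */
uint64_t blocks_per_group(uint32_t block_size)
{
    return (uint64_t)block_size * 8;
}

/* Upper bound on the file system size in bytes when group numbers
 * have group_bits usable bits (31 for a signed 32-bit int). */
uint64_t max_fs_bytes(unsigned group_bits, uint32_t block_size)
{
    assert(group_bits < 64);
    uint64_t groups = 1ULL << group_bits;
    return groups * blocks_per_group(block_size) * block_size;
}
```

For example, `max_fs_bytes(31, 4096)` evaluates to 2^58 bytes, which is 256 units of 2^50 bytes.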
    • A
      ext4: Introduce ext4_lblk_t · 725d26d3
      Aneesh Kumar K.V committed
      This patch adds a new data type ext4_lblk_t to represent
      the logical file blocks.
      
      This is the preparatory patch to support large files in ext4.
      The follow-up patch will convert the ext4_inode i_blocks to
      represent the number of blocks in file system block size. This
      change makes it possible to have a block number of 2**32 - 1, which
      will result in overflow if the block number is represented by
      a signed long. This patch converts all block numbers to type
      ext4_lblk_t, which is a typedef of __u32.
      
      Also remove dead code ext4_ext_walk_space.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    • T
      ext4: Support large blocksize up to PAGESIZE · afc7cbca
      Takashi Sato committed
      This patch set supports large block sizes (>4k, <=64k) in ext4
      by simply enlarging the block size limit. However, it is NOT possible to have a 64kB
      blocksize on ext4 without some changes to the directory handling
      code.  The reason is that an empty 64kB directory block would have a
      rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
      the filesystem.  The proposed solution is to represent a 64k rec_len
      with an otherwise impossible value such as rec_len = 0xffff.
      
      The Patch-set consists of the following 2 patches.
        [1/2]  ext4: enlarge blocksize
               - Allow blocksize up to pagesize
      
        [2/2]  ext4: fix rec_len overflow
               - prevent rec_len from overflow with 64KB blocksize
      
      On a ppc64 box with 64k pages running this patch set, we could create a 64k
      block size ext4dev filesystem and handle empty directory blocks.
      Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
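The rec_len workaround described in this commit message can be sketched as a pair of conversion helpers. The names `rec_len_to_disk` and `rec_len_from_disk` and the constant `REC_LEN_64K_MARKER` are hypothetical here; the sketch assumes, as the message proposes, that 0xffff is reserved to mean a 64k record, which works because real record lengths are 4-byte aligned and so never equal 0xffff.

```c
#include <assert.h>
#include <stdint.h>

#define REC_LEN_64K_MARKER 0xFFFFu  /* assumed reserved on-disk value */

/* Encode an in-memory record length (at most 64k) into 16 bits. */
uint16_t rec_len_to_disk(uint32_t len)
{
    assert(len <= 65536);
    if (len == 65536)
        return REC_LEN_64K_MARKER;  /* a plain cast would give 0 */
    return (uint16_t)len;
}

/* Decode the 16-bit on-disk value back into a full length. */
uint32_t rec_len_from_disk(uint16_t disk_len)
{
    if (disk_len == REC_LEN_64K_MARKER)
        return 65536;
    return disk_len;
}
```

Without the marker, an empty 64kB directory block would round-trip to a length of 0, which is the error case the message describes.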
  2. 18 Dec, 2007 1 commit
  3. 22 Oct, 2007 2 commits
  4. 18 Oct, 2007 8 commits
    • A
      ext4: Convert s_r_blocks_count and s_free_blocks_count · 308ba3ec
      Aneesh Kumar K.V committed
      Convert s_r_blocks_count and s_free_blocks_count to
      s_r_blocks_count_lo and s_free_blocks_count_lo.
      
      This helps in finding bugs due to direct partial access of
      these split 64-bit values.
      
      Also fix direct partial access in ext4 code.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      ext4: Convert s_blocks_count to s_blocks_count_lo · 6bc9feff
      Aneesh Kumar K.V committed
      Convert s_blocks_count to s_blocks_count_lo.
      This helps in finding bugs due to direct partial access of
      these split 64-bit values.
      
      Also fix direct partial access in ext4 code.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Convert bg_inode_bitmap and bg_inode_table · 5272f837
      Aneesh Kumar K.V committed
      Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo
      and bg_inode_table_lo.  This helps in finding bugs due to
      direct partial access of these split 64-bit values.
      
      Also fix one direct partial access.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Convert bg_block_bitmap to bg_block_bitmap_lo · 3a14589c
      Aneesh Kumar K.V committed
      Convert bg_block_bitmap to bg_block_bitmap_lo.
      This helps in catching some bugs due to direct
      partial access of these split fields.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • J
      ext4: FLEX_BG Kernel support v2. · ce421581
      Jose R. Santos committed
      This feature relaxes check restrictions on where each block group's
      metadata is located within the storage media.  This allows for the allocation
      of bitmaps or inode tables outside the block group boundaries in cases
      where bad blocks force us to look for new blocks which the owning block
      group cannot satisfy.  This will also allow for new metadata allocation
      schemes to improve performance and scalability.
      Signed-off-by: Jose R. Santos <jrs@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • A
      Ext4: Uninitialized Block Groups · 717d50e4
      Andreas Dilger committed
      In pass1 of e2fsck, every inode table in the filesystem is scanned and checked,
      regardless of whether it is in use.  This is the most time-consuming part
      of the filesystem check.  The uninitialized block group feature can greatly
      reduce e2fsck time by eliminating checking of uninitialized inodes.
      
      With this feature, there is a high water mark of used inodes for each block
      group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
      group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
      of each group descriptor is used to ensure that corruption in the group
      descriptor's bit flags does not cause incorrect operation.
      
      The feature is enabled through a mkfs option
      
      	mke2fs /dev/ -O uninit_groups
      
      A patch adding support for uninitialized block groups to e2fsprogs tools has
      been posted to the linux-ext4 mailing list.
      
      The patches have been stress tested with fsstress and fsx.  In performance
      tests of e2fsck time, we have seen that e2fsck time on ext3 grows
      linearly with the total number of inodes in the filesystem.  In ext4 with the
      uninitialized block groups feature, the e2fsck time is constant with respect
      to the total inode count and depends solely on the number of used inodes.
      Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
      greatly reduce e2fsck time for users, with a performance improvement of 2-20
      times, depending on how full the filesystem is.
      
      The attached graph shows the major improvements in e2fsck times in filesystems
      with a large total inode count, but few inodes in use.
      
      In each group descriptor if we have
      
      EXT4_BG_INODE_UNINIT set in bg_flags:
              Inode table is not initialized/used in this group. So we can skip
              the consistency check during fsck.
      EXT4_BG_BLOCK_UNINIT set in bg_flags:
              No block in the group is used. So we can skip the block bitmap
              verification for this group.
      
      We also add two new fields to group descriptor as a part of
      uninitialized group patch.
      
              __le16  bg_itable_unused;       /* Unused inodes count */
              __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */
      
      bg_itable_unused:
      
      If we have EXT4_BG_INODE_UNINIT not set in bg_flags
      then bg_itable_unused will give the offset within
      the inode table till the inodes are used. This can be
      used by fsck to skip list of inodes that are marked unused.
      
      bg_checksum:
      Now that we depend on bg_flags and bg_itable_unused to determine
      the block and inode usage, we need to make sure group descriptor
      is not corrupt. We add checksum to group descriptor to
      detect corruption. If the descriptor is found to be corrupt, we
      mark all the blocks and inodes in the group used.
      Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
      Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
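How a checker might use these fields can be sketched as follows. This is an illustration under stated assumptions, not e2fsck's actual logic: the function `inodes_to_scan` and its `checksum_ok` parameter are hypothetical, the flag values are illustrative, and `bg_itable_unused` is read as a count of unused inodes at the end of the table, as its field comment suggests.

```c
#include <stdint.h>

/* Illustrative flag values for bg_flags (assumed for this sketch). */
#define BG_INODE_UNINIT 0x0001  /* inode table not initialized/used */
#define BG_BLOCK_UNINIT 0x0002  /* no block in the group is used */

/* Number of inodes at the start of the group's inode table that a
 * checker would still have to examine. */
uint32_t inodes_to_scan(uint16_t bg_flags, uint16_t bg_itable_unused,
                        uint32_t inodes_per_group, int checksum_ok)
{
    /* Corrupt descriptor: trust nothing, treat everything as used. */
    if (!checksum_ok)
        return inodes_per_group;

    /* Whole inode table is unused: nothing to check. */
    if (bg_flags & BG_INODE_UNINIT)
        return 0;

    /* An implausible count is also treated as "scan everything". */
    if (bg_itable_unused > inodes_per_group)
        return inodes_per_group;

    return inodes_per_group - bg_itable_unused;
}
```

The checksum test comes first because, as the message notes, the flags and the unused count are only meaningful once the descriptor itself is known to be intact.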
    • C
      ext4: Remove (partial, never completed) fragment support · f077d0d7
      Coly Li committed
      Fragment support in ext2/3/4 was never implemented, and it probably will
      never be implemented.  So remove it from ext4.
      Signed-off-by: Coly Li <coyli@suse.de>
      Acked-by: Andreas Dilger <adilger@clusterfs.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • M
      jbd2: JBD_XXX to JBD2_XXX naming cleanup · cd02ff0b
      Mingming Cao committed
      Change JBD_XXX macros to JBD2_XXX in JBD2/Ext4.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  5. 17 Oct, 2007 6 commits
  6. 12 Sep, 2007 1 commit
  7. 27 Jul, 2007 1 commit
    • E
      fix inode_table test in ext234_check_descriptors · 780dcdb2
      Eric Sandeen committed
      ext[234]_check_descriptors sanity checks block group descriptor geometry at
      mount time, testing whether the block bitmap, inode bitmap, and inode table
      reside wholly within the blockgroup.  However, the inode table test is off
      by one so that if the last block in the inode table resides on the last
      block of the block group, the test incorrectly fails.  This is because it
      tests the last block as (start + length) rather than (start + length - 1).
      
      This can be seen by trying to mount a filesystem made such as:
      
       mkfs.ext2 -F -b 1024 -m 0 -g 256 -N 3744 fsfile 1024
      
      which yields:
      
       EXT2-fs error (device loop0): ext2_check_descriptors: Inode table for group 0 not in group (block 101)!
       EXT2-fs: group descriptors corrupted!
      
      There is a similar bug in e2fsprogs, patch already sent for that.
      
      (I wonder if inside(), outside(), and/or in_range() should someday be
      used in this and other tests throughout the ext filesystems...)
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
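The off-by-one described in this commit message can be shown with two small predicates. This is a simplified sketch with hypothetical function names and parameters, not the ext[234]_check_descriptors code itself; `first` and `last` are the first and last block numbers of the group, inclusive.

```c
#include <stdint.h>

/* Buggy test: treats (start + length) as the table's last block,
 * which is actually one block past the end. */
int itable_in_group_buggy(uint64_t first, uint64_t last,
                          uint64_t start, uint64_t length)
{
    return start >= first && start + length <= last;
}

/* Fixed test: the table's last block is (start + length - 1). */
int itable_in_group_fixed(uint64_t first, uint64_t last,
                          uint64_t start, uint64_t length)
{
    return start >= first && start + length - 1 <= last;
}
```

For a group covering blocks 1 to 256 and a 17-block inode table starting at block 240, the table ends exactly on block 256: the fixed test accepts it, while the buggy test wrongly rejects it.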
  8. 20 Jul, 2007 1 commit
    • P
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt committed
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
  9. 18 Jul, 2007 5 commits
  10. 17 Jul, 2007 2 commits