1. 29 Jan, 2008 22 commits
    • A
      ext4: Add EXT4_IOC_MIGRATE ioctl · c14c6fd5
      Aneesh Kumar K.V committed
      The patch below adds an ioctl for migrating an ext3 indirect-block-mapped
      inode to an ext4 extent-mapped inode.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • J
      ext4: Add inode version support in ext4 · 25ec56b5
      Jean Noel Cordenner committed
      This patch adds 64-bit inode version support to ext4. The lower 32 bits
      are stored in the osd1.linux1.l_i_version field while the high 32 bits
      are stored in the i_version_hi field newly created in the ext4_inode.
      This field is incremented if the ext4_inode is large enough. An
      i_version mount option has been added to enable the feature.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
      Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
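The split of the 64-bit version counter into two 32-bit on-disk fields, as described in the ext4 commit above, can be sketched roughly as follows. The struct and function names here are hypothetical stand-ins, not the kernel's `struct ext4_inode` or its helpers, and on-disk byte order is ignored for brevity.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two on-disk fields named in the commit
 * message; this is NOT the kernel's struct ext4_inode. */
struct iversion_disk {
    uint32_t l_i_version;   /* low 32 bits (osd1.linux1.l_i_version) */
    uint32_t i_version_hi;  /* high 32 bits, present only in large inodes */
};

/* Rebuild the 64-bit counter; the high half is used only when the
 * inode is large enough to contain it. */
static uint64_t iversion_load(const struct iversion_disk *d, int inode_is_large)
{
    uint64_t v = d->l_i_version;
    if (inode_is_large)
        v |= (uint64_t)d->i_version_hi << 32;
    return v;
}

/* Store the counter; the high half is written only for large inodes. */
static void iversion_store(struct iversion_disk *d, uint64_t v, int inode_is_large)
{
    d->l_i_version = (uint32_t)v;
    if (inode_is_large)
        d->i_version_hi = (uint32_t)(v >> 32);
}
```

With a small inode, only the low 32 bits survive a store and load round trip, which is why the commit ties the high field to the inode size.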
    • J
      vfs: Add 64 bit i_version support · 7a224228
      Jean Noel Cordenner committed
      The i_version field of the inode is changed to be a 64-bit counter that
      is set on every inode creation and that is incremented every time the
      inode data is modified (similarly to the "ctime" time-stamp).
      The aim is to fulfill an NFSv4 requirement from RFC 3530.
      This first part concerns the VFS: it converts the 32-bit i_version in
      the generic inode to a 64-bit counter, adds a flag in the super block
      to check whether the feature is enabled, and increments i_version in
      the VFS.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Jean Noel Cordenner <jean-noel.cordenner@bull.net>
      Signed-off-by: Kalpak Shah <kalpak@clusterfs.com>
    • G
      ext4: Add the journal checksum feature · 818d276c
      Girish Shilamkar committed
      The journal checksum feature adds two new flags, i.e.
      JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM.
      
      The JBD2_FEATURE_COMPAT_CHECKSUM flag indicates that the commit block
      contains the checksum for the blocks described by the descriptor blocks.
      Thanks to the checksums, writing the commit record no longer needs to be
      synchronous. The commit record can now be sent to disk without waiting
      for the descriptor blocks to be written to disk. This behavior is
      controlled by the JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT flag. Older
      kernels/e2fsck should not be able to recover a journal with
      _ASYNC_COMMIT, hence the flag is made incompat.
      The commit header has been extended to hold the checksum along with the
      type of the checksum.
      
      During the scan pass of recovery, checksums are verified to ensure the
      sanity and completeness (in the case of _ASYNC_COMMIT) of every
      transaction.
      Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
      Signed-off-by: Girish Shilamkar <girish@clusterfs.com>
      Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
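The idea of a commit record carrying a checksum over the transaction's blocks, which recovery then verifies, might be illustrated with the userspace sketch below. It is only an assumption-laden illustration: the bitwise CRC-32 used here, the function names, and the flat block buffer are all hypothetical, and the kernel uses its own CRC helper and commit-header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected form, polynomial 0xEDB88320). Pass 0 for the
 * first buffer, then pass the previous result to chain further buffers. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Checksum every block of one transaction, in log order. */
static uint32_t transaction_checksum(const unsigned char *blocks,
                                     size_t nblocks, size_t blocksize)
{
    uint32_t crc = 0;
    for (size_t i = 0; i < nblocks; i++)
        crc = crc32_update(crc, blocks + i * blocksize, blocksize);
    return crc;
}

/* Recovery-side check: returns 1 when the blocks found in the log match
 * the checksum stored in the commit record, 0 otherwise. */
static int transaction_is_complete(const unsigned char *blocks, size_t nblocks,
                                   size_t blocksize, uint32_t stored)
{
    return transaction_checksum(blocks, nblocks, blocksize) == stored;
}
```

Because a missing or partially written block changes the checksum, the commit record can be issued without first waiting for the other blocks: an incomplete transaction is detected at recovery time instead.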
    • J
      jbd2: jbd2 stats through procfs · 8e85fb3f
      Johann Lombardi committed
      The patch below updates the jbd stats patch to 2.6.20/jbd2.
      The initial patch was posted by Alex Tomas in December 2005
      (http://marc.info/?l=linux-ext4&m=113538565128617&w=2).
      It provides statistics via procfs such as transaction lifetime and size.
      
      Sometimes, when investigating performance problems, I find it useful to
      have stats from jbd about a transaction's lifetime, size, etc. Here is a
      patch for review and, probably, inclusion.
      
      For example, stats after creating 3M files in an htree directory:
      
      [root@bob ~]# cat /proc/fs/jbd/sda/history
      R/C  tid   wait  run   lock  flush log   hndls  block inlog ctime write drop  close
      R    261   8260  2720  0     0     750   9892   8170  8187
      C    259                                                    750   0     4885  1
      R    262   20    2200  10    0     770   9836   8170  8187
      R    263   30    2200  10    0     3070  9812   8170  8187
      R    264   0     5000  10    0     1340  0      0     0
      C    261                                                    8240  3212  4957  0
      R    265   8260  1470  0     0     4640  9854   8170  8187
      R    266   0     5000  10    0     1460  0      0     0
      C    262                                                    8210  2989  4868  0
      R    267   8230  1490  10    0     4440  9875   8171  8188
      R    268   0     5000  10    0     1260  0      0     0
      C    263                                                    7710  2937  4908  0
      R    269   7730  1470  10    0     3330  9841   8170  8187
      R    270   0     5000  10    0     830   0      0     0
      C    265                                                    8140  3234  4898  0
      C    267                                                    720   0     4849  1
      R    271   8630  2740  20    0     740   9819   8170  8187
      C    269                                                    800   0     4214  1
      R    272   40    2170  10    0     830   9716   8170  8187
      R    273   40    2280  0     0     3530  9799   8170  8187
      R    274   0     5000  10    0     990   0      0     0
      
      
      where,
      
      R     - line for transaction's life from T_RUNNING to T_FINISHED
      C     - line for transaction's checkpointing
      tid   - transaction's id
      wait  - for how long we were waiting for new transaction to start
               (the longest period journal_start() took in this transaction)
      run   - real transaction's lifetime (from T_RUNNING to T_LOCKED)
      lock  - how long we were waiting for all handles to close
               (time the transaction was in T_LOCKED)
      flush - how long it took to flush all data (data=ordered)
      log   - how long it took to write the transaction to the log
      hndls - how many handles got to the transaction
      block - how many blocks got to the transaction
      inlog - how many blocks are written to the log (block + descriptors)
      ctime - how long it took to checkpoint the transaction
      write - how many blocks have been written during checkpointing
      drop  - how many blocks have been dropped during checkpointing
      close - how many running transactions have been closed to checkpoint this one
      
      all times are in msec.
      
      
      [root@bob ~]# cat /proc/fs/jbd/sda/info
      280 transaction, each upto 8192 blocks
      average:
        1633ms waiting for transaction
        3616ms running transaction
        5ms transaction was being locked
        1ms flushing data (in ordered mode)
        1799ms logging transaction
        11781 handles per transaction
        5629 blocks per transaction
        5641 logged blocks per transaction
      Signed-off-by: Johann Lombardi <johann.lombardi@bull.net>
      Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    • A
      ext4: Convert truncate_mutex to read write semaphore. · 0e855ac8
      Aneesh Kumar K.V committed
      We currently take the truncate_mutex for every read. This hurts
      performance on large CPU configurations. Convert the lock to a read-write
      semaphore and take the read lock when reading the file.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Make ext4_get_blocks_wrap take the truncate_mutex early. · c278bfec
      Aneesh Kumar K.V committed
      When migrating an inode from ext3 to ext4, we need to make sure that the
      test for the inode type and the walk over the inode data happen inside
      the lock. To make this happen, take truncate_mutex early, before
      checking the i_flags.
      
      This should actually enable us to remove verify_chain().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • J
      jbd2: Fix assertion failure in fs/jbd2/checkpoint.c · f5a7a6b0
      Jan Kara committed
      Before we start committing a transaction, we call
      __journal_clean_checkpoint_list() to cleanup transaction's written-back
      buffers.
      
      If this call happens to remove all of them (and there were already some
      buffers), __journal_remove_checkpoint() will decide to free the transaction
      because it isn't (yet) a committing transaction and soon we fail some
      assertion - the transaction really isn't ready to be freed :).
      
      We change the check in __journal_remove_checkpoint() to free only a
      transaction in T_FINISHED state.  The locking there is subtle though (as
      everywhere in JBD ;().  We use j_list_lock to protect the check and a
      subsequent call to __journal_drop_transaction() and do the same at the end
      of journal_commit_transaction() which is the only place where a transaction
      can get to T_FINISHED state.
      
      Probably I'm too paranoid here and such locking is not really necessary -
      checkpoint lists are processed only from log_do_checkpoint() where a
      transaction must be already committed to be processed or from
      __journal_clean_checkpoint_list() where kjournald itself calls it and thus
      transaction cannot change state either.  Better be safe if something
      changes in future...
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • C
      jbd2: Remove printk from J_ASSERT to preserve registers during BUG · 36df53f4
      Chris Snook committed
      Signed-off-by: Chris Snook <csnook@redhat.com>
      Cc: "Stephen C. Tweedie" <sct@redhat.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • A
      Add buffer head related helper functions · 389d1b08
      Aneesh Kumar K.V committed
      Add the buffer-head helper functions bh_uptodate_or_lock and
      bh_submit_read, which can be used by file systems.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • C
      ext4: sync up block group descriptor with e2fsprogs. · 91b51a01
      Coly Li committed
      This patch extends bg_itable_unused of the ext4 group descriptor
      from 16 bits to 32 bits. In order to add bg_itable_unused_hi to
      struct ext4_group_desc, some extra fields that have already been
      introduced in e2fsprogs are also added for consistency.
      Signed-off-by: Coly Li <coyli@suse.de>
      Cc: Andreas Dilger <adilger@clusterfs.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
    • E
      ext4: store maxbytes for bitmapped files and return EFBIG as appropriate · e2b46574
      Eric Sandeen committed
      Calculate & store the max offset for bitmapped files, and
      catch too-large seeks, truncates, and writes in ext4, shortening
      or rejecting as appropriate.
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
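As a rough illustration of where a maximum offset for bitmapped (indirect-block-mapped) files comes from, the sketch below computes the addressing bound of the classic layout: 12 direct pointers plus single, double, and triple indirect trees of 4-byte block pointers. The function names are hypothetical, and the kernel's actual calculation may apply further limits that this sketch leaves out.

```c
#include <stdint.h>

/* Blocks addressable through 12 direct pointers plus single, double and
 * triple indirect blocks, assuming 4-byte block pointers. */
static uint64_t indirect_max_blocks(uint32_t blocksize)
{
    uint64_t ptrs = blocksize / 4;  /* pointers per indirect block */
    return 12 + ptrs + ptrs * ptrs + ptrs * ptrs * ptrs;
}

/* Addressing bound expressed in bytes. */
static uint64_t indirect_max_bytes(uint32_t blocksize)
{
    return indirect_max_blocks(blocksize) * blocksize;
}

/* Returns 0 when a write of len bytes at pos stays within max_bytes and
 * -1 otherwise (where a filesystem would report EFBIG). The comparison
 * is written so that pos + len cannot overflow. */
static int check_offset(uint64_t pos, uint64_t len, uint64_t max_bytes)
{
    if (pos > max_bytes || len > max_bytes - pos)
        return -1;
    return 0;
}
```

For a 4K block size this bound is a little over 4 TiB; a write that would cross it is either shortened or rejected, as the commit message describes.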
    • A
      ext4: Support large files · 8180a562
      Aneesh Kumar K.V committed
      This patch converts ext4_inode i_blocks to represent the total
      blocks occupied by the inode in file system block size units.
      Previously the variable represented this in 512-byte
      units, which limited the total size of the file.
      
      The feature is enabled transparently when we write an inode
      whose i_blocks cannot be represented as 512-byte units in a
      48-bit variable.
      
      Inode flag: EXT4_HUGE_FILE_FL
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Add support for 48 bit inode i_blocks. · 0fc1b451
      Aneesh Kumar K.V committed
      Use the __le16 l_i_reserved1 field of the linux2 struct of ext4_inode
      to represent the higher 16 bits of i_blocks. With this change the
      maximum file size becomes (2**48 - 1) * 512 bytes.
      
      We add a RO_COMPAT feature to the super block to indicate that inodes
      have i_blocks represented as a split 48 bits. A super block with this
      feature set cannot be mounted read-write on a kernel with CONFIG_LSF
      disabled.
      
      Super block flag: EXT4_FEATURE_RO_COMPAT_HUGE_FILE
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
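The split 48-bit i_blocks value and the resulting size limit can be worked through with a small sketch. The helper names are hypothetical (the commit only states that a 16-bit field holds the upper bits), and on-disk byte order is ignored.

```c
#include <stdint.h>

/* Combine the 32-bit low part and the 16-bit high part into one
 * 48-bit block count. */
static uint64_t iblocks_combine(uint32_t lo, uint16_t high)
{
    return ((uint64_t)high << 32) | lo;
}

/* Split a block count (assumed to fit in 48 bits) into its two parts. */
static void iblocks_split(uint64_t blocks, uint32_t *lo, uint16_t *high)
{
    *lo = (uint32_t)blocks;
    *high = (uint16_t)(blocks >> 32);
}

/* Largest size expressible when i_blocks counts 512-byte units:
 * (2**48 - 1) * 512 bytes, i.e. 2**57 - 512. */
static uint64_t max_bytes_in_512_units(void)
{
    return ((1ULL << 48) - 1) * 512;
}
```

Counting in file system blocks instead of 512-byte units, as the EXT4_HUGE_FILE_FL commit does, raises this ceiling by a factor of blocksize / 512.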
    • A
      ext4: Rename i_dir_acl to i_size_high · a48380f7
      Aneesh Kumar K.V committed
      Rename ext4_inode.i_dir_acl to i_size_high.
      Drop ext4_inode_info.i_dir_acl as it is not used.
      Rename ext4_inode.i_size to ext4_inode.i_size_lo.
      Add a helper function for accessing the combined ext4_inode i_size.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Rename i_file_acl to i_file_acl_lo · 7973c0c1
      Aneesh Kumar K.V committed
      Rename i_file_acl to i_file_acl_lo. This helps
      in finding bugs where we use i_file_acl instead
      of the combined i_file_acl_lo and i_file_acl_high.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Fix sparse warnings. · 1d03ec98
      Aneesh Kumar K.V committed
      Fix sparse warnings related to static functions
      and local variables.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: Introduce ext4_update_*_feature · 99e6f829
      Aneesh Kumar K.V committed
      Introduce ext4_update_*_feature and use them instead
      of open-coding.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    • A
      ext4: add ext4_group_t, and change all group variables to this type. · fd2d4291
      Avantika Mathur committed
      In many places variables for block group are of type int, which limits the
      maximum number of block groups to 2^31.  Each block group can have up to
      2^15 blocks, with a 4K block size,  and the max filesystem size is limited to
      2^31 * (2^15 * 2^12) = 2^58  -- or 256 PB
      
      This patch introduces a new type ext4_group_t, of type unsigned long, to
      represent block group numbers in ext4.
      All occurrences of block group variables are converted to type ext4_group_t.
      Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
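The size arithmetic in the commit message above, 2^31 * (2^15 * 2^12) = 2^58, can be checked with a short sketch. The function names are hypothetical; the only background fact assumed is that one bitmap block tracks a group with one bit per block, so a group holds at most 8 * blocksize blocks.

```c
#include <stdint.h>

/* One bitmap block covers a group, one bit per block, so a group holds
 * at most 8 * blocksize blocks (2^15 for a 4K block size). */
static uint64_t max_blocks_per_group(uint64_t blocksize)
{
    return 8 * blocksize;
}

/* Upper bound on filesystem size in bytes for a given number of groups. */
static uint64_t max_fs_bytes(uint64_t ngroups, uint64_t blocksize)
{
    return ngroups * max_blocks_per_group(blocksize) * blocksize;
}
```

Calling `max_fs_bytes(1ULL << 31, 4096)` gives 2^58 bytes, and shifting that right by 50 gives the 256 PB figure quoted in the message.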
    • A
      ext4: Introduce ext4_lblk_t · 725d26d3
      Aneesh Kumar K.V committed
      This patch adds a new data type ext4_lblk_t to represent
      the logical file blocks.
      
      This is the preparatory patch to support large files in ext4.
      The follow-up patch will convert the ext4_inode i_blocks to
      represent the number of blocks in file system block size. This
      change makes it possible to have a block number of 2**32 - 1, which
      would overflow if the block number were represented by a
      signed long. This patch converts all block numbers to type
      ext4_lblk_t, which is a typedef of __u32.
      
      Also remove the dead code ext4_ext_walk_space.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
    • J
      ext4: Avoid rec_len overflow with 64KB block size · a72d7f83
      Jan Kara committed
      With a 64KB blocksize, a directory entry can have a size of 64KB, which
      does not fit into the 16 bits we have for the entry length. So we store
      0xffff instead and convert the value when it is read from / written to
      disk. The patch also converts some places to use ext4_next_entry() when
      we are changing them anyway.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
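The conversion described above, storing 0xffff on disk to stand for a 64KB record length, might look roughly like the sketch below. The macro and function names are hypothetical rather than the kernel's, and on-disk byte order is ignored. The sentinel is safe under the assumption that real record lengths are multiples of 4, so 0xffff can never be a genuine length.

```c
#include <stdint.h>

/* Hypothetical name for the on-disk sentinel that stands for 64KB. */
#define REC_LEN_64K_ON_DISK 0xffffu

/* In-memory length (up to 65536) to 16-bit on-disk value. A plain cast
 * of 65536 to 16 bits would give 0, so 64KB maps to the sentinel. */
static uint16_t rec_len_to_disk(uint32_t len)
{
    if (len == (1u << 16))
        return REC_LEN_64K_ON_DISK;
    return (uint16_t)len;
}

/* 16-bit on-disk value back to the in-memory length. */
static uint32_t rec_len_from_disk(uint16_t dlen)
{
    if (dlen == REC_LEN_64K_ON_DISK)
        return 1u << 16;
    return dlen;
}
```

Every ordinary length round-trips unchanged; only the single value that would otherwise wrap to zero takes the sentinel path.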
    • T
      ext4: Support large blocksize up to PAGESIZE · afc7cbca
      Takashi Sato committed
      This patch set supports large block sizes (>4k, <=64k) in ext4
      by just enlarging the block size limit. But it is NOT possible to have
      a 64kB blocksize on ext4 without some changes to the directory handling
      code.  The reason is that an empty 64kB directory block would have a
      rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in
      the filesystem.  The proposed solution is to represent a 64k rec_len
      with an impossible value such as rec_len = 0xffff.
      
      The patch set consists of the following 2 patches.
        [1/2]  ext4: enlarge blocksize
               - Allow blocksize up to pagesize
      
        [2/2]  ext4: fix rec_len overflow
               - prevent rec_len from overflow with 64KB blocksize
      
      With this patch set applied on a ppc64 box with 64k pages, we can create
      a 64k block size ext4dev filesystem that handles empty directory blocks.
      Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
  2. 28 Jan, 2008 18 commits