1. 09 Oct, 2008 1 commit
  2. 14 Sep, 2008 1 commit
  3. 08 Sep, 2008 1 commit
  4. 17 Sep, 2008 1 commit
    • jbd2: clean up how the journal device name is printed · 05496769
      Authored by Theodore Ts'o
      Calculate the journal device name once and stash it away in the
      journal_s structure.  This avoids needing to call bdevname()
      everywhere and reduces stack usage by not needing to allocate an
      on-stack buffer.  In addition, we eliminate the '/' that can appear in
      device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that
      can cause problems when creating proc directory names, and include the
      inode number to support ocfs2 which creates multiple journals with
      different inode numbers.
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
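As an illustration of the jbd2 change described above, here is a minimal user-space sketch of computing the name once and making it safe for use as a proc directory name. The function `make_journal_name` is hypothetical and is not the kernel code; the `'!'` replacement character and the `-` separator before the inode number are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not the kernel function): build a journal name
 * that is safe to use as a directory name. Every '/' in the device
 * name is replaced by '!' (assumed replacement character), and a
 * nonzero inode number is appended after a '-' (assumed separator) so
 * that several journals on one device get distinct names.
 */
void make_journal_name(char *out, size_t outlen,
                       const char *devname, unsigned long ino)
{
	char *p;

	assert(outlen > 0);
	if (ino)
		snprintf(out, outlen, "%s-%lu", devname, ino);
	else
		snprintf(out, outlen, "%s", devname);

	/* Replace each '/' in place; strchr resumes from the last hit. */
	for (p = out; (p = strchr(p, '/')) != NULL; )
		*p = '!';
}
```

A caller would fill a buffer once at journal setup time and reuse it in every message, instead of formatting the device name into an on-stack buffer at each call site.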
  5. 14 Sep, 2008 1 commit
  6. 08 Sep, 2008 1 commit
  7. 09 Oct, 2008 1 commit
    • ext4: Avoid printk floods in the face of directory corruption · 9d9f1775
      Authored by Eric Sandeen
      Note: some people think this represents a security bug, since it
      might make the system go away while it is printing a large number of
      console messages, especially if a serial console is involved.  Hence,
      it has been assigned CVE-2008-3528, but it requires that the attacker
      either has physical access to your machine to insert a USB disk with a
      corrupted filesystem image (at which point why not just hit the power
      button), or is otherwise able to convince the system administrator to
      mount an arbitrary filesystem image (at which point why not just
      include a setuid shell or world-writable hard disk device file or some
      such).  Me, I think they're just being silly. --tytso
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-ext4@vger.kernel.org
      Cc: Eugene Teo <eugeneteo@kernel.sg>
  8. 14 Sep, 2008 2 commits
  9. 09 Sep, 2008 3 commits
  10. 09 Oct, 2008 2 commits
  11. 10 Oct, 2008 1 commit
  12. 09 Sep, 2008 1 commit
  13. 09 Oct, 2008 1 commit
    • ext4: Make sure all the block allocation paths reserve blocks · a30d542a
      Authored by Aneesh Kumar K.V
      With delayed allocation we need to make sure blocks are reserved before
      we attempt to allocate them. Otherwise we get a block allocation failure
      (ENOSPC) during writepages, which cannot be handled. This would mean
      silent data loss (we do a printk stating that data will be lost). This patch
      updates the DIO and fallocate code paths to do block reservation before
      block allocation. This is needed to make sure parallel DIO and fallocate
      requests don't take blocks out of the delayed reserve space.
      
      When the free blocks count goes below a threshold we switch to a slow path
      which looks at other CPUs' accumulated percpu counter values.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
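The fast-path/slow-path idea in the last paragraph of the commit above can be sketched in user space as follows. This is not the kernel's percpu_counter implementation: `struct approx_counter`, `NR_CPUS_SKETCH`, `SLOW_THRESHOLD`, and both functions are hypothetical names and values chosen for illustration.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4     /* assumed CPU count for this sketch */
#define SLOW_THRESHOLD 64    /* assumed threshold, in blocks */

/* Hypothetical model of an approximate per-CPU counter. */
struct approx_counter {
	long global;                  /* cheap to read, possibly stale */
	long local[NR_CPUS_SKETCH];   /* per-CPU deltas not yet folded in */
};

/* Slow path: fold in every CPU's pending delta for an exact value. */
long counter_sum(const struct approx_counter *c)
{
	long sum = c->global;
	int i;

	for (i = 0; i < NR_CPUS_SKETCH; i++)
		sum += c->local[i];
	return sum;
}

/* Returns 1 if nblocks can be reserved, 0 if that would mean ENOSPC. */
int has_free_blocks(const struct approx_counter *freeb, long nblocks)
{
	long avail = freeb->global;

	assert(nblocks >= 0);
	/* Close to empty, the approximate value cannot be trusted. */
	if (avail - nblocks < SLOW_THRESHOLD)
		avail = counter_sum(freeb);
	return avail >= nblocks;
}
```

The design trade-off is that the exact sum touches every CPU's data, so it is only worth paying for when the filesystem is nearly full and an over-reservation would otherwise go unnoticed.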
  14. 20 Aug, 2008 1 commit
  15. 09 Sep, 2008 3 commits
  16. 10 Oct, 2008 1 commit
  17. 03 Aug, 2008 1 commit
  18. 01 Aug, 2008 1 commit
    • [PATCH] fix races and leaks in vfs_quota_on() users · 77e69dac
      Authored by Al Viro
      * new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the
        pathname resolution.
      * callers of vfs_quota_on() that do their own pathname resolution and
        checks based on it are switched to vfs_quota_on_path(); that way we
        avoid the races.
      * reiserfs leaked dentry/vfsmount references on several failure exits.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  19. 29 Jul, 2008 1 commit
    • vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Authored by Hisashi Hifumi
      When we read some part of a file through pagecache, if there is a
      pagecache of corresponding index but this page is not uptodate, read IO
      is issued and this page will be uptodate.
      
      I think this is good for pagesize == blocksize environment but there is
      room for improvement on pagesize != blocksize environment.  Because in
      this case a page can have multiple buffers and even if a page is not
      uptodate, some buffers can be uptodate.
      
      So I suggest that when all buffers which correspond to a part of a file
      that we want to read are uptodate, use this pagecache and copy data from
      this pagecache to user buffer even if a page is not uptodate.  This can
      reduce read IO and improve system throughput.
      
      I wrote a benchmark program and measured the results with it.
      
      This benchmark does the following:
      
        1: mount and open a test file.
      
        2: create a 512MB file.
      
        3: close a file and umount.
      
        4: mount and again open a test file.
      
        5: pwrite randomly 300000 times on a test file.  offset is aligned
           by IO size(1024bytes).
      
        6: measure time of preading randomly 100000 times on a test file.
      
      The result was:
      
        2.6.26:          330 sec
        2.6.26-patched:  226 sec
      
      Arch:i386
      Filesystem:ext3
      Blocksize:1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffer/block.  So random read/write
      mixed workloads or random read after random write workloads are optimized
      with this patch under pagesize != blocksize environment.  This test result
      showed this.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024		/* IO size in bytes */
      #define LOOP (1024*512)	/* LEN * LOOP = 512MB */
      
      int main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	/* Steps 1-3: mount, create a 512MB file, then umount. */
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	/* O_CREAT requires a mode argument. */
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
      	if (fd < 0) {
      		perror("cannot open file");
      		exit(1);
      	}
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount");
      		exit(1);
      	}
      	/* Step 4: mount again so the pagecache starts out cold. */
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file");
      		exit(1);
      	}
      
      	/* Step 5: random writes, offsets aligned to the IO size. */
      	filesize = LEN * LOOP;
      	for (i = 0; i < 300000; i++) {
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      	/* Step 6: time the random reads. */
      	printf("start test\n");
      	time(&t1);
      	for (i = 0; i < 100000; i++) {
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", (long)(t2 - t1));
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount");
      		exit(1);
      	}
      	return 0;
      }
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
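The check proposed in the vfs commit above, serving a read from a page that is not fully uptodate when every buffer covering the requested range is, can be modelled in user space roughly as follows. `struct page_model`, the size constants, and `range_is_uptodate` are hypothetical names for this sketch and do not correspond to kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ        4096u   /* assumed page size */
#define BLOCK_SZ       1024u   /* block size used in the benchmark */
#define BUFS_PER_PAGE  (PAGE_SZ / BLOCK_SZ)

/* Hypothetical model: one page backed by several block-sized buffers. */
struct page_model {
	bool page_uptodate;
	bool buf_uptodate[BUFS_PER_PAGE];
};

/*
 * True if bytes [from, from + count) of the page can be copied to the
 * user buffer without issuing read IO.
 */
bool range_is_uptodate(const struct page_model *pg, size_t from, size_t count)
{
	size_t first, last, i;

	assert(count > 0 && from < PAGE_SZ && count <= PAGE_SZ - from);
	if (pg->page_uptodate)
		return true;

	/* Only the buffers overlapping the requested range matter. */
	first = from / BLOCK_SZ;
	last = (from + count - 1) / BLOCK_SZ;
	for (i = first; i <= last; i++)
		if (!pg->buf_uptodate[i])
			return false;
	return true;
}
```

With pagesize == blocksize there is one buffer per page, so this check adds nothing; the benefit appears only when a page holds several blocks, as in the 1024-byte-block test setup.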
  20. 19 Aug, 2008 1 commit
  21. 20 Aug, 2008 9 commits
    • ext4: Initialize writeback_index to 0 when allocating a new inode · 91246c00
      Authored by Aneesh Kumar K.V
      The write_cache_pages() function uses the mapping->writeback_index as
      the starting index to write out when range_cyclic is set.  Properly
      initialize writeback_index so that we start the writeout at index 0.
      
      This was found when debugging the small file fragmentation on ext4.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: make sure ext4_has_free_blocks returns 0 for ENOSPC · 16eb7295
      Authored by Aneesh Kumar K.V
      Fix ext4_has_free_blocks() to return 0 when we don't have enough space.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: journal credit fix for the delayed allocation's writepages() function · 525f4ed8
      Authored by Mingming Cao
      The previous delalloc writepages implementation started a new transaction
      outside of a loop which called get_block() to do the block allocation.
      Since we didn't know exactly how many blocks would need to be allocated,
      the estimate of the journal credits required was very conservative and
      caused many issues.
      
      With the reworked delayed allocation, a new transaction is created for
      each get_block(), thus we don't need to guess how many credits are needed
      for multiple chunks of allocation.  We start every transaction with enough
      credits for inserting a single extent.  When estimating the credits for
      indirect blocks to allocate a chunk of blocks, we need to know the
      number of data blocks to allocate.  We use the total number of reserved
      delalloc datablocks; if that is too big, for non-extent files, we need
      to limit the number of blocks to EXT4_MAX_TRANS_BLOCKS.
      
      Code cleanup from Aneesh.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Reviewed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Rework the ext4_da_writepages() function · a1d6cc56
      Authored by Aneesh Kumar K.V
      With the below changes we reserve the credits needed to insert only one
      extent resulting from a single call to get_block.  This makes sure we
      don't take too many journal credits during writeout.  We also don't
      limit the pages to write.  That means we loop through the dirty pages
      building the largest possible contiguous block request.  Then we issue a
      single get_block request.  We may get fewer blocks than we requested.  If
      so we would end up not mapping some of the buffer_heads.  That means
      those buffer_heads are still marked delay.  Later in the writepage
      callback via __mpage_writepage we redirty those pages.
      
      We should also not limit/throttle wbc->nr_to_write in the filesystem
      writepages callback. That leads to wrong behaviour in
      generic_sync_sb_inodes, because wbc->nr_to_write ends up being <= 0.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Mingming Cao <cmm@us.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
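The "loop through the dirty pages building the largest possible contiguous block request" step from the commit above can be sketched as a simple run-merging pass. This is a user-space model, not the ext4 code: `struct extent` and `build_extents` are hypothetical, and the input is assumed to be a sorted list of dirty logical block numbers without duplicates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one contiguous allocation request. */
struct extent {
	unsigned long start;   /* first logical block of the run */
	unsigned long len;     /* number of blocks in the run */
};

/*
 * Merge sorted, duplicate-free dirty block numbers into contiguous
 * runs. Each run would become one get_block-style request. Returns the
 * number of extents written to out, which is at most max_out.
 */
size_t build_extents(const unsigned long *dirty, size_t n,
                     struct extent *out, size_t max_out)
{
	size_t i, used = 0;

	for (i = 0; i < n; i++) {
		assert(i == 0 || dirty[i] > dirty[i - 1]);
		if (used > 0 &&
		    out[used - 1].start + out[used - 1].len == dirty[i]) {
			out[used - 1].len++;   /* block extends the current run */
			continue;
		}
		if (used == max_out)
			break;                 /* no room for another run */
		out[used].start = dirty[i];
		out[used].len = 1;
		used++;
	}
	return used;
}
```

If the allocator maps fewer blocks than a run asks for, the unmapped tail simply stays dirty and is picked up again on a later pass, which corresponds to the redirtying described in the commit message.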
    • ext4: journal credits reservation fixes for DIO, fallocate · f3bd1f3f
      Authored by Mingming Cao
      DIO and fallocate credit calculation is different from writepage, as
      they start a new journal handle for each call to ext4_get_blocks_wrap().
      This patch uses the helper function in the DIO and fallocate cases, passing
      a flag indicating that the modified data are contiguous and thus could
      account for fewer indirect/index blocks.
      
      This patch also fixes the journal credit reservation for direct I/O
      (DIO).  Previously the estimated credits for DIO were calculated only for
      non-extent files, which was not enough if the file is extent-based.
      
      Also fixed was fallocate double-counting credits for modifying the
      superblock.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: journal credits reservation fixes for extent file writepage · ee12b630
      Authored by Mingming Cao
      This patch modifies the writepage/write_begin credit calculation for
      extent files to use the credits calculation helper function.
      
      The current calculation of how many index/leaf blocks should be
      accounted for is too conservative: it always considers the worst case,
      where the tree level is 5, and in the case of multiple chunk
      allocations, it always assumes no blocks are dirtied in common across
      the allocations. This patch uses the accurate depth of the inode with
      some extras to calculate the index blocks, and is also less conservative
      in the case of multiple allocation accounting.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: journal credits calculation cleanup and fix for non-extent writepage · a02908f1
      Authored by Mingming Cao
      When considering how many journal credits are needed for modifying a
      chunk of data, we need to account for the super block, inode block,
      quota blocks and xattr block, indirect/index blocks, and also the group
      bitmap and group descriptor blocks for new allocations (including data and
      indirect/index blocks). Many places in ext4 do the calculation
      on their own and often miss one or two metadata blocks; they often
      assume single block allocation and do not consider the multiple
      chunk allocation case.
      
      This patch tries to clean up the current journal credit code and provides
      common helper functions to calculate the journal credits, to be used
      for writepage, writepages, DIO, fallocate, migration, defrag, and for
      both nonextent and extent files.
      
      This patch modifies the writepage/write_begin credit calculation for
      nonextent files to use the new helper function. It also fixes the
      problem that writepage on nonextent files did not consider the case
      blocksize < pagesize, which could possibly need multiple block
      allocations in a single transaction.
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    • ext4: Fix bug where we return ENOSPC even though we have plenty of inodes · c001077f
      Authored by Eric Sandeen
      The find_group_flex() function starts with best_flex as the
      parent_fbg_group, which happens to have 0 inodes free.  Some of the
      flex groups searched have free blocks and free inodes, but the
      flex_freeb_ratio is < 10, so they're skipped.  Then when a group is
      compared to the current "best" flex group, it does not have more free
      blocks than "best", so it is skipped as well.
      
      This continues until no flex group with free inodes is found which has
      a proper ratio or which has more free blocks than the "best" group,
      and we're left with a "best" group that has 0 inodes free, and we
      return -ENOSPC.
      
      We fix this by changing the logic so that if the current "best" flex
      group has no inodes free, and the current one does have room, it is
      promoted to the next "best."
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
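The selection logic described in the find_group_flex() commit above can be sketched in user space as follows. This is a simplified model and not the ext4 function: `struct flex_group_model`, `pick_flex_group`, and the exact form of the ratio test are assumptions based only on the commit message (groups with a free-block ratio under 10 percent are not accepted outright).

```c
#include <assert.h>

#define MIN_FREE_RATIO 10   /* percent; assumed from the commit message */

/* Hypothetical per-flex-group summary. */
struct flex_group_model {
	unsigned long free_blocks;
	unsigned long free_inodes;
};

/*
 * Pick a flex group for a new inode, starting with the parent's group
 * as the "best" candidate. Returns the chosen index, or -1 when no
 * group has a free inode (the ENOSPC case).
 */
int pick_flex_group(const struct flex_group_model *fg, int n, int parent,
                    unsigned long blocks_per_flex)
{
	int i, best = parent;

	assert(n > 0 && parent >= 0 && parent < n && blocks_per_flex > 0);
	for (i = 0; i < n; i++) {
		unsigned long ratio;

		if (i == parent)
			continue;
		ratio = fg[i].free_blocks * 100 / blocks_per_flex;
		/* A group with a healthy ratio and free inodes wins outright. */
		if (ratio > MIN_FREE_RATIO && fg[i].free_inodes)
			return i;
		/*
		 * The fix: a candidate with free inodes replaces a "best"
		 * group that has none, even without more free blocks.
		 */
		if (fg[i].free_inodes &&
		    (fg[best].free_inodes == 0 ||
		     fg[i].free_blocks > fg[best].free_blocks))
			best = i;
	}
	return fg[best].free_inodes ? best : -1;
}
```

Without the `fg[best].free_inodes == 0` clause, a parent group with many free blocks but no free inodes would stay "best" to the end of the loop, which is the spurious ENOSPC the commit describes.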
    • ext4: don't try to resize if there are no reserved gdt blocks left · 37609fd5
      Authored by Josef Bacik
      When trying to resize an ext4 fs and you run out of reserved gdt blocks,
      you get an error that doesn't actually tell you what went wrong, it just
      says that the gdb it picked is not correct, which is the case since you
      don't have any reserved gdt blocks left.  This patch adds a check to make
      sure you have reserved gdt blocks to use, and if not prints out a more
      relevant error.
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Andreas Dilger <adilger@sun.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
  22. 16 Aug, 2008 1 commit
  23. 20 Aug, 2008 2 commits
  24. 14 Aug, 2008 1 commit
  25. 20 Aug, 2008 1 commit