1. 16 September 2009, 1 commit
    • HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Committed by Andi Kleen
      Enable removal of corrupted pages through truncation
      for a set of file systems: ext*, xfs, gfs2, ocfs2, ntfs.
      These should cover most server needs.

      For now I chose the set of migration aware file systems,
      assuming they have been especially audited.
      But in general it should be safe for the data area of any
      file system that supports read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
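
      As an illustration of what this enables, here is a minimal sketch of the
      wiring, assuming the address_space_operations hook and generic helper are
      named error_remove_page and generic_error_remove_page (as in mainline
      around this time); the other method names are placeholders:

      /* Sketch only: opt a file system's data mapping into hwpoison
       * handling by pointing the (assumed) error_remove_page hook at the
       * generic helper, which truncates the poisoned page out of the file. */
      static const struct address_space_operations ext2_aops = {
      	.readpage		= ext2_readpage,
      	.writepage		= ext2_writepage,
      	/* ... other methods unchanged ... */
      	.error_remove_page	= generic_error_remove_page,
      };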
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
  2. 14 September 2009, 1 commit
  3. 24 June 2009, 1 commit
  4. 12 June 2009, 1 commit
  5. 14 April 2009, 1 commit
    • ext2: fix data corruption for racing writes · 316cb4ef
      Committed by Jan Kara
      If two writers allocating blocks to a file race with each other (e.g.
      because writepages races with an ordinary write, or two writepages race
      with each other), ext2_getblock() can be called on the same inode in
      parallel.  Before we go on to allocate new blocks, we have to recheck the
      block chain we obtained so far, since it was read without holding
      truncate_mutex.  Otherwise we could overwrite an indirect block pointer
      set by the other writer, leading to data loss.
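
      A conceptual sketch of that recheck (not the actual patch; the names
      truncate_mutex, verify_chain() and ext2_get_branch() follow
      fs/ext2/inode.c of that era, and release_chain() is a hypothetical
      helper standing in for the brelse() cleanup loop):

      /* Inside the block allocation path, after taking the lock that
       * serializes allocation and truncation for this inode. */
      mutex_lock(&EXT2_I(inode)->truncate_mutex);
      if (!verify_chain(chain, partial)) {
      	/* Another writer changed the chain while it was read without
      	 * the mutex: drop it and walk the branch again rather than
      	 * overwriting the indirect pointer it installed. */
      	release_chain(chain, partial);	/* hypothetical helper */
      	partial = ext2_get_branch(inode, depth, offsets, chain, &err);
      }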
      
      The test program below, by Ying, is able to reproduce the data loss with
      ext2 on a BRD (RAM-backed block) device in a few minutes if the machine
      is under memory pressure:
      
      /* mmap a file, fill it with 1s, then scan for pages that have gone
       * back to zero, which indicates lost writes. */
      #include <stdio.h>
      #include <string.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <sys/mman.h>

      long kMemSize  = 50 << 20;
      int kPageSize = 4096;

      int main(int argc, char **argv) {
      	int status;
      	int count = 0;
      	int i;
      	char *fname = "/mnt/test.mmap";
      	char *mem;
      	unlink(fname);
      	int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
      	status = ftruncate(fd, kMemSize);
      	mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      	// Fill the memory with 1s.
      	memset(mem, 1, kMemSize);
      	sleep(2);
      	// Any page whose first byte reads back as 0 lost its data.
      	for (i = 0; i < kMemSize; i++) {
      		int byte_good = mem[i] != 0;
      		if (!byte_good && ((i % kPageSize) == 0)) {
      			//printf("%d ", i / kPageSize);
      			count++;
      		}
      	}
      	munmap(mem, kMemSize);
      	close(fd);
      	unlink(fname);

      	if (count > 0) {
      		printf("Found %d bad pages\n", count);
      		return 1;
      	}
      	return 0;
      }
      
      Cc: Ying Han <yinghan@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 26 March 2009, 1 commit
  7. 09 January 2009, 1 commit
  8. 01 January 2009, 1 commit
  9. 04 October 2008, 1 commit
    • generic block based fiemap implementation · 68c9d702
      Committed by Josef Bacik
      Any block based fs (this patch includes ext3) just has to declare its own
      fiemap() function and then call this generic function with its own
      get_block_t.  This works best for block based filesystems that map
      multiple contiguous blocks at one time, but it also works for filesystems
      that only map one block at a time; you will just end up with an "extent"
      for each block.

      One gotcha: this will not play nicely where there is a hole followed by
      data after the EOF.  The function assumes it has hit the end of the data
      as soon as it hits a hole after the EOF, so any data past that will not
      be picked up.  AFAIK no block based fs does this anyway, but it is noted
      in the function's comments just in case.
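
      A sketch of the wiring described above, modeled on the ext3 hookup and
      assuming the helper keeps its mainline name and signature
      (generic_block_fiemap() taking the filesystem's get_block_t):

      /* Sketch: a block based fs delegates its fiemap() to the generic
       * helper, passing its own get_block_t so the helper can walk the
       * block mappings and report contiguous runs as extents. */
      int ext3_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
      		u64 start, u64 len)
      {
      	return generic_block_fiemap(inode, fieinfo, start, len,
      				    ext3_get_block);
      }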
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
  10. 29 July 2008, 1 commit
    • vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Committed by Hisashi Hifumi
      When we read part of a file through the pagecache and there is a page at
      the corresponding index but it is not uptodate, read IO is issued and the
      page is brought uptodate.

      This is fine for a pagesize == blocksize environment, but there is room
      for improvement when pagesize != blocksize, because in that case a page
      can have multiple buffers, and even if the page is not uptodate some of
      its buffers can be.

      So I suggest that when all buffers covering the part of the file we want
      to read are uptodate, we use the pagecache and copy data from it to the
      user buffer even though the page itself is not uptodate.  This can reduce
      read IO and improve system throughput.
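
      A minimal sketch of how a filesystem would opt in, assuming the new
      address_space operation is named is_partially_uptodate and the generic
      buffer-head check is block_is_partially_uptodate(); the other method
      names are placeholders:

      /* Sketch: let the read path consult buffer heads when the page as a
       * whole is not uptodate but the requested range might be. */
      static const struct address_space_operations ext3_ordered_aops = {
      	.readpage		= ext3_readpage,
      	/* ... other methods unchanged ... */
      	.is_partially_uptodate	= block_is_partially_uptodate,
      };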
      
      I wrote a benchmark program and got result number with this program.
      
      This benchmark does the following:

        1: mount and open a test file.

        2: create a 512MB file.

        3: close the file and umount.

        4: mount again and open the test file.

        5: pwrite randomly 300000 times on the test file.  The offset is
           aligned to the IO size (1024 bytes).

        6: measure the time of 100000 random preads on the test file.
      
      The result was:
      	2.6.26
              330 sec
      
      	2.6.26-patched
              226 sec
      
      Arch: i386
      Filesystem: ext3
      Blocksize: 1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffers/blocks, so mixed random
      read/write workloads, or random reads after random writes, are optimized
      by this patch when pagesize != blocksize.  This test result shows that.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024
      #define LOOP 1024*512 /* 512MB */
      
      int main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0644);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      
      	filesize = LEN * LOOP;
      	for (i = 0; i < 300000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      	printf("start test\n");
      	time(&t1);
      	for (i = 0; i < 100000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", t2-t1);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      	return 0;
      }
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 28 April 2008, 3 commits
  12. 22 April 2008, 1 commit
  13. 08 February 2008, 1 commit
  14. 07 February 2008, 2 commits
  15. 17 October 2007, 4 commits
  16. 09 May 2007, 2 commits
    • ext3: copy i_flags to inode flags on write · 4f99ed67
      Committed by Jan Kara
      Propagate flags such as S_APPEND, S_IMMUTABLE, etc. from the VFS i_flags
      into the ext2-specific i_flags.  Hence, when someone sets these flags via
      an interface other than the ioctl, they are stored correctly.
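
      A sketch of the direction of the copy, using the ext2 flag names the body
      mentions; the helper name and the exact flag set are assumptions for
      illustration, not the exact patch:

      /* Sketch: mirror generic inode flags back into the fs-specific
       * i_flags before they get written to disk. */
      static void ext2_get_inode_flags(struct ext2_inode_info *ei)
      {
      	unsigned int flags = ei->vfs_inode.i_flags;

      	ei->i_flags &= ~(EXT2_SYNC_FL | EXT2_APPEND_FL | EXT2_IMMUTABLE_FL |
      			 EXT2_NOATIME_FL | EXT2_DIRSYNC_FL);
      	if (flags & S_SYNC)
      		ei->i_flags |= EXT2_SYNC_FL;
      	if (flags & S_APPEND)
      		ei->i_flags |= EXT2_APPEND_FL;
      	if (flags & S_IMMUTABLE)
      		ei->i_flags |= EXT2_IMMUTABLE_FL;
      	if (flags & S_NOATIME)
      		ei->i_flags |= EXT2_NOATIME_FL;
      	if (flags & S_DIRSYNC)
      		ei->i_flags |= EXT2_DIRSYNC_FL;
      }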
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ext2/3/4: fix file date underflow on ext2 3 filesystems on 64 bit systems · 4d7bf11d
      Committed by Markus Rechberger
      Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079
      
      On 32-bit x86, a signed long ranges from -2,147,483,648 to 2,147,483,647.

      10000011110110100100111110111101 = -2,082,844,739 (as a signed 32-bit value)
      10000011110110100100111110111101 =  2,212,122,557 (as unsigned) <- this is
      what currently gets stored on the disk; when it is converted to a 64-bit
      signed long it loses its sign and becomes positive.
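
      A tiny stand-alone illustration of that widening behaviour, using the bit
      pattern quoted above (variable names are just for the example):

      #include <stdio.h>
      #include <stdint.h>

      int main(void)
      {
      	uint32_t on_disk = 0x83DA4FBDu;	/* the bit pattern above */

      	/* Widened without a signed cast: the sign is lost. */
      	printf("%lld\n", (long long)on_disk);		/* 2212122557 */
      	/* Widened through a signed 32-bit type: the pre-1970 date survives. */
      	printf("%lld\n", (long long)(int32_t)on_disk);	/* -2082844739 */
      	return 0;
      }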
      
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: <linux-ext4@vger.kernel.org>
      
      Andreas says:
      
      This patch is now treating timestamps with the high bit set as negative
      times (before Jan 1, 1970).  This means we lose 1/2 of the possible range
      of timestamps (lopping off 68 years before unix timestamp overflow -
      now only 30 years away :-) to handle the extremely rare case of setting
      timestamps into the distant past.
      
      If we are only interested in fixing the underflow case, we could just
      limit the values to 0 instead of storing negative values.  At worst this
      will skew the timestamp by a few hours for timezones in the far east
      (files would still show Jan 1, 1970 in "ls -l" output).
      
      That said, it seems 32-bit systems (mine at least) allow files to be set
      into the past (01/01/1907 works fine) so it seems this patch is bringing
      the x86_64 behaviour into sync with other kernels.
      
      On the plus side, we have a patch that is ready to add nanosecond timestamps
      to ext3 and as an added bonus adds 2 high bits to the on-disk timestamp so
      this extends the maximum date to 2242.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 27 September 2006, 1 commit
  18. 29 June 2006, 1 commit
  19. 27 March 2006, 1 commit
  20. 02 February 2006, 1 commit
  21. 31 October 2005, 1 commit
  22. 10 September 2005, 1 commit
  23. 24 June 2005, 1 commit
  24. 17 April 2005, 2 commits
    • [PATCH] ext2 corruption - regression between 2.6.9 and 2.6.10 · e072c6f2
      Committed by Bernard Blackham
      Whilst trying to stress test a Promise SX8 card, we stumbled across
      some nasty filesystem corruption in ext2. Our tests involved
      creating an ext2 partition, mounting, running several concurrent
      fsx's over it, umounting, and fsck'ing, all scripted[1]. The fsck
      would always return with errors.
      
      This regression was traced back to a change between 2.6.9 and
      2.6.10 which moved the functionality of ext2_put_inode into
      ext2_clear_inode.  The attached patch reverses this change and
      eliminates the source of corruption.
      
      Mingming Cao <cmm@us.ibm.com> said:
      
      I think his patch for ext2 is correct.  The corruption on ext3 is not the
      same issue he saw on ext2.  I believe that is the race between discarding
      a reservation and a reservation being in use, which we already fixed in
      2.6.12-rc1.

      As for the ext2 problem: at the time we designed reservations for ext3,
      we decided we only need to discard the reservation at the last file
      close, so ext3_discard_reservation is called from
      iput_final()->ext3_clear_inode().

      ext2 handled discarding preallocation differently at that time: it
      discarded the preallocation at each iput(), not in iput_final(), so we
      thought it unnecessary to thrash it so frequently and that the right
      thing to do, as we did for the ext3 reservation, was to discard the
      preallocation on the last iput().  So we moved ext2_discard_preallocation
      from ext2_put_inode() to ext2_clear_inode().

      Since ext2 preallocation is done on disk, it is possible that at unmount
      time someone still holds a reference to the inode, so the preallocation
      for a file has not been discarded yet; those blocks are still marked
      allocated on disk while they are not actually in the inode's block map,
      and fsck will catch/fix that error later.

      This is not an issue for ext3, as the ext3 reservation (pre-allocation)
      is done in memory.
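
      For reference, a minimal sketch of the pre-2.6.10 behaviour the revert
      restores; the helper name ext2_discard_prealloc() is assumed from memory
      of ext2 in that era, so the snippet is illustrative only:

      /* Sketch only: drop the inode's on-disk preallocation on every
       * iput(), instead of deferring it to clear_inode at final teardown. */
      void ext2_put_inode(struct inode *inode)
      {
      	ext2_discard_prealloc(inode);
      }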
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • Linux-2.6.12-rc2 · 1da177e4
      Committed by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!