1. 09 Sep, 2009 1 commit
  2. 06 Sep, 2009 1 commit
  3. 13 Jul, 2009 1 commit
  4. 01 Jul, 2009 1 commit
    • ext2: return -EIO not -ESTALE on directory traversal through deleted inode · 4d6c13f8
      Committed by Bryan Donlan
      ext2_iget() returns -ESTALE if invoked on a deleted inode, in order to
      report errors to NFS properly.  However, in ext[234]_lookup(), this
      -ESTALE can be propagated to userspace if the filesystem is corrupted such
      that a directory entry references a deleted inode.  This leads to a
      misleading error message - "Stale NFS file handle" - and confusion on the
      part of the admin.
      
      The bug can be easily reproduced by creating a new filesystem, making a
      link to an unused inode using debugfs, then mounting and attempting to ls
      -l said link.
      
      This patch thus changes ext2_lookup to return -EIO if it receives -ESTALE
      from ext2_iget(), as ext2 does for other filesystem metadata corruption;
      and also invokes the appropriate ext*_error functions when this case is
      detected.
      
      Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
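The error translation described in this commit can be sketched in user-space C. This is a minimal model under stated assumptions, not the kernel patch itself: `lookup_errno` is a hypothetical helper name, and the real change also reports the corruption through the ext2 error functions.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: translate the result of an inode read that was
 * triggered by a directory lookup. iget_err is 0 or a negative errno. */
static int lookup_errno(int iget_err)
{
	assert(iget_err <= 0);
	/* A directory entry pointing at a deleted inode is on-disk
	 * corruption, so report an I/O error rather than "stale handle". */
	if (iget_err == -ESTALE)
		return -EIO;
	return iget_err;
}
```

Any other error code passes through unchanged, so only the misleading "Stale NFS file handle" case is rewritten.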
  5. 24 Jun, 2009 2 commits
  6. 19 Jun, 2009 1 commit
    • ext2: Do not update mtime of a moved directory · 39fe7557
      Committed by Jan Kara
      One of our users is complaining that his backup tool is upset on ext2
      (while it's happy on ext3, xfs, ...) because of the mtime change.
      
      The problem is:
      
          mkdir foo
          mkdir bar
          mkdir foo/a
      
      Now under ext2:
          mv foo/a foo/b
      
      changes mtime of 'foo/a' (foo/b after the move).  That does not really
      make sense and it does not happen under any other filesystem I've seen.
      
      More complicated is:
          mv foo/a bar/a
      
      This changes mtime of foo/a (bar/a after the move) and it makes some
      sense since we had to update parent directory pointer of foo/a.  But
      again, no other filesystem does this.  So after some thought I'd vote
      for consistency and change ext2 to behave the same as other filesystems.
      
      Do not update mtime of a moved directory.  Specs don't say anything
      about it (neither that it should, nor that it should not be updated) and
      other common filesystems (ext3, ext4, xfs, reiserfs, fat, ...) don't do
      it.  So let's become more consistent.
      
      Spotted by ronny.pretzsch@dfs.de, initial fix by Jörn Engel.
      
      Reported-by: <ronny.pretzsch@dfs.de>
      Cc: <hare@suse.de>
      Cc: Jörn Engel <joern@logfs.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 13 Jun, 2009 1 commit
  8. 12 Jun, 2009 5 commits
  9. 18 May, 2009 1 commit
  10. 27 Apr, 2009 1 commit
  11. 14 Apr, 2009 1 commit
    • ext2: fix data corruption for racing writes · 316cb4ef
      Committed by Jan Kara
      If two writers allocating blocks to file race with each other (e.g.
      because writepages races with ordinary write or two writepages race with
      each other), ext2_getblock() can be called on the same inode in parallel.
      Before we are going to allocate new blocks, we have to recheck the block
      chain we have obtained so far without holding truncate_mutex.  Otherwise
      we could overwrite the indirect block pointer set by the other writer
      leading to data loss.
      
      The test program below, by Ying, is able to reproduce the data loss with
      ext2 on BRD in a few minutes if the machine is under memory pressure:
      
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <unistd.h>
      
      long kMemSize  = 50 << 20;
      int kPageSize = 4096;
      
      int main(int argc, char **argv) {
      	int status;
      	int count = 0;
      	int i;
      	char *fname = "/mnt/test.mmap";
      	char *mem;
      	unlink(fname);
      	int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
      	if (fd < 0) {
      		perror("open");
      		return 2;
      	}
      	status = ftruncate(fd, kMemSize);
      	if (status < 0) {
      		perror("ftruncate");
      		return 2;
      	}
      	mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      	if (mem == MAP_FAILED) {
      		perror("mmap");
      		return 2;
      	}
      	// Fill the memory with 1s.
      	memset(mem, 1, kMemSize);
      	sleep(2);
      	// Count pages whose first byte reads back as 0 (lost data).
      	for (i = 0; i < kMemSize; i++) {
      		int byte_good = mem[i] != 0;
      		if (!byte_good && ((i % kPageSize) == 0)) {
      			//printf("%d ", i / kPageSize);
      			count++;
      		}
      	}
      	munmap(mem, kMemSize);
      	close(fd);
      	unlink(fname);
      
      	if (count > 0) {
      		printf("Running %d bad page\n", count);
      		return 1;
      	}
      	return 0;
      }
      
      Cc: Ying Han <yinghan@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
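The "recheck before allocating" idea from this commit can be illustrated with a toy user-space model. Everything here is an assumption made for illustration: `toy_inode`, `toy_get_block`, and the flat `slot` array stand in for the real inode and its indirect block chain, and the `__atomic` builtins are GCC/Clang extensions.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NSLOTS 16

/* Toy inode: slot[i] == 0 means "no block allocated yet". */
struct toy_inode {
	pthread_mutex_t truncate_mutex;
	unsigned long slot[NSLOTS];
	unsigned long next_free;	/* next block number to hand out */
};

static void toy_inode_init(struct toy_inode *ino)
{
	pthread_mutex_init(&ino->truncate_mutex, NULL);
	for (size_t i = 0; i < NSLOTS; i++)
		ino->slot[i] = 0;
	ino->next_free = 1;
}

static unsigned long toy_get_block(struct toy_inode *ino, size_t idx)
{
	unsigned long blk;

	assert(idx < NSLOTS);
	/* Fast path: look at the mapping without holding the lock. */
	blk = __atomic_load_n(&ino->slot[idx], __ATOMIC_ACQUIRE);
	if (blk)
		return blk;

	pthread_mutex_lock(&ino->truncate_mutex);
	/* Recheck under the lock: another writer may have filled the
	 * slot since the unlocked read. Allocating blindly here would
	 * overwrite that writer's pointer and lose its data. */
	blk = ino->slot[idx];
	if (!blk) {
		blk = ino->next_free++;
		__atomic_store_n(&ino->slot[idx], blk, __ATOMIC_RELEASE);
	}
	pthread_mutex_unlock(&ino->truncate_mutex);
	return blk;
}
```

Two callers asking for the same index always end up with the same block number, whichever of them takes the lock first.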
  12. 01 Apr, 2009 1 commit
  13. 26 Mar, 2009 2 commits
  14. 12 Feb, 2009 1 commit
    • ext2/xip: refuse to change xip flag during remount with busy inodes · 0e4a9b59
      Committed by Carsten Otte
      For a reason that I was unable to understand in three months of debugging,
      mount ext2 -o remount stopped working properly when remounting from
      regular operation to xip, or the other way around.  According to a git
      bisect search, the problem was introduced with the VM_MIXEDMAP/PTE_SPECIAL
      rework in the vm:
      
      commit 70688e4d
      Author: Nick Piggin <npiggin@suse.de>
      Date:   Mon Apr 28 02:13:02 2008 -0700
      
          xip: support non-struct page backed memory
      
      In the failing scenario, the filesystem is mounted read only via root=
      kernel parameter on s390x.  During remount (in rc.sysinit), the inodes of
      the bash binary and its libraries are busy and cannot be invalidated (the
      bash which is running rc.sysinit resides on subject filesystem).
      Afterwards, another bash process (running ifup-eth) recurses into a
      subshell, runs dup_mm (via fork).  Some of the mappings in this bash
      process were created from inodes that could not be invalidated during
      remount.
      
      Both parent and child process crash some time later due to inconsistencies
      in their address spaces.  The issue seems to be timing sensitive; various
      attempts to recreate it have failed.
      
      This patch refuses to change the xip flag during remount in case some
      inodes cannot be invalidated.  This patch keeps users from running into
      that issue.
      
      [akpm@linux-foundation.org: cleanup]
      Signed-off-by: Carsten Otte <cotte@de.ibm.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jared Hulbert <jaredeh@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
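The remount rule from this commit can be modelled as a small pure function. This is only a sketch: `MOUNT_XIP`, `remount_opts`, and the `inodes_busy` parameter are hypothetical names, and the sketch assumes the old xip setting is simply kept when inodes cannot be invalidated.

```c
#include <assert.h>
#include <stdbool.h>

#define MOUNT_XIP 0x1u	/* hypothetical flag bit for the xip option */

/* Return the mount options that take effect after a remount request. */
static unsigned int remount_opts(unsigned int old_opts,
				 unsigned int new_opts, bool inodes_busy)
{
	bool xip_changes = ((old_opts ^ new_opts) & MOUNT_XIP) != 0;

	if (xip_changes && inodes_busy) {
		/* Refuse the xip change only: restore the old xip bit
		 * and let every other requested option through. */
		new_opts &= ~MOUNT_XIP;
		new_opts |= old_opts & MOUNT_XIP;
	}
	assert(!(xip_changes && inodes_busy) ||
	       (new_opts & MOUNT_XIP) == (old_opts & MOUNT_XIP));
	return new_opts;
}
```

For example, requesting xip on a filesystem with busy inodes leaves xip off, while the same request with no busy inodes turns it on.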
  15. 16 Jan, 2009 1 commit
  16. 09 Jan, 2009 4 commits
  17. 01 Jan, 2009 2 commits
    • nfsd race fixes: ext2 · 41080b5a
      Committed by Al Viro
      * make ext2_new_inode() put the inode into icache in locked state
      * do not unlock until the inode is fully set up; otherwise nfsd
      might pick it in half-baked state.
      * make sure that ext2_new_inode() does *not* lead to two inodes with the
      same inumber hashed at the same time; otherwise a bogus fhandle coming
      from nfsd might race with inode creation:
      
      nfsd: iget_locked() creates inode
      nfsd: try to read from disk, block on that.
      ext2_new_inode(): allocate inode with that inumber
      ext2_new_inode(): insert it into icache, set it up and dirty
      ext2_write_inode(): get the relevant part of inode table in cache,
      set the entry for our inode (and start writing to disk)
      nfsd: get CPU again, look into inode table, see nice and sane on-disk
      inode, set the in-core inode from it
      
      oops - we have two in-core inodes with the same inumber live in icache,
      both used for IO.  Welcome to fs corruption...
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ext2: ensure fast symlinks are NUL-terminated · 8d6d0c4d
      Committed by Duane Griffin
      Ensure fast symlink targets are NUL-terminated, even if corrupted
      on-disk.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Duane Griffin <duaneg@dghda.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
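A bounded termination step of the kind this commit describes might look as follows. This is a sketch, not the kernel code: `terminate_fast_symlink` is a hypothetical name, and the 60-byte buffer size is an assumption about the in-inode area that holds a fast symlink target.

```c
#include <assert.h>
#include <stddef.h>

#define FAST_SYMLINK_BUF 60	/* assumed size of the in-inode target area */

/* Force a NUL terminator into buf, trusting on_disk_len only as far as
 * the buffer allows. A corrupted length can never push the write past
 * the last byte of the buffer. */
static void terminate_fast_symlink(char *buf, size_t on_disk_len)
{
	size_t max = FAST_SYMLINK_BUF - 1;

	assert(buf != NULL);
	buf[on_disk_len < max ? on_disk_len : max] = '\0';
}
```

With a sane length the terminator lands right after the target; with a corrupted, oversized length it lands in the final byte of the buffer.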
  18. 14 Nov, 2008 1 commit
  19. 23 Oct, 2008 2 commits
  20. 21 Oct, 2008 2 commits
  21. 17 Oct, 2008 2 commits
    • ext2: avoid printk floods in the face of directory corruption · bd39597c
      Committed by Eric Sandeen
      A very large directory with many read failures (either due to storage
      problems, or due to invalid size & blocks from corruption) will generate a
      printk storm as the filesystem continues to try to read all the blocks.
      This flood of messages can tie up the box until it is complete - which may
      be a very long time, especially for very large corrupted values.
      
      This is fixed by only reporting the corruption once each time we try to
      read the directory.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Eugene Teo <eugeneteo@kernel.sg>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
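The "report once per directory read" behaviour can be sketched with a simple flag. This is an illustrative model only: `count_reports` and the `block_ok` array are hypothetical, and the message text is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Walk the blocks of one directory read. block_ok[i] says whether
 * block i could be read. Returns how many messages were logged. */
static size_t count_reports(const bool *block_ok, size_t nblocks)
{
	bool reported = false;	/* reset for every new directory read */
	size_t messages = 0;

	assert(block_ok != NULL || nblocks == 0);
	for (size_t i = 0; i < nblocks; i++) {
		if (block_ok[i])
			continue;
		if (!reported) {
			fprintf(stderr, "directory contains bad block %zu\n", i);
			reported = true;
			messages++;
		}
		/* Later bad blocks are skipped silently. */
	}
	return messages;
}
```

However many blocks fail, a single pass over the directory logs at most one message.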
    • ext2: fix ext2 block reservation early ENOSPC issue · d707d31c
      Committed by Mingming Cao
      We could run into an ENOSPC error on ext2 even when there are free blocks
      on the filesystem.
      
      The problem is triggered when the goal block group has 0 free blocks and
      the remaining block groups are skipped due to the check of
      "free_blocks < windowsz/2".  The current code could fall back to
      non-reservation allocation to prevent early ENOSPC after examining all the
      block groups with reservation on, but this code was bypassed if the
      reservation window was already turned off, which is true in this case.
      
      This patch fixes two issues:
      1) We don't need to turn off block reservation if the goal block group has
      0 free blocks left; we should continue searching the rest of the block
      groups.
      
      The intention of the current code is to turn off block reservation if the
      goal allocation group has a few (some) free blocks left (not enough to
      make the desired reservation window), so that it tries to allocate in the
      goal block group to get better locality.  But if the goal block group has
      0 free blocks, it should leave block reservation on and continue searching
      the next block groups, rather than turn off block reservation completely.
      
      2) We don't need to check the window size if block reservation is off.
      
      The problem was originally found and fixed in ext4.
      
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
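The two decisions this commit changes can be written down as predicates. This is a sketch based only on the description above: the function names are hypothetical, and the exact comparisons in the real allocator may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Issue 1: give up the reservation in the goal group only when it has
 * some free blocks, but too few for a full window. With 0 free blocks
 * the reservation stays on and the search moves to other groups. */
static bool drop_reservation_in_goal_group(unsigned long free_blocks,
					   unsigned long windowsz)
{
	assert(windowsz > 0);
	return free_blocks > 0 && free_blocks < windowsz;
}

/* Issue 2: the window-size check applies only while reservation is on.
 * A completely full group is skipped either way. */
static bool skip_group(unsigned long free_blocks, unsigned long windowsz,
		       bool reservation_on)
{
	assert(windowsz > 0);
	if (free_blocks == 0)
		return true;
	return reservation_on && free_blocks < windowsz / 2;
}
```

With reservation off, a group holding even one free block is no longer skipped, which is what avoids the early ENOSPC.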
  22. 14 Oct, 2008 1 commit
  23. 04 Oct, 2008 1 commit
    • generic block based fiemap implementation · 68c9d702
      Committed by Josef Bacik
      Any block based fs (this patch includes ext3) just has to declare its own
      fiemap() function and then call this generic function with its own
      get_block_t. This works well for block based filesystems that map
      multiple contiguous blocks at one time, but it also works for filesystems
      that only map one block at a time; you will just end up with an "extent"
      for each block. One gotcha is that this will not play nicely where there is
      hole+data after the EOF. This function will assume it has hit the end of
      the data as soon as it hits a hole after the EOF, so if there is any data
      past that it will not pick that up. AFAIK no block based fs does this
      anyway, but it's in the comments of the function just in case.
      
      Signed-off-by: Josef Bacik <jbacik@redhat.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
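How per-block mappings turn into extents can be shown with a small user-space sketch. It is not the kernel function: `struct extent` and `build_extents` are hypothetical, the `phys` array stands in for repeated calls to a filesystem's block-mapping callback, and the end-of-file handling mentioned in the commit message is not modelled.

```c
#include <assert.h>
#include <stddef.h>

struct extent {
	unsigned long logical;	/* first logical block of the run */
	unsigned long physical;	/* first physical block of the run */
	unsigned long len;	/* number of blocks in the run */
};

/* phys[i] is the physical block backing logical block i, or 0 for a
 * hole. Returns the number of extents written to out (at most max). */
static size_t build_extents(const unsigned long *phys, size_t nblocks,
			    struct extent *out, size_t max)
{
	size_t n = 0;

	assert(phys != NULL && out != NULL);
	for (size_t l = 0; l < nblocks; l++) {
		unsigned long p = phys[l];

		if (p == 0)
			continue;	/* hole: nothing to report */
		/* Extend the previous extent if this block continues it
		 * both logically and physically. */
		if (n > 0 && out[n - 1].logical + out[n - 1].len == l &&
		    out[n - 1].physical + out[n - 1].len == p) {
			out[n - 1].len++;
			continue;
		}
		if (n == max)
			break;		/* caller's buffer is full */
		out[n++] = (struct extent){ l, p, 1 };
	}
	return n;
}
```

A filesystem that maps one block at a time with no physical contiguity simply produces one single-block extent per block, as the commit message notes.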
  24. 29 Jul, 2008 1 commit
    • vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Committed by Hisashi Hifumi
      When we read some part of a file through the pagecache, if a page exists
      at the corresponding index but is not uptodate, read IO is issued and the
      page is brought uptodate.
      
      I think this is good for a pagesize == blocksize environment, but there is
      room for improvement in a pagesize != blocksize environment, because in
      this case a page can have multiple buffers, and even if the page is not
      uptodate, some of its buffers can be uptodate.
      
      So I suggest that when all buffers which correspond to the part of the
      file that we want to read are uptodate, we use this page and copy data
      from it to the user buffer even if the page is not uptodate.  This can
      reduce read IO and improve system throughput.
      
      I wrote a benchmark program and measured the following result with it.
      
      The benchmark does the following:
      
        1: mount and open a test file.
      
        2: create a 512MB file.
      
        3: close a file and umount.
      
        4: mount and again open a test file.
      
        5: pwrite randomly 300000 times on a test file.  offset is aligned
           by IO size(1024bytes).
      
        6: measure time of preading randomly 100000 times on a test file.
      
      The result was:
      
      	2.6.26          330 sec
      	2.6.26-patched  226 sec
      
      Arch:i386
      Filesystem:ext3
      Blocksize:1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffers/blocks, so random read/write
      mixed workloads and random-read-after-random-write workloads are
      optimized by this patch in a pagesize != blocksize environment.  The test
      result above shows this.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024
      #define LOOP (1024*512) /* LEN * LOOP = 512MB */
      
      int main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	/* O_CREAT requires an explicit file mode. */
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC, 0600);
      	if (fd < 0) {
      		perror("cannot open file");
      		exit(1);
      	}
      	/* Create the 512MB test file. */
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	/* Remount so that the pagecache starts out cold. */
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount");
      		exit(1);
      	}
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file");
      		exit(1);
      	}
      
      	filesize = (unsigned long)LEN * LOOP;
      	/* Random writes, aligned to the IO size. */
      	for (i = 0; i < 300000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      	printf("start test\n");
      	time(&t1);
      	/* Timed section: random reads, aligned to the IO size. */
      	for (i = 0; i < 100000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", (long)(t2 - t1));
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount");
      		exit(1);
      	}
      	return 0;
      }
      
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
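The check at the heart of this optimization, whether every buffer covering the requested byte range is uptodate, can be sketched in user-space C. The names here are assumptions for illustration: `range_uptodate` is hypothetical, and a bitmask with one bit per buffer stands in for the page's buffer list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* uptodate_mask has bit b set when buffer b of the page is uptodate.
 * Returns true if bytes [from, from + count) of the page are fully
 * covered by uptodate buffers, so the read needs no IO. */
static bool range_uptodate(unsigned int uptodate_mask, size_t blocksize,
			   size_t page_size, size_t from, size_t count)
{
	size_t first, last;

	assert(blocksize > 0 && count > 0);
	assert(from + count <= page_size);
	assert(page_size / blocksize <= 8 * sizeof(uptodate_mask));

	first = from / blocksize;
	last = (from + count - 1) / blocksize;
	for (size_t b = first; b <= last; b++) {
		if (!((uptodate_mask >> b) & 1u))
			return false;	/* this buffer would need read IO */
	}
	return true;
}
```

With a 4096-byte page and 1024-byte blocks, a read that touches only buffers 1 and 2 can be served from the page when those two are uptodate, even if buffers 0 and 3 are not.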
  25. 27 Jul, 2008 2 commits
  26. 26 Jul, 2008 1 commit