1. 10 8月, 2010 5 次提交
  2. 05 6月, 2010 1 次提交
    • N
      fix truncate inode time modification breakage · af5a30d8
      Nick Piggin 提交于
      mtime and ctime should be changed only if the file size has actually
      changed. Patches changing ext2 and tmpfs from vmtruncate to new truncate
      sequence has caused regressions where they always update timestamps.
      
      There is some strange cases in POSIX where truncate(2) must not update
      times unless the size has acutally changed, see 6e656be8.
      
      This area is all still rather buggy in different ways in a lot of
      filesystems and needs a cleanup and audit (ideally the vfs will provide
      a simple attribute or call to direct all filesystems exactly which
      attributes to change). But coming up with the best solution will take a
      while and is not appropriate for rc anyway.
      
      So fix recent regression for now.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      af5a30d8
  3. 28 5月, 2010 1 次提交
  4. 22 5月, 2010 3 次提交
    • D
      quota: unify quota init condition in setattr · 12755627
      Dmitry Monakhov 提交于
      Quota must being initialized if size or uid/git changes requested.
      But initialization performed in two different places:
      in case of i_size file system is responsible for dquot init
      , but in case of uid/gid init will be called internally in
      dquot_transfer().
      This ambiguity makes code harder to understand.
      Let's move this logic to one common helper function.
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      12755627
    • J
      BKL: Remove BKL from ext2 filesystem · e0a5cbac
      Jan Blunck 提交于
      The BKL is still used in ext2_put_super(), ext2_fill_super(), ext2_sync_fs()
      ext2_remount() and ext2_write_inode(). From these calls ext2_put_super(),
      ext2_fill_super() and ext2_remount() are protected against each other by
      the struct super_block s_umount rw semaphore. The call in ext2_write_inode()
      could only protect the modification of the ext2_sb_info through
      ext2_update_dynamic_rev() against concurrent ext2_sync_fs() or ext2_remount().
      ext2_fill_super() and ext2_put_super() can be left out because you need a
      valid filesystem reference in all three cases, which you do not have when
      you are one of these functions.
      
      If the BKL is only protecting the modification of the ext2_sb_info it can
      safely be removed since this is protected by the struct ext2_sb_info s_lock.
      Signed-off-by: NJan Blunck <jblunck@suse.de>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NJan Kara <jack@suse.cz>
      e0a5cbac
    • J
      ext2: Add ext2_sb_info s_lock spinlock · c15271f4
      Jan Blunck 提交于
      Add a spinlock that protects against concurrent modifications of
      s_mount_state, s_blocks_last, s_overhead_last and the content of the
      superblock's buffer pointed to by sbi->s_es. The spinlock is now used in
      ext2_xattr_update_super_block() which was setting the
      EXT2_FEATURE_COMPAT_EXT_ATTR flag on the superblock without protection
      before. Likewise the spinlock is used in ext2_show_options() to have a
      consistent view of the mount options.
      
      This is a preparation patch for removing the BKL from ext2 in the next
      patch.
      Signed-off-by: NJan Blunck <jblunck@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NJan Kara <jack@suse.cz>
      c15271f4
  5. 06 3月, 2010 1 次提交
  6. 05 3月, 2010 3 次提交
    • C
      dquot: cleanup dquot initialize routine · 871a2931
      Christoph Hellwig 提交于
      Get rid of the initialize dquot operation - it is now always called from
      the filesystem and if a filesystem really needs it's own (which none
      currently does) it can just call into it's own routine directly.
      
      Rename the now static low-level dquot_initialize helper to __dquot_initialize
      and vfs_dq_init to dquot_initialize to have a consistent namespace.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      871a2931
    • C
      dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Christoph Hellwig 提交于
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.   For most metadata operations
      this is a straight forward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      907f4554
    • C
      dquot: cleanup dquot transfer routine · b43fa828
      Christoph Hellwig 提交于
      Get rid of the transfer dquot operation - it is now always called from
      the filesystem and if a filesystem really needs it's own (which none
      currently does) it can just call into it's own routine directly.
      
      Rename the now static low-level dquot_transfer helper to __dquot_transfer
      and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
      and make the new dquot_transfer return a normal negative errno value
      which all callers expect.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      b43fa828
  7. 10 12月, 2009 1 次提交
    • A
      ext2: Unify log messages in ext2 · 2314b07c
      Alexey Fisher 提交于
      make messages produced by ext2 more unified. It should be
      easy to parse.
      
      dmesg before patch:
      [ 4893.684892] reservations ON
      [ 4893.684896] xip option not supported
      [ 4893.684961] EXT2-fs warning: mounting ext3 filesystem as ext2
      [ 4893.684964] EXT2-fs warning: maximal mount count reached, running
      e2fsck is recommended
      [ 4893.684990] EXT II FS: 0.5b, 95/08/09, bs=1024, fs=1024, gc=2,
      bpg=8192, ipg=1280, mo=80010]
      
      dmesg after patch:
      [ 4893.684892] EXT2-fs (loop0): reservations ON
      [ 4893.684896] EXT2-fs (loop0): xip option not supported
      [ 4893.684961] EXT2-fs (loop0): warning: mounting ext3 filesystem as
      ext2
      [ 4893.684964] EXT2-fs (loop0): warning: maximal mount count reached,
      running e2fsck is recommended
      [ 4893.684990] EXT2-fs (loop0): 0.5b, 95/08/09, bs=1024, fs=1024, gc=2,
      bpg=8192, ipg=1280, mo=80010]
      Signed-off-by: NAlexey Fisher <bug-track@fisher-privat.net>
      Reviewed-by: NAndreas Dilger <adilger@sun.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      2314b07c
  8. 16 9月, 2009 1 次提交
    • A
      HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen 提交于
      Enable removing of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      aa261f54
  9. 14 9月, 2009 1 次提交
  10. 24 6月, 2009 1 次提交
  11. 12 6月, 2009 1 次提交
  12. 14 4月, 2009 1 次提交
    • J
      ext2: fix data corruption for racing writes · 316cb4ef
      Jan Kara 提交于
      If two writers allocating blocks to file race with each other (e.g.
      because writepages races with ordinary write or two writepages race with
      each other), ext2_getblock() can be called on the same inode in parallel.
      Before we are going to allocate new blocks, we have to recheck the block
      chain we have obtained so far without holding truncate_mutex.  Otherwise
      we could overwrite the indirect block pointer set by the other writer
      leading to data loss.
      
      The below test program by Ying is able to reproduce the data loss with ext2
      on in BRD in a few minutes if the machine is under memory pressure:
      
      long kMemSize  = 50 << 20;
      int kPageSize = 4096;
      
      int main(int argc, char **argv) {
      	int status;
      	int count = 0;
      	int i;
      	char *fname = "/mnt/test.mmap";
      	char *mem;
      	unlink(fname);
      	int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0600);
      	status = ftruncate(fd, kMemSize);
      	mem = mmap(0, kMemSize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      	// Fill the memory with 1s.
      	memset(mem, 1, kMemSize);
      	sleep(2);
      	for (i = 0; i < kMemSize; i++) {
      		int byte_good = mem[i] != 0;
      		if (!byte_good && ((i % kPageSize) == 0)) {
      			//printf("%d ", i / kPageSize);
      			count++;
      		}
      	}
      	munmap(mem, kMemSize);
      	close(fd);
      	unlink(fname);
      
      	if (count > 0) {
      		printf("Running %d bad page\n", count);
      		return 1;
      	}
      	return 0;
      }
      
      Cc: Ying Han <yinghan@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Cc: Mingming Cao <cmm@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      316cb4ef
  13. 26 3月, 2009 1 次提交
  14. 09 1月, 2009 1 次提交
  15. 01 1月, 2009 1 次提交
  16. 04 10月, 2008 1 次提交
    • J
      generic block based fiemap implementation · 68c9d702
      Josef Bacik 提交于
      Any block based fs (this patch includes ext3) just has to declare its own
      fiemap() function and then call this generic function with its own
      get_block_t. This works well for block based filesystems that will map
      multiple contiguous blocks at one time, but will work for filesystems that
      only map one block at a time, you will just end up with an "extent" for each
      block. One gotcha is this will not play nicely where there is hole+data
      after the EOF. This function will assume its hit the end of the data as soon
      as it hits a hole after the EOF, so if there is any data past that it will
      not pick that up. AFAIK no block based fs does this anyway, but its in the
      comments of the function anyway just in case.
      Signed-off-by: NJosef Bacik <jbacik@redhat.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      68c9d702
  17. 29 7月, 2008 1 次提交
    • H
      vfs: pagecache usage optimization for pagesize!=blocksize · 8ab22b9a
      Hisashi Hifumi 提交于
      When we read some part of a file through pagecache, if there is a
      pagecache of corresponding index but this page is not uptodate, read IO
      is issued and this page will be uptodate.
      
      I think this is good for pagesize == blocksize environment but there is
      room for improvement on pagesize != blocksize environment.  Because in
      this case a page can have multiple buffers and even if a page is not
      uptodate, some buffers can be uptodate.
      
      So I suggest that when all buffers which correspond to a part of a file
      that we want to read are uptodate, use this pagecache and copy data from
      this pagecache to user buffer even if a page is not uptodate.  This can
      reduce read IO and improve system throughput.
      
      I wrote a benchmark program and got result number with this program.
      
      This benchmark do:
      
        1: mount and open a test file.
      
        2: create a 512MB file.
      
        3: close a file and umount.
      
        4: mount and again open a test file.
      
        5: pwrite randomly 300000 times on a test file.  offset is aligned
           by IO size(1024bytes).
      
        6: measure time of preading randomly 100000 times on a test file.
      
      The result was:
      	2.6.26
              330 sec
      
      	2.6.26-patched
              226 sec
      
      Arch:i386
      Filesystem:ext3
      Blocksize:1024 bytes
      Memory: 1GB
      
      On ext3/4, a file is written through buffer/block.  So random read/write
      mixed workloads or random read after random write workloads are optimized
      with this patch under pagesize != blocksize environment.  This test result
      showed this.
      
      The benchmark program is as follows:
      
      #include <stdio.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <time.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/mount.h>
      
      #define LEN 1024
      #define LOOP 1024*512 /* 512MB */
      
      main(void)
      {
      	unsigned long i, offset, filesize;
      	int fd;
      	char buf[LEN];
      	time_t t1, t2;
      
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	memset(buf, 0, LEN);
      	fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      	for (i = 0; i < LOOP; i++)
      		write(fd, buf, LEN);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      	if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) {
      		perror("cannot mount\n");
      		exit(1);
      	}
      	fd = open("/root/test1/testfile", O_RDWR);
      	if (fd < 0) {
      		perror("cannot open file\n");
      		exit(1);
      	}
      
      	filesize = LEN * LOOP;
      	for (i = 0; i < 300000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pwrite(fd, buf, LEN, offset);
      	}
      	printf("start test\n");
      	time(&t1);
      	for (i = 0; i < 100000; i++){
      		offset = (random() % filesize) & (~(LEN - 1));
      		pread(fd, buf, LEN, offset);
      	}
      	time(&t2);
      	printf("%ld sec\n", t2-t1);
      	close(fd);
      	if (umount("/root/test1/") < 0) {
      		perror("cannot umount\n");
      		exit(1);
      	}
      }
      Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ab22b9a
  18. 28 4月, 2008 3 次提交
  19. 22 4月, 2008 1 次提交
  20. 08 2月, 2008 1 次提交
  21. 07 2月, 2008 2 次提交
  22. 17 10月, 2007 4 次提交
  23. 09 5月, 2007 2 次提交
    • J
      ext3: copy i_flags to inode flags on write · 4f99ed67
      Jan Kara 提交于
      Propagate flags such as S_APPEND, S_IMMUTABLE, etc.  from i_flags into
      ext2-specific i_flags.  Hence, when someone sets these flags via a different
      interface than ioctl, they are stored correctly.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4f99ed67
    • M
      ext2/3/4: fix file date underflow on ext2 3 filesystems on 64 bit systems · 4d7bf11d
      Markus Rechberger 提交于
      Taken from http://bugzilla.kernel.org/show_bug.cgi?id=5079
      
      signed long ranges from -2.147.483.648 to 2.147.483.647 on x86 32bit
      
      10000011110110100100111110111101 .. -2,082,844,739
      10000011110110100100111110111101 ..  2,212,122,557 <- this currently gets
      stored on the disk but when converting it to a 64bit signed long value it loses
      its sign and becomes positive.
      
      Cc: Andreas Dilger <adilger@dilger.ca>
      Cc: <linux-ext4@vger.kernel.org>
      
      Andreas says:
      
      This patch is now treating timestamps with the high bit set as negative
      times (before Jan 1, 1970).  This means we lose 1/2 of the possible range
      of timestamps (lopping off 68 years before unix timestamp overflow -
      now only 30 years away :-) to handle the extremely rare case of setting
      timestamps into the distant past.
      
      If we are only interested in fixing the underflow case, we could just
      limit the values to 0 instead of storing negative values.  At worst this
      will skew the timestamp by a few hours for timezones in the far east
      (files would still show Jan 1, 1970 in "ls -l" output).
      
      That said, it seems 32-bit systems (mine at least) allow files to be set
      into the past (01/01/1907 works fine) so it seems this patch is bringing
      the x86_64 behaviour into sync with other kernels.
      
      On the plus side, we have a patch that is ready to add nanosecond timestamps
      to ext3 and as an added bonus adds 2 high bits to the on-disk timestamp so
      this extends the maximum date to 2242.
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4d7bf11d
  24. 27 9月, 2006 1 次提交
  25. 29 6月, 2006 1 次提交