1. 04 May, 2010 12 commits
  2. 03 May, 2010 1 commit
    • R
      nilfs2: fix sync silent failure · 973bec34
      Committed by Ryusuke Konishi
      As of 32a88aa1, __sync_filesystem() will return 0 if s_bdi is not set.
      And nilfs does not set s_bdi anywhere.  I noticed this problem by the
      warning introduced by the recent commit 5129a469 ("Catch filesystem
      lacking s_bdi").
      
       WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
       Hardware name: PowerEdge 2850
       Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
       Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
       Call Trace:
        [<c1028422>] warn_slowpath_common+0x60/0x90
        [<c102845f>] warn_slowpath_null+0xd/0x10
        [<c1095936>] vfs_kern_mount+0xc5/0x14e
        [<c1095a03>] do_kern_mount+0x32/0xbd
        [<c10a811e>] do_mount+0x671/0x6d0
        [<c1073794>] ? __get_free_pages+0x1f/0x21
        [<c10a684f>] ? copy_mount_options+0x2b/0xe2
        [<c107b634>] ? strndup_user+0x48/0x67
        [<c10a81de>] sys_mount+0x61/0x8f
        [<c100280c>] sysenter_do_call+0x12/0x32
      
      This patch ensures that s_bdi is set for nilfs and fixes the silent sync failure.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
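The silent failure described above can be modelled in userspace. This is only a sketch under assumptions: `sb_model` and `sync_filesystem_model` are hypothetical names, not kernel structures or functions, and the model captures just the early return when `s_bdi` is unset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a superblock; only the fields this sketch needs. */
struct sb_model {
	const void *s_bdi;   /* NULL models a filesystem that never set s_bdi */
	int dirty_blocks;    /* pending writeback, in arbitrary units */
};

/* Models the early return: without s_bdi nothing is written, yet 0 is returned. */
static int sync_filesystem_model(struct sb_model *sb)
{
	assert(sb != NULL);
	if (sb->s_bdi == NULL)
		return 0;            /* reports success without doing any work */
	sb->dirty_blocks = 0;    /* pretend writeback completed */
	return 0;
}
```

A caller cannot tell the two cases apart from the return value alone, which is why the failure is silent: only the dirty state left behind differs.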
  3. 01 May, 2010 2 commits
  4. 30 April, 2010 3 commits
  5. 29 April, 2010 5 commits
  6. 28 April, 2010 3 commits
  7. 27 April, 2010 2 commits
    • N
      nfsd4: bug in read_buf · 2bc3c117
      Committed by Neil Brown
      When read_buf is called to move over to the next page in the pagelist
      of an NFSv4 request, it sets argp->end to essentially a random
      number, certainly not an address within the page which argp->p now
      points to.  So subsequent calls to READ_BUF will think there is much
      more than a page of spare space (the cast to u32 ensures an unsigned
      comparison) so we can expect to fall off the end of the second
      page.
      
      We never encountered this in testing because typically the only
      operations which use more than two pages are write-like operations,
      which have their own decoding logic.  Something like a getattr after a
      write may cross a page boundary, but it would be very unusual for it to
      cross another boundary after that.
      
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
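The read_buf problem comes down to computing the end pointer relative to the page that the read pointer now refers to. A minimal userspace sketch of that idea follows. The struct, the function names and the 4096-byte page size are assumptions made for illustration; this is not the nfsd code or the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for this model */

/* Hypothetical stand-in for the decoder state; not the kernel struct. */
struct decode_state {
	uint32_t *p;          /* next 32-bit word to decode */
	uint32_t *end;        /* one past the last valid word in the current page */
	uint32_t **pagelist;  /* remaining pages of the request */
	size_t pagelen;       /* bytes of request data left in those pages */
};

/* Advance to the next page and recompute end relative to the new p. */
static void next_page(struct decode_state *s)
{
	size_t avail;

	assert(s != NULL && s->pagelen > 0);
	avail = s->pagelen < MODEL_PAGE_SIZE ? s->pagelen : MODEL_PAGE_SIZE;

	s->p = s->pagelist[0];
	s->pagelist++;
	s->end = s->p + (avail >> 2);   /* convert bytes to 32-bit words */
	s->pagelen -= avail;
}

/* Words that may still be read from the current page. */
static size_t words_left(const struct decode_state *s)
{
	return (size_t)(s->end - s->p);
}
```

Because `end` is always derived from the new `p`, the space check can never report more than one page of room, so a request spanning three or more pages is bounded correctly at each step.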
    • D
      xfs: more swap extent fixes for dynamic fork offsets · dd77ef92
      Committed by Dave Chinner
      A new xfsqa test (226) with a prototype xfs_fsr change to try to
      handle dynamic fork offsets better triggers an assertion failure
      where the inode data fork is in btree format, yet there is room in
      the inode for it to be in extent format. The two inodes look like:
      
      before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
      before: ino 0x115 (temp),  num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
      after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
      after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56
      
      Basically the target inode ends up with 5 extents in btree format,
      but it had space for 6 extents in extent format, so ends up
      incorrect. Notably here the broot size is the same, and that is
      where the kernel code is going wrong - the btree root will fit, so
      it lets the swap go ahead.
      
      The check should not allow the swap to take place if the number of
      extents while in btree format is less than the number of extents
      that can fit in the inode in extent format. Adding that check will
      prevent this swap and corruption from occurring.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
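The extra check described in this commit can be sketched in plain C using the numbers from the inode dump above. `fork_model` and `btree_swap_allowed` are hypothetical names, both forks are assumed to be in btree format, and treating the fork offset as the space available to the btree root is a simplification. Rejecting the case where the extent count equals the in-fork maximum is also an assumption of this sketch, made because such a fork would fit in extent format as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a btree-format data fork; not the XFS structures. */
struct fork_model {
	unsigned int num_extents;   /* extents currently mapped by the fork */
	unsigned int max_in_fork;   /* extents that would fit inline in extent format */
	unsigned int broot_size;    /* bytes used by the btree root */
	unsigned int fork_offset;   /* bytes available to the data fork (simplified) */
};

/* May src's btree-format fork be moved into dst and remain valid as a btree fork? */
static bool btree_swap_allowed(const struct fork_model *src,
			       const struct fork_model *dst)
{
	assert(src != NULL && dst != NULL);

	/* Existing style of check: the btree root must fit in the destination. */
	if (src->broot_size > dst->fork_offset)
		return false;

	/* Added check: if the extents would fit inline, btree format is wrong. */
	if (src->num_extents <= dst->max_in_fork)
		return false;

	return true;
}
```

With the values from the report, moving the temp fork (5 extents) into the target inode (room for 6 inline extents) is refused, while the root-size check alone would have let it through.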
  8. 26 April, 2010 1 commit
  9. 25 April, 2010 7 commits
    • J
      Catch filesystems lacking s_bdi · 5129a469
      Committed by Jörn Engel
      noop_backing_dev_info is used only as a flag to mark filesystems that
      don't have any backing store, like tmpfs, procfs, spufs, etc.
      Signed-off-by: Joern Engel <joern@logfs.org>
      
      Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
      to the noop_backing_dev_info is not legal and will not result in
      them being flushed, but we already catch this condition in
      __mark_inode_dirty() when checking for a registered bdi.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • P
      squashfs: fix potential buffer over-run on 4K block file systems · e0d1f700
      Committed by Phillip Lougher
      Sizing the buffer based on block size is incorrect, leading
      to a potential buffer over-run on 4K block size file systems
      (because the metadata block size is always 8K).  This bug
      doesn't seem to have triggered because 4K block size file systems
      are not default, and also because metadata blocks after
      compression tend to be less than 4K.
      Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
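The sizing problem can be illustrated with a small sketch: a buffer that may receive either a data block or a metadata block has to be at least as large as the bigger of the two. The names below are hypothetical and this is not necessarily how the patch computes the size. The only fact taken from the commit message is that metadata blocks are always 8K.

```c
#include <assert.h>
#include <stddef.h>

/* Metadata blocks are always 8K according to the commit message. */
#define MODEL_METADATA_SIZE 8192u

/* Smallest buffer that can hold both a data block and a metadata block. */
static size_t read_buffer_size(size_t block_size)
{
	assert(block_size > 0);
	return block_size > MODEL_METADATA_SIZE ? block_size : MODEL_METADATA_SIZE;
}
```

Sizing by `block_size` alone gives 4096 bytes on a 4K block size file system, which is too small for an uncompressed 8K metadata block.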
    • P
      squashfs: add missing buffer free · 370ec3d1
      Committed by Phillip Lougher
      Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
    • P
      squashfs: fix warn_on when root inode is corrupted · 1cb08e97
      Committed by Phillip Lougher
      Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where
      the root inode has been corrupted.
      Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
      Reported-by: Steve Grubb <sgrubb@redhat.com>
    • A
      fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes to block devices · b8af67e2
      Committed by Anton Blanchard
      We are seeing a large regression in database performance on recent
      kernels.  The database opens a block device with O_DIRECT|O_SYNC and a
      number of threads write to different regions of the file at the same time.
      
      A simple test case is below.  I haven't defined DEVICE since getting it
      wrong will destroy your data :) On a 3-disk LVM with a 64k chunk size we
      see about 17MB/sec and only a few threads in IO wait:
      
      procs  -----io---- -system-- -----cpu------
       r  b     bi    bo   in   cs us sy id wa st
       0  3      0 16170  656 2259  0  0 86 14  0
       0  2      0 16704  695 2408  0  0 92  8  0
       0  2      0 17308  744 2653  0  0 86 14  0
       0  2      0 17933  759 2777  0  0 89 10  0
      
      Most threads are blocking in vfs_fsync_range, which has:
      
              mutex_lock(&mapping->host->i_mutex);
              err = fop->fsync(file, dentry, datasync);
              if (!ret)
                      ret = err;
              mutex_unlock(&mapping->host->i_mutex);
      
      commit 148f948b (vfs: Introduce new
      helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers
      some explanation of what is going on:
      
          Use these new helpers for syncing from generic VFS functions. This makes
          O_SYNC writes to block devices acquire i_mutex for syncing. If we really
          care about this, we can make block_fsync() drop the i_mutex and reacquire
          it before it returns.
      
      Thanks Jan for such a good commit message!  As well as dropping i_mutex,
      Christoph suggests we should remove the call to sync_blockdev():
      
      > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
      > the block device inode, which is exactly what we did just before calling
      > into ->fsync
      
      The patch below incorporates both suggestions. With it the testcase improves
      from 17MB/sec to 68MB/sec:
      
      procs  -----io---- -system-- -----cpu------
       r  b     bi    bo   in   cs us sy id wa st
       0  7      0 65536 1000 3878  0  0 70 30  0
       0 34      0 69632 1016 3921  0  1 46 53  0
       0 57      0 69632 1000 3921  0  0 55 45  0
       0 53      0 69640  754 4111  0  0 81 19  0
      
      Testcase:
      
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <pthread.h>
      #include <unistd.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      
      #define NR_THREADS 64
      #define BUFSIZE (64 * 1024)
      
      #define DEVICE "/dev/mapper/XXXXXX"
      
      #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))
      
      static int fd;
      
      static void *doit(void *arg)
      {
      	unsigned long offset = (long)arg;
      	char *b, *buf;
      
      	b = malloc(BUFSIZE + 1024);
      	buf = (char *)ALIGN((unsigned long)b, 1024);
      	memset(buf, 0, BUFSIZE);
      
      	while (1)
      		pwrite(fd, buf, BUFSIZE, offset);
      }
      
      int main(int argc, char *argv[])
      {
      	int flags = O_RDWR|O_DIRECT;
      	int i;
      	unsigned long offset = 0;
      
      	if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
      		flags |= O_SYNC;
      
      	fd = open(DEVICE, flags);
      	if (fd == -1) {
      		perror("open");
      		exit(1);
      	}
      
      	for (i = 0; i < NR_THREADS-1; i++) {
      		pthread_t tid;
      		pthread_create(&tid, NULL, doit, (void *)offset);
      		offset += BUFSIZE;
      	}
      	doit((void *)offset);
      
      	return 0;
      }
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      reiserfs: fix corruption during shrinking of xattrs · fb2162df
      Committed by Jeff Mahoney
      Commit 48b32a35 ("reiserfs: use generic
      xattr handlers") introduced a problem that causes corruption when extended
      attributes are replaced with a smaller value.
      
      The issue is that the reiserfs_setattr to shrink the xattr file was moved
      from before the write to after the write.
      
      The root issue has always been in the reiserfs xattr code, but was papered
      over by the fact that in the shrink case, the file would just be expanded
      again while the xattr was written.
      
      The end result is that the last 8 bytes of xattr data are lost.
      
      This patch fixes it to use new_size.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reported-by: Christian Kujau <lists@nerdbynature.de>
      Tested-by: Christian Kujau <lists@nerdbynature.de>
      Cc: Edward Shishkin <edward.shishkin@gmail.com>
      Cc: Jethro Beekman <kernel@jbeekman.nl>
      Cc: Greg Surbey <gregsurbey@hotmail.com>
      Cc: Marco Gatti <marco.gatti@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      reiserfs: fix permissions on .reiserfs_priv · cac36f70
      Committed by Jeff Mahoney
      Commit 677c9b2e ("reiserfs: remove
      privroot hiding in lookup") removed the magic from the lookup code to hide
      the .reiserfs_priv directory since it was getting loaded at mount-time
      instead.  The intent was that the entry would be hidden from the user via
      a poisoned d_compare, but this was faulty.
      
      This introduced a security issue where unprivileged users could access and
      modify extended attributes or ACLs belonging to other users, including
      root.
      
      This patch resolves the issue by properly hiding .reiserfs_priv.  This was
      the intent of the xattr poisoning code, but it appears to have never
      worked as expected.  This is fixed by using d_revalidate instead of
      d_compare.
      
      This patch makes -oexpose_privroot a no-op.  I'm fine leaving it this way.
      The effort involved in working out the corner cases wrt permissions and
      caching outweighs the benefit of the feature.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Acked-by: Edward Shishkin <edward.shishkin@gmail.com>
      Reported-by: Matt McCutchen <matt@mattmccutchen.net>
      Tested-by: Matt McCutchen <matt@mattmccutchen.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 24 April, 2010 4 commits