1. 18 May 2010, 4 commits
  2. 04 May 2010, 12 commits
  3. 03 May 2010, 1 commit
    • nilfs2: fix sync silent failure · 973bec34
      Ryusuke Konishi committed
      As of 32a88aa1, __sync_filesystem() returns 0 if s_bdi is not set,
      and nilfs does not set s_bdi anywhere.  I noticed this problem through
      the warning introduced by the recent commit 5129a469 ("Catch filesystems
      lacking s_bdi").
      
       WARNING: at fs/super.c:959 vfs_kern_mount+0xc5/0x14e()
       Hardware name: PowerEdge 2850
       Modules linked in: nilfs2 loop tpm_tis tpm tpm_bios video shpchp pci_hotplug output dcdbas
       Pid: 3773, comm: mount.nilfs2 Not tainted 2.6.34-rc6-debug #38
       Call Trace:
        [<c1028422>] warn_slowpath_common+0x60/0x90
        [<c102845f>] warn_slowpath_null+0xd/0x10
        [<c1095936>] vfs_kern_mount+0xc5/0x14e
        [<c1095a03>] do_kern_mount+0x32/0xbd
        [<c10a811e>] do_mount+0x671/0x6d0
        [<c1073794>] ? __get_free_pages+0x1f/0x21
        [<c10a684f>] ? copy_mount_options+0x2b/0xe2
        [<c107b634>] ? strndup_user+0x48/0x67
        [<c10a81de>] sys_mount+0x61/0x8f
        [<c100280c>] sysenter_do_call+0x12/0x32
      
      This patch ensures that s_bdi is set for nilfs, which fixes the silent sync failure.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
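A minimal userspace sketch of the failure mode described above. It assumes heavily simplified, hypothetical `struct super_block` and `struct backing_dev_info` types (not the kernel's definitions) and a hypothetical `sync_filesystem_sketch()` standing in for `__sync_filesystem()`. It shows why the failure is silent: without `s_bdi` the call still returns 0 while the dirty data stays where it was.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct backing_dev_info {
	const char *name;
};

struct super_block {
	struct backing_dev_info *s_bdi;	/* must be set by the filesystem */
	int dirty_pages;		/* data still waiting to be written */
};

/* Models the early return: with no s_bdi there is nothing to hand
 * writeback to, so the function reports success without writing
 * anything. That is the "silent failure". */
static int sync_filesystem_sketch(struct super_block *sb)
{
	if (!sb->s_bdi)
		return 0;		/* dirty data is left untouched */
	sb->dirty_pages = 0;		/* stand-in for real writeback */
	return 0;
}
```

Setting `s_bdi` at mount time, as the patch does for nilfs, is what lets the second kind of call actually write data back.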
  4. 02 May 2010, 2 commits
    • NFS: Fix RCU issues in the NFSv4 delegation code · 17d2c0a0
      David Howells committed
      Fix a number of RCU issues in the NFSv4 delegation code.
      
       (1) delegation->cred doesn't need to be RCU protected as it's essentially an
           invariant refcounted structure.
      
           By the time we get to nfs_free_delegation(), the delegation is being
           released, so no one else should be attempting to use the saved
           credentials, and they can be cleared.
      
           However, since the list of delegations could still be under traversal at
           this point by functions such as nfs_client_return_marked_delegations(),
           the cred should be released in nfs_do_free_delegation() rather than in
           nfs_free_delegation().  Simply using rcu_assign_pointer() to clear it is
           insufficient, as that doesn't stop the cred from being destroyed; nor
           does calling put_rpccred() after call_rcu(), given that the latter is
           asynchronous.
      
        (2) nfs_detach_delegation_locked() and nfs_inode_set_delegation() should use
            rcu_dereference_protected() because they can only be called if
            nfs_client::cl_lock is held, and that guards against anyone changing
           nfsi->delegation under it.  Furthermore, the barrier imposed by
           rcu_dereference() is superfluous, given that the spin_lock() is also a
           barrier.
      
       (3) nfs_detach_delegation_locked() is now passed a pointer to the nfs_client
           struct so that it can issue lockdep advice based on clp->cl_lock for (2).
      
        (4) nfs_inode_return_delegation_noreclaim() and nfs_inode_return_delegation()
            should use rcu_access_pointer() outside the spinlocked region as they
            merely examine the pointer and don't follow it, so there is no need to
            impose a partial ordering over the one item of interest.
      
           These result in an RCU warning like the following:
      
      [ INFO: suspicious rcu_dereference_check() usage. ]
      ---------------------------------------------------
      fs/nfs/delegation.c:332 invoked rcu_dereference_check() without protection!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      2 locks held by mount.nfs4/2281:
       #0:  (&type->s_umount_key#34){+.+...}, at: [<ffffffff810b25b4>] deactivate_super+0x60/0x80
       #1:  (iprune_sem){+.+...}, at: [<ffffffff810c332a>] invalidate_inodes+0x39/0x13a
      
      stack backtrace:
      Pid: 2281, comm: mount.nfs4 Not tainted 2.6.34-rc1-cachefs #110
      Call Trace:
       [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
       [<ffffffffa00b4591>] nfs_inode_return_delegation_noreclaim+0x5b/0xa0 [nfs]
       [<ffffffffa0095d63>] nfs4_clear_inode+0x11/0x1e [nfs]
       [<ffffffff810c2d92>] clear_inode+0x9e/0xf8
       [<ffffffff810c3028>] dispose_list+0x67/0x10e
       [<ffffffff810c340d>] invalidate_inodes+0x11c/0x13a
       [<ffffffff810b1dc1>] generic_shutdown_super+0x42/0xf4
       [<ffffffff810b1ebe>] kill_anon_super+0x11/0x4f
       [<ffffffffa009893c>] nfs4_kill_super+0x3f/0x72 [nfs]
       [<ffffffff810b25bc>] deactivate_super+0x68/0x80
       [<ffffffff810c6744>] mntput_no_expire+0xbb/0xf8
       [<ffffffff810c681b>] release_mounts+0x9a/0xb0
       [<ffffffff810c689b>] put_mnt_ns+0x6a/0x79
       [<ffffffffa00983a1>] nfs_follow_remote_path+0x5a/0x146 [nfs]
       [<ffffffffa0098334>] ? nfs_do_root_mount+0x82/0x95 [nfs]
       [<ffffffffa00985a9>] nfs4_try_mount+0x75/0xaf [nfs]
       [<ffffffffa0098874>] nfs4_get_sb+0x291/0x31a [nfs]
       [<ffffffff810b2059>] vfs_kern_mount+0xb8/0x177
       [<ffffffff810b2176>] do_kern_mount+0x48/0xe8
       [<ffffffff810c810b>] do_mount+0x782/0x7f9
       [<ffffffff810c8205>] sys_mount+0x83/0xbe
       [<ffffffff81001eeb>] system_call_fastpath+0x16/0x1b
      
      Also on:
      
      fs/nfs/delegation.c:215 invoked rcu_dereference_check() without protection!
       [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
       [<ffffffffa00b4223>] nfs_inode_set_delegation+0xfe/0x219 [nfs]
       [<ffffffffa00a9c6f>] nfs4_opendata_to_nfs4_state+0x2c2/0x30d [nfs]
       [<ffffffffa00aa15d>] nfs4_do_open+0x2a6/0x3a6 [nfs]
       ...
      
      And:
      
      fs/nfs/delegation.c:40 invoked rcu_dereference_check() without protection!
       [<ffffffff8105149f>] lockdep_rcu_dereference+0xaa/0xb2
       [<ffffffffa00b3bef>] nfs_free_delegation+0x3d/0x6e [nfs]
       [<ffffffffa00b3e71>] nfs_do_return_delegation+0x26/0x30 [nfs]
       [<ffffffffa00b406a>] __nfs_inode_return_delegation+0x1ef/0x1fe [nfs]
       [<ffffffffa00b448a>] nfs_client_return_marked_delegations+0xc9/0x124 [nfs]
       ...
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
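The distinction drawn in (2) and (4) can be illustrated with a userspace analogy. This is not the kernel RCU API: it assumes C11 `<stdatomic.h>` and a pthread mutex standing in for `nfs_client::cl_lock`, and the type and function names (`nfs_inode_model`, `has_delegation()`, `return_delegation()`) are hypothetical. A bare NULL test needs no ordering (the `rcu_access_pointer()` case), and a load made while holding the updaters' lock needs no extra barrier (the `rcu_dereference_protected()` case).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; the kernel uses RCU primitives, not C11 atomics. */
struct delegation {
	int type;
};

struct nfs_inode_model {
	_Atomic(struct delegation *) delegation;
};

/* Hypothetical stand-in for nfs_client::cl_lock, held by all updaters. */
static pthread_mutex_t cl_lock = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of rcu_access_pointer(): the pointer is only compared with
 * NULL, never followed, so no ordering against the pointee is needed. */
static int has_delegation(struct nfs_inode_model *nfsi)
{
	return atomic_load_explicit(&nfsi->delegation,
				    memory_order_relaxed) != NULL;
}

/* Analogue of rcu_dereference_protected(): the caller holds cl_lock, so
 * no updater can change the pointer and the lock already orders us. */
static struct delegation *
detach_delegation_locked(struct nfs_inode_model *nfsi)
{
	struct delegation *d = atomic_load_explicit(&nfsi->delegation,
						    memory_order_relaxed);

	atomic_store_explicit(&nfsi->delegation, NULL, memory_order_relaxed);
	return d;
}

/* Cheap unlocked check first, authoritative load under the lock. */
static struct delegation *return_delegation(struct nfs_inode_model *nfsi)
{
	struct delegation *d;

	if (!has_delegation(nfsi))
		return NULL;
	pthread_mutex_lock(&cl_lock);
	d = detach_delegation_locked(nfsi);
	pthread_mutex_unlock(&cl_lock);
	return d;
}
```

The unlocked check is only an optimisation: the pointer is loaded again under the lock, so if another thread detaches the delegation in between, `return_delegation()` simply returns NULL.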
    • NFSv4: Fix the locking in nfs_inode_reclaim_delegation() · 8f649c37
      Trond Myklebust committed
      Ensure that we correctly rcu-dereference the delegation itself, and that we
      protect against removal while we're changing the contents.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
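One way to picture "protect against removal while we're changing the contents" is sketched below. It assumes a hypothetical per-delegation mutex and an `inode` back-pointer that the removal path clears under that same lock; the names are illustrative, and the RCU-dereference half of the fix is not modelled. The update is applied only if the delegation is still attached once the lock is held.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical simplified delegation: its own lock plus a back-pointer
 * that is cleared when the delegation is detached from the inode. */
struct delegation {
	pthread_mutex_t lock;
	void *inode;		/* NULL once detached */
	int type;		/* contents updated on reclaim */
};

/* Update the contents only while holding the delegation's lock and only
 * if it is still attached, so a concurrent removal cannot race with the
 * update. Returns 0 on success, -1 if it was already detached. */
static int reclaim_delegation_sketch(struct delegation *d, int new_type)
{
	int ret = -1;

	pthread_mutex_lock(&d->lock);
	if (d->inode != NULL) {
		d->type = new_type;
		ret = 0;
	}
	pthread_mutex_unlock(&d->lock);
	return ret;
}

/* The removal path detaches under the same lock. */
static void detach_delegation(struct delegation *d)
{
	pthread_mutex_lock(&d->lock);
	d->inode = NULL;
	pthread_mutex_unlock(&d->lock);
}
```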
  5. 01 May 2010, 2 commits
  6. 30 April 2010, 3 commits
  7. 29 April 2010, 5 commits
  8. 28 April 2010, 3 commits
  9. 27 April 2010, 2 commits
    • nfsd4: bug in read_buf · 2bc3c117
      Neil Brown committed
      When read_buf is called to move over to the next page in the pagelist
      of an NFSv4 request, it sets argp->end to essentially a random
      number, certainly not an address within the page which argp->p now
      points to.  So subsequent calls to READ_BUF will think there is much
      more than a page of spare space (the cast to u32 ensures an unsigned
      comparison) so we can expect to fall off the end of the second
      page.
      
      We never encountered this in testing because typically the only
      operations which use more than two pages are write-like operations,
      which have their own decoding logic.  Something like a getattr after a
      write may cross a page boundary, but it would be very unusual for it to
      cross another boundary after that.
      
      Cc: stable@kernel.org
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
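A simplified model of the page switch, assuming a hypothetical `struct xdr_args` with only the fields the message mentions (`p`, `end`, a page list and the remaining length) and 4096-byte pages. `room_for()` reproduces the shape of the unsigned bounds test, and `next_page()` derives `end` from the new `p`, clamped to the data actually left, which is the invariant the bug broke. This is a sketch, not the nfsd code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page size in 32-bit XDR words (4096 bytes / 4). */
#define PAGE_WORDS 1024u

/* Hypothetical, simplified decoder state. */
struct xdr_args {
	uint32_t *p;		/* next word to decode */
	uint32_t *end;		/* one past the last valid word on this page */
	uint32_t **pagelist;	/* next page to move to */
	size_t pagelen;		/* request bytes left in later pages */
};

/* Same shape as the READ_BUF test: the u32 cast makes this an unsigned
 * comparison, so an end that lies *before* p looks like a huge amount
 * of room instead of failing the check. */
static int room_for(const struct xdr_args *a, uint32_t nbytes)
{
	return nbytes <= (uint32_t)((const char *)a->end - (const char *)a->p);
}

/* Move to the next page: end is computed from the NEW p and clamped to
 * the bytes that remain in the request. */
static void next_page(struct xdr_args *a)
{
	a->p = *a->pagelist++;
	if (a->pagelen < PAGE_WORDS * 4) {
		a->end = a->p + a->pagelen / 4;
		a->pagelen = 0;
	} else {
		a->end = a->p + PAGE_WORDS;
		a->pagelen -= PAGE_WORDS * 4;
	}
}
```

Note what happens when `end` lies before `p`: the difference is negative, and the `uint32_t` cast turns it into a value close to 4 GiB, so `room_for()` accepts almost any request.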
    • xfs: more swap extent fixes for dynamic fork offsets · dd77ef92
      Dave Chinner committed
      A new xfsqa test (226) with a prototype xfs_fsr change to try to
      handle dynamic fork offsets better triggers an assertion failure
      where the inode data fork is in btree format, yet there is room in
      the inode for it to be in extent format. The two inodes look like:
      
      before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
      before: ino 0x115 (temp),  num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
      after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
      after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56
      
      Basically the target inode ends up with 5 extents in btree format,
      but it has space for 6 extents in extent format, so its data fork
      ends up in the wrong format. Notably here the broot size is the same, and that is
      where the kernel code is going wrong - the btree root will fit, so
      it lets the swap go ahead.
      
      The check should not allow the swap to take place if the number of
      extents while in btree format is less than the number of extents
      that can fit in the inode in extent format. Adding that check will
      prevent this swap and corruption from occurring.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
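The check described above, sketched with a hypothetical `struct fork_info` that summarises each inode's data fork. It assumes, as the example suggests, that the in-fork extent limit stays with the inode (it depends on that inode's fork offset) while the extents themselves move. It also assumes a btree fork is valid only when it holds more extents than the receiving inode could store in extent format. Extent-format forks are outside the scope of this sketch.

```c
#include <stdbool.h>

/* Hypothetical summary of one inode's data fork. */
struct fork_info {
	int nextents;	/* extents in the fork */
	int max_inline;	/* extents that fit in this inode in extent format */
	bool is_btree;	/* fork is currently in btree format */
};

/* Can fork 'from' be moved into the inode described by 'into'? A btree
 * fork is only legitimate if it holds more extents than 'into' could
 * store inline; otherwise it would have to be in extent format. */
static bool btree_fork_fits(const struct fork_info *from,
			    const struct fork_info *into)
{
	if (!from->is_btree)
		return true;	/* extent-format case not covered here */
	return from->nextents > into->max_inline;
}

/* The swap exchanges the forks, so both directions must be valid. */
static bool swap_allowed(const struct fork_info *target,
			 const struct fork_info *tmp)
{
	return btree_fork_fits(tmp, target) && btree_fork_fits(target, tmp);
}
```

With the "before" numbers above, moving the temp inode's 5 btree extents into ino 0x101 (room for 6 inline) fails the check, so the swap is refused.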
  10. 26 April 2010, 1 commit
  11. 25 April 2010, 5 commits
    • Catch filesystems lacking s_bdi · 5129a469
      Jörn Engel committed
      noop_backing_dev_info is used only as a flag to mark filesystems that
      don't have any backing store, like tmpfs, procfs, spufs, etc.
      Signed-off-by: Joern Engel <joern@logfs.org>
      
      Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
      to the noop_backing_dev_info is not legal and will not result in
      them being flushed, but we already catch this condition in
      __mark_inode_dirty() when checking for a registered bdi.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
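The sentinel idea can be sketched as follows, with a hypothetical `noop_bdi` standing in for `noop_backing_dev_info` and simplified, hypothetical structures. A NULL `s_bdi` means the filesystem forgot to set one (the condition the WARN_ON() now reports), while the sentinel means "no backing store" and is never used for writeback.

```c
#include <stdbool.h>
#include <stddef.h>

struct backing_dev_info {
	const char *name;
};

/* Hypothetical stand-in for noop_backing_dev_info: a sentinel meaning
 * "this filesystem has no backing store". */
static struct backing_dev_info noop_bdi = { "noop" };

struct super_block {
	struct backing_dev_info *s_bdi;
};

/* Mount-time check: every superblock must point at some bdi, a real one
 * or the sentinel. NULL means the filesystem never set it. */
static bool sb_bdi_missing(const struct super_block *sb)
{
	return sb->s_bdi == NULL;
}

/* Dirty inodes can only be flushed through a real bdi; the sentinel
 * never writes anything back. */
static bool sb_can_writeback(const struct super_block *sb)
{
	return sb->s_bdi != NULL && sb->s_bdi != &noop_bdi;
}
```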
    • squashfs: fix potential buffer over-run on 4K block file systems · e0d1f700
      Phillip Lougher committed
      Sizing the buffer based on block size is incorrect, leading
      to a potential buffer over-run on 4K block size file systems
      (because the metadata block size is always 8K).  This bug
      doesn't seem to have triggered because 4K block size file systems
      are not default, and also because metadata blocks after
      compression tend to be less than 4K.
      Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
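A sketch of sizing the buffer for the largest block it may have to hold, assuming a hypothetical `read_buffer_size()` helper and the 8K metadata block size stated above. It illustrates the sizing rule only, not necessarily the exact shape of the squashfs fix.

```c
#include <stddef.h>

/* Metadata blocks are always 8K, per the commit message. */
#define METADATA_SIZE 8192u

/* The buffer must hold either a data block (block_size) or a metadata
 * block (always 8K), whichever is larger. Sizing by block_size alone
 * under-allocates when block_size is 4K. */
static size_t read_buffer_size(size_t block_size)
{
	return block_size > METADATA_SIZE ? block_size : METADATA_SIZE;
}
```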
    • squashfs: add missing buffer free · 370ec3d1
      Phillip Lougher committed
      Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
    • squashfs: fix warn_on when root inode is corrupted · 1cb08e97
      Phillip Lougher committed
      Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where
      the root inode has been corrupted.
      Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
      Reported-by: Steve Grubb <sgrubb@redhat.com>
    • fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes to block devices · b8af67e2
      Anton Blanchard committed
      We are seeing a large regression in database performance on recent
      kernels.  The database opens a block device with O_DIRECT|O_SYNC and a
      number of threads write to different regions of the file at the same time.
      
      A simple test case is below.  I've left DEVICE as a placeholder since getting
      it wrong will destroy your data :) On a 3-disk LVM with a 64k chunk size we
      see about 17MB/sec and only a few threads in IO wait:
      
      procs  -----io---- -system-- -----cpu------
       r  b     bi    bo   in   cs us sy id wa st
       0  3      0 16170  656 2259  0  0 86 14  0
       0  2      0 16704  695 2408  0  0 92  8  0
       0  2      0 17308  744 2653  0  0 86 14  0
       0  2      0 17933  759 2777  0  0 89 10  0
      
      Most threads are blocking in vfs_fsync_range, which has:
      
              mutex_lock(&mapping->host->i_mutex);
              err = fop->fsync(file, dentry, datasync);
              if (!ret)
                      ret = err;
              mutex_unlock(&mapping->host->i_mutex);
      
      commit 148f948b (vfs: Introduce new
      helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers
      some explanation of what is going on:
      
          Use these new helpers for syncing from generic VFS functions. This makes
          O_SYNC writes to block devices acquire i_mutex for syncing. If we really
          care about this, we can make block_fsync() drop the i_mutex and reacquire
          it before it returns.
      
      Thanks Jan for such a good commit message!  As well as dropping i_mutex,
      Christoph suggests we should remove the call to sync_blockdev():
      
      > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
      > the block device inode, which is exactly what we did just before calling
      > into ->fsync
      
      The patch below incorporates both suggestions. With it the testcase improves
      from 17MB/sec to 68MB/sec:
      
      procs  -----io---- -system-- -----cpu------
       r  b     bi    bo   in   cs us sy id wa st
       0  7      0 65536 1000 3878  0  0 70 30  0
       0 34      0 69632 1016 3921  0  1 46 53  0
       0 57      0 69632 1000 3921  0  0 55 45  0
       0 53      0 69640  754 4111  0  0 81 19  0
      
      Testcase:
      
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <pthread.h>
      #include <unistd.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      
      #define NR_THREADS 64
      #define BUFSIZE (64 * 1024)
      
      #define DEVICE "/dev/mapper/XXXXXX"
      
      #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))
      
      static int fd;
      
      static void *doit(void *arg)
      {
      	unsigned long offset = (long)arg;
      	char *b, *buf;
      
      	b = malloc(BUFSIZE + 1024);
      	buf = (char *)ALIGN((unsigned long)b, 1024);
      	memset(buf, 0, BUFSIZE);
      
      	while (1)
      		pwrite(fd, buf, BUFSIZE, offset);
      }
      
      int main(int argc, char *argv[])
      {
      	int flags = O_RDWR|O_DIRECT;
      	int i;
      	unsigned long offset = 0;
      
      	if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
      		flags |= O_SYNC;
      
      	fd = open(DEVICE, flags);
      	if (fd == -1) {
      		perror("open");
      		exit(1);
      	}
      
      	for (i = 0; i < NR_THREADS-1; i++) {
      		pthread_t tid;
      		pthread_create(&tid, NULL, doit, (void *)offset);
      		offset += BUFSIZE;
      	}
      	doit((void *)offset);
      
      	return 0;
      }
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
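The "drop i_mutex and reacquire it before returning" idea from 148f948b can be sketched in userspace with a pthread mutex standing in for i_mutex and a hypothetical flush callback standing in for the device flush. `block_fsync_sketch()` and `flush_requires_unlocked()` are illustrative names, not kernel functions. The caller holds the mutex on entry and on return, as `vfs_fsync_range()` expects, but the flush itself runs without it, so concurrent O_SYNC writers are no longer serialised behind one another's flushes.

```c
#include <pthread.h>

/* Hypothetical stand-in for the device flush the fsync path issues. */
typedef int (*flush_fn)(void *dev);

/* Caller holds i_mutex (as vfs_fsync_range() does). The flush needs no
 * serialisation, so drop the mutex around it and re-take it before
 * returning, keeping the caller's later unlock balanced. */
static int block_fsync_sketch(pthread_mutex_t *i_mutex, flush_fn flush,
			      void *dev)
{
	int err;

	pthread_mutex_unlock(i_mutex);	/* let other writers proceed */
	err = flush(dev);
	pthread_mutex_lock(i_mutex);	/* caller expects to still hold it */
	return err;
}

/* Demo flush: succeeds only if the mutex passed as dev is NOT held,
 * i.e. it shows the flush runs outside i_mutex. */
static int flush_requires_unlocked(void *dev)
{
	pthread_mutex_t *m = dev;

	if (pthread_mutex_trylock(m) != 0)
		return -1;	/* still held: flushes would serialise */
	pthread_mutex_unlock(m);
	return 0;
}
```

Re-taking the mutex before returning matters: `vfs_fsync_range()` unlocks it after `->fsync` returns, so returning without it would unbalance the lock.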