1. 29 1月, 2014 1 次提交
  2. 28 1月, 2014 1 次提交
    • J
      NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping · d529ef83
      Jeff Layton 提交于
      There is a possible race in how the nfs_invalidate_mapping function is
      handled.  Currently, we go and invalidate the pages in the file and then
      clear NFS_INO_INVALID_DATA.
      
      The problem is that it's possible for a stale page to creep into the
      mapping after the page was invalidated (i.e., via readahead). If another
      writer comes along and sets the flag after that happens but before
      invalidate_inode_pages2 returns then we could clear the flag
      without the cache having been properly invalidated.
      
      So, we must clear the flag first and then invalidate the pages. Doing
      this however, opens another race:
      
      It's possible to have two concurrent read() calls that end up in
      nfs_revalidate_mapping at the same time. The first one clears the
      NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping.
      
      Just before calling that though, the other task races in, checks the
      flag and finds it cleared. At that point, it trusts that the mapping is
      good and gets the lock on the page, allowing the read() to be satisfied
      from the cache even though the data is no longer valid.
      
      These effects are easily manifested by running diotest3 from the LTP
      test suite on NFS. That program does a series of DIO writes and buffered
      reads. The operations are serialized and page-aligned but the existing
      code fails the test since it occasionally allows a read to come out of
      the cache incorrectly. While mixing direct and buffered I/O isn't
      recommended, I believe it's possible to hit this in other ways that just
      use buffered I/O, though that situation is much harder to reproduce.
      
      The problem is that the checking/clearing of that flag and the
      invalidation of the mapping really need to be atomic. Fix this by
      serializing concurrent invalidations with a bitlock.
      
      At the same time, we also need to allow other places that check
      NFS_INO_INVALID_DATA to check whether we might be in the middle of
      invalidating the file, so fix up a couple of places that do that
      to look for the new NFS_INO_INVALIDATING flag.
      
      Doing this requires us to be careful not to set the bitlock
      unnecessarily, so this code only does that if it believes it will
      be doing an invalidation.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      d529ef83
  3. 06 1月, 2014 1 次提交
  4. 29 10月, 2013 1 次提交
  5. 25 10月, 2013 1 次提交
  6. 28 9月, 2013 1 次提交
    • D
      NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open() · f1fe29b4
      David Howells 提交于
      Use i_writecount to control whether to get an fscache cookie in nfs_open() as
      NFS does not do write caching yet.  I *think* this is the cause of a problem
      encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
      pointer dereference because cookie->def is NULL:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
      PGD 0
      Thread overran stack, or stack corrupted
      Oops: 0000 [#1] SMP
      Modules linked in: ...
      CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
      Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
      task: ffff8804203460c0 ti: ffff880420346640
      RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
      RSP: 0018:ffff8801053af878 EFLAGS: 00210286
      RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
      RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
      RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
      R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
      FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
      CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
      Stack:
      ...
      Call Trace:
      [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70
      [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
      [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
      [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620
      [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0
      [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70
      [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
      [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
      [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120
      [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
      [<ffffffff81357f78>] nfs_setattr+0xc8/0x150
      [<ffffffff8122d95b>] notify_change+0x1cb/0x390
      [<ffffffff8120a55b>] do_truncate+0x7b/0xc0
      [<ffffffff8121f96c>] do_last+0xa4c/0xfd0
      [<ffffffff8121ffbc>] path_openat+0xcc/0x670
      [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
      [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
      [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
      [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24
      
      The code at the instruction pointer was disassembled:
      
      > (gdb) disas __fscache_uncache_page
      > Dump of assembler code for function __fscache_uncache_page:
      > ...
      > 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax
      > 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax)
      > 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237>
      
      These instructions make up:
      
      	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
      
      That cmpb is the faulting instruction (%rax is 0).  So cookie->def is NULL -
      which presumably means that the cookie has already been at least partway
      through __fscache_relinquish_cookie().
      
      What I think may be happening is something like a three-way race on the same
      file:
      
      	PROCESS 1	PROCESS 2	PROCESS 3
      	===============	===============	===============
      	open(O_TRUNC|O_WRONLY)
      			open(O_RDONLY)
      					open(O_WRONLY)
      	-->nfs_open()
      	-->nfs_fscache_set_inode_cookie()
      	nfs_fscache_inode_lock()
      	nfs_fscache_disable_inode_cookie()
      	__fscache_relinquish_cookie()
      	nfs_inode->fscache = NULL
      	<--nfs_fscache_set_inode_cookie()
      
      			-->nfs_open()
      			-->nfs_fscache_set_inode_cookie()
      			nfs_fscache_inode_lock()
      			nfs_fscache_enable_inode_cookie()
      			__fscache_acquire_cookie()
      			nfs_inode->fscache = cookie
      			<--nfs_fscache_set_inode_cookie()
      	<--nfs_open()
      	-->nfs_setattr()
      	...
      	...
      	-->nfs_invalidate_page()
      	-->__nfs_fscache_invalidate_page()
      	cookie = nfsi->fscache
      					-->nfs_open()
      					-->nfs_fscache_set_inode_cookie()
      					nfs_fscache_inode_lock()
      					nfs_fscache_disable_inode_cookie()
      					-->__fscache_relinquish_cookie()
      	-->__fscache_uncache_page(cookie)
      	<crash>
      					<--__fscache_relinquish_cookie()
      					nfs_inode->fscache = NULL
      					<--nfs_fscache_set_inode_cookie()
      
      What is needed is something to prevent process #2 from reacquiring the cookie
      - and I think checking i_writecount should do the trick.
      
      It's also possible to have a two-way race on this if the file is opened
      O_TRUNC|O_RDONLY instead.
      Reported-by: NMark Moseley <moseleymark@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f1fe29b4
  7. 26 9月, 2013 1 次提交
  8. 17 9月, 2013 1 次提交
    • M
      nfs: set FILE_CREATED · 01c919ab
      Miklos Szeredi 提交于
      Set FILE_CREATED on O_CREAT|O_EXCL.  If the NFS server honored our request
      for exclusivity then this must be correct.
      
      Currently this is a no-op, since the VFS sets FILE_CREATED anyway.  The
      next patch will, however, require this flag to be always set by
      filesystems.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      01c919ab
  9. 11 9月, 2013 2 次提交
    • D
      fs: convert fs shrinkers to new scan/count API · 1ab6c499
      Dave Chinner 提交于
      Convert the filesystem shrinkers to use the new API, and standardise some
      of the behaviours of the shrinkers at the same time.  For example,
      nr_to_scan means the number of objects to scan, not the number of objects
      to free.
      
      I refactored the CIFS idmap shrinker a little - it really needs to be
      broken up into a shrinker per tree and keep an item count with the tree
      root so that we don't need to walk the tree every time the shrinker needs
      to count the number of objects in the tree (i.e.  all the time under
      memory pressure).
      
      [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
      [assorted fixes folded in]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1ab6c499
    • G
      super: fix calculation of shrinkable objects for small numbers · 55f841ce
      Glauber Costa 提交于
      The sysctl knob sysctl_vfs_cache_pressure is used to determine which
      percentage of the shrinkable objects in our cache we should actively try
      to shrink.
      
      It works great in situations in which we have many objects (at least more
      than 100), because the aproximation errors will be negligible.  But if
      this is not the case, specially when total_objects < 100, we may end up
      concluding that we have no objects at all (total / 100 = 0, if total <
      100).
      
      This is certainly not the biggest killer in the world, but may matter in
      very low kernel memory situations.
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      55f841ce
  10. 06 9月, 2013 1 次提交
  11. 04 9月, 2013 1 次提交
  12. 30 8月, 2013 1 次提交
  13. 22 8月, 2013 7 次提交
  14. 21 8月, 2013 1 次提交
  15. 08 8月, 2013 1 次提交
  16. 10 7月, 2013 2 次提交
  17. 05 7月, 2013 1 次提交
  18. 04 7月, 2013 1 次提交
  19. 29 6月, 2013 1 次提交
  20. 09 6月, 2013 3 次提交
  21. 07 6月, 2013 2 次提交
  22. 26 3月, 2013 1 次提交
  23. 26 2月, 2013 1 次提交
    • J
      vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op · ecf3d1f1
      Jeff Layton 提交于
      The following set of operations on a NFS client and server will cause
      
          server# mkdir a
          client# cd a
          server# mv a a.bak
          client# sleep 30  # (or whatever the dir attrcache timeout is)
          client# stat .
          stat: cannot stat `.': Stale NFS file handle
      
      Obviously, we should not be getting an ESTALE error back there since the
      inode still exists on the server. The problem is that the lookup code
      will call d_revalidate on the dentry that "." refers to, because NFS has
      FS_REVAL_DOT set.
      
      nfs_lookup_revalidate will see that the parent directory has changed and
      will try to reverify the dentry by redoing a LOOKUP. That of course
      fails, so the lookup code returns ESTALE.
      
      The problem here is that d_revalidate is really a bad fit for this case.
      What we really want to know at this point is whether the inode is still
      good or not, but we don't really care what name it goes by or whether
      the dcache is still valid.
      
      Add a new d_op->d_weak_revalidate operation and have complete_walk call
      that instead of d_revalidate. The intent there is to allow for a
      "weaker" d_revalidate that just checks to see whether the inode is still
      good. This is also gives us an opportunity to kill off the FS_REVAL_DOT
      special casing.
      
      [AV: changed method name, added note in porting, fixed confusion re
      having it possibly called from RCU mode (it won't be)]
      
      Cc: NeilBrown <neilb@suse.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ecf3d1f1
  24. 23 2月, 2013 1 次提交
  25. 04 1月, 2013 1 次提交
  26. 18 12月, 2012 1 次提交
  27. 15 12月, 2012 2 次提交
    • T
      NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0 · 65a0c149
      Trond Myklebust 提交于
      If the inode has no links, then we should force a new lookup.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      65a0c149
    • T
      NFS: Fix calls to drop_nlink() · 1f018458
      Trond Myklebust 提交于
      It is almost always wrong for NFS to call drop_nlink() after removing a
      file. What we really want is to mark the inode's attributes for
      revalidation, and we want to ensure that the VFS drops it if we're
      reasonably sure that this is the final unlink().
      Do the former using the usual cache validity flags, and the latter
      by testing if inode->i_nlink == 1, and clearing it in that case.
      
      This also fixes the following warning reported by Neil Brown and
      Jeff Layton (among others).
      
      [634155.004438] WARNING:
      at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin [634155.004442]
      Hardware name: Latitude E6510 [634155.004577]  crc_itu_t crc32c_intel
      snd_hwdep snd_pcm snd_timer snd soundcor [634155.004609] Pid: 13402, comm:
      bash Tainted: G        W    3.5.0-36-desktop # [634155.004611] Call Trace:
      [634155.004630]  [<ffffffff8100444a>] dump_trace+0xaa/0x2b0
      [634155.004641]  [<ffffffff815a23dc>] dump_stack+0x69/0x6f
      [634155.004653]  [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0
      [634155.004662]  [<ffffffff811832e4>] drop_nlink+0x34/0x40
      [634155.004687]  [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs]
      [634155.004714]  [<ffffffff8118049e>] dput+0x12e/0x230
      [634155.004726]  [<ffffffff8116b230>] __fput+0x170/0x230
      [634155.004735]  [<ffffffff81167c0f>] filp_close+0x5f/0x90
      [634155.004743]  [<ffffffff81167cd7>] sys_close+0x97/0x100
      [634155.004754]  [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b
      [634155.004767]  [<00007f2a73a0d110>] 0x7f2a73a0d10f
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org [3.3+]
      1f018458
  28. 30 11月, 2012 1 次提交