1. 01 10月, 2014 1 次提交
  2. 05 8月, 2014 1 次提交
    • E
      NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes · 65b38851
      Eric W. Biederman 提交于
      The usage of pid_ns->child_reaper->nsproxy->net_ns in
      nfs_server_list_open and nfs_client_list_open is not safe.
      
      /proc for a pid namespace can remain mounted after the all of the
      process in that pid namespace have exited.  There are also times
      before the initial process in a pid namespace has started or after the
      initial process in a pid namespace has exited where
      pid_ns->child_reaper can be NULL or stale.  Making the idiom
      pid_ns->child_reaper->nsproxy a double whammy of problems.
      
      Luckily all that needs to happen is to move /proc/fs/nfsfs/servers and
      /proc/fs/nfsfs/volumes under /proc/net to /proc/net/nfsfs/servers and
      /proc/net/nfsfs/volumes and add a symlink from the original location,
      and to use seq_open_net as it has been designed.
      
      Cc: stable@vger.kernel.org
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      65b38851
  3. 04 8月, 2014 1 次提交
    • N
      NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU · 912a108d
      NeilBrown 提交于
      This requires nfs_check_verifier to take an rcu_walk flag, and requires
      an rcu version of nfs_revalidate_inode which returns -ECHILD rather
      than making an RPC call.
      
      With this, nfs_lookup_revalidate can call nfs_neg_need_reval in
      RCU-walk mode.
      
      We can also move the LOOKUP_RCU check past the nfs_check_verifier()
      call in nfs_lookup_revalidate.
      
      If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from
      doing a full check, they return a status indicating that a revalidation
      is required.  As this revalidation will not be possible in RCU_WALK
      mode, -ECHILD will ultimately be returned, which is the desired result.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      912a108d
  4. 16 7月, 2014 2 次提交
    • N
      sched: Allow wait_on_bit_action() functions to support a timeout · c1221321
      NeilBrown 提交于
      It is currently not possible for various wait_on_bit functions
      to implement a timeout.
      
      While the "action" function that is called to do the waiting
      could certainly use schedule_timeout(), there is no way to carry
      forward the remaining timeout after a false wake-up.
      As false-wakeups a clearly possible at least due to possible
      hash collisions in bit_waitqueue(), this is a real problem.
      
      The 'action' function is currently passed a pointer to the word
      containing the bit being waited on.  No current action functions
      use this pointer.  So changing it to something else will be a
      little noisy but will have no immediate effect.
      
      This patch changes the 'action' function to take a pointer to
      the "struct wait_bit_key", which contains a pointer to the word
      containing the bit so nothing is really lost.
      
      It also adds a 'private' field to "struct wait_bit_key", which
      is initialized to zero.
      
      An action function can now implement a timeout with something
      like
      
      static int timed_out_waiter(struct wait_bit_key *key)
      {
      	unsigned long waited;
      	if (key->private == 0) {
      		key->private = jiffies;
      		if (key->private == 0)
      			key->private -= 1;
      	}
      	waited = jiffies - key->private;
      	if (waited > 10 * HZ)
      		return -EAGAIN;
      	schedule_timeout(waited - 10 * HZ);
      	return 0;
      }
      
      If any other need for context in a waiter were found it would be
      easy to use ->private for some other purpose, or even extend
      "struct wait_bit_key".
      
      My particular need is to support timeouts in nfs_release_page()
      to avoid deadlocks with loopback mounted NFS.
      
      While wait_on_bit_timeout() would be a cleaner interface, it
      will not meet my need.  I need the timeout to be sensitive to
      the state of the connection with the server, which could change.
       So I need to use an 'action' interface.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c1221321
    • N
      sched: Remove proliferation of wait_on_bit() action functions · 74316201
      NeilBrown 提交于
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps'. (and /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up at all, but something higher in the stack.  So
      the distinction will still be visible, only with different
      function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since first version of this patch (against 3.15) two new action
      functions appeared, on in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>
      74316201
  5. 25 6月, 2014 2 次提交
  6. 18 4月, 2014 1 次提交
  7. 16 4月, 2014 1 次提交
  8. 04 4月, 2014 1 次提交
    • J
      mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner 提交于
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      91b0abe3
  9. 12 2月, 2014 1 次提交
  10. 10 2月, 2014 1 次提交
    • T
      NFS: Do not set NFS_INO_INVALID_LABEL unless server supports labeled NFS · fd1defc2
      Trond Myklebust 提交于
      Commit aa9c2669 (NFS: Client implementation of Labeled-NFS) introduces
      a performance regression. When nfs_zap_caches_locked is called, it sets
      the NFS_INO_INVALID_LABEL flag irrespectively of whether or not the
      NFS server supports security labels. Since that flag is never cleared,
      it means that all calls to nfs_revalidate_inode() will now trigger
      an on-the-wire GETATTR call.
      
      This patch ensures that we never set the NFS_INO_INVALID_LABEL unless the
      server advertises support for labeled NFS.
      It also causes nfs_setsecurity() to clear NFS_INO_INVALID_LABEL when it
      has successfully set the security label for the inode.
      Finally it gets rid of the NFS_INO_INVALID_LABEL cruft from nfs_update_inode,
      which has nothing to do with labeled NFS.
      Reported-by: NNeil Brown <neilb@suse.de>
      Cc: stable@vger.kernel.org # 3.11+
      Tested-by: NNeil Brown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      fd1defc2
  11. 29 1月, 2014 1 次提交
  12. 28 1月, 2014 2 次提交
    • T
      NFS: Fix races in nfs_revalidate_mapping · 17dfeb91
      Trond Myklebust 提交于
      Commit d529ef83 (NFS: fix the handling
      of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping) introduces
      a potential race, since it doesn't test the value of nfsi->cache_validity
      and set the bitlock in nfsi->flags atomically.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      Cc: Jeff Layton <jlayton@redhat.com>
      17dfeb91
    • J
      NFS: fix the handling of NFS_INO_INVALID_DATA flag in nfs_revalidate_mapping · d529ef83
      Jeff Layton 提交于
      There is a possible race in how the nfs_invalidate_mapping function is
      handled.  Currently, we go and invalidate the pages in the file and then
      clear NFS_INO_INVALID_DATA.
      
      The problem is that it's possible for a stale page to creep into the
      mapping after the page was invalidated (i.e., via readahead). If another
      writer comes along and sets the flag after that happens but before
      invalidate_inode_pages2 returns then we could clear the flag
      without the cache having been properly invalidated.
      
      So, we must clear the flag first and then invalidate the pages. Doing
      this however, opens another race:
      
      It's possible to have two concurrent read() calls that end up in
      nfs_revalidate_mapping at the same time. The first one clears the
      NFS_INO_INVALID_DATA flag and then goes to call nfs_invalidate_mapping.
      
      Just before calling that though, the other task races in, checks the
      flag and finds it cleared. At that point, it trusts that the mapping is
      good and gets the lock on the page, allowing the read() to be satisfied
      from the cache even though the data is no longer valid.
      
      These effects are easily manifested by running diotest3 from the LTP
      test suite on NFS. That program does a series of DIO writes and buffered
      reads. The operations are serialized and page-aligned but the existing
      code fails the test since it occasionally allows a read to come out of
      the cache incorrectly. While mixing direct and buffered I/O isn't
      recommended, I believe it's possible to hit this in other ways that just
      use buffered I/O, though that situation is much harder to reproduce.
      
      The problem is that the checking/clearing of that flag and the
      invalidation of the mapping really need to be atomic. Fix this by
      serializing concurrent invalidations with a bitlock.
      
      At the same time, we also need to allow other places that check
      NFS_INO_INVALID_DATA to check whether we might be in the middle of
      invalidating the file, so fix up a couple of places that do that
      to look for the new NFS_INO_INVALIDATING flag.
      
      Doing this requires us to be careful not to set the bitlock
      unnecessarily, so this code only does that if it believes it will
      be doing an invalidation.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      d529ef83
  13. 26 1月, 2014 1 次提交
  14. 14 1月, 2014 1 次提交
  15. 06 1月, 2014 1 次提交
  16. 20 11月, 2013 1 次提交
  17. 05 11月, 2013 2 次提交
  18. 29 10月, 2013 1 次提交
  19. 28 9月, 2013 1 次提交
    • D
      NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open() · f1fe29b4
      David Howells 提交于
      Use i_writecount to control whether to get an fscache cookie in nfs_open() as
      NFS does not do write caching yet.  I *think* this is the cause of a problem
      encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
      pointer dereference because cookie->def is NULL:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
      PGD 0
      Thread overran stack, or stack corrupted
      Oops: 0000 [#1] SMP
      Modules linked in: ...
      CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
      Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
      task: ffff8804203460c0 ti: ffff880420346640
      RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
      RSP: 0018:ffff8801053af878 EFLAGS: 00210286
      RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
      RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
      RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
      R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
      FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
      CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
      Stack:
      ...
      Call Trace:
      [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70
      [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
      [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
      [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620
      [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0
      [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70
      [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
      [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
      [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120
      [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
      [<ffffffff81357f78>] nfs_setattr+0xc8/0x150
      [<ffffffff8122d95b>] notify_change+0x1cb/0x390
      [<ffffffff8120a55b>] do_truncate+0x7b/0xc0
      [<ffffffff8121f96c>] do_last+0xa4c/0xfd0
      [<ffffffff8121ffbc>] path_openat+0xcc/0x670
      [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
      [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
      [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
      [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24
      
      The code at the instruction pointer was disassembled:
      
      > (gdb) disas __fscache_uncache_page
      > Dump of assembler code for function __fscache_uncache_page:
      > ...
      > 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax
      > 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax)
      > 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237>
      
      These instructions make up:
      
      	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
      
      That cmpb is the faulting instruction (%rax is 0).  So cookie->def is NULL -
      which presumably means that the cookie has already been at least partway
      through __fscache_relinquish_cookie().
      
      What I think may be happening is something like a three-way race on the same
      file:
      
      	PROCESS 1	PROCESS 2	PROCESS 3
      	===============	===============	===============
      	open(O_TRUNC|O_WRONLY)
      			open(O_RDONLY)
      					open(O_WRONLY)
      	-->nfs_open()
      	-->nfs_fscache_set_inode_cookie()
      	nfs_fscache_inode_lock()
      	nfs_fscache_disable_inode_cookie()
      	__fscache_relinquish_cookie()
      	nfs_inode->fscache = NULL
      	<--nfs_fscache_set_inode_cookie()
      
      			-->nfs_open()
      			-->nfs_fscache_set_inode_cookie()
      			nfs_fscache_inode_lock()
      			nfs_fscache_enable_inode_cookie()
      			__fscache_acquire_cookie()
      			nfs_inode->fscache = cookie
      			<--nfs_fscache_set_inode_cookie()
      	<--nfs_open()
      	-->nfs_setattr()
      	...
      	...
      	-->nfs_invalidate_page()
      	-->__nfs_fscache_invalidate_page()
      	cookie = nfsi->fscache
      					-->nfs_open()
      					-->nfs_fscache_set_inode_cookie()
      					nfs_fscache_inode_lock()
      					nfs_fscache_disable_inode_cookie()
      					-->__fscache_relinquish_cookie()
      	-->__fscache_uncache_page(cookie)
      	<crash>
      					<--__fscache_relinquish_cookie()
      					nfs_inode->fscache = NULL
      					<--nfs_fscache_set_inode_cookie()
      
      What is needed is something to prevent process #2 from reacquiring the cookie
      - and I think checking i_writecount should do the trick.
      
      It's also possible to have a two-way race on this if the file is opened
      O_TRUNC|O_RDONLY instead.
      Reported-by: NMark Moseley <moseleymark@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f1fe29b4
  20. 13 9月, 2013 1 次提交
  21. 22 8月, 2013 2 次提交
  22. 08 8月, 2013 2 次提交
  23. 10 7月, 2013 1 次提交
  24. 19 6月, 2013 1 次提交
  25. 09 6月, 2013 4 次提交
  26. 07 6月, 2013 1 次提交
  27. 12 5月, 2013 1 次提交
    • C
      freezer: add unsafe versions of freezable helpers for NFS · 416ad3c9
      Colin Cross 提交于
      NFS calls the freezable helpers with locks held, which is unsafe
      and will cause lockdep warnings when 6aa97070 "lockdep: check
      that no locks held at freeze time" is reapplied (it was reverted
      in dbf520a9).  NFS shouldn't be doing this, but it has
      long-running syscalls that must hold a lock but also shouldn't
      block suspend.  Until NFS freeze handling is rewritten to use a
      signal to exit out of the critical section, add new *_unsafe
      versions of the helpers that will not run the lockdep test when
      6aa97070 is reapplied, and call them from NFS.
      
      In practice the likley result of holding the lock while freezing
      is that a second task blocked on the lock will never freeze,
      aborting suspend, but it is possible to manufacture a case using
      the cgroup freezer, the lock, and the suspend freezer to create
      a deadlock.  Silencing the lockdep warning here will allow
      problems to be found in other drivers that may have a more
      serious deadlock risk, and prevent new problems from being added.
      Signed-off-by: NColin Cross <ccross@android.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      416ad3c9
  28. 09 4月, 2013 1 次提交
  29. 26 3月, 2013 1 次提交
  30. 28 2月, 2013 1 次提交
    • J
      nfs: don't allow nfs_find_actor to match inodes of the wrong type · f6488c9b
      Jeff Layton 提交于
      Benny Halevy reported the following oops when testing RHEL6:
      
      <7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
      <1>BUG: unable to handle kernel NULL pointer dereference at (null)
      <1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
      <4>PGD 81448a067 PUD 831632067 PMD 0
      <4>Oops: 0000 [#1] SMP
      <4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
      <4>CPU 6
      <4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6
      <4>RIP: 0010:[<ffffffffa02a52c5>]  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
      <4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
      <4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
      <4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
      <4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
      <4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
      <4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
      <4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      <4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
      <4>Stack:
      <4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
      <4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
      <4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
      <4>Call Trace:
      <4> [<ffffffff81182745>] __fput+0xf5/0x210
      <4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
      <4> [<ffffffff81182885>] fput+0x25/0x30
      <4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
      <4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
      <4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
      <4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
      <4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
      <4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
      <4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
      <4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
      <4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
      <4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
      <4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
      <4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
      <4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
      <4> [<ffffffff8117df90>] sys_open+0x20/0x30
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      <4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
      <1>RIP  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
      <4> RSP <ffff88081458bb98>
      <4>CR2: 0000000000000000
      
      I think this is ultimately due to a bug on the server. The client had
      previously found a directory dentry. It then later tried to do an atomic
      open on a new (regular file) dentry. The attributes it got back had the
      same filehandle as the previously found directory inode. It then tried
      to put the filp because it failed the aops tests for O_DIRECT opens, and
      oopsed here because the ctx was still NULL.
      
      Obviously the root cause here is a server issue, but we can take steps
      to mitigate this on the client. When nfs_fhget is called, we always know
      what type of inode it is. In the event that there's a broken or
      malicious server on the other end of the wire, the client can end up
      crashing because the wrong ops are set on it.
      
      Have nfs_find_actor check that the inode type is correct after checking
      the fileid. The fileid check should rarely ever match, so it should only
      rarely ever get to this check. In the case where we have a broken
      server, we may see two different inodes with the same i_ino, but the
      client should be able to cope with them without crashing.
      
      This should fix the oops reported here:
      
          https://bugzilla.redhat.com/show_bug.cgi?id=913660Reported-by: NBenny Halevy <bhalevy@tonian.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      f6488c9b
  31. 23 2月, 2013 1 次提交