1. 28 9月, 2013 1 次提交
    • D
      NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open() · f1fe29b4
      David Howells 提交于
      Use i_writecount to control whether to get an fscache cookie in nfs_open() as
      NFS does not do write caching yet.  I *think* this is the cause of a problem
      encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
      pointer dereference because cookie->def is NULL:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
      PGD 0
      Thread overran stack, or stack corrupted
      Oops: 0000 [#1] SMP
      Modules linked in: ...
      CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
      Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
      task: ffff8804203460c0 ti: ffff880420346640
      RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
      RSP: 0018:ffff8801053af878 EFLAGS: 00210286
      RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
      RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
      RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
      R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
      FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
      CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
      Stack:
      ...
      Call Trace:
      [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70
      [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
      [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
      [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620
      [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0
      [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70
      [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
      [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
      [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120
      [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
      [<ffffffff81357f78>] nfs_setattr+0xc8/0x150
      [<ffffffff8122d95b>] notify_change+0x1cb/0x390
      [<ffffffff8120a55b>] do_truncate+0x7b/0xc0
      [<ffffffff8121f96c>] do_last+0xa4c/0xfd0
      [<ffffffff8121ffbc>] path_openat+0xcc/0x670
      [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
      [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
      [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
      [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24
      
      The code at the instruction pointer was disassembled:
      
      > (gdb) disas __fscache_uncache_page
      > Dump of assembler code for function __fscache_uncache_page:
      > ...
      > 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax
      > 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax)
      > 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237>
      
      These instructions make up:
      
      	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);
      
      That cmpb is the faulting instruction (%rax is 0).  So cookie->def is NULL -
      which presumably means that the cookie has already been at least partway
      through __fscache_relinquish_cookie().
      
      What I think may be happening is something like a three-way race on the same
      file:
      
      	PROCESS 1	PROCESS 2	PROCESS 3
      	===============	===============	===============
      	open(O_TRUNC|O_WRONLY)
      			open(O_RDONLY)
      					open(O_WRONLY)
      	-->nfs_open()
      	-->nfs_fscache_set_inode_cookie()
      	nfs_fscache_inode_lock()
      	nfs_fscache_disable_inode_cookie()
      	__fscache_relinquish_cookie()
      	nfs_inode->fscache = NULL
      	<--nfs_fscache_set_inode_cookie()
      
      			-->nfs_open()
      			-->nfs_fscache_set_inode_cookie()
      			nfs_fscache_inode_lock()
      			nfs_fscache_enable_inode_cookie()
      			__fscache_acquire_cookie()
      			nfs_inode->fscache = cookie
      			<--nfs_fscache_set_inode_cookie()
      	<--nfs_open()
      	-->nfs_setattr()
      	...
      	...
      	-->nfs_invalidate_page()
      	-->__nfs_fscache_invalidate_page()
      	cookie = nfsi->fscache
      					-->nfs_open()
      					-->nfs_fscache_set_inode_cookie()
      					nfs_fscache_inode_lock()
      					nfs_fscache_disable_inode_cookie()
      					-->__fscache_relinquish_cookie()
      	-->__fscache_uncache_page(cookie)
      	<crash>
      					<--__fscache_relinquish_cookie()
      					nfs_inode->fscache = NULL
      					<--nfs_fscache_set_inode_cookie()
      
      What is needed is something to prevent process #2 from reacquiring the cookie
      - and I think checking i_writecount should do the trick.
      
      It's also possible to have a two-way race on this if the file is opened
      O_TRUNC|O_RDONLY instead.
      Reported-by: NMark Moseley <moseleymark@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f1fe29b4
  2. 13 9月, 2013 1 次提交
  3. 22 8月, 2013 2 次提交
  4. 08 8月, 2013 2 次提交
  5. 10 7月, 2013 1 次提交
  6. 19 6月, 2013 1 次提交
  7. 09 6月, 2013 4 次提交
  8. 07 6月, 2013 1 次提交
  9. 12 5月, 2013 1 次提交
    • C
      freezer: add unsafe versions of freezable helpers for NFS · 416ad3c9
      Colin Cross 提交于
      NFS calls the freezable helpers with locks held, which is unsafe
      and will cause lockdep warnings when 6aa97070 "lockdep: check
      that no locks held at freeze time" is reapplied (it was reverted
      in dbf520a9).  NFS shouldn't be doing this, but it has
      long-running syscalls that must hold a lock but also shouldn't
      block suspend.  Until NFS freeze handling is rewritten to use a
      signal to exit out of the critical section, add new *_unsafe
      versions of the helpers that will not run the lockdep test when
      6aa97070 is reapplied, and call them from NFS.
      
      In practice the likley result of holding the lock while freezing
      is that a second task blocked on the lock will never freeze,
      aborting suspend, but it is possible to manufacture a case using
      the cgroup freezer, the lock, and the suspend freezer to create
      a deadlock.  Silencing the lockdep warning here will allow
      problems to be found in other drivers that may have a more
      serious deadlock risk, and prevent new problems from being added.
      Signed-off-by: NColin Cross <ccross@android.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      416ad3c9
  10. 09 4月, 2013 1 次提交
  11. 26 3月, 2013 1 次提交
  12. 28 2月, 2013 1 次提交
    • J
      nfs: don't allow nfs_find_actor to match inodes of the wrong type · f6488c9b
      Jeff Layton 提交于
      Benny Halevy reported the following oops when testing RHEL6:
      
      <7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
      <1>BUG: unable to handle kernel NULL pointer dereference at (null)
      <1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
      <4>PGD 81448a067 PUD 831632067 PMD 0
      <4>Oops: 0000 [#1] SMP
      <4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
      <4>CPU 6
      <4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>
      <4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6
      <4>RIP: 0010:[<ffffffffa02a52c5>]  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
      <4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
      <4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
      <4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
      <4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
      <4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
      <4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
      <4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
      <4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      <4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
      <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
      <4>Stack:
      <4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
      <4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
      <4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
      <4>Call Trace:
      <4> [<ffffffff81182745>] __fput+0xf5/0x210
      <4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
      <4> [<ffffffff81182885>] fput+0x25/0x30
      <4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
      <4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
      <4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
      <4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
      <4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
      <4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
      <4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
      <4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
      <4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
      <4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
      <4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
      <4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
      <4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
      <4> [<ffffffff8117df90>] sys_open+0x20/0x30
      <4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
      <4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
      <1>RIP  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
      <4> RSP <ffff88081458bb98>
      <4>CR2: 0000000000000000
      
      I think this is ultimately due to a bug on the server. The client had
      previously found a directory dentry. It then later tried to do an atomic
      open on a new (regular file) dentry. The attributes it got back had the
      same filehandle as the previously found directory inode. It then tried
      to put the filp because it failed the aops tests for O_DIRECT opens, and
      oopsed here because the ctx was still NULL.
      
      Obviously the root cause here is a server issue, but we can take steps
      to mitigate this on the client. When nfs_fhget is called, we always know
      what type of inode it is. In the event that there's a broken or
      malicious server on the other end of the wire, the client can end up
      crashing because the wrong ops are set on it.
      
      Have nfs_find_actor check that the inode type is correct after checking
      the fileid. The fileid check should rarely ever match, so it should only
      rarely ever get to this check. In the case where we have a broken
      server, we may see two different inodes with the same i_ino, but the
      client should be able to cope with them without crashing.
      
      This should fix the oops reported here:
      
          https://bugzilla.redhat.com/show_bug.cgi?id=913660Reported-by: NBenny Halevy <bhalevy@tonian.com>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      f6488c9b
  13. 23 2月, 2013 1 次提交
  14. 13 2月, 2013 1 次提交
  15. 01 2月, 2013 1 次提交
  16. 21 12月, 2012 1 次提交
    • D
      NFS: Use FS-Cache invalidation · de242c0b
      David Howells 提交于
      Use the new FS-Cache invalidation facility from NFS to deal with foreign
      changes being detected on the server rather than attempting to retire the old
      cookie and get a new one.
      
      The problem with the old method was that NFS did not wait for all outstanding
      storage and retrieval ops on the cache to complete.  There was no automatic
      wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
      as the latter can only wait on locked pages that have been added to the
      pagecache (which they haven't yet on entry to ->readpages()).
      
      This was leading to oopses like the one below when an outstanding read got cut
      off from its cookie by a premature release.
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
      IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
      PGD 15889067 PUD 15890067 PMD 0
      Oops: 0000 [#1] SMP
      CPU 0
      Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc
      
      Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064                  /DG965RY
      RIP: 0010:[<ffffffffa0075118>]  [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
      RSP: 0018:ffff8800158799e8  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
      RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
      RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
      R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
      R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
      FS:  00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
      Stack:
       ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
       ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
       ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
      Call Trace:
       [<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
       [<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
       [<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
       [<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
       [<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
       [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
       [<ffffffff81098d7b>] ra_submit+0x1c/0x20
       [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
       [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
       [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
       [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
       [<ffffffff810c22c4>] do_sync_read+0xba/0xfa
       [<ffffffff810a62c9>] ? might_fault+0x4e/0x9e
       [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
       [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
       [<ffffffff810c29a4>] vfs_read+0xaa/0x13a
       [<ffffffff810c2a79>] sys_read+0x45/0x6c
       [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b
      Reported-by: NMark Moseley <moseleymark@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      de242c0b
  17. 15 12月, 2012 1 次提交
  18. 05 11月, 2012 1 次提交
  19. 01 11月, 2012 1 次提交
    • W
      NFS: add nfs_sb_deactive_async to avoid deadlock · 324d003b
      Weston Andros Adamson 提交于
      Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue
      context.  This avoids a deadlock where rpc_shutdown_client loops forever
      in a workqueue kworker context, trying to kill all RPC tasks associated with
      the client, while one or more of these tasks have already been assigned to the
      same kworker (and will never run rpc_exit_task).
      
      This approach is needed because RPC tasks that have already been assigned
      to a kworker by queue_work cannot be canceled, as explained in the comment
      for workqueue.c:insert_wq_barrier.
      Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
      [Trond: add module_get/put.]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      324d003b
  20. 03 10月, 2012 1 次提交
  21. 29 9月, 2012 2 次提交
  22. 05 9月, 2012 1 次提交
  23. 01 8月, 2012 1 次提交
    • M
      nfs: disable data cache revalidation for swapfiles · 29418aa4
      Mel Gorman 提交于
      The VM does not like PG_private set on PG_swapcache pages.  As suggested
      by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS
      data cache revalidation on swap files.  as it does not make sense to have
      other clients change the file while it is being used as swap.  This avoids
      setting PG_private on swap pages, since there ought to be no further races
      with invalidate_inode_pages2() to deal with.
      
      Since we cannot set PG_private we cannot use page->private which is
      already used by PG_swapcache pages to store the nfs_page.  Thus augment
      the new nfs_page_find_request logic.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Xiaotian Feng <dfeng@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29418aa4
  24. 31 7月, 2012 5 次提交
  25. 18 7月, 2012 1 次提交
  26. 29 6月, 2012 2 次提交
  27. 20 6月, 2012 1 次提交
  28. 01 6月, 2012 1 次提交
  29. 25 5月, 2012 1 次提交