提交 · f1fe29b4a02d0805aa7d0ff6b73410a9f9316d69 · openeuler / raspberrypi-kernel

28 9月, 2013 1 次提交

NFS: Use i_writecount to control whether to get an fscache cookie in nfs_open() · f1fe29b4

由 David Howells 提交于 9月 27, 2013

Use i_writecount to control whether to get an fscache cookie in nfs_open() as
NFS does not do write caching yet.  I *think* this is the cause of a problem
encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL
pointer dereference because cookie->def is NULL:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
PGD 0
Thread overran stack, or stack corrupted
Oops: 0000 [#1] SMP
Modules linked in: ...
CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1
Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012
task: ffff8804203460c0 ti: ffff880420346640
RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160
RSP: 0018:ffff8801053af878 EFLAGS: 00210286
RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8
RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780
RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003
R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440
R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700
CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0
Stack:
...
Call Trace:
[<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70
[<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90
[<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90
[<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620
[<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0
[<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70
[<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40
[<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70
[<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120
[<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0
[<ffffffff81357f78>] nfs_setattr+0xc8/0x150
[<ffffffff8122d95b>] notify_change+0x1cb/0x390
[<ffffffff8120a55b>] do_truncate+0x7b/0xc0
[<ffffffff8121f96c>] do_last+0xa4c/0xfd0
[<ffffffff8121ffbc>] path_openat+0xcc/0x670
[<ffffffff81220a0e>] do_filp_open+0x4e/0xb0
[<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0
[<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50
[<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24

The code at the instruction pointer was disassembled:

> (gdb) disas __fscache_uncache_page
> Dump of assembler code for function __fscache_uncache_page:
> ...
> 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax
> 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax)
> 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237>

These instructions make up:

	ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX);

That cmpb is the faulting instruction (%rax is 0).  So cookie->def is NULL -
which presumably means that the cookie has already been at least partway
through __fscache_relinquish_cookie().

What I think may be happening is something like a three-way race on the same
file:

	PROCESS 1	PROCESS 2	PROCESS 3
	===============	===============	===============
	open(O_TRUNC|O_WRONLY)
			open(O_RDONLY)
					open(O_WRONLY)
	-->nfs_open()
	-->nfs_fscache_set_inode_cookie()
	nfs_fscache_inode_lock()
	nfs_fscache_disable_inode_cookie()
	__fscache_relinquish_cookie()
	nfs_inode->fscache = NULL
	<--nfs_fscache_set_inode_cookie()

			-->nfs_open()
			-->nfs_fscache_set_inode_cookie()
			nfs_fscache_inode_lock()
			nfs_fscache_enable_inode_cookie()
			__fscache_acquire_cookie()
			nfs_inode->fscache = cookie
			<--nfs_fscache_set_inode_cookie()
	<--nfs_open()
	-->nfs_setattr()
	...
	...
	-->nfs_invalidate_page()
	-->__nfs_fscache_invalidate_page()
	cookie = nfsi->fscache
					-->nfs_open()
					-->nfs_fscache_set_inode_cookie()
					nfs_fscache_inode_lock()
					nfs_fscache_disable_inode_cookie()
					-->__fscache_relinquish_cookie()
	-->__fscache_uncache_page(cookie)
	<crash>
					<--__fscache_relinquish_cookie()
					nfs_inode->fscache = NULL
					<--nfs_fscache_set_inode_cookie()

What is needed is something to prevent process #2 from reacquiring the cookie
- and I think checking i_writecount should do the trick.

It's also possible to have a two-way race on this if the file is opened
O_TRUNC|O_RDONLY instead.
Reported-by: NMark Moseley <moseleymark@gmail.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

f1fe29b4

13 9月, 2013 1 次提交

truncate: drop 'oldsize' truncate_pagecache() parameter · 7caef267

由 Kirill A. Shutemov 提交于 9月 12, 2013

truncate_pagecache() doesn't care about old size since commit
cedabed4 ("vfs: Fix vmtruncate() regression").  Let's drop it.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7caef267

22 8月, 2013 2 次提交

NFS: Add event tracing for generic NFS events · f4ce1299

由 Trond Myklebust 提交于 8月 19, 2013

Add tracepoints for inode attribute updates, attribute revalidation,
writeback start/end fsync start/end, attribute change start/end,
permission check start/end.

The intention is to enable performance tracing using 'perf'as well as
improving debugging.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

f4ce1299

NFS: refactor code for calculating the crc32 hash of a filehandle · 1264a2f0

由 Trond Myklebust 提交于 8月 12, 2013

We want to be able to display the crc32 hash of the filehandle in
tracepoints.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

1264a2f0

08 8月, 2013 2 次提交

NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget() · eddffa40

由 Trond Myklebust 提交于 8月 07, 2013

We only need to call it on the creation of the inode.
Reported-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Cc: Steve Dickson <SteveD@redhat.com>
Cc: Dave Quigley <dpquigl@davequigley.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

eddffa40

NFS: Fix writeback performance issue on cache invalidation · f8806c84

由 Trond Myklebust 提交于 8月 05, 2013

If a cache invalidation is triggered, and we happen to have a lot of
writebacks cached at the time, then the call to invalidate_inode_pages2()
will end up calling ->launder_page() on each and every dirty page in order
to sync its contents to disk, thus defeating write coalescing.
The following patch ensures that we try to sync the inode to disk before
calling invalidate_inode_pages2() so that we do the writeback as efficiently
as possible.
Reported-by: NWilliam Dauchy <william@gandi.net>
Reported-by: NPascal Bouchareine <pascal@gandi.net>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: NWilliam Dauchy <william@gandi.net>
Reviewed-by: NJeff Layton <jlayton@redhat.com>

f8806c84

10 7月, 2013 1 次提交

NFS: Make nfs_attribute_cache_expired() non-static · 43f291cd

由 Scott Mayhew 提交于 7月 05, 2013

NFS: Make nfs_attribute_cache_expired() non-static so we can call it from
nfs_readdir().
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

43f291cd

19 6月, 2013 1 次提交

NFSv4: Move the DNS resolver into the NFSv4 module · c8d74d9b

由 Trond Myklebust 提交于 6月 01, 2013

The other protocols don't use it, so make it local to NFSv4, and
remove the EXPORT.
Also ensure that we only compile in cache_lib.o if we're using
the legacy DNS resolver.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>

c8d74d9b

09 6月, 2013 4 次提交

NFS: Client implementation of Labeled-NFS · aa9c2669

由 David Quigley 提交于 5月 22, 2013

This patch implements the client transport and handling support for labeled
NFS. The patch adds two functions to encode and decode the security label
recommended attribute which makes use of the LSM hooks added earlier. It also
adds code to grab the label from the file attribute structures and encode the
label to be sent back to the server.
Acked-by: NJames Morris <james.l.morris@oracle.com>
Signed-off-by: NMatthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: NMiguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: NPhua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: NKhin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: NSteve Dickson <steved@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

aa9c2669

NFS: Add label lifecycle management · 14c43f76

由 David Quigley 提交于 5月 22, 2013

This patch adds the lifecycle management for the security label structure
introduced in an earlier patch. The label is not used yet but allocations and
freeing of the structure is handled.
Signed-off-by: NMatthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: NMiguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: NPhua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: NKhin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: NSteve Dickson <steved@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

14c43f76

NFS:Add labels to client function prototypes · 1775fd3e

由 David Quigley 提交于 5月 22, 2013

After looking at all of the nfsv4 operations the label structure has been added
to the prototypes of the functions which can transmit label data.
Signed-off-by: NMatthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: NMiguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: NPhua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: NKhin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: NSteve Dickson <steved@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

1775fd3e

NFSv4: Introduce new label structure · e058f70b

由 Steve Dickson 提交于 5月 22, 2013

In order to mimic the way that NFSv4 ACLs are implemented we have created a
structure to be used to pass label data up and down the call chain. This patch
adds the new structure and new members to the required NFSv4 call structures.
Signed-off-by: NMatthew N. Dodd <Matthew.Dodd@sparta.com>
Signed-off-by: NMiguel Rodel Felipe <Rodel_FM@dsi.a-star.edu.sg>
Signed-off-by: NPhua Eu Gene <PHUA_Eu_Gene@dsi.a-star.edu.sg>
Signed-off-by: NKhin Mi Mi Aung <Mi_Mi_AUNG@dsi.a-star.edu.sg>
Signed-off-by: NSteve Dickson <steved@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

e058f70b

07 6月, 2013 1 次提交

NFSv4: Close another NFSv4 recovery race · c45ffdd2

由 Trond Myklebust 提交于 5月 29, 2013

State recovery currently relies on being able to find a valid
nfs_open_context in the inode->open_files list.
We therefore need to put the nfs_open_context on the list while
we're still protected by the sp->so_reclaim_seqcount in order
to avoid reboot races.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

c45ffdd2

12 5月, 2013 1 次提交

freezer: add unsafe versions of freezable helpers for NFS · 416ad3c9

由 Colin Cross 提交于 5月 06, 2013

NFS calls the freezable helpers with locks held, which is unsafe
and will cause lockdep warnings when 6aa97070 "lockdep: check
that no locks held at freeze time" is reapplied (it was reverted
in dbf520a9).  NFS shouldn't be doing this, but it has
long-running syscalls that must hold a lock but also shouldn't
block suspend.  Until NFS freeze handling is rewritten to use a
signal to exit out of the critical section, add new *_unsafe
versions of the helpers that will not run the lockdep test when
6aa97070 is reapplied, and call them from NFS.

In practice the likley result of holding the lock while freezing
is that a second task blocked on the lock will never freeze,
aborting suspend, but it is possible to manufacture a case using
the cgroup freezer, the lock, and the suspend freezer to create
a deadlock.  Silencing the lockdep warning here will allow
problems to be found in other drivers that may have a more
serious deadlock risk, and prevent new problems from being added.
Signed-off-by: NColin Cross <ccross@android.com>
Acked-by: NPavel Machek <pavel@ucw.cz>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

416ad3c9

09 4月, 2013 1 次提交

NFS: Add functionality to allow waiting on all outstanding reads to complete · 577b4232

由 Trond Myklebust 提交于 4月 08, 2013

This will later allow NFS locking code to wait for readahead to complete
before releasing byte range locks.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

577b4232

26 3月, 2013 1 次提交

NFS: __nfs_find_lock_context needs to check ctx->lock_context for a match too · 8c86899f

由 Trond Myklebust 提交于 3月 15, 2013

Currently, we're forcing an unnecessary duplication of the
initial nfs_lock_context in calls to nfs_get_lock_context, since
__nfs_find_lock_context ignores the ctx->lock_context.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

8c86899f

28 2月, 2013 1 次提交

nfs: don't allow nfs_find_actor to match inodes of the wrong type · f6488c9b

由 Jeff Layton 提交于 2月 27, 2013

Benny Halevy reported the following oops when testing RHEL6:

<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>PGD 81448a067 PUD 831632067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
<4>CPU 6
<4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6  /ProLiant DL170e G6
<4>RIP: 0010:[<ffffffffa02a52c5>]  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff88081458bb98  EFLAGS: 00010292
<4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
<4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
<4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
<4>FS:  00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
<4>Stack:
<4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
<4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
<4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
<4>Call Trace:
<4> [<ffffffff81182745>] __fput+0xf5/0x210
<4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182885>] fput+0x25/0x30
<4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
<4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
<4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
<4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
<4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
<4> [<ffffffff8117df90>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
<1>RIP  [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4> RSP <ffff88081458bb98>
<4>CR2: 0000000000000000

I think this is ultimately due to a bug on the server. The client had
previously found a directory dentry. It then later tried to do an atomic
open on a new (regular file) dentry. The attributes it got back had the
same filehandle as the previously found directory inode. It then tried
to put the filp because it failed the aops tests for O_DIRECT opens, and
oopsed here because the ctx was still NULL.

Obviously the root cause here is a server issue, but we can take steps
to mitigate this on the client. When nfs_fhget is called, we always know
what type of inode it is. In the event that there's a broken or
malicious server on the other end of the wire, the client can end up
crashing because the wrong ops are set on it.

Have nfs_find_actor check that the inode type is correct after checking
the fileid. The fileid check should rarely ever match, so it should only
rarely ever get to this check. In the case where we have a broken
server, we may see two different inodes with the same i_ino, but the
client should be able to cope with them without crashing.

This should fix the oops reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=913660Reported-by: NBenny Halevy <bhalevy@tonian.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

f6488c9b

23 2月, 2013 1 次提交
- A
  new helper: file_inode(file) · 496ad9aa
  由 Al Viro 提交于 1月 23, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  496ad9aa
13 2月, 2013 1 次提交

nfs: kuid and kgid conversions for nfs/inode.c · 9ff593c4

由 Eric W. Biederman 提交于 2月 01, 2013

- Use uid_eq and gid_eq when comparing kuids and kgids.
- Use make_kuid(&init_user_ns, -2) and make_kgid(&init_user_ns, -2) as
  the initial uid and gid on nfs inodes, instead of using the typeunsafe
  value of -2.

Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

9ff593c4

01 2月, 2013 1 次提交

Revert "NFS: add nfs_sb_deactive_async to avoid deadlock" · 322b2b90

由 Trond Myklebust 提交于 1月 11, 2013

This reverts commit 324d003b.

The deadlock turned out to be caused by a workqueue limitation that has
now been worked around in the RPC code (see comment in rpc_free_task).
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

322b2b90

21 12月, 2012 1 次提交

NFS: Use FS-Cache invalidation · de242c0b

由 David Howells 提交于 12月 20, 2012

Use the new FS-Cache invalidation facility from NFS to deal with foreign
changes being detected on the server rather than attempting to retire the old
cookie and get a new one.

The problem with the old method was that NFS did not wait for all outstanding
storage and retrieval ops on the cache to complete.  There was no automatic
wait between the calls to ->readpages() and calls to invalidate_inode_pages2()
as the latter can only wait on locked pages that have been added to the
pagecache (which they haven't yet on entry to ->readpages()).

This was leading to oopses like the one below when an outstanding read got cut
off from its cookie by a premature release.

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
PGD 15889067 PUD 15890067 PMD 0
Oops: 0000 [#1] SMP
CPU 0
Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc

Pid: 4544, comm: tar Not tainted 3.1.0-rc4-fsdevel+ #1064                  /DG965RY
RIP: 0010:[<ffffffffa0075118>]  [<ffffffffa0075118>] __fscache_read_or_alloc_pages+0x1dd/0x315 [fscache]
RSP: 0018:ffff8800158799e8  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800070d41e0 RCX: ffff8800083dc1b0
RDX: 0000000000000000 RSI: ffff880015879960 RDI: ffff88003e627b90
RBP: ffff880015879a28 R08: 0000000000000002 R09: 0000000000000002
R10: 0000000000000001 R11: ffff880015879950 R12: ffff880015879aa4
R13: 0000000000000000 R14: ffff8800083dc158 R15: ffff880015879be8
FS:  00007f671e9d87c0(0000) GS:ffff88003bc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000a8 CR3: 000000001587f000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process tar (pid: 4544, threadinfo ffff880015878000, task ffff880015875040)
Stack:
 ffffffffa00b1759 ffff8800070dc158 ffff8800000213da ffff88002a286508
 ffff880015879aa4 ffff880015879be8 0000000000000001 ffff88002a2866e8
 ffff880015879a88 ffffffffa00b20be 00000000000200da ffff880015875040
Call Trace:
 [<ffffffffa00b1759>] ? nfs_fscache_wait_bit+0xd/0xd [nfs]
 [<ffffffffa00b20be>] __nfs_readpages_from_fscache+0x7e/0x13f [nfs]
 [<ffffffff81095fe7>] ? __alloc_pages_nodemask+0x156/0x662
 [<ffffffffa0098763>] nfs_readpages+0xee/0x187 [nfs]
 [<ffffffff81098a5e>] __do_page_cache_readahead+0x1be/0x267
 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267
 [<ffffffff81098d7b>] ra_submit+0x1c/0x20
 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a
 [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a
 [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e
 [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs]
 [<ffffffff810c22c4>] do_sync_read+0xba/0xfa
 [<ffffffff810a62c9>] ? might_fault+0x4e/0x9e
 [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84
 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8
 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a
 [<ffffffff810c2a79>] sys_read+0x45/0x6c
 [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b
Reported-by: NMark Moseley <moseleymark@gmail.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

de242c0b

15 12月, 2012 1 次提交
- T
  NFS: Ensure that we always drop inodes that have been marked as stale · eed99357
  由 Trond Myklebust 提交于 12月 14, 2012
```
There is no need to cache stale inodes.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
```
  eed99357
05 11月, 2012 1 次提交
- T
  NFS: Remove BUG_ON()s in the fs/nfs/inode.c · f48407dd
  由 Trond Myklebust 提交于 10月 15, 2012
```
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
```
  f48407dd
01 11月, 2012 1 次提交

NFS: add nfs_sb_deactive_async to avoid deadlock · 324d003b

由 Weston Andros Adamson 提交于 10月 30, 2012

Use nfs_sb_deactive_async instead of nfs_sb_deactive when in a workqueue
context.  This avoids a deadlock where rpc_shutdown_client loops forever
in a workqueue kworker context, trying to kill all RPC tasks associated with
the client, while one or more of these tasks have already been assigned to the
same kworker (and will never run rpc_exit_task).

This approach is needed because RPC tasks that have already been assigned
to a kworker by queue_work cannot be canceled, as explained in the comment
for workqueue.c:insert_wq_barrier.
Signed-off-by: NWeston Andros Adamson <dros@netapp.com>
[Trond: add module_get/put.]
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

324d003b

03 10月, 2012 1 次提交

fs: push rcu_barrier() from deactivate_locked_super() to filesystems · 8c0a8537

由 Kirill A. Shutemov 提交于 9月 26, 2012

There's no reason to call rcu_barrier() on every
deactivate_locked_super().  We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths.  E.g.  on my machine exit_group() of a last process in IPC
namespace takes 0.07538s.  rcu_barrier() takes 0.05188s of that time.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

8c0a8537

29 9月, 2012 2 次提交

NFS: Clean up helper function nfs4_select_rw_stateid() · 2a369153

由 Trond Myklebust 提交于 8月 13, 2012

We want to be able to pass on the information that the page was not
dirtied under a lock. Instead of adding a flag parameter, do this
by passing a pointer to a 'struct nfs_lock_owner' that may be NULL.

Also reuse this structure in struct nfs_lock_context to carry the
fl_owner_t and pid_t.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

2a369153

NFS: Convert nfs_get_lock_context to return an ERR_PTR on failure · b3c54de6

由 Trond Myklebust 提交于 8月 13, 2012

We want to be able to distinguish between allocation failures, and
the case where the lock context is not needed (because there are no
locks).
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

b3c54de6

05 9月, 2012 1 次提交

NFS: Fix the initialisation of the readdir 'cookieverf' array · c3f52af3

由 Trond Myklebust 提交于 9月 03, 2012

When the NFS_COOKIEVERF helper macro was converted into a static
inline function in commit 99fadcd7 (nfs: convert NFS_*(inode)
helpers to static inline), we broke the initialisation of the
readdir cookies, since that depended on doing a memset with an
argument of 'sizeof(NFS_COOKIEVERF(inode))' which therefore
changed from sizeof(be32 cookieverf[2]) to sizeof(be32 *).

At this point, NFS_COOKIEVERF seems to be more of an obfuscation
than a helper, so the best thing would be to just get rid of it.

Also see: https://bugzilla.kernel.org/show_bug.cgi?id=46881Reported-by: NAndi Kleen <andi@firstfloor.org>
Reported-by: NDavid Binderman <dcb314@hotmail.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

c3f52af3

01 8月, 2012 1 次提交

nfs: disable data cache revalidation for swapfiles · 29418aa4

由 Mel Gorman 提交于 7月 31, 2012

The VM does not like PG_private set on PG_swapcache pages.  As suggested
by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS
data cache revalidation on swap files.  as it does not make sense to have
other clients change the file while it is being used as swap.  This avoids
setting PG_private on swap pages, since there ought to be no further races
with invalidate_inode_pages2() to deal with.

Since we cannot set PG_private we cannot use page->private which is
already used by PG_swapcache pages to store the nfs_page.  Thus augment
the new nfs_page_find_request logic.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29418aa4

31 7月, 2012 5 次提交

NFS: Convert v4 into a module · 89d77c8f

由 Bryan Schumaker 提交于 7月 30, 2012

This patch exports symbols needed by the v4 module.  In addition, I also
switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or
CONFIG_NFS_V4_MODULE are set.

The module (nfs4.ko) will be created in the same directory as nfs.ko and
will be automatically loaded the first time you try to mount over NFS v4.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

89d77c8f

NFS: Convert v3 into a module · 1c606fb7

由 Bryan Schumaker 提交于 7月 30, 2012

This patch exports symbols and moves over the final structures needed by
the v3 module. In addition, I also switch over to using IS_ENABLED() to
check if CONFIG_NFS_V3 or CONFIG_NFS_V3_MODULE are set.

The module (nfs3.ko) will be created in the same directory as nfs.ko and
will be automatically loaded the first time you try to mount over NFS v3.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

1c606fb7

NFS: Convert v2 into a module · ddda8e0a

由 Bryan Schumaker 提交于 7月 30, 2012

The module (nfs2.ko) will be created in the same directory as nfs.ko and
will be automatically loaded the first time you try to mount over NFS v2.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

ddda8e0a

NFS: Split out remaining NFS v4 inode functions · 19d87ca3

由 Bryan Schumaker 提交于 7月 30, 2012

Somehow I missed this in my previous patch series, but these functions
are only needed by the v4 code and should be moved to a v4-only file. I
wasn't exactly sure where I should put these functions, so I moved them
into nfs4super.c where I could make them static.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

19d87ca3

NFS: Add version registering framework · ab7017a3

由 Bryan Schumaker 提交于 7月 30, 2012

This patch adds in the code to track multiple versions of the NFS
protocol.  I created default structures for v2, v3 and v4 so that each
version can continue to work while I convert them into kernel modules.
I also removed the const parameter from the rpc_version array so that I
can change it at runtime.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

ab7017a3

18 7月, 2012 1 次提交

NFS: Create an init_nfs_v4() function · 129d1977

由 Bryan Schumaker 提交于 7月 16, 2012

I want to initialize all of NFS v4 in a single function that will
eventually be used as the v4 module init function.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

129d1977

29 6月, 2012 2 次提交

NFS: Create a return_delegation rpc op · 57ec14c5

由 Bryan Schumaker 提交于 6月 20, 2012

Delegations are a v4 feature, so push return_delegation out of the
generic client by creating a new rpc_op and renaming the old function to
be in the nfs v4 "namespace"
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

57ec14c5

NFS: Create a have_delegation rpc_op · 011e2a7f

由 Bryan Schumaker 提交于 6月 20, 2012

Delegations are a v4 feature, so push them out of the generic code.
Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

011e2a7f

20 6月, 2012 1 次提交
- T
  NFS: Initialise commit_info.rpc_out when !defined(CONFIG_NFS_V4) · 1a0de48a
  由 Trond Myklebust 提交于 6月 19, 2012
```
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>
```
  1a0de48a
01 6月, 2012 1 次提交

NFS: Ensure that setattr and getattr wait for O_DIRECT write completion · 1d59d61f

由 Trond Myklebust 提交于 5月 31, 2012

Use the same mechanism as the block devices are using, but move the
helper functions from fs/direct-io.c into fs/inode.c to remove the
dependency on CONFIG_BLOCK.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Fred Isaman <iisaman@netapp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1d59d61f

25 5月, 2012 1 次提交

NFSv4.1 add nfs_inode book keeping for mdsthreshold · 2701d086

由 Andy Adamson 提交于 5月 24, 2012

Keep track of the number of bytes read or written via buffered, direct, and
mem-mapped i/o for use by mdsthreshold size_io hints.
Signed-off-by: NAndy Adamson <andros@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

2701d086