提交 · 3a6bb738792500e8b4534c0350c13a132bac0492 · openanolis / cloud-kernel

16 6月, 2015 2 次提交

nfs: convert setclientid and exchange_id encoders to use clp->cl_owner_id · 3a6bb738

由 Jeff Layton 提交于 6月 09, 2015

...instead of buffers that are part of their arg structs. We already
hold a reference to the client, so we might as well use the allocated
buffer. In the event that we can't allocate the clp->cl_owner_id, then
just return -ENOMEM.

Note too that we switch from a GFP_KERNEL allocation here to GFP_NOFS.
It's possible we could end up trying to do a SETCLIENTID or EXCHANGE_ID
in order to reclaim some memory, and the GFP_KERNEL allocations in the
existing code could cause recursion back into NFS reclaim.
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3a6bb738

pnfs/flexfiles: use swap() in ff_layout_sort_mirrors() · 455b6ee6

由 Fabian Frederick 提交于 6月 12, 2015

Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

455b6ee6

12 6月, 2015 1 次提交

nfs: deny backchannel RPCs with an incorrect authflavor instead of dropping them · 6f02dc88

由 Jeff Layton 提交于 6月 04, 2015

A drop should really only be done when the frame is malformed or we have
reason to think that there is some sort of DoS going on. When we get an
RPC with bad auth, we should send back an error instead.

Cc: Andy Adamson <William.Adamson@netapp.com>
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

6f02dc88

11 6月, 2015 4 次提交

NFS: Convert use of __constant_htonl to htonl · 4ed0d83d

由 Vaishali Thakkar 提交于 6月 10, 2015

In little endian cases, the macro htonl unfolds to __swab32 which
provides special case for constants. In big endian cases,
__constant_htonl and htonl expand directly to the same expression.
So, replace __constant_htonl with htonl with the goal of getting
rid of the definition of __constant_htonl completely.

The semantic patch that performs this transformation is as follows:

@@expression x;@@

- __constant_htonl(x)
+ htonl(x)
Signed-off-by: NVaishali Thakkar <vthakkar1994@gmail.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

4ed0d83d

NFS: Remove unused nfs_rw_ops->rw_release() function · 11598b8f

由 Anna Schumaker 提交于 6月 10, 2015

This was only ever set to nfs_writeback_release_common(), a function
which is completely empty.  Let's just drop this function pointer and
simplify the code a bit.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

11598b8f

NFSv4: handle nfs4_get_referral failure · c86c90c6

由 Dominique Martinet 提交于 6月 04, 2015

nfs4_proc_lookup_common is supposed to return a posix error, we have to
handle any error returned that isn't errno
Reported-by: NOlga Kornievskaia <kolga@netapp.com>
Signed-off-by: NFrank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: NDominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

c86c90c6

sunrpc: keep a count of swapfiles associated with the rpc_clnt · 3c87ef6e

由 Jeff Layton 提交于 6月 03, 2015

Jerome reported seeing a warning pop when working with a swapfile on
NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
holding the rcu_read_lock and that function can sleep.

To fix that, we need to take a reference to the xprt while holding the
rcu_read_lock, set the socket up for swapping and then drop that
reference. But, xprt_put is not exported and having NFS deal with the
underlying xprt is a bit of layering violation anyway.

Fix this by adding a set of activate/deactivate functions that take a
rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
nfs_swap_deactivate call those.

Also, add a per-rpc_clnt atomic counter to keep track of the number of
active swapfiles associated with it. When the counter does a 0->1
transition, we enable swapping on the xprt, when we do a 1->0 transition
we disable swapping on it.

This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
flag. If non-swapper and swapper clnts are sharing a xprt, then we only
need to flag the tasks from the swapper clnt with that flag.
Acked-by: NMel Gorman <mgorman@suse.de>
Reported-by: NJerome Marchand <jmarchan@redhat.com>
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3c87ef6e

02 6月, 2015 4 次提交

NFS: drop unneeded goto · 13985b1f

由 Julia Lawall 提交于 5月 28, 2015

Delete jump to a label on the next line, when that label is not
used elsewhere.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@r@
identifier l;
@@

-if (...) goto l;
-l:
// </smpl>

Also drop the unnecessary ret variable.
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

13985b1f

NFS: Fix size of NFSACL SETACL operations · d683cc49

由 Chuck Lever 提交于 5月 26, 2015

When encoding the NFSACL SETACL operation, reserve just the estimated
size of the ACL rather than a fixed maximum. This eliminates needless
zero padding on the wire that the server ignores.

Fixes: ee5dc773 ('NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!"')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d683cc49

NFS: report more appropriate block size for directories. · 7ef5ca4f

由 NeilBrown 提交于 5月 08, 2015

In glibc 2.21 (and several previous), a call to opendir() will
result in a 32K (BUFSIZ*4) buffer being allocated and passed to
getdents.

However a call to fdopendir() results in an 'fstat' request to
determine block size and a matching buffer allocated for subsequent
use with getdents.  This will typically be 1M.

The first getdents call on an NFS directory will always use
READDIR_PLUS (or NFSv4 equivalent) if available.  Subsequent getdents
calls only use this more expensive version if some 'stat' requests are
made between the getdents calls.

For this reason it is good to keep at least that first getdents call
relatively short.  When fdopendir() and readdir() is used on a large
directory, it takes approximately 32 times as long to complete as
using "opendir".  Current versions of 'find' use fdopendir() and
demonstrate this slowness.

'stat' on a directory currently returns the 'wsize'.  This number has
no meaning on directories.
Actual READDIR requests are limited to ->dtsize, which itself is
capped at 4 pages, coincidently the same as BUFSIZ*4.
So this is a meaningful number to use as the blocksize on directories,
and has the effect of making 'find' on large directories go a lot
faster.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

7ef5ca4f

NFSv4: Always drain the slot table before re-establishing the lease · 5cae02f4

由 Trond Myklebust 提交于 6月 01, 2015

While the NFSv4.1 code has always drained the slot tables in order to stop
non-recovery related RPC calls when doing lease recovery, the NFSv4 code
did not.
The reason for the difference in behaviour is that NFSv4 does not have
session state, and so RPC calls can in theory proceed while recovery is
happening. In practice, however, anything I/O or state related needs to
wait until recovery is over.

This patch changes the behaviour of NFSv4 to match that of NFSv4.1 so that
we can simplify the state recovery code by assuming that we do not have to
deal with races between recovery and ordinary I/O.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

5cae02f4

01 6月, 2015 1 次提交

fixing infinite OPEN loop in 4.0 stateid recovery · e8d975e7

由 Olga Kornievskaia 提交于 5月 15, 2015

Problem: When an operation like WRITE receives a BAD_STATEID, even though
recovery code clears the RECLAIM_NOGRACE recovery flag before recovering
the open state, because of clearing delegation state for the associated
inode, nfs_inode_find_state_and_recover() gets called and it makes the
same state with RECLAIM_NOGRACE flag again. As a results, when we restart
looking over the open states, we end up in the infinite loop instead of
breaking out in the next test of state flags.

Solution: unset the RECLAIM_NOGRACE set because of
calling of nfs_inode_find_state_and_recover() after returning from calling
recover_open() function.
Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e8d975e7

14 5月, 2015 2 次提交

nfs: take extra reference to fl->fl_file when running a setlk · feaff8e5

由 Jeff Layton 提交于 5月 12, 2015

We had a report of a crash while stress testing the NFS client:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
    IP: [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
    PGD 0
    Oops: 0000 [#1] SMP
    Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw coretemp crct10dif_pclmul ppdev crc32_pclmul crc32c_intel ghash_clmulni_intel vmw_balloon serio_raw vmw_vmci i2c_piix4 shpchp parport_pc acpi_cpufreq parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih mptbase e1000 ata_generic pata_acpi
    CPU: 1 PID: 399 Comm: kworker/1:1H Not tainted 4.1.0-0.rc1.git0.1.fc23.x86_64 #1
    Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
    Workqueue: rpciod rpc_async_schedule [sunrpc]
    task: ffff880036aea7c0 ti: ffff8800791f4000 task.ti: ffff8800791f4000
    RIP: 0010:[<ffffffff8127b698>]  [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
    RSP: 0018:ffff8800791f7c00  EFLAGS: 00010293
    RAX: ffff8800791f7c40 RBX: ffff88001f2ad8c0 RCX: ffffe8ffffc80305
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
    RBP: ffff8800791f7c88 R08: ffff88007fc971d8 R09: 279656d600000000
    R10: 0000034a01000000 R11: 279656d600000000 R12: ffff88001f2ad918
    R13: ffff88001f2ad8c0 R14: 0000000000000000 R15: 0000000100e73040
    FS:  0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000150 CR3: 0000000001c0b000 CR4: 00000000000407e0
    Stack:
     ffffffff8127c5b0 ffff8800791f7c18 ffffffffa0171e29 ffff8800791f7c58
     ffffffffa0171ef8 ffff8800791f7c78 0000000000000246 ffff88001ea0ba00
     ffff8800791f7c40 ffff8800791f7c40 00000000ff5d86a3 ffff8800791f7ca8
    Call Trace:
     [<ffffffff8127c5b0>] ? __posix_lock_file+0x40/0x760
     [<ffffffffa0171e29>] ? rpc_make_runnable+0x99/0xa0 [sunrpc]
     [<ffffffffa0171ef8>] ? rpc_wake_up_task_queue_locked.part.35+0xc8/0x250 [sunrpc]
     [<ffffffff8127cd3a>] posix_lock_file_wait+0x4a/0x120
     [<ffffffffa03e4f12>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
     [<ffffffffa03bf108>] ? nfs41_sequence_done+0xd8/0x2d0 [nfsv4]
     [<ffffffffa03c116d>] do_vfs_lock+0x2d/0x30 [nfsv4]
     [<ffffffffa03c251d>] nfs4_lock_done+0x1ad/0x210 [nfsv4]
     [<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
     [<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
     [<ffffffffa0171a5c>] rpc_exit_task+0x2c/0xa0 [sunrpc]
     [<ffffffffa0167450>] ? call_refreshresult+0x150/0x150 [sunrpc]
     [<ffffffffa0172640>] __rpc_execute+0x90/0x460 [sunrpc]
     [<ffffffffa0172a25>] rpc_async_schedule+0x15/0x20 [sunrpc]
     [<ffffffff810baa1b>] process_one_work+0x1bb/0x410
     [<ffffffff810bacc3>] worker_thread+0x53/0x480
     [<ffffffff810bac70>] ? process_one_work+0x410/0x410
     [<ffffffff810bac70>] ? process_one_work+0x410/0x410
     [<ffffffff810c0b38>] kthread+0xd8/0xf0
     [<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
     [<ffffffff817a1aa2>] ret_from_fork+0x42/0x70
     [<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180

Jean says:

"Running locktests with a large number of iterations resulted in a
 client crash.  The test run took a while and hasn't finished after close
 to 2 hours. The crash happened right after I gave up and killed the test
 (after 107m) with Ctrl+C."

The crash happened because a NULL inode pointer got passed into
locks_get_lock_context. The call chain indicates that file_inode(filp)
returned NULL, which means that f_inode was NULL. Since that's zeroed
out in __fput, that suggests that this filp pointer outlived the last
reference.

Looking at the code, that seems possible. We copy the struct file_lock
that's passed in, but if the task is signalled at an inopportune time we
can end up trying to use that file_lock in rpciod context after the process
that requested it has already returned (and possibly put its filp
reference).

Fix this by taking an extra reference to the filp when we allocate the
lock info, and put it in nfs4_lock_release.
Reported-by: NJean Spector <jean@primarydata.com>
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

feaff8e5

nfs: stat(2) fails during cthon04 basic test5 on NFSv4.0 · 6b196875

由 Chuck Lever 提交于 5月 06, 2015

When running the Connectathon basic tests against a Solaris NFS
server over NFSv4.0, test5 reports that stat(2) returns a file size
of zero instead of 1MB.

On success, nfs_commit_inode() can return a positive result; see
other call sites such as nfs_file_fsync_commit() and
nfs_commit_unstable_pages().

The call site recently added in nfs_wb_all() does not prevent that
positive return value from leaking to its callers. If it leaks
through nfs_sync_inode() back to nfs_getattr(), that causes stat(2)
to return a positive return value to user space while also not
filling in the passed-in struct stat.

Additional clean up: the new logic in nfs_wb_all() is rewritten in
bfields-normal form.

Fixes: 5bb89b47 ("NFSv4.1/pnfs: Separate out metadata . . .")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

6b196875

25 4月, 2015 1 次提交

direct-io: only inc/dec inode->i_dio_count for file systems · fe0f07d0

由 Jens Axboe 提交于 4月 15, 2015

do_blockdev_direct_IO() increments and decrements the inode
->i_dio_count for each IO operation. It does this to protect against
truncate of a file. Block devices don't need this sort of protection.

For a capable multiqueue setup, this atomic int is the only shared
state between applications accessing the device for O_DIRECT, and it
presents a scaling wall for that. In my testing, as much as 30% of
system time is spent incrementing and decrementing this value. A mixed
read/write workload improved from ~2.5M IOPS to ~9.6M IOPS, with
better latencies too. Before:

clat percentiles (usec):
 |  1.00th=[   33],  5.00th=[   34], 10.00th=[   34], 20.00th=[   34],
 | 30.00th=[   34], 40.00th=[   34], 50.00th=[   35], 60.00th=[   35],
 | 70.00th=[   35], 80.00th=[   35], 90.00th=[   37], 95.00th=[   80],
 | 99.00th=[   98], 99.50th=[  151], 99.90th=[  155], 99.95th=[  155],
 | 99.99th=[  165]

After:

clat percentiles (usec):
 |  1.00th=[   95],  5.00th=[  108], 10.00th=[  129], 20.00th=[  149],
 | 30.00th=[  155], 40.00th=[  161], 50.00th=[  167], 60.00th=[  171],
 | 70.00th=[  177], 80.00th=[  185], 90.00th=[  201], 95.00th=[  270],
 | 99.00th=[  390], 99.50th=[  398], 99.90th=[  418], 99.95th=[  422],
 | 99.99th=[  438]

In other setups, Robert Elliott reported seeing good performance
improvements:

https://lkml.org/lkml/2015/4/3/557

The more applications accessing the device, the worse it gets.

Add a new direct-io flags, DIO_SKIP_DIO_COUNT, which tells
do_blockdev_direct_IO() that it need not worry about incrementing
or decrementing the inode i_dio_count for this caller.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Elliott, Robert (Server Storage) <elliott@hp.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NJens Axboe <axboe@fb.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

fe0f07d0

24 4月, 2015 14 次提交

fs/nfs: fix new compiler warning about boolean in switch · c7757074

由 Andre Przywara 提交于 4月 23, 2015

The brand new GCC 5.1.0 warns by default on using a boolean in the
switch condition. This results in the following warning:

fs/nfs/nfs4proc.c: In function 'nfs4_proc_get_rootfh':
fs/nfs/nfs4proc.c:3100:10: warning: switch condition has boolean value [-Wswitch-bool]
  switch (auth_probe) {
          ^

This code was obviously using switch to make use of the fall-through
semantics (without the usual comment, though).
Rewrite that code using if statements to avoid the warning and make
the code a bit more readable on the way.
Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

c7757074

nfs: Remove unneeded casts in nfs · c456aacf

由 Firo Yang 提交于 4月 23, 2015

Don't unnecessarily cast allocation return value in
fs/nfs/inode.c::nfs_alloc_inode().
Signed-off-by: NFiro Yang <firogm@gmail.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

c456aacf

NFS: Don't attempt to decode missing directory entries · ce85cfbe

由 Benjamin Coddington 提交于 4月 21, 2015

If a READDIR reply comes back without any page data, avoid a NULL pointer
dereference in xdr_copy_to_scratch().

BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<ffffffff813a378d>] memcpy+0xd/0x110
...
Call Trace:
	? xdr_inline_decode+0x7a/0xb0 [sunrpc]
	nfs3_decode_dirent+0x73/0x320 [nfsv3]
	nfs_readdir_page_filler+0xd5/0x4e0 [nfs]
	? nfs3_rpc_wrapper.constprop.9+0x42/0xc0 [nfsv3]
	nfs_readdir_xdr_to_array+0x1fa/0x330 [nfs]
	? mem_cgroup_commit_charge+0xac/0x160
	? nfs_readdir_xdr_to_array+0x330/0x330 [nfs]
	nfs_readdir_filler+0x22/0x90 [nfs]
	do_read_cache_page+0x7e/0x1a0
	read_cache_page+0x1c/0x20
	nfs_readdir+0x18e/0x660 [nfs]
	? nfs3_xdr_dec_getattr3res+0x80/0x80 [nfsv3]
	iterate_dir+0x97/0x130
	SyS_getdents+0x94/0x120
	? fillonedir+0xd0/0xd0
	system_call_fastpath+0x12/0x17
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ce85cfbe

Revert "nfs: replace nfs_add_stats with nfs_inc_stats when add one" · 3708f842

由 Nicolas Iooss 提交于 4月 16, 2015

This reverts commit 5a254d08.

Since commit 5a254d08 ("nfs: replace nfs_add_stats with
nfs_inc_stats when add one"), nfs_readpage and nfs_do_writepage use
nfs_inc_stats to increment NFSIOS_READPAGES and NFSIOS_WRITEPAGES
instead of nfs_add_stats.

However nfs_inc_stats does not do the same thing as nfs_add_stats with
value 1 because these functions work on distinct stats:
nfs_inc_stats increments stats from "enum nfs_stat_eventcounters" (in
server->io_stats->events) and nfs_add_stats those from "enum
nfs_stat_bytecounters" (in server->io_stats->bytes).
Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org>
Fixes: 5a254d08 ("nfs: replace nfs_add_stats with nfs_inc_stats...")
Cc: stable@vger.kernel.org # 3.19+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3708f842

NFS: Rename idmap.c to nfs4idmap.c · 7b320382

由 Anna Schumaker 提交于 4月 15, 2015

I added the nfs4 prefix to make it obvious that this file is built into
the NFS v4 module, and not the generic client.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

7b320382

NFS: Move nfs_idmap.h into fs/nfs/ · 40c64c26

由 Anna Schumaker 提交于 4月 15, 2015

This file is only used internally to the NFS v4 module, so it doesn't
need to be in the global include path.  I also renamed it from
nfs_idmap.h to nfs4idmap.h to emphasize that it's an NFSv4-only include
file.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

40c64c26

NFS: Remove CONFIG_NFS_V4 checks from nfs_idmap.h · f9ebd618

由 Anna Schumaker 提交于 4月 15, 2015

The idmapper is completely internal to the NFS v4 module, so this macro
will always evaluate to true. This patch also removes unnecessary
includes of this file from the generic NFS client.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

f9ebd618

NFS: Add a stub for GETDEVICELIST · 7c61f0d3

由 Anna Schumaker 提交于 4月 14, 2015

d4b18c3e (pnfs: remove GETDEVICELIST implementation) removed the
GETDEVICELIST operation from the NFS client, but left a "hole" in the
nfs4_procedures array.  This caused /proc/self/mountstats to report an
operation named "51" where GETDEVICELIST used to be.  This patch adds a
stub to fix mountstats.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Fixes: d4b18c3e (pnfs: remove GETDEVICELIST implementation)
Cc: stable@vger.kernel.org # 3.17+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

7c61f0d3

nfs: remove WARN_ON_ONCE from nfs_direct_good_bytes · 05f54903

由 Peng Tao 提交于 4月 09, 2015

For flexfiles driver, we might choose to read from mirror index other
than 0 while mirror_count is always 1 for read.
Reported-by: NJean Spector <jean@primarydata.com>
Cc: <stable@vger.kernel.org> # v3.19+
Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

05f54903

nfs: fix DIO good bytes calculation · 1ccbad9f

由 Peng Tao 提交于 4月 09, 2015

For direct read that has IO size larger than rsize, we'll split
it into several READ requests and nfs_direct_good_bytes() would
count completed bytes incorrectly by eating last zero count reply.

Fix it by handling mirror and non-mirror cases differently such that
we only count mirrored writes differently.

This fixes 5fadeb47("nfs: count DIO good bytes correctly with mirroring").
Reported-by: NJean Spector <jean@primarydata.com>
Cc: <stable@vger.kernel.org> # v3.19+
Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

1ccbad9f

nfs: Fetch MOUNTED_ON_FILEID when updating an inode · ea96d1ec

由 Anna Schumaker 提交于 4月 03, 2015

2ef47eb1 (NFS: Fix use of nfs_attr_use_mounted_on_fileid()) was a good
start to fixing a circular directory structure warning for NFS v4
"junctioned" mountpoints.  Unfortunately, further testing continued to
generate this error.

My server is configured like this:

anna@nfsd ~ % df
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.1G  2.0G  6.5G  24% /
/dev/vdc1      1014M   33M  982M   4% /exports
/dev/vdc2      1014M   33M  982M   4% /exports/vol1
/dev/vdc3      1014M   33M  982M   4% /exports/vol1/vol2

anna@nfsd ~ % cat /etc/exports
/exports/          *(rw,async,no_subtree_check,no_root_squash)
/exports/vol1/     *(rw,async,no_subtree_check,no_root_squash)
/exports/vol1/vol2 *(rw,async,no_subtree_check,no_root_squash)

I've been running chown across the entire mountpoint twice in a row to
hit this problem.  The first run succeeds, but the second one fails with
the circular directory warning along with:

anna@client ~ % dmesg
[Apr 3 14:28] NFS: server 192.168.100.204 error: fileid changed
              fsid 0:39: expected fileid 0x100080, got 0x80

WHere 0x80 is the mountpoint's fileid and 0x100080 is the mounted-on
fileid.

This patch fixes the issue by requesting an updated mounted-on fileid
from the server during nfs_update_inode(), and then checking that the
fileid stored in the nfs_inode matches either the fileid or mounted-on
fileid returned by the server.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ea96d1ec

nfs: fix high load average due to callback thread sleeping · 5d05e54a

由 Jeff Layton 提交于 3月 20, 2015

Chuck pointed out a problem that crept in with commit 6ffa30d3 (nfs:
don't call blocking operations while !TASK_RUNNING). Linux counts tasks
in uninterruptible sleep against the load average, so this caused the
system's load average to be pinned at at least 1 when there was a
NFSv4.1+ mount active.

Not a huge problem, but it's probably worth fixing before we get too
many complaints about it. This patch converts the code back to use
TASK_INTERRUPTIBLE sleep, simply has it flush any signals on each loop
iteration. In practice no one should really be signalling this thread at
all, so I think this is reasonably safe.

With this change, there's also no need to game the hung task watchdog so
we can also convert the schedule_timeout call back to a normal schedule.

Cc: <stable@vger.kernel.org>
Reported-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
Tested-by: NChuck Lever <chuck.lever@oracle.com>
Fixes: commit 6ffa30d3 (“nfs: don't call blocking . . .”)
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

5d05e54a

NFS: Reduce time spent holding the i_mutex during fallocate() · f830f7dd

由 Anna Schumaker 提交于 3月 16, 2015

At the very least, we should not be taking the i_mutex until after
checking if the server even supports ALLOCATE or DEALLOCATE, allowing
v4.0 or v4.1 to exit without potentially waiting on a lock.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

f830f7dd

NFS: Don't zap caches on fallocate() · 9a51940b

由 Anna Schumaker 提交于 3月 16, 2015

This patch adds a GETATTR to the end of ALLOCATE and DEALLOCATE
operations so we can set the updated inode size and change attribute
directly.  DEALLOCATE will still need to release pagecache pages, so
nfs42_proc_deallocate() now calls truncate_pagecache_range() before
contacting the server.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

9a51940b

16 4月, 2015 4 次提交

kernel: conditionally support non-root users, groups and capabilities · 2813893f

由 Iulia Manda 提交于 4月 15, 2015

There are a lot of embedded systems that run most or all of their
functionality in init, running as root:root.  For these systems,
supporting multiple users is not necessary.

This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for
non-root users, non-root groups, and capabilities optional.  It is enabled
under CONFIG_EXPERT menu.

When this symbol is not defined, UID and GID are zero in any possible case
and processes always have all capabilities.

The following syscalls are compiled out: setuid, setregid, setgid,
setreuid, setresuid, getresuid, setresgid, getresgid, setgroups,
getgroups, setfsuid, setfsgid, capget, capset.

Also, groups.c is compiled out completely.

In kernel/capability.c, capable function was moved in order to avoid
adding two ifdef blocks.

This change saves about 25 KB on a defconfig build.  The most minimal
kernels have total text sizes in the high hundreds of kB rather than
low MB.  (The 25k goes down a bit with allnoconfig, but not that much.

The kernel was booted in Qemu.  All the common functionalities work.
Adding users/groups is not possible, failing with -ENOSYS.

Bloat-o-meter output:
add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650)

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NIulia Manda <iulia.manda21@gmail.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2813893f

VFS: normal filesystems (and lustre): d_inode() annotations · 2b0143b5

由 David Howells 提交于 3月 17, 2015

that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2b0143b5

NFS: Don't use d_inode as a variable name · 88e7fbd4

由 David Howells 提交于 3月 04, 2015

Don't use d_inode as a variable name as it now masks a function name.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

88e7fbd4

A
nfs: generic_write_checks() shouldn't be done on swapout... · 65a4a1ca
由 Al Viro 提交于 4月 09, 2015
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
65a4a1ca

15 4月, 2015 1 次提交

page_writeback: clean up mess around cancel_dirty_page() · b9ea2515

由 Konstantin Khlebnikov 提交于 4月 14, 2015

This patch replaces cancel_dirty_page() with a helper function
account_page_cleaned() which only updates counters.  It's called from
truncate_complete_page() and from try_to_free_buffers() (hack for ext3).
Page is locked in both cases, page-lock protects against concurrent
dirtiers: see commit 2d6d7f98 ("mm: protect set_page_dirty() from
ongoing truncation").

Delete_from_page_cache() shouldn't be called for dirty pages, they must
be handled by caller (either written or truncated).  This patch treats
final dirty accounting fixup at the end of __delete_from_page_cache() as
a debug check and adds WARN_ON_ONCE() around it.  If something removes
dirty pages without proper handling that might be a bug and unwritten
data might be lost.

Hugetlbfs has no dirty pages accounting, ClearPageDirty() is enough
here.

cancel_dirty_page() in nfs_wb_page_cancel() is redundant.  This is
helper for nfs_invalidate_page() and it's called only in case complete
invalidation.

The mess was started in v2.6.20 after commits 46d2277c ("Clean up
and make try_to_free_buffers() not race with dirty pages") and
3e67c098 ("truncate: clear page dirtiness before running
try_to_free_buffers()") first was reverted right in v2.6.20 in commit
ecdfc978 ("Resurrect 'try_to_free_buffers()' VM hackery"), second in
v2.6.25 commit a2b34564 ("Fix dirty page accounting leak with ext3
data=journal").

Custom fixes were introduced between these points.  NFS in v2.6.23, commit
1b3b4a1a ("NFS: Fix a write request leak in nfs_invalidate_page()").
Kludge in __delete_from_page_cache() in v2.6.24, commit 3a692790 ("Do
dirty page accounting when removing a page from the page cache").  Since
v2.6.25 all of them are redundant.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9ea2515

12 4月, 2015 6 次提交

A
mirror O_APPEND and O_DIRECT into iocb->ki_flags · 2ba48ce5
由 Al Viro 提交于 4月 09, 2015
```
... avoiding write_iter/fcntl races.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
2ba48ce5

switch generic_write_checks() to iocb and iter · 3309dd04

由 Al Viro 提交于 4月 09, 2015

... returning -E... upon error and amount of data left in iter after
(possible) truncation upon success.  Note, that normal case gives
a non-zero (positive) return value, so any tests for != 0 _must_ be
updated.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

Conflicts:
	fs/ext4/file.c

3309dd04

generic_write_checks(): drop isblk argument · 0fa6b005

由 Al Viro 提交于 4月 04, 2015

all remaining callers are passing 0; some just obscure that fact.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

0fa6b005

direct_IO: remove rw from a_ops->direct_IO() · 22c6186e

由 Omar Sandoval 提交于 3月 16, 2015

Now that no one is using rw, remove it completely.
Signed-off-by: NOmar Sandoval <osandov@osandov.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

22c6186e

direct_IO: use iov_iter_rw() instead of rw everywhere · 6f673763

由 Omar Sandoval 提交于 3月 16, 2015

The rw parameter to direct_IO is redundant with iov_iter->type, and
treated slightly differently just about everywhere it's used: some users
do rw & WRITE, and others do rw == WRITE where they should be doing a
bitwise check. Simplify this with the new iov_iter_rw() helper, which
always returns either READ or WRITE.
Signed-off-by: NOmar Sandoval <osandov@osandov.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6f673763

make new_sync_{read,write}() static · 5d5d5689

由 Al Viro 提交于 4月 03, 2015

All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5d5d5689

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功