Commits · 81833de1a46edce9ca20cfe079872ac1c20ef359 · openanolis / cloud-kernel

28 Nov, 2017 12 commits

lockd: fix "list_add double add" caused by legacy signal interface · 81833de1

Authored by Vasily Averin on Nov 13, 2017

restart_grace() uses hardcoded init_net.
It can cause a "list_add double add" in the following scenario:

1) nfsd and lockd were started in several net namespaces
2) nfsd in init_net was stopped (lockd was not stopped because
 it has users from other net namespaces)
3) lockd got a signal, called restart_grace() -> set_grace_period()
 and enabled lock_manager in hardcoded init_net.
4) nfsd in init_net is started again,
 its lockd_up() calls set_grace_period() and tries to add
 lock_manager into init_net a 2nd time.

Jeff Layton suggests:
"Make it safe to call locks_start_grace multiple times on the same
lock_manager. If it's already on the global grace_list, then don't try
to add it again.  (But we don't intentionally add twice, so for now we
WARN about that case.)

With this change, we also need to ensure that the nfsd4 lock manager
initializes the list before we call locks_start_grace. While we're at
it, move the rest of the nfsd_net initialization into
nfs4_state_create_net. I see no reason to have it spread over two
functions like it is today."

Suggested patch was updated to generate warning in described situation.
Suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

81833de1
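
The guard described in the lockd commit above (skip the add when the lock manager is already on the grace list) can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: `list_node`, `lock_manager_sketch`, and `start_grace()` are hypothetical stand-ins for the kernel's `list_head`, `struct lock_manager`, and `locks_start_grace()`. It also shows why the list member must be initialised before the first call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *n) { n->next = n; n->prev = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_node(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Hypothetical stand-in for struct lock_manager. */
struct lock_manager_sketch {
    struct list_node list;   /* must be initialised before start_grace() */
};

/*
 * Add lm to the grace list unless it is already linked.
 * Returns true if it was added, false if the call was a duplicate
 * (the case in which the commit message says a warning is generated).
 */
static bool start_grace(struct list_node *grace_list,
                        struct lock_manager_sketch *lm)
{
    assert(grace_list != NULL && lm != NULL);
    if (!list_is_empty(&lm->list))
        return false;
    list_add_node(&lm->list, grace_list);
    return true;
}
```

Calling `start_grace()` twice on the same manager leaves the list with a single entry; the second call only reports the duplicate.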

nfsd: check for use of the closed special stateid · ae254dac

Authored by Andrew Elble on Nov 09, 2017

Prevent the use of the closed (invalid) special stateid by clients.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

ae254dac

nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat · 64ebe124

Authored by Naofumi Honda on Nov 09, 2017

From kernel 4.9, my two nfsv4 servers sometimes suffer from
    "panic: unable to handle kernel page request"
in posix_unblock_lock() called from nfs4_laundromat().

These panics disappear if we revert the commit "nfsd: add a LRU list
for blocked locks".

The cause appears to be a typo in nfs4_laundromat(), which is also
present in nfs4_state_shutdown_net().

Cc: stable@vger.kernel.org
Fixes: 7919d0a2 "nfsd: add a LRU list for blocked locks"
Cc: jlayton@redhat.com
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

64ebe124

nfsd: fix locking validator warning on nfs4_ol_stateid->st_mutex class · 4f34bd05

Authored by Andrew Elble on Nov 08, 2017

The use of the st_mutex has been confusing the validator. Use the
proper nested notation so as to not produce warnings.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4f34bd05

nfsd: Fix races with check_stateid_generation() · 03da3169

Authored by Trond Myklebust on Nov 03, 2017

The various functions that call check_stateid_generation() in order
to compare a client-supplied stateid with the nfs4_stid state, usually
need to atomically check for closed state. Those that perform the
check after locking the st_mutex using nfsd4_lock_ol_stateid()
should now be OK, but we do want to fix up the others.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

03da3169

nfsd: Ensure we check stateid validity in the seqid operation checks · 9271d7e5

Authored by Trond Myklebust on Nov 03, 2017

After taking the stateid st_mutex, we want to know that the stateid
still represents valid state before performing any non-idempotent
actions.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

9271d7e5

nfsd: Fix race in lock stateid creation · beeca19c

Authored by Trond Myklebust on Nov 03, 2017

If we're looking up a new lock state, and the creation fails, then
we want to unhash it, just like we do for OPEN. However in order
to do so, we need to ensure that no other LOCK requests can grab the
mutex until we have unhashed it (and marked it as closed).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

beeca19c

nfsd4: move find_lock_stateid · fd1fd685

Authored by Trond Myklebust on Nov 03, 2017

Trivial cleanup to simplify following patch.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

fd1fd685

nfsd: Ensure we don't recognise lock stateids after freeing them · 659aefb6

Authored by Trond Myklebust on Nov 03, 2017

In order to deal with lookup races, nfsd4_free_lock_stateid() needs
to be able to signal to other stateful functions that the lock stateid
is no longer valid. Right now, nfsd_lock() will check whether or not an
existing stateid is still hashed, but only in the "new lock" path.

To ensure the stateid invalidation is also recognised by the "existing lock"
path, and also by a second call to nfsd4_free_lock_stateid() itself, we can
change the type to NFS4_CLOSED_STID under the stp->st_mutex.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

659aefb6

nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0) · fb500a7c

Authored by Trond Myklebust on Nov 03, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

fb500a7c

nfsd: Fix another OPEN stateid race · d8a1a000

Authored by Trond Myklebust on Nov 03, 2017

If nfsd4_process_open2() is initialising a new stateid, and yet the
call to nfs4_get_vfs_file() fails for some reason, then we must
declare the stateid closed, and unhash it before dropping the mutex.

Right now, we unhash the stateid after dropping the mutex, and without
changing the stateid type, meaning that another OPEN could theoretically
look it up and attempt to use it.
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d8a1a000

nfsd: Fix stateid races between OPEN and CLOSE · 15ca08d3

Authored by Trond Myklebust on Nov 03, 2017

Open file stateids can linger on the nfs4_file list of stateids even
after they have been closed. In order to avoid reusing such a
stateid, and confusing the client, we need to recheck the
nfs4_stid's type after taking the mutex.
Otherwise, we risk reusing an old stateid that was already closed,
which will confuse clients that expect new stateids to conform to
RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

15ca08d3
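
Several of the stateid commits above share one pattern: take the stateid's mutex first, then recheck that the stateid has not been closed in the meantime. The following is a single-threaded sketch of that pattern only. All names are hypothetical, and the `locked` flag is a placeholder for a real mutex such as `st_mutex`; it models the ordering of the checks, not actual concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum stid_type_sketch { STID_OPEN, STID_CLOSED };

struct ol_stateid_sketch {
    enum stid_type_sketch type;
    bool locked;                 /* placeholder for a real mutex */
};

static void sketch_lock(struct ol_stateid_sketch *s)
{
    assert(!s->locked);
    s->locked = true;
}

static void sketch_unlock(struct ol_stateid_sketch *s)
{
    assert(s->locked);
    s->locked = false;
}

/*
 * Lock the stateid, then recheck its type.
 * Returns true with the lock still held if the stateid is usable,
 * or false with the lock released if it was closed before we got here.
 */
static bool lock_and_verify(struct ol_stateid_sketch *s)
{
    assert(s != NULL);
    sketch_lock(s);
    if (s->type == STID_CLOSED) {
        sketch_unlock(s);
        return false;
    }
    return true;
}

/* Mark the stateid closed while holding the lock, so later lookups reject it. */
static void close_stateid(struct ol_stateid_sketch *s)
{
    assert(s != NULL);
    sketch_lock(s);
    s->type = STID_CLOSED;
    sketch_unlock(s);
}
```

A caller that gets `false` back would drop its reference and look up or create a fresh stateid instead of reusing the closed one.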

08 Nov, 2017 7 commits

nfsd: deal with revoked delegations appropriately · 95da1b3a

Authored by Andrew Elble on Nov 03, 2017

If a delegation has been revoked by the server, operations using that
delegation should error out with NFS4ERR_DELEG_REVOKED in the >4.1
case, and NFS4ERR_BAD_STATEID otherwise.

The server needs NFSv4.1 clients to explicitly free revoked delegations.
If the server returns NFS4ERR_DELEG_REVOKED, the client will do that;
otherwise it may just forget about the delegation and be unable to
recover when it later sees SEQ4_STATUS_RECALLABLE_STATE_REVOKED set on a
SEQUENCE reply.  That can cause the Linux 4.1 client to loop in its
state manager.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

95da1b3a

nfsd: use nfs->ns.inum as net ID · 7e981a8a

Authored by Vasily Averin on Nov 06, 2017

Publishing of net pointer is not safe,
let's use nfs->ns.inum instead
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7e981a8a

fs, nfsd: convert nfs4_file.fi_ref from atomic_t to refcount_t · 818a34eb

Authored by Elena Reshetova on Oct 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_file.fi_ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

818a34eb
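
The properties listed in the refcount_t conversion commits (no increment from zero, protection against overflow and underflow) can be illustrated with a userspace model built on C11 atomics. This is a sketch under stated assumptions, not the kernel's `refcount_t`: the type and function names are hypothetical, and the saturation policy shown here (pin the counter at `UINT_MAX`) is a simplification chosen for the example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a saturating reference counter. */
struct refcount_sketch {
    atomic_uint refs;
};

static void refcount_sketch_set(struct refcount_sketch *r, unsigned int n)
{
    atomic_store(&r->refs, n);
}

/* Take a reference unless the count is already zero; saturate at UINT_MAX. */
static bool refcount_sketch_inc_not_zero(struct refcount_sketch *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        if (old == 0)
            return false;      /* object is being freed: no resurrection */
        if (old == UINT_MAX)
            return true;       /* saturated: stay pinned instead of wrapping */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
    return true;
}

/* Drop a reference; returns true when the caller must free the object. */
static bool refcount_sketch_dec_and_test(struct refcount_sketch *r)
{
    unsigned int old = atomic_load(&r->refs);

    do {
        assert(old != 0);      /* underflow would indicate a use-after-free bug */
        if (old == UINT_MAX)
            return false;      /* a saturated counter is never released */
    } while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
    return old == 1;
}
```

A plain atomic integer would wrap to zero on overflow and allow a premature free; the saturating variant trades a possible leak for the absence of a use-after-free.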

fs, nfsd: convert nfs4_cntl_odstate.co_odcount from atomic_t to refcount_t · cff7cb2e

Authored by Elena Reshetova on Oct 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_cntl_odstate.co_odcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

cff7cb2e

fs, nfsd: convert nfs4_stid.sc_count from atomic_t to refcount_t · a15dfcd5

Authored by Elena Reshetova on Oct 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_stid.sc_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a15dfcd5

nfsd4: catch some false session retries · 53da6a53

Authored by J. Bruce Fields on Oct 17, 2017

The spec allows us to return NFS4ERR_SEQ_FALSE_RETRY if we notice that
the client is making a call that matches a previous (slot, seqid) pair
but that *isn't* actually a replay, because some detail of the call
doesn't actually match the previous one.

Catching every such case is difficult, but we may as well catch a few
easy ones.  This also handles the case described in the previous patch,
in a different way.

The spec does however require us to catch the case where the difference
is in the rpc credentials.  This prevents somebody from snooping another
user's replies by fabricating retries.

(But the practical value of the attack is limited by the fact that the
replies with the most sensitive data are READ replies, which are not
normally cached.)
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

53da6a53
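
The false-retry check described in the commit above can be sketched as a small classifier over a session slot. This is an illustrative model only: the struct, the enum, and the integer `cred_id` (standing in for the RPC credentials) are assumptions made for the example, and it covers just the credential comparison that the commit message says the spec requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum slot_verdict {
    SLOT_NEW_REQUEST,    /* seqid is one past the slot's last seqid */
    SLOT_REPLAY,         /* same seqid, same caller: serve the cached reply */
    SLOT_FALSE_RETRY,    /* same seqid, but the call is not the same call */
    SLOT_MISORDERED      /* any other seqid */
};

/* Hypothetical per-slot state remembered from the previous request. */
struct slot_sketch {
    uint32_t seqid;      /* last sequence id accepted on this slot */
    uint32_t cred_id;    /* stand-in for the RPC credentials of that call */
};

static enum slot_verdict classify_request(const struct slot_sketch *slot,
                                          uint32_t seqid, uint32_t cred_id)
{
    assert(slot != NULL);
    /* Unsigned arithmetic wraps, so the successor of UINT32_MAX is 0. */
    if (seqid == (uint32_t)(slot->seqid + 1u))
        return SLOT_NEW_REQUEST;
    if (seqid != slot->seqid)
        return SLOT_MISORDERED;
    /* Same (slot, seqid) pair: only a true replay may see the cached reply. */
    if (cred_id != slot->cred_id)
        return SLOT_FALSE_RETRY;
    return SLOT_REPLAY;
}
```

Comparing credentials before returning a cached reply is what stops one user from fabricating a retry to read another user's reply.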

nfsd4: fix cached replies to solo SEQUENCE compounds · 085def3a

Authored by J. Bruce Fields on Oct 18, 2017

Currently our handling of 4.1+ requests without "cachethis" set is
confusing and not quite correct.

Suppose a client sends a compound consisting of only a single SEQUENCE
op, and it matches the seqid in a session slot (so it's a retry), but
the previous request with that seqid did not have "cachethis" set.

The obvious thing to do might be to return NFS4ERR_RETRY_UNCACHED_REP,
but the protocol only allows that to be returned on the op following the
SEQUENCE, and there is no such op in this case.

The protocol permits us to cache replies even if the client didn't ask
us to. And it's easy to do so in the case of solo SEQUENCE compounds.

So, when we get a solo SEQUENCE, we can either return the previously
cached reply or NFS4ERR_SEQ_FALSE_RETRY if we notice it differs in some
way from the original call.

Currently, we're returning a corrupt reply in the case a solo SEQUENCE
matches a previous compound with more ops. This actually matters
because the Linux client recently started doing this as a way to recover
from lost replies to idempotent operations in the case the process doing
the original reply was killed: in that case it's difficult to keep the
original arguments around to do a real retry, and the client no longer
cares what the result is anyway, but it would like to make sure that the
slot's sequence id has been incremented, and the solo SEQUENCE assures
that: if the server never got the original request, it will increment the
sequence id. If it did get the original request, it won't increment, and
nothing else about the reply really matters much. But we can at
least attempt to return valid xdr!
Tested-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

085def3a

05 Oct, 2017 1 commit

nfsd: give out fewer session slots as limit approaches · de766e57

Authored by J. Bruce Fields on Sep 19, 2017

Instead of granting clients' full requests until we hit our DRC size
limit and then failing CREATE_SESSIONs (and hence mounts) completely,
start granting clients smaller slot tables as we approach the limit.

The factor chosen here is pretty much arbitrary.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

de766e57
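
One way to picture the policy in the commit above is a grant function that never hands out more than a fixed fraction of the memory still available. The sketch below is an assumption-laden illustration, not the nfsd code: the function name, its parameters, and the fraction of one third (`SKETCH_FRACTION`) are all chosen for the example, in the same spirit as the "pretty much arbitrary" factor the commit mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: a session gets at most 1/3 of what is left. */
#define SKETCH_FRACTION 3

/*
 * Decide how many slots to grant.
 *   limit     - total reply-cache budget in bytes
 *   used      - bytes already handed out to other sessions
 *   slot_size - bytes needed per slot
 *   requested - slots the client asked for
 * Returns the number of slots granted, or 0 if not even one slot fits.
 */
static unsigned int grant_slots(size_t limit, size_t used,
                                size_t slot_size, unsigned int requested)
{
    size_t avail, budget, n;

    assert(slot_size > 0 && used <= limit);
    avail = limit - used;
    if (avail < slot_size)
        return 0;                    /* nothing left: the session must fail */

    budget = avail / SKETCH_FRACTION;
    if (budget < slot_size)
        budget = slot_size;          /* still allow one slot while it fits */

    n = budget / slot_size;
    return requested < n ? requested : (unsigned int)n;
}
```

With a 3000-byte budget and 100-byte slots, a request for 64 slots is cut to 10 when nothing is in use, and to a single slot when only 100 bytes remain, so late clients get small tables rather than outright failures.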

14 Jul, 2017 3 commits

nfsd4: properly type op_func callbacks · 72edc37a

Authored by Christoph Hellwig on May 08, 2017

Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe
function pointer casts.

It also adds two missing structures to struct nfsd4_op.u to facilitate
this.
Signed-off-by: Christoph Hellwig <hch@lst.de>

72edc37a
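
The idea behind the "properly type ... callbacks" commits can be shown with a small dispatch table whose handlers all take the same union pointer, so no function pointer cast is needed. This is a minimal sketch with made-up names (`op_u_sketch`, `do_read`, `do_write`, and the two argument structs are hypothetical); the real `union nfsd4_op_u` and its handlers are considerably larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-operation argument structures. */
struct read_args_sketch  { unsigned int count; };
struct write_args_sketch { unsigned int len; };

/* One named union covers every operation's arguments. */
union op_u_sketch {
    struct read_args_sketch  read;
    struct write_args_sketch write;
};

/* Every handler has exactly this type, so the table needs no casts. */
typedef int (*op_func_sketch)(union op_u_sketch *u);

static int do_read(union op_u_sketch *u)  { return (int)u->read.count; }
static int do_write(union op_u_sketch *u) { return -(int)u->write.len; }

enum { OP_READ_SKETCH, OP_WRITE_SKETCH, OP_MAX_SKETCH };

static const op_func_sketch op_table[OP_MAX_SKETCH] = {
    [OP_READ_SKETCH]  = do_read,
    [OP_WRITE_SKETCH] = do_write,
};

/* Look up the handler and call it through its true type. */
static int dispatch(int opnum, union op_u_sketch *u)
{
    assert(opnum >= 0 && opnum < OP_MAX_SKETCH && u != NULL);
    return op_table[opnum](u);
}
```

Calling a function through a pointer of a different type is undefined behaviour in C, which is why passing the union and selecting the member inside each handler is the safer design.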

nfsd4: properly type op_get_currentstateid callbacks · c2a1102a

Authored by Christoph Hellwig on May 08, 2017

Pass union nfsd4_op_u to the op_get_currentstateid callbacks instead of
using unsafe function pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>

c2a1102a

nfsd4: properly type op_set_currentstateid callbacks · 6c9600a7

Authored by Christoph Hellwig on May 08, 2017

Give the args union in struct nfsd4_op a name, and pass it to the
op_set_currentstateid callbacks instead of using unsafe function
pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>

6c9600a7

15 May, 2017 3 commits

nfsd4: properly type op_func callbacks · eb69853d

Authored by Christoph Hellwig on May 08, 2017

Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe
function pointer casts.

It also adds two missing structures to struct nfsd4_op.u to facilitate
this.
Signed-off-by: Christoph Hellwig <hch@lst.de>

eb69853d

nfsd4: properly type op_get_currentstateid callbacks · 57832e7b

Authored by Christoph Hellwig on May 08, 2017

Pass union nfsd4_op_u to the op_get_currentstateid callbacks instead of
using unsafe function pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>

57832e7b

nfsd4: properly type op_set_currentstateid callbacks · b60e9859

Authored by Christoph Hellwig on May 08, 2017

Give the args union in struct nfsd4_op a name, and pass it to the
op_set_currentstateid callbacks instead of using unsafe function
pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>

b60e9859

26 Apr, 2017 1 commit

nfsd4: remove pointless strdup_if_nonnull · 2f10fdcb

Authored by NeilBrown on Mar 23, 2017

kstrdup() already checks for NULL.

(Brought to our attention by Jason Yan noticing (from sparse output)
that it should have been declared static.)
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

2f10fdcb

25 Feb, 2017 1 commit

nfsd: remove superfluous KERN_INFO · 4ab495bf

Authored by Rasmus Villemoes on Feb 24, 2017

dprintk already provides a KERN_* prefix; this KERN_INFO just shows up
as some odd characters in the output.

Simplify the message a bit while we're there.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4ab495bf

18 Feb, 2017 1 commit

nfsd/callback: Cleanup callback cred on shutdown · f7d1ddbe

Authored by Kinglong Mee on Feb 05, 2017

The rpccred obtained from rpc_lookup_machine_cred() should be put when
state is shut down.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f7d1ddbe

01 Feb, 2017 1 commit

NFSD: Fix a null reference case in find_or_create_lock_stateid() · d19fb70d

Authored by Kinglong Mee on Jan 18, 2017

nfsd assigns the nfs4_free_lock_stateid to .sc_free in init_lock_stateid().

If nfsd doesn't go through init_lock_stateid() and puts the stateid at the end,
there is a NULL reference to .sc_free when calling nfs4_put_stid(ns).

This patch moves the nfs4_stid.sc_free assignment into nfs4_alloc_stid().

Cc: stable@vger.kernel.org
Fixes: 356a95ec "nfsd: clean up races in lock stateid searching..."
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

d19fb70d
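
The fix in the commit above amounts to setting the release callback at allocation time, so that an object can always be put safely even if a later initialisation step is skipped. A hedged userspace sketch of that design follows; `stid_sketch`, `alloc_stid()`, `put_stid()`, and the `freed_count` counter are hypothetical names for illustration and do not mirror the kernel's signatures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nfs4_stid. */
struct stid_sketch {
    int refs;
    void (*sc_free)(struct stid_sketch *);
};

static int freed_count;   /* lets the example observe that the callback ran */

static void free_lock_stid(struct stid_sketch *s)
{
    freed_count++;
    free(s);
}

/*
 * Allocate a stateid and record its release callback immediately,
 * so there is no window in which sc_free is NULL.
 */
static struct stid_sketch *alloc_stid(void (*sc_free)(struct stid_sketch *))
{
    struct stid_sketch *s;

    assert(sc_free != NULL);
    s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    s->refs = 1;
    s->sc_free = sc_free;
    return s;
}

/* Drop one reference; the last put releases the object via sc_free. */
static void put_stid(struct stid_sketch *s)
{
    assert(s != NULL && s->refs > 0);
    if (--s->refs == 0)
        s->sc_free(s);
}
```

Because the callback is a required argument of the allocator, an error path that puts the object before any further setup still reaches a valid function pointer.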

02 Nov, 2016 1 commit

nfsd: Fix general protection fault in release_lock_stateid() · f46c445b

Authored by Chuck Lever on Oct 29, 2016

When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example),
I get this crash on the server:

Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>]  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0  EFLAGS: 00010246
Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
Oct 28 22:04:30 klimt kernel: FS:  0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
Oct 28 22:04:30 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
Oct 28 22:04:30 klimt kernel: Stack:
Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
Oct 28 22:04:30 klimt kernel: Call Trace:
Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
Oct 28 22:04:30 klimt kernel: RIP  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---

Jeff Layton says:
> Hm...now that I look though, this is a little suspicious:
>
>    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
>
> I wonder if it's possible for the openstateid to have already been
> destroyed at this point.
>
> We might be better off doing something like this to get the client pointer:
>
>    stp->st_stid.sc_client;
>
> ...which should be more direct and less dependent on other stateids
> staying valid.

With the suggested change, I am no longer able to reproduce the above oops.

v2: Fix unhash_lock_stateid() as well
Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Fixes: 42691398 ('nfsd: Fix race between FREE_STATEID and LOCK')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

f46c445b

25 Oct, 2016 1 commit

nfsd: move blocked lock handling under a dedicated spinlock · 0cc11a61

Authored by Jeff Layton on Oct 20, 2016

Bruce was hitting some lockdep warnings in testing, showing that we
could hit a deadlock with the new CB_NOTIFY_LOCK handling, in a
rather complex situation involving four different spinlocks.

The crux of the matter is that we end up taking the nn->client_lock in
the lm_notify handler. The simplest fix is to just declare a new
per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0cc11a61

08 Oct, 2016 1 commit

cred: simpler, 1D supplementary groups · 81243eac

Authored by Alexey Dobriyan on Oct 07, 2016

Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.

If number of gids is <= 32, memory allocation is more or less tolerable
(140/148 bytes).  But if it is not, code allocates full page (!)
regardless and, what's even more fun, doesn't reuse small 32-entry
array.

2D array means dependent shifts, loads and LEAs without possibility to
optimize them (gid is never known at compile time).

All of the above is unnecessary.  Switch to the usual
trailing-zero-len-array scheme.  Memory is allocated with
kmalloc/vmalloc() and only as much as needed.  Accesses become simpler
(LEA 8(gi,idx,4) or even without displacement).

Maximum number of gids is 65536 which translates to 256KB+8 bytes.  I
think kernel can handle such allocation.

On my usual desktop system with whole 9 (nine) aux groups, struct
group_info shrinks from 148 bytes to 44 bytes, yay!

Nice side effects:

 - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,

 - fix little mess in net/ipv4/ping.c
   should have been using GROUP_AT macro but this point becomes moot,

 - aux group allocation is persistent and should be accounted as such.

Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

81243eac
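
The 1D layout described in the commit above can be sketched with a struct that ends in a trailing array, allocated with exactly as much room as the number of gids requires. The sketch uses a C99 flexible array member in place of the zero-length array the commit mentions, plain `malloc()` in place of the kernel allocators, and hypothetical names throughout. On a platform with 4-byte `int`, the size works out to 8 + 4 * ngroups bytes, which gives 44 bytes for nine groups and 256KB+8 for 65536 groups, in line with the figures quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int gid_sketch_t;   /* stand-in for the kernel's gid type */

/* Header followed directly by the gid array: one allocation, one index. */
struct group_info_sketch {
    int usage;
    int ngroups;
    gid_sketch_t gid[];              /* C99 flexible array member */
};

/* Bytes needed for a group_info holding ngroups entries. */
static size_t group_info_size(size_t ngroups)
{
    return sizeof(struct group_info_sketch) + ngroups * sizeof(gid_sketch_t);
}

/* Allocate only as much as needed; the caller releases it with free(). */
static struct group_info_sketch *groups_alloc_sketch(int ngroups)
{
    struct group_info_sketch *gi;

    assert(ngroups >= 0);
    gi = malloc(group_info_size((size_t)ngroups));
    if (gi == NULL)
        return NULL;
    gi->usage = 1;
    gi->ngroups = ngroups;
    return gi;
}
```

Access is then a single indexed load, `gi->gid[i]`, with no block lookup in between.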

27 Sep, 2016 4 commits

nfsd4: setclientid_confirm with unmatched verifier should fail · 7d22fc11

Authored by J. Bruce Fields on Sep 20, 2016

A setclientid_confirm with (clientid, verifier) both matching an
existing confirmed record is assumed to be a replay, but if the verifier
doesn't match, it shouldn't be.

This would be a very rare case, except that clients following
https://tools.ietf.org/html/rfc7931#section-5.8 may depend on the
failure.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7d22fc11

nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies · 19e4c347

Authored by Jeff Layton on Sep 16, 2016

If we are using v4.1+, then we can send notification when contended
locks become free. Inform the client of that fact.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

19e4c347

nfsd: add a LRU list for blocked locks · 7919d0a2

Authored by Jeff Layton on Sep 16, 2016

It's possible for a client to call in on a lock that is blocked for a
long time, but discontinue polling for it. A malicious client could
even set a lock on a file, and then spam the server with failing lock
requests from different lockowners that pile up in a DoS attack.

Add the blocked lock structures to a per-net namespace LRU when hashing
them, and timestamp them. If the lock request is not revisited after a
lease period, we'll drop it under the assumption that the client is no
longer interested.

This also gives us a mechanism to clean up these objects at server
shutdown time as well.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

7919d0a2
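
The expiry rule from the commit above (timestamp each blocked lock when it is queued, drop it if it is not revisited within a lease period) can be sketched with a singly linked list kept in oldest-first order. Everything here is an assumption made for illustration: the struct, the use of plain `long` seconds, and the function name are hypothetical, and a real implementation would also hold a lock and free each dropped entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blocked-lock record; the list is ordered oldest first. */
struct blocked_lock_sketch {
    long queued_at;                       /* seconds, set when queued */
    struct blocked_lock_sketch *next;
};

/*
 * Unlink every entry at the front of the list that is older than one
 * lease period. Because the list is in LRU order, the scan can stop at
 * the first entry that is still fresh. Returns the number dropped.
 */
static size_t reap_expired(struct blocked_lock_sketch **head,
                           long now, long lease)
{
    size_t dropped = 0;

    assert(head != NULL && lease >= 0);
    while (*head != NULL && now - (*head)->queued_at > lease) {
        struct blocked_lock_sketch *victim = *head;

        *head = victim->next;
        victim->next = NULL;              /* real code would free it here */
        dropped++;
    }
    return dropped;
}
```

Passing a lease of zero at shutdown, with a `now` later than every timestamp, empties the list, which matches the commit's remark that the same mechanism can clean up at server shutdown.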

nfsd: have nfsd4_lock use blocking locks for v4.1+ locks · 76d348fa

Authored by Jeff Layton on Sep 16, 2016

Create a new per-lockowner+per-inode structure that contains a
file_lock. Have nfsd4_lock add this structure to the lockowner's list
prior to setting the lock. Then call the vfs and request a blocking lock
(by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
back, then we dequeue the block structure and free it. When the next
lock request comes in, we'll look for an existing block for the same
filehandle and dequeue and reuse it if there is one.

When the lock comes free (a'la an lm_notify call), we dequeue it
from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
inform the client that it should retry the lock request.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

76d348fa

13 Aug, 2016 1 commit

nfsd: don't return an unhashed lock stateid after taking mutex · dd257933

Authored by Jeff Layton on Aug 11, 2016

nfsd4_lock will take the st_mutex before working with the stateid it
gets, but between the time when we drop the cl_lock and take the mutex,
the stateid could become unhashed (a'la FREE_STATEID). If that happens
the lock stateid returned to the client will be forgotten.

Fix this by first moving the st_mutex acquisition into
lookup_or_create_lock_state. Then, have it check to see if the lock
stateid is still hashed after taking the mutex. If it's not, then put
the stateid and try the find/create again.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

dd257933

12 Aug, 2016 1 commit

nfsd: Fix race between FREE_STATEID and LOCK · 42691398

Authored by Chuck Lever on Aug 11, 2016

When running LTP's nfslock01 test, the Linux client can send a LOCK
and a FREE_STATEID request at the same time. The outcome is:

Frame 324    R OPEN stateid [2,O]

Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID

In other words, the server returns stateid L in a successful LOCK
reply, but it has already released it. Subsequent uses of stateid L
fail.

To address this, protect the generation check in nfsd4_free_stateid
with the st_mutex. This should guarantee that only one of two
outcomes occurs: either LOCK returns a fresh valid stateid, or
FREE_STATEID returns NFS4ERR_LOCKS_HELD.
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

42691398

16 Jul, 2016 1 commit

nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock · 88584818

Authored by Chuck Lever on Jul 13, 2016

nfsd4_release_lockowner finds a lock owner that has no lock state,
and drops cl_lock. Then release_lockowner picks up cl_lock and
unhashes the lock owner.

During the window where cl_lock is dropped, I don't see anything
preventing a concurrent nfsd4_lock from finding that same lock owner
and adding lock state to it.

Move release_lockowner() into nfsd4_release_lockowner and hang onto
the cl_lock until after the lock owner's state cannot be found
again.

Found by inspection, we don't currently have a reproducer.

Fixes: 2c41beb0 ("nfsd: reduce cl_lock thrashing in ... ")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

88584818

openanolis / cloud-kernel synced successfully about 1 year ago
