提交 · 7e6cca6caf7230b049bd681c5400b01c365ee452 · openeuler / Kernel

15 8月, 2017 10 次提交

NFS: Remove page group limit in nfs_flush_incompatible() · 7e6cca6c

由 Trond Myklebust 提交于 7月 17, 2017

nfs_try_to_update_request() should be able to cope now.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

7e6cca6c

T
NFS: Teach nfs_try_to_update_request() to deal with request page_groups · f6032f21
由 Trond Myklebust 提交于 7月 17, 2017
```
Simplify the code, and avoid some flushes to disk.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
```
f6032f21

NFS: Fix the inode request accounting when pages have subrequests · b66aaa8d

由 Trond Myklebust 提交于 7月 18, 2017

Both nfs_destroy_unlinked_subrequests() and nfs_lock_and_join_requests()
manipulate the inode flags adjusting the NFS_I(inode)->nrequests.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

b66aaa8d

NFS: Don't unlock writebacks before declaring PG_WB_END · 31a01f09

由 Trond Myklebust 提交于 7月 18, 2017

We don't want nfs_lock_and_join_requests() to start fiddling with
the request before the call to nfs_page_group_sync_on_bit().
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

31a01f09

NFS: Don't check request offset and size without holding a lock · e14bebf6

由 Trond Myklebust 提交于 7月 17, 2017

Request offsets and sizes are not guaranteed to be stable unless you
are holding the request locked.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e14bebf6

NFS: Fix an ABBA issue in nfs_lock_and_join_requests() · a0e265bc

由 Trond Myklebust 提交于 7月 17, 2017

All other callers of nfs_page_group_lock() appear to already hold the
page lock on the head page, so doing it in the opposite order here
is inefficient, although not deadlock prone since we roll back all
locks on contention.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

a0e265bc

NFS: Fix a reference and lock leak in nfs_lock_and_join_requests() · 7cb9cd9a

由 Trond Myklebust 提交于 7月 17, 2017

Yes, this is a situation that should never happen (hence the WARN_ON)
but we should still ensure that we free up the locks and references to
the faulty pages.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

7cb9cd9a

NFS: Reduce lock contention in nfs_try_to_update_request() · 1403390d

由 Trond Myklebust 提交于 7月 17, 2017

Micro-optimisation to move the lockless check into the for(;;) loop.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

1403390d

NFS: Reduce lock contention in nfs_page_find_head_request() · 82749dd4

由 Trond Myklebust 提交于 7月 17, 2017

Add a lockless check for whether or not the page might be carrying
an existing writeback before we grab the inode->i_lock.
Reported-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

82749dd4

NFS: Simplify page writeback · 6d17d653

由 Trond Myklebust 提交于 7月 09, 2017

We don't expect the page header lock to ever be held across I/O, so
it should always be safe to wait for it, even if we're doing nonblocking
writebacks.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

6d17d653

14 7月, 2017 2 次提交

NFS: Fix commit policy for non-blocking calls to nfs_write_inode() · 1a4edf0f

由 Trond Myklebust 提交于 6月 20, 2017

Now that the writes will schedule a commit on their own, we don't
need nfs_write_inode() to schedule one if there are outstanding
writes, and we're being called in non-blocking mode.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1a4edf0f

NFS: Ensure we commit after writeback is complete · 919e3bd9

由 Trond Myklebust 提交于 6月 20, 2017

If the page cache is being flushed, then we want to ensure that we
do start a commit once the pages are done being flushed.
If we just wait until all I/O is done to that file, we can end up
livelocking until the balance_dirty_pages() mechanism puts its
foot down and forces I/O to stop.
So instead we do more or less the same thing that O_DIRECT does,
and set up a counter to tell us when the flush is done,
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

919e3bd9

09 5月, 2017 1 次提交

NFS append COMMIT after synchronous COPY · e0926934

由 Olga Kornievskaia 提交于 5月 08, 2017

Instead of messing with the commit path which has been causing issues,
add a COMMIT op after the COPY and ask for stable copies in the first
space.

It saves a round trip, since after the COPY, the client sends a COMMIT
anyway.
Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e0926934

27 4月, 2017 1 次提交

NFSv4: Don't special case "launder" · c373fff7

由 Trond Myklebust 提交于 4月 26, 2017

If the client receives a fatal server error from nfs_pageio_add_request(),
then we should always truncate the page on which the error occurred.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

c373fff7

26 4月, 2017 1 次提交

NFS: Don't write back further requests if there is a pending write error · a6598813

由 Trond Myklebust 提交于 4月 25, 2017

If the server has already returned a fatal write error that the user
has not yet received on this file, then don't write back the other pages.
Instead, act as if they have been sent, and have returned with the same
error.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

a6598813

21 4月, 2017 4 次提交

nfs: Convert to separately allocated bdi · 0db10944

由 Jan Kara 提交于 4月 12, 2017

Allocate struct backing_dev_info separately instead of embedding it
inside the superblock. This unifies handling of bdi among users.

CC: Anna Schumaker <anna.schumaker@netapp.com>
CC: linux-nfs@vger.kernel.org
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJan Kara <jack@suse.cz>
Acked-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

0db10944

NFS: move rw_mode to nfs_pageio_header · fbe77c30

由 Benjamin Coddington 提交于 4月 19, 2017

Let's try to have it in a cacheline in nfs4_proc_pgio_rpc_prepare().
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

fbe77c30

NFS: Fix use after free in write error path · 1f84ccdf

由 Fred Isaman 提交于 4月 14, 2017

Signed-off-by: NFred Isaman <fred.isaman@gmail.com>
Fixes: 0bcbf039 ("nfs: handle request add failure properly")
Cc: stable@vger.kernel.org # v4.5+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

1f84ccdf

NFS: fix usage of mempools. · 518662e0

由 NeilBrown 提交于 4月 10, 2017

When passed GFP flags that allow sleeping (such as
GFP_NOIO), mempool_alloc() will never return NULL, it will
wait until memory is available.

This means that we don't need to handle failure, but that we
do need to ensure one thread doesn't call mempool_alloc()
twice on the one pool without queuing or freeing the first
allocation.  If multiple threads did this during times of
high memory pressure, the pool could be exhausted and a
deadlock could result.

pnfs_generic_alloc_ds_commits() attempts to allocate from
the nfs_commit_mempool while already holding an allocation
from that pool.  This is not safe.  So change
nfs_commitdata_alloc() to take a flag that indicates whether
failure is acceptable.

In pnfs_generic_alloc_ds_commits(), accept failure and
handle it as we currently do.  Else where, do not accept
failure, and do not handle it.

Even when failure is acceptable, we want to succeed if
possible.  That means both
 - using an entry from the pool if there is one
 - waiting for direct reclaim is there isn't.

We call mempool_alloc(GFP_NOWAIT) to achieve the first, then
kmem_cache_alloc(GFP_NOIO|__GFP_NORETRY) to achieve the
second.  Each of these can fail, but together they do the
best they can without blocking indefinitely.

The objects returned by kmem_cache_alloc() will still be freed
by mempool_free().  This is safe as mempool_alloc() uses
exactly the same function to allocate objects (since the mempool
was created with mempool_create_slab_pool()).  The object returned
by mempool_alloc() and kmem_cache_alloc() are indistinguishable
so mempool_free() will handle both identically, either adding to the
pool or calling kmem_cache_free().

Also, don't test for failure when allocating from
nfs_wdata_mempool.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

518662e0

18 3月, 2017 1 次提交

NFS: fix the fault nrequests decreasing for nfs_inode COPY · 38a33101

由 Kinglong Mee 提交于 3月 09, 2017

The nfs_commit_file for NFSv4.2's COPY operation goes through
the commit path for normal WRITE, but without increase nrequests,
so, the nrequests decreased in nfs_commit_release_pages is fault.
After that, the nrequests will be wrong.

[ 5670.299881] ------------[ cut here ]------------
[ 5670.300295] WARNING: CPU: 0 PID: 27656 at fs/nfs/inode.c:127 nfs_clear_inode+0x66/0x90 [nfs]
[ 5670.300558] Modules linked in: nfsv4(E) nfs(E) fscache(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event ppdev f2fs coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_ens1371 intel_rapl_perf gameport snd_ac97_codec vmw_balloon ac97_bus snd_seq snd_pcm joydev snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm e1000 crc32c_intel mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: fscache]
[ 5670.302925] CPU: 0 PID: 27656 Comm: umount.nfs4 Tainted: G        W   E   4.11.0-rc1+ #519
[ 5670.303292] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 5670.304094] Call Trace:
[ 5670.304510]  dump_stack+0x63/0x86
[ 5670.304917]  __warn+0xcb/0xf0
[ 5670.305276]  warn_slowpath_null+0x1d/0x20
[ 5670.305661]  nfs_clear_inode+0x66/0x90 [nfs]
[ 5670.306093]  nfs4_evict_inode+0x61/0x70 [nfsv4]
[ 5670.306480]  evict+0xbb/0x1c0
[ 5670.306888]  dispose_list+0x4d/0x70
[ 5670.307233]  evict_inodes+0x178/0x1a0
[ 5670.307579]  generic_shutdown_super+0x44/0xf0
[ 5670.307985]  nfs_kill_super+0x21/0x40 [nfs]
[ 5670.308325]  deactivate_locked_super+0x43/0x70
[ 5670.308698]  deactivate_super+0x5a/0x60
[ 5670.309036]  cleanup_mnt+0x3f/0x90
[ 5670.309407]  __cleanup_mnt+0x12/0x20
[ 5670.309837]  task_work_run+0x80/0xa0
[ 5670.310162]  exit_to_usermode_loop+0x89/0x90
[ 5670.310497]  syscall_return_slowpath+0xaa/0xb0
[ 5670.310875]  entry_SYSCALL_64_fastpath+0xa7/0xa9
[ 5670.311197] RIP: 0033:0x7f1bb3617fe7
[ 5670.311545] RSP: 002b:00007ffecbabb828 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[ 5670.311906] RAX: 0000000000000000 RBX: 0000000001dca1f0 RCX: 00007f1bb3617fe7
[ 5670.312239] RDX: 000000000000000c RSI: 0000000000000001 RDI: 0000000001dc83c0
[ 5670.312653] RBP: 0000000001dc83c0 R08: 0000000000000001 R09: 0000000000000000
[ 5670.312998] R10: 0000000000000755 R11: 0000000000000206 R12: 00007ffecbabc66a
[ 5670.313335] R13: 0000000001dc83a0 R14: 0000000000000000 R15: 0000000000000000
[ 5670.313758] ---[ end trace bf4bfe7764e4eb40 ]---

Cc: linux-kernel@vger.kernel.org
Fixes: 67911c8f ("NFS: Add nfs_commit_file()")
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

38a33101

23 2月, 2017 1 次提交

nfs: no PG_private waiters remain, remove waker · d94e0c05

由 Nicholas Piggin 提交于 2月 22, 2017

Since commit 4f52b6bb ("NFS: Don't call COMMIT in ->releasepage()"),
no tasks wait on PagePrivate.

Thus the wake introduced in commit 95905446 ("NFS: avoid deadlocks
with loop-back mounted NFS filesystems.") can be removed.

Link: http://lkml.kernel.org/r/20170103182234.30141-2-npiggin@gmail.comSigned-off-by: NNicholas Piggin <npiggin@gmail.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d94e0c05

09 2月, 2017 1 次提交

nfs: no PG_private waiters remain, remove waker · 600424e3

由 Nicholas Piggin 提交于 1月 04, 2017

Since commit 4f52b6bb ("NFS: Don't call COMMIT in ->releasepage()"),
no tasks wait on PagePrivate, so the wake introduced in commit 95905446
("NFS: avoid deadlocks with loop-back mounted NFS filesystems.") can
be removed.
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

600424e3

31 1月, 2017 1 次提交

sunrpc & nfs: Add and use dprintk_cont macros · ddeaa637

由 Joe Perches 提交于 10月 15, 2016

Allow line continuations to work properly with KERN_CONT.
Signed-off-by: NJoe Perches <joe@perches.com>
[Anna: Add fallback dprintk_cont() for when CONFIG_SUNRPC_DEBUG=n]
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ddeaa637

25 12月, 2016 1 次提交

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

由 Linus Torvalds 提交于 12月 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.
Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7c0f6ba6

02 12月, 2016 2 次提交

NFS: discard nfs_lockowner structure. · d51fdb87

由 NeilBrown 提交于 10月 13, 2016

It now has only one field and is only used in one structure.
So replaced it in that structure by the field it contains.
Signed-off-by: NNeilBrown <neilb@suse.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d51fdb87

NFS: remove l_pid field from nfs_lockowner · b184b5c3

由 NeilBrown 提交于 10月 13, 2016

this field is not used in any important way and probably should
have been removed by

Commit: 8003d3c4 ("nfs4: treat lock owners as opaque values")

which removed the pid argument from nfs4_get_lock_state.

Except in unusual and uninteresting cases, two threads with the same
->tgid will have the same ->files pointer, so keeping them both
for comparison brings no benefit.
Acked-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NNeilBrown <neilb@suse.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

b184b5c3

08 10月, 2016 1 次提交

mm: remove page_file_index · 8cd79788

由 Huang Ying 提交于 10月 07, 2016

After using the offset of the swap entry as the key of the swap cache,
the page_index() becomes exactly same as page_file_index().  So the
page_file_index() is removed and the callers are changed to use
page_index() instead.

Link: http://lkml.kernel.org/r/1473270649-27229-2-git-send-email-ying.huang@intel.comSigned-off-by: N"Huang, Ying" <ying.huang@intel.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8cd79788

29 7月, 2016 1 次提交

mm: move most file-based accounting to the node · 11fb9989

由 Mel Gorman 提交于 7月 28, 2016

There are now a number of accounting oddities such as mapped file pages
being accounted for on the node while the total number of file pages are
accounted on the zone.  This can be coped with to some extent but it's
confusing so this patch moves the relevant file-based accounted.  Due to
throttling logic in the page allocator for reliable OOM detection, it is
still necessary to track dirty and writeback pages on a per-zone basis.

[mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting]
  Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net
Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

11fb9989

23 7月, 2016 1 次提交

nfs: don't create zero-length requests · 149a4fdd

由 Benjamin Coddington 提交于 7月 18, 2016

NFS doesn't expect requests with wb_bytes set to zero and may make
unexpected decisions about how to handle that request at the page IO layer.
Skip request creation if we won't have any wb_bytes in the request.
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: NWeston Andros Adamson <dros@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

149a4fdd

20 7月, 2016 1 次提交

sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e

由 Scott Mayhew 提交于 6月 07, 2016

A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred to be in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.

This can be reproduced as follows:

1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server.  Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys

2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m

3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set.  Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

5. Now do some I/O to the sec=sys mount.  This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

6. Writes for that user will now be permanently 4K, FILE_SYNC for that
user, regardless of which mount is being written to, until you reboot
the client.  Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect.  Grabbing a new kerberos ticket at this
point will have no effect either.

Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ce52914e

06 7月, 2016 2 次提交

NFSv4.2: Fix writeback races in nfs4_copy_file_range · 837bb1d7

由 Trond Myklebust 提交于 6月 25, 2016

We need to ensure that any writes to the destination file are serialised
with the copy, meaning that the writeback has to occur under the inode lock.

Also relax the writeback requirement on the source, and rely on the
stateid checking to tell us if the source rebooted. Add the helper
nfs_filemap_write_and_wait_range() to call pnfs_sync_inode() as
is appropriate for pNFS servers that may need a layoutcommit.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

837bb1d7

NFS: Fix O_DIRECT verifier problems · 8fc3c386

由 Trond Myklebust 提交于 6月 01, 2016

We should not be interested in looking at the value of the stable field,
since that could take any value.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

8fc3c386

22 6月, 2016 2 次提交

NFS: writepage of a single page should not be synchronous · 811ed92e

由 Trond Myklebust 提交于 6月 01, 2016

It is almost always better to wait for more so that we can issue a
bulk commit.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

811ed92e

NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer · 6b56a898

由 Trond Myklebust 提交于 6月 01, 2016

filemap_datawrite() and friends already deal just fine with livelock.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

6b56a898

26 5月, 2016 2 次提交

nfs: avoid race that crashes nfs_init_commit · ade8febd

由 Weston Andros Adamson 提交于 5月 25, 2016

Since the patch "NFS: Allow multiple commit requests in flight per file"
we can run multiple simultaneous commits on the same inode.  This
introduced a race over collecting pages to commit that made it possible
to call nfs_init_commit() with an empty list - which causes crashes like
the one below.

The fix is to catch this race and avoid calling nfs_init_commit and
initiate_commit when there is no work to do.

Here is the crash:

[600522.076832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[600522.078475] IP: [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.078745] PGD 4272b1067 PUD 4272cb067 PMD 0
[600522.078972] Oops: 0000 [#1] SMP
[600522.079204] Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vmw_vsock_vmci_transport vsock bonding ipmi_devintf ipmi_msghandler coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon parport_pc parport acpi_cpufreq vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmxnet3
[600522.081380]  vmw_pvscsi ata_generic pata_acpi
[600522.081809] CPU: 3 PID: 15667 Comm: /usr/bin/python Not tainted 4.1.9-100.pd.88.el7.x86_64 #1
[600522.082281] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[600522.082814] task: ffff8800bbbfa780 ti: ffff88042ae84000 task.ti: ffff88042ae84000
[600522.083378] RIP: 0010:[<ffffffffa0479e72>]  [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.083973] RSP: 0018:ffff88042ae87438  EFLAGS: 00010246
[600522.084571] RAX: 0000000000000000 RBX: ffff880003485e40 RCX: ffff88042ae87588
[600522.085188] RDX: 0000000000000000 RSI: ffff88042ae874b0 RDI: ffff880003485e40
[600522.085756] RBP: ffff88042ae87448 R08: ffff880003486010 R09: ffff88042ae874b0
[600522.086332] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88042ae872d0
[600522.086905] R13: ffff88042ae874b0 R14: ffff880003485e40 R15: ffff88042704c840
[600522.087484] FS:  00007f4728ff2740(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
[600522.088070] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[600522.088663] CR2: 0000000000000040 CR3: 000000042b6aa000 CR4: 00000000001406e0
[600522.089327] Stack:
[600522.089926]  0000000000000001 ffff88042ae87588 ffff88042ae874f8 ffffffffa04f09fa
[600522.090549]  0000000000017840 0000000000017840 ffff88042ae87588 ffff8803258d9930
[600522.091169]  ffff88042ae87578 ffffffffa0563d80 0000000000000000 ffff88042704c840
[600522.091789] Call Trace:
[600522.092420]  [<ffffffffa04f09fa>] pnfs_generic_commit_pagelist+0x1da/0x320 [nfsv4]
[600522.093052]  [<ffffffffa0563d80>] ? ff_layout_commit_prepare_v3+0x30/0x30 [nfs_layout_flexfiles]
[600522.093696]  [<ffffffffa0562645>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles]
[600522.094359]  [<ffffffffa047bc78>] nfs_generic_commit_list+0xe8/0x120 [nfs]
[600522.095032]  [<ffffffffa047bd6a>] nfs_commit_inode+0xba/0x110 [nfs]
[600522.095719]  [<ffffffffa046ac54>] nfs_release_page+0x44/0xd0 [nfs]
[600522.096410]  [<ffffffff811a8122>] try_to_release_page+0x32/0x50
[600522.097109]  [<ffffffff811bd4f1>] shrink_page_list+0x961/0xb30
[600522.097812]  [<ffffffff811bdced>] shrink_inactive_list+0x1cd/0x550
[600522.098530]  [<ffffffff811bea65>] shrink_lruvec+0x635/0x840
[600522.099250]  [<ffffffff811bed60>] shrink_zone+0xf0/0x2f0
[600522.099974]  [<ffffffff811bf312>] do_try_to_free_pages+0x192/0x470
[600522.100709]  [<ffffffff811bf6ca>] try_to_free_pages+0xda/0x170
[600522.101464]  [<ffffffff811b2198>] __alloc_pages_nodemask+0x588/0x970
[600522.102235]  [<ffffffff811fbbd5>] alloc_pages_vma+0xb5/0x230
[600522.103000]  [<ffffffff813a1589>] ? cpumask_any_but+0x39/0x50
[600522.103774]  [<ffffffff811d6115>] wp_page_copy.isra.55+0x95/0x490
[600522.104558]  [<ffffffff810e3438>] ? __wake_up+0x48/0x60
[600522.105357]  [<ffffffff811d7d3b>] do_wp_page+0xab/0x4f0
[600522.106137]  [<ffffffff810a1bbb>] ? release_task+0x36b/0x470
[600522.106902]  [<ffffffff8126dbd7>] ? eventfd_ctx_read+0x67/0x1c0
[600522.107659]  [<ffffffff811da2a8>] handle_mm_fault+0xc78/0x1900
[600522.108431]  [<ffffffff81067ef1>] __do_page_fault+0x181/0x420
[600522.109173]  [<ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280
[600522.109893]  [<ffffffff810681c0>] do_page_fault+0x30/0x80
[600522.110594]  [<ffffffff81024f36>] ? syscall_trace_leave+0xc6/0x120
[600522.111288]  [<ffffffff81790a58>] page_fault+0x28/0x30
[600522.111947] Code: 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 4c 8d 87 d0 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce <48> 8b 40 40 4c 8b 50 30 74 24 48 8b 87 d0 01 00 00 48 8b 7e 08
[600522.113343] RIP  [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs]
[600522.114003]  RSP <ffff88042ae87438>
[600522.114636] CR2: 0000000000000040

Fixes: af7cf057 (NFS: Allow multiple commit requests in flight per file)
CC: stable@vger.kernel.org
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ade8febd

NFS: checking for NULL instead of IS_ERR() in nfs_commit_file() · 2997bfd0

由 Dan Carpenter 提交于 5月 23, 2016

nfs_create_request() doesn't return NULL, it returns error pointers.

Fixes: 67911c8f ('NFS: Add nfs_commit_file()')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2997bfd0

18 5月, 2016 2 次提交

NFS: Reclaim writes via writepage are opportunistic · cca588d6

由 Trond Myklebust 提交于 5月 16, 2016

No need to make them a priority any more, or to make them succeed.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

cca588d6

NFS: Add nfs_commit_file() · 67911c8f

由 Anna Schumaker 提交于 1月 19, 2016

Copy will use this to set up a commit request for a generic range.  I
don't want to allocate a new pagecache entry for the file, so I needed
to change parts of the commit path to handle requests with a null
wb_page.
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

67911c8f

09 5月, 2016 1 次提交

NFS: Save struct inode * inside nfs_commit_info to clarify usage of i_lock · fe238e60

由 Dave Wysochanski 提交于 4月 01, 2016

Commit ea2cf228 created nfs_commit_info and saved &inode->i_lock inside
this NFS specific structure.  This obscures the usage of i_lock.
Instead, save struct inode * so later it's clear the spinlock taken is
i_lock.

Should be no functional change.
Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

fe238e60

05 4月, 2016 1 次提交

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

由 Kirill A. Shutemov 提交于 4月 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

09cbfeaf

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功