Commits · f844cd0d76378fa898890d2448222d407f3f8ecf · openeuler / raspberrypi-kernel

20 Sep, 2016 1 commit

nfs: cover ->migratepage with CONFIG_MIGRATION · f844cd0d

Authored by Chao Yu on Sep 20, 2016

It would be cleaner to use CONFIG_MIGRATION to cover nfs' private
.migratepage in nfs_file_aops, as we do in other parts of the nfs
operations.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

f844cd0d
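
As a rough illustration of the pattern this commit describes, the sketch below guards both a callback and its slot in an operations table with `#ifdef CONFIG_MIGRATION`. All `sk_`-prefixed names are hypothetical stand-ins, not the kernel's real `address_space_operations` or `nfs_file_aops` definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel page; not the real definition. */
struct sk_page { int id; };

/* Hypothetical, much reduced operations table. */
struct sk_aops {
	int (*readpage)(struct sk_page *page);
	int (*migratepage)(struct sk_page *newpage, struct sk_page *page);
};

static int sk_readpage(struct sk_page *page)
{
	(void)page;
	return 0;
}

#ifdef CONFIG_MIGRATION
/* Only compiled when page migration support is configured in. */
static int sk_migrate_page(struct sk_page *newpage, struct sk_page *page)
{
	newpage->id = page->id;
	return 0;
}
#endif

/* The same guard covers the initializer, so the two stay in step. */
static const struct sk_aops sk_file_aops = {
	.readpage = sk_readpage,
#ifdef CONFIG_MIGRATION
	.migratepage = sk_migrate_page,
#endif
};

/* Returns 1 if a migratepage callback was compiled in, 0 otherwise. */
int sk_has_migratepage(void)
{
	return sk_file_aops.migratepage != NULL;
}
```

Building with `-DCONFIG_MIGRATION` fills the slot; without it the slot stays NULL and the callback is not compiled at all.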

04 Sep, 2016 1 commit

NFS: Fix error reporting in nfs_file_write() · c49edecd

Authored by Trond Myklebust on Sep 03, 2016

When doing O_DSYNC writes, the actual write errors are reported through
generic_write_sync(), so we must test the result.
Reported-by: J. R. Okajima <hooanon05g@gmail.com>
Fixes: 18290650 ("NFS: Move buffered I/O locking into nfs_file_write()")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

c49edecd
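
The point of this fix can be sketched in user space: if the sync step is where errors surface, its return value has to become the write's return value. This is only a model under assumptions. The `sk_` names are hypothetical, and the "ignoring" variant shows one plausible shape of the bug rather than the exact code that was changed.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical stand-in for a sync step such as generic_write_sync():
 * returns the byte count on success or a negative errno on failure. */
typedef ssize_t (*sk_sync_fn)(ssize_t count);

ssize_t sk_sync_ok(ssize_t count)
{
	return count;
}

ssize_t sk_sync_fail(ssize_t count)
{
	(void)count;
	return -EIO;
}

/* Problem shape: the sync result is dropped, so the caller sees success. */
ssize_t sk_write_ignoring_sync(ssize_t written, sk_sync_fn sync)
{
	if (written > 0)
		(void)sync(written);
	return written;
}

/* Fixed shape: the sync result replaces the return value. */
ssize_t sk_write_checking_sync(ssize_t written, sk_sync_fn sync)
{
	if (written > 0)
		written = sync(written);
	return written;
}
```

With `sk_sync_fail`, the first variant still reports 4096 bytes written for a 4096-byte write, while the second reports `-EIO`.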

20 Jul, 2016 1 commit

sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e

Authored by Scott Mayhew on Jun 07, 2016

A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
not really safe to use the generic_cred->acred->ac_flags to store
the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
with the auth_cred in a state where they're perpetually doing 4K
NFS_FILE_SYNC writes.

This can be reproduced as follows:

1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
They do not need to be the same export, nor do they even need to be from
the same NFS server.  Also, v3 is fine.
$ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
$ sudo mount -o v3,sec=sys server2:/export /mnt/sys

2. As the normal user, before accessing the kerberized mount, kinit with
a short lifetime (but not so short that renewing the ticket would leave
you within the 4-minute window again by the time the original ticket
expires), e.g.
$ kinit -l 10m -r 60m

3. Do some I/O to the kerberized mount and verify that the writes are
wsize, UNSTABLE:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

4. Wait until you're within 4 minutes of key expiry, then do some more
I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
set.  Verify that the writes are 4K, FILE_SYNC:
$ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1

5. Now do some I/O to the sec=sys mount.  This will cause
RPC_CRED_NO_CRKEY_TIMEOUT to be set:
$ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1

6. Writes for that user will now be permanently 4K, FILE_SYNC,
regardless of which mount is being written to, until you reboot
the client.  Renewing the kerberos ticket (assuming it hasn't already
expired) will have no effect.  Grabbing a new kerberos ticket at this
point will have no effect either.

Move the flag to the auth->au_flags field (which is currently unused)
and rename it slightly to reflect that it's no longer associated with
the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
add the inode to the arg list of nfs_ctx_key_to_expire so we can
determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ce52914e
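
The failure mode in this commit message can be modelled in a few lines. The sketch below is a simplified user-space model, not the sunrpc code: the `sk_` names, the bit values, and the exact control flow are assumptions chosen to mirror the description (a "no key timeout" bit on the shared per-user flags short-circuits the check that would otherwise clear the "expire soon" bit).

```c
#include <stdbool.h>

#define SK_KEY_EXPIRE_SOON  (1u << 0)   /* hypothetical bit values */
#define SK_NO_CRKEY_TIMEOUT (1u << 1)

/* One per security flavor (for example krb5 or sys). */
struct sk_auth {
	bool has_key_timeout;    /* false models a flavor with no key expiry */
	unsigned int au_flags;   /* per-flavor flags */
};

/* One per user, shared by every flavor that user touches. */
struct sk_acred {
	unsigned int ac_flags;
};

/* Old scheme: "no key timeout" is recorded on the shared per-user flags. */
void sk_old_check(struct sk_acred *ac, const struct sk_auth *auth,
		  bool near_expiry)
{
	if (ac->ac_flags & SK_NO_CRKEY_TIMEOUT)
		return;   /* fast path: expiry is never re-evaluated */
	if (!auth->has_key_timeout) {
		ac->ac_flags |= SK_NO_CRKEY_TIMEOUT;
		return;
	}
	if (near_expiry)
		ac->ac_flags |= SK_KEY_EXPIRE_SOON;
	else
		ac->ac_flags &= ~SK_KEY_EXPIRE_SOON;
}

/* New scheme: "no key timeout" is recorded on the flavor itself. */
void sk_new_check(struct sk_acred *ac, struct sk_auth *auth, bool near_expiry)
{
	if (auth->au_flags & SK_NO_CRKEY_TIMEOUT)
		return;
	if (!auth->has_key_timeout) {
		auth->au_flags |= SK_NO_CRKEY_TIMEOUT;
		return;
	}
	if (near_expiry)
		ac->ac_flags |= SK_KEY_EXPIRE_SOON;
	else
		ac->ac_flags &= ~SK_KEY_EXPIRE_SOON;
}

/* True while the user is stuck in the "key expires soon" state. */
bool sk_expire_soon(const struct sk_acred *ac)
{
	return (ac->ac_flags & SK_KEY_EXPIRE_SOON) != 0;
}
```

Replaying steps 4 to 6 (krb5 check near expiry, then a sys check, then a krb5 check after renewal) leaves the old scheme stuck with `sk_expire_soon()` true, while the new scheme clears it.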

06 Jul, 2016 5 commits

NFS nfs_vm_page_mkwrite: Don't freeze me, Bro... · 9a773e7c

Authored by Trond Myklebust on Jun 23, 2016

Prevent filesystem freezes while handling the write page fault.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9a773e7c

NFS: Remove redundant waits for O_DIRECT in fsync() and write_begin() · f508d46a

Authored by Trond Myklebust on Jun 23, 2016

We're now waiting immediately after taking the locks, so waiting
in fsync() and write_begin() is either redundant or potentially
subject to livelock (if not holding the lock).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

f508d46a

NFS: Do not serialise O_DIRECT reads and writes · a5864c99

Authored by Trond Myklebust on Jun 03, 2016

Allow dio requests to be scheduled in parallel, while ensuring that they
do not conflict with buffered I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a5864c99

NFS: Move buffered I/O locking into nfs_file_write() · 18290650

Authored by Trond Myklebust on Jun 23, 2016

Preparation for the patch that de-serialises O_DIRECT reads and
writes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

18290650

NFS Cleanup: move call to generic_write_checks() into fs/nfs/direct.c · 89698b24

Authored by Trond Myklebust on Jun 23, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

89698b24

22 Jun, 2016 4 commits

NFS: Don't call COMMIT in ->releasepage() · 4f52b6bb

Authored by Trond Myklebust on Jun 02, 2016

While COMMIT has the potential to free up a lot of memory that is being
taken by unstable writes, it isn't guaranteed to free up this particular
page. Also, calling fsync() on the server is expensive and so we want to
do it in a more controlled fashion, rather than have it triggered at
random by the VM.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4f52b6bb

NFS: Don't hold the inode lock across fsync() · 93761d98

Authored by Trond Myklebust on Jun 02, 2016

Commits are no longer required to be serialised.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

93761d98

NFS: Kill NFS_INO_NFS_INO_FLUSHING: it is a performance killer · 6b56a898

Authored by Trond Myklebust on Jun 01, 2016

filemap_datawrite() and friends already deal just fine with livelock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

6b56a898

NFS: Cache aggressively when file is open for writing · ca0daa27

Authored by Trond Myklebust on Jun 08, 2016

Unless the user is using file locking, we must assume close-to-open
cache consistency when the file is open for writing. Adjust the
caching algorithm so that it does not clear the cache on out-of-order
writes and/or attribute revalidations.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ca0daa27

02 May, 2016 1 commit

direct-io: eliminate the offset argument to ->direct_IO · c8b8e32d

Authored by Christoph Hellwig on Apr 07, 2016

Including blkdev_direct_IO and dax_do_io.  It has to be ki_pos to actually
work, so eliminate the superfluous argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c8b8e32d

05 Apr, 2016 1 commit

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf

Authored by Kirill A. Shutemov on Apr 01, 2016

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized.  And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE.  And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.

Let's stop pretending that pages in page cache are special.  They are
not.

The changes are pretty straight-forward:

 - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

 - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

 - page_cache_get() -> get_page();

 - page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below.  For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach.  I'll
fix them manually in a separate patch.  Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

09cbfeaf
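
The rewrite rules in this commit are safe because the two shift constants are equal, so every `(PAGE_CACHE_SHIFT - PAGE_SHIFT)` shift is a shift by zero. The sketch below checks that identity in plain C. The `SK_` macros and the 4 KiB page size are assumptions for illustration, not the kernel's definitions.

```c
#define SK_PAGE_SHIFT       12                        /* assumed 4 KiB pages */
#define SK_PAGE_SIZE        (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_CACHE_SHIFT SK_PAGE_SHIFT             /* the old alias */
#define SK_PAGE_CACHE_SIZE  (1UL << SK_PAGE_CACHE_SHIFT)

/* Old spelling: scale an index by the (zero) difference in shifts. */
unsigned long sk_old_index(unsigned long index)
{
	return index << (SK_PAGE_CACHE_SHIFT - SK_PAGE_SHIFT);
}

/* New spelling after the conversion: the shift by zero is dropped. */
unsigned long sk_new_index(unsigned long index)
{
	return index;
}

/* Old spelling of "byte offset of page index". */
unsigned long sk_old_offset(unsigned long index)
{
	return index << SK_PAGE_CACHE_SHIFT;
}

/* New spelling of the same computation. */
unsigned long sk_new_offset(unsigned long index)
{
	return index << SK_PAGE_SHIFT;
}
```

Each old/new pair returns the same value for any input, which is what lets the semantic patch replace one with the other mechanically.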

17 Mar, 2016 2 commits

nfs: remove nfs_inode_dio_wait · 95d9f6c3

Authored by Christoph Hellwig on Mar 02, 2016

Just call inode_dio_wait directly instead of through a pointless wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

95d9f6c3

nfs: remove nfs4_file_fsync · 4ff79bc7

Authored by Christoph Hellwig on Mar 02, 2016

The only difference to nfs_file_fsync is the call to pnfs_sync_inode.  But
pnfs_sync_inode is just an inline that calls a pNFS layout driver method
if CONFIG_PNFS is defined, and thus can be called just fine from the core
NFS module.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4ff79bc7

23 Jan, 2016 1 commit

wrappers for ->i_mutex access · 5955102c

Authored by Al Viro on Jan 22, 2016

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5955102c
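
The wrapper idea from this commit, `inode_foo(inode)` standing for `mutex_foo(&inode->i_mutex)`, can be sketched in user space. Everything below is a hypothetical stand-in: `sk_mutex` is a toy spin lock built on C11 `atomic_flag`, not the kernel mutex, and `sk_inode` carries only the one field.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock standing in for the kernel mutex (assumption for this sketch). */
struct sk_mutex { atomic_flag held; };

static inline void sk_mutex_init(struct sk_mutex *m)
{
	atomic_flag_clear(&m->held);
}

static inline void sk_mutex_lock(struct sk_mutex *m)
{
	while (atomic_flag_test_and_set(&m->held))
		;   /* spin until the holder releases it */
}

static inline void sk_mutex_unlock(struct sk_mutex *m)
{
	atomic_flag_clear(&m->held);
}

static inline bool sk_mutex_trylock(struct sk_mutex *m)
{
	return !atomic_flag_test_and_set(&m->held);
}

struct sk_inode { struct sk_mutex i_mutex; };

/* Callers use these and never name the lock field directly, so the
 * field's type can change later without touching every call site. */
static inline void sk_inode_init(struct sk_inode *inode)
{
	sk_mutex_init(&inode->i_mutex);
}

static inline void sk_inode_lock(struct sk_inode *inode)
{
	sk_mutex_lock(&inode->i_mutex);
}

static inline void sk_inode_unlock(struct sk_inode *inode)
{
	sk_mutex_unlock(&inode->i_mutex);
}

static inline bool sk_inode_trylock(struct sk_inode *inode)
{
	return sk_mutex_trylock(&inode->i_mutex);
}
```

Hiding the field behind wrappers is what makes the planned switch of ->i_mutex to an rwsem a change in one place rather than in every caller.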

08 Jan, 2016 1 commit

NFS: Use wait_on_atomic_t() for unlock after readahead · 210c7c17

Authored by Benjamin Coddington on Jan 06, 2016

The use of wait_on_atomic_t() for waiting on I/O to complete before
unlocking allows us to get rid of the NFS_IO_INPROGRESS flag, and thus the
nfs_iocounter's flags member, and finally the nfs_iocounter altogether.
The count of I/O is moved to the lock context, and the counter
increment/decrement functions become simple enough to open-code.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
[Trond: Fix up conflict with existing function nfs_wait_atomic_killable()]
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

210c7c17

01 Jan, 2016 1 commit

NFS: Allow multiple commit requests in flight per file · af7cf057

Authored by Trond Myklebust on Sep 29, 2015

Allow synchronous RPC calls to wait for pending RPC calls to finish,
but also allow asynchronous ones to just fire off another commit.

With this patch, the xfstests generic/074 test completes in 226s
instead of 242s.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

af7cf057

29 Dec, 2015 1 commit

nfs: only remove page from mapping if launder_page fails · d6c843b9

Authored by Peng Tao on Dec 05, 2015

Instead of dropping pages when a write fails, only do it when
we get a fatal failure in launder_page write back.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d6c843b9

07 Nov, 2015 1 commit

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc

Authored by Mel Gorman on Nov 06, 2015

mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd

__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites:

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d0164adc
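
The flag relationships this commit message describes can be written down as a small bit-mask sketch. The bit values and all `SK_`/`sk_` names below are hypothetical; only the relationships are taken from the message: __GFP_WAIT becomes "direct reclaim plus kswapd", a blocking check tests the direct-reclaim bit, and a mempool-backed caller clears that bit while keeping the kswapd one.

```c
#include <stdbool.h>

typedef unsigned int sk_gfp_t;

/* Hypothetical bit values; only the relationships matter here. */
#define SK_GFP_HIGH            0x1u   /* high priority reserve */
#define SK_GFP_ATOMIC          0x2u   /* truly atomic, no alternative */
#define SK_GFP_DIRECT_RECLAIM  0x4u   /* caller may sleep in direct reclaim */
#define SK_GFP_KSWAPD_RECLAIM  0x8u   /* caller wants kswapd woken */

/* __GFP_WAIT as redefined by the patch: both kinds of reclaim. */
#define SK_GFP_WAIT (SK_GFP_DIRECT_RECLAIM | SK_GFP_KSWAPD_RECLAIM)

/* Models a helper like gfpflags_allow_blocking(): may the caller sleep? */
bool sk_allow_blocking(sk_gfp_t flags)
{
	return (flags & SK_GFP_DIRECT_RECLAIM) != 0;
}

/* Mempool-backed caller: drop direct reclaim, keep whatever else was set. */
sk_gfp_t sk_mempool_flags(sk_gfp_t flags)
{
	return flags & ~SK_GFP_DIRECT_RECLAIM;
}
```

Starting from `SK_GFP_WAIT`, `sk_mempool_flags()` yields a mask that no longer allows blocking but still asks for kswapd to be woken, which is the bio-allocation case in the list above.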

23 Oct, 2015 1 commit

Move locks API users to locks_lock_inode_wait() · 4f656367

Authored by Benjamin Coddington on Oct 22, 2015

Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait().  This
allows for some later cleanup.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>

4f656367
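
A minimal sketch of the dispatch this commit centralises: one entry point inspects the lock's flags and calls the matching implementation, so callers no longer repeat the check. The `sk_` names, flag values, and return codes are hypothetical, and returning `-EINVAL` for an unknown type is an assumption of this sketch rather than the kernel's behaviour.

```c
#include <errno.h>

#define SK_FL_POSIX 0x1u   /* hypothetical flag values */
#define SK_FL_FLOCK 0x2u

struct sk_file_lock { unsigned int fl_flags; };

/* Stand-ins for the two underlying lock implementations. */
static int sk_posix_lock_wait(const struct sk_file_lock *fl)
{
	(void)fl;
	return 1;   /* distinct value so the dispatch is observable */
}

static int sk_flock_lock_wait(const struct sk_file_lock *fl)
{
	(void)fl;
	return 2;
}

/* Single entry point: pick the implementation from the lock's own flags. */
int sk_locks_lock_wait(const struct sk_file_lock *fl)
{
	switch (fl->fl_flags & (SK_FL_POSIX | SK_FL_FLOCK)) {
	case SK_FL_POSIX:
		return sk_posix_lock_wait(fl);
	case SK_FL_FLOCK:
		return sk_flock_lock_wait(fl);
	default:
		return -EINVAL;   /* assumption: reject unknown lock types */
	}
}
```

A filesystem that previously tested the flags itself would now make a single `sk_locks_lock_wait(fl)` call.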

08 Sep, 2015 1 commit

NFSv4: Respect the server imposed limit on how many changes we may cache · 5445b1fb

Authored by Trond Myklebust on Sep 05, 2015

The NFSv4 delegation spec allows the server to tell a client to limit how
much data it caches after the file is closed. In return, the server
guarantees enough free space to avoid ENOSPC situations, etc.
Prior to this patch, we assumed we could always cache aggressively after
close. Unfortunately, this causes problems with servers that set the
limit to 0 and therefore do not offer any ENOSPC guarantees.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5445b1fb

18 Aug, 2015 2 commits

NFS: Don't fsync twice for O_SYNC/IS_SYNC files · 7e94d6c4

Authored by Trond Myklebust on Aug 17, 2015

generic_file_write_iter() will already do an fsync on our behalf
if the file descriptor is O_SYNC or the file is marked as IS_SYNC.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7e94d6c4

NFS: Remove nfs_release() · aff8d8dc

Authored by Anna Schumaker on Jul 13, 2015

And call nfs_file_clear_open_context() directly.  This makes it obvious
that nfs_file_release() will always return 0.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

aff8d8dc

11 Jun, 2015 1 commit

sunrpc: keep a count of swapfiles associated with the rpc_clnt · 3c87ef6e

Authored by Jeff Layton on Jun 03, 2015

Jerome reported seeing a warning pop when working with a swapfile on
NFS. The nfs_swap_activate can end up calling sk_set_memalloc while
holding the rcu_read_lock and that function can sleep.

To fix that, we need to take a reference to the xprt while holding the
rcu_read_lock, set the socket up for swapping and then drop that
reference. But, xprt_put is not exported and having NFS deal with the
underlying xprt is a bit of layering violation anyway.

Fix this by adding a set of activate/deactivate functions that take a
rpc_clnt pointer instead of an rpc_xprt, and have nfs_swap_activate and
nfs_swap_deactivate call those.

Also, add a per-rpc_clnt atomic counter to keep track of the number of
active swapfiles associated with it. When the counter does a 0->1
transition, we enable swapping on the xprt, when we do a 1->0 transition
we disable swapping on it.

This also allows us to be a bit more selective with the RPC_TASK_SWAPPER
flag. If non-swapper and swapper clnts are sharing a xprt, then we only
need to flag the tasks from the swapper clnt with that flag.
Acked-by: Mel Gorman <mgorman@suse.de>
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3c87ef6e
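
The counting scheme in this commit (enable swapping on the 0->1 transition, disable it on 1->0) can be sketched with C11 atomics. The `sk_` types and function names are hypothetical stand-ins for the rpc_clnt and rpc_xprt structures, and the `swapper` boolean stands in for whatever "swapping enabled on the transport" involves.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transport shared by clients. */
struct sk_xprt {
	bool swapper;   /* true while swapping is enabled on the transport */
};

/* Hypothetical stand-in for an RPC client. */
struct sk_clnt {
	atomic_int swappers;    /* number of active swapfiles on this client */
	struct sk_xprt *xprt;
};

void sk_clnt_init(struct sk_clnt *clnt, struct sk_xprt *xprt)
{
	atomic_init(&clnt->swappers, 0);
	clnt->xprt = xprt;
}

/* Enable swapping on the transport only on the 0 -> 1 transition. */
void sk_clnt_swap_activate(struct sk_clnt *clnt)
{
	if (atomic_fetch_add(&clnt->swappers, 1) == 0)
		clnt->xprt->swapper = true;
}

/* Disable swapping on the transport only on the 1 -> 0 transition. */
void sk_clnt_swap_deactivate(struct sk_clnt *clnt)
{
	if (atomic_fetch_sub(&clnt->swappers, 1) == 1)
		clnt->xprt->swapper = false;
}
```

Because `atomic_fetch_add` and `atomic_fetch_sub` return the previous value, each transition is detected by exactly one caller even when several swapfiles are activated concurrently.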

16 Apr, 2015 1 commit

nfs: generic_write_checks() shouldn't be done on swapout... · 65a4a1ca

Authored by Al Viro on Apr 09, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

65a4a1ca

12 Apr, 2015 2 commits

mirror O_APPEND and O_DIRECT into iocb->ki_flags · 2ba48ce5

Authored by Al Viro on Apr 09, 2015

... avoiding write_iter/fcntl races.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

2ba48ce5

make new_sync_{read,write}() static · 5d5d5689

Authored by Al Viro on Apr 03, 2015

All places outside of core VFS that checked ->read and ->write for being NULL or
called the methods directly are gone now, so NULL {read,write} with non-NULL
{read,write}_iter will do the right thing in all cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

5d5d5689

28 Mar, 2015 2 commits

NFSv4.1/pnfs: Ensure that writes respect the O_SYNC flag when doing O_DIRECT · a0815d55

Authored by Trond Myklebust on Mar 25, 2015

If the caller does not specify the O_SYNC flag, then it is legitimate
to return from O_DIRECT without doing a pNFS layoutcommit operation.
However if the file is opened O_DIRECT|O_SYNC then we'd better get it
right.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

a0815d55

NFS: File unlock needs to be a metadata synchronisation point · d9dabc1a

Authored by Trond Myklebust on Mar 26, 2015

File unlock needs to update both data and metadata on the NFS server
in order to act as a synchronisation point for other clients.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d9dabc1a

26 Mar, 2015 1 commit

fs: move struct kiocb to fs.h · e2e40f2c

Authored by Christoph Hellwig on Feb 22, 2015

struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e2e40f2c

04 Mar, 2015 2 commits

NFS: Don't write enable new pages while an invalidation is proceeding · ef070dcb

Authored by Trond Myklebust on Mar 03, 2015

nfs_vm_page_mkwrite() should wait until the page cache invalidation
is finished. This is the second patch in a 2 patch series to deprecate
the NFS client's reliance on nfs_release_page() in the context of
nfs_invalidate_mapping().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

ef070dcb

NFS: Fix a regression in the read() syscall · 874f9463

Authored by Trond Myklebust on Mar 02, 2015

When invalidating the page cache for a regular file, we want to first
sync all dirty data to disk and then call invalidate_inode_pages2().
The latter relies on nfs_launder_page() and nfs_release_page() to deal
respectively with dirty pages, and unstable written pages.

When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
NFS filesystems.") changed the behaviour of nfs_release_page(), it
made it possible for invalidate_inode_pages2() to fail with an EBUSY.
Unfortunately, that error is then propagated back to read().

Let's therefore work around the problem for now by protecting the call
to sync the data and invalidate_inode_pages2() so that they are atomic
w.r.t. the addition of new writes.
Later on, we can revisit whether or not we still need nfs_launder_page()
and nfs_release_page().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

874f9463

02 Mar, 2015 1 commit

NFS: Ensure that buffered writes wait for O_DIRECT writes to complete · aa5accea

Authored by Trond Myklebust on Feb 27, 2015

The O_DIRECT code will grab the inode->i_mutex and flush out buffered
writes, before scheduling a read or a write. However there is no
equivalent in the buffered write code to wait for O_DIRECT to complete.

Fixes a reported issue in xfstests generic/133, when first performing an
O_DIRECT write followed by a buffered write.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

aa5accea

11 Feb, 2015 1 commit

mm: drop vm_ops->remap_pages and generic_file_remap_pages() stub · d83a08db

Authored by Kirill A. Shutemov on Feb 10, 2015

Nobody uses it anymore.

[akpm@linux-foundation.org: fix filemap_xip.c]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d83a08db

14 Oct, 2014 1 commit

block: Remove REQ_KERNEL · e19a8a0a

Authored by Martin K. Petersen on Oct 14, 2014

REQ_KERNEL is no longer used. Remove it and drop the redundant uio
argument to nfs_file_direct_{read,write}.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

e19a8a0a

25 Sep, 2014 3 commits

NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() · 1aff5256

Authored by NeilBrown on Sep 24, 2014

Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
 - it doesn't hurt for kswapd to block occasionally.  If it doesn't
   want to block it would clear __GFP_WAIT.  The current_is_kswapd()
   was only added to avoid deadlocks and we have a new approach for
   that.
 - memory allocation in the SUNRPC layer can very rarely try to
   ->releasepage() a page it is trying to handle.  The deadlock
   is removed as nfs_release_page() doesn't block indefinitely.

So we don't need to set PF_FSTRANS for sunrpc network operations any
more.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1aff5256

NFS: avoid waiting at all in nfs_release_page when congested. · 353db796

Authored by NeilBrown on Sep 24, 2014

If nfs_release_page() is called on a sequence of pages which are all
in the same file which is blocked on COMMIT, each page could
contribute a 1 second delay which could become excessive.  I have
seen delays of as much as 208 seconds.

To keep the delay to one second, mark the bdi as write-congested
if the commit didn't finish.  Once it does finish, the
write-congested flag will be cleared by nfs_commit_release_pages().

With this, the longest total delay in try_to_free_pages that I have
seen is under 3 seconds.  With no waiting in nfs_release_page at all
I have seen delays of nearly 1.5 seconds.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

353db796
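
The effect of the write-congested flag in this commit can be illustrated with a small simulation: the first page that times out pays the one-second wait and sets the flag, and later pages skip the wait while the flag is set. This is a user-space model under assumptions. The `sk_` names and the simulated clock are hypothetical, and clearing the flag inside the release function is a simplification of the message's statement that nfs_commit_release_pages() clears it.

```c
#include <stdbool.h>

#define SK_WAIT_BUDGET_MS 1000L   /* the one-second limit from the message */

struct sk_bdi {
	bool write_congested;     /* set when a bounded wait timed out */
	long now_ms;              /* simulated clock */
	long commit_done_at_ms;   /* simulated time at which COMMIT finishes */
};

/* Returns true if the page could be released. */
bool sk_release_page(struct sk_bdi *bdi, bool may_wait)
{
	if (bdi->now_ms >= bdi->commit_done_at_ms) {
		bdi->write_congested = false;   /* simplification: cleared here */
		return true;
	}
	if (may_wait && !bdi->write_congested) {
		long remaining = bdi->commit_done_at_ms - bdi->now_ms;
		long wait = remaining < SK_WAIT_BUDGET_MS ?
			    remaining : SK_WAIT_BUDGET_MS;

		bdi->now_ms += wait;            /* simulate the bounded wait */
		if (bdi->now_ms >= bdi->commit_done_at_ms)
			return true;
		bdi->write_congested = true;    /* later pages skip the wait */
	}
	return false;
}
```

With a COMMIT that takes five simulated seconds, ten consecutive calls advance the clock by one second in total rather than ten.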

NFS: avoid deadlocks with loop-back mounted NFS filesystems. · 95905446

Authored by NeilBrown on Sep 24, 2014

Support for loop-back mounted NFS filesystems is useful when NFS is
used to access shared storage in a high-availability cluster.

If the node running the NFS server fails, some other node can mount the
filesystem and start providing NFS service.  If that node already had
the filesystem NFS mounted, it will now have it loop-back mounted.

nfsd can suffer a deadlock when allocating memory and entering direct
reclaim.
While direct reclaim does not write to the NFS filesystem it can send
and wait for a COMMIT through nfs_release_page().

This patch modifies nfs_release_page() to wait a limited time for the
commit to complete - one second.  If the commit doesn't complete
in this time, nfs_release_page() will fail.  This means it might now
fail in some cases where it wouldn't before.  These cases are only
when 'gfp' includes '__GFP_WAIT'.

nfs_release_page() is only called by try_to_release_page(), and that
can only be called on an NFS page with required 'gfp' flags from
 - page_cache_pipe_buf_steal() in splice.c
 - shrink_page_list() in vmscan.c
 - invalidate_inode_pages2_range() in truncate.c

The first two handle failure quite safely.  The last is only called
after ->launder_page() has been called, and that will have waited
for the commit to finish already.

So aborting if the commit takes longer than 1 second is perfectly safe.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

95905446