Commits · 88ac815cdbef93dec8382b3531ef90474dd102f2 · openeuler / Kernel

13 Sep, 2014 15 commits

nfs41: change PNFS_LAYOUTRET_ON_SETATTR to only return on truncation to smaller size · 88ac815c

Committed by Peng Tao on Sep 12, 2014

Both the blocks layout and the objects layout want to use it to avoid
CB_LAYOUTRECALL, but that should only happen if the client is truncating to
a smaller size. For other cases, we let the server decide whether it wants to
recall the client's layouts. Change PNFS_LAYOUTRET_ON_SETATTR to follow this
logic and not send layoutreturn unnecessarily.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Boaz Harrosh <boaz@plexistor.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
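
The rule this commit describes can be sketched as a small predicate. This is an illustration only, not the kernel code: `SKETCH_ATTR_SIZE` and `sketch_should_layoutreturn()` are hypothetical names, and the flag value is an assumption chosen for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "size is being changed" setattr flag. */
#define SKETCH_ATTR_SIZE 0x8u

/*
 * Return true only when the setattr shrinks the file. Growing the file or
 * changing other attributes leaves the recall decision to the server.
 */
static bool sketch_should_layoutreturn(unsigned int ia_valid,
                                       uint64_t new_size, uint64_t cur_size)
{
    if (!(ia_valid & SKETCH_ATTR_SIZE))
        return false;
    return new_size < cur_size;
}
```

With this predicate, a truncate from 20 bytes to 10 triggers a layoutreturn, while an extension from 10 to 20 or a pure mode change does not.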

NFS: Move NFS v3 acl functions to nfs3_fs.h · cb8c20fa

Committed by Anna Schumaker on Sep 03, 2014

This code is internal to the v3 module, so other parts of the client
shouldn't have any knowledge of it.

nfs3_getxattr(), nfs3_setxattr(), and nfs3_removexattr() no longer exist
anywhere, so I remove the declarations while I'm here.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFS: Remove v3 not compiled check from validate_mount_data() · f08460dc

Committed by Anna Schumaker on Sep 03, 2014

This check is already performed by the module loading code - if the
module can't be found then -EPROTONOSUPPORT will be returned.  Let's
handle v3 this way, too.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFS: Move v3 declarations out of internal.h · 00a36a10

Committed by Anna Schumaker on Sep 03, 2014

I am generally against the "one big header file" approach, and
everything in the client includes this file.  Let's move all the NFS v3
declarations into a v3-only header file.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFS: Unconditionally enable commit code · f418c64b

Committed by Anna Schumaker on Sep 03, 2014

The goal is to create a generic NFS module with code that does not
depend on what versions of NFS are enabled.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pNFS/blocklayout: Remove a couple of unused variables · 164ae58c

Committed by Trond Myklebust on Sep 12, 2014

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: enable CB_NOTIFY_DEVICEID support · 84c9dee3

Committed by Christoph Hellwig on Sep 10, 2014

This code has been around for a while but was never enabled, although
it is in working shape.

Note that we implement NOTIFY_DEVICEID4_CHANGE identically to
NOTIFY_DEVICEID4_DELETE.  Given that in either case we can't do anything
but prevent further lookups of a given device ID, there isn't much difference
in semantics between the two.  For the delete case the server MUST ensure that
there are no outstanding layouts, while for the change case it doesn't, but
that has little relevance to the client.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
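
The "treat change like delete" idea in this commit can be pictured with the following minimal sketch. It is not the kernel's callback code: the enum, the struct, and `sketch_handle_notify()` are hypothetical names made up for illustration, and the numeric values are assumptions.

```c
#include <stdbool.h>

/* Notification kinds; names and values are stand-ins for this sketch. */
enum sketch_notify_type {
    SKETCH_NOTIFY_CHANGE = 1,
    SKETCH_NOTIFY_DELETE = 2,
};

struct sketch_device {
    bool lookups_allowed;   /* cleared once the server notifies us */
};

/* Returns 0 on success, -1 for an unknown notification type. */
static int sketch_handle_notify(struct sketch_device *dev,
                                enum sketch_notify_type type)
{
    switch (type) {
    case SKETCH_NOTIFY_CHANGE:  /* handled exactly like a delete */
    case SKETCH_NOTIFY_DELETE:
        dev->lookups_allowed = false;
        return 0;
    default:
        return -1;
    }
}
```

Both notification kinds end in the same state: the device ID can no longer be looked up, so a later lookup has to fetch fresh device information.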

pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing · 5c83746a

Committed by Christoph Hellwig on Sep 10, 2014

This patch moves parsing of the GETDEVICEINFO XDR to kernel space, as well
as the management of complex devices. The reason for that is we might have
multiple outstanding complex devices after a NOTIFY_DEVICEID4_CHANGE, which
device mapper or md can't handle as they claim devices exclusively.

But as it turns out, simple striping / concatenation is fairly trivial to
implement anyway, so we make our life simpler by reducing the reliance
on blkmapd. For now we still use blkmapd by feeding it synthetic SIMPLE
device XDR to translate device signatures to device numbers, but in the
long run I have plans to eliminate it entirely.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
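
To illustrate why simple striping is described as fairly trivial, here is a minimal sketch of the address translation for a striped volume. It is not taken from the blocklayout driver; `sketch_stripe_map` and `sketch_stripe_lookup()` are hypothetical names, and the round-robin chunk layout is an assumption of the example.

```c
#include <stdint.h>

struct sketch_stripe_map {
    uint32_t dev_index;   /* which member device holds the byte */
    uint64_t dev_offset;  /* byte offset within that device */
};

/*
 * Map a logical byte offset onto a striped set of nr_devs devices.
 * Callers must pass chunk_size > 0 and nr_devs > 0.
 */
static struct sketch_stripe_map
sketch_stripe_lookup(uint64_t offset, uint64_t chunk_size, uint32_t nr_devs)
{
    uint64_t chunk = offset / chunk_size;   /* global chunk number */
    struct sketch_stripe_map m;

    /* Chunks are handed out to the devices in round-robin order. */
    m.dev_index = (uint32_t)(chunk % nr_devs);
    /* Full stripes already stored on this device, plus offset in chunk. */
    m.dev_offset = (chunk / nr_devs) * chunk_size + offset % chunk_size;
    return m;
}
```

Concatenation is simpler still under the same assumptions: walk the member devices in order and subtract each device's length until the offset falls inside one.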

pnfs/blocklayout: move all rpc_pipefs related code into a single file · 871760ce

Committed by Christoph Hellwig on Sep 10, 2014

Create a file to house all the rpc_pipefs boilerplate code instead of
sprinkling it over a few files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: refactor extent processing · ca0fe1df

Committed by Christoph Hellwig on Sep 10, 2014

Factor out a helper for all per-extent work, and merge the now trivial
functions for lseg allocation and parsing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: move extent processing to blocklayout.c · 9cc47541

Committed by Christoph Hellwig on Sep 10, 2014

This isn't device(id) related, so move it into the main file.  Simple move
for now, the next commit will clean it up a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: allocate separate pages for the layoutcommit payload · 34dc93c2

Committed by Christoph Hellwig on Sep 10, 2014

Instead of overflowing the XDR send buffer with our extent list, allocate
pages and pre-encode the layoutupdate payload into them.  We optimistically
allocate a single page using alloc_page and only switch to vmalloc when we
have more extents outstanding.  Currently there is only a single test case
(xfstests generic/113) which can reproduce large enough extent lists for
this to occur.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: remove GETDEVICELIST implementation · d4b18c3e

Committed by Christoph Hellwig on Sep 10, 2014

The current GETDEVICELIST implementation is buggy in that it doesn't handle
cursors correctly, and in that it returns an error if the server returns
NFSERR_NOTSUPP. Given that there is no actual need for GETDEVICELIST, that
it has various issues, and that it might get removed for NFSv4.2, stop using
it in the blocklayout driver, and thus in the Linux NFS client as a whole.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/objlayout: fix endianess annotation in objio_alloc_deviceid_node · fd41b474

Committed by Christoph Hellwig on Sep 10, 2014

The kbuild test robot complained about a new sparse warning in
objio_alloc_deviceid_node, but it turns out that this was just a moved
reference to an existing variable.  Fix it to have the right big-endian
annotated type.

Note that there are some other endianness issues in this file that I didn't
bother to sort out as they involve global headers.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: remove some debugging · 3e3f6b4e

Committed by Christoph Hellwig on Sep 10, 2014

The kbuild test robot complained that we got the printk format wrong.
Let's just kill these printks instead of fixing them as there is no
point after the initial tree algorithm debugging.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

11 Sep, 2014 25 commits

nfs: add __acquires and __releases annotations to seqfile start/stop routines · 8d11620e

Committed by Jeff Layton on Sep 10, 2014

To make sparse happy...

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

nfs: fix RCU cl_xprt handling in nfs_swap_activate/deactivate · dad2b015

Committed by Jeff Layton on Sep 10, 2014

sparse says:

fs/nfs/file.c:543:60: warning: incorrect type in argument 1 (different address spaces)
fs/nfs/file.c:543:60:    expected struct rpc_xprt *xprt
fs/nfs/file.c:543:60:    got struct rpc_xprt [noderef] <asn:4>*cl_xprt
fs/nfs/file.c:548:53: warning: incorrect type in argument 1 (different address spaces)
fs/nfs/file.c:548:53:    expected struct rpc_xprt *xprt
fs/nfs/file.c:548:53:    got struct rpc_xprt [noderef] <asn:4>*cl_xprt

cl_xprt is RCU-managed, so we need to take care to dereference and use
it while holding the RCU read lock.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

nfs: setattr can only change regular file sizes · 08a899d5

Committed by Christoph Hellwig on Sep 07, 2014

The VFS never calls setattr with ATTR_SIZE on anything but regular
files.  Remove the if check and turn it into an assert similar to
what some other file systems do.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: use the device id cache · 20d655d6

Committed by Christoph Hellwig on Sep 02, 2014

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: add a nfs4_get_deviceid helper · 30ff0603

Committed by Christoph Hellwig on Sep 02, 2014

This will be used by the block layout driver when splitting extents.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: add a common GETDEVICELIST implementation · 9dd2fcd3

Committed by Christoph Hellwig on Sep 02, 2014

Add a simple helper to issue a GETDEVICELIST operation and pre-load
the device id cache based on the result.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: factor GETDEVICEINFO implementations · 661373b1

Committed by Christoph Hellwig on Sep 02, 2014

Add support to the common pNFS core to issue GETDEVICEINFO calls on
a device ID cache miss. The code is taken from the well-debugged
file layout implementation and calls out to the layoutdriver through
a new alloc_deviceid_node method. The calling conventions for
nfs4_find_get_deviceid are changed so that all information needed to
send a GETDEVICEINFO request is passed to the common code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
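
The "look up in the cache, fetch on a miss, let the layout driver build the node" flow can be sketched in user space as below. Everything here is hypothetical and greatly simplified: the fixed-size array cache, the `sketch_*` names, and the callback signature are assumptions for illustration and do not reflect the real nfs4_find_get_deviceid or alloc_deviceid_node interfaces.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_SLOTS 8

struct sketch_dev {
    uint64_t id;
    int in_use;
};

struct sketch_cache {
    struct sketch_dev slots[SKETCH_CACHE_SLOTS];
};

/* Hypothetical stand-in for a layout driver's node allocation method. */
typedef int (*sketch_alloc_fn)(uint64_t id, struct sketch_dev *out);

/* Sample driver callback that only counts how often it is invoked. */
static int sketch_alloc_calls;
static int sketch_count_alloc(uint64_t id, struct sketch_dev *out)
{
    (void)id;
    (void)out;
    sketch_alloc_calls++;
    return 0;
}

static struct sketch_dev *
sketch_find_get_deviceid(struct sketch_cache *c, uint64_t id,
                         sketch_alloc_fn alloc)
{
    struct sketch_dev *free_slot = NULL;

    for (size_t i = 0; i < SKETCH_CACHE_SLOTS; i++) {
        if (c->slots[i].in_use && c->slots[i].id == id)
            return &c->slots[i];            /* cache hit */
        if (!c->slots[i].in_use && !free_slot)
            free_slot = &c->slots[i];
    }

    /* Cache miss: real code would send GETDEVICEINFO at this point. */
    if (!free_slot || alloc(id, free_slot) != 0)
        return NULL;
    free_slot->id = id;
    free_slot->in_use = 1;
    return free_slot;
}
```

The point of the design is that the generic code owns the cache and the miss handling, while each layout driver only supplies the callback that turns the server's reply into a device node.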

pnfs/blocklayout: return layouts on setattr · 848746bd

Committed by Christoph Hellwig on Sep 10, 2014

This speeds up truncate-heavy workloads like fsx by multiple orders of
magnitude.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: implement the return_range method · 71d5b763

Committed by Christoph Hellwig on Sep 10, 2014

This allows removing extents from the extent tree, especially on truncate
operations, and thus fixes reads from truncated and re-extended files that
previously returned stale data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: rewrite extent tracking · 8067253c

Committed by Christoph Hellwig on Sep 10, 2014

Currently the block layout driver tracks extents in three separate
data structures:

 - the two lists of pnfs_block_extent structures returned by the server
 - the list of sectors that were in invalid state but have been written to
 - a list of pnfs_block_short_extent structures for LAYOUTCOMMIT

All of these share the property that they are not only highly inefficient
data structures, but also that operations on them are even more inefficient
than necessary.

In addition there are various implementation defects like:

 - using an int to track sectors, causing corruption for large offsets
 - incorrect normalization of page or block granularity ranges
 - insufficient error handling
 - incorrect synchronization as extents can be modified while they are in
   use

This patch replaces all three data structures with a single unified rbtree
structure tracking all extents, as well as their in-memory state, although we
still need two instances for read-only and read-write extents due to the
arcane client side COW feature in the block layouts spec.

To fix the problem of extents possibly being modified while in use we make
sure to return a copy of the extent for use in the write path - the
extent can only be invalidated by a layout recall or return which has
to wait until the I/O operations finished due to refcounts on the layout
segment.

The new extent tree works similarly to the schemes used by block based
filesystems like XFS or ext4.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
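
Two of the points above, 64-bit sector tracking and an ordered structure for lookups, can be illustrated with a small sketch. Note the substitution: this example uses binary search over a sorted array rather than an rbtree, purely to stay short and self-contained. `sketch_extent` and `sketch_extent_lookup()` are hypothetical names, not the driver's types.

```c
#include <stddef.h>
#include <stdint.h>

struct sketch_extent {
    uint64_t start;  /* first sector; 64-bit so large offsets do not wrap */
    uint64_t len;    /* length in sectors */
};

/*
 * Find the extent containing 'sector', or NULL if none does.
 * 'exts' must be sorted by start and must not overlap.
 */
static const struct sketch_extent *
sketch_extent_lookup(const struct sketch_extent *exts, size_t n,
                     uint64_t sector)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (sector < exts[mid].start)
            hi = mid;                       /* look in the lower half */
        else if (sector - exts[mid].start >= exts[mid].len)
            lo = mid + 1;                   /* past this extent */
        else
            return &exts[mid];              /* sector is inside */
    }
    return NULL;
}
```

A sector number such as `(1ULL << 33) + 7` would not fit in a 32-bit int, which is the kind of large offset the commit message says caused corruption before.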

pnfs/blocklayout: don't set pages uptodate · 8c792ea9

Committed by Christoph Hellwig on Sep 10, 2014

The core nfs code handles setting pages uptodate on reads, no need to mess
with the pageflags ourselves.  Also remove a debug function to dump page
flags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: remove read-modify-write handling in bl_write_pagelist · 3a6fd1f0

Committed by Christoph Hellwig on Sep 10, 2014

Use the new PNFS_READ_WHOLE_PAGE flag to offload read-modify-write
handling to core nfs code, and remove a huge chunk of deadlock-prone
mess from the block layout writeback path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: add return_range method · c88953d8

Committed by Christoph Hellwig on Sep 10, 2014

If a layout driver keeps per-inode state outside of the layout segments it
needs to be notified of any layout returns or recalls on an inode, and not
just about the freeing of layout segments. Add a method to accomplish this,
which will allow the block layout driver to handle the case of truncated
and re-expanded files properly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: add flag to force read-modify-write in ->write_begin · 612aa983

Committed by Christoph Hellwig on Sep 10, 2014

Like all block based filesystems, the pNFS block layout driver can't read
or write at a byte granularity and thus has to perform read-modify-write
cycles on writes smaller than this granularity.

Add a flag so that the core NFS code always reads a whole page when
starting a smaller write, so that we can do it in the place where the VFS
expects it instead of doing it in a very deadlock-prone way in the writeback
handler.

Note that in theory we could do less than page size reads here for disks
that have a smaller sector size which are served by a server with a smaller
pnfs block size.  But so far that doesn't seem like a worthwhile
optimization.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
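
The decision this flag forces can be sketched as a predicate that answers "must the page be read before this write starts?". This is a simplified illustration, not the actual ->write_begin logic: `SKETCH_PAGE_SIZE` and `sketch_needs_page_read()` are hypothetical, and a 4096-byte page is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for illustration */

/*
 * A write needs the page contents read in first unless the page is
 * already up to date or the write overwrites the page completely.
 */
static bool sketch_needs_page_read(bool page_uptodate,
                                   size_t offset_in_page, size_t len)
{
    if (page_uptodate)
        return false;
    return !(offset_in_page == 0 && len >= SKETCH_PAGE_SIZE);
}
```

Under these assumptions, a 100-byte write into a page that is not cached triggers a read of the whole page first, so the later block-sized writeback never has to read data itself.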

pnfs: force a layout commit when encountering busy segments during recall · 7c5d1875

Committed by Christoph Hellwig on Sep 10, 2014

Expedite layout recall processing by forcing a layout commit when
we see busy segments.  Without it the layout recall might have to wait
until the VM decides to start writeback for the file, which can introduce
long delays.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

NFS: Fix a compile warning when !(CONFIG_NFS_V3 || CONFIG_NFS_V4) · 3a3908c8

Committed by Trond Myklebust on Sep 08, 2014

gcc reports:

linux/fs/nfs/write.c: In function ‘nfs_page_find_head_request_locked.isra.17’:
linux/fs/nfs/write.c:121:64: warning: ‘cinfo.mds’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  list_for_each_entry_safe(freq, t, &cinfo.mds->list, wb_list) {
                                                                  ^
linux/fs/nfs/write.c:110:25: note: ‘cinfo.mds’ was declared here
  struct nfs_commit_info cinfo;

Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Cc: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: correctly decrement extent length · 921b81a8

Committed by Christoph Hellwig on Aug 21, 2014

When we do non-page sized reads we can underflow the extent_length variable
and read incorrect data. Fix the extent_length calculation and change to
defensive <= checks for the extent length in the read and write path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
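
The underflow and the defensive `<=` check can be shown with a tiny sketch. It is not the driver's read or write loop: `sketch_consume_extent()` is a hypothetical helper, and both arguments are assumed to be in the same unit.

```c
#include <stdint.h>

/*
 * Subtract the amount just processed from the remaining extent length.
 * An unsigned "extent_len - done" would wrap to a huge value whenever
 * done > extent_len, so clamp to zero with a defensive <= check.
 */
static uint64_t sketch_consume_extent(uint64_t extent_len, uint64_t done)
{
    if (extent_len <= done)
        return 0;           /* extent exhausted, look up the next one */
    return extent_len - done;
}
```

A caller that tests for "remaining length is zero" then reliably moves on to the next extent instead of continuing with a wrapped length.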

pnfs/blocklayout: plug block queues · be98fd0a

Committed by Christoph Hellwig on Aug 21, 2014

Make sure the block queue is plugged when performing pNFS blocklayout I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: improve GETDEVICEINFO error reporting · 72c5e59f

Committed by Christoph Hellwig on Aug 21, 2014

Tell userspace what stage of GETDEVICEINFO failed so that there is a chance
to debug it, especially with the userspace daemon clusterf***k in the block
layout driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs/blocklayout: reject pnfs blocksize larger than page size · e3aaf7f2

Committed by Christoph Hellwig on Aug 21, 2014

The Linux VM subsystem can't support block sizes larger than page size
for block based filesystems very well. While this can be hacked around
to some extent for simple filesystems, the read-modify-write cycles
required for pnfs block invalid extents are extremely deadlock-prone
when operating on multiple pages. Reject this case early on instead
of pretending to support it (badly).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: allow splicing pre-encoded pages into the layoutcommit args · 5f919c9f

Committed by Christoph Hellwig on Aug 21, 2014

Currently there is no XDR buffer space allocated for the per-layout driver
layoutcommit payload, which leads to server buffer overflows in the
blocklayout driver even under simple workloads. As we can't do per-layout
sizes for XDR operations, we'll have to splice a previously encoded list
of pages into the XDR stream, similar to how we handle ACL buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: avoid using stale stateids after layoutreturn · 47abadef

Committed by Christoph Hellwig on Aug 21, 2014

After we issue a layoutreturn operation, the server may free the layout
stateid, which will cause a bad stateid error when the client uses it again.

We currently try to avoid this case by choosing the open stateid if no
lsegs are present for this inode. But various places can hold references
on lsegs and thus cause the list not to be empty shortly after a layout
return. Add an explicit flag to mark the current layout stateid invalid
and force usage of the open stateid after we did a full file layoutreturn.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
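
The selection rule described above, including the explicit "invalid" flag, can be sketched as follows. The struct, enum, and `sketch_pick_stateid()` are hypothetical simplifications for illustration, not the kernel's pnfs layout header or its flag names.

```c
#include <stdbool.h>

struct sketch_layout {
    bool has_lsegs;        /* any layout segments still on the list? */
    bool stateid_invalid;  /* set after a full-file layoutreturn */
};

enum sketch_stateid_src {
    SKETCH_USE_OPEN_STATEID,
    SKETCH_USE_LAYOUT_STATEID,
};

/* Decide which stateid the next LAYOUTGET should carry. */
static enum sketch_stateid_src
sketch_pick_stateid(const struct sketch_layout *lo)
{
    /* The flag wins even if stray references keep the list non-empty. */
    if (lo->stateid_invalid || !lo->has_lsegs)
        return SKETCH_USE_OPEN_STATEID;
    return SKETCH_USE_LAYOUT_STATEID;
}
```

The case the flag adds is the first assertion one would write for it: segments are still referenced, yet the layout stateid must not be reused.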

pnfs: retry after a bad stateid error from layoutget · defb8460

Committed by Christoph Hellwig on Aug 21, 2014

Currently we fall through to nfs4_async_handle_error when we get
a bad stateid error back from layoutget. nfs4_async_handle_error
with a NULL state argument will never retry the operation but will return
the error to the higher layer, causing an avoidable fallback to MDS I/O.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

pnfs: don't check sequence on new stateids in layoutget · 362f7474

Committed by Christoph Hellwig on Aug 21, 2014

When layoutget returns an entirely new layout stateid it should not
check the generation counter as the new stateid will start with a new
counter entirely unrelated to the old one.

The current behavior causes constant layoutget failures against a block
server which allocates a new stateid after a recall that removed all
outstanding layouts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
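
The distinction between "same stateid, newer generation" and "entirely new stateid" can be sketched like this. The layout of `sketch_stateid` (a 32-bit sequence counter plus 12 opaque bytes) follows the NFSv4 stateid shape, but the struct, the host-order counter, and `sketch_accept_stateid()` are assumptions of this example rather than the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct sketch_stateid {
    uint32_t seqid;           /* generation counter, host byte order here */
    unsigned char other[12];  /* opaque part identifying the state */
};

/* Return true if 'incoming' should replace the currently stored stateid. */
static bool sketch_accept_stateid(const struct sketch_stateid *cur,
                                  const struct sketch_stateid *incoming)
{
    /* Different opaque part: a brand-new stateid with an unrelated
     * counter, so comparing generations would be meaningless. */
    if (memcmp(cur->other, incoming->other, sizeof(cur->other)) != 0)
        return true;

    /* Same state: accept only a newer generation (wraparound-safe). */
    return (int32_t)(incoming->seqid - cur->seqid) > 0;
}
```

Without the first check, a new stateid that restarts its counter at 1 would be rejected as "older" than a stored generation of 5, which matches the constant layoutget failures the commit describes.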

pnfs: do not pass uninitialized lsegs to ->free_lseg · 1013df61

Committed by Christoph Hellwig on Aug 21, 2014

Ensure the lsegs are initialized early so that we don't pass an uninitialized
one back to ->free_lseg during error processing.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

openeuler / Kernel synced successfully more than 1 year ago
