Commits · 31ef83dc053835fc14741426e20c60dbbba8c13d · openanolis / cloud-kernel

03 Feb, 2015 13 commits

Authored by Christoph Hellwig on Aug 16, 2014

For now just a few simple events to trace the layout stateid lifetime, but
these already were enough to find several bugs in the Linux client layout
stateid handling.
Signed-off-by: Christoph Hellwig <hch@lst.de>

31ef83dc

nfsd: update documentation for pNFS support · 18d1aef8

Authored by Christoph Hellwig on Sep 25, 2014

Signed-off-by: Christoph Hellwig <hch@lst.de>

18d1aef8

nfsd: implement pNFS layout recalls · c5c707f9

Authored by Christoph Hellwig on Sep 23, 2014

Add support to issue layout recalls to clients. For now we only support
full-file recalls to get a simple and stable implementation. This allows
us to embed a nfsd4_callback structure in the layout_state and thus avoid
any memory allocations under spinlocks during a recall. For normal
use cases that do not intend to share a single file between multiple
clients this implementation is fully sufficient.

To ensure layouts are recalled on local filesystem access each layout
state registers a new FL_LAYOUT lease with the kernel file locking code,
which filesystems that support pNFS exports that require recalls need
to break on conflicting access patterns.

The XDR code is based on the old pNFS server implementation by
Andy Adamson, Benny Halevy, Boaz Harrosh, Dean Hildebrand, Fred Isaman,
Marc Eshel, Mike Sager and Ricardo Labiaga.
Signed-off-by: Christoph Hellwig <hch@lst.de>

c5c707f9

nfsd: implement pNFS operations · 9cf514cc

Authored by Christoph Hellwig on May 05, 2014

Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
outstanding layouts and devices.

Layout management is very straightforward, with a nfs4_layout_stateid
structure that extends nfs4_stid to manage layout stateids as the
top-level structure. It is linked into the nfs4_file and nfs4_client
structures like the other stateids, and contains a linked list of
layouts that hang off the stateid. The actual layout operations are
implemented in layout drivers that are not part of this commit, but
will be added later.

The worst part of this commit is the management of the pNFS device IDs,
which suffers from a specification that is not sanely implementable due
to the fact that the device-IDs are global and not bound to an export,
and have a small enough size so that we can't store the fsid portion of
a file handle, and must never be reused. As we still need to perform all
export authentication and validation checks on a device ID passed to
GETDEVICEINFO we are caught between a rock and a hard place. To work
around this issue we add a new hash that maps from a 64-bit integer to a
fsid so that we can look up the export to authenticate against it,
a 32-bit integer as a generation that we can bump when changing the device,
and a currently unused 32-bit integer that could be used in the future
to handle more than a single device per export. Entries in this hash
table are never deleted as we can't reuse the ids anyway, and would have
a severe lifetime problem anyway as Linux export structures are temporary
structures that can go away under load.

Parts of the XDR data, structures and marshaling/unmarshaling code, as
well as many concepts are derived from the old pNFS server implementation
from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
Mike Sager, Ricardo Labiaga and many others.
Signed-off-by: Christoph Hellwig <hch@lst.de>

9cf514cc
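
The device-ID workaround described in this commit can be pictured with a small userspace model. This is only a sketch and not the nfsd code: the names `DeviceIdMap`, `alloc`, `lookup`, `pack_device_id` and `unpack_device_id` are hypothetical, and the field order inside the 16-byte ID (64-bit map index, 32-bit generation, 32-bit reserved value) is an assumption made for illustration.

```python
import struct


class DeviceIdMap:
    """Toy model: map a never-reused 64-bit index to an export's fsid."""

    def __init__(self):
        self._by_index = {}  # index -> fsid; entries are never deleted
        self._by_fsid = {}   # fsid -> index, so each fsid gets one index
        self._next = 1

    def alloc(self, fsid: bytes) -> int:
        """Return the index for fsid, allocating a fresh one on first use."""
        if fsid not in self._by_fsid:
            self._by_fsid[fsid] = self._next
            self._by_index[self._next] = fsid
            self._next += 1
        return self._by_fsid[fsid]

    def lookup(self, index: int):
        """Return the fsid for index, or None if it was never handed out."""
        return self._by_index.get(index)


def pack_device_id(index: int, generation: int, reserved: int = 0) -> bytes:
    """Pack the three fields into a 16-byte opaque device ID (big-endian)."""
    return struct.pack(">QII", index, generation, reserved)


def unpack_device_id(raw: bytes):
    """Split a 16-byte device ID back into (index, generation, reserved)."""
    return struct.unpack(">QII", raw)
```

In this model a GETDEVICEINFO-style lookup would unpack the ID, resolve the index to an fsid, and only then run the export checks against that fsid. Because indexes are never reused, the map only grows, which matches the "never deleted" behaviour the commit message describes.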

nfsd: make find_any_file available outside nfs4state.c · 4d227fca

Authored by Christoph Hellwig on Aug 17, 2014

Signed-off-by: Christoph Hellwig <hch@lst.de>

4d227fca

nfsd: make find/get/put file available outside nfs4state.c · e6ba76e1

Authored by Christoph Hellwig on Aug 14, 2014

Signed-off-by: Christoph Hellwig <hch@lst.de>

e6ba76e1

nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c · cd61c522

Authored by Christoph Hellwig on Aug 14, 2014

Signed-off-by: Christoph Hellwig <hch@lst.de>

cd61c522

nfsd: add fh_fsid_match helper · 9558f250

Authored by Christoph Hellwig on Aug 13, 2014

Add a helper to check that the fsid parts of two file handles match.
Signed-off-by: Christoph Hellwig <hch@lst.de>

9558f250
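
As a rough illustration of what such a helper checks, here is a toy model. The `FileHandle` layout below is hypothetical and much simpler than a real NFS file handle; only the idea of comparing the filesystem-identifying part while ignoring the per-file part comes from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    fsid_type: int  # how the fsid bytes are to be interpreted (assumed field)
    fsid: bytes     # identifies the exported filesystem
    fileid: bytes   # identifies the object within that filesystem


def fh_fsid_match(a: FileHandle, b: FileHandle) -> bool:
    """True when both handles refer to the same exported filesystem."""
    return a.fsid_type == b.fsid_type and a.fsid == b.fsid
```

Two handles for different files on the same export match; handles whose fsid bytes are equal but whose fsid type differs do not.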

nfsd: move nfsd_fh_match to nfsfh.h · 4d94c2ef

Authored by Christoph Hellwig on Aug 14, 2014

The pnfs code will need it too.  Also remove the nfsd_ prefix to match the
other filehandle helpers in that file.
Signed-off-by: Christoph Hellwig <hch@lst.de>

4d94c2ef

fs: add FL_LAYOUT lease type · 11afe9f7

Authored by Christoph Hellwig on Jan 21, 2015

This (ab-)uses the file locking code to allow filesystems to recall
outstanding pNFS layouts on a file. This new lease type is similar but
not quite the same as FL_DELEG. A FL_LAYOUT lease can always be granted,
and a per-filesystem lock (XFS iolock for the initial implementation)
ensures no FL_LAYOUT leases are granted when we would need to recall them.

Also included are changes that allow multiple outstanding read
leases of different types on the same file as long as they have a
different owner. This wasn't a problem until now as nfsd never set
FL_LEASE leases, and no one else used FL_DELEG leases, but given that
nfsd will also issue FL_LAYOUT leases we will have to handle it now.
Signed-off-by: Christoph Hellwig <hch@lst.de>

11afe9f7

fs: track fl_owner for leases · 2ab99ee1

Authored by Christoph Hellwig on Jan 21, 2015

Just like for other lock types we should allow different owners to have
a read lease on a file.  Currently this can't happen, but with the addition
of pNFS layout leases we'll need this feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>

2ab99ee1
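
The owner rule from the two lease commits above can be modelled in a few lines. This is a deliberately simplified sketch, not the logic in fs/locks.c: the `Lease` class and `may_coexist` function are hypothetical, and what happens when the same owner asks again (for example, updating its existing lease) is left out of the model.

```python
from dataclasses import dataclass

# Lease kinds named in the commit messages, used here as plain labels.
FL_LEASE, FL_DELEG, FL_LAYOUT = "lease", "deleg", "layout"


@dataclass(frozen=True)
class Lease:
    owner: str
    kind: str
    is_read: bool


def may_coexist(held: Lease, new: Lease) -> bool:
    """Toy rule: two read leases may share a file if their owners differ."""
    return held.is_read and new.is_read and held.owner != new.owner
```

Under this rule a read FL_DELEG lease held by one owner and a read FL_LAYOUT lease requested by another can both be outstanding, while a write lease never shares the file.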

nfs: add LAYOUT_TYPE_MAX enum value · 6cae0a46

Authored by Christoph Hellwig on Aug 16, 2014

This gives us a nice upper bound for later use in nfsd.
Signed-off-by: Christoph Hellwig <hch@lst.de>

6cae0a46

Merge branch 'locks-3.20' of git://git.samba.org/jlayton/linux into for-3.20 · a584143b

Authored by J. Bruce Fields on Feb 02, 2015

Christoph's block pnfs patches have some minor dependencies on these
lock patches.

a584143b

23 Jan, 2015 3 commits

nfsd: factor out a helper to decode nfstime4 values · 4c94e13e

Authored by Christoph Hellwig on Jan 22, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

4c94e13e

sunrpc/lockd: fix references to the BKL · 3c519914

Authored by Jeff Layton on Jan 22, 2015

The BKL is completely out of the picture in the lockd and sunrpc code
these days. Update the antiquated comments that refer to it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

3c519914

nfsd: fix year-2038 nfs4 state problem · bbc7f33a

Authored by J. Bruce Fields on Jan 20, 2015

Someone with a weird time_t happened to notice this; it shouldn't really
manifest till 2038.  It may not be our only year-2038 problem.
Reported-by: Aaron Pace <Aaron.Pace@alcatel-lucent.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

bbc7f33a
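
The commit message does not say which field was affected, so the following is only a generic illustration of the year-2038 limit rather than a reproduction of this bug. It shows where a signed 32-bit count of seconds since 1970 runs out and what the wrapped value looks like; `as_int32` and `to_utc` are helper names made up for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def as_int32(seconds: int) -> int:
    """Interpret the low 32 bits of seconds as a signed 32-bit value."""
    seconds &= 0xFFFFFFFF
    return seconds - (1 << 32) if seconds & (1 << 31) else seconds


def to_utc(seconds: int) -> datetime:
    """Convert seconds since the epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


last_ok = 2**31 - 1
print(to_utc(last_ok))                # 2038-01-19 03:14:07+00:00
print(to_utc(as_int32(last_ok + 1)))  # 1901-12-13 20:45:52+00:00
```

One second past the limit, the truncated value is interpreted as a date in 1901, which is why comparisons on such timestamps start to misbehave.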

22 Jan, 2015 1 commit

locks: update comments that refer to inode->i_flock · 8116bf4c

Authored by Jeff Layton on Jan 21, 2015

Signed-off-by: Jeff Layton <jlayton@primarydata.com>

8116bf4c

17 Jan, 2015 12 commits

locks: consolidate NULL i_flctx checks in locks_remove_file · 3d8e560d

Authored by Jeff Layton on Jan 16, 2015

We have each of the locks_remove_* variants doing this individually.
Have the caller do it instead, and have locks_remove_flock and
locks_remove_lease just assume that it's a valid pointer.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

3d8e560d

locks: keep a count of locks on the flctx lists · 9bd0f45b

Authored by Jeff Layton on Jan 16, 2015

This makes things a bit more efficient in the cifs and ceph lock
pushing code.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

9bd0f45b

locks: clean up the lm_change prototype · 7448cc37

Authored by Jeff Layton on Jan 16, 2015

Now that we use standard list_heads for tracking leases, we can have
lm_change take a pointer to the lease to be modified instead of a
double pointer.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

7448cc37

locks: add a dedicated spinlock to protect i_flctx lists · 6109c850

Authored by Jeff Layton on Jan 16, 2015

We can now add a dedicated spinlock without expanding struct inode.
Change to using that to protect the various i_flctx lists.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

6109c850

locks: remove i_flock field from struct inode · a7231a97

Authored by Jeff Layton on Jan 16, 2015

Nothing uses it anymore. Also add a forward declaration for struct
file_lock to silence some compiler warnings that the removal triggers.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

a7231a97

locks: convert lease handling to file_lock_context · 8634b51f

Authored by Jeff Layton on Jan 16, 2015

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

8634b51f

locks: convert posix locks to file_lock_context · bd61e0a9

Authored by Jeff Layton on Jan 16, 2015

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

bd61e0a9

locks: move flock locks to file_lock_context · 5263e31e

Authored by Jeff Layton on Jan 16, 2015

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

5263e31e

ceph: move spinlocking into ceph_encode_locks_to_buffer and ceph_count_locks · c362781c

Authored by Jeff Layton on Jan 16, 2015

There is only a single call site for each of these functions, and the
caller takes the i_lock prior to calling them and drops it just
afterward. Move the spinlocking into the functions instead.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

c362781c

locks: add a new struct file_locking_context pointer to struct inode · 4a075e39

Authored by Jeff Layton on Jan 16, 2015

The current scheme of using the i_flock list is really difficult to
manage. There is also a legitimate desire for a per-inode spinlock to
manage these lists that isn't the i_lock.

Start conversion to a new scheme to eventually replace the old i_flock
list with a new "file_lock_context" object.

We start by adding a new i_flctx to struct inode. For now, it lives in
parallel with i_flock list, but will eventually replace it. The idea is
to allocate a structure to sit in that pointer and act as a locus for
all things file locking.

We allocate a file_lock_context for an inode when the first lock is
added to it, and it's only freed when the inode is freed. We use the
i_lock to protect the assignment, but afterward it should mostly be
accessed locklessly.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

4a075e39
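
The allocation scheme in the last paragraph of this commit message (allocate on first lock, publish under i_lock, read without it afterward) can be sketched as a userspace model. The classes and the `get_lock_context` helper below are hypothetical stand-ins, and the three list names inside the context are assumptions; this is not the kernel implementation.

```python
import threading


class FileLockContext:
    """Stand-in for the per-inode object that holds all lock lists."""

    def __init__(self):
        self.flock = []  # assumed list names, one per lock family
        self.posix = []
        self.lease = []


class Inode:
    def __init__(self):
        self.i_lock = threading.Lock()
        self.i_flctx = None  # stays None until the first lock is added


def get_lock_context(inode: Inode) -> FileLockContext:
    """Return the inode's context, creating it on first use."""
    ctx = inode.i_flctx
    if ctx is not None:
        return ctx  # common case: no lock taken
    new_ctx = FileLockContext()  # allocate outside the lock
    with inode.i_lock:
        # Only the assignment is protected; a racing caller may have won.
        if inode.i_flctx is None:
            inode.i_flctx = new_ctx
        return inode.i_flctx
```

The context is never reset here, mirroring the statement that it is freed only together with the inode.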

locks: have locks_release_file use flock_lock_file to release generic flock locks · dd459bb1

Authored by Jeff Layton on Jan 16, 2015

...instead of open-coding it and removing flock locks directly. This
helps consolidate the flock lock removal logic into a single spot.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>

dd459bb1

locks: add new struct list_head to struct file_lock · 6dee60f6

Authored by Jeff Layton on Jan 16, 2015

...that we can use to queue file_locks to per-ctx list_heads. Go ahead
and convert locks_delete_lock and locks_dispose_list to use it instead
of the fl_block list.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Acked-by: Christoph Hellwig <hch@lst.de>

6dee60f6

16 Jan, 2015 11 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · cb596708

Authored by Linus Torvalds on Jan 16, 2015

Pull fuse fixes from Miklos Szeredi:
 "This fixes a regression in the latest fuse update plus a fix for a
  rather theoretical memory ordering issue"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add memory barrier to INIT
  fuse: fix LOOKUP vs INIT compat handling

cb596708

Merge tag 'fbdev-fixes-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux · 0b6212e0

Authored by Linus Torvalds on Jan 16, 2015

Pull fbdev fixes from Tomi Valkeinen:
 - broadsheetfb: fix memory leak
 - simplefb: fix build failure on sparc

* tag 'fbdev-fixes-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux:
  fbdev/broadsheetfb: fix memory leak
  simplefb: Fix build failure on Sparc

0b6212e0

Merge tag 'mmc-v3.19-4' of git://git.linaro.org/people/ulf.hansson/mmc · 7b552bc1

Authored by Linus Torvalds on Jan 16, 2015

Pull MMC bugfix from Ulf Hansson:
 "Fix sdhci regulator regression for Qualcomm and Nvidia boards"

* tag 'mmc-v3.19-4' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: sdhci: Set SDHCI_POWER_ON with external vmmc

7b552bc1

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · f8cb3954

Authored by Linus Torvalds on Jan 16, 2015

Pull m68k fixlet from Geert Uytterhoeven.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Wire up execveat

f8cb3954

Merge tag 'powerpc-3.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux · 3fa116e8

Authored by Linus Torvalds on Jan 16, 2015

Pull powerpc fixes from Michael Ellerman:
 "A few powerpc fixes"

* tag 'powerpc-3.19-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc: Work around gcc bug in current_thread_info()
  cxl: Fix issues when unmapping contexts
  powernv: Fix OPAL tracepoint code

3fa116e8

svcrdma: Handle additional inline content · a97c331f

Authored by Chuck Lever on Jan 13, 2015

Most NFS RPCs place their large payload argument at the end of the
RPC header (eg, NFSv3 WRITE). For NFSv3 WRITE and SYMLINK, RPC/RDMA
sends the complete RPC header inline, and the payload argument in
the read list. Data in the read list is the last part of the XDR
stream.

One important case is not like this, however. NFSv4 COMPOUND is a
counted array of operations. A WRITE operation, with its large data
payload, can appear in the middle of the compound's operations
array. Thus NFSv4 WRITE compounds can have header content after the
WRITE payload.

The Linux client, for example, performs an NFSv4 WRITE like this:

  { PUTFH, WRITE, GETATTR }

Though RFC 5667 is not precise about this, the proper way to convey
this compound is to place the GETATTR inline, _after_ the front of
the RPC header. The receiver inserts the read list payload into the
XDR stream after the initial WRITE arguments, and before the GETATTR
operation, thanks to the value of the read list "position" field.

The Linux client currently sends the GETATTR at the end of the
RPC/RDMA read list, which is incorrect. It will be corrected in the
future.

The Linux server currently rejects NFSv4 compounds with inline
content after the read list. For the above NFSv4 WRITE compound, the
NFS compound header indicates there are three operations, but the
server finds nonsense when it looks in the XDR stream for the third
operation, and the compound fails with OP_ILLEGAL.

Move trailing inline content to the end of the XDR buffer's page
list. This presents incoming NFSv4 WRITE compounds to NFSD in the
same way the socket transport does.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

a97c331f
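
The reassembly described in this commit message, where the read list payload is placed into the XDR stream at the offset given by its "position" field, can be sketched as follows. This is a toy model over byte strings with a hypothetical `reassemble` function; the placeholder bytes are not real XDR encodings, and the server's actual page-list handling is not modelled.

```python
def reassemble(inline: bytes, position: int, payload: bytes) -> bytes:
    """Insert the read list payload into the inline XDR stream at position."""
    if position < 0 or position > len(inline):
        raise ValueError("position outside inline content")
    return inline[:position] + payload + inline[position:]


# Stand-ins for the pieces of a { PUTFH, WRITE, GETATTR } compound.
head = b"PUTFH+WRITE-args"  # everything up to the WRITE data
tail = b"GETATTR!"          # trailing inline content after the WRITE
data = b"D" * 8             # WRITE payload carried in the read list

stream = reassemble(head + tail, len(head), data)
```

The result is `head + data + tail`, so the third operation sits where a decoder walking the compound expects it. If the trailing bytes were ignored, the decoder would find nothing meaningful after the WRITE, which is the failure the commit message describes.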

svcrdma: Move read list XDR round-up logic · fcbeced5

Authored by Chuck Lever on Jan 13, 2015

This is a pre-requisite for a subsequent patch.

Read list XDR round-up needs to be done _before_ additional inline
content is copied to the end of the XDR buffer's page list. Move
the logic added by commit e560e3b5 ("svcrdma: Add zero padding
if the client doesn't send it").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

fcbeced5
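
XDR encodes data in 4-byte units, so the round-up this commit refers to amounts to padding the read list payload to the next multiple of four before anything is appended after it. A minimal sketch, with hypothetical helper names:

```python
def xdr_round_up(length: int) -> int:
    """Round length up to the next multiple of 4, the XDR alignment unit."""
    return (length + 3) & ~3


def pad_payload(payload: bytes) -> bytes:
    """Append zero bytes so the payload ends on a 4-byte boundary."""
    return payload + b"\x00" * (xdr_round_up(len(payload)) - len(payload))
```

The ordering matters: in a model like `pad_payload(data) + tail`, the trailing content starts on an aligned offset, whereas appending first and padding afterward would leave it misaligned whenever the payload length is not a multiple of four.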

svcrdma: Support RDMA_NOMSG requests · 0b056c22

Authored by Chuck Lever on Jan 13, 2015

Currently the Linux server cannot decode RDMA_NOMSG type requests.
Operations whose length exceeds the fixed size of RDMA SEND buffers,
like large NFSv4 CREATE(NF4LNK) operations, must be conveyed via
RDMA_NOMSG.

For an RDMA_MSG type request, the client sends the RPC/RDMA, RPC
headers, and some or all of the NFS arguments via RDMA SEND.

For an RDMA_NOMSG type request, the client sends just the RPC/RDMA
header via RDMA SEND. The request's read list contains elements for
the entire RPC message, including the RPC header.

NFSD expects the RPC/RDMA header and RPC header to be contiguous in
page zero of the XDR buffer. Add logic in the RDMA READ path to make
the read list contents land where the server prefers, when the
incoming message is a type RDMA_NOMSG message.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

0b056c22

svcrdma: rc_position sanity checking · 61edbcb7

Authored by Chuck Lever on Jan 13, 2015

An RPC/RDMA client may send large RPC arguments via a read
list. This is a list of scatter/gather elements which convey
RPC call arguments too large to fit in a small RDMA SEND.

Each entry in the read list has a "position" field, whose value is
the byte offset in the XDR stream where the data in that entry is to
be inserted. Entries which share the same "position" value make up
the same RPC argument. The receiver inserts entries with the same
position field value in list order into the XDR stream.

Currently the Linux NFS/RDMA server cannot handle receiving read
chunks in more than one position, mostly because no current client
sends read lists with elements in more than one position. As a
sanity check, ensure that all received chunks have the same
"rc_position."
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

61edbcb7
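
The sanity check described in this commit can be expressed as a one-line predicate over the read list. The `ReadChunk` class and function name below are hypothetical, and a real read chunk carries more fields than shown; the sketch only captures the rule that every chunk must share one position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadChunk:
    position: int  # byte offset in the XDR stream where the data belongs
    length: int    # number of bytes this chunk conveys


def positions_are_uniform(chunks) -> bool:
    """True when every chunk shares the first chunk's position."""
    return all(c.position == chunks[0].position for c in chunks)
```

A request whose read list mixes positions would fail this check and, in the spirit of the commit, would be rejected rather than decoded incorrectly.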

svcrdma: Plant reader function in struct svcxprt_rdma · e5452411

Authored by Chuck Lever on Jan 13, 2015

The RDMA reader function doesn't change once an svcxprt_rdma is
instantiated. Instead of checking sc_devcap during every incoming
RPC, set the reader function once when the connection is accepted.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e5452411
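
The pattern in this commit, choosing a strategy once when the connection is set up instead of re-testing a capability on every request, looks roughly like this in a userspace model. The two reader functions, the `has_fast_reg` flag and the `Transport` class are hypothetical placeholders, not the names used in svcrdma.

```python
def read_with_fast_reg(chunk):
    """Placeholder for a reader that relies on a device capability."""
    return ("fast_reg", chunk)


def read_with_local_key(chunk):
    """Placeholder for the fallback reader."""
    return ("local", chunk)


class Transport:
    def __init__(self, has_fast_reg: bool):
        # Decided once, when the connection is accepted.
        self.reader = read_with_fast_reg if has_fast_reg else read_with_local_key

    def handle_request(self, chunk):
        # No capability check on the per-request path.
        return self.reader(chunk)
```

Since the capability cannot change for the lifetime of the transport, storing the function reference removes a branch from the hot path without changing behaviour.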

svcrdma: Find rmsgp more reliably · e5523bd2

Authored by Chuck Lever on Jan 13, 2015

xdr_start() can return the wrong rmsgp address if an assumption
about how the xdr_buf was constructed changes.  When it gets it
wrong, the client receives a reply that has gibberish in the
RPC/RDMA header, preventing it from matching a waiting RPC request.

Instead, make (and document) just one assumption: that the RDMA
header for the client's RPC call is at the start of the first page
in rq_pages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

e5523bd2

openanolis / cloud-kernel synced successfully about 1 year ago
