Commits · 5d50ffd7c31dab47c6b828841ca1ec70a1b40169 · openanolis / cloud-kernel

31 Mar, 2014 · 13 commits

locks: add new fcntl cmd values for handling file private locks · 5d50ffd7

Authored by Jeff Layton on Feb 03, 2014

Due to some unfortunate history, POSIX locks have very strange and
unhelpful semantics. The thing that usually catches people by surprise
is that they are dropped whenever the process closes any file descriptor
associated with the inode.

This is extremely problematic for people developing file servers that
need to implement byte-range locks. Developers often need a "lock
management" facility to ensure that file descriptors are not closed
until all of the locks associated with the inode are finished.

Additionally, "classic" POSIX locks are owned by the process. Locks
taken between threads within the same process won't conflict with one
another, which renders them useless for synchronization between threads.

This patchset adds a new type of lock that attempts to address these
issues. These locks conflict with classic POSIX read/write locks, but
have semantics that are more like BSD locks with respect to inheritance
and behavior on close.

This is implemented primarily by changing how the fl_owner field is set for
these locks. Instead of having them owned by the files_struct of the
process, they are instead owned by the filp on which they were acquired.
Thus, they are inherited across fork() and are only released when the
last reference to a filp is put.

These new semantics prevent them from being merged with classic POSIX
locks, even if they are acquired by the same process. These locks will
also conflict with classic POSIX locks even if they are acquired by
the same process or on the same file descriptor.

The new locks are managed using a new set of cmd values to the fcntl()
syscall. The initial implementation of this converts these values to
"classic" cmd values at a fairly high level, and the details are not
exposed to the underlying filesystem. We may eventually want to push
this handling out to the lower filesystem code but for now I don't
see any need for it.

Also, note that with this implementation the new cmd values are only
available via fcntl64() on 32-bit arches. There's little need to
add support for legacy apps on a new interface like this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>

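The close-time behavior of classic POSIX locks described in this message can be observed from user space. The following is a minimal sketch, assuming a Linux system and an existing, writable file at `path`; the helper names `locked_by_other` and `posix_lock_survives_close` are made up for illustration and are not part of the patch. It only exercises the classic `F_SETLK`/`F_GETLK` commands, not the new cmd values.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative helper (not from the patch): fork a child and ask, via
 * F_GETLK, whether a conflicting lock exists anywhere in the file.
 * Returns 1 if locked, 0 if not, -1 on error. */
static int locked_by_other(const char *path)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct flock fl = {
			.l_type = F_WRLCK,
			.l_whence = SEEK_SET,
			.l_start = 0,
			.l_len = 0,	/* 0 means "to end of file" */
		};
		int fd = open(path, O_RDWR);

		if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
			_exit(2);
		_exit(fl.l_type == F_UNLCK ? 0 : 1);
	}
	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* Take a classic POSIX write lock through one descriptor, close a second
 * descriptor for the same file that was never used for locking, and report
 * whether the lock is still held: 1 = held, 0 = dropped, -1 = error. */
int posix_lock_survives_close(const char *path)
{
	struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
	int fd1 = open(path, O_RDWR);
	int fd2 = open(path, O_RDWR);
	int held = -1;

	if (fd1 >= 0 && fd2 >= 0 && fcntl(fd1, F_SETLK, &fl) == 0 &&
	    locked_by_other(path) == 1) {
		close(fd2);	/* closing this drops the lock taken via fd1 */
		fd2 = -1;
		held = locked_by_other(path);
	}
	if (fd2 >= 0)
		close(fd2);
	if (fd1 >= 0)
		close(fd1);
	return held;
}
```

With classic POSIX locks the function is expected to report that the lock was dropped, which is the surprise a file server has to work around by tracking every descriptor it holds on an inode.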

locks: skip deadlock detection on FL_FILE_PVT locks · 57b65325

Authored by Jeff Layton on Feb 03, 2014

It's not really feasible to do deadlock detection with FL_FILE_PVT
locks since they aren't owned by a single task, per se. Deadlock
detection also tends to be rather expensive, so just skip it for
these sorts of locks.

Also, add a FIXME comment about adding more limited deadlock detection
that just applies to ro -> rw upgrades, per Andy's request.

Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: pass the cmd value to fcntl_getlk/getlk64 · c1e62b8f

Authored by Jeff Layton on Feb 03, 2014

Once we introduce file private locks, we'll need to know what cmd value
was used, as that affects the ownership and whether a conflict would
arise.
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: report l_pid as -1 for FL_FILE_PVT locks · 3fd80cdd

Authored by Jeff Layton on Feb 03, 2014

FL_FILE_PVT locks are no longer tied to a particular pid, and are
instead inheritable by child processes. Report an l_pid of '-1' for
these sorts of locks since the pid is somewhat meaningless for them.

This precedent comes from FreeBSD. There, POSIX and flock() locks can
conflict with one another. If fcntl(F_GETLK, ...) returns a lock set
with flock() then the l_pid member cannot be a process ID because the
lock is not held by a process as such.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT" · c918d42a

Authored by Jeff Layton on Feb 03, 2014

In a later patch, we'll be adding a new type of lock that's owned by
the struct file instead of the files_struct. Those sorts of locks
will be flagged with a new FL_FILE_PVT flag.

Report these types of locks as "FLPVT" in /proc/locks to distinguish
them from "classic" POSIX locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: rename locks_remove_flock to locks_remove_file · 78ed8a13

Authored by Jeff Layton on Feb 03, 2014

This function currently removes leases in addition to flock locks and in
a later patch we'll have it deal with file-private locks too. Rename it
to locks_remove_file to indicate that it removes locks that are
associated with a particular struct file, and not just flock locks.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: consolidate checks for compatible filp->f_mode values in setlk handlers · bce7560d

Authored by Jeff Layton on Feb 03, 2014

Move this check into flock64_to_posix_lock instead of duplicating it in
two places. This also fixes a minor wart in the code where we continue
referring to the struct flock after converting it to struct file_lock.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: fix posix lock range overflow handling · ef12e72a

Authored by J. Bruce Fields on Feb 03, 2014

In the 32-bit case fcntl assigns the 64-bit f_pos and i_size to a 32-bit
off_t.

The existing range checks also seem to depend on signed arithmetic
wrapping when it overflows.  In practice maybe that works, but we can be
more careful.  That also allows us to make a more reliable distinction
between -EINVAL and -EOVERFLOW.

Note that in the 32-bit case SEEK_CUR or SEEK_END might allow the caller
to set a lock with starting point no longer representable as a 32-bit
value.  We could return -EOVERFLOW in such cases, but the locks code is
capable of handling such ranges, so we choose to be lenient here.  The
only problem is that subsequent GETLK calls on such a lock will fail
with EOVERFLOW.

While we're here, do some cleanup including consolidating code for the
flock and flock64 cases.
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

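The distinction between -EINVAL and -EOVERFLOW drawn in this message can be made without relying on signed wraparound by testing each addition before performing it. The sketch below is a user-space approximation and not the kernel's code: `range_to_start_end`, `struct lock_range` and `OFF_MAX` are hypothetical names, and `base` stands for the offset implied by l_whence (0, f_pos or i_size).

```c
#include <errno.h>
#include <stdint.h>

#define OFF_MAX INT64_MAX	/* stand-in for the largest file offset */

struct lock_range {
	int64_t start;
	int64_t end;	/* inclusive */
};

/* Convert (base + l_start, l_len) into an inclusive [start, end] range.
 * Returns 0 on success, -EINVAL for a range that would begin before
 * offset 0, or -EOVERFLOW for one that cannot be represented.
 * Assumes base >= 0. */
int range_to_start_end(int64_t base, int64_t l_start, int64_t l_len,
		       struct lock_range *out)
{
	int64_t start;

	if (l_start > OFF_MAX - base)
		return -EOVERFLOW;
	start = base + l_start;
	if (start < 0)
		return -EINVAL;

	if (l_len > 0) {
		if (l_len - 1 > OFF_MAX - start)
			return -EOVERFLOW;
		out->start = start;
		out->end = start + l_len - 1;
	} else if (l_len < 0) {
		/* negative length: the range ends just before "start" */
		if (start + l_len < 0)
			return -EINVAL;
		out->start = start + l_len;
		out->end = start - 1;
	} else {
		/* zero length means "to the end of the file" */
		out->start = start;
		out->end = OFF_MAX;
	}
	return 0;
}
```

Because every comparison is done against `OFF_MAX - x` before the sum is formed, no intermediate value overflows, so the two error codes can be told apart reliably.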

locks: eliminate BUG() call when there's an unexpected lock on file close · 8c3cac5e

Authored by Jeff Layton on Feb 03, 2014

A leftover lock on the list is surely a sign of a problem of some sort,
but it's not necessarily a reason to panic the box. Instead, just log a
warning with some info about the lock, and then delete it like we would
any other lock.

In the event that the filesystem declares a ->lock f_op, we may end up
leaking something, but that's generally preferable to an immediate
panic.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: add __acquires and __releases annotations to locks_start and locks_stop · b03dfdec

Authored by Jeff Layton on Feb 03, 2014

...to make sparse happy.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: remove "inline" qualifier from fl_link manipulation functions · 6ca10ed8

Authored by Jeff Layton on Feb 03, 2014

It's best to let the compiler decide that.
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: clean up comment typo · 46dad760

Authored by Jeff Layton on Feb 03, 2014

Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>

locks: close potential race between setlease and open · 24cbe784

Authored by Jeff Layton on Feb 03, 2014

As Al Viro points out, there is an unlikely, but possible race between
opening a file and setting a lease on it. generic_add_lease is done with
the i_lock held, but the inode->i_flock check in break_lease is
lockless. It's possible for another task doing an open to do the entire
pathwalk and call break_lease between the point where generic_add_lease
checks for a conflicting open and adds the lease to the list. If this
occurs, we can end up with a lease set on the file with a conflicting
open.

To guard against that, check again for a conflicting open after adding
the lease to the i_flock list. If the above race occurs, then we can
simply unwind the lease setting and return -EAGAIN.

Because we take dentry references and acquire write access on the file
before calling break_lease, we know that if the i_flock list is empty
when the open caller goes to check it then the necessary refcounts have
already been incremented. Thus the additional check for a conflicting
open will see that there is one and the setlease call will fail.

Cc: Bruce Fields <bfields@fieldses.org>
Cc: David Howells <dhowells@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@fieldses.org>

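The check, add, re-check pattern described in this message can be modeled in a few lines. This is a deliberately simplified, single-threaded toy and not the kernel implementation: `struct toy_inode`, `toy_add_lease` and `toy_racing_open` are hypothetical names, and the racing open is simulated by a callback invoked inside the window between the first check and the insertion.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an inode: a count of conflicting opens and a lease flag. */
struct toy_inode {
	int conflicting_opens;
	bool has_lease;
};

/* Hook that lets a caller simulate an open() racing with the lease setup. */
typedef void (*race_hook)(struct toy_inode *);

/* Check for conflicts, add the lease, then check again. If a conflicting
 * open slipped in between the first check and the insertion, unwind the
 * lease and return -EAGAIN. */
int toy_add_lease(struct toy_inode *inode, race_hook racing_open)
{
	if (inode->conflicting_opens > 0)
		return -EAGAIN;

	if (racing_open)
		racing_open(inode);	/* the window the commit describes */

	inode->has_lease = true;

	if (inode->conflicting_opens > 0) {
		inode->has_lease = false;	/* unwind the lease setting */
		return -EAGAIN;
	}
	return 0;
}

/* Simulated racing opener: it looked for leases before ours was added,
 * found none, and proceeded with its open. */
void toy_racing_open(struct toy_inode *inode)
{
	inode->conflicting_opens++;
}
```

Without the second check, the raced case would return success and leave a lease in place alongside a conflicting open, which is the outcome the patch guards against.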

13 Nov, 2013 · 1 commit

locks: missing unlock on error in generic_add_lease() · 4fdb793f

Authored by Dan Carpenter on Nov 13, 2013

We should unlock here before returning.

Fixes: df4e8d2c ('locks: implement delegations')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

09 Nov, 2013 · 2 commits

locks: implement delegations · df4e8d2c

Authored by J. Bruce Fields on Mar 05, 2012

Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
type.

Note nfsd is the only delegation user and is only using read
delegations.  Warn on any attempt to set a write delegation for now.
We'll come back to that case later.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: introduce new FL_DELEG lock flag · 617588d5

Authored by J. Bruce Fields on Jul 01, 2011

For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
change behavior.

Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

25 Oct, 2013 · 1 commit

file->f_op is never NULL... · 72c2d531

Authored by Al Viro on Sep 22, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

08 Jul, 2013 · 1 commit

locks: move file_lock_list to a set of percpu hlist_heads and convert file_lock_lock to an lglock · 7012b02a

Authored by Jeff Layton on Jun 21, 2013

The file_lock_list is only used for /proc/locks. By far the most common
case is for locks to be put onto the list and come off again, without ever
being traversed.

Help optimize for this use-case by moving to percpu hlist_head-s. At the
same time, we can make the locking less contentious by moving to an
lglock. When iterating over the lists for /proc/locks, we must take the
global lock and then iterate over each CPU's list in turn.

This change necessitates a new fl_link_cpu field to keep track of which
CPU the entry is on. On x86_64 at least, this field is placed within an
existing hole in the struct to avoid growing the size.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

05 Jul, 2013 · 1 commit

helper for reading ->d_count · 84d08fa8

Authored by Al Viro on Jul 05, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

29 Jun, 2013 · 12 commits

locks: give the blocked_hash its own spinlock · 7b2296af

Authored by Jeff Layton on Jun 21, 2013

There's no reason we have to protect the blocked_hash and file_lock_list
with the same spinlock. With the tests I have, breaking it in two gives
a barely measurable performance benefit, but it seems reasonable to make
this locking as granular as possible.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: add a new "lm_owner_key" lock operation · 3999e493

Authored by Jeff Layton on Jun 21, 2013

Currently, the hashing that the locking code uses to add these values
to the blocked_hash is simply calculated using fl_owner field. That's
valid in most cases except for server-side lockd, which validates the
owner of a lock based on fl_owner and fl_pid.

In the case where you have a small number of NFS clients doing a lot
of locking between different processes, you could end up with all
the blocked requests sitting in a very small number of hash buckets.

Add a new lm_owner_key operation to the lock_manager_operations that
will generate an unsigned long to use as the key in the hashtable.
That function is only implemented for server-side lockd, and simply
XORs the fl_owner and fl_pid.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

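The effect of mixing fl_pid into the key can be seen with a small user-space model. This is only a sketch of the idea stated in the message (XOR of fl_owner and fl_pid); `struct toy_lock`, `toy_default_key` and `toy_lockd_key` are hypothetical names, not the kernel's structures or the real lm_owner_key signature.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two file_lock fields the commit mentions. */
struct toy_lock {
	const void *fl_owner;
	unsigned int fl_pid;
};

/* Default key: derived from the owner pointer alone, so every request from
 * the same owner produces the same key. */
unsigned long toy_default_key(const struct toy_lock *fl)
{
	return (unsigned long)(uintptr_t)fl->fl_owner;
}

/* lockd-style key: XOR the owner with the pid so that different processes
 * behind one owner produce different keys. */
unsigned long toy_lockd_key(const struct toy_lock *fl)
{
	return (unsigned long)(uintptr_t)fl->fl_owner ^ (unsigned long)fl->fl_pid;
}
```

Two requests that share an owner but carry different pids get identical default keys and distinct XOR keys, which is what lets them spread across more hash buckets.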

locks: turn the blocked_list into a hashtable · 48f74186

Authored by Jeff Layton on Jun 21, 2013

Break up the blocked_list into a hashtable, using the fl_owner as a key.
This speeds up searching the hash chains, which is especially significant
for deadlock detection.

Note that the initial implementation assumes that hashing on fl_owner is
sufficient. In most cases it should be, with the notable exception being
server-side lockd, which compares ownership using a tuple of the
nlm_host and the pid sent in the lock request. So, this may degrade to a
single hash bucket when you only have a single NFS client. That will be
addressed in a later patch.

The careful observer may note that this patch leaves the file_lock_list
alone. There's much less of a case for turning the file_lock_list into a
hashtable. The only user of that list is the code that generates
/proc/locks, and it always walks the entire list.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: convert fl_link to a hlist_node · 139ca04e

Authored by Jeff Layton on Jun 21, 2013

Testing has shown that iterating over the blocked_list for deadlock
detection turns out to be a bottleneck. In order to alleviate that,
begin the process of turning it into a hashtable. We start by turning
the fl_link into a hlist_node and the global lists into hlists. A later
patch will do the conversion of the blocked_list to a hashtable.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: avoid taking global lock if possible when waking up blocked waiters · 4e8c765d

Authored by Jeff Layton on Jun 21, 2013

Since we always hold the i_lock when inserting a new waiter onto the
fl_block list, we can avoid taking the global lock at all if we find
that it's empty when we go to wake up blocked waiters.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: protect most of the file_lock handling with i_lock · 1c8c601a

Authored by Jeff Layton on Jun 21, 2013

Having a global lock that protects all of this code is a clear
scalability problem. Instead of doing that, move most of the code to be
protected by the i_lock instead. The exceptions are the global lists
that the ->fl_link sits on, and the ->fl_block list.

->fl_link is what connects these structures to the
global lists, so we must ensure that we hold those locks when iterating
over or updating these lists.

Furthermore, sound deadlock detection requires that we hold the
blocked_list state steady while checking for loops. We also must ensure
that the search and update to the list are atomic.

For the checking and insertion side of the blocked_list, push the
acquisition of the global lock into __posix_lock_file and ensure that
checking and update of the blocked_list is done without dropping the
lock in between.

On the removal side, when waking up blocked lock waiters, take the
global lock before walking the blocked list and dequeue the waiters from
the global list prior to removal from the fl_block list.

With this, deadlock detection should be race free while we minimize
excessive file_lock_lock thrashing.

Finally, in order to avoid a lock inversion problem when handling
/proc/locks output we must ensure that manipulations of the fl_block
list are also protected by the file_lock_lock.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: encapsulate the fl_link list handling · 88974691

Authored by Jeff Layton on Jun 21, 2013

Move the fl_link list handling routines into a separate set of helpers.
Also ensure that locks and requests are always put on global lists
last (after fully initializing them) and are taken off before uninitializing
them.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: make "added" in __posix_lock_file a bool · b9746ef8

Authored by Jeff Layton on Jun 21, 2013

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: comment cleanups and clarifications · 1cb36012

Authored by Jeff Layton on Jun 21, 2013

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: make generic_add_lease and generic_delete_lease static · d4f22d19

Authored by Jeff Layton on Jun 21, 2013

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cifs: use posix_unblock_lock instead of locks_delete_block · 1a9e64a7

Authored by Jeff Layton on Jun 21, 2013

commit 66189be7 (CIFS: Fix VFS lock usage for oplocked files) exported
the locks_delete_block symbol. There's already an exported helper
function that provides this capability however, so make cifs use that
instead and turn locks_delete_block back into a static function.

Note that if fl->fl_next == NULL then this lock has already been through
locks_delete_block(), so we should be OK to ignore an ENOENT error here
and simply not retry the lock.

Cc: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: drop the unused filp argument to posix_unblock_lock · f891a29f

Authored by Jeff Layton on Jun 21, 2013

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

23 Feb, 2013 · 1 commit

new helper: file_inode(file) · 496ad9aa

Authored by Al Viro on Jan 23, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27 Sep, 2012 · 1 commit

switch simple cases of fget_light to fdget · 2903ff01

Authored by Al Viro on Aug 28, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

21 Aug, 2012 · 1 commit

vfs: don't treat fl_type as a bitmap · 0ee5c6d6

Authored by Jeff Layton on Aug 02, 2012

The rules for fl_type are rather convoluted. Typically it's treated as
holding specific values, except in the case of LOCK_MAND, in which case
it can be or'ed with LOCK_READ|LOCK_WRITE.

On some arches F_WRLCK == 2 and F_UNLCK == 3, so and'ing with F_WRLCK will also
catch the F_UNLCK case. It's unlikely in either case here that we'd ever see
F_UNLCK since those shouldn't end up on any lists, but it's still best to be
consistent.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

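The pitfall is easy to see with the example values quoted in the message. In this sketch `EX_WRLCK` and `EX_UNLCK` are illustrative constants set to 2 and 3 as in the "some arches" case; real code should use the `F_*` constants from `<fcntl.h>`, whose values differ between architectures.

```c
#include <stdbool.h>

/* Example values taken from the commit text; not portable constants. */
enum { EX_WRLCK = 2, EX_UNLCK = 3 };

/* Bitmap-style test: also true for EX_UNLCK, because 3 & 2 is nonzero. */
bool is_write_lock_bitwise(int fl_type)
{
	return (fl_type & EX_WRLCK) != 0;
}

/* Value-style test: true only for an actual write lock. */
bool is_write_lock_exact(int fl_type)
{
	return fl_type == EX_WRLCK;
}
```

Comparing for equality treats fl_type as the enumerated value it normally is, so an unlock request is never mistaken for a write lock.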

02 Aug, 2012 · 1 commit

locks: remove unused lm_release_private · 068535f1

Authored by J. Bruce Fields on Aug 01, 2012

In commit 3b6e2723 ("locks: prevent side-effects of
locks_release_private before file_lock is initialized") we removed the
last user of lm_release_private without removing the field itself.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 Jul, 2012 · 1 commit

locks: move lease-specific code out of locks_delete_lock · 96d6d59c

Authored by J. Bruce Fields on Jul 27, 2012

No point putting something only used by one caller into common code.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

27 Jul, 2012 · 1 commit

locks: prevent side-effects of locks_release_private before file_lock is initialized · 3b6e2723

Authored by Filipe Brandenburger on Jul 27, 2012

When calling fcntl(fd, F_SETLEASE, lck) [with lck=F_WRLCK or F_RDLCK],
the custom signal or owner (if any were previously set using F_SETSIG
or F_SETOWN fcntls) would be reset when F_SETLEASE was called for the
second time on the same file descriptor.

This bug is a regression of 2.6.37 and is described here:
https://bugzilla.kernel.org/show_bug.cgi?id=43336

This patch reverts a commit from Oct 2004 (with subject "nfs4 lease:
move the f_delown processing") which originally introduced the
lm_release_private callback.
Signed-off-by: Filipe Brandenburger <filbranden@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

24 Jul, 2012 · 1 commit

locks: fix checking of fcntl_setlease argument · 0ec4f431

Authored by J. Bruce Fields on Jul 23, 2012

The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.)
are done after converting the long to an int.  Thus some illegal values
may be let through and cause problems in later code.

[ They actually *don't* cause problems in mainline, as of Dave Jones's
  commit 8d657eb3 "Remove easily user-triggerable BUG from
  generic_setlease", but we should fix this anyway.  And this patch will
  be necessary to fix real bugs on earlier kernels. ]

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

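Why validating after the long-to-int conversion lets bad values through can be shown with fixed-width types. This is a sketch under stated assumptions: `int64_t` stands in for a 64-bit `long`, `int32_t` for `int`, and `EX_RDLCK`, `EX_WRLCK`, `EX_UNLCK` and both function names are hypothetical, not the kernel's.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical valid values standing in for F_RDLCK, F_WRLCK and F_UNLCK. */
enum { EX_RDLCK = 0, EX_WRLCK = 1, EX_UNLCK = 2 };

static int is_valid_type(int64_t v)
{
	return v == EX_RDLCK || v == EX_WRLCK || v == EX_UNLCK;
}

/* Problematic order: narrow first, validate afterwards. Only the low
 * 32 bits survive, so an illegal wide value can look legal. */
int check_after_narrowing(int64_t arg)
{
	int32_t narrowed = (int32_t)(uint32_t)(uint64_t)arg;

	return is_valid_type(narrowed) ? 0 : -EINVAL;
}

/* Safer order: validate the full-width value before it is narrowed. */
int check_before_narrowing(int64_t arg)
{
	return is_valid_type(arg) ? 0 : -EINVAL;
}
```

A value such as `(1LL << 32) + EX_WRLCK` is accepted by the first function and rejected by the second, which is the class of input the fix is meant to catch.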

14 Jul, 2012 · 1 commit

Remove easily user-triggerable BUG from generic_setlease · 8d657eb3

Authored by Dave Jones on Jul 13, 2012

This can be trivially triggered from userspace by passing in something unexpected.

    kernel BUG at fs/locks.c:1468!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:generic_setlease+0xc2/0x100
    Call Trace:
      __vfs_setlease+0x35/0x40
      fcntl_setlease+0x76/0x150
      sys_fcntl+0x1c6/0x810
      system_call_fastpath+0x1a/0x1f
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

30 May, 2012 · 1 commit

switch flock to fget_light/fput_light · bdc68959

Authored by Al Viro on Apr 21, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
