提交 · 4453641fe85f2ffda653e2e61b6a554dba1f0581 · openanolis / cloud-kernel

27 9月, 2014 6 次提交

__d_materialise_dentry(): flip the order of arguments · 4453641f

由 Al Viro 提交于 9月 26, 2014

... thus making it much closer to (now unreachable, BTW) IS_ROOT(dentry)
case in __d_move().  A bit more and it'll fold in.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4453641f

__d_move(): fold manipulations with ->d_child/->d_subdirs · 9d8cd306

由 Al Viro 提交于 9月 26, 2014

list_del() + list_add() is a slightly pessimised list_move()
list_del() + INIT_LIST_HEAD() is a slightly pessimised list_del_init()

Interleaving those makes the resulting code even worse. And harder to follow...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9d8cd306

A
don't open-code d_rehash() in d_materialise_unique() · 8527dd71
由 Al Viro 提交于 9月 26, 2014
```
... and get rid of duplicate BUG_ON() there
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
8527dd71
A
pull rehashing and unlocking the target dentry into __d_materialise_dentry() · 5cc3821b
由 Al Viro 提交于 9月 26, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
5cc3821b
A
ufs: deal with nfsd/iget races · e4502c63
由 Al Viro 提交于 9月 26, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
e4502c63

fuse: honour max_read and max_write in direct_io mode · 2c80929c

由 Miklos Szeredi 提交于 9月 24, 2014

The third argument of fuse_get_user_pages() "nbytesp" refers to the number of
bytes a caller asked to pack into fuse request. This value may be lesser
than capacity of fuse request or iov_iter.  So fuse_get_user_pages() must
ensure that *nbytesp won't grow.

Now, when helper iov_iter_get_pages() performs all hard work of extracting
pages from iov_iter, it can be done by passing properly calculated
"maxsize" to the helper.

The other caller of iov_iter_get_pages() (dio_refill_pages()) doesn't need
this capability, so pass LONG_MAX as the maxsize argument here.

Fixes: c9c37e2e ("fuse: switch to iov_iter_get_pages()")
Reported-by: NWerner Baumann <werner.baumann@onlinehome.de>
Tested-by: NMaxim Patlasov <mpatlasov@parallels.com>
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2c80929c

19 9月, 2014 2 次提交

NFSv4: Fix another bug in the close/open_downgrade code · cd9288ff

由 Trond Myklebust 提交于 9月 18, 2014

James Drew reports another bug whereby the NFS client is now sending
an OPEN_DOWNGRADE in a situation where it should really have sent a
CLOSE: the client is opening the file for O_RDWR, but then trying to
do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec.
Reported-by: NJames Drews <drews@engr.wisc.edu>
Link: http://lkml.kernel.org/r/541AD7E5.8020409@engr.wisc.edu
Fixes: aee7af35 (NFSv4: Fix problems with close in the presence...)
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

cd9288ff

NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists() · 080af20c

由 Steve Dickson 提交于 9月 18, 2014

There is a race between nfs4_state_manager() and
nfs_server_remove_lists() that happens during a nfsv3 mount.

The v3 mount notices there is already a supper block so
nfs_server_remove_lists() called which uses the nfs_client_lock
spin lock to synchronize access to the client list.

At the same time nfs4_state_manager() is running through
the client list looking for work to do, using the same
lock. When nfs4_state_manager() wins the race to the
list, a v3 client pointer is found and not ignored
properly which causes the panic.

Moving some protocol checks before the state checking
avoids the panic.

CC: Stable Tree <stable@vger.kernel.org>
Signed-off-by: NSteve Dickson <steved@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

080af20c

18 9月, 2014 1 次提交

Revert "Btrfs: device_list_add() should not update list when mounted" · 0f23ae74

由 Chris Mason 提交于 9月 18, 2014

This reverts commit b96de000.

This commit is triggering failures to mount by subvolume id in some
configurations.  The main problem is how many different ways this
scanning function is used, both for scanning while mounted and
unmounted.  A proper cleanup is too big for late rcs.

For now, just revert the commit and we'll put a better fix into a later
merge window.
Signed-off-by: NChris Mason <clm@fb.com>

0f23ae74

17 9月, 2014 1 次提交

Btrfs: set inode's logged_trans/last_log_commit after ranged fsync · 125c4cf9

由 Filipe Manana 提交于 9月 11, 2014

When a ranged fsync finishes if there are still extent maps in the modified
list, still set the inode's logged_trans and last_log_commit. This is important
in case an inode is fsync'ed and unlinked in the same transaction, to ensure its
inode ref gets deleted from the log and the respective dentries in its parent
are deleted too from the log (if the parent directory was fsync'ed in the same
transaction).

Instead make btrfs_inode_in_log() return false if the list of modified extent
maps isn't empty.

This is an incremental on top of the v4 version of the patch:

    "Btrfs: fix fsync data loss after a ranged fsync"

which was added to its v5, but didn't make it on time.
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

125c4cf9

16 9月, 2014 6 次提交

vfs: workaround gcc <4.6 build error in link_path_walk() · a060dc50

由 James Hogan 提交于 9月 16, 2014

Commit d6bb3e90 ("vfs: simplify and shrink stack frame of
link_path_walk()") introduced build problems with GCC versions older
than 4.6 due to the initialisation of a member of an anonymous union in
struct qstr without enclosing braces.

This hits GCC bug 10676 [1] (which was fixed in GCC 4.6 by [2]), and
causes the following build error:

  fs/namei.c: In function 'link_path_walk':
  fs/namei.c:1778: error: unknown field 'hash_len' specified in initializer

This is worked around by adding explicit braces.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676
[2] https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=159206

Fixes: d6bb3e90 (vfs: simplify and shrink stack frame of link_path_walk())
Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a060dc50

Fix mfsymlinks file size check · 364d4293

由 Steve French 提交于 9月 16, 2014

If the mfsymlinks file size has changed (e.g. the file no longer
represents an emulated symlink) we were not returning an error properly.
Signed-off-by: NSteve French <smfrench@gmail.com>
Reviewed-by: NStefan Metzmacher <metze@samba.org>

364d4293

S
Update version number displayed by modinfo for cifs.ko · 69af38db
由 Steve French 提交于 9月 16, 2014
```
Update cifs.ko version to 2.05

Signed-off-by: Steve French <smfrench@gmail.com>w
```
69af38db

cifs: remove dead code · 116ae5e2

由 Arnd Bergmann 提交于 9月 08, 2014

cifs provides two dummy functions 'sess_auth_lanman' and
'sess_auth_kerberos' for the case in which the respective
features are not defined. However, the caller is also under
an #ifdef, so we just get warnings about unused code:

fs/cifs/sess.c:1109:1: warning: 'sess_auth_kerberos' defined but not used [-Wunused-function]
 sess_auth_kerberos(struct sess_data *sess_data)

Removing the dead functions gets rid of the warnings without
any downsides that I can see.

(Yalin Wang reported the identical problem and fix so added him)
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NYalin Wang <yalin.wang@sonymobile.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

116ae5e2

Revert "cifs: No need to send SIGKILL to demux_thread during umount" · a5c3e1c7

由 Steve French 提交于 9月 16, 2014

This reverts commit 52a36244.

Causes rmmod to fail for at least 7 seconds after unmount which
makes automated testing a little harder when reloading cifs.ko
between test runs.
Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com>
CC: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

a5c3e1c7

vfs: simplify and shrink stack frame of link_path_walk() · d6bb3e90

由 Linus Torvalds 提交于 9月 15, 2014

Commit 9226b5b4 ("vfs: avoid non-forwarding large load after small
store in path lookup") made link_path_walk() always access the
"hash_len" field as a single 64-bit entity, in order to avoid mixed size
accesses to the members.

However, what I didn't notice was that that effectively means that the
whole "struct qstr this" is now basically redundant.  We already
explicitly track the "const char *name", and if we just use "u64
hash_len" instead of "long len", there is nothing else left of the
"struct qstr".

We do end up wanting the "struct qstr" if we have a filesystem with a
"d_hash()" function, but that's a rare case, and we might as well then
just squirrell away the name and hash_len at that point.

End result: fewer live variables in the loop, a smaller stack frame, and
better code generation.  And we don't need to pass in pointers variables
to helper functions any more, because the return value contains all the
relevant information.  So this removes more lines than it adds, and the
source code is clearer too.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d6bb3e90

15 9月, 2014 5 次提交

[SMB3] Fix oops when creating symlinks on smb3 · da80659d

由 Steve French 提交于 9月 14, 2014

We were not checking for symlink support properly for SMB2/SMB3
mounts so could oops when mounted with mfsymlinks when try
to create symlink when mfsymlinks on smb2/smb3 mounts
Signed-off-by: NSteve French <smfrench@gmail.com>
Cc: <stable@vger.kernel.org> # 3.14+
CC: Sachin Prabhu <sprabhu@redhat.com>

da80659d

vfs: avoid non-forwarding large load after small store in path lookup · 9226b5b4

由 Linus Torvalds 提交于 9月 14, 2014

The performance regression that Josef Bacik reported in the pathname
lookup (see commit 99d263d4 "vfs: fix bad hashing of dentries") made
me look at performance stability of the dcache code, just to verify that
the problem was actually fixed.  That turned up a few other problems in
this area.

There are a few cases where we exit RCU lookup mode and go to the slow
serializing case when we shouldn't, Al has fixed those and they'll come
in with the next VFS pull.

But my performance verification also shows that link_path_walk() turns
out to have a very unfortunate 32-bit store of the length and hash of
the name we look up, followed by a 64-bit read of the combined hash_len
field.  That screws up the processor store to load forwarding, causing
an unnecessary hickup in this critical routine.

It's caused by the ugly calling convention for the "hash_name()"
function, and easily fixed by just making hash_name() fill in the whole
'struct qstr' rather than passing it a pointer to just the hash value.

With that, the profile for this function looks much smoother.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9226b5b4

[CIFS] Fix setting time before epoch (negative time values) · 2ae83bf9

由 Steve French 提交于 9月 14, 2014

xfstest generic/258 sets the time on a file to a negative value
(before 1970) which fails since do_div can not handle negative
numbers.  In addition 'normal' division of 64 bit values does
not build on 32 bit arch so have to workaround this by special
casing negative values in cifs_NTtimeToUnix

Samba server also has a bug with this (see samba bugzilla 7771)
but it works to Windows server.
Signed-off-by: NSteve French <smfrench@gmail.com>

2ae83bf9

be careful with nd->inode in path_init() and follow_dotdot_rcu() · 4023bfc9

由 Al Viro 提交于 9月 13, 2014

in the former we simply check if dentry is still valid after picking
its ->d_inode; in the latter we fetch ->d_inode in the same places
where we fetch dentry and its ->d_seq, under the same checks.

Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4023bfc9

don't bugger nd->seq on set_root_rcu() from follow_dotdot_rcu() · 7bd88377

由 Al Viro 提交于 9月 13, 2014

return the value instead, and have path_init() do the assignment.  Broken by
"vfs: Fix absolute RCU path walk failures due to uninitialized seq number",
which was Cc-stable with 2.6.38+ as destination.  This one should go where
it went.

To avoid dummy value returned in case when root is already set (it would do
no harm, actually, since the only caller that doesn't ignore the return value
is guaranteed to have nd->root *not* set, but it's more obvious that way),
lift the check into callers.  And do the same to set_root(), to keep them
in sync.

Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7bd88377

14 9月, 2014 3 次提交

fix bogus read_seqretry() checks introduced in · f5be3e29

由 Al Viro 提交于 9月 13, 2014

read_seqretry() returns true on mismatch, not on match...

Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f5be3e29

A
move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon) · 6f18493e
由 Al Viro 提交于 9月 11, 2014
```
and lock the right list there

Cc: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
6f18493e

vfs: fix bad hashing of dentries · 99d263d4

由 Linus Torvalds 提交于 9月 13, 2014

Josef Bacik found a performance regression between 3.2 and 3.10 and
narrowed it down to commit bfcfaa77 ("vfs: use 'unsigned long'
accesses for dcache name comparison and hashing"). He reports:

 "The test case is essentially

      for (i = 0; i < 1000000; i++)
              mkdir("a$i");

  On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
  dir/sec with 3.10.  This is because we spend waaaaay more time in
  __d_lookup on 3.10 than in 3.2.

  The new hashing function for strings is suboptimal for <
  sizeof(unsigned long) string names (and hell even > sizeof(unsigned
  long) string names that I've tested).  I broke out the old hashing
  function and the new one into a userspace helper to get real numbers
  and this is what I'm getting:

      Old hash table had 1000000 entries, 0 dupes, 0 max dupes
      New hash table had 12628 entries, 987372 dupes, 900 max dupes
      We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash

  My test does the hash, and then does the d_hash into a integer pointer
  array the same size as the dentry hash table on my system, and then
  just increments the value at the address we got to see how many
  entries we overlap with.

  As you can see the old hash function ended up with all 1 million
  entries in their own bucket, whereas the new one they are only
  distributed among ~12.5k buckets, which is why we're using so much
  more CPU in __d_lookup".

The reason for this hash regression is two-fold:

 - On 64-bit architectures the down-mixing of the original 64-bit
   word-at-a-time hash into the final 32-bit hash value is very
   simplistic and suboptimal, and just adds the two 32-bit parts
   together.

   In particular, because there is no bit shuffling and the mixing
   boundary is also a byte boundary, similar character patterns in the
   low and high word easily end up just canceling each other out.

 - the old byte-at-a-time hash mixed each byte into the final hash as it
   hashed the path component name, resulting in the low bits of the hash
   generally being a good source of hash data.  That is not true for the
   word-at-a-time case, and the hash data is distributed among all the
   bits.

The fix is the same in both cases: do a better job of mixing the bits up
and using as much of the hash data as possible.  We already have the
"hash_32|64()" functions to do that.
Reported-by: NJosef Bacik <jbacik@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

99d263d4

13 9月, 2014 2 次提交

GFS2: fix d_splice_alias() misuses · cfb2f9d5

由 Al Viro 提交于 9月 12, 2014

Callers of d_splice_alias(dentry, inode) don't need iput(), neither
on success nor on failure.  Either the reference to inode is stored
in a previously negative dentry, or it's dropped.  In either case
inode reference the caller used to hold is consumed.

__gfs2_lookup() does iput() in case when d_splice_alias() has failed.
Double iput() if we ever hit that.  And gfs2_create_inode() ends up
not only with double iput(), but with link count dropped to zero - on
an inode it has just found in directory.

Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

cfb2f9d5

NFS: remove BUG possibility in nfs4_open_and_get_state · f39c0104

由 NeilBrown 提交于 9月 11, 2014

commit 4fa2c54b
    NFS: nfs4_do_open should add negative results to the dcache.

used "d_drop(); d_add();" to ensure that a dentry was hashed
as a negative cached entry.
This is not safe if the dentry has an non-NULL ->d_inode.
It will trigger a BUG_ON in d_instantiate().
In that case, d_delete() is needed.

Also, only d_add if the dentry is currently unhashed, it seems
pointless removed and re-adding it unchanged.
Reported-by: NChristoph Hellwig <hch@infradead.org>
Fixes: 4fa2c54b
Cc: Jeff Layton <jeff.layton@primarydata.com>
Link: http://lkml.kernel.org/r/20140908144525.GB19811@infradead.orgSigned-off-by: NNeilBrown <neilb@suse.de>
Acked-by: NJeff Layton <jlayton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

f39c0104

11 9月, 2014 6 次提交

GFS2: Don't use MAXQUOTAS value · a937cca2

由 Jan Kara 提交于 9月 10, 2014

MAXQUOTAS value defines maximum number of quota types VFS supports.
This isn't necessarily the number of types gfs2 supports and with
addition of project quotas these two numbers stop matching. So make gfs2
use its private definition.

CC: cluster-devel@redhat.com
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

a937cca2

GFS2: Hash the negative dentry during inode lookup · 7b7a9115

由 Benjamin Coddington 提交于 9月 10, 2014

Fix a regression introduced by:
6d4ade98 GFS2: Add atomic_open support
where an early return misses d_splice_alias() which had been
adding the negative dentry.
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NBob Peterson <rpeterso@redhat.com>
Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>

7b7a9115

fs/notify: don't show f_handle if exportfs_encode_inode_fh failed · 7e882481

由 Andrey Vagin 提交于 9月 09, 2014

Currently we handle only ENOSPC.  In case of other errors the file_handle
variable isn't filled properly and we will show a part of stack.
Signed-off-by: NAndrey Vagin <avagin@openvz.org>
Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7e882481

fsnotify/fdinfo: use named constants instead of hardcoded values · 1fc98d11

由 Andrey Vagin 提交于 9月 09, 2014

MAX_HANDLE_SZ is equal to 128, but currently the size of pad is only 64
bytes, so exportfs_encode_inode_fh can return an error.
Signed-off-by: NAndrey Vagin <avagin@openvz.org>
Acked-by: NCyrill Gorcunov <gorcunov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1fc98d11

eventpoll: fix uninitialized variable in epoll_ctl · c680e41b

由 Nicolas Iooss 提交于 9月 09, 2014

When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
not initialized but ep_take_care_of_epollwakeup reads its event field.
When this unintialized field has EPOLLWAKEUP bit set, a capability check
is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup.  This
produces unexpected messages in the audit log, such as (on a system
running SELinux):

    type=AVC msg=audit(1408212798.866:410): avc:  denied
    { block_suspend } for  pid=7754 comm="dbus-daemon" capability=36
    scontext=unconfined_u:unconfined_r:unconfined_t
    tcontext=unconfined_u:unconfined_r:unconfined_t
    tclass=capability2 permissive=1

    type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
    success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
    pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
    fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
    exe="/usr/bin/dbus-daemon"
    subj=unconfined_u:unconfined_r:unconfined_t key=(null)

("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")

Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.

Fixes: 4d7e30d9 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arve Hjønnevåg <arve@android.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c680e41b

pnfs: fix filelayout_retry_commit when idx > 0 · 224ecbf5

由 Weston Andros Adamson 提交于 9月 09, 2014

filelayout_retry_commit was recently split out from alloc_ds_commits,
but was done in such a way that the bucket pointer always starts at
index 0 no matter what the @idx argument is set to.

The intention of the @idx argument is to retry commits starting at
bucket @idx. This is called when alloc_ds_commits fails for a bucket.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

224ecbf5

09 9月, 2014 7 次提交

nfs: revert "nfs4: queue free_lock_state job submission to nfsiod" · 0c0e0d3c

由 Jeff Layton 提交于 9月 08, 2014

This reverts commit 49a4bda2.

Christoph reported an oops due to the above commit:

generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1]
SMP
[ 2187.042899] Modules linked in:
[ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
[ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 2187.044287] Workqueue: nfsiod free_lock_state_work
[ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
[ 2187.044287] RIP: 0010:[<ffffffff81361ca6>]  [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP: 0018:ffff88007a4efd58  EFLAGS: 00010296
[ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
[ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
[ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
[ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
[ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
[ 2187.044287] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 2187.044287] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
[ 2187.044287] Stack:
[ 2187.044287]  ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
[ 2187.044287]  000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
[ 2187.044287]  0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
[ 2187.044287] Call Trace:
[ 2187.044287]  [<ffffffff810cc877>] process_one_work+0x1c7/0x490
[ 2187.044287]  [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490
[ 2187.044287]  [<ffffffff810cd569>] worker_thread+0x119/0x4f0
[ 2187.044287]  [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10
[ 2187.044287]  [<ffffffff810cd450>] ? init_pwq+0x190/0x190
[ 2187.044287]  [<ffffffff810d3c6f>] kthread+0xdf/0x100
[ 2187.044287]  [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
[ 2187.044287]  [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0
[ 2187.044287]  [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
[ 2187.044287] RIP  [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
[ 2187.044287]  RSP <ffff88007a4efd58>
[ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---

The original reason for this patch was because the fl_release_private
operation couldn't sleep. With commit ed9814d8 (locks: defer freeing
locks in locks_delete_lock until after i_lock has been dropped), this is
no longer a problem so we can revert this patch.
Reported-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

0c0e0d3c

nfs: fix kernel warning when removing proc entry · 21e81002

由 Cong Wang 提交于 9月 08, 2014

I saw the following kernel warning:

[ 1852.321222] ------------[ cut here ]------------
[ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b()
[ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes'
[ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540
[ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1852.354992] Workqueue: netns cleanup_net
[ 1852.358701]  0000000000000000 ffff880116f2fbd0 ffffffff819c03e9 ffff880116f2fc18
[ 1852.366474]  ffff880116f2fc08 ffffffff810744ee ffffffff811e0e6e ffff8800d4e96238
[ 1852.373507]  ffffffff81dbe665 ffff8800d46a5948 0000000000000005 ffff880116f2fc68
[ 1852.380224] Call Trace:
[ 1852.381976]  [<ffffffff819c03e9>] dump_stack+0x4d/0x66
[ 1852.385495]  [<ffffffff810744ee>] warn_slowpath_common+0x7a/0x93
[ 1852.389869]  [<ffffffff811e0e6e>] ? remove_proc_entry+0x154/0x16b
[ 1852.393987]  [<ffffffff8107457b>] warn_slowpath_fmt+0x4c/0x4e
[ 1852.397999]  [<ffffffff811e0e6e>] remove_proc_entry+0x154/0x16b
[ 1852.402034]  [<ffffffff8129c73d>] nfs_fs_proc_net_exit+0x53/0x56
[ 1852.406136]  [<ffffffff812a103b>] nfs_net_exit+0x12/0x1d
[ 1852.409774]  [<ffffffff81785bc9>] ops_exit_list+0x44/0x55
[ 1852.413529]  [<ffffffff81786389>] cleanup_net+0xee/0x182
[ 1852.417198]  [<ffffffff81088c9e>] process_one_work+0x209/0x40d
[ 1852.502320]  [<ffffffff81088bf7>] ? process_one_work+0x162/0x40d
[ 1852.587629]  [<ffffffff810890c1>] worker_thread+0x1f0/0x2c7
[ 1852.673291]  [<ffffffff81088ed1>] ? process_scheduled_works+0x2f/0x2f
[ 1852.759470]  [<ffffffff8108e079>] kthread+0xc9/0xd1
[ 1852.843099]  [<ffffffff8109427f>] ? finish_task_switch+0x3a/0xce
[ 1852.926518]  [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61
[ 1853.008565]  [<ffffffff819cbeac>] ret_from_fork+0x7c/0xb0
[ 1853.076477]  [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61
[ 1853.140653] ---[ end trace 69c4c6617f78e32d ]---

It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init()
while remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit().

Fixes: commit 65b38851 (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes)
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Dan Aloni <dan@kernelim.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
[Trond: replace uses of remove_proc_entry() with remove_proc_subtree()
as suggested by Al Viro]
Cc: stable@vger.kernel.org # 3.4.x : 65b38851: NFS: Fix /proc/fs/nfsfs/servers
Cc: stable@vger.kernel.org # 3.4.x
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

21e81002

Btrfs: use insert_inode_locked4 for inode creation · b0d5d10f

由 Chris Mason 提交于 9月 08, 2014

Btrfs was inserting inodes into the hash table before we had fully
set the inode up on disk.  This leaves us open to rare races that allow
two different inodes in memory for the same [root, inode] pair.

This patch fixes things by using insert_inode_locked4 to insert an I_NEW
inode and unlock_new_inode when we're ready for the rest of the kernel
to use the inode.

It also makes sure to init the operations pointers on the inode before
going into the error handling paths.
Signed-off-by: NChris Mason <clm@fb.com>
Reported-by: NAl Viro <viro@zeniv.linux.org.uk>

b0d5d10f

Btrfs: fix fsync data loss after a ranged fsync · 49dae1bc

由 Filipe Manana 提交于 9月 06, 2014

While we're doing a full fsync (when the inode has the flag
BTRFS_INODE_NEEDS_FULL_SYNC set) that is ranged too (covers only a
portion of the file), we might have ordered operations that are started
before or while we're logging the inode and that fall outside the fsync
range.

Therefore when a full ranged fsync finishes don't remove every extent
map from the list of modified extent maps - as for some of them, that
fall outside our fsync range, their respective ordered operation hasn't
finished yet, meaning the corresponding file extent item wasn't inserted
into the fs/subvol tree yet and therefore we didn't log it, and we must
let the next fast fsync (one that checks only the modified list) see this
extent map and log a matching file extent item to the log btree and wait
for its ordered operation to finish (if it's still ongoing).

A test case for xfstests follows.
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

49dae1bc

Btrfs: kfree()ing ERR_PTRs · c47ca32d

由 Dan Carpenter 提交于 9月 04, 2014

The "inherit" in btrfs_ioctl_snap_create_v2() and "vol_args" in
btrfs_ioctl_rm_dev() are ERR_PTRs so we can't call kfree() on them.

These kind of bugs are "One Err Bugs" where there is just one error
label that does everything. I could set the "inherit = NULL" and keep
the single out label but it ends up being more complicated that way. It
makes the code simpler to re-order the unwind so it's in the mirror
order of the allocation and introduce some new error labels.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NChris Mason <clm@fb.com>

c47ca32d

lockd: fix rpcbind crash on lockd startup failure · 7c17705e

由 J. Bruce Fields 提交于 8月 29, 2014

Nikita Yuschenko reported that booting a kernel with init=/bin/sh and
then nfs mounting without portmap or rpcbind running using a busybox
mount resulted in:

  # mount -t nfs 10.30.130.21:/opt /mnt
  svc: failed to register lockdv1 RPC service (errno 111).
  lockd_up: makesock failed, error=-111
  Unable to handle kernel paging request for data at address 0x00000030
  Faulting instruction address: 0xc055e65c
  Oops: Kernel access of bad area, sig: 11 [#1]
  MPC85xx CDS
  Modules linked in:
  CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
  task: cf29cea0 ti: cf35c000 task.ti: cf35c000
  NIP: c055e65c LR: c0566490 CTR: c055e648
  REGS: cf35dad0 TRAP: 0300   Not tainted  (3.10.44.cge)
  MSR: 00029000 <CE,EE,ME>  CR: 22442488  XER: 20000000
  DEAR: 00000030, ESR: 00000000

  GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
  00000000
  GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
  10090ae8
  GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
  bfa46ef0
  GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
  cf0ded80
  NIP [c055e65c] call_start+0x14/0x34
  LR [c0566490] __rpc_execute+0x70/0x250
  Call Trace:
  [cf35db80] [00000080] 0x80 (unreliable)
  [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
  [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
  [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
  [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
  [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
  [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
  [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
  [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
  [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
  [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
  [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
  [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
  [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
  [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
  [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
  [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
  [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
  [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c

The addition of svc_shutdown_net() resulted in two calls to
svc_rpcb_cleanup(); the second is no longer necessary and crashes when
it calls rpcb_register_call with clnt=NULL.
Reported-by: NNikita Yushchenko <nyushchenko@dev.rtsoft.ru>
Fixes: 679b033d "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
Cc: stable@vger.kernel.org
Acked-by: NJeff Layton <jlayton@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

7c17705e

nfsd4: fix rd_dircount enforcement · aee37764

由 J. Bruce Fields 提交于 8月 20, 2014

Commit 3b299709 "nfsd4: enforce rd_dircount" totally misunderstood
rd_dircount; it refers to total non-attribute bytes returned, not number
of directory entries returned.

Bring the code into agreement with RFC 3530 section 14.2.24.

Cc: stable@vger.kernel.org
Fixes: 3b299709 "nfsd4: enforce rd_dircount"
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

aee37764

08 9月, 2014 1 次提交

ufs: fix deadlocks introduced by sb mutex merge · 9ef7db7f

由 Alexey Khoroshilov 提交于 9月 02, 2014

Commit 0244756e ("ufs: sb mutex merge + mutex_destroy") introduces
deadlocks in ufs_new_inode() and ufs_free_inode().
Most callers of that functions acqure the mutex by themselves and
ufs_{new,free}_inode() do that via lock_ufs(),
i.e we have an unavoidable double lock.

The patch proposes to resolve the issue by making sure that
ufs_{new,free}_inode() are not called with the mutex held.

Found by Linux Driver Verification project (linuxtesting.org).

Cc: stable@vger.kernel.org # 3.16
Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9ef7db7f

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功