1. 25 Nov 2019, 23 commits
  2. 24 Nov 2019, 1 commit
  3. 23 Nov 2019, 3 commits
  4. 20 Nov 2019, 1 commit
    • afs: Fix missing timeout reset · c74386d5
      David Howells authored
      In afs_wait_for_call_to_complete(), rather than immediately aborting an
      operation if a signal occurs, the code attempts to wait for it to
      complete, using a schedule timeout of 2*RTT (or min 2 jiffies) and a
      check that we're still receiving relevant packets from the server before
      we consider aborting the call.  We may even ping the server to check on
      the status of the call.
      
      However, the timeout is not reset in the event that we do actually get
      a packet to process, so a couple of short stalls in a row can make the
      call time out even though progress is actually being made.
      
      Fix this by resetting the timeout any time we get something to process.
      If it's the failure of the call then the call state will get changed and
      we'll exit the loop shortly thereafter.
      
      A symptom of this is data fetches and stores failing with EINTR when
      they really shouldn't.
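      
      As an illustration, here is a minimal sketch of the wait-loop pattern
      (call_has_pending_work(), process_call_packets() and call_is_complete()
      are hypothetical stand-ins for the afs call-delivery machinery, not the
      exact kernel code):
      
      	for (;;) {
      		set_current_state(TASK_UNINTERRUPTIBLE);
      
      		if (call_has_pending_work(call)) {
      			__set_current_state(TASK_RUNNING);
      			process_call_packets(call);
      			timeout = rtt2;	/* the fix: restart the 2*RTT window */
      			continue;
      		}
      
      		if (call_is_complete(call))
      			break;
      
      		/* No progress: burn down what is left of the window. */
      		timeout = schedule_timeout(timeout);
      		if (!timeout)
      			break;	/* 2*RTT elapsed with no progress */
      	}
      	__set_current_state(TASK_RUNNING);
      
      Without the reset, timeout keeps shrinking across iterations, so several
      short stalls add up to a spurious expiry even while packets arrive.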
      
      Fixes: bc5e3a54 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c74386d5
  5. 16 Nov 2019, 1 commit
    • afs: Fix race in commit bulk status fetch · a28f239e
      David Howells authored
      When a lookup is done, the afs filesystem will perform a bulk status-fetch
      operation on the requested vnode (file) plus the next 49 other vnodes from
      the directory list (in AFS, directory contents are downloaded as blobs and
      parsed locally).  When the results are received, it will speculatively
      populate the inode cache from the extra data.
      
      However, if the lookup races with another lookup on the same directory,
      but for a different file - one that is among the 49 extra fetches - and
      the bulk status-fetch operation finishes first, it will try to update
      the inode belonging to the other lookup.
      
      If this other inode is still in the throes of being created, however, this
      will cause an assertion failure in afs_apply_status():
      
      	BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags));
      
      on or about fs/afs/inode.c:175, because it expects data to already be
      present that it can compare against.
      
      Fix this by skipping the update if the inode is being created as the
      creator will presumably set up the inode with the same information.
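      
      A minimal sketch of the guard (the flag and helper names follow the afs
      code, but the exact call signature here is an approximation):
      
      	vnode = AFS_FS_I(inode);
      	if (test_bit(AFS_VNODE_UNSET, &vnode->flags)) {
      		/* A racing lookup is still creating this inode; its
      		 * creator will populate it from its own status record,
      		 * so don't apply the speculative bulk-fetch result. */
      	} else {
      		afs_vnode_commit_status(fc, vnode, cb_break, NULL, scb);
      	}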
      
      Fixes: 39db9815 ("afs: Fix application of the results of a inline bulk status fetch")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a28f239e
  6. 15 Nov 2019, 2 commits
    • ceph: increment/decrement dio counter on async requests · 6a81749e
      Jeff Layton authored
      Ceph can in some cases issue an async DIO request, in which case we can
      end up calling ceph_end_io_direct before the I/O is actually complete.
      That may allow buffered operations to proceed while DIO requests are
      still in flight.
      
      Fix this by incrementing the i_dio_count when issuing an async DIO
      request, and decrementing it when tearing down the aio_req.
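      
      inode_dio_begin() and inode_dio_end() are the generic VFS helpers that
      manage this counter; a minimal sketch of how the fix pairs them (the
      placement comments approximate the ceph code):
      
      	/* When the DIO request goes async, before submission returns: */
      	inode_dio_begin(inode);	/* bump i_dio_count; buffered I/O must wait */
      
      	/* ... OSD requests complete asynchronously ... */
      
      	/* In the aio completion path, when tearing down the aio_req: */
      	inode_dio_end(inode);	/* drop i_dio_count, wake inode_dio_wait() */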
      
      Fixes: 321fe13c ("ceph: add buffered/direct exclusionary locking for reads and writes")
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      6a81749e
    • ceph: take the inode lock before acquiring cap refs · a81bc310
      Jeff Layton authored
      Most of the time, we (or the VFS layer) take the inode_lock and then
      acquire caps, but ceph_read_iter does the opposite, and that can lead
      to a deadlock.
      
      When there are multiple clients treading over the same data, we can end
      up in a situation where a reader takes caps and then tries to acquire
      the inode_lock. Another task holds the inode_lock and issues a request
      to the MDS which needs to revoke the caps, but that can't happen until
      the inode_lock is unwedged.
      
      Fix this by having ceph_read_iter take the inode_lock earlier, before
      attempting to acquire caps.
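      
      A minimal sketch of the reordering (argument lists elided; the helper
      names approximate the ceph code):
      
      	/* Before: ceph_get_caps(...) ran first and the inode lock was
      	 * taken afterwards, which deadlocks against an MDS revoke issued
      	 * on behalf of a task already holding the inode_lock. */
      
      	/* After: lock first, then take cap references. */
      	ceph_start_io_read(inode);	/* takes the inode lock */
      	ret = ceph_get_caps(...);	/* then acquire CEPH_CAP_FILE_RD caps */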
      
      Fixes: 321fe13c ("ceph: add buffered/direct exclusionary locking for reads and writes")
      Link: https://tracker.ceph.com/issues/36348
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      a81bc310
  7. 14 Nov 2019, 2 commits
  8. 12 Nov 2019, 2 commits
    • io_uring: make timeout sequence == 0 mean no sequence · 93bd25bb
      Jens Axboe authored
      Currently we make sequence == 0 be the same as sequence == 1, but that's
      not super useful if the intent is really to have a timeout that's just
      a pure timeout.
      
      If the user passes in sqe->off == 0, then don't apply any sequence logic
      to the request, let it purely be driven by the timeout specified.
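      
      A minimal sketch of the new semantics (REQ_F_TIMEOUT_NOSEQ is the flag
      the commit introduces; the surrounding structure is approximated):
      
      	count = READ_ONCE(sqe->off);
      	if (!count) {
      		/* Pure timeout, never tied to a completion count. */
      		req->flags |= REQ_F_TIMEOUT_NOSEQ;
      	} else {
      		/* Fires after 'count' completions or when the timer expires. */
      		req->sequence = ctx->cached_sq_head + count - 1;
      	}
      
      From userspace this corresponds to passing a count of 0 to liburing's
      io_uring_prep_timeout(), leaving expiry driven purely by the timespec.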
      Reported-by: 李通洲 <carter.li@eoitek.com>
      Reviewed-by: 李通洲 <carter.li@eoitek.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      93bd25bb
    • Btrfs: fix log context list corruption after rename exchange operation · e6c61710
      Filipe Manana authored
      During a rename exchange we might successfully log the new name in the
      source root's log tree, in which case we leave our log context
      (allocated on the stack) in the root's list of log contexts. However,
      we might then fail to log the new name in the destination root, in
      which case we fall back to a transaction commit later and never sync
      the log of the source root, so the source root's log context remains
      in the list of log contexts. This later causes invalid memory accesses,
      because the context was allocated on the stack and, after the rename
      exchange finishes, the stack gets reused and overwritten for other
      purposes.
      
      The kernel's linked list corruption detector (CONFIG_DEBUG_LIST=y) can
      detect this and report something like the following:
      
        [  691.489929] ------------[ cut here ]------------
        [  691.489947] list_add corruption. prev->next should be next (ffff88819c944530), but was ffff8881c23f7be4. (prev=ffff8881c23f7a38).
        [  691.489967] WARNING: CPU: 2 PID: 28933 at lib/list_debug.c:28 __list_add_valid+0x95/0xe0
        (...)
        [  691.489998] CPU: 2 PID: 28933 Comm: fsstress Not tainted 5.4.0-rc6-btrfs-next-62 #1
        [  691.490001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
        [  691.490003] RIP: 0010:__list_add_valid+0x95/0xe0
        (...)
        [  691.490007] RSP: 0018:ffff8881f0b3faf8 EFLAGS: 00010282
        [  691.490010] RAX: 0000000000000000 RBX: ffff88819c944530 RCX: 0000000000000000
        [  691.490011] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffffa2c497e0
        [  691.490013] RBP: ffff8881f0b3fe68 R08: ffffed103eaa4115 R09: ffffed103eaa4114
        [  691.490015] R10: ffff88819c944000 R11: ffffed103eaa4115 R12: 7fffffffffffffff
        [  691.490016] R13: ffff8881b4035610 R14: ffff8881e7b84728 R15: 1ffff1103e167f7b
        [  691.490019] FS:  00007f4b25ea2e80(0000) GS:ffff8881f5500000(0000) knlGS:0000000000000000
        [  691.490021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        [  691.490022] CR2: 00007fffbb2d4eec CR3: 00000001f2a4a004 CR4: 00000000003606e0
        [  691.490025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        [  691.490027] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        [  691.490029] Call Trace:
        [  691.490058]  btrfs_log_inode_parent+0x667/0x2730 [btrfs]
        [  691.490083]  ? join_transaction+0x24a/0xce0 [btrfs]
        [  691.490107]  ? btrfs_end_log_trans+0x80/0x80 [btrfs]
        [  691.490111]  ? dget_parent+0xb8/0x460
        [  691.490116]  ? lock_downgrade+0x6b0/0x6b0
        [  691.490121]  ? rwlock_bug.part.0+0x90/0x90
        [  691.490127]  ? do_raw_spin_unlock+0x142/0x220
        [  691.490151]  btrfs_log_dentry_safe+0x65/0x90 [btrfs]
        [  691.490172]  btrfs_sync_file+0x9f1/0xc00 [btrfs]
        [  691.490195]  ? btrfs_file_write_iter+0x1800/0x1800 [btrfs]
        [  691.490198]  ? rcu_read_lock_any_held.part.11+0x20/0x20
        [  691.490204]  ? __do_sys_newstat+0x88/0xd0
        [  691.490207]  ? cp_new_stat+0x5d0/0x5d0
        [  691.490218]  ? do_fsync+0x38/0x60
        [  691.490220]  do_fsync+0x38/0x60
        [  691.490224]  __x64_sys_fdatasync+0x32/0x40
        [  691.490228]  do_syscall_64+0x9f/0x540
        [  691.490233]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
        [  691.490235] RIP: 0033:0x7f4b253ad5f0
        (...)
        [  691.490239] RSP: 002b:00007fffbb2d6078 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
        [  691.490242] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f4b253ad5f0
        [  691.490244] RDX: 00007fffbb2d5fe0 RSI: 00007fffbb2d5fe0 RDI: 0000000000000003
        [  691.490245] RBP: 000000000000000d R08: 0000000000000001 R09: 00007fffbb2d608c
        [  691.490247] R10: 00000000000002e8 R11: 0000000000000246 R12: 00000000000001f4
        [  691.490248] R13: 0000000051eb851f R14: 00007fffbb2d6120 R15: 00005635a498bda0
      
      This started happening recently when running some test cases from
      fstests, such as btrfs/004, because support for rename exchange was
      added to fsstress (part of fstests) last week.
      
      So fix this by deleting the log context for the source root from the list
      if we have logged the new name in the source root.
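      
      A minimal sketch of the cleanup (modeled on the fix; the helper shape
      and the variable names sync_log_root/ctx_root are approximations):
      
      	static inline void btrfs_remove_log_ctx(struct btrfs_root *root,
      						struct btrfs_log_ctx *ctx)
      	{
      		mutex_lock(&root->log_mutex);
      		list_del_init(&ctx->list);	/* ctx is on the caller's stack */
      		mutex_unlock(&root->log_mutex);
      	}
      
      	/* On the fallback path of the rename exchange: the new name was
      	 * logged in the source root, but we are committing the transaction
      	 * instead of syncing the log, so unlink the on-stack context. */
      	if (sync_log_root && commit_transaction)
      		btrfs_remove_log_ctx(root, &ctx_root);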
      Reported-by: Su Yue <Damenly_Su@gmx.com>
      Fixes: d4682ba0 ("Btrfs: sync log after logging new name")
      CC: stable@vger.kernel.org # 4.19+
      Tested-by: Su Yue <Damenly_Su@gmx.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      e6c61710
  9. 11 Nov 2019, 4 commits
    • ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either · 762c6968
      Al Viro authored
      We need to get the underlying dentry of the parent; sure, absent races
      it is the parent of the underlying dentry, but there's nothing to
      prevent losing a timeslice to preemption in the middle of evaluating
      lower_dentry->d_parent->d_inode, having another process move
      lower_dentry around, and having its (ex-)parent no longer pinned and
      freed under memory pressure.  Then we regain the CPU and try to fetch
      ->d_inode from memory that has been freed by that point.
      
      dentry->d_parent *is* stable here - it's an argument of ->lookup() and
      we are guaranteed that it won't be moved anywhere until we feed it
      to d_add/d_splice_alias.  So we safely go that way to get to its
      underlying dentry.
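      
      A minimal sketch of the change (the shapes approximate the ecryptfs
      code):
      
      	/* Racy: lower_dentry may be moved concurrently, so its ->d_parent
      	 * can become unpinned and freed between these dereferences. */
      	lower_dir_inode = ecryptfs_inode_to_lower(lower_dentry->d_parent->d_inode);
      
      	/* Safe: dentry->d_parent is the ->lookup() argument and cannot
      	 * move until d_add()/d_splice_alias(), so go through it instead. */
      	lower_dir_inode = ecryptfs_inode_to_lower(d_inode(dentry->d_parent));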
      
      Cc: stable@vger.kernel.org # since 2009 or so
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      762c6968
    • ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable · e72b9dd6
      Al Viro authored
      lower_dentry can't go from positive to negative (we have it pinned),
      but it *can* go from negative to positive.  So fetching ->d_inode
      into a local variable, doing a blocking allocation, checking that
      now ->d_inode is non-NULL and feeding the value we'd fetched
      earlier to a function that won't accept NULL is not a good idea.
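      
      A minimal sketch of the hazard and the fix (names approximate the
      ecryptfs code):
      
      	struct inode *lower_inode = d_inode(lower_dentry); /* may be NULL */
      
      	/* ... blocking allocation: lower_dentry can go positive here ... */
      
      	if (!d_is_negative(lower_dentry))	/* ->d_inode is non-NULL now */
      		inode = __ecryptfs_get_inode(lower_inode, sb); /* stale NULL! */
      
      	/* The fix: fetch ->d_inode at the point of use instead. */
      	if (!d_is_negative(lower_dentry))
      		inode = __ecryptfs_get_inode(d_inode(lower_dentry), sb);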
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e72b9dd6
    • ecryptfs: fix unlink and rmdir in face of underlying fs modifications · bcf0d9d4
      Al Viro authored
      A problem similar to the one caught in commit 74dd7c97 ("ecryptfs_rename():
      verify that lower dentries are still OK after lock_rename()") exists for
      unlink/rmdir as well.
      
      Instead of playing with dget_parent() of the underlying dentry of the
      victim and hoping it's the same as the underlying dentry of our
      directory, do the following:
              * find the underlying dentry of the victim
              * find the underlying directory of the victim's parent (stable,
                since the victim is an ecryptfs dentry and the inode of its
                parent is held exclusive by the caller)
              * lock the inode of the dentry underlying the victim's parent
              * check that the underlying dentry of the victim is still
                hashed and has the right parent - it can be moved, but it
                can't be moved to/from the directory we are holding
                exclusive, so while ->d_parent itself might not be stable,
                the result of the comparison is
      
      If the check passes, everything is fine - the underlying directory is
      locked, the underlying victim is still a child of that directory, and
      we can go ahead and feed them to vfs_unlink().  As in current mainline,
      we need to pin the underlying dentry of the victim so that it won't go
      negative under us, but that's the only temporary reference that needs
      to be grabbed there.  The underlying dentry of the parent won't go away
      (it's pinned by the parent, which is held by the caller), so there's
      no need to grab it.
      
      The same problem (with the same solution) exists for rmdir.  Moreover,
      rename gets simpler and more robust with the same "don't bother with
      dget_parent()" approach.
      
      Fixes: 74dd7c97 ("ecryptfs_rename(): verify that lower dentries are still OK after lock_rename()")
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bcf0d9d4
  10. 09 Nov 2019, 1 commit