Commits · d17abdf7566566fc402c31899b353044a7ff3cf4 · openeuler / Kernel

14 Dec, 2020 · 14 commits

cifs: add an smb3_fs_context to cifs_sb · d17abdf7

Committed by Ronnie Sahlberg on Nov 10, 2020

and populate it during mount in cifs_smb3_do_mount()
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

d17abdf7

cifs: remove the devname argument to cifs_compose_mount_options · 4deb0759

Committed by Ronnie Sahlberg on Dec 10, 2020

None of the callers use this argument any more.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

4deb0759

cifs: switch to new mount api · 24e0a1ef

Committed by Ronnie Sahlberg on Dec 10, 2020

See Documentation/filesystems/mount_api.rst for details on the new mount API.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

24e0a1ef

cifs: move cifs_parse_devname to fs_context.c · 66e7b09c

Committed by Ronnie Sahlberg on Nov 05, 2020

Also rename the function from cifs_ to smb3_
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

66e7b09c

cifs: move the enum for cifs parameters into fs_context.h · 15c7d09a

Committed by Ronnie Sahlberg on Nov 02, 2020

No change to logic, just moving the enum of cifs mount parms into a header
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

15c7d09a

cifs: rename dup_vol to smb3_fs_context_dup and move it into fs_context.c · 837e3a1b

Committed by Ronnie Sahlberg on Nov 02, 2020

Continue restructuring needed for support of new mount API
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

837e3a1b

cifs: rename smb_vol as smb3_fs_context and move it to fs_context.h · 3fa1c6d1

Committed by Ronnie Sahlberg on Dec 09, 2020

Harmonize and change all such variables to 'ctx', where possible.
No changes to actual logic.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

3fa1c6d1

SMB3.1.1: do not log warning message if server doesn't populate salt · 7955f105

Committed by Steve French on Dec 09, 2020

In the negotiate protocol preauth context, the server is not required
to populate the salt (although it is done by most servers) so do
not warn on mount.

We retain the checks (warn) that the preauth context is the minimum
size and that the salt does not exceed DataLength of the SMB response.
Although we use the defaults in the case that the preauth context
response is invalid, these checks may be useful in the future
as servers add support for additional mechanisms.

CC: Stable <stable@vger.kernel.org>
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

7955f105

SMB3.1.1: update comments clarifying SPNEGO info in negprot response · 145024e3

Committed by Steve French on Dec 09, 2020

Trivial changes to clarify a confusing comment about the
SPNEGO blob (and also one length comparison in negotiate
context parsing).
Suggested-by: Tom Talpey <tom@talpey.com>
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

145024e3

cifs: Enable sticky bit with cifsacl mount option. · f2156d35

Committed by Shyam Prasad N on Nov 09, 2020

For the cifsacl mount option, we did not support sticky bits.
With this patch, we do support them, by setting the DELETE_CHILD perm
on the directory only for the owner user. When the sticky bit is not
enabled, allow DELETE_CHILD perm for everyone.
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

f2156d35

cifs: Fix unix perm bits to cifsacl conversion for "other" bits. · 0f22053e

Committed by Shyam Prasad N on Aug 17, 2020

With the "cifsacl" mount option, the mode bits set on the file/dir
are converted to corresponding ACEs in DACL. However, only the
ALLOWED ACEs were being set for "owner" and "group" SIDs. Since
owner is a subset of group, and group is a subset of
everyone/world SID, in order to properly emulate unix perm groups,
we need to add DENIED ACEs. If we don't do that, "owner" and "group"
SIDs could get more access rights than they should. Which is what
was happening. This fixes it.

We try to keep the "preferred" order of ACEs, i.e. DENYs followed
by ALLOWs. However, for a small subset of cases we cannot
maintain the preferred order. In that case, we'll end up with the
DENY ACE for group after the ALLOW for the owner.

If owner SID == group SID, use the more restrictive
among the two perm bits and convert them to ACEs.
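
As a rough illustration of the DENY rule described above (not the kernel's actual code), the sketch below computes which rwx bits would need a DENIED ACE for the owner and for the group. The struct and function names are hypothetical, and the owner SID == group SID special case is not modelled.

```c
#include <stdint.h>

/* Hypothetical container: one rwx triplet (0..7) per permission class. */
struct perm_bits {
    uint8_t owner;
    uint8_t group;
    uint8_t other;
};

/*
 * The owner is also covered by the group and everyone ALLOW ACEs, so any
 * bit granted to group or other but not to the owner must be denied.
 */
static uint8_t owner_deny_bits(struct perm_bits p)
{
    return (uint8_t)((p.group | p.other) & ~p.owner & 07);
}

/*
 * The group is also covered by the everyone ALLOW ACE, so any bit granted
 * to other but not to the group must be denied.
 */
static uint8_t group_deny_bits(struct perm_bits p)
{
    return (uint8_t)(p.other & ~p.group & 07);
}
```

For a mode such as 0047 the owner would need all three bits denied and the group would need write and execute denied; for 0755 neither helper reports any bits, so no DENIED ACE would be needed.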

Also, for reverse mapping, i.e. to convert ACL to unix perm bits,
for the "others" bits, we needed to add the masked bits of the
owner and group masks to others mask.

Updated version of patch fixes a problem noted by the kernel
test robot.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

0f22053e

SMB3.1.1: remove confusing mount warning when no SPNEGO info on negprot rsp · bc7c4129

Committed by Steve French on Dec 09, 2020

Azure does not send an SPNEGO blob in the negotiate protocol response,
so we shouldn't assume that it is there when validating the location
of the first negotiate context.  This avoids the potential confusing
mount warning:

   CIFS: Invalid negotiate context offset

CC: Stable <stable@vger.kernel.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

bc7c4129

SMB3: avoid confusing warning message on mount to Azure · ebcd6de9

Committed by Steve French on Dec 08, 2020

Mounts to Azure cause an unneeded warning message in dmesg
   "CIFS: VFS: parse_server_interfaces: incomplete interface info"

Azure rounds up the size (by 8 additional bytes, to a
16 byte boundary) of the structure returned on the query
of the server interfaces at mount time.  This is permissible
even though different than other servers so do not log a warning
if query network interfaces response is only rounded up by 8
bytes or fewer.
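
A minimal sketch of the tolerance described above, assuming a hypothetical helper that is handed the number of bytes left in the response after the last parsed interface entry. The names and the 152-byte size used in the usage note are illustrative only, not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Padding needed to round size up to the next multiple of boundary. */
static size_t padding_to_boundary(size_t size, size_t boundary)
{
    size_t rem = size % boundary;

    return rem ? boundary - rem : 0;
}

/*
 * Warn only when the leftover exceeds what rounding can explain.
 * The 8-byte limit comes from the commit message above.
 */
static bool should_warn_incomplete(size_t bytes_left)
{
    return bytes_left > 8;
}
```

With an illustrative 152-byte structure rounded to a 16-byte boundary, padding_to_boundary(152, 16) yields 8, which should_warn_incomplete() accepts silently.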

CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

ebcd6de9

cifs: Fix fall-through warnings for Clang · 21ac58f4

Committed by Gustavo A. R. Silva on Nov 20, 2020

In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple
warnings by explicitly adding multiple break/goto statements instead of
just letting the code fall through to the next case.
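
The kind of change described here can be sketched with a small switch statement. The enum and function below are hypothetical and only show the pattern of ending every case explicitly so that -Wimplicit-fallthrough has nothing to report.

```c
/* Hypothetical connection states, for illustration only. */
enum conn_state { CONN_NEW, CONN_READY, CONN_CLOSED };

static int state_weight(enum conn_state s)
{
    int weight = 0;

    switch (s) {
    case CONN_NEW:
        weight = 1;
        break;  /* explicit: without it, control would continue into CONN_READY */
    case CONN_READY:
        weight = 2;
        break;
    default:
        weight = -1;
        break;
    }
    return weight;
}
```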

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

21ac58f4

12 Dec, 2020 · 1 commit

proc: use untagged_addr() for pagemap_read addresses · 40d6366e

Committed by Miles Chen on Dec 11, 2020

When we try to visit the pagemap of a tagged userspace pointer, we find
that the start_vaddr is not correct because of the tag.
To fix it, we should untag the userspace pointers in pagemap_read().

I tested with 5.10-rc4 and the issue remains.

Explanation from Catalin in [1]:

 "Arguably, that's a user-space bug since tagged file offsets were never
  supported. In this case it's not even a tag at bit 56 as per the arm64
  tagged address ABI but rather down to bit 47. You could say that the
  problem is caused by the C library (malloc()) or whoever created the
  tagged vaddr and passed it to this function. It's not a kernel
  regression as we've never supported it.

  Now, pagemap is a special case where the offset is usually not
  generated as a classic file offset but rather derived by shifting a
  user virtual address. I guess we can make a concession for pagemap
  (only) and allow such offset with the tag at bit (56 - PAGE_SHIFT + 3)"

My test code is based on [2]:

A userspace pointer which has been tagged by 0xb4: 0xb400007662f541c8

userspace program:

  uint64 OsLayer::VirtualToPhysical(void *vaddr) {
	uint64 frame, paddr, pfnmask, pagemask;
	int pagesize = sysconf(_SC_PAGESIZE);
	off64_t off = ((uintptr_t)vaddr) / pagesize * 8; // off = 0xb400007662f541c8 / pagesize * 8 = 0x5a00003b317aa0
	int fd = open(kPagemapPath, O_RDONLY);
	...

	if (lseek64(fd, off, SEEK_SET) != off || read(fd, &frame, 8) != 8) {
		int err = errno;
		string errtxt = ErrorString(err);
		if (fd >= 0)
			close(fd);
		return 0;
	}
  ...
  }

kernel fs/proc/task_mmu.c:

  static ssize_t pagemap_read(struct file *file, char __user *buf,
		size_t count, loff_t *ppos)
  {
	...
	src = *ppos;
	svpfn = src / PM_ENTRY_BYTES; // svpfn == 0xb400007662f54
	start_vaddr = svpfn << PAGE_SHIFT; // start_vaddr == 0xb400007662f54000
	end_vaddr = mm->task_size;

	/* watch out for wraparound */
	// svpfn == 0xb400007662f54
	// (mm->task_size >> PAGE) == 0x8000000
	if (svpfn > mm->task_size >> PAGE_SHIFT) // the condition is true because of the tag 0xb4
		start_vaddr = end_vaddr;

	ret = 0;
	while (count && (start_vaddr < end_vaddr)) { // we cannot visit correct entry because start_vaddr is set to end_vaddr
		int len;
		unsigned long end;
		...
	}
	...
  }

[1] https://lore.kernel.org/patchwork/patch/1343258/
[2] https://github.com/stressapptest/stressapptest/blob/master/src/os.cc#L158
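
The offset arithmetic in the two snippets above can be reproduced in isolation. This is only a sketch: it assumes 4 KiB pages and 8-byte pagemap entries, and clear_top_byte() is a simplified stand-in for untagging that clears bits 56..63, not the kernel's untagged_addr(). All names are illustrative.

```c
#include <stdint.h>

/* Illustrative constants: 4 KiB pages, 8-byte pagemap entries. */
#define EX_PAGE_SHIFT     12
#define EX_PM_ENTRY_BYTES 8

/* File offset a userspace tool derives from a virtual address. */
static uint64_t pagemap_offset(uint64_t vaddr)
{
    return (vaddr >> EX_PAGE_SHIFT) * EX_PM_ENTRY_BYTES;
}

/* Virtual address the reader reconstructs from that file offset. */
static uint64_t offset_to_vaddr(uint64_t off)
{
    return (off / EX_PM_ENTRY_BYTES) << EX_PAGE_SHIFT;
}

/* Simplified stand-in for untagging: clear the top byte (bits 56..63). */
static uint64_t clear_top_byte(uint64_t addr)
{
    return addr & ((UINT64_C(1) << 56) - 1);
}
```

For the tagged pointer 0xb400007662f541c8, pagemap_offset() gives 0x5a00003b317aa0 and offset_to_vaddr() reconstructs 0xb400007662f54000, matching the values in the comments above. Clearing the top byte leaves 0x7662f54000, an address that no longer trips the wraparound check.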

Link: https://lkml.kernel.org/r/20201204024347.8295-1-miles.chen@mediatek.com
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
Cc: <stable@vger.kernel.org>	[5.4-]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

40d6366e

11 Dec, 2020 · 3 commits

NFS: Disable READ_PLUS by default · 21e31401

Committed by Anna Schumaker on Dec 03, 2020

We've been seeing failures with xfstests generic/091 and generic/263
when using READ_PLUS. I've made some progress on these issues, and the
tests fail later on but still don't pass. Let's disable READ_PLUS by
default until we can work out what is going on.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

21e31401

NFSv4.2: Fix 5 seconds delay when doing inter server copy · fe8eb820

Committed by Dai Ngo on Nov 23, 2020

Since commit b4868b44 ("NFSv4: Wait for stateid updates after
CLOSE/OPEN_DOWNGRADE"), every inter server copy operation suffers 5
seconds delay regardless of the size of the copy. The delay is from
nfs_set_open_stateid_locked when the check by nfs_stateid_is_sequential
fails because the seqid in both nfs4_state and nfs4_stateid are 0.

Fix __nfs42_ssc_open to delay setting of NFS_OPEN_STATE in nfs4_state,
until after the call to update_open_stateid, to indicate this is the 1st
open. This fix is one of two patches; the other patch is the fix in the
source server to return the stateid for COPY_NOTIFY request with seqid 1
instead of 0.

Fixes: ce0887ac ("NFSD add nfs4 inter ssc to nfsd4_copy")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

fe8eb820

NFS: Fix rpcrdma_inline_fixup() crash with new LISTXATTRS operation · 1c87b851

Committed by Chuck Lever on Nov 24, 2020

By switching to an XFS-backed export, I am able to reproduce the
ibcomp worker crash on my client with xfstests generic/013.

For the failing LISTXATTRS operation, xdr_inline_pages() is called
with page_len=12 and buflen=128.

- When ->send_request() is called, rpcrdma_marshal_req() does not
  set up a Reply chunk because buflen is smaller than the inline
  threshold. Thus rpcrdma_convert_iovs() does not get invoked at
  all and the transport's XDRBUF_SPARSE_PAGES logic is not invoked
  on the receive buffer.

- During reply processing, rpcrdma_inline_fixup() tries to copy
  received data into rq_rcv_buf->pages because page_len is positive.
  But there are no receive pages because rpcrdma_marshal_req() never
  allocated them.

The result is that the ibcomp worker faults and dies. Sometimes that
causes a visible crash, and sometimes it results in a transport hang
without other symptoms.

RPC/RDMA's XDRBUF_SPARSE_PAGES support is not entirely correct, and
should eventually be fixed or replaced. However, my preference is
that upper-layer operations should explicitly allocate their receive
buffers (using GFP_KERNEL) when possible, rather than relying on
XDRBUF_SPARSE_PAGES.
Reported-by: Olga kornievskaia <kolga@netapp.com>
Suggested-by: Olga kornievskaia <kolga@netapp.com>
Fixes: c10a7514 ("NFSv4.2: add the extended attribute proc functions.")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga kornievskaia <kolga@netapp.com>
Reviewed-by: Frank van der Linden <fllinden@amazon.com>
Tested-by: Olga kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

1c87b851

10 Dec, 2020 · 1 commit

zonefs: fix page reference and BIO leak · 6bea0225

Committed by Damien Le Moal on Dec 09, 2020

In zonefs_file_dio_append(), the pages obtained using
bio_iov_iter_get_pages() are not released on completion of the
REQ_OP_APPEND BIO, nor when bio_iov_iter_get_pages() fails.
Furthermore, a call to bio_put() is missing when
bio_iov_iter_get_pages() fails.

Fix these resource leaks by adding BIO resource release code (bio_put()
and bio_release_pages()) at the end of the function after the BIO
execution and add a jump to this resource cleanup code in case of
bio_iov_iter_get_pages() failure.

While at it, also fix the call to task_io_account_write() to be passed
the correct BIO size instead of bio_iov_iter_get_pages() return value.
Reported-by: Christoph Hellwig <hch@lst.de>
Fixes: 02ef12a6 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

6bea0225

09 Dec, 2020 · 1 commit

afs: Fix memory leak when mounting with multiple source parameters · 4cb68296

Committed by David Howells on Dec 08, 2020

There's a memory leak in afs_parse_source() whereby multiple source=
parameters overwrite fc->source in the fs_context struct without freeing
the previously recorded source.

Fix this by only permitting a single source parameter and rejecting with
an error all subsequent ones.
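
The "accept one source, reject the rest" rule can be sketched in plain C. The struct below is a hypothetical stand-in for the fs_context, and set_source() is an illustrative name, not the AFS parser itself. The point is that the stored string is never overwritten, so nothing is leaked.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the mount context; only the field we need. */
struct mount_ctx {
    char *source;
};

/*
 * Record the source once. A second call is rejected instead of
 * overwriting (and thereby leaking) the earlier allocation.
 */
static int set_source(struct mount_ctx *ctx, const char *value)
{
    char *copy;

    if (ctx->source)
        return -EINVAL;

    copy = malloc(strlen(value) + 1);
    if (!copy)
        return -ENOMEM;
    strcpy(copy, value);

    ctx->source = copy;
    return 0;
}
```

A caller that passes two source values gets 0 for the first and -EINVAL for the second, and the context still holds the first string.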

This was caught by syzbot with the kernel memory leak detector, showing
something like the following trace:

  unreferenced object 0xffff888114375440 (size 32):
    comm "repro", pid 5168, jiffies 4294923723 (age 569.948s)
    backtrace:
      slab_post_alloc_hook+0x42/0x79
      __kmalloc_track_caller+0x125/0x16a
      kmemdup_nul+0x24/0x3c
      vfs_parse_fs_string+0x5a/0xa1
      generic_parse_monolithic+0x9d/0xc5
      do_new_mount+0x10d/0x15a
      do_mount+0x5f/0x8e
      __do_sys_mount+0xff/0x127
      do_syscall_64+0x2d/0x3a
      entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 13fcc683 ("afs: Add fs_context support")
Reported-by: syzbot+86dc6632faaca40133ab@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4cb68296

08 Dec, 2020 · 1 commit

io_uring: fix file leak on error path of io ctx creation · f26c08b4

Committed by Hillf Danton on Dec 08, 2020

Put file as part of error handling when setting up io ctx to fix
memory leaks like the following one.

   BUG: memory leak
   unreferenced object 0xffff888101ea2200 (size 256):
     comm "syz-executor355", pid 8470, jiffies 4294953658 (age 32.400s)
     hex dump (first 32 bytes):
       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
       20 59 03 01 81 88 ff ff 80 87 a8 10 81 88 ff ff   Y..............
     backtrace:
       [<000000002e0a7c5f>] kmem_cache_zalloc include/linux/slab.h:654 [inline]
       [<000000002e0a7c5f>] __alloc_file+0x1f/0x130 fs/file_table.c:101
       [<000000001a55b73a>] alloc_empty_file+0x69/0x120 fs/file_table.c:151
       [<00000000fb22349e>] alloc_file+0x33/0x1b0 fs/file_table.c:193
       [<000000006e1465bb>] alloc_file_pseudo+0xb2/0x140 fs/file_table.c:233
       [<000000007118092a>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
       [<000000007118092a>] anon_inode_getfile+0xaa/0x120 fs/anon_inodes.c:74
       [<000000002ae99012>] io_uring_get_fd fs/io_uring.c:9198 [inline]
       [<000000002ae99012>] io_uring_create fs/io_uring.c:9377 [inline]
       [<000000002ae99012>] io_uring_setup+0x1125/0x1630 fs/io_uring.c:9411
       [<000000008280baad>] do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       [<00000000685d8cf0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+71c4697e27c99fddcf17@syzkaller.appspotmail.com
Fixes: 0f212204 ("io_uring: don't rely on weak ->files references")
Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

f26c08b4

07 Dec, 2020 · 2 commits

io_uring: fix mis-setting personality's creds · e8c954df

Committed by Pavel Begunkov on Dec 06, 2020

After io_identity_cow() copies a work.identity it wants to copy creds
to the new just allocated id, not the old one. Otherwise it's
akin to req->work.identity->creds = req->work.identity->creds.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

e8c954df

coredump: fix core_pattern parse error · 2bf509d9

Committed by Menglong Dong on Dec 05, 2020

'format_corename()' will split 'core_pattern' on spaces when it is in
pipe mode, and take helper_argv[0] as the path to usermode executable.
It works fine in most cases.

However, if there is a space between '|' and '/file/path', such as
'| /usr/lib/systemd/systemd-coredump %P %u %g', then helper_argv[0] will
be parsed as '', and users will get a 'Core dump to | disabled'.

It is not friendly to users, as the pattern above was valid previously.
Fix this by ignoring the spaces between '|' and '/file/path'.
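
A minimal userspace sketch of the fix's idea, assuming a hypothetical helper that returns where the helper path begins in a pipe-mode pattern. It is not the kernel's format_corename(); it only shows the spaces after '|' being skipped before the first argument is taken.

```c
#include <stddef.h>

/*
 * Return a pointer to the first character of the helper path in a
 * pipe-mode core pattern, or NULL if the pattern is not in pipe mode.
 * Spaces between '|' and the path are skipped.
 */
static const char *pipe_helper_start(const char *pattern)
{
    if (*pattern != '|')
        return NULL;

    pattern++;
    while (*pattern == ' ')
        pattern++;

    return pattern;
}
```

With the pattern from the commit message, '| /usr/lib/systemd/systemd-coredump %P %u %g', the returned pointer lands on the leading '/' rather than on an empty first argument.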

Fixes: 315c6926 ("coredump: split pipe command whitespace before expanding template")
Signed-off-by: Menglong Dong <dong.menglong@zte.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Wise <pabs3@bonedaddy.net>
Cc: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/5fb62870.1c69fb81.8ef5d.af76@mx.google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2bf509d9

04 Dec, 2020 · 3 commits

cifs: refactor create_sd_buf() and avoid corrupting the buffer · ea64370b

Committed by Ronnie Sahlberg on Nov 30, 2020

When mounting with "idsfromsid" mount option, Azure
corrupted the owner SIDs due to excessive padding
caused by placing the owner fields at the end of the
security descriptor on create.  Placing owners at the
front of the security descriptor (rather than the end)
is also safer, as the number of ACEs (that follow it)
is variable.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Suggested-by: Rohith Surabattula <rohiths@microsoft.com>
CC: Stable <stable@vger.kernel.org> # v5.8
Signed-off-by: Steve French <stfrench@microsoft.com>

ea64370b

cifs: add NULL check for ses->tcon_ipc · 59463eb8

Committed by Aurelien Aptel on Dec 03, 2020

In some scenarios (DFS and BAD_NETWORK_NAME) set_root_set() can be
called with a NULL ses->tcon_ipc.
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>

59463eb8

smb3: set COMPOUND_FID to FileID field of subsequent compound request · 79631784

Committed by Namjae Jeon on Dec 03, 2020

For an operation compounded with an SMB2 CREATE request, the client must set
the FileID field of the SMB2 ioctl to COMPOUND_FID (0xFFFFFFFFFFFFFFFF).
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Fixes: 2e4564b3 ("smb3: add support stat of WSL reparse points for special file types")
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

79631784

02 Dec, 2020 · 2 commits

fs: 9p: add generic splice_write file operation · 960f4f8a

Committed by Dominique Martinet on Dec 01, 2020

The default splice operations got removed recently, add it back to 9p
with iter_file_splice_write like many other filesystems do.

Link: http://lkml.kernel.org/r/1606837496-21717-1-git-send-email-asmadeus@codewreck.org
Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

960f4f8a

fs: 9p: add generic splice_read file operations · cf03f316

Committed by Toke Høiland-Jørgensen on Dec 01, 2020

The v9fs file operations were missing the splice_read operations, which
breaks sendfile() of files on such a filesystem. I discovered this while
trying to load an eBPF program using iproute2 inside a 'virtme' environment
which uses 9pfs for the virtual file system. iproute2 relies on sendfile()
with an AF_ALG socket to hash files, which was erroring out in the virtual
environment.

Since generic_file_splice_read() seems to just implement splice_read in
terms of the read_iter operation, I simply added the generic implementation
to the file operations, which fixed the error I was seeing. A quick grep
indicates that this is what most other file systems do as well.

Link: http://lkml.kernel.org/r/20201201135409.55510-1-toke@redhat.com
Fixes: 36e2c742 ("fs: don't allow splice read/write without explicit ops")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

cf03f316

01 Dec, 2020 · 4 commits

gfs2: Fix deadlock between gfs2_{create_inode,inode_lookup} and delete_work_func · dd0ecf54

Committed by Andreas Gruenbacher on Nov 30, 2020

In gfs2_create_inode and gfs2_inode_lookup, make sure to cancel any pending
delete work before taking the inode glock. Otherwise, gfs2_cancel_delete_work
may block waiting for delete_work_func to complete, and delete_work_func may
block trying to acquire the inode glock in gfs2_inode_lookup.
Reported-by: Alexander Aring <aahringo@redhat.com>
Fixes: a0e3cc65 ("gfs2: Turn gl_delete into a delayed work")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

dd0ecf54

cifs: fix potential use-after-free in cifs_echo_request() · 21225336

Committed by Paulo Alcantara on Nov 28, 2020

This patch fixes a potential use-after-free bug in
cifs_echo_request().

For instance,

  thread 1
  --------
  cifs_demultiplex_thread()
    clean_demultiplex_info()
      kfree(server)

  thread 2 (workqueue)
  --------
  apic_timer_interrupt()
    smp_apic_timer_interrupt()
      irq_exit()
        __do_softirq()
          run_timer_softirq()
            call_timer_fn()
              cifs_echo_request() <- use-after-free in server ptr
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

21225336

cifs: allow syscalls to be restarted in __smb_send_rqst() · 6988a619

Committed by Paulo Alcantara on Nov 28, 2020

A customer has reported that several files in their multi-threaded app
were left with size of 0 because most of the read(2) calls returned
-EINTR and they assumed no bytes were read.  Obviously, they could
have fixed it by simply retrying on -EINTR.

We noticed that most of the -EINTR on read(2) were due to real-time
signals sent by glibc to process wide credential changes (SIGRT_1),
and its signal handler had been established with SA_RESTART, in which
case those calls could have been automatically restarted by the
kernel.

Let the kernel decide whether or not to restart the syscalls when
there is a signal pending in __smb_send_rqst() by returning
-ERESTARTSYS.  If it can't, it will return -EINTR anyway.
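
On the application side, the retry that the commit message mentions can be sketched as a small wrapper around read(2). The function name is illustrative; the loop simply repeats the call while it fails with EINTR, which is what a caller has to do whenever the kernel cannot restart the syscall for it.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Call read(2) again for as long as it is interrupted by a signal.
 * Any other error, EOF (0), or a successful byte count is returned as-is.
 */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);

    return n;
}
```

A caller should treat a negative return from read_retry() as a real error rather than as "no bytes read".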
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

6988a619

io_uring: fix recvmsg setup with compat buf-select · 2d280bc8

Committed by Pavel Begunkov on Nov 29, 2020

__io_compat_recvmsg_copy_hdr() with REQ_F_BUFFER_SELECT reads out iov
len but never assigns it to iov/fast_iov, leaving sr->len with garbage.
Hopefully, following io_buffer_select() truncates it to the selected
buffer size, but the value may still be under what was specified.

Cc: <stable@vger.kernel.org> # 5.7
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

2d280bc8

30 Nov, 2020 · 1 commit

pNFS/flexfiles: Fix array overflow when flexfiles mirroring is enabled · 63e2fffa

Committed by Trond Myklebust on Nov 15, 2020

If the flexfiles mirroring is enabled, then the read code expects to be
able to set pgio->pg_mirror_idx to point to the data server that is
being used for this particular read. However it does not change the
pg_mirror_count because we only need to send a single read.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

63e2fffa

27 Nov, 2020 · 1 commit

gfs2: Upgrade shared glocks for atime updates · 82e938bd

Committed by Andreas Gruenbacher on Nov 25, 2020

Commit 20f82999 ("gfs2: Rework read and page fault locking") lifted
the glock lock taking from the low-level ->readpage and ->readahead
address space operations to the higher-level ->read_iter file and
->fault vm operations.  The glocks are still taken in LM_ST_SHARED mode
only.  On filesystems mounted without the noatime option, ->read_iter
sometimes needs to update the atime as well, though.  Right now, this
leads to a failed locking mode assertion in gfs2_dirty_inode.

Fix that by introducing a new update_time inode operation.  There, if
the glock is held non-exclusively, upgrade it to an exclusive lock.
Reported-by: Alexander Aring <aahringo@redhat.com>
Fixes: 20f82999 ("gfs2: Rework read and page fault locking")
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

82e938bd

26 Nov, 2020 · 3 commits

io_uring: fix files grab/cancel race · af604703

Committed by Pavel Begunkov on Nov 25, 2020

When one task is in io_uring_cancel_files() and another is doing
io_prep_async_work() a race may happen. That's because after accounting
a request inflight in first call to io_grab_identity() it still may fail
and go to io_identity_cow(), which might briefly keep dangling
work.identity and not only.

Grab files last, so io_prep_async_work() won't fail if it did get into
->inflight_list.

note: the bug shouldn't exist after making io_uring_cancel_files() not
poking into other tasks' requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

af604703

gfs2: Don't freeze the file system during unmount · f39e7d3a

Committed by Bob Peterson on Nov 24, 2020

GFS2's freeze/thaw mechanism uses a special freeze glock to control its
operation. It does this with a sync glock operation (glops.c) called
freeze_go_sync. When the freeze glock is demoted (glock's do_xmote) the
glops function causes the file system to be frozen. This is intended. However,
GFS2's mount and unmount processes also hold the freeze glock to prevent other
processes, perhaps on different cluster nodes, from mounting the frozen file
system in read-write mode.

Before this patch, there was no check in freeze_go_sync for whether a freeze
is intended or whether the glock demote was caused by a normal unmount.
So it was trying to freeze the file system it's trying to unmount, which
ends up in a deadlock.

This patch adds an additional check to freeze_go_sync so that demotes of the
freeze glock are ignored if they come from the unmount process.

Fixes: 20b32912 ("gfs2: Fix regression in freeze_go_sync")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

f39e7d3a

gfs2: check for empty rgrp tree in gfs2_ri_update · 77872151

Committed by Bob Peterson on Nov 24, 2020

If gfs2 tries to mount a (corrupt) file system that has no resource
groups it still tries to set preferences on the first one, which causes
a kernel null pointer dereference. This patch adds a check to function
gfs2_ri_update so this condition is detected and reported back as an
error.

Reported-by: syzbot+e3f23ce40269a4c9053a@syzkaller.appspotmail.com
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

77872151

25 Nov, 2020 · 3 commits

efivarfs: revert "fix memory leak in efivarfs_create()" · ff04f3b6

Committed by Ard Biesheuvel on Nov 25, 2020

The memory leak addressed by commit fe5186cf is a false positive:
all allocations are recorded in a linked list, and freed when the
filesystem is unmounted. This leads to double frees, and as reported
by David, leads to crashes if SLUB is configured to self destruct when
double frees occur.

So drop the redundant kfree() again, and instead, mark the offending
pointer variable so the allocation is ignored by kmemleak.

Cc: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
Fixes: fe5186cf ("efivarfs: fix memory leak in efivarfs_create()")
Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

ff04f3b6

gfs2: set lockdep subclass for iopen glocks · 515b269d

Committed by Alexander Aring on Nov 23, 2020

This patch introduces a new glops attribute to define the subclass of the
glock lockref spinlock. This avoids the following lockdep warning, which
occurs when we lock an inode lock while an iopen lock is held:

============================================
WARNING: possible recursive locking detected
5.10.0-rc3+ #4990 Not tainted
--------------------------------------------
kworker/0:1/12 is trying to acquire lock:
ffff9067d45672d8 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: lockref_get+0x9/0x20

but task is already holding lock:
ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&gl->gl_lockref.lock);
  lock(&gl->gl_lockref.lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/0:1/12:
 #0: ffff9067c1bfdd38 ((wq_completion)delete_workqueue){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
 #1: ffffac594006be70 ((work_completion)(&(&gl->gl_delete)->work)){+.+.}-{0:0}, at: process_one_work+0x1b7/0x540
 #2: ffff9067da308588 (&gl->gl_lockref.lock){+.+.}-{3:3}, at: delete_work_func+0x164/0x260

stack backtrace:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.10.0-rc3+ #4990
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014
Workqueue: delete_workqueue delete_work_func
Call Trace:
 dump_stack+0x8b/0xb0
 __lock_acquire.cold+0x19e/0x2e3
 lock_acquire+0x150/0x410
 ? lockref_get+0x9/0x20
 _raw_spin_lock+0x27/0x40
 ? lockref_get+0x9/0x20
 lockref_get+0x9/0x20
 delete_work_func+0x188/0x260
 process_one_work+0x237/0x540
 worker_thread+0x4d/0x3b0
 ? process_one_work+0x540/0x540
 kthread+0x127/0x140
 ? __kthread_bind_mask+0x60/0x60
 ret_from_fork+0x22/0x30
Suggested-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

515b269d

gfs2: Fix deadlock dumping resource group glocks · 16e6281b

Committed by Alexander Aring on Nov 22, 2020

Commit 0e539ca1 ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
introduced additional locking in gfs2_rgrp_go_dump, which is also used for
dumping resource group glocks via debugfs.  However, on that code path, the
glock spin lock is already taken in dump_glock, and taking it again in
gfs2_glock2rgrp leads to deadlock.  This can be reproduced with:

  $ mkfs.gfs2 -O -p lock_nolock /dev/FOO
  $ mount /dev/FOO /mnt/foo
  $ touch /mnt/foo/bar
  $ cat /sys/kernel/debug/gfs2/FOO/glocks

Fix that by not taking the glock spin lock inside the go_dump callback.

Fixes: 0e539ca1 ("gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump")
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

16e6281b

openeuler / Kernel · synced successfully almost 2 years ago
