提交 · a4f5b52167a80edec74093fe6fef291a0318f4ba · openeuler / Kernel

07 7月, 2022 3 次提交

step_into(): lose inode argument · a4f5b521

由 Al Viro 提交于 7月 03, 2022

make handle_mounts() always fetch it.  This is just the first step -
the callers of step_into() will stop trying to calculate the sucker,
etc.

The passed value should be equal to dentry->d_inode in all cases;
in RCU mode - fetched after we'd sampled ->d_seq.  Might as well
fetch it here.  We do need to validate ->d_seq, which duplicates
the check currently done in lookup_fast(); that duplication will
go away shortly.

After that change handle_mounts() always ignores the initial value of
*inode and always sets it on success.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a4f5b521

namei: stash the sampled ->d_seq into nameidata · 03fa86e9

由 Al Viro 提交于 7月 04, 2022

New field: nd->next_seq.  Set to 0 outside of RCU mode, holds the sampled
value for the next dentry to be considered.  Used instead of an arseload
of local variables, arguments, etc.

step_into() has lost seq argument; nd->next_seq is used, so dentry passed
to it must be the one ->next_seq is about.

There are two requirements for RCU pathwalk:
	1) it should not give a hard failure (other than -ECHILD) unless
non-RCU pathwalk might fail that way given suitable timings.
	2) it should not succeed unless non-RCU pathwalk might succeed
with the same end location given suitable timings.

The use of seq numbers is the way we achieve that.  Invariant we want
to maintain is:
	if RCU pathwalk can reach the state with given nd->path, nd->inode
and nd->seq after having traversed some part of pathname, it must be possible
for non-RCU pathwalk to reach the same nd->path and nd->inode after having
traversed the same part of pathname, and observe the nd->path.dentry->d_seq
equal to what RCU pathwalk has in nd->seq

	For transition from parent to child, we sample child's ->d_seq
and verify that parent's ->d_seq remains unchanged.  Anything that
disrupts parent-child relationship would've bumped ->d_seq on both.
	For transitions from child to parent we sample parent's ->d_seq
and verify that child's ->d_seq has not changed.  Same reasoning as
for the previous case applies.
	For transition from mountpoint to root of mounted we sample
the ->d_seq of root and verify that nobody has touched mount_lock since
the beginning of pathwalk.  That guarantees that mount we'd found had
been there all along, with these mountpoint and root of the mounted.
It would be possible for a non-RCU pathwalk to reach the previous state,
find the same mount and observe its root at the moment we'd sampled
->d_seq of that
	For transitions from root of mounted to mountpoint we sample
->d_seq of mountpoint and verify that mount_lock had not been touched
since the beginning of pathwalk.  The same reasoning as in the
previous case applies.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

03fa86e9

namei: move clearing LOOKUP_RCU towards rcu_read_unlock() · 6e180327

由 Al Viro 提交于 7月 06, 2022

try_to_unlazy()/try_to_unlazy_next() drop LOOKUP_RCU in the
very beginning and do rcu_read_unlock() only at the very end.
However, nothing done in between even looks at the flag in
question; might as well clear it at the same time we unlock.

Note that try_to_unlazy_next() used to call legitimize_mnt(),
which might drop/regain rcu_read_lock() in some cases.  This
is no longer true, so we really have rcu_read_lock() held
all along until the end.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6e180327

06 7月, 2022 4 次提交

switch try_to_unlazy_next() to __legitimize_mnt() · 7e4745a0

由 Al Viro 提交于 7月 05, 2022

The tricky case (__legitimize_mnt() failing after having grabbed
a reference) can be trivially dealt with by leaving nd->path.mnt
non-NULL, for terminate_walk() to drop it.

legitimize_mnt() becomes static after that.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7e4745a0

follow_dotdot{,_rcu}(): change calling conventions · 51c6546c

由 Al Viro 提交于 7月 04, 2022

Instead of returning NULL when we are in root, just make it return
the current position (and set *seqp and *inodep accordingly).
That collapses the calls of step_into() in handle_dots()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

51c6546c

namei: get rid of pointless unlikely(read_seqcount_retry(...)) · 82ef0698

由 Al Viro 提交于 7月 05, 2022

read_seqcount_retry() et.al. are inlined and there's enough annotations
for compiler to figure out that those are unlikely to return non-zero.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

82ef0698

__follow_mount_rcu(): verify that mount_lock remains unchanged · 20aac6c6

由 Al Viro 提交于 7月 04, 2022

Validate mount_lock seqcount as soon as we cross into mount in RCU
mode.  Sure, ->mnt_root is pinned and will remain so until we
do rcu_read_unlock() anyway, and we will eventually fail to unlazy if
the mount_lock had been touched, but we might run into a hard error
(e.g. -ENOENT) before trying to unlazy.  And it's possible to end
up with RCU pathwalk racing with rename() and umount() in a way
that would fail with -ENOENT while non-RCU pathwalk would've
succeeded with any timings.

Once upon a time we hadn't needed that, but analysis had been subtle,
brittle and went out of window as soon as RENAME_EXCHANGE had been
added.

It's narrow, hard to hit and won't get you anything other than
stray -ENOENT that could be arranged in much easier way with the
same priveleges, but it's a bug all the same.

Cc: stable@kernel.org
X-sky-is-falling: unlikely
Fixes: da1ce067 "vfs: add cross-rename"
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

20aac6c6

11 6月, 2022 4 次提交

cifs: populate empty hostnames for extra channels · 4c14d704

由 Shyam Prasad N 提交于 6月 06, 2022

Currently, the secondary channels of a multichannel session
also get hostname populated based on the info in primary channel.
However, this will end up with a wrong resolution of hostname to
IP address during reconnect.

This change fixes this by not populating hostname info for all
secondary channels.

Fixes: 5112d80c ("cifs: populate server_hostname for extra channels")
Cc: stable@vger.kernel.org
Signed-off-by: NShyam Prasad N <sprasad@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

4c14d704

netfs: Rename the netfs_io_request cleanup op and give it an op pointer · 40a81101

由 David Howells 提交于 2月 25, 2022

The netfs_io_request cleanup op is now always in a position to be given a
pointer to a netfs_io_request struct, so this can be passed in instead of
the mapping and private data arguments (both of which are included in the
struct).

So rename the ->cleanup op to ->free_request (to match ->init_request) and
pass in the I/O pointer.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com

40a81101

netfs: Further cleanups after struct netfs_inode wrapper introduced · e81fb419

由 Linus Torvalds 提交于 6月 09, 2022

Change the signature of netfs helper functions to take a struct netfs_inode
pointer rather than a struct inode pointer where appropriate, thereby
relieving the need for the network filesystem to convert its internal inode
format down to the VFS inode only for netfslib to bounce it back up.  For
type safety, it's better not to do that (and it's less typing too).

Give netfs_write_begin() an extra argument to pass in a pointer to the
netfs_inode struct rather than deriving it internally from the file
pointer.  Note that the ->write_begin() and ->write_end() ops are intended
to be replaced in the future by netfslib code that manages this without the
need to call in twice for each page.

netfs_readpage() and similar are intended to be pointed at directly by the
address_space_operations table, so must stick to the signature dictated by
the function pointers there.

Changes
=======
- Updated the kerneldoc comments and documentation [DH].
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/

e81fb419

afs: Fix some checker issues · 102d8410

由 David Howells 提交于 5月 19, 2022

Remove an unused global variable and make another static as reported by
make C=1.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org

102d8410

10 6月, 2022 2 次提交

netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context · 874c8ca1

由 David Howells 提交于 6月 09, 2022

While randstruct was satisfied with using an open-coded "void *" offset
cast for the netfs_i_context <-> inode casting, __builtin_object_size() as
used by FORTIFY_SOURCE was not as easily fooled.  This was causing the
following complaint[1] from gcc v12:

  In file included from include/linux/string.h:253,
                   from include/linux/ceph/ceph_debug.h:7,
                   from fs/ceph/inode.c:2:
  In function 'fortify_memset_chk',
      inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
      inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
  include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    242 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by embedding a struct inode into struct netfs_i_context (which
should perhaps be renamed to struct netfs_inode).  The struct inode
vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
structs and vfs_inode is then simply changed to "netfs.inode" in those
filesystems.

Further, rename netfs_i_context to netfs_inode, get rid of the
netfs_inode() function that converted a netfs_i_context pointer to an
inode pointer (that can now be done with &ctx->inode) and rename the
netfs_i_context() function to netfs_inode() (which is now a wrapper
around container_of()).

Most of the changes were done with:

  perl -p -i -e 's/vfs_inode/netfs.inode/'g \
        `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`

Kees suggested doing it with a pair structure[2] and a special
declarator to insert that into the network filesystem's inode
wrapper[3], but I think it's cleaner to embed it - and then it doesn't
matter if struct randomisation reorders things.

Dave Chinner suggested using a filesystem-specific VFS_I() function in
each filesystem to convert that filesystem's own inode wrapper struct
into the VFS inode struct[4].

Version #2:
 - Fix a couple of missed name changes due to a disabled cifs option.
 - Rename nfs_i_context to nfs_inode
 - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
   structs.

[ This also undoes commit 507160f4 ("netfs: gcc-12: temporarily
  disable '-Wattribute-warning' for now") that is no longer needed ]

Fixes: bc899ee1 ("netfs: Add a netfs inode context")
Reported-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NXiubo Li <xiubli@redhat.com>
cc: Jonathan Corbet <corbet@lwn.net>
cc: Eric Van Hensbergen <ericvh@gmail.com>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Steve French <smfrench@gmail.com>
cc: William Kucharski <william.kucharski@oracle.com>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: Dave Chinner <david@fromorbit.com>
cc: linux-doc@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: samba-technical@lists.samba.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1]
Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2]
Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3]
Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4]
Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

874c8ca1

netfs: gcc-12: temporarily disable '-Wattribute-warning' for now · 507160f4

由 Linus Torvalds 提交于 6月 09, 2022

This is a pure band-aid so that I can continue merging stuff from people
while some of the gcc-12 fallout gets sorted out.

In particular, gcc-12 is very unhappy about the kinds of pointer
arithmetic tricks that netfs does, and that makes the fortify checks
trigger in afs and ceph:

  In function ‘fortify_memset_chk’,
      inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2,
      inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2,
      inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2:
  include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    258 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and the reason is that netfs_i_context_init() is passed a 'struct inode'
pointer, and then it does

        struct netfs_i_context *ctx = netfs_i_context(inode);

        memset(ctx, 0, sizeof(*ctx));

where that netfs_i_context() function just does pointer arithmetic on
the inode pointer, knowing that the netfs_i_context is laid out
immediately after it in memory.

This is all truly disgusting, since the whole "netfs_i_context is laid
out immediately after it in memory" is not actually remotely true in
general, but is just made to be that way for afs and ceph.

See for example fs/cifs/cifsglob.h:

  struct cifsInodeInfo {
        struct {
                /* These must be contiguous */
                struct inode    vfs_inode;      /* the VFS's inode record */
                struct netfs_i_context netfs_ctx; /* Netfslib context */
        };
	[...]

and realize that this is all entirely wrong, and the pointer arithmetic
that netfs_i_context() is doing is also very very wrong and wouldn't
give the right answer if netfs_ctx had different alignment rules from a
'struct inode', for example).

Anyway, that's just a long-winded way to say "the gcc-12 warning is
actually quite reasonable, and our code happens to work but is pretty
disgusting".

This is getting fixed properly, but for now I made the mistake of
thinking "the week right after the merge window tends to be calm for me
as people take a breather" and I did a sustem upgrade.  And I got gcc-12
as a result, so to continue merging fixes from people and not have the
end result drown in warnings, I am fixing all these gcc-12 issues I hit.

Including with these kinds of temporary fixes.

Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

507160f4

08 6月, 2022 3 次提交

zonefs: fix zonefs_iomap_begin() for reads · c1c1204c

由 Damien Le Moal 提交于 5月 23, 2022

If a readahead is issued to a sequential zone file with an offset
exactly equal to the current file size, the iomap type is set to
IOMAP_UNWRITTEN, which will prevent an IO, but the iomap length is
calculated as 0. This causes a WARN_ON() in iomap_iter():

[17309.548939] WARNING: CPU: 3 PID: 2137 at fs/iomap/iter.c:34 iomap_iter+0x9cf/0xe80
[...]
[17309.650907] RIP: 0010:iomap_iter+0x9cf/0xe80
[...]
[17309.754560] Call Trace:
[17309.757078]  <TASK>
[17309.759240]  ? lock_is_held_type+0xd8/0x130
[17309.763531]  iomap_readahead+0x1a8/0x870
[17309.767550]  ? iomap_read_folio+0x4c0/0x4c0
[17309.771817]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[17309.778848]  ? lock_release+0x370/0x750
[17309.784462]  ? folio_add_lru+0x217/0x3f0
[17309.790220]  ? reacquire_held_locks+0x4e0/0x4e0
[17309.796543]  read_pages+0x17d/0xb60
[17309.801854]  ? folio_add_lru+0x238/0x3f0
[17309.807573]  ? readahead_expand+0x5f0/0x5f0
[17309.813554]  ? policy_node+0xb5/0x140
[17309.819018]  page_cache_ra_unbounded+0x27d/0x450
[17309.825439]  filemap_get_pages+0x500/0x1450
[17309.831444]  ? filemap_add_folio+0x140/0x140
[17309.837519]  ? lock_is_held_type+0xd8/0x130
[17309.843509]  filemap_read+0x28c/0x9f0
[17309.848953]  ? zonefs_file_read_iter+0x1ea/0x4d0 [zonefs]
[17309.856162]  ? trace_contention_end+0xd6/0x130
[17309.862416]  ? __mutex_lock+0x221/0x1480
[17309.868151]  ? zonefs_file_read_iter+0x166/0x4d0 [zonefs]
[17309.875364]  ? filemap_get_pages+0x1450/0x1450
[17309.881647]  ? __mutex_unlock_slowpath+0x15e/0x620
[17309.888248]  ? wait_for_completion_io_timeout+0x20/0x20
[17309.895231]  ? lock_is_held_type+0xd8/0x130
[17309.901115]  ? lock_is_held_type+0xd8/0x130
[17309.906934]  zonefs_file_read_iter+0x356/0x4d0 [zonefs]
[17309.913750]  new_sync_read+0x2d8/0x520
[17309.919035]  ? __x64_sys_lseek+0x1d0/0x1d0

Furthermore, this causes iomap_readahead() to loop forever as
iomap_readahead_iter() always returns 0, making no progress.

Fix this by treating reads after the file size as access to holes,
setting the iomap type to IOMAP_HOLE, the iomap addr to IOMAP_NULL_ADDR
and using the length argument as is for the iomap length. To simplify
the code with this change, zonefs_iomap_begin() is split into the read
variant, zonefs_read_iomap_begin() and zonefs_read_iomap_ops, and the
write variant, zonefs_write_iomap_begin() and zonefs_write_iomap_ops.
Reported-by: NJorgen Hansen <Jorgen.Hansen@wdc.com>
Fixes: 8dcc1a9d ("fs: New zonefs file system")
Signed-off-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: NJorgen Hansen <Jorgen.Hansen@wdc.com>

c1c1204c

zonefs: Do not ignore explicit_open with active zone limit · 96eca145

由 Damien Le Moal 提交于 6月 02, 2022

A zoned device may have no limit on the number of open zones but may
have a limit on the number of active zones it can support. In such
case, the explicit_open mount option should not be ignored to ensure
that the open() system call activates the zone with an explicit zone
open command, thus guaranteeing that the zone can be written.

Enforce this by ignoring the explicit_open mount option only for
devices that have both the open and active zone limits equal to 0.

Fixes: 87c9ce3f ("zonefs: Add active seq file accounting")
Signed-off-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

96eca145

zonefs: fix handling of explicit_open option on mount · a2a513be

由 Damien Le Moal 提交于 6月 02, 2022

Ignoring the explicit_open mount option on mount for devices that do not
have a limit on the number of open zones must be done after the mount
options are parsed and set in s_mount_opts. Move the check to ignore
the explicit_open option after the call to zonefs_parse_options() in
zonefs_fill_super().

Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
Cc: <stable@vger.kernel.org>
Signed-off-by: NDamien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>

a2a513be

07 6月, 2022 1 次提交

cifs: return errors during session setup during reconnects · 8ea21823

由 Shyam Prasad N 提交于 5月 31, 2022

During reconnects, we check the return value from
cifs_negotiate_protocol, and have handlers for both success
and failures. But if that passes, and cifs_setup_session
returns any errors other than -EACCES, we do not handle
that. This fix adds a handler for that, so that we don't
go ahead and try a tree_connect on a failed session.
Signed-off-by: NShyam Prasad N <sprasad@microsoft.com>
Reviewed-by: NEnzo Matsumiya <ematsumiya@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: NSteve French <stfrench@microsoft.com>

8ea21823

06 6月, 2022 5 次提交

quota: Prevent memory allocation recursion while holding dq_lock · 537e11cd

由 Matthew Wilcox (Oracle) 提交于 6月 05, 2022

As described in commit 02117b8a ("f2fs: Set GF_NOFS in
read_cache_page_gfp while doing f2fs_quota_read"), we must not enter
filesystem reclaim while holding the dq_lock. Prevent this more generally
by using memalloc_nofs_save() while holding the lock.

Link: https://lore.kernel.org/r/20220605143815.2330891-2-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NJan Kara <jack@suse.cz>

537e11cd

writeback: Fix inode->i_io_list not be protected by inode->i_lock error · 10e14073

由 Jchao Sun 提交于 5月 24, 2022

Commit b35250c0 ("writeback: Protect inode->i_io_list with
inode->i_lock") made inode->i_io_list not only protected by
wb->list_lock but also inode->i_lock, but inode_io_list_move_locked()
was missed. Add lock there and also update comment describing
things protected by inode->i_lock. This also fixes a race where
__mark_inode_dirty() could move inode under flush worker's hands
and thus sync(2) could miss writing some inodes.

Fixes: b35250c0 ("writeback: Protect inode->i_io_list with inode->i_lock")
Link: https://lore.kernel.org/r/20220524150540.12552-1-sunjunchao2870@gmail.com
CC: stable@vger.kernel.org
Signed-off-by: NJchao Sun <sunjunchao2870@gmail.com>
Signed-off-by: NJan Kara <jack@suse.cz>

10e14073

fs: Fix syntax errors in comments · 2aab03b8

由 Xiang wangx 提交于 6月 05, 2022

Delete the redundant word 'not'.

Link: https://lore.kernel.org/r/20220605125509.14837-1-wangxiang@cdjrlc.comSigned-off-by: NXiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: NJan Kara <jack@suse.cz>

2aab03b8

cifs: fix reconnect on smb3 mount types · c36ee7da

由 Paulo Alcantara 提交于 6月 05, 2022

cifs.ko defines two file system types: cifs & smb3, and
__cifs_get_super() was not including smb3 file system type when
looking up superblocks, therefore failing to reconnect tcons in
cifs_tree_connect().

Fix this by calling iterate_supers_type() on both file system types.

Link: https://lore.kernel.org/r/CAFrh3J9soC36+BVuwHB=g9z_KB5Og2+p2_W+BBoBOZveErz14w@mail.gmail.com
Cc: stable@vger.kernel.org
Tested-by: NSatadru Pramanik <satadru@gmail.com>
Reported-by: NSatadru Pramanik <satadru@gmail.com>
Signed-off-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

c36ee7da

fix the breakage in close_fd_get_file() calling conventions change · 40a19260

由 Al Viro 提交于 6月 05, 2022

It used to grab an extra reference to struct file rather than
just transferring to caller the one it had removed from descriptor
table. New variant doesn't, and callers need to be adjusted.

Reported-and-tested-by: syzbot+47dd250f527cb7bebf24@syzkaller.appspotmail.com
Fixes: 6319194e ("Unify the primitives for file descriptor closing")
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

40a19260

05 6月, 2022 1 次提交

cifs: fix uninitialized pointer in error case in dfs_cache_get_tgt_share · ee3c8019

由 Steve French 提交于 6月 04, 2022

Set default value of ppath to null.
Reported-by: Nkernel test robot <lkp@intel.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

ee3c8019

04 6月, 2022 1 次提交

cifs: skip trailing separators of prefix paths · ef605e86

由 Paulo Alcantara 提交于 6月 03, 2022

During DFS failover, prefix paths may change, so make sure to not
leave trailing separators when parsing thew in
dfs_cache_get_tgt_share().  The separators of prefix paths are already
handled by build_path_from_dentry_optional_prefix().

Consider the following DFS link:

  //dom/dfs/link: [\srv1\share\dir1, \srv2\share\dir1]

Before commit:

  mount.cifs //dom/dfs/link
  tree connect to \\srv1\share; prefix_path=dir1
  disconnect srv1; failover to srv2
  tree connect to \\srv2\share; prefix_path=dir1\
  mv foo bar

  ...
  SMB2 430 Create Request File: dir1\\foo;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  SMB2 582 Create Response File: dir1\\foo;GetInfo Response;Close Response
  SMB2 430 Create Request File: dir1\\bar;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  SMB2 286 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;GetInfo Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;Close Response, Error: STATUS_OBJECT_NAME_NOT_FOUND
  SMB2 462 Create Request File: dir1\\foo;SetInfo Request FILE_INFO/SMB2_FILE_RENAME_INFO NewName:dir1\\bar;Close Request
  SMB2 478 Create Response File: dir1\\foo;SetInfo Response, Error: STATUS_OBJECT_NAME_INVALID;Close Response

After commit:

  mount.cifs //dom/dfs/link
  tree connect to \\srv1\share; prefix_path=dir1
  disconnect srv1; failover to srv2
  tree connect to \\srv2\share; prefix_path=dir1
  mv foo bar

  ...
  SMB2 430 Create Request File: dir1\foo;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  SMB2 582 Create Response File: dir1\foo;GetInfo Response;Close Response
  SMB2 430 Create Request File: dir1\bar;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  SMB2 286 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;GetInfo Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;Close Response, Error: STATUS_OBJECT_NAME_NOT_FOUND
  SMB2 462 Create Request File: dir1\foo;SetInfo Request FILE_INFO/SMB2_FILE_RENAME_INFO NewName:dir1\bar;Close Request
  SMB2 478 Create Response File: dir1\foo;SetInfo Response;Close Response
Signed-off-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

ef605e86

03 6月, 2022 1 次提交

NFSD: Fix potential use-after-free in nfsd_file_put() · b6c71c66

由 Chuck Lever 提交于 5月 31, 2022

nfsd_file_put_noref() can free @nf, so don't dereference @nf
immediately upon return from nfsd_file_put_noref().
Suggested-by: NTrond Myklebust <trondmy@hammerspace.com>
Fixes: 99939792 ("nfsd: Clean up nfsd_file_put()")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>

b6c71c66

02 6月, 2022 6 次提交

io_uring: reinstate the inflight tracking · 9cae36a0

由 Jens Axboe 提交于 6月 01, 2022

After some debugging, it was realized that we really do still need the
old inflight tracking for any file type that has io_uring_fops assigned.
If we don't, then trivial circular references will mean that we never get
the ctx cleaned up and hence it'll leak.

Just bring back the inflight tracking, which then also means we can
eliminate the conditional dropping of the file when task_work is queued.

Fixes: d5361233 ("io_uring: drop the old style inflight file tracking")
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9cae36a0

S
cifs: update internal module number · 096c956b
由 Steve French 提交于 6月 01, 2022
```
To 2.37
Signed-off-by: NSteve French <stfrench@microsoft.com>
```
096c956b

cifs: version operations for smb20 unneeded when legacy support disabled · 7ef93ffc

由 Steve French 提交于 6月 01, 2022

We should not be including unused smb20 specific code when legacy
support is disabled (CONFIG_CIFS_ALLOW_INSECURE_LEGACY turned
off).  For example smb2_operations and smb2_values aren't used
in that case.  Over time we can move more and more SMB1/CIFS and SMB2.0
code into the insecure legacy ifdefs
Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

7ef93ffc

cifs: do not build smb1ops if legacy support is disabled · 387ba9bf

由 Steve French 提交于 6月 01, 2022

We should not be including unused SMB1/CIFS functions when legacy
support is disabled (CONFIG_CIFS_ALLOW_INSECURE_LEGACY turned
off), but especially obvious is not needing to build smb1ops.c
at all when legacy support is disabled. Over time we can move
more SMB1/CIFS and SMB2.0 legacy functions into ifdefs but this
is a good start (and shrinks the module size a few percent).
Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

387ba9bf

afs: Fix infinite loop found by xfstest generic/676 · 17eabd42

由 David Howells 提交于 5月 31, 2022

In AFS, a directory is handled as a file that the client downloads and
parses locally for the purposes of performing lookup and getdents
operations.  The in-kernel afs filesystem has a number of functions that
do this.

A directory file is arranged as a series of 2K blocks divided into
32-byte slots, where a directory entry occupies one or more slots, plus
each block starts with one or more metadata blocks.

When parsing a block, if the last slots are occupied by a dirent that
occupies more than a single slot and the file position points at a slot
that's not the initial one, the logic in afs_dir_iterate_block() that
skips over it won't advance the file pointer to the end of it.  This
will cause an infinite loop in getdents() as it will keep retrying that
block and failing to advance beyond the final entry.

Fix this by advancing the file pointer if the next entry will be beyond
it when we skip a block.

This was found by the generic/676 xfstest but can also be triggered with
something like:

	~/xfstests-dev/src/t_readdir_3 /xfstest.test/z 4000 1

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
Tested-by: NMarc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lore.kernel.org/r/165391973497.110268.2939296942213894166.stgit@warthog.procyon.org.uk/Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

17eabd42

io_uring: fix deadlock on iowq file slot alloc · 61c1b44a

由 Pavel Begunkov 提交于 6月 01, 2022

io_fixed_fd_install() can grab uring_lock in the slot allocation path
when called from io-wq, and then call into io_install_fixed_file(),
which will lock it again. Pull all locking out of
io_install_fixed_file() into io_fixed_fd_install().

Fixes: 1339f24b ("io_uring: allow allocated fixed files for openat/openat2")
Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/64116172a9d0b85b85300346bb280f3657aafc26.1654087283.git.asml.silence@gmail.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

61c1b44a

01 6月, 2022 6 次提交

fs/ntfs3: provide block_invalidate_folio to fix memory leak · 724bbe49

由 Mikulas Patocka 提交于 5月 30, 2022

The ntfs3 filesystem lacks the 'invalidate_folio' method and it causes
memory leak. If you write to the filesystem and then unmount it, the
cached written data are not freed and they are permanently leaked.
Fixes: 7ba13abb ("fs: Turn block_invalidatepage into block_invalidate_folio")
Reported-by: NJosé Luis Lara Carrascal <manualinux@yahoo.es>
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Acked-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NKonstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org # v5.18

724bbe49

cifs: fix potential deadlock in direct reclaim · cc391b69

由 Vincent Whitchurch 提交于 6月 01, 2022

The srv_mutex is used during writeback so cifs should ensure that
allocations done when that mutex is held are done with GFP_NOFS, to
avoid having direct reclaim ending up waiting for the same mutex and
causing a deadlock.  This is detected by lockdep with the splat below:

 ======================================================
 WARNING: possible circular locking dependency detected
 5.18.0 #70 Not tainted
 ------------------------------------------------------
 kswapd0/49 is trying to acquire lock:
 ffff8880195782e0 (&tcp_ses->srv_mutex){+.+.}-{3:3}, at: compound_send_recv

 but task is already holding lock:
 ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (fs_reclaim){+.+.}-{0:0}:
        fs_reclaim_acquire
        kmem_cache_alloc_trace
        __request_module
        crypto_alg_mod_lookup
        crypto_alloc_tfm_node
        crypto_alloc_shash
        cifs_alloc_hash
        smb311_crypto_shash_allocate
        smb311_update_preauth_hash
        compound_send_recv
        cifs_send_recv
        SMB2_negotiate
        smb2_negotiate
        cifs_negotiate_protocol
        cifs_get_smb_ses
        cifs_mount
        cifs_smb3_do_mount
        smb3_get_tree
        vfs_get_tree
        path_mount
        __x64_sys_mount
        do_syscall_64
        entry_SYSCALL_64_after_hwframe

 -> #0 (&tcp_ses->srv_mutex){+.+.}-{3:3}:
        __lock_acquire
        lock_acquire
        __mutex_lock
        mutex_lock_nested
        compound_send_recv
        cifs_send_recv
        SMB2_write
        smb2_sync_write
        cifs_write
        cifs_writepage_locked
        cifs_writepage
        shrink_page_list
        shrink_lruvec
        shrink_node
        balance_pgdat
        kswapd
        kthread
        ret_from_fork

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(fs_reclaim);
                                lock(&tcp_ses->srv_mutex);
                                lock(fs_reclaim);
   lock(&tcp_ses->srv_mutex);

  *** DEADLOCK ***

 1 lock held by kswapd0/49:
  #0: ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat

 stack backtrace:
 CPU: 2 PID: 49 Comm: kswapd0 Not tainted 5.18.0 #70
 Call Trace:
  <TASK>
  dump_stack_lvl
  dump_stack
  print_circular_bug.cold
  check_noncircular
  __lock_acquire
  lock_acquire
  __mutex_lock
  mutex_lock_nested
  compound_send_recv
  cifs_send_recv
  SMB2_write
  smb2_sync_write
  cifs_write
  cifs_writepage_locked
  cifs_writepage
  shrink_page_list
  shrink_lruvec
  shrink_node
  balance_pgdat
  kswapd
  kthread
  ret_from_fork
  </TASK>

Fix this by using the memalloc_nofs_save/restore APIs around the places
where the srv_mutex is held.  Do this in a wrapper function for the
lock/unlock of the srv_mutex, and rename the srv_mutex to avoid missing
call sites in the conversion.

Note that there is another lockdep warning involving internal crypto
locks, which was masked by this problem and is visible after this fix,
see the discussion in this thread:

 https://lore.kernel.org/all/20220523123755.GA13668@axis.com/

Link: https://lore.kernel.org/r/CANT5p=rqcYfYMVHirqvdnnca4Mo+JQSw5Qu12v=kPfpk5yhhmg@mail.gmail.com/Reported-by: NShyam Prasad N <nspmangalore@gmail.com>
Suggested-by: NLars Persson <larper@axis.com>
Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: NEnzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: NVincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

cc391b69

cifs: when extending a file with falloc we should make files not-sparse · f66f8b94

由 Ronnie Sahlberg 提交于 6月 01, 2022

as this is the only way to make sure the region is allocated.
Fix the conditional that was wrong and only tried to make already
non-sparse files non-sparse.

Cc: stable@vger.kernel.org
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>

f66f8b94

NFSv4.1 mark qualified async operations as MOVEABLE tasks · 118f09ed

由 Olga Kornievskaia 提交于 5月 25, 2022

Mark async operations such as RENAME, REMOVE, COMMIT MOVEABLE
for the nfsv4.1+ sessions.

Fixes: 85e39fee ("NFSv4.1 identify and mark RPC tasks that can move between transports")
Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

118f09ed

NFSv4: Fix free of uninitialized nfs4_label on referral lookup. · c3ed2227

由 Benjamin Coddington 提交于 5月 14, 2022

Send along the already-allocated fattr along with nfs4_fs_locations, and
drop the memcpy of fattr.  We end up growing two more allocations, but this
fixes up a crash as:

PID: 790    TASK: ffff88811b43c000  CPU: 0   COMMAND: "ls"
 #0 [ffffc90000857920] panic at ffffffff81b9bfde
 #1 [ffffc900008579c0] do_trap at ffffffff81023a9b
 #2 [ffffc90000857a10] do_error_trap at ffffffff81023b78
 #3 [ffffc90000857a58] exc_stack_segment at ffffffff81be1f45
 #4 [ffffc90000857a80] asm_exc_stack_segment at ffffffff81c009de
 #5 [ffffc90000857b08] nfs_lookup at ffffffffa0302322 [nfs]
 #6 [ffffc90000857b70] __lookup_slow at ffffffff813a4a5f
 #7 [ffffc90000857c60] walk_component at ffffffff813a86c4
 #8 [ffffc90000857cb8] path_lookupat at ffffffff813a9553
 #9 [ffffc90000857cf0] filename_lookup at ffffffff813ab86b
Suggested-by: NTrond Myklebust <trondmy@hammerspace.com>
Fixes: 9558a007 ("NFS: Remove the label from the nfs4_lookup_res struct")
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c3ed2227

cifs: remove repeated debug message on cifs_put_smb_ses() · 0d5106a8

由 Enzo Matsumiya 提交于 5月 31, 2022

Similar message is printed a few lines later in the same function
Signed-off-by: NEnzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: NSteve French <stfrench@microsoft.com>

0d5106a8