提交 · 9872e4a8734c68b78e8782e4f7f49e22cd6e9463 · openeuler / Kernel

11 8月, 2022 1 次提交

xfs: fix inode reservation space for removing transaction · 031d166f

由 hexiaole 提交于 8月 09, 2022

In 'fs/xfs/libxfs/xfs_trans_resv.c', the comment for transaction of removing a
directory entry writes:

/* fs/xfs/libxfs/xfs_trans_resv.c begin */
/*
 * For removing a directory entry we can modify:
 *    the parent directory inode: inode size
 *    the removed inode: inode size
...
xfs_calc_remove_reservation(
        struct xfs_mount        *mp)
{
        return XFS_DQUOT_LOGRES(mp) +
                xfs_calc_iunlink_add_reservation(mp) +
                max((xfs_calc_inode_res(mp, 1) +
...
/* fs/xfs/libxfs/xfs_trans_resv.c end */

There has 2 inode size of space to be reserverd, but the actual code
for inode reservation space writes.

There only count for 1 inode size to be reserved in
'xfs_calc_inode_res(mp, 1)', rather than 2.
Signed-off-by: Nhexiaole <hexiaole@kylinos.cn>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
[djwong: remove redundant code citations]
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>

031d166f

10 8月, 2022 3 次提交

NFS: Improve readpage/writepage tracing · 3fa5cbdc

由 Trond Myklebust 提交于 8月 09, 2022

Switch formatting to better match that used by other NFS tracepoints.
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

3fa5cbdc

NFS: Improve O_DIRECT tracing · b313eb91

由 Trond Myklebust 提交于 8月 09, 2022

Switch the formatting to match the other NFS tracepoints.
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

b313eb91

NFS: Improve write error tracing · af887e43

由 Trond Myklebust 提交于 8月 09, 2022

Don't leak request pointers, but use the "device:inode" labelling that
is used by all the other trace points. Furthermore, replace use of page
indexes with an offset, again in order to align behaviour with other
NFS trace points.
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

af887e43

09 8月, 2022 12 次提交

fscache: add tracepoint when failing cookie · 1a1e3aca

由 Jeff Layton 提交于 8月 05, 2022

Signed-off-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

1a1e3aca

fscache: don't leak cookie access refs if invalidation is in progress or failed · fb24771f

由 Jeff Layton 提交于 8月 05, 2022

It's possible for a request to invalidate a fscache_cookie will come in
while we're already processing an invalidation. If that happens we
currently take an extra access reference that will leak. Only call
__fscache_begin_cookie_access if the FSCACHE_COOKIE_DO_INVALIDATE bit
was previously clear.

Also, ensure that we attempt to clear the bit when the cookie is
"FAILED" and put the reference to avoid an access leak.

Fixes: 85e4ea10 ("fscache: Fix invalidation/lookup race")
Suggested-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

fb24771f

A
hugetlbfs: copy_page_to_iter() can deal with compound pages · c7d57ab1
由 Al Viro 提交于 6月 23, 2022
```
... since April 2021
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
c7d57ab1

ceph: switch the last caller of iov_iter_get_pages_alloc() · b5358992

由 Al Viro 提交于 6月 10, 2022

here nothing even looks at the iov_iter after the call, so we couldn't
care less whether it advances or not.
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b5358992

iter_to_pipe(): switch to advancing variant of iov_iter_get_pages() · 7d690c15

由 Al Viro 提交于 6月 09, 2022

... and untangle the cleanup on failure to add into pipe.
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7d690c15

iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() · 1ef255e2

由 Al Viro 提交于 6月 09, 2022

Most of the users immediately follow successful iov_iter_get_pages()
with advancing by the amount it had returned.

Provide inline wrappers doing that, convert trivial open-coded
uses of those.

BTW, iov_iter_get_pages() never returns more than it had been asked
to; such checks in cifs ought to be removed someday...
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1ef255e2

splice: stop abusing iov_iter_advance() to flush a pipe · 0d964934

由 Al Viro 提交于 6月 12, 2022

Use pipe_discard_from() explicitly in generic_file_read_iter(); don't bother
with rather non-obvious use of iov_iter_advance() in there.
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: NChristian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

0d964934

switch new_sync_{read,write}() to ITER_UBUF · 3e20a751

由 Al Viro 提交于 5月 22, 2022

Reviewed-by: NChristian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

3e20a751

new iov_iter flavour - ITER_UBUF · fcb14cb1

由 Al Viro 提交于 5月 22, 2022

Equivalent of single-segment iovec.  Initialized by iov_iter_ubuf(),
checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC
ones.

We are going to expose the things like ->write_iter() et.al. to those
in subsequent commits.

New predicate (user_backed_iter()) that is true for ITER_IOVEC and
ITER_UBUF; places like direct-IO handling should use that for
checking that pages we modify after getting them from iov_iter_get_pages()
would need to be dirtied.

DO NOT assume that replacing iter_is_iovec() with user_backed_iter()
will solve all problems - there's code that uses iter_is_iovec() to
decide how to poke around in iov_iter guts and for that the predicate
replacement obviously won't suffice.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

fcb14cb1

mm: hugetlb_vmemmap: introduce the name HVO · dff03381

由 Muchun Song 提交于 6月 28, 2022

It it inconvenient to mention the feature of optimizing vmemmap pages
associated with HugeTLB pages when communicating with others since there
is no specific or abbreviated name for it when it is first introduced. 
Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now.

This commit also updates the document about "hugetlb_free_vmemmap" by the
way discussed in thread [1].

Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1]
Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com>
Reviewed-by: NOscar Salvador <osalvador@suse.de>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

dff03381

NFS: don't unhash dentry during unlink/rename · 3c59366c

由 NeilBrown 提交于 8月 01, 2022

NFS unlink() (and rename over existing target) must determine if the
file is open, and must perform a "silly rename" instead of an unlink (or
before rename) if it is.  Otherwise the client might hold a file open
which has been removed on the server.

Consequently if it determines that the file isn't open, it must block
any subsequent opens until the unlink/rename has been completed on the
server.

This is currently achieved by unhashing the dentry.  This forces any
open attempt to the slow-path for lookup which will block on i_rwsem on
the directory until the unlink/rename completes.  A future patch will
change the VFS to only get a shared lock on i_rwsem for unlink, so this
will no longer work.

Instead we introduce an explicit interlock.  A special value is stored
in dentry->d_fsdata while the unlink/rename is running and
->d_revalidate blocks while that value is present.  When ->d_revalidate
unblocks, the dentry will be invalid.  This closes the race
without requiring exclusion on i_rwsem.

d_fsdata is already used in two different ways.
1/ an IS_ROOT directory dentry might have a "devname" stored in
   d_fsdata.  Such a dentry doesn't have a name and so cannot be the
   target of unlink or rename.  For safety we check if an old devname
   is still stored, and remove it if it is.
2/ a dentry with DCACHE_NFSFS_RENAMED set will have a 'struct
   nfs_unlinkdata' stored in d_fsdata.  While this is set maydelete()
   will fail, so an unlink or rename will never proceed on such
   a dentry.

Neither of these can be in effect when a dentry is the target of unlink
or rename.  So we can expect d_fsdata to be NULL, and store a special
value ((void*)1) which is given the name NFS_FSDATA_BLOCKED to indicate
that any lookup will be blocked.

The d_count() is incremented under d_lock() when a lookup finds the
dentry, so we check d_count() is low, and set NFS_FSDATA_BLOCKED under
the same lock to avoid any races.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

3c59366c

vfs: Check the truncate maximum size in inode_newsize_ok() · e2ebff9c

由 David Howells 提交于 8月 08, 2022

If something manages to set the maximum file size to MAX_OFFSET+1, this
can cause the xfs and ext4 filesystems at least to become corrupt.

Ordinarily, the kernel protects against userspace trying this by
checking the value early in the truncate() and ftruncate() system calls
calls - but there are at least two places that this check is bypassed:

 (1) Cachefiles will round up the EOF of the backing file to DIO block
     size so as to allow DIO on the final block - but this might push
     the offset negative. It then calls notify_change(), but this
     inadvertently bypasses the checking. This can be triggered if
     someone puts an 8EiB-1 file on a server for someone else to try and
     access by, say, nfs.

 (2) ksmbd doesn't check the value it is given in set_end_of_file_info()
     and then calls vfs_truncate() directly - which also bypasses the
     check.

In both cases, it is potentially possible for a network filesystem to
cause a disk filesystem to be corrupted: cachefiles in the client's
cache filesystem; ksmbd in the server's filesystem.

nfsd is okay as it checks the value, but we can then remove this check
too.

Fix this by adding a check to inode_newsize_ok(), as called from
setattr_prepare(), thereby catching the issue as filesystems set up to
perform the truncate with minimal opportunity for bypassing the new
check.

Fixes: 1f08c925 ("cachefiles: Implement backing file wrangling")
Fixes: f4415848 ("cifsd: add file operations")
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reported-by: NJeff Layton <jlayton@kernel.org>
Tested-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: NNamjae Jeon <linkinjeon@kernel.org>
Cc: stable@kernel.org
Acked-by: NAlexander Viro <viro@zeniv.linux.org.uk>
cc: Steve French <sfrench@samba.org>
cc: Hyunchul Lee <hyc.lee@gmail.com>
cc: Chuck Lever <chuck.lever@oracle.com>
cc: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e2ebff9c

06 8月, 2022 8 次提交

xfs: Fix false ENOSPC when performing direct write on a delalloc extent in cow fork · d6211330

由 Chandan Babu R 提交于 8月 04, 2022

On a higly fragmented filesystem a Direct IO write can fail with -ENOSPC error
even though the filesystem has sufficient number of free blocks.

This occurs if the file offset range on which the write operation is being
performed has a delalloc extent in the cow fork and this delalloc extent
begins much before the Direct IO range.

In such a scenario, xfs_reflink_allocate_cow() invokes xfs_bmapi_write() to
allocate the blocks mapped by the delalloc extent. The extent thus allocated
may not cover the beginning of file offset range on which the Direct IO write
was issued. Hence xfs_reflink_allocate_cow() ends up returning -ENOSPC.

The following script reliably recreates the bug described above.

  #!/usr/bin/bash

  device=/dev/loop0
  shortdev=$(basename $device)

  mntpnt=/mnt/
  file1=${mntpnt}/file1
  file2=${mntpnt}/file2
  fragmentedfile=${mntpnt}/fragmentedfile
  punchprog=/root/repos/xfstests-dev/src/punch-alternating

  errortag=/sys/fs/xfs/${shortdev}/errortag/bmap_alloc_minlen_extent

  umount $device > /dev/null 2>&1

  echo "Create FS"
  mkfs.xfs -f -m reflink=1 $device > /dev/null 2>&1
  if [[ $? != 0 ]]; then
  	echo "mkfs failed."
  	exit 1
  fi

  echo "Mount FS"
  mount $device $mntpnt > /dev/null 2>&1
  if [[ $? != 0 ]]; then
  	echo "mount failed."
  	exit 1
  fi

  echo "Create source file"
  xfs_io -f -c "pwrite 0 32M" $file1 > /dev/null 2>&1

  sync

  echo "Create Reflinked file"
  xfs_io -f -c "reflink $file1" $file2 &>/dev/null

  echo "Set cowextsize"
  xfs_io -c "cowextsize 16M" $file1 > /dev/null 2>&1

  echo "Fragment FS"
  xfs_io -f -c "pwrite 0 64M" $fragmentedfile > /dev/null 2>&1
  sync
  $punchprog $fragmentedfile

  echo "Allocate block sized extent from now onwards"
  echo -n 1 > $errortag

  echo "Create 16MiB delalloc extent in CoW fork"
  xfs_io -c "pwrite 0 4k" $file1 > /dev/null 2>&1

  sync

  echo "Direct I/O write at offset 12k"
  xfs_io -d -c "pwrite 12k 8k" $file1

This commit fixes the bug by invoking xfs_bmapi_write() in a loop until disk
blocks are allocated for atleast the starting file offset of the Direct IO
write range.

Fixes: 3c68d44a ("xfs: allocate direct I/O COW blocks in iomap_begin")
Reported-and-Root-caused-by: NWengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: NChandan Babu R <chandan.babu@oracle.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
[djwong: slight editing to make the locking less grody, and fix some style things]
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>

d6211330

xfs: fix intermittent hang during quotacheck · f0c2d7d2

由 Darrick J. Wong 提交于 8月 03, 2022

Every now and then, I see the following hang during mount time
quotacheck when running fstests.  Turning on KASAN seems to make it
happen somewhat more frequently.  I've edited the backtrace for brevity.

XFS (sdd): Quotacheck needed: Please wait.
XFS: Assertion failed: bp->b_flags & _XBF_DELWRI_Q, file: fs/xfs/xfs_buf.c, line: 2411
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1831409 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
CPU: 0 PID: 1831409 Comm: mount Tainted: G        W         5.19.0-rc6-xfsx #rc6 09911566947b9f737b036b4af85e399e4b9aef64
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:assfail+0x46/0x4a [xfs]
Code: a0 8f 41 a0 e8 45 fe ff ff 8a 1d 2c 36 10 00 80 fb 01 76 0f 0f b6 f3 48 c7 c7 c0 f0 4f a0 e8 10 f0 02 e1 80 e3 01 74 02 0f 0b <0f> 0b 5b c3 48 8d 45 10 48 89 e2 4c 89 e6 48 89 1c 24 48 89 44 24
RSP: 0018:ffffc900078c7b30 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8880099ac000 RCX: 000000007fffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0418fa0
RBP: ffff8880197bc1c0 R08: 0000000000000000 R09: 000000000000000a
R10: 000000000000000a R11: f000000000000000 R12: ffffc900078c7d20
R13: 00000000fffffff5 R14: ffffc900078c7d20 R15: 0000000000000000
FS:  00007f0449903800(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005610ada631f0 CR3: 0000000014dd8002 CR4: 00000000001706f0
Call Trace:
 <TASK>
 xfs_buf_delwri_pushbuf+0x150/0x160 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_flush_one+0xd6/0x130 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_dquot_walk.isra.0+0x109/0x1e0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_quotacheck+0x319/0x490 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_qm_mount_quotas+0x65/0x2c0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_mountfs+0x6b5/0xab0 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 xfs_fs_fill_super+0x781/0x990 [xfs 4561f5b32c9bfb874ec98d58d0719464e1f87368]
 get_tree_bdev+0x175/0x280
 vfs_get_tree+0x1a/0x80
 path_mount+0x6f5/0xaa0
 __x64_sys_mount+0x103/0x140
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

I /think/ this can happen if xfs_qm_flush_one is racing with
xfs_qm_dquot_isolate (i.e. dquot reclaim) when the second function has
taken the dquot flush lock but xfs_qm_dqflush hasn't yet locked the
dquot buffer, let alone queued it to the delwri list.  In this case,
flush_one will fail to get the dquot flush lock, but it can lock the
incore buffer, but xfs_buf_delwri_pushbuf will then trip over this
ASSERT, which checks that the buffer isn't on a delwri list.  The hang
results because the _delwri_submit_buffers ignores non DELWRI_Q buffers,
which means that xfs_buf_iowait waits forever for an IO that has not yet
been scheduled.

AFAICT, a reasonable solution here is to detect a dquot buffer that is
not on a DELWRI list, drop it, and return -EAGAIN to try the flush
again.  It's not /that/ big of a deal if quotacheck writes the dquot
buffer repeatedly before we even set QUOTA_CHKD.
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

f0c2d7d2

xfs: check return codes when flushing block devices · 7d839e32

由 Darrick J. Wong 提交于 7月 28, 2022

If a blkdev_issue_flush fails, fsync needs to report that to upper
levels.  Modify xfs_file_fsync to capture the errors, while trying to
flush as much data and log updates to disk as possible.

If log writes cannot flush the data device, we need to shut down the log
immediately because we've violated a log invariant.  Modify this code to
check the return value of blkdev_issue_flush as well.

This behavior seems to go back to about 2.6.15 or so, which makes this
fixes tag a bit misleading.

Link: https://elixir.bootlin.com/linux/v2.6.15/source/fs/xfs/xfs_vnodeops.c#L1187
Fixes: b5071ada ("xfs: remove xfs_blkdev_issue_flush")
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

7d839e32

S
cifs: update internal module number · 0d168a58
由 Steve French 提交于 8月 01, 2022
```
To 2.38
Signed-off-by: NSteve French <stfrench@microsoft.com>
```
0d168a58
S
cifs: alloc_mid function should be marked as static · ea75a78c
由 Steve French 提交于 8月 05, 2022
```
It is only used in transport.c.
Signed-off-by: NSteve French <stfrench@microsoft.com>
```
ea75a78c

cifs: remove "cifs_" prefix from init/destroy mids functions · f5fd3f28

由 Enzo Matsumiya 提交于 8月 05, 2022

Rename generic mid functions to same style, i.e. without "cifs_"
prefix.

cifs_{init,destroy}_mids() -> {init,destroy}_mids()
Signed-off-by: NEnzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: NSteve French <stfrench@microsoft.com>

f5fd3f28

cifs: remove useless DeleteMidQEntry() · 70f08f91

由 Enzo Matsumiya 提交于 8月 05, 2022

DeleteMidQEntry() was just a proxy for cifs_mid_q_entry_release().

- remove DeleteMidQEntry()
- rename cifs_mid_q_entry_release() to release_mid()
- rename kref_put() callback _cifs_mid_q_entry_release to __release_mid
- rename AllocMidQEntry() to alloc_mid()
- rename cifs_delete_mid() to delete_mid()

Update callers to use new names.
Signed-off-by: NEnzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: NSteve French <stfrench@microsoft.com>

70f08f91

cifs: when insecure legacy is disabled shrink amount of SMB1 code · fb157ed2

由 Steve French 提交于 8月 01, 2022

Currently much of the smb1 code is built even when
CONFIG_CIFS_ALLOW_INSECURE_LEGACY is disabled.

Move cifssmb.c to only be compiled when insecure legacy is disabled,
and move various SMB1/CIFS helper functions to that ifdef.  Some
functions that were not SMB1/CIFS specific needed to be moved out of
cifssmb.c

This shrinks cifs.ko by more than 10% which is good - but also will
help with the eventual movement of the legacy code to a distinct
module.  Follow on patches can shrink the number of ifdefs by
code restructuring where smb1 code is wedged in functions that
should be calling dialect specific helper functions instead,
and also by moving some functions from file.c/dir.c/inode.c into
smb1 specific c files.
Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: NPaulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: NSteve French <stfrench@microsoft.com>

fb157ed2

05 8月, 2022 9 次提交

f2fs: use onstack pages instead of pvec · 01fc4b9a

由 Fengnan Chang 提交于 7月 31, 2022

Since pvec have 15 pages, it not a multiple of 4, when write compressed
pages, write in 64K as a unit, it will call pagevec_lookup_range_tag
agagin, sometimes this will take a lot of time.
Use onstack pages instead of pvec to mitigate this problem.
Signed-off-by: NFengnan Chang <fengnanchang@gmail.com>
Reviewed-by: NChao Yu <chao@kernel.org>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

01fc4b9a

f2fs: intorduce f2fs_all_cluster_page_ready · 4f8219f8

由 Fengnan Chang 提交于 7月 31, 2022

When write total cluster, all pages is uptodate, there is not need to call
f2fs_prepare_compress_overwrite, intorduce f2fs_all_cluster_page_ready
to avoid this.
Signed-off-by: NFengnan Chang <changfengnan@vivo.com>
Reviewed-by: NChao Yu <chao@kernel.org>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

4f8219f8

f2fs: clean up f2fs_abort_atomic_write() · e53f8643

由 Chao Yu 提交于 8月 04, 2022

f2fs_abort_atomic_write() has checked whether current inode is
atomic_write one or not, it's redundant to check in its caller,
remove it for cleanup.
Signed-off-by: NChao Yu <chao.yu@oppo.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

e53f8643

f2fs: handle decompress only post processing in softirq · bff139b4

由 Daeho Jeong 提交于 8月 02, 2022

Now decompression is being handled in workqueue and it makes read I/O
latency non-deterministic, because of the non-deterministic scheduling
nature of workqueues. So, I made it handled in softirq context only if
possible, not in low memory devices, since this modification will
maintain decompresion related memory a little longer.
Signed-off-by: NDaeho Jeong <daehojeong@google.com>
Reviewed-by: NChao Yu <chao@kernel.org>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

bff139b4

f2fs: do not allow to decompress files have FI_COMPRESS_RELEASED · 90be48bd

由 Jaewook Kim 提交于 8月 03, 2022

If a file has FI_COMPRESS_RELEASED, all writes for it should not be
allowed. However, as of now, in case of compress_mode=user, writes
triggered by IOCTLs like F2FS_IOC_DE/COMPRESS_FILE are allowed unexpectly,
which could crash that file.
To fix it, let's do not allow F2FS_IOC_DE/COMPRESS_IOCTL if a file already
has FI_COMPRESS_RELEASED flag.

This is the reproduction process:
1.  $ touch ./file
2.  $ chattr +c ./file
3.  $ dd if=/dev/random of=./file bs=4096 count=30 conv=notrunc
4.  $ dd if=/dev/zero of=./file bs=4096 count=34 seek=30 conv=notrunc
5.  $ sync
6.  $ do_compress ./file      ; call F2FS_IOC_COMPRESS_FILE
7.  $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS
8.  $ release ./file          ; call F2FS_IOC_RELEASE_COMPRESS_BLOCKS
9.  $ do_compress ./file      ; call F2FS_IOC_COMPRESS_FILE again
10. $ get_compr_blocks ./file ; call F2FS_IOC_GET_COMPRESS_BLOCKS again

This reproduction process is tested in 128kb cluster size.
You can find compr_blocks has a negative value.

Fixes: 5fdb322f ("f2fs: add F2FS_IOC_DECOMPRESS_FILE and F2FS_IOC_COMPRESS_FILE")
Signed-off-by: NJunbeom Yeom <junbeom.yeom@samsung.com>
Signed-off-by: NSungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: NYoungjin Gil <youngjin.gil@samsung.com>
Signed-off-by: NJaewook Kim <jw5454.kim@samsung.com>
Reviewed-by: NChao Yu <chao@kernel.org>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

90be48bd

f2fs: do not set compression bit if kernel doesn't support · 912f0d65

由 Jaegeuk Kim 提交于 8月 03, 2022

If kernel doesn't have CONFIG_F2FS_FS_COMPRESSION, a file having FS_COMPR_FL via
ioctl(FS_IOC_SETFLAGS) is unaccessible due to f2fs_is_compress_backend_ready().
Let's avoid it.
Reviewed-by: NChao Yu <chao@kernel.org>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

912f0d65

f2fs: remove device type check for direct IO · dbf8e63f

由 Eunhee Rho 提交于 8月 01, 2022

To ensure serialized IOs, f2fs allows only LFS mode for zoned
device. Remove redundant check for direct IO.
Signed-off-by: NEunhee Rho <eunhee83.rho@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

dbf8e63f

f2fs: fix null-ptr-deref in f2fs_get_dnode_of_data · 4a2c5b79

由 Ye Bin 提交于 8月 01, 2022

There is issue as follows when test f2fs atomic write:
F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock
F2FS-fs (loop0): invalid crc_offset: 0
F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=1, run fsck to fix.
F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix.
==================================================================
BUG: KASAN: null-ptr-deref in f2fs_get_dnode_of_data+0xac/0x16d0
Read of size 8 at addr 0000000000000028 by task rep/1990

CPU: 4 PID: 1990 Comm: rep Not tainted 5.19.0-rc6-next-20220715 #266
Call Trace:
 <TASK>
 dump_stack_lvl+0x6e/0x91
 print_report.cold+0x49a/0x6bb
 kasan_report+0xa8/0x130
 f2fs_get_dnode_of_data+0xac/0x16d0
 f2fs_do_write_data_page+0x2a5/0x1030
 move_data_page+0x3c5/0xdf0
 do_garbage_collect+0x2015/0x36c0
 f2fs_gc+0x554/0x1d30
 f2fs_balance_fs+0x7f5/0xda0
 f2fs_write_single_data_page+0xb66/0xdc0
 f2fs_write_cache_pages+0x716/0x1420
 f2fs_write_data_pages+0x84f/0x9a0
 do_writepages+0x130/0x3a0
 filemap_fdatawrite_wbc+0x87/0xa0
 file_write_and_wait_range+0x157/0x1c0
 f2fs_do_sync_file+0x206/0x12d0
 f2fs_sync_file+0x99/0xc0
 vfs_fsync_range+0x75/0x140
 f2fs_file_write_iter+0xd7b/0x1850
 vfs_write+0x645/0x780
 ksys_write+0xf1/0x1e0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

As 3db1de0e commit changed atomic write way which new a cow_inode for
atomic write file, and also mark cow_inode as FI_ATOMIC_FILE.
When f2fs_do_write_data_page write cow_inode will use cow_inode's cow_inode
which is NULL. Then will trigger null-ptr-deref.
To solve above issue, introduce FI_COW_FILE flag for COW inode.

Fiexes: 3db1de0e("f2fs: change the current atomic write way")
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NChao Yu <chao@kernel.org>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

4a2c5b79

f2fs: revive F2FS_IOC_ABORT_VOLATILE_WRITE · 23339e57

由 Daeho Jeong 提交于 8月 01, 2022

F2FS_IOC_ABORT_VOLATILE_WRITE was used to abort a atomic write before.
However it was removed accidentally. So revive it by changing the name,
since volatile write had gone.
Signed-off-by: NDaeho Jeong <daehojeong@google.com>
Fiexes: 7bc155fe("f2fs: kill volatile write support")
Reviewed-by: NChao Yu <chao@kernel.org>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

23339e57

04 8月, 2022 7 次提交

ksmbd: fix heap-based overflow in set_ntacl_dacl() · 8f054118

由 Namjae Jeon 提交于 8月 02, 2022

The testcase use SMB2_SET_INFO_HE command to set a malformed file attribute
under the label `security.NTACL`. SMB2_QUERY_INFO_HE command in testcase
trigger the following overflow.

[ 4712.003781] ==================================================================
[ 4712.003790] BUG: KASAN: slab-out-of-bounds in build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.003807] Write of size 1060 at addr ffff88801e34c068 by task kworker/0:0/4190

[ 4712.003813] CPU: 0 PID: 4190 Comm: kworker/0:0 Not tainted 5.19.0-rc5 #1
[ 4712.003850] Workqueue: ksmbd-io handle_ksmbd_work [ksmbd]
[ 4712.003867] Call Trace:
[ 4712.003870]  <TASK>
[ 4712.003873]  dump_stack_lvl+0x49/0x5f
[ 4712.003935]  print_report.cold+0x5e/0x5cf
[ 4712.003972]  ? ksmbd_vfs_get_sd_xattr+0x16d/0x500 [ksmbd]
[ 4712.003984]  ? cmp_map_id+0x200/0x200
[ 4712.003988]  ? build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004000]  kasan_report+0xaa/0x120
[ 4712.004045]  ? build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004056]  kasan_check_range+0x100/0x1e0
[ 4712.004060]  memcpy+0x3c/0x60
[ 4712.004064]  build_sec_desc+0x842/0x1dd0 [ksmbd]
[ 4712.004076]  ? parse_sec_desc+0x580/0x580 [ksmbd]
[ 4712.004088]  ? ksmbd_acls_fattr+0x281/0x410 [ksmbd]
[ 4712.004099]  smb2_query_info+0xa8f/0x6110 [ksmbd]
[ 4712.004111]  ? psi_group_change+0x856/0xd70
[ 4712.004148]  ? update_load_avg+0x1c3/0x1af0
[ 4712.004152]  ? asym_cpu_capacity_scan+0x5d0/0x5d0
[ 4712.004157]  ? xas_load+0x23/0x300
[ 4712.004162]  ? smb2_query_dir+0x1530/0x1530 [ksmbd]
[ 4712.004173]  ? _raw_spin_lock_bh+0xe0/0xe0
[ 4712.004179]  handle_ksmbd_work+0x30e/0x1020 [ksmbd]
[ 4712.004192]  process_one_work+0x778/0x11c0
[ 4712.004227]  ? _raw_spin_lock_irq+0x8e/0xe0
[ 4712.004231]  worker_thread+0x544/0x1180
[ 4712.004234]  ? __cpuidle_text_end+0x4/0x4
[ 4712.004239]  kthread+0x282/0x320
[ 4712.004243]  ? process_one_work+0x11c0/0x11c0
[ 4712.004246]  ? kthread_complete_and_exit+0x30/0x30
[ 4712.004282]  ret_from_fork+0x1f/0x30

This patch add the buffer validation for security descriptor that is
stored by malformed SMB2_SET_INFO_HE command. and allocate large
response buffer about SMB2_O_INFO_SECURITY file info class.

Fixes: e2f34481 ("cifsd: add server-side procedures for SMB3")
Cc: stable@vger.kernel.org
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17771
Reviewed-by: NHyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: NNamjae Jeon <linkinjeon@kernel.org>
Signed-off-by: NSteve French <stfrench@microsoft.com>

8f054118

lockd: detect and reject lock arguments that overflow · 6930bcbf

由 Jeff Layton 提交于 8月 01, 2022

lockd doesn't currently vet the start and length in nlm4 requests like
it should, and can end up generating lock requests with arguments that
overflow when passed to the filesystem.

The NLM4 protocol uses unsigned 64-bit arguments for both start and
length, whereas struct file_lock tracks the start and end as loff_t
values. By the time we get around to calling nlm4svc_retrieve_args,
we've lost the information that would allow us to determine if there was
an overflow.

Start tracking the actual start and len for NLM4 requests in the
nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they
won't cause an overflow, and return NLM4_FBIG if they do.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392Reported-by: NJan Kasiak <j.kasiak@gmail.com>
Signed-off-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org> # 5.14+

6930bcbf

NFSD: discard fh_locked flag and fh_lock/fh_unlock · dd8dd403

由 NeilBrown 提交于 7月 26, 2022

As all inode locking is now fully balanced, fh_put() does not need to
call fh_unlock().
fh_lock() and fh_unlock() are no longer used, so discard them.
These are the only real users of ->fh_locked, so discard that too.
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>

dd8dd403

NFSD: use (un)lock_inode instead of fh_(un)lock for file operations · bb4d53d6

由 NeilBrown 提交于 7月 26, 2022

When locking a file to access ACLs and xattrs etc, use explicit locking
with inode_lock() instead of fh_lock().  This means that the calls to
fh_fill_pre/post_attr() are also explicit which improves readability and
allows us to place them only where they are needed.  Only the xattr
calls need pre/post information.

When locking a file we don't need I_MUTEX_PARENT as the file is not a
parent of anything, so we can use inode_lock() directly rather than the
inode_lock_nested() call that fh_lock() uses.
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>

bb4d53d6

NFSD: use explicit lock/unlock for directory ops · debf16f0

由 NeilBrown 提交于 7月 26, 2022

When creating or unlinking a name in a directory use explicit
inode_lock_nested() instead of fh_lock(), and explicit calls to
fh_fill_pre_attrs() and fh_fill_post_attrs().  This is already done
for renames, with lock_rename() as the explicit locking.

Also move the 'fill' calls closer to the operation that might change the
attributes.  This way they are avoided on some error paths.

For the v2-only code in nfsproc.c, the fill calls are not replaced as
they aren't needed.

Making the locking explicit will simplify proposed future changes to
locking for directories.  It also makes it easily visible exactly where
pre/post attributes are used - not all callers of fh_lock() actually
need the pre/post attributes.
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>

debf16f0

NFSD: reduce locking in nfsd_lookup() · 19d008b4

由 NeilBrown 提交于 7月 26, 2022

nfsd_lookup() takes an exclusive lock on the parent inode, but no
callers want the lock and it may not be needed at all if the
result is in the dcache.

Change nfsd_lookup_dentry() to not take the lock, and call
lookup_one_len_locked() which takes lock only if needed.

nfsd4_open() currently expects the lock to still be held, but that isn't
necessary as nfsd_validate_delegated_dentry() provides required
guarantees without the lock.

NOTE: NFSv4 requires directory changeinfo for OPEN even when a create
  wasn't requested and no change happened.  Now that nfsd_lookup()
  doesn't use fh_lock(), we need to explicitly fill the attributes
  when no create happens.  A new fh_fill_both_attrs() is provided
  for that task.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>

19d008b4

NFSD: only call fh_unlock() once in nfsd_link() · e18bcb33

由 NeilBrown 提交于 7月 26, 2022

On non-error paths, nfsd_link() calls fh_unlock() twice.  This is safe
because fh_unlock() records that the unlock has been done and doesn't
repeat it.
However it makes the code a little confusing and interferes with changes
that are planned for directory locking.

So rearrange the code to ensure fh_unlock() is called exactly once if
fh_lock() was called.
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>

e18bcb33

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功