Commits · 83895227aba1ade33e81f586aa7b6b1e143096a5 · openeuler / Kernel

07 Jul, 2020 · 4 commits

xfs: fix reflink quota reservation accounting error · 83895227

Committed by Darrick J. Wong on Jun 29, 2020

Quota reservations are supposed to account for the blocks that might be
allocated due to a bmap btree split.  Reflink doesn't do this, so fix
this to make the quota accounting more accurate before we start
rearranging things.

Fixes: 862bb360 ("xfs: reflink extents from one file to another")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: don't eat an EIO/ENOSPC writeback error when scrubbing data fork · eb0efe50

Committed by Darrick J. Wong on Jun 29, 2020

The data fork scrubber calls filemap_write_and_wait to flush dirty pages
and delalloc reservations out to disk prior to checking the data fork's
extent mappings.  Unfortunately, this means that scrub can consume the
EIO/ENOSPC errors that would otherwise have stayed around in the address
space until (we hope) the writer application calls fsync to persist data
and collect errors.  The end result is that programs that wrote to a
file might never see the error code and proceed as if nothing were
wrong.

xfs_scrub is not in a position to notify file writers about the
writeback failure, and it's only here to check metadata, not file
contents.  Therefore, if writeback fails, we should stuff the error code
back into the address space so that an fsync by the writer application
can pick that up.

Fixes: 99d9d8d0 ("xfs: scrub inode block mappings")
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

xfs: preserve rmapbt swapext block reservation from freed blocks · f74681ba

Committed by Brian Foster on Jun 29, 2020

The rmapbt extent swap algorithm remaps individual extents between
the source inode and the target to trigger reverse mapping metadata
updates. If either inode straddles a format or other bmap allocation
boundary, the individual unmap and map cycles can trigger repeated
bmap block allocations and frees as the extent count bounces back
and forth across the boundary. While net block usage is bound across
the swap operation, this behavior can prematurely exhaust the
transaction block reservation because it continuously drains as the
transaction rolls. Each allocation accounts against the reservation
and each free returns to global free space on transaction roll.

The previous workaround to this problem attempted to detect this
boundary condition and provide surplus block reservation to
accommodate it. This is insufficient because more remaps can occur
than implied by the extent counts, for example if start offset
boundaries are not aligned between the two inodes.

To address this problem more generically and dynamically, add a
transaction accounting mode that returns freed blocks to the
transaction reservation instead of the superblock counters on
transaction roll and use it when the rmapbt based algorithm is
active. This allows the chain of remap transactions to preserve the
block reservation based on its own frees and prevent premature
exhaustion regardless of the remap pattern. Note that this is only
safe for superblocks with lazy sb accounting, but the latter is
required for v5 supers and the rmap feature depends on v5.

Fixes: b3fed434 ("xfs: account format bouncing into rmapbt swapext tx reservation")
Root-caused-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
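The accounting mode described in this commit can be pictured with a toy model. The sketch below is a userspace simulation, not the XFS code: `struct toy_tx`, `toy_alloc`, `toy_free`, `toy_bounce`, and the `return_frees_to_res` flag are all hypothetical names, and a "roll" is reduced to one allocate/free cycle. It only illustrates why returning freed blocks to the reservation keeps a chain of remaps from draining it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a rolling transaction; every name here is hypothetical. */
struct toy_tx {
	long res_blocks;          /* blocks left in the transaction reservation */
	long global_free;         /* stand-in for the global free space counter */
	bool return_frees_to_res; /* the accounting mode the commit describes */
};

/* An allocation always draws from the reservation; fails when exhausted. */
static bool toy_alloc(struct toy_tx *tx)
{
	if (tx->res_blocks == 0)
		return false;
	tx->res_blocks--;
	return true;
}

/* A free goes back to the reservation or to global free space. */
static void toy_free(struct toy_tx *tx)
{
	if (tx->return_frees_to_res)
		tx->res_blocks++;
	else
		tx->global_free++;
}

/*
 * Bounce across an allocation boundary up to `cycles` times: each cycle
 * allocates one block and frees it again. Returns how many cycles
 * completed before the reservation ran out.
 */
static long toy_bounce(struct toy_tx *tx, long cycles)
{
	long done = 0;

	while (done < cycles && toy_alloc(tx)) {
		toy_free(tx);
		done++;
	}
	return done;
}
```

With the flag clear, a reservation of N blocks survives only N bounces even though net usage is zero; with the flag set, the reservation is unchanged after any number of bounces.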

xfs: Couple of typo fixes in comments · 06734e3c

Committed by Keyur Patel on Jun 29, 2020

./xfs/libxfs/xfs_inode_buf.c:56: unnecssary ==> unnecessary
./xfs/libxfs/xfs_inode_buf.c:59: behavour ==> behaviour
./xfs/libxfs/xfs_inode_buf.c:206: unitialized ==> uninitialized

Signed-off-by: Keyur Patel <iamkeyur96@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

05 Jul, 2020 · 1 commit

io_uring: fix regression with always ignoring signals in io_cqring_wait() · b7db41c9

Committed by Jens Axboe on Jul 04, 2020

When switching to TWA_SIGNAL for task_work notifications, we also made
any signal based condition in io_cqring_wait() return -ERESTARTSYS.
This breaks applications that rely on using signals to abort someone
waiting for events.

Check if we have a signal pending because of queued task_work, and
repeat the signal check once we've run the task_work. This provides a
reliable way of telling the two apart.

Additionally, only use TWA_SIGNAL if we are using an eventfd. If not,
we don't have the dependency situation described in the original commit,
and we can get by with just using TWA_RESUME like we previously did.

Fixes: ce593a6c ("io_uring: use signal based task_work running")
Cc: stable@vger.kernel.org # v5.7
Reported-by: Andres Freund <andres@anarazel.de>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

04 Jul, 2020 · 1 commit

Call sysctl_head_finish on error · d4d80e69

Committed by Matthew Wilcox (Oracle) on Jul 03, 2020

This error path returned directly instead of calling sysctl_head_finish().

Fixes: ef9d965b ("sysctl: reject gigantic reads/write to sysctl files")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

03 Jul, 2020 · 5 commits

gfs2: The freeze glock should never be frozen · c860f8ff

Committed by Bob Peterson on Jun 25, 2020

Before this patch, some gfs2 code locked the freeze glock with the
LM_FLAG_NOEXP (Do not freeze) flag, and some did not. We never want to
freeze the freeze glock, so this patch makes it always use LM_FLAG_NOEXP.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>

gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE · 623ba664

Committed by Bob Peterson on Jun 25, 2020

Before this patch, the freeze code in gfs2 specified GL_NOCACHE in
several places. That's wrong because we always want to know the state
of whether the file system is frozen.

There was also a problem with freeze/thaw transitioning the glock from
frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX
to processes that request it in SH mode, unless GL_EXACT is specified.
Therefore, the freeze/thaw code, which tried to reacquire the glock in
SH mode would get the glock in EX mode, and miss the transition from EX
to SH. That made it think the thaw had completed normally, but since the
glock was still cached in EX, other nodes could not freeze again.

This patch removes the GL_NOCACHE flag to allow the freeze glock to be
cached. It also adds the GL_EXACT flag so the glock is fully transitioned
from EX to SH, thereby allowing future freeze operations.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>

gfs2: read-only mounts should grab the sd_freeze_gl glock · b780cc61

Committed by Bob Peterson on Jun 25, 2020

Before this patch, only read-write mounts would grab the freeze
glock in read-only mode, as part of gfs2_make_fs_rw. So the freeze
glock was never initialized. That meant requests to freeze, which
request the glock in EX, were granted without any state transition.
That meant you could mount a gfs2 file system, which is currently
frozen on a different cluster node, in read-only mode.

This patch makes read-only mounts lock the freeze glock in SH mode,
which will block for file systems that are frozen on another node.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>

gfs2: freeze should work on read-only mounts · 541656d3

Committed by Bob Peterson on Jun 25, 2020

Before this patch, function freeze_go_sync, called when promoting
the freeze glock, was testing for the SDF_JOURNAL_LIVE superblock flag.
That's only set for read-write mounts. Read-only mounts don't use a
journal, so the bit is never set, so the freeze never happened.

This patch removes the check for SDF_JOURNAL_LIVE for freeze requests
but still checks it when deciding whether to flush a journal.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>

gfs2: eliminate GIF_ORDERED in favor of list_empty · 7542486b

Committed by Bob Peterson on Jun 17, 2020

In several places, we used the GIF_ORDERED inode flag to determine
if an inode was on the ordered writes list. However, since we always
held the sd_ordered_lock spin_lock during the manipulation, we can
just as easily check list_empty(&ip->i_ordered) instead.
This allows us to keep more than one ordered writes list to make
journal writing improvements.

This patch eliminates GIF_ORDERED in favor of checking list_empty.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
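The idea of using list membership instead of a separate flag can be sketched in userspace. The block below re-creates a minimal circular list in the style of the kernel's `struct list_head`; `struct toy_inode` and `toy_inode_is_ordered` are hypothetical names, and the `sd_ordered_lock` locking mentioned in the commit is left out. It relies on removal via a `list_del_init`-style helper, so that an unlinked entry points back at itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal userspace copy of the kernel's circular list idea. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert `entry` right after `head`. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink and re-initialise, so list_empty(entry) is true afterwards. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical stand-in for the inode: no separate "ordered" flag. */
struct toy_inode {
	struct list_head i_ordered;
};

/* The list linkage itself answers "is this inode on an ordered list?". */
static bool toy_inode_is_ordered(const struct toy_inode *ip)
{
	return !list_empty(&ip->i_ordered);
}
```

Because the check does not name a particular list head, the same test keeps working if there is more than one ordered writes list.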

02 Jul, 2020 · 8 commits

cifs: prevent truncation from long to int in wait_for_free_credits · 19e88867

Committed by Ronnie Sahlberg on Jul 02, 2020

The wait_event_... defines evaluate to long, so we should not assign the
result to an int as this may truncate the value.

Reported-by: Marshall Midden <marshallmidden@gmail.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
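A small userspace sketch of the truncation hazard follows. It assumes a target where long is wider than int (typical LP64) and a compiler that wraps out-of-range conversions, as GCC and Clang do; `timed_out_int` and `timed_out_long` are hypothetical helpers, and the convention "0 means timed out, positive means time remaining" is used here only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Buggy pattern: storing the long result in an int before testing it. */
static bool timed_out_int(long remaining)
{
	int rc = (int)remaining; /* may drop the high bits of the value */

	return rc == 0;
}

/* Fixed pattern: keep the result in a long. */
static bool timed_out_long(long remaining)
{
	long rc = remaining;

	return rc == 0;
}
```

For a value such as `LONG_MAX / 2 + 1`, whose low 32 bits are all zero on LP64, the int variant can report a timeout that never happened, while the long variant does not.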

cifs: Fix the target file was deleted when rename failed. · 9ffad926

Committed by Zhang Xiaoxu on Jun 28, 2020

When running xfstests generic/035, we found that the target file was
deleted if the rename returned -EACCES.

In cifs_rename2, we unlink the positive target dentry if the rename
failed with EACCES or EEXIST, even if the target dentry was positive
before the rename. The existing file was then deleted.

We should delete only a target file that was created during the
rename.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

SMB3: Honor 'posix' flag for multiuser mounts · 5391b8e1

Committed by Paul Aurich on Jun 26, 2020

The flag from the primary tcon needs to be copied into the volume info
so that cifs_get_tcon will try to enable extensions on the per-user
tcon. At that point, since posix extensions must have already been
enabled on the superblock, don't try to needlessly adjust the mount
flags.

Fixes: ce558b0e ("smb3: Add posix create context for smb3.11 posix mounts")
Fixes: b326614e ("smb3: allow "posix" mount option to enable new SMB311 protocol extensions")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

SMB3: Honor 'handletimeout' flag for multiuser mounts · 6b356f6c

Committed by Paul Aurich on Jun 26, 2020

Fixes: ca567eb2 ("SMB3: Allow persistent handle timeout to be configurable on mount")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

SMB3: Honor lease disabling for multiuser mounts · ad35f169

Committed by Paul Aurich on Jun 26, 2020

Fixes: 3e7a02d4 ("smb3: allow disabling requesting leases")
Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

SMB3: Honor persistent/resilient handle flags for multiuser mounts · 00dfbc2f

Committed by Paul Aurich on Jun 26, 2020

Without this:

- persistent handles will only be enabled for per-user tcons if the
  server advertises the 'Continuous Availability' capability
- resilient handles would never be enabled for per-user tcons

Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

SMB3: Honor 'seal' flag for multiuser mounts · cc15461c

Committed by Paul Aurich on Jun 26, 2020

Ensure multiuser SMB3 mounts use encryption for all users' tcons if the
mount options are configured to require encryption. Without this, only
the primary tcon and IPC tcons are guaranteed to be encrypted. Per-user
tcons would only be encrypted if the server was configured to require
encryption.

Signed-off-by: Paul Aurich <paul@darkrain42.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

cifs: Display local UID details for SMB sessions in DebugData · aadd69ca

Committed by Paul Aurich on Jun 26, 2020

This is useful for distinguishing SMB sessions on a multiuser mount.

Signed-off-by: Paul Aurich <paul@darkrain42.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>

01 Jul, 2020 · 1 commit

io_uring: use signal based task_work running · ce593a6c

Committed by Jens Axboe on Jun 30, 2020

Since 5.7, we've been using task_work to trigger async running of
requests in the context of the original task. This generally works
great, but there's a case where if the task is currently blocked
in the kernel waiting on a condition to become true, it won't process
task_work. Even though the task is woken, it just checks whatever
condition it's waiting on, and goes back to sleep if it's still false.

This is a problem if that very condition only becomes true when that
task_work is run. An example of that is the task registering an eventfd
with io_uring, and it's now blocked waiting on an eventfd read. That
read could depend on a completion event, and that completion event
won't get triggered until task_work has been run.

Use the TWA_SIGNAL notification for task_work, so that we ensure that
the task always runs the work when queued.

Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>

30 Jun, 2020 · 6 commits

gfs2: Don't sleep during glock hash walk · 34244d71

Committed by Andreas Gruenbacher on Jun 10, 2020

In flush_delete_work, instead of flushing each individual pending
delayed work item, cancel and re-queue them for immediate execution.
The waiting isn't needed here because we're already waiting for all
queued work items to complete in gfs2_flush_delete_work. This makes the
code more efficient, but more importantly, it avoids sleeping during a
rhashtable walk, inside rcu_read_lock().

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

gfs2: fix trans slab error when withdraw occurs inside log_flush · 58e08e8d

Committed by Bob Peterson on Jun 09, 2020

Log flush operations (gfs2_log_flush()) can target a specific transaction.
But if the function encountered errors (e.g. io errors) and withdrew,
the transaction was only freed if it was queued to one of the ail lists.
If the withdraw occurred before the transaction was queued to the ail1
list, function ail_drain never freed it. The result was:

BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown()

This patch makes log_flush() add the targeted transaction to the ail1
list so that function ail_drain() will find and free it properly.

Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>

gfs2: Don't return NULL from gfs2_inode_lookup · 5902f4dd

Committed by Andreas Gruenbacher on Jun 09, 2020

Callers expect gfs2_inode_lookup to return an inode pointer or ERR_PTR(error).
Commit b66648ad caused it to return NULL instead of ERR_PTR(-ESTALE) in
some cases.  Fix that.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: b66648ad ("gfs2: Move inode generation number check into gfs2_inode_lookup")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
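Why a stray NULL is a problem for callers that expect an inode pointer or ERR_PTR(error) can be shown with a userspace re-creation of the error-pointer convention. The helpers below mimic the kernel's `ERR_PTR`, `PTR_ERR`, and `IS_ERR` but are redefined locally for this sketch; `struct toy_inode` and `toy_lookup` are hypothetical and do not reflect the real gfs2_inode_lookup signature.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-creation of the kernel's error-pointer convention. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers live in the top MAX_ERRNO values of the address space. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_inode {
	int generation;
};

/* Hypothetical lookup: a generation mismatch means a stale handle. */
static struct toy_inode *toy_lookup(struct toy_inode *found, int want_gen)
{
	if (found->generation != want_gen)
		return ERR_PTR(-ESTALE); /* NULL here would slip past IS_ERR() */
	return found;
}
```

A caller that only tests `IS_ERR(result)` treats NULL as a valid inode and dereferences it, which is why the failure path has to return an encoded error instead.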

nfsd: fix nfsdfs inode reference count leak · bf265401

Committed by J. Bruce Fields on Jun 23, 2020

I don't understand this code well, but I'm seeing a warning about a
still-referenced inode on unmount, and every other similar filesystem
does a dput() here.

Fixes: e8a79fb1 ("nfsd: add nfsd/clients directory")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd4: fix nfsdfs reference count loop · 681370f4

Committed by J. Bruce Fields on Jun 23, 2020

We don't drop the reference on the nfsdfs filesystem with
mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called
until the nfsd module's unloaded, and we can't unload the module as long
as there's a reference on nfsdfs. So this prevents module unloading.

Fixes: 2c830dd7 ("nfsd: persist nfsd filesystem across mounts")
Reported-and-Tested-by: Luo Xiaogang <lxgrxd@163.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

Revert "fs: Do not check if there is a fsnotify watcher on pseudo inodes" · b6509f6a

Committed by Mel Gorman on Jun 29, 2020

This reverts commit e9c15bad ("fs: Do not check if there is a
fsnotify watcher on pseudo inodes"). The commit intended to eliminate
fsnotify-related overhead for pseudo inodes but it is broken in
concept. inotify can receive events of pipe files under /proc/X/fd and
chromium relies on close and open events for sandboxing. Maxim Levitsky
reported the following

  Chromium starts as a white rectangle, shows few white rectangles that
  resemble its notifications and then crashes.

  The stdout output from chromium:

  [mlevitsk@starship ~]$chromium-freeworld
  mesa: for the   --simplifycfg-sink-common option: may only occur zero or one times!
  mesa: for the   --global-isel-abort option: may only occur zero or one times!
  [3379:3379:0628/135151.440930:ERROR:browser_switcher_service.cc(238)] XXX Init()
  ../../sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc:**CRASHING**:seccomp-bpf failure in syscall 0072
  Received signal 11 SEGV_MAPERR 0000004a9048

Crashes are not universal but even if chromium does not crash, it certainly
does not work properly. While filtering just modify and access might be
safe, the benefit is not worth the risk hence the revert.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Fixes: e9c15bad ("fs: Do not check if there is a fsnotify watcher on pseudo inodes")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

29 Jun, 2020 · 5 commits

exfat: flush dirty metadata in fsync · 5267456e

Committed by Sungjong Seo on Jun 18, 2020

The generic_file_fsync() that exfat used could not guarantee the
consistency of a file because it flushed only the file's dirty data
pages, not its dirty metadata.

Instead, use exfat_file_fsync() for files and directories so that both
the metadata and the data pages of a file are committed.

Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>

exfat: move setting VOL_DIRTY over exfat_remove_entries() · 3bcfb701

Committed by Namjae Jeon on Jun 17, 2020

Move setting VOL_DIRTY over exfat_remove_entries() to avoid needlessly
leaving VOL_DIRTY set on -ENOTEMPTY.

Fixes: 5f2aa075 ("exfat: add inode operations")
Cc: stable@vger.kernel.org # v5.7
Reported-by: Tetsuhiro Kohada <kohada.t2@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>

exfat: call sync_filesystem for read-only remount · a0271a15

Committed by Hyunchul Lee on Jun 16, 2020

We need to commit dirty metadata and pages to disk
before remounting exfat as read-only.

This fixes a failure in xfstests generic/452.

generic/452 does the following:
cp something <exfat>/
mount -o remount,ro <exfat>

Afterwards, <exfat>/something is corrupted, because while
exfat is being remounted as read-only it doesn't
have a chance to commit metadata, and
vfs invalidates the page caches of the block device.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>

exfat: add missing brelse() calls on error paths · e8dd3cda

Committed by Dan Carpenter on Jun 10, 2020

If the second exfat_get_dentry() call fails then we need to release
"old_bh" before returning.  There is a similar bug in exfat_move_file().

Fixes: 5f2aa075 ("exfat: add inode operations")
Reported-by: Markus Elfring <Markus.Elfring@web.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
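The shape of this kind of fix, releasing the first buffer when a second lookup fails, can be sketched with a toy reference count. Nothing below is exfat code: `struct toy_buf`, `toy_get`, `toy_release`, and `toy_rename` are hypothetical, and -EIO is an arbitrary error value chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy buffer with a reference count; all names are hypothetical. */
struct toy_buf {
	int refs;
};

/* Take a reference, or fail on request to simulate a failed lookup. */
static struct toy_buf *toy_get(struct toy_buf *b, int fail)
{
	if (fail)
		return NULL;
	b->refs++;
	return b;
}

/* Drop a reference; plays the role brelse() has in the commit message. */
static void toy_release(struct toy_buf *b)
{
	if (b)
		b->refs--;
}

/* Two lookups in a row: a failure of the second must release the first. */
static int toy_rename(struct toy_buf *old_b, struct toy_buf *new_b,
		      int fail_second)
{
	struct toy_buf *old_bh, *new_bh;
	int ret = 0;

	old_bh = toy_get(old_b, 0);
	if (!old_bh)
		return -EIO;

	new_bh = toy_get(new_b, fail_second);
	if (!new_bh) {
		ret = -EIO;
		goto out_old; /* a bare "return -EIO" here would leak old_bh */
	}

	toy_release(new_bh);
out_old:
	toy_release(old_bh);
	return ret;
}
```

Routing every exit after the first successful lookup through one label keeps the release paired with the acquire, whichever step fails.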

exfat: Set the unused characters of FileName field to the value 0000h · 4ba6ccd6

Committed by Hyeongseok.Kim on Jun 09, 2020

Some fsck tools complain that the padding part of the FileName field
is not set to the value 0000h. So let's keep the filesystem cleaner,
as the exfat specification recommends.

Signed-off-by: Hyeongseok.Kim <Hyeongseok@gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>

28 Jun, 2020 · 1 commit

afs: Fix storage of cell names · 719fdd32

Committed by David Howells on Jun 24, 2020

The cell name stored in the afs_cell struct is a 64-char + NUL buffer -
when it needs to be able to handle up to AFS_MAXCELLNAME (256 chars) + NUL.

Fix this by changing the array to a pointer and allocating the string.

Found using Coverity.

Fixes: 989782dc ("afs: Overhaul cell database management")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
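Changing a fixed array to an allocated string can be sketched as follows. This is a userspace illustration under stated assumptions: `struct toy_cell`, `toy_cell_alloc`, and `toy_cell_free` are hypothetical, and `TOY_MAXCELLNAME` simply mirrors the 256-character limit quoted in the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAXCELLNAME 256 /* mirrors the AFS_MAXCELLNAME limit quoted above */

/* Hypothetical cell record: the name is a pointer, not a fixed array. */
struct toy_cell {
	char *name;
	size_t name_len;
};

/* Returns NULL if the name is empty, too long, or allocation fails. */
static struct toy_cell *toy_cell_alloc(const char *name, size_t len)
{
	struct toy_cell *cell;

	if (len == 0 || len > TOY_MAXCELLNAME)
		return NULL;

	cell = calloc(1, sizeof(*cell));
	if (!cell)
		return NULL;

	cell->name = malloc(len + 1); /* room for the name plus the NUL */
	if (!cell->name) {
		free(cell);
		return NULL;
	}
	memcpy(cell->name, name, len);
	cell->name[len] = '\0';
	cell->name_len = len;
	return cell;
}

static void toy_cell_free(struct toy_cell *cell)
{
	if (cell) {
		free(cell->name);
		free(cell);
	}
}
```

Sizing the allocation from the actual length means a name longer than 64 characters no longer has to fit a buffer chosen at compile time, while the explicit upper bound still rejects oversized input.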

26 Jun, 2020 · 7 commits

NFSv4 fix CLOSE not waiting for direct IO compeletion · d03727b2

Committed by Olga Kornievskaia on Jun 24, 2020

Figuring out the root cause of the REMOVE/CLOSE race and
suggesting the solution was done by Neil Brown.

Currently what happens is that direct IO calls hold a reference
on the open context, which is decremented as an asynchronous task
in nfs_direct_complete(). Before the reference is decremented,
control is returned to the application, which is free to close the
file. When the close is processed, it decrements its reference
on the open_context, but since direct IO still holds one, it doesn't
send a close on the wire. It returns control to the application,
which is free to do other operations. For instance, it can delete the
file. Direct IO finally releases its reference and triggers
an asynchronous close, which races with the REMOVE. On the server,
REMOVE can be processed before the CLOSE, failing the REMOVE with
EACCES as the file is still open.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Suggested-by: Neil Brown <neilb@suse.com>
CC: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

pNFS/flexfiles: Fix list corruption if the mirror count changes · 8b040137

Committed by Trond Myklebust on Jun 22, 2020

If the mirror count changes in the new layout we pick up inside
ff_layout_pg_init_write(), then we can end up adding the
request to the wrong mirror and corrupting the mirror->pg_list.

Fixes: d600ad1f ("NFS41: pop some layoutget errors to application")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

nfs: Fix memory leak of export_path · 4659ed7c

Committed by Tom Rix on Jun 12, 2020

The try_location function is called within a loop by nfs_follow_referral.
try_location calls nfs4_pathname_string to create the export_path.
nfs4_pathname_string allocates the memory. export_path is stored in the
nfs_fs_context/fs_context structure similarly as hostname and source.
But whereas the ctx hostname and source are freed before assignment,
export_path is not.  So if there are multiple loops, the new export_path
will overwrite the old without the old being freed.

So call kfree for export_path.

Signed-off-by: Tom Rix <trix@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
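The leak pattern described in this commit, overwriting an owned pointer inside a loop, can be reproduced in a few lines of userspace C. `struct toy_ctx`, `toy_strdup`, `toy_free`, `toy_try_location`, and the `live_allocs` counter are all hypothetical; the counter exists only so the effect can be observed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical context holding an owned, heap-allocated path. */
struct toy_ctx {
	char *export_path;
};

static int live_allocs; /* strings not yet freed, for the demo only */

static char *toy_strdup(const char *s)
{
	size_t n = strlen(s) + 1;
	char *p = malloc(n);

	if (p) {
		memcpy(p, s, n);
		live_allocs++;
	}
	return p;
}

static void toy_free(char *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Called once per candidate location, possibly many times in a loop. */
static int toy_try_location(struct toy_ctx *ctx, const char *path)
{
	char *new_path = toy_strdup(path);

	if (!new_path)
		return -1;
	toy_free(ctx->export_path); /* free the old string before overwriting */
	ctx->export_path = new_path;
	return 0;
}
```

Without the `toy_free` call before the assignment, each extra iteration would leave one more string allocated and unreachable.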

ocfs2: fix value of OCFS2_INVALID_SLOT · 9277f833

Committed by Junxiao Bi on Jun 25, 2020

In the ocfs2 disk layout, the slot number is 16 bits, but in the ocfs2
implementation, the slot number is 32 bits.  Usually this does not cause
any issue, because the slot number is converted from u16 to u32.  However,
OCFS2_INVALID_SLOT was defined as -1, so when an invalid slot number was
read from disk, its value was (u16)-1, and it was then converted to u32.
The following check in get_local_system_inode is then always skipped:

  static struct inode **get_local_system_inode(struct ocfs2_super *osb,
                                               int type,
                                               u32 slot)
  {
  	BUG_ON(slot == OCFS2_INVALID_SLOT);
  	...
  }

Link: http://lkml.kernel.org/r/20200616183829.87211-5-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
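The width mismatch behind this bug is easy to demonstrate in userspace. In the sketch below, the `TOY_` macros and helper functions are hypothetical names; the second macro shows one way to make the sentinel match what a 16-bit on-disk field can hold, and is not claimed to be the exact definition used by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old-style sentinel: -1, which is 0xFFFFFFFF when compared as a u32. */
#define TOY_INVALID_SLOT_OLD ((uint32_t)-1)
/* Sentinel that matches what a 16-bit on-disk field can actually hold. */
#define TOY_INVALID_SLOT_NEW ((uint16_t)-1)

/* Widen a 16-bit on-disk slot number to the 32-bit in-memory type. */
static uint32_t toy_slot_from_disk(uint16_t on_disk)
{
	return on_disk; /* zero-extends: 0xFFFF becomes 0x0000FFFF */
}

static bool toy_is_invalid_old(uint32_t slot)
{
	return slot == TOY_INVALID_SLOT_OLD;
}

static bool toy_is_invalid_new(uint32_t slot)
{
	return slot == TOY_INVALID_SLOT_NEW;
}
```

An on-disk value of 0xFFFF widens to 65535, which never equals 0xFFFFFFFF, so a check written against the old sentinel cannot fire for values that came from disk.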

ocfs2: fix panic on nfs server over ocfs2 · e5a15e17

Committed by Junxiao Bi on Jun 25, 2020

The following kernel panic was captured when running an nfs server over
ocfs2.  At that time, ocfs2_test_inode_bit() was checking whether the
inode located at "blkno" 5 was valid.  That is the ocfs2 root inode: its
"suballoc_slot" was OCFS2_INVALID_SLOT (65535) and it was allocated from
//global_inode_alloc.  The code wrongly assumed that it came from a
per-slot inode allocator, which caused an array overflow and triggered the
kernel panic.

  BUG: unable to handle kernel paging request at 0000000000001088
  IP: [<ffffffff816f6898>] _raw_spin_lock+0x18/0xf0
  PGD 1e06ba067 PUD 1e9e7d067 PMD 0
  Oops: 0002 [#1] SMP
  CPU: 6 PID: 24873 Comm: nfsd Not tainted 4.1.12-124.36.1.el6uek.x86_64 #2
  Hardware name: Huawei CH121 V3/IT11SGCA1, BIOS 3.87 02/02/2018
  RIP: _raw_spin_lock+0x18/0xf0
  RSP: e02b:ffff88005ae97908  EFLAGS: 00010206
  RAX: ffff88005ae98000 RBX: 0000000000001088 RCX: 0000000000000000
  RDX: 0000000000020000 RSI: 0000000000000009 RDI: 0000000000001088
  RBP: ffff88005ae97928 R08: 0000000000000000 R09: ffff880212878e00
  R10: 0000000000007ff0 R11: 0000000000000000 R12: 0000000000001088
  R13: ffff8800063c0aa8 R14: ffff8800650c27d0 R15: 000000000000ffff
  FS:  0000000000000000(0000) GS:ffff880218180000(0000) knlGS:ffff880218180000
  CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000001088 CR3: 00000002033d0000 CR4: 0000000000042660
  Call Trace:
    igrab+0x1e/0x60
    ocfs2_get_system_file_inode+0x63/0x3a0 [ocfs2]
    ocfs2_test_inode_bit+0x328/0xa00 [ocfs2]
    ocfs2_get_parent+0xba/0x3e0 [ocfs2]
    reconnect_path+0xb5/0x300
    exportfs_decode_fh+0xf6/0x2b0
    fh_verify+0x350/0x660 [nfsd]
    nfsd4_putfh+0x4d/0x60 [nfsd]
    nfsd4_proc_compound+0x3d3/0x6f0 [nfsd]
    nfsd_dispatch+0xe0/0x290 [nfsd]
    svc_process_common+0x412/0x6a0 [sunrpc]
    svc_process+0x123/0x210 [sunrpc]
    nfsd+0xff/0x170 [nfsd]
    kthread+0xcb/0xf0
    ret_from_fork+0x61/0x90
  Code: 83 c2 02 0f b7 f2 e8 18 dc 91 ff 66 90 eb bf 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 53 0f 1f 44 00 00 48 89 fb ba 00 00 02 00 <f0> 0f c1 17 89 d0 45 31 e4 45 31 ed c1 e8 10 66 39 d0 41 89 c6
  RIP   _raw_spin_lock+0x18/0xf0
  CR2: 0000000000001088
  ---[ end trace 7264463cd1aac8f9 ]---
  Kernel panic - not syncing: Fatal exception

Link: http://lkml.kernel.org/r/20200616183829.87211-4-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: load global_inode_alloc · 7569d3c7

Committed by Junxiao Bi on Jun 25, 2020

Set global_inode_alloc as OCFS2_FIRST_ONLINE_SYSTEM_INODE, that will
make it load during mount.  It can be used to test whether some
global/system inodes are valid.  One use case is that nfsd will test
whether root inode is valid.

Link: http://lkml.kernel.org/r/20200616183829.87211-3-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ocfs2: avoid inode removal while nfsd is accessing it · 4cd9973f

Committed by Junxiao Bi on Jun 25, 2020

Patch series "ocfs2: fix nfsd over ocfs2 issues", v2.

This is a series of patches to fix issues on nfsd over ocfs2.  Patch 1
avoids an inode being removed while nfsd accesses it; patches 2 & 3 fix a
panic issue.

This patch (of 4):

When nfsd gets a file dentry from a handle, or the parent dentry of some
dentry, one cluster lock is used to prevent the inode from being removed
by another node, but it could still be removed on the local node, so use a
rw lock to prevent this.

Link: http://lkml.kernel.org/r/20200616183829.87211-1-junxiao.bi@oracle.com
Link: http://lkml.kernel.org/r/20200616183829.87211-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

25 Jun, 2020 · 1 commit

io_uring: fix current->mm NULL dereference on exit · d60b5fbc

Committed by Pavel Begunkov on Jun 25, 2020

Don't reissue requests from io_iopoll_reap_events(); the task may not
have an mm, which ends up with a NULL dereference. It's better to kill
everything off on exit anyway.

[  677.734670] RIP: 0010:io_iopoll_complete+0x27e/0x630
...
[  677.734679] Call Trace:
[  677.734695]  ? __send_signal+0x1f2/0x420
[  677.734698]  ? _raw_spin_unlock_irqrestore+0x24/0x40
[  677.734699]  ? send_signal+0xf5/0x140
[  677.734700]  io_iopoll_getevents+0x12f/0x1a0
[  677.734702]  io_iopoll_reap_events.part.0+0x5e/0xa0
[  677.734703]  io_ring_ctx_wait_and_kill+0x132/0x1c0
[  677.734704]  io_uring_release+0x20/0x30
[  677.734706]  __fput+0xcd/0x230
[  677.734707]  ____fput+0xe/0x10
[  677.734709]  task_work_run+0x67/0xa0
[  677.734710]  do_exit+0x35d/0xb70
[  677.734712]  do_group_exit+0x43/0xa0
[  677.734713]  get_signal+0x140/0x900
[  677.734715]  do_signal+0x37/0x780
[  677.734717]  ? enqueue_hrtimer+0x41/0xb0
[  677.734718]  ? recalibrate_cpu_khz+0x10/0x10
[  677.734720]  ? ktime_get+0x3e/0xa0
[  677.734721]  ? lapic_next_deadline+0x26/0x30
[  677.734723]  ? tick_program_event+0x4d/0x90
[  677.734724]  ? __hrtimer_get_next_event+0x4d/0x80
[  677.734726]  __prepare_exit_to_usermode+0x126/0x1c0
[  677.734741]  prepare_exit_to_usermode+0x9/0x40
[  677.734742]  idtentry_exit_cond_rcu+0x4c/0x60
[  677.734743]  sysvec_reschedule_ipi+0x92/0x160
[  677.734744]  ? asm_sysvec_reschedule_ipi+0xa/0x20
[  677.734745]  asm_sysvec_reschedule_ipi+0x12/0x20
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

openeuler / Kernel · synced successfully more than a year ago