Commits · 028b49111ed1bff3e161037f9ac66b8e6428b8a9 · openanolis / cloud-kernel

02 Sep, 2020 40 commits

xfs: add agf freeblocks verify in xfs_agf_verify · 028b4911

Authored by Zheng Bin on Feb 21, 2020

to #28557760

[ Upstream commit d0c7feaf87678371c2c09b3709400be416b2dc62 ]

We recently used a fuzzer (hydra) to test XFS. It automatically generated
tmp.img (XFS v5 format, but with some corrupted metadata).

xfs_repair information(just one AG):
agf_freeblks 0, counted 3224 in ag 0
agf_longest 536874136, counted 3224 in ag 0
sb_fdblocks 613, counted 3228

Test as follows:
mount tmp.img tmpdir
cp file1M tmpdir
sync

In 4.19-stable, sync gets stuck. The reason is:
xfs_mountfs
  xfs_check_summary_counts
    if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
       XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
       !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
	return 0;  -->just return, incore sb_fdblocks still be 613
    xfs_initialize_perag_data

cp file1M tmpdir -->ok(write file to pagecache)
sync -->stuck(write pagecache to disk)
xfs_map_blocks
  xfs_iomap_write_allocate
    while (count_fsb != 0) {
      nimaps = 0;
      while (nimaps == 0) { --> endless loop
         nimaps = 1;
         xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again
xfs_bmapi_write
  xfs_bmap_alloc
    xfs_bmap_btalloc
      xfs_alloc_vextent
        xfs_alloc_fix_freelist
          xfs_alloc_space_available -->fail(agf_freeblks is 0)

In linux-next, sync does not get stuck, because commit c2b3164320b5 ("xfs:
use the latest extent at writeback delalloc conversion time") removed
the above while loop. dmesg shows:
[   55.250114] XFS (loop0): page discard on page ffffea0008bc7380, inode 0x1b0c, offset 0.

Users do not know why this page was discarded. Better solutions are:
1. Like xfs_repair, make sure sb_fdblocks is equal to the counted value
(xfs_initialize_perag_data does this, but it is not called on this mount)
2. Add an AGF verifier check; if it fails, tell users to run repair

This patch uses the second solution.

Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Ren Xudong <renxudong1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

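The second approach described in this commit message amounts to rejecting an AGF whose free-space counters contradict each other. The following is a minimal userspace sketch of that idea, not the upstream verifier code. The `struct agf_counters` type and the `agf_counters_plausible` helper are hypothetical, the fields are assumed to be in host byte order (the on-disk AGF is big-endian and has many more fields), and the two comparisons are an assumption about what a plausibility check could look like.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the AGF counters (host byte order). */
struct agf_counters {
    uint32_t length;    /* size of the allocation group in blocks */
    uint32_t freeblks;  /* total free blocks in the allocation group */
    uint32_t longest;   /* length of the longest free extent */
};

/* Return true when the counters are mutually consistent. */
static bool agf_counters_plausible(const struct agf_counters *agf)
{
    /* The longest free extent cannot be larger than all free space. */
    if (agf->freeblks < agf->longest)
        return false;
    /* Free space cannot exceed the size of the allocation group. */
    if (agf->freeblks > agf->length)
        return false;
    return true;
}
```

With the xfs_repair values quoted in the message (agf_freeblks 0, agf_longest 536874136), the first comparison fails whatever the AG length is, so a check of this kind would flag the image at read time instead of letting writeback stall.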

dm: use noio when sending kobject event · 99e1527a

Authored by Mikulas Patocka on Jul 08, 2020

to #28557827

commit 6958c1c640af8c3f40fa8a2eee3b5b905d95b677 upstream.

kobject_uevent may allocate memory and it may be called while there are dm
devices suspended. The allocation may recurse into a suspended device,
causing a deadlock. We must set the noio flag when sending a uevent.

The observed deadlock was reported here:
https://www.redhat.com/archives/dm-devel/2020-March/msg00025.html

Reported-by: Khazhismel Kumykov <khazhy@google.com>
Reported-by: Tahsin Erdogan <tahsin@google.com>
Reported-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


ext4: fix race between ext4_sync_parent() and rename() · 727bd990

Authored by Eric Biggers on May 06, 2020

to #28557685

commit 08adf452e628b0e2ce9a01048cfbec52353703d7 upstream.

'igrab(d_inode(dentry->d_parent))' without holding dentry->d_lock is
broken because without d_lock, d_parent can be concurrently changed due
to a rename().  Then if the old directory is immediately deleted, old
d_parent->inode can be NULL.  That causes a NULL dereference in igrab().

To fix this, use dget_parent() to safely grab a reference to the parent
dentry, which pins the inode.  This also eliminates the need to use
d_find_any_alias() other than for the initial inode, as we no longer
throw away the dentry at each step.

This is an extremely hard race to hit, but it is possible.  Adding a
udelay() in between the reads of ->d_parent and its ->d_inode makes it
reproducible on a no-journal filesystem using the following program:

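    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        if (fork()) {
            /* Parent: keep creating dir1/file and writing to it synchronously. */
            for (;;) {
                mkdir("dir1", 0700);
                /* O_CREAT requires an explicit mode argument. */
                int fd = open("dir1/file", O_RDWR|O_CREAT|O_SYNC, 0600);
                write(fd, "X", 1);
                close(fd);
            }
        } else {
            /* Child: move the file away and remove its old parent directory. */
            mkdir("dir2", 0700);
            for (;;) {
                rename("dir1/file", "dir2/file");
                rmdir("dir1");
            }
        }
    }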

Fixes: d59729f4 ("ext4: fix races in ext4_sync_parent()")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20200506183140.541194-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


ext4: fix EXT_MAX_EXTENT/INDEX to check for zeroed eh_max · d9bf1840

Authored by Harshad Shirwadkar on Apr 20, 2020

to #28557685

commit c36a71b4e35ab35340facdd6964a00956b9fef0a upstream.

If eh->eh_max is 0, EXT_MAX_EXTENT/INDEX would evaluate to unsigned
(-1) resulting in illegal memory accesses. Although there is no
consistent repro, we see that generic/019 sometimes crashes because of
this bug.

Ran gce-xfstests smoke and verified that there were no regressions.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20200421023959.20879-2-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

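The hazard in this commit is a "last slot" computation of the form `first + max - 1` when `max` is zero. The sketch below models it in userspace under stated assumptions: `struct extent_entry`, `last_index_unchecked` and `last_entry_checked` are hypothetical names, not the ext4 macros, and returning NULL for the zero case is one possible guard rather than a description of the upstream change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent (or index) entry. */
struct extent_entry {
    uint32_t start;
};

/* Unchecked: with max == 0 the unsigned subtraction wraps around. */
static unsigned int last_index_unchecked(uint16_t max)
{
    return (unsigned int)max - 1u;
}

/*
 * Checked: return NULL when max is 0 instead of a pointer that lies
 * before the start of the array.
 */
static struct extent_entry *last_entry_checked(struct extent_entry *first,
                                               uint16_t max)
{
    return max ? first + (max - 1) : NULL;
}
```

A caller that tests the checked result against NULL before dereferencing it cannot walk off the front of the entry array when the header field has been zeroed.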

alinux: virtio-blk: fix discard buffer overrun · 2303e69f

Authored by Jeffle Xu on Jul 29, 2020

fix #29557176

For a DISCARD request, the generic block layer may not guarantee that
@req->nr_phys_segments equals the number of bios in the request. In
that case, we risk overrunning the virtio_blk_discard_write_zeroes
buffers.

Commit 8cb6af7b ("nvme: Fix discard buffer overrun") fixed a
similar issue in the nvme driver.

Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

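The overrun described here happens when a buffer is sized from a segment count but filled by walking the bio chain. A minimal userspace sketch of a bounded fill follows. `struct bio_model`, `struct discard_range` and `fill_discard_ranges` are hypothetical stand-ins, not the driver's types, and the approach (skip writes past the end, then report the mismatch) is an assumption about how such a guard can be written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one bio in a request's chain. */
struct bio_model {
    uint64_t sector;
    uint32_t nr_sectors;
    const struct bio_model *next;
};

/* Hypothetical stand-in for one discard range descriptor. */
struct discard_range {
    uint64_t sector;
    uint32_t nr_sectors;
};

/*
 * Copy one range per bio into `out`, which has room for `segments`
 * entries. Never writes past the buffer. Returns 0 on success and -1
 * when the number of bios differs from `segments`.
 */
static int fill_discard_ranges(const struct bio_model *bio,
                               struct discard_range *out, size_t segments)
{
    size_t n = 0;

    for (; bio != NULL; bio = bio->next) {
        if (n < segments) {
            out[n].sector = bio->sector;
            out[n].nr_sectors = bio->nr_sectors;
        }
        n++;
    }
    return n == segments ? 0 : -1;
}
```

The loop keeps counting after the buffer is full so that the caller can fail the request when the two counts disagree, rather than silently truncating it.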

x86/cpufeatures: Add feature bit RDPRU on AMD · bc28fd7c

Authored by Babu Moger on Oct 07, 2019

fix #29429936

commit 9d40b85bb46a99bc95dad3a07787da93b0a018e9 upstream

AMD Zen 2 introduces a new RDPRU instruction which is used to give
access to some processor registers that are typically only accessible
when the privilege level is zero.

ECX is used as the implicit register to specify which register to read.
RDPRU places the specified register’s value into EDX:EAX.

For example, the RDPRU instruction can be used to read MPERF and APERF
at CPL > 0.

Add the feature bit so it is visible in /proc/cpuinfo.

Details are available in the AMD64 Architecture Programmer’s Manual:
https://www.amd.com/system/files/TechDocs/24594.pdf

Signed-off-by: Babu Moger <babu.moger@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Aaron Lewis <aaronlewis@google.com>
Cc: ak@linux.intel.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: robert.hu@linux.intel.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20191007204839.5727.10803.stgit@localhost.localdomain
Signed-off-by: Artie Ding <artie.ding@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

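Since the commit makes the feature visible in /proc/cpuinfo, a userspace program can look for it in the "flags" line. The helper below is a small sketch of a whole-token match on such a line. `has_cpu_flag` is a hypothetical name, and the flag string `rdpru` is an assumption about how the kernel spells the feature in /proc/cpuinfo.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `flag` appears in `line` as a whole token delimited by
 * spaces, tabs, a newline, or the ends of the string.
 */
static bool has_cpu_flag(const char *line, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = line;

    if (len == 0)
        return false;

    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';

        if (starts && ends)
            return true;
        p++;  /* keep searching after a partial match */
    }
    return false;
}
```

Reading /proc/cpuinfo line by line and passing each line that begins with "flags" to this helper is enough to decide whether the running kernel reports the feature.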

ext4: disable dioread_nolock whenever delayed allocation is disabled · 8c6a9862

Authored by Eric Whitney on Mar 19, 2020

fix #29455282

commit c8980e1980ccdc2229aa2218d532ddc62e0aabe5 upstream

The patch "ext4: make dioread_nolock the default" (244adf6426ee) causes
generic/422 to fail when run in kvm-xfstests' ext3conv test case. This
applies both the dioread_nolock and nodelalloc mount options, a
combination not previously tested by kvm-xfstests. The failure occurs
because the dioread_nolock code path splits a previously fallocated
multiblock extent into a series of single block extents when overwriting
a portion of that extent. That causes allocation of an extent tree leaf
node and a reshuffling of extents. Once writeback is completed, the
individual extents are recombined into a single extent, the extent is
moved again, and the leaf node is deleted. The difference in block
utilization before and after writeback due to the leaf node triggers the
failure.

The original reason for this behavior was to avoid ENOSPC when handling
I/O completions during writeback in the dioread_nolock code paths when
delayed allocation is disabled. It may no longer be necessary, because
code was added in the past to reserve extra space to solve this problem
when delayed allocation is enabled, and this code may also apply when
delayed allocation is disabled. Until this can be verified, don't use
the dioread_nolock code paths if delayed allocation is disabled.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20200319150028.24592-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


alinux: nvme-pci: hold cq_lock while completing CQEs · ba34628b

Authored by Xiaoguang Wang on Jul 28, 2020

fix #29535320

In __nvme_poll(), nvme_complete_cqes() should also be protected
by nvmeq->cq_lock.

Fixes: 0d326c85dba5 ("nvme: provide optimized poll function for separate poll queues")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


alinux: panic: change the default value of crash_kexec_post_notifiers to true · 907cd776

Authored by Shile Zhang on Jun 30, 2020

fix #29056122

Commit fbb2f06e ("pvpanic: add crash loaded event") introduced a new
pvpanic event, 'PVPANIC_CRASH_LOADED', which lets QEMU on the host learn
whether the guest has already handled the panic via kdump.

However, if the guest has kdump enabled, it does not post the panic event
by default unless the 'crash_kexec_post_notifiers' parameter is given.
So it is better to set the default value of this parameter to true, so
that the event is not missed when kdump is enabled.

To disable the event notification, pass the parameter
'crash_kexec_post_notifiers=N'.

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>


alinux: configs: add VIRTIO_MEM and VIRTIO_FS · f4b6a006

Authored by Liu Bo on Jul 27, 2020

task #28910367
These two were recently added to alinux; update the configs to reflect
them.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


nvme: fix possible deadlock when nvme_update_formats fails · 7301fff6

Authored by Sagi Grimberg on Oct 02, 2019

fix #29495487

commit 6abff1b9f7b8884a46b7bd80b49e7af0b5625aeb upstream.

nvme_update_formats may fail to revalidate the namespace and
attempt to remove the namespace. This may lead to a deadlock
as nvme_ns_remove will attempt to acquire the subsystem lock
which is already acquired by the passthru command with effects.

Move the invalid namespace removal to after the passthru command
releases the subsystem lock.

Reported-by: Judy Brock <judy.brock@samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>


configs: disable some needless builtin modules · 8f020770

Authored by Shile Zhang on Jul 16, 2020

task #29355864

Disable the following builtin modules, which are unnecessary for a modern
kernel:
 - CONFIG_VIRTIO_BLK_SCSI=y
 - CONFIG_SYSCTL_SYSCALL=y
 - CONFIG_ISA_BUS=y
 - CONFIG_NET_VENDOR_CADENCE=y
 - CONFIG_NET_VENDOR_CORTINA=y
 - CONFIG_NET_VENDOR_I825XX=y
 - CONFIG_NET_VENDOR_NETERION=y
 - CONFIG_NET_VENDOR_NI=y
 - CONFIG_NET_VENDOR_PACKET_ENGINES=y
 - CONFIG_NET_VENDOR_SOCIONEXT=y
 - CONFIG_XFS_RT=y
 - CONFIG_DEBUG_SG=y
 - CONFIG_DEBUG_NOTIFIERS=y
 - CONFIG_DEBUG_CREDENTIALS=y

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>


alinux: virtiofs: simplify mount options · d068535c

Authored by Liu Bo on Apr 09, 2020

task #28910367
Rather than explicitly specifying "-o
default_permissions,allow_other", virtiofs can set some default values
for them.

With this, we can simply do
"mount -t virtio_fs atest /mnt/test/ -otag=myfs-1,dax".

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


alinux: virtio-fs: export fuse_request_free · e6067150

Authored by Liu Bo on Jul 25, 2020

task #28910367
virtio-fs will need to use it from outside fs/fuse/dev.c.
Make the symbol visible.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


fuse: Support RENAME_WHITEOUT flag · ebea99bf

Authored by Vivek Goyal on Feb 05, 2020

task #28910367
commit 519525fa47b5a8155f0b203e49a3a6a2319f75ae upstream

Allow fuse to pass RENAME_WHITEOUT to fuse server.  Overlayfs on top of
virtiofs uses RENAME_WHITEOUT.

Without this patch renaming a directory in overlayfs (dir is on lower)
fails with -EINVAL. With this patch it works.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 519525fa47b5a8155f0b203e49a3a6a2319f75ae)
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


virtiofs: Use completions while waiting for queue to be drained · 88fa38fa

Authored by Vivek Goyal on Oct 30, 2019

task #28910367
commit 724c15a43e2c7ac26e2d07abef99191162498fa9 upstream

While we wait for the queue to finish draining, use completions instead of
usleep_range(). This is a better way of waiting for an event.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 724c15a43e2c7ac26e2d07abef99191162498fa9)
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


virtiofs: Do not send forget request "struct list_head" element · 2a6ae53e

Authored by Vivek Goyal on Oct 30, 2019

task #28910367
commit 1efcf39eb627573f8d543ea396cf36b0651b1e56 upstream

We are sending the whole virtio_fs_forget struct to the other end over
the virtqueue. The other end does not need to see elements like "struct list".
That's an internal detail of the guest kernel. Fix it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 1efcf39eb627573f8d543ea396cf36b0651b1e56)
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

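The idea in this commit, keeping list linkage local and transmitting only the payload, can be illustrated with a small userspace sketch. All names here (`struct list_node`, `struct forget_wire`, `struct forget_entry`, `forget_payload`) and the field layout are hypothetical and are not the virtiofs definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical wire payload: the only part meant for the other end. */
struct forget_wire {
    uint64_t nodeid;
    uint64_t nlookup;
};

/* Guest-side bookkeeping wraps the payload; the list link stays local. */
struct forget_entry {
    struct forget_wire req;
    struct list_node list;
};

/* Describe the bytes to transmit: the payload only, not the whole entry. */
static const void *forget_payload(const struct forget_entry *entry,
                                  size_t *len)
{
    *len = sizeof(entry->req);
    return &entry->req;
}
```

Splitting the struct this way makes the transmitted length a property of the payload type, so adding more guest-only bookkeeping fields later cannot leak them onto the wire.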

virtiofs: Use a common function to send forget · a6d9f512

Authored by Vivek Goyal on Oct 30, 2019

task #28910367
commit 58ada94f95f71d4f73197ab0e9603dbba6e47fe3 upstream

Currently we are duplicating the logic to send forgets in two
places. Consolidate the code by calling one helper function.

This also uses virtqueue_add_outbuf() instead of
virtqueue_add_sgs(). The former is simpler to call.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 58ada94f95f71d4f73197ab0e9603dbba6e47fe3)
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>


virtiofs: Fix old-style declaration · 8270fcad

Authored by YueHaibing on Nov 11, 2019

task #28910367
commit 00929447f5758c4f64c74d0a4b40a6eb3d9df0e3 upstream

The 'static' keyword is expected to come first in a declaration;
otherwise we get warnings like this with "make W=1":

fs/fuse/virtio_fs.c:687:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
fs/fuse/virtio_fs.c:692:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]
fs/fuse/virtio_fs.c:1029:1: warning: 'static' is not at beginning of declaration [-Wold-style-declaration]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
(cherry picked from commit 00929447f5758c4f64c74d0a4b40a6eb3d9df0e3)
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

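For readers unfamiliar with this warning, the snippet below shows the specifier ordering it is about. The names `retry_limit` and `get_retry_limit` are hypothetical and are not taken from fs/fuse/virtio_fs.c; the commented-out line is the form that GCC reports under -Wold-style-declaration (enabled by -Wextra for C).

```c
/* Old style, reported by -Wold-style-declaration: */
/* const static int retry_limit_old = 3; */

/* Preferred: the storage-class specifier comes first. */
static const int retry_limit = 3;

/* Small accessor so the constant is used. */
static int get_retry_limit(void)
{
    return retry_limit;
}
```

Both orderings declare the same object; the change only affects which spelling the compiler accepts without a diagnostic.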

virtiofs: Remove set but not used variable 'fc' · 823286b7

Authored by zhengbin on Oct 23, 2019

task #28910367
commit 80da5a809d193c60d090cbdf4fe316781bc07965 upstream

Fixes gcc '-Wunused-but-set-variable' warning:

fs/fuse/virtio_fs.c: In function virtio_fs_wake_pending_and_unlock:
fs/fuse/virtio_fs.c:983:20: warning: variable fc set but not used [-Wunused-but-set-variable]

It is not used since commit 7ee1e2e631db ("virtiofs: No need to check
fpq->connected state")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>


virtiofs: Retry request submission from worker context · 986957da