- 22 9月, 2020 40 次提交
-
-
由 Logan Gunthorpe 提交于
mainline inclusion from mainline-5.3-rc2 commit e654dfd3 category: bugfix bugzilla: 19960 CVE: NA --------------------------- When freeing the subsystem after finding another match with __nvme_find_get_subsystem(), use put_device() instead of __nvme_release_subsystem() which calls kfree() directly. Per the documentation, put_device() should always be used after device_initialization() is called. Otherwise, leaks like the one below which was detected by kmemleak may occur. Once the call of __nvme_release_subsystem() is removed it no longer makes sense to keep the helper, so fold it back into nvme_release_subsystem(). unreferenced object 0xffff8883d12bfbc0 (size 16): comm "nvme", pid 2635, jiffies 4294933602 (age 739.952s) hex dump (first 16 bytes): 6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff nvme-subsys2.... backtrace: [<000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0 [<0000000081169e5f>] kvasprintf+0xad/0x130 [<0000000025626f25>] kvasprintf_const+0x47/0x120 [<00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120 [<000000004881f8b3>] dev_set_name+0x98/0xc0 [<000000007124dae3>] nvme_init_identify+0x1995/0x38e0 [<000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0 [<000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80 [<00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220 [<000000002259b3d5>] __vfs_write+0x66/0x120 [<000000002f6df81e>] vfs_write+0x154/0x490 [<000000007e8cfc19>] ksys_write+0x10a/0x240 [<00000000ff5c7b85>] __x64_sys_write+0x73/0xb0 [<00000000fee6d692>] do_syscall_64+0xaa/0x470 [<00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: ab9e00cc ("nvme: track subsystems") Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSun Ke <sunke32@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Sagi Grimberg 提交于
mainline inclusion from mainline-5.4-rc4 commit 6abff1b9 category: bugfix bugzilla: 24170 CVE: NA --------------------------- nvme_update_formats may fail to revalidate the namespace and attempt to remove the namespace. This may lead to a deadlock as nvme_ns_remove will attempt to acquire the subsystem lock which is already acquired by the passthru command with effects. Move the invalid namepsace removal to after the passthru command releases the subsystem lock. Reported-by: NJudy Brock <judy.brock@samsung.com> Signed-off-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NSun Ke <sunke32@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 xianrong.zhou 提交于
mainline inclusion from mainline-5.6-rc1 commit 0a531c5a category: bugfix bugzilla: 29444 CVE: NA --------------------------- Try to skip prefetching hash blocks that won't be needed due to the "check_at_most_once" option being enabled and the corresponding data blocks already having been verified. Since prefetching operates on a range of data blocks, do this by just trimming the two ends of the range. This doesn't skip every unneeded hash block, since data blocks in the middle of the range could also be unneeded, and hash blocks are still prefetched in large clusters as controlled by dm_verity_prefetch_cluster. But it can still help a lot. In a test on Android Q launching 91 apps every 15s repeated 21 times, prefetching was only done for 447177/4776629 = 9.36% of data blocks. Tested-by: Nruxian.feng <ruxian.feng@transsion.com> Co-developed-by: Nyuanjiong.gao <yuanjiong.gao@transsion.com> Signed-off-by: Nyuanjiong.gao <yuanjiong.gao@transsion.com> Signed-off-by: Nxianrong.zhou <xianrong.zhou@transsion.com> [EB: simplified the 'while' loops and improved the commit message] Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NSun Ke <sunke32@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Masami Hiramatsu 提交于
mainline inclusion from mainline-5.3-rc3 commit b3980e48 category: bugfix bugzilla: 20080 CVE: NA ------------------------------------------------- kprobes manipulates the interrupted PSTATE for single step, and doesn't restore it. Thus, if we put a kprobe where the pstate.D (debug) masked, the mask will be cleared after the kprobe hits. Moreover, in the most complicated case, this can lead a kernel crash with below message when a nested kprobe hits. [ 152.118921] Unexpected kernel single-step exception at EL1 When the 1st kprobe hits, do_debug_exception() will be called. At this point, debug exception (= pstate.D) must be masked (=1). But if another kprobes hits before single-step of the first kprobe (e.g. inside user pre_handler), it unmask the debug exception (pstate.D = 0) and return. Then, when the 1st kprobe setting up single-step, it saves current DAIF, mask DAIF, enable single-step, and restore DAIF. However, since "D" flag in DAIF is cleared by the 2nd kprobe, the single-step exception happens soon after restoring DAIF. This has been introduced by commit 7419333f ("arm64: kprobe: Always clear pstate.D in breakpoint exception handler") To solve this issue, this stores all DAIF bits and restore it after single stepping. Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Fixes: 7419333f ("arm64: kprobe: Always clear pstate.D in breakpoint exception handler") Reviewed-by: NJames Morse <james.morse@arm.com> Tested-by: NJames Morse <james.morse@arm.com> Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NWei Li <liwei391@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Xiubo Li 提交于
mainline inclusion from mainline-5.4-rc1 commit 8454d685 category: bugfix bugzilla: 23253 CVE: NA --------------------------- When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and at the same time when the socket is closed due to the server daemon is restarted, just before the last DISCONNET is totally done if we start a new connection by using the old nbd_index, there will be crashing randomly, like: <3>[ 110.151949] block nbd1: Receive control failed (result -32) <1>[ 110.152024] BUG: unable to handle page fault for address: 0000058000000840 <1>[ 110.152063] #PF: supervisor read access in kernel mode <1>[ 110.152083] #PF: error_code(0x0000) - not-present page <6>[ 110.152094] PGD 0 P4D 0 <4>[ 110.152106] Oops: 0000 [#1] SMP PTI <4>[ 110.152120] CPU: 0 PID: 6698 Comm: kworker/u5:1 Kdump: loaded Not tainted 5.3.0-rc4+ #2 <4>[ 110.152136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 <4>[ 110.152166] Workqueue: knbd-recv recv_work [nbd] <4>[ 110.152187] RIP: 0010:__dev_printk+0xd/0x67 <4>[ 110.152206] Code: 10 e8 c5 fd ff ff 48 8b 4c 24 18 65 48 33 0c 25 28 00 [...] <4>[ 110.152244] RSP: 0018:ffffa41581f13d18 EFLAGS: 00010206 <4>[ 110.152256] RAX: ffffa41581f13d30 RBX: ffff96dd7374e900 RCX: 0000000000000000 <4>[ 110.152271] RDX: ffffa41581f13d20 RSI: 00000580000007f0 RDI: ffffffff970ec24f <4>[ 110.152285] RBP: ffffa41581f13d80 R08: ffff96dd7fc17908 R09: 0000000000002e56 <4>[ 110.152299] R10: ffffffff970ec24f R11: 0000000000000003 R12: ffff96dd7374e900 <4>[ 110.152313] R13: 0000000000000000 R14: ffff96dd7374e9d8 R15: ffff96dd6e3b02c8 <4>[ 110.152329] FS: 0000000000000000(0000) GS:ffff96dd7fc00000(0000) knlGS:0000000000000000 <4>[ 110.152362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 110.152383] CR2: 0000058000000840 CR3: 0000000067cc6002 CR4: 00000000001606f0 <4>[ 110.152401] Call Trace: <4>[ 110.152422] _dev_err+0x6c/0x83 <4>[ 110.152435] nbd_read_stat.cold+0xda/0x578 [nbd] <4>[ 110.152448] ? __switch_to_asm+0x34/0x70 <4>[ 110.152468] ? __switch_to_asm+0x40/0x70 <4>[ 110.152478] ? __switch_to_asm+0x34/0x70 <4>[ 110.152491] ? __switch_to_asm+0x40/0x70 <4>[ 110.152501] ? __switch_to_asm+0x34/0x70 <4>[ 110.152511] ? __switch_to_asm+0x40/0x70 <4>[ 110.152522] ? __switch_to_asm+0x34/0x70 <4>[ 110.152533] recv_work+0x35/0x9e [nbd] <4>[ 110.152547] process_one_work+0x19d/0x340 <4>[ 110.152558] worker_thread+0x50/0x3b0 <4>[ 110.152568] kthread+0xfb/0x130 <4>[ 110.152577] ? process_one_work+0x340/0x340 <4>[ 110.152609] ? kthread_park+0x80/0x80 <4>[ 110.152637] ret_from_fork+0x35/0x40 This is very easy to reproduce by running the nbd-runner. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NXiubo Li <xiubli@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSunKe <sunke32@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Xiubo Li 提交于
mainline inclusion from mainline-5.4-rc1 commit ec76a7b9 category: bugfix bugzilla: 23253 CVE: NA --------------------------- Preparing for the destory when disconnecting crash fixing. Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NXiubo Li <xiubli@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSun Ke <sunke32@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Chandan Rajendra 提交于
mainline inclusion from mainline-5.4-rc1 commit 547b9ad6 category: bugfix bugzilla: 23039 CVE: NA --------------------------- When executing generic/388 on a ppc64le machine, we notice the following call trace, VFS: brelse: Trying to free free buffer WARNING: CPU: 0 PID: 6637 at /root/repos/linux/fs/buffer.c:1195 __brelse+0x84/0xc0 Call Trace: __brelse+0x80/0xc0 (unreliable) invalidate_bh_lru+0x78/0xc0 on_each_cpu_mask+0xa8/0x130 on_each_cpu_cond_mask+0x130/0x170 invalidate_bh_lrus+0x44/0x60 invalidate_bdev+0x38/0x70 ext4_put_super+0x294/0x560 generic_shutdown_super+0xb0/0x170 kill_block_super+0x38/0xb0 deactivate_locked_super+0xa4/0xf0 cleanup_mnt+0x164/0x1d0 task_work_run+0x110/0x160 do_notify_resume+0x414/0x460 ret_from_except_lite+0x70/0x74 The warning happens because flush_descriptor() drops bh reference it does not own. The bh reference acquired by jbd2_journal_get_descriptor_buffer() is owned by the log_bufs list and gets released when this list is processed. The reference for doing IO is only acquired in write_dirty_buffer() later in flush_descriptor(). Reported-by: NHarish Sriram <harish@linux.ibm.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NChandan Rajendra <chandan@linux.ibm.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Reviewed-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mike Snitzer 提交于
mainline inclusion from mainline-5.5-rc1 commit f612b213 category: bugfix bugzilla: 25149 CVE: NA --------------------------- This reverts commit a1b89132. Revert required hand-patching due to subsequent changes that were applied since commit a1b89132. Requires: ed0302e8 ("dm crypt: make workqueue names device-specific") Cc: stable@vger.kernel.org Bug: https://bugzilla.kernel.org/show_bug.cgi?id=199857Reported-by: NVito Caputo <vcaputo@pengaru.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NSun Ke <sunke32@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Bob Moore 提交于
mainline inclusion from mainline-v5.5-rc1 commit 197aba20 category: bugfix bugzilla: 25975 CVE: NA ------------------------------------------------- ACPICA commit 7bc16c650317001bc82d4bae227b888a49c51f5e Avoid possible overflow from get_tick_count. Also, cast math using ACPI_100NSEC_PER_MSEC to uint64. Link: https://github.com/acpica/acpica/commit/7bc16c65Signed-off-by: NBob Moore <robert.moore@intel.com> Signed-off-by: NErik Schmauss <erik.schmauss@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Theodore Ts'o 提交于
mainline inclusion from mainline-5.6-rc1 commit d4c5e960 category: bugfix bugzilla: 29909 CVE: NA --------------------------- Linus observed that an allmodconfig build which does a lot of stat(2) calls that ext4_getattr() was a noticeable (1%) amount of CPU time, due to the cache line for i_extra_isize getting pulled in. Since the normal stat system call doesn't return btime, it's a complete waste. So only calculate btime when it is explicitly requested. [ Fixed to check against request_mask instead of query_flags. ] Link: https://lore.kernel.org/r/CAHk-=wivmk_j6KbTX+Er64mLrG8abXZo0M10PNdAnHc8fWXfsQ@mail.gmail.comReported-by: NLinus Torvalds <torvalds@linux-foundation.org> Reviewed-by: NAndreas Dilger <adilger@dilger.ca> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: Nzhangyi (F) <yi.zhang@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Steven Price 提交于
mainline inclusion from mainline-5.6-rc1 commit c02a9875 category: bugfix bugzilla: 30160 CVE: NA ------------------------------------------------- If walk_pte_range() is called with a 'end' argument that is beyond the last page of memory (e.g. ~0UL) then the comparison between 'addr' and 'end' will always fail and the loop will be infinite. Instead change the comparison to >= while accounting for overflow. Link: http://lkml.kernel.org/r/20191218162402.45610-15-steven.price@arm.comSigned-off-by: NSteven Price <steven.price@arm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Zong Li <zong.li@sifive.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit c02a9875) Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: NJing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Wei Yang 提交于
mainline inclusion from mainline-5.6-rc1 commit cb829624 category: bugfix bugzilla: 29957 CVE: NA ------------------------------------------------- The page could be a tail page, if this is the case, this BUG_ON will never be triggered. Link: http://lkml.kernel.org/r/20200110032610.26499-1-richardw.yang@linux.intel.com Fixes: e9b61f19 ("thp: reintroduce split_huge_page()") Signed-off-by: NWei Yang <richardw.yang@linux.intel.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit cb829624) Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: NJing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Wen Yang 提交于
mainline inclusion from mainline-5.5-rc7 commit 0a5d1a7f category: bugfix bugzilla: 28317 CVE: NA ------------------------------------------------- Use div64_ul() instead of do_div() if the divisor is unsigned long, to avoid truncation to 32-bit on 64-bit platforms. Link: http://lkml.kernel.org/r/20200102081442.8273-4-wenyang@linux.alibaba.comSigned-off-by: NWen Yang <wenyang@linux.alibaba.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 0a5d1a7f) Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: NJing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Wen Yang 提交于
mainline inclusion from mainline-5.5-rc7 commit d3ac946e category: bugfix bugzilla: 28317 CVE: NA ------------------------------------------------- The two variables 'numerator' and 'denominator', though they are declared as long, they should actually be unsigned long (according to the implementation of the fprop_fraction_percpu() function) And do_div() does a 64-by-32 division, while the divisor 'denominator' is unsigned long, thus 64-bit on 64-bit platforms. Hence the proper function to call is div64_ul(). Link: http://lkml.kernel.org/r/20200102081442.8273-3-wenyang@linux.alibaba.comSigned-off-by: NWen Yang <wenyang@linux.alibaba.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Qian Cai <cai@lca.pw> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit d3ac946e) Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: NJing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
mainline inclusion from mainline-v5.3-rc1 commit b51033e0 category: bugfix bugzilla: 17644 CVE: NA ------------------------------------------------- In pci_pm_complete() there are checks to decide whether or not to resume devices that were left in runtime-suspend during the preceding system-wide transition into a sleep state. They involve checking the current power state of the device and comparing it with the power state of it set before the preceding system-wide transition, but the platform component of the device's power state is not handled correctly in there. Namely, on platforms with ACPI, the device power state information needs to be updated with care, so that the reference counters of power resources used by the device (if any) are set to ensure that the refreshed power state of it will be maintained going forward. To that end, introduce a new ->refresh_state() platform PM callback for PCI devices, for asking the platform to refresh the device power state data and ensure that the corresponding power state will be maintained going forward, make it invoke acpi_device_update_power() (for devices with ACPI PM) on platforms with ACPI and make pci_pm_complete() use it, through a new pci_refresh_power_state() wrapper function. Fixes: a0d2a959 (PCI: Avoid unnecessary resume after direct-complete) Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Conflicts: drivers/pci/pci.c [wangxiongfeng: fix a little conflicts] Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
mainline inclusion from mainline-v5.3-rc3 commit 42787ed7 category: bugfix bugzilla: 17645 CVE: NA ------------------------------------------------- Commit f850a48a ("ACPI: PM: Allow transitions to D0 to occur in special cases") overlooked the fact that acpi_power_transition() may change the power.state value for the target device and if that happens, it may confuse acpi_device_set_power() and cause it to omit the _PS0 evaluation which on some systems is necessary to change power states of devices from low-power to D0. Fix that by saving the current value of power.state for the target device before passing it to acpi_power_transition() and using the saved value in a subsequent check. Fixes: f850a48a ("ACPI: PM: Allow transitions to D0 to occur in special cases") Reported-by: NKai-Heng Feng <kai.heng.feng@canonical.com> Reported-by: NMario Limonciello <mario.limonciello@dell.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NKai-Heng Feng <kai.heng.feng@canonical.com> Tested-by: NMario Limonciello <mario.limonciello@dell.com> Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
mainline inclusion from mainline-v5.3-rc1 commit f850a48a category: bugfix bugzilla: 17645 CVE: NA ------------------------------------------------- If a device with ACPI PM is left in D0 during a system-wide transition to the S3 (suspend-to-RAM) or S4 (hibernation) sleep state, the actual state of the device need not be D0 during resume from it, although its power.state value will still reflect D0 (that is, the power state from before the system-wide transition). In that case, the acpi_device_set_power() call made to ensure that the power state of the device will be D0 going forward has no effect, because the new state (D0) is equal to the one reflected by the device's power.state value. That does not affect power resources, which are taken care of by acpi_resume_power_resources() called from acpi_pm_finish() during resume from system-wide sleep states, but it still may be necessary to invoke _PS0 for the device on top of that in order to finalize its transition to D0. For this reason, modify acpi_device_set_power() to allow transitions to D0 to occur even if D0 is the current power state of the device according to its power.state value. That will not affect power resources, which are assumed to be in the right configuration already (as reflected by the current values of their reference counters), but it may cause _PS0 to be evaluated for the device. However, evaluating _PS0 for a device already in D0 may lead to confusion in general, so invoke _PSC (if present) to check the device's current power state upfront and only evaluate _PS0 for it if _PSC has returned a power state different from D0. [If _PSC is not present or the evaluation of it fails, the power state of the device is assumed to be D0 at this point.] Fixes: 20dacb71 (ACPI / PM: Rework device power management to follow ACPI 6) Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
mainline inclusion from mainline-v5.3-rc1 commit 21ba2379 category: bugfix bugzilla: NA CVE: NA ------------------------------------------------- If the power state of a device with ACPI PM is changed from D3hot to D3cold, it merely is a matter of dropping references to additional power resources (specifically, those in the list returned by _PR3), and the _PS3 method should not be invoked for the device then (as it has already been evaluated during the previous transition to D3hot). Fixes: 20dacb71 (ACPI / PM: Rework device power management to follow ACPI 6) Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Anders Roxell 提交于
mainline inclusion from mainline-v5.3-rc5 commit 11f4fe9b category: bugfix bugzilla: 20500 CVE: NA ------------------------------------------------------------------------- Now that -Wimplicit-fallthrough is passed to GCC by default, the following warning shows up: ../drivers/iommu/arm-smmu-v3.c: In function ‘arm_smmu_write_strtab_ent’: ../drivers/iommu/arm-smmu-v3.c:1189:7: warning: this statement may fall through [-Wimplicit-fallthrough=] if (disable_bypass) ^ ../drivers/iommu/arm-smmu-v3.c:1191:3: note: here default: ^~~~~~~ Rework so that the compiler doesn't warn about fall-through. Make it clearer by calling 'BUG_ON()' when disable_bypass is set, and always 'break;' Signed-off-by: NAnders Roxell <anders.roxell@linaro.org> Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Ard Biesheuvel 提交于
mainline inclusion from mainline-5.5-rc3 commit ab0eb162 category: bugfix bugzilla: 27656 CVE: NA ------------------------------------------------- Memory regions that are reserved using efi_mem_reserve_persistent() are recorded in a special EFI config table which survives kexec, allowing the incoming kernel to honour them as well. However, such reservations are not visible in /proc/iomem, and so the kexec tools that load the incoming kernel and its initrd into memory may overwrite these reserved regions before the incoming kernel has a chance to reserve them from further use. Address this problem by adding these reservations to /proc/iomem as they are created. Note that reservations that are inherited from a previous kernel are memblock_reserve()'d early on, so they are already visible in /proc/iomem. Tested-by: NMasayoshi Mizuma <m.mizuma@jp.fujitsu.com> Tested-by: NBhupesh Sharma <bhsharma@redhat.com> Signed-off-by: NArd Biesheuvel <ardb@kernel.org> Reviewed-by: NBhupesh Sharma <bhsharma@redhat.com> Cc: <stable@vger.kernel.org> # v5.4+ Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Arvind Sankar <nivedita@alum.mit.edu> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191206165542.31469-2-ardb@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Arnd Bergmann 提交于
mainline inclusion from mainline-v5.5-rc1 commit 9d7bf41f category: bugfix bugzilla: 26804 CVE: NA ------------------------------------- Unlike the normal SIOCOUTQ, SIOCOUTQNSD was never handled in compat mode. Add it to the common socket compat handler along with similar ones. Fixes: 2f4e1b39 ("tcp: ioctl type SIOCOUTQNSD returns amount of data not sent") Cc: Eric Dumazet <edumazet@google.com> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: Nguodeqing <geffrey.guo@huawei.com> Reviewed-by: NWenan Mao <maowenan@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Eugeniu Rosca 提交于
mainline inclusion from mainline-v5.9-rc4 commit dc07a728 category: bugfix bugzilla: NA CVE: NA ------------------------------------------------- Commit 52f23478 ("mm/slub.c: fix corrupted freechain in deactivate_slab()") suffered an update when picked up from LKML [1]. Specifically, relocating 'freelist = NULL' into 'freelist_corrupted()' created a no-op statement. Fix it by sticking to the behavior intended in the original patch [1]. In addition, make freelist_corrupted() immune to passing NULL instead of &freelist. The issue has been spotted via static analysis and code review. [1] https://lore.kernel.org/linux-mm/20200331031450.12182-1-dongli.zhang@oracle.com/ Fixes: 52f23478 ("mm/slub.c: fix corrupted freechain in deactivate_slab()") Signed-off-by: NEugeniu Rosca <erosca@de.adit-jv.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Dongli Zhang <dongli.zhang@oracle.com> Cc: Joe Jin <joe.jin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200824130643.10291-1-erosca@de.adit-jv.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Hugh Dickins 提交于
stable inclusion from linux-4.19.141 commit 2406c45db3de8da8052eccb1dcfde69ea5988b19 -------------------------------- commit 18e77600 upstream. Only once have I seen this scenario (and forgot even to notice what forced the eventual crash): a sequence of "BUG: Bad page map" alerts from vm_normal_page(), from zap_pte_range() servicing exit_mmap(); pmd:00000000, pte values corresponding to data in physical page 0. The pte mappings being zapped in this case were supposed to be from a huge page of ext4 text (but could as well have been shmem): my belief is that it was racing with collapse_file()'s retract_page_tables(), found *pmd pointing to a page table, locked it, but *pmd had become 0 by the time start_pte was decided. In most cases, that possibility is excluded by holding mmap lock; but exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged checks khugepaged_test_exit() after acquiring mmap lock: khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so, for example. But retract_page_tables() did not: fix that. The fix is for retract_page_tables() to check khugepaged_test_exit(), after acquiring mmap lock, before doing anything to the page table. Getting the mmap lock serializes with __mmput(), which briefly takes and drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on mm_users makes sure we don't touch the page table once exit_mmap() might reach it, since exit_mmap() will be proceeding without mmap lock, not expecting anyone to be racing with it. Fixes: f3f0e1d2 ("khugepaged: add support of collapse for tmpfs/shmem pages") Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Song Liu <songliubraving@fb.com> Cc: <stable@vger.kernel.org> [4.8+] Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvilsSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Muchun Song 提交于
stable inclusion from linux-4.19.141 commit 46c9d3925ab0ceb4d19cee4be1a061b87faf1e11 -------------------------------- commit 0cb2f137 upstream. We found a case of kernel panic on our server. The stack trace is as follows(omit some irrelevant information): BUG: kernel NULL pointer dereference, address: 0000000000000080 RIP: 0010:kprobe_ftrace_handler+0x5e/0xe0 RSP: 0018:ffffb512c6550998 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8e9d16eea018 RCX: 0000000000000000 RDX: ffffffffbe1179c0 RSI: ffffffffc0535564 RDI: ffffffffc0534ec0 RBP: ffffffffc0534ec1 R08: ffff8e9d1bbb0f00 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8e9d1f797060 R14: 000000000000bacc R15: ffff8e9ce13eca00 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000080 CR3: 00000008453d0005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ftrace_ops_assist_func+0x56/0xe0 ftrace_call+0x5/0x34 tcpa_statistic_send+0x5/0x130 [ttcp_engine] The tcpa_statistic_send is the function being kprobed. After analysis, the root cause is that the fourth parameter regs of kprobe_ftrace_handler is NULL. Why regs is NULL? We use the crash tool to analyze the kdump. crash> dis tcpa_statistic_send -r <tcpa_statistic_send>: callq 0xffffffffbd8018c0 <ftrace_caller> The tcpa_statistic_send calls ftrace_caller instead of ftrace_regs_caller. So it is reasonable that the fourth parameter regs of kprobe_ftrace_handler is NULL. In theory, we should call the ftrace_regs_caller instead of the ftrace_caller. After in-depth analysis, we found a reproducible path. Writing a simple kernel module which starts a periodic timer. The timer's handler is named 'kprobe_test_timer_handler'. The module name is kprobe_test.ko. 1) insmod kprobe_test.ko 2) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}' 3) echo 0 > /proc/sys/kernel/ftrace_enabled 4) rmmod kprobe_test 5) stop step 2) kprobe 6) insmod kprobe_test.ko 7) bpftrace -e 'kretprobe:kprobe_test_timer_handler {}' We mark the kprobe as GONE but not disarm the kprobe in the step 4). The step 5) also do not disarm the kprobe when unregister kprobe. So we do not remove the ip from the filter. In this case, when the module loads again in the step 6), we will replace the code to ftrace_caller via the ftrace_module_enable(). When we register kprobe again, we will not replace ftrace_caller to ftrace_regs_caller because the ftrace is disabled in the step 3). So the step 7) will trigger kernel panic. Fix this problem by disarming the kprobe when the module is going away. Link: https://lkml.kernel.org/r/20200728064536.24405-1-songmuchun@bytedance.com Cc: stable@vger.kernel.org Fixes: ae6aa16f ("kprobes: introduce ftrace based optimization") Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Co-developed-by: NChengming Zhou <zhouchengming@bytedance.com> Signed-off-by: NChengming Zhou <zhouchengming@bytedance.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Chengming Zhou 提交于
stable inclusion from linux-4.19.141 commit 892fd3637a2c0c29d3dd7402bbb0802eb231b69d -------------------------------- commit 8a224ffb upstream. When module loaded and enabled, we will use __ftrace_replace_code for module if any ftrace_ops referenced it found. But we will get wrong ftrace_addr for module rec in ftrace_get_addr_new, because rec->flags has not been setup correctly. It can cause the callback function of a ftrace_ops has FTRACE_OPS_FL_SAVE_REGS to be called with pt_regs set to NULL. So setup correct FTRACE_FL_REGS flags for rec when we call referenced_filters to find ftrace_ops references it. Link: https://lkml.kernel.org/r/20200728180554.65203-1-zhouchengming@bytedance.com Cc: stable@vger.kernel.org Fixes: 8c4f3c3f ("ftrace: Check module functions being traced on reload") Signed-off-by: NChengming Zhou <zhouchengming@bytedance.com> Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Michal Koutný 提交于
stable inclusion from linux-4.19.141 commit e88a72e86bd0c0c19ba00865ed68dce22824aac6 -------------------------------- commit a6f23d14 upstream. When workload runs in cgroups that aren't directly below root cgroup and their parent specifies reclaim protection, it may end up ineffective. The reason is that propagate_protected_usage() is not called in all hierarchy up. All the protected usage is incorrectly accumulated in the workload's parent. This means that siblings_low_usage is overestimated and effective protection underestimated. Even though it is transitional phenomenon (uncharge path does correct propagation and fixes the wrong children_low_usage), it can undermine the intended protection unexpectedly. We have noticed this problem while seeing a swap out in a descendant of a protected memcg (intermediate node) while the parent was conveniently under its protection limit and the memory pressure was external to that hierarchy. Michal has pinpointed this down to the wrong siblings_low_usage which led to the unwanted reclaim. The fix is simply updating children_low_usage in respective ancestors also in the charging path. Fixes: 23067153 ("mm: memory.low hierarchical behavior") Signed-off-by: NMichal Koutný <mkoutny@suse.com> Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NRoman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> [4.18+] Link: http://lkml.kernel.org/r/20200803153231.15477-1-mhocko@kernel.orgSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Lukas Wunner 提交于
stable inclusion from linux-4.19.141 commit 706695d477fb16a5920098535380ee39337e7ea8 -------------------------------- commit 65488832 upstream. Commit 3451a495 ("driver core: Establish order of operations for device_add and device_del via bitflag") sought to prevent asynchronous driver binding to a device which is being removed. It added a per-device "dead" flag which is checked in the following code paths: * asynchronous binding in __driver_attach_async_helper() * synchronous binding in device_driver_attach() * asynchronous binding in __device_attach_async_helper() It did *not* check the flag upon: * synchronous binding in __device_attach() However __device_attach() may also be called asynchronously from: deferred_probe_work_func() bus_probe_device() device_initial_probe() __device_attach() So if the commit's intention was to check the "dead" flag in all asynchronous code paths, then a check is also necessary in __device_attach(). Add the missing check. Fixes: 3451a495 ("driver core: Establish order of operations for device_add and device_del via bitflag") Signed-off-by: NLukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org # v5.1+ Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Link: https://lore.kernel.org/r/de88a23a6fe0ef70f7cfd13c8aea9ab51b4edab6.1594214103.git.lukas@wunner.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Thomas Gleixner 提交于
stable inclusion from linux-4.19.141 commit 5c4d9eefd314e763dcb2a499797176c17ad6ab69 -------------------------------- commit f0c7baca upstream. John reported that on a RK3288 system the perf per CPU interrupts are all affine to CPU0 and provided the analysis: "It looks like what happens is that because the interrupts are not per-CPU in the hardware, armpmu_request_irq() calls irq_force_affinity() while the interrupt is deactivated and then request_irq() with IRQF_PERCPU | IRQF_NOBALANCING. Now when irq_startup() runs with IRQ_STARTUP_NORMAL, it calls irq_setup_affinity() which returns early because IRQF_PERCPU and IRQF_NOBALANCING are set, leaving the interrupt on its original CPU." This was broken by the recent commit which blocked interrupt affinity setting in hardware before activation of the interrupt. While this works in general, it does not work for this particular case. As contrary to the initial analysis not all interrupt chip drivers implement an activate callback, the safe cure is to make the deferred interrupt affinity setting at activation time opt-in. Implement the necessary core logic and make the two irqchip implementations for which this is required opt-in. In hindsight this would have been the right thing to do, but ... Fixes: baedb87d ("genirq/affinity: Handle affinity setting on inactive interrupts correctly") Reported-by: NJohn Keeping <john@metanate.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMarc Zyngier <maz@kernel.org> Acked-by: NMarc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/87blk4tzgm.fsf@nanos.tec.linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Paul E. McKenney 提交于
stable inclusion from linux-4.19.140 commit abfa9c47ece7c712d3fab494d1771c8f15bba8fa -------------------------------- [ Upstream commit 0a3b3c25 ] A large process running on a heavily loaded system can encounter the following RCU CPU stall warning: rcu: INFO: rcu_sched self-detected stall on CPU rcu: 3-....: (20998 ticks this GP) idle=4ea/1/0x4000000000000002 softirq=556558/556558 fqs=5190 (t=21013 jiffies g=1005461 q=132576) NMI backtrace for cpu 3 CPU: 3 PID: 501900 Comm: aio-free-ring-w Kdump: loaded Not tainted 5.2.9-108_fbk12_rc3_3858_gb83b75af7909 #1 Hardware name: Wiwynn HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016 Call Trace: <IRQ> dump_stack+0x46/0x60 nmi_cpu_backtrace.cold.3+0x13/0x50 ? lapic_can_unplug_cpu.cold.27+0x34/0x34 nmi_trigger_cpumask_backtrace+0xba/0xca rcu_dump_cpu_stacks+0x99/0xc7 rcu_sched_clock_irq.cold.87+0x1aa/0x397 ? tick_sched_do_timer+0x60/0x60 update_process_times+0x28/0x60 tick_sched_timer+0x37/0x70 __hrtimer_run_queues+0xfe/0x270 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x5e/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:kmem_cache_free+0x223/0x300 Code: 88 00 00 00 0f 85 ca 00 00 00 41 8b 55 18 31 f6 f7 da 41 f6 45 0a 02 40 0f 94 c6 83 c6 05 9c 41 5e fa e8 a0 a7 01 00 41 56 9d <49> 8b 47 08 a8 03 0f 85 87 00 00 00 65 48 ff 08 e9 3d fe ff ff 65 RSP: 0018:ffffc9000e8e3da8 EFLAGS: 00000206 ORIG_RAX: ffffffffffffff13 RAX: 0000000000020000 RBX: ffff88861b9de960 RCX: 0000000000000030 RDX: fffffffffffe41e8 RSI: 000060777fe3a100 RDI: 000000000001be18 RBP: ffffea00186e7780 R08: ffffffffffffffff R09: ffffffffffffffff R10: ffff88861b9dea28 R11: ffff88887ffde000 R12: ffffffff81230a1f R13: ffff888854684dc0 R14: 0000000000000206 R15: ffff8888547dbc00 ? remove_vma+0x4f/0x60 remove_vma+0x4f/0x60 exit_mmap+0xd6/0x160 mmput+0x4a/0x110 do_exit+0x278/0xae0 ? syscall_trace_enter+0x1d3/0x2b0 ? handle_mm_fault+0xaa/0x1c0 do_group_exit+0x3a/0xa0 __x64_sys_exit_group+0x14/0x20 do_syscall_64+0x42/0x100 entry_SYSCALL_64_after_hwframe+0x44/0xa9 And on a PREEMPT=n kernel, the "while (vma)" loop in exit_mmap() can run for a very long time given a large process. This commit therefore adds a cond_resched() to this loop, providing RCU any needed quiescent states. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <linux-mm@kvack.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Reviewed-by: NJoel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: NPaul E. McKenney <paulmck@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Peng Liu 提交于
stable inclusion from linux-4.19.140 commit 008560eb23a8ad242c15cb90478c428a5a69f546 -------------------------------- [ Upstream commit 9b1b234b ] During sched domain init, we check whether non-topological SD_flags are returned by tl->sd_flags(), if found, fire a waning and correct the violation, but the code failed to correct the violation. Correct this. Fixes: 143e1e28 ("sched: Rework sched_domain topology definition") Signed-off-by: NPeng Liu <iwtbavbm@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org> Reviewed-by: NValentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20200609150936.GA13060@iZj6chx1xj0e0buvshuecpZSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Vincent Guittot 提交于
stable inclusion from linux-4.19.140 commit 519252e38ca1e1d614b11f982b1f489d62c135ee -------------------------------- [ Upstream commit 3ea2f097 ] With commit: 'b7031a02 ("sched/fair: Add NOHZ_STATS_KICK")' rebalance_domains of the local cfs_rq happens before others idle cpus have updated nohz.next_balance and its value is overwritten. Move the update of nohz.next_balance for other idles cpus before balancing and updating the next_balance of local cfs_rq. Also, the nohz.next_balance is now updated only if all idle cpus got a chance to rebalance their domains and the idle balance has not been aborted because of new activities on the CPU. In case of need_resched, the idle load balance will be kick the next jiffie in order to address remaining ilb. Fixes: b7031a02 ("sched/fair: Add NOHZ_STATS_KICK") Reported-by: NPeng Liu <iwtbavbm@gmail.com> Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NValentin Schneider <valentin.schneider@arm.com> Acked-by: NMel Gorman <mgorman@suse.de> Link: https://lkml.kernel.org/r/20200609123748.18636-1-vincent.guittot@linaro.orgSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Frank van der Linden 提交于
stable inclusion from linux-4.19.139 commit fabe9d6cc1663deafba59499829e0c4c28971345 -------------------------------- commit 08b5d501 upstream. set/removexattr on an exported filesystem should break NFS delegations. This is true in general, but also for the upcoming support for RFC 8726 (NFSv4 extended attribute support). Make sure that they do. Additionally, they need to grow a _locked variant, since callers might call this with i_rwsem held (like the NFS server code). Cc: stable@vger.kernel.org # v4.9+ Cc: linux-fsdevel@vger.kernel.org Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NFrank van der Linden <fllinden@amazon.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Qiushi Wu 提交于
stable inclusion from linux-4.19.139 commit 937dafe8682044e70821c886d9063869744b3057 -------------------------------- [ Upstream commit fe3c6068 ] kobject_init_and_add() takes reference even when it fails. If this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Callback function fw_cfg_sysfs_release_entry() in kobject_put() can handle the pointer "entry" properly. Signed-off-by: NQiushi Wu <wu000273@umn.edu> Link: https://lore.kernel.org/r/20200613190533.15712-1-wu000273@umn.eduSigned-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jiang Ying 提交于
stable inclusion from linux-4.19.138 commit aa0962310814356b10a973c578cfe595d89a3e39 -------------------------------- This patch is used to fix ext4 direct I/O read error when the read size is not aligned with block size. Then, I will use a test to explain the error. (1) Make a file that is not aligned with block size: $dd if=/dev/zero of=./test.jar bs=1000 count=3 (2) I wrote a source file named "direct_io_read_file.c" as following: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/file.h> #include <sys/types.h> #include <sys/stat.h> #include <string.h> #define BUF_SIZE 1024 int main() { int fd; int ret; unsigned char *buf; ret = posix_memalign((void **)&buf, 512, BUF_SIZE); if (ret) { perror("posix_memalign failed"); exit(1); } fd = open("./test.jar", O_RDONLY | O_DIRECT, 0755); if (fd < 0){ perror("open ./test.jar failed"); exit(1); } do { ret = read(fd, buf, BUF_SIZE); printf("ret=%d\n",ret); if (ret < 0) { perror("write test.jar failed"); } } while (ret > 0); free(buf); close(fd); } (3) Compile the source file: $gcc direct_io_read_file.c -D_GNU_SOURCE (4) Run the test program: $./a.out The result is as following: ret=1024 ret=1024 ret=952 ret=-1 write test.jar failed: Invalid argument. I have tested this program on XFS filesystem, XFS does not have this problem, because XFS use iomap_dio_rw() to do direct I/O read. And the comparing between read offset and file size is done in iomap_dio_rw(), the code is as following: if (pos < size) { retval = filemap_write_and_wait_range(mapping, pos, pos + iov_length(iov, nr_segs) - 1); if (!retval) { retval = mapping->a_ops->direct_IO(READ, iocb, iov, pos, nr_segs); } ... } ...only when "pos < size", direct I/O can be done, or 0 will be return. I have tested the fix patch on Ext4, it is up to the mustard of EINVAL in man2(read) as following: #include <unistd.h> ssize_t read(int fd, void *buf, size_t count); EINVAL fd is attached to an object which is unsuitable for reading; or the file was opened with the O_DIRECT flag, and either the address specified in buf, the value specified in count, or the current file offset is not suitably aligned. So I think this patch can be applied to fix ext4 direct I/O error. However Ext4 introduces direct I/O read using iomap infrastructure on kernel 5.5, the patch is commit <b1b4705d> ("ext4: introduce direct I/O read using iomap infrastructure"), then Ext4 will be the same as XFS, they all use iomap_dio_rw() to do direct I/O read. So this problem does not exist on kernel 5.5 for Ext4. >From above description, we can see this problem exists on all the kernel versions between kernel 3.14 and kernel 5.4. It will cause the Applications to fail to read. For example, when the search service downloads a new full index file, the search engine is loading the previous index file and is processing the search request, it can not use buffer io that may squeeze the previous index file in use from pagecache, so the serch service must use direct I/O read. Please apply this patch on these kernel versions, or please use the method on kernel 5.5 to fix this problem. Fixes: 9fe55eea ("Fix race when checking i_size on direct i/o read") Reviewed-by: NJan Kara <jack@suse.cz> Co-developed-by: NWang Long <wanglong19@meituan.com> Signed-off-by: NWang Long <wanglong19@meituan.com> Signed-off-by: NJiang Ying <jiangying8582@126.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Robin Murphy 提交于
stable inclusion from linux-4.19.137 commit 0be9b57b5bb5a1e643624ea68decc4a8a14ffda6 -------------------------------- [ Upstream commit 05fb3dbd ] Although iph is expected to point to at least 20 bytes of valid memory, ihl may be bogus, for example on reception of a corrupt packet. If it happens to be less than 5, we really don't want to run away and dereference 16GB worth of memory until it wraps back to exactly zero... Fixes: 0e455d8e ("arm64: Implement optimised IP checksum helpers") Reported-by: Nguodeqing <geffrey.guo@huawei.com> Signed-off-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Sami Tolvanen 提交于
stable inclusion from linux-4.19.137 commit 53f941777b9df64fa70fad775efabfcfb5db92c4 -------------------------------- [ Upstream commit 966a0acc ] Commit f7b93d42 ("arm64/alternatives: use subsections for replacement sequences") breaks LLVM's integrated assembler, because due to its one-pass design, it cannot compute instruction sequence lengths before the layout for the subsection has been finalized. This change fixes the build by moving the .org directives inside the subsection, so they are processed after the subsection layout is known. Fixes: f7b93d42 ("arm64/alternatives: use subsections for replacement sequences") Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1078 Link: https://lore.kernel.org/r/20200730153701.3892953-1-samitolvanen@google.comSigned-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Andrii Nakryiko 提交于
stable inclusion from linux-4.19.137 commit 634d42cadc4771fbe3b70e0fa8b82b334fd41959 -------------------------------- [ Upstream commit 1d4e1eab ] Fix HASH_OF_MAPS bug of not putting inner map pointer on bpf_map_elem_update() operation. This is due to per-cpu extra_elems optimization, which bypassed free_htab_elem() logic doing proper clean ups. Make sure that inner map is put properly in optimized case as well. Fixes: 8c290e60 ("bpf: fix hashmap extra_elems logic") Signed-off-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NSong Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200729040913.2815687-1-andriin@fb.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mikulas Patocka 提交于
stable inclusion from linux-4.19.135 commit cab7ef0066401efb5ab2dea1d85f490ce9c626f3 -------------------------------- commit 5df96f2b upstream. Commit adc0daad ("dm: report suspended device during destroy") broke integrity recalculation. The problem is dm_suspended() returns true not only during suspend, but also during resume. So this race condition could occur: 1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work) 2. integrity_recalc (&ic->recalc_work) preempts the current thread 3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret; 4. integrity_recalc exits and no recalculating is done. To fix this race condition, add a function dm_post_suspending that is only true during the postsuspend phase and use it instead of dm_suspended(). Signed-off-by: Mikulas Patocka <mpatocka redhat com> Fixes: adc0daad ("dm: report suspended device during destroy") Cc: stable vger kernel org # v4.18+ Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Michael J. Ruhl 提交于
stable inclusion from linux-4.19.135 commit 4daa403143be80d373f520029a5220602db2e4cd -------------------------------- commit e0b3e0b1 upstream. The !ATOMIC_IOMAP version of io_maping_init_wc will always return success, even when the ioremap fails. Since the ATOMIC_IOMAP version returns NULL when the init fails, and callers check for a NULL return on error this is unexpected. During a device probe, where the ioremap failed, a crash can look like this: BUG: unable to handle page fault for address: 0000000000210000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page Oops: 0002 [#1] PREEMPT SMP CPU: 0 PID: 177 Comm: RIP: 0010:fill_page_dma [i915] gen8_ppgtt_create [i915] i915_ppgtt_create [i915] intel_gt_init [i915] i915_gem_init [i915] i915_driver_probe [i915] pci_device_probe really_probe driver_probe_device The remap failure occurred much earlier in the probe. If it had been propagated, the driver would have exited with an error. Return NULL on ioremap failure. [akpm@linux-foundation.org: detect ioremap_wc() errors earlier] Fixes: cafaf14a ("io-mapping: Always create a struct to hold metadata about the io-mapping") Signed-off-by: NMichael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200721171936.81563-1-michael.j.ruhl@intel.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Tetsuo Handa 提交于
stable inclusion from linux-4.19.135 commit 74752b81eae8ae64e97de222320026367e92c4b5 -------------------------------- commit ce684552 upstream. syzbot is reporting general protection fault in do_con_write() [1] caused by vc->vc_screenbuf == ZERO_SIZE_PTR caused by vc->vc_screenbuf_size == 0 caused by vc->vc_cols == vc->vc_rows == vc->vc_size_row == 0 caused by fb_set_var() from ioctl(FBIOPUT_VSCREENINFO) on /dev/fb0 , for gotoxy(vc, 0, 0) from reset_terminal() from vc_init() from vc_allocate() from con_install() from tty_init_dev() from tty_open() on such console causes vc->vc_pos == 0x10000000e due to ((unsigned long) ZERO_SIZE_PTR) + -1U * 0 + (-1U << 1). I don't think that a console with 0 column or 0 row makes sense. And it seems that vc_do_resize() does not intend to allow resizing a console to 0 column or 0 row due to new_cols = (cols ? cols : vc->vc_cols); new_rows = (lines ? lines : vc->vc_rows); exception. Theoretically, cols and rows can be any range as long as 0 < cols * rows * 2 <= KMALLOC_MAX_SIZE is satisfied (e.g. cols == 1048576 && rows == 2 is possible) because of vc->vc_size_row = vc->vc_cols << 1; vc->vc_screenbuf_size = vc->vc_rows * vc->vc_size_row; in visual_init() and kzalloc(vc->vc_screenbuf_size) in vc_allocate(). Since we can detect cols == 0 or rows == 0 via screenbuf_size = 0 in visual_init(), we can reject kzalloc(0). Then, vc_allocate() will return an error, and con_write() will not be called on a console with 0 column or 0 row. We need to make sure that integer overflow in visual_init() won't happen. Since vc_do_resize() restricts cols <= 32767 and rows <= 32767, applying 1 <= cols <= 32767 and 1 <= rows <= 32767 restrictions to vc_allocate() will be practically fine. This patch does not touch con_init(), for returning -EINVAL there does not help when we are not returning -ENOMEM. [1] https://syzkaller.appspot.com/bug?extid=017265e8553724e514e8Reported-and-tested-by: Nsyzbot <syzbot+017265e8553724e514e8@syzkaller.appspotmail.com> Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200712111013.11881-1-penguin-kernel@I-love.SAKURA.ne.jpSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-