- 05 3月, 2020 3 次提交
-
-
由 Russell King 提交于
[ Upstream commit d2ed49cf6c13e379c5819aa5ac20e1f9674ebc89 ] When a PHY is probed, if the top bit is set, we end up requesting a module with the string "mdio:-10101110000000100101000101010001" - the top bit is printed to a signed -1 value. This leads to the module not being loaded. Fix the module format string and the macro generating the values for it to ensure that we only print unsigned types and the top bit is always 0/1. We correctly end up with "mdio:10101110000000100101000101010001". Fixes: 8626d3b4 ("phylib: Support phy module autoloading") Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Guillaume Nault 提交于
[ Upstream commit 04d26e7b159a396372646a480f4caa166d1b6720 ] If no synflood happens for a long enough period of time, then the synflood timestamp isn't refreshed and jiffies can advance so much that time_after32() can't accurately compare them any more. Therefore, we can end up in a situation where time_after32(now, last_overflow + HZ) returns false, just because these two values are too far apart. In that case, the synflood timestamp isn't updated as it should be, which can trick tcp_synq_no_recent_overflow() into rejecting valid syncookies. For example, let's consider the following scenario on a system with HZ=1000: * The synflood timestamp is 0, either because that's the timestamp of the last synflood or, more commonly, because we're working with a freshly created socket. * We receive a new SYN, which triggers synflood protection. Let's say that this happens when jiffies == 2147484649 (that is, 'synflood timestamp' + HZ + 2^31 + 1). * Then tcp_synq_overflow() doesn't update the synflood timestamp, because time_after32(2147484649, 1000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 1000: the value of 'last_overflow' + HZ. * A bit later, we receive the ACK completing the 3WHS. But cookie_v[46]_check() rejects it because tcp_synq_no_recent_overflow() says that we're not under synflood. That's because time_after32(2147484649, 120000) returns false. With: - 2147484649: the value of jiffies, aka. 'now'. - 120000: the value of 'last_overflow' + TCP_SYNCOOKIE_VALID. Of course, in reality jiffies would have increased a bit, but this condition will last for the next 119 seconds, which is far enough to accommodate for jiffie's growth. Fix this by updating the overflow timestamp whenever jiffies isn't within the [last_overflow, last_overflow + HZ] range. That shouldn't have any performance impact since the update still happens at most once per second. Now we're guaranteed to have fresh timestamps while under synflood, so tcp_synq_no_recent_overflow() can safely use it with time_after32() in such situations. Stale timestamps can still make tcp_synq_no_recent_overflow() return the wrong verdict when not under synflood. This will be handled in the next patch. For 64 bits architectures, the problem was introduced with the conversion of ->tw_ts_recent_stamp to 32 bits integer by commit cca9bab1 ("tcp: use monotonic timestamps for PAWS"). The problem has always been there on 32 bits architectures. Fixes: cca9bab1 ("tcp: use monotonic timestamps for PAWS") Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> -
由 Eric Dumazet 提交于
[ Upstream commit 501a90c945103e8627406763dac418f20f3837b2 ] syzbot was once again able to crash a host by setting a very small mtu on loopback device. Let's make inetdev_valid_mtu() available in include/net/ip.h, and use it in ip_setup_cork(), so that we protect both ip_append_page() and __ip_append_data() Also add a READ_ONCE() when the device mtu is read. Pairs this lockless read with one WRITE_ONCE() in __dev_set_mtu(), even if other code paths might write over this field. Add a big comment in include/linux/netdevice.h about dev->mtu needing READ_ONCE()/WRITE_ONCE() annotations. Hopefully we will add the missing ones in followup patches. [1] refcount_t: saturated; leaking memory. WARNING: CPU: 0 PID: 9464 at lib/refcount.c:22 refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Kernel panic - not syncing: panic_on_warn set ... CPU: 0 PID: 9464 Comm: syz-executor850 Not tainted 5.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x197/0x210 lib/dump_stack.c:118 panic+0x2e3/0x75c kernel/panic.c:221 __warn.cold+0x2f/0x3e kernel/panic.c:582 report_bug+0x289/0x300 lib/bug.c:195 fixup_bug arch/x86/kernel/traps.c:174 [inline] fixup_bug arch/x86/kernel/traps.c:169 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:267 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:286 invalid_op+0x23/0x30 arch/x86/entry/entry_64.S:1027 RIP: 0010:refcount_warn_saturate+0x138/0x1f0 lib/refcount.c:22 Code: 06 31 ff 89 de e8 c8 f5 e6 fd 84 db 0f 85 6f ff ff ff e8 7b f4 e6 fd 48 c7 c7 e0 71 4f 88 c6 05 56 a6 a4 06 01 e8 c7 a8 b7 fd <0f> 0b e9 50 ff ff ff e8 5c f4 e6 fd 0f b6 1d 3d a6 a4 06 31 ff 89 RSP: 0018:ffff88809689f550 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff815e4336 RDI: ffffed1012d13e9c RBP: ffff88809689f560 R08: ffff88809c50a3c0 R09: fffffbfff15d31b1 R10: fffffbfff15d31b0 R11: ffffffff8ae98d87 R12: 0000000000000001 R13: 0000000000040100 R14: ffff888099041104 R15: ffff888218d96e40 refcount_add include/linux/refcount.h:193 [inline] skb_set_owner_w+0x2b6/0x410 net/core/sock.c:1999 sock_wmalloc+0xf1/0x120 net/core/sock.c:2096 ip_append_page+0x7ef/0x1190 net/ipv4/ip_output.c:1383 udp_sendpage+0x1c7/0x480 net/ipv4/udp.c:1276 inet_sendpage+0xdb/0x150 net/ipv4/af_inet.c:821 kernel_sendpage+0x92/0xf0 net/socket.c:3794 sock_sendpage+0x8b/0xc0 net/socket.c:936 pipe_to_sendpage+0x2da/0x3c0 fs/splice.c:458 splice_from_pipe_feed fs/splice.c:512 [inline] __splice_from_pipe+0x3ee/0x7c0 fs/splice.c:636 splice_from_pipe+0x108/0x170 fs/splice.c:671 generic_splice_sendpage+0x3c/0x50 fs/splice.c:842 do_splice_from fs/splice.c:861 [inline] direct_splice_actor+0x123/0x190 fs/splice.c:1035 splice_direct_to_actor+0x3b4/0xa30 fs/splice.c:990 do_splice_direct+0x1da/0x2a0 fs/splice.c:1078 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x441409 Code: e8 ac e8 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007fffb64c4f78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000441409 RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000005 RBP: 0000000000073b8a R08: 0000000000000010 R09: 0000000000000010 R10: 0000000000010001 R11: 0000000000000246 R12: 0000000000402180 R13: 0000000000402210 R14: 0000000000000000 R15: 0000000000000000 Kernel Offset: disabled Rebooting in 86400 seconds.. Fixes: 1470ddf7 ("inet: Remove explicit write references to sk/inet in ip_append_data") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 13 1月, 2020 3 次提交
-
-
由 luochunsheng 提交于
hulk inclusion category: drivers/iommu bugzilla: 28210 CVE: NA ---------------------------------- arm64 cannot support virtualization when using the 3408iMR/3416iMRraid card. For solving the problem,we add two fuctions: 1.we prepare init bypass level2 entry for specific devices when smmu uses 2 level streamid entry. 2. we add smmu.bypassdev cmdline to allow SMMU bypass streams for some specific devices. Signed-off-by: luochunsheng<luochunsheng@huawei.com> Reviewed-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NZhen Lei <thunder.leizhen@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Heyi Guo 提交于
euleros inclusion category: bugfix This patch has already been queued in linux-next and will soon land on mainline tree. commit 52ff679c98731d3d3ae435e29c0620ae3c04a87d ------------------------------------------------- There is no special reason to set virtual LPI pending table as non-shareable. If we choose to hard code the shareability without probing, inner-shareable will be a better choice, for all the other ITS/GICR tables prefer to be inner-shareable. What's more, on Hisilicon hip08 it will trigger some kind of bus warning when mixing use of different shareabilities. Signed-off-by: NHeyi Guo <guoheyi@huawei.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191130073849.38378-1-guoheyi@huawei.comSigned-off-by: NZenghui Yu <yuzenghui@huawei.com> Reviewed-by: NHailiang Zhang <zhang.zhanghailiang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Kristina Martsenko 提交于
mainline inclusion from mainline-v4.20-rc1 commit 8ad50c8985d805923f52a80698010a0a5123c07d category: feature feature: Dynamic IPA and 52bit IPA support ------------------------------------------------- Add support for handling 52bit guest physical address to the VGIC layer. So far we have limited the guest physical address to 48bits, by explicitly masking the upper bits. This patch removes the restriction. We do not have to check if the host supports 52bit as the gpa is always validated during an access. (e.g, kvm_{read/write}_guest, kvm_is_visible_gfn()). Also, the ITS table save-restore is also not affected with the enhancement. The DTE entries already store the bits[51:8] of the ITT_addr (with a 256byte alignment). Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <cdall@kernel.org> Reviewed-by: NEric Auger <eric.auger@redhat.com> Signed-off-by: NKristina Martsenko <kristina.martsenko@arm.com> [ Macro clean ups, fix PROPBASER and PENDBASER accesses ] Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NZenghui Yu <yuzenghui@huawei.com> Reviewed-by: NHailiang Zhang <zhang.zhanghailiang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 29 12月, 2019 1 次提交
-
-
由 Xie XiuQi 提交于
hulk inclusion category: bugfix bugzilla: 16633 CVE: NA Rewrite the source file's licence of mpam feature. Signed-off-by: NXie XiuQi <xiexiuqi@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 27 12月, 2019 33 次提交
-
-
由 Michal Hocko 提交于
[ Upstream commit 7635d9cbe8327e131a1d3d8517dc186c2796ce2e ] Userspace falls short when trying to find out whether a specific memory range is eligible for THP. There are usecases that would like to know that http://lkml.kernel.org/r/alpine.DEB.2.21.1809251248450.50347@chino.kir.corp.google.com : This is used to identify heap mappings that should be able to fault thp : but do not, and they normally point to a low-on-memory or fragmentation : issue. The only way to deduce this now is to query for hg resp. nh flags and confronting the state with the global setting. Except that there is also PR_SET_THP_DISABLE that might change the picture. So the final logic is not trivial. Moreover the eligibility of the vma depends on the type of VMA as well. In the past we have supported only anononymous memory VMAs but things have changed and shmem based vmas are supported as well these days and the query logic gets even more complicated because the eligibility depends on the mount option and another global configuration knob. Simplify the current state and report the THP eligibility in /proc/<pid>/smaps for each existing vma. Reuse transparent_hugepage_enabled for this purpose. The original implementation of this function assumes that the caller knows that the vma itself is supported for THP so make the core checks into __transparent_hugepage_enabled and use it for existing callers. __show_smap just use the new transparent_hugepage_enabled which also checks the vma support status (please note that this one has to be out of line due to include dependency issues). [mhocko@kernel.org: fix oops with NULL ->f_mapping] Link: http://lkml.kernel.org/r/20181224185106.GC16738@dhcp22.suse.cz Link: http://lkml.kernel.org/r/20181211143641.3503-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Jan Kara <jack@suse.cz> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Oppenheimer <bepvte@gmail.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Daniel Schultz 提交于
[ Upstream commit 37ef8c2c15bdc1322b160e38986c187de2b877b2 ] The Rockchip PMIC driver can automatically detect connected component versions by reading the ID_MSB and ID_LSB registers. The probe function will always fail with RK818 PMICs because the ID_MSK is 0xFFF0 and the RK818 template ID is 0x8181. This patch changes this value to 0x8180. Fixes: 9d6105e1 ("mfd: rk808: Fix up the chip id get failed") Cc: stable@vger.kernel.org Cc: Elaine Zhang <zhangqing@rock-chips.com> Cc: Joseph Chen <chenjh@rock-chips.com> Signed-off-by: NDaniel Schultz <d.schultz@phytec.de> Signed-off-by: NHeiko Stuebner <heiko@sntech.de> Signed-off-by: NLee Jones <lee.jones@linaro.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Dmitry Monakhov 提交于
commit df4bb5d128e2c44848aeb36b7ceceba3ac85080d upstream. There is a race window where quota was redirted once we drop dq_list_lock inside dqput(), but before we grab dquot->dq_lock inside dquot_release() TASK1 TASK2 (chowner) ->dqput() we_slept: spin_lock(&dq_list_lock) if (dquot_dirty(dquot)) { spin_unlock(&dq_list_lock); dquot->dq_sb->dq_op->write_dquot(dquot); goto we_slept if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) { spin_unlock(&dq_list_lock); dquot->dq_sb->dq_op->release_dquot(dquot); dqget() mark_dquot_dirty() dqput() goto we_slept; } So dquot dirty quota will be released by TASK1, but on next we_sleept loop we detect this and call ->write_dquot() for it. XFSTEST: https://github.com/dmonakhov/xfstests/commit/440a80d4cbb39e9234df4d7240aee1d551c36107 Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org CC: stable@vger.kernel.org Signed-off-by: NDmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> -
由 Jan Kara 提交于
commit add3efdd78b8a0478ce423bb9d4df6bd95e8b335 upstream. When number of free space in the journal is very low, the arithmetic in jbd2_log_space_left() could underflow resulting in very high number of free blocks and thus triggering assertion failure in transaction commit code complaining there's not enough space in the journal: J_ASSERT(journal->j_free > 1); Properly check for the low number of free blocks. CC: stable@vger.kernel.org Reviewed-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NJan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20191105164437.32602-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Tejun Heo 提交于
commit e23f568aa63f64cd6b355094224cc9356c0f696b upstream. When the 32bit ino wraps around, kernfs increments the generation number to distinguish reused ino instances. The wrap-around detection tests whether the allocated ino is lower than what the cursor but the cursor is pointing to the next ino to allocate so the condition never triggers. Fix it by remembering the last ino and comparing against that. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Fixes: 4a3ef68a ("kernfs: implement i_generation") Cc: Namhyung Kim <namhyung@kernel.org> Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jonathan Marek 提交于
[ Upstream commit 16ad9501b1f2edebe24f8cf3c09da0695871986b ] This fixes the case when CONFIG_QCOM_SCM is not enabled, and linux/errno.h has not been included previously. Signed-off-by: NJonathan Marek <jonathan@marek.ca> Reviewed-by: NBjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: NAndy Gross <andy.gross@linaro.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Dmitry Safonov 提交于
[ Upstream commit c96cf923a98d1b094df9f0cf97a83e118817e31b ] There might be situations where tty_ldisc_lock() has blocked, but there is already IO on tty and it prevents line discipline changes. It might theoretically turn into dead-lock. Basically, provide more priority to pending tty_ldisc_lock() than to servicing reads/writes over tty. User-visible issue was reported by Mikulas where on pa-risc with Debian 5 reboot took either 80 seconds, 3 minutes or 3:25 after proper locking in tty_reopen(). Cc: Jiri Slaby <jslaby@suse.com> Reported-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NDmitry Safonov <dima@arista.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Miquel Raynal 提交于
[ Upstream commit 4348433d8c0234f44adb6e12112e69343f50f0c5 ] mtd_oobavail() returns either mtd->oovabail or mtd->oobsize. Both values are unsigned 32-bit entities, so there is no reason to pretend returning a signed one. Signed-off-by: NMiquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: NBoris Brezillon <boris.brezillon@bootlin.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Niklas Söderlund 提交于
[ Upstream commit c9d76d0655c06b8c1f944e46c4fd9e9cf4b331c0 ] The function dma_set_max_seg_size() can return either 0 on success or -EIO on error. Change its return type from unsigned int to int to capture this. Signed-off-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Alexey Dobriyan 提交于
[ Upstream commit f8c6d1402b89f22a3647705d63cbd171aa19a77e ] acpi_find_child_device() accepts boolean not pointer as last argument. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> [ rjw: Subject ] Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Mark Brown 提交于
[ Upstream commit f1abf67217de91f5cd3c757ae857632ca565099a ] The stub implementation of _set_load() returns a mode value which is within the bounds of valid return codes for success (the documentation just says that failures are negative error codes) but not sensible or what the actual implementation does. Fix it to just return 0. Reported-by: NCheng-Yi Chiang <cychiang@chromium.org> Signed-off-by: NMark Brown <broonie@kernel.org> Reviewed-by: NDouglas Anderson <dianders@chromium.org> Signed-off-by: NMark Brown <broonie@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Alice Michael 提交于
[ Upstream commit 843faff87af261bf55eda719a06087af0486a168 ] When calculating the valid length for a VIRTCHNL_OP_ENABLE_CHANNELS message, we accidentally allowed messages with one extra virtchnl_channel_info structure on the end. This happened due to an off by one error, because we forgot that valid_len already accounted for one virtchnl_channel_info structure, so we need to subtract one from the num_tc value. Signed-off-by: NAlice Michael <alice.michael@intel.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Douglas Anderson 提交于
[ Upstream commit d6e1935819db0c91ce4a5af82466f3ab50d17346 ] Right now serial drivers process sysrq keys deep in their character receiving code. This means that they've already grabbed their port->lock spinlock. This can end up getting in the way if we've go to do serial stuff (especially kgdb) in response to the sysrq. Serial drivers have various hacks in them to handle this. Looking at '8250_port.c' you can see that the console_write() skips locking if we're in the sysrq handler. Looking at 'msm_serial.c' you can see that the port lock is dropped around uart_handle_sysrq_char(). It turns out that these hacks aren't exactly perfect. If you have lockdep turned on and use something like the 8250_port hack you'll get a splat that looks like: WARNING: possible circular locking dependency detected [...] is trying to acquire lock: ... (console_owner){-.-.}, at: console_unlock+0x2e0/0x5e4 but task is already holding lock: ... (&port_lock_key){-.-.}, at: serial8250_handle_irq+0x30/0xe4 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&port_lock_key){-.-.}: _raw_spin_lock_irqsave+0x58/0x70 serial8250_console_write+0xa8/0x250 univ8250_console_write+0x40/0x4c console_unlock+0x528/0x5e4 register_console+0x2c4/0x3b0 uart_add_one_port+0x350/0x478 serial8250_register_8250_port+0x350/0x3a8 dw8250_probe+0x67c/0x754 platform_drv_probe+0x58/0xa4 really_probe+0x150/0x294 driver_probe_device+0xac/0xe8 __driver_attach+0x98/0xd0 bus_for_each_dev+0x84/0xc8 driver_attach+0x2c/0x34 bus_add_driver+0xf0/0x1ec driver_register+0xb4/0x100 __platform_driver_register+0x60/0x6c dw8250_platform_driver_init+0x20/0x28 ... -> #0 (console_owner){-.-.}: lock_acquire+0x1e8/0x214 console_unlock+0x35c/0x5e4 vprintk_emit+0x230/0x274 vprintk_default+0x7c/0x84 vprintk_func+0x190/0x1bc printk+0x80/0xa0 __handle_sysrq+0x104/0x21c handle_sysrq+0x30/0x3c serial8250_read_char+0x15c/0x18c serial8250_rx_chars+0x34/0x74 serial8250_handle_irq+0x9c/0xe4 dw8250_handle_irq+0x98/0xcc serial8250_interrupt+0x50/0xe8 ... other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&port_lock_key); lock(console_owner); lock(&port_lock_key); lock(console_owner); *** DEADLOCK *** The hack used in 'msm_serial.c' doesn't cause the above splats but it seems a bit ugly to unlock / lock our spinlock deep in our irq handler. It seems like we could defer processing the sysrq until the end of the interrupt handler right after we've unlocked the port. With this scheme if a whole batch of sysrq characters comes in one irq then we won't handle them all, but that seems like it should be a fine compromise. Signed-off-by: NDouglas Anderson <dianders@chromium.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> -
由 Thomas Gleixner 提交于
mainline inclusion from mainline-5.2-rc1 commit 1b72d4323798 category: bugfix bugzilla: 14439 CVE: NA ------------------------------------------------- Valentin reported that unplugging a CPU occasionally results in a warning in the tick broadcast code which is triggered when an offline CPU is in the broadcast mask. This happens because the outgoing CPU is not removing itself from the broadcast masks, especially not from the broadcast_force_mask. The removal happens on the control CPU after the outgoing CPU is dead. It's a long standing issue, but the warning is harmless. Rework the hotplug mechanism so that the outgoing CPU removes itself from the broadcast masks after disabling interrupts and removing itself from the online mask. Reported-by: NValentin Schneider <valentin.schneider@arm.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NValentin Schneider <valentin.schneider@arm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1903211540180.1784@nanos.tec.linutronix.deSigned-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-By: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Marc Zyngier 提交于
mainline inclusion from mainline-5.5-rc1 commit f226650494c6 category: bugfix bugzilla: 23209 CVE: NA --------------------------- The GICv3 architecture specification is incredibly misleading when it comes to PMR and the requirement for a DSB. It turns out that this DSB is only required if the CPU interface sends an Upstream Control message to the redistributor in order to update the RD's view of PMR. This message is only sent when ICC_CTLR_EL1.PMHE is set, which isn't the case in Linux. It can still be set from EL3, so some special care is required. But the upshot is that in the (hopefuly large) majority of the cases, we can drop the DSB altogether. This relies on a new static key being set if the boot CPU has PMHE set. The drawback is that this static key has to be exported to modules. Cc: Will Deacon <will@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NWei Li <liwei391@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Linus Torvalds 提交于
mainline inclusion from mainline-v5.4-rc3 commit c512c69187197fe08026cb5bbe7b9709f4f89b73 category: bugfix bugzilla: 13690 CVE: CVE-2019-10220 ------------------------------------------------- In commit 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") I made filldir() use unsafe_put_user(), which improves code generation on x86 enormously. But because we didn't have a "unsafe_copy_to_user()", the dirent name copy was also done by hand with unsafe_put_user() in a loop, and it turns out that a lot of other architectures didn't like that, because unlike x86, they have various alignment issues. Most non-x86 architectures trap and fix it up, and some (like xtensa) will just fail unaligned put_user() accesses unconditionally. Which makes that "copy using put_user() in a loop" not work for them at all. I could make that code do explicit alignment etc, but the architectures that don't like unaligned accesses also don't really use the fancy "user_access_begin/end()" model, so they might just use the regular old __copy_to_user() interface. So this commit takes that looping implementation, turns it into the x86 version of "unsafe_copy_to_user()", and makes other architectures implement the unsafe copy version as __copy_to_user() (the same way they do for the other unsafe_xyz() accessor functions). Note that it only does this for the copying _to_ user space, and we still don't have a unsafe version of copy_from_user(). That's partly because we have no current users of it, but also partly because the copy_from_user() case is slightly different and cannot efficiently be implemented in terms of a unsafe_get_user() loop (because gcc can't do asm goto with outputs). It would be trivial to do this using "rep movsb", which would work really nicely on newer x86 cores, but really badly on some older ones. Al Viro is looking at cleaning up all our user copy routines to make this all a non-issue, but for now we have this simple-but-stupid version for x86 that works fine for the dirent name copy case because those names are short strings and we simply don't need anything fancier. Fixes: 9f79b78ef744 ("Convert filldir[64]() from __put_user() to unsafe_put_user()") Reported-by: NGuenter Roeck <linux@roeck-us.net> Reported-and-tested-by: NTony Luck <tony.luck@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Conflicts: include/linux/uaccess.h [yyl: adjust context] Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: Nzhangyi (F) <yi.zhang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> -
由 Jan Kara 提交于
mainline inclusion from mainline-5.4-rc1 commit cf1ea0592dbf109e7e7935b7d5b1a47a1ba04174 category: bugfix bugzilla: 22583 CVE: NA --------------------------- Filesystems will need to call this function from their fadvise handlers. CC: stable@vger.kernel.org Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Nyu kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Rafael J. Wysocki 提交于
mainline inclusion from mainline-5.2-rc7 commit 471a739a47aa category: bugfix bugzilla: 17173 CVE: NA ------------------------------------------------- There are platforms that do not call pm_set_suspend_via_firmware(), so pm_suspend_via_firmware() returns 'false' on them, but the power states of PCI devices (PCIe ports in particular) are changed as a result of powering down core platform components during system-wide suspend. Thus the pm_suspend_via_firmware() checks in pci_pm_suspend_noirq() and pci_pm_resume_noirq() introduced by commit 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to- idle") are not sufficient to determine that devices left in D0 during suspend will remain in D0 during resume and so the bus-level power management can be skipped for them. For this reason, introduce a new global suspend flag, PM_SUSPEND_FLAG_NO_PLATFORM, set it for suspend-to-idle only and replace the pm_suspend_via_firmware() checks mentioned above with checks against this flag. Fixes: 3e26c5feed2a ("PCI: PM: Skip devices in D0 for suspend-to-idle") Reported-by: NJon Hunter <jonathanh@nvidia.com> Tested-by: NJon Hunter <jonathanh@nvidia.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Conflicts: include/linux/suspend.h [fix conflicts caused by comments] Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> -
由 Rafael J. Wysocki 提交于
mainline inclusion from mainline-5.2-rc3 commit d491f2b75237 category: bugfix bugzilla: 17100 CVE: NA ------------------------------------------------- If a PCI driver leaves the device handled by it in D0 and calls pci_save_state() on the device in its ->suspend() or ->suspend_late() callback, it can expect the device to stay in D0 over the whole s2idle cycle. However, that may not be the case if there is a spurious wakeup while the system is suspended, because in that case pci_pm_suspend_noirq() will run again after pci_pm_resume_noirq() which calls pci_restore_state(), via pci_pm_default_resume_early(), so state_saved is cleared and the second iteration of pci_pm_suspend_noirq() will invoke pci_prepare_to_sleep() which may change the power state of the device. To avoid that, add a new internal flag, skip_bus_pm, that will be set by pci_pm_suspend_noirq() when it runs for the first time during the given system suspend-resume cycle if the state of the device has been saved already and the device is still in D0. Setting that flag will cause the next iterations of pci_pm_suspend_noirq() to set state_saved for pci_pm_resume_noirq(), so that it always restores the device state from the originally saved data, and avoid calling pci_prepare_to_sleep() for the device. Fixes: 33e4f80e ("ACPI / PM: Ignore spurious SCI wakeups from suspend-to-idle") Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NHanjun Guo <guohanjun@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 YueHaibing 提交于
mainline inclusion from mainline-v5.1-rc3 commit 9804501fa1228048857910a6bf23e085aade37cc category: bugfix bugzilla: 13690 CVE: CVE-2019-19227 ------------------------------------------------- register_snap_client may return NULL, all the callers check it, but only print a warning. This will result in NULL pointer dereference in unregister_snap_client and other places. It has always been used like this since v2.6 Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Reviewed-by: NYue Haibing <yuehaibing@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Jan Kara 提交于
[ Upstream commit 0803de78049fe1b0baf44bcddc727b036fb9139b ] Currently, blktrace will not show requests that don't have any data as rq->__sector is initialized to -1 which is out of device range and thus discarded by act_log_check(). This is most notably the case for cache flush requests sent to the device. Fix the problem by making blk_rq_trace_sector() return 0 for requests without initialized sector. Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Yi Wang 提交于
[ Upstream commit fb5bf31722d0805a3f394f7d59f2e8cd07acccb7 ] We get a warning when building kernel with W=1: kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes] kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes] Add the missing declaration in head file to fix this. Also, remove arch_release_thread_stack() completely because no arch seems to implement it since bb9d8126 (arch: remove tile port). Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cnSigned-off-by: NYi Wang <wang.yi59@zte.com.cn> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NMike Rapoport <rppt@linux.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Alexey Skidanov 提交于
[ Upstream commit 52fbf1134d479234d7e64ba9dcbaea23405f229e ] gen_pool_alloc_algo() uses different allocation functions implementing different allocation algorithms. With gen_pool_first_fit_align() allocation function, the returned address should be aligned on the requested boundary. If chunk start address isn't aligned on the requested boundary, the returned address isn't aligned too. The only way to get properly aligned address is to initialize the pool with chunks aligned on the requested boundary. If want to have an ability to allocate buffers aligned on different boundaries (for example, 4K, 1MB, ...), the chunk start address should be aligned on the max possible alignment. This happens because gen_pool_first_fit_align() looks for properly aligned memory block without taking into account the chunk start address alignment. To fix this, we provide chunk start address to gen_pool_first_fit_align() and change its implementation such that it starts looking for properly aligned block with appropriate offset (exactly as is done in CMA). Link: https://lkml.kernel.org/lkml/a170cf65-6884-3592-1de9-4c235888cc8a@intel.com Link: http://lkml.kernel.org/r/1541690953-4623-1-git-send-email-alexey.skidanov@intel.comSigned-off-by: NAlexey Skidanov <alexey.skidanov@intel.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Daniel Mentz <danielmentz@google.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Laura Abbott <labbott@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Wei Yang 提交于
[ Upstream commit 8b09549c2bfd9f3f8f4cdad74107ef4f4ff9cdd7 ] Commit fa5e084e ("vmscan: do not unconditionally treat zones that fail zone_reclaim() as full") changed the return value of node_reclaim(). The original return value 0 means NODE_RECLAIM_SOME after this commit. While the return value of node_reclaim() when CONFIG_NUMA is n is not changed. This will leads to call zone_watermark_ok() again. This patch fixes the return value by adjusting to NODE_RECLAIM_NOSCAN. Since node_reclaim() is only called in page_alloc.c, move it to mm/internal.h. Link: http://lkml.kernel.org/r/20181113080436.22078-1-richard.weiyang@gmail.comSigned-off-by: NWei Yang <richard.weiyang@gmail.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NMatthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Masami Hiramatsu 提交于
[ Upstream commit fb1a59fae8baa3f3c69b72a87ff94fc4fa5683ec ] Blacklist symbols in arch-defined probe-prohibited areas. With this change, user can see all symbols which are prohibited to probe in debugfs. All archtectures which have custom prohibit areas should define its own arch_populate_kprobe_blacklist() function, but unless that, all symbols marked __kprobes are blacklisted. Reported-by: NAndrea Righi <righi.andrea@gmail.com> Tested-by: NAndrea Righi <righi.andrea@gmail.com> Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: David S. Miller <davem@davemloft.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yonghong Song <yhs@fb.com> Link: http://lkml.kernel.org/r/154503485491.26176.15823229545155174796.stgit@devboxSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Darwin Dingel 提交于
[ Upstream commit 6d7f677a2afa1c82d7fc7af7f9159cbffd5dc010 ] When a serial port gets faulty or gets flooded with inputs, its interrupt handler starts to work double time to get the characters to the workqueue for the tty layer to handle them. When this busy time on the serial/tty subsystem happens during boot, where it is also busy on the userspace trying to initialise, some processes can continuously get preempted and will be on hold until the interrupts subside. The fix is to backoff on processing received characters for a specified amount of time when an input overrun is seen (received a new character before the previous one is processed). This only stops receive and will continue to transmit characters to serial port. After the backoff period is done, it receive will be re-enabled. This is optional and will only be enabled by setting 'overrun-throttle-ms' in the dts. Signed-off-by: NDarwin Dingel <darwin.dingel@alliedtelesis.co.nz> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Masami Hiramatsu 提交于
[ Upstream commit fc800a10be26017f8f338bc8e500d48e3e6429d9 ] synthetic event is using synth_event_mutex for protecting synth_event_list, and event_trigger_write() path acquires locks as below order. event_trigger_write(event_mutex) ->trigger_process_regex(trigger_cmd_mutex) ->event_hist_trigger_func(synth_event_mutex) On the other hand, synthetic event creation and deletion paths call trace_add_event_call() and trace_remove_event_call() which acquires event_mutex. In that case, if we keep the synth_event_mutex locked while registering/unregistering synthetic events, its dependency will be inversed. To avoid this issue, current synthetic event is using a 2 phase process to create/delete events. For example, it searches existing events under synth_event_mutex to check for event-name conflicts, and unlocks synth_event_mutex, then registers a new event under event_mutex locked. Finally, it locks synth_event_mutex and tries to add the new event to the list. But it can introduce complexity and a chance for name conflicts. To solve this simpler, this introduces trace_add_event_call_nolock() and trace_remove_event_call_nolock() which don't acquire event_mutex inside. synthetic event can lock event_mutex before synth_event_mutex to solve the lock dependency issue simpler. Link: http://lkml.kernel.org/r/154140844377.17322.13781091165954002713.stgit@devboxReviewed-by: NTom Zanussi <tom.zanussi@linux.intel.com> Tested-by: NTom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> -
由 Matthew Wilcox (Oracle) 提交于
[ Upstream commit f6341c5af4e6e15041be39976d16deca789555fa ] If there is an entry at INT_MAX then idr_for_each_entry() will increment id after handling it. This is undefined behaviour, and is caught by UBSAN. Adding 1U to id forces the operation to be carried out as an unsigned addition which (when assigned to id) will result in INT_MIN. Since there is never an entry stored at INT_MIN, idr_get_next() will return NULL, ending the loop as expected. Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Randy Dunlap 提交于
[ Upstream commit f430c7ed8bc22992ed528b518da465b060b9223f ] Add a missing short description to the reset_control_ops documentation. Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> [p.zabel@pengutronix.de: rebased and updated commit message] Signed-off-by: NPhilipp Zabel <p.zabel@pengutronix.de> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Arnd Bergmann 提交于
commit 04e7712f4460585e5eed5b853fd8b82a9943958f upstream. We are going to share the compat_sys_futex() handler between 64-bit architectures and 32-bit architectures that need to deal with both 32-bit and 64-bit time_t, and this is easier if both entry points are in the same file. In fact, most other system call handlers do the same thing these days, so let's follow the trend here and merge all of futex_compat.c into futex.c. In the process, a few minor changes have to be done to make sure everything still makes sense: handle_futex_death() and futex_cmpxchg_enabled() become local symbol, and the compat version of the fetch_robust_entry() function gets renamed to compat_fetch_robust_entry() to avoid a symbol clash. This is intended as a purely cosmetic patch, no behavior should change. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Sean Christopherson 提交于
commit a78986aae9b2988f8493f9f65a587ee433e83bc3 upstream. Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and instead manually handle ZONE_DEVICE on a case-by-case basis. For things like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal pages, e.g. put pages grabbed via gup(). But for flows such as setting A/D bits or shifting refcounts for transparent huge pages, KVM needs to to avoid processing ZONE_DEVICE pages as the flows in question lack the underlying machinery for proper handling of ZONE_DEVICE pages. This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup() when running a KVM guest backed with /dev/dax memory, as KVM straight up doesn't put any references to ZONE_DEVICE pages acquired by gup(). Note, Dan Williams proposed an alternative solution of doing put_page() on ZONE_DEVICE pages immediately after gup() in order to simplify the auditing needed to ensure is_zone_device_page() is called if and only if the backing device is pinned (via gup()). But that approach would break kvm_vcpu_{un}map() as KVM requires the page to be pinned from map() 'til unmap() when accessing guest memory, unlike KVM's secondary MMU, which coordinates with mmu_notifier invalidations to avoid creating stale page references, i.e. doesn't rely on pages being pinned. [*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.plReported-by: NAdam Borowski <kilobyte@angband.pl> Analyzed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Cc: stable@vger.kernel.org Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [sean: backport to 4.x; resolve conflict in mmu.c] Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> -
由 Rasmus Villemoes 提交于
[ Upstream commit 7275b097851a5e2e0dd4da039c7e96b59ac5314e ] The static inlines in bitmap.h do not handle a compile-time constant nbits==0 correctly (they dereference the passed src or dst pointers, despite only 0 words being valid to access). I had the 0-day buildbot chew on a patch [1] that would cause build failures for such cases without complaining, suggesting that we don't have any such users currently, at least for the 70 .config/arch combinations that was built. Should any turn up, make sure they use the out-of-line versions, which do handle nbits==0 correctly. This is of course not the most efficient, but it's much less churn than teaching all the static inlines an "if (zero_const_nbits())", and since we don't have any current instances, this doesn't affect existing code at all. [1] lkml.kernel.org/r/20180815085539.27485-1-linux@rasmusvillemoes.dk Link: http://lkml.kernel.org/r/20180818131623.8755-3-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
由 Hangbin Liu 提交于
[ Upstream commit 966c37f2d77eb44d47af8e919267b1ba675b2eca ] Similiar with ipv6 mcast commit 89225d1c ("net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.") i) RFC3376 8.12. Older Version Querier Present Timeout says: The Older Version Querier Interval is the time-out for transitioning a host back to IGMPv3 mode once an older version query is heard. When an older version query is received, hosts set their Older Version Querier Present Timer to Older Version Querier Interval. This value MUST be ((the Robustness Variable) times (the Query Interval in the last Query received)) plus (one Query Response Interval). Currently we only use a hardcode value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT. Fix it by adding two new items mr_qi(Query Interval) and mr_qri(Query Response Interval) in struct in_device. Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri. We need update these values when receive IGMPv3 queries. Reported-by: NYing Xu <yinxu@redhat.com> Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-