- 11 3月, 2022 1 次提交
-
-
由 Dmitry Baryshkov 提交于
The function device_pm_check_callbacks() can be called under the spin lock (in the reported case it happens from genpd_add_device() -> dev_pm_domain_set(), when the genpd uses spinlocks rather than mutexes. However this function uncoditionally uses spin_lock_irq() / spin_unlock_irq(), thus not preserving the CPU flags. Use the irqsave/irqrestore instead. The backtrace for the reference: [ 2.752010] ------------[ cut here ]------------ [ 2.756769] raw_local_irq_restore() called with IRQs enabled [ 2.762596] WARNING: CPU: 4 PID: 1 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x34/0x50 [ 2.772338] Modules linked in: [ 2.775487] CPU: 4 PID: 1 Comm: swapper/0 Tainted: G S 5.17.0-rc6-00384-ge330d0d82eff-dirty #684 [ 2.781384] Freeing initrd memory: 46024K [ 2.785839] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2.785841] pc : warn_bogus_irq_restore+0x34/0x50 [ 2.785844] lr : warn_bogus_irq_restore+0x34/0x50 [ 2.785846] sp : ffff80000805b7d0 [ 2.785847] x29: ffff80000805b7d0 x28: 0000000000000000 x27: 0000000000000002 [ 2.785850] x26: ffffd40e80930b18 x25: ffff7ee2329192b8 x24: ffff7edfc9f60800 [ 2.785853] x23: ffffd40e80930b18 x22: ffffd40e80930d30 x21: ffff7edfc0dffa00 [ 2.785856] x20: ffff7edfc09e3768 x19: 0000000000000000 x18: ffffffffffffffff [ 2.845775] x17: 6572206f74206465 x16: 6c696166203a3030 x15: ffff80008805b4f7 [ 2.853108] x14: 0000000000000000 x13: ffffd40e809550b0 x12: 00000000000003d8 [ 2.860441] x11: 0000000000000148 x10: ffffd40e809550b0 x9 : ffffd40e809550b0 [ 2.867774] x8 : 00000000ffffefff x7 : ffffd40e809ad0b0 x6 : ffffd40e809ad0b0 [ 2.875107] x5 : 000000000000bff4 x4 : 0000000000000000 x3 : 0000000000000000 [ 2.882440] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff7edfc03a8000 [ 2.889774] Call trace: [ 2.892290] warn_bogus_irq_restore+0x34/0x50 [ 2.896770] _raw_spin_unlock_irqrestore+0x94/0xa0 [ 2.901690] genpd_unlock_spin+0x20/0x30 [ 2.905724] genpd_add_device+0x100/0x2d0 [ 2.909850] __genpd_dev_pm_attach+0xa8/0x23c [ 2.914329] genpd_dev_pm_attach_by_id+0xc4/0x190 [ 2.919167] genpd_dev_pm_attach_by_name+0x3c/0xd0 [ 2.924086] dev_pm_domain_attach_by_name+0x24/0x30 [ 2.929102] psci_dt_attach_cpu+0x24/0x90 [ 2.933230] psci_cpuidle_probe+0x2d4/0x46c [ 2.937534] platform_probe+0x68/0xe0 [ 2.941304] really_probe.part.0+0x9c/0x2fc [ 2.945605] __driver_probe_device+0x98/0x144 [ 2.950085] driver_probe_device+0x44/0x15c [ 2.954385] __device_attach_driver+0xb8/0x120 [ 2.958950] bus_for_each_drv+0x78/0xd0 [ 2.962896] __device_attach+0xd8/0x180 [ 2.966843] device_initial_probe+0x14/0x20 [ 2.971144] bus_probe_device+0x9c/0xa4 [ 2.975092] device_add+0x380/0x88c [ 2.978679] platform_device_add+0x114/0x234 [ 2.983067] platform_device_register_full+0x100/0x190 [ 2.988344] psci_idle_init+0x6c/0xb0 [ 2.992113] do_one_initcall+0x74/0x3a0 [ 2.996060] kernel_init_freeable+0x2fc/0x384 [ 3.000543] kernel_init+0x28/0x130 [ 3.004132] ret_from_fork+0x10/0x20 [ 3.007817] irq event stamp: 319826 [ 3.011404] hardirqs last enabled at (319825): [<ffffd40e7eda0268>] __up_console_sem+0x78/0x84 [ 3.020332] hardirqs last disabled at (319826): [<ffffd40e7fd6d9d8>] el1_dbg+0x24/0x8c [ 3.028458] softirqs last enabled at (318312): [<ffffd40e7ec90410>] _stext+0x410/0x588 [ 3.036678] softirqs last disabled at (318299): [<ffffd40e7ed1bf68>] __irq_exit_rcu+0x158/0x174 [ 3.045607] ---[ end trace 0000000000000000 ]--- Signed-off-by: NDmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 05 3月, 2022 1 次提交
-
-
由 Douglas Anderson 提交于
The PM Runtime docs say: Drivers in ->remove() callback should undo the runtime PM changes done in ->probe(). Usually this means calling pm_runtime_disable(), pm_runtime_dont_use_autosuspend() etc. From grepping code, it's clear that many people aren't aware of the need to call pm_runtime_dont_use_autosuspend(). When brainstorming solutions, one idea that came up was to leverage the new-ish devm_pm_runtime_enable() function. The idea here is that: * When the devm action is called we know that the driver is being removed. It's the perfect time to undo the use_autosuspend. * The code of pm_runtime_dont_use_autosuspend() already handles the case of being called when autosuspend wasn't enabled. Suggested-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: NDouglas Anderson <dianders@chromium.org> Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 26 2月, 2022 1 次提交
-
-
由 Jason Gunthorpe 提交于
If the state is not idle then resolve_prepare_src() should immediately fail and no change to global state should happen. However, it unconditionally overwrites the src_addr trying to build a temporary any address. For instance if the state is already RDMA_CM_LISTEN then this will corrupt the src_addr and would cause the test in cma_cancel_operation(): if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev) Which would manifest as this trace from syzkaller: BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26 Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204 CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x141/0x1d7 lib/dump_stack.c:120 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232 __kasan_report mm/kasan/report.c:399 [inline] kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416 __list_add_valid+0x93/0xa0 lib/list_debug.c:26 __list_add include/linux/list.h:67 [inline] list_add_tail include/linux/list.h:100 [inline] cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline] rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751 ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102 ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732 vfs_write+0x28e/0xa30 fs/read_write.c:603 ksys_write+0x1ee/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xae This is indicating that an rdma_id_private was destroyed without doing cma_cancel_listens(). Instead of trying to re-use the src_addr memory to indirectly create an any address derived from the dst build one explicitly on the stack and bind to that as any other normal flow would do. rdma_bind_addr() will copy it over the src_addr once it knows the state is valid. This is similar to commit bc0bdc5a ("RDMA/cma: Do not change route.addr.src_addr.ss_family") Link: https://lore.kernel.org/r/0-v2-e975c8fd9ef2+11e-syz_cma_srcaddr_jgg@nvidia.com Cc: stable@vger.kernel.org Fixes: 732d41c5 ("RDMA/cma: Make the locking for automatic state transition more clear") Reported-by: syzbot+c94a3675a626f6333d74@syzkaller.appspotmail.com Reviewed-by: NLeon Romanovsky <leonro@nvidia.com> Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>
-
- 25 2月, 2022 6 次提交
-
-
由 Chuansheng Liu 提交于
It is easy to hit the below memory leaks in my TigerLake platform: unreferenced object 0xffff927c8b91dbc0 (size 32): comm "kworker/0:2", pid 112, jiffies 4294893323 (age 83.604s) hex dump (first 32 bytes): 4e 41 4d 45 3d 49 4e 54 33 34 30 30 20 54 68 65 NAME=INT3400 The 72 6d 61 6c 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 rmal.kkkkkkkkkk. backtrace: [<ffffffff9c502c3e>] __kmalloc_track_caller+0x2fe/0x4a0 [<ffffffff9c7b7c15>] kvasprintf+0x65/0xd0 [<ffffffff9c7b7d6e>] kasprintf+0x4e/0x70 [<ffffffffc04cb662>] int3400_notify+0x82/0x120 [int3400_thermal] [<ffffffff9c8b7358>] acpi_ev_notify_dispatch+0x54/0x71 [<ffffffff9c88f1a7>] acpi_os_execute_deferred+0x17/0x30 [<ffffffff9c2c2c0a>] process_one_work+0x21a/0x3f0 [<ffffffff9c2c2e2a>] worker_thread+0x4a/0x3b0 [<ffffffff9c2cb4dd>] kthread+0xfd/0x130 [<ffffffff9c201c1f>] ret_from_fork+0x1f/0x30 Fix it by calling kfree() accordingly. Fixes: 38e44da5 ("thermal: int3400_thermal: process "thermal table changed" event") Signed-off-by: NChuansheng Liu <chuansheng.liu@intel.com> Cc: 4.14+ <stable@vger.kernel.org> # 4.14+ Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Mauri Sandberg 提交于
Obtaining a MAC address may be deferred in cases when the MAC is stored in an NVMEM block, for example, and it may not be ready upon the first retrieval attempt and return EPROBE_DEFER. It is also possible that a port that does not rely on NVMEM has been already created when getting the defer request. Thus, also the resources allocated previously must be freed when doing a roll-back. Fixes: 76723bca ("net: mv643xx_eth: add DT parsing support") Signed-off-by: NMauri Sandberg <maukka@ext.kapsi.fi> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220223142337.41757-1-maukka@ext.kapsi.fiSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Mateusz Palczewski 提交于
Revert of a patch that instead of fixing a AQ error when trying to reset BW limit introduced several regressions related to creation and managing TC. Currently there are errors when creating a TC on both PF and VF. Error log: [17428.783095] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14 [17428.783107] i40e 0000:3b:00.1: Failed configuring TC map 0 for VSI 391 [17428.783254] i40e 0000:3b:00.1: AQ command Config VSI BW allocation per TC failed = 14 [17428.783259] i40e 0000:3b:00.1: Unable to configure TC map 0 for VSI 391 This reverts commit 3d250466. Fixes: 3d250466 (i40e: Fix reset bw limit when DCB enabled with 1 TC) Signed-off-by: NMateusz Palczewski <mateusz.palczewski@intel.com> Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220223175347.1690692-1-anthony.l.nguyen@intel.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Manish Chopra 提交于
Commit b7a49f73 ("bnx2x: Utilize firmware 7.13.21.0") added new firmware support in the driver with maintaining older firmware compatibility. However, older firmware was not added in MODULE_FIRMWARE() which caused missing firmware files in initrd image leading to driver load failure from initrd. This patch adds MODULE_FIRMWARE() for older firmware version to have firmware files included in initrd. Fixes: b7a49f73 ("bnx2x: Utilize firmware 7.13.21.0") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215627Signed-off-by: NManish Chopra <manishc@marvell.com> Signed-off-by: NAlok Prasad <palok@marvell.com> Signed-off-by: NAriel Elior <aelior@marvell.com> Link: https://lore.kernel.org/r/20220223085720.12021-1-manishc@marvell.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
This reverts commit 2afeec08. The reasoning in the commit was wrong - the code expected to setup the watch even if 'hotplug-status' didn't exist. In fact, it relied on the watch being fired the first time - to check if maybe 'hotplug-status' is already set to 'connected'. Not registering a watch for non-existing path (which is the case if hotplug script hasn't been executed yet), made the backend not waiting for the hotplug script to execute. This in turns, made the netfront think the interface is fully operational, while in fact it was not (the vif interface on xen-netback side might not be configured yet). This was a workaround for 'hotplug-status' erroneously being removed. But since that is reverted now, the workaround is not necessary either. More discussion at https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#uSigned-off-by: NMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: NPaul Durrant <paul@xen.org> Reviewed-by: NMichael Brown <mbrown@fensystems.co.uk> Link: https://lore.kernel.org/r/20220222001817.2264967-2-marmarek@invisiblethingslab.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
This reverts commit 1f256578. The 'hotplug-status' node should not be removed as long as the vif device remains configured. Otherwise the xen-netback would wait for re-running the network script even if it was already called (in case of the frontent re-connecting). But also, it _should_ be removed when the vif device is destroyed (for example when unbinding the driver) - otherwise hotplug script would not configure the device whenever it re-appear. Moving removal of the 'hotplug-status' node was a workaround for nothing calling network script after xen-netback module is reloaded. But when vif interface is re-created (on xen-netback unbind/bind for example), the script should be called, regardless of who does that - currently this case is not handled by the toolstack, and requires manual script call. Keeping hotplug-status=connected to skip the call is wrong and leads to not configured interface. More discussion at https://lore.kernel.org/xen-devel/afedd7cb-a291-e773-8b0d-4db9b291fa98@ipxe.org/T/#uSigned-off-by: NMarek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: NPaul Durrant <paul@xen.org> Link: https://lore.kernel.org/r/20220222001817.2264967-1-marmarek@invisiblethingslab.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 24 2月, 2022 31 次提交
-
-
由 Hans de Goede 提交于
The battery on the 2nd hand Surface 3 which I recently bought appears to not have a serial number programmed in. This results in any I2C reads from the registers containing the serial number failing with an I2C NACK. This was causing mshw0011_bix() to fail causing the battery readings to not work at all. Ignore EREMOTEIO (I2C NACK) errors when retrieving the serial number and continue with an empty serial number to fix this. Fixes: b1f81b49 ("platform/x86: surface3_power: MSHW0011 rev-eng implementation") BugLink: https://github.com/linux-surface/linux-surface/issues/608Reviewed-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com> Reviewed-by: NMaximilian Luz <luzmaximilian@gmail.com> Signed-off-by: NHans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20220224101848.7219-1-hdegoede@redhat.com
-
由 Mario Limonciello 提交于
commit 59348401 ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup") adds support for using another platform timer in lieu of the RTC which doesn't work properly on some systems. This path was validated and worked well before submission. During the 5.16-rc1 merge window other patches were merged that caused this to stop working properly. When this feature was used with 5.16-rc1 or later some OEM laptops with the matching firmware requirements from that commit would shutdown instead of program a timer based wakeup. This was bisected to commit 8d89835b ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path"). This wasn't supposed to cause any negative impacts and also tested well on both Intel and ARM platforms. However this changed the semantics of when CPUs are allowed to be in the deepest state. For the AMD systems in question it appears this causes a firmware crash for timer based wakeup. It's hypothesized to be caused by the `amd-pmc` driver sending `OS_HINT` and all the CPUs going into a deep state while the timer is still being programmed. It's likely a firmware bug, but to avoid it don't allow setting CPUs into the deepest state while using CZN timer wakeup path. If later it's discovered that this also occurs from "regular" suspends without a timer as well or on other silicon, this may be later expanded to run in the suspend path for more scenarios. Cc: stable@vger.kernel.org # 5.16+ Suggested-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/linux-acpi/BL1PR12MB51570F5BD05980A0DCA1F3F4E23A9@BL1PR12MB5157.namprd12.prod.outlook.com/T/#mee35f39c41a04b624700ab2621c795367f19c90e Fixes: 8d89835b ("PM: suspend: Do not pause cpuidle in the suspend-to-idle path") Fixes: 23f62d7a ("PM: sleep: Pause cpuidle later and resume it earlier during system transitions") Fixes: 59348401 ("platform/x86: amd-pmc: Add special handling for timer based S0i3 wakeup" Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20220223175237.6209-1-mario.limonciello@amd.comReviewed-by: NHans de Goede <hdegoede@redhat.com> Signed-off-by: NHans de Goede <hdegoede@redhat.com>
-
由 Daehwan Jung 提交于
There's no lock for rndis response list. It could cause list corruption if there're two different list_add at the same time like below. It's better to add in rndis_add_response / rndis_free_response / rndis_get_next_response to prevent any race condition on response list. [ 361.894299] [1: irq/191-dwc3:16979] list_add corruption. next->prev should be prev (ffffff80651764d0), but was ffffff883dc36f80. (next=ffffff80651764d0). [ 361.904380] [1: irq/191-dwc3:16979] Call trace: [ 361.904391] [1: irq/191-dwc3:16979] __list_add_valid+0x74/0x90 [ 361.904401] [1: irq/191-dwc3:16979] rndis_msg_parser+0x168/0x8c0 [ 361.904409] [1: irq/191-dwc3:16979] rndis_command_complete+0x24/0x84 [ 361.904417] [1: irq/191-dwc3:16979] usb_gadget_giveback_request+0x20/0xe4 [ 361.904426] [1: irq/191-dwc3:16979] dwc3_gadget_giveback+0x44/0x60 [ 361.904434] [1: irq/191-dwc3:16979] dwc3_ep0_complete_data+0x1e8/0x3a0 [ 361.904442] [1: irq/191-dwc3:16979] dwc3_ep0_interrupt+0x29c/0x3dc [ 361.904450] [1: irq/191-dwc3:16979] dwc3_process_event_entry+0x78/0x6cc [ 361.904457] [1: irq/191-dwc3:16979] dwc3_process_event_buf+0xa0/0x1ec [ 361.904465] [1: irq/191-dwc3:16979] dwc3_thread_interrupt+0x34/0x5c Fixes: f6281af9 ("usb: gadget: rndis: use list_for_each_entry_safe") Cc: stable <stable@kernel.org> Signed-off-by: NDaehwan Jung <dh10.jung@samsung.com> Link: https://lore.kernel.org/r/1645507768-77687-1-git-send-email-dh10.jung@samsung.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
The interrupt service routine registered for the gadget is a primary handler which mask the interrupt source and a threaded handler which handles the source of the interrupt. Since the threaded handler is voluntary threaded, the IRQ-core does not disable bottom halves before invoke the handler like it does for the forced-threaded handler. Due to changes in networking it became visible that a network gadget's completions handler may schedule a softirq which remains unprocessed. The gadget's completion handler is usually invoked either in hard-IRQ or soft-IRQ context. In this context it is enough to just raise the softirq because the softirq itself will be handled once that context is left. In the case of the voluntary threaded handler, there is nothing that will process pending softirqs. Which means it remain queued until another random interrupt (on this CPU) fires and handles it on its exit path or another thread locks and unlocks a lock with the bh suffix. Worst case is that the CPU goes idle and the NOHZ complains about unhandled softirqs. Disable bottom halves before acquiring the lock (and disabling interrupts) and enable them after dropping the lock. This ensures that any pending softirqs will handled right away. Link: https://lkml.kernel.org/r/c2a64979-73d1-2c22-e048-c275c9f81558@samsung.com Fixes: e5f68b4a ("Revert "usb: dwc3: gadget: remove unnecessary _irqsave()"") Cc: stable <stable@kernel.org> Reported-by: NMarek Szyprowski <m.szyprowski@samsung.com> Tested-by: NMarek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/Yg/YPejVQH3KkRVd@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Szymon Heidrich 提交于
Assure that host may not manipulate the index to point past endpoint array. Signed-off-by: NSzymon Heidrich <szymon.heidrich@gmail.com> Cc: stable <stable@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Gal Pressman 提交于
The VF min and max rate were passed incorrectly and resulted in wrongly interchanging them. Fix the order of parameters in mlx5_esw_qos_set_vport_rate(). Fixes: d7df09f5 ("net/mlx5: E-switch, Enable vport QoS on demand") Signed-off-by: NGal Pressman <gal@nvidia.com> Reviewed-by: NAya Levin <ayal@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Lama Kayal 提交于
Add mistakenly missing increment of count variable when looping over output buffer in mlx5e_self_test(). This resolves the issue of garbage values output when querying with self test via ethtool. before: $ ethtool -t eth2 The test result is PASS The test extra info: Link Test 0 Speed Test 1768697188 Health Test 758528120 Loopback Test 3288687 after: $ ethtool -t eth2 The test result is PASS The test extra info: Link Test 0 Speed Test 0 Health Test 0 Loopback Test 0 Fixes: 7990b1b5 ("net/mlx5e: loopback test is not supported in switchdev mode") Signed-off-by: NLama Kayal <lkayal@nvidia.com> Reviewed-by: NGal Pressman <gal@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Maor Dickman 提交于
Currently offload of rule on bareudp device require tunnel key in order to match on mpls fields and without it the mpls fields are ignored, this is incorrect due to the fact udp tunnel doesn't have key to match on. Fix by returning error in case flow is matching on tunnel key. Fixes: 72046a91 ("net/mlx5e: Allow to match on mpls parameters") Signed-off-by: NMaor Dickman <maord@nvidia.com> Reviewed-by: NRoi Dayan <roid@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Maor Dickman 提交于
Currently the MPLSoUDP encap builds the MPLS header using encap action information (tunnel id, ttl and tos) instead of the MPLS action information (label, ttl, tc and bos) which is wrong. Fix by storing the MPLS action information during the flow action parse and later using it to create the encap MPLS header. Fixes: f828ca6a ("net/mlx5e: Add support for hw encapsulation of MPLS over UDP") Signed-off-by: NMaor Dickman <maord@nvidia.com> Reviewed-by: NRoi Dayan <roid@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Lama Kayal 提交于
Fec counters support is checked via the PCAM feature_cap_mask, bit 0: PPCNT_counter_group_Phy_statistical_counter_group. Add feature check to avoid faulty behavior. Fixes: 0a1498eb ("net/mlx5e: Expose FEC counters via ethtool") Signed-off-by: NLama Kayal <lkayal@nvidia.com> Reviewed-by: NGal Pressman <gal@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Roi Dayan 提交于
Offload of ct clear action is just resetting the reg_c register. It's done by allocating modify hdr resources which is limited. Doing it multiple times is redundant and wasting modify hdr resources and if resources depleted the driver will fail offloading the rule. Ignore redundant ct clear actions after the first one. Fixes: 806401c2 ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts") Signed-off-by: NRoi Dayan <roid@nvidia.com> Reviewed-by: NAriel Levkovich <lariel@nvidia.com> Reviewed-by: NMaor Dickman <maord@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Roi Dayan 提交于
Such rules are redundant but allowed and passed to the driver. The driver does not support offloading such rules so return an error. Fixes: 03a9d11e ("net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads") Signed-off-by: NRoi Dayan <roid@nvidia.com> Reviewed-by: NOz Shlomo <ozsh@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Roi Dayan 提交于
This kind of action is not supported by firmware and generates a syndrome. kernel: mlx5_core 0000:08:00.0: mlx5_cmd_check:777:(pid 102063): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x8708c3) Fixes: d7e75a32 ("net/mlx5e: Add offloading of E-Switch TC pedit (header re-write) actions") Signed-off-by: NRoi Dayan <roid@nvidia.com> Reviewed-by: NMaor Dickman <maord@nvidia.com> Reviewed-by: NOz Shlomo <ozsh@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Tariq Toukan 提交于
For RX TLS device-offloaded packets, the HW spec guarantees checksum validation for the offloaded packets, but does not define whether the CQE.checksum field matches the original packet (ciphertext) or the decrypted one (plaintext). This latitude allows architetctural improvements between generations of chips, resulting in different decisions regarding the value type of CQE.checksum. Hence, for these packets, the device driver should not make use of this CQE field. Here we block CHECKSUM_COMPLETE usage for RX TLS device-offloaded packets, and use CHECKSUM_UNNECESSARY instead. Value of the packet's tcp_hdr.csum is not modified by the HW, and it always matches the original ciphertext. Fixes: 1182f365 ("net/mlx5e: kTLS, Add kTLS RX HW offload support") Signed-off-by: NTariq Toukan <tariqt@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Gal Pressman 提交于
The ioctl EEPROM query wrongly returns success on read failures, fix that by returning the appropriate error code. Fixes: bb64143e ("net/mlx5e: Add ethtool support for dump module EEPROM") Signed-off-by: NGal Pressman <gal@nvidia.com> Reviewed-by: NTariq Toukan <tariqt@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Maor Gottlieb 提交于
Add missing call to up_write_ref_node() which releases the semaphore in case the FTE doesn't have destinations, such in drop rule case. Fixes: 465e7baa ("net/mlx5: Fix deletion of duplicate rules") Signed-off-by: NMaor Gottlieb <maorg@nvidia.com> Reviewed-by: NMark Bloch <mbloch@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Chris Mi 提交于
Only prio 1 is supported if firmware doesn't support ignore flow level for nic mode. The offending commit removed the check wrongly. Add it back. Fixes: 9a99c8f1 ("net/mlx5e: E-Switch, Offload all chain 0 priorities when modify header and forward action is not supported") Signed-off-by: NChris Mi <cmi@nvidia.com> Reviewed-by: NRoi Dayan <roid@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Ariel Levkovich 提交于
Match metadata support check returns false for ecpf device. However, this support does exist for ecpf and therefore this limitation should be removed to allow feature such as stacked devices and internal port offloaded to be supported. Fixes: 92ab1eb3 ("net/mlx5: E-Switch, Enable vport metadata matching if firmware supports it") Signed-off-by: NAriel Levkovich <lariel@nvidia.com> Reviewed-by: NMaor Dickman <maord@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Maher Sanalla 提交于
Currently, log_max_qp value is dependent on what FW reports as its max capability. In reality, due to a bug, some FWs report a value greater than 17, even though they don't support log_max_qp > 17. This FW issue led the driver to exhaust memory on startup. Thus, log_max_qp value is set to be no more than 17 regardless of what FW reports, as it was before the cited commit. Fixes: f79a609e ("net/mlx5: Update log_max_qp value to FW max capability") Signed-off-by: NMaher Sanalla <msanalla@nvidia.com> Reviewed-by: NAvihai Horon <avihaih@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Yevgeny Kliteynik 提交于
When deciding whether to start syncing and actually free all the "hot" ICM chunks, we need to consider the type of the ICM chunks that we're dealing with. For instance, the amount of available ICM for MODIFY_ACTION is significantly lower than the usual STE ICM, so the threshold should account for that - otherwise we can deplete MODIFY_ACTION memory just by creating and deleting the same modify header action in a continuous loop. This patch replaces the hard-coded threshold with a dynamic value. Fixes: 1c586514 ("net/mlx5: DR, ICM memory pools sync optimization") Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: NAlex Vesker <valex@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Yevgeny Kliteynik 提交于
Currently SMFS allows adding rule with matching on src/dst IP w/o matching on full ethertype or ip_version, which is not supported by HW. This patch fixes this issue and adds the check as it is done in DMFS. Fixes: 26d688e3 ("net/mlx5: DR, Add Steering entry (STE) utilities") Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: NAlex Vesker <valex@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Yevgeny Kliteynik 提交于
When adding a rule with 32 destinations, we hit the following out-of-band access issue: BUG: KASAN: slab-out-of-bounds in mlx5_cmd_dr_create_fte+0x18ee/0x1e70 This patch fixes the issue by both increasing the allocated buffers to accommodate for the needed actions and by checking the number of actions to prevent this issue when a rule with too many actions is provided. Fixes: 1ffd4989 ("net/mlx5: DR, Increase supported num of actions to 32") Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com> Reviewed-by: NAlex Vesker <valex@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Yevgeny Kliteynik 提交于
During rule insertion on each ICM memory chunk we also allocate shadow memory used for management. This includes the hw_ste, dr_ste and miss list per entry. Since the scale of these allocations is large we noticed a performance hiccup that happens once malloc and free are stressed. In extreme usecases when ~1M chunks are freed at once, it might take up to 40 seconds to complete this, up to the point the kernel sees this as self-detected stall on CPU: rcu: INFO: rcu_sched self-detected stall on CPU To resolve this we will increase the reuse of shadow memory. Doing this we see that a time in the aforementioned usecase dropped from ~40 seconds to ~8-10 seconds. Fixes: 29cf8feb ("net/mlx5: DR, ICM pool memory allocator") Signed-off-by: NAlex Vesker <valex@nvidia.com> Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Meir Lichtinger 提交于
Add the upcoming BlueField-4 and ConnectX-8 device IDs. Fixes: 2e9d3e83 ("net/mlx5: Update the list of the PCI supported devices") Signed-off-by: NMeir Lichtinger <meirl@nvidia.com> Reviewed-by: NGal Pressman <gal@nvidia.com> Reviewed-by: NTariq Toukan <tariqt@nvidia.com> Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
-
由 Qiang Yu 提交于
Workstation application ANSA/META v21.1.4 get this error dmesg when running CI test suite provided by ANSA/META: [drm:amdgpu_gem_va_ioctl [amdgpu]] *ERROR* Couldn't update BO_VA (-16) This is caused by: 1. create a 256MB buffer in invisible VRAM 2. CPU map the buffer and access it causes vm_fault and try to move it to visible VRAM 3. force visible VRAM space and traverse all VRAM bos to check if evicting this bo is valuable 4. when checking a VM bo (in invisible VRAM), amdgpu_vm_evictable() will set amdgpu_vm->evicting, but latter due to not in visible VRAM, won't really evict it so not add it to amdgpu_vm->evicted 5. before next CS to clear the amdgpu_vm->evicting, user VM ops ioctl will pass amdgpu_vm_ready() (check amdgpu_vm->evicted) but fail in amdgpu_vm_bo_update_mapping() (check amdgpu_vm->evicting) and get this error log This error won't affect functionality as next CS will finish the waiting VM ops. But we'd better clear the error log by checking the amdgpu_vm->evicting flag in amdgpu_vm_ready() to stop calling amdgpu_vm_bo_update_mapping() later. Another reason is amdgpu_vm->evicted list holds all BOs (both user buffer and page table), but only page table BOs' eviction prevent VM ops. amdgpu_vm->evicting flag is set only for page table BOs, so we should use evicting flag instead of evicted list in amdgpu_vm_ready(). The side effect of this change is: previously blocked VM op (user buffer in "evicted" list but no page table in it) gets done immediately. v2: update commit comments. Acked-by: NPaul Menzel <pmenzel@molgen.mpg.de> Reviewed-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NQiang Yu <qiang.yu@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 Guchun Chen 提交于
vkms leverages common amdgpu framebuffer creation, and also as it does not support FB modifier, there is no need to check tiling flags when initing framebuffer when virtual display is enabled. This can fix below calltrace: amdgpu 0000:00:08.0: GFX9+ requires FB check based on format modifier WARNING: CPU: 0 PID: 1023 at drivers/gpu/drm/amd/amdgpu/amdgpu_display.c:1150 amdgpu_display_framebuffer_init+0x8e7/0xb40 [amdgpu] v2: check adev->enable_virtual_display instead as vkms can be enabled in bare metal as well. Signed-off-by: NLeslie Shi <Yuliang.Shi@amd.com> Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Guchun Chen 提交于
This reverts commit 4046afce. No need to support modifier in virtual kms, otherwise, in SRIOV mode, when lanuching X server, set crtc will fail due to mismatch between primary plane modifier and framebuffer modifier. Signed-off-by: NGuchun Chen <guchun.chen@amd.com> Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
由 Chen Gong 提交于
The GPU reset function of raven2 is not maintained or tested, so it should be very unstable. Now the amdgpu_asic_reset function is added to amdgpu_pmops_suspend, which causes the S3 test of raven2 to fail, so the asic_reset of raven2 is ignored here. Fixes: daf8de08 ("drm/amdgpu: always reset the asic in suspend (v2)") Signed-off-by: NChen Gong <curry.gong@amd.com> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Reviewed-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 Nicholas Kazlauskas 提交于
[Why] Found when running igt@kms_atomic. Userspace attempts to do a TEST_COMMIT when 0 streams which calls dc_remove_stream_from_ctx. This in turn calls link_enc_unassign which ends up modifying stream->link = NULL directly, causing the global link_enc to be removed preventing further link activity and future link validation from passing. [How] We take care of link_enc unassignment at the start of link_enc_cfg_link_encs_assign so this call is no longer necessary. Fixes global state from being modified while unlocked. Reviewed-by: NJimmy Kizito <Jimmy.Kizito@amd.com> Acked-by: NJasdeep Dhillon <jdhillon@amd.com> Signed-off-by: NNicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: NDaniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 Mario Limonciello 提交于
commit 0064b0ce ("drm/amd/pm: enable ASPM by default") enabled ASPM by default but a variety of hardware configurations it turns out that this caused a regression. * PPC64LE hardware does not support ASPM at a hardware level. CONFIG_PCIEASPM is often disabled on these architectures. * Some dGPUs on ALD platforms don't work with ASPM enabled and PCIe subsystem disables it Check with the PCIe subsystem to see that ASPM has been enabled or not. Fixes: 0064b0ce ("drm/amd/pm: enable ASPM by default") Link: https://wiki.raptorcs.com/w/images/a/ad/P9_PHB_version1.0_27July2018_pub.pdf Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1723 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1739 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1885 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1907 Tested-by: koba.ko@canonical.com Reviewed-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NMario Limonciello <mario.limonciello@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
-
由 Shreeya Patel 提交于
We are racing the registering of .to_irq when probing the i2c driver. This results in random failure of touchscreen devices. Following explains the race condition better. [gpio driver] gpio driver registers gpio chip [gpio consumer] gpio is acquired [gpio consumer] gpiod_to_irq() fails with -ENXIO [gpio driver] gpio driver registers irqchip gpiod_to_irq works at this point, but -ENXIO is fatal We could see the following errors in dmesg logs when gc->to_irq is NULL [2.101857] i2c_hid i2c-FTS3528:00: HID over i2c has not been provided an Int IRQ [2.101953] i2c_hid: probe of i2c-FTS3528:00 failed with error -22 To avoid this situation, defer probing until to_irq is registered. Returning -EPROBE_DEFER would be the first step towards avoiding the failure of devices due to the race in registration of .to_irq. Final solution to this issue would be to avoid using gc irq members until they are fully initialized. This issue has been reported many times in past and people have been using workarounds like changing the pinctrl_amd to built-in instead of loading it as a module or by adding a softdep for pinctrl_amd into the config file. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209413Reviewed-by: NLinus Walleij <linus.walleij@linaro.org> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NShreeya Patel <shreeya.patel@collabora.com> Signed-off-by: NBartosz Golaszewski <brgl@bgdev.pl>
-