- 08 October 2019, 19 commits
-
Submitted by Stephen Boyd

[ Upstream commit af55dadfbce35b4f4c6247244ce3e44b2e242b84 ]

A future patch is going to change the semantics of clk_register() so that clk_hw::init is guaranteed to be NULL after a clk is registered. Avoid referencing this member here so that we don't run into NULL pointer exceptions.

Cc: Guo Zeng <Guo.Zeng@csr.com>
Cc: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190731193517.237136-6-sboyd@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Stephen Boyd

[ Upstream commit cf9ec1fc6d7cceb73e7f1efd079d2eae173fdf57 ]

A future patch is going to change the semantics of clk_register() so that clk_hw::init is guaranteed to be NULL after a clk is registered. Avoid referencing this member here so that we don't run into NULL pointer exceptions.

Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Link: https://lkml.kernel.org/r/20190731193517.237136-2-sboyd@kernel.org
[sboyd@kernel.org: Move name to after checking for error or NULL hw]
Acked-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Alexey Kardashevskiy
[ Upstream commit c37c792dec0929dbb6360a609fb00fa20bb16fc2 ] We allocate only the first level of multilevel TCE tables for KVM already (alloc_userspace_copy==true), and the rest is allocated on demand. This is not enabled though for bare metal. This removes the KVM limitation (implicit, via the alloc_userspace_copy parameter) and always allocates just the first level. The on-demand allocation of missing levels is already implemented. As from now on DMA map might happen with disabled interrupts, this allocates TCEs with GFP_ATOMIC; otherwise lockdep reports errors 1]. In practice just a single page is allocated there so chances for failure are quite low. To save time when creating a new clean table, this skips non-allocated indirect TCE entries in pnv_tce_free just like we already do in the VFIO IOMMU TCE driver. This changes the default level number from 1 to 2 to reduce the amount of memory required for the default 32bit DMA window at the boot time. The default window size is up to 2GB which requires 4MB of TCEs which is unlikely to be used entirely or at all as most devices these days are 64bit capable so by switching to 2 levels by default we save 4032KB of RAM per a device. While at this, add __GFP_NOWARN to alloc_pages_node() as the userspace can trigger this path via VFIO, see the failure and try creating a table again with different parameters which might succeed. [1]: === BUG: sleeping function called from invalid context at mm/page_alloc.c:4596 in_atomic(): 1, irqs_disabled(): 1, pid: 1038, name: scsi_eh_1 2 locks held by scsi_eh_1/1038: #0: 000000005efd659a (&host->eh_mutex){+.+.}, at: ata_eh_acquire+0x34/0x80 #1: 0000000006cf56a6 (&(&host->lock)->rlock){....}, at: ata_exec_internal_sg+0xb0/0x5c0 irq event stamp: 500 hardirqs last enabled at (499): [<c000000000cb8a74>] _raw_spin_unlock_irqrestore+0x94/0xd0 hardirqs last disabled at (500): [<c000000000cb85c4>] _raw_spin_lock_irqsave+0x44/0x120 softirqs last enabled at (0): [<c000000000101120>] copy_process.isra.4.part.5+0x640/0x1a80 softirqs last disabled at (0): [<0000000000000000>] 0x0 CPU: 73 PID: 1038 Comm: scsi_eh_1 Not tainted 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Call Trace: [c000003d064cef50] [c000000000c8e6c4] dump_stack+0xe8/0x164 (unreliable) [c000003d064cefa0] [c00000000014ed78] ___might_sleep+0x2f8/0x310 [c000003d064cf020] [c0000000003ca084] __alloc_pages_nodemask+0x2a4/0x1560 [c000003d064cf220] [c0000000000c2530] pnv_alloc_tce_level.isra.0+0x90/0x130 [c000003d064cf290] [c0000000000c2888] pnv_tce+0x128/0x3b0 [c000003d064cf360] [c0000000000c2c00] pnv_tce_build+0xb0/0xf0 [c000003d064cf3c0] [c0000000000bbd9c] pnv_ioda2_tce_build+0x3c/0xb0 [c000003d064cf400] [c00000000004cfe0] ppc_iommu_map_sg+0x210/0x550 [c000003d064cf510] [c00000000004b7a4] dma_iommu_map_sg+0x74/0xb0 [c000003d064cf530] [c000000000863944] ata_qc_issue+0x134/0x470 [c000003d064cf5b0] [c000000000863ec4] ata_exec_internal_sg+0x244/0x5c0 [c000003d064cf700] [c0000000008642d0] ata_exec_internal+0x90/0xe0 [c000003d064cf780] [c0000000008650ac] ata_dev_read_id+0x2ec/0x640 [c000003d064cf8d0] [c000000000878e28] ata_eh_recover+0x948/0x16d0 [c000003d064cfa10] [c00000000087d760] sata_pmp_error_handler+0x480/0xbf0 [c000003d064cfbc0] [c000000000884624] ahci_error_handler+0x74/0xe0 [c000003d064cfbf0] [c000000000879fa8] ata_scsi_port_error_handler+0x2d8/0x7c0 [c000003d064cfca0] [c00000000087a544] ata_scsi_error+0xb4/0x100 [c000003d064cfd00] [c000000000802450] scsi_error_handler+0x120/0x510 [c000003d064cfdb0] [c000000000140c48] kthread+0x1b8/0x1c0 [c000003d064cfe20] 
[c00000000000bd8c] ret_from_kernel_thread+0x5c/0x70 ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300) irq event stamp: 2305 ======================================================== hardirqs last enabled at (2305): [<c00000000000e4c8>] fast_exc_return_irq+0x28/0x34 hardirqs last disabled at (2303): [<c000000000cb9fd0>] __do_softirq+0x4a0/0x654 WARNING: possible irq lock inversion dependency detected 5.2.0-rc6-le_nv2_aikATfstn1-p1 #634 Tainted: G W softirqs last enabled at (2304): [<c000000000cba054>] __do_softirq+0x524/0x654 softirqs last disabled at (2297): [<c00000000010f278>] irq_exit+0x128/0x180 -------------------------------------------------------- swapper/0/0 just changed the state of lock: 0000000006cf56a6 (&(&host->lock)->rlock){-...}, at: ahci_single_level_irq_intr+0xac/0x120 but this lock took another, HARDIRQ-unsafe lock in the past: (fs_reclaim){+.+.} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); local_irq_disable(); lock(&(&host->lock)->rlock); lock(fs_reclaim); <Interrupt> lock(&(&host->lock)->rlock); *** DEADLOCK *** no locks held by swapper/0/0. the shortest dependencies between 2nd lock and 1st lock: -> (fs_reclaim){+.+.} ops: 167579 { HARDIRQ-ON-W at: lock_acquire+0xf8/0x2a0 fs_reclaim_acquire.part.23+0x44/0x60 kmem_cache_alloc_node_trace+0x80/0x590 alloc_desc+0x64/0x270 __irq_alloc_descs+0x2e4/0x3a0 irq_domain_alloc_descs+0xb0/0x150 irq_create_mapping+0x168/0x2c0 xics_smp_probe+0x2c/0x98 pnv_smp_probe+0x40/0x9c smp_prepare_cpus+0x524/0x6c4 kernel_init_freeable+0x1b4/0x650 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x70 SOFTIRQ-ON-W at: lock_acquire+0xf8/0x2a0 fs_reclaim_acquire.part.23+0x44/0x60 kmem_cache_alloc_node_trace+0x80/0x590 alloc_desc+0x64/0x270 __irq_alloc_descs+0x2e4/0x3a0 irq_domain_alloc_descs+0xb0/0x150 irq_create_mapping+0x168/0x2c0 xics_smp_probe+0x2c/0x98 pnv_smp_probe+0x40/0x9c smp_prepare_cpus+0x524/0x6c4 kernel_init_freeable+0x1b4/0x650 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x70 INITIAL USE at: lock_acquire+0xf8/0x2a0 fs_reclaim_acquire.part.23+0x44/0x60 kmem_cache_alloc_node_trace+0x80/0x590 alloc_desc+0x64/0x270 __irq_alloc_descs+0x2e4/0x3a0 irq_domain_alloc_descs+0xb0/0x150 irq_create_mapping+0x168/0x2c0 xics_smp_probe+0x2c/0x98 pnv_smp_probe+0x40/0x9c smp_prepare_cpus+0x524/0x6c4 kernel_init_freeable+0x1b4/0x650 kernel_init+0x2c/0x148 ret_from_kernel_thread+0x5c/0x70 } === Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NAlistair Popple <alistair@popple.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20190718051139.74787-4-aik@ozlabs.ruSigned-off-by: NSasha Levin <sashal@kernel.org>
-
Submitted by Lewis Huang

[ Upstream commit e5382701c3520b3ed66169a6e4aa6ce5df8c56e0 ]

[Why]
The VM config is cleared to 0 when the system enters S4, so HUBBUB does not know how to fetch data when the system resumes. The flip is always pending because earliest_inuse_address and request_address are different.

[How]
Reprogram the VM config when the system resumes.

Signed-off-by: Lewis Huang <Lewis.Huang@amd.com>
Reviewed-by: Jun Lei <Jun.Lei@amd.com>
Acked-by: Eric Yang <eric.yang2@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Anthony Koo

[ Upstream commit 1cbcfc975164f397b449efb17f59d81a703090db ]

[Why]
When the endpoint is at the boundary of a region, such as at 2^0 = 1, we find that the last segment has a sharp slope and some points are clipped at the top.

[How]
If the end point is 1, which is exactly at the 2^0 region boundary, we need to program an additional region beyond this point.

Signed-off-by: Anthony Koo <Anthony.Koo@amd.com>
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com>
Acked-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Icenowy Zheng

[ Upstream commit 720099603d1f62e37b789366d7e89824b009ca28 ]

The MMC2 clock slices are currently not defined in the V3s CCU driver, which makes MMC2 not work. Fix this issue.

Fixes: d0f11d14 ("clk: sunxi-ng: add support for V3s CCU")
Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Nathan Huckleberry
[ Upstream commit a95fb581b144b5e73da382eaedb2e32027610597 ]

  drivers/clk/clk-qoriq.c:138:38: warning: unused variable 'p5020_cmux_grp1' [-Wunused-const-variable]
  static const struct clockgen_muxinfo p5020_cmux_grp1
  drivers/clk/clk-qoriq.c:146:38: warning: unused variable 'p5020_cmux_grp2' [-Wunused-const-variable]
  static const struct clockgen_muxinfo p5020_cmux_grp2

In the definition of the p5020 chip, the p2041 chip's info was used instead. The p5020 and p2041 chips have different info. This is most likely a typo.

Link: https://github.com/ClangBuiltLinux/linux/issues/525
Cc: clang-built-linux@googlegroups.com
Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Link: https://lkml.kernel.org/r/20190627220642.78575-1-nhuck@google.com
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Corey Minyard

[ Upstream commit 340ff31ab00bca5c15915e70ad9ada3030c98cf8 ]

ipmi_thread() uses back-to-back schedule() to poll for command completion which, on some machines, can push up CPU consumption and heavily tax the scheduler locks, leading to noticeable overall performance degradation. This was originally added so firmware updates through IPMI would complete in a timely manner. But we can't kill the scheduler locks for that one use case. Instead, only run schedule() continuously in maintenance mode, where firmware updates should run.

Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
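As a rough illustration of the polling strategy described above — busy-poll with schedule() only while a firmware update (maintenance mode) is in flight, otherwise sleep between polls — here is a sketch. The helper names and the context structure are invented for illustration; this is not the ipmi_si code.

	#include <linux/kthread.h>
	#include <linux/jiffies.h>
	#include <linux/sched.h>

	struct poll_ctx {
		bool in_maintenance_mode;	/* invented stand-in for the IPMI maintenance-mode state */
	};

	/* invented: check the hardware for command completion */
	static void poll_hw(struct poll_ctx *ctx) { }

	static int poll_thread(void *data)
	{
		struct poll_ctx *ctx = data;

		while (!kthread_should_stop()) {
			poll_hw(ctx);

			if (ctx->in_maintenance_mode) {
				/* Firmware update in flight: poll as fast as possible. */
				schedule();
			} else {
				/* Normal operation: sleep briefly instead of calling
				 * schedule() back-to-back and hammering the scheduler locks. */
				schedule_timeout_interruptible(usecs_to_jiffies(100));
			}
		}
		return 0;
	}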
-
Submitted by Nathan Chancellor
[ Upstream commit 0df3e42167caaf9f8c7b64de3da40a459979afe8 ]

When building with -Wsometimes-uninitialized, clang warns:

  drivers/pci/hotplug/rpaphp_core.c:243:14: warning: variable 'fndit' is used uninitialized whenever 'for' loop exits because its condition is false [-Wsometimes-uninitialized]
    for (j = 0; j < entries; j++) {
                ^~~~~~~~~~~
  drivers/pci/hotplug/rpaphp_core.c:256:6: note: uninitialized use occurs here
    if (fndit)
        ^~~~~
  drivers/pci/hotplug/rpaphp_core.c:243:14: note: remove the condition if it is always true
    for (j = 0; j < entries; j++) {
                ^~~~~~~~~~~
  drivers/pci/hotplug/rpaphp_core.c:233:14: note: initialize the variable 'fndit' to silence this warning
    int j, fndit;
                ^
                 = 0

fndit is only used to gate a sprintf call, which can be moved into the loop to simplify the code and eliminate the local variable, which will fix this warning.

Fixes: 2fcf3ae5 ("hotplug/drc-info: Add code to search ibm,drc-info property")
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Acked-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://github.com/ClangBuiltLinux/linux/issues/504
Link: https://lore.kernel.org/r/20190603221157.58502-1-natechancellor@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
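The shape of the fix — dropping the found-flag and doing the work inside the loop so nothing uninitialized is read after it — can be shown with a small stand-alone example. This is generic C with made-up names, not the rpaphp_core.c code:

	#include <stdio.h>
	#include <string.h>

	/* Instead of setting an (otherwise uninitialized) 'found' flag inside the
	 * loop and testing it afterwards, act on the match directly in the loop. */
	static int print_match(const char **names, int entries, const char *match)
	{
		int j;

		for (j = 0; j < entries; j++) {
			if (strcmp(names[j], match) == 0) {
				printf("found %s at index %d\n", match, j);
				return 0;
			}
		}
		return -1;	/* no match: nothing printed, no flag needed */
	}

	int main(void)
	{
		const char *names[] = { "slot-a", "slot-b" };

		print_match(names, 2, "slot-b");
		print_match(names, 2, "slot-c");
		return 0;
	}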
-
Submitted by Jia-Ju Bai

[ Upstream commit f3eb9b8f67bc28783eddc142ad805ebdc53d6339 ]

In radeon_connector_set_property(), there is an if statement on line 743 to check whether connector->encoder is NULL:

  if (connector->encoder)

When connector->encoder is NULL, it is used on line 755:

  if (connector->encoder->crtc)

Thus, a possible null-pointer dereference may occur. To fix this bug, connector->encoder is checked before being used. This bug is found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
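The defensive pattern being added — checking each link in a pointer chain before dereferencing the next — looks like this in isolation. This is a generic sketch with made-up types, not the radeon code:

	#include <stddef.h>
	#include <stdio.h>

	struct crtc { int id; };
	struct encoder { struct crtc *crtc; };
	struct connector { struct encoder *encoder; };

	static int connector_crtc_id(const struct connector *connector)
	{
		/* Check 'encoder' before following it to 'crtc'; either may be NULL. */
		if (connector->encoder && connector->encoder->crtc)
			return connector->encoder->crtc->id;
		return -1;
	}

	int main(void)
	{
		struct connector unbound = { .encoder = NULL };

		printf("%d\n", connector_crtc_id(&unbound));	/* -1, no NULL dereference */
		return 0;
	}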
-
Submitted by KyleMahlkuch

[ Upstream commit 6f7fe9a93e6c09bf988c5059403f5f88e17e21e6 ]

During kexec some adapters hit an EEH since they are not properly shut down in the radeon_pci_shutdown() function. Adding radeon_suspend_kms() fixes this issue.

Signed-off-by: KyleMahlkuch <kmahlkuc@linux.vnet.ibm.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Sean Paul
[ Upstream commit ad309284a52be47c8b3126c9376358bf381861bc ]

Once we start shutting off the link during PSR, we're going to want fast training to work. If the display doesn't support fast training, don't enable psr.

Changes in v2:
- None
Changes in v3:
- None
Changes in v4:
- None
Changes in v5:
- None

Link to v1: https://patchwork.freedesktop.org/patch/msgid/20190228210939.83386-3-sean@poorly.run
Link to v2: https://patchwork.freedesktop.org/patch/msgid/20190326204509.96515-2-sean@poorly.run
Link to v3: https://patchwork.freedesktop.org/patch/msgid/20190502194956.218441-9-sean@poorly.run
Link to v4: https://patchwork.freedesktop.org/patch/msgid/20190508160920.144739-8-sean@poorly.run

Cc: Zain Wang <wzz@rock-chips.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190611160844.257498-8-sean@poorly.run
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Navid Emamdoost

[ Upstream commit afd6d4f5a52c16e1483328ac074abb1cde92c29f ]

The following function calls may fail and return NULL, so the null check is added:

  of_graph_get_next_endpoint
  of_graph_get_remote_port_parent
  of_graph_get_remote_port

Update: thanks to Sam Ravnborg for the suggestion to use goto to avoid leaking the endpoint.

Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190724195534.9303-1-navid.emamdoost@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Ahmad Fatoum

[ Upstream commit 8fabc9c3109a71b3577959a05408153ae69ccd8d ]

To properly synchronize with other devices, the fence from the GEM object backing the framebuffer needs to be attached to the atomic state, so the commit work can wait on fence signaling.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Acked-by: Philippe Cornu <philippe.cornu@st.com>
Tested-by: Philippe Cornu <philippe.cornu@st.com>
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190712084228.8338-1-l.stach@pengutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Marko Kohtala

[ Upstream commit dd9782834dd9dde3624ff1acea8859f3d3e792d4 ]

The page_offset was only applied to the end of the page range. This caused display updates to have a scrolling effect, because the amount of data written to the display did not match the range the display expected.

Fixes: 301bc067 ("video: ssd1307fb: Make use of horizontal addressing mode")
Signed-off-by: Marko Kohtala <marko.kohtala@okoko.fi>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Michal Vokáč <michal.vokac@ysoft.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190618074111.9309-4-marko.kohtala@okoko.fi
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Lucas Stach

[ Upstream commit f8c6bfc612b56f02e1b8fae699dff12738aaf889 ]

The horizontal blanking periods are too short, as the values are specified for a single LVDS channel. Since this panel is dual LVDS, they need to be doubled. With this change the panel reaches its nominal vrefresh rate of 60 fps, instead of the 64 fps seen with the current, wrong blanking.

Philipp Zabel added: the datasheet specifies 960 active clocks + 40/128/160 clocks blanking on each of the two LVDS channels (min/typical/max), so doubled this is now correct.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Link: https://patchwork.freedesktop.org/patch/msgid/1562764060.23869.12.camel@pengutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Andrey Smirnov
[ Upstream commit e0655feaec62d5139b6b13a7b1bbb1ab8f1c2d83 ]

According to the datasheet, tc358767 can transfer up to 16 bytes via its AUX channel, so the artificial limit of 8 appears to be too low. However only up to 15 bytes seem to be actually supported and trying to use 16-byte transfers results in transfers failing sporadically (with bogus status in case of I2C transfers), so limit it to 15.

Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Reviewed-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Laurent Pinchart <Laurent.pinchart@ideasonboard.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Andrey Gusakov <andrey.gusakov@cogentembedded.com>
Cc: Philipp Zabel <p.zabel@pengutronix.de>
Cc: Cory Tusar <cory.tusar@zii.aero>
Cc: Chris Healy <cphealy@gmail.com>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190619052716.16831-9-andrew.smirnov@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Vadim Sukhomlinov

commit db4d8cb9c9f2af71c4d087817160d866ed572cc9 upstream

TPM 2.0 shutdown involves sending TPM2_Shutdown to the TPM chip and disabling future TPM operations. TPM 1.2 behavior was different: future TPM operations weren't disabled, causing rare issues. This patch ensures that future TPM operations are disabled.

Fixes: d1bd4a79 ("tpm: Issue a TPM2_Shutdown for TPM2 devices.")
Cc: stable@vger.kernel.org
Signed-off-by: Vadim Sukhomlinov <sukhomlinov@google.com>
[dianders: resolved merge conflicts with mainline]
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Jarkko Sakkinen

commit 2677ca98ae377517930c183248221f69f771c921 upstream

Use tpm_try_get_ops() in tpm-sysfs.c so that we can consider moving other decorations (locking, localities, power management, for example) inside it. This direction can of course be taken only after the other call sites for tpm_transmit() have been treated in the same way.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Stefan Berger <stefanb@linux.ibm.com>
Tested-by: Stefan Berger <stefanb@linux.ibm.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Alexander Steffen <Alexander.Steffen@infineon.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
- 05 October 2019, 21 commits
-
Submitted by Greg Kroah-Hartman
-
Submitted by Kai-Heng Feng

commit bb264220d9316f6bd7c1fd84b8da398c93912931 upstream.

Laptops with an AMD APU don't restore the display backlight brightness after system resume. This issue started when DC was introduced. Let's use BL_CORE_SUSPENDRESUME so the backlight core calls the update_status callback after system resume to restore the backlight level. Tested on Dell Inspiron 3180 (Stoney Ridge) and Dell Latitude 5495 (Raven Ridge).

Cc: <stable@vger.kernel.org>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
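For context, BL_CORE_SUSPENDRESUME is an option flag in struct backlight_ops that tells the backlight core to call update_status around suspend and resume. A minimal sketch of how a driver typically sets it follows; the callback bodies and names are placeholders, not the amdgpu_dm code.

	#include <linux/backlight.h>

	static int my_backlight_update_status(struct backlight_device *bd)
	{
		/* hypothetical: push bd->props.brightness to the hardware */
		return 0;
	}

	static int my_backlight_get_brightness(struct backlight_device *bd)
	{
		/* hypothetical: read the current level back from the hardware */
		return bd->props.brightness;
	}

	static const struct backlight_ops my_backlight_ops = {
		.options	= BL_CORE_SUSPENDRESUME,	/* re-run update_status on resume */
		.get_brightness	= my_backlight_get_brightness,
		.update_status	= my_backlight_update_status,
	};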
-
Submitted by Yafang Shao

[ Upstream commit a94b525241c0fff3598809131d7cfcfe1d572d8c ]

total_{migrate,free}_scanned will be added to COMPACTMIGRATE_SCANNED and COMPACTFREE_SCANNED in compact_zone(). We should clear them before scanning a new zone. In the proc-triggered compaction, we forgot to clear them.

[laoar.shao@gmail.com: introduce a helper compact_zone_counters_init()]
Link: http://lkml.kernel.org/r/1563869295-25748-1-git-send-email-laoar.shao@gmail.com
[akpm@linux-foundation.org: expand compact_zone_counters_init() into its single callsite, per mhocko]
[vbabka@suse.cz: squash compact_zone() list_head init as well]
Link: http://lkml.kernel.org/r/1fb6f7da-f776-9e42-22f8-bbb79b030b98@suse.cz
[akpm@linux-foundation.org: kcompactd_do_work(): avoid unnecessary initialization of cc.zone]
Link: http://lkml.kernel.org/r/1563789275-9639-1-git-send-email-laoar.shao@gmail.com
Fixes: 7f354a54 ("mm, compaction: add vmstats for kcompactd work")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Yafang Shao <shaoyafang@didiglobal.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
-
Submitted by Eric Biggers
[ Upstream commit 76e43c8ccaa35c30d5df853013561145a0f750a5 ]

When IOCB_CMD_POLL is used on the FUSE device, aio_poll() disables IRQs and takes kioctx::ctx_lock, then fuse_iqueue::waitq.lock. This may have to wait for fuse_iqueue::waitq.lock to be released by one of many places that take it with IRQs enabled. Since the IRQ handler may take kioctx::ctx_lock, lockdep reports that a deadlock is possible.

Fix it by protecting the state of struct fuse_iqueue with a separate spinlock, and only accessing fuse_iqueue::waitq using the versions of the waitqueue functions which do IRQ-safe locking internally.

Reproducer:

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/mount.h>
  #include <sys/stat.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <linux/aio_abi.h>

  int main()
  {
          char opts[128];
          int fd = open("/dev/fuse", O_RDWR);
          aio_context_t ctx = 0;
          struct iocb cb = { .aio_lio_opcode = IOCB_CMD_POLL, .aio_fildes = fd };
          struct iocb *cbp = &cb;
          sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd);
          mkdir("mnt", 0700);
          mount("foo", "mnt", "fuse", 0, opts);
          syscall(__NR_io_setup, 1, &ctx);
          syscall(__NR_io_submit, ctx, 1, &cbp);
  }

Beginning of lockdep output:

  =====================================================
  WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
  5.3.0-rc5 #9 Not tainted
  -----------------------------------------------------
  syz_fuse/135 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
  000000003590ceda (&fiq->waitq){+.+.}, at: spin_lock include/linux/spinlock.h:338 [inline]
  000000003590ceda (&fiq->waitq){+.+.}, at: aio_poll fs/aio.c:1751 [inline]
  000000003590ceda (&fiq->waitq){+.+.}, at: __io_submit_one.constprop.0+0x203/0x5b0 fs/aio.c:1825
  and this task is already holding:
  0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: spin_lock_irq include/linux/spinlock.h:363 [inline]
  0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: aio_poll fs/aio.c:1749 [inline]
  0000000075037284 (&(&ctx->ctx_lock)->rlock){..-.}, at: __io_submit_one.constprop.0+0x1f4/0x5b0 fs/aio.c:1825
  which would create a new lock dependency:
   (&(&ctx->ctx_lock)->rlock){..-.} -> (&fiq->waitq){+.+.}
  but this new dependency connects a SOFTIRQ-irq-safe lock:
   (&(&ctx->ctx_lock)->rlock){..-.}
  [...]

Reported-by: syzbot+af05535bb79520f95431@syzkaller.appspotmail.com
Reported-by: syzbot+d86c4426a01f60feddc7@syzkaller.appspotmail.com
Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
Cc: <stable@vger.kernel.org> # v4.19+
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
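The general locking pattern described in the fix — keep the queue state under its own spinlock taken with spin_lock_irqsave(), and touch the waitqueue only through helpers that do IRQ-safe locking internally — can be sketched as follows. This is an illustrative pattern with invented names, not the fuse code.

	#include <linux/list.h>
	#include <linux/spinlock.h>
	#include <linux/wait.h>

	struct req_queue {
		spinlock_t		lock;		/* protects 'pending' */
		struct list_head	pending;
		wait_queue_head_t	waitq;
	};

	static void queue_request(struct req_queue *q, struct list_head *req)
	{
		unsigned long flags;

		/* Safe regardless of whether the caller has IRQs disabled. */
		spin_lock_irqsave(&q->lock, flags);
		list_add_tail(req, &q->pending);
		spin_unlock_irqrestore(&q->lock, flags);

		/* wake_up() takes waitq.lock with IRQ-safe locking internally. */
		wake_up(&q->waitq);
	}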
-
Submitted by NeilBrown

[ Upstream commit c84a1372df929033cb1a0441fb57bd3932f39ac9 ]

If the drives in a RAID0 are not all the same size, the array is divided into zones. The first zone covers all drives, to the size of the smallest. The second zone covers all drives larger than the smallest, up to the size of the second smallest, etc.

A change in Linux 3.14 unintentionally changed the layout for the second and subsequent zones. All the correct data is still stored, but each chunk may be assigned to a different device than in pre-3.14 kernels. This can lead to data corruption. It is not possible to determine what layout to use - it depends which kernel the data was written by. So we add a module parameter to allow the old (0) or new (1) layout to be specified, and refuse to assemble an affected array if that parameter is not set.

Fixes: 20d0189b ("block: Introduce new bio_split()")
cc: stable@vger.kernel.org (3.14+)
Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
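The mechanism described — a module parameter that must be set before an affected multi-zone array is assembled — can be sketched like this. The parameter and function names are placeholders for illustration, not the actual md/raid0 code.

	#include <linux/errno.h>
	#include <linux/module.h>

	/* -1 = unset, 0 = pre-3.14 ("original") layout, 1 = post-3.14 ("alternate") layout */
	static int default_layout = -1;
	module_param(default_layout, int, 0644);
	MODULE_PARM_DESC(default_layout,
			 "zone layout for multi-zone RAID0 arrays (0=original, 1=alternate)");

	/* hypothetical assembly-time check: refuse multi-zone arrays until told which layout to use */
	static int check_layout(bool multi_zone)
	{
		if (multi_zone && default_layout < 0) {
			pr_err("cannot assemble multi-zone RAID0: default_layout not set\n");
			return -ENOTSUPP;
		}
		return 0;
	}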
-
Submitted by Pavel Shilovsky

commit a016e2794fc3a245a91946038dd8f34d65e53cc3 upstream.

There may be situations when a server negotiates SMB 2.1 protocol version or higher but responds to a CREATE request with an oplock rather than a lease. Currently the client doesn't handle such a case correctly: when another CREATE comes in, the server sends an oplock break to the initial CREATE and the client doesn't send an ack back due to a wrong caching level being set (READ instead of RWH). Missing an oplock break ack makes the server wait until the break times out, which dramatically increases the latency of the second CREATE. Fix this by properly detecting oplocks when using SMB 2.1 protocol version and higher.

Cc: <stable@vger.kernel.org>
Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Murphy Zhou

commit 63d37fb4ce5ae7bf1e58f906d1bf25f036fe79b2 upstream.

It should not be larger than the slab max buf size. If the user specifies a larger size, it passes this check and goes straight to SMB2_set_info_init, performing an insecure memcpy.

Signed-off-by: Murphy Zhou <jencce.kernel@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
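The general shape of the problem — copying a caller-supplied length into a fixed-size buffer without validating it first — and of the fix looks like this. This is a generic userspace sketch, not the CIFS code; MAX_BUF simply stands in for the slab-backed buffer size.

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>

	#define MAX_BUF 4096			/* stands in for the slab-backed buffer size */

	static char buf[MAX_BUF];

	/* Reject caller-supplied lengths before copying, instead of trusting them. */
	static int set_info(const void *data, size_t len)
	{
		if (len > sizeof(buf))		/* the missing bounds check */
			return -EINVAL;

		memcpy(buf, data, len);
		return 0;
	}

	int main(void)
	{
		char small[16] = "hello";

		printf("ok: %d\n", set_info(small, sizeof(small)));
		printf("rejected: %d\n", set_info(small, (size_t)1 << 20));
		return 0;
	}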
-
Submitted by Chris Brandt

commit a71e2ac1f32097fbb2beab098687a7a95c84543e upstream.

The NACKF flag should be cleared in INTRIICNAKI interrupt processing as described in the hardware manual. This issue shows up quickly when PREEMPT_RT is applied and a device is probed that is not plugged in (like a touchscreen controller). The result is endless interrupts that halt system boot.

Fixes: 310c18a4 ("i2c: riic: add driver")
Cc: stable@vger.kernel.org
Reported-by: Chien Nguyen <chien.nguyen.eb@rvc.renesas.com>
Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Laurent Vivier

commit 78887832e76541f77169a24ac238fccb51059b63 upstream.

add_early_randomness() is called by hwrng_register() when the hardware is added. If this hardware and its module are present at boot, and if there is no data available, the boot hangs until data are available and can't be interrupted. For instance, in the case of virtio-rng, in some cases the host is not able to provide enough entropy for all the guests.

There are two easy ways to reproduce the problem, but they rely on misconfiguration of the hypervisor or the egd daemon:
- the virtio-rng device is configured to connect to the egd daemon of the host, but when the virtio-rng driver asks for data the daemon is not connected,
- the virtio-rng device is configured to connect to the egd daemon of the host, but the egd daemon doesn't provide data.

The guest kernel will hang at boot until the virtio-rng driver provides enough data. To avoid that, call rng_get_data() in non-blocking mode (wait=0) from add_early_randomness().

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Fixes: d9e79726 ("hwrng: add randomness to system from rng...")
Cc: <stable@vger.kernel.org>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
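The shape of the fix — doing the early read without blocking and simply skipping the entropy contribution if nothing is available yet — is roughly the following. This is a sketch only: rng_get_data() is an internal helper of the hw_random core and its exact signature is assumed here, not quoted from the driver.

	#include <linux/hw_random.h>
	#include <linux/random.h>

	/* Sketch, not the actual drivers/char/hw_random/core.c code. */
	static void early_randomness_sketch(struct hwrng *rng, u8 *buf, size_t size)
	{
		int bytes_read;

		/* wait == 0: return immediately with however many bytes (possibly 0)
		 * the device can supply right now, instead of blocking the boot. */
		bytes_read = rng_get_data(rng, buf, size, 0);
		if (bytes_read > 0)
			add_device_randomness(buf, bytes_read);
	}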
-
Submitted by Chao Yu

commit 6565c182094f69e4ffdece337d395eb7ec760efc upstream.

Quoted from commit 3da40c7b ("ext4: only call ext4_truncate when size <= isize"):

  "At LSF we decided that if we truncate up from isize we shouldn't trim fallocated blocks that were fallocated with KEEP_SIZE and are past the new i_size. This patch fixes ext4 to do this."

And generic/092 of fstests has covered this case for a long time; however is_quota_modification() didn't adjust based on that rule, so that in the below condition we will lose the quota block change:
- fallocate blocks beyond EOF
- remount
- truncate(file_path, file_size)

Fix it.

Link: https://lore.kernel.org/r/20190911093650.35329-1-yuchao0@huawei.com
Fixes: 3da40c7b ("ext4: only call ext4_truncate when size <= isize")
CC: stable@vger.kernel.org
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Theodore Ts'o

commit c1e8220bd316d8ae8e524df39534b8a412a45d5e upstream.

If a program attempts to punch a hole on an inline data file, we need to convert it to a normal file first. This was detected using ext4/032 with the adv configuration. Simple reproducer:

  mke2fs -Fq -t ext4 -O inline_data /dev/vdc
  mount /vdc
  echo "" > /vdc/testfile
  xfs_io -c 'truncate 33554432' /vdc/testfile
  xfs_io -c 'fpunch 0 1048576' /vdc/testfile
  umount /vdc
  e2fsck -fy /dev/vdc

Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Rakesh Pandit

commit e3d550c2c4f2f3dba469bc3c4b83d9332b4e99e1 upstream.

Really enable the warning when CONFIG_EXT4_DEBUG is set and fix the missing first argument. This was introduced in commit ff95ec22 ("ext4: add warning to ext4_convert_unwritten_extents_endio"), and splitting extents inside endio would trigger it.

Fixes: ff95ec22 ("ext4: add warning to ext4_convert_unwritten_extents_endio")
Signed-off-by: Rakesh Pandit <rakesh@tuxera.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Tetsuo Handa

commit 8619e5bdeee8b2c685d686281f2d2a6017c4bc15 upstream.

syzbot found that a thread can stall for minutes inside read_mem() or write_mem() after that thread was killed by SIGKILL [1]. Reading from iomem areas of /dev/mem can be slow, depending on the hardware. While reading 2GB in one read() is legal, delaying termination of a killed thread for minutes is bad. Thus, allow reading/writing /dev/mem and /dev/kmem to be preemptible and killable.

  [ 1335.912419][T20577] read_mem: sz=4096 count=2134565632
  [ 1335.943194][T20577] read_mem: sz=4096 count=2134561536
  [ 1335.978280][T20577] read_mem: sz=4096 count=2134557440
  [ 1336.011147][T20577] read_mem: sz=4096 count=2134553344
  [ 1336.041897][T20577] read_mem: sz=4096 count=2134549248

Theoretically, reading/writing /dev/mem and /dev/kmem can become "interruptible". But this patch chose "killable". A future patch will make them "interruptible" so that we can revert to "killable" if some program regresses.

[1] https://syzkaller.appspot.com/bug?id=a0e3436829698d5824231251fad9d8e998f94f5e

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: stable <stable@vger.kernel.org>
Reported-by: syzbot <syzbot+8ab2d0f39fb79fe6ca40@syzkaller.appspotmail.com>
Link: https://lore.kernel.org/r/1566825205-10703-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
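The "preemptible and killable" pattern described here is a common one for long kernel copy loops: work in chunks, yield the CPU between chunks, and bail out early if a fatal signal is pending. A minimal sketch follows; it is illustrative only and not the actual drivers/char/mem.c code (the chunk size and return semantics are assumptions).

	#include <linux/kernel.h>
	#include <linux/mm.h>
	#include <linux/sched.h>
	#include <linux/sched/signal.h>
	#include <linux/uaccess.h>

	/* Copy 'count' bytes to userspace in chunks, staying preemptible and killable. */
	static ssize_t copy_chunked(char __user *ubuf, const char *src, size_t count)
	{
		ssize_t done = 0;

		while (count) {
			size_t sz = min_t(size_t, count, PAGE_SIZE);

			if (copy_to_user(ubuf + done, src + done, sz))
				return -EFAULT;

			done += sz;
			count -= sz;

			if (fatal_signal_pending(current))	/* killable */
				return done ? done : -EINTR;
			cond_resched();				/* preemptible */
		}
		return done;
	}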
-
Submitted by Denis Kenzior

commit c1d3ad84eae35414b6b334790048406bd6301b12 upstream.

Currently frame registrations are not purged, even when changing the interface type. This can lead to potentially weird situations where frames possibly not allowed on a given interface type remain registered due to the type switch happening after registration. The kernel currently relies on userspace apps to actually purge the registrations themselves; this is not something the kernel should rely on. Add a call to cfg80211_mlme_purge_registrations() to forcefully remove any registrations left over prior to switching the iftype.

Cc: stable@vger.kernel.org
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
Link: https://lore.kernel.org/r/20190828211110.15005-1-denkenz@gmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by NeilBrown

commit 480523feae581ab714ba6610388a3b4619a2f695 upstream.

Since commit 4ad23a97 ("MD: use per-cpu counter for writes_pending"), set_in_sync() is substantially more expensive: it can wait for a full RCU grace period, which can be tens of milliseconds. So we should only call it when the cost is justified.

md_check_recovery() currently calls set_in_sync() every time it finds anything to do (on non-external active arrays). For an array performing resync or recovery, this will be quite often. Each call introduces a delay to the md thread, which can noticeably affect IO submission latency.

In md_check_recovery() we only need to call set_in_sync() if 'safemode' was non-zero at entry, meaning that there has been no recent IO. So we save this "safemode was nonzero" state and only call set_in_sync() if it was non-zero. This measurably reduces mean and maximum IO submission latency during resync/recovery.

Reported-and-tested-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Fixes: 4ad23a97 ("MD: use per-cpu counter for writes_pending")
Cc: stable@vger.kernel.org (v4.12+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by NeilBrown

commit 9d4b45d6af442237560d0bb5502a012baa5234b7 upstream.

Until revalidate_disk() has completed, the size of a new md array will appear to be zero. So we shouldn't report, through array_state, that the array is active until that time. udev rules check array_state to see if the array is ready. As soon as it appears to be, fsck can be run; if it finds the size to be zero, it will fail.

So add a new flag to provide an interlock between do_md_run() and array_state_show(). This flag is set while do_md_run() is active and it prevents array_state_show() from reporting that the array is active.

Before do_md_run() is called, ->pers will be NULL, so the array is definitely not active. After do_md_run() is called, revalidate_disk() will have run and the array will be completely ready.

We also move various sysfs_notify*() calls out of md_run() into do_md_run() after MD_NOT_READY is cleared. This ensures the information is ready before the notification is sent.

Prior to v4.12, array_state_show() was called with the mddev->reconfig_mutex held, which provided exclusion with do_md_run().

Note that MD_NOT_READY is cleared twice. This is deliberate, to cover both success and error paths with minimal noise.

Fixes: b7b17c9b ("md: remove mddev_lock() from md_attr_show()")
Cc: stable@vger.kernel.org (v4.12++)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Xiao Ni

commit 143f6e733b73051cd22dcb80951c6c929da413ce upstream.

7471fb77 ("md/raid6: Fix anomily when recovering a single device in RAID6.") avoids rereading P when it can be computed from the other members. However, this misses the chance to re-write the right data to P. This patch sets R5_ReadError if the re-read fails.

Also, when the re-read is skipped, we miss the chance to reset rdev->read_errors to 0. This can fail the disk when there are many read errors on the P member disk (while the other disks have no read errors).

V2: upper-layer read requests don't read parity/Q data, so there is no need to consider such a situation.

Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 7471fb77 ("md/raid6: Fix anomily when recovering a single device in RAID6.")
Cc: <stable@vger.kernel.org> #4.4+
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Filipe Manana
commit 13fc1d271a2e3ab8a02071e711add01fab9271f6 upstream. There is a race between setting up a qgroup rescan worker and completing a qgroup rescan worker that can lead to callers of the qgroup rescan wait ioctl to either not wait for the rescan worker to complete or to hang forever due to missing wake ups. The following diagram shows a sequence of steps that illustrates the race. CPU 1 CPU 2 CPU 3 btrfs_ioctl_quota_rescan() btrfs_qgroup_rescan() qgroup_rescan_init() mutex_lock(&fs_info->qgroup_rescan_lock) spin_lock(&fs_info->qgroup_lock) fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN init_completion( &fs_info->qgroup_rescan_completion) fs_info->qgroup_rescan_running = true mutex_unlock(&fs_info->qgroup_rescan_lock) spin_unlock(&fs_info->qgroup_lock) btrfs_init_work() --> starts the worker btrfs_qgroup_rescan_worker() mutex_lock(&fs_info->qgroup_rescan_lock) fs_info->qgroup_flags &= ~BTRFS_QGROUP_STATUS_FLAG_RESCAN mutex_unlock(&fs_info->qgroup_rescan_lock) starts transaction, updates qgroup status item, etc btrfs_ioctl_quota_rescan() btrfs_qgroup_rescan() qgroup_rescan_init() mutex_lock(&fs_info->qgroup_rescan_lock) spin_lock(&fs_info->qgroup_lock) fs_info->qgroup_flags |= BTRFS_QGROUP_STATUS_FLAG_RESCAN init_completion( &fs_info->qgroup_rescan_completion) fs_info->qgroup_rescan_running = true mutex_unlock(&fs_info->qgroup_rescan_lock) spin_unlock(&fs_info->qgroup_lock) btrfs_init_work() --> starts another worker mutex_lock(&fs_info->qgroup_rescan_lock) fs_info->qgroup_rescan_running = false mutex_unlock(&fs_info->qgroup_rescan_lock) complete_all(&fs_info->qgroup_rescan_completion) Before the rescan worker started by the task at CPU 3 completes, if another task calls btrfs_ioctl_quota_rescan(), it will get -EINPROGRESS because the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN is set at fs_info->qgroup_flags, which is expected and correct behaviour. However if other task calls btrfs_ioctl_quota_rescan_wait() before the rescan worker started by the task at CPU 3 completes, it will return immediately without waiting for the new rescan worker to complete, because fs_info->qgroup_rescan_running is set to false by CPU 2. This race is making test case btrfs/171 (from fstests) to fail often: btrfs/171 9s ... - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad) # --- tests/btrfs/171.out 2018-09-16 21:30:48.505104287 +0100 # +++ /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad 2019-09-19 02:01:36.938486039 +0100 # @@ -1,2 +1,3 @@ # QA output created by 171 # +ERROR: quota rescan failed: Operation now in progress # Silence is golden # ... # (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/171.out /home/fdmanana/git/hub/xfstests/results//btrfs/171.out.bad' to see the entire diff) That is because the test calls the btrfs-progs commands "qgroup quota rescan -w", "qgroup assign" and "qgroup remove" in a sequence that makes calls to the rescan start ioctl fail with -EINPROGRESS (note the "btrfs" commands 'qgroup assign' and 'qgroup remove' often call the rescan start ioctl after calling the qgroup assign ioctl, btrfs_ioctl_qgroup_assign()), since previous waits didn't actually wait for a rescan worker to complete. Another problem the race can cause is missing wake ups for waiters, since the call to complete_all() happens outside a critical section and after clearing the flag BTRFS_QGROUP_STATUS_FLAG_RESCAN. 
In the sequence diagram above, if we have a waiter for the first rescan task (executed by CPU 2), then fs_info->qgroup_rescan_completion.wait is not empty, and if after the rescan worker clears BTRFS_QGROUP_STATUS_FLAG_RESCAN and before it calls complete_all() against fs_info->qgroup_rescan_completion, the task at CPU 3 calls init_completion() against fs_info->qgroup_rescan_completion which re-initilizes its wait queue to an empty queue, therefore causing the rescan worker at CPU 2 to call complete_all() against an empty queue, never waking up the task waiting for that rescan worker. Fix this by clearing BTRFS_QGROUP_STATUS_FLAG_RESCAN and setting fs_info->qgroup_rescan_running to false in the same critical section, delimited by the mutex fs_info->qgroup_rescan_lock, as well as doing the call to complete_all() in that same critical section. This gives the protection needed to avoid rescan wait ioctl callers not waiting for a running rescan worker and the lost wake ups problem, since setting that rescan flag and boolean as well as initializing the wait queue is done already in a critical section delimited by that mutex (at qgroup_rescan_init()). Fixes: 57254b6e ("Btrfs: add ioctl to wait for qgroup rescan completion") Fixes: d2c609b8 ("btrfs: properly track when rescan worker is running") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: NJosef Bacik <josef@toxicpanda.com> Signed-off-by: NFilipe Manana <fdmanana@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Qu Wenruo
commit d4e204948fe3e0dc8e1fbf3f8f3290c9c2823be3 upstream. [BUG] The following script can cause btrfs qgroup data space leak: mkfs.btrfs -f $dev mount $dev -o nospace_cache $mnt btrfs subv create $mnt/subv btrfs quota en $mnt btrfs quota rescan -w $mnt btrfs qgroup limit 128m $mnt/subv for (( i = 0; i < 3; i++)); do # Create 3 64M holes for latter fallocate to fail truncate -s 192m $mnt/subv/file xfs_io -c "pwrite 64m 4k" $mnt/subv/file > /dev/null xfs_io -c "pwrite 128m 4k" $mnt/subv/file > /dev/null sync # it's supposed to fail, and each failure will leak at least 64M # data space xfs_io -f -c "falloc 0 192m" $mnt/subv/file &> /dev/null rm $mnt/subv/file sync done # Shouldn't fail after we removed the file xfs_io -f -c "falloc 0 64m" $mnt/subv/file [CAUSE] Btrfs qgroup data reserve code allow multiple reservations to happen on a single extent_changeset: E.g: btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_1M); btrfs_qgroup_reserve_data(inode, &data_reserved, SZ_1M, SZ_2M); btrfs_qgroup_reserve_data(inode, &data_reserved, 0, SZ_4M); Btrfs qgroup code has its internal tracking to make sure we don't double-reserve in above example. The only pattern utilizing this feature is in the main while loop of btrfs_fallocate() function. However btrfs_qgroup_reserve_data()'s error handling has a bug in that on error it clears all ranges in the io_tree with EXTENT_QGROUP_RESERVED flag but doesn't free previously reserved bytes. This bug has a two fold effect: - Clearing EXTENT_QGROUP_RESERVED ranges This is the correct behavior, but it prevents btrfs_qgroup_check_reserved_leak() to catch the leakage as the detector is purely EXTENT_QGROUP_RESERVED flag based. - Leak the previously reserved data bytes. The bug manifests when N calls to btrfs_qgroup_reserve_data are made and the last one fails, leaking space reserved in the previous ones. [FIX] Also free previously reserved data bytes when btrfs_qgroup_reserve_data fails. Fixes: 52472553 ("btrfs: qgroup: Introduce btrfs_qgroup_reserve_data function") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Qu Wenruo
commit bab32fc069ce8829c416e8737c119f62a57970f9 upstream. [BUG] Under the following case with qgroup enabled, if some error happened after we have reserved delalloc space, then in error handling path, we could cause qgroup data space leakage: From btrfs_truncate_block() in inode.c: ret = btrfs_delalloc_reserve_space(inode, &data_reserved, block_start, blocksize); if (ret) goto out; again: page = find_or_create_page(mapping, index, mask); if (!page) { btrfs_delalloc_release_space(inode, data_reserved, block_start, blocksize, true); btrfs_delalloc_release_extents(BTRFS_I(inode), blocksize, true); ret = -ENOMEM; goto out; } [CAUSE] In the above case, btrfs_delalloc_reserve_space() will call btrfs_qgroup_reserve_data() and mark the io_tree range with EXTENT_QGROUP_RESERVED flag. In the error handling path, we have the following call stack: btrfs_delalloc_release_space() |- btrfs_free_reserved_data_space() |- btrsf_qgroup_free_data() |- __btrfs_qgroup_release_data(reserved=@reserved, free=1) |- qgroup_free_reserved_data(reserved=@reserved) |- clear_record_extent_bits(); |- freed += changeset.bytes_changed; However due to a completion bug, qgroup_free_reserved_data() will clear EXTENT_QGROUP_RESERVED flag in BTRFS_I(inode)->io_failure_tree, other than the correct BTRFS_I(inode)->io_tree. Since io_failure_tree is never marked with that flag, btrfs_qgroup_free_data() will not free any data reserved space at all, causing a leakage. This type of error handling can only be triggered by errors outside of qgroup code. So EDQUOT error from qgroup can't trigger it. [FIX] Fix the wrong target io_tree. Reported-by: NJosef Bacik <josef@toxicpanda.com> Fixes: bc42bda2 ("btrfs: qgroup: Fix qgroup reserved space underflow by only freeing reserved ranges") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NQu Wenruo <wqu@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Submitted by Nikolay Borisov
commit 6af112b11a4bc1b560f60a618ac9c1dcefe9836e upstream. When doing any form of incremental send the parent and the child trees need to be compared via btrfs_compare_trees. This can result in long loop chains without ever relinquishing the CPU. This causes softlockup detector to trigger when comparing trees with a lot of items. Example report: watchdog: BUG: soft lockup - CPU#0 stuck for 24s! [snapperd:16153] CPU: 0 PID: 16153 Comm: snapperd Not tainted 5.2.9-1-default #1 openSUSE Tumbleweed (unreleased) Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 pstate: 40000005 (nZcv daif -PAN -UAO) pc : __ll_sc_arch_atomic_sub_return+0x14/0x20 lr : btrfs_release_extent_buffer_pages+0xe0/0x1e8 [btrfs] sp : ffff00001273b7e0 Call trace: __ll_sc_arch_atomic_sub_return+0x14/0x20 release_extent_buffer+0xdc/0x120 [btrfs] free_extent_buffer.part.0+0xb0/0x118 [btrfs] free_extent_buffer+0x24/0x30 [btrfs] btrfs_release_path+0x4c/0xa0 [btrfs] btrfs_free_path.part.0+0x20/0x40 [btrfs] btrfs_free_path+0x24/0x30 [btrfs] get_inode_info+0xa8/0xf8 [btrfs] finish_inode_if_needed+0xe0/0x6d8 [btrfs] changed_cb+0x9c/0x410 [btrfs] btrfs_compare_trees+0x284/0x648 [btrfs] send_subvol+0x33c/0x520 [btrfs] btrfs_ioctl_send+0x8a0/0xaf0 [btrfs] btrfs_ioctl+0x199c/0x2288 [btrfs] do_vfs_ioctl+0x4b0/0x820 ksys_ioctl+0x84/0xb8 __arm64_sys_ioctl+0x28/0x38 el0_svc_common.constprop.0+0x7c/0x188 el0_svc_handler+0x34/0x90 el0_svc+0x8/0xc Fix this by adding a call to cond_resched at the beginning of the main loop in btrfs_compare_trees. Fixes: 7069830a ("Btrfs: add btrfs_compare_trees function") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NNikolay Borisov <nborisov@suse.com> Reviewed-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NDavid Sterba <dsterba@suse.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-