- 01 12月, 2019 40 次提交
-
-
由 David Hildenbrand 提交于
[ Upstream commit 8df1d0e4a265f25dc1e7e7624ccdbcb4a6630c89 ] add_memory() currently does not take the device_hotplug_lock, however is aleady called under the lock from arch/powerpc/platforms/pseries/hotplug-memory.c drivers/acpi/acpi_memhotplug.c to synchronize against CPU hot-remove and similar. In general, we should hold the device_hotplug_lock when adding memory to synchronize against online/offline request (e.g. from user space) - which already resulted in lock inversions due to device_lock() and mem_hotplug_lock - see 30467e0b ("mm, hotplug: fix concurrent memory hot-add deadlock"). add_memory()/add_memory_resource() will create memory block devices, so this really feels like the right thing to do. Holding the device_hotplug_lock makes sure that a memory block device can really only be accessed (e.g. via .online/.state) from user space, once the memory has been fully added to the system. The lock is not held yet in drivers/xen/balloon.c arch/powerpc/platforms/powernv/memtrace.c drivers/s390/char/sclp_cmd.c drivers/hv/hv_balloon.c So, let's either use the locked variants or take the lock. Don't export add_memory_resource(), as it once was exported to be used by XEN, which is never built as a module. If somebody requires it, we also have to export a locked variant (as device_hotplug_lock is never exported). Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NPavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NRashmica Gupta <rashmica.g@gmail.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mathieu Malaterre <malat@debian.org> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Neuling <mikey@neuling.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Borislav Petkov 提交于
[ Upstream commit 95c4fb78fb23081472465ca20d5d31c4b780ed82 ] ... because panic() itself already does this. Otherwise you have line-broken trailer: [ 1.836965] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pgd_alloc+0x29e/0x2a0 [ 1.836965] ]--- Link: http://lkml.kernel.org/r/20181008202901.7894-1-bp@alien8.deSigned-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NKees Cook <keescook@chromium.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Colin Ian King 提交于
[ Upstream commit 6c9a3f843a29d6894dfc40df338b91dbd78f0ae3 ] Currently extent and index i are both being incremented causing an array out of bounds read on extent[i]. Fix this by removing the extraneous increment of extent. Ernesto said: : This is only triggered when deleting a file with a resource fork. I : may be wrong because the documentation isn't clear, but I don't think : you can create those under linux. So I guess nobody was testing them. : : > A disk space leak, perhaps? : : That's what it looks like in general. hfs_free_extents() won't do : anything if the block count doesn't add up, and the error will be : ignored. Now, if the block count randomly does add up, we could see : some corruption. Detected by CoverityScan, CID#711541 ("Out of bounds read") Link: http://lkml.kernel.org/r/20180831140538.31566-1-colin.king@canonical.comSigned-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NErnesto A. Fernndez <ernesto.mnd.fernandez@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ernesto A. Fernández 提交于
[ Upstream commit 8cd3cb5061730af085a3f9890a3352f162b4e20c ] The vfs takes care of updating mtime on ftruncate(), but on truncate() it must be done by the module. Link: http://lkml.kernel.org/r/e1611eda2985b672ed2d8677350b4ad8c2d07e8a.1539316825.git.ernesto.mnd.fernandez@gmail.comSigned-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: NVyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ernesto A. Fernández 提交于
[ Upstream commit dc8844aada735890a6de109bef327f5df36a982e ] The vfs takes care of updating ctime and mtime on ftruncate(), but on truncate() it must be done by the module. This patch can be tested with xfstests generic/313. Link: http://lkml.kernel.org/r/9beb0913eea37288599e8e1b7cec8768fb52d1b8.1539316825.git.ernesto.mnd.fernandez@gmail.comSigned-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: NVyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ernesto A. Fernández 提交于
[ Upstream commit 1267a07be5ebbff2d2739290f3d043ae137c15b4 ] Direct writes to empty inodes fail with EIO. The generic direct-io code is in part to blame (a patch has been submitted as "direct-io: allow direct writes to empty inodes"), but hfs is worse affected than the other filesystems because the fallback to buffered I/O doesn't happen. The problem is the return value of hfs_get_block() when called with !create. Change it to be more consistent with the other modules. Link: http://lkml.kernel.org/r/4538ab8c35ea37338490525f0f24cbc37227528c.1539195310.git.ernesto.mnd.fernandez@gmail.comSigned-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: NVyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ernesto A. Fernández 提交于
[ Upstream commit 839c3a6a5e1fbc8542d581911b35b2cb5cd29304 ] Direct writes to empty inodes fail with EIO. The generic direct-io code is in part to blame (a patch has been submitted as "direct-io: allow direct writes to empty inodes"), but hfsplus is worse affected than the other filesystems because the fallback to buffered I/O doesn't happen. The problem is the return value of hfsplus_get_block() when called with !create. Change it to be more consistent with the other modules. Link: http://lkml.kernel.org/r/2cd1301404ec7cf1e39c8f11a01a4302f1460ad6.1539195310.git.ernesto.mnd.fernandez@gmail.comSigned-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: NVyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ernesto A. Fernández 提交于
[ Upstream commit 54640c7502e5ed41fbf4eedd499e85f9acc9698f ] Inserting a new record in a btree may require splitting several of its nodes. If we hit ENOSPC halfway through, the new nodes will be left orphaned and their records will be lost. This could mean lost inodes or extents. Henceforth, check the available disk space before making any changes. This still leaves the potential problem of corruption on ENOMEM. There is no need to reserve space before deleting a catalog record, as we do for hfsplus. This difference is because hfs index nodes have fixed length keys. Link: http://lkml.kernel.org/r/ab5fc8a7d5ffccfd5f27b1cf2cb4ceb6c110da74.1536269131.git.ernesto.mnd.fernandez@gmail.comSigned-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ernesto A. Fernández 提交于
[ Upstream commit d92915c35bfaf763d78bf1d5ac7f183420e3bd99 ] Inserting or deleting a record in a btree may require splitting several of its nodes. If we hit ENOSPC halfway through, the new nodes will be left orphaned and their records will be lost. This could mean lost inodes, extents or xattrs. Henceforth, check the available disk space before making any changes. This still leaves the potential problem of corruption on ENOMEM. The patch can be tested with xfstests generic/027. Link: http://lkml.kernel.org/r/4596eef22fbda137b4ffa0272d92f0da15364421.1536269129.git.ernesto.mnd.fernandez@gmail.comSigned-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ernesto A. Fernández 提交于
[ Upstream commit ef75bcc5763d130451a99825f247d301088b790b ] hfs_brec_update_parent() may hit BUG_ON() if the first record of both a leaf node and its parent are changed, and if this forces the parent to be split. It is not possible for this to happen on a valid hfs filesystem because the index nodes have fixed length keys. For reasons I ignore, the hfs module does have support for a number of hfsplus features. A corrupt btree header may report variable length keys and trigger this BUG, so it's better to fix it. Link: http://lkml.kernel.org/r/cf9b02d57f806217a2b1bf5db8c3e39730d8f603.1535682463.git.ernesto.mnd.fernandez@gmail.comSigned-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ernesto A. Fernández 提交于
[ Upstream commit 19a9d0f1acf75e8be8cfba19c1a34e941846fa2b ] Creating, renaming or deleting a file may hit BUG_ON() if the first record of both a leaf node and its parent are changed, and if this forces the parent to be split. This bug is triggered by xfstests generic/027, somewhat rarely; here is a more reliable reproducer: truncate -s 50M fs.iso mkfs.hfsplus fs.iso mount fs.iso /mnt i=1000 while [ $i -le 2400 ]; do touch /mnt/$i &>/dev/null ((++i)) done i=2400 while [ $i -ge 1000 ]; do mv /mnt/$i /mnt/$(perl -e "print $i x61") &>/dev/null ((--i)) done The issue is that a newly created bnode is being put twice. Reset new_node to NULL in hfs_brec_update_parent() before reaching goto again. Link: http://lkml.kernel.org/r/5ee1db09b60373a15890f6a7c835d00e76bf601d.1535682461.git.ernesto.mnd.fernandez@gmail.comSigned-off-by: NErnesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Rasmus Villemoes 提交于
[ Upstream commit ce1091d471107dbf6f91db66a480a25950c9b9ff ] For various alignments of buf, the current expression computes 4096 ok 4095 ok 8190 8189 ... 4097 i.e., if the caller has already written two bytes into the page buffer, len is 8190 rather than 4094, because PTR_ALIGN aligns up to the next boundary. So if the printed version of the bitmap is huge, scnprintf() ends up writing beyond the page boundary. I don't think any current callers actually write anything before bitmap_print_to_pagebuf, but the API seems to be designed to allow it. [akpm@linux-foundation.org: use offset_in_page(), per Andy] [akpm@linux-foundation.org: include mm.h for offset_in_page()] Link: http://lkml.kernel.org/r/20180818131623.8755-7-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Rasmus Villemoes 提交于
[ Upstream commit d9873969fa8725dc6a5a21ab788c057fd8719751 ] Most other bitmap API, including the OOL version __bitmap_shift_right, take unsigned nbits. This was accidentally left out from 2fbad299. Link: http://lkml.kernel.org/r/20180818131623.8755-5-linux@rasmusvillemoes.dk Fixes: 2fbad299 ("lib: bitmap: change bitmap_shift_right to take unsigned parameters") Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reported-by: NYury Norov <ynorov@caviumnetworks.com> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Rasmus Villemoes 提交于
[ Upstream commit 7275b097851a5e2e0dd4da039c7e96b59ac5314e ] The static inlines in bitmap.h do not handle a compile-time constant nbits==0 correctly (they dereference the passed src or dst pointers, despite only 0 words being valid to access). I had the 0-day buildbot chew on a patch [1] that would cause build failures for such cases without complaining, suggesting that we don't have any such users currently, at least for the 70 .config/arch combinations that was built. Should any turn up, make sure they use the out-of-line versions, which do handle nbits==0 correctly. This is of course not the most efficient, but it's much less churn than teaching all the static inlines an "if (zero_const_nbits())", and since we don't have any current instances, this doesn't affect existing code at all. [1] lkml.kernel.org/r/20180815085539.27485-1-linux@rasmusvillemoes.dk Link: http://lkml.kernel.org/r/20180818131623.8755-3-linux@rasmusvillemoes.dkSigned-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Cc: Yury Norov <ynorov@caviumnetworks.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Dan Carpenter 提交于
[ Upstream commit 4b408c74ee5a0b74fc9265c2fe39b0e7dec7c056 ] The concern here is that "gup->size" is a u64 and "nr_pages" is unsigned long. On 32 bit systems we could trick the kernel into allocating fewer pages than expected. Link: http://lkml.kernel.org/r/20181025061546.hnhkv33diogf2uis@kili.mountain Fixes: 64c349f4 ("mm: add infrastructure for get_user_pages_fast() benchmarking") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Keith Busch <keith.busch@intel.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: YueHaibing <yuehaibing@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Ming Lei 提交于
[ Upstream commit c57cdf7a9e51d97a43e29b8f4a04157875104000 ] rq_qos_exit() removes the current q->rq_qos, this action has to be done after queue is frozen, otherwise the IO queue path may never be waken up, then IO hang is caused. So fixes this issue by moving rq_qos_exit() after queue is frozen. Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Michael Ellerman 提交于
[ Upstream commit 69f8117f17b332a68cd8f4bf8c2d0d3d5b84efc5 ] Use TEST_GEN_PROGS and don't redefine all, this makes the out-of-tree build work. We need to move the extra dependencies below the include of lib.mk, because it adds the $(OUTPUT) prefix if it's defined. We can also drop the clean rule, lib.mk does it for us. Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Michael Ellerman 提交于
[ Upstream commit 266bac361d5677e61a6815bd29abeb3bdced2b07 ] For the out-of-tree build to work we need to tell switch_endian_test to look for check-reversed.S in $(OUTPUT). Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Joel Stanley 提交于
[ Upstream commit 27825349d7b238533a47e3d98b8bb0efd886b752 ] We should use TEST_GEN_PROGS, not TEST_PROGS. That tells the selftests makefile (lib.mk) that those tests are generated (built), and so it adds the $(OUTPUT) prefix for us, making the out-of-tree build work correctly. It also means we don't need our own clean rule, lib.mk does it. We also have to update the signal_tm rule to use $(OUTPUT). Signed-off-by: NJoel Stanley <joel@jms.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Joel Stanley 提交于
[ Upstream commit c39b79082a38a4f8c801790edecbbb4d62ed2992 ] We should use TEST_GEN_PROGS, not TEST_PROGS. That tells the selftests makefile (lib.mk) that those tests are generated (built), and so it adds the $(OUTPUT) prefix for us, making the out-of-tree build work correctly. It also means we don't need our own clean rule, lib.mk does it. We also have to update the ptrace-pkey and core-pkey rules to use $(OUTPUT). Signed-off-by: NJoel Stanley <joel@jms.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Joel Stanley 提交于
[ Upstream commit 9c87156cce5a63735d1218f0096a65c50a7a32aa ] When building with clang (8 trunk, 7.0 release) the frame size limit is hit: arch/powerpc/xmon/xmon.c:452:12: warning: stack frame size of 2576 bytes in function 'xmon_core' [-Wframe-larger-than=] Some investigation by Naveen indicates this is due to clang saving the addresses to printf format strings on the stack. While this issue is investigated, bump up the frame size limit for xmon when building with clang. Link: https://github.com/ClangBuiltLinux/linux/issues/252Signed-off-by: NJoel Stanley <joel@jms.id.au> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Hangbin Liu 提交于
[ Upstream commit 966c37f2d77eb44d47af8e919267b1ba675b2eca ] Similiar with ipv6 mcast commit 89225d1c ("net: ipv6: mld: fix v1/v2 switchback timeout to rfc3810, 9.12.") i) RFC3376 8.12. Older Version Querier Present Timeout says: The Older Version Querier Interval is the time-out for transitioning a host back to IGMPv3 mode once an older version query is heard. When an older version query is received, hosts set their Older Version Querier Present Timer to Older Version Querier Interval. This value MUST be ((the Robustness Variable) times (the Query Interval in the last Query received)) plus (one Query Response Interval). Currently we only use a hardcode value IGMP_V1/v2_ROUTER_PRESENT_TIMEOUT. Fix it by adding two new items mr_qi(Query Interval) and mr_qri(Query Response Interval) in struct in_device. Now we can calculate the switchback time via (mr_qrv * mr_qi) + mr_qri. We need update these values when receive IGMPv3 queries. Reported-by: NYing Xu <yinxu@redhat.com> Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Darrick J. Wong 提交于
[ Upstream commit 07d19dc9fbe9128378b9e226abe886fd8fd473df ] A deduplication data corruption is exposed in XFS and btrfs. It is caused by extending the block match range to include the partial EOF block, but then allowing unknown data beyond EOF to be considered a "match" to data in the destination file because the comparison is only made to the end of the source file. This corrupts the destination file when the source extent is shared with it. The VFS remapping prep functions only support whole block dedupe, but we still need to appear to support whole file dedupe correctly. Hence if the dedupe request includes the last block of the souce file, don't include it in the actual dedupe operation. If the rest of the range dedupes successfully, then reject the entire request. A subsequent patch will enable us to shorten dedupe requests correctly. When reflinking sub-file ranges, a data corruption can occur when the source file range includes a partial EOF block. This shares the unknown data beyond EOF into the second file at a position inside EOF, exposing stale data in the second file. If the reflink request includes the last block of the souce file, only proceed with the reflink operation if it lands at or past the destination file's current EOF. If it lands within the destination file EOF, reject the entire request with -EINVAL and make the caller go the hard way. A subsequent patch will enable us to shorten reflink requests correctly. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Anton Ivanov 提交于
[ Upstream commit 917e2fd2c53eb3c4162f5397555cbd394390d4bc ] This fixes a long standing bug where large amounts of output could freeze the tty (most commonly seen on stdio console). While the bug has always been there it became more pronounced after moving to the new interrupt controller. The line semantics are now changed to have true IRQ write semantics which should further improve the tty/line subsystem stability and performance Signed-off-by: NAnton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: NRichard Weinberger <richard@nod.at> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Masahiro Yamada 提交于
[ Upstream commit eaba68785c2d24ebf1f0d46c24e11b79cc2f94c7 ] The current IRQ handler clears all the IRQ status bits when it bails out. This is dangerous because it might clear away the status bits that have just been set while processing the current handler. If this happens, the IRQ event for the latest transfer is lost forever. The IRQ status bits must be cleared *before* the next transfer is kicked. Fixes: 6a62974b ("i2c: uniphier_f: add UniPhier FIFO-builtin I2C driver") Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NWolfram Sang <wsa@the-dreams.de> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Masahiro Yamada 提交于
[ Upstream commit 39226aaa85f002d695e3cafade3309e12ffdaecd ] Currently, a timeout error could happen at a repeated START condition. For a (non-repeated) START condition, the controller starts sending data when the UNIPHIER_FI2C_CR_STA bit is set. However, for a repeated START condition, the hardware starts running when the slave address is written to the TX FIFO - the write to the UNIPHIER_FI2C_CR register is actually unneeded. Because the hardware is already running before the IRQ is enabled for a repeated START, the driver may miss the IRQ event. In most cases, this problem does not show up since modern CPUs are much faster than the I2C transfer. However, it is still possible that a context switch happens after the controller starts, but before the IRQ register is set up. To fix this, - Do not write UNIPHIER_FI2C_CR for repeated START conditions. - Enable IRQ *before* writing the slave address to the TX FIFO. - Disable IRQ for the current CPU while queuing up the TX FIFO; If the CPU is interrupted by some task, the interrupt handler might be invoked due to the empty TX FIFO before completing the setup. Fixes: 6a62974b ("i2c: uniphier_f: add UniPhier FIFO-builtin I2C driver") Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NWolfram Sang <wsa@the-dreams.de> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Masahiro Yamada 提交于
[ Upstream commit f1fdcbbdf45d9609f3d4063b67e9ea941ba3a58f ] This is unlikely to happen, but it is possible for a CPU to enter the interrupt handler just after wait_for_completion_timeout() has expired. If this happens, the hardware is accessed from multiple contexts concurrently. Disable the IRQ after wait_for_completion_timeout(), and do nothing from the handler when the IRQ is disabled. Fixes: 6a62974b ("i2c: uniphier_f: add UniPhier FIFO-builtin I2C driver") Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NWolfram Sang <wsa@the-dreams.de> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Jianchao Wang 提交于
[ Upstream commit 69840466086d2248898020a08dda52732686c4e6 ] There are two cases when handle DISCARD merge. If max_discard_segments == 1, the bios/requests need to be contiguous to merge. If max_discard_segments > 1, it takes every bio as a range and different range needn't to be contiguous. But now, attempt_merge screws this up. It always consider contiguity for DISCARD for the case max_discard_segments > 1 and cannot merge contiguous DISCARD for the case max_discard_segments == 1, because rq_attempt_discard_merge always returns false in this case. This patch fixes both of the two cases above. Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com> Signed-off-by: NJens Axboe <axboe@kernel.dk> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Sabrina Dubroca 提交于
[ Upstream commit 07bddef9839378bd6f95b393cf24c420529b4ef1 ] Currently, the kernel doesn't let the administrator set a macsec device up unless its lower device is currently up. This is inconsistent, as a macsec device that is up won't automatically go down when its lower device goes down. Now that linkstate propagation works, there's really no reason for this limitation, so let's remove it. Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver") Reported-by: NRadu Rendec <radu.rendec@gmail.com> Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Sabrina Dubroca 提交于
[ Upstream commit e6ac075882b2afcdf2d5ab328ce4ab42a1eb9593 ] Like all other virtual devices (macvlan, vlan), the operstate of a macsec device should match the state of its lower device. This is done by calling netif_stacked_transfer_operstate from its netdevice notifier. We also need to call netif_stacked_transfer_operstate when a new macsec device is created, so that its operstate is set properly. This is only relevant when we try to bring the device up directly when we create it. Radu Rendec proposed a similar patch, inspired from the 802.1q driver, that included changing the administrative state of the macsec device, instead of just the operstate. This version is similar to what the macvlan driver does, and updates only the operstate. Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver") Reported-by: NRadu Rendec <radu.rendec@gmail.com> Reported-by: NPatrick Talbert <ptalbert@redhat.com> Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Andrea Arcangeli 提交于
[ Upstream commit d7c3393413fe7e7dc54498ea200ea94742d61e18 ] Patch series "migrate_misplaced_transhuge_page race conditions". Aaron found a new instance of the THP MADV_DONTNEED race against pmdp_clear_flush* variants, that was apparently left unfixed. While looking into the race found by Aaron, I may have found two more issues in migrate_misplaced_transhuge_page. These race conditions would not cause kernel instability, but they'd corrupt userland data or leave data non zero after MADV_DONTNEED. I did only minor testing, and I don't expect to be able to reproduce this (especially the lack of ->invalidate_range before migrate_page_copy, requires the latest iommu hardware or infiniband to reproduce). The last patch is noop for x86 and it needs further review from maintainers of archs that implement flush_cache_range() (not in CC yet). To avoid confusion, it's not the first patch that introduces the bug fixed in the second patch, even before removing the pmdp_huge_clear_flush_notify, that _notify suffix was called after migrate_page_copy already run. This patch (of 3): This is a corollary of ced10803 ("thp: fix MADV_DONTNEED vs. numa balancing race"), 58ceeb6b ("thp: fix MADV_DONTNEED vs. MADV_FREE race") and 5b7abeae ("thp: fix MADV_DONTNEED vs clear soft dirty race). When the above three fixes where posted Dave asked https://lkml.kernel.org/r/929b3844-aec2-0111-fef7-8002f9d4e2b9@intel.com but apparently this was missed. The pmdp_clear_flush* in migrate_misplaced_transhuge_page() was introduced in a54a407f ("mm: Close races between THP migration and PMD numa clearing"). The important part of such commit is only the part where the page lock is not released until the first do_huge_pmd_numa_page() finished disarming the pagenuma/protnone. The addition of pmdp_clear_flush() wasn't beneficial to such commit and there's no commentary about such an addition either. I guess the pmdp_clear_flush() in such commit was added just in case for safety, but it ended up introducing the MADV_DONTNEED race condition found by Aaron. At that point in time nobody thought of such kind of MADV_DONTNEED race conditions yet (they were fixed later) so the code may have looked more robust by adding the pmdp_clear_flush(). This specific race condition won't destabilize the kernel, but it can confuse userland because after MADV_DONTNEED the memory won't be zeroed out. This also optimizes the code and removes a superfluous TLB flush. [akpm@linux-foundation.org: reflow comment to 80 cols, fix grammar and typo (beacuse)] Link: http://lkml.kernel.org/r/20181013002430.698-2-aarcange@redhat.comSigned-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reported-by: NAaron Tomlin <atomlin@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jerome Glisse <jglisse@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Keith Busch 提交于
[ Upstream commit 319e0bec1aecb36c5ac6d23812af487ff2c8f47f ] If the '-w' parameter was provided, the benchmark would exit due to a mssing 'break'. Link: http://lkml.kernel.org/r/20181010195605.10689-3-keith.busch@intel.comSigned-off-by: NKeith Busch <keith.busch@intel.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Dave Chinner 提交于
[ Upstream commit 64081362e8ff4587b4554087f3cfc73d3e0a4cd7 ] We've recently seen a workload on XFS filesystems with a repeatable deadlock between background writeback and a multi-process application doing concurrent writes and fsyncs to a small range of a file. range_cyclic writeback Process 1 Process 2 xfs_vm_writepages write_cache_pages writeback_index = 2 cycled = 0 .... find page 2 dirty lock Page 2 ->writepage page 2 writeback page 2 clean page 2 added to bio no more pages write() locks page 1 dirties page 1 locks page 2 dirties page 1 fsync() .... xfs_vm_writepages write_cache_pages start index 0 find page 1 towrite lock Page 1 ->writepage page 1 writeback page 1 clean page 1 added to bio find page 2 towrite lock Page 2 page 2 is writeback <blocks> write() locks page 1 dirties page 1 fsync() .... xfs_vm_writepages write_cache_pages start index 0 !done && !cycled sets index to 0, restarts lookup find page 1 dirty find page 1 towrite lock Page 1 page 1 is writeback <blocks> lock Page 1 <blocks> DEADLOCK because: - process 1 needs page 2 writeback to complete to make enough progress to issue IO pending for page 1 - writeback needs page 1 writeback to complete so process 2 can progress and unlock the page it is blocked on, then it can issue the IO pending for page 2 - process 2 can't make progress until process 1 issues IO for page 1 The underlying cause of the problem here is that range_cyclic writeback is processing pages in descending index order as we hold higher index pages in a structure controlled from above write_cache_pages(). The write_cache_pages() caller needs to be able to submit these pages for IO before write_cache_pages restarts writeback at mapping index 0 to avoid wcp inverting the page lock/writeback wait order. generic_writepages() is not susceptible to this bug as it has no private context held across write_cache_pages() - filesystems using this infrastructure always submit pages in ->writepage immediately and so there is no problem with range_cyclic going back to mapping index 0. However: mpage_writepages() has a private bio context, exofs_writepages() has page_collect fuse_writepages() has fuse_fill_wb_data nfs_writepages() has nfs_pageio_descriptor xfs_vm_writepages() has xfs_writepage_ctx All of these ->writepages implementations can hold pages under writeback in their private structures until write_cache_pages() returns, and hence they are all susceptible to this deadlock. Also worth noting is that ext4 has it's own bastardised version of write_cache_pages() and so it /may/ have an equivalent deadlock. I looked at the code long enough to understand that it has a similar retry loop for range_cyclic writeback reaching the end of the file and then promptly ran away before my eyes bled too much. I'll leave it for the ext4 developers to determine if their code is actually has this deadlock and how to fix it if it has. There's a few ways I can see avoid this deadlock. There's probably more, but these are the first I've though of: 1. get rid of range_cyclic altogether 2. range_cyclic always stops at EOF, and we start again from writeback index 0 on the next call into write_cache_pages() 2a. wcp also returns EAGAIN to ->writepages implementations to indicate range cyclic has hit EOF. writepages implementations can then flush the current context and call wpc again to continue. i.e. lift the retry into the ->writepages implementation 3. range_cyclic uses trylock_page() rather than lock_page(), and it skips pages it can't lock without blocking. It will already do this for pages under writeback, so this seems like a no-brainer 3a. all non-WB_SYNC_ALL writeback uses trylock_page() to avoid blocking as per pages under writeback. I don't think #1 is an option - range_cyclic prevents frequently dirtied lower file offset from starving background writeback of rarely touched higher file offsets. #2 is simple, and I don't think it will have any impact on performance as going back to the start of the file implies an immediate seek. We'll have exactly the same number of seeks if we switch writeback to another inode, and then come back to this one later and restart from index 0. #2a is pretty much "status quo without the deadlock". Moving the retry loop up into the wcp caller means we can issue IO on the pending pages before calling wcp again, and so avoid locking or waiting on pages in the wrong order. I'm not convinced we need to do this given that we get the same thing from #2 on the next writeback call from the writeback infrastructure. #3 is really just a band-aid - it doesn't fix the access/wait inversion problem, just prevents it from becoming a deadlock situation. I'd prefer we fix the inversion, not sweep it under the carpet like this. #3a is really an optimisation that just so happens to include the band-aid fix of #3. So it seems that the simplest way to fix this issue is to implement solution #2 Link: http://lkml.kernel.org/r/20181005054526.21507-1-david@fromorbit.comSigned-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJan Kara <jack@suse.de> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Jia-Ju Bai 提交于
[ Upstream commit 999865764f5f128896402572b439269acb471022 ] The kernel module may sleep with holding a spinlock. The function call paths (from bottom to top) in Linux-4.16 are: [FUNC] get_zeroed_page(GFP_NOFS) fs/ocfs2/dlm/dlmdebug.c, 332: get_zeroed_page in dlm_print_one_mle fs/ocfs2/dlm/dlmmaster.c, 240: dlm_print_one_mle in __dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 255: __dlm_put_mle in dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 254: spin_lock in dlm_put_ml [FUNC] get_zeroed_page(GFP_NOFS) fs/ocfs2/dlm/dlmdebug.c, 332: get_zeroed_page in dlm_print_one_mle fs/ocfs2/dlm/dlmmaster.c, 240: dlm_print_one_mle in __dlm_put_mle fs/ocfs2/dlm/dlmmaster.c, 222: __dlm_put_mle in dlm_put_mle_inuse fs/ocfs2/dlm/dlmmaster.c, 219: spin_lock in dlm_put_mle_inuse To fix this bug, GFP_NOFS is replaced with GFP_ATOMIC. This bug is found by my static analysis tool DSAC. Link: http://lkml.kernel.org/r/20180901112528.27025-1-baijiaju1990@gmail.comSigned-off-by: NJia-Ju Bai <baijiaju1990@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <ge.changwei@h3c.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Andrey Ryabinin 提交于
[ Upstream commit 19a2ca0fb560fd7be7b5293c6b652c6d6078dcde ] ARM64 has asm implementation of memchr(), memcmp(), str[r]chr(), str[n]cmp(), str[n]len(). KASAN don't see memory accesses in asm code, thus it can potentially miss many bugs. Ifdef out __HAVE_ARCH_* defines of these functions when KASAN is enabled, so the generic implementations from lib/string.c will be used. We can't just remove the asm functions because efistub uses them. And we can't have two non-weak functions either, so declare the asm functions as weak. Link: http://lkml.kernel.org/r/20180920135631.23833-2-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Reported-by: NKyeongdon Kim <kyeongdon.kim@lge.com> Cc: Alexander Potapenko <glider@google.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 David S. Miller 提交于
[ Upstream commit 6c2fc9cddc1ffdef8ada1dc8404e5affae849953 ] Such as: fs/ocfs2/file.c: In function ‘ocfs2_file_write_iter’: ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value] #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr)))) and drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: In function ‘ixgbevf_xdp_setup’: ./arch/sparc/include/asm/cmpxchg_64.h:55:22: warning: value computed is not used [-Wunused-value] #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr)))) Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Felipe Rechia 提交于
[ Upstream commit e901378578c62202594cba0f6c076f3df365ec91 ] Fix a bug introduced by the creation of flush_all_to_thread() for processors that have SPE (Signal Processing Engine) and use it to compute floating-point operations. >From userspace perspective, the problem was seen in attempts of computing floating-point operations which should generate exceptions. For example: fork(); float x = 0.0 / 0.0; isnan(x); // forked process returns False (should be True) The operation above also should always cause the SPEFSCR FINV bit to be set. However, the SPE floating-point exceptions were turned off after a fork(). Kernel versions prior to the bug used flush_spe_to_thread(), which first saves SPEFSCR register values in tsk->thread and then calls giveup_spe(tsk). After commit 579e633e, the save_all() function was called first to giveup_spe(), and then the SPEFSCR register values were saved in tsk->thread. This would save the SPEFSCR register values after disabling SPE for that thread, causing the bug described above. Fixes 579e633e ("powerpc: create flush_all_to_thread()") Signed-off-by: NFelipe Rechia <felipe.rechia@datacom.com.br> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Martin Lau 提交于
[ Upstream commit 4a6998aff82a20a1aece86a186d8e5263f8b2315 ] Wenwen Wang reported: In btf_parse(), the header of the user-space btf data 'btf_data' is firstly parsed and verified through btf_parse_hdr(). In btf_parse_hdr(), the header is copied from user-space 'btf_data' to kernel-space 'btf->hdr' and then verified. If no error happens during the verification process, the whole data of 'btf_data', including the header, is then copied to 'data' in btf_parse(). It is obvious that the header is copied twice here. More importantly, no check is enforced after the second copy to make sure the headers obtained in these two copies are same. Given that 'btf_data' resides in the user space, a malicious user can race to modify the header between these two copies. By doing so, the user can inject inconsistent data, which can cause undefined behavior of the kernel and introduce potential security risk. This issue is similar to the one fixed in commit 8af03d1ae2e1 ("bpf: btf: Fix a missing check bug"). To fix it, this patch copies the user 'btf_data' *before* parsing / verifying the BTF header. Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)") Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Co-developed-by: NWenwen Wang <wang6495@umn.edu> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Taehee Yoo 提交于
[ Upstream commit f592f804831f1cf9d1f9966f58c80f150e6829b5 ] The dev_map_notification() removes interface in devmap if unregistering interface's ifindex is same. But only checking ifindex is not enough because other netns can have same ifindex. so that wrong interface selection could occurred. Hence netdev pointer comparison code is added. v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann) v1: Initial patch Fixes: 2ddf71e2 ("net: add notifier hooks for devmap bpf map") Signed-off-by: NTaehee Yoo <ap420073@gmail.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Tristram Ha 提交于
[ Upstream commit 899ecaedd15599c22553d158f53b127cc1632dc2 ] Socket buffer is not re-created when headroom is 2 and tailroom is 1. Signed-off-by: NTristram Ha <Tristram.Ha@microchip.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org>
-