- 27 4月, 2018 2 次提交
-
-
由 Maxime Chevallier 提交于
Marvell PPv2.2 controller present on CP-110 need the extra "mg_core_clk" clock to avoid system hangs when powering some network interfaces up. This issue appeared after a recent clock rework on Armada 7K/8K platforms. This commit adds the new clock and updates the documentation accordingly. [gregory.clement: use the real first commit to fix and add the cc:stable flag] Fixes: e3af9f7c ("RM64: dts: marvell: armada-cp110: Fix clock resources for various node") Cc: <stable@vger.kernel.org> Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: NGregory CLEMENT <gregory.clement@bootlin.com>
-
由 Maxime Chevallier 提交于
The Marvell XSMI controller needs 3 clocks to operate correctly : - The MG clock (clk 5) - The MG Core clock (clk 6) - The GOP clock (clk 18) This commit adds them, to avoid system hangs when using these interfaces. [gregory.clement: use the real first commit to fix and add the cc:stable flag] Fixes: f66b2aff ("arm64: dts: marvell: add xmdio nodes for 7k/8k") Cc: <stable@vger.kernel.org> Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: NGregory CLEMENT <gregory.clement@bootlin.com>
-
- 14 4月, 2018 12 次提交
-
-
由 Philipp Rudo 提交于
The code to verify the new kernels sha digest is applicable for all architectures. Move it to common code. One problem is the string.c implementation on x86. Currently sha256 includes x86/boot/string.h which defines memcpy and memset to be gcc builtins. By moving the sha256 implementation to common code and changing the include to linux/string.h both functions are no longer defined. Thus definitions have to be provided in x86/purgatory/string.c Link: http://lkml.kernel.org/r/20180321112751.22196-12-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: NDave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Philipp Rudo 提交于
For s390 new kernels are loaded to fixed addresses in memory before they are booted. With the current code this is a problem as it assumes the kernel will be loaded to an 'arbitrary' address. In particular, kexec_locate_mem_hole searches for a large enough memory region and sets the load address (kexec_bufer->mem) to it. Luckily there is a simple workaround for this problem. By returning 1 in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This allows the architecture to set kbuf->mem by hand. While the trick works fine for the kernel it does not for the purgatory as here the architectures don't have access to its kexec_buffer. Give architectures access to the purgatories kexec_buffer by changing kexec_load_purgatory to take a pointer to it. With this change architectures have access to the buffer and can edit it as they need. A nice side effect of this change is that we can get rid of the purgatory_info->purgatory_load_address field. As now the information stored there can directly be accessed from kbuf->mem. Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com> Reviewed-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: NDave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Philipp Rudo 提交于
The current code uses the sh_offset field in purgatory_info->sechdrs to store a pointer to the current load address of the section. Depending whether the section will be loaded or not this is either a pointer into purgatory_info->purgatory_buf or kexec_purgatory. This is not only a violation of the ELF standard but also makes the code very hard to understand as you cannot tell if the memory you are using is read-only or not. Remove this misuse and store the offset of the section in pugaroty_info->purgatory_buf in sh_offset. Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: NDave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Philipp Rudo 提交于
When the relocations are applied to the purgatory only the section the relocations are applied to is writable. The other sections, i.e. the symtab and .rel/.rela, are in read-only kexec_purgatory. Highlight this by marking the corresponding variables as 'const'. While at it also change the signatures of arch_kexec_apply_relocations* to take section pointers instead of just the index of the relocation section. This removes the second lookup and sanity check of the sections in arch code. Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.comSigned-off-by: NPhilipp Rudo <prudo@linux.vnet.ibm.com> Acked-by: NDave Young <dyoung@redhat.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 AKASHI Takahiro 提交于
In the previous patches, commonly-used routines, exclude_mem_range() and prepare_elf64_headers(), were carved out. Now place them in kexec common code. A prefix "crash_" is given to each of their names to avoid possible name collisions. Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: NDave Young <dyoung@redhat.com> Tested-by: NDave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 AKASHI Takahiro 提交于
Removing bufp variable in prepare_elf64_headers() makes the code simpler and more understandable. Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: NDave Young <dyoung@redhat.com> Tested-by: NDave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 AKASHI Takahiro 提交于
While CRASH_MAX_RANGES (== 16) seems to be good enough, fixed-number array is not a good idea in general. In this patch, size of crash_mem buffer is calculated as before and the buffer is now dynamically allocated. This change also allows removing crash_elf_data structure. Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: NDave Young <dyoung@redhat.com> Tested-by: NDave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 AKASHI Takahiro 提交于
The code guarded by CONFIG_X86_64 is necessary on some architectures which have a dedicated kernel mapping outside of linear memory mapping. (arm64 is among those.) In this patch, an additional argument, kernel_map, is added to enable/ disable the code removing #ifdef. Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: NDave Young <dyoung@redhat.com> Tested-by: NDave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 AKASHI Takahiro 提交于
While prepare_elf64_headers() in x86 looks pretty generic for other architectures' use, it contains some code which tries to list crash memory regions by walking through system resources, which is not always architecture agnostic. To make this function more generic, the related code should be purged. In this patch, prepare_elf64_headers() simply scans crash_mem buffer passed and add all the listed regions to elf header as a PT_LOAD segment. So walk_system_ram_res(prepare_elf64_headers_callback) have been moved forward before prepare_elf64_headers() where the callback, prepare_elf64_headers_callback(), is now responsible for filling up crash_mem buffer. Meanwhile exclude_elf_header_ranges() used to be called every time in this callback it is rather redundant and now called only once in prepare_elf_headers() as well. Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: NDave Young <dyoung@redhat.com> Tested-by: NDave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 AKASHI Takahiro 提交于
As arch_kexec_kernel_image_{probe,load}(), arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig() are almost duplicated among architectures, they can be commonalized with an architecture-defined kexec_file_ops array. So let's factor them out. Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: NDave Young <dyoung@redhat.com> Tested-by: NDave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 AKASHI Takahiro 提交于
Patch series "kexec_file, x86, powerpc: refactoring for other architecutres", v2. This is a preparatory patchset for adding kexec_file support on arm64. It was originally included in a arm64 patch set[1], but Philipp is also working on their kexec_file support on s390[2] and some changes are now conflicting. So these common parts were extracted and put into a separate patch set for better integration. What's more, my original patch#4 was split into a few small chunks for easier review after Dave's comment. As such, the resulting code is basically identical with my original, and the only *visible* differences are: - renaming of _kexec_kernel_image_probe() and _kimage_file_post_load_cleanup() - change one of types of arguments at prepare_elf64_headers() Those, unfortunately, require a couple of trivial changes on the rest (#1, #6 to #13) of my arm64 kexec_file patch set[1]. Patch #1 allows making a use of purgatory optional, particularly useful for arm64. Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load, verify_sig}() and arch_kimage_file_post_load_cleanup() across architectures. Patches #3-#7 are also intended to generalize parse_elf64_headers(), along with exclude_mem_range(), to be made best re-use of. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html [2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html This patch (of 7): On arm64, crash dump kernel's usable memory is protected by *unmapping* it from kernel virtual space unlike other architectures where the region is just made read-only. It is highly unlikely that the region is accidentally corrupted and this observation rationalizes that digest check code can also be dropped from purgatory. The resulting code is so simple as it doesn't require a bit ugly re-linking/relocation stuff, i.e. arch_kexec_apply_relocations_add(). Please see: http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html All that the purgatory does is to shuffle arguments and jump into a new kernel, while we still need to have some space for a hash value (purgatory_sha256_digest) which is never checked against. As such, it doesn't make sense to have trampline code between old kernel and new kernel on arm64. This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and allows related code to be compiled in only if necessary. [takahiro.akashi@linaro.org: fix trivial screwup] Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.orgSigned-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org> Acked-by: NDave Young <dyoung@redhat.com> Tested-by: NDave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michael S. Tsirkin 提交于
__get_user_pages_fast handles errors differently from get_user_pages_fast: the former always returns the number of pages pinned, the later might return a negative error code. Link: http://lkml.kernel.org/r/1522962072-182137-6-git-send-email-mst@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 4月, 2018 12 次提交
-
-
由 Michael Ellerman 提交于
The cpu_has_feature() mechanism has an optimisation where at build time we construct a mask of the CPU feature bits that will always be true for the given .config, based on the platform/bitness/etc. that we are building for. That is incompatible with DT CPU features, where the set of CPU features is dependent on feature flags that are given to us by firmware. The result is that some feature bits can not be *disabled* by DT CPU features. Or more accurately, they can be disabled but they will still appear in the ALWAYS mask, meaning cpu_has_feature() will always return true for them. In the past this hasn't really been a problem because on Book3S 64 (where we support DT CPU features), the set of ALWAYS bits has been very small. That was because we always built for POWER4 and later, meaning the set of common bits was small. The only bit that could be cleared by DT CPU features that was also in the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in the alignment handler to create a fake DSISR. That code was itself deleted in 31bfdb03 ("powerpc: Use instruction emulation infrastructure to handle alignment faults") (Sep 2017). However the set of ALWAYS features changed with the recent commit db5ae1c1 ("powerpc/64s: Refine feature sets for little endian builds") which restricted the set of feature flags when building little endian to Power7 or later. That caused the ALWAYS mask to become much larger for little endian builds. The result is that the following feature bits can currently not be *disabled* by DT CPU features: CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT, CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY, CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR. To fix it we need to mask the set of ALWAYS features with the base set of DT CPU features, ie. the features that are always enabled by DT CPU features. That way there are no bits in the ALWAYS mask that are not also always set by DT CPU features. Fixes: db5ae1c1 ("powerpc/64s: Refine feature sets for little endian builds") Reviewed-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
-
由 Thomas Petazzoni 提交于
On SuperH, the base of the physical memory might be different from zero. In this case, PCI address zero will map to a non-zero physical address. In order to make sure that the DMA mapping API takes care of this DMA offset, we must fill in the dev->dma_pfn_offset field for PCI devices. This gets done in the pcibios_bus_add_device() hook, called for each new PCI device detected. The dma_pfn_offset global variable is re-calculated for every PCI controller available on the platform, but that's not an issue because its value will each time be exactly the same, as it only depends on the memory start address and memory size. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Thomas Petazzoni 提交于
The code setting up the PCI -> SuperHighway mapping doesn't take into account the fact that the address stored in PCIELARx must be aligned with the size stored in PCIELAMRx. For example, when your physical memory starts at 0x0800_0000 (128 MB), a size of 64 MB or 128 MB is fine. However, if you have 256 MB of memory, it doesn't work because the base address is not aligned on the size. In such situation, we have to round down the base address to make sure it is aligned on the size of the area. For for a 0x0800_0000 base address with 256 MB of memory, we will round down to 0x0, and extend the size of the mapping to 512 MB. This allows the mapping to work on platforms that have 256 MB of RAM. The current setup would only work with 128 MB of RAM or less. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Thomas Petazzoni 提交于
The current definition of the PCIe IO and MEM resources for SH7786 doesn't match what the datasheet says. For example, for PCIe0 0xfe100000 is advertised by the datasheet as a PCI IO region, while 0xfd000000 is advertised as a PCI MEM region. The code currently inverts the two. The SH4A_PCIEPARL and SH4A_PCIEPTCTLR registers allow to define the base address and role of the different regions (including whether it's a MEM or IO region). However, practical experience on a SH7786 shows that if 0xfe100000 is used for LEL and 0xfd000000 for IO, a PCIe device using two MEM BARs cannot be accessed at all. Simply using 0xfe100000 for IO and 0xfd000000 for MEM makes the PCIe device accessible. It is very likely that this was never seen because there are two other PCI MEM region listed in the resources. However, for different reasons, none of the two other MEM regions are usable on the specific SH7786 platform the problem was encountered. Therefore, the last MEM region at 0xfe100000 was used to place the BARs, making the device non-functional. This commit therefore adjusts those PCI MEM and IO resources definitions so that they match what the datasheet says. They have only been tested with PCIe 0. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Thomas Petazzoni 提交于
Depending on the physical memory layout, some PCI MEM areas are not usable. According to the SH7786 datasheet, the PCI MEM area from 1000_0000 to 13FF_FFFF is only usable if the physical memory layout (in MMSELR) is 1, 2, 5 or 6. In all other configurations, this PCI MEM area is not usable (because it overlaps with DRAM). Therefore, this commit adjusts the PCI SH7786 initialization to mark the relevant PCI resource as IORESOURCE_DISABLED if we can't use it. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Thomas Petazzoni 提交于
Some PCI MEM resources are marked as IORESOURCE_MEM_32BIT, which means they are only usable when the SH core runs in 32-bit mode. In 29-bit mode, such memory regions are not usable. The existing code for SH7786 properly skips such regions when configuring the PCIe controller registers. However, because such regions are still described in the resource array, the pcibios_scanbus() function in the SuperH pci.c will register them to the PCI core. Due to this, the PCI core will allocate MEM areas from this resource, and assign BARs pointing to this area, even though it's unusable. In order to prevent this from happening, we mark such regions as IORESOURCE_DISABLED, which tells the SuperH pci.c pcibios_scanbus() function to skip them. Note that we separate marking the region as disabled from skipping it, because other regions will be marked as disabled in follow-up patches. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Thomas Petazzoni 提交于
In pcibios_scanbus(), we provide to the PCI core the usable MEM and IO regions using pci_add_resource_offset(). We travel through all resources available in the "struct pci_channel". Also, in register_pci_controller(), we travel through all resources to request them, making sure they don't conflict with already requested resources. However, some resources may be disabled, in which case they should not be requested nor provided to the PCI core. In the current situation, none of the resources are disabled. However, follow-up patches in this series will make some resources disabled, making this preliminary change necessary. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Thomas Petazzoni 提交于
Some devices may have a non-zero DMA offset, i.e an offset between the DMA address and the physical address. Such an offset can be encoded into the dma_pfn_offset field of "struct device", but the SuperH implementation of the DMA mapping API does not observe this information. This commit fixes that by ensuring the DMA address is properly calculated depending on this DMA offset. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Thomas Petazzoni 提交于
The SH7786 has different physical memory layout configurations, configurable through the MMSELR register. The configuration is typically defined by the bootloader, so Linux generally doesn't care. Except that depending on the configuration, some PCI MEM areas may or may not be available. This commit adds a helper function that allows to retrieve the current physical memory layout configuration. It will be used in a following patch to exclude unusable PCI MEM areas during the PCI initialization. Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Rich Felker 提交于
When responding to a debug trap (breakpoint) in userspace, the kernel's trap handler raised SIGTRAP but returned from the trap via a code path that ignored pending signals, resulting in an infinite loop re-executing the trapping instruction. Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Rich Felker 提交于
unflatten_device_tree() makes use of memblock allocation, and therefore must be called before paging_init() migrates the memblock allocation data to the bootmem framework. Otherwise the record of the allocation for the expanded device tree will be lost, and will eventually be clobbered when allocated for another use. Signed-off-by: NRich Felker <dalias@libc.org>
-
由 Aurelien Jarno 提交于
Commit 00b73d8d ("sh: add working futex atomic ops on userspace addresses for smp") changed the futex_atomic_op_inuser function to use a loop. In case of the FUTEX_OP_SET op with a userspace address containing a value different of 0, this loop is an endless loop. Fix that by loading the value of oldval from the userspace before doing the cmpxchg op, also for the FUTEX_OP_SET case. Signed-off-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NRich Felker <dalias@libc.org>
-
- 12 4月, 2018 14 次提交
-
-
由 Michael Ellerman 提交于
In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to construct tlbiel instructions. The instruction takes 5 fields, two of which are registers, and the others are constants. But because it's constructed with inline asm the compiler doesn't know that. We got the constraint wrong on the 'r' field, using "r" tells the compiler to put the value in a register. The value we then get in the macro is the *register number*, not the value of the field. That means when we mask the register number with 0x1 we get 0 or 1 depending on which register the compiler happens to put the constant in, eg: li r10,1 tlbiel r8,r9,2,0,0 li r7,1 tlbiel r10,r6,0,0,1 If we're unlucky we might generate an invalid instruction form, for example RIC=0, PRS=1 and R=0, tlbiel r8,r7,0,1,0, this has been observed to cause machine checks: Oops: Machine check, sig: 7 [#1] CPU: 24 PID: 0 Comm: swapper NIP: 00000000000385f4 LR: 000000000100ed00 CTR: 000000000000007f REGS: c00000000110bb40 TRAP: 0200 MSR: 9000000000201003 <SF,HV,ME,RI,LE> CR: 48002222 XER: 20040000 CFAR: 00000000000385d0 DAR: 0000000000001c00 DSISR: 00000200 SOFTE: 1 If the machine check happens early in boot while we have MSR_ME=0 it will escalate into a checkstop and kill the box entirely. To fix it we could change the inline asm constraint to "i" which tells the compiler the value is a constant. But a better fix is to just pass a literal 1 into the macro, which bypasses any problems with inline asm constraints. Fixes: d4748276 ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9") Cc: stable@vger.kernel.org # v4.16+ Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NNicholas Piggin <npiggin@gmail.com>
-
由 Joerg Roedel 提交于
The pmd_set_huge() and pud_set_huge() functions are used from the generic ioremap() code to establish large mappings where this is possible. But the generic ioremap() code does not check whether the PMD/PUD entries are already populated with a non-leaf entry, so that any page-table pages these entries point to will be lost. Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a BUG_ON() in vmalloc_sync_one() when PMD entries are synced from swapper_pg_dir to the current page-table. This happens because the PMD entry from swapper_pg_dir was promoted to a huge-page entry while the current PGD still contains the non-leaf entry. Because both entries are present and point to a different page, the BUG_ON() triggers. This was actually triggered with pti-x32 enabled in a KVM virtual machine by the graphics driver. A real and better fix for that would be to improve the page-table handling in the generic ioremap() code. But that is out-of-scope for this patch-set and left for later work. Reported-by: NDavid H. Gutteridge <dhgutteridge@sympatico.ca> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: David Laight <David.Laight@aculab.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eduardo Valentin <eduval@amazon.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Waiman Long <llong@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
Global pages are bad for hardening because they potentially let an exploit read the kernel image via a Meltdown-style attack which makes it easier to find gadgets. But, global pages are good for performance because they reduce TLB misses when making user/kernel transitions, especially when PCIDs are not available, such as on older hardware, or where a hypervisor has disabled them for some reason. This patch implements a basic, sane policy: If you have PCIDs, you only map a minimal amount of kernel text global. If you do not have PCIDs, you map all kernel text global. This policy effectively makes PCIDs something that not only adds performance but a little bit of hardening as well. I ran a simple "lseek" microbenchmark[1] to test the benefit on a modern Atom microserver. Most of the benefit comes from applying the series before this patch ("entry only"), but there is still a signifiant benefit from this patch. No Global Lines (baseline ): 6077741 lseeks/sec 88 Global Lines (entry only): 7528609 lseeks/sec (+23.9%) 94 Global Lines (this patch): 8433111 lseeks/sec (+38.8%) [1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.cSigned-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205518.E3D989EB@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
Summary: In current kernels, with PTI enabled, no pages are marked Global. This potentially increases TLB misses. But, the mechanism by which the Global bit is set and cleared is rather haphazard. This patch makes the process more explicit. In the end, it leaves us with Global entries in the page tables for the areas truly shared by userspace and kernel and increases TLB hit rates. The place this patch really shines in on systems without PCIDs. In this case, we are using an lseek microbenchmark[1] to see how a reasonably non-trivial syscall behaves. Higher is better: No Global pages (baseline): 6077741 lseeks/sec 88 Global Pages (this set): 7528609 lseeks/sec (+23.9%) On a modern Skylake desktop with PCIDs, the benefits are tangible, but not huge for a kernel compile (lower is better): No Global pages (baseline): 186.951 seconds time elapsed ( +- 0.35% ) 28 Global pages (this set): 185.756 seconds time elapsed ( +- 0.09% ) -1.195 seconds (-0.64%) I also re-checked everything using the lseek1 test[1]: No Global pages (baseline): 15783951 lseeks/sec 28 Global pages (this set): 16054688 lseeks/sec +270737 lseeks/sec (+1.71%) The effect is more visible, but still modest. Details: The kernel page tables are inherited from head_64.S which rudely marks them as _PAGE_GLOBAL. For PTI, we have been relying on the grace of $DEITY and some insane behavior in pageattr.c to clear _PAGE_GLOBAL. This patch tries to do better. First, stop filtering out "unsupported" bits from being cleared in the pageattr code. It's fine to filter out *setting* these bits but it is insane to keep us from clearing them. Then, *explicitly* go clear _PAGE_GLOBAL from the kernel identity map. Do not rely on pageattr to do it magically. After this patch, we can see that "GLB" shows up in each copy of the page tables, that we have the same number of global entries in each and that they are the *same* entries. /sys/kernel/debug/page_tables/current_kernel:11 /sys/kernel/debug/page_tables/current_user:11 /sys/kernel/debug/page_tables/kernel:11 9caae8ad6a1fb53aca2407ec037f612d current_kernel.GLB 9caae8ad6a1fb53aca2407ec037f612d current_user.GLB 9caae8ad6a1fb53aca2407ec037f612d kernel.GLB A quick visual audit also shows that all the entries make sense. 0xfffffe0000000000 is the cpu_entry_area and 0xffffffff81c00000 is the entry/exit text: 0xfffffe0000000000-0xfffffe0000002000 8K ro GLB NX pte 0xfffffe0000002000-0xfffffe0000003000 4K RW GLB NX pte 0xfffffe0000003000-0xfffffe0000006000 12K ro GLB NX pte 0xfffffe0000006000-0xfffffe0000007000 4K ro GLB x pte 0xfffffe0000007000-0xfffffe000000d000 24K RW GLB NX pte 0xfffffe000002d000-0xfffffe000002e000 4K ro GLB NX pte 0xfffffe000002e000-0xfffffe000002f000 4K RW GLB NX pte 0xfffffe000002f000-0xfffffe0000032000 12K ro GLB NX pte 0xfffffe0000032000-0xfffffe0000033000 4K ro GLB x pte 0xfffffe0000033000-0xfffffe0000039000 24K RW GLB NX pte 0xffffffff81c00000-0xffffffff81e00000 2M ro PSE GLB x pmd [1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.cSigned-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205517.C80FBE05@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
The entry/exit text and cpu_entry_area are mapped into userspace and the kernel. But, they are not _PAGE_GLOBAL. This creates unnecessary TLB misses. Add the _PAGE_GLOBAL flag for these areas. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205515.2977EE7D@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
__ro_after_init data gets stuck in the .rodata section. That's normally fine because the kernel itself manages the R/W properties. But, if we run __change_page_attr() on an area which is __ro_after_init, the .rodata checks will trigger and force the area to be immediately read-only, even if it is early-ish in boot. This caused problems when trying to clear the _PAGE_GLOBAL bit for these area in the PTI code: it cleared _PAGE_GLOBAL like I asked, but also took it up on itself to clear _PAGE_RW. The kernel then oopses the next time it wrote to a __ro_after_init data structure. To fix this, add the kernel_set_to_readonly check, just like we have for kernel text, just a few lines below in this function. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Acked-by: NKees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
I was mystified as to where the _PAGE_GLOBAL in the kernel page tables for kernel text came from. I audited all the places I could find, but I missed one: head_64.S. The page tables that we create in here live for a long time, and they also have _PAGE_GLOBAL set, despite whether the processor supports it or not. It's harmless, and we got *lucky* that the pageattr code accidentally clears it when we wipe it out of __supported_pte_mask and then later try to mark kernel text read-only. Comment some of these properties to make it easier to find and understand in the future. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205513.079BB265@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
The pageattr code has a mode where it can set or clear PTE bits in existing PTEs, so the page protections of the *new* PTEs come from one of two places: 1. The set/clear masks: cpa->mask_clr / cpa->mask_set 2. The existing PTE We filter ->mask_set/clr for supported PTE bits at entry to __change_page_attr() so we never need to filter them again. The only other place permissions can come from is an existing PTE and those already presumably have good bits. We do not need to filter them again. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205511.BC072352@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
A PTE is constructed from a physical address and a pgprotval_t. __PAGE_KERNEL, for instance, is a pgprot_t and must be converted into a pgprotval_t before it can be used to create a PTE. This is done implicitly within functions like pfn_pte() by massage_pgprot(). However, this makes it very challenging to set bits (and keep them set) if your bit is being filtered out by massage_pgprot(). This moves the bit filtering out of pfn_pte() and friends. For users of PAGE_KERNEL*, filtering will be done automatically inside those macros but for users of __PAGE_KERNEL*, they need to do their own filtering now. Note that we also just move pfn_pte/pmd/pud() over to check_pgprot() instead of massage_pgprot(). This way, we still *look* for unsupported bits and properly warn about them if we find them. This might happen if an unfiltered __PAGE_KERNEL* value was passed in, for instance. - printk format warning fix from: Arnd Bergmann <arnd@arndb.de> - boot crash fix from: Tom Lendacky <thomas.lendacky@amd.com> - crash bisected by: Mike Galbraith <efault@gmx.de> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reported-and-fixed-by: NArnd Bergmann <arnd@arndb.de> Fixed-by: NTom Lendacky <thomas.lendacky@amd.com> Bisected-by: NMike Galbraith <efault@gmx.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Helge Deller 提交于
When issuing a "shutdown -h now", the reboot syscall calls kernel_halt() which shouldn't return, otherwise one gets this panic: reboot: System halted Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000 CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.16.0-32bit+ #560 Backtrace: [<1018a694>] show_stack+0x18/0x28 [<106e68a8>] dump_stack+0x80/0x10c [<101a4df8>] panic+0xfc/0x290 [<101a90b8>] do_exit+0x73c/0x914 [<101c7e38>] SyS_reboot+0x190/0x1d4 [<1017e444>] syscall_exit+0x0/0x14 Fix it by letting machine_halt() call machine_power_off() which doesn't return. Signed-off-by: NHelge Deller <deller@gmx.de>
-
由 Ard Biesheuvel 提交于
Add support macros to conditionally yield the NEON (and thus the CPU) that may be called from the assembler code. In some cases, yielding the NEON involves saving and restoring a non trivial amount of context (especially in the CRC folding algorithms), and so the macro is split into three, and the code in between is only executed when the yield path is taken, allowing the context to be preserved. The third macro takes an optional label argument that marks the resume path after a yield has been performed. Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: NDave Martin <Dave.Martin@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
由 Ard Biesheuvel 提交于
We are going to add code to all the NEON crypto routines that will turn them into non-leaf functions, so we need to manage the stack frames. To make this less tedious and error prone, add some macros that take the number of callee saved registers to preserve and the extra size to allocate in the stack frame (for locals) and emit the ldp/stp sequences. Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: NDave Martin <Dave.Martin@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
由 Marc Zyngier 提交于
bpi.S was introduced as we were starting to build the Spectre v2 mitigation framework, and it was rather unclear that it would become strictly KVM specific. Now that the picture is a lot clearer, let's move the content of that file to hyp-entry.S, where it actually belong. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
由 Marc Zyngier 提交于
The very existence of __smccc_workaround_1_hvc_* is a thinko, as KVM will never use a HVC call to perform the branch prediction invalidation. Even as a nested hypervisor, it would use an SMC instruction. Let's get rid of it. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-