- 23 January 2017, 3 commits
-
-
Submitted by Denys Vlasenko

A lot of asm-optimized routines in arch/x86/crypto/ keep their constants in .data. This is wrong; they should be in .rodata. Many of these constants are the same in different modules. For example, the 128-bit shuffle mask 0x000102030405060708090A0B0C0D0E0F exists in at least half a dozen places. There is a way to let the linker merge them and use just one copy. The rules are as follows: mergeable objects of different sizes should not share sections. You can't put them all in one .rodata section, or they will lose their "mergeability". GCC puts its mergeable constants in ".rodata.cstSIZE" sections, or ".rodata.cstSIZE.<object_name>" if -fdata-sections is used. This patch does the same:

  .section .rodata.cst16.SHUF_MASK, "aM", @progbits, 16

It is important that all data in such a section consists of 16-byte elements, not larger ones, and that there is no implicit use of one element from another. When this is not the case, use a non-mergeable section:

  .section .rodata[.VAR_NAME], "a", @progbits

This reduces .data by ~15 kbytes:

      text     data      bss       dec     hex  filename
  11097415  2705840  2630712  16433967  fac32f  vmlinux-prev.o
  11112095  2690672  2630712  16433479  fac147  vmlinux.o

Merged objects are visible in System.map:

  ffffffff81a28810 r POLY
  ffffffff81a28810 r POLY
  ffffffff81a28820 r TWOONE
  ffffffff81a28820 r TWOONE
  ffffffff81a28830 r PSHUFFLE_BYTE_FLIP_MASK <- merged regardless of
  ffffffff81a28830 r SHUF_MASK   <------------- the name difference
  ffffffff81a28830 r SHUF_MASK
  ffffffff81a28830 r SHUF_MASK
  ..
  ffffffff81a28d00 r K512 <- merged three identical 640-byte tables
  ffffffff81a28d00 r K512
  ffffffff81a28d00 r K512

Use of object names in section name suffixes is not strictly necessary, but might help if the link stage someday uses garbage collection to eliminate unused sections (ld --gc-sections). Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Josh Poimboeuf <jpoimboe@redhat.com> CC: Xiaodong Liu <xiaodong.liu@intel.com> CC: Megha Dey <megha.dey@intel.com> CC: linux-crypto@vger.kernel.org CC: x86@kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
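As an illustration of the directive usage, here is a minimal sketch written as file-scope asm in C (the constant and its name are just the SHUF_MASK example from above; the actual patch edits the .S files directly):

  /*
   * Sketch: emit one mergeable 16-byte constant. The "aM" flags plus
   * the entity size of 16 let the linker fold identical 16-byte
   * entries from different object files into a single copy.
   */
  asm(
  "        .section .rodata.cst16.SHUF_MASK, \"aM\", @progbits, 16\n"
  "        .align 16\n"
  "SHUF_MASK:\n"
  "        .octa 0x000102030405060708090A0B0C0D0E0F\n"
  "        .previous\n"
  );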
-
Submitted by Denys Vlasenko

The %progbits form is used on ARM (where @ is a comment character); x86 consistently uses @progbits everywhere else. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: Josh Poimboeuf <jpoimboe@redhat.com> CC: Xiaodong Liu <xiaodong.liu@intel.com> CC: Megha Dey <megha.dey@intel.com> CC: George Spelvin <linux@horizon.com> CC: linux-crypto@vger.kernel.org CC: x86@kernel.org CC: linux-kernel@vger.kernel.org Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Ard Biesheuvel

The GNU assembler for ARM, version 2.22 or older, fails to infer the element size from the vmov instructions and aborts the build as follows:

  .../aes-neonbs-core.S: Assembler messages:
  .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[1],r10'
  .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1h[0],r9'
  .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[1],r8'
  .../aes-neonbs-core.S:817: Error: bad type for scalar -- `vmov q1l[0],r7'
  .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[1],r10'
  .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2h[0],r9'
  .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[1],r8'
  .../aes-neonbs-core.S:818: Error: bad type for scalar -- `vmov q2l[0],r7'

Fix this by setting the element size explicitly, replacing vmov with vmov.32. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
- 13 January 2017, 9 commits
-
-
Submitted by Ard Biesheuvel

The ARMv8-M architecture introduces the 'tt' and 'ttt' instructions, which means we can no longer use 'tt' as a register alias on recent versions of binutils for ARM. So replace the alias with 'ttab'. Fixes: 81edb426 ("crypto: arm/aes - replace scalar AES cipher") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
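For illustration, the aliasing mechanism in GNU as syntax, sketched as file-scope asm (r12 is an arbitrary choice here, not necessarily the register the real code uses):

  asm(
  "        ttab    .req    r12     @ was: 'tt .req r12', which now collides\n"
  "        mov     ttab, #0        @ uses of the alias are otherwise unchanged\n"
  "        .unreq  ttab\n"
  );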
-
Submitted by Ard Biesheuvel

This replaces the unwieldy generated implementation of bit-sliced AES in CBC/CTR/XTS modes that originated in the OpenSSL project with a new version that is heavily based on the OpenSSL implementation, but has a number of advantages over the old version:
- it does not rely on the scalar AES cipher that also originated in the OpenSSL project and contains redundant lookup tables and key schedule generation routines (which we already have in crypto/aes_generic.c)
- it uses the same expanded key schedule for encryption and decryption, reducing the size of the per-key data structure by 1696 bytes
- it adds an implementation of AES in ECB mode, which can be wrapped by other generic chaining mode implementations
- it moves the handling of corner cases that are non-critical to performance into the glue layer, which is written in C
- it was written directly in assembler rather than generated from a Perl script
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Ard Biesheuvel

This is a reimplementation of the NEON version of the bit-sliced AES algorithm. The code is heavily based on Andy Polyakov's OpenSSL version for ARM, which is also available in the kernel. It is an alternative to the existing NEON implementation for arm64 authored by me, which suffers from poor performance due to its reliance on the pathologically slow four-register variant of the tbl/tbx NEON instruction. This version is about ~30% (*) faster than the generic C code, but only in cases where the input can be 8x interleaved (this is a fundamental property of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR are implemented. (The significance of ECB is that it could potentially be used by other chaining modes.) (*) Measured on Cortex-A57. Note that this is still an order of magnitude slower than the implementations that use the dedicated AES instructions introduced in ARMv8, but those are part of an optional extension, so it is good to have a fallback. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Ard Biesheuvel

This replaces the scalar AES cipher that originates in the OpenSSL project with a new implementation that is ~15% (*) faster (on modern cores) and reuses the lookup tables and the key schedule generation routines from the generic C implementation (which is usually compiled in anyway, since networking and other subsystems depend on it). Note that the bit-sliced NEON code for AES still depends on the scalar cipher that this patch replaces, so it is not removed entirely yet. (*) On Cortex-A57, performance improves from 17.0 to 14.9 cycles per byte for 128-bit keys. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Ard Biesheuvel

This adds a scalar implementation of AES, based on the precomputed tables that are exposed by the generic AES code. Since rotates are cheap on arm64, this implementation uses only the 4 core tables (of 1 KB each) and avoids the prerotated ones, reducing the D-cache footprint by 75%. On Cortex-A57, this code manages 13.0 cycles per byte, which is ~34% faster than the generic C code. (Note that this is still >13x slower than the code that uses the optional ARMv8 Crypto Extensions, which manages <1 cycle per byte.) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
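The trade-off can be sketched in C: with cheap rotates, a single 1 KB table stands in for all four pre-rotated variants of the classic four-table formulation (function and table names here are hypothetical):

  #include <linux/types.h>
  #include <linux/bitops.h>       /* rol32() */

  /*
   * Sketch: one output column of an AES round computed from a single
   * forward table ft0[] plus rotations, instead of four pre-rotated
   * 1 KB tables ft0..ft3.
   */
  static u32 aes_round_column(const u32 ft0[256], u32 s0, u32 s1, u32 s2, u32 s3)
  {
          return       ft0[ s0        & 0xff]      ^
                 rol32(ft0[(s1 >>  8) & 0xff],  8) ^
                 rol32(ft0[(s2 >> 16) & 0xff], 16) ^
                 rol32(ft0[ s3 >> 24        ], 24);
  }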
-
Submitted by Ard Biesheuvel

In addition to wrapping the AES-CTR cipher into the async SIMD wrapper, which exposes it as an async skcipher that defers processing to process context, expose our AES-CTR implementation directly as a synchronous cipher as well, but with a lower priority. This makes the AES-CTR transform usable in places where synchronous transforms are required, such as the mac80211 encryption code, which executes in softirq context, where SIMD processing is allowed on arm64. Users of the async transform will keep the existing behavior. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Ard Biesheuvel

This is a straight port to ARM/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Ard Biesheuvel

This is a straight port to arm64/NEON of the x86 SSE3 implementation of the ChaCha20 stream cipher. It uses the new skcipher walksize attribute to process the input in strides of 4x the block size. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-
Submitted by Herbert Xu

The kernel on x86-64 cannot use the gcc "aligned" attribute to align a stack variable to a 16-byte boundary. This patch reverts to the old way of aligning it by hand. Fixes: 9ae433bc ("crypto: chacha20 - convert generic and...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
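A minimal sketch of the by-hand alignment pattern, with an illustrative buffer (the function name and sizes are assumptions, not the literal patch):

  #include <linux/kernel.h>       /* PTR_ALIGN() */
  #include <linux/types.h>

  static void chacha20_example(void)
  {
          /*
           * Over-allocate by (alignment - 1) bytes and round the pointer
           * up, since __aligned(16) on a stack variable is not honoured
           * on x86-64 kernel stacks.
           */
          u8 buf[64 + 15];                /* 64 == ChaCha20 block size */
          u8 *state = PTR_ALIGN(buf, 16);

          (void)state;    /* use 'state' as the 16-byte aligned buffer */
  }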
-
- 05 January 2017, 5 commits
-
-
Submitted by Jan Dakinevich

The declaration of VMX_VPID_EXTENT_SUPPORTED_MASK occurs twice in the code, probably as the result of an unsuccessful merge. Signed-off-by: Jan Dakinevich <jan.dakinevich@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Submitted by James Hogan

Flush the KVM entry code from the icache on all CPUs, not just the one that built the entry code. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.16.x- Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
-
Submitted by James Hogan

On 64-bit kernels, MIPS KVM will clear CP0_Status.UX to prevent the guest (running in user mode) from accessing the 64-bit memory segments. However, the previous value of CP0_Status.UX is never restored when exiting from the guest. If the user process uses 64-bit addressing (the n64 ABI), this can result in address error exceptions from the kernel if it needs to deliver a signal before returning to user mode, as the kernel will need to write a sigframe to high user addresses on the user stack, which are disallowed by CP0_Status.UX=0. This is fixed by explicitly setting SX and UX again when exiting from the guest, and explicitly clearing those bits when returning to the guest. Having the SX and UX bits set when handling guest exits (rather than only when exiting to userland) will be helpful when we support VZ, since we shouldn't need to directly read or write guest memory, so it will be valid for cache management IPIs to access host user addresses. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: <stable@vger.kernel.org> # 4.8.x- Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
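At the level of the CP0 accessor macros, the fix amounts to the following sketch (function names here are hypothetical; the actual change lives in the assembler guest entry/exit code):

  #include <asm/mipsregs.h>       /* set_c0_status(), clear_c0_status(), ST0_* */

  static void guest_exit_restore_segments(void)
  {
          /* restore kernel/user access to the 64-bit segments */
          set_c0_status(ST0_SX | ST0_UX);
  }

  static void guest_enter_hide_segments(void)
  {
          /* hide the 64-bit segments from guest user mode again */
          clear_c0_status(ST0_SX | ST0_UX);
  }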
-
Submitted by Mark Rutland

Commit c02433dd ("arm64: split thread_info from task stack") inverted the relationship between get_current() and current_thread_info(), with sp_el0 now holding the current task_struct rather than the current thread_info. The new implementation of get_current() prevents the compiler from optimizing repeated calls to either, resulting in a noticeable penalty in some microbenchmarks. This patch restores the previous optimisation by implementing get_current() in the same way as our old current_thread_info(), using a non-volatile asm statement. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reported-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
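The restored implementation is essentially the following; because the asm is not volatile and has no memory clobber, the compiler is free to common up repeated get_current() calls within a function:

  #include <linux/compiler.h>

  struct task_struct;

  static __always_inline struct task_struct *get_current(void)
  {
          unsigned long sp_el0;

          /* non-volatile: repeated reads of sp_el0 may be CSE'd */
          asm ("mrs %0, sp_el0" : "=r" (sp_el0));

          return (struct task_struct *)sp_el0;
  }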
-
Submitted by Mark Rutland

Recent changes made KERN_CONT mandatory for continued lines. In the absence of KERN_CONT, a newline may be implicitly inserted by the core printk code. In show_pte, we (erroneously) use printk without KERN_CONT for continued prints, resulting in output being split across a number of lines and not matching the intended output, e.g.:

  [ff000000000000] *pgd=00000009f511b003
  , *pud=00000009f4a80003
  , *pmd=0000000000000000

Fix this by using pr_cont() for all the continuations. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
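A sketch of the corrected pattern (the printed fields are illustrative):

  pr_alert("[%016lx] *pgd=%016llx", addr, pgd_val(pgd));
  pr_cont(", *pud=%016llx", pud_val(pud));
  pr_cont(", *pmd=%016llx", pmd_val(pmd));
  pr_cont("\n");

Each pr_cont() continues the preceding line instead of letting the core printk code start a new one.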
-
- 04 January 2017, 3 commits
-
-
Submitted by Kevin Hilman

Signed-off-by: Kevin Hilman <khilman@baylibre.com>
-
Submitted by Neil Armstrong

Add Video Processing Unit and CVBS Output nodes, and enable CVBS on selected boards. Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com> Signed-off-by: Kevin Hilman <khilman@baylibre.com>
-
Submitted by Kevin Hilman

Signed-off-by: Kevin Hilman <khilman@baylibre.com>
-
- 03 January 2017, 3 commits
-
-
Submitted by Fabio Estevam

Commit 1be81ea5 ("ARM: dts: imx6: Add imx-weim parameters to dtsi's") causes the following probe error when the weim node is not present in the board dts (such as imx6q-sabresd):

  imx-weim 21b8000.weim: Invalid 'ranges' configuration
  imx-weim: probe of 21b8000.weim failed with error -22

There is no need to always enable the "weim" node on mx6. Do the same as in the other i.MX dtsi files, where "weim" is disabled and only gets enabled on a per-dts basis. All the imx6 weim dts users explicitly provide 'status = "okay"', so this change has no impact on current imx6 weim users. If a board does not use the weim driver it will not describe its 'ranges' property, so simply disable the 'weim' node in the imx6 dtsi files to avoid such probe error messages. Fixes: 1be81ea5 ("ARM: dts: imx6: Add imx-weim parameters to dtsi's") Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
-
Submitted by Helge Deller

Add a leading line break, otherwise the printed line gets too long. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v4.9
-
Submitted by Bjorn Andersson

As per the device tree binding, the apq8064 scm node requires the core clock to be specified, so add it. Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Andy Gross <andy.gross@linaro.org>
-
- 02 January 2017, 9 commits
-
-
Submitted by Alexandre Bailon

Every time the usb20 phy is enabled, there is a "sleeping function called from invalid context" BUG. In addition, recursive locking happens because of the recursive call to clk_enable(). clk_enable() in arch/arm/mach-davinci/clock.c takes spin_lock_irqsave() before invoking the callback usb20_phy_clk_enable(), while usb20_phy_clk_enable() uses clk_get() and clk_prepare_enable(), which may sleep. Replace clk_prepare_enable() with davinci_clk_enable(). Signed-off-by: Alexandre Bailon <abailon@baylibre.com> Suggested-by: David Lechner <david@lechnology.com> [nsekhar@ti.com: minor commit description adjustment] Signed-off-by: Sekhar Nori <nsekhar@ti.com>
-
Submitted by Alexandre Bailon

In some cases, there is a need to enable a clock as part of the clock enable callback of a different clock. For example, the USB 2.0 PHY clock enable requires the USB 2.0 clock to be enabled. In this case, it is safe to instead call __clk_enable(), since the clock framework lock is already taken; calling clk_enable() causes a recursive locking error. A similar case arises in the clock disable path. To enable such usage, make the __clk_{enable,disable} functions publicly available outside of clock.c. Also, name them davinci_clk_{enable,disable} to be consistent with how other davinci-specific clock functions are named. Note that these functions are not exported to drivers; they are meant for use in platform-specific clock management code. Signed-off-by: Alexandre Bailon <abailon@baylibre.com> Suggested-by: David Lechner <david@lechnology.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
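Simplified from arch/arm/mach-davinci/clock.c, the recursion problem is easy to see (declarations trimmed to a sketch):

  #include <linux/spinlock.h>

  struct clk;
  void __clk_enable(struct clk *clk);     /* i.e. davinci_clk_enable() */

  static DEFINE_SPINLOCK(clockfw_lock);

  int clk_enable(struct clk *clk)
  {
          unsigned long flags;

          spin_lock_irqsave(&clockfw_lock, flags);
          __clk_enable(clk);      /* may call back into usb20_phy_clk_enable() */
          spin_unlock_irqrestore(&clockfw_lock, flags);

          return 0;
  }

  /*
   * Inside usb20_phy_clk_enable() the framework lock is already held,
   * so a nested clk_enable() would spin on clockfw_lock forever; the
   * unlocked davinci_clk_enable() must be used instead.
   */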
-
Submitted by Bartosz Golaszewski

Similarly to the aemif clock, this screws up the linked list of clock children. Create a separate clock for mdio, inheriting the rate from emac_clk. Cc: <stable@vger.kernel.org> # 3.12.x- Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> [nsekhar@ti.com: add a comment over mdio_clk to explain its existence + commit headline updates] Signed-off-by: Sekhar Nori <nsekhar@ti.com>
-
Submitted by Bartosz Golaszewski

The aemif clock is added twice to the lookup table in da850.c. This breaks the children list of pll0_sysclk3, since we're reusing the same list links in struct clk. When calling clk_set_rate(), we then get stuck in propagate_rate(). Create a separate clock for nand, inheriting the rate of the aemif clock, and retrieve it in the davinci_nand module. Cc: <stable@vger.kernel.org> # 4.9.x Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com>
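The corruption is the classic one of reusing a list_head (a simplified sketch; the field names follow mach-davinci's struct clk):

  #include <linux/list.h>

  struct clk {
          struct list_head childnode;     /* our entry in the parent's list */
          struct list_head children;      /* our own children */
          /* other fields omitted */
  };

  static void example(struct clk *aemif, struct clk *pll0_sysclk3)
  {
          /* two lookup-table entries for the same struct clk do this: */
          list_add(&aemif->childnode, &pll0_sysclk3->children);
          list_add(&aemif->childnode, &pll0_sysclk3->children);
          /*
           * After the second add, aemif->childnode points to itself, so
           * any walk of the children list (e.g. propagate_rate()) loops
           * forever.
           */
  }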
-
Submitted by Vladimir Murzin

There is no need to define map_io only for debug_ll_io_init(), since it is already called in devicemaps_init() when map_io is NULL. Apart from that, for a NOMMU build debug_ll_io_init() is a nop, which leads to the following error:

    CC      arch/arm/mach-imx/mach-imx1.o
  arch/arm/mach-imx/mach-imx1.c:40:13: error: 'debug_ll_io_init' undeclared here (not in a function)
    .map_io = debug_ll_io_init,
              ^
  make[1]: *** [arch/arm/mach-imx/mach-imx1.o] Error 1

Cc: Alexander Shiyan <shc_work@mail.ru> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
-
Submitted by Andreas Färber

Found while reviewing Marvell dsa bindings usage. Fixes: f283745b ("arm: vf610: zii devel b: Add support for switch interrupts") Cc: Andrew Lunn <andrew@lunn.ch> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Andreas Färber <afaerber@suse.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
-
Submitted by Gary Bisson

The NANDF_CS2 pad is also part of the wlan-vmmcgrp iomux group. Removing it from the usdhc2grp group avoids the following error:

  imx6q-pinctrl 20e0000.iomuxc: pin MX6Q_PAD_NANDF_CS2 already requested by regulators:regulator@4; cannot claim for 2194000.usdhc
  imx6q-pinctrl 20e0000.iomuxc: pin-187 (2194000.usdhc) status -22
  imx6q-pinctrl 20e0000.iomuxc: could not request pin 187 (MX6Q_PAD_NANDF_CS2) from group usdhc2grp on device 20e0000.iomuxc

Signed-off-by: Gary Bisson <gary.bisson@boundarydevices.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
-
Submitted by Vladimir Zapolskiy

On i.MX31 the AVIC interrupt controller base address is 0x68000000. The problem was shadowed by the AVIC driver, which takes the correct base address from a SoC-specific header file. Fixes: d2a37b3d ("ARM i.MX31: Add devicetree support") Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Reviewed-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: Shawn Guo <shawnguo@kernel.org>
-
Submitted by Stafford Horne

The build robot reports:

  .tmp_kallsyms1.o: In function `kallsyms_relative_base':
  >> (.rodata+0x8a18): undefined reference to `_text'

This happens when using 'make alldefconfig'. Adding the _text symbol to mark the start of the kernel, as on other architectures, fixes this. Signed-off-by: Stafford Horne <shorne@gmail.com> Acked-by: Jonas Bonn <jonas@southpole.se>
-
- 31 December 2016, 1 commit
-
-
Submitted by Kishon Vijay Abraham I

Add a 'gpios' property to the pcie1 dt node and populate it with GPIO3_23 in order to drive PCIE_RESETn high. This gets PCIe cards detected on the AM572X IDK board. Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
-
- 30 December 2016, 5 commits
-
-
Submitted by Sudeep Holla

The GICv2 CPU interface registers span 8K, not 4K as indicated in the DT. Only the GICC_DIR register is located after the initial 4K boundary, leaving a functional system but without support for separately EOI'ing and deactivating interrupts. After this change the system supports split priority drop and interrupt deactivation. This patch is based on a similar one from Christoffer Dall: commit 368400e2 ("ARM: dts: vexpress: Support GICC_DIR operations") Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
-
Submitted by Christoffer Dall

The GICv2 CPU interface registers span 8K, not 4K as indicated in the DT. Only the GICC_DIR register is located after the initial 4K boundary, leaving a functional system but without support for separately EOI'ing and deactivating interrupts. After this change the system supports split priority drop and interrupt deactivation. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> [sudeep.holla@arm.com: included same fix for tc1 platform too] Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
-
Submitted by Helge Deller

Commit 7e781418 ("signal: consolidate {TS,TLF}_RESTORE_SIGMASK code") introduced code with which the "restore sigmask" flag lives in task_struct instead of in ti->flags. Let's use this optimization on parisc too. Signed-off-by: Helge Deller <deller@gmx.de>
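With that consolidation in place, the generic helpers operate on the task_struct field whenever the arch stops defining TIF_RESTORE_SIGMASK, roughly as follows (a sketch of the pattern from the consolidating commit):

  static inline void set_restore_sigmask(void)
  {
          current->restore_sigmask = true;
          WARN_ON(!test_thread_flag(TIF_SIGPENDING));
  }

  static inline bool test_restore_sigmask(void)
  {
          return current->restore_sigmask;
  }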
-
Submitted by Helge Deller

The cr16 interval timer of each CPU is not synchronized to the cr16 timers in the other CPUs of an SMP system. So, delay the registration of the cr16 clocksource until all CPUs have been detected and then, if we are on an SMP machine, mark the cr16 clocksource as unstable and lower its rating before registering it with the clocksource framework. This patch fixes the stalled-CPU warnings which we have seen since the introduction of the cr16 clocksource. Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v4.8+
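A sketch of the demotion at registration time (the condition, rate expression and clocksource fields shown are illustrative, not the literal patch):

  #include <linux/clocksource.h>
  #include <linux/cpumask.h>

  static struct clocksource clocksource_cr16 = {
          .name   = "cr16",
          .rating = 300,
          /* .read and the remaining fields omitted in this sketch */
  };

  static void __init register_cr16(unsigned long cr16_hz)
  {
          /* runs late, once all CPUs have been brought up */
          if (num_online_cpus() > 1) {
                  /* per-CPU cr16 counters are not synchronized */
                  clocksource_cr16.name = "cr16_unstable";
                  clocksource_cr16.flags = CLOCK_SOURCE_UNSTABLE;
                  clocksource_cr16.rating = 0;
          }
          clocksource_register_hz(&clocksource_cr16, cr16_hz);
  }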
-
Submitted by Linus Torvalds

In commit 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot.

However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically.

On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit.

On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7.

So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And on architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too.

This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time.

The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it some generic thing that doesn't make the odd semantics explicit.

So this introduces the new architecture primitive clear_bit_unlock_is_negative_byte() and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better.

According to Nick, Power has the same hiccup x86 has, for example, but some other architectures may not even care.

All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be.

(The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit.)
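The generic fallback described above is essentially:

  #ifndef clear_bit_unlock_is_negative_byte
  /*
   * Portable fallback: an ordinary unlock followed by a test of bit 7,
   * where PG_waiters now lives. Architectures that can clear one bit
   * and test another in a single atomic sequence (like x86) override
   * this with their own definition.
   */
  static inline bool clear_bit_unlock_is_negative_byte(long nr, volatile void *mem)
  {
          clear_bit_unlock(nr, mem);
          return test_bit(7, mem);
  }
  #endif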
Acked-by: Nick Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 29 December 2016, 1 commit
-
-
Submitted by Stephen Boyd

This patch adds the required memory carveouts so that the kernel does not access memory that is in use or has been reserved for use by other remote processors. Signed-off-by: Andy Gross <andy.gross@linaro.org> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
-
- 28 December 2016, 1 commit
-
-
Submitted by Herbert Xu

This patch reverts the following commits:

  8621caa0
  80966672

I should not have applied them, because they had already been obsoleted by a subsequent patch series. They also cause a build failure because of the subsequent commit 9ae433bc. Fixes: 9ae433bc ("crypto: chacha20 - convert generic and...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
-