- 12 August 2016 (1 commit)

By Kees Cook
Guided by grsecurity's analogous __read_only markings in arch/arm, this applies several uses of __ro_after_init to structures that are only updated during __init.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
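
A minimal sketch of the annotation in question (the variable here is a hypothetical example; the patch itself annotates existing arch/arm structures): data marked __ro_after_init stays writable while __init code runs and is write-protected afterwards.

    #include <linux/cache.h>
    #include <linux/init.h>

    /* hypothetical boot-time tunable, writable only during __init */
    static unsigned long boot_tuning_value __ro_after_init;

    static int __init tuning_setup(void)
    {
            boot_tuning_value = 42;  /* last legal write */
            return 0;
    }
    early_initcall(tuning_setup);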

- 08 August 2016 (1 commit)

By Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

- 23 June 2016 (1 commit)

By Nicolas Pitre
Now that we don't support ARMv3 anymore, the loop-based delay code can convert microseconds into a number of loops using a 64-bit multiplication and more precision. This allows us to lift the hard limit of 3355 on the bogomips value, as loops_per_jiffy may now safely span the full 32-bit range.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
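
A hedged sketch of the conversion this enables, with an illustrative helper name (the kernel's real routine is in assembly): keeping the intermediate product in 64 bits is what removes the old overflow cap.

    #include <linux/math64.h>
    #include <linux/types.h>

    /* loops = usecs * lpj * HZ / 1e6, intermediate kept in 64 bits */
    static inline u32 usecs_to_loops(u32 usecs, u32 lpj, u32 hz)
    {
            return (u32)div_u64((u64)usecs * lpj * hz, 1000000);
    }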

- 16 January 2016 (1 commit)

By Kirill A. Shutemov
With the new refcounting we don't need to mark PMDs splitting, so let's drop the code to handle this. pmdp_splitting_flush() is not needed either: on splitting a PMD we will do pmdp_clear_flush() + set_pte_at(), and pmdp_clear_flush() will do an IPI as needed for fast_gup.

[arnd@arndb.de: fix unterminated ifdef in header file]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 17 December 2015 (1 commit)

By Nicolas Pitre
The ARM compiler inserts calls to __aeabi_idiv() and __aeabi_uidiv() when it needs to perform division on signed and unsigned integers. If a processor has support for the sdiv and udiv instructions, the kernel may overwrite the beginning of those functions with those instructions and a "bx lr" to get better performance.

To ensure that those functions are aligned to a 32-bit word for easier patching (which might not always be the case in Thumb mode) and that the two patched instructions end up in the same cache line, an 8-byte alignment is enforced when ARM_PATCH_IDIV is selected.

This was heavily inspired by a previous patch from Stephen Boyd.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
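
A hedged sketch of the run-time patch this alignment makes easy; the function flow is illustrative, and the instruction encodings are assumptions noted in the comments, not copied from the kernel.

    #include <linux/init.h>
    #include <linux/types.h>
    #include <asm/cacheflush.h>
    #include <asm/hwcap.h>

    static void __init patch_uidiv_sketch(void)
    {
            extern char __aeabi_uidiv[];
            u32 *fn = (u32 *)__aeabi_uidiv;  /* 8-byte aligned per this patch */

            if (!(elf_hwcap & HWCAP_IDIVA))
                    return;

            fn[0] = 0xe730f110;  /* assumed encoding of "udiv r0, r0, r1" */
            fn[1] = 0xe12fff1e;  /* assumed encoding of "bx lr" */
            flush_icache_range((unsigned long)fn, (unsigned long)(fn + 2));
    }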

- 15 December 2015 (1 commit)

By Russell King
The uaccess_with_memcpy() code is currently incompatible with the SW PAN code: it takes locks within the region where we've changed the DACR, potentially sleeping as a result. As we do not save and restore the DACR across co-operative sleep events, this can lead to an incorrect DACR value later in this code path.

Reported-by: Peter Rosin <peda@axentia.se>
Tested-by: Peter Rosin <peda@axentia.se>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 03 October 2015 (1 commit)

By Stephen Boyd
Add unwinding annotations so that unwinding from this function works properly.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 27 August 2015 (1 commit)

By Russell King
Provide a software-based implementation of the privileged no-access support found in ARMv8.1.

Userspace pages are mapped using a different domain number from the kernel and IO mappings. If we switch the user domain to "no access" when we enter the kernel, we can prevent the kernel from touching userspace. However, the kernel needs to be able to access userspace via the various user accessor functions. With the wrapping in the previous patch, we can temporarily enable access when the kernel needs user access, and re-disable it afterwards.

This allows us to trap non-intended accesses to userspace, e.g. those caused by an inadvertent dereference of the LIST_POISON* values, which, with appropriate user mappings set up, can be made to succeed. This in turn can allow use-after-free bugs to be further exploited than would otherwise be possible.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
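
A hedged sketch of the mechanism using the arm domain accessors; the wrapper names are illustrative rather than the commit's exact code.

    #include <asm/domain.h>

    /* temporarily grant kernel access to user pages */
    static inline unsigned int sw_pan_enable_uaccess(void)
    {
            unsigned int old = get_domain();

            set_domain((old & ~domain_mask(DOMAIN_USER)) |
                       domain_val(DOMAIN_USER, DOMAIN_CLIENT));
            return old;
    }

    static inline void sw_pan_restore(unsigned int old)
    {
            set_domain(old);  /* user domain back to "no access" */
    }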

- 25 August 2015 (1 commit)

By Russell King
Provide uaccess_save_and_enable() and uaccess_restore() to permit control of userspace visibility to the kernel, and hook these into the appropriate places in the kernel where we need to access userspace.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
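
The usage pattern, as a minimal sketch (the wrapper name is illustrative; arm_copy_to_user() is the raw assembly copier being bracketed):

    static inline unsigned long
    wrapped_copy_to_user(void __user *to, const void *from, unsigned long n)
    {
            unsigned int __ua_flags = uaccess_save_and_enable();

            n = arm_copy_to_user(to, from, n);
            uaccess_restore(__ua_flags);
            return n;
    }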

- 18 August 2015 (1 commit)

By Nicolas Pitre
The mmap semaphore should not be taken when page faults are disabled. Since pagefault_disable() no longer disables preemption, we now need to use faulthandler_disabled() in place of in_atomic().

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
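
A sketch of the check as it looks after the change, wrapped in a hypothetical helper:

    #include <linux/uaccess.h>
    #include <linux/mm_types.h>

    static bool may_take_mmap_sem(struct mm_struct *mm)
    {
            /* was: in_atomic() || !mm  ->  bail out to no_context */
            return !faulthandler_disabled() && mm;
    }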

- 04 July 2015 (1 commit)

By Russell King
We don't want GCC optimising our memset_io(), memcpy_fromio() or memcpy_toio() variants, so we must not call one of the standard functions. Provide a separate name for our assembly memcpy() and memset() functions, and use that instead, thereby bypassing GCC's ability to optimise these operations. GCC's optimisation may introduce unaligned accesses which are invalid for device mappings.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
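
A sketch of the resulting shape, assuming the separate assembly entry point is named mmiocpy as in the mainline version of this change:

    #include <linux/types.h>

    extern void mmiocpy(void *, const void *, size_t);  /* assembly memcpy alias */

    void _memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
    {
            /* a call GCC cannot replace with its own inline expansion */
            mmiocpy((void __force *)to, from, count);
    }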

- 26 May 2015 (1 commit)

By Antonio Ospite
Signed-off-by: Antonio Ospite <ao2@ao2.it>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>

- 09 May 2015 (1 commit)

By Russell King
BSYM() was invented to allow us to work around a problem with the assembler, where local symbols resolved by the assembler for the 'adr' instruction did not take account of their ISA.

Since we don't want BSYM() used elsewhere, replace BSYM() with a new macro 'badr', which is like the 'adr' pseudo-op, but with the BSYM() mechanics integrated into it. This ensures that the BSYM()-ification is only used in conjunction with 'adr'.

Acked-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 15 April 2015 (1 commit)

By Russell King
We have recently had an example of someone wanting to use a 90kHz timer for the software delay loop. udelay() needs to have at least microsecond resolution to allow drivers access to a delay mechanism with a reasonable chance of delaying the period they requested within at least a 50% margin of error, especially for small delays.

Discussion about the udelay() accuracy can be found at: https://lkml.org/lkml/2011/1/9/37

Reject timers which are unable to supply this level of resolution.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
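
A hedged sketch of the rejection test, with an illustrative helper name: a timer tick longer than one microsecond cannot honour udelay()'s contract.

    #include <linux/kernel.h>
    #include <linux/time64.h>

    static bool delay_timer_resolution_ok(unsigned long freq)
    {
            u64 ns_per_tick = DIV_ROUND_UP_ULL((u64)NSEC_PER_SEC, freq);

            return ns_per_tick <= NSEC_PER_USEC;  /* effectively >= 1 MHz */
    }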

- 30 March 2015 (1 commit)

By Ard Biesheuvel
This moves all fixup snippets to the .text.fixup section, which is a special section that gets emitted along with the .text section for each input object file, i.e., the snippets are kept much closer to the code they refer to, which helps prevent linker failure on large kernels.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 16 January 2015 (1 commit)

By Nicolas Pitre
This code was restored with commit 080fc66f ("ARM: Bring back ARMv3 IO and user access code") because the RiscPC memory bus does not understand half-word load/stores. However, only the IO code needed restoring, since the alternative user access code contains no half-word accesses, is already used when CONFIG_PREEMPT is set, and runs faster on a StrongARM.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 28 November 2014 (3 commits)

By Lin Yongting
The memory copy functions (memcpy, __copy_from_user, __copy_to_user) never had unwinding annotations added. Currently, when one of these functions accesses an invalid pointer, the backtrace shown will stop at that function or at some completely unrelated function. Add unwinding annotations in the hope of getting a more useful backtrace in the following cases:

1. die on accessing an invalid pointer in one of these functions
2. kprobe trapped at any instruction within these functions
3. interrupted at any instruction within these functions

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

By Lin Yongting
The memmove function never had unwinding annotations added. Currently, when memmove accesses an invalid pointer, the backtrace shown will stop at memmove or at some completely unrelated function. Add unwinding annotations in the hope of getting a more useful backtrace in the following cases:

1. die on accessing an invalid pointer in memmove
2. kprobe trapped at any instruction within memmove
3. interrupted at any instruction within memmove

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

By Lin Yongting
The __memzero function never had unwinding annotations added. Currently, when __memzero accesses an invalid pointer, the backtrace shown will stop at __memzero or at some completely unrelated function. Add unwinding annotations in the hope of getting a more useful backtrace in the following cases:

1. die on accessing an invalid pointer in __memzero
2. kprobe trapped at any instruction within __memzero
3. interrupted at any instruction within __memzero

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 21 November 2014 (1 commit)

By Lin Yongting
The memset function never had unwinding annotations added. Currently, when memset accesses a NULL pointer, the backtrace shown will stop at memset or at some completely unrelated function. Add unwinding annotations in the hope of getting a more useful backtrace when memset dies on a NULL pointer, is trapped by a kprobe, or is interrupted.

Signed-off-by: Lin Yongting <linyongting@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 13 September 2014 (1 commit)

By Victor Kamensky
Commit e38361d0 ("ARM: 8091/2: add get_user() support for 8 byte types") broke the V7 BE get_user() call when the target variable size is 64 bits but '*ptr' is 32 bits or smaller.

e38361d0 changed the type of __r2 from 'register unsigned long' to 'register typeof(x) __r2 asm("r2")', i.e. before the change, even when the target variable size was 64 bits, __r2 was still 32 bits. After that commit, for a 64-bit target variable, __r2 became 64 bits and now occupies two registers, r2 and r3. The issue in the BE case is that the r3 register holds the least significant word of __r2 and r2 holds the most significant word. But __get_user_4 still copies its result into r2 (the most significant word of __r2). Subsequent code copies from __r2 into x, but in the situation described it will pick up only garbage from the r3 register.

Special __get_user_64t_(124) functions are introduced. They are similar to the corresponding __get_user_(124) functions but store the result in the r3 register (the lsw in the case of a 64-bit __r2 in a BE image). Those functions are used by the get_user macro when the image is BE and the target variable size is 64 bits. Also, __get_user_lo8 is renamed to __get_user_32t_8 to get consistent naming across all cases.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Suggested-by: Daniel Thompson <daniel.thompson@linaro.org>
Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
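
The regression case, as a minimal sketch: a 64-bit destination read through a 32-bit user pointer, where the 32-bit result must land in the low word of the register pair on a big-endian kernel.

    #include <linux/uaccess.h>
    #include <linux/types.h>

    static int read_u32_into_u64(u32 __user *uptr, u64 *out)
    {
            u64 v;

            if (get_user(v, uptr))  /* 32-bit load, 64-bit destination */
                    return -EFAULT;
            *out = v;  /* BE kernels picked up garbage here before the fix */
            return 0;
    }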

- 18 July 2014 (2 commits)

By Daniel Thompson
Recent contributions, including to DRM and binder, introduce 64-bit values in their interfaces. A common motivation for this is to allow the same ABI for 32- and 64-bit userspaces (and therefore also a shared ABI for 32/64 hybrid userspaces). Anyhow, the developers would like to avoid gotchas like having to use copy_from_user().

This feature is already implemented on x86-32 and the majority of other 32-bit architectures. The current list of get_user_8 holdout architectures is: arm, avr32, blackfin, m32r, metag, microblaze, mn10300, sh.

Credit: My name sits rather uneasily at the top of this patch. The v1 and v2 versions of the patch were written by Rob Clark, and to produce v4 I mostly copied code from Russell King and H. Peter Anvin. However I have mangled the patch sufficiently that *blame* is rightfully mine even if credit should be more widely shared.

Changelog:
v5: updated to use the ret macro (requested by Russell King)
v4: remove an inlined add on big endian systems (spotted by Russell King), used __ARMEB__ rather than BIG_ENDIAN (to match rest of file), cleared r3 on EFAULT during __get_user_8
v3: fix a couple of checkpatch issues
v2: pass correct size to check_uaccess, and better handling of narrowing double word read with __get_user_xb() (Russell King's suggestion)
v1: original

Signed-off-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
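
What the change buys a driver, as a sketch (the helper is a hypothetical example): a 64-bit value can now be fetched with get_user() instead of copy_from_user().

    #include <linux/uaccess.h>
    #include <linux/types.h>

    static long get_u64_arg(unsigned long arg, u64 *val)
    {
            if (get_user(*val, (u64 __user *)arg))
                    return -EFAULT;
            return 0;
    }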

By Russell King
ARMv6 and greater introduced a new instruction ("bx") which can be used to return from function calls. Recent CPUs perform better when the "bx lr" instruction is used rather than the "mov pc, lr" instruction, and this sequence is strongly recommended by the ARM architecture manual (section A.4.1.1).

We provide a new macro "ret", with all its variants for the condition code, which will resolve to the appropriate instruction. Rather than doing this piecemeal and missing some instances, change all the "mov pc" instances to use the new macro, with the exception of the "movs" instruction and the kprobes code. This allows us to detect the "mov pc, lr" case and fix it up - and also gives us the possibility of deploying this for other registers depending on the CPU selection.

Reported-by: Will Deacon <will.deacon@arm.com>
Tested-by: Stephen Warren <swarren@nvidia.com> # Tegra Jetson TK1
Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> # mioa701_bootresume.S
Tested-by: Andrew Lunn <andrew@lunn.ch> # Kirkwood
Tested-by: Shawn Guo <shawn.guo@freescale.com>
Tested-by: Tony Lindgren <tony@atomide.com> # OMAPs
Tested-by: Gregory CLEMENT <gregory.clement@free-electrons.com> # Armada XP, 375, 385
Acked-by: Sekhar Nori <nsekhar@ti.com> # DaVinci
Acked-by: Christoffer Dall <christoffer.dall@linaro.org> # kvm/hyp
Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com> # PXA3xx
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> # Xen
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # ARMv7M
Tested-by: Simon Horman <horms+renesas@verge.net.au> # Shmobile
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 17 June 2014 (1 commit)

By Peter De Schrijver
In case there are several possible delay timers, choose the one with the highest resolution. This code relies on the fact that secondary CPUs have not yet been brought online when register_current_timer_delay() is called. This is ensured by implementing calibration_delay_done().

Signed-off-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Stephen Warren <swarren@nvidia.com>
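
The selection policy, as a hedged sketch with illustrative names: a newly registered timer replaces the current one only if its tick is finer.

    #include <asm/delay.h>

    static const struct delay_timer *cur_timer;

    static void register_candidate(const struct delay_timer *t)
    {
            /* higher frequency == finer resolution */
            if (!cur_timer || t->freq > cur_timer->freq)
                    cur_timer = t;
    }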

- 25 February 2014 (2 commits)

By Victor Kamensky
Rename the logical shift macros 'push' and 'pull', defined in arch/arm/include/asm/assembler.h, to 'lspush' and 'lspull'. That eliminates the name conflict between the 'push' logical shift macro and the 'push' instruction mnemonic, and allows assembler.h to be included in .S files that use the 'push' instruction.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

By Will Deacon
After a bunch of benchmarking on the interaction between dmb and pldw, it turns out that issuing the pldw *after* the dmb instruction can give modest performance gains (~3% atomic_add_return improvement on a dual A15). This patch adds prefetchw invocations to our barriered atomic operations, including cmpxchg, test_and_xxx and futexes.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
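
A hedged sketch of the ordering the benchmarks favoured, with compiler builtins standing in for the kernel's smp_mb()/prefetchw() and the ldrex/strex loop:

    static inline int barriered_add_return(int i, int *counter)
    {
            int result;

            __asm__ volatile("dmb ish" ::: "memory");  /* smp_mb() */
            __builtin_prefetch(counter, 1);            /* pldw, after the dmb */
            result = __atomic_add_fetch(counter, i, __ATOMIC_RELAXED);
            __asm__ volatile("dmb ish" ::: "memory");
            return result;
    }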

- 29 December 2013 (2 commits)

By Kim Phillips
Enable the compiler intrinsic for byte swapping on arch ARM. This allows the compiler to detect and be able to optimize out byte swappings, and has a very modest benefit on vmlinux size (Linaro gcc 4.8):

   text    data     bss      dec    hex filename
2840310  123932   61960  3026202 2e2d1a vmlinux-lart       #orig
2840152  123932   61960  3026044 2e2c7c vmlinux-lart       #builtin-bswap
6473120  314840 5616016 12403976 bd4508 vmlinux-mxs        #orig
6472586  314848 5616016 12403450 bd42fa vmlinux-mxs        #builtin-bswap
7419872  318372  379556  8117800 7bde28 vmlinux-imx_v6_v7  #orig
7419170  318364  379556  8117090 7bdb62 vmlinux-imx_v6_v7  #builtin-bswap

Signed-off-by: Kim Phillips <kim.phillips@freescale.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
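
What the intrinsic enables, as a small sketch: the compiler can now recognise and fold byte swaps, typically emitting a single rev instruction on ARMv6+.

    static inline unsigned int swap32(unsigned int v)
    {
            return __builtin_bswap32(v);  /* one 'rev' on ARMv6+ */
    }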

By Russell King
We don't need the offset for the first function name in each backtrace entry: it needlessly consumes screen space, and the address is virtually always the first or second instruction in the called function. Also, recognise stmfd instructions which include r10 as a valid stack-saving instruction, and when dumping the registers, dump six registers per line rather than five, and fix the wrapping.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 01 December 2013 (1 commit)

By Fabio Estevam
Currently mx53 (Cortex-A8) running at 1GHz reports:

Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice (1.5 clocks/loop).

The original object code looks like this:

00000010 <__loop_const_udelay>:
  10: e3e01000  mvn   r1, #0
  14: e51f201c  ldr   r2, [pc, #-28]  ; 0 <__loop_udelay-0x8>
  18: e5922000  ldr   r2, [r2]
  1c: e0800921  add   r0, r0, r1, lsr #18
  20: e1a00720  lsr   r0, r0, #14
  24: e0822b21  add   r2, r2, r1, lsr #22
  28: e1a02522  lsr   r2, r2, #10
  2c: e0000092  mul   r0, r2, r0
  30: e0800d21  add   r0, r0, r1, lsr #26
  34: e1b00320  lsrs  r0, r0, #6
  38: 01a0f00e  moveq pc, lr

0000003c <__loop_delay>:
  3c: e2500001  subs  r0, r0, #1
  40: 8afffffe  bhi   3c <__loop_delay>
  44: e1a0f00e  mov   pc, lr

After adding the '.align 3' directive to __loop_delay (align to 8 bytes):

00000010 <__loop_const_udelay>:
  10: e3e01000  mvn   r1, #0
  14: e51f201c  ldr   r2, [pc, #-28]  ; 0 <__loop_udelay-0x8>
  18: e5922000  ldr   r2, [r2]
  1c: e0800921  add   r0, r0, r1, lsr #18
  20: e1a00720  lsr   r0, r0, #14
  24: e0822b21  add   r2, r2, r1, lsr #22
  28: e1a02522  lsr   r2, r2, #10
  2c: e0000092  mul   r0, r2, r0
  30: e0800d21  add   r0, r0, r1, lsr #26
  34: e1b00320  lsrs  r0, r0, #6
  38: 01a0f00e  moveq pc, lr
  3c: e320f000  nop   {0}

00000040 <__loop_delay>:
  40: e2500001  subs  r0, r0, #1
  44: 8afffffe  bhi   40 <__loop_delay>
  48: e1a0f00e  mov   pc, lr
  4c: e320f000  nop   {0}

This now reports:

Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz, after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (Cortex-A9) and on mx27 (ARM926), which show the same BogoMIPS value before and after this patch.

Reported-by: Tom Evans <tom_usenet@optusnet.com.au>
Suggested-by: Tom Evans <tom_usenet@optusnet.com.au>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 21 November 2013 (1 commit)

By Will Deacon
Uwe reported a build failure when targeting a NOMMU platform with my recent prefetch changes:

arch/arm/lib/changebit.S: Assembler messages:
arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is not allowed for the current base architecture

This is due to use of the .arch_extension mp directive immediately prior to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to nothing if !CONFIG_SMP, gas will still choke on the directive. This patch fixes the issue by only emitting the sequence (including the directive) if CONFIG_SMP=y.

Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 29 October 2013 (1 commit)

By Steven Capper
The memory pinning code in uaccess_with_memcpy.c does not check for HugeTLB or THP pmds, and will enter an infinite loop should a __copy_to_user or __clear_user occur against a huge page. This patch adds detection code for huge pages to pin_page_for_write. As this code can be executed in a fast path, it refers to the actual pmds rather than the vma. If a HugeTLB or THP page is found (they have the same pmd representation on ARM), the page table spinlock is taken to prevent modification whilst the page is pinned.

On ARM, huge pages are only represented as pmds, thus no huge pud checks are performed. (For huge puds one would lock the page table in a similar manner as in the pmd case.)

Two helper functions are introduced: pmd_thp_or_huge will check whether or not a page is huge or transparent huge (which have the same pmd layout on ARM), and pmd_hugewillfault will detect whether or not a page fault will occur on write to the page. A sketch of both follows below.

Running the following test (with the chunking from read_zero removed):
$ dd if=/dev/zero of=/dev/null bs=10M count=1024
gave:
2.3 GB/s backed by normal pages,
2.9 GB/s backed by huge pages,
5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK.

After some discussion, it was decided not to adopt the HPAGE_MASK, as this would have a significant detrimental effect on the overall system latency due to page_table_lock being held for too long. This could be revisited if split huge page locks are adopted.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
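
A hedged sketch of the two helpers as described above; treat the exact predicates as assumptions rather than the verbatim kernel macros.

    /* on ARM, HugeTLB and THP share one pmd layout, so one test covers both */
    #define pmd_thp_or_huge(pmd)    (pmd_huge(pmd) || pmd_trans_huge(pmd))

    /* will a write to this huge page fault (page not young or not writable)? */
    #define pmd_hugewillfault(pmd)  (!pmd_young(pmd) || !pmd_write(pmd))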

- 30 September 2013 (1 commit)

By Will Deacon
The cost of changing a cacheline from shared to exclusive state can be significant, especially when this is triggered by an exclusive store, since it may result in having to retry the transaction. This patch prefixes our atomic bitops implementation with prefetchw, to try and grab the line in exclusive state from the start. The testop macro is left alone, since the barrier semantics limit the usefulness of prefetching data.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
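
A sketch of the resulting bitop shape, with a builtin atomic standing in for the exclusive load/store loop (names illustrative):

    static inline void prefetching_set_bit(int nr, unsigned long *p)
    {
            unsigned long mask = 1UL << (nr % (8 * sizeof(unsigned long)));

            p += nr / (8 * sizeof(unsigned long));
            __builtin_prefetch(p, 1);  /* pldw: claim the line exclusive */
            __atomic_fetch_or(p, mask, __ATOMIC_RELAXED);
    }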

- 17 September 2013 (1 commit)

By Linus Walleij
The Shark machine sub-architecture (also known as DNARD, the DIGITAL Network Appliance Reference Design) lacks a maintainer able to apply and test patches to modernize the architecture. It is suspected that the current kernel, while it compiles, does not even boot on this machine. The listed maintainer has expressed that he will not be able to spend any time on the maintenance for the coming year. So let's delete it from the kernel for now. It can always be resurrected with git revert if maintenance is resumed.

As the VIA82c505 PCI adapter was only used by this architecture, that gets deleted too.

Cc: arm@kernel.org
Cc: Alexander Schulz <alex@shark-linux.de>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

- 09 September 2013 (1 commit)

By Ard Biesheuvel
Commit 01956597 introduced a NEON accelerated version of the xor_blocks() function, but it needs the changes in this patch to allow it to be built as a module rather than statically into the kernel. This patch creates a separate module xor-neon.ko which exports the NEON inner xor_blocks() functions depended upon by the regular xor.ko if it is built with CONFIG_KERNEL_MODE_NEON=y.

Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 15 July 2013 (1 commit)

By Paul Gortmaker
The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) and are flagged as __cpuinit -- so if we remove the __cpuinit from the arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit related content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless.

This removes all the ARM uses of the __cpuinit macros from C code, and all __CPUINIT from assembly code. It also had two ".previous" section statements that were paired off against __CPUINIT (aka .section ".cpuinit.text") that also get removed here.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

- 09 July 2013 (1 commit)

By Ard Biesheuvel
Add a source file xor-neon.c (which is really just the reference C implementation passed through the GCC vectorizer) and hook it up to the XOR framework.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
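
A sketch of the kind of reference loop handed to the vectorizer (function name illustrative); compiled with -ftree-vectorize and NEON enabled, GCC turns the scalar XORs into NEON vector operations.

    static void xor_blocks_ref(unsigned long bytes, unsigned long *p1,
                               const unsigned long *p2)
    {
            long lines = bytes / sizeof(unsigned long);

            while (lines--)
                    *p1++ ^= *p2++;
    }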

- 03 April 2013 (1 commit)

By Will Deacon
Commit 70264367 ("ARM: 7653/2: do not scale loops_per_jiffy when using a constant delay clock") fixed a problem with our timer-based delay loop, where loops_per_jiffy is scaled by cpufreq yet used directly by the timer delay ops. This patch fixes the problem in a more elegant way by keeping a private ticks_per_jiffy field in the delay ops, independent of loops_per_jiffy and therefore not subject to scaling. The loop-based delay continues to use loops_per_jiffy directly, as it should.

Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
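
The resulting separation, sketched (field order is an assumption): the timer path reads its own tick rate, so cpufreq rescaling of loops_per_jiffy no longer skews timer-based udelay().

    struct arm_delay_ops {
            void (*delay)(unsigned long);
            void (*const_udelay)(unsigned long);
            void (*udelay)(unsigned long);
            unsigned long ticks_per_jiffy;  /* private; never cpufreq-scaled */
    };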

- 12 March 2013 (1 commit)

By Nicolas Pitre
Commit 455bd4c4 ("ARM: 7668/1: fix memset-related crashes caused by recent GCC (4.7.2) optimizations") attempted to fix a compliance issue with the memset return value. However, the memset itself became broken by that patch for misaligned pointers.

This fixes the above by branching over the entry code from the misaligned fixup code to avoid reloading the original pointer. Also, because the function entry alignment is wrong in the Thumb mode compilation, that fixup code is moved to the end. While at it, the entry instructions are slightly reworked to help dual-issue pipelines.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Alexander Holler <holler@ahsoftware.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 08 March 2013 (1 commit)

By Ivan Djelic
Recent GCC versions (e.g. GCC-4.7.2) perform optimizations based on assumptions about the implementation of memset and similar functions. The current ARM optimized memset code does not return the value of its first argument, as is usually expected from standard implementations.

For instance, in the following function:

void debug_mutex_lock_common(struct mutex *lock, struct mutex_waiter *waiter)
{
	memset(waiter, MUTEX_DEBUG_INIT, sizeof(*waiter));
	waiter->magic = waiter;
	INIT_LIST_HEAD(&waiter->list);
}

compiled as:

800554d0 <debug_mutex_lock_common>:
800554d0: e92d4008  push  {r3, lr}
800554d4: e1a00001  mov   r0, r1
800554d8: e3a02010  mov   r2, #16  ; 0x10
800554dc: e3a01011  mov   r1, #17  ; 0x11
800554e0: eb04426e  bl    80165ea0 <memset>
800554e4: e1a03000  mov   r3, r0
800554e8: e583000c  str   r0, [r3, #12]
800554ec: e5830000  str   r0, [r3]
800554f0: e5830004  str   r0, [r3, #4]
800554f4: e8bd8008  pop   {r3, pc}

GCC assumes memset returns the value of pointer 'waiter' in register r0, causing register/memory corruptions.

This patch fixes the return value of the assembly version of memset. It adds a 'mov' instruction and merges an additional load+store into existing load/store instructions. For ease of review, here is a breakdown of the patch into 4 simple steps:

Step 1
======
Perform the following substitutions: ip -> r8, then r0 -> ip, and insert 'mov ip, r0' as the first statement of the function. At this point, we have a memset() implementation returning the proper result, but corrupting r8 on some paths (the ones that were using ip).

Step 2
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 1:

save r8:
-	str	lr, [sp, #-4]!
+	stmfd	sp!, {r8, lr}

and restore r8 on both exit paths:
-	ldmeqfd	sp!, {pc}		@ Now <64 bytes to go.
+	ldmeqfd	sp!, {r8, pc}		@ Now <64 bytes to go.
(...)
	tst	r2, #16
	stmneia	ip!, {r1, r3, r8, lr}
-	ldr	lr, [sp], #4
+	ldmfd	sp!, {r8, lr}

Step 3
======
Make sure r8 is saved and restored when (! CALGN(1)+0) == 0:

save r8:
-	stmfd	sp!, {r4-r7, lr}
+	stmfd	sp!, {r4-r8, lr}

and restore r8 on both exit paths:
	bgt	3b
-	ldmeqfd	sp!, {r4-r7, pc}
+	ldmeqfd	sp!, {r4-r8, pc}
(...)
	tst	r2, #16
	stmneia	ip!, {r4-r7}
-	ldmfd	sp!, {r4-r7, lr}
+	ldmfd	sp!, {r4-r8, lr}

Step 4
======
Rewrite register list "r4-r7, r8" as "r4-r8".

Signed-off-by: Ivan Djelic <ivan.djelic@parrot.com>
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

- 21 February 2013 (1 commit)

By Nicolas Pitre
When udelay() is implemented using an architected timer, it is wrong to scale loops_per_jiffy when changing the CPU clock frequency, since the timer clock remains constant. The lpj should probably become an implementation detail relevant to the CPU loop-based delay routine only, and more confined to it. In the meantime, this is the minimal fix needed to have expected delays with the timer-based implementation when cpufreq is also in use.

Reported-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Liviu Dudau <Liviu.Dudau@arm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>