- 28 2月, 2019 1 次提交
-
-
由 Daniel Borkmann 提交于
From networking side, there are numerous attempts to get rid of indirect calls in fast-path wherever feasible in order to avoid the cost of retpolines, for example, just to name a few: * 283c16a2 ("indirect call wrappers: helpers to speed-up indirect calls of builtin") * aaa5d90b ("net: use indirect call wrappers at GRO network layer") * 028e0a47 ("net: use indirect call wrappers at GRO transport layer") * 356da6d0 ("dma-mapping: bypass indirect calls for dma-direct") * 09772d92 ("bpf: avoid retpoline for lookup/update/delete calls on maps") * 10870dd8 ("netfilter: nf_tables: add direct calls for all builtin expressions") [...] Recent work on XDP from Björn and Magnus additionally found that manually transforming the XDP return code switch statement with more than 5 cases into if-else combination would result in a considerable speedup in XDP layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled builds. On i40e driver with XDP prog attached, a 20-26% speedup has been observed [0]. Aside from XDP, there are many other places later in the networking stack's critical path with similar switch-case processing. Rather than fixing every XDP-enabled driver and locations in stack by hand, it would be good to instead raise the limit where gcc would emit expensive indirect calls from the switch under retpolines and stick with the default as-is in case of !retpoline configured kernels. This would also have the advantage that for archs where this is not necessary, we let compiler select the underlying target optimization for these constructs and avoid potential slow-downs by if-else hand-rewrite. In case of gcc, this setting is controlled by case-values-threshold which has an architecture global default that selects 4 or 5 (latter if target does not have a case insn that compares the bounds) where some arch back ends like arm64 or s390 override it with their own target hooks, for example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of indirect branches") the threshold pretty much disables jump tables by limit of 20 under retpoline builds. Comparing gcc's and clang's default code generation on x86-64 under O2 level with retpoline build results in the following outcome for 5 switch cases: * gcc with -mindirect-branch=thunk-inline -mindirect-branch-register: # gdb -batch -ex 'disassemble dispatch' ./c-switch Dump of assembler code for function dispatch: 0x0000000000400be0 <+0>: cmp $0x4,%edi 0x0000000000400be3 <+3>: ja 0x400c35 <dispatch+85> 0x0000000000400be5 <+5>: lea 0x915f8(%rip),%rdx # 0x4921e4 0x0000000000400bec <+12>: mov %edi,%edi 0x0000000000400bee <+14>: movslq (%rdx,%rdi,4),%rax 0x0000000000400bf2 <+18>: add %rdx,%rax 0x0000000000400bf5 <+21>: callq 0x400c01 <dispatch+33> 0x0000000000400bfa <+26>: pause 0x0000000000400bfc <+28>: lfence 0x0000000000400bff <+31>: jmp 0x400bfa <dispatch+26> 0x0000000000400c01 <+33>: mov %rax,(%rsp) 0x0000000000400c05 <+37>: retq 0x0000000000400c06 <+38>: nopw %cs:0x0(%rax,%rax,1) 0x0000000000400c10 <+48>: jmpq 0x400c90 <fn_3> 0x0000000000400c15 <+53>: nopl (%rax) 0x0000000000400c18 <+56>: jmpq 0x400c70 <fn_2> 0x0000000000400c1d <+61>: nopl (%rax) 0x0000000000400c20 <+64>: jmpq 0x400c50 <fn_1> 0x0000000000400c25 <+69>: nopl (%rax) 0x0000000000400c28 <+72>: jmpq 0x400c40 <fn_0> 0x0000000000400c2d <+77>: nopl (%rax) 0x0000000000400c30 <+80>: jmpq 0x400cb0 <fn_4> 0x0000000000400c35 <+85>: push %rax 0x0000000000400c36 <+86>: callq 0x40dd80 <abort> End of assembler dump. * clang with -mretpoline emitting search tree: # gdb -batch -ex 'disassemble dispatch' ./c-switch Dump of assembler code for function dispatch: 0x0000000000400b30 <+0>: cmp $0x1,%edi 0x0000000000400b33 <+3>: jle 0x400b44 <dispatch+20> 0x0000000000400b35 <+5>: cmp $0x2,%edi 0x0000000000400b38 <+8>: je 0x400b4d <dispatch+29> 0x0000000000400b3a <+10>: cmp $0x3,%edi 0x0000000000400b3d <+13>: jne 0x400b52 <dispatch+34> 0x0000000000400b3f <+15>: jmpq 0x400c50 <fn_3> 0x0000000000400b44 <+20>: test %edi,%edi 0x0000000000400b46 <+22>: jne 0x400b5c <dispatch+44> 0x0000000000400b48 <+24>: jmpq 0x400c20 <fn_0> 0x0000000000400b4d <+29>: jmpq 0x400c40 <fn_2> 0x0000000000400b52 <+34>: cmp $0x4,%edi 0x0000000000400b55 <+37>: jne 0x400b66 <dispatch+54> 0x0000000000400b57 <+39>: jmpq 0x400c60 <fn_4> 0x0000000000400b5c <+44>: cmp $0x1,%edi 0x0000000000400b5f <+47>: jne 0x400b66 <dispatch+54> 0x0000000000400b61 <+49>: jmpq 0x400c30 <fn_1> 0x0000000000400b66 <+54>: push %rax 0x0000000000400b67 <+55>: callq 0x40dd20 <abort> End of assembler dump. For sake of comparison, clang without -mretpoline: # gdb -batch -ex 'disassemble dispatch' ./c-switch Dump of assembler code for function dispatch: 0x0000000000400b30 <+0>: cmp $0x4,%edi 0x0000000000400b33 <+3>: ja 0x400b57 <dispatch+39> 0x0000000000400b35 <+5>: mov %edi,%eax 0x0000000000400b37 <+7>: jmpq *0x492148(,%rax,8) 0x0000000000400b3e <+14>: jmpq 0x400bf0 <fn_0> 0x0000000000400b43 <+19>: jmpq 0x400c30 <fn_4> 0x0000000000400b48 <+24>: jmpq 0x400c10 <fn_2> 0x0000000000400b4d <+29>: jmpq 0x400c20 <fn_3> 0x0000000000400b52 <+34>: jmpq 0x400c00 <fn_1> 0x0000000000400b57 <+39>: push %rax 0x0000000000400b58 <+40>: callq 0x40dcf0 <abort> End of assembler dump. Raising the cases to a high number (e.g. 100) will still result in similar code generation pattern with clang and gcc as above, in other words clang generally turns off jump table emission by having an extra expansion pass under retpoline build to turn indirectbr instructions from their IR into switch instructions as a built-in -mno-jump-table lowering of a switch (in this case, even if IR input already contained an indirect branch). For gcc, adding --param=case-values-threshold=20 as in similar fashion as s390 in order to raise the limit for x86 retpoline enabled builds results in a small vmlinux size increase of only 0.13% (before=18,027,528 after=18,051,192). For clang this option is ignored due to i) not being needed as mentioned and ii) not having above cmdline parameter. Non-retpoline-enabled builds with gcc continue to use the default case-values-threshold setting, so nothing changes here. [0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/ and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track: - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdfSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: netdev@vger.kernel.org Cc: David S. Miller <davem@davemloft.net> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.net
-
- 16 1月, 2019 1 次提交
-
-
由 Borislav Petkov 提交于
The various x86 linker scripts use the three-argument linker script command variant OUTPUT_FORMAT(DEFAULT, BIG, LITTLE) which specifies three object file formats when the -EL and -EB linker command line options are used. When -EB is specified, OUTPUT_FORMAT issues the BIG object file format, when -EL, LITTLE, respectively, and when neither is specified, DEFAULT. However, those -E[LB] options are not used by arch/x86/ so switch to the simple OUTPUT_FORMAT(BFDNAME) macro variant. No functional changes. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20190109181531.27513-1-bp@alien8.de
-
- 12 1月, 2019 1 次提交
-
-
由 George Rimar 提交于
The kernel uses the OUTPUT_FORMAT linker script command in it's linker scripts. Most of the time, the -m option is passed to the linker with correct architecture, but sometimes (at least for x86_64) the -m option contradicts the OUTPUT_FORMAT directive. Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object files, but are linked with the -m elf_x86_64 linker flag when building for x86_64. The GNU linker manpage doesn't explicitly state any tie-breakers between -m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT overrides the emulation value specified with the -m option. LLVM lld has a different behavior, however. When supplied with contradicting -m and OUTPUT_FORMAT values it fails with the following error message: ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64 Therefore, just add the correct -m after the incorrect one (it overrides it), so the linker invocation looks like this: ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \ realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf This is not a functional change for GNU ld, because (although not explicitly documented) OUTPUT_FORMAT overrides -m EMULATION. Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting it in QEMU. [ bp: massage and clarify text. ] Suggested-by: NDmitry Golovin <dima@golovin.in> Signed-off-by: NGeorge Rimar <grimar@accesssoftek.com> Signed-off-by: NTri Vo <trong@android.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NTri Vo <trong@android.com> Tested-by: NNick Desaulniers <ndesaulniers@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Matz <matz@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: morbo@google.com Cc: ndesaulniers@google.com Cc: ruiu@google.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.com
-
- 09 1月, 2019 1 次提交
-
-
Accessing per-CPU variables is done by finding the offset of the variable in the per-CPU block and adding it to the address of the respective CPU's block. Section 3.10.8 of ld.bfd's documentation states: For expressions involving numbers, relative addresses and absolute addresses, ld follows these rules to evaluate terms: Other binary operations, that is, between two relative addresses not in the same section, or between a relative address and an absolute address, first convert any non-absolute term to an absolute address before applying the operator." Note that LLVM's linker does not adhere to the GNU ld's implementation and as such requires implicitly-absolute terms to be explicitly marked as absolute in the linker script. If not, it fails currently with: ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute Makefile:1040: recipe for target 'vmlinux' failed This is not a functional change for ld.bfd which converts the term to an absolute symbol anyways as specified above. Based on a previous submission by Tri Vo <trong@android.com>. Reported-by: NDmitry Golovin <dima@golovin.in> Signed-off-by: NRafael Ávila de Espíndola <rafael@espindo.la> [ Update commit message per Boris' and Michael's suggestions. ] Signed-off-by: NNick Desaulniers <ndesaulniers@google.com> [ Massage commit message more, fix typos. ] Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NDmitry Golovin <dima@golovin.in> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: Cao Jin <caoj.fnst@cn.fujitsu.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tri Vo <trong@android.com> Cc: dima@golovin.in Cc: morbo@google.com Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.com
-
- 06 1月, 2019 4 次提交
-
-
由 Masahiro Yamada 提交于
Now that Kbuild automatically creates asm-generic wrappers for missing mandatory headers, it is redundant to list the same headers in generic-y and mandatory-y. Suggested-by: NSam Ravnborg <sam@ravnborg.org> Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NSam Ravnborg <sam@ravnborg.org>
-
由 Masahiro Yamada 提交于
These comments are leftovers of commit fcc8487d ("uapi: export all headers under uapi directories"). Prior to that commit, exported headers must be explicitly added to header-y. Now, all headers under the uapi/ directories are exported. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
-
由 Masahiro Yamada 提交于
Since commit 9c2af1c7 ("kbuild: add .DELETE_ON_ERROR special target"), the target file is automatically deleted on failure. The boilerplate code ... || { rm -f $@; false; } is unneeded. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
-
由 Masahiro Yamada 提交于
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label". The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this: #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL) # define HAVE_JUMP_LABEL #endif We can improve this by testing 'asm goto' support in Kconfig, then make JUMP_LABEL depend on CC_HAS_ASM_GOTO. Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match to the real kernel capability. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
-
- 05 1月, 2019 8 次提交
-
-
由 Christoph Hellwig 提交于
In many cases we don't have to create a GART mapping at all, which also means there is nothing to unmap. Fix the range check that was incorrectly modified when removing the mapping_error method. Fixes: 9e8aa6b5 ("x86/amd_gart: remove the mapping_error dma_map_ops method") Reported-by: NMichal Kubecek <mkubecek@suse.cz> Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NMichal Kubecek <mkubecek@suse.cz>
-
由 Linus Torvalds 提交于
This has been broken forever, and nobody ever really noticed because it's purely a performance issue. Long long ago, in commit 6175ddf0 ("x86: Clean up mem*io functions") Brian Gerst simplified the memory copies to and from iomem, since on x86, the instructions to access iomem are exactly the same as the regular instructions. That is technically true, and things worked, and nobody said anything. Besides, back then the regular memcpy was pretty simple and worked fine. Nobody noticed except for David Laight, that is. David has a testing a TLP monitor he was writing for an FPGA, and has been occasionally complaining about how memcpy_toio() writes things one byte at a time. Which is completely unacceptable from a performance standpoint, even if it happens to technically work. The reason it's writing one byte at a time is because while it's technically true that accesses to iomem are the same as accesses to regular memory on x86, the _granularity_ (and ordering) of accesses matter to iomem in ways that they don't matter to regular cached memory. In particular, when ERMS is set, we default to using "rep movsb" for larger memory copies. That is indeed perfectly fine for real memory, since the whole point is that the CPU is going to do cacheline optimizations and executes the memory copy efficiently for cached memory. With iomem? Not so much. With iomem, "rep movsb" will indeed work, but it will copy things one byte at a time. Slowly and ponderously. Now, originally, back in 2010 when commit 6175ddf0 was done, we didn't use ERMS, and this was much less noticeable. Our normal memcpy() was simpler in other ways too. Because in fact, it's not just about using the string instructions. Our memcpy() these days does things like "read and write overlapping values" to handle the last bytes of the copy. Again, for normal memory, overlapping accesses isn't an issue. For iomem? It can be. So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and memset_io() functions. It doesn't particularly optimize them, but it tries to at least not be horrid, or do overlapping accesses. In fact, this uses the existing __inline_memcpy() function that we still had lying around that uses our very traditional "rep movsl" loop followed by movsw/movsb for the final bytes. Somebody may decide to try to improve on it, but if we've gone almost a decade with only one person really ever noticing and complaining, maybe it's not worth worrying about further, once it's not _completely_ broken? Reported-by: NDavid Laight <David.Laight@aculab.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
This actually enables the __put_user_goto() functionality in unsafe_put_user(). For an example of the effect of this, this is the code generated for the unsafe_put_user(signo, &infop->si_signo, Efault); in the waitid() system call: movl %ecx,(%rbx) # signo, MEM[(struct __large_struct *)_2] It's just one single store instruction, along with generating an exception table entry pointing to the Efault label case in case that instruction faults. Before, we would generate this: xorl %edx, %edx movl %ecx,(%rbx) # signo, MEM[(struct __large_struct *)_3] testl %edx, %edx jne .L309 with the exception table generated for that 'mov' instruction causing us to jump to a stub that set %edx to -EFAULT and then jumped back to the 'testl' instruction. So not only do we now get rid of the extra code in the normal sequence, we also avoid unnecessarily keeping that extra error register live across it all. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
This is finally the actual reason for the odd error handling in the "unsafe_get/put_user()" functions, introduced over three years ago. Using a "jump to error label" interface is somewhat odd, but very convenient as a programming interface, and more importantly, it fits very well with simply making the target be the exception handler address directly from the inline asm. The reason it took over three years to actually do this? We need "asm goto" support for it, which only became the default on x86 last year. It's now been a year that we've forced asm goto support (see commit e501ce95 "x86: Force asm-goto"), and so let's just do it here too. [ Side note: this commit was originally done back in 2016. The above commentary about timing is obviously about it only now getting merged into my real upstream tree - Linus ] Sadly, gcc still only supports "asm goto" with asms that do not have any outputs, so we are limited to only the put_user case for this. Maybe in several more years we can do the get_user case too. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joel Fernandes (Google) 提交于
Moving page-tables at the PMD-level on x86 is known to be safe. Enable this option so that we can do fast mremap when possible. Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.comSigned-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: NKirill A. Shutemov <kirill@shutemov.name> Acked-by: NKirill A. Shutemov <kirill@shutemov.name> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Michal Hocko <mhocko@kernel.org> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joel Fernandes (Google) 提交于
Patch series "Add support for fast mremap". This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle architecture related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since its unused. This patch therefore removes this argument tree-wide resulting in a nice negative diff as well. Also ensuring along the way that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization. Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing. The changes were obtained by applying the following Coccinelle script. (thanks Julia for answering all Coccinelle questions!). Following fix ups were done manually: * Removal of address argument from pte_fragment_alloc * Removal of pte_alloc_one_fast definitions from m68k and microblaze. // Options: --include-headers --no-includes // Note: I split the 'identifier fn' line, so if you are manually // running it, please unsplit it so it runs for you. virtual patch @pte_alloc_func_def depends on patch exists@ identifier E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; type T2; @@ fn(... - , T2 E2 ) { ... } @pte_alloc_func_proto_noarg depends on patch exists@ type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1, T2); + T3 fn(T1); | - T3 fn(T1, T2, T4); + T3 fn(T1, T2); ) @pte_alloc_func_proto depends on patch exists@ identifier E1, E2, E4; type T1, T2, T3, T4; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ ( - T3 fn(T1 E1, T2 E2); + T3 fn(T1 E1); | - T3 fn(T1 E1, T2 E2, T4 E4); + T3 fn(T1 E1, T2 E2); ) @pte_alloc_func_call depends on patch exists@ expression E2; identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; @@ fn(... -, E2 ) @pte_alloc_macro depends on patch exists@ identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$"; identifier a, b, c; expression e; position p; @@ ( - #define fn(a, b, c) e + #define fn(a, b) e | - #define fn(a, b) e + #define fn(a) e ) Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.comSigned-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org> Suggested-by: NKirill A. Shutemov <kirill@shutemov.name> Acked-by: NKirill A. Shutemov <kirill@shutemov.name> Cc: Michal Hocko <mhocko@kernel.org> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
When testing in userspace, UBSAN pointed out that shifting into the sign bit is undefined behaviour. It doesn't really make sense to ask for the highest set bit of a negative value, so just turn the argument type into an unsigned int. Some architectures (eg ppc) already had it declared as an unsigned int, so I don't expect too many problems. Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Originally, the rule used to be that you'd have to do access_ok() separately, and then user_access_begin() before actually doing the direct (optimized) user access. But experience has shown that people then decide not to do access_ok() at all, and instead rely on it being implied by other operations or similar. Which makes it very hard to verify that the access has actually been range-checked. If you use the unsafe direct user accesses, hardware features (either SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged Access Never - on ARM) do force you to use user_access_begin(). But nothing really forces the range check. By putting the range check into user_access_begin(), we actually force people to do the right thing (tm), and the range check vill be visible near the actual accesses. We have way too long a history of people trying to avoid them. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 1月, 2019 1 次提交
-
-
由 Linus Torvalds 提交于
Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 12月, 2018 2 次提交
-
-
由 Christophe Leroy 提交于
checkpatch.pl reports the following: WARNING: struct kgdb_arch should normally be const #28: FILE: arch/mips/kernel/kgdb.c:397: +struct kgdb_arch arch_kgdb_ops = { This report makes sense, as all other ops struct, this one should also be const. This patch does the change. Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Acked-by: NDaniel Thompson <daniel.thompson@linaro.org> Acked-by: NPaul Burton <paul.burton@mips.com> Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr> Acked-by: NBorislav Petkov <bp@suse.de> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
-
由 Douglas Anderson 提交于
The function kgdb_roundup_cpus() was passed a parameter that was documented as: > the flags that will be used when restoring the interrupts. There is > local_irq_save() call before kgdb_roundup_cpus(). Nobody used those flags. Anyone who wanted to temporarily turn on interrupts just did local_irq_enable() and local_irq_disable() without looking at them. So we can definitely remove the flags. Signed-off-by: NDouglas Anderson <dianders@chromium.org> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Acked-by: NWill Deacon <will.deacon@arm.com> Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
-
- 29 12月, 2018 7 次提交
-
-
由 Will Deacon 提交于
Whilst no architectures actually enable support for huge p4d mappings in the vmap area, the code that is implemented should be using break-before-make, as we do for pud and pmd huge entries. Link: http://lkml.kernel.org/r/1544120495-17438-6-git-send-email-will.deacon@arm.comSigned-off-by: NWill Deacon <will.deacon@arm.com> Reviewed-by: NToshi Kani <toshi.kani@hpe.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michal Hocko <mhocko@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Will Deacon 提交于
The core code already has a check for pXd_none(), so remove it from the architecture implementation. Link: http://lkml.kernel.org/r/1544120495-17438-4-git-send-email-will.deacon@arm.comSigned-off-by: NWill Deacon <will.deacon@arm.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NToshi Kani <toshi.kani@hpe.com> Cc: Chintan Pandya <cpandya@codeaurora.org> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oscar Salvador 提交于
Patch series "Do not touch pages in hot-remove path", v2. This patchset aims for two things: 1) A better definition about offline and hot-remove stage 2) Solving bugs where we can access non-initialized pages during hot-remove operations [2] [3]. This is achieved by moving all page/zone handling to the offline stage, so we do not need to access pages when hot-removing memory. [1] https://patchwork.kernel.org/cover/10691415/ [2] https://patchwork.kernel.org/patch/10547445/ [3] https://www.spinics.net/lists/linux-mm/msg161316.html This patch (of 5): This is a preparation for the following-up patches. The idea of passing the nid is that it will allow us to get rid of the zone parameter afterwards. Link: http://lkml.kernel.org/r/20181127162005.15833-2-osalvador@suse.deSigned-off-by: NOscar Salvador <osalvador@suse.de> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
and propagate through down the call stack. Link: http://lkml.kernel.org/r/20181124091411.GC10969@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arun KS 提交于
totalram_pages and totalhigh_pages are made static inline function. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.orgSigned-off-by: NArun KS <arunks@codeaurora.org> Suggested-by: NMichal Hocko <mhocko@suse.com> Suggested-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arun KS 提交于
Patch series "mm: convert totalram_pages, totalhigh_pages and managed pages to atomic", v5. This series converts totalram_pages, totalhigh_pages and zone->managed_pages to atomic variables. totalram_pages, zone->managed_pages and totalhigh_pages updates are protected by managed_page_count_lock, but readers never care about it. Convert these variables to atomic to avoid readers potentially seeing a store tear. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemes better to remove the lock and convert variables to atomic. With the change, preventing poteintial store-to-read tearing comes as a bonus. This patch (of 4): This is in preparation to a later patch which converts totalram_pages and zone->managed_pages to atomic variables. Please note that re-reading the value might lead to a different value and as such it could lead to unexpected behavior. There are no known bugs as a result of the current code but it is better to prevent from them in principle. Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.orgSigned-off-by: NArun KS <arunks@codeaurora.org> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Konovalov 提交于
With tag based KASAN mode the early shadow value is 0xff and not 0x00, so this patch renames kasan_zero_(page|pte|pmd|pud|p4d) to kasan_early_shadow_(page|pte|pmd|pud|p4d) to avoid confusion. Link: http://lkml.kernel.org/r/3fed313280ebf4f88645f5b89ccbc066d320e177.1544099024.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com> Suggested-by: NMark Rutland <mark.rutland@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Christoph Lameter <cl@linux.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 12月, 2018 14 次提交
-
-
由 Eric Biggers 提交于
Passing atomic=true to skcipher_walk_virt() only makes the later skcipher_walk_done() calls use atomic memory allocations, not skcipher_walk_virt() itself. Thus, we have to move it outside of the preemption-disabled region (kernel_fpu_begin()/kernel_fpu_end()). (skcipher_walk_virt() only allocates memory for certain layouts of the input scatterlist, hence why I didn't notice this earlier...) Reported-by: syzbot+9bf843c33f782d73ae7d@syzkaller.appspotmail.com Fixes: 4af78261 ("crypto: x86/chacha20 - add XChaCha20 support") Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Add the appropriate scatter/gather stubs to the avx asm. In the C code, we can now always use crypt_by_sg, since both sse and asm code now support scatter/gather. Introduce a new struct, aesni_gcm_tfm, that is initialized on startup to point to either the SSE, AVX, or AVX2 versions of the four necessary encryption/decryption routines. GENX_OPTSIZE is still checked at the start of crypt_by_sg. The total size of the data is checked, since the additional overhead is in the init function, calculating additional HashKeys. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Before this diff, multiple calls to GCM_ENC_DEC will succeed, but only if all calls are a multiple of 16 bytes. Handle partial blocks at the start of GCM_ENC_DEC, and update aadhash as appropriate. The data offset %r11 is also updated after the partial block. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Introduce READ_PARTIAL_BLOCK macro, and use it in the two existing partial block cases: AAD and the end of ENC_DEC. In particular, the ENC_DEC case should be faster, since we read by 8/4 bytes if possible. This macro will also be used to read partial blocks between enc_update and dec_update calls. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Prepare to handle partial blocks between scatter/gather calls. For the last partial block, we only want to calculate the aadhash in GCM_COMPLETE, and a new partial block macro will handle both aadhash update and encrypting partial blocks between calls. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Fill in aadhash, aadlen, pblocklen, curcount with appropriate values. pblocklen, aadhash, and pblockenckey are also updated at the end of each scatter/gather operation, to be carried over to the next operation. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
The precompute functions differ only by the sub-macros they call, merge them to a single macro. Later diffs add more code to fill in the gcm_context_data structure, this allows changes in a single place. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
AAD hash only needs to be calculated once for each scatter/gather operation. Move it to its own macro, and call it from GCM_INIT instead of INITIAL_BLOCKS. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Merge encode and decode tag calculations in GCM_COMPLETE macro. Scatter/gather routines will call this once at the end of encryption or decryption. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Add support for 192/256-bit keys using the avx gcm/aes routines. The sse routines were previously updated in e31ac32d (Add support for 192 & 256 bit keys to AESNI RFC4106). Instead of adding an additional loop in the hotpath as in e31ac32d, this diff instead generates separate versions of the code using macros, and the entry routines choose which version once. This results in a 5% performance improvement vs. adding a loop to the hot path. This is the same strategy chosen by the intel isa-l_crypto library. The key size checks are removed from the c code where appropriate. Note that this diff depends on using gcm_context_data - 256 bit keys require 16 HashKeys + 15 expanded keys, which is larger than struct crypto_aes_ctx, so they are stored in struct gcm_context_data. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Macro-ify function save and restore. These will be used in new functions added for scatter/gather update operations. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
Add the gcm_context_data structure to the avx asm routines. This will be necessary to support both 256 bit keys and scatter/gather. The pre-computed HashKeys are now stored in the gcm_context_data struct, which is expanded to hold the greater number of hashkeys necessary for avx. Loads and stores to the new struct are always done unlaligned to avoid compiler issues, see e5b954e8 "Use unaligned loads from gcm_context_data" Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Dave Watson 提交于
The GCM_ENC_DEC routines for AVX and AVX2 are identical, except they call separate sub-macros. Pass the macros as arguments, and merge them. This facilitates additional refactoring, by requiring changes in only one place. The GCM_ENC_DEC macro was moved above the CONFIG_AS_AVX* ifdefs, since it will be used by both AVX and AVX2. Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Masahiro Yamada 提交于
Avoid unneeded recreation of these in the incremental build. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
-