1. 28 February 2019, 1 commit
    • x86, retpolines: Raise limit for generating indirect calls from switch-case · ce02ef06
      Authored by Daniel Borkmann
      From networking side, there are numerous attempts to get rid of indirect
      calls in fast-path wherever feasible in order to avoid the cost of
      retpolines, for example, just to name a few:
      
        * 283c16a2 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
        * aaa5d90b ("net: use indirect call wrappers at GRO network layer")
        * 028e0a47 ("net: use indirect call wrappers at GRO transport layer")
        * 356da6d0 ("dma-mapping: bypass indirect calls for dma-direct")
        * 09772d92 ("bpf: avoid retpoline for lookup/update/delete calls on maps")
        * 10870dd8 ("netfilter: nf_tables: add direct calls for all builtin expressions")
        [...]
      
      Recent work on XDP from Björn and Magnus additionally found that manually
      transforming the XDP return code switch statement with more than 5 cases
      into an if-else combination would result in a considerable speedup in the
      XDP layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled
      builds. On the i40e driver with an XDP prog attached, a 20-26% speedup has
      been observed [0]. Aside from XDP, there are many other places later in the
      networking stack's critical path with similar switch-case processing.
      Rather than fixing every XDP-enabled driver and location in the stack by
      hand, it would be better to raise the limit at which gcc emits expensive
      indirect calls from the switch under retpolines, and stick with the default
      as-is for !retpoline-configured kernels. This also has the advantage that
      for archs where this is not necessary, we let the compiler select the
      underlying target optimization for these constructs and avoid potential
      slow-downs from an if-else hand-rewrite.
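      
      For illustration, a dispatcher of the kind benchmarked and disassembled
      below might look as follows; this is a hedged reconstruction of the
      c-switch test (its exact source is not part of this log), with fn_0..fn_4
      assumed to be external no-argument handlers:
      
        #include <stdlib.h>
        
        extern void fn_0(void);
        extern void fn_1(void);
        extern void fn_2(void);
        extern void fn_3(void);
        extern void fn_4(void);
        
        /* A 5-way switch like this is what the compiler may lower to a jump
         * table, i.e. an indirect branch, unless the case-values threshold
         * discussed below is raised for retpoline builds. */
        void dispatch(int op)
        {
                switch (op) {
                case 0: fn_0(); break;
                case 1: fn_1(); break;
                case 2: fn_2(); break;
                case 3: fn_3(); break;
                case 4: fn_4(); break;
                default: abort();
                }
        }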
      
      In case of gcc, this setting is controlled by case-values-threshold, which
      has an architecture-global default that selects 4 or 5 (the latter if the
      target does not have a case insn that compares the bounds), and which some
      arch back ends like arm64 or s390 override with their own target hooks.
      For example, gcc commit db7a90aa0de5 ("S/390: Disable prediction of
      indirect branches") pretty much disables jump tables by raising the
      threshold to 20 under retpoline builds.  Comparing gcc's and clang's
      default code generation on x86-64 at the O2 level with a retpoline build
      results in the following outcome for 5 switch cases:
      
      * gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400be0 <+0>:     cmp    $0x4,%edi
         0x0000000000400be3 <+3>:     ja     0x400c35 <dispatch+85>
         0x0000000000400be5 <+5>:     lea    0x915f8(%rip),%rdx        # 0x4921e4
         0x0000000000400bec <+12>:    mov    %edi,%edi
         0x0000000000400bee <+14>:    movslq (%rdx,%rdi,4),%rax
         0x0000000000400bf2 <+18>:    add    %rdx,%rax
         0x0000000000400bf5 <+21>:    callq  0x400c01 <dispatch+33>
         0x0000000000400bfa <+26>:    pause
         0x0000000000400bfc <+28>:    lfence
         0x0000000000400bff <+31>:    jmp    0x400bfa <dispatch+26>
         0x0000000000400c01 <+33>:    mov    %rax,(%rsp)
         0x0000000000400c05 <+37>:    retq
         0x0000000000400c06 <+38>:    nopw   %cs:0x0(%rax,%rax,1)
         0x0000000000400c10 <+48>:    jmpq   0x400c90 <fn_3>
         0x0000000000400c15 <+53>:    nopl   (%rax)
         0x0000000000400c18 <+56>:    jmpq   0x400c70 <fn_2>
         0x0000000000400c1d <+61>:    nopl   (%rax)
         0x0000000000400c20 <+64>:    jmpq   0x400c50 <fn_1>
         0x0000000000400c25 <+69>:    nopl   (%rax)
         0x0000000000400c28 <+72>:    jmpq   0x400c40 <fn_0>
         0x0000000000400c2d <+77>:    nopl   (%rax)
         0x0000000000400c30 <+80>:    jmpq   0x400cb0 <fn_4>
         0x0000000000400c35 <+85>:    push   %rax
         0x0000000000400c36 <+86>:    callq  0x40dd80 <abort>
        End of assembler dump.
      
      * clang with -mretpoline emitting search tree:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:     cmp    $0x1,%edi
         0x0000000000400b33 <+3>:     jle    0x400b44 <dispatch+20>
         0x0000000000400b35 <+5>:     cmp    $0x2,%edi
         0x0000000000400b38 <+8>:     je     0x400b4d <dispatch+29>
         0x0000000000400b3a <+10>:    cmp    $0x3,%edi
         0x0000000000400b3d <+13>:    jne    0x400b52 <dispatch+34>
         0x0000000000400b3f <+15>:    jmpq   0x400c50 <fn_3>
         0x0000000000400b44 <+20>:    test   %edi,%edi
         0x0000000000400b46 <+22>:    jne    0x400b5c <dispatch+44>
         0x0000000000400b48 <+24>:    jmpq   0x400c20 <fn_0>
         0x0000000000400b4d <+29>:    jmpq   0x400c40 <fn_2>
         0x0000000000400b52 <+34>:    cmp    $0x4,%edi
         0x0000000000400b55 <+37>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b57 <+39>:    jmpq   0x400c60 <fn_4>
         0x0000000000400b5c <+44>:    cmp    $0x1,%edi
         0x0000000000400b5f <+47>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b61 <+49>:    jmpq   0x400c30 <fn_1>
         0x0000000000400b66 <+54>:    push   %rax
         0x0000000000400b67 <+55>:    callq  0x40dd20 <abort>
        End of assembler dump.
      
        For sake of comparison, clang without -mretpoline:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:	cmp    $0x4,%edi
         0x0000000000400b33 <+3>:	ja     0x400b57 <dispatch+39>
         0x0000000000400b35 <+5>:	mov    %edi,%eax
         0x0000000000400b37 <+7>:	jmpq   *0x492148(,%rax,8)
         0x0000000000400b3e <+14>:	jmpq   0x400bf0 <fn_0>
         0x0000000000400b43 <+19>:	jmpq   0x400c30 <fn_4>
         0x0000000000400b48 <+24>:	jmpq   0x400c10 <fn_2>
         0x0000000000400b4d <+29>:	jmpq   0x400c20 <fn_3>
         0x0000000000400b52 <+34>:	jmpq   0x400c00 <fn_1>
         0x0000000000400b57 <+39>:	push   %rax
         0x0000000000400b58 <+40>:	callq  0x40dcf0 <abort>
        End of assembler dump.
      
      Raising the number of cases to a high value (e.g. 100) still results in a
      similar code generation pattern with clang and gcc as above. In other
      words, clang generally turns off jump table emission under retpoline
      builds by running an extra expansion pass that turns indirectbr
      instructions in its IR back into switch instructions, as a built-in
      -mno-jump-table lowering of a switch (in this case, even if the IR input
      already contained an indirect branch).
      
      For gcc, adding --param=case-values-threshold=20, in a similar fashion to
      s390, in order to raise the limit for x86 retpoline-enabled builds results
      in a small vmlinux size increase of only 0.13% (before=18,027,528,
      after=18,051,192). For clang this option is skipped since i) it is not
      needed, as mentioned above, and ii) clang does not have this cmdline
      parameter. Non-retpoline-enabled builds with gcc continue to use the
      default case-values-threshold setting, so nothing changes here.
      
      [0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/
          and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track:
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: netdev@vger.kernel.org
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.net
      ce02ef06
  2. 16 January 2019, 1 commit
  3. 12 January 2019, 1 commit
    • x86/build: Specify elf_i386 linker emulation explicitly for i386 objects · 927185c1
      Authored by George Rimar
      The kernel uses the OUTPUT_FORMAT linker script command in its linker
      scripts. Most of the time, the -m option is passed to the linker with the
      correct architecture, but sometimes (at least for x86_64) the -m option
      contradicts the OUTPUT_FORMAT directive.
      
      Specifically, arch/x86/boot and arch/x86/realmode/rm produce i386 object
      files, but are linked with the -m elf_x86_64 linker flag when building
      for x86_64.
      
      The GNU linker manpage doesn't explicitly state any tie-breakers between
      -m and OUTPUT_FORMAT. But with BFD and Gold linkers, OUTPUT_FORMAT
      overrides the emulation value specified with the -m option.
      
      LLVM lld has a different behavior, however. When supplied with
      contradicting -m and OUTPUT_FORMAT values it fails with the following
      error message:
      
        ld.lld: error: arch/x86/realmode/rm/header.o is incompatible with elf_x86_64
      
      Therefore, just add the correct -m after the incorrect one (it overrides
      it), so the linker invocation looks like this:
      
        ld -m elf_x86_64 -z max-page-size=0x200000 -m elf_i386 --emit-relocs -T \
          realmode.lds header.o trampoline_64.o stack.o reboot.o -o realmode.elf
      
      This is not a functional change for GNU ld, because (although not
      explicitly documented) OUTPUT_FORMAT overrides -m EMULATION.
      
      Tested by building x86_64 kernel with GNU gcc/ld toolchain and booting
      it in QEMU.
      
       [ bp: massage and clarify text. ]
      Suggested-by: Dmitry Golovin <dima@golovin.in>
      Signed-off-by: George Rimar <grimar@accesssoftek.com>
      Signed-off-by: Tri Vo <trong@android.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Tri Vo <trong@android.com>
      Tested-by: Nick Desaulniers <ndesaulniers@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Matz <matz@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: morbo@google.com
      Cc: ndesaulniers@google.com
      Cc: ruiu@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190111201012.71210-1-trong@android.com
      927185c1
  4. 09 January 2019, 1 commit
    • x86/build: Mark per-CPU symbols as absolute explicitly for LLD · d071ae09
      Authored by Rafael Ávila de Espíndola
      Accessing per-CPU variables is done by finding the offset of the
      variable in the per-CPU block and adding it to the address of the
      respective CPU's block.
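      
      As a conceptual sketch only (the x86-64 kernel actually reaches the
      per-CPU block through the %gs segment base and linker-derived offsets,
      not through an array like this), the addressing scheme amounts to:
      
        /* Hypothetical illustration: percpu_base[] stands in for the per-CPU
         * block base addresses that are set up for each CPU. */
        extern char *percpu_base[];
        
        static inline void *per_cpu_ptr_sketch(unsigned long offset, int cpu)
        {
                return percpu_base[cpu] + offset;
        }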
      
      Section 3.10.8 of ld.bfd's documentation states:
      
        "For expressions involving numbers, relative addresses and absolute
        addresses, ld follows these rules to evaluate terms:
      
        Other binary operations, that is, between two relative addresses
        not in the same section, or between a relative address and an
        absolute address, first convert any non-absolute term to an
        absolute address before applying the operator."
      
      Note that LLVM's linker does not adhere to GNU ld's implementation and as
      such requires implicitly-absolute terms to be explicitly marked as
      absolute in the linker script. Otherwise, it currently fails with:
      
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:153: at least one side of the expression must be absolute
        ld.lld: error: ./arch/x86/kernel/vmlinux.lds:154: at least one side of the expression must be absolute
        Makefile:1040: recipe for target 'vmlinux' failed
      
      This is not a functional change for ld.bfd which converts the term to an
      absolute symbol anyways as specified above.
      
      Based on a previous submission by Tri Vo <trong@android.com>.
      Reported-by: Dmitry Golovin <dima@golovin.in>
      Signed-off-by: Rafael Ávila de Espíndola <rafael@espindo.la>
      [ Update commit message per Boris' and Michael's suggestions. ]
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      [ Massage commit message more, fix typos. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Tested-by: Dmitry Golovin <dima@golovin.in>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Cao Jin <caoj.fnst@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tri Vo <trong@android.com>
      Cc: dima@golovin.in
      Cc: morbo@google.com
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20181219190145.252035-1-ndesaulniers@google.com
      d071ae09
  5. 06 January 2019, 4 commits
  6. 05 January 2019, 8 commits
    • x86/amd_gart: fix unmapping of non-GART mappings · 06f55fd2
      Authored by Christoph Hellwig
      In many cases we don't have to create a GART mapping at all, which
      also means there is nothing to unmap.  Fix the range check that was
      incorrectly modified when removing the mapping_error method.
      
      Fixes: 9e8aa6b5 ("x86/amd_gart: remove the mapping_error dma_map_ops method")
      Reported-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Michal Kubecek <mkubecek@suse.cz>
      06f55fd2
    • x86: re-introduce non-generic memcpy_{to,from}io · 170d13ca
      Authored by Linus Torvalds
      This has been broken forever, and nobody ever really noticed because
      it's purely a performance issue.
      
      Long long ago, in commit 6175ddf0 ("x86: Clean up mem*io functions")
      Brian Gerst simplified the memory copies to and from iomem, since on
      x86, the instructions to access iomem are exactly the same as the
      regular instructions.
      
      That is technically true, and things worked, and nobody said anything.
      Besides, back then the regular memcpy was pretty simple and worked fine.
      
      Nobody noticed except for David Laight, that is.  David was testing a
      TLP monitor he was writing for an FPGA, and has been occasionally
      complaining about how memcpy_toio() writes things one byte at a time.
      
      Which is completely unacceptable from a performance standpoint, even if
      it happens to technically work.
      
      The reason it's writing one byte at a time is because while it's
      technically true that accesses to iomem are the same as accesses to
      regular memory on x86, the _granularity_ (and ordering) of accesses
      matter to iomem in ways that they don't matter to regular cached memory.
      
      In particular, when ERMS is set, we default to using "rep movsb" for
      larger memory copies.  That is indeed perfectly fine for real memory,
      since the whole point is that the CPU is going to do cacheline
      optimizations and executes the memory copy efficiently for cached
      memory.
      
      With iomem? Not so much.  With iomem, "rep movsb" will indeed work, but
      it will copy things one byte at a time. Slowly and ponderously.
      
      Now, originally, back in 2010 when commit 6175ddf0 was done, we
      didn't use ERMS, and this was much less noticeable.
      
      Our normal memcpy() was simpler in other ways too.
      
      Because in fact, it's not just about using the string instructions.  Our
      memcpy() these days does things like "read and write overlapping values"
      to handle the last bytes of the copy.  Again, for normal memory,
      overlapping accesses aren't an issue.  For iomem? They can be.
      
      So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and
      memset_io() functions.  It doesn't particularly optimize them, but it
      tries to at least not be horrid, or do overlapping accesses.  In fact,
      this uses the existing __inline_memcpy() function that we still had
      lying around that uses our very traditional "rep movsl" loop followed by
      movsw/movsb for the final bytes.
      
      Somebody may decide to try to improve on it, but if we've gone almost a
      decade with only one person really ever noticing and complaining, maybe
      it's not worth worrying about further, once it's not _completely_ broken?
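      
      As a rough sketch of the idea only (the commit actually reuses the
      existing __inline_memcpy() with its "rep movsl" loop, as noted above;
      names and types here are illustrative), a fixed-granularity copy that
      avoids both "rep movsb" and overlapping accesses could look like:
      
        #include <stddef.h>
        #include <stdint.h>
        
        /* Copy to an iomem-like destination using 4-byte stores for the bulk
         * and byte stores for the tail; assumes suitably aligned buffers. */
        static void memcpy_toio_sketch(volatile void *dst, const void *src,
                                       size_t n)
        {
                volatile uint32_t *d32 = dst;
                const uint32_t *s32 = src;
                volatile uint8_t *d8;
                const uint8_t *s8;
        
                while (n >= 4) {
                        *d32++ = *s32++;
                        n -= 4;
                }
        
                d8 = (volatile uint8_t *)d32;
                s8 = (const uint8_t *)s32;
                while (n--)
                        *d8++ = *s8++;
        }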
      Reported-by: David Laight <David.Laight@aculab.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      170d13ca
    • Use __put_user_goto in __put_user_size() and unsafe_put_user() · a959dc88
      Authored by Linus Torvalds
      This actually enables the __put_user_goto() functionality in
      unsafe_put_user().
      
      For an example of the effect of this, this is the code generated for the
      
              unsafe_put_user(signo, &infop->si_signo, Efault);
      
      in the waitid() system call:
      
      	movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_2]
      
      It's just one single store instruction, along with generating an
      exception table entry pointing to the Efault label case in case that
      instruction faults.
      
      Before, we would generate this:
      
      	xorl    %edx, %edx
      	movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_3]
              testl   %edx, %edx
              jne     .L309
      
      with the exception table generated for that 'mov' instruction causing us
      to jump to a stub that set %edx to -EFAULT and then jumped back to the
      'testl' instruction.
      
      So not only do we now get rid of the extra code in the normal sequence,
      we also avoid unnecessarily keeping that extra error register live
      across it all.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a959dc88
    • x86 uaccess: Introduce __put_user_goto · 4a789213
      Authored by Linus Torvalds
      This is finally the actual reason for the odd error handling in the
      "unsafe_get/put_user()" functions, introduced over three years ago.
      
      Using a "jump to error label" interface is somewhat odd, but very
      convenient as a programming interface, and more importantly, it fits
      very well with simply making the target be the exception handler address
      directly from the inline asm.
      
      The reason it took over three years to actually do this? We need "asm
      goto" support for it, which only became the default on x86 last year.
      It's now been a year that we've forced asm goto support (see commit
      e501ce95 "x86: Force asm-goto"), and so let's just do it here too.
      
      [ Side note: this commit was originally done back in 2016. The above
        commentary about timing is obviously about it only now getting merged
        into my real upstream tree     - Linus ]
      
      Sadly, gcc still only supports "asm goto" with asms that do not have any
      outputs, so we are limited to only the put_user case for this.  Maybe in
      several more years we can do the get_user case too.
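      
      The underlying "asm goto" mechanism, reduced to a toy standalone example
      (this is not the kernel's __put_user_goto macro, only an illustration of
      an asm statement that has no outputs but can branch straight to a C
      label):
      
        /* Returns 0 for a non-negative x, -1 otherwise; the error path is
         * reached directly from the inline asm via the C label. */
        static int asm_goto_demo(int x)
        {
                asm goto("test %0, %0\n\t"
                         "js %l[error]"
                         : /* "asm goto" permits no output operands in gcc */
                         : "r" (x)
                         : "cc"
                         : error);
                return 0;
        error:
                return -1;
        }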
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4a789213
    • mm: select HAVE_MOVE_PMD on x86 for faster mremap · 9f132f7e
      Authored by Joel Fernandes (Google)
      Moving page-tables at the PMD-level on x86 is known to be safe.  Enable
      this option so that we can do fast mremap when possible.
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f132f7e
    • mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
      Authored by Joel Fernandes (Google)
      Patch series "Add support for fast mremap".
      
      This series speeds up the mremap(2) syscall by copying page tables at
      the PMD level even for non-THP systems.  There is concern that the extra
      'address' argument that mremap passes to pte_alloc may do something
      subtle, architecture-related in the future that could make the scheme
      not work.  We also find that there is no point in passing the 'address'
      to pte_alloc since it's unused.  This patch therefore removes this
      argument tree-wide, resulting in a nice negative diff as well.  Along
      the way it also ensures that the enabled architectures do not do
      anything funky with the 'address' argument that would go unnoticed by
      the optimization.
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
      The changes were obtained by applying the following Coccinelle script
      (thanks to Julia for answering all Coccinelle questions!).
      The following fix-ups were done manually:
      * Removal of the address argument from pte_fragment_alloc
      * Removal of pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4cf58924
    • fls: change parameter to unsigned int · 3fc2579e
      Authored by Matthew Wilcox
      When testing in userspace, UBSAN pointed out that shifting into the sign
      bit is undefined behaviour.  It doesn't really make sense to ask for the
      highest set bit of a negative value, so just turn the argument type into
      an unsigned int.
      
      Some architectures (e.g. ppc) already had it declared as an unsigned int,
      so I don't expect too many problems.
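      
      For reference, a generic bit-scan of the kind fls() implements (a
      portable sketch, not any particular architecture's optimized version)
      illustrates why the argument type matters:
      
        /* Returns the 1-based position of the most significant set bit, or 0
         * if no bits are set.  Note the left shifts: with a signed int
         * argument, shifting a set bit into the sign position would be the
         * undefined behaviour that UBSAN flagged. */
        static inline int fls_sketch(unsigned int x)
        {
                int r = 32;
        
                if (!x)
                        return 0;
                if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
                if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
                if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
                if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
                if (!(x & 0x80000000u)) { r -= 1; }
                return r;
        }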
      
      Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3fc2579e
    • make 'user_access_begin()' do 'access_ok()' · 594cc251
      Authored by Linus Torvalds
      Originally, the rule used to be that you'd have to do access_ok()
      separately, and then user_access_begin() before actually doing the
      direct (optimized) user access.
      
      But experience has shown that people then decide not to do access_ok()
      at all, and instead rely on it being implied by other operations or
      similar, which makes it very hard to verify that the access has
      actually been range-checked.
      
      If you use the unsafe direct user accesses, hardware features (either
      SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
      Access Never - on ARM) do force you to use user_access_begin().  But
      nothing really forces the range check.
      
      By putting the range check into user_access_begin(), we actually force
      people to do the right thing (tm), and the range check will be visible
      near the actual accesses.  We have way too long a history of people
      trying to avoid them.
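      
      A sketch of the resulting calling convention (an illustrative fragment
      with hypothetical names; the unsafe accessors and the Efault label follow
      the pattern of the __put_user_goto commits above):
      
        /* user_access_begin() now performs the range check itself, so no
         * separate access_ok() call is needed before the unsafe accesses. */
        if (!user_access_begin(uptr, sizeof(*uptr)))
                return -EFAULT;
        unsafe_put_user(val, uptr, Efault);
        user_access_end();
        return 0;
        Efault:
        user_access_end();
        return -EFAULT;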
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      594cc251
  7. 04 January 2019, 1 commit
    • Remove 'type' argument from access_ok() function · 96d4f267
      Authored by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
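      
      The trivial form of that conversion, shown on a hypothetical call site,
      is simply:
      
        /* before */
        if (!access_ok(VERIFY_WRITE, uptr, size))
                return -EFAULT;
        
        /* after */
        if (!access_ok(uptr, size))
                return -EFAULT;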
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  8. 30 December 2018, 2 commits
    • kgdb/treewide: constify struct kgdb_arch arch_kgdb_ops · cc028297
      Authored by Christophe Leroy
      checkpatch.pl reports the following:
      
        WARNING: struct kgdb_arch should normally be const
        #28: FILE: arch/mips/kernel/kgdb.c:397:
        +struct kgdb_arch arch_kgdb_ops = {
      
      This report makes sense: as with all other ops structs, this one should
      also be const.  This patch makes the change.
      
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: x86@kernel.org
      Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
      Acked-by: Paul Burton <paul.burton@mips.com>
      Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
      Acked-by: Borislav Petkov <bp@suse.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
      cc028297
    • kgdb: Remove irq flags from roundup · 9ef7fa50
      Authored by Douglas Anderson
      The function kgdb_roundup_cpus() was passed a parameter that was
      documented as:
      
      > the flags that will be used when restoring the interrupts. There is
      > local_irq_save() call before kgdb_roundup_cpus().
      
      Nobody used those flags.  Anyone who wanted to temporarily turn on
      interrupts just did local_irq_enable() and local_irq_disable() without
      looking at them.  So we can definitely remove the flags.
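      
      The resulting interface change amounts to dropping the parameter from
      the prototype (shown for illustration; the per-arch implementations are
      updated to match):
      
        /* before: extern void kgdb_roundup_cpus(unsigned long flags); */
        extern void kgdb_roundup_cpus(void);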
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
      9ef7fa50
  9. 29 December 2018, 7 commits
  10. 23 December 2018, 14 commits