1. 06 12月, 2018 1 次提交
  2. 03 8月, 2018 1 次提交
    • S
      x86/speculation: Support Enhanced IBRS on future CPUs · 706d5168
      Sai Praneeth 提交于
      Future Intel processors will support "Enhanced IBRS" which is an "always
      on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
      disabled.
      
      From the specification [1]:
      
       "With enhanced IBRS, the predicted targets of indirect branches
        executed cannot be controlled by software that was executed in a less
        privileged predictor mode or on another logical processor. As a
        result, software operating on a processor with enhanced IBRS need not
        use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
        privileged predictor mode. Software can isolate predictor modes
        effectively simply by setting the bit once. Software need not disable
        enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
      
      If Enhanced IBRS is supported by the processor then use it as the
      preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
      Retpoline white paper [2] states:
      
       "Retpoline is known to be an effective branch target injection (Spectre
        variant 2) mitigation on Intel processors belonging to family 6
        (enumerated by the CPUID instruction) that do not have support for
        enhanced IBRS. On processors that support enhanced IBRS, it should be
        used for mitigation instead of retpoline."
      
      The reason why Enhanced IBRS is the recommended mitigation on processors
      which support it is that these processors also support CET which
      provides a defense against ROP attacks. Retpoline is very similar to ROP
      techniques and might trigger false positives in the CET defense.
      
      If Enhanced IBRS is selected as the mitigation technique for spectre v2,
      the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
      cleared. Kernel also has to make sure that IBRS bit remains set after
      VMEXIT because the guest might have cleared the bit. This is already
      covered by the existing x86_spec_ctrl_set_guest() and
      x86_spec_ctrl_restore_host() speculation control functions.
      
      Enhanced IBRS still requires IBPB for full mitigation.
      
      [1] Speculative-Execution-Side-Channel-Mitigations.pdf
      [2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
      Both documents are available at:
      https://bugzilla.kernel.org/show_bug.cgi?id=199511Originally-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Tim C Chen <tim.c.chen@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
      706d5168
  3. 19 7月, 2018 1 次提交
  4. 17 5月, 2018 2 次提交
  5. 15 5月, 2018 1 次提交
  6. 14 5月, 2018 1 次提交
  7. 05 5月, 2018 1 次提交
  8. 04 5月, 2018 1 次提交
    • W
      bpf, x86_32: add eBPF JIT compiler for ia32 · 03f5781b
      Wang YanQing 提交于
      The JIT compiler emits ia32 bit instructions. Currently, It supports eBPF
      only. Classic BPF is supported because of the conversion by BPF core.
      
      Almost all instructions from eBPF ISA supported except the following:
      BPF_ALU64 | BPF_DIV | BPF_K
      BPF_ALU64 | BPF_DIV | BPF_X
      BPF_ALU64 | BPF_MOD | BPF_K
      BPF_ALU64 | BPF_MOD | BPF_X
      BPF_STX | BPF_XADD | BPF_W
      BPF_STX | BPF_XADD | BPF_DW
      
      It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL at the moment.
      
      IA32 has few general purpose registers, EAX|EDX|ECX|EBX|ESI|EDI. I use
      EAX|EDX|ECX|EBX as temporary registers to simulate instructions in eBPF
      ISA, and allocate ESI|EDI to BPF_REG_AX for constant blinding, all others
      eBPF registers, R0-R10, are simulated through scratch space on stack.
      
      The reasons behind the hardware registers allocation policy are:
      1:MUL need EAX:EDX, shift operation need ECX, so they aren't fit
        for general eBPF 64bit register simulation.
      2:We need at least 4 registers to simulate most eBPF ISA operations
        on registers operands instead of on register&memory operands.
      3:We need to put BPF_REG_AX on hardware registers, or constant blinding
        will degrade jit performance heavily.
      
      Tested on PC (Intel(R) Core(TM) i5-5200U CPU).
      Testing results on i5-5200U:
      1) test_bpf: Summary: 349 PASSED, 0 FAILED, [319/341 JIT'ed]
      2) test_progs: Summary: 83 PASSED, 0 FAILED.
      3) test_lpm: OK
      4) test_lru_map: OK
      5) test_verifier: Summary: 828 PASSED, 0 FAILED.
      
      Above tests are all done in following two conditions separately:
      1:bpf_jit_enable=1 and bpf_jit_harden=0
      2:bpf_jit_enable=1 and bpf_jit_harden=2
      
      Below are some numbers for this jit implementation:
      Note:
        I run test_progs in kselftest 100 times continuously for every condition,
        the numbers are in format: total/times=avg.
        The numbers that test_bpf reports show almost the same relation.
      
      a:jit_enable=0 and jit_harden=0            b:jit_enable=1 and jit_harden=0
        test_pkt_access:PASS:ipv4:15622/100=156    test_pkt_access:PASS:ipv4:10674/100=106
        test_pkt_access:PASS:ipv6:9130/100=91      test_pkt_access:PASS:ipv6:4855/100=48
        test_xdp:PASS:ipv4:240198/100=2401         test_xdp:PASS:ipv4:138912/100=1389
        test_xdp:PASS:ipv6:137326/100=1373         test_xdp:PASS:ipv6:68542/100=685
        test_l4lb:PASS:ipv4:61100/100=611          test_l4lb:PASS:ipv4:37302/100=373
        test_l4lb:PASS:ipv6:101000/100=1010        test_l4lb:PASS:ipv6:55030/100=550
      
      c:jit_enable=1 and jit_harden=2
        test_pkt_access:PASS:ipv4:10558/100=105
        test_pkt_access:PASS:ipv6:5092/100=50
        test_xdp:PASS:ipv4:131902/100=1319
        test_xdp:PASS:ipv6:77932/100=779
        test_l4lb:PASS:ipv4:38924/100=389
        test_l4lb:PASS:ipv6:57520/100=575
      
      The numbers show we get 30%~50% improvement.
      
      See Documentation/networking/filter.txt for more information.
      
      Changelog:
      
       Changes v5-v6:
       1:Add do {} while (0) to RETPOLINE_RAX_BPF_JIT for
         consistence reason.
       2:Clean up non-standard comments, reported by Daniel Borkmann.
       3:Fix a memory leak issue, repoted by Daniel Borkmann.
      
       Changes v4-v5:
       1:Delete is_on_stack, BPF_REG_AX is the only one
         on real hardware registers, so just check with
         it.
       2:Apply commit 1612a981 ("bpf, x64: fix JIT emission
         for dead code"), suggested by Daniel Borkmann.
      
       Changes v3-v4:
       1:Fix changelog in commit.
         I install llvm-6.0, then test_progs willn't report errors.
         I submit another patch:
         "bpf: fix misaligned access for BPF_PROG_TYPE_PERF_EVENT program type on x86_32 platform"
         to fix another problem, after that patch, test_verifier willn't report errors too.
       2:Fix clear r0[1] twice unnecessarily in *BPF_IND|BPF_ABS* simulation.
      
       Changes v2-v3:
       1:Move BPF_REG_AX to real hardware registers for performance reason.
       3:Using bpf_load_pointer instead of bpf_jit32.S, suggested by Daniel Borkmann.
       4:Delete partial codes in 1c2a088a, suggested by Daniel Borkmann.
       5:Some bug fixes and comments improvement.
      
       Changes v1-v2:
       1:Fix bug in emit_ia32_neg64.
       2:Fix bug in emit_ia32_arsh_r64.
       3:Delete filename in top level comment, suggested by Thomas Gleixner.
       4:Delete unnecessary boiler plate text, suggested by Thomas Gleixner.
       5:Rewrite some words in changelog.
       6:CodingSytle improvement and a little more comments.
      Signed-off-by: NWang YanQing <udknight@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      03f5781b
  9. 03 5月, 2018 7 次提交
  10. 14 3月, 2018 1 次提交
  11. 23 2月, 2018 1 次提交
    • D
      bpf, x64: implement retpoline for tail call · a493a87f
      Daniel Borkmann 提交于
      Implement a retpoline [0] for the BPF tail call JIT'ing that converts
      the indirect jump via jmp %rax that is used to make the long jump into
      another JITed BPF image. Since this is subject to speculative execution,
      we need to control the transient instruction sequence here as well
      when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
      The latter aligns also with what gcc / clang emits (e.g. [1]).
      
      JIT dump after patch:
      
        # bpftool p d x i 1
         0: (18) r2 = map[id:1]
         2: (b7) r3 = 0
         3: (85) call bpf_tail_call#12
         4: (b7) r0 = 2
         5: (95) exit
      
      With CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000072  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000072  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000072  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	callq  0x000000000000006d  |+
        66:	pause                      |
        68:	lfence                     |
        6b:	jmp    0x0000000000000066  |
        6d:	mov    %rax,(%rsp)         |
        71:	retq                       |
        72:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        + retpoline for indirect jump
      
      Without CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000063  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000063  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000063  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	jmpq   *%rax               |-
        63:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        - plain indirect jump as before
      
        [0] https://support.google.com/faqs/answer/7625886
        [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2bSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      a493a87f
  12. 21 2月, 2018 2 次提交
  13. 20 2月, 2018 2 次提交
  14. 15 2月, 2018 1 次提交
  15. 13 2月, 2018 1 次提交
    • D
      Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" · f208820a
      David Woodhouse 提交于
      This reverts commit 64e16720.
      
      We cannot call C functions like that, without marking all the
      call-clobbered registers as, well, clobbered. We might have got away
      with it for now because the __ibp_barrier() function was *fairly*
      unlikely to actually use any other registers. But no. Just no.
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: karahmed@amazon.de
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f208820a
  16. 03 2月, 2018 1 次提交
  17. 28 1月, 2018 3 次提交
  18. 26 1月, 2018 2 次提交
  19. 19 1月, 2018 2 次提交
  20. 15 1月, 2018 1 次提交
    • T
      x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros · 28d437d5
      Tom Lendacky 提交于
      The PAUSE instruction is currently used in the retpoline and RSB filling
      macros as a speculation trap.  The use of PAUSE was originally suggested
      because it showed a very, very small difference in the amount of
      cycles/time used to execute the retpoline as compared to LFENCE.  On AMD,
      the PAUSE instruction is not a serializing instruction, so the pause/jmp
      loop will use excess power as it is speculated over waiting for return
      to mispredict to the correct target.
      
      The RSB filling macro is applicable to AMD, and, if software is unable to
      verify that LFENCE is serializing on AMD (possible when running under a
      hypervisor), the generic retpoline support will be used and, so, is also
      applicable to AMD.  Keep the current usage of PAUSE for Intel, but add an
      LFENCE instruction to the speculation trap for AMD.
      
      The same sequence has been adopted by GCC for the GCC generated retpolines.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@alien8.de>
      Acked-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Kees Cook <keescook@google.com>
      Link: https://lkml.kernel.org/r/20180113232730.31060.36287.stgit@tlendack-t1.amdoffice.net
      28d437d5
  21. 12 1月, 2018 3 次提交
    • D
      x86/retpoline: Fill return stack buffer on vmexit · 117cc7a9
      David Woodhouse 提交于
      In accordance with the Intel and AMD documentation, we need to overwrite
      all entries in the RSB on exiting a guest, to prevent malicious branch
      target predictions from affecting the host kernel. This is needed both
      for retpoline and for IBRS.
      
      [ak: numbers again for the RSB stuffing labels]
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515755487-8524-1-git-send-email-dwmw@amazon.co.uk
      117cc7a9
    • D
      x86/spectre: Add boot time option to select Spectre v2 mitigation · da285121
      David Woodhouse 提交于
      Add a spectre_v2= option to select the mitigation used for the indirect
      branch speculation vulnerability.
      
      Currently, the only option available is retpoline, in its various forms.
      This will be expanded to cover the new IBRS/IBPB microcode features.
      
      The RETPOLINE_AMD feature relies on a serializing LFENCE for speculation
      control. For AMD hardware, only set RETPOLINE_AMD if LFENCE is a
      serializing instruction, which is indicated by the LFENCE_RDTSC feature.
      
      [ tglx: Folded back the LFENCE/AMD fixes and reworked it so IBRS
        	integration becomes simple ]
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515707194-20531-5-git-send-email-dwmw@amazon.co.uk
      da285121
    • D
      x86/retpoline: Add initial retpoline support · 76b04384
      David Woodhouse 提交于
      Enable the use of -mindirect-branch=thunk-extern in newer GCC, and provide
      the corresponding thunks. Provide assembler macros for invoking the thunks
      in the same way that GCC does, from native and inline assembler.
      
      This adds X86_FEATURE_RETPOLINE and sets it by default on all CPUs. In
      some circumstances, IBRS microcode features may be used instead, and the
      retpoline can be disabled.
      
      On AMD CPUs if lfence is serialising, the retpoline can be dramatically
      simplified to a simple "lfence; jmp *\reg". A future patch, after it has
      been verified that lfence really is serialising in all circumstances, can
      enable this by setting the X86_FEATURE_RETPOLINE_AMD feature bit in addition
      to X86_FEATURE_RETPOLINE.
      
      Do not align the retpoline in the altinstr section, because there is no
      guarantee that it stays aligned when it's copied over the oldinstr during
      alternative patching.
      
      [ Andi Kleen: Rename the macros, add CONFIG_RETPOLINE option, export thunks]
      [ tglx: Put actual function CALL/JMP in front of the macros, convert to
        	symbolic labels ]
      [ dwmw2: Convert back to numeric labels, merge objtool fixes ]
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: gnomes@lxorguk.ukuu.org.uk
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: thomas.lendacky@amd.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: https://lkml.kernel.org/r/1515707194-20531-4-git-send-email-dwmw@amazon.co.uk
      76b04384