1. 26 Mar 2021: 1 commit
      x86/build: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS) · 8abe7fc2
      Committed by John Millikin
      When cross-compiling with Clang, the `$(CLANG_FLAGS)' variable
      contains additional flags needed to build C and assembly sources
      for the target platform. Normally this variable is automatically
      included in `$(KBUILD_CFLAGS)' via the top-level Makefile.
      
      The x86 real-mode makefile builds `$(REALMODE_CFLAGS)' from a
      plain assignment and therefore drops the Clang flags. This causes
      Clang to not recognize x86-specific assembler directives:
      
        arch/x86/realmode/rm/header.S:36:1: error: unknown directive
        .type real_mode_header STT_OBJECT ; .size real_mode_header, .-real_mode_header
        ^
      
      Explicit propagation of `$(CLANG_FLAGS)' to `$(REALMODE_CFLAGS)',
      which is inherited by real-mode make rules, fixes cross-compilation
      with Clang for x86 targets.
      
      Relevant flags:
      
      * `--target' sets the target architecture when cross-compiling. This
        flag must be set for both compilation and assembly (`KBUILD_AFLAGS')
        to support architecture-specific assembler directives.
      
      * `-no-integrated-as' tells clang to assemble with GNU Assembler
        instead of its built-in LLVM assembler. This flag is set by default
        unless `LLVM_IAS=1' is set, because the LLVM assembler can't yet
        parse certain GNU extensions.
      
      Signed-off-by: John Millikin <john@john-millikin.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Link: https://lkml.kernel.org/r/20210326000435.4785-2-nathan@kernel.org
  2. 23 Mar 2021: 1 commit
  3. 24 Feb 2021: 1 commit
  4. 09 Feb 2021: 1 commit
  5. 30 Jan 2021: 1 commit
  6. 29 Jan 2021: 1 commit
  7. 29 Dec 2020: 2 commits
  8. 02 Dec 2020: 1 commit
  9. 01 Dec 2020: 1 commit
  10. 03 Sep 2020: 1 commit
  11. 24 Jul 2020: 1 commit
  12. 07 Jul 2020: 2 commits
  13. 22 Apr 2020: 1 commit
  14. 08 Apr 2020: 6 commits
  15. 05 Mar 2020: 1 commit
  16. 28 Aug 2019: 1 commit
      x86/build: Add -Wnoaddress-of-packed-member to REALMODE_CFLAGS, to silence GCC9 build warning · 42e0e954
      Committed by Linus Torvalds
      One of the very few warnings I have in the current build comes from
      arch/x86/boot/edd.c, where I get the following with a gcc9 build:
      
         arch/x86/boot/edd.c: In function ‘query_edd’:
         arch/x86/boot/edd.c:148:11: warning: taking address of packed member of ‘struct boot_params’ may result in an unaligned pointer value [-Waddress-of-packed-member]
           148 |  mbrptr = boot_params.edd_mbr_sig_buffer;
               |           ^~~~~~~~~~~
      
      This warning triggers because we throw away all the CFLAGS and then make
      a new set for REALMODE_CFLAGS, so the -Wno-address-of-packed-member we
      added in the following commit is not present:
      
        6f303d60 ("gcc-9: silence 'address-of-packed-member' warning")
      
      The simplest solution for now is to adjust the warning for this version
      of CFLAGS as well, but it would definitely make sense to examine whether
      REALMODE_CFLAGS could be derived from CFLAGS, so that it picks up changes
      in the compiler flags environment automatically.
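      
      A minimal C sketch of what the warning is about, assuming GCC or Clang
      with the `packed` attribute. `struct demo_params`, `sig_direct()` and
      `sig_copy()` are hypothetical names; the layout only imitates a u32
      array that ends up misaligned inside a packed struct and is not the
      real `struct boot_params`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed layout (NOT the real struct boot_params): the leading
 * byte places the u32 array at offset 1, so its elements are misaligned. */
struct __attribute__((packed)) demo_params {
    uint8_t  hdr;
    uint32_t sig_buffer[4];
};

/* Accessing the member directly is fine: the compiler knows it is packed
 * and emits a load that tolerates the misalignment. */
static uint32_t sig_direct(const struct demo_params *p, size_t i)
{
    assert(i < 4);
    return p->sig_buffer[i];
}

/* Taking the member's address is what -Waddress-of-packed-member flags:
 *
 *     const uint32_t *ptr = p->sig_buffer;    (may be unaligned)
 *
 * because a plain uint32_t pointer no longer carries the "packed" info.
 * One warning-free alternative is to copy the bytes out with memcpy(). */
static uint32_t sig_copy(const struct demo_params *p, size_t i)
{
    const unsigned char *base = (const unsigned char *)p +
                                offsetof(struct demo_params, sig_buffer);
    uint32_t v;

    assert(i < 4);
    memcpy(&v, base + i * sizeof(v), sizeof(v));
    return v;
}
```

      Both helpers return the same value for the same index; the difference
      is only whether a misaligned `uint32_t *` is ever formed.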
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 05 Apr 2019: 1 commit
  18. 28 Mar 2019: 1 commit
      x86/retpolines: Disable switch jump tables when retpolines are enabled · a9d57ef1
      Committed by Daniel Borkmann
      Commit ce02ef06 ("x86, retpolines: Raise limit for generating indirect
      calls from switch-case") raised the limit under retpolines to 20 switch
      cases, below which gcc would not emit jump tables, thereby effectively
      disabling the emission of slow indirect calls in this area.
      
      After this was brought to the attention of the gcc developers [0],
      Martin Liska fixed gcc to align with clang by not generating switch jump
      tables at all under retpolines. This takes effect in gcc starting with
      stable version 8.4.0. Given that the kernel supports compilation with
      older gcc versions, where the fix is not available or backported, we
      need to keep the extra KBUILD_CFLAGS around for some time and generally
      set -fno-jump-tables to align with what more recent gcc does
      automatically today.
      
      More than 20 switch cases are not expected to be fast-path critical, but
      it would still be good to make gcc versions < 8.4.0 behave like newer
      ones, for consistency across supported gcc versions. vmlinux size grows
      slightly, by 0.27%, with older gcc. This flag is only set to work around
      affected gcc versions; nothing changes for clang.
      
        [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952
      
      Suggested-by: Martin Liska <mliska@suse.cz>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Björn Töpel <bjorn.topel@intel.com>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.net
  19. 28 Feb 2019: 1 commit
      x86, retpolines: Raise limit for generating indirect calls from switch-case · ce02ef06
      Committed by Daniel Borkmann
      On the networking side, there have been numerous attempts to get rid of
      indirect calls in the fast path wherever feasible in order to avoid the
      cost of retpolines; to name just a few:
      
        * 283c16a2 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
        * aaa5d90b ("net: use indirect call wrappers at GRO network layer")
        * 028e0a47 ("net: use indirect call wrappers at GRO transport layer")
        * 356da6d0 ("dma-mapping: bypass indirect calls for dma-direct")
        * 09772d92 ("bpf: avoid retpoline for lookup/update/delete calls on maps")
        * 10870dd8 ("netfilter: nf_tables: add direct calls for all builtin expressions")
        [...]
      
      Recent work on XDP from Björn and Magnus additionally found that manually
      transforming the XDP return code switch statement with more than 5 cases
      into an if-else chain results in a considerable speedup in the XDP
      layer, because it avoids indirect calls in CONFIG_RETPOLINE enabled
      builds. On the i40e driver with an XDP prog attached, a 20-26% speedup
      has been observed [0]. Aside from XDP, there are many other places
      further along the networking stack's critical path with similar
      switch-case processing. Rather than fixing every XDP-enabled driver and
      every location in the stack by hand, it would be better to raise the
      limit at which gcc emits expensive indirect calls from a switch under
      retpolines, and to keep the default as-is for !retpoline kernels. This
      also has the advantage that, for archs where this is not necessary, we
      let the compiler select the underlying target optimization for these
      constructs and avoid potential slowdowns from hand-rewritten if-else
      chains.
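      
      A minimal C sketch of the hand rewrite described above. The five action
      names mirror the XDP return codes (ABORTED, DROP, PASS, TX, REDIRECT),
      but `enum demo_xdp_action`, `struct demo_ring_stats`, `count_switch()`
      and `count_ifelse()` are hypothetical, and the counters stand in for the
      real per-action driver work. Whether either form ends up as a jump table
      depends on the compiler, its version and its flags.

```c
#include <assert.h>

/* Hypothetical local copy of the XDP return codes, in the same order as
 * the kernel's enum xdp_action (ABORTED = 0 up to REDIRECT = 4). */
enum demo_xdp_action {
    DEMO_XDP_ABORTED = 0,
    DEMO_XDP_DROP,
    DEMO_XDP_PASS,
    DEMO_XDP_TX,
    DEMO_XDP_REDIRECT,
};

/* Hypothetical per-ring counters standing in for the real per-action work. */
struct demo_ring_stats {
    unsigned long aborted, dropped, passed, xmitted, redirected;
};

/* Switch form: five dense cases, which gcc may lower to a jump table,
 * i.e. an indirect jump that CONFIG_RETPOLINE builds route via a thunk. */
static void count_switch(struct demo_ring_stats *s, int act)
{
    switch (act) {
    case DEMO_XDP_ABORTED:  s->aborted++;    break;
    case DEMO_XDP_DROP:     s->dropped++;    break;
    case DEMO_XDP_PASS:     s->passed++;     break;
    case DEMO_XDP_TX:       s->xmitted++;    break;
    case DEMO_XDP_REDIRECT: s->redirected++; break;
    default:                s->aborted++;    break; /* unknown: treat as ABORTED */
    }
}

/* Hand-rewritten if-else chain: in source form only comparisons and
 * direct branches. The order of the tests is arbitrary in this sketch. */
static void count_ifelse(struct demo_ring_stats *s, int act)
{
    if (act == DEMO_XDP_PASS)
        s->passed++;
    else if (act == DEMO_XDP_DROP)
        s->dropped++;
    else if (act == DEMO_XDP_TX)
        s->xmitted++;
    else if (act == DEMO_XDP_REDIRECT)
        s->redirected++;
    else
        s->aborted++;   /* DEMO_XDP_ABORTED and unknown values */
}
```

      Both functions produce identical counts for every input; the commit's
      point is that raising gcc's threshold gets the cheaper code from the
      switch form without touching each driver by hand.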
      
      In gcc, this setting is controlled by case-values-threshold, which has
      an architecture-global default of 4 or 5 (the latter if the target does
      not have a case insn that compares the bounds). Some arch back ends,
      like arm64 or s390, override it with their own target hooks; for
      example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of
      indirect branches") the threshold pretty much disables jump tables by
      setting a limit of 20 under retpoline builds. Comparing gcc's and
      clang's default code generation on x86-64 at -O2 with a retpoline build
      gives the following outcome for 5 switch cases:
      
      * gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400be0 <+0>:     cmp    $0x4,%edi
         0x0000000000400be3 <+3>:     ja     0x400c35 <dispatch+85>
         0x0000000000400be5 <+5>:     lea    0x915f8(%rip),%rdx        # 0x4921e4
         0x0000000000400bec <+12>:    mov    %edi,%edi
         0x0000000000400bee <+14>:    movslq (%rdx,%rdi,4),%rax
         0x0000000000400bf2 <+18>:    add    %rdx,%rax
         0x0000000000400bf5 <+21>:    callq  0x400c01 <dispatch+33>
         0x0000000000400bfa <+26>:    pause
         0x0000000000400bfc <+28>:    lfence
         0x0000000000400bff <+31>:    jmp    0x400bfa <dispatch+26>
         0x0000000000400c01 <+33>:    mov    %rax,(%rsp)
         0x0000000000400c05 <+37>:    retq
         0x0000000000400c06 <+38>:    nopw   %cs:0x0(%rax,%rax,1)
         0x0000000000400c10 <+48>:    jmpq   0x400c90 <fn_3>
         0x0000000000400c15 <+53>:    nopl   (%rax)
         0x0000000000400c18 <+56>:    jmpq   0x400c70 <fn_2>
         0x0000000000400c1d <+61>:    nopl   (%rax)
         0x0000000000400c20 <+64>:    jmpq   0x400c50 <fn_1>
         0x0000000000400c25 <+69>:    nopl   (%rax)
         0x0000000000400c28 <+72>:    jmpq   0x400c40 <fn_0>
         0x0000000000400c2d <+77>:    nopl   (%rax)
         0x0000000000400c30 <+80>:    jmpq   0x400cb0 <fn_4>
         0x0000000000400c35 <+85>:    push   %rax
         0x0000000000400c36 <+86>:    callq  0x40dd80 <abort>
        End of assembler dump.
      
      * clang with -mretpoline emitting search tree:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:     cmp    $0x1,%edi
         0x0000000000400b33 <+3>:     jle    0x400b44 <dispatch+20>
         0x0000000000400b35 <+5>:     cmp    $0x2,%edi
         0x0000000000400b38 <+8>:     je     0x400b4d <dispatch+29>
         0x0000000000400b3a <+10>:    cmp    $0x3,%edi
         0x0000000000400b3d <+13>:    jne    0x400b52 <dispatch+34>
         0x0000000000400b3f <+15>:    jmpq   0x400c50 <fn_3>
         0x0000000000400b44 <+20>:    test   %edi,%edi
         0x0000000000400b46 <+22>:    jne    0x400b5c <dispatch+44>
         0x0000000000400b48 <+24>:    jmpq   0x400c20 <fn_0>
         0x0000000000400b4d <+29>:    jmpq   0x400c40 <fn_2>
         0x0000000000400b52 <+34>:    cmp    $0x4,%edi
         0x0000000000400b55 <+37>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b57 <+39>:    jmpq   0x400c60 <fn_4>
         0x0000000000400b5c <+44>:    cmp    $0x1,%edi
         0x0000000000400b5f <+47>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b61 <+49>:    jmpq   0x400c30 <fn_1>
         0x0000000000400b66 <+54>:    push   %rax
         0x0000000000400b67 <+55>:    callq  0x40dd20 <abort>
        End of assembler dump.
      
      * For the sake of comparison, clang without -mretpoline:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:	cmp    $0x4,%edi
         0x0000000000400b33 <+3>:	ja     0x400b57 <dispatch+39>
         0x0000000000400b35 <+5>:	mov    %edi,%eax
         0x0000000000400b37 <+7>:	jmpq   *0x492148(,%rax,8)
         0x0000000000400b3e <+14>:	jmpq   0x400bf0 <fn_0>
         0x0000000000400b43 <+19>:	jmpq   0x400c30 <fn_4>
         0x0000000000400b48 <+24>:	jmpq   0x400c10 <fn_2>
         0x0000000000400b4d <+29>:	jmpq   0x400c20 <fn_3>
         0x0000000000400b52 <+34>:	jmpq   0x400c00 <fn_1>
         0x0000000000400b57 <+39>:	push   %rax
         0x0000000000400b58 <+40>:	callq  0x40dcf0 <abort>
        End of assembler dump.
      
      Raising the number of cases to a high value (e.g. 100) still results in
      a similar code generation pattern for clang and gcc as above. In other
      words, clang generally turns off jump table emission under retpoline
      builds by having an extra expansion pass that turns indirectbr
      instructions in its IR into switch instructions, as a built-in
      -mno-jump-table lowering of a switch (in this case, even if the IR input
      already contained an indirect branch).
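      
      The C source of `dispatch()` is not part of this message, so the sketch
      below is a hypothetical reconstruction: the names `dispatch` and
      `fn_0` to `fn_4` come from the listings, the bodies and return values
      are invented. It adds two hand-written lowerings that roughly mirror
      the listings: a bounds check plus an indexed table and an indirect
      transfer (what the jump-table forms amount to), and the compare tree
      clang emits under -mretpoline, which uses only direct branches.
      Compiling `dispatch()` with e.g. `gcc -O2 -S`, with and without
      `-fno-jump-tables`, is one way to see the difference, though the exact
      output varies by compiler version.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical case handlers; names match the listings above, return
 * values are invented so the lowerings can be compared. */
static int fn_0(void) { return 100; }
static int fn_1(void) { return 101; }
static int fn_2(void) { return 102; }
static int fn_3(void) { return 103; }
static int fn_4(void) { return 104; }

/* Source form: a dense 5-case switch, abort() for anything else. */
static int dispatch(int x)
{
    switch (x) {
    case 0: return fn_0();
    case 1: return fn_1();
    case 2: return fn_2();
    case 3: return fn_3();
    case 4: return fn_4();
    default: abort();
    }
}

/* Table lowering: bounds check ("cmp $0x4,%edi; ja"), then an indexed
 * load and an indirect transfer, which retpolines route through a thunk. */
static int (*const dispatch_tbl[5])(void) = { fn_0, fn_1, fn_2, fn_3, fn_4 };

static int dispatch_table(int x)
{
    if ((unsigned int)x > 4)
        abort();
    return dispatch_tbl[x]();
}

/* Compare-tree lowering in the same order as clang's -mretpoline listing:
 * split at 1 ("cmp $0x1,%edi; jle"), then test single values. Only
 * direct conditional branches and direct calls are involved. */
static int dispatch_tree(int x)
{
    if (x <= 1) {
        if (x == 0)
            return fn_0();
        if (x == 1)
            return fn_1();
    } else {
        if (x == 2)
            return fn_2();
        if (x == 3)
            return fn_3();
        if (x == 4)
            return fn_4();
    }
    abort();
}
```

      All three functions return the same value for inputs 0 to 4 and
      abort() otherwise; they differ only in which kind of branch reaches the
      handler.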
      
      For gcc, adding --param=case-values-threshold=20, in similar fashion to
      s390, in order to raise the limit for x86 retpoline enabled builds
      results in a small vmlinux size increase of only 0.13% (before=18,027,528
      after=18,051,192). For clang this option is ignored because i) it is not
      needed, as explained above, and ii) clang does not have this command-line
      parameter. Non-retpoline-enabled builds with gcc continue to use the
      default case-values-threshold setting, so nothing changes there.
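      
      The quoted size increase follows from the two vmlinux sizes:

```latex
\frac{18{,}051{,}192 - 18{,}027{,}528}{18{,}027{,}528}
  = \frac{23{,}664}{18{,}027{,}528}
  \approx 1.31 \times 10^{-3} \approx 0.13\,\%
```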
      
      [0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/
          and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track:
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: netdev@vger.kernel.org
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.net
  20. 22 Jan 2019: 1 commit
  21. 06 Jan 2019: 1 commit
      jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Committed by Masahiro Yamada
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig and then
      making JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      The ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
      match the real kernel capability.
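      
      A minimal C sketch of the kind of program such a Kconfig test relies
      on, assuming a compiler that accepts GNU `asm goto` (GCC, or Clang 9
      and later): if it compiles, the feature is present. The function name
      `asm_goto_probe()` is hypothetical; the kernel's own test script is not
      reproduced here.

```c
#include <assert.h>

/* Hypothetical probe: compiling it at all is the real test. The empty
 * template lists 'jumped' as a possible target, so the compiler must keep
 * both paths, but an empty instruction string never jumps, so at run time
 * control falls through and the function returns 0. */
static int asm_goto_probe(void)
{
    __asm__ goto ("" : : : : jumped);
    return 0;
jumped:
    return 1;
}
```

      Jump labels build on the same construct, roughly: the asm body holds an
      instruction that can later be patched into a branch, and the label is
      the out-of-line target that branch would reach.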
      
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
  22. 19 Dec 2018: 1 commit
  23. 09 Dec 2018: 1 commit
  24. 05 Dec 2018: 1 commit
  25. 28 Nov 2018: 1 commit
  26. 05 Nov 2018: 1 commit
  27. 04 Oct 2018: 1 commit
      kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs · 77b0bf55
      Committed by Nadav Amit
      Using macros in inline assembly allows us to work around bugs
      in GCC's inlining decisions.
      
      Compile macros.S and use it to assemble all C files.
      Currently only x86 will use it.
      
      Background:
      
      The inlining pass of GCC doesn't include an assembler, so it's not aware
      of basic properties of the generated code, such as its size in bytes,
      or that there are such things as discontinuous blocks of code and data
      due to the newfangled linker feature called 'sections' ...
      
      Instead GCC uses a lazy and fragile heuristic: it does a linear count of
      certain syntactic and whitespace elements in inlined assembly block source
      code, such as a count of new-lines and semicolons (!), as a poor substitute
      for "code size and complexity".
      
      Unsurprisingly this heuristic falls over and breaks its neck with certain
      common types of kernel code that use inline assembly, such as the frequent
      practice of putting useful information into alternative sections.
      
      As a result of this fresh, 20+ years old GCC bug, GCC's inlining decisions
      are effectively disabled for inlined functions that make use of such asm()
      blocks, because GCC thinks those sections of code are "large" - when in
      reality they often result in just a very low number of machine
      instructions.
      
      This absolute lack of inlining prowess when GCC comes across such asm()
      blocks both increases generated kernel code size and causes performance
      overhead, which is particularly noticeable on paravirt kernels, which make
      frequent use of these inlining facilities in an attempt to stay out of the
      way when running on baremetal hardware.
      
      Instead of fixing the compiler we use a workaround: we set an assembly macro
      and call it from the inlined assembly block. As a result GCC considers the
      inline assembly block as a single instruction. (Which it often isn't but I digress.)
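      
      A minimal C sketch of the workaround, assuming GCC or Clang with a
      GNU-compatible assembler. It defines a hypothetical assembler macro
      `demo_pad3` through a top-level `__asm__` statement (the kernel instead
      kept its macros in arch/x86/kernel/macros.S and used that file when
      assembling all C files) and shows the same three `nop`s written out in
      full versus as a single macro invocation. The function names are
      hypothetical, and the sketch does not demonstrate how any particular
      GCC version's inlining estimate reacts to the two forms.

```c
#include <assert.h>

/* Hypothetical assembler macro defined at file scope. In practice GCC and
 * Clang emit top-level asm ahead of function bodies, so the macro is
 * defined before its first use in the generated assembly. */
__asm__(".macro demo_pad3\n"
        "\tnop\n"
        "\tnop\n"
        "\tnop\n"
        ".endm\n");

/* Spelled out: three asm lines, which a heuristic that counts newlines
 * and semicolons treats as three instructions' worth of code. */
static inline int add_padded_inline(int a, int b)
{
    __asm__ volatile("nop\n\t"
                     "nop\n\t"
                     "nop");
    return a + b;
}

/* Macro form: the same machine code, but the asm string is a single
 * line, so the same heuristic counts it as one instruction. */
static inline int add_padded_macro(int a, int b)
{
    __asm__ volatile("demo_pad3");
    return a + b;
}
```

      Both functions assemble to the same instructions; only the size of the
      asm string that the inliner sees differs.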
      
      This uglifies and bloats the source code - for example just the refcount
      related changes have this impact:
      
       Makefile                 |    9 +++++++--
       arch/x86/Makefile        |    7 +++++++
       arch/x86/kernel/macros.S |    7 +++++++
       scripts/Kbuild.include   |    4 +++-
       scripts/mod/Makefile     |    2 ++
       5 files changed, 26 insertions(+), 3 deletions(-)
      
      Yay readability and maintainability, it's not like assembly code is hard to read
      and maintain ...
      
      We also hope that GCC will eventually get fixed, but we are not holding
      our breath for that. Yet we are optimistic, it might still happen, any decade now.
      
      [ mingo: Wrote new changelog describing the background. ]
      
      Tested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kbuild@vger.kernel.org
      Link: http://lkml.kernel.org/r/20181003213100.189959-3-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  28. 01 Oct 2018: 1 commit
  29. 31 Aug 2018: 1 commit
  30. 30 Aug 2018: 1 commit
  31. 24 Aug 2018: 1 commit
  32. 16 Jul 2018: 1 commit
  33. 21 Jun 2018: 1 commit