1. 28 4月, 2020 2 次提交
    • R
      arm64/lib: improve CRC32 performance for deep pipelines · acc2d05a
      Rongwei Wang 提交于
      to #26730415
      
      commit efdb25efc7645b326cd5eb82be5feeabe167c24e upstream.
      
      Improve the performance of the crc32() asm routines by getting rid of
      most of the branches and small sized loads on the common path.
      
      Instead, use a branchless code path involving overlapping 16 byte
      loads to process the first (length % 32) bytes, and process the
      remainder using a loop that processes 32 bytes at a time.
      
      Tested using the following test program:
      
        #include <stdlib.h>
      
        extern void crc32_le(unsigned short, char const*, int);
      
        int main(void)
        {
          static const char buf[4096];
      
          srand(20181126);
      
          for (int i = 0; i < 100 * 1000 * 1000; i++)
            crc32_le(0, buf, rand() % 1024);
      
          return 0;
        }
      
      On Cortex-A53 and Cortex-A57, the performance regresses but only very
      slightly. On Cortex-A72 however, the performance improves from
      
        $ time ./crc32
      
        real  0m10.149s
        user  0m10.149s
        sys   0m0.000s
      
      to
      
        $ time ./crc32
      
        real  0m7.915s
        user  0m7.915s
        sys   0m0.000s
      
      Cc: Rui Sun <sunrui26@huawei.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NRongwei Wang <rongwei.wang@linux.alibaba.com>
      Acked-by: NZou Cao <zoucao@linux.alibaba.com>
      acc2d05a
    • R
      arm64/lib: add accelerated crc32 routines · 19473a51
      Rongwei Wang 提交于
      to #26730415
      
      commit 7481cddf29ede204b475facc40e6f65459939881 upstream.
      
      Unlike crc32c(), which is wired up to the crypto API internally so the
      optimal driver is selected based on the platform's capabilities,
      crc32_le() is implemented as a library function using a slice-by-8 table
      based C implementation. Even though few of the call sites may be
      bottlenecks, calling a time variant implementation with a non-negligible
      D-cache footprint is a bit of a waste, given that ARMv8.1 and up
      mandates
      support for the CRC32 instructions that were optional in ARMv8.0, but
      are
      already widely available, even on the Cortex-A53 based Raspberry Pi.
      
      So implement routines that use these instructions if available, and fall
      back to the existing generic routines otherwise. The selection is based
      on alternatives patching.
      
      Note that this unconditionally selects CONFIG_CRC32 as a builtin. Since
      CRC32 is relied upon by core functionality such as CONFIG_OF_FLATTREE,
      this just codifies the status quo.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NRongwei Wang <rongwei.wang@linux.alibaba.com>
      Acked-by: NZou Cao <zoucao@linux.alibaba.com>
      19473a51
  2. 18 3月, 2020 1 次提交
  3. 01 12月, 2019 1 次提交
  4. 24 11月, 2019 1 次提交
    • P
      arm64: uaccess: Ensure PAN is re-enabled after unhandled uaccess fault · 70366259
      Pavel Tatashin 提交于
      commit 94bb804e1e6f0a9a77acf20d7c70ea141c6c821e upstream.
      
      A number of our uaccess routines ('__arch_clear_user()' and
      '__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they
      encounter an unhandled fault whilst accessing userspace.
      
      For CPUs implementing both hardware PAN and UAO, this bug has no effect
      when both extensions are in use by the kernel.
      
      For CPUs implementing hardware PAN but not UAO, this means that a kernel
      using hardware PAN may execute portions of code with PAN inadvertently
      disabled, opening us up to potential security vulnerabilities that rely
      on userspace access from within the kernel which would usually be
      prevented by this mechanism. In other words, parts of the kernel run the
      same way as they would on a CPU without PAN implemented/emulated at all.
      
      For CPUs not implementing hardware PAN and instead relying on software
      emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately
      much worse. Calling 'schedule()' with software PAN disabled means that
      the next task will execute in the kernel using the page-table and ASID
      of the previous process even after 'switch_mm()', since the actual
      hardware switch is deferred until return to userspace. At this point, or
      if there is a intermediate call to 'uaccess_enable()', the page-table
      and ASID of the new process are installed. Sadly, due to the changes
      introduced by KPTI, this is not an atomic operation and there is a very
      small window (two instructions) where the CPU is configured with the
      page-table of the old task and the ASID of the new task; a speculative
      access in this state is disastrous because it would corrupt the TLB
      entries for the new task with mappings from the previous address space.
      
      As Pavel explains:
      
        | I was able to reproduce memory corruption problem on Broadcom's SoC
        | ARMv8-A like this:
        |
        | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
        | stack is accessed and copied.
        |
        | The test program performed the following on every CPU and forking
        | many processes:
        |
        |	unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
        |				  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        |	map[0] = getpid();
        |	sched_yield();
        |	if (map[0] != getpid()) {
        |		fprintf(stderr, "Corruption detected!");
        |	}
        |	munmap(map, PAGE_SIZE);
        |
        | From time to time I was getting map[0] to contain pid for a
        | different process.
      
      Ensure that PAN is re-enabled when returning after an unhandled user
      fault from our uaccess routines.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by: NMark Rutland <mark.rutland@arm.com>
      Cc: <stable@vger.kernel.org>
      Fixes: 338d4f49 ("arm64: kernel: Add support for Privileged Access Never")
      Signed-off-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      [will: rewrote commit message]
      Signed-off-by: NWill Deacon <will@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      70366259
  5. 14 11月, 2018 1 次提交
  6. 21 6月, 2018 1 次提交
  7. 22 5月, 2018 1 次提交
    • J
      arm64: export tishift functions to modules · 255845fc
      Jason A. Donenfeld 提交于
      Otherwise modules that use these arithmetic operations will fail to
      link. We accomplish this with the usual EXPORT_SYMBOL, which on most
      architectures goes in the .S file but the ARM64 maintainers prefer that
      insead it goes into arm64ksyms.
      
      While we're at it, we also fix this up to use SPDX, and I personally
      choose to relicense this as GPL2||BSD so that these symbols don't need
      to be export_symbol_gpl, so all modules can use the routines, since
      these are important general purpose compiler-generated function calls.
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Reported-by: NPaX Team <pageexec@freemail.hu>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      255845fc
  8. 27 4月, 2018 1 次提交
    • M
      arm64: avoid instrumenting atomic_ll_sc.o · 3789c122
      Mark Rutland 提交于
      Our out-of-line atomics are built with a special calling convention,
      preventing pointless stack spilling, and allowing us to patch call sites
      with ARMv8.1 atomic instructions.
      
      Instrumentation inserted by the compiler may result in calls to
      functions not following this special calling convention, resulting in
      registers being unexpectedly clobbered, and various problems resulting
      from this.
      
      For example, if a kernel is built with KCOV and ARM64_LSE_ATOMICS, the
      compiler inserts calls to __sanitizer_cov_trace_pc in the prologues of
      the atomic functions. This has been observed to result in spurious
      cmpxchg failures, leading to a hang early on in the boot process.
      
      This patch avoids such issues by preventing instrumentation of our
      out-of-line atomics.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      3789c122
  9. 07 3月, 2018 1 次提交
    • W
      arm64: lse: Pass -fomit-frame-pointer to out-of-line ll/sc atomics · 6b24442d
      Will Deacon 提交于
      In cases where x30 is used as a temporary in the out-of-line ll/sc atomics
      (e.g. atomic_fetch_add), the compiler tends to put out a full stackframe,
      which included pointing the x29 at the new frame.
      
      Since these things aren't traceable anyway, we can pass -fomit-frame-pointer
      to reduce the work when spilling. Since this is incompatible with -pg, we
      also remove that from the CFLAGS for this file.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      6b24442d
  10. 06 3月, 2018 1 次提交
  11. 07 2月, 2018 1 次提交
  12. 17 1月, 2018 2 次提交
  13. 02 1月, 2018 1 次提交
  14. 11 12月, 2017 1 次提交
  15. 14 11月, 2017 1 次提交
    • J
      arm64: Implement __lshrti3 library function · 9bfe7553
      Jason A. Donenfeld 提交于
      Commit fb872273 ("arm64: support __int128 on gcc 5+") added support
      for the __int128 data type, but this breaks the build in some configurations
      where GCC ends up emitting calls to the __lshrti3 helper in libgcc, which
      results in a link error:
      
        kernel/sched/fair.o: In function `__calc_delta':
        fair.c:(.text+0xca0): undefined reference to `__lshrti3'
        kernel/time/timekeeping.o: In function `timekeeping_resume':
        timekeeping.c:(.text+0x3f60): undefined reference to `__lshrti3'
        make: *** [vmlinux] Error 1
      
      Fix the build by providing an implementation of __lshrti3, like we do
      already for __ashlti3 and __ashrti3.
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      9bfe7553
  16. 03 11月, 2017 1 次提交
  17. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  18. 14 10月, 2017 1 次提交
    • J
      arm64: use WFE for long delays · 7b77452e
      Julien Thierry 提交于
      The current delay implementation uses the yield instruction, which is a
      hint that it is beneficial to schedule another thread. As this is a hint,
      it may be implemented as a NOP, causing all delays to be busy loops. This
      is the case for many existing CPUs.
      
      Taking advantage of the generic timer sending periodic events to all
      cores, we can use WFE during delays to reduce power consumption. This is
      beneficial only for delays longer than the period of the timer event
      stream.
      
      If timer event stream is not enabled, delays will behave as yield/busy
      loops.
      Signed-off-by: NJulien Thierry <julien.thierry@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      7b77452e
  19. 10 8月, 2017 1 次提交
  20. 09 8月, 2017 1 次提交
  21. 25 7月, 2017 1 次提交
  22. 29 3月, 2017 1 次提交
  23. 28 2月, 2017 1 次提交
  24. 27 12月, 2016 1 次提交
  25. 25 12月, 2016 1 次提交
  26. 22 11月, 2016 1 次提交
  27. 16 9月, 2016 1 次提交
  28. 12 9月, 2016 1 次提交
    • M
      arm64: use alternative auto-nop · 6ba3b554
      Mark Rutland 提交于
      Make use of the new alternative_if and alternative_else_nop_endif and
      get rid of our homebew NOP sleds, making the code simpler to read.
      
      Note that for cpu_do_switch_mm the ret has been moved out of the
      alternative sequence, and in the default case there will be three
      additional NOPs executed.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      6ba3b554
  29. 27 7月, 2016 1 次提交
  30. 21 6月, 2016 1 次提交
  31. 05 3月, 2016 1 次提交
  32. 27 2月, 2016 1 次提交
    • A
      arm64: lse: deal with clobbered IP registers after branch via PLT · 5be8b70a
      Ard Biesheuvel 提交于
      The LSE atomics implementation uses runtime patching to patch in calls
      to out of line non-LSE atomics implementations on cores that lack hardware
      support for LSE. To avoid paying the overhead cost of a function call even
      if no call ends up being made, the bl instruction is kept invisible to the
      compiler, and the out of line implementations preserve all registers, not
      just the ones that they are required to preserve as per the AAPCS64.
      
      However, commit fd045f6c ("arm64: add support for module PLTs") added
      support for routing branch instructions via veneers if the branch target
      offset exceeds the range of the ordinary relative branch instructions.
      Since this deals with jump and call instructions that are exposed to ELF
      relocations, the PLT code uses x16 to hold the address of the branch target
      when it performs an indirect branch-to-register, something which is
      explicitly allowed by the AAPCS64 (and ordinary compiler generated code
      does not expect register x16 or x17 to retain their values across a bl
      instruction).
      
      Since the lse runtime patched bl instructions don't adhere to the AAPCS64,
      they don't deal with this clobbering of registers x16 and x17. So add them
      to the clobber list of the asm() statements that perform the call
      instructions, and drop x16 and x17 from the list of registers that are
      callee saved in the out of line non-LSE implementations.
      
      In addition, since we have given these functions two scratch registers,
      they no longer need to stack/unstack temp registers.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      [will: factored clobber list into #define, updated Makefile comment]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      5be8b70a
  33. 19 2月, 2016 2 次提交
    • J
      arm64: kernel: Don't toggle PAN on systems with UAO · 70544196
      James Morse 提交于
      If a CPU supports both Privileged Access Never (PAN) and User Access
      Override (UAO), we don't need to disable/re-enable PAN round all
      copy_to_user() like calls.
      
      UAO alternatives cause these calls to use the 'unprivileged' load/store
      instructions, which are overridden to be the privileged kind when
      fs==KERNEL_DS.
      
      This patch changes the copy_to_user() calls to have their PAN toggling
      depend on a new composite 'feature' ARM64_ALT_PAN_NOT_UAO.
      
      If both features are detected, PAN will be enabled, but the copy_to_user()
      alternatives will not be applied. This means PAN will be enabled all the
      time for these functions. If only PAN is detected, the toggling will be
      enabled as normal.
      
      This will save the time taken to disable/re-enable PAN, and allow us to
      catch copy_to_user() accesses that occur with fs==KERNEL_DS.
      
      Futex and swp-emulation code continue to hang their PAN toggling code on
      ARM64_HAS_PAN.
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      70544196
    • J
      arm64: kernel: Add support for User Access Override · 57f4959b
      James Morse 提交于
      'User Access Override' is a new ARMv8.2 feature which allows the
      unprivileged load and store instructions to be overridden to behave in
      the normal way.
      
      This patch converts {get,put}_user() and friends to use ldtr*/sttr*
      instructions - so that they can only access EL0 memory, then enables
      UAO when fs==KERNEL_DS so that these functions can access kernel memory.
      
      This allows user space's read/write permissions to be checked against the
      page tables, instead of testing addr<USER_DS, then using the kernel's
      read/write permissions.
      Signed-off-by: NJames Morse <james.morse@arm.com>
      [catalin.marinas@arm.com: move uao_thread_switch() above dsb()]
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      57f4959b
  34. 16 2月, 2016 3 次提交
  35. 13 10月, 2015 1 次提交
    • A
      arm64: add KASAN support · 39d114dd
      Andrey Ryabinin 提交于
      This patch adds arch specific code for kernel address sanitizer
      (see Documentation/kasan.txt).
      
      1/8 of kernel addresses reserved for shadow memory. There was no
      big enough hole for this, so virtual addresses for shadow were
      stolen from vmalloc area.
      
      At early boot stage the whole shadow region populated with just
      one physical page (kasan_zero_page). Later, this page reused
      as readonly zero shadow for some memory that KASan currently
      don't track (vmalloc).
      After mapping the physical memory, pages for shadow memory are
      allocated and mapped.
      
      Functions like memset/memmove/memcpy do a lot of memory accesses.
      If bad pointer passed to one of these function it is important
      to catch this. Compiler's instrumentation cannot do this since
      these functions are written in assembly.
      KASan replaces memory functions with manually instrumented variants.
      Original functions declared as weak symbols so strong definitions
      in mm/kasan/kasan.c could replace them. Original functions have aliases
      with '__' prefix in name, so we could call non-instrumented variant
      if needed.
      Some files built without kasan instrumentation (e.g. mm/slub.c).
      Original mem* function replaced (via #define) with prefixed variants
      to disable memory access checks for such files.
      Signed-off-by: NAndrey Ryabinin <ryabinin.a.a@gmail.com>
      Tested-by: NLinus Walleij <linus.walleij@linaro.org>
      Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      39d114dd