1. 31 5月, 2018 5 次提交
  2. 24 3月, 2018 1 次提交
  3. 07 2月, 2018 3 次提交
    • C
      lib: optimize cpumask_next_and() · 0ade34c3
      Clement Courbet 提交于
      We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
      It's essentially a joined iteration in search for a non-zero bit, which is
      currently implemented as a lookup join (find a nonzero bit on the lhs,
      lookup the rhs to see if it's set there).
      
      Implement a direct join (find a nonzero bit on the incrementally built
      join).  Also add generic bitmap benchmarks in the new `test_find_bit`
      module for new function (see `find_next_and_bit` in [2] and [3] below).
      
      For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
      faster with a geometric mean of 2.1 on 32 CPUs [1].  No impact on memory
      usage.  Note that on Arm, the new pure-C implementation still outperforms
      the old one that uses a mix of C and asm (`find_next_bit`) [3].
      
      [1] Approximate benchmark code:
      
      ```
        unsigned long src1p[nr_cpumask_longs] = {pattern1};
        unsigned long src2p[nr_cpumask_longs] = {pattern2};
        for (/*a bunch of repetitions*/) {
          for (int n = -1; n <= nr_cpu_ids; ++n) {
            asm volatile("" : "+rm"(src1p)); // prevent any optimization
            asm volatile("" : "+rm"(src2p));
            unsigned long result = cpumask_next_and(n, src1p, src2p);
            asm volatile("" : "+rm"(result));
          }
        }
      ```
      
      Results:
      pattern1    pattern2     time_before/time_after
      0x0000ffff  0x0000ffff   1.65
      0x0000ffff  0x00005555   2.24
      0x0000ffff  0x00001111   2.94
      0x0000ffff  0x00000000   14.0
      0x00005555  0x0000ffff   1.67
      0x00005555  0x00005555   1.71
      0x00005555  0x00001111   1.90
      0x00005555  0x00000000   6.58
      0x00001111  0x0000ffff   1.46
      0x00001111  0x00005555   1.49
      0x00001111  0x00001111   1.45
      0x00001111  0x00000000   3.10
      0x00000000  0x0000ffff   1.18
      0x00000000  0x00005555   1.18
      0x00000000  0x00001111   1.17
      0x00000000  0x00000000   1.25
      -----------------------------
                     geo.mean  2.06
      
      [2] test_find_next_bit, X86 (skylake)
      
       [ 3913.477422] Start testing find_bit() with random-filled bitmap
       [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations
       [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations
       [ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations
       [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations
       [ 3913.480216] Start testing find_next_and_bit() with random-filled
       bitmap
       [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations
       [ 3913.481075] Start testing find_bit() with sparse bitmap
       [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations
       [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations
       [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations
       [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations
       [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap
       [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations
      
      [3] test_find_next_bit, arm (v7 odroid XU3).
      
      [  267.206928] Start testing find_bit() with random-filled bitmap
      [  267.214752] find_next_bit: 4474 cycles, 16419 iterations
      [  267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations
      [  267.229294] find_last_bit: 4209 cycles, 16419 iterations
      [  267.279131] find_first_bit: 1032991 cycles, 16420 iterations
      [  267.286265] Start testing find_next_and_bit() with random-filled
      bitmap
      [  267.302386] find_next_and_bit: 2290 cycles, 8140 iterations
      [  267.309422] Start testing find_bit() with sparse bitmap
      [  267.316054] find_next_bit: 191 cycles, 66 iterations
      [  267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations
      [  267.329803] find_last_bit: 84 cycles, 66 iterations
      [  267.336169] find_first_bit: 4118 cycles, 66 iterations
      [  267.342627] Start testing find_next_and_bit() with sparse bitmap
      [  267.356919] find_next_and_bit: 91 cycles, 1 iterations
      
      [courbet@google.com: v6]
        Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com
      [geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>]
        Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
      Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.comSigned-off-by: NClement Courbet <courbet@google.com>
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: Yury Norov <ynorov@caviumnetworks.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ade34c3
    • M
      arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening support · 6167ec5c
      Marc Zyngier 提交于
      A new feature of SMCCC 1.1 is that it offers firmware-based CPU
      workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides
      BP hardening for CVE-2017-5715.
      
      If the host has some mitigation for this issue, report that
      we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the
      host workaround on every guest exit.
      Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      6167ec5c
    • M
      arm/arm64: KVM: Consolidate the PSCI include files · 1a2fb94e
      Marc Zyngier 提交于
      As we're about to update the PSCI support, and because I'm lazy,
      let's move the PSCI include file to include/kvm so that both
      ARM architectures can find it.
      Acked-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      1a2fb94e
  4. 01 2月, 2018 1 次提交
  5. 23 1月, 2018 1 次提交
  6. 21 1月, 2018 4 次提交
  7. 16 1月, 2018 3 次提交
    • J
      KVM: arm64: Handle RAS SErrors from EL1 on guest exit · 3368bd80
      James Morse 提交于
      We expect to have firmware-first handling of RAS SErrors, with errors
      notified via an APEI method. For systems without firmware-first, add
      some minimal handling to KVM.
      
      There are two ways KVM can take an SError due to a guest, either may be a
      RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO,
      or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit.
      
      For SError that interrupt a guest and are routed to EL2 the existing
      behaviour is to inject an impdef SError into the guest.
      
      Add code to handle RAS SError based on the ESR. For uncontained and
      uncategorized errors arm64_is_fatal_ras_serror() will panic(), these
      errors compromise the host too. All other error types are contained:
      For the fatal errors the vCPU can't make progress, so we inject a virtual
      SError. We ignore contained errors where we can make progress as if
      we're lucky, we may not hit them again.
      
      If only some of the CPUs support RAS the guest will see the cpufeature
      sanitised version of the id registers, but we may still take RAS SError
      on this CPU. Move the SError handling out of handle_exit() into a new
      handler that runs before we can be preempted. This allows us to use
      this_cpu_has_cap(), via arm64_is_ras_serror().
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      3368bd80
    • J
      KVM: arm/arm64: mask/unmask daif around VHE guests · 4f5abad9
      James Morse 提交于
      Non-VHE systems take an exception to EL2 in order to world-switch into the
      guest. When returning from the guest KVM implicitly restores the DAIF
      flags when it returns to the kernel at EL1.
      
      With VHE none of this exception-level jumping happens, so KVMs
      world-switch code is exposed to the host kernel's DAIF values, and KVM
      spills the guest-exit DAIF values back into the host kernel.
      On entry to a guest we have Debug and SError exceptions unmasked, KVM
      has switched VBAR but isn't prepared to handle these. On guest exit
      Debug exceptions are left disabled once we return to the host and will
      stay this way until we enter user space.
      
      Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
      happen after the hosts VBAR value has been synchronised by the isb in
      __vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
      setting KVMs VBAR value, but is kept here for symmetry.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      4f5abad9
    • K
      arm: Implement thread_struct whitelist for hardened usercopy · 08626a60
      Kees Cook 提交于
      While ARM32 carries FPU state in the thread structure that is saved and
      restored during signal handling, it doesn't need to declare a usercopy
      whitelist, since existing accessors are all either using a bounce buffer
      (for which whitelisting isn't checking the slab), are statically sized
      (which will bypass the hardened usercopy check), or both.
      
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      08626a60
  8. 15 1月, 2018 1 次提交
  9. 10 1月, 2018 3 次提交
  10. 09 1月, 2018 1 次提交
  11. 08 1月, 2018 6 次提交
  12. 04 1月, 2018 1 次提交
  13. 02 1月, 2018 1 次提交
    • C
      KVM: arm/arm64: Avoid work when userspace iqchips are not used · 61bbe380
      Christoffer Dall 提交于
      We currently check if the VM has a userspace irqchip in several places
      along the critical path, and if so, we do some work which is only
      required for having an irqchip in userspace.  This is unfortunate, as we
      could avoid doing any work entirely, if we didn't have to support
      irqchip in userspace.
      
      Realizing the userspace irqchip on ARM is mostly a developer or hobby
      feature, and is unlikely to be used in servers or other scenarios where
      performance is a priority, we can use a refcounted static key to only
      check the irqchip configuration when we have at least one VM that uses
      an irqchip in userspace.
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      61bbe380
  14. 23 12月, 2017 2 次提交
  15. 19 12月, 2017 1 次提交
    • B
      PCI: Remove PCI_REASSIGN_ALL_RSRC use on arm and arm64 · 7153884c
      Bjorn Helgaas 提交于
      On arm, PCI_REASSIGN_ALL_RSRC is used only in pcibios_assign_all_busses(),
      which helps decide whether to reconfigure bridge bus numbers.  It has
      nothing to do with BAR assignments.  On arm64 and powerpc,
      pcibios_assign_all_busses() tests PCI_REASSIGN_ALL_BUS, which makes more
      sense.
      
      Align arm with arm64 and powerpc, so they all use PCI_REASSIGN_ALL_BUS for
      pcibios_assign_all_busses().
      
      Remove PCI_REASSIGN_ALL_RSRC from the generic, Tegra, Versatile, and
      R-Car drivers.  These drivers are used only on arm or arm64, where
      PCI_REASSIGN_ALL_RSRC is not used after this change, so removing it
      should have no effect.
      
      No functional change intended.
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NManikanta Maddireddy <mmaddireddy@nvidia.com>
      Reviewed-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      7153884c
  16. 18 12月, 2017 3 次提交
    • F
      ARM: 8725/1: Add Broadcom Brahma-B15 readahead cache support · f6f9be1c
      Florian Fainelli 提交于
      This patch adds support for the Broadcom Brahma-B15 CPU readahead cache
      controller. This cache controller sits between the L2 and the memory bus
      and its purpose is to provide a friendler burst size towards the DDR
      interface than the native cache line size.
      
      The readahead cache is mostly transparent, except for
      flush_kern_cache_all, which is precisely what we are overriding here.
      
      The readahead cache only intercepts reads, and does invalidate on
      writes (IOW), as such, some data can remain stale in any of its buffers, such
      that we need to flush it, which is an operation that needs to happen in
      a particular order:
      
      - disable the readahead cache
      - flush it
      - call the appropriate cache-v7.S function
      - re-enable
      
      This patch tries to minimize the impact to the cache-v7.S file by only
      providing a stub in case CONFIG_CACHE_B15_RAC is enabled (default for
      ARCH_BRCMSTB since it is the current user).
      Signed-off-by: NAlamy Liu <alamyliu@broadcom.com>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      f6f9be1c
    • R
      ARM: probes: avoid adding kprobes to sensitive kernel-entry/exit code · c6089061
      Russell King 提交于
      Avoid adding kprobes to any of the kernel entry/exit or startup
      assembly code, or code in the identity-mapped region.  This code does
      not conform to the standard C conventions, which means that the
      expectations of the kprobes code is not forfilled.
      
      Placing kprobes at some of these locations results in the kernel trying
      to return to userspace addresses while retaining the CPU in kernel mode.
      Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      c6089061
    • N
      ARM: 8723/2: always assume the "unified" syntax for assembly code · 75fea300
      Nicolas Pitre 提交于
      The GNU assembler has implemented the "unified syntax" parsing since
      2005. This "unified" syntax is required when the kernel is built in
      Thumb2 mode. However the "unified" syntax is a mixed bag of features,
      including not requiring a `#' prefix with immediate operands. This leads
      to situations where some code builds just fine in Thumb2 mode and fails
      to build in ARM mode if that prefix is missing. This behavior
      discrepancy makes build tests less valuable, forcing both ARM and Thumb2
      builds for proper coverage.
      
      Let's "fix" this issue by always using the "unified" syntax for both ARM
      and Thumb2 mode. Given that the documented minimum binutils version that
      properly builds the kernel is version 2.20 released in 2010, we can
      assume that any toolchain capable of building the latest kernel is also
      "unified syntax" capable.
      
      Whith this, a bunch of macros used to mask some differences between both
      syntaxes can be removed, with the side effect of making LTO easier.
      Suggested-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      75fea300
  17. 30 11月, 2017 1 次提交
  18. 29 11月, 2017 2 次提交