1. 24 February 2022, 1 commit
• arm64: Mitigate spectre style branch history side channels · 558c303c
  Committed by James Morse
      Speculation attacks against some high-performance processors can
      make use of branch history to influence future speculation.
      When taking an exception from user-space, a sequence of branches
      or a firmware call overwrites or invalidates the branch history.
      
      The sequence of branches is added to the vectors, and should appear
      before the first indirect branch. For systems using KPTI the sequence
      is added to the kpti trampoline where it has a free register as the exit
      from the trampoline is via a 'ret'. For systems not using KPTI, the same
      register tricks are used to free up a register in the vectors.
      
      For the firmware call, arch-workaround-3 clobbers 4 registers, so
      there is no choice but to save them to the EL1 stack. This only happens
      for entry from EL0, so if we take an exception due to the stack access,
      it will not become re-entrant.
      
      For KVM, the existing branch-predictor-hardening vectors are used.
      When a spectre version of these vectors is in use, the firmware call
      is sufficient to mitigate against Spectre-BHB. For the non-spectre
      versions, the sequence of branches is added to the indirect vector.
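
      For illustration only, a minimal C sketch of the kind of branch loop
      described above. The kernel's real sequence lives in the entry assembly
      and is patched per CPU via alternatives; the iteration count below is an
      assumption.
      
      	/*
      	 * Illustrative sketch of a branch-history-overwriting loop, written as
      	 * inline assembly so the branches cannot be optimised away. The '8' is
      	 * an assumed count; the real kernel patches the count per CPU and ends
      	 * the sequence with a speculation barrier.
      	 */
      	static inline void bhb_loop_sketch(void)
      	{
      		unsigned long cnt;
      
      		asm volatile(
      		"	mov	%0, #8\n"		/* assumed count */
      		"1:	b	2f\n"			/* a taken branch, i.e. 'b . + 4' */
      		"2:	subs	%0, %0, #1\n"
      		"	b.ne	1b\n"
      		"	dsb	nsh\n"
      		"	isb\n"
      		: "=&r" (cnt) : : "memory");
      	}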
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
2. 16 February 2022, 6 commits
• arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2 · dee435be
  Committed by James Morse
      Speculation attacks against some high-performance processors can
      make use of branch history to influence future speculation as part of
a spectre-v2 attack. This is not mitigated by CSV2, meaning CPUs that
previously reported 'Not affected' are now only moderately mitigated by CSV2.
      
      Update the value in /sys/devices/system/cpu/vulnerabilities/spectre_v2
      to also show the state of the BHB mitigation.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
• arm64: Add percpu vectors for EL1 · bd09128d
  Committed by James Morse
The Spectre-BHB workaround adds a firmware call to the vectors. This
is needed on some CPUs, but not others. To keep the unaffected CPU in
a big/little pair from making the firmware call, create per-cpu vectors.
      
      The per-cpu vectors only apply when returning from EL0.
      
Systems using KPTI can use the canonical 'full-fat' vectors directly at
EL1; the trampoline exit code will switch to this_cpu_vector on exit to
      EL0. Systems not using KPTI should always use this_cpu_vector.
      
      this_cpu_vector will point at a vector in tramp_vecs or
      __bp_harden_el1_vectors, depending on whether KPTI is in use.
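
      A minimal sketch of the per-cpu selection described above (simplified;
      the names follow the commit text, but the exact definitions and call
      sites are in the arm64 entry/cpufeature code):
      
      	#include <linux/percpu.h>
      
      	extern char vectors[];			/* default EL1 vector table */
      
      	/* Sketch: each CPU records which vector table it should use on EL0 exit. */
      	DEFINE_PER_CPU_READ_MOSTLY(const char *, this_cpu_vector) = vectors;
      
      	/* An affected CPU switches to vectors containing the Spectre-BHB
      	 * sequence; the unaffected CPU in a big/little pair keeps the default. */
      	static void select_bhb_vectors_sketch(const char *bhb_vectors)
      	{
      		__this_cpu_write(this_cpu_vector, bhb_vectors);
      	}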
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
• arm64: entry: Add vectors that have the bhb mitigation sequences · ba268923
  Committed by James Morse
      Some CPUs affected by Spectre-BHB need a sequence of branches, or a
      firmware call to be run before any indirect branch. This needs to go
      in the vectors. No CPU needs both.
      
      While this can be patched in, it would run on all CPUs as there is a
      single set of vectors. If only one part of a big/little combination is
      affected, the unaffected CPUs have to run the mitigation too.
      
      Create extra vectors that include the sequence. Subsequent patches will
      allow affected CPUs to select this set of vectors. Later patches will
      modify the loop count to match what the CPU requires.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
• arm64: entry: Allow the trampoline text to occupy multiple pages · a9c406e6
  Committed by James Morse
      Adding a second set of vectors to .entry.tramp.text will make it
      larger than a single 4K page.
      
      Allow the trampoline text to occupy up to three pages by adding two
      more fixmap slots. Previous changes to tramp_valias allowed it to reach
      beyond a single page.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
• arm64: entry: Move the trampoline data page before the text page · c091fb6a
  Committed by James Morse
      The trampoline code has a data page that holds the address of the vectors,
      which is unmapped when running in user-space. This ensures that with
      CONFIG_RANDOMIZE_BASE, the randomised address of the kernel can't be
      discovered until after the kernel has been mapped.
      
      If the trampoline text page is extended to include multiple sets of
      vectors, it will be larger than a single page, making it tricky to
      find the data page without knowing the size of the trampoline text
      pages, which will vary with PAGE_SIZE.
      
      Move the data page to appear before the text page. This allows the
      data page to be found without knowing the size of the trampoline text
pages. 'tramp_vectors' is used to refer to the beginning of the
.entry.tramp.text section, so do that explicitly.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
• KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A · 5bdf3437
  Committed by James Morse
CPUs vulnerable to Spectre-BHB either need to make an SMCCC firmware
      call from the vectors, or run a sequence of branches. This gets added
      to the hyp vectors. If there is no support for arch-workaround-1 in
      firmware, the indirect vector will be used.
      
      kvm_init_vector_slots() only initialises the two indirect slots if
      the platform is vulnerable to Spectre-v3a. pKVM's hyp_map_vectors()
      only initialises __hyp_bp_vect_base if the platform is vulnerable to
      Spectre-v3a.
      
As there are about to be more users of the indirect vectors, ensure
      their entries in hyp_spectre_vector_selector[] are always initialised,
      and __hyp_bp_vect_base defaults to the regular VA mapping.
      
      The Spectre-v3a check is moved to a helper
      kvm_system_needs_idmapped_vectors(), and merged with the code
      that creates the hyp mappings.
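
      The helper is expected to be a thin wrapper around the existing
      Spectre-v3a capability check; an assumed sketch, not copied from the tree:
      
      	/* Needs <asm/cpufeature.h>; shape assumed from the commit text. */
      	static inline bool kvm_system_needs_idmapped_vectors(void)
      	{
      		return cpus_have_const_cap(ARM64_SPECTRE_V3A);
      	}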
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
3. 28 January 2022, 1 commit
4. 24 January 2022, 1 commit
5. 20 January 2022, 1 commit
• arm64: atomics: lse: Dereference matching size · 3364c6ce
  Committed by Kees Cook
      When building with -Warray-bounds, the following warning is generated:
      
      In file included from ./arch/arm64/include/asm/lse.h:16,
                       from ./arch/arm64/include/asm/cmpxchg.h:14,
                       from ./arch/arm64/include/asm/atomic.h:16,
                       from ./include/linux/atomic.h:7,
                       from ./include/asm-generic/bitops/atomic.h:5,
                       from ./arch/arm64/include/asm/bitops.h:25,
                       from ./include/linux/bitops.h:33,
                       from ./include/linux/kernel.h:22,
                       from kernel/printk/printk.c:22:
      ./arch/arm64/include/asm/atomic_lse.h:247:9: warning: array subscript 'long unsigned int[0]' is partly outside array bounds of 'atomic_t[1]' [-Warray-bounds]
        247 |         asm volatile(                                                   \
            |         ^~~
      ./arch/arm64/include/asm/atomic_lse.h:266:1: note: in expansion of macro '__CMPXCHG_CASE'
        266 | __CMPXCHG_CASE(w,  , acq_, 32,  a, "memory")
            | ^~~~~~~~~~~~~~
      kernel/printk/printk.c:3606:17: note: while referencing 'printk_cpulock_owner'
       3606 | static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
            |                 ^~~~~~~~~~~~~~~~~~~~
      
      This is due to the compiler seeing an unsigned long * cast against
      something (atomic_t) that is int sized. Replace the cast with the
      matching size cast. This results in no change in binary output.
      
      Note that __ll_sc__cmpxchg_case_##name##sz already uses the same
      constraint:
      
	[v] "+Q" (*(u##sz *)ptr)
      
      Which is why only the LSE form needs updating and not the
      LL/SC form, so this change is unlikely to be problematic.
      
      Cc: Will Deacon <will@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: linux-arm-kernel@lists.infradead.org
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220112202259.3950286-1-keescook@chromium.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
6. 16 January 2022, 1 commit
7. 15 January 2022, 1 commit
8. 12 January 2022, 1 commit
9. 22 December 2021, 1 commit
10. 20 December 2021, 1 commit
11. 18 December 2021, 1 commit
12. 17 December 2021, 1 commit
13. 16 December 2021, 4 commits
14. 15 December 2021, 4 commits
15. 14 December 2021, 8 commits
• arm64: atomics: lse: define RETURN ops in terms of FETCH ops · 053f58ba
  Committed by Mark Rutland
The FEAT_LSE atomic instructions include LD* instructions which return
the original value of a memory location and can be used to directly
implement FETCH operations. Each RETURN op is implemented as a copy of
      the corresponding FETCH op with a trailing instruction to generate the
      new value of the memory location. We only directly implement
      *_fetch_add*(), for which we have a trailing `add` instruction.
      
      As the compiler has no visibility of the `add`, this leads to less than
      optimal code generation when consuming the result.
      
      For example, the compiler cannot constant-fold the addition into later
      operations, and currently GCC 11.1.0 will compile:
      
             return __lse_atomic_sub_return(1, v) == 0;
      
      As:
      
      	mov     w1, #0xffffffff
      	ldaddal w1, w2, [x0]
      	add     w1, w1, w2
      	cmp     w1, #0x0
      	cset    w0, eq  // eq = none
      	ret
      
      This patch improves this by replacing the `add` with C addition after
      the inline assembly block, e.g.
      
      	ret += i;
      
      This allows the compiler to manipulate `i`. This permits the compiler to
      merge the `add` and `cmp` for the above, e.g.
      
      	mov     w1, #0xffffffff
      	ldaddal w1, w1, [x0]
      	cmp     w1, #0x1
      	cset    w0, eq  // eq = none
      	ret
      
      With this change the assembly for each RETURN op is identical to the
      corresponding FETCH op (including barriers and clobbers) so I've removed
      the inline assembly and rewritten each RETURN op in terms of the
      corresponding FETCH op, e.g.
      
| static inline int __lse_atomic_add_return(int i, atomic_t *v)
| {
|       return __lse_atomic_fetch_add(i, v) + i;
| }
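
      For comparison, the subtraction form falls out of the same scheme; a
      sketch of an equivalent formulation (the tree actually composes it via
      the negated ADD, as described in the SUBs-in-terms-of-ADDs entry below):
      
      	static inline int __lse_atomic_sub_return(int i, atomic_t *v)
      	{
      		return __lse_atomic_fetch_sub(i, v) - i;
      	}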
      
      The new construction does not adversely affect the common case, and
      before and after this patch GCC 11.1.0 can compile:
      
      	__lse_atomic_add_return(i, v)
      
      As:
      
      	ldaddal w0, w2, [x1]
      	add     w0, w0, w2
      
      ... while having the freedom to do better elsewhere.
      
      This is intended as an optimization and cleanup.
      There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-6-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
• arm64: atomics: lse: improve constraints for simple ops · 8a578a75
  Committed by Mark Rutland
      We have overly conservative assembly constraints for the basic FEAT_LSE
      atomic instructions, and using more accurate and permissive constraints
      will allow for better code generation.
      
The FEAT_LSE basic atomic instructions come in two forms:
      
      	LD{op}{order}{size} <Rs>, <Rt>, [<Rn>]
      	ST{op}{order}{size} <Rs>, [<Rn>]
      
      The ST* forms are aliases of the LD* forms where:
      
      	ST{op}{order}{size} <Rs>, [<Rn>]
      Is:
      	LD{op}{order}{size} <Rs>, XZR, [<Rn>]
      
      For either form, both <Rs> and <Rn> are read but not written back to,
      and <Rt> is written with the original value of the memory location.
      Where (<Rt> == <Rs>) or (<Rt> == <Rn>), <Rt> is written *after* the
      other register value(s) are consumed. There are no UNPREDICTABLE or
      CONSTRAINED UNPREDICTABLE behaviours when any pair of <Rs>, <Rt>, or
      <Rn> are the same register.
      
      Our current inline assembly always uses <Rs> == <Rt>, treating this
      register as both an input and an output (using a '+r' constraint). This
      forces the compiler to do some unnecessary register shuffling and/or
      redundant value generation.
      
      For example, the compiler cannot reuse the <Rs> value, and currently GCC
      11.1.0 will compile:
      
      	__lse_atomic_add(1, a);
      	__lse_atomic_add(1, b);
      	__lse_atomic_add(1, c);
      
      As:
      
      	mov     w3, #0x1
      	mov     w4, w3
      	stadd   w4, [x0]
      	mov     w0, w3
      	stadd   w0, [x1]
      	stadd   w3, [x2]
      
      We can improve this with more accurate constraints, separating <Rs> and
      <Rt>, where <Rs> is an input-only register ('r'), and <Rt> is an
      output-only value ('=r'). As <Rt> is written back after <Rs> is
      consumed, it does not need to be earlyclobber ('=&r'), leaving the
      compiler free to use the same register for both <Rs> and <Rt> where this
      is desirable.
      
      At the same time, the redundant 'r' constraint for `v` is removed, as
      the `+Q` constraint is sufficient.
      
      With this change, the above example becomes:
      
      	mov     w3, #0x1
      	stadd   w3, [x0]
      	stadd   w3, [x1]
      	stadd   w3, [x2]
      
      I've made this change for the non-value-returning and FETCH ops. The
      RETURN ops have a multi-instruction sequence for which we cannot use the
same constraints, and a subsequent patch will rewrite the RETURN ops in
      terms of the FETCH ops, relying on the ability for the compiler to reuse
      the <Rs> value.
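
      As a rough sketch of what the relaxed constraint scheme looks like for a
      FETCH op (simplified, in kernel style, not the exact macro from the tree):
      
      	static inline int lse_atomic_fetch_add_sketch(int i, atomic_t *v)
      	{
      		int old;
      
      		asm volatile(
      		"	ldaddal	%w[i], %w[old], %[v]"
      		: [v] "+Q" (v->counter), [old] "=r" (old)	/* <Rt>: output only, no earlyclobber */
      		: [i] "r" (i)					/* <Rs>: input only */
      		: "memory");
      
      		return old;
      	}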
      
      This is intended as an optimization.
      There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-5-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
• arm64: atomics: lse: define ANDs in terms of ANDNOTs · 5e9e43c9
  Committed by Mark Rutland
      The FEAT_LSE atomic instructions include atomic bit-clear instructions
      (`ldclr*` and `stclr*`) which can be used to directly implement ANDNOT
      operations. Each AND op is implemented as a copy of the corresponding
      ANDNOT op with a leading `mvn` instruction to apply a bitwise NOT to the
      `i` argument.
      
      As the compiler has no visibility of the `mvn`, this leads to less than
      optimal code generation when generating `i` into a register. For
      example, __lse_atomic_fetch_and(0xf, v) can be compiled to:
      
      	mov     w1, #0xf
      	mvn     w1, w1
      	ldclral w1, w1, [x2]
      
      This patch improves this by replacing the `mvn` with NOT in C before the
      inline assembly block, e.g.
      
      	i = ~i;
      
      This allows the compiler to generate `i` into a register more optimally,
      e.g.
      
      	mov     w1, #0xfffffff0
      	ldclral w1, w1, [x2]
      
      With this change the assembly for each AND op is identical to the
      corresponding ANDNOT op (including barriers and clobbers), so I've
      removed the inline assembly and rewritten each AND op in terms of the
      corresponding ANDNOT op, e.g.
      
      | static inline void __lse_atomic_and(int i, atomic_t *v)
      | {
      | 	return __lse_atomic_andnot(~i, v);
      | }
      
      This is intended as an optimization and cleanup.
      There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
• arm64: atomics: lse: define SUBs in terms of ADDs · ef532450
  Committed by Mark Rutland
      The FEAT_LSE atomic instructions include atomic ADD instructions
      (`stadd*` and `ldadd*`), but do not include atomic SUB instructions, so
      we must build all of the SUB operations using the ADD instructions. We
      open-code these today, with each SUB op implemented as a copy of the
      corresponding ADD op with a leading `neg` instruction in the inline
      assembly to negate the `i` argument.
      
      As the compiler has no visibility of the `neg`, this leads to less than
      optimal code generation when generating `i` into a register. For
example, __lse_atomic_fetch_sub(1, v) can be compiled to:
      
      	mov     w1, #0x1
      	neg     w1, w1
      	ldaddal w1, w1, [x2]
      
      This patch improves this by replacing the `neg` with negation in C
      before the inline assembly block, e.g.
      
      	i = -i;
      
      This allows the compiler to generate `i` into a register more optimally,
      e.g.
      
      	mov     w1, #0xffffffff
      	ldaddal w1, w1, [x2]
      
      With this change the assembly for each SUB op is identical to the
      corresponding ADD op (including barriers and clobbers), so I've removed
      the inline assembly and rewritten each SUB op in terms of the
      corresponding ADD op, e.g.
      
      | static inline void __lse_atomic_sub(int i, atomic_t *v)
      | {
      | 	__lse_atomic_add(-i, v);
      | }
      
      For clarity I've moved the definition of each SUB op immediately after
      the corresponding ADD op, and used a single macro to create the RETURN
      forms of both ops.
      
      This is intended as an optimization and cleanup.
      There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
• arm64: atomics: format whitespace consistently · 8e6082e9
  Committed by Mark Rutland
      The code for the atomic ops is formatted inconsistently, and while this
      is not a functional problem it is rather distracting when working on
      them.
      
Some ops have consistent indentation, e.g.
      
      | #define ATOMIC_OP_ADD_RETURN(name, mb, cl...)                           \
      | static inline int __lse_atomic_add_return##name(int i, atomic_t *v)     \
      | {                                                                       \
      |         u32 tmp;                                                        \
      |                                                                         \
      |         asm volatile(                                                   \
      |         __LSE_PREAMBLE                                                  \
      |         "       ldadd" #mb "    %w[i], %w[tmp], %[v]\n"                 \
      |         "       add     %w[i], %w[i], %w[tmp]"                          \
      |         : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp)        \
      |         : "r" (v)                                                       \
      |         : cl);                                                          \
      |                                                                         \
      |         return i;                                                       \
      | }
      
      While others have negative indentation for some lines, and/or have
      misaligned trailing backslashes, e.g.
      
      | static inline void __lse_atomic_##op(int i, atomic_t *v)                        \
      | {                                                                       \
      |         asm volatile(                                                   \
      |         __LSE_PREAMBLE                                                  \
      | "       " #asm_op "     %w[i], %[v]\n"                                  \
      |         : [i] "+r" (i), [v] "+Q" (v->counter)                           \
      |         : "r" (v));                                                     \
      | }
      
      This patch makes the indentation consistent and also aligns the trailing
      backslashes. This makes the code easier to read for those (like myself)
      who are easily distracted by these inconsistencies.
      
      This is intended as a cleanup.
      There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20211210151410.2782645-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
• arm64: cpufeature: add HWCAP for FEAT_RPRES · 1175011a
  Committed by Joey Gouly
      Add a new HWCAP to detect the Increased precision of Reciprocal Estimate
      and Reciprocal Square Root Estimate feature (FEAT_RPRES), introduced in Armv8.7.
      
      Also expose this to userspace in the ID_AA64ISAR2_EL1 feature register.
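
      Userspace can then test for the feature via the auxiliary vector; a
      sketch, assuming HWCAP2_RPRES is normally picked up from the uapi
      <asm/hwcap.h> (the fallback value below is an assumption for illustration):
      
      	#include <elf.h>
      	#include <stdbool.h>
      	#include <sys/auxv.h>
      
      	#ifndef HWCAP2_RPRES
      	#define HWCAP2_RPRES	(1UL << 21)	/* assumed bit; the real value comes from <asm/hwcap.h> */
      	#endif
      
      	static bool cpu_has_rpres(void)
      	{
      		return (getauxval(AT_HWCAP2) & HWCAP2_RPRES) != 0;
      	}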
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-4-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
• arm64: add ID_AA64ISAR2_EL1 sys register · 9e45365f
  Committed by Joey Gouly
This is a new ID register, introduced in Armv8.7.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Alexandru Elisei <alexandru.elisei@arm.com>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: Reiji Watanabe <reijiw@google.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-3-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
• arm64: cpufeature: add HWCAP for FEAT_AFP · 5c13f042
  Committed by Joey Gouly
      Add a new HWCAP to detect the Alternate Floating-point Behaviour
      feature (FEAT_AFP), introduced in Armv8.7.
      
      Also expose this to userspace in the ID_AA64MMFR1_EL1 feature register.
Signed-off-by: Joey Gouly <joey.gouly@arm.com>
      Cc: Will Deacon <will@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20211210165432.8106-2-joey.gouly@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
16. 13 December 2021, 1 commit
17. 10 December 2021, 2 commits
18. 08 December 2021, 3 commits
• KVM: arm64: Drop unused workaround_flags vcpu field · 142ff9bd
  Committed by Marc Zyngier
      workaround_flags is a leftover from our earlier Spectre-v4 workaround
      implementation, and now serves no purpose.
      
      Get rid of the field and the corresponding asm-offset definition.
      
      Fixes: 29e8910a ("KVM: arm64: Simplify handling of ARCH_WORKAROUND_2")
Signed-off-by: Marc Zyngier <maz@kernel.org>
• KVM: Drop obsolete kvm_arch_vcpu_block_finish() · 005467e0
  Committed by Sean Christopherson
      Drop kvm_arch_vcpu_block_finish() now that all arch implementations are
      nops.
      
      No functional change intended.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211009021236.4122790-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
• KVM: arm64: Move vGIC v4 handling for WFI out of arch callback hook · 6109c5a6
  Committed by Sean Christopherson
      Move the put and reload of the vGIC out of the block/unblock callbacks
      and into a dedicated WFI helper.  Functionally, this is nearly a nop as
      the block hook is called at the very beginning of kvm_vcpu_block(), and
      the only code in kvm_vcpu_block() after the unblock hook is to update the
      halt-polling controls, i.e. can only affect the next WFI.
      
      Back when the arch (un)blocking hooks were added by commits 3217f7c2
("KVM: Add kvm_arch_vcpu_{un}blocking callbacks") and d35268da
      ("arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block"),
      the hooks were invoked only when KVM was about to "block", i.e. schedule
      out the vCPU.  The use case at the time was to schedule a timer in the
      host based on the earliest timer in the guest in order to wake the
      blocking vCPU when the emulated guest timer fired.  Commit accb99bc
      ("KVM: arm/arm64: Simplify bg_timer programming") reworked the timer
      logic to be even more precise, by waiting until the vCPU was actually
scheduled out, and so moved the timer logic from the (un)blocking hooks to
vcpu_load/put.
      
      In the meantime, the hooks gained usage for enabling vGIC v4 doorbells in
      commit df9ba959 ("KVM: arm/arm64: GICv4: Use the doorbell interrupt
      as an unblocking source"), and added related logic for the VMCR in commit
      5eeaf10e ("KVM: arm/arm64: Sync ICH_VMCR_EL2 back when about to block").
      
      Finally, commit 07ab0f8d ("KVM: Call kvm_arch_vcpu_blocking early
      into the blocking sequence") hoisted the (un)blocking hooks so that they
      wrapped KVM's halt-polling logic in addition to the core "block" logic.
      
      In other words, the original need for arch hooks to take action _only_
      in the block path is long since gone.
      
      Cc: Oliver Upton <oupton@google.com>
      Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211009021236.4122790-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
19. 07 December 2021, 1 commit
• locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h · 77993b59
  Committed by Sebastian Andrzej Siewior
The printk header file includes ratelimit_types.h for its __ratelimit()
based usage. It is required for the static initializer used in
printk_ratelimited(). It uses a raw_spinlock_t and includes
spinlock_types.h.

PREEMPT_RT substitutes spinlock_t with an rtmutex-based implementation, and so
its spinlock_t implementation (provided by spinlock_rt.h) includes rtmutex.h and
      atomic.h which leads to recursive includes where defines are missing.
      
      By including only the raw_spinlock_t defines it avoids the atomic.h
      related includes at this stage.
      
      An example on powerpc:
      
      |  CALL    scripts/atomic/check-atomics.sh
      |In file included from include/linux/bug.h:5,
      |                 from include/linux/page-flags.h:10,
      |                 from kernel/bounds.c:10:
      |arch/powerpc/include/asm/page_32.h: In function ‘clear_page’:
|arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function ‘__WARN’ [-Werror=implicit-function-declaration]
      |   87 |    __WARN();    \
      |      |    ^~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ON’
      |   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
      |      |  ^~~~~~~
|arch/powerpc/include/asm/bug.h:58:17: error: invalid application of ‘sizeof’ to incomplete type ‘struct bug_entry’
      |   58 |     "i" (sizeof(struct bug_entry)), \
      |      |                 ^~~~~~
|arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro ‘BUG_ENTRY’
      |   89 |   BUG_ENTRY(PPC_TLNEI " %4, 0",   \
      |      |   ^~~~~~~~~
|arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ON’
      |   48 |  WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
      |      |  ^~~~~~~
      |In file included from arch/powerpc/include/asm/ptrace.h:298,
      |                 from arch/powerpc/include/asm/hw_irq.h:12,
      |                 from arch/powerpc/include/asm/irqflags.h:12,
      |                 from include/linux/irqflags.h:16,
      |                 from include/asm-generic/cmpxchg-local.h:6,
      |                 from arch/powerpc/include/asm/cmpxchg.h:526,
      |                 from arch/powerpc/include/asm/atomic.h:11,
      |                 from include/linux/atomic.h:7,
      |                 from include/linux/rwbase_rt.h:6,
      |                 from include/linux/rwlock_types.h:55,
      |                 from include/linux/spinlock_types.h:74,
      |                 from include/linux/ratelimit_types.h:7,
      |                 from include/linux/printk.h:10,
      |                 from include/asm-generic/bug.h:22,
      |                 from arch/powerpc/include/asm/bug.h:109,
      |                 from include/linux/bug.h:5,
      |                 from include/linux/page-flags.h:10,
      |                 from kernel/bounds.c:10:
|include/linux/thread_info.h: In function ‘copy_overflow’:
|include/linux/thread_info.h:210:2: error: implicit declaration of function ‘WARN’ [-Werror=implicit-function-declaration]
      |  210 |  WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count);
      |      |  ^~~~
      
      The WARN / BUG include pulls in printk.h and then ptrace.h expects WARN
      (from bug.h) which is not yet complete. Even hw_irq.h has WARN_ON()
      statements.
      
      On POWERPC64 there are missing atomic64 defines while building 32bit
      VDSO:
      |  VDSO32C arch/powerpc/kernel/vdso32/vgettimeofday.o
      |In file included from include/linux/atomic.h:80,
      |                 from include/linux/rwbase_rt.h:6,
      |                 from include/linux/rwlock_types.h:55,
      |                 from include/linux/spinlock_types.h:74,
      |                 from include/linux/ratelimit_types.h:7,
      |                 from include/linux/printk.h:10,
      |                 from include/linux/kernel.h:19,
      |                 from arch/powerpc/include/asm/page.h:11,
      |                 from arch/powerpc/include/asm/vdso/gettimeofday.h:5,
      |                 from include/vdso/datapage.h:137,
      |                 from lib/vdso/gettimeofday.c:5,
      |                 from <command-line>:
|include/linux/atomic-arch-fallback.h: In function ‘arch_atomic64_inc’:
|include/linux/atomic-arch-fallback.h:1447:2: error: implicit declaration of function ‘arch_atomic64_add’; did you mean ‘arch_atomic_add’? [-Werror=implicit-function-declaration]
      | 1447 |  arch_atomic64_add(1, v);
      |      |  ^~~~~~~~~~~~~~~~~
      |      |  arch_atomic_add
      
The generic fallback is not included, and atomics themselves are not used. If
kernel.h does not include printk.h then it comes later from the bug.h
include.
      
      Allow asm/spinlock_types.h to be included from
      linux/spinlock_types_raw.h.
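
      The shape of the change on the asm-header side, sketched (guard macro
      names are assumed from the description rather than copied from the tree):
      
      	/* asm/spinlock_types.h, before: only linux/spinlock_types.h (or the
      	 * arch spinlock header) was allowed to pull this in. */
      	#if !defined(__LINUX_SPINLOCK_TYPES_H) && !defined(__ASM_SPINLOCK_H)
      	# error "please don't include this file directly"
      	#endif
      
      	/* After: linux/spinlock_types_raw.h may include it as well, so the raw
      	 * spinlock type is available without dragging in atomics. */
      	#if !defined(__LINUX_SPINLOCK_TYPES_RAW_H) && !defined(__ASM_SPINLOCK_H)
      	# error "please don't include this file directly"
      	#endif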
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20211129174654.668506-12-bigeasy@linutronix.de