1. 16 1月, 2018 1 次提交
    • P
      KVM: PPC: Book3S HV: Enable migration of decrementer register · 5855564c
      Paul Mackerras 提交于
      This adds a register identifier for use with the one_reg interface
      to allow the decrementer expiry time to be read and written by
      userspace.  The decrementer expiry time is in guest timebase units
      and is equal to the sum of the decrementer and the guest timebase.
      (The expiry time is used rather than the decrementer value itself
      because the expiry time is not constantly changing, though the
      decrementer value is, while the guest vcpu is not running.)
      
      Without this, a guest vcpu migrated to a new host will see its
      decrementer set to some random value.  On POWER8 and earlier, the
      decrementer is 32 bits wide and counts down at 512MHz, so the
      guest vcpu will potentially see no decrementer interrupts for up
      to about 4 seconds, which will lead to a stall.  With POWER9, the
      decrementer is now 56 bits side, so the stall can be much longer
      (up to 2.23 years) and more noticeable.
      
      To help work around the problem in cases where userspace has not been
      updated to migrate the decrementer expiry time, we now set the
      default decrementer expiry at vcpu creation time to the current time
      rather than the maximum possible value.  This should mean an
      immediate decrementer interrupt when a migrated vcpu starts
      running.  In cases where the decrementer is 32 bits wide and more
      than 4 seconds elapse between the creation of the vcpu and when it
      first runs, the decrementer would have wrapped around to positive
      values and there may still be a stall - but this is no worse than
      the current situation.  In the large-decrementer case, we are sure
      to get an immediate decrementer interrupt (assuming the time from
      vcpu creation to first run is less than 2.23 years) and we thus
      avoid a very long stall.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      5855564c
  2. 11 1月, 2018 2 次提交
  3. 23 11月, 2017 6 次提交
    • P
      KVM: PPC: Book3S: Eliminate some unnecessary checks · 9aa6825b
      Paul Mackerras 提交于
      In an excess of caution, commit 6f63e81b ("KVM: PPC: Book3S: Add
      MMIO emulation for FP and VSX instructions", 2017-02-21) included
      checks for the case that vcpu->arch.mmio_vsx_copy_nums is less than
      zero, even though its type is u8.  This causes a Coverity warning,
      so we remove the check for < 0.  We also adjust the associated
      comment to be more accurate ("4 or less" rather than "less than 4").
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      9aa6825b
    • P
      KVM: PPC: Book3S HV: Fix conditions for starting vcpu · c0093f1a
      Paul Mackerras 提交于
      This corrects the test that determines whether a vcpu that has just
      become able to run in the guest (e.g. it has just finished handling
      a hypercall or hypervisor page fault) and whose virtual core is
      already running somewhere as a "piggybacked" vcore can start
      immediately or not.  (A piggybacked vcore is one which is executing
      along with another vcore as a result of dynamic micro-threading.)
      
      Previously the test tried to lock the piggybacked vcore using
      spin_trylock, which would always fail because the vcore was already
      locked, and so the vcpu would have to wait until its vcore exited
      the guest before it could enter.
      
      In fact the vcpu can enter if its vcore is in VCORE_PIGGYBACK state
      and not already exiting (or exited) the guest, so the test in
      VCORE_PIGGYBACK state is basically the same as for VCORE_RUNNING
      state.
      
      Coverity detected this as a double unlock issue, which it isn't
      because the spin_trylock would always fail.  This will fix the
      apparent double unlock as well.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      c0093f1a
    • P
      KVM: PPC: Book3S HV: Remove useless statement · 4fcf361d
      Paul Mackerras 提交于
      This removes a statement that has no effect.  It should have been
      removed in commit 898b25b2 ("KVM: PPC: Book3S HV: Simplify dynamic
      micro-threading code", 2017-06-22) along with the loop over the
      piggy-backed virtual cores.
      
      This issue was reported by Coverity.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      4fcf361d
    • P
      KVM: PPC: Book3S HV: Fix typo in kvmppc_hv_get_dirty_log_radix() · 117647ff
      Paul Mackerras 提交于
      This fixes a typo where the intent was to assign to 'j' in order to
      skip some number of bits in the dirty bitmap for a guest.  The effect
      of the typo is benign since it means we just iterate through all the
      bits rather than skipping bits which we know will be zero.  This issue
      was found by Coverity.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      117647ff
    • P
      KVM: PPC: Book3S HV: Avoid shifts by negative amounts · cda2eaa3
      Paul Mackerras 提交于
      The kvmppc_hpte_page_shifts function decodes the actual and base page
      sizes for a HPTE, returning -1 if it doesn't recognize the page size
      encoding.  This then gets used as a shift amount in various places,
      which is undefined behaviour.  This was reported by Coverity.
      
      In fact this should never occur, since we should only get HPTEs in the
      HPT which have a recognized page size encoding.  The only place where
      this might not be true is in the call to kvmppc_actual_pgsz() near the
      beginning of kvmppc_do_h_enter(), where we are validating the HPTE
      value passed in from the guest.
      
      So to fix this and eliminate the undefined behaviour, we make
      kvmppc_hpte_page_shifts return 0 for unrecognized page size encodings,
      and make kvmppc_actual_pgsz() detect that case and return 0 for the
      page size, which will then cause kvmppc_do_h_enter() to return an
      error and refuse to insert any HPTE with an unrecognized page size
      encoding.
      
      To ensure that we don't get undefined behaviour in compute_tlbie_rb(),
      we take the 4k page size path for any unrecognized page size encoding.
      This should never be hit in practice because it is only used on HPTE
      values which have previously been checked for having a recognized
      page size encoding.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      cda2eaa3
    • P
      KVM: PPC: Book3S HV: Fix migration and HPT resizing of HPT guests on radix hosts · ded13fc1
      Paul Mackerras 提交于
      This fixes two errors that prevent a guest using the HPT MMU from
      successfully migrating to a POWER9 host in radix MMU mode, or resizing
      its HPT when running on a radix host.
      
      The first bug was that commit 8dc6cca5 ("KVM: PPC: Book3S HV:
      Don't rely on host's page size information", 2017-09-11) missed two
      uses of hpte_base_page_size(), one in the HPT rehashing code and
      one in kvm_htab_write() (which is used on the destination side in
      migrating a HPT guest).  Instead we use kvmppc_hpte_base_page_shift().
      Having the shift count means that we can use left and right shifts
      instead of multiplication and division in a few places.
      
      Along the way, this adds a check in kvm_htab_write() to ensure that the
      page size encoding in the incoming HPTEs is recognized, and if not
      return an EINVAL error to userspace.
      
      The second bug was that kvm_htab_write was performing some but not all
      of the functions of kvmhv_setup_mmu(), resulting in the destination VM
      being left in radix mode as far as the hardware is concerned.  The
      simplest fix for now is make kvm_htab_write() call
      kvmppc_setup_partition_table() like kvmppc_hv_setup_htab_rma() does.
      In future it would be better to refactor the code more extensively
      to remove the duplication.
      
      Fixes: 8dc6cca5 ("KVM: PPC: Book3S HV: Don't rely on host's page size information")
      Fixes: 7a84084c ("KVM: PPC: Book3S HV: Set partition table rather than SDR1 on POWER9")
      Reported-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Tested-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      ded13fc1
  4. 17 11月, 2017 1 次提交
  5. 16 11月, 2017 16 次提交
  6. 15 11月, 2017 1 次提交
    • M
      powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature · 3ffa9d9e
      Michael Ellerman 提交于
      Recently we added a CPU feature for Power9 DD2.0, to capture the fact
      that some workarounds are required only on Power9 DD1 and DD2.0 but
      not DD2.1 or later.
      
      Then in commit 9d2f510a ("powerpc/64s/idle: avoid POWER9 DD1 and
      DD2.0 ERAT workaround on DD2.1") and commit e3646330
      "powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on
      DD2.1") we changed CPU_FTR_SECTIONs to check for DD1 or DD20, eg:
      
        BEGIN_FTR_SECTION
                PPC_INVALIDATE_ERAT
        END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)
      
      Unfortunately although this reads as "if set DD1 or DD2.0", the or is
      a bitwise or and actually generates a mask of both bits. The code that
      does the feature patching then checks that the value of the CPU
      features masked with that mask are equal to the mask.
      
      So the end result is we're checking for DD1 and DD20 being set, which
      never happens. Yes the API is terrible.
      
      Removing the ERAT workaround on DD2.0 results in random SEGVs, the
      system tends to boot, but things randomly die including sometimes
      dhclient, udev etc.
      
      To fix the problem and hopefully avoid it in future, we remove the
      DD2.0 CPU feature and instead add a DD2.1 (or later) feature. This
      allows us to easily express that the workarounds are required if DD2.1
      is not set.
      
      At some point we will drop the DD1 workarounds entirely and some of
      this can be cleaned up.
      
      Fixes: 9d2f510a ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1")
      Fixes: e3646330 ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3ffa9d9e
  7. 14 11月, 2017 7 次提交
    • M
      powerpc/64s: Fix masking of SRR1 bits on instruction fault · 475b581f
      Michael Ellerman 提交于
      On 64-bit Book3s, when we take an instruction fault the reason for the
      fault may be reported in SRR1. For data faults the reason is reported
      in DSISR (Data Storage Instruction Status Register).
      
      The reasons reported in each do not necessarily correspond, so we mask
      the SRR1 bits before copying them to the DSISR, which is then used by
      the page fault code.
      
      Prior to commit b4c001dc ("powerpc/mm: Use symbolic constants for
      filtering SRR1 bits on ISIs") we used a hard-coded mask of 0x58200000,
      which corresponds to:
      
        DSISR_NOHPTE		0x40000000 /* no translation found */
        DSISR_NOEXEC_OR_G	0x10000000 /* exec of no-exec or guarded */
        DSISR_PROTFAULT	0x08000000 /* protection fault */
        DSISR_KEYFAULT	0x00200000 /* Storage Key fault */
      
      That commit added a #define for the mask, DSISR_SRR1_MATCH_64S, but
      incorrectly used a different similarly named DSISR_BAD_FAULT_64S.
      
      This had the effect of changing the mask to 0xa43a0000, which omits
      everything but DSISR_KEYFAULT.
      
      Luckily this had no visible effect, because in practice we hardly use
      the DSISR bits. The lack of DSISR_NOHPTE means a TLB flush
      optimisation was missed in the native HPTE code, and DSISR_NOEXEC_OR_G
      and DSISR_PROTFAULT are both only used to trigger rare warnings.
      
      So we got lucky, but let's fix it. The new value only has bits between
      17 and 30 set, so we can continue to use andis.
      
      Fixes: b4c001dc ("powerpc/mm: Use symbolic constants for filtering SRR1 bits on ISIs")
      Cc: stable@vger.kernel.org # v4.14+
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      475b581f
    • R
      x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu() · b29c6ef7
      Rafael J. Wysocki 提交于
      Even though aperfmperf_snapshot_khz() caches the samples.khz value to
      return if called again in a sufficiently short time, its caller,
      arch_freq_get_on_cpu(), still uses smp_call_function_single() to run it
      which may allow user space to trigger an IPI storm by reading from the
      scaling_cur_freq cpufreq sysfs file in a tight loop.
      
      To avoid that, move the decision on whether or not to return the cached
      samples.khz value to arch_freq_get_on_cpu().
      
      This change was part of commit 941f5f0f ("x86: CPU: Fix up "cpu MHz"
      in /proc/cpuinfo"), but it was not the reason for the revert and it
      remains applicable.
      
      Fixes: 4815d3c5 (cpufreq: x86: Make scaling_cur_freq behave more as expected)
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NWANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b29c6ef7
    • D
      MIPS: Add iomem resource for kernel bss section. · e0c5f36b
      David Daney 提交于
      The kexec/kdump tools need to know where the .bss is so it can be
      included in the core dump.  This allows vmcore-dmesg to have access to
      the dmesg buffers of the crashed kernel as well as allowing the
      debugger to examine variables in the bss section.
      
      Add a request for the bss resource in addition to the already
      requested code and data sections.
      Signed-off-by: NDavid Daney <david.daney@cavium.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Steven J. Hill <steven.hill@cavium.com>,
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/17485/Signed-off-by: NJames Hogan <jhogan@kernel.org>
      e0c5f36b
    • B
      MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP · a3f14310
      Ben Hutchings 提交于
      __cmpxchg64_local_generic() is atomic only w.r.t tasks and interrupts
      on the same CPU (that's what the 'local' means).  We can't use it to
      implement cmpxchg64() in SMP configurations.
      
      So, for 32-bit SMP configurations:
      
      - Don't define cmpxchg64()
      - Don't enable HAVE_VIRT_CPU_ACCOUNTING_GEN, which requires it
      
      Fixes: e2093c7b ("MIPS: Fall back to generic implementation of ...")
      Fixes: bb877e96 ("MIPS: Add support for full dynticks CPU time accounting")
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@mips.com>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 4.1+
      Patchwork: https://patchwork.linux-mips.org/patch/17413/Signed-off-by: NJames Hogan <jhogan@kernel.org>
      a3f14310
    • J
      MIPS: BMIPS: Enable HARDIRQS_SW_RESEND · 4dc4704c
      Justin Chen 提交于
      HW interrupts triggered when irq_disable() were being ignored. Enable
      resending HW interrupts as SW interrupts.
      
      This was causing an issue where the interrupts waking the system up from
      a suspend state were not calling their interrupt handlers.
      Signed-off-by: NJustin Chen <justinpopo6@gmail.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16116/Signed-off-by: NJames Hogan <jhogan@kernel.org>
      4dc4704c
    • D
      arm64: Make ARMV8_DEPRECATED depend on SYSCTL · 6cfa7cc4
      Dave Martin 提交于
      If CONFIG_SYSCTL=n and CONFIG_ARMV8_DEPRECATED=y, the deprecated
      instruction emulation code currently leaks some memory at boot
      time, and won't have any runtime control interface.  This does
      not feel like useful or intended behaviour...
      
      This patch adds a dependency on CONFIG_SYSCTL, so that such a
      kernel can't be built in the first place.
      
      It's probably not worth adding the error-handling / cleanup code
      that would be needed to deal with this otherwise: people who
      desperately need the emulation can still enable SYSCTL.
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      6cfa7cc4
    • J
      arm64: Implement __lshrti3 library function · 9bfe7553
      Jason A. Donenfeld 提交于
      Commit fb872273 ("arm64: support __int128 on gcc 5+") added support
      for the __int128 data type, but this breaks the build in some configurations
      where GCC ends up emitting calls to the __lshrti3 helper in libgcc, which
      results in a link error:
      
        kernel/sched/fair.o: In function `__calc_delta':
        fair.c:(.text+0xca0): undefined reference to `__lshrti3'
        kernel/time/timekeeping.o: In function `timekeeping_resume':
        timekeeping.c:(.text+0x3f60): undefined reference to `__lshrti3'
        make: *** [vmlinux] Error 1
      
      Fix the build by providing an implementation of __lshrti3, like we do
      already for __ashlti3 and __ashrti3.
      Reported-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      9bfe7553
  8. 13 11月, 2017 6 次提交