1. 19 Feb 2019, 3 commits
  2. 10 Feb 2019, 1 commit
    • x86/mm: Make set_pmd_at() paravirt aware · 20e55bc1
      Authored by Juergen Gross
      set_pmd_at() calls native_set_pmd() unconditionally on x86. This was
      fine as long as only huge page entries were written via set_pmd_at(),
      as Xen pv guests don't support those.
      
      Commit 2c91bd4a ("mm: speed up mremap by 20x on large regions")
      introduced a usage of set_pmd_at() possible on pv guests, leading to
      failures like:
      
      BUG: unable to handle kernel paging request at ffff888023e26778
      #PF error: [PROT] [WRITE]
      RIP: e030:move_page_tables+0x7c1/0xae0
      move_vma.isra.3+0xd1/0x2d0
      __se_sys_mremap+0x3c6/0x5b0
       do_syscall_64+0x49/0x100
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Make set_pmd_at() paravirt aware by just letting it use set_pmd().
      
      Fixes: 2c91bd4a ("mm: speed up mremap by 20x on large regions")
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: xen-devel@lists.xenproject.org
      Cc: boris.ostrovsky@oracle.com
      Cc: sstabellini@kernel.org
      Cc: hpa@zytor.com
      Cc: bp@alien8.de
      Cc: torvalds@linux-foundation.org
      Link: https://lkml.kernel.org/r/20190210074056.11842-1-jgross@suse.com
      20e55bc1
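      A minimal sketch of the idea (not the literal diff): route set_pmd_at() through the
      paravirt-aware set_pmd() helper instead of hard-coding native_set_pmd(), so Xen pv
      guests get the pv_ops hook; the surrounding context is assumed.

        /* Sketch: delegate to the paravirt-aware setter so pv guests trap the write. */
        static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
                                      pmd_t *pmdp, pmd_t pmd)
        {
                set_pmd(pmdp, pmd);     /* previously: native_set_pmd(pmdp, pmd) */
        }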
  3. 08 Feb 2019, 5 commits
    • x86/mm/cpa: Fix set_mce_nospec() · 0521e8be
      Authored by Peter Zijlstra
      The recent commit fe0937b2 ("x86/mm/cpa: Fold cpa_flush_range() and
      cpa_flush_array() into a single cpa_flush() function") accidentally made
      the call to make_addr_canonical_again() go away, which breaks
      set_mce_nospec().
      
      Re-instate the call to convert the address back into canonical form right
      before invoking either CLFLUSH or INVLPG. Rename the function while at it
      to be shorter (and less MAGA).
      
      Fixes: fe0937b2 ("x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function")
      Reported-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Tony Luck <tony.luck@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Rik van Riel <riel@surriel.com>
      Link: https://lkml.kernel.org/r/20190208120859.GH32511@hirez.programming.kicks-ass.net
      0521e8be
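      For context, a self-contained user-space sketch of what making an address
      "canonical again" means on x86-64: sign-extend bit 47 into the upper bits before
      the address reaches CLFLUSH/INVLPG. The helper name is illustrative, not the kernel's.

        #include <stdint.h>
        #include <stdio.h>

        /* Sign-extend bit 47 so a 48-bit virtual address becomes canonical again. */
        static uint64_t make_canonical(uint64_t addr)
        {
                return (uint64_t)((int64_t)(addr << 16) >> 16);
        }

        int main(void)
        {
                uint64_t decoy = 0x0000888023e26778ULL; /* upper bits stripped, non-canonical */
                printf("%#llx -> %#llx\n", (unsigned long long)decoy,
                       (unsigned long long)make_canonical(decoy));
                return 0;
        }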
    • mips: cm: reprime error cause · 05dc6001
      Authored by Vladimir Kondratiev
      According to the documentation
      ---cut---
      The GCR_ERROR_CAUSE.ERR_TYPE field and the GCR_ERROR_MULT.ERR_TYPE
      fields can be cleared by either a reset or by writing the current
      value of GCR_ERROR_CAUSE.ERR_TYPE to the
      GCR_ERROR_CAUSE.ERR_TYPE register.
      ---cut---
      Do exactly this. Original value of cm_error may be safely written back;
      it clears error cause and keeps other bits untouched.
      
      Fixes: 3885c2b4 ("MIPS: CM: Add support for reporting CM cache errors")
      Signed-off-by: Vladimir Kondratiev <vladimir.kondratiev@linux.intel.com>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # v4.3+
      05dc6001
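      A hedged sketch of the "write the current value back to clear it" pattern the quoted
      documentation describes; the pointer below stands in for the memory-mapped
      GCR_ERROR_CAUSE register and is not the real CM register map.

        #include <stdint.h>

        /* Writing the current ERR_TYPE value back to GCR_ERROR_CAUSE clears the error
         * cause and re-arms reporting while leaving the other bits untouched. */
        static void reprime_cm_error(volatile uint64_t *gcr_error_cause)
        {
                uint64_t cm_error = *gcr_error_cause;   /* read the current cause */

                *gcr_error_cause = cm_error;            /* write it back to reprime */
        }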
    • mips: loongson64: remove unreachable(), fix loongson_poweroff(). · 8a96669d
      Authored by Yifeng Li
      On my Yeeloong 8089, I noticed the machine fails to shutdown
      properly, and often, the function mach_prepare_reboot() is
      unexpectedly executed, thus the machine reboots instead. A
      wait loop is needed to ensure the system is in a well-defined
      state before going down.
      
      In commit 997e93d4 ("MIPS: Hang more efficiently on
      halt/powerdown/restart"), a general superset of the wait loop for all
      platforms is already provided, so we don't need to implement our own.
      
      This commit simply removes the unreachable() compiler macro after
      mach_prepare_reboot(), thus allowing the execution of machine_hang().
      My test shows that the machine is now able to shutdown successfully.
      
      Please note that there are two different bugs preventing the machine
      from shutting down, another work-in-progress commit is needed to
      fix a lockup in cpufreq / i8259 driver, please read Reference, this
      commit does not fix that bug.
      
      Reference: https://lkml.org/lkml/2019/2/5/908
      Signed-off-by: Yifeng Li <tomli@tomli.me>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Cc: stable@vger.kernel.org # v4.17+
      8a96669d
    • KVM: nVMX: unconditionally cancel preemption timer in free_nested (CVE-2019-7221) · ecec7688
      Authored by Peter Shier
      Bugzilla: 1671904
      
      There are multiple code paths where an hrtimer may have been started to
      emulate an L1 VMX preemption timer that can result in a call to free_nested
      without an intervening L2 exit where the hrtimer is normally
      cancelled. Unconditionally cancel in free_nested to cover all cases.
      
      Embargoed until Feb 7th 2019.
      Signed-off-by: Peter Shier <pshier@google.com>
      Reported-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reported-by: Felix Wilhelm <fwilhelm@google.com>
      Cc: stable@kernel.org
      Message-Id: <20181011184646.154065-1-pshier@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ecec7688
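      A hedged sketch of the shape of the fix described above: cancel the hrtimer that
      emulates the L1 VMX preemption timer whenever the nested state is torn down. The
      struct layout is an assumption for illustration; hrtimer_cancel() is the real kernel API.

        /* Sketch only, not the exact KVM code: the teardown path cancels the emulated
         * preemption timer even when no intervening L2 exit ran. */
        static void free_nested(struct vcpu_vmx *vmx)
        {
                if (!vmx->nested.vmxon)
                        return;

                hrtimer_cancel(&vmx->nested.preemption_timer);  /* unconditional cancel */

                /* ... release the remaining nested state ... */
        }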
    • KVM: x86: work around leak of uninitialized stack contents (CVE-2019-7222) · 353c0956
      Authored by Paolo Bonzini
      Bugzilla: 1671930
      
      Emulation of certain instructions (VMXON, VMCLEAR, VMPTRLD, VMWRITE with
      memory operand, INVEPT, INVVPID) can incorrectly inject a page fault
      when passed an operand that points to an MMIO address.  The page fault
      will use uninitialized kernel stack memory as the CR2 and error code.
      
      The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
      exit to userspace; however, it is not an easy fix, so for now just
      ensure that the error code and CR2 are zero.
      
      Embargoed until Feb 7th 2019.
      Reported-by: Felix Wilhelm <fwilhelm@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      353c0956
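      A hedged, self-contained illustration of the stated mitigation: zero-initialize the
      exception record so an incorrectly injected page fault carries CR2 = 0 and error
      code = 0 instead of stack garbage. The struct and function names are placeholders,
      not KVM's.

        #include <string.h>

        struct emul_exception {
                unsigned long address;          /* becomes CR2 if a fault is injected */
                unsigned int  error_code;
        };

        /* Placeholder for the emulation path: clear the record up front so nothing
         * uninitialized can leak into the guest. */
        static void emulate_memory_operand(struct emul_exception *e)
        {
                memset(e, 0, sizeof(*e));
                /* ... perform the emulated access, filling *e only on a real fault ... */
        }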
  4. 07 Feb 2019, 2 commits
  5. 05 Feb 2019, 4 commits
    • arm64: kexec_file: handle empty command-line · ea573680
      Authored by Jean-Philippe Brucker
      Calling strlen() on cmdline == NULL produces a kernel oops. Since having
      a NULL cmdline is valid, handle this case explicitly.
      
      Fixes: 52b2a8af ("arm64: kexec_file: load initrd and device-tree")
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      ea573680
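      A minimal sketch of the described pattern: treat a NULL command line as zero length
      instead of calling strlen() on it; the helper name is illustrative.

        #include <stddef.h>
        #include <string.h>

        /* A NULL cmdline is valid and simply contributes no bytes. */
        static size_t cmdline_size(const char *cmdline)
        {
                return cmdline ? strlen(cmdline) + 1 : 0;       /* +1 for the trailing NUL */
        }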
    • MIPS: Remove function size check in get_frame_info() · 2b424cfc
      Authored by Jun-Ru Chang
      Patch (b6c7a324 "MIPS: Fix get_frame_info() handling of
      microMIPS function size.") introduces an additional function size
      check for microMIPS by only checking insn between ip and ip + func_size.
      However, func_size in get_frame_info() is always 0 if KALLSYMS is not
      enabled. This causes get_frame_info() to return immediately without
      calculating correct frame_size, which in turn causes "Can't analyze
      schedule() prologue" warning messages at boot time.
      
      This patch removes the func_size check and lets the frame_size check run
      up to 128 insns for both MIPS and microMIPS.
      Signed-off-by: Jun-Ru Chang <jrjang@realtek.com>
      Signed-off-by: Tony Wu <tonywu@realtek.com>
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Fixes: b6c7a324 ("MIPS: Fix get_frame_info() handling of microMIPS function size.")
      Cc: <ralf@linux-mips.org>
      Cc: <jhogan@kernel.org>
      Cc: <macro@mips.com>
      Cc: <yamada.masahiro@socionext.com>
      Cc: <peterz@infradead.org>
      Cc: <mingo@kernel.org>
      Cc: <linux-mips@vger.kernel.org>
      Cc: <linux-kernel@vger.kernel.org>
      2b424cfc
    • MIPS: Use lower case for addresses in nexys4ddr.dts · 047f2d94
      Authored by Paul Burton
      DTC introduced an i2c_bus_reg check in v1.4.7, used since Linux v4.20,
      which complains about upper case addresses used in the unit name.
      
      nexys4ddr.dts names an I2C device node "ad7420@4B", leading to:
      
        arch/mips/boot/dts/xilfpga/nexys4ddr.dts:109.16-112.8: Warning
          (i2c_bus_reg): /i2c@10A00000/ad7420@4B: I2C bus unit address format
          error, expected "4b"
      
      Fix this by switching to lower case addresses throughout the file, as is
      *mostly* the case in the file already & fairly standard throughout the
      tree.
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: stable@vger.kernel.org # v4.20+
      Cc: linux-mips@vger.kernel.org
      047f2d94
    • MIPS: Loongson: Introduce and use loongson_llsc_mb() · e02e07e3
      Authored by Huacai Chen
      On the Loongson-2G/2H/3A/3B there is a hardware flaw: ll/sc and
      lld/scd have very weak ordering. We have to add sync instructions "before
      each ll/lld" and "at the branch-target between ll/sc" as a workaround.
      Otherwise, this flaw will occasionally cause deadlocks (e.g. when doing
      heavy load tests with LTP).
      
      Below is the explanation from the CPU designer:
      
      "For Loongson 3 family, when a memory access instruction (load, store,
      or prefetch)'s executing occurs between the execution of LL and SC, the
      success or failure of SC is not predictable. Although programmer would
      not insert memory access instructions between LL and SC, the memory
      instructions before LL in program-order may be dynamically executed
      between the execution of LL/SC, so a memory fence (SYNC) is needed
      before LL/LLD to avoid this situation.
      
      Since Loongson-3A R2 (3A2000), we have improved our hardware design to
      handle this case. But we later deduce a rarely circumstance that some
      speculatively executed memory instructions due to branch misprediction
      between LL/SC still fall into the above case, so a memory fence (SYNC)
      at branch-target (if its target is not between LL/SC) is needed for
      Loongson 3A1000, 3B1500, 3A2000 and 3A3000.
      
      Our processor is continually evolving and we aim to remove all these
      workaround-SYNCs around LL/SC for new-come processor."
      
      Here is an example:
      
      Both cpu1 and cpu2 simultaneously run atomic_add by 1 on the same atomic var;
      this bug causes the 'sc' run by both cpus (in atomic_add) to succeed at the
      same time ('sc' returns 1), so the variable is sometimes only *added by 1*,
      which is wrong and unacceptable (it should be added by 2).
      
      Why disable fix-loongson3-llsc in compiler?
      Because compiler fix will cause problems in kernel's __ex_table section.
      
      This patch fixes all the cases in the kernel, but:
      
      +. the fix at the end of futex_atomic_cmpxchg_inatomic is for the branch-target
      of 'bne'; there are other cases where smp_mb__before_llsc() and smp_llsc_mb()
      coincidentally cover both the ll and the branch-target, such as
      atomic_sub_if_positive/cmpxchg/xchg, just like this one.
      
      +. Loongson-3 does not support CONFIG_EDAC_ATOMIC_SCRUB, so there is no need to
      touch edac.h
      
      +. local_ops and cmpxchg_local should not be affected by this bug since
      only the owner can write.
      
      +. mips_atomic_set for syscall.c is deprecated and rarely used, just let
      it go
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Signed-off-by: Huang Pei <huangpei@loongson.cn>
      [paul.burton@mips.com:
        - Simplify the addition of -mno-fix-loongson3-llsc to cflags, and add
          a comment describing why it's there.
        - Make loongson_llsc_mb() a no-op when
          CONFIG_CPU_LOONGSON3_WORKAROUNDS=n, rather than a compiler memory
          barrier.
        - Add a comment describing the bug & how loongson_llsc_mb() helps
          in asm/barrier.h.]
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: ambrosehua@gmail.com
      Cc: Steven J . Hill <Steven.Hill@cavium.com>
      Cc: linux-mips@linux-mips.org
      Cc: Fuxin Zhang <zhangfx@lemote.com>
      Cc: Zhangjin Wu <wuzhangjin@gmail.com>
      Cc: Li Xuefeng <lixuefeng@loongson.cn>
      Cc: Xu Chenghua <xuchenghua@loongson.cn>
      e02e07e3
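      A hedged sketch of how a barrier like loongson_llsc_mb() is typically wired up and
      where it sits relative to an ll/sc loop; the Kconfig symbol is the one named in the
      log above, while the expansion and the pseudo-assembly placement are illustrative
      rather than the exact kernel macro.

        /* Emit a real SYNC only when the Loongson-3 workaround is configured. */
        #ifdef CONFIG_CPU_LOONGSON3_WORKAROUNDS
        #define loongson_llsc_mb()      __asm__ __volatile__("sync" : : : "memory")
        #else
        #define loongson_llsc_mb()      do { } while (0)
        #endif

        /* Placement pattern (pseudo-assembly of an atomic_add-style loop):
         *
         *      loongson_llsc_mb();     sync            <- fence before the ll
         *   1:                         ll      t0, (v)
         *                              addu    t0, a0
         *                              sc      t0, (v)
         *                              beqz    t0, 1b  <- retry branches back to the ll
         */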
  6. 04 Feb 2019, 3 commits
    • arm64: ptdump: Don't iterate kernel page tables using PTRS_PER_PXX · d23c808c
      Authored by Will Deacon
      When 52-bit virtual addressing is enabled for userspace
      (CONFIG_ARM64_USER_VA_BITS_52=y), the kernel continues to utilise 48-bit
      virtual addressing in TTBR1. Consequently, PTRS_PER_PGD reflects the
      larger page table size for userspace and the pgd pointer for kernel page
      tables is offset before being written to TTBR1.
      
      This means that we can't use PTRS_PER_PGD to iterate over kernel page
      tables unless we apply the same offset, which is fiddly to get right and
      leads to some non-idiomatic walking code. Instead, just follow the usual
      pattern when walking page tables by using a while loop driven by
      pXd_offset() and pXd_addr_end().
      Reported-by: Qian Cai <cai@lca.pw>
      Tested-by: Qian Cai <cai@lca.pw>
      Acked-by: Steve Capper <steve.capper@arm.com>
      Tested-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      d23c808c
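      A hedged sketch of the idiomatic walk the commit refers to, driving the loop with
      pgd_offset()/pgd_addr_end() over an address range instead of indexing
      0..PTRS_PER_PGD; descent into the lower levels is elided.

        /* Sketch: iterate kernel page tables by address range, not by table index. */
        static void walk_pgd_sketch(struct mm_struct *mm, unsigned long start,
                                    unsigned long end)
        {
                unsigned long addr = start, next;
                pgd_t *pgdp = pgd_offset(mm, start);

                do {
                        next = pgd_addr_end(addr, end);
                        if (!pgd_none(READ_ONCE(*pgdp))) {
                                /* ... walk the pud/pmd/pte levels for [addr, next) ... */
                        }
                } while (pgdp++, addr = next, addr != end);
        }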
    • perf/x86/intel: Delay memory deallocation until x86_pmu_dead_cpu() · 602cae04
      Authored by Peter Zijlstra
      intel_pmu_cpu_prepare() allocated memory for ->shared_regs among other
      members of struct cpu_hw_events. This memory is released in
      intel_pmu_cpu_dying() which is wrong. The counterpart of the
      intel_pmu_cpu_prepare() callback is x86_pmu_dead_cpu().
      
      Otherwise if the CPU fails on the UP path between CPUHP_PERF_X86_PREPARE
      and CPUHP_AP_PERF_X86_STARTING then it won't release the memory but
      allocate new memory on the next attempt to online the CPU (leaking the
      old memory).
      Also, if the CPU down path fails between CPUHP_AP_PERF_X86_STARTING and
      CPUHP_PERF_X86_PREPARE then the CPU will go back online but never
      allocate the memory that was released in x86_pmu_dying_cpu().
      
      Make the memory allocation/free symmetrical in regard to the CPU hotplug
      notifier by moving the deallocation to intel_pmu_cpu_dead().
      
      This started in commit:
      
         a7e3ed1e ("perf: Add support for supplementary event registers").
      
      In principle the bug was introduced in v2.6.39 (!), but it will almost
      certainly not backport cleanly across the big CPU hotplug rewrite between v4.7-v4.15...
      
      [ bigeasy: Added patch description. ]
      [ mingo: Added backporting guidance. ]
      Reported-by: He Zhe <zhe.he@windriver.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With developer hat on
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> # With maintainer hat on
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Cc: jolsa@kernel.org
      Cc: kan.liang@linux.intel.com
      Cc: namhyung@kernel.org
      Cc: <stable@vger.kernel.org>
      Fixes: a7e3ed1e ("perf: Add support for supplementary event registers").
      Link: https://lkml.kernel.org/r/20181219165350.6s3jvyxbibpvlhtq@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      602cae04
    • perf/x86/intel/uncore: Add Node ID mask · 9e63a789
      Authored by Kan Liang
      Some PCI uncore PMUs cannot be registered on an 8-socket system (HPE
      Superdome Flex).
      
      To understand which Socket the PCI uncore PMUs belong to, perf retrieves
      the local Node ID of the uncore device from CPUNODEID(0xC0) of the PCI
      configuration space, and the mapping between Socket ID and Node ID from
      GIDNIDMAP(0xD4). The Socket ID can be calculated accordingly.
      
      The local Node ID is only available at bit 2:0, but current code doesn't
      mask it. If a BIOS doesn't clear the rest of the bits, an incorrect Node ID
      will be fetched.
      
      Filter the Node ID by adding a mask.
      Reported-by: Song Liu <songliubraving@fb.com>
      Tested-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org> # v3.7+
      Fixes: 7c94ee2e ("perf/x86: Add Intel Nehalem and Sandy Bridge-EP uncore support")
      Link: https://lkml.kernel.org/r/1548600794-33162-1-git-send-email-kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9e63a789
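      A small sketch of the fix's idea: mask the CPUNODEID value down to bits 2:0 before
      using it to look up the Socket ID, so stray BIOS-set bits cannot produce a bogus
      Node ID. Register layout details beyond what the log states are assumptions.

        #include <stdint.h>

        #define NODE_ID_MASK    0x7u            /* local Node ID lives in bits 2:0 */

        static unsigned int uncore_node_id(uint32_t cpunodeid_reg)
        {
                return cpunodeid_reg & NODE_ID_MASK;    /* ignore whatever BIOS left in bits 31:3 */
        }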
  7. 03 Feb 2019, 1 commit
    • x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out() · d28af26f
      Authored by Tony Luck
      Internal injection testing crashed with a console log that said:
      
        mce: [Hardware Error]: CPU 7: Machine Check Exception: f Bank 0: bd80000000100134
      
      This caused a lot of head scratching because the MCACOD (bits 15:0) of
      that status is a signature from an L1 data cache error. But Linux says
      that it found it in "Bank 0", which on this model CPU only reports L1
      instruction cache errors.
      
      The answer was that Linux doesn't initialize "m->bank" in the case that
      it finds a fatal error in the mce_no_way_out() pre-scan of banks. If
      this was a local machine check, then this partially initialized struct
      mce is being passed to mce_panic().
      
      Fix is simple: just initialize m->bank in the case of a fatal error.
      
      Fixes: 40c36e27 ("x86/mce: Fix incorrect "Machine check from unknown source" message")
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Cc: stable@vger.kernel.org # v4.18 Note pre-v5.0 arch/x86/kernel/cpu/mce/core.c was called arch/x86/kernel/cpu/mcheck/mce.c
      Link: https://lkml.kernel.org/r/20190201003341.10638-1-tony.luck@intel.com
      d28af26f
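      A hedged sketch of the pre-scan loop with the one-line fix applied: record which
      bank the fatal status was found in, so the struct mce passed to mce_panic() is
      fully initialized. The helper names are placeholders, not the kernel's exact ones.

        /* Sketch: scan banks for a fatal error, remembering the bank index. */
        static int scan_banks_for_fatal(struct mce *m, int nr_banks)
        {
                for (int i = 0; i < nr_banks; i++) {
                        m->status = read_bank_status(i);        /* placeholder MSR read */
                        if (!(m->status & MCI_STATUS_VAL))
                                continue;
                        m->bank = i;                            /* the fix: initialize m->bank */
                        if (status_is_fatal(m->status))         /* placeholder severity check */
                                return 1;
                }
                return 0;
        }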
  8. 02 Feb 2019, 5 commits
    • x86/resctrl: Avoid confusion over the new X86_RESCTRL config · e6d42931
      Authored by Johannes Weiner
      "Resource Control" is a very broad term for this CPU feature, and a term
      that is also associated with containers, cgroups etc. This can easily
      cause confusion.
      
      Make the user prompt more specific. Match the config symbol name.
      
       [ bp: In the future, the corresponding ARM arch-specific code will be
         under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
         under the CPU_RESCTRL umbrella symbol. ]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Babu Moger <Babu.Moger@amd.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Reinette Chatre <reinette.chatre@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
      e6d42931
    • x86_64: increase stack size for KASAN_EXTRA · a8e911d1
      Authored by Qian Cai
      If the kernel is configured with KASAN_EXTRA, the stack size is
      increased significantly because this option sets "-fstack-reuse" to
      "none" in GCC [1].  As a result, it triggers stack overrun quite often
      with 32k stack size compiled using GCC 8.  For example, this reproducer
      
        https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c
      
      triggers a "corrupted stack end detected inside scheduler" very reliably
      with CONFIG_SCHED_STACK_END_CHECK enabled.
      
      There are just too many functions that could have a large stack with
      KASAN_EXTRA due to large local variables that have been called over and
      over again without being able to reuse the stacks.  Some noticeable ones
      are
      
        size
        7648 shrink_page_list
        3584 xfs_rmap_convert
        3312 migrate_page_move_mapping
        3312 dev_ethtool
        3200 migrate_misplaced_transhuge_page
        3168 copy_process
      
      There are another 49 functions over 2k in size when compiling the kernel
      with "-Wframe-larger-than=" even with a related minimal config on this
      machine.  Hence, it is too much work to change Makefiles for each object
      to compile without "-fsanitize-address-use-after-scope" individually.
      
      [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23
      
      Although there is a patch in GCC 9 to help the situation, GCC 9 probably
      won't be released for a few months, and then it will probably take another
      6 months to a year for all major distros to include it as a default.
      Hence, the stack usage with KASAN_EXTRA can be revisited again in 2020
      when GCC 9 is everywhere.  Until then, this patch will help users avoid
      stack overrun.
      
      This has already been fixed for arm64 for the same reason via
      6e883067 ("arm64: kasan: Increase stack size for KASAN_EXTRA").
      
      Link: http://lkml.kernel.org/r/20190109215209.2903-1-cai@lca.pw
      Signed-off-by: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a8e911d1
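      A hedged sketch of the kind of change described, giving KASAN_EXTRA kernels one
      extra order of stack (64K instead of 32K on x86_64); the macro names follow the
      usual THREAD_SIZE plumbing but are written from memory, not copied from the diff.

        #ifdef CONFIG_KASAN_EXTRA
        #define KASAN_STACK_ORDER 2     /* 16K base << 2 = 64K stacks */
        #elif defined(CONFIG_KASAN)
        #define KASAN_STACK_ORDER 1     /* 16K base << 1 = 32K stacks */
        #else
        #define KASAN_STACK_ORDER 0
        #endif

        #define THREAD_SIZE_ORDER       (2 + KASAN_STACK_ORDER)
        #define THREAD_SIZE             (PAGE_SIZE << THREAD_SIZE_ORDER)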
    • arch: unexport asm/shmparam.h for all architectures · 36c0f7f0
      Authored by Masahiro Yamada
      Most architectures do not export shmparam.h to user-space.
      
        $ find arch -name shmparam.h  | sort
        arch/alpha/include/asm/shmparam.h
        arch/arc/include/asm/shmparam.h
        arch/arm64/include/asm/shmparam.h
        arch/arm/include/asm/shmparam.h
        arch/csky/include/asm/shmparam.h
        arch/ia64/include/asm/shmparam.h
        arch/mips/include/asm/shmparam.h
        arch/nds32/include/asm/shmparam.h
        arch/nios2/include/asm/shmparam.h
        arch/parisc/include/asm/shmparam.h
        arch/powerpc/include/asm/shmparam.h
        arch/s390/include/asm/shmparam.h
        arch/sh/include/asm/shmparam.h
        arch/sparc/include/asm/shmparam.h
        arch/x86/include/asm/shmparam.h
        arch/xtensa/include/asm/shmparam.h
      
      Strangely, some users of the asm-generic wrapper export shmparam.h
      
        $ git grep 'generic-y += shmparam.h'
        arch/c6x/include/uapi/asm/Kbuild:generic-y += shmparam.h
        arch/h8300/include/uapi/asm/Kbuild:generic-y += shmparam.h
        arch/hexagon/include/uapi/asm/Kbuild:generic-y += shmparam.h
        arch/m68k/include/uapi/asm/Kbuild:generic-y += shmparam.h
        arch/microblaze/include/uapi/asm/Kbuild:generic-y += shmparam.h
        arch/openrisc/include/uapi/asm/Kbuild:generic-y += shmparam.h
        arch/riscv/include/asm/Kbuild:generic-y += shmparam.h
        arch/unicore32/include/uapi/asm/Kbuild:generic-y += shmparam.h
      
      The newly added riscv correctly creates the asm-generic wrapper
      in the kernel space, but the others (c6x, h8300, hexagon, m68k,
      microblaze, openrisc, unicore32) create the one in the uapi directory.
      
      Digging into the git history, now I guess fcc8487d ("uapi:
      export all headers under uapi directories") was the misconversion.
      Prior to that commit, no architecture exported shmparam.h.
      As its commit description said, that commit exported shmparam.h
      for c6x, h8300, hexagon, m68k, openrisc, unicore32.
      
      83f0124a ("microblaze: remove asm-generic wrapper headers")
      accidentally exported shmparam.h for microblaze.
      
      This commit unexports shmparam.h for those architectures.
      
      There is no more reason to export include/uapi/asm-generic/shmparam.h,
      so it has been moved to include/asm-generic/shmparam.h
      
      Link: http://lkml.kernel.org/r/1546904307-11124-1-git-send-email-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Stafford Horne <shorne@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Palmer Dabbelt <palmer@sifive.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      36c0f7f0
    • x86/kexec: Don't setup EFI info if EFI runtime is not enabled · 2aa958c9
      Authored by Kairui Song
      Kexec-ing a kernel with "efi=noruntime" on the first kernel's command
      line causes the following null pointer dereference:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        #PF error: [normal kernel read fault]
        Call Trace:
         efi_runtime_map_copy+0x28/0x30
         bzImage64_load+0x688/0x872
         arch_kexec_kernel_image_load+0x6d/0x70
         kimage_file_alloc_init+0x13e/0x220
         __x64_sys_kexec_file_load+0x144/0x290
         do_syscall_64+0x55/0x1a0
         entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Just skip the EFI info setup if EFI runtime services are not enabled.
      
       [ bp: Massage commit message. ]
      Suggested-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Dave Young <dyoung@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: bhe@redhat.com
      Cc: David Howells <dhowells@redhat.com>
      Cc: erik.schmauss@intel.com
      Cc: fanc.fnst@cn.fujitsu.com
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: kexec@lists.infradead.org
      Cc: lenb@kernel.org
      Cc: linux-acpi@vger.kernel.org
      Cc: Philipp Rudo <prudo@linux.vnet.ibm.com>
      Cc: rafael.j.wysocki@intel.com
      Cc: robert.moore@intel.com
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yannik Sembritzki <yannik@sembritzki.me>
      Link: https://lkml.kernel.org/r/20190118111310.29589-2-kasong@redhat.com
      2aa958c9
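      A hedged sketch of the guard: skip the EFI hand-over entirely when runtime services
      are disabled (e.g. booted with "efi=noruntime"), so efi_runtime_map_copy() is never
      reached with nothing to copy. The surrounding function is illustrative;
      efi_enabled() is the real kernel predicate.

        /* Sketch of the early-out described in the log above. */
        static int setup_efi_state_sketch(struct boot_params *params)
        {
                if (!efi_enabled(EFI_RUNTIME_SERVICES))
                        return 0;       /* nothing to pass to the kexec'd kernel */

                /* ... copy the EFI system table pointer and runtime memory map ... */
                return 0;
        }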
    • x86: explicitly align IO accesses in memcpy_{to,from}io · c228d294
      Authored by Linus Torvalds
      In commit 170d13ca ("x86: re-introduce non-generic memcpy_{to,from}io")
      I made our copy from IO space use a separate copy routine rather than
      rely on the generic memcpy.  I did that because our generic memory copy
      isn't actually well-defined when it comes to internal access ordering or
      alignment, and will in fact depend on various CPUID flags.
      
      In particular, the default memcpy() for a modern Intel CPU will
      generally be just a "rep movsb", which works reasonably well for
      medium-sized memory copies of regular RAM, since the CPU will turn it
      into fairly optimized microcode.
      
      However, for non-cached memory and IO, "rep movs" ends up being
      horrendously slow and will just do the architectural "one byte at a
      time" accesses implied by the movsb.
      
      At the other end of the spectrum, if you _don't_ end up using the "rep
      movsb" code, you'd likely fall back to the software copy, which does
      overlapping accesses for the tail, and may copy things backwards.
      Again, for regular memory that's fine, for IO memory not so much.
      
      The thinking was that clearly nobody really cared (because things
      worked), but some people had seen horrible performance due to the byte
      accesses, so let's just revert back to our long ago version that did
      "rep movsl" for the bulk of the copy, and then fixed up the potentially
      last few bytes of the tail with "movsw/b".
      
      Interestingly (and perhaps not entirely surprisingly), while that was
      our original memory copy implementation, and had been used before for
      IO, in the meantime many new users of memcpy_*io() had come about.  And
      while the access patterns for the memory copy weren't well-defined (so
      arguably _any_ access pattern should work), in practice the "rep movsb"
      case had been very common for the last several years.
      
      In particular Jarkko Sakkinen reported that the memcpy_*io() change
      resulted in weird errors from his Geminilake NUC TPM module.
      
      And it turns out that the TPM TCG accesses according to spec require
      that the accesses be
      
       (a) done strictly sequentially
      
       (b) be naturally aligned
      
      otherwise the TPM chip will abort the PCI transaction.
      
      And, in fact, the tpm_crb.c driver did this:
      
      	memcpy_fromio(buf, priv->rsp, 6);
      	...
      	memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);
      
      which really should never have worked in the first place, but back
      before commit 170d13ca it *happened* to work, because the
      memcpy_fromio() would be expanded to a regular memcpy, and
      
       (a) gcc would expand the first memcpy in-line, and turn it into a
           4-byte and a 2-byte read, and they happened to be in the right
           order, and the alignment was right.
      
       (b) gcc would call "memcpy()" for the second one, and the machines that
           had this TPM chip also apparently ended up always having ERMS
           ("Enhanced REP MOVSB/STOSB instructions"), so we'd use the "rep
      movsb" for that copy.
      
      In other words, basically by pure luck, the code happened to use the
      right access sizes in the (two different!) memcpy() implementations to
      make it all work.
      
      But after commit 170d13ca, both of the memcpy_fromio() calls
      resulted in a call to the routine with the consistent memory accesses,
      and in both cases it started out transferring with 4-byte accesses.
      Which worked for the first copy, but resulted in the second copy doing a
      32-bit read at an address that was only 2-byte aligned.
      
      Jarkko is actually fixing the fragile code in the TPM driver, but since
      this is an excellent example of why we absolutely must not use a generic
      memcpy for IO accesses, _and_ an IO-specific one really should strive to
      align the IO accesses, let's do exactly that.
      
      Side note: Jarkko also noted that the driver had been used on ARM
      platforms, and had worked.  That was because on 32-bit ARM, memcpy_*io()
      ends up always doing byte accesses, and on 64-bit ARM it first does byte
      accesses to align to 8-byte boundaries, and then does 8-byte accesses
      for the bulk.
      
      So ARM actually worked by design, and the x86 case worked by pure luck.
      
      We *might* want to make x86-64 do the 8-byte case too.  That should be a
      pretty straightforward extension, but let's do one thing at a time.  And
      generally MMIO accesses aren't really all that performance-critical, as
      shown by the fact that for a long time we just did them a byte at a
      time, and very few people ever noticed.
      Reported-and-tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Tested-by: Jerry Snitselaar <jsnitsel@redhat.com>
      Cc: David Laight <David.Laight@aculab.com>
      Fixes: 170d13ca ("x86: re-introduce non-generic memcpy_{to,from}io")
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c228d294
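      A self-contained user-space model of the access pattern described above for
      memcpy_fromio(): byte reads until the IO side is 4-byte aligned, naturally aligned
      32-bit reads for the bulk, then byte reads for the tail. This sketches the idea,
      not the kernel routine.

        #include <stddef.h>
        #include <stdint.h>
        #include <string.h>

        static void copy_from_mmio_sketch(void *to, const volatile void *from, size_t n)
        {
                uint8_t *d = to;
                const volatile uint8_t *s = from;

                while (n && ((uintptr_t)s & 3)) {       /* align the IO side first */
                        *d++ = *s++;
                        n--;
                }
                while (n >= 4) {                        /* naturally aligned 32-bit reads */
                        uint32_t v = *(const volatile uint32_t *)s;

                        memcpy(d, &v, 4);
                        d += 4; s += 4; n -= 4;
                }
                while (n--)                             /* remaining tail bytes */
                        *d++ = *s++;
        }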
  9. 01 Feb 2019, 8 commits
    • arm64: hibernate: Clean the __hyp_text to PoC after resume · f7daa9c8
      Authored by James Morse
      During resume hibernate restores all physical memory. Any memory
      that is accessed with the MMU disabled needs to be cleaned to the
      PoC.
      
      KVM's __hyp_text was previously omitted as it runs with the MMU
      enabled, but now that the hyp-stub is located in this section,
      we must clean __hyp_text too.
      
      This ensures secondary CPUs that come online after hibernate
      has finished resuming, and load KVM via the freshly written
      hyp-stub see the correct instructions.
      Signed-off-by: James Morse <james.morse@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      f7daa9c8
    • arm64: hyp-stub: Forbid kprobing of the hyp-stub · 8fac5cbd
      Authored by James Morse
      The hyp-stub is loaded by the kernel's early startup code at EL2
      during boot, before KVM takes ownership later. The hyp-stub's
      text is part of the regular kernel text, meaning it can be kprobed.
      
      A breakpoint in the hyp-stub causes the CPU to spin in el2_sync_invalid.
      
      Add it to the __hyp_text.
      Signed-off-by: James Morse <james.morse@arm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      8fac5cbd
    • arm64: kprobe: Always blacklist the KVM world-switch code · f2b3d856
      Authored by James Morse
      On systems with VHE the kernel and KVM's world-switch code run at the
      same exception level. Code that is only used on a VHE system does not
      need to be annotated as __hyp_text as it can reside anywhere in the
      kernel text.
      
      __hyp_text was also used to prevent kprobes from patching breakpoint
      instructions into this region, as this code runs at a different
      exception level. While this is no longer true with VHE, KVM still
      switches VBAR_EL1, meaning a kprobe's breakpoint executed in the
      world-switch code will cause a hyp-panic.
      
      Move the __hyp_text check in the kprobes blacklist so it applies on
      VHE systems too, to cover the common code and guest enter/exit
      assembly.
      
      Fixes: 888b3c87 ("arm64: Treat all entry code as non-kprobe-able")
      Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      f2b3d856
    • arm64: kaslr: ensure randomized quantities are clean also when kaslr is off · 8ea23593
      Authored by Ard Biesheuvel
      Commit 1598ecda ("arm64: kaslr: ensure randomized quantities are
      clean to the PoC") added cache maintenance to ensure that global
      variables set by the kaslr init routine are not wiped clean due to
      cache invalidation occurring during the second round of page table
      creation.
      
      However, if kaslr_early_init() exits early with no randomization
      being applied (either due to the lack of a seed, or because the user
      has disabled kaslr explicitly), no cache maintenance is performed,
      leading to the same issue we attempted to fix earlier, as far as the
      module_alloc_base variable is concerned.
      
      Note that module_alloc_base cannot be initialized statically, because
      that would cause it to be subject to a R_AARCH64_RELATIVE relocation,
      causing it to be overwritten by the second round of KASLR relocation
      processing.
      
      Fixes: f80fb3a3 ("arm64: add support for kernel ASLR")
      Cc: <stable@vger.kernel.org> # v4.6+
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      8ea23593
    • arm64: Do not issue IPIs for user executable ptes · 132fdc37
      Authored by Catalin Marinas
      Commit 3b8c9f1c ("arm64: IPI each CPU after invalidating the I-cache
      for kernel mappings") was aimed at fixing the I-cache invalidation for
      kernel mappings. However, it inadvertently caused all cache maintenance
      for user mappings via set_pte_at() -> __sync_icache_dcache() ->
      sync_icache_aliases() to call kick_all_cpus_sync().
      Reported-by: Shijith Thotton <sthotton@marvell.com>
      Tested-by: Shijith Thotton <sthotton@marvell.com>
      Reported-by: Wandun Chen <chenwandun@huawei.com>
      Fixes: 3b8c9f1c ("arm64: IPI each CPU after invalidating the I-cache for kernel mappings")
      Cc: <stable@vger.kernel.org> # 4.19.x-
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      132fdc37
    • powerpc/papr_scm: Use the correct bind address · 5a3840a4
      Authored by Oliver O'Halloran
      When binding an SCM volume to a physical address the hypervisor has the
      option to return early with a continue token with the expectation that
      the guest will resume the bind operation until it completes. A quirk of
      this interface is that the bind address will only be returned by the
      first bind h-call and the subsequent calls will return
      0xFFFF_FFFF_FFFF_FFFF for the bind address.
      
      We currently do not save the address returned by the first h-call. As a
      result we will use the junk address as the base of the bound region if
      the hypervisor decides to split the bind across multiple h-calls. This
      bug was found when testing with very large SCM volumes where the bind
      process would take more time than the hypervisor's internal h-call time
      limit would allow. This patch fixes the issue by saving the bind address
      from the first call.
      
      Cc: stable@vger.kernel.org
      Fixes: b5beae5e ("powerpc/pseries: Add driver for PAPR SCM regions")
      Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5a3840a4
    • ARM: cns3xxx: Use actual size reads for PCIe · 432dd706
      Authored by Koen Vandeputte
      commit 802b7c06 ("ARM: cns3xxx: Convert PCI to use generic config
      accessors") reimplemented cns3xxx_pci_read_config() using
      pci_generic_config_read32(), which preserved the property of only doing
      32-bit reads.
      
      It also replaced cns3xxx_pci_write_config() with pci_generic_config_write(),
      so it changed writes from always being 32 bits to being the actual size,
      which works just fine.
      
      Given that:
      
      - The documentation does not mention that only 32 bit access is allowed.
      - Writes are already executed using the actual size
      - Extensive testing shows that 8b, 16b and 32b reads work as intended
      
      Allow read access of any size by replacing pci_generic_config_read32()
      with the pci_generic_config_read() accessors.
      
      Fixes: 802b7c06 ("ARM: cns3xxx: Convert PCI to use generic config accessors")
      Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
      [lorenzo.pieralisi@arm.com: updated commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: Krzysztof Halasa <khalasa@piap.pl>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      CC: Krzysztof Halasa <khalasa@piap.pl>
      CC: Olof Johansson <olof@lixom.net>
      CC: Robin Leblon <robin.leblon@ncentric.com>
      CC: Rob Herring <robh@kernel.org>
      CC: Russell King <linux@armlinux.org.uk>
      CC: Tim Harvey <tharvey@gateworks.com>
      432dd706
    • ARM: cns3xxx: Fix writing to wrong PCI config registers after alignment · 65dbb423
      Authored by Koen Vandeputte
      Originally, cns3xxx used its own functions for mapping, reading and
      writing config registers.
      
      Commit 802b7c06 ("ARM: cns3xxx: Convert PCI to use generic config
      accessors") removed the internal PCI config write function in favor of
      the generic one:
      
        cns3xxx_pci_write_config() --> pci_generic_config_write()
      
      cns3xxx_pci_write_config() expected aligned addresses, being produced by
      cns3xxx_pci_map_bus() while the generic one pci_generic_config_write()
      actually expects the real address as both the function and hardware are
      capable of byte-aligned writes.
      
      This currently leads to pci_generic_config_write() writing to the wrong
      registers.
      
      For instance, upon ath9k module loading:
      
      - driver ath9k gets loaded
      - The driver wants to write value 0xA8 to register PCI_LATENCY_TIMER,
        located at 0x0D
      - cns3xxx_pci_map_bus() aligns the address to 0x0C
      - pci_generic_config_write() effectively writes 0xA8 into register 0x0C
        (CACHE_LINE_SIZE)
      
      Fix the bug by removing the alignment in the cns3xxx mapping function.
      
      Fixes: 802b7c06 ("ARM: cns3xxx: Convert PCI to use generic config accessors")
      Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
      [lorenzo.pieralisi@arm.com: updated commit log]
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Acked-by: Krzysztof Halasa <khalasa@piap.pl>
      Acked-by: Tim Harvey <tharvey@gateworks.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      CC: stable@vger.kernel.org	# v4.0+
      CC: Bjorn Helgaas <bhelgaas@google.com>
      CC: Olof Johansson <olof@lixom.net>
      CC: Robin Leblon <robin.leblon@ncentric.com>
      CC: Rob Herring <robh@kernel.org>
      CC: Russell King <linux@armlinux.org.uk>
      65dbb423
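      A self-contained model of the failure mode described above: if the map_bus step
      strips the low address bits, a byte write aimed at PCI_LATENCY_TIMER (0x0D) lands on
      CACHE_LINE_SIZE (0x0C). The offsets are standard PCI config offsets; the rest is
      illustrative.

        #include <stdint.h>
        #include <stdio.h>

        #define PCI_CACHE_LINE_SIZE     0x0C
        #define PCI_LATENCY_TIMER       0x0D

        static uintptr_t map_bus_buggy(uintptr_t base, unsigned int where)
        {
                return base + (where & ~3u);    /* old behaviour: force 32-bit alignment */
        }

        static uintptr_t map_bus_fixed(uintptr_t base, unsigned int where)
        {
                return base + where;            /* generic accessors expect the real offset */
        }

        int main(void)
        {
                uintptr_t base = 0x1000;

                printf("buggy: write for 0x%02x lands at offset 0x%02x\n", PCI_LATENCY_TIMER,
                       (unsigned int)(map_bus_buggy(base, PCI_LATENCY_TIMER) - base));
                printf("fixed: write for 0x%02x lands at offset 0x%02x\n", PCI_LATENCY_TIMER,
                       (unsigned int)(map_bus_fixed(base, PCI_LATENCY_TIMER) - base));
                return 0;
        }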
  10. 31 Jan 2019, 3 commits
    • x86/microcode/amd: Don't falsely trick the late loading mechanism · 912139cf
      Authored by Thomas Lendacky
      The load_microcode_amd() function searches for microcode patches and
      attempts to apply a microcode patch if it is of different level than the
      currently installed level.
      
      While the processor won't actually load a level that is less than
      what is already installed, the logic wrongly returns UCODE_NEW thus
      signaling to its caller reload_store() that a late loading should be
      attempted.
      
      If the file-system contains an older microcode revision than what is
      currently running, such a late microcode reload can result in these
      misleading messages:
      
        x86/CPU: CPU features have changed after loading microcode, but might not take effect.
        x86/CPU: Please consider either early loading through initrd/built-in or a potential BIOS update.
      
      These messages were issued on a system where SME/SEV are not
      enabled by the BIOS (MSR C001_0010[23] = 0b) because during boot,
      early_detect_mem_encrypt() is called and cleared the SME and SEV
      features in this case.
      
      However, after the wrong late load attempt, get_cpu_cap() is called and
      reloads the SME and SEV feature bits, resulting in the messages.
      
      Update the microcode level check to not attempt microcode loading if the
      current level is greater than (and not only equal to) the new patch
      level.
      
       [ bp: massage commit message. ]
      
      Fixes: 2613f36e ("x86/microcode: Attempt late loading only when new microcode is present")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/154894518427.9406.8246222496874202773.stgit@tlendack-t1.amdoffice.net
      912139cf
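      A minimal sketch of the corrected decision: only report new microcode (and thus
      allow a late load) when the candidate patch level is strictly newer than what the
      CPU already runs; names are illustrative.

        #include <stdbool.h>
        #include <stdint.h>

        static bool should_report_ucode_new(uint32_t current_level, uint32_t patch_level)
        {
                return patch_level > current_level;     /* equal or older: no late load */
        }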
    • powerpc/radix: Fix kernel crash with mremap() · 579b9239
      Authored by Aneesh Kumar K.V
      With support for split pmd lock, we use the pmd page's pmd_huge_pte
      pointer to store the deposited page table. In those configs, when we move
      page tables we need to make sure we move the deposited page table to the
      correct pmd page. Otherwise this can result in a crash when we withdraw
      the deposited page table, because we may find pmd_huge_pte NULL.
      
      eg:
      
        __split_huge_pmd+0x1070/0x1940
        __split_huge_pmd+0xe34/0x1940 (unreliable)
        vma_adjust_trans_huge+0x110/0x1c0
        __vma_adjust+0x2b4/0x9b0
        __split_vma+0x1b8/0x280
        __do_munmap+0x13c/0x550
        sys_mremap+0x220/0x7e0
        system_call+0x5c/0x70
      
      Fixes: 675d9952 ("powerpc/book3s64: Enable split pmd ptlock.")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      579b9239
    • cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM · b284909a
      Authored by Josh Poimboeuf
      With the following commit:
      
        73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      
      ... the hotplug code attempted to detect when SMT was disabled by BIOS,
      in which case it reported SMT as permanently disabled.  However, that
      code broke a virt hotplug scenario, where the guest is booted with only
      primary CPU threads, and a sibling is brought online later.
      
      The problem is that there doesn't seem to be a way to reliably
      distinguish between the HW "SMT disabled by BIOS" case and the virt
      "sibling not yet brought online" case.  So the above-mentioned commit
      was a bit misguided, as it permanently disabled SMT for both cases,
      preventing future virt sibling hotplugs.
      
      Going back and reviewing the original problems which were attempted to
      be solved by that commit, when SMT was disabled in BIOS:
      
        1) /sys/devices/system/cpu/smt/control showed "on" instead of
           "notsupported"; and
      
        2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
      
      I'd propose that we instead consider #1 above to not actually be a
      problem.  Because, at least in the virt case, it's possible that SMT
      wasn't disabled by BIOS and a sibling thread could be brought online
      later.  So it makes sense to just always default the smt control to "on"
      to allow for that possibility (assuming cpuid indicates that the CPU
      supports SMT).
      
      The real problem is #2, which has a simple fix: change vmx_vm_init() to
      query the actual current SMT state -- i.e., whether any siblings are
      currently online -- instead of looking at the SMT "control" sysfs value.
      
      So fix it by:
      
        a) reverting the original "fix" and its followup fix:
      
           73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
           bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation")
      
           and
      
        b) changing vmx_vm_init() to query the actual current SMT state --
           instead of the sysfs control value -- to determine whether the L1TF
           warning is needed.  This also requires the 'sched_smt_present'
           variable to be exported, instead of 'cpu_smt_control'.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
      b284909a
  11. 30 Jan 2019, 5 commits
    • ARM: tango: Improve ARCH_MULTIPLATFORM compatibility · d0f9f167
      Authored by Marc Gonzalez
      Calling platform-specific code unconditionally blows up when running
      an ARCH_MULTIPLATFORM kernel on a different platform. Don't do it.
      Reported-by: Paolo Pisati <p.pisati@gmail.com>
      Signed-off-by: Marc Gonzalez <marc.w.gonzalez@free.fr>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: stable@vger.kernel.org # v4.8+
      Fixes: a30eceb7 ("ARM: tango: add Suspend-to-RAM support")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      d0f9f167
    • ARM: iop32x/n2100: fix PCI IRQ mapping · db409092
      Authored by Russell King
      Booting 4.20 on a TheCUS N2100 results in a kernel oops while probing
      PCI, due to n2100_pci_map_irq() having been discarded during boot.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Cc: stable@vger.kernel.org # 2.6.18+
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      db409092
    • x86/fault: Fix sign-extend unintended sign extension · 5ccd3528
      Authored by Colin Ian King
      show_ldttss() shifts desc.base2 left by 24 bits, but base2 is an 8-bit
      bitfield in a u16.
      
      Due to the really great idea of integer promotion in C99 base2 is promoted
      to an int, because that's the standard defined behaviour when all values
      which can be represented by base2 fit into an int.
      
      Now if bit 7 is set in desc.base2 the result of the shift left by 24 makes
      the resulting integer negative and the following conversion to unsigned
      long legitimately sign extends first, causing the upper 32 bits to be
      set in the result.
      
      Fix this by casting desc.base2 to unsigned long before the shift.
      
      Detected by CoverityScan, CID#1475635 ("Unintended sign extension")
      
      [ tglx: Reworded the changelog a bit as I actually had to lookup
        	the standard (again) to decode the original one. ]
      
      Fixes: a1a371c4 ("x86/fault: Decode page fault OOPSes better")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: kernel-janitors@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181222191116.21831-1-colin.king@canonical.com
      5ccd3528
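      A runnable illustration of the promotion problem described above (not the kernel
      code itself): an 8-bit bitfield value with bit 7 set, shifted left by 24 as a
      promoted int, sign-extends when converted to unsigned long; the cast avoids it.

        #include <stdint.h>
        #include <stdio.h>

        struct desc_sketch { uint16_t base2 : 8; };

        int main(void)
        {
                struct desc_sketch d = { .base2 = 0x80 };

                unsigned long buggy = (d.base2 << 24);                  /* int promotion, then sign-extension */
                unsigned long fixed = ((unsigned long)d.base2 << 24);   /* the fix: cast before shifting */

                printf("buggy: %#lx\nfixed: %#lx\n", buggy, fixed);
                return 0;
        }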
    • x86/boot/compressed/64: Set EFER.LME=1 in 32-bit trampoline before returning to long mode · b677dfae
      Authored by Wei Huang
      In some old AMD KVM implementations, the guest's EFER.LME bit is cleared by KVM
      when the hypervisor detects that the guest sets CR0.PG to 0. This causes
      the guest OS to reboot when it tries to return from 32-bit trampoline code
      because the CPU is in incorrect state: CR4.PAE=1, CR0.PG=1, CS.L=1, but
      EFER.LME=0.  As a precaution, set EFER.LME=1 as part of long mode
      activation procedure. This extra step won't cause any harm when Linux is
      booted on a bare-metal machine.
      Signed-off-by: Wei Huang <wei@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20190104054411.12489-1-wei@redhat.com
      b677dfae
    • MIPS: VDSO: Include $(ccflags-vdso) in o32,n32 .lds builds · 67fc5dc8
      Authored by Paul Burton
      When generating vdso-o32.lds & vdso-n32.lds for use with programs
      running as compat ABIs under 64b kernels, we previously didn't include
      the compiler flags that are supposedly common to all ABIs - ie. those in
      the ccflags-vdso variable.
      
      This is problematic in cases where we need to provide the -m%-float flag
      in order to ensure that we don't attempt to use a floating point ABI
      that's incompatible with the target CPU & ABI. For example a toolchain
      using current gcc trunk configured --with-fp-32=xx fails to build a
      64r6el_defconfig kernel with the following error:
      
        cc1: error: '-march=mips1' requires '-mfp32'
        make[2]: *** [arch/mips/vdso/Makefile:135: arch/mips/vdso/vdso-o32.lds] Error 1
      
      Include $(ccflags-vdso) for the compat VDSO .lds builds, just as it is
      included for the native VDSO .lds & when compiling objects for the
      compat VDSOs. This ensures we consistently provide the -msoft-float flag
      amongst others, avoiding the problem by ensuring we're agnostic to the
      toolchain defaults.
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Fixes: ebb5e78c ("MIPS: Initial implementation of a VDSO")
      Cc: linux-mips@vger.kernel.org
      Cc: Kevin Hilman <khilman@baylibre.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Maciej W . Rozycki <macro@linux-mips.org>
      Cc: stable@vger.kernel.org # v4.4+
      67fc5dc8