1. 08 10月, 2018 5 次提交
  2. 06 10月, 2018 1 次提交
  3. 05 10月, 2018 2 次提交
    • S
      powerpc/numa: Skip onlining a offline node in kdump path · ac1788cc
      Srikar Dronamraju 提交于
      With commit 2ea62630 ("powerpc/topology: Get topology for shared
      processors at boot"), kdump kernel on shared LPAR may crash.
      
      The necessary conditions are
      - Shared LPAR with at least 2 nodes having memory and CPUs.
      - Memory requirement for kdump kernel must be met by the first N-1
        nodes where there are at least N nodes with memory and CPUs.
      
      Example numactl of such a machine.
        $ numactl -H
        available: 5 nodes (0,2,5-7)
        node 0 cpus:
        node 0 size: 0 MB
        node 0 free: 0 MB
        node 2 cpus:
        node 2 size: 255 MB
        node 2 free: 189 MB
        node 5 cpus: 24 25 26 27 28 29 30 31
        node 5 size: 4095 MB
        node 5 free: 4024 MB
        node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
        node 6 size: 6353 MB
        node 6 free: 5998 MB
        node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39
        node 7 size: 7640 MB
        node 7 free: 7164 MB
        node distances:
        node   0   2   5   6   7
          0:  10  40  40  40  40
          2:  40  10  40  40  40
          5:  40  40  10  40  40
          6:  40  40  40  10  20
          7:  40  40  40  20  10
      
      Steps to reproduce.
      1. Load / start kdump service.
      2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger)
      
      When booting a kdump kernel with 2048M:
      
        kexec: Starting switchover sequence.
        I'm in purgatory
        Using 1TB segments
        hash-mmu: Initializing hash mmu with SLB
        Linux version 4.19.0-rc5-master+ (srikar@linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018
        Found initrd at 0xc000000009e70000:0xc00000000ae554b4
        Using pSeries machine description
        -----------------------------------------------------
        ppc64_pft_size    = 0x1e
        phys_mem_size     = 0x88000000
        dcache_bsize      = 0x80
        icache_bsize      = 0x80
        cpu_features      = 0x000000ff8f5d91a7
          possible        = 0x0000fbffcf5fb1a7
          always          = 0x0000006f8b5c91a1
        cpu_user_features = 0xdc0065c2 0xef000000
        mmu_features      = 0x7c006001
        firmware_features = 0x00000007c45bfc57
        htab_hash_mask    = 0x7fffff
        physical_start    = 0x8000000
        -----------------------------------------------------
        numa:   NODE_DATA [mem 0x87d5e300-0x87d67fff]
        numa:     NODE_DATA(0) on node 6
        numa:   NODE_DATA [mem 0x87d54600-0x87d5e2ff]
        Top of RAM: 0x88000000, Total RAM: 0x88000000
        Memory hole size: 0MB
        Zone ranges:
          DMA      [mem 0x0000000000000000-0x0000000087ffffff]
          DMA32    empty
          Normal   empty
        Movable zone start for each node
        Early memory node ranges
          node   6: [mem 0x0000000000000000-0x0000000087ffffff]
        Could not find start_pfn for node 0
        Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
        On node 0 totalpages: 0
        Initmem setup node 6 [mem 0x0000000000000000-0x0000000087ffffff]
        On node 6 totalpages: 34816
      
        Unable to handle kernel paging request for data at address 0x00000060
        Faulting instruction address: 0xc000000008703a54
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1
        NIP:  c000000008703a54 LR: c000000008703a38 CTR: 0000000000000000
        REGS: c00000000b673440 TRAP: 0380   Not tainted  (4.19.0-rc5-master+)
        MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24022022  XER: 20000002
        CFAR: c0000000086fc238 IRQMASK: 0
        GPR00: c000000008703a38 c00000000b6736c0 c000000009281900 0000000000000000
        GPR04: 0000000000000000 0000000000000000 fffffffffffff001 c00000000b660080
        GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000220
        GPR12: 0000000000002200 c000000009e51400 0000000000000000 0000000000000008
        GPR16: 0000000000000000 c000000008c152e8 c000000008c152a8 0000000000000000
        GPR20: c000000009422fd8 c000000009412fd8 c000000009426040 0000000000000008
        GPR24: 0000000000000000 0000000000000000 c000000009168bc8 c000000009168c78
        GPR28: c00000000b126410 0000000000000000 c00000000916a0b8 c00000000b126400
        NIP [c000000008703a54] bus_add_device+0x84/0x1e0
        LR [c000000008703a38] bus_add_device+0x68/0x1e0
        Call Trace:
        [c00000000b6736c0] [c000000008703a38] bus_add_device+0x68/0x1e0 (unreliable)
        [c00000000b673740] [c000000008700194] device_add+0x454/0x7c0
        [c00000000b673800] [c00000000872e660] __register_one_node+0xb0/0x240
        [c00000000b673860] [c00000000839a6bc] __try_online_node+0x12c/0x180
        [c00000000b673900] [c00000000839b978] try_online_node+0x58/0x90
        [c00000000b673930] [c0000000080846d8] find_and_online_cpu_nid+0x158/0x190
        [c00000000b673a10] [c0000000080848a0] numa_update_cpu_topology+0x190/0x580
        [c00000000b673c00] [c000000008d3f2e4] smp_cpus_done+0x94/0x108
        [c00000000b673c70] [c000000008d5c00c] smp_init+0x174/0x19c
        [c00000000b673d00] [c000000008d346b8] kernel_init_freeable+0x1e0/0x450
        [c00000000b673dc0] [c0000000080102e8] kernel_init+0x28/0x160
        [c00000000b673e30] [c00000000800b65c] ret_from_kernel_thread+0x5c/0x80
        Instruction dump:
        60000000 60000000 e89e0020 7fe3fb78 4bff87d5 60000000 7c7d1b79 4082008c
        e8bf0050 e93e0098 3b9f0010 2fa50000 <e8690060> 38630018 419e0114 7f84e378
        ---[ end trace 593577668c2daa65 ]---
      
      However a regular kernel with 4096M (2048 gets reserved for crash
      kernel) boots properly.
      
      Unlike regular kernels, which mark all available nodes as online,
      kdump kernel only marks just enough nodes as online and marks the rest
      as offline at boot. However kdump kernel boots with all available
      CPUs. With Commit 2ea62630 ("powerpc/topology: Get topology for
      shared processors at boot"), all CPUs are onlined on their respective
      nodes at boot time. try_online_node() tries to online the offline
      nodes but fails as all needed subsystems are not yet initialized.
      
      As part of fix, detect and skip early onlining of a offline node.
      
      Fixes: 2ea62630 ("powerpc/topology: Get topology for shared processors at boot")
      Reported-by: NPavithra Prakash <pavrampu@in.ibm.com>
      Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Tested-by: NHari Bathini <hbathini@linux.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      ac1788cc
    • M
      powerpc: Don't print kernel instructions in show_user_instructions() · a932ed3b
      Michael Ellerman 提交于
      Recently we implemented show_user_instructions() which dumps the code
      around the NIP when a user space process dies with an unhandled
      signal. This was modelled on the x86 code, and we even went so far as
      to implement the exact same bug, namely that if the user process
      crashed with its NIP pointing into the kernel we will dump kernel text
      to dmesg. eg:
      
        bad-bctr[2996]: segfault (11) at c000000000010000 nip c000000000010000 lr 12d0b0894 code 1
        bad-bctr[2996]: code: fbe10068 7cbe2b78 7c7f1b78 fb610048 38a10028 38810020 fb810050 7f8802a6
        bad-bctr[2996]: code: 3860001c f8010080 48242371 60000000 <7c7b1b79> 4082002c e8010080 eb610048
      
      This was discovered on x86 by Jann Horn and fixed in commit
      342db04a ("x86/dumpstack: Don't dump kernel memory based on usermode RIP").
      
      Fix it by checking the adjusted NIP value (pc) and number of
      instructions against USER_DS, and bail if we fail the check, eg:
      
        bad-bctr[2969]: segfault (11) at c000000000010000 nip c000000000010000 lr 107930894 code 1
        bad-bctr[2969]: Bad NIP, not dumping instructions.
      
      Fixes: 88b0fe17 ("powerpc: Add show_user_instructions()")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a932ed3b
  4. 04 10月, 2018 5 次提交
    • P
      kvm: nVMX: fix entry with pending interrupt if APICv is enabled · 7e712684
      Paolo Bonzini 提交于
      Commit b5861e5c introduced a check on
      the interrupt-window and NMI-window CPU execution controls in order to
      inject an external interrupt vmexit before the first guest instruction
      executes.  However, when APIC virtualization is enabled the host does not
      need a vmexit in order to inject an interrupt at the next interrupt window;
      instead, it just places the interrupt vector in RVI and the processor will
      inject it as soon as possible.  Therefore, on machines with APICv it is
      not enough to check the CPU execution controls: the same scenario can also
      happen if RVI>vPPR.
      
      Fixes: b5861e5cReviewed-by: NNikita Leshchenko <nikita.leshchenko@oracle.com>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      7e712684
    • P
      KVM: VMX: hide flexpriority from guest when disabled at the module level · 2cf7ea9f
      Paolo Bonzini 提交于
      As of commit 8d860bbe ("kvm: vmx: Basic APIC virtualization controls
      have three settings"), KVM will disable VIRTUALIZE_APIC_ACCESSES when
      a nested guest writes APIC_BASE MSR and kvm-intel.flexpriority=0,
      whereas previously KVM would allow a nested guest to enable
      VIRTUALIZE_APIC_ACCESSES so long as it's supported in hardware.  That is,
      KVM now advertises VIRTUALIZE_APIC_ACCESSES to a guest but doesn't
      (always) allow setting it when kvm-intel.flexpriority=0, and may even
      initially allow the control and then clear it when the nested guest
      writes APIC_BASE MSR, which is decidedly odd even if it doesn't cause
      functional issues.
      
      Hide the control completely when the module parameter is cleared.
      reported-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Fixes: 8d860bbe ("kvm: vmx: Basic APIC virtualization controls have three settings")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2cf7ea9f
    • S
      KVM: VMX: check for existence of secondary exec controls before accessing · fd6b6d9b
      Sean Christopherson 提交于
      Return early from vmx_set_virtual_apic_mode() if the processor doesn't
      support VIRTUALIZE_APIC_ACCESSES or VIRTUALIZE_X2APIC_MODE, both of
      which reside in SECONDARY_VM_EXEC_CONTROL.  This eliminates warnings
      due to VMWRITEs to SECONDARY_VM_EXEC_CONTROL (VMCS field 401e) failing
      on processors without secondary exec controls.
      
      Remove the similar check for TPR shadowing as it is incorporated in the
      flexpriority_enabled check and the APIC-related code in
      vmx_update_msr_bitmap() is further gated by VIRTUALIZE_X2APIC_MODE.
      Reported-by: NGerhard Wiesinger <redhat@wiesinger.com>
      Fixes: 8d860bbe ("kvm: vmx: Basic APIC virtualization controls have three settings")
      Cc: Jim Mattson <jmattson@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      fd6b6d9b
    • A
      x86/vdso: Fix vDSO syscall fallback asm constraint regression · 02e42566
      Andy Lutomirski 提交于
      When I added the missing memory outputs, I failed to update the
      index of the first argument (ebx) on 32-bit builds, which broke the
      fallbacks.  Somehow I must have screwed up my testing or gotten
      lucky.
      
      Add another test to cover gettimeofday() as well.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 715bd9d1 ("x86/vdso: Fix asm constraints on vDSO syscall fallbacks")
      Link: http://lkml.kernel.org/r/21bd45ab04b6d838278fa5bebfa9163eceffa13c.1538608971.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      02e42566
    • P
      KVM: PPC: Book3S HV: Avoid crash from THP collapse during radix page fault · 6579804c
      Paul Mackerras 提交于
      Commit 71d29f43 ("KVM: PPC: Book3S HV: Don't use compound_order to
      determine host mapping size", 2018-09-11) added a call to 
      __find_linux_pte() and a dereference of the returned PTE pointer to the
      radix page fault path in the common case where the page is normal
      system memory.  Previously, __find_linux_pte() was only called for
      mappings to physical addresses which don't have a page struct (e.g.
      memory-mapped I/O) or where the page struct is marked as reserved
      memory.
      
      This exposes us to the possibility that the returned PTE pointer
      could be NULL, for example in the case of a concurrent THP collapse
      operation.  Dereferencing the returned NULL pointer causes a host
      crash.
      
      To fix this, we check for NULL, and if it is NULL, we retry the
      operation by returning to the guest, with the expectation that it
      will generate the same page fault again (unless of course it has
      been fixed up by another CPU in the meantime).
      
      Fixes: 71d29f43 ("KVM: PPC: Book3S HV: Don't use compound_order to determine host mapping size")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      6579804c
  5. 03 10月, 2018 5 次提交
  6. 02 10月, 2018 5 次提交
  7. 01 10月, 2018 6 次提交
  8. 28 9月, 2018 1 次提交
    • K
      x86/boot: Fix kexec booting failure in the SEV bit detection code · bdec8d7f
      Kairui Song 提交于
      Commit
      
        1958b5fc ("x86/boot: Add early boot support when running with SEV active")
      
      can occasionally cause system resets when kexec-ing a second kernel even
      if SEV is not active.
      
      That's because get_sev_encryption_bit() uses 32-bit rIP-relative
      addressing to read the value of enc_bit - a variable which caches a
      previously detected encryption bit position - but kexec may allocate
      the early boot code to a higher location, beyond the 32-bit addressing
      limit.
      
      In this case, garbage will be read and get_sev_encryption_bit() will
      return the wrong value, leading to accessing memory with the wrong
      encryption setting.
      
      Therefore, remove enc_bit, and thus get rid of the need to do 32-bit
      rIP-relative addressing in the first place.
      
       [ bp: massage commit message heavily. ]
      
      Fixes: 1958b5fc ("x86/boot: Add early boot support when running with SEV active")
      Suggested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NKairui Song <kasong@redhat.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: tglx@linutronix.de
      Cc: mingo@redhat.com
      Cc: hpa@zytor.com
      Cc: brijesh.singh@amd.com
      Cc: kexec@lists.infradead.org
      Cc: dyoung@redhat.com
      Cc: bhe@redhat.com
      Cc: ghook@redhat.com
      Link: https://lkml.kernel.org/r/20180927123845.32052-1-kasong@redhat.com
      bdec8d7f
  9. 26 9月, 2018 1 次提交
  10. 25 9月, 2018 8 次提交
    • S
      powerpc/numa: Use associativity if VPHN hcall is successful · 2483ef05
      Srikar Dronamraju 提交于
      Currently associativity is used to lookup node-id even if the
      preceding VPHN hcall failed. However this can cause CPU to be made
      part of the wrong node, (most likely to be node 0). This is because
      VPHN is not enabled on KVM guests.
      
      With 2ea62630 ("powerpc/topology: Get topology for shared processors at
      boot"), associativity is used to set to the wrong node. Hence KVM
      guest topology is broken.
      
      For example : A 4 node KVM guest before would have reported.
      
        [root@localhost ~]#  numactl -H
        available: 4 nodes (0-3)
        node 0 cpus: 0 1 2 3
        node 0 size: 1746 MB
        node 0 free: 1604 MB
        node 1 cpus: 4 5 6 7
        node 1 size: 2044 MB
        node 1 free: 1765 MB
        node 2 cpus: 8 9 10 11
        node 2 size: 2044 MB
        node 2 free: 1837 MB
        node 3 cpus: 12 13 14 15
        node 3 size: 2044 MB
        node 3 free: 1903 MB
        node distances:
        node   0   1   2   3
          0:  10  40  40  40
          1:  40  10  40  40
          2:  40  40  10  40
          3:  40  40  40  10
      
      Would now report:
      
        [root@localhost ~]# numactl -H
        available: 4 nodes (0-3)
        node 0 cpus: 0 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        node 0 size: 1746 MB
        node 0 free: 1244 MB
        node 1 cpus:
        node 1 size: 2044 MB
        node 1 free: 2032 MB
        node 2 cpus: 1
        node 2 size: 2044 MB
        node 2 free: 2028 MB
        node 3 cpus:
        node 3 size: 2044 MB
        node 3 free: 2032 MB
        node distances:
        node   0   1   2   3
          0:  10  40  40  40
          1:  40  10  40  40
          2:  40  40  10  40
          3:  40  40  40  10
      
      Fix this by skipping associativity lookup if the VPHN hcall failed.
      
      Fixes: 2ea62630 ("powerpc/topology: Get topology for shared processors at boot")
      Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2483ef05
    • M
      powerpc/tm: Avoid possible userspace r1 corruption on reclaim · 96dc89d5
      Michael Neuling 提交于
      Current we store the userspace r1 to PACATMSCRATCH before finally
      saving it to the thread struct.
      
      In theory an exception could be taken here (like a machine check or
      SLB miss) that could write PACATMSCRATCH and hence corrupt the
      userspace r1. The SLB fault currently doesn't touch PACATMSCRATCH, but
      others do.
      
      We've never actually seen this happen but it's theoretically
      possible. Either way, the code is fragile as it is.
      
      This patch saves r1 to the kernel stack (which can't fault) before we
      turn MSR[RI] back on. PACATMSCRATCH is still used but only with
      MSR[RI] off. We then copy r1 from the kernel stack to the thread
      struct once we have MSR[RI] back on.
      Suggested-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      96dc89d5
    • M
      powerpc/tm: Fix userspace r13 corruption · cf13435b
      Michael Neuling 提交于
      When we treclaim we store the userspace checkpointed r13 to a scratch
      SPR and then later save the scratch SPR to the user thread struct.
      
      Unfortunately, this doesn't work as accessing the user thread struct
      can take an SLB fault and the SLB fault handler will write the same
      scratch SPRG that now contains the userspace r13.
      
      To fix this, we store r13 to the kernel stack (which can't fault)
      before we access the user thread struct.
      
      Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen
      as a random userspace segfault with r13 looking like a kernel address.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Reviewed-by: NBreno Leitao <leitao@debian.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cf13435b
    • J
      RISC-V: include linux/ftrace.h in asm-prototypes.h · 57a48978
      James Cowgill 提交于
      Building a riscv kernel with CONFIG_FUNCTION_TRACER and
      CONFIG_MODVERSIONS enabled results in these two warnings:
      
        MODPOST vmlinux.o
      WARNING: EXPORT symbol "return_to_handler" [vmlinux] version generation failed, symbol will not be versioned.
      WARNING: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
      
      When exporting symbols from an assembly file, the MODVERSIONS code
      requires their prototypes to be defined in asm-prototypes.h (see
      scripts/Makefile.build). Since both of these symbols have prototypes
      defined in linux/ftrace.h, include this header from RISC-V's
      asm-prototypes.h.
      Reported-by: NKarsten Merker <merker@debian.org>
      Signed-off-by: NJames Cowgill <jcowgill@debian.org>
      Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
      57a48978
    • F
      ARM: dts: BCM63xx: Fix incorrect interrupt specifiers · 3ab97942
      Florian Fainelli 提交于
      A number of our interrupts were incorrectly specified, fix both the PPI
      and SPI interrupts to be correct.
      
      Fixes: b5762cac ("ARM: bcm63138: add NAND DT support")
      Fixes: 46d4bca0 ("ARM: BCM63XX: add BCM63138 minimal Device Tree")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      3ab97942
    • S
      arm64: hugetlb: Avoid unnecessary clearing in huge_ptep_set_access_flags · 031e6e6b
      Steve Capper 提交于
      For contiguous hugetlb, huge_ptep_set_access_flags performs a
      get_clear_flush (which then flushes the TLBs) even when no change of ptes
      is necessary.
      
      Unfortunately, this behaviour can lead to back-to-back page faults being
      generated when running with multiple threads that access the same
      contiguous huge page.
      
      Thread 1                     |  Thread 2
      -----------------------------+------------------------------
      hugetlb_fault                |
      huge_ptep_set_access_flags   |
        -> invalidate pte range    | hugetlb_fault
      continue processing          | wait for hugetlb_fault_mutex
      release mutex and return     | huge_ptep_set_access_flags
                                   |   -> invalidate pte range
      hugetlb_fault
      ...
      
      This patch changes huge_ptep_set_access_flags s.t. we first read the
      contiguous range of ptes (whilst preserving dirty information); the pte
      range is only then invalidated where necessary and this prevents further
      spurious page faults.
      
      Fixes: d8bdcff2 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
      Reported-by: NLei Zhang <zhang.lei@jp.fujitsu.com>
      Signed-off-by: NSteve Capper <steve.capper@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      031e6e6b
    • S
      arm64: hugetlb: Fix handling of young ptes · 469ed9d8
      Steve Capper 提交于
      In the contiguous bit hugetlb break-before-make code we assume that all
      hugetlb pages are young.
      
      In fact, remove_migration_pte is able to place an old hugetlb pte so
      this assumption is not valid.
      
      This patch fixes the contiguous hugetlb scanning code to preserve young
      ptes.
      
      Fixes: d8bdcff2 ("arm64: hugetlb: Add break-before-make logic for contiguous entries")
      Signed-off-by: NSteve Capper <steve.capper@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      469ed9d8
    • P
      KVM: x86: never trap MSR_KERNEL_GS_BASE · 4679b61f
      Paolo Bonzini 提交于
      KVM has an old optimization whereby accesses to the kernel GS base MSR
      are trapped when the guest is in 32-bit and not when it is in 64-bit mode.
      The idea is that swapgs is not available in 32-bit mode, thus the
      guest has no reason to access the MSR unless in 64-bit mode and
      32-bit applications need not pay the price of switching the kernel GS
      base between the host and the guest values.
      
      However, this optimization adds complexity to the code for little
      benefit (these days most guests are going to be 64-bit anyway) and in fact
      broke after commit 678e315e ("KVM: vmx: add dedicated utility to
      access guest's kernel_gs_base", 2018-08-06); the guest kernel GS base
      can be corrupted across SMIs and UEFI Secure Boot is therefore broken
      (a secure boot Linux guest, for example, fails to reach the login prompt
      about half the time).  This patch just removes the optimization; the
      kernel GS base MSR is now never trapped by KVM, similarly to the FS and
      GS base MSRs.
      
      Fixes: 678e315eReviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4679b61f
  11. 24 9月, 2018 1 次提交
    • M
      powerpc/pseries: Fix unitialized timer reset on migration · 8604895a
      Michael Bringmann 提交于
      After migration of a powerpc LPAR, the kernel executes code to
      update the system state to reflect new platform characteristics.
      
      Such changes include modifications to device tree properties provided
      to the system by PHYP. Property notifications received by the
      post_mobility_fixup() code are passed along to the kernel in general
      through a call to of_update_property() which in turn passes such
      events back to all modules through entries like the '.notifier_call'
      function within the NUMA module.
      
      When the NUMA module updates its state, it resets its event timer. If
      this occurs after a previous call to stop_topology_update() or on a
      system without VPHN enabled, the code runs into an unitialized timer
      structure and crashes. This patch adds a safety check along this path
      toward the problem code.
      
      An example crash log is as follows.
      
        ibmvscsi 30000081: Re-enabling adapter!
        ------------[ cut here ]------------
        kernel BUG at kernel/time/timer.c:958!
        Oops: Exception in kernel mode, sig: 5 [#1]
        LE SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag lockd unix_diag af_packet_diag netlink_diag grace fscache sunrpc xts vmx_crypto pseries_rng sg binfmt_misc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
        CPU: 11 PID: 3067 Comm: drmgr Not tainted 4.17.0+ #179
        ...
        NIP mod_timer+0x4c/0x400
        LR  reset_topology_timer+0x40/0x60
        Call Trace:
          0xc0000003f9407830 (unreliable)
          reset_topology_timer+0x40/0x60
          dt_update_callback+0x100/0x120
          notifier_call_chain+0x90/0x100
          __blocking_notifier_call_chain+0x60/0x90
          of_property_notify+0x90/0xd0
          of_update_property+0x104/0x150
          update_dt_property+0xdc/0x1f0
          pseries_devicetree_update+0x2d0/0x510
          post_mobility_fixup+0x7c/0xf0
          migration_store+0xa4/0xc0
          kobj_attr_store+0x30/0x60
          sysfs_kf_write+0x64/0xa0
          kernfs_fop_write+0x16c/0x240
          __vfs_write+0x40/0x200
          vfs_write+0xc8/0x240
          ksys_write+0x5c/0x100
          system_call+0x58/0x6c
      
      Fixes: 5d88aa85 ("powerpc/pseries: Update CPU maps when device tree is updated")
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: NMichael Bringmann <mwb@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8604895a