1. 22 Sep 2021 (10 commits)
    • kvm: fix wrong exception emulation in check_rdtsc · e9337c84
      Committed by Hou Wenlong
      According to Intel's SDM Vol. 2 and AMD's APM Vol. 3, when
      CR4.TSD is set, executing the rdtsc/rdtscp instructions at any
      privilege level other than 0 should trigger a #GP.
      
      Fixes: d7eb8203 ("KVM: SVM: Add intercept checks for remaining group7 instructions")
      Signed-off-by: Hou Wenlong <houwenlong93@linux.alibaba.com>
      Message-Id: <1297c0dd3f1bb47a6d089f850b629c7aa0247040.1629257115.git.houwenlong93@linux.alibaba.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
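The fixed check reduces to a predicate on CR4.TSD and the current privilege level. A minimal user-space sketch of that predicate (names and the standalone function are illustrative, not KVM's actual emulator code):

```c
#include <assert.h>
#include <stdbool.h>

#define X86_CR4_TSD (1UL << 2)   /* CR4.TSD: Time Stamp Disable */

/* Hypothetical model of the intercept check described above: with CR4.TSD
 * set, RDTSC/RDTSCP executed at any CPL other than 0 must raise #GP. */
static bool rdtsc_should_gp(unsigned long cr4, int cpl)
{
    return (cr4 & X86_CR4_TSD) && cpl > 0;
}
```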
    • KVM: SEV: Pin guest memory for write for RECEIVE_UPDATE_DATA · 50c03801
      Committed by Sean Christopherson
      Require the target guest page to be writable when pinning memory for
      RECEIVE_UPDATE_DATA.  Per the SEV API, the PSP writes to guest memory:
      
        The result is then encrypted with GCTX.VEK and written to the memory
        pointed to by GUEST_PADDR field.
      
      Fixes: 15fb7de1 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
      Cc: stable@vger.kernel.org
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210914210951.2994260-2-seanjc@google.com>
      Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
      Reviewed-by: Peter Gonda <pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: fix missing sev_decommission in sev_receive_start · f1815e0a
      Committed by Mingwei Zhang
      DECOMMISSION the current SEV context if binding an ASID fails after
      RECEIVE_START.  Per AMD's SEV API, RECEIVE_START generates a new guest
      context and thus needs to be paired with DECOMMISSION:
      
           The RECEIVE_START command is the only command other than the LAUNCH_START
           command that generates a new guest context and guest handle.
      
      The missing DECOMMISSION can result in subsequent SEV launch failures,
      as the firmware leaks memory and might not be able to allocate more SEV
      guest contexts in the future.
      
      Note, LAUNCH_START suffered the same bug, but was previously fixed by
      commit 934002cd ("KVM: SVM: Call SEV Guest Decommission if ASID
      binding fails").
      
      Cc: Alper Gun <alpergun@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: John Allen <john.allen@amd.com>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vipin Sharma <vipinsh@google.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Marc Orr <marcorr@google.com>
      Acked-by: Brijesh Singh <brijesh.singh@amd.com>
      Fixes: af43cbbf ("KVM: SVM: Add support for KVM_SEV_RECEIVE_START command")
      Signed-off-by: Mingwei Zhang <mizhang@google.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210912181815.3899316-1-mizhang@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
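The pairing the fix restores can be modeled in a few lines; the firmware-context counter and function names below are illustrative stand-ins, not KVM's:

```c
#include <assert.h>

/* Toy model: RECEIVE_START creates a firmware guest context; if the
 * subsequent ASID bind fails, DECOMMISSION must undo it, otherwise the
 * firmware leaks the context. */
static int fw_contexts;

static int sev_receive_start_cmd(void) { fw_contexts++; return 0; }
static void sev_decommission_cmd(void) { fw_contexts--; }

static int receive_start(int bind_fails)
{
    int ret = sev_receive_start_cmd();
    if (ret)
        return ret;
    if (bind_fails) {                /* sev_bind_asid() failed */
        sev_decommission_cmd();      /* the fix: free the new context */
        return -1;
    }
    return 0;
}
```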
    • KVM: SEV: Acquire vcpu mutex when updating VMSA · bb18a677
      Committed by Peter Gonda
      The update-VMSA ioctl touches data stored in struct kvm_vcpu, and
      therefore should not be performed concurrently with any VCPU ioctl
      that might cause KVM or the processor to use the same data.
      
      Add a vcpu mutex guard to the VMSA updating code, and refactor out a
      __sev_launch_update_vmsa() function to handle the per-vCPU parts
      of sev_launch_update_vmsa().
      
      Fixes: ad73109a ("KVM: SVM: Provide support to launch and run an SEV-ES guest")
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20210915171755.3773766-1-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
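The locking shape described above can be sketched with stand-in mutex primitives; the struct fields and helper names are illustrative, not KVM's:

```c
#include <assert.h>

struct vcpu { int mutex_held; int vmsa_updated; };

/* stand-ins for the real vcpu mutex primitives */
static int vcpu_mutex_lock_killable(struct vcpu *v) { v->mutex_held = 1; return 0; }
static void vcpu_mutex_unlock(struct vcpu *v) { v->mutex_held = 0; }

/* per-vCPU part, split out so it always runs under the vcpu mutex */
static int __launch_update_vmsa(struct vcpu *v)
{
    assert(v->mutex_held);       /* invariant the refactor enforces */
    v->vmsa_updated = 1;
    return 0;
}

static int launch_update_vmsa(struct vcpu *vcpus, int n)
{
    for (int i = 0; i < n; i++) {
        int ret = vcpu_mutex_lock_killable(&vcpus[i]);
        if (ret)
            return ret;
        ret = __launch_update_vmsa(&vcpus[i]);
        vcpu_mutex_unlock(&vcpus[i]);
        if (ret)
            return ret;
    }
    return 0;
}
```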
    • KVM: nVMX: fix comments of handle_vmon() · ed7023a1
      Committed by Yu Zhang
      "VMXON pointer" is saved in vmx->nested.vmxon_ptr since
      commit 3573e22c ("KVM: nVMX: additional checks on
      vmxon region"). Also, handle_vmptrld() & handle_vmclear()
      now have logic to check the VMCS pointer against the VMXON
      pointer.
      
      So just remove the obsolete comments of handle_vmon().
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Message-Id: <20210908171731.18885-1-yu.c.zhang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Handle SRCU initialization failure during page track init · eb7511bf
      Committed by Haimin Zhang
      Check the return of init_srcu_struct(), which can fail due to OOM, when
      initializing the page track mechanism.  Lack of checking leads to a NULL
      pointer deref found by a modified syzkaller.
      Reported-by: TCS Robot <tcs_robot@tencent.com>
      Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
      Message-Id: <1630636626-12262-1-git-send-email-tcs_kernel@tencent.com>
      [Move the call towards the beginning of kvm_arch_init_vm. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
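The shape of the fix is simply to propagate the error instead of ignoring it; a toy model (the error code and names are illustrative):

```c
#include <assert.h>

struct page_track_state { int srcu_ready; };

/* stand-in for init_srcu_struct(), which can fail with -ENOMEM */
static int init_srcu(struct page_track_state *s, int oom)
{
    if (oom)
        return -12;              /* -ENOMEM */
    s->srcu_ready = 1;
    return 0;
}

static int page_track_init(struct page_track_state *s, int oom)
{
    int ret = init_srcu(s, oom);
    if (ret)
        return ret;              /* previously ignored: later SRCU use
                                  * dereferenced a NULL pointer */
    /* ... rest of page-track setup ... */
    return 0;
}
```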
    • KVM: VMX: Remove defunct "nr_active_uret_msrs" field · cd36ae87
      Committed by Sean Christopherson
      Remove vcpu_vmx.nr_active_uret_msrs and its associated comment, which are
      both defunct now that KVM keeps the list constant and instead explicitly
      tracks which entries need to be loaded into hardware.
      
      No functional change intended.
      
      Fixes: ee9d22e0 ("KVM: VMX: Use flag to indicate "active" uret MSRs instead of sorting list")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210908002401.1947049-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Clear KVM's cached guest CR3 at RESET/INIT · 03a6e840
      Committed by Sean Christopherson
      Explicitly zero the guest's CR3 and mark it available+dirty at RESET/INIT.
      Per Intel's SDM and AMD's APM, CR3 is zeroed at both RESET and INIT.  For
      RESET, this is a nop as the vcpu is zero-allocated.  For INIT, the bug has
      likely escaped notice because no firmware/kernel puts its page-table root
      at PA=0, let alone relies on INIT to get the desired CR3 for such page
      tables.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210921000303.400537-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
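KVM's register-cache bookkeeping boils down to two bitmasks; the register index and helper names below are illustrative, but the available+dirty pattern is the one the fix applies:

```c
#include <assert.h>

enum { REG_CR3 = 3 };            /* illustrative register index */

struct reg_cache {
    unsigned long cr3;
    unsigned int regs_avail;     /* cached value is valid */
    unsigned int regs_dirty;     /* cached value must reach hardware */
};

static void mark_reg_dirty(struct reg_cache *c, int reg)
{
    c->regs_avail |= 1u << reg;
    c->regs_dirty |= 1u << reg;
}

/* RESET/INIT: per SDM/APM, CR3 is zeroed; mark it available+dirty so KVM
 * neither reads stale hardware state nor forgets to load the zero. */
static void vcpu_reset_cr3(struct reg_cache *c)
{
    c->cr3 = 0;
    mark_reg_dirty(c, REG_CR3);
}
```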
    • KVM: x86: Mark all registers as avail/dirty at vCPU creation · 7117003f
      Committed by Sean Christopherson
      Mark all registers as available and dirty at vCPU creation, as the vCPU has
      obviously not been loaded into hardware, let alone been given the chance to
      be modified in hardware.  On SVM, reading from "uninitialized" hardware is
      a non-issue as VMCBs are zero allocated (thus not truly uninitialized) and
      hardware does not allow for arbitrary field encoding schemes.
      
      On VMX, backing memory for VMCSes is also zero allocated, but true
      initialization of the VMCS _technically_ requires VMWRITEs, as the VMX
      architectural specification technically allows CPU implementations to
      encode fields with arbitrary schemes.  E.g. a CPU could theoretically store
      the inverted value of every field, in which case a VMREAD of a
      zero-allocated field would return all ones.
      
      In practice, only the AR_BYTES fields are known to be manipulated by
      hardware during VMREAD/VMWRITE; no known hardware or VMM (for nested VMX)
      does fancy encoding of cacheable field values (CR0, CR3, CR4, etc...).  In
      other words, this is technically a bug fix, but practically speaking it's
      a glorified nop.
      
      Failure to mark registers as available has been a lurking bug for quite
      some time.  The original register caching supported only GPRs (+RIP, which
      is kinda sorta a GPR), with the masks initialized at ->vcpu_reset().  That
      worked because the two cacheable registers, RIP and RSP, are generally
      speaking not read as side effects in other flows.
      
      Arguably, commit aff48baa ("KVM: Fetch guest cr3 from hardware on
      demand") was the first instance of failure to mark regs available.  While
      _just_ marking CR3 available during vCPU creation wouldn't have fixed the
      VMREAD from an uninitialized VMCS bug because ept_update_paging_mode_cr0()
      unconditionally read vmcs.GUEST_CR3, marking CR3 _and_ intentionally not
      reading GUEST_CR3 when it's available would have avoided VMREAD to a
      technically-uninitialized VMCS.
      
      Fixes: aff48baa ("KVM: Fetch guest cr3 from hardware on demand")
      Fixes: 6de4f3ad ("KVM: Cache pdptrs")
      Fixes: 6de12732 ("KVM: VMX: Optimize vmx_get_rflags()")
      Fixes: 2fb92db1 ("KVM: VMX: Cache vmcs segment fields")
      Fixes: bd31fe49 ("KVM: VMX: Add proper cache tracking for CR0")
      Fixes: f98c1e77 ("KVM: VMX: Add proper cache tracking for CR4")
      Fixes: 5addc235 ("KVM: VMX: Cache vmcs.EXIT_QUALIFICATION using arch avail_reg flags")
      Fixes: 87915858 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210921000303.400537-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • entry: rseq: Call rseq_handle_notify_resume() in tracehook_notify_resume() · a68de80f
      Committed by Sean Christopherson
      Invoke rseq_handle_notify_resume() from tracehook_notify_resume() now
      that the two functions are always called back-to-back by architectures
      that have rseq.  The rseq helper is stubbed out for architectures that
      don't support rseq, i.e. this is a nop across the board.
      
      Note, tracehook_notify_resume() is horribly named and arguably does not
      belong in tracehook.h as literally every line of code in it has nothing
      to do with tracing.  But, that's been true since commit a42c6ded
      ("move key_replace_session_keyring() into tracehook_notify_resume()")
      first usurped tracehook_notify_resume() back in 2012.  Punt cleaning that
      mess up to future patches.
      
      No functional change intended.
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210901203030.1292304-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
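The "stubbed out" pattern is the usual config-gated nop, sketched below; CONFIG_RSEQ and the counter are illustrative (and the config option is deliberately left undefined in this sketch, so the stub is used):

```c
#include <assert.h>

static int rseq_fixups;          /* counts rseq critical-section fixups */

#ifdef CONFIG_RSEQ
static void rseq_handle_notify_resume(void) { rseq_fixups++; }
#else
static void rseq_handle_notify_resume(void) { }   /* stub: nop */
#endif

static void tracehook_notify_resume(void)
{
    /* ... task_work_run(), key replacement, etc. ... */
    rseq_handle_notify_resume();  /* safe to call unconditionally */
}
```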
  2. 20 Sep 2021 (2 commits)
  3. 19 Sep 2021 (5 commits)
    • x86/build: Do not add -falign flags unconditionally for clang · 7fa6a274
      Committed by Nathan Chancellor
      clang does not support -falign-jumps and only recently gained support
      for -falign-loops. When one of the configuration options that adds these
      flags is enabled, clang warns and all cc-{disable-warning,option} that
      follow fail because -Werror gets added to test for the presence of this
      warning:
      
      clang-14: warning: optimization flag '-falign-jumps=0' is not supported
      [-Wignored-optimization-argument]
      
      To resolve this, add a couple of cc-option calls when building with
      clang; gcc has supported these options since 3.2 so there is no point in
      testing for their support. -falign-functions was implemented in clang-7,
      -falign-loops was implemented in clang-14, and -falign-jumps has not
      been implemented yet.
      
      Link: https://lore.kernel.org/r/YSQE2f5teuvKLkON@Ryzen-9-3900X.localdomain/
      Link: https://lore.kernel.org/r/20210824022640.2170859-2-nathan@kernel.org/
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • sh: Add missing FORCE prerequisites in Makefile · 4e70b646
      Committed by Geert Uytterhoeven
      make:
      
          arch/sh/boot/Makefile:87: FORCE prerequisite is missing
      
      Add the missing FORCE prerequisites for all build targets identified by
      "make help".
      
      Fixes: e1f86d7b ("kbuild: warn if FORCE is missing for if_changed(_dep,_rule) and filechk")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    • alpha: move __udiv_qrnnd library function to arch/alpha/lib/ · d4d016ca
      Committed by Linus Torvalds
      We already had the implementation for __udiv_qrnnd (unsigned divide for
      multi-precision arithmetic) as part of the alpha math emulation code.
      
      But you can disable the math emulation code - even if you shouldn't -
      and then the MPI code that actually wants this functionality (and is
      needed by various crypto functions) will fail to build.
      
      So move the extended-precision divide code to be a regular library
      function, just like all the regular division code is.  That way it is
      available regardless of math emulation.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: mark 'Jensen' platform as no longer broken · ab41f75e
      Committed by Linus Torvalds
      Ok, it almost certainly is still broken on actual hardware, but the
      immediate reason for it having been marked BROKEN was a build error that
      is fixed by just making sure the low-level IO header file is included
      sufficiently early that the __EXTERN_INLINE hackery takes effect.
      
      This was marked broken back in 2017 by commit 1883c9f4 ("alpha: mark
      jensen as broken"), but Ulrich Teichert made me look at it as part of my
      cross-build work to make sure -Werror actually does the right thing.
      
      There are lots of alpha configurations that do not build cleanly, but
      now it's no longer because Jensen wouldn't be buildable.  That said,
      because the Jensen platform doesn't force PCI to be enabled (Jensen only
      had EISA), it ends up being somewhat interesting as a source of odd
      configs.
      Reported-by: Ulrich Teichert <krypton@ulrich-teichert.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: make 'Jensen' IO functions build again · cc9d3aaa
      Committed by Linus Torvalds
      The Jensen IO functions are overly complicated because some of the IO
      addresses refer to special 'local IO' ports, and they get accessed
      differently.
      
      That then makes gcc not actually inline them, and since they were marked
      "extern inline" when included through the regular <asm/io.h> path, and
      then only marked "inline" when included from sys_jensen.c, you never
      necessarily got a body for the IO functions at all.
      
      The intent of the sys_jensen.c code is to actually get the non-inlined
      copy generated, so remove the 'inline' from the magic macro that is
      supposed to sort this all out.
      
      Also, do not mix 'extern inline' functions (that may or may not be
      inlined and will not generate a function body if they are not) with
      'static inline' (that _will_ generate a function body when not inlined).
      Because gcc will complain about this situation:
      
         error: ‘jensen_bus_outb’ is static but used in inline function ‘jensen_outb’ which is not static
      
      because gcc basically doesn't know whether to generate a body for that
      static inline function or not for that call site.
      
      So make all of these use that __EXTERN_INLINE marker.  Gcc will
      generally not inline these things on use, and then generate the function
      body out-of-line in sys_jensen.c.
      
      This makes the core IO functions build for the alpha Jensen config.
      
      Not that the rest then builds, because it turns out Jensen also doesn't
      enable PCI, which then makes other drivers very unhappy, but that's a
      separate issue.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 17 Sep 2021 (4 commits)
  5. 16 Sep 2021 (4 commits)
  6. 15 Sep 2021 (7 commits)
    • s390: remove WARN_DYNAMIC_STACK · f5711f9d
      Committed by Heiko Carstens
      s390 is the only architecture that allows setting the
      -mwarn-dynamicstack compile option. However, this always generates a
      warning when system call stack randomization is enabled, since the
      randomization uses alloca to create a randomly sized stack frame.
      
      On the other hand Linus just enabled "-Werror" by default with commit
      3fe617cc ("Enable '-Werror' by default for all kernel builds"),
      which means compiles will always fail by default.
      
      So instead of playing once again whack-a-mole for something which is
      s390 specific, simply remove this option.
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    • s390: update defconfigs · 4b26ceac
      Committed by Heiko Carstens
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    • s390/pci_mmio: fully validate the VMA before calling follow_pte() · a8b92b8c
      Committed by David Hildenbrand
      We should not walk/touch page tables outside of VMA boundaries when
      holding only the mmap sem in read mode. Evil user space can modify the
      VMA layout just before this function runs and e.g., trigger races with
      page table removal code since commit dd2283f2 ("mm: mmap: zap pages
      with read mmap_sem in munmap").
      
      find_vma() does not check if the address is >= the VMA start address;
      use vma_lookup() instead.
      Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
      Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
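The difference between the two lookups is easy to model; the single-VMA "mm" below is an illustrative simplification of the real rbtree/maple-tree walk:

```c
#include <assert.h>
#include <stddef.h>

struct vma { unsigned long vm_start, vm_end; };

/* find_vma(): first VMA with vm_end > addr; addr may lie in a gap below it */
static struct vma *find_vma_model(struct vma *v, unsigned long addr)
{
    return addr < v->vm_end ? v : NULL;
}

/* vma_lookup(): only succeeds when addr is actually inside the VMA */
static struct vma *vma_lookup_model(struct vma *v, unsigned long addr)
{
    struct vma *r = find_vma_model(v, addr);
    return (r && addr >= r->vm_start) ? r : NULL;
}
```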
    • powerpc/xics: Set the IRQ chip data for the ICS native backend · c006a065
      Committed by Cédric Le Goater
      The ICS native driver relies on the IRQ chip data to find the struct
      'ics_native' describing the ICS controller but it was removed by commit
      248af248 ("powerpc/xics: Rename the map handler in a check handler").
      Revert this change to fix the Microwatt SoC platform.
      
      Fixes: 248af248 ("powerpc/xics: Rename the map handler in a check handler")
      Signed-off-by: Cédric Le Goater <clg@kaod.org>
      Tested-by: Gustavo Romero <gustavo.romero@linaro.org>
      Reviewed-by: Joel Stanley <joel@jms.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210913134056.3761960-1-clg@kaod.org
    • xen: fix usage of pmd_populate in mremap for pv guests · 36c9b592
      Committed by Juergen Gross
      Commit 0881ace2 ("mm/mremap: use pmd/pud_poplulate to update page
      table entries") introduced a regression when running as Xen PV guest.
      
      Today pmd_populate() for Xen PV assumes that the PFN inserted is
      referencing a not yet used page table. In case of move_normal_pmd()
      this is not true, resulting in WARN splats like:
      
      [34321.304270] ------------[ cut here ]------------
      [34321.304277] WARNING: CPU: 0 PID: 23628 at arch/x86/xen/multicalls.c:102 xen_mc_flush+0x176/0x1a0
      [34321.304288] Modules linked in:
      [34321.304291] CPU: 0 PID: 23628 Comm: apt-get Not tainted 5.14.1-20210906-doflr-mac80211debug+ #1
      [34321.304294] Hardware name: MSI MS-7640/890FXA-GD70 (MS-7640)  , BIOS V1.8B1 09/13/2010
      [34321.304296] RIP: e030:xen_mc_flush+0x176/0x1a0
      [34321.304300] Code: 89 45 18 48 c1 e9 3f 48 89 ce e9 20 ff ff ff e8 60 03 00 00 66 90 5b 5d 41 5c 41 5d c3 48 c7 45 18 ea ff ff ff be 01 00 00 00 <0f> 0b 8b 55 00 48 c7 c7 10 97 aa 82 31 db 49 c7 c5 38 97 aa 82 65
      [34321.304303] RSP: e02b:ffffc90000a97c90 EFLAGS: 00010002
      [34321.304305] RAX: ffff88807d416398 RBX: ffff88807d416350 RCX: ffff88807d416398
      [34321.304306] RDX: 0000000000000001 RSI: 0000000000000001 RDI: deadbeefdeadf00d
      [34321.304308] RBP: ffff88807d416300 R08: aaaaaaaaaaaaaaaa R09: ffff888006160cc0
      [34321.304309] R10: deadbeefdeadf00d R11: ffffea000026a600 R12: 0000000000000000
      [34321.304310] R13: ffff888012f6b000 R14: 0000000012f6b000 R15: 0000000000000001
      [34321.304320] FS:  00007f5071177800(0000) GS:ffff88807d400000(0000) knlGS:0000000000000000
      [34321.304322] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
      [34321.304323] CR2: 00007f506f542000 CR3: 00000000160cc000 CR4: 0000000000000660
      [34321.304326] Call Trace:
      [34321.304331]  xen_alloc_pte+0x294/0x320
      [34321.304334]  move_pgt_entry+0x165/0x4b0
      [34321.304339]  move_page_tables+0x6fa/0x8d0
      [34321.304342]  move_vma.isra.44+0x138/0x500
      [34321.304345]  __x64_sys_mremap+0x296/0x410
      [34321.304348]  do_syscall_64+0x3a/0x80
      [34321.304352]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [34321.304355] RIP: 0033:0x7f507196301a
      [34321.304358] Code: 73 01 c3 48 8b 0d 76 0e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 49 89 ca b8 19 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 46 0e 0c 00 f7 d8 64 89 01 48
      [34321.304360] RSP: 002b:00007ffda1eecd38 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
      [34321.304362] RAX: ffffffffffffffda RBX: 000056205f950f30 RCX: 00007f507196301a
      [34321.304363] RDX: 0000000001a00000 RSI: 0000000001900000 RDI: 00007f506dc56000
      [34321.304364] RBP: 0000000001a00000 R08: 0000000000000010 R09: 0000000000000004
      [34321.304365] R10: 0000000000000001 R11: 0000000000000246 R12: 00007f506dc56060
      [34321.304367] R13: 00007f506dc56000 R14: 00007f506dc56060 R15: 000056205f950f30
      [34321.304368] ---[ end trace a19885b78fe8f33e ]---
      [34321.304370] 1 of 2 multicall(s) failed: cpu 0
      [34321.304371]   call  2: op=12297829382473034410 arg=[aaaaaaaaaaaaaaaa] result=-22
      
      Fix that by modifying xen_alloc_ptpage() to only pin the page table in
      case it wasn't pinned already.
      
      Fixes: 0881ace2 ("mm/mremap: use pmd/pud_poplulate to update page table entries")
      Cc: <stable@vger.kernel.org>
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Link: https://lore.kernel.org/r/20210908073640.11299-1-jgross@suse.com
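The "pin only if not already pinned" fix can be modeled with a flag and a hypercall counter; the struct and names are illustrative, not Xen's:

```c
#include <assert.h>

struct pt_page { int pinned; int pin_hypercalls; };

static void xen_pin_page(struct pt_page *p)
{
    p->pin_hypercalls++;         /* double-pinning is what made the
                                  * multicall fail with -EINVAL above */
    p->pinned = 1;
}

/* the fix: pin the page table only if it is not pinned already */
static void xen_alloc_ptpage(struct pt_page *p)
{
    if (!p->pinned)
        xen_pin_page(p);
}
```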
    • xen: reset legacy rtc flag for PV domU · f68aa100
      Committed by Juergen Gross
      A Xen PV guest doesn't have a legacy RTC device, so reset the legacy
      RTC flag. Otherwise the following WARN splat will occur at boot:
      
      [    1.333404] WARNING: CPU: 1 PID: 1 at /home/gross/linux/head/drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1be/0x210
      [    1.333404] Modules linked in:
      [    1.333404] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-default+ #282
      [    1.333404] RIP: e030:mc146818_get_time+0x1be/0x210
      [    1.333404] Code: c0 64 01 c5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ec 01 b8 02 00 00 00 44 89 63 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 48 c7 c7 30 0e ef 82 4c 89 e6 e8 71 2a 24 00 48 c7 c0 ff ff
      [    1.333404] RSP: e02b:ffffc90040093df8 EFLAGS: 00010002
      [    1.333404] RAX: 00000000000000ff RBX: ffffc90040093e34 RCX: 0000000000000000
      [    1.333404] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000d
      [    1.333404] RBP: ffffffff82ef0e30 R08: ffff888005013e60 R09: 0000000000000000
      [    1.333404] R10: ffffffff82373e9b R11: 0000000000033080 R12: 0000000000000200
      [    1.333404] R13: 0000000000000000 R14: 0000000000000002 R15: ffffffff82cdc6d4
      [    1.333404] FS:  0000000000000000(0000) GS:ffff88807d440000(0000) knlGS:0000000000000000
      [    1.333404] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    1.333404] CR2: 0000000000000000 CR3: 000000000260a000 CR4: 0000000000050660
      [    1.333404] Call Trace:
      [    1.333404]  ? wakeup_sources_sysfs_init+0x30/0x30
      [    1.333404]  ? rdinit_setup+0x2b/0x2b
      [    1.333404]  early_resume_init+0x23/0xa4
      [    1.333404]  ? cn_proc_init+0x36/0x36
      [    1.333404]  do_one_initcall+0x3e/0x200
      [    1.333404]  kernel_init_freeable+0x232/0x28e
      [    1.333404]  ? rest_init+0xd0/0xd0
      [    1.333404]  kernel_init+0x16/0x120
      [    1.333404]  ret_from_fork+0x1f/0x30
      
      Cc: <stable@vger.kernel.org>
      Fixes: 8d152e7a ("x86/rtc: Replace paravirt rtc check with platform legacy quirk")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: https://lore.kernel.org/r/20210903084937.19392-3-jgross@suse.com
    • memblock: introduce saner 'memblock_free_ptr()' interface · 77e02cf5
      Committed by Linus Torvalds
      The boot-time allocation interface for memblock is a mess, with
      'memblock_alloc()' returning a virtual pointer, but then you are
      supposed to free it with 'memblock_free()' that takes a _physical_
      address.
      
      Not only is that all kinds of strange and illogical, but it actually
      causes bugs, when people then use it like a normal allocation function,
      and it fails spectacularly on a NULL pointer:
      
         https://lore.kernel.org/all/20210912140820.GD25450@xsang-OptiPlex-9020/
      
      or just random memory corruption if the debug checks don't catch it:
      
         https://lore.kernel.org/all/61ab2d0c-3313-aaab-514c-e15b7aa054a0@suse.cz/
      
      I really don't want to apply patches that treat the symptoms, when the
      fundamental cause is this horribly confusing interface.
      
      I started out looking at just automating a sane replacement sequence,
      but because of this mix or virtual and physical addresses, and because
      people have used the "__pa()" macro that can take either a regular
      kernel pointer, or just the raw "unsigned long" address, it's all quite
      messy.
      
      So this just introduces a new saner interface for freeing a virtual
      address that was allocated using 'memblock_alloc()', and that was kept
      as a regular kernel pointer.  And then it converts a couple of users
      that are obvious and easy to test, including the 'xbc_nodes' case in
      lib/bootconfig.c that caused problems.
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Fixes: 40caa127 ("init: bootconfig: Remove all bootconfig data when the init memory is removed")
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
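The interface mismatch, and the new helper that hides it, can be sketched as follows; the `_model` suffixes and the fake virtual-to-physical offset are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_PAGE_OFFSET 0x10000000UL    /* illustrative va->pa offset */

/* stand-in for __pa(): virtual pointer to physical address */
static uintptr_t model_pa(void *va) { return (uintptr_t)va - FAKE_PAGE_OFFSET; }

static uintptr_t last_freed_pa;

/* old interface: takes a _physical_ address */
static void memblock_free_model(uintptr_t pa, size_t size)
{
    (void)size;
    last_freed_pa = pa;
}

/* new interface: takes the same virtual pointer memblock_alloc() returned,
 * doing the __pa() conversion in one place instead of at every caller */
static void memblock_free_ptr_model(void *ptr, size_t size)
{
    memblock_free_model(model_pa(ptr), size);
}
```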
  7. 14 Sep 2021 (7 commits)
    • powerpc/boot: Fix build failure since GCC 4.9 removal · 1619b69e
      Committed by Michael Ellerman
      Stephen reported that the build was broken since commit
      6d2ef226 ("compiler_attributes.h: drop __has_attribute() support for
      gcc4"), with errors such as:
      
        include/linux/compiler_attributes.h:296:5: warning: "__has_attribute" is not defined, evaluates to 0 [-Wundef]
          296 | #if __has_attribute(__warning__)
              |     ^~~~~~~~~~~~~~~
        make[2]: *** [arch/powerpc/boot/Makefile:225: arch/powerpc/boot/crt0.o] Error 1
      
      But we expect __has_attribute() to always be defined now that we've
      stopped using GCC 4.
      
      Linus debugged it to the point of reading the GCC sources, and noticing
      that the problem is that __has_attribute() is not defined when
      preprocessing assembly files, which is what we're doing here.
      
      Our assembly files don't include, or need, compiler_attributes.h, but
      they are getting it unconditionally from the -include in BOOT_CFLAGS,
      which is then added in its entirety to BOOT_AFLAGS.
      
      That -include was added in commit 77433830 ("powerpc: boot: include
      compiler_attributes.h") so that we'd have "fallthrough" and other
      attributes defined for the C files in arch/powerpc/boot. But it's not
      needed for assembly files.
      
      The minimal fix is to move the addition to BOOT_CFLAGS of -include
      compiler_attributes.h until after we've copied BOOT_CFLAGS into
      BOOT_AFLAGS. That avoids including compiler_attributes.h for asm files,
      but makes no other change to BOOT_CFLAGS or BOOT_AFLAGS.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sparc32: page align size in arch_dma_alloc · 59583f74
      Committed by Andreas Larsson
      Commit 53b7670e ("sparc: factor the dma coherent mapping into
      helper") lost the page align for the calls to dma_make_coherent and
      srmmu_unmapiorange. The latter cannot handle a non page aligned len
      argument.
      Signed-off-by: Andreas Larsson <andreas@gaisler.com>
      Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
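The fix amounts to rounding the length up to a page boundary before calling the teardown helpers; a sketch of the rounding (macro names prefixed MODEL_ to mark them as stand-ins for the kernel's PAGE_SIZE/PAGE_ALIGN):

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PAGE_ALIGN(x) (((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

/* srmmu_unmapiorange() cannot handle a non-page-aligned length, so the
 * free path must round the size up before passing it along. */
static unsigned long dma_teardown_len(unsigned long size)
{
    return MODEL_PAGE_ALIGN(size);
}
```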
    • x86/mce: Avoid infinite loop for copy from user recovery · 81065b35
      Committed by Tony Luck
      There are two cases for machine check recovery:
      
      1) The machine check was triggered by ring3 (application) code.
         This is the simpler case. The machine check handler simply queues
         work to be executed on return to user. That code unmaps the page
         from all users and arranges to send a SIGBUS to the task that
         triggered the poison.
      
      2) The machine check was triggered in kernel code that is covered by
         an exception table entry. In this case the machine check handler
         still queues a work entry to unmap the page, etc. but this will
         not be called right away because the #MC handler returns to the
         fix up code address in the exception table entry.
      
      Problems occur if the kernel triggers another machine check before the
      return-to-user path processes the first queued work item.
      
      Specifically, the work is queued using the ->mce_kill_me callback
      structure in the task struct for the current thread. Attempting to queue
      a second work item using this same callback results in a loop in the
      linked list of work functions to call. So when the kernel does return to
      user, it enters an infinite loop processing the same entry forever.
      
      There are some legitimate scenarios where the kernel may take a second
      machine check before returning to the user.
      
      1) Some code (e.g. futex) first tries a get_user() with page faults
         disabled. If this fails, the code retries with page faults enabled
         expecting that this will resolve the page fault.
      
      2) Copy from user code retries a copy in byte-at-a-time mode to check
         whether any additional bytes can be copied.
      
      On the other side of the fence are some bad drivers that do not check
      the return value from individual get_user() calls and may access
      multiple user addresses without noticing that some/all calls have
      failed.
      
      Fix by adding a counter (current->mce_count) to keep track of repeated
      machine checks before task_work() is called. First machine check saves
      the address information and calls task_work_add(). Subsequent machine
      checks before that task_work callback is executed check that the address
      is in the same page as the first machine check (since the callback will
      offline exactly one page).
      
      Expected worst case is four machine checks before moving on (e.g. one
      user access with page faults disabled, then a repeat to the same address
      with page faults enabled ... repeat in copy tail bytes). Just in case
      there is some code that loops forever enforce a limit of 10.
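The counter logic described above can be modeled in a few lines. This is an illustrative user-space sketch, not the kernel's exact code: the struct, function name, and panic handling are simplified stand-ins for current->mce_count and queue_task_work().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_MASK (~0xFFFUL)

struct task {
    int mce_count;
    uint64_t mce_addr;
};

/* Returns true when the first machine check queues the task_work
 * callback; repeats on the same page are absorbed until it runs. */
static bool queue_task_work(struct task *t, uint64_t addr)
{
    int count = ++t->mce_count;

    if (count == 1) {
        /* First #MC: save the address and call task_work_add(). */
        t->mce_addr = addr;
        return true;
    }

    /* Subsequent #MC must hit the same page as the first (the callback
     * will offline exactly one page), and we cap repeats at 10. */
    if ((t->mce_addr & PAGE_MASK) != (addr & PAGE_MASK) || count > 10)
        fprintf(stderr, "panic: #MC to a different page or too many repeats\n");

    return false;
}
```

A second or third fault in the same page simply bumps the counter without queuing again, which is what breaks the linked-list loop the commit describes.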
      
       [ bp: Massage commit message, drop noinstr, fix typo, extend panic
         messages. ]
      
      Fixes: 5567d11c ("x86/mce: Send #MC singal from task work")
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
      81065b35
    • N
      arm64: remove GCC version check for ARCH_SUPPORTS_INT128 · 42a7ba16
      Committed by Nick Desaulniers
      Now that GCC 5.1 is the minimally supported compiler version, this
      Kconfig check is no longer necessary.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      42a7ba16
    • N
      powerpc: remove GCC version check for UPD_CONSTR · 6563139d
      Committed by Nick Desaulniers
      Now that GCC 5.1 is the minimum supported version, we can drop this
      workaround for older versions of GCC. This adversely affected clang,
      too.
      
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6563139d
    • N
      riscv: remove Kconfig check for GCC version for ARCH_RV64I · d2075895
      Committed by Nick Desaulniers
      The minimum supported version of GCC is now 5.1. The check wasn't
      correct as written anyway, since GCC_VERSION is 0 when CC=clang.
      
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: linux-riscv@lists.infradead.org
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d2075895
    • W
      x86/uaccess: Fix 32-bit __get_user_asm_u64() when CC_HAS_ASM_GOTO_OUTPUT=y · a69ae291
      Committed by Will Deacon
      Commit 865c50e1 ("x86/uaccess: utilize CONFIG_CC_HAS_ASM_GOTO_OUTPUT")
      added an optimised version of __get_user_asm() for x86 using 'asm goto'.
      
      Like the non-optimised code, the 32-bit implementation of 64-bit
      get_user() expands to a pair of 32-bit accesses.  Unlike the
      non-optimised code, the _original_ pointer is incremented to copy the
      high word instead of loading through a new pointer explicitly
      constructed to point at a 32-bit type.  Consequently, if the pointer
      points at a 64-bit type then we end up loading the wrong data for the
      upper 32-bits.
      
      This was observed as a mount() failure in Android targeting i686 after
      b0cfcdd9 ("d_path: make 'prepend()' fill up the buffer exactly on
      overflow") because the call to copy_from_kernel_nofault() from
      prepend_copy() ends up in __get_kernel_nofault() and casts the source
      pointer to a 'u64 __user *'.  An attempt to mount at "/debug_ramdisk"
      therefore ends up failing trying to mount "/debumdismdisk".
      
      Use the existing '__gu_ptr' source pointer, which points to unsigned
      int, for the 32-bit __get_user_asm_u64() instead of the original pointer.
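The pointer-arithmetic bug above can be modeled in plain C. This is an illustrative little-endian user-space sketch (the function names are hypothetical, not the kernel's inline-asm macros): incrementing a pointer of 64-bit type advances 8 bytes, so the "high word" load skips past the value entirely, while the fixed version steps through a 32-bit typed pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: p has 64-bit type, so p + 1 advances 8 bytes and the high
 * word is read from the *next* u64, not from offset 4. */
static uint64_t get_user_u64_buggy(const uint64_t *p)
{
    uint32_t lo = *(const uint32_t *)p;
    uint32_t hi = *(const uint32_t *)(p + 1); /* +8 bytes: wrong data */
    return ((uint64_t)hi << 32) | lo;
}

/* Fixed: read both halves through a pointer to a 32-bit type, as the
 * commit does with '__gu_ptr'. */
static uint64_t get_user_u64_fixed(const uint64_t *p)
{
    const uint32_t *q = (const uint32_t *)p;  /* 32-bit typed pointer */
    uint32_t lo = q[0];
    uint32_t hi = q[1];                       /* +4 bytes: high word */
    return ((uint64_t)hi << 32) | lo;
}
```

With two adjacent u64 values, the buggy version splices the low half of the first with the low half of the second, which is exactly the kind of corruption behind the "/debumdismdisk" symptom.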
      
      Cc: Bill Wendling <morbo@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Fixes: 865c50e1 ("x86/uaccess: utilize CONFIG_CC_HAS_ASM_GOTO_OUTPUT")
      Signed-off-by: Will Deacon <will@kernel.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Tested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a69ae291
  8. 13 September 2021, 1 commit
    • G
      powerpc/mce: Fix access error in mce handler · 3a1e92d0
      Committed by Ganesh Goudar
      We queue an irq work for deferred processing of the mce event in the
      realmode mce handler, where translation is disabled. Queuing the work
      may result in accessing memory outside the RMO region; such access
      needs translation to be enabled for an LPAR running with hash MMU,
      else the kernel crashes.
      
      After enabling translation in mce_handle_error() we used to leave it
      enabled to avoid crashing here, but now with the commit
      74c3354b ("powerpc/pseries/mce: restore msr before returning from
      handler") we are restoring the MSR to disable translation.
      
      Hence, to fix this, enable translation before queuing the work.
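The ordering requirement above can be modeled with two flags. This is a deliberately simplified user-space model, not the kernel's code: enable_translation() stands in for setting MSR IR/DR, and irq_work_queue_model() stands in for the llist access that would fault outside the RMO with translation off.

```c
#include <assert.h>
#include <stdbool.h>

static bool translation_on;
static bool queued_ok;

static void enable_translation(void)
{
    translation_on = true;   /* model of turning on MSR IR/DR */
}

static void irq_work_queue_model(void)
{
    /* The irq_work llist may live outside the RMO region; with hash MMU
     * and translation disabled, this access would crash the kernel. */
    queued_ok = translation_on;
}

static void machine_check_queue_event(void)
{
    enable_translation();    /* the fix: enable translation first */
    irq_work_queue_model();
}
```

Queuing without first enabling translation models the crash; the fixed path enables translation before touching the queue, so the access succeeds.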
      
      Without this change, the following trace is seen on injecting an SLB
      multihit in an LPAR running with hash MMU.
      
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        CPU: 5 PID: 1883 Comm: insmod Tainted: G        OE     5.14.0-mce+ #137
        NIP:  c000000000735d60 LR: c000000000318640 CTR: 0000000000000000
        REGS: c00000001ebff9a0 TRAP: 0300   Tainted: G       OE      (5.14.0-mce+)
        MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 28008228  XER: 00000001
        CFAR: c00000000031863c DAR: c00000027fa8fe08 DSISR: 40000000 IRQMASK: 0
        ...
        NIP llist_add_batch+0x0/0x40
        LR  __irq_work_queue_local+0x70/0xc0
        Call Trace:
          0xc00000001ebffc0c (unreliable)
          irq_work_queue+0x40/0x70
          machine_check_queue_event+0xbc/0xd0
          machine_check_early_common+0x16c/0x1f4
      
      Fixes: 74c3354b ("powerpc/pseries/mce: restore msr before returning from handler")
      Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
      [mpe: Fix comment formatting, trim oops in change log for readability]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20210909064330.312432-1-ganeshgr@linux.ibm.com
      3a1e92d0