1. 03 Feb, 2022 1 commit
    • x86/Xen: streamline (and fix) PV CPU enumeration · e25a8d95
      Committed by Jan Beulich
      This started out with me noticing that "dom0_max_vcpus=<N>" with <N>
      larger than the number of physical CPUs reported through ACPI tables
      would not bring up the "excess" vCPU-s. Addressing this is the primary
      purpose of the change; CPU maps handling is being tidied only as far as
      is necessary for the change here (with the effect of also avoiding the
      setting up of too much per-CPU infrastructure, i.e. for CPUs which can
      never come online).
      
      Noticing that xen_fill_possible_map() is called way too early, whereas
      xen_filter_cpu_maps() is called too late (after per-CPU areas were
      already set up), and further observing that each of the functions serves
      only one of Dom0 or DomU, it looked like it was better to simplify this.
      Use the .get_smp_config hook instead, uniformly for Dom0 and DomU.
      xen_fill_possible_map() can be dropped altogether, while
      xen_filter_cpu_maps() is re-purposed but not otherwise changed.
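
The difference between the two enumeration strategies can be sketched with a toy model. This is an illustrative Python sketch under stated assumptions, not the kernel code: `possible_from_acpi`, `possible_from_hypervisor`, `make_domain`, and the `vcpu_exists` callback are hypothetical names standing in for the ACPI-derived CPU count and for a per-vCPU query to the hypervisor.

```python
# Toy model of PV CPU enumeration (all names are hypothetical, for illustration only).

def possible_from_acpi(acpi_cpus, nr_cpu_ids):
    """Old behaviour as described above: the possible map is bounded by the
    number of physical CPUs reported through ACPI tables."""
    return set(range(min(acpi_cpus, nr_cpu_ids)))


def possible_from_hypervisor(vcpu_exists, nr_cpu_ids):
    """Assumed new behaviour: mark a CPU possible only if the hypervisor
    reports that the corresponding vCPU exists for this domain."""
    return {cpu for cpu in range(nr_cpu_ids) if vcpu_exists(cpu)}


def make_domain(max_vcpus):
    """Return a stand-in for the hypervisor query: vCPUs 0..max_vcpus-1 exist."""
    return lambda cpu: cpu < max_vcpus


# Assumed host: 4 physical CPUs in ACPI, kernel built for up to 8 CPUs.
excess = possible_from_hypervisor(make_domain(6), nr_cpu_ids=8)  # dom0_max_vcpus=6
fewer = possible_from_hypervisor(make_domain(2), nr_cpu_ids=8)   # dom0_max_vcpus=2
legacy = possible_from_acpi(acpi_cpus=4, nr_cpu_ids=8)
```

In this model the ACPI-bounded map stops at CPU 3, so with six vCPUs CPUs 4 and 5 would never be brought up, and with two vCPUs it still contains CPUs 2 and 3, which can never come online.
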
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: https://lore.kernel.org/r/2dbd5f0a-9859-ca2d-085e-a02f7166c610@suse.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
  2. 02 Nov, 2021 2 commits
  3. 05 Oct, 2021 4 commits
  4. 21 Sep, 2021 1 commit
    • xen/x86: fix PV trap handling on secondary processors · 0594c581
      Committed by Jan Beulich
      The initial observation was that in PV mode under Xen, 32-bit user space
      didn't work anymore. Attempts at system calls ended in #GP(0x402). All
      of a sudden the vector 0x80 handler was not in place anymore. As it
      turns out, up to 5.13 redundant initialization did occur: Once from
      cpu_initialize_context() (through its VCPUOP_initialise hypercall) and a
      2nd time while each CPU was brought fully up. This 2nd initialization is
      now gone, uncovering that the 1st one was flawed: Unlike for the
      set_trap_table hypercall, a full virtual IDT needs to be specified here;
      the "vector" fields of the individual entries are of no interest. With
      many (kernel) IDT entries still(?) (i.e. at that point at least) empty,
      the syscall vector 0x80 ended up in slot 0x20 of the virtual IDT, thus
      becoming the domain's handler for vector 0x20.
      
      Make xen_convert_trap_info() fit for either purpose, leveraging the fact
      that on the xen_copy_trap_info() path the table starts out zero-filled.
      This includes moving out the writing of the sentinel, which would also
      have led to a buffer overrun in the xen_copy_trap_info() case if all
      (kernel) IDT entries were populated. Convert the writing of the sentinel
      to clearing of the entire table entry rather than just the address
      field.
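
The slot mix-up and the fix described above can be illustrated with a toy model. This is a minimal Python sketch, not the kernel's C code: `convert_compact`, `convert_full`, and `handler_by_position` are hypothetical names, a trap entry is reduced to a (vector, address) pair, and the populated vectors and addresses are made up so that the example reproduces the "vector 0x80 in slot 0x20" situation.

```python
# Toy model of the two trap-table layouts (hypothetical names, not kernel code).
NUM_VECTORS = 256


def convert_compact(idt):
    """Pack populated entries back to back; each keeps its vector number and
    an all-zero sentinel entry terminates the table (set_trap_table style)."""
    table = [(vec, addr) for vec, addr in enumerate(idt) if addr]
    table.append((0, 0))
    return table


def convert_full(idt):
    """Keep each populated entry at the index of its vector in a table that
    starts out zero-filled (the layout a full virtual IDT requires)."""
    table = [(0, 0)] * NUM_VECTORS
    for vec, addr in enumerate(idt):
        if addr:
            table[vec] = (vec, addr)
    return table


def handler_by_position(table, vector):
    """Look up a handler by slot index only, ignoring the vector field."""
    return table[vector][1] if vector < len(table) else 0


# Made-up IDT: vectors 0x00-0x1f and 0x80 populated, everything else empty.
idt = [0] * NUM_VECTORS
for vec in range(0x20):
    idt[vec] = 0x1000 + vec
idt[0x80] = 0x2000  # stand-in address for the int 0x80 system call handler

compact = convert_compact(idt)
full = convert_full(idt)
```

Read positionally, the compact table hands the 0x80 handler to vector 0x20 and leaves vector 0x80 without a handler, whereas the full table keeps it at index 0x80.
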
      
      (I didn't bother trying to identify the commit which uncovered the issue
      in 5.14; the commit named below is the one which actually introduced the
      bad code.)
      
      Fixes: f87e4cac ("xen: SMP guest support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: https://lore.kernel.org/r/7a266932-092e-b68f-f2bb-1473b61adc6e@suse.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
  5. 17 Sep, 2021 3 commits
  6. 15 Sep, 2021 1 commit
    • xen: reset legacy rtc flag for PV domU · f68aa100
      Committed by Juergen Gross
      A Xen PV guest doesn't have a legacy RTC device, so reset the legacy
      RTC flag. Otherwise the following WARN splat will occur at boot:
      
      [    1.333404] WARNING: CPU: 1 PID: 1 at /home/gross/linux/head/drivers/rtc/rtc-mc146818-lib.c:25 mc146818_get_time+0x1be/0x210
      [    1.333404] Modules linked in:
      [    1.333404] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-default+ #282
      [    1.333404] RIP: e030:mc146818_get_time+0x1be/0x210
      [    1.333404] Code: c0 64 01 c5 83 fd 45 89 6b 14 7f 06 83 c5 64 89 6b 14 41 83 ec 01 b8 02 00 00 00 44 89 63 10 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 48 c7 c7 30 0e ef 82 4c 89 e6 e8 71 2a 24 00 48 c7 c0 ff ff
      [    1.333404] RSP: e02b:ffffc90040093df8 EFLAGS: 00010002
      [    1.333404] RAX: 00000000000000ff RBX: ffffc90040093e34 RCX: 0000000000000000
      [    1.333404] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000000000d
      [    1.333404] RBP: ffffffff82ef0e30 R08: ffff888005013e60 R09: 0000000000000000
      [    1.333404] R10: ffffffff82373e9b R11: 0000000000033080 R12: 0000000000000200
      [    1.333404] R13: 0000000000000000 R14: 0000000000000002 R15: ffffffff82cdc6d4
      [    1.333404] FS:  0000000000000000(0000) GS:ffff88807d440000(0000) knlGS:0000000000000000
      [    1.333404] CS:  10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    1.333404] CR2: 0000000000000000 CR3: 000000000260a000 CR4: 0000000000050660
      [    1.333404] Call Trace:
      [    1.333404]  ? wakeup_sources_sysfs_init+0x30/0x30
      [    1.333404]  ? rdinit_setup+0x2b/0x2b
      [    1.333404]  early_resume_init+0x23/0xa4
      [    1.333404]  ? cn_proc_init+0x36/0x36
      [    1.333404]  do_one_initcall+0x3e/0x200
      [    1.333404]  kernel_init_freeable+0x232/0x28e
      [    1.333404]  ? rest_init+0xd0/0xd0
      [    1.333404]  kernel_init+0x16/0x120
      [    1.333404]  ret_from_fork+0x1f/0x30
      
      Cc: <stable@vger.kernel.org>
      Fixes: 8d152e7a ("x86/rtc: Replace paravirt rtc check with platform legacy quirk")
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: https://lore.kernel.org/r/20210903084937.19392-3-jgross@suse.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
  7. 30 Aug, 2021 1 commit
  8. 22 Jun, 2021 1 commit
  9. 21 May, 2021 1 commit
  10. 12 Mar, 2021 2 commits
  11. 08 Mar, 2021 1 commit
    • x86/stackprotector/32: Make the canary into a regular percpu variable · 3fb0fdb3
      Committed by Andy Lutomirski
      On 32-bit kernels, the stackprotector canary is quite nasty -- it is
      stored at %gs:(20), which is nasty because 32-bit kernels use %fs for
      percpu storage.  It's even nastier because it means that whether %gs
      contains userspace state or kernel state while running kernel code
      depends on whether stackprotector is enabled (this is
      CONFIG_X86_32_LAZY_GS), and this setting radically changes the way
      that segment selectors work.  Supporting both variants is a
      maintenance and testing mess.
      
      Merely rearranging so that percpu and the stack canary
      share the same segment would be messy as the 32-bit percpu address
      layout isn't currently compatible with putting a variable at a fixed
      offset.
      
      Fortunately, GCC 8.1 added options that allow the stack canary to be
      accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary
      percpu variable.  This lets us get rid of all of the code to manage the
      stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess.
      
      (That name is special.  We could use any symbol we want for the
       %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any
       name other than __stack_chk_guard.)
      
      Forcibly disable stackprotector on older compilers that don't support
      the new options and turn the stack canary into a percpu variable. The
      "lazy GS" approach is now used for all 32-bit configurations.
      
      Also makes load_gs_index() work on 32-bit kernels. On 64-bit kernels,
      it loads the GS selector and updates the user GSBASE accordingly. (This
      is unchanged.) On 32-bit kernels, it loads the GS selector and updates
      GSBASE, which is now always the user base. This means that the overall
      effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.
      
       [ bp: Massage commit message. ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
  12. 10 Feb, 2021 5 commits
  13. 27 Jan, 2021 1 commit
    • x86/xen: avoid warning in Xen pv guest with CONFIG_AMD_MEM_ENCRYPT enabled · 2e924936
      Committed by Juergen Gross
      When booting a kernel which has been built with CONFIG_AMD_MEM_ENCRYPT
      enabled as a Xen pv guest a warning is issued for each processor:
      
      [    5.964347] ------------[ cut here ]------------
      [    5.968314] WARNING: CPU: 0 PID: 1 at /home/gross/linux/head/arch/x86/xen/enlighten_pv.c:660 get_trap_addr+0x59/0x90
      [    5.972321] Modules linked in:
      [    5.976313] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.11.0-rc5-default #75
      [    5.980313] Hardware name: Dell Inc. OptiPlex 9020/0PC5F7, BIOS A05 12/05/2013
      [    5.984313] RIP: e030:get_trap_addr+0x59/0x90
      [    5.988313] Code: 42 10 83 f0 01 85 f6 74 04 84 c0 75 1d b8 01 00 00 00 c3 48 3d 00 80 83 82 72 08 48 3d 20 81 83 82 72 0c b8 01 00 00 00 eb db <0f> 0b 31 c0 c3 48 2d 00 80 83 82 48 ba 72 1c c7 71 1c c7 71 1c 48
      [    5.992313] RSP: e02b:ffffc90040033d38 EFLAGS: 00010202
      [    5.996313] RAX: 0000000000000001 RBX: ffffffff82a141d0 RCX: ffffffff8222ec38
      [    6.000312] RDX: ffffffff8222ec38 RSI: 0000000000000005 RDI: ffffc90040033d40
      [    6.004313] RBP: ffff8881003984a0 R08: 0000000000000007 R09: ffff888100398000
      [    6.008312] R10: 0000000000000007 R11: ffffc90040246000 R12: ffff8884082182a8
      [    6.012313] R13: 0000000000000100 R14: 000000000000001d R15: ffff8881003982d0
      [    6.016316] FS:  0000000000000000(0000) GS:ffff888408200000(0000) knlGS:0000000000000000
      [    6.020313] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    6.024313] CR2: ffffc900020ef000 CR3: 000000000220a000 CR4: 0000000000050660
      [    6.028314] Call Trace:
      [    6.032313]  cvt_gate_to_trap.part.7+0x3f/0x90
      [    6.036313]  ? asm_exc_double_fault+0x30/0x30
      [    6.040313]  xen_convert_trap_info+0x87/0xd0
      [    6.044313]  xen_pv_cpu_up+0x17a/0x450
      [    6.048313]  bringup_cpu+0x2b/0xc0
      [    6.052313]  ? cpus_read_trylock+0x50/0x50
      [    6.056313]  cpuhp_invoke_callback+0x80/0x4c0
      [    6.060313]  _cpu_up+0xa7/0x140
      [    6.064313]  cpu_up+0x98/0xd0
      [    6.068313]  bringup_nonboot_cpus+0x4f/0x60
      [    6.072313]  smp_init+0x26/0x79
      [    6.076313]  kernel_init_freeable+0x103/0x258
      [    6.080313]  ? rest_init+0xd0/0xd0
      [    6.084313]  kernel_init+0xa/0x110
      [    6.088313]  ret_from_fork+0x1f/0x30
      [    6.092313] ---[ end trace be9ecf17dceeb4f3 ]---
      
      The reason is that there is no Xen pv trap entry for X86_TRAP_VC.
      
      Fix that by adding a generic trap handler for unknown traps and wiring
      all unknown bare metal handlers to this generic handler, which will just
      crash the system should such a trap ever occur.
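
The fallback idea can be sketched as a lookup with a default. This is a toy Python model under stated assumptions, not the code in arch/x86/xen/enlighten_pv.c: `PV_HANDLERS`, `get_pv_handler`, `pv_page_fault`, `pv_unknown_trap`, and `UnknownPvTrap` are hypothetical names, and the handler-name strings are used only as illustrative keys.

```python
# Toy model of mapping bare-metal trap handlers to Xen PV ones.
# All names are illustrative; this is not a copy of the kernel's table.

class UnknownPvTrap(RuntimeError):
    """Stands in for crashing the system on a trap Xen PV cannot deliver."""


def pv_page_fault():
    return "page fault handled"


def pv_unknown_trap():
    # Generic handler: reaching it means a trap occurred that has no PV entry.
    raise UnknownPvTrap("unexpected trap in a Xen PV guest")


# Known bare-metal handlers and their PV counterparts.
PV_HANDLERS = {
    "asm_exc_page_fault": pv_page_fault,
}


def get_pv_handler(native_name):
    """Return the PV handler, falling back to the generic one instead of warning."""
    return PV_HANDLERS.get(native_name, pv_unknown_trap)
```

With such a default in place, converting the trap table no longer needs to warn about a handler it does not know; the failure is deferred to the (unexpected) moment the trap actually fires.
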
      
      Fixes: 0786138c ("x86/sev-es: Add a Runtime #VC Exception Handler")
      Cc: <stable@vger.kernel.org> # v5.10
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
  14. 14 Oct, 2020 1 commit
    • x86/numa: cleanup configuration dependent command-line options · 2dd57d34
      Committed by Dan Williams
      Patch series "device-dax: Support sub-dividing soft-reserved ranges", v5.
      
      The device-dax facility allows an address range to be directly mapped
      through a chardev, or optionally hotplugged to the core kernel page
      allocator as System-RAM.  It is the mechanism for converting persistent
      memory (pmem) to be used as another volatile memory pool i.e.  the current
      Memory Tiering hot topic on linux-mm.
      
      In the case of pmem the nvdimm-namespace-label mechanism can sub-divide
      it, but that labeling mechanism is not available / applicable to
      soft-reserved ("EFI specific purpose") memory [3].  This series provides a
      sysfs-mechanism for the daxctl utility to enable provisioning of
      volatile-soft-reserved memory ranges.
      
      The motivations for this facility are:
      
      1/ Allow performance differentiated memory ranges to be split between
         kernel-managed and directly-accessed use cases.
      
      2/ Allow physical memory to be provisioned along performance relevant
         address boundaries. For example, divide a memory-side cache [4] along
         cache-color boundaries.
      
      3/ Parcel out soft-reserved memory to VMs using device-dax as a security
         / permissions boundary [5]. Specifically I have seen people (ab)using
         memmap=nn!ss (mark System-RAM as Persistent Memory) just to get the
         device-dax interface on custom address ranges. A follow-on for the VM
         use case is to teach device-dax to dynamically allocate 'struct page' at
         runtime to reduce the duplication of 'struct page' space in both the
         guest and the host kernel for the same physical pages.
      
      [2]: http://lore.kernel.org/r/20200713160837.13774-11-joao.m.martins@oracle.com
      [3]: http://lore.kernel.org/r/157309097008.1579826.12818463304589384434.stgit@dwillia2-desk3.amr.corp.intel.com
      [4]: http://lore.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
      [5]: http://lore.kernel.org/r/20200110190313.17144-1-joao.m.martins@oracle.com
      
      This patch (of 23):
      
      In preparation for adding a new numa= option clean up the existing ones to
      avoid ifdefs in numa_setup(), and provide feedback when the numa=fake=
      option is invalid due to kernel config.  The same does not need
      to be done for numa=noacpi, since the capability is already hard disabled
      at compile-time.
      Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brice Goglin <Brice.Goglin@inria.fr>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Daniel Vetter <daniel@ffwll.ch>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joao Martins <joao.m.martins@oracle.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Hulk Robot <hulkci@huawei.com>
      Cc: Jason Yan <yanaijie@huawei.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: kernel test robot <lkp@intel.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Link: https://lkml.kernel.org/r/160106109960.30709.7379926726669669398.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/159643094279.4062302.17779410714418721328.stgit@dwillia2-desk3.amr.corp.intel.com
      Link: https://lkml.kernel.org/r/159643094925.4062302.14979872973043772305.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 05 Oct, 2020 1 commit
    • x86/xen: disable Firmware First mode for correctable memory errors · d759af38
      Committed by Juergen Gross
      When running as Xen dom0 the kernel isn't responsible for selecting the
      error handling mode; this should be handled by the hypervisor.
      
      So disable setting FF mode when running as Xen pv guest. Not doing so
      might result in boot splats like:
      
      [    7.509696] HEST: Enabling Firmware First mode for corrected errors.
      [    7.510382] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 2.
      [    7.510383] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 3.
      [    7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 4.
      [    7.510384] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 5.
      [    7.510385] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 6.
      [    7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 7.
      [    7.510386] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 8.
      
      The reason is that the HEST ACPI table contains the real number of MCA
      banks, while the hypervisor is emulating only 2 banks for guests.
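
The mismatch behind those messages comes down to a simple range check. The following is a minimal Python sketch, not the kernel's MCE code: `rejected_banks` is a hypothetical helper, and the assumption that HEST lists banks 0 through 8 is chosen only to match the log excerpt above.

```python
def rejected_banks(requested_banks, available_banks):
    """Return the bank numbers that fall outside the banks the guest can see."""
    return [bank for bank in requested_banks if bank >= available_banks]


# Assumption: HEST lists banks 0..8; the hypervisor emulates 2 banks (0 and 1).
hest_banks = range(9)
invalid = rejected_banks(hest_banks, available_banks=2)
```

Under these assumptions `invalid` holds banks 2 through 8, one per "invalid MCA bank" line in the splat.
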
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Link: https://lore.kernel.org/r/20200925140751.31381-1-jgross@suse.com
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
  16. 10 Sep, 2020 1 commit
  17. 15 Aug, 2020 1 commit
  18. 11 Aug, 2020 2 commits
  19. 27 Jul, 2020 1 commit
  20. 18 Jul, 2020 1 commit
  21. 05 Jul, 2020 1 commit
  22. 18 Jun, 2020 1 commit
  23. 11 Jun, 2020 6 commits