1. 14 Jan 2020 (3 commits)
  2. 05 Jan 2020 (1 commit)
    • mm/memory_hotplug: shrink zones when offlining memory · feee6b29
      Committed by David Hildenbrand
      We currently try to shrink a single zone when removing memory.  We use
      the zone of the first page of the memory we are removing.  If that
      memmap was never initialized (e.g., memory was never onlined), we will
      read garbage and can trigger kernel BUGs (due to a stale pointer):
      
          BUG: unable to handle page fault for address: 000000000000353d
          #PF: supervisor write access in kernel mode
          #PF: error_code(0x0002) - not-present page
          PGD 0 P4D 0
          Oops: 0002 [#1] SMP PTI
          CPU: 1 PID: 7 Comm: kworker/u8:0 Not tainted 5.3.0-rc5-next-20190820+ #317
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.4
          Workqueue: kacpi_hotplug acpi_hotplug_work_fn
          RIP: 0010:clear_zone_contiguous+0x5/0x10
          Code: 48 89 c6 48 89 c3 e8 2a fe ff ff 48 85 c0 75 cf 5b 5d c3 c6 85 fd 05 00 00 01 5b 5d c3 0f 1f 840
          RSP: 0018:ffffad2400043c98 EFLAGS: 00010246
          RAX: 0000000000000000 RBX: 0000000200000000 RCX: 0000000000000000
          RDX: 0000000000200000 RSI: 0000000000140000 RDI: 0000000000002f40
          RBP: 0000000140000000 R08: 0000000000000000 R09: 0000000000000001
          R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000140000
          R13: 0000000000140000 R14: 0000000000002f40 R15: ffff9e3e7aff3680
          FS:  0000000000000000(0000) GS:ffff9e3e7bb00000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 000000000000353d CR3: 0000000058610000 CR4: 00000000000006e0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
          Call Trace:
           __remove_pages+0x4b/0x640
           arch_remove_memory+0x63/0x8d
           try_remove_memory+0xdb/0x130
           __remove_memory+0xa/0x11
           acpi_memory_device_remove+0x70/0x100
           acpi_bus_trim+0x55/0x90
           acpi_device_hotplug+0x227/0x3a0
           acpi_hotplug_work_fn+0x1a/0x30
           process_one_work+0x221/0x550
           worker_thread+0x50/0x3b0
           kthread+0x105/0x140
           ret_from_fork+0x3a/0x50
          Modules linked in:
          CR2: 000000000000353d
      
      Instead, shrink the zones when offlining memory or when onlining failed.
      Introduce and use remove_pfn_range_from_zone() for that.  We now
      properly shrink the zones, even if we have DIMMs whereby
      
       - Some memory blocks fall into no zone (never onlined)
      
       - Some memory blocks fall into multiple zones (offlined+re-onlined)
      
       - Multiple memory blocks that fall into different zones
      
      Drop the zone parameter (with a potential dubious value) from
      __remove_pages() and __remove_section().
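
      A minimal sketch of the new call (the helper name and signature are
      introduced by this patch; the call site shown is illustrative, not
      the exact diff):

        /* offline / online-failure path: shrink the zone right away,
         * while the zone these pages were associated with is still known */
        remove_pfn_range_from_zone(zone, start_pfn, nr_pages);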
      
      Link: http://lkml.kernel.org/r/20191006085646.5768-6-david@redhat.com
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online")	[visible after d0dc12e8]
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: <stable@vger.kernel.org>	[5.0+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 19 Dec 2019 (2 commits)
  4. 17 Dec 2019 (6 commits)
    • perf/x86/intel: Fix PT PMI handling · 92ca7da4
      Committed by Alexander Shishkin
      Commit:
      
        ccbebba4 ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it")
      
      skips the PT/LBR exclusivity check on CPUs where PT and LBRs coexist, but
      also inadvertently skips the active_events bump for PT in that case, which
      is a bug. If there aren't any hardware events at the same time as PT, the
      PMI handler will ignore PT PMIs, as active_events reads zero in that case,
      resulting in the "Uhhuh" spurious NMI warning and PT data loss.
      
      Fix this by always increasing active_events for PT events.
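
      A hedged sketch of the idea in x86_add_exclusive() (structure
      approximate, not the exact diff):

        if (x86_pmu.lbr_pt_coexist && what == x86_lbr_exclusive_pt)
                goto out;       /* skip only the exclusivity check */

        /* ... PT vs. LBR mutual-exclusion check ... */

        out:
                atomic_inc(&active_events);     /* PT events are counted, too */
                return 0;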
      
      Fixes: ccbebba4 ("perf/x86/intel/pt: Bypass PT vs. LBR exclusivity if the core supports it")
      Reported-by: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lkml.kernel.org/r/20191210105101.77210-1-alexander.shishkin@linux.intel.com
    • perf/x86/intel/bts: Fix the use of page_private() · ff61541c
      Committed by Alexander Shishkin
      Commit
      
        8062382c ("perf/x86/intel/bts: Add BTS PMU driver")
      
      brought in a warning with the BTS buffer initialization that is
      easily tripped (assuming KPTI is disabled), instantly throwing:
      
      > ------------[ cut here ]------------
      > WARNING: CPU: 2 PID: 326 at arch/x86/events/intel/bts.c:86 bts_buffer_setup_aux+0x117/0x3d0
      > Modules linked in:
      > CPU: 2 PID: 326 Comm: perf Not tainted 5.4.0-rc8-00291-gceb9e773 #904
      > RIP: 0010:bts_buffer_setup_aux+0x117/0x3d0
      > Call Trace:
      >  rb_alloc_aux+0x339/0x550
      >  perf_mmap+0x607/0xc70
      >  mmap_region+0x76b/0xbd0
      ...
      
      It appears to assume (for lost raisins) that PagePrivate() is set,
      while later it actually tests for PagePrivate() before using
      page_private().
      
      Make it consistent and always check PagePrivate() before using
      page_private().
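
      A sketch of the consistent pattern (reconstruction; the real helper
      lives in arch/x86/events/intel/bts.c):

        static unsigned int buf_nr_pages(struct page *page)
        {
                if (!PagePrivate(page))
                        return 1;               /* order-0 page */

                return 1 << page_private(page); /* high-order allocation */
        }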
      
      Fixes: 8062382c ("perf/x86/intel/bts: Add BTS PMU driver")
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Link: https://lkml.kernel.org/r/20191205142853.28894-2-alexander.shishkin@linux.intel.com
    • perf/x86: Fix potential out-of-bounds access · 1e69a0ef
      Committed by Peter Zijlstra
      UBSAN reported out-of-bounds accesses for x86_pmu.event_map(); its
      arguments should be < x86_pmu.max_events. Make sure all users observe
      this constraint.
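
      A sketch of the check each caller needs (illustrative; pairing it
      with an array_index_nospec() clamp against speculation is assumed
      here):

        if (config >= x86_pmu.max_events)
                return -EINVAL;

        config = array_index_nospec(config, x86_pmu.max_events);
        return x86_pmu.event_map(config);
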
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Meelis Roos <mroos@linux.ee>
    • x86/mce: Fix possibly incorrect severity calculation on AMD · a3a57dda
      Committed by Jan H. Schönherr
      The function mce_severity_amd_smca() requires m->bank to be initialized
      for correct operation. Fix the one case where mce_severity() is called
      without doing so.
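
      A sketch of the fix in the bank scan loop (the mce_severity()
      argument list here is approximate):

        m->bank = i;    /* mce_severity_amd_smca() reads m->bank */
        severity = mce_severity(m, mca_cfg.tolerant, NULL, true);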
      
      Fixes: 6bda529e ("x86/mce: Grade uncorrected errors for SMCA-enabled systems")
      Fixes: d28af26f ("x86/MCE: Initialize mce.bank in the case of a fatal error in mce_no_way_out()")
      Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yazen Ghannam <Yazen.Ghannam@amd.com>
      Link: https://lkml.kernel.org/r/20191210000733.17979-4-jschoenh@amazon.de
    • x86/MCE/AMD: Allow Reserved types to be overwritten in smca_banks[] · 966af209
      Committed by Yazen Ghannam
      Each logical CPU in Scalable MCA systems controls a unique set of MCA
      banks in the system. These banks are not shared between CPUs. The bank
      types and ordering will be the same across CPUs on currently available
      systems.
      
      However, some CPUs may see a bank as Reserved/Read-as-Zero (RAZ) while
      other CPUs do not. In this case, the bank seen as Reserved on one CPU is
      assumed to be the same type as the bank seen as a known type on another
      CPU.
      
      In general, this occurs when the hardware represented by the MCA bank
      is disabled, e.g. disabled memory controllers on certain models, etc.
      The MCA bank is disabled in the hardware, so there is no possibility of
      getting an MCA/MCE from it even if it is assumed to have a known type.
      
      For example:
      
      Full system:
      	Bank  |  Type seen on CPU0  |  Type seen on CPU1
      	------------------------------------------------
      	 0    |         LS          |          LS
      	 1    |         UMC         |          UMC
      	 2    |         CS          |          CS
      
      System with hardware disabled:
      	Bank  |  Type seen on CPU0  |  Type seen on CPU1
      	------------------------------------------------
      	 0    |         LS          |          LS
      	 1    |         UMC         |          RAZ
      	 2    |         CS          |          CS
      
      For this reason, there is a single, global struct smca_banks[] that is
      initialized at boot time. This array is initialized on each CPU as it
      comes online. However, the array will not be updated if an entry already
      exists.
      
      This works as expected when the first CPU (usually CPU0) has all
      possible MCA banks enabled. But if the first CPU has a subset, then it
      will save a "Reserved" type in smca_banks[]. Successive CPUs will then
      not be able to update smca_banks[] even if they encounter a known bank
      type.
      
      This may result in unexpected behavior. Depending on the system
      configuration, a user may observe issues enumerating the MCA
      thresholding sysfs interface. The issues may be as trivial as sysfs
      entries not being available, or as severe as system hangs.
      
      For example:
      
      	Bank  |  Type seen on CPU0  |  Type seen on CPU1
      	------------------------------------------------
      	 0    |         LS          |          LS
      	 1    |         RAZ         |          UMC
      	 2    |         CS          |          CS
      
      Extend the smca_banks[] entry check to return if the entry is a
      non-reserved type. Otherwise, continue so that CPUs that encounter a
      known bank type can update smca_banks[].
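
      A hedged sketch of the extended check in smca_configure():

        /* Return early only if this bank is already set to a known type. */
        if (smca_banks[bank].hwid &&
            smca_banks[bank].hwid->bank_type != SMCA_RESERVED)
                return;

        /* Otherwise continue, so a Reserved entry can be overwritten. */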
      
      Fixes: 68627a69 ("x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type")
      Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20191121141508.141273-1-Yazen.Ghannam@amd.com
    • x86/MCE/AMD: Do not use rdmsr_safe_on_cpu() in smca_configure() · 246ff09f
      Committed by Konstantin Khlebnikov
      ... because interrupts are disabled that early and sending IPIs can
      deadlock:
      
        BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
        in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
        no locks held by swapper/1/0.
        irq event stamp: 0
        hardirqs last  enabled at (0): [<0000000000000000>] 0x0
        hardirqs last disabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
        softirqs last  enabled at (0): [<ffffffff8106dda9>] copy_process+0x8b9/0x1ca0
        softirqs last disabled at (0): [<0000000000000000>] 0x0
        Preemption disabled at:
        [<ffffffff8104703b>] start_secondary+0x3b/0x190
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.5.0-rc2+ #1
        Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018
        Call Trace:
         dump_stack
         ___might_sleep.cold.92
         wait_for_completion
         ? generic_exec_single
         rdmsr_safe_on_cpu
         ? wrmsr_on_cpus
         mce_amd_feature_init
         mcheck_cpu_init
         identify_cpu
         identify_secondary_cpu
         smp_store_cpu_info
         start_secondary
         secondary_startup_64
      
      The function smca_configure() is called only on the current CPU anyway,
      therefore replace rdmsr_safe_on_cpu() with atomic rdmsr_safe() and avoid
      the IPI.
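
      A sketch of the replacement (a plain local-CPU read, no IPI):

        /* runs on the CPU that owns the bank, safe with IRQs disabled */
        if (rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(bank), &low, &high))
                return;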
      
       [ bp: Update commit message. ]
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: linux-edac <linux-edac@vger.kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/157252708836.3876.4604398213417262402.stgit@buzz
  5. 14 Dec 2019 (1 commit)
  6. 11 Dec 2019 (1 commit)
  7. 10 Dec 2019 (1 commit)
  8. 05 Dec 2019 (2 commits)
    • arch: sembuf.h: make uapi asm/sembuf.h self-contained · 0fb9dc28
      Committed by Masahiro Yamada
      Userspace cannot compile <asm/sembuf.h> due to some missing type
      definitions.  For example, building it for x86 fails as follows:
      
          CC      usr/include/asm/sembuf.h.s
        In file included from <command-line>:32:0:
        usr/include/asm/sembuf.h:17:20: error: field `sem_perm' has incomplete type
          struct ipc64_perm sem_perm; /* permissions .. see ipc.h */
                            ^~~~~~~~
        usr/include/asm/sembuf.h:24:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t sem_otime; /* last semop time */
          ^~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:25:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused1;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:26:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t sem_ctime; /* last change time */
          ^~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:27:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused2;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:29:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t sem_nsems; /* no. of semaphores in array */
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:30:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused3;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:31:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused4;
          ^~~~~~~~~~~~~~~~
      
      It is just a matter of a missing include directive.
      
      Include <asm/ipcbuf.h> to make it self-contained, and add it to
      the compile-test coverage.
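
      The gist of the fix, sketched as a diff against the x86 uapi header:

        --- a/arch/x86/include/uapi/asm/sembuf.h
        +++ b/arch/x86/include/uapi/asm/sembuf.h
        @@
         #ifndef _ASM_X86_SEMBUF_H
         #define _ASM_X86_SEMBUF_H

        +#include <asm/ipcbuf.h>  /* struct ipc64_perm and __kernel_* types */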
      
      Link: http://lkml.kernel.org/r/20191030063855.9989-3-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch: msgbuf.h: make uapi asm/msgbuf.h self-contained · 9ef0e004
      Committed by Masahiro Yamada
      Userspace cannot compile <asm/msgbuf.h> due to some missing type
      definitions.  For example, building it for x86 fails as follows:
      
          CC      usr/include/asm/msgbuf.h.s
        In file included from usr/include/asm/msgbuf.h:6:0,
                         from <command-line>:32:
        usr/include/asm-generic/msgbuf.h:25:20: error: field `msg_perm' has incomplete type
          struct ipc64_perm msg_perm;
                            ^~~~~~~~
        usr/include/asm-generic/msgbuf.h:27:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_stime; /* last msgsnd time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:28:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_rtime; /* last msgrcv time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:29:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_ctime; /* last change time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:41:2: error: unknown type name `__kernel_pid_t'
          __kernel_pid_t msg_lspid; /* pid of last msgsnd */
          ^~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:42:2: error: unknown type name `__kernel_pid_t'
          __kernel_pid_t msg_lrpid; /* last receive pid */
          ^~~~~~~~~~~~~~
      
      It is just a matter of a missing include directive.
      
      Include <asm/ipcbuf.h> to make it self-contained, and add it to
      the compile-test coverage.
      
      Link: http://lkml.kernel.org/r/20191030063855.9989-2-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 04 Dec 2019 (3 commits)
    • kvm: vmx: Stop wasting a page for guest_msrs · 7d73710d
      Committed by Jim Mattson
      We will never need more guest_msrs than there are indices in
      vmx_msr_index. Thus, at present, the guest_msrs array will not exceed
      168 bytes.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332) · 433f4ba1
      Committed by Paolo Bonzini
      The bounds check was present in KVM_GET_SUPPORTED_CPUID but not
      KVM_GET_EMULATED_CPUID.
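
      A sketch of the missing bound check, mirroring what the
      KVM_GET_SUPPORTED_CPUID path already does (local names abbreviated):

        if (*nent >= maxnent)
                return -E2BIG;  /* don't write past the caller's array */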
      
      Reported-by: syzbot+e3f4897236c4eeb8af4f@syzkaller.appspotmail.com
      Fixes: 84cffe49 ("kvm: Emulate MOVBE", 2013-10-29)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage · af164898
      Committed by Dave Young
      Michael Weiser reported that he got this error during a kexec rebooting:
      
        esrt: Unsupported ESRT version 2904149718861218184.
      
      The ESRT memory stays in EFI boot services data, and it was reserved
      in kernel via efi_mem_reserve().  The initial purpose of the reservation
      is to reuse the EFI boot services data across kexec reboot. For example
      the BGRT image data and some ESRT memory like Michael reported.
      
      But although the memory is reserved, it is not updated in the x86 E820
      table, and kexec_file_load() iterates over system RAM in the I/O resource
      list to find places for the kernel, initramfs and other data. In Michael's
      case the kexec-loaded initramfs overwrote the ESRT memory, and then the
      failure happened.
      
      Since kexec_file_load() depends on the E820 table being updated, just fix this
      by updating the reserved EFI boot services memory as reserved type in E820.
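
      A sketch of the update (both helpers are the standard x86 e820 API;
      addr/size stand for the reserved range):

        /* flip the reserved EFI boot services range to reserved in e820 */
        e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED);
        e820__update_table(e820_table);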
      
      Originally, any memory descriptors with the EFI_MEMORY_RUNTIME attribute
      were bypassed in the reservation code path because they were assumed to
      be reserved already.

      But the reservation is still needed for multiple kexec reboots, and that
      is the only case in which this code is reached, so just drop that code
      chunk; then everything works without side effects.

      On my machine the ESRT memory sits in an EFI runtime data range, so it
      does not trigger the problem, but I successfully tested with BGRT instead.
      Both kexec_load() and kexec_file_load() work, and kdump works as well.
      
      [ mingo: Edited the changelog. ]
      Reported-by: Michael Weiser <michael@weiser.dinsnail.net>
      Tested-by: Michael Weiser <michael@weiser.dinsnail.net>
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kexec@lists.infradead.org
      Cc: linux-efi@vger.kernel.org
      Link: https://lkml.kernel.org/r/20191204075233.GA10520@dhcp-128-65.nay.redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 02 Dec 2019 (2 commits)
    • x86/kasan: support KASAN_VMALLOC · 0609ae01
      Committed by Daniel Axtens
      In the case where KASAN directly allocates memory to back vmalloc space,
      don't map the early shadow page over it.
      
      We prepopulate pgds/p4ds for the range that would otherwise be empty.
      This is required to get it synced to hardware on boot, allowing the
      lower levels of the page tables to be filled dynamically.
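
      A hedged sketch of the prepopulation loop (helper names as in
      arch/x86/mm/kasan_init_64.c, simplified):

        pgd_t *pgd = pgd_offset_k(shadow_start);

        for (addr = shadow_start; addr < shadow_end;
             addr = pgd_addr_end(addr, shadow_end), pgd++) {
                if (pgd_none(*pgd))
                        pgd_populate(&init_mm, pgd,
                                     early_alloc(PAGE_SIZE, NUMA_NO_NODE, true));
        }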
      
      Link: http://lkml.kernel.org/r/20191031093909.9228-5-dja@axtens.net
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Acked-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86/mm/pat: Fix off-by-one bugs in interval tree search · 91298f1a
      Committed by Ingo Molnar
      There's a bug in the new PAT code: the conversion of memtype_check_conflict()
      is buggy:
      
         8d04a5f9: ("x86/mm/pat: Convert the PAT tree to a generic interval tree")
      
              dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
              found_type = match->type;
      
      -       node = rb_next(&match->rb);
      -       while (node) {
      -               match = rb_entry(node, struct memtype, rb);
      -
      -               if (match->start >= end) /* Checked all possible matches */
      -                       goto success;
      -
      -               if (is_node_overlap(match, start, end) &&
      -                   match->type != found_type) {
      +       match = memtype_interval_iter_next(match, start, end);
      +       while (match) {
      +               if (match->type != found_type)
                              goto failure;
      -               }
      
      -               node = rb_next(&match->rb);
      +               match = memtype_interval_iter_next(match, start, end);
              }
      
      Note how the '>= end' condition to end the interval check got converted
      into:
      
      +       match = memtype_interval_iter_next(match, start, end);
      
      This is subtly off by one, because the interval trees interfaces require
      closed interval parameters:
      
        include/linux/interval_tree_generic.h
      
       /*                                                                            \
        * Iterate over intervals intersecting [start;last]                           \
        *                                                                            \
        * Note that a node's interval intersects [start;last] iff:                   \
        *   Cond1: ITSTART(node) <= last                                             \
        * and                                                                        \
        *   Cond2: start <= ITLAST(node)                                             \
        */                                                                           \
      
        ...
      
                      if (ITSTART(node) <= last) {            /* Cond1 */           \
                              if (start <= ITLAST(node))      /* Cond2 */           \
                                      return node;    /* node is leftmost match */  \
      
      [start;last] is a closed interval (note that '<= last' check) - while the
      PAT 'end' parameter is 1 byte beyond the end of the range, because
      ioremap() and the other mapping APIs usually use the [start,end)
      half-open interval, derived from 'size'.
      
      This is what ioremap() does for example:
      
              /*
               * Mappings have to be page-aligned
               */
              offset = phys_addr & ~PAGE_MASK;
              phys_addr &= PHYSICAL_PAGE_MASK;
              size = PAGE_ALIGN(last_addr+1) - phys_addr;
      
              retval = reserve_memtype(phys_addr, (u64)phys_addr + size,
                                                      pcm, &new_pcm);
      
      phys_addr+size will be on a page boundary, after the last byte of the
      mapped interval.
      
      So the correct parameter to use in the interval tree searches is not
      'end' but 'end-1'.
      
      This could have relevance if conflicting PAT ranges are exactly adjacent,
      for example a future WC region is followed immediately by an already
      mapped UC- region - in this case memtype_check_conflict() would
      incorrectly deny the WC memtype region and downgrade the memtype to UC-.
      
      BTW., rather annoyingly this downgrading is done silently in
      memtype_check_insert():
      
      int memtype_check_insert(struct memtype *new,
                               enum page_cache_mode *ret_type)
      {
              int err = 0;
      
              err = memtype_check_conflict(new->start, new->end, new->type, ret_type);
              if (err)
                      return err;
      
              if (ret_type)
                      new->type = *ret_type;
      
              memtype_interval_insert(new, &memtype_rbroot);
              return 0;
      }
      
      So on such a conflict we'd just silently get UC- in *ret_type, and write
      it into the new region, never the wiser ...
      
      So assuming that the patch below fixes the primary bug the diagnostics
      side of ioremap() cache attribute downgrades would be another thing to
      fix.
      
      Anyway, I checked all the interval-tree iterations, and most of them are
      off by one - but I think the one related to memtype_check_conflict() is
      the one causing this particular performance regression.
      
      The only correct interval-tree searches were these two:
      
        arch/x86/mm/pat_interval.c:     match = memtype_interval_iter_first(&memtype_rbroot, 0, ULONG_MAX);
        arch/x86/mm/pat_interval.c:             match = memtype_interval_iter_next(match, 0, ULONG_MAX);
      
      The ULONG_MAX was hiding the off-by-one in plain sight. :-)
      
      Note that the bug was probably benign in the sense of implementing a too
      strict cache attribute conflict policy and downgrading cache attributes,
      so AFAICS the worst outcome of this bug would be a performance regression,
      not any instabilities.
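
      The shape of the fix, sketched as a diff (the interval tree wants a
      closed 'last' parameter):

        -  match = memtype_interval_iter_first(&memtype_rbroot, start, end);
        +  match = memtype_interval_iter_first(&memtype_rbroot, start, end - 1);
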
      Reported-by: kernel test robot <rong.a.chen@intel.com>
      Reported-by: Kenneth R. Crudup <kenny@panix.com>
      Reported-by: Mariusz Ceier <mceier+kernel@gmail.com>
      Tested-by: Mariusz Ceier <mceier@gmail.com>
      Tested-by: Kenneth R. Crudup <kenny@panix.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20191201144947.GA4167@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 01 Dec 2019 (1 commit)
  12. 29 Nov 2019 (3 commits)
  13. 28 Nov 2019 (1 commit)
    • x86/fpu: Don't cache access to fpu_fpregs_owner_ctx · 59c4bd85
      Committed by Sebastian Andrzej Siewior
      The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing
      to the context that is currently loaded. It never changed during the
      lifetime of a task - it remained stable/constant.
      
      After deferred FPU registers loading until return to userland was
      implemented, the content of fpu_fpregs_owner_ctx may change during
      preemption and must not be cached.
      
      This went unnoticed for some time and was now noticed, in particular
      since gcc 9 is caching that load in copy_fpstate_to_sigframe() and
      reusing it in the retry loop:
      
        copy_fpstate_to_sigframe()
          load fpu_fpregs_owner_ctx and save on stack
          fpregs_lock()
          copy_fpregs_to_sigframe() /* failed */
          fpregs_unlock()
               *** PREEMPTION, another uses FPU, changes fpu_fpregs_owner_ctx ***
      
          fault_in_pages_writeable() /* succeed, retry */
      
          fpregs_lock()
      	__fpregs_load_activate()
      	  fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
          copy_fpregs_to_sigframe() /* succeeds, random FPU content */
      
      This is a comparison of the assembly produced by gcc 9, without vs with this
      patch:
      
      | # arch/x86/kernel/fpu/signal.c:173:      if (!access_ok(buf, size))
      |        cmpq    %rdx, %rax      # tmp183, _4
      |        jb      .L190   #,
      |-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |-#APP
      |-# 512 "arch/x86/include/asm/fpu/internal.h" 1
      |-       movq %gs:fpu_fpregs_owner_ctx,%rax      #, pfo_ret__
      |-# 0 "" 2
      |-#NO_APP
      |-       movq    %rax, -88(%rbp) # pfo_ret__, %sfp
      …
      |-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |-       movq    -88(%rbp), %rcx # %sfp, pfo_ret__
      |-       cmpq    %rcx, -64(%rbp) # pfo_ret__, %sfp
      |+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |+#APP
      |+# 512 "arch/x86/include/asm/fpu/internal.h" 1
      |+       movq %gs:fpu_fpregs_owner_ctx(%rip),%rax        # fpu_fpregs_owner_ctx, pfo_ret__
      |+# 0 "" 2
      |+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |+#NO_APP
      |+       cmpq    %rax, -64(%rbp) # pfo_ret__, %sfp
      
      Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching of
      fpu_fpregs_owner_ctx during preemption points.
      
      The Fixes: tag points to the commit where deferred FPU loading was
      added. Since this commit, the compiler is no longer allowed to move the
      load of fpu_fpregs_owner_ctx somewhere else / outside of the locked
      section. A task preemption will change its value and stale content will
      be observed.
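
      At the C level the change is a single accessor, sketched from the
      internal.h line quoted in the assembly above:

        -  return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
        +  return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;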
      
       [ bp: Massage. ]
      Debugged-by: Austin Clements <austin@google.com>
      Debugged-by: David Chase <drchase@golang.org>
      Debugged-by: Ian Lance Taylor <ian@airs.com>
      Fixes: 5f409e20 ("x86/fpu: Defer FPU state load until return to userspace")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Tested-by: Borislav Petkov <bp@suse.de>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: Austin Clements <austin@google.com>
      Cc: Barret Rhoden <brho@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Chase <drchase@golang.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: ian@airs.com
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Bleecher Snyder <josharian@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20191128085306.hxfa2o3knqtu4wfn@linutronix.de
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663
  14. 27 Nov 2019 (13 commits)
    • KVM x86: Move kvm cpuid support out of svm · c1de0f25
      Committed by Peter Gonda
      Memory encryption support does not have module parameter dependencies
      and can be moved into the general x86 cpuid __do_cpuid_ent function.
      This change maintains the current behavior of passing through all of
      CPUID.8000001F.
      Suggested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/entry/32: Remove unused 'restore_all_notrace' local label · 3e1b4358
      Committed by Borislav Petkov
      Signed-off-by: Borislav Petkov <bp@alien8.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Implement immediate enforcement of /sys/devices/cpu/rdpmc value of 0 · 405b4537
      Committed by Anthony Steinhauser
      According to the documentation, when you successfully write 0 to
      /sys/devices/cpu/rdpmc, the RDPMC instruction should be disabled
      unconditionally and immediately (as soon as you close the sysfs file).

      Instead, in the current implementation the PMU must be reloaded, which
      happens only eventually, some time in the future. Only after that does
      the RDPMC instruction become disabled (on ring 3) on the respective core.

      This change makes the treatment of the 0 value as immediate and as
      unconditional as the current treatment of the 2 value, except that the
      CR4.PCE bit is naturally set to false instead of true.
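
      A hedged sketch of the immediate propagation in the sysfs write path
      (the static-key name is illustrative; refresh_pce updates CR4.PCE on
      the CPU it runs on):

        if (val == 0)
                static_branch_inc(&rdpmc_never_available_key);

        /* flip CR4.PCE on every CPU now, not at some future PMU reload */
        on_each_cpu(refresh_pce, NULL, 1);
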
      Signed-off-by: Anthony Steinhauser <asteinhauser@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: https://lkml.kernel.org/r/20191125054838.137615-1-asteinhauser@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • crypto: arch - conditionalize crypto api in arch glue for lib code · 8394bfec
      Committed by Jason A. Donenfeld
      For glue code that's used by Zinc, the actual Crypto API functions might
      not necessarily exist, and don't need to exist either. Before this
      patch, there are valid build configurations that lead to an unbuildable
      kernel. This fixes it to conditionalize those symbols on the existence
      of the proper config entry.
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • x86/ptrace: Document FSBASE and GSBASE ABI oddities · 56f2ab41
      Committed by Andy Lutomirski
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/ptrace: Remove set_segment_reg() implementations for current · 8e05f1b4
      Committed by Andy Lutomirski
      set_segment_reg() should be unreachable with task == current.
      Rather than confusingly trying to make it work, just explicitly
      disable this case.
      
      (regset->get is used for current in the coredump code, but the ->set
       interface is only used for ptrace, and you can't ptrace yourself.)
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/traps: die() instead of panicking on a double fault · 0337b7eb
      Committed by Andy Lutomirski
      A double fault has a decent chance of being recoverable by killing
      the offending thread.  Use die() so that we at least try to recover.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/doublefault/32: Rewrite the x86_32 #DF handler and unify with 64-bit · 7d8d8cfd
      Committed by Andy Lutomirski
      The old x86_32 doublefault_fn() was old and crufty, and it did not
      even try to recover.  do_double_fault() is much nicer.  Rewrite the
      32-bit double fault code to sanitize CPU state and call
      do_double_fault().  This is mostly an exercise in i386 archaeology.
      
      With this patch applied, 32-bit double faults get a real stack trace,
      just like 64-bit double faults.
      
      [ mingo: merged the patch to a later kernel base. ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area · dc4e0021
      Committed by Andy Lutomirski
      There are three problems with the current layout of the doublefault
      stack and TSS.  First, the TSS is only cacheline-aligned, which is
      not enough -- if the hardware portion of the TSS (struct x86_hw_tss)
      crosses a page boundary, horrible things happen [0].  Second, the
      stack and TSS are global, so simultaneous double faults on different
      CPUs will cause massive corruption.  Third, the whole mechanism
      won't work if user CR3 is loaded, resulting in a triple fault [1].
      
      Let the doublefault stack and TSS share a page (which prevents the
      TSS from spanning a page boundary), make it percpu, and move it into
      cpu_entry_area.  Teach the stack dump code about the doublefault
      stack.
      
      [0] Real hardware will read past the end of the page onto the next
          *physical* page if a task switch happens.  Virtual machines may
          have any number of bugs, and I would consider it reasonable for
          a VM to summarily kill the guest if it tries to task-switch to
          a page-spanning TSS.
      
      [1] Real hardware triple faults.  At least some VMs seem to hang.
          I'm not sure what's going on.
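
      A sketch of the shared page, roughly the layout this patch adds to
      cpu_entry_area:

        struct doublefault_stack {
                unsigned long stack[(PAGE_SIZE - sizeof(struct x86_hw_tss)) /
                                    sizeof(unsigned long)];
                struct x86_hw_tss tss;  /* ends on the same page as the stack */
        } __aligned(PAGE_SIZE);
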
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/doublefault/32: Rename doublefault.c to doublefault_32.c · e99b6f46
      Committed by Andy Lutomirski
      doublefault.c now only contains 32-bit code.  Rename it to
      doublefault_32.c.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/traps: Disentangle the 32-bit and 64-bit doublefault code · 93efbde2
      Committed by Andy Lutomirski
      The 64-bit doublefault handler is much nicer than the 32-bit one.
      As a first step toward unifying them, make the 64-bit handler
      self-contained.  This should have no functional effect
      except in the odd case of x86_64 with CONFIG_DOUBLEFAULT=n in which
      case it will change the logging a bit.
      
      This also gets rid of CONFIG_DOUBLEFAULT configurability on 64-bit
      kernels.  It didn't do anything useful -- CONFIG_DOUBLEFAULT=n
      didn't actually disable doublefault handling on x86_64.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all() · 9a62d200
      Committed by Joerg Roedel
      The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc()
      ranges: before such vmap ranges are reused we make sure that they are
      unmapped from every task's page tables.
      
      This is really easy on pagetable setups where the kernel page tables
      are shared between all tasks - this is the case on 32-bit kernels
      with SHARED_KERNEL_PMD = 1.
      
      But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating
      over the pgd_list and clearing all pmd entries in the pgds that
      are cleared in the init_mm.pgd, which is the reference pagetable
      that the vmalloc() code uses.
      
      In that context the current practice of vmalloc_sync_all() iterating
      until FIXADDR_TOP is buggy:
      
              for (address = VMALLOC_START & PMD_MASK;
                   address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
                   address += PMD_SIZE) {
                      struct page *page;
      
      Because iterating up to FIXADDR_TOP will involve a lot of non-vmalloc
      address ranges:
      
      	VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIX_ADDR
      
      This is mostly harmless for the FIX_ADDR and CPU_ENTRY_AREA ranges
      that don't clear their pmds, but it's lethal for the LDT range,
      which relies on having different mappings in different processes,
      and 'synchronizing' them in the vmalloc sense corrupts those
      pagetable entries (clearing them).
      
      This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD
      off and makes this the dominant mapping mode on 32-bit.
      
      To make LDT working again vmalloc_sync_all() must only iterate over
      the volatile parts of the kernel address range that are identical
      between all processes.
      
      So the correct check in vmalloc_sync_all() is "address < VMALLOC_END"
      to make sure the VMALLOC areas are synchronized and the LDT
      mapping is not falsely overwritten.
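
      Sketched against the loop quoted above, the fix is just the upper
      bound of the iteration:

        for (address = VMALLOC_START & PMD_MASK;
             address >= TASK_SIZE_MAX && address < VMALLOC_END;
             address += PMD_SIZE) {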
      
      The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either,
      but this is not really a problem since their PMDs get established
      during bootup and never change.
      
      This change fixes the ldt_gdt selftest in my setup.
      
      [ mingo: Fixed up the changelog to explain the logic and modified the
               copying to only happen up until VMALLOC_END. ]
      Reported-by: Borislav Petkov <bp@suse.de>
      Tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Fixes: 7757d607: ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
      Link: https://lkml.kernel.org/r/20191126111119.GA110513@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/iopl: Make 'struct tss_struct' constant size again · 0bcd7762
      Committed by Ingo Molnar
      After the following commit:
      
        05b042a1: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")
      
      'struct cpu_entry_area' has to be Kconfig invariant, so that we always
      have a matching CPU_ENTRY_AREA_PAGES size.
      
      This commit added a CONFIG_X86_IOPL_IOPERM dependency to tss_struct:
      
        111e7b15: ("x86/ioperm: Extend IOPL config to control ioperm() as well")
      
      Which, if CONFIG_X86_IOPL_IOPERM is turned off, reduces the size of
      cpu_entry_area by two pages, triggering the assert:
      
        ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_202’ declared with attribute error: BUILD_BUG_ON failed: (CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE
      
      Simplify the Kconfig dependencies and make cpu_entry_area constant
      size on 32-bit kernels again.
      
      Fixes: 05b042a1: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>