1. 18 December 2019 (2 commits)
  2. 13 December 2019 (6 commits)
    • KVM: x86: fix out-of-bounds write in KVM_GET_EMULATED_CPUID (CVE-2019-19332) · 5119ffd4
      Committed by Paolo Bonzini
      commit 433f4ba1904100da65a311033f17a9bf586b287e upstream.
      
      The bounds check was present in KVM_GET_SUPPORTED_CPUID but not
      KVM_GET_EMULATED_CPUID.
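
      A sketch of the missing guard, mirroring the check the
      KVM_GET_SUPPORTED_CPUID path already performs (the helper name
      follows recent upstream; treat this as illustrative, not the
      verbatim diff):

        static int __do_cpuid_func_emulated(struct kvm_cpuid_entry2 *entry,
                                            u32 func, int *nent, int maxnent)
        {
                if (*nent >= maxnent)   /* previously missing bounds check */
                        return -E2BIG;

                /* ... fill in the emulated leaf and bump *nent as before ... */
                return 0;
        }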
      
      Reported-by: syzbot+e3f4897236c4eeb8af4f@syzkaller.appspotmail.com
      Fixes: 84cffe49 ("kvm: Emulate MOVBE", 2013-10-29)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: Grab KVM's srcu lock when setting nested state · 5cbc7ff5
      Committed by Sean Christopherson
      commit ad5996d9a0e8019c3ae5151e687939369acfe044 upstream.
      
      Acquire kvm->srcu for the duration of ->set_nested_state() to fix a bug
      where nVMX dereferences ->memslots without holding ->srcu or ->slots_lock.
      
      The other half of nested migration, ->get_nested_state(), does not need
      to acquire ->srcu as it is purely a dump of internal KVM (and CPU)
      state to userspace.
      
      Detected as an RCU lockdep splat that is 100% reproducible by running
      KVM's state_test selftest with CONFIG_PROVE_LOCKING=y.  Note that the
      failing function, kvm_is_visible_gfn(), is only checking the validity of
      a gfn; it is not actually accessing guest memory (which is more or less
      unsupported during vmx_set_nested_state() due to incorrect MMU state),
      i.e. vmx_set_nested_state() itself isn't fundamentally broken.  In any
      case, setting nested state isn't a fast path so there's no reason to go
      out of our way to avoid taking ->srcu.
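
      A minimal sketch of the locking pattern applied in the vcpu ioctl
      path (variable names assumed from the upstream handler; illustrative
      rather than the verbatim diff):

        idx = srcu_read_lock(&vcpu->kvm->srcu);
        r = kvm_x86_ops->set_nested_state(vcpu, user_kvm_nested_state,
                                          &kvm_state);
        srcu_read_unlock(&vcpu->kvm->srcu, idx);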
      
        =============================
        WARNING: suspicious RCU usage
        5.4.0-rc7+ #94 Not tainted
        -----------------------------
        include/linux/kvm_host.h:626 suspicious rcu_dereference_check() usage!
      
                     other info that might help us debug this:
      
        rcu_scheduler_active = 2, debug_locks = 1
        1 lock held by evmcs_test/10939:
         #0: ffff88826ffcb800 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x630 [kvm]
      
        stack backtrace:
        CPU: 1 PID: 10939 Comm: evmcs_test Not tainted 5.4.0-rc7+ #94
        Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
        Call Trace:
         dump_stack+0x68/0x9b
         kvm_is_visible_gfn+0x179/0x180 [kvm]
         mmu_check_root+0x11/0x30 [kvm]
         fast_cr3_switch+0x40/0x120 [kvm]
         kvm_mmu_new_cr3+0x34/0x60 [kvm]
         nested_vmx_load_cr3+0xbd/0x1f0 [kvm_intel]
         nested_vmx_enter_non_root_mode+0xab8/0x1d60 [kvm_intel]
         vmx_set_nested_state+0x256/0x340 [kvm_intel]
         kvm_arch_vcpu_ioctl+0x491/0x11a0 [kvm]
         kvm_vcpu_ioctl+0xde/0x630 [kvm]
         do_vfs_ioctl+0xa2/0x6c0
         ksys_ioctl+0x66/0x70
         __x64_sys_ioctl+0x16/0x20
         do_syscall_64+0x54/0x200
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        RIP: 0033:0x7f59a2b95f47
      
      Fixes: 8fcc4b59 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES · 6a10f818
      Committed by Paolo Bonzini
      commit cbbaa2727aa3ae9e0a844803da7cef7fd3b94f2b upstream.
      
      KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented
      to the guests.  It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR &&
      !RTM && ARCH_CAP_TAA_NO: lack of MSR_IA32_TSX_CTRL suggests TSX was not
      hidden (it actually was), yet the value says that TSX is not vulnerable
      to microarchitectural data sampling.  Fix both.
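
      A sketch of the resulting logic in kvm_get_arch_capabilities(),
      reconstructed from the description above (the exact bit handling is
      an assumption, not the verbatim diff):

        /* KVM does not implement MSR_IA32_TSX_CTRL, so never advertise it. */
        data &= ~ARCH_CAP_TSX_CTRL_MSR;

        /*
         * If RTM is hidden, hide TAA_NO as well rather than exposing the
         * confusing !TSX_CTRL_MSR && !RTM && TAA_NO combination.
         */
        if (!boot_cpu_has(X86_FEATURE_RTM))
                data &= ~ARCH_CAP_TAA_NO;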
      
      Cc: stable@vger.kernel.org
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • KVM: x86: do not modify masked bits of shared MSRs · 5efbd9a9
      Committed by Paolo Bonzini
      commit de1fca5d6e0105c9d33924e1247e2f386efc3ece upstream.
      
      "Shared MSRs" are guest MSRs that are written to the host MSRs but
      keep their value until the next return to userspace.  They support
      a mask, so that some bits keep the host value, but this mask is
      only used to skip an unnecessary MSR write and the value written
      to the MSR is always the guest MSR.
      
      Fix this and, while at it, do not update smsr->values[slot].curr if
      the wrmsr fails for whatever reason.  A failure should only happen
      due to reserved bits; if curr were updated anyway it would no longer
      match the real MSR contents, even though the user-return notifier
      would still restore the host value correctly.  Leaving curr untouched
      is tidier, and in rare cases it actually avoids spurious WRMSRs on
      return to userspace.
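
      A sketch of the corrected write path, modeled on kvm_set_shared_msr()
      (illustrative rather than the verbatim diff):

        value = (value & mask) | (smsr->values[slot].host & ~mask);
        if (value == smsr->values[slot].curr)
                return 0;
        if (wrmsrl_safe(shared_msrs_global.msrs[slot], value))
                return 1;       /* leave ->curr untouched on failure */
        smsr->values[slot].curr = value;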
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Tested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/PCI: Avoid AMD FCH XHCI USB PME# from D0 defect · 2e5c738a
      Committed by Kai-Heng Feng
      commit 7e8ce0e2b036dbc6617184317983aea4f2c52099 upstream.
      
      The AMD FCH USB XHCI Controller advertises support for generating PME#
      while in D0.  When in D0, it does signal PME# for USB 3.0 connect events,
      but not for USB 2.0 or USB 1.1 connect events, which means the controller
      doesn't wake correctly for those events.
      
        00:10.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller [1022:7914] (rev 20) (prog-if 30 [XHCI])
              Subsystem: Dell FCH USB XHCI Controller [1028:087e]
              Capabilities: [50] Power Management version 3
                      Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
      
      Clear PCI_PM_CAP_PME_D0 in dev->pme_support to indicate the device will not
      assert PME# from D0 so we don't rely on it.
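
      The quirk boils down to a PCI fixup along these lines (the function
      name here is illustrative; the mechanics follow the description
      above):

        static void fixup_amd_fch_xhci_pme(struct pci_dev *dev)
        {
                /* PME# from D0 misses USB 2.0/1.1 connect events. */
                dev->pme_support &= ~(PCI_PM_CAP_PME_D0 >> PCI_PM_CAP_PME_SHIFT);
                pci_info(dev, "PME# does not work under D0, disabling it\n");
        }
        DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x7914,
                                fixup_amd_fch_xhci_pme);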
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203673
      Link: https://lore.kernel.org/r/20190902145252.32111-1-kai.heng.feng@canonical.com
      Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all() · 37d080a4
      Committed by Joerg Roedel
      commit 9a62d20027da3164a22244d9f022c0c987261687 upstream.
      
      The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc()
      ranges: before such vmap ranges are reused we make sure that they are
      unmapped from every task's page tables.
      
      This is really easy on pagetable setups where the kernel page tables
      are shared between all tasks - this is the case on 32-bit kernels
      with SHARED_KERNEL_PMD = 1.
      
      But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating
      over the pgd_list and clearing all pmd entries in the pgds that
      are cleared in the init_mm.pgd, which is the reference pagetable
      that the vmalloc() code uses.
      
      In that context the current practice of vmalloc_sync_all() iterating
      up to FIXADDR_TOP is buggy:
      
              for (address = VMALLOC_START & PMD_MASK;
                   address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
                   address += PMD_SIZE) {
                      struct page *page;
      
      Because iterating up to FIXADDR_TOP will involve a lot of non-vmalloc
      address ranges:
      
      	VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIXADDR
      
      This is mostly harmless for the FIXADDR and CPU_ENTRY_AREA ranges,
      which don't clear their pmds, but it's lethal for the LDT range,
      which relies on having different mappings in different processes;
      'synchronizing' them in the vmalloc sense corrupts those
      pagetable entries (by clearing them).
      
      This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD
      off and makes this the dominant mapping mode on 32-bit.
      
      To make the LDT work again, vmalloc_sync_all() must only iterate over
      the volatile parts of the kernel address range that are identical
      between all processes.
      
      So the correct check in vmalloc_sync_all() is "address < VMALLOC_END"
      to make sure the VMALLOC areas are synchronized and the LDT
      mapping is not falsely overwritten.
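
      With that change, the loop quoted above becomes (only the upper
      bound changes):

              for (address = VMALLOC_START & PMD_MASK;
                   address >= TASK_SIZE_MAX && address < VMALLOC_END;
                   address += PMD_SIZE) {
                      struct page *page;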
      
      The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either,
      but this is not really a problem since their PMDs get established
      during bootup and never change.
      
      This change fixes the ldt_gdt selftest in my setup.
      
      [ mingo: Fixed up the changelog to explain the logic and modified the
               copying to only happen up until VMALLOC_END. ]
      Reported-by: Borislav Petkov <bp@suse.de>
      Tested-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Fixes: 7757d607 ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
      Link: https://lkml.kernel.org/r/20191126111119.GA110513@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. 05 December 2019 (6 commits)
  4. 01 December 2019 (9 commits)
  5. 24 November 2019 (13 commits)
  6. 21 November 2019 (4 commits)