提交 · 891011ca564ddc66976345a6d8b84775a92d244e · openanolis / cloud-kernel

16 9月, 2019 40 次提交

KVM: VMX: Fix handling of #MC that occurs during VM-Entry · 891011ca

由 Sean Christopherson 提交于 9月 02, 2019

[ Upstream commit beb8d93b3e423043e079ef3dda19dad7b28467a8 ]

A previous fix to prevent KVM from consuming stale VMCS state after a
failed VM-Entry inadvertantly blocked KVM's handling of machine checks
that occur during VM-Entry.

Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
depending on when the #MC is recognoized.  As it pertains to this bug
fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
indicate the VM-Entry failed.

If a machine-check event occurs during a VM entry, one of the following occurs:
 - The machine-check event is handled as if it occurred before the VM entry:
        ...
 - The machine-check event is handled after VM entry completes:
        ...
 - A VM-entry failure occurs as described in Section 26.7. The basic
   exit reason is 41, for "VM-entry failure due to machine-check event".

Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
in a sane fashion and also simplifies vmx_complete_atomic_exit() since
VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.

Fixes: b060ca3b ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

891011ca

KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with bad value · 74ce1333

由 Sean Christopherson 提交于 5月 07, 2019

[ Upstream commit d28f4290b53a157191ed9991ad05dffe9e8c0c89 ]

The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.

Fixes: 4566654b ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable@vger.kernel.org
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

74ce1333

KVM: x86: optimize check for valid PAT value · 74fd8aae

由 Paolo Bonzini 提交于 4月 10, 2019

[ Upstream commit 674ea351cdeb01d2740edce31db7f2d79ce6095d ]

This check will soon be done on every nested vmentry and vmexit,
"parallelize" it using bitwise operations.
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

74fd8aae

ceph: use ceph_evict_inode to cleanup inode's resource · 81281039

由 Yan, Zheng 提交于 6月 02, 2019

[ Upstream commit 87bc5b895d94a0f40fe170d4cf5771c8e8f85d15 ]

remove_session_caps() relies on __wait_on_freeing_inode(), to wait for
freeing inode to remove its caps. But VFS wakes freeing inode waiters
before calling destroy_inode().

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/40102Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

81281039

ALSA: hda - Don't resume forcibly i915 HDMI/DP codec · 42fa0e35

由 Takashi Iwai 提交于 7月 16, 2019

[ Upstream commit 4914da2fb0c89205790503f20dfdde854f3afdd8 ]

We apply the codec resume forcibly at system resume callback for
updating and syncing the jack detection state that may have changed
during sleeping.  This is, however, superfluous for the codec like
Intel HDMI/DP, where the jack detection is managed via the audio
component notification; i.e. the jack state change shall be reported
sooner or later from the graphics side at mode change.

This patch changes the codec resume callback to avoid the forcible
resume conditionally with a new flag, codec->relaxed_resume, for
reducing the resume time.  The flag is set in the codec probe.

Although this doesn't fix the entire bug mentioned in the bugzilla
entry below, it's still a good optimization and some improvements are
seen.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201901
Cc: <stable@vger.kernel.org>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>

42fa0e35

cifs: Properly handle auto disabling of serverino option · 987564c2

由 Paulo Alcantara (SUSE) 提交于 6月 18, 2019

[ Upstream commit 29fbeb7a908a60a5ae8c50fbe171cb8fdcef1980 ]

Fix mount options comparison when serverino option is turned off later
in cifs_autodisable_serverino() and thus avoiding mismatch of new cifs
mounts.

Cc: stable@vger.kernel.org
Signed-off-by: NPaulo Alcantara (SUSE) <paulo@paulo.ac>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Reviewed-by: NPavel Shilovsky <pshilove@microsoft.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

987564c2

scsi: zfcp: fix request object use-after-free in send path causing wrong traces · d85e830d

由 Benjamin Block 提交于 7月 02, 2019

[ Upstream commit 106d45f350c7cac876844dc685845cba4ffdb70b ]

When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.

But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in an use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors" ).

To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: NBenjamin Block <bblock@linux.ibm.com>
Fixes: d27a7cb9 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable@vger.kernel.org> #2.6.38+
Reviewed-by: NSteffen Maier <maier@linux.ibm.com>
Reviewed-by: NJens Remus <jremus@linux.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

d85e830d

staging: wilc1000: fix error path cleanup in wilc_wlan_initialize() · ba8701d2

由 Ajay Singh 提交于 6月 26, 2019

[ Upstream commit 6419f818ababebc1116fb2d0e220bd4fe835d0e3 ]

For the error path in wilc_wlan_initialize(), the resources are not
cleanup in the correct order. Reverted the previous changes and use the
correct order to free during error condition.

Fixes: b46d6882 ("staging: wilc1000: remove COMPLEMENT_BOOT")
Cc: <stable@vger.kernel.org>
Signed-off-by: NAjay Singh <ajay.kathat@microchip.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

ba8701d2

scsi: target/iblock: Fix overrun in WRITE SAME emulation · 60b856dc

由 Roman Bolshakov 提交于 7月 02, 2019

[ Upstream commit 5676234f20fef02f6ca9bd66c63a8860fce62645 ]

WRITE SAME corrupts data on the block device behind iblock if the command
is emulated. The emulation code issues (M - 1) * N times more bios than
requested, where M is the number of 512 blocks per real block size and N is
the NUMBER OF LOGICAL BLOCKS specified in WRITE SAME command. So, for a
device with 4k blocks, 7 * N more LBAs gets written after the requested
range.

The issue happens because the number of 512 byte sectors to be written is
decreased one by one while the real bios are typically from 1 to 8 512 byte
sectors per bio.

Fixes: c66ac9db ("[SCSI] target: Add LIO target core v4.0.0-rc6")
Cc: <stable@vger.kernel.org>
Signed-off-by: NRoman Bolshakov <r.bolshakov@yadro.com>
Reviewed-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

60b856dc

scsi: target/core: Use the SECTOR_SHIFT constant · ba52842d

由 Bart Van Assche 提交于 10月 15, 2018

[ Upstream commit 80b045b385cfef10939c913fbfeb19ce5491c1f2 ]

Instead of duplicating the SECTOR_SHIFT definition from <linux/blkdev.h>,
use it. This patch does not change any functionality.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

ba52842d

apparmor: reset pos on failure to unpack for various functions · 17111037

由 Mike Salvatore 提交于 6月 12, 2019

[ Upstream commit 156e42996bd84eccb6acf319f19ce0cb140d00e3 ]

Each function that manipulates the aa_ext struct should reset it's "pos"
member on failure. This ensures that, on failure, no changes are made to
the state of the aa_ext struct.

There are paths were elements are optional and the error path is
used to indicate the optional element is not present. This means
instead of just aborting on error the unpack stream can become
unsynchronized on optional elements, if using one of the affected
functions.

Cc: stable@vger.kernel.org
Fixes: 736ec752 ("AppArmor: policy routines for loading and unpacking policy")
Signed-off-by: NMike Salvatore <mike.salvatore@canonical.com>
Signed-off-by: NJohn Johansen <john.johansen@canonical.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

17111037

IB/hfi1: Avoid hardlockup with flushlist_lock · 90ca4912

由 Mike Marciniszyn 提交于 6月 14, 2019

[ Upstream commit cf131a81967583ae737df6383a0893b9fee75b4e ]

Heavy contention of the sde flushlist_lock can cause hard lockups at
extreme scale when the flushing logic is under stress.

Mitigate by replacing the item at a time copy to the local list with
an O(1) list_splice_init() and using the high priority work queue to
do the flushes.

Fixes: 77241056 ("IB/hfi1: add driver files")
Cc: <stable@vger.kernel.org>
Reviewed-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NMike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

90ca4912

clk: tegra210: Fix default rates for HDA clocks · fa717fc4

由 Jon Hunter 提交于 6月 05, 2019

[ Upstream commit 9caec6620f25b6d15646bbdb93062c872ba3b56f ]

Currently the default clock rates for the HDA and HDA2CODEC_2X clocks
are both 19.2MHz. However, the default rates for these clocks should
actually be 51MHz and 48MHz, respectively. The current clock settings
results in a distorted output during audio playback. Correct the default
clock rates for these clocks by specifying them in the clock init table
for Tegra210.

Cc: stable@vger.kernel.org
Signed-off-by: NJon Hunter <jonathanh@nvidia.com>
Acked-by: NThierry Reding <treding@nvidia.com>
Signed-off-by: NStephen Boyd <sboyd@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

fa717fc4

clk: tegra: Fix maximum audio sync clock for Tegra124/210 · 350503c8

由 Jon Hunter 提交于 12月 03, 2018

[ Upstream commit 845d782d91448e0fbca686bca2cc9f9c2a9ba3e7 ]

The maximum frequency supported for I2S on Tegra124 and Tegra210 is
24.576MHz (as stated in the Tegra TK1 data sheet for Tegra124 and the
Jetson TX1 module data sheet for Tegra210). However, the maximum I2S
frequency is limited to 24MHz because that is the maximum frequency of
the audio sync clock. Increase the maximum audio sync clock frequency
to 24.576MHz for Tegra124 and Tegra210 in order to support 24.576MHz
for I2S.

Update the tegra_clk_register_sync_source() function so that it does
not set the initial rate for the sync clocks and use the clock init
tables to set the initial rate instead.
Signed-off-by: NJon Hunter <jonathanh@nvidia.com>
Acked-by: NThierry Reding <treding@nvidia.com>
Signed-off-by: NStephen Boyd <sboyd@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

350503c8

cifs: add spinlock for the openFileList to cifsInodeInfo · acc07941

由 Ronnie Sahlberg 提交于 6月 05, 2019

[ Upstream commit 487317c99477d00f22370625d53be3239febabbe ]

We can not depend on the tcon->open_file_lock here since in multiuser mode
we may have the same file/inode open via multiple different tcons.

The current code is race prone and will crash if one user deletes a file
at the same time a different user opens/create the file.

To avoid this we need to have a spinlock attached to the inode and not the tcon.

RHBZ:  1580165

CC: Stable <stable@vger.kernel.org>
Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

acc07941

Btrfs: fix race between block group removal and block group allocation · 1d064876

由 Filipe Manana 提交于 6月 12, 2019

[ Upstream commit 8eaf40c0e24e98899a0f3ac9d25a33aafe13822a ]

If a task is removing the block group that currently has the highest start
offset amongst all existing block groups, there is a short time window
where it races with a concurrent block group allocation, resulting in a
transaction abort with an error code of EEXIST.

The following diagram explains the race in detail:

      Task A                                                        Task B

 btrfs_remove_block_group(bg offset X)

   remove_extent_mapping(em offset X)
     -> removes extent map X from the
        tree of extent maps
        (fs_info->mapping_tree), so the
        next call to find_next_chunk()
        will return offset X

                                                   btrfs_alloc_chunk()
                                                     find_next_chunk()
                                                       --> returns offset X

                                                     __btrfs_alloc_chunk(offset X)
                                                       btrfs_make_block_group()
                                                         btrfs_create_block_group_cache()
                                                           --> creates btrfs_block_group_cache
                                                               object with a key corresponding
                                                               to the block group item in the
                                                               extent, the key is:
                                                               (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)

                                                         --> adds the btrfs_block_group_cache object
                                                             to the list new_bgs of the transaction
                                                             handle

                                                   btrfs_end_transaction(trans handle)
                                                     __btrfs_end_transaction()
                                                       btrfs_create_pending_block_groups()
                                                         --> sees the new btrfs_block_group_cache
                                                             in the new_bgs list of the transaction
                                                             handle
                                                         --> its call to btrfs_insert_item() fails
                                                             with -EEXIST when attempting to insert
                                                             the block group item key
                                                             (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)
                                                             because task A has not removed that key yet
                                                         --> aborts the running transaction with
                                                             error -EEXIST

   btrfs_del_item()
     -> removes the block group's key from
        the extent tree, key is
        (offset X, BTRFS_BLOCK_GROUP_ITEM_KEY, 1G)

A sample transaction abort trace:

  [78912.403537] ------------[ cut here ]------------
  [78912.403811] BTRFS: Transaction aborted (error -17)
  [78912.404082] WARNING: CPU: 2 PID: 20465 at fs/btrfs/extent-tree.c:10551 btrfs_create_pending_block_groups+0x196/0x250 [btrfs]
  (...)
  [78912.405642] CPU: 2 PID: 20465 Comm: btrfs Tainted: G        W         5.0.0-btrfs-next-46 #1
  [78912.405941] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
  [78912.406586] RIP: 0010:btrfs_create_pending_block_groups+0x196/0x250 [btrfs]
  (...)
  [78912.407636] RSP: 0018:ffff9d3d4b7e3b08 EFLAGS: 00010282
  [78912.407997] RAX: 0000000000000000 RBX: ffff90959a3796f0 RCX: 0000000000000006
  [78912.408369] RDX: 0000000000000007 RSI: 0000000000000001 RDI: ffff909636b16860
  [78912.408746] RBP: ffff909626758a58 R08: 0000000000000000 R09: 0000000000000000
  [78912.409144] R10: ffff9095ff462400 R11: 0000000000000000 R12: ffff90959a379588
  [78912.409521] R13: ffff909626758ab0 R14: ffff9095036c0000 R15: ffff9095299e1158
  [78912.409899] FS:  00007f387f16f700(0000) GS:ffff909636b00000(0000) knlGS:0000000000000000
  [78912.410285] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [78912.410673] CR2: 00007f429fc87cbc CR3: 000000014440a004 CR4: 00000000003606e0
  [78912.411095] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [78912.411496] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [78912.411898] Call Trace:
  [78912.412318]  __btrfs_end_transaction+0x5b/0x1c0 [btrfs]
  [78912.412746]  btrfs_inc_block_group_ro+0xcf/0x160 [btrfs]
  [78912.413179]  scrub_enumerate_chunks+0x188/0x5b0 [btrfs]
  [78912.413622]  ? __mutex_unlock_slowpath+0x100/0x2a0
  [78912.414078]  btrfs_scrub_dev+0x2ef/0x720 [btrfs]
  [78912.414535]  ? __sb_start_write+0xd4/0x1c0
  [78912.414963]  ? mnt_want_write_file+0x24/0x50
  [78912.415403]  btrfs_ioctl+0x17fb/0x3120 [btrfs]
  [78912.415832]  ? lock_acquire+0xa6/0x190
  [78912.416256]  ? do_vfs_ioctl+0xa2/0x6f0
  [78912.416685]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
  [78912.417116]  do_vfs_ioctl+0xa2/0x6f0
  [78912.417534]  ? __fget+0x113/0x200
  [78912.417954]  ksys_ioctl+0x70/0x80
  [78912.418369]  __x64_sys_ioctl+0x16/0x20
  [78912.418812]  do_syscall_64+0x60/0x1b0
  [78912.419231]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [78912.419644] RIP: 0033:0x7f3880252dd7
  (...)
  [78912.420957] RSP: 002b:00007f387f16ed68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [78912.421426] RAX: ffffffffffffffda RBX: 000055f5becc1df0 RCX: 00007f3880252dd7
  [78912.421889] RDX: 000055f5becc1df0 RSI: 00000000c400941b RDI: 0000000000000003
  [78912.422354] RBP: 0000000000000000 R08: 00007f387f16f700 R09: 0000000000000000
  [78912.422790] R10: 00007f387f16f700 R11: 0000000000000246 R12: 0000000000000000
  [78912.423202] R13: 00007ffda49c266f R14: 0000000000000000 R15: 00007f388145e040
  [78912.425505] ---[ end trace eb9bfe7c426fc4d3 ]---

Fix this by calling remove_extent_mapping(), at btrfs_remove_block_group(),
only at the very end, after removing the block group item key from the
extent tree (and removing the free space tree entry if we are using the
free space tree feature).

Fixes: 04216820 ("Btrfs: fix race between fs trimming and block group remove/allocation")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

1d064876

drm/amdgpu/{uvd,vcn}: fetch ring's read_ptr after alloc · f276beb3

由 Shirish S 提交于 6月 04, 2019

[ Upstream commit 517b91f4cde3043d77b2178548473e8545ef07cb ]

[What]
readptr read always returns zero, since most likely
these blocks are either power or clock gated.

[How]
fetch rptr after amdgpu_ring_alloc() which informs
the power management code that the block is about to be
used and hence the gating is turned off.
Signed-off-by: NLouis Li <Ching-shih.Li@amd.com>
Signed-off-by: NShirish S <shirish.s@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NSasha Levin <sashal@kernel.org>

f276beb3

drm/amdgpu: fix ring test failure issue during s3 in vce 3.0 (V2) · 7abeffff

由 Louis Li 提交于 5月 25, 2019

[ Upstream commit ce0e22f5d886d1b56c7ab4347c45b9ac5fcc058d ]

[What]
vce ring test fails consistently during resume in s3 cycle, due to
mismatch read & write pointers.
On debug/analysis its found that rptr to be compared is not being
correctly updated/read, which leads to this failure.
Below is the failure signature:
	[drm:amdgpu_vce_ring_test_ring] *ERROR* amdgpu: ring 12 test failed
	[drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block <vce_v3_0> failed -110
	[drm:amdgpu_device_resume] *ERROR* amdgpu_device_ip_resume failed (-110).

[How]
fetch rptr appropriately, meaning move its read location further down
in the code flow.
With this patch applied the s3 failure is no more seen for >5k s3 cycles,
which otherwise is pretty consistent.

V2: remove reduntant fetch of rptr
Signed-off-by: NLouis Li <Ching-shih.Li@amd.com>
Reviewed-by: NChristian König <christian.koenig@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NSasha Levin <sashal@kernel.org>

7abeffff

kvm: Check irqchip mode before assign irqfd · d5f65393

由 Peter Xu 提交于 5月 05, 2019

[ Upstream commit 654f1f13ea56b92bacade8ce2725aea0457f91c0 ]

When assigning kvm irqfd we didn't check the irqchip mode but we allow
KVM_IRQFD to succeed with all the irqchip modes.  However it does not
make much sense to create irqfd even without the kernel chips.  Let's
provide a arch-dependent helper to check whether a specific irqfd is
allowed by the arch.  At least for x86, it should make sense to check:

- when irqchip mode is NONE, all irqfds should be disallowed, and,

- when irqchip mode is SPLIT, irqfds that are with resamplefd should
  be disallowed.

For either of the case, previously we'll silently ignore the irq or
the irq ack event if the irqchip mode is incorrect.  However that can
cause misterious guest behaviors and it can be hard to triage.  Let's
fail KVM_IRQFD even earlier to detect these incorrect configurations.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Radim Krčmář <rkrcmar@redhat.com>
CC: Alex Williamson <alex.williamson@redhat.com>
CC: Eduardo Habkost <ehabkost@redhat.com>
Signed-off-by: NPeter Xu <peterx@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

d5f65393

drm/amdkfd: Add missing Polaris10 ID · 90772cf5

由 Kent Russell 提交于 5月 13, 2019

[ Upstream commit 0a5a9c276c335870a1cecc4f02b76d6d6f663c8b ]

This was added to amdgpu but was missed in amdkfd
Signed-off-by: NKent Russell <kent.russell@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.rg
Signed-off-by: NSasha Levin <sashal@kernel.org>

90772cf5

ARC: mm: SIGSEGV userspace trying to access kernel virtual memory · cacbc853

由 Eugeniy Paltsev 提交于 5月 13, 2019

[ Upstream commit a8c715b4dd73c26a81a9cc8dc792aa715d8b4bb2 ]

As of today if userspace process tries to access a kernel virtual addres
(0x7000_0000 to 0x7ffff_ffff) such that a legit kernel mapping already
exists, that process hangs instead of being killed with SIGSEGV

Fix that by ensuring that do_page_fault() handles kenrel vaddr only if
in kernel mode.

And given this, we can also simplify the code a bit. Now a vmalloc fault
implies kernel mode so its failure (for some reason) can reuse the
@no_context label and we can remove @bad_area_nosemaphore.

Reproduce user test for original problem:

------------------------>8-----------------
 #include <stdlib.h>
 #include <stdint.h>

 int main(int argc, char *argv[])
 {
 	volatile uint32_t temp;

 	temp = *(uint32_t *)(0x70000000);
 }
------------------------>8-----------------

Cc: <stable@vger.kernel.org>
Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

cacbc853

ARC: mm: fix uninitialised signal code in do_page_fault · 7edfa9c9

由 Eugeniy Paltsev 提交于 11月 07, 2018

[ Upstream commit 121e38e5acdc8e1e4cdb750fcdcc72f94e420968 ]

Commit 15773ae938d8 ("signal/arc: Use force_sig_fault where
appropriate") introduced undefined behaviour by leaving si_code
unitiailized and leaking random kernel values to user space.

Fixes: 15773ae938d8 ("signal/arc: Use force_sig_fault where appropriate")
Signed-off-by: NEugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

7edfa9c9

signal/arc: Use force_sig_fault where appropriate · 0828438e

由 Eric W. Biederman 提交于 8月 01, 2017

[ Upstream commit 15773ae938d8d93d982461990bebad6e1d7a1830 ]
Acked-by: NVineet Gupta <vgupta@synopsys.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

0828438e

dm crypt: move detailed message into debug level · fcb2f1e2

由 Milan Broz 提交于 5月 15, 2019

[ Upstream commit 7a1cd7238fde6ab367384a4a2998cba48330c398 ]

The information about tag size should not be printed without debug info
set. Also print device major:minor in the error message to identify the
device instance.

Also use rate limiting and debug level for info about used crypto API
implementaton.  This is important because during online reencryption
the existing message saturates syslog (because we are moving hotzone
across the whole device).

Cc: stable@vger.kernel.org
Signed-off-by: NMilan Broz <gmazyland@gmail.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

fcb2f1e2

cifs: smbd: take an array of reqeusts when sending upper layer data · 96b44c20

由 Long Li 提交于 4月 15, 2019

[ Upstream commit 4739f2328661d070f93f9bcc8afb2a82706c826d ]

To support compounding, __smb_send_rqst() now sends an array of requests to
the transport layer.
Change smbd_send() to take an array of requests, and send them in as few
packets as possible.
Signed-off-by: NLong Li <longli@microsoft.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

96b44c20

PCI: dwc: Use devm_pci_alloc_host_bridge() to simplify code · 3f27a14b

由 Jisheng Zhang 提交于 3月 29, 2019

[ Upstream commit e6fdd3bf5aecd8615f31a5128775b9abcf3e0d86 ]

Use devm_pci_alloc_host_bridge() to simplify the error code path.  This
also fixes a leak in the dw_pcie_host_init() error path.
Signed-off-by: NJisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Acked-by: NGustavo Pimentel <gustavo.pimentel@synopsys.com>
CC: stable@vger.kernel.org	# v4.13+
Signed-off-by: NSasha Levin <sashal@kernel.org>

3f27a14b

mmc: sdhci-pci: Add support for Intel CML · 842da8fa

由 Adrian Hunter 提交于 4月 08, 2019

[ Upstream commit 765c59675ab571caf7ada456bbfd23a73136b535 ]

Add PCI Ids for Intel CML.
Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

842da8fa

blk-mq: free hw queue's resource in hctx's release handler · e238e6dc

由 Ming Lei 提交于 4月 30, 2019

[ Upstream commit c7e2d94b3d1634988a95ac4d77a72dc7487ece06 ]

Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d9
("blk-mq: Fix a use-after-free") fixes this issue exactly.

However, that commit introduces another issue. Before 45a9c9d9,
we are allowed to run queue during cleaning up queue if the queue's
kobj refcount is held. After that commit, queue can't be run during
queue cleaning up, otherwise oops can be triggered easily because
some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().

We have invented ways for addressing this kind of issue before, such as:

	8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
	c2856ae2 ("blk-mq: quiesce queue before freeing queue")

But still can't cover all cases, recently James reports another such
kind of issue:

	https://marc.info/?l=linux-scsi&m=155389088124782&w=2

This issue can be quite hard to address by previous way, given
scsi_run_queue() may run requeues for other LUNs.

Fixes the above issue by freeing hctx's resources in its release handler, and this
way is safe becasue tags isn't needed for freeing such hctx resource.

This approach follows typical design pattern wrt. kobject's release handler.

Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: James Smart <james.smart@broadcom.com>
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org,
Cc: Martin K . Petersen <martin.petersen@oracle.com>,
Cc: Christoph Hellwig <hch@lst.de>,
Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
Reported-by: NJames Smart <james.smart@broadcom.com>
Fixes: 45a9c9d9 ("blk-mq: Fix a use-after-free")
Cc: stable@vger.kernel.org
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NJames Smart <james.smart@broadcom.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>
Signed-off-by: NSasha Levin <sashal@kernel.org>

e238e6dc

dm mpath: fix missing call of path selector type->end_io · 69409854

由 Yufen Yu 提交于 4月 24, 2019

[ Upstream commit 5de719e3d01b4abe0de0d7b857148a880ff2a90b ]

After commit 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via
blk_insert_cloned_request feedback"), map_request() will requeue the tio
when issued clone request return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE.

Thus, if device driver status is error, a tio may be requeued multiple
times until the return value is not DM_MAPIO_REQUEUE.  That means
type->start_io may be called multiple times, while type->end_io is only
called when IO complete.

In fact, even without commit 396eaf21, setup_clone() failure can
also cause tio requeue and associated missed call to type->end_io.

The service-time path selector selects path based on in_flight_size,
which is increased by st_start_io() and decreased by st_end_io().
Missed calls to st_end_io() can lead to in_flight_size count error and
will cause the selector to make the wrong choice.  In addition,
queue-length path selector will also be affected.

To fix the problem, call type->end_io in ->release_clone_rq before tio
requeue.  map_info is passed to ->release_clone_rq() for map_request()
error path that result in requeue.

Fixes: 396eaf21 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
Cc: stable@vger.kernl.org
Signed-off-by: NYufen Yu <yuyufen@huawei.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

69409854

PCI: Reset Lenovo ThinkPad P50 nvgpu at boot if necessary · 0fe09701

由 Lyude Paul 提交于 2月 12, 2019

[ Upstream commit e0547c81bfcfad01cbbfa93a5e66bb98ab932f80 ]

On ThinkPad P50 SKUs with an Nvidia Quadro M1000M instead of the M2000M
variant, the BIOS does not always reset the secondary Nvidia GPU during
reboot if the laptop is configured in Hybrid Graphics mode.  The reason is
unknown, but the following steps and possibly a good bit of patience will
reproduce the issue:

  1. Boot up the laptop normally in Hybrid Graphics mode
  2. Make sure nouveau is loaded and that the GPU is awake
  3. Allow the Nvidia GPU to runtime suspend itself after being idle
  4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
  5. If nouveau loads up properly, reboot the machine again and go back to
     step 2 until you reproduce the issue

This results in some very strange behavior: the GPU will be left in exactly
the same state it was in when the previously booted kernel started the
reboot.  This has all sorts of bad side effects: for starters, this
completely breaks nouveau starting with a mysterious EVO channel failure
that happens well before we've actually used the EVO channel for anything:

  nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002

This causes a timeout trying to bring up the GR ctx:

  nouveau 0000:01:00.0: timeout
  WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
  Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
  Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
  ...
  nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
  nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
  nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]

The GPU never manages to recover.  Booting without loading nouveau causes
issues as well, since the GPU starts sending spurious interrupts that cause
other device's IRQs to get disabled by the kernel:

  irq 16: nobody cared (try booting with the "irqpoll" option)
  ...
  handlers:
  [<000000007faa9e99>] i801_isr [i2c_i801]
  Disabling IRQ #16
  ...
  serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
  i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
  i801_smbus 0000:00:1f.4: Transaction timeout
  rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
  i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
  i801_smbus 0000:00:1f.4: Transaction timeout
  rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!

This causes the touchpad and sometimes other things to get disabled.

Since this happens without nouveau, we can't fix this problem from nouveau
itself.

Add a PCI quirk for the specific P50 variant of this GPU.  Make sure the
GPU is advertising NoReset- so we don't reset the GPU when the machine is
in Dedicated graphics mode (where the GPU being initialized by the BIOS is
normal and expected).  Map the GPU MMIO space and read the magic 0x2240c
register, which will have bit 1 set if the device was POSTed during a
previous boot.  Once we've confirmed all of this, reset the GPU and
re-disable it - bringing it back to a healthy state.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=203003
Link: https://lore.kernel.org/lkml/20190212220230.1568-1-lyude@redhat.comSigned-off-by: NLyude Paul <lyude@redhat.com>
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Ben Skeggs <skeggsb@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NSasha Levin <sashal@kernel.org>

0fe09701

PCI: Add macro for Switchtec quirk declarations · 5659dfca

由 Logan Gunthorpe 提交于 10月 10, 2018

[ Upstream commit 01d5d7fa8376c6b5acda86e16fcad22de6bba486 ]

Add SWITCHTEC_QUIRK() to reduce redundancy in declaring devices that use
quirk_switchtec_ntb_dma_alias().

By itself, this is no functional change, but a subsequent patch updates
SWITCHTEC_QUIRK() to fix ad281ecf ("PCI: Add DMA alias quirk for
Microsemi Switchtec NTB").

Fixes: ad281ecf ("PCI: Add DMA alias quirk for Microsemi Switchtec NTB")
Signed-off-by: NLogan Gunthorpe <logang@deltatee.com>
[bhelgaas: split to separate patch]
Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

5659dfca

dt-bindings: mmc: Add disable-cqe-dcmd property. · e4ba1578

由 Christoph Muellner 提交于 3月 22, 2019

[ Upstream commit 28f22fb755ecf9f933f045bc0afdb8140641b01c ]

Add disable-cqe-dcmd as optional property for MMC hosts.
This property allows to disable or not enable the direct command
features of the command queue engine.
Signed-off-by: NChristoph Muellner <christoph.muellner@theobroma-systems.com>
Signed-off-by: NPhilipp Tomsich <philipp.tomsich@theobroma-systems.com>
Fixes: 84362d79 ("mmc: sdhci-of-arasan: Add CQHCI support for arasan,sdhci-5.1")
Cc: stable@vger.kernel.org
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

e4ba1578

dt-bindings: mmc: Add supports-cqe property · eb83f9fa

由 Sowjanya Komatineni 提交于 1月 23, 2019

[ Upstream commit c7fddbd5db5cffd10ed4d18efa20e36803d1899f ]

Add supports-cqe optional property for MMC hosts.

This property is used to identify the specific host controller
supporting command queue.
Signed-off-by: NSowjanya Komatineni <skomatineni@nvidia.com>
Reviewed-by: NThierry Reding <treding@nvidia.com>
Reviewed-by: NRob Herring <robh@kernel.org>
Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

eb83f9fa

ARM: dts: qcom: ipq4019: enlarge PCIe BAR range · 0a0176f9

由 Christian Lamparter 提交于 2月 26, 2019

[ Upstream commit f3e35357cd460a8aeb48b8113dc4b761a7d5c828 ]

David Bauer reported that the VDSL modem (attached via PCIe)
on his AVM Fritz!Box 7530 was complaining about not having
enough space in the BAR. A closer inspection of the old
qcom-ipq40xx.dtsi pulled from the GL-iNet repository listed:

| qcom,pcie@80000 {
|	compatible = "qcom,msm_pcie";
|	reg = <0x80000 0x2000>,
|	      <0x99000 0x800>,
|	      <0x40000000 0xf1d>,
|	      <0x40000f20 0xa8>,
|	      <0x40100000 0x1000>,
|	      <0x40200000 0x100000>,
|	      <0x40300000 0xd00000>;
|	reg-names = "parf", "phy", "dm_core", "elbi",
|			"conf", "io", "bars";

Matching the reg-names with the listed reg leads to
<0xd00000> as the size for the "bars".

Cc: stable@vger.kernel.org
BugLink: https://www.mail-archive.com/openwrt-devel@lists.openwrt.org/msg45212.htmlReported-by: NDavid Bauer <mail@david-bauer.net>
Signed-off-by: NChristian Lamparter <chunkeey@gmail.com>
Signed-off-by: NAndy Gross <agross@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

0a0176f9

ARM: dts: qcom: ipq4019: Fix MSI IRQ type · 445a78ea

由 Niklas Cassel 提交于 1月 24, 2019

[ Upstream commit 97131f85c08e024df49480ed499aae8fb754067f ]

The databook clearly states that the MSI IRQ (msi_ctrl_int) is a level
triggered interrupt.

The msi_ctrl_int will be high for as long as any MSI status bit is set,
thus the IRQ type should be set to IRQ_TYPE_LEVEL_HIGH, causing the
IRQ handler to keep getting called, as long as any MSI status bit is set.

A git grep shows that ipq4019 is the only SoC using snps,dw-pcie that has
configured this IRQ incorrectly.

Not having the correct IRQ type defined will cause us to lose interrupts,
which in turn causes timeouts in the PCIe endpoint drivers.
Signed-off-by: NNiklas Cassel <niklas.cassel@linaro.org>
Reviewed-by: NBjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: NAndy Gross <andy.gross@linaro.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

445a78ea

ARM: dts: qcom: ipq4019: fix PCI range · df1216d8

由 Mathias Kresin 提交于 7月 25, 2018

[ Upstream commit da89f500cb55fb3f19c4b399b46d8add0abbd4d6 ]

The PCI range is invalid and PCI attached devices doen't work.
Signed-off-by: NMathias Kresin <dev@kresin.me>
Signed-off-by: NJohn Crispin <john@phrozen.org>
Signed-off-by: NAndy Gross <andy.gross@linaro.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

df1216d8

ext4: protect journal inode's blocks using block_validity · 2fd4629d

由 Theodore Ts'o 提交于 4月 09, 2019

[ Upstream commit 345c0dbf3a30872d9b204db96b5857cd00808cae ]

Add the blocks which belong to the journal inode to block_validity's
system zone so attempts to deallocate or overwrite the journal due a
corrupted file system where the journal blocks are also claimed by
another inode.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: NSasha Levin <sashal@kernel.org>

2fd4629d

media: i2c: tda1997x: select V4L2_FWNODE · f10a9230

由 Koen Vandeputte 提交于 3月 18, 2019

[ Upstream commit 5f2efda71c09b12012053f457fac7692f268b72c ]

Building tda1997x fails now unless V4L2_FWNODE is selected:

drivers/media/i2c/tda1997x.o: in function `tda1997x_parse_dt'
undefined reference to `v4l2_fwnode_endpoint_parse'

While at it, also sort the selections alphabetically

Fixes: 9ac0038d ("media: i2c: Add TDA1997x HDMI receiver driver")
Signed-off-by: NKoen Vandeputte <koen.vandeputte@ncentric.com>
Cc: stable@vger.kernel.org # v4.17+
Acked-by: NSakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: NHans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>

f10a9230

cifs: Fix lease buffer length error · 4061e662

由 ZhangXiaoxu 提交于 4月 06, 2019

[ Upstream commit b57a55e2200ede754e4dc9cce4ba9402544b9365 ]

There is a KASAN slab-out-of-bounds:
BUG: KASAN: slab-out-of-bounds in _copy_from_iter_full+0x783/0xaa0
Read of size 80 at addr ffff88810c35e180 by task mount.cifs/539

CPU: 1 PID: 539 Comm: mount.cifs Not tainted 4.19 #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
            rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
Call Trace:
 dump_stack+0xdd/0x12a
 print_address_description+0xa7/0x540
 kasan_report+0x1ff/0x550
 check_memory_region+0x2f1/0x310
 memcpy+0x2f/0x80
 _copy_from_iter_full+0x783/0xaa0
 tcp_sendmsg_locked+0x1840/0x4140
 tcp_sendmsg+0x37/0x60
 inet_sendmsg+0x18c/0x490
 sock_sendmsg+0xae/0x130
 smb_send_kvec+0x29c/0x520
 __smb_send_rqst+0x3ef/0xc60
 smb_send_rqst+0x25a/0x2e0
 compound_send_recv+0x9e8/0x2af0
 cifs_send_recv+0x24/0x30
 SMB2_open+0x35e/0x1620
 open_shroot+0x27b/0x490
 smb2_open_op_close+0x4e1/0x590
 smb2_query_path_info+0x2ac/0x650
 cifs_get_inode_info+0x1058/0x28f0
 cifs_root_iget+0x3bb/0xf80
 cifs_smb3_do_mount+0xe00/0x14c0
 cifs_do_mount+0x15/0x20
 mount_fs+0x5e/0x290
 vfs_kern_mount+0x88/0x460
 do_mount+0x398/0x31e0
 ksys_mount+0xc6/0x150
 __x64_sys_mount+0xea/0x190
 do_syscall_64+0x122/0x590
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

It can be reproduced by the following step:
  1. samba configured with: server max protocol = SMB2_10
  2. mount -o vers=default

When parse the mount version parameter, the 'ops' and 'vals'
was setted to smb30,  if negotiate result is smb21, just
update the 'ops' to smb21, but the 'vals' is still smb30.
When add lease context, the iov_base is allocated with smb21
ops, but the iov_len is initiallited with the smb30. Because
the iov_len is longer than iov_base, when send the message,
copy array out of bounds.

we need to keep the 'ops' and 'vals' consistent.

Fixes: 9764c02f ("SMB3: Add support for multidialect negotiate (SMB2.1 and later)")
Fixes: d5c7076b772a ("smb3: add smb3.1.1 to default dialect list")
Signed-off-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: NSteve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

4061e662

KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels · df5d4ea2

由 Sean Christopherson 提交于 4月 02, 2019

[ Upstream commit b68f3cc7d978943fcf85148165b00594c38db776 ]

Invoking the 64-bit variation on a 32-bit kenrel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.

KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode.  But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.

SMM complicates things as 64-bit CPUs use a different SMRAM save state
area.  KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hid long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).

Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM.  If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allow it to continue running.

Fixes: 660a5d51 ("KVM: x86: save/load state on SMM switch")
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSasha Levin <sashal@kernel.org>

df5d4ea2

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功