Commits · 669d3d01530f83dc9b10b1d77886dce4169840cf · openanolis / cloud-kernel

12 Oct, 2019 · 12 commits

iommu/arm-smmu-v3: Remove unnecessary wrapper function · 669d3d01

Authored by Andrew Murray on Oct 10, 2018

commit 5e731073bc0a4a53a213412dbd33982d829560f1 upstream

Simplify the code by removing an unnecessary wrapper function.

This was left behind by commit 2f657add ("iommu/arm-smmu-v3: Specialise
CMD_SYNC handling").

Signed-off-by: Andrew Murray <andrew.murray@arm.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/arm-smmu: Support non-strict mode · 815a3acf

Authored by Robin Murphy on Sep 20, 2018

commit 44f6876a00e83df5fd28681502b19b0f51e4a3c6 upstream

All we need is to wire up .flush_iotlb_all properly and implement the
domain attribute, and iommu-dma and io-pgtable will do the rest for us.
The only real subtlety is documenting the barrier semantics we're
introducing between io-pgtable and the drivers for non-strict flushes.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/io-pgtable-arm-v7s: Add support for non-strict mode · 397aa2bf

Authored by Robin Murphy on Sep 20, 2018

commit b2dfeba654cb08db327d0ed4547b66c2f8fce997 upstream

As for LPAE, it's simply a case of skipping the leaf invalidation for a
regular unmap, and ensuring that the one in split_blk_unmap() is paired
with an explicit sync ASAP rather than relying on one which might only
eventually happen way down the line.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/arm-smmu-v3: Add support for non-strict mode · e50b3155

Authored by Zhen Lei on Sep 20, 2018

commit 9662b99a19abccb0b7bfc91abb3fec1447c35bf0 upstream

Now that io-pgtable knows how to dodge strict TLB maintenance, all
that's left to do is bridge the gap between the IOMMU core requesting
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE for default domains, and showing the
appropriate IO_PGTABLE_QUIRK_NON_STRICT flag to alloc_io_pgtable_ops().

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[rm: convert to domain attribute, tweak commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/io-pgtable-arm: Add support for non-strict mode · 15c316e7

Authored by Zhen Lei on Sep 20, 2018

commit b6b65ca20bc93d14319f9b5cf98fd3c19a4244e3 upstream

Non-strict mode is simply a case of skipping 'regular' leaf TLBIs, since
the sync is already factored out into ops->iotlb_sync at the core API
level. Non-leaf invalidations where we change the page table structure
itself still have to be issued synchronously in order to maintain walk
caches correctly.

To save having to reason about it too much, make sure the invalidation
in arm_lpae_split_blk_unmap() just performs its own unconditional sync
to minimise the window in which we're technically violating the break-
before-make requirement on a live mapping. This might work out redundant
with an outer-level sync for strict unmaps, but we'll never be splitting
blocks on a DMA fastpath anyway.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[rm: tweak comment, commit message, split_blk_unmap logic and barriers]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu: Add "iommu.strict" command line option · c4e943b6

Authored by Zhen Lei on Sep 20, 2018

commit 68a6efe86f6a16e25556a2aff40efad41097b486 upstream

Add a generic command line option to enable lazy unmapping via IOVA
flush queues, which will initially be supported by iommu-dma. This echoes
the semantics of "intel_iommu=strict" (albeit with the opposite default
value), but in the driver-agnostic fashion of "iommu.passthrough".

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[rm: move handling out of SMMUv3 driver, clean up documentation]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[will: dropped broken printk when parsing command-line option]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/dma: Add support for non-strict mode · c103f920

Authored by Zhen Lei on Sep 20, 2018

commit 2da274cdf998a1c12afa6b5975db2df1df01edf1 upstream

With the flush queue infrastructure already abstracted into IOVA
domains, hooking it up in iommu-dma is pretty simple. Since there is a
degree of dependency on the IOMMU driver knowing what to do to play
along, we key the whole thing off a domain attribute which will be set
on default DMA ops domains to request non-strict invalidation. That way,
drivers can indicate the appropriate support by acknowledging the
attribute, and we can easily fall back to strict invalidation otherwise.

The flush queue callback needs a handle on the iommu_domain which owns
our cookie, so we have to add a pointer back to that, but neatly, that's
also sufficient to indicate whether we're using a flush queue or not,
and thus which way to release IOVAs. The only slight subtlety is
switching __iommu_dma_unmap() from calling iommu_unmap() to explicit
iommu_unmap_fast()/iommu_tlb_sync() so that we can elide the sync
entirely in non-strict mode.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[rm: convert to domain attribute, tweak comments and commit message]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/arm-smmu-v3: Implement flush_iotlb_all hook · 4ec05fe0

Authored by Zhen Lei on Sep 20, 2018

commit 07fdef34d2be6811f00c6f9e4e2a1483cf86696c upstream

.flush_iotlb_all is currently stubbed to arm_smmu_iotlb_sync() since the
only time it would ever need to actually do anything is for callers
doing their own explicit batching, e.g.:

	iommu_unmap_fast(domain, ...);
	iommu_unmap_fast(domain, ...);
	iommu_iotlb_flush_all(domain, ...);

where since io-pgtable still issues the TLBI commands implicitly in the
unmap instead of implementing .iotlb_range_add, the "flush" only needs
to ensure completion of those already-in-flight invalidations.

However, we're about to start using it in anger with flush queues, so
let's get a proper implementation wired up.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
[rm: document why it wasn't a bug]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/arm-smmu-v3: Avoid back-to-back CMD_SYNC operations · 6b151cdb

Authored by Zhen Lei on Aug 19, 2018

commit 901510ee32f7190902f6fe4affb463e5d86a804c upstream

Putting adjacent CMD_SYNCs into the command queue is nonsensical, but
can happen when multiple CPUs are inserting commands. Rather than leave
the poor old hardware to chew through these operations, we can instead
drop the subsequent SYNCs and poll for completion of the first. This
has been shown to improve IO performance under pressure, where the
number of SYNC operations reduces by about a third:

	CMD_SYNCs reduced:	19542181
	CMD_SYNCs total:	58098548	(include reduced)
	CMDs total:		116197099	(TLBI:SYNC about 1:1)

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/arm-smmu-v3: Fix unexpected CMD_SYNC timeout · afb27ac8

Authored by Zhen Lei on Aug 19, 2018

commit 0f02477d16980938a84aba8688a4e3a303306116 upstream

The break condition of:

	(int)(VAL - sync_idx) >= 0

in the __arm_smmu_sync_poll_msi() polling loop requires that sync_idx
must be increased monotonically according to the sequence of the CMDs in
the cmdq.

However, since the msidata is populated using atomic_inc_return_relaxed()
before taking the command-queue spinlock, then the following scenario
can occur:

CPU0			CPU1
msidata=0
			msidata=1
			insert cmd1
insert cmd0
			smmu execute cmd1
smmu execute cmd0
			poll timeout, because msidata=1 is overridden by
			cmd0, that means VAL=0, sync_idx=1.

This is not a functional problem, since the caller will eventually either
timeout or exit due to another CMD_SYNC, however it's clearly not what
the code is supposed to be doing. Fix it, by incrementing the sequence
count with the command-queue lock held, allowing us to drop the atomic
operations altogether.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
[will: dropped the specialised cmd building routine for now]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>
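
The wrap-safe comparison used by the polling loop, and the reordering scenario from this commit message, can be sketched in plain userspace C. This is only an illustration, not the driver's code: the helper name sync_seq_reached is hypothetical, and the sketch assumes a 32-bit sequence counter and the usual two's-complement conversion to int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true once the observed value 'val' has reached or
 * passed 'sync_idx'. The subtraction is done modulo 2^32, so the test
 * keeps working when the counter wraps around.
 * (The cast relies on two's-complement conversion, as on common compilers.)
 */
bool sync_seq_reached(uint32_t val, uint32_t sync_idx)
{
	return (int32_t)(val - sync_idx) >= 0;
}
```

With the ordering from the table, the last value written is 0 while CPU1 waits for 1, so sync_seq_reached(0, 1) is false and CPU1 keeps polling until it times out. The comparison is only meaningful if sequence numbers enter the queue in increasing order, which is what taking the increment under the lock guarantees.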

iommu/io-pgtable-arm: Fix race handling in split_blk_unmap() · 4fed8b77

Authored by Robin Murphy on Sep 06, 2018

commit 85c7a0f1ef624ef58173ef52ea77780257bdfe04 upstream

In removing the pagetable-wide lock, we gained the possibility of the
vanishingly unlikely case where we have a race between two concurrent
unmappers splitting the same block entry. The logic to handle this is
fairly straightforward - whoever loses the race frees their partial
next-level table and instead dereferences the winner's newly-installed
entry in order to fall back to a regular unmap, which intentionally
echoes the pre-existing case of recursively splitting a 1GB block down
to 4KB pages by installing a full table of 2MB blocks first.

Unfortunately, the chump who implemented that logic failed to update the
condition check for that fallback, meaning that if said race occurs at
the last level (where the loser's unmap_idx is valid) then the unmap
won't actually happen. Fix that to properly account for both the race
and recursive cases.

Fixes: 2c3d273e ("iommu/io-pgtable-arm: Support lockless operation")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
[will: re-jig control flow to avoid duplicate cmpxchg test]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

iommu/arm-smmu-v3: Fix a couple of minor comment typos · fcc22750

Authored by John Garry on Aug 17, 2018

commit 657135f3108122556c3cf60a78c6f0e76aeb60e6 upstream

Fix some comment typos spotted.

Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Zou Cao <zoucao@linux.alibaba.com>
Reviewed-by: Baoyou Xie <xie.baoyou@linux.alibaba.com>

25 Sep, 2019 · 2 commits

scsi: mpt3sas_ctl: fix double-fetch bug in _ctl_ioctl_main() · d7ae59d8

Authored by Gen Zhang on May 30, 2019

commit f9e3ebeea4521652318af903cddeaf033527e93e upstream.

In _ctl_ioctl_main(), 'ioctl_header' is fetched the first time from
userspace. 'ioctl_header.ioc_number' is then checked. The legal result is
saved to 'ioc'. Then, in condition MPT3COMMAND, the whole struct is fetched
again from the userspace. Then _ctl_do_mpt_command() is called, 'ioc' and
'karg' as inputs.

However, a malicious user can change the 'ioc_number' between the two
fetches, which is a potential security issue: the user can provide a
valid 'ioc_number' to pass the check in the first fetch, and then modify
it before the second fetch.

To fix this, we need to recheck the 'ioc_number' in the second fetch.

Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Acked-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
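
The double-fetch pattern and the recheck can be sketched in userspace C. Everything here is a hypothetical stand-in (the demo_* names, the structure layout, and the callback that models a racing user thread); it is not the mpt3sas code, only an illustration of why the second copy has to be compared against the value that was validated.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the ioctl structures. */
struct demo_header {
	uint32_t ioc_number;
};

struct demo_command {
	struct demo_header hdr;
	uint32_t payload;
};

#define DEMO_EINVAL 22

/* Stand-in for copy_from_user(): the source may change between calls. */
void demo_fetch(void *dst, const void *user_src, size_t len)
{
	memcpy(dst, user_src, len);
}

/*
 * 'between_fetches' models a racing user thread; pass NULL for a
 * well-behaved caller. Returns 0 on success, or -DEMO_EINVAL if
 * ioc_number changed after it was validated.
 */
int demo_ioctl(struct demo_command *user_buf,
	       void (*between_fetches)(struct demo_command *))
{
	struct demo_header hdr;
	struct demo_command karg;

	demo_fetch(&hdr, user_buf, sizeof(hdr));	/* first fetch */
	/* ... hdr.ioc_number would be validated here ... */

	if (between_fetches)
		between_fetches(user_buf);

	demo_fetch(&karg, user_buf, sizeof(karg));	/* second fetch */

	/* The recheck: the second copy must match what was validated. */
	if (karg.hdr.ioc_number != hdr.ioc_number)
		return -DEMO_EINVAL;

	return 0;
}

/* Example attacker: swaps in a different ioc_number after validation. */
void demo_attacker(struct demo_command *user_buf)
{
	user_buf->hdr.ioc_number = 7;
}
```

Calling demo_ioctl() with a NULL callback succeeds, while passing demo_attacker makes the recheck fail instead of letting the unvalidated number through.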

clk-sunxi: fix a missing-check bug in sunxi_divs_clk_setup() · fffdfdf5

Authored by Gen Zhang on May 28, 2019

commit fcdf445ff42f036d22178b49cf64e92d527c1330 upstream.

In sunxi_divs_clk_setup(), 'derived_name' is allocated by kstrndup(),
which returns NULL on failure, so 'derived_name' should be checked.

Signed-off-by: Gen Zhang <blackgod016574@gmail.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
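
The missing check can be illustrated with a userspace analogue. The names demo_strndup and demo_setup are hypothetical, and cutting the clock name at the first '_' is an assumption made for the example rather than a statement about the driver.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for kstrndup(): returns NULL if allocation fails. */
char *demo_strndup(const char *s, size_t max)
{
	size_t len = 0;
	char *copy;

	while (len < max && s[len] != '\0')
		len++;

	copy = malloc(len + 1);
	if (!copy)
		return NULL;

	memcpy(copy, s, len);
	copy[len] = '\0';
	return copy;
}

/*
 * Hypothetical setup step: derive a base name by cutting 'clk_name' at
 * the first '_'. Returns 0 on success, -1 if the copy could not be
 * allocated. The caller owns *out and must free() it.
 */
int demo_setup(const char *clk_name, char **out)
{
	const char *sep = strchr(clk_name, '_');
	size_t n = sep ? (size_t)(sep - clk_name) : strlen(clk_name);
	char *derived_name = demo_strndup(clk_name, n);

	if (!derived_name)	/* the kind of check the patch adds */
		return -1;

	*out = derived_name;
	return 0;
}
```

Without the NULL test, a failed allocation would be passed on and dereferenced later; with it, the error is reported at the point where it happens.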

20 Sep, 2019 · 2 commits

e1000e: increase pause and refresh time · 3bdf742f

Authored by Miguel Bernal Marin on Mar 27, 2017

commit f74dc880098b4a29f76d756b888fb31d81ad9a0c upstream.

Suggested-by: Tim Pepper <timothy.c.pepper@linux.intel.com>
Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

reduce e1000e boot time by tightening sleep ranges · 229b67b5

Authored by Arjan van de Ven on Jul 25, 2016

commit ab6973aed6200510662856afce5e3d1e386b7b64 upstream.

The e1000e driver is a great user of the usleep_range() API,
and has many nice ranges that in principle help power management.

However the ranges that are used only during system startup are
very long (and can add easily 100 msec to the boot time) while
the power savings of such long ranges is irrelevant due to the
one-off, boot only, nature of these functions.

This patch shrinks some of the longest ranges to be shorter
(while still using a power friendly 1 msec range); this saves
100msec+ of boot time on my BDW NUCs

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

18 Sep, 2019 · 1 commit

vhost: make sure log_num < in_num · 1bd2872a

Authored by yongduan on Sep 11, 2019

commit 060423bfdee3f8bc6e2c1bac97de24d5415e2bc4 upstream.

The code assumes log_num < in_num everywhere, and that is true as long as
in_num is incremented by descriptor iov count, and log_num by 1. However
this breaks if there's a zero sized descriptor.

As a result, if a malicious guest creates a vring desc with desc.len = 0,
it may cause the host kernel to crash by overflowing the log array. This
bug can be triggered during the VM migration.

There's no need to log when desc.len = 0, so just don't increment log_num
in this case.

Fixes: 3a4d5c94 ("vhost_net: a kernel-level virtio server")
Cc: stable@vger.kernel.org
Reviewed-by: Lidong Chen <lidongchen@tencent.com>
Signed-off-by: ruippan <ruippan@tencent.com>
Signed-off-by: yongduan <yongduan@tencent.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
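
The counting rule described in this commit message can be sketched as follows. The descriptor layout and the demo_* names are hypothetical simplifications, not the vhost data structures; the point is only that a zero-length descriptor adds nothing to in_num, so it must not add to log_num either.

```c
#include <stddef.h>

/* Hypothetical, simplified descriptor: 'len' bytes over 'iov_count' iovecs. */
struct demo_desc {
	unsigned int len;
	unsigned int iov_count;
};

struct demo_counts {
	unsigned int in_num;
	unsigned int log_num;
};

/*
 * in_num grows by each descriptor's iov count, log_num by one per logged
 * descriptor. Zero-length descriptors are skipped for logging, so log_num
 * cannot run ahead of in_num.
 */
struct demo_counts demo_count(const struct demo_desc *descs, size_t n)
{
	struct demo_counts c = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		c.in_num += descs[i].iov_count;
		if (descs[i].len != 0)
			c.log_num++;
	}
	return c;
}
```

For the descriptors {4096, 1}, {0, 0}, {0, 0}, {8192, 2} this yields in_num = 3 and log_num = 2. Counting the two empty descriptors as well would give log_num = 4, more entries than the log array was sized for.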

19 Aug, 2019 · 9 commits

dm raid: fix false -EBUSY when handling check/repair message · e359ea01

Authored by Heinz Mauelshagen on Dec 18, 2018

commit 74694bcbdf7e28a5ad548cdda9ac56d30be00d13 upstream.

Sending a check/repair message infrequently leads to -EBUSY instead of
properly identifying an active resync.  This occurs because
raid_message() is testing recovery bits in a racy way.

Fix by calling decipher_sync_action() from raid_message() to properly
identify the idle state of the RAID device.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

make 'user_access_begin()' do 'access_ok()' · 6342e75a

Authored by Linus Torvalds on Jan 04, 2019

commit 594cc251fdd0d231d342d88b2fdff4bc42fb0690 upstream.

Originally, the rule used to be that you'd have to do access_ok()
separately, and then user_access_begin() before actually doing the
direct (optimized) user access.

But experience has shown that people then decide not to do access_ok()
at all, and instead rely on it being implied by other operations or
similar.  Which makes it very hard to verify that the access has
actually been range-checked.

If you use the unsafe direct user accesses, hardware features (either
SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged
Access Never - on ARM) do force you to use user_access_begin().  But
nothing really forces the range check.

By putting the range check into user_access_begin(), we actually force
people to do the right thing (tm), and the range check will be visible
near the actual accesses.  We have way too long a history of people
trying to avoid them.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[ Shile: fix the following conflicts by adding a dummy argument ]
Conflicts:
	kernel/compat.c
	kernel/exit.c
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>

i915: fix missing user_access_end() in page fault exception case · 89b31e43

Authored by Linus Torvalds on Jan 04, 2019

commit 0b2c8f8b6b0c7530e2866c95862546d0da2057b0 upstream.

When commit fddcd00a49e9 ("drm/i915: Force the slow path after a
user-write error") unified the error handling for various user access
problems, it didn't do the user_access_end() that is needed for the
unsafe_put_user() case.

It's not a huge deal: a missed user_access_end() will only mean that
SMAP protection isn't active afterwards, and for the error case we'll be
returning to user mode soon enough anyway.  But it's wrong, and adding
the proper user_access_end() is trivial enough (and doing it for the
other error cases where it isn't needed doesn't hurt).

I noticed it while doing the same prep-work for changing
user_access_begin() that precipitated the access_ok() changes in commit
96d4f267e40f ("Remove 'type' argument from access_ok() function").

Fixes: fddcd00a49e9 ("drm/i915: Force the slow path after a user-write error")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: stable@kernel.org # v4.20
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

drm/i915: Force the slow path after a user-write error · 7e6c8a93

Authored by Chris Wilson on Sep 03, 2018

commit fddcd00a49e9122a3579247151e9cb3ce5a1a36e upstream.

If we fail to write the user relocation back when it is changed, force
ourselves to take the slow relocation path where we can handle faults in
the write path. There is still an element of dubiousness as having
patched up the batch to use the correct offset, it no longer matches the
presumed_offset in the relocation, so a second pass may miss any changes
in layout.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180903083337.13134-3-chris@chris-wilson.co.uk
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

random: speed up the initialization of module · e1064533

Authored by Xingjun Liu on Jun 26, 2019

During the module initialization phase, entropy is now added to the
entropy pool on every interrupt, which should speed up initialization
of the random module.

Before optimization:
[   22.180236] random: crng init done

After optimization:
[    1.474832] random: crng init done

Signed-off-by: Xingjun Liu <xingjun.lxj@alibaba-inc.com>
Reviewed-by: Liu Jiang <gerry@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>

random: introduce the initialization seed · 8c233ea3

Authored by Xingjun Liu on Jun 24, 2019

Add random entropy, passed in as a module parameter, as the
initialization seed when the kernel starts up.

A guest OS running in a VM has less entropy available, which causes
the random module to initialize very slowly. If an application running
in the guest OS requests a certain amount of random numbers during the
initialization phase, it will block.

This patch allows the VMM to provide a certain amount of random seed
when starting the guest OS, speeding up the initialization of the
guest OS random module.

Before optimization:
[   22.180236] random: crng init done

After optimization:
[    1.553362] random: crng init done

Signed-off-by: Xingjun Liu <xingjun.lxj@alibaba-inc.com>
Reviewed-by: Liu Jiang <gerry@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Jia Zhang <zhang.jia@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>

cpufreq/intel_pstate: Load only on Intel hardware · 748fc02e

Authored by Borislav Petkov on Jul 29, 2019

commit 4ab526468344c11d2d1807ae95feb1f5305dc014 upstream.

This driver is Intel-only so loading on anything which is not Intel is
pointless. Prevent it from doing so.

While at it, correct the "not supported" print statement to say CPU
"model" which is what that test does.

Fixes: 076b862c7e44 (cpufreq: intel_pstate: Add reasons for failure and debug messages)
Suggested-by: Erwan Velu <e.velu@criteo.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>

cpufreq: intel_pstate: Add reasons for failure and debug messages · 3bc2474e

Authored by Erwan Velu on Jul 29, 2019

commit 076b862c7e4409d2dcacfda19f7eaf8d07ab9200 upstream.

The init code path has several exceptions where the driver can
decide not to load.

As CONFIG_X86_INTEL_PSTATE is generally set to Y, the return code is
not reachable.  The initialization code is also not verbose about the
reason why it chose to exit prematurely, so it is difficult for
a user to determine, on a given platform, why the driver didn't load
properly.

This patch is about reporting to the user the reason/context of why
the driver failed to load.  That is a precious hint when debugging
a platform.

Signed-off-by: Erwan Velu <e.velu@criteo.com>
[ rjw: Subject & changelog, minor fixups ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>

cpufreq: intel_pstate: Force HWP min perf before offline · 5602eeb8

Authored by Srinivas Pandruvada on Jul 29, 2019

commit af3b7379e2d709f2d7c6966b8a6f5ec6bd134241 upstream.

Force HWP Request MAX = HWP Request MIN = HWP Capability MIN and EPP to
0xFF. In this way the performance limits on the offlined CPU will not
influence performance limits on its sibling CPU, which is still online.

If the sibling CPU is calling for higher performance, it will impact the
max core performance. Here core performance will follow higher of the
performance requests from each sibling.

Reported-and-tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>

17 Aug, 2019 · 12 commits

dm: add missing trace_block_split() to __split_and_process_bio() · 77570621

Authored by Mike Snitzer on Jan 18, 2019

commit 075c18c3e124a1511ebc10a89f1858c8a77dcb01 upstream.

Provides useful context about bio splits in blktrace.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

dm: fix dm_wq_work() to only use __split_and_process_bio() if appropriate · ff1ac8f0

Authored by Mike Snitzer on Jan 17, 2019

commit 6548c7c538e5658cbce686c2dd1a9b4f5398bf34 upstream.

Otherwise targets that don't support/expect IO splitting could resubmit
bios using code paths with unnecessary IO splitting complexity.

Depends-on: 24113d487843 ("dm: avoid indirect call in __dm_make_request")
Fixes: 978e51ba ("dm: optimize bio-based NVMe IO submission")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

dm: avoid indirect call in __dm_make_request · f75c099f

Authored by Mikulas Patocka on Nov 06, 2018

commit 24113d4878439baf1f23c1a33dfcc340fba66e97 upstream.

Indirect calls are inefficient because of retpolines that are used for
spectre workaround. This patch replaces an indirect call with a condition
(that can be predicted by the branch predictor).

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
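
The transformation described here, replacing a call through a function pointer with a predictable branch between direct calls, might look roughly like this. All demo_* names, the two modes, and the placeholder handlers are hypothetical; this is not the DM code.

```c
/* Hypothetical queue modes and device; names are illustrative only. */
enum demo_queue_mode { DEMO_MODE_BIO, DEMO_MODE_NVME_BIO };

struct demo_dev {
	enum demo_queue_mode mode;
	int (*process)(int sectors);	/* "before": called indirectly */
};

/* Placeholder handlers that just return distinguishable values. */
int demo_process_split(int sectors)  { return sectors / 2; }
int demo_process_direct(int sectors) { return sectors; }

/*
 * Before: every request goes through a function pointer, which becomes
 * a retpoline when Spectre v2 mitigations are enabled.
 */
int demo_submit_indirect(const struct demo_dev *dev, int sectors)
{
	return dev->process(sectors);
}

/* After: a plain conditional selects between two direct calls. */
int demo_submit_branch(const struct demo_dev *dev, int sectors)
{
	if (dev->mode == DEMO_MODE_NVME_BIO)
		return demo_process_direct(sectors);
	return demo_process_split(sectors);
}
```

Both variants return the same result for a given device; the second trades a little generality for a branch the predictor can learn.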

dm: fix redundant IO accounting for bios that need splitting · bfaa531b

Authored by Mike Snitzer on Jan 17, 2019

commit a1e1cb72d96491277ede8d257ce6b48a381dd336 upstream.

[Joseph: cherry-pick part_stat_get() from commit 1226b8dd0e91 ("block:
switch to per-cpu in-flight counters") since we don't want the whole
patch series get involved.]

The risk of redundant IO accounting was not taken into consideration
when commit 18a25da8 ("dm: ensure bio submission follows a
depth-first tree walk") introduced IO splitting in terms of recursion
via generic_make_request().

Fix this by subtracting the split bio's payload from the IO stats that
were already accounted for by start_io_acct() upon dm_make_request()
entry.  This repeat oscillation of the IO accounting, up then down,
isn't ideal but refactoring DM core's IO splitting to pre-split bios
_before_ they are accounted turned out to be an excessive amount of
change that will need a full development cycle to refine and verify.

Before this fix:

  /dev/mapper/stripe_dev is a 4-way stripe using a 32k chunksize, so
  bios are split on 32k boundaries.

  # fio --name=16M --filename=/dev/mapper/stripe_dev --rw=write --bs=64k --size=16M \
    	--iodepth=1 --ioengine=libaio --direct=1 --refill_buffers

  with debugging added:
  [103898.310264] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=0 len=128
  [103898.318704] device-mapper: core: __split_and_process_bio: recursing for following split bio:
  [103898.329136] device-mapper: core: start_io_acct: dm-2 WRITE bio->bi_iter.bi_sector=64 len=64
  ...

  16M written yet 136M (278528 * 512b) accounted:
  # cat /sys/block/dm-2/stat | awk '{ print $7 }'
  278528

After this fix:

  16M written and 16M (32768 * 512b) accounted:
  # cat /sys/block/dm-2/stat | awk '{ print $7 }'
  32768

Fixes: 18a25da8 ("dm: ensure bio submission follows a depth-first tree walk")
Cc: stable@vger.kernel.org # 4.16+
Reported-by: Bryan Gurney <bgurney@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
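
The sector counts quoted in this commit message convert to the stated sizes as follows. The helper name is hypothetical; the only assumption is that the stat file reports 512-byte sectors, as the "(278528 * 512b)" annotation in the message indicates.

```c
#include <stdint.h>

#define DEMO_SECTOR_BYTES 512u	/* sector size used by the stat counters */

/* Convert a sector count to whole MiB (rounding down). */
uint64_t demo_sectors_to_mib(uint64_t sectors)
{
	return sectors * DEMO_SECTOR_BYTES / (1024u * 1024u);
}
```

278528 sectors come to 136 MiB and 32768 sectors to 16 MiB, matching the before and after figures for the 16M write.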

dm: fix clone_bio() to trigger blk_recount_segments() · 65f839dd

Authored by Mike Snitzer on Jan 16, 2019

commit 57c36519e4b949f89381053f7283f5d605595b42 upstream.

DM's clone_bio() now benefits from using bio_trim() by fixing the fact
that clone_bio() wasn't clearing BIO_SEG_VALID like bio_trim() does;
which triggers blk_recount_segments() via bio_phys_segments().

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

include/: refactor headers to allow kthread.h inclusion in psi_types.h · a9c1573c

Authored by Suren Baghdasaryan on May 14, 2019

commit 8af0c18af1425fc70686c0fdcfc0072cd8431aa0 upstream.

kthread.h can't be included in psi_types.h because it creates a circular
inclusion with kthread.h eventually including psi_types.h and
complaining on kthread structures not being defined because they are
defined further in the kthread.h.  Resolve this by removing psi_types.h
inclusion from the headers included from kthread.h.

Link: http://lkml.kernel.org/r/20190319235619.260832-7-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

sched: loadavg: consolidate LOAD_INT, LOAD_FRAC, CALC_LOAD · b0406fce

Authored by Johannes Weiner on Oct 26, 2018

commit 8508cf3ffad4defa202b303e5b6379efc4cd9054 upstream.

There are several definitions of those functions/macros in places that
mess with fixed-point load averages.  Provide an official version.

[akpm@linux-foundation.org: fix missed conversion in block/blk-iolatency.c]
Link: http://lkml.kernel.org/r/20180828172258.3185-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Daniel Drake <drake@endlessm.com>
Cc: Christopher Lameter <cl@linux.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Weiner <jweiner@fb.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Enderborg <peter.enderborg@sony.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Joseph: use stat.mean instead of stat->rqs.mean to solve conflict]
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

Conflicts:
    block/blk-iolatency.c
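
For readers unfamiliar with fixed-point load averages, the helpers named in this commit's title can be sketched as below. LOAD_INT and LOAD_FRAC follow the names in the title; the 11-bit fraction (FSHIFT), the demo_calc_load function, and the decay factor 1884 used in the note after the block are assumptions for illustration and should be checked against the kernel headers.

```c
#define FSHIFT		11			/* assumed bits of fraction */
#define FIXED_1		(1UL << FSHIFT)		/* 1.0 in fixed point */

/* Integer part and two-digit fractional part of a fixed-point value. */
#define LOAD_INT(x)	((x) >> FSHIFT)
#define LOAD_FRAC(x)	LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/*
 * Exponentially decayed average: 'exp' is the fixed-point decay factor,
 * 'active' the current fixed-point sample. Rounds up while the load is
 * rising so the average can actually reach the sample value.
 */
unsigned long demo_calc_load(unsigned long load, unsigned long exp,
			     unsigned long active)
{
	unsigned long newload = load * exp + active * (FIXED_1 - exp);

	if (active >= load)
		newload += FIXED_1 - 1;

	return newload / FIXED_1;
}
```

A value of 3072 (1.5 * FIXED_1) prints as "1.50" via LOAD_INT and LOAD_FRAC. Starting from zero load with one runnable task and a decay factor of 1884, one update moves the average to 164, roughly 0.08.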

PCI: Fix "try" semantics of bus and slot reset · 6987a10b

Authored by Alex Williamson on May 24, 2019

commit ddefc033eecf23f1e8b81d0663c5db965adf5516 upstream

The commit referenced below introduced device locking around save and
restore of state for each device during a PCI bus "try" reset, making
it decidedly non-"try" and prone to deadlock in the event that a device
is already locked. Restore __pci_reset_bus() and __pci_reset_slot()
to their advertised locking semantics by pushing the save and restore
functions into the branch where the entire tree is already locked.
Extend the helper function names with "_locked" and update the comment
to reflect this calling requirement.

Fixes: b014e96d ("PCI: Protect pci_error_handlers->reset_notify() usage with device_lock()")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zhiyuan Hou <zhiyuan2048@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>

6987a10b

virtio_blk: add discard and write zeroes support · 311efc03

Committed by Changpeng Liu on Nov 01, 2018

commit 1f23816b8eb8fdc39990abe166c10a18c16f6b21 upstream.

In commit 88c85538, "virtio-blk: add discard and write zeroes features
to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
block specification has been extended to add VIRTIO_BLK_T_DISCARD and
VIRTIO_BLK_T_WRITE_ZEROES commands.  This patch enables support for
discard and write zeroes in the virtio-blk driver when the device
advertises the corresponding features, VIRTIO_BLK_F_DISCARD and
VIRTIO_BLK_F_WRITE_ZEROES.
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>

311efc03

eci: drivers/virtio: add vring_force_dma_api boot param · 64b5a541

Committed by Eryu Guan on Dec 24, 2018

Prior to the xdragon platform 20181230 release (e.g. the 0930 release),
vring_use_dma_api() is required to return 'true' unconditionally.

Introduce a new kernel boot parameter, "vring_force_dma_api", to
control this behavior. Boot xdragon hosts with "vring_force_dma_api" on
the kernel command line to make ENI hotplug work; normal ECS hosts keep
the original behavior.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>

64b5a541

boot: give rdrand some credit · 5f00d7ad

Committed by Arjan van de Ven on Jul 29, 2016

Cherry-pick from clear-linux patches:
https://github.com/clearlinux-pkgs/linux-kvm/0104-give-rdrand-some-credit.patch

try to credit rdrand/rdseed with some entropy

In VMs, and even on modern hardware, we're super starved for entropy, and while we can
and do wear a tin foil hat, it's very hard to argue that
rdrand and rdtsc add zero entropy.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>

5f00d7ad

NEMU: Compile in evged always · 91e41111

Committed by Arjan van de Ven on Aug 10, 2018

Cherry-pick from kata-container patches:
https://github.com/kata-containers/packaging/tree/master/kernel/patches/0002-Compile-in-evged-always.patch

We need evged for NEMU (and in general for hw reduced)

The config option cannot be set normally since it breaks all
regular systems, and hardware reduced is really a runtime choice.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>

91e41111

Aug 16, 2019 · 2 commits

iwlwifi: mvm: fix version check for GEO_TX_POWER_LIMIT support · ac295111

Committed by Luca Coelho on Jul 19, 2019

commit f5a47fae6aa3eb06f100e701d2342ee56b857bee upstream.

We erroneously added a check for FW API version 41 before sending
GEO_TX_POWER_LIMIT, but this was already implemented in version 38.
Additionally, it was cherry-picked to older versions, namely 17, 26
and 29, so check for those as well.
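The corrected condition described above can be sketched as a small predicate. This is only an illustration of the version rule stated in this message: the helper name geo_tx_power_limit_supported() and its plain integer argument are hypothetical, and the real driver derives the version from its firmware structures.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not the driver's actual function: returns true
 * when the given firmware API version is expected to accept the
 * GEO_TX_POWER_LIMIT command, i.e. version 38 or newer, or one of the
 * older versions that received the backport (17, 26 and 29).
 */
static inline bool geo_tx_power_limit_supported(unsigned int fw_api_ver)
{
	return fw_api_ver >= 38 ||
	       fw_api_ver == 17 ||
	       fw_api_ver == 26 ||
	       fw_api_ver == 29;
}
```

With this rule a caller would skip the command, or return an error to userspace, for versions such as 30 or 37 while still sending it on 17, 26, 29 and anything from 38 up.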

Cc: stable@vger.kernel.org
Fixes: eca1e56ceedd ("iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT to old firmwares")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ac295111

iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT on version < 41 · 6a81677a

Committed by Luca Coelho on Jun 24, 2019

commit 39bd984c203e86f3109b49c2a2e20677c4d3ab65 upstream.

Firmware versions before 41 don't support the GEO_TX_POWER_LIMIT
command, and sending it to the firmware will cause a firmware crash.
We allow this via debugfs, so we need to return an error value in case
it's not supported.

This had already been fixed during init, when we send the command if
the ACPI WGDS table is present.  Fix it also for the other,
userspace-triggered case.

Cc: stable@vger.kernel.org
Fixes: 7fe90e0e ("iwlwifi: mvm: refactor geo init")
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

6a81677a

openanolis / cloud-kernel · synced successfully more than 1 year ago