- 07 Jan 2022, 30 commits
-
By Like Xu
mainline inclusion from mainline-v5.12-rc1 commit 9aa4f622 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9aa4f622460f9287e57804dbeb219bfef29f04a1 ------------------- The vPMU uses GUEST_LBR_IN_USE_IDX (bit 58) in 'pmu->pmc_in_use' to indicate whether a guest LBR event is still needed by the vcpu. If the vcpu no longer accesses the LBR-related registers within a scheduling time slice, and the LBR enable bit has been unset, the vPMU treats the guest LBR event as a bland event of a vPMC counter and releases it as usual. The pass-through state of the LBR records MSRs is cancelled as well. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-10-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Like Xu
mainline inclusion from mainline-v5.12-rc1 commit e6209a3b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6209a3bef793e8fe29c873a7612023916eaa611 ------------------- The current vPMU only supports Architecture Version 2. According to Intel SDM "17.4.7 Freezing LBR and Performance Counters on PMI", if IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual PMI, and KVM emulates this by clearing the LBR bit (bit 0) in IA32_DEBUGCTL. The guest then needs to re-enable IA32_DEBUGCTL.LBR to resume recording branches. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-9-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
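A minimal sketch of the emulation described above, assuming the standard VMCS accessors and the existing DEBUGCTL bit definitions; the helper name is illustrative, not necessarily the one used in the patch:

    /* On a virtual PMI with Freeze_LBR_On_PMI set, clear the LBR
     * enable bit in the guest's IA32_DEBUGCTL, as real hardware would. */
    static void freeze_lbrs_on_pmi(struct kvm_vcpu *vcpu)
    {
        u64 debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);

        if (debugctl & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
            debugctl &= ~DEBUGCTLMSR_LBR;   /* bit 0: LBR enable */
            vmcs_write64(GUEST_IA32_DEBUGCTL, debugctl);
        }
    }

The guest observes the cleared bit and must write DEBUGCTLMSR_LBR back to resume branch recording.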
-
By Like Xu
mainline inclusion from mainline-v5.12-rc1 commit 9254beaa category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9254beaafd12e27d48149fab3b16db372bc90ad7 ------------------- When the LBR records MSRs have already been passed through, there is no need to call vmx_update_intercept_for_lbr_msrs() again and again, and vice versa. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-8-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
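A guard like the following is enough; the field and helper names here are assumptions based on the description:

    static void lbr_msrs_pass_through(struct kvm_vcpu *vcpu, bool set)
    {
        struct lbr_desc *lbr = vcpu_to_lbr_desc(vcpu);

        if (lbr->msr_passthrough == set)    /* already in desired state */
            return;

        vmx_update_intercept_for_lbr_msrs(vcpu, set);
        lbr->msr_passthrough = set;
    }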
-
By Like Xu
mainline inclusion from mainline-v5.12-rc1 commit 1b5ac322 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b5ac3226a1aa071135fe0ee5d1055d9e88b717c ------------------- In addition to DEBUGCTLMSR_LBR, any KVM trap caused by an access to the LBR MSRs results in the creation of a per-vcpu guest LBR event. If the guest LBR event is scheduled in with the corresponding vcpu context, KVM passes through all LBR records MSRs to the guest. The LBR callstack mechanism implemented in the host helps save/restore the guest LBR records during the event context switches, which saves a lot of overhead compared with saving/restoring tens of LBR MSRs (e.g. 32 LBR records entries) in the much more frequent VMX transitions. To avoid reclaiming LBR resources from any higher-priority event on the host, KVM always checks the existence and state of the guest LBR event before vm-entry, as late as possible. A negative result cancels the pass-through state, and it also prevents real register accesses and potential data leakage. If the host reclaims the LBR between two checks, the interception state and LBR records can be safely preserved thanks to the native save/restore support of the guest LBR event. KVM emits a pr_warn() when the LBR hardware is unavailable to the guest LBR event. The administrator is supposed to remind users that the guest result may be inaccurate if someone is using LBR to record the hypervisor on the host side. Suggested-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-7-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Like Xu
mainline inclusion from mainline-v5.12-rc1 commit 8e12911b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e12911b243e485f5e4c7c5fbc79cdf185728700 ------------------- When the vcpu sets DEBUGCTLMSR_LBR in MSR_IA32_DEBUGCTLMSR, the KVM handler creates a guest LBR event which enables the callstack mode; no hardware counter is assigned to it. The host perf schedules and enables this event as usual, but in an exclusive way. The guest LBR event is released when the vPMU is reset; soon, the lazy release mechanism will be applied to this event just like to a vPMC. Suggested-by: Andi Kleen <ak@linux.intel.com> Co-developed-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-6-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
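Roughly, such an event can be requested through the ordinary perf kernel-counter API; a sketch under the assumption that the fixed VLBR config and the callstack branch-sample flags shown here are the relevant ones:

    struct perf_event_attr attr = {
        .type   = PERF_TYPE_RAW,
        .size   = sizeof(attr),
        .pinned = true,                     /* scheduled exclusively */
        .exclude_host = true,
        .config = INTEL_FIXED_VLBR_EVENT,   /* no generic counter used */
        .sample_type = PERF_SAMPLE_BRANCH_STACK,
        .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
                              PERF_SAMPLE_BRANCH_USER,
    };
    struct perf_event *event =
        perf_event_create_kernel_counter(&attr, -1, current, NULL, NULL);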
-
By Like Xu
mainline inclusion from mainline-v5.12-rc1 commit c6462363 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c646236344e9054cc84cd5a9f763163b9654cf7e ------------------- Userspace could set bits [0, 5] of the IA32_PERF_CAPABILITIES MSR, which describe the record format stored in the LBR records. LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Paolo Bonzini
mainline inclusion from mainline-v5.12-rc1 commit 9c9520ce category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c9520ce883386dc3794c7d60204487ff1db09cb ------------------- Userspace could set bits [0, 5] of the IA32_PERF_CAPABILITIES MSR, which describe the record format stored in the LBR records. LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: Like Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Paolo Bonzini
mainline inclusion from mainline-v5.12-rc1 commit a7557539 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a755753903a40d982f6dd23d65eb96b248a2577a ------------------- Once MSR_IA32_PERF_CAPABILITIES is changed via vmx_set_msr(), the value should not be changed by cpuid(). To ensure that the new value is kept, the default initialization path is moved to intel_pmu_init(). The effective value of the MSR will be 0 if PDCM is clear, however. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Like Xu
mainline inclusion from mainline-v5.12-rc1 commit 252e365e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=252e365eb28ddf49eb31cec1a5d99e708c73f57b ------------------- To make code responsibilities clear, we may reuse and invoke vmx_set_intercept_for_msr() in other VMX-specific files (e.g. pmu_intel.c), so expose it in order to pass through the LBR MSRs later. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-2-like.xu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Like Xu
mainline inclusion from mainline-v5.12-rc1 commit d855066f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d855066f81726155caf766e47eea58ae10b1fd57 ------------------- SVM already has specific handlers for MSR_IA32_DEBUGCTLMSR in svm_get/set_msr, so the x86 common part can be safely moved to VMX. This allows KVM to store the bits it supports in GUEST_IA32_DEBUGCTL. Add vmx_supported_debugctl() to refactor the #GP injection logic. Signed-off-by: Like Xu <like.xu@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Message-Id: <20210108013704.134985-2-like.xu@linux.intel.com> [Merge parts of Chenyi Qiang's "KVM: X86: Expose bus lock debug exception to guest". - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: WangJian <wangjian161@huawei.com> Reviewed-by: Wei Li <liwei391@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Wang Wensheng
ascend inclusion category: Feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NDAW CVE: NA ------------------- Add these configs to openeuler_defconfig: CONFIG_ASCEND_CHARGE_MIGRATE_HUGEPAGES=y CONFIG_ASCEND_SHARE_POOL=y Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com> Reviewed-by: Weilong Chen <chenweilong@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Yanling Song
Ramaxel inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4OENF CVE: NA Rearrange the order and add some flag bits for future use. Signed-off-by: Yanling Song <songyl@ramaxel.com> Reviewed-by: Yang Gan <yanggan@ramaxel.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Yanling Song
Ramaxel inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4OENF CVE: NA Optimize the negotiation of driver and firmware properties: after obtaining the attributes from the firmware, intersect them with the attributes supported by the driver. Signed-off-by: Yanling Song <songyl@ramaxel.com> Reviewed-by: Yang Gan <yanggan@ramaxel.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Yanling Song
Ramaxel inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4OENF CVE: NA The firmware is responsible for allocating and releasing the RSS module ID, and the driver no longer perceives it. Signed-off-by: Yanling Song <songyl@ramaxel.com> Reviewed-by: Yang Gan <yanggan@ramaxel.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Weilong Chen
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4LGV4 CVE: NA --------------------------- Patch "cache: Workaround HiSilicon Taishan DC CVAU" breaks the verification of CPU capability when hot-plugging CPUs: it turns the system-scope capability on while the local CPU capability is still off. This patch fixes it in two steps: 1. Unset the CTR_IDC_SHIFT bit from strict_mask to skip the check. 2. Treat it specially in read_cpuid_effective_cachetype. Signed-off-by: Weilong Chen <chenweilong@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Lu Jialin
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue CVE: NA -------- This patch adds a default-false static key to disable the memcg kswapd feature. Users can enable it by setting memcg_kswapd on the kernel command line. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
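A minimal sketch of such a default-false key plus its command-line switch; the key and handler names are assumptions, only the pattern is the point:

    #include <linux/init.h>
    #include <linux/jump_label.h>

    DEFINE_STATIC_KEY_FALSE(memcg_kswapd_key);

    static int __init enable_memcg_kswapd(char *s)
    {
        static_branch_enable(&memcg_kswapd_key);
        return 1;
    }
    __setup("memcg_kswapd", enable_memcg_kswapd);

    /* Fast path: compiles to a nop until the key is enabled at boot. */
    static inline bool memcg_kswapd_enabled(void)
    {
        return static_branch_unlikely(&memcg_kswapd_key);
    }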
-
By Lu Jialin
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue CVE: NA -------- The memcg kswapd can mark the memcg dirty if the current scan finds that all pages in the memcg are unqueued dirty; kswapd then writes out the dirty pages. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Lu Jialin
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue CVE: NA -------- Since memory.high reclaim is synchronous regardless of interrupt context, it can do more work than direct reclaim, e.g. writing out dirty pages. So add the PF_KSWAPD flag, so that current_is_kswapd() returns true for the memcg kswapd. The memcg kswapd should stop when the usage of the memcg reaches the memcg kswapd stop threshold. When userland sets memcg->memory.max, the stop threshold is (memcg->memory.high - memcg->memory.max * 10 / 1000), similar to the global kswapd. Otherwise, the stop threshold is (memcg->memory.high - memcg->memory.high / 6), which resembles the typical gap between watermark_low and watermark_high. Also, the memcg kswapd should not break memory.low protection for now. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
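Expressed as code, the stop threshold described above would look roughly like this; a sketch only, with field names taken from the description rather than the actual patch:

    unsigned long high = READ_ONCE(memcg->memory.high);
    unsigned long max  = READ_ONCE(memcg->memory.max);
    unsigned long stop;

    if (max != PAGE_COUNTER_MAX)        /* userland set memory.max */
        stop = high - max * 10 / 1000;  /* ~1% of max below high */
    else                                /* like the low/high watermark gap */
        stop = high - high / 6;

So kswapd keeps reclaiming until usage drops below `stop`, leaving a safety gap under memory.high.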
-
By Lu Jialin
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue CVE: NA -------- Export memory.high from cgroup v2 to cgroup v1. Therefore, when the usage of the memcg is larger than memory.high, some pages will be reclaimed before returning to userland, which throttles the process. Only export the memory.high number in mem_cgroup_legacy_files and move the related functions in front of mem_cgroup_legacy_files. No other changes are needed. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Lu Jialin
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4IMAK?from=project-issue CVE: NA -------- Export memcg.min and memcg.low from cgroup v2 to cgroup v1, in order to reduce the negative impact between cgroups when the system memory is insufficient. Only export the memory.{min,low} numbers in mem_cgroup_legacy_files and move the related functions in front of mem_cgroup_legacy_files. No other changes are needed. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Peng Liang
euleros inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4OELI CVE: NA ------------------------------------------------- Currently 24 page flags are used on 32-bit architectures and 27 on 64-bit ones, and 23 gfp flags are used. Add 2 reserved page and gfp flags for internal extension. New flags backported from upstream are placed behind the reserved flags. Signed-off-by: Peng Liang <liangpeng10@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Lu Jialin
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4GII8?from=project-issue CVE: NA -------- We reserve some fields beforehand for cgroup_bpf_attach_type and bpf_cgroup_storage_type, which are prone to change; therefore, we can hot add/change features of bpf cgroup with this enhancement. After reserving, cache normally does not matter, as the reserved fields are not accessed at all. Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Dave Marchevsky
mainline inclusion from mainline-v5.15-rc1 commit 6fc88c35 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4GII8?from=project-issue CVE: NA ---------- Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf attach types and a function to map bpf_attach_type values to the new enum. Inspired by netns_bpf_attach_type. Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever possible. Functionality is unchanged, as the attach_type_to_prog_type switches in bpf/syscall.c were preventing non-cgroup programs from making use of the invalid cgroup_bpf array slots. As a result, struct cgroup_bpf uses 504 fewer bytes relative to when its arrays were sized using MAX_BPF_ATTACH_TYPE. bpf_cgroup_storage is notably not migrated, as struct bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type member which is not meant to be opaque. Similarly, bpf_cgroup_link continues to report its bpf_attach_type member to userspace via fdinfo and bpf_link_info. To ease disambiguation, bpf_attach_type variables are renamed from 'type' to 'atype' when changed to cgroup_bpf_attach_type. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210819092420.1984861-2-davemarchevsky@fb.com Conflicts: include/linux/bpf-cgroup.h kernel/bpf/cgroup.c net/ipv4/af_inet.c net/ipv6/af_inet6.c Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
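The shape of the new enum and its mapping helper, abridged here to two attach types; the full version covers every valid cgroup attach point:

    enum cgroup_bpf_attach_type {
        CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
        CGROUP_INET_INGRESS = 0,
        CGROUP_INET_EGRESS,
        /* ... one entry per valid cgroup attach point ... */
        MAX_CGROUP_BPF_ATTACH_TYPE
    };

    static inline enum cgroup_bpf_attach_type
    to_cgroup_bpf_attach_type(enum bpf_attach_type attach_type)
    {
        switch (attach_type) {
        case BPF_CGROUP_INET_INGRESS: return CGROUP_INET_INGRESS;
        case BPF_CGROUP_INET_EGRESS:  return CGROUP_INET_EGRESS;
        /* ... */
        default: return CGROUP_BPF_ATTACH_TYPE_INVALID;
        }
    }

Sizing the cgroup_bpf arrays with MAX_CGROUP_BPF_ATTACH_TYPE instead of MAX_BPF_ATTACH_TYPE is what yields the 504-byte saving quoted above.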
-
By Stanislav Fomichev
mainline inclusion from mainline-v5.12-rc1 commit a9ed15da category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4GII8?from=project-issue CVE: NA ------------ When we attach any cgroup hook, the rest (even if unused/unattached) start to contribute small overhead. In particular, the one we want to avoid is __cgroup_bpf_run_filter_skb, which does two redirections to get to the cgroup and pushes/pulls the skb. Let's split cgroup_bpf_enabled to be per-attach-type, to make sure only used attach types trigger. I've dropped some existing high-level cgroup_bpf_enabled checks in some places, because the BPF_PROG_CGROUP_XXX_RUN macros usually have another cgroup_bpf_enabled check. I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for GETPEERNAME/GETSOCKNAME, because the type for cgroup_bpf_enabled[type] has to be constant and known at compile time. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-4-sdf@google.com Conflict: include/linux/bpf-cgroup.h Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
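The core of the change is turning the single enabled key into a per-attach-type array, roughly:

    /* before: one key guards every cgroup hook */
    #define cgroup_bpf_enabled \
        static_branch_unlikely(&cgroup_bpf_enabled_key)

    /* after: one key per attach type, so only used hooks pay the cost */
    extern struct static_key_false
        cgroup_bpf_enabled_key[MAX_BPF_ATTACH_TYPE];
    #define cgroup_bpf_enabled(type) \
        static_branch_unlikely(&cgroup_bpf_enabled_key[type])

This is also why `type` must be a compile-time constant at every call site, which forced the BPF_CGROUP_RUN_SA_PROG_LOCK copy for GETPEERNAME/GETSOCKNAME mentioned above.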
-
By Stanislav Fomichev
mainline inclusion from mainline-v5.12-rc1 commit 20f2505f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4GII8?from=project-issue CVE: NA -------------- When we attach a bpf program to cgroup/getsockopt, any other getsockopt() syscall starts incurring kzalloc/kfree cost. Let's add a small buffer on the stack and use it for small (the majority of) {s,g}etsockopt values. The buffer is small enough to fit into a cache line and covers the majority of simple options (most of them are 4-byte ints). It seems natural to do the same for setsockopt, but it's a bit more involved when the BPF program modifies the data (where we have to kmalloc). The assumption is that for the majority of setsockopt calls (which are doing pure BPF options or applying policy) this will bring some benefit as well. Without this patch (we remove about 1% __kmalloc): 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt | --3.30%--__cgroup_bpf_run_filter_getsockopt | --0.81%--__kmalloc Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-3-sdf@google.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
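The idea in miniature; the buffer size and names below follow the description (small options stay on the stack, larger ones still take the kmalloc path):

    #define BPF_SOCKOPT_KERN_BUF_SIZE 32

    struct bpf_sockopt_buf {
        u8 data[BPF_SOCKOPT_KERN_BUF_SIZE];
    };

    /* Use the caller's on-stack buffer for small options. */
    if (max_optlen <= sizeof(buf->data)) {
        ctx->optval = buf->data;
    } else {
        ctx->optval = kzalloc(max_optlen, GFP_USER);
        if (!ctx->optval)
            return -ENOMEM;
    }

Since most getsockopt values are 4-byte ints, the 32-byte buffer covers the common case without touching the allocator.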
-
By Stanislav Fomichev
mainline inclusion from mainline-v5.11-rc1 commit 427167c0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4GII8?from=project-issue CVE: NA --------- The socket now has to be locked/unlocked around the bind hook execution. That shouldn't cause any overhead, because the socket is unbound and shouldn't receive any traffic. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20201202172516.3483656-3-sdf@google.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Zheng Zengkai
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4JBL0 CVE: NA ------------------------------ Add KABI_AUX_PTR extensions to the following base structures before the KABI freeze: struct device_driver, struct class, struct device, struct hrtimer, struct ipmi_smi_handlers, struct net_device. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
-
By Zheng Zengkai
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4K3S5 -------------------------- Generalize the naming of some KABI helper macros. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
-
By Peng Liu
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NYPZ CVE: NA ------------------------------------------------- A new flag, MEMBLOCK_MEMMAP, is added to memblock_flags; it identifies memory reserved for memmap. This flag is limited to arm64. When memmap memory is reserved by memblock_reserve, it is subsequently marked with the MEMBLOCK_MEMMAP flag, so that for_each_mem_region can find memmap memory and request resources for it. Signed-off-by: Peng Liu <liupeng256@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
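The flag slots into memblock_flags beside the existing ones; the exact bit value chosen here is an assumption:

    enum memblock_flags {
        MEMBLOCK_NONE    = 0x0,  /* no special request */
        MEMBLOCK_HOTPLUG = 0x1,  /* hotpluggable region */
        MEMBLOCK_MIRROR  = 0x2,  /* mirrored region */
        MEMBLOCK_NOMAP   = 0x4,  /* don't add to kernel direct mapping */
        MEMBLOCK_MEMMAP  = 0x8,  /* reserved for memmap (assumed value) */
    };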
-
By Peng Liu
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NYPZ CVE: NA ------------------------------------------------- Add support for the memmap kernel parameter on ARM64. The three modes below are supported: memmap=exactmap Enable setting of an exact memory map, as specified by the user. memmap=nn[KMG]@ss[KMG] Force usage of a specific region of memory. memmap=nn[KMG]$ss[KMG] The region of memory to be reserved is from ss to ss+nn; the region must lie within existing memory, otherwise it will be ignored. If users set memmap=exactmap before memmap=nn[KMG]@ss[KMG], they will get exactly the memory specified by memmap=nn[KMG]@ss[KMG]. For example, on a machine with 4GB of memory, "memmap=exactmap memmap=1G@1G" makes the kernel use only the memory from 1GB to 2GB. Signed-off-by: Peng Liu <liupeng256@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
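Parsing such a parameter builds naturally on memparse(); the sketch below simplifies error handling and the exactmap bookkeeping, and the function name is illustrative:

    static int __init parse_memmap_one(char *p)
    {
        u64 size, start;

        if (!strncmp(p, "exactmap", 8)) {
            /* drop the firmware-provided map; honor user regions only */
            return 0;
        }

        size = memparse(p, &p);
        if (*p == '@') {            /* memmap=nn@ss: force usage */
            start = memparse(p + 1, &p);
            memblock_add(start, size);
        } else if (*p == '$') {     /* memmap=nn$ss: reserve */
            start = memparse(p + 1, &p);
            memblock_reserve(start, size);
        } else {
            return -EINVAL;
        }
        return 0;
    }
    early_param("memmap", parse_memmap_one);

early_param handlers run once per occurrence, which is what lets "memmap=exactmap memmap=1G@1G" compose as described.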
-
- 06 Jan 2022, 3 commits
-
By Zheng Zengkai
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4K3S5 --------------------------- Enable CONFIG_KABI_RESERVE for the x86 and arm64 architectures in openeuler_defconfig by default. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
-
By Zheng Zengkai
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4K3S5 ----------------------- Add CONFIG_KABI_RESERVE to control whether KABI padding is reserved; for some embedded systems the KABI padding reserve may be unnecessary. Also change unsigned long to u64 to unify the basic reserve length for both 32-bit and 64-bit architectures. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
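Conceptually the macro pair looks like the following; the exact openEuler spelling may differ, so treat this as a sketch:

    #ifdef CONFIG_KABI_RESERVE
    # define KABI_RESERVE(n)   u64 kabi_reserved##n;
    #else
    # define KABI_RESERVE(n)
    #endif

    /* usage: pad a structure with slots that later patches may reuse */
    struct example {
        int x;
        KABI_RESERVE(1)
        KABI_RESERVE(2)
    };

Using u64 rather than unsigned long keeps each reserved slot 8 bytes on both 32-bit and 64-bit builds, which is the unification the commit describes.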
-
By Zheng Zengkai
hulk inclusion bugzilla: https://gitee.com/openeuler/kernel/issues/I4MZU1 CVE: NA --------------------------- For the x86 platform, with make allmodconfig & make -j64, the following two build errors are reported. First error: - build failed: - In file included from <command-line>:32:0: ./usr/include/asm/bootparam.h:45:10: fatal error: linux/kabi.h: No such file or directory #include <linux/kabi.h> ^~~~~~~~~~~~~~ compilation terminated. make[2]: *** [usr/include/asm/bootparam.hdrtest] Error 1 Second error: ./arch/x86/include/asm/paravirt_types.h:198:2: error: expected specifier-qualifier-list before ‘KABI_RESERVE’ KABI_RESERVE(1) ^~~~~~~~~~~~ ./arch/x86/include/asm/paravirt_types.h:286:2: error: expected specifier-qualifier-list before ‘KABI_RESERVE’ KABI_RESERVE(1) ^~~~~~~~~~~~ ./arch/x86/include/asm/paravirt_types.h:309:2: error: expected specifier-qualifier-list before ‘KABI_RESERVE’ KABI_RESERVE(1) ^~~~~~~~~~~~ make[1]: *** [scripts/Makefile.build:117: arch/x86/kernel/asm-offsets.s] Error 1 To fix the first error, revert commit 3ba63bac ("bootparam: Add kabi_reserve in bootparam"). To fix the second error, include kabi.h in arch/x86/include/asm/paravirt_types.h. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Xie Xiuqi <xiexiuqi@huawei.com>
-
- 04 Jan 2022, 2 commits
-
By Zheng Zengkai
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4KFY7?from=project-issue CVE: NA ------------------------------- Revert this patch to fix the following compile error: In file included from <command-line>:32: ./usr/include/linux/ptp_clock.h:135:2: error: expected specifier-qualifier-list before ‘KABI_RESERVE’ KABI_RESERVE(1) ^~~~~~~~~~~~ make[2]: *** [usr/include/Makefile:108: usr/include/linux/ptp_clock.hdrtest] Error 1 This reverts commit af9f9bb9. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Wang ShaoBo <bobo.shaobowang@huawei.com> Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
-
By Jialin Zhang
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4JBL0 CVE: NA ------------------------------- Reserve space for the structures cpu_hwcap_keys and cpu_hwcaps. Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> Reviewed-by: wangxiongfeng <wangxiongfeng2@huawei.com> Reviewed-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 31 Dec 2021, 5 commits
-
By Gustavo A. R. Silva
mainline inclusion from mainline-v5.16-rc1 category: feature bugzilla: 185747 https://gitee.com/openeuler/kernel/issues/I4OUFN CVE: NA ------------------------------- There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Use an anonymous union with a couple of anonymous structs in order to keep userspace unchanged: $ pahole -C nfs_fhbase_new fs/nfsd/nfsfh.o struct nfs_fhbase_new { union { struct { __u8 fb_version_aux; /* 0 1 */ __u8 fb_auth_type_aux; /* 1 1 */ __u8 fb_fsid_type_aux; /* 2 1 */ __u8 fb_fileid_type_aux; /* 3 1 */ __u32 fb_auth[1]; /* 4 4 */ }; /* 0 8 */ struct { __u8 fb_version; /* 0 1 */ __u8 fb_auth_type; /* 1 1 */ __u8 fb_fsid_type; /* 2 1 */ __u8 fb_fileid_type; /* 3 1 */ __u32 fb_auth_flex[0]; /* 4 0 */ }; /* 0 4 */ }; /* 0 8 */ /* size: 8, cachelines: 1, members: 1 */ /* last cacheline: 8 bytes */ }; Also, this helps with the ongoing efforts to enable -Warray-bounds by fixing the following warnings: fs/nfsd/nfsfh.c: In function ‘nfsd_set_fh_dentry’: fs/nfsd/nfsfh.c:191:41: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 191 | ntohl((__force __be32)fh->fh_fsid[1]))); | ~~~~~~~~~~~^~~ ./include/linux/kdev_t.h:12:46: note: in definition of macro ‘MKDEV’ 12 | #define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi)) | ^~ ./include/uapi/linux/byteorder/little_endian.h:40:26: note: in expansion of macro ‘__swab32’ 40 | #define __be32_to_cpu(x) __swab32((__force __u32)(__be32)(x)) | ^~~~~~~~ ./include/linux/byteorder/generic.h:136:21: note: in expansion of macro ‘__be32_to_cpu’ 136 | #define ___ntohl(x) __be32_to_cpu(x) | ^~~~~~~~~~~~~ ./include/linux/byteorder/generic.h:140:18: note: in expansion of macro ‘___ntohl’ 140 | #define ntohl(x) ___ntohl(x) | ^~~~~~~~ fs/nfsd/nfsfh.c:191:8: note: in expansion of macro ‘ntohl’ 191 | ntohl((__force __be32)fh->fh_fsid[1]))); | ^~~~~ fs/nfsd/nfsfh.c:192:32: warning: array subscript 2 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 192 | fh->fh_fsid[1] = fh->fh_fsid[2]; | ~~~~~~~~~~~^~~ fs/nfsd/nfsfh.c:192:15: warning: array subscript 1 is above array bounds of ‘__u32[1]’ {aka ‘unsigned int[1]’} [-Warray-bounds] 192 | fh->fh_fsid[1] = fh->fh_fsid[2]; | ~~~~~~~~~~~^~~ [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Zhihao Cheng
hulk inclusion category: feature bugzilla: 185747 https://gitee.com/openeuler/kernel/issues/I4OUFN CVE: NA ------------------------------- Introduce KABI reservations for the storage module. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Guan Jing
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4KAP1?from=project-issue CVE: NA -------- We reserve some fields beforehand for sched structures prone to change; therefore, we can hot add/change sched features with this enhancement. After reserving, cache normally does not matter, as the reserved fields are not accessed at all. Signed-off-by: Guan Jing <guanjing6@huawei.com> Reviewed-by: Chen Hui <judy.chenhui@huawei.com> Reviewed-by: Cheng Jian <cj.chengjian@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By Guo Zihua
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4GK6B CVE: NA --------------------------- Reserve some fields for future IMA IPE development. Signed-off-by: Guo Zihua <guozihua@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
By GONG, Ruiqi
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4MYYH CVE: N/A ----------------------- Reserve space in struct cred and struct user_namespace in advance, to prepare for merging a promising new feature [1] from kernel 5.14 and some others in the future. [1]: https://lkml.kernel.org/r/94d1dbecab060a6b116b0a2d1accd8ca1bbb4f5f.1619094428.git.legion@kernel.org Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com> Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: weiyang wang <wangweiyang2@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-