Commits · f53b6dda4d9b6e4ba1af5efd6c6650784996b4e7 · openeuler / Kernel

24 Jan, 2023 3 commits

Authored by Kees Cook on Nov 17, 2022

commit 9fc9e278 upstream.

Like oops_limit, add warn_limit for limiting the number of warnings when
panic_on_warn is not set.

Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: tangmeng <tangmeng@uniontech.com>
Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-doc@vger.kernel.org
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-5-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f53b6dda

exit: Allow oops_limit to be disabled · e0738725

Authored by Kees Cook on Dec 02, 2022

commit de92f657 upstream.

In preparation for keeping oops_limit logic in sync with warn_limit,
have oops_limit == 0 disable checking the Oops counter.
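
The semantics described above can be modelled in a few lines. This is a user-space Python sketch for illustration only, not the kernel's C code; `should_panic` and its arguments are hypothetical names, and the limit value 100 is an arbitrary example.

```python
def should_panic(oops_count: int, oops_limit: int) -> bool:
    """Model of the limit check: a limit of 0 disables the check entirely."""
    if oops_limit == 0:
        # oops_limit == 0 means the Oops counter is never checked.
        return False
    return oops_count >= oops_limit


if __name__ == "__main__":
    # (count, limit) pairs: below the limit, at the limit, and limit disabled.
    for count, limit in [(1, 100), (100, 100), (10**9, 0)]:
        print(count, limit, should_panic(count, limit))
```

Treating 0 as "disabled" rather than "panic immediately" keeps the knob consistent with how the commit says warn_limit is meant to behave.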

Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-doc@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

e0738725

exit: Put an upper limit on how often we can oops · 767997ef

Authored by Jann Horn on Nov 17, 2022

commit d4ccd54d upstream.

Many Linux systems are configured to not panic on oops; but allowing an
attacker to oops the system **really** often can make even bugs that look
completely unexploitable exploitable (like NULL dereferences and such) if
each crash elevates a refcount by one or a lock is taken in read mode, and
this causes a counter to eventually overflow.

The most interesting counters for this are 32 bits wide (like open-coded
refcounts that don't use refcount_t). (The ldsem reader count on 32-bit
platforms is just 16 bits, but probably nobody cares about 32-bit platforms
that much nowadays.)

So let's panic the system if the kernel is constantly oopsing.

The speed of oopsing 2^32 times probably depends on several factors, like
how long the stack trace is and which unwinder you're using; an empirically
important one is whether your console is showing a graphical environment or
a text console that oopses will be printed to.
In a quick single-threaded benchmark, it looks like oopsing in a vfork()
child with a very short stack trace only takes ~510 microseconds per run
when a graphical console is active; but switching to a text console that
oopses are printed to slows it down around 87x, to ~45 milliseconds per
run.
(Adding more threads makes this faster, but the actual oops printing
happens under &die_lock on x86, so you can maybe speed this up by a factor
of around 2 and then any further improvement gets eaten up by lock
contention.)

It looks like it would take around 8-12 days to overflow a 32-bit counter
with repeated oopsing on a multi-core X86 system running a graphical
environment; both me (in an X86 VM) and Seth (with a distro kernel on
normal hardware in a standard configuration) got numbers in that ballpark.
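
As a rough cross-check of these figures, the per-oops timings quoted above can be turned into overflow times. This is a back-of-the-envelope sketch, assuming a constant cost per oops and treating the "factor of around 2" from extra threads as exactly 2; `days_to_overflow` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60
COUNTER_WRAP = 2 ** 32  # oopses needed to overflow a 32-bit counter


def days_to_overflow(seconds_per_oops: float, parallel_speedup: float = 1.0) -> float:
    """Days needed for COUNTER_WRAP oopses at a constant per-oops cost."""
    total_seconds = COUNTER_WRAP * seconds_per_oops / parallel_speedup
    return total_seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # ~510 microseconds per oops with a graphical console active.
    print(f"graphical, 1 thread:   {days_to_overflow(510e-6):.1f} days")
    # Assumed 2x speedup from multiple threads before die_lock contention dominates.
    print(f"graphical, 2x speedup: {days_to_overflow(510e-6, 2.0):.1f} days")
    # ~45 milliseconds per oops when oopses are printed to a text console.
    print(f"text console:          {days_to_overflow(45e-3):.0f} days")
```

Under these assumptions the graphical-console case comes out at roughly 25 days single-threaded and a little under 13 days with the 2x speedup, which is in the same ballpark as the upper end of the measured range; the text-console case lands at several years.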

12 days aren't *that* short on a desktop system, and you'd likely need much
longer on a typical server system (assuming that people don't run graphical
desktop environments on their servers), and this is a *very* noisy and
violent approach to exploiting the kernel; and it also seems to take orders
of magnitude longer on some machines, probably because stuff like EFI
pstore will slow it down a ton if that's active.
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

767997ef

07 Jan, 2023 1 commit

iommu/amd: Fix ill-formed ivrs_ioapic, ivrs_hpet and ivrs_acpihid options · 7e883477

Authored by Kim Phillips on Sep 19, 2022

commit 1198d231 upstream.

Currently, these options cause the following libkmod error:

libkmod: ERROR ../libkmod/libkmod-config.c:489 kcmdline_parse_result: \
Ignoring bad option on kernel command line while parsing module \
name: 'ivrs_xxxx[XX:XX'

Fix by introducing a new parameter format for these options and
throw a warning for the deprecated format.

Users are still allowed to omit the PCI Segment if zero.
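
The commit text does not spell out either syntax. The sketch below assumes, based on the upstream kernel-parameters documentation, that the deprecated form looks like `ivrs_ioapic[10]=00:14.0` and the new form like `ivrs_ioapic=10@0001:00:14.0`; both shapes should be treated as assumptions here. `to_new_format` is a hypothetical helper that rewrites deprecated `ivrs_ioapic` and `ivrs_hpet` options and leaves anything else untouched.

```python
import re

# Assumed deprecated form: name[id]=[segment:]bus:dev.fn
_DEPRECATED = re.compile(
    r"^(ivrs_ioapic|ivrs_hpet)\[(\d+)\]="
    r"(?:([0-9a-fA-F]{1,4}):)?"
    r"([0-9a-fA-F]{1,2}:[0-9a-fA-F]{1,2}\.[0-9a-fA-F])$"
)


def to_new_format(option: str) -> str:
    """Rewrite a deprecated option into the assumed id@[segment:]bus:dev.fn form."""
    match = _DEPRECATED.match(option)
    if match is None:
        return option  # not a recognised deprecated option; leave it alone
    name, ident, segment, bdf = match.groups()
    if segment is None:
        # The PCI segment may be omitted when it is zero.
        return f"{name}={ident}@{bdf}"
    return f"{name}={ident}@{segment}:{bdf}"


if __name__ == "__main__":
    for opt in ("ivrs_ioapic[10]=00:14.0", "ivrs_hpet[0]=0001:00:14.0", "quiet"):
        print(opt, "->", to_new_format(opt))
```

The point of the new shape is that the option no longer contains `[` before the `=`, which is what the libkmod command-line parser rejects in the error shown above.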

Adding a Link: to the reason why we're modding the syntax parsing
in the driver and not in libkmod.

Fixes: ca3bf5d4 ("iommu/amd: Introduces ivrs_acpihid kernel parameter")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-modules/20200310082308.14318-2-lucas.demarchi@intel.com/
Reported-by: Kim Phillips <kim.phillips@amd.com>
Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Link: https://lore.kernel.org/r/20220919155638.391481-2-kim.phillips@amd.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

7e883477

31 Dec, 2022 1 commit

x86/split_lock: Add sysctl to control the misery mode · bb1878d7

Authored by Guilherme G. Piccoli on Oct 24, 2022

[ Upstream commit 72720937 ]

Commit b041b525 ("x86/split_lock: Make life miserable for split lockers")
changed the way the split lock detector works when in "warn" mode;
basically, it not only shows the warning message, but also intentionally
introduces a slowdown through a sleep plus serialization mechanism
on such a task. Based on the discussions in [0], it seems the warning alone
wasn't enough motivation for userspace developers to fix their
applications.

This slowdown is enough to totally break some proprietary (aka.
unfixable) userspace[1].

Originally, the proposal in [0] was to add a new mode that would warn
and slow down the "split locking" task, keeping the old warn mode
untouched. In the end, that idea was discarded and the regular/default
"warn" mode now slows down the applications. This is quite aggressive
with regard to proprietary/legacy programs that basically are unable to
run properly on a kernel with this change.
While it is understandable that a malicious application could DoS
by split locking, it seems unacceptable to regress old/proprietary
userspace programs through a default configuration that previously
worked. An example of such breakage was reported in [1].

Add a sysctl to allow controlling the "misery mode" behavior, as per
Thomas's suggestion in [2]. This way, users running legacy and/or
proprietary software are allowed to still execute them with a decent
performance while still observing the warning messages on kernel log.
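
A small sketch of how an administrator's script might inspect the knob. The commit message above does not name the sysctl; the path used here, `/proc/sys/kernel/split_lock_mitigate`, and the convention that `0` turns the slowdown off are assumptions taken from the upstream documentation. `parse_sysctl_value` and `read_misery_mode` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional

# Assumed sysctl location; the commit message does not name the knob.
SPLIT_LOCK_SYSCTL = Path("/proc/sys/kernel/split_lock_mitigate")


def parse_sysctl_value(raw: str) -> bool:
    """Interpret the sysctl text: anything other than "0" counts as enabled."""
    return raw.strip() != "0"


def read_misery_mode(path: Path = SPLIT_LOCK_SYSCTL) -> Optional[bool]:
    """Return True/False for the sysctl value, or None if it is unavailable."""
    try:
        raw = path.read_text()
    except OSError:
        return None  # kernel without the sysctl, or file not readable
    return parse_sysctl_value(raw)


if __name__ == "__main__":
    print("misery mode:", read_misery_mode())
```

Returning `None` for a missing file lets callers distinguish "slowdown disabled" from "this kernel has no such sysctl".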

[0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/
[1] https://github.com/doitsujin/dxvk/issues/2938
[2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/

[ dhansen: minor changelog tweaks, including clarifying the actual
  	   problem ]

Fixes: b041b525 ("x86/split_lock: Make life miserable for split lockers")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Andre Almeida <andrealmeid@igalia.com>
Link: https://lore.kernel.org/all/20221024200254.635256-1-gpiccoli%40igalia.com
Signed-off-by: Sasha Levin <sashal@kernel.org>

bb1878d7

23 Nov, 2022 2 commits

Documentation: add amd-pstate kernel command line options · 1056d314

Authored by Perry Yuan on Nov 17, 2022

Add a new amd-pstate driver command line option that enables the driver's
passive working mode. In this mode the driver uses the MSR and shared
memory interfaces to request the desired performance on an abstract scale,
and the power management firmware (SMU) converts the performance requests
into actual hardware P-states.

Also the `disable` parameter can disable the pstate driver loading by
adding `amd_pstate=disable` to kernel command line.
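
To check which of these options a running system was booted with, the kernel command line can be scanned for `amd_pstate=`. This is a minimal sketch; `amd_pstate_mode` is a hypothetical helper, and letting the last occurrence win is a choice made for this example rather than something the commit states.

```python
from typing import Optional


def amd_pstate_mode(cmdline: str) -> Optional[str]:
    """Return the value of the last amd_pstate= option, or None if absent."""
    mode = None
    for token in cmdline.split():
        if token.startswith("amd_pstate="):
            mode = token.split("=", 1)[1]
    return mode


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("amd_pstate mode:", amd_pstate_mode(f.read()))
    except OSError:
        print("kernel command line not available")
```

A result of `None` means no `amd_pstate=` option was passed on the command line.
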
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

1056d314

Documentation: amd-pstate: add driver working mode introduction · 8a2cbf72

Authored by Perry Yuan on Nov 17, 2022

Introduce the new working mode of the `amd_pstate` driver, which is
enabled by adding `amd_pstate=passive` to the kernel command line.
If the user does not enable passive mode, the amd_pstate driver is
disabled by default for now.
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Tested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

8a2cbf72

25 Oct, 2022 1 commit

media: vivid.rst: loop_video is set on the capture devnode · de547896

Authored by Hans Verkuil on Oct 17, 2022

The example on how to use and test Capture Overlay specified
the wrong video device node. Back in 2015 the loop_video control
moved from the output device to the capture device, but this
example code is still referring to the output video device.
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

de547896

19 Oct, 2022 1 commit

dm verity: Add documentation for try_verify_in_tasklet option · dc3efedf

Authored by Milan Broz on Sep 27, 2022

Add documentation that was missing from commit 5721d4e5 ("dm
verity: Add optional "try_verify_in_tasklet" feature").
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

dc3efedf

14 Oct, 2022 1 commit

Documentation: ACPI: Prune DSDT override documentation from index · 83439a0f

Authored by Bagas Sanjaya on Oct 11, 2022

Commit d206cef0 ("ACPI: docs: Drop useless DSDT override documentation")
removes useless DSDT override documentation. However, the commit forgets
to prune the documentation entry from table of contents of ACPI admin
guide documentation, hence triggers Sphinx warning:

Documentation/admin-guide/acpi/index.rst:8: WARNING: toctree contains reference to nonexisting document 'admin-guide/acpi/dsdt-override'

Prune the entry to fix the warning.

Fixes: d206cef0 ("ACPI: docs: Drop useless DSDT override documentation")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

83439a0f

11 Oct, 2022 1 commit

xen/pv: support selecting safe/unsafe msr accesses · 3fac3734

Authored by Juergen Gross on Sep 26, 2022

Instead of always doing the safe variants for reading and writing MSRs
in Xen PV guests, make the behavior controllable via Kconfig option
and a boot parameter.

The default will be the current behavior, which is to always use the
safe variant.
Signed-off-by: Juergen Gross <jgross@suse.com>

3fac3734

06 Oct, 2022 1 commit

Documentation: amd-pstate: Add unit test introduction · 7fe36297

Authored by Meng Li on Aug 17, 2022

Introduce the AMD P-State unit test module design and implementation.
It also talks about kselftest and how to use it.
Signed-off-by: Meng Li <li.meng@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

7fe36297

04 Oct, 2022 7 commits

mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol · e55b9f96

Authored by Johannes Weiner on Sep 26, 2022

Since 2d1c4980 ("mm: memcontrol: make swap tracking an integral part
of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config
option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP.

Update the sites accordingly and drop the symbol.

[ While touching the docs, remove two references to CONFIG_MEMCG_KMEM,
  which hasn't been a user-visible symbol for over half a decade. ]

Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

e55b9f96

mm: memcontrol: deprecate swapaccounting=0 mode · b25806dc

Authored by Johannes Weiner on Sep 26, 2022

The swapaccounting= commandline option already does very little today. To
close a trivial containment failure case, the swap ownership tracking part
of the swap controller has recently become mandatory (see commit
2d1c4980 ("mm: memcontrol: make swap tracking an integral part of
memory control") for details), which makes up the majority of the work
during swapout, swapin, and the swap slot map.

The only thing left under this flag is the page_counter operations and the
visibility of the swap control files in the first place, which are rather
meager savings. There also aren't many scenarios, if any, where
controlling the memory of a cgroup while allowing it unlimited access to a
global swap space is a workable resource isolation strategy.

On the other hand, there have been several bugs and confusion around the
many possible swap controller states (cgroup1 vs cgroup2 behavior, memory
accounting without swap accounting, memcg runtime disabled).

This puts the maintenance overhead of retaining the toggle above its
practical benefits. Deprecate it.

Link: https://lkml.kernel.org/r/20220926135704.400818-3-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

b25806dc

mm/khugepaged: attempt to map file/shmem-backed pte-mapped THPs by pmds · 58ac9a89

Authored by Zach O'Keefe on Sep 22, 2022

The main benefit of THPs is that they can be mapped at the pmd level,
increasing the likelihood of TLB hits and spending fewer cycles in page
table walks. pte-mapped hugepages - that is - hugepage-aligned compound
pages of order HPAGE_PMD_ORDER mapped by ptes - although being contiguous
in physical memory, don't have this advantage. In fact, one could argue
they are detrimental to system performance overall since they occupy a
precious hugepage-aligned/sized region of physical memory that could
otherwise be used more effectively. Additionally, pte-mapped hugepages
can be the cheapest memory to collapse for khugepaged since no new
hugepage allocation or copying of memory contents is necessary - we only
need to update the mapping page tables.

In the anonymous collapse path, we are able to collapse pte-mapped
hugepages (albeit, perhaps suboptimally), but the file/shmem path makes no
effort when compound pages (of any order) are encountered.

Identify pte-mapped hugepages in the file/shmem collapse path. The
final step of which makes a racy check of the value of the pmd to
ensure it maps a pte table. This should be fine, since races that
result in false-positive (i.e. attempt collapse even though we
shouldn't) will fail later in collapse_pte_mapped_thp() once we
actually lock mmap_lock and reinspect the pmd value. Races that result
in false-negatives (i.e. where we decide to not attempt collapse, but
should have) shouldn't be an issue, since in the worst case, we do
nothing - which is what we've done up to this point. We make a similar
check in retract_page_tables(). If we do think we've found a
pte-mapped hugepage in khugepaged context, attempt to update page
tables mapping this hugepage.

Note that these collapses still count towards the
/sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed counter,
and if the pte-mapped hugepage was also mapped into multiple processes'
address spaces, could be incremented for each page table update. Since we
increment the counter when a pte-mapped hugepage is successfully added to
the list of to-collapse pte-mapped THPs, it's possible that we never
actually update the page table either. This is different from how
file/shmem pages_collapsed accounting works today where only a successful
page cache update is counted (it's also possible here that no page tables
are actually changed). Though it incurs some slop, this is preferred to
either not accounting for the event at all, or plumbing through data in
struct mm_slot on whether to account for the collapse or not.

Also note that work still needs to be done to support arbitrary compound
pages, and that this should all be converted to using folios.

[shy828301@gmail.com: Spelling mistake, update comment, and add Documentation]
Link: https://lore.kernel.org/linux-mm/CAHbLzkpHwZxFzjfX9nxVoRhzup8WMjMfyL6Xiq8mZ9M-N3ombw@mail.gmail.com/
Link: https://lkml.kernel.org/r/20220907144521.3115321-3-zokeefe@google.com
Link: https://lkml.kernel.org/r/20220922224046.1143204-3-zokeefe@google.com
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Reviewed-by: Yang Shi <shy828301@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Chris Kennelly <ckennelly@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: James Houghton <jthoughton@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

58ac9a89

mm/huge_memory: prevent THP_ZERO_PAGE_ALLOC increased twice · f4981502

Authored by Liu Shixin on Sep 09, 2022

A user who reads THP_ZERO_PAGE_ALLOC may be more concerned about the huge
zero pages that are really allocated for thp.  It is misleading to
increase THP_ZERO_PAGE_ALLOC twice if two threads call get_huge_zero_page
concurrently.  Don't increase the value if the huge page is not really
used.

Update Documentation/admin-guide/mm/transhuge.rst to suit.

Link: https://lkml.kernel.org/r/20220909021653.3371879-1-liushixin2@huawei.com
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f4981502

Docs/admin-guide/mm/damon/usage: note DAMON debugfs interface deprecation plan · f1f3afd5

Authored by SeongJae Park on Sep 09, 2022

Commit b1840272 ("Docs/admin-guide/mm/damon/usage: document DAMON
sysfs interface") announced the DAMON debugfs interface deprecation plan,
but the announcement is not very prominent.  As the deprecation time is
approaching, this commit makes the announcement easier to find by adding a
note at the beginning of the DAMON debugfs interface usage document.

Link: https://lkml.kernel.org/r/20220909202901.57977-8-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yun Levi <ppbuk5246@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f1f3afd5

Docs/admin-guide/mm/damon/start: mention the dependency as sysfs instead of debugfs · 04cc7e4b

Authored by SeongJae Park on Sep 09, 2022

'Getting Started' document of DAMON says DAMON user-space tool, damo[1],
is using DAMON debugfs interface, and therefore it needs to ensure debugfs
is mounted.  However, the latest version of the tool is using DAMON sysfs
interface.  Moreover, DAMON debugfs interface is going to be deprecated as
announced by commit b1840272 ("Docs/admin-guide/mm/damon/usage:
document DAMON sysfs interface").

This commit therefore updates the document to tell readers about the DAMON
sysfs interface dependency instead, and no longer mentions the debugfs
interface, which will be deprecated.

[1] https://github.com/awslabs/damo

Link: https://lkml.kernel.org/r/20220909202901.57977-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yun Levi <ppbuk5246@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

04cc7e4b

Docs/admin-guide/mm/damon: rename the title of the document · 0ff11f10

Authored by SeongJae Park on Sep 09, 2022

The title of the DAMON document for admin-guide, 'Monitoring Data
Accesses', could confuse readers in some ways.  First of all, DAMON is not
the only way to monitor data accesses.  Also, the document covers not only
data access monitoring but also data access pattern based memory
management optimizations (DAMOS).  This commit updates the title to
'DAMON: Data Access MONitor', which more explicitly explains what the
document describes.

Link: https://lkml.kernel.org/r/20220909202901.57977-5-sj@kernel.org
Fixes: c4ba6014 ("Documentation: add documents for DAMON")
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yun Levi <ppbuk5246@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

0ff11f10

01 Oct, 2022 1 commit

kunit: add kunit.enable to enable/disable KUnit test · d20a6ba5

Authored by Joe Fradley on Aug 23, 2022

This patch adds the kunit.enable module parameter that will need to be
set to true in addition to KUNIT being enabled for KUnit tests to run.
The default value is true giving backwards compatibility. However, for
the production+testing use case the new config option
KUNIT_DEFAULT_ENABLED can be set to N requiring the tester to opt-in
by passing kunit.enable=1 to the kernel.
Signed-off-by: Joe Fradley <joefradley@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>

d20a6ba5

28 Sep, 2022 6 commits

ACPI: docs: Drop useless DSDT override documentation · d206cef0

Authored by Rafael J. Wysocki on Sep 26, 2022

Because https://01.org/linux-acpi web site has become permanently
inaccessible, the "Overriding DSDT" document in the kernel tree
pointing to it as the main source of information is useless (and
the config option name mentioned by it is incorrect), so drop it
and drop the pointer to it from the ACPI Kconfig.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

d206cef0

docs: hugetlbpage.rst: fix a typo of hugepage size · 16461c66

Authored by Hoi Pok Wu on Sep 22, 2022

should be kB instead of Kb
Signed-off-by: Hoi Pok Wu <wuhoipok@gmail.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Link: https://lore.kernel.org/r/20220922030645.9719-1-wuhoipok@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

16461c66

Documentation/hw-vuln: Update spectre doc · 06cb31cc

Authored by Lin Yujun on Aug 30, 2022

Commit 7c693f54 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
added the "ibrs" option to
Documentation/admin-guide/kernel-parameters.txt but omitted it from
Documentation/admin-guide/hw-vuln/spectre.rst. Add it there.
Signed-off-by: Lin Yujun <linyujun809@huawei.com>
Link: https://lore.kernel.org/r/20220830123614.23007-1-linyujun809@huawei.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

06cb31cc

docs: admin-guide: for kernel bugs refer to other kernel documentation · 32a3a9db

Authored by Lukas Bulwahn on Jul 20, 2022

The current section 'If something goes wrong' makes a number of suggestions
for debugging, bug hunting and reporting issues, which are quite briefly
described in that section.

However, the suggestions are also well covered in other kernel
documentation or sometimes simply outdated. Here, each suggestion in that
section is summarized, and then followed with its assessment, and the
derived action for each suggestion:

- use MAINTAINERS and mailing list: covered in 'Reporting issues',
summarized in the short guide, detailed in its further section.
'Reporting issues' even provides some specific examples that guide
readers well through the needed steps. Refer to 'Reporting issues'.

- contact Linus Torvalds: probably outdated as currently described, but
nevertheless covered in 'Reporting issues'. 'Reporting issues' advises
contacting the relevant kernel maintainers first; after some patience
and failed attempts with those maintainers, contacting Linus Torvalds
might be okay. Refer to 'Reporting issues'.

- tell what kernel, how to duplicate, the setup, if the problem is new
or old and when did you notice: covered in 'Reporting issues',
especially in Step-by-step guide how to report issues to the kernel
maintainers. Refer to 'Reporting issues'.

- duplicate kernel bug reports exactly: covered in 'Reporting issues',
especially in Write and send the report. Refer to 'Reporting issues'.

- read 'Bug hunting': keep this reference. Refer to 'Bug hunting'.

- compile the kernel with CONFIG_KALLSYMS: covered in 'Reporting issues',
especially in Decode failure messages. Refer to 'Reporting issues'.

- alternatively, use ksymoops: ksymoops at the mentioned URL seems not to
be maintained anymore. It was released roughly once a year until
version 2.4.11 in 2005, but has not seen a new release since then. The
information in ./scripts/ksymoops/README is from 1999, and does not
give more insight on its actual maintenance state either. Ksymoops is
mentioned as system utility in changes.rst, but also not recommended
there. Drop the explanation on using ksymoops.

- alternatively, lookup dump manually with the EIP and nm to determine
the function in which the kernel crashes: this method seems already a
quite advanced and low-level debugging method. Even all the further
references on bug hunting and debugging do not mention it. Drop this
alternative method and limit mentioning methods explained in the other
existing kernel documentation.

- read 'Reporting issues': keep this reference.
Refer to 'Reporting issues'.

- use gdb for debugging: some specific details, e.g., edit
arch/x86/Makefile, are probably outdated or limited to one (historic
important) setup. Using gdb is covered in 'Bug hunting', 'Debugging
kernel and modules via gdb' and 'Using kgdb, kdb and the kernel
debugger internals'. Refer to those three documents.

Overall, it is sufficient to refer to reporting-issues.rst,
bug-hunting.rst, gdb-kernel-debugging.rst and kgdb.rst and this way cover
the existing suggestions.

'Reporting issues' is quite new and probably up to date. 'Bug hunting',
'Debugging kernel and modules via gdb' and 'Using kgdb, kdb and the kernel
debugger internals' might need some revisit and update, but they are
generally in an acceptable state for referring to them.

Replace the existing suggestions by reference to other existing kernel
documentation covering those suggestions---partly even nicely summarized
and then explained in greater detail.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220720041325.15693-3-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

32a3a9db

docs: admin-guide: do not mention the 'run a.out user programs' feature · 3f10b508

Authored by Lukas Bulwahn on Jul 20, 2022

Running a.out user programs with the latest kernel release is a very rare
use case nowadays. Support for a.out user programs remains only for the
alpha architecture and is not defined and activated in the architecture's
Kconfig (so even activating this support requires modifying the Kconfig
file and not just the kernel build configuration).

The discussion on a.out support in 2019 (see Link) shows that support for
a.out user programs remains only for a special corner case of some (alpha
architecture) users.

There is no need to point out and mention this special feature to the
general audience of kernel users. Delete the reference to this historic and
special feature.

Link: https://lore.kernel.org/all/CAHk-=wgt7M6yA5BJCJo0nF22WgPJnN8CvViL9CAJmd+S+Civ6w@mail.gmail.com/
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Link: https://lore.kernel.org/r/20220720041325.15693-2-lukas.bulwahn@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

3f10b508

Remove duplicate words inside documentation · d2bef8e1

Authored by Akhil Raj on Aug 27, 2022

I have removed repeated `the` inside the documentation
Signed-off-by: Akhil Raj <lf32.dev@gmail.com>
Link: https://lore.kernel.org/r/20220827145359.32599-1-lf32.dev@gmail.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

d2bef8e1

27 Sep, 2022 2 commits

ksm: add profit monitoring documentation · 21b7bdb5

Authored by xu xin on Aug 30, 2022

Add a description of KSM profit and how to determine it, both system-wide
and within a single process.

Link: https://lkml.kernel.org/r/20220830144003.299870-1-xu.xin16@zte.com.cn
Signed-off-by: xu xin <xu.xin16@zte.com.cn>
Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

21b7bdb5

mm: multi-gen LRU: admin guide · 07017acb

Authored by Yu Zhao on Sep 18, 2022

Add an admin guide.

Link: https://lkml.kernel.org/r/20220918080010.2920238-14-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michael Larabel <Michael@MichaelLarabel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

07017acb

26 Sep, 2022 1 commit

Documentation: Rename PPC_FSL_BOOK3E to PPC_E500 · 404a5e72

Authored by Christophe Leroy on Sep 19, 2022

CONFIG_PPC_FSL_BOOK3E is redundant with CONFIG_PPC_E500.

Rename it so that CONFIG_PPC_FSL_BOOK3E can be removed later.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d3d42b395c09e66b0705fda1e51779f33e13ac38.1663606876.git.christophe.leroy@csgroup.eu

404a5e72

22 Sep, 2022 · 1 commit

docs: perf: Add description for Alibaba's T-Head PMU driver · a6f92909

Committed by Shuai Xue on Sep 14, 2022

Alibaba's T-Head SoC implements uncore PMU for performance and functional
debugging to facilitate system maintenance. Document it to provide guidance
on how to use it.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220914022326.88550-2-xueshuai@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>

a6f92909

17 Sep, 2022 · 1 commit

bpf: Use bpf_capable() instead of CAP_SYS_ADMIN for blinding decision · bfeb7e39

Committed by Yauheni Kaliuta on Sep 05, 2022

The full CAP_SYS_ADMIN requirement for blinding looks too strict nowadays.
Given that unprivileged BPF is disabled by default these days, the main
users of constant blinding come from unprivileged contexts, in particular
via cBPF -> eBPF migration (e.g. old-style socket filters).

Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220831090655.156434-1-ykaliuta@redhat.com
Link: https://lore.kernel.org/bpf/20220905090149.61221-1-ykaliuta@redhat.com

bfeb7e39

16 Sep, 2022 · 1 commit

arm64: support huge vmalloc mappings · e9207223

Committed by Kefeng Wang on Sep 11, 2022

As of commit 559089e0 ("vmalloc: replace VM_NO_HUGE_VMAP with
VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc
is an opt-in strategy, so it is safe to support huge vmalloc
mappings on arm64. For now, they are used in kvmalloc() and
alloc_large_system_hash().

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/r/20220911044423.139229-1-wangkefeng.wang@huawei.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

e9207223

12 Sep, 2022 · 5 commits

kernel/utsname_sysctl.c: print kernel arch · bfca3dd3

Committed by Petr Vorel on Sep 01, 2022

Print the machine hardware name (UTS_MACHINE) in /proc/sys/kernel/arch.

This helps people who debug the kernel from an initramfs with a minimal
environment (i.e. without coreutils or even busybox), and lets programs in
high-level languages read a sysctl file instead of running 'uname -m'.

Link: https://lkml.kernel.org/r/20220901194403.3819-1-pvorel@suse.cz
Signed-off-by: Petr Vorel <pvorel@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: David Sterba <dsterba@suse.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

bfca3dd3
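
As an illustration of the use case in the commit above, a program can read the machine hardware name from /proc/sys/kernel/arch instead of spawning 'uname -m'. The following is a minimal sketch, not part of the commit: the helper name `read_kernel_arch` is hypothetical, and the fallback to `os.uname()` assumes a Unix-like system where the file may be missing on older kernels.

```python
import os
from pathlib import Path


def read_kernel_arch(path="/proc/sys/kernel/arch"):
    """Return the machine hardware name from the sysctl file, or None.

    `read_kernel_arch` is a hypothetical helper name used for illustration.
    """
    try:
        return Path(path).read_text().strip()
    except OSError:
        # The file does not exist on kernels without this commit.
        return None


if __name__ == "__main__":
    arch = read_kernel_arch()
    # Fall back to the uname(2) value when the sysctl file is unavailable.
    print(arch if arch is not None else os.uname().machine)
```

Taking the path as a parameter keeps the helper easy to exercise against a temporary file.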

page_ext: introduce boot parameter 'early_page_ext' · c4f20f14

Committed by Li Zhe on Aug 25, 2022

In commit 2f1ee091 ("Revert "mm: use early_pfn_to_nid in
page_ext_init""), we call page_ext_init() after page_alloc_init_late() to
avoid some panic problem.  It seems that we cannot track early page
allocations in current kernel even if page structure has been initialized
early.

This patch introduces a new boot parameter 'early_page_ext' to resolve
this problem.  If we pass it to the kernel, page_ext_init() will be moved
up and the feature 'deferred initialization of struct pages' will be
disabled to initialize the page allocator early and prevent the panic
problem above.  It can help us to catch early page allocations.  This is
especially useful when we find that the amount of free memory right after
boot differs from one boot of the kernel to the next.

[akpm@linux-foundation.org: fix section issue by removing __meminitdata]
Link: https://lkml.kernel.org/r/20220825102714.669-1-lizhe.67@bytedance.com
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark-PK Tsai <mark-pk.tsai@mediatek.com>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

c4f20f14

memory tiering: rate limit NUMA migration throughput · c6833e10

Committed by Huang Ying on Jul 13, 2022

In NUMA balancing memory tiering mode, if there are hot pages in the slow
memory node and cold pages in the fast memory node, we need to promote/demote
hot/cold pages between the fast and slow memory nodes.

One choice is to promote/demote as fast as possible.  But the CPU cycles and
memory bandwidth consumed by a high promoting/demoting throughput will
hurt the latency of some workloads because of inflated access latency and
contention for slow memory bandwidth.

A way to resolve this issue is to restrict the max promoting/demoting
throughput.  It will take longer to finish the promoting/demoting.  But
the workload latency will be better.  This is implemented in this patch as
the page promotion rate limit mechanism.

The number of the candidate pages to be promoted to the fast memory node
via NUMA balancing is counted, if the count exceeds the limit specified by
the users, the NUMA balancing promotion will be stopped until the next
second.

A new sysctl knob kernel.numa_balancing_promote_rate_limit_MBps is added
for the users to specify the limit.

Link: https://lkml.kernel.org/r/20220713083954.34196-3-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: osalvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zhong Jiang <zhongjiang-ali@linux.alibaba.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

c6833e10
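
The rate limit described in the commit above (count promotion candidates, stop promoting once the count exceeds the limit, resume in the next second) can be modelled in user space. This is a rough sketch and not the kernel implementation: the class name `PromotionRateLimiter`, the 4 KiB page size, and the exact window handling are assumptions made for illustration.

```python
class PromotionRateLimiter:
    """Toy model of a per-second promotion limit expressed in MB/s.

    Hypothetical illustration only; it does not mirror kernel data structures.
    """

    def __init__(self, limit_mbps, page_size=4096):
        # Convert the MB/s limit into a per-second page budget.
        self.limit_pages = limit_mbps * 1024 * 1024 // page_size
        self.window_start = None
        self.count = 0

    def allow(self, now, nr_pages=1):
        """Return True if promotion of `nr_pages` candidates may proceed at time `now` (seconds)."""
        # Start a new one-second window when the previous one has elapsed.
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start = now
            self.count = 0
        # Once the budget for this window is used up, stop until the next second.
        if self.count >= self.limit_pages:
            return False
        self.count += nr_pages
        return True


if __name__ == "__main__":
    limiter = PromotionRateLimiter(limit_mbps=1)
    print(limiter.allow(0.0, 256), limiter.allow(0.5), limiter.allow(1.0))
```

With a 1 MB/s limit and 4 KiB pages the budget is 256 pages per second, so the second call in the usage example is refused and the third, in the next window, is accepted.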

mm/cma_debug: show complete cma name in debugfs directories · 9a79443d

Committed by Charan Teja Kalla on Aug 11, 2022

Currently only 12 characters of the cma name are used for the debug
directories, whereas the cma name can be up to CMA_MAX_NAME (=64)
characters long. One side effect is that two cma areas sharing the same
first 12 characters end up trying to create directories with the same
name; the second fails with -EEXIST, which can limit cma debug functionality.

The 'cma-' prefix was used initially, when cma areas did not have names
and were represented by simple integer values. Since each cma now has
its own name, drop the 'cma-' prefix from the cma debug directories: it is
already evident that they are for cma debug because they are created
under the /sys/kernel/debug/cma/ path.

Link: https://lkml.kernel.org/r/1660223729-22461-1-git-send-email-quic_charante@quicinc.com
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Pavan Kondeti <quic_pkondeti@quicinc.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

9a79443d
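
The name collision described in the commit above can be reproduced with a small model. This sketch follows the commit message's description (a 'cma-' prefix plus the first 12 characters of the name) rather than the kernel source; the function names and the two example area names are hypothetical.

```python
def old_debugfs_dir(cma_name, kept=12):
    """Directory name under the old scheme, as described in the commit message."""
    return "cma-" + cma_name[:kept]


def new_debugfs_dir(cma_name):
    """Directory name under the new scheme: the complete cma name, no prefix."""
    return cma_name


if __name__ == "__main__":
    # Two hypothetical areas whose names share their first 12 characters.
    a, b = "linux,cma-pool-a", "linux,cma-pool-b"
    print(old_debugfs_dir(a), old_debugfs_dir(b))  # identical names collide
    print(new_debugfs_dir(a), new_debugfs_dir(b))  # distinct names
```

Under the old scheme the second directory creation would hit an existing name, which is the -EEXIST failure the commit message mentions.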

userfaultfd: update documentation to describe /dev/userfaultfd · 816284a3

Committed by Axel Rasmussen on Aug 08, 2022

Explain the different ways to create a new userfaultfd, and how access
control works for each way.

[axelrasmussen@google.com: improve wording in documentation, per Mike]
  Link: https://lkml.kernel.org/r/20220819205201.658693-5-axelrasmussen@google.com
Link: https://lkml.kernel.org/r/20220808175614.3885028-5-axelrasmussen@google.com
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dmitry V. Levin <ldv@altlinux.org>
Cc: Gleb Fotengauer-Malinovskiy <glebfm@altlinux.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

816284a3

10 Sep, 2022 · 1 commit

arm64: spectre: increase parameters that can be used to turn off bhb mitigation individually · 877ace9e

Committed by Liu Song on Aug 26, 2022

In our environment, it was found that the BHB mitigation has a great
impact on benchmark performance. For example, in the lmbench test,
the "process fork && exit" test performance drops by 20%.
So it is necessary to be able to turn off this mitigation
individually through the kernel command line, which avoids having to
recompile the kernel with an adjusted config.

Signed-off-by: Liu Song <liusong@linux.alibaba.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/1661514050-22263-1-git-send-email-liusong@linux.alibaba.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

877ace9e

09 Sep, 2022 · 2 commits

sched/psi: Per-cgroup PSI accounting disable/re-enable interface · 34f26a15

Committed by Chengming Zhou on Sep 07, 2022

PSI accounts stalls for each cgroup separately and aggregates them
at each level of the hierarchy. This may cause non-negligible overhead
for some workloads that run deep in the hierarchy.

Commit 3958e2d0 ("cgroup: make per-cgroup pressure stall tracking configurable")
makes PSI skip per-cgroup stall accounting and only account system-wide,
to avoid this per-level overhead.

But for our use case, we also want leaf cgroup PSI stats accounted, for
userspace adjustment on that cgroup, apart from only system-wide adjustment.

So this patch introduces a per-cgroup PSI accounting disable/re-enable
interface, "cgroup.pressure": a read-write single-value file whose
allowed values are "0" and "1". The default is "1", so per-cgroup
PSI stats are enabled by default.

Implementation details:

It should be relatively straight-forward to disable and re-enable
state aggregation, time tracking, averaging on a per-cgroup level,
if we can live with losing history from while it was disabled.
I.e. the avgs will restart from 0, total= will have gaps.

But it's hard or complex to stop/restart groupc->tasks[] updates,
which is not implemented in this patch. So we always update
groupc->tasks[] and PSI_ONCPU bit in psi_group_change() even when
the cgroup PSI stats are disabled.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lkml.kernel.org/r/20220907090332.2078-1-zhouchengming@bytedance.com

34f26a15
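
Because "cgroup.pressure" is described above as a read-write single-value file accepting "0" and "1", toggling it from user space amounts to a plain file write. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the cgroup directory (for example somewhere under a cgroup v2 mount) must be supplied by the caller.

```python
from pathlib import Path


def set_cgroup_pressure(cgroup_dir, enabled):
    """Write "1" (enable) or "0" (disable) to <cgroup_dir>/cgroup.pressure."""
    Path(cgroup_dir, "cgroup.pressure").write_text("1" if enabled else "0")


def get_cgroup_pressure(cgroup_dir):
    """Return True if per-cgroup PSI accounting is reported as enabled."""
    return Path(cgroup_dir, "cgroup.pressure").read_text().strip() == "1"
```

On a real system the write typically requires sufficient privileges on the cgroup; as the commit message notes, history from the disabled period is lost when accounting is re-enabled.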

sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure · 52b1364b

Committed by Chengming Zhou on Aug 26, 2022

PSI already tracks workload pressure stall information for
CPU, memory and IO. Apart from these, IRQ/SOFTIRQ can have an
obvious impact on the productivity of some workloads, such as web
service workloads.

When CONFIG_IRQ_TIME_ACCOUNTING is enabled, we can get the IRQ/SOFTIRQ
delta time from update_rq_clock_task(), where we can record that delta
to the cgroups of the CPU's current task as PSI_IRQ_FULL status.

Note we don't use PSI_IRQ_SOME since IRQ/SOFTIRQ always happens in
the context of the current task on the CPU, so nothing productive can
run even if it were runnable; hence we only use PSI_IRQ_FULL.

Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220825164111.29534-8-zhouchengming@bytedance.com

52b1364b
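
Since only the "full" state is tracked for IRQ pressure, a consumer would expect a single "full" line where CPU, memory and IO pressure files carry "some" as well. The parser below is a sketch that assumes the usual PSI text format (`full avg10=... avg60=... avg300=... total=...`); the commit message does not name the file that exposes the data, so the sample text and the function name `parse_psi` are illustrative assumptions.

```python
def parse_psi(text):
    """Parse PSI-style text into {kind: {field: value}}.

    Assumes lines such as 'full avg10=0.00 avg60=0.00 avg300=0.00 total=0'.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        stats = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            # 'total' is an integer stall time; the averages are percentages.
            stats[key] = int(value) if key == "total" else float(value)
        result[kind] = stats
    return result


if __name__ == "__main__":
    sample = "full avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    print(parse_psi(sample))
```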
