1. 15 Jul 2019 (7 commits)
  2. 13 Jul 2019 (2 commits)
    • mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options · 6471384a
      Committed by Alexander Potapenko
      Patch series "add init_on_alloc/init_on_free boot options", v10.
      
      Provide init_on_alloc and init_on_free boot options.
      
      These are aimed at preventing possible information leaks and making the
      control-flow bugs that depend on uninitialized values more deterministic.
      
      Enabling either option guarantees that the memory returned by the page
      allocator and SL[AU]B is initialized with zeroes.  The SLOB allocator
      isn't supported at the moment, as its emulation of kmem caches makes it
      hard to handle SLAB_TYPESAFE_BY_RCU caches correctly.
      
      Enabling init_on_free also guarantees that pages and heap objects are
      initialized right after they're freed, so it won't be possible to access
      stale data by using a dangling pointer.
      
      As suggested by Michal Hocko, right now we don't let heap users disable
      initialization for certain allocations.  There's not enough evidence
      that doing so can speed up real-life cases, and introducing ways to opt
      out may result in things going out of control.
      
      This patch (of 2):
      
      The new options are needed to prevent possible information leaks and make
      control-flow bugs that depend on uninitialized values more deterministic.
      
      This is expected to be on by default on Android and Chrome OS, and it
      gives anyone else the opportunity to enable it on other distros via the
      boot arguments.  (The init_on_free feature is regularly requested by
      folks whose threat models include memory forensics.)
      
      init_on_alloc=1 makes the kernel initialize newly allocated pages and heap
      objects with zeroes.  Initialization is done at allocation time at the
      places where checks for __GFP_ZERO are performed.
      
      init_on_free=1 makes the kernel initialize freed pages and heap objects
      with zeroes upon their deletion.  This helps to ensure sensitive data
      doesn't leak via use-after-free accesses.
      
      Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator
      returns zeroed memory.  The two exceptions are slab caches with
      constructors and caches with the SLAB_TYPESAFE_BY_RCU flag; those are
      never zero-initialized, to preserve their semantics.
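
      As a rough userspace sketch of these semantics (not the kernel code;
      buf_alloc()/buf_free() and the two flag variables are made-up names for
      illustration), the allocator wipes memory on the way out when
      init_on_alloc is set and scrubs it again on free when init_on_free is
      set:

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        static int init_on_alloc = 1;   /* analogous to init_on_alloc=1 */
        static int init_on_free  = 1;   /* analogous to init_on_free=1  */

        struct buf {
            size_t size;
            unsigned char data[];
        };

        static struct buf *buf_alloc(size_t size)
        {
            struct buf *b = malloc(sizeof(*b) + size);

            if (!b)
                return NULL;
            b->size = size;
            if (init_on_alloc)
                memset(b->data, 0, size);    /* zero before handing out */
            return b;
        }

        static void buf_free(struct buf *b)
        {
            if (b && init_on_free)
                memset(b->data, 0, b->size); /* scrub stale data on free */
            free(b);
        }

        int main(void)
        {
            struct buf *b = buf_alloc(64);

            printf("first byte after alloc: %u\n", b ? b->data[0] : 0u);
            buf_free(b);
            return 0;
        }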
      
      Both init_on_alloc and init_on_free default to zero, but those defaults
      can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and
      CONFIG_INIT_ON_FREE_DEFAULT_ON.
      
      If either SLUB poisoning or page poisoning is enabled, those options take
      precedence over init_on_alloc and init_on_free: initialization is only
      applied to unpoisoned allocations.
      
      Slowdown for the new features compared to init_on_free=0, init_on_alloc=0:
      
      hackbench, init_on_free=1:  +7.62% sys time (st.err 0.74%)
      hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%)
      
      Linux build with -j12, init_on_free=1:  +8.38% wall time (st.err 0.39%)
      Linux build with -j12, init_on_free=1:  +24.42% sys time (st.err 0.52%)
      Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%)
      Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%)
      
      The slowdown for init_on_free=0, init_on_alloc=0 compared to the baseline
      is within the standard error.
      
      The new features are also going to pave the way for hardware memory
      tagging (e.g.  arm64's MTE), which will require both on_alloc and on_free
      hooks to set the tags for heap objects.  With MTE, tagging will have the
      same cost as memory initialization.
      
      Although init_on_free is rather costly, there are paranoid use-cases where
      in-memory data lifetime is desired to be minimized.  There are various
      arguments for/against the realism of the associated threat models, but
      given that we'll need the infrastructure for MTE anyway, and there are
      people who want wipe-on-free behavior no matter what the performance cost,
      it seems reasonable to include it in this series.
      
      [glider@google.com: v8]
        Link: http://lkml.kernel.org/r/20190626121943.131390-2-glider@google.com
      [glider@google.com: v9]
        Link: http://lkml.kernel.org/r/20190627130316.254309-2-glider@google.com
      [glider@google.com: v10]
        Link: http://lkml.kernel.org/r/20190628093131.199499-2-glider@google.com
      Link: http://lkml.kernel.org/r/20190617151050.92663-2-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>  [page and dmapool parts]
      Acked-by: James Morris <jamorris@linux.microsoft.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Sandeep Patil <sspatil@android.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Marco Elver <elver@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, debug_pagealloc: use a page type instead of page_ext flag · 3972f6bb
      Committed by Vlastimil Babka
      When debug_pagealloc is enabled, we currently allocate the page_ext
      array to mark guard pages with the PAGE_EXT_DEBUG_GUARD flag.  Now that
      we have the page_type field in struct page, we can use that instead, as
      guard pages are neither PageSlab nor mapped to userspace.  This reduces
      memory overhead when debug_pagealloc is enabled and there are no other
      features requiring the page_ext array.
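
      A standalone toy sketch of the idea (not the kernel's struct page; the
      constants and names below are simplified stand-ins): a spare per-page
      word holds a type encoded as cleared bits under a fixed base pattern,
      so no separate extension array is needed for a single debug flag:

        #include <stdbool.h>
        #include <stdint.h>
        #include <stdio.h>

        #define PAGE_TYPE_BASE 0xf0000000u  /* top nibble marks "has a type" */
        #define PT_GUARD       0x00000001u  /* toy "guard page" type bit     */

        struct fake_page {
            uint32_t page_type;             /* stands in for the spare field */
        };

        static void set_guard(struct fake_page *p)
        {
            /* clearing the type bit marks the page as a guard page */
            p->page_type &= ~PT_GUARD;
        }

        static bool is_guard(const struct fake_page *p)
        {
            return (p->page_type & (PAGE_TYPE_BASE | PT_GUARD)) ==
                   PAGE_TYPE_BASE;
        }

        int main(void)
        {
            struct fake_page p = { .page_type = ~0u };  /* untyped page */

            printf("guard before: %d\n", is_guard(&p));
            set_guard(&p);
            printf("guard after:  %d\n", is_guard(&p));
            return 0;
        }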
      
      Link: http://lkml.kernel.org/r/20190603143451.27353-4-vbabka@suse.cz
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 03 Jul 2019 (1 commit)
    • x86/fsgsbase: Revert FSGSBASE support · 049331f2
      Committed by Thomas Gleixner
      The FSGSBASE series turned out to have serious bugs and there is still an
      open issue which is not fully understood yet.
      
      The confidence in those changes has become close to zero especially as the
      test cases which have been shipped with that series were obviously never
      run before sending the final series out to LKML.
      
        ./fsgsbase_64 >/dev/null
        Segmentation fault
      
      As the merge window is close, the only sane decision is to revert FSGSBASE
      support. The revert is necessary as this branch has been merged into
      perf/core already and rebasing all of that a few days before the merge
      window is not the most brilliant idea.
      
      I could definitely slap myself for not noticing the test case fail when
      merging that series, but TBH my expectations weren't that low back
      then. Won't happen again.
      
      Revert the following commits:
      539bca53 ("x86/entry/64: Fix and clean up paranoid_exit")
      2c7b5ac5 ("Documentation/x86/64: Add documentation for GS/FS addressing mode")
      f987c955 ("x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2")
      2032f1f9 ("x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit")
      5bf0cab6 ("x86/entry/64: Document GSBASE handling in the paranoid path")
      708078f6 ("x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit")
      79e1932f ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      1d07316b ("x86/entry/64: Switch CR3 before SWAPGS in paranoid entry")
      f60a83df ("x86/process/64: Use FSGSBASE instructions on thread copy and ptrace")
      1ab5f3f7 ("x86/process/64: Use FSBSBASE in switch_to() if available")
      a86b4625 ("x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions")
      8b71340d ("x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions")
      b64ed19b ("x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
  4. 01 Jul 2019 (1 commit)
  5. 28 Jun 2019 (2 commits)
  6. 22 Jun 2019 (2 commits)
  7. 19 Jun 2019 (1 commit)
    • powerpc/64s/radix: Enable HAVE_ARCH_HUGE_VMAP · d909f910
      Committed by Nicholas Piggin
      This sets the HAVE_ARCH_HUGE_VMAP option, and defines the required
      page table functions.
      
      This enables huge (2MB and 1GB) ioremap mappings. I don't have a
      benchmark for this change, but huge vmap will be used by a later core
      kernel change to enable huge vmalloc memory mappings. This improves
      cached `git diff` performance by about 5% on a 2-node POWER9 with 32MB
      size dentry cache hash.
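
      As a back-of-the-envelope illustration of why the dTLB numbers quoted
      below drop so sharply (using the 32MB hash and the 4K/2MB page sizes
      mentioned in this series): covering the hash with 4K pages needs
      thousands of translations, while 2MB huge mappings need only a handful:

        #include <stdio.h>

        int main(void)
        {
            unsigned long hash_bytes = 32UL << 20;  /* 32MB dentry cache hash */

            printf("4K pages to map it:  %lu\n", hash_bytes / (4UL << 10));
            printf("2MB pages to map it: %lu\n", hash_bytes / (2UL << 20));
            return 0;
        }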
      
        Profiling git diff dTLB misses with a vanilla kernel:
      
        81.75%  git      [kernel.vmlinux]    [k] __d_lookup_rcu
         7.21%  git      [kernel.vmlinux]    [k] strncpy_from_user
         1.77%  git      [kernel.vmlinux]    [k] find_get_entry
         1.59%  git      [kernel.vmlinux]    [k] kmem_cache_free
      
                  40,168      dTLB-miss
             0.100342754 seconds time elapsed
      
        With powerpc huge vmalloc:
      
                   2,987      dTLB-miss
             0.095933138 seconds time elapsed
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 15 Jun 2019 (6 commits)
  9. 11 Jun 2019 (1 commit)
    • docs: s390: convert docs to ReST and rename to *.rst · 8b4a503d
      Committed by Mauro Carvalho Chehab
      Convert all text files with s390 documentation to ReST format.
      
      Tried to preserve the original document format as much as possible.
      Still, some of the files required some work in order for them to be
      readable both as plain text and after conversion to HTML.
      
      The conversion is actually:
        - add blank lines and indentation in order to identify paragraphs;
        - fix tables markups;
        - add some lists markups;
        - mark literal blocks;
        - adjust title markups.
      
      At its new index.rst, let's add a :orphan: while this is not linked to
      the main index.rst file, in order to avoid build warnings.
      Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
  10. 09 Jun 2019 (2 commits)
  11. 30 May 2019 (1 commit)
  12. 26 May 2019 (1 commit)
    • rcu: Enable elimination of Tree-RCU softirq processing · 48d07c04
      Committed by Sebastian Andrzej Siewior
      Some workloads need to change kthread priority for RCU core processing
      without affecting other softirq work.  This commit therefore introduces
      the rcutree.use_softirq kernel boot parameter, which moves the RCU core
      work from softirq to a per-CPU SCHED_OTHER kthread named rcuc.  Use of
      the SCHED_OTHER approach avoids the scalability problems that appeared
      with the earlier attempt to move RCU core processing from softirq to
      kthreads.  That said, kernels built with RCU_BOOST=y will run the rcuc
      kthreads at the RCU-boosting priority.
      
      Note that rcutree.use_softirq=0 must be specified to move RCU core
      processing to the rcuc kthreads: rcutree.use_softirq=1 is the default.
      Reported-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      [ paulmck: Adjust for invoke_rcu_callbacks() only ever being invoked
        from RCU core processing, in contrast to softirq->rcuc transition
        in old mainline RCU priority boosting. ]
      [ paulmck: Avoid wakeups when scheduler might have invoked rcu_read_unlock()
        while holding rq or pi locks, also possibly fixing a pre-existing latent
        bug involving raise_softirq()-induced wakeups. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
  13. 19 May 2019 (1 commit)
  14. 15 May 2019 (3 commits)
    • ipc: allow boot time extension of IPCMNI from 32k to 16M · 5ac893b8
      Committed by Waiman Long
      The maximum number of unique System V IPC identifiers was limited to
      32k.  That limit should be big enough for most use cases.
      
      However, there are some users requesting more, especially those
      migrating from Solaris, which uses 24 bits for unique identifiers.  To
      satisfy those users, a new boot-time kernel option "ipcmni_extend" is
      added to extend the IPCMNI value to 16M.  This is a 512X increase,
      which should be big enough for users that need a large number of
      unique IPC identifiers.
      
      The use of this new option will change the pattern of the IPC
      identifiers returned by functions like shmget(2).  An application that
      depends on such a pattern may not work properly, so the option should
      only be used if the users really need more than 32k unique IPC numbers.
      
      This new option does have the side effect of reducing the maximum number
      of unique sequence numbers from 64k down to 128.  So it is a trade-off.
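
      The arithmetic behind that trade-off (a sketch; the bit widths come
      from the numbers above, assuming a positive int provides 31 usable
      bits for the combined id):

        #include <stdio.h>

        int main(void)
        {
            const int id_bits = 31;              /* positive range of an int */
            const int default_index_bits = 15;   /* IPCMNI = 32k             */
            const int extended_index_bits = 24;  /* ipcmni_extend: 16M       */

            printf("default:  %d ids, %d sequence numbers\n",
                   1 << default_index_bits,
                   1 << (id_bits - default_index_bits));
            printf("extended: %d ids, %d sequence numbers\n",
                   1 << extended_index_bits,
                   1 << (id_bits - extended_index_bits));
            return 0;
        }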
      
      The computation of a new IPC id is not done in the performance critical
      path.  So a little bit of additional overhead shouldn't have any real
      performance impact.
      
      Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • panic/reboot: allow specifying reboot_mode for panic only · b287a25a
      Committed by Aaro Koskinen
      Allow specifying reboot_mode for panic only.  This is needed on systems
      where ramoops is used to store panic logs and the user wants a warm
      reset to preserve them, while still having a cold reset on normal
      reboots.
      
      Link: http://lkml.kernel.org/r/20190322004735.27702-1-aaro.koskinen@iki.fi
      Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: shuffle initial free memory to improve memory-side-cache utilization · e900a918
      Committed by Dan Williams
      Patch series "mm: Randomize free memory", v10.
      
      This patch (of 3):
      
      Randomization of the page allocator improves the average utilization of
      a direct-mapped memory-side-cache.  Memory side caching is a platform
      capability that Linux has been previously exposed to in HPC
      (high-performance computing) environments on specialty platforms.  In
      that instance it was a smaller pool of high-bandwidth-memory relative to
      higher-capacity / lower-bandwidth DRAM.  Now, this capability is going
      to be found on general purpose server platforms where DRAM is a cache in
      front of higher latency persistent memory [1].
      
      Robert offered an explanation of the state of the art of Linux
      interactions with memory-side-caches [2], and I copy it here:
      
          It's been a problem in the HPC space:
          http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/
      
          A kernel module called zonesort is available to try to help:
          https://software.intel.com/en-us/articles/xeon-phi-software
      
          and this abandoned patch series proposed that for the kernel:
          https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com
      
          Dan's patch series doesn't attempt to ensure buffers won't conflict, but
          also reduces the chance that the buffers will. This will make performance
          more consistent, albeit slower than "optimal" (which is near impossible
          to attain in a general-purpose kernel).  That's better than forcing
          users to deploy remedies like:
              "To eliminate this gradual degradation, we have added a Stream
               measurement to the Node Health Check that follows each job;
               nodes are rebooted whenever their measured memory bandwidth
               falls below 300 GB/s."
      
      A replacement for zonesort was merged upstream in commit cc9aec03
      ("x86/numa_emulation: Introduce uniform split capability").  With this
      numa_emulation capability, memory can be split into cache sized
      ("near-memory" sized) numa nodes.  A bind operation to such a node, and
      disabling workloads on other nodes, enables full cache performance.
      However, once the workload exceeds the cache size then cache conflicts
      are unavoidable.  While HPC environments might be able to tolerate
      time-scheduling of cache sized workloads, for general purpose server
      platforms, the oversubscribed cache case will be the common case.
      
      The worst case scenario is that a server system owner benchmarks a
      workload at boot with an un-contended cache only to see that performance
      degrade over time, even below the average cache performance due to
      excessive conflicts.  Randomization clips the peaks and fills in the
      valleys of cache utilization to yield steady average performance.
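
      A toy illustration of why in-order physical allocation conflicts in a
      direct-mapped memory-side cache (the cache and line sizes here are made
      up for the example): two physical addresses exactly one cache-size
      apart land in the same slot and evict each other:

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            uint64_t cache_size = 4ULL << 30;   /* toy 4GB direct-mapped cache  */
            uint64_t line = 64;                 /* cache line size in bytes     */
            uint64_t a = 0x12340000ULL;
            uint64_t b = a + cache_size;        /* exactly one cache-size apart */

            /* slot = (address / line) modulo the number of lines in the cache */
            uint64_t slot_a = (a / line) % (cache_size / line);
            uint64_t slot_b = (b / line) % (cache_size / line);

            printf("slot(a)=%llu slot(b)=%llu -> %s\n",
                   (unsigned long long)slot_a, (unsigned long long)slot_b,
                   slot_a == slot_b ? "conflict" : "no conflict");
            return 0;
        }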
      
      Here are some performance impact details of the patches:
      
      1/ An Intel internal synthetic memory bandwidth measurement tool saw a
         3X speedup in a contrived case that tries to force cache conflicts.
         The contrived case used the numa_emulation capability to force an
         instance of the benchmark to be run in two of the near-memory sized
         numa nodes.  If both instances were placed on the same emulated node
         they would fit and cause zero conflicts, while on separate emulated
         nodes without randomization they underutilized the cache and
         conflicted unnecessarily due to the in-order allocation per node.
      
      2/ A well known Java server application benchmark was run with a heap
         size that exceeded cache size by 3X.  The cache conflict rate was 8%
         for the first run and degraded to 21% after page allocator aging.  With
         randomization enabled the rate levelled out at 11%.
      
      3/ A MongoDB workload did not observe measurable difference in
         cache-conflict rates, but the overall throughput dropped by 7% with
         randomization in one case.
      
      4/ Mel Gorman ran his suite of performance workloads with randomization
         enabled on platforms without a memory-side-cache and saw a mix of some
         improvements and some losses [3].
      
      While there is potentially significant improvement for applications that
      depend on low latency access across a wide working-set, the performance
      may be negligible to negative for other workloads.  For this reason the
      shuffle capability defaults to off unless a direct-mapped
      memory-side-cache is detected.  Even then, the page_alloc.shuffle=0
      parameter can be specified to disable the randomization on those systems.
      
      Outside of memory-side-cache utilization concerns there is a potential
      security benefit from randomization.  Some data exfiltration and
      return-oriented-programming attacks rely on the ability to infer the
      location of sensitive data objects.  The kernel page allocator,
      especially early in system boot, has predictable first-in-first-out
      behavior for physical pages.  Pages are freed in physical address order
      when first onlined.
      
      Quoting Kees:
          "While we already have a base-address randomization
           (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
           memory layouts would certainly be using the predictability of
           allocation ordering (i.e. for attacks where the base address isn't
           important: only the relative positions between allocated memory).
           This is common in lots of heap-style attacks. They try to gain
           control over ordering by spraying allocations, etc.
      
           I'd really like to see this because it gives us something similar
           to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."
      
      While SLAB_FREELIST_RANDOM reduces the predictability of some local
      slab caches, it leaves the vast bulk of memory to be predictably
      allocated in order.  However, it should be noted that the concrete
      security benefits are hard to quantify, and no known CVE is mitigated
      by this randomization.
      
      Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
      a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
      are initially populated with free memory at boot and at hotplug time.  Do
      this based on either the presence of a page_alloc.shuffle=Y command line
      parameter, or autodetection of a memory-side-cache (to be added in a
      follow-on patch).
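
      For reference, a minimal standalone sketch of a Fisher-Yates shuffle
      applied to a plain array of page indices (the kernel operates on the
      free_area lists and uses its own random source; rand() and the fixed
      array here are just for illustration):

        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        static void shuffle(unsigned long *pages, size_t n)
        {
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = (size_t)rand() % (i + 1);  /* pick 0 <= j <= i */
                unsigned long tmp = pages[i];

                pages[i] = pages[j];
                pages[j] = tmp;
            }
        }

        int main(void)
        {
            unsigned long pages[16];

            srand((unsigned int)time(NULL));
            for (size_t i = 0; i < 16; i++)
                pages[i] = i;                 /* in-order, as freed at boot */
            shuffle(pages, 16);
            for (size_t i = 0; i < 16; i++)
                printf("%lu ", pages[i]);
            printf("\n");
            return 0;
        }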
      
      The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
      pages, where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1,
      i.e. order 10 (4MB); this trades off randomization granularity against
      time spent shuffling.  MAX_ORDER-1 was chosen to be minimally invasive
      to the page allocator while still showing memory-side-cache behavior
      improvements, with the expectation that the security implications of
      finer-granularity randomization are mitigated by
      CONFIG_SLAB_FREELIST_RANDOM.  The
      performance impact of the shuffling appears to be in the noise compared to
      other memory initialization work.
      
      This initial randomization can be undone over time, so a follow-on
      patch is introduced to inject entropy on page free decisions.  It is
      reasonable to ask whether the page-free entropy alone is sufficient,
      but it is not, due to the in-order initial freeing of pages.  At the
      start of that process, putting page1 in front of or behind page0 still
      keeps them close together; page2 is still near page1 and has a high
      chance of being adjacent.  As more pages are added, ordering diversity
      improves, but there is still high page locality for the low-address
      pages, and this leads to no significant impact on the cache conflict
      rate.
      
      [1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
      [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
      [3]: https://lkml.org/lkml/2018/10/12/309
      
      [dan.j.williams@intel.com: fix shuffle enable]
        Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
      [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
        Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
      Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Qian Cai <cai@lca.pw>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Robert Elliott <elliott@hpe.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 01 May 2019 (1 commit)
  16. 29 Apr 2019 (2 commits)
  17. 26 Apr 2019 (1 commit)
  18. 23 Apr 2019 (1 commit)
    • x86/xen: Add "xen_timer_slop" command line option · 2ec16bc0
      Committed by Ryan Thibodeaux
      Add a new command-line option "xen_timer_slop=<INT>" that sets the
      minimum delta of virtual Xen timers. This commit does not change the
      default timer slop value for virtual Xen timers.
      
      Lowering the timer slop value should improve the accuracy of virtual
      timers (e.g., better process dispatch latency), but it will likely
      increase the number of virtual timer interrupts (relative to the
      original slop setting).
      
      The original timer slop value has not changed since the introduction
      of the Xen-aware Linux kernel code. This commit provides users an
      opportunity to tune timer performance given the refinements to
      hardware and the Xen event channel processing. It also mirrors
      a feature in the Xen hypervisor - the "timer_slop" Xen command line
      option.
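
      Conceptually, a timer slop is just a lower bound on how soon a timer
      may be programmed to fire; a minimal sketch (the names and units are
      illustrative, not the Xen driver code):

        #include <stdint.h>
        #include <stdio.h>

        /* Never program a timer to fire sooner than slop_ns from now. */
        static uint64_t clamp_expiry(uint64_t now_ns, uint64_t expires_ns,
                                     uint64_t slop_ns)
        {
            if (expires_ns < now_ns + slop_ns)
                expires_ns = now_ns + slop_ns;
            return expires_ns;
        }

        int main(void)
        {
            /* with a 100us slop, a request 50us out is pushed to 100us */
            printf("%llu\n", (unsigned long long)
                   clamp_expiry(0, 50 * 1000, 100 * 1000));
            return 0;
        }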
      
      [boris: updated comment describing TIMER_SLOP]
      Signed-off-by: Ryan Thibodeaux <ryan.thibodeaux@starlab.io>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
  19. 22 Apr 2019 (1 commit)
    • x86/kdump: Fall back to reserve high crashkernel memory · b9ac3849
      Committed by Dave Young
      crashkernel=xM tries to reserve memory for the crash kernel under 4G,
      which is usually enough.  But this can fail sometimes, for example when
      one tries to reserve a big chunk like 2G.
      
      So let crashkernel=xM just fall back to using high memory when it fails
      to find a suitable low range. Do not make ,high the default, because it
      allocates extra low memory for DMA buffers and swiotlb, and this is not
      necessary for all machines.

      Typically, crashkernel=128M works with a low reservation under 4G, so
      keep <4G as the default.
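
      A minimal sketch of that fallback logic (standalone C; reserve_below()
      is a made-up stand-in for the real memory reservation code, which
      behaves differently):

        #include <stdint.h>
        #include <stdio.h>

        #define SZ_4G (1ULL << 32)

        /* Made-up helper: pretend a 2G request cannot fit below 4G. */
        static uint64_t reserve_below(uint64_t size, uint64_t limit)
        {
            if (limit == SZ_4G && size >= (1ULL << 31))
                return 0;                     /* no suitable low range */
            return limit - size;              /* fake base address */
        }

        int main(void)
        {
            uint64_t size = 2ULL << 30;       /* crashkernel=2G */
            uint64_t base = reserve_below(size, SZ_4G);

            if (!base)                        /* fall back to high memory */
                base = reserve_below(size, 64ULL << 30);
            printf("reserved %llu MB at base %#llx\n",
                   (unsigned long long)(size >> 20),
                   (unsigned long long)base);
            return 0;
        }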
      
       [ bp: Massage. ]
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: linux-doc@vger.kernel.org
      Cc: "Paul E. McKenney" <paulmck@linux.ibm.com>
      Cc: Petr Tesarik <ptesarik@suse.cz>
      Cc: piliu@redhat.com
      Cc: Ram Pai <linuxram@us.ibm.com>
      Cc: Sinan Kaya <okaya@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thymo van Beers <thymovanbeers@gmail.com>
      Cc: vgoyal@redhat.com
      Cc: x86-ml <x86@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Zhimin Gu <kookoo.gu@intel.com>
      Link: https://lkml.kernel.org/r/20190422031905.GA8387@dhcp-128-65.nay.redhat.com
  20. 21 Apr 2019 (2 commits)
  21. 18 Apr 2019 (1 commit)