Commits · 25cf336de51b51a3e440e1893751f9532095eff0 · openeuler / Kernel

04 Jul, 2020 8 commits

Committed by Eric W. Biederman on Jun 25, 2020

Now that the last caller has been removed, remove this code from exec.

For anyone thinking of resurrecting do_execve_file, please note that
the code was buggy in several fundamental ways.

- It did not ensure the file it was passed was read-only and that
deny_write_access had been called on it, which subtly breaks
invariants in exec.

- The caller of do_execve_file was expected to hold and put a
reference to the file, but an extra reference for use by exec was
not taken, so when exec put its reference to the file an
underflow occurred on the file reference count.

- The point of the interface was so that a pathname did not need to
exist, which breaks pathname based LSMs.

Tetsuo Handa originally reported these issues[1]. While it was clear
that deny_write_access was missing, the fundamental incompatibility
with the passed in O_RDWR filehandle was not immediately recognized.

All of these issues were fixed by modifying the usermode driver code
to have a path, so it did not need this hack.
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/
v1: https://lkml.kernel.org/r/871rm2f0hi.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87lfk54p0m.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-10-ebiederm@xmission.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
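
The reference-count bug described in the second bullet of this message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: `struct file_model`, `file_get()`, `file_put()` and the two `exec_*` helpers are hypothetical names that stand in for the file object, its get/put operations and the exec path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a file object: only the reference count. */
struct file_model {
	long count;
	bool underflowed;	/* set when a put happens with no reference left */
};

static void file_get(struct file_model *f)
{
	assert(f);
	f->count++;
}

static void file_put(struct file_model *f)
{
	assert(f);
	if (f->count == 0) {
		f->underflowed = true;
		return;
	}
	f->count--;
}

/* Buggy pattern: exec drops a reference it never took. */
static void exec_without_own_ref(struct file_model *f)
{
	/* ... exec would use f here ... */
	file_put(f);
}

/* Balanced pattern: exec takes its own reference before using the file. */
static void exec_with_own_ref(struct file_model *f)
{
	file_get(f);
	/* ... exec would use f here ... */
	file_put(f);
}

/* Returns true if the caller's final put found no reference left. */
static bool caller_sees_underflow(void (*exec_fn)(struct file_model *))
{
	struct file_model f = { .count = 1, .underflowed = false };

	exec_fn(&f);	/* exec runs while the caller holds its reference */
	file_put(&f);	/* the caller drops its reference afterwards */
	return f.underflowed;
}
```

In this model, `caller_sees_underflow(exec_without_own_ref)` reports the underflow, while the balanced variant does not.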

25cf336d

umh: Stop calling do_execve_file · 55e6074e

Committed by Eric W. Biederman on Jun 25, 2020

With the user mode driver code changed to not set subprocess_info.file
there are no more users of subprocess_info.file. Remove this field
from struct subprocess_info and remove the only user in
call_usermodehelper_exec_async that would call do_execve_file instead
of do_execve if file was set.

v1: https://lkml.kernel.org/r/877dvuf0i7.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87r1tx4p2a.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-9-ebiederm@xmission.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

55e6074e

umd: Transform fork_usermode_blob into fork_usermode_driver · e2dc9bf3

Committed by Eric W. Biederman on Jun 25, 2020

Instead of loading a binary blob into a temporary file with
shmem_kernel_file_setup, load a binary blob into a temporary tmpfs
filesystem. This means that the blob can be stored in an init section
and discarded, and it means the binary blob will have a filename so it can
be executed normally.

The only tricky thing about this code is that in the helper function
blob_to_mnt __fput_sync is used. That is because a file can not be
executed if it is still open for write, and the ordinary delayed close
for kernel threads does not happen soon enough, which causes the
following exec to fail. The function umd_load_blob is not called with
any locks so this should be safe.

Executing the blob normally winds up correcting several problems with
the user mode driver code discovered by Tetsuo Handa[1]. By passing
an ordinary filename into the exec, it is no longer necessary to
figure out how to turn a O_RDWR file descriptor into a properly
reference counted O_EXEC file descriptor that forbids all writes. For
path based LSMs there are no new special cases.

[1] https://lore.kernel.org/linux-fsdevel/2a8775b4-1dd5-9d5c-aa42-9872445e0942@i-love.sakura.ne.jp/
v1: https://lkml.kernel.org/r/87d05mf0j9.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87wo3p4p35.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-8-ebiederm@xmission.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

e2dc9bf3

umd: Rename umd_info.cmdline umd_info.driver_name · 1199c6c3

Committed by Eric W. Biederman on Jun 25, 2020

The only thing supplied in the cmdline today is the driver name so
rename the field to clarify the code.

As this value is always supplied stop trying to handle the case of
a NULL cmdline.

Additionally since we now have a name we can count on use the
driver_name any place where the code is looking for a name
of the binary.

v1: https://lkml.kernel.org/r/87imfef0k3.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87366d63os.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-7-ebiederm@xmission.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

1199c6c3

umd: For clarity rename umh_info umd_info · 74be2d3b

Committed by Eric W. Biederman on Jun 26, 2020

This structure is only used for user mode drivers so change
the prefix from umh to umd to make that clear.

v1: https://lkml.kernel.org/r/87o8p6f0kw.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/878sg563po.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-6-ebiederm@xmission.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

74be2d3b

umh: Separate the user mode driver and the user mode helper support · 884c5e68

Committed by Eric W. Biederman on Jun 26, 2020

This makes it clear which code is part of the core user mode
helper support and which code is needed to implement user mode
drivers.

This makes the kernel smaller for everyone who does not use a usermode
driver.

v1: https://lkml.kernel.org/r/87tuyyf0ln.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87imf963s6.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-5-ebiederm@xmission.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

884c5e68

umh: Remove call_usermodehelper_setup_file. · 21d59828

Committed by Eric W. Biederman on Jun 24, 2020

The only caller of call_usermodehelper_setup_file is fork_usermode_blob.
In fork_usermode_blob replace call_usermodehelper_setup_file with
call_usermodehelper_setup and delete call_usermodehelper_setup_file.

For this to work the argv_free is moved from umh_clean_and_save_pid
to fork_usermode_blob.

v1: https://lkml.kernel.org/r/87zh8qf0mp.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/87o8p163u1.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-4-ebiederm@xmission.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

21d59828

umh: Capture the pid in umh_pipe_setup · 5fec25f2

Committed by Eric W. Biederman on Jun 24, 2020

The pid in struct subprocess_info is only used by umh_clean_and_save_pid to
write the pid into umh_info.

Instead always capture the pid on struct umh_info in umh_pipe_setup, removing
code that is specific to user mode drivers from the common user path of
user mode helpers.

v1: https://lkml.kernel.org/r/87h7uygf9i.fsf_-_@x220.int.ebiederm.org
v2: https://lkml.kernel.org/r/875zb97iix.fsf_-_@x220.int.ebiederm.org
Link: https://lkml.kernel.org/r/20200702164140.4468-1-ebiederm@xmission.com
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

5fec25f2

15 Jun, 2020 1 commit

security: Add LSM hooks to set*gid syscalls · 39030e13

Committed by Thomas Cedeno on Jun 09, 2020

The SafeSetID LSM uses the security_task_fix_setuid hook to filter
set*uid() syscalls according to its configured security policy. In
preparation for adding analogous support in the LSM for set*gid()
syscalls, we add the requisite hook here. Tested by putting print
statements in the security_task_fix_setgid hook and seeing them get hit
during kernel boot.
Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>

39030e13

12 Jun, 2020 17 commits

compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining · 1f44328e

Committed by Marco Elver on May 21, 2020

Use __always_inline in compilation units that have instrumentation
disabled (KASAN_SANITIZE_foo.o := n) for KASAN, like it is done for
KCSAN.

Also, add common documentation for KASAN and KCSAN explaining the
attribute.

 [ bp: Massage commit message. ]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200521142047.169334-12-elver@google.com

1f44328e

compiler.h: Move function attributes to compiler_types.h · eb73876c

Committed by Marco Elver on May 21, 2020

Cleanup and move the KASAN and KCSAN related function attributes to
compiler_types.h, where the rest of the same kind live.

No functional change intended.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200521142047.169334-11-elver@google.com

eb73876c

compiler.h: Avoid nested statement expression in data_race() · 95c094fc

Committed by Marco Elver on May 21, 2020

It appears that compilers have trouble with nested statement
expressions. Therefore, remove one level of statement expression nesting
from the data_race() macro. This will help avoid potential problems
in the future as its usage increases.
Reported-by: Borislav Petkov <bp@suse.de>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20200520221712.GA21166@zn.tnic
Link: https://lkml.kernel.org/r/20200521142047.169334-10-elver@google.com
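
To make the idea in this message concrete, the following is a minimal user-space sketch of a macro that uses a single level of statement expression (a GNU C extension accepted by GCC and Clang). It is not the kernel's data_race() definition: `data_race_sketch`, `checker_disable()`, `checker_enable()` and `race_checks_disabled` are hypothetical names that only model "evaluate the expression while the race detector is switched off".

```c
#include <assert.h>

/* Hypothetical stand-in for the race detector's per-task disable depth. */
static int race_checks_disabled;

static inline void checker_disable(void) { race_checks_disabled++; }
static inline void checker_enable(void)  { race_checks_disabled--; }

/*
 * One statement expression only: evaluate expr with checking disabled,
 * re-enable checking, and yield the value. No statement expression is
 * nested inside another one.
 */
#define data_race_sketch(expr)				\
({							\
	__typeof__(expr) __v;				\
	checker_disable();				\
	__v = (expr);					\
	checker_enable();				\
	__v;						\
})

static int shared_counter = 41;

static int read_counter_racy(void)
{
	/* Checking is expected to be enabled on entry. */
	assert(race_checks_disabled == 0);
	return data_race_sketch(shared_counter + 1);
}
```

One limitation of this sketch: `__typeof__(expr)` keeps qualifiers, so a const-qualified lvalue would make `__v` unassignable. A production macro needs an unqualified type for the temporary.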

95c094fc

compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE() · 44b97dcc

Committed by Marco Elver on May 21, 2020

The volatile accesses no longer need to be wrapped in data_race()
because compilers that emit instrumentation distinguishing volatile
accesses are required for KCSAN.

Consequently, the explicit kcsan_check_atomic*() are no longer required
either since the compiler emits instrumentation distinguishing the
volatile accesses.

Finally, simplify __READ_ONCE_SCALAR() and remove __WRITE_ONCE_SCALAR().

 [ bp: Convert commit message to passive voice. ]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200521142047.169334-9-elver@google.com

44b97dcc

kcsan: Remove 'noinline' from __no_kcsan_or_inline · e3b779d9

Committed by Marco Elver on May 21, 2020

Some compilers incorrectly inline small __no_kcsan functions, which then
results in instrumenting the accesses. For this reason, the 'noinline'
attribute was added to __no_kcsan_or_inline. All known versions of GCC
are affected by this. Supported versions of Clang are unaffected, and
never inline a no_sanitize function.

However, the attribute 'noinline' in __no_kcsan_or_inline causes
unexpected code generation in functions that are __no_kcsan and call a
__no_kcsan_or_inline function.

In certain situations it is expected that the __no_kcsan_or_inline
function is actually inlined by the __no_kcsan function, and *no* calls
are emitted. By removing the 'noinline' attribute, give the compiler
the ability to inline and generate the expected code in __no_kcsan
functions.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/CANpmjNNOpJk0tprXKB_deiNAv_UmmORf1-2uajLhnLWQQ1hvoA@mail.gmail.com
Link: https://lkml.kernel.org/r/20200521142047.169334-6-elver@google.com

e3b779d9

SUNRPC: rpc_xprt lifetime events should record xprt->state · 94afd9c4

Committed by Chuck Lever on May 18, 2020

Help troubleshoot the logic that uses these flags.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

94afd9c4

nfs: set invalid blocks after NFSv4 writes · 3a39e778

Committed by Zheng Bin on May 21, 2020

Use the following commands to test NFSv4 (file1M is a 1 MB file):
mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1:/dir1 /mnt
cp file1M /mnt
du -h /mnt/file1M  -->0 within 60s, then 1M

When the write is done (cp file1M /mnt), the following call chain runs:
nfs_writeback_done
  nfs4_write_done
    nfs4_write_done_cb
      nfs_writeback_update_inode
        nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime)
nfs_post_op_update_inode_force_wcc_locked
   nfs_set_cache_invalid
   nfs_refresh_inode_locked
     nfs_update_inode

The nfsd write response contains change, ctime and mtime, so the flag is
cleared after nfs_update_inode. However, the write response does not contain
space_used, and the previous open response contains space_used with a value of 0,
so inode->i_blocks is still 0.

nfs_getattr  -->called by "du -h"
  do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s
  cache_validity = READ_ONCE(NFS_I(inode)->cache_validity)
  do_update |= cache_validity & (NFS_INO_INVALID_ATTR    -->false
  if (do_update) {
        __nfs_revalidate_inode
  }

Within 60s, no getattr request is sent to nfsd, thus "du -h /mnt/file1M"
reports 0.

Add an NFS_INO_INVALID_BLOCKS flag and set it when an NFSv4 write is done.

Fixes: 16e14375 ("NFS: More fine grained attribute tracking")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
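
The effect of the new flag on the getattr decision traced above can be modelled in a few lines of user-space C. This is a sketch under assumptions, not the NFS client code: the bit values, `struct inode_model`, `write_done()` and `getattr_needs_revalidate()` are hypothetical, and the real check may depend on which attributes the caller requested.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's flag values differ. */
#define INO_INVALID_ATTR	(1UL << 0)
#define INO_INVALID_BLOCKS	(1UL << 1)

struct inode_model {
	unsigned long cache_validity;	/* bitmask of stale attribute groups */
	unsigned long long blocks;	/* cached block count */
};

/* A completed write marks the cached block count as stale. */
static void write_done(struct inode_model *inode)
{
	assert(inode);
	inode->cache_validity |= INO_INVALID_BLOCKS;
}

/*
 * getattr has to ask the server if the attribute cache timed out or if
 * any tracked attribute group, now including the block count, is stale.
 */
static bool getattr_needs_revalidate(const struct inode_model *inode,
				     bool cache_expired)
{
	bool do_update = cache_expired;

	do_update |= (inode->cache_validity &
		      (INO_INVALID_ATTR | INO_INVALID_BLOCKS)) != 0;
	return do_update;
}
```

With this model, a getattr issued right after `write_done()` revalidates even though the attribute cache has not expired, which corresponds to the "du -h" case above.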

3a39e778

SUNRPC: trace RPC client lifetime events · 42aad0d7

Committed by Chuck Lever on May 12, 2020

The "create" tracepoint records parts of the rpc_create arguments,
and the shutdown tracepoint records when the rpc_clnt is about to
signal pending tasks and destroy auths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

42aad0d7

SUNRPC: Trace transport lifetime events · 911813d7

Committed by Chuck Lever on May 12, 2020

Refactor: Hoist create/destroy/disconnect tracepoints out of
xprtrdma and into the generic RPC client. Some benefits include:

- Enable tracing of xprt lifetime events for the socket transport
  types

- Expose the different types of disconnect to help run down
  issues with lingering connections
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

911813d7

SUNRPC: Split the xdr_buf event class · c509f15a

Committed by Chuck Lever on May 12, 2020

To help tie the recorded xdr_buf to a particular RPC transaction,
the client side version of this class should display task ID
information and the server side one should show the request's XID.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

c509f15a

SUNRPC: Add tracepoint to rpc_call_rpcerror() · 0125ecbb

Committed by Chuck Lever on May 12, 2020

Add a tracepoint in another common exit point for failing RPCs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

0125ecbb

SUNRPC: Update the RPC_SHOW_SOCKET() macro · 82909dc5

Committed by Chuck Lever on May 12, 2020

Clean up: remove unnecessary commas, and fix a white-space nit.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

82909dc5

SUNRPC: Update the rpc_show_task_flags() macro · 7a34c8e0

Committed by Chuck Lever on May 12, 2020

Recent additions to the RPC_TASK flags neglected to update
the tracepoint ENUM definitions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

7a34c8e0

SUNRPC: Trace GSS context lifetimes · 74fb8fec

Committed by Chuck Lever on May 12, 2020

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

74fb8fec

SUNRPC: receive buffer size estimation values almost never change · 53bc19f1

Committed by Chuck Lever on May 12, 2020

Avoid unnecessary cache sloshing by placing the buffer size
estimation update logic behind an atomic bit flag.

The size of GSS information included in each wrapped Reply does
not change during the lifetime of a GSS context. Therefore, the
au_rslack and au_ralign fields need to be updated only once after
establishing a fresh GSS credential.

Thus a slack size update must occur after a cred is created,
duplicated, renewed, or expires. I'm not sure I have this exactly
right. A trace point is introduced to track updates to these
variables to enable troubleshooting the problem if I missed a spot.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
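
The "update behind an atomic bit flag" pattern described in this message can be sketched with C11 atomics. This is a user-space illustration under assumptions, not the SUNRPC code: `struct auth_model`, `auth_cred_changed()` and `auth_update_slack()` are hypothetical names, and the `updates` counter exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct auth_model {
	atomic_bool update_slack;	/* true while the sizes need recomputing */
	unsigned int rslack;		/* cached receive-size estimate */
	unsigned int updates;		/* number of slow-path updates performed */
};

static void auth_init(struct auth_model *a)
{
	assert(a);
	atomic_init(&a->update_slack, false);
	a->rslack = 0;
	a->updates = 0;
}

/* Called when a credential is created or renewed: request one update. */
static void auth_cred_changed(struct auth_model *a)
{
	atomic_store(&a->update_slack, true);
}

/* Called for every reply; almost always returns on the read-only fast path. */
static void auth_update_slack(struct auth_model *a, unsigned int observed)
{
	/* Fast path: a plain load does not dirty the shared cache line. */
	if (!atomic_load_explicit(&a->update_slack, memory_order_acquire))
		return;

	/* Slow path: only the caller that clears the flag writes the fields. */
	if (atomic_exchange(&a->update_slack, false)) {
		a->rslack = observed;
		a->updates++;
	}
}
```

The read-before-exchange step is the point of the design: repeated replies on an established context only read the flag, so the estimate fields are written once per credential change.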

53bc19f1

media: videobuf2-dma-contig: fix bad kfree in vb2_dma_contig_clear_max_seg_size · 0d966872

Committed by Tomi Valkeinen on May 27, 2020

Commit 9495b7e9 ("driver core: platform:
Initialize dma_parms for platform devices") in v5.7-rc5 causes
vb2_dma_contig_clear_max_seg_size() to kfree memory that was not
allocated by vb2_dma_contig_set_max_seg_size().

The assumption in vb2_dma_contig_set_max_seg_size() seems to be that
dev->dma_parms is always NULL when the driver is probed, and the case
where dev->dma_parms has been initialized by someone other than the driver
(by calling vb2_dma_contig_set_max_seg_size) will cause a failure.

All the current users of these functions are platform devices, which now
always have dma_parms set by the driver core. To fix the issue for v5.7,
make vb2_dma_contig_set_max_seg_size() return an error if dma_parms is
NULL to be on the safe side, and remove the kfree code from
vb2_dma_contig_clear_max_seg_size().

For v5.8 we should remove the two functions and move the
dma_set_max_seg_size() calls into the drivers.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Fixes: 9495b7e9 ("driver core: platform: Initialize dma_parms for platform devices")
Cc: stable@vger.kernel.org
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
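
The ownership rule this fix relies on (the helper never allocates dma_parms, so it never frees it) can be shown with a small user-space model. This is a sketch, not the videobuf2 code: the struct and function names are hypothetical, and the specific error code is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct dma_parms_model {
	unsigned int max_segment_size;
};

struct device_model {
	struct dma_parms_model *dma_parms;	/* owned by whoever created the device */
};

/* Never allocates: devices without dma_parms are rejected with an error. */
static int set_max_seg_size(struct device_model *dev, unsigned int size)
{
	assert(dev);
	if (!dev->dma_parms)
		return -ENODEV;	/* error code chosen for illustration */
	dev->dma_parms->max_segment_size = size;
	return 0;
}

/* Nothing was allocated by set_max_seg_size(), so nothing is freed here. */
static void clear_max_seg_size(struct device_model *dev)
{
	(void)dev;
}
```

Because the set helper no longer allocates, the clear helper has nothing to release, which removes the possibility of freeing memory that belongs to the driver core.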

0d966872

KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected · 2a18b7e7

Committed by Vitaly Kuznetsov on Jun 10, 2020

'Page not present' event may or may not get injected depending on
guest's state. If the event wasn't injected, there is no need to
inject the corresponding 'page ready' event as the guest may get
confused. E.g. Linux thinks that the corresponding 'page not present'
event wasn't delivered *yet* and allocates a 'dummy entry' for it.
This entry is never freed.

Note, 'wakeup all' events have no corresponding 'page not present'
event and always get injected.

s390 seems to always be able to inject 'page not present', the
change is effectively a nop.
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-2-vkuznets@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
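
The pairing rule described in this message can be written down as a small predicate. This is a user-space sketch under assumptions, not the KVM implementation: `struct async_pf_model` and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

struct async_pf_model {
	bool notpresent_injected;	/* recorded when 'page not present' is delivered */
	bool wakeup_all;		/* broadcast event with no matching 'not present' */
};

/* Record whether the guest actually received the 'page not present' event. */
static void on_page_not_present(struct async_pf_model *work,
				bool guest_can_take_event)
{
	assert(work);
	work->notpresent_injected = guest_can_take_event;
}

/* 'page ready' is only meaningful if its counterpart was delivered. */
static bool should_inject_page_ready(const struct async_pf_model *work)
{
	return work->wakeup_all || work->notpresent_injected;
}
```

A 'wakeup all' work item passes the check unconditionally, matching the note above that such events have no corresponding 'page not present' event.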

2a18b7e7

11 Jun, 2020 14 commits

x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned · 17fae129

Committed by Tony Luck on May 20, 2020

An interesting thing happened when a guest Linux instance took a machine
check. The VMM unmapped the bad page from guest physical space and
passed the machine check to the guest.

Linux took all the normal actions to offline the page from the process
that was using it. But then guest Linux crashed because it said there
was a second machine check inside the kernel with this stack trace:

do_memory_failure
    set_mce_nospec
         set_memory_uc
              _set_memory_uc
                   change_page_attr_set_clr
                        cpa_flush
                             clflush_cache_range_opt

This was odd, because a CLFLUSH instruction shouldn't raise a machine
check (it isn't consuming the data). Further investigation showed that
the VMM had passed in another machine check because it appeared that the
guest was accessing the bad page.

Fix is to check the scope of the poison by checking the MCi_MISC register.
If the entire page is affected, then unmap the page. If only part of the
page is affected, then mark the page as uncacheable.

This assumes that VMMs will do the logical thing and pass in the "whole
page scope" via the MCi_MISC register (since they unmapped the entire
page).

  [ bp: Adjust to x86/entry changes. ]

Fixes: 284ce401 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Reported-by: Jue Wang <juew@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jue Wang <juew@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200520163546.GA7977@agluck-desk2.amr.corp.intel.com
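
The scope check described in this message can be sketched as follows. This is a user-space illustration under assumptions, not the kernel's implementation: it assumes 4 KiB pages, assumes the low six bits of MCi_MISC report the least significant valid address bit, and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12	/* assumes 4 KiB pages */

/* Low six bits of MCi_MISC: least significant valid bit of the address. */
static unsigned int misc_addr_lsb(uint64_t misc)
{
	return (unsigned int)(misc & 0x3f);
}

/*
 * misc_valid models the "MISC register valid" status bit. Without a valid
 * MISC value the scope is unknown, so the whole page is treated as bad.
 */
static bool whole_page_poisoned(uint64_t misc, bool misc_valid)
{
	if (!misc_valid)
		return true;
	return misc_addr_lsb(misc) >= PAGE_SHIFT_SKETCH;
}

enum poison_action { MARK_UNCACHEABLE, UNMAP_PAGE };

/* Whole page affected: unmap it. Part of the page: mark it uncacheable. */
static enum poison_action choose_action(uint64_t misc, bool misc_valid)
{
	return whole_page_poisoned(misc, misc_valid) ? UNMAP_PAGE
						     : MARK_UNCACHEABLE;
}
```

For example, a reported granularity of 64 bytes (LSB value 6) selects `MARK_UNCACHEABLE`, while a granularity of a full page (LSB value 12) selects `UNMAP_PAGE`.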

17fae129

x86/entry: Unbreak __irqentry_text_start/end magic · f0178fc0

Committed by Thomas Gleixner on Jun 10, 2020

The entry rework moved interrupt entry code from the irqentry to the
noinstr section which made the irqentry section empty.

This breaks boundary checks which rely on the __irqentry_text_start/end
markers to find out whether a function in a stack trace is
interrupt/exception entry code. This affects the function graph tracer and
filter_irq_stacks().

As the IDT entry points are all sequentially emitted this is rather simple
to unbreak by injecting __irqentry_text_start/end as global labels.

To make this work correctly:

  - Remove the IRQENTRY_TEXT section from the x86 linker script
  - Define __irqentry so it breaks the build if it's used
  - Adjust the entry mirroring in PTI
  - Remove the redundant kprobes and unwinder bound checks
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
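
The boundary check that depends on these markers is a half-open range test. The sketch below models it in user space under assumptions: `struct text_range` and `in_irqentry_text_sketch()` are hypothetical, and the addresses in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical section bounds; in the kernel these are start/end labels. */
struct text_range {
	uintptr_t start;	/* first address of the entry code */
	uintptr_t end;		/* one past the last address */
};

/* True if addr lies inside the half-open interval [start, end). */
static bool in_irqentry_text_sketch(const struct text_range *r, uintptr_t addr)
{
	assert(r);
	return addr >= r->start && addr < r->end;
}
```

With an empty section (`start == end`) no address can ever match, which is why stack-trace filtering stopped recognising entry code once the section became empty. Placing the labels around the sequentially emitted entry points makes the range non-empty again.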

f0178fc0

lockdep: __always_inline more for noinstr · 6eebad1a

Committed by Peter Zijlstra on Jun 03, 2020

vmlinux.o: warning: objtool: debug_locks_off()+0xd: call to __debug_locks_off() leaves .noinstr.text section
vmlinux.o: warning: objtool: match_held_lock()+0x6a: call to look_up_lock_class.isra.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: lock_is_held_type()+0x90: call to lockdep_recursion_finish() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114052.185201076@infradead.org

6eebad1a

x86/entry: Rename trace_hardirqs_off_prepare() · bf2b3008

Committed by Peter Zijlstra on May 29, 2020

The typical pattern for trace_hardirqs_off_prepare() is:

  ENTRY
    lockdep_hardirqs_off(); // because hardware
    ... do entry magic
    instrumentation_begin();
    trace_hardirqs_off_prepare();
    ... do actual work
    trace_hardirqs_on_prepare();
    lockdep_hardirqs_on_prepare();
    instrumentation_end();
    ... do exit magic
    lockdep_hardirqs_on();

which shows that it's named wrong, rename it to
trace_hardirqs_off_finish(), as it concludes the hardirq_off transition.

Also, given that the above is the only correct order, make the traditional
all-in-one trace_hardirqs_off() follow suit.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200529213321.415774872@infradead.org

bf2b3008

xen: Move xen_setup_callback_vector() definition to include/xen/hvm.h · 998c2034

Committed by Vitaly Kuznetsov on May 20, 2020

Kbuild test robot reports the following problem on ARM:

  for 'xen_setup_callback_vector' [-Wmissing-prototypes]
1664 | void xen_setup_callback_vector(void) {}
|      ^~~~~~~~~~~~~~~~~~~~~~~~~

The problem is that xen_setup_callback_vector is a x86 only thing, its
definition is present in arch/x86/xen/xen-ops.h but not on ARM. In
events_base.c there is a stub for !CONFIG_XEN_PVHVM but it is not declared
as 'static'.

On x86 the situation is hardly better: drivers/xen/events/events_base.c
doesn't include 'xen-ops.h' from arch/x86/xen/, it includes its namesake
from include/xen/ which also results in a 'no previous prototype' warning.

Currently, xen_setup_callback_vector() has two call sites: one in
drivers/xen/events/events_base.c and another in arch/x86/xen/suspend_hvm.c. The
former is placed under #ifdef CONFIG_X86 and the latter is only compiled
in when CONFIG_XEN_PVHVM is set.

Resolve the issue by moving xen_setup_callback_vector() declaration to
arch neutral 'include/xen/hvm.h' as the implementation lives in arch
neutral drivers/xen/events/events_base.c.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lkml.kernel.org/r/20200520161600.361895-1-vkuznets@redhat.com

998c2034

x86/entry: Convert XEN hypercall vector to IDTENTRY_SYSVEC · cb09ea29

Committed by Thomas Gleixner on May 21, 2020

Convert the last oldstyle defined vector to IDTENTRY_SYSVEC:

  - Implement the C entry point with DEFINE_IDTENTRY_SYSVEC
  - Emit the ASM stub with DECLARE_IDTENTRY_SYSVEC
  - Remove the ASM idtentries in 64-bit
  - Remove the BUILD_INTERRUPT entries in 32-bit
  - Remove the old prototypes

Fixup the related XEN code by providing the primary C entry point in x86 to
avoid cluttering the generic code with X86'isms.

No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202119.741950104@linutronix.de

cb09ea29

x86/entry: Switch XEN/PV hypercall entry to IDTENTRY · 2f6474e4

Committed by Thomas Gleixner on May 21, 2020

Convert the XEN/PV hypercall to IDTENTRY:

  - Emit the ASM stub with DECLARE_IDTENTRY
  - Remove the ASM idtentry in 64-bit
  - Remove the open coded ASM entry code in 32-bit
  - Remove the old prototypes

The handler stubs need to stay in ASM code as they need corner case handling
and adjustment of the stack pointer.

Provide a new C function which invokes the entry/exit handling and calls
into the XEN handler on the interrupt stack if required.

The exit code is slightly different from the regular idtentry_exit() on
non-preemptible kernels. If the hypercall is preemptible and need_resched()
is set then XEN provides a preempt hypercall scheduling function.

Move this functionality into the entry code so it can use the existing
idtentry functionality.

[ mingo: Build fixes. ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20200521202118.055270078@linutronix.de

2f6474e4

genirq: Provide __irq_enter/exit_raw() · 98a3bf19

Committed by Thomas Gleixner on May 21, 2020

Like __irq_enter/exit() but without time accounting. To be used for "empty"
system vectors like the scheduler IPI to avoid the overhead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.671682341@linutronix.de

98a3bf19

genirq: Provide irq_enter/exit_rcu() · 8a6bc478

Committed by Thomas Gleixner on May 21, 2020

irq_enter()/exit() currently include RCU handling. To properly separate the RCU
handling code, provide variants which contain only the non-RCU related
functionality.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Link: https://lore.kernel.org/r/20200521202117.567023613@linutronix.de

8a6bc478

nmi, tracing: Make hardware latency tracing noinstr safe · 2ab70319

Committed by Thomas Gleixner on May 21, 2020

The hardware latency tracer calls into instrumentable functions. Move the
calls into the RCU watching sections and annotate them.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20200521202116.904176298@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2ab70319

lib/bsearch: Provide __always_inline variant · df65bba1

Committed by Peter Zijlstra on Feb 19, 2020

For code that needs the ultimate performance (it can inline the @cmp
function too) or simply needs to avoid calling external functions for
whatever reason, provide an __always_inline variant of bsearch().

[ tglx: Renamed to __inline_bsearch() as suggested by Andy ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135313.624443814@linutronix.de
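
A forced-inline binary search of the kind this message describes can be sketched in user-space C as follows. This is an illustration under assumptions, not the kernel's code: `inline_bsearch_sketch`, `cmp_int` and `find_int` are hypothetical names, and `__attribute__((always_inline))` is the GCC/Clang spelling of the attribute.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cmp_fn)(const void *key, const void *elt);

/*
 * Forced-inline binary search over a sorted array. When cmp is a
 * compile-time constant, the compiler can inline the comparison too,
 * so the search makes no calls to external functions.
 */
static inline __attribute__((always_inline))
void *inline_bsearch_sketch(const void *key, const void *base, size_t num,
			    size_t size, cmp_fn cmp)
{
	const char *pivot;
	int result;

	while (num > 0) {
		pivot = (const char *)base + (num >> 1) * size;
		result = cmp(key, pivot);

		if (result == 0)
			return (void *)pivot;

		if (result > 0) {
			/* Continue in the upper half, past the pivot. */
			base = pivot + size;
			num--;
		}
		num >>= 1;
	}
	return NULL;
}

static int cmp_int(const void *key, const void *elt)
{
	int a = *(const int *)key;
	int b = *(const int *)elt;

	return (a > b) - (a < b);
}

/* Typed wrapper: returns a pointer to the match, or NULL if absent. */
static const int *find_int(const int *sorted, size_t n, int key)
{
	assert(sorted || n == 0);
	return inline_bsearch_sketch(&key, sorted, n, sizeof(*sorted), cmp_int);
}
```

An out-of-line search function can be built as a thin wrapper around the inline variant, so both kinds of callers share one implementation.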

df65bba1

bug: Annotate WARN/BUG/stackfail as noinstr safe · 5916d5f9

Committed by Thomas Gleixner on Mar 13, 2020

Warnings, bugs and stack protection fails from noinstr sections, e.g. low
level and early entry code, are likely to be fatal.

Mark them as "safe" to be invoked from noinstr protected code to avoid
annotating all usage sites. Getting the information out is important.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134100.376598577@linutronix.de

5916d5f9

context_tracking: Ensure that the critical path cannot be instrumented · 0372007f

Committed by Thomas Gleixner on Mar 04, 2020

context tracking lacks a few protection mechanisms against instrumentation:

 - While the core functions are marked NOKPROBE they lack protection
   against function tracing which is required as the function entry/exit
   points can be utilized by BPF.

 - static functions invoked from the protected functions need to be marked
   as well as they can be instrumented otherwise.

 - using plain inline allows the compiler to emit traceable and probable
   functions.

Fix this by marking the functions noinstr and converting the plain inlines
to __always_inline.

The NOKPROBE_SYMBOL() annotations are removed as the .noinstr.text section
is already excluded from being probed.

Cures the following objtool warnings:

 vmlinux.o: warning: objtool: enter_from_user_mode()+0x34: call to __context_tracking_exit() leaves .noinstr.text section
 vmlinux.o: warning: objtool: prepare_exit_to_usermode()+0x29: call to __context_tracking_enter() leaves .noinstr.text section
 vmlinux.o: warning: objtool: syscall_return_slowpath()+0x29: call to __context_tracking_enter() leaves .noinstr.text section
 vmlinux.o: warning: objtool: do_syscall_64()+0x7f: call to __context_tracking_enter() leaves .noinstr.text section
 vmlinux.o: warning: objtool: do_int80_syscall_32()+0x3d: call to __context_tracking_enter() leaves .noinstr.text section
 vmlinux.o: warning: objtool: do_fast_syscall_32()+0x9c: call to __context_tracking_enter() leaves .noinstr.text section

and generates new ones...
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134340.811520478@linutronix.de

0372007f

locking/atomics: Flip fallbacks and instrumentation · 37f8173d

Committed by Peter Zijlstra on Jan 24, 2020

Currently instrumentation of atomic primitives is done at the architecture
level, while composites or fallbacks are provided at the generic level.

The result is that there are no uninstrumented variants of the
fallbacks. Since there is now need of such variants to isolate text poke
from any form of instrumentation, invert this ordering.

Doing this means moving the instrumentation into the generic code as
well as having (for now) two variants of the fallbacks.

Notes:

 - the various *cond_read* primitives are not proper fallbacks
   and got moved into linux/atomic.c. No arch_ variants are
   generated because the base primitives smp_cond_load*()
   are instrumented.

 - once all architectures are moved over to arch_atomic_ one of the
   fallback variants can be removed and some 2300 lines reclaimed.

 - atomic_{read,set}*() are no longer double-instrumented
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20200505134058.769149955@linutronix.de

37f8173d
