1. 18 Dec 2019 (5 commits)
  2. 13 Dec 2019 (7 commits)
    • sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision · 742f2319
      Authored by Xuewei Zhang
      commit 4929a4e6faa0f13289a67cae98139e727f0d4a97 upstream.
      
      The quota/period ratio is used to ensure a child task group won't get
      more bandwidth than the parent task group, and is calculated as:

        normalized_cfs_quota() = [(quota_us << 20) / period_us]

      If the quota/period ratio changes due to precision loss during the
      scaling done by sched_cfs_period_timer() (commit 2e8e19226398 grows
      quota and period by 147/128 to avoid hard lockups), parent and child
      task groups become inconsistent.
      
      See the example below:
      
      A userspace container manager (kubelet) does three operations:
      
       1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
       2) Create a few children cgroups.
       3) Set quota to 1,000us and period to 10,000us on a child cgroup.
      
      These operations are expected to succeed. However, if the scaling of
      147/128 happens before step 3, quota and period of the parent cgroup
      will be changed:
      
        new_quota: 1148437ns,   1148us
       new_period: 11484375ns, 11484us
      
      When step 3 then comes in, the ratio of the child cgroup will be
      104857, larger than the parent cgroup's ratio (104821), so the
      operation fails.

      Scaling quota and period by a factor of 2 instead preserves the
      ratio exactly and fixes the problem; see the sketch below.
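
      A hedged sketch of the shape of the fix in sched_cfs_period_timer()
      (variable names follow the upstream patch; this backport may differ
      slightly):

        u64 new, old = ktime_to_ns(cfs_b->period);

        /*
         * Grow period (and quota) by an exact factor of 2 so that
         * (quota << 20) / period is unchanged, instead of the lossy
         * 147/128 scaling that perturbed the quota/period ratio.
         */
        new = old * 2;
        if (new < max_cfs_quota_period) {
                cfs_b->period = ns_to_ktime(new);
                cfs_b->quota *= 2;      /* ratio preserved exactly */

                pr_warn_ratelimited(
                        "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
                        smp_processor_id(),
                        div_u64(new, NSEC_PER_USEC),
                        div_u64(cfs_b->quota, NSEC_PER_USEC));
        }
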
      Tested-by: Phil Auld <pauld@redhat.com>
      Signed-off-by: Xuewei Zhang <xueweiz@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Phil Auld <pauld@redhat.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
      Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • bpf: btf: check name validity for various types · 12c49ac4
      Authored by Yonghong Song
      [ Upstream commit eb04bbb608e683f8fd3ef7f716e2fa32dd90861f ]
      
      This patch adds name checking for the following types:
       . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE,
         BTF_KIND_CONST, BTF_KIND_RESTRICT:
           the name must be null
       . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name
           is either null or a valid identifier
       . BTF_KIND_ENUM: the enum type name is either null or a valid
           identifier; the enumerator name must be a valid identifier.
       . BTF_KIND_FWD: the name must be a valid identifier
       . BTF_KIND_TYPEDEF: the name must be a valid identifier
      
      Wherever a valid name is required, it must be a valid C identifier
      (see the sketch below). This can be relaxed later if use cases for a
      different (non-C) frontend arise.
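
      A self-contained sketch of the C-identifier rule these checks rely on
      (modeled on btf_name_valid_identifier(); the in-kernel version also
      bounds the scan by KSYM_NAME_LEN and reads from the BTF string
      section):

        #include <ctype.h>
        #include <stdbool.h>

        /* non-empty, starts with [A-Za-z_], continues with [A-Za-z0-9_] */
        static bool name_valid_identifier(const char *name)
        {
                const unsigned char *p = (const unsigned char *)name;

                if (!*p || !(isalpha(*p) || *p == '_'))
                        return false;
                for (p++; *p; p++)
                        if (!(isalnum(*p) || *p == '_'))
                                return false;
                return true;
        }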
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: btf: implement btf_name_valid_identifier() · 2f3e380d
      Authored by Yonghong Song
      [ Upstream commit cdbb096adddb3f42584cecb5ec2e07c26815b71f ]
      
      The function btf_name_valid_identifier() was implemented in bpf-next
      commit 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and
      BTF_KIND_FUNC_PROTO"). Backport it so that a later patch can use it.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • audit: Embed key into chunk · 6ce317fd
      Authored by Jan Kara
      [ Upstream commit 8d20d6e9301d7b3777d66d47dd5b89acd645cd39 ]
      
      Currently the chunk hash key (which is in fact a pointer to the
      inode) is derived as chunk->mark.conn->obj. It is tricky to make this
      dereference reliable for hash table lookups only under RCU, as the
      mark can get detached from the connector and the connector freed
      independently of the running lookup. Thus there is a possible
      use-after-free / NULL pointer dereference issue:
      
      CPU1					CPU2
      					untag_chunk()
      					  ...
      audit_tree_lookup()
        list_for_each_entry_rcu(p, list, hash) {
      					  list_del_rcu(&chunk->hash);
      					  fsnotify_destroy_mark(entry);
      					  fsnotify_put_mark(entry)
          chunk_to_key(p)
            if (!chunk->mark.connector)
      					    ...
      					    hlist_del_init_rcu(&mark->obj_list);
      					    if (hlist_empty(&conn->list)) {
      					      inode = fsnotify_detach_connector_from_object(conn);
      					    mark->connector = NULL;
      					    ...
      					    frees connector from workqueue
            chunk->mark.connector->obj
      
      This race is probably impossible to hit in practice, as the race
      window on CPU1 is very narrow and CPU2 has a lot of code to execute.
      Still, it's better to have it fixed. Since the inode the chunk is
      attached to is constant during the chunk's lifetime, it is easy to
      cache the key in the chunk itself and thus avoid these issues; see
      the sketch below.
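
      A hedged sketch of the idea: store the key once, while the inode is
      pinned at chunk creation, and compare against it in the RCU lookup
      (field and helper names approximate the upstream patch):

        struct audit_chunk {
                struct list_head hash;
                unsigned long key;      /* cached inode pointer value */
                /* mark, owners, ... */
        };

        /* at creation: chunk->key = (unsigned long)inode; */

        /* RCU hash lookup no longer dereferences mark.connector */
        list_for_each_entry_rcu(p, list, hash) {
                if (READ_ONCE(p->key) == key) {
                        atomic_long_inc(&p->refs);
                        return p;
                }
        }
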
      Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf/core: Consistently fail fork on allocation failures · 78a917be
      Authored by Alexander Shishkin
      [ Upstream commit 697d877849d4b34ab58d7078d6930bad0ef6fc66 ]
      
      Commit:
      
        313ccb96 ("perf: Allocate context task_ctx_data for child event")
      
      makes the inherit path skip over the current event in case of task_ctx_data
      allocation failure. This, however, is inconsistent with allocation failures
      in perf_event_alloc(), which would abort the fork.
      
      Correct this by returning an error code on task_ctx_data allocation
      failure and failing the fork in that case; see the sketch below.
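
      A hedged sketch of the idea only; the exact placement in the inherit
      path and the surrounding names may differ from the actual patch:

        child_ctx->task_ctx_data = kzalloc(pmu->task_ctx_size, GFP_KERNEL);
        if (!child_ctx->task_ctx_data) {
                free_event(child_event);
                /* was: silently skip this event; now abort the fork */
                return ERR_PTR(-ENOMEM);
        }
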
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20191105075702.60319-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/core: Avoid spurious lock dependencies · 870083b6
      Authored by Peter Zijlstra
      [ Upstream commit ff51ff84d82aea5a889b85f2b9fb3aa2b8691668 ]
      
      While seemingly harmless, __sched_fork() does hrtimer_init(), which,
      with DEBUG_OBJECTS enabled, can end up doing allocations.
      
      This then results in the following lock order:
      
        rq->lock
          zone->lock.rlock
            batched_entropy_u64.lock
      
      Which in turn causes deadlocks when we do wakeups while holding that
      batched_entropy lock -- as the random code does.
      
      Solve this by moving __sched_fork() out from under rq->lock (see the
      sketch below). This is safe because nothing there relies on rq->lock,
      as is also evident from the other __sched_fork() call site.
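
      A sketch of the reordering in init_idle(), per the description above
      (hedged; the rest of the function is elided):

        void init_idle(struct task_struct *idle, int cpu)
        {
                struct rq *rq = cpu_rq(cpu);
                unsigned long flags;

                /*
                 * Moved out from under the locks: hrtimer_init() can
                 * allocate when DEBUG_OBJECTS is enabled.
                 */
                __sched_fork(0, idle);

                raw_spin_lock_irqsave(&idle->pi_lock, flags);
                raw_spin_lock(&rq->lock);
                /* ... rest unchanged ... */
        }
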
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: bigeasy@linutronix.de
      Cc: cl@linux.com
      Cc: keescook@chromium.org
      Cc: penberg@kernel.org
      Cc: rientjes@google.com
      Cc: thgarnie@google.com
      Cc: tytso@mit.edu
      Cc: will@kernel.org
      Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
      Link: https://lkml.kernel.org/r/20191001091837.GK4536@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • audit_get_nd(): don't unlock parent too early · 7fb6ef16
      Authored by Al Viro
      [ Upstream commit 69924b89687a2923e88cc42144aea27868913d0e ]
      
      If the child has been negative and just went positive under us, we
      want coherent d_is_positive() and ->d_inode. Don't unlock the parent
      until we've done that work; see the sketch below.
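
      A hedged sketch of the corrected ordering in audit_get_nd() (the
      field updates approximate the upstream code):

        static int audit_get_nd(struct audit_watch *watch, struct path *parent)
        {
                struct dentry *d = kern_path_locked(watch->path, parent);

                if (IS_ERR(d))
                        return PTR_ERR(d);
                if (d_is_positive(d)) {
                        /* read dentry state while the parent is still
                         * locked, so d_is_positive() and ->d_inode agree */
                        watch->dev = d->d_sb->s_dev;
                        watch->ino = d_backing_inode(d)->i_ino;
                }
                /* moved after the check above */
                inode_unlock(d_backing_inode(parent->dentry));
                dput(d);
                return 0;
        }
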
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  3. 05 Dec 2019 (7 commits)
  4. 01 Dec 2019 (12 commits)
    • futex: Prevent robust futex exit race · 2819f403
      Authored by Yang Tao
      commit ca16d5bee59807bf04deaab0a8eccecd5061528c upstream.
      
      Robust futexes utilize the robust_list mechanism to allow the kernel
      to release futexes which are held when a task exits. The exit can be
      voluntary or caused by a signal or fault. This prevents waiters from
      blocking forever.
      
      The futex operations in user space store a pointer to the futex they are
      either locking or unlocking in the op_pending member of the per task robust
      list.
      
      After a lock operation has succeeded the futex is queued in the robust list
      linked list and the op_pending pointer is cleared.
      
      After an unlock operation has succeeded the futex is removed from the
      robust list linked list and the op_pending pointer is cleared.
      
      The robust list exit code checks for the pending operation and any futex
      which is queued in the linked list. It carefully checks whether the futex
      value is the TID of the exiting task. If so, it sets the OWNER_DIED bit and
      tries to wake up a potential waiter.
      
      This is race free for the lock operation but unlock has two race scenarios
      where waiters might not be woken up. These issues can be observed with
      regular robust pthread mutexes. PI aware pthread mutexes are not affected.
      
      (1) Unlocking task is killed after unlocking the futex value in user space
          before being able to wake a waiter.
      
              pthread_mutex_unlock()
                      |
                      V
              atomic_exchange_rel (&mutex->__data.__lock, 0)
                              <------------------------killed
                  lll_futex_wake ()                   |
                                                      |
                                                      |(__lock = 0)
                                                      |(enter kernel)
                                                      |
                                                      V
                                                  do_exit()
                                                  exit_mm()
                                                mm_release()
                                              exit_robust_list()
                                              handle_futex_death()
                                                      |
                                                      |(__lock = 0)
                                                      |(uval = 0)
                                                      |
                                                      V
              if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                      return 0;
      
          The sanity check which ensures that the user space futex is owned by
          the exiting task prevents the wakeup of waiters which in consequence
          block infinitely.
      
      (2) Waiting task is killed after a wakeup and before it can acquire the
          futex in user space.
      
              OWNER                         WAITER
      				futex_wait()
         pthread_mutex_unlock()               |
                      |                       |
                      |(__lock = 0)           |
                      |                       |
                      V                       |
               futex_wake() ------------>  wakeup()
                                              |
                                              |(return to userspace)
                                              |(__lock = 0)
                                              |
                                              V
                              oldval = mutex->__data.__lock
                                                <-----------------killed
          atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,  |
                              id | assume_other_futex_waiters, 0)      |
                                                                       |
                                                                       |
                                                         (enter kernel)|
                                                                       |
                                                                       V
                                                               do_exit()
                                                              |
                                                              |
                                                              V
                                              handle_futex_death()
                                              |
                                              |(__lock = 0)
                                              |(uval = 0)
                                              |
                                              V
              if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                      return 0;
      
          The sanity check which ensures that the user space futex is owned
          by the exiting task prevents the wakeup of waiters, which seems to
          be correct as the exiting task does not own the futex value, but
          the consequence is that other waiters won't be woken up and block
          infinitely.
      
      In both scenarios the following conditions are true:
      
         - task->robust_list->list_op_pending != NULL
         - user space futex value == 0
         - Regular futex (not PI)
      
      If these conditions are met then it is reasonably safe to wake up a
      potential waiter in order to prevent the above problems.
      
      As this might be a false positive it can cause spurious wakeups, but
      the waiter side has to handle other types of unrelated wakeups (e.g.
      signals) gracefully anyway, so such a spurious wakeup will not affect
      the correctness of these operations.
      
      This workaround must not touch the user space futex value and cannot set
      the OWNER_DIED bit because the lock value is 0, i.e. uncontended. Setting
      OWNER_DIED in this case would result in inconsistent state and subsequently
      in malfunction of the owner died handling in user space.
      
      The rest of the user space state is still consistent, as no other task
      can observe the list_op_pending entry in the exiting task's robust
      list.

      The eventually woken up waiter will observe the uncontended lock value
      and take it over; see the sketch below.
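
      A sketch of the resulting check in handle_futex_death(), following
      the upstream fix (the pending_op flag and constants are per the tglx
      note below):

        if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr)) {
                /*
                 * The exiting task does not own the futex, but
                 * list_op_pending plus an uncontended (0) value means an
                 * unlock or lock raced with exit: kick a potential waiter
                 * so it can take the lock over. Spurious wakeups are
                 * harmless; do NOT set OWNER_DIED or touch the value.
                 */
                if (pending_op && !pi && !uval) {
                        futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
                        return 0;
                }
                return 0;
        }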
      
      [ tglx: Massaged changelog and comment. Made the return explicit and not
        	depend on the subsequent check and added constants to hand into
        	handle_futex_death() instead of plain numbers. Fixed a few coding
      	style issues. ]
      
      Fixes: 0771dfef ("[PATCH] lightweight robust futexes: core")
      Signed-off-by: Yang Tao <yang.tao172@zte.com.cn>
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1573010582-35297-1-git-send-email-wang.yi59@zte.com.cn
      Link: https://lkml.kernel.org/r/20191106224555.943191378@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • y2038: futex: Move compat implementation into futex.c · d3f8c58d
      Authored by Arnd Bergmann
      commit 04e7712f4460585e5eed5b853fd8b82a9943958f upstream.
      
      We are going to share the compat_sys_futex() handler between 64-bit
      architectures and 32-bit architectures that need to deal with both 32-bit
      and 64-bit time_t, and this is easier if both entry points are in the
      same file.
      
      In fact, most other system call handlers do the same thing these days, so
      let's follow the trend here and merge all of futex_compat.c into futex.c.
      
      In the process, a few minor changes have to be made so that everything
      still makes sense: handle_futex_death() and futex_cmpxchg_enabled()
      become local symbols, and the compat version of fetch_robust_entry()
      gets renamed to compat_fetch_robust_entry() to avoid a symbol clash.
      
      This is intended as a purely cosmetic patch, no behavior should
      change.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • audit: print empty EXECVE args · 3c69a033
      Authored by Richard Guy Briggs
      [ Upstream commit ea956d8be91edc702a98b7fe1f9463e7ca8c42ab ]
      
      Empty executable arguments were being skipped when printing out the list
      of arguments in an EXECVE record, making it appear they were somehow
      lost.  Include empty arguments as an itemized empty string.
      
      Reproducer:
      	autrace /bin/ls "" "/etc"
      	ausearch --start recent -m execve -i | grep EXECVE
      	type=EXECVE msg=audit(10/03/2018 13:04:03.208:1391) : argc=3 a0=/bin/ls a2=/etc
      
      With fix:
      	type=EXECVE msg=audit(10/03/2018 21:51:38.290:194) : argc=3 a0=/bin/ls a1= a2=/etc
      	type=EXECVE msg=audit(1538617898.290:194): argc=3 a0="/bin/ls" a1="" a2="/etc"
      
      Passes audit-testsuite.  GH issue tracker at
      https://github.com/linux-audit/audit-kernel/issues/99
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      [PM: cleaned up the commit metadata]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/fair: Don't increase sd->balance_interval on newidle balance · 31bced01
      Authored by Valentin Schneider
      [ Upstream commit 3f130a37c442d5c4d66531b240ebe9abfef426b5 ]
      
      When load_balance() fails to move some load because of task affinity,
      we end up increasing sd->balance_interval to delay the next periodic
      balance in the hopes that next time we look, that annoying pinned
      task(s) will be gone.
      
      However, idle_balance() pays no attention to sd->balance_interval, yet
      it will still lead to an increase in balance_interval in case of
      pinned tasks.
      
      If we're going through several newidle balances (e.g. we have a
      periodic task), this can lead to a huge increase of the
      balance_interval in a very small amount of time.
      
      To prevent that, don't increase the balance interval when going
      through a newidle balance; see the sketch below.
      
      This is a similar approach to what is done in commit 58b26c4c
      ("sched: Increment cache_nice_tries only on periodic lb"), where we
      disregard newidle balance and rely on periodic balance for more stable
      results.
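
      A hedged sketch of the guard added to load_balance()'s pinned-task
      exit path (labels follow upstream and may differ in this tree):

        out_one_pinned:
                ld_moved = 0;

                /*
                 * newidle balance ignores sd->balance_interval, so hitting
                 * this path repeatedly would let the interval skyrocket.
                 * Leave the tune-up to periodic balance.
                 */
                if (env.idle == CPU_NEWLY_IDLE)
                        goto out;

                /* tune up the balancing interval */
                if (((env.flags & LBF_ALL_PINNED) &&
                     sd->balance_interval < MAX_PINNED_INTERVAL) ||
                    (sd->balance_interval < sd->max_interval))
                        sd->balance_interval *= 2;
        out:
                return ld_moved;
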
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar.Eggemann@arm.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: patrick.bellasi@arm.com
      Cc: vincent.guittot@linaro.org
      Link: http://lkml.kernel.org/r/1537974727-30788-2-git-send-email-valentin.schneider@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/topology: Fix off by one bug · ed023646
      Authored by Peter Zijlstra
      [ Upstream commit 993f0b0510dad98b4e6e39506834dab0d13fd539 ]
      
      With the addition of the NUMA identity level, we increased @level by
      one and will run off the end of the array in the distance sort loop.
      
      Fixes: 051f3ca0 ("sched/topology: Introduce NUMA identity node sched domain")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
    • irq/matrix: Fix memory overallocation · 0bbb8382
      Authored by Michael Kelley
      [ Upstream commit 57f01796f14fecf00d330fe39c8d2477ced9cd79 ]
      
      IRQ_MATRIX_SIZE is the number of longs needed for a bitmap, multiplied
      by the size of a long, yielding a byte count. But it is used to size
      an array of longs, which is far more memory than is needed.

      Change IRQ_MATRIX_SIZE so it is just the number of longs needed, so
      the arrays come out the correct size; see the sketch below.
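
      A sketch of the before/after in kernel/irq/matrix.c (assuming the
      upstream form of the macro):

        /* before: a byte count, used as an array length of longs,
         * overallocating by a factor of sizeof(unsigned long) */
        #define IRQ_MATRIX_SIZE (BITS_TO_LONGS(IRQ_MATRIX_BITS) * sizeof(unsigned long))

        /* after: the number of longs, which is what the arrays need */
        #define IRQ_MATRIX_SIZE (BITS_TO_LONGS(IRQ_MATRIX_BITS))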
      
      Fixes: 2f75d9e1 ("genirq: Implement bitmap matrix allocator")
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: KY Srinivasan <kys@microsoft.com>
      Link: https://lkml.kernel.org/r/1541032428-10392-1-git-send-email-mikelley@microsoft.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • kernel/panic.c: do not append newline to the stack protector panic string · 023c071f
      Authored by Borislav Petkov
      [ Upstream commit 95c4fb78fb23081472465ca20d5d31c4b780ed82 ]
      
      ... because panic() itself already appends one. Otherwise you get a
      line-broken trailer; the one-line fix is sketched after the example:
      
        [    1.836965] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pgd_alloc+0x29e/0x2a0
        [    1.836965]  ]---
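
      A sketch of the fix in kernel/panic.c's __stack_chk_fail() path
      (assuming the upstream form):

        /* before: the trailing \n produced the broken "]---" trailer */
        panic("stack-protector: Kernel stack is corrupted in: %pB\n",
              __builtin_return_address(0));

        /* after: panic() appends its own newline */
        panic("stack-protector: Kernel stack is corrupted in: %pB",
              __builtin_return_address(0));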
      
      Link: http://lkml.kernel.org/r/20181008202901.7894-1-bp@alien8.de
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf, btf: fix a missing check bug in btf_parse · 54299e1c
      Authored by Martin Lau
      [ Upstream commit 4a6998aff82a20a1aece86a186d8e5263f8b2315 ]
      
      Wenwen Wang reported:
      
        In btf_parse(), the header of the user-space btf data 'btf_data'
        is firstly parsed and verified through btf_parse_hdr().
        In btf_parse_hdr(), the header is copied from user-space 'btf_data'
        to kernel-space 'btf->hdr' and then verified. If no error happens
        during the verification process, the whole data of 'btf_data',
        including the header, is then copied to 'data' in btf_parse(). It
        is obvious that the header is copied twice here. More importantly,
        no check is enforced after the second copy to make sure the headers
        obtained in these two copies are the same. Given that 'btf_data' resides
        in the user space, a malicious user can race to modify the header
        between these two copies. By doing so, the user can inject
        inconsistent data, which can cause undefined behavior of the
        kernel and introduce potential security risk.
      
      This issue is similar to the one fixed in commit 8af03d1ae2e1 ("bpf:
      btf: Fix a missing check bug"). To fix it, this patch copies the user
      'btf_data' *before* parsing / verifying the BTF header; see the
      sketch below.
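
      A hedged sketch of the reordered btf_parse() (error paths and names
      approximate the upstream patch):

        data = kvmalloc(btf_data_size, GFP_KERNEL | __GFP_NOWARN);
        if (!data)
                goto errout;
        btf->data = data;
        btf->data_size = btf_data_size;

        /* single copy: the header is verified from this kernel-space
         * copy, so userspace cannot race a second, inconsistent copy */
        if (copy_from_user(data, btf_data, btf_data_size)) {
                err = -EFAULT;
                goto errout;
        }

        err = btf_parse_hdr(env);  /* now reads btf->data, not user memory */
        if (err)
                goto errout;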
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Co-developed-by: Wenwen Wang <wang6495@umn.edu>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: devmap: fix wrong interface selection in notifier_call · 8044e741
      Authored by Taehee Yoo
      [ Upstream commit f592f804831f1cf9d1f9966f58c80f150e6829b5 ]
      
      dev_map_notification() removes an interface from the devmap if the
      unregistering interface's ifindex matches. But checking only the
      ifindex is not enough, because another netns can have an interface
      with the same ifindex, so the wrong interface could be selected.
      Hence add a netdev pointer comparison; see the sketch below.
      
      v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann)
      v1: Initial patch
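
      A hedged sketch of the v2 check in dev_map_notification() (structure
      approximates the upstream patch):

        case NETDEV_UNREGISTER:
                for (i = 0; i < dtab->map.max_entries; i++) {
                        struct bpf_dtab_netdev *dev, *odev;

                        dev = READ_ONCE(dtab->netdev_map[i]);
                        /* was: dev->dev->ifindex != netdev->ifindex,
                         * which can match a device in another netns */
                        if (!dev || netdev != dev->dev)
                                continue;
                        odev = cmpxchg(&dtab->netdev_map[i], dev, NULL);
                        if (dev == odev)
                                call_rcu(&dev->rcu, __dev_map_entry_free);
                }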
      
      Fixes: 2ddf71e2 ("net: add notifier hooks for devmap bpf map")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • swiotlb: do not panic on mapping failures · ad9a4e96
      Authored by Christoph Hellwig
      [ Upstream commit 8088546832aa2c0d8f99dd56edf6384f8a9b63b3 ]
      
      All properly written drivers now have error handling in their
      dma_map_single / dma_map_page callers (see the generic sketch below).
      As swiotlb_tbl_map_single() already prints a useful warning when
      running out of swiotlb pool space, we can also remove swiotlb_full()
      entirely, as it serves no purpose now.
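
      For reference, a generic sketch of the caller-side error handling
      this change relies on (not part of the patch itself):

        dma_addr_t addr;

        addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, addr)) {
                /* back off (e.g. drop the packet) instead of relying on
                 * the old swiotlb_full() warn/panic path */
                return -ENOMEM;
        }
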
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • printk: fix integer overflow in setup_log_buf() · 4465a916
      Authored by Sergey Senozhatsky
      [ Upstream commit d2130e82e9454304e9b91ba9da551b5989af8c27 ]
      
      The way we calculate the logbuf free-space percentage overflows a
      signed integer:
      
      	int free;
      
      	free = __LOG_BUF_LEN - log_next_idx;
      	pr_info("early log buf free: %u(%u%%)\n",
      		free, (free * 100) / __LOG_BUF_LEN);
      
      We support LOG_BUF_LEN of up to 1<<25 bytes. Since setup_log_buf() is
      called during early init, logbuf is mostly empty, so
      
      	__LOG_BUF_LEN - log_next_idx
      
      is close to 1<<25. Thus when we multiply it by 100 (2^6 + 2^5 + 2^2),
      the 2^6 term alone pushes the value past 2^31, overflowing the signed
      integer range.
      
      Example: booting with LOG_BUF_LEN of 1<<25 and the log_buf_len=2G
      boot param:
      
      [    0.075317] log_buf_len: -2147483648 bytes
      [    0.075319] early log buf free: 33549896(-28%)
      
      Make "free" unsigned integer and use appropriate printk() specifier.
      
      Link: http://lkml.kernel.org/r/20181010113308.9337-1-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • printk: lock/unlock console only for new logbuf entries · 90d73768
      Authored by Sergey Senozhatsky
      [ Upstream commit 3ac37a93fa9217e576bebfd4ba3e80edaaeb2289 ]
      
      Prior to commit 5c2992ee ("printk: remove console flushing special
      cases for partial buffered lines") we would do console_cont_flush()
      for each pr_cont() to print cont fragments, so console_unlock() would
      actually print data:
      
      	pr_cont();
      	 console_lock();
      	 console_unlock()
      	  console_cont_flush(); // print cont fragment
      	...
      	pr_cont();
      	 console_lock();
      	 console_unlock()
      	  console_cont_flush(); // print cont fragment
      
      We don't do console_cont_flush() anymore, so when we do pr_cont()
      console_unlock() does nothing (unless we flushed the cont buffer):
      
      	pr_cont();
      	 console_lock();
      	 console_unlock();      // noop
      	...
      	pr_cont();
      	 console_lock();
      	 console_unlock();      // noop
      	...
      	pr_cont();
      	  cont_flush();
      	    console_lock();
      	    console_unlock();   // print data
      
      We also wake up klogd purposelessly for pr_cont() output: an
      un-flushed cont buffer is not stored in log_buf, so there is nothing
      to pull.
      
      Thus we can console_lock()/console_unlock()/wake_up_klogd() only when
      we know that we log_store()-ed a message and there is something to
      print to the consoles/syslog; see the sketch below.
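
      A hedged sketch of the resulting vprintk_emit() logic (function names
      follow the upstream patch; this backport may differ):

        u64 curr_log_seq;
        bool pending_output;

        curr_log_seq = log_next_seq;
        printed_len = vprintk_store(facility, level, dict, dictlen, fmt, args);
        pending_output = (curr_log_seq != log_next_seq);

        /* only take the console lock if log_store() added a record */
        if (!in_sched && pending_output) {
                if (console_trylock_spinning())
                        console_unlock();
        }

        if (pending_output)
                wake_up_klogd();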
      
      Link: http://lkml.kernel.org/r/20181002023836.4487-3-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: LKML <linux-kernel@vger.kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  5. 24 Nov 2019 (7 commits)
  6. 21 Nov 2019 (2 commits)