1. 27 Dec 2019, 21 commits
  2. 18 Dec 2019, 5 commits
  3. 13 Dec 2019, 7 commits
    • sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision · 742f2319
      Xuewei Zhang committed
      commit 4929a4e6faa0f13289a67cae98139e727f0d4a97 upstream.
      
      The quota/period ratio is used to ensure a child task group won't get
      more bandwidth than the parent task group, and is calculated as:
      
        normalized_cfs_quota() = [(quota_us << 20) / period_us]
      
      If the quota/period ratio is changed by the period/quota scaling
      in sched_cfs_period_timer() due to precision loss, it will cause
      inconsistency between parent and child task groups.
      
      See the example below:
      
      A userspace container manager (kubelet) does three operations:
      
       1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
       2) Create a few children cgroups.
       3) Set quota to 1,000us and period to 10,000us on a child cgroup.
      
      These operations are expected to succeed. However, if the 147/128
      scaling in sched_cfs_period_timer() happens before step 3, the
      quota and period of the parent cgroup will be changed:
      
         new_quota:  1148437ns,  1148us
        new_period: 11484375ns, 11484us
      
      And when step 3 comes in, the ratio of the child cgroup will be
      104857, which will be larger than the parent cgroup ratio (104821),
      and will fail.
      
      Scaling them by a factor of 2 instead fixes the problem: the
      multiplication is exact, so neither quota nor period is truncated
      and their ratio is preserved.
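
      As a hedged illustration (plain userspace C, not kernel code; the
      147/128 factor, the << 20 normalization, and all numbers are taken
      from the message above), the precision loss can be reproduced like
      this:

        #include <stdio.h>

        /* normalized_cfs_quota() as given above, on microsecond values */
        static long long normalized(long long quota_us, long long period_us)
        {
                return (quota_us << 20) / period_us;
        }

        int main(void)
        {
                long long quota_ns  = 1000 * 1000LL;    /*  1,000us */
                long long period_ns = 10000 * 1000LL;   /* 10,000us */

                /* the 147/128 scaling truncates each value separately */
                long long new_quota_us  = (quota_ns  * 147 / 128) / 1000; /*  1148 */
                long long new_period_us = (period_ns * 147 / 128) / 1000; /* 11484 */

                printf("parent ratio after scaling: %lld\n",
                       normalized(new_quota_us, new_period_us));   /* 104821 */
                printf("child ratio:                %lld\n",
                       normalized(1000, 10000));                   /* 104857 */
                return 0;
        }

      The child's 104857 exceeds the parent's 104821, so step 3 is
      rejected, exactly as described above.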
      Tested-by: Phil Auld <pauld@redhat.com>
      Signed-off-by: Xuewei Zhang <xueweiz@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Phil Auld <pauld@redhat.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
      Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • bpf: btf: check name validity for various types · 12c49ac4
      Yonghong Song committed
      [ Upstream commit eb04bbb608e683f8fd3ef7f716e2fa32dd90861f ]
      
      This patch adds name checking for the following types:
       . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE,
         BTF_KIND_CONST, BTF_KIND_RESTRICT:
           the name must be null
       . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name
           is either null or a valid identifier
       . BTF_KIND_ENUM: the enum type name is either null or a valid
           identifier; the enumerator name must be a valid identifier.
       . BTF_KIND_FWD: the name must be a valid identifier
       . BTF_KIND_TYPEDEF: the name must be a valid identifier
      
      Wherever a valid name is required, the name must be a valid C
      identifier. This can be relaxed later if use cases for a
      different (non-C) frontend turn up.
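
      As a rough, self-contained sketch of the rule table above (the
      kind names mirror the UAPI BTF kinds; everything else, including
      type_name_ok() itself, is illustrative rather than the exact
      kernel code):

        #include <stdbool.h>
        #include <stddef.h>

        enum kind { KIND_PTR, KIND_ARRAY, KIND_VOLATILE, KIND_CONST,
                    KIND_RESTRICT, KIND_STRUCT, KIND_UNION, KIND_ENUM,
                    KIND_FWD, KIND_TYPEDEF };

        bool valid_identifier(const char *s);   /* see the next commit */

        static bool type_name_ok(enum kind k, const char *name)
        {
                switch (k) {
                case KIND_PTR: case KIND_ARRAY: case KIND_VOLATILE:
                case KIND_CONST: case KIND_RESTRICT:
                        return name == NULL;    /* name must be null */
                case KIND_STRUCT: case KIND_UNION: case KIND_ENUM:
                        /* enumerator names (not shown here) must always
                         * be valid identifiers */
                        return name == NULL || valid_identifier(name);
                case KIND_FWD: case KIND_TYPEDEF:
                        return name != NULL && valid_identifier(name);
                }
                return false;
        }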
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: btf: implement btf_name_valid_identifier() · 2f3e380d
      Yonghong Song committed
      [ Upstream commit cdbb096adddb3f42584cecb5ec2e07c26815b71f ]
      
      The function btf_name_valid_identifier() was implemented in
      bpf-next commit 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and
      BTF_KIND_FUNC_PROTO"). Backport it so that a later patch can
      use it.
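
      A minimal sketch of what a "valid C identifier" check looks like
      (hedged: the real btf_name_valid_identifier() also bounds the
      length by KSYM_NAME_LEN and reads the name from the BTF string
      section rather than taking a C string):

        #include <stdbool.h>
        #include <ctype.h>

        bool valid_identifier(const char *s)
        {
                /* first character: [A-Za-z_] (empty names rejected) */
                if (!isalpha((unsigned char)*s) && *s != '_')
                        return false;
                /* remaining characters: [A-Za-z0-9_] */
                while (*++s)
                        if (!isalnum((unsigned char)*s) && *s != '_')
                                return false;
                return true;
        }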
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • audit: Embed key into chunk · 6ce317fd
      Jan Kara committed
      [ Upstream commit 8d20d6e9301d7b3777d66d47dd5b89acd645cd39 ]
      
      Currently the chunk hash key (which is in fact a pointer to the
      inode) is derived as chunk->mark.conn->obj. It is tricky to make
      this dereference reliable for hash table lookups done only under
      RCU, as the mark can get detached from the connector and the
      connector freed independently of the running lookup. Thus there
      is a possible use-after-free / NULL pointer dereference issue:
      
      CPU1					CPU2
      					untag_chunk()
      					  ...
      audit_tree_lookup()
        list_for_each_entry_rcu(p, list, hash) {
      					  list_del_rcu(&chunk->hash);
      					  fsnotify_destroy_mark(entry);
      					  fsnotify_put_mark(entry)
          chunk_to_key(p)
            if (!chunk->mark.connector)
      					    ...
      					    hlist_del_init_rcu(&mark->obj_list);
      					    if (hlist_empty(&conn->list)) {
      					      inode = fsnotify_detach_connector_from_object(conn);
      					    mark->connector = NULL;
      					    ...
      					    frees connector from workqueue
            chunk->mark.connector->obj
      
      This race is probably impossible to hit in practice, as the race
      window on CPU1 is very narrow while CPU2 has a lot of code to
      execute. Still, it's better to have it fixed. Since the inode a
      chunk is attached to is constant during the chunk's lifetime, it
      is easy to cache the key in the chunk itself and thus avoid these
      issues.
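
      A hedged sketch of the fix's shape (the key field and the idea of
      filling it at creation time follow the commit; the definitions
      here are abridged, not the exact kernel structures):

        struct audit_chunk {
                unsigned long key;  /* cached lookup key, set once from
                                     * the inode when the chunk is
                                     * created */
                /* ... fsnotify mark, owners, hash list, ... */
        };

        /* lookups compare the cached key and never dereference
         * chunk->mark.connector */
        static inline bool chunk_match(const struct audit_chunk *p,
                                       unsigned long key)
        {
                return p->key == key;
        }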
      Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf/core: Consistently fail fork on allocation failures · 78a917be
      Alexander Shishkin committed
      [ Upstream commit 697d877849d4b34ab58d7078d6930bad0ef6fc66 ]
      
      Commit:
      
        313ccb96 ("perf: Allocate context task_ctx_data for child event")
      
      makes the inherit path skip over the current event in case of task_ctx_data
      allocation failure. This, however, is inconsistent with allocation failures
      in perf_event_alloc(), which would abort the fork.
      
      Correct this by returning an error code on task_ctx_data allocation
      failure and failing the fork in that case.
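
      A hedged sketch of the shape of the fix, assuming the
      inherit_event() path the commit describes (surrounding code
      abridged): returning an error pointer instead of NULL lets the
      callers abort the fork rather than silently skip the event.

        child_ctx->task_ctx_data = kzalloc(pmu->task_ctx_size,
                                           GFP_KERNEL);
        if (!child_ctx->task_ctx_data) {
                free_event(child_event);
                return ERR_PTR(-ENOMEM);  /* was: return NULL (skip) */
        }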
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20191105075702.60319-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/core: Avoid spurious lock dependencies · 870083b6
      Peter Zijlstra committed
      [ Upstream commit ff51ff84d82aea5a889b85f2b9fb3aa2b8691668 ]
      
      While seemingly harmless, __sched_fork() does hrtimer_init(),
      which, with DEBUG_OBJECTS enabled, can end up doing allocations.
      
      This then results in the following lock order:
      
        rq->lock
          zone->lock.rlock
            batched_entropy_u64.lock
      
      This in turn causes deadlocks when we do wakeups while holding
      that batched_entropy lock -- as the random code does.
      
      Solve this by moving __sched_fork() out from under rq->lock. This is
      safe because nothing there relies on rq->lock, as also evident from the
      other __sched_fork() callsite.
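
      A hedged sketch of the fix's shape (function and lock names come
      from the kernel, but the body is heavily abridged): in init_idle(),
      __sched_fork() simply moves in front of the locked section.

        void init_idle(struct task_struct *idle, int cpu)
        {
                struct rq *rq = cpu_rq(cpu);
                unsigned long flags;

                /* moved out from under the locks; may allocate when
                 * DEBUG_OBJECTS is enabled */
                __sched_fork(0, idle);

                raw_spin_lock_irqsave(&idle->pi_lock, flags);
                raw_spin_lock(&rq->lock);
                /* ... the rest of the idle-task setup, under rq->lock ... */
                raw_spin_unlock(&rq->lock);
                raw_spin_unlock_irqrestore(&idle->pi_lock, flags);
        }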
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: bigeasy@linutronix.de
      Cc: cl@linux.com
      Cc: keescook@chromium.org
      Cc: penberg@kernel.org
      Cc: rientjes@google.com
      Cc: thgarnie@google.com
      Cc: tytso@mit.edu
      Cc: will@kernel.org
      Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
      Link: https://lkml.kernel.org/r/20191001091837.GK4536@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • audit_get_nd(): don't unlock parent too early · 7fb6ef16
      Al Viro committed
      [ Upstream commit 69924b89687a2923e88cc42144aea27868913d0e ]
      
      If the child has been negative and just went positive under us,
      we want coherent d_is_positive() and ->d_inode. Don't unlock the
      parent until we have done that work...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  4. 05 Dec 2019, 7 commits