1. 18 Dec 2019 (5 commits)
  2. 13 Dec 2019 (7 commits)
    • sched/fair: Scale bandwidth quota and period without losing quota/period ratio precision · 742f2319
      Authored by Xuewei Zhang
      commit 4929a4e6faa0f13289a67cae98139e727f0d4a97 upstream.
      
      The quota/period ratio is used to ensure a child task group won't get
      more bandwidth than the parent task group, and is calculated as:

        normalized_cfs_quota() = [(quota_us << 20) / period_us]

      If the quota/period ratio changes due to precision loss during the
      scaling done by sched_cfs_period_timer() (commit 2e8e19226398 grows
      quota and period by 147/128 to avoid hard lockups), parent and child
      task groups become inconsistent.
      
      See the example below:
      
      A userspace container manager (kubelet) does three operations:
      
       1) Create a parent cgroup, set quota to 1,000us and period to 10,000us.
       2) Create a few children cgroups.
       3) Set quota to 1,000us and period to 10,000us on a child cgroup.
      
      These operations are expected to succeed. However, if the scaling of
      147/128 happens before step 3, quota and period of the parent cgroup
      will be changed:
      
        new_quota: 1148437ns,   1148us
       new_period: 11484375ns, 11484us
      
      When step 3 then comes in, the ratio of the child cgroup will be
      104857, larger than the parent cgroup's ratio (104821), so the
      operation fails.

      Scaling quota and period by a factor of 2 instead preserves the
      ratio exactly and fixes the problem; see the sketch below.
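
      A hedged sketch of the shape of the fix in sched_cfs_period_timer()
      (variable names follow the upstream patch; this backport may differ
      slightly):

        u64 new, old = ktime_to_ns(cfs_b->period);

        /*
         * Grow period (and quota) by an exact factor of 2 so that
         * (quota << 20) / period is unchanged, instead of the lossy
         * 147/128 scaling that perturbed the quota/period ratio.
         */
        new = old * 2;
        if (new < max_cfs_quota_period) {
                cfs_b->period = ns_to_ktime(new);
                cfs_b->quota *= 2;      /* ratio preserved exactly */

                pr_warn_ratelimited(
                        "cfs_period_timer[cpu%d]: period too short, scaling up (new cfs_period_us = %lld, cfs_quota_us = %lld)\n",
                        smp_processor_id(),
                        div_u64(new, NSEC_PER_USEC),
                        div_u64(cfs_b->quota, NSEC_PER_USEC));
        }
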
      Tested-by: Phil Auld <pauld@redhat.com>
      Signed-off-by: Xuewei Zhang <xueweiz@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Phil Auld <pauld@redhat.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Fixes: 2e8e19226398 ("sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup")
      Link: https://lkml.kernel.org/r/20191004001243.140897-1-xueweiz@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • bpf: btf: check name validity for various types · 12c49ac4
      Authored by Yonghong Song
      [ Upstream commit eb04bbb608e683f8fd3ef7f716e2fa32dd90861f ]
      
      This patch adds name checking for the following types:
       . BTF_KIND_PTR, BTF_KIND_ARRAY, BTF_KIND_VOLATILE,
         BTF_KIND_CONST, BTF_KIND_RESTRICT:
           the name must be null
       . BTF_KIND_STRUCT, BTF_KIND_UNION: the struct/member name
           is either null or a valid identifier
       . BTF_KIND_ENUM: the enum type name is either null or a valid
           identifier; the enumerator name must be a valid identifier.
       . BTF_KIND_FWD: the name must be a valid identifier
       . BTF_KIND_TYPEDEF: the name must be a valid identifier
      
      Wherever a valid name is required, it must be a valid C identifier
      (see the sketch below). This can be relaxed later if use cases for a
      different (non-C) frontend arise.
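
      A self-contained sketch of the C-identifier rule these checks rely on
      (modeled on btf_name_valid_identifier(); the in-kernel version also
      bounds the scan by KSYM_NAME_LEN and reads from the BTF string
      section):

        #include <ctype.h>
        #include <stdbool.h>

        /* non-empty, starts with [A-Za-z_], continues with [A-Za-z0-9_] */
        static bool name_valid_identifier(const char *name)
        {
                const unsigned char *p = (const unsigned char *)name;

                if (!*p || !(isalpha(*p) || *p == '_'))
                        return false;
                for (p++; *p; p++)
                        if (!(isalnum(*p) || *p == '_'))
                                return false;
                return true;
        }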
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: btf: implement btf_name_valid_identifier() · 2f3e380d
      Authored by Yonghong Song
      [ Upstream commit cdbb096adddb3f42584cecb5ec2e07c26815b71f ]
      
      The function btf_name_valid_identifier() was implemented in bpf-next
      commit 2667a2626f4d ("bpf: btf: Add BTF_KIND_FUNC and
      BTF_KIND_FUNC_PROTO"). Backport it so that a later patch can use it.
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • audit: Embed key into chunk · 6ce317fd
      Authored by Jan Kara
      [ Upstream commit 8d20d6e9301d7b3777d66d47dd5b89acd645cd39 ]
      
      Currently the chunk hash key (which is in fact a pointer to the
      inode) is derived as chunk->mark.conn->obj. It is tricky to make this
      dereference reliable for hash table lookups only under RCU, as the
      mark can get detached from the connector and the connector freed
      independently of the running lookup. Thus there is a possible
      use-after-free / NULL pointer dereference issue:
      
      CPU1					CPU2
      					untag_chunk()
      					  ...
      audit_tree_lookup()
        list_for_each_entry_rcu(p, list, hash) {
      					  list_del_rcu(&chunk->hash);
      					  fsnotify_destroy_mark(entry);
      					  fsnotify_put_mark(entry)
          chunk_to_key(p)
            if (!chunk->mark.connector)
      					    ...
      					    hlist_del_init_rcu(&mark->obj_list);
      					    if (hlist_empty(&conn->list)) {
      					      inode = fsnotify_detach_connector_from_object(conn);
      					    mark->connector = NULL;
      					    ...
      					    frees connector from workqueue
            chunk->mark.connector->obj
      
      This race is probably impossible to hit in practice, as the race
      window on CPU1 is very narrow and CPU2 has a lot of code to execute.
      Still, it's better to have it fixed. Since the inode the chunk is
      attached to is constant during the chunk's lifetime, it is easy to
      cache the key in the chunk itself and thus avoid these issues; see
      the sketch below.
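
      A hedged sketch of the idea: store the key once, while the inode is
      pinned at chunk creation, and compare against it in the RCU lookup
      (field and helper names approximate the upstream patch):

        struct audit_chunk {
                struct list_head hash;
                unsigned long key;      /* cached inode pointer value */
                /* mark, owners, ... */
        };

        /* at creation: chunk->key = (unsigned long)inode; */

        /* RCU hash lookup no longer dereferences mark.connector */
        list_for_each_entry_rcu(p, list, hash) {
                if (READ_ONCE(p->key) == key) {
                        atomic_long_inc(&p->refs);
                        return p;
                }
        }
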
      Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • perf/core: Consistently fail fork on allocation failures · 78a917be
      Authored by Alexander Shishkin
      [ Upstream commit 697d877849d4b34ab58d7078d6930bad0ef6fc66 ]
      
      Commit:
      
        313ccb96 ("perf: Allocate context task_ctx_data for child event")
      
      makes the inherit path skip over the current event in case of task_ctx_data
      allocation failure. This, however, is inconsistent with allocation failures
      in perf_event_alloc(), which would abort the fork.
      
      Correct this by returning an error code on task_ctx_data allocation
      failure and failing the fork in that case; see the sketch below.
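
      A hedged sketch of the idea only; the exact placement in the inherit
      path and the surrounding names may differ from the actual patch:

        child_ctx->task_ctx_data = kzalloc(pmu->task_ctx_size, GFP_KERNEL);
        if (!child_ctx->task_ctx_data) {
                free_event(child_event);
                /* was: silently skip this event; now abort the fork */
                return ERR_PTR(-ENOMEM);
        }
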
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/20191105075702.60319-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/core: Avoid spurious lock dependencies · 870083b6
      Authored by Peter Zijlstra
      [ Upstream commit ff51ff84d82aea5a889b85f2b9fb3aa2b8691668 ]
      
      While seemingly harmless, __sched_fork() does hrtimer_init(), which,
      with DEBUG_OBJECTS enabled, can end up doing allocations.
      
      This then results in the following lock order:
      
        rq->lock
          zone->lock.rlock
            batched_entropy_u64.lock
      
      Which in turn causes deadlocks when we do wakeups while holding that
      batched_entropy lock -- as the random code does.
      
      Solve this by moving __sched_fork() out from under rq->lock (see the
      sketch below). This is safe because nothing there relies on rq->lock,
      as is also evident from the other __sched_fork() call site.
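
      A sketch of the reordering in init_idle(), per the description above
      (hedged; the rest of the function is elided):

        void init_idle(struct task_struct *idle, int cpu)
        {
                struct rq *rq = cpu_rq(cpu);
                unsigned long flags;

                /*
                 * Moved out from under the locks: hrtimer_init() can
                 * allocate when DEBUG_OBJECTS is enabled.
                 */
                __sched_fork(0, idle);

                raw_spin_lock_irqsave(&idle->pi_lock, flags);
                raw_spin_lock(&rq->lock);
                /* ... rest unchanged ... */
        }
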
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: akpm@linux-foundation.org
      Cc: bigeasy@linutronix.de
      Cc: cl@linux.com
      Cc: keescook@chromium.org
      Cc: penberg@kernel.org
      Cc: rientjes@google.com
      Cc: thgarnie@google.com
      Cc: tytso@mit.edu
      Cc: will@kernel.org
      Fixes: b7d5dc21072c ("random: add a spinlock_t to struct batched_entropy")
      Link: https://lkml.kernel.org/r/20191001091837.GK4536@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • audit_get_nd(): don't unlock parent too early · 7fb6ef16
      Authored by Al Viro
      [ Upstream commit 69924b89687a2923e88cc42144aea27868913d0e ]
      
      If the child has been negative and just went positive under us, we
      want coherent d_is_positive() and ->d_inode. Don't unlock the parent
      until we've done that work; see the sketch below.
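
      A hedged sketch of the corrected ordering in audit_get_nd() (the
      field updates approximate the upstream code):

        static int audit_get_nd(struct audit_watch *watch, struct path *parent)
        {
                struct dentry *d = kern_path_locked(watch->path, parent);

                if (IS_ERR(d))
                        return PTR_ERR(d);
                if (d_is_positive(d)) {
                        /* read dentry state while the parent is still
                         * locked, so d_is_positive() and ->d_inode agree */
                        watch->dev = d->d_sb->s_dev;
                        watch->ino = d_backing_inode(d)->i_ino;
                }
                /* moved after the check above */
                inode_unlock(d_backing_inode(parent->dentry));
                dput(d);
                return 0;
        }
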
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  3. 05 Dec 2019 (7 commits)
  4. 01 Dec 2019 (12 commits)
    • futex: Prevent robust futex exit race · 2819f403
      Authored by Yang Tao
      commit ca16d5bee59807bf04deaab0a8eccecd5061528c upstream.
      
      Robust futexes utilize the robust_list mechanism to allow the kernel
      to release futexes which are held when a task exits. The exit can be
      voluntary or caused by a signal or fault. This prevents waiters from
      blocking forever.
      
      The futex operations in user space store a pointer to the futex they are
      either locking or unlocking in the op_pending member of the per task robust
      list.
      
      After a lock operation has succeeded the futex is queued in the robust list
      linked list and the op_pending pointer is cleared.
      
      After an unlock operation has succeeded the futex is removed from the
      robust list linked list and the op_pending pointer is cleared.
      
      The robust list exit code checks for the pending operation and any futex
      which is queued in the linked list. It carefully checks whether the futex
      value is the TID of the exiting task. If so, it sets the OWNER_DIED bit and
      tries to wake up a potential waiter.
      
      This is race free for the lock operation but unlock has two race scenarios
      where waiters might not be woken up. These issues can be observed with
      regular robust pthread mutexes. PI aware pthread mutexes are not affected.
      
      (1) Unlocking task is killed after unlocking the futex value in user space
          before being able to wake a waiter.
      
              pthread_mutex_unlock()
                      |
                      V
              atomic_exchange_rel (&mutex->__data.__lock, 0)
                              <------------------------killed
                  lll_futex_wake ()                   |
                                                      |
                                                      |(__lock = 0)
                                                      |(enter kernel)
                                                      |
                                                      V
                                                  do_exit()
                                                  exit_mm()
                                                mm_release()
                                              exit_robust_list()
                                              handle_futex_death()
                                                      |
                                                      |(__lock = 0)
                                                      |(uval = 0)
                                                      |
                                                      V
              if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                      return 0;
      
          The sanity check which ensures that the user space futex is owned by
          the exiting task prevents the wakeup of waiters which in consequence
          block infinitely.
      
      (2) Waiting task is killed after a wakeup and before it can acquire the
          futex in user space.
      
              OWNER                         WAITER
      				futex_wait()
         pthread_mutex_unlock()               |
                      |                       |
                      |(__lock = 0)           |
                      |                       |
                      V                       |
               futex_wake() ------------>  wakeup()
                                              |
                                              |(return to userspace)
                                              |(__lock = 0)
                                              |
                                              V
                              oldval = mutex->__data.__lock
                                                <-----------------killed
          atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,  |
                              id | assume_other_futex_waiters, 0)      |
                                                                       |
                                                                       |
                                                         (enter kernel)|
                                                                       |
                                                                       V
                                                               do_exit()
                                                              |
                                                              |
                                                              V
                                              handle_futex_death()
                                              |
                                              |(__lock = 0)
                                              |(uval = 0)
                                              |
                                              V
              if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr))
                      return 0;
      
          The sanity check which ensures that the user space futex is owned
          by the exiting task prevents the wakeup of waiters, which seems to
          be correct as the exiting task does not own the futex value, but
          the consequence is that other waiters won't be woken up and block
          infinitely.
      
      In both scenarios the following conditions are true:
      
         - task->robust_list->list_op_pending != NULL
         - user space futex value == 0
         - Regular futex (not PI)
      
      If these conditions are met then it is reasonably safe to wake up a
      potential waiter in order to prevent the above problems.
      
      As this might be a false positive it can cause spurious wakeups, but
      the waiter side has to handle other types of unrelated wakeups (e.g.
      signals) gracefully anyway, so such a spurious wakeup will not affect
      the correctness of these operations.
      
      This workaround must not touch the user space futex value and cannot set
      the OWNER_DIED bit because the lock value is 0, i.e. uncontended. Setting
      OWNER_DIED in this case would result in inconsistent state and subsequently
      in malfunction of the owner died handling in user space.
      
      The rest of the user space state is still consistent, as no other task
      can observe the list_op_pending entry in the exiting task's robust
      list.

      The eventually woken up waiter will observe the uncontended lock value
      and take it over; see the sketch below.
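
      A sketch of the resulting check in handle_futex_death(), following
      the upstream fix (the pending_op flag and constants are per the tglx
      note below):

        if ((uval & FUTEX_TID_MASK) != task_pid_vnr(curr)) {
                /*
                 * The exiting task does not own the futex, but
                 * list_op_pending plus an uncontended (0) value means an
                 * unlock or lock raced with exit: kick a potential waiter
                 * so it can take the lock over. Spurious wakeups are
                 * harmless; do NOT set OWNER_DIED or touch the value.
                 */
                if (pending_op && !pi && !uval) {
                        futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
                        return 0;
                }
                return 0;
        }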
      
      [ tglx: Massaged changelog and comment. Made the return explicit and not
        	depend on the subsequent check and added constants to hand into
        	handle_futex_death() instead of plain numbers. Fixed a few coding
      	style issues. ]
      
      Fixes: 0771dfef ("[PATCH] lightweight robust futexes: core")
      Signed-off-by: Yang Tao <yang.tao172@zte.com.cn>
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1573010582-35297-1-git-send-email-wang.yi59@zte.com.cn
      Link: https://lkml.kernel.org/r/20191106224555.943191378@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • y2038: futex: Move compat implementation into futex.c · d3f8c58d
      Authored by Arnd Bergmann
      commit 04e7712f4460585e5eed5b853fd8b82a9943958f upstream.
      
      We are going to share the compat_sys_futex() handler between 64-bit
      architectures and 32-bit architectures that need to deal with both 32-bit
      and 64-bit time_t, and this is easier if both entry points are in the
      same file.
      
      In fact, most other system call handlers do the same thing these days, so
      let's follow the trend here and merge all of futex_compat.c into futex.c.
      
      In the process, a few minor changes have to be made so that everything
      still makes sense: handle_futex_death() and futex_cmpxchg_enabled()
      become local symbols, and the compat version of fetch_robust_entry()
      gets renamed to compat_fetch_robust_entry() to avoid a symbol clash.
      
      This is intended as a purely cosmetic patch, no behavior should
      change.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • audit: print empty EXECVE args · 3c69a033
      Authored by Richard Guy Briggs
      [ Upstream commit ea956d8be91edc702a98b7fe1f9463e7ca8c42ab ]
      
      Empty executable arguments were being skipped when printing out the list
      of arguments in an EXECVE record, making it appear they were somehow
      lost.  Include empty arguments as an itemized empty string.
      
      Reproducer:
      	autrace /bin/ls "" "/etc"
      	ausearch --start recent -m execve -i | grep EXECVE
      	type=EXECVE msg=audit(10/03/2018 13:04:03.208:1391) : argc=3 a0=/bin/ls a2=/etc
      
      With fix:
      	type=EXECVE msg=audit(10/03/2018 21:51:38.290:194) : argc=3 a0=/bin/ls a1= a2=/etc
      	type=EXECVE msg=audit(1538617898.290:194): argc=3 a0="/bin/ls" a1="" a2="/etc"
      
      Passes audit-testsuite.  GH issue tracker at
      https://github.com/linux-audit/audit-kernel/issues/99
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      [PM: cleaned up the commit metadata]
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/fair: Don't increase sd->balance_interval on newidle balance · 31bced01
      Authored by Valentin Schneider
      [ Upstream commit 3f130a37c442d5c4d66531b240ebe9abfef426b5 ]
      
      When load_balance() fails to move some load because of task affinity,
      we end up increasing sd->balance_interval to delay the next periodic
      balance in the hopes that next time we look, that annoying pinned
      task(s) will be gone.
      
      However, idle_balance() pays no attention to sd->balance_interval, yet
      it will still lead to an increase in balance_interval in case of
      pinned tasks.
      
      If we're going through several newidle balances (e.g. we have a
      periodic task), this can lead to a huge increase of the
      balance_interval in a very small amount of time.
      
      To prevent that, don't increase the balance interval when going
      through a newidle balance; see the sketch below.
      
      This is a similar approach to what is done in commit 58b26c4c
      ("sched: Increment cache_nice_tries only on periodic lb"), where we
      disregard newidle balance and rely on periodic balance for more stable
      results.
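
      A hedged sketch of the guard added to load_balance()'s pinned-task
      exit path (labels follow upstream and may differ in this tree):

        out_one_pinned:
                ld_moved = 0;

                /*
                 * newidle balance ignores sd->balance_interval, so hitting
                 * this path repeatedly would let the interval skyrocket.
                 * Leave the tune-up to periodic balance.
                 */
                if (env.idle == CPU_NEWLY_IDLE)
                        goto out;

                /* tune up the balancing interval */
                if (((env.flags & LBF_ALL_PINNED) &&
                     sd->balance_interval < MAX_PINNED_INTERVAL) ||
                    (sd->balance_interval < sd->max_interval))
                        sd->balance_interval *= 2;
        out:
                return ld_moved;
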
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar.Eggemann@arm.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: patrick.bellasi@arm.com
      Cc: vincent.guittot@linaro.org
      Link: http://lkml.kernel.org/r/1537974727-30788-2-git-send-email-valentin.schneider@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/topology: Fix off by one bug · ed023646
      Authored by Peter Zijlstra
      [ Upstream commit 993f0b0510dad98b4e6e39506834dab0d13fd539 ]
      
      With the addition of the NUMA identity level, we increased @level by
      one and will run off the end of the array in the distance sort loop.
      
      Fixes: 051f3ca0 ("sched/topology: Introduce NUMA identity node sched domain")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
    • irq/matrix: Fix memory overallocation · 0bbb8382
      Authored by Michael Kelley
      [ Upstream commit 57f01796f14fecf00d330fe39c8d2477ced9cd79 ]
      
      IRQ_MATRIX_SIZE is the number of longs needed for a bitmap, multiplied
      by the size of a long, yielding a byte count. But it is used to size
      an array of longs, which is far more memory than is needed.

      Change IRQ_MATRIX_SIZE so it is just the number of longs needed, so
      the arrays come out the correct size; see the sketch below.
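
      A sketch of the before/after in kernel/irq/matrix.c (assuming the
      upstream form of the macro):

        /* before: a byte count, used as an array length of longs,
         * overallocating by a factor of sizeof(unsigned long) */
        #define IRQ_MATRIX_SIZE (BITS_TO_LONGS(IRQ_MATRIX_BITS) * sizeof(unsigned long))

        /* after: the number of longs, which is what the arrays need */
        #define IRQ_MATRIX_SIZE (BITS_TO_LONGS(IRQ_MATRIX_BITS))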
      
      Fixes: 2f75d9e1 ("genirq: Implement bitmap matrix allocator")
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: KY Srinivasan <kys@microsoft.com>
      Link: https://lkml.kernel.org/r/1541032428-10392-1-git-send-email-mikelley@microsoft.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • kernel/panic.c: do not append newline to the stack protector panic string · 023c071f
      Authored by Borislav Petkov
      [ Upstream commit 95c4fb78fb23081472465ca20d5d31c4b780ed82 ]
      
      ... because panic() itself already appends one. Otherwise you get a
      line-broken trailer; the one-line fix is sketched after the example:
      
        [    1.836965] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: pgd_alloc+0x29e/0x2a0
        [    1.836965]  ]---
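
      A sketch of the fix in kernel/panic.c's __stack_chk_fail() path
      (assuming the upstream form):

        /* before: the trailing \n produced the broken "]---" trailer */
        panic("stack-protector: Kernel stack is corrupted in: %pB\n",
              __builtin_return_address(0));

        /* after: panic() appends its own newline */
        panic("stack-protector: Kernel stack is corrupted in: %pB",
              __builtin_return_address(0));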
      
      Link: http://lkml.kernel.org/r/20181008202901.7894-1-bp@alien8.de
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf, btf: fix a missing check bug in btf_parse · 54299e1c
      Authored by Martin Lau
      [ Upstream commit 4a6998aff82a20a1aece86a186d8e5263f8b2315 ]
      
      Wenwen Wang reported:
      
        In btf_parse(), the header of the user-space btf data 'btf_data'
        is firstly parsed and verified through btf_parse_hdr().
        In btf_parse_hdr(), the header is copied from user-space 'btf_data'
        to kernel-space 'btf->hdr' and then verified. If no error happens
        during the verification process, the whole data of 'btf_data',
        including the header, is then copied to 'data' in btf_parse(). It
        is obvious that the header is copied twice here. More importantly,
        no check is enforced after the second copy to make sure the headers
        obtained in these two copies are the same. Given that 'btf_data' resides
        in the user space, a malicious user can race to modify the header
        between these two copies. By doing so, the user can inject
        inconsistent data, which can cause undefined behavior of the
        kernel and introduce potential security risk.
      
      This issue is similar to the one fixed in commit 8af03d1ae2e1 ("bpf:
      btf: Fix a missing check bug"). To fix it, this patch copies the user
      'btf_data' *before* parsing / verifying the BTF header; see the
      sketch below.
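
      A hedged sketch of the reordered btf_parse() (error paths and names
      approximate the upstream patch):

        data = kvmalloc(btf_data_size, GFP_KERNEL | __GFP_NOWARN);
        if (!data)
                goto errout;
        btf->data = data;
        btf->data_size = btf_data_size;

        /* single copy: the header is verified from this kernel-space
         * copy, so userspace cannot race a second, inconsistent copy */
        if (copy_from_user(data, btf_data, btf_data_size)) {
                err = -EFAULT;
                goto errout;
        }

        err = btf_parse_hdr(env);  /* now reads btf->data, not user memory */
        if (err)
                goto errout;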
      
      Fixes: 69b693f0 ("bpf: btf: Introduce BPF Type Format (BTF)")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Co-developed-by: Wenwen Wang <wang6495@umn.edu>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: devmap: fix wrong interface selection in notifier_call · 8044e741
      Authored by Taehee Yoo
      [ Upstream commit f592f804831f1cf9d1f9966f58c80f150e6829b5 ]
      
      dev_map_notification() removes an interface from the devmap if the
      unregistering interface's ifindex matches. But checking only the
      ifindex is not enough, because another netns can have an interface
      with the same ifindex, so the wrong interface could be selected.
      Hence add a netdev pointer comparison; see the sketch below.
      
      v2: compare netdev pointer instead of using net_eq() (Daniel Borkmann)
      v1: Initial patch
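
      A hedged sketch of the v2 check in dev_map_notification() (structure
      approximates the upstream patch):

        case NETDEV_UNREGISTER:
                for (i = 0; i < dtab->map.max_entries; i++) {
                        struct bpf_dtab_netdev *dev, *odev;

                        dev = READ_ONCE(dtab->netdev_map[i]);
                        /* was: dev->dev->ifindex != netdev->ifindex,
                         * which can match a device in another netns */
                        if (!dev || netdev != dev->dev)
                                continue;
                        odev = cmpxchg(&dtab->netdev_map[i], dev, NULL);
                        if (dev == odev)
                                call_rcu(&dev->rcu, __dev_map_entry_free);
                }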
      
      Fixes: 2ddf71e2 ("net: add notifier hooks for devmap bpf map")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • swiotlb: do not panic on mapping failures · ad9a4e96
      Authored by Christoph Hellwig
      [ Upstream commit 8088546832aa2c0d8f99dd56edf6384f8a9b63b3 ]
      
      All properly written drivers now have error handling in their
      dma_map_single / dma_map_page callers (see the generic sketch below).
      As swiotlb_tbl_map_single() already prints a useful warning when
      running out of swiotlb pool space, we can also remove swiotlb_full()
      entirely, as it serves no purpose now.
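
      For reference, a generic sketch of the caller-side error handling
      this change relies on (not part of the patch itself):

        dma_addr_t addr;

        addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
        if (dma_mapping_error(dev, addr)) {
                /* back off (e.g. drop the packet) instead of relying on
                 * the old swiotlb_full() warn/panic path */
                return -ENOMEM;
        }
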
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • printk: fix integer overflow in setup_log_buf() · 4465a916
      Authored by Sergey Senozhatsky
      [ Upstream commit d2130e82e9454304e9b91ba9da551b5989af8c27 ]
      
      The way we calculate the logbuf free-space percentage overflows a
      signed integer:
      
      	int free;
      
      	free = __LOG_BUF_LEN - log_next_idx;
      	pr_info("early log buf free: %u(%u%%)\n",
      		free, (free * 100) / __LOG_BUF_LEN);
      
      We support LOG_BUF_LEN of up to 1<<25 bytes. Since setup_log_buf() is
      called during early init, logbuf is mostly empty, so
      
      	__LOG_BUF_LEN - log_next_idx
      
      is close to 1<<25. Thus when we multiply it by 100 (2^6 + 2^5 + 2^2),
      the 2^6 term alone pushes the value past 2^31, overflowing the signed
      integer range.
      
      Example: booting with LOG_BUF_LEN of 1<<25 and the log_buf_len=2G
      boot param:
      
      [    0.075317] log_buf_len: -2147483648 bytes
      [    0.075319] early log buf free: 33549896(-28%)
      
      Make "free" unsigned integer and use appropriate printk() specifier.
      
      Link: http://lkml.kernel.org/r/20181010113308.9337-1-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • printk: lock/unlock console only for new logbuf entries · 90d73768
      Authored by Sergey Senozhatsky
      [ Upstream commit 3ac37a93fa9217e576bebfd4ba3e80edaaeb2289 ]
      
      Prior to commit 5c2992ee ("printk: remove console flushing special
      cases for partial buffered lines") we would do console_cont_flush()
      for each pr_cont() to print cont fragments, so console_unlock() would
      actually print data:
      
      	pr_cont();
      	 console_lock();
      	 console_unlock()
      	  console_cont_flush(); // print cont fragment
      	...
      	pr_cont();
      	 console_lock();
      	 console_unlock()
      	  console_cont_flush(); // print cont fragment
      
      We don't do console_cont_flush() anymore, so when we do pr_cont()
      console_unlock() does nothing (unless we flushed the cont buffer):
      
      	pr_cont();
      	 console_lock();
      	 console_unlock();      // noop
      	...
      	pr_cont();
      	 console_lock();
      	 console_unlock();      // noop
      	...
      	pr_cont();
      	  cont_flush();
      	    console_lock();
      	    console_unlock();   // print data
      
      We also wake up klogd purposelessly for pr_cont() output: an
      un-flushed cont buffer is not stored in log_buf, so there is nothing
      to pull.
      
      Thus we can console_lock()/console_unlock()/wake_up_klogd() only when
      we know that we log_store()-ed a message and there is something to
      print to the consoles/syslog; see the sketch below.
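
      A hedged sketch of the resulting vprintk_emit() logic (function names
      follow the upstream patch; this backport may differ):

        u64 curr_log_seq;
        bool pending_output;

        curr_log_seq = log_next_seq;
        printed_len = vprintk_store(facility, level, dict, dictlen, fmt, args);
        pending_output = (curr_log_seq != log_next_seq);

        /* only take the console lock if log_store() added a record */
        if (!in_sched && pending_output) {
                if (console_trylock_spinning())
                        console_unlock();
        }

        if (pending_output)
                wake_up_klogd();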
      
      Link: http://lkml.kernel.org/r/20181002023836.4487-3-sergey.senozhatsky@gmail.com
      To: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: LKML <linux-kernel@vger.kernel.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  5. 24 Nov 2019 (7 commits)
  6. 21 Nov 2019 (2 commits)