1. 03 Jul 2018: 9 commits
    • kthread: Simplify kthread_park() completion · f83ee19b
      Committed by Peter Zijlstra
      Oleg explains the reason we could hit park+park is that
      smpboot_update_cpumask_percpu_thread()'s
      
        for_each_cpu_and(cpu, &tmp, cpu_online_mask)
      	smpboot_park_kthread();
      
      turns into:
      
        for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)and)
      	smpboot_park_kthread();
      
      on UP, ignoring the mask. But since we just completely removed that
      function, this is no longer relevant.
      
      So revert commit:
      
        b1f5b378 ("kthread: Allow kthread_park() on a parked kthread")
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
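      The UP degeneration described in the entry above can be demonstrated with a
      small, self-contained C program. This is an approximation of the UP macro
      shape quoted in the commit message, not the kernel header itself; it shows
      that the mask arguments are evaluated but never filter the loop, so CPU 0 is
      always visited once:

        #include <stdio.h>

        /*
         * Approximation of the UP expansion quoted above: iterate exactly once
         * over "CPU 0" and merely cast the masks to void.
         */
        #define for_each_cpu_and(cpu, mask, and) \
                for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask), (void)(and))

        int main(void)
        {
                unsigned long online_mask = 0x1; /* pretend only CPU0 is online */
                unsigned long tmp_mask = 0x0;    /* empty mask: nothing should be parked */
                int cpu;

                /* Even with an empty tmp_mask, the UP macro still visits CPU 0. */
                for_each_cpu_and(cpu, tmp_mask, online_mask)
                        printf("would call smpboot_park_kthread(%d)\n", cpu);

                return 0;
        }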
    • smpboot: Remove cpumask from the API · 167a8867
      Committed by Peter Zijlstra
      Now that the sole use of the whole smpboot_*cpumask() API is gone,
      remove it.
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • watchdog/softlockup: Replace "watchdog/%u" threads with cpu_stop_work · 9cf57731
      Committed by Peter Zijlstra
      Oleg suggested replacing the "watchdog/%u" threads with
      cpu_stop_work. That removes one thread per CPU while at the same time
      fixing the softlockup vs SCHED_DEADLINE interaction.

      But more importantly, it does away with the single
      smpboot_update_cpumask_percpu_thread() user, which allows
      cleanups/shrinkage of the smpboot interface (see the sketch after this entry).
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
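      A kernel-style sketch of the approach, with hypothetical function and
      per-CPU variable names (the real patch drives this from the watchdog hrtimer
      callback): the periodic softlockup check runs in the existing per-CPU stopper
      thread via cpu_stop_work, so no dedicated "watchdog/%u" kthread is needed.

        #include <linux/stop_machine.h>
        #include <linux/percpu.h>

        /* one pre-allocated stop-work buffer per CPU (hypothetical name) */
        static DEFINE_PER_CPU(struct cpu_stop_work, softlockup_stop_work);

        static int softlockup_fn(void *data)
        {
                /* touch the softlockup timestamp for this CPU here */
                return 0;
        }

        static void queue_softlockup_check(unsigned int cpu)
        {
                /*
                 * Runs softlockup_fn() on @cpu in stopper context: no extra
                 * thread, and the stopper class cannot be starved by
                 * SCHED_DEADLINE tasks.
                 */
                stop_one_cpu_nowait(cpu, softlockup_fn, NULL,
                                    per_cpu_ptr(&softlockup_stop_work, cpu));
        }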
    • kthread, sched/core: Fix kthread_parkme() (again...) · 1cef1150
      Committed by Peter Zijlstra
      Gaurav reports that commit:
      
        85f1abe0 ("kthread, sched/wait: Fix kthread_parkme() completion issue")
      
      isn't working for him, because of the following race:
      
      > controller Thread                               CPUHP Thread
      > takedown_cpu
      > kthread_park
      > kthread_parkme
      > Set KTHREAD_SHOULD_PARK
      >                                                 smpboot_thread_fn
      >                                                 set Task interruptible
      >
      >
      > wake_up_process
      >  if (!(p->state & state))
      >                 goto out;
      >
      >                                                 Kthread_parkme
      >                                                 SET TASK_PARKED
      >                                                 schedule
      >                                                 raw_spin_lock(&rq->lock)
      > ttwu_remote
      > waiting for __task_rq_lock
      >                                                 context_switch
      >
      >                                                 finish_lock_switch
      >
      >
      >
      >                                                 Case TASK_PARKED
      >                                                 kthread_park_complete
      >
      >
      > SET Running
      
      Furthermore, Oleg noticed that the whole scheduler TASK_PARKED
      handling is broken: unlike the TASK_DEAD case, which is handled with
      preemption disabled, the current code can still complete the park
      early when it gets preempted :/

      So basically revert that earlier fix and go with a variant of the
      alternative mentioned in the commit. Promote TASK_PARKED to a special
      state to avoid the store-store issue on task->state leading to the
      WARN in kthread_unpark() -> __kthread_bind().

      In addition, add wait_task_inactive() to kthread_park() to ensure
      the task really is PARKED when we return from kthread_park(). This
      avoids the whole "kthread still gets migrated" problem -- although it
      would be really good to get this done differently (see the sketch
      after this entry).
      Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 85f1abe0 ("kthread, sched/wait: Fix kthread_parkme() completion issue")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
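      A simplified kernel-style sketch of the two-sided fix (struct kthread and
      KTHREAD_SHOULD_PARK are kthread.c internals; this is not the verbatim
      patch): the parked task enters TASK_PARKED as a special state so the state
      store cannot race with a concurrent wake-up, and kthread_park() waits until
      the task is genuinely off the CPU.

        static void __kthread_parkme_sketch(struct kthread *self)
        {
                for (;;) {
                        /* special state: serialized against ttwu() via ->pi_lock */
                        set_special_state(TASK_PARKED);
                        if (!test_bit(KTHREAD_SHOULD_PARK, &self->flags))
                                break;
                        complete(&self->parked);
                        schedule();
                }
                __set_current_state(TASK_RUNNING);
        }

        /* caller side of kthread_park(): only return once @k really is parked */
        static void kthread_park_wait_sketch(struct task_struct *k,
                                             struct kthread *kthread)
        {
                if (k != current) {
                        wait_for_completion(&kthread->parked);
                        /* @k must be fully descheduled in TASK_PARKED */
                        WARN_ON_ONCE(!wait_task_inactive(k, TASK_PARKED));
                }
        }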
    • sched/util_est: Fix util_est_dequeue() for throttled cfs_rq · 3482d98b
      Committed by Vincent Guittot
      When a cfs_rq is throttled, the parent cfs_rq->nr_running is decreased
      and everything happens at the cfs_rq level. Currently util_est stays
      unchanged in that case and keeps accounting the utilization of the
      throttled tasks. This can make some sense, as we don't dequeue the
      tasks but only the throttled cfs_rq.

      If a task of another group is enqueued/dequeued and the root cfs_rq
      becomes idle during the dequeue, util_est will be cleared even though
      it was accounting the util_est of the throttled tasks before. So the
      behavior of util_est is not always the same for throttled tasks and
      depends on side activity. Furthermore, util_est will not be updated
      when the cfs_rq is unthrottled, as everything happens at the cfs_rq
      level. The main result is that util_est stays at zero even though we
      now have running tasks. We have to wait for the next dequeue/enqueue
      of the previously throttled tasks to get an up-to-date util_est.

      Remove the assumption that a cfs_rq's estimated utilization is 0 when
      there is no running task, so a task's util_est remains accounted until
      that task is dequeued, even if its cfs_rq has been throttled (see the
      sketch after this entry).
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 7f65ea42 ("sched/fair: Add util_est on top of PELT")
      Link: http://lkml.kernel.org/r/1528972380-16268-1-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
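      A conceptual sketch with hypothetical stand-in structures (not the fair.c
      code): on dequeue, subtract only the task's own util_est contribution
      instead of zeroing the cfs_rq estimate whenever nr_running happens to be 0,
      so the contribution of a throttled group survives.

        #include <linux/kernel.h>

        struct sketch_cfs_rq { unsigned int util_est_enqueued; };
        struct sketch_task   { unsigned int util_est; };

        static void util_est_dequeue_sketch(struct sketch_cfs_rq *cfs_rq,
                                            const struct sketch_task *p)
        {
                unsigned int ue = cfs_rq->util_est_enqueued;
                unsigned int dec = min(ue, p->util_est);

                /* never clamp to 0 just because no task is currently runnable */
                cfs_rq->util_est_enqueued = ue - dec;
        }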
    • sched/fair: Advance global expiration when period timer is restarted · f1d1be8a
      Committed by Xunlei Pang
      When the period gets restarted after some idle time, start_cfs_bandwidth()
      doesn't update the expiration information, so expire_cfs_rq_runtime() will
      see cfs_rq->runtime_expires smaller than the rq clock and fall into the
      clock-drift logic, needlessly wasting CPU cycles on the scheduler hot path.

      Update the global expiration in start_cfs_bandwidth() to avoid frequent
      expire_cfs_rq_runtime() calls once a new period begins (see the sketch
      after this entry).
      Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180620101834.24455-2-xlpang@linux.alibaba.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
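      A hedged, kernel-style sketch of the idea with stand-in types (the real
      change lives in start_cfs_bandwidth() and the runtime refill path): when the
      period timer is restarted after idle, push the global deadline forward and
      note that it changed, so per-cfs_rq expiry checks stop tripping over a stale
      value on every tick.

        #include <linux/types.h>

        struct bw_sketch {
                u64 period_ns;        /* cfs period length */
                u64 runtime_expires;  /* global deadline for handed-out runtime */
                u64 expires_seq;      /* bumped every time the deadline moves */
        };

        static void start_bandwidth_sketch(struct bw_sketch *b, u64 now_ns)
        {
                /* new deadline: one full period from now, and record the change */
                b->runtime_expires = now_ns + b->period_ns;
                b->expires_seq++;
        }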
    • sched/fair: Fix bandwidth timer clock drift condition · 512ac999
      Committed by Xunlei Pang
      I noticed that cgroup task groups constantly get throttled even
      if they have low CPU usage; this causes jitter in the response
      time of some of our business containers when CPU quotas are enabled.
      
      It's very simple to reproduce:
      
        mkdir /sys/fs/cgroup/cpu/test
        cd /sys/fs/cgroup/cpu/test
        echo 100000 > cpu.cfs_quota_us
        echo $$ > tasks
      
      then repeat:
      
        cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily
      
      After some analysis, we found that cfs_rq::runtime_remaining will
      be cleared by expire_cfs_rq_runtime() due to two equal but stale
      "cfs_{b|q}->runtime_expires" after the period timer is re-armed.

      The current condition used to detect clock drift in expire_cfs_rq_runtime()
      is wrong: the two runtime_expires are actually the same when clock
      drift happens, so this condition can never hit. The original design was
      done correctly by this commit:

        a9cf55b2 ("sched: Expire invalid runtime")

      ... but was changed to the current implementation due to its locking bug.

      This patch introduces another way: it adds a new field to both cfs_rq
      and cfs_bandwidth to record the expiration update sequence, and uses
      them to figure out whether clock drift has happened (true if they are
      still equal); see the sketch after this entry.
      Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 51f2176d ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
      Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
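      A kernel-style sketch of the new drift test with stand-in structures (field
      names follow the commit description; the real code is in
      expire_cfs_rq_runtime()): if the local and global expiration sequence
      numbers still match, a deadline that appears to be in the past can only be
      clock drift, so the deadline is extended instead of the remaining runtime
      being discarded.

        #include <linux/types.h>

        struct sketch_cfs_bandwidth { u64 expires_seq; };

        struct sketch_cfs_rq {
                u64 expires_seq;
                u64 runtime_expires;
                s64 runtime_remaining;
        };

        static void expire_runtime_sketch(struct sketch_cfs_rq *cfs_rq,
                                          const struct sketch_cfs_bandwidth *cfs_b,
                                          u64 tick_nsec)
        {
                if (cfs_rq->expires_seq == cfs_b->expires_seq) {
                        /* same hand-out period: the mismatch is clock drift, extend */
                        cfs_rq->runtime_expires += tick_nsec;
                } else {
                        /* a new period has begun: the local runtime really is stale */
                        cfs_rq->runtime_remaining = 0;
                }
        }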
    • sched/rt: Fix call to cpufreq_update_util() · 296b2ffe
      Committed by Vincent Guittot
      With commit:
      
        8f111bc3 ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
      
      the schedutil governor uses rq->rt.rt_nr_running to detect whether an
      RT task is currently running on the CPU and to set frequency to max
      if necessary.
      
      cpufreq_update_util() is called in enqueue/dequeue_top_rt_rq() but
      rq->rt.rt_nr_running has not been updated yet when dequeue_top_rt_rq() is
      called so schedutil still considers that an RT task is running when the
      last task is dequeued. The update of rq->rt.rt_nr_running happens later
      in dequeue_rt_stack().
      
      In fact, we can take advantage of the fact that rt entities are dequeued
      and then re-enqueued whenever an RT task is enqueued or dequeued.
      As a result, enqueue_top_rt_rq() is always called when a task is
      enqueued or dequeued, and also when groups are throttled or unthrottled.
      The only path that does not go through enqueue_top_rt_rq() is when the
      root rt_rq is throttled (see the sketch after this entry).
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: efault@gmx.de
      Cc: juri.lelli@redhat.com
      Cc: patrick.bellasi@arm.com
      Cc: viresh.kumar@linaro.org
      Fixes: 8f111bc3 ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")
      Link: http://lkml.kernel.org/r/1530021202-21695-1-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
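      A conceptual sketch of the ordering requirement with hypothetical names (the
      actual fix reworks where enqueue_top_rt_rq()/dequeue_top_rt_rq() invoke the
      hook): the cpufreq hook must only run after rt_nr_running has been updated,
      otherwise dequeuing the last RT task still looks like an RT task running.

        #include <linux/types.h>

        struct rt_rq_sketch { unsigned int rt_nr_running; };

        static void cpufreq_hook_sketch(const struct rt_rq_sketch *rt_rq)
        {
                /* schedutil-style decision: any RT task running => go to fmax */
                bool rt_running = rt_rq->rt_nr_running > 0;
                (void)rt_running;
        }

        static void dequeue_top_sketch(struct rt_rq_sketch *rt_rq, unsigned int nr)
        {
                rt_rq->rt_nr_running -= nr;
                cpufreq_hook_sketch(rt_rq);   /* hook now sees the updated count */
        }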
    • sched/nohz: Skip remote tick on idle task entirely · d9c0ffca
      Committed by Frederic Weisbecker
      Some people have reported that the warning in sched_tick_remote()
      occasionally triggers, especially under RCU-Torture pressure:
      
      	WARNING: CPU: 11 PID: 906 at kernel/sched/core.c:3138 sched_tick_remote+0xb6/0xc0
      	Modules linked in:
      	CPU: 11 PID: 906 Comm: kworker/u32:3 Not tainted 4.18.0-rc2+ #1
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      	Workqueue: events_unbound sched_tick_remote
      	RIP: 0010:sched_tick_remote+0xb6/0xc0
      	Code: e8 0f 06 b8 00 c6 03 00 fb eb 9d 8b 43 04 85 c0 75 8d 48 8b 83 e0 0a 00 00 48 85 c0 75 81 eb 88 48 89 df e8 bc fe ff ff eb aa <0f> 0b eb
      	+c5 66 0f 1f 44 00 00 bf 17 00 00 00 e8 b6 2e fe ff 0f b6
      	Call Trace:
      	 process_one_work+0x1df/0x3b0
      	 worker_thread+0x44/0x3d0
      	 kthread+0xf3/0x130
      	 ? set_worker_desc+0xb0/0xb0
      	 ? kthread_create_worker_on_cpu+0x70/0x70
      	 ret_from_fork+0x35/0x40
      
      This happens when the remote tick applies to an idle task. Usually the
      idle_cpu() check avoids that, but it is performed before we lock the
      runqueue and is therefore racy. It was intended to be that way in
      order to avoid taking useless runqueue locks, since the idle task's
      tick callback is a no-op.

      Now if the racy check slips out of our hands and we end up remotely
      ticking an idle task, the empty task_tick_idle() is harmless. Still,
      it won't pass the WARN_ON_ONCE() test that ensures rq_clock_task() is
      not too far from curr->se.exec_start, because update_curr_idle() doesn't
      update the exec_start value like the other scheduler policies. Hence the
      reported false positive.

      So let's have another check, while the rq is locked, to make sure we
      don't remote tick on an idle task. The lockless idle_cpu() check still
      applies, to avoid unnecessary rq lock contention (see the sketch after
      this entry).
      Reported-by: Jacek Tomaka <jacekt@dug.com>
      Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1530203381-31234-1-git-send-email-frederic@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
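      A simplified kernel-style fragment of the handler shape described above
      (locals, labels and surrounding code trimmed; not the exact core.c hunk):
      keep the cheap lockless idle_cpu() filter, but re-check for the idle task
      once the runqueue lock is held, and only then run the staleness WARN and the
      remote task_tick().

        if (idle_cpu(cpu))                    /* racy fast path, may be wrong */
                goto out_requeue;

        rq_lock_irq(rq, &rf);
        curr = rq->curr;
        if (is_idle_task(curr))               /* authoritative check under rq->lock */
                goto out_unlock;

        update_rq_clock(rq);
        delta = rq_clock_task(rq) - curr->se.exec_start;
        WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);   /* tick must not be too stale */
        curr->sched_class->task_tick(rq, curr, 0);

        out_unlock:
        rq_unlock_irq(rq, &rf);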
  2. 28 Jun 2018: 1 commit
  3. 27 Jun 2018: 1 commit
  4. 23 Jun 2018: 1 commit
    • rseq: Avoid infinite recursion when delivering SIGSEGV · 784e0300
      Committed by Will Deacon
      When delivering a signal to a task that is using rseq, we call into
      __rseq_handle_notify_resume() so that the registers pushed in the
      sigframe are updated to reflect the state of the restartable sequence
      (for example, ensuring that the signal returns to the abort handler if
      necessary).
      
      However, if the rseq management fails due to an unrecoverable fault when
      accessing userspace or certain combinations of RSEQ_CS_* flags, then we
      will attempt to deliver a SIGSEGV. This has the potential for infinite
      recursion if the rseq code continuously fails on signal delivery.
      
      Avoid this problem by using force_sigsegv() instead of force_sig(), which
      is explicitly designed to reset the SEGV handler to SIG_DFL in the case
      of a recursive fault. In doing so, remove rseq_signal_deliver() from the
      internal rseq API and have an optional struct ksignal * parameter to
      rseq_handle_notify_resume() instead.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: peterz@infradead.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: boqun.feng@gmail.com
      Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
  5. 22 Jun 2018: 7 commits
  6. 21 Jun 2018: 2 commits
  7. 20 Jun 2018: 4 commits
  8. 19 Jun 2018: 1 commit
  9. 16 Jun 2018: 5 commits
  10. 15 Jun 2018: 7 commits
    • sched/core / kcov: avoid kcov_area during task switch · 0ed557aa
      Committed by Mark Rutland
      During a context switch, we first switch_mm() to the next task's mm,
      then switch_to() that new task.  This means that vmalloc'd regions which
      had previously been faulted in can transiently disappear in the context
      of the prev task.
      
      Functions instrumented by KCOV may try to access a vmalloc'd kcov_area
      during this window, and as the fault handling code is instrumented, this
      results in a recursive fault.
      
      We must avoid accessing any kcov_area during this window.  We can do so
      with a new flag in kcov_mode, set prior to switching the mm, and cleared
      once the new task is live.  Since task_struct::kcov_mode isn't always a
      specific enum kcov_mode value, this is made an unsigned int.
      
      The manipulation is hidden behind kcov_{prepare,finish}_switch() helpers,
      which are empty for !CONFIG_KCOV kernels.
      
      The code uses macros because I can't use static inline functions without a
      circular include dependency between <linux/sched.h> and <linux/kcov.h>,
      since the definition of task_struct uses things defined in <linux/kcov.h>
      (see the sketch after this entry).
      
      Link: http://lkml.kernel.org/r/20180504135535.53744-4-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
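      A hedged sketch of the mechanism (macro names as given in the commit
      message; the flag's exact bit value here is an assumption): a high bit is
      OR'd into task_struct::kcov_mode before switch_mm() and cleared once the new
      task is live, so the tracer's exact-match test on kcov_mode fails during the
      window and kcov_area is never dereferenced.

        #define KCOV_IN_CTXSW  (1U << 30)   /* assumed bit position for the flag */

        #define kcov_prepare_switch(t)                  \
        do {                                            \
                (t)->kcov_mode |= KCOV_IN_CTXSW;        \
        } while (0)

        #define kcov_finish_switch(t)                   \
        do {                                            \
                (t)->kcov_mode &= ~KCOV_IN_CTXSW;       \
        } while (0)

        /*
         * Tracer side: only trace when the mode matches exactly, i.e. the
         * context-switch bit is clear and kcov_area is safe to touch.
         */
        static bool kcov_mode_enabled_sketch(unsigned int mode, unsigned int needed)
        {
                return mode == needed;
        }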
    • kcov: prefault the kcov_area · dc55daff
      Committed by Mark Rutland
      On many architectures the vmalloc area is lazily faulted in upon first
      access.  This is problematic for KCOV, as __sanitizer_cov_trace_pc
      accesses the (vmalloc'd) kcov_area, and fault handling code may be
      instrumented.  If an access to kcov_area faults, this will result in
      mutual recursion through the fault handling code and
      __sanitizer_cov_trace_pc(), eventually leading to stack corruption
      and/or overflow.
      
      We can avoid this by faulting in the kcov_area before
      __sanitizer_cov_trace_pc() is permitted to access it.  Once it has been
      faulted in, it will remain present in the process page tables, and will
      not fault again.
      
      [akpm@linux-foundation.org: code cleanup]
      [akpm@linux-foundation.org: add comment explaining kcov_fault_in_area()]
      [akpm@linux-foundation.org: fancier code comment from Mark]
      Link: http://lkml.kernel.org/r/20180504135535.53744-3-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
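      A kernel-style sketch of the prefault helper (close to, but not guaranteed
      identical to, the kernel/kcov.c code): touch one word per page of the
      vmalloc'd buffer so every page is resident before
      __sanitizer_cov_trace_pc() is allowed to use it.

        static void kcov_fault_in_area_sketch(unsigned long *area, unsigned long words)
        {
                unsigned long stride = PAGE_SIZE / sizeof(unsigned long);
                unsigned long offset;

                /* one read per page is enough to populate the page tables */
                for (offset = 0; offset < words; offset += stride)
                        READ_ONCE(area[offset]);
        }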
    • kcov: ensure irq code sees a valid area · c9484b98
      Committed by Mark Rutland
      Patch series "kcov: fix unexpected faults".
      
      These patches fix a few issues where KCOV code could trigger recursive
      faults, discovered while debugging a patch enabling KCOV for arch/arm:
      
      * On CONFIG_PREEMPT kernels, there's a small race window where
        __sanitizer_cov_trace_pc() can see a bogus kcov_area.
      
      * Lazy faulting of the vmalloc area can cause mutual recursion between
        fault handling code and __sanitizer_cov_trace_pc().
      
      * During the context switch, switching the mm can cause the kcov_area to
        be transiently unmapped.
      
      These are prerequisites for enabling KCOV on arm, but the issues
      themselves are generic -- we just happen to avoid them by chance rather
      than by design on x86-64 and arm64.
      
      This patch (of 3):
      
      For kernels built with CONFIG_PREEMPT, some C code may execute before or
      after the interrupt handler, while the hardirq count is zero.  In these
      cases, in_task() can return true.
      
      A task can be interrupted in the middle of a KCOV_DISABLE ioctl while it
      resets the task's kcov data via kcov_task_init().  Instrumented code
      executed during this period will call __sanitizer_cov_trace_pc(), and as
      in_task() returns true, will inspect t->kcov_mode before trying to write
      to t->kcov_area.
      
      In kcov_task_init() we update t->kcov_{mode,area,size} with plain stores,
      which may be re-ordered, torn, etc.  Thus __sanitizer_cov_trace_pc() may
      see bogus values for any of these fields, and may attempt to write to
      memory which is not mapped.

      Let's avoid this by using WRITE_ONCE() to set t->kcov_mode, with a
      barrier() to ensure this is ordered before we clear t->kcov_{area,size}.
      This ensures that any code executed while kcov_task_init() is preempted
      will either see valid values for t->kcov_{area,size}, or will see that
      t->kcov_mode is KCOV_MODE_DISABLED, and bail out without touching
      t->kcov_area (see the sketch after this entry).
      
      Link: http://lkml.kernel.org/r/20180504135535.53744-2-mark.rutland@arm.com
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
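      A kernel-style sketch of the ordering fix (simplified from the description
      above): publish the DISABLED mode first, with a compiler barrier, and only
      then tear down area/size, so a preempting __sanitizer_cov_trace_pc() either
      sees KCOV_MODE_DISABLED or still-valid pointers.

        static void kcov_task_init_sketch(struct task_struct *t)
        {
                /* make the "disabled" mode visible before the teardown ... */
                WRITE_ONCE(t->kcov_mode, KCOV_MODE_DISABLED);
                barrier();
                /* ... so these plain stores can never be observed first */
                t->kcov_size = 0;
                t->kcov_area = NULL;
                t->kcov = NULL;
        }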
    • kernel/relay.c: change return type to vm_fault_t · 3fb3894b
      Committed by Souptick Joarder
      Use new return type vm_fault_t for fault handler.  For now, this is just
      documenting that the function returns a VM_FAULT value rather than an
      errno.  Once all instances are converted, vm_fault_t will become a
      distinct type.
      
      See commit 1c8f4220 ("mm: change return type to vm_fault_t").
      
      Link: http://lkml.kernel.org/r/20180510140335.GA25363@jordon-HP-15-Notebook-PC
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
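      A kernel-style sketch of the shape of the change (hypothetical handler body;
      relay's real fault handler also looks up the sub-buffer page): the .fault
      callback now returns vm_fault_t codes instead of an int that mixed errnos
      and fault flags.

        static vm_fault_t relay_buf_fault_sketch(struct vm_fault *vmf)
        {
                struct page *page = NULL;    /* would come from the relay buffer */

                if (!page)
                        return VM_FAULT_SIGBUS;   /* a vm_fault_t code, not an errno */

                get_page(page);
                vmf->page = page;
                return 0;
        }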
    • mm: check for SIGKILL inside dup_mmap() loop · 655c79bb
      Committed by Tetsuo Handa
      As a theoretical problem, dup_mmap() of an mm_struct with 60000+ VMAs
      can loop while potentially allocating memory, with mm->mmap_sem held for
      write by the current thread.  This is bad if the current thread was
      selected as an OOM victim, because it will continue allocating from
      memory reserves while the OOM reaper is unable to reclaim memory.

      As an actually observable problem, it is not difficult to make the OOM
      reaper unable to reclaim memory if the OOM victim is blocked at
      i_mmap_lock_write() in this loop.  Unfortunately, since nobody can
      explain whether it is safe to use a killable wait there, let's check for
      SIGKILL before trying to allocate memory (see the fragment after this
      entry).  Even without an OOM event, there is no point in continuing the
      loop from the beginning if the current thread has been killed.

      I tested with debug printk().  This patch should be safe because we
      already fail if security_vm_enough_memory_mm() or
      kmem_cache_alloc(GFP_KERNEL) fails, and exit_mmap() handles it.
      
         ***** Aborting dup_mmap() due to SIGKILL *****
         ***** Aborting dup_mmap() due to SIGKILL *****
         ***** Aborting dup_mmap() due to SIGKILL *****
         ***** Aborting dup_mmap() due to SIGKILL *****
         ***** Aborting exit_mmap() due to NULL mmap *****
      
      [akpm@linux-foundation.org: add comment]
      Link: http://lkml.kernel.org/r/201804071938.CDE04681.SOFVQJFtMHOOLF@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
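      A simplified kernel-style fragment of the VMA-copy loop in dup_mmap() with
      the added check (labels and error handling trimmed): a SIGKILLed forking
      task stops copying immediately instead of allocating from memory reserves
      while holding mmap_sem.

        for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
                if (fatal_signal_pending(current)) {
                        /*
                         * Forking task was killed: bail out, exit_mmap()
                         * will tear down the partially built mm.
                         */
                        retval = -EINTR;
                        goto out;
                }
                /* ... allocate and copy the next vma ... */
        }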
    • kexec: yield to scheduler when loading kimage segments · a8311f64
      Committed by Jarrett Farnitano
      Without yielding while loading kimage segments, a large initrd will
      block all other work on the CPU performing the load until the load is
      completed.  For example, loading a 200MB initrd on a low-power
      single-core system will lock up the system for a few seconds.

      To increase system responsiveness to other tasks at that time, call
      cond_resched() in both the crash-kernel and normal-kernel segment
      loading loops (see the fragment after this entry).

      I did run into a practical problem.  Hardware watchdogs on embedded
      systems can have short timers on the order of seconds.  If the system is
      locked up for a few seconds with only a single core available, the
      watchdog may not be petted in a timely fashion.  If this happens, the
      hardware watchdog will fire and reset the system.

      This really only becomes a problem when you are working with a single
      core, a decently sized initrd, and a constrained hardware watchdog.
      
      Link: http://lkml.kernel.org/r/1528738546-3328-1-git-send-email-jmf@amazon.com
      Signed-off-by: Jarrett Farnitano <jmf@amazon.com>
      Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
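      A simplified kernel-style fragment of the segment-copy loop (the real code
      is in kimage_load_normal_segment() and kimage_load_crash_segment()):
      yielding once per copied chunk keeps the CPU available to other tasks,
      including the watchdog, during a long initrd load.

        while (mbytes) {
                /*
                 * ... allocate a destination page, copy one chunk,
                 *     and decrement mbytes accordingly ...
                 */
                cond_resched();   /* give other tasks and the watchdog a chance */
        }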
    • kconfig: tinyconfig: remove stale stack protector fixups · a0f8c297
      Committed by Masahiro Yamada
      Prior to commit 2a61f474 ("stack-protector: test compiler capability
      in Kconfig and drop AUTO mode"), the stack protector was configured by
      the choice of NONE, REGULAR, STRONG, AUTO.
      
      tiny.config needed to explicitly set NONE because the default value of
      choice, AUTO, did not produce the tiniest kernel.
      
      Now that there are only two boolean symbols, STACKPROTECTOR and
      STACKPROTECTOR_STRONG, they are naturally disabled by "make
      allnoconfig", which "make tinyconfig" is based on.  Remove unnecessary
      lines from the tiny.config fragment file.
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 14 Jun 2018: 2 commits
    • dma-mapping: move all DMA mapping code to kernel/dma · cf65a0f6
      Committed by Christoph Hellwig
      Currently the code is split over various files with dma- prefixes in the
      lib/ and drivers/base directories, and the number of files keeps growing.
      Move them into a single directory to keep the code together and remove
      the file name prefixes.  To match the irq infrastructure this directory
      is placed under the kernel/ directory.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables · 050e9baa
      Committed by Linus Torvalds
      The changes to automatically test for working stack protector compiler
      support in the Kconfig files removed the special STACKPROTECTOR_AUTO
      option that picked the strongest stack protector that the compiler
      supported.
      
      That was all a nice cleanup - it makes no sense to have the AUTO case
      now that the Kconfig phase can just determine the compiler support
      directly.
      
      HOWEVER.
      
      It also meant that doing "make oldconfig" would now _disable_ the strong
      stackprotector if you had AUTO enabled, because in a legacy config file,
      the sane stack protector configuration would look like
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        # CONFIG_CC_STACKPROTECTOR_NONE is not set
        # CONFIG_CC_STACKPROTECTOR_REGULAR is not set
        # CONFIG_CC_STACKPROTECTOR_STRONG is not set
        CONFIG_CC_STACKPROTECTOR_AUTO=y
      
      and when you ran this through "make oldconfig" with the Kbuild changes,
      it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
      been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
      CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
      used to be disabled (because it was really enabled by AUTO), and would
      disable it in the new config, resulting in:
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
        CONFIG_CC_STACKPROTECTOR=y
        # CONFIG_CC_STACKPROTECTOR_STRONG is not set
        CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
      
      That's dangerously subtle - people could suddenly find themselves with
      the weaker stack protector setup without even realizing.
      
      The solution here is to rename not just the old REGULAR stack
      protector option, but also the strong one.  This does that by just
      removing the CC_ prefix entirely for the user choices, because it really
      is not about the compiler support (the compiler support now instead
      automatically impacts _visibility_ of the options to users).
      
      This results in "make oldconfig" actually asking the user for their
      choice, so that we don't have any silent subtle security model changes.
      The end result would generally look like this:
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
        CONFIG_STACKPROTECTOR=y
        CONFIG_STACKPROTECTOR_STRONG=y
        CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
      
      where the "CC_" versions really are about internal compiler
      infrastructure, not the user selections.
      Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>