1. 08 1月, 2018 1 次提交
  2. 06 1月, 2018 1 次提交
  3. 05 1月, 2018 2 次提交
  4. 30 12月, 2017 8 次提交
    • T
      timers: Invoke timer_start_debug() where it makes sense · fd45bb77
      Thomas Gleixner 提交于
      The timer start debug function is called before the proper timer base is
      set. As a consequence the trace data contains the stale CPU and flags
      values.
      
      Call the debug function after setting the new base and flags.
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: rt@linutronix.de
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de
      fd45bb77
    • T
      nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() · 5d62c183
      Thomas Gleixner 提交于
      The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
      subsequently invokes tick_nohz_stop_sched_tick() are:
      
        if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
      
      If need_resched() is not set, but a timer softirq is pending then this is
      an indication that the softirq code punted and delegated the execution to
      softirqd. need_resched() is not true because the current interrupted task
      takes precedence over softirqd.
      
      Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
      timer interrupts because the timer wheel contains an expired timer, but
      softirqs are not yet executed. So it returns an immediate expiry request,
      which causes the timer to fire immediately again. Lather, rinse and
      repeat....
      
      Prevent that by adding a check for a pending timer soft interrupt to the
      conditions in tick_nohz_stop_sched_tick() which avoid calling
      get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
      prevents a repetitive programming of an already expired timer.
      Reported-by: NSebastian Siewior <bigeasy@linutronix.d>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
      5d62c183
    • T
      timers: Reinitialize per cpu bases on hotplug · 26456f87
      Thomas Gleixner 提交于
      The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
      them with a potentially stale clk and next_expiry valuem, which can cause
      trouble then the CPU is plugged.
      
      Add a prepare callback which forwards the clock, sets next_expiry to far in
      the future and reset the control flags to a known state.
      
      Set base->must_forward_clk so the first timer which is queued will try to
      forward the clock to current jiffies.
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Reported-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos
      26456f87
    • A
      timers: Use deferrable base independent of base::nohz_active · ced6d5c1
      Anna-Maria Gleixner 提交于
      During boot and before base::nohz_active is set in the timer bases, deferrable
      timers are enqueued into the standard timer base. This works correctly as
      long as base::nohz_active is false.
      
      Once it base::nohz_active is set and a timer which was enqueued before that
      is accessed the lock selector code choses the lock of the deferred
      base. This causes unlocked access to the standard base and in case the
      timer is removed it does not clear the pending flag in the standard base
      bitmap which causes get_next_timer_interrupt() to return bogus values.
      
      To prevent that, the deferrable timers must be enqueued in the deferrable
      base, even when base::nohz_active is not set. Those deferrable timers also
      need to be expired unconditional.
      
      Fixes: 500462a9 ("timers: Switch to a non-cascading wheel")
      Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: rt@linutronix.de
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de
      ced6d5c1
    • T
      genirq/msi, x86/vector: Prevent reservation mode for non maskable MSI · bc976233
      Thomas Gleixner 提交于
      The new reservation mode for interrupts assigns a dummy vector when the
      interrupt is allocated and assigns a real vector when the interrupt is
      requested. The reservation mode prevents vector pressure when devices with
      a large amount of queues/interrupts are initialized, but only a minimal
      subset of those queues/interrupts is actually used.
      
      This mode has an issue with MSI interrupts which cannot be masked. If the
      driver is not careful or the hardware emits an interrupt before the device
      irq is requestd by the driver then the interrupt ends up on the dummy
      vector as a spurious interrupt which can cause malfunction of the device or
      in the worst case a lockup of the machine.
      
      Change the logic for the reservation mode so that the early activation of
      MSI interrupts checks whether:
      
       - the device is a PCI/MSI device
       - the reservation mode of the underlying irqdomain is activated
       - PCI/MSI masking is globally enabled
       - the PCI/MSI device uses either MSI-X, which supports masking, or
         MSI with the maskbit supported.
      
      If one of those conditions is false, then clear the reservation mode flag
      in the irq data of the interrupt and invoke irq_domain_activate_irq() with
      the reserve argument cleared. In the x86 vector code, clear the can_reserve
      flag in the vector allocation data so a subsequent free_irq() won't create
      the same situation again. The interrupt stays assigned to a real vector
      until pci_disable_msi() is invoked and all allocations are undone.
      
      Fixes: 4900be83 ("x86/vector/msi: Switch to global reservation mode")
      Reported-by: NAlexandru Chirvasitu <achirvasub@gmail.com>
      Reported-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NAlexandru Chirvasitu <achirvasub@gmail.com>
      Tested-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>,
      Cc: linux-media@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712291406420.1899@nanos
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712291409460.1899@nanos
      bc976233
    • T
      genirq/irqdomain: Rename early argument of irq_domain_activate_irq() · 702cb0a0
      Thomas Gleixner 提交于
      The 'early' argument of irq_domain_activate_irq() is actually used to
      denote reservation mode. To avoid confusion, rename it before abuse
      happens.
      
      No functional change.
      
      Fixes: 72491643 ("genirq/irqdomain: Update irq_domain_ops.activate() signature")
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexandru Chirvasitu <achirvasub@gmail.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>,
      Cc: linux-media@vger.kernel.org
      702cb0a0
    • T
      genirq: Introduce IRQD_CAN_RESERVE flag · 69790ba9
      Thomas Gleixner 提交于
      Add a new flag to mark interrupts which can use reservation mode. This is
      going to be used in subsequent patches to disable reservation mode for a
      certain class of MSI devices.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NAlexandru Chirvasitu <achirvasub@gmail.com>
      Tested-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>,
      Cc: linux-media@vger.kernel.org
      69790ba9
    • T
      genirq/msi: Handle reactivation only on success · da5dd9e8
      Thomas Gleixner 提交于
      When analyzing the fallout of the x86 vector allocation rework it turned
      out that the error handling in msi_domain_alloc_irqs() is broken.
      
      If MSI_FLAG_MUST_REACTIVATE is set for a MSI domain then it clears the
      activation flag for a successfully initialized msi descriptor. If a
      subsequent initialization fails then the error handling code path does not
      deactivate the interrupt because the activation flag got cleared.
      
      Move the clearing of the activation flag outside of the initialization loop
      so that an eventual failure can be cleaned up correctly.
      
      Fixes: 22d0b12f ("genirq/irqdomain: Add force reactivation flag to irq domains")
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NAlexandru Chirvasitu <achirvasub@gmail.com>
      Tested-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Cc: Mikael Pettersson <mikpelinux@gmail.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: Mihai Costache <v-micos@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: linux-pci@vger.kernel.org
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: devel@linuxdriverproject.org
      Cc: KY Srinivasan <kys@microsoft.com>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: Sakari Ailus <sakari.ailus@intel.com>,
      Cc: linux-media@vger.kernel.org
      
      da5dd9e8
  5. 28 12月, 2017 9 次提交
  6. 24 12月, 2017 1 次提交
    • E
      pid: Handle failure to allocate the first pid in a pid namespace · c0ee5549
      Eric W. Biederman 提交于
      With the replacement of the pid bitmap and hashtable with an idr in
      alloc_pid started occassionally failing when allocating the first pid
      in a pid namespace.  Things were not completely reset resulting in
      the first allocated pid getting the number 2 (not 1).  Which
      further resulted in ns->proc_mnt not getting set and eventually
      causing an oops in proc_flush_task.
      
      Oops: 0000 [#1] SMP
      CPU: 2 PID: 6743 Comm: trinity-c117 Not tainted 4.15.0-rc4-think+ #2
      RIP: 0010:proc_flush_task+0x8e/0x1b0
      RSP: 0018:ffffc9000bbffc40 EFLAGS: 00010286
      RAX: 0000000000000001 RBX: 0000000000000001 RCX: 00000000fffffffb
      RDX: 0000000000000000 RSI: ffffc9000bbffc50 RDI: 0000000000000000
      RBP: ffffc9000bbffc63 R08: 0000000000000000 R09: 0000000000000002
      R10: ffffc9000bbffb70 R11: ffffc9000bbffc64 R12: 0000000000000003
      R13: 0000000000000000 R14: 0000000000000003 R15: ffff8804c10d7840
      FS:  00007f7cb8965700(0000) GS:ffff88050a200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000003e21ae003 CR4: 00000000001606e0
      DR0: 00007fb1d6c22000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       ? release_task+0xaf/0x680
       release_task+0xd2/0x680
       ? wait_consider_task+0xb82/0xce0
       wait_consider_task+0xbe9/0xce0
       ? do_wait+0xe1/0x330
       do_wait+0x151/0x330
       kernel_wait4+0x8d/0x150
       ? task_stopped_code+0x50/0x50
       SYSC_wait4+0x95/0xa0
       ? rcu_read_lock_sched_held+0x6c/0x80
       ? syscall_trace_enter+0x2d7/0x340
       ? do_syscall_64+0x60/0x210
       do_syscall_64+0x60/0x210
       entry_SYSCALL64_slow_path+0x25/0x25
      RIP: 0033:0x7f7cb82603aa
      RSP: 002b:00007ffd60770bc8 EFLAGS: 00000246
       ORIG_RAX: 000000000000003d
      RAX: ffffffffffffffda RBX: 00007f7cb6cd4000 RCX: 00007f7cb82603aa
      RDX: 000000000000000b RSI: 00007ffd60770bd0 RDI: 0000000000007cca
      RBP: 0000000000007cca R08: 00007f7cb8965700 R09: 00007ffd607c7080
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffd60770bd0 R14: 00007f7cb6cd4058 R15: 00000000cccccccd
      Code: c1 e2 04 44 8b 60 30 48 8b 40 38 44 8b 34 11 48 c7 c2 60 3a f5 81 44 89 e1 4c 8b 68 58 e8 4b b4 77 00 89 44 24 14 48 8d 74 24 10 <49> 8b 7d 00 e8 b9 6a f9 ff 48 85 c0 74 1a 48 89 c7 48 89 44 24
      RIP: proc_flush_task+0x8e/0x1b0 RSP: ffffc9000bbffc40
      CR2: 0000000000000000
      ---[ end trace 53d67a6481059862 ]---
      
      Improve the quality of the implementation by resetting the place to
      start allocating pids on failure to allocate the first pid.
      
      As improving the quality of the implementation is the goal remove the now
      unnecesarry disable_pid_allocations call when we fail to mount proc.
      
      Fixes: 95846ecf ("pid: replace pid bitmap implementation with IDR API")
      Fixes: 8ef047aa ("pid namespaces: make alloc_pid(), free_pid() and put_pid() work with struct upid")
      Reported-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      c0ee5549
  7. 23 12月, 2017 1 次提交
    • T
      arch, mm: Allow arch_dup_mmap() to fail · c10e83f5
      Thomas Gleixner 提交于
      In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be
      allowed to fail. Fix up all instances.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Laight <David.Laight@aculab.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eduardo Valentin <eduval@amazon.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: aliguori@amazon.com
      Cc: dan.j.williams@intel.com
      Cc: hughd@google.com
      Cc: keescook@google.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c10e83f5
  8. 21 12月, 2017 9 次提交
  9. 18 12月, 2017 1 次提交
    • P
      sched/isolation: Make CONFIG_NO_HZ_FULL select CONFIG_CPU_ISOLATION · bf29cb23
      Paul E. McKenney 提交于
      CONFIG_NO_HZ_FULL doesn't make sense without CONFIG_CPU_ISOLATION. In
      fact enabling the first without the second is a regression as nohz_full=
      boot parameter gets silently ignored.
      
      Besides this unnatural combination hangs RCU gp kthread when running
      rcutorture for reasons that are not yet fully understood:
      
      	rcu_preempt kthread starved for 9974 jiffies! g4294967208
      	+c4294967207 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
      	rcu_preempt     I 7464     8      2 0x80000000
      	Call Trace:
      		__schedule+0x493/0x620
      		schedule+0x24/0x40
      		schedule_timeout+0x330/0x3b0
      		? preempt_count_sub+0xea/0x140
      		? collect_expired_timers+0xb0/0xb0
      		rcu_gp_kthread+0x6bf/0xef0
      
      This commit therefore makes NO_HZ_FULL select CPU_ISOLATION, which
      prevents all these bad behaviours.
      Reported-by: Nkernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NFrederic Weisbecker <frederic@kernel.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      Fixes: 5c4991e2 ("sched/isolation: Split out new CONFIG_CPU_ISOLATION=y config from CONFIG_NO_HZ_FULL")
      Link: http://lkml.kernel.org/r/1513275507-29200-2-git-send-email-frederic@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bf29cb23
  10. 17 12月, 2017 1 次提交
  11. 16 12月, 2017 1 次提交
    • D
      bpf: guarantee r1 to be ctx in case of bpf_helper_changes_pkt_data · 04514d13
      Daniel Borkmann 提交于
      Some JITs don't cache skb context on stack in prologue, so when
      LD_ABS/IND is used and helper calls yield bpf_helper_changes_pkt_data()
      as true, then they temporarily save/restore skb pointer. However,
      the assumption that skb always has to be in r1 is a bit of a
      gamble. Right now it turned out to be true for all helpers listed
      in bpf_helper_changes_pkt_data(), but lets enforce that from verifier
      side, so that we make this a guarantee and bail out if the func
      proto is misconfigured in future helpers.
      
      In case of BPF helper calls from cBPF, bpf_helper_changes_pkt_data()
      is completely unrelevant here (since cBPF is context read-only) and
      therefore always false.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      04514d13
  12. 15 12月, 2017 5 次提交
    • S
      sched/rt: Do not pull from current CPU if only one CPU to pull · f73c52a5
      Steven Rostedt 提交于
      Daniel Wagner reported a crash on the BeagleBone Black SoC.
      
      This is a single CPU architecture, and does not have a functional
      arch_send_call_function_single_ipi() implementation which can crash
      the kernel if that is called.
      
      As it only has one CPU, it shouldn't be called, but if the kernel is
      compiled for SMP, the push/pull RT scheduling logic now calls it for
      irq_work if the one CPU is overloaded, it can use that function to call
      itself and crash the kernel.
      
      Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system
      only has a single CPU. But SCHED_FEAT is a constant if sched debugging
      is turned off. Another fix can also be used, and this should also help
      with normal SMP machines. That is, do not initiate the pull code if
      there's only one RT overloaded CPU, and that CPU happens to be the
      current CPU that is scheduling in a lower priority task.
      
      Even on a system with many CPUs, if there's many RT tasks waiting to
      run on a single CPU, and that CPU schedules in another RT task of lower
      priority, it will initiate the PULL logic in case there's a higher
      priority RT task on another CPU that is waiting to run. But if there is
      no other CPU with waiting RT tasks, it will initiate the RT pull logic
      on itself (as it still has RT tasks waiting to run). This is a wasted
      effort.
      
      Not only does this help with SMP code where the current CPU is the only
      one with RT overloaded tasks, it should also solve the issue that
      Daniel encountered, because it will prevent the PULL logic from
      executing, as there's only one CPU on the system, and the check added
      here will cause it to exit the RT pull code.
      Reported-by: NDaniel Wagner <wagi@monom.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-rt-users <linux-rt-users@vger.kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: 4bdced5c ("sched/rt: Simplify the IPI based RT balancing logic")
      Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.homeSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f73c52a5
    • T
      posix-timer: Properly check sigevent->sigev_notify · cef31d9a
      Thomas Gleixner 提交于
      timer_create() specifies via sigevent->sigev_notify the signal delivery for
      the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD
      and (SIGEV_SIGNAL | SIGEV_THREAD_ID).
      
      The sanity check in good_sigevent() is only checking the valid combination
      for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is
      not set it accepts any random value.
      
      This has no real effects on the posix timer and signal delivery code, but
      it affects show_timer() which handles the output of /proc/$PID/timers. That
      function uses a string array to pretty print sigev_notify. The access to
      that array has no bound checks, so random sigev_notify cause access beyond
      the array bounds.
      
      Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID
      masking from various code pathes as SIGEV_NONE can never be set in
      combination with SIGEV_THREAD_ID.
      Reported-by: NEric Biggers <ebiggers3@gmail.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: stable@vger.kernel.org
      cef31d9a
    • S
      tracing: Have stack trace not record if RCU is not watching · b00d607b
      Steven Rostedt (VMware) 提交于
      The stack tracer records a stack dump whenever it sees a stack usage that is
      more than what it ever saw before. This can happen at any function that is
      being traced. If it happens when the CPU is going idle (or other strange
      locations), RCU may not be watching, and in this case, the recording of the
      stack trace will trigger a warning. There's been lots of efforts to make
      hacks to allow stack tracing to proceed even if RCU is not watching, but
      this only causes more issues to appear. Simply do not trace a stack if RCU
      is not watching. It probably isn't a bad stack anyway.
      Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      b00d607b
    • S
      arch: define weak abort() · 7c2c11b2
      Sudip Mukherjee 提交于
      gcc toggle -fisolate-erroneous-paths-dereference (default at -O2
      onwards) isolates faulty code paths such as null pointer access, divide
      by zero etc.  If gcc port doesnt implement __builtin_trap, an abort() is
      generated which causes kernel link error.
      
      In this case, gcc is generating abort due to 'divide by zero' in
      lib/mpi/mpih-div.c.
      
      Currently 'frv' and 'arc' are failing.  Previously other arch was also
      broken like m32r was fixed by commit d22e3d69 ("m32r: fix build
      failure").
      
      Let's define this weak function which is common for all arch and fix the
      problem permanently.  We can even remove the arch specific 'abort' after
      this is done.
      
      Link: http://lkml.kernel.org/r/1513118956-8718-1-git-send-email-sudipm.mukherjee@gmail.comSigned-off-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com>
      Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c2c11b2
    • T
      kernel: make groups_sort calling a responsibility group_info allocators · bdcf0a42
      Thiago Rafael Becker 提交于
      In testing, we found that nfsd threads may call set_groups in parallel
      for the same entry cached in auth.unix.gid, racing in the call of
      groups_sort, corrupting the groups for that entry and leading to
      permission denials for the client.
      
      This patch:
       - Make groups_sort globally visible.
       - Move the call to groups_sort to the modifiers of group_info
       - Remove the call to groups_sort from set_groups
      
      Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.comSigned-off-by: NThiago Rafael Becker <thiago.becker@gmail.com>
      Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: NNeilBrown <neilb@suse.com>
      Acked-by: N"J. Bruce Fields" <bfields@fieldses.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bdcf0a42