1. 13 Nov 2013 (1 commit)
    • x86/dumpstack: Fix printk_address for direct addresses · 5f01c988
      Committed by Jiri Slaby
      Consider a kernel crash in a module, simulated in the following way:
      
       #include <linux/module.h>

       static int my_init(void)
       {
               char *map = (void *)0x5;  /* bogus, near-NULL pointer */

               *map = 3;                 /* faults on the first store */
               return 0;
       }
       module_init(my_init);
       MODULE_LICENSE("GPL");
      
      When we turn off FRAME_POINTERs, the very first instruction in
      that function causes a BUG. The problem is that the IP in the
      BUG report is printed via %pB (from printk_address), and %pB
      decrements the pointer by one so that addresses of functions
      reached through tail calls print correctly.
      
      This was added in commit 71f9e598 ("x86, dumpstack: Use
      %pB format specifier for stack trace") to fix the call stack
      printouts.
      
      So instead of the correct output:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]
      
      We get:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa0152000>] 0xffffffffa0151fff
      
      To fix that, we use %pB only for the stack-trace printouts (via
      the newly added printk_stack_address()) and %pS for regs->ip
      (via printk_address()). I.e. we revert to the old behaviour for
      everything except call stacks. And since 'reliable' is always 1
      for the remaining printk_address() callers, we remove that
      parameter from printk_address().
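
      A minimal sketch of the resulting split, based on the description
      above (illustrative; the exact dumpstack.c code may differ in
      detail):

       /* Stack entries may be return addresses after tail calls,
        * so keep the %pB -1 adjustment for them. */
       static void printk_stack_address(unsigned long address, int reliable)
       {
               pr_cont(" [<%p>] %s%pB\n", (void *)address,
                       reliable ? "" : "? ", (void *)address);
       }

       /* regs->ip points at the faulting instruction itself, so
        * print it verbatim with %pS. */
       static void printk_address(unsigned long address)
       {
               pr_cont(" [<%p>] %pS\n", (void *)address, (void *)address);
       }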
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: joe@perches.com
      Cc: jirislaby@gmail.com
      Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.cz
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 12 Nov 2013 (1 commit)
    • Revert "x86/UV: Add uvtrace support" · b5dfcb09
      Committed by Ingo Molnar
      This reverts commit 8eba1842.
      
      uv_trace() is not used by anything, nor is uv_trace_nmi_func, nor
      uv_trace_func.
      
      That's not how we do instrumentation code in the kernel: we add
      tracepoints, printk()s, etc., so that everyone, not just those
      with magic kernel modules, can debug a system.

      So remove this unused (and misguided) piece of code.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Hedi Berriche <hedi@sgi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Link: http://lkml.kernel.org/n/tip-tumfBffmr4jmnt8Gyxanoblg@git.kernel.org
  3. 07 Nov 2013 (2 commits)
  4. 31 Oct 2013 (1 commit)
    • percpu: fix this_cpu_sub() subtrahend casting for unsigneds · bd09d9a3
      Committed by Greg Thelen
      this_cpu_sub() is implemented as negation and addition.
      
      This patch casts the adjustment to the counter type before
      negation, so that the adjustment is sign-extended. This matters
      when the counter type is wider than an unsigned adjustment. An
      alternative to this patch is to declare such operations
      unsupported, but it seemed useful to avoid surprises.
      
      This patch specifically helps the following example:
        unsigned int delta = 1;
        preempt_disable();
        this_cpu_write(long_counter, 0);
        this_cpu_sub(long_counter, delta);
        preempt_enable();
      
      Before this change, long_counter on a 64-bit machine ends up
      with the value 0xffffffff rather than 0xffffffffffffffff. This
      is because this_cpu_sub(pcp, delta) boils down to
      this_cpu_add(pcp, -delta), which is basically:
        long_counter = 0 + 0xffffffff
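
      The underlying fix is essentially a cast inside the macro, along
      the lines of this_cpu_add(pcp, -(typeof(pcp))(val)). A
      stand-alone sketch of the arithmetic in plain C (illustrative
      names, not the kernel macro itself):

        #include <stdio.h>

        int main(void)
        {
                unsigned int delta = 1;
                long counter;

                /* Buggy: -delta stays the 32-bit unsigned value
                 * 0xffffffff and zero-extends when widened to long. */
                counter = 0 + (unsigned int)-delta;
                printf("without cast: 0x%lx\n", counter);

                /* Fixed: cast to the counter type before negating,
                 * so the adjustment sign-extends to -1. */
                counter = 0 + -(long)delta;
                printf("with cast:    0x%lx\n", counter);

                return 0;
        }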
      
      Also apply the same cast to:
        __this_cpu_sub()
        __this_cpu_sub_return()
        this_cpu_sub_return()
      
      All percpu_test.ko tests pass, including the following cases
      which previously failed:
      
        l -= ui_one;
        __this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
      
        l -= ui_one;
        this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
        CHECK(l, long_counter, 0xffffffffffffffff);
      
        ul -= ui_one;
        __this_cpu_sub(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, -1);
        CHECK(ul, ulong_counter, 0xffffffffffffffff);
      
        ul = this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 2);
      
        ul = __this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 1);
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 29 Oct 2013 (1 commit)
  6. 26 Oct 2013 (2 commits)
    • x86: Unify copy_to_user() and add size checking to it · 7a3d9b0f
      Committed by Jan Beulich
      Similarly to copy_from_user(), where the range check protects
      against kernel memory corruption, copy_to_user() can benefit
      from such checking too: here it protects against kernel
      information leaks.
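
      A hedged sketch of the kind of check being unified here (helper
      names follow the uaccess conventions of that era; not
      necessarily the exact committed code):

       static inline unsigned long __must_check
       copy_to_user(void __user *to, const void *from, unsigned long n)
       {
               int sz = __compiletime_object_size(from);

               might_fault();
               /* If the compiler knows the source object's size and
                * the length may exceed it, diagnose the potential
                * information leak instead of copying blindly. */
               if (likely(sz < 0 || sz >= n))
                       n = _copy_to_user(to, from, n);
               else
                       copy_to_user_overflow();
               return n;
       }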
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: <arjan@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/5265059502000078000FC4F6@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
    • x86: Unify copy_from_user() size checking · 3df7b41a
      Committed by Jan Beulich
      Commits 4a312769 ("x86: Turn the
      copy_from_user check into an (optional) compile time warning")
      and 63312b6a ("x86: Add a
      Kconfig option to turn the copy_from_user warnings into errors")
      touched only the 32-bit variant of copy_from_user(), whereas the
      original commit 9f0cf4ad ("x86:
      Use __builtin_object_size() to validate the buffer size for
      copy_from_user()") also added the same code to the 64-bit one.
      
      Further, the earlier conversion from an inline WARN() to the
      call to copy_from_user_overflow() went a little too far: when
      the number of bytes to be copied is not a constant (e.g.
      [looking at 3.11] in drivers/net/tun.c:__tun_chr_ioctl() or
      drivers/pci/pcie/aer/aer_inject.c:aer_inject_write()), the
      compiler will always have to keep the function call, and hence
      there will always be a warning. By using __builtin_constant_p()
      we can avoid this.
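
      A hedged sketch of the gated check (the overflow helpers are
      assumptions based on the description, not verbatim kernel code):

       static inline unsigned long __must_check
       copy_from_user(void *to, const void __user *from, unsigned long n)
       {
               int sz = __compiletime_object_size(to);

               might_fault();
               if (likely(sz < 0 || sz >= n))
                       n = _copy_from_user(to, from, n);
               else if (__builtin_constant_p(n))
                       copy_from_user_overflow();        /* may become an error */
               else
                       __copy_from_user_overflow(sz, n); /* warning only */
               return n;
       }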
      
      And then this slightly extends the effect of
      CONFIG_DEBUG_STRICT_USER_COPY_CHECKS in that apart from
      converting warnings to errors in the constant size case, it
      retains the (possibly wrong) warnings in the non-constant size
      case, such that if someone is prepared to get a few false
      positives, (s)he'll be able to recover the current behavior
      (except that these diagnostics now will never be converted to
      errors).
      
      Since the 32-bit variant (intentionally) didn't call
      might_fault(), the unification results in this being called
      twice now. Adding a suitable #ifdef would be the alternative if
      that's a problem.
      
      I'd like to point out though that with
      __compiletime_object_size() being restricted to gcc before 4.6,
      the whole construct is going to become more and more pointless
      going forward. I would question however that commit
      2fb0815c ("gcc4: disable
      __compiletime_object_size for GCC 4.6+") was really necessary,
      and instead this should have been dealt with as is done here
      from the beginning.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/5265056D02000078000FC4F3@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 24 Oct 2013 (1 commit)
  8. 18 Oct 2013 (5 commits)
  9. 11 Oct 2013 (3 commits)
  10. 28 Sep 2013 (2 commits)
    • x86: Improve the printout of the SMP bootup CPU table · 646e29a1
      Committed by Borislav Petkov
      As the new x86 CPU bootup printout format code maintainer, I am
      taking immediate action to improve and clean (and thus indulge
      my OCD) the reporting of the cores when coming up online.
      
      Fix the padding to right-hand alignment, clean up the code and
      bind the reporting width to the maximum number of supported
      CPUs on the system, like this:
      
       [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
       [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
       [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
       [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
       [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
       [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
       [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
       [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
       [    4.961413] Brought up 64 CPUs
      
      and this:
      
       [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
       [    0.686329] Brought up 8 CPUs
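
      A minimal sketch of the right-aligned column idea (the helper
      shape is an assumption, not the exact smpboot.c code):

       /* Size the column from the largest possible CPU number, then
        * print each "#N" right-aligned in that fixed width. */
       static void announce_cpu_column(unsigned int cpu)
       {
               char buf[16];
               int width = snprintf(NULL, 0, "#%u", nr_cpu_ids - 1);

               snprintf(buf, sizeof(buf), "#%u", cpu);
               pr_cont(" %*s", width, buf);
       }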
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Libin <huawei.libin@huawei.com>
      Cc: wangyijing@huawei.com
      Cc: fenghua.yu@intel.com
      Cc: guohanjun@huawei.com
      Cc: paul.gortmaker@windriver.com
      Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnic
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Revert need_resched() to look at TIF_NEED_RESCHED · 75f93fed
      Committed by Peter Zijlstra
      Yuanhan reported a serious throughput regression in his pigz
      benchmark. Using the ftrace patch I found that several idle
      paths need more TLC before we can switch the generic
      need_resched() over to preempt_need_resched.
      
      The preemption paths benefit most from preempt_need_resched and
      do indeed use it; all other need_resched() users don't really
      care that much, so reverting need_resched() back to
      tif_need_resched() is the simple and safe solution.
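
      The revert boils down to something like the following (a sketch;
      the surrounding kernel plumbing is omitted):

       static __always_inline bool need_resched(void)
       {
               /* Look at the task's TIF_NEED_RESCHED flag again,
                * rather than the folded preempt count. */
               return unlikely(tif_need_resched());
       }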
      Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: lkp@linux.intel.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20130927153003.GF15690@laptop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 25 Sep 2013 (5 commits)
  12. 24 Sep 2013 (3 commits)
    • x86/UV: Add uvtrace support · 8eba1842
      Committed by Mike Travis
      This patch adds support for the uvtrace module by providing a
      skeleton call to the registered trace function. It also
      provides another, separate 'NMI' tracer that is triggered by
      the system-wide 'power nmi' command.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: Hedi Berriche <hedi@sgi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Link: http://lkml.kernel.org/r/20130923212501.185052551@asylum.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/UV: Update UV support for external NMI signals · 0d12ef0c
      Committed by Mike Travis
      The current UV NMI handler has not been updated for the changes
      in the system NMI handler and the perf operations. The UV NMI
      handler reads an MMR in the UV Hub to check whether the NMI
      event was caused by the external 'system NMI' that the operator
      can initiate on the System Mgmt Controller.
      
      The problem arises when the perf tools are running, causing
      millions of perf events per second on very large CPU count
      systems.  Previously this was okay because the perf NMI handler
      ran at a higher priority on the NMI call chain and if the NMI
      was a perf event, it would stop calling other NMI handlers
      remaining on the NMI call chain.
      
      Now the system NMI handler calls all the handlers on the NMI
      call chain including the UV NMI handler.  This causes the UV NMI
      handler to read the MMRs at the same millions per second rate.
      This can lead to significant performance loss and possible
      system failures. It also can cause thousands of 'Dazed and
      Confused' messages to be sent to the system console. This
      effectively makes the perf tools unusable on UV systems.
      
      To avoid this excessive overhead when perf tools are running,
      this code has been optimized to minimize reading of the MMRs as
      much as possible, by moving to the NMI_UNKNOWN notifier chain.
      This chain is called only when all the users on the standard
      NMI_LOCAL call chain have been called and none of them have
      claimed this NMI.
      
      There is an exception where the NMI_LOCAL notifier chain is
      used.  When the perf tools are in use, it's possible that the UV
      NMI was captured by some other NMI handler and then either
      ignored or mistakenly processed as a perf event.  We set a
      per_cpu ('ping') flag for those CPUs that ignored the initial
      NMI, and then send them an IPI NMI signal.  The NMI_LOCAL
      handler on each CPU does not need to read the MMR, but instead
      checks the in-memory flag indicating it was pinged.  There are
      two module variables: 'ping_count', indicating how many
      requested NMI events occurred, and 'ping_misses', indicating
      how many stray NMI events occurred.  The strays are most likely
      perf events, so the counts show the overhead of the perf NMI
      interrupts and how many MMR reads were avoided.
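
      An illustrative sketch of the ping scheme (names and structure
      are assumptions based on the description, not the exact SGI
      code):

       static DEFINE_PER_CPU(int, uv_nmi_pinged);
       static atomic_t ping_count, ping_misses;

       /* NMI_LOCAL handler: a cheap in-memory check, no MMR read. */
       static int uv_handle_nmi_ping(unsigned int cmd, struct pt_regs *regs)
       {
               if (!this_cpu_read(uv_nmi_pinged)) {
                       atomic_inc(&ping_misses);  /* stray, likely perf */
                       return NMI_DONE;
               }
               this_cpu_write(uv_nmi_pinged, 0);
               atomic_inc(&ping_count);
               /* ... process the UV system NMI on this CPU ... */
               return NMI_HANDLED;
       }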
      
      This patch also minimizes the reads of the MMRs by having the
      first CPU entering the NMI handler on each node set a per-HUB
      in-memory atomic value.  (Having a per-HUB value avoids sending
      lock traffic over NumaLink.)  Both types of UV NMIs from the SMI
      layer are supported.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: Hedi Berriche <hedi@sgi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Link: http://lkml.kernel.org/r/20130923212500.353547733@asylum.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/UV: Move NMI support · 1e019421
      Committed by Mike Travis
      This patch moves the UV NMI support from the x2apic file to a
      new separate uv_nmi.c file in preparation for the next sequence
      of patches. It prevents upcoming bloat of the x2apic file, and
      has the added benefit of putting the upcoming /sys/module
      parameters under the name 'uv_nmi' instead of 'x2apic_uv_x',
      which was obscure.
      Signed-off-by: Mike Travis <travis@sgi.com>
      Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
      Reviewed-by: Hedi Berriche <hedi@sgi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Link: http://lkml.kernel.org/r/20130923212500.183295611@asylum.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 12 Sep 2013 (2 commits)
  14. 11 Sep 2013 (1 commit)
  15. 03 Sep 2013 (1 commit)
    • lockref: implement lockless reference count updates using cmpxchg() · bc08b449
      Committed by Linus Torvalds
      Instead of taking the spinlock, the lockless versions atomically check
      that the lock is not taken, and do the reference count update using a
      cmpxchg() loop.  This is semantically identical to doing the reference
      count update protected by the lock, but avoids the "wait for lock"
      contention that you get when accesses to the reference count are
      contended.
      
      Note that a "lockref" is absolutely _not_ equivalent to an atomic_t.
      Even when the lockref reference counts are updated atomically with
      cmpxchg, the fact that they also verify the state of the spinlock means
      that the lockless updates can never happen while somebody else holds the
      spinlock.
      
      So while "lockref_put_or_lock()" looks a lot like just another name for
      "atomic_dec_and_lock()", and both optimize to lockless updates, they are
      fundamentally different: the decrement done by atomic_dec_and_lock() is
      truly independent of any lock (as long as it doesn't decrement to zero),
      so a locked region can still see the count change.
      
      The lockref structure, in contrast, really is a *locked* reference
      count.  If you hold the spinlock, the reference count will be stable and
      you can modify the reference count without using atomics, because even
      the lockless updates will see and respect the state of the lock.
      
      In order to enable the cmpxchg lockless code, the architecture needs to
      do three things:
      
       (1) Make sure that the "arch_spinlock_t" and an "unsigned int" can fit
           in an aligned u64, and have a "cmpxchg()" implementation that works
           on such a u64 data type.
      
       (2) define a helper function to test for a spinlock being unlocked
           ("arch_spin_value_unlocked()")
      
       (3) select the "ARCH_USE_CMPXCHG_LOCKREF" config variable in its
           Kconfig file.
      
      This enables it for x86-64 (but not 32-bit, we'd need to make sure
      cmpxchg() turns into the proper cmpxchg8b in order to enable it for
      32-bit mode).
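
      A simplified sketch of the cmpxchg() loop described above (the
      real lib/lockref.c wraps this in a retry macro; details here are
      illustrative):

       struct lockref {
               union {
                       aligned_u64 lock_count;
                       struct {
                               spinlock_t lock;
                               unsigned int count;
                       };
               };
       };

       void lockref_get(struct lockref *lockref)
       {
               struct lockref old, new;

               old.lock_count = ACCESS_ONCE(lockref->lock_count);
               while (arch_spin_value_unlocked(old.lock.rlock.raw_lock)) {
                       new = old;
                       new.count++;
                       /* Update lock word and count in one shot; this
                        * only succeeds if nobody took the lock meanwhile. */
                       if (cmpxchg64(&lockref->lock_count, old.lock_count,
                                     new.lock_count) == old.lock_count)
                               return;
                       old.lock_count = ACCESS_ONCE(lockref->lock_count);
               }

               /* Slow path: the lock is held, so take it ourselves. */
               spin_lock(&lockref->lock);
               lockref->count++;
               spin_unlock(&lockref->lock);
       }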
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 02 Sep 2013 (1 commit)
  17. 30 Aug 2013 (3 commits)
  18. 27 Aug 2013 (1 commit)
  19. 26 Aug 2013 (2 commits)
  20. 21 Aug 2013 (1 commit)
  21. 20 Aug 2013 (1 commit)
    • x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock · 17405453
      Committed by Yoshihiro YUNOMAE
      Prevent crash_kexec() from deadlocking on ioapic_lock. When
      crash_kexec() is executed on a CPU, that CPU takes ioapic_lock
      in disable_IO_APIC(). So if the CPU receives an NMI while it
      already holds ioapic_lock, a deadlock will happen.

      In this patch, ioapic_lock is re-initialized ('zapped') before
      disable_IO_APIC() is called.
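
      The fix amounts to something along these lines (a sketch based
      on the description above; the helper name is an assumption):

       /* Called from the crash path before disable_IO_APIC(), so a
        * lock left held by an NMI-interrupted CPU cannot deadlock
        * the kexec shutdown. */
       void ioapic_zap_locks(void)
       {
               raw_spin_lock_init(&ioapic_lock);
       }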
      
      You can reproduce this deadlock in the following way:
      
      1. Add mdelay(1000) after raw_spin_lock_irqsave() in
         native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c

         The deadlock can occur without this modification, but it
         makes the race window much easier to hit.
      
      2. Build and install the kernel
      
      3. Set up the OS which will run panic() and kexec when NMI is injected
          # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
          # vim /etc/default/grub
            add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line
          # grub2-mkconfig
      
      4. Reboot the OS
      
      5. Run the following command for each vCPU on the guest
          # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinity; done;
         By running this command, CPUs will take ioapic_lock when setting the affinity.
      
      6. Inject an NMI (push a dump button, or execute 'virsh inject-nmi <domain>'
         if you use a VM). After injecting the NMI, panic() is called in NMI
         context. kexec would then normally run from panic(), but the operation
         is stopped by a deadlock on ioapic_lock in crash_kexec()->
         machine_crash_shutdown()->native_machine_crash_shutdown()->
         disable_IO_APIC()->clear_IO_APIC()->clear_IO_APIC_pin()->
         ioapic_read_entry().
      Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: yrl.pp-manager.tt@hitachi.com
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel
      Signed-off-by: Ingo Molnar <mingo@kernel.org>