1. 27 Jul, 2017 1 commit
  2. 31 Jul, 2015 1 commit
  3. 19 May, 2015 1 commit
  4. 10 May, 2015 2 commits
  5. 07 May, 2015 2 commits
  6. 24 Apr, 2015 1 commit
  7. 16 Dec, 2014 1 commit
    • x86: Avoid building unused IRQ entry stubs · 2414e021
      Jan Beulich committed
      When X86_LOCAL_APIC is enabled (i.e. unconditionally on x86-64),
      first_system_vector will never end up being higher than
      LOCAL_TIMER_VECTOR (0xef), and hence building stubs for vectors
      0xef...0xff pointlessly reduces code density. Deal with this at
      build time already.
      
      Taking into consideration that X86_64 implies X86_LOCAL_APIC, also
      simplify (and hence make easier to read and more consistent with the
      change done here) some #if-s in arch/x86/kernel/irqinit.c.
      
      While we could further improve the packing of the IRQ entry stubs (the
      four ones now left in the last set could be fit into the four padding
      bytes each of the final four sets have) this doesn't seem to provide
      any real benefit: Both irq_entries_start and common_interrupt getting
      cache line aligned, eliminating the 30th set would just produce 32
      bytes of padding between the 29th and common_interrupt.
      
      [ tglx: Folded lguest fix from Dan Carpenter ]
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: lguest@lists.ozlabs.org
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Link: http://lkml.kernel.org/r/54574D5F0200007800044389@mail.emea.novell.com
      Link: http://lkml.kernel.org/r/20141115185718.GB6530@mwanda
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
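The set counts in the message above can be checked with a little arithmetic. The sketch below assumes that stubs are built for vectors starting at FIRST_EXTERNAL_VECTOR (0x20) and that each cache-line-sized set holds 7 stubs; neither value is stated in the commit message, and the helper names are hypothetical.

```c
/* Values marked "assumed" are not stated in the commit message. */
#define FIRST_EXTERNAL_VECTOR 0x20 /* assumed: first vector that gets a stub */
#define LOCAL_TIMER_VECTOR    0xef /* from the commit message */
#define NR_VECTORS            256  /* number of x86 interrupt vectors */
#define STUBS_PER_SET         7    /* assumed: stubs packed into one set */

/* Number of sets needed for stubs covering vectors [first, end). */
int stub_sets(int first, int end)
{
	int stubs = end - first;

	return (stubs + STUBS_PER_SET - 1) / STUBS_PER_SET;
}

/* Number of stubs that land in the final, possibly partial, set. */
int stubs_in_last_set(int first, int end)
{
	int rem = (end - first) % STUBS_PER_SET;

	return rem ? rem : STUBS_PER_SET;
}
```

Under these assumptions, stub_sets(FIRST_EXTERNAL_VECTOR, NR_VECTORS) gives 32 sets before the change, and stub_sets(FIRST_EXTERNAL_VECTOR, LOCAL_TIMER_VECTOR) gives 30 afterwards with 4 stubs in the last set, which agrees with the "30th set" and the "four ones now left" in the message.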
  8. 17 Apr, 2013 1 commit
  9. 13 Feb, 2013 1 commit
  10. 28 Jun, 2012 1 commit
  11. 13 Oct, 2011 1 commit
  12. 11 Aug, 2011 1 commit
  13. 16 Jun, 2011 1 commit
  14. 07 Jun, 2011 1 commit
    • x86-64: Emulate legacy vsyscalls · 5cec93c2
      Andy Lutomirski committed
      There's a fair amount of code in the vsyscall page.  It contains
      a syscall instruction (in the gettimeofday fallback) and who
      knows what will happen if an exploit jumps into the middle of
      some other code.
      
      Reduce the risk by replacing the vsyscalls with short magic
      incantations that cause the kernel to emulate the real
      vsyscalls. These incantations are useless if entered in the
      middle.
      
      This causes vsyscalls to be a little more expensive than real
      syscalls.  Fortunately sensible programs don't use them.
      The only exception is time() which is still called by glibc
      through the vsyscall - but calling time() millions of times
      per second is not sensible. glibc has this fixed in the
      development tree.
      
      This patch is not perfect: the vread_tsc and vread_hpet
      functions are still at a fixed address.  Fixing that might
      involve making alternative patching work in the vDSO.
      Signed-off-by: Andy Lutomirski <luto@mit.edu>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Juhl <jj@chaosbits.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: richard -rw- weinberger <richard.weinberger@gmail.com>
      Cc: Mikael Pettersson <mikpe@it.uu.se>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Louis Rilling <Louis.Rilling@kerlabs.com>
      Cc: Valdis.Kletnieks@vt.edu
      Cc: pageexec@freemail.hu
      Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu
      [ Removed the CONFIG option - it's simpler to just do it unconditionally. Tidied up the code as well. ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  15. 03 Mar, 2011 1 commit
  16. 14 Feb, 2011 2 commits
    • x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32 · 70e4a369
      Shaohua Li committed
      Make the maximum number of TLB invalidate vectors depend linearly on
      NR_CPUS, with a maximum of 32 vectors.
      
      We currently only have 8 vectors for TLB invalidation and that is clearly
      inadequate. If we have a lot of CPUs, the CPUs need to share the 8 vectors and
      tlbstate_lock is used to protect them. flush_tlb_page() is
      heavily used in page reclaim, which will cause a lot of lock
      contention for tlbstate_lock.
      
      Andi Kleen suggested increasing the vectors number to 32, which should be
      good for current typical systems to reduce the tlbstate_lock contention.
      
      My test system has 4 sockets, 64G of memory, and 64 CPUs. My
      workload creates 64 processes. Each process mmap-reads a big
      empty sparse file. The total size of the files is 2*total_mem,
      so this will cause a lot of page reclaim.
      
      Below is the result I get from perf call-graph profiling:
      
       without the patch:
       ------------------
      
          24.25%           usemem  [kernel]                                   [k] _raw_spin_lock
                           |
                           --- _raw_spin_lock
                              |
                              |--42.15%-- native_flush_tlb_others
      
       with the patch:
       ------------------
      
          14.96%           usemem  [kernel]                                   [k] _raw_spin_lock
                           |
                           --- _raw_spin_lock
                              |--13.89%-- native_flush_tlb_others
      
      So this heavily reduces the tlbstate_lock contention.
      Suggested-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1295232727.1949.709.camel@sli10-conroe>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
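The scaling rule described in the message above can be sketched as follows. This is an illustration rather than the kernel's code: the function names are hypothetical, and the modulo mapping from CPU to vector slot is an assumption made only to show why more vectors mean fewer CPUs contending for the same lock.

```c
#define MAX_TLB_VECTORS 32 /* cap named in the commit message */

/* Vector count grows linearly with the CPU count, capped at 32. */
int num_invalidate_tlb_vectors(int nr_cpus)
{
	return nr_cpus < MAX_TLB_VECTORS ? nr_cpus : MAX_TLB_VECTORS;
}

/*
 * Assumed mapping, for illustration only: a sending CPU picks its
 * vector slot by taking its CPU number modulo the vector count.
 */
int tlb_vector_slot(int cpu, int nr_cpus)
{
	return cpu % num_invalidate_tlb_vectors(nr_cpus);
}

/* CPUs that contend for one slot (and its lock), rounded up. */
int cpus_per_slot(int nr_cpus, int nr_vectors)
{
	return (nr_cpus + nr_vectors - 1) / nr_vectors;
}
```

For the 64-CPU test system in the message, cpus_per_slot(64, 8) is 8 with the old fixed count, while cpus_per_slot(64, num_invalidate_tlb_vectors(64)) is 2, so each lock is shared by a quarter as many CPUs.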
    • x86: Cleanup vector usage · 60f6e65d
      Shaohua Li committed
      Cleanup the vector usage and make them continuous if possible.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      LKML-Reference: <1295232722.1949.707.camel@sli10-conroe>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 19 Oct, 2010 1 commit
    • irq_work: Add generic hardirq context callbacks · e360adbe
      Peter Zijlstra committed
      Provide a mechanism that allows running code in IRQ context. It is
      most useful for NMI code that needs to interact with the rest of the
      system, such as waking up a task to drain buffers.
      
      Perf currently has such a mechanism, so extract that and provide it as
      a generic feature, independent of perf so that others may also
      benefit.
      
      The IRQ context callback is generated through self-IPIs where
      possible, or on architectures like powerpc the decrementer (the
      built-in timer facility) is set to generate an interrupt immediately.
      
      Architectures that don't have anything like this have to make do with a
      callback from the timer tick. These architectures can call
      irq_work_run() at the tail of any IRQ handlers that might enqueue such
      work (like the perf IRQ handler) to avoid undue latencies in
      processing the work.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      [ various fixes ]
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  18. 23 Jul, 2010 1 commit
  19. 19 Jan, 2010 1 commit
    • x86, irq: Use 0x20 for the IRQ_MOVE_CLEANUP_VECTOR instead of 0x1f · 6579b474
      Suresh Siddha committed
      After talking to some more folks inside Intel (Peter Anvin, Asit Mallick),
      the safest option (for future compatibility etc.) was seen to be using vector 0x20
      for IRQ_MOVE_CLEANUP_VECTOR instead of vector 0x1f (which is documented as
      a reserved vector in the Intel IA32 manuals).
      
      Also, we don't need to reserve the entire priority level (all 16 vectors in
      the priority bucket that IRQ_MOVE_CLEANUP_VECTOR falls into), as the
      x86 architecture (section 10.9.3 in SDM Vol3a) specifies that within a
      priority level, the higher the vector number the higher the priority.
      Hence we don't need to reserve the complete priority level 0x20-0x2f for
      the IRQ migration cleanup logic.
      
      So change IRQ_MOVE_CLEANUP_VECTOR to 0x20 and allow 0x21-0x2f to be used
      for device interrupts. 0x30-0x3f will be used for ISA interrupts (these
      can also be migrated in the context of the IOAPIC and hence need to be at a higher
      priority level than IRQ_MOVE_CLEANUP_VECTOR).
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100114002118.521826763@sbs-t61.sc.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Maciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
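The priority argument in the message above comes down to how the local APIC ranks vectors: the upper four bits of the vector number select the priority level, and inside a level a higher vector number wins. A minimal sketch, with hypothetical helper names:

```c
#define IRQ_MOVE_CLEANUP_VECTOR 0x20 /* value chosen by this commit */

/* APIC priority level: the upper four bits of the vector number. */
unsigned int priority_class(unsigned int vector)
{
	return vector >> 4;
}

/*
 * Nonzero if vector a outranks vector b. Levels compare first; inside a
 * level the higher vector number wins, so a plain comparison suffices.
 */
int outranks(unsigned int a, unsigned int b)
{
	return a > b;
}
```

With IRQ_MOVE_CLEANUP_VECTOR at 0x20, every vector in 0x21-0x2f shares its priority level but still outranks it, which is why the rest of that level can be handed to device interrupts.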
  20. 05 Jan, 2010 2 commits
    • x86, apic: Don't waste a vector to improve vector spread · ea943966
      H. Peter Anvin committed
      We want to use a vector-assignment sequence that avoids stumbling onto
      0x80 earlier in the sequence, in order to improve the spread of
      vectors across priority levels on machines with a small number of
      interrupt sources.  Right now, this is done by simply making the first
      vector (0x31 or 0x41) completely unusable.  This is unnecessary; all
      we need is to start assignment at a +1 offset, we don't actually need
      to prohibit the usage of this vector once we have wrapped around.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      LKML-Reference: <4B426550.6000209@kernel.org>
    • x86, apic: Reclaim IDT vectors 0x20-0x2f · 99d113b1
      H. Peter Anvin committed
      Reclaim 16 IDT vectors and make them available for general allocation.
      
      Reclaim vectors 0x20-0x2f by reallocating the IRQ_MOVE_CLEANUP_VECTOR
      to vector 0x1f.  This is in the range of vector numbers that is
      officially reserved for the CPU (for exceptions), however, the use of
      the APIC to generate any vector 0x10 or above is documented, and the
      CPU internally can receive any vector number (the legacy BIOS uses INT
      0x08-0x0f for interrupts, as messed up as that is.)
      
      Since IRQ_MOVE_CLEANUP_VECTOR has to be alone in the lowest-numbered
      priority level (block of 16), this effectively enables us to reclaim
      an otherwise-unusable APIC priority level and put it to use.
      
      Since this is a transient kernel-only allocation we can change it at
      any time, and if/when there is an exception at vector 0x1f this
      assignment needs to be changed as part of OS enabling that new feature.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4B4284C6.9030107@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  21. 30 Dec, 2009 1 commit
    • x86: Increase NR_IRQS and nr_irqs · 9959c888
      Yinghai Lu committed
      I have a system with lots of igb and ixgbe devices; when IOV/VF is
      enabled for them, we hit the limit of 3064.
      
      When a system has 20 PCIe cards installed, each card has 2
      functions, and each function needs 64 MSI-X vectors, it
      may need 20 * 2 * 64 = 2560 for MSI-X.
      
      But if IOV and VF are enabled, it
      may need 20 * 2 * 64 * 3 = 7680 for MSI-X.
      Assuming a system with 5 IOAPICs, nr_irqs_gsi will be 120.
      
      With NR_CPUS = 512 and nr_cpu_ids = 128, we
      will have NR_IRQS = 256 + 512 * 64 = 33024 and
      nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824.
      
      When SPARSE_IRQ is not set, there is no increase in kernel data
      size.
      
      when NR_CPUS=128, and SPARSE_IRQ is set:
         text		   data	    bss		   dec		 hex	filename
      21837444	4216564	12480736	38534744	24bfe58	vmlinux.before
      21837442	4216580	12480736	38534758	24bfe66	vmlinux.after
      when NR_CPUS=4096, and SPARSE_IRQ is set
         text		   data	    bss		   dec		 hex	filename
      21878619	5610244	13415392	40904255	270263f	vmlinux.before
      21878617	5610244	13415392	40904253	270263d	vmlinux.after
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B398ECD.1080506@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
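The figures in the message above can be reproduced with the small helpers below. The function names are hypothetical and the formula shapes are read off the numbers in the message, not taken from the kernel source.

```c
/* Static upper bound: NR_IRQS = nr_vectors + nr_cpus * per_cpu. */
int nr_irqs_static(int nr_vectors, int nr_cpus, int per_cpu)
{
	return nr_vectors + nr_cpus * per_cpu;
}

/* Runtime estimate: nr_irqs = gsi + 8 * nr_cpu_ids + gsi * per_gsi. */
int nr_irqs_runtime(int nr_irqs_gsi, int nr_cpu_ids, int per_gsi)
{
	return nr_irqs_gsi + 8 * nr_cpu_ids + nr_irqs_gsi * per_gsi;
}

/* MSI-X demand: cards * functions * vectors, times a VF multiplier. */
int msix_demand(int cards, int functions, int vectors, int vf_factor)
{
	return cards * functions * vectors * vf_factor;
}
```

nr_irqs_static(256, 512, 64) yields 33024 and nr_irqs_runtime(120, 128, 64) yields 8824, matching the message, and msix_demand(20, 2, 64, 3) yields the 7680 vectors that the new limit has to cover. A per-GSI factor of 16 in nr_irqs_runtime reproduces the 3064 limit quoted at the top of the message, which suggests (though the message does not say so) that 16 was the previous factor.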
  22. 13 Dec, 2009 1 commit
  23. 15 Oct, 2009 1 commit
  24. 04 Jun, 2009 3 commits
    • x86, mce: define MCE_VECTOR · 8fa8dd9e
      Andi Kleen committed
      Add MCE_VECTOR for the #MC exception.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: fix panic with interrupts off (needed for MCE) · 4ef702c1
      Andi Kleen committed
      For some time each panic() called with interrupts disabled
      triggered the !irqs_disabled() WARN_ON in smp_call_function(),
      producing ugly backtraces and confusing users.
      
      This is a common situation with machine checks for example which
      tend to call panic with interrupts disabled, but will also hit
      in other situations e.g. panic during early boot.  In fact it
      means that panic cannot be called in many circumstances, which
      would be bad.
      
      This all started with the new fancy queued smp_call_function,
      which is then used by the shutdown path to shut down the other
      CPUs.
      
      On closer examination it turned out that the fancy RCU
      smp_call_function() does lots of things not suitable in a panic
      situation anyways, like allocating memory and relying on complex
      system state.
      
      I originally tried to patch this over by checking for panic
      there, but it was quite complicated and the original patch
      was also not very popular.  This also didn't fix some of the
      underlying complexity problems.
      
      The new code in post 2.6.29 tries to patch around this by
      checking for oops_in_progress, but that is not enough to make
      this fully safe and I don't think that's a real solution
      because panic has to be reliable.
      
      So instead use a dedicated vector to reboot.  This makes the reboot
      code extremely straightforward, which is definitely a big plus
      in a panic situation where it is important to avoid relying on
      too much kernel state.  The new simple code is also safe to be
      called from an interrupts-off region because it is very very simple.
      
      There can be situations where it is important that panic
      is reliable.  For example on a fatal machine check the panic
      is needed to get the system up again and running as quickly
      as possible.  So it's important that panic is reliable and
      that all functions it calls are simple.
      
      This is why I came up with this simple vector scheme.
      It's very hard to beat in simplicity.  Vectors are not
      particularly precious anymore since all big systems are
      using per CPU vectors.
      
      Another possibility would have been to use an NMI similar
      to kdump, but there is still the problem that NMIs don't
      work reliably on some systems due to BIOS issues.  NMIs
      would have been able to stop CPUs running with interrupts
      off too.  For the sake of universal reliability I opted for
      using a non-NMI vector for now.
      
      I put the reboot vector into the highest priority bucket of
      the APIC vectors and moved the 64bit UV_BAU message down
      instead into the next lower priority.
      
      [ Impact: bug fix, fixes an old regression ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: implement bootstrapping for machine check wakeups · ccc3c319
      Andi Kleen committed
      Machine checks support waking up the mcelog daemon quickly.
      
      The original wake up code for this was pretty ugly, relying on
      an idle notifier and a special process flag. The reason it did
      it this way is that the machine check handler is not subject
      to normal interrupt locking rules so it's not safe
      to call wake_up().  Instead it set a process flag
      and then either did the wakeup in the syscall return
      or in the idle notifier.
      
      This patch adds a new "bootstrapping" method as replacement.
      
      The idea is that the handler checks if it's in a state where
      it is unsafe to call wake_up(). If it's safe it calls it directly.
      When it's not safe (that is, it interrupted a critical
      section with interrupts disabled) it uses a new "self IPI" to trigger
      an IPI to its own CPU. This can be done safely because IPI
      triggers are atomic with some care. The IPI is raised
      once the interrupts are reenabled and can then safely call
      wake_up().
      
      When APICs are disabled the event is just queued and will be picked up
      eventually by the next polling timer. I think that's a reasonable
      compromise, since it should only happen quite rarely.
      
      Contains fixes from Ying Huang.
      
      [ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  25. 03 Jun, 2009 1 commit
    • perf_counter/x86: Remove the IRQ (non-NMI) handling bits · a3288106
      Yong Wang committed
      Remove the IRQ (non-NMI) handling bits as NMI will be used always.
      Signed-off-by: Yong Wang <yong.y.wang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  26. 29 May, 2009 2 commits
  27. 10 Apr, 2009 1 commit
  28. 07 Apr, 2009 1 commit
  29. 05 Mar, 2009 1 commit
  30. 25 Feb, 2009 1 commit
  31. 16 Feb, 2009 1 commit
  32. 31 Jan, 2009 2 commits