1. 07 Aug, 2013 9 commits
  2. 30 Jul, 2013 1 commit
  3. 17 Jul, 2013 1 commit
    • K
      x86: Make sure IDT is page aligned · 4df05f36
      Committed by Kees Cook
      Since the IDT is referenced from a fixmap, make sure it is page aligned.
      Merge with 32-bit one, since it was already aligned to deal with F00F
      bug. Since bss is cleared before IDT setup, it can live there. This also
      moves the other *_idt_table variables into common locations.
      
      This avoids the risk of the IDT ever being moved in the bss and having
      the mapping be offset, resulting in calling incorrect handlers. In the
      current upstream kernel this is not a manifested bug, but heavily patched
      kernels (such as those using the PaX patch series) did encounter this bug.
      
      The tables other than idt_table technically do not need to be page
      aligned, at least not at the current time, but using a common
      declaration avoids mistakes.  On 64 bits the table is exactly one page
      long, anyway.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.net
      Reported-by: PaX Team <pageexec@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      4df05f36
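The alignment property that 4df05f36 relies on can be illustrated outside the kernel. This is a minimal userspace sketch, not the kernel's declaration: `gate64_sketch`, `idt_table_sketch`, and the `*_SKETCH` constants are hypothetical names, the 16-byte gate size is assumed for 64-bit x86, and `__attribute__((aligned))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH   4096u  /* assumed x86 page size */
#define IDT_ENTRIES_SKETCH 256u   /* x86 has 256 interrupt vectors */

/* Hypothetical stand-in for a 64-bit gate descriptor (16 bytes). */
struct gate64_sketch {
    uint64_t low;
    uint64_t high;
};

/* Zero-initialised, so it lives in bss, and forced onto a page boundary. */
struct gate64_sketch idt_table_sketch[IDT_ENTRIES_SKETCH]
    __attribute__((aligned(PAGE_SIZE_SKETCH)));

/* Offset of the table within its page; 0 means page aligned. */
size_t idt_page_offset(void)
{
    return (size_t)((uintptr_t)idt_table_sketch % PAGE_SIZE_SKETCH);
}

/* Table size in bytes; with 16-byte gates this is exactly one page. */
size_t idt_table_bytes(void)
{
    assert(sizeof(struct gate64_sketch) == 16);
    return sizeof(idt_table_sketch);
}
```

If the offset were nonzero, a page-granular mapping of the table would point at the start of the page rather than at the first gate, which is the "calling incorrect handlers" risk the message describes.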
  4. 16 Jul, 2013 1 commit
  5. 15 Jul, 2013 1 commit
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Committed by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      and are flagged as __cpuinit -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  6. 12 Jul, 2013 1 commit
  7. 10 Jul, 2013 9 commits
  8. 05 Jul, 2013 1 commit
  9. 04 Jul, 2013 1 commit
  10. 02 Jul, 2013 1 commit
    • S
      x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt() · 4787c368
      Committed by Seiji Aguchi
      Reschedule vector tracepoints may be called in cpu idle state.
      This causes the lockdep check warning below.
      
      The tracepoint requires rcu but for accuracy it also
      requires irq_enter() (tracepoints record the irq context), thus,
      the tracepoint interrupt handler should be calling irq_enter()
      and not rcu_irq_enter() (irq_enter() calls rcu_irq_enter()).
      
      So, add irq_enter/exit() to smp_trace_reschedule_interrupt()
      with common pre/post processing functions, smp_entering_irq()
      and exiting_irq() (exiting_irq() calls just irq_exit()
       in arch/x86/include/asm/apic.h),
      because these can be shared among reschedule, call_function,
      and call_function_single vectors.
      
      [   50.720557] Testing event reschedule_exit:
      [   50.721349]
      [   50.721502] ===============================
      [   50.721835] [ INFO: suspicious RCU usage. ]
      [   50.722169] 3.10.0-rc6-00004-gcf910e83 #190 Not tainted
      [   50.722582] -------------------------------
      [   50.722915] /c/kernel-tests/src/linux/arch/x86/include/asm/trace/irq_vectors.h:50 suspicious rcu_dereference_check() usage!
      [   50.723770]
      [   50.723770] other info that might help us debug this:
      [   50.723770]
      [   50.724385]
      [   50.724385] RCU used illegally from idle CPU!
      [   50.724385] rcu_scheduler_active = 1, debug_locks = 0
      [   50.725232] RCU used illegally from extended quiescent state!
      [   50.725690] no locks held by swapper/0/0.
      [   50.726010]
      [   50.726010] stack backtrace:
      [...]
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/51CDCFA3.9080101@hds.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4787c368
  11. 28 Jun, 2013 2 commits
  12. 27 Jun, 2013 3 commits
  13. 26 Jun, 2013 5 commits
    • A
      perf/x86/intel: Support full width counting · 069e0c3c
      Committed by Andi Kleen
      Recent Intel CPUs like Haswell and IvyBridge have a new
      alternative MSR range for perfctrs that allows writing the full
      counter width. Enable this range if the hardware reports it
      using a new capability bit.
      
      Currently the perf code queries CPUID to get the counter width,
      and sign extends the counter values as needed. The traditional
      PERFCTR MSRs always limit writes to 32 bits, even though the counter
      is internally larger (usually 48 bits on recent CPUs).
      
      When the new capability is set, use the alternative range, which
      does not have these restrictions.
      
      This lowers the overhead of perf stat slightly because it has to
      take fewer interrupts to accumulate the counter value. On Haswell
      it also avoids some problems with TSX aborting when the end of
      the counter range is reached.
      
      ( See the patch "perf/x86/intel: Avoid checkpointed counters
        causing excessive TSX aborts" for more details. )
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      069e0c3c
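The 32-bit write restriction described in 069e0c3c can be modelled in plain C. This is a sketch under stated assumptions, not the perf code: it assumes a legacy write takes only the low 32 bits and sign-extends them from bit 31 to the counter width, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering a counter of the given bit width (32 <= width < 64). */
uint64_t counter_mask(unsigned width)
{
    assert(width >= 32 && width < 64);
    return (1ULL << width) - 1;
}

/* Assumed legacy behaviour: low 32 bits, sign-extended from bit 31. */
uint64_t legacy_write(uint64_t value, unsigned width)
{
    uint64_t low = value & 0xFFFFFFFFULL;

    if (low & 0x80000000ULL)
        low |= 0xFFFFFFFF00000000ULL;
    return low & counter_mask(width);
}

/* Full-width behaviour: every bit up to the counter width is kept. */
uint64_t full_width_write(uint64_t value, unsigned width)
{
    return value & counter_mask(width);
}

/* Number of events counted before the counter wraps. */
uint64_t events_until_overflow(uint64_t counter, unsigned width)
{
    return counter_mask(width) - counter + 1;
}
```

Programming a 48-bit counter with the negated period `1ULL << 40` through `legacy_write` loses the upper bits, so the counter does not overflow after the requested number of events, whereas `full_width_write` preserves the period. Small periods such as 1000 behave the same on both paths.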
    • H
      x86, asm, cleanup: Replace open-coded control register values with symbolic · a3d7b7dd
      Committed by H. Peter Anvin
      Clean up unnecessary open-coded control register values.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org
      a3d7b7dd
    • H
      x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED · 1adfa76a
      Committed by H. Peter Anvin
      Bit 1 in the x86 EFLAGS is always set.  Name the macro something that
      actually tries to explain what it is all about, rather than being a
      tautology.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org
      1adfa76a
    • N
      mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem · 0644414e
      Committed by Naveen N. Rao
      There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
      per-cpu bitmaps.  Provide comments so that we all know exactly what these
      are used for, and why.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      0644414e
    • Y
      x86: Fix /proc/mtrr with base/size more than 44bits · d5c78673
      Committed by Yinghai Lu
      On one system where the mtrr range is more than 44 bits, in dmesg we have
      [    0.000000] MTRR default type: write-back
      [    0.000000] MTRR fixed ranges enabled:
      [    0.000000]   00000-9FFFF write-back
      [    0.000000]   A0000-BFFFF uncachable
      [    0.000000]   C0000-DFFFF write-through
      [    0.000000]   E0000-FFFFF write-protect
      [    0.000000] MTRR variable ranges enabled:
      [    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
      [    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
      [    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
      [    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
      [    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
      [    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
      [    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
      [    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
      [    0.000000]   8 disabled
      [    0.000000]   9 disabled
      
      but /proc/mtrr reports it wrong:
      reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
      reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
      reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
      reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
      reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
      reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
      reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
      reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
      reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining
      
      so bit 44 and bit 45 get cut off.
      
      The problems are in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
      1. For base, we miss casting base_lo to 64 bits before shifting.
      Fix that by adding a u64 cast.
      
      2. For size, it can only handle 44 bits, i.e. 32 bits + page_shift.
      Fix that with a 64-bit mask instead of the 32-bit mask_lo, so the range can be
      more than 44 bits.
      At the same time, we need to update size_or_mask for old cpus that does
      support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
      to all 1s, otherwise will not get correct size for them.
      
      Also fix mtrr_add_page: it should check base and (base + size - 1)
      instead of base and size, as base and size could each be small while
      base + size is big enough to be out of bounds. We can
      use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.
      
      So when are we going to have a size of more than 44 bits? That is 16TiB.
      
      After the patch we have the right output:
      reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
      reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
      reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
      reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
      reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
      reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
      reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
      reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
      reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining
      
      -v2: simply checking in mtrr_add_page according to hpa.
      
      [ hpa: This probably wants to go into -stable only after having sat in
        mainline for a bit.  It is not a regression. ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
      Cc: <stable@vger.kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      d5c78673
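The truncation and the range check described in d5c78673 can be reproduced with a small userspace sketch. It is not the kernel code: the function names are hypothetical, and the sketch assumes the overflow comes from shifting the high 32-bit half of the base in 32-bit arithmetic (the message itself names base_lo). Under that assumption it reproduces the before and after `base=` values listed above.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12

/* Pre-fix behaviour (assumed): the shift is done in 32 bits and overflows. */
uint64_t base_pfn_truncated(uint32_t base_hi, uint32_t base_lo)
{
    uint32_t hi_part = base_hi << (32 - PAGE_SHIFT_SKETCH);

    return hi_part | (base_lo >> PAGE_SHIFT_SKETCH);
}

/* Post-fix behaviour: widen to 64 bits before shifting. */
uint64_t base_pfn_widened(uint32_t base_hi, uint32_t base_lo)
{
    return ((uint64_t)base_hi << (32 - PAGE_SHIFT_SKETCH)) |
           (base_lo >> PAGE_SHIFT_SKETCH);
}

/* Range check in page units: first and last page must fit in phys_bits. */
int range_fits(uint64_t base_pfn, uint64_t size_pfn, unsigned phys_bits)
{
    assert(size_pfn > 0 && phys_bits > PAGE_SHIFT_SKETCH && phys_bits < 64);
    uint64_t last_pfn = base_pfn + size_pfn - 1;

    return ((base_pfn | last_pfn) >> (phys_bits - PAGE_SHIFT_SKETCH)) == 0;
}
```

With `base_hi = 0x3800` and `base_lo = 0`, the truncated variate yields the address 0x80000000000 and the widened one yields 0x380000000000, matching reg01 before and after the patch. `range_fits` shows why checking `base + size - 1` matters: with 36 physical address bits, a base of 0xFFFFF0 pages and a size of 0x20 pages are each small enough, but the last page is out of bounds.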
  14. 24 Jun, 2013 1 commit
    • G
      irqdomain: Refactor irq_domain_associate_many() · ddaf144c
      Committed by Grant Likely
      Originally, irq_domain_associate_many() was designed to unwind the
      mapped irqs on a failure of any individual association. However, that
      proved to be a problem with certain IRQ controllers. Some of them only
      support a subset of irqs, and will fail when attempting to map a
      reserved IRQ. In those cases we want to map as many IRQs as possible, so
      instead it is better for irq_domain_associate_many() to make a
      best-effort attempt to map irqs, but not fail if any or all of them
      don't succeed. If a caller really cares about how many irqs got
      associated, then it should instead go back and check that all of the
      irqs it cares about were mapped.
      
      The original design open-coded the individual association code into the
      body of irq_domain_associate_many(), but with no longer needing to
      unwind associations, the code becomes simpler to split out
      irq_domain_associate() to contain the bulk of the logic, and
      irq_domain_associate_many() to be a simple loop wrapper.
      
      This patch also adds a new error check to the associate path to make
      sure it isn't called for an irq larger than the controller can handle,
      and adds locking so that the irq_domain_mutex is held while setting up a
      new association.
      
      v3: Fixup missing change to irq_domain_add_tree()
      v2: Fixup x86 warning. irq_domain_associate_many() no longer returns an
          error code, but reports errors to the printk log directly. In the
          majority of cases we don't actually want to fail if there is a
          problem, but rather log it and still try to boot the system.
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
      
      irqdomain: Fix flubbed irq_domain_associate_many refactoring
      
      commit d39046ec72, "irqdomain: Refactor irq_domain_associate_many()" was
      missing the following hunk which causes a boot failure on anything using
      irq_domain_add_tree() to allocate an irq domain.
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      ddaf144c
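The best-effort design described in ddaf144c can be sketched as a single-association helper plus a loop wrapper that logs failures instead of unwinding. This is an illustrative userspace model only: `domain_sketch`, its fields, and both functions are hypothetical and do not reproduce the irqdomain API.

```c
#include <assert.h>
#include <stdio.h>

#define NO_MAPPING_SKETCH 0u   /* hypothetical marker: 0 means "not mapped" */
#define TABLE_SIZE_SKETCH 16u

struct domain_sketch {
    unsigned hwirq_max;                    /* hwirqs at or above this are rejected */
    unsigned reserved_hwirq;               /* one hwirq the controller refuses to map */
    unsigned virq_of[TABLE_SIZE_SKETCH];   /* hwirq -> virq table */
};

/* Associate a single hwirq; returns 0 on success, -1 on failure. */
int associate_one(struct domain_sketch *d, unsigned virq, unsigned hwirq)
{
    assert(d != NULL);
    if (hwirq >= d->hwirq_max || hwirq >= TABLE_SIZE_SKETCH)
        return -1;   /* larger than the controller can handle */
    if (hwirq == d->reserved_hwirq)
        return -1;   /* controller refuses this one */
    d->virq_of[hwirq] = virq;
    return 0;
}

/* Best effort: log each failure and keep going, never unwind. */
void associate_many(struct domain_sketch *d, unsigned virq_base,
                    unsigned hwirq_base, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        if (associate_one(d, virq_base + i, hwirq_base + i) != 0)
            fprintf(stderr, "could not map hwirq %u\n", hwirq_base + i);
    }
}
```

Because the wrapper returns nothing, a caller that cares about a particular irq inspects the table afterwards, which mirrors the "go back and check" advice in the message.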
  15. 23 Jun, 2013 3 commits
    • D
      x86: Add NMI duration tracepoints · 0c4df02d
      Committed by Dave Hansen
      This patch has been invaluable in my adventures finding
      issues in the perf NMI handler.  I'm as big a fan of
      printk() as anybody is, but using printk() in NMIs is
      deadly when they're happening frequently.
      
      Even hacking in trace_printk() ended up eating enough
      CPU to throw off some of the measurements I was making.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0c4df02d
    • D
      perf: Drop sample rate when sampling is too slow · 14c63f17
      Committed by Dave Hansen
      This patch keeps track of how long perf's NMI handler is taking,
      and also calculates how many samples perf can take a second.  If
      the sample length times the expected max number of samples
      exceeds a configurable threshold, it drops the sample rate.
      
      This way, we don't have a runaway sampling process eating up the
      CPU.
      
      This patch can tend to drop the sample rate down to a level where
      perf doesn't work very well.  *BUT* the alternative is that my
      system hangs because it spends all of its time handling NMIs.
      
      I'll take a busted performance tool over an entire system that's
      busted and undebuggable any day.
      
      BTW, my suspicion is that there's still an underlying bug here.
      Using the HPET instead of the TSC is definitely a contributing
      factor, but I suspect there are some other things going on.
      But, I can't go dig down on a bug like that with my machine
      hanging all the time.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      [ Prettified it a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      14c63f17
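The throttling rule described in 14c63f17 (sample length times the expected maximum number of samples, compared against a threshold) can be sketched as one function. The names are hypothetical, and halving the limit is an assumed policy chosen for the sketch; the message does not say by how much the rate is dropped.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns the new per-tick sample limit.
 * avg_sample_ns:         measured average time spent handling one sample
 * max_samples_per_tick:  current limit on samples per tick
 * allowed_ns:            configurable time budget per tick
 */
unsigned throttle_samples(uint64_t avg_sample_ns,
                          unsigned max_samples_per_tick,
                          uint64_t allowed_ns)
{
    assert(max_samples_per_tick > 0);

    /* Within budget: leave the limit alone. */
    if (avg_sample_ns * max_samples_per_tick <= allowed_ns)
        return max_samples_per_tick;

    /* Assumed policy: halve, rounding up so the limit never reaches zero. */
    return max_samples_per_tick / 2 + max_samples_per_tick % 2;
}
```

Rounding up keeps at least one sample per tick, so sampling degrades rather than stopping entirely.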
    • D
      x86: Warn when NMI handlers take large amounts of time · 2ab00456
      Committed by Dave Hansen
      I have a system which is causing all kinds of problems.  It has
      8 NUMA nodes, and lots of cores that can fight over cachelines.
      If things are not working _perfectly_, then NMIs can take longer
      than expected.
      
      If we get too many of them backed up to each other, we can
      easily end up in a situation where we are doing nothing *but*
      running NMIs.  The biggest problem, though, is that this happens
      _silently_.  You might be lucky to get an hrtimer warning, but
      most of the time the system simply hangs.
      
      This patch should at least give us some warning before we fall
      off the cliff.  The warnings look like this:
      
      	nmi_handle: perf_event_nmi_handler() took: 26095071 ns
      
      The message is triggered whenever we notice the longest NMI
      we've seen to date.  You can always view and reset this value
      via the debugfs interface if you like.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2ab00456
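The "warn only on a new longest NMI" rule from 2ab00456 can be sketched as a small record-keeping helper. This is a userspace illustration with hypothetical names; only the message format is taken from the commit text, and the reset function stands in for the debugfs reset the message mentions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bookkeeping: the longest handler run seen so far. */
struct nmi_duration_sketch {
    uint64_t longest_ns;
};

/* Records a duration; warns and returns 1 only when it is a new maximum. */
int note_nmi_duration(struct nmi_duration_sketch *s, const char *handler,
                      uint64_t duration_ns)
{
    assert(s != NULL && handler != NULL);
    if (duration_ns <= s->longest_ns)
        return 0;

    s->longest_ns = duration_ns;
    printf("nmi_handle: %s() took: %llu ns\n", handler,
           (unsigned long long)duration_ns);
    return 1;
}

/* Stand-in for resetting the recorded maximum. */
void reset_nmi_duration(struct nmi_duration_sketch *s)
{
    assert(s != NULL);
    s->longest_ns = 0;
}
```

Warning only on a new maximum keeps the log quiet under steady load while still flagging each step toward the cliff.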