1. 01 Nov 2014 · 1 commit
    • ftrace/x86: Add dynamic allocated trampoline for ftrace_ops · f3bea491
      Committed by Steven Rostedt (Red Hat)
      The current method of handling multiple function callbacks is to register
      a list function callback that calls all the other callbacks based on
      their hash tables, comparing each hash against the function the
      callback was invoked on. This is very inefficient.
      
      For example, if you are tracing all functions in the kernel and then
      add a kprobe to a function such that the kprobe uses ftrace, the
      mcount trampoline will switch from calling the function trace callback
      to calling the list callback that will iterate over all registered
      ftrace_ops (in this case, the function tracer and the kprobes callback).
      That means that, for every function being traced, it checks the hashes
      of both the function tracing and the kprobes ftrace_ops, even though
      the kprobe is set on only a single function. The kprobes ftrace_ops is
      checked for every function being traced!
      
      Instead of calling the list function for functions that are only being
      traced by a single callback, we can call a dynamically allocated
      trampoline that calls the callback directly. The function graph tracer
      already uses a direct call trampoline when it is being traced by itself
      but it is not dynamically allocated. Its trampoline is static in the
      kernel core. The infrastructure that called the function graph trampoline
      can also be used to call a dynamically allocated one.
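
      As a conceptual illustration (a sketch, not kernel source; the real
      list function also takes ftrace_ops and pt_regs arguments), the
      difference between the two paths looks roughly like this:

      	/* Old path: the list function checks every registered
      	 * ftrace_ops' hash for every single traced function. */
      	static void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip)
      	{
      		struct ftrace_ops *op;

      		for (op = ftrace_ops_list; op; op = op->next)
      			if (ftrace_ops_test(op, ip))	/* hash lookup per op */
      				op->func(ip, parent_ip);
      	}

      	/* New path: a per-ops trampoline, allocated when the ftrace_ops
      	 * is registered, jumps straight to op->func() with no loop and
      	 * no hash checks. */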
      
      For now, only ftrace_ops that are not dynamically allocated can have
      a trampoline; that is, users such as the function tracer or the stack
      tracer. kprobes and perf allocate their ftrace_ops dynamically, and
      until there is a safe way to free the trampoline, they cannot use one.
      Dynamically allocated ftrace_ops may, however, use the trampoline if
      the kernel is not compiled with CONFIG_PREEMPT. But that will come later.
      Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Tested-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 14 Oct 2014 · 4 commits
  3. 08 Oct 2014 · 3 commits
  4. 07 Oct 2014 · 1 commit
    • x86_64, entry: Filter RFLAGS.NT on entry from userspace · 8c7aa698
      Committed by Andy Lutomirski
      The NT flag doesn't do anything in long mode other than causing IRET
      to #GP.  Oddly, CPL3 code can still set NT using popf.
      
      Entry via hardware or software interrupt clears NT automatically, so
      the only relevant entries are fast syscalls.
      
      If user code causes kernel code to run with NT set, then there's at
      least some (small) chance that it could cause trouble.  For example,
      user code could cause a call to EFI code with NT set, and who knows
      what would happen?  Apparently some games on Wine sometimes do
      this (!), and, if an IRET return happens, they will segfault.  That
      segfault cannot be handled, because signal delivery fails, too.
      
      This patch programs the CPU to clear NT on entry via SYSCALL (both
      32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
      in software on entry via SYSENTER.
      
      To save a few cycles, this borrows a trick from Jan Beulich in Xen:
      it checks whether NT is set before trying to clear it.  As a result,
      it seems to have very little effect on SYSENTER performance on my
      machine.
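
      Concretely, the 64-bit side amounts to adding NT to the SYSCALL
      flags mask; a sketch close to the syscall_init() change (the exact
      flag list is my reading of it). On the SYSENTER path there is no
      hardware mask, so the entry code tests the saved flags and clears
      NT in software only when it is actually set:

      	/* SYSCALL (64-bit, and 32-bit on AMD): have the CPU itself
      	 * clear NT, along with the other already-masked flags. */
      	wrmsrl(MSR_SYSCALL_MASK,
      	       X86_EFLAGS_TF | X86_EFLAGS_DF | X86_EFLAGS_IF |
      	       X86_EFLAGS_IOPL | X86_EFLAGS_AC | X86_EFLAGS_NT);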
      
      There's another minor bug fix in here: it looks like the CFI
      annotations were wrong if CONFIG_AUDITSYSCALL=n.
      
      Testers beware: on Xen, SYSENTER with NT set turns into a GPF.
      
      I haven't touched anything on 32-bit kernels.
      
      The syscall mask change comes from a variant of this patch by Anish
      Bhatt.
      
      Note to stable maintainers: there is no known security issue here.
      A misguided program can set NT and cause the kernel to try and fail
      to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
      on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Anish Bhatt <anish@chelsio.com>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  5. 03 Oct 2014 · 3 commits
  6. 24 Sep 2014 · 18 commits
    • x86: Speed up ___preempt_schedule*() by using THUNK helpers · 0ad6e3c5
      Committed by Oleg Nesterov
      ___preempt_schedule() does SAVE_ALL/RESTORE_ALL, but this is
      suboptimal: we do not need to save/restore the callee-saved
      registers. And we already have arch/x86/lib/thunk_*.S, which
      implements similar asm wrappers, so it makes sense to
      redefine ___preempt_schedule() as "THUNK ..." and remove
      preempt.S altogether.
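
      The redefinition is only a few lines of assembly; roughly, per my
      reading of the change to arch/x86/lib/thunk_64.S (the THUNK macro
      saves/restores just the caller-clobbered registers around the call):

      	#ifdef CONFIG_PREEMPT
      		THUNK ___preempt_schedule, preempt_schedule
      	#ifdef CONFIG_CONTEXT_TRACKING
      		THUNK ___preempt_schedule_context, preempt_schedule_context
      	#endif
      	#endif
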
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140921184153.GA23727@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched: Fix unreleased llc_shared_mask bit during CPU hotplug · 03bd4e1f
      Committed by Wanpeng Li
      The following bug can be triggered by repeatedly hot-adding and
      hot-removing a large number of a Xen domain0's vcpus:
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
      	PGD 5a9d5067 PUD 13067 PMD 0
      	Oops: 0000 [#3] SMP
      	[...]
      	Call Trace:
      	load_balance
      	? _raw_spin_unlock_irqrestore
      	idle_balance
      	__schedule
      	schedule
      	schedule_timeout
      	? lock_timer_base
      	schedule_timeout_uninterruptible
      	msleep
      	lock_device_hotplug_sysfs
      	online_store
      	dev_attr_store
      	sysfs_write_file
      	vfs_write
      	SyS_write
      	system_call_fastpath
      
      The last level cache shared mask is built during CPU up, and the
      build_sched_domain() routine takes advantage of it to set up
      the sched domain CPU topology.

      However, llc_shared_mask is not released during CPU disable,
      which leads to an invalid sched domain CPU topology.

      This patch fixes it by releasing the llc_shared_mask correctly
      during CPU disable.
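
      The fix boils down to clearing the mask when a CPU's sibling info is
      torn down; a sketch close to the actual change in remove_siblinginfo():

      	static void remove_siblinginfo(int cpu)
      	{
      		int sibling;

      		/* Drop this CPU from every sibling's LLC shared mask... */
      		for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
      			cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
      		/* ...and clear its own, so a later hot-add starts clean. */
      		cpumask_clear(cpu_llc_shared_mask(cpu));
      		/* ... core/thread sibling maps are cleared here too ... */
      	}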
      
      Yasuaki also reported that this can happen on real hardware:
      
        https://lkml.org/lkml/2014/7/22/1018
      
      His case is here:
      
      	==
      	Here is an example on my system.
      	My system has 4 sockets and each socket has 15 cores and HT is
	enabled. In this case, the cores of each socket are numbered as
      	follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-44, 90-104
      	Socket#3 | 45-59, 105-119
      
      	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.
      
      	It means that last level cache of Socket#2 is shared with
      	CPU#30-44 and 90-104.
      
	When hot-removing socket#2 and #3, the cores of each socket are
	numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      
      	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
      	remains having 0x3fff80000001fffc0000000.
      
	After that, when hot-adding socket#2 and #3, the cores of each
	socket are numbered as follows:
      
      		 | CPU#
      	Socket#0 | 0-14 , 60-74
      	Socket#1 | 15-29, 75-89
      	Socket#2 | 30-59
      	Socket#3 | 90-119
      
      	Then llc_shared_mask of CPU#30 becomes
      	0x3fff8000fffffffc0000000. It means that last level cache of
      	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
      	the wrong value.
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Tested-by: Linn Crosetto <linn@hp.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Toshi Kani <toshi.kani@hp.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead · ee1b5b16
      Committed by Bryan O'Donoghue
      Quark X1000 advertises PGE via the standard CPUID method, and
      PGE bits exist in Quark X1000's PTEs. Nevertheless, in order
      to flush an individual PTE it is necessary to reload CR3,
      irrespective of the PTE.PGE bit.
      
      See Quark Core_DevMan_001.pdf section 6.4.11
      
      This bug was fixed in Galileo kernels; unfixed vanilla kernels are
      expected to crash and burn on this platform.
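
      The fix, as I read it, is to hide the non-functional PGE capability
      early for Quark X1000 (family 5, model 9), so TLB flushes fall back
      to CR3 reloads everywhere; a hedged sketch of the cpu-init hook:

      	if (c->x86 == 5 && c->x86_model == 9) {
      		/* Quark X1000: PGE is advertised but global-page
      		 * invalidation doesn't work; force CR3-based flushes. */
      		pr_info("Disabling PGE capability bit\n");
      		setup_clear_cpu_cap(X86_FEATURE_PGE);
      	}
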
      Signed-off-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ie
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/smpboot: Speed up suspend/resume by avoiding 100ms sleep for CPU offline during S3 · 2ed53c0d
      Committed by Lan Tianyu
      With certain kernel configurations, CPU offline consumes more than
      100ms during S3.
      
      It's a timing-related issue: native_cpu_die() would occasionally fall
      into a 100ms sleep when the CPU idle loop thread marked the CPU state
      DEAD too slowly.

      What native_cpu_die() does is poll the CPU state and wait for 100ms
      if the CPU state hasn't been marked DEAD yet. The 100ms sleep doesn't
      make sense and is purely historic.
      
      To avoid such a long sleep, this patch adds a 'struct completion'
      for each CPU, waits for the completion in native_cpu_die(), and
      completes it when the CPU state is marked DEAD.
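
      A minimal sketch of the mechanism, simplified from the patch
      (timeouts and surrounding error handling elided or approximated):

      	static DEFINE_PER_CPU(struct completion, die_complete);

      	/* Dying CPU: mark itself DEAD and wake the waiter immediately. */
      	void play_dead_common(void)
      	{
      		idle_task_exit();
      		__this_cpu_write(cpu_state, CPU_DEAD);
      		complete(this_cpu_ptr(&die_complete));
      		/* ... */
      	}

      	/* Controlling CPU: block on the completion instead of polling
      	 * the state in 100ms steps. */
      	void native_cpu_die(unsigned int cpu)
      	{
      		if (!wait_for_completion_timeout(&per_cpu(die_complete, cpu),
      						 HZ * 5))
      			pr_err("CPU %u didn't die\n", cpu);
      		/* ... */
      	}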
      
      Tested on a 48-core Ivy Bridge Intel Xeon server and on
      Haswell laptops. The CPU offlining cost on these machines is
      reduced from more than 100ms to less than 5ms. The system
      suspend time is reduced by 2.3s on the servers.
      
      Borislav and Prarit also helped to test the patch on an AMD
      machine and a few systems of various sizes and configurations
      (multi-socket, single-socket, no hyper threading, etc.). No
      issues were seen.
      Tested-by: Prarit Bhargava <prarit@redhat.com>
      Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: srostedt@redhat.com
      Cc: toshi.kani@hp.com
      Cc: imammedo@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.com
      [ Improved a few minor details in the code, cleaned up the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/uncore: Update support for client uncore IMC PMU · 521e8bac
      Committed by Stephane Eranian
      This patch restructures the memory controller (IMC) uncore PMU support
      for client SNB/IVB/HSW processors. The main change is that it can now
      cope with more than one PCI device ID per processor model. There are
      many flavors of memory controllers for each processor. They have
      different PCI device IDs, yet they behave the same w.r.t. the memory
      controller PMU that we are interested in.
      
      The patch now supports two distinct memory controllers for IVB
      processors: one for mobile, one for desktop.
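
      Illustratively, the driver's PCI id table can now carry several IMC
      device IDs per model; the IDs below match my reading of the desktop
      and mobile IVB host bridges, but treat them as examples rather than
      a verbatim copy of the table:

      	static const struct pci_device_id ivb_uncore_pci_ids[] = {
      		{ /* IMC, desktop IVB (example ID) */
      		  PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0150),
      		  .driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
      		},
      		{ /* IMC, mobile IVB (example ID) */
      		  PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0154),
      		  .driver_data = UNCORE_PCI_DEV_DATA(SNB_PCI_UNCORE_IMC, 0),
      		},
      		{ /* end: all zeroes */ },
      	};
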
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20140917090616.GA11281@quad
      Cc: ak@linux.intel.com
      Cc: kan.liang@intel.com
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/uncore: Fix PCU filter setup for Sandy/Ivy/Haswell EP · b10fc1c3
      Committed by Andi Kleen
      The PCU frequency band filters use 8 bits each in a register.
      When setting up the value, the shift was not correctly
      scaled, which resulted in all filters except band 0
      being zero. Fix the scaling.

      This allows multiple uncore frequency bands to be monitored correctly.
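
      A hedged sketch of the fix in the PCU filter setup: the band index
      has to be scaled by 8 bits per band, which is what was missing:

      	/* Band events 0xb..0xe each own an 8-bit slice of the filter. */
      	if (ev_sel >= 0xb && ev_sel <= 0xe) {
      		reg1->reg = SNBEP_PCU_MSR_PMON_BOX_FILTER;
      		reg1->idx = ev_sel - 0xb;
      		reg1->config = event->attr.config1 &
      			       (0xff << (reg1->idx * 8)); /* was: << reg1->idx */
      	}
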
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1409872109-31645-5-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/uncore: Add missing cbox filter flags on IvyBridge-EP uncore driver · 7e96ae1a
      Committed by Andi Kleen
      The IvyBridge-EP uncore driver was missing three filter flags:
      NC, ISOC, C6, which are useful in some cases. Support them in the same
      way as the Haswell-EP driver, by allowing them to be set and exposing
      them in the sysfs formats.
      
      Also fix a typo in a define.
      
      Relies on the Haswell-EP driver being applied earlier.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1409872109-31645-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/uncore: Register the PMU only if the uncore pci device exists · 513d793e
      Committed by Yan, Zheng
      The current code registers PMUs for all possible uncore PCI devices.
      This is not good because, on some machines, one or more uncore PCI
      devices can be missing. A missing PCI device makes the corresponding
      PMU unusable. Register the PMU only if the uncore device exists.
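
      A sketch of the approach, with simplified control flow; the
      uncore_pci_lookup_pmu() helper and the pmu_registered flag are
      illustrative names, not necessarily the patch's own. The point is
      that registration moves into the PCI probe path, so a PMU only
      comes into existence when its device was actually discovered:

      	static int uncore_pci_probe(struct pci_dev *pdev,
      				    const struct pci_device_id *id)
      	{
      		struct intel_uncore_pmu *pmu;
      		int ret;

      		pmu = uncore_pci_lookup_pmu(pdev, id);	/* illustrative */
      		if (!pmu->pmu_registered) {		/* illustrative flag */
      			ret = uncore_pmu_register(pmu);
      			if (ret)
      				return ret;
      			pmu->pmu_registered = true;
      		}
      		/* ... set up the counter box for this device ... */
      		return 0;
      	}
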
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1409872109-31645-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/uncore: Add Haswell-EP uncore support · e735b9db
      Committed by Yan, Zheng
      The uncore subsystem in Haswell-EP is similar to Sandy/Ivy
      Bridge-EP. There are some differences in config register
      encoding and PCI device IDs. The Haswell-EP uncore also
      supports a few new events. Add the Haswell-EP driver to
      the snbep split driver.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      [ Add missing break. Add imc events. Add cbox nc/isoc/c6. ]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1409872109-31645-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Use Broadwell cache event list for Haswell · fdda3c4a
      Committed by Andi Kleen
      Use the newly added Broadwell cache event list for Haswell too.
      All Haswell and Broadwell events and offcore masks used in these lists
      are identical.
      
      However, the Haswell list is very different from the Sandy Bridge
      list that was used previously. This fixes a wide range of mis-counted
      cache events.

      The node events are now only for retired memory events, so prefetching
      and speculative memory accesses are not included. They are PEBS
      capable now, which makes it much easier to sample them, plus it's
      possible to create address maps with -d.

      The prefetch events are gone now. The way the hardware counts
      them is very misleading (some prefetches included, others not), so
      it seemed best to leave them out.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1409683455-29168-5-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86: Add INST_RETIRED.ALL workarounds · c46e665f
      Committed by Andi Kleen
      On Broadwell INST_RETIRED.ALL cannot be used with any period
      that doesn't have the lowest 6 bits cleared. And the period
      should not be smaller than 128.
      
      Add a new callback to enforce this, and set it for Broadwell.
      
      This is erratum BDM57 and BDM11.
      
      How does this handle the case when an app requests a specific
      period with some of the bottom bits set?

      The app thinks it is sampling at X occurrences per sample, when it is
      in fact sampling at X - 63 (worst case).
      
      Short answer:
      
      Any useful instruction sampling period needs to be 4-6 orders
      of magnitude larger than 128, as a PMI every 128 instructions
      would instantly overwhelm the system and be throttled.
      So the +-64 error from this is really small compared to the
      period, much smaller than normal system jitter.
      
      Long answer:
      
      <write up by Peter:>
      
      IFF we guarantee perf_event_attr::sample_period >= 128.
      
      Suppose we start out with sample_period=192; then we'll set period_left
      to 192, we'll end up with left = 128 (we truncate the lower bits). We
      get an interrupt, find that period_left = 64 (>0 so we return 0 and
      don't get an overflow handler), up that to 128. Then we trigger again,
      at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get
      an overflow). We increment with sample_period so we get left = 128. We
      fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an
      overflow). And on and on.
      
      So while the individual interrupts are 'wrong', we get them with
      intervals of 256 and 128 in exactly the right ratio to average out
      at 192. And this works for everything >= 128.
      
      So the num_samples*fixed_period thing is still entirely correct +- 127,
      which is good enough I'd say, as you already have that error anyhow.
      
      So there is no need to 'fix' the tools; all we need to do is refuse
      to create INST_RETIRED.ALL events with sample_period < 128.
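
      The enforcement itself is small; a sketch close to the actual
      Broadwell limit_period callback, as I read it:

      	static unsigned int bdw_limit_period(struct perf_event *event,
      					     unsigned int left)
      	{
      		/* INST_RETIRED.ALL: event 0xc0, umask 0x01 */
      		if ((event->hw.config & INTEL_ARCH_EVENT_MASK) ==
      		    X86_CONFIG(.event = 0xc0, .umask = 0x01)) {
      			if (left < 128)
      				left = 128;
      			left &= ~0x3fu;	/* clear the low 6 bits */
      		}
      		return left;
      	}
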
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com>
      Cc: Mark Davies <junk@eslaf.co.uk>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1409683455-29168-4-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add Broadwell core support · 86a349a2
      Committed by Andi Kleen
      Add Broadwell client core support to perf. This is very
      similar to Haswell. It uses a new cache event table, because there
      were various changes there.

      The constraint list has one new event that needs to be handled
      relative to Haswell.
      
      The PEBS event list is the same, so we reuse Haswell's.
      
      [fengguang.wu: make intel_bdw_event_constraints[] static]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Document all Haswell models · d86c8eaf
      Committed by Andi Kleen
      Add names for each Haswell model as requested by Peter.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Link: http://lkml.kernel.org/r/1409683455-29168-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Remove incorrect model number from Haswell perf · b7614685
      Committed by Andi Kleen
      71 is a Broadwell, not a Haswell. The model number was added
      by mistake earlier.
      
      Remove it for now, until it can be re-added later with
      real Broadwell support.
      
      In practice it does not cause a lot of issues because the Broadwell
      PMU is very similar to Haswell's, but some details were wrong,
      and it's better to handle it correctly.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Link: http://lkml.kernel.org/r/1409683455-29168-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86, sched: Add new topology for multi-NUMA-node CPUs · cebf15eb
      Committed by Dave Hansen
      I'm getting the spew below when booting with Haswell (Xeon
      E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled
      in the BIOS.  It seems similar to the issue that some folks from
      AMD ran in to on their systems and addressed in this commit:
      
        161270fc ("x86/smp: Fix topology checks on AMD MCM CPUs")
      
      Both these Intel and AMD systems break an assumption which is
      being enforced by topology_sane(): a socket may not contain more
      than one NUMA node.
      
      AMD special-cased their systems by looking for a CPUID flag. The
      Intel mode depends on BIOS options, and I do not know of a way
      in which it is enumerated other than by the tables parsed during
      the CPU bringup process. In other words, we have to trust
      the ACPI tables <shudder>.
      
      This detects the situation where a NUMA node occurs at a place in
      the middle of the "CPU" sched domains.  It replaces the default
      topology with one that relies on the NUMA information from the
      firmware (SRAT table) for all levels of sched domains above the
      hyperthreads.
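
      A hedged sketch of the detection, with the surrounding sibling-map
      loop elided (the identifiers follow my reading of the patch):

      	static bool x86_has_numa_in_package;

      	/* Inside set_cpu_sibling_map(), while comparing cpuinfo 'c'
      	 * (this CPU) against each other online CPU's cpuinfo 'o':
      	 * same die but different NUMA node means Cluster-on-Die. */
      	if (match_die(c, o) && !topology_same_node(c, o))
      		x86_has_numa_in_package = true;

      	/* Once all CPUs are up, pick the NUMA-aware topology, whose
      	 * non-SMT levels come from the firmware (SRAT) information: */
      	if (x86_has_numa_in_package)
      		set_sched_topology(x86_numa_in_package_topology);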
      
      This also fixes a sysfs bug.  We used to freak out when we saw
      the "mc" group cross a node boundary, so we stopped building the
      MC group.  MC gets exported as the 'core_siblings_list' in
      /sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with
      the same 'physical_package_id' to not be listed together in
      'core_siblings_list'.  This violates a statement from
      Documentation/ABI/testing/sysfs-devices-system-cpu:
      
      	core_siblings: internal kernel map of cpu#'s hardware threads
      	within the same physical_package_id.
      
      	core_siblings_list: human-readable list of the logical CPU
      	numbers within the same physical_package_id as cpu#.
      
      The sysfs effects here cause an issue with the hwloc tool where
      it gets confused and thinks there are more sockets than are
      physically present.
      
      Before this patch, there are two packages:
      
      # cd /sys/devices/system/cpu/
      # cat cpu*/topology/physical_package_id | sort | uniq -c
           18 0
           18 1
      
      But 4 _sets_ of core siblings:
      
      # cat cpu*/topology/core_siblings_list | sort | uniq -c
            9 0-8
            9 18-26
            9 27-35
            9 9-17
      
      After this patch, there are only 2 sets of core siblings, which
      is what we expect for a 2-socket system.
      
      # cat cpu*/topology/physical_package_id | sort | uniq -c
           18 0
           18 1
      # cat cpu*/topology/core_siblings_list | sort | uniq -c
           18 0-17
           18 18-35
      
      Example spew:
      ...
      	NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
      	 #2  #3  #4  #5  #6  #7  #8
      	.... node  #1, CPUs:    #9
      	------------[ cut here ]------------
      	WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90()
      	sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
      	Modules linked in:
      	CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631
      	Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014
      	0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48
      	ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009
      	ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98
      	Call Trace:
      	[<ffffffff8172e485>] dump_stack+0x45/0x56
      	[<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0
      	[<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50
      	[<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90
      	[<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0
      	[<ffffffff8107568d>] start_secondary+0x1ad/0x240
      	---[ end trace 3fe5f587a9fcde61 ]---
      	#10 #11 #12 #13 #14 #15 #16 #17
      	.... node  #2, CPUs:   #18 #19 #20 #21 #22 #23 #24 #25 #26
      	.... node  #3, CPUs:   #27 #28 #29 #30 #31 #32 #33 #34 #35
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      [ Added LLC domain and s/match_mc/match_die/ ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: brice.goglin@gmail.com
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only · c1118b36
      Committed by Paolo Bonzini
      On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA.
      In that case, KVM will fail to patch VMCALL instructions to VMMCALL
      as required on AMD processors.
      
      The failure mode is currently a divide-by-zero exception, which obviously
      is a KVM bug that has to be fixed.  However, picking the right instruction
      between VMCALL and VMMCALL will be faster and will help if you cannot upgrade
      the hypervisor.
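
      The mechanism, close to the actual change as I read it: emit VMCALL
      by default and let the alternatives framework rewrite it to VMMCALL
      at early boot when the CPU reports the (synthetic) AMD feature bit,
      instead of patching kernel text at runtime:

      	/* VMCALL = 0f 01 c1, VMMCALL = 0f 01 d9 */
      	#define KVM_HYPERCALL \
      		ALTERNATIVE(".byte 0x0f,0x01,0xc1", \
      			    ".byte 0x0f,0x01,0xd9", \
      			    X86_FEATURE_VMMCALL)
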
      Reported-by: Chris Webb <chris@arachsys.com>
      Tested-by: Chris Webb <chris@arachsys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • audit: x86: drop arch from __audit_syscall_entry() interface · b4f0d375
      Committed by Richard Guy Briggs
      Since the arch is found locally in __audit_syscall_entry(), there is no need to
      pass it in as a parameter.  Delete it from the parameter list.
      
      x86* was the only arch to call __audit_syscall_entry() directly and did so from
      assembly code.
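
      The resulting interface change, sketched:

      	/* Old prototype, called directly from x86 assembly:
      	 *   void __audit_syscall_entry(int arch, int major,
      	 *                              unsigned long a0, unsigned long a1,
      	 *                              unsigned long a2, unsigned long a3);
      	 * New prototype, with the arch recovered internally: */
      	void __audit_syscall_entry(int major, unsigned long a0,
      				   unsigned long a1, unsigned long a2,
      				   unsigned long a3);
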
      Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-audit@redhat.com
      Signed-off-by: Eric Paris <eparis@redhat.com>
      
      ---
      
      As this patch relies on changes in the audit tree, I think it
      appropriate to send it through my tree rather than the x86 tree.
    • ARCH: AUDIT: audit_syscall_entry() should not require the arch · 91397401
      Committed by Eric Paris
      We have a function by which the arch can be queried: syscall_get_arch().
      So rather than have every single piece of arch-specific code use and/or
      duplicate syscall_get_arch(), just have the audit code call
      syscall_get_arch() itself.
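
      Inside the audit core this reduces to one line; a minimal sketch,
      assuming the usual audit_context layout:

      	void __audit_syscall_entry(int major, unsigned long a0,
      				   unsigned long a1, unsigned long a2,
      				   unsigned long a3)
      	{
      		struct audit_context *context = current->audit_context;

      		if (!context)
      			return;

      		context->arch = syscall_get_arch(); /* was: caller-supplied */
      		context->major = major;
      		context->argv[0] = a0;
      		context->argv[1] = a1;
      		context->argv[2] = a2;
      		context->argv[3] = a3;
      		/* ... */
      	}
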
      Based-on-patch-by: Richard Briggs <rgb@redhat.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-ia64@vger.kernel.org
      Cc: microblaze-uclinux@itee.uq.edu.au
      Cc: linux-mips@linux-mips.org
      Cc: linux@lists.openrisc.net
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: linux-xtensa@linux-xtensa.org
      Cc: x86@kernel.org
  7. 19 Sep 2014 · 4 commits
  8. 16 Sep 2014 · 3 commits
  9. 14 Sep 2014 · 1 commit
  10. 12 Sep 2014 · 2 commits