1. 20 Dec 2013, 1 commit
  2. 26 Oct 2013, 1 commit
    • x86/cpu: Track legacy CPU model data only on 32-bit kernels · 09dc68d9
      Committed by Jan Beulich
      struct cpu_dev's c_models is only ever set inside CONFIG_X86_32
      conditionals (or code that's being built for 32-bit only), so
      there's no use of reserving the (empty) space for the model
      names in a 64-bit kernel.
      
      Similarly, c_size_cache is only used in the #else of a
      CONFIG_X86_64 conditional, so reserving space for (and in one
      case even initializing) that field is pointless for 64-bit
      kernels too.
      
      While moving both fields to the end of the structure, I also
      noticed that:
      
       - the c_models array size was one too small, potentially causing
         table_lookup_model() to return garbage on Intel CPUs (intel.c's
         instance was lacking the sentinel with family being zero), so the
         patch bumps that by one,
      
       - c_models' vendor sub-field was unused (and anyway redundant
         with the base structure's c_x86_vendor field), so the patch deletes it.
      
      Also rename the legacy fields so that their legacy nature stands out
      and comment their declarations.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Link: http://lkml.kernel.org/r/5265036802000078000FC4DB@nat28.tlf.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      09dc68d9
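      
      For reference, a hedged sketch of the resulting layout (the field set is
      trimmed and names follow the changelog, so the exact in-tree definition
      may differ slightly):
      
          struct cpuinfo_x86;  /* defined elsewhere in <asm/processor.h> */
      
          struct legacy_cpu_model_info {
              int family;
              const char *model_names[16];
          };
      
          struct cpu_dev {
              const char *c_vendor;
              const char *c_ident[2];
              void (*c_early_init)(struct cpuinfo_x86 *);
              void (*c_init)(struct cpuinfo_x86 *);
              int c_x86_vendor;
          #ifdef CONFIG_X86_32
              /* Optional vendor-specific routine to obtain the cache size: */
              unsigned int (*legacy_cache_size)(struct cpuinfo_x86 *, unsigned int);
              /* Family/model lookup table for the legacy model names: */
              struct legacy_cpu_model_info legacy_models[5];
          #endif
          };
      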
  3. 15 Jul 2013, 1 commit
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Committed by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch-independent (kernel/cpu.c)
      and are flagged as __cpuinit -- so if we remove __cpuinit from the
      arch-specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
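      
      The change itself is purely mechanical; as an illustration (the function
      below is just an example of the pattern, not a specific hunk from the
      patch):
      
          struct cpuinfo_x86;
      
          /* Before: tagged for the throwaway .cpuinit section:
           *   static void __cpuinit srat_detect_node(struct cpuinfo_x86 *c) { ... }
           * After: a plain function, with the annotation dropped: */
          static void srat_detect_node(struct cpuinfo_x86 *c)
          {
              /* body unchanged */
          }
      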
  4. 12 Apr 2013, 1 commit
    • x86: Use a read-only IDT alias on all CPUs · 4eefbe79
      Committed by Kees Cook
      Make a copy of the IDT (as seen via the "sidt" instruction) read-only.
      This primarily removes the IDT from being a target for arbitrary memory
      write attacks, and has the added benefit of also not leaking the kernel
      base offset, if it has been relocated.
      
      We already did this on vendor == Intel and family == 5 because of the
      F0 0F bug -- regardless of whether a particular CPU actually had the
      F0 0F bug or not.  Since the workaround was so cheap, there simply was
      no reason to be very specific.  This patch extends the read-only alias
      to all CPUs, but does not activate the #PF-to-#UD conversion code
      needed to deliver the proper exception in the F0 0F case except on
      Intel family 5 processors.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: http://lkml.kernel.org/r/20130410192422.GA17344@www.outflux.net
      Cc: Eric Northup <digitaleric@google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      4eefbe79
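      
      Roughly, the mechanism is the sketch below (the helper name is invented
      for illustration, and the fixmap slot is assumed to be called FIX_RO_IDT):
      
          /* Map the writable idt_table a second time, read-only, at a fixed
           * virtual address, and make that alias the address the CPU loads
           * (and therefore the one "sidt" reports). */
          static void __init setup_ro_idt_alias(void)  /* hypothetical helper */
          {
              __set_fixmap(FIX_RO_IDT, __pa_symbol(idt_table), PAGE_KERNEL_RO);
              idt_descr.address = fix_to_virt(FIX_RO_IDT);
              load_idt(&idt_descr);
          }
      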
  5. 03 Apr 2013, 1 commit
  6. 16 Mar 2013, 1 commit
  7. 30 Nov 2012, 1 commit
  8. 18 Nov 2012, 1 commit
  9. 17 Nov 2012, 1 commit
  10. 07 Aug 2012, 1 commit
  11. 28 Jun 2012, 2 commits
    • x86/tlb: add tlb_flushall_shift for specific CPU · c4211f42
      Committed by Alex Shi
      Testing shows that different CPU types (microarchitectures and NUMA
      modes) have different balance points between flushing the whole TLB
      and issuing multiple invlpg instructions.  There are also cases where
      the TLB flush change does not help at all.
      
      This patch adds an interface that lets x86 vendor developers set a
      different shift for each CPU type.
      
      For example, on machines at hand the balance point is 16 entries on
      Romley-EP, 8 entries on Bloomfield NHM-EP, and 256 on an IVB mobile
      CPU, while on a model 15 Core2 Xeon using invlpg does not help at all.
      
      For untested machines, apply a conservative setting, the same as for
      NHM CPUs.
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      c4211f42
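      
      The interface amounts to roughly the following (a sketch; the shift value
      below is a placeholder, not one of the tuned numbers from the patch):
      
          struct cpuinfo_x86;
          extern int tlb_flushall_shift;  /* -1: never use invlpg, always flush all */
      
          /* Called from the vendor's CPU setup code.  The flush path later
           * switches to a full TLB flush once the number of pages to
           * invalidate exceeds (TLB entries >> tlb_flushall_shift). */
          static void intel_tlb_flushall_shift_set(struct cpuinfo_x86 *c)
          {
              /* The real code switches on family/model using measured balance
               * points; untested models fall back to the conservative NHM value. */
              tlb_flushall_shift = 2;  /* placeholder */
          }
      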
    • x86/tlb_info: get last level TLB entry number of CPU · e0ba94f1
      Committed by Alex Shi
      For 4KB pages, x86 CPUs have one or two TLB levels: the first level
      consists of separate data and instruction TLBs, while the second level
      is a shared TLB for both data and instructions.
      
      For huge pages there is usually just one TLB level, split by page size
      (2MB/4MB and 1GB).
      
      Although the size of each TLB level matters for fine-grained
      performance tuning, the last-level TLB entry count is good enough for
      a general, coarse optimization; in fact, the last level always has the
      largest number of entries.
      
      This patch records the largest TLB entry count and uses it in future
      TLB optimizations.
      
      Following Borislav's suggestion, everything except the tlb_ll[i/d]_*
      arrays is released after system boot.
      
      To stay friendly to all x86 vendors, the vendor-specific code was
      moved into the respective vendor files.
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      e0ba94f1
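      
      The resulting bookkeeping looks roughly like this sketch (array names
      follow the changelog's tlb_ll[i/d]_* convention; details are simplified):
      
          struct cpuinfo_x86;
          enum tlb_infos { ENTRIES, NR_INFO };
      
          /* Last-level TLB entry counts, filled in during boot by a
           * vendor-specific c_detect_tlb() callback (Intel decodes CPUID
           * leaf 2 descriptors, for example): */
          unsigned short tlb_lli_4k[NR_INFO];   /* iTLB, 4KB pages */
          unsigned short tlb_lld_4k[NR_INFO];   /* dTLB, 4KB pages */
          unsigned short tlb_lli_2m[NR_INFO];   /* iTLB, 2MB/4MB pages */
          unsigned short tlb_lld_2m[NR_INFO];   /* dTLB, 2MB/4MB pages */
      
          void cpu_detect_tlb(struct cpuinfo_x86 *c)
          {
              if (this_cpu->c_detect_tlb)       /* this_cpu: the active cpu_dev */
                  this_cpu->c_detect_tlb(c);
      
              pr_info("Last level iTLB: 4KB %d, 2MB %d; dTLB: 4KB %d, 2MB %d\n",
                      tlb_lli_4k[ENTRIES], tlb_lli_2m[ENTRIES],
                      tlb_lld_4k[ENTRIES], tlb_lld_2m[ENTRIES]);
          }
      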
  12. 21 Dec 2011, 1 commit
  13. 14 Oct 2011, 2 commits
  14. 16 Jul 2011, 1 commit
  15. 15 Jul 2011, 1 commit
    • x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS · abe48b10
      Committed by Len Brown
      Since 2.6.36 (23016bf0), Linux prints the existence of "epb" in /proc/cpuinfo.
      Since 2.6.38 (d5532ee7), the x86_energy_perf_policy(8) utility has
      been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS.
      
      However, the typical BIOS fails to initialize the MSR, presumably
      because this is handled by high-volume shrink-wrap operating systems...
      
      Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
      As a result, WSM-EP, SNB, and later hardware from Intel will run in its
      default hardware power-on state (performance), which assumes that users
      care for performance at all costs and not for energy efficiency.
      While that is fine for performance benchmarks, the hardware's intended default
      operating point is "normal" mode...
      
      Initialize the MSR to "normal" by default during kernel boot.
      
      x86_energy_perf_policy(8) is available to change the default after boot,
      should the user have a different preference.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107140051020.18606@x980
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@kernel.org>
      abe48b10
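      
      The boot-time fixup is small; roughly (a sketch of the Intel init path,
      using the upstream constant names):
      
          if (cpu_has(c, X86_FEATURE_EPB)) {
              u64 epb;
      
              rdmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
              if ((epb & 0xf) == ENERGY_PERF_BIAS_PERFORMANCE) {
                  pr_warn_once("ENERGY_PERF_BIAS: Set to 'normal', was 'performance'; "
                               "view and update with x86_energy_perf_policy(8)\n");
                  epb = (epb & ~0xf) | ENERGY_PERF_BIAS_NORMAL;
                  wrmsrl(MSR_IA32_ENERGY_PERF_BIAS, epb);
              }
          }
      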
  16. 18 May 2011, 1 commit
  17. 17 May 2011, 1 commit
  18. 28 Jan 2011, 2 commits
    • x86: Unify CPU -> NUMA node mapping between 32 and 64bit · 645a7919
      Committed by Tejun Heo
      Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
      CPU -> NUMA node mapping.  Replace it with early_percpu variable
      x86_cpu_to_node_map and share the mapping code with 64bit.
      
      * USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.
      
      * x86_cpu_to_node_map and numa_set/clear_node() are moved from
        numa_64 to numa.  For now, on 32bit, x86_cpu_to_node_map is initialized
        with 0 instead of NUMA_NO_NODE.  This is to avoid introducing unexpected
        behavior change and will be updated once init path is unified.
      
      * srat_detect_node() is now enabled for x86_32 too.  It calls
        numa_set_node() and initializes the mapping making explicit
        cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: penberg@kernel.org
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      645a7919
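      
      The shared mechanism ends up roughly as below (simplified; per the
      changelog, 32-bit temporarily initializes the map with 0 rather than
      NUMA_NO_NODE):
      
          DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
          EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
      
          void numa_set_node(int cpu, int node)
          {
              int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
      
              if (cpu_to_node_map) {            /* per-CPU areas not set up yet */
                  cpu_to_node_map[cpu] = node;
                  return;
              }
      
              per_cpu(x86_cpu_to_node_map, cpu) = node;
              set_cpu_numa_node(cpu, node);     /* keep the generic view in sync */
          }
      
          void numa_clear_node(int cpu)
          {
              numa_set_node(cpu, NUMA_NO_NODE);
          }
      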
    • x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit · bbc9e2f4
      Committed by Tejun Heo
      The mapping between cpu/apicid and node is done via
      apicid_to_node[] on 64bit and apicid_2_node[] +
      apic->x86_32_numa_cpu_node() on 32bit. This difference makes it
      difficult to further unify 32 and 64bit NUMA handling.
      
      This patch unifies it by replacing both apicid_to_node[] and
      apicid_2_node[] with __apicid_to_node[] array, which is accessed
      by two accessors - set_apicid_to_node() and numa_cpu_node().  On
      64bit, numa_cpu_node() always consults __apicid_to_node[]
      directly while 32bit goes through apic->numa_cpu_node() method
      to allow apic implementations to override it.
      
      srat_detect_node() for AMD CPUs contains a workaround for broken
      NUMA configurations which assumes a relationship between APIC ID,
      HT node ID and NUMA topology.  Leave it accessing __apicid_to_node[]
      directly, as mapping through the CPU might result in an undesirable
      behavior change.  The comment is reformatted and updated to note the
      ugliness.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-14-git-send-email-tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      bbc9e2f4
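      
      The unified accessors look roughly like the 64-bit variant sketched
      below; on 32-bit, numa_cpu_node() instead goes through the
      apic->x86_32_numa_cpu_node() method as described above:
      
          s16 __apicid_to_node[MAX_LOCAL_APIC] = {
              [0 ... MAX_LOCAL_APIC - 1] = NUMA_NO_NODE
          };
      
          static inline void set_apicid_to_node(int apicid, s16 node)
          {
              __apicid_to_node[apicid] = node;
          }
      
          int numa_cpu_node(int cpu)
          {
              int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);
      
              return apicid != BAD_APICID ? __apicid_to_node[apicid]
                                          : NUMA_NO_NODE;
          }
      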
  19. 12 Oct 2010, 1 commit
    • x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA · 50f2d7f6
      Committed by Nikanth Karthikesan
      commit d9c2d5ac "x86, numa: Use near(er)
      online node instead of roundrobin for NUMA" changed NUMA initialization on
      Intel to choose the nearest online node or the first node.  Fake NUMA
      would be better off with round-robin initialization instead of putting
      all CPUs on the first node.  Change the choice of the first node back
      to round-robin.
      
      For testing NUMA kernel behaviour without cpusets and NUMA-aware
      applications, it is better to have CPUs in different nodes rather than
      all in a single node; otherwise, task migration scenarios with cpusets
      cannot be tested.
      
      I guess having it round-robin shouldn't affect the use cases that put
      all CPUs on the first node.
      
      The code comments at arch/x86/mm/numa_64.c:759 indicate that this used
      to be the case before commit d9c2d5ac changed it from round-robin to
      the nearer or first node, and I couldn't find any reason for that
      change in its changelog.
      Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      50f2d7f6
  20. 29 Sep 2010, 1 commit
  21. 21 Sep 2010, 1 commit
  22. 13 Aug 2010, 1 commit
  23. 24 Apr 2010, 1 commit
    • x86: Disable large pages on CPUs with Atom erratum AAE44 · 7a0fc404
      Committed by H. Peter Anvin
      Atom erratum AAE44/AAF40/AAG38/AAH41:
      
      "If software clears the PS (page size) bit in a present PDE (page
      directory entry), that will cause linear addresses mapped through this
      PDE to use 4-KByte pages instead of using a large page after old TLB
      entries are invalidated. Due to this erratum, if a code fetch uses
      this PDE before the TLB entry for the large page is invalidated then
      it may fetch from a different physical address than specified by
      either the old large page translation or the new 4-KByte page
      translation. This erratum may also cause speculative code fetches from
      incorrect addresses."
      
      [http://download.intel.com/design/processor/specupdt/319536.pdf]
      
      Whereas commit 211b3d03 works around
      erratum AAH41 (mixed 4K TLBs), it only reduces the window of
      opportunity for the bug to occur and does not remove it entirely.
      This patch disables mixed 4K/4MB page tables completely, avoiding the
      page splitting and thus not tripping this processor issue.
      
      This is based on an original patch by Colin King.
      Originally-by: Colin Ian King <colin.king@canonical.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
      Cc: <stable@kernel.org>
      7a0fc404
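      
      In effect the Intel setup path gains a check along these lines (the
      model/stepping test and the helper name are illustrative assumptions,
      not copied from the patch):
      
          struct cpuinfo_x86;
      
          static void intel_atom_pse_erratum(struct cpuinfo_x86 *c)  /* hypothetical */
          {
              /* Family 6, model 0x1c is Atom; the stepping cutoff is a guess here. */
              if (c->x86 == 6 && c->x86_model == 0x1c && c->x86_mask <= 2) {
                  pr_warn("Atom PSE erratum detected, disabling large pages\n");
                  clear_cpu_cap(c, X86_FEATURE_PSE);
              }
          }
      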
  24. 10 Apr 2010, 1 commit
  25. 26 Mar 2010, 1 commit
    • x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Committed by Peter Zijlstra
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS,
      as Linus noticed it not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
      was never used and ptrace-block-step can be implemented using a
      much simpler approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      faa4602e
  26. 02 Mar 2010, 1 commit
  27. 18 Dec 2009, 1 commit
  28. 12 Dec 2009, 1 commit
    • x86: Limit the number of processor bootup messages · 2eaad1fd
      Committed by Mike Travis
      When there are a large number of processors in a system, there
      is an excessive amount of messages sent to the system console.
      It's estimated that with 4096 processors in a system, and the
      console baudrate set to 56K, the startup messages will take
      about 84 minutes to clear the serial port.
      
      This set of patches limits the number of repetitious messages
      which contain no additional information.  Much of this information
      is obtainable from /proc and /sysfs.  Some of the messages are also
      sent to the kernel log buffer as KERN_DEBUG messages, so dmesg can be
      used to examine any details specific to a problem more closely.
      
      The new cpu bootup sequence for system_state == SYSTEM_BOOTING:
      
      Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
      Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
      ...
      Booting Node   3, Processors  #56 #57 #58 #59 #60 #61 #62 #63 Ok.
      Brought up 64 CPUs
      
      After the system is running, a single line boot message is displayed
      when CPU's are hotplugged on:
      
          Booting Node %d Processor %d APIC 0x%x
      
      Status of the following lines:
      
          CPU: Physical Processor ID:		printed once (for boot cpu)
          CPU: Processor Core ID:		printed once (for boot cpu)
          CPU: Hyper-Threading is disabled	printed once (for boot cpu)
          CPU: Thermal monitoring enabled	printed once (for boot cpu)
          CPU %d/0x%x -> Node %d:		removed
          CPU %d is now offline:		only if system_state == RUNNING
          Initializing CPU#%d:		KERN_DEBUG
      Signed-off-by: Mike Travis <travis@sgi.com>
      LKML-Reference: <4B219E28.8080601@sgi.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      2eaad1fd
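      
      The condensed announcement boils down to something like the sketch below
      (reconstructed from the sample output above, not a verbatim copy of the
      patch):
      
          static void announce_cpu(int cpu, int apicid)
          {
              static int current_node = -1;
              int node = early_cpu_to_node(cpu);
      
              if (system_state == SYSTEM_BOOTING) {
                  if (node != current_node) {      /* one header line per node */
                      if (current_node >= 0)
                          pr_cont(" Ok.\n");
                      current_node = node;
                      pr_info("Booting Node %3d, Processors ", node);
                  }
                  pr_cont(" #%d%s", cpu, cpu == nr_cpu_ids - 1 ? " Ok.\n" : "");
              } else {
                  /* Hotplug after boot keeps one full line per CPU. */
                  pr_info("Booting Node %d Processor %d APIC 0x%x\n",
                          node, cpu, apicid);
              }
          }
      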
  29. 23 Nov 2009, 1 commit
    • x86, numa: Use near(er) online node instead of roundrobin for NUMA · d9c2d5ac
      Committed by Yinghai Lu
      CPU to node mapping is set via the following sequence:
      
       1. numa_init_array(): set up a round-robin mapping from CPU to
          online node.
      
       2. init_cpu_to_node(): set the mapping according to apicid_to_node[]
          from the SRAT; this only handles nodes that are online, so CPUs on
          nodes without RAM (i.e. not online) keep their round-robin
          assignment.
      
       3. Later, srat_detect_node() is called for Intel/AMD and uses the
          first online node or a nearby node.
      
      The problem is that setup_per_cpu_areas() is not called between steps
      2 and 3, so the per_cpu area for a CPU on a node with RAM can end up
      on a different node, possibly two hops away.
      
      So try to optimize this: add find_near_online_node() and call it from
      init_cpu_to_node().
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d9c2d5ac
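      
      A plausible sketch of the helper: pick the online node with the smallest
      node_distance() from the CPU's (offline) home node.
      
          static int find_near_online_node(int node)
          {
              int n, val, min_val = INT_MAX, best_node = -1;
      
              for_each_online_node(n) {
                  val = node_distance(node, n);
                  if (val < min_val) {
                      min_val = val;
                      best_node = n;
                  }
              }
      
              return best_node;
          }
      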
  30. 15 Sep 2009, 1 commit
    • x86: Move APERF/MPERF into a X86_FEATURE · a8303aaf
      Committed by Peter Zijlstra
      Move the APERF/MPERF capability into an X86_FEATURE flag so that it
      can be used outside of the acpi-cpufreq driver.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: cpufreq@vger.kernel.org
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a8303aaf
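      
      Detection is a one-line check against CPUID leaf 6 (a sketch; the helper
      name and call site are assumptions):
      
          struct cpuinfo_x86;
      
          static void init_aperfmperf(struct cpuinfo_x86 *c)
          {
              /* CPUID.06H:ECX[0] advertises the IA32_APERF/IA32_MPERF MSRs. */
              if (c->cpuid_level >= 0x00000006 && (cpuid_ecx(0x00000006) & 0x01))
                  set_cpu_cap(c, X86_FEATURE_APERFMPERF);
          }
      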
  31. 11 Jul 2009, 1 commit
  32. 15 Jun 2009, 1 commit
    • x86: add hooks for kmemcheck · f8561296
      Committed by Vegard Nossum
      The hooks that we modify are:
      - Page fault handler (to handle kmemcheck faults)
      - Debug exception handler (to hide pages after single-stepping
        the instruction that caused the page fault)
      
      Also redefine memset() to use the optimized version if kmemcheck is
      enabled.
      
      (Thanks to Pekka Enberg for minimizing the impact on the page fault
      handler.)
      
      As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable
      the optimized xor code, and rely instead on the generic C implementation
      in order to avoid false-positive warnings.
      Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
      
      [whitespace fixlet]
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      [rebased for mainline inclusion]
      Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
      f8561296
  33. 18 May 2009, 1 commit
  34. 13 Mar 2009, 1 commit
  35. 12 Mar 2009, 1 commit
  36. 08 Mar 2009, 1 commit
  37. 27 Feb 2009, 1 commit
    • x86: set X86_FEATURE_TSC_RELIABLE · 83ce4009
      Committed by Ingo Molnar
      If the TSC is constant and non-stop, also set it reliable.
      
      (We will turn this off in DMI quirks for multi-chassis systems)
      
      The performance number on a 16-way Nehalem system running
      32 tasks that context-switch between each other is significant:
      
         sched_clock_stable=0		sched_clock_stable=1
         ....................         ....................
         22.456925 million/sec        24.306972 million/sec   [+8.2%]
      
      lmbench's "lat_ctx -s 0 2" goes from 0.63 microseconds to
      0.59 microseconds - a 6.7% increase in context-switching
      performance.
      
      Perfstat of 1 million pipe context switches between two tasks:
      
       Performance counter stats for './pipe-test-1m':
      
             [before]           [after]
         ............      ............
         37621.421089      36436.848378    task clock ticks     (msecs)
      
                    0                 0    CPU migrations       (events)
              2000274           2000189    context switches     (events)
                  194               193    pagefaults           (events)
           8433799643        8171016416    CPU cycles           (events) -3.21%
           8370133368        8180999694    instructions         (events) -2.31%
              4158565           3895941    cache references     (events) -6.74%
                44312             46264    cache misses         (events)
      
          2349.287976       2279.362465    wall-time            (msecs)  -3.06%
      
      The speedup comes straight from the reduction in the instruction
      count. sched_clock_cpu() got simpler and the whole workload thus
      executes faster.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      83ce4009
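      
      The flag setting itself is tiny; roughly (the surrounding constant/non-stop
      TSC detection is simplified here):
      
          if (c->x86_power & (1 << 8)) {   /* CPUID 0x80000007 EDX[8]: invariant TSC */
              set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
              set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
              set_cpu_cap(c, X86_FEATURE_TSC_RELIABLE);
              sched_clock_stable = 1;      /* may be turned off later by DMI quirks */
          }
      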