1. 25 1月, 2014 2 次提交
    • M
      mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge · b9a3b4c9
      Mel Gorman 提交于
      There was a large ebizzy performance regression that was
      bisected to commit 611ae8e3 (x86/tlb: enable tlb flush range
      support for x86).  The problem was related to the
      tlb_flushall_shift tuning for IvyBridge which was altered.  The
      problem is that it is not clear if the tuning values for each
      CPU family is correct as the methodology used to tune the values
      is unclear.
      
      This patch uses a conservative tlb_flushall_shift value for all
      CPU families except IvyBridge so the decision can be revisited
      if any regression is found as a result of this change.
      IvyBridge is an exception as testing with one methodology
      determined that the value of 2 is acceptable.  Details are in
      the changelog for the patch "x86: mm: Change tlb_flushall_shift
      for IvyBridge".
      
      One important aspect of this to watch out for is Xen.  The
      original commit log mentioned large performance gains on Xen.
      It's possible Xen is more sensitive to this value if it flushes
      small ranges of pages more frequently than workloads on bare
      metal typically do.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alex Shi <alex.shi@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-dyzMww3fqugnhbhgo6Gxmtkw@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b9a3b4c9
    • M
      x86: mm: change tlb_flushall_shift for IvyBridge · f98b7a77
      Mel Gorman 提交于
      There was a large performance regression that was bisected to
      commit 611ae8e3 ("x86/tlb: enable tlb flush range support for
      x86").  This patch simply changes the default balance point
      between a local and global flush for IvyBridge.
      
      In the interest of allowing the tests to be reproduced, this
      patch was tested using mmtests 0.15 with the following
      configurations
      
      	configs/config-global-dhp__tlbflush-performance
      	configs/config-global-dhp__scheduler-performance
      	configs/config-global-dhp__network-performance
      
      Results are from two machines
      
      Ivybridge   4 threads:  Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
      Ivybridge   8 threads:  Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
      
      Page fault microbenchmark showed nothing interesting.
      
      Ebizzy was configured to run multiple iterations and threads.
      Thread counts ranged from 1 to NR_CPUS*2. For each thread count,
      it ran 100 iterations and each iteration lasted 10 seconds.
      
      Ivybridge 4 threads
                          3.13.0-rc7            3.13.0-rc7
                             vanilla           altshift-v3
      Mean   1     6395.44 (  0.00%)     6789.09 (  6.16%)
      Mean   2     7012.85 (  0.00%)     8052.16 ( 14.82%)
      Mean   3     6403.04 (  0.00%)     6973.74 (  8.91%)
      Mean   4     6135.32 (  0.00%)     6582.33 (  7.29%)
      Mean   5     6095.69 (  0.00%)     6526.68 (  7.07%)
      Mean   6     6114.33 (  0.00%)     6416.64 (  4.94%)
      Mean   7     6085.10 (  0.00%)     6448.51 (  5.97%)
      Mean   8     6120.62 (  0.00%)     6462.97 (  5.59%)
      
      Ivybridge 8 threads
                           3.13.0-rc7            3.13.0-rc7
                              vanilla           altshift-v3
      Mean   1      7336.65 (  0.00%)     7787.02 (  6.14%)
      Mean   2      8218.41 (  0.00%)     9484.13 ( 15.40%)
      Mean   3      7973.62 (  0.00%)     8922.01 ( 11.89%)
      Mean   4      7798.33 (  0.00%)     8567.03 (  9.86%)
      Mean   5      7158.72 (  0.00%)     8214.23 ( 14.74%)
      Mean   6      6852.27 (  0.00%)     7952.45 ( 16.06%)
      Mean   7      6774.65 (  0.00%)     7536.35 ( 11.24%)
      Mean   8      6510.50 (  0.00%)     6894.05 (  5.89%)
      Mean   12     6182.90 (  0.00%)     6661.29 (  7.74%)
      Mean   16     6100.09 (  0.00%)     6608.69 (  8.34%)
      
      Ebizzy hits the worst case scenario for TLB range flushing every
      time and it shows for these Ivybridge CPUs at least that the
      default choice is a poor on.  The patch addresses the problem.
      
      Next was a tlbflush microbenchmark written by Alex Shi at
      http://marc.info/?l=linux-kernel&m=133727348217113 .  It
      measures access costs while the TLB is being flushed.  The
      expectation is that if there are always full TLB flushes that
      the benchmark would suffer and it benefits from range flushing
      
      There are 320 iterations of the test per thread count.  The
      number of entries is randomly selected with a min of 1 and max
      of 512.  To ensure a reasonably even spread of entries, the full
      range is broken up into 8 sections and a random number selected
      within that section.
      
      iteration 1, random number between 0-64
      iteration 2, random number between 64-128 etc
      
      This is still a very weak methodology.  When you do not know
      what are typical ranges, random is a reasonable choice but it
      can be easily argued that the opimisation was for smaller ranges
      and an even spread is not representative of any workload that
      matters.  To improve this, we'd need to know the probability
      distribution of TLB flush range sizes for a set of workloads
      that are considered "common", build a synthetic trace and feed
      that into this benchmark.  Even that is not perfect because it
      would not account for the time between flushes but there are
      limits of what can be reasonably done and still be doing
      something useful.  If a representative synthetic trace is
      provided then this benchmark could be revisited and the shift values retuned.
      
      Ivybridge 4 threads
                              3.13.0-rc7            3.13.0-rc7
                                 vanilla           altshift-v3
      Mean       1       10.50 (  0.00%)       10.50 (  0.03%)
      Mean       2       17.59 (  0.00%)       17.18 (  2.34%)
      Mean       3       22.98 (  0.00%)       21.74 (  5.41%)
      Mean       5       47.13 (  0.00%)       46.23 (  1.92%)
      Mean       8       43.30 (  0.00%)       42.56 (  1.72%)
      
      Ivybridge 8 threads
                               3.13.0-rc7            3.13.0-rc7
                                  vanilla           altshift-v3
      Mean       1         9.45 (  0.00%)        9.36 (  0.93%)
      Mean       2         9.37 (  0.00%)        9.70 ( -3.54%)
      Mean       3         9.36 (  0.00%)        9.29 (  0.70%)
      Mean       5        14.49 (  0.00%)       15.04 ( -3.75%)
      Mean       8        41.08 (  0.00%)       38.73 (  5.71%)
      Mean       13       32.04 (  0.00%)       31.24 (  2.49%)
      Mean       16       40.05 (  0.00%)       39.04 (  2.51%)
      
      For both CPUs, average access time is reduced which is good as
      this is the benchmark that was used to tune the shift values in
      the first place albeit it is now known *how* the benchmark was
      used.
      
      The scheduler benchmarks were somewhat inconclusive.  They
      showed gains and losses and makes me reconsider how stable those
      benchmarks really are or if something else might be interfering
      with the test results recently.
      
      Network benchmarks were inconclusive.  Almost all results were
      flat except for netperf-udp tests on the 4 thread machine.
      These results were unstable and showed large variations between
      reboots.  It is unknown if this is a recent problems but I've
      noticed before that netperf-udp results tend to vary.
      
      Based on these results, changing the default for Ivybridge seems
      like a logical choice.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NAlex Shi <alex.shi@linaro.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f98b7a77
  2. 13 1月, 2014 1 次提交
    • P
      sched/clock, x86: Use a static_key for sched_clock_stable · 35af99e6
      Peter Zijlstra 提交于
      In order to avoid the runtime condition and variable load turn
      sched_clock_stable into a static_key.
      
      Also provide a shorter implementation of local_clock() and
      cpu_clock(int) when sched_clock_stable==1.
      
                              MAINLINE   PRE       POST
      
          sched_clock_stable: 1          1         1
          (cold) sched_clock: 329841     221876    215295
          (cold) local_clock: 301773     234692    220773
          (warm) sched_clock: 38375      25602     25659
          (warm) local_clock: 100371     33265     27242
          (warm) rdtsc:       27340      24214     24208
          sched_clock_stable: 0          0         0
          (cold) sched_clock: 382634     235941    237019
          (cold) local_clock: 396890     297017    294819
          (warm) sched_clock: 38194      25233     25609
          (warm) local_clock: 143452     71234     71232
          (warm) rdtsc:       27345      24245     24243
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-eummbdechzz37mwmpags1gjr@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      35af99e6
  3. 07 1月, 2014 1 次提交
  4. 04 1月, 2014 1 次提交
  5. 20 12月, 2013 1 次提交
  6. 26 10月, 2013 1 次提交
    • J
      x86/cpu: Track legacy CPU model data only on 32-bit kernels · 09dc68d9
      Jan Beulich 提交于
      struct cpu_dev's c_models is only ever set inside CONFIG_X86_32
      conditionals (or code that's being built for 32-bit only), so
      there's no use of reserving the (empty) space for the model
      names in a 64-bit kernel.
      
      Similarly, c_size_cache is only used in the #else of a
      CONFIG_X86_64 conditional, so reserving space for (and in one
      case even initializing) that field is pointless for 64-bit
      kernels too.
      
      While moving both fields to the end of the structure, I also
      noticed that:
      
       - the c_models array size was one too small, potentially causing
         table_lookup_model() to return garbage on Intel CPUs (intel.c's
         instance was lacking the sentinel with family being zero), so the
         patch bumps that by one,
      
       - c_models' vendor sub-field was unused (and anyway redundant
         with the base structure's c_x86_vendor field), so the patch deletes it.
      
      Also rename the legacy fields so that their legacy nature stands out
      and comment their declarations.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Link: http://lkml.kernel.org/r/5265036802000078000FC4DB@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      09dc68d9
  7. 15 7月, 2013 1 次提交
    • P
      x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker 提交于
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      are flagged as __cpuinit  -- so if we remove the __cpuinit from
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      148f9bb8
  8. 12 4月, 2013 1 次提交
    • K
      x86: Use a read-only IDT alias on all CPUs · 4eefbe79
      Kees Cook 提交于
      Make a copy of the IDT (as seen via the "sidt" instruction) read-only.
      This primarily removes the IDT from being a target for arbitrary memory
      write attacks, and has the added benefit of also not leaking the kernel
      base offset, if it has been relocated.
      
      We already did this on vendor == Intel and family == 5 because of the
      F0 0F bug -- regardless of if a particular CPU had the F0 0F bug or
      not.  Since the workaround was so cheap, there simply was no reason to
      be very specific.  This patch extends the readonly alias to all CPUs,
      but does not activate the #PF to #UD conversion code needed to deliver
      the proper exception in the F0 0F case except on Intel family 5
      processors.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Link: http://lkml.kernel.org/r/20130410192422.GA17344@www.outflux.net
      Cc: Eric Northup <digitaleric@google.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      4eefbe79
  9. 03 4月, 2013 1 次提交
  10. 16 3月, 2013 1 次提交
  11. 30 11月, 2012 1 次提交
  12. 18 11月, 2012 1 次提交
  13. 17 11月, 2012 1 次提交
  14. 07 8月, 2012 1 次提交
  15. 28 6月, 2012 2 次提交
    • A
      x86/tlb: add tlb_flushall_shift for specific CPU · c4211f42
      Alex Shi 提交于
      Testing show different CPU type(micro architectures and NUMA mode) has
      different balance points between the TLB flush all and multiple invlpg.
      And there also has cases the tlb flush change has no any help.
      
      This patch give a interface to let x86 vendor developers have a chance
      to set different shift for different CPU type.
      
      like some machine in my hands, balance points is 16 entries on
      Romely-EP; while it is at 8 entries on Bloomfield NHM-EP; and is 256 on
      IVB mobile CPU. but on model 15 core2 Xeon using invlpg has nothing
      help.
      
      For untested machine, do a conservative optimization, same as NHM CPU.
      Signed-off-by: NAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-5-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      c4211f42
    • A
      x86/tlb_info: get last level TLB entry number of CPU · e0ba94f1
      Alex Shi 提交于
      For 4KB pages, x86 CPU has 2 or 1 level TLB, first level is data TLB and
      instruction TLB, second level is shared TLB for both data and instructions.
      
      For hupe page TLB, usually there is just one level and seperated by 2MB/4MB
      and 1GB.
      
      Although each levels TLB size is important for performance tuning, but for
      genernal and rude optimizing, last level TLB entry number is suitable. And
      in fact, last level TLB always has the biggest entry number.
      
      This patch will get the biggest TLB entry number and use it in furture TLB
      optimizing.
      
      Accroding Borislav's suggestion, except tlb_ll[i/d]_* array, other
      function and data will be released after system boot up.
      
      For all kinds of x86 vendor friendly, vendor specific code was moved to its
      specific files.
      Signed-off-by: NAlex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-2-git-send-email-alex.shi@intel.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      e0ba94f1
  16. 21 12月, 2011 1 次提交
  17. 14 10月, 2011 2 次提交
  18. 16 7月, 2011 1 次提交
  19. 15 7月, 2011 1 次提交
    • L
      x86, intel, power: Initialize MSR_IA32_ENERGY_PERF_BIAS · abe48b10
      Len Brown 提交于
      Since 2.6.36 (23016bf0), Linux prints the existence of "epb" in /proc/cpuinfo,
      Since 2.6.38 (d5532ee7), the x86_energy_perf_policy(8) utility has
      been available in-tree to update MSR_IA32_ENERGY_PERF_BIAS.
      
      However, the typical BIOS fails to initialize the MSR, presumably
      because this is handled by high-volume shrink-wrap operating systems...
      
      Linux distros, on the other hand, do not yet invoke x86_energy_perf_policy(8).
      As a result, WSM-EP, SNB, and later hardware from Intel will run in its
      default hardware power-on state (performance), which assumes that users
      care for performance at all costs and not for energy efficiency.
      While that is fine for performance benchmarks, the hardware's intended default
      operating point is "normal" mode...
      
      Initialize the MSR to the "normal" by default during kernel boot.
      
      x86_energy_perf_policy(8) is available to change the default after boot,
      should the user have a different preference.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1107140051020.18606@x980Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@kernel.org>
      abe48b10
  20. 18 5月, 2011 1 次提交
  21. 17 5月, 2011 1 次提交
  22. 28 1月, 2011 2 次提交
    • T
      x86: Unify CPU -> NUMA node mapping between 32 and 64bit · 645a7919
      Tejun Heo 提交于
      Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
      CPU -> NUMA node mapping.  Replace it with early_percpu variable
      x86_cpu_to_node_map and share the mapping code with 64bit.
      
      * USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.
      
      * x86_cpu_to_node_map and numa_set/clear_node() are moved from
        numa_64 to numa.  For now, on 32bit, x86_cpu_to_node_map is initialized
        with 0 instead of NUMA_NO_NODE.  This is to avoid introducing unexpected
        behavior change and will be updated once init path is unified.
      
      * srat_detect_node() is now enabled for x86_32 too.  It calls
        numa_set_node() and initializes the mapping making explicit
        cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: penberg@kernel.org
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      645a7919
    • T
      x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit · bbc9e2f4
      Tejun Heo 提交于
      The mapping between cpu/apicid and node is done via
      apicid_to_node[] on 64bit and apicid_2_node[] +
      apic->x86_32_numa_cpu_node() on 32bit. This difference makes it
      difficult to further unify 32 and 64bit NUMA handling.
      
      This patch unifies it by replacing both apicid_to_node[] and
      apicid_2_node[] with __apicid_to_node[] array, which is accessed
      by two accessors - set_apicid_to_node() and numa_cpu_node().  On
      64bit, numa_cpu_node() always consults __apicid_to_node[]
      directly while 32bit goes through apic->numa_cpu_node() method
      to allow apic implementations to override it.
      
      srat_detect_node() for amd cpus contains workaround for broken
      NUMA configuration which assumes relationship between APIC ID,
      HT node ID and NUMA topology.  Leave it to access
      __apicid_to_node[] directly as mapping through CPU might result
      in undesirable behavior change.  The comment is reformatted and
      updated to note the ugliness.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NPekka Enberg <penberg@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-14-git-send-email-tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      bbc9e2f4
  23. 12 10月, 2010 1 次提交
    • N
      x86, numa: Assign CPUs to nodes in round-robin manner on fake NUMA · 50f2d7f6
      Nikanth Karthikesan 提交于
      commit d9c2d5ac "x86, numa: Use near(er)
      online node instead of roundrobin for NUMA" changed NUMA initialization on
      Intel to choose the nearest online node or first node.  Fake NUMA would be
      better of with round-robin initialization, instead of the all CPUS on
      first node.  Change the choice of first node, back to round-robin.
      
      For testing NUMA kernel behaviour without cpusets and NUMA aware
      applications, it would be better to have cpus in different nodes, rather
      than all in a single node.  With cpusets migration of tasks scenarios
      cannot not be tested.
      
      I guess having it round-robin shouldn't affect the use cases for all cpus
      on the first node.
      
      The code comments in arch/x86/mm/numa_64.c:759 indicate that this used to
      be the case, which was changed by commit d9c2d5ac.  It changed from
      roundrobin to nearer or first node.  And I couldn't find any reason for
      this change in its changelog.
      Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      50f2d7f6
  24. 29 9月, 2010 1 次提交
  25. 21 9月, 2010 1 次提交
  26. 13 8月, 2010 1 次提交
  27. 24 4月, 2010 1 次提交
    • H
      x86: Disable large pages on CPUs with Atom erratum AAE44 · 7a0fc404
      H. Peter Anvin 提交于
      Atom erratum AAE44/AAF40/AAG38/AAH41:
      
      "If software clears the PS (page size) bit in a present PDE (page
      directory entry), that will cause linear addresses mapped through this
      PDE to use 4-KByte pages instead of using a large page after old TLB
      entries are invalidated. Due to this erratum, if a code fetch uses
      this PDE before the TLB entry for the large page is invalidated then
      it may fetch from a different physical address than specified by
      either the old large page translation or the new 4-KByte page
      translation. This erratum may also cause speculative code fetches from
      incorrect addresses."
      
      [http://download.intel.com/design/processor/specupdt/319536.pdf]
      
      Where as commit 211b3d03 seems to
      workaround errata AAH41 (mixed 4K TLBs) it reduces the window of
      opportunity for the bug to occur and does not totally remove it.  This
      patch disables mixed 4K/4MB page tables totally avoiding the page
      splitting and not tripping this processor issue.
      
      This is based on an original patch by Colin King.
      Originally-by: NColin Ian King <colin.king@canonical.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      LKML-Reference: <1269271251-19775-1-git-send-email-colin.king@canonical.com>
      Cc: <stable@kernel.org>
      7a0fc404
  28. 10 4月, 2010 1 次提交
  29. 26 3月, 2010 1 次提交
    • P
      x86, perf, bts, mm: Delete the never used BTS-ptrace code · faa4602e
      Peter Zijlstra 提交于
      Support for the PMU's BTS features has been upstreamed in
      v2.6.32, but we still have the old and disabled ptrace-BTS,
      as Linus noticed it not so long ago.
      
      It's buggy: TIF_DEBUGCTLMSR is trampling all over that MSR without
      regard for other uses (perf) and doesn't provide the flexibility
      needed for perf either.
      
      Its users are ptrace-block-step and ptrace-bts, since ptrace-bts
      was never used and ptrace-block-step can be implemented using a
      much simpler approach.
      
      So axe all 3000 lines of it. That includes the *locked_memory*()
      APIs in mm/mlock.c as well.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20100325135413.938004390@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      faa4602e
  30. 02 3月, 2010 1 次提交
  31. 18 12月, 2009 1 次提交
  32. 12 12月, 2009 1 次提交
    • M
      x86: Limit the number of processor bootup messages · 2eaad1fd
      Mike Travis 提交于
      When there are a large number of processors in a system, there
      is an excessive amount of messages sent to the system console.
      It's estimated that with 4096 processors in a system, and the
      console baudrate set to 56K, the startup messages will take
      about 84 minutes to clear the serial port.
      
      This set of patches limits the number of repetitious messages
      which contain no additional information.  Much of this information
      is obtainable from the /proc and /sysfs.   Some of the messages
      are also sent to the kernel log buffer as KERN_DEBUG messages so
      dmesg can be used to examine more closely any details specific to
      a problem.
      
      The new cpu bootup sequence for system_state == SYSTEM_BOOTING:
      
      Booting Node   0, Processors  #1 #2 #3 #4 #5 #6 #7 Ok.
      Booting Node   1, Processors  #8 #9 #10 #11 #12 #13 #14 #15 Ok.
      ...
      Booting Node   3, Processors  #56 #57 #58 #59 #60 #61 #62 #63 Ok.
      Brought up 64 CPUs
      
      After the system is running, a single line boot message is displayed
      when CPU's are hotplugged on:
      
          Booting Node %d Processor %d APIC 0x%x
      
      Status of the following lines:
      
          CPU: Physical Processor ID:		printed once (for boot cpu)
          CPU: Processor Core ID:		printed once (for boot cpu)
          CPU: Hyper-Threading is disabled	printed once (for boot cpu)
          CPU: Thermal monitoring enabled	printed once (for boot cpu)
          CPU %d/0x%x -> Node %d:		removed
          CPU %d is now offline:		only if system_state == RUNNING
          Initializing CPU#%d:		KERN_DEBUG
      Signed-off-by: NMike Travis <travis@sgi.com>
      LKML-Reference: <4B219E28.8080601@sgi.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      2eaad1fd
  33. 23 11月, 2009 1 次提交
    • Y
      x86, numa: Use near(er) online node instead of roundrobin for NUMA · d9c2d5ac
      Yinghai Lu 提交于
      CPU to node mapping is set via the following sequence:
      
       1. numa_init_array(): Set up roundrobin from cpu to online node
      
       2. init_cpu_to_node(): Set that according to apicid_to_node[]
      			according to srat only handle the node that
      			is online, and leave other cpu on node
      			without ram (aka not online) to still
      			roundrobin.
      
      3. later call srat_detect_node for Intel/AMD, will use first_online
         node or nearby node.
      
      Problem is that setup_per_cpu_areas() is not called between 2 and 3,
      the per_cpu for cpu on node with ram is on different node, and could
      put that on node with two hops away.
      
      So try to optimize this and add find_near_online_node() and call
      init_cpu_to_node().
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d9c2d5ac
  34. 15 9月, 2009 1 次提交
    • P
      x86: Move APERF/MPERF into a X86_FEATURE · a8303aaf
      Peter Zijlstra 提交于
      Move the APERFMPERF capacility into a X86_FEATURE flag so that it
      can be used outside of the acpi cpufreq driver.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: cpufreq@vger.kernel.org
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a8303aaf
  35. 11 7月, 2009 1 次提交
  36. 15 6月, 2009 1 次提交
    • V
      x86: add hooks for kmemcheck · f8561296
      Vegard Nossum 提交于
      The hooks that we modify are:
      - Page fault handler (to handle kmemcheck faults)
      - Debug exception handler (to hide pages after single-stepping
        the instruction that caused the page fault)
      
      Also redefine memset() to use the optimized version if kmemcheck is
      enabled.
      
      (Thanks to Pekka Enberg for minimizing the impact on the page fault
      handler.)
      
      As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable
      the optimized xor code, and rely instead on the generic C implementation
      in order to avoid false-positive warnings.
      Signed-off-by: NVegard Nossum <vegardno@ifi.uio.no>
      
      [whitespace fixlet]
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      
      [rebased for mainline inclusion]
      Signed-off-by: NVegard Nossum <vegardno@ifi.uio.no>
      f8561296