1. 04 2月, 2014 1 次提交
    • B
      x86/PCI: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus · afcf21c2
      Bjorn Helgaas 提交于
      The AMD early_fill_mp_bus_info() already allocates a struct pci_root_info
      for each PCI host bridge it finds, and that structure contains the NUMA
      node number.  We don't need to keep the same information in the
      mp_bus_to_node[] table.
      
      This adds x86_pci_root_bus_node(), which returns the NUMA node number, or
      NUMA_NO_NODE if the node is unknown.
      
      Note that unlike get_mp_bus_to_node(), x86_pci_root_bus_node() only works
      for root buses.  For example, if amd_bus.c finds a host bridge on node 1 to
      [bus 00-0f], get_mp_bus_to_node() returns 1 for any bus between 00 and 0f,
      but x86_pci_root_bus_node() returns 1 for bus 00 and NUMA_NO_NODE for buses
      01-0f.
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      afcf21c2
  2. 30 7月, 2013 1 次提交
  3. 09 5月, 2012 1 次提交
    • P
      sched/numa: Rewrite the CONFIG_NUMA sched domain support · cb83b629
      Peter Zijlstra 提交于
      The current code groups up to 16 nodes in a level and then puts an
      ALLNODES domain spanning the entire tree on top of that. This doesn't
      reflect the numa topology and esp for the smaller not-fully-connected
      machines out there today this might make a difference.
      
      Therefore, build a proper numa topology based on node_distance().
      
      Since there's no fixed numa layers anymore, the static SD_NODE_INIT
      and SD_ALLNODES_INIT aren't usable anymore, the new code tries to
      construct something similar and scales some values either on the
      number of cpus in the domain and/or the node_distance() ratio.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-sh@vger.kernel.org
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: sparclinux@vger.kernel.org
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86@kernel.org
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Greg Pearson <greg.pearson@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: bob.picco@oracle.com
      Cc: chris.mason@oracle.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-r74n3n8hhuc2ynbrnp3vt954@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb83b629
  4. 07 1月, 2012 1 次提交
    • B
      x86/PCI: convert to pci_create_root_bus() and pci_scan_root_bus() · 2cd6975a
      Bjorn Helgaas 提交于
      x86 has two kinds of PCI root bus scanning:
      
      (1) ACPI-based, using _CRS resources.  This used pci_create_bus(), not
          pci_scan_bus(), because ACPI hotplug needed to split the
          pci_bus_add_devices() into a separate host bridge .start() method.
      
          This patch parses the _CRS resources earlier, so we can build a list of
          resources and pass it to pci_create_root_bus().
      
          Note that as before, we parse the _CRS even if we aren't going to use
          it so we can print it for debugging purposes.
      
      (2) All other, which used either default resources (ioport_resource and
          iomem_resource) or information read from the hardware via amd_bus.c or
          similar.  This used pci_scan_bus().
      
          This patch converts x86_pci_root_bus_res_quirks() (previously called
          from pcibios_fixup_bus()) to x86_pci_root_bus_resources(), which builds
          a list of resources before we call pci_scan_root_bus().
      
          We also use x86_pci_root_bus_resources() if we have ACPI but are
          ignoring _CRS.
      
      CC: Yinghai Lu <yinghai.lu@oracle.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      2cd6975a
  5. 07 12月, 2011 1 次提交
  6. 02 5月, 2011 1 次提交
    • T
      x86, NUMA: Make 32bit use common NUMA init path · bd6709a9
      Tejun Heo 提交于
      With both _numa_init() methods converted and the rest of init code
      adjusted, numa_32.c now can switch from the 32bit only init code to
      the common one in numa.c.
      
      * Shim get_memcfg_*()'s are dropped and initmem_init() calls
        x86_numa_init(), which is updated to handle NUMAQ.
      
      * All boilerplate operations including node range limiting, pgdat
        alloc/init are handled by numa_init().  32bit only implementation is
        removed.
      
      * 32bit numa_add_memblk(), numa_set_distance() and
        memory_add_physaddr_to_nid() removed and common versions in
        numa_32.c enabled for 32bit.
      
      This change causes the following behavior changes.
      
      * NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized
        for 32bit too.
      
      * Much more sanity checks and configuration cleanups.
      
      * Proper handling of node distances.
      
      * The same NUMA init messages as 64bit.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      bd6709a9
  7. 07 4月, 2011 1 次提交
  8. 17 2月, 2011 1 次提交
    • T
      x86-64, NUMA: Implement generic node distance handling · ac7136b6
      Tejun Heo 提交于
      Node distance either used direct node comparison, ACPI PXM comparison
      or ACPI SLIT table lookup.  This patch implements generic node
      distance handling.  NUMA init methods can call numa_set_distance() to
      set distance between nodes and the common __node_distance()
      implementation will report the set distance.
      
      Due to the way NUMA emulation is implemented, the generic node
      distance handling is used only when emulation is not used.  Later
      patches will update NUMA emulation to use the generic distance
      mechanism.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      ac7136b6
  9. 28 1月, 2011 1 次提交
    • T
      x86: Unify CPU -> NUMA node mapping between 32 and 64bit · 645a7919
      Tejun Heo 提交于
      Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
      CPU -> NUMA node mapping.  Replace it with early_percpu variable
      x86_cpu_to_node_map and share the mapping code with 64bit.
      
      * USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.
      
      * x86_cpu_to_node_map and numa_set/clear_node() are moved from
        numa_64 to numa.  For now, on 32bit, x86_cpu_to_node_map is initialized
        with 0 instead of NUMA_NO_NODE.  This is to avoid introducing unexpected
        behavior change and will be updated once init path is unified.
      
      * srat_detect_node() is now enabled for x86_32 too.  It calls
        numa_set_node() and initializes the mapping making explicit
        cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: eric.dumazet@gmail.com
      Cc: yinghai@kernel.org
      Cc: brgerst@gmail.com
      Cc: gorcunov@gmail.com
      Cc: penberg@kernel.org
      Cc: shaohui.zheng@intel.com
      Cc: rientjes@google.com
      LKML-Reference: <1295789862-25482-15-git-send-email-tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      645a7919
  10. 28 5月, 2010 2 次提交
    • L
      numa: x86_64: use generic percpu var numa_node_id() implementation · e534c7c5
      Lee Schermerhorn 提交于
      x86 arch specific changes to use generic numa_node_id() based on generic
      percpu variable infrastructure.  Back out x86's custom version of
      numa_node_id()
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e534c7c5
    • L
      numa: add generic percpu var numa_node_id() implementation · 72812019
      Lee Schermerhorn 提交于
      Rework the generic version of the numa_node_id() function to use the new
      generic percpu variable infrastructure.
      
      Guard the new implementation with a new config option:
      
              CONFIG_USE_PERCPU_NUMA_NODE_ID.
      
      Archs which support this new implemention will default this option to 'y'
      when NUMA is configured.  This config option could be removed if/when all
      archs switch over to the generic percpu implementation of numa_node_id().
      Arch support involves:
      
        1) converting any existing per cpu variable implementations to use
           this implementation.  x86_64 is an instance of such an arch.
        2) archs that don't use a per cpu variable for numa_node_id() will
           need to initialize the new per cpu variable "numa_node" as cpus
           are brought on-line.  ia64 is an example.
        3) Defining USE_PERCPU_NUMA_NODE_ID in arch dependent Kconfig--e.g.,
           when NUMA is configured.  This is required because I have
           retained the old implementation by default to allow archs to
           be modified incrementally, as desired.
      
      Subsequent patches will convert x86_64 and ia64 to use this implemenation.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72812019
  11. 16 12月, 2009 1 次提交
  12. 03 11月, 2009 1 次提交
  13. 14 10月, 2009 1 次提交
  14. 24 9月, 2009 1 次提交
  15. 16 9月, 2009 1 次提交
    • P
      sched: Disable wakeup balancing · 182a85f8
      Peter Zijlstra 提交于
      Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't
      really mind too much, SD_BALANCE_NEWIDLE picks up most of the
      slack.
      
      On a dual socket, quad core, dual thread nehalem system:
      
      sysbench (--num_threads=16):
      
       SD_BALANCE_WAKE-: 13982 tx/s
       SD_BALANCE_WAKE+: 15688 tx/s
      
      kbuild (-j16):
      
       SD_BALANCE_WAKE-: 47.648295846  seconds time elapsed   ( +-   0.312% )
       SD_BALANCE_WAKE+: 47.608607360  seconds time elapsed   ( +-   0.026% )
      
      (same within noise)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      182a85f8
  16. 15 9月, 2009 4 次提交
    • P
      sched: Reduce forkexec_idx · b8a543ea
      Peter Zijlstra 提交于
      If we're looking to place a new task, we might as well find the
      idlest position _now_, not 1 tick ago.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b8a543ea
    • M
      sched: Improve latencies and throughput · 0ec9fab3
      Mike Galbraith 提交于
      Make the idle balancer more agressive, to improve a
      x264 encoding workload provided by Jason Garrett-Glaser:
      
       NEXT_BUDDY NO_LB_BIAS
       encoded 600 frames, 252.82 fps, 22096.60 kb/s
       encoded 600 frames, 250.69 fps, 22096.60 kb/s
       encoded 600 frames, 245.76 fps, 22096.60 kb/s
      
       NO_NEXT_BUDDY LB_BIAS
       encoded 600 frames, 344.44 fps, 22096.60 kb/s
       encoded 600 frames, 346.66 fps, 22096.60 kb/s
       encoded 600 frames, 352.59 fps, 22096.60 kb/s
      
       NO_NEXT_BUDDY NO_LB_BIAS
       encoded 600 frames, 425.75 fps, 22096.60 kb/s
       encoded 600 frames, 425.45 fps, 22096.60 kb/s
       encoded 600 frames, 422.49 fps, 22096.60 kb/s
      
      Peter pointed out that this is better done via newidle_idx,
      not via LB_BIAS, newidle balancing should look for where
      there is load _now_, not where there was load 2 ticks ago.
      
      Worst-case latencies are improved as well as no buddies
      means less vruntime spread. (as per prior lkml discussions)
      
      This change improves kbuild-peak parallelism as well.
      Reported-by: NJason Garrett-Glaser <darkshikari@gmail.com>
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0ec9fab3
    • P
      sched: Tweak wake_idx · 78e7ed53
      Peter Zijlstra 提交于
      When merging select_task_rq_fair() and sched_balance_self() we lost
      the use of wake_idx, restore that and set them to 0 to make wake
      balancing more aggressive.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      78e7ed53
    • P
      sched: Merge select_task_rq_fair() and sched_balance_self() · c88d5910
      Peter Zijlstra 提交于
      The problem with wake_idle() is that is doesn't respect things like
      cpu_power, which means it doesn't deal well with SMT nor the recent
      RT interaction.
      
      To cure this, it needs to do what sched_balance_self() does, which
      leads to the possibility of merging select_task_rq_fair() and
      sched_balance_self().
      
      Modify sched_balance_self() to:
      
        - update_shares() when walking up the domain tree,
          (it only called it for the top domain, but it should
           have done this anyway), which allows us to remove
          this ugly bit from try_to_wake_up().
      
        - do wake_affine() on the smallest domain that contains
          both this (the waking) and the prev (the wakee) cpu for
          WAKE invocations.
      
      Then use the top-down balance steps it had to replace wake_idle().
      
      This leads to the dissapearance of SD_WAKE_BALANCE and
      SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
      
      SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
      
      Touch all topology bits to replace the old with new SD flags --
      platforms might need re-tuning, enabling SD_BALANCE_WAKE
      conditionally on a NUMA distance seems like a good additional
      feature, magny-core and small nehalem systems would want this
      enabled, systems with slow interconnects would not.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c88d5910
  17. 08 9月, 2009 1 次提交
  18. 04 9月, 2009 2 次提交
    • I
      sched: Turn on SD_BALANCE_NEWIDLE · 840a0653
      Ingo Molnar 提交于
      Start the re-tuning of the balancer by turning on newidle.
      
      It improves hackbench performance and parallelism on a 4x4 box.
      The "perf stat --repeat 10" measurements give us:
      
        domain0             domain1
        .......................................
       -SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
         2041.273208  task-clock-msecs         #      9.354 CPUs    ( +-   0.363% )
      
       +SD_BALANCE_NEWIDLE -SD_BALANCE_NEWIDLE:
         2086.326925  task-clock-msecs         #     11.934 CPUs    ( +-   0.301% )
      
       +SD_BALANCE_NEWIDLE +SD_BALANCE_NEWIDLE:
         2115.289791  task-clock-msecs         #     12.158 CPUs    ( +-   0.263% )
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      840a0653
    • I
      sched: Clean up topology.h · 47734f89
      Ingo Molnar 提交于
      Re-organize the flag settings so that it's visible at a glance
      which sched-domains flags are set and which not.
      
      With the new balancer code we'll need to re-tune these details
      anyway, so make it cleaner to make fewer mistakes down the
      road ;-)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      47734f89
  19. 12 5月, 2009 1 次提交
    • V
      sched: Don't export sched_mc_power_savings on multi-socket single core system · 2ff799d3
      Vaidyanathan Srinivasan 提交于
      Fix to prevent sched_mc_power_saving from being exported through sysfs
      for multi-scoket single core system. Max cores should be always greater than
      one (1). My earlier patch that introduced fix for not exporting
      'sched_mc_power_saving' on laptops  broke it on multi-socket single core
      system. This fix addresses issue on both laptop and multi-socket single
      core system.
      Below are the Test results:
      
      1. Single socket - multi-core
             Before Patch: Does not export 'sched_mc_power_saving'
             After Patch: Does not export 'sched_mc_power_saving'
             Result: Pass
      
      2. Multi Socket - single core
            Before Patch: exports 'sched_mc_power_saving'
            After Patch: Does not export 'sched_mc_power_saving'
            Result: Pass
      
      3. Multi Socket - Multi core
            Before Patch: exports 'sched_mc_power_saving'
            After Patch: exports 'sched_mc_power_saving'
      
      [ Impact: make the sched_mc_power_saving control available more consistently ]
      Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: Suresh B Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090511143914.GB4853@dirshya.in.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2ff799d3
  20. 23 4月, 2009 1 次提交
  21. 30 3月, 2009 1 次提交
  22. 13 3月, 2009 9 次提交
  23. 27 1月, 2009 1 次提交
  24. 18 1月, 2009 1 次提交
  25. 16 1月, 2009 2 次提交
  26. 26 12月, 2008 1 次提交