1. 27 Jan, 2009 1 commit
  2. 18 Jan, 2009 1 commit
  3. 16 Jan, 2009 2 commits
  4. 26 Dec, 2008 2 commits
  5. 17 Dec, 2008 1 commit
  6. 01 Dec, 2008 1 commit
  7. 06 Nov, 2008 1 commit
    • sched: re-tune balancing · 9fcd18c9
      Authored by Ingo Molnar
      Impact: improve wakeup affinity on NUMA systems, tweak SMP systems
      
      Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
      balancing defaults on NUMA and SMP systems.
      
      Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
      why we would not want to have wakeup affinity across nodes as well.
      (we already do this in the standard NUMA template.)
      
      lat_ctx on a NUMA box is particularly happy about this change:
      
      before:
      
       |   phoenix:~/l> ./lat_ctx -s 0 2
       |   "size=0k ovr=2.60
       |   2 5.70
      
      after:
      
       |   phoenix:~/l> ./lat_ctx -s 0 2
       |   "size=0k ovr=2.65
       |   2 2.07
      
      a 2.75x speedup.
      
      pipe-test is similarly happy about it too:
      
       |  phoenix:~/sched-tests> ./pipe-test
       |   18.26 usecs/loop.
       |   14.70 usecs/loop.
       |   14.38 usecs/loop.
       |   10.55 usecs/loop.              # +WAKE_AFFINE on domain0+domain1
       |   8.63 usecs/loop.
       |   8.59 usecs/loop.
       |   9.03 usecs/loop.
       |   8.94 usecs/loop.
       |   8.96 usecs/loop.
       |   8.63 usecs/loop.
      
      Also:
      
       - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings)
       - enable SD_WAKE_BALANCE on SMP domains
      
      Sysbench+postgresql improves all around the board, quite significantly:
      
                 .28-rc3-11474e2c  .28-rc3-11474e2c-tune
      -------------------------------------------------
          1:             571              688    +17.08%
          2:            1236             1206    -2.55%
          4:            2381             2642    +9.89%
          8:            4958             5164    +3.99%
         16:            9580             9574    -0.07%
         32:            7128             8118    +12.20%
         64:            7342             8266    +11.18%
        128:            7342             8064    +8.95%
        256:            7519             7884    +4.62%
        512:            7350             7731    +4.93%
      -------------------------------------------------
        SUM:           55412            59341    +6.62%
      
      So it's a win for the runup portion, the peak area and the tail.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
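The flag changes listed in this commit can be pictured as plain bitmask updates per domain level. The following is a userspace sketch only: the bit values, the `domain_kind` enum and `retune_flags()` are hypothetical names chosen for illustration and are not taken from the kernel source.

```c
/* Illustrative bit values; the commit message does not give the real ones. */
enum {
	SD_BALANCE_NEWIDLE = 1 << 0,
	SD_WAKE_AFFINE     = 1 << 1,
	SD_WAKE_BALANCE    = 1 << 2,
};

/* Hypothetical names for the three domain levels the message mentions. */
enum domain_kind { DOMAIN_SIBLING, DOMAIN_SMP, DOMAIN_NUMA };

/* Apply the re-tuning described in the commit message to a flag word. */
static unsigned int retune_flags(enum domain_kind kind, unsigned int flags)
{
	switch (kind) {
	case DOMAIN_NUMA:
		/* Wakeup affinity across nodes, no newidle balancing. */
		flags |= SD_WAKE_AFFINE;
		flags &= ~(unsigned int)SD_BALANCE_NEWIDLE;
		break;
	case DOMAIN_SMP:
		/* Wake balancing on, no newidle balancing. */
		flags |= SD_WAKE_BALANCE;
		flags &= ~(unsigned int)SD_BALANCE_NEWIDLE;
		break;
	case DOMAIN_SIBLING:
		/* Sibling domains keep SD_BALANCE_NEWIDLE unchanged. */
		break;
	}
	return flags;
}
```

Calling `retune_flags(DOMAIN_SIBLING, SD_BALANCE_NEWIDLE)` returns the input unchanged, while the SMP and NUMA cases clear the newidle bit and set their respective wakeup bit.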
  8. 23 Oct, 2008 2 commits
  9. 23 Jul, 2008 1 commit
    • x86: consolidate header guards · 77ef50a5
      Authored by Vegard Nossum
      This patch is the result of an automatic script that consolidates the
      format of all the headers in include/asm-x86/.
      
      The format:
      
      1. No leading underscore. Names with leading underscores are reserved.
      2. Pathname components are separated by two underscores. So we can
         distinguish between mm_types.h and mm/types.h.
      3. Every character other than a letter or number is turned into a
         single underscore.
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
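The three naming rules above can be sketched as a small string transformation. This is not the script used for the patch; `make_header_guard()` is a hypothetical helper, and upper-casing the result is an assumption based on common header-guard style rather than something the message states.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: derive a guard name from a header path given
 * relative to include/.  Returns 0 on success, -1 if out is too small.
 */
static int make_header_guard(const char *path, char *out, size_t outlen)
{
	size_t n = 0;

	assert(path != NULL && out != NULL);

	/* Conservative bound: each input character yields at most two. */
	if (outlen < 2 * strlen(path) + 1)
		return -1;

	/* Rule 1: no leading underscore, so skip leading non-alphanumerics. */
	while (*path && !isalnum((unsigned char)*path))
		path++;

	for (; *path; path++) {
		unsigned char c = (unsigned char)*path;

		if (c == '/') {
			/* Rule 2: a path separator becomes two underscores. */
			out[n++] = '_';
			out[n++] = '_';
		} else if (isalnum(c)) {
			/* Assumption: guard names are upper-cased. */
			out[n++] = (char)toupper(c);
		} else {
			/* Rule 3: anything else becomes one underscore. */
			out[n++] = '_';
		}
	}
	out[n] = '\0';
	return 0;
}
```

With these rules, `asm-x86/mm_types.h` and `asm-x86/mm/types.h` produce different guard names (`ASM_X86__MM_TYPES_H` and `ASM_X86__MM__TYPES_H`), which is the distinction rule 2 is meant to preserve.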
  10. 14 Jul, 2008 1 commit
  11. 08 Jul, 2008 4 commits
    • x86: add check for node passed to node_to_cpumask, v3 · 6a2f47ca
      Authored by Mike Travis
        * When CONFIG_DEBUG_PER_CPU_MAPS is set, the node passed to
          node_to_cpumask and node_to_cpumask_ptr should be validated.
          If invalid, then a dump_stack is performed and a zero cpumask
          is returned.
      
      v2: Slightly different version to remove a compiler warning.
      v3: Redone to reflect moving setup.c -> setup_percpu.c
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: "akpm@linux-foundation.org" <akpm@linux-foundation.org>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
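The validation described in this commit amounts to a range check before the table lookup, with an empty mask returned for a bad node. Below is a minimal userspace sketch of that idea; `mask_t`, `SKETCH_MAX_NODES`, the table contents and `checked_node_to_mask()` are all hypothetical stand-ins, and the `fprintf()` call stands in for the kernel's `dump_stack()`.

```c
#include <stdio.h>

#define SKETCH_MAX_NODES 4	/* hypothetical stand-in for nr_node_ids */

/* Simplified stand-in for cpumask_t: one word of CPU bits. */
typedef struct { unsigned long bits; } mask_t;

/* Made-up node-to-CPU table: two CPUs per node. */
static const mask_t sketch_node_map[SKETCH_MAX_NODES] = {
	{ 0x03UL }, { 0x0cUL }, { 0x30UL }, { 0xc0UL },
};

/* Return the mask for a node, or an all-zero mask if the node is invalid. */
static mask_t checked_node_to_mask(int node)
{
	const mask_t empty = { 0UL };

	if (node < 0 || node >= SKETCH_MAX_NODES) {
		/* The kernel debug path would call dump_stack() here. */
		fprintf(stderr, "invalid node %d\n", node);
		return empty;
	}
	return sketch_node_map[node];
}
```

Returning a zero mask keeps callers that iterate over the mask safe, since they simply find no CPUs to visit.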
    • x86: remove the static 256k node_to_cpumask_map · 9f248bde
      Authored by Mike Travis
        * Consolidate node_to_cpumask operations and remove the 256k
          byte node_to_cpumask_map.  This is done by allocating the
          node_to_cpumask_map array after the number of possible nodes
          (nr_node_ids) is known.
      
        * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
          been increased.  Faults are now shown when calling node_to_cpumask()
          and node_to_cpumask_ptr().
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: restore pda nodenumber field · 7891a24e
      Authored by Mike Travis
        * Restore the nodenumber field in the x86_64 pda.  This field differs
          slightly from x86_cpu_to_node_map, mainly because it is a static
          indication of which node the cpu is on, while the cpu-to-node map is a
          dynamic mapping that may get reset if the cpu goes offline.  This also
          simplifies the numa_node_id() macro.
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: cleanup early per cpu variables/accesses v4 · 23ca4bba
      Authored by Mike Travis
        * Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
          used by some per_cpu variables that are initialized and accessed
          before there are per_cpu areas allocated.
      
          ["Early" with respect to per_cpu variables means "before the per_cpu
          areas have been set up".]
      
          This patchset adds these new macros:
      
      	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
      	EXPORT_EARLY_PER_CPU_SYMBOL(_name)
      	DECLARE_EARLY_PER_CPU(_type, _name)
      
      	early_per_cpu_ptr(_name)
      	early_per_cpu_map(_name, _idx)
      	early_per_cpu(_name, _cpu)
      
          The DEFINE macro defines the per_cpu variable as well as the early
          map and pointer.  It also initializes the per_cpu variable and map
          elements to "_initvalue".  The early_* macros provide access to
          the initial map (usually set up during system init) and the early
          pointer.  This pointer is initialized to point to the early map
          but is then set to NULL when the actual per_cpu areas are set up.
          After that, per_cpu() is the correct way to access the variable.
      
          The early_per_cpu() macro is not very efficient but does show how to
          access the variable if you have a function that can be called both
          "early" and "late".  It tests whether the early pointer is NULL; if it
          is not, the early map is still valid and is used.  Otherwise, the
          per_cpu variable is used instead:
      
      	#define early_per_cpu(_name, _cpu) 			\
      		(early_per_cpu_ptr(_name) ?			\
      			early_per_cpu_ptr(_name)[_cpu] :	\
      			per_cpu(_name, _cpu))
      
          A better method is to actually check the pointer manually.  In the
          case below, numa_set_node can be called both "early" and "late":
      
      	void __cpuinit numa_set_node(int cpu, int node)
      	{
      	    int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
      
      	    if (cpu_to_node_map)
      		    cpu_to_node_map[cpu] = node;
      	    else
      		    per_cpu(x86_cpu_to_node_map, cpu) = node;
      	}
      
        * Add a flag "arch_provides_topology_pointers" that indicates pointers
          to topology cpumask_t maps are available.  Otherwise, use the function
          returning the cpumask_t value.  This is useful when the cpumask_t size
          is very large, because it avoids copying data onto and off of the stack.
      
        * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
          the non-debug case has been optimized a bit.
      
        * Remove an unreferenced compiler warning in drivers/base/topology.c
      
        * Clean up #ifdef in setup.c
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
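The early/late access pattern described in the EARLY_PER_CPU commit can be tried out in userspace. The sketch below is a simplified model, not the kernel implementation: the `sketch_*` names, the fixed CPU count and the plain array standing in for the per-cpu areas are hypothetical, and copying the early values across during setup is an assumption the commit message does not spell out.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* "Early" map: statically allocated, usable before per-cpu areas exist. */
static int early_node_map[SKETCH_NR_CPUS] = { -1, -1, -1, -1 };

/* Early pointer: non-NULL only while the early map is the valid copy. */
static int *early_node_ptr = early_node_map;

/* Stand-in for the real per-cpu areas (one slot per CPU). */
static int percpu_node[SKETCH_NR_CPUS];

/* Assumed setup step: copy the early values over, then retire the map. */
static void sketch_setup_per_cpu_areas(void)
{
	memcpy(percpu_node, early_node_map, sizeof(percpu_node));
	early_node_ptr = NULL;
}

/* Callable both "early" and "late": read the pointer once, then branch. */
static void sketch_set_node(int cpu, int node)
{
	int *map = early_node_ptr;

	if (map)
		map[cpu] = node;
	else
		percpu_node[cpu] = node;
}

/* Same test as the early_per_cpu() macro, written as a function. */
static int sketch_get_node(int cpu)
{
	return early_node_ptr ? early_node_ptr[cpu] : percpu_node[cpu];
}
```

A value written before `sketch_setup_per_cpu_areas()` stays readable afterwards through the same accessor, and later writes no longer touch the early map.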
  12. 11 May, 2008 1 commit
    • x86: sysfs cpu?/topology is empty in 2.6.25 (32-bit Intel system) · 5c3a121d
      Authored by Vaidyanathan Srinivasan
      System topology on Intel-based systems needs to be exported
      for the non-NUMA case as well.
      
      All parts of asm-i386/topology.h have come under
      #ifdef CONFIG_NUMA after the merge to asm-x86/topology.h
      
      /sys/devices/system/cpu/cpu?/topology/* is populated based on
      ENABLE_TOPO_DEFINES
      
      The sysfs cpu topology is not being populated on my dual socket
      dual core xeon 5160 processor based (x86 32 bit) system.
      
      CONFIG_NUMA is not set in my case yet the topology is relevant
      and useful.
      
      irqbalance daemon application depends on topology to build the
      cpus and package list and it fails on Fedora9 beta since the
      sysfs topology was not being populated in the 2.6.25 kernel.
      
      I am not sure if it was intentional to not define ENABLE_TOPO_DEFINES
      for non-numa systems.
      
      This fix has been tested on the above mentioned dual core, dual socket
      system.
      Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
  13. 30 Apr, 2008 1 commit
    • [IA64] Provide ACPI fixup for /proc/cpuinfo/physical_id · fe086a7b
      Authored by Alex Chiang
      Legacy HP ia64 platforms currently cannot provide
      /proc/cpuinfo/physical_id due to legacy SAL/PAL implementations.
      However, that physical topology information can be obtained
      via ACPI.
      
      Provide an interface that gives ACPI one last chance to provide
      physical_id for these legacy platforms. This logic only comes
      into play iff:
      
      - ACPI actually provides slot information for the CPU
      - we lack a valid socket_id
      
      Otherwise, we don't do anything.
      
      Since x86 uses the ACPI processor driver as well, we provide a nop
      stub function for arch_fix_phys_package_id() in asm-x86/topology.h
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  14. 27 Apr, 2008 2 commits
    • x86: multi pci root bus with different io resource range, on 64-bit · 30a18d6c
      Authored by Yinghai Lu
      Scan the AMD Opteron io/mmio routing to make sure every pci root bus gets
      the correct resource range, so that a later pci scan can assign correct
      resources to devices with unassigned resources.
      
      This can fix systems that have no _CRS for multiple pci root buses.
      Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: get mp_bus_to_node early · 871d5f8d
      Authored by Yinghai Lu
      Currently, on an amd k8 system with multi ht chains, the numa_node of
      pci devices under /sys/devices/pci0000:80/* is always 0, even if that
      chain is on node 1 or 2 or 3.
      
      Workaround: pcibus_to_node(bus) is used when we want to get the node that
      pci_device is on.
      
      In struct device, we already have numa_node member, and we could use
      dev_to_node()/set_dev_node() to get and set numa_node in the device.
      set_dev_node is called in pci_device_add() with pcibus_to_node(bus),
      and pcibus_to_node uses bus->sysdata for nodeid.
      
      The problem is that when pci_device_add is called, bus->sysdata has not yet
      been assigned the correct nodeid. The result is that numa_node will always be 0.
      
      pcibios_scan_root and pci_scan_root could take sysdata. So we need to get
      mp_bus_to_node mapping before these two are called, and thus
      get_mp_bus_to_node could get correct node for sysdata in root bus.
      
      In scanning of the root bus, all child busses will take parent bus sysdata.
      So all pci_device->dev.numa_node will be assigned correctly and automatically.
      
      Later we could use dev_to_node(&pci_dev->dev) to get numa_node, and we
      could also make other bus-specific devices get the correct numa_node
      too.
      
      This is an updated version of pci_sysdata and Jeff's pci_domain patch.
      
      [ mingo@elte.hu: build fix ]
      Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
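The ordering argument in this commit (the root bus must know its node before scanning, so that child buses and devices inherit it) can be illustrated with a toy model. The structures and functions below are hypothetical simplifications; they only mirror the inheritance idea and are not the kernel's `struct pci_bus`, `struct device` or scanning code.

```c
#include <stddef.h>

/* Toy bus: 'node' plays the role of the node id kept in bus->sysdata. */
struct sketch_bus {
	int node;
	const struct sketch_bus *parent;
};

/* Toy device: 'numa_node' plays the role of dev.numa_node. */
struct sketch_dev {
	int numa_node;
};

/*
 * Initialise a bus during "scanning".  A root bus takes the node that was
 * looked up beforehand; a child bus takes over its parent's node.
 */
static void sketch_bus_init(struct sketch_bus *bus,
			    const struct sketch_bus *parent, int root_node)
{
	bus->parent = parent;
	bus->node = parent ? parent->node : root_node;
}

/* Adding a device copies the node of the bus it sits on. */
static void sketch_device_add(struct sketch_dev *dev,
			      const struct sketch_bus *bus)
{
	dev->numa_node = bus->node;
}
```

If the root bus is initialised with node 0 because the mapping is not yet known, every device below it also reports node 0, which is the symptom the commit message describes.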
  15. 20 Apr, 2008 3 commits
    • sched, cpuset: customize sched domains, core · 1d3504fc
      Authored by Hidetoshi Seto
      [rebased for sched-devel/latest]
      
       - Add a new cpuset file, having levels:
           sched_relax_domain_level
      
       - Modify partition_sched_domains() and build_sched_domains()
         to take attributes parameter passed from cpuset.
      
       - Fill newidle_idx for node domains, which is currently unused but
         might be required if sched_relax_domain_level becomes higher.
      
       - We can change the default level by boot option 'relax_domain_level='.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • cpumask: reduce stack usage in SD_x_INIT initializers · 7c16ec58
      Authored by Mike Travis
        * Remove empty cpumask_t (and all non-zero/non-null) variables
          in SD_*_INIT macros.  Use memset(0) to clear.  Also, don't
          inline the initializer functions to save on stack space in
          build_sched_domains().
      
        * Merge change to include/linux/topology.h that uses the new
          node_to_cpumask_ptr function in the nr_cpus_node macro into
          this patch.
      
      Depends on:
      	[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
      	[sched-devel]: sched: add new set_cpus_allowed_ptr function
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • asm-generic: add node_to_cpumask_ptr macro · aa6b5446
      Authored by Mike Travis
      Create a simple macro to always return a pointer to the node_to_cpumask(node)
      value.  This relies on compiler optimization to remove the extra indirection:
      
          #define node_to_cpumask_ptr(v, node) 		\
      	    cpumask_t _##v = node_to_cpumask(node), *v = &_##v
      
      For systems with a large cpumask size, a true pointer
      to the array element can be used instead:
      
          #define node_to_cpumask_ptr(v, node)		\
      	    cpumask_t *v = &(node_to_cpumask_map[node])
      
      A node_to_cpumask_ptr_next() macro is provided to access another
      node_to_cpumask value.
      
      The other change is to always include asm-generic/topology.h moving the
      ifdef CONFIG_NUMA to this same file.
      
      Note: there are no references to either of these new macros in this patch,
      only the definition.
      
      Based on 2.6.25-rc5-mm1
      
      # alpha
      Cc: Richard Henderson <rth@twiddle.net>
      
      # fujitsu
      Cc: David Howells <dhowells@redhat.com>
      
      # ia64
      Cc: Tony Luck <tony.luck@intel.com>
      
      # powerpc
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      
      # sparc
      Cc: David S. Miller <davem@davemloft.net>
      Cc: William L. Irwin <wli@holomorphy.com>
      
      # x86
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
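The two `node_to_cpumask_ptr` variants quoted in this commit message can be compared in a userspace mock-up. In the sketch below the macro bodies are copied from the message, while the two-word `cpumask_t`, the table contents, the `SKETCH_LARGE_CPUMASK` switch and `first_word_of_node()` are hypothetical additions for illustration.

```c
/* Simplified stand-in for the kernel's cpumask_t. */
typedef struct { unsigned long bits[2]; } cpumask_t;

#define SKETCH_NR_NODES 2

/* Made-up table: node 0 owns CPUs 0-3, node 1 owns CPUs 4-7. */
static cpumask_t node_to_cpumask_map[SKETCH_NR_NODES] = {
	{ { 0x0fUL, 0UL } },
	{ { 0xf0UL, 0UL } },
};

/* By-value accessor, as used by the generic variant. */
static cpumask_t node_to_cpumask(int node)
{
	return node_to_cpumask_map[node];
}

#ifdef SKETCH_LARGE_CPUMASK
/* Large-cpumask variant: a true pointer into the map, no copy. */
#define node_to_cpumask_ptr(v, node)		\
	cpumask_t *v = &(node_to_cpumask_map[node])
#else
/* Generic variant: a local copy plus a pointer to that copy. */
#define node_to_cpumask_ptr(v, node)		\
	cpumask_t _##v = node_to_cpumask(node), *v = &_##v
#endif

/* Caller code is identical whichever variant is selected. */
static unsigned long first_word_of_node(int node)
{
	node_to_cpumask_ptr(mask, node);

	return mask->bits[0];
}
```

Because the macro expands to a declaration, it has to appear where a declaration is allowed; callers then use `mask` as an ordinary pointer in both configurations.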
  16. 17 Apr, 2008 3 commits
  17. 30 Jan, 2008 10 commits
  18. 11 Oct, 2007 1 commit