1. 20 10月, 2009 1 次提交
    • S
      x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA · b9af7c0d
      Suresh Siddha 提交于
      In the first 2MB, kernel text is co-located with kernel static
      page tables setup by head_64.S.  CONFIG_DEBUG_RODATA chops this
      2MB large page mapping to small 4KB pages as we mark the kernel text as RO,
      leaving the static page tables as RW.
      
      With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement
      with 2% reduction in system time and 1% improvement in iowait idle time.
      
      To recover this, move the kernel static page tables to .data section, so that
      we don't have to break the first 2MB of kernel text to small pages with
      CONFIG_DEBUG_RODATA.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      b9af7c0d
  2. 13 10月, 2009 4 次提交
    • D
      x86: Interleave emulated nodes over physical nodes · adc19389
      David Rientjes 提交于
      Add interleaved NUMA emulation support
      
      This patch interleaves emulated nodes over the system's physical
      nodes. This is required for interleave optimizations since
      mempolicies, for example, operate by iterating over a nodemask and
      act without knowledge of node distances.  It can also be used for
      testing memory latencies and NUMA bugs in the kernel.
      
      There're a couple of ways to do this:
      
       - divide the number of emulated nodes by the number of physical
         nodes and allocate the result on each physical node, or
      
       - allocate each successive emulated node on a different physical
         node until all memory is exhausted.
      
      The disadvantage of the first option is, depending on the asymmetry
      in node capacities of each physical node, emulated nodes may
      substantially differ in size on a particular physical node compared
      to another.
      
      The disadvantage of the second option is, also depending on the
      asymmetry in node capacities of each physical node, there may be
      more emulated nodes allocated on a single physical node as another.
      
      This patch implements the second option; we sacrifice the
      possibility that we may have slightly more emulated nodes on a
      particular physical node compared to another in lieu of node size
      asymmetry.
      
       [ Note that "node capacity" of a physical node is not only a
         function of its addressable range, but also is affected by
         subtracting out the amount of reserved memory over that range.
         NUMA emulation only deals with available, non-reserved memory
         quantities. ]
      
      We ensure there is at least a minimal amount of available memory
      allocated to each node.  We also make sure that at least this
      amount of available memory is available in ZONE_DMA32 for any node
      that includes both ZONE_DMA32 and ZONE_NORMAL.
      
      This patch also cleans the emulation code up by no longer passing
      the statically allocated struct bootnode array among the various
      functions. This init.data array is not allocated on the stack since
      it may be very large and thus it may be accessed at file scope.
      
      The WARN_ON() for nodes_cover_memory() when faking proximity
      domains is removed since it relies on successive nodes always
      having greater start addresses than previous nodes; with
      interleaving this is no longer always true.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      adc19389
    • D
      x86: Export srat physical topology · 8716273c
      David Rientjes 提交于
      This is the counterpart to "x86: export k8 physical topology" for
      SRAT. It is not as invasive because the acpi code already seperates
      node setup into detection and registration steps, with the
      exception of registering e820 active regions in
      acpi_numa_memory_affinity_init().  This is now moved to
      acpi_scan_nodes() if NUMA emulation is disabled or deferred.
      
      acpi_numa_init() now returns a value which specifies whether an
      underlying SRAT was located.  If so, that topology can be used by
      the emulation code to interleave emulated nodes over physical nodes
      or to register the nodes for ACPI.
      
      acpi_get_nodes() may now be used to export the srat physical
      topology of the machine for NUMA emulation.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8716273c
    • D
      x86: Export k8 physical topology · 8ee2debc
      David Rientjes 提交于
      To eventually interleave emulated nodes over physical nodes, we
      need to know the physical topology of the machine without actually
      registering it.  This does the k8 node setup in two parts:
      detection and registration.  NUMA emulation can then used the
      physical topology detected to setup the address ranges of emulated
      nodes accordingly.  If emulation isn't used, the k8 nodes are
      registered as normal.
      
      Two formals are added to the x86 NUMA setup functions: `acpi' and
      `k8'. These represent whether ACPI or K8 NUMA has been detected;
      both cannot be true at the same time.  This specifies to the NUMA
      emulation code whether an underlying physical NUMA topology exists
      and which interface to use.
      
      This patch deals solely with separating the k8 setup path into
      Northbridge detection and registration steps and leaves the ACPI
      changes for a subsequent patch.  The `acpi' formal is added here,
      however, to avoid touching all the header files again in the next
      patch.
      
      This approach also ensures emulated nodes will not span physical
      nodes so the true memory latency is not misrepresented.
      
      k8_get_nodes() may now be used to export the k8 physical topology
      of the machine for NUMA emulation.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8ee2debc
    • D
      x86: Clean up and add missing log levels for k8 · 1af5ba51
      David Rientjes 提交于
      Convert all printk's in arch/x86/mm/k8topology_64.c to use
      pr_info() or pr_err() appropriately.
      
      Adds log levels for messages currently lacking them.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251517440.14754@chino.kir.corp.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1af5ba51
  3. 12 10月, 2009 2 次提交
  4. 08 10月, 2009 1 次提交
    • A
      x86, timers: Check for pending timers after (device) interrupts · 9bcbdd9c
      Arjan van de Ven 提交于
      Now that range timers and deferred timers are common, I found a
      problem with these using the "perf timechart" tool. Frans Pop also
      reported high scheduler latencies via LatencyTop, when using
      iwlagn.
      
      It turns out that on x86, these two 'opportunistic' timers only get
      checked when another "real" timer happens. These opportunistic
      timers have the objective to save power by hitchhiking on other
      wakeups, as to avoid CPU wakeups by themselves as much as possible.
      
      The change in this patch runs this check not only at timer
      interrupts, but at all (device) interrupts. The effect is that:
      
       1) the deferred timers/range timers get delayed less
      
       2) the range timers cause less wakeups by themselves because
          the percentage of hitchhiking on existing wakeup events goes up.
      
      I've verified the working of the patch using "perf timechart", the
      original exposed bug is gone with this patch. Frans also reported
      success - the latencies are now down in the expected ~10 msec
      range.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Tested-by: NFrans Pop <elendil@planet.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <20091008064041.67219b13@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9bcbdd9c
  5. 04 10月, 2009 9 次提交
  6. 03 10月, 2009 1 次提交
    • A
      x86: Simplify bound checks in the MTRR code · 11879ba5
      Arjan van de Ven 提交于
      The current bound checks for copy_from_user in the MTRR driver are
      not as obvious as they could be, and gcc agrees with that.
      
      This patch simplifies the boundary checks to the point that gcc can
      now prove to itself that the copy_from_user() is never going past
      its bounds.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20090926205150.30797709@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      11879ba5
  7. 02 10月, 2009 3 次提交
  8. 01 10月, 2009 7 次提交
  9. 30 9月, 2009 1 次提交
  10. 28 9月, 2009 1 次提交
  11. 27 9月, 2009 2 次提交
  12. 24 9月, 2009 8 次提交
    • A
      sysctl: remove "struct file *" argument of ->proc_handler · 8d65af78
      Alexey Dobriyan 提交于
      It's unused.
      
      It isn't needed -- read or write flag is already passed and sysctl
      shouldn't care about the rest.
      
      It _was_ used in two places at arch/frv for some reason.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8d65af78
    • R
      x86: Remove redundant non-NUMA topology functions · b0c6fbe4
      Rusty Russell 提交于
      arch/x86/include/asm/topology.h declares inline fns cpu_to_node and
      cpumask_of_node for !NUMA, even though they are then declared as
      macros by asm-generic/topology.h, which is #included just below.
      
      The macros (which are the same) end up being used; these functions
      are just confusing.
      Noticed-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: "Greg Kroah-Hartman" <gregkh@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      LKML-Reference: <200909241748.45629.rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b0c6fbe4
    • J
      x86: early_printk: Protect against using the same device twice · 429a6e5e
      Jason Wessel 提交于
      If you use the kernel argument:
      
        earlyprintk=serial,ttyS0,115200
      
      This will cause a recursive hang printing the same line
      again and again:
      
       BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
       BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
       BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
      bootconsole [earlyser0] enabled
      Linux version 2.6.31-07863-gb64ada6b (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
      Linux version 2.6.31-07863-gb64ada6b (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
      Linux version 2.6.31-07863-gb64ada6b (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
      Linux version 2.6.31-07863-gb64ada6b (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
      Linux version 2.6.31-07863-gb64ada6b (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
      
      Instead warn the end user that they specified the device
      a second time, and ignore that second console.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <4ABAAB89.1080407@windriver.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      429a6e5e
    • R
      x86: Reduce verbosity of "PAT enabled" kernel message · e23a8b6a
      Roland Dreier 提交于
      On modern systems, the kernel prints the message
      
          x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
      
      once for every CPU.
      
      This gets kind of ridiculous on huge systems; for example, on a
      64-thread system I was lucky enough to get:
      
          dmesg| grep 'PAT enabled' | wc
               64     704    5174
      
      There is already a BUG() if non-boot CPUs have PAT capabilities
      that don't match the boot CPU, so just print the message on the
      boot CPU. (I kept the print after the wrmsrl() that enables PAT,
      so that the log output continues to mean that the system survived
      enabling PAT on the boot CPU)
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      LKML-Reference: <adavdj92sso.fsf@cisco.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e23a8b6a
    • R
      x86: Reduce verbosity of "TSC is reliable" message · ea01c0d7
      Roland Dreier 提交于
      On modern systems, the kernel prints the message
      
          Skipping synchronization checks as TSC is reliable.
      
      once for every non-boot CPU.
      
      This gets kind of ridiculous on huge systems; for example, on a
      64-thread system I was lucky enough to get:
      
          $ dmesg | grep 'TSC is reliable' | wc
               63     567    4221
      
      There's no point to doing this for every CPU, since the code is
      just checking the boot CPU anyway, so change this to a
      printk_once() to make the message appears only once.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      LKML-Reference: <adazl8l2swc.fsf@cisco.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ea01c0d7
    • A
      headers: utsname.h redux · 2bcd57ab
      Alexey Dobriyan 提交于
      * remove asm/atomic.h inclusion from linux/utsname.h --
         not needed after kref conversion
       * remove linux/utsname.h inclusion from files which do not need it
      
      NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
      due to some personality stuff it _is_ needed -- cowardly leave ELF-related
      headers and files alone.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2bcd57ab
    • R
      cpumask: use mm_cpumask() wrapper: x86 · 78f1c4d6
      Rusty Russell 提交于
      Makes code futureproof against the impending change to mm->cpu_vm_mask (to be a pointer).
      
      It's also a chance to use the new cpumask_ ops which take a pointer
      (the older ones are deprecated, but there's no hurry for arch code).
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      78f1c4d6
    • R
      cpumask: remove arch_send_call_function_ipi · 0748bd01
      Rusty Russell 提交于
      Now everyone is converted to arch_send_call_function_ipi_mask, remove
      the shim and the #defines.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      0748bd01