1. 24 Nov, 2009 7 commits
  2. 23 Nov, 2009 3 commits
    • x86, numa: Use near(er) online node instead of roundrobin for NUMA · d9c2d5ac
      Committed by Yinghai Lu
      The CPU-to-node mapping is set up by the following sequence:

       1. numa_init_array(): set up a round-robin mapping from CPUs to
          online nodes.

       2. init_cpu_to_node(): set the mapping according to
          apicid_to_node[], as described by the SRAT. This only handles
          nodes that are online; CPUs on nodes without RAM (i.e. not
          online) keep their round-robin assignment.

       3. Later, srat_detect_node() for Intel/AMD uses the first_online
          node or a nearby node.

      The problem is that setup_per_cpu_areas() is not called between
      steps 2 and 3, so the per_cpu area for a CPU on a node with RAM
      can end up on a different node, possibly one two hops away.

      So try to optimize this: add find_near_online_node() and call it
      from init_cpu_to_node().
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
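The idea of picking the nearest online node, rather than a round-robin one, can be sketched in plain C. This is only an illustration under stated assumptions: the distance table, the online map, and the name `nearest_online_node()` are hypothetical and are not the kernel's actual `find_near_online_node()` implementation.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4     /* hypothetical node count for this sketch */
#define NO_NODE   (-1)

/* Hypothetical distance table: a smaller value means a closer node. */
static const int node_dist[MAX_NODES][MAX_NODES] = {
    { 10, 20, 20, 30 },
    { 20, 10, 30, 20 },
    { 20, 30, 10, 20 },
    { 30, 20, 20, 10 },
};

/* Hypothetical online map: only nodes 1 and 3 have RAM here. */
static const bool node_is_online[MAX_NODES] = { false, true, false, true };

/* Return the online node closest to `node`, or NO_NODE if none is online. */
static int nearest_online_node(int node)
{
    int best = NO_NODE;
    int best_dist = INT_MAX;

    for (int n = 0; n < MAX_NODES; n++) {
        if (!node_is_online[n])
            continue;
        if (node_dist[node][n] < best_dist) {
            best_dist = node_dist[node][n];
            best = n;
        }
    }
    return best;
}
```

With this table, a CPU whose own node 0 has no RAM would be mapped to node 1 (distance 20) rather than node 3 (distance 30).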
    • x86, numa, bootmem: Only free bootmem on NUMA failure path · 021428ad
      Committed by Yinghai Lu
      In the NUMA bootmem setup failure path we freed nodedata_phys
      incorrectly.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Change crash kernel to reserve via reserve_early() · 44280733
      Committed by Yinghai Lu
      Use find_e820_area()/reserve_early() instead.

      -v2: address Eric's request to restore the original semantics:
           the reservation fails if the provided address cannot be used.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <4B09E2F9.7040403@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 19 Nov, 2009 1 commit
    • x86: Eliminate redundant/contradicting cache line size config options · 350f8f56
      Committed by Jan Beulich
      Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
      (with inconsistent defaults), just having the latter suffices as
      the former can be easily calculated from it.
      
      To be consistent, also change X86_INTERNODE_CACHE_BYTES to
      X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
      to account for last level cache line size (which here matters
      more than L1 cache line size).
      
      Finally, make sure the default value for X86_L1_CACHE_SHIFT
      when X86_GENERIC is selected is seen before the defaults for the
      individual CPU model options (unlike on x86-64, where
      GENERIC_CPU is part of the choice construct, X86_GENERIC is a
      separate option on ix86).
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Acked-by: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
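Why a shift alone suffices can be shown with a short C sketch: the byte size is always a power of two derived from the shift. The macro names below are simplified stand-ins for the Kconfig-derived values, and the L1 shift of 6 is an assumption for illustration only; the internode shift of 7 (128 bytes) is the value named in the commit message.

```c
/* Hypothetical stand-ins for the configured shifts. */
#define L1_CACHE_SHIFT        6   /* assumed value, for illustration only */
#define INTERNODE_CACHE_SHIFT 7   /* 128 bytes for NUMA, per the commit message */

/* The byte sizes are derived, so they can never disagree with the shifts. */
#define L1_CACHE_BYTES        (1 << L1_CACHE_SHIFT)
#define INTERNODE_CACHE_BYTES (1 << INTERNODE_CACHE_SHIFT)

/* The same derivation as a function, for shifts known only at run time. */
static unsigned int cache_bytes_from_shift(unsigned int shift)
{
    return 1u << shift;
}
```

Keeping only the shift as the configurable value removes the possibility of two options carrying inconsistent defaults.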
  4. 18 Nov, 2009 1 commit
  5. 17 Nov, 2009 6 commits
  6. 12 Nov, 2009 1 commit
  7. 08 Nov, 2009 1 commit
  8. 03 Nov, 2009 3 commits
  9. 28 Oct, 2009 1 commit
    • tracing: allow to change permissions for text with dynamic ftrace enabled · 883242dd
      Committed by Steven Rostedt
      The commit 74e08179
      x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
      prevents text sections from becoming read/write using set_memory_rw.
      
      The dynamic ftrace changes all text pages to read/write just before
      converting the calls to tracing to nops, and vice versa.
      
      I originally just added a flag to allow this transition when ftrace
      did the change, but I also found that when the CPA testing was running
      it would remove the read/write as well; ftrace does not do the text
      conversion on boot up, and the CPA changes caused the dynamic tracer
      to fail on self tests.

      The current solution I have is simply not to prevent
      change_page_attr from setting the RW bit for kernel text pages.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  10. 24 Oct, 2009 1 commit
  11. 23 Oct, 2009 1 commit
  12. 20 Oct, 2009 3 commits
    • x86-64: add comment for RODATA large page retainment · d6cc1c3a
      Committed by Suresh Siddha
      Add a comment explaining why RODATA is aligned to 2 MB.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA · 74e08179
      Committed by Suresh Siddha
      CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel
      text/rodata/data to small 4KB pages as they are mapped with different
      attributes (text as RO, RODATA as RO and NX etc).
      
      On x86_64, preserve the large page mappings for kernel text/rodata/data
      boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the
      RODATA section to be hugepage aligned and having the same RWX attributes
      for the 2MB page boundaries.

      Extra memory pages padding the sections will be freed at the end of boot,
      and the kernel identity mappings will have different RWX permissions compared to
      the kernel text mappings.

      Kernel identity mappings to these physical pages will be mapped with smaller
      pages, but large page mappings are still retained for kernel text, rodata, and
      data mappings.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
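The 2 MB alignment and the resulting padding can be illustrated with a little address arithmetic. This is a minimal sketch, not the kernel's linker-script logic; the helper names and the example addresses are hypothetical.

```c
#include <stdint.h>

/* Size of one large page on x86-64: 2 MB. */
#define HPAGE_SIZE_2M (UINT64_C(1) << 21)

/* Round an address up to the next 2 MB boundary (no-op if already aligned). */
static uint64_t align_up_2m(uint64_t addr)
{
    return (addr + HPAGE_SIZE_2M - 1) & ~(HPAGE_SIZE_2M - 1);
}

/* Bytes of padding needed so that the next section starts 2 MB aligned. */
static uint64_t padding_to_2m(uint64_t addr)
{
    return align_up_2m(addr) - addr;
}
```

The padding computed this way corresponds to the extra pages that the commit message says are freed again at the end of boot.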
    • x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA · b9af7c0d
      Committed by Suresh Siddha
      In the first 2MB, kernel text is co-located with the kernel static
      page tables set up by head_64.S.  CONFIG_DEBUG_RODATA chops this
      2MB large page mapping into small 4KB pages as we mark the kernel text as RO,
      leaving the static page tables as RW.
      
      With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement
      with 2% reduction in system time and 1% improvement in iowait idle time.
      
      To recover this, move the kernel static page tables to .data section, so that
      we don't have to break the first 2MB of kernel text to small pages with
      CONFIG_DEBUG_RODATA.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  13. 13 Oct, 2009 4 commits
    • x86: Interleave emulated nodes over physical nodes · adc19389
      Committed by David Rientjes
      Add interleaved NUMA emulation support
      
      This patch interleaves emulated nodes over the system's physical
      nodes. This is required for interleave optimizations since
      mempolicies, for example, operate by iterating over a nodemask and
      act without knowledge of node distances.  It can also be used for
      testing memory latencies and NUMA bugs in the kernel.
      
      There're a couple of ways to do this:
      
       - divide the number of emulated nodes by the number of physical
         nodes and allocate the result on each physical node, or
      
       - allocate each successive emulated node on a different physical
         node until all memory is exhausted.
      
      The disadvantage of the first option is that, depending on the
      asymmetry in node capacities of each physical node, emulated nodes may
      substantially differ in size on a particular physical node compared
      to another.

      The disadvantage of the second option is that, also depending on the
      asymmetry in node capacities of each physical node, there may be
      more emulated nodes allocated on a single physical node than on another.

      This patch implements the second option; we accept the
      possibility that we may have slightly more emulated nodes on a
      particular physical node compared to another in exchange for
      avoiding node size asymmetry.
      
       [ Note that "node capacity" of a physical node is not only a
         function of its addressable range, but also is affected by
         subtracting out the amount of reserved memory over that range.
         NUMA emulation only deals with available, non-reserved memory
         quantities. ]
      
      We ensure there is at least a minimal amount of available memory
      allocated to each node.  We also make sure that at least this
      amount of available memory is available in ZONE_DMA32 for any node
      that includes both ZONE_DMA32 and ZONE_NORMAL.
      
      This patch also cleans the emulation code up by no longer passing
      the statically allocated struct bootnode array among the various
      functions. This init.data array is not allocated on the stack since
      it may be very large and thus it may be accessed at file scope.
      
      The WARN_ON() for nodes_cover_memory() when faking proximity
      domains is removed since it relies on successive nodes always
      having greater start addresses than previous nodes; with
      interleaving this is no longer always true.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
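The second option described above (placing each successive emulated node on a different physical node until memory runs out) can be sketched as a small round-robin loop. This is an illustration under stated assumptions, not the kernel's emulation code: the function name, the abstract memory units, and the `max_emu` cap are hypothetical, and the minimum-size and ZONE_DMA32 guarantees from the commit message are left out.

```c
#include <stddef.h>

/*
 * Place emulated nodes of `size` units round-robin over the physical nodes.
 * remaining[] holds the available (non-reserved) memory of each physical
 * node and is consumed; count[] receives the number of emulated nodes
 * placed on each physical node. Returns the total number placed.
 */
static int interleave_emulated_nodes(unsigned long remaining[], int count[],
                                     size_t nr_phys, unsigned long size,
                                     int max_emu)
{
    int total = 0;
    size_t skipped = 0;   /* consecutive physical nodes that were too full */
    size_t p = 0;         /* physical node to try next */

    if (nr_phys == 0 || size == 0)
        return 0;
    for (size_t i = 0; i < nr_phys; i++)
        count[i] = 0;

    /* Stop once the cap is reached or no physical node can fit another. */
    while (total < max_emu && skipped < nr_phys) {
        if (remaining[p] >= size) {
            remaining[p] -= size;
            count[p]++;
            total++;
            skipped = 0;
        } else {
            skipped++;
        }
        p = (p + 1) % nr_phys;
    }
    return total;
}
```

With two physical nodes holding 1024 and 512 units and an emulated node size of 256, six emulated nodes fit: four on the larger node and two on the smaller. All emulated nodes have the same size, but their count per physical node differs, which is the trade-off the commit message describes.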
    • x86: Export srat physical topology · 8716273c
      Committed by David Rientjes
      This is the counterpart to "x86: export k8 physical topology" for
      SRAT. It is not as invasive because the acpi code already separates
      node setup into detection and registration steps, with the
      exception of registering e820 active regions in
      acpi_numa_memory_affinity_init().  This is now moved to
      acpi_scan_nodes() if NUMA emulation is disabled or deferred.
      
      acpi_numa_init() now returns a value which specifies whether an
      underlying SRAT was located.  If so, that topology can be used by
      the emulation code to interleave emulated nodes over physical nodes
      or to register the nodes for ACPI.
      
      acpi_get_nodes() may now be used to export the srat physical
      topology of the machine for NUMA emulation.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Export k8 physical topology · 8ee2debc
      Committed by David Rientjes
      To eventually interleave emulated nodes over physical nodes, we
      need to know the physical topology of the machine without actually
      registering it.  This does the k8 node setup in two parts:
      detection and registration.  NUMA emulation can then use the
      physical topology detected to set up the address ranges of emulated
      nodes accordingly.  If emulation isn't used, the k8 nodes are
      registered as normal.
      
      Two formals are added to the x86 NUMA setup functions: `acpi' and
      `k8'. These represent whether ACPI or K8 NUMA has been detected;
      both cannot be true at the same time.  This specifies to the NUMA
      emulation code whether an underlying physical NUMA topology exists
      and which interface to use.
      
      This patch deals solely with separating the k8 setup path into
      Northbridge detection and registration steps and leaves the ACPI
      changes for a subsequent patch.  The `acpi' formal is added here,
      however, to avoid touching all the header files again in the next
      patch.
      
      This approach also ensures emulated nodes will not span physical
      nodes so the true memory latency is not misrepresented.
      
      k8_get_nodes() may now be used to export the k8 physical topology
      of the machine for NUMA emulation.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Clean up and add missing log levels for k8 · 1af5ba51
      Committed by David Rientjes
      Convert all printk's in arch/x86/mm/k8topology_64.c to use
      pr_info() or pr_err() appropriately.
      
      Adds log levels for messages currently lacking them.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251517440.14754@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 12 Oct, 2009 7 commits
    • Revert "USB: Work around BIOS bugs by quiescing USB controllers earlier" · d93a8f82
      Committed by Linus Torvalds
      This reverts commit db8be50c, as per
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=14374
      	http://marc.info/?l=linux-kernel&m=125446885705223&w=4
      
      We simply can't do the USB handoff at FIXUP_HEADER time, since it will
      often require us to have valid IO mappings etc.  But that in turn
      requires a whole different approach, not this trivial one-liner.
      
      Maybe we could teach all the USB quirk handoff handlers to only do the
      quirk if the device has all its registers set up (since if it isn't
      initialized, it's unlikely to be active), but regardless that will need
      a whole lot more code than just saying "let's do it really early".
      
      The proper fix is almost certainly to just leave the legacy IOMMU
      mappings active until after all devices have been initialized.
      Reported-by: Nick Piggin <npiggin@suse.de>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Linux 2.6.32-rc4 · 16129139
      Committed by Linus Torvalds
    • pci: increase alignment to make more space for hidden code · 15b812f1
      Committed by Yinghai Lu
      As reported in

      	http://bugzilla.kernel.org/show_bug.cgi?id=13940

      on some systems, when ACPI is enabled, ACPI clears some BARs for some
      devices without reason, and the kernel then needs to allocate resources
      for them.  It then apparently hits some undocumented resource conflict,
      resulting in non-working devices.

      Try to increase the alignment to get a safer range for unassigned devices.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6 · f144c78e
      Committed by Linus Torvalds
      * 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (21 commits)
        [S390] dasd: fix race condition in resume code
        [S390] Add EX_TABLE for addressing exception in usercopy functions.
        [S390] 64-bit register support for 31-bit processes
        [S390] hibernate: Use correct place for CPU address in lowcore
        [S390] pm: ignore time spend in suspended state
        [S390] zcrypt: Improve some comments
        [S390] zcrypt: Fix sparse warning.
        [S390] perf_counter: fix vdso detection
        [S390] ftrace: drop nmi protection
        [S390] compat: fix truncate system call wrapper
        [S390] Provide arch specific mdelay implementation.
        [S390] Fix enabled udelay for short delays.
        [S390] cio: allow setting boxed devices offline
        [S390] cio: make not operational handling consistent
        [S390] cio: make disconnected handling consistent
        [S390] Fix memory leak in /proc/cio_ignore
        [S390] cio: channel path memory leak
        [S390] module: fix memory leak in s390 module loader
        [S390] Enable kmemleak on s390.
        [S390] 3270 console build fix
        ...
    • ROMFS: fix length used with romfs_dev_strnlen() function · ef1f7a7e
      Committed by Bernd Schmidt
      An interestingly corrupted romfs file system exposed a problem with the
      romfs_dev_strnlen function: it's passing the wrong value to its helpers.
      Rather than limit the string to the length passed in by the callers, it
      uses the size of the device as the limit.
      Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de>
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
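The class of bug described here, scanning up to the device size instead of up to the caller's limit, can be illustrated with a bounded string-length helper. This is a hedged sketch only: `bounded_strnlen()` and its parameters are hypothetical and model the device as an in-memory buffer; it is not the romfs code.

```c
#include <stddef.h>

/*
 * Length of the string starting at `pos` in a buffer of `buf_size` bytes,
 * scanning at most `maxlen` bytes. The effective limit is the smaller of
 * the caller's limit and the space left in the buffer, so a corrupted,
 * unterminated string can exceed neither.
 */
static size_t bounded_strnlen(const char *buf, size_t buf_size,
                              size_t pos, size_t maxlen)
{
    size_t limit, n;

    if (pos >= buf_size)
        return 0;

    limit = buf_size - pos;     /* never read past the end of the buffer */
    if (maxlen < limit)
        limit = maxlen;         /* and never past the caller's limit */

    for (n = 0; n < limit && buf[pos + n] != '\0'; n++)
        ;
    return n;
}
```

Using only `buf_size - pos` as the limit would reproduce the reported problem: the scan could run far beyond the length the caller asked for.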
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 · c6c59927
      Committed by Linus Torvalds
      * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (32 commits)
        USB: serial: no unnecessary GFP_ATOMIC in oti6858
        USB: serial: fix race between unthrottle and completion handler in visor
        USB: serial: fix assumption that throttle/unthrottle cannot sleep
        USB: serial: fix race between unthrottle and completion handler in symbolserial
        USB: serial: fix race between unthrottle and completion handler in opticon
        USB: ehci: Fix isoc scheduling boundary checking.
        USB: storage: When a device returns no sense data, call it a Hardware Error
        USB: small fix in error case of suspend in generic usbserial code
        USB: visor: fix trivial accounting bug in visor driver
        USB: Fix throttling in generic usbserial driver
        USB: cp210x: Add support for the DW700 UART
        USB: ipaq: fix oops when device is plugged in
        USB: isp1362: fix build warnings on 64-bit systems
        USB: gadget: imx_udc: Use resource size
        USB: storage: iRiver P7 UNUSUAL_DEV patch
        USB: musb: make HAVE_CLK support optional
        USB: xhci: Fix dropping endpoints from the xHC schedule.
        USB: xhci: Don't wait for a disable slot cmd when HC dies.
        USB: xhci: Handle canceled URBs when HC dies.
        USB: xhci: Stop debugging polling loop when HC dies.
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6 · ff945afb
      Committed by Linus Torvalds
      * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6:
        Staging: comedi: fix build on arches that don't want comedi drivers
        Staging: comedi: pcmcia irq fixes
        Staging: comedi: ni_pcimio: Added device id for pxi-6225.
        Staging: comedi: ni_65xx.c: fix output inversion problem.
        Staging: comedi: ni_65xx.c: fix insn_bits shift calculation.
        Staging: comedi: s526: fixes for pulse generator
        Staging: comedi: s526: Take account of arch's byte order.
        Staging: comedi: s526: Get rid of global variable 'cmReg'.
        Staging: comedi: s526: Fix number of channels on DIO subdevice
        Staging: comedi: cb_pcidio: fix "section mismatch" error
        Staging: comedi: jr3_pci: Initialize transf variable fully in jr3_pci_poll_subdevice().
        Staging: comedi: Corrected type of a printk argument in resize_async_buffer().
        Staging: p9auth: a few fixes
        Staging: rtl8192e: Add #include <linux/vmalloc.h>
        Staging: iio: Don't build on s390
        Staging: winbond: implement prepare_multicast and fix API usage
        Staging: w35und: Fix ->beacon_int breakage
        Staging: remove cowloop driver
        Staging: remove agnx driver
        Staging: comedi: serial2002: fix include build issue