1. 09 May, 2007 3 commits
  2. 08 May, 2007 2 commits
    • B
      get_unmapped_area handles MAP_FIXED on i386 · 5a8130f2
      Benjamin Herrenschmidt committed
      Handle MAP_FIXED in i386 hugetlb_get_unmapped_area() by just calling
      prepare_hugepage_range().
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: William Irwin <bill.irwin@oracle.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • C
      SLUB core · 81819f0f
      Christoph Lameter committed
      This is a new slab allocator which was motivated by the complexity of the
      existing code in mm/slab.c. It attempts to address a variety of concerns
      with the existing implementation.
      
      A. Management of object queues
      
         A particular concern was the complex management of the numerous object
         queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
         each allocating CPU and use objects from a slab directly instead of
         queueing them up.
      
      B. Storage overhead of object queues
      
         SLAB Object queues exist per node, per CPU. The alien cache queue even
         has a queue array that contains a queue for each processor on each
         node. For very large systems the number of queues and the number of
         objects that may be caught in those queues grows exponentially. On our
         systems with 1k nodes / processors we have several gigabytes just tied up
         for storing references to objects for those queues. This does not include
         the objects that could be on those queues. One fears that the whole
         memory of the machine could one day be consumed by those queues.
      
      C. SLAB meta data overhead
      
         SLAB has overhead at the beginning of each slab. This means that data
         cannot be naturally aligned at the beginning of a slab block. SLUB keeps
         all meta data in the corresponding struct page. Objects can be naturally
         aligned in the slab. For example, a 128 byte object will be aligned at
         128 byte boundaries and can fit tightly into a 4k page with no bytes
         left over. SLAB cannot do this.
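The arithmetic behind that example can be sketched as follows. This is only an illustration: the 4096-byte page size comes from the text above, while the function names and the idea of passing an in-page header size as a parameter are assumptions made for the sketch, not code from the patch.

```c
#include <assert.h>

/* Page size used in the commit message's example (a 4k page). */
#define EXAMPLE_PAGE_SIZE 4096u

/*
 * Hypothetical helper: how many objects fit into one page when
 * header_bytes of the page are consumed by in-page slab metadata.
 * With header_bytes == 0 this models keeping metadata outside the page.
 */
static unsigned objects_per_slab(unsigned object_size, unsigned header_bytes)
{
    assert(object_size > 0 && header_bytes <= EXAMPLE_PAGE_SIZE);
    return (EXAMPLE_PAGE_SIZE - header_bytes) / object_size;
}

/* Hypothetical helper: bytes left unused at the end of the page. */
static unsigned wasted_bytes(unsigned object_size, unsigned header_bytes)
{
    assert(object_size > 0 && header_bytes <= EXAMPLE_PAGE_SIZE);
    return (EXAMPLE_PAGE_SIZE - header_bytes) % object_size;
}
```

With no in-page header, objects_per_slab(128, 0) gives 32 objects and wasted_bytes(128, 0) gives 0. Any non-zero header (32 bytes is used in the test purely as a made-up figure) costs at least one object slot and shifts the objects off their natural alignment.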
      
      D. SLAB has a complex cache reaper
      
         SLUB does not need a cache reaper for UP systems. On SMP systems
         the per CPU slab may be pushed back into partial list but that
         operation is simple and does not require an iteration over a list
         of objects. SLAB expires per CPU, shared and alien object queues
         during cache reaping which may cause strange hold offs.
      
      E. SLAB has complex NUMA policy layer support
      
         SLUB pushes NUMA policy handling into the page allocator. This means that
         allocation is coarser (SLUB does interleave on a page level) but that
         situation was also present before 2.6.13. SLAB's application of
         policies to individual slab objects is certainly a performance
         concern due to the frequent references to memory policies, which
         may lead a sequence of objects to come from one node after another.
         SLUB will get a slab full of objects from one node and then will
         switch to the next.
      
      F. Reduction of the size of partial slab lists
      
         SLAB has per node partial lists. This means that over time a large
         number of partial slabs may accumulate on those lists. These can
         only be reused if allocations occur on specific nodes. SLUB has a global
         pool of partial slabs and will consume slabs from that pool to
         decrease fragmentation.
      
      G. Tunables
      
         SLAB has sophisticated tuning abilities for each slab cache. One can
         manipulate the queue sizes in detail. However, filling the queues still
         requires the use of the spin lock to check out slabs. SLUB has a global
         parameter (slub_min_order) for tuning. Increasing the minimum slab
         order can decrease the locking overhead. The bigger the slab order, the
         fewer page movements between per CPU and partial lists occur and the
         better SLUB scales.
      
      G. Slab merging
      
         We often have slab caches with similar parameters. SLUB detects those
         on boot up and merges them into the corresponding general caches. This
         leads to more effective memory use. About 50% of all caches can
         be eliminated through slab merging. This will also decrease
         slab fragmentation because partially allocated slabs can be filled
         up again. Slab merging can be switched off by specifying
         slub_nomerge on boot up.
      
         Note that merging can expose heretofore unknown bugs in the kernel
         because corrupted objects may now be placed differently and corrupt
         differing neighboring objects. Enable sanity checks to find those.
      
      H. Diagnostics
      
         The current slab diagnostics are difficult to use and require a
         recompilation of the kernel. SLUB contains debugging code that
         is always available (but is kept out of the hot code paths).
         SLUB diagnostics can be enabled via the "slub_debug" option.
         Parameters can be specified to select a single slab cache or a
         group of slab caches for diagnostics. This means that the system
         is running with the usual performance and it is much more likely
         that race conditions can be reproduced.
      
      I. Resiliency
      
         If basic sanity checks are on then SLUB is capable of detecting
         common error conditions and recovering as best as possible to allow
         the system to continue.
      
      J. Tracing
      
         Tracing can be enabled via the slub_debug=T,<slabcache> option
         during boot. SLUB will then log all actions on that slabcache
         and dump the object contents on free.
      
      K. On demand DMA cache creation.
      
         Generally DMA caches are not needed. If a kmalloc is used with
         __GFP_DMA then just create this single slabcache that is needed.
         For systems that have no ZONE_DMA requirement the support is
         completely eliminated.
      
      L. Performance increase
      
         Some benchmarks have shown speed improvements on kernbench in the
         range of 5-10%. The locking overhead of slub is based on the
         underlying base allocation size. If we can reliably allocate
         larger order pages then it is possible to increase slub
         performance much further. The anti-fragmentation patches may
         enable further performance increases.
      
      Tested on:
      i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator
      
      SLUB Boot options
      
      slub_nomerge		Disable merging of slabs
      slub_min_order=x	Require a minimum order for slab caches. This
      			increases the managed chunk size and therefore
      			reduces meta data and locking overhead.
      slub_min_objects=x	Minimum objects per slab. Default is 8.
      slub_max_order=x	Avoid generating slabs larger than order specified.
      slub_debug		Enable all diagnostics for all caches
      slub_debug=<options>	Enable selective options for all caches
      slub_debug=<o>,<cache>	Enable selective options for a certain set of
      			caches
      
      Available Debug options
      F		Double Free checking, sanity and resiliency
      R		Red zoning
      P		Object / padding poisoning
      U		Track last free / alloc
      T		Trace all allocs / frees (only use for individual slabs).
      
      To use SLUB: Apply this patch and then select SLUB as the default slab
      allocator.
      
      [hugh@veritas.com: fix an oops-causing locking error]
      [akpm@linux-foundation.org: various stupid cleanups and small fixes]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 07 May, 2007 1 commit
    • L
      Revert "[PATCH] x86: __pa and __pa_symbol address space separation" · e3ebadd9
      Linus Torvalds committed
      This was broken.  It adds complexity, for no good reason.  Rather than
      separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(),
      and preferably __pa() too - and just use "virt_to_phys()" instead, which
      is more readable and has nicer semantics.
      
      However, right now, just undo the separation, and make __pa_symbol() be
      the exact same as __pa().  That fixes the bugs this patch introduced,
      and we can do the fairly obvious cleanups later.
      
      Do the new __phys_addr() function (which is now the actual workhorse for
      the unified __pa()/__pa_symbol()) as a real external function, that way
      all the potential issues with compile/link-time optimizations of
      constant symbol addresses go away, and we can also, if we choose to, add
      more sanity-checking of the argument.
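The shape of that unification can be sketched in user-space C. Everything named example_* below is hypothetical, and the single fixed offset is a simplifying assumption (a 3G/1G i386-style layout); the real __phys_addr() is not reproduced here.

```c
#include <assert.h>

/* Assumed kernel virtual base for the sketch (3G/1G split). */
#define EXAMPLE_PAGE_OFFSET 0xC0000000UL

/*
 * Out-of-line workhorse, standing in for __phys_addr(). Because it is a
 * real function rather than a macro, the compiler cannot fold constant
 * symbol addresses at compile or link time, and there is a natural place
 * for argument sanity checks.
 */
unsigned long example_phys_addr(unsigned long vaddr)
{
    assert(vaddr >= EXAMPLE_PAGE_OFFSET);  /* illustrative sanity check */
    return vaddr - EXAMPLE_PAGE_OFFSET;
}

/* Both spellings resolve to the same function, as the revert describes. */
#define example_pa(x)        example_phys_addr((unsigned long)(x))
#define example_pa_symbol(x) example_pa(x)
```

In this sketch example_pa(0xC0100000UL) yields 0x00100000, and example_pa_symbol() returns the same value for the same argument.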
      
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 03 May, 2007 34 commits
    • C
      PCI: add debug information to resource collision message · f14e3136
      Chuck Ebbert committed
      Add more information to PCI resource collision message
      to help with debugging.
      Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • M
      MSI: arch must connect the irq and the msi_desc · 7fe3730d
      Michael Ellerman committed
      set_irq_msi() currently connects an irq_desc to an msi_desc. The archs call
      it at some point in their setup routine, and then the generic code sets up the
      reverse mapping from the msi_desc back to the irq.
      
      set_irq_msi() should do both connections, making it the one and only call
      required to connect an irq with its MSI desc and vice versa.
      
      The arch code MUST call set_irq_msi(), and it must do so only once it's sure
      it's not going to fail the irq allocation.
      
      Given that there's no need for the arch to return the irq anymore, the return
      value from the arch setup routine just becomes 0 for success and anything else
      for failure.
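A minimal user-space sketch of the contract described above follows. The example_* types, the fixed-size table, and the allocated_irq parameter are all hypothetical stand-ins, not the kernel's definitions; the point is only that one call makes both links, and that the arch routine links only after allocation has succeeded.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_NR_IRQS 16

/* Hypothetical, simplified stand-ins for msi_desc and irq_desc. */
struct example_msi_desc { unsigned int irq; };
struct example_irq_desc { struct example_msi_desc *msi_desc; };

static struct example_irq_desc example_irq_table[EXAMPLE_NR_IRQS];

/* One call makes both connections: irq -> msi_desc and msi_desc -> irq. */
int example_set_irq_msi(unsigned int irq, struct example_msi_desc *entry)
{
    if (irq >= EXAMPLE_NR_IRQS)
        return -1;
    example_irq_table[irq].msi_desc = entry;
    if (entry)
        entry->irq = irq;
    return 0;
}

/*
 * Sketch of an arch setup routine: it links the descriptors only once the
 * irq allocation is known to have succeeded, and returns 0 for success
 * and anything else for failure (it no longer returns the irq itself).
 */
int example_arch_setup_msi_irq(struct example_msi_desc *entry, int allocated_irq)
{
    if (allocated_irq < 0)
        return allocated_irq;   /* allocation failed: nothing is linked */
    return example_set_irq_msi((unsigned int)allocated_irq, entry);
}
```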
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • D
      msi: introduce ARCH_SUPPORTS_MSI Kconfig option (rev2) · f282b970
      Dan Williams committed
      Allows architectures to advertise that they support MSI rather than listing
      each architecture as a PCI_MSI dependency.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • J
      PCI: fix sysfs rom file creation for BIOS ROM shadows · 40ee9e9f
      Jesse Barnes committed
      At one time, if a BIOS ROM shadow was detected for the boot video
      device (stored at offset 0xc0000), we'd set a special resource flag,
      IORESOURCE_ROM_SHADOW, so that the sysfs ROM file code could handle
      it properly.  That broke along the way somewhere though, so current
      kernels will be missing 'rom' files in sysfs if the video device
      doesn't have an explicit ROM BAR.
      
      This patch fixes the regression by moving the video fixup quirk to a
      little later in the boot cycle (to avoid having its work undone by
      PCI resource allocation) and checking in the PCI sysfs code whether
      a rom file should be created due to a shadow resource, which is also
      moved to a little later in the boot cycle so it will occur after the
      video fixup.  Tested and works on my i386 test box.
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • J
      PCI: Cleanup the includes of <linux/pci.h> · 6473d160
      Jean Delvare committed
      I noticed that many source files include <linux/pci.h> while they do
      not appear to need it. Here is an attempt to clean it all up.
      
      In order to find all possibly affected files, I searched for all
      files including <linux/pci.h> but without any other occurrence of "pci"
      or "PCI". I removed the include statement from all of these, then I
      compiled an allmodconfig kernel on both i386 and x86_64 and fixed the
      false positives manually.
      
      My tests covered 66% of the affected files, so there could be false
      positives remaining. Untested files are:
      
      arch/alpha/kernel/err_common.c
      arch/alpha/kernel/err_ev6.c
      arch/alpha/kernel/err_ev7.c
      arch/ia64/sn/kernel/huberror.c
      arch/ia64/sn/kernel/xpnet.c
      arch/m68knommu/kernel/dma.c
      arch/mips/lib/iomap.c
      arch/powerpc/platforms/pseries/ras.c
      arch/ppc/8260_io/enet.c
      arch/ppc/8260_io/fcc_enet.c
      arch/ppc/8xx_io/enet.c
      arch/ppc/syslib/ppc4xx_sgdma.c
      arch/sh64/mach-cayman/iomap.c
      arch/xtensa/kernel/xtensa_ksyms.c
      arch/xtensa/platform-iss/setup.c
      drivers/i2c/busses/i2c-at91.c
      drivers/i2c/busses/i2c-mpc.c
      drivers/media/video/saa711x.c
      drivers/misc/hdpuftrs/hdpu_cpustate.c
      drivers/misc/hdpuftrs/hdpu_nexus.c
      drivers/net/au1000_eth.c
      drivers/net/fec_8xx/fec_main.c
      drivers/net/fec_8xx/fec_mii.c
      drivers/net/fs_enet/fs_enet-main.c
      drivers/net/fs_enet/mac-fcc.c
      drivers/net/fs_enet/mac-fec.c
      drivers/net/fs_enet/mac-scc.c
      drivers/net/fs_enet/mii-bitbang.c
      drivers/net/fs_enet/mii-fec.c
      drivers/net/ibm_emac/ibm_emac_core.c
      drivers/net/lasi_82596.c
      drivers/parisc/hppb.c
      drivers/sbus/sbus.c
      drivers/video/g364fb.c
      drivers/video/platinumfb.c
      drivers/video/stifb.c
      drivers/video/valkyriefb.c
      include/asm-arm/arch-ixp4xx/dma.h
      sound/oss/au1550_ac97.c
      
      I would welcome test reports for these files. I am fine with removing
      the untested files from the patch if the general opinion is that these
      changes aren't safe. The tested part would still be nice to have.
      
      Note that this patch depends on another header fixup patch I submitted
      to LKML yesterday:
        [PATCH] scatterlist.h needs types.h
        http://lkml.org/lkml/2007/3/01/141
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • T
      [PATCH] i386: Don't delete cpu_devs data to identify different x86 types in late_initcall · 35060b6a
      Thomas Renninger committed
      In arch/i386/cpu/common.c there is:
      cpu_devs[X86_VENDOR_INTEL]
      cpu_devs[X86_VENDOR_CYRIX]
      cpu_devs[X86_VENDOR_AMD]
      ...
      They are all filled with data early.
      The data (struct) gets set to NULL for all but Intel in different
      late_initcall (exit_cpu_vendor) calls.
      I don't see what sense this makes at all, maybe something that got
      forgotten with the HOTPLUG_CPU extensions?

      Please check/review whether initdata, cpuinitdata is still ok and this
      still works with HOTPLUG_CPU and without, it should...
      Signed-off-by: Thomas Renninger <trenn@suse.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: davej@redhat.com
    • D
      [PATCH] i386: type may be unused · a3193348
      David Rientjes committed
      In the case of !CONFIG_PCI_DIRECT && !CONFIG_PCI_MMCONFIG, type is
      unreferenced.

      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • O
      [PATCH] i386: Some additional chipset register values validation. · b5229dbb
      Olivier Galibert committed
      On i945, a mmconfig range hitting the f0000000-ffffffff zone conflicts
      with the APIC registers and others.  Consider it invalid.
      
      On E7520, values 0000 and f000 for the window register are defined
      invalid in the documentation.
      
      I haven't seen a bios use these values, but who trusts biosen these
      days?
      Signed-off-by: Olivier Galibert <galibert@pobox.com>
      Signed-off-by: Andi Kleen <ak@suse.de>

       arch/i386/pci/mmconfig-shared.c |   25 +++++++++++++++++--------
       1 file changed, 17 insertions(+), 8 deletions(-)
    • B
      [PATCH] i386: Add missing !X86_PAE dependincy to the 2G/2G split. · 6c2af358
      Bill Irwin committed
      Only 1GB-aligned kernel/user splits are now handled for PAE. The
      2GB/2GB split attempts to avoid aliasing vmallocspace with the 1:1
      mapping for physical memory by using an actual split of 1.875/2.125
      to accommodate 128MB of vmallocspace out of what would otherwise
      be a full 2GB for userspace. That attempt disturbs the alignment
      required by PAE for 2GB/2GB splits, and furthermore does not provide
      a 2GB/2GB split as advertised.
      
      This patch resolves the issues here in two ways. The first is
      by providing a true 2GB/2GB split in addition to the 1.875/2.125
      split. The second is by renaming the 1.875/2.125 split to
      CONFIG_VMSPLIT_2G_OPT analogously to CONFIG_VMSPLIT_3G_OPT, which
      performs a similar maneuver to avoid aliasing vmallocspace with
      the 1:1 mapping for physical memory around the 3GB boundary. With
      the 1.875/2.125 split properly named, its config option is then
      tagged as depending on !HIGHMEM to express the PAE implementation's
      current inability to deal with such unaligned splits.
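The alignment constraint can be checked with a few lines of arithmetic. In this sketch the boundary addresses 0x80000000 (true 2GB/2GB) and 0x78000000 (1.875/2.125) are assumptions chosen to match the ratios quoted above, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One gigabyte, the alignment the message says PAE requires for a split. */
#define EXAMPLE_GB 0x40000000u

/* Returns non-zero when the user/kernel boundary sits on a 1GB boundary. */
static int split_is_1gb_aligned(uint32_t page_offset)
{
    return (page_offset % EXAMPLE_GB) == 0;
}

/* Size of the user portion of the 4GB address space, in gigabytes. */
static double user_space_gb(uint32_t page_offset)
{
    return (double)page_offset / EXAMPLE_GB;
}
```

Under these assumptions, a boundary at 0x80000000 is 1GB-aligned and gives 2.0 GB of user space, while 0x78000000 gives 1.875 GB and fails the alignment check, which is why that option cannot be combined with PAE.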
      
      This patch is essentially a combination of two patches, one written
      by Eric Biederman and the other by Eric Dumazet. If they could add
      their Signed-off-by: to this, I'd be much obliged.
      Signed-off-by: William Irwin <wli@holomorphy.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Cc: Mark Lord <lkml@rtr.ca>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: Drop noisy e820 debugging printks · 13063832
      Andi Kleen committed
      Signed-off-by: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: Remove copy_*_user BUG_ONs for (size < 0) · c812d6c1
      Andi Kleen committed
      access_ok checks this case anyway, so there is no need to check twice.
      Signed-off-by: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: Little cleanups in smpboot.c · ec1180db
      Andi Kleen committed
      - Remove #if that is always set
      - Fix warning
      Signed-off-by: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: Implement X86_FEATURE_SYNC_RDTSC on i386 · 3aefbe07
      Andi Kleen committed
      Syncs up with x86-64.
      Signed-off-by: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: Verify important CPUID bits in real mode · c7f81c94
      Andi Kleen committed
      Check some CPUID bits that are needed for compiler generated code early in
      boot. When the system is still in real mode, before changing the VESA BIOS
      mode, it is still possible to display a visible error message on the screen.

      Similar to x86-64.

      Includes cleanups from Eric Biederman
      Signed-off-by: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: Drop -traditional in arch/i386/boot · 484ad393
      Andi Kleen committed
      Needed for follow-on patch
      Signed-off-by: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: Clean up NMI watchdog code · 09198e68
      Andi Kleen committed
      - Introduce a wd_ops structure
      - Convert the various nmi watchdogs over to it
      - This allows splitting the perfctr reservation from the watchdog
        setup cleanly.
      - Do perfctr reservation globally as it should have always been
      - Remove dead code referenced only by unused EXPORT_SYMBOLs
      Signed-off-by: Andi Kleen <ak@suse.de>
    • A
      [PATCH] i386: fix wrong comment for syscall stack layout · 889f21ce
      Andi Kleen committed
      `ret_from_sys_call' label no longer exists and `syscall_exit' label was
      introduced instead.
      Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • E
      [PATCH] i386: convert to the kthread API · f26d6a2b
      Eric W. Biederman committed
      This patch just trivially converts from calling kernel_thread and daemonize
      to just calling kthread_run.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • D
      [PATCH] i386: remove xtime_lock'ing around cpufreq notifier · df3624aa
      Daniel Walker committed
      The locking of the xtime_lock around the cpu notifier is unnecessary now.
      At one time the tsc was used after a frequency change for timekeeping, but
      the re-write of timekeeping no longer uses the TSC unless the frequency is
      constant.

      The variables that are changed in this section of code had also once been
      used for timekeeping, but no longer are.
      Signed-off-by: Daniel Walker <dwalker@mvista.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • J
      [PATCH] i386: check capability · 2f3c30e6
      Joachim Deguara committed
      Currently the i386 architecture checks the family for mce capability. This
      patch removes that check and uses the CPUID information instead.  Tested
      on a K8 revE and a family10h processor.

      This eliminates checking for a set AMD processor family to decide whether
      mce is allowed and relies on the information being in CPUID.
      Signed-off-by: Joachim Deguara <joachim.deguara@amd.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • K
      [PATCH] i386: clean up flush_tlb_others fn · 1bdae458
      Keshavamurthy, Anil S committed
      Cleanup flush_tlb_others(), no functional change.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • H
      [PATCH] i386: replace spin_lock_irqsave with spin_lock · 62dbc210
      Hisashi Hifumi committed
      IRQ is already disabled through local_irq_disable().  So
      spin_lock_irqsave() can be replaced with spin_lock().
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • K
      [PATCH] i386: avoid checking for cpu gone when CONFIG_HOTPLUG_CPU not defined · e8a72ffa
      Keshavamurthy, Anil S committed
      Avoid checking for cpu gone in mm hot path when CONFIG_HOTPLUG_CPU is not
      defined.
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • J
      [PATCH] i386: PARAVIRT: fix startup_ipi_hook config dependency · 0260c196
      Jeremy Fitzhardinge committed
      startup_ipi_hook depends on CONFIG_X86_LOCAL_APIC, so move it to the
      right part of the paravirt_ops initialization.
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • R
      [PATCH] i386: fix mtrr sections · 25c16b99
      Randy Dunlap committed
      Fix section mismatch warnings in mtrr code.
      Fix line length on one source line.
      
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.data: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x103)
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x180)
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x199)
      WARNING: arch/x86_64/kernel/built-in.o - Section mismatch: reference to .init.text: from .text.get_mtrr_state after 'get_mtrr_state' (at offset 0x1c1)
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • F
      [PATCH] i386: Use safe_apic_wait_icr_idle in safe_apic_wait_icr_idle - i386 · f5efb41e
      Use safe_apic_wait_icr_idle to check ICR idle bit if the vector is
      NMI_VECTOR to avoid potential hangups in the event of crash when kdump
      tries to stop the other CPUs.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • F
      [PATCH] i386: __send_IPI_dest_field - i386 · 45ae5e96
      Implement __send_IPI_dest_field which can be used to send IPIs when the
      "destination shorthand" field of the ICR is set to 00 (destination
      field). Use it whenever possible.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • F
      [PATCH] i386: use safe_apic_wait_icr_idle in smpboot.c · 4312fa81
      Fernando Luis Vazquez Cao committed
      __inquire_remote_apic is used for APIC debugging, so use
      safe_apic_wait_icr_idle instead of apic_wait_icr_idle to avoid possible
      lockups when APIC delivery fails.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • F
      [PATCH] i386: use safe_apic_wait_icr_idle - i386 · ae08e43e
      Fernando Luis Vazquez Cao committed
      The functionality provided by the new safe_apic_wait_icr_idle is being
      open-coded all over "kernel/smpboot.c". Use safe_apic_wait_icr_idle
      instead to consolidate code and ease maintenance.
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • F
      [PATCH] i386: safe_apic_wait_icr_idle - i386 · f2b218dd
      Fernando Luis Vazquez Cao committed
      apic_wait_icr_idle looks like this:
      
      static __inline__ void apic_wait_icr_idle(void)
      {
        while (apic_read(APIC_ICR) & APIC_ICR_BUSY)
          cpu_relax();
      }
      
      The busy loop in this function would not be problematic if the
      corresponding status bit in the ICR were always updated, but that does
      not seem to be the case under certain crash scenarios. Kdump uses an IPI
      to stop the other CPUs in the event of a crash, but when any of the
      other CPUs are locked-up inside the NMI handler the CPU that sends the
      IPI will end up looping forever in the ICR check, effectively
      hard-locking the whole system.
      
      Quoting from Intel's "MultiProcessor Specification" (Version 1.4), B-3:
      
      "A local APIC unit indicates successful dispatch of an IPI by
      resetting the Delivery Status bit in the Interrupt Command
      Register (ICR). The operating system polls the delivery status
      bit after sending an INIT or STARTUP IPI until the command has
      been dispatched.
      
      A period of 20 microseconds should be sufficient for IPI dispatch
      to complete under normal operating conditions. If the IPI is not
      successfully dispatched, the operating system can abort the
      command. Alternatively, the operating system can retry the IPI by
      writing the lower 32-bit double word of the ICR. This “time-out”
      mechanism can be implemented through an external interrupt, if
      interrupts are enabled on the processor, or through execution of
      an instruction or time-stamp counter spin loop."
      
      Intel's documentation suggests the implementation of a time-out
      mechanism, which, by the way, is already being open-coded in some parts
      of the kernel that tinker with ICR.
      
      Create an apic_wait_icr_idle replacement that implements the time-out
      mechanism and that can be used to solve the aforementioned problem.
      
      AK: moved both functions out of line
      AK: added improved loop from Keith Owens
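A bounded polling loop of the kind described above might look like the following user-space sketch. It is not the code from this patch: the example_* names, the simulated ICR read, and the poll-count parameter are assumptions for illustration, and a real implementation would delay between polls rather than spin back-to-back.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the delivery-status (busy) bit for this sketch. */
#define EXAMPLE_ICR_BUSY 0x1000u

/* Simulated hardware state: the ICR reports busy for this many reads. */
static unsigned example_busy_polls_left;

/* Hypothetical stand-in for reading the ICR register. */
static uint32_t example_read_icr(void)
{
    if (example_busy_polls_left > 0) {
        example_busy_polls_left--;
        return EXAMPLE_ICR_BUSY;
    }
    return 0;
}

/*
 * Poll the busy bit at most max_polls times (at least once).
 * Returns 0 if the ICR went idle, or the busy bit if the wait timed out,
 * so the caller can abort or retry instead of spinning forever.
 */
uint32_t example_safe_wait_icr_idle(unsigned max_polls)
{
    uint32_t busy;
    unsigned polls = 0;

    do {
        busy = example_read_icr() & EXAMPLE_ICR_BUSY;
        if (!busy)
            break;
        /* a real implementation would insert a short delay here */
    } while (++polls < max_polls);

    return busy;
}
```

Returning the status rather than void is the key design difference from the unbounded loop quoted earlier: a caller such as a crash path can notice the time-out and carry on.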
      Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Andi Kleen <ak@suse.de>
    • B
      [PATCH] i386: Enable support for fixed-range IORRs to keep RdMem & WrMem in sync · de938c51
      Bernhard Kaindl committed
      If our copy of the MTRRs of the BSP has RdMem or WrMem set, and
      we are running on an AMD64/K8 system, the boot CPU must have had
      MtrrFixDramEn and MtrrFixDramModEn set (otherwise our RDMSR would
      have copied these bits cleared), so we set them on this CPU as well.
      
      This allows us to keep the AMD64/K8 RdMem and WrMem bits in sync
      across the CPUs of SMP systems in order to fulfill the duty of
      system software to "initialize and maintain MTRR consistency
      across all processors." as written in the AMD and Intel manuals.

      If a WRMSR instruction fails because MtrrFixDramModEn is not
      set, I expect that the Intel-style MTRR bits are not updated either.
      
      AK: minor cleanup, moved MSR defines around
      Signed-off-by: Bernhard Kaindl <bk@suse.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
    • B
      [PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspending · 3ebad590
      Bernhard Kaindl committed
      Note: This patch didn't need an update since its initial post.
      
      Some BIOSes may modify fixed-range MTRRs in SMM, e.g. when they
      transition the system into ACPI mode, which is entered thru an SMI,
      triggered by Linux in acpi_enable().
      
      SMIs which cause Linux to be interrupted and BIOS code to be
      executed in SMM (which may change e.g. fixed-range MTRRs) may
      also be raised at other occasions by an embedded system
      controller, which is often found in notebooks.

      If we did not update our copy of the fixed-range MTRRs before
      suspending to RAM or to disk, restore_processor_state() would
      set the fixed-range MTRRs of the BSP using old backup values
      which may be outdated, and this could cause the system to fail
      later during resume.
      
      This patch ensures that our copy of the fixed-range MTRRs
      is updated when saving the boot processor state on suspend
      to disk and suspend to RAM.
      
      In combination with other patches, this fixes s2ram
      and s2disk on the Acer Ferrari 1000 notebook and at least
      s2disk on the Acer Ferrari 5000 notebook.
      Signed-off-by: Bernhard Kaindl <bk@suse.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      3ebad590
    • B
      [PATCH] x86: Save the MTRRs of the BSP before booting an AP · 2b1f6278
      Bernhard Kaindl committed
      Applied fix by Andrew Morton:
      http://lkml.org/lkml/2007/4/8/88 - Fix `make headers_check'.
      
      AMD and Intel x86 CPU manuals state that it is the responsibility of
      system software to initialize and maintain MTRR consistency across
      all processors in Multi-Processing Environments.
      
      Quote from page 188 of the AMD64 System Programming manual (Volume 2):
      
      7.6.5 MTRRs in Multi-Processing Environments
      
      "In multi-processing environments, the MTRRs located in all processors must
      characterize memory in the same way. Generally, this means that identical
      values are written to the MTRRs used by the processors." (short omission here)
      "Failure to do so may result in coherency violations or loss of atomicity.
      Processor implementations do not check the MTRR settings in other processors
      to ensure consistency. It is the responsibility of system software to
      initialize and maintain MTRR consistency across all processors."
      
      Current Linux MTRR code already implements the above in the case that the
      BIOS does not properly initialize MTRRs on the secondary processors,
      but the case where the fixed-range MTRRs of the boot processor are changed
      after Linux has started to boot, before the initialisation of a secondary
      processor, is not handled yet.
      
      In this case, secondary processors are currently initialized by Linux
      with the MTRRs which the boot processor had very early, when mtrr_bp_init()
      ran, but not with the MTRRs which the boot processor uses at the
      time when that secondary processor is actually booted,
      causing differing MTRR contents on the secondary processors.
      
      Such a situation occurs on Acer Ferrari 1000 and 5000 notebooks where the
      BIOS enables and sets AMD-specific IORR bits in the fixed-range MTRRs
      of the boot processor when it transitions the system into ACPI mode.
      The SMI handler of the BIOS does this in SMM, entered while Linux ACPI
      code runs acpi_enable().
      
      Other occasions where the SMI handler of the BIOS may change bits in
      the MTRRs could occur as well. To initialize newly booted secondary
      processors with the fixed-range MTRRs which the boot processor uses
      at that time, this patch saves the fixed-range MTRRs of the boot
      processor before new secondary processors are started. When the
      secondary processors run their Linux initialisation code, their
      fixed-range MTRRs will be updated with the saved fixed-range MTRRs.
      
      If CONFIG_MTRR is not set, we define mtrr_save_state
      as an empty statement because there is nothing to do.
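
      The ordering problem described above can be sketched as a small
      user-space model. This is not the kernel code: struct cpu_model,
      the model_* helpers and boot_ap() are hypothetical names invented
      for illustration, and the bit pattern flipped by the simulated SMI
      is arbitrary. It only shows why refreshing the backup right before
      an AP boots keeps the AP consistent with the BSP.

```c
#include <assert.h>
#include <string.h>

#define NUM_FIXED_RANGES 88   /* number of fixed-range MTRR entries on x86 */

typedef unsigned char mtrr_type;

/* Hypothetical model of one CPU's fixed-range MTRR registers. */
struct cpu_model {
    mtrr_type fixed[NUM_FIXED_RANGES];
};

/* Kernel-side backup copy that APs are initialized from. */
static mtrr_type saved_fixed[NUM_FIXED_RANGES];

/* Copy the BSP's current registers into the backup (models mtrr_save_state()). */
static void model_save_state(const struct cpu_model *bsp)
{
    assert(bsp != NULL);
    memcpy(saved_fixed, bsp->fixed, sizeof(saved_fixed));
}

/* Program an AP from the backup (models the AP's MTRR initialisation). */
static void model_ap_init(struct cpu_model *ap)
{
    assert(ap != NULL);
    memcpy(ap->fixed, saved_fixed, sizeof(saved_fixed));
}

/* Model of a BIOS SMI handler changing an entry behind the kernel's back. */
static void model_smi_changes_bsp(struct cpu_model *bsp)
{
    bsp->fixed[0] ^= 0x18;   /* arbitrary illustrative bit change */
}

/* Returns 1 if the AP ends up consistent with the BSP, 0 otherwise. */
int boot_ap(int save_before_boot)
{
    struct cpu_model bsp, ap;

    memset(&bsp, 0x06, sizeof(bsp));   /* 0x06 = write-back memory type */
    memset(&ap, 0, sizeof(ap));

    model_save_state(&bsp);            /* early snapshot, as at mtrr_bp_init() time */
    model_smi_changes_bsp(&bsp);       /* e.g. while acpi_enable() runs */
    if (save_before_boot)
        model_save_state(&bsp);        /* the step this patch adds */
    model_ap_init(&ap);

    return memcmp(bsp.fixed, ap.fixed, sizeof(ap.fixed)) == 0;
}
```

      In this model, boot_ap(0) leaves the AP with the stale early
      snapshot, while boot_ap(1) gives it the values the BSP uses at
      boot time.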
      
      Possible TODOs:
      
      *) CPU-hotplugging outside of SMP suspend/resume is not yet tested
         with this patch.
      
      *) If, even in this case, an AP never runs i386/do_boot_cpu or x86_64/cpu_up,
         then the calls to mtrr_save_state() could be replaced by calls to
         mtrr_save_fixed_ranges(NULL) and mtrr_save_state() would not be
         needed.
      
         That would need either verification of the CPU-hotplug code or
         at least a test on a >2 CPU machine.
      
      *) The MTRRs of other running processors are not yet checked at this
         time but it might be interesting to synchronize the MTRRs of all
         processors before booting. That would be an incremental patch,
         but of rather low priority since there is no machine known so
         far which would require this.
      
      AK: moved prototypes on x86-64 around to fix warnings
      Signed-off-by: Bernhard Kaindl <bk@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      2b1f6278
    • B
      [PATCH] x86: Adds mtrr_save_fixed_ranges() for use in two later patches. · 2b3b4835
      Bernhard Kaindl committed
      mtrr_save_fixed_ranges() is used by two later patches. It accepts
      a dummy void pointer because, in the current implementation of one
      of these patches, it may be called from smp_call_function_single(),
      which requires a function that takes a void pointer argument.
      
      This function calls get_fixed_ranges(), passing mtrr_state.fixed_ranges
      which is the element of the static struct which stores our current
      backup of the fixed-range MTRR values which all CPUs shall be
      using.
      
      Because mtrr_save_fixed_ranges() calls get_fixed_ranges() after
      kernel initialisation time, __init needs to be removed from
      the declaration of get_fixed_ranges().
      
      If CONFIG_MTRR is not set, we define mtrr_save_fixed_ranges
      as an empty statement because there is nothing to do.
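
      The shape described above can be sketched in user space as
      follows. This is an illustration under stated assumptions, not the
      patch itself: fake_msrs stands in for the hardware registers,
      struct mtrr_state_model for the kernel's static state struct, and
      call_on_cpu_model() is a hypothetical stand-in for a cross-CPU
      call helper that only accepts a void (*)(void *) callback.

```c
#include <assert.h>
#include <string.h>

#define CONFIG_MTRR 1         /* assumption: delete to model a build without MTRR support */
#define NUM_FIXED_RANGES 88   /* number of fixed-range MTRR entries on x86 */

typedef unsigned char mtrr_type;

/* Model of the static struct holding the backup all CPUs shall use. */
struct mtrr_state_model {
    mtrr_type fixed_ranges[NUM_FIXED_RANGES];
};
struct mtrr_state_model mtrr_state;

/* Hypothetical stand-in for the fixed-range MTRR hardware registers. */
mtrr_type fake_msrs[NUM_FIXED_RANGES];

#ifdef CONFIG_MTRR
/* Read the current "hardware" values into the caller's buffer. */
void get_fixed_ranges(mtrr_type *frs)
{
    assert(frs != NULL);
    memcpy(frs, fake_msrs, sizeof(fake_msrs));
}

/* The void pointer is unused; it exists only to match the callback type. */
void mtrr_save_fixed_ranges(void *info)
{
    (void)info;
    get_fixed_ranges(mtrr_state.fixed_ranges);
}
#else
/* Without MTRR support there is nothing to save: expand to an empty statement. */
#define mtrr_save_fixed_ranges(arg) do {} while (0)
#endif

/* Hypothetical caller that, like a cross-CPU call, takes a void (*)(void *). */
void call_on_cpu_model(void (*func)(void *), void *info)
{
    func(info);
}
```

      With CONFIG_MTRR defined, the function can be called directly as
      mtrr_save_fixed_ranges(NULL) or passed to call_on_cpu_model(). In
      this sketch the empty-statement variant only covers direct calls.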
      
      AK: Moved prototypes for x86-64 around to fix warnings
      Signed-off-by: Bernhard Kaindl <bk@suse.de>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      2b3b4835