1. 23 Jan, 2007 1 commit
    • [PATCH] x86: fix PDA variables to work during boot · 9ee79a3d
      James Bottomley committed
      The current PDA code, which went in after 2.6.19, has a flaw in that it
      doesn't correctly cycle the GDT and %GS segment through the boot PDA,
      the CPU PDA and finally the per-cpu PDA.
      
      The bug generally doesn't show up if the boot CPU id is zero, but
      everything falls apart for a non-zero boot CPU id.  This basically kills
      voyager, which is perfectly capable of doing non-zero CPU id boots, so
      voyager currently won't boot without this.
      
      The fix is to be careful and actually do the GDT setups correctly.
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 11 Jan, 2007 1 commit
    • [PATCH] i386: cpu hotplug/smpboot misc MODPOST warning fixes · 4a5d107a
      Vivek Goyal committed
      o Misc smpboot/cpu hotplug path cleanups. I did those to suppress the
        warnings generated by MODPOST. These warnings are visible only
        if CONFIG_RELOCATABLE=y.
      
      o CONFIG_RELOCATABLE compiles the kernel with --emit-relocs option. This
        option retains relocation information in vmlinux file and MODPOST
        is quick to spit out "Section mismatch" warnings.
      
      o This patch fixes some of those warnings. Many of the functions in
        smpboot case are __devinit type and they in turn access text/data which
        is of type __cpuinit. Now if CONFIG_HOTPLUG=y and CONFIG_HOTPLUG_CPU=n
        then we end up in cases where a function in .text segment is calling
        another function in .init.text segment and MODPOST emits a warning.
      
      WARNING: vmlinux - Section mismatch: reference to .init.text:identify_cpu from .text between 'smp_store_cpu_info' (at offset 0xc011020d) and 'do_boot_cpu'
      WARNING: vmlinux - Section mismatch: reference to .init.text:init_gdt from .text between 'do_boot_cpu' (at offset 0xc01102ca) and '__cpu_up'
      WARNING: vmlinux - Section mismatch: reference to .init.text:print_cpu_info from .text between 'do_boot_cpu' (at offset 0xc01105d0) and '__cpu_up'
      
      o It also fixes the issue where CONFIG_HOTPLUG_CPU=y and start_secondary()
        is calling smp_callin(), which in turn calls synchronize_tsc_ap(), which is
        of type __init. This should have meant broken CPU hotplug.
      
      WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'start_secondary' (at offset 0xc011603f) and 'initialize_secondary'
      WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'MP_processor_info' (at offset 0xc0116a4f) and 'mp_register_lapic'
      WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'MP_processor_info' (at offset 0xc0116a4f) and 'mp_register_lapic'
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Andi Kleen <ak@suse.de>
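The section-mismatch rule behind these warnings can be sketched in a few lines of C. This is only an illustration of the idea (a reference from a permanent section into a discardable init section is suspicious); the function names are hypothetical and this is not MODPOST's actual implementation.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when a section name belongs to the init
 * family (.init.text, .init.data), which may be discarded after boot. */
bool is_init_section(const char *name)
{
    return strncmp(name, ".init.", 6) == 0;
}

/* A reference is flagged when it goes from a section that stays resident
 * (for example .text) into one that may already have been freed. */
bool section_mismatch(const char *from, const char *to)
{
    return !is_init_section(from) && is_init_section(to);
}
```

Under these assumptions, a call from .text into .init.text is flagged, while a call between two init sections is not.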
  3. 06 Jan, 2007 1 commit
    • [PATCH] i386: modpost smpboot code warning fix · 3771a450
      Vivek Goyal committed
      o Currently synchronize_tsc_ap() is of type __init. It is called by
        smp_callin() which is of type __cpuinit. So synchronize_tsc_ap()
        should be of type __cpuinit.
      
      o Modpost generates warnings for i386 if CONFIG_RELOCATABLE=y and
        CONFIG_HOTPLUG_CPU=y
      
      WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'start_secondary' (at offset 0xc01164dc) and 'initialize_secondary'
      WARNING: vmlinux - Section mismatch: reference to .init.data: from .text between 'start_secondary' (at offset 0xc01164e8) and 'initialize_secondary'
      
      o tsc is of type __initdata. It should be of type __cpuinitdata.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 14 Dec, 2006 1 commit
  5. 10 Dec, 2006 1 commit
  6. 09 Dec, 2006 1 commit
  7. 07 Dec, 2006 4 commits
  8. 22 Nov, 2006 1 commit
  9. 04 Oct, 2006 1 commit
  10. 02 Oct, 2006 1 commit
  11. 01 Oct, 2006 1 commit
  12. 30 Sep, 2006 1 commit
    • [PATCH] convert i386 Summit subarch to use SRAT info for apicid_to_node calls · 3b08606d
      keith mannthey committed
      Convert the i386 summit subarch apicid_to_node to use node information
      provided by the SRAT.  It was discussed a little on LKML a few weeks ago
      and was seen as an acceptable fix.  The current way of obtaining the nodeid
      
       static inline int apicid_to_node(int logical_apicid)
       {
         return logical_apicid >> 5;
       }
      
      is just not correct for all summit systems/bios.  Assuming the apicid
      matches the Linux node number requires a leap of faith that the bios mapped
      out the apicids a set way.  Modern summit HW (IBM x460) does not lay out its
      bios in this manner for various reasons and is unable to boot i386 numa.
      
      The best way to get the correct apicid to node information is from the SRAT
      table during boot.  It lays out what apicid belongs to what node.  I use
      this information to create a table for use at run time.
      Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
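The table-based approach described in this commit can be sketched as follows. The table size, the names, and the "unknown" marker are assumptions made for illustration; SRAT parsing itself is not shown, and this is not the code from the patch.

```c
#include <stdint.h>

#define MAX_APICID   256    /* assumed table size for this sketch */
#define NODE_UNKNOWN (-1)   /* assumed marker for "no SRAT entry seen" */

/* One slot per APIC id, filled in at boot from firmware data. */
static int8_t apicid_node_table[MAX_APICID];

/* Mark every APIC id as unmapped before the SRAT is walked. */
void apicid_table_init(void)
{
    for (unsigned int i = 0; i < MAX_APICID; i++)
        apicid_node_table[i] = NODE_UNKNOWN;
}

/* Would be called once per processor affinity entry found in the SRAT. */
void apicid_table_set(unsigned int apicid, int node)
{
    if (apicid < MAX_APICID)
        apicid_node_table[apicid] = (int8_t)node;
}

/* Run-time lookup that replaces the fixed "apicid >> 5" guess. */
int apicid_table_lookup(unsigned int apicid)
{
    if (apicid >= MAX_APICID)
        return NODE_UNKNOWN;
    return apicid_node_table[apicid];
}
```

With a firmware layout that puts APIC id 0x40 on node 1, the shift-based formula would report node 2, while the table reports whatever the SRAT recorded.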
  13. 26 Sep, 2006 4 commits
    • [PATCH] i386: don't taint UP K7's running SMP kernels. · 3ca113ea
      Dave Jones committed
      We have a test that looks for invalid pairings of certain athlon/durons
      that weren't designed for SMP, and taint accordingly (with 'S') if we find
      such a configuration.  However, this test shouldn't fire if there's only
      a single CPU present. It's perfectly valid for an SMP kernel to boot on UP
      hardware for example.
      
      AK: changed to num_possible_cpus()
      Signed-off-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
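The decision described above reduces to a small predicate. The sketch below is a simplified illustration with hypothetical names; the real check inspects CPU model and stepping details that are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical predicate: taint only when more than one CPU is possible
 * and the detected part is not one that was designed for SMP use. */
bool should_taint_unsafe_smp(unsigned int possible_cpus, bool cpu_designed_for_smp)
{
    if (possible_cpus <= 1)     /* SMP kernel booted on UP hardware: valid */
        return false;
    return !cpu_designed_for_smp;
}
```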
    • [PATCH] i386: Replace i386 open-coded cmdline parsing with · 1a3f239d
      Rusty Russell committed
      This patch replaces the open-coded early commandline parsing
      throughout the i386 boot code with the generic mechanism (already used
      by ppc, powerpc, ia64 and s390).  The code was inconsistent with
      whether it deletes the option from the cmdline or not, meaning some of
      these will get passed through the environment into init.
      
      This transformation is mainly mechanical, but there are some notable
      parts:
      
      1) Grammar: s/linux never set's it up/linux never sets it up/
      
      2) Remove hacked-in earlyprintk= option scanning.  When someone
         actually implements CONFIG_EARLY_PRINTK, then they can use
         early_param().
      [AK: actually it is implemented, but I'm adding the early_param it in the next
      x86-64 patch]
      
      3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h
      
      4) Various parameters now moved into their appropriate files (thanks Andi).
      
      5) All parse functions which examine arg need to check for NULL,
         except one where it has subtle humor value.
      
      AK: readded acpi_sci handling which was completely dropped
      AK: moved some more variables into acpi/boot.c
      
      Cc: len.brown@intel.com
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
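Point 5 above (parse functions must check arg for NULL) is easiest to see with a table-driven parser. The following is a user-space sketch loosely modelled on the early_param() idea; the struct, the function names, and the return convention are assumptions for illustration, not the kernel's interface.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option table entry: a name and its handler. */
struct early_opt {
    const char *name;
    int (*setup)(const char *arg);
};

/* Example state and handler (illustrative names only). */
int demo_maxcpus = -1;

int demo_setup_maxcpus(const char *arg)
{
    if (!arg)                   /* option given without "=value" */
        return -1;
    demo_maxcpus = (int)strtol(arg, NULL, 10);
    return 0;
}

/* Dispatch one "name" or "name=value" token to its handler.
 * Returns the handler's result, or 1 if no handler matched. */
int parse_early_token(const struct early_opt *opts, size_t n, const char *token)
{
    const char *eq = strchr(token, '=');
    size_t len = eq ? (size_t)(eq - token) : strlen(token);
    const char *arg = eq ? eq + 1 : NULL;   /* NULL when there is no value */

    for (size_t i = 0; i < n; i++) {
        if (strlen(opts[i].name) == len &&
            strncmp(opts[i].name, token, len) == 0)
            return opts[i].setup(arg);
    }
    return 1;
}
```

Because the dispatcher passes NULL for a bare option name, a handler that dereferences arg without checking would crash on input such as "maxcpus".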
    • [PATCH] i386/x86-64: Fix NMI watchdog suspend/resume · 4038f901
      Shaohua Li committed
      Making NMI suspend/resume work with SMP. We use CPU hotplug to offline
      APs in SMP suspend/resume. Only BSP executes sysdev's .suspend/.resume
      method. APs should follow CPU hotplug code path.
      
      And:
      
      +From: Don Zickus <dzickus@redhat.com>
      
      Makes the start/stop paths of nmi watchdog more robust to handle the
      suspend/resume cases more gracefully.
      
      AK: I merged the two patches together
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
    • [PATCH] i386: fix flat mode numa on a real numa system · bfa0e9a0
      keith mannthey committed
      If there is only 1 node in the system, cpus should not think they are a
      part of some other node.
      
      In cases where a real numa system boots with the flat numa option, make
      sure the cpus don't claim to be a part of a non-existent node.
      Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  14. 01 Aug, 2006 1 commit
  15. 01 Jul, 2006 1 commit
  16. 28 Jun, 2006 3 commits
  17. 27 Jun, 2006 1 commit
  18. 26 Jun, 2006 1 commit
  19. 28 Apr, 2006 1 commit
  20. 28 Mar, 2006 1 commit
    • [PATCH] sched: new sched domain for representing multi-core · 1e9f28fa
      Siddha, Suresh B committed
      Add a new sched domain for representing multi-core with shared caches
      between cores.  Consider a dual package system, each package containing two
      cores and with last level cache shared between cores within a package.  If
      there are two runnable processes, with this appended patch those two
      processes will be scheduled on different packages.
      
      On such systems, with this patch we have observed 8% perf improvement with
      specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2
      users).
      
      This new domain will come into play only on multi-core systems with shared
      caches.  On other systems, this sched domain will be removed by domain
      degeneration code.  This new domain can be also used for implementing power
      savings policy (see OLS 2005 CMP kernel scheduler paper for more details..
      I will post another patch for power savings policy soon)
      
      Most of the arch/* file changes are for cpu_coregroup_map() implementation.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
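The placement effect described in this commit (two runnable processes ending up on different packages) can be illustrated with a toy selector. The topology table and the selection rule below are assumptions for this sketch; the real scheduler balances load through its domain hierarchy rather than through a function like this.

```c
#define NCPUS     4
#define NPACKAGES 2

/* Assumed topology: two packages, two cores each, with the last level
 * cache shared between the cores of one package. */
static const int cpu_package[NCPUS] = { 0, 0, 1, 1 };

/* Pick an idle CPU in the least busy package, so that two tasks do not
 * compete for one shared cache. Returns -1 if no CPU is idle. */
int pick_cpu_spread(const int busy[NCPUS])
{
    int load[NPACKAGES] = { 0, 0 };
    int best = -1;

    for (int cpu = 0; cpu < NCPUS; cpu++)
        if (busy[cpu])
            load[cpu_package[cpu]]++;

    for (int cpu = 0; cpu < NCPUS; cpu++) {
        if (busy[cpu])
            continue;
        if (best < 0 || load[cpu_package[cpu]] < load[cpu_package[best]])
            best = cpu;
    }
    return best;
}
```

With one task already running on CPU 0, this selector places the second task on a core of the other package instead of on CPU 1.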
  21. 26 Mar, 2006 1 commit
  22. 23 Mar, 2006 1 commit
    • [PATCH] x86: SMP alternatives · 9a0b5817
      Gerd Hoffmann committed
      Implement SMP alternatives, i.e.  switching at runtime between different
      code versions for UP and SMP.  The code can patch both SMP->UP and UP->SMP.
      The UP->SMP case is useful for CPU hotplug.
      
      With CONFIG_HOTPLUG_CPU enabled the code switches to UP at boot time and
      when the number of CPUs goes down to 1, and switches to SMP when the number
      of CPUs goes up to 2.
      
      Without CONFIG_HOTPLUG_CPU or on non-SMP-capable systems the code is
      patched once at boot time (if needed) and the tables are released
      afterwards.
      
      The changes in detail:
      
        * The current alternatives bits are moved to a separate file,
          the SMP alternatives code is added there.
      
        * The patch adds some new elf sections to the kernel:
          .smp_altinstructions
              like .altinstructions, also contains a list
              of alt_instr structs.
          .smp_altinstr_replacement
              like .altinstr_replacement, but also has some space to
              save original instruction before replacing it.
          .smp_locks
              list of pointers to lock prefixes which can be nop'ed
              out on UP.
          The first two are used to replace more complex instruction
          sequences such as spinlocks and semaphores.  It would be possible
          to deal with the lock prefixes with that as well, but by handling
          them as special case the table sizes become much smaller.
      
        * The sections are page-aligned and padded up to page size, so they
          can be freed if they are not needed.
      
        * Split the code to release init pages to a separate function and
          use it to release the elf sections if they are unused.
      Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
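The .smp_locks idea (a list of locations whose lock prefix can be nop'ed out on UP and restored for SMP) can be sketched on a plain byte buffer. The function names and the offset-list representation are assumptions for illustration; the kernel patches live kernel text through pointers collected by the linker, which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>

#define X86_LOCK_PREFIX 0xf0    /* x86 lock prefix byte */
#define X86_NOP         0x90    /* x86 single-byte nop */

/* SMP -> UP: overwrite each recorded lock prefix with a nop. */
void demo_smp_unlock(uint8_t *text, const size_t *locks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (text[locks[i]] == X86_LOCK_PREFIX)
            text[locks[i]] = X86_NOP;
}

/* UP -> SMP: put the lock prefix back at each recorded location. */
void demo_smp_lock(uint8_t *text, const size_t *locks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (text[locks[i]] == X86_NOP)
            text[locks[i]] = X86_LOCK_PREFIX;
}
```

Only the recorded offsets are touched, so an ordinary nop elsewhere in the buffer is never turned into a lock prefix.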
  23. 17 Mar, 2006 1 commit
  24. 25 Feb, 2006 1 commit
    • [PATCH] x86: fix broken SMP boot sequence · 2b932f6c
      James Bottomley committed
      Recent GDT changes broke the SMP boot sequence if the booting CPU is
      numbered anything other than zero.  There's also a subtle source of error
      in that the boot time CPU now uses cpu_gdt_table (which is actually the GDT
      for booting CPUs in head.S).  This patch fixes both problems by making GDT
      descriptors themselves allocated from a per_cpu area and switching to them
      in cpu_init(), which now means that cpu_gdt_table is exclusively used for
      booting CPUs again.
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Matt Tolentino <metolent@snoqualmie.dp.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  25. 11 Feb, 2006 1 commit
    • [PATCH] x86: don't initialise cpu_possible_map to all ones · 7a8ef1cb
      Andrew Morton committed
      Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that
      
      a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather
         than over possible ones.  That can be quite expensive.
      
      b) Soon we'll be allocating per-cpu areas only for possible CPUs.  So with
         CPU_MASK_ALL, we'll be wasting memory.
      
      I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug
      case.  Should be OK..
      
      I note that parisc is also using CPU_MASK_ALL.  Suggest that it stop doing
      that.
      
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Zwane Mwaikambo <zwane@linuxpower.ca>
      Cc: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
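Both costs listed above (loop iterations and per-cpu memory) follow directly from how many bits are set in the possible map. The sketch below uses a 32-bit mask and hypothetical function names; the kernel's cpumask type and iterators are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 32  /* assumed compile-time maximum for this sketch */

/* Count how many CPUs a loop over the given map would visit. */
unsigned int count_iterated_cpus(uint32_t cpu_map)
{
    unsigned int visited = 0;

    for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu_map & (UINT32_C(1) << cpu))
            visited++;
    return visited;
}

/* Memory needed if one per-cpu area is allocated per possible CPU. */
size_t percpu_bytes_needed(uint32_t possible_map, size_t area_size)
{
    return (size_t)count_iterated_cpus(possible_map) * area_size;
}
```

On a two-CPU machine, an all-ones map makes every such loop visit 32 entries and reserves 32 areas, while a map with only the two real CPUs set visits and reserves 2.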
  26. 13 Jan, 2006 2 commits
    • [PATCH] i386: fix task_pt_regs() · 07b047fc
      akpm@osdl.org committed
      
      From: Al Viro <viro@ftp.linux.org.uk>
      
      task_pt_regs() needs the same offset-by-8 to match copy_thread()
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] scheduler cache-hot-autodetect · 198e2f18
      akpm@osdl.org committed
      
      From: Ingo Molnar <mingo@elte.hu>
      
      This is the latest version of the scheduler cache-hot-auto-tune patch.
      
      The first problem was that detection time scaled with O(N^2), which is
      unacceptable on larger SMP and NUMA systems. To solve this:
      
      - I've added a 'domain distance' function, which is used to cache
        measurement results. Each distance is only measured once. This means
        that e.g. on NUMA distances of 0, 1 and 2 might be measured, on HT
        distances 0 and 1, and on SMP distance 0 is measured. The code walks
        the domain tree to determine the distance, so it automatically follows
        whatever hierarchy an architecture sets up. This cuts down on the boot
        time significantly and removes the O(N^2) limit. The only assumption
        is that migration costs can be expressed as a function of domain
        distance - this covers the overwhelming majority of existing systems,
        and is a good guess even for more asymmetric systems.
      
        [ People hacking systems that have asymmetries that break this
          assumption (e.g. different CPU speeds) should experiment a bit with
          the cpu_distance() function. Adding a ->migration_distance factor to
          the domain structure would be one possible solution - but let's first
          see the problem systems, if they exist at all. Let's not overdesign. ]
      
      Another problem was that only a single cache-size was used for measuring
      the cost of migration, and most architectures didn't set that variable
      up. Furthermore, a single cache-size does not fit NUMA hierarchies with
      L3 caches and does not fit HT setups, where different CPUs will often
      have different 'effective cache sizes'. To solve this problem:
      
      - Instead of relying on a single cache-size provided by the platform and
        sticking to it, the code now auto-detects the 'effective migration
        cost' between two measured CPUs, via iterating through a wide range of
        cachesizes. The code searches for the maximum migration cost, which
        occurs when the working set of the test-workload falls just below the
        'effective cache size'. I.e. real-life optimized search is done for
        the maximum migration cost, between two real CPUs.
      
        This, amongst other things, has the positive effect that if e.g. two
        CPUs share a L2/L3 cache, a different (and accurate) migration cost
        will be found than between two CPUs on the same system that don't share
        any caches.
      
      (The reliable measurement of migration costs is tricky - see the source
      for details.)
      
      Furthermore I've added various boot-time options to override/tune
      migration behavior.
      
      Firstly, there's a blanket override for autodetection:
      
      	migration_cost=1000,2000,3000
      
      will override the depth 0/1/2 values with 1msec/2msec/3msec values.
      
      Secondly, there's a global factor that can be used to increase (or
      decrease) the autodetected values:
      
      	migration_factor=120
      
      will increase the autodetected values by 20%. This option is useful to
      tune things in a workload-dependent way - e.g. if a workload is
      cache-insensitive then CPU utilization can be maximized by specifying
      migration_factor=0.
      
      I've tested the autodetection code quite extensively on x86, on 3
      P3/Xeon/2MB, and the autodetected values look pretty good:
      
      Dual Celeron (128K L2 cache):
      
       ---------------------
       migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
       ---------------------
                 [00]    [01]
       [00]:     -     1.7(1)
       [01]:   1.7(1)    -
       ---------------------
       cacheflush times [2]: 0.0 (0) 1.7 (1784008)
       ---------------------
      
      Here the slow memory subsystem dominates system performance, and even
      though caches are small, the migration cost is 1.7 msecs.
      
      Dual HT P4 (512K L2 cache):
      
       ---------------------
       migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
       ---------------------
                 [00]    [01]    [02]    [03]
       [00]:     -     0.4(1)  0.0(0)  0.4(1)
       [01]:   0.4(1)    -     0.4(1)  0.0(0)
       [02]:   0.0(0)  0.4(1)    -     0.4(1)
       [03]:   0.4(1)  0.0(0)  0.4(1)    -
       ---------------------
       cacheflush times [2]: 0.0 (33900) 0.4 (448514)
       ---------------------
      
      Here it can be seen that there is no migration cost between two HT
      siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory
      system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.
      
      8-way P3/Xeon [2MB L2 cache]:
      
       ---------------------
       migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
       ---------------------
                 [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]
       [00]:     -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
       [01]:  19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
       [02]:  19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)
       [03]:  19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1) 19.2(1)
       [04]:  19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1) 19.2(1)
       [05]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1) 19.2(1)
       [06]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -    19.2(1)
       [07]:  19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1) 19.2(1)    -
       ---------------------
       cacheflush times [2]: 0.0 (0) 19.2 (19281756)
       ---------------------
      
      This one has huge caches and a relatively slow memory subsystem - so the
      migration cost is 19 msecs.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Cc: <wilder@us.ibm.com>
      Signed-off-by: John Hawkes <hawkes@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
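The interaction of the two boot options described above can be written as a small worked function. This is a sketch under stated assumptions: values are in microseconds (matching migration_cost=1000 meaning 1 msec), a negative override means "not given", and the factor applies only to autodetected values, as the text says. The function name is hypothetical and the kernel's internal scaling is not reproduced.

```c
/* Effective migration cost for one domain distance, in microseconds.
 * override_usec < 0 is this sketch's convention for "no migration_cost=
 * value given for this depth". */
long long effective_migration_cost(long long measured_usec,
                                   long long override_usec,
                                   unsigned int factor_percent)
{
    if (override_usec >= 0)         /* blanket override wins */
        return override_usec;
    /* migration_factor=120 raises the autodetected value by 20% */
    return measured_usec * (long long)factor_percent / 100;
}
```

For the 1.7 msec value from the dual Celeron example, migration_factor=120 gives 1700 * 120 / 100 = 2040 usec, and migration_factor=0 gives 0.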
  27. 07 Jan, 2006 1 commit
  28. 13 Dec, 2005 1 commit
  29. 15 Nov, 2005 1 commit
  30. 09 Nov, 2005 1 commit
    • [PATCH] sched: disable preempt in idle tasks · 5bfb5d69
      Nick Piggin committed
      Run idle threads with preempt disabled.
      
      Also corrected a bug in arm26's cpu_idle (make it actually call schedule()).
      How did it ever work before?
      
      Might fix the CPU hotplugging hang which Nigel Cunningham noted.
      
      We think the bug hits if the idle thread is preempted after checking
      need_resched() and before going to sleep, and the CPU is then offlined.
      
      After calling stop_machine_run, the CPU eventually returns from preemption and
      into the idle thread and goes to sleep.  The CPU will continue executing
      previous idle and have no chance to call play_dead.
      
      By disabling preemption until we are ready to explicitly schedule, this bug is
      fixed and the idle threads generally become more robust.
      
      From: alexs <ashepard@u.washington.edu>
      
        PPC build fix
      
      From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      
        MIPS build fix
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  31. 07 Nov, 2005 1 commit