1. 28 March 2006: 1 commit
    • [PATCH] sched: new sched domain for representing multi-core · 1e9f28fa
      Committed by Siddha, Suresh B
      Add a new sched domain for representing multi-core with shared caches
      between cores.  Consider a dual-package system, each package containing two
      cores and with the last-level cache shared between cores within a package.  If
      there are two runnable processes, with this patch those two
      processes will be scheduled on different packages.
      
      On such systems, with this patch we have observed an 8% performance improvement
      with the specJBB (2 warehouse) benchmark and a 35% improvement with CFP2000 rate
      (with 2 users).
      
      This new domain will come into play only on multi-core systems with shared
      caches.  On other systems, this sched domain will be removed by the domain
      degeneration code.  This new domain can also be used for implementing a
      power-savings policy (see the OLS 2005 CMP kernel scheduler paper for more
      details; I will post another patch for the power-savings policy soon).
      
      Most of the arch/* file changes are for the cpu_coregroup_map() implementation.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1e9f28fa
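      For illustration only (not from the patch above): a minimal sketch of the
      kind of per-arch helper the entry refers to.  The cpu_core_map array and
      the body are assumptions, not the patch's actual code.

        /* Return the CPUs that share a last-level cache with 'cpu'; the
         * scheduler builds the new multi-core domain from this map. */
        static inline cpumask_t cpu_coregroup_map(int cpu)
        {
                return cpu_core_map[cpu];       /* cores in the same package */
        }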
  2. 27 March 2006: 4 commits
    • [PATCH] bitops: i386: use generic bitops · 1cc2b994
      Committed by Akinobu Mita
      - remove generic_fls64()
      - remove sched_find_first_bit()
      - remove generic_hweight{32,16,8}()
      - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
      - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
      Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1cc2b994
    • [PATCH] x86: kprobes-booster · 311ac88f
      Committed by Masami Hiramatsu
      Currently, kprobes copies the original instruction at the probe point and
      replaces it with a breakpoint instruction (int3).  When the kernel hits the
      probe point, the kprobe handler is invoked and the copied instruction is
      single-stepped in the copy buffer (not at the original address).  After
      that, kprobes checks the registers and modifies them (if needed) as if the
      instruction had been executed at the original address.
      
      My proposal is based on the fact that many instructions do NOT
      require any register modification after single-step execution.  When the
      copied instruction is one of them, kprobes just jumps back to the next
      instruction after the single step.  If so, why not execute those
      instructions directly?
      
      With the kprobe-booster patch, kprobes executes the copied instruction directly
      and (if needed) jumps back to the original code.  This direct execution is used
      when the kprobe has neither a post_handler nor a break_handler, and the copied
      instruction can be executed directly.
      
      I sorted the instructions into those which can be executed directly and those which cannot:
      
      - Call instructions are NG (cannot be executed directly).
        We would have to correct the return address pushed onto the top of the stack.
      - Indirect instructions except for absolute indirect jumps
        are NG. Those instructions change EIP unpredictably. We would have to
        check EIP and correct it.
      - Instructions that change EIP beyond the range of the
        instruction buffer are NG.
      - Instructions that change EIP into the tail 5 bytes of the
        instruction buffer (the size of a jump instruction) are NG.
        We must write a jump instruction there which goes back to the
        original kernel code.
      - The breakpoint instruction is NG. We should not touch EIP and
        should pass control to other handlers.
      - Absolute direct/indirect jumps are OK.
      - Conditional jumps are NG.
      - Halt and software interrupts are NG, because execution would stay
        in the instruction buffer of kprobes.
      - Prefixes are NG.
      - Unknown/reserved opcodes are NG.
      - Other one-byte instructions are OK, but those instructions need
        jump-back code.
      - Two-byte instructions are mapped sparsely, so in this release
        this patch does not boost those instructions.
      
      From Intel's IA-32 opcode map described in the IA-32 Intel Architecture Software
      Developer's Manual Vol. 2B, I determined that the following opcodes are not
      boostable.
      
      - 0FH (2byte escape)
      - 70H - 7FH (Jump on condition)
      - 9AH (Call) and 9CH (Pushf)
      - C0H-C1H (Grp 2: includes reserved opcode)
      - C6H-C7H (Grp11: includes reserved opcode)
      - CCH-CEH (Software-interrupt)
      - D0H-D3H (Grp2: includes reserved opcode)
      - D6H (Reserved)
      - D8H-DFH (Coprocessor)
      - E0H-E3H (loop/conditional jump)
      - E8H (Call)
      - F0H-F3H (Prefixes and reserved)
      - F4H (Halt)
      - F6H-F7H (Grp3: includes reserved opcode)
      - FEH-FFH (Grp4,5: includes reserved opcode)
      
      Kprobe-booster checks whether the target instruction can be boosted (can be
      executed directly) in the arch_copy_kprobe() function.  If the target instruction
      can be boosted, it clears the "boostable" flag; if not, it sets the "boostable"
      flag to -1, which means boosting is disabled.  In the resume_execution() function,
      if the "boostable" flag is cleared, kprobe-booster measures the size of the
      target instruction and sets the "boostable" flag to 1.
      
      In kprobe_handler(), kprobes checks the "boostable" flag.  If the flag is 1, it
      resets the current kprobe and executes the instruction buffer directly instead of
      single-stepping.
      
      When unregistering a boosted kprobe, it calls synchronize_sched()
      after the "int3" is removed, so after synchronize_sched() returns we can
      ensure the following:
      - interrupt handlers have finished on all CPUs;
      - the instruction buffer is no longer being executed on any CPU.
      We can then release the boosted kprobe safely.
      
      Also, on a preemptible kernel, the booster is not enabled where kernel
      preemption is enabled.  So there are no preempted threads sitting in the
      instruction buffer.
      
      The description of kretprobe-booster:
      ====================================
      
      In normal operation, kretprobe makes the target function return to trampoline
      code, where a kprobe (called trampoline_probe) has been inserted.  When the
      kernel hits this kprobe, it calls the kretprobe's handler and then returns to
      the original return address.
      
      The kretprobe-booster patch removes the trampoline_probe.  It allows the
      trampoline code to call the kretprobe's handler directly instead of invoking a
      kprobe, and the trampoline code then returns to the original return address.
      
      This new trampoline code stores and restores registers, so the kretprobe
      handler is still able to access those registers.
      
      The current kprobe has about 1.3 usec/probe(*) of overhead, and the kprobe-booster
      patch reduces it to 0.6 usec/probe(*).  The current kretprobe has about 2.0
      usec/probe(*) of overhead; the kprobe-booster patch reduces it to 1.3 usec/probe(*),
      and the combination of the kprobe-booster and kretprobe-booster patches
      reduces it to 0.9 usec/probe(*).
      
      I expect the combination of both patches to cut the probing overhead roughly
      in half.
      
      Performance numbers strongly depend on the processor model.
      
      Andrew Morton wrote:
      > These preempt tricks look rather nasty.  Can you please describe what the
      > problem is, precisely?  And how this code avoids it?  Perhaps we can find
      > something cleaner.
      
      The problem is how to remove the copied instructions of the
      kprobe *safely* on a preemptible kernel (CONFIG_PREEMPT=y).
      
      Kprobes basically executes the following actions:
      
      (1) int3
      (2) preempt_disable()
      (3) kprobe_prehandler()
      (4) copied instruction (single step)
      (5) kprobe_posthandler()
      (6) preempt_enable()
      (7) return to the original code
      
      During the execution of the copied instruction, preemption is
      disabled (from step (2) to step (6)).
      When unregistering the probes, kprobes waits for an RCU
      quiescent state by using synchronize_sched() after removing
      the int3 instruction.
      Thus we can ensure the copied instruction is no longer being executed.
      
      On the other hand, kprobe-booster executes the following actions:
      
      (1) int3
      (2) preempt_disable()
      (3) kprobe_prehandler()
      (4) preempt_enable()             <-- this one is added by my patch
      (5) copied instruction (direct execution)
      (6) jmp back to the original code
      
      The problem is that we have no way to prevent preemption at
      step (5) or (6). We cannot call preempt_disable() after step (6),
      because there is no room to do so. Thus, other
      processes may be preempted at step (5) or (6) on a preemptible kernel,
      and I couldn't find an easy way to ensure that other processes'
      stacks do *not* hold the addresses of the copied instructions. (I thought
      of some ways to do that, but they are all very costly.)
      
      So currently, I simply boost the kprobe only when preemption is already
      disabled at the probe point.
      
      > Also, the patch adds a preempt_enable() but I don't see a corresponding
      > preempt_disable().  Am I missing something?
      
      It corresponds to the preempt_disable() at the top of
      kprobe_handler().
      I have copied the code of kprobe_handler() here:
      
      static int __kprobes kprobe_handler(struct pt_regs *regs)
      {
              struct kprobe *p;
              int ret = 0;
              kprobe_opcode_t *addr = NULL;
              unsigned long *lp;
              struct kprobe_ctlblk *kcb;
      
              /*
               * We don't want to be preempted for the entire
               * duration of kprobe processing
               */
              preempt_disable();             <-- HERE
              kcb = get_kprobe_ctlblk();
      Signed-off-by: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      311ac88f
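      A condensed, illustrative sketch of the boostability check described in the
      entry above; the opcode list is abridged and the helper name is an
      assumption, not the patch's actual code.

        /* Return 1 if the instruction starting with 'opcode' may run straight
         * from the kprobe copy buffer (followed by a jump back), 0 if it must
         * be single-stepped (calls, jumps, prefixes, int3, hlt, ...). */
        static int can_boost(kprobe_opcode_t opcode)
        {
                switch (opcode) {
                case 0x0f:              /* two-byte escape                 */
                case 0x70 ... 0x7f:     /* conditional jumps               */
                case 0x9a:              /* call far                        */
                case 0xcc ... 0xce:     /* int3 / int n / into             */
                case 0xe0 ... 0xe3:     /* loop / jcxz                     */
                case 0xe8:              /* call rel32                      */
                case 0xf0 ... 0xf4:     /* prefixes and hlt                */
                        return 0;       /* not boostable (list abridged)   */
                default:
                        return 1;       /* candidate for direct execution  */
                }
        }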
    • [PATCH] 2TB files: add blkcnt_t · a0f62ac6
      Committed by Takashi Sato
      Add blkcnt_t as the type of inode.i_blocks.  This enables you to make the size
      of blkcnt_t either 4 bytes or 8 bytes on 32-bit architectures with CONFIG_LSF.
      
      - CONFIG_LSF
        Add new configuration parameter.
      - blkcnt_t
        On h8300, i386, mips, powerpc, s390 and sh, which define sector_t,
        blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is
        defined as unsigned long.
        On other architectures, it is defined as unsigned long.
      - inode.i_blocks
        Change the type from sector_t to blkcnt_t.
      Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a0f62ac6
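      For illustration, the type change above boils down to roughly this shape
      (a sketch, not the exact patch):

        #ifdef CONFIG_LSF
        typedef u64 blkcnt_t;            /* 8-byte i_blocks on 32-bit archs */
        #else
        typedef unsigned long blkcnt_t;  /* the historical 4-byte size     */
        #endif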
    • [PATCH] 2TB files: st_blocks is invalid when calling stat64 · abcb6c9f
      Committed by Takashi Sato
      This patch series fixes the following problems on 32-bit architectures.
      
      o stat64 returns the lower 32 bits of blocks, although userland st_blocks
        has 64 bits, because i_blocks has only 32 bits.  The ioctl with FIOQSIZE has
        the same problem.
      
      o As Dave Kleikamp said, creating a >2TB file on JFS results in writing an
        invalid block number to the on-disk inode.  The cause is the same as above.
      
      o In generic quota code dquot_transfer(), the file usage is calculated from
        i_blocks via inode_get_bytes().  If the file is over 2TB, the change of
        usage is less than expected.  The cause is the same as above too.
      
      o As Trond Myklebust said, the block-related entries returned by statfs64 are
        invalid for a network filesystem which has more than 2^32-1 blocks with
        CONFIG_LBD disabled.  [PATCH 3/3]
      
      We made patches to fix problems that occur when handling a large filesystem
      and a large file.  It was discussed on the mails titled "stat64 for over 2TB
      file returned invalid st_blocks".
      Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      abcb6c9f
  3. 26 March 2006: 2 commits
    • [PATCH] x86_64: Implement early DMI scanning · f2d3efed
      Committed by Andi Kleen
      There are more and more cases where we need to know DMI information
      early to work around bugs.  i386 already had early DMI scanning, but
      x86-64 didn't.  Implement this now.
      
      This required some cleanup in the i386 code.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f2d3efed
    • [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications · f348d70a
      Committed by Davide Libenzi
      Implement half-closed device notification by adding a new POLLRDHUP
      (and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since
      people were wary of changing the existing POLLHUP handling, which does not
      correctly report half-closed devices, this implementation leaves the current
      POLLHUP reporting unchanged and simply adds a new bit that is set in the few
      places where it makes sense.  The same thing was discussed and conceptually
      agreed on quite some time ago:
      
      http://lkml.org/lkml/2003/7/12/116
      
      Since this new event bit is added to the existing Linux poll infrastructure,
      even the existing poll/select system calls will be able to use it.  As for
      the existing POLLHUP handling, the patch leaves it as is.  The
      pollrdhup-2.6.16.rc5-0.10.diff defines POLLRDHUP for all the existing
      archs and sets the bit in the six relevant files.  The other attached diff
      is the simple change required to sys/epoll.h to add the EPOLLRDHUP
      definition.
      
      There is "a stupid program" to test POLLRDHUP delivery here:
      
       http://www.xmailserver.org/pollrdhup-test.c
      
      It tests poll(2), but since the delivery is the same, epoll(2) will work equally well.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f348d70a
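      A small userspace sketch of how the new bit can be consumed (illustrative;
      on older headers POLLRDHUP may have to be defined by hand, its value on
      most architectures being 0x2000):

        #define _GNU_SOURCE
        #include <poll.h>

        /* Block until the peer half-closes its end of 'sock'. */
        static int wait_for_peer_halfclose(int sock)
        {
                struct pollfd pfd = { .fd = sock, .events = POLLRDHUP };

                if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLRDHUP))
                        return 1;       /* peer did shutdown(SHUT_WR) */
                return 0;
        }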
  4. 24 March 2006: 2 commits
  5. 23 March 2006: 12 commits
    • [PATCH] atomic: add_unless cmpxchg optimise · 0b2fcfdb
      Committed by Nick Piggin
      Without branch hints, the very unlikely case of the loop repeating due to
      cmpxchg failure gets unrolled by the gcc-4 versions I have tested.
      
      Improve this for architectures with a native cas/cmpxchg.  ll/sc archs
      should try to implement this natively.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0b2fcfdb
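      An illustrative sketch of the hinted cmpxchg loop the entry describes
      (shape only; not necessarily the exact code that was merged):

        /* Add 'a' to *v unless *v equals 'u'; return non-zero if the add happened. */
        static inline int atomic_add_unless(atomic_t *v, int a, int u)
        {
                int c, old;

                c = atomic_read(v);
                for (;;) {
                        if (unlikely(c == u))
                                break;
                        old = atomic_cmpxchg(v, c, c + a);
                        if (likely(old == c))
                                break;          /* common case: cmpxchg won   */
                        c = old;                /* rare case: raced, so retry */
                }
                return c != u;
        }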
    • [PATCH] Move read_mostly definition to asm/cache.h · 804f1594
      Committed by Kyle McMartin
      Seems like needless clutter having a bunch of #if defined(CONFIG_$ARCH) in
      include/linux/cache.h.  Move the per architecture section definition to
      asm/cache.h, and keep the if-not-defined dummy case in linux/cache.h to
      catch architectures which don't implement the section.
      
      Verified that symbols still go in .data.read_mostly on parisc,
      and the compile doesn't break.
      Signed-off-by: Kyle McMartin <kyle@parisc-linux.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      804f1594
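      The arrangement described above comes down to something like the following
      (illustrative sketch):

        /* include/asm-$ARCH/cache.h: the architecture chooses the section. */
        #define __read_mostly __attribute__((__section__(".data.read_mostly")))

        /* include/linux/cache.h: dummy fallback if the arch defines nothing. */
        #ifndef __read_mostly
        #define __read_mostly
        #endif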
    • [PATCH] x86: Make _syscallX() macros compile in PIC mode · aeefc956
      Committed by Markus Gutschke
      Gcc reserves %ebx when compiling position-independent code on i386.  This
      means the _syscallX() macros in include/asm-i386/unistd.h will not
      compile.  This patch changes the existing macros to take special care to
      preserve %ebx.
      
      The bug can be tracked at http://bugzilla.kernel.org/show_bug.cgi?id=6204
      Signed-off-by: Markus Gutschke <markus@google.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      aeefc956
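      An illustrative sketch of the approach for the one-argument case (the real
      patch covers every _syscallX variant and may differ in detail):

        /* Save and restore %ebx around int $0x80 so the macro also builds as
         * PIC, where gcc reserves %ebx for the GOT pointer. */
        #define _syscall1(type, name, type1, arg1)                        \
        type name(type1 arg1)                                             \
        {                                                                 \
                long __res;                                               \
                __asm__ volatile ("push %%ebx; movl %2,%%ebx; "           \
                                  "int $0x80; pop %%ebx"                  \
                        : "=a" (__res)                                    \
                        : "0" (__NR_##name), "ri" ((long)(arg1))          \
                        : "memory");                                      \
                __syscall_return(type, __res);                            \
        }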
    • [PATCH] i386 spinlocks: disable interrupts only if we enabled them · 42c059e0
      Committed by Chuck Ebbert
      _raw_spin_lock_flags() is entered with interrupts disabled.  If it cannot
      obtain a spinlock, it checks the flags that were passed and re-enables
      interrupts before spinning if that's how the flags are set.  When the
      spinlock might be available, it disables interrupts (even if they are
      already disabled) before trying to get the lock.  Change that so interrupts
      are only disabled if they have been enabled.  This costs nine bytes of
      duplicated spinloop code.
      
      Fastpath before patch:
              jle <keep looping>      not-taken conditional jump
              cli                     disable interrupts
              jmp <try for lock>      unconditional jump
      
      Fastpath after patch, if interrupts were not enabled:
              jg <try for lock>       taken conditional branch
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      42c059e0
    • [PATCH] Fix the imlicit declaration of mtrr_centaur_report_mcr in arch/i386/kernel/cpu/centaur.c · 52f4a91a
      Committed by Jesper Juhl
      arch/i386/kernel/cpu/centaur.c: In function `centaur_mcr_insert':
      arch/i386/kernel/cpu/centaur.c:33: warning: implicit declaration of function `mtrr_centaur_report_mcr'
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      52f4a91a
    • [PATCH] i386: fix uses of user_mode() vs. user_mode_vm() · db753bdf
      Committed by Jan Beulich
      >commit 76381fee
      >Author: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      >Date:   Thu Jun 23 00:08:46 2005 -0700
      >
      >    [PATCH] xen: x86_64: use more usermode macro
      >
      >    Make use of the user_mode macro where it's possible.  This is useful for Xen
      >    because it will need only to redefine only the macro to a hypervisor call.
      
      I am of the opinion that the above changeset is incomplete, i.e.  it missed
      converting some previous uses of user_mode to user_mode_vm.  While most of
      them could be considered just cosmetic, at least the one in die_nmi
      doesn't appear to be.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Cc: Vincent Hanquez <vincent.hanquez@cl.cam.ac.uk>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      db753bdf
    • [PATCH] i386: actively synchronize vmalloc area when registering certain callbacks · 101f12af
      Committed by Jan Beulich
      Registering a callback handler through register_die_notifier() is obviously
      primarily intended for use by modules.  However, the way these currently
      get called makes it basically impossible for them to actually be used by
      modules, as there is, on non-PAE configurations, a good chance (the larger
      the module, the greater) of the system crashing as a result.
      
      This is because the callback gets invoked
      
      (a) in the page fault path before the top-level page table propagation
          gets carried out (hence a fault to propagate the top-level page table
          entry/entries mapping the module's code/data would nest infinitely) and
      
      (b) in the NMI path, where nested faults must absolutely not happen,
          since otherwise the IRET from the nested fault re-enables NMIs,
          potentially resulting in nested NMI occurrences.
      
      Besides the modular aspect, similar problems would even arise for in-
      kernel consumers of the API if they touched ioremap()ed or vmalloc()ed
      memory inside their handlers.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      101f12af
    • [PATCH] x86: early printk handling fixes · 99b7de33
      Committed by Stas Sergeev
      The history is that -mm kernels have not worked for me for a few months
      already.  The trouble started with crashes somewhere after starting init,
      and for the last month there has been no boot at all, just "Uncompressing...  OK,
      booting kernel", and silence.  The early console didn't work either.  With the
      latest releases this degraded into an infinite stream of "Unknown
      interrupt or fault" messages.  So today my patience ran out and I started
      to think about how I could collect at least some info for a bug report.  Attached
      is a patch that allows gathering some valuable debug info on the problem
      by making the early console more usable.  I can't properly test the patch,
      as the kernel still doesn't boot, so I'll explain it in detail in the hope
      that someone else can justify the intrusive changes.
      
      arch_hooks.h: added prototypes for setup_early_printk() and early_printk().
      
      setup.c: killed wrong setup_early_printk() prototype.  Moved
      setup_early_printk() a bit earlier, as it was not "early enough" to cover
      the bug I was fighting with.
      
      early_printk.c: made it start printing from the bottom of the screen;
      otherwise the messages interfere with those of the boot loader, so you
      can't read them.
      Signed-off-by: Stas Sergeev <stsp@aknet.ru>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Zwane Mwaikambo <zwane@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      99b7de33
    • [PATCH] i386: remove duplicate declaration of mp_bus_id_to_pci_bus · 7c63ee5c
      Committed by Chris Wright
      mp_bus_id_to_pci_bus is declared identically twice.
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7c63ee5c
    • [PATCH] Compilation fix for ES7000 when no ACPI is specified in config (i386) · e5428ede
      Committed by Natalie.Protasevich@unisys.com
      ES7000 platform code cleanup for compilation errors and a warning.
      Ifdef'd the ACPI-related parts in the ES7000 platform code.  They were
      causing compile errors in certain configurations (without ACPI defined).  I
      think this approach would be best (as opposed to Kconfig changes) since it
      only touches the subarch...
      
      Signed-off-by: <Natalie.Protasevich@unisys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e5428ede
    • [PATCH] i386: Add a temporary to make put_user more type safe · 30e931d4
      Committed by Eric W. Biederman
      In some code I am developing I had occasion to change the type of a
      variable.  This made the value put_user was putting to user space wrong.
      But the code continued to build cleanly without errors.
      
      Introducing a temporary fixes this problem and at least with gcc-3.3.5 does
      not cause gcc any problems with optimizing out the temporary.  gcc-4.x
      using SSA internally ought to be even better at optimizing out temporaries,
      so I don't expect a temporary to become a problem.  Especially because in
      all correct cases the types on both sides of the assignment to the
      temporary are the same.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      30e931d4
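      The idea reduces to assigning through a temporary of the destination's type,
      which forces the value to be converted before the store (an illustrative
      sketch, not the exact i386 macro):

        #define put_user(x, ptr)                                  \
        ({                                                        \
                __typeof__(*(ptr)) __pu_tmp = (x);                \
                __put_user(__pu_tmp, (ptr));                      \
        })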
    • [PATCH] x86: SMP alternatives · 9a0b5817
      Committed by Gerd Hoffmann
      Implement SMP alternatives, i.e.  switching at runtime between different
      code versions for UP and SMP.  The code can patch both SMP->UP and UP->SMP.
      The UP->SMP case is useful for CPU hotplug.
      
      With CONFIG_CPU_HOTPLUG enabled the code switches to UP at boot time and
      when the number of CPUs goes down to 1, and switches to SMP when the number
      of CPUs goes up to 2.
      
      Without CONFIG_CPU_HOTPLUG or on non-SMP-capable systems the code is
      patched once at boot time (if needed) and the tables are released
      afterwards.
      
      The changes in detail:
      
        * The current alternatives bits are moved to a separate file,
          the SMP alternatives code is added there.
      
        * The patch adds some new elf sections to the kernel:
          .smp_altinstructions
      	like .altinstructions, also contains a list
      	of alt_instr structs.
          .smp_altinstr_replacement
      	like .altinstr_replacement, but also has some space to
      	save the original instruction before replacing it.
          .smp_locks
      	list of pointers to lock prefixes which can be nop'ed
      	out on UP.
          The first two are used to replace more complex instruction
          sequences such as spinlocks and semaphores.  It would be possible
          to deal with the lock prefixes with that as well, but by handling
          them as special case the table sizes become much smaller.
      
       * The sections are page-aligned and padded up to page size, so they
         can be freed if they are not needed.
      
       * Split the code that releases init pages into a separate function and
         use it to release the elf sections if they are unused.
      Signed-off-by: Gerd Hoffmann <kraxel@suse.de>
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9a0b5817
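      An illustrative sketch of the .smp_locks idea described above: record the
      address of every lock prefix so it can be turned into a NOP on UP (not the
      exact macro from the patch):

        #define LOCK_PREFIX                                         \
                ".section .smp_locks,\"a\"\n"                       \
                "  .align 4\n"                                      \
                "  .long 661f\n"        /* address of the prefix */ \
                ".previous\n"                                       \
                "661:\n\tlock; "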
  6. 22 March 2006: 2 commits
    • [PATCH] Enable mprotect on huge pages · 8f860591
      Committed by Zhang, Yanmin
      2.6.16-rc3 uses hugetlb on-demand paging, but it doesn't support hugetlb
      mprotect.
      
      From: David Gibson <david@gibson.dropbear.id.au>
      
        Remove a test from the mprotect() path which checks that the mprotect()ed
        range on a hugepage VMA is hugepage aligned (yes, really, the sense of
        is_aligned_hugepage_range() is the opposite of what you'd guess :-/).
      
        In fact, we don't need this test.  If the given addresses match the
        beginning/end of a hugepage VMA they must already be suitably aligned.  If
        they don't, then mprotect_fixup() will attempt to split the VMA.  The very
        first test in split_vma() will check for a badly aligned address on a
        hugepage VMA and return -EINVAL if necessary.
      
      From: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      
        On i386 and x86-64, the pte flag _PAGE_PSE collides with _PAGE_PROTNONE.  The
        identity of a hugetlb pte is lost when changing page protection via mprotect.
        A page fault that occurs later will trigger a bug check in huge_pte_alloc().
      
        The fix is to always make the new pte a hugetlb pte and also to clean up
        legacy code where _PAGE_PRESENT was forced on in the pre-faulting days.
      Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8f860591
    • [PATCH] don't call check_acpi_pci() on x86 with ACPI disabled · 152475cb
      Committed by Herbert Poetzl
      check_acpi_pci() is called from arch/i386/kernel/setup.c even if
      CONFIG_ACPI is not defined, but the code in include/asm/acpi.h doesn't
      provide it in this case.
      Signed-off-by: Herbert Pötzl <herbert@13thfloor.at>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      152475cb
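      The usual shape of such a fix, as a sketch (illustrative only, not quoted
      from the patch):

        /* include/asm-i386/acpi.h */
        #ifdef CONFIG_ACPI
        extern void check_acpi_pci(void);
        #else
        static inline void check_acpi_pci(void) { }   /* no-op when ACPI is off */
        #endif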
  7. 09 March 2006: 1 commit
    • [PATCH] i386: port ATI timer fix from x86_64 to i386 II · f9262c12
      Committed by Andi Kleen
      ATI chipsets tend to generate double timer interrupts for the local APIC
      timer when both the 8254 and the IO-APIC timer pins are enabled.  This is
      because they route it to both and the result is anded together and the CPU
      ends up processing it twice.
      
      This patch changes check_timer to disable the 8254 routing for interrupt 0.
      
      I think it would be safe on all chipsets actually (I tested it on a couple
      and it worked everywhere), and Windows seems to do it in a similar way, but
      to be conservative this patch only enables this mode on ATI (and adds
      options to enable/disable it too).
      
      Ported over from a similar x86-64 change.
      
      I reused the ACPI earlyquirk infrastructure for the ATI bridge check, but
      tweaked it a bit to work even without ACPI.
      
      Inspired by a patch from Chuck Ebbert, but redone.
      
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f9262c12
  8. 25 February 2006: 2 commits
  9. 18 February 2006: 1 commit
  10. 16 February 2006: 1 commit
  11. 15 February 2006: 2 commits
  12. 12 February 2006: 1 commit
    • [PATCH] fstatat64 support · cff2b760
      Committed by Ulrich Drepper
      The *at patches introduced fstatat and, due to insufficient research, I
      used the newfstat functions generally as the guideline.  The result is that
      on 32-bit platforms we don't have all the information needed to implement
      fstatat64.
      
      This patch modifies the code to pass up 64-bit information if
      __ARCH_WANT_STAT64 is defined.  I renamed the syscall entry point to make
      this clear.  Other archs will continue to use the existing code.  On x86-64
      the compat code is implemented using a new sys32_ function.  This is what
      is done for the other stat syscalls as well.
      
      This patch might break some other archs (those which define
      __ARCH_WANT_STAT64 and which already wired up the syscall).  Yet others
      might need changes to accommodate the compatibility mode.  I really don't
      want to do that work because all this stat handling is a mess (more so in
      glibc, but the kernel is also affected).  It should be done by the arch
      maintainers.  I'll provide a stand-alone test shortly.  Those who are
      eager could compile glibc and run 'make check' (no installation needed).
      
      The patch below has been tested on x86 and x86-64.
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cff2b760
  13. 08 February 2006: 1 commit
  14. 06 February 2006: 1 commit
  15. 04 February 2006: 1 commit
    • [PATCH] Export cpu topology in sysfs · 69dcc991
      Committed by Zhang, Yanmin
      The patch implements cpu topology exportation via sysfs.
      
      Items (attributes) are similar to /proc/cpuinfo.
      
      1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
      	represents the physical package id of cpu X;
      2) /sys/devices/system/cpu/cpuX/topology/core_id:
      	represents the cpu core id of cpu X;
      3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
      	represents the thread siblings of cpu X in the same core;
      4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
      	represents the thread siblings of cpu X in the same physical package;
      
      To implement it in an architecture-neutral way, a new source file,
      driver/base/topology.c, is to export the 5 attributes.
      
      If one architecture wants to support this feature, it just needs to
      implement 4 defines, typically in file include/asm-XXX/topology.h.
      The 4 defines are:
      #define topology_physical_package_id(cpu)
      #define topology_core_id(cpu)
      #define topology_thread_siblings(cpu)
      #define topology_core_siblings(cpu)
      
      The type of **_id is int.
      The type of siblings is cpumask_t.
      
      To be consistent on all architectures, the 4 attributes should have
      default values if their values are unavailable.  Below is the rule.
      
      1) physical_package_id: If cpu has no physical package id, -1 is the
      default value.
      
      2) core_id: If cpu doesn't support multi-core, its core id is 0.
      
      3) thread_siblings: Just include itself, if the cpu doesn't support
      HT/multi-thread.
      
      4) core_siblings: Just include itself, if the cpu doesn't support
      multi-core and HT/Multi-thread.
      
      So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
      
      If an attribute isn't defined on an architecture, it won't be exported.
      
      Thanks to Nathan, Greg, Andi, Paul and Venki.
      
      The patch provides defines for i386/x86_64/ia64.
      Signed-off-by: Zhang, Yanmin <yanmin.zhang@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      69dcc991
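      For illustration, the four per-arch defines might look like this on i386
      (the field and array names here are assumptions, not quoted from the patch):

        /* include/asm-i386/topology.h */
        #define topology_physical_package_id(cpu)  (cpu_data[cpu].phys_proc_id)
        #define topology_core_id(cpu)              (cpu_data[cpu].cpu_core_id)
        #define topology_thread_siblings(cpu)      (cpu_sibling_map[cpu])
        #define topology_core_siblings(cpu)        (cpu_core_map[cpu])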
  16. 02 February 2006: 1 commit
    • [PATCH] VMSPLIT config options · 975b3d3d
      Committed by Mark Lord
      Enable selection of different user/kernel VM splits for i386, including an
      optimized mode for 1GB physical RAM, which gives the kernel a direct (non
      HIGHMEM) mapping to the entire 1GB rather than just the first 896MB.
      
      There is a similarly optimized mode for machines with exactly 2GB
      of physical RAM.
      
      This can speed up the kernel by avoiding having to create/destroy temporary
      HIGHMEM mappings, and by not having to include HIGHMEM support at all on such
      machines.  The flip side is that there's less virtual addressing left for
      userspace in these alternatives, and some binary-only kernel modules may
      misbehave unless rebuilt with the same VMSPLIT option as the main kernel
      image.
      
      Original idea/patch from Jens Axboe, modified based on suggestions from Linus
      et al.
      Signed-off-by: Mark Lord <mlord@pobox.com>
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      975b3d3d
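      Roughly speaking, each VMSPLIT choice just moves the user/kernel boundary,
      i.e. the kernel's PAGE_OFFSET (a sketch; the values below are illustrative):

        #if defined(CONFIG_VMSPLIT_3G)          /* default: 3G user / 1G kernel */
        # define __PAGE_OFFSET  0xC0000000
        #elif defined(CONFIG_VMSPLIT_2G)        /* 2G user / 2G kernel          */
        # define __PAGE_OFFSET  0x80000000
        #endif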
  17. 19 January 2006: 5 commits
    • [PATCH] EDAC: core EDAC support code · da9bb1d2
      Committed by Alan Cox
      This is a subset of the bluesmoke project core code, stripped of the NMI work,
      which isn't ready to merge, and some of the "interesting" proc functionality
      that needs reworking or just has no place in the kernel.  It requires no core
      kernel changes except the added scrub functions already posted.
      
      The goal is to merge further functionality only after the core code is
      accepted and proven in the base kernel, and only at the point the upstream
      extras are really ready to merge.
      
      From: doug thompson <norsk5@xmission.com>
      
        This converts EDAC to sysfs and is the final chunk necessary before EDAC
        has a stable user space API and can be considered for submission into the
        base kernel.
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: doug thompson <norsk5@xmission.com>
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      da9bb1d2
    • [PATCH] EDAC: atomic scrub operations · 715b49ef
      Committed by Alan Cox
      EDAC requires a way to scrub memory if an ECC error is found and the chipset
      does not do the work automatically.  That means rewriting memory locations
      atomically with respect to all CPUs _and_ bus masters.  That means we can't
      use atomic_add(foo, 0), as it gets optimised for non-SMP.
      
      This adds a function to include/asm-foo/atomic.h for the platforms currently
      supported which implements a scrub of a mapped block.
      
      It also adjusts the include order in a few other files where atomic.h is included
      before types.h, as this now causes an error because atomic_scrub uses u32.
      Signed-off-by: Alan Cox <alan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      715b49ef
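      A hedged sketch of what an x86-style scrub helper can look like
      (illustrative; the real per-arch implementations differ in detail):

        /* Rewrite each word of the mapped block with a locked read-modify-write
         * so the memory controller regenerates the ECC, atomically with respect
         * to other CPUs and bus masters. */
        static inline void atomic_scrub(void *va, u32 size)
        {
                u32 i, *p = va;

                for (i = 0; i < size / 4; i++, p++)
                        __asm__ __volatile__("lock; addl $0, %0"
                                             : "+m" (*p) : : "memory");
        }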
    • [PATCH] Add pselect/ppoll system calls on i386 · 3213e913
      Committed by David Woodhouse
      Add the sys_pselect6() and sys_ppoll() calls to the i386 syscall table.
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3213e913
    • [PATCH] Handle TIF_RESTORE_SIGMASK for i386 · 283828f3
      Committed by David Howells
      Handle TIF_RESTORE_SIGMASK as added by David Woodhouse's patch entitled:
      
              [PATCH] 2/3 Add TIF_RESTORE_SIGMASK support for arch/powerpc
              [PATCH] 3/3 Generic sys_rt_sigsuspend
      
      It does the following:
      
       (1) Declares TIF_RESTORE_SIGMASK for i386.
      
       (2) Invokes it over to do_signal() when TIF_RESTORE_SIGMASK is set.
      
       (3) Makes do_signal() support TIF_RESTORE_SIGMASK, using the signal mask saved
           in current->saved_sigmask.
      
       (4) Discards sys_rt_sigsuspend() from the arch, using the generic one instead.
      
       (5) Makes sys_sigsuspend() save the signal mask and set TIF_RESTORE_SIGMASK
           rather than attempting to fudge the return registers.
      
       (6) Makes sys_sigsuspend() return -ERESTARTNOHAND rather than looping
           intrinsically.
      
       (7) Makes setup_frame(), setup_rt_frame() and handle_signal() return 0 or
           -EFAULT rather than true/false to be consistent with the rest of the
           kernel.
      
      Due to the fact that do_signal() is then only called from one place:
      
       (8) Makes do_signal() no longer have a return value, as it was just being
           ignored; force_sig() takes care of this.
      
       (9) Discards the old sigmask argument to do_signal() as it's no longer
           necessary.
      
      (10) Makes do_signal() static.
      
      (11) Marks the second argument to do_notify_resume() as unused. The unused
           argument should remain in the middle, as the arguments are passed in as
           registers and their ordering is specified in entry.S.
      
      Given the way do_signal() is now no longer called from sys_{,rt_}sigsuspend(),
      they no longer need access to the exception frame, and so can just take
      arguments normally.
      
      This patch depends on sys_rt_sigsuspend patch.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      283828f3
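      Points (5) and (6) of the entry above condense to roughly the following
      (an illustrative sketch, not the patch itself):

        asmlinkage int sys_sigsuspend(int history0, int history1, old_sigset_t mask)
        {
                mask &= _BLOCKABLE;
                spin_lock_irq(&current->sighand->siglock);
                current->saved_sigmask = current->blocked;   /* (5) save the mask  */
                siginitset(&current->blocked, mask);
                recalc_sigpending();
                spin_unlock_irq(&current->sighand->siglock);

                current->state = TASK_INTERRUPTIBLE;
                schedule();
                set_thread_flag(TIF_RESTORE_SIGMASK);        /* (5) flag restore   */
                return -ERESTARTNOHAND;                      /* (6) no looping     */
        }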
    • [PATCH] vfs: *at functions: i386 · 4f085507
      Committed by Ulrich Drepper
      Wire up the x86 syscalls
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4f085507