1. 13 9月, 2005 2 次提交
    • A
      [PATCH] i386: add memory clobbers to syscall macros · 6e44f12b
      Andi Kleen 提交于
      As noted by matz@suse.de
      
      The problem is, that on i386 the syscallN
      macro is defined like so:
      
        long __res; \
        __asm__ volatile ("int $0x80" \
              : "=a" (__res) \
              : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2)), \
                "d" ((long)(arg3)),"S" ((long)(arg4)),"D" ((long)(arg5))); \
      
      If one of the arguments (in the _llseek syscall it's the arg4) is a pointer
      which the syscall is expected to write to (to the memory pointed to by this
      ptr), then this side-effect is not captured in the asm.
      
      If anyone uses this macro to define it's own version of the syscall
      (sometimes necessary when not using glibc) and it's inlined, then GCC
      doesn't know that this asm write to "*dest", when called like so for instance:
      
        out = 1;
        llseek (fd, bla, blubb, &out, trara)
        use (out);
      
      Here nobody tells GCC that "out" actually is written to (just a pointer to it
      is passed to the asm).  Hence GCC might (and in the above bug did)
      copy-propagate "1" into the second use of "out".
      
      The easiest solution would be to add a "memory" clobber to the definition
      of this syscall macro.  As this is a syscall, it shouldn't inhibit too many
      optimizations.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6e44f12b
    • A
      [PATCH] x86-64: Use ACPI PXM to parse PCI<->node assignments · 69e1a33f
      Andi Kleen 提交于
      Since this is shared code I had to implement it for i386 too
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      69e1a33f
  2. 11 9月, 2005 2 次提交
    • A
      [PATCH] include/asm-i386/: "extern inline" -> "static inline" · e2afe674
      Adrian Bunk 提交于
      "extern inline" doesn't make much sense.
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e2afe674
    • I
      [PATCH] spinlock consolidation · fb1c8f93
      Ingo Molnar 提交于
      This patch (written by me and also containing many suggestions of Arjan van
      de Ven) does a major cleanup of the spinlock code.  It does the following
      things:
      
       - consolidates and enhances the spinlock/rwlock debugging code
      
       - simplifies the asm/spinlock.h files
      
       - encapsulates the raw spinlock type and moves generic spinlock
         features (such as ->break_lock) into the generic code.
      
       - cleans up the spinlock code hierarchy to get rid of the spaghetti.
      
      Most notably there's now only a single variant of the debugging code,
      located in lib/spinlock_debug.c.  (previously we had one SMP debugging
      variant per architecture, plus a separate generic one for UP builds)
      
      Also, i've enhanced the rwlock debugging facility, it will now track
      write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
      All locks have lockup detection now, which will work for both soft and hard
      spin/rwlock lockups.
      
      The arch-level include files now only contain the minimally necessary
      subset of the spinlock code - all the rest that can be generalized now
      lives in the generic headers:
      
       include/asm-i386/spinlock_types.h       |   16
       include/asm-x86_64/spinlock_types.h     |   16
      
      I have also split up the various spinlock variants into separate files,
      making it easier to see which does what. The new layout is:
      
         SMP                         |  UP
         ----------------------------|-----------------------------------
         asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
         linux/spinlock_types.h      |  linux/spinlock_types.h
         asm/spinlock_smp.h          |  linux/spinlock_up.h
         linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
         linux/spinlock.h            |  linux/spinlock.h
      
      /*
       * here's the role of the various spinlock/rwlock related include files:
       *
       * on SMP builds:
       *
       *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
       *                        initializers
       *
       *  linux/spinlock_types.h:
       *                        defines the generic type and initializers
       *
       *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
       *                        implementations, mostly inline assembly code
       *
       *   (also included on UP-debug builds:)
       *
       *  linux/spinlock_api_smp.h:
       *                        contains the prototypes for the _spin_*() APIs.
       *
       *  linux/spinlock.h:     builds the final spin_*() APIs.
       *
       * on UP builds:
       *
       *  linux/spinlock_type_up.h:
       *                        contains the generic, simplified UP spinlock type.
       *                        (which is an empty structure on non-debug builds)
       *
       *  linux/spinlock_types.h:
       *                        defines the generic type and initializers
       *
       *  linux/spinlock_up.h:
       *                        contains the __raw_spin_*()/etc. version of UP
       *                        builds. (which are NOPs on non-debug, non-preempt
       *                        builds)
       *
       *   (included on UP-non-debug builds:)
       *
       *  linux/spinlock_api_up.h:
       *                        builds the _spin_*() APIs.
       *
       *  linux/spinlock.h:     builds the final spin_*() APIs.
       */
      
      All SMP and UP architectures are converted by this patch.
      
      arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
      crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
      be mostly fine.
      
      From: Grant Grundler <grundler@parisc-linux.org>
      
        Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
        Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
        non-SMP kernels.  That should be trivial to fix up later if necessary.
      
        I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
        some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
        are well tested and contained entirely inside arch specific code.  I do NOT
        expect any new issues to arise with them.
      
       If someone does ever need to use debug/metrics with them, then they will
        need to unravel this hairball between spinlocks, atomic ops, and bit ops
        that exist only because parisc has exactly one atomic instruction: LDCW
        (load and clear word).
      
      From: "Luck, Tony" <tony.luck@intel.com>
      
         ia64 fix
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjanv@infradead.org>
      Signed-off-by: NGrant Grundler <grundler@parisc-linux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Signed-off-by: NHirokazu Takata <takata@linux-m32r.org>
      Signed-off-by: NMikael Pettersson <mikpe@csd.uu.se>
      Signed-off-by: NBenoit Boissinot <benoit.boissinot@ens-lyon.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fb1c8f93
  3. 10 9月, 2005 3 次提交
    • T
      [PATCH] fix reboot via keyboard controller reset · 59f4e7d5
      Truxton Fulton 提交于
      I have a system (Biostar IDEQ210M mini-pc with a VIA chipset) which will
      not reboot unless a keyboard is plugged in to it.  I have tried all
      combinations of the kernel "reboot=x,y" flags to no avail.  Rebooting by
      any method will leave the system in a wedged state (at the "Restarting
      system" message).
      
      I finally tracked the problem down to the machine's refusal to fully reboot
      unless the keyboard controller status register had bit 2 set.  This is the
      "System flag" which when set, indicates successful completion of the
      keyboard controller self-test (Basic Assurance Test, BAT).
      
      I suppose that something is trying to protect against sporadic reboots
      unless the keyboard controller is in a good state (a keyboard is present),
      but I need this machine to be headless.
      
      I found that setting the system flag (via the command byte) before giving
      the "pulse reset line" command will allow the reboot to proceed.  The patch
      is simple, and I think it should be fine for everybody whether they have
      this type of machine or not.  This affects the "hard" reboot (as done when
      the kernel boot flags "reboot=c,h" are used).
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      59f4e7d5
    • M
      [PATCH] i386: CONFIG_ACPI_SRAT typo fix · abda2452
      Magnus Damm 提交于
      Fix a typo involving CONFIG_ACPI_SRAT.
      Signed-off-by: NMagnus Damm <magnus@valinux.co.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      abda2452
    • S
      kbuild: full dependency check on asm-offsets.h · 86feeaa8
      Sam Ravnborg 提交于
      Building asm-offsets.h has been moved to a seperate Kbuild file
      located in the top-level directory. This allow us to share the
      functionality across the architectures.
      
      The old rules in architecture specific Makefiles will die
      in subsequent patches.
      
      Furhtermore the usual kbuild dependency tracking is now used
      when deciding to rebuild asm-offsets.s. So we no longer risk
      to fail a rebuild caused by asm-offsets.c dependencies being touched.
      
      With this common rule-set we now force the same name across
      all architectures. Following patches will fix the rest.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      86feeaa8
  4. 08 9月, 2005 9 次提交
    • S
      [PATCH] Clean up struct flock64 definitions · 8d286aa5
      Stephen Rothwell 提交于
      This patch gathers all the struct flock64 definitions (and the operations),
      puts them under !CONFIG_64BIT and cleans up the arch files.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8d286aa5
    • S
      [PATCH] Clean up struct flock definitions · 5ac353f9
      Stephen Rothwell 提交于
      This patch just gathers together all the struct flock definitions except
      xtensa into asm-generic/fcntl.h.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5ac353f9
    • S
      [PATCH] Clean up the fcntl operations · 1abf62af
      Stephen Rothwell 提交于
      This patch puts the most popular of each fcntl operation/flag into
      asm-generic/fcntl.h and cleans up the arch files.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1abf62af
    • S
      [PATCH] Clean up the open flags · e64ca97f
      Stephen Rothwell 提交于
      This patch puts the most popular of each open flag into asm-generic/fcntl.h
      and cleans up the arch files.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e64ca97f
    • S
      [PATCH] Create asm-generic/fcntl.h · 9317259e
      Stephen Rothwell 提交于
      This set of patches creates asm-generic/fcntl.h and consolidates as much as
      possible from the asm-*/fcntl.h files into it.
      
      This patch just gathers all the identical bits of the asm-*/fcntl.h files into
      asm-generic/fcntl.h.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NYoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9317259e
    • J
      [PATCH] remove verify_area(): remove verify_area() from various uaccess.h headers · 97de50c0
      Jesper Juhl 提交于
      Remove the deprecated (and unused) verify_area() from various uaccess.h
      headers.
      Signed-off-by: NJesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      97de50c0
    • C
      [PATCH] remove asm-*/hdreg.h · c8d12741
      Christoph Hellwig 提交于
      unused and useless..
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c8d12741
    • H
      [PATCH] auxiliary vector cleanups · 36d57ac4
      H. J. Lu 提交于
      The size of auxiliary vector is fixed at 42 in linux/sched.h.  But it isn't
      very obvious when looking at linux/elf.h.  This patch adds AT_VECTOR_SIZE
      so that we can change it if necessary when a new vector is added.
      
      Because of include file ordering problems, doing this necessitated the
      extraction of the AT_* symbols into a standalone header file.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      36d57ac4
    • J
      [PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedup · 4732efbe
      Jakub Jelinek 提交于
      ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter
      (which at least on UP usually means an immediate context switch to one of
      the waiter threads).  This waiter wakes up and after a few instructions it
      attempts to acquire the cv internal lock, but that lock is still held by
      the thread calling pthread_cond_signal.  So it goes to sleep and eventually
      the signalling thread is scheduled in, unlocks the internal lock and wakes
      the waiter again.
      
      Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
      to avoid this performance issue, but it was removed when locks were
      redesigned to the 3 state scheme (unlocked, locked uncontended, locked
      contended).
      
      Following scenario shows why simply using FUTEX_REQUEUE in
      pthread_cond_signal together with using lll_mutex_unlock_force in place of
      lll_mutex_unlock is not enough and probably why it has been disabled at
      that time:
      
      The number is value in cv->__data.__lock.
              thr1            thr2            thr3
      0       pthread_cond_wait
      1       lll_mutex_lock (cv->__data.__lock)
      0       lll_mutex_unlock (cv->__data.__lock)
      0       lll_futex_wait (&cv->__data.__futex, futexval)
      0                       pthread_cond_signal
      1                       lll_mutex_lock (cv->__data.__lock)
      1                                       pthread_cond_signal
      2                                       lll_mutex_lock (cv->__data.__lock)
      2                                         lll_futex_wait (&cv->__data.__lock, 2)
      2                       lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
                                # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
      2                       lll_mutex_unlock_force (cv->__data.__lock)
      0                         cv->__data.__lock = 0
      0                         lll_futex_wake (&cv->__data.__lock, 1)
      1       lll_mutex_lock (cv->__data.__lock)
      0       lll_mutex_unlock (cv->__data.__lock)
                # Here, lll_mutex_unlock doesn't know there are threads waiting
                # on the internal cv's lock
      
      Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
      but it will cost us not one, but 2 extra syscalls and, what's worse, one of
      these extra syscalls will be done for every single waiting loop in
      pthread_cond_*wait.
      
      We would need to use lll_mutex_unlock_force in pthread_cond_signal after
      requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.
      
      Another alternative is to do the unlocking pthread_cond_signal needs to do
      (the lock can't be unlocked before lll_futex_wake, as that is racy) in the
      kernel.
      
      I have implemented both variants, futex-requeue-glibc.patch is the first
      one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel.
       The kernel interface allows userland to specify how exactly an unlocking
      operation should look like (some atomic arithmetic operation with optional
      constant argument and comparison of the previous futex value with another
      constant).
      
      It has been implemented just for ppc*, x86_64 and i?86, for other
      architectures I'm including just a stub header which can be used as a
      starting point by maintainers to write support for their arches and ATM
      will just return -ENOSYS for FUTEX_WAKE_OP.  The requeue patch has been
      (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running
      32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL.
      
      With the following benchmark on UP x86-64 I get:
      
      for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
      for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
      time elf/ld.so --library-path .:nptl-orig /tmp/bench
      real 0m0.655s user 0m0.253s sys 0m0.403s
      real 0m0.657s user 0m0.269s sys 0m0.388s
      time elf/ld.so --library-path .:nptl-requeue /tmp/bench
      real 0m0.496s user 0m0.225s sys 0m0.271s
      real 0m0.531s user 0m0.242s sys 0m0.288s
      time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
      real 0m0.380s user 0m0.176s sys 0m0.204s
      real 0m0.382s user 0m0.175s sys 0m0.207s
      
      The benchmark is at:
      http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt
      Older futex-requeue-glibc.patch version is at:
      http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt
      Older futex-wake_op-glibc.patch version is at:
      http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt
      Will post a new version (just x86-64 fixes so that the patch
      applies against pthread_cond_signal.S) to libc-hacker ml soon.
      
      Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
      testcase that will not test the atomicity of the operation, but at least
      check if the threads that should have been woken up are woken up and
      whether the arithmetic operation in the kernel gave the expected results.
      Acked-by: NIngo Molnar <mingo@redhat.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NYoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4732efbe
  5. 05 9月, 2005 24 次提交
    • L
      [PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and general usage · ed75e8d5
      Laurent Vivier 提交于
            Jeff Dike <jdike@addtoit.com>,
            Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>,
            Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      
      Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL
      except that the kernel does not execute the requested syscall; this is useful
      to improve performance for virtual environments, like UML, which want to run
      the syscall on their own.
      
      In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry
      and on exit, and each time you also have two context switches; with SYSEMU you
      avoid the 2nd stop and so save two context switches per syscall.
      
      Also, some architectures don't have support in the host for changing the
      syscall number via ptrace(), which is currently needed to skip syscall
      execution (UML turns any syscall into getpid() to avoid it being executed on
      the host).  Fixing that is hard, while SYSEMU is easier to implement.
      
      * This version of the patch includes some suggestions of Jeff Dike to avoid
        adding any instructions to the syscall fast path, plus some other little
        changes, by myself, to make it work even when the syscall is executed with
        SYSENTER (but I'm unsure about them). It has been widely tested for quite a
        lot of time.
      
      * Various fixed were included to handle the various switches between
        various states, i.e. when for instance a syscall entry is traced with one of
        PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit.
        Basically, this is done by remembering which one of them was used even after
        the call to ptrace_notify().
      
      * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP
        to make do_syscall_trace() notice that the current syscall was started with
        SYSEMU on entry, so that no notification ought to be done in the exit path;
        this is a bit of a hack, so this problem is solved in another way in next
        patches.
      
      * Also, the effects of the patch:
      "Ptrace - i386: fix Syscall Audit interaction with singlestep"
      are cancelled; they are restored back in the last patch of this series.
      
      Detailed descriptions of the patches doing this kind of processing follow (but
      I've already summed everything up).
      
      * Fix behaviour when changing interception kind #1.
      
        In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag
        only after doing the debugger notification; but the debugger might have
        changed the status of this flag because he continued execution with
        PTRACE_SYSCALL, so this is wrong.  This patch fixes it by saving the flag
        status before calling ptrace_notify().
      
      * Fix behaviour when changing interception kind #2:
        avoid intercepting syscall on return when using SYSCALL again.
      
        A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL
        crashes.
      
        The problem is in arch/i386/kernel/entry.S.  The current SYSEMU patch
        inhibits the syscall-handler to be called, but does not prevent
        do_syscall_trace() to be called after this for syscall completion
        interception.
      
        The appended patch fixes this.  It reuses the flag TIF_SYSCALL_EMU to
        remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since
        the flag is unused in the depicted situation.
      
      * Fix behaviour when changing interception kind #3:
        avoid intercepting syscall on return when using SINGLESTEP.
      
        When testing 2.6.9 and the skas3.v6 patch, with my latest patch and had
        problems with singlestepping on UML in SKAS with SYSEMU.  It looped
        receiving SIGTRAPs without moving forward.  EIP of the traced process was
        the same for all SIGTRAPs.
      
      What's missing is to handle switching from PTRACE_SYSCALL_EMU to
      PTRACE_SINGLESTEP in a way very similar to what is done for the change from
      PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE.
      
      I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is
      notified and then wake ups the process; the syscall is executed (or skipped,
      when do_syscall_trace() returns 0, i.e.  when using PTRACE_SYSEMU), and
      do_syscall_trace() is called again.  Since we are on the return path of a
      SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL),
      we must still avoid notifying the parent of the syscall exit.  Now, this
      behaviour is extended even to resuming with PTRACE_SINGLESTEP.
      Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ed75e8d5
    • S
      [PATCH] add suspend/resume for timer · c3c433e4
      Shaohua Li 提交于
      The timers lack .suspend/.resume methods.  Because of this, jiffies got a
      big compensation after a S3 resume.  And then softlockup watchdog reports
      an oops.  This occured with HPET enabled, but it's also possible for other
      timers.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c3c433e4
    • Z
      [PATCH] i386 boottime for_each_cpu broken · 4ad8d383
      Zwane Mwaikambo 提交于
      for_each_cpu walks through all processors in cpu_possible_map, which is
      defined as cpu_callout_map on i386 and isn't initialised until all
      processors have been booted. This breaks things which do for_each_cpu
      iterations early during boot. So, define cpu_possible_map as a bitmap with
      NR_CPUS bits populated. This was triggered by a patch i'm working on which
      does alloc_percpu before bringing up secondary processors.
      
      From: Alexander Nyberg <alexn@telia.com>
      
      i386-boottime-for_each_cpu-broken.patch
      i386-boottime-for_each_cpu-broken-fix.patch
      
      The SMP version of __alloc_percpu checks the cpu_possible_map before
      allocating memory for a certain cpu.  With the above patches the BSP cpuid
      is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor
      machines (as soon as someone tries to dereference something allocated via
      __alloc_percpu, which in fact is never allocated since the cpu is not set
      in cpu_possible_map).
      Signed-off-by: NZwane Mwaikambo <zwane@arm.linux.org.uk>
      Signed-off-by: NAlexander Nyberg <alexn@telia.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4ad8d383
    • Z
      [PATCH] i386: encapsulate copying of pgd entries · d7271b14
      Zachary Amsden 提交于
      Add a clone operation for pgd updates.
      
      This helps complete the encapsulation of updates to page tables (or pages
      about to become page tables) into accessor functions rather than using
      memcpy() to duplicate them.  This is both generally good for consistency
      and also necessary for running in a hypervisor which requires explicit
      updates to page table entries.
      
      The new function is:
      
      clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
      
         dst - pointer to pgd range anwhere on a pgd page
         src - ""
         count - the number of pgds to copy.
      
         dst and src can be on the same page, but the range must not overlap
         and must not cross a page boundary.
      
      Note that I ommitted using this call to copy pgd entries into the
      software suspend page root, since this is not technically a live paging
      structure, rather it is used on resume from suspend.  CC'ing Pavel in case
      he has any feedback on this.
      
      Thanks to Chris Wright for noticing that this could be more optimal in
      PAE compiles by eliminating the memset.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d7271b14
    • G
      [PATCH] x86 NMI: better support for debuggers · 748f2edb
      George Anzinger 提交于
      This patch adds a notify to the die_nmi notify that the system is about to
      be taken down.  If the notify is handled with a NOTIFY_STOP return, the
      system is given a new lease on life.
      
      We also change the nmi watchdog to carry on if die_nmi returns.
      
      This give debug code a chance to a) catch watchdog timeouts and b) possibly
      allow the system to continue, realizing that the time out may be due to
      debugger activities such as single stepping which is usually done with
      "other" cpus held.
      
      Signed-off-by: George Anzinger<george@mvista.com>
      Cc: Keith Owens <kaos@ocs.com.au>
      Signed-off-by: NGeorge Anzinger <george@mvista.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      748f2edb
    • Z
      [PATCH] x86: introduce a write acessor for updating the current LDT · f2f30ebc
      Zachary Amsden 提交于
      Introduce a write acessor for updating the current LDT.  This is required
      for hypervisors like Xen that do not allow LDT pages to be directly
      written.
      
      Testing - here's a fun little LDT test that can be trivially modified to
      test limits as well.
      
      /*
       * Copyright (c) 2005, Zachary Amsden (zach@vmware.com)
       * This is licensed under the GPL.
       */
      
      #include <stdio.h>
      #include <signal.h>
      #include <asm/ldt.h>
      #include <asm/segment.h>
      #include <sys/types.h>
      #include <unistd.h>
      #include <sys/mman.h>
      #define __KERNEL__
      #include <asm/page.h>
      
      void main(void)
      {
              struct user_desc desc;
              char *code;
              unsigned long long tsc;
      
              code = (char *)mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              desc.entry_number = 0;
              desc.base_addr = code;
              desc.limit = 1;
              desc.seg_32bit = 1;
              desc.contents = MODIFY_LDT_CONTENTS_CODE;
              desc.read_exec_only = 0;
              desc.limit_in_pages = 1;
              desc.seg_not_present = 0;
              desc.useable = 1;
              if (modify_ldt(1, &desc, sizeof(desc)) != 0) {
                      perror("modify_ldt");
              }
              printf("code base is 0x%08x\n", (unsigned)code);
              code[0x0ffe] = 0x0f;  /* rdtsc */
              code[0x0fff] = 0x31;
              code[0x1000] = 0xcb;  /* lret */
              __asm__ __volatile("lcall $7,$0xffe" : "=A" (tsc));
              printf("TSC is 0x%016llx\n", tsc);
      }
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f2f30ebc
    • Z
      [PATCH] x86: make IOPL explicit · a5201129
      Zachary Amsden 提交于
      The pushf/popf in switch_to are ONLY used to switch IOPL.  Making this
      explicit in C code is more clear.  This pushf/popf pair was added as a
      bugfix for leaking IOPL to unprivileged processes when using
      sysenter/sysexit based system calls (sysexit does not restore flags).
      
      When requesting an IOPL change in sys_iopl(), it is just as easy to change
      the current flags and the flags in the stack image (in case an IRET is
      required), but there is no reason to force an IRET if we came in from the
      SYSENTER path.
      
      This change is the minimal solution for supporting a paravirtualized Linux
      kernel that allows user processes to run with I/O privilege.  Other
      solutions require radical rewrites of part of the low level fault / system
      call handling code, or do not fully support sysenter based system calls.
      
      Unfortunately, this added one field to the thread_struct.  But as a bonus,
      on P4, the fastest time measured for switch_to() went from 312 to 260
      cycles, a win of about 17% in the fast case through this performance
      critical path.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a5201129
    • Z
      [PATCH] x86: privilege cleanup · 0998e422
      Zachary Amsden 提交于
      Privilege checking cleanup.  Originally, these diffs were much greater, but
      recent cleanups in Linux have already done much of the cleanup.  I added
      some explanatory comments in places where the reasoning behind certain
      tests is rather subtle.
      
      Also, in traps.c, we can skip the user_mode check in handle_BUG().  The
      reason is, there are only two call chains - one via die_if_kernel() and one
      via do_page_fault(), both entering from die().  Both of these paths already
      ensure that a kernel mode failure has happened.  Also, the original check
      here, if (user_mode(regs)) was insufficient anyways, since it would not
      rule out BUG faults from V8086 mode execution.
      
      Saving the %ss segment in show_regs() rather than assuming a fixed value
      also gives better information about the current kernel state in the
      register dump.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0998e422
    • Z
      [PATCH] x86: more asm cleanups · f2ab4461
      Zachary Amsden 提交于
      Some more assembler cleanups I noticed along the way.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f2ab4461
    • I
      [PATCH] i386: fix incorrect TSS entry for LDT · 4f0cb8d9
      Ingo Molnar 提交于
      Noticed by Chuck Ebbert: the .ldt entry of the TSS was set up incorrectly.
      It never mattered since this was a leftover from old times, so remove it.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4f0cb8d9
    • Z
      [PATCH] i386: use set_pte macros in a couple places where they were missing · c9b02a24
      Zachary Amsden 提交于
      Also, setting PDPEs in PAE mode does not require atomic operations, since the
      PDPEs are cached by the processor, and only reloaded on an explicit or
      implicit reload of CR3.
      
      Since the four PDPEs must always be present in an active root, and the kernel
      PDPE is never updated, we are safe even from SMIs and interrupts / NMIs using
      task gates (which reload CR3).  Actually, much of this is moot, since the user
      PDPEs are never updated either, and the only usage of task gates is by the
      doublefault handler.  It appears the only place PGDs get updated in PAE mode
      is in init_low_mappings() / zap_low_mapping() for initial page table creation
      and recovery from ACPI sleep state, and these sites are safe by inspection.
      Getting rid of the cmpxchg8b saves code space and 720 cycles in pgd_alloc on
      P4.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c9b02a24
    • Z
      [PATCH] i386: generate better code around descriptor update and access functions · 2f2984eb
      Zachary Amsden 提交于
      GCC can generate better code around descriptor update and access functions
      when there is not an explicit "eax" register constraint.
      
      Testing: You won't boot if this is messed up, since the TSS descriptor will be
      corrupted.  Verified the assembler and booted.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2f2984eb
    • Z
      [PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task register management · 4d37e7e3
      Zachary Amsden 提交于
      i386 inline assembler cleanup.
      
      This change encapsulates descriptor and task register management.  Also,
      it is possible to improve assembler generation in two cases; savesegment
      may store the value in a register instead of a memory location, which
      allows GCC to optimize stack variables into registers, and MOV MEM, SEG
      is always a 16-bit write to memory, making the casting in math-emu
      unnecessary.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4d37e7e3
    • Z
      [PATCH] i386: cleanup serialize msr · 245067d1
      Zachary Amsden 提交于
      i386 arch cleanup.  Introduce the serialize macro to serialize processor
      state.  Why the microcode update needs it I am not quite sure, since wrmsr()
      is already a serializing instruction, but it is a microcode update, so I will
      keep the semantic the same, since this could be a timing workaround.  As far
      as I can tell, this has always been there since the original microcode update
      source.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      245067d1
    • Z
      [PATCH] i386: inline asm cleanup · 4bb0d3ec
      Zachary Amsden 提交于
      i386 Inline asm cleanup.  Use cr/dr accessor functions.
      
      Also, a potential bugfix.  Also, some CR accessors really should be volatile.
      Reads from CR0 (numeric state may change in an exception handler), writes to
      CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction
      re-ordering.  I did not add memory clobber to CR3 / CR4 / CR0 updates, as it
      was not there to begin with, and in no case should kernel memory be clobbered,
      except when doing a TLB flush, which already has memory clobber.
      
      I noticed that page invalidation does not have a memory clobber.  I can't find
      a bug as a result, but there is definitely a potential for a bug here:
      
      #define __flush_tlb_single(addr) \
      	__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4bb0d3ec
    • N
      [PATCH] ES7000 platform update (i386) · 56f1d5d5
      Natalie.Protasevich@unisys.com 提交于
      This is subarch update for ES7000.  I've modified platform check code and
      removed unnecessary OEM table parsing for newer systems that don't use OEM
      information during boot.  Parsing the table in fact is causing problems,
      and the platform doesn't get recognized.  The patch only affects the ES7000
      subach.
      
      Signed-off-by: <Natalie.Protasevich@unisys.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      56f1d5d5
    • V
      [PATCH] x86: sutomatically enable bigsmp when we have more than 8 CPUs · 911a62d4
      Venkatesh Pallipadi 提交于
      i386 generic subarchitecture requires explicit dmi strings or command line
      to enable bigsmp mode.  The patch below removes that restriction, and uses
      bigsmp as soon as it finds more than 8 logical CPUs, Intel processors and
      xAPIC support.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      911a62d4
    • M
      [PATCH] x86: fix EFI memory map parsing · 7ae65fd3
      Matt Tolentino 提交于
      The memory descriptors that comprise the EFI memory map are not fixed in
      stone such that the size could change in the future.  This uses the memory
      descriptor size obtained from EFI to iterate over the memory map entries
      during boot.  This enables the removal of an x86 specific pad (and ifdef)
      in the EFI header.  I also couldn't stomach the broken up nature of the
      function to put EFI runtime calls into virtual mode any longer so I fixed
      that up a bit as well.
      
      For reference, this patch only impacts x86.
      Signed-off-by: NMatt Tolentino <matthew.e.tolentino@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7ae65fd3
    • Z
      [PATCH] x86: ptep_clear optimization · a600388d
      Zachary Amsden 提交于
      Add a new accessor for PTEs, which passes the full hint from the mmu_gather
      struct; this allows architectures with hardware pagetables to optimize away
      atomic PTE operations when destroying an address space.  Removing the
      locked operation should allow better pipelining of memory access in this
      loop.  I measured an average savings of 30-35 cycles per zap_pte_range on
      the first 500 destructions on Pentium-M, but I believe the optimization
      would win more on older processors which still assert the bus lock on xchg
      for an exclusive cacheline.
      
      Update: I made some new measurements, and this saves exactly 26 cycles over
      ptep_get_and_clear on Pentium M.  On P4, with a PAE kernel, this saves 180
      cycles per ptep_get_and_clear, for a whopping 92160 cycles savings for a
      full address space destruction.
      
      pte_clear_full is not yet used, but is provided for future optimizations
      (in particular, when running inside of a hypervisor that queues page table
      updates, the full hint allows us to avoid queueing unnecessary page table
      update for an address space in the process of being destroyed.
      
      This is not a huge win, but it does help a bit, and sets the stage for
      further hypervisor optimization of the mm layer on all architectures.
      Signed-off-by: NZachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a600388d
    • K
      [PATCH] sab: consolidate kmem_bufctl_t · fa5b08d5
      Kyle Moffett 提交于
      This is used only in slab.c and each architecture gets to define whcih
      underlying type is to be used.
      
      Seems a bit silly - move it to slab.c and use the same type for all
      architectures: unsigned int.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fa5b08d5
    • C
      [PATCH] remove hugetlb_clean_stale_pgtable() and fix huge_pte_alloc() · 0e5c9f39
      Chen, Kenneth W 提交于
      I don't think we need to call hugetlb_clean_stale_pgtable() anymore
      in 2.6.13 because of the rework with free_pgtables().  It now collect
      all the pte page at the time of munmap.  It used to only collect page
      table pages when entire one pgd can be freed and left with staled pte
      pages.  Not anymore with 2.6.13.  This function will never be called
      and We should turn it into a BUG_ON.
      
      I also spotted two problems here, not Adam's fault :-)
      (1) in huge_pte_alloc(), it looks like a bug to me that pud is not
          checked before calling pmd_alloc()
      (2) in hugetlb_clean_stale_pgtable(), it also missed a call to
          pmd_free_tlb.  I think a tlb flush is required to flush the mapping
          for the page table itself when we clear out the pmd pointing to a
          pte page.  However, since hugetlb_clean_stale_pgtable() is never
          called, so it won't trigger the bug.
      Signed-off-by: NKen Chen <kenneth.w.chen@intel.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0e5c9f39
    • A
      [PATCH] hugetlb: add pte_huge() macro · 32e51a8c
      Adam Litke 提交于
      This patch adds a macro pte_huge(pte) for i386/x86_64 which is needed by a
      patch later in the series.  Instead of repeating (_PAGE_PRESENT |
      _PAGE_PSE), I've added __LARGE_PTE to i386 to match x86_64.
      Signed-off-by: NAdam Litke <agl@us.ibm.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      32e51a8c
    • P
      [PATCH] mm: correct _PAGE_FILE comment · 9b4ee40e
      Paolo 'Blaisorblade' Giarrusso 提交于
      _PAGE_FILE does not indicate whether a file is in page / swap cache, it is
      set just for non-linear PTE's.  Correct the comment for i386, x86_64, UML.
      Also clearify _PAGE_NONE.
      Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9b4ee40e
    • S
      [PATCH] mm: consolidate get_order · fd4fd5aa
      Stephen Rothwell 提交于
      Someone mentioned that almost all the architectures used basically the same
      implementation of get_order.  This patch consolidates them into
      asm-generic/page.h and includes that in the appropriate places.  The
      exceptions are ia64 and ppc which have their own (presumably optimised)
      versions.
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fd4fd5aa