1. 10 4月, 2006 8 次提交
  2. 01 4月, 2006 6 次提交
    • H
      kexec: grammar fix for crash_save_this_cpu() · 36a891b6
      Horms 提交于
      kexec: grammar fix for crash_save_this_cpu()
      Signed-Off-By: NHorms <horms@verge.net.au>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      36a891b6
    • A
      [PATCH] unexport get_wchan · 0cb3463f
      Adrian Bunk 提交于
      The only user of get_wchan is the proc fs - and proc can't be built modular.
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0cb3463f
    • A
      [PATCH] sys_sync_file_range() · f79e2abb
      Andrew Morton 提交于
      Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT
      fadvise() additions, do it in a new sys_sync_file_range() syscall instead.
      Reasons:
      
      - It's more flexible.  Things which would require two or three syscalls with
        fadvise() can be done in a single syscall.
      
      - Using fadvise() in this manner is something not covered by POSIX.
      
      The patch wires up the syscall for x86.
      
      The sycall is implemented in the new fs/sync.c.  The intention is that we can
      move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.
      
      Documentation for the syscall is in fs/sync.c.
      
      A test app (sync_file_range.c) is in
      http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.
      
      The available-to-GPL-modules do_sync_file_range() is for knfsd: "A COMMIT can
      say NFS_DATA_SYNC or NFS_FILE_SYNC.  I can skip the ->fsync call for
      NFS_DATA_SYNC which is hopefully the more common."
      
      Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if
      the queue is congested.  This is trivial to fix: add a new flag bit, set
      wbc->nonblocking.  But I'm not sure that we want to expose implementation
      details down to that level.
      
      Note: it's notable that we can sync an fd which wasn't opened for writing.
      Same with fsync() and fdatasync()).
      
      Note: the code takes some care to handle attempts to sync file contents
      outside the 16TB offset on 32-bit machines.  It makes such attempts appear to
      succeed, for best 32-bit/64-bit compatibility.  Perhaps it should make such
      requests fail...
      
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f79e2abb
    • O
      [PATCH] Don't pass boot parameters to argv_init[] · 9b41046c
      OGAWA Hirofumi 提交于
      The boot cmdline is parsed in parse_early_param() and
      parse_args(,unknown_bootoption).
      
      And __setup() is used in obsolete_checksetup().
      
      	start_kernel()
      		-> parse_args()
      			-> unknown_bootoption()
      				-> obsolete_checksetup()
      
      If __setup()'s callback (->setup_func()) returns 1 in
      obsolete_checksetup(), obsolete_checksetup() thinks a parameter was
      handled.
      
      If ->setup_func() returns 0, obsolete_checksetup() tries other
      ->setup_func().  If all ->setup_func() that matched a parameter returns 0,
      a parameter is seted to argv_init[].
      
      Then, when runing /sbin/init or init=app, argv_init[] is passed to the app.
      If the app doesn't ignore those arguments, it will warning and exit.
      
      This patch fixes a wrong usage of it, however fixes obvious one only.
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9b41046c
    • J
      [PATCH] Mark unwind info for signal trampolines in vDSOs · da2e9e1f
      Jakub Jelinek 提交于
      Mark unwind info for signal trampolines using the new S augmentation flag
      introduced in: http://gcc.gnu.org/PR26208.
      
      GCC 4.2 (or patched earlier GCC) will be able to special case unwinding
      through frames right above signal trampolines.  As the augmentations start
      with z flag and S is at the very end of the augmentation string, older GCCs
      will just skip the S flag as unknown (that's why an augmentation flag was
      chosen over say a new CFA opcode).
      Signed-off-by: NJakub Jelinek <jakub@redhat.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      da2e9e1f
    • V
      [PATCH] i386 kdump timer vector lockup fix · 1a75a3f0
      Vivek Goyal 提交于
      Porting the patch I posted for x86_64 to i386.
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=114178139610707&w=2
      
      o While using kdump, after a system crash when second kernel boots, timer
        vector gets (0x31) locked and CPU does not see timer interrupts
        travelling from IOAPIC to APIC.  Currently it does not lead to boot
        failure in second kernel as timer interrupts continues to come as ExtInt
        through LAPIC directly, but fixing it is good in case some boards do not
        support the other mode.
      
      o After a system crash, it is not safe to service interrupts any more,
        hence interrupts are disabled.  This leads to pending interrupts at
        LAPIC.  LAPIC sends these interrupts to the CPU during early boot of
        second kernel.  Other pending interrupts are discarded saying unexpected
        trap but timer interrupt is serviced and CPU does not issue an LAPIC EOI
        because it think this interrupt came from i8259 and sends ack to 8259.
        This leads to vector 0x31 locking as LAPIC does not clear respective ISR
        and keeps on waiting for EOI.
      
      o This patch issues extra EOI for the pending interrupts who have ISR set.
      
      o Though today only timer seems to be the special case because in early
        boot it thinks interrupts are coming from i8259 and uses
        mask_and_ack_8259A() as ack handler and does not issue LAPIC EOI.  But
        probably doing it in generic manner for all vectors makes sense.
      Signed-off-by: NVivek Goyal <vgoyal@in.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1a75a3f0
  3. 31 3月, 2006 1 次提交
    • J
      [PATCH] Introduce sys_splice() system call · 5274f052
      Jens Axboe 提交于
      This adds support for the sys_splice system call. Using a pipe as a
      transport, it can connect to files or sockets (latter as output only).
      
      From the splice.c comments:
      
         "splice": joining two ropes together by interweaving their strands.
      
         This is the "extended pipe" functionality, where a pipe is used as
         an arbitrary in-memory buffer. Think of a pipe as a small kernel
         buffer that you can use to transfer data from one end to the other.
      
         The traditional unix read/write is extended with a "splice()" operation
         that transfers data buffers to or from a pipe buffer.
      
         Named by Larry McVoy, original implementation from Linus, extended by
         Jens to support splicing to files and fixing the initial implementation
         bugs.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5274f052
  4. 29 3月, 2006 3 次提交
  5. 28 3月, 2006 11 次提交
    • A
      [CPUFREQ] powernow: remove private for_each_cpu_mask() · 64840e27
      Andrew Morton 提交于
      It is unneeded and wrong.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NDave Jones <davej@redhat.com>
      64840e27
    • S
      [CPUFREQ] hotplug cpu fix for powernow-k8 · eef5167e
      shin, jacob 提交于
      Andi's previous fix to initialise powernow_data on all siblings
      will not work properly with CPU Hotplug.
      Signed-off-by: NJacob Shin <jacob.shin@amd.com>
      Signed-off-by: NDave Jones <davej@redhat.com>
      eef5167e
    • A
      [PATCH] fbdev: Make BIOS EDID reading configurable · 59153f7d
      Antonino A. Daplas 提交于
      DDC reading via the Video BIOS may take several tens of seconds with some
      combination of display cards and monitors.
      
      Make this option configurable.  It defaults to `y' to minimise disruption.
      Signed-off-by: NAntonino Daplas <adaplas@pol.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      59153f7d
    • A
      [PATCH] Notifier chain update: API changes · e041c683
      Alan Stern 提交于
      The kernel's implementation of notifier chains is unsafe.  There is no
      protection against entries being added to or removed from a chain while the
      chain is in use.  The issues were discussed in this thread:
      
          http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
      
      We noticed that notifier chains in the kernel fall into two basic usage
      classes:
      
      	"Blocking" chains are always called from a process context
      	and the callout routines are allowed to sleep;
      
      	"Atomic" chains can be called from an atomic context and
      	the callout routines are not allowed to sleep.
      
      We decided to codify this distinction and make it part of the API.  Therefore
      this set of patches introduces three new, parallel APIs: one for blocking
      notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
      really just the old API under a new name).  New kinds of data structures are
      used for the heads of the chains, and new routines are defined for
      registration, unregistration, and calling a chain.  The three APIs are
      explained in include/linux/notifier.h and their implementation is in
      kernel/sys.c.
      
      With atomic and blocking chains, the implementation guarantees that the chain
      links will not be corrupted and that chain callers will not get messed up by
      entries being added or removed.  For raw chains the implementation provides no
      guarantees at all; users of this API must provide their own protections.  (The
      idea was that situations may come up where the assumptions of the atomic and
      blocking APIs are not appropriate, so it should be possible for users to
      handle these things in their own way.)
      
      There are some limitations, which should not be too hard to live with.  For
      atomic/blocking chains, registration and unregistration must always be done in
      a process context since the chain is protected by a mutex/rwsem.  Also, a
      callout routine for a non-raw chain must not try to register or unregister
      entries on its own chain.  (This did happen in a couple of places and the code
      had to be changed to avoid it.)
      
      Since atomic chains may be called from within an NMI handler, they cannot use
      spinlocks for synchronization.  Instead we use RCU.  The overhead falls almost
      entirely in the unregister routine, which is okay since unregistration is much
      less frequent that calling a chain.
      
      Here is the list of chains that we adjusted and their classifications.  None
      of them use the raw API, so for the moment it is only a placeholder.
      
        ATOMIC CHAINS
        -------------
      arch/i386/kernel/traps.c:		i386die_chain
      arch/ia64/kernel/traps.c:		ia64die_chain
      arch/powerpc/kernel/traps.c:		powerpc_die_chain
      arch/sparc64/kernel/traps.c:		sparc64die_chain
      arch/x86_64/kernel/traps.c:		die_chain
      drivers/char/ipmi/ipmi_si_intf.c:	xaction_notifier_list
      kernel/panic.c:				panic_notifier_list
      kernel/profile.c:			task_free_notifier
      net/bluetooth/hci_core.c:		hci_notifier
      net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_chain
      net/ipv4/netfilter/ip_conntrack_core.c:	ip_conntrack_expect_chain
      net/ipv6/addrconf.c:			inet6addr_chain
      net/netfilter/nf_conntrack_core.c:	nf_conntrack_chain
      net/netfilter/nf_conntrack_core.c:	nf_conntrack_expect_chain
      net/netlink/af_netlink.c:		netlink_chain
      
        BLOCKING CHAINS
        ---------------
      arch/powerpc/platforms/pseries/reconfig.c:	pSeries_reconfig_chain
      arch/s390/kernel/process.c:		idle_chain
      arch/x86_64/kernel/process.c		idle_notifier
      drivers/base/memory.c:			memory_chain
      drivers/cpufreq/cpufreq.c		cpufreq_policy_notifier_list
      drivers/cpufreq/cpufreq.c		cpufreq_transition_notifier_list
      drivers/macintosh/adb.c:		adb_client_list
      drivers/macintosh/via-pmu.c		sleep_notifier_list
      drivers/macintosh/via-pmu68k.c		sleep_notifier_list
      drivers/macintosh/windfarm_core.c	wf_client_list
      drivers/usb/core/notify.c		usb_notifier_list
      drivers/video/fbmem.c			fb_notifier_list
      kernel/cpu.c				cpu_chain
      kernel/module.c				module_notify_list
      kernel/profile.c			munmap_notifier
      kernel/profile.c			task_exit_notifier
      kernel/sys.c				reboot_notifier_list
      net/core/dev.c				netdev_chain
      net/decnet/dn_dev.c:			dnaddr_chain
      net/ipv4/devinet.c:			inetaddr_chain
      
      It's possible that some of these classifications are wrong.  If they are,
      please let us know or submit a patch to fix them.  Note that any chain that
      gets called very frequently should be atomic, because the rwsem read-locking
      used for blocking chains is very likely to incur cache misses on SMP systems.
      (However, if the chain's callout routines may sleep then the chain cannot be
      atomic.)
      
      The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
      material written by Keith Owens and suggestions from Paul McKenney and Andrew
      Morton.
      
      [jes@sgi.com: restructure the notifier chain initialization macros]
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: NJes Sorensen <jes@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e041c683
    • I
      [PATCH] lightweight robust futexes: i386 · dfd4e3ec
      Ingo Molnar 提交于
      i386: add the futex_atomic_cmpxchg_inuser() assembly implementation, and wire
      up the new syscalls.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NArjan van de Ven <arjan@infradead.org>
      Acked-by: NUlrich Drepper <drepper@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      dfd4e3ec
    • D
      [PATCH] unify PFN_* macros · 22a9835c
      Dave Hansen 提交于
      Just about every architecture defines some macros to do operations on pfns.
       They're all virtually identical.  This patch consolidates all of them.
      
      One minor glitch is that at least i386 uses them in a very skeletal header
      file.  To keep away from #include dependency hell, I stuck the new
      definitions in a new, isolated header.
      
      Of all of the implementations, sh64 is the only one that varied by a bit.
      It used some masks to ensure that any sign-extension got ripped away before
      the arithmetic is done.  This has been posted to that sh64 maintainers and
      the development list.
      
      Compiles on x86, x86_64, ia64 and ppc64.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      22a9835c
    • K
      [PATCH] for_each_online_pgdat: remove sorting pgdat · 3571761f
      KAMEZAWA Hiroyuki 提交于
      Because pgdat_list was linked to pgdat_list in *reverse* order, (By default)
      some of arch has to sort it by themselves.
      
      for_each_pgdat has gone..for_each_online_pgdat() uses node_online_map, which
      doesn't need to be sorted.
      
      This patch removes codes for sorting pgdat.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3571761f
    • K
      [PATCH] for_each_online_pgdat: renaming for_each_pgdat · ec936fc5
      KAMEZAWA Hiroyuki 提交于
      Replace for_each_pgdat() with for_each_online_pgdat().
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ec936fc5
    • S
      [PATCH] x86: don't use cpuid.2 to determine cache info if cpuid.4 is supported · b06be912
      Shaohua Li 提交于
      Don't use cpuid.2 to determine cache info if cpuid.4 is supported.  The
      exception is P4 trace cache.  We always use cpuid.2 to get trace cache
      under P4.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b06be912
    • S
      [PATCH] sched: new sched domain for representing multi-core · 1e9f28fa
      Siddha, Suresh B 提交于
      Add a new sched domain for representing multi-core with shared caches
      between cores.  Consider a dual package system, each package containing two
      cores and with last level cache shared between cores with in a package.  If
      there are two runnable processes, with this appended patch those two
      processes will be scheduled on different packages.
      
      On such systems, with this patch we have observed 8% perf improvement with
      specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2
      users).
      
      This new domain will come into play only on multi-core systems with shared
      caches.  On other systems, this sched domain will be removed by domain
      degeneration code.  This new domain can be also used for implementing power
      savings policy (see OLS 2005 CMP kernel scheduler paper for more details..
      I will post another patch for power savings policy soon)
      
      Most of the arch/* file changes are for cpu_coregroup_map() implementation.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1e9f28fa
    • O
      [PATCH] PM-Timer: don't use workaround if chipset is not buggy · dbffa471
      OGAWA Hirofumi 提交于
      Current timer_pm.c reads I/O port triple times, in order to avoid the bug
      of chipset.  But I/O port is slow.
      
      2.6.16 (pmtmr)
      Simple gettimeofday: 3.6532 microseconds
      
      2.6.16+patch (pmtmr)
      Simple gettimeofday: 1.4582 microseconds
      
      [if chip is buggy, probably it will be 7us or more in 4.2% of probability.]
      
      This patch adds blacklist of buggy chip, and if chip is not buggy, this
      uses fast normal version instead of slow workaround version.
      
      If chip is buggy, warnings "pmtmr is slow".  But sounds like there is gray
      zone.  I found the PIIX4 errata, but I couldn't find the ICH4 errata.  But
      some motherboard seems to have problem.
      
      So, if we found a ICH4, generate warnings, and use a workaround version.
      If user's ICH4 is good, the user can specify the "pmtmr_good" boot
      parameter to use fast version.
      Acked-by: NJohn Stultz <johnstul@us.ibm.com>
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      dbffa471
  6. 27 3月, 2006 11 次提交
    • A
      [PATCH] bitops: i386: use generic bitops · 1cc2b994
      Akinobu Mita 提交于
      - remove generic_fls64()
      - remove sched_find_first_bit()
      - remove generic_hweight{32,16,8}()
      - remove ext2_{set,clear,test,find_first_zero,find_next_zero}_bit()
      - remove minix_{test,set,test_and_clear,test,find_first_zero}_bit()
      Signed-off-by: NAkinobu Mita <mita@miraclelinux.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1cc2b994
    • P
      [PATCH] kprobes: fix broken fault handling for i386 · b4026513
      Prasanna S Panchamukhi 提交于
      Provide proper kprobes fault handling, if a user-specified pre/post handlers
      tries to access user address space, through copy_from_user(), get_user() etc.
      
      The user-specified fault handler gets called only if the fault occurs while
      executing user-specified handlers.  In such a case user-specified handler is
      allowed to fix it first, later if the user-specifed fault handler does not fix
      it, we try to fix it by calling fix_exception().
      
      The user-specified handler will not be called if the fault happens when single
      stepping the original instruction, instead we reset the current probe and
      allow the system page fault handler to fix it up.
      Signed-off-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b4026513
    • B
      [PATCH] kprobe handler: discard user space trap · 2326c770
      bibo,mao 提交于
      Currently kprobe handler traps only happen in kernel space, so function
      kprobe_exceptions_notify should skip traps which happen in user space.
      This patch modifies this, and it is based on 2.6.16-rc4.
      Signed-off-by: Nbibo mao <bibo.mao@intel.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
      Cc: <hiramatu@sdl.hitachi.co.jp>
      Signed-off-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2326c770
    • B
      [PATCH] kretprobe instance recycled by parent process · c6fd91f0
      bibo mao 提交于
      When kretprobe probes the schedule() function, if the probed process exits
      then schedule() will never return, so some kretprobe instances will never
      be recycled.
      
      In this patch the parent process will recycle retprobe instances of the
      probed function and there will be no memory leak of kretprobe instances.
      Signed-off-by: Nbibo mao <bibo.mao@intel.com>
      Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c6fd91f0
    • M
      [PATCH] kretprobe: kretprobe-booster · c9becf58
      Masami Hiramatsu 提交于
      In normal operation, kretprobe makes a target function return to trampoline
      code.  A kprobe (called trampoline_probe) has been inserted in the trampoline
      code.  When the kernel hits this kprobe, it calls kretprobe's handler and it
      returns to the original return address.
      
      Kretprobe-booster removes the trampoline_probe.  It allows the trampoline code
      to call kretprobe's handler directly instead of invoking kprobe.  The
      trampoline code returns to the original return address.
      
      (changelog from Chuck Ebbert <76306.1226@compuserve.com> - thanks ;))
      Signed-off-by: NMasami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c9becf58
    • M
      [PATCH] x86: kprobes-booster · 311ac88f
      Masami Hiramatsu 提交于
      Current kprobe copies the original instruction at the probe point and replaces
      it with a breakpoint instruction (int3).  When the kernel hits the probe
      point, kprobe handler is invoked.  And the copied instruction is single-step
      executed on the copied buffer (not on the original address) by kprobe.  After
      that, the kprobe checks registers and modify it (if need) as if the
      instructions was executed on the original address.
      
      My proposal is based on the fact there are many instructions which do NOT
      require the register modification after the single-step execution.  When the
      copied instruction is a kind of them, kprobe just jumps back to the next
      instruction after single-step execution.  If so, why don't we execute those
      instructions directly?
      
      With kprobe-booster patch, kprobes will execute a copied instruction directly
      and (if need) jump back to original code.  This direct execution is executed
      when the kprobe don't have both post_handler and break_handler, and the copied
      instruction can be executed directly.
      
      I sorted instructions which can be executed directly or not;
      
      - Call instructions are NG(can not be executed directly).
        We should correct the return address pushed into top of stack.
      - Indirect instructions except for absolute indirect-jumps
        are NG. Those instructions changes EIP randomly. We should
        check EIP and correct it.
      - Instructions that change EIP beyond the range of the
        instruction buffer are NG.
      - Instructions that change EIP to tail 5 bytes of the
        instruction buffer (it is the size of a jump instruction).
        We must write a jump instruction which backs to original
        kernel code in the instruction buffer.
      - Break point instruction is NG. We should not touch EIP and
        pass to other handlers.
      - Absolute direct/indirect jumps are OK.- Conditional Jumps are NG.
      - Halt and software-interruptions are NG. Because it will stay on
        the instruction buffer of kprobes.
      - Prefixes are NG.
      - Unknown/reserved opcode is NG.
      - Other 1 byte instructions are OK. But those instructions need a
        jump back code.
      - 2 bytes instructions are mapped sparsely. So, in this release,
        this patch don't boost those instructions.
      
      >From Intel's IA-32 opcode map described in IA-32 Intel Architecture Software
      Developer's Manual Vol.2 B, I determined that following opcodes are not
      boostable.
      
      - 0FH (2byte escape)
      - 70H - 7FH (Jump on condition)
      - 9AH (Call) and 9CH (Pushf)
      - C0H-C1H (Grp 2: includes reserved opcode)
      - C6H-C7H (Grp11: includes reserved opcode)
      - CCH-CEH (Software-interrupt)
      - D0H-D3H (Grp2: includes reserved opcode)
      - D6H (Reserved)
      - D8H-DFH (Coprocessor)
      - E0H-E3H (loop/conditional jump)
      - E8H (Call)
      - F0H-F3H (Prefixes and reserved)
      - F4H (Halt)
      - F6H-F7H (Grp3: includes reserved opcode)
      - FEH-FFH(Grp4,5: includes reserved opcode)
      
      Kprobe-booster checks whether target instruction can be boosted (can be
      executed directly) at arch_copy_kprobe() function.  If the target instruction
      can be boosted, it clears "boostable" flag.  If not, it sets "boostable" flag
      -1.  This is disabled status.  In resume_execution() function, If "boostable"
      flag is cleared, kprobe-booster measures the size of the target instruction
      and sets "boostable" flag 1.
      
      In kprobe_handler(), kprobe checks the "boostable" flag.  If the flag is 1, it
      resets current kprobe and executes instruction buffer directly instead of
      single stepping.
      
      When unregistering a boosted kprobe, it calls synchronize_sched()
      after "int3" is removed. So we can ensure followings after
      the synchronize_sched() called.
      - interrupt handlers are finished on all CPUs.
      - instruction buffer is not executed on all CPUs.
      And we can release the boosted kprobe safely.
      
      And also, on preemptible kernel, the booster is not enabled where the kernel
      preemption is enabled.  So, there are no preempted threads on the instruction
      buffer.
      
      The description of kretprobe-booster:
      ====================================
      
      In the normal operation, kretprobe make a target function return to trampoline
      code.  And a kprobe (called trampoline_probe) have been inserted at the
      trampoline code.  When the kernel hits this kprobe, it calls kretprobe's
      handler and it returns to original return address.
      
      Kretprobe-booster patch removes the trampoline_probe.  It allows the
      trampoline code to call kretprobe's handler directly instead of invoking
      kprobe.  And tranpoline code returns to original return address.
      
      This new trampoline code stores and restores registers, so the kretprobe
      handler is still able to access those registers.
      
      Current kprobe has about 1.3 usec/probe(*) overhead, and kprobe-booster patch
      reduces it to 0.6 usec/probe(*).  Also current kretprobe has about 2.0
      usec/probe(*) overhead.  Kprobe-booster patch reduces it to 1.3 usec/probe(*),
      and the combination of both kprobe-booster patch and kretprobe-booster patch
      reduces it to 0.9 usec/probe(*).
      
      I expect the combination of both patches can reduce half of a probing
      overhead.
      
      Performance numbers strongly depend on the processor model.
      
      Andrew Morton wrote:
      > These preempt tricks look rather nasty.  Can you please describe what the
      > problem is, precisely?  And how this code avoids it?  Perhaps we can find
      > something cleaner.
      
      The problem is how to remove the copied instructions of the
      kprobe *safely* on the preemptable kernel (CONFIG_PREEMPT=y).
      
      Kprobes basically executes the following actions;
      
      (1)int3
      (2)preempt_disable()
      (3)kprobe_prehandler()
      (4)copied instructioin(single step)
      (5)kprobe_posthandler()
      (6)preempt_enable()
      (7)return to the original code
      
      During the execution of copied instruction, preemption is
      disabled (from step (2) to (6)).
      When unregistering the probes, Kprobe waits for RCU
      quiescent state by using synchronize_sched() after removing
      int3 instruction.
      Thus we can ensure the copied instruction is not executed.
      
      On the other hand, kprobe-booster executes the following actions;
      
      (1)int3
      (2)preempt_disable()
      (3)kprobe_prehandler()
      (4)preempt_enable()             <-- this one is added by my patch
      (5)copied instruction(direct execution)
      (6)jmp back to the original code
      
      The problem is that we have no way to prevent preemption on
      step (5) or (6). We cannot call preempt_disable() after step (6),
      because there are no rooms to do that. Thus, some other
      processes may be preempted at step(5) or (6) on preemptable kernel.
      And I couldn't find the easy way to ensure that other processes'
      stack do *not* have the address of them. (I thought some way
      to do that, but those are very costly.)
      
      So currently, I simply boost the kprobe only when the probe
      point is already preemption disabled.
      
      > Also, the patch adds a preempt_enable() but I don't see a corresponding
      > preempt_disable().  Am I missing something?
      
      It is corresponding to the preempt_disable() in the top of
      kprobe_handler().
      I copied the code of kprobe_handler() here:
      
      static int __kprobes kprobe_handler(struct pt_regs *regs)
      {
              struct kprobe *p;
              int ret = 0;
              kprobe_opcode_t *addr = NULL;
              unsigned long *lp;
              struct kprobe_ctlblk *kcb;
      
              /*
               * We don't want to be preempted for the entire
               * duration of kprobe processing
               */
              preempt_disable();             <-- HERE
              kcb = get_kprobe_ctlblk();
      Signed-off-by: NMasami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      311ac88f
    • M
      [PATCH] kprobes: clean up resume_execute() · b50ea74c
      Masami Hiramatsu 提交于
      Clean up kprobe's resume_execute() for i386 arch.
      Signed-off-by: NMasami Hiramatsu <hiramatu@sdl.hitachi.co.jp>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b50ea74c
    • D
      [PATCH] fix array overrun in efi.c · d6d21dfd
      Darren Jenkins 提交于
      Coverity found an over-run @ line 364 of efi.c
      
      This is due to the loop checking the size correctly, then adding a '\0'
      after possibly hitting the end of the array.
      
      Ensure the loop exits with one space left in the array.
      Signed-off-by: NDarren Jenkins <darrenrjenkins@gmail.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d6d21dfd
    • I
      [PATCH] sem2mutex: misc static one-file mutexes · 14cc3e2b
      Ingo Molnar 提交于
      Semaphore to mutex conversion.
      
      The conversion was generated via scripts, and the result was validated
      automatically via a script as well.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jens Axboe <axboe@suse.de>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Acked-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: Adam Belay <ambx1@neo.rr.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      14cc3e2b
    • T
      [PATCH] EFI fixes · 23dd842c
      Tolentino, Matthew E 提交于
      Here's a patch that fixes EFI boot for x86 on 2.6.16-rc5-mm3.  The
      off-by-one is admittedly my fault, but the other two fix up the rest.
      
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Matt Domsch <Matt_Domsch@dell.com>
      Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
      Cc: "Brown, Len" <len.brown@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      23dd842c
    • B
      [PATCH] EFI: keep physical table addresses in efi structure · b2c99e3c
      Bjorn Helgaas 提交于
      Almost all users of the table addresses from the EFI system table want
      physical addresses.  So rather than doing the pa->va->pa conversion, just keep
      physical addresses in struct efi.
      
      This fixes a DMI bug: the efi structure contained the physical SMBIOS address
      on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap()
      on a virtual address on ia64.
      
      This is essentially the same as an earlier patch by Matt Tolentino:
      	http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2
      except that this changes all table addresses, not just ACPI addresses.
      
      Matt's original patch was backed out because it caused MCAs on HP sx1000
      systems.  That problem is resolved by the ioremap() attribute checking added
      for ia64.
      Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Matt Domsch <Matt_Domsch@dell.com>
      Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
      Cc: "Brown, Len" <len.brown@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Acked-by: N"Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b2c99e3c