1. 28 Jul, 2008 5 commits
  2. 26 Jul, 2008 2 commits
    • sparc: Wire up new system calls. · f1373da8
      David S. Miller committed
      This wires up the recently added signalfd4, eventfd2,
      epoll_create1, dup3, pipe2, and inotify_init1 system calls.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • kprobes: improve kretprobe scalability with hashed locking · ef53d9c5
      Srinivasa D S committed
      Currently the list of kretprobe instances is stored in the kretprobe object
      (as used_instances, free_instances) and in the kretprobe hash table.  One
      global kretprobe lock serialises access to these lists, so only one
      kretprobe handler can execute at a time.  This hurts system performance,
      particularly on SMP systems and when return probes are set on a lot of
      functions (for example on all system calls).
      
      The solution proposed here uses fine-grained locks that perform better on
      SMP systems than the present kretprobe implementation.
      
      Solution:
      
       1) Instead of having one global lock to protect the kretprobe instances
          present in the kretprobe object and the kretprobe hash table, we have
          two locks: one lock for protecting the kretprobe hash table and
          another lock for the kretprobe object.
      
       2) We hold the lock present in the kretprobe object while we modify a
          kretprobe instance in the kretprobe object, and we hold the
          per-hash-list lock while modifying kretprobe instances present in that
          hash list.  To prevent deadlock, we never grab a per-hash-list lock
          while holding a kretprobe lock.
      
       3) We can remove used_instances from struct kretprobe, as we can
          track used kretprobe instances using the kretprobe hash table.
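
The locking scheme in the three points above can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: the names `struct probe`, `struct bucket`, `instance_get`, `instance_put` and the table size are hypothetical, and a C11 `atomic_flag` spinlock stands in for the kernel spinlock. The sketch releases one lock before taking the other, which trivially satisfies the rule of never taking a per-hash-list lock while holding the probe lock.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 64u                      /* hypothetical number of hash lists */

struct spinlock { atomic_flag held; };

static void spin_init(struct spinlock *l)   { atomic_flag_clear(&l->held); }
static void spin_lock(struct spinlock *l)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}
static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct instance { struct instance *next; const void *task; };

/* One lock per hash list instead of one global lock. */
struct bucket { struct spinlock lock; struct instance *head; };

/* The probe keeps only its free instances; used ones live in the table. */
struct probe { struct spinlock lock; struct instance *free_list; };

static struct bucket table[TABLE_SIZE];

static void table_init(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        spin_init(&table[i].lock);
        table[i].head = NULL;
    }
}

static struct bucket *bucket_for(const void *task)
{
    return &table[((uintptr_t)task >> 4) % TABLE_SIZE];
}

/* Take a free instance under the probe lock, then publish it under the
 * bucket lock.  The two locks are never held at the same time. */
static struct instance *instance_get(struct probe *p, const void *task)
{
    spin_lock(&p->lock);
    struct instance *ri = p->free_list;
    if (ri)
        p->free_list = ri->next;
    spin_unlock(&p->lock);
    if (!ri)
        return NULL;

    ri->task = task;
    struct bucket *b = bucket_for(task);
    spin_lock(&b->lock);
    ri->next = b->head;
    b->head = ri;
    spin_unlock(&b->lock);
    return ri;
}

/* Unlink from the hash list first, then return it to the probe. */
static void instance_put(struct probe *p, struct instance *ri)
{
    struct bucket *b = bucket_for(ri->task);
    spin_lock(&b->lock);
    struct instance **pp = &b->head;
    while (*pp && *pp != ri)
        pp = &(*pp)->next;
    if (*pp)
        *pp = ri->next;
    spin_unlock(&b->lock);

    spin_lock(&p->lock);
    ri->next = p->free_list;
    p->free_list = ri;
    spin_unlock(&p->lock);
}
```

Handlers for tasks that hash to different lists contend only on the short probe-lock sections, which is the scalability gain the commit message describes.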
      
      Time taken for kernel compilation ("make -j 8") on an 8-way ppc64 system
      with return probes set on all system calls:
      
              cacheline-aligned   non-cacheline-aligned   unpatched
              patch               patch                   kernel
      ==============================================================
      real    9m46.784s           9m54.412s               10m2.450s
      user    40m5.715s           40m7.142s               40m4.273s
      sys     2m57.754s           2m58.583s               3m17.430s
      ==============================================================
      
      Time taken for kernel compilation ("make -j 8") on the same system when
      the kernel is not probed:
      
      =========================
      real    9m26.389s
      user    40m8.775s
      sys     2m7.283s
      =========================
      Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
      Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
      Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 25 Jul, 2008 2 commits
    • flag parameters: pipe · ed8cae8b
      Ulrich Drepper committed
      This patch introduces the new syscall pipe2, which is like pipe but
      takes an additional parameter holding a flag value.  This patch implements
      the handling of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code,
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd[2];
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exec for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exec for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures · 27ac792c
      Andrea Righi committed
      On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
      boundary. For example:
      
      	u64 val = PAGE_ALIGN(size);
      
      always returns a value < 4GB even if size is greater than 4GB.
      
      The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
      example):
      
      #define PAGE_SHIFT      12
      #define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
      #define PAGE_MASK       (~(PAGE_SIZE-1))
      ...
      #define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)
      
      The "~" is performed on a 32-bit value, so any value greater than 4GB
      that is ANDed with PAGE_MASK is truncated to the 32-bit boundary.
      Using the ALIGN() macro seems to be the right way, because it uses
      typeof(addr) for the mask.
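
The truncation can be reproduced in user space on any machine. In this minimal sketch `uint32_t` stands in for the 32-bit `unsigned long` of a 32-bit kernel, and the names `PAGE_SIZE_32`, `PAGE_MASK_32`, `ALIGN_TO`, `page_align_old` and `page_align_new` are illustrative, not kernel identifiers. `ALIGN_TO` follows the idea of the kernel's ALIGN() macro by casting the alignment to the type of the value before building the mask; it relies on the GCC/Clang `__typeof__` extension.

```c
#include <stdint.h>

#define PAGE_SHIFT    12
/* uint32_t models a 32-bit unsigned long (assumption of this sketch). */
#define PAGE_SIZE_32  ((uint32_t)1 << PAGE_SHIFT)
#define PAGE_MASK_32  (~(PAGE_SIZE_32 - 1))      /* 0xFFFFF000, only 32 bits wide */

/* Old definition: the mask is zero-extended to 64 bits, so the AND
 * clears every bit above bit 31. */
#define PAGE_ALIGN_OLD(addr)  (((addr) + PAGE_SIZE_32 - 1) & PAGE_MASK_32)

/* Mask built in the type of x, so "~" operates on the full width. */
#define ALIGN_MASK(x, mask)   (((x) + (mask)) & ~(mask))
#define ALIGN_TO(x, a)        ALIGN_MASK(x, (__typeof__(x))(a) - 1)
#define PAGE_ALIGN_NEW(addr)  ALIGN_TO(addr, PAGE_SIZE_32)

static uint64_t page_align_old(uint64_t addr) { return PAGE_ALIGN_OLD(addr); }
static uint64_t page_align_new(uint64_t addr) { return PAGE_ALIGN_NEW(addr); }
```

For a size of 4GB + 1 (0x100000001), `page_align_old` yields 0x1000 while `page_align_new` yields 0x100001000; for values below 4GB the two agree.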
      
      Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h into
      include/linux/mm.h.
      
      See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
      
      [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
      [akpm@linux-foundation.org: fix v850]
      [akpm@linux-foundation.org: fix powerpc]
      [akpm@linux-foundation.org: fix arm]
      [akpm@linux-foundation.org: fix mips]
      [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
      [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
      [akpm@linux-foundation.org: fix powerpc]
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 24 Jul, 2008 1 commit
  5. 23 Jul, 2008 1 commit
    • sparc64: Fix lockdep issues in LDC protocol layer. · b7c2a757
      David S. Miller committed
      We're calling request_irq() with IRQs disabled.
      
      No straightforward fix exists because we want to
      enable these IRQs and set up state atomically before
      getting into the IRQ handler the first time.
      
      What happens now is that we mark the VIRQ to not be
      automatically enabled by request_irq().  Then we
      make explicit enable_irq() calls when we grab the
      LDC channel.
      
      This way we don't need to call request_irq() illegally
      under the LDC channel lock any more.
      
      Bump LDC version and release date.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 22 Jul, 2008 4 commits
  7. 19 Jul, 2008 1 commit
    • nohz: prevent tick stop outside of the idle loop · b8f8c3cf
      Thomas Gleixner committed
      Jack Ren and Eric Miao tracked down the following long standing
      problem in the NOHZ code:
      
      	scheduler switch to idle task
      	enable interrupts
      
      Window starts here
      
      	----> interrupt happens (does not set NEED_RESCHED)
      	      	irq_exit() stops the tick
      
      	----> interrupt happens (does set NEED_RESCHED)
      
      	return from schedule()
      	
      	cpu_idle(): preempt_disable();
      
      Window ends here
      
      The interrupts can happen at any point inside the race window. The
      first interrupt stops the tick, the second one causes the scheduler to
      rerun and switch away from idle again and we end up with the tick
      disabled.
      
      The fact that it needs two interrupts, where the first one does not set
      NEED_RESCHED and the second one does, made the bug obscure and extremely
      hard to reproduce and analyse. Kudos to Jack and Eric.
      
      Solution: Limit the NOHZ functionality to the idle loop to make sure
      that we can not run into such a situation ever again.
      
      cpu_idle()
      {
      	preempt_disable();
      
      	while(1) {
      		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
      		 			          are in the idle loop
      
      		 while (!need_resched())
      		       halt();
      
      		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
      		 preempt_enable_no_resched();
      		 schedule();
      		 preempt_disable();
      	}
      }
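
The effect of limiting NOHZ to the idle loop can be modelled with a tiny state machine. This is a simplified sketch, not the kernel implementation: the `struct tick_model` type, its `inidle` field and the `model_*` functions are assumptions made for illustration. The idle loop calls the stop function with `inidle` set to true, while the interrupt exit path calls it with false and is ignored unless the idle loop has already announced itself.

```c
#include <stdbool.h>

/* Hypothetical per-CPU state for this sketch. */
struct tick_model {
    bool inidle;        /* set once the idle loop has announced itself */
    bool tick_stopped;  /* true while the periodic tick is off */
};

/* Called with inidle = true from the idle loop and with inidle = false
 * from interrupt exit.  An interrupt outside the idle loop must not be
 * able to stop the tick. */
static void model_stop_tick(struct tick_model *ts, bool inidle)
{
    if (!inidle && !ts->inidle)
        return;                 /* outside the idle loop: keep the tick */
    ts->inidle = true;
    ts->tick_stopped = true;
}

/* Called by the idle loop right before it calls schedule(). */
static void model_restart_tick(struct tick_model *ts)
{
    ts->inidle = false;
    ts->tick_stopped = false;
}
```

In this model an interrupt that arrives in the race window, before the idle loop has called `model_stop_tick(ts, true)`, leaves `tick_stopped` false, so the CPU cannot leave idle with the tick disabled.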
      
      In hindsight we should have done this forever, but ... 
      
      /me grabs a large brown paperbag.
      
      Debugged-by: Jack Ren <jack.ren@marvell.com>
      Debugged-by: eric miao <eric.y.miao@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  8. 18 Jul, 2008 5 commits
  9. 08 Jul, 2008 2 commits
  10. 03 Jul, 2008 2 commits
  11. 01 Jul, 2008 1 commit
    • fix "ftrace: store mcount address in rec->ip" · 760378e1
      Ingo Molnar committed
      Alexander Beregalov reported this build failure:
      
      $ make CROSS_COMPILE=sparc64-unknown-linux-gnu- image modules && sudo
      make modules_install
        CHK     include/linux/version.h
        CHK     include/linux/utsrelease.h
        CALL    scripts/checksyscalls.sh
        CHK     include/linux/compile.h
      dnsdomainname: Unknown host
        CC      arch/sparc64/kernel/sparc64_ksyms.o
      arch/sparc64/kernel/sparc64_ksyms.c:116: error: '_mcount' undeclared
      here (not in a function)
      cc1: warnings being treated as errors
      arch/sparc64/kernel/sparc64_ksyms.c:116: error: type defaults to 'int'
      in declaration of '_mcount'
      
      And bisected it back to:
      
      | commit 395a59d0
      | Author: Abhishek Sagar <sagar.abhishek@gmail.com>
      | Date:   Sat Jun 21 23:47:27 2008 +0530
      |
      |     ftrace: store mcount address in rec->ip
      
      The mcount prototype is only available under CONFIG_FTRACE;
      extend it to CONFIG_MCOUNT as well.
      Reported-and-bisected-by: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 26 Jun, 2008 2 commits
  13. 24 Jun, 2008 1 commit
  14. 10 Jun, 2008 1 commit
  15. 24 May, 2008 1 commit
  16. 22 May, 2008 3 commits
  17. 20 May, 2008 2 commits
    • sparc64: Add global register dumping facility. · 93dae5b7
      David S. Miller committed
      When a cpu really is stuck in the kernel, it can often be
      impossible to figure out which cpu is stuck where.  The
      worst case is when the stuck cpu has interrupts disabled.
      
      Therefore, implement a global cpu state capture that uses
      SMP message interrupts which are not disabled by the
      normal IRQ enable/disable APIs of the kernel.
      
      As long as we can get a sysrq 'y' to the kernel, we can
      get a dump.  Even if the console interrupt cpu is wedged,
      we can trigger it from userspace using /proc/sysrq-trigger
      
      The output is made compact so that this facility is more
      useful on high cpu count systems, which is where this
      facility will likely find itself the most useful :)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: remove CVS keywords · b00dc837
      Adrian Bunk committed
      This patch removes the CVS keywords that weren't updated for a long time
      from comments.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 17 May, 2008 1 commit
  19. 13 May, 2008 2 commits
  20. 11 May, 2008 1 commit
    • sparc: Fix debugger syscall restart interactions. · 28e61036
      David S. Miller committed
      So, forever, we've had this ptrace_signal_deliver implementation
      which tries to handle all of the nasties that can occur when the
      debugger looks at a process about to take a signal.  It's meant
      to address all of these issues inside of the kernel so that the
      debugger need not be mindful of such things.
      
      Problem is, this doesn't work.
      
      The idea was that we should do the syscall restart business first, so
      that the debugger captures that state.  Otherwise, if the debugger for
      example saves the child's state, makes the child execute something
      else, then restores the saved state, we won't handle the syscall
      restart properly because we lose the "we're in a syscall" state.
      
      The code here worked for most cases, but if the debugger actually
      passes the signal through to the child unaltered, it's possible that
      we would do a syscall restart when we shouldn't have.
      
      In particular this breaks the case of debugging a process under a gdb
      which is being debugged by yet another gdb.  gdb uses sigsuspend
      to wait for SIGCHLD of the inferior, but if gdb itself is being
      debugged by a top-level gdb we get a ptrace_stop().  The top-level gdb
      does a PTRACE_CONT with SIGCHLD to let the inferior gdb see the
      signal.  But ptrace_signal_deliver() assumed the debugger would cancel
      out the signal and therefore did a syscall restart, because the return
      error was ERESTARTNOHAND.
      
      Fix this by simply making ptrace_signal_deliver() a nop, and providing
      a way for the debugger to control system call restarting properly:
      
      1) Report an "in syscall" software bit in regs->{tstate,psr}.
         It is set early on in trap entry to a system call and is fully
         visible to the debugger via ptrace() and regsets.
      
      2) Test this bit right before doing a syscall restart.  We have
         to do a final recheck right after get_signal_to_deliver() in
         case the debugger cleared the bit during ptrace_stop().
      
      3) Clear the bit in trap return so we don't accidentally try to set
         that bit in the real register.
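
The three steps above can be sketched as a small user-space model. Everything here is an assumption made for illustration: the bit position, the `struct model_regs` layout and the `model_*` helpers are hypothetical, and `MODEL_ERESTARTNOHAND` is a stand-in for the kernel-internal restart code. The sketch also ignores whether a signal handler is installed; it only shows that the restart decision is gated on a bit the debugger can see and clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_IN_SYSCALL      (UINT64_C(1) << 0)  /* hypothetical software bit */
#define MODEL_ERESTARTNOHAND  514                 /* stand-in restart code */

struct model_regs {
    uint64_t tstate;   /* saved state word that carries the software bit */
    long     retval;   /* value the interrupted syscall would return */
};

/* Step 1: set the bit early in trap entry for a system call. */
static void model_syscall_entry(struct model_regs *r)
{
    r->tstate |= MODEL_IN_SYSCALL;
}

static bool model_is_syscall(const struct model_regs *r)
{
    return (r->tstate & MODEL_IN_SYSCALL) != 0;
}

/* A debugger may call this during a ptrace stop to cancel the restart. */
static void model_clear_syscall(struct model_regs *r)
{
    r->tstate &= ~MODEL_IN_SYSCALL;
}

/* Step 2: test the bit right before doing a syscall restart. */
static bool model_should_restart(const struct model_regs *r)
{
    return model_is_syscall(r) && r->retval == -MODEL_ERESTARTNOHAND;
}

/* Step 3: clear the bit on trap return so it never reaches the
 * real register. */
static void model_trap_return(struct model_regs *r)
{
    model_clear_syscall(r);
}
```

If the debugger restores saved registers that still carry the bit, the restart is taken; if it clears the bit before continuing, the kernel-side check in step 2 sees that and skips the restart.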
      
      As a result we also get a ptrace_{is,clear}_syscall() for sparc32 just
      like sparc64 has.
      
      M68K has this same exact bug, and is now the only other user of the
      ptrace_signal_deliver hook.  It needs to be fixed in the same exact
      way as sparc.
      Signed-off-by: David S. Miller <davem@davemloft.net>