1. 25 Nov, 2008 3 commits
  2. 24 Nov, 2008 4 commits
    • T
      mutex: __used is needed for function referenced only from inline asm · 7918baa5
      Török Edwin committed
      Impact: fix build failure on llvm-gcc-4.2
      
      According to the gcc manual, the 'used' attribute should be applied to
      functions referenced only from inline assembly.
      This fixes a build failure with llvm-gcc-4.2, which deleted
      __mutex_lock_slowpath, __mutex_unlock_slowpath.
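A minimal sketch of the attribute in use, assuming GCC or Clang. The names `demo_used` and `demo_slowpath` are hypothetical stand-ins (the kernel spells the attribute `__used`), and no inline assembly is included so that the snippet stays portable:

```c
/* Hypothetical macro; mirrors what the kernel's __used expands to on GCC. */
#define demo_used __attribute__((used))

/*
 * Stand-in for a slow path that is reached only from inline assembly.
 * Without the attribute, a compiler that sees no C-level caller of a
 * static function is free to discard it.
 */
static demo_used int demo_slowpath(int count)
{
    /* The body is irrelevant here; only the attribute matters. */
    return count + 1;
}
```

The attribute tells the compiler to emit the function even when it cannot see any reference to it, which is the situation the commit describes.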
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • F
      tracing/function-return-tracer: free the return stack on free_task() · 65afa5e6
      Frederic Weisbecker committed
      Impact: avoid losing some traces when a task is freed
      
      do_exit() is not the last function called when a task finishes.
      There are still some functions which are to be called such as
      free_task().  So we delay the freeing of the return stack to the
      last moment.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • P
      x86, mmiotrace: fix buffer overrun detection · 7ee1768d
      Pekka Paalanen committed
      Impact: fix mmiotrace overrun tracing
      
      When ftrace framework moved to use the ring buffer facility, the buffer
      overrun detection was broken after 2.6.27 by commit
      
      | commit 3928a8a2
      | Author: Steven Rostedt <rostedt@goodmis.org>
      | Date:   Mon Sep 29 23:02:41 2008 -0400
      |
      |     ftrace: make work with new ring buffer
      |
      |     This patch ports ftrace over to the new ring buffer.
      
      The detection is now fixed by using the ring buffer API.
      
      When mmiotrace detects a buffer overrun, it will report the number of
      lost events. People reading an mmiotrace log must know if something was
      missed, otherwise the data may not make sense.
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • F
      tracing/function-return-tracer: don't trace kfree while it frees the return stack · eae849ca
      Frederic Weisbecker committed
      Impact: fix a crash
      
      When I killed the cat process, I sometimes (but rarely) got the following
      crash:
      
      [   65.689027] Pid: 2969, comm: cat Not tainted (2.6.28-rc6-tip #83) AMILO Li 2727
      [   65.689027] EIP: 0060:[<00000000>] EFLAGS: 00010082 CPU: 1
      [   65.689027] EIP is at 0x0
      [   65.689027] EAX: 00000000 EBX: f66cd780 ECX: c019a64a EDX: f66cd780
      [   65.689027] ESI: 00000286 EDI: f66cd780 EBP: f630be2c ESP: f630be24
      [   65.689027]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [   65.689027] Process cat (pid: 2969, ti=f630a000 task=f66cd780 task.ti=f630a000)
      [   65.689027] Stack:
      [   65.689027]  00000012 f630bd54 f630be7c c012c853 00000000 c0133cc9 f66cda54 f630be5c
      [   65.689027]  f630be68 f66cda54 f66cd88c f66cd878 f7070000 00000001 f630be90 c0135dbc
      [   65.689027]  f614a614 f630be68 f630be68 f65ba200 00000002 f630bf10 f630be90 c012cad6
      [   65.689027] Call Trace:
      [   65.689027]  [<c012c853>] ? do_exit+0x603/0x850
      [   65.689027]  [<c0133cc9>] ? next_signal+0x9/0x40
      [   65.689027]  [<c0135dbc>] ? dequeue_signal+0x8c/0x180
      [   65.689027]  [<c012cad6>] ? do_group_exit+0x36/0x90
      [   65.689027]  [<c013709c>] ? get_signal_to_deliver+0x20c/0x390
      [   65.689027]  [<c0102b69>] ? do_notify_resume+0x99/0x8b0
      [   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
      [   65.689027]  [<c014db9b>] ? trace_hardirqs_on+0xb/0x10
      [   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
      [   65.689027]  [<c02e39b0>] ? n_tty_write+0x0/0x340
      [   65.689027]  [<c02e1812>] ? redirected_tty_write+0x82/0x90
      [   65.689027]  [<c019ee99>] ? vfs_write+0x99/0xd0
      [   65.689027]  [<c02e1790>] ? redirected_tty_write+0x0/0x90
      [   65.689027]  [<c019f342>] ? sys_write+0x42/0x70
      [   65.689027]  [<c01035ca>] ? work_notifysig+0x13/0x19
      [   65.689027] Code:  Bad EIP value.
      [   65.689027] EIP: [<00000000>] 0x0 SS:ESP 0068:f630be24
      
      This is because in do_exit(), kfree() is called to free the return address
      stack, but kfree() is itself traced and stores its return address in this
      stack.  This patch fixes it.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 23 Nov, 2008 15 commits
    • T
      tracing/stack-tracer: avoid races accessing file · e38da592
      Török Edwin committed
      Impact: fix race
      
      vma->vm_file reference is only stable while holding the mmap_sem,
      so move usage of it to within the critical section.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • T
      tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORT · 8d26487f
      Török Edwin committed
      Impact: cleanup
      
      User stack tracing is currently implemented only for x86, but it is not x86 specific.
      
      Introduce a generic config flag, that is currently enabled only for x86.
      When other arches implement it, they will have to
      SELECT USER_STACKTRACE_SUPPORT.
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • T
      tracing/stack-tracer: fix locking and refcounts · cffa10ae
      Török Edwin committed
      Impact: fix refcounting/object-access bug
      
      Hold mmap_sem while looking up/accessing vma.
      Hold the RCU lock while using the task we looked up.
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • T
      tracing/stack-tracer: fix style issues · 8d7c6a96
      Török Edwin committed
      Impact: cleanup
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      trace: fix compiler warning in branch profiler · 0429149f
      Steven Rostedt committed
      Impact: fix compiler warning
      
      The ftrace_pointers used in the branch profiler are constant values.
      They should never change. But the compiler complains when they are
      passed into the debugfs_create_file as a data pointer, because the
      function discards the qualifier.
      
      This patch typecasts the parameter to debugfs_create_file back to
      a void pointer. To remind the callbacks that they are pointing to
      a constant value, I also modified the callback local pointers to
      be const struct ftrace_pointer * as well.
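The cast-and-restore pattern described above can be sketched in plain C. This is only an illustration under stated assumptions: `demo_pointer`, `demo_file`, `demo_create_file()` and the callbacks are hypothetical stand-ins, not the kernel's `ftrace_pointer` or `debugfs_create_file()`:

```c
/* Hypothetical constant record, standing in for the profiler's data. */
struct demo_pointer {
    const char *name;
    int value;
};

typedef int (*demo_show_fn)(void *data);

struct demo_file {
    void *data;
    demo_show_fn show;
};

/* Registration API that takes a plain void *, so a const qualifier is lost. */
static void demo_create_file(struct demo_file *f, void *data, demo_show_fn show)
{
    f->data = data;
    f->show = show;
}

/* The callback restores const on its local pointer as a reminder. */
static int demo_show(void *data)
{
    const struct demo_pointer *p = data;

    return p->value;
}

static const struct demo_pointer demo_const = { "demo", 7 };

static void demo_register(struct demo_file *f)
{
    /* Explicit cast silences the discarded-qualifier warning. */
    demo_create_file(f, (void *)&demo_const, demo_show);
}
```

The cast is safe only as long as every callback treats the data as read-only, which is why the local pointers are declared const.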
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      ftrace: add ftrace_off_permanent · 69bb54ec
      Steven Rostedt committed
      Impact: add new API to disable all of ftrace on anomalies
      
      In case of a serious anomaly being detected (like something caught by
      lockdep) it is a good idea to disable all tracing immediately, without
      grabbing any locks.
      
      This patch adds ftrace_off_permanent that disables the tracers, function
      tracing and ring buffers without a way to enable them again. This should
      only be used when something serious has been detected.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      ring-buffer: add tracing_off_permanent · 033601a3
      Steven Rostedt committed
      Impact: feature to permanently disable ring buffer
      
      This patch adds an API to the ring buffer code that will permanently
      disable the ring buffer from ever recording. This should only be
      called when some serious anomaly is detected, and the system
      may be in an unstable state. When that happens, shutting down the
      recording to the ring buffers may be appropriate.
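One way to picture a permanent off switch is a second flag bit that, once set, can never be cleared by the normal on/off calls. The sketch below is a single-threaded illustration with hypothetical names (`demo_flags`, `demo_tracing_*`); it does not claim to match the ring buffer implementation, which would need atomic bit operations:

```c
#define DEMO_BUFFERS_ON        (1u << 0)  /* normal on/off switch */
#define DEMO_BUFFERS_DISABLED  (1u << 1)  /* permanent: never cleared */

static unsigned int demo_flags = DEMO_BUFFERS_ON;

static void demo_tracing_on(void)
{
    demo_flags |= DEMO_BUFFERS_ON;
}

static void demo_tracing_off(void)
{
    demo_flags &= ~DEMO_BUFFERS_ON;
}

/* Sets one bit and takes no locks; nothing in this API clears it again. */
static void demo_tracing_off_permanent(void)
{
    demo_flags |= DEMO_BUFFERS_DISABLED;
}

/* Recording is allowed only when ON is set and DISABLED is clear. */
static int demo_can_record(void)
{
    return demo_flags == DEMO_BUFFERS_ON;
}
```

After `demo_tracing_off_permanent()`, a later `demo_tracing_on()` still sets the ON bit, but `demo_can_record()` keeps returning 0.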
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      trace: profile all if conditionals · 2bcd521a
      Steven Rostedt committed
      Impact: feature to profile if statements
      
      This patch adds a branch profiler for all if () statements.
      The results will be found in:
      
        /debugfs/tracing/profile_branch
      
      For example:
      
         miss      hit    %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
             0        1 100 x86_64_start_reservations      head64.c             127
             0        1 100 copy_bootdata                  head64.c             69
             1        0   0 x86_64_start_kernel            head64.c             111
            32        0   0 set_intr_gate                  desc.h               319
             1        0   0 reserve_ebda_region            head.c               51
             1        0   0 reserve_ebda_region            head.c               47
             0        1 100 reserve_ebda_region            head.c               42
             0        0   X maxcpus                        main.c               165
      
      Miss means the branch was not taken. Hit means the branch was taken.
      The percent is the percentage the branch was taken.
      
      This adds a significant amount of overhead and should only be used
      by those analyzing their system.
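The hit/miss bookkeeping and the percentage column can be sketched as follows. This is an illustration only: `demo_branch`, `demo_record()` and `demo_percent()` are hypothetical names, and the real profiler instruments `if` itself rather than requiring an explicit call:

```c
/* Per-branch counters, one record per profiled if () site. */
struct demo_branch {
    unsigned long miss;  /* condition was false: branch not taken */
    unsigned long hit;   /* condition was true: branch taken */
};

/* Count the outcome and hand the condition back unchanged. */
static int demo_record(struct demo_branch *b, int cond)
{
    if (cond)
        b->hit++;
    else
        b->miss++;
    return cond;
}

/*
 * Percentage of executions in which the branch was taken.
 * Returns -1 when the branch never ran, which the table prints as 'X'.
 */
static int demo_percent(const struct demo_branch *b)
{
    unsigned long total = b->hit + b->miss;

    if (total == 0)
        return -1;
    return (int)(b->hit * 100 / total);
}
```

A call site would read `if (demo_record(&stat, x > 0)) { ... }`, so the profiled code behaves exactly as before apart from the counter update.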
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      trace: branch profiling should not print percent without data · bac28bfe
      Steven Rostedt committed
      Impact: cleanup on output of branch profiler
      
      When a branch has not been executed, it does not make sense to show
      a hit or incorrect percentage. This patch changes the behaviour
      to print out an 'X' when the branch has not been executed yet.
      
      For example:
      
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2096        0   0 do_arch_prctl                  process_64.c         832
             0        0   X do_arch_prctl                  process_64.c         804
          2604        0   0 IS_ERR                         err.h                34
        130228     5765   4 __switch_to                    process_64.c         673
             0        0   X enable_TSC                     process_64.c         448
             0        0   X disable_TSC                    process_64.c         431
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      trace: consolidate unlikely and likely profiler · 45b79749
      Steven Rostedt committed
      Impact: cleanup that merges the likely and unlikely tracers into one profiler
      
      The likely and unlikely profiler prints out the file and line numbers
      of the annotated branches that it is profiling. It shows the number
      of times it was correct or incorrect in its guess. Having two
      different files or sections for that matter to tell us if it was a
      likely or unlikely is pretty pointless. We really only care if
      it was correct or not.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • I
      tracing: allow tracing of suspend/resume & hibernation code again · cbe2f5a6
      Ingo Molnar committed
      Impact: widen function-tracing to suspend+resume (and hibernation) sequences
      
      Now that the ftrace kernel thread is gone, we can allow tracing
      during suspend/resume again.
      
      So revert these two commits:
      
        f42ac38c "ftrace: disable tracing for suspend to ram"
        41108eb1 "ftrace: disable tracing for hibernation"
      
      This should be tested very carefully, as it could interact with
      alternatives instruction patching, etc.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • T
      tracing: identify which executable object the userspace address belongs to · b54d3de9
      Török Edwin committed
      Impact: modify+improve the userstacktrace tracing visualization feature
      
      Store thread group leader id, and use it to lookup the address in the
      process's map. We could have looked up the address on thread's map,
      but the thread might not exist by the time we are called. The process
      might not exist either, but if you are reading trace_pipe, that is
      unlikely.
      
      Example usage:
      
       mount -t debugfs nodev /sys/kernel/debug
       cd /sys/kernel/debug/tracing
       echo userstacktrace >iter_ctrl
       echo sym-userobj >iter_ctrl
       echo sched_switch >current_tracer
       echo 1 >tracing_enabled
       cat trace_pipe >/tmp/trace&
       .... run application ...
       echo 0 >tracing_enabled
       cat /tmp/trace
      
      You'll see stack entries like:
      
         /lib/libpthread-2.7.so[+0xd370]
      
      You can convert them to function/line using:
      
         addr2line -fie /lib/libpthread-2.7.so 0xd370
      
      Or:
      
         addr2line -fie /usr/lib/debug/libpthread-2.7.so 0xd370
      
      For non-PIC/PIE executables this won't work:
      
         a.out[+0x73b]
      
      You need to run the following: addr2line -fie a.out 0x40073b
      (where 0x400000 is the default load address of a.out)
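The address arithmetic above is simple enough to write down. A minimal sketch, where `demo_addr2line_arg()` is a hypothetical helper and the load address must be supplied by the caller (0x400000 is only the default mentioned above):

```c
/*
 * Turn the object-relative offset printed in the trace into the address
 * to pass to addr2line. For shared objects the offset is used as is
 * (load_base = 0); for a non-PIC/PIE executable the load address of the
 * binary has to be added back.
 */
static unsigned long demo_addr2line_arg(unsigned long load_base,
                                        unsigned long offset)
{
    return load_base + offset;
}
```

With the values from the example, `demo_addr2line_arg(0x400000, 0x73b)` gives 0x40073b, the address used in the addr2line command.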
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • T
      tracing: add support for userspace stacktraces in tracing/iter_ctrl · 02b67518
      Török Edwin committed
      Impact: add new (default-off) tracing visualization feature
      
      Usage example:
      
       mount -t debugfs nodev /sys/kernel/debug
       cd /sys/kernel/debug/tracing
       echo userstacktrace >iter_ctrl
       echo sched_switch >current_tracer
       echo 1 >tracing_enabled
       .... run application ...
       echo 0 >tracing_enabled
      
      Then read one of 'trace','latency_trace','trace_pipe'.
      
      To get the best output you can compile your userspace programs with
      frame pointers (at least glibc + the app you are tracing).
      Signed-off-by: Török Edwin <edwintorok@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • I
      tracing/function-return-tracer: clean up task start/exit callbacks · 82f60f0b
      Ingo Molnar committed
      Impact: cleanup
      
      Eliminate #ifdefs in core code by using empty inline functions.
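The pattern looks roughly like this. The config symbol `DEMO_CONFIG_RETURN_TRACER`, the struct and the function names are hypothetical and only illustrate the idea:

```c
struct demo_task {
    int ret_stack_ready;
};

#ifdef DEMO_CONFIG_RETURN_TRACER
/* Real hook, compiled in only when the feature is configured. */
static inline void demo_tracer_init_task(struct demo_task *t)
{
    t->ret_stack_ready = 1;
}
#else
/* Empty inline stub: the compiler removes the call entirely. */
static inline void demo_tracer_init_task(struct demo_task *t)
{
    (void)t;
}
#endif

/* Core code calls the hook unconditionally, with no #ifdef in sight. */
static int demo_fork(struct demo_task *t)
{
    t->ret_stack_ready = 0;
    demo_tracer_init_task(t);
    return t->ret_stack_ready;
}
```

The conditional compilation lives in one place, next to the hook's declaration, and every caller stays readable.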
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • F
      tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically · f201ae23
      Frederic Weisbecker committed
      Impact: use deeper function tracing depth safely
      
      Some tests showed that function return tracing needed a greater depth
      of function calls. But it could be unsafe to store these return addresses
      on the stack.
      
      So these arrays will now be allocated dynamically into task_struct of current
      only when the tracer is activated.
      
      Typical scheme when tracer is activated:
      - allocate a return stack for each task in global list.
      - fork: allocate the return stack for the newly created task
      - exit: free return stack of current
      - idle init: same as fork
      
      I chose a default depth of 50. I don't have overruns anymore.
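The fork/exit scheme above can be sketched in userspace C. The struct and function names are hypothetical, `calloc()`/`free()` stand in for the kernel allocator, and only the depth of 50 comes from the commit message:

```c
#include <stdlib.h>

#define DEMO_RET_STACK_DEPTH 50  /* default depth named in the commit */

struct demo_traced_task {
    unsigned long *ret_stack;  /* allocated only while tracing is active */
    int curr;                  /* index of the top entry, -1 when empty */
};

/* fork / idle init: give the new task its own return stack. */
static int demo_task_init(struct demo_traced_task *t)
{
    t->ret_stack = calloc(DEMO_RET_STACK_DEPTH, sizeof(*t->ret_stack));
    t->curr = -1;
    return t->ret_stack ? 0 : -1;
}

/* Record a return address; fails instead of overrunning the array. */
static int demo_push_return(struct demo_traced_task *t, unsigned long ret)
{
    if (!t->ret_stack || t->curr == DEMO_RET_STACK_DEPTH - 1)
        return -1;
    t->ret_stack[++t->curr] = ret;
    return 0;
}

/* exit: release the return stack of the finishing task. */
static void demo_task_exit(struct demo_traced_task *t)
{
    free(t->ret_stack);
    t->ret_stack = NULL;
    t->curr = -1;
}
```

Because the array lives on the heap and hangs off the task, its depth can grow without eating into the task's own stack.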
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 21 Nov, 2008 3 commits
  5. 20 Nov, 2008 6 commits
    • L
      cgroups: fix a serious bug in cgroupstats · 33d283be
      Li Zefan committed
      Try this, and you'll get oops immediately:
       # cd Documentation/accounting/
       # gcc -o getdelays getdelays.c
       # mount -t cgroup -o debug xxx /mnt
       # ./getdelays -C /mnt/tasks
      
      Because a normal file's dentry->d_fsdata is a pointer to struct cftype,
      not struct cgroup.
      
      After the patch, it returns EINVAL if we try to get cgroupstats
      from a normal file.
      
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • H
      sprint_symbol(): use less stack · 966c8c12
      Hugh Dickins committed
      sprint_symbol(), itself used when dumping stacks, has been wasting 128
      bytes of stack: lookup the symbol directly into the buffer supplied by the
      caller, instead of using a locally declared namebuf.
      
      I believe the name != buffer strcpy() is obsolete: the design here dates
      from when module symbol lookup pointed into a supposedly const but sadly
      volatile table; nowadays it copies, but an uncalled strcpy() looks better
      here than the risk of a recursive BUG_ON().
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      cgroup: fix potential deadlock in pre_destroy · 3fa59dfb
      KAMEZAWA Hiroyuki committed
      As Balbir pointed out, memcg's pre_destroy handler has a potential deadlock.
      
      It has the following lock sequence:
      
      	cgroup_mutex (cgroup_rmdir)
      	    -> pre_destroy -> mem_cgroup_pre_destroy-> force_empty
      		-> cpu_hotplug.lock. (lru_add_drain_all->
      				      schedule_work->
                                            get_online_cpus)
      
      But cpuset has the following:
      	cpu_hotplug.lock (call notifier)
      		-> cgroup_mutex. (within notifier)
      
      So this lock sequence should be fixed.
      
      Considering how pre_destroy works, it is not necessary to hold
      cgroup_mutex while calling it.
      
      As a side effect, we don't have to wait on this mutex while memcg's
      force_empty works (it can take a long time when there are tons of pages).
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      cpuset: update top cpuset's mems after adding a node · f481891f
      Miao Xie committed
      After adding a node into the machine, top cpuset's mems isn't updated.
      
      By reviewing the code, we found that the update function
      
        cpuset_track_online_nodes()
      
      was invoked after node_states[N_ONLINE] changes.  This is wrong because
      N_ONLINE just means the node has a pgdat; when a node has (or gains) memory,
      we use N_HIGH_MEMORY.  So we should invoke the update function after
      node_states[N_HIGH_MEMORY] changes, just like its commit says.
      
      This patch fixes it.  We also use the memory hotplug notifier instead of
      calling cpuset_track_online_nodes() directly.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • U
      reintroduce accept4 · de11defe
      Ulrich Drepper committed
      Introduce a new accept4() system call.  The addition of this system call
      matches analogous changes in 2.6.27 (dup3(), eventfd2(), signalfd4(),
      inotify_init1(), epoll_create1(), pipe2()) which added new system calls
      that differed from analogous traditional system calls in adding a flags
      argument that can be used to access additional functionality.
      
      The accept4() system call is exactly the same as accept(), except that
      it adds a flags bit-mask argument.  Two flags are initially implemented.
      (Most of the new system calls in 2.6.27 also had both of these flags.)
      
      SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
      for the new file descriptor returned by accept4().  This is a useful
      security feature to avoid leaking information in a multithreaded
      program where one thread is doing an accept() at the same time as
      another thread is doing a fork() plus exec().  (More details here:
      http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
      Ulrich Drepper).
      
      The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
      to be enabled on the new open file description created by accept4().
      (This flag is merely a convenience, saving the use of additional calls
      fcntl(F_GETFL) and fcntl(F_SETFL) to achieve the same result.)
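For comparison, this is roughly the multi-call sequence that the two flags replace. It is a sketch only: `demo_set_nonblock_cloexec()` is a hypothetical helper, while the `fcntl()` commands and flags are the standard POSIX ones:

```c
#include <fcntl.h>

/*
 * Apply O_NONBLOCK and FD_CLOEXEC to an already-open descriptor, the way
 * a program has to after a plain accept(). Returns 0 on success, -1 on
 * error (errno is left as set by fcntl()).
 */
static int demo_set_nonblock_cloexec(int fd)
{
    int fl, fdf;

    /* File status flags: these belong to the open file description. */
    fl = fcntl(fd, F_GETFL);
    if (fl == -1)
        return -1;
    if (fcntl(fd, F_SETFL, fl | O_NONBLOCK) == -1)
        return -1;

    /* Descriptor flags: close-on-exec belongs to the descriptor itself. */
    fdf = fcntl(fd, F_GETFD);
    if (fdf == -1)
        return -1;
    if (fcntl(fd, F_SETFD, fdf | FD_CLOEXEC) == -1)
        return -1;

    return 0;
}
```

Between accept() returning and F_SETFD completing, another thread can fork() and exec() with the descriptor still inheritable. Passing SOCK_CLOEXEC to accept4() closes that window because the flag is set when the descriptor is created.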
      
      Here's a test program.  Works on x86-32.  Should work on x86-64, but
      I (mtk) don't have a system to hand to test with.
      
      It tests accept4() with each of the four possible combinations of
      SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
      that the appropriate flags are set on the file descriptor/open file
      description returned by accept4().
      
      I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
      and it passes according to my test program.
      
      /* test_accept4.c
      
        Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
             <mtk.manpages@gmail.com>
      
        Licensed under the GNU GPLv2 or later.
      */
      #define _GNU_SOURCE
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      
      #define PORT_NUM 33333
      
      #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
      
      /**********************************************************************/
      
      /* The following is what we need until glibc gets a wrapper for
        accept4() */
      
      /* Flags for socket(), socketpair(), accept4() */
      #ifndef SOCK_CLOEXEC
      #define SOCK_CLOEXEC    O_CLOEXEC
      #endif
      #ifndef SOCK_NONBLOCK
      #define SOCK_NONBLOCK   O_NONBLOCK
      #endif
      
      #ifdef __x86_64__
      #define SYS_accept4 288
      #elif __i386__
      #define USE_SOCKETCALL 1
      #define SYS_ACCEPT4 18
      #else
      #error "Sorry -- don't know the syscall # on this architecture"
      #endif
      
      static int
      accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
      {
         printf("Calling accept4(): flags = %x", flags);
         if (flags != 0) {
             printf(" (");
             if (flags & SOCK_CLOEXEC)
                 printf("SOCK_CLOEXEC");
             if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
                 printf(" ");
             if (flags & SOCK_NONBLOCK)
                 printf("SOCK_NONBLOCK");
             printf(")");
         }
         printf("\n");
      
      #if USE_SOCKETCALL
         long args[6];
      
         args[0] = fd;
         args[1] = (long) sockaddr;
         args[2] = (long) addrlen;
         args[3] = flags;
      
         return syscall(SYS_socketcall, SYS_ACCEPT4, args);
      #else
         return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
      #endif
      }
      
      /**********************************************************************/
      
      static int
      do_test(int lfd, struct sockaddr_in *conn_addr,
             int closeonexec_flag, int nonblock_flag)
      {
         int connfd, acceptfd;
         int fdf, flf, fdf_pass, flf_pass;
         struct sockaddr_in claddr;
         socklen_t addrlen;
      
         printf("=======================================\n");
      
         connfd = socket(AF_INET, SOCK_STREAM, 0);
         if (connfd == -1)
             die("socket");
         if (connect(connfd, (struct sockaddr *) conn_addr,
                     sizeof(struct sockaddr_in)) == -1)
             die("connect");
      
         addrlen = sizeof(struct sockaddr_in);
         acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
                            closeonexec_flag | nonblock_flag);
         if (acceptfd == -1) {
             perror("accept4()");
             close(connfd);
             return 0;
         }
      
         fdf = fcntl(acceptfd, F_GETFD);
         if (fdf == -1)
             die("fcntl:F_GETFD");
         fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
                    ((closeonexec_flag & SOCK_CLOEXEC) != 0);
         printf("Close-on-exec flag is %sset (%s); ",
                 (fdf & FD_CLOEXEC) ? "" : "not ",
                 fdf_pass ? "OK" : "failed");
      
         flf = fcntl(acceptfd, F_GETFL);
         if (flf == -1)
             die("fcntl:F_GETFL");
         flf_pass = ((flf & O_NONBLOCK) != 0) ==
                    ((nonblock_flag & SOCK_NONBLOCK) != 0);
         printf("nonblock flag is %sset (%s)\n",
                 (flf & O_NONBLOCK) ? "" : "not ",
                 flf_pass ? "OK" : "failed");
      
         close(acceptfd);
         close(connfd);
      
         printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
         return fdf_pass && flf_pass;
      }
      
      static int
      create_listening_socket(int port_num)
      {
         struct sockaddr_in svaddr;
         int lfd;
         int optval;
      
         memset(&svaddr, 0, sizeof(struct sockaddr_in));
         svaddr.sin_family = AF_INET;
         svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
         svaddr.sin_port = htons(port_num);
      
         lfd = socket(AF_INET, SOCK_STREAM, 0);
         if (lfd == -1)
             die("socket");
      
         optval = 1;
         if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                        sizeof(optval)) == -1)
             die("setsockopt");
      
         if (bind(lfd, (struct sockaddr *) &svaddr,
                  sizeof(struct sockaddr_in)) == -1)
             die("bind");
      
         if (listen(lfd, 5) == -1)
             die("listen");
      
         return lfd;
      }
      
      int
      main(int argc, char *argv[])
      {
         struct sockaddr_in conn_addr;
         int lfd;
         int port_num;
         int passed;
      
         passed = 1;
      
         port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;
      
         memset(&conn_addr, 0, sizeof(struct sockaddr_in));
         conn_addr.sin_family = AF_INET;
         conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
         conn_addr.sin_port = htons(port_num);
      
         lfd = create_listening_socket(port_num);
      
         if (!do_test(lfd, &conn_addr, 0, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
             passed = 0;
      
         close(lfd);
      
         exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
      }
      
      [mtk.manpages@gmail.com: rewrote changelog, updated test program]
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares · ec4e0e2f
      Ken Chen committed
      Impact: make load-balancing more consistent
      
      In the update_shares() path leading to tg_shares_up(), the calculation of
      per-cpu cfs_rq shares is rather erratic even under moderate task wake up
      rate.  The problem is that the per-cpu tg->cfs_rq load weights used in the
      sd_rq_weight aggregation and in the actual redistribution of cfs_rq->shares
      are collected at different times.  Under moderate system load, we've seen
      quite a bit of variation in cfs_rq->shares, which ultimately affects the
      sched_entity's load weight wildly.
      
      This patch caches the initial per-cpu load weight when doing the
      sum calculation, and then passes it down to update_group_shares_cpu() for
      redistributing per-cpu cfs_rq shares.  This allows consistent total cfs_rq
      shares across all CPUs. It also simplifies the rounding and zero load
      weight check.
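The idea of summing and redistributing from one cached snapshot can be sketched as below. All names, the `DEMO_MAX_CPUS` limit and the equal split for an all-zero load are assumptions made for the illustration; this is not the scheduler's code:

```c
#define DEMO_MAX_CPUS 8  /* hypothetical upper bound for the sketch */

/*
 * Split total_shares across CPUs in proportion to their load weight.
 * Each live weight is read exactly once into cached[], and the same
 * cached values feed both the sum and the division, so the results stay
 * consistent even if live_weight[] changes in the meantime.
 */
static int demo_redistribute(const unsigned long *live_weight,
                             unsigned long *shares, int ncpus,
                             unsigned long total_shares)
{
    unsigned long cached[DEMO_MAX_CPUS];
    unsigned long sum = 0;
    int i;

    if (ncpus <= 0 || ncpus > DEMO_MAX_CPUS)
        return -1;

    for (i = 0; i < ncpus; i++) {
        cached[i] = live_weight[i];
        sum += cached[i];
    }

    for (i = 0; i < ncpus; i++) {
        if (sum)
            shares[i] = total_shares * cached[i] / sum;
        else
            shares[i] = total_shares / ncpus;  /* assumption: equal split */
    }
    return 0;
}
```

With weights 1024 and 3072 and 1024 total shares, the two CPUs receive 256 and 768, which add up to the total again.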
      Signed-off-by: Ken Chen <kenchen@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 19 Nov, 2008 8 commits
    • A
      profiling: clean up profile_nop() · 60a51513
      Andrew Morton committed
      Impact: cleanup
      
      No point in inlining this.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • I
      ftrace: fix selftest locking · 86fa2f60
      Ingo Molnar committed
      Impact: fix self-test boot crash
      
      The self-test failure path forgot to re-lock the BKL, crashing the next
      initcall:
      
      Testing tracer irqsoff: .. no entries found ..FAILED!
      initcall init_irqsoff_tracer+0x0/0x11 returned 0 after 3906 usecs
      calling  init_mmio_trace+0x0/0xf @ 1
      ------------[ cut here ]------------
      Kernel BUG at c0c0a915 [verbose debug info unavailable]
      invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
      last sysfs file:
      
      Pid: 1, comm: swapper Not tainted (2.6.28-rc5-tip #53704)
      EIP: 0060:[<c0c0a915>] EFLAGS: 00010286 CPU: 1
      EIP is at unlock_kernel+0x10/0x2b
      EAX: ffffffff EBX: 00000000 ECX: 00000000 EDX: f7030000
      ESI: c12da19c EDI: 00000000 EBP: f7039f54 ESP: f7039f54
       DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      Process swapper (pid: 1, ti=f7038000 task=f7030000 task.ti=f7038000)
      Stack:
       f7039f6c c0164d30 c013fed8 a7d8d7b4 00000000 00000000 f7039f74 c12fb78a
       f7039fd0 c0101132 c12fb77d 00000000 6f727200 6f632072 2d206564 c1002031
       0000000f f7039fa2 f7039fb0 3531b171 00000000 00000000 0000002f c12ca480
      Call Trace:
       [<c0164d30>] ? register_tracer+0x66/0x13f
       [<c013fed8>] ? ktime_get+0x19/0x1b
       [<c12fb78a>] ? init_mmio_trace+0xd/0xf
       [<c0101132>] ? do_one_initcall+0x4a/0x111
       [<c12fb77d>] ? init_mmio_trace+0x0/0xf
       [<c015c7e6>] ? init_irq_proc+0x46/0x59
       [<c12e851d>] ? kernel_init+0x104/0x152
       [<c12e8419>] ? kernel_init+0x0/0x152
       [<c01038b7>] ? kernel_thread_helper+0x7/0x10
      Code: 58 14 43 75 0a b8 00 9b 2d c1 e8 51 43 7a ff 64 a1 00 a0 37 c1 89 58 14 5b 5d c3 55 64 8b 15 00 a0 37 c1 83 7a 14 00 89 e5 79 04 <0f> 0b eb fe 8b 42 14 48 85 c0 89 42 14 79 0a b8 00 9b 2d c1 e8
      EIP: [<c0c0a915>] unlock_kernel+0x10/0x2b SS:ESP 0068:f7039f54
      ---[ end trace a7919e7f17c0a725 ]---
      Kernel panic - not syncing: Attempted to kill init!
      
      So clean up the flow a bit.
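The class of bug is easy to reproduce in miniature. In the sketch below a counter stands in for the BKL and every name is hypothetical; it only illustrates funnelling the success and failure paths through one exit that re-takes the lock, and makes no claim about how register_tracer() is actually structured:

```c
/* Stand-in for the big kernel lock: the caller enters holding it once. */
static int demo_lock_depth = 1;

static void demo_lock_kernel(void)
{
    demo_lock_depth++;
}

static void demo_unlock_kernel(void)
{
    demo_lock_depth--;
}

/*
 * Drop the lock around the self-test, then re-take it on every path.
 * Returning early on failure would leave the caller without the lock,
 * and the next unlock would then underflow the depth.
 */
static int demo_register_tracer(int (*selftest)(void), int *registered)
{
    int ret;

    demo_unlock_kernel();
    ret = selftest();
    if (ret)
        goto out;
    *registered = 1;
out:
    demo_lock_kernel();
    return ret;
}

static int demo_failing_selftest(void)
{
    return -1;
}

static int demo_passing_selftest(void)
{
    return 0;
}
```

Whether the self-test passes or fails, the lock depth on return equals the depth on entry.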
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      ftrace: fix dyn ftrace filter selection · 32464779
      Steven Rostedt committed
      Impact: clean up and fix for dyn ftrace filter selection
      
      The previous logic of the dynamic ftrace selection of enabling
      or disabling functions was complex and incorrect. This patch simplifies
      the code and corrects the usage. This simplification also makes the
      code more robust.
      
      Here is the correct logic:
      
        Given a function that can be traced by dynamic ftrace:
      
        If the function is not to be traced, disable it if it was enabled.
        (that is, if the function is listed in the set_ftrace_notrace file)
      
        (the filter is on if any function is listed in the set_ftrace_filter file)
      
        If the filter is on, and we are enabling functions:
          If the function is in set_ftrace_filter, enable it if it is not
            already enabled.
          If the function is not in set_ftrace_filter, disable it if it is not
            already disabled.
      
        Otherwise, if the filter is off and we are enabling function tracing:
          Enable the function if it is not already enabled.
      
        Otherwise, if we are disabling function tracing:
          Disable the function if it is not already disabled.
      
      This code now sets or clears the ENABLED flag in the record, and at the
      end it will enable the function if the flag is set, or disable the function
      if the flag is cleared.
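A minimal sketch of the decision logic described above, written as a stand-alone C function. The flag names, the `enum action` type, and `update_record` are hypothetical and chosen for illustration; they are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical per-function record flags (illustrative names). */
enum {
    REC_ENABLED = 1 << 0,   /* call site currently calls the tracer */
    REC_FILTER  = 1 << 1,   /* listed in set_ftrace_filter */
    REC_NOTRACE = 1 << 2,   /* listed in set_ftrace_notrace */
};

enum action { ACT_NONE, ACT_ENABLE, ACT_DISABLE };

/*
 * Decide what to do with one record.
 *   enable    - true when function tracing is being turned on
 *   filter_on - true when set_ftrace_filter lists at least one function
 * Sets or clears REC_ENABLED in *flags and returns the action needed.
 */
enum action update_record(unsigned *flags, bool enable, bool filter_on)
{
    bool was_enabled = (*flags & REC_ENABLED) != 0;
    bool want;

    if (*flags & REC_NOTRACE)
        want = false;                           /* never trace this one */
    else if (!enable)
        want = false;                           /* tracing is being disabled */
    else if (filter_on)
        want = (*flags & REC_FILTER) != 0;      /* only filtered functions */
    else
        want = true;                            /* no filter: trace everything */

    if (want == was_enabled)
        return ACT_NONE;                        /* already in the right state */

    if (want)
        *flags |= REC_ENABLED;
    else
        *flags &= ~(unsigned)REC_ENABLED;
    return want ? ACT_ENABLE : ACT_DISABLE;
}
```

Computing the wanted state first and comparing it with the current state once at the end keeps the "enable if not already enabled" and "disable if not already disabled" cases in a single place.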
      
      The parameters of the function that implements the above logic are also
      simplified. Previously it was passed confusing "new" and "old" arguments,
      which might be swapped if the "enabled" flag was not set. The old logic
      even had one of the two always NULL, so it had to be filled in. The new
      logic simply passes in one parameter called "nop". A "call" is calculated
      in the code, and at the end of the logic, when we know whether we need to
      disable or enable the function, we can then use the "nop" and "call"
      properly.
      
      This code is more robust than the previous version.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      32464779
    • S
      ftrace: make filtered functions effective on setting · 82043278
      Steven Rostedt committed
      Impact: fix filter selection to apply when set
      
      It can be confusing when the set_filter_functions is set (or cleared)
      and the functions being recorded by the dynamic tracer do not
      match.
      
      This patch causes the code to be updated if the function tracer is
      enabled and the filter is changed.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      82043278
    • S
      ftrace: fix set_ftrace_filter · f10ed36e
      Steven Rostedt committed
      Impact: fix of output of set_ftrace_filter
      
      The commit "ftrace: do not show freed records in
                   available_filter_functions"
      
      removed a bit too much from the set_ftrace_filter code, so that we now see
      all functions in the set_ftrace_filter file even when we set a filter.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f10ed36e
    • H
      ftrace: preemptoff selftest not working · a2250634
      Heiko Carstens committed
      Impact: fix preemptoff and preemptirqsoff tracer self-tests
      
      I was wondering why the preemptoff and preemptirqsoff tracer selftests
      don't work on s390. After all, it's just that they get called from
      non-preemptible context:
      
      kernel_init() will execute all initcalls, however the first line in
      kernel_init() is lock_kernel(), which causes the preempt_count to be
      increased. Any later calls to add_preempt_count() (especially those
      from the selftests) will therefore not result in a call to
      trace_preempt_off() since the check below in add_preempt_count()
      will be false:
      
              if (preempt_count() == val)
                      trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
      
      Hence the trace buffer will be empty.
      
      Fix this by releasing the BKL during the self-tests.
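The effect of the quoted check can be reproduced with a small user-space model. This is a sketch under stated assumptions: `preempt_count_sim`, `trace_preempt_off_hits` and the `*_sim` functions are hypothetical stand-ins, and holding the BKL is modelled as a single increment of the count, as the message describes.

```c
#include <stdbool.h>

int preempt_count_sim;        /* hypothetical stand-in for preempt_count() */
int trace_preempt_off_hits;   /* how many times the tracer hook fired */

/* Mirrors the quoted check: the hook fires only on the 0 -> val transition. */
void add_preempt_count_sim(int val)
{
    preempt_count_sim += val;
    if (preempt_count_sim == val)
        trace_preempt_off_hits++;
}

void sub_preempt_count_sim(int val)
{
    preempt_count_sim -= val;
}

/* Runs one disable/enable pair as a self-test would; true if it was traced. */
bool selftest_sees_event(bool bkl_held)
{
    int before;
    bool hit;

    preempt_count_sim = 0;
    if (bkl_held)
        add_preempt_count_sim(1);   /* taking the lock bumps the count */
    before = trace_preempt_off_hits;
    add_preempt_count_sim(1);       /* the self-test disables preemption */
    hit = trace_preempt_off_hits != before;
    sub_preempt_count_sim(1);
    return hit;
}
```

With the lock held the count goes from 1 to 2, the comparison fails, and no event is recorded; with the lock released the count goes from 0 to 1 and the hook fires.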
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a2250634
    • V
      trace: introduce missing mutex_unlock() · 641d2f63
      Vegard Nossum committed
      Impact: fix tracing buffer mutex leak in case of allocation failure
      
      This error was spotted by this semantic patch:
      
        http://www.emn.fr/x-info/coccinelle/mut.html
      
      It looks correct as far as I can tell. Please review.
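The class of bug fixed here (returning on allocation failure while still holding a mutex) can be illustrated as follows. This is a user-space sketch, not the patched function: `struct sim_mutex`, `buf_lock`, `trace_buf` and `resize_buffer` are hypothetical names, and a size of 0 is used to simulate a failed allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical single-threaded stand-in for a kernel mutex. */
struct sim_mutex { bool held; };

void sim_mutex_lock(struct sim_mutex *m)   { assert(!m->held); m->held = true; }
void sim_mutex_unlock(struct sim_mutex *m) { assert(m->held);  m->held = false; }

struct sim_mutex buf_lock;   /* illustrative lock */
char *trace_buf;             /* illustrative buffer */

/* Replace the buffer under the lock. size == 0 simulates allocation failure. */
int resize_buffer(size_t size)
{
    char *new_buf;
    int ret = 0;

    sim_mutex_lock(&buf_lock);
    new_buf = size ? malloc(size) : NULL;
    if (!new_buf) {
        ret = -1;
        goto out;   /* a bare "return -1" here would leave the mutex held */
    }
    free(trace_buf);
    trace_buf = new_buf;
out:
    sim_mutex_unlock(&buf_lock);
    return ret;
}
```

Routing every exit through one `out:` label is a common way to make sure the unlock cannot be skipped by an early return.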
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      641d2f63
    • A
      suspend: use WARN not WARN_ON to print the message · a6a0c4ca
      Arjan van de Ven committed
      By using WARN(), kerneloops.org can collect which component is causing
      the delay and make statistics about that. suspend_test_finish() is
      currently the number 2 item, but unless we can collect who's causing
      it, we're not going to be able to fix the hot topic ones.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6a0c4ca
  7. 18 Nov, 2008 1 commit
    • J
      tracing: kernel/trace/trace.c: introduce missing kfree() · 0bb943c7
      Julia Lawall committed
      Impact: fix memory leak
      
      Error handling code following a kzalloc should free the allocated data.
      
      The semantic match that finds the problem is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @r exists@
      local idexpression x;
      statement S;
      expression E;
      identifier f,l;
      position p1,p2;
      expression *ptr != NULL;
      @@
      
      (
      if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
      |
      x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
      ...
      if (x == NULL) S
      )
      <... when != x
           when != if (...) { <+...x...+> }
      x->f = E
      ...>
      (
       return \(0\|<+...x...+>\|ptr\);
      |
       return@p2 ...;
      )
      
      @script:python@
      p1 << r.p1;
      p2 << r.p2;
      @@
      
      print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
      // </smpl>
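The pattern the semantic match looks for (an allocation followed by an error return that does not free it) can be shown with a small user-space example. This is a sketch only: `struct iter`, `iter_open`, and the counting wrappers `alloc_sim` / `free_sim` are hypothetical and stand in for kzalloc() and kfree() so that a leak becomes observable.

```c
#include <stdlib.h>

struct iter { char *buf; int cpu; };   /* hypothetical object */

int live_allocs;   /* counts outstanding allocations so leaks are visible */

/* Zeroing allocator, standing in for kzalloc(). */
void *alloc_sim(size_t n)
{
    void *p = calloc(1, n);

    if (p)
        live_allocs++;
    return p;
}

/* Standing in for kfree(). */
void free_sim(void *p)
{
    if (p)
        live_allocs--;
    free(p);
}

/* Returns NULL on failure; setup_ok == 0 simulates a later step failing. */
struct iter *iter_open(int setup_ok)
{
    struct iter *it = alloc_sim(sizeof(*it));

    if (!it)
        return NULL;
    it->cpu = 0;
    if (!setup_ok) {
        free_sim(it);   /* without this, the early return leaks "it" */
        return NULL;
    }
    return it;
}
```

A failed `iter_open(0)` leaves `live_allocs` at zero only because the error path frees the object before returning.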
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0bb943c7