1. 24 Nov 2008, 2 commits
    • tracing/function-return-tracer: free the return stack on free_task() · 65afa5e6
      Committed by Frederic Weisbecker
      Impact: avoid losing some traces when a task is freed
      
      do_exit() is not the last function called when a task finishes.
      There are still some functions which are called afterwards, such as
      free_task().  So we delay the freeing of the return stack until the
      last moment.
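
      In miniature, the resulting shape is as follows (a sketch only; the
      helper and field names are approximate, not necessarily those of the
      patch):

          /* kernel/fork.c */
          void free_task(struct task_struct *tsk)
          {
                  /* ... */
                  /* the task is gone for good: now it is safe to drop the
                     function-return trace stack */
                  ftrace_retfunc_exit_task(tsk);  /* kfree()s tsk->ret_stack */
                  free_task_struct(tsk);
          }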
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      65afa5e6
    • tracing/function-return-tracer: don't trace kfree while it frees the return stack · eae849ca
      Committed by Frederic Weisbecker
      Impact: fix a crash
      
      While killing a cat process, I sometimes got the following (rare)
      crash:
      
      [   65.689027] Pid: 2969, comm: cat Not tainted (2.6.28-rc6-tip #83) AMILO Li 2727
      [   65.689027] EIP: 0060:[<00000000>] EFLAGS: 00010082 CPU: 1
      [   65.689027] EIP is at 0x0
      [   65.689027] EAX: 00000000 EBX: f66cd780 ECX: c019a64a EDX: f66cd780
      [   65.689027] ESI: 00000286 EDI: f66cd780 EBP: f630be2c ESP: f630be24
      [   65.689027]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [   65.689027] Process cat (pid: 2969, ti=f630a000 task=f66cd780 task.ti=f630a000)
      [   65.689027] Stack:
      [   65.689027]  00000012 f630bd54 f630be7c c012c853 00000000 c0133cc9 f66cda54 f630be5c
      [   65.689027]  f630be68 f66cda54 f66cd88c f66cd878 f7070000 00000001 f630be90 c0135dbc
      [   65.689027]  f614a614 f630be68 f630be68 f65ba200 00000002 f630bf10 f630be90 c012cad6
      [   65.689027] Call Trace:
      [   65.689027]  [<c012c853>] ? do_exit+0x603/0x850
      [   65.689027]  [<c0133cc9>] ? next_signal+0x9/0x40
      [   65.689027]  [<c0135dbc>] ? dequeue_signal+0x8c/0x180
      [   65.689027]  [<c012cad6>] ? do_group_exit+0x36/0x90
      [   65.689027]  [<c013709c>] ? get_signal_to_deliver+0x20c/0x390
      [   65.689027]  [<c0102b69>] ? do_notify_resume+0x99/0x8b0
      [   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
      [   65.689027]  [<c014db9b>] ? trace_hardirqs_on+0xb/0x10
      [   65.689027]  [<c02e6d1a>] ? tty_ldisc_deref+0x5a/0x80
      [   65.689027]  [<c02e39b0>] ? n_tty_write+0x0/0x340
      [   65.689027]  [<c02e1812>] ? redirected_tty_write+0x82/0x90
      [   65.689027]  [<c019ee99>] ? vfs_write+0x99/0xd0
      [   65.689027]  [<c02e1790>] ? redirected_tty_write+0x0/0x90
      [   65.689027]  [<c019f342>] ? sys_write+0x42/0x70
      [   65.689027]  [<c01035ca>] ? work_notifysig+0x13/0x19
      [   65.689027] Code:  Bad EIP value.
      [   65.689027] EIP: [<00000000>] 0x0 SS:ESP 0068:f630be24
      
      This is because on do_exit(), kfree() is called to free the return
      address stack, but kfree() is itself traced and stores its own return
      address in this stack.
      This patch fixes it.
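
      A sketch of the idea (field and helper names approximate):

          void ftrace_retfunc_exit_task(struct task_struct *t)
          {
                  struct ftrace_ret_stack *ret_stack = t->ret_stack;

                  /* detach the stack first, so the traced kfree() below can
                     no longer push a return entry onto memory being freed */
                  t->ret_stack = NULL;
                  barrier();
                  kfree(ret_stack);
          }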
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      eae849ca
  2. 23 Nov 2008, 2 commits
  3. 21 Nov 2008, 1 commit
    • function tracing: fix wrong position computing of stack_trace · 522a110b
      Committed by Liming Wang
      Impact: make output of stack_trace complete if buffer overruns
      
      When the read buffer overruns, the output of stack_trace isn't complete.
      
      When printing records with seq_printf() in t_show(), if the read buffer
      is overrun by the current record, then this record won't be printed to
      user space through the read buffer; it is simply dropped in this pass.
      
      On the next read, t_start() should return the "*pos"th record, which is
      the one dropped by the previous pass, but it just returns the
      (m->private + *pos)th record.
      
      Here we use a saner method to implement seq_operations, one that can be
      found elsewhere in the kernel code.  Thus we needn't initialize
      m->private.
      
      As for testing, it's not easy to overrun the read buffer, but we can
      make seq_printf() print extra padding bytes in t_show(); then it's easy
      to check whether records are lost.
      
      This commit has been tested under both overrun and non-overrun
      conditions.
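
      For reference, the conventional seq_operations pattern looks roughly
      like this ('records'/'nr_records' are placeholders for the stack_trace
      data; the point is that the record is derived purely from *pos):

          static void *t_start(struct seq_file *m, loff_t *pos)
          {
                  return (*pos < nr_records) ? &records[*pos] : NULL;
          }

          static void *t_next(struct seq_file *m, void *v, loff_t *pos)
          {
                  (*pos)++;
                  return (*pos < nr_records) ? &records[*pos] : NULL;
          }

      With this, a record dropped because the previous read buffer overran
      is simply retried on the next read, and m->private needs no
      initialization.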
      Signed-off-by: Liming Wang <liming.wang@windriver.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      522a110b
  4. 20 Nov 2008, 5 commits
    • cgroups: fix a serious bug in cgroupstats · 33d283be
      Committed by Li Zefan
      Try this, and you'll get an oops immediately:
       # cd Documentation/accounting/
       # gcc -o getdelays getdelays.c
       # mount -t cgroup -o debug xxx /mnt
       # ./getdelays -C /mnt/tasks
      
      Because a normal file's dentry->d_fsdata is a pointer to struct cftype,
      not struct cgroup.
      
      After the patch, it returns EINVAL if we try to get cgroupstats
      from a normal file.
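
      The shape of the fix is roughly the following check at the top of the
      cgroupstats code (a sketch, not the literal patch):

          /* only a cgroup directory carries a struct cgroup in d_fsdata;
             a regular control file points to a struct cftype instead */
          if (!S_ISDIR(dentry->d_inode->i_mode))
                  return -EINVAL;
          cgrp = dentry->d_fsdata;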
      
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      33d283be
    • sprint_symbol(): use less stack · 966c8c12
      Committed by Hugh Dickins
      sprint_symbol(), itself used when dumping stacks, has been wasting 128
      bytes of stack: look up the symbol directly into the buffer supplied by
      the caller, instead of using a locally declared namebuf.
      
      I believe the name != buffer strcpy() is obsolete: the design here dates
      from when module symbol lookup pointed into a supposedly const but sadly
      volatile table; nowadays it copies, but an uncalled strcpy() looks better
      here than the risk of a recursive BUG_ON().
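
      The resulting code is roughly (a sketch of the idea):

          int sprint_symbol(char *buffer, unsigned long address)
          {
                  char *modname;
                  const char *name;
                  unsigned long offset, size;

                  /* resolve straight into the caller's buffer: no namebuf */
                  name = kallsyms_lookup(address, &size, &offset, &modname,
                                         buffer);
                  if (!name)
                          return sprintf(buffer, "0x%lx", address);
                  if (name != buffer)
                          strcpy(buffer, name);
                  /* ... append +offset/size and [modname] as before ... */
          }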
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      966c8c12
    • cgroup: fix potential deadlock in pre_destroy · 3fa59dfb
      Committed by KAMEZAWA Hiroyuki
      As Balbir pointed out, memcg's pre_destroy handler has a potential
      deadlock.
      
      It has the following lock sequence:
      
      	cgroup_mutex (cgroup_rmdir)
      	    -> pre_destroy -> mem_cgroup_pre_destroy-> force_empty
      		-> cpu_hotplug.lock. (lru_add_drain_all->
      				      schedule_work->
                                            get_online_cpus)
      
      But cpuset has the following:
      	cpu_hotplug.lock (call notifier)
      		-> cgroup_mutex. (within notifier)
      
      So this lock ordering has to be fixed.
      
      Considering how pre_destroy works, it's not necessary to hold
      cgroup_mutex while calling it.
      
      As a side effect, we don't have to wait on this mutex while memcg's
      force_empty runs (which can take long when there are tons of pages).
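
      The idea, sketched against cgroup_rmdir() (helper name illustrative,
      not the literal diff): call the handlers before taking cgroup_mutex.

          /* pre_destroy may reach lru_add_drain_all() -> get_online_cpus(),
             so it must run without cgroup_mutex held */
          cgroup_call_pre_destroy(cgrp);

          mutex_lock(&cgroup_mutex);
          /* ... the usual rmdir checks and removal ... */
          mutex_unlock(&cgroup_mutex);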
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3fa59dfb
    • cpuset: update top cpuset's mems after adding a node · f481891f
      Committed by Miao Xie
      After adding a node into the machine, top cpuset's mems isn't updated.
      
      By reviewing the code, we found that the update function
      
        cpuset_track_online_nodes()
      
      was invoked after node_states[N_ONLINE] changed.  It is wrong because
      N_ONLINE just means the node has a pgdat; when a node has (or gains)
      memory, we use N_HIGH_MEMORY.  So we should invoke the update function
      after node_states[N_HIGH_MEMORY] changes, just as its commit message says.
      
      This patch fixes it.  We also use a memory hotplug notifier instead of
      calling cpuset_track_online_nodes() directly.
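
      A sketch of the notifier-based approach (close to, but not literally,
      the patch):

          static int cpuset_track_online_nodes(struct notifier_block *self,
                                               unsigned long action, void *arg)
          {
                  cgroup_lock();
                  switch (action) {
                  case MEM_ONLINE:
                  case MEM_OFFLINE:
                          /* follow the nodes that actually have memory */
                          top_cpuset.mems_allowed = node_states[N_HIGH_MEMORY];
                          break;
                  }
                  cgroup_unlock();
                  return NOTIFY_OK;
          }

          /* registered at init time, e.g.: */
          hotplug_memory_notifier(cpuset_track_online_nodes, 10);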
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f481891f
    • reintroduce accept4 · de11defe
      Committed by Ulrich Drepper
      Introduce a new accept4() system call.  The addition of this system call
      matches analogous changes in 2.6.27 (dup3(), eventfd2(), signalfd4(),
      inotify_init1(), epoll_create1(), pipe2()) which added new system calls
      that differed from analogous traditional system calls in adding a flags
      argument that can be used to access additional functionality.
      
      The accept4() system call is exactly the same as accept(), except that
      it adds a flags bit-mask argument.  Two flags are initially implemented.
      (Most of the new system calls in 2.6.27 also had both of these flags.)
      
      SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
      for the new file descriptor returned by accept4().  This is a useful
      security feature to avoid leaking information in a multithreaded
      program where one thread is doing an accept() at the same time as
      another thread is doing a fork() plus exec().  More details here:
      http://udrepper.livejournal.com/20407.html ("Secure File Descriptor
      Handling", Ulrich Drepper).
      
      The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
      to be enabled on the new open file description created by accept4().
      (This flag is merely a convenience, saving the use of additional
      fcntl(F_GETFL) and fcntl(F_SETFL) calls to achieve the same result.)
      
      Here's a test program.  Works on x86-32.  Should work on x86-64, but
      I (mtk) don't have a system to hand to test with.
      
      It tests accept4() with each of the four possible combinations of
      SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
      that the appropriate flags are set on the file descriptor/open file
      description returned by accept4().
      
      I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
      and it passes according to my test program.
      
      /* test_accept4.c
      
        Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
             <mtk.manpages@gmail.com>
      
        Licensed under the GNU GPLv2 or later.
      */
      #define _GNU_SOURCE
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      
      #define PORT_NUM 33333
      
      #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
      
      /**********************************************************************/
      
      /* The following is what we need until glibc gets a wrapper for
        accept4() */
      
      /* Flags for socket(), socketpair(), accept4() */
      #ifndef SOCK_CLOEXEC
      #define SOCK_CLOEXEC    O_CLOEXEC
      #endif
      #ifndef SOCK_NONBLOCK
      #define SOCK_NONBLOCK   O_NONBLOCK
      #endif
      
      #ifdef __x86_64__
      #define SYS_accept4 288
      #elif __i386__
      #define USE_SOCKETCALL 1
      #define SYS_ACCEPT4 18
      #else
      #error "Sorry -- don't know the syscall # on this architecture"
      #endif
      
      static int
      accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
      {
         printf("Calling accept4(): flags = %x", flags);
         if (flags != 0) {
             printf(" (");
             if (flags & SOCK_CLOEXEC)
                 printf("SOCK_CLOEXEC");
             if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
                 printf(" ");
             if (flags & SOCK_NONBLOCK)
                 printf("SOCK_NONBLOCK");
             printf(")");
         }
         printf("\n");
      
      #if USE_SOCKETCALL
         long args[6];
      
         args[0] = fd;
         args[1] = (long) sockaddr;
         args[2] = (long) addrlen;
         args[3] = flags;
      
         return syscall(SYS_socketcall, SYS_ACCEPT4, args);
      #else
         return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
      #endif
      }
      
      /**********************************************************************/
      
      static int
      do_test(int lfd, struct sockaddr_in *conn_addr,
             int closeonexec_flag, int nonblock_flag)
      {
         int connfd, acceptfd;
         int fdf, flf, fdf_pass, flf_pass;
         struct sockaddr_in claddr;
         socklen_t addrlen;
      
         printf("=======================================\n");
      
         connfd = socket(AF_INET, SOCK_STREAM, 0);
         if (connfd == -1)
             die("socket");
         if (connect(connfd, (struct sockaddr *) conn_addr,
                     sizeof(struct sockaddr_in)) == -1)
             die("connect");
      
         addrlen = sizeof(struct sockaddr_in);
         acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
                            closeonexec_flag | nonblock_flag);
         if (acceptfd == -1) {
             perror("accept4()");
             close(connfd);
             return 0;
         }
      
         fdf = fcntl(acceptfd, F_GETFD);
         if (fdf == -1)
             die("fcntl:F_GETFD");
         fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
                    ((closeonexec_flag & SOCK_CLOEXEC) != 0);
         printf("Close-on-exec flag is %sset (%s); ",
                 (fdf & FD_CLOEXEC) ? "" : "not ",
                 fdf_pass ? "OK" : "failed");
      
         flf = fcntl(acceptfd, F_GETFL);
         if (flf == -1)
             die("fcntl:F_GETFD");
         flf_pass = ((flf & O_NONBLOCK) != 0) ==
                    ((nonblock_flag & SOCK_NONBLOCK) !=0);
         printf("nonblock flag is %sset (%s)\n",
                 (flf & O_NONBLOCK) ? "" : "not ",
                 flf_pass ? "OK" : "failed");
      
         close(acceptfd);
         close(connfd);
      
         printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
         return fdf_pass && flf_pass;
      }
      
      static int
      create_listening_socket(int port_num)
      {
         struct sockaddr_in svaddr;
         int lfd;
         int optval;
      
         memset(&svaddr, 0, sizeof(struct sockaddr_in));
         svaddr.sin_family = AF_INET;
         svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
         svaddr.sin_port = htons(port_num);
      
         lfd = socket(AF_INET, SOCK_STREAM, 0);
         if (lfd == -1)
             die("socket");
      
         optval = 1;
         if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                        sizeof(optval)) == -1)
             die("setsockopt");
      
         if (bind(lfd, (struct sockaddr *) &svaddr,
                  sizeof(struct sockaddr_in)) == -1)
             die("bind");
      
         if (listen(lfd, 5) == -1)
             die("listen");
      
         return lfd;
      }
      
      int
      main(int argc, char *argv[])
      {
         struct sockaddr_in conn_addr;
         int lfd;
         int port_num;
         int passed;
      
         passed = 1;
      
         port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;
      
         memset(&conn_addr, 0, sizeof(struct sockaddr_in));
         conn_addr.sin_family = AF_INET;
         conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
         conn_addr.sin_port = htons(port_num);
      
         lfd = create_listening_socket(port_num);
      
         if (!do_test(lfd, &conn_addr, 0, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
             passed = 0;
      
         close(lfd);
      
         exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
      }
      
      [mtk.manpages@gmail.com: rewrote changelog, updated test program]
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      de11defe
  5. 19 Nov 2008, 8 commits
    • profiling: clean up profile_nop() · 60a51513
      Committed by Andrew Morton
      Impact: cleanup
      
      No point in inlining this.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      60a51513
    • ftrace: fix selftest locking · 86fa2f60
      Committed by Ingo Molnar
      Impact: fix self-test boot crash
      
      Self-test failure forgot to re-lock the BKL - crashing the next
      initcall:
      
      Testing tracer irqsoff: .. no entries found ..FAILED!
      initcall init_irqsoff_tracer+0x0/0x11 returned 0 after 3906 usecs
      calling  init_mmio_trace+0x0/0xf @ 1
      ------------[ cut here ]------------
      Kernel BUG at c0c0a915 [verbose debug info unavailable]
      invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
      last sysfs file:
      
      Pid: 1, comm: swapper Not tainted (2.6.28-rc5-tip #53704)
      EIP: 0060:[<c0c0a915>] EFLAGS: 00010286 CPU: 1
      EIP is at unlock_kernel+0x10/0x2b
      EAX: ffffffff EBX: 00000000 ECX: 00000000 EDX: f7030000
      ESI: c12da19c EDI: 00000000 EBP: f7039f54 ESP: f7039f54
       DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      Process swapper (pid: 1, ti=f7038000 task=f7030000 task.ti=f7038000)
      Stack:
       f7039f6c c0164d30 c013fed8 a7d8d7b4 00000000 00000000 f7039f74 c12fb78a
       f7039fd0 c0101132 c12fb77d 00000000 6f727200 6f632072 2d206564 c1002031
       0000000f f7039fa2 f7039fb0 3531b171 00000000 00000000 0000002f c12ca480
      Call Trace:
       [<c0164d30>] ? register_tracer+0x66/0x13f
       [<c013fed8>] ? ktime_get+0x19/0x1b
       [<c12fb78a>] ? init_mmio_trace+0xd/0xf
       [<c0101132>] ? do_one_initcall+0x4a/0x111
       [<c12fb77d>] ? init_mmio_trace+0x0/0xf
       [<c015c7e6>] ? init_irq_proc+0x46/0x59
       [<c12e851d>] ? kernel_init+0x104/0x152
       [<c12e8419>] ? kernel_init+0x0/0x152
       [<c01038b7>] ? kernel_thread_helper+0x7/0x10
      Code: 58 14 43 75 0a b8 00 9b 2d c1 e8 51 43 7a ff 64 a1 00 a0 37 c1 89 58 14 5b 5d c3 55 64 8b 15 00 a0 37 c1 83 7a 14 00 89 e5 79 04 <0f> 0b eb fe 8b 42 14 48 85 c0 89 42 14 79 0a b8 00 9b 2d c1 e8
      EIP: [<c0c0a915>] unlock_kernel+0x10/0x2b SS:ESP 0068:f7039f54
      ---[ end trace a7919e7f17c0a725 ]---
      Kernel panic - not syncing: Attempted to kill init!
      
      So clean up the flow a bit.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      86fa2f60
    • ftrace: fix dyn ftrace filter selection · 32464779
      Committed by Steven Rostedt
      Impact: clean up and fix for dyn ftrace filter selection
      
      The previous logic of the dynamic ftrace selection of enabling
      or disabling functions was complex and incorrect. This patch simplifies
      the code and corrects the usage. This simplification also makes the
      code more robust.
      
      Here is the correct logic:
      
        Given a function that can be traced by dynamic ftrace:
      
        If the function is not to be traced, disable it if it was enabled.
        (this is if the function is in the set_ftrace_notrace file)
      
        (filter is on if there exists any functions in set_ftrace_filter file)
      
        If the filter is on, and we are enabling functions:
          If the function is in set_ftrace_filter, enable it if it is not
            already enabled.
          If the function is not in set_ftrace_filter, disable it if it is not
            already disabled.
      
        Otherwise, if the filter is off and we are enabling function tracing:
          Enable the function if it is not already enabled.
      
        Otherwise, if we are disabling function tracing:
          Disable the function if it is not already disabled.
      
      This code now sets or clears the ENABLED flag in the record, and at the
      end it will enable the function if the flag is set, or disable the function
      if the flag is cleared.
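
      In miniature, the above rules condense to a decision like the
      following (a sketch with illustrative parameter names, not the actual
      kernel function):

          /* sketch: should this traceable function end up enabled? */
          static int should_be_enabled(int enabling, int filter_on,
                                       int in_filter, int in_notrace)
          {
                  if (in_notrace)         /* listed in set_ftrace_notrace */
                          return 0;
                  if (!enabling)          /* function tracing being disabled */
                          return 0;
                  if (filter_on)          /* set_ftrace_filter is non-empty */
                          return in_filter;
                  return 1;               /* no filter: trace everything */
          }

      The caller compares this result with the record's ENABLED flag and
      patches the call site to "call" or "nop" only when they differ.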
      
      The parameters of the function that does the above logic are also
      simplified.  Instead of passing in confusing "new" and "old" pointers,
      which might be swapped depending on the "enabled" flag (the old logic
      even had one of them always NULL, to be filled in later), the new
      logic simply passes in one parameter called "nop".  A "call" is
      calculated in the code, and at the end of the logic, when we know
      whether we need to disable or enable the function, we can then use
      "nop" and "call" properly.
      
      This code is more robust than the previous version.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      32464779
    • ftrace: make filtered functions effective on setting · 82043278
      Committed by Steven Rostedt
      Impact: fix filter selection to apply when set
      
      It can be confusing when set_filter_functions is set (or cleared)
      and the functions being recorded by the dynamic tracer do not
      match.
      
      This patch causes the code to be updated if the function tracer is
      enabled and the filter is changed.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      82043278
    • ftrace: fix set_ftrace_filter · f10ed36e
      Committed by Steven Rostedt
      Impact: fix of output of set_ftrace_filter
      
      The commit "ftrace: do not show freed records in
      available_filter_functions" removed a bit too much from the
      set_ftrace_filter code, so we now see all functions in the
      set_ftrace_filter file even when we set a filter.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f10ed36e
    • ftrace: preemptoff selftest not working · a2250634
      Committed by Heiko Carstens
      Impact: fix preemptoff and preemptirqsoff tracer self-tests
      
      I was wondering why the preemptoff and preemptirqsoff tracer selftests
      don't work on s390.  After all, it's just that they get called from
      non-preemptible context:
      
      kernel_init() will execute all initcalls, however the first line in
      kernel_init() is lock_kernel(), which causes the preempt_count to be
      increased. Any later calls to add_preempt_count() (especially those
      from the selftests) will therefore not result in a call to
      trace_preempt_off() since the check below in add_preempt_count()
      will be false:
      
              if (preempt_count() == val)
                      trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
      
      Hence the trace buffer will be empty.
      
      Fix this by releasing the BKL during the self-tests.
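
      The shape of the fix, assuming the self-test is invoked from
      register_tracer() (a sketch, not the literal diff):

          /* the init path holds the BKL, which keeps preempt_count()
             elevated and hides the preempt-off events from the tracer */
          unlock_kernel();
          ret = type->selftest(type, tr);
          lock_kernel();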
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a2250634
    • trace: introduce missing mutex_unlock() · 641d2f63
      Committed by Vegard Nossum
      Impact: fix tracing buffer mutex leak in case of allocation failure
      
      This error was spotted by this semantic patch:
      
        http://www.emn.fr/x-info/coccinelle/mut.html
      
      It looks correct as far as I can tell. Please review.
      Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      641d2f63
    • suspend: use WARN not WARN_ON to print the message · a6a0c4ca
      Committed by Arjan van de Ven
      By using WARN(), kerneloops.org can collect which component is causing
      the delay and make statistics about that.  suspend_test_finish() is
      currently the number 2 item, but unless we can collect who's causing
      it, we're not going to be able to fix the hot-topic ones.
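
      In miniature, the change is of this form (variable names
      illustrative):

          /* before: only the file/line of the warning reach the collectors */
          WARN_ON(msec > max_msec);

          /* after: the culprit and the measured time travel with the report */
          WARN(msec > max_msec,
               "Component: resume devices, time: %u\n", msec);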
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6a0c4ca
  6. 18 Nov 2008, 8 commits
    • tracing: kernel/trace/trace.c: introduce missing kfree() · 0bb943c7
      Committed by Julia Lawall
      Impact: fix memory leak
      
      Error handling code following a kzalloc should free the allocated data.
      
      The semantic match that finds the problem is as follows:
      (http://www.emn.fr/x-info/coccinelle/)
      
      // <smpl>
      @r exists@
      local idexpression x;
      statement S;
      expression E;
      identifier f,l;
      position p1,p2;
      expression *ptr != NULL;
      @@
      
      (
      if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
      |
      x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
      ...
      if (x == NULL) S
      )
      <... when != x
           when != if (...) { <+...x...+> }
      x->f = E
      ...>
      (
       return \(0\|<+...x...+>\|ptr\);
      |
       return@p2 ...;
      )
      
      @script:python@
      p1 << r.p1;
      p2 << r.p2;
      @@
      
      print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0bb943c7
    • relay: fix cpu offline problem · 98ba4031
      Committed by Lai Jiangshan
      relay_open() closes the allocated buffers when it fails, but if a CPU
      is offline, some buffers will not be closed.  This patch fixes that.
      
      It also cleans up relay_reset().
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      98ba4031
    • tracing/function-return-tracer: add the overrun field · 0231022c
      Committed by Frederic Weisbecker
      Impact: help to find the better depth of trace
      
      We decided to arbitrarily define the depth of the function return trace
      as "20".  Perhaps this is not enough.  To help find an optimal depth,
      we now measure the overrun: the number of functions that have been
      missed for the current thread.  By default this is not displayed; we
      have to set a particular flag on the return tracer:
      echo overrun > /debug/tracing/trace_options
      and the overrun will be printed on the right.
      
      As the trace below shows, the current depth of 20 is not enough.
      
      update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
      update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
      do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
      tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
      tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
      vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
      vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
      set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
      con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
      release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
      release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
      con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
      con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
      n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
      smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
      irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
      smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
      ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
      ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
      ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
      hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0231022c
    • tracing/ftrace: make nop tracer using tracer flags · 0619faf6
      Committed by Frederic Weisbecker
      Impact: give an example of how to use specific tracer flags
      
      This patch proposes using the nop tracer to provide an
      example of the tracer's custom flags implementation.
      
      V2: moved the structures and defines to just after the header includes,
          for cleanliness.
      V3: replaced the defines with enum values.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Noonan <steven@uplinklabs.net>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0619faf6
    • tracing/ftrace: implement a set_flag callback for tracers · adf9f195
      Committed by Frederic Weisbecker
      Impact: give a way to send specific messages to tracers
      
      The current implementation of tracing uses some flags to control the
      output of general tracers.  But we have no way to implement custom
      flags handling for a specific tracer.  This patch proposes a new
      callback for struct tracer, called set_flag, and a structure that
      represents a 32-bit flags variable.
      
      A tracer can implement a struct tracer_flags in which it puts the
      initial value of the flag integer.  Then it can place a set of flags,
      each with its name and its mask on the flag integer.  The structure
      that implements a single flag is called struct tracer_opt.
      
      These custom flags will be available through the trace_options file,
      like the general tracing flags.  Changing their value is done like for
      the other general flags.  For example, if you have a flag called "foo",
      you can activate it by writing "foo", or deactivate it by writing
      "nofoo", to trace_options.
      
      Note that the set_flag callback is optional and is only needed if you
      want flag changes to be signaled to your tracer, letting it accept or
      refuse the assignment.
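
      A minimal sketch of the pieces described above (the tracer and flag
      shown are hypothetical):

          static struct tracer_opt my_opts[] = {
                  { .name = "foo", .bit = 1 << 0 },  /* "foo" / "nofoo" */
                  { }                                /* terminator */
          };

          static struct tracer_flags my_flags = {
                  .val  = 0,          /* initial value of the flag integer */
                  .opts = my_opts,
          };

          /* optional: return non-zero to refuse the new value */
          static int my_set_flag(u32 old_flags, u32 bit, int set)
          {
                  return 0;
          }

      The tracer then sets .flags = &my_flags and .set_flag = my_set_flag,
      and "foo"/"nofoo" become writable through trace_options.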
      
      V2: Some arrangements in coding style....
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      adf9f195
    • kernel/profile.c: fix section mismatch warning · e270219f
      Committed by Rakib Mullick
      Impact: fix section mismatch warning in kernel/profile.c
      
      Here, the profile_nop function has been called from the non-init
      function create_hash_tables(void), which generates a section mismatch
      warning.  Previously, create_hash_tables(void) was an init function,
      so removing __init from create_hash_tables(void) requires profile_nop
      to be non-init as well.
      
      This patch makes the profile_nop function inline and fixes the
      following warning:
      
       WARNING: vmlinux.o(.text+0x6ebb6): Section mismatch in reference from
       the function create_hash_tables() to the function
       .init.text:profile_nop()
       The function create_hash_tables() references
       the function __init profile_nop().
       This is often because create_hash_tables lacks a __init
       annotation or the annotation of profile_nop is wrong.
      Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e270219f
    • cpuset: fix regression when failed to generate sched domains · 700018e0
      Committed by Li Zefan
      Impact: properly rebuild sched-domains on kmalloc() failure
      
      When cpuset fails to generate sched domains due to a kmalloc()
      failure, the scheduler should fall back to the single partition
      'fallback_doms' and rebuild the sched domains, but currently it only
      destroys them without rebuilding them.
      
      The regression was introduced by:
      
      | commit dfb512ec
      | Author: Max Krasnyansky <maxk@qualcomm.com>
      | Date:   Fri Aug 29 13:11:41 2008 -0700
      |
      |    sched: arch_reinit_sched_domains() must destroy domains to force rebuild
      
      After the above commit, partition_sched_domains(0, NULL, NULL) will
      only destroy sched domains and partition_sched_domains(1, NULL, NULL)
      will create the default sched domain.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      700018e0
    • Remove -mno-spe flags as they don't belong · 65ecc14a
      Committed by Kumar Gala
      For some unknown reason Steven Rostedt added the disabling of SPE
      instruction generation for e500-based PPC cores in commit
      6ec56232.
      
      We are removing it because:
      
      1. It generates e500 kernels that don't work.
      2. It is not the correct set of flags to do this.
      3. We handle this in arch/powerpc/Makefile already.
      4. Even after talking to Steven, it's unknown why he did this.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
      Tested-and-Acked-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      65ecc14a
  7. 17 Nov 2008, 3 commits
  8. 16 Nov 2008, 11 commits
    • markers/tracepoints: fix non-modular build · 227a8375
      Committed by Ingo Molnar
      fix:
      
       kernel/marker.c: In function 'marker_module_notify':
       kernel/marker.c:905: error: 'MODULE_STATE_COMING' undeclared (first use in this function)
       [...]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      227a8375
    • tracepoints: add DECLARE_TRACE() and DEFINE_TRACE() · 7e066fb8
      Committed by Mathieu Desnoyers
      Impact: API *CHANGE*. Must update all tracepoint users.
      
      Add DEFINE_TRACE() to tracepoints to let them declare the tracepoint
      structure in a single spot for the whole kernel.  It helps reduce memory
      consumption, especially when declaring a lot of tracepoints, e.g. for
      kmalloc tracing.
      
      *API CHANGE WARNING*: now, DECLARE_TRACE() must be used in headers for
      tracepoint declarations rather than DEFINE_TRACE(). This is the sane way
      to do it. The name previously used was misleading.
      
      Updates scheduler instrumentation to follow this API change.
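
      A usage sketch under the new API (assuming the TPPROTO()/TPARGS()
      helper macros of that era; the tracepoint shown is only an example):

          /* in a header, visible to every user of the tracepoint: */
          DECLARE_TRACE(sched_wakeup,
                  TPPROTO(struct rq *rq, struct task_struct *p),
                  TPARGS(rq, p));

          /* in exactly one .c file, emitting the tracepoint structure: */
          DEFINE_TRACE(sched_wakeup);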
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7e066fb8
    • tracepoints: use module notifiers · 32f85742
      Committed by Mathieu Desnoyers
      Impact: cleanup
      
      Use module notifiers for tracepoint updates rather than adding a hook in
      module.c.
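
      The pattern, sketched with the standard module-notifier API (callback
      body elided):

          static int tracepoint_module_notify(struct notifier_block *self,
                                              unsigned long val, void *data)
          {
                  struct module *mod = data;

                  switch (val) {
                  case MODULE_STATE_COMING:
                          /* patch in the incoming module's tracepoints */
                          break;
                  case MODULE_STATE_GOING:
                          /* and remove them again on unload */
                          break;
                  }
                  return 0;
          }

          static struct notifier_block tracepoint_module_nb = {
                  .notifier_call = tracepoint_module_notify,
          };

          /* at init time: */
          register_module_notifier(&tracepoint_module_nb);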
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      32f85742
    • tracepoints: fix disable · de0baf9a
      Committed by Mathieu Desnoyers
      Impact: fix race
      
      Set the probe array pointer to NULL when the tracepoint is disabled.
      The probe array pointer not being NULL could create a race condition
      where the reader would dereference a freed pointer.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      de0baf9a
    • markers: auto enable tracepoints (new API: trace_mark_tp()) · c1df1bd2
      Committed by Mathieu Desnoyers
      Impact: new API
      
      Add a new API trace_mark_tp(), which declares a marker within a
      tracepoint probe. When the marker is activated, the tracepoint is
      automatically enabled.
      
      No branch test is used at the marker site, because it would be a
      duplicate of the branch already present in the tracepoint.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c1df1bd2
    • markers: use module notifier · a419246a
      Committed by Mathieu Desnoyers
      Impact: cleanup
      
      Use module notifiers instead of adding a hook in module.c.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a419246a
    • markers: use rcu_*_sched_notrace and notrace · 021aeb05
      Committed by Mathieu Desnoyers
      Make the marker critical code use notrace, to make sure it can be used
      as an ftrace callback.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      021aeb05
    • markers: fix unregister · 2bdba316
      Committed by Mathieu Desnoyers
      Impact: fix marker registers/unregister race
      
      get_marker() can return a NULL entry because the mutex is released in
      the middle of those functions. Make sure we check to see if it has been
      concurrently removed.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2bdba316
    • function tracing: fix wrong pos computing when read buffer has been fulfilled · 5821e1b7
      Committed by walimis
      Impact: make output of available_filter_functions complete
      
      phenomenon:
      
      The first value of dyn_ftrace_total_info is not equal with
      `cat available_filter_functions | wc -l`, but they should be equal.
      
      root cause:
      
      When printing functions with seq_printf() in t_show(), if the read
      buffer is overrun by the current function record, then this function
      won't be printed to user space through the read buffer; it is simply
      dropped, so we can't see it in the output.
      
      So each time, the last function to fill the read buffer is dropped if
      the buffer overflows.
      
      This also applies to set_ftrace_filter if set_ftrace_filter has
      more bytes than the read buffer.
      
      fix:
      
      By checking the return value of seq_printf(): if it is less than 0, we
      know this function wasn't printed.  Then we decrease the position to
      force this function to be printed next time, into the next read buffer.
      
      Another little fix is to show the correct count of allocated pages.
      Signed-off-by: walimis <walimisdev@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5821e1b7
    • sched: fix kernel warning on /proc/sched_debug access · 29d7b90c
      Committed by Ingo Molnar
      Luis Henriques reported that with CONFIG_PREEMPT=y + CONFIG_PREEMPT_DEBUG=y +
      CONFIG_SCHED_DEBUG=y + CONFIG_LATENCYTOP=y enabled, the following warning
      triggers when using latencytop:
      
      > [  775.663239] BUG: using smp_processor_id() in preemptible [00000000] code: latencytop/6585
      > [  775.663303] caller is native_sched_clock+0x3a/0x80
      > [  775.663314] Pid: 6585, comm: latencytop Tainted: G        W 2.6.28-rc4-00355-g9c7c3546 #1
      > [  775.663322] Call Trace:
      > [  775.663343]  [<ffffffff803a94e4>] debug_smp_processor_id+0xe4/0xf0
      > [  775.663356]  [<ffffffff80213f7a>] native_sched_clock+0x3a/0x80
      > [  775.663368]  [<ffffffff80213e19>] sched_clock+0x9/0x10
      > [  775.663381]  [<ffffffff8024550d>] proc_sched_show_task+0x8bd/0x10e0
      > [  775.663395]  [<ffffffff8034466e>] sched_show+0x3e/0x80
      > [  775.663408]  [<ffffffff8031039b>] seq_read+0xdb/0x350
      > [  775.663421]  [<ffffffff80368776>] ? security_file_permission+0x16/0x20
      > [  775.663435]  [<ffffffff802f4198>] vfs_read+0xc8/0x170
      > [  775.663447]  [<ffffffff802f4335>] sys_read+0x55/0x90
      > [  775.663460]  [<ffffffff8020c67a>] system_call_fastpath+0x16/0x1b
      > ...
      
      This breakage was caused by me via:
      
        7cbaef9c: sched: optimize sched_clock() a bit
      
      Change the calls to cpu_clock().
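
      In miniature, the change is of this form (context illustrative):

          /* sched_clock() requires a stable CPU; from preemptible context
             use the cpu_clock() wrapper instead */
          u64 t = cpu_clock(task_cpu(p));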
      Reported-by: Luis Henriques <henrix@sapo.pt>
      29d7b90c
    • tracing/function-return-tracer: support for dynamic ftrace on function return tracer · e7d3737e
      Committed by Frederic Weisbecker
      This patch adds support for dynamic tracing on the function return
      tracer.  The whole difference from normal dynamic function tracing is
      that we don't need to hook on a particular callback; the only thing we
      want is to dynamically nop or set the calls to ftrace_caller (which is
      ftrace_return_caller here).
      
      Some safety checks ensure that we are not trying to launch dynamic
      tracing for return tracing while normal function tracing is already
      running.
      
      An example of trace with getnstimeofday set as a filter:
      
      ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e7d3737e