1. 26 Feb 2009, 1 commit
    • perfcounters: fix a few minor cleanliness issues · f3dfd265
      Committed by Paul Mackerras
      This fixes three issues noticed by Arnd Bergmann:
      
      - Add #ifdef __KERNEL__ and move some things around in perf_counter.h
        to make sure only the bits that userspace needs are exported to
        userspace.
      
      - Use __u64, __s64, __u32 types in the structs exported to userspace
        rather than u64, s64, u32.
      
      - Make the sys_perf_counter_open syscall available to the SPUs on
        Cell platforms.
      
      And one issue that I noticed in looking at the code again:
      
      - Wrap the perf_counter_open syscall with SYSCALL_DEFINE4 so we get
        the proper handling of int arguments on ppc64 (and some other 64-bit
        architectures).
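      For the last point, a minimal sketch of what such a wrapper looks
      like, assuming the four-argument syscall of that era (the parameter
      names and the helper it calls are illustrative, not taken from the
      patch):

      	/* Hedged sketch: SYSCALL_DEFINE4 generates a wrapper that
      	 * sign-extends the int arguments, which a bare asmlinkage
      	 * declaration does not guarantee on ppc64 and some other
      	 * 64-bit architectures. */
      	SYSCALL_DEFINE4(perf_counter_open,
      			const struct perf_counter_hw_event __user *, hw_event_uptr,
      			pid_t, pid, int, cpu, int, group_fd)
      	{
      		/* illustrative helper holding the original syscall body */
      		return do_perf_counter_open(hw_event_uptr, pid, cpu, group_fd);
      	}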
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  2. 23 Feb 2009, 1 commit
  3. 22 Feb 2009, 5 commits
  4. 19 Feb 2009, 5 commits
  5. 18 Feb 2009, 2 commits
    • block: fix bad definition of BIO_RW_SYNC · 93dbb393
      Committed by Jens Axboe
      We can't OR shift values: these flags are bit numbers, not bit
      masks. So get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO and
      BIO_RW_UNPLUG explicitly. This brings back the behaviour from
      before 213d9417.
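      A minimal userspace illustration of the bug, with made-up flag
      values (the real ones differ): since the flags are bit numbers
      used as shift counts, OR-ing two of them just yields another bit
      number, not a combined mask.

      	#include <stdio.h>

      	enum { BIO_RW_SYNCIO = 4, BIO_RW_UNPLUG = 5 };	/* illustrative values */
      	#define BIO_RW_SYNC (BIO_RW_SYNCIO | BIO_RW_UNPLUG)	/* 4 | 5 == 5: wrong! */

      	int main(void)
      	{
      		/* sets only the UNPLUG bit; SYNCIO is silently lost */
      		printf("1 << BIO_RW_SYNC: 0x%x\n", 1 << BIO_RW_SYNC);
      		/* what was actually intended */
      		printf("intended mask:    0x%x\n",
      		       (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG));
      		return 0;
      	}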
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • tracing/function-graph-tracer: trace the idle tasks · 5b058bcd
      Committed by Frederic Weisbecker
      When the function graph tracer is activated, it iterates over the
      task list to allocate a stack on which to store the return
      addresses.
      
      But the per-CPU idle tasks are not covered by the
      do_each_thread / while_each_thread iteration.
      
      So we have to iterate over them manually, as sketched below.
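      
      A minimal sketch of that manual pass, assuming the ftrace helpers
      of that era (idle_task(), ftrace_graph_init_task() and the
      ret_stack field; treat the exact names as illustrative):
      
      	/* do_each_thread() never visits the per-cpu idle tasks, so
      	 * walk the online cpus and give each idle task its own
      	 * return-address stack explicitly. */
      	int cpu;
      
      	for_each_online_cpu(cpu) {
      		struct task_struct *idle = idle_task(cpu);
      
      		if (!idle->ret_stack)
      			ftrace_graph_init_task(idle);
      	}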
      
      This fixes some weirdness in the traces and many lost traces.
      Examples on two cpus:
      
       0)   Xorg-4287    |   2.906 us    |              }
       0)   Xorg-4287    |   3.965 us    |            }
       0)   Xorg-4287    |   5.302 us    |          }
       ------------------------------------------
       0)   Xorg-4287    =>    <idle>-0
       ------------------------------------------
      
       0)    <idle>-0    |   2.861 us    |                        }
       0)    <idle>-0    |   0.526 us    |                        set_normalized_timespec();
       0)    <idle>-0    |   7.201 us    |                      }
       0)    <idle>-0    |   8.214 us    |                    }
       0)    <idle>-0    |               |                    clockevents_program_event() {
       0)    <idle>-0    |               |                      lapic_next_event() {
       0)    <idle>-0    |   0.510 us    |                        native_apic_mem_write();
       0)    <idle>-0    |   1.546 us    |                      }
       0)    <idle>-0    |   2.583 us    |                    }
       0)    <idle>-0    | + 12.435 us   |                  }
       0)    <idle>-0    | + 13.470 us   |                }
       0)    <idle>-0    |   0.608 us    |                _spin_unlock_irqrestore();
       0)    <idle>-0    | + 23.270 us   |              }
       0)    <idle>-0    | + 24.336 us   |            }
       0)    <idle>-0    | + 25.417 us   |          }
       0)    <idle>-0    |   0.593 us    |          _spin_unlock();
       0)    <idle>-0    | + 41.869 us   |        }
       0)    <idle>-0    | + 42.906 us   |      }
       0)    <idle>-0    | + 95.035 us   |    }
       0)    <idle>-0    |   0.540 us    |    menu_reflect();
       0)    <idle>-0    | ! 100.404 us  |  }
       0)    <idle>-0    |   0.564 us    |  mce_idle_callback();
       0)    <idle>-0    |               |  enter_idle() {
       0)    <idle>-0    |   0.526 us    |    mce_idle_callback();
       0)    <idle>-0    |   1.757 us    |  }
       0)    <idle>-0    |               |  cpuidle_idle_call() {
       0)    <idle>-0    |               |    menu_select() {
       0)    <idle>-0    |   0.525 us    |      pm_qos_requirement();
       0)    <idle>-0    |   0.518 us    |      tick_nohz_get_sleep_length();
       0)    <idle>-0    |   2.621 us    |    }
      [...]
       1)    <idle>-0    |   0.518 us    |              touch_softlockup_watchdog();
       1)    <idle>-0    | + 14.355 us   |            }
       1)    <idle>-0    | + 22.840 us   |          }
       1)    <idle>-0    | + 25.949 us   |        }
       1)    <idle>-0    |               |        handle_irq() {
       1)    <idle>-0    |   0.511 us    |          irq_to_desc();
       1)    <idle>-0    |               |          handle_edge_irq() {
       1)    <idle>-0    |   0.638 us    |            _spin_lock();
       1)    <idle>-0    |               |            ack_apic_edge() {
       1)    <idle>-0    |   0.510 us    |              irq_to_desc();
       1)    <idle>-0    |               |              move_native_irq() {
       1)    <idle>-0    |   0.510 us    |                irq_to_desc();
       1)    <idle>-0    |   1.532 us    |              }
       1)    <idle>-0    |   0.511 us    |              native_apic_mem_write();
       ------------------------------------------
       1)    <idle>-0    =>    cat-5073
       ------------------------------------------
      
       1)    cat-5073    |   3.731 us    |                    }
       1)    cat-5073    |               |                    run_local_timers() {
       1)    cat-5073    |   0.533 us    |                      hrtimer_run_queues();
       1)    cat-5073    |               |                      raise_softirq() {
       1)    cat-5073    |               |                        __raise_softirq_irqoff() {
       1)    cat-5073    |               |                          /* nr: 1 */
       1)    cat-5073    |   2.718 us    |                        }
       1)    cat-5073    |   3.814 us    |                      }
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 16 Feb 2009, 2 commits
  7. 14 Feb 2009, 1 commit
  8. 13 Feb 2009, 2 commits
    • timers: more consistently use clock vs timer · 3997ad31
      Committed by Peter Zijlstra
      While reviewing the manpages, I noticed I'd missed some clock vs
      timer sites.
      
      Make sure that all timer functions call cpu_timer_sample_group()
      and not cpu_clock_sample_group(). This ensures that we enable the
      process-wide timer in time, and therefore pay the O(n)
      thread-group cost from the syscall itself.
      
      Not doing it there would leave that work to the first jiffy tick
      after the timer is set, resulting in a very expensive tick (but
      only once) and a delay in actually starting the timer.
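      
      As a hedged illustration of the clock-vs-timer distinction (the
      helper names follow the text above; the surrounding function and
      exact signatures are illustrative):
      
      	static void timer_path_sample(struct k_itimer *timer,
      				      struct task_struct *p)
      	{
      		union cpu_time_count now;
      
      		/* timer paths: samples AND enables the process-wide
      		 * timer accounting as a side effect */
      		cpu_timer_sample_group(timer->it_clock, p, &now);
      
      		/* clock paths keep using cpu_clock_sample_group(),
      		 * which samples without enabling anything */
      	}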
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perfcounters: make context switch and migration software counters work again · c07c99b6
      Committed by Paul Mackerras
      Jaswinder Singh Rajput reported that commit 23a185ca caused the
      context switch and migration software counters to always report zero.
      With that commit, the software counters only count events that occur
      between sched-in and sched-out for a task.  This is necessary for the
      counter enable/disable prctls and ioctls to work.  However, the
      context switch and migration counts are incremented after sched-out
      for one task and before sched-in for the next.  Since the increment
      doesn't occur while a task is scheduled in (as far as the software
      counters are concerned) it doesn't count towards any counter.
      
      Thus the context switch and migration counters need to count events
      that occur at any time, provided the counter is enabled, not just
      those that occur while the task is scheduled in (from the perf_counter
      subsystem's point of view).  The problem though is that the software
      counter code can't tell the difference between being enabled and being
      scheduled in, and between being disabled and being scheduled out,
      since we use the one pair of enable/disable entry points for both.
      That is, the high-level disable operation simply arranges for the
      counter to not be scheduled in any more, and the high-level enable
      operation arranges for it to be scheduled in again.
      
      One way to solve this would be to have sched_in/out operations in the
      hw_perf_counter_ops struct as well as enable/disable.  However, this
      takes a simpler approach: it adds a 'prev_state' field to the
      perf_counter struct that allows a counter's enable method to know
      whether the counter was previously disabled or just inactive
      (scheduled out), and therefore whether the enable method is being
      called as a result of a high-level enable or a schedule-in operation.
      
      This then allows the context switch, migration and page fault counters
      to reset their hw.prev_count value in their enable functions only if
      they are called as a result of a high-level enable operation.
      Although page faults would normally only occur while the counter is
      scheduled in, this changes the page fault counter code too in case
      there are ever circumstances where page faults get counted against a
      task while its counters are not scheduled in.
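      
      A hedged sketch of that enable-path check, using the names from
      the text above (the helper that reads the current count is
      illustrative):
      
      	static void sw_counter_enable(struct perf_counter *counter)
      	{
      		/* reset the baseline only on a real high-level enable
      		 * (previously OFF); on a plain sched-in (previously
      		 * INACTIVE) keep counting from the old baseline */
      		if (counter->prev_state <= PERF_COUNTER_STATE_OFF)
      			atomic64_set(&counter->hw.prev_count,
      				     sw_counter_current_value(counter));
      	}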
      Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 12 Feb 2009, 4 commits
  10. 11 Feb 2009, 7 commits
    • sched: revert recent sync wakeup changes · fc631c82
      Committed by Peter Zijlstra
      Intel reported a 10% regression (mysql+sysbench) on a 16-way machine
      with these patches:
      
        1596e297: sched: symmetric sync vs avg_overlap
        d942fb6c: sched: fix sync wakeups
      
      Revert them.
      Reported-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Bisected-by: NLin Ming <ming.m.lin@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
    • perfcounters: fix refcounting bug, take 2 · 4bcf349a
      Committed by Paul Mackerras
      Only free child_counter if it has a parent; if it doesn't, then it
      has a file pointing to it and we'll free it in perf_release.
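      
      In sketch form, assuming the teardown code of that era (the free
      helper's name is illustrative):
      
      	if (child_counter->parent) {
      		/* inherited counter: no file references it, free now */
      		free_counter(child_counter);
      	} else {
      		/* a file still points at it; perf_release() frees it
      		 * when the file's last reference goes away */
      	}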
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • timers: fix TIMER_ABSTIME for process wide cpu timers · 4da94d49
      Committed by Peter Zijlstra
      The POSIX timer interface allows for absolute expiry times through
      the TIMER_ABSTIME flag; therefore we have to synchronize the timer
      to the clock every time we start it.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • timers: split process wide cpu clocks/timers, fix · 3fccfd67
      Committed by Peter Zijlstra
      To decrease the chance of a missed enable, always enable the timer
      when we sample it; we'll always disable it when we find that there
      are no active timers in the jiffy tick.
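      
      A hedged sketch of that sampling rule, close to the
      thread_group_cputimer shape of that era (field and helper names
      are illustrative):
      
      	void thread_group_cputimer_sample(struct task_struct *tsk,
      					  struct task_cputime *times)
      	{
      		struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
      		unsigned long flags;
      
      		spin_lock_irqsave(&cputimer->lock, flags);
      		if (!cputimer->running) {
      			/* enable on every sample ... */
      			cputimer->running = 1;
      			thread_group_cputime(tsk, &cputimer->cputime);
      		}
      		*times = cputimer->cputime;
      		spin_unlock_irqrestore(&cputimer->lock, flags);
      		/* ... and only the jiffy tick clears ->running again,
      		 * once it finds no active timers left */
      	}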
      
      This fixes a flood of warnings reported by Mike Galbraith.
      Reported-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perfcounters: fix use after free in perf_release() · 5af75917
      Committed by Mike Galbraith
      Running...
      
        while true; do
          foo -d 1 -f 1 -c 100000 & sleep 1
          kerneltop -d 1 -f 1 -e 1 -c 25000 -p `pidof foo`
        done
      
        while true; do
          killall foo; killall kerneltop; sleep 2
        done
      
      ...in two shells with SLUB_DEBUG enabled produces a flood of:
      BUG task_struct: Poison overwritten.
      
      Fix the use-after-free bug in perf_release().
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ptrace, x86: fix the usage of ptrace_fork() · 06eb23b1
      Committed by Oleg Nesterov
      I noticed by pure accident that we have ptrace_fork() and friends.
      These were added by "x86, bts: add fork and exit handling", commit
      bf53de90.
      
      I can't test this (ds_request_bts() returns -EOPNOTSUPP here), but
      I strongly believe this needs the fix. I think something like this
      program
      
      	/* headers needed to build the reproducer; struct
      	 * ptrace_bts_config and the PTRACE_BTS_* requests were
      	 * x86-specific, from <asm/ptrace-abi.h> in kernels of that
      	 * era (a reasonable guess, not part of the original) */
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <signal.h>
      	#include <unistd.h>
      	#include <asm/ptrace-abi.h>
      
      	int main(void)
      	{
      		int pid = fork();
      
      		if (!pid) {
      			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
      			kill(getpid(), SIGSTOP);
      			fork();
      		} else {
      			struct ptrace_bts_config bts = {
      				.flags = PTRACE_BTS_O_ALLOC,
      				.size  = 4 * 4096,
      			};
      
      			wait(NULL);
      
      			ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK);
      			ptrace(PTRACE_BTS_CONFIG, pid, &bts, sizeof(bts));
      			ptrace(PTRACE_CONT, pid, NULL, NULL);
      
      			sleep(1);
      		}
      
      		return 0;
      	}
      
      should crash the kernel.
      
      If the task is traced by its natural parent, ptrace_reparented()
      returns 0, but we should clear ->btsxxx anyway.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Markus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counters: allow users to count user, kernel and/or hypervisor events · 0475f9ea
      Committed by Paul Mackerras
      Impact: new perf_counter feature
      
      This extends the perf_counter_hw_event struct with bits that specify
      that events in user, kernel and/or hypervisor mode should not be
      counted (i.e. should be excluded), and adds code to program the PMU
      mode selection bits accordingly on x86 and powerpc.
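      
      The new bits look roughly like this (a sketch: exclude_hv is named
      in the text below, exclude_user and exclude_kernel are inferred
      from the exclude_* naming, and the surrounding struct layout is
      illustrative):
      
      	struct perf_counter_hw_event {
      		__u64	type;			/* event selector */
      		__u64	exclude_user   : 1,	/* don't count user events */
      			exclude_kernel : 1,	/* don't count kernel events */
      			exclude_hv     : 1;	/* don't count hypervisor events */
      	};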
      
      For software counters, we don't currently have the infrastructure to
      distinguish which mode an event occurs in, so we currently fail the
      counter initialization if the setting of the hw_event.exclude_* bits
      would require us to distinguish.  Context switches and CPU migrations
      are currently considered to occur in kernel mode.
      
      On x86, this changes the previous policy that only root can count
      kernel events.  Now non-root users can count kernel events or exclude
      them.  Non-root users still can't use NMI events, though.  On x86 we
      don't appear to have any way to control whether hypervisor events are
      counted or not, so hw_event.exclude_hv is ignored.
      
      On powerpc, the selection of whether to count events in user, kernel
      and/or hypervisor mode is PMU-wide, not per-counter, so this adds a
      check that the hw_event.exclude_* settings are the same as other events
      on the PMU.  Counters being added to a group have to have the same
      settings as the other hardware counters in the group.  Counters and
      groups can only be enabled in hw_perf_group_sched_in or power_perf_enable
      if they have the same settings as any other counters already on the
      PMU.  If we are not running on a hypervisor, the exclude_hv setting
      is ignored (by forcing it to 0) since we can't ever get any
      hypervisor events.
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  11. 10 Feb 2009, 3 commits
    • profiling: fix broken profiling regression · acd89579
      Committed by Hugh Dickins
      Impact: fix broken /proc/profile on UP machines
      
      Commit c309b917 "cpumask: convert
      kernel/profile.c" broke profiling.  prof_cpu_mask was previously
      initialized to CPU_MASK_ALL, but left uninitialized in that commit.
      We need to copy cpu_possible_mask (cpu_online_mask is not enough).
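      
      In sketch form, the default the fix restores (assuming the
      cpumask_var_t API introduced by that conversion; the exact
      placement in the init path is illustrative):
      
      	if (!alloc_cpumask_var(&prof_cpu_mask, GFP_KERNEL))
      		return -ENOMEM;
      	/* restore the old CPU_MASK_ALL default; cpu_online_mask would
      	 * miss CPUs that come online later */
      	cpumask_copy(prof_cpu_mask, cpu_possible_mask);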
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • stackprotector: update make rules · 5d707e9c
      Committed by Tejun Heo
      Impact: no default -fno-stack-protector if stackp is enabled, cleanup
      
      Stackprotector make rules had the following problems.
      
      * The cc support test and the warning are scattered across the
        makefile and kernel/panic.c.
      
      * -fno-stack-protector was always added regardless of configuration.
      
      Update the rules so that the cc support test and the warning are
      contained in the makefile, and -fno-stack-protector is added if and
      only if stackp is turned off. While at it, prepare for 32-bit
      support.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • elf: add ELF_CORE_COPY_KERNEL_REGS() · 6cd61c0b
      Committed by Tejun Heo
      ELF core dump is used for both userland core dumps and kernel
      crash dumps. Depending on the architecture, registers might need
      to be accessed differently for userland and the kernel. Allow
      architectures to define ELF_CORE_COPY_KERNEL_REGS() and use a
      different operation for the kernel register dump.
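      
      A hedged sketch of the shape such a hook usually takes (an inline
      wrapper choosing between the kernel and userland copy; the
      wrapper's name is illustrative):
      
      	static inline void elf_core_copy_kernel_regs(elf_gregset_t *elfregs,
      						     struct pt_regs *regs)
      	{
      	#ifdef ELF_CORE_COPY_KERNEL_REGS
      		/* arch-provided way to capture kernel-mode registers */
      		ELF_CORE_COPY_KERNEL_REGS((*elfregs), regs);
      	#else
      		/* fall back to the userland register copy */
      		elf_core_copy_regs(elfregs, regs);
      	#endif
      	}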
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 09 Feb 2009, 6 commits
  13. 07 Feb 2009, 1 commit