1. 26 Feb, 2009 1 commit
    • perfcounters: fix a few minor cleanliness issues · f3dfd265
      Paul Mackerras committed
      This fixes three issues noticed by Arnd Bergmann:
      
      - Add #ifdef __KERNEL__ and move some things around in perf_counter.h
        to make sure only the bits that userspace needs are exported to
        userspace.
      
      - Use __u64, __s64, __u32 types in the structs exported to userspace
        rather than u64, s64, u32.
      
      - Make the sys_perf_counter_open syscall available to the SPUs on
        Cell platforms.
      
      And one issue that I noticed in looking at the code again:
      
      - Wrap the perf_counter_open syscall with SYSCALL_DEFINE4 so we get
        the proper handling of int arguments on ppc64 (and some other 64-bit
        architectures).
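
The argument-handling point in the last item can be sketched in plain C. This is an illustration under stated assumptions, not the kernel's SYSCALL_DEFINE4 macro: `demo_handler`, `demo_entry` and `demo_check` are hypothetical names, and the narrowing cast relies on the wrapping behaviour that GCC and Clang provide. The entry point accepts a register-width value and narrows it explicitly, so stale upper bits in a 64-bit register cannot change how an int argument is read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handler written with a 32-bit int parameter. */
long demo_handler(int32_t value)
{
    return value < 0 ? -1L : 1L;
}

/*
 * Hypothetical entry point: takes the full 64-bit register value and
 * narrows it before calling the handler, so only the low 32 bits matter.
 */
long demo_entry(int64_t raw)
{
    /* Implementation-defined conversion; wraps on GCC and Clang. */
    int32_t narrowed = (int32_t)(uint32_t)raw;
    return demo_handler(narrowed);
}

/* A caller that passed -1 as an int but left the upper 32 bits clear. */
void demo_check(void)
{
    int64_t raw = INT64_C(0xFFFFFFFF);
    assert(raw > 0);               /* as a 64-bit value it looks positive */
    assert(demo_entry(raw) == -1); /* after narrowing it is negative again */
}
```

Calling `demo_check()` exercises the case where the raw register value and the intended int disagree about sign.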
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
  2. 16 Feb, 2009 1 commit
    • [IA64] fix __acpi_unmap_table · 970ec1a8
      Yinghai Lu committed
      Impact: fix build error
      
      to fix:
      
        tip/arch/ia64/kernel/acpi.c:203: error: conflicting types for '__acpi_unmap_table'
        tip/include/linux/acpi.h:82: error: previous declaration of '__acpi_unmap_table' was here
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 13 Feb, 2009 1 commit
    • perfcounters: make context switch and migration software counters work again · c07c99b6
      Paul Mackerras committed
      Jaswinder Singh Rajput reported that commit 23a185ca caused the
      context switch and migration software counters to report zero always.
      With that commit, the software counters only count events that occur
      between sched-in and sched-out for a task.  This is necessary for the
      counter enable/disable prctls and ioctls to work.  However, the
      context switch and migration counts are incremented after sched-out
      for one task and before sched-in for the next.  Since the increment
      doesn't occur while a task is scheduled in (as far as the software
      counters are concerned) it doesn't count towards any counter.
      
      Thus the context switch and migration counters need to count events
      that occur at any time, provided the counter is enabled, not just
      those that occur while the task is scheduled in (from the perf_counter
      subsystem's point of view).  The problem though is that the software
      counter code can't tell the difference between being enabled and being
      scheduled in, and between being disabled and being scheduled out,
      since we use the one pair of enable/disable entry points for both.
      That is, the high-level disable operation simply arranges for the
      counter to not be scheduled in any more, and the high-level enable
      operation arranges for it to be scheduled in again.
      
      One way to solve this would be to have sched_in/out operations in the
      hw_perf_counter_ops struct as well as enable/disable.  However, this
      takes a simpler approach: it adds a 'prev_state' field to the
      perf_counter struct that allows a counter's enable method to know
      whether the counter was previously disabled or just inactive
      (scheduled out), and therefore whether the enable method is being
      called as a result of a high-level enable or a schedule-in operation.
      
      This then allows the context switch, migration and page fault counters
      to reset their hw.prev_count value in their enable functions only if
      they are called as a result of a high-level enable operation.
      Although page faults would normally only occur while the counter is
      scheduled in, this changes the page fault counter code too in case
      there are ever circumstances where page faults get counted against a
      task while its counters are not scheduled in.
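
The 'prev_state' idea described above can be sketched as a small user-space model. This is a simplified illustration, not the kernel code: the `demo_*` names, the three states and the way `prev_state` is recorded inside the enable function are all assumptions made for the example. The baseline (`prev_count`) is reset only when the counter was previously off, so events that happen between a schedule-out and the next schedule-in are still counted.

```c
#include <assert.h>

/* Hypothetical, simplified states; the names are illustrative only. */
enum demo_state { DEMO_OFF, DEMO_INACTIVE, DEMO_ACTIVE };

struct demo_counter {
    enum demo_state state;
    enum demo_state prev_state; /* state seen just before the last enable call */
    long prev_count;            /* event-source value at the last sync point */
    long count;                 /* events attributed to this counter so far */
};

/* Fold events that happened since the last sync point into the counter. */
void demo_update(struct demo_counter *c, long source_now)
{
    assert(source_now >= c->prev_count);
    c->count += source_now - c->prev_count;
    c->prev_count = source_now;
}

/* Single entry point used for both high-level enable and schedule-in. */
void demo_enable(struct demo_counter *c, long source_now)
{
    c->prev_state = c->state;
    c->state = DEMO_ACTIVE;
    if (c->prev_state == DEMO_OFF)
        c->prev_count = source_now; /* high-level enable: take a new baseline */
    /* schedule-in: keep the old baseline so in-between events still count */
}

/* Schedule-out: sync, then mark the counter inactive. */
void demo_sched_out(struct demo_counter *c, long source_now)
{
    demo_update(c, source_now);
    c->state = DEMO_INACTIVE;
}

/* High-level disable: sync, then mark the counter off. */
void demo_disable(struct demo_counter *c, long source_now)
{
    demo_update(c, source_now);
    c->state = DEMO_OFF;
}
```

In this model a context switch that bumps the event source between `demo_sched_out()` and the next `demo_enable()` is picked up at the following sync, while events that occur during a high-level disable are dropped because the baseline is reset.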
      Reported-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 12 Feb, 2009 2 commits
    • syscall define: fix uml compile bug · 6c597963
      Heiko Carstens committed
      With the new system call defines we get this on uml:
      
      arch/um/sys-i386/built-in.o: In function `sys_call_table':
      (.rodata+0x308): undefined reference to `sys_sigprocmask'
      
      Reason for this is that uml passes the preprocessor option
      -Dsigprocmask=kernel_sigprocmask to gcc when compiling the kernel.
      This causes SYSCALL_DEFINE3(sigprocmask, ...) to be expanded to
      SYSCALL_DEFINEx(3, kernel_sigprocmask, ...) and finally to a system
      call named sys_kernel_sigprocmask.  However sys_sigprocmask is missing
      because of this.
      
      To avoid macro expansion for the system call name, just concatenate the
      name at the first define instead of carrying it through several levels.
      This was pointed out by Al Viro.
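
The preprocessor behaviour behind this fix can be reproduced in a few lines of standard C. The sketch below is not the kernel's SYSCALL_DEFINE machinery: the `DEMO_*` macros and `demo_*` functions are hypothetical, and the `#define` stands in for uml's -D option. It turns the resulting names into strings so the difference is visible: an argument that passes through a macro level as a plain argument is expanded, while an argument that is an operand of `##` is not.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for uml's -Dsigprocmask=kernel_sigprocmask option. */
#define sigprocmask kernel_sigprocmask

#define DEMO_STR(x) #x

/*
 * Late pasting: the name first passes through one macro level as a plain
 * argument, so it is macro-expanded before "sys_" is pasted onto it.
 */
#define DEMO_LATE_X(name) DEMO_STR(sys_##name)
#define DEMO_LATE(name)   DEMO_LATE_X(name)

/*
 * Early pasting: ## is applied in the first macro, and an operand of ##
 * is not macro-expanded, so the original name survives.
 */
#define DEMO_EARLY_X(fullname) DEMO_STR(fullname)
#define DEMO_EARLY(name)       DEMO_EARLY_X(sys_##name)

const char *demo_late_name(void)  { return DEMO_LATE(sigprocmask); }
const char *demo_early_name(void) { return DEMO_EARLY(sigprocmask); }

/* Remove the stand-in define so it cannot affect later code. */
#undef sigprocmask

void demo_check_names(void)
{
    assert(strcmp(demo_late_name(), "sys_kernel_sigprocmask") == 0);
    assert(strcmp(demo_early_name(), "sys_sigprocmask") == 0);
}
```

`demo_check_names()` confirms that only the early-pasting form yields the name the syscall table refers to.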
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: WANG Cong <wangcong@zeuux.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: fix lockdep subclasses overflow · cfebe563
      Li Zefan committed
      I enabled all cgroup subsystems when compiling the kernel, and then:
       # mount -t cgroup -o net_cls xxx /mnt
       # mkdir /mnt/0
      
      This showed up immediately:
       BUG: MAX_LOCKDEP_SUBCLASSES too low!
       turning off the locking correctness validator.
      
      It's caused by the cgroup hierarchy lock:
      	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
      		struct cgroup_subsys *ss = subsys[i];
      		if (ss->root == root)
      			mutex_lock_nested(&ss->hierarchy_mutex, i);
      	}
      
      Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but
      MAX_LOCKDEP_SUBCLASSES is 8.
      
      This patch uses different lockdep keys for different subsystems.
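
The arithmetic of the overflow, and why per-subsystem keys avoid it, can be modelled in user space. This is a toy model that does not use the kernel's lockdep API: the two constants take the values quoted above, and every `demo_*` name is hypothetical. With one shared lock class the subsystem index is used as the subclass, so index 8 falls outside the allowed range 0 to 7. With one key per subsystem each lock can use subclass 0.

```c
#include <assert.h>

#define MAX_LOCKDEP_SUBCLASSES 8 /* limit quoted in the commit message */
#define CGROUP_SUBSYS_COUNT    9 /* "Now we have 9 cgroup subsystems" */

/* Toy validator rule: a subclass must be below the limit. */
int demo_subclass_ok(unsigned int subclass)
{
    return subclass < MAX_LOCKDEP_SUBCLASSES;
}

/*
 * Old scheme: one shared lock class, subsystem index used as subclass.
 * Returns the first index that is rejected, or -1 if all are accepted.
 */
int demo_shared_class_first_bad(void)
{
    for (int i = 0; i < CGROUP_SUBSYS_COUNT; i++)
        if (!demo_subclass_ok((unsigned int)i))
            return i;
    return -1;
}

/* New scheme: one (hypothetical) key per subsystem, always subclass 0. */
struct demo_lock_key { char unused; };
struct demo_lock_key demo_subsys_key[CGROUP_SUBSYS_COUNT];

int demo_per_subsys_first_bad(void)
{
    for (int i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
        const struct demo_lock_key *key = &demo_subsys_key[i];
        (void)key; /* the key, not the subclass, tells the locks apart */
        if (!demo_subclass_ok(0))
            return i;
    }
    return -1;
}

void demo_check_subclasses(void)
{
    assert(demo_shared_class_first_bad() == 8); /* the net_cls index */
    assert(demo_per_subsys_first_bad() == -1);
}
```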
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 11 Feb, 2009 7 commits
    • x86, ptrace, mm: fix double-free on race · 9f339e70
      Markus Metzger committed
      ptrace_detach() races with __ptrace_unlink() if the traced task is
      reaped while detaching. This might cause a double-free of the BTS
      buffer.
      
      Change the ptrace_detach() path to only do the memory accounting in
      ptrace_bts_detach() and leave freeing the buffer to ptrace_bts_untrace(),
      which will be called from __ptrace_unlink().
      
      The fix follows a proposal from Oleg Nesterov.
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • timers: fix TIMER_ABSTIME for process wide cpu timers · 4da94d49
      Peter Zijlstra committed
      The POSIX timer interface allows for absolute time expiry values through the
      TIMER_ABSTIME flag, therefore we have to synchronize the timer to the clock
      every time we start it.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • timers: split process wide cpu clocks/timers, fix · 3fccfd67
      Peter Zijlstra committed
      To decrease the chance of a missed enable, always enable the timer when we
      sample it. We'll always disable it when we find that there are no active
      timers in the jiffy tick.
      
      This fixes a flood of warnings reported by Mike Galbraith.
      Reported-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_counters: allow users to count user, kernel and/or hypervisor events · 0475f9ea
      Paul Mackerras committed
      Impact: new perf_counter feature
      
      This extends the perf_counter_hw_event struct with bits that specify
      that events in user, kernel and/or hypervisor mode should not be
      counted (i.e. should be excluded), and adds code to program the PMU
      mode selection bits accordingly on x86 and powerpc.
      
      For software counters, we don't currently have the infrastructure to
      distinguish which mode an event occurs in, so we currently fail the
      counter initialization if the setting of the hw_event.exclude_* bits
      would require us to distinguish.  Context switches and CPU migrations
      are currently considered to occur in kernel mode.
      
      On x86, this changes the previous policy that only root can count
      kernel events.  Now non-root users can count kernel events or exclude
      them.  Non-root users still can't use NMI events, though.  On x86 we
      don't appear to have any way to control whether hypervisor events are
      counted or not, so hw_event.exclude_hv is ignored.
      
      On powerpc, the selection of whether to count events in user, kernel
      and/or hypervisor mode is PMU-wide, not per-counter, so this adds a
      check that the hw_event.exclude_* settings are the same as other events
      on the PMU.  Counters being added to a group have to have the same
      settings as the other hardware counters in the group.  Counters and
      groups can only be enabled in hw_perf_group_sched_in or power_perf_enable
      if they have the same settings as any other counters already on the
      PMU.  If we are not running on a hypervisor, the exclude_hv setting
      is ignored (by forcing it to 0) since we can't ever get any
      hypervisor events.
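
The PMU-wide consistency rule described for powerpc can be sketched as a small check. This is a user-space model under stated assumptions, not the kernel code: only `exclude_hv` is named in the text, so the `exclude_user` and `exclude_kernel` field names are inferred from the `exclude_*` pattern, and the `demo_*` struct and functions are hypothetical. A candidate event is accepted only if its exclusion settings match every event already on the PMU, with `exclude_hv` forced to 0 when no hypervisor is present.

```c
#include <assert.h>

/*
 * Mode-exclusion bits. exclude_hv is named in the text; the other two
 * field names are assumed from the "exclude_*" pattern.
 */
struct demo_excl {
    unsigned int exclude_user   : 1;
    unsigned int exclude_kernel : 1;
    unsigned int exclude_hv     : 1;
};

/* Without a hypervisor, exclude_hv is ignored by forcing it to 0. */
struct demo_excl demo_normalize(struct demo_excl e, int on_hypervisor)
{
    if (!on_hypervisor)
        e.exclude_hv = 0;
    return e;
}

/* Returns 1 if both events ask for the same mode selection. */
int demo_same_modes(struct demo_excl a, struct demo_excl b)
{
    return a.exclude_user == b.exclude_user &&
           a.exclude_kernel == b.exclude_kernel &&
           a.exclude_hv == b.exclude_hv;
}

/*
 * PMU-wide rule: a candidate is accepted only if its settings match every
 * event already on the PMU. Returns 1 if accepted, 0 otherwise.
 */
int demo_can_add(const struct demo_excl *on_pmu, int n,
                 struct demo_excl candidate, int on_hypervisor)
{
    assert(n >= 0);
    candidate = demo_normalize(candidate, on_hypervisor);
    for (int i = 0; i < n; i++)
        if (!demo_same_modes(demo_normalize(on_pmu[i], on_hypervisor),
                             candidate))
            return 0;
    return 1;
}
```

An empty PMU accepts any candidate, which matches the idea that the first event placed on the PMU determines the mode selection for the rest.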
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • pkt_sched: type should be __u32 in header · e672f7db
      Chuck Ebbert committed
      Using u32 in this header breaks the build of iptables.
      Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hugetlbfs: fix build failure with !CONFIG_HUGETLBFS · 1db8508c
      Stefan Richter committed
      Fix regression due to 5a6fe125,
      "Do not account for the address space used by hugetlbfs using VM_ACCOUNT"
      which added an argument to the function hugetlb_file_setup() but not to
      the macro hugetlb_file_setup().
      Reported-by: Chris Clayton <chris2553@googlemail.com>
      Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Do not account for the address space used by hugetlbfs using VM_ACCOUNT · 5a6fe125
      Mel Gorman committed
      When overcommit is disabled, the core VM accounts for pages used by anonymous
      shared, private mappings and special mappings. It keeps track of VMAs that
      should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
      with VM_NORESERVE.
      
      Overcommit for hugetlbfs is much riskier than overcommit for base pages
      due to contiguity requirements. It avoids overcommitting on both shared and
      private mappings using reservation counters that are checked and updated
      during mmap(). This ensures (within limits) that hugepages exist in the
      future when faults occur; otherwise it is too easy for applications to be
      SIGKILLed.
      
      As hugetlbfs makes its own reservations of a different unit to the base page
      size, VM_ACCOUNT should never be set. Even if the units were correct, we would
      double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may
      be set because an application can request no reserves be made for hugetlbfs
      at the risk of getting killed later.
      
      With commit fc8744ad, VM_NORESERVE and VM_ACCOUNT are getting
      unconditionally set for hugetlbfs-backed mappings. This breaks the
      accounting for both the core VM and hugetlbfs, and can trigger an OOM storm
      when hugepage pools are too small, or lockups and corrupted counters
      otherwise. This patch brings hugetlbfs more in line with how the core VM
      treats VM_NORESERVE but prevents VM_ACCOUNT being set.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 10 Feb, 2009 3 commits
  7. 09 Feb, 2009 4 commits
  8. 08 Feb, 2009 2 commits
  9. 07 Feb, 2009 1 commit
  10. 06 Feb, 2009 7 commits
  11. 05 Feb, 2009 4 commits
  12. 03 Feb, 2009 7 commits