1. 14 4月, 2012 1 次提交
    • W
      seccomp: add system call filtering using BPF · e2cfabdf
      Will Drewry 提交于
      [This patch depends on luto@mit.edu's no_new_privs patch:
         https://lkml.org/lkml/2012/1/30/264
       The whole series including Andrew's patches can be found here:
         https://github.com/redpig/linux/tree/seccomp
       Complete diff here:
         https://github.com/redpig/linux/compare/1dc65fed...seccomp
      ]
      
      This patch adds support for seccomp mode 2.  Mode 2 introduces the
      ability for unprivileged processes to install system call filtering
      policy expressed in terms of a Berkeley Packet Filter (BPF) program.
      This program will be evaluated in the kernel for each system call
      the task makes and computes a result based on data in the format
      of struct seccomp_data.
      
      A filter program may be installed by calling:
        struct sock_fprog fprog = { ... };
        ...
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
      
      The return value of the filter program determines if the system call is
      allowed to proceed or denied.  If the first filter program installed
      allows prctl(2) calls, then the above call may be made repeatedly
      by a task to further reduce its access to the kernel.  All attached
      programs must be evaluated before a system call will be allowed to
      proceed.
      
      Filter programs will be inherited across fork/clone and execve.
      However, if the task attaching the filter is unprivileged
      (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task.  This
      ensures that unprivileged tasks cannot attach filters that affect
      privileged tasks (e.g., setuid binary).
      
      There are a number of benefits to this approach. A few of which are
      as follows:
      - BPF has been exposed to userland for a long time
      - BPF optimization (and JIT'ing) are well understood
      - Userland already knows its ABI: system call numbers and desired
        arguments
      - No time-of-check-time-of-use vulnerable data accesses are possible.
      - system call arguments are loaded on access only to minimize copying
        required for system call policy decisions.
      
      Mode 2 support is restricted to architectures that enable
      HAVE_ARCH_SECCOMP_FILTER.  In this patch, the primary dependency is on
      syscall_get_arguments().  The full desired scope of this feature will
      add a few minor additional requirements expressed later in this series.
      Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
      the desired additional functionality.
      
      No architectures are enabled in this patch.
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Reviewed-by: NIndan Zupancic <indan@nul.nu>
      Acked-by: NEric Paris <eparis@redhat.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      
      v18: - rebase to v3.4-rc2
           - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
           - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
           - add a comment for get_u32 regarding endianness (akpm@)
           - fix other typos, style mistakes (akpm@)
           - added acked-by
      v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
           - tighten return mask to 0x7fff0000
      v16: - no change
      v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
             size (indan@nul.nu)
           - drop the max insns to 256KB (indan@nul.nu)
           - return ENOMEM if the max insns limit has been hit (indan@nul.nu)
           - move IP checks after args (indan@nul.nu)
           - drop !user_filter check (indan@nul.nu)
           - only allow explicit bpf codes (indan@nul.nu)
           - exit_code -> exit_sig
      v14: - put/get_seccomp_filter takes struct task_struct
             (indan@nul.nu,keescook@chromium.org)
           - adds seccomp_chk_filter and drops general bpf_run/chk_filter user
           - add seccomp_bpf_load for use by net/core/filter.c
           - lower max per-process/per-hierarchy: 1MB
           - moved nnp/capability check prior to allocation
             (all of the above: indan@nul.nu)
      v13: - rebase on to 88ebdda6
      v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
           - removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
           - reworded the prctl_set_seccomp comment (indan@nul.nu)
      v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
           - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
           - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
           - pare down Kconfig doc reference.
           - extra comment clean up
      v10: - seccomp_data has changed again to be more aesthetically pleasing
             (hpa@zytor.com)
           - calling convention is noted in a new u32 field using syscall_get_arch.
             This allows for cross-calling convention tasks to use seccomp filters.
             (hpa@zytor.com)
           - lots of clean up (thanks, Indan!)
       v9: - n/a
       v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
           - Lots of fixes courtesy of indan@nul.nu:
           -- fix up load behavior, compat fixups, and merge alloc code,
           -- renamed pc and dropped __packed, use bool compat.
           -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
              dependencies
       v7:  (massive overhaul thanks to Indan, others)
           - added CONFIG_HAVE_ARCH_SECCOMP_FILTER
           - merged into seccomp.c
           - minimal seccomp_filter.h
           - no config option (part of seccomp)
           - no new prctl
           - doesn't break seccomp on systems without asm/syscall.h
             (works but arg access always fails)
           - dropped seccomp_init_task, extra free functions, ...
           - dropped the no-asm/syscall.h code paths
           - merges with network sk_run_filter and sk_chk_filter
       v6: - fix memory leak on attach compat check failure
           - require no_new_privs || CAP_SYS_ADMIN prior to filter
             installation. (luto@mit.edu)
           - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
           - cleaned up Kconfig (amwang@redhat.com)
           - on block, note if the call was compat (so the # means something)
       v5: - uses syscall_get_arguments
             (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
            - uses union-based arg storage with hi/lo struct to
              handle endianness.  Compromises between the two alternate
              proposals to minimize extra arg shuffling and account for
              endianness assuming userspace uses offsetof().
              (mcgrathr@chromium.org, indan@nul.nu)
            - update Kconfig description
            - add include/seccomp_filter.h and add its installation
            - (naive) on-demand syscall argument loading
            - drop seccomp_t (eparis@redhat.com)
       v4:  - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
            - now uses current->no_new_privs
              (luto@mit.edu,torvalds@linux-foundation.com)
            - assign names to seccomp modes (rdunlap@xenotime.net)
            - fix style issues (rdunlap@xenotime.net)
            - reworded Kconfig entry (rdunlap@xenotime.net)
       v3:  - macros to inline (oleg@redhat.com)
            - init_task behavior fixed (oleg@redhat.com)
            - drop creator entry and extra NULL check (oleg@redhat.com)
            - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
            - adds tentative use of "always_unprivileged" as per
              torvalds@linux-foundation.org and luto@mit.edu
       v2:  - (patch 2 only)
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      e2cfabdf
  2. 24 3月, 2012 1 次提交
  3. 16 3月, 2012 1 次提交
    • C
      [PATCH v3] ipc: provide generic compat versions of IPC syscalls · 48b25c43
      Chris Metcalf 提交于
      When using the "compat" APIs, architectures will generally want to
      be able to make direct syscalls to msgsnd(), shmctl(), etc., and
      in the kernel we would want them to be handled directly by
      compat_sys_xxx() functions, as is true for other compat syscalls.
      
      However, for historical reasons, several of the existing compat IPC
      syscalls do not do this.  semctl() expects a pointer to the fourth
      argument, instead of the fourth argument itself.  msgsnd(), msgrcv()
      and shmat() expect arguments in different order.
      
      This change adds an ARCH_WANT_OLD_COMPAT_IPC config option that can be
      set to preserve this behavior for ports that use it (x86, sparc, powerpc,
      s390, and mips).  No actual semantics are changed for those architectures,
      and there is only a minimal amount of code refactoring in ipc/compat.c.
      
      Newer architectures like tile (and perhaps future architectures such
      as arm64 and unicore64) should not select this option, and thus can
      avoid having any IPC-specific code at all in their architecture-specific
      compat layer.  In the same vein, if this option is not selected, IPC_64
      mode is assumed, since that's what the <asm-generic> headers expect.
      
      The workaround code in "tile" for msgsnd() and msgrcv() is removed
      with this change; it also fixes the bug that shmat() and semctl() were
      not being properly handled.
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      48b25c43
  4. 24 2月, 2012 1 次提交
    • I
      static keys: Introduce 'struct static_key', static_key_true()/false() and... · c5905afb
      Ingo Molnar 提交于
      static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
      
      So here's a boot tested patch on top of Jason's series that does
      all the cleanups I talked about and turns jump labels into a
      more intuitive to use facility. It should also address the
      various misconceptions and confusions that surround jump labels.
      
      Typical usage scenarios:
      
              #include <linux/static_key.h>
      
              struct static_key key = STATIC_KEY_INIT_TRUE;
      
              if (static_key_false(&key))
                      do unlikely code
              else
                      do likely code
      
      Or:
      
              if (static_key_true(&key))
                      do likely code
              else
                      do unlikely code
      
      The static key is modified via:
      
              static_key_slow_inc(&key);
              ...
              static_key_slow_dec(&key);
      
      The 'slow' prefix makes it abundantly clear that this is an
      expensive operation.
      
      I've updated all in-kernel code to use this everywhere. Note
      that I (intentionally) have not pushed through the rename
      blindly through to the lowest levels: the actual jump-label
      patching arch facility should be named like that, so we want to
      decouple jump labels from the static-key facility a bit.
      
      On non-jump-label enabled architectures static keys default to
      likely()/unlikely() branches.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NJason Baron <jbaron@redhat.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: a.p.zijlstra@chello.nl
      Cc: mathieu.desnoyers@efficios.com
      Cc: davem@davemloft.net
      Cc: ddaney.cavm@gmail.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.huSigned-off-by: NIngo Molnar <mingo@elte.hu>
      c5905afb
  5. 13 1月, 2012 3 次提交
  6. 04 11月, 2011 1 次提交
    • R
      oprofile, x86: Reimplement nmi timer mode using perf event · dcfce4a0
      Robert Richter 提交于
      The legacy x86 nmi watchdog code was removed with the implementation
      of the perf based nmi watchdog. This broke Oprofile's nmi timer
      mode. To run nmi timer mode we relied on a continuous ticking nmi
      source which the nmi watchdog provided. The nmi tick was no longer
      available and current watchdog can not be used anymore since it runs
      with very long periods in the range of seconds. This patch
      reimplements the nmi timer mode using a perf counter nmi source.
      
      V2:
      * removing pr_info()
      * fix undefined reference to `__udivdi3' for 32 bit build
      * fix section mismatch of .cpuinit.data:nmi_timer_cpu_nb
      * removed nmi timer setup in arch/x86
      * implemented function stubs for op_nmi_init/exit()
      * made code more readable in oprofile_init()
      
      V3:
      * fix architectural initialization in oprofile_init()
      * fix CONFIG_OPROFILE_NMI_TIMER dependencies
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      dcfce4a0
  7. 03 8月, 2011 1 次提交
  8. 25 5月, 2011 1 次提交
  9. 10 4月, 2011 1 次提交
  10. 16 3月, 2011 1 次提交
  11. 15 2月, 2011 1 次提交
  12. 05 1月, 2011 1 次提交
    • G
      [S390] mutex: Introduce arch_mutex_cpu_relax() · 34b133f8
      Gerald Schaefer 提交于
      The spinning mutex implementation uses cpu_relax() in busy loops as a
      compiler barrier. Depending on the architecture, cpu_relax() may do more
      than needed in this specific mutex spin loops. On System z we also give
      up the time slice of the virtual cpu in cpu_relax(), which prevents
      effective spinning on the mutex.
      
      This patch replaces cpu_relax() in the spinning mutex code with
      arch_mutex_cpu_relax(), which can be defined by each architecture that
      selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
      this patch should not affect other architectures than System z for now.
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1290437256.7455.4.camel@thinkpad>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      34b133f8
  13. 26 11月, 2010 1 次提交
    • G
      mutexes, sched: Introduce arch_mutex_cpu_relax() · 335d7afb
      Gerald Schaefer 提交于
      The spinning mutex implementation uses cpu_relax() in busy loops as a
      compiler barrier. Depending on the architecture, cpu_relax() may do more
      than needed in this specific mutex spin loops. On System z we also give
      up the time slice of the virtual cpu in cpu_relax(), which prevents
      effective spinning on the mutex.
      
      This patch replaces cpu_relax() in the spinning mutex code with
      arch_mutex_cpu_relax(), which can be defined by each architecture that
      selects HAVE_ARCH_MUTEX_CPU_RELAX. The default is still cpu_relax(), so
      this patch should not affect other architectures than System z for now.
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1290437256.7455.4.camel@thinkpad>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      335d7afb
  14. 30 10月, 2010 1 次提交
    • S
      jump label: Add work around to i386 gcc asm goto bug · 45f81b1c
      Steven Rostedt 提交于
      On i386 (not x86_64) early implementations of gcc would have a bug
      with asm goto causing it to produce code like the following:
      
      (This was noticed by Peter Zijlstra)
      
         56 pushl 0
         67 nopl         jmp 0x6f
            popl
            jmp 0x8c
      
         6f              mov
                         test
                         je 0x8c
      
         8c mov
            call *(%esp)
      
      The jump added in the asm goto skipped over the popl that matched
      the pushl 0, which lead up to a quick crash of the system when
      the jump was enabled. The nopl is defined in the asm goto () statement
      and when tracepoints are enabled, the nop changes to a jump to the label
      that was specified by the asm goto. asm goto is suppose to tell gcc that
      the code in the asm might jump to an external label. Here gcc obviously
      fails to make that work.
      
      The bug report for gcc is here:
      
        http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226
      
      The bug only appears on x86 when not compiled with
      -maccumulate-outgoing-args. This option is always set on x86_64 and it
      is also the work around for a function graph tracer i386 bug.
      (See commit: 746357d6)
      This explains why the bug only showed up on i386 when function graph
      tracer was not enabled.
      
      This patch now adds a CONFIG_JUMP_LABEL option that is default
      off instead of using jump labels by default. When jump labels are
      enabled, the -maccumulate-outgoing-args will be used (causing a
      slightly larger kernel image on i386). This option will exist
      until we have a way to detect if the gcc compiler in use is safe
      to use on all configurations without the work around.
      
      Note, there exists such a test, but for now we will keep the enabling
      of jump label as a manual option.
      
      Archs that know the compiler is safe with asm goto, may choose to
      select JUMP_LABEL and enable it by default.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Cause-discovered-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Richard Henderson <rth@redhat.com>
      LKML-Reference: <1288028746.3673.11.camel@laptop>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      45f81b1c
  15. 23 9月, 2010 1 次提交
    • J
      jump label: Base patch for jump label · bf5438fc
      Jason Baron 提交于
      base patch to implement 'jump labeling'. Based on a new 'asm goto' inline
      assembly gcc mechanism, we can now branch to labels from an 'asm goto'
      statment. This allows us to create a 'no-op' fastpath, which can subsequently
      be patched with a jump to the slowpath code. This is useful for code which
      might be rarely used, but which we'd like to be able to call, if needed.
      Tracepoints are the current usecase that these are being implemented for.
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NJason Baron <jbaron@redhat.com>
      LKML-Reference: <ee8b3595967989fdaf84e698dc7447d315ce972a.1284733808.git.jbaron@redhat.com>
      
      [ cleaned up some formating ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      bf5438fc
  16. 14 9月, 2010 1 次提交
    • M
      kprobes: Fix Kconfig dependency · 05ed160e
      Masami Hiramatsu 提交于
      Fix Kconfig dependency among Kprobes, optprobe and kallsyms.
      
      Kprobes uses kallsyms_lookup for finding target function and
      checking instruction boundary, thus CONFIG_KPROBES should select
      CONFIG_KALLSYMS.
      
      Optprobe is an optional feature which is supported on x86 arch,
      and it also uses kallsyms_lookup for checking instructions in
      the target function. Since KALLSYMS_ALL just adds symbols of
      kernel variables, it doesn't need to select KALLSYMS_ALL.
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Acked-by: Randy Dunlap <randy.dunlap@oracle.com>,
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Felipe Contreras <felipe.contreras@gmail.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: akpm <akpm@linux-foundation.org>
      LKML-Reference: <20100913102541.20260.85700.stgit@ltc236.sdl.hitachi.co.jp>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      05ed160e
  17. 16 5月, 2010 2 次提交
    • F
      lockup_detector: Introduce CONFIG_HARDLOCKUP_DETECTOR · 23637d47
      Frederic Weisbecker 提交于
      This new config is deemed to simplify even more the lockup detector
      dependencies and can make it easier to bring a smooth sorting
      between archs that support the new generic lockup detector and those
      that still have their own, especially for those that are in the
      middle of this migration.
      
      Instead of checking whether we have CONFIG_LOCKUP_DETECTOR +
      CONFIG_PERF_EVENTS_NMI each time an arch wants to know if it needs
      to build its own lockup detector, take a shortcut with this new
      config. It is enabled only if the hardlockup detection part of
      the whole lockup detector is on.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      23637d47
    • F
      lockup_detector: Adapt CONFIG_PERF_EVENT_NMI to other archs · c01d4323
      Frederic Weisbecker 提交于
      CONFIG_PERF_EVENT_NMI is something that need to be enabled from the
      arch. This is fine on x86 as PERF_EVENTS is builtin but if other
      archs select it, they will need to handle the PERF_EVENTS dependency.
      
      Instead, handle the dependency in the generic layer:
      
      - archs need to tell what they support through HAVE_PERF_EVENTS_NMI
      - Enable magically PERF_EVENTS_NMI if we have PERF_EVENTS and
        HAVE_PERF_EVENTS_NMI.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      c01d4323
  18. 01 5月, 2010 1 次提交
    • F
      hw-breakpoints: Separate constraint space for data and instruction breakpoints · 0102752e
      Frederic Weisbecker 提交于
      There are two outstanding fashions for archs to implement hardware
      breakpoints.
      
      The first is to separate breakpoint address pattern definition
      space between data and instruction breakpoints. We then have
      typically distinct instruction address breakpoint registers
      and data address breakpoint registers, delivered with
      separate control registers for data and instruction breakpoints
      as well. This is the case of PowerPc and ARM for example.
      
      The second consists in having merged breakpoint address space
      definition between data and instruction breakpoint. Address
      registers can host either instruction or data address and
      the access mode for the breakpoint is defined in a control
      register. This is the case of x86 and Super H.
      
      This patch adds a new CONFIG_HAVE_MIXED_BREAKPOINTS_REGS config
      that archs can select if they belong to the second case. Those
      will have their slot allocation merged for instructions and
      data breakpoints.
      
      The others will have a separate slot tracking between data and
      instruction breakpoints.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NPaul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Cc: K. Prasad <prasad@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      0102752e
  19. 16 3月, 2010 1 次提交
    • M
      kprobes: Hide CONFIG_OPTPROBES and set if arch supports optimized kprobes · 5cc718b9
      Masami Hiramatsu 提交于
      Hide CONFIG_OPTPROBES and set if the arch supports optimized
      kprobes (IOW, HAVE_OPTPROBES=y), since this option doesn't
      change the major behavior of kprobes, and workarounds for minor
      changes are documented.
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Dieter Ries <mail@dieterries.net>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20100315170054.31593.3153.stgit@localhost6.localdomain6>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5cc718b9
  20. 26 2月, 2010 4 次提交
    • R
      oprofile/x86: remove OPROFILE_IBS config option · 013cfc50
      Robert Richter 提交于
      OProfile support for IBS is now for several versions in the
      kernel. The feature is stable now and the code can be activated
      permanently.
      
      As a side effect IBS now works also on nosmp configs.
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      013cfc50
    • R
      oprofile: remove EXPERIMENTAL from the config option description · b309a294
      Robert Richter 提交于
      OProfile is already used for a long time and no longer experimental.
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      b309a294
    • R
      oprofile: remove tracing build dependency · 18b4a4d5
      Robert Richter 提交于
      The commit
      
       1155de47 ring-buffer: Make it generally available
      
      already made ring-buffer available without the TRACING option
      enabled. This patch removes the TRACING dependency from oprofile.
      
      Fixes also oprofile configuration on ia64.
      
      The patch also applies to the 2.6.32-stable kernel.
      Reported-by: NTony Jones <tonyj@suse.de>
      Cc: stable@kernel.org
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      18b4a4d5
    • M
      kprobes: Introduce kprobes jump optimization · afd66255
      Masami Hiramatsu 提交于
      Introduce kprobes jump optimization arch-independent parts.
      Kprobes uses breakpoint instruction for interrupting execution
      flow, on some architectures, it can be replaced by a jump
      instruction and interruption emulation code. This gains kprobs'
      performance drastically.
      
      To enable this feature, set CONFIG_OPTPROBES=y (default y if the
      arch supports OPTPROBE).
      
      Changes in v9:
       - Fix a bug to optimize probe when enabling.
       - Check nearby probes can be optimize/unoptimize when disarming/arming
         kprobes, instead of registering/unregistering. This will help
         kprobe-tracer because most of probes on it are usually disabled.
      
      Changes in v6:
       - Cleanup coding style for readability.
       - Add comments around get/put_online_cpus().
      
      Changes in v5:
       - Use get_online_cpus()/put_online_cpus() for avoiding text_mutex
         deadlock.
      Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: systemtap <systemtap@sources.redhat.com>
      Cc: DLE <dle-develop@lists.sourceforge.net>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Anders Kaseorg <andersk@ksplice.com>
      Cc: Tim Abbott <tabbott@ksplice.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      LKML-Reference: <20100225133407.6725.81992.stgit@localhost6.localdomain6>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      afd66255
  21. 23 2月, 2010 1 次提交
  22. 17 2月, 2010 1 次提交
  23. 18 12月, 2009 1 次提交
    • F
      hw-breakpoints: Fix hardware breakpoints -> perf events dependency · 99e8c5a3
      Frederic Weisbecker 提交于
      The kbuild's select command doesn't propagate through the config
      dependencies.
      
      Hence the current rules of hardware breakpoint's config can't
      ensure perf can never be disabled under us.
      
      We have:
      
      config X86
      	selects HAVE_HW_BREAKPOINTS
      
      config HAVE_HW_BREAKPOINTS
      	select PERF_EVENTS
      
      config PERF_EVENTS
      	[...]
      
      x86 will select the breakpoints but that won't propagate to perf
      events. The user can still disable the latter, but it is
      necessary for the breakpoints.
      
      What we need is:
      
       - x86 selects HAVE_HW_BREAKPOINTS and PERF_EVENTS
       - HAVE_HW_BREAKPOINTS depends on PERF_EVENTS
      
      so that we ensure PERF_EVENTS is enabled and frozen for x86.
      
      This fixes the following kind of build errors:
      
       In file included from arch/x86/kernel/hw_breakpoint.c:31:
       include/linux/hw_breakpoint.h: In function 'hw_breakpoint_addr':
       include/linux/hw_breakpoint.h:39: error: 'struct perf_event' has no member named 'attr'
      
      v2: Select also ANON_INODES from x86, required for perf
      Reported-by: NCyrill Gorcunov <gorcunov@gmail.com>
      Reported-by: NMichal Marek <mmarek@suse.cz>
      Reported-by: NAndrew Randrianasulu <randrik_a@yahoo.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      LKML-Reference: <1261010034-7786-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      99e8c5a3
  24. 08 11月, 2009 1 次提交
    • F
      hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Frederic Weisbecker 提交于
      This patch rebase the implementation of the breakpoints API on top of
      perf events instances.
      
      Each breakpoints are now perf events that handle the
      register scheduling, thread/cpu attachment, etc..
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons of this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Ptrace breakpoints setting remains tricky and still needs some per
        thread breakpoints references.
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event " rename
      - The ptrace regression have been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD doesn't anymore cover every cases of running
        breakpoints and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      24f1e32c
  25. 02 10月, 2009 1 次提交
    • A
      core, x86: Add user return notifiers · 7c68af6e
      Avi Kivity 提交于
      Add a general per-cpu notifier that is called whenever the kernel is
      about to return to userspace.  The notifier uses a thread_info flag
      and existing checks, so there is no impact on user return or context
      switch fast paths.
      
      This will be used initially to speed up KVM task switching by lazily
      updating MSRs.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      7c68af6e
  26. 18 9月, 2009 1 次提交
  27. 20 7月, 2009 1 次提交
    • J
      oprofile: Implement performance counter multiplexing · 4d4036e0
      Jason Yeh 提交于
      The number of hardware counters is limited. The multiplexing feature
      enables OProfile to gather more events than counters are provided by
      the hardware. This is realized by switching between events at an user
      specified time interval.
      
      A new file (/dev/oprofile/time_slice) is added for the user to specify
      the timer interval in ms. If the number of events to profile is higher
      than the number of hardware counters available, the patch will
      schedule a work queue that switches the event counter and re-writes
      the different sets of values into it. The switching mechanism needs to
      be implemented for each architecture to support multiplexing. This
      patch only implements AMD CPU support, but multiplexing can be easily
      extended for other models and architectures.
      
      There are follow-on patches that rework parts of this patch.
      Signed-off-by: NJason Yeh <jason.yeh@amd.com>
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      4d4036e0
  28. 19 6月, 2009 1 次提交
    • P
      gcov: add gcov profiling infrastructure · 2521f2c2
      Peter Oberparleiter 提交于
      Enable the use of GCC's coverage testing tool gcov [1] with the Linux
      kernel.  gcov may be useful for:
      
       * debugging (has this code been reached at all?)
       * test improvement (how do I change my test to cover these lines?)
       * minimizing kernel configurations (do I need this option if the
         associated code is never run?)
      
      The profiling patch incorporates the following changes:
      
       * change kbuild to include profiling flags
       * provide functions needed by profiling code
       * present profiling data as files in debugfs
      
      Note that on some architectures, enabling gcc's profiling option
      "-fprofile-arcs" for the entire kernel may trigger compile/link/
      run-time problems, some of which are caused by toolchain bugs and
      others which require adjustment of architecture code.
      
      For this reason profiling the entire kernel is initially restricted
      to those architectures for which it is known to work without changes.
      This restriction can be lifted once an architecture has been tested
      and found compatible with gcc's profiling. Profiling of single files
      or directories is still available on all platforms (see config help
      text).
      
      [1] http://gcc.gnu.org/onlinedocs/gcc/Gcov.htmlSigned-off-by: NPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2521f2c2
  29. 03 6月, 2009 1 次提交
  30. 10 4月, 2009 1 次提交
    • H
      mutex: have non-spinning mutexes on s390 by default · 36cd3c9f
      Heiko Carstens 提交于
      Impact: performance regression fix for s390
      
      The adaptive spinning mutexes will not always do what one would expect on
      virtualized architectures like s390. Especially the cpu_relax() loop in
      mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
      away by the hypervisor.
      
      We would end up in a cpu_relax() loop when there is no chance that the
      state of the mutex changes until the target cpu has been scheduled again by
      the hypervisor.
      
      For that reason we should change the default behaviour to no-spin on s390.
      
      We do have an instruction which allows to yield the current cpu in favour of
      a different target cpu. Also we have an instruction which allows us to figure
      out if the target cpu is physically backed.
      
      However we need to do some performance tests until we can come up with
      a solution that will do the right thing on s390.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      LKML-Reference: <20090409184834.7a0df7b2@osiris.boeblingen.de.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      36cd3c9f
  31. 06 3月, 2009 1 次提交
  32. 05 3月, 2009 1 次提交
  33. 14 1月, 2009 1 次提交
  34. 12 12月, 2008 1 次提交
    • I
      oprofile: select RING_BUFFER · d69d59f4
      Ingo Molnar 提交于
      Impact: build fix
      
      OProfile now depends on the ring buffer infrastructure:
      
       arch/x86/oprofile/built-in.o: In function `oprofile_add_ibs_sample':
       : undefined reference to `ring_buffer_unlock_commit'
      
      Select TRACING and RING_BUFFER when oprofile is enabled.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d69d59f4