1. 14 9月, 2012 1 次提交
  2. 23 8月, 2012 2 次提交
  3. 10 8月, 2012 5 次提交
    • F
      perf: Add attribute to filter out callchains · d0775264
      Frederic Weisbecker 提交于
      Introducing following bits to the the perf_event_attr struct:
      
        - exclude_callchain_kernel to filter out kernel callchain
          from the sample dump
      
        - exclude_callchain_user to filter out user callchain
          from the sample dump
      
      We need to be able to disable standard user callchain dump when we use
      the dwarf cfi callchain mode, because frame pointer based user
      callchains are useless in this mode.
      
      Implementing also exclude_callchain_kernel to have complete set of
      options.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      [ Added kernel callchains filtering ]
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-7-git-send-email-jolsa@redhat.comSigned-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d0775264
    • J
      perf: Add ability to attach user stack dump to sample · c5ebcedb
      Jiri Olsa 提交于
      Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger the dump
      of the user level stack on sample. The size of the dump is specified by
      sample_stack_user value.
      
      Being able to dump parts of the user stack, starting from the stack
      pointer, will be useful to make a post mortem dwarf CFI based stack
      unwinding.
      
      Added HAVE_PERF_USER_STACK_DUMP config option to determine if the
      architecture provides user stack dump on perf event samples.  This needs
      access to the user stack pointer which is not unified across
      architectures. Enabling this for x86 architecture.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c5ebcedb
    • J
      perf: Add perf_output_skip function to skip bytes in sample · 5685e0ff
      Jiri Olsa 提交于
      Introducing perf_output_skip function to be able to skip data within the
      perf ring buffer.
      
      When writing data into perf ring buffer we first reserve needed place in
      ring buffer and then copy the actual data.
      
      There's a possibility we won't be able to fill all the reserved size
      with data, so we need a way to skip the remaining bytes.
      
      This is going to be useful when storing the user stack dump, where we
      might end up with less data than we originally requested.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-5-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5685e0ff
    • F
      perf: Factor __output_copy to be usable with specific copy function · 91d7753a
      Frederic Weisbecker 提交于
      Adding a generic way to use __output_copy function with specific copy
      function via DEFINE_PERF_OUTPUT_COPY macro.
      
      Using this to add new __output_copy_user function, that provides output
      copy from user pointers. For x86 the copy_from_user_nmi function is used
      and __copy_from_user_inatomic for the rest of the architectures.
      
      This new function will be used in user stack dump on sample, coming in
      next patches.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.comSigned-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      91d7753a
    • J
      perf: Add ability to attach user level registers dump to sample · 4018994f
      Jiri Olsa 提交于
      Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger the dump of
      user level registers on sample. Registers we want to dump are specified
      by sample_regs_user bitmask.
      
      Only user level registers are dumped at the moment. Meaning the register
      values of the user space context as it was before the user entered the
      kernel for whatever reason (syscall, irq, exception, or a PMI happening
      in userspace).
      
      The layout of the sample_regs_user bitmap is described in
      asm/perf_regs.h for archs that support register dump.
      
      This is going to be useful to bring Dwarf CFI based stack unwinding on
      top of samples.
      Original-patch-by: NFrederic Weisbecker <fweisbec@gmail.com>
      [ Dump registers ABI specification. ]
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Suggested-by: NStephane Eranian <eranian@google.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4018994f
  4. 07 8月, 2012 3 次提交
  5. 01 8月, 2012 5 次提交
    • M
      mm: allow PF_MEMALLOC from softirq context · 907aed48
      Mel Gorman 提交于
      This is needed to allow network softirq packet processing to make use of
      PF_MEMALLOC.
      
      Currently softirq context cannot use PF_MEMALLOC due to it not being
      associated with a task, and therefore not having task flags to fiddle with
      - thus the gfp to alloc flag mapping ignores the task flags when in
      interrupts (hard or soft) context.
      
      Allowing softirqs to make use of PF_MEMALLOC therefore requires some
      trickery.  This patch borrows the task flags from whatever process happens
      to be preempted by the softirq.  It then modifies the gfp to alloc flags
      mapping to not exclude task flags in softirq context, and modify the
      softirq code to save, clear and restore the PF_MEMALLOC flag.
      
      The save and clear, ensures the preempted task's PF_MEMALLOC flag doesn't
      leak into the softirq.  The restore ensures a softirq's PF_MEMALLOC flag
      cannot leak back into the preempted process.  This should be safe due to
      the following reasons
      
      Softirqs can run on multiple CPUs sure but the same task should not be
      	executing the same softirq code. Neither should the softirq
      	handler be preempted by any other softirq handler so the flags
      	should not leak to an unrelated softirq.
      
      Softirqs re-enable hardware interrupts in __do_softirq() so can be
      	preempted by hardware interrupts so PF_MEMALLOC is inherited
      	by the hard IRQ. However, this is similar to a process in
      	reclaim being preempted by a hardirq. While PF_MEMALLOC is
      	set, gfp_to_alloc_flags() distinguishes between hard and
      	soft irqs and avoids giving a hardirq the ALLOC_NO_WATERMARKS
      	flag.
      
      If the softirq is deferred to ksoftirq then its flags may be used
              instead of a normal tasks but as the softirq cannot be preempted,
              the PF_MEMALLOC flag does not leak to other code by accident.
      
      [davem@davemloft.net: Document why PF_MEMALLOC is safe]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: David Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      907aed48
    • J
      mm/hotplug: correctly setup fallback zonelists when creating new pgdat · 9adb62a5
      Jiang Liu 提交于
      When hotadd_new_pgdat() is called to create new pgdat for a new node, a
      fallback zonelist should be created for the new node.  There's code to try
      to achieve that in hotadd_new_pgdat() as below:
      
      	/*
      	 * The node we allocated has no zone fallback lists. For avoiding
      	 * to access not-initialized zonelist, build here.
      	 */
      	mutex_lock(&zonelists_mutex);
      	build_all_zonelists(pgdat, NULL);
      	mutex_unlock(&zonelists_mutex);
      
      But it doesn't work as expected.  When hotadd_new_pgdat() is called, the
      new node is still in offline state because node_set_online(nid) hasn't
      been called yet.  And build_all_zonelists() only builds zonelists for
      online nodes as:
      
              for_each_online_node(nid) {
                      pg_data_t *pgdat = NODE_DATA(nid);
      
                      build_zonelists(pgdat);
                      build_zonelist_cache(pgdat);
              }
      
      Though we hope to create zonelist for the new pgdat, but it doesn't.  So
      add a new parameter "pgdat" the build_all_zonelists() to build pgdat for
      the new pgdat too.
      Signed-off-by: NJiang Liu <liuj97@gmail.com>
      Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Keping Chen <chenkeping@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9adb62a5
    • A
      memcg: rename config variables · c255a458
      Andrew Morton 提交于
      Sanity:
      
      CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
      CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
      CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
      CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM
      
      [mhocko@suse.cz: fix missed bits]
      Cc: Glauber Costa <glommer@parallels.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c255a458
    • W
      mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads · 3965c9ae
      Wanpeng Li 提交于
      Since per-BDI flusher threads were introduced in 2.6, the pdflush
      mechanism is not used any more.  But the old interface exported through
      /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.
      
      For back-compatibility, printk warning information and return 2 to notify
      the users that the interface is removed.
      Signed-off-by: NWanpeng Li <liwp@linux.vnet.ibm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3965c9ae
    • H
      mm: account the total_vm in the vm_stat_account() · 44de9d0c
      Huang Shijie 提交于
      vm_stat_account() accounts the shared_vm, stack_vm and reserved_vm now.
      But we can also account for total_vm in the vm_stat_account() which makes
      the code tidy.
      
      Even for mprotect_fixup(), we can get the right result in the end.
      Signed-off-by: NHuang Shijie <shijie8@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44de9d0c
  6. 31 7月, 2012 24 次提交
    • J
      time: Remove all direct references to timekeeper · 4e250fdd
      John Stultz 提交于
      Ingo noted that the numerous timekeeper.value references made
      the timekeeping code ugly and caused many long lines that
      had to be broken up. He recommended replacing timekeeper.value
      references with tk->value.
      
      This patch provides a local tk value for all top level time
      functions and sets it to &timekeeper. Then all timekeeper
      access is done via a tk pointer.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1343414893-45779-6-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4e250fdd
    • J
      time: Clean up offs_real/wall_to_mono and offs_boot/total_sleep_time updates · 6d0ef903
      John Stultz 提交于
      For performance reasons, we maintain ktime_t based duplicates of
      wall_to_monotonic (offs_real) and total_sleep_time (offs_boot).
      
      Since large problems could occur (such as the resume regression
      on 3.5-rc7, or the leapsecond hrtimer issue) if these value
      pairs were to be inconsistently updated, this patch this cleans
      up how we modify these value pairs to ensure we are always
      consistent.
      
      As a side-effect this is also more efficient as we only
      caulculate the duplicate values when they are changed,
      rather then every update_wall_time call.
      
      This also provides WARN_ONs to detect if future changes break
      the invariants.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1343414893-45779-5-git-send-email-john.stultz@linaro.org
      [ Cleaned up minor style issues. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6d0ef903
    • J
      time: Clean up stray newlines · d4e3ab38
      John Stultz 提交于
      Ingo noted inconsistent newline usage between functions.
      This patch cleans those up.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1343414893-45779-4-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4e3ab38
    • J
      time/jiffies: Rename ACTHZ to SHIFTED_HZ · 02ab20ae
      John Stultz 提交于
      Ingo noted that ACTHZ is a confusing name, and requested it
      be renamed, so this patch renames ACTHZ to SHIFTED_HZ to
      better describe it.
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1343414893-45779-3-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      02ab20ae
    • A
      perf/trace: Add ability to set a target task for events · e6dab5ff
      Andrew Vagin 提交于
      A few events are interesting not only for a current task.
      For example, sched_stat_* events are interesting for a task
      which wakes up. For this reason, it will be good if such
      events will be delivered to a target task too.
      
      Now a target task can be set by using __perf_task().
      
      The original idea and a draft patch belongs to Peter Zijlstra.
      
      I need these events for profiling sleep times. sched_switch is used for
      getting callchains and sched_stat_* is used for getting time periods.
      These events are combined in user space, then it can be analyzed by
      perf tools.
      Inspired-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arun Sharma <asharma@fb.com>
      Signed-off-by: NAndrew Vagin <avagin@openvz.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e6dab5ff
    • M
      sched/cleanups: Add load balance cpumask pointer to 'struct lb_env' · b9403130
      Michael Wang 提交于
      With this patch struct ld_env will have a pointer of the load balancing
      cpumask and we don't need to pass a cpumask around anymore.
      Signed-off-by: NMichael Wang <wangyun@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/4FFE8665.3080705@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b9403130
    • M
      kprobes/x86: ftrace based optimization for x86 · e5253896
      Masami Hiramatsu 提交于
      Add function tracer based kprobe optimization support
      handlers on x86. This allows kprobes to use function
      tracer for probing on mcount call.
      
      Link: http://lkml.kernel.org/r/20120605102838.27845.26317.stgit@localhost.localdomain
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      
      [ Updated to new port of ftrace save regs functions ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e5253896
    • M
      kprobes: introduce ftrace based optimization · ae6aa16f
      Masami Hiramatsu 提交于
      Introduce function trace based kprobes optimization.
      
      With using ftrace optimization, kprobes on the mcount calling
      address, use ftrace's mcount call instead of breakpoint.
      Furthermore, this optimization works with preemptive kernel
      not like as current jump-based optimization. Of cource,
      this feature works only if the probe is on mcount call.
      
      Only if kprobe.break_handler is set, that probe is not
      optimized with ftrace (nor put on ftrace). The reason why this
      limitation comes is that this break_handler may be used only
      from jprobes which changes ip address (for fetching the function
      arguments), but function tracer ignores modified ip address.
      
      Changes in v2:
       - Fix ftrace_ops registering right after setting its filter.
       - Unregister ftrace_ops if there is no kprobe using.
       - Remove notrace dependency from __kprobes macro.
      
      Link: http://lkml.kernel.org/r/20120605102832.27845.63461.stgit@localhost.localdomain
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ae6aa16f
    • M
      kprobes: Move locks into appropriate functions · 25764288
      Masami Hiramatsu 提交于
      Break a big critical region into fine-grained pieces at
      registering kprobe path. This helps us to solve circular
      locking dependency when introducing ftrace-based kprobes.
      
      Link: http://lkml.kernel.org/r/20120605102826.27845.81689.stgit@localhost.localdomain
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      25764288
    • M
      kprobes: cleanup to separate probe-able check · f7fa6ef0
      Masami Hiramatsu 提交于
      Separate probe-able address checking code from
      register_kprobe().
      
      Link: http://lkml.kernel.org/r/20120605102820.27845.90133.stgit@localhost.localdomain
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f7fa6ef0
    • S
      kprobes: Inverse taking of module_mutex with kprobe_mutex · 72ef3794
      Steven Rostedt 提交于
      Currently module_mutex is taken before kprobe_mutex, but this
      can cause issues when we have kprobes register ftrace, as the ftrace
      mutex is taken before enabling a tracepoint, which currently takes
      the module mutex.
      
      If module_mutex is taken before kprobe_mutex, then we can not
      have kprobes use the ftrace infrastructure.
      
      There seems to be no reason that the kprobe_mutex can't be taken
      before the module_mutex. Running lockdep shows that it is safe
      among the kernels I've run.
      
      Link: http://lkml.kernel.org/r/20120605102814.27845.21047.stgit@localhost.localdomain
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      72ef3794
    • M
      ftrace: add ftrace_set_filter_ip() for address based filter · 647664ea
      Masami Hiramatsu 提交于
      Add a new filter update interface ftrace_set_filter_ip()
      to set ftrace filter by ip address, not only glob pattern.
      
      Link: http://lkml.kernel.org/r/20120605102808.27845.67952.stgit@localhost.localdomain
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      647664ea
    • S
      ftrace: Add selftest to test function save-regs support · ad97772a
      Steven Rostedt 提交于
      Add selftests to test the save-regs functionality of ftrace.
      
      If the arch supports saving regs, then it will make sure that regs is
      at least not NULL in the callback.
      
      If the arch does not support saving regs, it makes sure that the
      registering of the ftrace_ops that requests saving regs fails.
      It then tests the registering of the ftrace_ops succeeds if the
      'IF_SUPPORTED' flag is set. Then it makes sure that the regs passed to
      the function is NULL.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ad97772a
    • S
      ftrace: Add selftest to test function trace recursion protection · ea701f11
      Steven Rostedt 提交于
      Add selftests to test the function tracing recursion protection actually
      does work. It also tests if a ftrace_ops states it will perform its own
      protection. Although, even if the ftrace_ops states it will protect itself,
      the ftrace infrastructure may still provide protection if the arch does
      not support all features or another ftrace_ops is registered.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ea701f11
    • S
      ftrace: Only compile ftrace selftest if selftests are enabled · 47239c4d
      Steven Rostedt 提交于
      No need to compile in the ftrace selftest helper file if selftests are
      not being executed.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      47239c4d
    • S
      ftrace: Add default recursion protection for function tracing · 4740974a
      Steven Rostedt 提交于
      As more users of the function tracer utility are being added, they do
      not always add the necessary recursion protection. To protect from
      function recursion due to tracing, if the callback ftrace_ops does not
      specifically specify that it protects against recursion (by setting
      the FTRACE_OPS_FL_RECURSION_SAFE flag), the list operation will be
      called by the mcount trampoline which adds recursion protection.
      
      If the flag is set, then the function will be called directly with no
      extra protection.
      
      Note, the list operation is called if more than one function callback
      is registered, or if the arch does not support all of the function
      tracer features.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4740974a
    • A
      kernel/debug: Make use of KGDB_REASON_NMI · b10d22d6
      Anton Vorontsov 提交于
      Currently kernel never set KGDB_REASON_NMI. We do now, when we enter
      KGDB/KDB from an NMI.
      
      This is not to be confused with kgdb_nmicallback(), NMI callback is
      an entry for the slave CPUs during CPUs roundup, but REASON_NMI is the
      entry for the master CPU.
      Signed-off-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      b10d22d6
    • J
      kdb: Remove cpu from the more prompt · 07cd27bb
      Jason Wessel 提交于
      Having the CPU in the more prompt is completely redundent vs the
      standard kdb prompt, and it also wastes 32 bytes on the stack.
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      07cd27bb
    • J
      kdb: Remove unused KDB_FLAG_ONLY_DO_DUMP · 0f26d0e0
      Jason Wessel 提交于
      This code cleanup was missed in the original kdb merge, and this code
      is simply not used at all.  The code that was previously used to set
      the KDB_FLAG_ONLY_DO_DUMP was removed prior to the initial kdb merge.
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      0f26d0e0
    • O
      resource: make sure requested range is included in the root range · 65fed8f6
      Octavian Purdila 提交于
      When the requested range is outside of the root range the logic in
      __reserve_region_with_split will cause an infinite recursion which will
      overflow the stack as seen in the warning bellow.
      
      This particular stack overflow was caused by requesting the
      (100000000-107ffffff) range while the root range was (0-ffffffff).  In
      this case __request_resource would return the whole root range as
      conflict range (i.e.  0-ffffffff).  Then, the logic in
      __reserve_region_with_split would continue the recursion requesting the
      new range as (conflict->end+1, end) which incidentally in this case
      equals the originally requested range.
      
      This patch aborts looking for an usable range when the request does not
      intersect with the root range.  When the request partially overlaps with
      the root range, it ajust the request to fall in the root range and then
      continues with the new request.
      
      When the request is modified or aborted errors and a stack trace are
      logged to allow catching the errors in the upper layers.
      
      [    5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89()
      [    5.975150] Modules linked in:
      [    5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46
      [    5.985324] Call Trace:
      [    5.987759]  [<c1039dfc>] ? console_unlock+0x17b/0x18d
      [    5.992891]  [<c1039620>] warn_slowpath_common+0x48/0x5d
      [    5.998194]  [<c1031758>] ? sub_preempt_count+0x63/0x89
      [    6.003412]  [<c1039644>] warn_slowpath_null+0xf/0x13
      [    6.008453]  [<c1031758>] sub_preempt_count+0x63/0x89
      [    6.013499]  [<c14d60c4>] _raw_spin_unlock+0x27/0x3f
      [    6.018453]  [<c10c6349>] add_partial+0x36/0x3b
      [    6.022973]  [<c10c7c0a>] deactivate_slab+0x96/0xb4
      [    6.027842]  [<c14cf9d9>] __slab_alloc.isra.54.constprop.63+0x204/0x241
      [    6.034456]  [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
      [    6.039842]  [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
      [    6.045232]  [<c10c7dc9>] kmem_cache_alloc_trace+0x51/0xb0
      [    6.050710]  [<c103f78f>] ? kzalloc.constprop.5+0x29/0x38
      [    6.056100]  [<c103f78f>] kzalloc.constprop.5+0x29/0x38
      [    6.061320]  [<c17b45e9>] __reserve_region_with_split+0x1c/0xd1
      [    6.067230]  [<c17b4693>] __reserve_region_with_split+0xc6/0xd1
      ...
      [    7.179057]  [<c17b4693>] __reserve_region_with_split+0xc6/0xd1
      [    7.184970]  [<c17b4779>] reserve_region_with_split+0x30/0x42
      [    7.190709]  [<c17a8ebf>] e820_reserve_resources_late+0xd1/0xe9
      [    7.196623]  [<c17c9526>] pcibios_resource_survey+0x23/0x2a
      [    7.202184]  [<c17cad8a>] pcibios_init+0x23/0x35
      [    7.206789]  [<c17ca574>] pci_subsys_init+0x3f/0x44
      [    7.211659]  [<c1002088>] do_one_initcall+0x72/0x122
      [    7.216615]  [<c17ca535>] ? pci_legacy_init+0x3d/0x3d
      [    7.221659]  [<c17a27ff>] kernel_init+0xa6/0x118
      [    7.226265]  [<c17a2759>] ? start_kernel+0x334/0x334
      [    7.231223]  [<c14d7482>] kernel_thread_helper+0x6/0x10
      Signed-off-by: NOctavian Purdila <octavian.purdila@intel.com>
      Signed-off-by: NRam Pai <linuxram@us.ibm.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      65fed8f6
    • A
      taskstats: check nla_reserve() return · 25353b33
      Alan Cox 提交于
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44621
      
      Reported-by: <rucsoftsec@gmail.com>
      Signed-off-by: NAlan Cox <alan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      25353b33
    • S
      sysctl: suppress kmemleak messages · fd4b616b
      Steven Rostedt 提交于
      register_sysctl_table() is a strange function, as it makes internal
      allocations (a header) to register a sysctl_table.  This header is a
      handle to the table that is created, and can be used to unregister the
      table.  But if the table is permanent and never unregistered, the header
      acts the same as a static variable.
      
      Unfortunately, this allocation of memory that is never expected to be
      freed fools kmemleak in thinking that we have leaked memory.  For those
      sysctl tables that are never unregistered, and have no pointer referencing
      them, kmemleak will think that these are memory leaks:
      
      unreferenced object 0xffff880079fb9d40 (size 192):
        comm "swapper/0", pid 0, jiffies 4294667316 (age 12614.152s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff8146b590>] kmemleak_alloc+0x73/0x98
          [<ffffffff8110a935>] kmemleak_alloc_recursive.constprop.42+0x16/0x18
          [<ffffffff8110b852>] __kmalloc+0x107/0x153
          [<ffffffff8116fa72>] kzalloc.constprop.8+0xe/0x10
          [<ffffffff811703c9>] __register_sysctl_paths+0xe1/0x160
          [<ffffffff81170463>] register_sysctl_paths+0x1b/0x1d
          [<ffffffff8117047d>] register_sysctl_table+0x18/0x1a
          [<ffffffff81afb0a1>] sysctl_init+0x10/0x14
          [<ffffffff81b05a6f>] proc_sys_init+0x2f/0x31
          [<ffffffff81b0584c>] proc_root_init+0xa5/0xa7
          [<ffffffff81ae5b7e>] start_kernel+0x3d0/0x40a
          [<ffffffff81ae52a7>] x86_64_start_reservations+0xae/0xb2
          [<ffffffff81ae53ad>] x86_64_start_kernel+0x102/0x111
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      The sysctl_base_table used by sysctl itself is one such instance that
      registers the table to never be unregistered.
      
      Use kmemleak_not_leak() to suppress the kmemleak false positive.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fd4b616b
    • V
      kdump: append newline to the last lien of vmcoreinfo note · 63dca8d5
      Vivek Goyal 提交于
      The last line of vmcoreinfo note does not end with \n.  Parsing all the
      lines in note becomes easier if all lines end with \n instead of trying to
      special case the last line.
      
      I know at least one tool, vmcore-dmesg in kexec-tools tree which made the
      assumption that all lines end with \n.  I think it is a good idea to fix
      it.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63dca8d5
    • A
      fork: fix error handling in dup_task() · f19b9f74
      Akinobu Mita 提交于
      The function dup_task() may fail at the following function calls in the
      following order.
      
      0) alloc_task_struct_node()
      1) alloc_thread_info_node()
      2) arch_dup_task_struct()
      
      Error by 0) is not a matter, it can just return.  But error by 1) requires
      releasing task_struct allocated by 0) before it returns.  Likewise, error
      by 2) requires releasing task_struct and thread_info allocated by 0) and
      1).
      
      The existing error handling calls free_task_struct() and
      free_thread_info() which do not only release task_struct and thread_info,
      but also call architecture specific arch_release_task_struct() and
      arch_release_thread_info().
      
      The problem is that task_struct and thread_info are not fully initialized
      yet at this point, but arch_release_task_struct() and
      arch_release_thread_info() are called with them.
      
      For example, x86 defines its own arch_release_task_struct() that releases
      a task_xstate.  If alloc_thread_info_node() fails in dup_task(),
      arch_release_task_struct() is called with task_struct which is just
      allocated and filled with garbage in this error handling.
      
      This actually happened with tools/testing/fault-injection/failcmd.sh
      
      	# env FAILCMD_TYPE=fail_page_alloc \
      		./tools/testing/fault-injection/failcmd.sh --times=100 \
      		--min-order=0 --ignore-gfp-wait=0 \
      		-- make -C tools/testing/selftests/ run_tests
      
      In order to fix this issue, make free_{task_struct,thread_info}() not to
      call arch_release_{task_struct,thread_info}() and call
      arch_release_{task_struct,thread_info}() implicitly where needed.
      
      Default arch_release_task_struct() and arch_release_thread_info() are
      defined as empty by default.  So this change only affects the
      architectures which implement their own arch_release_task_struct() or
      arch_release_thread_info() as listed below.
      
      arch_release_task_struct(): x86, sh
      arch_release_thread_info(): mn10300, tile
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Salman Qazi <sqazi@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f19b9f74