1. 16 11月, 2014 3 次提交
  2. 28 10月, 2014 1 次提交
    • P
      perf: Fix and clean up initialization of pmu::event_idx · c719f560
      Peter Zijlstra 提交于
      Andy reported that the current state of event_idx is rather confused.
      So remove all but the x86_pmu implementation and change the default to
      return 0 (the safe option).
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Himangi Saraogi <himangi774@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux390@de.ibm.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c719f560
  3. 03 10月, 2014 3 次提交
  4. 24 9月, 2014 3 次提交
  5. 19 9月, 2014 1 次提交
  6. 16 9月, 2014 1 次提交
  7. 09 9月, 2014 3 次提交
  8. 27 8月, 2014 1 次提交
  9. 24 8月, 2014 2 次提交
  10. 20 8月, 2014 1 次提交
  11. 13 8月, 2014 3 次提交
  12. 09 8月, 2014 1 次提交
    • J
      mm: memcontrol: rewrite charge API · 00501b53
      Johannes Weiner 提交于
      These patches rework memcg charge lifetime to integrate more naturally
      with the lifetime of user pages.  This drastically simplifies the code and
      reduces charging and uncharging overhead.  The most expensive part of
      charging and uncharging is the page_cgroup bit spinlock, which is removed
      entirely after this series.
      
      Here are the top-10 profile entries of a stress test that reads a 128G
      sparse file on a freshly booted box, without even a dedicated cgroup (i.e.
       executing in the root memcg).  Before:
      
          15.36%              cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.31%              cat  [kernel.kallsyms]   [k] memset
          11.48%              cat  [kernel.kallsyms]   [k] do_mpage_readpage
           4.23%              cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.38%              cat  [kernel.kallsyms]   [k] put_page
           2.32%              cat  [kernel.kallsyms]   [k] __mem_cgroup_commit_charge
           2.18%          kswapd0  [kernel.kallsyms]   [k] __mem_cgroup_uncharge_common
           1.92%          kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.86%              cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.62%              cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
      
      After:
      
          15.67%           cat  [kernel.kallsyms]   [k] copy_user_generic_string
          13.48%           cat  [kernel.kallsyms]   [k] memset
          11.42%           cat  [kernel.kallsyms]   [k] do_mpage_readpage
           3.98%           cat  [kernel.kallsyms]   [k] get_page_from_freelist
           2.46%           cat  [kernel.kallsyms]   [k] put_page
           2.13%       kswapd0  [kernel.kallsyms]   [k] shrink_page_list
           1.88%           cat  [kernel.kallsyms]   [k] __radix_tree_lookup
           1.67%           cat  [kernel.kallsyms]   [k] __pagevec_lru_add_fn
           1.39%       kswapd0  [kernel.kallsyms]   [k] free_pcppages_bulk
           1.30%           cat  [kernel.kallsyms]   [k] kfree
      
      As you can see, the memcg footprint has shrunk quite a bit.
      
         text    data     bss     dec     hex filename
        37970    9892     400   48262    bc86 mm/memcontrol.o.old
        35239    9892     400   45531    b1db mm/memcontrol.o
      
      This patch (of 4):
      
      The memcg charge API charges pages before they are rmapped - i.e.  have an
      actual "type" - and so every callsite needs its own set of charge and
      uncharge functions to know what type is being operated on.  Worse,
      uncharge has to happen from a context that is still type-specific, rather
      than at the end of the page's lifetime with exclusive access, and so
      requires a lot of synchronization.
      
      Rewrite the charge API to provide a generic set of try_charge(),
      commit_charge() and cancel_charge() transaction operations, much like
      what's currently done for swap-in:
      
        mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
        pages from the memcg if necessary.
      
        mem_cgroup_commit_charge() commits the page to the charge once it
        has a valid page->mapping and PageAnon() reliably tells the type.
      
        mem_cgroup_cancel_charge() aborts the transaction.
      
      This reduces the charge API and enables subsequent patches to
      drastically simplify uncharging.
      
      As pages need to be committed after rmap is established but before they
      are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
      additions again.  Revive lru_cache_add_active_or_unevictable().
      
      [hughd@google.com: fix shmem_unuse]
      [hughd@google.com: Add comments on the private use of -EAGAIN]
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00501b53
  13. 16 7月, 2014 3 次提交
    • J
      perf: Add vm_ops->name call for mmap event name retrieval · fbe26abe
      Jiri Olsa 提交于
      The following patch added another way to get mmap name: 78d683e8
      ("mm, fs: Add vm_ops->name as an alternative to arch_vma_name")
      
      The vdso vma mapping already switch to this and we no longer get vdso
      name via arch_vma_name function. Adding this way to the perf mmap
      event name retrieval code.
      
      Caught this via perf test:
      
        $ sudo ./perf test -v 7
         7: Validate PERF_RECORD_* events & perf_sample fields     :
        --- start ---
      
      SNIP
      
        PERF_RECORD_MMAP for [vdso] missing!
        test child finished with 255
        ---- end ----
        Validate PERF_RECORD_* events & perf_sample fields: FAILED!
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1405353439-14211-1-git-send-email-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fbe26abe
    • P
      perf: Fix lockdep warning on process exit · 4a1c0f26
      Peter Zijlstra 提交于
      Sasha Levin reported:
      
      > While fuzzing with trinity inside a KVM tools guest running the latest -next
      > kernel I've stumbled on the following spew:
      >
      > ======================================================
      > [ INFO: possible circular locking dependency detected ]
      > 3.15.0-next-20140613-sasha-00026-g6dd125d-dirty #654 Not tainted
      > -------------------------------------------------------
      > trinity-c578/9725 is trying to acquire lock:
      > (&(&pool->lock)->rlock){-.-...}, at: __queue_work (kernel/workqueue.c:1346)
      >
      > but task is already holding lock:
      > (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533)
      >
      > which lock already depends on the new lock.
      
      > 1 lock held by trinity-c578/9725:
      > #0: (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533)
      >
      >  Call Trace:
      >  dump_stack (lib/dump_stack.c:52)
      >  print_circular_bug (kernel/locking/lockdep.c:1216)
      >  __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
      >  lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
      >  _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
      >  __queue_work (kernel/workqueue.c:1346)
      >  queue_work_on (kernel/workqueue.c:1424)
      >  free_object (lib/debugobjects.c:209)
      >  __debug_check_no_obj_freed (lib/debugobjects.c:715)
      >  debug_check_no_obj_freed (lib/debugobjects.c:727)
      >  kmem_cache_free (mm/slub.c:2683 mm/slub.c:2711)
      >  free_task (kernel/fork.c:221)
      >  __put_task_struct (kernel/fork.c:250)
      >  put_ctx (include/linux/sched.h:1855 kernel/events/core.c:898)
      >  perf_event_exit_task (kernel/events/core.c:907 kernel/events/core.c:7478 kernel/events/core.c:7533)
      >  do_exit (kernel/exit.c:766)
      >  do_group_exit (kernel/exit.c:884)
      >  get_signal_to_deliver (kernel/signal.c:2347)
      >  do_signal (arch/x86/kernel/signal.c:698)
      >  do_notify_resume (arch/x86/kernel/signal.c:751)
      >  int_signal (arch/x86/kernel/entry_64.S:600)
      
      Urgh.. so the only way I can make that happen is through:
      
        perf_event_exit_task_context()
          raw_spin_lock(&child_ctx->lock);
          unclone_ctx(child_ctx)
            put_ctx(ctx->parent_ctx);
          raw_spin_unlock_irqrestore(&child_ctx->lock);
      
      And we can avoid this by doing the change below.
      
      I can't immediately see how this changed recently, but given that you
      say it's easy to reproduce, lets fix this.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140623141242.GB19860@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4a1c0f26
    • P
      perf: Revert ("perf: Always destroy groups on exit") · 1903d50c
      Peter Zijlstra 提交于
      Vince reported that commit 15a2d4de ("perf: Always destroy groups
      on exit") causes a regression with grouped events. In particular his
      read_group_attached.c test fails.
      
        https://github.com/deater/perf_event_tests/blob/master/tests/bugs/read_group_attached.c
      
      Because of the context switch optimization in
      perf_event_context_sched_out() the 'original' event may end up in the
      child process and when that exits the change in the patch in question
      destroys the actual grouping.
      
      Therefore revert that change and only destroy inherited groups.
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-zedy3uktcp753q8fw8dagx7a@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1903d50c
  14. 05 7月, 2014 1 次提交
  15. 02 7月, 2014 1 次提交
    • J
      perf: Do not allow optimized switch for non-cloned events · 1f9a7268
      Jiri Olsa 提交于
      The context check in perf_event_context_sched_out allows
      non-cloned context to be part of the optimized schedule
      out switch.
      
      This could move non-cloned context into another workload
      child. Once this child exits, the context is closed and
      leaves all original (parent) events in closed state.
      
      Any other new cloned event will have closed state and not
      measure anything. And probably causing other odd bugs.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1403598026-2310-2-git-send-email-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1f9a7268
  16. 01 7月, 2014 1 次提交
  17. 09 6月, 2014 2 次提交
  18. 06 6月, 2014 2 次提交
    • A
      perf: Differentiate exec() and non-exec() comm events · 82b89778
      Adrian Hunter 提交于
      perf tools like 'perf report' can aggregate samples by comm strings,
      which generally works.  However, there are other potential use-cases.
      For example, to pair up 'calls' with 'returns' accurately (from branch
      events like Intel BTS) it is necessary to identify whether the process
      has exec'd.  Although a comm event is generated when an 'exec' happens
      it is also generated whenever the comm string is changed on a whim
      (e.g. by prctl PR_SET_NAME).  This patch adds a flag to the comm event
      to differentiate one case from the other.
      
      In order to determine whether the kernel supports the new flag, a
      selection bit named 'exec' is added to struct perf_event_attr.  The
      bit does nothing but will cause perf_event_open() to fail if the bit
      is set on kernels that do not have it defined.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/537D9EBE.7030806@intel.com
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      82b89778
    • P
      perf: Fix perf_event_comm() vs. exec() assumption · e041e328
      Peter Zijlstra 提交于
      perf_event_comm() assumes that set_task_comm() is only called on
      exec(), and in particular that its only called on current.
      
      Neither are true, as Dave reported a WARN triggered by set_task_comm()
      being called on !current.
      
      Separate the exec() hook from the comm hook.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140521153219.GH5226@laptop.programming.kicks-ass.net
      [ Build fix. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e041e328
  19. 05 6月, 2014 4 次提交
  20. 02 6月, 2014 2 次提交
  21. 26 5月, 2014 1 次提交
    • V
      ARM: 8043/1: uprobes need icache flush after xol write · 72e6ae28
      Victor Kamensky 提交于
      After instruction write into xol area, on ARM V7
      architecture code need to flush dcache and icache to sync
      them up for given set of addresses. Having just
      'flush_dcache_page(page)' call is not enough - it is
      possible to have stale instruction sitting in icache
      for given xol area slot address.
      
      Introduce arch_uprobe_ixol_copy weak function
      that by default calls uprobes copy_to_page function and
      than flush_dcache_page function and on ARM define new one
      that handles xol slot copy in ARM specific way
      
      flush_uprobe_xol_access function shares/reuses implementation
      with/of flush_ptrace_access function and takes care of writing
      instruction to user land address space on given variety of
      different cache types on ARM CPUs. Because
      flush_uprobe_xol_access does not have vma around
      flush_ptrace_access was split into two parts. First that
      retrieves set of condition from vma and common that receives
      those conditions as flags.
      
      Note ARM cache flush function need kernel address
      through which instruction write happened, so instead
      of using uprobes copy_to_page function changed
      code to explicitly map page and do memcpy.
      
      Note arch_uprobe_copy_ixol function, in similar way as
      copy_to_user_page function, has preempt_disable/preempt_enable.
      Signed-off-by: NVictor Kamensky <victor.kamensky@linaro.org>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NDavid A. Long <dave.long@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      72e6ae28