1. 10 8月, 2016 3 次提交
    • M
      perf/core: Sched out groups atomically · 3f005e7d
      Mark Rutland 提交于
      Groups of events are supposed to be scheduled atomically, such that it
      is possible to derive meaningful ratios between their values.
      
      We take great pains to achieve this when scheduling event groups to a
      PMU in group_sched_in(), calling {start,commit}_txn() (which fall back
      to perf_pmu_{disable,enable}() if necessary) to provide this guarantee.
      However we don't mirror this in group_sched_out(), and in some cases
      events will not be scheduled out atomically.
      
      For example, if we disable an event group with PERF_EVENT_IOC_DISABLE,
      we'll cross-call __perf_event_disable() for the group leader, and will
      call group_sched_out() without having first disabled the relevant PMU.
      We will disable/enable the PMU around each pmu->del() call, but between
      each call the PMU will be enabled and events may count.
      
      Avoid this by explicitly disabling and enabling the PMU around event
      removal in group_sched_out(), mirroring what we do in group_sched_in().
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1469553141-28314-1-git-send-email-mark.rutland@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3f005e7d
    • D
      perf/core: Set cgroup in CPU contexts for new cgroup events · db4a8356
      David Carrillo-Cisneros 提交于
      There's a perf stat bug easy to observer on a machine with only one cgroup:
      
        $ perf stat -e cycles -I 1000 -C 0 -G /
        #          time             counts unit events
            1.000161699      <not counted>      cycles                    /
            2.000355591      <not counted>      cycles                    /
            3.000565154      <not counted>      cycles                    /
            4.000951350      <not counted>      cycles                    /
      
      We'd expect some output there.
      
      The underlying problem is that there is an optimization in
      perf_cgroup_sched_{in,out}() that skips the switch of cgroup events
      if the old and new cgroups in a task switch are the same.
      
      This optimization interacts with the current code in two ways
      that cause a CPU context's cgroup (cpuctx->cgrp) to be NULL even if a
      cgroup event matches the current task. These are:
      
        1. On creation of the first cgroup event in a CPU: In current code,
        cpuctx->cpu is only set in perf_cgroup_sched_in, but due to the
        aforesaid optimization, perf_cgroup_sched_in will run until the next
        cgroup switches in that CPU. This may happen late or never happen,
        depending on system's number of cgroups, CPU load, etc.
      
        2. On deletion of the last cgroup event in a cpuctx: In list_del_event,
        cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in
        because cpuctx->cgrp == NULL until a cgroup switch occurs and
        perf_cgroup_sched_in is executed (updating cpuctx->cgrp).
      
      This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
      mirroring what list_del_event does when removing a cgroup event from CPU
      context, as introduced in:
      
        commit 68cacd29 ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")
      
      With this patch, cpuctx->cgrp is always set/clear when installing/removing
      the first/last cgroup event in/from the CPU context. With cpuctx->cgrp
      correctly set, event_filter_match works as intended when events are
      sched in/out.
      
      After the fix, the output is as expected:
      
        $ perf stat -e cycles -I 1000 -a -G /
        #         time             counts unit events
           1.004699159          627342882      cycles                    /
           2.007397156          615272690      cycles                    /
           3.010019057          616726074      cycles                    /
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470124092-113192-1-git-send-email-davidcc@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      db4a8356
    • P
      perf/core: Fix sideband list-iteration vs. event ordering NULL pointer deference crash · 0b8f1e2e
      Peter Zijlstra 提交于
      Vegard Nossum reported that perf fuzzing generates a NULL
      pointer dereference crash:
      
      > Digging a bit deeper into this, it seems the event itself is getting
      > created by perf_event_open() and it gets added to the pmu_event_list
      > through:
      >
      > perf_event_open()
      >  - perf_event_alloc()
      >     - account_event()
      >        - account_pmu_sb_event()
      >           - attach_sb_event()
      >
      > so at this point the event is being attached but its ->ctx is still
      > NULL. It seems like ->ctx is set just a bit later in
      > perf_event_open(), though.
      >
      > But before that, __schedule() comes along and creates a stack trace
      > similar to the one above:
      >
      > __schedule()
      >  - __perf_event_task_sched_out()
      >    - perf_iterate_sb()
      >      - perf_iterate_sb_cpu()
      >         - event_filter_match()
      >           - perf_cgroup_match()
      >             - __get_cpu_context()
      >               - (dereference ctx which is NULL)
      >
      > So I guess the question is... should the event be attached (= put on
      > the list) before ->ctx gets set? Or should the cgroup code check for a
      > NULL ->ctx?
      
      The latter seems like the simplest solution. Moving the list-add later
      creates a bit of a mess.
      Reported-by: NVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: NVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: f2fb6bef ("perf/core: Optimize side-band event delivery")
      Link: http://lkml.kernel.org/r/20160804123724.GN6862@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0b8f1e2e
  2. 09 8月, 2016 1 次提交
  3. 08 8月, 2016 1 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
  4. 04 8月, 2016 6 次提交
    • J
      jump_label: remove bug.h, atomic.h dependencies for HAVE_JUMP_LABEL · 1f69bf9c
      Jason Baron 提交于
      The current jump_label.h includes bug.h for things such as WARN_ON().
      This makes the header problematic for inclusion by kernel.h or any
      headers that kernel.h includes, since bug.h includes kernel.h (circular
      dependency).  The inclusion of atomic.h is similarly problematic.  Thus,
      this should make jump_label.h 'includable' from most places.
      
      Link: http://lkml.kernel.org/r/7060ce35ddd0d20b33bf170685e6b0fab816bdf2.1467837322.git.jbaron@akamai.comSigned-off-by: NJason Baron <jbaron@akamai.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f69bf9c
    • M
      tree-wide: replace config_enabled() with IS_ENABLED() · 97f2645f
      Masahiro Yamada 提交于
      The use of config_enabled() against config options is ambiguous.  In
      practical terms, config_enabled() is equivalent to IS_BUILTIN(), but the
      author might have used it for the meaning of IS_ENABLED().  Using
      IS_ENABLED(), IS_BUILTIN(), IS_MODULE() etc.  makes the intention
      clearer.
      
      This commit replaces config_enabled() with IS_ENABLED() where possible.
      This commit is only touching bool config options.
      
      I noticed two cases where config_enabled() is used against a tristate
      option:
      
       - config_enabled(CONFIG_HWMON)
        [ drivers/net/wireless/ath/ath10k/thermal.c ]
      
       - config_enabled(CONFIG_BACKLIGHT_CLASS_DEVICE)
        [ drivers/gpu/drm/gma500/opregion.c ]
      
      I did not touch them because they should be converted to IS_BUILTIN()
      in order to keep the logic, but I was not sure it was the authors'
      intention.
      
      Link: http://lkml.kernel.org/r/1465215656-20569-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Stas Sergeev <stsp@list.ru>
      Cc: Matt Redfearn <matt.redfearn@imgtec.com>
      Cc: Joshua Kinard <kumba@gentoo.org>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: "Dmitry V. Levin" <ldv@altlinux.org>
      Cc: yu-cheng yu <yu-cheng.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Will Drewry <wad@chromium.org>
      Cc: Nikolay Martynov <mar.kolya@gmail.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: James Cowgill <James.Cowgill@imgtec.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Qais Yousef <qais.yousef@imgtec.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Mikko Rapeli <mikko.rapeli@iki.fi>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "Luis R. Rodriguez" <mcgrof@do-not-panic.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Kalle Valo <kvalo@qca.qualcomm.com>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: Tony Wu <tung7970@gmail.com>
      Cc: Huaitong Han <huaitong.han@intel.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Gelmini <andrea.gelmini@gelma.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Rabin Vincent <rabin@rab.in>
      Cc: "Maciej W. Rozycki" <macro@imgtec.com>
      Cc: David Daney <david.daney@cavium.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      97f2645f
    • J
      modules: add ro_after_init support · 444d13ff
      Jessica Yu 提交于
      Add ro_after_init support for modules by adding a new page-aligned section
      in the module layout (after rodata) for ro_after_init data and enabling RO
      protection for that section after module init runs.
      Signed-off-by: NJessica Yu <jeyu@redhat.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      444d13ff
    • R
      jump_label: disable preemption around __module_text_address(). · bdc9f373
      Rusty Russell 提交于
      Steven reported a warning caused by not holding module_mutex or
      rcu_read_lock_sched: his backtrace was corrupted but a quick audit
      found this possible cause.  It's wrong anyway...
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      bdc9f373
    • P
      modules: Add kernel parameter to blacklist modules · be7de5f9
      Prarit Bhargava 提交于
      Blacklisting a module in linux has long been a problem.  The current
      procedure is to use rd.blacklist=module_name, however, that doesn't
      cover the case after the initramfs and before a boot prompt (where one
      is supposed to use /etc/modprobe.d/blacklist.conf to blacklist
      runtime loading). Using rd.shell to get an early prompt is hit-or-miss,
      and doesn't cover all situations AFAICT.
      
      This patch adds this functionality of permanently blacklisting a module
      by its name via the kernel parameter module_blacklist=module_name.
      
      [v2]: Rusty, use core_param() instead of __setup() which simplifies
      things.
      
      [v3]: Rusty, undo wreckage from strsep()
      
      [v4]: Rusty, simpler version of blacklisted()
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: linux-doc@vger.kernel.org
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      be7de5f9
    • S
      module: Do a WARN_ON_ONCE() for assert module mutex not held · 9502514f
      Steven Rostedt 提交于
      When running with lockdep enabled, I triggered the WARN_ON() in the
      module code that asserts when module_mutex or rcu_read_lock_sched are
      not held. The issue I have is that this can also be called from the
      dump_stack() code, causing us to enter an infinite loop...
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
       Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375 #14
       Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
        ffff880215e8fa70 ffff880215e8fa70 ffffffff812fc8e3 0000000000000000
        ffffffff81d3e55b ffff880215e8fac0 ffffffff8104fc88 ffffffff8104fcab
        0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
       Call Trace:
        [<ffffffff812fc8e3>] dump_stack+0x67/0x90
        [<ffffffff8104fc88>] __warn+0xcb/0xe9
        [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
       Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375 #14
       Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
        ffff880215e8f7a0 ffff880215e8f7a0 ffffffff812fc8e3 0000000000000000
        ffffffff81d3e55b ffff880215e8f7f0 ffffffff8104fc88 ffffffff8104fcab
        0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
       Call Trace:
        [<ffffffff812fc8e3>] dump_stack+0x67/0x90
        [<ffffffff8104fc88>] __warn+0xcb/0xe9
        [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
       Modules linked in: ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc3-test-00013-g501c2375 #14
       Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
        ffff880215e8f4d0 ffff880215e8f4d0 ffffffff812fc8e3 0000000000000000
        ffffffff81d3e55b ffff880215e8f520 ffffffff8104fc88 ffffffff8104fcab
        0000000915e88300 0000000000000046 ffffffffa019b29a 0000000000000001
       Call Trace:
        [<ffffffff812fc8e3>] dump_stack+0x67/0x90
        [<ffffffff8104fc88>] __warn+0xcb/0xe9
        [<ffffffff8104fcab>] ? warn_slowpath_null+0x5/0x1f
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 0 at kernel/module.c:268 module_assert_mutex_or_preempt+0x3c/0x3e
      [...]
      
      Which gives us rather useless information. Worse yet, there's some race
      that causes this, and I seldom trigger it, so I have no idea what
      happened.
      
      This would not be an issue if that warning was a WARN_ON_ONCE().
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      9502514f
  5. 03 8月, 2016 20 次提交
  6. 02 8月, 2016 1 次提交
    • D
      perf/core: Change log level for duration warning to KERN_INFO · 0d87d7ec
      David Ahern 提交于
      When the perf interrupt handler exceeds a threshold warning messages
      are displayed on console:
      
        [12739.31793] perf interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
        [71340.165065] perf interrupt took too long (5005 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
      
      Many customers and users are confused by the message wondering if
      something is wrong or they need to take action to fix a problem.
      Since a user can not do anything to fix the issue, the message is really
      more informational than a warning. Adjust the log level accordingly.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1470084569-438-1-git-send-email-dsa@cumulusnetworks.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0d87d7ec
  7. 01 8月, 2016 1 次提交
  8. 29 7月, 2016 7 次提交