1. 03 Jan, 2014 - 3 commits
  2. 22 Dec, 2013 - 5 commits
    • tracing: Add and use generic set_trigger_filter() implementation · bac5fb97
      Committed by Tom Zanussi
      Add a generic event_command.set_trigger_filter() op implementation and
      have the current set of trigger commands use it - this essentially
      gives them all support for filters.
      
      Syntactically, filters are supported by adding 'if <filter>' just
      after the command, in which case only events matching the filter will
      invoke the trigger.  For example, to add a filter to an
      enable/disable_event command:
      
          echo 'enable_event:system:event if common_pid == 999' > \
                    .../othersys/otherevent/trigger
      
      The above command will only enable the system:event event if the
      common_pid field in the othersys:otherevent event is 999.
      
      As another example, to add a filter to a stacktrace command:
      
          echo 'stacktrace if common_pid == 999' > \
                         .../somesys/someevent/trigger
      
      The above command will only trigger a stacktrace if the common_pid
      field in the event is 999.
      
      The filter syntax is the same as that described in the 'Event
      filtering' section of Documentation/trace/events.txt.
      
      Because triggers can now use filters, the trigger-invoking logic needs
      to be moved in those cases - e.g. for ftrace_raw_event_calls, if a
      trigger has a filter associated with it, the trigger invocation now
      needs to happen after the { assign; } part of the call, in order for
      the trigger condition to be tested.
      
      There's still a SOFT_DISABLED-only check at the top of e.g. the
      ftrace_raw_events function, so when an event is soft disabled but not
      because of the presence of a trigger, the original SOFT_DISABLED
      behavior remains unchanged.
      
      There's also a bit of trickiness in that some triggers need to avoid
      being invoked while an event is currently in the process of being
      logged, since the trigger may itself log data into the trace buffer.
      Thus we make sure the current event is committed before invoking those
      triggers.  To do that, we split the trigger invocation in two - the
      first part (event_triggers_call()) checks the filter using the current
      trace record; if a command has the post_trigger flag set, it sets a
      bit for itself in the return value, otherwise it directly invokes the
      trigger.  Once all commands have been either invoked or set their
      return flag, event_triggers_call() returns.  The current record is
      then either committed or discarded; if any commands have deferred
      their triggers, those commands are finally invoked following the close
      of the current event by event_triggers_post_call().
      
      To simplify the above and make it more efficient, the TRIGGER_COND bit
      is introduced, which is set only if a soft-disabled trigger needs to
      use the log record for filter testing or needs to wait until the
      current log record is closed.
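
      As a rough sketch, the resulting invocation pattern in the event
      logging path looks like this (simplified pseudocode based on the
      description above, not the literal generated code):

          enum event_trigger_type tt = ETT_NONE;

          /* ... record reserved and { assign; } done, 'entry' now filled ... */

          if (ftrace_file->flags & FTRACE_EVENT_FL_TRIGGER_COND)
                  tt = event_triggers_call(ftrace_file, entry);

          /* commit (or discard) the current record first ... */
          if (!filter_check_discard(ftrace_file, entry, buffer, event))
                  trace_buffer_unlock_commit(buffer, event, irq_flags, pc);

          /* ... then run any deferred (post_trigger) commands */
          if (tt)
                  event_triggers_post_call(ftrace_file, tt);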
      
      The syscall event invocation code is also changed in analogous ways.
      
      Because event triggers need to be able to create and free filters,
      this also adds a couple of extern wrappers for the existing
      create_filter() and free_filter() functions, which are too generic to
      be made extern themselves.
      
      Link: http://lkml.kernel.org/r/7164930759d8719ef460357f143d995406e4eead.1382622043.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Move ftrace_event_file() out of DYNAMIC_FTRACE ifdef · 2875a08b
      Committed by Steven Rostedt (Red Hat)
      Now that event triggers use ftrace_event_file(), it needs to be outside
      the #ifdef CONFIG_DYNAMIC_FTRACE, as it can now be used when that is
      not defined.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add 'enable_event' and 'disable_event' event trigger commands · 7862ad18
      Committed by Tom Zanussi
      Add 'enable_event' and 'disable_event' event_command commands.
      
      enable_event and disable_event event triggers are added by the user
      via these commands in a similar way and using practically the same
      syntax as the analogous 'enable_event' and 'disable_event' ftrace
      function commands, but instead of writing to the set_ftrace_filter
      file, the enable_event and disable_event triggers are written to the
      per-event 'trigger' files:
      
          echo 'enable_event:system:event' > .../othersys/otherevent/trigger
          echo 'disable_event:system:event' > .../othersys/otherevent/trigger
      
      The above commands will enable or disable the 'system:event' trace
      events whenever the othersys:otherevent events are hit.
      
      This also adds a 'count' version that limits the number of times the
      command will be invoked:
      
          echo 'enable_event:system:event:N' > .../othersys/otherevent/trigger
          echo 'disable_event:system:event:N' > .../othersys/otherevent/trigger
      
      Where N is the number of times the command will be invoked.
      
      The above commands will enable or disable the 'system:event'
      trace events whenever the othersys:otherevent events are hit, but only
      N times.
      
      This also makes the find_event_file() helper function extern, since
      it's useful from other places, such as the event triggers code.
      
      Link: http://lkml.kernel.org/r/f825f3048c3f6b026ee37ae5825f9fc373451828.1382622043.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add 'stacktrace' event trigger command · f21ecbb3
      Committed by Tom Zanussi
      Add 'stacktrace' event_command.  stacktrace event triggers are added
      by the user via this command in a similar way and using practically
      the same syntax as the analogous 'stacktrace' ftrace function command,
      but instead of writing to the set_ftrace_filter file, the stacktrace
      event trigger is written to the per-event 'trigger' files:
      
          echo 'stacktrace' > .../tracing/events/somesys/someevent/trigger
      
      The above command will turn on stacktraces for someevent, i.e.
      whenever someevent is hit, a stacktrace will be logged.
      
      This also adds a 'count' version that limits the number of times the
      command will be invoked:
      
          echo 'stacktrace:N' > .../tracing/events/somesys/someevent/trigger
      
      Where N is the number of times the command will be invoked.
      
      The above command will log up to N stacktraces for someevent, i.e. a
      stacktrace will be logged each time someevent is hit, but only N times.
      
      Link: http://lkml.kernel.org/r/0c30c008a0828c660aa0e1bbd3255cf179ed5c30.1382622043.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add 'snapshot' event trigger command · 93e31ffb
      Committed by Tom Zanussi
      Add 'snapshot' event_command.  snapshot event triggers are added by
      the user via this command in a similar way and using practically the
      same syntax as the analogous 'snapshot' ftrace function command, but
      instead of writing to the set_ftrace_filter file, the snapshot event
      trigger is written to the per-event 'trigger' files:
      
          echo 'snapshot' > .../somesys/someevent/trigger
      
      The above command will turn on snapshots for someevent, i.e. whenever
      someevent is hit, a snapshot will be taken.
      
      This also adds a 'count' version that limits the number of times the
      command will be invoked:
      
          echo 'snapshot:N' > .../somesys/someevent/trigger
      
      Where N is the number of times the command will be invoked.
      
      The above command will take up to N snapshots for someevent, i.e. a
      snapshot will be taken each time someevent is hit, but only N times.
      
      This also adds a new tracing_alloc_snapshot() function.  The existing
      tracing_snapshot_alloc() function is a special version of
      tracing_snapshot() that also does the snapshot allocation, but the
      snapshot triggers want to be able to do just the allocation without
      taking a snapshot; tracing_snapshot_alloc() in turn now calls
      tracing_alloc_snapshot() underneath to do that allocation.
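
      A minimal sketch of the resulting split (names as above; the internal
      alloc_snapshot() helper and error handling are paraphrased):

          int tracing_alloc_snapshot(void)
          {
                  int ret;

                  /* just make sure the snapshot buffer is allocated */
                  ret = alloc_snapshot(&global_trace);
                  WARN_ON(ret < 0);

                  return ret;
          }

          void tracing_snapshot_alloc(void)
          {
                  int ret;

                  ret = tracing_alloc_snapshot();
                  if (ret < 0)
                          return;

                  tracing_snapshot();
          }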
      
      Link: http://lkml.kernel.org/r/c9524dd07ce01f9dcbd59011290e0a8d5b47d7ad.1382622043.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      [ fix up from kbuild test robot <fengguang.wu@intel.com> report ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 21 Dec, 2013 - 2 commits
    • tracing: Add 'traceon' and 'traceoff' event trigger commands · 2a2df321
      Committed by Tom Zanussi
      Add 'traceon' and 'traceoff' event_command commands.  traceon and
      traceoff event triggers are added by the user via these commands in a
      similar way and using practically the same syntax as the analogous
      'traceon' and 'traceoff' ftrace function commands, but instead of
      writing to the set_ftrace_filter file, the traceon and traceoff
      triggers are written to the per-event 'trigger' files:
      
          echo 'traceon' > .../tracing/events/somesys/someevent/trigger
          echo 'traceoff' > .../tracing/events/somesys/someevent/trigger
      
      The above commands will turn tracing on or off whenever someevent is
      hit.
      
      This also adds a 'count' version that limits the number of times the
      command will be invoked:
      
          echo 'traceon:N' > .../tracing/events/somesys/someevent/trigger
          echo 'traceoff:N' > .../tracing/events/somesys/someevent/trigger
      
      Where N is the number of times the command will be invoked.
      
      The above commands will turn tracing on or off whenever someevent is
      hit, but only N times.
      
      Some common register/unregister_trigger() implementations of the
      event_command reg()/unreg() callbacks are also provided, which add and
      remove trigger instances to the per-event list of triggers, and
      arm/disarm them as appropriate.  event_trigger_callback() is a
      general-purpose event_command func() implementation that orchestrates
      command parsing and registration for most normal commands.
      
      Most event commands will use these, but some will override and
      possibly reuse them.
      
      The event_trigger_init(), event_trigger_free(), and
      event_trigger_print() functions are meant to be common implementations
      of the event_trigger_ops init(), free(), and print() ops,
      respectively.
      
      Most trigger_ops implementations will use these, but some will
      override and possibly reuse them.
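
      As a rough sketch, the common reg() implementation pairs list
      insertion with arming the event via the trigger-aware enable path
      (simplified; exact signatures and error handling may differ):

          static int register_trigger(char *glob, struct event_trigger_ops *ops,
                                      struct event_trigger_data *data,
                                      struct ftrace_event_file *file)
          {
                  struct event_trigger_data *test;
                  int ret;

                  /* only one trigger of a given type per event */
                  list_for_each_entry_rcu(test, &file->triggers, list) {
                          if (test->cmd_ops->trigger_type ==
                              data->cmd_ops->trigger_type)
                                  return -EEXIST;
                  }

                  if (data->ops->init) {
                          ret = data->ops->init(data->ops, data);
                          if (ret < 0)
                                  return ret;
                  }

                  list_add_rcu(&data->list, &file->triggers);

                  /* arm: sets TRIGGER_MODE and soft-enables the event */
                  return trace_event_trigger_enable_disable(file, 1);
          }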
      
      Link: http://lkml.kernel.org/r/00a52816703b98d2072947478dd6e2d70cde5197.1382622043.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add basic event trigger framework · 85f2b082
      Committed by Tom Zanussi
      Add a 'trigger' file for each trace event, enabling 'trace event
      triggers' to be set for trace events.
      
      'trace event triggers' are patterned after the existing 'ftrace
      function triggers' implementation except that triggers are written to
      per-event 'trigger' files instead of to a single file such as the
      'set_ftrace_filter' used for ftrace function triggers.
      
      The implementation is meant to be entirely separate from ftrace
      function triggers, in order to keep the respective implementations
      relatively simple and to allow them to diverge.
      
      The event trigger functionality is built on top of SOFT_DISABLE
      functionality.  It adds a TRIGGER_MODE bit to the ftrace_event_file
      flags which is checked when any trace event fires.  Triggers set for a
      particular event need to be checked regardless of whether that event
      is actually enabled or not - getting an event to fire even if it's not
      enabled is what's already implemented by SOFT_DISABLE mode, so trigger
      mode directly reuses that.  Event triggers essentially inherit the soft
      disable logic in __ftrace_event_enable_disable() while adding a bit of
      logic and trigger reference counting via tm_ref on top of that in a
      new trace_event_trigger_enable_disable() function.  Because the base
      __ftrace_event_enable_disable() code now needs to be invoked from
      outside trace_events.c, a wrapper is also added for those usages.
      
      The triggers for an event are actually invoked via a new function,
      event_triggers_call(), and code is also added to invoke them for
      ftrace_raw_event calls as well as syscall events.
      
      The main part of the patch creates a new trace_events_trigger.c file
      to contain the trace event triggers implementation.
      
      The standard open, read, write, and release file operations are
      implemented here.
      
      The open() implementation sets up for the various open modes of the
      'trigger' file.  It creates and attaches the trigger iterator and sets
      up the command parser.  If opened for reading, it also sets up the
      trigger seq_ops.
      
      The write() implementation parses the event trigger written to the
      'trigger' file, looks up the trigger command, and passes it along to
      that event_command's func() implementation for command-specific
      processing.
      
      The release() implementation does whatever cleanup is needed to
      release the 'trigger' file, like releasing the parser and trigger
      iterator, etc.
      
      A couple of functions for event command registration and
      unregistration are added, along with a list to add them to and a mutex
      to protect them.  An (initially empty) registration function is also
      added to register the set of commands that will be introduced by
      future commits; it is called from the trace event initialization code.
      
      Also added are a couple of trigger-specific data structures needed by
      these implementations, such as a trigger iterator and a struct for
      trigger-specific data.
      
      A couple of structs consisting mostly of functions meant to be implemented
      in command-specific ways, event_command and event_trigger_ops, are
      used by the generic event trigger command implementations.  They're
      being put into trace.h alongside the other trace_event data structures
      and functions, in the expectation that they'll be needed in several
      trace_event-related files such as trace_events_trigger.c and
      trace_events.c.
      
      The event_command.func() function is meant to be called by the trigger
      parsing code in order to add a trigger instance to the corresponding
      event.  It essentially coordinates adding a live trigger instance to
      the event, and arming the trigger on the event.
      
      Every event_command func() implementation essentially does the
      same thing for any command:
      
         - choose ops - use the value of param to choose either a normal
           or a count version of event_trigger_ops specific to the command
         - do the register or unregister of those ops
         - associate a filter, if specified, with the triggering event
      
      The reg() and unreg() ops allow command-specific implementations for
      event_trigger_op registration and unregistration, and the
      get_trigger_ops() op allows command-specific event_trigger_ops
      selection to be parameterized.  When a trigger instance is added, the
      reg() op essentially adds that trigger to the triggering event and
      arms it, while unreg() does the opposite.  The set_filter() function
      is used to associate a filter with the trigger - if the command
      doesn't specify a set_filter() implementation, the command will ignore
      filters.
      
      Each command has an associated trigger_type, which serves double duty:
      it is both a unique identifier for the command and a value that can be
      used for setting a trigger mode bit during trigger invocation.
      
      The signature of func() adds a pointer to the event_command struct,
      used to invoke those functions, along with a command_data param that
      can be passed to the reg/unreg functions.  This allows func()
      implementations to use command-specific blobs and supports code
      re-use.
      
      The event_trigger_ops.func() op corresponds to the trigger 'probe'
      function that gets called when the triggering event is actually
      invoked.  The other functions are used to list the trigger when
      needed, along with a couple of mundane book-keeping functions.
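
      Pulling the above together, the two structs look roughly like this
      (a sketch based on the description; field order and exact signatures
      are approximate):

          struct event_trigger_ops {
                  void    (*func)(struct event_trigger_data *data); /* probe */
                  int     (*init)(struct event_trigger_ops *ops,
                                  struct event_trigger_data *data);
                  void    (*free)(struct event_trigger_ops *ops,
                                  struct event_trigger_data *data);
                  int     (*print)(struct seq_file *m,   /* list the trigger */
                                   struct event_trigger_ops *ops,
                                   struct event_trigger_data *data);
          };

          struct event_command {
                  struct list_head        list;
                  char                    *name;
                  enum event_trigger_type trigger_type;  /* ID / mode bit */
                  bool                    post_trigger;
                  int     (*func)(struct event_command *cmd_ops,
                                  struct ftrace_event_file *file,
                                  char *glob, char *cmd, char *params);
                  int     (*reg)(char *glob, struct event_trigger_ops *ops,
                                 struct event_trigger_data *data,
                                 struct ftrace_event_file *file);
                  void    (*unreg)(char *glob, struct event_trigger_ops *ops,
                                   struct event_trigger_data *data,
                                   struct ftrace_event_file *file);
                  int     (*set_filter)(char *filter_str,
                                        struct event_trigger_data *data,
                                        struct ftrace_event_file *file);
                  struct event_trigger_ops *(*get_trigger_ops)(char *cmd,
                                                               char *param);
          };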
      
      This also moves event_file_data() into trace.h so it can be used
      outside of trace_events.c.
      
      Link: http://lkml.kernel.org/r/316d95061accdee070aac8e5750afba0192fa5b9.1382622043.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Idea-by: Steve Rostedt <rostedt@goodmis.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. 13 Dec, 2013 - 2 commits
    • futex: move user address verification up to common code · 5cdec2d8
      Committed by Linus Torvalds
      When debugging the read-only hugepage case, I was confused by the fact
      that get_futex_key() did an access_ok() only for the non-shared futex
      case, since the user address checking really isn't in any way specific
      to the private key handling.
      
      Now, it turns out that the shared key handling does effectively do the
      equivalent checks inside get_user_pages_fast() (it doesn't actually
      check the address range on x86, but does check the page protections for
      being a user page).  So it wasn't actually a bug, but the fact that we
      treat the address differently for private and shared futexes threw me
      for a loop.
      
      Just move the check up, so that it gets done for both cases.  Also, use
      the 'rw' parameter for the type, even if it doesn't actually matter any
      more (it's a historical artifact of the old racy i386 "page faults from
      kernel space don't check write protections").
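
      A sketch of the moved check at the top of get_futex_key(), paraphrased
      from the description (the alignment handling shown already precedes it
      in the existing code):

          key->both.offset = address % PAGE_SIZE;
          if (unlikely((address % sizeof(u32)) != 0))
                  return -EINVAL;
          address -= key->both.offset;

          /* now done unconditionally, before the private/shared split */
          if (unlikely(!access_ok(rw, uaddr, sizeof(u32))))
                  return -EFAULT;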
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • futex: fix handling of read-only-mapped hugepages · f12d5bfc
      Committed by Linus Torvalds
      The hugepage code had the exact same bug that regular pages had in
      commit 7485d0d3 ("futexes: Remove rw parameter from
      get_futex_key()").
      
      The regular page case was fixed by commit 9ea71503 ("futex: Fix
      regression with read only mappings"), but the transparent hugepage
      case (added in a5b338f2: "thp: update futex compound knowledge")
      remained broken.
      
      Found by Dave Jones and his trinity tool.
      Reported-and-tested-by: Dave Jones <davej@fedoraproject.org>
      Cc: stable@kernel.org # v2.6.38+
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 11 Dec, 2013 - 2 commits
    • KEYS: correct alignment of system_certificate_list content in assembly file · 62226983
      Committed by Hendrik Brueckner
      Apart from data-type specific alignment constraints, there are also
      architecture-specific alignment requirements.
      For example, on s390 symbols must be on even addresses implying a 2-byte
      alignment.  If the system_certificate_list_end symbol is on an odd address
      and if this address is loaded, the least-significant bit is ignored.  As a
      result, the load_system_certificate_list() fails to load the certificates
      because of a wrong certificate length calculation.
      
      To be safe, align system_certificate_list on an 8-byte boundary.  Also improve
      the length calculation of the system_certificate_list content.  Introduce a
      system_certificate_list_size (8-byte aligned because of unsigned long) variable
      that stores the length.  Let the linker calculate this size by introducing
      a start and end label for the certificate content.
      Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Ignore generated file kernel/x509_certificate_list · 7cfe5b33
      Committed by Rusty Russell
      $ git status
      # On branch pending-rebases
      # Untracked files:
      #   (use "git add <file>..." to include in what will be committed)
      #
      #	kernel/x509_certificate_list
      nothing added to commit but untracked files present (use "git add" to track)
      $
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: David Howells <dhowells@redhat.com>
  6. 08 Dec, 2013 - 1 commit
  7. 06 Dec, 2013 - 1 commit
  8. 29 Nov, 2013 - 2 commits
  9. 28 Nov, 2013 - 2 commits
    • cgroup: fix cgroup_subsys_state leak for seq_files · e605b365
      Committed by Tejun Heo
      If a cgroup file implements either read_map() or read_seq_string(),
      such a file is served using seq_file by overriding file->f_op to
      cgroup_seqfile_operations, which also overrides the release method to
      single_release() from cgroup_file_release().
      
      Because cgroup_file_open() didn't use to acquire any resources, this
      used to be fine, but since f7d58818 ("cgroup: pin
      cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open()
      pins the css (cgroup_subsys_state) which is put by
      cgroup_file_release().  The patch forgot to update the release path
      for seq_files and each open/release cycle leaks a css reference.
      
      Fix it by updating cgroup_file_release() to also handle seq_files and
      using it for seq_file release path too.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org # v3.12
    • cpuset: Fix memory allocator deadlock · 0fc0287c
      Committed by Peter Zijlstra
      Juri hit the below lockdep report:
      
      [    4.303391] ======================================================
      [    4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
      [    4.303394] 3.12.0-dl-peterz+ #144 Not tainted
      [    4.303395] ------------------------------------------------------
      [    4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
      [    4.303399]  (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290
      [    4.303417]
      [    4.303417] and this task is already holding:
      [    4.303418]  (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100
      [    4.303431] which would create a new lock dependency:
      [    4.303432]  (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...}
      [    4.303436]
      
      [    4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock:
      [    4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 {
      [    4.303922]    HARDIRQ-ON-W at:
      [    4.303923]                     [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0
      [    4.303926]                     [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303929]                     [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303931]                     [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303933]    SOFTIRQ-ON-W at:
      [    4.303933]                     [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0
      [    4.303935]                     [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303940]                     [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303955]                     [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303959]    INITIAL USE at:
      [    4.303960]                    [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0
      [    4.303963]                    [<ffffffff8108cbe3>] lock_acquire+0x93/0x140
      [    4.303966]                    [<ffffffff81063dd6>] kthreadd+0x86/0x180
      [    4.303969]                    [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0
      [    4.303972]  }
      
      Which reports that we take mems_allowed_seq with interrupts enabled. A
      little digging found that this can only be from
      cpuset_change_task_nodemask().
      
      This is an actual deadlock because an interrupt doing an allocation will
      hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin
      forever waiting for the write side to complete.
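
      The fix, sketched below on the assumption that it follows the usual
      pattern for protecting a seqcount writer against in-interrupt readers,
      is to disable interrupts around the write side in
      cpuset_change_task_nodemask():

          unsigned long flags;

          task_lock(tsk);
          local_irq_save(flags);
          write_seqcount_begin(&tsk->mems_allowed_seq);

          /* ... update tsk->mems_allowed and rebind the mempolicy ... */

          write_seqcount_end(&tsk->mems_allowed_seq);
          local_irq_restore(flags);
          task_unlock(tsk);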
      
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Reported-by: Juri Lelli <juri.lelli@gmail.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Tested-by: Juri Lelli <juri.lelli@gmail.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
  10. 27 Nov, 2013 - 1 commit
  11. 26 Nov, 2013 - 3 commits
    • ftrace: Fix function graph with loading of modules · 8a56d776
      Committed by Steven Rostedt (Red Hat)
      Commit 8c4f3c3f "ftrace: Check module functions being traced on reload"
      fixed module loading and unloading with respect to function tracing, but
      it missed the function graph tracer. If you perform the following
      
       # cd /sys/kernel/debug/tracing
       # echo function_graph > current_tracer
       # modprobe nfsd
       # echo nop > current_tracer
      
      You'll get the following oops message:
      
       ------------[ cut here ]------------
       WARNING: CPU: 2 PID: 2910 at /linux.git/kernel/trace/ftrace.c:1640 __ftrace_hash_rec_update.part.35+0x168/0x1b9()
       Modules linked in: nfsd exportfs nfs_acl lockd ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables uinput snd_hda_codec_idt
       CPU: 2 PID: 2910 Comm: bash Not tainted 3.13.0-rc1-test #7
       Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
        0000000000000668 ffff8800787efcf8 ffffffff814fe193 ffff88007d500000
        0000000000000000 ffff8800787efd38 ffffffff8103b80a 0000000000000668
        ffffffff810b2b9a ffffffff81a48370 0000000000000001 ffff880037aea000
       Call Trace:
        [<ffffffff814fe193>] dump_stack+0x4f/0x7c
        [<ffffffff8103b80a>] warn_slowpath_common+0x81/0x9b
        [<ffffffff810b2b9a>] ? __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff8103b83e>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff810b2b9a>] __ftrace_hash_rec_update.part.35+0x168/0x1b9
        [<ffffffff81502f89>] ? __mutex_lock_slowpath+0x364/0x364
        [<ffffffff810b2cc2>] ftrace_shutdown+0xd7/0x12b
        [<ffffffff810b47f0>] unregister_ftrace_graph+0x49/0x78
        [<ffffffff810c4b30>] graph_trace_reset+0xe/0x10
        [<ffffffff810bf393>] tracing_set_tracer+0xa7/0x26a
        [<ffffffff810bf5e1>] tracing_set_trace_write+0x8b/0xbd
        [<ffffffff810c501c>] ? ftrace_return_to_handler+0xb2/0xde
        [<ffffffff811240a8>] ? __sb_end_write+0x5e/0x5e
        [<ffffffff81122aed>] vfs_write+0xab/0xf6
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff81122dbd>] SyS_write+0x59/0x82
        [<ffffffff8150a185>] ftrace_graph_caller+0x85/0x85
        [<ffffffff8150a2d2>] system_call_fastpath+0x16/0x1b
       ---[ end trace 940358030751eafb ]---
      
      The above mentioned commit didn't go far enough. It covered the
      function tracer by adding checks in __register_ftrace_function(). The
      problem is that the function graph tracer circumvents those checks (a
      shortcut taken for a slight efficiency gain when the function graph
      tracer runs together with a function tracer; the gain was not worth it).
      
      The problem came with ftrace_startup(), which should always be called
      after __register_ftrace_function() if this bug is to be completely fixed.
      
      Anyway, this solution moves __register_ftrace_function() inside of
      ftrace_startup() and removes the need to call them both.
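
      Sketched, the shape of the fix looks like this (simplified; the real
      ftrace_startup() does considerably more):

          static int ftrace_startup(struct ftrace_ops *ops, int command)
          {
                  int ret;

                  if (unlikely(ftrace_disabled))
                          return -ENODEV;

                  /* registration now happens inside startup itself, so no
                   * caller can arm tracing and bypass the module checks */
                  ret = __register_ftrace_function(ops);
                  if (ret)
                          return ret;

                  /* ... existing startup work: update function records ... */

                  return 0;
          }
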
      Reported-by: Dave Wysochanski <dwysocha@redhat.com>
      Fixes: ed926f9b ("ftrace: Use counters to enable functions to trace")
      Cc: stable@vger.kernel.org # 3.0+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • Revert "workqueue: allow work_on_cpu() to be called recursively" · 12997d1a
      Committed by Bjorn Helgaas
      This reverts commit c2fda509.
      
      c2fda509 removed lockdep annotation from work_on_cpu() to work around
      the PCI path that calls work_on_cpu() from within a work_on_cpu() work item
      (PF driver .probe() method -> pci_enable_sriov() -> add VFs -> VF driver
      .probe method).
      
      961da7fb6b22 ("PCI: Avoid unnecessary CPU switch when calling driver
      .probe() method") avoids that recursive work_on_cpu() use in a different
      way, so this revert restores the work_on_cpu() lockdep annotation.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
    • irq: Enable all irqs unconditionally in irq_resume · ac01810c
      Committed by Laxman Dewangan
      When the system enters suspend, it disables all interrupts in
      suspend_device_irqs(), including the interrupts marked EARLY_RESUME.
      
      On the resume side things are different. The EARLY_RESUME interrupts
      are reenabled in sys_core_ops->resume and the non EARLY_RESUME
      interrupts are reenabled in the normal system resume path.
      
      When suspend_noirq() failed or suspend is aborted for any other
      reason, we might omit the resume side call to sys_core_ops->resume()
      and therefore the interrupts marked EARLY_RESUME are not reenabled and
      stay disabled forever.
      
      To solve this, enable all irqs unconditionally in irq_resume()
      regardless of whether interrupts marked EARLY_RESUME have already been
      enabled or not.
      
      This might try to reenable already enabled interrupts in the
      non-failure case, but the only affected platform is XEN and it has been
      confirmed that it does not cause any side effects.
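
      A sketch of how the shared resume helper might look after the change
      (paraphrased; IRQF_EARLY_RESUME marks the early interrupts):

          static void resume_irqs(bool want_early)
          {
                  struct irq_desc *desc;
                  int irq;

                  for_each_irq_desc(irq, desc) {
                          unsigned long flags;
                          bool is_early = desc->action &&
                                  (desc->action->flags & IRQF_EARLY_RESUME);

                          /* the early pass still skips normal irqs, but the
                           * late pass now enables everything, early included */
                          if (!is_early && want_early)
                                  continue;

                          raw_spin_lock_irqsave(&desc->lock, flags);
                          __enable_irq(desc, irq, true);
                          raw_spin_unlock_irqrestore(&desc->lock, flags);
                  }
          }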
      
      [ tglx: Massaged changelog. ]
      Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
      Acked-by-and-tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Heiko Stuebner <heiko@sntech.de>
      Reviewed-by: Pavel Machek <pavel@ucw.cz>
      Cc: <ian.campbell@citrix.com>
      Cc: <rjw@rjwysocki.net>
      Cc: <len.brown@intel.com>
      Cc: <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1385388587-16442-1-git-send-email-ldewangan@nvidia.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  12. 23 Nov, 2013 - 6 commits
    • workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues · 4e8b22bd
      Committed by Li Bin
      When a work item starts execution, the high bits of the work's data
      contain the pool ID; the maximum value they can represent is
      WORK_OFFQ_POOL_NONE.  The pool ID is set to WORK_OFFQ_POOL_NONE when
      the work is being initialized, indicating that no pool is associated,
      and get_work_pool() uses it to look up the associated pool.  So if
      worker_pool_assign_id() assigns an ID greater than or equal to
      WORK_OFFQ_POOL_NONE to a pool, the ID leaks into the reserved range
      and may break the non-reentrance guarantee.
      
      This patch fixes the issue by having worker_pool_assign_id() call
      idr_alloc() with the @end param set to WORK_OFFQ_POOL_NONE.
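
      Sketched, the fix caps the allocation just below the sentinel
      (simplified from the description above):

          static int worker_pool_assign_id(struct worker_pool *pool)
          {
                  int ret;

                  lockdep_assert_held(&wq_pool_mutex);

                  /* @end = WORK_OFFQ_POOL_NONE: a real pool ID can never
                   * alias the "no pool" sentinel stored in work->data */
                  ret = idr_alloc(&worker_pool_idr, pool, 0,
                                  WORK_OFFQ_POOL_NONE, GFP_KERNEL);
                  if (ret >= 0) {
                          pool->id = ret;
                          return 0;
                  }
                  return ret;
          }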
      
      Furthermore, in the current implementation, the BUILD_BUG_ON() in
      init_workqueues makes no sense. The number of worker pools needed
      cannot be determined at compile time, because the number of backing
      pools for UNBOUND workqueues is dynamic based on the assigned custom
      attributes. So remove it.
      
      tj: Minor comment and indentation updates.
      Signed-off-by: Li Bin <huawei.libin@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: fix comment typo for __queue_work() · 9ef28a73
      Committed by Li Bin
      It seems the "dying" should be "draining" here.
      Signed-off-by: Li Bin <huawei.libin@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: fix ordered workqueues in NUMA setups · 8a2b7538
      Committed by Tejun Heo
      An ordered workqueue implements execution ordering by using a single
      pool_workqueue with max_active == 1.  On a given pool_workqueue, work
      items are processed in FIFO order, and limiting max_active to 1 forces
      the queued work items to be processed one by one.
      
      Unfortunately, 4c16bd32 ("workqueue: implement NUMA affinity for
      unbound workqueues") accidentally broke this guarantee by applying
      NUMA affinity to ordered workqueues too.  On NUMA setups, an ordered
      workqueue would end up with separate pool_workqueues for different
      nodes.  Each pool_workqueue still limits max_active to 1 but multiple
      work items may be executed concurrently and out of order depending on
      which node they are queued to.
      
      Fix it by using dedicated ordered_wq_attrs[] when creating ordered
      workqueues.  The new attrs match the unbound ones except that no_numa
      is always set thus forcing all NUMA nodes to share the default
      pool_workqueue.
      
      While at it, add a sanity check in the workqueue creation path which
      verifies that an ordered workqueue has only the default
      pool_workqueue.
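
      Roughly, the setup of the new attrs looks like this (a sketch; helper
      and variable names follow the workqueue code of this era):

          /* dedicated attrs for ordered workqueues: no_numa forces all
           * NUMA nodes onto the single default pool_workqueue */
          for (i = 0; i < NR_STD_WORKER_POOLS; i++) {
                  struct workqueue_attrs *attrs;

                  attrs = alloc_workqueue_attrs(GFP_KERNEL);
                  BUG_ON(!attrs);
                  attrs->nice = std_nice[i];
                  attrs->no_numa = true;
                  ordered_wq_attrs[i] = attrs;
          }
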
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Libin <huawei.libin@huawei.com>
      Cc: stable@vger.kernel.org
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
    • workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY · 91151228
      Committed by Oleg Nesterov
      Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed()
      in create_worker(). Otherwise userland can change ->cpus_allowed
      in between.
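
      Sketched, the reordering in create_worker() (simplified):

          /* mark the task unmovable from userspace *before* binding it,
           * closing the window where sched_setaffinity() could slip in */
          worker->task->flags |= PF_NO_SETAFFINITY;

          /* ... */
          set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
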
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • cgroup: use a dedicated workqueue for cgroup destruction · e5fca243
      Committed by Tejun Heo
      Since be445626 ("cgroup: remove synchronize_rcu() from
      cgroup_diput()"), cgroup destruction path makes use of workqueue.  css
      freeing is performed from a work item from that point on and a later
      commit, ea15f8cc ("cgroup: split cgroup destruction into two
      steps"), moves css offlining to workqueue too.
      
      As cgroup destruction isn't depended upon for memory reclaim, the
      destruction work items were put on the system_wq; unfortunately, some
      controller may block in the destruction path for considerable duration
      while holding cgroup_mutex.  As large part of destruction path is
      synchronized through cgroup_mutex, when combined with high rate of
      cgroup removals, this has potential to fill up system_wq's max_active
      of 256.
      
      Also, it turns out that memcg's css destruction path ends up queueing
      and waiting for work items on system_wq through work_on_cpu().  If
      such operation happens while system_wq is fully occupied by cgroup
      destruction work items, work_on_cpu() can't make forward progress
      because system_wq is full and other destruction work items on
      system_wq can't make forward progress because the work item waiting
      for work_on_cpu() is holding cgroup_mutex, leading to deadlock.
      
      This can be fixed by queueing destruction work items on a separate
      workqueue.  This patch creates a dedicated workqueue -
      cgroup_destroy_wq - for this purpose.  As these work items shouldn't
      have inter-dependencies and are mostly serialized by cgroup_mutex
      anyway, a high concurrency level doesn't buy anything, so the
      workqueue's @max_active is set to 1 and destruction work items are
      executed one by one on each CPU.
      
      Hugh Dickins: Because cgroup_init() is run before init_workqueues(),
      cgroup_destroy_wq can't be allocated from cgroup_init().  Do it from a
      separate core_initcall().  In the future, we probably want to reorder
      so that workqueue init happens before cgroup_init().
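
      A minimal sketch of the resulting setup (names as described above):

          static struct workqueue_struct *cgroup_destroy_wq;

          static int __init cgroup_wq_init(void)
          {
                  /* runs after init_workqueues(), unlike cgroup_init() */
                  cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
                  BUG_ON(!cgroup_destroy_wq);
                  return 0;
          }
          core_initcall(cgroup_wq_init);
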
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Hugh Dickins <hughd@google.com>
      Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
      Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com
      Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils
      Cc: stable@vger.kernel.org # v3.9+
    • time: Fix 1ns/tick drift w/ GENERIC_TIME_VSYSCALL_OLD · 4be77398
      Committed by Martin Schwidefsky
      Since commit 1e75fa8b (time: Condense timekeeper.xtime
      into xtime_sec - merged in v3.6), there has been a problem
      with the error accounting in the timekeeping code, such that
      when truncating to nanoseconds, we round up to the next nsec,
      but the balancing adjustment to the ntp_error value was dropped.
      
      This causes 1ns per tick drift forward of the clock.
      
      In 3.7, this logic was isolated to only GENERIC_TIME_VSYSCALL_OLD
      architectures (s390, ia64, powerpc).
      
      The fix is simply to balance the accounting and to subtract the
      added nanosecond from ntp_error. This allows the internal long-term
      clock steering to keep the clock accurate.
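
      A sketch of the balancing fix, paraphrased from the description (field
      names assume the timekeeper layout of this era):

          remainder = tk->xtime_nsec & ((1ULL << tk->shift) - 1);
          if (remainder != 0) {
                  /* round xtime_nsec up to the next nsec, as before ... */
                  tk->xtime_nsec -= remainder;
                  tk->xtime_nsec += 1ULL << tk->shift;
                  /* ... but feed the rounding error back into ntp_error so
                   * the long-term steering cancels it */
                  tk->ntp_error += remainder << tk->ntp_error_shift;
                  tk->ntp_error -= (1ULL << tk->shift) << tk->ntp_error_shift;
          }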
      
      While this fix removes the regression added in 1e75fa8b, the
      ideal solution is to move away from GENERIC_TIME_VSYSCALL_OLD
      and use the new VSYSCALL method, which avoids entirely the
      nanosecond granular rounding, and the resulting short-term clock
      adjustment oscillation needed to keep long term accurate time.
      
      [ jstultz: Many thanks to Martin for his efforts identifying this
        	   subtle bug, and providing the fix. ]
      
      Originally-from: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Paul Turner <pjt@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable <stable@vger.kernel.org>  #v3.6+
      Link: http://lkml.kernel.org/r/1385149491-20307-1-git-send-email-john.stultz@linaro.org
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  13. 20 Nov, 2013 - 5 commits
  14. 19 Nov, 2013 - 5 commits