1. 15 3月, 2013 15 次提交
    • S
      tracing: Add a way to soft disable trace events · 417944c4
      Steven Rostedt (Red Hat) 提交于
      In order to let triggers enable or disable events, we need a 'soft'
      method for doing so. For example, if a function probe is added that
      lets a user enable or disable events when a function is called, that
      change must be done without taking locks or a mutex, and definitely
      it can't sleep. But the full enabling of a tracepoint is expensive.
      
      By adding a 'SOFT_DISABLE' flag, and converting the flags to be updated
      without the protection of a mutex (using set/clear_bit()), this soft
      disable flag can be used to allow critical sections to enable or disable
      events from being traced (after the event has been placed into "SOFT_MODE").
      
      Some caveats though: The comm recorder (to map pids with a comm) can not
      be soft disabled (yet). If you disable an event with with a "soft"
      disable and wait a while before reading the trace, the comm cache may be
      replaced and you'll get a bunch of <...> for comms in the trace.
      
      Reading the "enable" file for an event that is disabled will now give
      you "0*" where the '*' denotes that the tracepoint is still active but
      the event itself is "disabled".
      
      [ fixed _BIT used in & operation : thanks to Dan Carpenter and smatch ]
      
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      417944c4
    • S
      ftrace: Clean up function probe methods · e67efb93
      Steven Rostedt (Red Hat) 提交于
      When a function probe is created, each function that the probe is
      attached to, a "callback" method is called. On release of the probe,
      each function entry calls the "free" method.
      
      First, "callback" is a confusing name and does not really match what
      it does. Callback sounds like it will be called when the probe
      triggers. But that's not the case. This is really an "init" function,
      so lets rename it as such.
      
      Secondly, both "init" and "free" do not pass enough information back
      to the handlers. Pass back the ops, ip and data for each time the
      method is called. We have the information, might as well use it.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e67efb93
    • S
      tracing: Fix comments for ftrace_event_file/call flags · 57d01ad0
      Steven Rostedt (Red Hat) 提交于
      Most of the flags for the struct ftrace_event_file were moved over
      to the flags of the struct ftrace_event_call, but the comments were
      never updated.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      57d01ad0
    • S
      tracing: Optimize trace_printk() with one arg to use trace_puts() · 9d3c752c
      Steven Rostedt (Red Hat) 提交于
      Although trace_printk() is extremely fast, especially when it uses
      trace_bprintk() (writes args straight to buffer instead of inserting
      into string), it still has the overhead of calling one of the printf
      sprintf() functions, that need to scan the fmt string to determine
      what, if any args it has.
      
      This is a waste of precious CPU cycles if the printk format has no
      args but a single constant string. It is better to use trace_puts()
      which does not have the overhead of the fmt scanning.
      
      But wouldn't it be nice if the developer didn't have to think about
      such things, and the compile would just do it for them?
      
        trace_printk("this string has no args\n");
        [...]
        trace_printk("this sting does %p %d\n", foo, bar);
      
      As tracing is critical to have the least amount of overhead,
      especially when dealing with race conditions, and you want to
      eliminate any "Heisenbugs", you want the trace_printk() to use the
      fastest possible means of tracing.
      
      Currently the macro magic determines if it will use trace_bprintk()
      or if the fmt is a dynamic string (a variable), it will fall
      back to the slow trace_printk() method that does a full snprintf()
      before copying it into the buffer, where as trace_bprintk() only
      copys the pointer to the fmt and the args into the buffer.
      
      Well, now there's a way to spend some more Hogwarts cash and come
      up with new fancy macro magic.
      
        #define trace_printk(fmt, ...)			\
        do {							\
      	char _______STR[] = __stringify((__VA_ARGS__));	\
      	if (sizeof(_______STR) > 3)			\
      		do_trace_printk(fmt, ##__VA_ARGS__);	\
      	else						\
      		trace_puts(fmt);			\
        } while (0)
      
      The above needs a bit of explaining (both here and in the comments).
      
      By stringifying the __VA_ARGS__, we can, at compile time, determine
      the number of args that are being passed to trace_printk(). The extra
      parenthesis are required, otherwise the compiler complains about
      too many parameters for __stringify if there is more than one arg.
      
      When there are no args, the __stringify((__VA_ARGS__)) converts into
      "()\0", a string of 3 characters. Anything else, will be a string
      containing more than 3 characters. Now we assign that string to a
      dynamic char array, and then take the sizeof() of that array.
      If it is greater than 3 characters, we know trace_printk() has args
      and we need to do the full "do_trace_printk()" on them, otherwise
      it was only passed a single arg and we can optimize to use trace_puts().
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven "The King of Nasty Macros!" Rostedt <rostedt@goodmis.org>
      9d3c752c
    • S
      tracing: Add trace_puts() for even faster trace_printk() tracing · 09ae7234
      Steven Rostedt (Red Hat) 提交于
      The trace_printk() is extremely fast and is very handy as it can be
      used in any context (including NMIs!). But it still requires scanning
      the fmt string for parsing the args. Even the trace_bprintk() requires
      a scan to know what args will be saved, although it doesn't copy the
      format string itself.
      
      Several times trace_printk() has no args, and wastes cpu cycles scanning
      the fmt string.
      
      Adding trace_puts() allows the developer to use an even faster
      tracing method that only saves the pointer to the string in the
      ring buffer without doing any format parsing at all. This will
      help remove even more of the "Heisenbug" effect, when debugging.
      
      Also fixed up the F_printk()s for the ftrace internal bprint and print events.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      09ae7234
    • S
      tracing: Add internal tracing_snapshot() functions · ad909e21
      Steven Rostedt (Red Hat) 提交于
      The new snapshot feature is quite handy. It's a way for the user
      to take advantage of the spare buffer that, until then, only
      the latency tracers used to "snapshot" the buffer when it hit
      a max latency. Now users can trigger a "snapshot" manually when
      some condition is hit in a program. But a snapshot currently can
      not be triggered by a condition inside the kernel.
      
      With the addition of tracing_snapshot() and tracing_snapshot_alloc(),
      snapshots can now be taking when a condition is hit, and the
      developer wants to snapshot the case without stopping the trace.
      
      Note, any snapshot will overwrite the old one, so take care
      in how this is done.
      
      These new functions are to be used like tracing_on(), tracing_off()
      and trace_printk() are. That is, they should never be called
      in the mainline Linux kernel. They are solely for the purpose
      of debugging.
      
      The tracing_snapshot() will not allocate a buffer, but it is
      safe to be called from any context (except NMIs). But if a
      snapshot buffer isn't allocated when it is called, it will write
      to the live buffer, complaining about the lack of a snapshot
      buffer, and then stop tracing (giving you the "permanent snapshot").
      
      tracing_snapshot_alloc() will allocate the snapshot buffer if
      it was not already allocated and then take the snapshot. This routine
      *may sleep*, and must be called from context that can sleep.
      The allocation is done with GFP_KERNEL and not atomic.
      
      If you need a snapshot in an atomic context, say in early boot,
      then it is best to call the tracing_snapshot_alloc() before then,
      where it will allocate the buffer, and then you can use the
      tracing_snapshot() anywhere you want and still get snapshots.
      
      Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ad909e21
    • S
      tracing: Consolidate max_tr into main trace_array structure · 12883efb
      Steven Rostedt (Red Hat) 提交于
      Currently, the way the latency tracers and snapshot feature works
      is to have a separate trace_array called "max_tr" that holds the
      snapshot buffer. For latency tracers, this snapshot buffer is used
      to swap the running buffer with this buffer to save the current max
      latency.
      
      The only items needed for the max_tr is really just a copy of the buffer
      itself, the per_cpu data pointers, the time_start timestamp that states
      when the max latency was triggered, and the cpu that the max latency
      was triggered on. All other fields in trace_array are unused by the
      max_tr, making the max_tr mostly bloat.
      
      This change removes the max_tr completely, and adds a new structure
      called trace_buffer, that holds the buffer pointer, the per_cpu data
      pointers, the time_start timestamp, and the cpu where the latency occurred.
      
      The trace_array, now has two trace_buffers, one for the normal trace and
      one for the max trace or snapshot. By doing this, not only do we remove
      the bloat from the max_trace but the instances of traces can now use
      their own snapshot feature and not have just the top level global_trace have
      the snapshot feature and latency tracers for itself.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      12883efb
    • S
      tracing: Only clear trace buffer on module unload if event was traced · 575380da
      Steven Rostedt (Red Hat) 提交于
      Currently, when a module with events is unloaded, the trace buffer is
      cleared. This is just a safety net in case the module might have some
      strange callback when its event is outputted. But there's no reason
      to reset the buffer if the module didn't have any of its events traced.
      
      Add a flag to the event "call" structure called WAS_ENABLED and gets set
      when the event is ever enabled, and this flag never gets cleared. When a
      module gets unloaded, if any of its events have this flag set, then the
      trace buffer will get cleared.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      575380da
    • S
      tracing: Add comment for trace event flag IGNORE_ENABLE · 2a30c11f
      Steven Rostedt (Red Hat) 提交于
      All the trace event flags have comments but the IGNORE_ENABLE flag
      which is set for ftrace internal events that should not be enabled
      via the debugfs "enable" file. That is, if the top level enable file
      is set, it will enable all events. It use to just check the ftrace
      event call descriptor "reg" field and skip those whithout it, but now
      some ftrace internal events have a reg field but still need to be
      skipped. The flag was created to ignore those events.
      
      Now document it.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2a30c11f
    • L
      tracing: Fix some section mismatch warnings · 523c8113
      Li Zefan 提交于
      As we've added __init annotation to field-defining functions, we should
      add __refdata annotation to event_call variables, which reference those
      functions.
      
      Link: http://lkml.kernel.org/r/51343C1F.2050502@huawei.comReported-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      523c8113
    • L
      tracing: Annotate event field-defining functions with __init · 7e4f44b1
      Li Zefan 提交于
      Those functions are called either during kernel boot or module init.
      
      Before:
      
      $ dmesg | grep 'Freeing unused kernel memory'
      Freeing unused kernel memory: 1208k freed
      Freeing unused kernel memory: 1360k freed
      Freeing unused kernel memory: 1960k freed
      
      After:
      
      $ dmesg | grep 'Freeing unused kernel memory'
      Freeing unused kernel memory: 1236k freed
      Freeing unused kernel memory: 1388k freed
      Freeing unused kernel memory: 1960k freed
      
      Link: http://lkml.kernel.org/r/5125877D.5000201@huawei.comSigned-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7e4f44b1
    • L
      tracing: Add a helper function for event print functions · f71130de
      Li Zefan 提交于
      Move duplicate code in event print functions to a helper function.
      
      This shrinks the size of the kernel by ~13K.
      
         text    data     bss     dec     hex filename
      6596137 1743966 10138672        18478775        119f6b7 vmlinux.o.old
      6583002 1743849 10138672        18465523        119c2f3 vmlinux.o.new
      
      Link: http://lkml.kernel.org/r/51258746.2060304@huawei.comSigned-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f71130de
    • S
      tracing/ring-buffer: Move poll wake ups into ring buffer code · 15693458
      Steven Rostedt (Red Hat) 提交于
      Move the logic to wake up on ring buffer data into the ring buffer
      code itself. This simplifies the tracing code a lot and also has the
      added benefit that waiters on one of the instance buffers can be woken
      only when data is added to that instance instead of data added to
      any instance.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      15693458
    • S
      tracing: Pass the ftrace_file to the buffer lock reserve code · ccb469a1
      Steven Rostedt 提交于
      Pass the struct ftrace_event_file *ftrace_file to the
      trace_event_buffer_lock_reserve() (new function that replaces the
      trace_current_buffer_lock_reserver()).
      
      The ftrace_file holds a pointer to the trace_array that is in use.
      In the case of multiple buffers with different trace_arrays, this
      allows different events to be recorded into different buffers.
      
      Also fixed some of the stale comments in include/trace/ftrace.h
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ccb469a1
    • S
      tracing: Separate out trace events from global variables · ae63b31e
      Steven Rostedt 提交于
      The trace events for ftrace are all defined via global variables.
      The arrays of events and event systems are linked to a global list.
      This prevents multiple users of the event system (what to enable and
      what not to).
      
      By adding descriptors to represent the event/file relation, as well
      as to which trace_array descriptor they are associated with, allows
      for more than one set of events to be defined. Once the trace events
      files have a link between the trace event and the trace_array they
      are associated with, we can create multiple trace_arrays that can
      record separate events in separate buffers.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ae63b31e
  2. 19 2月, 2013 1 次提交
  3. 14 2月, 2013 2 次提交
    • T
      smpboot: Allow selfparking per cpu threads · 7d7e499f
      Thomas Gleixner 提交于
      The stop machine threads are still killed when a cpu goes offline. The
      reason is that the thread is used to bring the cpu down, so it can't
      be parked along with the other per cpu threads.
      
      Allow a per cpu thread to be excluded from automatic parking, so it
      can park itself once it's done
      
      Add a create callback function as well.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: Arjan van de Veen <arjan@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Richard Weinberger <rw@linutronix.de>
      Cc: Magnus Damm <magnus.damm@gmail.com>
      Link: http://lkml.kernel.org/r/20130131120741.553993267@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      7d7e499f
    • T
      workqueue: rename cpu_workqueue to pool_workqueue · 112202d9
      Tejun Heo 提交于
      workqueue has moved away from global_cwqs to worker_pools and with the
      scheduled custom worker pools, wforkqueues will be associated with
      pools which don't have anything to do with CPUs.  The workqueue code
      went through significant amount of changes recently and mass renaming
      isn't likely to hurt much additionally.  Let's replace 'cpu' with
      'pool' so that it reflects the current design.
      
      * s/struct cpu_workqueue_struct/struct pool_workqueue/
      * s/cpu_wq/pool_wq/
      * s/cwq/pwq/
      
      This patch is purely cosmetic.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      112202d9
  4. 09 2月, 2013 7 次提交
    • P
      time, Fix setting of hardware clock in NTP code · 84e345e4
      Prarit Bhargava 提交于
      At init time, if the system time is "warped" forward in warp_clock()
      it will differ from the hardware clock by sys_tz.tz_minuteswest.  This time
      difference is not taken into account when ntp updates the hardware clock,
      and this causes the system time to jump forward by this offset every reboot.
      
      The kernel must take this offset into account when writing the system time
      to the hardware clock in the ntp code.  This patch adds
      persistent_clock_is_local which indicates that an offset has been applied
      in warp_clock() and accounts for the "warp" before writing the hardware
      clock.
      
      x86 does not have this problem as rtc writes are software limited to a
      +/-15 minute window relative to the current rtc time.  Other arches, such
      as powerpc, however do a full synchronization of the system time to the
      rtc and will see this problem.
      
      [v2]: generated against tip/timers/core
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      84e345e4
    • H
      unbreak automounter support on 64-bit kernel with 32-bit userspace (v2) · 4f4ffc3a
      Helge Deller 提交于
      automount-support is broken on the parisc architecture, because the existing
      #if list does not include a check for defined(__hppa__). The HPPA (parisc)
      architecture is similiar to other 64bit Linux targets where we have to define
      autofs_wqt_t (which is passed back and forth to user space) as int type which
      has a size of 32bit across 32 and 64bit kernels.
      
      During the discussion on the mailing list, H. Peter Anvin suggested to invert
      the #if list since only specific platforms (specifically those who do not have
      a 32bit userspace, like IA64 and Alpha) should have autofs_wqt_t as unsigned
      long type.
      
      This suggestion is probably the best way to go, since Arm64 (and maybe others?)
      seems to have a non-working automounter. So in the long run even for other new
      upcoming architectures this inverted check seem to be the best solution, since
      it will not require them to change this #if again (unless they are 64bit only).
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Acked-by: NIan Kent <raven@themaw.net>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      CC: James Bottomley <James.Bottomley@HansenPartnership.com>
      CC: Rolf Eike Beer <eike-kernel@sf-tec.de>
      4f4ffc3a
    • O
      uprobes: Introduce uprobe_apply() · bdf8647c
      Oleg Nesterov 提交于
      Currently it is not possible to change the filtering constraints after
      uprobe_register(), so a consumer can not, say, start to trace a task/mm
      which was previously filtered out, or remove the no longer needed bp's.
      
      Introduce uprobe_apply() which simply does register_for_each_vma() again
      to consult uprobe_consumer->filter() and install/remove the breakpoints.
      The only complication is that register_for_each_vma() can no longer
      assume that uprobe->consumers should be consulter if is_register == T,
      so we change it to accept "struct uprobe_consumer *new" instead.
      
      Unlike uprobe_register(), uprobe_apply(true) doesn't do "unregister" if
      register_for_each_vma() fails, it is up to caller to handle the error.
      
      Note: we probably need to cleanup the current interface, it is strange
      that uprobe_apply/unregister need inode/offset. We should either change
      uprobe_register() to return "struct uprobe *", or add a private ->uprobe
      member in uprobe_consumer. And in the long term uprobe_apply() should
      take a single argument, uprobe or consumer, even "bool add" should go
      away.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      bdf8647c
    • O
      perf: Introduce hw_perf_event->tp_target and ->tp_list · f22c1bb6
      Oleg Nesterov 提交于
      sys_perf_event_open()->perf_init_event(event) is called before
      find_get_context(event), this means that event->ctx == NULL when
      class->reg(TRACE_REG_PERF_REGISTER/OPEN) is called and thus it
      can't know if this event is per-task or system-wide.
      
      This patch adds hw_perf_event->tp_target for PERF_TYPE_TRACEPOINT,
      this is analogous to PERF_TYPE_BREAKPOINT/bp_target we already have.
      The patch also moves ->bp_target up so that it can overlap with the
      new member, this can help the compiler to generate the better code.
      
      trace_uprobe_register() will use it for prefiltering to avoid the
      unnecessary breakpoints in mm's we do not want to trace.
      
      ->tp_target doesn't have its own reference, but we can rely on the
      fact that either sys_perf_event_open() holds a reference, or it is
      equal to event->ctx->task. So this pointer is always valid until
      free_event().
      
      Also add the "struct list_head tp_list" into this union. It is not
      strictly necessary, but it can simplify the next changes and we can
      add it for free.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      f22c1bb6
    • O
      uprobes: Teach handler_chain() to filter out the probed task · da1816b1
      Oleg Nesterov 提交于
      Currrently the are 2 problems with pre-filtering:
      
      1. It is not possible to add/remove a task (mm) after uprobe_register()
      
      2. A forked child inherits all breakpoints and uprobe_consumer can not
         control this.
      
      This patch does the first step to improve the filtering. handler_chain()
      removes the breakpoints installed by this uprobe from current->mm if all
      handlers return UPROBE_HANDLER_REMOVE.
      
      Note that handler_chain() relies on ->register_rwsem to avoid the race
      with uprobe_register/unregister which can add/del a consumer, or even
      remove and then insert the new uprobe at the same address.
      
      Perhaps we will add uprobe_apply_mm(uprobe, mm, is_register) and teach
      copy_mm() to do filter(UPROBE_FILTER_FORK), but I think this change makes
      sense anyway.
      
      Note: instead of checking the retcode from uc->handler, we could add
      uc->filter(UPROBE_FILTER_BPHIT). But I think this is not optimal to
      call 2 hooks in a row. This buys nothing, and if handler/filter do
      something nontrivial they will probably do the same work twice.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      da1816b1
    • O
      uprobes: Reintroduce uprobe_consumer->filter() · 8a7f2fa0
      Oleg Nesterov 提交于
      Finally add uprobe_consumer->filter() and change consumer_filter()
      to actually call this method.
      
      Note that ->filter() accepts mm_struct, not task_struct. Because:
      
      	1. We do not have for_each_mm_user(mm, task).
      
      	2. Even if we implement for_each_mm_user(), ->filter() can
      	   use it itself.
      
      	3. It is not clear who will actually need this interface to
      	   do the "nontrivial" filtering.
      
      Another argument is "enum uprobe_filter_ctx", consumer->filter() can
      use it to figure out why/where it was called. For example, perhaps
      we can add UPROBE_FILTER_PRE_REGISTER used by build_map_info() to
      quickly "nack" the unwanted mm's. In this case consumer should know
      that it is called under ->i_mmap_mutex.
      
      See the previous discussion at http://marc.info/?t=135214229700002
      Perhaps we should pass more arguments, vma/vaddr?
      
      Note: this patch obviously can't help to filter out the child created
      by fork(), this will be addressed later.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      8a7f2fa0
    • O
      uprobes: Kill uprobe_consumer->filter() · fe20d71f
      Oleg Nesterov 提交于
      uprobe_consumer->filter() is pointless in its current form, kill it.
      
      We will add it back, but with the different signature/semantics. Perhaps
      we will even re-introduce the callsite in handler_chain(), but not to
      just skip uc->handler().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      fe20d71f
  5. 08 2月, 2013 6 次提交
  6. 07 2月, 2013 2 次提交
    • L
      workqueue: add delayed_work->wq to simplify reentrancy handling · 60c057bc
      Lai Jiangshan 提交于
      To avoid executing the same work item from multiple CPUs concurrently,
      a work_struct records the last pool it was on in its ->data so that,
      on the next queueing, the pool can be queried to determine whether the
      work item is still executing or not.
      
      A delayed_work goes through timer before actually being queued on the
      target workqueue and the timer needs to know the target workqueue and
      CPU.  This is currently achieved by modifying delayed_work->work.data
      such that it points to the cwq which points to the target workqueue
      and the last CPU the work item was on.  __queue_delayed_work()
      extracts the last CPU from delayed_work->work.data and then combines
      it with the target workqueue to create new work.data.
      
      The only thing this rather ugly hack achieves is encoding the target
      workqueue into delayed_work->work.data without using a separate field,
      which could be a trade off one can make; unfortunately, this entangles
      work->data management between regular workqueue and delayed_work code
      by setting cwq pointer before the work item is actually queued and
      becomes a hindrance for further improvements of work->data handling.
      
      This can be easily made sane by adding a target workqueue field to
      delayed_work.  While delayed_work is used widely in the kernel and
      this does make it a bit larger (<5%), I think this is the right
      trade-off especially given the prospect of much saner handling of
      work->data which currently involves quite tricky memory barrier
      dancing, and don't expect to see any measureable effect.
      
      Add delayed_work->wq and drop the delayed_work->work.data overloading.
      
      tj: Rewrote the description.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      60c057bc
    • L
      workqueue: replace WORK_CPU_NONE/LAST with WORK_CPU_END · 6be19588
      Lai Jiangshan 提交于
      Now that workqueue has moved away from gcwqs, workqueue no longer has
      the need to have a CPU identifier indicating "no cpu associated" - we
      now use WORK_OFFQ_POOL_NONE instead - and most uses of WORK_CPU_NONE
      are gone.
      
      The only left usage is as the end marker for for_each_*wq*()
      iterators, where the name WORK_CPU_NONE is confusing w/o actual
      WORK_CPU_NONE usages.  Similarly, WORK_CPU_LAST which equals
      WORK_CPU_NONE no longer makes sense.
      
      Replace both WORK_CPU_NONE and LAST with WORK_CPU_END.  This patch
      doesn't introduce any functional difference.
      
      tj: s/WORK_CPU_LAST/WORK_CPU_END/ and rewrote the description.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6be19588
  7. 05 2月, 2013 2 次提交
  8. 04 2月, 2013 1 次提交
  9. 01 2月, 2013 4 次提交