1. 21 5月, 2010 6 次提交
  2. 19 5月, 2010 4 次提交
  3. 18 5月, 2010 3 次提交
  4. 17 5月, 2010 2 次提交
    • A
      atomic_t: Remove volatile from atomic_t definition · 81880d60
      Anton Blanchard 提交于
      When looking at a performance problem on PowerPC, I noticed some awful code
      generation:
      
      c00000000051fc98:       3b 60 00 01     li      r27,1
      ...
      c00000000051fca0:       3b 80 00 00     li      r28,0
      ...
      c00000000051fcdc:       93 61 00 70     stw     r27,112(r1)
      c00000000051fce0:       93 81 00 74     stw     r28,116(r1)
      c00000000051fce4:       81 21 00 70     lwz     r9,112(r1)
      c00000000051fce8:       80 01 00 74     lwz     r0,116(r1)
      c00000000051fcec:       7d 29 07 b4     extsw   r9,r9
      c00000000051fcf0:       7c 00 07 b4     extsw   r0,r0
      
      c00000000051fcf4:       7c 20 04 ac     lwsync
      c00000000051fcf8:       7d 60 f8 28     lwarx   r11,0,r31
      c00000000051fcfc:       7c 0b 48 00     cmpw    r11,r9
      c00000000051fd00:       40 c2 00 10     bne-    c00000000051fd10
      c00000000051fd04:       7c 00 f9 2d     stwcx.  r0,0,r31
      c00000000051fd08:       40 c2 ff f0     bne+    c00000000051fcf8
      c00000000051fd0c:       4c 00 01 2c     isync
      
      We create two constants, write them out to the stack, read them straight back
      in and sign extend them. What a mess.
      
      It turns out this bad code is a result of us defining atomic_t as a
      volatile int.
      
      We removed the volatile attribute from the powerpc atomic_t definition years
      ago, but commit ea435467 (atomic_t: unify all
      arch definitions) added it back in.
      
      To dig up an old quote from Linus:
      
      > The fact is, volatile on data structures is a bug. It's a wart in the C
      > language. It shouldn't be used.
      >
      > Volatile accesses in *code* can be ok, and if we have "atomic_read()"
      > expand to a "*(volatile int *)&(x)->value", then I'd be ok with that.
      >
      > But marking data structures volatile just makes the compiler screw up
      > totally, and makes code for initialization sequences etc much worse.
      
      And screw up it does :)
      
      With the volatile removed, we see much more reasonable code generation:
      
      c00000000051f5b8:       3b 60 00 01     li      r27,1
      ...
      c00000000051f5c0:       3b 80 00 00     li      r28,0
      ...
      
      c00000000051fc7c:       7c 20 04 ac     lwsync
      c00000000051fc80:       7c 00 f8 28     lwarx   r0,0,r31
      c00000000051fc84:       7c 00 d8 00     cmpw    r0,r27
      c00000000051fc88:       40 c2 00 10     bne-    c00000000051fc98
      c00000000051fc8c:       7f 80 f9 2d     stwcx.  r28,0,r31
      c00000000051fc90:       40 c2 ff f0     bne+    c00000000051fc80
      c00000000051fc94:       4c 00 01 2c     isync
      
      Six instructions less.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81880d60
    • A
      atomic_t: Cast to volatile when accessing atomic variables · f3d46f9d
      Anton Blanchard 提交于
      In preparation for removing volatile from the atomic_t definition, this
      patch adds a volatile cast to all the atomic read functions.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f3d46f9d
  5. 16 5月, 2010 2 次提交
  6. 15 5月, 2010 9 次提交
    • A
      Fix the regression created by "set S_DEAD on unlink()..." commit · d83c49f3
      Al Viro 提交于
      1) i_flags simply doesn't work for mount/unlink race prevention;
      we may have many links to file and rm on one of those obviously
      shouldn't prevent bind on top of another later on.  To fix it
      right way we need to mark _dentry_ as unsuitable for mounting
      upon; new flag (DCACHE_CANT_MOUNT) is protected by d_flags and
      i_mutex on the inode in question.  Set it (with dont_mount(dentry))
      in unlink/rmdir/etc., check (with cant_mount(dentry)) in places
      in namespace.c that used to check for S_DEAD.  Setting S_DEAD
      is still needed in places where we used to set it (for directories
      getting killed), since we rely on it for readdir/rmdir race
      prevention.
      
      2) rename()/mount() protection has another bogosity - we unhash
      the target before we'd checked that it's not a mountpoint.  Fixed.
      
      3) ancient bogosity in pivot_root() - we locked i_mutex on the
      right directory, but checked S_DEAD on the different (and wrong)
      one.  Noticed and fixed.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      d83c49f3
    • S
      tracing: Comment the use of event_mutex with trace event flags · 1eaa4787
      Steven Rostedt 提交于
      The flags variable is protected by the event_mutex when modifying,
      but the event_mutex is not held when reading the variable.
      
      This is due to the fact that the reads occur in critical sections where
      taking a mutex (or even a spinlock) is not wanted.
      
      But the two flags that exist (enable and filter_active) have the code
      written as such to handle the reads to not need a lock.
      
      The enable flag is used just to know if the event is enabled or not
      and its use is always under the event_mutex. Whether or not the event
      is actually enabled is really determined by the tracepoint being
      registered. The flag is just a way to let the code know if the tracepoint
      is registered.
      
      The filter_active is different. It is read without the lock. If it
      is set, then the event probes jump to the filter code. There can be a
      slight mismatch between filters available and filter_active. If the flag is
      set but no filters are available, the code safely jumps to a filter nop.
      If the flag is not set and the filters are available, then the filters
      are skipped. This is acceptable since filters are usually set before
      tracing or they are set by humans, which would not notice the slight
      delay that this causes.
      
      v2: Fixed typo: "cacheing" -> "caching"
      Reported-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1eaa4787
    • S
      tracing: Combine event filter_active and enable into single flags field · 553552ce
      Steven Rostedt 提交于
      The filter_active and enable both use an int (4 bytes each) to
      set a single flag. We can save 4 bytes per event by combining the
      two into a single integer.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4894944	1018052	 861512	6774508	 675eec	vmlinux.id
      4894871	1012292	 861512	6768675	 674823	vmlinux.flags
      
      This gives us another 5K in savings.
      
      The modification of both the enable and filter fields are done
      under the event_mutex, so it is still safe to combine the two.
      
      Note: Although Mathieu gave his Acked-by, he would like it documented
       that the reads of flags are not protected by the mutex. The way the
       code works, these reads will not break anything, but will have a
       residual effect. Since this behavior is the same even before this
       patch, describing this situation is left to another patch, as this
       patch does not change the behavior, but just brought it to Mathieu's
       attention.
      
      v2: Updated the event trace self test to for this change.
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      553552ce
    • S
      tracing: Remove duplicate id information in event structure · 32c0edae
      Steven Rostedt 提交于
      Now that the trace_event structure is embedded in the ftrace_event_call
      structure, there is no need for the ftrace_event_call id field.
      The id field is the same as the trace_event type field.
      
      Removing the id and re-arranging the structure brings down the tracepoint
      footprint by another 5K.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4895024	1023812	 861512	6780348	 6775bc	vmlinux.print
      4894944	1018052	 861512	6774508	 675eec	vmlinux.id
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      32c0edae
    • S
      tracing: Move print functions into event class · 80decc70
      Steven Rostedt 提交于
      Currently, every event has its own trace_event structure. This is
      fine since the structure is needed anyway. But the print function
      structure (trace_event_functions) is now separate. Since the output
      of the trace event is done by the class (with the exception of events
      defined by DEFINE_EVENT_PRINT), it makes sense to have the class
      define the print functions that all events in the class can use.
      
      This makes a bigger deal with the syscall events since all syscall events
      use the same class. The savings here is another 30K.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
      4900446	1049028	 861512	6810986	 67ed6a	vmlinux.preprint
      4895024	1023812	 861512	6780348	 6775bc	vmlinux.print
      
      To accomplish this, and to let the class know what event is being
      printed, the event structure is embedded in the ftrace_event_call
      structure. This should not be an issues since the event structure
      was created for each event anyway.
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      80decc70
    • S
      tracing: Allow events to share their print functions · a9a57763
      Steven Rostedt 提交于
      Multiple events may use the same method to print their data.
      Instead of having all events have a pointer to their print funtions,
      the trace_event structure now points to a trace_event_functions structure
      that will hold the way to print ouf the event.
      
      The event itself is now passed to the print function to let the print
      function know what kind of event it should print.
      
      This opens the door to consolidating the way several events print
      their output.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
      4900446	1049028	 861512	6810986	 67ed6a	vmlinux.preprint
      
      This change slightly increases the size but is needed for the next change.
      
      v3: Fix the branch tracer events to handle this change.
      
      v2: Fix the new function graph tracer event calls to handle this change.
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a9a57763
    • S
      tracing: Move raw_init from events to class · 0405ab80
      Steven Rostedt 提交于
      The raw_init function pointer in the event is used to initialize
      various kinds of events. The type of initialization needed is usually
      classed to the kind of event it is.
      
      Two events with the same class will always have the same initialization
      function, so it makes sense to move this to the class structure.
      
      Perhaps even making a special system structure would work since
      the initialization is the same for all events within a system.
      But since there's no system structure (yet), this will just move it
      to the class.
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields
      4900382	1048964	 861512	6810858	 67ecea	vmlinux.init
      
      The text grew very slightly, but this is a constant growth that happened
      with the changing of the C files that call the init code.
      The bigger savings is the data which will be saved the more events share
      a class.
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0405ab80
    • S
      tracing: Move fields from event to class structure · 2e33af02
      Steven Rostedt 提交于
      Move the defined fields from the event to the class structure.
      Since the fields of the event are defined by the class they belong
      to, it makes sense to have the class hold the information instead
      of the individual events. The events of the same class would just
      hold duplicate information.
      
      After this change the size of the kernel dropped another 3K:
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
      4900375	1053380	 861512	6815267	 67fe23	vmlinux.fields
      
      Although the text increased, this was mainly due to the C files
      having to adapt to the change. This is a constant increase, where
      new tracepoints will not increase the Text. But the big drop is
      in the data size (as well as needed allocations to hold the fields).
      This will give even more savings as more tracepoints are created.
      
      Note, if just TRACE_EVENT()s are used and not DECLARE_EVENT_CLASS()
      with several DEFINE_EVENT()s, then the savings will be lost. But
      we are pushing developers to consolidate events with DEFINE_EVENT()
      so this should not be an issue.
      
      The kprobes define a unique class to every new event, but are dynamic
      so it should not be a issue.
      
      The syscalls however have a single class but the fields for the individual
      events are different. The syscalls use a metadata to define the
      fields. I moved the fields list from the event to the metadata and
      added a "get_fields()" function to the class. This function is used
      to find the fields. For normal events and kprobes, get_fields() just
      returns a pointer to the fields list_head in the class. For syscall
      events, it returns the fields list_head in the metadata for the event.
      
      v2:  Fixed the syscall fields. The syscall metadata needs a list
           of fields for both enter and exit.
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2e33af02
    • S
      tracing: Remove per event trace registering · 2239291a
      Steven Rostedt 提交于
      This patch removes the register functions of TRACE_EVENT() to enable
      and disable tracepoints. The registering of a event is now down
      directly in the trace_events.c file. The tracepoint_probe_register()
      is now called directly.
      
      The prototypes are no longer type checked, but this should not be
      an issue since the tracepoints are created automatically by the
      macros. If a prototype is incorrect in the TRACE_EVENT() macro, then
      other macros will catch it.
      
      The trace_event_class structure now holds the probes to be called
      by the callbacks. This removes needing to have each event have
      a separate pointer for the probe.
      
      To handle kprobes and syscalls, since they register probes in a
      different manner, a "reg" field is added to the ftrace_event_class
      structure. If the "reg" field is assigned, then it will be called for
      enabling and disabling of the probe for either ftrace or perf. To let
      the reg function know what is happening, a new enum (trace_reg) is
      created that has the type of control that is needed.
      
      With this new rework, the 82 kernel events and 618 syscall events
      has their footprint dramatically lowered:
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4914025	1088868	 861512	6864405	 68be15	vmlinux.class
      4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint
      4900252	1057412	 861512	6819176	 680d68	vmlinux.regs
      
      The size went from 6863829 to 6819176, that's a total of 44K
      in savings. With tracepoints being continuously added, this is
      critical that the footprint becomes minimal.
      
      v5: Added #ifdef CONFIG_PERF_EVENTS around a reference to perf
          specific structure in trace_events.c.
      
      v4: Fixed trace self tests to check probe because regfunc no longer
          exists.
      
      v3: Updated to handle void *data in beginning of probe parameters.
          Also added the tracepoint: check_trace_callback_type_##call().
      
      v2: Changed the callback probes to pass void * and typecast the
          value within the function.
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2239291a
  7. 14 5月, 2010 3 次提交
    • S
      tracing: Let tracepoints have data passed to tracepoint callbacks · 38516ab5
      Steven Rostedt 提交于
      This patch adds data to be passed to tracepoint callbacks.
      
      The created functions from DECLARE_TRACE() now need a mandatory data
      parameter. For example:
      
      DECLARE_TRACE(mytracepoint, int value, value)
      
      Will create the register function:
      
      int register_trace_mytracepoint((void(*)(void *data, int value))probe,
                                      void *data);
      
      As the first argument, all callbacks (probes) must take a (void *data)
      parameter. So a callback for the above tracepoint will look like:
      
      void myprobe(void *data, int value)
      {
      }
      
      The callback may choose to ignore the data parameter.
      
      This change allows callbacks to register a private data pointer along
      with the function probe.
      
      	void mycallback(void *data, int value);
      
      	register_trace_mytracepoint(mycallback, mydata);
      
      Then the mycallback() will receive the "mydata" as the first parameter
      before the args.
      
      A more detailed example:
      
        DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));
      
        /* In the C file */
      
        DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));
      
        [...]
      
             trace_mytracepoint(status);
      
        /* In a file registering this tracepoint */
      
        int my_callback(void *data, int status)
        {
      	struct my_struct my_data = data;
      	[...]
        }
      
        [...]
      	my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
      	init_my_data(my_data);
      	register_trace_mytracepoint(my_callback, my_data);
      
      The same callback can also be registered to the same tracepoint as long
      as the data registered is different. Note, the data must also be used
      to unregister the callback:
      
      	unregister_trace_mytracepoint(my_callback, my_data);
      
      Because of the data parameter, tracepoints declared this way can not have
      no args. That is:
      
        DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());
      
      will cause an error.
      
      If no arguments are needed, a new macro can be used instead:
      
        DECLARE_TRACE_NOARGS(mytracepoint);
      
      Since there are no arguments, the proto and args fields are left out.
      
      This is part of a series to make the tracepoint footprint smaller:
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4914025	1088868	 861512	6864405	 68be15	vmlinux.class
      4918492	1084612	 861512	6864616	 68bee8	vmlinux.tracepoint
      
      Again, this patch also increases the size of the kernel, but
      lays the ground work for decreasing it.
      
       v5: Fixed net/core/drop_monitor.c to handle these updates.
      
       v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
           #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
           cases. The __DECLARE_TRACE() is what changes.
           Thanks to Frederic Weisbecker for pointing this out.
      
       v3: Made all register_* functions require data to be passed and
           all callbacks to take a void * parameter as its first argument.
           This makes the calling functions comply with C standards.
      
           Also added more comments to the modifications of DECLARE_TRACE().
      
       v2: Made the DECLARE_TRACE() have the ability to pass arguments
           and added a new DECLARE_TRACE_NOARGS() for tracepoints that
           do not need any arguments.
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      38516ab5
    • M
      tracepoints: Add check trace callback type · 53da59aa
      Mathieu Desnoyers 提交于
      This check is meant to be used by tracepoint users which do a direct cast of
      callbacks to (void *) for direct registration, thus bypassing the
      register_trace_##name and unregister_trace_##name checks.
      
      This permits to ensure that the callback type matches the function type at the
      call site, but without generating any code.
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      LKML-Reference: <20100430165959.GA25605@Krystal>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Arnaldo Carvalho de Melo <acme@redhat.com>
      CC: Lai Jiangshan <laijs@cn.fujitsu.com>
      CC: Li Zefan <lizf@cn.fujitsu.com>
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      53da59aa
    • S
      tracing: Create class struct for events · 8f082018
      Steven Rostedt 提交于
      This patch creates a ftrace_event_class struct that event structs point to.
      This class struct will be made to hold information to modify the
      events. Currently the class struct only holds the events system name.
      
      This patch slightly increases the size, but this change lays the ground work
      of other changes to make the footprint of tracepoints smaller.
      
      With 82 standard tracepoints, and 618 system call tracepoints
      (two tracepoints per syscall: enter and exit):
      
         text	   data	    bss	    dec	    hex	filename
      4913961	1088356	 861512	6863829	 68bbd5	vmlinux.orig
      4914025	1088868	 861512	6864405	 68be15	vmlinux.class
      
      This patch also cleans up some stale comments in ftrace.h.
      
      v2: Fixed missing semi-colon in macro.
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NMasami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      8f082018
  8. 12 5月, 2010 3 次提交
    • R
      revert "procfs: provide stack information for threads" and its fixup commits · 34441427
      Robin Holt 提交于
      Originally, commit d899bf7b ("procfs: provide stack information for
      threads") attempted to introduce a new feature for showing where the
      threadstack was located and how many pages are being utilized by the
      stack.
      
      Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
      applied to fix the NO_MMU case.
      
      Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
      64-bit") was applied to fix a bug in ia32 executables being loaded.
      
      Commit 9ebd4eba ("procfs: fix /proc/<pid>/stat stack pointer for kernel
      threads") was applied to fix a bug which had kernel threads printing a
      userland stack address.
      
      Commit 1306d603 ('proc: partially revert "procfs: provide stack
      information for threads"') was then applied to revert the stack pages
      being used to solve a significant performance regression.
      
      This patch nearly undoes the effect of all these patches.
      
      The reason for reverting these is it provides an unusable value in
      field 28.  For x86_64, a fork will result in the task->stack_start
      value being updated to the current user top of stack and not the stack
      start address.  This unpredictability of the stack_start value makes
      it worthless.  That includes the intended use of showing how much stack
      space a thread has.
      
      Other architectures will get different values.  As an example, ia64
      gets 0.  The do_fork() and copy_process() functions appear to treat the
      stack_start and stack_size parameters as architecture specific.
      
      I only partially reverted c44972f1 ("procfs: disable per-task stack usage
      on NOMMU") .  If I had completely reverted it, I would have had to change
      mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
      configured.  Since I could not test the builds without significant effort,
      I decided to not change mm/Makefile.
      
      I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
      information for threads on 64-bit") .  I left the KSTK_ESP() change in
      place as that seemed worthwhile.
      Signed-off-by: NRobin Holt <holt@sgi.com>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      34441427
    • F
      dma-mapping: fix dma_sync_single_range_* · f33d7e2d
      FUJITA Tomonori 提交于
      dma_sync_single_range_for_cpu() and dma_sync_single_range_for_device() use
      a wrong address with a partial synchronization.
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f33d7e2d
    • P
      rcu: remove all rcu head initializations, except on_stack initializations · 72d5a9f7
      Paul E. McKenney 提交于
      Remove all rcu head inits. We don't care about the RCU head state before passing
      it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can
      keep track of objects on stack.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      72d5a9f7
  9. 11 5月, 2010 8 次提交
    • C
      sched, wait: Use wrapper functions · a93d2f17
      Changli Gao 提交于
      epoll should not touch flags in wait_queue_t. This patch introduces a new
      function __add_wait_queue_exclusive(), for the users, who use wait queue as a
      LIFO queue.
      
      __add_wait_queue_tail_exclusive() is introduced too instead of
      add_wait_queue_exclusive_locked(). remove_wait_queue_locked() is removed, as
      it is a duplicate of __remove_wait_queue(), disliked by users, and with less
      users.
      Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: <containers@lists.linux-foundation.org>
      LKML-Reference: <1273214006-2979-1-git-send-email-xiaosuo@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a93d2f17
    • I
      Revert "perf: Fix exit() vs PERF_FORMAT_GROUP" · e3174cfd
      Ingo Molnar 提交于
      This reverts commit 4fd38e45.
      
      It causes various crashes and hangs when events are activated.
      
      The cause is not fully understood yet but we need to revert it
      because the effects are severe.
      Reported-by: NStephane Eranian <eranian@google.com>
      Reported-by: NLin Ming <ming.m.lin@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e3174cfd
    • M
      rcu head introduce rcu head init on stack · 4376030a
      Mathieu Desnoyers 提交于
      PEM:
      o     Would it be possible to make this bisectable as follows?
      
            a.      Insert a new patch after current patch 4/6 that
                    defines destroy_rcu_head_on_stack(),
                    init_rcu_head_on_stack(), and init_rcu_head() with
                    their !CONFIG_DEBUG_OBJECTS_RCU_HEAD definitions.
      
      This patch performs this transition.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: akpm@linux-foundation.org
      CC: mingo@elte.hu
      CC: laijs@cn.fujitsu.com
      CC: dipankar@in.ibm.com
      CC: josh@joshtriplett.org
      CC: dvhltc@us.ibm.com
      CC: niv@us.ibm.com
      CC: tglx@linutronix.de
      CC: peterz@infradead.org
      CC: rostedt@goodmis.org
      CC: Valdis.Kletnieks@vt.edu
      CC: dhowells@redhat.com
      CC: eric.dumazet@gmail.com
      CC: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      4376030a
    • M
      Debugobjects transition check · a5d8e467
      Mathieu Desnoyers 提交于
      Implement a basic state machine checker in the debugobjects.
      
      This state machine checker detects races and inconsistencies within the "active"
      life of a debugobject. The checker only keeps track of the current state; all
      the state machine logic is kept at the object instance level.
      
      The checker works by adding a supplementary "unsigned int astate" field to the
      debug_obj structure. It keeps track of the current "active state" of the object.
      
      The only constraints that are imposed on the states by the debugobjects system
      is that:
      
      - activation of an object sets the current active state to 0,
      - deactivation of an object expects the current active state to be 0.
      
      For the rest of the states, the state mapping is determined by the specific
      object instance. Therefore, the logic keeping track of the state machine is
      within the specialized instance, without any need to know about it at the
      debugobject level.
      
      The current object active state is changed by calling:
      
      debug_object_active_state(addr, descr, expect, next)
      
      where "expect" is the expected state and "next" is the next state to move to if
      the expected state is found. A warning is generated if the expected is not
      found.
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      CC: akpm@linux-foundation.org
      CC: mingo@elte.hu
      CC: laijs@cn.fujitsu.com
      CC: dipankar@in.ibm.com
      CC: josh@joshtriplett.org
      CC: dvhltc@us.ibm.com
      CC: niv@us.ibm.com
      CC: peterz@infradead.org
      CC: rostedt@goodmis.org
      CC: Valdis.Kletnieks@vt.edu
      CC: dhowells@redhat.com
      CC: eric.dumazet@gmail.com
      CC: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      a5d8e467
    • P
      rcu: make SRCU usable in modules · d14aada8
      Paul E. McKenney 提交于
      Add a #include for mutex.h to allow SRCU to be more easily used in
      kernel modules.
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      d14aada8
    • P
      rcu: slim down rcutiny by removing rcu_scheduler_active and friends · bbad9379
      Paul E. McKenney 提交于
      TINY_RCU does not need rcu_scheduler_active unless CONFIG_DEBUG_LOCK_ALLOC.
      So conditionally compile rcu_scheduler_active in order to slim down
      rcutiny a bit more.  Also gets rid of an EXPORT_SYMBOL_GPL, which is
      responsible for most of the slimming.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      bbad9379
    • P
      rcu: refactor RCU's context-switch handling · 25502a6c
      Paul E. McKenney 提交于
      The addition of preemptible RCU to treercu resulted in a bit of
      confusion and inefficiency surrounding the handling of context switches
      for RCU-sched and for RCU-preempt.  For RCU-sched, a context switch
      is a quiescent state, pure and simple, just like it always has been.
      For RCU-preempt, a context switch is in no way a quiescent state, but
      special handling is required when a task blocks in an RCU read-side
      critical section.
      
      However, the callout from the scheduler and the outer loop in ksoftirqd
      still calls something named rcu_sched_qs(), whose name is no longer
      accurate.  Furthermore, when rcu_check_callbacks() notes an RCU-sched
      quiescent state, it ends up unnecessarily (though harmlessly, aside
      from the performance hit) enqueuing the current task if it happens to
      be running in an RCU-preempt read-side critical section.  This not only
      increases the maximum latency of scheduler_tick(), it also needlessly
      increases the overhead of the next outermost rcu_read_unlock() invocation.
      
      This patch addresses this situation by separating the notion of RCU's
      context-switch handling from that of RCU-sched's quiescent states.
      The context-switch handling is covered by rcu_note_context_switch() in
      general and by rcu_preempt_note_context_switch() for preemptible RCU.
      This permits rcu_sched_qs() to handle quiescent states and only quiescent
      states.  It also reduces the maximum latency of scheduler_tick(), though
      probably by much less than a microsecond.  Finally, it means that tasks
      within preemptible-RCU read-side critical sections avoid incurring the
      overhead of queuing unless there really is a context switch.
      Suggested-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Acked-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      25502a6c
    • P
      rcu: shrink rcutiny by making synchronize_rcu_bh() be inline · da848c47
      Paul E. McKenney 提交于
      Because synchronize_rcu_bh() is identical to synchronize_sched(),
      make the former a static inline invoking the latter, saving the
      overhead of an EXPORT_SYMBOL_GPL() and the duplicate code.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      da848c47