1. 25 Feb 2010, 1 commit
    • rcu: Introduce lockdep-based checking to RCU read-side primitives · 632ee200
      Committed by Paul E. McKenney
      Inspection is proving insufficient to catch all RCU misuses,
      which is understandable given that rcu_dereference() might be
      protected by any of four different flavors of RCU (RCU, RCU-bh,
      RCU-sched, and SRCU), and might also/instead be protected by any
      of a number of locking primitives. It is therefore time to
      enlist the aid of lockdep.
      
      This set of patches is inspired by earlier work by Peter
      Zijlstra and Thomas Gleixner, and takes the following approach:
      
      o	Set up separate lockdep classes for RCU, RCU-bh, and RCU-sched.
      
      o	Set up separate lockdep classes for each instance of SRCU.
      
      o	Create primitives that check for being in an RCU read-side
      	critical section.  These return exact answers if lockdep is
      	fully enabled, but if unsure, report being in an RCU read-side
      	critical section.  (We want to avoid false positives!)
      	The primitives are:
      
      	For RCU: rcu_read_lock_held(void)
      
      	For RCU-bh: rcu_read_lock_bh_held(void)
      
      	For RCU-sched: rcu_read_lock_sched_held(void)
      
      	For SRCU: srcu_read_lock_held(struct srcu_struct *sp)
      
      o	Add rcu_dereference_check(), which takes a second argument
      	in which one places a boolean expression based on the above
      	primitives and/or lockdep_is_held().
      
      o	A new kernel configuration parameter, CONFIG_PROVE_RCU, enables
      	rcu_dereference_check().  This depends on CONFIG_PROVE_LOCKING,
      	and should be quite helpful during the transition period while
      	CONFIG_PROVE_RCU-unaware patches are in flight.
      
      The existing rcu_dereference() primitive does no checking, but
      upcoming patches will change that.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-1-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      632ee200
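The checking scheme described in this commit message can be illustrated with a small user-space sketch. It is not the kernel implementation: every `sketch_` name is hypothetical, the single global nesting counter stands in for lockdep's per-task state, and the macro uses the comma operator instead of whatever the kernel macro does internally. It shows the two ideas from the message: the "held" primitive answers exactly only when checking is enabled and otherwise reports "held" to avoid false positives, and the dereference helper takes a boolean condition as its second argument.

```c
#include <stdbool.h>
#include <stdio.h>

/* All names below are hypothetical; this is not the kernel API. */
static bool sketch_lockdep_enabled = true; /* stands in for "lockdep fully enabled" */
static int sketch_read_depth;              /* read-side nesting of the single "task" */
static int sketch_warnings;                /* number of failed checks so far */

static void sketch_read_lock(void)   { sketch_read_depth++; }
static void sketch_read_unlock(void) { sketch_read_depth--; }

/* Exact answer when checking is enabled; otherwise err on the side of "held". */
static bool sketch_read_lock_held(void)
{
	if (!sketch_lockdep_enabled)
		return true;
	return sketch_read_depth > 0;
}

/* Record and report a condition that turned out to be false. */
static void sketch_check(bool ok, const char *expr)
{
	if (!ok) {
		sketch_warnings++;
		fprintf(stderr, "suspicious dereference: (%s) is false\n", expr);
	}
}

/* Evaluate the condition, warn if it is false, then yield the pointer. */
#define sketch_dereference_check(p, c) (sketch_check((c), #c), (p))
```

A caller would write something like `sketch_dereference_check(gp, sketch_read_lock_held())`, possibly OR-ing in a check that some lock is held, which mirrors the "boolean expression" described for the second argument.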
  2. 16 Jan 2010, 1 commit
    • rcu: 1Q2010 update for RCU documentation · 4c54005c
      Committed by Paul E. McKenney
      Add expedited functions.  Review documentation and update
      obsolete verbiage.  Also fix the advice for the RCU CPU-stall
      kernel configuration parameter, and document RCU CPU-stall
      warnings.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12635142581866-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4c54005c
  3. 16 Dec 2009, 1 commit
  4. 04 Dec 2009, 1 commit
  5. 03 Dec 2009, 1 commit
    • rcu: Make RCU's CPU-stall detector be default · 8bfb2f8e
      Committed by Paul E. McKenney
      The RCU_CPU_STALL_DETECTOR costs almost nothing and has located
      some bugs that might otherwise have been difficult to track
      down.  Make it be default for the TREE RCU implementations.
      
      The vmlinux size impact is limited (on 64-bit x86 defconfig):
      
         text	   data	    bss	    dec	    hex	filename
         8440248	1260076	 995588	10695912	 a334e8	vmlinux.before
         8440774	1260060	 995588	10696422	 a336e6	vmlinux.after
      
      +526 bytes - acceptable default cost.
      
      For RAM starved systems, TINY_RCU does not support CPU-stall detection
      and is much smaller, but then again it is a uniprocessor...
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12597846162906-git-send-email->
      [ v2: added image size calculations to the changelog ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8bfb2f8e
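The size figures in this commit message can be checked with a few lines of arithmetic. The sketch below uses hypothetical names and simply recomputes the columns of the table: the quoted +526 bytes is the growth of the text column, while the data column shrinks by 16 bytes, so the dec total grows by 510 bytes.

```c
/* Section sizes in bytes, in the column order of the table above. */
struct sketch_sizes {
	long text;
	long data;
	long bss;
};

/* The "dec" column is the sum of the three sections. */
static long sketch_total(struct sketch_sizes s)
{
	return s.text + s.data + s.bss;
}

/* Per-section difference between two builds. */
static struct sketch_sizes sketch_delta(struct sketch_sizes before,
					struct sketch_sizes after)
{
	struct sketch_sizes d = {
		after.text - before.text,
		after.data - before.data,
		after.bss - before.bss,
	};
	return d;
}

/* Values copied from the vmlinux.before and vmlinux.after rows. */
static const struct sketch_sizes sketch_before = { 8440248, 1260076, 995588 };
static const struct sketch_sizes sketch_after  = { 8440774, 1260060, 995588 };
```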
  6. 16 Nov 2009, 1 commit
  7. 11 Nov 2009, 1 commit
  8. 07 Nov 2009, 1 commit
  9. 27 Oct 2009, 1 commit
  10. 06 Oct 2009, 1 commit
  11. 20 Sep 2009, 1 commit
  12. 02 Sep 2009, 1 commit
    • CRED: Add some configurable debugging [try #6] · e0e81739
      Committed by David Howells
      Add a config option (CONFIG_DEBUG_CREDENTIALS) to turn on some debug checking
      for credential management.  The additional code keeps track of the number of
      pointers from task_structs to any given cred struct, and checks to see that
      this number never exceeds the usage count of the cred struct (which includes
      all references, not just those from task_structs).
      
      Furthermore, if SELinux is enabled, the code also checks that the security
      pointer in the cred struct is never seen to be invalid.
      
      This attempts to catch the bug whereby inode_has_perm() faults in an nfsd
      kernel thread on seeing cred->security be a NULL pointer (it appears that the
      credential struct has been previously released):
      
      	http://www.kerneloops.org/oops.php?number=252883
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      e0e81739
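The invariant this commit message describes can be sketched in user space as follows. The struct and function are hypothetical stand-ins, not the kernel's `struct cred` or its debug helpers, and treating "invalid" as "NULL" for the security pointer is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a credential record. */
struct sketch_cred {
	int usage;        /* all references to this credential */
	int subscribers;  /* references held by task structs only */
	void *security;   /* security-module data attached to the credential */
};

/*
 * Return true if the record satisfies the two checks from the message:
 * task-struct pointers never exceed the usage count, and, when a security
 * module is enabled, the security pointer is not invalid (here: not NULL).
 */
static bool sketch_cred_is_valid(const struct sketch_cred *c, bool lsm_enabled)
{
	if (c->subscribers > c->usage)
		return false;
	if (lsm_enabled && c->security == NULL)
		return false;
	return true;
}
```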
  13. 23 Aug 2009, 2 commits
    • rcu: Remove CONFIG_PREEMPT_RCU · 6b3ef48a
      Committed by Paul E. McKenney
      Now that CONFIG_TREE_PREEMPT_RCU is in place, there is no
      further need for CONFIG_PREEMPT_RCU.  Remove it, along with
      whatever subtle bugs it may (or may not) contain.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <125097461396-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6b3ef48a
    • rcu: Merge preemptable-RCU functionality into hierarchical RCU · f41d911f
      Committed by Paul E. McKenney
      Create a kernel/rcutree_plugin.h file that contains definitions
      for preemptable RCU (or, under the #else branch of the #ifdef,
      empty definitions for the classic non-preemptable semantics).
      These definitions fit into plugins defined in kernel/rcutree.c
      for this purpose.
      
      This variant of preemptable RCU uses a new algorithm whose
      read-side expense is roughly that of classic hierarchical RCU
      under CONFIG_PREEMPT. This new algorithm's update-side expense
      is similar to that of classic hierarchical RCU, and, in absence
      of read-side preemption or blocking, is exactly that of classic
      hierarchical RCU.  Perhaps more important, this new algorithm
      has a much simpler implementation, saving well over 1,000 lines
      of code compared to mainline's implementation of preemptable
      RCU, which will hopefully be retired in favor of this new
      algorithm.
      
      The simplifications are obtained by maintaining per-task
      nesting state for running tasks, and using a simple
      lock-protected algorithm to handle accounting when tasks block
      within RCU read-side critical sections, making use of lessons
      learned while creating numerous user-level RCU implementations
      over the past 18 months.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746134003-git-send-email->
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f41d911f
  14. 20 Aug 2009, 1 commit
  15. 25 Jun 2009, 1 commit
  16. 24 Jun 2009, 1 commit
    • percpu: implement optional weak percpu definitions · 7c756e6e
      Committed by Tejun Heo
      Some archs (alpha and s390) need to use weak definitions for percpu
      variables in modules so that the compiler generates external
      references for them.
      
      This patch implements weak percpu definitions which arch can enable by
      defining ARCH_NEEDS_WEAK_PER_CPU in arch percpu header file.  This
      weak definition adds the following two restrictions on percpu variable
      definitions.
      
        1. percpu symbols must be unique whether static or not
        2. percpu variables can't be defined inside a function
      
      To ensure that these restrictions are observed in generic code, config
      option DEBUG_FORCE_WEAK_PER_CPU enables weak percpu definitions for
      all cases.
      
      This patch is inspired by Ivan Kokshaysky's alpha percpu patch.
      
      [ Impact: stricter rules for percpu variables, one more debug config option ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      7c756e6e
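The two restrictions listed in this commit message follow from how a weak definition has to be written. The sketch below is a hypothetical user-space macro, not the kernel's per-CPU macros, and it assumes a compiler that understands `__attribute__((weak))` (GCC or Clang). It pairs each weak variable with an ordinary global dummy symbol: weak definitions of the same name would otherwise merge silently, whereas a second dummy of the same name fails to build. Because the macro emits global, file-scope definitions, it cannot be combined with `static` or used inside a function.

```c
/*
 * Hypothetical macro: emits a unique, non-weak dummy symbol followed by
 * the weak definition of the variable itself. Requires GCC or Clang.
 */
#define SKETCH_DEFINE_WEAK(type, name)          \
	char sketch_unique_##name = 0;          \
	__attribute__((weak)) type sketch_var_##name

/* Two example variables defined at file scope, as the restrictions require. */
SKETCH_DEFINE_WEAK(int, hits) = 3;
SKETCH_DEFINE_WEAK(long, bytes) = 4096;

/* Accessors, so callers do not depend on the generated symbol names. */
int sketch_read_hits(void)
{
	return sketch_var_hits;
}

long sketch_read_bytes(void)
{
	return sketch_var_bytes;
}
```

Adding a second `SKETCH_DEFINE_WEAK(int, hits)` anywhere in the program would produce a redefinition or duplicate-symbol error through the dummy, which is the kind of uniqueness the first restriction asks for.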
  17. 23 Jun 2009, 1 commit
  18. 21 Jun 2009, 1 commit
  19. 15 Jun 2009, 2 commits
  20. 12 Jun 2009, 2 commits
  21. 09 May 2009, 1 commit
  22. 23 Apr 2009, 1 commit
  23. 07 Apr 2009, 1 commit
  24. 01 Apr 2009, 1 commit
  25. 25 Mar 2009, 2 commits
  26. 05 Mar 2009, 2 commits
  27. 21 Feb 2009, 1 commit
  28. 19 Feb 2009, 1 commit
  29. 22 Jan 2009, 1 commit
  30. 19 Jan 2009, 1 commit
  31. 16 Jan 2009, 1 commit
  32. 08 Jan 2009, 1 commit
    • NOMMU: Make VMAs per MM as for MMU-mode linux · 8feae131
      Committed by David Howells
      Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:
      
       (1) In SYSV SHM where nattch for a segment does not reflect the number of
           shmat's (and forks) done.
      
       (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
           exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
           that a VMA might be shared and already have its vm_mm assigned to another
           process or a dead process.
      
      A new struct (vm_region) is introduced to track a mapped region and to remember
      the circumstances under which it may be shared and the vm_list_struct structure
      is discarded as it's no longer required.
      
      This patch makes the following additional changes:
      
       (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
           with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
           each page has a reference on it held by the region.  Anything else that is
           interested in such a page will have to get a reference on it to retain it.
           When the pages are released due to unmapping, each page is passed to
           put_page() and will be freed when the page usage count reaches zero.
      
       (2) Excess pages are trimmed after an allocation as the allocation must be
           made as a power-of-2 quantity of pages.
      
       (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
           end up with overlapping VMAs within the tree, the VMA struct address is
           appended to the sort key.
      
       (4) Non-anonymous VMAs are now added to the backing inode's prio list.
      
       (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
           the backing region.  The VMA and region structs will be split if
           necessary.
      
       (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
           segment instead of all the attachments at that address.  Multiple
           shmat()'s return the same address under NOMMU-mode instead of different
           virtual addresses as under MMU-mode.
      
       (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.
      
       (8) /proc/maps is now the global list of mapped regions, and may list bits
           that aren't actually mapped anywhere.
      
       (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
           of RAM currently allocated by mmap to hold mappable regions that can't be
           mapped directly.  These are copies of the backing device or file if not
           anonymous.
      
      These changes make NOMMU mode more similar to MMU mode.  The downside is that
      NOMMU mode requires some extra memory to track things over NOMMU without this
      patch (VMAs are no longer shared, and there are now region structs).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Mike Frysinger <vapier.adi@gmail.com>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      8feae131
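Item (3) of this commit message, appending the VMA struct address to the sort key so that overlapping VMAs can coexist in one tree, amounts to a comparison function with a final tie-breaker. The sketch below is a user-space illustration with hypothetical names. Comparing the end address before falling back to the struct address is an assumption; the message only states that the address is appended to the key.

```c
#include <stdint.h>

/* Hypothetical, minimal stand-in for a VMA: just the mapped range. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Total order for tree insertion: start address first, then end address
 * (an assumption), then the address of the struct itself. Two distinct
 * VMAs therefore never compare equal, even if their ranges are identical.
 */
static int sketch_vma_cmp(const struct sketch_vma *a, const struct sketch_vma *b)
{
	uintptr_t pa = (uintptr_t)a;
	uintptr_t pb = (uintptr_t)b;

	if (a->vm_start != b->vm_start)
		return a->vm_start < b->vm_start ? -1 : 1;
	if (a->vm_end != b->vm_end)
		return a->vm_end < b->vm_end ? -1 : 1;
	if (pa != pb)
		return pa < pb ? -1 : 1;
	return 0; /* same object */
}
```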
  33. 07 Jan 2009, 1 commit
  34. 30 Dec 2008, 1 commit
    • tracing/kmemtrace: normalize the raw tracer event to the unified tracing API · 36994e58
      Committed by Frederic Weisbecker
      Impact: new tracer plugin
      
      This patch adapts kmemtrace raw events tracing to the unified tracing API.
      
      To enable and use this tracer, just do the following:
      
       echo kmemtrace > /debugfs/tracing/current_tracer
       cat /debugfs/tracing/trace
      
      You will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
      type_id 1 call_site 18446744071565527833 ptr 18446612134395152256
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164672 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345164912 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 0 call_site 18446744071565636711 ptr 18446612134345165152 bytes_req 240 bytes_alloc 240 gfp_flags 208 node -1
      type_id 0 call_site 18446744071566144042 ptr 18446612134346191680 bytes_req 1304 bytes_alloc 1312 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      type_id 0 call_site 18446744071565585597 ptr 18446612134405955584 bytes_req 4096 bytes_alloc 4096 gfp_flags 208 node -1
      type_id 1 call_site 18446744071565585534 ptr 18446612134405955584
      
      That was to stay backward compatible with the format output produced in
      linux/tracepoint.h.
      
      This is the default output, but note that I tried something else.
      
      If you change an option:
      
      echo kmem_minimalistic > /debugfs/trace_options
      
      and then cat /debugfs/trace, you will have the following output:
      
       # tracer: kmemtrace
       #
       #
       # ALLOC  TYPE  REQ   GIVEN  FLAGS           POINTER         NODE    CALLER
       # FREE   |      |     |       |              |   |            |        |
       # |
      
         -      C                            0xffff88007c088780          file_free_rcu
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc780     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc870     -1   d_alloc
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         +      K    240    240   000000d0   0xffff8800790dc960     -1   d_alloc
         +      K   1304   1312   000000d0   0xffff8800791d7340     -1   reiserfs_alloc_inode
         -      C                            0xffff88007cad6000          putname
         +      K   4096   4096   000000d0   0xffff88007cad6000     -1   getname
         -      C                            0xffff88007cad6000          putname
         +      K    992   1000   000000d0   0xffff880079045b58     -1   alloc_inode
         +      K    768   1024   000080d0   0xffff88007c096400     -1   alloc_pipe_info
         +      K    240    240   000000d0   0xffff8800790dca50     -1   d_alloc
         +      K    272    320   000080d0   0xffff88007c088780     -1   get_empty_filp
         +      K    272    320   000080d0   0xffff88007c088000     -1   get_empty_filp
      
      Admittedly, kmem_minimalistic would be better named kmem_alternative.
      
      Either way, I find it more readable, but this is a personal opinion of course.
      We can drop it if you want.
      
      On the ALLOC/FREE column, + means an allocation and - a free.
      
      On the type column, you have K = kmalloc, C = cache, P = page
      
      I would like the flags to be shown as GFP_* strings, but it would not be easy
      to do that without breaking the column alignment.
      
      About the node...it seems to always be -1. I don't know why but that shouldn't
      be difficult to find.
      
      I moved linux/tracepoint.h to trace/tracepoint.h as well. I think the tracer
      headers will be easier to find if they are all in one common directory.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      36994e58
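The relationship between the two output formats shown in this commit message can be illustrated by parsing one raw line in user space. The sketch below uses hypothetical names and is not part of the tracer. It assumes, from the sample output only, that a line without the `bytes_req` fields is a free, and that `type_id` 0 and 1 correspond to the K and C type letters. Printing `gfp_flags` and `ptr` in hexadecimal gives the values shown in the alternative format (208 is 0xd0, for instance).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical in-memory form of one raw kmemtrace line. */
struct sketch_event {
	int type_id;
	unsigned long long call_site;
	unsigned long long ptr;
	bool is_alloc;             /* true if the allocation fields were present */
	unsigned long bytes_req;
	unsigned long bytes_alloc;
	unsigned int gfp_flags;
	int node;
};

/* Parse one raw line; returns false if it matches neither sample shape. */
static bool sketch_parse_event(const char *line, struct sketch_event *ev)
{
	int n = sscanf(line,
		       "type_id %d call_site %llu ptr %llu "
		       "bytes_req %lu bytes_alloc %lu gfp_flags %u node %d",
		       &ev->type_id, &ev->call_site, &ev->ptr,
		       &ev->bytes_req, &ev->bytes_alloc, &ev->gfp_flags,
		       &ev->node);

	if (n == 7) {              /* all fields present: allocation */
		ev->is_alloc = true;
		return true;
	}
	if (n == 3) {              /* only the first three fields: free */
		ev->is_alloc = false;
		ev->bytes_req = 0;
		ev->bytes_alloc = 0;
		ev->gfp_flags = 0;
		ev->node = 0;
		return true;
	}
	return false;
}

/* Type letter as suggested by the sample output; other ids are unknown here. */
static char sketch_type_char(int type_id)
{
	switch (type_id) {
	case 0:
		return 'K';
	case 1:
		return 'C';
	default:
		return '?';
	}
}
```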
  35. 29 Dec 2008, 1 commit