1. 01 5月, 2008 1 次提交
  2. 30 4月, 2008 1 次提交
    • T
      infrastructure to debug (dynamic) objects · 3ac7fe5a
      Thomas Gleixner 提交于
      We can see an ever repeating problem pattern with objects of any kind in the
      kernel:
      
      1) freeing of active objects
      2) reinitialization of active objects
      
      Both problems can be hard to debug because the crash happens at a point where
      we have no chance to decode the root cause anymore.  One problem spot are
      kernel timers, where the detection of the problem often happens in interrupt
      context and usually causes the machine to panic.
      
      While working on a timer related bug report I had to hack specialized code
      into the timer subsystem to get a reasonable hint for the root cause.  This
      debug hack was fine for temporary use, but far from a mergeable solution due
      to the intrusiveness into the timer code.
      
      The code further lacked the ability to detect and report the root cause
      instantly and keep the system operational.
      
      Keeping the system operational is important to get hold of the debug
      information without special debugging aids like serial consoles and special
      knowledge of the bug reporter.
      
      The problems described above are not restricted to timers, but timers tend to
      expose it usually in a full system crash.  Other objects are less explosive,
      but the symptoms caused by such mistakes can be even harder to debug.
      
      Instead of creating specialized debugging code for the timer subsystem a
      generic infrastructure is created which allows developers to verify their code
      and provides an easy to enable debug facility for users in case of trouble.
      
      The debugobjects core code keeps track of operations on static and dynamic
      objects by inserting them into a hashed list and sanity checking them on
      object operations and provides additional checks whenever kernel memory is
      freed.
      
      The tracked object operations are:
      - initializing an object
      - adding an object to a subsystem list
      - deleting an object from a subsystem list
      
      Each operation is sanity checked before the operation is executed and the
      subsystem specific code can provide a fixup function which allows to prevent
      the damage of the operation.  When the sanity check triggers a warning message
      and a stack trace is printed.
      
      The list of operations can be extended if the need arises.  For now it's
      limited to the requirements of the first user (timers).
      
      The core code enqueues the objects into hash buckets.  The hash index is
      generated from the address of the object to simplify the lookup for the check
      on kfree/vfree.  Each bucket has it's own spinlock to avoid contention on a
      global lock.
      
      The debug code can be compiled in without being active.  The runtime overhead
      is minimal and could be optimized by asm alternatives.  A kernel command line
      option enables the debugging code.
      
      Thanks to Ingo Molnar for review, suggestions and cleanup patches.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ac7fe5a
  3. 28 4月, 2008 2 次提交
    • C
      vmallocinfo: add caller information · 23016969
      Christoph Lameter 提交于
      Add caller information so that /proc/vmallocinfo shows where the allocation
      request for a slice of vmalloc memory originated.
      
      Results in output like this:
      
      0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
      0xffffc20000801000-0xffffc20000806000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
      0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages
      0xffffc20000c07000-0xffffc20000c0a000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
      0xffffc20000c0a000-0xffffc20000c0c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
      0xffffc20000c0c000-0xffffc20000c0f000   12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap
      0xffffc20000c10000-0xffffc20000c15000   20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap
      0xffffc20000c16000-0xffffc20000c18000    8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap
      0xffffc20000c18000-0xffffc20000c1a000    8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap
      0xffffc20000c1a000-0xffffc20000c1c000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
      0xffffc20000c1c000-0xffffc20000c1e000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
      0xffffc20000c1e000-0xffffc20000c20000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
      0xffffc20000c20000-0xffffc20000c22000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
      0xffffc20000c22000-0xffffc20000c24000    8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap
      0xffffc20000c24000-0xffffc20000c26000    8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap
      0xffffc20000c26000-0xffffc20000c28000    8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap
      0xffffc20000c28000-0xffffc20000c2d000   20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc
      0xffffc20000c2d000-0xffffc20000c31000   16384 tcp_init+0xd5/0x31c pages=3 vmalloc
      0xffffc20000c31000-0xffffc20000c34000   12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc
      0xffffc20000c34000-0xffffc20000c36000    8192 init_vdso_vars+0xde/0x1f1
      0xffffc20000c36000-0xffffc20000c38000    8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap
      0xffffc20000c38000-0xffffc20000c3a000    8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap
      0xffffc20000c3a000-0xffffc20000c3e000   16384 sys_swapon+0x509/0xa15 pages=3 vmalloc
      0xffffc20000c40000-0xffffc20000c61000  135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap
      0xffffc20000c61000-0xffffc20000c6a000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
      0xffffc20000c6a000-0xffffc20000c73000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
      0xffffc20000c73000-0xffffc20000c7c000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
      0xffffc20000c7c000-0xffffc20000c7f000   12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc
      0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap
      0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc
      0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages
      0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages
      0xffffc20002204000-0xffffc2000220d000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
      0xffffc2000220d000-0xffffc20002216000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
      0xffffc20002216000-0xffffc2000221f000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
      0xffffc2000221f000-0xffffc20002228000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
      0xffffc20002228000-0xffffc20002231000   36864 _xfs_buf_map_pages+0x8e/0xc0 vmap
      0xffffc20002231000-0xffffc20002234000   12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc
      0xffffc20002240000-0xffffc20002261000  135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap
      0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages
      0xffffffffa0000000-0xffffffffa0022000  139264 module_alloc+0x4f/0x55 pages=33 vmalloc
      0xffffffffa0022000-0xffffffffa0029000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc
      0xffffffffa002b000-0xffffffffa0034000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
      0xffffffffa0034000-0xffffffffa003d000   36864 module_alloc+0x4f/0x55 pages=8 vmalloc
      0xffffffffa003d000-0xffffffffa0049000   49152 module_alloc+0x4f/0x55 pages=11 vmalloc
      0xffffffffa0049000-0xffffffffa0050000   28672 module_alloc+0x4f/0x55 pages=6 vmalloc
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      23016969
    • C
      vmalloc: show vmalloced areas via /proc/vmallocinfo · a10aa579
      Christoph Lameter 提交于
      Implement a new proc file that allows the display of the currently allocated
      vmalloc memory.
      
      It allows to see the users of vmalloc.  That is important if vmalloc space is
      scarce (i386 for example).
      
      And it's going to be important for the compound page fallback to vmalloc.
      Many of the current users can be switched to use compound pages with fallback.
       This means that the number of users of vmalloc is reduced and page tables no
      longer necessary to access the memory.  /proc/vmallocinfo allows to review how
      that reduction occurs.
      
      If memory becomes fragmented and larger order allocations are no longer
      possible then /proc/vmallocinfo allows to see which compound page allocations
      fell back to virtual compound pages.  That is important for new users of
      virtual compound pages.  Such as order 1 stack allocation etc that may
      fallback to virtual compound pages in the future.
      
      /proc/vmallocinfo permissions are made readable-only-by-root to avoid possible
      information leakage.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: CONFIG_MMU=n build fix]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a10aa579
  4. 20 3月, 2008 1 次提交
  5. 09 2月, 2008 1 次提交
    • M
      CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Martin Schwidefsky 提交于
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bit for the return value of pte_alloc_one (and the pte * would not be
      accessible since its not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
       To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f569afd
  6. 06 2月, 2008 5 次提交
  7. 20 10月, 2007 1 次提交
  8. 17 10月, 2007 1 次提交
    • C
      Categorize GFP flags · 6cb06229
      Christoph Lameter 提交于
      The function of GFP_LEVEL_MASK seems to be unclear.  In order to clear up
      the mystery we get rid of it and replace GFP_LEVEL_MASK with 3 sets of GFP
      flags:
      
      GFP_RECLAIM_MASK	Flags used to control page allocator reclaim behavior.
      
      GFP_CONSTRAINT_MASK	Flags used to limit where allocations can occur.
      
      GFP_SLAB_BUG_MASK	Flags that the slab allocator BUG()s on.
      
      These replace the uses of GFP_LEVEL mask in the slab allocators and in
      vmalloc.c.
      
      The use of the flags not included in these sets may occur as a result of a
      slab allocation standing in for a page allocation when constructing scatter
      gather lists.  Extraneous flags are cleared and not passed through to the
      page allocator.  __GFP_MOVABLE/RECLAIMABLE, __GFP_COLD and __GFP_COMP will
      now be ignored if passed to a slab allocator.
      
      Change the allocation of allocator meta data in SLAB and vmalloc to not
      pass through flags listed in GFP_CONSTRAINT_MASK.  SLAB already removes the
      __GFP_THISNODE flag for such allocations.  Generalize that to also cover
      vmalloc.  The use of GFP_CONSTRAINT_MASK also includes __GFP_HARDWALL.
      
      The impact of allocator metadata placement on access latency to the
      cachelines of the object itself is minimal since metadata is only
      referenced on alloc and free.  The attempt is still made to place the meta
      data optimally but we consistently allow fallback both in SLAB and vmalloc
      (SLUB does not need to allocate metadata like that).
      
      Allocator metadata may serve multiple in kernel users and thus should not
      be subject to the limitations arising from a single allocation context.
      
      [akpm@linux-foundation.org: fix fallback_alloc()]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6cb06229
  9. 20 7月, 2007 2 次提交
  10. 18 7月, 2007 2 次提交
  11. 14 6月, 2007 1 次提交
  12. 17 5月, 2007 1 次提交
  13. 09 5月, 2007 1 次提交
    • C
      move die notifier handling to common code · 1eeb66a1
      Christoph Hellwig 提交于
      This patch moves the die notifier handling to common code.  Previous
      various architectures had exactly the same code for it.  Note that the new
      code is compiled unconditionally, this should be understood as an appel to
      the other architecture maintainer to implement support for it aswell (aka
      sprinkling a notify_die or two in the proper place)
      
      arm had a notifiy_die that did something totally different, I renamed it to
      arm_notify_die as part of the patch and made it static to the file it's
      declared and used at.  avr32 used to pass slightly less information through
      this interface and I brought it into line with the other architectures.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
      [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NBryan Wu <bryan.wu@analog.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1eeb66a1
  14. 03 5月, 2007 1 次提交
  15. 12 2月, 2007 1 次提交
  16. 17 11月, 2006 1 次提交
  17. 13 11月, 2006 1 次提交
  18. 30 10月, 2006 1 次提交
    • G
      [PATCH] Fix GFP_HIGHMEM slab panic · 5211e6e6
      Giridhar Pemmasani 提交于
      As reported by Martin J. Bligh <mbligh@google.com>, we let through some
      non-slab bits to slab allocation through __get_vm_area_node when doing a
      vmalloc.
      
      I haven't been able to reproduce this, although I understand why it
      happens: vmalloc allocates memory with
      
      GFP_KERNEL | __GFP_HIGHMEM
      
      and commit 52fd24ca resulted in the same
      flags are passed down to cache_alloc_refill, causing the BUG.  The
      following patch fixes it.
      
      Note that when calling kmalloc_node, I am masking off __GFP_HIGHMEM with
      GFP_LEVEL_MASK, whereas __vmalloc_area_node does the same with
      
      ~(__GFP_HIGHMEM | __GFP_ZERO).
      
      IMHO, using GFP_LEVEL_MASK is preferable, but either should fix this
      problem.
      
      Signed-off-by: Giridhar Pemmasani (pgiri@yahoo.com)
      Cc: Martin J. Bligh <mbligh@google.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5211e6e6
  19. 29 10月, 2006 1 次提交
  20. 17 10月, 2006 1 次提交
  21. 04 10月, 2006 1 次提交
  22. 27 9月, 2006 2 次提交
  23. 26 9月, 2006 1 次提交
  24. 15 7月, 2006 1 次提交
  25. 04 7月, 2006 1 次提交
    • I
      [PATCH] lockdep: better lock debugging · 9a11b49a
      Ingo Molnar 提交于
      Generic lock debugging:
      
       - generalized lock debugging framework. For example, a bug in one lock
         subsystem turns off debugging in all lock subsystems.
      
       - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
         the mutex/rtmutex debugging code: it caused way too much prototype
         hackery, and lockdep will give the same information anyway.
      
       - ability to do silent tests
      
       - check lock freeing in vfree too.
      
       - more finegrained debugging options, to allow distributions to
         turn off more expensive debugging features.
      
      There's no separate 'held mutexes' list anymore - but there's a 'held locks'
      stack within lockdep, which unifies deadlock detection across all lock
      classes.  (this is independent of the lockdep validation stuff - lockdep first
      checks whether we are holding a lock already)
      
      Here are the current debugging options:
      
      CONFIG_DEBUG_MUTEXES=y
      CONFIG_DEBUG_LOCK_ALLOC=y
      
      which do:
      
       config DEBUG_MUTEXES
                bool "Mutex debugging, basic checks"
      
       config DEBUG_LOCK_ALLOC
               bool "Detect incorrect freeing of live mutexes"
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9a11b49a
  26. 23 6月, 2006 1 次提交
  27. 01 4月, 2006 1 次提交
  28. 07 11月, 2005 1 次提交
  29. 30 10月, 2005 2 次提交
    • H
      [PATCH] mm: init_mm without ptlock · 872fec16
      Hugh Dickins 提交于
      First step in pushing down the page_table_lock.  init_mm.page_table_lock has
      been used throughout the architectures (usually for ioremap): not to serialize
      kernel address space allocation (that's usually vmlist_lock), but because
      pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
      
      Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
      architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
      and drop it when allocating a new one, to check lest a racing task already
      did.  Similarly no page_table_lock in vmalloc's map_vm_area.
      
      Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
      user mms, which are converted only by a later patch, for now they have to lock
      differently according to whether or not it's init_mm.
      
      If sources get muddled, there's a danger that an arch source taking
      init_mm.page_table_lock will be mixed with common source also taking it (or
      neither take it).  So break the rules and make another change, which should
      break the build for such a mismatch: remove the redundant mm arg from
      pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
      
      Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
      used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
      pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
      map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
      took page_table_lock for no good reason.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      872fec16
    • C
      [PATCH] vmalloc_node · 930fc45a
      Christoph Lameter 提交于
      This patch adds
      
      vmalloc_node(size, node)	-> Allocate necessary memory on the specified node
      
      and
      
      get_vm_area_node(size, flags, node)
      
      and the other functions that it depends on.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      930fc45a
  30. 09 10月, 2005 1 次提交
  31. 10 9月, 2005 1 次提交