1. 22 Nov 2006, 2 commits
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      David Howells committed
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
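      
      A minimal sketch of the new convention (the struct and function names here
      are hypothetical, for illustration only): the work function receives the
      work_struct pointer and recovers its container with container_of().
      
          struct my_dev {
                  struct work_struct reset_work;
                  int status;
          };
      
          /* New-style work function: it is handed the work_struct pointer. */
          static void my_dev_reset(struct work_struct *work)
          {
                  struct my_dev *dev = container_of(work, struct my_dev, reset_work);
      
                  dev->status = 0;        /* safe: dev derived from 'work' itself */
          }
      
          static void my_dev_setup(struct my_dev *dev)
          {
                  INIT_WORK(&dev->reset_work, my_dev_reset);
                  schedule_work(&dev->reset_work);
          }
      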
      Signed-Off-By: David Howells <dhowells@redhat.com>
    • WorkStruct: Separate delayable and non-delayable events. · 52bad64d
      David Howells committed
      Separate delayable work items from non-delayable work items by splitting them
      into a separate structure (delayed_work), which incorporates a work_struct and
      the timer_list removed from work_struct.
      
      The work_struct struct is huge, and this limits its usefulness.  On a 64-bit
      architecture it's nearly 100 bytes in size.  This reduces that by half for the
      non-delayable type of event.
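      
      A hedged sketch of the resulting split (my_dev and the handler names are
      illustrative; the delayed_work layout follows the description above):
      
          struct delayed_work {
                  struct work_struct work;        /* what actually gets queued */
                  struct timer_list timer;        /* fires to queue the work   */
          };
      
          struct my_dev {
                  struct work_struct event_work;  /* non-delayable: half the size */
                  struct delayed_work poll_work;  /* delayable: carries the timer */
          };
      
          static void my_event_fn(struct work_struct *work) { /* ... */ }
          static void my_poll_fn(struct work_struct *work)  { /* ... */ }
      
          static void my_dev_setup(struct my_dev *dev)
          {
                  INIT_WORK(&dev->event_work, my_event_fn);
                  INIT_DELAYED_WORK(&dev->poll_work, my_poll_fn);
                  schedule_work(&dev->event_work);
                  schedule_delayed_work(&dev->poll_work, HZ);     /* in ~1 second */
          }
      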
      Signed-Off-By: David Howells <dhowells@redhat.com>
  2. 04 Nov 2006, 1 commit
  3. 22 Oct 2006, 1 commit
  4. 08 Oct 2006, 1 commit
  5. 06 Oct 2006, 1 commit
  6. 04 Oct 2006, 2 commits
  7. 30 Sep 2006, 1 commit
    • [PATCH] single bit flip detector · aa83aa40
      Dave Jones committed
      In cases where we detect a single bit has been flipped, we spew the usual
      slab corruption message, which users instantly think is a kernel bug.  In a
      lot of cases, single bit errors are down to bad memory, or other hardware
      failure.
      
      This patch adds an extra line to the slab debug messages in those cases, in
      the hope that users will try memtest before they report a bug.
      
      000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
      Single bit error detected. Possibly bad RAM. Run memtest86.
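      
      A hedged illustration of the detection idea (simplified, not the exact code
      added here): a poisoned byte that differs from the poison pattern in exactly
      one bit is more likely a hardware flip than a software overwrite.
      
          unsigned char diff = value ^ POISON_FREE;   /* POISON_FREE is 0x6b */
      
          if (diff && !(diff & (diff - 1)))           /* exactly one bit set? */
                  printk(KERN_ERR "Single bit error detected. Possibly bad RAM. "
                                  "Run memtest86.\n");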
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 27 Sep 2006, 3 commits
    • [PATCH] GFP_THISNODE for the slab allocator · 765c4507
      Christoph Lameter committed
      This patch ensures that the slab node lists in the NUMA case only contain
      slabs that belong to that specific node.  All slab allocations use
      GFP_THISNODE when calling into the page allocator.  If an allocation fails
      then we fall back in the slab allocator according to the zonelists appropriate
      for a certain context.
      
      This allows a replication of the behavior of alloc_pages() and
      alloc_pages_node() in the slab layer.
      
      Currently allocations requested from the page allocator may be redirected via
      cpusets to other nodes.  This results in remote pages on nodelists and that in
      turn results in interrupt latency issues during cache draining.  Plus the slab
      is handing out memory as local when it is really remote.
      
      Fallback for slab memory allocations will occur within the slab allocator and
      not in the page allocator.  This is necessary in order to be able to use the
      existing pools of objects on the nodes that we fall back to before adding more
      pages to a slab.
      
      The fallback function ensures that the nodes we fall back to obey cpuset
      restrictions of the current context.  We do not allocate objects from outside
      of the current cpuset context like before.
      
      Note that the implementation of locality constraints within the slab allocator
      requires importing logic from the page allocator.  This is a mischmash that is
      not that great.  Other allocators (uncached allocator, vmalloc, huge pages)
      face similar problems and have similar minimal reimplementations of the basic
      fallback logic of the page allocator.  There is another way of implementing a
      slab by avoiding per node lists (see modular slab) but this won't work within
      the existing slab.
      
      V1->V2:
      - Use NUMA_BUILD to avoid #ifdef CONFIG_NUMA
      - Exploit GFP_THISNODE being 0 in the NON_NUMA case to avoid another
        #ifdef
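      
      A hedged sketch of the allocation path described above (simplified; the
      exact call sites and helper are illustrative of the idea, not quoted from
      the patch):
      
          /* Ask the page allocator strictly for the requested node ... */
          page = alloc_pages_node(nodeid, flags | GFP_THISNODE, cachep->gfporder);
      
          /* ... and if that fails, fall back inside the slab allocator: walk the
           * zonelist for this context, honour cpuset restrictions, and prefer
           * objects already cached on the permitted nodes before growing slabs. */
          if (!page)
                  return fallback_alloc(cachep, flags);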
      
      [akpm@osdl.org: build fix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: fix kmalloc_node applying memory policies if nodeid == numa_node_id() · de3083ec
      Christoph Lameter committed
      kmalloc_node() falls back to ___cache_alloc() under certain conditions and
      at that point memory policies may be applied redirecting the allocation
      away from the current node.  Therefore kmalloc_node(...,numa_node_id()) or
      kmalloc_node(...,-1) may not return memory from the local node.
      
      Fix this by doing the policy check in __cache_alloc() instead of
      ____cache_alloc().
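      
      The expectation this restores, in one illustrative line: a request for the
      local node must not be redirected by memory policies.
      
          void *obj = kmalloc_node(128, GFP_KERNEL, numa_node_id());
          /* obj, if non-NULL, now comes from the local node */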
      
      This version here is a cleanup of Kiran's patch.
      
      - Tested on ia64.
      - Extra material removed.
      - Consolidate the exit path if alternate_node_alloc() returned an object.
      
      [akpm@osdl.org: warning fix]
      Signed-off-by: Alok N Kataria <alok.kataria@calsoftinc.com>
      Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Shai Fultheim <shai@scalex86.org>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Make kmem_cache_destroy() return void · 133d205a
      Alexey Dobriyan committed
      un-, de-, -free, -destroy, -exit, etc. functions should in general return
      void.  Also, there is very little that, say, filesystem driver code can do
      upon a failed kmem_cache_destroy().  If it is ever decided to BUG in this
      case, the BUG should be put in generic code instead.
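      
      A minimal illustration of the resulting caller pattern (myfs_inode_cachep
      is a hypothetical cache):
      
          static struct kmem_cache *myfs_inode_cachep;
      
          static void __exit myfs_exit(void)
          {
                  /* Returns void now; there is nothing sensible to do on failure. */
                  kmem_cache_destroy(myfs_inode_cachep);
          }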
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  9. 26 Sep 2006, 11 commits
  10. 01 Aug 2006, 2 commits
  11. 14 Jul 2006, 3 commits
  12. 04 Jul 2006, 1 commit
  13. 01 Jul 2006, 3 commits
    • [PATCH] slab: consolidate code to free slabs from freelist · ed11d9eb
      Christoph Lameter committed
      Post and discussion:
      http://marc.theaimsgroup.com/?t=115074342800003&r=1&w=2
      
      Code in __node_shrink() duplicates code in cache_reap().
      
      Add a new function drain_freelist that removes slabs with objects that are
      already free and use that in various places.
      
      This eliminates the __node_shrink() function and provides the interrupt
      holdoff reduction from slab_free to code that used to call __node_shrink.
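      
      A hedged sketch of the consolidated helper (the prototype is illustrative
      rather than quoted from the patch):
      
          /* Free up to 'tofree' slabs whose objects are all already free on the
           * given node list, returning how many were released; cache_reap() and
           * cache shrinking now loop over this instead of open-coding the walk. */
          static int drain_freelist(struct kmem_cache *cache,
                                    struct kmem_list3 *l3, int tofree);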
      
      [akpm@osdl.org: build fixes]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: conversion of nr_slab to per zone counter · 9a865ffa
      Christoph Lameter committed
      - Allows reclaim to access counter without looping over processor counts.
      
      - Allows accurate statistics on how many pages are used in a zone by
        the slab. This may become useful to balance slab allocations over
        various zones.
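      
      Illustrative read after the conversion (the counter name follows this
      series' zoned VM counter naming; treat it as an assumption):
      
          unsigned long slab_pages = zone_page_state(zone, NR_SLAB);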
      
      [akpm@osdl.org: bugfix]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] zoned vm counters: basic ZVC (zoned vm counter) implementation · 2244b95a
      Christoph Lameter committed
      Per zone counter infrastructure
      
      The counters that we currently have for the VM are split per processor.  The
      processor, however, has little to do with the zone these pages belong to.  We
      cannot tell, for example, how many ZONE_DMA pages are dirty.
      
      So we are blind to potential imbalances in the usage of memory in various
      zones.  For example, in a NUMA system we cannot tell how many pages are dirty on a
      particular node.  If we knew then we could put measures into the VM to balance
      the use of memory between different zones and different nodes in a NUMA
      system.  For example it would be possible to limit the dirty pages per node so
      that fast local memory is kept available even if a process is dirtying huge
      amounts of pages.
      
      Another example is zone reclaim.  We do not know how many unmapped pages exist
      per zone.  So we just have to try to reclaim.  If it is not working then we
      pause and try again later.  It would be better if we knew when it makes sense
      to reclaim unmapped pages from a zone.  This patchset allows the determination
      of the number of unmapped pages per zone.  We can remove the zone reclaim
      interval with the counters introduced here.
      
      Furthermore, the ability to have various usage statistics available will allow
      the development of new NUMA balancing algorithms that may be able to improve
      the decision making in the scheduler of when to move a process to another node
      and hopefully will also enable automatic page migration through a user space
      program that can analyse the memory load distribution and then rebalance
      memory use in order to increase performance.
      
      The counter framework here implements differential counters for each processor
      in struct zone.  The differential counters are consolidated when a threshold
      is exceeded (as done in the current implementation for nr_pagecache), when
      slab reaping occurs or when a consolidation function is called.
      
      Consolidation uses atomic operations and accumulates counters per zone in the
      zone structure and also globally in the vm_stat array.  VM functions can
      access the counts by simply indexing a global or zone specific array.
      
      The arrangement of counters in an array also simplifies processing when output
      has to be generated for /proc/*.
      
      Counters can be updated by calling inc/dec_zone_page_state or
      __inc/__dec_zone_page_state analogous to *_page_state.  The second group of
      functions can be called if it is known that interrupts are disabled.
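      
      An illustrative use of the update API described above (the specific counter
      is for illustration only):
      
          inc_zone_page_state(page, NR_FILE_MAPPED);      /* irq-safe variant */
      
          __inc_zone_page_state(page, NR_FILE_MAPPED);    /* only with irqs off */
      
          nr_mapped = global_page_state(NR_FILE_MAPPED);  /* consolidated total */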
      
      Special optimized increment and decrement functions are provided.  These can
      avoid certain checks and use increment or decrement instructions that an
      architecture may provide.
      
      We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can
      do DMA to all memory and therefore ZONE_NORMAL will not be populated.  This is
      only currently set for IA64 SGI SN2 and currently only affects
      node_page_state().  In the best case node_page_state can be reduced to
      retrieving a single counter for the one zone on the node.
      
      [akpm@osdl.org: cleanups]
      [akpm@osdl.org: export vm_stat[] for filesystems]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  14. 28 Jun 2006, 5 commits
  15. 23 Jun 2006, 3 commits
    • [PATCH] slab: kmalloc, kzalloc comments cleanup and fix · 800590f5
      Paul Drynoff committed
      - Move comments for kmalloc to the right place; currently they are near __do_kmalloc
      
      - Comments for kzalloc
      
      - More detailed comments for kmalloc
      
      - Appearance of "kmalloc" and "kzalloc" man pages after "make mandocs"
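      
      For reference, the two allocators being documented (illustrative usage, not
      part of the patch):
      
          buf = kmalloc(len, GFP_KERNEL);             /* uninitialised memory */
          obj = kzalloc(sizeof(*obj), GFP_KERNEL);    /* zeroed memory */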
      
      [rdunlap@xenotime.net: simplification]
      Signed-off-by: Paul Drynoff <pauldrynoff@gmail.com>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] mm/slab.c: fix early init assumption · e0a42726
      Ingo Molnar committed
      The SLAB bootstrap code assumes that the first two kmalloc caches created
      (the INDEX_AC and INDEX_L3 kmalloc caches) won't be off-slab.  But due to the
      AC and L3 structure size increase under lockdep, one of them ended up being
      off-slab, and subsequently crashing with:
      
      Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
       [<ffffffff80267478>] kmem_cache_alloc+0x26/0x7d
      
      The fix is to introduce a bootstrap flag and to use it to prevent off-slab
      caches being created so early during bootup.
      
      (The calculation for off-slab caches is quite complex, so I didn't want to
      complicate things by introducing yet another INDEX_ calculation; the flag
      approach is simpler and smaller.)
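      
      A hedged sketch of the flag approach (close in spirit to, but not quoted
      from, the patch):
      
          static int slab_early_init = 1;     /* cleared once bootstrap is done */
      
          /* Only allow off-slab management after early init, so the first
           * kmalloc caches can never depend on a not-yet-existing slab cache. */
          if (size >= (PAGE_SIZE >> 3) && !slab_early_init)
                  flags |= CFLGS_OFF_SLAB;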
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] slab: verify pointers before free · ddc2e812
      Pekka Enberg committed
      Passing an invalid pointer to kfree() or kmem_cache_free() is likely to
      cause bad memory corruption or even take down the whole system because the
      bad pointer is likely reused immediately due to the per-CPU caches.  Until
      now, we have not done any verification for this when CONFIG_DEBUG_SLAB is
      disabled.
      
      As suggested by Linus, add PageSlab check to page_to_cache() and
      page_to_slab() to verify pointers passed to kfree().  Also, move the
      stronger check from cache_free_debugcheck() to kmem_cache_free() to ensure
      the passed pointer actually belongs to the cache we're about to free the
      object from.
      
      For page_to_cache() and page_to_slab(), the assertions should have
      virtually no extra cost (two instructions, no data cache pressure) and for
      kmem_cache_free() the overhead should be minimal.
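      
      A hedged sketch of the added check (simplified; the way the cache pointer
      is stored in struct page is illustrative):
      
          static inline struct kmem_cache *page_to_cache(struct page *page)
          {
                  BUG_ON(!PageSlab(page));    /* bogus pointer handed to kfree()? */
                  return (struct kmem_cache *)page->lru.next;
          }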
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Cc: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>