1. 07 1月, 2006 2 次提交
    • M
      [PATCH] mm: remove arch independent NODES_SPAN_OTHER_NODES · a94b3ab7
      Mike Kravetz 提交于
      The NODES_SPAN_OTHER_NODES config option was created so that DISCONTIGMEM
      could handle pSeries numa layouts.  However, support for DISCONTIGMEM has
      been replaced by SPARSEMEM on powerpc.  As a result, this config option and
      supporting code is no longer needed.
      
      I have already sent a patch to Paul that removes the option from powerpc
      specific code.  This removes the arch independent piece.  Doesn't really
      matter which is applied first.
      Signed-off-by: NMike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a94b3ab7
    • P
      [PATCH] mm: fix __alloc_pages cpuset ALLOC_* flags · 47f3a867
      Paul Jackson 提交于
      Two changes to the setting of the ALLOC_CPUSET flag in
      mm/page_alloc.c:__alloc_pages()
      
      - A bug fix - the "ignoring mins" case should not be honoring ALLOC_CPUSET.
        This case of all cases, since it is handling a request that will free up
        more memory than is asked for (exiting tasks, e.g.) should be allowed to
        escape cpuset constraints when memory is tight.
      
      - A logic change to make it simpler.  Honor cpusets even on GFP_ATOMIC
        (!wait) requests.  With this, cpuset confinement applies to all requests
        except ALLOC_NO_WATERMARKS, so that in a subsequent cleanup patch, I can
        remove the ALLOC_CPUSET flag entirely.  Since I don't know any real reason
        this logic has to be either way, I am choosing the path of the simplest
        code.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      47f3a867
  2. 16 12月, 2005 1 次提交
  3. 04 12月, 2005 1 次提交
  4. 29 11月, 2005 1 次提交
  5. 23 11月, 2005 2 次提交
    • H
      [PATCH] unpaged: PG_reserved bad_page · 689bcebf
      Hugh Dickins 提交于
      It used to be the case that PG_reserved pages were silently never freed, but
      in 2.6.15-rc1 they may be freed with a "Bad page state" message.  We should
      work through such cases as they appear, fixing the code; but for now it's
      safer to issue the message without freeing the page, leaving PG_reserved set.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      689bcebf
    • H
      [PATCH] unpaged: unifdefed PageCompound · 664beed0
      Hugh Dickins 提交于
      It looks like snd_xxx is not the only nopage to be using PageReserved as a way
      of holding a high-order page together: which no longer works, but is masked by
      our failure to free from VM_RESERVED areas.  We cannot fix that bug without
      first substituting another way to hold the high-order page together, while
      farming out the 0-order pages from within it.
      
      That's just what PageCompound is designed for, but it's been kept under
      CONFIG_HUGETLB_PAGE.  Remove the #ifdefs: which saves some space (out- of-line
      put_page), doesn't slow down what most needs to be fast (already using
      hugetlb), and unifies the way we handle high-order pages.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      664beed0
  6. 18 11月, 2005 1 次提交
  7. 15 11月, 2005 3 次提交
    • A
      [PATCH] x86_64: Remove obsolete ARCH_HAS_ATOMIC_UNSIGNED and page_flags_t · 07808b74
      Andi Kleen 提交于
      Has been introduced for x86-64 at some point to save memory
      in struct page, but has been obsolete for some time. Just
      remove it.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      07808b74
    • A
      b0d41693
    • A
      [PATCH] x86_64: Add 4GB DMA32 zone · a2f1b424
      Andi Kleen 提交于
      Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.
      
      As a bit of historical background: when the x86-64 port
      was originally designed we had some discussion if we should
      use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
      both. Both was ruled out at this point because it was in early
      2.4 when VM is still quite shakey and had bad troubles even
      dealing with one DMA zone.  We settled on the 16MB DMA zone mainly
      because we worried about older soundcards and the floppy.
      
      But this has always caused problems since then because
      device drivers had trouble getting enough DMA able memory. These days
      the VM works much better and the wide use of NUMA has proven
      it can deal with many zones successfully.
      
      So this patch adds both zones.
      
      This helps drivers who need a lot of memory below 4GB because
      their hardware is not accessing more (graphic drivers - proprietary
      and free ones, video frame buffer drivers, sound drivers etc.).
      Previously they could only use IOMMU+16MB GFP_DMA, which
      was not enough memory.
      
      Another common problem is that hardware who has full memory
      addressing for >4GB misses it for some control structures in memory
      (like transmit rings or other metadata).  They tended to allocate memory
      in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent,
      but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory
      (even on AMD systems the IOMMU tends to be quite small) especially if you have
      many devices.  With the new zone pci_alloc_consistent can just put
      this stuff into memory below 4GB which works better.
      
      One argument was still if the zone should be 4GB or 2GB. The main
      motivation for 2GB would be an unnamed not so unpopular hardware
      raid controller (mostly found in older machines from a particular four letter
      company) who has a strange 2GB restriction in firmware. But
      that one works ok with swiotlb/IOMMU anyways, so it doesn't really
      need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
      it seems to be the most common restriction.
      
      The new zone is so far added only for x86-64.
      
      For other architectures who don't set up this
      new zone nothing changes. Architectures can set a compatibility
      define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32
      as GFP_DMA. Otherwise it's a nop because on 32bit architectures
      it's normally not needed because GFP_NORMAL (=0) is DMA able
      enough.
      
      One problem is still that GFP_DMA means different things on different
      architectures. e.g. some drivers used to have #ifdef ia64  use GFP_DMA
      (trusting it to be 4GB) #elif __x86_64__ (use other hacks like
      the swiotlb because 16MB is not enough) ... . This was quite
      ugly and is now obsolete.
      
      These should be now converted to use GFP_DMA32 unconditionally. I haven't done
      this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent
      which will use GFP_DMA32 transparently.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a2f1b424
  8. 14 11月, 2005 3 次提交
  9. 11 11月, 2005 1 次提交
  10. 07 11月, 2005 1 次提交
  11. 30 10月, 2005 10 次提交
    • J
      [PATCH] mm: wider use of for_each_*cpu() · 2f96996d
      John Hawkes 提交于
      In 'mm' change the explicit use of a for-loop using NR_CPUS into the
      general for_each_cpu() constructs.  This widens the scope of potential
      future optimizations of the general constructs, as well as takes advantage
      of the existing optimizations of first_cpu() and next_cpu(), which is
      advantageous when the true CPU count is much smaller than NR_CPUS.
      Signed-off-by: NJohn Hawkes <hawkes@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2f96996d
    • D
      [PATCH] memory hotplug: sysfs and add/remove functions · 3947be19
      Dave Hansen 提交于
      This adds generic memory add/remove and supporting functions for memory
      hotplug into a new file as well as a memory hotplug kernel config option.
      
      Individual architecture patches will follow.
      
      For now, disable memory hotplug when swsusp is enabled.  There's a lot of
      churn there right now.  We'll fix it up properly once it calms down.
      Signed-off-by: NMatt Tolentino <matthew.e.tolentino@intel.com>
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3947be19
    • D
      [PATCH] memory hotplug locking: zone span seqlock · bdc8cb98
      Dave Hansen 提交于
      See the "fixup bad_range()" patch for more information, but this actually
      creates a the lock to protect things making assumptions about a zone's size
      staying constant at runtime.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bdc8cb98
    • D
      [PATCH] memory hotplug locking: node_size_lock · 208d54e5
      Dave Hansen 提交于
      pgdat->node_size_lock is basically only neeeded in one place in the normal
      code: show_mem(), which is the arch-specific sysrq-m printing function.
      
      Strictly speaking, the architectures not doing memory hotplug do no need this
      locking in show_mem().  However, they are all included for completeness.  This
      should also make any future consolidation of all of the implementations a
      little more straightforward.
      
      This lock is also held in the sparsemem code during a memory removal, as
      sections are invalidated.  This is the place there pfn_valid() is made false
      for a memory area that's being removed.  The lock is only required when doing
      pfn_valid() operations on memory which the user does not already have a
      reference on the page, such as in show_mem().
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      208d54e5
    • D
      [PATCH] memory hotplug prep: fixup bad_range() · c6a57e19
      Dave Hansen 提交于
      When doing memory hotplug operations, the size of existing zones can obviously
      change.  This means that zone->zone_{start_pfn,spanned_pages} can change.
      
      There are currently no locks that protect these structure members.  However,
      they are rarely accessed at runtime.  Outside of swsusp, the only place that I
      can find is bad_range().
      
      So, split bad_range() up into two pieces: one that needs to be locked and
      anther that doesn't.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c6a57e19
    • D
      [PATCH] memory hotplug prep: break out zone initialization · ed8ece2e
      Dave Hansen 提交于
      If a zone is empty at boot-time and then hot-added to later, it needs to run
      the same init code that would have been run on it at boot.
      
      This patch breaks out zone table and per-cpu-pages functions for use by the
      hotplug code.  You can almost see all of the free_area_init_core() function on
      one page now.  :)
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ed8ece2e
    • H
      [PATCH] mm: split page table lock · 4c21e2f2
      Hugh Dickins 提交于
      Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
      a many-threaded application which concurrently initializes different parts of
      a large anonymous area.
      
      This patch corrects that, by using a separate spinlock per page table page, to
      guard the page table entries in that page, instead of using the mm's single
      page_table_lock.  (But even then, page_table_lock is still used to guard page
      table allocation, and anon_vma allocation.)
      
      In this implementation, the spinlock is tucked inside the struct page of the
      page table page: with a BUILD_BUG_ON in case it overflows - which it would in
      the case of 32-bit PA-RISC with spinlock debugging enabled.
      
      Splitting the lock is not quite for free: another cacheline access.  Ideally,
      I suppose we would use split ptlock only for multi-threaded processes on
      multi-cpu machines; but deciding that dynamically would have its own costs.
      So for now enable it by config, at some number of cpus - since the Kconfig
      language doesn't support inequalities, let preprocessor compare that with
      NR_CPUS.  But I don't think it's worth being user-configurable: for good
      testing of both split and unsplit configs, split now at 4 cpus, and perhaps
      change that to 8 later.
      
      There is a benefit even for singly threaded processes: kswapd can be attacking
      one part of the mm while another part is busy faulting.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4c21e2f2
    • N
      [PATCH] core remove PageReserved · b5810039
      Nick Piggin 提交于
      Remove PageReserved() calls from core code by tightening VM_RESERVED
      handling in mm/ to cover PageReserved functionality.
      
      PageReserved special casing is removed from get_page and put_page.
      
      All setting and clearing of PageReserved is retained, and it is now flagged
      in the page_alloc checks to help ensure we don't introduce any refcount
      based freeing of Reserved pages.
      
      MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
      deprecated.  We never completely handled it correctly anyway, and is be
      reintroduced in future if required (Hugh has a proof of concept).
      
      Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
      arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
      be trivially removed.
      
      Last real user of PageReserved is swsusp, which uses PageReserved to
      determine whether a struct page points to valid memory or not.  This still
      needs to be addressed (a generic page_is_ram() should work).
      
      A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
      thus mapcounted and count towards shared rss).  These writes to the struct
      page could cause excessive cacheline bouncing on big systems.  There are a
      number of ways this could be addressed if it is an issue.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      
      Refcount bug fix for filemap_xip.c
      Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b5810039
    • S
      [PATCH] mm: set per-cpu-pages lower threshold to zero · e46a5e28
      Seth, Rohit 提交于
      Set the low water mark for hot pages in pcp to zero.
      
      (akpm: for the life of me I cannot remember why we created pcp->low.  Neither
      can Martin and the changelog is silent.  Maybe it was just a brainfart, but I
      have this feeling that there was a reason.  If not, we should remove the
      fields completely.  We'll see.)
      Signed-off-by: NRohit Seth <rohit.seth@intel.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e46a5e28
    • S
      [PATCH] mm: page_alloc: increase size of per-cpu-pages · ba56e91c
      Seth, Rohit 提交于
      Increase the page allocator's per-cpu magazines from 1/4MB to 1/2MB.
      
      Over 100+ runs for a workload, the difference in mean is about 2%.  The best
      results for both are almost same.  Though the max variation in results with
      1/2MB is only 2.2%, whereas with 1/4MB it is 12%.
      Signed-off-by: NRohit Seth <rohit.seth@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ba56e91c
  12. 28 10月, 2005 2 次提交
  13. 27 10月, 2005 1 次提交
  14. 09 10月, 2005 1 次提交
  15. 13 9月, 2005 1 次提交
  16. 11 9月, 2005 1 次提交
  17. 08 9月, 2005 3 次提交
    • P
      [PATCH] cpusets: formalize intermediate GFP_KERNEL containment · 9bf2229f
      Paul Jackson 提交于
      This patch makes use of the previously underutilized cpuset flag
      'mem_exclusive' to provide what amounts to another layer of memory placement
      resolution.  With this patch, there are now the following four layers of
      memory placement available:
      
       1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
       2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
       3) The current tasks cpuset (GFP_USER allocations constrained to here), and
       4) Specific node placement, using mbind and set_mempolicy.
      
      These nest - each layer is a subset (same or within) of the previous.
      
      Layer (2) above is new, with this patch.  The call used to check whether a
      zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
      extended to take a gfp_mask argument, and its logic is extended, in the case
      that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
      hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
      placement is allowed.  The definition of GFP_USER, which used to be identical
      to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
      cpuset_gfp_hardwall_flag patch.
      
      GFP_ATOMIC and GFP_KERNEL allocations will stay within the current tasks
      cpuset, so long as any node therein is not too tight on memory, but will
      escape to the larger layer, if need be.
      
      The intended use is to allow something like a batch manager to handle several
      jobs, each job in its own cpuset, but using common kernel memory for caches
      and such.  Swapper and oom_kill activity is also constrained to Layer (2).  A
      task in or below one mem_exclusive cpuset should not cause swapping on nodes
      in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
      task in another such cpuset.  Heavy use of kernel memory for i/o caching and
      such by one job should not impact the memory available to jobs in other
      non-overlapping mem_exclusive cpusets.
      
      This patch enables providing hardwall, inescapable cpusets for memory
      allocations of each job, while sharing kernel memory allocations between
      several jobs, in an enclosing mem_exclusive cpuset.
      
      Like Dinakar's patch earlier to enable administering sched domains using the
      cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
      that had previously done nothing much useful other than restrict what cpuset
      configurations were allowed.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9bf2229f
    • R
      [PATCH] Additions to .data.read_mostly section · 6c231b7b
      Ravikiran G Thirumalai 提交于
      Mark variables which are usually accessed for reads with __readmostly.
      Signed-off-by: NAlok N Kataria <alokk@calsoftinc.com>
      Signed-off-by: NShai Fultheim <shai@scalex86.org>
      Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6c231b7b
    • C
      [PATCH] More __read_mostly variables · c3d8c141
      Christoph Lameter 提交于
      Move some more frequently read variables that showed up during some of our
      performance tests as sometimes ending up in hot cachelines to the
      read_mostly section.
      
      Fix: Move the __read_mostly from before hpet_usec_quotient to follow the
      variable like the other uses of __read_mostly.
      Signed-off-by: NAlok N Kataria <alokk@calsoftinc.com>
      Signed-off-by: NChristoph Lameter <christoph@scalex86.org>
      Signed-off-by: NShai Fultheim <shai@scalex86.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c3d8c141
  18. 05 9月, 2005 3 次提交
  19. 31 7月, 2005 1 次提交
  20. 28 7月, 2005 1 次提交
    • A
      [PATCH] Remove bogus warning in page_alloc.c · 12b1c5f3
      Andy Whitcroft 提交于
      Originally __free_pages_bulk used the relative page number within a zone to
      define its buddies.  This meant that to maintain the "maximally aligned"
      requirements (that an allocation of size N will be aligned at least to N
      physically) zones had to also be aligned to 1<<MAX_ORDER pages.  When
      __free_pages_bulk was updated to use the relative page frame numbers of the
      free'd pages to pair buddies this released the alignment constraint on the
      'left' edge of the zone.  This allows _either_ edge of the zone to contain
      partial MAX_ORDER sized buddies.  These simply never will have matching
      buddies and thus will never make it to the 'top' of the pyramid.
      
      The patch below removes a now redundant check ensuring that the mem_map was
      aligned to MAX_ORDER.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      12b1c5f3