1. 30 9月, 2006 3 次提交
    • D
      [PATCH] single bit flip detector · aa83aa40
      Dave Jones 提交于
      In cases where we detect a single bit has been flipped, we spew the usual
      slab corruption message, which users instantly think is a kernel bug.  In a
      lot of cases, single bit errors are down to bad memory, or other hardware
      failure.
      
      This patch adds an extra line to the slab debug messages in those cases, in
      the hope that users will try memtest before they report a bug.
      
      000: 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
      Single bit error detected. Possibly bad RAM. Run memtest86.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      aa83aa40
    • A
      [PATCH] mm: make filemap_nopage use NOPAGE_SIGBUS · 79f5acf5
      Adam Litke 提交于
      Don't open-code NOPAGE_SIGBUS.
      Signed-off-by: NAdam Litke <agl@us.ibm.com>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      79f5acf5
    • S
      [PATCH] mm: fix a race condition under SMC + COW · 4ce072f1
      Siddha, Suresh B 提交于
      Failing context is a multi threaded process context and the failing
      sequence is as follows.
      
      One thread T0 doing self modifying code on page X on processor P0 and
      another thread T1 doing COW (breaking the COW setup as part of just
      happened fork() in another thread T2) on the same page X on processor P1.
      T0 doing SMC can endup modifying the new page Y (allocated by the T1 doing
      COW on P1) but because of different I/D TLB's, P0 ITLB will not see the new
      mapping till the flush TLB IPI from P1 is received.  During this interval,
      if T0 executes the code created by SMC it can result in an app error (as
      ITLB still points to old page X and endup executing the content in page X
      rather than using the content in page Y).
      
      Fix this issue by first clearing the PTE and flushing it, before updating
      it with new entry.
      
      Hugh sayeth:
      
        I was a bit sceptical, in the habit of thinking that Self Modifying Code
        must look such issues itself: but I guess there's nothing it can do to avoid
        this one.
      
        Fair enough, what you're changing it to is pretty much what powerpc and
        s390 were already doing, and is a more robust way of proceeding, consistent
        with how ptes are set everywhere else.
      
        The ptep_clear_flush is a bit heavy-handed (it's anxious to return the pte
        that was atomically cleared), but we'd have to wander through lots of arches
        to get the right minimal behaviour.  It'd also be nice to eliminate
        ptep_establish completely, now only used to define other macros/inlines: it
        always seemed obfuscation to me, what you've got there now is clearer.
        Let's put those cleanups on a TODO list.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Acked-by: NHugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4ce072f1
  2. 27 9月, 2006 28 次提交
    • T
      [PATCH] inode-diet: Eliminate i_blksize from the inode structure · ba52de12
      Theodore Ts'o 提交于
      This eliminates the i_blksize field from struct inode.  Filesystems that want
      to provide a per-inode st_blksize can do so by providing their own getattr
      routine instead of using the generic_fillattr() function.
      
      Note that some filesystems were providing pretty much random (and incorrect)
      values for i_blksize.
      
      [bunk@stusta.de: cleanup]
      [akpm@osdl.org: generic_fillattr() fix]
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ba52de12
    • D
      [PATCH] NOMMU: Make futexes work under NOMMU conditions · 930e652a
      David Howells 提交于
      Make futexes work under NOMMU conditions.
      
      This can be tested by running this in one shell:
      
      	#define SYSERROR(X, Y) \
      		do { if ((long)(X) == -1L) { perror(Y); exit(1); }} while(0)
      
      	int main()
      	{
      		int shmid, tmp, *f, n;
      
      		shmid = shmget(23, 4, IPC_CREAT|0666);
      		SYSERROR(shmid, "shmget");
      
      		f = shmat(shmid, NULL, 0);
      		SYSERROR(f, "shmat");
      
      		n = *f;
      		printf("WAIT: %p{%x}\n", f, n);
      		tmp = futex(f, FUTEX_WAIT, n, NULL, NULL, 0);
      		SYSERROR(tmp, "futex");
      		printf("WAITED: %d\n", tmp);
      
      		tmp = shmdt(f);
      		SYSERROR(tmp, "shmdt");
      
      		exit(0);
      	}
      
      And then this in the other shell:
      
      	#define SYSERROR(X, Y) \
      		do { if ((long)(X) == -1L) { perror(Y); exit(1); }} while(0)
      
      	int main()
      	{
      		int shmid, tmp, *f;
      
      		shmid = shmget(23, 4, IPC_CREAT|0666);
      		SYSERROR(shmid, "shmget");
      
      		f = shmat(shmid, NULL, 0);
      		SYSERROR(f, "shmat");
      
      		(*f)++;
      		printf("WAKE: %p{%x}\n", f, *f);
      		tmp = futex(f, FUTEX_WAKE, 1, NULL, NULL, 0);
      		SYSERROR(tmp, "futex");
      		printf("WOKE: %d\n", tmp);
      
      		tmp = shmdt(f);
      		SYSERROR(tmp, "shmdt");
      
      		exit(0);
      	}
      
      The first program will set up a SYSV IPC SHM segment and wait on a futex in it
      for the number at the start to change.  The program will increment that number
      and wake the first program up.  This leads to output of the form:
      
      	SHELL 1			SHELL 2
      	=======================	=======================
      	# /dowait
      	WAIT: 0xc32ac000{0}
      				# /dowake
      				WAKE: 0xc32ac000{1}
      	WAITED: 0		WOKE: 1
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      930e652a
    • D
      [PATCH] NOMMU: Make mremap() partially work for NOMMU kernels · 6fa5f80b
      David Howells 提交于
      Make mremap() partially work for NOMMU kernels.  It may resize a VMA provided
      that it doesn't exceed the size of the slab object in which the storage is
      allocated that the VMA refers to.  Shareable VMAs may not be resized.
      
      Moving VMAs (as permitted by MREMAP_MAYMOVE) is not currently supported.
      
      This patch also makes use of the fact that the VMA list is now ordered to cut
      it short when possible.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6fa5f80b
    • D
      [PATCH] NOMMU: Order the per-mm_struct VMA list · 3034097a
      David Howells 提交于
      Order the per-mm_struct VMA list by address so that searching it can be cut
      short when the appropriate address has been exceeded.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3034097a
    • D
      [PATCH] NOMMU: Permit ptrace to ignore non-PROT_WRITE VMAs in NOMMU mode · d00c7b99
      David Howells 提交于
      Permit ptrace to modify a section that's non-shared but is marked
      unwritable, such as is obtained by mapping the text segment of an ELF-FDPIC
      executable binary with into a binary that's being ptraced[*].
      
      [*] Under NOMMU conditions ptrace causes read-only MAP_PRIVATE mmaps to become
          totally private copies because if a private mapping was actually shared
          then the debugging setting breakpoints in it would potentially crash
          other processes.
      
      This is done by using the VM_MAYWRITE flag rather than the VM_WRITE flag
      when deciding whether to permit a write.
      
      Without this patch a debugger can't set breakpoints in the mapped text
      sections of executables that are mapped read-only private, even if the
      mmap() syscall has taken a private copy because PT_PTRACED is set.
      
      In addition, VM_MAYREAD is used instead of VM_READ for similar reasons.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d00c7b99
    • D
      [PATCH] NOMMU: Check VMA protections · 7b4d5b8b
      David Howells 提交于
      Check the VMA protections in get_user_pages() against what's being asked.
      
      This checks to see that we don't accidentally write on a non-writable VMA or
      permit an I/O mapping VMA to be accessed (which may lack page structs).
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7b4d5b8b
    • S
      [PATCH] Check if start address is in vma region in NOMMU function get_user_pages() · 910e46da
      Sonic Zhang 提交于
      In NOMMU arch, if run "cat /proc/self/mem", data from physical address 0
      are read.  This behavior is different from MMU arch.  In IA32, message
      "cat: /proc/self/mem: Input/output error" is reported.
      
      This issue is rootcaused by not validate the start address in NOMMU
      function get_user_pages().  Following patch solves this issue.
      Signed-off-by: NSonic Zhang <sonic.adi@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      910e46da
    • D
      [PATCH] NOMMU: Use find_vma() rather than reimplementing a VMA search · 0159b141
      David Howells 提交于
      Use find_vma() in the NOMMU version of access_process_vm() rather than
      reimplementing it.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0159b141
    • D
      [PATCH] NOMMU: Check that access_process_vm() has a valid target · 0ec76a11
      David Howells 提交于
      Check that access_process_vm() is accessing a valid mapping in the target
      process.
      
      This limits ptrace() accesses and accesses through /proc/<pid>/maps to only
      those regions actually mapped by a program.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0ec76a11
    • R
      [PATCH] Mark __remove_vm_area() static · d24afc57
      Rolf Eike Beer 提交于
      The function is exported but not used from anywhere else.  It's also marked as
      "not for driver use" so noone out there should really care.
      Signed-off-by: NRolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d24afc57
    • R
      [PATCH] Fix kerneldoc comments in mm/vmalloc.c · ead04089
      Rolf Eike Beer 提交于
      The empty line between the short description and the first argument
      description causes a section to appear twice in the generated manpage.
      Also the short description should really be short: the script can't handle
      multiple lines.
      Signed-off-by: NRolf Eike Beer <eike-kernel@sf-tec.de>
      Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ead04089
    • R
      [PATCH] mm/page_alloc: use NULL instead of 0 for ptr · 423b41d7
      Randy Dunlap 提交于
      Use NULL instead of 0 for pointer value, eliminate sparse warnings.
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      423b41d7
    • J
      [PATCH] do_no_pfn() · f4b81804
      Jes Sorensen 提交于
      Implement do_no_pfn() for handling mapping of memory without a struct page
      backing it.  This avoids creating fake page table entries for regions which
      are not backed by real memory.
      
      This feature is used by the MSPEC driver and other users, where it is
      highly undesirable to have a struct page sitting behind the page (for
      instance if the page is accessed in cached mode via the struct page in
      parallel to the the driver accessing it uncached, which can result in data
      corruption on some architectures, such as ia64).
      
      This version uses specific NOPFN_{SIGBUS,OOM} return values, rather than
      expect all negative pfn values would be an error.  It also bugs on cow
      mappings as this would not work with the VM.
      
      [akpm@osdl.org: micro-optimise]
      Signed-off-by: NJes Sorensen <jes@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f4b81804
    • C
      [PATCH] zone_statistics: Use hot node instead of cold zone_pgdat · 5d292343
      Christoph Lameter 提交于
      Now that we have the node in the hot zone of struct zone we can avoid
      accessing zone_pgdat in zone_statistics.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5d292343
    • C
      [PATCH] Do not allocate pagesets for unpopulated zones. · 66a55030
      Christoph Lameter 提交于
      We do not need to allocate pagesets for unpopulated zones.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      66a55030
    • C
      [PATCH] Add node to zone for the NUMA case · d5f541ed
      Christoph Lameter 提交于
      Add the node in order to optimize zone_to_nid.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Acked-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d5f541ed
    • C
      [PATCH] GFP_THISNODE for the slab allocator · 765c4507
      Christoph Lameter 提交于
      This patch insures that the slab node lists in the NUMA case only contain
      slabs that belong to that specific node.  All slab allocations use
      GFP_THISNODE when calling into the page allocator.  If an allocation fails
      then we fall back in the slab allocator according to the zonelists appropriate
      for a certain context.
      
      This allows a replication of the behavior of alloc_pages and alloc_pages node
      in the slab layer.
      
      Currently allocations requested from the page allocator may be redirected via
      cpusets to other nodes.  This results in remote pages on nodelists and that in
      turn results in interrupt latency issues during cache draining.  Plus the slab
      is handing out memory as local when it is really remote.
      
      Fallback for slab memory allocations will occur within the slab allocator and
      not in the page allocator.  This is necessary in order to be able to use the
      existing pools of objects on the nodes that we fall back to before adding more
      pages to a slab.
      
      The fallback function insures that the nodes we fall back to obey cpuset
      restrictions of the current context.  We do not allocate objects from outside
      of the current cpuset context like before.
      
      Note that the implementation of locality constraints within the slab allocator
      requires importing logic from the page allocator.  This is a mischmash that is
      not that great.  Other allocators (uncached allocator, vmalloc, huge pages)
      face similar problems and have similar minimal reimplementations of the basic
      fallback logic of the page allocator.  There is another way of implementing a
      slab by avoiding per node lists (see modular slab) but this wont work within
      the existing slab.
      
      V1->V2:
      - Use NUMA_BUILD to avoid #ifdef CONFIG_NUMA
      - Exploit GFP_THISNODE being 0 in the NON_NUMA case to avoid another
        #ifdef
      
      [akpm@osdl.org: build fix]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      765c4507
    • C
      [PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMA · 08e0f6a9
      Christoph Lameter 提交于
      The NUMA_BUILD constant is always available and will be set to 1 on
      NUMA_BUILDs.  That way checks valid only under CONFIG_NUMA can easily be done
      without #ifdef CONFIG_NUMA
      
      F.e.
      
      if (NUMA_BUILD && <numa_condition>) {
      ...
      }
      
      [akpm: not a thing we'd normally do, but CONFIG_NUMA is special: it is
       causing ifdef explosion in core kernel, so let's see if this is a comfortable
       way in whcih to control that]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      08e0f6a9
    • J
      [PATCH] Condense output of show_free_areas() · c7241913
      Jes Sorensen 提交于
      On larger systems, the amount of output dumped on the console when you do
      SysRq-M is beyond insane.  This patch is trying to reduce it somewhat as
      even with the smaller NUMA systems that have hit the desktop this seems to
      be a fair thing to do.
      
      The philosophy I have taken is as follows:
       1) If a zone is empty, don't tell, we don't need yet another line
          telling us so. The information is available since one can look up
          the fact how many zones were initialized in the first place.
       2) Put as much information on a line is possible, if it can be done
          in one line, rahter than two, then do it in one. I tried to format
          the temperature stuff for easy reading.
      
      Change show_free_areas() to not print lines for empty zones.  If no zone
      output is printed, the zone is empty.  This reduces the number of lines
      dumped to the console in sysrq on a large system by several thousand lines.
      
      Change the zone temperature printouts to use one line per CPU instead of
      two lines (one hot, one cold).  On a 1024 CPU, 1024 node system, this
      reduces the console output by over a million lines of output.
      
      While this is a bigger problem on large NUMA systems, it is also applicable
      to smaller desktop sized and mid range NUMA systems.
      
      Old format:
      
      Mem-info:
      Node 0 DMA per-cpu:
      cpu 0 hot: high 42, batch 7 used:24
      cpu 0 cold: high 14, batch 3 used:1
      cpu 1 hot: high 42, batch 7 used:34
      cpu 1 cold: high 14, batch 3 used:0
      cpu 2 hot: high 42, batch 7 used:0
      cpu 2 cold: high 14, batch 3 used:0
      cpu 3 hot: high 42, batch 7 used:0
      cpu 3 cold: high 14, batch 3 used:0
      cpu 4 hot: high 42, batch 7 used:0
      cpu 4 cold: high 14, batch 3 used:0
      cpu 5 hot: high 42, batch 7 used:0
      cpu 5 cold: high 14, batch 3 used:0
      cpu 6 hot: high 42, batch 7 used:0
      cpu 6 cold: high 14, batch 3 used:0
      cpu 7 hot: high 42, batch 7 used:0
      cpu 7 cold: high 14, batch 3 used:0
      Node 0 DMA32 per-cpu: empty
      Node 0 Normal per-cpu: empty
      Node 0 HighMem per-cpu: empty
      Node 1 DMA per-cpu:
      [snip]
      Free pages:     5410688kB (0kB HighMem)
      Active:9536 inactive:4261 dirty:6 writeback:0 unstable:0 free:338168 slab:1931 mapped:1900 pagetables:208
      Node 0 DMA free:1676304kB min:3264kB low:4080kB high:4896kB active:128048kB inactive:61568kB present:1970880kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 Normal free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 0 HighMem free:0kB min:512kB low:512kB high:512kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 1 DMA free:1951728kB min:3280kB low:4096kB high:4912kB active:5632kB inactive:1504kB present:1982464kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      ....
      
      New format:
      
      Mem-info:
      Node 0 DMA per-cpu:
      CPU    0: Hot: hi:   42, btch:   7 usd:  41   Cold: hi:   14, btch:   3 usd:   2
      CPU    1: Hot: hi:   42, btch:   7 usd:  40   Cold: hi:   14, btch:   3 usd:   1
      CPU    2: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    3: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    4: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    5: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    6: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      CPU    7: Hot: hi:   42, btch:   7 usd:   0   Cold: hi:   14, btch:   3 usd:   0
      Node 1 DMA per-cpu:
      [snip]
      Free pages:     5411088kB (0kB HighMem)
      Active:9558 inactive:4233 dirty:6 writeback:0 unstable:0 free:338193 slab:1942 mapped:1918 pagetables:208
      Node 0 DMA free:1677648kB min:3264kB low:4080kB high:4896kB active:129296kB inactive:58864kB present:1970880kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Node 1 DMA free:1948448kB min:3280kB low:4096kB high:4912kB active:6864kB inactive:3536kB present:1982464kB pages_scanned:0 all_unreclaimable? no
      lowmem_reserve[]: 0 0 0 0
      Signed-off-by: NJes Sorensen <jes@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c7241913
    • C
      [PATCH] slab: fix kmalloc_node applying memory policies if nodeid == numa_node_id() · de3083ec
      Christoph Lameter 提交于
      kmalloc_node() falls back to ___cache_alloc() under certain conditions and
      at that point memory policies may be applied redirecting the allocation
      away from the current node.  Therefore kmalloc_node(...,numa_node_id()) or
      kmalloc_node(...,-1) may not return memory from the local node.
      
      Fix this by doing the policy check in __cache_alloc() instead of
      ____cache_alloc().
      
      This version here is a cleanup of Kiran's patch.
      
      - Tested on ia64.
      - Extra material removed.
      - Consolidate the exit path if alternate_node_alloc() returned an object.
      
      [akpm@osdl.org: warning fix]
      Signed-off-by: NAlok N Kataria <alok.kataria@calsoftinc.com>
      Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: NShai Fultheim <shai@scalex86.org>
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      de3083ec
    • N
      [PATCH] page invalidation cleanup · 0fd0e6b0
      Nick Piggin 提交于
      Clean up the invalidate code, and use a common function to safely remove
      the page from pagecache.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0fd0e6b0
    • A
      [PATCH] vm: add per-zone writeout counter · e129b5c2
      Andrew Morton 提交于
      The VM is supposed to minimise the number of pages which get written off the
      LRU (for IO scheduling efficiency, and for high reclaim-success rates).  But
      we don't actually have a clear way of showing how true this is.
      
      So add `nr_vmscan_write' to /proc/vmstat and /proc/zoneinfo - the number of
      pages which have been written by the vm scanner in this zone and globally.
      
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e129b5c2
    • M
      [PATCH] Allow an arch to expand node boundaries · fb01439c
      Mel Gorman 提交于
      Arch-independent zone-sizing determines the size of a node
      (pgdat->node_spanned_pages) based on the physical memory that was
      registered by the architecture.  However, when
      CONFIG_MEMORY_HOTPLUG_RESERVE is set, the architecture expects that the
      spanned_pages will be much larger and that mem_map will be allocated that
      is used lated on memory hot-add.
      
      This patch allows an architecture that sets CONFIG_MEMORY_HOTPLUG_RESERVE
      to call push_node_boundaries() which will set the node beginning and end to
      at *least* the requested boundary.
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fb01439c
    • M
      [PATCH] Account for holes that are outside the range of physical memory · 9c7cd687
      Mel Gorman 提交于
      absent_pages_in_range() made the assumption that users of the API would not
      care about holes beyound the end of physical memory.  This was not the
      case.  This patch will account for ranges outside of physical memory as
      holes correctly.
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9c7cd687
    • M
      [PATCH] Account for memmap and optionally the kernel image as holes · 0e0b864e
      Mel Gorman 提交于
      The x86_64 code accounted for memmap and some portions of the the DMA zone as
      holes.  This was because those areas would never be reclaimed and accounting
      for them as memory affects min watermarks.  This patch will account for the
      memmap as a memory hole.  Architectures may optionally use set_dma_reserve()
      if they wish to account for a portion of memory in ZONE_DMA as a hole.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0e0b864e
    • M
      [PATCH] Introduce mechanism for registering active regions of memory · c713216d
      Mel Gorman 提交于
      At a basic level, architectures define structures to record where active
      ranges of page frames are located.  Once located, the code to calculate zone
      sizes and holes in each architecture is very similar.  Some of this zone and
      hole sizing code is difficult to read for no good reason.  This set of patches
      eliminates the similar-looking architecture-specific code.
      
      The patches introduce a mechanism where architectures register where the
      active ranges of page frames are with add_active_range().  When all areas have
      been discovered, free_area_init_nodes() is called to initialise the pgdat and
      zones.  The zone sizes and holes are then calculated in an architecture
      independent manner.
      
      Patch 1 introduces the mechanism for registering and initialising PFN ranges
      Patch 2 changes ppc to use the mechanism - 139 arch-specific LOC removed
      Patch 3 changes x86 to use the mechanism - 136 arch-specific LOC removed
      Patch 4 changes x86_64 to use the mechanism - 74 arch-specific LOC removed
      Patch 5 changes ia64 to use the mechanism - 52 arch-specific LOC removed
      Patch 6 accounts for mem_map as a memory hole as the pages are not reclaimable.
      	It adjusts the watermarks slightly
      
      Tony Luck has successfully tested for ia64 on Itanium with tiger_defconfig,
      gensparse_defconfig and defconfig.  Bob Picco has also tested and debugged on
      IA64.  Jack Steiner successfully boot tested on a mammoth SGI IA64-based
      machine.  These were on patches against 2.6.17-rc1 and release 3 of these
      patches but there have been no ia64-changes since release 3.
      
      There are differences in the zone sizes for x86_64 as the arch-specific code
      for x86_64 accounts the kernel image and the starting mem_maps as memory holes
      but the architecture-independent code accounts the memory as present.
      
      The big benefit of this set of patches is a sizable reduction of
      architecture-specific code, some of which is very hairy.  There should be a
      greater reduction when other architectures use the same mechanisms for zone
      and hole sizing but I lack the hardware to test on.
      
      Additional credit;
      	Dave Hansen for the initial suggestion and comments on early patches
      	Andy Whitcroft for reviewing early versions and catching numerous
      		errors
      	Tony Luck for testing and debugging on IA64
      	Bob Picco for fixing bugs related to pfn registration, reviewing a
      		number of patch revisions, providing a number of suggestions
      		on future direction and testing heavily
      	Jack Steiner and Robin Holt for testing on IA64 and clarifying
      		issues related to memory holes
      	Yasunori for testing on IA64
      	Andi Kleen for reviewing and feeding back about x86_64
      	Christian Kujau for providing valuable information related to ACPI
      		problems on x86_64 and testing potential fixes
      
      This patch:
      
      Define the structure to represent an active range of page frames within a node
      in an architecture independent manner.  Architectures are expected to register
      active ranges of PFNs using add_active_range(nid, start_pfn, end_pfn) and call
      free_area_init_nodes() passing the PFNs of the end of each zone.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NBob Picco <bob.picco@hp.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Keith Mannthey" <kmannth@gmail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c713216d
    • A
      [PATCH] Make kmem_cache_destroy() return void · 133d205a
      Alexey Dobriyan 提交于
      un-, de-, -free, -destroy, -exit, etc functions should in general return
      void.  Also,
      
      There is very little, say, filesystem driver code can do upon failed
      kmem_cache_destroy().  If it will be decided to BUG in this case, BUG
      should be put in generic code, instead.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      133d205a
    • A
      [PATCH] Really ignore kmem_cache_destroy return value · 1a1d92c1
      Alexey Dobriyan 提交于
      * Rougly half of callers already do it by not checking return value
      * Code in drivers/acpi/osl.c does the following to be sure:
      
      	(void)kmem_cache_destroy(cache);
      
      * Those who check it printk something, however, slab_error already printed
        the name of failed cache.
      * XFS BUGs on failed kmem_cache_destroy which is not the decision
        low-level filesystem driver should make. Converted to ignore.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1a1d92c1
  3. 26 9月, 2006 9 次提交