1. 26 Jul 2011, 3 commits
  2. 21 Jul 2011, 1 commit
    • superblock: introduce per-sb cache shrinker infrastructure · b0d40c92
      Committed by Dave Chinner
      With context based shrinkers, we can implement a per-superblock
      shrinker that shrinks the caches attached to the superblock. We
      currently have global shrinkers for the inode and dentry caches that
      split up into per-superblock operations via a coarse proportioning
      method that does not batch very well.  The global shrinkers also
      have a dependency - dentries pin inodes - so we have to be very
      careful about how we register the global shrinkers so that the
      implicit call order is always correct.
      
      With a per-sb shrinker callout, we can encode this dependency
      directly into the per-sb shrinker, hence avoiding the need for
      strictly ordering shrinker registrations. We also have no need for
      any proportioning code, because the shrinker subsystem already
      provides this functionality across all shrinkers. Allowing the
      shrinker to operate on a single superblock at a time means that we
      do fewer superblock list traversals and less locking, and reclaim
      should batch more effectively. This should result in less CPU
      overhead for reclaim and potentially faster reclaim of items from
      each filesystem.
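      
      A hedged sketch of what such a per-sb callout can look like (simplified
      from the patch: the proportioning of sc->nr_to_scan between the two
      caches is elided here):
      
          /* Sketch only: the real callout also splits nr_to_scan between
           * the caches in proportion to their sizes. */
          static int prune_super(struct shrinker *shrink,
                                 struct shrink_control *sc)
          {
              struct super_block *sb;
      
              sb = container_of(shrink, struct super_block, s_shrink);
      
              if (sc->nr_to_scan) {
                  /* Dentries pin inodes: prune the dcache first so the
                   * inodes it releases are reclaimable right below. */
                  prune_dcache_sb(sb, sc->nr_to_scan);
                  prune_icache_sb(sb, sc->nr_to_scan);
              }
      
              /* Report what is left on this sb to the shrinker core. */
              return sb->s_nr_dentry_unused + sb->s_nr_inodes_unused;
          }
      
      Because the dependency lives inside one callback, the registration
      order of shrinkers no longer matters for correctness.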
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b0d40c92
  3. 20 Jul 2011, 1 commit
  4. 13 Jul 2011, 1 commit
    • x86, numa: Implement pfn -> nid mapping granularity check · 1e01979c
      Committed by Tejun Heo
      SPARSEMEM w/o VMEMMAP and DISCONTIGMEM, both used only on 32bit, use
      a sections array to map pfn to nid, which is limited in granularity.
      If NUMA nodes are laid out such that the mapping cannot be accurate,
      boot will fail, triggering the BUG_ON() in mminit_verify_page_links().
      
      On 32bit, the granularity is 512MiB w/ PAE and SPARSEMEM.  This seems
      to have been granular enough until commit 2706a0bf (x86, NUMA: Enable
      CONFIG_AMD_NUMA on 32bit too).  Apparently, there is a machine which
      aligns NUMA nodes to 128MiB and has only AMD NUMA but no SRAT.  This
      led to the following BUG_ON():
      
       On node 0 totalpages: 2096615
         DMA zone: 32 pages used for memmap
         DMA zone: 0 pages reserved
         DMA zone: 3927 pages, LIFO batch:0
         Normal zone: 1740 pages used for memmap
         Normal zone: 220978 pages, LIFO batch:31
         HighMem zone: 16405 pages used for memmap
         HighMem zone: 1853533 pages, LIFO batch:31
       BUG: Int 6: CR2   (null)
            EDI   (null)  ESI 00000002  EBP 00000002  ESP c1543ecc
            EBX f2400000  EDX 00000006  ECX   (null)  EAX 00000001
            err   (null)  EIP c16209aa   CS 00000060  flg 00010002
       Stack: f2400000 00220000 f7200800 c1620613 00220000 01000000 04400000 00238000
                (null) f7200000 00000002 f7200b58 f7200800 c1620929 000375fe   (null)
              f7200b80 c16395f0 00200a02 f7200a80   (null) 000375fe 00000002   (null)
       Pid: 0, comm: swapper Not tainted 2.6.39-rc5-00181-g2706a0bf #17
       Call Trace:
        [<c136b1e5>] ? early_fault+0x2e/0x2e
        [<c16209aa>] ? mminit_verify_page_links+0x12/0x42
        [<c1620613>] ? memmap_init_zone+0xaf/0x10c
        [<c1620929>] ? free_area_init_node+0x2b9/0x2e3
        [<c1607e99>] ? free_area_init_nodes+0x3f2/0x451
        [<c1601d80>] ? paging_init+0x112/0x118
        [<c15f578d>] ? setup_arch+0x791/0x82f
        [<c15f43d9>] ? start_kernel+0x6a/0x257
      
      This patch implements node_map_pfn_alignment(), which determines the
      maximum internode alignment, and updates numa_register_memblks() to
      reject the NUMA configuration if that alignment exceeds the pfn -> nid
      mapping granularity of the memory model as determined by
      PAGES_PER_SECTION.
      
      This makes the problematic machine boot w/ flatmem by rejecting the
      NUMA config and provides protection against crazy NUMA configurations.
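      
      The core of the check is pure pfn arithmetic, so a self-contained
      user-space sketch is possible (the example layout, the 512MiB section
      size, and the message below are illustrative assumptions, not the
      kernel code):
      
          #include <stdio.h>
      
          struct pfn_range { unsigned long start, end; int nid; };
      
          /* Coarsest alignment that still separates adjacent ranges of
           * different nodes, per the commit's description. */
          static unsigned long node_map_pfn_alignment(const struct pfn_range *r,
                                                      int n)
          {
              unsigned long accl_mask = 0, last_end = 0;
              int last_nid = -1;
      
              for (int i = 0; i < n; i++) {
                  unsigned long mask, start = r[i].start;
      
                  if (!start || last_nid < 0 || last_nid == r[i].nid) {
                      last_nid = r[i].nid;
                      last_end = r[i].end;
                      continue;
                  }
      
                  /* Pin-point 'start', then coarsen the mask until it can
                   * no longer tell this node apart from the previous one. */
                  mask = ~((1UL << __builtin_ctzl(start)) - 1);
                  while (mask && last_end <= (start & (mask << 1)))
                      mask <<= 1;
      
                  accl_mask |= mask;
                  last_nid = r[i].nid;
                  last_end = r[i].end;
              }
      
              return ~accl_mask + 1;  /* mask -> alignment in pages */
          }
      
          int main(void)
          {
              /* Two nodes aligned to 128MiB (32768 pfns with 4KiB pages),
               * as on the reported machine. */
              struct pfn_range map[] = { { 0, 32768, 0 }, { 32768, 65536, 1 } };
              unsigned long pfn_align = node_map_pfn_alignment(map, 2);
              unsigned long pages_per_section = 1UL << 17;  /* 512MiB */
      
              if (pfn_align && pfn_align < pages_per_section)
                  printf("node alignment %lu pages < section %lu, reject\n",
                         pfn_align, pages_per_section);
              return 0;
          }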
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20110712074534.GB2872@htj.dyndns.org
      LKML-Reference: <20110628174613.GP478@escobedo.osrc.amd.com>
      Reported-and-Tested-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Conny Seidel <conny.seidel@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      1e01979c
  5. 27 May 2011, 2 commits
  6. 25 May 2011, 13 commits
  7. 10 May 2011, 1 commit
    • Don't lock guardpage if the stack is growing up · a09a79f6
      Committed by Mikulas Patocka
      The Linux kernel excludes the guard page when performing mlock on a
      VMA with a down-growing stack. However, some architectures have an
      up-growing stack, and the guard page should be excluded in that case
      too.
      
      This patch fixes lvm2 on PA-RISC (and possibly other architectures
      with an up-growing stack). lvm2 counts the number of used pages when
      locking and when unlocking, and reports an internal error if the
      numbers do not match.
      
      [ Patch changed fairly extensively to also fix /proc/<pid>/maps for
        the grows-up case, and to move things around a bit to clean it all
        up and share the infrastructure with the /proc bits.
      
        Tested on ia64, which has both grow-up and grow-down segments - Linus ]
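      
      A sketch of the symmetric pair of checks this implies (the helper
      names follow the patch's style, but the exact mm.h definitions are
      paraphrased here):
      
          /* Grows-down stack: the guard page is the VMA's first page,
           * unless the previous VMA is a grows-down stack abutting it. */
          static inline int stack_guard_page_start(struct vm_area_struct *vma,
                                                   unsigned long addr)
          {
              return (vma->vm_flags & VM_GROWSDOWN) &&
                      vma->vm_start == addr &&
                      !vma_growsdown(vma->vm_prev, addr);
          }
      
          /* Grows-up stack (e.g. PA-RISC): the guard page is mirrored at
           * vm_end, so mlock and /proc/<pid>/maps must skip it there. */
          static inline int stack_guard_page_end(struct vm_area_struct *vma,
                                                 unsigned long addr)
          {
              return (vma->vm_flags & VM_GROWSUP) &&
                      vma->vm_end == addr &&
                      !vma_growsup(vma->vm_next, addr);
          }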
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Tested-by: Tony Luck <tony.luck@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a09a79f6
  8. 29 Apr 2011, 1 commit
    • mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups · 78f11a25
      Committed by Andrea Arcangeli
      The huge_memory.c THP page fault was allowed to run if vm_ops was NULL
      (which would succeed for /dev/zero MAP_PRIVATE, as the f_op->mmap
      wouldn't set up a special vma->vm_ops and it would fall back to
      regular anonymous memory), but other THP logic wasn't fully activated
      for vmas with a non-NULL vm_file (/dev/zero has a non-NULL
      vma->vm_file).
      
      So this removes the vm_file checks so that /dev/zero can also safely
      use THP (the other, albeit safer, approach to fixing this bug would
      have been to prevent the initial THP page fault from running if
      vm_file was set).
      
      After removing the vm_file checks, this also makes huge_memory.c
      stricter in khugepaged for the DEBUG_VM=y case.  It doesn't replace
      the vm_file check with an is_pfn_mapping() check (but it keeps
      checking for VM_PFNMAP under VM_BUG_ON), because for an
      is_cow_mapping() mapping, VM_PFNMAP should only be allowed to exist
      before the first page fault, and in turn only while vma->anon_vma is
      NULL (thus preventing khugepaged registration).  So I tend to think
      the previous comment (saying that if vm_file was set, VM_PFNMAP might
      have been set too and we could still be registered in khugepaged,
      despite anon_vma having to be non-NULL for khugepaged registration)
      was too paranoid.  The is_linear_pfn_mapping() check is also, I
      think, superfluous (as described by the comment), but under DEBUG_VM
      it is safe to keep.
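      
      A hedged condensation of the resulting eligibility test (a
      hypothetical helper for illustration; the real checks are spread
      across huge_memory.c rather than collected in one function):
      
          static bool vma_ok_for_anon_thp(struct vm_area_struct *vma)
          {
              /* vm_ops, not vm_file, decides: a MAP_PRIVATE /dev/zero
               * mapping has a vm_file but no special vm_ops, so it may
               * use THP like plain anonymous memory. */
              if (vma->vm_ops)
                  return false;
      
              /* Per the reasoning above, a cow VM_PFNMAP mapping can only
               * exist before the first fault, while anon_vma is still
               * NULL, so khugepaged can never see one: assert under
               * DEBUG_VM instead of checking at runtime. */
              VM_BUG_ON(vma->vm_flags & VM_PFNMAP);
      
              return true;
          }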
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=33682
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Caspar Zhang <bugs@casparzhang.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: <stable@kernel.org>		[2.6.38.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78f11a25
  9. 31 Mar 2011, 1 commit
  10. 25 Mar 2011, 1 commit
  11. 24 Mar 2011, 4 commits
  12. 23 Mar 2011, 3 commits
    • pagewalk: only split huge pages when necessary · 03319327
      Committed by Dave Hansen
      Right now, if an mm_walk has either ->pte_entry or ->pmd_entry set,
      it will unconditionally split any transparent huge pages it runs
      into.  In practice, that means that anyone doing a
      
      	cat /proc/$pid/smaps
      
      will unconditionally break down every huge page in the process and
      depend on khugepaged to re-collapse it later.  This is fairly
      suboptimal.
      
      This patch changes that behavior.  It teaches each ->pmd_entry
      handler (there are five) that it must break down THPs itself.  Also,
      the _generic_ code will never break down a THP unless a ->pte_entry
      handler is actually set.
      
      This means that ->pmd_entry handlers can now choose to deal with THPs
      without breaking them down.
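      
      A sketch of the kind of ->pmd_entry handler this enables (modelled
      loosely on the smaps walker; the stats structure and locking are
      simplified assumptions):
      
          static int sketch_pmd_entry(pmd_t *pmd, unsigned long addr,
                                      unsigned long end, struct mm_walk *walk)
          {
              struct mem_size_stats *mss = walk->private;
      
              if (pmd_trans_huge(*pmd)) {
                  /* Account the whole huge page in place: no
                   * split_huge_page(), so khugepaged has nothing to
                   * re-collapse afterwards. */
                  mss->anonymous_thp += HPAGE_PMD_SIZE;
                  return 0;
              }
      
              /* Not huge: iterate the ptes under this pmd as the handler
               * always did (pte loop omitted from this sketch). */
              return 0;
          }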
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: Eric B Munson <emunson@mgebm.net>
      Tested-by: Eric B Munson <emunson@mgebm.net>
      Cc: Michael J Wolf <mjwolf@us.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      03319327
    • mm: allow GUP to fail instead of waiting on a page · 318b275f
      Committed by Gleb Natapov
      A GUP user may want to try to acquire a reference to a page if it is
      already in memory, but not if IO is needed to bring it in.  For
      example, KVM may tell a vcpu to schedule another guest process if the
      current one is trying to access a swapped-out page.  Meanwhile, the
      page will be swapped in, and the guest process that depends on it
      will be able to run again.
      
      This patch adds the FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
      FOLL_NOWAIT follow_page flags.  FAULT_FLAG_RETRY_NOWAIT, when used in
      conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault
      that it shouldn't drop mmap_sem and wait on the page, but should
      return VM_FAULT_RETRY instead.
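      
      A sketch of how a caller such as KVM might use the new flag (the
      wrapper name and the exact __get_user_pages() parameter list are
      assumptions for this kernel era):
      
          /* Grab one page only if no IO is needed; a would-block fault
           * comes back as VM_FAULT_RETRY so the caller can do other work
           * and retry later. */
          static int get_user_page_nowait(struct task_struct *tsk,
                                          struct mm_struct *mm,
                                          unsigned long start, int write,
                                          struct page **page)
          {
              int flags = FOLL_TOUCH | FOLL_GET | FOLL_NOWAIT;
      
              if (write)
                  flags |= FOLL_WRITE;
      
              /* FOLL_NOWAIT makes the fault path take
               * FAULT_FLAG_RETRY_NOWAIT, so mmap_sem is not dropped to
               * sleep on the page. */
              return __get_user_pages(tsk, mm, start, 1, flags, page,
                                      NULL, NULL);
          }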
      
      [akpm@linux-foundation.org: improve FOLL_NOWAIT comment]
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      318b275f
    • oom: suppress nodes that are not allowed from meminfo on oom kill · ddd588b5
      Committed by David Rientjes
      The oom killer is extremely verbose for machines with a large number
      of cpus and/or nodes.  This verbosity can often be harmful if it
      causes other important messages to be scrolled out of the kernel log
      and incurs a significant time delay, specifically for kernels with
      CONFIG_NODES_SHIFT > 8.
      
      This patch causes only memory information to be displayed for nodes that
      are allowed by current's cpuset when dumping the VM state.  Information
      for all other nodes is irrelevant to the oom condition; we don't care if
      there's an abundance of memory elsewhere if we can't access it.
      
      This only affects the behavior of dumping memory information when an oom
      is triggered.  Other dumps, such as for sysrq+m, still display the
      unfiltered form when using the existing show_mem() interface.
      
      Additionally, the per-cpu pageset statistics are extremely verbose in
      oom killer output, so they are now suppressed.  This removes
      
      	nodes_weight(current->mems_allowed) * (1 + nr_cpus)
      
      lines from the oom killer output.
      
      Callers may use __show_mem(SHOW_MEM_FILTER_NODES) to filter disallowed
      nodes.
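      
      A sketch of the node filter this adds (simplified from the patch; the
      mems_allowed sampling/locking around the test is omitted):
      
          /* Skip a node's dump when filtering is on and the node is
           * outside current's cpuset: that memory is unreachable anyway. */
          static bool skip_free_areas_node(unsigned int flags, int nid)
          {
              if (!(flags & SHOW_MEM_FILTER_NODES))
                  return false;   /* unfiltered dump, e.g. sysrq+m */
      
              return !node_isset(nid, cpuset_current_mems_allowed);
          }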
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ddd588b5
  13. 18 Mar 2011, 4 commits
  14. 24 Feb 2011, 1 commit
  15. 22 Jan 2011, 1 commit
  16. 14 Jan 2011, 2 commits