1. 17 7月, 2007 4 次提交
    • P
      Make /proc/slabinfo use seq_list_xxx helpers · b92151ba
      Pavel Emelianov 提交于
      This entry prints a header in .start callback.  This is OK, but the more
      elegant solution would be to move this into the .show callback and use
      seq_list_start_head() in .start one.
      
      I have left it as is in order to make the patch just switch to new API and
      noting more.
      
      [adobriyan@sw.ru: Wrong pointer was used as kmem_cache pointer]
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b92151ba
    • R
      MM: use DIV_ROUND_UP() in mm/memory.c · 68e116a3
      Rolf Eike Beer 提交于
      Replace a hand coded version of DIV_ROUND_UP().
      Signed-off-by: NRolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      68e116a3
    • N
      hugetlb: remove unnecessary nid initialization · 31a5c6e4
      Nishanth Aravamudan 提交于
      nid is initialized to numa_node_id() but will either be overwritten in
      the loop or not used in the conditional. So remove the initialization.
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      31a5c6e4
    • K
      change zonelist order: zonelist order selection logic · f0c0b2b8
      KAMEZAWA Hiroyuki 提交于
      Make zonelist creation policy selectable from sysctl/boot option v6.
      
      This patch makes NUMA's zonelist (of pgdat) order selectable.
      Available order are Default(automatic)/ Node-based / Zone-based.
      
      [Default Order]
      The kernel selects Node-based or Zone-based order automatically.
      
      [Node-based Order]
      This policy treats the locality of memory as the most important parameter.
      Zonelist order is created by each zone's locality. This means lower zones
      (ex. ZONE_DMA) can be used before higher zone (ex. ZONE_NORMAL) exhausion.
      IOW. ZONE_DMA will be in the middle of zonelist.
      current 2.6.21 kernel uses this.
      
      Pros.
       * A user can expect local memory as much as possible.
      Cons.
       * lower zone will be exhansted before higher zone. This may cause OOM_KILL.
      
      Maybe suitable if ZONE_DMA is relatively big and you never see OOM_KILL
      because of ZONE_DMA exhaution and you need the best locality.
      
      (example)
      assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
      
      *node(0)'s memory allocation order:
      
       node(0)'s NORMAL -> node(0)'s DMA -> node(1)'s NORMAL.
      
      *node(1)'s memory allocation order:
      
       node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
      
      [Zone-based order]
      This policy treats the zone type as the most important parameter.
      Zonelist order is created by zone-type order. This means lower zone
      never be used bofere higher zone exhaustion.
      IOW. ZONE_DMA will be always at the tail of zonelist.
      
      Pros.
       * OOM_KILL(bacause of lower zone) occurs only if the whole zones are exhausted.
      Cons.
       * memory locality may not be best.
      
      (example)
      assume 2 node NUMA. node(0) has ZONE_DMA/ZONE_NORMAL, node(1) has ZONE_NORMAL.
      
      *node(0)'s memory allocation order:
      
       node(0)'s NORMAL -> node(1)'s NORMAL -> node(0)'s DMA.
      
      *node(1)'s memory allocation order:
      
       node(1)'s NORMAL -> node(0)'s NORMAL -> node(0)'s DMA.
      
      bootoption "numa_zonelist_order=" and proc/sysctl is supporetd.
      
      command:
      %echo N > /proc/sys/vm/numa_zonelist_order
      
      Will rebuild zonelist in Node-based order.
      
      command:
      %echo Z > /proc/sys/vm/numa_zonelist_order
      
      Will rebuild zonelist in Zone-based order.
      
      Thanks to Lee Schermerhorn, he gives me much help and codes.
      
      [Lee.Schermerhorn@hp.com: add check_highest_zone to build_zonelists_in_zone_order]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "jesse.barnes@intel.com" <jesse.barnes@intel.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0c0b2b8
  2. 12 7月, 2007 1 次提交
    • E
      security: Protection for exploiting null dereference using mmap · ed032189
      Eric Paris 提交于
      Add a new security check on mmap operations to see if the user is attempting
      to mmap to low area of the address space.  The amount of space protected is
      indicated by the new proc tunable /proc/sys/vm/mmap_min_addr and defaults to
      0, preserving existing behavior.
      
      This patch uses a new SELinux security class "memprotect."  Policy already
      contains a number of allow rules like a_t self:process * (unconfined_t being
      one of them) which mean that putting this check in the process class (its
      best current fit) would make it useless as all user processes, which we also
      want to protect against, would be allowed. By taking the memprotect name of
      the new class it will also make it possible for us to move some of the other
      memory protect permissions out of 'process' and into the new class next time
      we bump the policy version number (which I also think is a good future idea)
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: NChris Wright <chrisw@sous-sol.org>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      ed032189
  3. 10 7月, 2007 3 次提交
    • C
      xip sendfile removal · d054fe3d
      Carsten Otte 提交于
      This patch removes xip_file_sendfile, the sendfile implementation for
      xip without replacement. Those customers that use xip on s390 are not
      using sendfile() as far as we know, and so far s390 is the only platform
      this could potentially be used on so far.
      Having sendfile is not a popular feature for execute in place file
      systems, however we have a working implementation of splice_read() based
      on fs/splice.c if anyone asks for it.
      At this point in time, it does not seem preferable to merge
      splice_read() for xip because it causes extra maintenence effort due to
      code duplication and it requires struct page behind the xip memory
      segment. We'd like to get rid of that in favor of supporting flash based
      embedded platforms (Monta Vista work) soon.
      Signed-off-by: NCarsten Otte <cotte@de.ibm.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      d054fe3d
    • H
      shmem: convert to using splice instead of sendfile() · ae976416
      Hugh Dickins 提交于
      Remove shmem_file_sendfile and resurrect shmem_readpage, as used by tmpfs
      to support loop and sendfile in 2.4 and 2.5.  Now tmpfs can support splice,
      loop and sendfile in the simplest way, using generic_file_splice_read and
      generic_file_splice_write (with the aid of shmem_prepare_write).
      
      We could make some efficiency tweaks later, if there's a real need;
      but this is stable and works well as is.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      ae976416
    • J
      sendfile: kill generic_file_sendfile() · 0452a4e5
      Jens Axboe 提交于
      It's no longer used.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      0452a4e5
  4. 09 7月, 2007 1 次提交
  5. 07 7月, 2007 2 次提交
  6. 06 7月, 2007 1 次提交
    • D
      Fix slab redzone alignment · 87a927c7
      David Woodhouse 提交于
      Commit b46b8f19 fixed a couple of bugs
      by switching the redzone to 64 bits. Unfortunately, it neglected to
      ensure that the _second_ redzone, after the slab object, is aligned
      correctly. This caused illegal instruction faults on sparc32, which for
      some reason not entirely clear to me are not trapped and fixed up.
      
      Two things need to be done to fix this:
        - increase the object size, rounding up to alignof(long long) so
          that the second redzone can be aligned correctly.
        - If SLAB_STORE_USER is set but alignof(long long)==8, allow a
          full 64 bits of space for the user word at the end of the buffer,
          even though we may not _use_ the whole 64 bits.
      
      This patch should be a no-op on any 64-bit architecture or any 32-bit
      architecture where alignof(long long) == 4. Of the others, it's tested
      on ppc32 by myself and a very similar patch was tested on sparc32 by
      Mark Fortescue, who reported the new problem.
      
      Also, fix the conditions for FORCED_DEBUG, which hadn't been adjusted to
      the new sizes. Again noticed by Mark.
      Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87a927c7
  7. 04 7月, 2007 1 次提交
  8. 02 7月, 2007 1 次提交
  9. 29 6月, 2007 1 次提交
    • H
      mm: kill validate_anon_vma to avoid mapcount BUG · 30acbaba
      Hugh Dickins 提交于
      validate_anon_vma gave a useful check on the integrity of the anon_vma list
      when Andrea was developing obj rmap; but it was not enabled in SLES9
      itself, nor in mainline, until Nick changed commented-out RMAP_DEBUG to
      configurable CONFIG_DEBUG_VM in 2.6.17.  Now Petr Vandrovec reports that
      its BUG_ON(mapcount > 100000) can easily crash a CONFIG_DEBUG_VM=y system.
      
      That limit was just an arbitrary number to protect against an infinite
      loop.  We could raise it to something enormous (depending on sizeof struct
      vma and size of memory?); but I rather think validate_anon_vma has outlived
      its usefulness, and is better just removed - which gives a magnificent
      performance boost to anything like Petr's test program ;)
      
      Of course, a very long anon_vma list is bad news for preemption latency,
      and I believe there has been one recent report of such: let's not forget
      that, but validate_anon_vma only makes it worse not better.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: Petr Vandrovec <petr@vmware.com>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Cc: Andrea Arcangeli <andrea@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30acbaba
  10. 24 6月, 2007 1 次提交
  11. 22 6月, 2007 1 次提交
  12. 17 6月, 2007 3 次提交
  13. 16 6月, 2007 1 次提交
    • P
      mm: Fix memory/cpu hotplug section mismatch and oops. · d09c6b80
      Paul Mundt 提交于
      When building with memory hotplug enabled and cpu hotplug disabled, we
      end up with the following section mismatch:
      
      WARNING: mm/built-in.o(.text+0x4e58): Section mismatch: reference to
      .init.text: (between 'free_area_init_node' and '__build_all_zonelists')
      
      This happens as a result of:
      
              -> free_area_init_node()
                -> free_area_init_core()
                  -> zone_pcp_init() <-- all __meminit up to this point
                    -> zone_batchsize() <-- marked as __cpuinit                     fo
      
      This happens because CONFIG_HOTPLUG_CPU=n sets __cpuinit to __init, but
      CONFIG_MEMORY_HOTPLUG=y unsets __meminit.
      
      Changing zone_batchsize() to __devinit fixes this.
      
      __devinit is the only thing that is common between CONFIG_HOTPLUG_CPU=y and
      CONFIG_MEMORY_HOTPLUG=y. In the long run, perhaps this should be moved to
      another section identifier completely. Without this, memory hot-add
      of offline nodes (via hotadd_new_pgdat()) will oops if CPU hotplug is
      not also enabled.
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      Acked-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      
      --
      
       mm/page_alloc.c |    2 +-
       1 file changed, 1 insertion(+), 1 deletion(-)
      d09c6b80
  14. 09 6月, 2007 4 次提交
  15. 01 6月, 2007 3 次提交
  16. 31 5月, 2007 2 次提交
  17. 24 5月, 2007 3 次提交
  18. 22 5月, 2007 1 次提交
    • A
      Detach sched.h from mm.h · e8edc6e0
      Alexey Dobriyan 提交于
      First thing mm.h does is including sched.h solely for can_do_mlock() inline
      function which has "current" dereference inside. By dealing with can_do_mlock()
      mm.h can be detached from sched.h which is good. See below, why.
      
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being dependency for significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      
      Cross-compile tested on
      
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	arm
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	m68k
      	mips
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	um-x86_64
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      
      as well as my two usual configs.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8edc6e0
  19. 19 5月, 2007 2 次提交
  20. 17 5月, 2007 4 次提交