1. 07 Mar, 2010 1 commit
    • mm: change anon_vma linking to fix multi-process server scalability issue · 5beb4930
      Committed by Rik van Riel
      The old anon_vma code can lead to scalability issues with heavily forking
      workloads.  Specifically, each anon_vma will be shared between the parent
      process and all its child processes.
      
      In a workload with 1000 child processes and a VMA with 1000 anonymous
      pages per process that get COWed, this leads to a system with a million
      anonymous pages in the same anon_vma, each of which is mapped in just one
      of the 1000 processes.  However, the current rmap code needs to walk them
      all, leading to O(N) scanning complexity for each page.
      
      This can result in systems where one CPU is walking the page tables of
      1000 processes in page_referenced_one, while all other CPUs are stuck on
      the anon_vma lock.  This leads to catastrophic failure for a benchmark
      like AIM7, where the total number of processes can reach into the tens of
      thousands.  Real workloads are still a factor of 10 less process-intensive
      than AIM7, but they are catching up.
      
      This patch changes the way anon_vmas and VMAs are linked, which allows us
      to associate multiple anon_vmas with a VMA.  At fork time, each child
      process gets its own anon_vmas, in which its COWed pages will be
      instantiated.  The parent's anon_vma is also linked to the VMA, because
      non-COWed pages could be present in any of the children.
      
      This reduces rmap scanning complexity to O(1) for the pages of the 1000
      child processes, with O(N) complexity for at most 1/N pages in the system.
      This reduces the average scanning cost in heavily forking workloads from
      O(N) to 2.
      
      The only real complexity in this patch stems from the fact that linking a
      VMA to anon_vmas now involves memory allocations.  This means vma_adjust
      can fail, if it needs to attach a VMA to anon_vma structures.  This in
      turn means error handling needs to be added to the calling functions.
      
      A second source of complexity is that, because there can be multiple
      anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
      "the" anon_vma lock.  To prevent the rmap code from walking up an
      incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag.  This bit
      flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
      to make sure it is impossible to compile a kernel that needs both symbolic
      values for the same bitflag.
      
      Some test results:
      
      Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
      box with 16GB RAM and not quite enough IO), the system ends up running
      >99% in system time, with every CPU on the same anon_vma lock in the
      pageout code.
      
      With these changes, AIM7 hits the cross-over point around 29.7k users.
      This happens with ~99% IO wait time; there never seems to be any spike in
      system time.  The anon_vma lock contention appears to be resolved.
      
      [akpm@linux-foundation.org: cleanups]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
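The scanning-cost argument in the commit message above can be checked with a small userspace model. This is only a sketch under stated assumptions, not kernel code: the function names are hypothetical, there is one parent with N children, every page in a child has been COWed into that child's own anon_vma, and the parent's pages stay in the parent's anon_vma, which is linked to the VMAs of the parent and all N children.

```c
#include <stddef.h>

/* Old scheme: parent and children share a single anon_vma, so an rmap
 * walk for any page visits the VMA of every process (parent + children). */
double avg_walk_cost_old(size_t nchildren)
{
    return (double)(nchildren + 1);
}

/* New scheme (toy model, assumptions as described in the lead-in):
 * - a child's COWed page lives in the child's own anon_vma: 1 VMA visited
 * - a parent page may still be mapped anywhere: nchildren + 1 VMAs visited
 * Returns the average number of VMAs visited per page. */
double avg_walk_cost_new(size_t nchildren, size_t pages_per_proc)
{
    double child_pages, parent_pages, total_visits;

    if (pages_per_proc == 0)
        return 0.0; /* no pages, nothing to walk */

    child_pages  = (double)nchildren * (double)pages_per_proc;
    parent_pages = (double)pages_per_proc;
    total_visits = child_pages * 1.0 +
                   parent_pages * (double)(nchildren + 1);

    return total_visits / (child_pages + parent_pages);
}
```

With 1000 children and 1000 pages per process, the old model visits 1001 VMAs per page, while the new model averages just under 2, which matches the "from O(N) to 2" figure quoted in the commit message.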
  2. 27 Feb, 2010 1 commit
  3. 07 Jan, 2010 1 commit
  4. 17 Dec, 2009 1 commit
  5. 04 Dec, 2009 1 commit
  6. 19 Nov, 2009 1 commit
  7. 12 Nov, 2009 1 commit
  8. 01 Jul, 2009 1 commit
    • [IA64] address compiler warnings perfmon.c/salinfo.c · fa276f36
      Committed by Jan Beulich
      perfmon.c has a dubious cast directly from "int" to "void *". Add
      an intermediate cast to "long" to keep gcc happy.
      
      salinfo.c uses "down_trylock()" in a highly creative way (explained
      in the comments in the file) ... but it does kick out this warning:
      
       arch/ia64/kernel/salinfo.c:195: warning: ignoring return value of 'down_trylock'
      
      which people occasionally try to "fix" in ways that do not work. Use some
      casts to keep gcc quiet.
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
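The perfmon.c part of the commit above describes casting an "int" to "void *" through an intermediate "long". A minimal userspace sketch of that pattern follows. The helper names are hypothetical, and it assumes an LP64 target (such as ia64 Linux) where "long" has the same width as a pointer.

```c
/* Casting a 32-bit int straight to a 64-bit pointer makes gcc warn about
 * a "cast to pointer from integer of different size". Widening to long
 * first makes the size change explicit. Assumes sizeof(long) equals
 * sizeof(void *), which holds on LP64 targets. */
void *int_to_cookie(int value)
{
    return (void *)(long)value;
}

/* Reverse direction: narrow through long before truncating to int. */
int cookie_to_int(void *cookie)
{
    return (int)(long)cookie;
}
```

Portable userspace code would normally use intptr_t from <stdint.h> instead of long; the sketch keeps long only because that is what the commit message names.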
  9. 18 Jun, 2009 1 commit
    • [IA64] Convert ia64 to use int-ll64.h · e088a4ad
      Committed by Matthew Wilcox
      It is generally agreed that it would be beneficial for u64 to be an
      unsigned long long on all architectures.  ia64 (in common with several
      other 64-bit architectures) currently uses unsigned long.  Migrating
      piecemeal is too painful; this giant patch fixes all compilation warnings
      and errors that come as a result of switching to use int-ll64.h.
      
      Note that userspace will still see __u64 defined as unsigned long.  This
      is important as it affects C++ name mangling.
      
      [Updated by Tony Luck to change efi.h:efi_freemem_callback_t to use
       u64 for start/end rather than unsigned long]
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
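One practical effect of making u64 an unsigned long long everywhere is that a single format specifier works on every architecture. The sketch below illustrates this in userspace. The typedef mirrors the int-ll64.h convention, and format_u64 is a hypothetical helper, not a kernel function.

```c
#include <stdio.h>

/* int-ll64.h style: u64 is unsigned long long on every architecture.
 * (Before the commit above, ia64 defined it as unsigned long.) */
typedef unsigned long long u64;

/* Formats a u64 with "%llu" and no cast. If u64 were unsigned long,
 * gcc's format checking would warn about the mismatch.
 * Returns what snprintf returns: the length of the formatted text. */
int format_u64(char *buf, size_t len, u64 value)
{
    return snprintf(buf, len, "%llu", value);
}
```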
  10. 17 Jun, 2009 1 commit
    • remove put_cpu_no_resched() · 8b0b1db0
      Committed by Thomas Gleixner
      put_cpu_no_resched() is an optimization of put_cpu() which unfortunately
      can cause high latencies.
      
      The nfs iostats code uses put_cpu_no_resched() in a code sequence where a
      reschedule request caused by an interrupt between the get_cpu() and the
      put_cpu_no_resched() can delay the reschedule for at least HZ.
      
      The other users of put_cpu_no_resched() optimize correctly in interrupt
      code, but there is no real harm in using the put_cpu() function which is
      an alias for preempt_enable().  The extra check of the preempt count is
      not as critical as the potential source of missing a reschedule.
      
      Debugged in the preempt-rt tree and verified in mainline.
      
      Impact: remove a high latency source
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
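The latency problem described in the commit above can be illustrated with a toy userspace model of the preempt count. Everything here is hypothetical (the toy_ names, the struct layout); it only assumes what the commit message states, namely that put_cpu() behaves like preempt_enable() and so checks for a pending reschedule, while the removed variant does not.

```c
#include <stdbool.h>

/* Toy per-task state for this sketch only. */
struct toy_task {
    int  preempt_count; /* > 0 means preemption is disabled */
    bool need_resched;  /* set when an interrupt requests a reschedule */
    int  reschedules;   /* how often the toy "scheduler" has run */
};

/* Models get_cpu(): disable preemption. */
void toy_get_cpu(struct toy_task *t)
{
    t->preempt_count++;
}

/* Models the removed put_cpu_no_resched(): re-enables preemption but
 * never looks at need_resched, so a pending request stays pending. */
void toy_put_cpu_no_resched(struct toy_task *t)
{
    t->preempt_count--;
}

/* Models put_cpu(): re-enables preemption and, once the count reaches
 * zero, acts on a pending reschedule request straight away. */
void toy_put_cpu(struct toy_task *t)
{
    t->preempt_count--;
    if (t->preempt_count == 0 && t->need_resched) {
        t->need_resched = false;
        t->reschedules++;
    }
}
```

If need_resched is set between toy_get_cpu() and the matching put call, only toy_put_cpu() runs the toy scheduler; with the no_resched variant the request is left pending, which is the delayed reschedule the commit message describes.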
  11. 28 Mar, 2009 1 commit
  12. 16 Mar, 2009 1 commit
  13. 14 Nov, 2008 2 commits
  14. 02 Nov, 2008 1 commit
    • saner FASYNC handling on file close · 233e70f4
      Committed by Al Viro
      As it is, all instances of ->release() for files that have ->fasync()
      need to remember to evict file from fasync lists; forgetting that
      creates a hole and we actually have a bunch that *does* forget.
      
      So let's keep our lives simple - let __fput() check FASYNC in
      file->f_flags and call ->fasync() there if it's been set.  And lose that
      crap in ->release() instances - leaving it there is still valid, but we
      don't have to bother anymore.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
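The idea in the commit above, letting the final put of a file check FASYNC and call ->fasync() centrally, can be sketched with a toy userspace model. All toy_ names, the struct layouts, and the flag value are hypothetical and exist only for this illustration; the real __fput() does considerably more.

```c
#include <stddef.h>

#define TOY_FASYNC 0x2000u /* illustrative flag bit for this sketch */

struct toy_file;

struct toy_file_ops {
    int (*fasync)(int fd, struct toy_file *filp, int on);
    int (*release)(struct toy_file *filp);
};

struct toy_file {
    unsigned int f_flags;
    const struct toy_file_ops *f_op;
    int on_fasync_list; /* 1 while the file sits on a fasync list */
};

/* Toy ->fasync(): adds or removes the file from the (imaginary) list. */
int toy_fasync(int fd, struct toy_file *filp, int on)
{
    (void)fd;
    filp->on_fasync_list = on ? 1 : 0;
    if (on)
        filp->f_flags |= TOY_FASYNC;
    else
        filp->f_flags &= ~TOY_FASYNC;
    return 0;
}

/* A ->release() that forgets the fasync cleanup, as some drivers did. */
int toy_release(struct toy_file *filp)
{
    (void)filp;
    return 0;
}

/* Toy final put: evict from the fasync list centrally, then release. */
void toy_fput(struct toy_file *filp)
{
    if ((filp->f_flags & TOY_FASYNC) && filp->f_op && filp->f_op->fasync)
        filp->f_op->fasync(-1, filp, 0);
    if (filp->f_op && filp->f_op->release)
        filp->f_op->release(filp);
}
```

Because the cleanup happens in toy_fput(), the forgetful toy_release() no longer leaves a stale list entry behind.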
  15. 07 Oct, 2008 1 commit
  16. 27 Jul, 2008 1 commit
  17. 26 Jun, 2008 2 commits
  18. 12 Jun, 2008 1 commit
  19. 15 May, 2008 1 commit
  20. 02 May, 2008 2 commits
  21. 29 Apr, 2008 1 commit
  22. 22 Apr, 2008 1 commit
    • [IA64] minor irq handler cleanups · 9010eff0
      Committed by Jeff Garzik
      - remove unused 'irq' argument from pfm_do_interrupt_handler()
      
      - remove pointless cast to void*
      
      - add KERN_xxx prefix to printk()
      
      - remove braces around singleton C statement
      
      - in tioce_provider.c, start tioce_dma_consistent() and
        tioce_error_intr_handler() function declarations in column 0
      
      This change's main purpose is to prepare for the patchset in
      jgarzik/misc-2.6.git#irq-remove, that explores removal of the
      never-used 'irq' argument in each interrupt handler.
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  23. 10 Apr, 2008 1 commit
  24. 07 Mar, 2008 1 commit
  25. 09 Feb, 2008 1 commit
  26. 06 Feb, 2008 1 commit
  27. 05 Feb, 2008 1 commit
  28. 07 Dec, 2007 1 commit
  29. 07 Nov, 2007 1 commit
    • [IA64] Fix perfmon sysctl directory modes · e3ad42be
      Committed by Tony Luck
      New sanity checks in sysctl_check_table() complain about a couple
      of mode 0755 that should be 0555 in the perfmon code:
      
      sysctl table check failed: /kernel .1 Writable sysctl directory
      sysctl table check failed: /kernel/perfmon  Writable sysctl directory
      Signed-off-by: Tony Luck <tony.luck@intel.com>
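The "Writable sysctl directory" complaint quoted above comes down to a directory entry carrying any write permission bit. A minimal sketch of such a check follows. The struct and function names are hypothetical and only loosely modelled on a sysctl table entry; this is not the actual sysctl_check_table() code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy table entry: an entry with a child table acts as a directory. */
struct toy_ctl {
    const char *name;
    unsigned int mode;           /* permission bits, e.g. 0555 or 0755 */
    const struct toy_ctl *child; /* non-NULL for directories */
};

/* True if the entry is a directory with any write bit set (mask 0222). */
bool toy_is_writable_dir(const struct toy_ctl *entry)
{
    return entry->child != NULL && (entry->mode & 0222) != 0;
}
```

In this model, mode 0755 trips the check because the owner write bit is set, while 0555 passes.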
  30. 20 Oct, 2007 1 commit
  31. 13 Oct, 2007 1 commit
  32. 01 Aug, 2007 1 commit
  33. 12 May, 2007 1 commit
  34. 09 May, 2007 1 commit
  35. 07 Mar, 2007 1 commit
    • [IA64] permon use-after-free fix · 41d5e5d7
      Committed by Nick Piggin
      Perfmon associates vmalloc()ed memory with a file descriptor, and installs
      a vma mapping that memory.  Unfortunately, the vm_file field is not filled
      in, so processes with mappings to that memory do not prevent the file from
      being closed and the memory freed.  This results in use-after-free bugs and
      multiple freeing of pages, etc.
      
      I saw this bug on an Altix on SLES9.  Haven't reproduced upstream but it
      looks like the same issue is there.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Stephane Eranian <eranian@hpl.hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  36. 18 Feb, 2007 1 commit
  37. 15 Feb, 2007 1 commit
    • [PATCH] sysctl: remove insert_at_head from register_sysctl · 0b4d4147
      Committed by Eric W. Biederman
      The semantic effect of insert_at_head is that it would allow newly registered
      sysctl entries to override existing sysctl entries of the same name.  That is a
      pain for caching, and the proc interface never implemented it.
      
      I have done an audit and discovered that none of the current users of
      register_sysctl care as (except for directories) they do not register
      duplicate sysctl entries.
      
      So this patch simply removes the support for overriding existing entries in
      the sys_sysctl interface since no one uses it or cares and it makes future
      enhancements harder.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>