1. 27 7月, 2008 1 次提交
  2. 25 7月, 2008 1 次提交
    • E
      vmallocinfo: add NUMA information · a47a126a
      Eric Dumazet 提交于
      Christoph recently added /proc/vmallocinfo file to get information about
      vmalloc allocations.
      
      This patch adds NUMA specific information, giving number of pages
      allocated on each memory node.
      
      This should help to check that vmalloc() is able to respect NUMA policies.
      
      Example of output on a four nodes machine (one cpu per node)
      
      1) network hash tables are evenly spreaded on four nodes (OK) (Same
         point for inodes and dentries hash tables)
      
      2) iptables tables (x_tables) are correctly allocated on each cpu node
         (OK).
      
      3) sys_swapon() allocates its memory from one node only.
      
      4) each loaded module is using memory on one node.
      
      Sysadmins could tune their setup to change points 3) and 4) if necessary.
      
      grep "pages="  /proc/vmallocinfo
      0xffffc20000000000-0xffffc20000201000 2101248 alloc_large_system_hash+0x204/0x2c0 pages=512 vmalloc N0=128 N1=128 N2=128 N3=128
      0xffffc20000201000-0xffffc20000302000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
      0xffffc2000031a000-0xffffc2000031d000   12288 alloc_large_system_hash+0x204/0x2c0 pages=2 vmalloc N1=1 N2=1
      0xffffc2000031f000-0xffffc2000032b000   49152 cramfs_uncompress_init+0x2e/0x80 pages=11 vmalloc N0=3 N1=3 N2=2 N3=3
      0xffffc2000033e000-0xffffc20000341000   12288 sys_swapon+0x640/0xac0 pages=2 vmalloc N0=2
      0xffffc20000341000-0xffffc20000344000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N0=2
      0xffffc20000344000-0xffffc20000347000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N1=2
      0xffffc20000347000-0xffffc2000034a000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N2=2
      0xffffc2000034a000-0xffffc2000034d000   12288 xt_alloc_table_info+0xfe/0x130 [x_tables] pages=2 vmalloc N3=2
      0xffffc20004381000-0xffffc20004402000  528384 alloc_large_system_hash+0x204/0x2c0 pages=128 vmalloc N0=32 N1=32 N2=32 N3=32
      0xffffc20004402000-0xffffc20004803000 4198400 alloc_large_system_hash+0x204/0x2c0 pages=1024 vmalloc vpages N0=256 N1=256 N2=256 N3=256
      0xffffc20004803000-0xffffc20004904000 1052672 alloc_large_system_hash+0x204/0x2c0 pages=256 vmalloc N0=64 N1=64 N2=64 N3=64
      0xffffc20004904000-0xffffc20004bec000 3047424 sys_swapon+0x640/0xac0 pages=743 vmalloc vpages N0=743
      0xffffffffa0000000-0xffffffffa000f000   61440 sys_init_module+0xc27/0x1d00 pages=14 vmalloc N1=14
      0xffffffffa000f000-0xffffffffa0014000   20480 sys_init_module+0xc27/0x1d00 pages=4 vmalloc N0=4
      0xffffffffa0014000-0xffffffffa0017000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N0=2
      0xffffffffa0017000-0xffffffffa0022000   45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N1=10
      0xffffffffa0022000-0xffffffffa0028000   24576 sys_init_module+0xc27/0x1d00 pages=5 vmalloc N3=5
      0xffffffffa0028000-0xffffffffa0050000  163840 sys_init_module+0xc27/0x1d00 pages=39 vmalloc N1=39
      0xffffffffa0050000-0xffffffffa0052000    8192 sys_init_module+0xc27/0x1d00 pages=1 vmalloc N1=1
      0xffffffffa0052000-0xffffffffa0056000   16384 sys_init_module+0xc27/0x1d00 pages=3 vmalloc N1=3
      0xffffffffa0056000-0xffffffffa0081000  176128 sys_init_module+0xc27/0x1d00 pages=42 vmalloc N3=42
      0xffffffffa0081000-0xffffffffa00ae000  184320 sys_init_module+0xc27/0x1d00 pages=44 vmalloc N3=44
      0xffffffffa00ae000-0xffffffffa00b1000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
      0xffffffffa00b1000-0xffffffffa00b9000   32768 sys_init_module+0xc27/0x1d00 pages=7 vmalloc N0=7
      0xffffffffa00b9000-0xffffffffa00c4000   45056 sys_init_module+0xc27/0x1d00 pages=10 vmalloc N3=10
      0xffffffffa00c6000-0xffffffffa00e0000  106496 sys_init_module+0xc27/0x1d00 pages=25 vmalloc N2=25
      0xffffffffa00e0000-0xffffffffa00f1000   69632 sys_init_module+0xc27/0x1d00 pages=16 vmalloc N2=16
      0xffffffffa00f1000-0xffffffffa00f4000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
      0xffffffffa00f4000-0xffffffffa00f7000   12288 sys_init_module+0xc27/0x1d00 pages=2 vmalloc N3=2
      
      [akpm@linux-foundation.org: fix comment]
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a47a126a
  3. 05 6月, 2008 1 次提交
    • M
      genirq: Expose default irq affinity mask (take 3) · 18404756
      Max Krasnyansky 提交于
      Current IRQ affinity interface does not provide a way to set affinity
      for the IRQs that will be allocated/activated in the future.
      This patch creates /proc/irq/default_smp_affinity that lets users set
      default affinity mask for the newly allocated IRQs. Changing the default
      does not affect affinity masks for the currently active IRQs, they
      have to be changed explicitly.
      
      Updated based on Paul J's comments and added some more documentation.
      Signed-off-by: NMax Krasnyansky <maxk@qualcomm.com>
      Cc: pj@sgi.com
      Cc: a.p.zijlstra@chello.nl
      Cc: tglx@linutronix.de
      Cc: rdunlap@xenotime.net
      Cc: mingo@elte.hu
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      18404756
  4. 30 4月, 2008 1 次提交
  5. 23 4月, 2008 2 次提交
  6. 12 3月, 2008 1 次提交
  7. 07 2月, 2008 1 次提交
    • E
      get rid of NR_OPEN and introduce a sysctl_nr_open · 9cfe015a
      Eric Dumazet 提交于
      NR_OPEN (historically set to 1024*1024) actually forbids processes to open
      more than 1024*1024 handles.
      
      Unfortunatly some production servers hit the not so 'ridiculously high
      value' of 1024*1024 file descriptors per process.
      
      Changing NR_OPEN is not considered safe because of vmalloc space potential
      exhaust.
      
      This patch introduces a new sysctl (/proc/sys/fs/nr_open) wich defaults to
      1024*1024, so that admins can decide to change this limit if their workload
      needs it.
      
      [akpm@linux-foundation.org: export it for sparc64]
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9cfe015a
  8. 06 2月, 2008 2 次提交
  9. 03 2月, 2008 1 次提交
  10. 02 2月, 2008 1 次提交
    • E
      [AUDIT] break large execve argument logging into smaller messages · de6bbd1d
      Eric Paris 提交于
      execve arguments can be quite large.  There is no limit on the number of
      arguments and a 4G limit on the size of an argument.
      
      this patch prints those aruguments in bite sized pieces.  a userspace size
      limitation of 8k was discovered so this keeps messages around 7.5k
      
      single arguments larger than 7.5k in length are split into multiple records
      and can be identified as aX[Y]=
      Signed-off-by: NEric Paris <eparis@redhat.com>
      de6bbd1d
  11. 01 2月, 2008 1 次提交
    • E
      [IPV4] route cache: Introduce rt_genid for smooth cache invalidation · 29e75252
      Eric Dumazet 提交于
      Current ip route cache implementation is not suited to large caches.
      
      We can consume a lot of CPU when cache must be invalidated, since we
      currently need to evict all cache entries, and this eviction is
      sometimes asynchronous. min_delay & max_delay can somewhat control this
      asynchronism behavior, but whole thing is a kludge, regularly triggering
      infamous soft lockup messages. When entries are still in use, this also
      consumes a lot of ram, filling dst_garbage.list.
      
      A better scheme is to use a generation identifier on each entry,
      so that cache invalidation can be performed by changing the table
      identifier, without having to scan all entries.
      No more delayed flushing, no more stalling when secret_interval expires.
      
      Invalidated entries will then be freed at GC time (controled by
      ip_rt_gc_timeout or stress), or when an invalidated entry is found
      in a chain when an insert is done.
      Thus we keep a normal equilibrium.
      
      This patch :
      - renames rt_hash_rnd to rt_genid (and makes it an atomic_t)
      - Adds a new rt_genid field to 'struct rtable' (filling a hole on 64bit)
      - Checks entry->rt_genid at appropriate places :
      29e75252
  12. 29 1月, 2008 1 次提交
  13. 20 10月, 2007 1 次提交
  14. 18 10月, 2007 1 次提交
    • J
      x86: expand /proc/interrupts to include missing vectors, v2 · 38e760a1
      Joe Korty 提交于
      Add missing IRQs and IRQ descriptions to /proc/interrupts.
      
      /proc/interrupts is most useful when it displays every IRQ vector in use by
      the system, not just those somebody thought would be interesting.
      
      This patch inserts the following vector displays to the i386 and x86_64
      platforms, as appropriate:
      
      	rescheduling interrupts
      	TLB flush interrupts
      	function call interrupts
      	thermal event interrupts
      	threshold interrupts
      	spurious interrupts
      
      A threshold interrupt occurs when ECC memory correction is occuring at too
      high a frequency.  Thresholds are used by the ECC hardware as occasional
      ECC failures are part of normal operation, but long sequences of ECC
      failures usually indicate a memory chip that is about to fail.
      
      Thermal event interrupts occur when a temperature threshold has been
      exceeded for some CPU chip.  IIRC, a thermal interrupt is also generated
      when the temperature drops back to a normal level.
      
      A spurious interrupt is an interrupt that was raised then lowered by the
      device before it could be fully processed by the APIC.  Hence the apic sees
      the interrupt but does not know what device it came from.  For this case
      the APIC hardware will assume a vector of 0xff.
      
      Rescheduling, call, and TLB flush interrupts are sent from one CPU to
      another per the needs of the OS.  Typically, their statistics would be used
      to discover if an interrupt flood of the given type has been occuring.
      
      AK: merged v2 and v4 which had some more tweaks
      AK: replace Local interrupts with Local timer interrupts
      AK: Fixed description of interrupt types.
      
      [ tglx: arch/x86 adaptation ]
      [ mingo: small cleanup ]
      Signed-off-by: NJoe Korty <joe.korty@ccur.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Tim Hockin <thockin@hockin.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      38e760a1
  15. 20 7月, 2007 2 次提交
  16. 18 7月, 2007 1 次提交
    • M
      handle kernelcore=: generic · ed7ed365
      Mel Gorman 提交于
      This patch adds the kernelcore= parameter for x86.
      
      Once all patches are applied, a new command-line parameter exist and a new
      sysctl.  This patch adds the necessary documentation.
      
      From: Yasunori Goto <y-goto@jp.fujitsu.com>
      
        When "kernelcore" boot option is specified, kernel can't boot up on ia64
        because of an infinite loop.  In addition, the parsing code can be handled
        in an architecture-independent manner.
      
        This patch uses common code to handle the kernelcore= parameter.  It is
        only available to architectures that support arch-independent zone-sizing
        (i.e.  define CONFIG_ARCH_POPULATES_NODE_MAP).  Other architectures will
        ignore the boot parameter.
      
      [bunk@stusta.de: make cmdline_parse_kernelcore() static]
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Acked-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed7ed365
  17. 17 7月, 2007 1 次提交
  18. 09 5月, 2007 2 次提交
    • R
      Fix more "deprecated" spellos. · 8b60756a
      Randy Dunlap 提交于
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      8b60756a
    • K
      proc: maps protection · 5096add8
      Kees Cook 提交于
      The /proc/pid/ "maps", "smaps", and "numa_maps" files contain sensitive
      information about the memory location and usage of processes.  Issues:
      
      - maps should not be world-readable, especially if programs expect any
        kind of ASLR protection from local attackers.
      - maps cannot just be 0400 because "-D_FORTIFY_SOURCE=2 -O2" makes glibc
        check the maps when %n is in a *printf call, and a setuid(getuid())
        process wouldn't be able to read its own maps file.  (For reference
        see http://lkml.org/lkml/2006/1/22/150)
      - a system-wide toggle is needed to allow prior behavior in the case of
        non-root applications that depend on access to the maps contents.
      
      This change implements a check using "ptrace_may_attach" before allowing
      access to read the maps contents.  To control this protection, the new knob
      /proc/sys/kernel/maps_protect has been added, with corresponding updates to
      the procfs documentation.
      
      [akpm@linux-foundation.org: build fixes]
      [akpm@linux-foundation.org: New sysctl numbers are old hat]
      Signed-off-by: NKees Cook <kees@outflux.net>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5096add8
  19. 08 5月, 2007 1 次提交
    • D
      smaps: add clear_refs file to clear reference · b813e931
      David Rientjes 提交于
      Adds /proc/pid/clear_refs.  When any non-zero number is written to this file,
      pte_mkold() and ClearPageReferenced() is called for each pte and its
      corresponding page, respectively, in that task's VMAs.  This file is only
      writable by the user who owns the task.
      
      It is now possible to measure _approximately_ how much memory a task is using
      by clearing the reference bits with
      
      	echo 1 > /proc/pid/clear_refs
      
      and checking the reference count for each VMA from the /proc/pid/smaps output
      at a measured time interval.  For example, to observe the approximate change
      in memory footprint for a task, write a script that clears the references
      (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and
      extracts the size in kB.  Add the sizes for each VMA together for the total
      referenced footprint.  Moments later, repeat the process and observe the
      difference.
      
      For example, using an efficient Mozilla:
      
      	accumulated time		referenced memory
      	----------------		-----------------
      		 0 s				 408 kB
      		 1 s				 408 kB
      		 2 s				 556 kB
      		 3 s				1028 kB
      		 4 s				 872 kB
      		 5 s				1956 kB
      		 6 s				 416 kB
      		 7 s				1560 kB
      		 8 s				2336 kB
      		 9 s				1044 kB
      		10 s				 416 kB
      
      This is a valuable tool to get an approximate measurement of the memory
      footprint for a task.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      [akpm@linux-foundation.org: build fixes]
      [mpm@selenic.com: rename for_each_pmd]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b813e931
  20. 26 4月, 2007 1 次提交
  21. 05 3月, 2007 1 次提交
  22. 30 11月, 2006 2 次提交
  23. 04 10月, 2006 4 次提交
  24. 30 9月, 2006 1 次提交
  25. 26 9月, 2006 1 次提交
  26. 25 3月, 2006 1 次提交
  27. 10 1月, 2006 1 次提交
  28. 09 1月, 2006 1 次提交
    • A
      [PATCH] drop-pagecache · 9d0243bc
      Andrew Morton 提交于
      Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
      discard as much pagecache and/or reclaimable slab objects as it can.  THis
      operation requires root permissions.
      
      It won't drop dirty data, so the user should run `sync' first.
      
      Caveats:
      
      a) Holds inode_lock for exorbitant amounts of time.
      
      b) Needs to be taught about NUMA nodes: propagate these all the way through
         so the discarding can be controlled on a per-node basis.
      
      This is a debugging feature: useful for getting consistent results between
      filesystem benchmarks.  We could possibly put it under a config option, but
      it's less than 300 bytes.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9d0243bc
  29. 10 9月, 2005 1 次提交
  30. 05 9月, 2005 1 次提交
    • M
      [PATCH] add /proc/pid/smaps · e070ad49
      Mauricio Lin 提交于
      Add a "smaps" entry to /proc/pid: show howmuch memory is resident in each
      mapping.
      
      People that want to perform a memory consumption analysing can use it
      mainly if someone needs to figure out which libraries can be reduced for
      embedded systems.  So the new features are the physical size of shared and
      clean [or dirty]; private and clean [or dirty].
      
      Take a look the example below:
      
      # cat /proc/4576/smaps
      
      08048000-080dc000 r-xp /bin/bash
      Size:               592 KB
      Rss:                500 KB
      Shared_Clean:       500 KB
      Shared_Dirty:         0 KB
      Private_Clean:        0 KB
      Private_Dirty:        0 KB
      080dc000-080e2000 rw-p /bin/bash
      Size:                24 KB
      Rss:                 24 KB
      Shared_Clean:         0 KB
      Shared_Dirty:         0 KB
      Private_Clean:        0 KB
      Private_Dirty:       24 KB
      080e2000-08116000 rw-p
      Size:               208 KB
      Rss:                208 KB
      Shared_Clean:         0 KB
      Shared_Dirty:         0 KB
      Private_Clean:        0 KB
      Private_Dirty:      208 KB
      b7e2b000-b7e34000 r-xp /lib/tls/libnss_files-2.3.2.so
      Size:                36 KB
      Rss:                 12 KB
      Shared_Clean:        12 KB
      Shared_Dirty:         0 KB
      Private_Clean:        0 KB
      Private_Dirty:        0 KB
      ...
      
      (Includes a cleanup from "Richard Purdie" <rpurdie@rpsys.net>)
      
      From: Torsten Foertsch <torsten.foertsch@gmx.net>
      
      show_smap calls first show_map and then prints its additional information to
      the seq_file.  show_map checks if all it has to print fits into the buffer and
      if yes marks the current vma as written.  While that is correct for show_map
      it is not for show_smap.  Here the vma should be marked as written only after
      the additional information is also written.
      
      The attached patch cures the problem.  It moves the functionality of the
      show_map function to a new function show_map_internal that is called with an
      additional struct mem_size_stats* argument.  Then show_map calls
      show_map_internal with NULL as struct mem_size_stats* whereas show_smap calls
      it with a real pointer.  Now the final
      
      	if (m->count < m->size)  /* vma is copied successfully */
      		m->version = (vma != get_gate_vma(task))? vma->vm_start: 0;
      
      is done only if the whole entry fits into the buffer.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e070ad49
  31. 01 5月, 2005 1 次提交
  32. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4