1. 03 Feb, 2012 1 commit
    • drivers/base/memory.c: fix memory_dev_init() long delay · 321bf4ed
      Committed by Yinghai Lu
      One system with 2048g of RAM reported a soft lockup on a recent kernel.
      
      [   34.426749] cpu_dev_init done
      [   61.166399] BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
      [   61.166733] Modules linked in:
      [   61.166904] irq event stamp: 1935610
      [   61.178431] hardirqs last  enabled at (1935609): [<ffffffff81ce8c05>] mutex_lock_nested+0x299/0x2b4
      [   61.178923] hardirqs last disabled at (1935610): [<ffffffff81cf2bab>] apic_timer_interrupt+0x6b/0x80
      [   61.198767] softirqs last  enabled at (1935476): [<ffffffff8106e59c>] __do_softirq+0x195/0x1ab
      [   61.218604] softirqs last disabled at (1935471): [<ffffffff81cf359c>] call_softirq+0x1c/0x30
      [   61.238408] CPU 0
      [   61.238549] Modules linked in:
      [   61.238744]
      [   61.238825] Pid: 1, comm: swapper/0 Not tainted 3.3.0-rc1-tip-yh-02076-g962f689-dirty #171
      [   61.278212] RIP: 0010:[<ffffffff810b3e3a>]  [<ffffffff810b3e3a>] lock_release+0x90/0x9c
      [   61.278627] RSP: 0018:ffff883f64dbfd70  EFLAGS: 00000246
      [   61.298287] RAX: ffff883f64dc0000 RBX: 0000000000000000 RCX: 000000000000008b
      [   61.298690] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [   61.318383] RBP: ffff883f64dbfda0 R08: 0000000000000001 R09: 000000000000008b
      [   61.338215] R10: 0000000000000000 R11: 0000000000000000 R12: ffff883f64dbfd10
      [   61.338610] R13: ffff883f64dc0708 R14: ffff883f64dc0708 R15: ffffffff81095657
      [   61.358299] FS:  0000000000000000(0000) GS:ffff883f7d600000(0000) knlGS:0000000000000000
      [   61.378118] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   61.378450] CR2: 0000000000000000 CR3: 00000000024af000 CR4: 00000000000007f0
      [   61.398144] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   61.417918] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   61.418260] Process swapper/0 (pid: 1, threadinfo ffff883f64dbe000, task ffff883f64dc0000)
      [   61.445358] Stack:
      [   61.445511]  0000000000000002 ffff897f649ba168 ffff883f64dbfe10 ffff88ff64bb57a8
      [   61.458040]  0000000000000000 0000000000000000 ffff883f64dbfdc0 ffffffff81ceb1b4
      [   61.458491]  000000000011608c ffff88ff64bb58a8 ffff883f64dbfdf0 ffffffff81c57638
      [   61.478215] Call Trace:
      [   61.478367]  [<ffffffff81ceb1b4>] _raw_spin_unlock+0x21/0x2e
      [   61.497994]  [<ffffffff81c57638>] klist_next+0x9e/0xbc
      [   61.498264]  [<ffffffff8148ba99>] next_device+0xe/0x1e
      [   61.517867]  [<ffffffff8148c0cc>] subsys_find_device_by_id+0xb7/0xd6
      [   61.518197]  [<ffffffff81498846>] find_memory_block_hinted+0x3d/0x66
      [   61.537927]  [<ffffffff8149887f>] find_memory_block+0x10/0x12
      [   61.538193]  [<ffffffff814988b6>] add_memory_section+0x35/0x9e
      [   61.557932]  [<ffffffff827fecef>] memory_dev_init+0x68/0xda
      [   61.558227]  [<ffffffff827fec01>] driver_init+0x97/0xa7
      [   61.577853]  [<ffffffff827cdf3c>] kernel_init+0xf6/0x1c0
      [   61.578140]  [<ffffffff81cf34a4>] kernel_thread_helper+0x4/0x10
      [   61.597850]  [<ffffffff81ceb59d>] ? retint_restore_args+0xe/0xe
      [   61.598144]  [<ffffffff827cde46>] ? start_kernel+0x3ab/0x3ab
      [   61.617826]  [<ffffffff81cf34a0>] ? gs_change+0xb/0xb
      [   61.618060] Code: 10 48 83 3b 00 eb e8 4c 89 f2 44 89 fe 4c 89 ef e8 e1 fe ff ff 65 48 8b 04 25 40 bc 00 00 c7 80 cc 06 00 00 00 00 00 00 41 54 9d <5e> 5b 41 5c 41 5d 41 5e 41 5f 5d c3 55 48 89 e5 41 57 41 89 cf
      [   89.285380] memory_dev_init done
      
      In the end it takes about 55s to create 16400 memory entries.
      
      Root cause: on x86_64, 2048g of RAM (with a 2g hole at [2g,4g), so TOP2 is 2050g) yields 16400 memory blocks.
      
      find_memory_block()/subsys_find_device_by_id() become expensive with that many entries.
      
      We don't actually need to look up the memory block on the boot path.
      
      Skipping that lookup brings the boot time back to normal.
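As a rough check of the numbers above, the following C sketch reproduces the block count and estimates the lookup cost. It assumes 128 MiB memory sections (SECTION_SIZE_BITS of 27 on x86_64) and one section per memory block; the helper names are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 128 MiB sections (x86_64) and one section per memory block. */
#define SECTION_SIZE_BITS 27

/* Number of memory blocks needed to cover physical addresses [0, top_bytes). */
static uint64_t memory_block_count(uint64_t top_bytes)
{
    uint64_t section_bytes = UINT64_C(1) << SECTION_SIZE_BITS;

    assert(top_bytes <= UINT64_MAX - section_bytes); /* avoid overflow when rounding up */
    return (top_bytes + section_bytes - 1) / section_bytes;
}

/*
 * List entries visited if every newly added block first triggers a linear
 * search over the blocks registered so far: 0 + 1 + ... + (n - 1).
 */
static uint64_t linear_lookup_steps(uint64_t n)
{
    return n * (n - 1) / 2;
}
```

With a top of memory at 2050g, memory_block_count() gives 16400, and linear_lookup_steps(16400) is roughly 134 million list steps, which is consistent with the quadratic slowdown described in the changelog.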
      
      [   34.466696] cpu_dev_init done
      [   35.290080] memory_dev_init done
      
      This also removes the delay in topology_init when sections_per_block is not 1.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Nathan Fontenot <nfont@austin.ibm.com>
      Cc: Robin Holt <holt@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 22 Dec, 2011 2 commits
    • convert 'memory' sysdev_class to a regular subsystem · 10fbcf4c
      Committed by Kay Sievers
      This moves the 'memory sysdev_class' over to a regular 'memory' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystem · 8a25a2fd
      Committed by Kay Sievers
      This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
      and converts the devices to regular devices. The sysdev drivers are
      implemented as subsystem interfaces now.
      
      After all sysdev classes are ported to regular driver core entities, the
      sysdev implementation will be entirely removed from the kernel.
      
      Userspace relies on events and generic sysfs subsystem infrastructure
      from sysdev devices, which are made available with this conversion.
      
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Borislav Petkov <bp@amd64.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  3. 19 Nov, 2011 1 commit
  4. 25 May, 2011 1 commit
    • mm: per-node vmstat: show proper vmstats · fa25c503
      Committed by KOSAKI Motohiro
      Commit 2ac39037 ("writeback: add
      /sys/devices/system/node/<node>/vmstat") added a vmstat entry, but
      strangely it only shows nr_written and nr_dirtied.
      
              # cat /sys/devices/system/node/node20/vmstat
              nr_written 0
              nr_dirtied 0
      
      Of course, that's not adequate.  With this patch, vmstat shows all VM
      statistics, as /proc/vmstat does.
      
              # cat /sys/devices/system/node/node0/vmstat
      	nr_free_pages 899224
      	nr_inactive_anon 201
      	nr_active_anon 17380
      	nr_inactive_file 31572
      	nr_active_file 28277
      	nr_unevictable 0
      	nr_mlock 0
      	nr_anon_pages 17321
      	nr_mapped 8640
      	nr_file_pages 60107
      	nr_dirty 33
      	nr_writeback 0
      	nr_slab_reclaimable 6850
      	nr_slab_unreclaimable 7604
      	nr_page_table_pages 3105
      	nr_kernel_stack 175
      	nr_unstable 0
      	nr_bounce 0
      	nr_vmscan_write 0
      	nr_writeback_temp 0
      	nr_isolated_anon 0
      	nr_isolated_file 0
      	nr_shmem 260
      	nr_dirtied 1050
      	nr_written 938
      	numa_hit 962872
      	numa_miss 0
      	numa_foreign 0
      	numa_interleave 8617
      	numa_local 962872
      	numa_other 0
      	nr_anon_transparent_hugepages 0
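Each line of the per-node file is a counter name followed by a value, as in the listing above. A minimal userspace sketch of parsing one such line could look like this; the struct and function names are hypothetical, and the 63-character name limit is an assumption made for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VMSTAT_NAME_MAX 64 /* assumed upper bound on counter-name length */

struct vmstat_entry {
    char name[VMSTAT_NAME_MAX];
    unsigned long long value;
};

/* Parse one "name value" line. Returns 0 on success, -1 on malformed input. */
static int parse_vmstat_line(const char *line, struct vmstat_entry *out)
{
    assert(line && out);
    /* %63s skips leading whitespace and leaves room for the terminating NUL. */
    if (sscanf(line, "%63s %llu", out->name, &out->value) != 2)
        return -1;
    return 0;
}

/* True if the entry carries the counter called `name`. */
static int vmstat_entry_is(const struct vmstat_entry *e, const char *name)
{
    return strcmp(e->name, name) == 0;
}
```

A caller would read /sys/devices/system/node/node0/vmstat line by line with fgets() and pass each line to parse_vmstat_line().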
      
      [akpm@linux-foundation.org: no externs in .c files]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 04 Feb, 2011 1 commit
    • memory hotplug: Update phys_index to [start|end]_section_nr · d3360164
      Committed by Nathan Fontenot
      Update the 'phys_index' property of the memory_block struct to be
      called start_section_nr, and add an end_section_nr property.  The
      data tracked here is the same but the updated naming is more in line
      with what is stored here, namely the first and last section number
      that the memory block spans.
      
      The names presented to userspace remain the same, phys_index for
      start_section_nr and end_phys_index for end_section_nr, to avoid breaking
      anything in userspace.
      
      This also updates the node sysfs code to be aware of the new capability for
      a memory block to contain multiple memory sections and be aware of the memory
      block structure name changes (start_section_nr).  This requires an additional
      parameter to unregister_mem_sect_under_nodes so that we know which memory
      section of the memory block to unregister.
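To make the naming concrete, here is a small sketch of how a block's section range might be represented. The struct is a hypothetical, trimmed-down stand-in and not the kernel's struct memory_block; the helper functions are likewise invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant fields of a memory block. */
struct memory_block_sketch {
    unsigned long start_section_nr; /* shown to userspace as phys_index */
    unsigned long end_section_nr;   /* shown to userspace as end_phys_index */
};

/* Set the range of a block spanning `sections_per_block` sections from `start`. */
static void memory_block_set_range(struct memory_block_sketch *mem,
                                   unsigned long start,
                                   unsigned long sections_per_block)
{
    assert(mem && sections_per_block > 0);
    mem->start_section_nr = start;
    mem->end_section_nr = start + sections_per_block - 1;
}

/* True if section `nr` lies inside the block (both ends inclusive). */
static int memory_block_contains(const struct memory_block_sketch *mem,
                                 unsigned long nr)
{
    return nr >= mem->start_section_nr && nr <= mem->end_section_nr;
}
```

Because a block can now span several sections, code that unregisters a section needs to be told which section inside the block is meant, which is why the changelog mentions the extra parameter.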
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Reviewed-by: Robin Holt <holt@sgi.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  6. 14 Jan, 2011 1 commit
  7. 27 Oct, 2010 1 commit
  8. 23 Oct, 2010 1 commit
  9. 10 Aug, 2010 1 commit
  10. 25 May, 2010 1 commit
  11. 07 Apr, 2010 1 commit
    • nodemask: include slab.h from drivers/base/node.c · 18e5b539
      Committed by Tejun Heo
      NODEMASK_ALLOC/FREE are mapped to kmalloc/free if NODES_SHIFT > 8.
      Among its several users, drivers/base/node.c wasn't including slab.h
      leading to build failure if NODES_SHIFT > 8.  Include slab.h from
      drivers/base/node.c.
      
      This isn't an ideal solution but including slab.h directly from
      nodemask.h is not an option because nodemask.h gets included
      everywhere.  For now, make it work by including slab.h from its users.
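The stack-versus-heap switch behind NODEMASK_ALLOC/FREE can be sketched in userspace as follows. This is only loosely modelled on the kernel macros: the SKETCH_ names are invented, malloc-family calls stand in for kmalloc/kfree, and SKETCH_NODES_SHIFT is an assumed stand-in for NODES_SHIFT.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: build-time knob standing in for the kernel's NODES_SHIFT. */
#ifndef SKETCH_NODES_SHIFT
#define SKETCH_NODES_SHIFT 10
#endif

#define SKETCH_MAX_NODES     (1UL << SKETCH_NODES_SHIFT)
#define SKETCH_BITS_PER_LONG (8 * sizeof(unsigned long))

typedef struct {
    unsigned long bits[(SKETCH_MAX_NODES + SKETCH_BITS_PER_LONG - 1) /
                       SKETCH_BITS_PER_LONG];
} sketch_nodemask_t;

#if SKETCH_NODES_SHIFT > 8
/* Large mask: allocate on the heap, so the allocator header is required. */
#define SKETCH_NODEMASK_ALLOC(name) \
    sketch_nodemask_t *name = calloc(1, sizeof(*name))
#define SKETCH_NODEMASK_FREE(name)  free(name)
#else
/* Small mask: a plain stack variable, nothing to free. */
#define SKETCH_NODEMASK_ALLOC(name) \
    sketch_nodemask_t _##name = {{0}}, *name = &_##name
#define SKETCH_NODEMASK_FREE(name)  do { } while (0)
#endif

/* Set bit `nid` in a fresh mask; return 1 if it reads back set, -1 on OOM. */
static int mark_and_test_node(unsigned long nid)
{
    unsigned long word = nid / SKETCH_BITS_PER_LONG;
    unsigned long bit = nid % SKETCH_BITS_PER_LONG;
    int is_set;
    SKETCH_NODEMASK_ALLOC(mask);

    assert(nid < SKETCH_MAX_NODES);
    if (!mask)
        return -1;
    mask->bits[word] |= 1UL << bit;
    is_set = (int)((mask->bits[word] >> bit) & 1UL);
    SKETCH_NODEMASK_FREE(mask);
    return is_set;
}
```

The point of the sketch is that a user of the macro only needs the allocator declarations in the large configuration, so a missing include goes unnoticed until someone builds with a large node count.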
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Ingo Molnar <mingo@elte.hu>
  12. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Committed by Tejun Heo
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      The percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  13. 19 Mar, 2010 1 commit
  14. 08 Mar, 2010 3 commits
  15. 16 Dec, 2009 8 commits
    • mm: slab-allocate memory section nodemask for large systems · 9ae49fab
      Committed by David Rientjes
      Nodemasks should not be allocated on the stack for large systems (when it
      is larger than 256 bytes) since there is a threat of overflow.
      
      This patch causes the unregister_mem_sect_under_nodes() nodemask to be
      allocated on the stack for smaller systems and be allocated by slab for
      larger systems.
      
      GFP_KERNEL is used since remove_memory_block() can block.
      
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Alex Chiang <achiang@hp.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add numa node symlink for cpu devices in sysfs · 1830794a
      Committed by Alex Chiang
      You can discover which CPUs belong to a NUMA node by examining
      /sys/devices/system/node/node#/
      
      However, it's not convenient to go in the other direction, when looking at
      /sys/devices/system/cpu/cpu#/
      
      Yes, you can muck about in sysfs, but adding these symlinks makes life a
      lot more convenient.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: refactor unregister_cpu_under_node() · b9d52dad
      Committed by Alex Chiang
      By returning early if the node is not online, we can unindent the
      interesting code by two levels.
      
      No functional change.
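The shape of this kind of refactoring can be illustrated with a toy example. The struct and both functions below are hypothetical stand-ins, not the kernel code; they only show how an early return flattens two levels of nesting without changing behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a node with a count of sysfs links. */
struct node_sketch {
    bool online;
    int links;
};

/* Before: the interesting work sits two levels deep. */
static int unlink_nested(struct node_sketch *node)
{
    if (node != NULL) {
        if (node->online) {
            assert(node->links > 0);
            node->links--;
        }
    }
    return 0;
}

/* After: bail out early, then do the work at the top level. */
static int unlink_early_return(struct node_sketch *node)
{
    if (node == NULL || !node->online)
        return 0;

    assert(node->links > 0);
    node->links--;
    return 0;
}
```

Both versions return 0 in every case and decrement the link count only for an online node, so the change is purely structural.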
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: refactor register_cpu_under_node() · f8246f31
      Committed by Alex Chiang
      By returning early if the node is not online, we can unindent the
      interesting code by one level.
      
      No functional change.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add numa node symlink for memory section in sysfs · dee5d0d5
      Committed by Alex Chiang
      Commit c04fc586 (mm: show node to memory section relationship with
      symlinks in sysfs) created symlinks from nodes to memory sections, e.g.
      
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      
      If you're examining the memory section though and are wondering what node
      it might belong to, you can find it by grovelling around in sysfs, but
      it's a little cumbersome.
      
      Add a reverse symlink for each memory section that points back to the
      node to which it belongs.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: offload per node attribute registrations · 39da08cb
      Committed by Lee Schermerhorn
      Offload the registration and unregistration of per node hstate sysfs
      attributes to a worker thread rather than attempt the
      allocation/attachment or detachment/freeing of the attributes in the
      context of the memory hotplug handler.
      
      I don't know that this is absolutely required, but the registration can
      sleep in allocations and other mem hot plug handlers do it this way.  If
      it turns out this is NOT required, we can drop this patch.
      
      N.B.,  Only tested build, boot, libhugetlbfs regression.
             i.e., no memory hotplug testing.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: handle memory hot-plug events · 4faf8d95
      Committed by Lee Schermerhorn
      Register per node hstate attributes only for nodes with memory.  As
      suggested by David Rientjes.
      
      With Memory Hotplug, memory can be added to a memoryless node and a node
      with memory can become memoryless.  Therefore, add a memory on/off-line
      notifier callback to [un]register a node's attributes on transition
      to/from memoryless state.
      
      N.B.,  Only tested build, boot, libhugetlbfs regression.
             i.e., no memory hotplug testing.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: add per node hstate attributes · 9a305230
      Committed by Lee Schermerhorn
      Add the per huge page size control/query attributes to the per node
      sysdevs:
      
      /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
      	nr_hugepages       - r/w
      	free_hugepages     - r/o
      	surplus_hugepages  - r/o
      
      The patch attempts to re-use/share as much of the existing global hstate
      attribute initialization and handling, and the "nodes_allowed" constraint
      processing as possible.
      
      Calling set_max_huge_pages() with no node indicates a change to global
      hstate parameters.  In this case, any non-default task mempolicy will be
      used to generate the nodes_allowed mask.  A valid node id indicates an
      update to that node's hstate parameters, and the count argument specifies
      the target count for the specified node.  From this info, we compute the
      target global count for the hstate and construct a nodes_allowed node mask
      containing only the specified node.
      
      Setting the node specific nr_hugepages via the per node attribute
      effectively ignores any task mempolicy or cpuset constraints.
      
      With this patch:
      
      (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
      ./  ../  free_hugepages  nr_hugepages  surplus_hugepages
      
      Starting from:
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:     0
      Node 2 HugePages_Free:      0
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
      vm.nr_hugepages = 0
      
      Allocate 16 persistent huge pages on node 2:
      (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
      
      [Note that this is equivalent to:
      	numactl -m 2 hugeadmin --pool-pages-min 2M:+16
      ]
      
      Yields:
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:    16
      Node 2 HugePages_Free:     16
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
      vm.nr_hugepages = 16
      
      Global controls work as expected--reduce pool to 8 persistent huge pages:
      (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
      
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:     8
      Node 2 HugePages_Free:      8
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
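For scripting against these attributes, the path layout shown above can be assembled with a small helper. The sketch below is illustrative only: the function names are invented, and the attribute names are taken from the ls output in this changelog.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* True if `attr` is one of the per-node attributes listed by ls above. */
static int is_node_hugepage_attr(const char *attr)
{
    assert(attr);
    return strcmp(attr, "nr_hugepages") == 0 ||
           strcmp(attr, "free_hugepages") == 0 ||
           strcmp(attr, "surplus_hugepages") == 0;
}

/*
 * Build the sysfs path of a per-node hugepage attribute.
 * Returns 0 on success, -1 for an unknown attribute or a too-small buffer.
 */
static int node_hugepage_attr_path(char *buf, size_t len, unsigned int node,
                                   unsigned long size_kb, const char *attr)
{
    int n;

    assert(buf && attr);
    if (!is_node_hugepage_attr(attr))
        return -1;
    n = snprintf(buf, len,
                 "/sys/devices/system/node/node%u/hugepages/hugepages-%lukB/%s",
                 node, size_kb, attr);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling node_hugepage_attr_path(buf, sizeof(buf), 2, 2048, "nr_hugepages") yields the same path that the echo 16 example writes to.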
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 22 Sep, 2009 2 commits
  17. 17 Jun, 2009 1 commit
  18. 13 Mar, 2009 1 commit
  19. 11 Mar, 2009 1 commit
    • mm: get_nid_for_pfn() returns int · 47504980
      Committed by Roel Kluin
      get_nid_for_pfn() returns int
      
      Presumably the (nid < 0) case has never happened.
      
      We do know that it is happening on one system while creating a symlink for
      a memory section so it should also happen on the same system if
      unregister_mem_sect_under_nodes() were called to remove the same symlink.
      
      The test was actually added in response to a problem with an earlier
      version reported by Yasunori Goto where one or more of the leading pages
      of a memory section on the 2nd node of one of his systems was
      uninitialized because I believe they coincided with a memory hole.
      
      That earlier version did not ignore uninitialized pages and determined
      the nid by considering only the 1st page of each memory section.  This
      caused the symlink to the 1st memory section on the 2nd node to be
      incorrectly created in /sys/devices/system/node/node0 instead of
      /sys/devices/system/node/node1.  The problem was fixed by adding the
      test to skip over uninitialized pages.
      
      I suspect we have not seen any reports of the non-removal
      of a symlink due to the incorrect declaration of the nid
      variable in unregister_mem_sect_under_nodes() because
        - systems where a memory section could have an uninitialized
          range of leading pages are probably rare.
        - memory remove is probably not done very frequently on the
          systems that are capable of demonstrating the problem.
        - lingering symlink(s) that should have been removed may
          have simply gone unnoticed.
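The effect of holding a possibly negative node id in an unsigned variable can be shown in isolation. The sketch below assumes, as the title and changelog suggest, that the nid variable was declared with an unsigned type; both functions are hypothetical and only count how many entries a "nid < 0" test would skip.

```c
#include <assert.h>
#include <stddef.h>

/* Entries skipped when nid is stored in an unsigned variable. */
static size_t skipped_with_unsigned_nid(const int *nids, size_t n)
{
    size_t skipped = 0;

    assert(nids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned int nid = (unsigned int)nids[i]; /* -1 wraps to UINT_MAX */

        if (nid < 0) /* never true for an unsigned type */
            skipped++;
    }
    return skipped;
}

/* Entries skipped when nid keeps the int type of the return value. */
static size_t skipped_with_int_nid(const int *nids, size_t n)
{
    size_t skipped = 0;

    assert(nids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int nid = nids[i];

        if (nid < 0)
            skipped++;
    }
    return skipped;
}
```

For an input such as {-1, -1, 0, 1}, the unsigned variant skips nothing while the int variant skips the two uninitialized entries, which is the behaviour the test was meant to have.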
      
      [garyhade@us.ibm.com: wrote changelog]
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  20. 07 Jan, 2009 1 commit
    • mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Committed by Gary Hade
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
      
      Also revises documentation to cover this change as well as updating
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of memory hotremove files 'phys_device', 'phys_index', and 'state'
      that were previously not described there.
      
      In addition to it always being a good policy to provide users with
      the maximum possible amount of physical location information for
      resources that can be hot-added and/or hot-removed, the following
      are some (but likely not all) of the user benefits provided by
      this change.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      Signed-off-by: Gary Hade <garyhade@us.ibm.com>
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 13 Dec, 2008 1 commit
    • cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and... · 29c0177e
      Committed by Rusty Russell
      cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
  22. 20 Oct, 2008 4 commits
  23. 22 Jul, 2008 1 commit
    • sysdev: Pass the attribute to the low level sysdev show/store function · 4a0b2b4d
      Committed by Andi Kleen
      This allows attributes to be generated dynamically and show/store
      functions to be shared between attributes. Right now most attributes are
      generated by special macros, with lots of duplicated code. With the attribute
      passed it's instead possible to attach some data to the attribute
      and then use that in shared low level functions to do different things.
      
      I need this for the dynamically generated bank attributes in the x86
      machine check code, but it'll allow some further cleanups.
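The idea of recovering per-attribute data from the attribute pointer can be sketched in plain C. The types, the container_of_sketch macro and bank_show() below are simplified, hypothetical stand-ins; the real sysdev structures and prototypes differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified attribute type. */
struct attr_sketch {
    const char *name;
};

/* An attribute with extra data attached, e.g. a machine check bank number. */
struct bank_attr_sketch {
    struct attr_sketch attr;
    int bank;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * One show function shared by all bank attributes: the attribute pointer
 * passed in tells it which bank to report. Returns the formatted length.
 */
static int bank_show(struct attr_sketch *attr, char *buf, size_t len)
{
    struct bank_attr_sketch *b;

    assert(attr && buf);
    b = container_of_sketch(attr, struct bank_attr_sketch, attr);
    return snprintf(buf, len, "%s=%d\n", attr->name, b->bank);
}
```

Without the attribute argument, each bank would need its own generated show function; with it, one function can serve any number of dynamically created attributes.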
      
      I converted all users in tree to the new show/store prototype. It's a single
      huge patch to avoid unbisectable sections.
      
      Runtime tested: x86-32, x86-64
      Compiled only: ia64, powerpc
      Not compile tested/only grep converted: sh, arm, avr32
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  24. 05 Jul, 2008 1 commit
  25. 30 Apr, 2008 1 commit
  26. 20 Apr, 2008 1 commit