1. 16 Dec, 2009 10 commits
    • PM: Add initcall_debug style timing for suspend/resume · f2511774
      Authored by Arjan van de Ven
      In order to diagnose overall suspend/resume times, we need
      basic instrumentation to break down the total time into per
      device timing, similar to initcall_debug.
      
      This patch adds the basic timing instrumentation, needed
      for a scripts/bootgraph.pl equivalent or humans.
      The bootgraph.pl program is still a work in progress, but
      is far enough along to know that this patch is sufficient.
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • PM: allow for usage_count > 0 in pm_runtime_get() · 1d531c14
      Authored by Alan Stern
      This patch (as1308c) fixes __pm_runtime_get().  Currently the routine
      will resume a device only if the prior usage count was 0.  But this isn't
      right; thanks to pm_runtime_get_noresume() the usage count can be
      positive even while the device is suspended.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
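The behavioural change described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct fake_dev` and the `fake_get_*()` functions are hypothetical names, not kernel APIs, and the real routine additionally chooses between a synchronous and an asynchronous resume.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the runtime PM state of one device. */
struct fake_dev {
	int usage_count;
	bool suspended;
};

/* Models pm_runtime_get_noresume(): count a user without resuming. */
static void fake_get_noresume(struct fake_dev *dev)
{
	dev->usage_count++;
}

/* Old behaviour: resume only when the prior usage count was 0. */
static void fake_get_old(struct fake_dev *dev)
{
	if (++dev->usage_count == 1)
		dev->suspended = false;
}

/* New behaviour: always take the reference, then resume if suspended. */
static void fake_get_new(struct fake_dev *dev)
{
	dev->usage_count++;
	if (dev->suspended)
		dev->suspended = false;
}
```

In this model, calling `fake_get_noresume()` followed by `fake_get_old()` on a suspended device leaves it suspended with a usage count of 2, whereas `fake_get_new()` resumes it.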
    • mm: slab-allocate memory section nodemask for large systems · 9ae49fab
      Authored by David Rientjes
      Nodemasks should not be allocated on the stack on large systems (where a
      nodemask is larger than 256 bytes), since that risks overflowing the stack.
      
      This patch causes the unregister_mem_sect_under_nodes() nodemask to be
      allocated on the stack for smaller systems and be allocated by slab for
      larger systems.
      
      GFP_KERNEL is used since remove_memory_block() can block.
      
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Alex Chiang <achiang@hp.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
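The stack-versus-heap choice described in this commit can be sketched as a compile-time switch. The following is a userspace illustration only: every `FAKE_*` name is hypothetical, `malloc()` stands in for a slab allocation with GFP_KERNEL, and the 4096-node configuration is an assumed example value.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical node count; a kernel build derives this from its config. */
#ifndef FAKE_MAX_NODES
#define FAKE_MAX_NODES 4096
#endif

#define FAKE_STACK_LIMIT 256	/* bytes, the threshold named in the commit */

struct fake_nodemask {
	unsigned long bits[FAKE_MAX_NODES / (8 * sizeof(unsigned long))];
};

#if (FAKE_MAX_NODES / 8) > FAKE_STACK_LIMIT
/* Large system: take the mask from the heap (slab, in the kernel). */
#define FAKE_NODEMASK_ALLOC(name) \
	struct fake_nodemask *name = malloc(sizeof(*name))
#define FAKE_NODEMASK_FREE(name) free(name)
#define FAKE_NODEMASK_ON_HEAP 1
#else
/* Small system: the mask is cheap enough to live on the stack. */
#define FAKE_NODEMASK_ALLOC(name) \
	struct fake_nodemask name##_storage, *name = &name##_storage
#define FAKE_NODEMASK_FREE(name) ((void)(name))
#define FAKE_NODEMASK_ON_HEAP 0
#endif

/* Returns 0 on success, -1 if the heap allocation failed. */
static int fake_unregister(void)
{
	FAKE_NODEMASK_ALLOC(mask);

	if (!mask)
		return -1;
	memset(mask, 0, sizeof(*mask));
	/* ... record each node whose link has been removed in the mask ... */
	FAKE_NODEMASK_FREE(mask);
	return 0;
}
```

The caller's code is identical in both configurations; only the macros decide where the mask lives.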
    • mm: add numa node symlink for cpu devices in sysfs · 1830794a
      Authored by Alex Chiang
      You can discover which CPUs belong to a NUMA node by examining
      /sys/devices/system/node/node#/
      
      However, it's not convenient to go in the other direction, when looking at
      /sys/devices/system/cpu/cpu#/
      
      Yes, you can muck about in sysfs, but adding these symlinks makes life a
      lot more convenient.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: refactor unregister_cpu_under_node() · b9d52dad
      Authored by Alex Chiang
      By returning early if the node is not online, we can unindent the
      interesting code by two levels.
      
      No functional change.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
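The early-return pattern this commit describes can be shown with a before/after pair. This is a generic sketch, not the kernel function: `fake_node_online()` and `fake_remove_link()` are hypothetical helpers, and the nesting shown is assumed for illustration.

```c
#include <stdbool.h>

/* Hypothetical helpers standing in for the online check and sysfs work. */
static int fake_links_removed;

static bool fake_node_online(int nid)
{
	return nid == 0 || nid == 1;
}

static void fake_remove_link(void)
{
	fake_links_removed++;
}

/* Before: the interesting code sits inside nested conditionals. */
static int unregister_nested(int nid)
{
	if (nid >= 0) {
		if (fake_node_online(nid)) {
			fake_remove_link();
		}
	}
	return 0;
}

/* After: bail out early, so the main path is not indented. */
static int unregister_early_return(int nid)
{
	if (nid < 0 || !fake_node_online(nid))
		return 0;

	fake_remove_link();
	return 0;
}
```

Both variants remove a link for exactly the same inputs, which is what "no functional change" means here.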
    • mm: refactor register_cpu_under_node() · f8246f31
      Authored by Alex Chiang
      By returning early if the node is not online, we can unindent the
      interesting code by one level.
      
      No functional change.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add numa node symlink for memory section in sysfs · dee5d0d5
      Authored by Alex Chiang
      Commit c04fc586 (mm: show node to memory section relationship with
      symlinks in sysfs) created symlinks from nodes to memory sections, e.g.
      
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      
      If you're examining the memory section though and are wondering what node
      it might belong to, you can find it by grovelling around in sysfs, but
      it's a little cumbersome.
      
      Add a reverse symlink for each memory section that points back to the
      node to which it belongs.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Cc: Gary Hade <garyhade@us.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: offload per node attribute registrations · 39da08cb
      Authored by Lee Schermerhorn
      Offload the registration and unregistration of per node hstate sysfs
      attributes to a worker thread rather than attempt the
      allocation/attachment or detachment/freeing of the attributes in the
      context of the memory hotplug handler.
      
      I don't know that this is absolutely required, but the registration can
      sleep in allocations and other mem hot plug handlers do it this way.  If
      it turns out this is NOT required, we can drop this patch.
      
      N.B.,  Only tested build, boot, libhugetlbfs regression.
             i.e., no memory hotplug testing.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: handle memory hot-plug events · 4faf8d95
      Authored by Lee Schermerhorn
      Register per node hstate attributes only for nodes with memory.  As
      suggested by David Rientjes.
      
      With Memory Hotplug, memory can be added to a memoryless node and a node
      with memory can become memoryless.  Therefore, add a memory on/off-line
      notifier callback to [un]register a node's attributes on transition
      to/from memoryless state.
      
      N.B.,  Only tested build, boot, libhugetlbfs regression.
             i.e., no memory hotplug testing.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: add per node hstate attributes · 9a305230
      Authored by Lee Schermerhorn
      Add the per huge page size control/query attributes to the per node
      sysdevs:
      
      /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
      	nr_hugepages       - r/w
      	free_hugepages     - r/o
      	surplus_hugepages  - r/o
      
      The patch attempts to re-use/share as much of the existing global hstate
      attribute initialization and handling, and the "nodes_allowed" constraint
      processing as possible.
      
      Calling set_max_huge_pages() with no node indicates a change to global
      hstate parameters.  In this case, any non-default task mempolicy will be
      used to generate the nodes_allowed mask.  A valid node id indicates an
      update to that node's hstate parameters, and the count argument specifies
      the target count for the specified node.  From this info, we compute the
      target global count for the hstate and construct a nodes_allowed node mask
      containing only the specified node.
      
      Setting the node specific nr_hugepages via the per node attribute
      effectively ignores any task mempolicy or cpuset constraints.
      
      With this patch:
      
      (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
      ./  ../  free_hugepages  nr_hugepages  surplus_hugepages
      
      Starting from:
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:     0
      Node 2 HugePages_Free:      0
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
      vm.nr_hugepages = 0
      
      Allocate 16 persistent huge pages on node 2:
      (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages
      
      [Note that this is equivalent to:
      	numactl -m 2 hugeadmin --pool-pages-min 2M:+16
      ]
      
      Yields:
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:    16
      Node 2 HugePages_Free:     16
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
      vm.nr_hugepages = 16
      
      Global controls work as expected--reduce pool to 8 persistent huge pages:
      (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
      
      Node 0 HugePages_Total:     0
      Node 0 HugePages_Free:      0
      Node 0 HugePages_Surp:      0
      Node 1 HugePages_Total:     0
      Node 1 HugePages_Free:      0
      Node 1 HugePages_Surp:      0
      Node 2 HugePages_Total:     8
      Node 2 HugePages_Free:      8
      Node 2 HugePages_Surp:      0
      Node 3 HugePages_Total:     0
      Node 3 HugePages_Free:      0
      Node 3 HugePages_Surp:      0
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Andi Kleen <andi@firstfloor.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Nishanth Aravamudan <nacc@us.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: Eric Whitney <eric.whitney@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
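The translation from a per node request to a global target count, as described in this commit message, can be sketched as follows. This is a simplified userspace model: `struct fake_hstate`, `FAKE_NO_NODE`, and `fake_global_target()` are hypothetical names, the allowed nodes are represented as a plain bitmask, and the global case assumes a default task mempolicy (all nodes allowed).

```c
#define FAKE_NO_NODE  (-1)
#define FAKE_NR_NODES 4

/* Hypothetical per huge page size state. */
struct fake_hstate {
	unsigned long nr_huge_pages;                     /* global total */
	unsigned long nr_huge_pages_node[FAKE_NR_NODES]; /* per node totals */
};

/*
 * Turn a requested count into a global target and a nodes-allowed bitmask.
 * With no node, count is already global. With a valid node id, count is the
 * target for that node only, so the pages on all other nodes are added back.
 */
static unsigned long fake_global_target(const struct fake_hstate *h, int nid,
					unsigned long count,
					unsigned int *nodes_allowed)
{
	if (nid == FAKE_NO_NODE) {
		*nodes_allowed = (1u << FAKE_NR_NODES) - 1;
		return count;
	}

	*nodes_allowed = 1u << nid;
	return count + h->nr_huge_pages - h->nr_huge_pages_node[nid];
}
```

Starting from the state in the example above (16 pages, all on node 2), asking for 4 pages on node 0 gives a global target of 20 with only node 0 allowed, while a global request for 8 leaves the target at 8.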
  2. 12 Dec, 2009 10 commits
  3. 09 Dec, 2009 2 commits
    • powerpc/pseries: Serialize cpu hotplug operations during deactivate Vs deallocate · 51badebd
      Authored by Gautham R Shenoy
      Currently the cpu-allocation/deallocation process comprises two steps:
      - Set the indicators and update the device tree with DLPAR node
        information.
      
      - Online/offline the allocated/deallocated CPU.
      
      This is achieved by writing to the sysfs tunables "probe" during allocation
      and "release" during deallocation.
      
      At the same time, userspace can independently online/offline the CPUs of
      the system using the sysfs tunable "online".
      
      It is quite possible that when a userspace tool offlines a CPU
      for the purpose of deallocation and is in the process of updating the device
      tree, some other userspace tool could bring the CPU back online by writing to
      the "online" sysfs tunable thereby causing the deallocate process to fail.
      
      The solution to this is to serialize writes to the "probe/release" sysfs
      tunable with the writes to the "online" sysfs tunable.
      
      This patch employs a mutex to provide this serialization, which is a no-op on
      all architectures except PPC_PSERIES.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
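The serialization idea can be sketched in userspace with a pthread mutex. This is an illustrative model, not the kernel implementation: all `fake_*` names are hypothetical, a single CPU is modelled, and the rule that a released CPU cannot be brought online again is an assumption made to keep the example small.

```c
#include <pthread.h>
#include <stdbool.h>

/* One lock shared by the "online" and "release" write paths. */
static pthread_mutex_t fake_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static bool fake_cpu_online = true;
static bool fake_cpu_in_device_tree = true;

/* Models a write to "online": takes the same lock as the release path. */
static void fake_write_online(bool online)
{
	pthread_mutex_lock(&fake_hotplug_lock);
	/* Assumption: a released cpu can no longer be brought online. */
	if (!online || fake_cpu_in_device_tree)
		fake_cpu_online = online;
	pthread_mutex_unlock(&fake_hotplug_lock);
}

/* Models a write to "release": offline and device tree update as one step. */
static int fake_write_release(void)
{
	pthread_mutex_lock(&fake_hotplug_lock);
	fake_cpu_online = false;
	fake_cpu_in_device_tree = false;
	pthread_mutex_unlock(&fake_hotplug_lock);
	return 0;
}
```

Because both paths hold the same mutex, an "online" write can only run before or after a complete release, never between the offline step and the device tree update.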
    • sysfs/cpu: Add probe/release files · 12633e80
      Authored by Nathan Fontenot
      Version 3 of this patch is updated with documentation added to
      Documentation/ABI.  There are no changes to any of the C code from v2
      of the patch.
      
      In order to support kernel DLPAR of CPU resources we need to provide an
      interface to add (probe) and remove (release) the resource from the system.
      This patch creates new generic probe and release sysfs files to facilitate
      cpu probe/release.  The probe/release interface allows each arch to
      supply its own routines for implementing the backend of adding
      and removing cpus to/from the system.
      
      This also creates the powerpc specific stubs to handle the arch callouts
      from writes to the sysfs files.
      
      The creation and use of these files is regulated by the
      CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
      capability will have the files created.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  4. 06 Dec, 2009 4 commits
  5. 04 Dec, 2009 1 commit
  6. 29 Nov, 2009 1 commit
    • PM: fix irq enable/disable in runtime PM code · 862f89b3
      Authored by Alan Stern
      This patch (as1305) fixes a bug in the irq-enable settings and removes
      some related overhead in the runtime PM code.
      
      	In __pm_runtime_resume(), within the scope of the original
      	spin_lock_irq(), we know that irqs are disabled.  There's no
      	reason to go through a pair of enable/disable cycles when
      	acquiring and releasing the parent's lock.
      
      	In __pm_runtime_set_status(), irqs are already disabled when
      	the parent's lock is acquired, and they must remain disabled
      	when it is released.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
  7. 25 Nov, 2009 1 commit
    • percpu: Fix kdump failure if booted with percpu_alloc=page · 3b034b0d
      Authored by Vivek Goyal
      o kdump functionality reserves a per cpu area at boot time and exports the
        physical address of that area to user space through the sysfs interface. This
        area stores some dump related information like cpu register states etc
        at the time of crash.
      
      o We were assuming that the per cpu area always comes from the linearly mapped
        memory region and using __pa() to determine the physical address.
        With percpu_alloc=page, the per cpu area can also come from the vmalloc
        region, and __pa() breaks.
      
      o This patch implements a new function to convert a per cpu address to a
        physical address.
      
      Before the patch, crash_notes addresses looked as follows.
      
      cpu0 60fffff49800
      cpu1 60fffff60800
      cpu2 60fffff77800
      
      These are bogus physical addresses.
      
      After the patch, the addresses are as follows.
      
      cpu0 13eb44000
      cpu1 13eb43000
      cpu2 13eb42000
      cpu3 13eb41000
      
      These look fine. I have 4G of memory and /proc/iomem tells me the following.
      
      100000000-13fffffff : System RAM
      
      tj: * added missing asm/io.h include reported by Stephen Rothwell
          * repositioned per_cpu_ptr_phys() in percpu.c and added comment.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
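The address conversion this commit describes can be modelled in userspace. The sketch below is not the kernel function: the address ranges, the `FAKE_*` constants, and the single-entry lookup in `fake_vmalloc_page_to_phys()` are all made-up values chosen only to show why a linear subtraction is wrong for vmalloc addresses.

```c
#include <stdint.h>

/* Hypothetical address space layout, for illustration only. */
#define FAKE_PAGE_OFFSET   0x10000000ULL /* start of the linear mapping */
#define FAKE_VMALLOC_START 0x60000000ULL
#define FAKE_VMALLOC_END   0x70000000ULL
#define FAKE_PAGE_SIZE     4096ULL

/* Stand-in for a page table walk: one hard-coded vmalloc mapping. */
static uint64_t fake_vmalloc_page_to_phys(uint64_t page_vaddr)
{
	return page_vaddr == 0x60004000ULL ? 0x00eb4000ULL : 0;
}

/* Convert a (modelled) per cpu virtual address to a physical address. */
static uint64_t fake_per_cpu_ptr_phys(uint64_t vaddr)
{
	/* Linear mapping: a fixed offset, which is what __pa() relies on. */
	if (vaddr < FAKE_VMALLOC_START || vaddr >= FAKE_VMALLOC_END)
		return vaddr - FAKE_PAGE_OFFSET;

	/* vmalloc area: look up the backing page, keep the in-page offset. */
	return fake_vmalloc_page_to_phys(vaddr & ~(FAKE_PAGE_SIZE - 1)) +
	       (vaddr & (FAKE_PAGE_SIZE - 1));
}
```

Applying the linear subtraction to a vmalloc address would yield a value unrelated to the page that actually backs it, which matches the bogus addresses shown above.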
  8. 03 Nov, 2009 2 commits
  9. 31 Oct, 2009 2 commits
  10. 22 Sep, 2009 2 commits
  11. 20 Sep, 2009 1 commit
  12. 16 Sep, 2009 4 commits
    • Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev · 2b2af54a
      Authored by Kay Sievers
      Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
      very early at kernel initialization, before any driver-core device
      is registered. Every device with a major/minor will provide a
      device node in devtmpfs.
      
      Devtmpfs can be changed and altered by userspace at any time,
      and in any way needed - just like today's udev-mounted tmpfs.
      Unmodified udev versions will run just fine on top of it, and will
      recognize an already existing kernel-created device node and use it.
      The default node permissions are root:root 0600. Proper permissions
      and user/group ownership, meaningful symlinks, all other policy still
      needs to be applied by userspace.
      
      If a node is created by devtmpfs, devtmpfs will remove the device node
      when the device goes away. If the device node was created by
      userspace, or the devtmpfs created node was replaced by userspace, it
      will no longer be removed by devtmpfs.
      
      If it is requested to auto-mount it, it makes init=/bin/sh work
      without any further userspace support. /dev will be fully populated
      and dynamic, and always reflect the current device state of the kernel.
      With the commonly used dynamic device numbers, it solves the problem
      where static device nodes may point to the wrong devices.
      
      It is intended to make the initial bootup logic simpler and more robust,
      by de-coupling the creation of the initial environment, to reliably run
      userspace processes, from a complex userspace bootstrap logic to provide
      a working /dev.
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jan Blunck <jblunck@suse.de>
      Tested-By: Harald Hoyer <harald@redhat.com>
      Tested-By: Scott James Remnant <scott@ubuntu.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • driver core: platform_device_add_data(): use kmemdup() · daa41226
      Authored by Andrew Morton
      Instead of open-coding it.
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
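What "open-coding" refers to here is the allocate-then-copy sequence that kmemdup() wraps. A minimal userspace sketch of the same idea follows; `fake_memdup()`, `struct fake_pdev`, and `fake_add_data()` are hypothetical names, and `malloc()` stands in for the kernel allocator.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of kmemdup(): allocate and copy in one call. */
static void *fake_memdup(const void *src, size_t len)
{
	void *p = malloc(len);

	if (p)
		memcpy(p, src, len);
	return p;
}

/* Hypothetical device structure holding a private copy of its data. */
struct fake_pdev {
	void *platform_data;
};

/* Returns 0 on success, -1 if the copy could not be allocated. */
static int fake_add_data(struct fake_pdev *pdev, const void *data, size_t size)
{
	void *d = fake_memdup(data, size);

	if (!d)
		return -1;	/* the kernel would return -ENOMEM here */
	pdev->platform_data = d;
	return 0;
}
```

The caller keeps ownership of its original buffer, and the device frees its own copy when it is torn down.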
    • Driver core: Add support for compatibility classes · 46227094
      Authored by Jean Delvare
      When turning class devices into bus devices, we may need to
      temporarily add links in sysfs so that user-space applications
      are not confused. This is done by adding the following API:
      
      * Functions to register and unregister compatibility classes.
        These appear in sysfs at the same location as regular classes, but
        instead of class devices, they contain links to bus devices.
      * Functions to create and delete such links. Additionally, the caller
        can optionally pass a target device to which a "device" link should
        point (typically that would be the device's parent), to fully emulate
        the original class device.
      
      The i2c subsystem will be the first user of this API, as i2c adapters
      are being converted from class devices to bus devices.
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
    • driver-core: move dma-coherent.c from kernel to driver/base · a56af876
      Authored by Ming Lei
      Placing dma-coherent.c in driver/base is better than in kernel,
      since it contains code to do per-device coherent dma memory
      handling.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>