Commits · 25cf84cf377c0aae5dbcf937ea89bc7893db5176 · openeuler / Kernel

07 Mar, 2010 1 commit

PM: Provide generic subsystem-level callbacks · d690b2cd

Committed by Rafael J. Wysocki on Mar 06, 2010

There are subsystems whose power management callbacks only need to
invoke the callbacks provided by device drivers.  Still, their system
sleep PM callbacks should play well with the runtime PM callbacks,
so that devices suspended at run time can be left in that state for
a system sleep transition.

Provide a set of generic PM callbacks for such subsystems and
define convenience macros for populating dev_pm_ops structures.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

27 Feb, 2010 7 commits

PM: Allow device drivers to use dpm_wait() · f8824cee

Committed by Rafael J. Wysocki on Jan 27, 2010

There are some dependencies between devices (in particular, between
EHCI USB controllers and their OHCI/UHCI siblings) which are not
reflected by the structure of the device tree.  With synchronous
suspend and resume these dependencies are taken into account
automatically, because the devices in question are always registered
in the right order, but to meet these constraints with asynchronous
suspend and resume the drivers of these devices will need to use
dpm_wait() in their suspend/resume routines, so introduce a helper
function allowing them to do that.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Start asynchronous resume threads upfront · 97df8c12

Committed by Rafael J. Wysocki on Jan 23, 2010

It has been shown by testing that total device resume time can be
reduced significantly (by as much as 50% or more) if the async
threads executing some devices' resume routines are all started
before the main resume thread starts to handle the "synchronous"
devices.

This is a consequence of the fact that the slowest devices tend to be
located at the end of dpm_list, so their resume routines are started
very late.  Consequently, they have to wait for all the preceding
"synchronous" devices before their resume routines can be started
by the main resume thread, even if they are "asynchronous".  By
starting their async threads upfront we effectively move those
devices towards the beginning of dpm_list, without breaking their
ordering with respect to their parents and children.  As a result,
their resume routines are started much earlier and we are able to
save much more device resume time this way.
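
The effect described above can be checked with a small back-of-the-envelope model. The sketch below is a userspace C illustration, not PM core code: struct model_dev and both functions are hypothetical names, and it assumes devices without parent/child dependencies and unlimited parallelism for the async threads.

```c
#include <stddef.h>

/* Hypothetical model of one dpm_list entry (not a kernel structure). */
struct model_dev {
	unsigned int resume_ms;	/* duration of the resume callback */
	int is_async;		/* nonzero if it may run in an async thread */
};

/*
 * Old ordering: an async thread is started only when the main resume
 * thread reaches the device while walking the list front to back.
 */
unsigned int resume_time_in_order(const struct model_dev *devs, size_t n)
{
	unsigned int now = 0, latest_async = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].is_async) {
			unsigned int finish = now + devs[i].resume_ms;

			if (finish > latest_async)
				latest_async = finish;
		} else {
			now += devs[i].resume_ms;	/* main thread is busy */
		}
	}
	return now > latest_async ? now : latest_async;
}

/*
 * New ordering: every async thread is started at time 0, then the main
 * thread handles the "synchronous" devices one after another.
 */
unsigned int resume_time_upfront(const struct model_dev *devs, size_t n)
{
	unsigned int sync_total = 0, longest_async = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].is_async) {
			if (devs[i].resume_ms > longest_async)
				longest_async = devs[i].resume_ms;
		} else {
			sync_total += devs[i].resume_ms;
		}
	}
	return sync_total > longest_async ? sync_total : longest_async;
}
```

With two synchronous devices of 100 ms each followed by one asynchronous device of 200 ms, resume_time_in_order() yields 400 and resume_time_upfront() yields 200, which is in line with the roughly 50% saving reported above.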

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Add facility for advanced testing of async suspend/resume · 5a2eb858

Committed by Rafael J. Wysocki on Jan 23, 2010

Add configuration switch CONFIG_PM_ADVANCED_DEBUG for compiling in
extra PM debugging/testing code allowing one to access some
PM-related attributes of devices from the user space via sysfs.

If CONFIG_PM_ADVANCED_DEBUG is set, add sysfs attribute power/async
for every device allowing the user space to access the device's
power.async_suspend flag and modify it, if desired.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Add a switch for disabling/enabling asynchronous suspend/resume · 0e06b4a8

Committed by Rafael J. Wysocki on Jan 23, 2010

Add sysfs attribute /sys/power/pm_async allowing the user space to
disable/enable asynchronous suspend/resume of devices.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Asynchronous suspend and resume of devices · 5af84b82

Committed by Rafael J. Wysocki on Jan 23, 2010

Theoretically, the total time of system sleep transitions (suspend
to RAM, hibernation) can be reduced by running suspend and resume
callbacks of device drivers in parallel with each other.  However,
there are dependencies between devices such that we're not allowed
to suspend the parent of a device before suspending the device
itself.  Analogously, we're not allowed to resume a device before
resuming its parent.

The most straightforward way to take these dependencies into account
is to start the async threads used for suspending and resuming
devices at the core level, so that async_schedule() is called for
each suspend and resume callback supposed to be executed
asynchronously.

For this purpose, introduce a new device flag, power.async_suspend,
used to mark the devices whose suspend and resume callbacks are to be
executed asynchronously (i.e. in parallel with the main suspend/resume
thread and possibly in parallel with each other) and helper function
device_enable_async_suspend() allowing one to set power.async_suspend
for given device (power.async_suspend is unset by default for all
devices).  For each device with the power.async_suspend flag set the
PM core will use async_schedule() to execute its suspend and resume
callbacks.

The async threads started for different devices as a result of
calling async_schedule() are synchronized with each other and with
the main suspend/resume thread with the help of completions, in the
following way:
(1) There is a completion, power.completion, for each device object.
(2) Each device's completion is reset before calling async_schedule()
    for the device or, in the case of devices with the
    power.async_suspend flag unset, before executing the device's
    suspend and resume callbacks.
(3) During suspend, right before running the bus type, device type
    and device class suspend callbacks for the device, the PM core
    waits for the completions of all the device's children to be
    completed.
(4) During resume, right before running the bus type, device type and
    device class resume callbacks for the device, the PM core waits
    for the completion of the device's parent to be completed.
(5) The PM core completes power.completion for each device right
    after the bus type, device type and device class suspend (or
    resume) callbacks executed for the device have returned.
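
Steps (1) to (5) can be modelled in userspace with POSIX threads. This is only a sketch under stated assumptions: the "done" flag guarded by one shared mutex and condition variable imitates power.completion, each pthread stands in for an async_schedule() call, and the four-device tree and all model_* names are made up for illustration. Only the suspend direction is shown.

```c
#include <pthread.h>

#define MODEL_NDEV 4

/* Hypothetical device model; "done" imitates power.completion. */
struct model_dev {
	int parent;		/* index of the parent, -1 for the root */
	int done;		/* set once the suspend callbacks returned */
	int suspend_seq;	/* position in the observed suspend order */
};

static struct model_dev model_devs[MODEL_NDEV];
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t model_cond = PTHREAD_COND_INITIALIZER;
static int model_next_seq;

/* Userspace stand-in for waiting on a device's completion. */
static void model_wait(struct model_dev *dev)
{
	pthread_mutex_lock(&model_lock);
	while (!dev->done)
		pthread_cond_wait(&model_cond, &model_lock);
	pthread_mutex_unlock(&model_lock);
}

/* One thread per device, as if started by async_schedule(). */
static void *model_async_suspend(void *arg)
{
	struct model_dev *dev = arg;
	int me = (int)(dev - model_devs);
	int i;

	/* Step (3): wait for all children before touching this device. */
	for (i = 0; i < MODEL_NDEV; i++)
		if (model_devs[i].parent == me)
			model_wait(&model_devs[i]);

	pthread_mutex_lock(&model_lock);
	/* Record the point where the suspend callbacks would have run. */
	dev->suspend_seq = model_next_seq++;
	/* Step (5): complete this device so its parent can proceed. */
	dev->done = 1;
	pthread_cond_broadcast(&model_cond);
	pthread_mutex_unlock(&model_lock);
	return NULL;
}

/* Returns 1 if every device was suspended after all of its children. */
int model_run_async_suspend(void)
{
	static const int parents[MODEL_NDEV] = { -1, 0, 0, 1 };
	pthread_t threads[MODEL_NDEV];
	int started[MODEL_NDEV];
	int i, ok = 1;

	model_next_seq = 0;
	for (i = 0; i < MODEL_NDEV; i++) {
		model_devs[i].parent = parents[i];
		model_devs[i].done = 0;		/* step (2): reset */
		model_devs[i].suspend_seq = -1;
	}

	/* Start children before parents (higher index first in this model). */
	for (i = MODEL_NDEV - 1; i >= 0; i--) {
		started[i] = pthread_create(&threads[i], NULL,
					    model_async_suspend,
					    &model_devs[i]) == 0;
		if (!started[i])	/* fall back to a synchronous call */
			model_async_suspend(&model_devs[i]);
	}
	for (i = 0; i < MODEL_NDEV; i++)
		if (started[i])
			pthread_join(threads[i], NULL);

	for (i = 0; i < MODEL_NDEV; i++) {
		int p = model_devs[i].parent;

		if (p >= 0 && model_devs[i].suspend_seq > model_devs[p].suspend_seq)
			ok = 0;
	}
	return ok;
}
```

The threads may be scheduled in any order, yet a parent never records its sequence number before its children do, because it blocks in model_wait() until each child has completed. Resume would invert the rule: each device waits only for its parent.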

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Add parent information to timing messages · 8cc6b39f

Committed by Rafael J. Wysocki on Jan 23, 2010

Add parent information to the messages printed by the suspend/resume
core when initcall_debug is set.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Runtime: Add sysfs switch for disabling device run-time PM · 53823639

Committed by Rafael J. Wysocki on Jan 23, 2010

Add new device sysfs attribute, power/control, allowing the user
space to block the run-time power management of the devices.  If this
attribute is set to "on", the driver of the device won't be able to power
manage it at run time (without breaking the rules) and the device will
always be in the full power state (except when the entire system goes
into a sleep state).

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Alan Stern <stern@rowland.harvard.edu>

17 Feb, 2010 1 commit

class: Free the class private data in class_release · 18d19c96

Committed by Laurent Pinchart on Feb 10, 2010

Fix a memory leak by freeing the memory allocated in __class_register
for the class private data.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

21 Jan, 2010 2 commits

Revert "sysdev: fix prototype for memory_sysdev_class show/store functions" · bd796671

Committed by Greg Kroah-Hartman on Jan 19, 2010

This reverts commit 8ff410da

It should not have been sent to Linus's tree yet, as it depends
on changes that are queued up in my driver-core for the .34 kernel
merge.

Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "Zheng, Shaohui" <shaohui.zheng@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

driver-core: fix devtmpfs crash on s390 · f776c5ec

Committed by Heiko Carstens on Jan 18, 2010

On Mon, Jan 18, 2010 at 05:26:20PM +0530, Sachin Sant wrote:
> Hello Heiko,
>
> Today while trying to boot next-20100118 i came across
> the following Oops :
>
> Brought up 4 CPUs
> Unable to handle kernel pointer dereference at virtual kernel address 0000000000
> 543000
> Oops: 0004 #1 SMP
> Modules linked in:
> CPU: 0 Not tainted 2.6.33-rc4-autotest-next-20100118-5-default #1
> Process swapper (pid: 1, task: 00000000fd792038, ksp: 00000000fd797a30)
> Krnl PSW : 0704200180000000 00000000001eb0b8 (shmem_parse_options+0xc0/0x328)
>           R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:2 PM:0 EA:3
> Krnl GPRS: 000000000054388a 000000000000003d 0000000000543836 000000000000003d
>           0000000000000000 0000000000483f28 0000000000536112 00000000fd797d00
>           00000000fd4ba100 0000000000000100 0000000000483978 0000000000543832
>           0000000000000000 0000000000465958 00000000001eb0b0 00000000fd797c58
> Krnl Code: 00000000001eb0aa: c0e5000994f1       brasl   %r14,31da8c
>           00000000001eb0b0: b9020022           ltgr    %r2,%r2
>           00000000001eb0b4: a784010b           brc     8,1eb2ca
>          >00000000001eb0b8: 92002000           mvi     0(%r2),0
>           00000000001eb0bc: a7080000           lhi     %r0,0
>           00000000001eb0c0: 41902001           la      %r9,1(%r2)
>           00000000001eb0c4: b9040016           lgr     %r1,%r6
>           00000000001eb0c8: b904002b           lgr     %r2,%r11
> Call Trace:
> (<00000000fd797c50> 0xfd797c50)
> <00000000001eb5da> shmem_fill_super+0x13a/0x25c
> <0000000000228cfa> get_sb_single+0xbe/0xdc
> <000000000034ffc0> dev_get_sb+0x2c/0x38
> <000000000066c602> devtmpfs_init+0x46/0xc0
> <000000000066c53e> driver_init+0x22/0x60
> <000000000064d40a> kernel_init+0x24e/0x3d0
> <000000000010a7ea> kernel_thread_starter+0x6/0xc
> <000000000010a7e4> kernel_thread_starter+0x0/0xc
>
> I never tried to boot a kernel with DEVTMPFS enabled on a s390 box.
> So am wondering if this is supported or not ? If you think this
> is supported i will send a mail to community on this.

There is nothing arch specific to devtmpfs. This part crashes because the
kernel tries to modify the data read-only section which is write protected
on s390.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

17 Jan, 2010 2 commits

sysdev: fix prototype for memory_sysdev_class show/store functions · 8ff410da

Committed by Wu Fengguang on Jan 15, 2010

The function prototype mismatches in call stack:

                [<ffffffff81494268>] print_block_size+0x58/0x60
                [<ffffffff81487e3f>] sysdev_class_show+0x1f/0x30
                [<ffffffff811d629b>] sysfs_read_file+0xcb/0x1f0
                [<ffffffff81176328>] vfs_read+0xc8/0x180

Due to prototype mismatch, print_block_size() will sprintf() into
*attribute instead of *buf, hence user space will read the initial
zeros from *buf:
	$ hexdump /sys/devices/system/memory/block_size_bytes
	0000000 0000 0000 0000 0000
	0000008

After patch:
	cat /sys/devices/system/memory/block_size_bytes
	0x8000000

This complements commits c29af9636 and 4a0b2b4d.
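
The mismatch is easier to see with the callback shape written out. The following is a hypothetical userspace model (the model_* types are not the kernel's sysdev structures, and the block size is simply the value from the output above): the dispatcher passes three arguments, so a callback has to take the attribute as its second parameter and the buffer as its third. A callback declared with only (class, buf) would receive the attribute pointer where it expects the buffer, which is the symptom described in this commit.

```c
#include <stdio.h>

#define MODEL_BUF_SIZE 32	/* assumed buffer size for this sketch */

struct model_class {
	const char *name;
};

struct model_class_attribute {
	const char *name;
	/* Three-argument prototype: class, attribute, output buffer. */
	int (*show)(struct model_class *cls,
		    struct model_class_attribute *attr, char *buf);
};

/* Matches the prototype above, so the text lands in buf, not in attr. */
int print_block_size(struct model_class *cls,
		     struct model_class_attribute *attr, char *buf)
{
	(void)cls;
	(void)attr;
	return snprintf(buf, MODEL_BUF_SIZE, "0x%lx\n", 0x8000000UL);
}

/* Stand-in for the dispatcher that sits between sysfs and the callback. */
int model_class_show(struct model_class *cls,
		     struct model_class_attribute *attr, char *buf)
{
	return attr->show(cls, attr, buf);
}
```

Because the function pointer type in the attribute structure carries the full prototype, a callback with a stale two-argument signature fails to compile in this model instead of silently writing to the wrong pointer.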

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Zheng, Shaohui" <shaohui.zheng@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memory-hotplug: add 0x prefix to HEX block_size_bytes · ba168fc3

Committed by Wu Fengguang on Jan 15, 2010

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

12 Jan, 2010 1 commit

power: fix kernel-doc notation · 0a884223

Committed by Randy Dunlap on Jan 08, 2010

Warning(drivers/base/power/main.c:453): No description found for parameter 'dev'
Warning(drivers/base/power/main.c:453): No description found for parameter 'cb'
Warning(drivers/base/power/main.c:719): No description found for parameter 'dev'
Warning(drivers/base/power/main.c:719): No description found for parameter 'state'
Warning(drivers/base/power/main.c:719): No description found for parameter 'cb'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

24 Dec, 2009 8 commits

devtmpfs: unlock mutex in case of string allocation error · 80422738

Committed by Kay Sievers on Dec 22, 2009

Reported-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Driver core: export platform_device_register_data as a GPL symbol · 0787fdf7

Committed by Michael Hennerich on Dec 21, 2009

This allows MFD's to register/bind drivers for their sub devices while
still being compiled as a module.

Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

driver core: Prevent reference to freed memory on error path · 99b28f1b

Committed by Phil Carmody on Dec 14, 2009

priv is drv->p. So only free drv->p after we've finished using priv.

Found using a static code analysis tool

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Driver-core: Fix bogus 0 error return in device_add() · e6309e75

Committed by Thomas Gleixner on Dec 10, 2009

If device_add() is called with a device which does not have dev->p set
up, then device_private_init() is called. If that succeeds, then the
error variable is set to 0. Now if the dev_name(dev) check further
down fails, then device_add() correctly terminates, but returns 0.
That of course lets the driver progress. If later another driver uses
this half set up device as parent then device_add() of the child
device explodes and renders sysfs completely unusable.

Set the error to -EINVAL if dev_name() check fails.
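
The control flow behind this bug can be reproduced in a few lines. The sketch below is a simplified userspace model, not the real device_add(): struct model_device, model_private_init() and model_device_add() are hypothetical stand-ins that keep only the error handling discussed above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct model_device {
	void *p;		/* models dev->p */
	const char *name;	/* models dev_name(dev) */
};

static int model_private_storage;

/* Models device_private_init(); always succeeds in this sketch. */
int model_private_init(struct model_device *dev)
{
	dev->p = &model_private_storage;
	return 0;
}

int model_device_add(struct model_device *dev)
{
	int error = -EINVAL;

	if (!dev)
		goto done;

	if (!dev->p) {
		/* On success this overwrites the initial -EINVAL with 0. */
		error = model_private_init(dev);
		if (error)
			goto done;
	}

	if (!dev->name || !dev->name[0]) {
		/* The fix: without this assignment the stale 0 is returned. */
		error = -EINVAL;
		goto done;
	}

	error = 0;	/* the remaining registration steps are omitted */
done:
	return error;
}
```

A device with neither private data nor a name takes both branches: the init call resets the error to 0, and only the explicit assignment in the name check keeps the function from reporting success.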

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: "Hans J. Koch" <hjk@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Driver core: driver_attribute parameters can often be const* · 099c2f21

Committed by Phil Carmody on Dec 18, 2009

Many struct driver_attribute descriptors are purely read-only
structures, and there's no need to change them. Therefore make
the promise not to, which will let those descriptors be put in
a ro section.

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Driver core: bin_attribute parameters can often be const* · 66ecb92b

Committed by Phil Carmody on Dec 18, 2009

Many struct bin_attribute descriptors are purely read-only
structures, and there's no need to change them. Therefore
make the promise not to, which will let those descriptors
be put in a ro section.

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Driver core: device_attribute parameters can often be const* · 26579ab7

Committed by Phil Carmody on Dec 18, 2009

Most device_attributes are const, and are begging to be
put in a ro section. However, the create and remove
file interfaces were failing to propagate the const promise
which the only functions they call offer.

Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

devtmpfs: Convert dirlock to a mutex · f1f76f86

Committed by Thomas Gleixner on Dec 16, 2009

devtmpfs has a rw_lock dirlock which serializes delete_path and
create_path.

This code was obviously never tested with the usual set of debugging
facilities enabled. In the dirlock held sections the code calls:

 - vfs functions which take mutexes
 - kmalloc(, GFP_KERNEL)

In both code paths the might-sleep warning triggers and spams dmesg.

Convert the rw_lock to a mutex. There is no reason why this needs to
be a rwlock.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

23 Dec, 2009 1 commit

PM / Runtime: Use device type and device class callbacks · a6ab7aa9

Committed by Rafael J. Wysocki on Dec 22, 2009

The power management of some devices is handled through device types
and device classes rather than through bus types.  Since these
devices may also benefit from using the run-time power management
core, extend it so that the device type and device class run-time PM
callbacks can be taken into consideration by it if the bus type
callback is not defined.

Update the run-time PM core documentation to reflect this change.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

21 Dec, 2009 1 commit

PM: Use pm_runtime_put_sync in system resume · aa0baaef

Committed by Alan Stern on Dec 21, 2009

This patch (as1317) fixes a bug in the PM core.  When a device is
resumed following a system sleep, the core decrements the device's
runtime PM usage counter but doesn't issue an idle notification if the
counter reaches 0.  This could prevent an otherwise unused device from
being runtime-suspended again after the system sleep.

The fix is to call pm_runtime_put_sync() instead of
pm_runtime_put_noidle().
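
The difference between the two calls comes down to what happens when the counter hits zero. The model below is a userspace sketch with hypothetical names (struct model_dev, model_put_noidle(), model_put_sync()); it counts idle notifications instead of running a real idle callback.

```c
/* Hypothetical model of the runtime PM bookkeeping for one device. */
struct model_dev {
	int usage_count;
	int idle_notifications;	/* how often the idle callback "ran" */
};

/* Models pm_runtime_put_noidle(): drop the reference and nothing else. */
void model_put_noidle(struct model_dev *dev)
{
	if (dev->usage_count > 0)
		dev->usage_count--;
}

/*
 * Models pm_runtime_put_sync(): drop the reference and, if it was the
 * last one, issue an idle notification so the device can be suspended.
 */
void model_put_sync(struct model_dev *dev)
{
	if (dev->usage_count > 0 && --dev->usage_count == 0)
		dev->idle_notifications++;
}
```

Starting from a usage count of 1, both functions leave the counter at 0, but only model_put_sync() records an idle notification. That missing notification is why an otherwise unused device could stay at full power after a system sleep.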

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

18 Dec, 2009 3 commits

mm: Add notifier in pageblock isolation for balloon drivers · 925cc71e

Committed by Robert Jennings on Dec 17, 2009

Memory balloon drivers can allocate a large amount of memory which is not
movable but could be freed to accommodate memory hotplug remove.

Prior to calling the memory hotplug notifier chain the memory in the
pageblock is isolated.  Currently, if the migrate type is not
MIGRATE_MOVABLE the isolation will not proceed, causing the memory removal
for that page range to fail.

Rather than failing pageblock isolation if the migratetype is not
MIGRATE_MOVABLE, this patch checks if all of the pages in the pageblock,
and not on the LRU, are owned by a registered balloon driver (or other
entity) using a notifier chain.  If all of the non-movable pages are owned
by a balloon, they can be freed later through the memory notifier chain
and the range can still be isolated in set_migratetype_isolate().

Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Gerald Schaefer <geralds@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

PM: Measure device suspend and resume times · ecf762b2

Committed by Rafael J. Wysocki on Dec 18, 2009

Measure and print the time of suspending and resuming all devices.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Make the initcall_debug style timing for suspend/resume complete · 875ab0b7

Committed by Rafael J. Wysocki on Dec 18, 2009

Commit f2511774
(PM: Add initcall_debug style timing for suspend/resume) introduced
basic timing instrumentation, needed for a scripts/bootgraph.pl
equivalent or humans, but it missed the fact that bus types and
device classes which haven't been switched to using struct dev_pm_ops
objects yet need special handling.  As a result, the suspend/resume
timing information is only available for devices whose bus types or
device classes use struct dev_pm_ops objects, so the majority of
devices is not covered.

Fix this by adding basic suspend/resume timing instrumentation for
devices whose bus types and device classes still don't use struct
dev_pm_ops objects for power management.  To reduce code duplication
move the timing code to helper functions.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

16 Dec, 2009 13 commits

HWPOISON: Add soft page offline support · facb6011

Committed by Andi Kleen on Dec 16, 2009

This is a simpler, gentler variant of memory_failure() for soft page
offlining controlled from user space.  It doesn't kill anything, just
tries to invalidate and if that doesn't work migrate the
page away.

This is useful for predictive failure analysis, where a page has
a high rate of corrected errors, but hasn't gone bad yet. Instead
it can be offlined early and avoided.

The offlining is controlled from sysfs, including a new generic
entry point for hard page offlining for symmetry too.

We use the page isolate facility to prevent re-allocation
race. Normally this is only used by memory hotplug. To avoid
races with memory allocation I am using lock_system_sleep().
This avoids the situation where memory hotplug is about
to isolate a page range and then hwpoison undoes that work.
This is a big hammer currently, but the simplest solution
currently.

When the page is not free or LRU we try to free pages
from slab and other caches. The slab freeing is currently
quite dumb and does not try to focus on the specific slab
cache which might own the page. This could be potentially
improved later.

Thanks to Fengguang Wu and Haicheng Li for some fixes.

[Added fix from Andrew Morton to adapt to new migrate_pages prototype]

Signed-off-by: Andi Kleen <ak@linux.intel.com>

PM: rwsem.h need not be included into main.c · d8bed5a4

Committed by Rafael J. Wysocki on Dec 13, 2009

It is not necessary to include <linux/rwsem.h> into
drivers/base/power/main.c, so don't do that.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Remove unnecessary goto from device_resume_noirq() · 33c33740

Committed by Rafael J. Wysocki on Dec 13, 2009

In device_resume_noirq() there is the 'End' label and the associated
goto statement that aren't strictly necessary, so rework the code to
get rid of them.  Also modify device_suspend_noirq() so that it looks
completely analogous to device_resume_noirq().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Add initcall_debug style timing for suspend/resume · f2511774

Committed by Arjan van de Ven on Dec 13, 2009

In order to diagnose overall suspend/resume times, we need
basic instrumentation to break down the total time into per
device timing, similar to initcall_debug.

This patch adds the basic timing instrumentation, needed
for a scripts/bootgraph.pl equivalent or humans.
The bootgraph.pl program is still a work in progress, but
is far enough along to know that this patch is sufficient.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: allow for usage_count > 0 in pm_runtime_get() · 1d531c14

Committed by Alan Stern on Dec 13, 2009

This patch (as1308c) fixes __pm_runtime_get().  Currently the routine
will resume a device if the prior usage count was 0.  But this isn't
right; thanks to pm_runtime_get_noresume() the usage count can be
positive even while the device is suspended.
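
The problematic case can be shown with a tiny state model. The commit text does not spell out the new logic, so the "fixed" variant below assumes the simplest reading, namely that the resume decision no longer depends on the prior usage count. All names are hypothetical and the code is a userspace sketch, not the runtime PM core.

```c
enum model_status { MODEL_ACTIVE, MODEL_SUSPENDED };

/* Hypothetical model of the runtime PM state of one device. */
struct model_dev {
	int usage_count;
	enum model_status status;
};

/* Models pm_runtime_get_noresume(): take a reference, leave the state. */
void model_get_noresume(struct model_dev *dev)
{
	dev->usage_count++;
}

/* Old behaviour: resume only if the prior usage count was 0. */
void model_get_old(struct model_dev *dev)
{
	if (dev->usage_count++ == 0)
		dev->status = MODEL_ACTIVE;
}

/* Assumed fix: take the reference, then resume whenever suspended. */
void model_get_fixed(struct model_dev *dev)
{
	dev->usage_count++;
	if (dev->status == MODEL_SUSPENDED)
		dev->status = MODEL_ACTIVE;
}
```

For a suspended device that already holds one reference from model_get_noresume(), model_get_old() leaves it suspended while model_get_fixed() brings it back to MODEL_ACTIVE.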

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

mm: slab-allocate memory section nodemask for large systems · 9ae49fab

Committed by David Rientjes on Dec 14, 2009

Nodemasks should not be allocated on the stack for large systems (when it
is larger than 256 bytes) since there is a threat of overflow.

This patch causes the unregister_mem_sect_under_nodes() nodemask to be
allocated on the stack for smaller systems and be allocated by slab for
larger systems.

GFP_KERNEL is used since remove_memory_block() can block.

Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Alex Chiang <achiang@hp.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: add numa node symlink for cpu devices in sysfs · 1830794a

Committed by Alex Chiang on Dec 14, 2009

You can discover which CPUs belong to a NUMA node by examining
/sys/devices/system/node/node#/

However, it's not convenient to go in the other direction, when looking at
/sys/devices/system/cpu/cpu#/

Yes, you can muck about in sysfs, but adding these symlinks makes life a
lot more convenient.

Signed-off-by: Alex Chiang <achiang@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: refactor unregister_cpu_under_node() · b9d52dad

Committed by Alex Chiang on Dec 14, 2009

By returning early if the node is not online, we can unindent the
interesting code by two levels.

No functional change.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: refactor register_cpu_under_node() · f8246f31

Committed by Alex Chiang on Dec 14, 2009

By returning early if the node is not online, we can unindent the
interesting code by one level.

No functional change.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

mm: add numa node symlink for memory section in sysfs · dee5d0d5

Committed by Alex Chiang on Dec 14, 2009

Commit c04fc586 (mm: show node to memory section relationship with
symlinks in sysfs) created symlinks from nodes to memory sections, e.g.

/sys/devices/system/node/node1/memory135 -> ../../memory/memory135

If you're examining the memory section though and are wondering what node
it might belong to, you can find it by grovelling around in sysfs, but
it's a little cumbersome.

Add a reverse symlink for each memory section that points back to the
node to which it belongs.

Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Gary Hade <garyhade@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlb: offload per node attribute registrations · 39da08cb

Committed by Lee Schermerhorn on Dec 14, 2009

Offload the registration and unregistration of per node hstate sysfs
attributes to a worker thread rather than attempt the
allocation/attachment or detachment/freeing of the attributes in the
context of the memory hotplug handler.

I don't know that this is absolutely required, but the registration can
sleep in allocations and other mem hot plug handlers do it this way.  If
it turns out this is NOT required, we can drop this patch.

N.B.,  Only tested build, boot, libhugetlbfs regression.
       i.e., no memory hotplug testing.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlb: handle memory hot-plug events · 4faf8d95

Committed by Lee Schermerhorn on Dec 14, 2009

Register per node hstate attributes only for nodes with memory.  As
suggested by David Rientjes.

With Memory Hotplug, memory can be added to a memoryless node and a node
with memory can become memoryless.  Therefore, add a memory on/off-line
notifier callback to [un]register a node's attributes on transition
to/from memoryless state.

N.B.,  Only tested build, boot, libhugetlbfs regression.
       i.e., no memory hotplug testing.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlb: add per node hstate attributes · 9a305230

Committed by Lee Schermerhorn on Dec 14, 2009

Add the per huge page size control/query attributes to the per node
sysdevs:

/sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/
	nr_hugepages       - r/w
	free_hugepages     - r/o
	surplus_hugepages  - r/o

The patch attempts to re-use/share as much of the existing global hstate
attribute initialization and handling, and the "nodes_allowed" constraint
processing as possible.

Calling set_max_huge_pages() with no node indicates a change to global
hstate parameters.  In this case, any non-default task mempolicy will be
used to generate the nodes_allowed mask.  A valid node id indicates an
update to that node's hstate parameters, and the count argument specifies
the target count for the specified node.  From this info, we compute the
target global count for the hstate and construct a nodes_allowed node mask
containing only the specified node.
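
The arithmetic in the paragraph above can be written out as follows. This is a sketch with hypothetical names (struct model_hstate, model_global_target(), model_nodes_allowed()) and a fixed four-node system; it also assumes that, without a node id and with default task policy, all nodes are allowed.

```c
#define MODEL_NODES 4

/* Hypothetical model of the per-hstate counters. */
struct model_hstate {
	unsigned long nr_huge_pages;			/* global pool size */
	unsigned long nr_huge_pages_node[MODEL_NODES];	/* per-node share */
};

/*
 * Without a valid node id, count is already the global target.  With
 * one, replace that node's current share by the requested per-node
 * count and keep the other nodes' pages unchanged.
 */
unsigned long model_global_target(const struct model_hstate *h, int nid,
				  unsigned long count)
{
	if (nid < 0 || nid >= MODEL_NODES)
		return count;
	return h->nr_huge_pages - h->nr_huge_pages_node[nid] + count;
}

/* Bitmask of nodes the pool adjustment may touch. */
unsigned long model_nodes_allowed(int nid)
{
	if (nid < 0 || nid >= MODEL_NODES)
		return (1UL << MODEL_NODES) - 1;	/* assumed: all nodes */
	return 1UL << nid;
}
```

With an empty pool, asking for 16 pages on node 2 gives a global target of 16 and a mask that contains only node 2. With 24 pages split as 8 on node 0 and 16 on node 2, asking for 4 pages on node 2 gives a global target of 24 - 16 + 4 = 12.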

Setting the node specific nr_hugepages via the per node attribute
effectively ignores any task mempolicy or cpuset constraints.

With this patch:

(me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB
./  ../  free_hugepages  nr_hugepages  surplus_hugepages

Starting from:
Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
Node 0 HugePages_Surp:      0
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
Node 1 HugePages_Surp:      0
Node 2 HugePages_Total:     0
Node 2 HugePages_Free:      0
Node 2 HugePages_Surp:      0
Node 3 HugePages_Total:     0
Node 3 HugePages_Free:      0
Node 3 HugePages_Surp:      0
vm.nr_hugepages = 0

Allocate 16 persistent huge pages on node 2:
(me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages

[Note that this is equivalent to:
	numactl -m 2 hugeadmin --pool-pages-min 2M:+16
]

Yields:
Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
Node 0 HugePages_Surp:      0
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
Node 1 HugePages_Surp:      0
Node 2 HugePages_Total:    16
Node 2 HugePages_Free:     16
Node 2 HugePages_Surp:      0
Node 3 HugePages_Total:     0
Node 3 HugePages_Free:      0
Node 3 HugePages_Surp:      0
vm.nr_hugepages = 16

Global controls work as expected--reduce pool to 8 persistent huge pages:
(me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Node 0 HugePages_Total:     0
Node 0 HugePages_Free:      0
Node 0 HugePages_Surp:      0
Node 1 HugePages_Total:     0
Node 1 HugePages_Free:      0
Node 1 HugePages_Surp:      0
Node 2 HugePages_Total:     8
Node 2 HugePages_Free:      8
Node 2 HugePages_Surp:      0
Node 3 HugePages_Total:     0
Node 3 HugePages_Free:      0
Node 3 HugePages_Surp:      0

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
