- 22 10月, 2011 1 次提交
-
-
由 Rafael J. Wysocki 提交于
The generic PM domains code in drivers/base/power/domain.c has to avoid powering off domains that provide power to wakeup devices during system suspend. Currently, however, this only works for wakeup devices directly belonging to the given domain and not for their children (or the children of their children and so on). Thus, if there's a wakeup device whose parent belongs to a power domain handled by the generic PM domains code, the domain will be powered off during system suspend preventing the device from signaling wakeup. To address this problem introduce a device flag, power.wakeup_path, that will be set during system suspend for all wakeup devices, their parents, the parents of their parents and so on. This way, all wakeup paths in the device hierarchy will be marked and the generic PM domains code will only need to avoid powering off domains containing devices whose power.wakeup_path is set. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 17 10月, 2011 4 次提交
-
-
由 Rafael J. Wysocki 提交于
There is a problem with the current ordering of hibernate code which leads to deadlocks in some filesystems' memory shrinkers. Namely, some filesystems use freezable kernel threads that are inactive when the hibernate memory preallocation is carried out. Those same filesystems use memory shrinkers that may be triggered by the hibernate memory preallocation. If those memory shrinkers wait for the frozen kernel threads, the hibernate process deadlocks (this happens with XFS, for one example). Apparently, it is not technically viable to redesign the filesystems in question to avoid the situation described above, so the only possible solution of this issue is to defer the freezing of kernel threads until the hibernate memory preallocation is done, which is implemented by this change. Unfortunately, this requires the memory preallocation to be done before the "prepare" stage of device freeze, so after this change the only way drivers can allocate additional memory for their freeze routines in a clean way is to use PM notifiers. Reported-by: NChristoph <cr2005@u-club.de> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 H Hartley Sweeten 提交于
Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to cleanup the #if defined ugliness for the vt suspend support functions. Note that CONFIG_VT_CONSOLE is already dependant on CONFIG_VT. The function pm_set_vt_switch is actually dependant on CONFIG_VT and not CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is not set: drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch' include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here Also, remove the incorrect path from the comment in console.c. [rjw: Replaced #if defined() with #ifdef in suspend.h.] Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Martin Schwidefsky 提交于
For s390 there is one additional byte associated with each page, the storage key. This byte contains the referenced and changed bits and needs to be included into the hibernation image. If the storage keys are not restored to their previous state all original pages would appear to be dirty. This can cause inconsistencies e.g. with read-only filesystems. Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 ShuoX Liu 提交于
Record S3 failure time about each reason and the latest two failed devices' names in S3 progress. We can check it through 'suspend_stats' entry in debugfs. The motivation of the patch: We are enabling power features on Medfield. Comparing with PC/notebook, a mobile enters/exits suspend-2-ram (we call it s3 on Medfield) far more frequently. If it can't enter suspend-2-ram in time, the power might be used up soon. We often find sometimes, a device suspend fails. Then, system retries s3 over and over again. As display is off, testers and developers don't know what happens. Some testers and developers complain they don't know if system tries suspend-2-ram, and what device fails to suspend. They need such info for a quick check. The patch adds suspend_stats under debugfs for users to check suspend to RAM statistics quickly. If not using this patch, we have other methods to get info about what device fails. One is to turn on CONFIG_PM_DEBUG, but users would get too much info and testers need recompile the system. In addition, dynamic debug is another good tool to dump debug info. But it still doesn't match our utilization scenario closely. 1) user need write a user space parser to process the syslog output; 2) Our testing scenario is we leave the mobile for at least hours. Then, check its status. No serial console available during the testing. One is because console would be suspended, and the other is serial console connecting with spi or HSU devices would consume power. These devices are powered off at suspend-2-ram. Signed-off-by: NShuoX Liu <shuox.liu@intel.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 05 10月, 2011 2 次提交
-
-
由 Rafael J. Wysocki 提交于
To read the current PM QoS value for a given device we need to make sure that the device's power.constraints object won't be removed while we're doing that. For this reason, put the operation under dev->power.lock and acquire the lock around the initialization and removal of power.constraints. Moreover, since we're using the value of power.constraints to determine whether or not the object is present, the power.constraints_state field isn't necessary any more and may be removed. However, dev_pm_qos_add_request() needs to check if the device is being removed from the system before allocating a new PM QoS constraints object for it, so make it use the power.power_state field of struct device for this purpose. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> -
由 Jon Mason 提交于
Add the ability to disable PCI-E MPS turning and using the BIOS configured MPS defaults. Due to the number of issues recently discovered on some x86 chipsets, make this the default behavior. Also, add the option for peer to peer DMA MPS configuration. Peer to peer DMA is outside the scope of this patch, but MPS configuration could prevent it from working by having the MPS on one root port different than the MPS on another. To work around this, simply make the system wide MPS the smallest possible value (128B). Signed-off-by: NJon Mason <mason@myri.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 10月, 2011 2 次提交
-
-
由 MyungJoo Ham 提交于
Four cpufreq-like governors are provided as examples. powersave: use the lowest frequency possible. The user (device) should set the polling_ms as 0 because polling is useless for this governor. performance: use the highest freqeuncy possible. The user (device) should set the polling_ms as 0 because polling is useless for this governor. userspace: use the user specified frequency stored at devfreq.user_set_freq. With sysfs support in the following patch, a user may set the value with the sysfs interface. simple_ondemand: simplified version of cpufreq's ondemand governor. When a user updates OPP entries (enable/disable/add), OPP framework automatically notifies devfreq to update operating frequency accordingly. Thus, devfreq users (device drivers) do not need to update devfreq manually with OPP entry updates or set polling_ms for powersave , performance, userspace, or any other "static" governors. Note that these are given only as basic examples for governors and any devices with devfreq may implement their own governors with the drivers and use them. Signed-off-by: NMyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Reviewed-by: NMike Turquette <mturquette@ti.com> Acked-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 MyungJoo Ham 提交于
With OPPs, a device may have multiple operable frequency and voltage sets. However, there can be multiple possible operable sets and a system will need to choose one from them. In order to reduce the power consumption (by reducing frequency and voltage) without affecting the performance too much, a Dynamic Voltage and Frequency Scaling (DVFS) scheme may be used. This patch introduces the DVFS capability to non-CPU devices with OPPs. DVFS is a techique whereby the frequency and supplied voltage of a device is adjusted on-the-fly. DVFS usually sets the frequency as low as possible with given conditions (such as QoS assurance) and adjusts voltage according to the chosen frequency in order to reduce power consumption and heat dissipation. The generic DVFS for devices, devfreq, may appear quite similar with /drivers/cpufreq. However, cpufreq does not allow to have multiple devices registered and is not suitable to have multiple heterogenous devices with different (but simple) governors. Normally, DVFS mechanism controls frequency based on the demand for the device, and then, chooses voltage based on the chosen frequency. devfreq also controls the frequency based on the governor's frequency recommendation and let OPP pick up the pair of frequency and voltage based on the recommended frequency. Then, the chosen OPP is passed to device driver's "target" callback. When PM QoS is going to be used with the devfreq device, the device driver should enable OPPs that are appropriate with the current PM QoS requests. In order to do so, the device driver may call opp_enable and opp_disable at the notifier callback of PM QoS so that PM QoS's update_target() call enables the appropriate OPPs. Note that at least one of OPPs should be enabled at any time; be careful when there is a transition. Signed-off-by: NMyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Reviewed-by: NMike Turquette <mturquette@ti.com> Acked-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 01 10月, 2011 1 次提交
-
-
由 MyungJoo Ham 提交于
The patch enables to register notifier_block for an OPP-device in order to get notified for any changes in the availability of OPPs of the device. For example, if a new OPP is inserted or enable/disable status of an OPP is changed, the notifier is executed. This enables the usage of opp_add, opp_enable, and opp_disable to directly take effect with any connected entities such as cpufreq or devfreq. Signed-off-by: NMyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Reviewed-by: NMike Turquette <mturquette@ti.com> Reviewed-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 30 9月, 2011 1 次提交
-
-
由 Peter Zijlstra 提交于
David reported: Attached below is a watered-down version of rt/tst-cpuclock2.c from GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or similar. Run it several times, and you will see cases where the main thread will measure a process clock difference before and after the nanosleep which is smaller than the cpu-burner thread's individual thread clock difference. This doesn't make any sense since the cpu-burner thread is part of the top-level process's thread group. I've reproduced this on both x86-64 and sparc64 (using both 32-bit and 64-bit binaries). For example: [davem@boricha build-x86_64-linux]$ ./test process: before(0.001221967) after(0.498624371) diff(497402404) thread: before(0.000081692) after(0.498316431) diff(498234739) self: before(0.001223521) after(0.001240219) diff(16698) [davem@boricha build-x86_64-linux]$ The diff of 'process' should always be >= the diff of 'thread'. I make sure to wrap the 'thread' clock measurements the most tightly around the nanosleep() call, and that the 'process' clock measurements are the outer-most ones. --- #include <unistd.h> #include <stdio.h> #include <stdlib.h> #include <time.h> #include <fcntl.h> #include <string.h> #include <errno.h> #include <pthread.h> static pthread_barrier_t barrier; static void *chew_cpu(void *arg) { pthread_barrier_wait(&barrier); while (1) __asm__ __volatile__("" : : : "memory"); return NULL; } int main(void) { clockid_t process_clock, my_thread_clock, th_clock; struct timespec process_before, process_after; struct timespec me_before, me_after; struct timespec th_before, th_after; struct timespec sleeptime; unsigned long diff; pthread_t th; int err; err = clock_getcpuclockid(0, &process_clock); if (err) return 1; err = pthread_getcpuclockid(pthread_self(), &my_thread_clock); if (err) return 1; pthread_barrier_init(&barrier, NULL, 2); err = pthread_create(&th, NULL, chew_cpu, NULL); if (err) return 1; err = pthread_getcpuclockid(th, &th_clock); if (err) return 1; pthread_barrier_wait(&barrier); err = clock_gettime(process_clock, &process_before); if (err) return 1; err = clock_gettime(my_thread_clock, &me_before); if (err) return 1; err = clock_gettime(th_clock, &th_before); if (err) return 1; sleeptime.tv_sec = 0; sleeptime.tv_nsec = 500000000; nanosleep(&sleeptime, NULL); err = clock_gettime(th_clock, &th_after); if (err) return 1; err = clock_gettime(my_thread_clock, &me_after); if (err) return 1; err = clock_gettime(process_clock, &process_after); if (err) return 1; diff = process_after.tv_nsec - process_before.tv_nsec; printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", process_before.tv_sec, process_before.tv_nsec, process_after.tv_sec, process_after.tv_nsec, diff); diff = th_after.tv_nsec - th_before.tv_nsec; printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", th_before.tv_sec, th_before.tv_nsec, th_after.tv_sec, th_after.tv_nsec, diff); diff = me_after.tv_nsec - me_before.tv_nsec; printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n", me_before.tv_sec, me_before.tv_nsec, me_after.tv_sec, me_after.tv_nsec, diff); return 0; } This is due to us using p->se.sum_exec_runtime in thread_group_cputime() where we iterate the thread group and sum all data. This does not take time since the last schedule operation (tick or otherwise) into account. We can cure this by using task_sched_runtime() at the cost of having to take locks. This also means we can (and must) do away with thread_group_sched_runtime() since the modified thread_group_cputime() is now more accurate and would deadlock when called from thread_group_sched_runtime(). Aside of that it makes the function safe on 32 bit systems. The old code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a 64bit value and could be changed on another cpu at the same time. Reported-by: NDavid Miller <davem@davemloft.net> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@kernel.org Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twinsTested-by: NDavid Miller <davem@davemloft.net> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 29 9月, 2011 1 次提交
-
-
由 Richard Cochran 提交于
The IEEE 1588 standard defines two kinds of messages, event and general messages. Event messages require time stamping, and general do not. When using UDP transport, two separate ports are used for the two message types. The BPF designed to recognize event messages incorrectly classifies L2 general messages as event messages. This commit fixes the issue by extending the filter to check the message type field for L2 PTP packets. Event messages are be distinguished from general messages by testing the "general" bit. Signed-off-by: NRichard Cochran <richard.cochran@omicron.at> Cc: <stable@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 9月, 2011 3 次提交
-
-
由 Linus Torvalds 提交于
That flag no longer makes sense, since we don't look up automount points as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT handling was buggy to begin with: it would avoid automounting even for cases where we really *needed* to do the automount handling, and could return ENOENT for autofs entries that hadn't been instantiated yet. With our new non-eager automount semantics, one discussion has been about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the newfstatat() and fstatat64() system calls), but it's probably not worth it: you can always force at least directory automounting by simply adding the final '/' to the filename, which works for *all* of the stat family system calls, old and new. So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a result of our bad default behavior. Acked-by: NIan Kent <raven@themaw.net> Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Since we've now turned around and made LOOKUP_FOLLOW *not* force an automount, we want to add the ability to force an automount event on lookup even if we don't happen to have one of the other flags that force it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..) Most cases will never want to use this, since you'd normally want to delay automounting as long as possible, which usually implies LOOKUP_OPEN (when we open a file or directory, we really cannot avoid the automount any more). But Trond argued sufficiently forcefully that at a minimum bind mounting a file and quotactl will want to force the automount lookup. Some other cases (like nfs_follow_remote_path()) could use it too, although LOOKUP_DIRECTORY would work there as well. This commit just adds the flag and logic, no users yet, though. It also doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and was made irrelevant by the same change that made us not follow on LOOKUP_FOLLOW. Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Ian Kent <raven@themaw.net> Cc: Jeff Layton <jlayton@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Greg KH <gregkh@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Rafael J. Wysocki 提交于
The struct pm_domain_data data type is defined in such a way that adding new fields specific to the generic PM domains code will require include/linux/pm.h to be modified. As a result, data types used only by the generic PM domains code will be defined in two headers, although they all should be defined in pm_domain.h and pm.h will need to include more headers, which won't be very nice. For this reason change the definition of struct pm_subsys_data so that its domain_data member is a pointer, which will allow struct pm_domain_data to be subclassed by various PM domains implementations. Remove the need_restore member from struct pm_domain_data and make the generic PM domains code subclass it by adding the need_restore member to the new data type. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 26 9月, 2011 1 次提交
-
-
由 Milan Broz 提交于
If optional discard support in dm-crypt is enabled, discards requests bypass the crypt queue and blocks of the underlying device are discarded. For the read path, discarded blocks are handled the same as normal ciphertext blocks, thus decrypted. So if the underlying device announces discarded regions return zeroes, dm-crypt must disable this flag because after decryption there is just random noise instead of zeroes. Signed-off-by: NMilan Broz <mbroz@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
- 20 9月, 2011 2 次提交
-
-
由 Christian Borntraeger 提交于
598841ca ([S390] use gmap address spaces for kvm guest images) changed kvm on s390 to use a separate address space for kvm guests. We can now put KVM guests anywhere in the user address mode with a size up to 8PB - as long as the memory is 1MB-aligned. This change was done without KVM extension capability bit. The change was added after 3.0, but we still have a chance to add a feature bit before 3.1 (keeping the releases in a sane state). We use number 71 to avoid collisions with other pending kvm patches as requested by Alexander Graf. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Acked-by: NAvi Kivity <avi@redhat.com> Cc: Alexander Graf <agraf@suse.de> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Rob Herring 提交于
irq_domain_simple_ops is exported, but is not declared in irqdomain.h, so add it. Signed-off-by: NRob Herring <rob.herring@calxeda.com> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: marc.zyngier@arm.com Cc: thomas.abraham@linaro.org Cc: jamie@jamieiles.com Cc: b-cousson@ti.com Cc: shawn.guo@linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: devicetree-discuss@lists.ozlabs.org Link: http://lkml.kernel.org/r/1316017900-19918-2-git-send-email-robherring2@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 16 9月, 2011 2 次提交
-
-
由 Michael S. Tsirkin 提交于
dev_forward_skb loops an skb back into host networking stack which might hang on the memory indefinitely. In particular, this can happen in macvtap in bridged mode. Copy the userspace fragments to avoid blocking the sender in that case. As this patch makes skb_copy_ubufs extern now, I also added some documentation and made it clear the SKBTX_DEV_ZEROCOPY flag automatically instead of doing it in all callers. This can be made into a separate patch if people feel it's worth it. Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
"Possible SYN flooding on port xxxx " messages can fill logs on servers. Change logic to log the message only once per listener, and add two new SNMP counters to track : TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client TCPReqQFullDrop : number of times a SYN request was dropped because syncookies were not enabled. Based on a prior patch from Tom Herbert, and suggestions from David. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 9月, 2011 2 次提交
-
-
由 Russell King 提交于
Building a kernel with hotplug disabled results in a link failure: `bgpio_remove' referenced in section `___ksymtab_gpl+bgpio_remove' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o This is because of bgpio_remove() is exported. It is illegal to export symbols which are discarded either at link time or as part of an init/exit section. Fix this by dropping the __devexit attributation from bgpio_remove(). Also drop the __devinit attributation from bgpio_init(). Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk> Cc: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
Revert the post-3.0 commit 82f9d486 ("memcg: add memory.vmscan_stat"). The implementation of per-memcg reclaim statistics violates how memcg hierarchies usually behave: hierarchically. The reclaim statistics are accounted to child memcgs and the parent hitting the limit, but not to hierarchy levels in between. Usually, hierarchical statistics are perfectly recursive, with each level representing the sum of itself and all its children. Since this exports statistics to userspace, this may lead to confusion and problems with changing things after the release, so revert it now, we can try again later. Signed-off-by: NJohannes Weiner <jweiner@redhat.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Michal Hocko <mhocko@suse.cz> Cc: Ying Han <yinghan@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 9月, 2011 1 次提交
-
-
由 Randy Dunlap 提交于
Fix kernel-doc warning about internal/private data by marking it as "private:" so that kernel-doc will ignore it. Warning(include/linux/regulator/consumer.h:128): No description found for parameter 'ret' Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Acked-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 9月, 2011 1 次提交
-
-
由 Mark Brown 提交于
This needs to be an out of band value for the register and on this device registers are 16 bit so we must shift left one to the 17th bit. Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Cc: stable@kernel.org Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
- 29 8月, 2011 1 次提交
-
-
由 Stephane Eranian 提交于
The current cgroup context switch code was incorrect leading to bogus counts. Furthermore, as soon as there was an active cgroup event on a CPU, the context switch cost on that CPU would increase by a significant amount as demonstrated by a simple ping/pong example: $ ./pong Both processes pinned to CPU1, running for 10s 10684.51 ctxsw/s Now start a cgroup perf stat: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 $ ./pong Both processes pinned to CPU1, running for 10s 6674.61 ctxsw/s That's a 37% penalty. Note that pong is not even in the monitored cgroup. The results shown by perf stat are bogus: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100 Performance counter stats for 'sleep 100': CPU1 <not counted> cycles test CPU1 16,984,189,138 cycles # 0.000 GHz The second 'cycles' event should report a count @ CPU clock (here 2.4GHz) as it is counting across all cgroups. The patch below fixes the bogus accounting and bypasses any cgroup switches in case the outgoing and incoming tasks are in the same cgroup. With this patch the same test now yields: $ ./pong Both processes pinned to CPU1, running for 10s 10775.30 ctxsw/s Start perf stat with cgroup: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Run pong outside the cgroup: $ /pong Both processes pinned to CPU1, running for 10s 10687.80 ctxsw/s The penalty is now less than 2%. And the results for perf stat are correct: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 <not counted> cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz Now perf stat reports the correct counts for for the non cgroup event. If we run pong inside the cgroup, then we also get the correct counts: $ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10 Performance counter stats for 'sleep 10': CPU1 22,297,726,205 cycles test # 0.000 GHz CPU1 23,933,981,448 cycles # 0.000 GHz 10.001457237 seconds time elapsed Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110825135803.GA4697@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
- 27 8月, 2011 1 次提交
-
-
由 NeilBrown 提交于
The nfsservctl system call is now gone, so we should remove all linkage for it. Signed-off-by: NNeilBrown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 8月, 2011 5 次提交
-
-
由 Dilan Lee 提交于
We need a callback to do some things after pwm_enable, pwm_disable and pwm_config. Signed-off-by: NDilan Lee <dilee@nvidia.com> Reviewed-by: NRobert Morell <rmorell@nvidia.com> Reviewed-by: NArun Murthy <arun.murthy@stericsson.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
Replace/remove use of RIO v.1.2 registers/bits that are not forward-compatible with newer versions of RapidIO specification. RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR, Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR. Use of removed (since RIO v.1.3) register bits affects users of currently available 1.3 and 2.x compliant devices who may use not so recent kernel versions. Removing checks for unsupported bits makes corresponding routines compatible with all versions of RapidIO specification. Therefore, backporting makes stable kernel versions compliant with RIO v.1.3 and later as well. Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Chul Kim <chul.kim@idt.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Evgeniy Polyakov 提交于
Signed-off-by: NEvgeniy Polyakov <zbr@ioremap.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Josh Boyer 提交于
Purely in-memory filesystems do not use the inode hash as the dcache tells us if an entry already exists. As a result, they do not call unlock_new_inode, and thus directory inodes do not get put into a different lockdep class for i_sem. We need the different lockdep classes, because the locking order for i_mutex is different for directory inodes and regular inodes. Directory inodes can do "readdir()", which takes i_mutex *before* possibly taking mm->mmap_sem (due to a page fault while copying the directory entry to user space). In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem before accessing i_mutex. The two cases can never happen for the same inode, so no real deadlock can occur, but without the different lockdep classes, lockdep cannot understand that. As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this can lead to false positives from lockdep like below: find/645 is trying to acquire lock: (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac but task is already holding lock: (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>] vfs_readdir+0x5b/0xb4 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}: [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361 [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45 [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110 [<ffffffff81111557>] mmap_region+0x258/0x432 [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306 [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a [<ffffffff8100c858>] sys_mmap+0x22/0x24 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b -> #0 (&mm->mmap_sem){++++++}: [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7 [<ffffffff8108ac26>] lock_acquire+0xbf/0x103 [<ffffffff81109541>] might_fault+0x89/0xac [<ffffffff81149cff>] filldir+0x6f/0xc7 [<ffffffff811586ea>] dcache_readdir+0x67/0x205 [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4 [<ffffffff8114a073>] sys_getdents+0x7e/0xd1 [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b This patch moves the directory vs file lockdep annotation into a helper function that can be called by in-memory filesystems and has hugetlbfs call it. Signed-off-by: NJosh Boyer <jwboyer@redhat.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> -
由 Andi Kleen 提交于
I ran into a couple of programs which broke with the new Linux 3.0 version. Some of those were binary only. I tried to use LD_PRELOAD to work around it, but it was quite difficult and in one case impossible because of a mix of 32bit and 64bit executables. For example, all kind of management software from HP doesnt work, unless we pretend to run a 2.6 kernel. $ uname -a Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux $ hpacucli ctrl all show Error: No controllers detected. $ rpm -qf /usr/sbin/hpacucli hpacucli-8.75-12.0 Another notable case is that Python now reports "linux3" from sys.platform(); which in turn can break things that were checking sys.platform() == "linux2": https://bugzilla.mozilla.org/show_bug.cgi?id=664564 It seems pretty clear to me though it's a bug in the apps that are using '==' instead of .startswith(), but this allows us to unbreak broken programs. This patch adds a UNAME26 personality that makes the kernel report a 2.6.40+x version number instead. The x is the x in 3.x. I know this is somewhat ugly, but I didn't find a better workaround, and compatibility to existing programs is important. Some programs also read /proc/sys/kernel/osrelease. This can be worked around in user space with mount --bind (and a mount namespace) To use: wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c gcc -o uname26 uname26.c ./uname26 program Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 8月, 2011 9 次提交
-
-
由 Rafael J. Wysocki 提交于
The generic PM domains framework currently doesn't work with devices whose power.irq_safe flag is set, because runtime PM callbacks for such devices are run with interrupts disabled and the callbacks provided by the generic PM domains framework use domain mutexes and may sleep. However, such devices very well may belong to power domains on some systems, so the generic PM domains framework should take them into account. For this reason, modify the generic PM domains framework so that the domain .power_off() and .power_on() callbacks are never executed for a domain containing devices with power.irq_safe set, although the .stop_device() and .start_device() callbacks are still run for them. Additionally, introduce a flag allowing the creator of a struct generic_pm_domain object to indicate that its .stop_device() and .start_device() callbacks may be run in interrupt context (might_sleep_if() triggers if that flag is not set and one of those callbacks is run in interrupt context). Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> -
由 Jean Pihet 提交于
Add a global notification chain that gets called upon changes to the aggregated constraint value for any device. The notification callbacks are passing the full constraint request data in order for the callees to have access to it. The current use is for the platform low-level code to access the target device of the constraint. Signed-off-by: NJean Pihet <j-pihet@ti.com> Reviewed-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Jean Pihet 提交于
Implement the per-device PM QoS constraints by creating a device PM QoS API, which calls the PM QoS constraints management core code. The per-device latency constraints data strctures are stored in the device dev_pm_info struct. The device PM code calls the init and destroy of the per-device constraints data struct in order to support the dynamic insertion and removal of the devices in the system. To minimize the data usage by the per-device constraints, the data struct is only allocated at the first call to dev_pm_qos_add_request. The data is later free'd when the device is removed from the system. A global mutex protects the constraints users from the data being allocated and free'd. Signed-off-by: NJean Pihet <j-pihet@ti.com> Reviewed-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Jean Pihet 提交于
In preparation for the per-device constratins support: - rename update_target to pm_qos_update_target - generalize and export pm_qos_update_target for usage by the upcoming per-device latency constraints framework: * operate on struct pm_qos_constraints for constraints management, * introduce an 'action' parameter for constraints add/update/remove, * the return value indicates if the aggregated constraint value has changed, - update the internal code to operate on struct pm_qos_constraints - add a NULL pointer check in the API functions Signed-off-by: NJean Pihet <j-pihet@ti.com> Reviewed-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> -
由 Jean Pihet 提交于
In preparation for the per-device constratins support, re-organize the data strctures: - add a struct pm_qos_constraints which contains the constraints related data - update struct pm_qos_object contents to the PM QoS internal object data. Add a pointer to struct pm_qos_constraints - update the internal code to use the new data structs. Signed-off-by: NJean Pihet <j-pihet@ti.com> Reviewed-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Jean Pihet 提交于
- Misc fixes to improve code readability: * rename struct pm_qos_request_list to struct pm_qos_request, * rename pm_qos_req parameter to req in internal code, consistenly use req in the API parameters, * update the in-kernel API callers to the new parameters names, * rename of fields names (requests, list, node, constraints) Signed-off-by: NJean Pihet <j-pihet@ti.com> Acked-by: Nmarkgross <markgross@thegnar.org> Reviewed-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> -
由 Jean Pihet 提交于
The PM QoS implementation files are better named kernel/power/qos.c and include/linux/pm_qos.h. The PM QoS support is compiled under the CONFIG_PM option. Signed-off-by: NJean Pihet <j-pihet@ti.com> Acked-by: Nmarkgross <markgross@thegnar.org> Reviewed-by: NKevin Hilman <khilman@ti.com> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
由 Rafael J. Wysocki 提交于
Since the PM clock management code in drivers/base/power/clock_ops.c is used for both runtime PM and system suspend/hibernation, the definitions of data structures and headers related to it should not be located in include/linux/pm_rumtime.h. Move them to a separate header file. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> -
由 Rafael J. Wysocki 提交于
Currently pm_genpd_runtime_resume() has to walk the list of devices from the device's PM domain to find the corresponding device list object containing the need_restore field to check if the driver's .runtime_resume() callback should be executed for the device. This is suboptimal and can be simplified by using power.sybsys_data to store device information used by the generic PM domains code. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-