- 10 8月, 2010 14 次提交
-
-
由 Andi Kleen 提交于
Avoid quite a lot of warnings in header files in a gcc 4.6 -Wall builds Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
Define stubs for the numa_*_id() generic percpu related functions for non-NUMA configurations in <asm-generic/topology.h> where the other non-numa stubs live. Fixes ia64 !NUMA build breakage -- e.g., tiger_defconfig Back out now unneeded '#ifndef CONFIG_NUMA' guards from ia64 smpboot.c Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Tested-by: NTony Luck <tony.luck@intel.com> Acked-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Nevenchannyy 提交于
get_zone_counts() was dropped from kernel tree, see: http://www.mail-archive.com/mm-commits@vger.kernel.org/msg07313.html but its prototype remains. Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
We have been used naming try_set_zone_oom and clear_zonelist_oom. The role of functions is to lock of zonelist for preventing parallel OOM. So clear_zonelist_oom makes sense but try_set_zone_oome is rather awkward and unmatched with clear_zonelist_oom. Let's change it with try_set_zonelist_oom. Signed-off-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NDavid Rientjes <rientjes@google.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
The three oom killer sysctl variables (sysctl_oom_dump_tasks, sysctl_oom_kill_allocating_task, and sysctl_panic_on_oom) are better declared in include/linux/oom.h rather than kernel/sysctl.c. Signed-off-by: NDavid Rientjes <rientjes@google.com> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
There are various points in the oom killer where the kernel must determine whether to panic or not. It's better to extract this to a helper function to remove all the confusion as to its semantics. Also fix a call to dump_header() where tasklist_lock is not read- locked, as required. There's no functional change with this patch. Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
The oom killer presently kills current whenever there is no more memory free or reclaimable on its mempolicy's nodes. There is no guarantee that current is a memory-hogging task or that killing it will free any substantial amount of memory, however. In such situations, it is better to scan the tasklist for nodes that are allowed to allocate on current's set of nodes and kill the task with the highest badness() score. This ensures that the most memory-hogging task, or the one configured by the user with /proc/pid/oom_adj, is always selected in such scenarios. Signed-off-by: NDavid Rientjes <rientjes@google.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Richard Kennedy 提交于
The comment suggests that when b_count equals zero it is calling __wait_no_buffer to trigger some debug, but as there is no debug in __wait_on_buffer the whole thing is redundant. AFAICT from the git log this has been the case for at least 5 years, so it seems safe just to remove this. Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
KSM reference counts can cause an anon_vma to exist after the processe it belongs to have already exited. Because the anon_vma lock now lives in the root anon_vma, we need to ensure that the root anon_vma stays around until after all the "child" anon_vmas have been freed. The obvious way to do this is to have a "child" anon_vma take a reference to the root in anon_vma_fork. When the anon_vma is freed at munmap or process exit, we drop the refcount in anon_vma_unlink and possibly free the root anon_vma. The KSM anon_vma reference count function also needs to be modified to deal with the possibility of freeing 2 levels of anon_vma. The easiest way to do this is to break out the KSM magic and make it generic. When compiling without CONFIG_KSM, this code is compiled out. Signed-off-by: NRik van Riel <riel@redhat.com> Tested-by: NLarry Woodman <lwoodman@redhat.com> Acked-by: NLarry Woodman <lwoodman@redhat.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Tested-by: NDave Young <hidave.darkstar@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
Always (and only) lock the root (oldest) anon_vma whenever we do something in an anon_vma. The recently introduced anon_vma scalability is due to the rmap code scanning only the VMAs that need to be scanned. Many common operations still took the anon_vma lock on the root anon_vma, so always taking that lock is not expected to introduce any scalability issues. However, always taking the same lock does mean we only need to take one lock, which means rmap_walk on pages from any anon_vma in the vma is excluded from occurring during an munmap, expand_stack or other operation that needs to exclude rmap_walk and similar functions. Also add the proper locking to vma_adjust. Signed-off-by: NRik van Riel <riel@redhat.com> Tested-by: NLarry Woodman <lwoodman@redhat.com> Acked-by: NLarry Woodman <lwoodman@redhat.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
Track the root (oldest) anon_vma in each anon_vma tree. Because we only take the lock on the root anon_vma, we cannot use the lock on higher-up anon_vmas to lock anything. This makes it impossible to do an indirect lookup of the root anon_vma, since the data structures could go away from under us. However, a direct pointer is safe because the root anon_vma is always the last one that gets freed on munmap or exit, by virtue of the same_vma list order and unlink_anon_vmas walking the list forward. [akpm@linux-foundation.org: fix typo] Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: NLarry Woodman <lwoodman@redhat.com> Acked-by: NLarry Woodman <lwoodman@redhat.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
Subsitute a direct call of spin_lock(anon_vma->lock) with an inline function doing exactly the same. This makes it easier to do the substitution to the root anon_vma lock in a following patch. We will deal with the handful of special locks (nested, dec_and_lock, etc) separately. Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: NLarry Woodman <lwoodman@redhat.com> Acked-by: NLarry Woodman <lwoodman@redhat.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rik van Riel 提交于
Rename anon_vma_lock to vma_lock_anon_vma. This matches the naming style used in page_lock_anon_vma and will come in really handy further down in this patch series. Signed-off-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: NLarry Woodman <lwoodman@redhat.com> Acked-by: NLarry Woodman <lwoodman@redhat.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cesar Eduardo Barros 提交于
kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse" list[1] ("Follow common convention and you'll get it wrong"), except in some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3]. kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes takes a pointer to within the page itself. This seems to once in a while trip people up (the convention they are following is the one from kunmap()). Make it much harder to misuse, by moving it to level 9 on Rusty's list[4] ("The compiler/linker won't let you get it wrong"). This is done by refusing to build if the type of its first argument is a pointer to a struct page. The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck() (which is what you would call in case for some strange reason calling it with a pointer to a struct page is not incorrect in your code). The previous version of this patch was compile tested on x86-64. [1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html [2] In these cases, it is at level 5, "Do it right or it will always break at runtime." [3] At least mips and powerpc look very similar, and sparc also seems to share a common ancestor with both; there seems to be quite some degree of copy-and-paste coding here. The include/asm/highmem.h file for these three archs mention x86 CPUs at its top. [4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html [5] As an aside, could someone tell me why mn10300 uses unsigned long as the first parameter of kunmap_atomic() instead of void *? Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net> Cc: Russell King <linux@arm.linux.org.uk> (arch/arm) Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips) Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300) Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300) Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc) Cc: Helge Deller <deller@gmx.de> (arch/parisc) Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc) Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc) Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc) Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc) Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86) Cc: Ingo Molnar <mingo@redhat.com> (arch/x86) Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86) Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic) Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list) Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 8月, 2010 3 次提交
-
-
由 Jason Wessel 提交于
A regression of building without CONFIG_HW_CONSOLE was introduced with commit b45cfba4 (vt,console,kdb: implement atomic console enter/leave functions). ERROR: "con_debug_enter" [drivers/serial/kgdboc.ko] undefined! ERROR: "vc_cons" [drivers/serial/kgdboc.ko] undefined! ERROR: "fg_console" [drivers/serial/kgdboc.ko] undefined! ERROR: "con_debug_leave" [drivers/serial/kgdboc.ko] undefined! When there is no HW console the con_debug_enter and con_debug_leave functions should have no code. Signed-off-by: NJason Wessel <jason.wessel@windriver.com> CC: Jesse Barnes <jbarnes@virtuousgeek.org> Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
-
由 Bryan Schumaker 提交于
Add a flag so we know if we mounted the NFS server using the legacy binary interface. If we used the legacy interface, then we should not show the mountd options. Signed-off-by: NBryan Schumaker <bjschuma@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 David Howells 提交于
Make /dev/console get initialised before any initialisation routine that invokes modprobe because if modprobe fails, it's going to want to open /dev/console, presumably to write an error message to. The problem with that is that if the /dev/console driver is not yet initialised, the chardev handler will call request_module() to invoke modprobe, which will fail, because we never compile /dev/console as a module. This will lead to a modprobe loop, showing the following in the kernel log: request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 request_module: runaway loop modprobe char-major-5-1 This can happen, for example, when the built in md5 module can't find the built in cryptomgr module (because the latter fails to initialise). The md5 module comes before the call to tty_init(), presumably because 'crypto' comes before 'drivers' alphabetically. Fix this by calling tty_init() from chrdev_init(). Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 8月, 2010 7 次提交
-
-
由 Johannes Berg 提交于
The new_name argument to device_rename() can be const as kobject_rename's new_name argument is. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Guenter Roeck 提交于
Signed-off-by: NGuenter Roeck <guenter.roeck@ericsson.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Magnus Damm 提交于
Add BUS_NOTIFY_BIND_DRIVER as a bus notifier event. For driver binding/unbinding we with this in place have the following bus notifier events: - BUS_NOTIFY_BIND_DRIVER - before ->probe() - BUS_NOTIFY_BOUND_DRIVER - after ->probe() - BUS_NOTIFY_UNBIND_DRIVER - before ->remove() - BUS_NOTIFY_UNBOUND_DRIVER - after ->remove() The event BUS_NOTIFY_BIND_DRIVER allows bus code to be notified that ->probe() is about to be called. Useful for bus code that needs to setup hardware before the driver gets to run. With this in place platform drivers can be loaded and unloaded as modules and the new BIND event allows bus code to control for instance device clocks that must be enabled before the driver can be executed. Without this patch there is no way for the bus code to get notified that a modular driver is about to be probed. Signed-off-by: NMagnus Damm <damm@opensource.se> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Jean Delvare 提交于
sysfs_chmod_file doesn't change the attribute it operates on, so this attribute can be marked const. Signed-off-by: NJean Delvare <khali@linux-fr.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Uwe Kleine-König 提交于
This makes the two similar functions platform_device_register_simple and platform_device_register_data one line inline functions using a new generic function platform_device_register_resndata. Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Jean Delvare 提交于
There is little rationale for marking bus_for_each_drv() __must_check. It is more of an iteration helper than a real function. You don't know in advance which callback it will be used on, so you have no clue how important it can be to check the returned value. In practice, this helper function can be used for best-effort tasks. As a matter of fact, bus_for_each_dev() is not marked __must_check. So remove it from bus_for_each_drv() as well. This is the same that was done back in October 2006 by Russell King for device_for_each_child(), for exactly the same reasons. Signed-off-by: NJean Delvare <khali@linux-fr.org> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
由 Wang Lei 提交于
Separate out the DNS resolver key type from the CIFS filesystem into its own module so that it can be made available for general use, including the AFS filesystem module. This facility makes it possible for the kernel to upcall to userspace to have it issue DNS requests, package up the replies and present them to the kernel in a useful form. The kernel is then able to cache the DNS replies as keys can be retained in keyrings. Resolver keys are of type "dns_resolver" and have a case-insensitive description that is of the form "[<type>:]<domain_name>". The optional <type> indicates the particular DNS lookup and packaging that's required. The <domain_name> is the query to be made. If <type> isn't given, a basic hostname to IP address lookup is made, and the result is stored in the key in the form of a printable string consisting of a comma-separated list of IPv4 and IPv6 addresses. This key type is supported by userspace helpers driven from /sbin/request-key and configured through /etc/request-key.conf. The cifs.upcall utility is invoked for UNC path server name to IP address resolution. The CIFS functionality is encapsulated by the dns_resolve_unc_to_ip() function, which is used to resolve a UNC path to an IP address for CIFS filesystem. This part remains in the CIFS module for now. See the added Documentation/networking/dns_resolver.txt for more information. Signed-off-by: NWang Lei <wang840925@gmail.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
- 05 8月, 2010 8 次提交
-
-
由 Jesse Barnes 提交于
Add fb ops to handle enter/exit of the kernel debugger. If present, the fb core will register them with KGDB and they'll be called when the debugger is entered and exited. The new functions are responsible for switching to an appropriate debug framebuffer and restoring the interrupted state at exit time. Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
The kernel console interface stores the number of lines it is configured to use. The kdb debugger can greatly benefit by knowing how many lines there are on the console for the pager functionality without having the end user compile in the setting or have to repeatedly change it at run time. Signed-off-by: NJason Wessel <jason.wessel@windriver.com> Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org> CC: David Airlie <airlied@linux.ie> CC: Andrew Morton <akpm@linux-foundation.org>
-
由 Jesse Barnes 提交于
These functions allow the kernel debugger to save and restore the state of the system console. Signed-off-by: NJesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NJason Wessel <jason.wessel@windriver.com> CC: David Airlie <airlied@linux.ie> CC: Andrew Morton <akpm@linux-foundation.org>
-
由 Jason Wessel 提交于
The gdbserial 'p' and 'P' packets allow gdb to individually get and set registers instead of querying for all the available registers. Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
The kdb shell specification includes the ability to get and set architecture specific registers by name. For the time being individual register get and set will be implemented on a per architecture basis. If an architecture defines DBG_MAX_REG_NUM > 0 then kdb and the gdbstub will use the capability for individually getting and setting architecture specific registers. Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Lars-Peter Clausen 提交于
Add support for the battery voltage measurement part of the JZ4740 ADC unit. Signed-off-by: NLars-Peter Clausen <lars@metafoo.de> Acked-by: NAnton Vorontsov <cbouatmailru@gmail.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/1416/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Chris Wilson 提交于
This is required should we ever attempt to use an io-mapping where KM_USER0 is verboten, such as inside an IRQ context. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Eric Anholt <eric@anholt.net> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Ilya Yanok 提交于
This patch adds the quirk for PCIE controller found on Freescale MPC8308. The quirk is the same as for other MPC83xx processors. Signed-off-by: NIlya Yanok <yanok@emcraft.com> Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
- 04 8月, 2010 8 次提交
-
-
由 Trond Myklebust 提交于
This will allow us to save the original generic cred in rpc_message, so that if we migrate from one server to another, we can generate a new bound cred without having to punt back to the NFS layer. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Now that rpc_run_task() is the sole entry point for RPC calls, we can move the remaining rpc_client-related initialisation of struct rpc_task from sched.c into clnt.c. Also move rpc_killall_tasks() into the same file, since that too is relative to the rpc_clnt. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Make rpc_exit() non-inline, and ensure that it always wakes up a task that has been queued. Kill off the now unused rpc_wake_up_task(). Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
This patch allows the user to configure the credential cache hashtable size using a new module parameter: auth_hashtable_size When set, this parameter will be rounded up to the nearest power of two, with a maximum allowed value of 1024 elements. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Cleanup in preparation for allowing the user to determine the maximum hash table size. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Patrick Pannuto 提交于
usleep_range is a finer precision implementations of msleep and is designed to be a drop-in replacement for udelay where a precise sleep / busy-wait is unnecessary. Since an easy interface to hrtimers could lead to an undesired proliferation of interrupts, we provide only a "range" API, forcing the caller to think about an acceptable tolerance on both ends and hopefully avoiding introducing another interrupt. INTRO As discussed here ( http://lkml.org/lkml/2007/8/3/250 ), msleep(1) is not precise enough for many drivers (yes, sleep precision is an unfair notion, but consistently sleeping for ~an order of magnitude greater than requested is worth fixing). This patch adds a usleep API so that udelay does not have to be used. Obviously not every udelay can be replaced (those in atomic contexts or being used for simple bitbanging come to mind), but there are many, many examples of mydriver_write(...) /* Wait for hardware to latch */ udelay(100) in various drivers where a busy-wait loop is neither beneficial nor necessary, but msleep simply does not provide enough precision and people are using a busy-wait loop instead. CONCERNS FROM THE RFC Why is udelay a problem / necessary? Most callers of udelay are in device/ driver initialization code, which is serial... As I see it, there is only benefit to sleeping over a delay; the notion of "refactoring" areas that use udelay was presented, but I see usleep as the refactoring. Consider i2c, if the bus is busy, you need to wait a bit (say 100us) before trying again, your current options are: * udelay(100) * msleep(1) <-- As noted above, actually as high as ~20ms on some platforms, so not really an option * Manually set up an hrtimer to try again in 100us (which is what usleep does anyway...) People choose the udelay route because it is EASY; we need to provide a better easy route. Device / driver / boot code is *currently* serial, but every few months someone makes noise about parallelizing boot, and IMHO, a little forward-thinking now is one less thing to worry about if/when that ever happens udelay's could be preempted Sure, but if udelay plans on looping 1000 times, and it gets preempted on loop 200, whenever it's scheduled again, it is going to do the next 800 loops. Is the interruptible case needed? Probably not, but I see usleep as a very logical parallel to msleep, so it made sense to include the "full" API. Processors are getting faster (albeit not as quickly as they are becoming more parallel), so if someone wanted to be interruptible for a few usecs, why not let them? If this is a contentious point, I'm happy to remove it. OTHER THOUGHTS I believe there is also value in exposing the usleep_range option; it gives the scheduler a lot more flexibility and allows the programmer to express his intent much more clearly; it's something I would hope future driver writers will take advantage of. To get the results in the NUMBERS section below, I literally s/udelay/usleep the kernel tree; I had to go in and undo the changes to the USB drivers, but everything else booted successfully; I find that extremely telling in and of itself -- many people are using a delay API where a sleep will suit them just fine. SOME ATTEMPTS AT NUMBERS It turns out that calculating quantifiable benefit on this is challenging, so instead I will simply present the current state of things, and I hope this to be sufficient: How many udelay calls are there in 2.6.35-rc5? udealy(ARG) >= | COUNT 1000 | 319 500 | 414 100 | 1146 20 | 1832 I am working on Android, so that is my focus for this. The following table is a modified usleep that simply printk's the amount of time requested to sleep; these tests were run on a kernel with udelay >= 20 --> usleep "boot" is power-on to lock screen "power collapse" is when the power button is pushed and the device suspends "resume" is when the power button is pushed and the lock screen is displayed (no touchscreen events or anything, just turning on the display) "use device" is from the unlock swipe to clicking around a bit; there is no sd card in this phone, so fail loading music, video, camera ACTION | TOTAL NUMBER OF USLEEP CALLS | NET TIME (us) boot | 22 | 1250 power-collapse | 9 | 1200 resume | 5 | 500 use device | 59 | 7700 The most interesting category to me is the "use device" field; 7700us of busy-wait time that could be put towards better responsiveness, or at the least less power usage. Signed-off-by: NPatrick Pannuto <ppannuto@codeaurora.org> Cc: apw@canonical.com Cc: corbet@lwn.net Cc: arjan@linux.intel.com Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-