- 02 7月, 2011 2 次提交
-
-
由 Christoph Lameter 提交于
We need to be able to use cmpxchg_double on the freelist and object count field in struct page. Rearrange the fields in struct page according to doubleword entities so that the freelist pointer comes before the counters. Do the rearranging with a future in mind where we use more doubleword atomics to avoid locking of updates to flags/mapping or lru pointers. Create another union to allow access to counters in struct page as a single unsigned long value. The doublewords must be properly aligned for cmpxchg_double to work. Sadly this increases the size of page struct by one word on some architectures. But as a resultpage structs are now cacheline aligned on x86_64. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NPekka Enberg <penberg@kernel.org>
-
由 Christoph Lameter 提交于
Do not use a page flag for the frozen bit. It needs to be part of the state that is handled with cmpxchg_double(). So use a bit in the counter struct in the page struct for that purpose. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NPekka Enberg <penberg@kernel.org>
-
- 28 6月, 2011 7 次提交
-
-
由 Jan Kara 提交于
Under heavy memory and filesystem load, users observe the assertion mapping->nrpages == 0 in end_writeback() trigger. This can be caused by page reclaim reclaiming the last page from a mapping in the following race: CPU0 CPU1 ... shrink_page_list() __remove_mapping() __delete_from_page_cache() radix_tree_delete() evict_inode() truncate_inode_pages() truncate_inode_pages_range() pagevec_lookup() - finds nothing end_writeback() mapping->nrpages != 0 -> BUG page->mapping = NULL mapping->nrpages-- Fix the problem by doing a reliable check of mapping->nrpages under mapping->tree_lock in end_writeback(). Analyzed by Jay <jinshan.xiong@whamcloud.com>, lost in LKML, and dug out by Miklos Szeredi <mszeredi@suse.de>. Cc: Jay <jinshan.xiong@whamcloud.com> Cc: Miklos Szeredi <mszeredi@suse.de> Signed-off-by: NJan Kara <jack@suse.cz> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chris Metcalf 提交于
This is required for tilegx to be able to use the compat unistd.h header where compat_sys_sendmmsg() is now mentioned. Signed-off-by: NChris Metcalf <cmetcalf@tilera.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp() is unsuited to tmpfs, because it inserts a page into pagecache before calling the filesystem's ->readpage: tmpfs may have pages in swapcache which only it knows how to locate and switch to filecache. At present tmpfs provides a ->readpage method, and copes with this by copying pages; but soon we can simplify it by removing its ->readpage. Provide shmem_read_mapping_page_gfp() now, ready for that transition, Export shmem_read_mapping_page_gfp() and add it to list in shmem_fs.h, with shmem_read_mapping_page() inline for the common mapping_gfp case. (shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the read_mapping_page functions use the mapping's ->readpage, and the read_cache_page functions use the supplied filler, so I think read_cache_page_gfp was slightly misnamed.) Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
2.6.35's new truncate convention gave tmpfs the opportunity to control its file truncation, no longer enforced from outside by vmtruncate(). We shall want to build upon that, to handle pagecache and swap together. Slightly redefine the ->truncate_range interface: let it now be called between the unmap_mapping_range()s, with the filesystem responsible for doing the truncate_inode_pages_range() from it - just as the filesystem is nowadays responsible for doing that from its ->setattr. Let's rename shmem_notify_change() to shmem_setattr(). Instead of calling the generic truncate_setsize(), bring that code in so we can call shmem_truncate_range() - which will later be updated to perform its own variant of truncate_inode_pages_range(). Remove the punch_hole unmap_mapping_range() from shmem_truncate_range(): now that the COW's unmap_mapping_range() comes after ->truncate_range, there is no need to call it a third time. Export shmem_truncate_range() and add it to the list in shmem_fs.h, so that i915_gem_object_truncate() can call it explicitly in future; get this patch in first, then update drm/i915 once this is available (until then, i915 will just be doing the truncate_inode_pages() twice). Though introduced five years ago, no other filesystem is implementing ->truncate_range, and its only other user is madvise(,,MADV_REMOVE): we expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly, whereupon ->truncate_range can be removed from inode_operations - shmem_truncate_range() will help i915 across that transition too. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Before adding any more global entry points into shmem.c, gather such prototypes into shmem_fs.h. Remove mm's own declarations from swap.h, but for now leave the ones in mm.h: because shmem_file_setup() and shmem_zero_setup() are called from various places, and we should not force other subsystems to update immediately. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vitaliy Ivanov 提交于
Fix 'make htmldocs' warnings: Warning(/include/linux/hrtimer.h:153): No description found for parameter 'clockid' Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef member 'of_match' description in 'device' Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef member 'sk_rmem_alloc' description in 'sock' Signed-off-by: NVitaliy Ivanov <vitalivanov@gmail.com> Acked-by: NGrant Likely <grant.likely@secretlab.ca> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
commit 21a3c964 uses node_start/end_pfn(nid) for detection start/end of nodes. But, it's not defined in linux/mmzone.h but defined in /arch/???/include/mmzone.h which is included only under CONFIG_NEED_MULTIPLE_NODES=y. Then, we see mm/page_cgroup.c: In function 'page_cgroup_init': mm/page_cgroup.c:308: error: implicit declaration of function 'node_start_pfn' mm/page_cgroup.c:309: error: implicit declaration of function 'node_end_pfn' So, fixiing page_cgroup.c is an idea... But node_start_pfn()/node_end_pfn() is a very generic macro and should be implemented in the same manner for all archs. (m32r has different implementation...) This patch removes definitions of node_start/end_pfn() in each archs and defines a unified one in linux/mmzone.h. It's not under CONFIG_NEED_MULTIPLE_NODES, now. A result of macro expansion is here (mm/page_cgroup.c) for !NUMA start_pfn = ((&contig_page_data)->node_start_pfn); end_pfn = ({ pg_data_t *__pgdat = (&contig_page_data); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;}); for NUMA (x86-64) start_pfn = ((node_data[nid])->node_start_pfn); end_pfn = ({ pg_data_t *__pgdat = (node_data[nid]); __pgdat->node_start_pfn + __pgdat->node_spanned_pages;}); Changelog: - fixed to avoid using "nid" twice in node_end_pfn() macro. Reported-and-acked-by: NRandy Dunlap <randy.dunlap@oracle.com> Reported-and-tested-by: NIngo Molnar <mingo@elte.hu> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 6月, 2011 2 次提交
-
-
由 Alan Stern 提交于
The PM core doesn't handle suspend failures correctly when it comes to asynchronously suspended devices. These devices are moved onto the dpm_suspended_list as soon as the corresponding async thread is started up, and they remain on the list even if they fail to suspend or the sleep transition is cancelled before they get suspended. As a result, when the PM core unwinds the transition, it tries to resume the devices even though they were never suspended. This patch (as1474) fixes the problem by adding a new "is_suspended" flag to dev_pm_info. Devices are resumed only if the flag is set. [rjw: * Moved the dev->power.is_suspended check into device_resume(), because we need to complete dev->power.completion and clear dev->power.is_prepared too for devices whose dev->power.is_suspended flags are unset. * Fixed __device_suspend() to avoid setting dev->power.is_suspended if async_error is different from zero.] Signed-off-by: NAlan Stern <stern@rowland.harvard.edu> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
-
由 Alan Stern 提交于
This patch (as1473) renames the "in_suspend" field in struct dev_pm_info to "is_prepared", in preparation for an upcoming change. The new name is more descriptive of what the field really means. Signed-off-by: NAlan Stern <stern@rowland.harvard.edu> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Cc: stable@kernel.org
-
- 21 6月, 2011 2 次提交
-
-
由 Linus Torvalds 提交于
Commit 13e12d14 ("vfs: reorganize 'struct inode' layout a bit") moved things around a bit changed i_state to be unsigned int instead of unsigned long. That was to help structure layout for the 64-bit case, and shrink 'struct inode' a bit (admittedly that only happened when spinlock debugging was on and i_flags didn't pack with i_lock). However, Meelis Roos reports that this results in unaligned exceptions on sprc, and it turns out that the bit-locking primitives that we use for the I_NEW bit want to use the bitops. Which want 'unsigned long', not 'unsigned int'. We really should fix the bit locking code to not have that kind of requirement, but that's a much bigger change. So for now, revert that field back to 'unsigned long' (but keep the other re-ordering changes from the commit that caused this). Andi points out that we have played games with this in 'struct page', so it's solvable with other hacks too, but since right now the struct inode size advantage only happens with some rare config options, it's not worth fighting. It _would_ be worth fixing the bitlocking code, though. Especially since there is no type safety in the bitlocking code (this never caused any warnings, and worked fine on x86-64, because the bitlocks take a 'void *' and x86-64 doesn't care that deeply about alignment). So it's currently a very easy problem to trigger by mistake and never notice. Reported-by: NMeelis Roos <mroos@linux.ee> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Benny Halevy 提交于
Otherwise we end up overflowing the rpc buffer size on the receive end. Signed-off-by: NBenny Halevy <benny@tonian.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 20 6月, 2011 2 次提交
-
-
由 Al Viro 提交于
inode_permission() calls devcgroup_inode_permission() and almost all such calls are _not_ for device nodes; let's at least keep the common path straight... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Namhyung Kim 提交于
Add REQ_SECURE flag to REQ_COMMON_MASK so that init_request_from_bio() can pass it to @req->cmd_flags. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Acked-by: NAdrian Hunter <adrian.hunter@intel.com> Cc: stable@kernel.org # 2.6.36 and newer Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 19 6月, 2011 1 次提交
-
-
由 Manoj Iyer 提交于
Signed-off-by: NManoj Iyer <manoj.iyer@canonical.com> Cc: <stable@kernel.org> Signed-off-by: NChris Ball <cjb@laptop.org>
-
- 18 6月, 2011 2 次提交
-
-
由 Magnus Damm 提交于
According to the data sheet for G4, AP4 and AG5 KEYSC MODE_6 is 8x8 keys. Bump up MAXKEYS to 64 too. Signed-off-by: NMagnus Damm <damm@opensource.se> Reviewed-by: NSimon Horman <horms@verge.net.au> Signed-off-by: NDmitry Torokhov <dtor@mail.ru>
-
由 David Howells 提交于
____call_usermodehelper() now erases any credentials set by the subprocess_inf::init() function. The problem is that commit 17f60a7d ("capabilites: allow the application of capability limits to usermode helpers") creates and commits new credentials with prepare_kernel_cred() after the call to the init() function. This wipes all keyrings after umh_keys_init() is called. The best way to deal with this is to put the init() call just prior to the commit_creds() call, and pass the cred pointer to init(). That means that umh_keys_init() and suchlike can modify the credentials _before_ they are published and potentially in use by the rest of the system. This prevents request_key() from working as it is prevented from passing the session keyring it set up with the authorisation token to /sbin/request-key, and so the latter can't assume the authority to instantiate the key. This causes the in-kernel DNS resolver to fail with ENOKEY unconditionally. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NEric Paris <eparis@redhat.com> Tested-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 6月, 2011 2 次提交
-
-
由 Takao Indoh 提交于
There is a problem that kdump(2nd kernel) sometimes hangs up due to a pending IPI from 1st kernel. Kernel panic occurs because IPI comes before call_single_queue is initialized. To fix the crash, rename init_call_single_data() to call_function_init() and call it in start_kernel() so that call_single_queue can be initialized before enabling interrupts. The details of the crash are: (1) 2nd kernel boots up (2) A pending IPI from 1st kernel comes when irqs are first enabled in start_kernel(). (3) Kernel tries to handle the interrupt, but call_single_queue is not initialized yet at this point. As a result, in the generic_smp_call_function_single_interrupt(), NULL pointer dereference occurs when list_replace_init() tries to access &q->list.next. Therefore this patch changes the name of init_call_single_data() to call_function_init() and calls it before local_irq_enable() in start_kernel(). Signed-off-by: NTakao Indoh <indou.takao@jp.fujitsu.com> Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com> Acked-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Milton Miller <miltonm@bga.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Gleixner 提交于
The clocksource watchdog code is interruptible and it has been observed that this can trigger false positives which disable the TSC. The reason is that an interrupt storm or a long running interrupt handler between the read of the watchdog source and the read of the TSC brings the two far enough apart that the delta is larger than the unstable treshold. Move both reads into a short interrupt disabled region to avoid that. Reported-and-tested-by: NVernon Mauery <vernux@us.ibm.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
-
- 16 6月, 2011 7 次提交
-
-
由 Randy Dunlap 提交于
Make GPIOF_ defined values available even when GPIOLIB nor GENERIC_GPIO is enabled by moving them to <linux/gpio.h>. Fixes these build errors in linux-next: sound/soc/codecs/ak4641.c:524: error: 'GPIOF_OUT_INIT_LOW' undeclared (first use in this function) sound/soc/codecs/wm8915.c:2921: error: 'GPIOF_OUT_INIT_LOW' undeclared (first use in this function) Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
由 Josh Triplett 提交于
The "hostname" tool falls back to setting the hostname to "localhost" if /etc/hostname does not exist. Distribution init scripts have the same fallback. However, if userspace never calls sethostname, such as when booting with init=/bin/sh, or otherwise booting a minimal system without the usual init scripts, the default hostname of "(none)" remains, unhelpfully appearing in various places such as prompts ("root@(none):~#") and logs. Furthermore, "(none)" doesn't typically resolve to anything useful. Make the default hostname configurable. This removes the need for the standard fallback, provides a useful default for systems that never call sethostname, and makes minimal systems that much more useful with less configuration. Distributions could choose to use "localhost" here to avoid the fallback, while embedded systems may wish to use a specific target hostname. Signed-off-by: NJosh Triplett <josh@joshtriplett.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Acked-by: NDavid Miller <davem@davemloft.net> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Kel Modderman <kel@otaku42.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dr. David Alan Gilbert 提交于
BUILD_BUG_ON_ZERO and BUILD_BUG_ON_NULL must return values, even in the CHECKER case otherwise various users of it become syntactically invalid. Signed-off-by: NDr. David Alan Gilbert <linux@treblig.org> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
Recently, Robert Mueller reported (http://lkml.org/lkml/2010/9/12/236) that zone_reclaim_mode doesn't work properly on his new NUMA server (Dual Xeon E5520 + Intel S5520UR MB). He is using Cyrus IMAPd and it's built on a very traditional single-process model. * a master process which reads config files and manages the other process * multiple imapd processes, one per connection * multiple pop3d processes, one per connection * multiple lmtpd processes, one per connection * periodical "cleanup" processes. There are thousands of independent processes. The problem is, recent Intel motherboard turn on zone_reclaim_mode by default and traditional prefork model software don't work well on it. Unfortunatelly, such models are still typical even in the 21st century. We can't ignore them. This patch raises the zone_reclaim_mode threshold to 30. 30 doesn't have any specific meaning. but 20 means that one-hop QPI/Hypertransport and such relatively cheap 2-4 socket machine are often used for traditional servers as above. The intention is that these machines don't use zone_reclaim_mode. Note: ia64 and Power have arch specific RECLAIM_DISTANCE definitions. This patch doesn't change such high-end NUMA machine behavior. Dave Hansen said: : I know specifically of pieces of x86 hardware that set the information : in the BIOS to '21' *specifically* so they'll get the zone_reclaim_mode : behavior which that implies. : : They've done performance testing and run very large and scary benchmarks : to make sure that they _want_ this turned on. What this means for them : is that they'll probably be de-optimized, at least on newer versions of : the kernel. : : If you want to do this for particular systems, maybe _that_'s what we : should do. Have a list of specific configurations that need the : defaults overridden either because they're buggy, or they have an : unusual hardware configuration not really reflected in the distance : table. And later said: : The original change in the hardware tables was for the benefit of a : benchmark. Said benchmark isn't going to get run on mainline until the : next batch of enterprise distros drops, at which point the hardware where : this was done will be irrelevant for the benchmark. I'm sure any new : hardware will just set this distance to another yet arbitrary value to : make the kernel do what it wants. :) : : Also, when the hardware got _set_ to this initially, I complained. So, I : guess I'm getting my way now, with this patch. I'm cool with it. Reported-by: NRobert Mueller <robm@fastmail.fm> Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NDavid Rientjes <rientjes@google.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "Luck, Tony" <tony.luck@intel.com> Acked-by: NDave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Randy Dunlap 提交于
Fix <linux/kmsg_dump.h> when CONFIG_PRINTK is not enabled: include/linux/kmsg_dump.h:56: error: 'EINVAL' undeclared (first use in this function) include/linux/kmsg_dump.h:61: error: 'EINVAL' undeclared (first use in this function) Looks like commit 595dd3d8 ("kmsg_dump: fix build for CONFIG_PRINTK=n") uses EINVAL without having the needed header file(s), but I'm sure that I build tested that patch also. oh well. Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
Currently, memcg reclaim can disable swap token even if the swap token mm doesn't belong in its memory cgroup. It's slightly risky. If an admin creates very small mem-cgroup and silly guy runs contentious heavy memory pressure workload, every tasks are going to lose swap token and then system may become unresponsive. That's bad. This patch adds 'memcg' parameter into disable_swap_token(). and if the parameter doesn't match swap token, VM doesn't disable it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Rik van Riel<riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michael Hennerich 提交于
Signed-off-by: NMichael Hennerich <michael.hennerich@analog.com> Signed-off-by: NMike Frysinger <vapier@gentoo.org> Cc: Richard Purdie <rpurdie@rpsys.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 6月, 2011 3 次提交
-
-
由 Benny Halevy 提交于
We don't support header padding yet so better off ditching it Reported-by: NSid Moore <learnmost@gmail.com> Signed-off-by: NBenny Halevy <benny@tonian.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
If the NLM daemon is killed on the NFS server, we can currently end up hanging forever on an 'unlock' request, instead of aborting. Basically, if the rpcbind request fails, or the server keeps returning garbage, we really want to quit instead of retrying. Tested-by: NVasily Averin <vvs@sw.ru> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
由 Shaohua Li 提交于
Commit a26ac245(rcu: move TREE_RCU from softirq to kthread) introduced performance regression. In an AIM7 test, this commit degraded performance by about 40%. The commit runs rcu callbacks in a kthread instead of softirq. We observed high rate of context switch which is caused by this. Out test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switch per second which is caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done. Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.) This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Tested-by: N"Alex,Shi" <alex.shi@intel.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
- 14 6月, 2011 1 次提交
-
-
由 Jan Kara 提交于
jbd2_journal_remove_journal_head() can oops when trying to access journal_head returned by bh2jh(). This is caused for example by the following race: TASK1 TASK2 jbd2_journal_commit_transaction() ... processing t_forget list __jbd2_journal_refile_buffer(jh); if (!jh->b_transaction) { jbd_unlock_bh_state(bh); jbd2_journal_try_to_free_buffers() jbd2_journal_grab_journal_head(bh) jbd_lock_bh_state(bh) __journal_try_to_free_buffer() jbd2_journal_put_journal_head(jh) jbd2_journal_remove_journal_head(bh); jbd2_journal_put_journal_head() in TASK2 sees that b_jcount == 0 and buffer is not part of any transaction and thus frees journal_head before TASK1 gets to doing so. Note that even buffer_head can be released by try_to_free_buffers() after jbd2_journal_put_journal_head() which adds even larger opportunity for oops (but I didn't see this happen in reality). Fix the problem by making transactions hold their own journal_head reference (in b_jcount). That way we don't have to remove journal_head explicitely via jbd2_journal_remove_journal_head() and instead just remove journal_head when b_jcount drops to zero. The result of this is that [__]jbd2_journal_refile_buffer(), [__]jbd2_journal_unfile_buffer(), and __jdb2_journal_remove_checkpoint() can free journal_head which needs modification of a few callers. Also we have to be careful because once journal_head is removed, buffer_head might be freed as well. So we have to get our own buffer_head reference where it matters. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 6月, 2011 2 次提交
-
-
由 Joe Perches 提交于
Use the compiler to verify format strings and arguments. Fix fallout. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Al Viro 提交于
* new refcount in struct net, controlling actual freeing of the memory * new method in kobj_ns_type_operations (->drop_ns()) * ->current_ns() semantics change - it's supposed to be followed by corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps the new refcount; net_drop_ns() decrements it and calls net_free() if the last reference has been dropped. Method renamed to ->grab_current_ns(). * old net_free() callers call net_drop_ns() instead. * sysfs_exit_ns() is gone, along with a large part of callchain leading to it; now that the references stored in ->ns[...] stay valid we do not need to hunt them down and replace them with NULL. That fixes problems in sysfs_lookup() and sysfs_readdir(), along with getting rid of sb->s_instances abuse. Note that struct net *shutdown* logics has not changed - net_cleanup() is called exactly when it used to be called. The only thing postponed by having a sysfs instance refering to that struct net is actual freeing of memory occupied by struct net. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 12 6月, 2011 2 次提交
-
-
由 Jiri Pirko 提交于
Testing of VLAN_FLAG_REORDER_HDR does not belong in vlan_untag but rather in vlan_do_receive. Otherwise the vlan header will not be properly put on the packet in the case of vlan header accelleration. As we remove the check from vlan_check_reorder_header rename it vlan_reorder_header to keep the naming clean. Fix up the skb->pkt_type early so we don't look at the packet after adding the vlan tag, which guarantees we don't goof and look at the wrong field. Use a simple if statement instead of a complicated switch statement to decided that we need to increment rx_stats for a multicast packet. Hopefully at somepoint we will just declare the case where VLAN_FLAG_REORDER_HDR is cleared as unsupported and remove the code. Until then this keeps it working correctly. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NJiri Pirko <jpirko@redhat.com> Acked-by: NChangli Gao <xiaosuo@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Howells 提交于
It uses cpu_relax(), and so needs <asm/processor.h> Without this patch, I see: CC arch/mn10300/kernel/asm-offsets.s In file included from include/linux/time.h:8, from include/linux/timex.h:56, from include/linux/sched.h:57, from arch/mn10300/kernel/asm-offsets.c:7: include/linux/seqlock.h: In function 'read_seqbegin': include/linux/seqlock.h:91: error: implicit declaration of function 'cpu_relax' whilst building asb2364_defconfig on MN10300. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 6月, 2011 2 次提交
-
-
由 Jamie Iles 提交于
include/linux/basic_mmio_gpio.h uses a spinlock_t without including any of the spinlock headers resulting in this compiler warning. include/linux/basic_mmio_gpio.h:51:2: error: expected specifier-qualifier-list before 'spinlock_t' Explicitly include linux/spinlock_types.h to fix it. Signed-off-by: NJamie Iles <jamie@jamieiles.com> Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
由 Yegor Yefremov 提交于
Signed-off-by: NYegor Yefremov <yegorslists@googlemail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 6月, 2011 1 次提交
-
-
由 Linus Torvalds 提交于
This tries to make the 'struct inode' accesses denser in the data cache by moving a commonly accessed field (i_security) closer to other fields that are accessed often. It also makes 'i_state' just an 'unsigned int' rather than 'unsigned long', since we only use a few bits of that field, and moves it next to the existing 'i_flags' so that we potentially get better structure layout (although depending on config options, i_flags may already have packed in the same word as i_lock, so this improves packing only for the case of spinlock debugging) Out 'struct inode' is still way too big, and we should probably move some other fields around too (the acl fields in particular) for better data cache access density. Other fields (like the inode hash) are likely to be entirely irrelevant under most loads. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 6月, 2011 1 次提交
-
-
由 Alan Stern 提交于
Some USB mass-storage devices have bugs that cause them not to handle the first READ(10) command they receive correctly. The Corsair Padlock v2 returns completely bogus data for its first read (possibly it returns the data in encrypted form even though the device is supposed to be unlocked). The Feiya SD/SDHC card reader fails to complete the first READ(10) command after it is plugged in or after a new card is inserted, returning a status code that indicates it thinks the command was invalid, which prevents the kernel from retrying the read. Since the first read of a new device or a new medium is for the partition sector, the kernel is unable to retrieve the device's partition table. Users have to manually issue an "hdparm -z" or "blockdev --rereadpt" command before they can access the device. This patch (as1470) works around the problem. It adds a new quirk flag, US_FL_INVALID_READ10, indicating that the first READ(10) should always be retried immediately, as should any failing READ(10) commands (provided the preceding READ(10) command succeeded, to avoid getting stuck in a loop). The patch also adds appropriate unusual_devs entries containing the new flag. Signed-off-by: NAlan Stern <stern@rowland.harvard.edu> Tested-by: NSven Geggus <sven-usbst@geggus.net> Tested-by: NPaul Hartman <paul.hartman+linux@gmail.com> CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net> CC: <stable@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 07 6月, 2011 1 次提交
-
-
由 Eric Dumazet 提交于
In 2.6.27, commit 393e52e3 (packet: deliver VLAN TCI to userspace) added a small information leak. Add padding field and make sure its zeroed before copy to user. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-