- 04 4月, 2009 4 次提交
-
-
由 Andy Adamson 提交于
This patch provides basic data structures representing the nfs41 sessions and slots, plus helpers for keeping a reference count on the session and freeing it. Note that our server only support a headerpadsz of 0 and it ignores backchannel attributes at the moment. Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [nfsd41: remove headerpadsz from channel attributes] [nfsd41: embed nfsd4_channel in nfsd4_session] Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [nfsd41 remove sl_session from nfsd4_slot] Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Marc Eshel 提交于
Define all error code present in http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29. Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [nfsd41: clean up error code definitions] [nfsd41: change NFSERR_REPLAY_ME] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Benny Halevy 提交于
Define all NFSv4.1 common operation and error code constants. Note that some of the definitions are used by both the nfs41 client and the server code. This patch is duplicated in the nfs41 and nfsd41 sessions patchset. Signed-off-by: Andy Adamson<andros@netapp.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [nfs41: add exchange id flags] Signed-off-by: NMike Sager <sager@netapp.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [removed server-only hunk changing NFSERR_REPLAY_ME] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [nfs41: add SEQ4_XX to nfs41-common-protocol] Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [nfs41: generic error code update] [nfs41: reverse EXCHGID4_INVAL_FLAG_MASK_{A,R}] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Andy Adamson 提交于
On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be returned. It is up to the NFSv4.1 client to resend only the operations that have not been processed. Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in nfsd4_proc_compound() only when NFSv4.1 Sessions are used. Note: this isn't an adequate solution on its own. It's acceptable as a way to get some minimal 4.1 up and working, but we're going to have to find a way to avoid returning DELAY in all common cases before 4.1 can really be considered ready. Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [nfsd41: reverse rq_nodeferral negative logic] Signed-off-by: NBenny Halevy <bhalevy@panasas.com> [sunrpc: initialize rq_usedeferral] Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 30 3月, 2009 1 次提交
-
-
由 Andy Adamson 提交于
Remove the allocation of struct nfsd4_compound_state. Signed-off-by: NAndy Adamson <andros@netapp.com> Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 28 3月, 2009 1 次提交
-
-
由 Greg Banks 提交于
Fix a build warning about leaking CONFIG_NFSD to userspace. The nfsd_stats data structure does not need to be available to userspace; no kernel interface uses it. So move it inside #ifdef __KERNEL__ and the warning goes away. Signed-off-by: NGreg Banks <gnb@sgi.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 19 3月, 2009 10 次提交
-
-
由 Chuck Lever 提交于
Clean up: Enable the use of const arguments in higher level svc_ APIs by adding const to the arguments of the helper functions in svc_xprt.h Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Greg Banks 提交于
Add /proc/fs/nfsd/pool_stats to export to userspace various statistics about the operation of rpc server thread pools. This patch is based on a forward-ported version of knfsd-add-pool-thread-stats which has been shipping in the SGI "Enhanced NFS" product since 2006 and which was previously posted: http://article.gmane.org/gmane.linux.nfs/10375 It has also been updated thus: * moved EXPORT_SYMBOL() to near the function it exports * made the new struct struct seq_operations const * used SEQ_START_TOKEN instead of ((void *)1) * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of printk in svc_pool_stats_*()" by Harshula Jayasuriya. * merged fix from SGI PV 964001 "Crash reading pool_stats before nfsds are started". Signed-off-by: NGreg Banks <gnb@sgi.com> Signed-off-by: NHarshula Jayasuriya <harshula@sgi.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Greg Banks 提交于
Avoid overloading the CPU scheduler with enormous load averages when handling high call-rate NFS loads. When the knfsd bottom half is made aware of an incoming call by the socket layer, it tries to choose an nfsd thread and wake it up. As long as there are idle threads, one will be woken up. If there are lot of nfsd threads (a sensible configuration when the server is disk-bound or is running an HSM), there will be many more nfsd threads than CPUs to run them. Under a high call-rate low service-time workload, the result is that almost every nfsd is runnable, but only a handful are actually able to run. This situation causes two significant problems: 1. The CPU scheduler takes over 10% of each CPU, which is robbing the nfsd threads of valuable CPU time. 2. At a high enough load, the nfsd threads starve userspace threads of CPU time, to the point where daemons like portmap and rpc.mountd do not schedule for tens of seconds at a time. Clients attempting to mount an NFS filesystem timeout at the very first step (opening a TCP connection to portmap) because portmap cannot wake up from select() and call accept() in time. Disclaimer: these effects were observed on a SLES9 kernel, modern kernels' schedulers may behave more gracefully. The solution is simple: keep in each svc_pool a counter of the number of threads which have been woken but have not yet run, and do not wake any more if that count reaches an arbitrary small threshold. Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fill in the server's RAM so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. The server was running 128 nfsds. Profiling showed schedule() taking 6.7% of every CPU, and __wake_up() taking 5.2%. This patch drops those contributions to 3.0% and 2.2%. Load average was over 120 before the patch, and 20.9 after. This patch is a forward-ported version of knfsd-avoid-nfsd-overload which has been shipping in the SGI "Enhanced NFS" product since 2006. It has been posted before: http://article.gmane.org/gmane.linux.nfs/10374Signed-off-by: NGreg Banks <gnb@sgi.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 David Shaw 提交于
If a filesystem being written to via NFS returns a short write count (as opposed to an error) to nfsd, nfsd treats that as a success for the entire write, rather than the short count that actually succeeded. For example, given a 8192 byte write, if the underlying filesystem only writes 4096 bytes, nfsd will ack back to the nfs client that all 8192 bytes were written. The nfs client does have retry logic for short writes, but this is never called as the client is told the complete write succeeded. There are probably other ways it could happen, but in my case it happened with a fuse (filesystem in userspace) filesystem which can rather easily have a partial write. Here is a patch to properly return the short write count to the client. Signed-off-by: NDavid Shaw <dshaw@jabberwocky.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 J. Bruce Fields 提交于
As part of reducing the scope of the client_mutex, and in order to remove the need for mutexes from the callback code (so that callbacks can be done as asynchronous rpc calls), move manipulations of the file_hashtable under the recall_lock. Update the relevant comments while we're here. Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu> Cc: Alexandros Batsakis <batsakis@netapp.com> Reviewed-by: NBenny Halevy <bhalevy@panasas.com>
-
由 J. Bruce Fields 提交于
All users now pass this, so it's meaningless. Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 J. Bruce Fields 提交于
Delegreturn is enough a special case for preprocess_stateid_op to warrant just open-coding it in delegreturn. There should be no change in behavior here; we're just reshuffling code. Thanks to Yang Hongyang for catching a critical typo. Reviewed-by: NYang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Harvey Harrison 提交于
The base versions handle constant folding now, none of these headers are exported to userspace, so the __ prefixed versions are not necessary. Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com> Reviewed-by: NNeilBrown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 J. Bruce Fields 提交于
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 J. Bruce Fields 提交于
Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 18 3月, 2009 1 次提交
-
-
由 J. Bruce Fields 提交于
Since creating a device node is normally an operation requiring special privilege, Igor Zhbanov points out that it is surprising (to say the least) that a client can, for example, create a device node on a filesystem exported with root_squash. So, make sure CAP_MKNOD is among the capabilities dropped when an nfsd thread handles a request from a non-root user. Reported-by: NIgor Zhbanov <izh1979@gmail.com> Cc: stable@kernel.org Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 15 3月, 2009 1 次提交
-
-
由 un'ichi Nomura 提交于
Stricter gfp_mask might be required for clone allocation. For example, request-based dm may clone bio in interrupt context so it has to use GFP_ATOMIC. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
- 14 3月, 2009 1 次提交
-
-
由 FUJITA Tomonori 提交于
dma_map_sg could return a value different to 'nents' argument of dma_map_sg so the ide stack needs to save it for the later usage (e.g. for_each_sg). The ide stack also needs to save the original sg_nents value for pci_unmap_sg. Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> [bart: backport to Linus' tree] Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
- 13 3月, 2009 1 次提交
-
-
由 Uwe Kleine-König 提交于
This is a fix for the following crash observed in 2.6.29-rc3: http://lkml.org/lkml/2009/1/29/150 On ARM it doesn't make sense to trace a naked function because then mcount is called without stack and frame pointer being set up and there is no chance to restore the lr register to the value before mcount was called. Reported-by: NMatthias Kaehlcke <matthias@kaehlcke.net> Tested-by: NMatthias Kaehlcke <matthias@kaehlcke.net> Cc: Abhishek Sagar <sagar.abhishek@gmail.com> Cc: Steven Rostedt <rostedt@home.goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
-
- 12 3月, 2009 2 次提交
-
-
由 Rusty Russell 提交于
This allows us to change the representation (to a dangling bitmap or cpumask_var_t) without breaking all the callers: they can use mm_cpumask() now and won't see a difference as the changes roll into linux-next. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Rusty Russell 提交于
This allows us to change the representation (to a dangling bitmap or cpumask_var_t) without breaking all the callers: they can use tsk_cpumask() now and won't see a difference as the changes roll into linux-next. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 11 3月, 2009 2 次提交
-
-
由 Chuck Lever 提交于
Clean up/micro-optimatization: Make the AF_INET-only version of nlm_cmp_addr() smaller. This matches the style of nlm_privileged_requester(), and makes the AF_INET-only version of nlm_cmp_addr() nearly the same size as it was before IPv6 support. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Fix a memory leak due to allocation in the XDR layer. In cases where the RPC call needs to be retransmitted, we end up allocating new pages without clearing the old ones. Fix this by moving the allocation into nfs3_proc_setacls(). Also fix an issue discovered by Kevin Rudd, whereby the amount of memory reserved for the acls in the xdr_buf->head was miscalculated, and causing corruption. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 10 3月, 2009 1 次提交
-
-
由 Dave Jones 提交于
This reverts commit e088e4c9. Removing the sysfs interface for p4-clockmod was flagged as a regression in bug 12826. Course of action: - Find out the remaining causes of overheating, and fix them if possible. ACPI should be doing the right thing automatically. If it isn't, we need to fix that. - mark p4-clockmod ui as deprecated - try again with the removal in six months. It's not really feasible to printk about the deprecation, because it needs to happen at all the sysfs entry points, which means adding a lot of strcmp("p4-clockmod".. calls to the core, which.. bleuch. Signed-off-by: NDave Jones <davej@redhat.com>
-
- 08 3月, 2009 1 次提交
-
-
由 Dmitry Torokhov 提交于
Protocol 0x37 has been reserved for iNexio devices and Sahara was supposed to get 0x38. Reported-by: NClaudio Nieder <private@claudio.ch> Signed-off-by: NDmitry Torokhov <dtor@mail.ru>
-
- 06 3月, 2009 1 次提交
-
-
由 Sergei Shtylyov 提交于
Declare CFA specific identify data words 162 and 163 for future use. Signed-off-by: NSergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: NJeff Garzik <jgarzik@redhat.com> [bart: update patch summary/description] Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
- 05 3月, 2009 6 次提交
-
-
HDIO_GET_IDENTITY returns 256 words currently. Noticed by Norman Diamond. Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
由 Stanislaw Gruszka 提交于
Signed-off-by: NStanislaw Gruszka <stf_xl@wp.pl> Cc: Andrew Victor <linux@maxim.org.za> [bart: minor checkpatch.pl / CodingStyle fixups] Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
-
由 Tejun Heo 提交于
ap->sector_buf is used as DMA target and should at least be aligned on cacheline. This caused problems on some embedded machines. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
-
由 FUJITA Tomonori 提交于
libata passes the returned value of dma_map_sg() to dma_unmap_sg(),which is the misuse of dma_unmap_sg(). DMA-mapping.txt says: To unmap a scatterlist, just call: pci_unmap_sg(pdev, sglist, nents, direction); Again, make sure DMA activity has already finished. PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be the _same_ one you passed into the pci_map_sg call, it should _NOT_ be the 'count' value _returned_ from the pci_map_sg call. Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
-
由 Stuart Hayes 提交于
This fixes problems during resume with drives that take longer than 1s to be ready. The ATA-6 spec appears to allow 5 seconds for a drive to be ready. On one affected system, this patch changes "PM: resume devices took..." message from 17 seconds to 4 seconds, and gets rid of a lot of ugly timeout/error messages. Without this patch, the libata code moves on after 1s, tries to send a soft reset (which the drive doesn't see because it isn't ready) which also times out, then an IDENTIFY command is sent to the drive which times out, and finally the error handler will try to send another hard reset which will finally get things working. Signed-off-by: NStuart Hayes <stuart_hayes@dell.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
-
由 David S. Miller 提交于
As analyzed by Patrick McHardy, vlan needs to reset it's netdev_ops pointer in it's ->init() function but this leaves the compat method pointers stale. Add a netdev_resync_ops() and call it from the vlan code. Any other driver which changes ->netdev_ops after register_netdevice() will need to call this new function after doing so too. With help from Patrick McHardy. Tested-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 3月, 2009 1 次提交
-
-
由 Pallipadi, Venkatesh 提交于
Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 01 3月, 2009 2 次提交
-
-
由 Chris Leech 提交于
The DCB netlink interface is required for building the userspace tools available at e1000.sourceforge.net Signed-off-by: NChris Leech <christopher.leech@intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chris Leech 提交于
1) add an include for <linux/types.h> 2) change dcbmsg.dcb_family from unsigned char to __u8 to be more consistent with use of kernel types Signed-off-by: NChris Leech <christopher.leech@intel.com> Acked-by: NSam Ravnborg <sam@ravnborg.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 2月, 2009 1 次提交
-
-
由 David Howells 提交于
free_uid() and free_user_ns() are corecursive when CONFIG_USER_SCHED=n, but free_user_ns() is called from free_uid() by way of uid_hash_remove(), which requires uidhash_lock to be held. free_user_ns() then calls free_uid() to complete the destruction. Fix this by deferring the destruction of the user_namespace. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 2月, 2009 1 次提交
-
-
由 Dhaval Giani 提交于
Impact: fix hung task with certain (non-default) rt-limit settings Corey Hickey reported that on using setuid to change the uid of a rt process, the process would be unkillable and not be running. This is because there was no rt runtime for that user group. Add in a check to see if a user can attach an rt task to its task group. On failure, return EINVAL, which is also returned in CONFIG_CGROUP_SCHED. Reported-by: NCorey Hickey <bugfood-ml@fatooh.org> Signed-off-by: NDhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 26 2月, 2009 2 次提交
-
-
由 Jens Axboe 提交于
blk_recalc_rq_segments() requires a request structure passed in, which we don't have from blk_recount_segments(). So the latter allocates one on the stack, using > 400 bytes of stack for that. This can cause us to spill over one page of stack from ext4 at least: 0) 4560 400 blk_recount_segments+0x43/0x62 1) 4160 32 bio_phys_segments+0x1c/0x24 2) 4128 32 blk_rq_bio_prep+0x2a/0xf9 3) 4096 32 init_request_from_bio+0xf9/0xfe 4) 4064 112 __make_request+0x33c/0x3f6 5) 3952 144 generic_make_request+0x2d1/0x321 6) 3808 64 submit_bio+0xb9/0xc3 7) 3744 48 submit_bh+0xea/0x10e 8) 3696 368 ext4_mb_init_cache+0x257/0xa6a [ext4] 9) 3328 288 ext4_mb_regular_allocator+0x421/0xcd9 [ext4] 10) 3040 160 ext4_mb_new_blocks+0x211/0x4b4 [ext4] 11) 2880 336 ext4_ext_get_blocks+0xb61/0xd45 [ext4] 12) 2544 96 ext4_get_blocks_wrap+0xf2/0x200 [ext4] 13) 2448 80 ext4_da_get_block_write+0x6e/0x16b [ext4] 14) 2368 352 mpage_da_map_blocks+0x7e/0x4b3 [ext4] 15) 2016 352 ext4_da_writepages+0x2ce/0x43c [ext4] 16) 1664 32 do_writepages+0x2d/0x3c 17) 1632 144 __writeback_single_inode+0x162/0x2cd 18) 1488 96 generic_sync_sb_inodes+0x1e3/0x32b 19) 1392 16 sync_sb_inodes+0xe/0x10 20) 1376 48 writeback_inodes+0x69/0xb3 21) 1328 208 balance_dirty_pages_ratelimited_nr+0x187/0x2f9 22) 1120 224 generic_file_buffered_write+0x1d4/0x2c4 23) 896 176 __generic_file_aio_write_nolock+0x35f/0x393 24) 720 80 generic_file_aio_write+0x6c/0xc8 25) 640 80 ext4_file_write+0xa9/0x137 [ext4] 26) 560 320 do_sync_write+0xf0/0x137 27) 240 48 vfs_write+0xb3/0x13c 28) 192 64 sys_write+0x4c/0x74 29) 128 128 system_call_fastpath+0x16/0x1b Split the segment counting out into a __blk_recalc_rq_segments() helper to avoid allocating an onstack request just for checking the physical segment count. Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Paul E. McKenney 提交于
This patch fixes a bug located by Vegard Nossum with the aid of kmemcheck, updated based on review comments from Nick Piggin, Ingo Molnar, and Andrew Morton. And cleans up the variable-name and function-name language. ;-) The boot CPU runs in the context of its idle thread during boot-up. During this time, idle_cpu(0) will always return nonzero, which will fool Classic and Hierarchical RCU into deciding that a large chunk of the boot-up sequence is a big long quiescent state. This in turn causes RCU to prematurely end grace periods during this time. This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks() function to ignore the idle task as a quiescent state until the system has started up the scheduler in rest_init(), introducing a new non-API function rcu_idle_now_means_idle() to inform RCU of this transition. RCU maintains an internal rcu_idle_cpu_truthful variable to track this state, which is then used by rcu_check_callback() to determine if it should believe idle_cpu(). Because this patch has the effect of disallowing RCU grace periods during long stretches of the boot-up sequence, this patch also introduces Josh Triplett's UP-only optimization that makes synchronize_rcu() be a no-op if num_online_cpus() returns 1. This allows boot-time code that calls synchronize_rcu() to proceed normally. Note, however, that RCU callbacks registered by call_rcu() will likely queue up until later in the boot sequence. Although rcuclassic and rcutree can also use this same optimization after boot completes, rcupreempt must restrict its use of this optimization to the portion of the boot sequence before the scheduler starts up, given that an rcupreempt RCU read-side critical section may be preeempted. In addition, this patch takes Nick Piggin's suggestion to make the system_state global variable be __read_mostly. Changes since v4: o Changes the name of the introduced function and variable to be less emotional. ;-) Changes since v3: o WARN_ON(nr_context_switches() > 0) to verify that RCU switches out of boot-time mode before the first context switch, as suggested by Nick Piggin. Changes since v2: o Created rcu_blocking_is_gp() internal-to-RCU API that determines whether a call to synchronize_rcu() is itself a grace period. o The definition of rcu_blocking_is_gp() for rcuclassic and rcutree checks to see if but a single CPU is online. o The definition of rcu_blocking_is_gp() for rcupreempt checks to see both if but a single CPU is online and if the system is still in early boot. This allows rcupreempt to again work correctly if running on a single CPU after booting is complete. o Added check to rcupreempt's synchronize_sched() for there being but one online CPU. Tested all three variants both SMP and !SMP, booted fine, passed a short rcutorture test on both x86 and Power. Located-by: NVegard Nossum <vegard.nossum@gmail.com> Tested-by: NVegard Nossum <vegard.nossum@gmail.com> Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-