- 16 5月, 2012 2 次提交
-
-
由 Jan Kara 提交于
If journal superblock is written only in disk's caches and other transaction starts reusing space of the transaction cleaned from the log, it can happen blocks of a new transaction reach the disk before journal superblock. When power failure happens in such case, subsequent journal replay would still try to replay the old transaction but some of it's blocks may be already overwritten by the new transaction. For this reason we must use WRITE_FUA when updating log tail and we must first write new log tail to disk and update in-memory information only after that. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
There are three case of updating journal superblock. In the first case, we want to mark journal as empty (setting s_sequence to 0), in the second case we want to update log tail, in the third case we want to update s_errno. Split these cases into separate functions. It makes the code slightly more straightforward and later patches will make the distinction even more important. Signed-off-by: NJan Kara <jack@suse.cz>
-
- 11 4月, 2012 1 次提交
-
-
由 Jan Kara 提交于
Currently we write out all journal buffers in WRITE_SYNC mode. This improves performance for fsync heavy workloads but hinders performance when writes are mostly asynchronous, most noticably it slows down readers and users complain about slow desktop response etc. So submit writes as asynchronous in the normal case and only submit writes as WRITE_SYNC if we detect someone is waiting for current transaction commit. I've gathered some numbers to back this change. The first is the read latency test. It measures time to read 1 MB after several seconds of sleeping in presence of streaming writes. Top 10 times (out of 90) in us: Before After 2131586 697473 1709932 557487 1564598 535642 1480462 347573 1478579 323153 1408496 222181 1388960 181273 1329565 181070 1252486 172832 1223265 172278 Average: 619377 82180 So the improvement in both maximum and average latency is massive. I've measured fsync throughput by: fs_mark -n 100 -t 1 -s 16384 -d /mnt/fsync/ -S 1 -L 4 in presence of streaming reader. The numbers (fsyncs/s) are: Before After 9.9 6.3 6.8 6.0 6.3 6.2 5.8 6.1 So fsync performance seems unharmed by this change. Signed-off-by: NJan Kara <jack@suse.cz>
-
- 06 4月, 2012 7 次提交
-
-
由 Ben Hutchings 提交于
Commit e52ac339 ('net: Use device model to get driver name in skb_gso_segment()') removed the only in-tree caller of ethtool ops that doesn't hold the RTNL lock. Signed-off-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chris Ball 提交于
This reverts commit c16e981b2fd9455af670a69a84f4c8cf07e12658, because it's no longer useful once MSI support is reverted. Signed-off-by: NChris Ball <cjb@laptop.org>
-
由 Manuel Lauss 提交于
MSI on my O2Micro OZ600 SD card reader is broken. This patch adds a quirk to disable MSI on these controllers. Signed-off-by: NManuel Lauss <manuel.lauss@googlemail.com> Signed-off-by: NChris Ball <cjb@laptop.org>
-
由 Eric Dumazet 提交于
commit 2f533844 (tcp: allow splice() to build full TSO packets) added a regression for splice() calls using SPLICE_F_MORE. We need to call tcp_flush() at the end of the last page processed in tcp_sendpages(), or else transmits can be deferred and future sends stall. Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but with different semantic. For all sendpage() providers, its a transparent change. Only sock_sendpage() and tcp_sendpages() can differentiate the two different flags provided by pipe_to_sendpage() Reported-by: NTom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: NEric Dumazet <eric.dumazet@gmail>com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dave Jiang 提交于
This is the fallout from adding memcpy alignment workaround for certain IOATDMA hardware. NetDMA will only use DMA engine that can handle byte align ops. Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDave Jiang <dave.jiang@intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Michal Hocko 提交于
Although mem_cgroup_uncharge_swap has an empty placeholder for !CONFIG_CGROUP_MEM_RES_CTLR_SWAP the definition is placed in the CONFIG_SWAP ifdef block so we are missing the same definition for !CONFIG_SWAP which implies !CONFIG_CGROUP_MEM_RES_CTLR_SWAP. This has not been an issue before, because mem_cgroup_uncharge_swap was not called from !CONFIG_SWAP context. But Hugh Dickins has a cleanup patch to call __mem_cgroup_commit_charge_swapin which is defined also for !CONFIG_SWAP. Let's move both the empty definition and declaration outside of the CONFIG_SWAP block to avoid the following compilation error: mm/memcontrol.c: In function '__mem_cgroup_commit_charge_swapin': mm/memcontrol.c:2837: error: implicit declaration of function 'mem_cgroup_uncharge_swap' if CONFIG_SWAP is disabled. Reported-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NMichal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Boyd 提交于
debugfs and a few other drivers use an open-coded version of simple_open() to pass a pointer from the file to the read/write file ops. Add support for this simple case to libfs so that we can remove the many duplicate copies of this simple function. Signed-off-by: NStephen Boyd <sboyd@codeaurora.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 4月, 2012 4 次提交
-
-
由 Eric Dumazet 提交于
Commit f04565dd (dev: use name hash for dev_seq_ops) added a second regression, as some devices are missing from /proc/net/dev if many devices are defined. When seq_file buffer is filled, the last ->next/show() method is canceled (pos value is reverted to value prior ->next() call) Problem is after above commit, we dont restart the lookup at right position in ->start() method. Fix this by removing the internal 'pos' pointer added in commit, since we need to use the 'loff_t *pos' provided by seq_file layer. This also reverts commit 5cac98dd (net: Fix corruption in /proc/*/net/dev_mcast), since its not needed anymore. Reported-by: NBen Greear <greearb@candelatech.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Mihai Maruseac <mmaruseac@ixiacom.com> Tested-by: NBen Greear <greearb@candelatech.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
It just bloats the audit data structure for no good reason, since the only time those fields are filled are just before calling the common_lsm_audit() function, which is also the only user of those fields. So just make them be the arguments to common_lsm_audit(), rather than bloating that structure that is passed around everywhere, and is initialized in hot paths. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Paris 提交于
After shrinking the common_audit_data stack usage for private LSM data I'm not going to shrink the data union. To do this I'm going to move anything larger than 2 void * ptrs to it's own structure and require it to be declared separately on the calling stack. Thus hot paths which don't need more than a couple pointer don't have to declare space to hold large unneeded structures. I could get this down to one void * by dealing with the key struct and the struct path. We'll see if that is helpful after taking care of networking. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric Paris 提交于
Linus found that the gigantic size of the common audit data caused a big perf hit on something as simple as running stat() in a loop. This patch requires LSMs to declare the LSM specific portion separately rather than doing it in a union. Thus each LSM can be responsible for shrinking their portion and don't have to pay a penalty just because other LSMs have a bigger space requirement. Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 4月, 2012 2 次提交
-
-
由 Paul Gortmaker 提交于
Commit 313162d0 ("device.h: audit and cleanup users in main include dir") exchanged an include <linux/device.h> for a struct *device but in actuality I misread this file when creating 313162d0 and it should have remained an include. There were no build regressions since all consumers were already getting device.h anyway, but make it right regardless. Reported-by: NStefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Gortmaker 提交于
Commit bf4289cb ("ATMEL: fix nand ecc support") indicated that it wanted to "Move platform data to a common header include/linux/platform_data/atmel_nand.h" and the new header even had re-include protectors with: #ifndef __ATMEL_NAND_H__ However, the file that was added was simply called atmel.h and this caused avr32 defconfig to fail with: In file included from arch/avr32/boards/atstk1000/setup.c:22: arch/avr32/mach-at32ap/include/mach/board.h:10:44: error: linux/platform_data/atmel_nand.h: No such file or directory In file included from arch/avr32/boards/atstk1000/setup.c:22: arch/avr32/mach-at32ap/include/mach/board.h:121: warning: 'struct atmel_nand_data' declared inside parameter list arch/avr32/mach-at32ap/include/mach/board.h:121: warning: its scope is only this definition or declaration, which is probably not what you want make[2]: *** [arch/avr32/boards/atstk1000/setup.o] Error 1 It seems the scope of the file contents will expand beyond just nand, so ignore the original intention, and fix up the users who reference the bad name with the _nand suffix. CC: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com> CC: David Woodhouse <dwmw2@infradead.org> Acked-by: NHans-Christian Egtvedt <egtvedt@samfundet.no> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 4月, 2012 1 次提交
-
-
由 Pavel Shilovsky 提交于
We can deadlock if we have a write oplock and two processes use the same file handle. In this case the first process can't unlock its lock if the second process blocked on the lock in the same time. Fix it by using posix_lock_file rather than posix_lock_file_wait under cinode->lock_mutex. If we request a blocking lock and posix_lock_file indicates that there is another lock that prevents us, wait untill that lock is released and restart our call. Cc: stable@kernel.org Acked-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NPavel Shilovsky <piastry@etersoft.ru> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
- 01 4月, 2012 4 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... implemented that way since the next commit will leave it almost alone in ext2_fs.h - most of the file (including struct ext2_super_block) is going to move to fs/ext2/ext2.h. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Thierry Reding 提交于
Since the on-disk format has been stable for quite some time, users should either use the headers provided by libext2fs or keep a private copy of this header. For the full discussion, see this thread: https://lkml.org/lkml/2012/3/21/516 While at it, this commit removes all __KERNEL__ guards, which are now unnecessary. Signed-off-by: NThierry Reding <thierry.reding@avionic-design.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: Ted Ts'o <tytso@mit.edu> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Andreas Dilger <aedilger@gmail.com> Cc: linux-ext4@vger.kernel.org
-
- 31 3月, 2012 1 次提交
-
-
由 Amit Shah 提交于
The thaw operation was used by the balloon driver, but after the last commit there's no reason to have separate thaw and restore callbacks. Signed-off-by: NAmit Shah <amit.shah@redhat.com>
-
- 30 3月, 2012 7 次提交
-
-
由 Boris Ostrovsky 提交于
power_usage is always assigned a negative value and should be declared a signed integer Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@amd.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Boris Ostrovsky 提交于
Currently when a CPU is off-lined it enters either MWAIT-based idle or, if MWAIT is not desired or supported, HLT-based idle (which places the processor in C1 state). This patch allows processors without MWAIT support to stay in states deeper than C1. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@amd.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Daniel Lezcano 提交于
As far as I can see, this field is never used in the code. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Daniel Lezcano 提交于
All the modules name are ro-data, it is never copied to the array. eg. static struct cpuidle_driver intel_idle_driver = { .name = "intel_idle", .owner = THIS_MODULE, }; It safe to assign the pointer of this ro-data to a const char *. By this way we save 12 bytes. Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Acked-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com> Tested-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 ShuoX Liu 提交于
Some C states of new CPU might be not good. One reason is BIOS might configure them incorrectly. To help developers root cause it quickly, the patch adds a new sysfs entry, so developers could disable specific C state manually. In addition, C state might have much impact on performance tuning, as it takes much time to enter/exit C states, which might delay interrupt processing. With the new debug option, developers could check if a deep C state could impact performance and how much impact it could cause. Also add this option in Documentation/cpuidle/sysfs.txt. [akpm@linux-foundation.org: check kstrtol return value] Signed-off-by: NShuoX Liu <shuox.liu@intel.com> Reviewed-by: NYanmin Zhang <yanmin_zhang@intel.com> Reviewed-and-Tested-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Jason Wessel 提交于
There has long been a limitation using software breakpoints with a kernel compiled with CONFIG_DEBUG_RODATA going back to 2.6.26. For this particular patch, it will apply cleanly and has been tested all the way back to 2.6.36. The kprobes code uses the text_poke() function which accommodates writing a breakpoint into a read-only page. The x86 kgdb code can solve the problem similarly by overriding the default breakpoint set/remove routines and using text_poke() directly. The x86 kgdb code will first attempt to use the traditional probe_kernel_write(), and next try using a the text_poke() function. The break point install method is tracked such that the correct break point removal routine will get called later on. Cc: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: stable@vger.kernel.org # >= 2.6.36 Inspried-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
由 Jason Wessel 提交于
There is extra state information that needs to be exposed in the kgdb_bpt structure for tracking how a breakpoint was installed. The debug_core only uses the the probe_kernel_write() to install breakpoints, but this is not enough for all the archs. Some arch such as x86 need to use text_poke() in order to install a breakpoint into a read only page. Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() allows other archs to set the type variable which indicates how the breakpoint was installed. Cc: stable@vger.kernel.org # >= 2.6.36 Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
-
- 29 3月, 2012 11 次提交
-
-
由 Steffen Klassert 提交于
The default netlink message size limit might be exceeded when dumping a lot of algorithms to userspace. As a result, not all of the instantiated algorithms dumped to userspace. So calculate an upper bound on the message size and call netlink_dump_start() with that value. Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Axel Lin 提交于
Signed-off-by: NAxel Lin <axel.lin@gmail.com> Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
-
由 Rusty Russell 提交于
These are obsolete: cpu_*_mask provides (const) pointers. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Konstantin Khlebnikov 提交于
A series of radix tree cleanups, and usage of them in the core pagecache code. Micro-benchmark: lookup 14 slots (typical page-vector size) in radix-tree there earch <step> slot filled and tagged before/after - nsec per full scan through tree * Intel Sandy Bridge i7-2620M 4Mb L3 New code always faster * AMD Athlon 6000+ 2x1Mb L2, without L3 New code generally faster, Minor degradation (marked with "*") for huge sparse trees * i386 on Sandy Bridge New code faster for common cases: tagged and dense trees. Some degradations for non-tagged lookup on sparse trees. Ideally, there might help __ffs() analog for searching first non-zero long element in array, gcc sometimes cannot optimize this loop corretly. Numbers: CPU: Intel Sandy Bridge i7-2620M 4Mb L3 radix-tree with 1024 slots: tagged lookup step 1 before 7156 after 3613 step 2 before 5399 after 2696 step 3 before 4779 after 1928 step 4 before 4456 after 1429 step 5 before 4292 after 1213 step 6 before 4183 after 1052 step 7 before 4157 after 951 step 8 before 4016 after 812 step 9 before 3952 after 851 step 10 before 3937 after 732 step 11 before 4023 after 709 step 12 before 3872 after 657 step 13 before 3892 after 633 step 14 before 3720 after 591 step 15 before 3879 after 578 step 16 before 3561 after 513 normal lookup step 1 before 4266 after 3301 step 2 before 2695 after 2129 step 3 before 2083 after 1712 step 4 before 1801 after 1534 step 5 before 1628 after 1313 step 6 before 1551 after 1263 step 7 before 1475 after 1185 step 8 before 1432 after 1167 step 9 before 1373 after 1092 step 10 before 1339 after 1134 step 11 before 1292 after 1056 step 12 before 1319 after 1030 step 13 before 1276 after 1004 step 14 before 1256 after 987 step 15 before 1228 after 992 step 16 before 1247 after 999 radix-tree with 1024*1024*128 slots: tagged lookup step 1 before 1086102841 after 674196409 step 2 before 816839155 after 498138306 step 7 before 599728907 after 240676762 step 15 before 555729253 after 185219677 step 63 before 606637748 after 128585664 step 64 before 608384432 after 102945089 step 65 before 596987114 after 123996019 step 128 before 304459225 after 56783056 step 256 before 158846855 after 31232481 step 512 before 86085652 after 18950595 step 12345 before 6517189 after 1674057 normal lookup step 1 before 626064869 after 544418266 step 2 before 418809975 after 336321473 step 7 before 242303598 after 207755560 step 15 before 208380563 after 176496355 step 63 before 186854206 after 167283638 step 64 before 176188060 after 170143976 step 65 before 185139608 after 167487116 step 128 before 88181865 after 86913490 step 256 before 45733628 after 45143534 step 512 before 24506038 after 23859036 step 12345 before 2177425 after 2018662 * AMD Athlon 6000+ 2x1Mb L2, without L3 radix-tree with 1024 slots: tag-lookup step 1 before 8164 after 5379 step 2 before 5818 after 5581 step 3 before 4959 after 4213 step 4 before 4371 after 3386 step 5 before 4204 after 2997 step 6 before 4950 after 2744 step 7 before 4598 after 2480 step 8 before 4251 after 2288 step 9 before 4262 after 2243 step 10 before 4175 after 2131 step 11 before 3999 after 2024 step 12 before 3979 after 1994 step 13 before 3842 after 1929 step 14 before 3750 after 1810 step 15 before 3735 after 1810 step 16 before 3532 after 1660 normal-lookup step 1 before 7875 after 5847 step 2 before 4808 after 4071 step 3 before 4073 after 3462 step 4 before 3677 after 3074 step 5 before 4308 after 2978 step 6 before 3911 after 3807 step 7 before 3635 after 3522 step 8 before 3313 after 3202 step 9 before 3280 after 3257 step 10 before 3166 after 3083 step 11 before 3066 after 3026 step 12 before 2985 after 2982 step 13 before 2925 after 2924 step 14 before 2834 after 2808 step 15 before 2805 after 2803 step 16 before 2647 after 2622 radix-tree with 1024*1024*128 slots: tag-lookup step 1 before 1288059720 after 951736580 step 2 before 961292300 after 884212140 step 7 before 768905140 after 547267580 step 15 before 771319480 after 456550640 step 63 before 504847640 after 242704304 step 64 before 392484800 after 177920786 step 65 before 491162160 after 246895264 step 128 before 208084064 after 97348392 step 256 before 112401035 after 51408126 step 512 before 75825834 after 29145070 step 12345 before 5603166 after 2847330 normal-lookup step 1 before 1025677120 after 861375100 step 2 before 647220080 after 572258540 step 7 before 505518960 after 484041813 step 15 before 430483053 after 444815320 * step 63 before 388113453 after 404250546 * step 64 before 374154666 after 396027440 * step 65 before 381423973 after 396704853 * step 128 before 190078700 after 202619384 * step 256 before 100886756 after 102829108 * step 512 before 64074505 after 56158720 step 12345 before 4237289 after 4422299 * * i686 on Sandy bridge radix-tree with 1024 slots: tagged lookup step 1 before 7990 after 4019 step 2 before 5698 after 2897 step 3 before 5013 after 2475 step 4 before 4630 after 1721 step 5 before 4346 after 1759 step 6 before 4299 after 1556 step 7 before 4098 after 1513 step 8 before 4115 after 1222 step 9 before 3983 after 1390 step 10 before 4077 after 1207 step 11 before 3921 after 1231 step 12 before 3894 after 1116 step 13 before 3840 after 1147 step 14 before 3799 after 1090 step 15 before 3797 after 1059 step 16 before 3783 after 745 normal lookup step 1 before 5103 after 3499 step 2 before 3299 after 2550 step 3 before 2489 after 2370 step 4 before 2034 after 2302 * step 5 before 1846 after 2268 * step 6 before 1752 after 2249 * step 7 before 1679 after 2164 * step 8 before 1627 after 2153 * step 9 before 1542 after 2095 * step 10 before 1479 after 2109 * step 11 before 1469 after 2009 * step 12 before 1445 after 2039 * step 13 before 1411 after 2013 * step 14 before 1374 after 2046 * step 15 before 1340 after 1975 * step 16 before 1331 after 2000 * radix-tree with 1024*1024*128 slots: tagged lookup step 1 before 1225865377 after 667153553 step 2 before 842427423 after 471533007 step 7 before 609296153 after 276260116 step 15 before 544232060 after 226859105 step 63 before 519209199 after 141343043 step 64 before 588980279 after 141951339 step 65 before 521099710 after 138282060 step 128 before 298476778 after 83390628 step 256 before 149358342 after 43602609 step 512 before 76994713 after 22911077 step 12345 before 53286669 after 1472111 normal lookup step 1 before 819284564 after 533635310 step 2 before 512421605 after 364956155 step 7 before 271443305 after 305721345 * step 15 before 223591630 after 273960216 * step 63 before 190320247 after 217770207 * step 64 before 178538168 after 267411372 * step 65 before 186400423 after 215347937 * step 128 before 88106045 after 140540612 * step 256 before 44812420 after 70660377 * step 512 before 24435438 after 36328275 * step 12345 before 2123924 after 2148062 * bloat-o-meter delta for this patchset + patchset with related shmem cleanups bloat-o-meter: x86_64 add/remove: 4/3 grow/shrink: 5/6 up/down: 928/-939 (-11) function old new delta radix_tree_next_chunk - 499 +499 shmem_unuse 428 554 +126 shmem_radix_tree_replace 131 227 +96 find_get_pages_tag 354 419 +65 find_get_pages_contig 345 407 +62 find_get_pages 362 396 +34 __kstrtab_radix_tree_next_chunk - 22 +22 __ksymtab_radix_tree_next_chunk - 16 +16 __kcrctab_radix_tree_next_chunk - 8 +8 radix_tree_gang_lookup_slot 204 203 -1 static.shmem_xattr_set 384 381 -3 radix_tree_gang_lookup_tag_slot 208 191 -17 radix_tree_gang_lookup 231 187 -44 radix_tree_gang_lookup_tag 247 199 -48 shmem_unlock_mapping 278 190 -88 __lookup 217 - -217 __lookup_tag 242 - -242 radix_tree_locate_item 279 - -279 bloat-o-meter: i386 add/remove: 3/3 grow/shrink: 8/9 up/down: 1075/-1275 (-200) function old new delta radix_tree_next_chunk - 757 +757 shmem_unuse 352 449 +97 find_get_pages_contig 269 322 +53 shmem_radix_tree_replace 113 154 +41 find_get_pages_tag 277 318 +41 dcache_dir_lseek 426 458 +32 __kstrtab_radix_tree_next_chunk - 22 +22 vc_do_resize 968 977 +9 snd_pcm_lib_read1 725 733 +8 __ksymtab_radix_tree_next_chunk - 8 +8 netlbl_cipsov4_list 1120 1127 +7 find_get_pages 293 291 -2 new_slab 467 459 -8 bitfill_unaligned_rev 425 417 -8 radix_tree_gang_lookup_tag_slot 177 146 -31 blk_dump_cmd 267 229 -38 radix_tree_gang_lookup_slot 212 134 -78 shmem_unlock_mapping 221 128 -93 radix_tree_gang_lookup_tag 275 162 -113 radix_tree_gang_lookup 255 126 -129 __lookup 227 - -227 __lookup_tag 271 - -271 radix_tree_locate_item 277 - -277 This patch: Implement a clean, simple and effective radix-tree iteration routine. Iterating divided into two phases: * lookup next chunk in radix-tree leaf node * iterating through slots in this chunk Main iterator function radix_tree_next_chunk() returns pointer to first slot, and stores in the struct radix_tree_iter index of next-to-last slot. For tagged-iterating it also constuct bitmask of tags for retunted chunk. All additional logic implemented as static-inline functions and macroses. Also adds radix_tree_find_next_bit() static-inline variant of find_next_bit() optimized for small constant size arrays, because find_next_bit() too heavy for searching in an array with one/two long elements. [akpm@linux-foundation.org: rework comments a bit] Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Tested-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Lezcano 提交于
In the case of a child pid namespace, rebooting the system does not really makes sense. When the pid namespace is used in conjunction with the other namespaces in order to create a linux container, the reboot syscall leads to some problems. A container can reboot the host. That can be fixed by dropping the sys_reboot capability but we are unable to correctly to poweroff/ halt/reboot a container and the container stays stuck at the shutdown time with the container's init process waiting indefinitively. After several attempts, no solution from userspace was found to reliabily handle the shutdown from a container. This patch propose to make the init process of the child pid namespace to exit with a signal status set to : SIGINT if the child pid namespace called "halt/poweroff" and SIGHUP if the child pid namespace called "reboot". When the reboot syscall is called and we are not in the initial pid namespace, we kill the pid namespace for "HALT", "POWEROFF", "RESTART", and "RESTART2". Otherwise we return EINVAL. Returning EINVAL is also an easy way to check if this feature is supported by the kernel when invoking another 'reboot' option like CAD. By this way the parent process of the child pid namespace knows if it rebooted or not and can take the right decision. Test case: ========== #include <alloca.h> #include <stdio.h> #include <sched.h> #include <unistd.h> #include <signal.h> #include <sys/reboot.h> #include <sys/types.h> #include <sys/wait.h> #include <linux/reboot.h> static int do_reboot(void *arg) { int *cmd = arg; if (reboot(*cmd)) printf("failed to reboot(%d): %m\n", *cmd); } int test_reboot(int cmd, int sig) { long stack_size = 4096; void *stack = alloca(stack_size) + stack_size; int status; pid_t ret; ret = clone(do_reboot, stack, CLONE_NEWPID | SIGCHLD, &cmd); if (ret < 0) { printf("failed to clone: %m\n"); return -1; } if (wait(&status) < 0) { printf("unexpected wait error: %m\n"); return -1; } if (!WIFSIGNALED(status)) { printf("child process exited but was not signaled\n"); return -1; } if (WTERMSIG(status) != sig) { printf("signal termination is not the one expected\n"); return -1; } return 0; } int main(int argc, char *argv[]) { int status; status = test_reboot(LINUX_REBOOT_CMD_RESTART, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_RESTART2, SIGHUP); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_RESTART2) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_HALT, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_HALT) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_POWER_OFF, SIGINT); if (status < 0) return 1; printf("reboot(LINUX_REBOOT_CMD_POWERR_OFF) succeed\n"); status = test_reboot(LINUX_REBOOT_CMD_CAD_ON, -1); if (status >= 0) { printf("reboot(LINUX_REBOOT_CMD_CAD_ON) should have failed\n"); return 1; } printf("reboot(LINUX_REBOOT_CMD_CAD_ON) has failed as expected\n"); return 0; } [akpm@linux-foundation.org: tweak and add comments] [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Tested-by: NSerge Hallyn <serge.hallyn@canonical.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Srivatsa S. Bhat 提交于
__any_online_cpu() is not optimal and also unnecessary. So, replace its use by faster cpumask_* operations. Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gilad Ben-Yossef 提交于
Add the on_each_cpu_cond() function that wraps on_each_cpu_mask() and calculates the cpumask of cpus to IPI by calling a function supplied as a parameter in order to determine whether to IPI each specific cpu. The function works around allocation failure of cpumask variable in CONFIG_CPUMASK_OFFSTACK=y by itereating over cpus sending an IPI a time via smp_call_function_single(). The function is useful since it allows to seperate the specific code that decided in each case whether to IPI a specific cpu for a specific request from the common boilerplate code of handling creating the mask, handling failures etc. [akpm@linux-foundation.org: s/gfpflags/gfp_flags/] [akpm@linux-foundation.org: avoid double-evaluation of `info' (per Michal), parenthesise evaluation of `cond_func'] [akpm@linux-foundation.org: s/CPU/CPUs, use all 80 cols in comment] Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Acked-by: NMichal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Reviewed-by: N"Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gilad Ben-Yossef 提交于
We have lots of infrastructure in place to partition multi-core systems such that we have a group of CPUs that are dedicated to specific task: cgroups, scheduler and interrupt affinity, and cpuisol= boot parameter. Still, kernel code will at times interrupt all CPUs in the system via IPIs for various needs. These IPIs are useful and cannot be avoided altogether, but in certain cases it is possible to interrupt only specific CPUs that have useful work to do and not the entire system. This patch set, inspired by discussions with Peter Zijlstra and Frederic Weisbecker when testing the nohz task patch set, is a first stab at trying to explore doing this by locating the places where such global IPI calls are being made and turning the global IPI into an IPI for a specific group of CPUs. The purpose of the patch set is to get feedback if this is the right way to go for dealing with this issue and indeed, if the issue is even worth dealing with at all. Based on the feedback from this patch set I plan to offer further patches that address similar issue in other code paths. This patch creates an on_each_cpu_mask() and on_each_cpu_cond() infrastructure API (the former derived from existing arch specific versions in Tile and Arm) and uses them to turn several global IPI invocation to per CPU group invocations. Core kernel: on_each_cpu_mask() calls a function on processors specified by cpumask, which may or may not include the local processor. You must not call this function with disabled interrupts or from a hardware interrupt handler or from a bottom half handler. arch/arm: Note that the generic version is a little different then the Arm one: 1. It has the mask as first parameter 2. It calls the function on the calling CPU with interrupts disabled, but this should be OK since the function is called on the other CPUs with interrupts disabled anyway. arch/tile: The API is the same as the tile private one, but the generic version also calls the function on the with interrupts disabled in UP case This is OK since the function is called on the other CPUs with interrupts disabled. Signed-off-by: NGilad Ben-Yossef <gilad@benyossef.com> Reviewed-by: NChristoph Lameter <cl@linux.com> Acked-by: NChris Metcalf <cmetcalf@tilera.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Avi Kivity <avi@redhat.com> Acked-by: NMichal Nazarewicz <mina86@mina86.org> Cc: Kosaki Motohiro <kosaki.motohiro@gmail.com> Cc: Milton Miller <miltonm@bga.com> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Most system calls taking flags first check that the flags passed in are valid, and that helps userspace to detect when new flags are supported. But swapon never did so: start checking now, to help if we ever want to support more swap_flags in future. It's difficult to get stray bits set in an int, and swapon is not widely used, so this is most unlikely to break any userspace; but we can just revert if it turns out to do so. Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Holepunching filesystems ext4 and xfs are using truncate_inode_pages_range but forgetting to unmap pages first (ocfs2 remembers). This is not really a bug, since races already require truncate_inode_page() to handle that case once the page is locked; but it can be very inefficient if the file being punched happens to be mapped into many vmas. Provide a drop-in replacement truncate_pagecache_range() which does the unmapping pass first, handling the awkward mismatch between arguments to truncate_inode_pages_range() and arguments to unmap_mapping_range(). Note that holepunching does not unmap privately COWed pages in the range: POSIX requires that we do so when truncating, but it's hard to justify, difficult to implement without an i_size cutoff, and no filesystem is attempting to implement it. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Ben Myers <bpm@sgi.com> Cc: Alex Elder <elder@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 MyungJoo Ham 提交于
The new API, pm_qos_update_request_timeout() is to provide a timeout with pm_qos_update_request. For example, pm_qos_update_request_timeout(req, 100, 1000), means that QoS request on req with value 100 will be active for 1000 microseconds. After 1000 microseconds, the QoS request thru req is reset. If there were another pm_qos_update_request(req, x) during the 1000 us, this new request with value x will override as this is another request on the same req handle. A new request on the same req handle will always override the previous request whether it is the conventional request or it is the new timeout request. Signed-off-by: NMyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Acked-by: NMark Gross <markgross@thegnar.org> Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-