- 29 10月, 2010 3 次提交
-
-
由 Eric Paris 提交于
We disabled the ability to build fanotify in commit 7c534773. This reverts that commit and allows people to build fanotify. Signed-off-by: NEric Paris <eparis@redhat.com>
-
由 Figo.zhang 提交于
Signed-off-by: NFigo.zhang <figo1802@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ingo Molnar 提交于
Today's git tree fails to build on !CONFIG_BLOCK, due to upstream commit 367a51a3 ("fs: Add FITRIM ioctl"): include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’ include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’ include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’ The commit adds uint64_t type usage to fs.h, but linux/types.h is not included explicitly - it's only included implicitly via linux/blk_types.h, and there only if CONFIG_BLOCK is enabled. Add the explicit #include to fix this. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 10月, 2010 37 次提交
-
-
SYNOPSIS size[4] Tfsync tag[2] fid[4] datasync[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. If datasync flag is specified data will be fleshed but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled. Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
由 M. Mohan Kumar 提交于
Synopsis size[4] TReadlink tag[2] fid[4] size[4] RReadlink tag[2] target[s] Description Readlink is used to return the contents of the symoblic link referred by fid. Contents of symboic link is returned as a response. target[s] - Contents of the symbolic link referred by fid. Signed-off-by: NM. Mohan Kumar <mohan@in.ibm.com> Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
由 M. Mohan Kumar 提交于
Use V9FS_MAGIC as the file system type while filling kernel statfs strucutre instead of using host file system magic number. Also move the definition of V9FS_MAGIC from v9fs.h to standard magic.h file. Signed-off-by: NM. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
由 M. Mohan Kumar 提交于
Synopsis size[4] TGetlock tag[2] fid[4] getlock[n] size[4] RGetlock tag[2] getlock[n] Description TGetlock is used to test for the existence of byte range posix locks on a file identified by given fid. The reply contains getlock structure. If the lock could be placed it returns F_UNLCK in type field of getlock structure. Otherwise it returns the details of the conflicting locks in the getlock structure getlock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK start[8] - Starting offset for lock length[8] - Number of bytes to check for the lock If length is 0, check for lock in all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock/owns the task in case of reply client[4] - Client id of the system that owns the process which has the conflicting lock Signed-off-by: NM. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
由 M. Mohan Kumar 提交于
Synopsis size[4] TLock tag[2] fid[4] flock[n] size[4] RLock tag[2] status[1] Description Tlock is used to acquire/release byte range posix locks on a file identified by given fid. The reply contains status of the lock request flock structure: type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK flags[4] - Flags could be either of P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a conflicting lock exists, wait for that lock to be released. P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is trying to reclaim a lock after a server restrart (due to crash) start[8] - Starting offset for lock length[8] - Number of bytes to lock If length is 0, lock all bytes starting at the location 'start' through to the end of file pid[4] - PID of the process that wants to take lock client_id[4] - Unique client id status[1] - Status of the lock request, can be P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or P9_LOCK_GRACE(3) P9_LOCK_SUCCESS - Request was successful P9_LOCK_BLOCKED - A conflicting lock is held by another process P9_LOCK_ERROR - Error while processing the lock request P9_LOCK_GRACE - Server is in grace period, it can't accept new lock requests in this period (except locks with P9_LOCK_FLAGS_RECLAIM flag set) Signed-off-by: NM. Mohan Kumar <mohan@in.ibm.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
SYNOPSIS size[4] Tfsync tag[2] fid[4] size[4] Rfsync tag[2] DESCRIPTION The Tfsync transaction transfers ("flushes") all modified in-core data of file identified by fid to the disk device (or other permanent storage device) where that file resides. Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
由 Arun R Bharadwaj 提交于
Signed-off-by: NArun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
由 Theodore Ts'o 提交于
Unfortunately perf can't deal with anything other than direct structure accesses in the TP_printk() section. It will drop dead when it sees jbd2_dev_to_name() in the "print fmt" section of the tracepoint. Addresses-Google-Bug: 3138508 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Commit 84061e07 fixed an accounting bug only to introduce the possibility of a kernel OOPS if the journal has a non-zero j_errno field indicating that the file system had detected a fs inconsistency. After the journal replay, if the journal superblock indicates that the file system has an error, this indication is transfered to the file system and then ext4_commit_super() is called to write this to the disk. But since the percpu counters are now initialized after the journal replay, the call to ext4_commit_super() will cause a kernel oops since it needs to use the percpu counters the ext4 superblock structure. The fix is to skip setting the ext4 free block and free inode fields if the percpu counter has not been set. Thanks to Ken Sumrall for reporting and analyzing the root causes of this bug. Addresses-Google-Bug: #3054080 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
This is analogous to Jan Kara's commit, f446daae mm: implement writeback livelock avoidance using page tagging but since we forked write_cache_pages, we need to reimplement it there (and in ext4_da_writepages, since range_cyclic handling was moved to there) If you start a large buffered IO to a file, and then set fsync after it, you'll find that fsync does not complete until the other IO stops. If you continue re-dirtying the file (say, putting dd with conv=notrunc in a loop), when fsync finally completes (after all IO is done), it reports via tracing that it has written many more pages than the file contains; in other words it has synced and re-synced pages in the file multiple times. This then leads to problems with our writeback_index update, since it advances it by pages written, and essentially sets writeback_index off the end of the file... With the following patch, we only sync as much as was dirty at the time of the sync. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Adds an filesystem independent ioctl to allow implementation of file system batched discard support. I takes fstrim_range structure as an argument. fstrim_range is definec in the include/fs.h and its definition is as follows. struct fstrim_range { start; len; minlen; } start - first Byte to trim len - number of Bytes to trim from start minlen - minimum extent length to trim, free extents shorter than this number of Bytes will be ignored. This will be rounded up to fs block size. It is also possible to specify NULL as an argument. In this case the arguments will set itself as follows: start = 0; len = ULLONG_MAX; minlen = 0; So it will trim the whole file system at one run. After the FITRIM is done, the number of actually discarded Bytes is stored in fstrim_range.len to give the user better insight on how much storage space has been really released for wear-leveling. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Many tracepoints were populating an ext4_allocation_context to pass in, but this requires a slab allocation even when tracepoints are off. In fact, 4 of 5 of these allocations were only for tracing. In addition, we were only using a small fraction of the 144 bytes of this structure for this purpose. We can do away with all these alloc/frees of the ac and simply pass in the bits we care about, instead. I tested this by turning on tracing and running through xfstests on x86_64. I did not actually do anything with the trace output, however. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Our QA reported an oops in the ext4_mb_release_group_pa tracing, and Josef Bacik pointed out that it was because we may have a non-null but uninitialized ac_inode in the allocation context. I can reproduce it when running xfstests with ext4 tracepoints on, on a CONFIG_SLAB_DEBUG kernel. We call trace_ext4_mb_release_group_pa from 2 places, ext4_mb_discard_group_preallocations and ext4_mb_discard_lg_preallocations In both cases we allocate an ac as a container just for tracing (!) and never fill in the ac_inode. There's no reason to be assigning, testing, or printing it as far as I can see, so just remove it from the tracepoint. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NJosef Bacik <josef@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
This is done the same way as helper sb_issue_discard for blkdev_issue_discard. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Wen Congyang 提交于
ac->inode is set to null in function ext4_mb_release_group_pa(), and then trace_ext4_mballoc_discard(ac) is called, the kernel will panic. BUG: unable to handle kernel NULL pointer dereference at 000000a4 IP: [<f87e1714>] ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4] *pdpt = 0000000000abd001 *pde = 0000000000000000 Oops: 0000 [#1] SMP Pid: 550, comm: flush-8:16 Not tainted 2.6.36-rc1 #1 SE7320EP2/Altos G530 EIP: 0060:[<f87e1714>] EFLAGS: 00010206 CPU: 1 EIP is at ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4] EAX: f32ac840 EBX: f3f1cf88 ECX: f32ac840 EDX: 00000000 ESI: f32ac83c EDI: f880b9d8 EBP: 00000000 ESP: f4b77ae4 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process flush-8:16 (pid: 550, ti=f4b76000 task=f613e540 task.ti=f4b76000) Call Trace: [<f87f5ac1>] ? ext4_mb_release_group_pa+0x121/0x150 [ext4] [<f87f8356>] ? ext4_mb_discard_group_preallocations+0x336/0x400 [ext4] [<f87fb7f1>] ? ext4_mb_new_blocks+0x3d1/0x4f0 [ext4] [<c05a6c5b>] ? __make_request+0x10b/0x440 [<f87f1fb4>] ? ext4_ext_map_blocks+0x1334/0x1980 [ext4] [<c04ac78a>] ? rb_reserve_next_event+0xaa/0x3b0 [<f87d18d6>] ? ext4_map_blocks+0xd6/0x1d0 [ext4] [<f87d2da7>] ? mpage_da_map_blocks+0xc7/0x8a0 [ext4] [<c04c8a68>] ? find_get_pages_tag+0x38/0x110 [<c04d23a5>] ? __pagevec_release+0x15/0x20 [<f87d3ca5>] ? ext4_da_writepages+0x2b5/0x5d0 [ext4] [<c04cfbe0>] ? __writepage+0x0/0x30 [<c04d0e34>] ? do_writepages+0x14/0x30 [<c0526600>] ? writeback_single_inode+0xa0/0x240 [<c0526971>] ? writeback_sb_inodes+0xc1/0x180 [<c0526ab8>] ? writeback_inodes_wb+0x88/0x140 [<c0526d7b>] ? wb_writeback+0x20b/0x320 [<c045aca7>] ? lock_timer_base+0x27/0x50 [<c0526fe0>] ? wb_do_writeback+0x150/0x190 [<c05270a8>] ? bdi_writeback_thread+0x88/0x1f0 [<c043b680>] ? complete+0x40/0x60 [<c0527020>] ? bdi_writeback_thread+0x0/0x1f0 [<c0469474>] ? kthread+0x74/0x80 [<c0469400>] ? kthread+0x0/0x80 [<c040a23e>] ? kernel_thread_helper+0x6/0x10 Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Brian King 提交于
This fixes a hang seen in jbd2_journal_release_jbd_inode on a lot of Power 6 systems running with ext4. When we get in the hung state, all I/O to the disk in question gets blocked where we stay indefinitely. Looking at the task list, I can see we are stuck in jbd2_journal_release_jbd_inode waiting on a wake up. I added some debug code to detect this scenario and dump additional data if we were stuck in jbd2_journal_release_jbd_inode for longer than 30 minutes. When it hit, I was able to see that i_flags was 0, suggesting we missed the wake up. This patch changes i_flags to be an unsigned long, uses bit operators to access it, and adds barriers around the accesses. Prior to applying this patch, we were regularly hitting this hang on numerous systems in our test environment. After applying the patch, the hangs no longer occur. Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Shawn Bohrer 提交于
This make epoll use hrtimers for the timeout value which prevents epoll_wait() from timing out up to a millisecond early. This mirrors the behavior of select() and poll(). Signed-off-by: NShawn Bohrer <shawn.bohrer@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: NDavide Libenzi <davidel@xmailserver.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zimny Lech 提交于
Signed-off-by: NZimny Lech <napohybelskurwysynom2010@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kyungmin Park 提交于
As each board and system has different memory for ramoops. It's better to define the platform data instead of module params. [akpm@linux-foundation.org: fix ramoops_remove() return type] Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Cc: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stefani Seibold 提交于
Add a new __kfifo_int_must_check_helper() helper function, which is needed for kfifo_alloc() to return the right signed integer value. The origin __kfifo_must_check_helper() helper was renamed into __kfifo_uint_must_check_helper() to show the sign which is expected and returned. (And revert the temporary disabling of __kfifo_must_check_helper()) Signed-off-by: NStefani Seibold <stefani@seibold.net> Acked-by: NRandy Dunlap <randy.dunlap@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michael Holzheu 提交于
The taskstats interface uses microsecond granularity for the user and system time values. The conversion from cputime to the taskstats values uses the cputime_to_msecs primitive which effectively limits the granularity to milliseconds. Add the cputime_to_usecs primitive for architectures that have better, more precise CPU time values. Remove cputime_to_msecs primitive because there are no more users left. Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Cc: Luck Tony <tony.luck@intel.com> Cc: Shailabh Nagar <nagar1234@in.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Shailabh Nagar <nagar@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
RapidIO spec v.2.1 adds Idle Sequence 2 into LP-Serial Physical Layer. The fix ensures that corresponding bits are not corrupted during error handling. Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
Detects RIO link to the already enumerated device and properly sets links between device objects. Changes to the enumeration/discovery logic: 1. Use Master Enable bit to signal end of the enumeration - agents may start their discovery process as soon as they see this bit set (Component Tag register was used before for this purpose). 2. Enumerator sets Component Tag (!= 0) immediately during device setup. This allows to identify the device if the redundant route exists in a RIO system. Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
Add the RIO switch driver and definitions for IDT CPS-1848 and CPS-1616 Gen2 devices. Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
1. Change to create attribute "routes" only for switches. 2. Add a switch-specific callback to create/remove proprietary attributes. Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
The default error-stopped state handler provides recovery mechanism as defined by RIO specification. Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
Create back and forward links between RIO devices. These links are intended for use by error management and hot-plug extensions. Links for redundant RIO connections between switches are not set (will be fixed in a separate patch). Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
The switch port information is obtained and stored during RIO device setup. Therefore repeated reads from Switch Port Information CAR may be removed. Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexandre Bounine 提交于
This set of RapidIO patches extends support for standard error recovery mechanism and adds new IDT Gen2 sRIO switch devices - CPS-1848 and CPS-1616. Implementation of the standard error-stopped state recovery mechanism (as defined by the RapidIO specification) is required for the new switches. Version 2 of this set of patches addresses received comments and fixes an error notification setup issue found in the idt_gen2.c after the first version was released. This patch: Make RapidIO devices appear in /sys/devices/rapidio directory instead of top of /sys/devices directory. Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com> Cc: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Micha Nelissen <micha@neli.hopto.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Paul Fulghum 提交于
Add support for extended byte synchronous mode feature of hardware. Signed-off-by: NPaul Fulghum <paulkf@microgate.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
In /proc/stat, the number of per-IRQ event is shown by making a sum each irq's events on all cpus. But we can make use of kstat_irqs(). kstat_irqs() do the same calculation, If !CONFIG_GENERIC_HARDIRQ, it's not a big cost. (Both of the number of cpus and irqs are small.) If a system is very big and CONFIG_GENERIC_HARDIRQ, it does for_each_irq() for_each_cpu() - look up a radix tree - read desc->irq_stat[cpu] This seems not efficient. This patch adds kstat_irqs() for CONFIG_GENRIC_HARDIRQ and change the calculation as for_each_irq() look up radix tree for_each_cpu() - read desc->irq_stat[cpu] This reduces cost. A test on (4096cpusp, 256 nodes, 4592 irqs) host (by Jack Steiner) %time cat /proc/stat > /dev/null Before Patch: 2.459 sec After Patch : .561 sec [akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks] [akpm@linux-foundation.org: fix unused variable 'per_irq_sum'] Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: NJack Steiner <steiner@sgi.com> Acked-by: NJack Steiner <steiner@sgi.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
/proc/stat shows the total number of all interrupts to each cpu. But when the number of IRQs are very large, it take very long time and 'cat /proc/stat' takes more than 10 secs. This is because sum of all irq events are counted when /proc/stat is read. This patch adds "sum of all irq" counter percpu and reduce read costs. The cost of reading /proc/stat is important because it's used by major applications as 'top', 'ps', 'w', etc.... A test on a mechin (4096cpu, 256 nodes, 4592 irqs) shows %time cat /proc/stat > /dev/null Before Patch: 12.627 sec After Patch: 2.459 sec Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: NJack Steiner <steiner@sgi.com> Acked-by: NJack Steiner <steiner@sgi.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
Oleg Nesterov pointed out we have to prevent multiple-threads-inside-exec itself and we can reuse ->cred_guard_mutex for it. Yes, concurrent execve() has no worth. Let's move ->cred_guard_mutex from task_struct to signal_struct. It naturally prevent multiple-threads-inside-exec. Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Acked-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Namhyung Kim 提交于
lock_task_sighand() grabs sighand->siglock in case of returning non-NULL but unlock_task_sighand() releases it unconditionally. This leads sparse to complain about the lock context imbalance. Rename and wrap lock_task_sighand() using __cond_lock() macro to make sparse happy. Suggested-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Namhyung Kim 提交于
Fix up the arguments to arch_ptrace() to take account of the fact that @addr and @data are now unsigned long rather than long as of a preceding patch in this series. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Cc: <linux-arch@vger.kernel.org> Acked-by: NRoland McGrath <roland@redhat.com> Acked-by: NDavid Howells <dhowells@redhat.com> Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Namhyung Kim 提交于
Since userspace API of ptrace syscall defines @addr and @data as void pointers, it would be more appropriate to define them as unsigned long in kernel. Therefore related functions are changed also. 'unsigned long' is typically used in other places in kernel as an opaque data type and that using this helps cleaning up a lot of warnings from sparse. Suggested-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Lezcano 提交于
The ns_cgroup is a control group interacting with the namespaces. When a new namespace is created, a corresponding cgroup is automatically created too. The cgroup name is the pid of the process who did 'unshare' or the child of 'clone'. This cgroup is tied with the namespace because it prevents a process to escape the control group and use the post_clone callback, so the child cgroup inherits the values of the parent cgroup. Unfortunately, the more we use this cgroup and the more we are facing problems with it: (1) when a process unshares, the cgroup name may conflict with a previous cgroup with the same pid, so unshare or clone return -EEXIST (2) the cgroup creation is out of control because there may have an application creating several namespaces where the system will automatically create several cgroups in his back and let them on the cgroupfs (eg. a vrf based on the network namespace). (3) the mix of (1) and (2) force an administrator to regularly check and clean these cgroups. This patchset removes the ns_cgroup by adding a new flag to the cgroup and the cgroupfs mount option. It enables the copy of the parent cgroup when a child cgroup is created. We can then safely remove the ns_cgroup as this flag brings a compatibility. We have now to manually create and add the task to a cgroup, which is consistent with the cgroup framework. This patch: Sent as an answer to a previous thread around the ns_cgroup. https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html It adds a control file 'clone_children' for a cgroup. This control file is a boolean specifying if the child cgroup should be a clone of the parent cgroup or not. The default value is 'false'. This flag makes the child cgroup to call the post_clone callback of all the subsystem, if it is available. At present, the cpuset is the only one which had implemented the post_clone callback. The option can be set at mount time by specifying the 'clone_children' mount option. Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Acked-by: NPaul Menage <menage@google.com> Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Jamal Hadi Salim <hadi@cyberus.ca> Cc: Matt Helsley <matthltc@us.ibm.com> Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-