- 30 11月, 2010 1 次提交
-
-
由 Mike Galbraith 提交于
A recurring complaint from CFS users is that parallel kbuild has a negative impact on desktop interactivity. This patch implements an idea from Linus, to automatically create task groups. Currently, only per session autogroups are implemented, but the patch leaves the way open for enhancement. Implementation: each task's signal struct contains an inherited pointer to a refcounted autogroup struct containing a task group pointer, the default for all tasks pointing to the init_task_group. When a task calls setsid(), a new task group is created, the process is moved into the new task group, and a reference to the preveious task group is dropped. Child processes inherit this task group thereafter, and increase it's refcount. When the last thread of a process exits, the process's reference is dropped, such that when the last process referencing an autogroup exits, the autogroup is destroyed. At runqueue selection time, IFF a task has no cgroup assignment, its current autogroup is used. Autogroup bandwidth is controllable via setting it's nice level through the proc filesystem: cat /proc/<pid>/autogroup Displays the task's group and the group's nice level. echo <nice level> > /proc/<pid>/autogroup Sets the task group's shares to the weight of nice <level> task. Setting nice level is rate limited for !admin users due to the abuse risk of task group locking. The feature is enabled from boot by default if CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via the boot option noautogroup, and can also be turned on/off on the fly via: echo [01] > /proc/sys/kernel/sched_autogroup_enabled ... which will automatically move tasks to/from the root task group. Signed-off-by: NMike Galbraith <efault@gmx.de> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ] Signed-off-by: NIngo Molnar <mingo@elte.hu> LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 20 11月, 2010 2 次提交
-
-
由 Lukas Czerner 提交于
Filesystem independent ioctl was rejected as not common enough to be in core vfs ioctl. Since we still need to access to this functionality this commit adds ext4 specific ioctl EXT4_IOC_TRIM to dispatch ext4_trim_fs(). It takes fstrim_range structure as an argument. fstrim_range is definec in the include/linux/fs.h and its definition is as follows. struct fstrim_range { __u64 start; __u64 len; __u64 minlen; } start - first Byte to trim len - number of Bytes to trim from start minlen - minimum extent length to trim, free extents shorter than this number of Bytes will be ignored. This will be rounded up to fs block size. After the FITRIM is done, the number of actually discarded Bytes is stored in fstrim_range.len to give the user better insight on how much storage space has been really released for wear-leveling. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
There was concern that FITRIM ioctl is not common enough to be included in core vfs ioctl, as Christoph Hellwig pointed out there's no real point in dispatching this out to a separate vector instead of just through ->ioctl. So this commit removes ioctl_fstrim() from vfs ioctl and trim_fs from super_operation structure. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 19 11月, 2010 2 次提交
-
-
由 Darrick J. Wong 提交于
At the start of ext4_fill_super, ret is set to -EINVAL, and any failure path out of that function returns ret. However, the generic_check_addressable clause sets ret = 0 (if it passes), which means that a subsequent failure (e.g. a group checksum error) returns 0 even though the mount should fail. This causes vfs_kern_mount in turn to think that the mount succeeded, leading to an oops. A simple fix is to avoid using ret for the generic_check_addressable check, which was last changed in commit 30ca22c7. Signed-off-by: NDarrick J. Wong <djwong@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Sage Weil 提交于
One of the readdir filldir_t callers was passing the raw ceph 64-bit ino instead of the hashed 32-bit one, producing an EOVERFLOW in the filler callback. Fix this by calling the ceph_vino_to_ino() helper to do the conversion. Reported-by: NJan Smets <jan.smets@alcatel-lucent.com> Tested-by: NJan Smets <jan.smets@alcatel-lucent.com> Signed-off-by: NSage Weil <sage@newdream.net>
-
- 18 11月, 2010 5 次提交
-
-
由 yangsheng 提交于
In jbd2_journal_init_dev(), we need make sure the journal structure is fully initialzied before calling jbd2_stats_proc_init(). Reviewed-by: NAndreas Dilger <andreas.dilger@oracle.com> Signed-off-by: Nyangsheng <sheng.yang@oracle.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dan Carpenter 提交于
If the the li_request_list was empty then it returned with the lock held. Instead of adding a "goto unlock" I just removed that special case and let it go past the empty list_for_each_safe(). Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Markus Trippelsdorf 提交于
ext4_end_bio calls put_page and kmem_cache_free before calling SetPageUpdate(). This can result in setting the PageUptodate bit on random pages and causes the following BUG: BUG: Bad page state in process rm pfn:52e54 page:ffffea0001222260 count:0 mapcount:0 mapping: (null) index:0x0 arch kernel: page flags: 0x4000000000000008(uptodate) Fix the problem by moving put_io_page() after the SetPageUpdate() call. Thanks to Hugh Dickins for analyzing this problem. Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Arnd Bergmann 提交于
Lock_kernel is gone from the code, so the comments should be updated, too. nfsd now uses lock_flocks instead of lock_kernel to protect against posix file locks. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NJ. Bruce Fields <bfields@redhat.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 11月, 2010 1 次提交
-
-
由 Catalin Marinas 提交于
Strings allocated via kmemdup() in nfs_readdir_make_qstr() are referenced from the nfs_cache_array which is stored in a page cache page. Kmemleak does not scan such pages and it reports several false positives. This patch annotates the string->name pointer so that kmemleak does not consider it a real leak. Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 16 11月, 2010 5 次提交
-
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Fix up the issue that array->eof_index needs to be able to be set even if array->size == 0. Ensure that we catch all important memory allocation error conditions and/or kmap() failures. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
This reverts commit 80e60639. This change requires further fixes to ensure that the open doesn't succeed if the lookup later results in a regular file being created. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Paulius Zaleckas 提交于
Trying to mount NFS (root partition in my case) fails if CONFIG_NFS_V3 is not selected. nfs_validate_mount_data() returns EPROTONOSUPPORT, because of this check: #ifndef CONFIG_NFS_V3 if (args->version == 3) goto out_v3_not_compiled; #endif /* !CONFIG_NFS_V3 */ and args->version was always initialized to 3. It was working in 2.6.36 Signed-off-by: NPaulius Zaleckas <paulius.zaleckas@gmail.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Nick Bowler reports: There are no unusual messages on the client... but I just logged into the server and I see lots of messages of the following form: nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! nfsd: request from insecure port (192.168.8.199:35766)! Bisected to commit 92476850 (SUNRPC: Properly initialize sock_xprt.srcaddr in all cases) Apparently, removing the 'transport->srcaddr.ss_family = family' from xs_create_sock() triggers this due to nlmclnt_lookup_host() incorrectly initialising the srcaddr family to AF_UNSPEC. Reported-by: NNick Bowler <nbowler@elliptictech.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 15 11月, 2010 1 次提交
-
-
由 Steven Whitehouse 提交于
This area of the code has always been a bit delicate due to the subtleties of lock ordering. The problem is that for "normal" alloc/dealloc, we always grab the inode locks first and the rgrp lock later. In order to ensure no races in looking up the unlinked, but still allocated inodes, we need to hold the rgrp lock when we do the lookup, which means that we can't take the inode glock. The solution is to borrow the technique already used by NFS to solve what is essentially the same problem (given an inode number, look up the inode carefully, checking that it really is in the expected state). We cannot do that directly from the allocation code (lock ordering again) so we give the job to the pre-existing delete workqueue and carry on with the allocation as normal. If we find there is no space, we do a journal flush (required anyway if space from a deallocation is to be released) which should block against the pending deallocations, so we should always get the space back. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 13 11月, 2010 1 次提交
-
-
由 Tao Ma 提交于
Commit 83fd9c7f changes l_level, l_requested and l_blocking of ocfs2_lock_res from int to unsigned char. But actually it is initially as -1(ocfs2_lock_res_init_common) which correspoding to 255 for unsigned char. So the whole dlm lock mechanism doesn't work now which means a disaster to ocfs2. Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 12 11月, 2010 3 次提交
-
-
由 Dave Jones 提交于
WARN_ONCE is a bit strong for a deprecation warning, given that it spews a huge backtrace. Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sage Weil 提交于
We start at offset 2 for the leftmost frag, and 0 for subsequent frags. When we reach the end (rightmost), we go back to 2. This fixes readdir on fragmented (large) directories. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Clear fi->last_name when it's freed. The only caller is rewinddir() (or equivalent lseek). Signed-off-by: NSage Weil <sage@newdream.net>
-
- 11 11月, 2010 10 次提交
-
-
由 Christoph Hellwig 提交于
In commit 20cb52eb, titled "xfs: simplify xfs_vm_writepage" I added an assert that any !mapped and uptodate buffers are not dirty. That asserts turns out to trigger a lot when running fsx on filesystems with small block sizes. The reason for that is that the assert is simply incorrect. !mapped and uptodate just mean this buffer covers a hole, and whenever we do a set_page_dirty we mark all blocks in the page dirty, no matter if they have data or not. So remove the assert, and update the comment above the condition to match reality. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 J. Bruce Fields 提交于
A minor oversight from f7347ce4, "fasync: re-organize fasync entry insertion to allow it under a spinlock": this cleanup-on-error was only needed to handle -ENOMEM. Now that we're preallocating it's unneeded. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 J. Bruce Fields 提交于
We must also free the passed-in lease in the case it wasn't used because an existing lease was upgrade/downgraded or already existed. Note the nfsd caller doesn't care because it's fl_change callback returns an error in those cases. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Christoph Hellwig 提交于
XFS does not need it's inodes to actuall be hashed in the VFS inode cache, but we require the inode to be marked hashed for the writeback code to work. Insted of using insert_inode_hash, which requires a second inode_lock roundtrip after the partial merge of the inode scalability patches in 2.6.37-rc simply use the new hlist_add_fake helper to mark it hashed without requiring a lock or touching a global cache line. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Andi Kleen reported that gcc-4.5 gives lots of warnings for him inside the XFS code. It turned out most of them are due to the quota stubs beeing macros, and gcc now complaining about macros evaluating to 0 that are not assigned to variables. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
The filestreams code may take the iolock on the parent inode while holding it on a child. This is the only place in XFS where we take both the child and parent iolock, so just telling lockdep about it is enough. The lock flag required for that was already added as part of the ilock lockdep annotations and unused so far. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
The delayed write buffer split trace currently issues a trace for every buffer it scans. These buffers are not necessarily queued for delayed write. Indeed, when buffers are pinned, there can be thousands of traces of buffers that aren't actually queued for delayed write and the ones that are are lost in the noise. Move the trace point to record only buffers that are split out for IO to be issued on. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
The walk fails to decrement the per-ag reference count when the non-blocking walk fails to obtain the per-ag reclaim lock, leading to an assert failure on debug kernels when unmounting a filesystem. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Kulikov Vasiliy 提交于
al_hreq is copied from userland. If al_hreq.buflen is not properly aligned then xfs_attr_list will ignore the last bytes of kbuf. These bytes are unitialized. It leads to leaking of contents of kernel stack memory. Signed-off-by: NVasiliy Kulikov <segooon@gmail.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
We promised to do this for 2.6.37, and the code looks stable enough to keep that promise. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 10 11月, 2010 6 次提交
-
-
由 Sergey Senozhatsky 提交于
Commit 4221a991 "Add RCU check for find_task_by_vpid()" introduced rcu_lockdep_assert to find_task_by_pid_ns= Assertion failed in sys_ioprio_get. The patch is fixing assertion failure in ioprio_set as well. kernel/pid.c:419 invoked rcu_dereference_check() without protection! stack backtrace: Pid: 4254, comm: iotop Not tainted Call Trace: [<ffffffff810656f2>] lockdep_rcu_dereference+0xaa/0xb2 [<ffffffff81053c67>] find_task_by_pid_ns+0x4f/0x68 [<ffffffff81053c9d>] find_task_by_vpid+0x1d/0x1f [<ffffffff811104e2>] sys_ioprio_get+0x50/0x2da [<ffffffff81002182>] system_call_fastpath+0x16/0x1b V2: rcu critical section expanded according to comment by Paul E. McKenney Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Daniel J Blueman 提交于
With 2.6.37-rc1, I observe sys_ioprio_set not taking the RCU lock [1] across access to the task credentials. Inspecting the code in fs/ioprio.c, the tasklist_lock is held for read across the __task_cred call, which is presumably sufficient to prevent the task credentials becoming stale. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- kernel/pid.c:419 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 1 1 lock held by start-stop-daem/2246: #0: (tasklist_lock){.?.?..}, at: [<ffffffff811a2dfa>] sys_ioprio_set+0x8a/0x400 stack backtrace: Pid: 2246, comm: start-stop-daem Not tainted 2.6.37-rc1-330cd+ #2 Call Trace: [<ffffffff8109f5f4>] lockdep_rcu_dereference+0xa4/0xc0 [<ffffffff81085651>] find_task_by_pid_ns+0x81/0x90 [<ffffffff8108567d>] find_task_by_vpid+0x1d/0x20 [<ffffffff811a3160>] sys_ioprio_set+0x3f0/0x400 [<ffffffff816efa79>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff81003482>] system_call_fastpath+0x16/0x1b Take the RCU lock for read across acquiring the pointer to the task credentials and dereferencing it. Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com> Fixed up by Jens to fix missing rcu_read_unlock() on mismatches. Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Jens Axboe 提交于
If the iovec is being set up in a way that causes uaddr + PAGE_SIZE to overflow, we could end up attempting to map a huge number of pages. Check for this invalid input type. Reported-by: NDan Rosenberg <drosenberg@vsecurity.com> Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Jens Axboe 提交于
Reported-by: NDan Rosenberg <drosenberg@vsecurity.com> Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Sage Weil 提交于
We used to infer alignment of IOs within a page based on the file offset, which assumed they matched. This broke with direct IO that was not aligned to pages (e.g., 512-byte aligned IO). We were also trusting the alignment specified in the OSD reply, which could have been adjusted by the server. Explicitly specify the page alignment when setting up OSD IO requests. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The offset/length arguments aren't used. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 09 11月, 2010 3 次提交
-
-
由 Suresh Jayaraman 提交于
Andrew Hendry reported a kmemleak warning in 2.6.37-rc1 while editing a text file with gedit over cifs. unreferenced object 0xffff88022ee08b40 (size 32): comm "gedit", pid 2524, jiffies 4300160388 (age 2633.655s) hex dump (first 32 bytes): 5c 2e 67 6f 75 74 70 75 74 73 74 72 65 61 6d 2d \.goutputstream- 35 42 41 53 4c 56 00 de 09 00 00 00 2c 26 78 ee 5BASLV......,&x. backtrace: [<ffffffff81504a4d>] kmemleak_alloc+0x2d/0x60 [<ffffffff81136e13>] __kmalloc+0xe3/0x1d0 [<ffffffffa0313db0>] build_path_from_dentry+0xf0/0x230 [cifs] [<ffffffffa031ae1e>] cifs_setattr+0x9e/0x770 [cifs] [<ffffffff8115fe90>] notify_change+0x170/0x2e0 [<ffffffff81145ceb>] sys_fchmod+0x10b/0x140 [<ffffffff8100c172>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff The commit 1025774c that removed inode_setattr() seems to have introduced this memleak by returning early without freeing 'full_path'. Reported-by: NAndrew Hendry <andrew.hendry@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Meelis Roos 提交于
Fix openpromfs compilation by adding a missing semicolon in fs/openpromfs/inode.c openprom_mount(). Signed-off-by: NMeelis Roos <mroos@linux.ee> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jeff Layton 提交于
Commit 13cfb733 made cifs_ioctl use the tlink attached to the cifsFileInfo for a filp. This ignores the case of an open directory however, which in CIFS can have a NULL private_data until a readdir is done on it. This patch re-adds the NULL pointer checks that were removed in commit 50ae28f0 and moves the setting of tcon and "caps" variables lower. Long term, a better fix would be to establish a f_op->open routine for directories that populates that field at open time, but that requires some other changes to how readdir calls are handled. Reported-by: NKjell Rune Skaaraas <kjella79@yahoo.no> Reviewed-and-Tested-by: NSuresh Jayaraman <sjayaraman@suse.de> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-