- 02 10月, 2006 1 次提交
-
-
由 Arnd Bergmann 提交于
Some architectures provide an execve function that does not set errno, but instead returns the result code directly. Rename these to kernel_execve to get the right semantics there. Moreover, there is no reasone for any of these architectures to still provide __KERNEL_SYSCALLS__ or _syscallN macros, so remove these right away. [akpm@osdl.org: build fix] [bunk@stusta.de: build fix] Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Andi Kleen <ak@muc.de> Acked-by: NPaul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ian Molton <spyro@f2s.com> Cc: Mikael Starvik <starvik@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Hirokazu Takata <takata.hirokazu@renesas.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp> Cc: Richard Curnow <rc@rc0.org.uk> Cc: William Lee Irwin III <wli@holomorphy.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jeff Dike <jdike@addtoit.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp> Cc: Chris Zankel <chris@zankel.net> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: NAdrian Bunk <bunk@stusta.de> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 27 9月, 2006 1 次提交
-
-
由 Tony Luck 提交于
This reverts commit 26362554. Jakub Jelinek provided the missing futex_atomic_cmpxchg_inatomic() function, so now it should be safe to re-enable these syscalls. Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 09 9月, 2006 1 次提交
-
-
由 Andreas Schwab 提交于
The syscalls set/get_robust_list must not be wired up until futex_atomic_cmpxchg_inatomic is implemented. Otherwise the kernel will hang in handle_futex_death. Signed-off-by: NAndreas Schwab <schwab@suse.de> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 01 7月, 2006 1 次提交
-
-
由 Jörn Engel 提交于
Signed-off-by: NJörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: NAdrian Bunk <bunk@stusta.de>
-
- 23 6月, 2006 1 次提交
-
-
由 Christoph Lameter 提交于
move_pages() is used to move individual pages of a process. The function can be used to determine the location of pages and to move them onto the desired node. move_pages() returns status information for each page. long move_pages(pid, number_of_pages_to_move, addresses_of_pages[], nodes[] or NULL, status[], flags); The addresses of pages is an array of void * pointing to the pages to be moved. The nodes array contains the node numbers that the pages should be moved to. If a NULL is passed instead of an array then no pages are moved but the status array is updated. The status request may be used to determine the page state before issuing another move_pages() to move pages. The status array will contain the state of all individual page migration attempts when the function terminates. The status array is only valid if move_pages() completed successfullly. Possible page states in status[]: 0..MAX_NUMNODES The page is now on the indicated node. -ENOENT Page is not present -EACCES Page is mapped by multiple processes and can only be moved if MPOL_MF_MOVE_ALL is specified. -EPERM The page has been mlocked by a process/driver and cannot be moved. -EBUSY Page is busy and cannot be moved. Try again later. -EFAULT Invalid address (no VMA or zero page). -ENOMEM Unable to allocate memory on target node. -EIO Unable to write back page. The page must be written back in order to move it since the page is dirty and the filesystem does not provide a migration function that would allow the moving of dirty pages. -EINVAL A dirty page cannot be moved. The filesystem does not provide a migration function and has no ability to write back pages. The flags parameter indicates what types of pages to move: MPOL_MF_MOVE Move pages that are only mapped by the process. MPOL_MF_MOVE_ALL Also move pages that are mapped by multiple processes. Requires sufficient capabilities. Possible return codes from move_pages() -ENOENT No pages found that would require moving. All pages are either already on the target node, not present, had an invalid address or could not be moved because they were mapped by multiple processes. -EINVAL Flags other than MPOL_MF_MOVE(_ALL) specified or an attempt to migrate pages in a kernel thread. -EPERM MPOL_MF_MOVE_ALL specified without sufficient priviledges. or an attempt to move a process belonging to another user. -EACCES One of the target nodes is not allowed by the current cpuset. -ENODEV One of the target nodes is not online. -ESRCH Process does not exist. -E2BIG Too many pages to move. -ENOMEM Not enough memory to allocate control array. -EFAULT Parameters could not be accessed. A test program for move_pages() may be found with the patches on ftp.kernel.org:/pub/linux/kernel/people/christoph/pmig/patches-2.6.17-rc4-mm3 From: Christoph Lameter <clameter@sgi.com> Detailed results for sys_move_pages() Pass a pointer to an integer to get_new_page() that may be used to indicate where the completion status of a migration operation should be placed. This allows sys_move_pags() to report back exactly what happened to each page. Wish there would be a better way to do this. Looks a bit hacky. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Jes Sorensen <jes@trained-monkey.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andi Kleen <ak@muc.de> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 26 4月, 2006 1 次提交
-
-
由 Jens Axboe 提交于
sys_splice() moves data to/from pipes with a file input/output. sys_vmsplice() moves data to a pipe, with the input being a user address range instead. This uses an approach suggested by Linus, where we can hold partial ranges inside the pages[] map. Hopefully this will be useful for network receive support as well. Signed-off-by: NJens Axboe <axboe@suse.de>
-
- 11 4月, 2006 1 次提交
-
-
由 Jens Axboe 提交于
Basically an in-kernel implementation of tee, which uses splice and the pipe buffers as an intelligent way to pass data around by reference. Where the user space tee consumes the input and produces a stdout and file output, this syscall merely duplicates the data inside a pipe to another pipe. No data is copied, the output just grabs a reference to the input pipe data. Signed-off-by: NJens Axboe <axboe@suse.de>
-
- 07 4月, 2006 1 次提交
-
-
由 Tony Luck 提交于
Join the dots to enable Ingo's robut futex syscalls. Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 05 4月, 2006 1 次提交
-
-
由 Tony Luck 提交于
Also reserve syscall numbers for {set,get}_robust_list Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 31 3月, 2006 1 次提交
-
-
由 Jens Axboe 提交于
This adds support for the sys_splice system call. Using a pipe as a transport, it can connect to files or sockets (latter as output only). From the splice.c comments: "splice": joining two ropes together by interweaving their strands. This is the "extended pipe" functionality, where a pipe is used as an arbitrary in-memory buffer. Think of a pipe as a small kernel buffer that you can use to transfer data from one end to the other. The traditional unix read/write is extended with a "splice()" operation that transfers data buffers to or from a pipe buffer. Named by Larry McVoy, original implementation from Linus, extended by Jens to support splicing to files and fixing the initial implementation bugs. Signed-off-by: NJens Axboe <axboe@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 17 2月, 2006 1 次提交
-
-
由 Jack Steiner 提交于
It appears that if auditing is enabled, the kernel fails to check for pending signals before returning to user mode. Signed-off-by: NJack Steiner <steiner@sgi.com> Acked-by: NKen Chen <kenneth.w.chen@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 09 2月, 2006 1 次提交
-
-
由 Janak Desai 提交于
Registers system call for the ia64 architecture. Reserves space for ppoll and pselect, and adds unshare at system call number 1296. Signed-off-by: NJanak Desai <janak@us.ibm.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 07 2月, 2006 1 次提交
-
-
由 Chen, Kenneth W 提交于
Wire up the ia64 syscalls for *at() functions. Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 27 1月, 2006 1 次提交
-
-
由 Keith Owens 提交于
The only user of the MCA/INIT sigdelayed code (SGI's I/O probing) has moved from the kernel into SAL. Delete the MCA/INIT sigdelayed code. Signed-off-by: NKeith Owens <kaos@sgi.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 09 1月, 2006 1 次提交
-
-
由 Christoph Lameter 提交于
sys_migrate_pages implementation using swap based page migration This is the original API proposed by Ray Bryant in his posts during the first half of 2005 on linux-mm@kvack.org and linux-kernel@vger.kernel.org. The intent of sys_migrate is to migrate memory of a process. A process may have migrated to another node. Memory was allocated optimally for the prior context. sys_migrate_pages allows to shift the memory to the new node. sys_migrate_pages is also useful if the processes available memory nodes have changed through cpuset operations to manually move the processes memory. Paul Jackson is working on an automated mechanism that will allow an automatic migration if the cpuset of a process is changed. However, a user may decide to manually control the migration. This implementation is put into the policy layer since it uses concepts and functions that are also needed for mbind and friends. The patch also provides a do_migrate_pages function that may be useful for cpusets to automatically move memory. sys_migrate_pages does not modify policies in contrast to Ray's implementation. The current code here is based on the swap based page migration capability and thus is not able to preserve the physical layout relative to it containing nodeset (which may be a cpuset). When direct page migration becomes available then the implementation needs to be changed to do a isomorphic move of pages between different nodesets. The current implementation simply evicts all pages in source nodeset that are not in the target nodeset. Patch supports ia64, i386 and x86_64. Signed-off-by: NChristoph Lameter <clameter@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 17 9月, 2005 1 次提交
-
-
由 Peter Chubb 提交于
This patch removes some compilation warnings, mostly trivially. acpi.c fix also noted by Kenji Kaneshige. Signed-off-by; Peter Chubb <peterc@gelato.unsw.edu.au> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 10 9月, 2005 2 次提交
-
-
由 Chen, Kenneth W 提交于
For architecture like ia64, the switch stack structure is fairly large (currently 528 bytes). For context switch intensive application, we found that significant amount of cache misses occurs in switch_to() function. The following patch adds a hook in the schedule() function to prefetch switch stack structure as soon as 'next' task is determined. This allows maximum overlap in prefetch cache lines for that structure. Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Sam Ravnborg 提交于
Delete obsolete stuff from arch Makefile Rename file to asm-offsets.h The trick used in the arch Makefile to circumvent the circular dependency is kept. Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
-
- 08 9月, 2005 1 次提交
-
-
由 Chen, Kenneth W 提交于
The reenabling of psr.ic should really belong to dtr mapping code block. It make the fall through code fast since it doesn't need to execute the predicated-off instruction. Logically make more sense as well since psr.ic was turned off in .map code block. Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 02 8月, 2005 1 次提交
-
-
由 Ingo Molnar 提交于
This removes sys_set_zone_reclaim() for now. While i'm sure Martin is trying to solve a real problem, we must not hard-code an incomplete and insufficient approach into a syscall, because syscalls are pretty much for eternity. I am quite strongly convinced that this syscall must not hit v2.6.13 in its current form. Firstly, the syscall lacks basic syscall design: e.g. it allows the global setting of VM policy for unprivileged users. (!) [ Imagine an Oracle installation and a SAP installation on the same NUMA box fighting over the 'optimal' setting for this flag. What will they do? Will they try to set the flag to their own preferred value every second or so? ] Secondly, it was added based on a single datapoint from Martin: http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2 where Martin characterizes the numbers the following way: ' Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to see that with reclaim the benchmark still finishes in a reasonable amount of time. ' in other words: the fundamental problem has likely not been solved, only a tendential move into the right direction has been observed, and a handful of numbers were picked out of a set of hugely variable results, without showing the variability data. How much variance is there run-to-run? I'd really suggest to first walk the walk and see what's needed to get stable & predictable kernel compilation numbers on that NUMA box, before adding random syscalls to tune a particular aspect of the VM ... which approach might not even matter once the whole picture has been analyzed and understood! The third, most important point is that the syscall exposes VM tuning internals in a completely unstructured way. What sense does it make to have a _GLOBAL_ per-node setting for 'should we go to another node for reclaim'? If then it might make sense to do this per-app, via numalib or so. The change is minimalistic in that it doesnt remove the syscall and the underlying infrastructure changes, only the user-visible changes. We could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a'ka /proc/sys/vm/swappiness, but even that looks quite counterproductive when the generic approach is that we are trying to reduce the number of external factors in the VM balance picture. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 28 7月, 2005 1 次提交
-
-
由 Robert Love 提交于
Attached patch adds the inotify syscalls to ia64. Signed-off-by: NRobert Love <rml@novell.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 09 7月, 2005 1 次提交
-
-
由 H. J. Lu 提交于
Both 2.4 and 2.6 kernels need this patch for the next binutils. Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 28 6月, 2005 1 次提交
-
-
由 Jens Axboe 提交于
This updates the CFQ io scheduler to the new time sliced design (cfq v3). It provides full process fairness, while giving excellent aggregate system throughput even for many competing processes. It supports io priorities, either inherited from the cpu nice value or set directly with the ioprio_get/set syscalls. The latter closely mimic set/getpriority. This import is based on my latest from -mm. Signed-off-by: NJens Axboe <axboe@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 22 6月, 2005 1 次提交
-
-
由 Martin Hicks 提交于
This is the core of the (much simplified) early reclaim. The goal of this patch is to reclaim some easily-freed pages from a zone before falling back onto another zone. One of the major uses of this is NUMA machines. With the default allocator behavior the allocator would look for memory in another zone, which might be off-node, before trying to reclaim from the current zone. This adds a zone tuneable to enable early zone reclaim. It is selected on a per-zone basis and is turned on/off via syscall. Adding some extra throttling on the reclaim was also required (patch 4/4). Without the machine would grind to a crawl when doing a "make -j" kernel build. Even with this patch the System Time is higher on average, but it seems tolerable. Here are some numbers for kernbench runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run: wall user sys %cpu ctx sw. sleeps ---- ---- --- ---- ------ ------ No patch 1009 1384 847 258 298170 504402 w/patch, no reclaim 880 1376 667 288 254064 396745 w/patch & reclaim 1079 1385 926 252 291625 548873 These numbers are the average of 2 runs of 3 "make -j" runs done right after system boot. Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to seee that with reclaim the benchmark still finishes in a reasonable amount of time. I also looked at the NUMA hit/miss stats for the "make -j" runs and the reclaim doesn't make any difference when the machine is thrashing away. Doing a "make -j8" on a single node that is filled with page cache pages takes 700 seconds with reclaim turned on and 735 seconds without reclaim (due to remote memory accesses). The simple zone_reclaim syscall program is at http://www.bork.org/~mort/sgi/zone_reclaim.cSigned-off-by: NMartin Hicks <mort@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 11 5月, 2005 1 次提交
-
-
由 David Mosberger-Tang 提交于
Some time ago, GAS was fixed to bring the .spillpsp directive in line with the Intel assembler manual (there was some disagreement as to whether or not there is a built-in 16-byte offset). Unfortunately, there are two places in the kernel where this directive is used in handwritten assembly files and those of course relied on the "buggy" behavior. As a result, when using a "fixed" assembler, the kernel picks up the UNaT bits from the wrong place (off by 16) and randomly sets NaT bits on the scratch registers. This can be noticed easily by looking at a coredump and finding various scratch registers with unexpected NaT values. The patch below fixes this by using the .spillsp directive instead, which works correctly no matter what assembler is in use. Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 04 5月, 2005 1 次提交
-
-
由 David Mosberger-Tang 提交于
Patch below fixes 3 trivial typos which are caught by the new assembler (v2.169.90). Please apply. [Note: fix to memcpy that was also part of this patch was separately applied from patches by H.J. and Andreas ... so the delta here only has the other two fixes. -Tony] Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 01 5月, 2005 1 次提交
-
-
由 Stephen Rothwell 提交于
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 28 4月, 2005 7 次提交
-
-
由 David Mosberger-Tang 提交于
This patch switches the srlz.i in ia64_leave_kernel() to srlz.d. As per architecture manual, the former is needed only to ensure that the clearing of PSR.IC is seen by the VHPT for subsequent instruction fetches. However, since the remainder of the code (up to and including the RFI instruction) is mapped by a pinned TLB entry, there is no chance of an iTLB miss and we don't care whether or not the VHPT sees PSR.IC cleared. Since srlz.d is substantially cheaper than srlz.i, this should shave off a few cycles off the interrupt path (unverified though; I'm not setup to measure this at the moment). Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 David Mosberger-Tang 提交于
Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 David Mosberger-Tang 提交于
Reschedule code to read ar.bsp as early as possible. To enable this, don't bother clearing some of the registers when we're returning to kernel stacks. Also, instead of trying to support the pNonSys case (which makes no sense), do a bugcheck instead (with break 0). Finally, remove a clear of r14 which is a left-over from the previous patch. Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 David Mosberger-Tang 提交于
Why is this a good idea? Clearing b7 to 0 is guaranteed to do us no good and writing it with __kernel_syscall_via_epc() yields a 6 cycle improvement _if_ the application performs another EPC-based system- call without overwriting b7, which is not all that uncommon. Well worth the minimal cost of 1 bundle of code. Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 David Mosberger-Tang 提交于
Decreases syscall overhead by approximately 6 cycles. Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 David Mosberger-Tang 提交于
This by itself is good for a 1-2 cycle speed up. Effect is bigger when combined with the later patches. Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 David Mosberger-Tang 提交于
Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 26 4月, 2005 2 次提交
-
-
由 David Mosberger-Tang 提交于
Sadly, I goofed in this syscall-tuning patch: ChangeSet 1.1966.1.40 2005/01/22 13:31:05 davidm@hpl.hp.com [IA64] Improve ia64_leave_syscall() for McKinley-type cores. Optimize ia64_leave_syscall() a bit better for McKinley-type cores. The patch looks big, but that's mostly due to renaming r16/r17 to r2/r3. Good for a 13 cycle improvement. The problem is that the size of the physical stacked registers was loaded into the wrong register (r3 instead of r17). Since r17 by coincidence always had the value 1, this had the effect of turning rse_clear_invalid into a no-op. That poses the risk of leaking kernel state back to user-land and is hence not acceptable. The fix below is simple, but unfortunately it costs us about 28 cycles in syscall overhead. ;-( Unfortunately, there isn't much we can do about that since those registers have to be cleared one way or another. --david Signed-off-by: NTony Luck <tony.luck@intel.com>
-
由 David Mosberger-Tang 提交于
Recently I noticed that clearing ar.ssd/ar.csd right before srlz.d is causing significant stalling in the syscall path. The patch below fixes that by moving the register-writes after srlz.d. On a Madison, this drops break-based getpid() from 241 to 226 cycles (-15 cycles). Signed-off-by: NDavid Mosberger-Tang <davidm@hpl.hp.com> Signed-off-by: NTony Luck <tony.luck@intel.com>
-
- 17 4月, 2005 1 次提交
-
-
由 Linus Torvalds 提交于
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
-