- 22 6月, 2005 3 次提交
-
-
由 Wolfgang Wander 提交于
Ingo recently introduced a great speedup for allocating new mmaps using the free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and causes huge performance increases in thread creation. The downside of this patch is that it does lead to fragmentation in the mmap-ed areas (visible via /proc/self/maps), such that some applications that work fine under 2.4 kernels quickly run out of memory on any 2.6 kernel. The problem is twofold: 1) the free_area_cache is used to continue a search for memory where the last search ended. Before the change new areas were always searched from the base address on. So now new small areas are cluttering holes of all sizes throughout the whole mmap-able region whereas before small holes tended to close holes near the base leaving holes far from the base large and available for larger requests. 2) the free_area_cache also is set to the location of the last munmap-ed area so in scenarios where we allocate e.g. five regions of 1K each, then free regions 4 2 3 in this order the next request for 1K will be placed in the position of the old region 3, whereas before we appended it to the still active region 1, placing it at the location of the old region 2. Before we had 1 free region of 2K, now we only get two free regions of 1K -> fragmentation. The patch addresses thes issues by introducing yet another cache descriptor cached_hole_size that contains the largest known hole size below the current free_area_cache. If a new request comes in the size is compared against the cached_hole_size and if the request can be filled with a hole below free_area_cache the search is started from the base instead. The results look promising: Whereas 2.6.12-rc4 fragments quickly and my (earlier posted) leakme.c test program terminates after 50000+ iterations with 96 distinct and fragmented maps in /proc/self/maps it performs nicely (as expected) with thread creation, Ingo's test_str02 with 20000 threads requires 0.7s system time. Taking out Ingo's patch (un-patch available per request) by basically deleting all mentions of free_area_cache from the kernel and starting the search for new memory always at the respective bases we observe: leakme terminates successfully with 11 distinctive hardly fragmented areas in /proc/self/maps but thread creating is gringdingly slow: 30+s(!) system time for Ingo's test_str02 with 20000 threads. Now - drumroll ;-) the appended patch works fine with leakme: it ends with only 7 distinct areas in /proc/self/maps and also thread creation seems sufficiently fast with 0.71s for 20000 threads. Signed-off-by: NWolfgang Wander <wwc@rentec.com> Credit-to: "Richard Purdie" <rpurdie@rpsys.net> Signed-off-by: NKen Chen <kenneth.w.chen@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> (partly) Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Martin Hicks 提交于
This is the core of the (much simplified) early reclaim. The goal of this patch is to reclaim some easily-freed pages from a zone before falling back onto another zone. One of the major uses of this is NUMA machines. With the default allocator behavior the allocator would look for memory in another zone, which might be off-node, before trying to reclaim from the current zone. This adds a zone tuneable to enable early zone reclaim. It is selected on a per-zone basis and is turned on/off via syscall. Adding some extra throttling on the reclaim was also required (patch 4/4). Without the machine would grind to a crawl when doing a "make -j" kernel build. Even with this patch the System Time is higher on average, but it seems tolerable. Here are some numbers for kernbench runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run: wall user sys %cpu ctx sw. sleeps ---- ---- --- ---- ------ ------ No patch 1009 1384 847 258 298170 504402 w/patch, no reclaim 880 1376 667 288 254064 396745 w/patch & reclaim 1079 1385 926 252 291625 548873 These numbers are the average of 2 runs of 3 "make -j" runs done right after system boot. Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to seee that with reclaim the benchmark still finishes in a reasonable amount of time. I also looked at the NUMA hit/miss stats for the "make -j" runs and the reclaim doesn't make any difference when the machine is thrashing away. Doing a "make -j8" on a single node that is filled with page cache pages takes 700 seconds with reclaim turned on and 735 seconds without reclaim (due to remote memory accesses). The simple zone_reclaim syscall program is at http://www.bork.org/~mort/sgi/zone_reclaim.cSigned-off-by: NMartin Hicks <mort@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ingo Molnar 提交于
This patch implements a number of smp_processor_id() cleanup ideas that Arjan van de Ven and I came up with. The previous __smp_processor_id/_smp_processor_id/smp_processor_id API spaghetti was hard to follow both on the implementational and on the usage side. Some of the complexity arose from picking wrong names, some of the complexity comes from the fact that not all architectures defined __smp_processor_id. In the new code, there are two externally visible symbols: - smp_processor_id(): debug variant. - raw_smp_processor_id(): nondebug variant. Replaces all existing uses of _smp_processor_id() and __smp_processor_id(). Defined by every SMP architecture in include/asm-*/smp.h. There is one new internal symbol, dependent on DEBUG_PREEMPT: - debug_smp_processor_id(): internal debug variant, mapped to smp_processor_id(). Also, i moved debug_smp_processor_id() from lib/kernel_lock.c into a new lib/smp_processor_id.c file. All related comments got updated and/or clarified. I have build/boot tested the following 8 .config combinations on x86: {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT} I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other architectures are untested, but should work just fine.) Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NArjan van de Ven <arjan@infradead.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 21 6月, 2005 1 次提交
-
-
由 Dmitry Torokhov 提交于
sysfs: fix the rest of the kernel so if an attribute doesn't implement show or store method read/write will return -EIO instead of 0 or -EINVAL or -EPERM. Signed-off-by: NDmitry Torokhov <dtor@mail.ru> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 18 6月, 2005 1 次提交
-
-
由 Ingo Molnar 提交于
Do all timer zapping in exit_itimers. Signed-off-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 14 6月, 2005 1 次提交
-
-
由 Jan Kara 提交于
On one path, cond_resched_lock() fails to return true if it dropped the lock. We think this might be causing the crashes in JBD's log_do_checkpoint(). Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 01 6月, 2005 1 次提交
-
-
由 Roman Zippel 提交于
flush_icache_range() is used in two different situation - in binfmt_elf.c & co for user space mappings and module.c for kernel modules. On m68k flush_icache_range() doesn't know which data to flush, as it has separate address spaces and the pointer argument can be valid in either address space. First I considered splitting flush_icache_range(), but this patch is simpler. Setting the correct context gives flush_icache_range() enough information to flush the correct data. Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 29 5月, 2005 1 次提交
-
-
由 John Hawkes 提交于
The "unhandled interrupts" catcher, note_interrupt(), increments a global desc->irq_count and grossly damages scaling of very large systems, e.g., >192p ia64 Altix, because of this highly contented cacheline, especially for timer interrupts. 384p is severely crippled, and 512p is unuseable. All calls to note_interrupt() can be disabled by booting with "noirqdebug", but this disables the useful interrupt checking for all interrupts. I propose eliminating note_interrupt() for all per-CPU interrupts. This was the behavior of linux-2.6.10 and earlier, but in 2.6.11 a code restructuring added a call to note_interrupt() for per-CPU interrupts. Besides, note_interrupt() is a bit racy for concurrent CPU calls anyway, as the desc->irq_count++ increment isn't atomic (which, if done, would make scaling even worse). Signed-off-by: NJohn Hawkes <hawkes@sgi.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 27 5月, 2005 2 次提交
-
-
由 Paul Jackson 提交于
There is a race in the kernel cpuset code, between the code to handle notify_on_release, and the code to remove a cpuset. The notify_on_release code can end up trying to access a cpuset that has been removed. In the most common case, this causes a NULL pointer dereference from the routine cpuset_path. However all manner of bad things are possible, in theory at least. The existing code decrements the cpuset use count, and if the count goes to zero, processes the notify_on_release request, if appropriate. However, once the count goes to zero, unless we are holding the global cpuset_sem semaphore, there is nothing to stop another task from immediately removing the cpuset entirely, and recycling its memory. The obvious fix would be to always hold the cpuset_sem semaphore while decrementing the use count and dealing with notify_on_release. However we don't want to force a global semaphore into the mainline task exit path, as that might create a scaling problem. The actual fix is almost as easy - since this is only an issue for cpusets using notify_on_release, which the top level big cpusets don't normally need to use, only take the cpuset_sem for cpusets using notify_on_release. This code has been run for hours without a hiccup, while running a cpuset create/destroy stress test that could crash the existing kernel in seconds. This patch applies to the current -linus git kernel. Signed-off-by: NPaul Jackson <pj@sgi.com> Acked-by: NSimon Derr <simon.derr@bull.net> Acked-by: NDinakar Guniguntala <dino@in.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 David Woodhouse 提交于
Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
- 26 5月, 2005 1 次提交
-
-
由 David Woodhouse 提交于
While they were all just simple blobs it made sense to just free them as we walked through and logged them. Now that there are pointers to other objects which need refcounting, we might as well revert to _only_ logging them in audit_log_exit(), and put the code to free them properly in only one place -- in audit_free_aux(). Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org> ----------------------------------------------------------
-
- 25 5月, 2005 1 次提交
-
-
由 Kirill Korotaev 提交于
If SIGKILL does not have priority, we cannot instantly kill task before it makes some unexpected job. It can be critical, but we were unable to reproduce this easily until Heiko Carstens <Heiko.Carstens@de.ibm.com> reported this problem on LKML. Signed-Off-By: NKirill Korotaev <dev@sw.ru> Signed-Off-By: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 24 5月, 2005 2 次提交
-
-
由 David Woodhouse 提交于
It comes from the user; it needs to be escaped. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
These changes make processing of audit logs easier. Based on a patch from Steve Grubb <sgrubb@redhat.com> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
- 22 5月, 2005 2 次提交
-
-
由 David Woodhouse 提交于
Move audit_serial() into audit.c and use it to generate serial numbers on messages even when there is no audit context from syscall auditing. This allows us to disambiguate audit records when more than one is generated in the same millisecond. Based on a patch by Steve Grubb after he observed the problem. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Samuel Thibault 提交于
In _spin_unlock_bh(lock): do { \ _raw_spin_unlock(lock); \ preempt_enable(); \ local_bh_enable(); \ __release(lock); \ } while (0) there is no reason for using preempt_enable() instead of a simple preempt_enable_no_resched() Since we know bottom halves are disabled, preempt_schedule() will always return at once (preempt_count!=0), and hence preempt_check_resched() is useless here... This fixes it by using "preempt_enable_no_resched()" instead of the "preempt_enable()", and thus avoids the useless preempt_check_resched() just before re-enabling bottom halves. Signed-off-by: NSamuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 21 5月, 2005 4 次提交
-
-
由 Steve Grubb 提交于
The attached patch changes all occurrences of loginuid to auid. It also changes everything to %u that is an unsigned type. Signed-off-by: NSteve Grubb <sgrubb@redhat.com> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Steve Grubb 提交于
The original AVC_USER message wasn't consolidated with the new range of user messages. The attached patch fixes the kernel so the old messages work again. Signed-off-by: NSteve Grubb <sgrubb@redhat.com> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Stephen Smalley 提交于
This patch changes the SELinux AVC to defer logging of paths to the audit framework upon syscall exit, by saving a reference to the (dentry,vfsmount) pair in an auxiliary audit item on the current audit context for processing by audit_log_exit. Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Paul Jackson 提交于
This patch removes the entwining of cpusets and hotplug code in the "No more Mr. Nice Guy" case of sched.c move_task_off_dead_cpu(). Since the hotplug code is holding a spinlock at this point, we cannot take the cpuset semaphore, cpuset_sem, as would seem to be required either to update the tasks cpuset, or to scan up the nested cpuset chain, looking for the nearest cpuset ancestor that still has some CPUs that are online. So we just punt and blast the tasks cpus_allowed with all bits allowed. This reverts these lines of code to what they were before the cpuset patch. And it updates the cpuset Doc file, to match. The one known alternative to this that seems to work came from Dinakar Guniguntala, and required the hotplug code to take the cpuset_sem semaphore much earlier in its processing. So far as we know, the increased locking entanglement between cpusets and hot plug of this alternative approach is not worth doing in this case. Signed-off-by: NPaul Jackson <pj@sgi.com> Acked-by: NNathan Lynch <ntl@pobox.com> Acked-by: NDinakar Guniguntala <dino@in.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 19 5月, 2005 4 次提交
-
-
由 David Woodhouse 提交于
The limit on the number of outstanding audit messages was inadvertently removed with the switch to queuing skbs directly for sending by a kernel thread. Put it back again. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
Nobody does. Really, it gets very silly if auditd is recording its own actions. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
netlink_unicast() will attempt to reallocate and will free messages if the socket's rcvbuf limit is reached unless we give it an infinite timeout. So do that, from a kernel thread which is dedicated to spewing stuff up the netlink socket. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Steve Grubb 提交于
* If vsnprintf returns -1, it will mess up the sk buffer space accounting. This is fixed by not calling skb_put with bogus len values. * audit_log_hex was a loop that called audit_log_vformat with %02X for each character. This is very inefficient since conversion from unsigned character to Ascii representation is essentially masking, shifting, and byte lookups. Also, the length of the converted string is well known - it's twice the original. Fixed by rewriting the function. * audit_log_untrustedstring had no comments. This makes it hard for someone to understand what the string format will be. * audit_log_d_path was never fixed to use untrustedstring. This could mess up user space parsers. This was fixed to make a temp buffer, call d_path, and log temp buffer using untrustedstring. From: Steve Grubb <sgrubb@redhat.com> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
- 18 5月, 2005 2 次提交
-
-
由 David Woodhouse 提交于
It's silly to have to add explicit entries for new userspace messages as we invent them. Just treat all messages in the user range the same. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Brownell 提交于
This patch includes various tweaks in the messaging that appears during system pm state transitions: * Warn about certain illegal calls in the device tree, like resuming child before parent or suspending parent before child. This could happen easily enough through sysfs, or in some cases when drivers use device_pm_set_parent(). * Be more consistent about dev_dbg() tracing ... do it for resume() and shutdown() too, and never if the driver doesn't have that method. * Say which type of system sleep state is being entered. Except for the warnings, these only affect debug messaging. Signed-off-by: NDavid Brownell <dbrownell@users.sourceforge.net> Acked-by: NPavel Machek <pavel@ucw.cz> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 17 5月, 2005 4 次提交
-
-
由 William Lee Irwin III 提交于
profile=schedule parsing is not quite what it should be. First, str[7] is 'e', not ',', but then even if it did fall through, prof_on = SCHED_PROFILING would be clobbered inside if (get_option(...)) So a small amount of rearrangement is done in this patch to correct it. Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Matt Mackall 提交于
Move add_preferred_console out of CONFIG_PRINTK so serial console does the right thing. Signed-off-by: NMatt Mackall <mpm@selenic.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Zhang, Yanmin 提交于
On my IA64 machine, after kernel 2.6.12-rc3 boots, an edge-triggered interrupt (IRQ 46) keeps triggered over and over again. There is no IRQ 46 interrupt action handler. It has lots of impact on performance. Kernel 2.6.10 and its prior versions have no the problem. Basically, kernel 2.6.10 will mask the spurious edge interrupt if the interrupt is triggered for the second time and its status includes IRQ_DISABLE|IRQ_PENDING. Originally, IA64 kernel has its own specific _irq_desc definitions in file arch/ia64/kernel/irq.c. The definition initiates _irq_desc[irq].status to IRQ_DISABLE. Since kernel 2.6.11, it was moved to architecture independent codes, i.e. kernel/irq/handle.c, but kernel/irq/handle.c initiates _irq_desc[irq].status to 0 instead of IRQ_DISABLE. Signed-off-by: NZhang Yanmin <yanmin.zhang@intel.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 David Woodhouse 提交于
Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
- 14 5月, 2005 3 次提交
-
-
由 David Woodhouse 提交于
Der... if you use max_t it helps if you give it a type. Note to self: Always just apply the tested patches, don't try to port them by hand. You're not clever enough. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Steve Grubb 提交于
I'm going through the kernel code and have a patch that corrects several spelling errors in comments. From: Steve Grubb <sgrubb@redhat.com> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Steve Grubb 提交于
This patch adds more messages types to the audit subsystem so that audit analysis is quicker, intuitive, and more useful. Signed-off-by: NSteve Grubb <sgrubb@redhat.com> --- I forgot one type in the big patch. I need to add one for user space originating SE Linux avc messages. This is used by dbus and nscd. -Steve --- Updated to 2.6.12-rc4-mm1. -dwmw2 Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
- 13 5月, 2005 1 次提交
-
-
由 David Woodhouse 提交于
Otherwise, we will be repeatedly reallocating, even if we're only adding a few bytes at a time. Pointed out by Steve Grubb. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
- 11 5月, 2005 6 次提交
-
-
由 Chris Wright 提交于
Add audit_log_type to allow callers to specify type and pid when logging. Convert audit_log to wrapper around audit_log_type. Could have converted all audit_log callers directly, but common case is default of type AUDIT_KERNEL and pid 0. Update audit_log_start to take type and pid values when creating a new audit_buffer. Move sequences that did audit_log_start, audit_log_format, audit_set_type, audit_log_end, to simply call audit_log_type directly. This obsoletes audit_set_type and audit_set_pid, so remove them. Signed-off-by: NChris Wright <chrisw@osdl.org> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Chris Wright 提交于
Remove code conditionally dependent on CONFIG_AUDITSYSCALL from audit.c. Move these dependencies to audit.h with the rest. Signed-off-by: NChris Wright <chrisw@osdl.org> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Chris Wright 提交于
Audit now actually requires netlink. So make it depend on CONFIG_NET, and remove the inline dependencies on CONFIG_NET. Signed-off-by: NChris Wright <chrisw@osdl.org> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 Chris Wright 提交于
Signed-off-by: NChris Wright <chrisw@osdl.org> Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
We're not allowed to use args twice; we need to use va_copy. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-
由 David Woodhouse 提交于
Let audit_expand() know how much it's expected to grow the buffer, in the case that we have that information to hand. Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
-