- 01 July 2006, 40 commits
-
Committed by Doug Thompson

Remove add_mc_to_global_list(); the next patch reimplements this function with different semantics.

1. Reimplement add_mc_to_global_list() with semantics that allow the caller to determine the ID number for a mem_ctl_info structure, then modify edac_mc_add_mc() so that the caller specifies the ID number for the new mem_ctl_info structure. Platform-specific code should be able to assign the ID numbers in a platform-specific manner. For instance, on Opteron it makes sense to have the ID of the mem_ctl_info structure match the ID of the node that the memory controller belongs to.
2. Modify callers of edac_mc_add_mc() so they use the new semantics.

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
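As a rough illustration of the new convention, a caller-chosen index would flow through a signature shaped like the sketch below; the exact edac_mc_add_mc() prototype and the caller shown are assumptions, not code from the patch.

```c
/* Sketch only: the caller, not the EDAC core, chooses the controller ID. */
int edac_mc_add_mc(struct mem_ctl_info *mci, int mc_idx);

/* Hypothetical Opteron-style caller: reuse the node ID of the memory
 * controller as the mem_ctl_info ID, as the description suggests. */
static int opteron_register_mc(struct mem_ctl_info *mci, int node_id)
{
	/* ... driver fills in mci before this point ... */
	return edac_mc_add_mc(mci, node_id);
}
```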
-
Committed by Doug Thompson

Change MC drivers from using CVS revision strings for their version numbers; each driver now has its own local version string. Remove some PCI dependencies from the core EDAC module and make the code 'struct device' centric instead of 'struct pci_dev' centric. Most of the code changes here are from a patch by Dave Jiang. It may be best to eventually move the PCI-specific code into a separate source file.

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Matt LaPlante

Fix two typos in Documentation/IPMI.

Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Josh Triplett

Add __acquire annotations to rcu_read_lock and rcu_read_lock_bh, and add __release annotations to rcu_read_unlock and rcu_read_unlock_bh. This allows sparse to detect improperly paired calls to these functions.

Signed-off-by: Josh Triplett <josh@freedesktop.org>
Acked-by: Paul E. McKenney <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
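For readers unfamiliar with sparse lock annotations, a minimal sketch of the pattern (the macro bodies here are illustrative, not lifted from the patch):

```c
/* Sketch: __acquire/__release tell sparse that an RCU read-side
 * critical section is entered and left, so an unlock without a
 * matching lock (or vice versa) is flagged at build time. */
#define rcu_read_lock() \
	do { \
		preempt_disable(); \
		__acquire(RCU); \
	} while (0)

#define rcu_read_unlock() \
	do { \
		__release(RCU); \
		preempt_enable(); \
	} while (0)
```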
-
Committed by Adrian Bunk

- Make __cm206_init() __init (required since it calls the __init cm206_init()).
- Make the needlessly global bcdbin() static.
- Remove a comment with an obsolete compile command.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Adrian Bunk

Don't show a menu that can't be entered due to lack of contents on arm (the options are only available on arm26).

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Ian Molton <spyro@f2s.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andrew Victor

This corrects the comments describing the 'enabled' and 'pending' flags in struct rtc_wkalrm of include/linux/rtc.h.

Signed-off-by: Andrew Victor <andrew@sanpeople.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andrew Morton

Fix a bug identified by Zou Nan hai <nanhai.zou@intel.com>:

If the system is in state SYSTEM_BOOTING and need_resched() is true, cond_resched() returns true even though it didn't reschedule. Consequently need_resched() remains true and JBD locks up. Fix that by teaching cond_resched() to only return true if it really did call schedule().

cond_resched_lock() and cond_resched_softirq() have a problem too. If we're in SYSTEM_BOOTING state and need_resched() is true, these functions will drop the lock and will then try to call schedule(), but the SYSTEM_BOOTING state will prevent schedule() from being called. So on return, need_resched() will still be true, but cond_resched_lock() has to return 1 to tell the caller that the lock was dropped. The caller will probably lock up.

Bottom line: if these functions dropped the lock, they _must_ call schedule() to clear need_resched(). Make it so.

Also, uninline __cond_resched(). It's largeish, and slowpath.

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
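A minimal sketch of the corrected contract, assuming the boot-state check gates the call from outside (the helper split is illustrative, not the patch's exact code):

```c
/* Sketch: cond_resched() may only report 1 when schedule() really ran,
 * so the SYSTEM_BOOTING guard must sit in front of the call rather
 * than silently suppressing schedule() inside it. */
int cond_resched(void)
{
	if (need_resched() && system_state == SYSTEM_RUNNING) {
		__cond_resched();	/* unconditionally calls schedule() */
		return 1;
	}
	return 0;
}
```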
-
Committed by Jeff Dike

The x86_64 build requires a definition for __raw_writeq.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
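A hedged sketch of what such a definition typically looks like, modeled on the 32-bit __raw_write* helpers; the exact form in the patch may differ:

```c
/* Sketch: a raw 64-bit MMIO store with no byte swapping and no
 * barriers, mirroring the usual __raw_writel pattern. */
static inline void __raw_writeq(unsigned long long val,
				volatile void *addr)
{
	*(volatile unsigned long long *)addr = val;
}
```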
-
Committed by Jeff Dike

I run an x86_64 kernel with i386 userspace (Ubuntu Dapper) and decided to try out UML today. I found that UML wasn't quite aware of biarch compilers (which Ubuntu i386 ships). A fix similar to what was done for x86_64 should probably be committed (see http://marc.theaimsgroup.com/?l=linux-kernel&m=113425940204010&w=2). Without the FLAGS changes, the build will fail at a number of places, and without the LINK change, the final link will fail.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jeff Dike

Forgot to remove arch/um/kernel/time.c when it was mostly moved to arch/um/os-Linux.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jeff Dike

Remove the um_time() and um_stime() syscalls, since they are identical to the system-wide ones.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jeff Dike

do_timer() must be called with xtime_lock held. I'm not sure boot_timer_handler() needs this; however, I don't think it hurts: it simply disables irqs and takes a spinlock.

Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Jeff Dike

-mm in combination with an FC5 init started dying with 'stderr=1' because init didn't like the lack of /dev/console and exited. The problem was that the stderr console, which is intended to dump printk output to the terminal before the regular console is initialized, isn't a tty, and so can't make /dev/console operational.

However, since it is registered first, the normal console, when it is registered, doesn't become the preferred console and isn't attached to /dev/console. Thus, /dev/console is never operational.

This patch makes the stderr console unregister itself in an initcall, which is late enough that the normal console is registered. When that happens, the normal console will become the preferred console and will be able to run /dev/console.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
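The unregister step might look roughly like the sketch below; the initcall level and the stderr_console symbol name are assumptions drawn from the description.

```c
/* Sketch: tear down the early stderr console late in boot, after the
 * real console has registered, so the latter backs /dev/console. */
static int __init stderr_console_exit(void)
{
	unregister_console(&stderr_console);	/* assumed console struct */
	return 0;
}
late_initcall(stderr_console_exit);
```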
-
Committed by Jeff Dike

Fix an off-by-one bug in temp file creation: seeking to the desired length and writing a byte resulted in the file being one byte longer than expected.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
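A small userspace sketch of the corrected sizing logic; the function name is illustrative, not UML's actual helper:

```c
#include <unistd.h>

/* Sketch: for a file of exactly `len` bytes, the padding byte must
 * land at offset len - 1. Seeking to `len` and then writing yields a
 * file of len + 1 bytes, which was the bug described above. */
static int set_temp_file_length(int fd, off_t len)
{
	char zero = 0;

	if (lseek(fd, len - 1, SEEK_SET) == (off_t)-1)
		return -1;
	if (write(fd, &zero, 1) != 1)
		return -1;
	return 0;
}
```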
-
Committed by Jeff Dike

When parsing /proc/mounts looking for a tmpfs mount on /dev/shm, if a string that we are looking for is split across reads, then it won't be recognized. Fix this by refilling the buffer whenever we advance the cursor.

Signed-off-by: Jeff Dike <jdike@addtoit.com>
Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Adrian Bunk

Fix the INIT_ENV_ARG_LIMIT dependencies to what seems to have been intended. Spotted by Jean-Luc Leger.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andrew Morton

Presently, smp_processor_id() isn't necessarily set up until setup_arch(). But it's used in boot_cpu_init() and printk(), and perhaps in other places, prior to setup_arch() being called. So provide a new smp_setup_processor_id() which is called before anything else, and wire it up for Voyager (which boots on a CPU other than #0, and broke).

Cc: James Bottomley <James.Bottomley@steeleye.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
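One plausible shape for such a hook is a weak no-op default that an architecture overrides; this is a sketch under that assumption, not the patch's actual code.

```c
/* Sketch: the default does nothing; an architecture such as Voyager
 * overrides it to establish the boot CPU's ID early. */
void __init __attribute__((weak)) smp_setup_processor_id(void)
{
}

/* Called first thing in start_kernel(), before boot_cpu_init() and
 * the first printk():
 *
 *	asmlinkage void __init start_kernel(void)
 *	{
 *		smp_setup_processor_id();
 *		...
 *	}
 */
```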
-
Committed by David Quigley

Add a new security hook definition for the sys_ioprio_get operation. At present, the SELinux hook function implementation for this hook is identical to the getscheduler implementation, but a separate hook is introduced to allow this check to be specialized in the future if necessary. This patch also creates a helper function, get_task_ioprio, which handles the access check in addition to retrieving the ioprio value for the task.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Jens Axboe <axboe@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
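A sketch of what such a helper plausibly looks like, assuming the hook follows the getscheduler naming pattern as security_task_getioprio():

```c
/* Sketch: mediate access first, then return the task's I/O priority. */
static int get_task_ioprio(struct task_struct *p)
{
	int ret;

	ret = security_task_getioprio(p);	/* assumed hook name */
	if (ret)
		return ret;
	return p->ioprio;
}
```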
-
Committed by David Quigley

This patch updates the USB core to save and pass the sending task's secid when sending signals upon AIO completion, so that proper security checking can be applied by security modules.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by David Quigley

This patch adds a call to the extended security_task_kill hook, introduced by the prior patch, to the kill_proc_info_as_uid function so that these signals can be properly mediated by security modules. It also updates the existing hook call in check_kill_permission.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by David Quigley

This patch extends the security_task_kill hook to handle signals sent by AIO completion. In this case, the secid of the task responsible for the signal needs to be obtained and saved earlier, so a security_task_getsecid() hook is added, and this saved value is then passed to the extended task_kill hook for use in checking.

Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov>
Signed-off-by: James Morris <jmorris@namei.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

Post and discussion: http://marc.theaimsgroup.com/?t=115074342800003&r=1&w=2

Code in __node_shrink() duplicates code in cache_reap(). Add a new function, drain_freelist, that removes slabs with objects that are already free, and use that in various places. This eliminates the __node_shrink() function and provides the interrupt holdoff reduction from slab_free to code that used to call __node_shrink().

[akpm@osdl.org: build fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
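A trimmed sketch of what a drain_freelist-style helper looks like, assuming the slab internals of that era (kmem_list3, slabs_free); the details are illustrative rather than the patch's exact code.

```c
/* Sketch: pull fully-free slabs off the node's free list one at a
 * time, taking the list lock only briefly per slab so interrupt
 * holdoff stays short. */
static int drain_freelist(struct kmem_cache *cache,
			  struct kmem_list3 *l3, int tofree)
{
	int nr_freed = 0;

	while (nr_freed < tofree && !list_empty(&l3->slabs_free)) {
		struct slab *slabp;

		spin_lock_irq(&l3->list_lock);
		if (list_empty(&l3->slabs_free)) {	/* raced with another drainer */
			spin_unlock_irq(&l3->list_lock);
			break;
		}
		slabp = list_entry(l3->slabs_free.prev, struct slab, list);
		list_del(&slabp->list);
		l3->free_objects -= cache->num;
		spin_unlock_irq(&l3->list_lock);

		slab_destroy(cache, slabp);	/* free pages outside the lock */
		nr_freed++;
	}
	return nr_freed;
}
```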
-
Committed by Christoph Lameter

The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM.

We use a simple increment of per cpu variables. In order to avoid the most severe races we disable preempt. Preempt does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare: we may only lose one increment or so, and there is no requirement (at least not in the kernel) that the vm event counters have to be accurate.

In the non-preempt case this results in a simple increment for each counter. For many architectures this will be reduced by the compiler to a single instruction. This single instruction is atomic for i386 and x86_64, and therefore even the rare race condition in an interrupt is avoided for both architectures in most cases.

The patchset also adds an off switch for embedded systems that allows building Linux kernels without these counters.

The implementation of these counters is through inline code that hopefully results in only a single increment instruction being emitted (i386, x86_64) or in the increment being hidden through instruction concurrency (EPIC architectures such as ia64 can get that done).

Benefits:
- VM event counter operations usually reduce to a single inline instruction on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.

References:
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
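The counting scheme described above reduces to something like this sketch (the names follow the text; the exact per-cpu structure is an assumption):

```c
/* Sketch: per-cpu event counters bumped under preempt_disable() only.
 * An interrupt racing with the increment can cost one count; that is
 * tolerated by design, per the description above. */
static inline void count_vm_event(enum vm_event_item item)
{
	preempt_disable();
	__get_cpu_var(vm_event_states).event[item]++;
	preempt_enable();
}
```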
-
Committed by Christoph Lameter

The numa statistics are really event counters. But they are per node, and so we have had special treatment for these counters through additional fields on the pcp structure. We can now use the per zone nature of the zoned VM counters to realize these. This will shrink the size of the pcp structure on NUMA systems. We will have some room to add additional per zone counters that will all still fit in the same cacheline.

Bits  Prior pcp size          Size after patch        We can add
----  ----------------------  ----------------------  ----------------------
64    128 bytes (16 words)    80 bytes (10 words)     48
32    76 bytes (19 words)     56 bytes (14 words)     8 (64 byte cacheline)
                                                      72 (128 byte)

Remove the special statistics for numa and replace them with zoned vm counters. This has the side effect that global sums of these events now show up in /proc/vmstat.

Also take the opportunity to move the zone_statistics() function from page_alloc.c into vmstat.c.

Discussions:
V2 http://marc.theaimsgroup.com/?t=115048227000002&r=1&w=2

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Andrew Morton

No callers.

Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

Remove writeback state. We can remove some functions now that were needed to calculate the page state for writeback control, since these statistics are now directly available.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

Conversion of nr_bounce to a per zone counter.

nr_bounce is only used for proc output, so it could be left as an event counter. However, the event counters may not be accurate, and nr_bounce is categorizing types of pages in a zone, so we really need this to also be a per zone counter.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

Conversion of nr_unstable to a per zone counter.

We need to do some special modifications to the nfs code, since there are multiple cases of disposition and we need to have a page ref for proper accounting. This converts the last critical page state of the VM, and therefore we need to remove several functions that were depending on GET_PAGE_STATE_LAST in order to make the kernel compile again. We are only left with event type counters in page state.

[akpm@osdl.org: bugfixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

Conversion of nr_writeback to a per zone counter.

This removes the last page_state counter from arch/i386/mm/pgtable.c, so we drop the page_state from there.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

This makes nr_dirty a per zone counter. Looping over all processors is avoided during writeback state determination.

The counter aggregation for nr_dirty had to be undone in the NFS layer, since we summed up the page counts from multiple zones. Someone more familiar with NFS should probably review what I have done.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

Conversion of nr_page_table_pages to a per zone counter.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

- Allows reclaim to access the counter without looping over processor counts.
- Allows accurate statistics on how many pages are used in a zone by the slab. This may become useful to balance slab allocations over various zones.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

The zone_reclaim_interval was necessary because we were not able to determine how many unmapped pages exist in a zone; therefore we had to scan in intervals to figure out if any pages were unmapped.

With the zoned counters and NR_ANON_PAGES we now know the number of pagecache pages and the number of mapped pages in a zone, so we can simply skip the reclaim if there is an insufficient number of unmapped pages. We use SWAP_CLUSTER_MAX as the boundary.

Drop all support for /proc/sys/vm/zone_reclaim_interval.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
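The skip test described above amounts to arithmetic over two zone counters; this sketch assumes unmapped pagecache is computed as file pages minus mapped file pages, which is an inference from the text rather than the patch's exact expression.

```c
/* Sketch: bail out of zone reclaim early when fewer than
 * SWAP_CLUSTER_MAX unmapped pagecache pages exist in the zone. */
static int zone_reclaim(struct zone *zone, gfp_t gfp_mask, unsigned int order)
{
	long unmapped = zone_page_state(zone, NR_FILE_PAGES) -
			zone_page_state(zone, NR_FILE_MAPPED);

	if (unmapped <= SWAP_CLUSTER_MAX)
		return 0;		/* nothing worth scanning for */

	return __zone_reclaim(zone, gfp_mask, order);	/* assumed helper */
}
```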
-
Committed by Christoph Lameter

The current NR_FILE_MAPPED is used by zone reclaim and the dirty load calculation as the number of mapped pagecache pages. However, that is not true: NR_FILE_MAPPED includes the mapped anonymous pages. This patch separates those, and therefore allows an accurate tracking of the anonymous pages per zone. It then becomes possible to determine the number of unmapped pages per zone, and we can avoid scanning for unmapped pages if there are none.

Also, it may now be possible to determine the mapped/unmapped ratio in get_dirty_limit. Isn't the number of anonymous pages irrelevant in that calculation?

Note that this will change the meaning of the number of mapped pages reported in /proc/vmstat, /proc/meminfo and in the per node statistics. This may affect user space tools that monitor these counters! NR_FILE_MAPPED works like NR_FILE_DIRTY: it is only valid for pagecache pages.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

We can now access the number of pages in a mapped state in an inexpensive way in shrink_active_list, so drop the nr_mapped field from scan_control.

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

Currently a single atomic variable is used to establish the size of the page cache in the whole machine. The zoned VM counters have the same method of implementation as the nr_pagecache code, but also allow the determination of the pagecache size per zone.

Remove the special implementation for nr_pagecache and make it a zoned counter named NR_FILE_PAGES.

Updates of the page cache counters are always performed with interrupts off, so we can use the __ variant here.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
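At a call site like page-cache insertion, the description suggests an update of roughly this shape; the surrounding locking shown is an illustrative fragment, not lifted from the patch.

```c
/* Sketch: tree_lock is held with interrupts off here, so the
 * non-irq-safe __inc_zone_page_state() variant is sufficient. */
write_lock_irq(&mapping->tree_lock);
error = radix_tree_insert(&mapping->page_tree, offset, page);
if (!error)
	__inc_zone_page_state(page, NR_FILE_PAGES);
write_unlock_irq(&mapping->tree_lock);
```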
-
Committed by Christoph Lameter

nr_mapped is important because it allows a determination of how many pages of a zone are not mapped, which would allow a more efficient means of determining when we need to reclaim memory in a zone.

We take the nr_mapped field out of the page state structure and define a new per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off from NR_MAPPED in the next patch).

We replace the use of nr_mapped in various kernel locations. This avoids the looping over all processors in try_to_free_pages(), writeback, and reclaim (swap + zone reclaim).

[akpm@osdl.org: bugfix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Christoph Lameter

Per zone counter infrastructure.

The counters that we currently have for the VM are split per processor. The processor, however, has not much to do with the zone these pages belong to. We cannot tell, f.e., how many ZONE_DMA pages are dirty. So we are blind to potential imbalances in the usage of memory in various zones: f.e. in a NUMA system we cannot tell how many pages are dirty on a particular node. If we knew, then we could put measures into the VM to balance the use of memory between different zones and different nodes in a NUMA system. For example, it would be possible to limit the dirty pages per node so that fast local memory is kept available even if a process is dirtying huge amounts of pages.

Another example is zone reclaim. We do not know how many unmapped pages exist per zone, so we just have to try to reclaim. If it is not working, then we pause and try again later. It would be better if we knew when it makes sense to reclaim unmapped pages from a zone. This patchset allows the determination of the number of unmapped pages per zone, and we can remove the zone reclaim interval with the counters introduced here.

Furthermore, the ability to have various usage statistics available will allow the development of new NUMA balancing algorithms that may be able to improve the decision making in the scheduler of when to move a process to another node, and hopefully will also enable automatic page migration through a user space program that can analyse the memory load distribution and then rebalance memory use in order to increase performance.

The counter framework here implements differential counters for each processor in struct zone. The differential counters are consolidated when a threshold is exceeded (like done in the current implementation for nr_pagecache), when slab reaping occurs or when a consolidation function is called. Consolidation uses atomic operations and accumulates counters per zone in the zone structure and also globally in the vm_stat array. VM functions can access the counts by simply indexing a global or zone specific array. The arrangement of counters in an array also simplifies processing when output has to be generated for /proc/*.

Counters can be updated by calling inc/dec_zone_page_state or __inc/dec_zone_page_state, analogous to *_page_state. The second group of functions can be called if it is known that interrupts are disabled. Special optimized increment and decrement functions are provided. These can avoid certain checks and use increment or decrement instructions that an architecture may provide.

We also add a new CONFIG_DMA_IS_NORMAL that signifies that an architecture can do DMA to all memory and therefore ZONE_NORMAL will not be populated. This is currently only set for IA64 SGI SN2 and currently only affects node_page_state(). In the best case node_page_state() can be reduced to retrieving a single counter for the one zone on the node.

[akpm@osdl.org: cleanups]
[akpm@osdl.org: export vm_stat[] for filesystems]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
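A hedged sketch of the differential-counter update described above; the field and macro names (vm_stat_diff, STAT_THRESHOLD, zone_page_state_add) are drawn from the description and may differ from the patch.

```c
/* Sketch: accumulate into a small per-cpu delta and fold it into the
 * zone and global arrays only once it crosses the threshold, so the
 * common case touches only cpu-local state. */
static void __mod_zone_page_state(struct zone *zone,
				  enum zone_stat_item item, int delta)
{
	s8 *p = &zone_pcp(zone, smp_processor_id())->vm_stat_diff[item];
	long x = delta + *p;

	if (x > STAT_THRESHOLD || x < -STAT_THRESHOLD) {
		zone_page_state_add(x, zone, item);	/* atomic consolidation */
		x = 0;
	}
	*p = x;
}
```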
-
Committed by Christoph Lameter

NOTE: ZVCs are *not* the lightweight event counters. ZVCs are reliable, whereas event counters do not need to be.

Zone based VM statistics are necessary to be able to determine what the state of memory in one zone is. In a NUMA system this can be helpful for local reclaim and other memory optimizations that may be able to shift VM load in order to get more balanced memory use. It is also useful to know how the computing load affects the memory allocations on various zones. This patchset allows the retrieval of that data from userspace.

The patchset introduces a framework for counters that is a cross between the existing page_stats --which are simply global counters split per cpu-- and the approach of deferred incremental updates implemented for nr_pagecache.

Small per cpu 8 bit counters are added to struct zone. If the counter exceeds certain thresholds then the counters are accumulated in an array of atomic_long in the zone and in a global array that sums up all zone values. The small 8 bit counters are next to the per cpu page pointers, and so they will be hot in the cpu cache when pages are allocated and freed.

Access to VM counter information for a zone and for the whole machine is then possible by simply indexing an array (thanks to Nick Piggin for pointing out that approach). The access to the total number of pages of various types no longer requires the summing up of all per cpu counters.

Benefits of this patchset right now:

- Ability for UP and SMP configurations to determine how memory is balanced between the DMA, NORMAL and HIGHMEM zones.
- Loops over all processors are avoided in writeback and reclaim paths. We can avoid caching the writeback information because the needed information is directly accessible.
- Special handling for nr_pagecache removed.
- zone_reclaim_interval vanishes since VM stats can now determine when it is worth to do local reclaim.
- Fast inline per node page state determination.
- Accurate counters in /sys/devices/system/node/node*/meminfo. Current counters simply count which processor allocated a page somewhere and guesstimate based on that, so they were not useful to show the actual distribution of page use on a specific zone.
- The swap_prefetch patch requires per node statistics in order to figure out when processors of a node can prefetch. This patch provides some of the needed numbers.
- Detailed VM counters available in more /proc and /sys status files.

References to earlier discussions:
V1 http://marc.theaimsgroup.com/?l=linux-kernel&m=113511649910826&w=2
V2 http://marc.theaimsgroup.com/?l=linux-kernel&m=114980851924230&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115014697910351&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767318740&w=2

Performance tests with AIM7 did not show any regressions. Seems to be a tad faster, even. Tested on ia64/NUMA. Builds fine on i386, SMP/UP. Includes fixes for s390/arm/uml arch code.

This patch: Move counter code from page_alloc.c/page-flags.h to vmstat.c/h. Create vmstat.c/vmstat.h by separating the counter code and the proc functions. Move the vm_stat_text array before zoneinfo_show.

[akpm@osdl.org: s390 build fix]
[akpm@osdl.org: HOTPLUG_CPU build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
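The read side described above (totals by indexing an array, not by summing per-cpu state) comes down to a sketch like this; the clamping detail is an assumption based on the deferred per-cpu deltas the text describes.

```c
/* Sketch: reading a consolidated per-zone counter is a single array
 * access; unfolded per-cpu deltas may make the atomic value briefly
 * negative, which is clamped to zero for callers. */
static inline unsigned long zone_page_state(struct zone *zone,
					    enum zone_stat_item item)
{
	long x = atomic_long_read(&zone->vm_stat[item]);

	if (x < 0)
		x = 0;
	return x;
}
```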
-