- 31 10月, 2005 5 次提交
-
-
由 Rafael J. Wysocki 提交于
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Rafael J. Wysocki 提交于
The following patch moves the functionality of swsusp related to creating and handling the snapshot of memory to a separate file, snapshot.c This should enable us to untangle the code in the future and eventually to implement some parts of swsusp.c in the user space. The patch does not change the code. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NPavel Machek <pavel@suse.cz> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Rafael J. Wysocki 提交于
The following patch makes swsusp use PG_nosave and PG_nosave_free flags to mark pages that should be freed after the state of the system has been restored from the image (or in case of an error during suspend). This allows us to avoid storing metadata in swap twice and to reduce the amount of memory needed by swsusp. Additionally, it allows us to simplify the code by removing a couple of functions that are no longer necessary. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NPavel Machek <pavel@suse.cz> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ashok Raj 提交于
cpufreq entries in sysfs should only be populated when CPU is online state. When we either boot with maxcpus=x and then boot the other cpus by echoing to sysfs online file, these entries should be created and destroyed when CPU_DEAD is notified. Same treatement as cache entries under sysfs. We place the processor in the lowest frequency, so hw managed P-State transitions can still work on the other threads to save power. Primary goal was to just make these directories appear/disapper dynamically. There is one in this patch i had to do, which i really dont like myself but probably best if someone handling the cpufreq infrastructure could give this code right treatment if this is not acceptable. I guess its probably good for the first cut. - Converting lock_cpu_hotplug()/unlock_cpu_hotplug() to disable/enable preempt. The locking was smack in the middle of the notification path, when the hotplug is already holding the lock. I tried another solution to avoid this so avoid taking locks if we know we are from notification path. The solution was getting very ugly and i decided this was probably good for this iteration until someone who understands cpufreq could do a better job than me. (akpm: export cpucontrol to GPL modules: drivers/cpufreq/cpufreq_stats.c now does lock_cpu_hotplug()) Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Zwane Mwaikambo <zwane@holomorphy.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Brian Gerst 提交于
Since CONFIG_IKCONFIG_PROC already depends on CONFIG_IKCONFIG, adding configs.o again is redundant. Signed-off-by: NBrian Gerst <bgerst@didntduck.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 30 10月, 2005 13 次提交
-
-
由 Hugh Dickins 提交于
Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with a many-threaded application which concurrently initializes different parts of a large anonymous area. This patch corrects that, by using a separate spinlock per page table page, to guard the page table entries in that page, instead of using the mm's single page_table_lock. (But even then, page_table_lock is still used to guard page table allocation, and anon_vma allocation.) In this implementation, the spinlock is tucked inside the struct page of the page table page: with a BUILD_BUG_ON in case it overflows - which it would in the case of 32-bit PA-RISC with spinlock debugging enabled. Splitting the lock is not quite for free: another cacheline access. Ideally, I suppose we would use split ptlock only for multi-threaded processes on multi-cpu machines; but deciding that dynamically would have its own costs. So for now enable it by config, at some number of cpus - since the Kconfig language doesn't support inequalities, let preprocessor compare that with NR_CPUS. But I don't think it's worth being user-configurable: for good testing of both split and unsplit configs, split now at 4 cpus, and perhaps change that to 8 later. There is a benefit even for singly threaded processes: kswapd can be attacking one part of the mm while another part is busy faulting. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
Final step in pushing down common core's page_table_lock. follow_page no longer wants caller to hold page_table_lock, uses pte_offset_map_lock itself; and so no page_table_lock is taken in get_user_pages itself. But get_user_pages (and get_futex_key) do then need follow_page to pin the page for them: take Daniel's suggestion of bitflags to follow_page. Need one for WRITE, another for TOUCH (it was the accessed flag before: vanished along with check_user_page_readable, but surely get_numa_maps is wrong to mark every page it finds as accessed), another for GET. And another, ANON to dispose of untouched_anonymous_page: it seems silly for that to descend a second time, let follow_page observe if there was no page table and return ZERO_PAGE if so. Fix minor bug in that: check VM_LOCKED - make_pages_present ought to make readonly anonymous present. Give get_numa_maps a cond_resched while we're there. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
Second step in pushing down the page_table_lock. Remove the temporary bridging hack from __pud_alloc, __pmd_alloc, __pte_alloc: expect callers not to hold page_table_lock, whether it's on init_mm or a user mm; take page_table_lock internally to check if a racing task already allocated. Convert their callers from common code. But avoid coming back to change them again later: instead of moving the spin_lock(&mm->page_table_lock) down, switch over to new macros pte_alloc_map_lock and pte_unmap_unlock, which encapsulate the mapping+locking and unlocking+unmapping together, and in the end may use alternatives to the mm page_table_lock itself. These callers all hold mmap_sem (some exclusively, some not), so at no level can a page table be whipped away from beneath them; and pte_alloc uses the "atomic" pmd_present to test whether it needs to allocate. It appears that on all arches we can safely descend without page_table_lock. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
update_mem_hiwater has attracted various criticisms, in particular from those concerned with mm scalability. Originally it was called whenever rss or total_vm got raised. Then many of those callsites were replaced by a timer tick call from account_system_time. Now Frank van Maarseveen reports that to be found inadequate. How about this? Works for Frank. Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros update_hiwater_rss and update_hiwater_vm. Don't attempt to keep mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually by 1): those are hot paths. Do the opposite, update only when about to lower rss (usually by many), or just before final accounting in do_exit. Handle mm->hiwater_vm in the same way, though it's much less of an issue. Demand that whoever collects these hiwater statistics do the work of taking the maximum with rss or total_vm. And there has been no collector of these hiwater statistics in the tree. The new convention needs an example, so match Frank's usage by adding a VmPeak line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS (High-Water-Mark or High-Water-Memory). There was a particular anomaly during mremap move, that hiwater_vm might be captured too high. A fleeting such anomaly remains, but it's quickly corrected now, whereas before it would stick. What locking? None: if the app is racy then these statistics will be racy, it's not worth any overhead to make them exact. But whenever it suits, hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under page_table_lock (for now) or with preemption disabled (later on): without going to any trouble, minimize the time between reading current values and updating, to minimize those occasions when a racing thread bumps a count up and back down in between. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Nick Piggin 提交于
Remove PageReserved() calls from core code by tightening VM_RESERVED handling in mm/ to cover PageReserved functionality. PageReserved special casing is removed from get_page and put_page. All setting and clearing of PageReserved is retained, and it is now flagged in the page_alloc checks to help ensure we don't introduce any refcount based freeing of Reserved pages. MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being deprecated. We never completely handled it correctly anyway, and is be reintroduced in future if required (Hugh has a proof of concept). Once PageReserved() calls are removed from kernel/power/swsusp.c, and all arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can be trivially removed. Last real user of PageReserved is swsusp, which uses PageReserved to determine whether a struct page points to valid memory or not. This still needs to be addressed (a generic page_is_ram() should work). A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue. Signed-off-by: NNick Piggin <npiggin@suse.de> Refcount bug fix for filemap_xip.c Signed-off-by: NCarsten Otte <cotte@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
One anomaly remains from when Andrea rationalized the responsibilities of mmap_sem and page_table_lock: in dup_mmap we add vmas to the child holding its page_table_lock, but not the mmap_sem which normally guards the vma list and rbtree. Which could be an issue for unuse_mm: though since it just walks down the list (today with page_table_lock, tomorrow not), it's probably okay. Will need a memory barrier? Oh, keep it simple, Nick and I agreed, no harm in taking child's mmap_sem here. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
Use the parent's oldmm throughout dup_mmap, instead of perversely going back to current->mm. (Can you hear the sigh of relief from those mpnts? Usually I squash them, but not today.) Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
I was lazy when we added anon_rss, and chose to change as few places as possible. So currently each anonymous page has to be counted twice, in rss and in anon_rss. Which won't be so good if those are atomic counts in some configurations. Change that around: keep file_rss and anon_rss separately, and add them together (with get_mm_rss macro) when the total is needed - reading two atomics is much cheaper than updating two atomics. And update anon_rss upfront, typically in memory.c, not tucked away in page_add_anon_rmap. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
How is anon_rss initialized? In dup_mmap, and by mm_alloc's memset; but that's not so good if an mm_counter_t is a special type. And how is rss initialized? By set_mm_counter, all over the place. Come on, we just need to initialize them both at once by set_mm_counter in mm_init (which follows the memcpy when forking). Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Hugh Dickins 提交于
The original vm_stat_account has fallen into disuse, with only one user, and only one user of vm_stat_unaccount. It's easier to keep track if we convert them all to __vm_stat_account, then free it from its __shackles. Signed-off-by: NHugh Dickins <hugh@veritas.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 YOSHIFUJI Hideaki 提交于
Add missing compensation for (HZ == 250) != (1 << SHIFT_HZ) in second_overflow(). Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Al Viro 提交于
frv, sh64, ia64 and sparc64 do not have do_settimeofday() exported (the last two are using variant in kernel/time.c). Exports added to match the rest of architectures. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Oleg Nesterov 提交于
exit_signal() (called from copy_process's error path) should decrement ->signal->live, otherwise forking process will miss 'group_dead' in do_exit(). Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 28 10月, 2005 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Roland McGrath 提交于
This just makes sure that a thread's expiry times can't get reset after it clears them in do_exit. This is what allowed us to re-introduce the stricter BUG_ON() check in a362f463. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Linus Torvalds 提交于
This reverts commit 3de463c7. Roland has another patch that allows us to leave the BUG_ON() in place by just making sure that the condition it tests for really is always true. That goes in next. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 27 10月, 2005 3 次提交
-
-
由 Oleg Nesterov 提交于
There's a silly off-by-one error in the code that updates the expiration of posix CPU timers, causing them to not be properly updated when they hit exactly on their expiration time (which should be the normal case). This causes them to then fire immediately again, and only _then_ get properly updated. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Linus Torvalds 提交于
Pointed out by Oleg Nesterov, who has been walking over the code forwards and backwards. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andrew Morton 提交于
With CONFIG_SMP=n: *** Warning: "cpu_online_map" [drivers/firmware/dcdbas.ko] undefined! due to set_cpus_allowed(). Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 24 10月, 2005 5 次提交
-
-
由 Oleg Nesterov 提交于
This might be harmless, but looks like a race from code inspection (I was unable to trigger it). I must admit, I don't understand why we can't return TIMER_RETRY after 'spin_unlock(&p->sighand->siglock)' without doing bump_cpu_timer(), but this is what original code does. posix_cpu_timer_set: read_lock(&tasklist_lock); spin_lock(&p->sighand->siglock); list_del_init(&timer->it.cpu.entry); spin_unlock(&p->sighand->siglock); We are probaly deleting the timer from run_posix_cpu_timers's 'firing' local list_head while run_posix_cpu_timers() does list_for_each_safe. Various bad things can happen, for example we can just delete this timer so that list_for_each() will not notice it and run_posix_cpu_timers() will not reset '->firing' flag. In that case, .... if (timer->it.cpu.firing) { read_unlock(&tasklist_lock); timer->it.cpu.firing = -1; return TIMER_RETRY; } sys_timer_settime() goes to 'retry:', calls posix_cpu_timer_set() again, it returns TIMER_RETRY ... Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Oleg Nesterov 提交于
No need to rebalance when task exited Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Oleg Nesterov 提交于
do_exit() clears ->it_##clock##_expires, but nothing prevents another cpu to attach the timer to exiting process after that. After exit_notify() does 'write_unlock_irq(&tasklist_lock)' and before do_exit() calls 'schedule() local timer interrupt can find tsk->exit_state != 0. If that state was EXIT_DEAD (or another cpu does sys_wait4) interrupted task has ->signal == NULL. At this moment exiting task has no pending cpu timers, they were cleaned up in __exit_signal()->posix_cpu_timers_exit{,_group}(), so we can just return from irq. Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Oleg Nesterov 提交于
1. cleanup_timers() sets timer->task = NULL under tasklist + ->sighand locks. That means that this code in posix_cpu_timer_del() and posix_cpu_timer_set() lock_timer(timer); if (timer->task == NULL) return; read_lock(tasklist); put_task_struct(timer->task) is racy. With this patch timer->task modified and accounted only under timer->it_lock. Sadly, this means that dead task_struct won't be freed until timer deleted or armed. 2. run_posix_cpu_timers() collects expired timers into local list under tasklist + ->sighand again. That means that posix_cpu_timer_del() should check timer->it.cpu.firing under these locks too. Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Linus Torvalds 提交于
Bursty timers aren't good for anybody, very much including latency for other programs when we trigger lots of timers in interrupt context. So set a random limit, after which we'll handle the rest on the next timer tick. Noted by Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 22 10月, 2005 2 次提交
-
-
由 Roland McGrath 提交于
When I originally moved exit_itimers into __exit_signal, that was the only place where we could reliably know it was the last thread in the group dying, without races. Since then we've gotten the signal_struct.live counter, and do_exit can reliably do group-wide cleanup work. This patch moves the call to do_exit, where it's made without locks. This avoids the deadlock issues that the old __exit_signal code's comment talks about, and the one that Oleg found recently with process CPU timers. [ This replaces e03d13e9, which is why it was just reverted. ] Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Linus Torvalds 提交于
Revert commit e03d13e9, to be replaced by a much nicer fix from Roland.
-
- 20 10月, 2005 2 次提交
-
-
由 Alan Stern 提交于
The PF_NOFREEZE process flag should not be inherited when a thread is forked. This patch (as585) removes the flag from the child. This problem is starting to show up more and more as drivers turn to the kthread API instead of using kernel_thread(). As a result, their kernel threads are now children of the kthread worker instead of modprobe, and they inherit the PF_NOFREEZE flag. This can cause problems during system suspend; the kernel threads are not getting frozen as they ought to be. Signed-off-by: NAlan Stern <stern@rowland.harvard.edu> Acked-by: NPavel Machek <pavel@ucw.cz> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Roland McGrath 提交于
Oleg Nesterov reported an SMP deadlock. If there is a running timer tracking a different process's CPU time clock when the process owning the timer exits, we deadlock on tasklist_lock in posix_cpu_timer_del via exit_itimers. That code was using tasklist_lock to check for a race with __exit_signal being called on the timer-target task and clearing its ->signal. However, there is actually no such race. __exit_signal will have called posix_cpu_timers_exit and posix_cpu_timers_exit_group before it does that. Those will clear those k_itimer's association with the dying task, so posix_cpu_timer_del will return early and never reach the code in question. In addition, posix_cpu_timer_del called from exit_itimers during execve or directly from timer_delete in the process owning the timer can race with an exiting timer-target task to cause a double put on timer-target task struct. Make sure we always access cpu_timers lists with sighand lock held. Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NChris Wright <chrisw@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 18 10月, 2005 3 次提交
-
-
由 Eric Dumazet 提交于
This makes call_rcu() keep track of how many events there are on the RCU list, and cause a reschedule event when the list gets too long. This helps keep RCU event lists down. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Oleg Nesterov 提交于
Make sure we release the task struct properly when releasing pending timers. release_task() does write_lock_irq(&tasklist_lock), so it can't race with run_posix_cpu_timers() on any cpu. Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Linus Torvalds 提交于
Dipankar made RCU limit the batch size to improve latency, but that approach is unworkable: it can cause the RCU queues to grow without bounds, since the batch limiter ended up limiting the callbacks. So make the limit much higher, and start planning on instead limiting the batch size by doing RCU callbacks more often if the queue looks like it might be growing too long. Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 15 10月, 2005 1 次提交
-
-
由 Takashi Iwai 提交于
Adds the missing EXPORT_SYMBOL_GPL for getnstimeofday() when CONFIG_TIME_INTERPOLATION isn't set. Needed by drivers/char/mmtimer.c Signed-off-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 11 10月, 2005 1 次提交
-
-
由 Harald Welte 提交于
If a process issues an URB from userspace and (starts to) terminate before the URB comes back, we run into the issue described above. This is because the urb saves a pointer to "current" when it is posted to the device, but there's no guarantee that this pointer is still valid afterwards. In fact, there are three separate issues: 1) the pointer to "current" can become invalid, since the task could be completely gone when the URB completion comes back from the device. 2) Even if the saved task pointer is still pointing to a valid task_struct, task_struct->sighand could have gone meanwhile. 3) Even if the process is perfectly fine, permissions may have changed, and we can no longer send it a signal. So what we do instead, is to save the PID and uid's of the process, and introduce a new kill_proc_info_as_uid() function. Signed-off-by: NHarald Welte <laforge@gnumonks.org> [ Fixed up types and added symbol exports ] Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 10 10月, 2005 1 次提交
-
-
由 Rafael J. Wysocki 提交于
The following patch makes swsusp avoid the possible temporary corruption of page translation tables during resume on x86-64. This is achieved by creating a copy of the relevant page tables that will not be modified by swsusp and can be safely used by it on resume. The problem is that during resume on x86-64 swsusp may temporarily corrupt the page tables used for the direct mapping of RAM. If that happens, a page fault occurs and cannot be handled properly, which leads to the solid hang of the affected system. This leads to the loss of the system's state from before suspend and may result in the loss of data or the corruption of filesystems, so it is a serious issue. Also, it appears to happen quite often (for me, as often as 50% of the time). The problem is related to the fact that (at least) one of the PMD entries used in the direct memory mapping (starting at PAGE_OFFSET) points to a page table the physical address of which is much greater than the physical address of the PMD entry itself. Moreover, unfortunately, the physical address of the page table before suspend (i.e. the one stored in the suspend image) happens to be different to the physical address of the corresponding page table used during resume (i.e. the one that is valid right before swsusp_arch_resume() in arch/x86_64/kernel/suspend_asm.S is executed). Thus while the image is restored, the "offending" PMD entry gets overwritten, so it does not point to the right physical address any more (i.e. there's no page table at the address pointed to by it, because it points to the address the page table has been at during suspend). Consequently, if the PMD entry is used later on, and it _is_ used in the process of copying the image pages, a page fault occurs, but it cannot be handled in the normal way and the system hangs. In principle we can call create_resume_mapping() from swsusp_arch_resume() (ie. from suspend_asm.S), but then the memory allocations in create_resume_mapping(), resume_pud_mapping(), and resume_pmd_mapping() must be made carefully so that we use _only_ NosaveFree pages in them (the other pages are overwritten by the loop in swsusp_arch_resume()). Additionally, we are in atomic context at that time, so we cannot use GFP_KERNEL. Moreover, if one of the allocations fails, we should free all of the allocated pages, so we need to trace them somehow. All of this is done in the appended patch, except that the functions populating the page tables are located in arch/x86_64/kernel/suspend.c rather than in init.c. It may be done in a more elegan way in the future, with the help of some swsusp patches that are in the works now. [AK: move some externs into headers, renamed a function] Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 09 10月, 2005 1 次提交
-
-
由 Al Viro 提交于
- added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-