- 17 Oct, 2007: 40 commits
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
[akpm@linux-foundation.org: fix against git-nfs]
[peterz@infradead.org: fix against git-nfs]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Vladimir Saveliev
This patch makes reiserfs use AOP_FLAG_CONT_EXPAND in order to get rid of the special generic_cont_expand routine.
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Vladimir Saveliev
Convert reiserfs to the new aops.
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Vladimir Saveliev
Make reiserfs write via the generic routines. The original reiserfs write path, optimized for big writes, is deadlock-prone.
Signed-off-by: Vladimir Saveliev <vs@namesys.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Rework the generic block "cont" routines to handle the new aops. Supporting cont_prepare_write would take quite a lot of code, so remove it instead (the filesystems that used it are converted to the new aops in subsequent patches). write_begin gets passed AOP_FLAG_CONT_EXPAND when called from generic_cont_expand, so filesystems can avoid the old hacks they used.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Steven Whitehouse
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: David Chinner <dgc@sgi.com>
Cc: Timothy Shimmin <tes@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Convert ext4 to use write_begin()/write_end() methods.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Dmitriy Monakhov <dmonakhov@sw.ru>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Various fixes and improvements.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Implement new aops for some of the simpler filesystems.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Restore the KERNEL_DS optimisation, which is especially helpful to the 2copy write path. This may be a pretty questionable gain in most cases, especially after the legacy 2copy write path is removed, but it doesn't cost much.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Dmitry Monakhov
Partial writes can easily be supported in LO_CRYPT_NONE mode, but not so easily in the LO_CRYPT_CRYPTOAPI case, because of its block nature. I don't know who still uses cryptoapi, but theoretically it is possible. So let's leave things as they are: the loop device didn't support partial writes before Nick's "write_begin/write_end" patch set, so let it behave the same way after.
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
These are intended to replace prepare_write and commit_write with more flexible alternatives that are also able to avoid the buffered write deadlock problems efficiently (which prepare_write is unable to do).
[mark.fasheh@oracle.com: API design contributions, code review and fixes]
[akpm@linux-foundation.org: various fixes]
[dmonakhov@sw.ru: new aop block_write_begin fix]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Dmitriy Monakhov <dmonakhov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
New buffers against uptodate pages are simply marked uptodate, while the buffer_new bit remains set. This causes error-case code to zero out parts of those buffers because it thinks they contain stale data: wrong, they are actually uptodate, so this is a data loss situation. Fix this by actually clearing buffer_new and marking the buffer dirty. It makes sense to always clear buffer_new before setting a buffer uptodate.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Add an iterator data structure to operate over an iovec. Add the usercopy operators needed by generic_file_buffered_write, and convert that function over.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Modify the core write() code so that it won't take a pagefault while holding a lock on the pagecache page. There are a number of different deadlocks possible if we try to do such a thing:

1. generic_buffered_write
2. lock_page
3. prepare_write
4. unlock_page+vmtruncate
5. copy_from_user
6. mmap_sem(r)
7. handle_mm_fault
8. lock_page (filemap_nopage)
9. commit_write
10. unlock_page

a. sys_munmap / sys_mlock / others
b. mmap_sem(w)
c. make_pages_present
d. get_user_pages
e. handle_mm_fault
f. lock_page (filemap_nopage)

2,8 - recursive deadlock if page is same
2,8;2,8 - ABBA deadlock if page is different
2,6;b,f - ABBA deadlock if page is same

The solution is as follows:

1. If we find the destination page is uptodate, continue as normal, but use atomic usercopies which do not take pagefaults and do not zero the uncopied tail of the destination. The destination is already uptodate, so we can commit_write the full length even if there was a partial copy: it does not matter that the tail was not modified, because if it is dirtied and written back to disk it will not cause any problems (uptodate *means* that the destination page is as new or newer than the copy on disk).

1a. The above requires that fault_in_pages_readable correctly returns access information, because atomic usercopies cannot distinguish between non-present pages in a readable mapping and the lack of a readable mapping.

2. If we find the destination page is not uptodate, unlock it (this could be made slightly more optimal), then allocate a temporary page to copy the source data into. Relock the destination page and continue with the copy. However, instead of a usercopy (which might take a fault), copy the data from the pinned temporary page via the kernel address space.

(Also, rename maxlen to seglen, because it was confusing.)

This increases the CPU/memory copy cost by almost 50% on the affected workloads. That will be solved by introducing a new set of pagecache write aops in a subsequent patch.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Hide some of the open-coded nr_segs tests in the iovec helpers. This is all to simplify generic_file_buffered_write, because it gets more complex in the next patch.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Quite a bit of code is used in maintaining these "cached pages" that are probably pretty unlikely to get used. It would require a narrow race where the page is inserted concurrently while this process is allocating a page in order to create the spare page, followed by a multi-page write into an uncached part of the file to make use of it.

Next, the buffered write path (and others) uses its own LRU pagevec when it should just be using the per-CPU LRU pagevec (which will cut down on both data and code size cacheline footprint). Also, these private LRU pagevecs are emptied after just a very short time, in contrast with the per-CPU pagevecs that are persistent. Net result: 7.3 times fewer lru_lock acquisitions are required to add the pages to pagecache for a bulk write (in 4K chunks).

[this gets rid of some cond_resched() calls in readahead.c and mpage.c due to clashes in -mm. What put them there, and why?]

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
If prepare_write fails with AOP_TRUNCATED_PAGE, or if commit_write fails, then we may have failed the write operation despite prepare_write having instantiated blocks past i_size. Fix this, and consolidate the trimming into one place.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Allow CONFIG_DEBUG_VM to switch off the prefaulting logic, to simulate the … This makes the race much easier to hit. This is useful for demonstration and testing purposes, but is removed in a subsequent patch.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Andrew Morton
Rename some variables and fix some types.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Andrew Morton
This reverts commit 6527c2bd, which fixed the following bug:

When prefaulting in the pages in generic_file_buffered_write(), we only faulted in the pages for the first segment of the iovec. If the second or a subsequent segment described a mmapping of the page into which we're write()ing, and that page is not up-to-date, the fault handler tries to lock the already-locked page (to bring it up to date) and deadlocks.

An exploit for this bug is in writev-deadlock-demo.c, in http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz. (These demos assume blocksize < PAGE_CACHE_SIZE.)

The problem with this fix is that it takes the kernel back to doing a single prepare_write()/commit_write() per iovec segment. So in the worst case we'll run prepare_write+commit_write 1024 times where we previously would have run it once. The other problem with the fix is that it doesn't fix all the locking problems.

<insert numbers obtained via ext3-tools's writev-speed.c here>

And apparently this change killed NFS overwrite performance, because, I suppose, it talks to the server for each prepare_write+commit_write.

So just back that patch out - we'll be fixing the deadlock by other means.

Nick says: also it only ever actually papered over the bug, because after faulting in the pages, they might be unmapped or reclaimed.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Andrew Morton
This reverts commit 81b0c871, which was a bugfix against 6527c2bd ("[PATCH] generic_file_buffered_write(): deadlock on vectored write"), which we also revert.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Nick Piggin
Revert the patch from Neil Brown to optimise NFSD writev handling.
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Committed by Hisashi Hifumi
While running some memory-intensive load, system response deteriorated just after swap-out started.

The cause of this problem is that when a PG_reclaim page is moved to the tail of the inactive LRU list in rotate_reclaimable_page(), the lru_lock spin lock is acquired for every page writeback. This deteriorates system performance and makes interrupt hold-off time longer when swap-out starts.

The following patch solves this problem: I use a pagevec when rotating reclaimable pages to mitigate LRU spin lock contention and reduce interrupt hold-off time.

I ran a test that allocates and touches pages in multiple processes, while pinging the test machine in flood mode to measure response under memory-intensive load. The test results are:

2.6.23-rc5:
  --- testmachine ping statistics ---
  3000 packets transmitted, 3000 received, 0% packet loss, time 53222ms
  rtt min/avg/max/mdev = 0.074/0.652/172.228/7.176 ms, pipe 11, ipg/ewma 17.746/0.092 ms

2.6.23-rc5-patched:
  --- testmachine ping statistics ---
  3000 packets transmitted, 3000 received, 0% packet loss, time 51924ms
  rtt min/avg/max/mdev = 0.072/0.108/3.884/0.114 ms, pipe 2, ipg/ewma 17.314/0.091 ms

Max round-trip time was improved. The test machine has 4 CPUs (3.16GHz, Hyper-threading enabled), 8GB of memory and 8GB of swap.

I ran the ping test again to check for any performance deterioration caused by taking a ref.

2.6.23-rc6-with-modifiedpatch:
  --- testmachine ping statistics ---
  3000 packets transmitted, 3000 received, 0% packet loss, time 53386ms
  rtt min/avg/max/mdev = 0.074/0.110/4.716/0.147 ms, pipe 2, ipg/ewma 17.801/0.129 ms

The result for my original patch is as follows.

2.6.23-rc5-with-originalpatch:
  --- testmachine ping statistics ---
  3000 packets transmitted, 3000 received, 0% packet loss, time 51924ms
  rtt min/avg/max/mdev = 0.072/0.108/3.884/0.114 ms, pipe 2, ipg/ewma 17.314/0.091 ms

The influence on response was small.

[akpm@linux-foundation.org: fix uninitialised var warning]
[hugh@veritas.com: fix locking]
[randy.dunlap@oracle.com: fix function declaration]
[hugh@veritas.com: fix BUG at include/linux/mm.h:220!]
[hugh@veritas.com: kill redundancy in rotate_reclaimable_page]
[hugh@veritas.com: move_tail_pages into lru_add_drain]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
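The batching idea can be modelled by counting lock round-trips. This is a hypothetical user-space sketch, not the kernel's pagevec API: `struct page_batch`, `batch_add()` and `batch_flush()` are invented names, and `BATCH_SIZE` is set to 14 only to match PAGEVEC_SIZE in kernels of this era. It counts acquisitions and does not reproduce the latency numbers measured in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Chosen to match PAGEVEC_SIZE of this era; any value > 1 shows the effect. */
#define BATCH_SIZE 14

/* Hypothetical per-caller batch: pages are queued and moved together. */
struct page_batch {
    size_t nr;                 /* pages queued but not yet moved */
    size_t lock_acquisitions;  /* how many times the LRU lock was taken */
};

/* Take the lock once and move every queued page in a single pass. */
static void batch_flush(struct page_batch *pb)
{
    if (pb->nr == 0)
        return;
    pb->lock_acquisitions++;   /* stands in for lock ... move pages ... unlock */
    pb->nr = 0;
}

/* Queue one page; flush only when the batch is full. */
static void batch_add(struct page_batch *pb, size_t batch_size)
{
    assert(batch_size > 0);
    if (++pb->nr == batch_size)
        batch_flush(pb);
}
```

Rotating 1000 pages one at a time (batch size 1) costs 1000 lock acquisitions; with batches of 14 it costs 72, namely 71 full batches plus one final drain of the remaining 6 pages.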
- Committed by Lee Schermerhorn
Allow an application to query the memories allowed by its context.

Updated numa_memory_policy.txt to mention that applications can use this to obtain allowed memories for constructing valid policies.

TODO: update out-of-tree libnuma wrapper[s], or maybe add a new wrapper--e.g., numa_get_mems_allowed()? Also, update numa syscall man pages.

Tested with memtoy V>=0.13.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
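The commit text does not name the interface; in mainline it is the get_mempolicy(2) flag MPOL_F_MEMS_ALLOWED, documented as available since Linux 2.6.24. The sketch below calls it through syscall(2) so that it builds without libnuma. The helper name `query_mems_allowed()` is hypothetical, and the fallback flag value is the one from `<linux/mempolicy.h>`. The call can fail with ENOSYS on kernels built without NUMA support, or be refused inside a sandbox, so callers must check the return value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef MPOL_F_MEMS_ALLOWED
#define MPOL_F_MEMS_ALLOWED (1 << 2)  /* value from <linux/mempolicy.h> */
#endif

/* Hypothetical helper: fill mask (nwords unsigned longs) with the memory
 * nodes the calling task may allocate from. Returns 0 on success, or -1
 * with errno set. */
static int query_mems_allowed(unsigned long *mask, size_t nwords)
{
    int mode;  /* the kernel reports 0 here when MPOL_F_MEMS_ALLOWED is used */
    unsigned long maxnode = nwords * 8 * sizeof(unsigned long);

    assert(mask != NULL && nwords > 0);
    return (int)syscall(SYS_get_mempolicy, &mode, mask, maxnode,
                        NULL, MPOL_F_MEMS_ALLOWED);
}
```

Inside a restricted cpuset the returned mask can be a strict subset of the online nodes. A program building an MPOL_BIND or MPOL_INTERLEAVE policy would pick its nodes from this mask, which is the use the commit describes.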
- Committed by Rik van Riel
The current VM can get itself into trouble fairly easily on systems with a small ZONE_HIGHMEM, which is common on i686 computers with 1GB of memory.

On one side, page_alloc() will allocate down to zone->pages_low, while on the other side, kswapd() and balance_pgdat() will try to free memory from every zone, until every zone has more free pages than zone->pages_high.

Highmem can be filled up to zone->pages_low with page tables, ramfs, vmalloc allocations and other unswappable things quite easily and without many bad side effects, since we still have a huge ZONE_NORMAL to do future allocations from.

However, as long as the number of free pages in the highmem zone is below zone->pages_high, kswapd will continue swapping things out from ZONE_NORMAL, too!

Sami Farin managed to get his system into a state where kswapd had freed about 700MB of low memory and was still "going strong".

The attached patch will make kswapd stop paging out data from zones when there is more than enough memory free. We do go above zone->pages_high in order to keep pressure between zones equal in normal circumstances, but the patch should prevent the kind of excesses that made Sami's computer totally unusable.

Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
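The policy change can be sketched as a per-zone decision. Everything here is a hypothetical model: `struct zone_model`, `old_policy_shrinks()`, `should_shrink_zone()` and in particular `ceiling_factor` are illustrative assumptions, not names or values taken from the patch. The only point is that whether a zone is reclaimed from depends on that zone's own free count, rather than on whether some other zone is short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-zone counters, in pages. */
struct zone_model {
    const char *name;
    unsigned long free_pages;
    unsigned long pages_high;
};

/* Old behaviour, simplified: while any zone is at or below pages_high,
 * balancing continues and every zone keeps being shrunk. */
static bool old_policy_shrinks(const struct zone_model *zones, int n)
{
    for (int i = 0; i < n; i++)
        if (zones[i].free_pages <= zones[i].pages_high)
            return true;
    return false;
}

/* Sketched behaviour: skip a zone whose own free count is already well
 * above its watermark. ceiling_factor is an assumed knob that lets a
 * zone go somewhat above pages_high before reclaim from it stops. */
static bool should_shrink_zone(const struct zone_model *z,
                               unsigned long ceiling_factor)
{
    assert(ceiling_factor >= 1);
    return z->free_pages < ceiling_factor * z->pages_high;
}
```

With a nearly full small highmem zone and a ZONE_NORMAL holding plenty of free pages, the old rule keeps reclaiming everywhere, while the per-zone check keeps shrinking only highmem.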
- Committed by Jesper Juhl
vmalloc() returns a void pointer, so there's no need to cast its return value in mm/page_alloc.c::zone_wait_table_init().
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
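In C, a `void *` converts implicitly to any object pointer type, so the cast adds nothing. The same idiom applies to user-space allocators. This sketch uses malloc() as a stand-in for vmalloc(), and `struct wait_slot` and `alloc_table()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical element type standing in for a wait-queue table entry. */
struct wait_slot {
    int dummy;
};

static struct wait_slot *alloc_table(size_t n)
{
    assert(n > 0);
    /* No cast: malloc() returns void *, which converts implicitly. */
    struct wait_slot *table = malloc(n * sizeof(*table));
    return table;
}
```

Writing `sizeof(*table)` instead of naming the type also keeps the size correct if the pointer's type is later changed.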
- Committed by Jeff Moyer
A while back, Nick Piggin introduced a patch to reduce the node memory usage for small files (commit cfd9b7df):

    -#define RADIX_TREE_MAP_SHIFT 6
    +#define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)

Unfortunately, he didn't take into account the fact that the calculation of the maximum path was based on an assumption of having to round up:

    #define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2)

So, if CONFIG_BASE_SMALL is set (a shift of 4 divides the index bits evenly, so there is nothing to round up), you will end up with a RADIX_TREE_MAX_PATH that is one greater than necessary. The practical upshot of this is just a bit of wasted memory (one long in the height_to_maxindex array, an extra pre-allocated radix tree node per cpu, and extra stack usage in a couple of functions), but it seems worth getting right.

It's also worth noting that I never build with CONFIG_BASE_SMALL. What I did to test this was duplicate the code in a small user-space program and check the results of the calculations for max path and the contents of the height_to_maxindex array.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
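The arithmetic can be checked the same way, in a small user-space program. This sketch compares the old formula with ceil(bits / shift) + 1, assuming, as the commit text implies, that the `+ 2` was meant to cover one level of rounding up plus one slot for height 0. The helper names are hypothetical. RADIX_TREE_INDEX_BITS is the bit width of an unsigned long, so 32 and 64 are the widths that matter.

```c
#include <assert.h>
#include <limits.h>

/* Width of the radix tree index: the bit size of unsigned long. */
static unsigned index_bits_native(void)
{
    return CHAR_BIT * sizeof(unsigned long);
}

/* The old formula quoted in the commit. */
static unsigned old_max_path(unsigned index_bits, unsigned map_shift)
{
    assert(map_shift > 0);
    return index_bits / map_shift + 2;
}

/* Levels needed to cover index_bits (rounded up), plus one slot for
 * height 0. This interpretation is an assumption of the sketch. */
static unsigned needed_slots(unsigned index_bits, unsigned map_shift)
{
    assert(map_shift > 0);
    return (index_bits + map_shift - 1) / map_shift + 1;
}
```

With a map shift of 6 the two agree on both 32- and 64-bit builds (7 and 12), because 6 divides neither width evenly. With a shift of 4 (CONFIG_BASE_SMALL) the old formula gives 10 instead of 9 on 32-bit and 18 instead of 17 on 64-bit, which is the one-too-many the commit describes.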
- Committed by Nick Piggin
nobh mode error handling is not just pretty slack, it's wrong.

One cannot zero out the whole page to ensure new blocks are zeroed, because it just brings the whole page "uptodate" with zeroes even if that may not be the correct uptodate data. Also, other parts of the page may already contain dirty data which would get lost by zeroing it out. Thirdly, the writeback of zeroes to the new blocks will also erase existing blocks. All these conditions are pagecache and/or filesystem corruption.

The problem comes about because we didn't keep track of which buffers actually are new or old. However, it is not enough just to keep only this state, because at the point we start dirtying parts of the page (new blocks, with zeroes), the handling of IO errors becomes impossible without buffers: the page may only be partially uptodate, in which case the page flags alone cannot capture the state of the parts of the page.

So allocate all buffers for the page upfront, but leave them unattached so that they don't pick up any other references and can be freed when we're done. If the error path is hit, then zero the new buffers as the regular buffer path does, then attach the buffers to the page so that it can actually be written out correctly and be subject to the normal IO error handling paths.

As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page systems.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
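A small model of the difference between the old and the fixed error handling. The page layout, the tiny block size and the function names are hypothetical. The sketch only tracks per-block "new" state and shows that zeroing the whole page destroys dirty data in an existing block, while zeroing only new blocks preserves it. It does not model buffer allocation or attaching buffers to the page.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLOCKS_PER_PAGE 4
#define MODEL_BLOCK_SIZE 8   /* tiny hypothetical block size for the model */

struct page_model {
    char data[BLOCKS_PER_PAGE][MODEL_BLOCK_SIZE];
    bool new_block[BLOCKS_PER_PAGE];  /* per-block state the fix relies on */
};

/* Old nobh error handling, simplified: zero the entire page. */
static void error_zero_whole_page(struct page_model *p)
{
    assert(p != NULL);
    memset(p->data, 0, sizeof(p->data));
}

/* Fixed approach, simplified: zero only blocks known to be newly
 * allocated; existing (possibly dirty) blocks are left untouched. */
static void error_zero_new_blocks(struct page_model *p)
{
    assert(p != NULL);
    for (int b = 0; b < BLOCKS_PER_PAGE; b++) {
        if (p->new_block[b]) {
            memset(p->data[b], 0, MODEL_BLOCK_SIZE);
            p->new_block[b] = false;
        }
    }
}
```

Given a page whose block 0 holds dirty data and whose block 1 is new, the whole-page variant wipes block 0, while the per-block variant clears only block 1.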