- 27 3月, 2009 3 次提交
-
-
由 Matthew Garrett 提交于
Change the default behaviour of the kernel to use relatime for all filesystems. This can be overridden with the "strictatime" mount option. Signed-off-by: NMatthew Garrett <mjg@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Garrett 提交于
Add support for explicitly requesting full atime updates. This makes it possible for kernels to default to relatime but still allow userspace to override it. Signed-off-by: NMatthew Garrett <mjg@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Garrett 提交于
Allow atime to be updated once per day even with relatime. This lets utilities like tmpreaper (which delete files based on last access time) continue working, making relatime a plausible default for distributions. Signed-off-by: NMatthew Garrett <mjg@redhat.com> Reviewed-by: NMatthew Wilcox <willy@linux.intel.com> Acked-by: NValerie Aurora Henson <vaurora@redhat.com> Acked-by: NAlan Cox <alan@redhat.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 3月, 2009 20 次提交
-
-
由 Steven Whitehouse 提交于
This removes some old code that was causing issues during filesystem freeze. Reported-by: NAndrew Price <andy@andrewprice.me.uk> Tested-by: NAndrew Price <andy@andrewprice.me.uk> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
The logic requires that we mark the glock dirty in page_mkwrite otherwise we might not flush correctly in the case that no allocation was required in the process of dirying the page. Also we need to set the shared write flag early for the same reason. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This cleans up a number of bits of code mostly based in glops.c. A couple of simple functions have been merged into the callers to make it more obvious what is going on, the mysterious raising of i_writecount around the truncate_inode_pages() call has been removed. The meta_go_* operations have been renamed rgrp_go_* since that is the only lock type that they are used with. The unused argument of gfs2_read_sb has been removed. Also a bug has been fixed where a check for the rindex inode was in the wrong callback. More comments are added, and the debugging code is improved too. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Benjamin Marzinski 提交于
After calling out to the dlm, GFS2 sets the new state of a glock to gl_target in gdlm_ast(). However, gl_target is not always the lock state that was requested. If a conversion from shared to exclusive fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead of gl->gl_target, so that it can reacquire the lock in exlusive the next time around. In this case, setting the lock to gl_target in gdlm_ast() will make GFS2 think that it has the glock in exclusive mode, when really, it doesn't have the glock locked at all. This patch adds a new field to the gfs2_glock structure, gl_req, to track the mode that was requested. Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Hisashi Hifumi 提交于
I introduced "is_partially_uptodate" aops for GFS2. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. #sysbench --num-threads=16 --max-requests=200000 --test=fileio --file-num=1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --file-rw-ratio=1 run -2.6.29-rc6 Test execution summary: total time: 202.6389s total number of events: 200000 total time taken by event execution: 2580.0480 per-request statistics: min: 0.0000s avg: 0.0129s max: 49.5852s approx. 95 percentile: 0.0462s -2.6.29-rc6-patched Test execution summary: total time: 177.8639s total number of events: 200000 total time taken by event execution: 2419.0199 per-request statistics: min: 0.0000s avg: 0.0121s max: 52.4306s approx. 95 percentile: 0.0444s arch: ia64 pagesize: 16k blocksize: 4k Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Hannes Eder 提交于
Impact: Make symbol static. Fix this sparse warning: fs/gfs2/rgrp.c:188:5: warning: symbol 'gfs2_bitfit' was not declared. Should it be static? Signed-off-by: NHannes Eder <hannes@hanneseder.net> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Hannes Eder 提交于
Fix this sparse warnings: fs/gfs2/rgrp.c:156:23: warning: constant 0xffffffffffffffff is so big it is unsigned long long fs/gfs2/rgrp.c:157:23: warning: constant 0xaaaaaaaaaaaaaaaa is so big it is unsigned long long fs/gfs2/rgrp.c:158:23: warning: constant 0x5555555555555555 is so big it is long long fs/gfs2/rgrp.c:194:20: warning: constant 0x5555555555555555 is so big it is long long fs/gfs2/rgrp.c:204:44: warning: constant 0x5555555555555555 is so big it is long long Signed-off-by: NHannes Eder <hannes@hanneseder.net> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This adds support for "quota" and "noquota" mount options in addition to the existing "quota=on/off/account" so that we are compatible with the names by which these options are more generally known. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
An alignment issue with the existing bitfit algorithm was reported on IA64. This patch attempts to fix that, and also to tidy up the code a bit. There is now more documentation about how this works and it has survived a number of different tests. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This adds a sysfs file called demote_rq to GFS2's per filesystem directory. Its possible to use this file to demote arbitrary glocks in exactly the same way as if a request had come in from a remote node. This is intended for testing issues relating to caching of data under glocks. Despite that, the interface is generic enough to send requests to any type of glock, but be careful as its not always safe to send an arbitrary message to an arbitrary glock. For that reason and to prevent DoS, this interface is restricted to root only. The messages look like this: <type>:<glocknumber> <mode> Example: echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq Which means "please demote inode glock (type 2) number 13324 so that I can get an EX (exclusive) lock". The lock modes are those which would normally be sent by a remote node in its callback so if you want to unlock a glock, you use EX, to demote to shared, use SH or PR (depending on whether you like GFS2 or DLM lock modes better!). If the glock doesn't exist, you'll get -ENOENT returned. If the arguments don't make sense, you'll get -EINVAL returned. The plan is that this interface will be used in combination with the blktrace patch which I recently posted for comments although it is, of course, still useful in its own right. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
Since we have a UUID, we ought to expose it to the user via sysfs and uevents. We already have the fs name in both of these places (a combination of the lock proto and lock table name) so if we add the UUID as well, we have a full set. For older filesystems (i.e. those created before mkfs.gfs2 was writing UUIDs by default) the sysfs file will appear zero length, and no UUID env var will be added to the uevents. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This patch allows GFS2 to generate discard requests for blocks which are no longer useful to the filesystem (i.e. those which have been freed as the result of an unlink operation). The requests are generated at the time which those blocks become available for reuse in the filesystem. In order to use this new feature, you have to specify the "discard" mount option. The code coalesces adjacent blocks into a single extent when generating the discard requests, thus generating the minimum number. If an error occurs when the request has been sent to the block device, then it will print a message and turn off the requests for that filesystem. If the problem is temporary, then you can use remount to turn the option back on again. There is also a nodiscard mount option so that you can use remount to turn discard requests off, if required. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This patch fixes a deadlock when the journal is flushed and there are dirty inodes other than the one which caused the journal flush. Originally the journal flushing code was trying to obtain the transaction glock while running the flush code for an inode glock. We no longer require the transaction glock at this point in time since we know that any attempt to get the transaction glock from another node will result in a journal flush. So if we are flushing the journal, we can be sure that the transaction lock is still cached from when the transaction was started. By inlining a version of gfs2_trans_begin() (minus the bit which gets the transaction glock) we can avoid the deadlock problems caused if there is a demote request queued up on the transaction glock. In addition I've also moved the umount rwsem so that it covers the glock workqueue, since it all demotions are done by this workqueue now. That fixes a bug on umount which I came across while fixing the original problem. Reported-by: NDavid Teigland <teigland@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
We were keeping hold of an extra ref to the root inode in one of the error paths, that resulted in a hang. Reported-by: NNate Straz <nstraz@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Tested-by: NRobert Peterson <rpeterso@redhat.com>
-
由 Steven Whitehouse 提交于
The time stamp field is unused in the glock now that we are using a shrinker, so that we can remove it and save sizeof(unsigned long) bytes in each glock. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
We only really need a single spin lock for the quota data, so lets just use the lru lock for now. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
-
由 Abhijith Das 提交于
Deallocation of gfs2_quota_data objects now happens on-demand through a shrinker instead of routinely deallocating through the quotad daemon. Signed-off-by: NAbhijith Das <adas@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Abhijith Das 提交于
The quota code uses lvbs and this is currently not implemented in lock_nolock, thereby causing panics when quota is enabled with lock_nolock. This patch adds the relevant bits. Signed-off-by: NAbhijith Das <adas@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Steven Whitehouse 提交于
The following patch fixes an issue relating to remount and argument parsing. After this fix is applied, remount becomes atomic in that it either succeeds changing the mount to the new state, or it fails and leaves it in the old state. Previously it was possible for the parsing of options to fail part way though and for the fs to be left in a state where some of the new arguments had been applied, but some had not. Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
- 23 3月, 2009 3 次提交
-
-
由 Gertjan van Wingerde 提交于
Update all previous incarnations of my email address to the correct one. Signed-off-by: NGertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tyler Hicks 提交于
If ecryptfs_encrypted_view or ecryptfs_xattr_metadata were being specified as mount options, a NULL pointer dereference of crypt_stat was possible during lookup. This patch moves the crypt_stat assignment into ecryptfs_lookup_and_interpose_lower(), ensuring that crypt_stat will not be NULL before we attempt to dereference it. Thanks to Dan Carpenter and his static analysis tool, smatch, for finding this bug. Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com> Acked-by: NDustin Kirkland <kirkland@canonical.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tyler Hicks 提交于
When allocating the memory used to store the eCryptfs header contents, a single, zeroed page was being allocated with get_zeroed_page(). However, the size of an eCryptfs header is either PAGE_CACHE_SIZE or ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE (8192), whichever is larger, and is stored in the file's private_data->crypt_stat->num_header_bytes_at_front field. ecryptfs_write_metadata_to_contents() was using num_header_bytes_at_front to decide how many bytes should be written to the lower filesystem for the file header. Unfortunately, at least 8K was being written from the page, despite the chance of the single, zeroed page being smaller than 8K. This resulted in random areas of kernel memory being written between the 0x1000 and 0x1FFF bytes offsets in the eCryptfs file headers if PAGE_SIZE was 4K. This patch allocates a variable number of pages, calculated with num_header_bytes_at_front, and passes the number of allocated pages along to ecryptfs_write_metadata_to_contents(). Thanks to Florian Streibelt for reporting the data leak and working with me to find the problem. 2.6.28 is the only kernel release with this vulnerability. Corresponds to CVE-2009-0787 Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com> Acked-by: NDustin Kirkland <kirkland@canonical.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NEugene Teo <eugeneteo@kernel.sg> Cc: Greg KH <greg@kroah.com> Cc: dann frazier <dannf@dannf.org> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Florian Streibelt <florian@f-streibelt.de> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 3月, 2009 3 次提交
-
-
由 Jeff Moyer 提交于
The libaio test harness turned up a problem whereby lookup_ioctx on a bogus io context was returning the 1 valid io context from the list (harness/cases/3.p). Because of that, an extra put_iocontext was done, and when the process exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio (since we expect a users count of 1 and instead get 0). The problem was introduced by "aio: make the lookup_ioctx() lockless" (commit abf137dd). Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not return with a NULL tpos at the end of the loop, even if the entry was not found. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Acked-by: NZach Brown <zach.brown@oracle.com> Acked-by: NJens Axboe <jens.axboe@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davide Libenzi 提交于
Remove a source of fput() call from inside IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was able to, with the attached test program. Independently from this, the bug is conceptually there, so we might be better off fixing it. This patch adds an optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty in general, but the alternative here would be to add a brand new delayed fput() infrastructure, that I'm not sure is worth it. Signed-off-by: NDavide Libenzi <davidel@xmailserver.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Nick Piggin noticed this (very unlikely) race between setting a page dirty and creating the buffers for it - we need to hold the mapping private_lock until we've set the page dirty bit in order to make sure that create_empty_buffers() might not build up a set of buffers without the dirty bits set when the page is dirty. I doubt anybody has ever hit this race (and it didn't solve the issue Nick was looking at), but as Nick says: "Still, it does appear to solve a real race, which we should close." Acked-by: NNick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 3月, 2009 2 次提交
-
-
由 Benny Halevy 提交于
Although this operation is unsupported by our implementation we still need to provide an encode routine for it to merely encode its (error) status back in the compound reply. Thanks for Bill Baker at sun.com for testing with the Sun OpenSolaris' client, finding, and reporting this bug at Connectathon 2009. This bug was introduced in 2.6.27 Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Cc: stable@kernel.org Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Linus Torvalds 提交于
Commit ee6f779b ("filp->f_pos not correctly updated in proc_task_readdir") changed the proc code to use filp->f_pos directly, rather than through a temporary variable. In the process, that caused the operations to be done on the full 64 bits, even though the offset is never that big. That's all fine and dandy per se, but for some unfathomable reason gcc generates absolutely horrid code when using 64-bit values in switch() statements. To the point of actually calling out to gcc helper functions like __cmpdi2 rather than just doing the trivial comparisons directly the way gcc does for normal compares. At which point we get link failures, because we really don't want to support that kind of crazy code. Fix this by just casting the f_pos value to "unsigned long", which is plenty big enough for /proc, and avoids the gcc code generation issue. Reported-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Zhang Le <r0bertz@gentoo.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 3月, 2009 1 次提交
-
-
由 Eric Sandeen 提交于
This is for Red Hat bug 490026: EXT4 panic, list corruption in ext4_mb_new_inode_pa ext4_lock_group(sb, group) is supposed to protect this list for each group, and a common code flow to remove an album is like this: ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL); ext4_lock_group(sb, grp); list_del(&pa->pa_group_list); ext4_unlock_group(sb, grp); so it's critical that we get the right group number back for this prealloc context, to lock the right group (the one associated with this pa) and prevent concurrent list manipulation. however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a comment, "-1 is to protect from crossing allocation group". This makes sense for the group_pa, where pa_pstart is advanced by the length which has been used (in ext4_mb_release_context()), and when the entire length has been used, pa_pstart has been advanced to the first block of the next group. However, for inode_pa, pa_pstart is never advanced; it's just set once to the first block in the group and not moved after that. So in this case, if we subtract one in ext4_mb_put_pa(), we are actually locking the *previous* group, and opening the race with the other threads which do not subtract off the extra block. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 16 3月, 2009 1 次提交
-
-
由 Zhang Le 提交于
filp->f_pos only get updated at the end of the function. Thus d_off of those dirents who are in the middle will be 0, and this will cause a problem in glibc's readdir implementation, specifically endless loop. Because when overflow occurs, f_pos will be set to next dirent to read, however it will be 0, unless the next one is the last one. So it will start over again and again. There is a sample program in man 2 gendents. This is the output of the program running on a multithread program's task dir before this patch is applied: $ ./a.out /proc/3807/task --------------- nread=128 --------------- i-node# file type d_reclen d_off d_name 506442 directory 16 1 . 506441 directory 16 0 .. 506443 directory 16 0 3807 506444 directory 16 0 3809 506445 directory 16 0 3812 506446 directory 16 0 3861 506447 directory 16 0 3862 506448 directory 16 8 3863 This is the output after this patch is applied $ ./a.out /proc/3807/task --------------- nread=128 --------------- i-node# file type d_reclen d_off d_name 506442 directory 16 1 . 506441 directory 16 2 .. 506443 directory 16 3 3807 506444 directory 16 4 3809 506445 directory 16 5 3812 506446 directory 16 6 3861 506447 directory 16 7 3862 506448 directory 16 8 3863 Signed-off-by: NZhang Le <r0bertz@gentoo.org> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 3月, 2009 5 次提交
-
-
由 Li Zefan 提交于
If bio_integrity_clone() fails, bio_clone() returns NULL without freeing the newly allocated bio. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 un'ichi Nomura 提交于
Stricter gfp_mask might be required for clone allocation. For example, request-based dm may clone bio in interrupt context so it has to use GFP_ATOMIC. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tyler Hicks 提交于
eCryptfs has file encryption keys (FEK), file encryption key encryption keys (FEKEK), and filename encryption keys (FNEK). The per-file FEK is encrypted with one or more FEKEKs and stored in the header of the encrypted file. I noticed that the FEK is also being encrypted by the FNEK. This is a problem if a user wants to use a different FNEK than their FEKEK, as their file contents will still be accessible with the FNEK. This is a minimalistic patch which prevents the FNEKs signatures from being copied to the inode signatures list. Ultimately, it keeps the FEK from being encrypted with a FNEK. Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Acked-by: NDustin Kirkland <kirkland@canonical.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
When a ramfs nommu mapping is expanded, contiguous pages are allocated and added to the pagecache. The caller's reference is then passed on by moving whole pagevecs to the file lru list. If the page cache adding fails, make sure that the error path also moves the pagevec contents which might still contain up to PAGEVEC_SIZE successfully added pages, of which we would leak references otherwise. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: David Howells <dhowells@redhat.com> Cc: Enrik Berkhan <Enrik.Berkhan@ge.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Enrik Berkhan 提交于
The pages attached to a ramfs inode's pagecache by truncation from nothing - as done by SYSV SHM for example - may get discarded under memory pressure. The problem is that the pages are not marked dirty. Anything that creates data in an MMU-based ramfs will cause the pages holding that data will cause the set_page_dirty() aop to be called. For the NOMMU-based mmap, set_page_dirty() may be called by write(), but it won't be called by page-writing faults on writable mmaps, and it isn't called by ramfs_nommu_expand_for_mapping() when a file is being truncated from nothing to allocate a contiguous run. The solution is to mark the pages dirty at the point of allocation by the truncation code. Signed-off-by: NEnrik Berkhan <Enrik.Berkhan@ge.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 3月, 2009 1 次提交
-
-
由 Eric Sandeen 提交于
Thiemo Nagel reported that: # dd if=/dev/zero of=image.ext4 bs=1M count=2 # mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \ -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4 # mount -o loop image.ext4 mnt/ # dd if=/dev/zero of=mnt/file oopsed, with a BUG_ON in ext4_mb_normalize_request because size == EXT4_BLOCKS_PER_GROUP It appears to me (esp. after talking to Andreas) that the BUG_ON is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should be allowed, though larger sizes do indicate a problem. Fix that an another (apparently rare) codepath with a similar check. Reported-by: NThiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 3月, 2009 1 次提交
-
-
由 Tao Ma 提交于
A long time ago, xs->base is allocated a 4K size and all the contents in the bucket are copied to the it. Now we use ocfs2_xattr_bucket to abstract xattr bucket and xs->base is initialized to the start of the bu_bhs[0]. So xs->base + offset will overflow when the value root is stored outside the first block. Then why we can survive the xattr test by now? It is because we always read the bucket contiguously now and kernel mm allocate continguous memory for us. We are lucky, but we should fix it. So just get the right value root as other callers do. Signed-off-by: NTao Ma <tao.ma@oracle.com> Acked-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-