- 26 3月, 2009 15 次提交
-
-
由 Jan Kara 提交于
Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: NJan Kara <jack@suse.cz> CC: linux-ext4@vger.kernel.org
-
由 Jan Kara 提交于
Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: NJan Kara <jack@suse.cz> CC: linux-ext4@vger.kernel.org
-
由 Jan Kara 提交于
Ramfs has no bussiness in quotas. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
Use lowercase names of quota functions instead of old uppercase ones. Signed-off-by: NJan Kara <jack@suse.cz> CC: Alexander Viro <viro@zeniv.linux.org.uk>
-
由 Jan Kara 提交于
Remove bogus typedef which is just a definition of char *. Remove unnecessary type casts. Substitute freedqbuf() with kfree. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
Remove this macro which is just a definition of NULL. Fix a few coding style issues along the way. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
Andrew Morton has suggested that three global quota locks can end up in the same cacheline which can result in bad cacheline ping-pong on SMP machines. Make locks cacheline aligned so that we avoid this problem (thanks goes to Andrew for the idea). Signed-off-by: NJan Kara <jack@suse.cz> CC: Andrew Morton <akpm@linux-foundation.org>
-
由 Jan Kara 提交于
Quota subsystem has more and more files. It's time to create a dir for it. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Mingming Cao 提交于
Uses quota reservation/claim/release to handle quota properly for delayed allocation in the three steps: 1) quotas are reserved when data being copied to cache when block allocation is defered 2) when new blocks are allocated. reserved quotas are converted to the real allocated quota, 2) over-booked quotas for metadata blocks are released back. Signed-off-by: NMingming Cao <cmm@us.ibm.com> Acked-by: N"Theodore Ts'o" <tytso@mit.edu> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
reiserfs_dquot_initialize() and reiserfs_dquot_drop() is no longer needed because of modified quota locking. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
ext4_dquot_initialize() and ext4_dquot_drop() is no longer needed because of modified quota locking. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Jan Kara 提交于
ext3_dquot_initialize() and ext3_dquot_drop() is no longer needed because of modified quota locking. Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Mingming Cao 提交于
According to checkpatch: EXPORT_SYMBOL(foo); should immediately follow its function/variable Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Mingming Cao 提交于
Reserved quota will be claimed at the block allocation time. Over-booked quota could be returned back with the release callback function. Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
由 Mingming Cao 提交于
Delayed allocation defers the block allocation at the dirty pages flush-out time, doing quota charge/check at that time is too late. But we can't charge the quota blocks until blocks are really allocated, otherwise users could get overcharged after reboot from system crash. This patch adds quota reservation for delayed allocation. Quota blocks are reserved in memory, inode and quota won't gets dirtied until later block allocation time. Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 23 3月, 2009 3 次提交
-
-
由 Gertjan van Wingerde 提交于
Update all previous incarnations of my email address to the correct one. Signed-off-by: NGertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tyler Hicks 提交于
If ecryptfs_encrypted_view or ecryptfs_xattr_metadata were being specified as mount options, a NULL pointer dereference of crypt_stat was possible during lookup. This patch moves the crypt_stat assignment into ecryptfs_lookup_and_interpose_lower(), ensuring that crypt_stat will not be NULL before we attempt to dereference it. Thanks to Dan Carpenter and his static analysis tool, smatch, for finding this bug. Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com> Acked-by: NDustin Kirkland <kirkland@canonical.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tyler Hicks 提交于
When allocating the memory used to store the eCryptfs header contents, a single, zeroed page was being allocated with get_zeroed_page(). However, the size of an eCryptfs header is either PAGE_CACHE_SIZE or ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE (8192), whichever is larger, and is stored in the file's private_data->crypt_stat->num_header_bytes_at_front field. ecryptfs_write_metadata_to_contents() was using num_header_bytes_at_front to decide how many bytes should be written to the lower filesystem for the file header. Unfortunately, at least 8K was being written from the page, despite the chance of the single, zeroed page being smaller than 8K. This resulted in random areas of kernel memory being written between the 0x1000 and 0x1FFF bytes offsets in the eCryptfs file headers if PAGE_SIZE was 4K. This patch allocates a variable number of pages, calculated with num_header_bytes_at_front, and passes the number of allocated pages along to ecryptfs_write_metadata_to_contents(). Thanks to Florian Streibelt for reporting the data leak and working with me to find the problem. 2.6.28 is the only kernel release with this vulnerability. Corresponds to CVE-2009-0787 Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com> Acked-by: NDustin Kirkland <kirkland@canonical.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NEugene Teo <eugeneteo@kernel.sg> Cc: Greg KH <greg@kroah.com> Cc: dann frazier <dannf@dannf.org> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Florian Streibelt <florian@f-streibelt.de> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 3月, 2009 3 次提交
-
-
由 Jeff Moyer 提交于
The libaio test harness turned up a problem whereby lookup_ioctx on a bogus io context was returning the 1 valid io context from the list (harness/cases/3.p). Because of that, an extra put_iocontext was done, and when the process exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio (since we expect a users count of 1 and instead get 0). The problem was introduced by "aio: make the lookup_ioctx() lockless" (commit abf137dd). Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not return with a NULL tpos at the end of the loop, even if the entry was not found. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Acked-by: NZach Brown <zach.brown@oracle.com> Acked-by: NJens Axboe <jens.axboe@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davide Libenzi 提交于
Remove a source of fput() call from inside IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was able to, with the attached test program. Independently from this, the bug is conceptually there, so we might be better off fixing it. This patch adds an optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty in general, but the alternative here would be to add a brand new delayed fput() infrastructure, that I'm not sure is worth it. Signed-off-by: NDavide Libenzi <davidel@xmailserver.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
Nick Piggin noticed this (very unlikely) race between setting a page dirty and creating the buffers for it - we need to hold the mapping private_lock until we've set the page dirty bit in order to make sure that create_empty_buffers() might not build up a set of buffers without the dirty bits set when the page is dirty. I doubt anybody has ever hit this race (and it didn't solve the issue Nick was looking at), but as Nick says: "Still, it does appear to solve a real race, which we should close." Acked-by: NNick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 3月, 2009 2 次提交
-
-
由 Benny Halevy 提交于
Although this operation is unsupported by our implementation we still need to provide an encode routine for it to merely encode its (error) status back in the compound reply. Thanks for Bill Baker at sun.com for testing with the Sun OpenSolaris' client, finding, and reporting this bug at Connectathon 2009. This bug was introduced in 2.6.27 Signed-off-by: NBenny Halevy <bhalevy@panasas.com> Cc: stable@kernel.org Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Linus Torvalds 提交于
Commit ee6f779b ("filp->f_pos not correctly updated in proc_task_readdir") changed the proc code to use filp->f_pos directly, rather than through a temporary variable. In the process, that caused the operations to be done on the full 64 bits, even though the offset is never that big. That's all fine and dandy per se, but for some unfathomable reason gcc generates absolutely horrid code when using 64-bit values in switch() statements. To the point of actually calling out to gcc helper functions like __cmpdi2 rather than just doing the trivial comparisons directly the way gcc does for normal compares. At which point we get link failures, because we really don't want to support that kind of crazy code. Fix this by just casting the f_pos value to "unsigned long", which is plenty big enough for /proc, and avoids the gcc code generation issue. Reported-by: NAlexey Dobriyan <adobriyan@gmail.com> Cc: Zhang Le <r0bertz@gentoo.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 3月, 2009 1 次提交
-
-
由 Eric Sandeen 提交于
This is for Red Hat bug 490026: EXT4 panic, list corruption in ext4_mb_new_inode_pa ext4_lock_group(sb, group) is supposed to protect this list for each group, and a common code flow to remove an album is like this: ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL); ext4_lock_group(sb, grp); list_del(&pa->pa_group_list); ext4_unlock_group(sb, grp); so it's critical that we get the right group number back for this prealloc context, to lock the right group (the one associated with this pa) and prevent concurrent list manipulation. however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a comment, "-1 is to protect from crossing allocation group". This makes sense for the group_pa, where pa_pstart is advanced by the length which has been used (in ext4_mb_release_context()), and when the entire length has been used, pa_pstart has been advanced to the first block of the next group. However, for inode_pa, pa_pstart is never advanced; it's just set once to the first block in the group and not moved after that. So in this case, if we subtract one in ext4_mb_put_pa(), we are actually locking the *previous* group, and opening the race with the other threads which do not subtract off the extra block. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 16 3月, 2009 1 次提交
-
-
由 Zhang Le 提交于
filp->f_pos only get updated at the end of the function. Thus d_off of those dirents who are in the middle will be 0, and this will cause a problem in glibc's readdir implementation, specifically endless loop. Because when overflow occurs, f_pos will be set to next dirent to read, however it will be 0, unless the next one is the last one. So it will start over again and again. There is a sample program in man 2 gendents. This is the output of the program running on a multithread program's task dir before this patch is applied: $ ./a.out /proc/3807/task --------------- nread=128 --------------- i-node# file type d_reclen d_off d_name 506442 directory 16 1 . 506441 directory 16 0 .. 506443 directory 16 0 3807 506444 directory 16 0 3809 506445 directory 16 0 3812 506446 directory 16 0 3861 506447 directory 16 0 3862 506448 directory 16 8 3863 This is the output after this patch is applied $ ./a.out /proc/3807/task --------------- nread=128 --------------- i-node# file type d_reclen d_off d_name 506442 directory 16 1 . 506441 directory 16 2 .. 506443 directory 16 3 3807 506444 directory 16 4 3809 506445 directory 16 5 3812 506446 directory 16 6 3861 506447 directory 16 7 3862 506448 directory 16 8 3863 Signed-off-by: NZhang Le <r0bertz@gentoo.org> Acked-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 3月, 2009 5 次提交
-
-
由 Li Zefan 提交于
If bio_integrity_clone() fails, bio_clone() returns NULL without freeing the newly allocated bio. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 un'ichi Nomura 提交于
Stricter gfp_mask might be required for clone allocation. For example, request-based dm may clone bio in interrupt context so it has to use GFP_ATOMIC. Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: NMartin K. Petersen <martin.petersen@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
-
由 Tyler Hicks 提交于
eCryptfs has file encryption keys (FEK), file encryption key encryption keys (FEKEK), and filename encryption keys (FNEK). The per-file FEK is encrypted with one or more FEKEKs and stored in the header of the encrypted file. I noticed that the FEK is also being encrypted by the FNEK. This is a problem if a user wants to use a different FNEK than their FEKEK, as their file contents will still be accessible with the FNEK. This is a minimalistic patch which prevents the FNEKs signatures from being copied to the inode signatures list. Ultimately, it keeps the FEK from being encrypted with a FNEK. Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Acked-by: NDustin Kirkland <kirkland@canonical.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
When a ramfs nommu mapping is expanded, contiguous pages are allocated and added to the pagecache. The caller's reference is then passed on by moving whole pagevecs to the file lru list. If the page cache adding fails, make sure that the error path also moves the pagevec contents which might still contain up to PAGEVEC_SIZE successfully added pages, of which we would leak references otherwise. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: David Howells <dhowells@redhat.com> Cc: Enrik Berkhan <Enrik.Berkhan@ge.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Enrik Berkhan 提交于
The pages attached to a ramfs inode's pagecache by truncation from nothing - as done by SYSV SHM for example - may get discarded under memory pressure. The problem is that the pages are not marked dirty. Anything that creates data in an MMU-based ramfs will cause the pages holding that data will cause the set_page_dirty() aop to be called. For the NOMMU-based mmap, set_page_dirty() may be called by write(), but it won't be called by page-writing faults on writable mmaps, and it isn't called by ramfs_nommu_expand_for_mapping() when a file is being truncated from nothing to allocate a contiguous run. The solution is to mark the pages dirty at the point of allocation by the truncation code. Signed-off-by: NEnrik Berkhan <Enrik.Berkhan@ge.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 3月, 2009 1 次提交
-
-
由 Eric Sandeen 提交于
Thiemo Nagel reported that: # dd if=/dev/zero of=image.ext4 bs=1M count=2 # mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \ -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4 # mount -o loop image.ext4 mnt/ # dd if=/dev/zero of=mnt/file oopsed, with a BUG_ON in ext4_mb_normalize_request because size == EXT4_BLOCKS_PER_GROUP It appears to me (esp. after talking to Andreas) that the BUG_ON is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should be allowed, though larger sizes do indicate a problem. Fix that an another (apparently rare) codepath with a similar check. Reported-by: NThiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 13 3月, 2009 9 次提交
-
-
由 Tao Ma 提交于
A long time ago, xs->base is allocated a 4K size and all the contents in the bucket are copied to the it. Now we use ocfs2_xattr_bucket to abstract xattr bucket and xs->base is initialized to the start of the bu_bhs[0]. So xs->base + offset will overflow when the value root is stored outside the first block. Then why we can survive the xattr test by now? It is because we always read the bucket contiguously now and kernel mm allocate continguous memory for us. We are lucky, but we should fix it. So just get the right value root as other callers do. Signed-off-by: NTao Ma <tao.ma@oracle.com> Acked-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tao Ma 提交于
We need to use le32_to_cpu to test rec->e_cpos in ocfs2_dinode_insert_check. Signed-off-by: NTao Ma <tao.ma@oracle.com> Acked-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tiger Yang 提交于
Replace max_inline_data with max_inline_data_with_xattr to ensure it correct when xattr inlined. Signed-off-by: NTiger Yang <tiger.yang@oracle.com> Acked-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Tiger Yang 提交于
If this is a new directory with inline data, we choose to reserve the entire inline area for directory contents and force an external xattr block. Signed-off-by: NTiger Yang <tiger.yang@oracle.com> Acked-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com>
-
由 Nick Piggin 提交于
There was a report of a data corruption http://lkml.org/lkml/2008/11/14/121. There is a script included to reproduce the problem. During testing, I encountered a number of strange things with ext3, so I tried ext2 to attempt to reduce complexity of the problem. I found that fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be cleared, even though instrumentation showed that unlock_new_inode had already been called for that inode. This points to memory scribble, or synchronisation problme. i_state of I_NEW inodes is not protected by inode_lock because other processes are not supposed to touch them until I_LOCK (and I_NEW) is cleared. Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify i_state revealed that generic_sync_sb_inodes is picking up new inodes from the inode lists and passing them to __writeback_single_inode without waiting for I_NEW. Subsequently modifying i_state causes corruption. In my case it would look like this: CPU0 CPU1 unlock_new_inode() __sync_single_inode() reg <- inode->i_state reg -> reg & ~(I_LOCK|I_NEW) reg <- inode->i_state reg -> inode->i_state reg -> reg | I_SYNC reg -> inode->i_state Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again. Fix for this is rather than wait for I_NEW inodes, just skip over them: inodes concurrently being created are not subject to data integrity operations, and should not significantly contribute to dirty memory either. After this change, I'm unable to reproduce any of the added warnings or hangs after ~1hour of running. Previously, the new warnings would start immediately and hang would happen in under 5 minutes. I'm also testing on ext3 now, and so far no problems there either. I don't know whether this fixes the problem reported above, but it fixes a real problem for me. Cc: "Jorge Boncompte [DTI2]" <jorge@dti2.net> Reported-by: NAdrian Hunter <ext-adrian.hunter@nokia.com> Cc: Jan Kara <jack@suse.cz> Cc: <stable@kernel.org> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Li Zefan 提交于
In sget(), destroy_super(s) is called with s->s_umount held, which makes lockdep unhappy. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Menage <menage@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
If the second fasync_helper() fails, pipe_rdwr_fasync() returns the error but leaves the file on ->fasync_readers. This was always wrong, but since 233e70f4 "saner FASYNC handling on file close" we have the new problem. Because in this case setfl() doesn't set FASYNC bit, __fput() will not do ->fasync(0), and we leak fasync_struct with ->fa_file pointing to the freed file. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Trond Myklebust 提交于
Stephen Rothwell reports: Today's linux-next build (powerpc ppc64_defconfig) failed like this: fs/built-in.o: In function `.nfs_get_client': client.c:(.text+0x115010): undefined reference to `.__ipv6_addr_type' Fix by moving the IPV6 specific parts of commit d7371c41 ("Bug 11061, NFS mounts dropped") into the '#ifdef IPV6..." section. Also fix up a couple of formatting issues. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Theodore Ts'o 提交于
This is a short-term warning, and even printk_ratelimit() can result in too much noise in system logs. So only print it once as a warning. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-