- 28 10月, 2010 14 次提交
-
-
由 Maciej Żenczykowski 提交于
An ext4 filesystem on a read-only device, with an external journal which is at a different device number then recorded in the superblock will fail to honor the read-only setting of the device and trigger a superblock update (write). For example: - ext4 on a software raid which is in read-only mode - external journal on a read-write device which has changed device num - attempt to mount with -o journal_dev=<new_number> - hits BUG_ON(mddev->ro = 1) in md.c Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: NMaciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Change ext4_ext_zeroout to use sb_issue_zeroout instead of its own approach to zero out extents. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
Use sb_issue_zeroout to zero out inode table and descriptor table blocks instead of old approach which involves journaling. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
User-space should have the opportunity to check what features doest ext4 support in each particular copy. This adds easy interface by creating new "features" directory in sys/fs/ext4/. In that directory files advertising feature names can be created. Add lazy_itable_init to the feature list. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
When the lazy_itable_init extended option is passed to mke2fs, it considerably speeds up filesystem creation because inode tables are not zeroed out. The fact that parts of the inode table are uninitialized is not a problem so long as the block group descriptors, which contain information regarding how much of the inode table has been initialized, has not been corrupted However, if the block group checksums are not valid, e2fsck must scan the entire inode table, and the the old, uninitialized data could potentially cause e2fsck to report false problems. Hence, it is important for the inode tables to be initialized as soon as possble. This commit adds this feature so that mke2fs can safely use the lazy inode table initialization feature to speed up formatting file systems. This is done via a new new kernel thread called ext4lazyinit, which is created on demand and destroyed, when it is no longer needed. There is only one thread for all ext4 filesystems in the system. When the first filesystem with inititable mount option is mounted, ext4lazyinit thread is created, then the filesystem can register its request in the request list. This thread then walks through the list of requests picking up scheduled requests and invoking ext4_init_inode_table(). Next schedule time for the request is computed by multiplying the time it took to zero out last inode table with wait multiplier, which can be set with the (init_itable=n) mount option (default is 10). We are doing this so we do not take the whole I/O bandwidth. When the thread is no longer necessary (request list is empty) it frees the appropriate structures and exits (and can be created later later by another filesystem). We do not disturb regular inode allocations in any way, it just do not care whether the inode table is, or is not zeroed. But when zeroing, we have to skip used inodes, obviously. Also we should prevent new inode allocations from the group, while zeroing is on the way. For that we take write alloc_sem lock in ext4_init_inode_table() and read alloc_sem in the ext4_claim_inode, so when we are unlucky and allocator hits the group which is currently being zeroed, it just has to wait. This can be suppresed using the mount option no_init_itable. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
An attempt to modify the file system during the call to jbd2_destroy_journal() can lead to a system lockup. So add some checking to make it much more obvious when this happens to and to determine where the offending code is located. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Sergey Senozhatsky 提交于
Fix NULL pointer dereference in print_daily_error_info, when called on unmounted fs (EXT4_SB(sb) returns NULL), by removing error reporting timer in ext4_put_super. Google-Bug-Id: 3017663 Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
We can't hold the block group spinlock because we ext4_issue_discard() calls wait and hence can get rescheduled. Google-Bug-Id: 3017678 Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Lukas Czerner 提交于
sb_issue_discard() is returning negative error code, so check for -EOPNOTSUPP. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
I'm uneasy with lots of stuff going on in ext4_da_writepages(), but bumping nr_to_write from LLONG_MAX to -8 clearly isn't making anything better, so avoid the multiplier in that case. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Today we simply break out of the inner loop when we have accumulated max_pages; this keeps scanning forwad and doing pagevec_lookup_tag() in the while (!done) loop, this does potentially a lot of work with no net effect. When we have accumulated max_pages, just clean up and return. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Curt Wohlgemuth 提交于
ext4_group_info structures are currently allocated with kmalloc(). With a typical 4K block size, these are 136 bytes each -- meaning they'll each consume a 256-byte slab object. On a system with many ext4 large partitions, that's a lot of wasted kernel slab space. (E.g., a single 1TB partition will have about 8000 block groups, using about 2MB of slab, of which nearly 1MB is wasted.) This patch creates an array of slab pointers created as needed -- depending on the superblock block size -- and uses these slabs to allocate the group info objects. Google-Bug-Id: 2980809 Signed-off-by: NCurt Wohlgemuth <curtw@google.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Brian King 提交于
This fixes a hang seen in jbd2_journal_release_jbd_inode on a lot of Power 6 systems running with ext4. When we get in the hung state, all I/O to the disk in question gets blocked where we stay indefinitely. Looking at the task list, I can see we are stuck in jbd2_journal_release_jbd_inode waiting on a wake up. I added some debug code to detect this scenario and dump additional data if we were stuck in jbd2_journal_release_jbd_inode for longer than 30 minutes. When it hit, I was able to see that i_flags was 0, suggesting we missed the wake up. This patch changes i_flags to be an unsigned long, uses bit operators to access it, and adds barriers around the accesses. Prior to applying this patch, we were regularly hitting this hang on numerous systems in our test environment. After applying the patch, the hangs no longer occur. Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
It turns out we have several problems with how EOFBLOCKS_FL is handled. First of all, there was a fencepost error where we were not clearing the EOFBLOCKS_FL when fill in the last uninitialized block, but rather when we allocate the next block _after_ the uninitalized block. Secondly we were not testing to see if we needed to clear the EOFBLOCKS_FL when writing to the file O_DIRECT or when were converting an uninitialized block (which is the most common case). Google-Bug-Id: 2928259 Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 24 9月, 2010 5 次提交
-
-
由 Srinivas Eeda 提交于
While umounting, a block mle doesn't get freed if dlm is shutdown after master request is received but before assert master. This results in unclean shutdown of dlm domain. This patch frees all mles that lie around after other nodes were notified about exiting the dlm and marking dlm state as leaving. Only block mles are expected to be around, so we log ERROR for other mles but still free them. Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
We sync our inode flags with ext2 and define them by hex values. But actually in commit 36695673(4 years ago), all these values are moved to include/linux/fs.h. So we'd better also use them as what ext2 did. So sync our inode flags with ext2 by using FS_*. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
The first time I read the function ocfs2_resmap_resv_bits, I consider about what 'wanted' will be used and consider about the comments. Then I find it is only used if the reservation is empty. ;) So we'd better move it to the parens so that it make the code more readable, what's more, ocfs2_resmap_resv_bits is used so frequently and we should save some cpus. Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
e_leaf_clusters is a le16, so use cpu_to_le16 instead of cpu_to_le32. What's more, we change 'clusters' to unsigned int to signify that the size of 'clusters' isn't important here. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
In commit 30e2bab2, ext3 fixed it. So change it accordingly in ocfs2. Steps to reproduce: # touch aaa # stat -c %Z aaa 1283760364 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1283760364 Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 23 9月, 2010 4 次提交
-
-
由 KOSAKI Motohiro 提交于
Currently, /proc/<pid>/smaps has wrong dirty pages accounting. Shared_Dirty and Private_Dirty output only pte dirty pages and ignore PG_dirty page flag. It is difference against documentation, but also inconsistent against Referenced field. (Referenced checks both pte and page flags) This patch fixes it. Test program: large-array.c --------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> char array[1*1024*1024*1024L]; int main(void) { memset(array, 1, sizeof(array)); pause(); return 0; } --------------------------------------------------- Test case: 1. run ./large-array 2. cat /proc/`pidof large-array`/smaps 3. swapoff -a 4. cat /proc/`pidof large-array`/smaps again Test result: <before patch> 00601000-40601000 rw-p 00000000 00:00 0 Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 218992 kB <-- showed pages as clean incorrectly Private_Dirty: 829584 kB Referenced: 388364 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB <after patch> 00601000-40601000 rw-p 00000000 00:00 0 Size: 1048576 kB Rss: 1048576 kB Pss: 1048576 kB Shared_Clean: 0 kB Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 1048576 kB <-- fixed Referenced: 388480 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
OCFS2 can return ERESTARTSYS from its write function when the process is signalled while waiting for a cluster lock (and the filesystem is mounted with intr mount option). Generally, it seems reasonable to allow filesystems to return this error code from its IO functions. As we must not leak ERESTARTSYS (and similar error codes) to userspace as a result of an AIO operation, we have to properly convert it to EINTR inside AIO code (restarting the syscall isn't really an option because other AIO could have been already submitted by the same io_submit syscall). Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NJeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
Commit 73296bc6 ("procfs: Use generic_file_llseek in /proc/vmcore") broke seeking on /proc/vmcore. This changes it back to use default_llseek in order to restore the original behaviour. The problem with generic_file_llseek is that it only allows seeks up to inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual file systems. We should merge generic_file_llseek and default_llseek some day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore is the safer solution. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: NCAI Qian <caiqian@redhat.com> Tested-by: NCAI Qian <caiqian@redhat.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Rosenberg 提交于
In 32-bit compatibility mode, the error handling for compat_do_readv_writev() may free an uninitialized pointer, potentially leading to all sorts of ugly memory corruption. This is reliably triggerable by unprivileged users by invoking the readv()/writev() syscalls with an invalid iovec pointer. The below patch fixes this to emulate the non-compat version. Introduced by commit b8373363 ("compat: factor out compat_rw_copy_check_uvector from compat_do_readv_writev") Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com> Cc: stable@kernel.org (2.6.35) Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 9月, 2010 2 次提交
-
-
由 Jan Kara 提交于
Inodes of devices such as /dev/zero can get dirty for example via utime(2) syscall or due to atime update. Backing device of such inodes (zero_bdi, etc.) is however unable to handle dirty inodes and thus __mark_inode_dirty complains. In fact, inode should be rather dirtied against backing device of the filesystem holding it. This is generally a good rule except for filesystems such as 'bdev' or 'mtd_inodefs'. Inodes in these pseudofilesystems are referenced from ordinary filesystem inodes and carry mapping with real data of the device. Thus for these inodes we have to use inode->i_mapping->backing_dev_info as we did so far. We distinguish these filesystems by checking whether sb->s_bdi points to a non-trivial backing device or not. Example: Assume we have an ext3 filesystem on /dev/sda1 mounted on /. There's a device inode A described by a path "/dev/sdb" on this filesystem. This inode will be dirtied against backing device "8:0" after this patch. bdev filesystem contains block device inode B coupled with our inode A. When someone modifies a page of /dev/sdb, it's B that gets dirtied and the dirtying happens against the backing device "8:16". Thus both inodes get filed to a correct bdi list. Cc: stable@kernel.org Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Jan Kara 提交于
These devices don't do any writeback but their device inodes still can get dirty so mark bdi appropriately so that bdi code does the right thing and files inodes to lists of bdi carrying the device inodes. Cc: stable@kernel.org Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 20 9月, 2010 1 次提交
-
-
由 Jan Harkes 提交于
Coda's REQ_* defines were renamed to avoid clashes with the block layer (commit 4aeefdc6: "coda: fixup clash with block layer REQ_* defines"). However one was missed and response messages are no longer matched with requests and waiting threads are no longer woken up. This patch fixes this. Signed-off-by: NJan Harkes <jaharkes@cs.cmu.edu> [ Also fixed up whitespace while at it -Linus ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 9月, 2010 3 次提交
-
-
由 Wu Fengguang 提交于
mmotm/fs/ocfs2/cluster/tcp.c: In function ‘o2net_send_message_vec’: mmotm/fs/ocfs2/cluster/tcp.c:980:6: warning: ‘ret’ may be used uninitialized in this function It seems a real bug introduced by commit 9af0b38f (ocfs2/net: Use wait_event() in o2net_send_message_vec()). cc: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Sage Weil 提交于
We select CRYPTO_AES, but not CRYPTO. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
See if the i_data mapping has any pages to determine if the FILE_CACHE capability is currently in use, instead of assuming it is any time the rdcache_gen value is set (i.e., issued -> used). This allows the MDS RECALL_STATE process work for inodes that have cached pages. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 17 9月, 2010 3 次提交
-
-
由 Sage Weil 提交于
Sending multiple flushsnap messages is problematic because we ignore the response if the tid doesn't match, and the server may only respond to each one once. It's also a waste. So, skip cap_snaps that are already on the flushing list, unless the caller tells us to resend (because we are reconnecting). Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Steven Whitehouse 提交于
Looks like this crept in, in a recent update. Reported-by: NKrzysztof Urbaniak <urban@bash.org.pl> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Sage Weil 提交于
The cap_snap creation/queueing relies on both the current i_head_snapc _and_ the i_snap_realm pointers being correct, so that the new cap_snap can properly reference the old context and the new i_head_snapc can be updated to reference the new snaprealm's context. To fix this, we: - move inodes completely to the new (split) realm so that i_snap_realm is correct, and - generate the new snapc's _before_ queueing the cap_snaps in ceph_update_snap_trace(). Signed-off-by: NSage Weil <sage@newdream.net>
-
- 15 9月, 2010 4 次提交
-
-
由 Jeff Moyer 提交于
Tavis Ormandy pointed out that do_io_submit does not do proper bounds checking on the passed-in iocb array: if (unlikely(nr < 0)) return -EINVAL; if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(iocbpp))))) return -EFAULT; ^^^^^^^^^^^^^^^^^^ The attached patch checks for overflow, and if it is detected, the number of iocbs submitted is scaled down to a number that will fit in the long. This is an ok thing to do, as sys_io_submit is documented as returning the number of iocbs submitted, so callers should handle a return value of less than the 'nr' argument passed in. Reported-by: NTavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jeff Layton 提交于
cifs_get_smb_ses must be called on a server pointer on which it holds an active reference. It first does a search for an existing SMB session. If it finds one, it'll put the server reference and then try to ensure that the negprot is done, etc. If it encounters an error at that point then it'll return an error. There's a potential problem here though. When cifs_get_smb_ses returns an error, the caller will also put the TCP server reference leading to a double-put. Fix this by having cifs_get_smb_ses only put the server reference if it found an existing session that it could use and isn't returning an error. Cc: stable@kernel.org Reviewed-by: NSuresh Jayaraman <sjayaraman@suse.de> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Sage Weil 提交于
Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages or is still writing. We'll send the newer capsnaps only after the older ones complete. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
The 'follows' should match the seq for the snap context for the given snap cap, which is the context under which we have been dirtying and writing data and metadata. The snapshot that _contains_ those updates thus _follows_ that context's seq #. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 14 9月, 2010 1 次提交
-
-
由 Sage Weil 提交于
When adding the readdir results to the cache, ceph_set_dentry_offset was clobbered our just-set offset. This can cause the readdir result offsets to get out of sync with the server. Add an argument to the helper so that it does not. This bug was introduced by 1cd3935b. Signed-off-by: NSage Weil <sage@newdream.net>
-
- 13 9月, 2010 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
We should not use dotlversion for the dotu inode operations Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
由 Aneesh Kumar K.V 提交于
We should use the cached dentry operation only if caching mode is enabled Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-
由 jvrao 提交于
NULL fid should be handled in cases where we endup calling v9fs_dir_release() before even we instantiate the fid in filp. Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
-