- 12 Feb, 2011 1 commit
-
-
Committed by Eric Sandeen
ext4 has a data corruption case when doing non-block-aligned asynchronous direct IO into a sparse file, as demonstrated by xfstest 240.

The root cause is that while ext4 preallocates space in the hole, mappings of that space still look "new" and dio_zero_block() will zero out the unwritten portions. When more than one AIO thread is going, they both find this "new" block and race to zero out their portion; this is uncoordinated and causes data corruption.

Dave Chinner fixed this for xfs by simply serializing all unaligned asynchronous direct IO. I've done the same here. The difference is that we only wait on conversions, not all IO. This is a very big hammer, and I'm not very pleased with stuffing this into ext4_file_write(). But since ext4 is DIO_LOCKING, we need to serialize it at this high level.

I tried to move this into ext4_ext_direct_IO, but by then we have the i_mutex already, and we will wait on the work queue to do conversions, which must also take the i_mutex. So that won't work.

This was originally exposed by qemu-kvm installing to a raw disk image with a normal sector-63 alignment. I've tested a backport of this patch with qemu, and it does avoid the corruption. It is also quite a lot slower (14 min for package installs, vs. 8 min for well-aligned) but I'll take slow correctness over fast corruption any day.

Mingming suggested that we can track outstanding conversions, and wait on those so that non-sparse files won't be affected, and I've implemented that here; unaligned AIO to nonsparse files won't take a perf hit.

[tytso@mit.edu: Keep the mutex as a hashed array instead of bloating the ext4 inode]
[tytso@mit.edu: Fix up namespace issues so that global variables are protected with an "ext4_" prefix.]

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
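As a rough illustration of the two ideas in this message, the sketch below shows how a write might be classified as unaligned and how an inode could be mapped to one slot of a fixed mutex table instead of carrying its own mutex. The names `is_unaligned_io`, `aio_mutex_slot` and `AIO_MUTEX_SLOTS` are hypothetical and are not taken from the patch; the hash is deliberately simplistic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table size, chosen only for illustration. */
#define AIO_MUTEX_SLOTS 17

/* True when a write of `len` bytes at `pos` does not both start and end
 * on a block boundary. `block_size` must be a power of two. */
static bool is_unaligned_io(uint64_t pos, size_t len, uint64_t block_size)
{
    uint64_t mask = block_size - 1;

    return (pos & mask) != 0 || ((pos + len) & mask) != 0;
}

/* Map an inode address to one slot of a shared mutex table, so that no
 * per-inode mutex is needed. Unrelated inodes may share a slot, which
 * costs some extra serialization but never affects correctness. */
static size_t aio_mutex_slot(const void *inode)
{
    return (size_t)(((uintptr_t)inode >> 4) % AIO_MUTEX_SLOTS);
}
```

With 4096-byte blocks, a write starting at sector 63 (byte offset 63 * 512) is reported as unaligned, which matches the qemu-kvm case described above.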
-
- 17 Jan, 2011 1 commit
-
-
Committed by Christoph Hellwig
Currently all filesystems except XFS implement fallocate asynchronously, while XFS forced a commit. Both of these are suboptimal: in the case of O_SYNC I/O we really want our allocation on disk, especially for the !KEEP_SIZE case where we actually grow the file with user-visible zeroes. On the other hand, always committing the transaction is a bad idea for fast-path uses of fallocate, for example in recent Samba versions.

Given that block allocation is a data plane operation anyway, change it from an inode operation to a file operation so that we have the file structure available that lets us check for O_SYNC.

This also includes moving the code around for a few of the filesystems, and removing the already unneeded S_ISDIR checks, given that we only wire up fallocate for regular files.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 11 Jan, 2011 1 commit
-
-
Committed by Theodore Ts'o
Replace the jbd2_inode structure (which is 48 bytes) with a pointer and only allocate the jbd2_inode when it is needed --- that is, when the file system has a journal present and the inode has been opened for writing. This allows us to further slim down the ext4_inode_info structure.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
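The lazy-allocation pattern described here can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `struct journal_inode`, `struct inode_info`, `inode_open` and `inode_release` are hypothetical stand-ins, not the kernel's types or functions.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the journal's per-inode structure. */
struct journal_inode {
    long pending;
};

/* The in-memory inode carries only a pointer, NULL until first needed. */
struct inode_info {
    struct journal_inode *jinode;
};

/* Allocate the side structure on the first open for writing, and only
 * when a journal is present. Returns 0 on success (including the case
 * where nothing had to be allocated) and -1 on allocation failure. */
static int inode_open(struct inode_info *ii, bool has_journal, bool for_write)
{
    if (!has_journal || !for_write || ii->jinode != NULL)
        return 0;
    ii->jinode = calloc(1, sizeof(*ii->jinode));
    return ii->jinode ? 0 : -1;
}

/* Free the side structure when the inode is torn down. */
static void inode_release(struct inode_info *ii)
{
    free(ii->jinode);
    ii->jinode = NULL;
}
```

Inodes that are only ever read, or that live on a filesystem without a journal, pay for one pointer rather than the full structure.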
-
- 28 Oct, 2010 1 commit
-
-
Committed by Toshiyuki Okajima
The llseek system call should return EINVAL if passed a seek offset which results in a write error. What this maximum offset should be depends on whether or not the huge_file file system feature is set, and whether or not the file is extent based.

If the file does not have the "EXT4_EXTENTS_FL" flag, the maximum size which can be written (write system call) is different from the maximum size which can be sought (lseek system call).

For example, the following 2 cases demonstrate the differences between the maximum size which can be written and the seek offset allowed by the llseek system call:

#1: mkfs.ext3 <dev>; mount -t ext4 <dev>
#2: mkfs.ext3 <dev>; tune2fs -Oextent,huge_file <dev>; mount -t ext4 <dev>

Table. The maximum file size which we can write or seek to, for each filesystem feature tuning and file flag setting

+============+===============================+===============================+
| \ File flag|                               |                               |
|      \     | !EXT4_EXTENTS_FL              | EXT4_EXTENTS_FL               |
|case       \|                               |                               |
+------------+-------------------------------+-------------------------------+
| #1         | write:  2194719883264         | write: --------------         |
|            | seek:   2199023251456         | seek:  --------------         |
+------------+-------------------------------+-------------------------------+
| #2         | write:  4402345721856         | write: 17592186044415         |
|            | seek:  17592186044415         | seek:  17592186044415         |
+------------+-------------------------------+-------------------------------+

The differences exist because ext4 has 2 maxbytes values: sb->s_maxbytes (= extent-mapped maxbytes) and EXT4_SB(sb)->s_bitmap_maxbytes (= block-mapped maxbytes). However, generic_file_llseek uses only the extent-mapped maxbytes. (llseek of ext4_file_operations is generic_file_llseek, which uses sb->s_maxbytes.) Therefore we create an ext4 llseek function which uses both maxbytes values.

The new function is derived from generic_file_llseek(). If the file flag "EXT4_EXTENTS_FL" is not set, the function uses EXT4_SB(inode->i_sb)->s_bitmap_maxbytes in place of inode->i_sb->s_maxbytes.

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
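The limit selection described above can be sketched as a pair of small functions. This is a minimal userspace illustration, not the patch itself: `seek_limit` and `check_seek` are hypothetical names, and the two limits are passed in as plain parameters rather than read from a superblock.

```c
#include <errno.h>
#include <stdbool.h>

/* Pick the seek limit for a file: the extent-mapped limit when the file
 * is extent based, the block-mapped (bitmap) limit otherwise. */
static long long seek_limit(bool has_extents, long long maxbytes,
                            long long bitmap_maxbytes)
{
    return has_extents ? maxbytes : bitmap_maxbytes;
}

/* Return the offset unchanged when it is acceptable, -EINVAL when it is
 * negative or lies beyond the limit. */
static long long check_seek(long long offset, long long limit)
{
    if (offset < 0 || offset > limit)
        return -EINVAL;
    return offset;
}
```

Using the case #2 numbers from the table, a block-mapped file would accept a seek to 4402345721856 but reject 4402345721857, while an extent-based file would accept both.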
-
- 27 Jul, 2010 1 commit
-
-
Committed by Toshiyuki Okajima
By running the following reproducer, we can confirm that the write system call returns 0 when it should return the error EFBIG.

    #!/bin/sh
    /bin/dd if=/dev/zero of=./img bs=1k count=1 seek=1024k > /dev/null 2>&1
    /sbin/mkfs.ext3 -Fq ./img
    /bin/mount -o loop -t ext4 ./img /mnt
    /bin/touch /mnt/file
    strace /bin/dd if=/dev/zero of=/mnt/file conv=notrunc bs=1k count=1 seek=$((2194719883264/1024)) 2>&1 | /bin/egrep "write.* 1024\) = "
    /bin/umount /mnt
    exit

Signed-off-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: Eric Sandeen <sandeen@redhat.com>
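The message states the expected behaviour (EFBIG) but not the shape of the fix. One plausible form of such a check, written as a hypothetical userspace helper `clamp_write` and not taken from the patch, is:

```c
#include <errno.h>

/* Decide how many bytes of a write may proceed, given the largest file
 * size the mapping scheme supports. Returns the (possibly shortened)
 * length, or -EFBIG when a non-empty write starts at or past the limit. */
static long long clamp_write(long long pos, long long len, long long max_bytes)
{
    if (len <= 0)
        return 0;
    if (pos >= max_bytes)
        return -EFBIG;
    if (pos + len > max_bytes)
        return max_bytes - pos;   /* shorten to end exactly at the limit */
    return len;
}
```

Under this sketch, the reproducer's 1 KiB write at offset 2194719883264 against a limit of 2194719883264 would be rejected with EFBIG instead of returning 0.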
-
- 12 Jun, 2010 1 commit
-
-
Committed by Theodore Ts'o
We don't need to set s_dirt in most of the ext4 code when journaling is enabled. In ext3/4 some of the summary statistics for # of free inodes, blocks, and directories are calculated from the per-block group statistics when the file system is mounted or unmounted. As a result the superblock doesn't have to be updated, either via the journal or by setting s_dirt. There are a few exceptions, most notably when resizing the file system, where the superblock needs to be modified --- and in that case it should be done as a journalled operation if possible, and s_dirt set only in no-journal mode.

This patch will optimize out some unneeded disk writes when using ext4 with a journal.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
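The reason the superblock totals need not be kept current is that they can be rebuilt from the per-group counters. A minimal sketch of that recomputation follows; `struct group_stats`, `struct fs_totals` and `sum_group_stats` are hypothetical names, not ext4's on-disk or in-memory structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block-group counters. */
struct group_stats {
    uint32_t free_blocks;
    uint32_t free_inodes;
    uint32_t used_dirs;
};

/* Filesystem-wide totals derived from the groups. */
struct fs_totals {
    uint64_t free_blocks;
    uint64_t free_inodes;
    uint64_t used_dirs;
};

/* Recompute the summary statistics by summing every group, as could be
 * done once at mount or unmount time. */
static struct fs_totals sum_group_stats(const struct group_stats *groups,
                                        size_t count)
{
    struct fs_totals totals = {0, 0, 0};

    for (size_t i = 0; i < count; i++) {
        totals.free_blocks += groups[i].free_blocks;
        totals.free_inodes += groups[i].free_inodes;
        totals.used_dirs += groups[i].used_dirs;
    }
    return totals;
}
```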
-
- 17 May, 2010 1 commit
-
-
Committed by Dmitry Monakhov
At several places we modify EXT4_I(inode)->i_flags without holding i_mutex (ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_flags. So convert handling of i_flags to use bitops which are atomic.

https://bugzilla.kernel.org/show_bug.cgi?id=15792

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
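The lost-update problem comes from plain read-modify-write sequences on a shared flags word. The userspace sketch below uses C11 `<stdatomic.h>` as a stand-in for the kernel's atomic bitops; the helper names `flag_set`, `flag_clear` and `flag_test` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set one bit. Concurrent callers touching other bits of the
 * same word cannot overwrite each other's updates. */
static void flag_set(atomic_ulong *flags, unsigned bit)
{
    atomic_fetch_or(flags, 1UL << bit);
}

/* Atomically clear one bit. */
static void flag_clear(atomic_ulong *flags, unsigned bit)
{
    atomic_fetch_and(flags, ~(1UL << bit));
}

/* Read one bit. */
static bool flag_test(atomic_ulong *flags, unsigned bit)
{
    return ((atomic_load(flags) >> bit) & 1UL) != 0;
}
```

A non-atomic `flags |= mask` compiles to a load, an OR and a store; two threads interleaving those steps can drop one of the updates, which is the race the commit describes.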
-
- 05 Mar, 2010 2 commits
-
-
Committed by Christoph Hellwig
Get rid of the initialize dquot operation: it is now always called from the filesystem, and if a filesystem really needs its own (which none currently does) it can just call into its own routine directly.

Rename the now static low-level dquot_initialize helper to __dquot_initialize and vfs_dq_init to dquot_initialize to have a consistent namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
-
Committed by Christoph Hellwig
Currently various places in the VFS call vfs_dq_init directly. This means we tie the quota code into the VFS. Get rid of that and make the filesystem responsible for the initialization. For most metadata operations this is a straightforward move into the methods, but for truncate and open it's a bit more complicated.

For truncate we currently only call vfs_dq_init for the sys_truncate case, because open already takes care of it for ftruncate and open(O_TRUNC); the new code causes an additional vfs_dq_init for those, which is harmless.

For open the initialization is moved from do_filp_open into the open method, which means it happens slightly earlier now, and only for regular files. The latter is fine because we don't need to initialize it for operations on special files, and we already do it as part of the namespace operations for directories.

Add a dquot_file_open helper that filesystems that support generic quotas can use to fill in ->open.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
-
- 04 Mar, 2010 1 commit
-
-
Committed by Al Viro
The path to mnt/mnt->mnt_root is no worse than that to mnt->mnt_parent/mnt->mnt_mountpoint *and* needs no pinning the sucker down (mnt is not going away and mnt->mnt_root won't change).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 25 Jan, 2010 1 commit
-
-
Committed by Theodore Ts'o
At several places we modify EXT4_I(inode)->i_state without holding i_mutex (ext4_release_file, ext4_bmap, ext4_journalled_writepage, ext4_do_update_inode, ...). These modifications are racy and we can lose updates to i_state. So convert handling of i_state to use bitops which are atomic.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 28 Sep, 2009 1 commit
-
-
Committed by Alexey Dobriyan
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 14 Sep, 2009 1 commit
-
-
Committed by Jan Kara
The syncing is now properly handled by generic_file_aio_write() so no special ext4 code is needed.

CC: linux-ext4@vger.kernel.org
CC: tytso@mit.edu
Signed-off-by: Jan Kara <jack@suse.cz>
-
- 09 Sep, 2009 1 commit
-
-
Committed by Linus Torvalds
Don't implement per-filesystem 'extX_permission()' functions that have to be called for every path component operation, and instead just expose the actual ACL checking so that the VFS layer can now do it for us.

Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Jun, 2009 1 commit
-
-
Committed by Theodore Ts'o
This field can be very helpful when a system administrator is trying to sort through large numbers of block devices or filesystem images. What is stored in this field can be ambiguous if multiple filesystem namespaces are in play; what we store in practice is the mountpoint interpreted by the process's namespace which first opens a file in the filesystem.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 28 Mar, 2009 1 commit
-
-
Committed by Aneesh Kumar K.V
With delayed allocation we should not (and cannot) discard inode prealloc space during file close, because we would still have dirty pages for which we haven't allocated blocks yet. With this fix, after each get_blocks request we check whether we have zero reserved blocks; if so, and we don't have any writers on the file, we discard the inode prealloc space.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 24 Feb, 2009 1 commit
-
-
Committed by Theodore Ts'o
When closing a file that had been previously truncated, force any delayed-allocation blocks to be allocated so that if the filesystem is mounted with data=ordered, the data blocks will be pushed out to disk along with the journal commit. Many application programs expect this, so we do this to avoid zero-length files if the system crashes unexpectedly.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 23 Nov, 2008 1 commit
-
-
Committed by Aneesh Kumar K.V
* Change EXT4_HAS_*_FEATURE to return a boolean
* Add a function prototype for ext4_fiemap() in ext4.h
* Make ext4_ext_fiemap_cb() and ext4_xattr_fiemap() be static functions
* Add lock annotations to mb_free_blocks()

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 11 Oct, 2008 1 commit
-
-
Committed by Theodore Ts'o
The ext4 filesystem is getting stable enough that it's time to drop the "dev" prefix. Also remove the requirement for the TEST_FILESYS flag.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 07 Oct, 2008 1 commit
-
-
Committed by Eric Sandeen
ext4_ext_walk_space() was reinstated to be used for iterating over file extents with a callback; it is used by the ext4 fiemap implementation.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
-
- 10 Oct, 2008 1 commit
-
-
Committed by Theodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 09 Sep, 2008 1 commit
-
-
Committed by Theodore Ts'o
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 12 Jul, 2008 2 commits
-
-
Committed by Mingming Cao
Right now i_blocks is not getting updated until the blocks are actually allocated on disk. This means that with delayed allocation, right after files are copied, "ls -sF" shows the file as taking 0 blocks on disk. "du" also shows the files taking zero space, which is highly confusing to the user.

Since delayed allocation already keeps track of the per-inode total number of blocks that are subject to delayed allocation, this patch fixes this by using that count to adjust the value returned by stat(2). When real block allocation is done, i_blocks will get updated. Since the reserved blocks for delayed allocation will be decreased at the same time, this keeps the value returned by stat(2) consistent.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
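The adjustment amounts to a unit conversion: stat(2) reports st_blocks in 512-byte units, while the delayed-allocation reservation is counted in filesystem blocks. A minimal sketch, with the hypothetical helper `stat_blocks` and all inputs passed as plain parameters:

```c
#include <stdint.h>

/* Compute the block count to report through stat(2).
 *   allocated_512:      blocks already allocated, in 512-byte units
 *   reserved_fs_blocks: delayed-allocation reservation, in fs blocks
 *   blocksize_bits:     log2 of the filesystem block size (12 for 4 KiB)
 */
static uint64_t stat_blocks(uint64_t allocated_512,
                            uint64_t reserved_fs_blocks,
                            unsigned blocksize_bits)
{
    /* Convert the reservation to bytes, then to 512-byte units. */
    return allocated_512 + ((reserved_fs_blocks << blocksize_bits) >> 9);
}
```

For example, 10 reserved 4 KiB blocks with nothing allocated yet would be reported as 80 blocks rather than 0.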
-
Committed by Aneesh Kumar K.V
We would like to get notified when we are doing a write on an mmap section. This is needed with respect to preallocated areas: we split the preallocated area into an initialized extent and an uninitialized extent in the callback. This lets us handle ENOSPC better. Otherwise we get ENOSPC in writepage, and that would result in data loss. The changes are also needed to handle ENOSPC when writing to an mmap section of files with holes.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 30 Apr, 2008 2 commits
-
-
Committed by Christoph Hellwig
Move ext4 headers out of include/linux. This is just the trivial move; there are some more things that could be done later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Andi Kleen
I checked ext4_ioctl and it looked largely safe to use without the BKL. So convert it over to unlocked_ioctl.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 29 Jan, 2008 2 commits
-
-
Committed by Aneesh Kumar K.V
We are currently taking the truncate_mutex for every read. This would have a performance impact on large CPU configurations. Convert the lock to a read-write semaphore and take the read lock when we are trying to read the file.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
Committed by Eric Sandeen
Calculate & store the max offset for bitmapped files, and catch too-large seeks, truncates, and writes in ext4, shortening or rejecting as appropriate.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
-
- 18 Jul, 2007 1 commit
-
-
Committed by Amit Arora
This patch implements the ->fallocate() inode operation in ext4. With this patch users of ext4 file systems will be able to use the fallocate() system call for persistent preallocation. The current implementation only supports preallocation for regular files (directories are not supported as of this date) with extent maps. This patch does not currently support block-mapped files. Only the FALLOC_ALLOCATE and FALLOC_RESV_SPACE modes are supported as of now.

Signed-off-by: Amit Arora <aarora@in.ibm.com>
-
- 10 Jul, 2007 1 commit
-
-
Committed by Jens Axboe
They can use generic_file_splice_read() instead. Since sys_sendfile() now prefers that, there should be no change in behaviour.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
-
- 13 Feb, 2007 1 commit
-
-
Committed by Arjan van de Ven
Many struct inode_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
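The same pattern can be shown with a small operations table in plain C. The type `struct inode_ops` and the names `fixed_size` and `demo_inode_ops` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical operations table: a struct of function pointers. */
struct inode_ops {
    long (*size)(const void *obj);
};

/* One implementation of the operation; it ignores its argument. */
static long fixed_size(const void *obj)
{
    (void)obj;
    return 4096;
}

/* Declared const: the table can be placed in read-only data, and any
 * assignment such as `demo_inode_ops.size = NULL;` is rejected by the
 * compiler instead of silently corrupting a shared table. */
static const struct inode_ops demo_inode_ops = {
    .size = fixed_size,
};
```

Callers still invoke the operation through the table as usual, for example `demo_inode_ops.size(NULL)`.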
-
- 09 Dec, 2006 1 commit
-
-
Committed by Josef "Jeff" Sipek
Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in the ext4 filesystem.

Signed-off-by: Josef "Jeff" Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- 12 Oct, 2006 3 commits
-
-
Committed by Mingming Cao
Reworked from a patch by Mingming Cao and Randy Dunlap

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Mingming Cao
Mingming Cao originally did this work, and Shaggy reproduced it using some scripts from her.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Dave Kleikamp
Start of the ext4 patch series. See Documentation/filesystems/ext4.txt for details.

This is a simple copy of the files in fs/ext3 to fs/ext4 and /usr/include/linux/ext3* to /usr/include/ex4*

Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- 01 Oct, 2006 3 commits
-
-
Committed by Badari Pulavarty
This patch removes the readv() and writev() methods and replaces them with the aio_read()/aio_write() methods.

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by Badari Pulavarty
This patch vectorizes the aio_read() and aio_write() methods to prepare for collapsing all aio & vectored operations into one interface, which is aio_read()/aio_write().

Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Michael Holzheu <HOLZHEU@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
Committed by David Howells
Move the Ext3 device ioctl compat stuff from fs/compat_ioctl.c to the Ext3 driver so that the Ext3 header file doesn't need to be included.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 27 Sep, 2006 1 commit
-
-
Committed by Mingming Cao
Remove whitespace from ext3 and jbd, before we clone ext4.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-
- 31 Mar, 2006 1 commit
-
-
Committed by Jens Axboe
This adds support for the sys_splice system call. Using a pipe as a transport, it can connect to files or sockets (latter as output only).

From the splice.c comments:

   "splice": joining two ropes together by interweaving their strands.

   This is the "extended pipe" functionality, where a pipe is used as an arbitrary in-memory buffer. Think of a pipe as a small kernel buffer that you can use to transfer data from one end to the other.

   The traditional unix read/write is extended with a "splice()" operation that transfers data buffers to or from a pipe buffer.

   Named by Larry McVoy, original implementation from Linus, extended by Jens to support splicing to files and fixing the initial implementation bugs.

Signed-off-by: Jens Axboe <axboe@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
-