- 11 10月, 2008 1 次提交
-
-
由 Theodore Ts'o 提交于
The ext4 filesystem is getting stable enough that it's time to drop the "dev" prefix. Also remove the requirement for the TEST_FILESYS flag. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 07 10月, 2008 2 次提交
-
-
由 Andi Kleen 提交于
While reading code I noticed that ext4_put_super() dirties the superblock bh twice. It is always done in ext4_commit_super() too. Remove the redundant dirty operation. Should be a nop semantically. Signed-off-by: NAndi Kleen <ak@linux.intel.com>
-
由 Eric Sandeen 提交于
ext4_ext_walk_space() was reinstated to be used for iterating over file extents with a callback; it is used by the ext4 fiemap implementation. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org
-
- 09 10月, 2008 1 次提交
-
-
由 Kalpak Shah 提交于
ext4_xattr_set_handle() eventually ends up calling ext4_mark_inode_dirty() which tries to expand the inode by shifting the EAs. This leads to the xattr_sem being downed again and leading to a deadlock. This patch makes sure that if ext4_xattr_set_handle() is in the call-chain, ext4_mark_inode_dirty() will not expand the inode. Signed-off-by: NKalpak Shah <kalpak.shah@sun.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 06 10月, 2008 1 次提交
-
-
由 Theodore Ts'o 提交于
This debugging markers are designed to debug problems such as the random filesystem latency problems reported by Arjan. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 10 10月, 2008 3 次提交
-
-
由 Frederic Bohe 提交于
This fixes a bug which caused on-line resizing of filesystems with a 1k blocksize to fail. The root cause of this bug was the fact that if an uninitalized bitmap block gets read in by userspace (which e2fsprogs does try to avoid, but can happen when the blocksize is less than the pagesize and an adjacent blocks is read into memory) ext4_read_block_bitmap() was erroneously depending on the buffer uptodate flag to decide whether it needed to initialize the bitmap block in memory --- i.e., to set the standard set of blocks in use by a block group (superblock, bitmaps, inode table, etc.). Essentially, ext4_read_block_bitmap() assumed it was the only routine that might try to read a block containing a block bitmap, which is simply not true. To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap() must always initialize uninitialized bitmap blocks. Once a block or inode is allocated out of that bitmap, it will be marked as initialized in the block group descriptor, so in general this won't result any extra unnecessary work. Signed-off-by: NFrederic Bohe <frederic.bohe@bull.net> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
With modern hard drives, reading 64k takes roughly the same time as reading a 4k block. So request readahead for adjacent inode table blocks to reduce the time it takes when iterating over directories (especially when doing this in htree sort order) in a cold cache case. With this patch, the time it takes to run "git status" on a kernel tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches" is reduced by 21%. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 24 9月, 2008 1 次提交
-
-
由 Theodore Ts'o 提交于
Previously mballoc created a separate set of functions for each proc file. This combines the tunables into a single set of functions which gets used for all of the per-superblock proc files, saving approximately 2k of compiled object code. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 23 9月, 2008 2 次提交
-
-
由 Theodore Ts'o 提交于
...and into the core setup/teardown code in fs/ext4/super.c so that other parts of ext4 can define tuning parameters. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
This is a port of a patch from Linus which fixes a 200+ byte stack usage problem in ext4_get_parent(). It's more efficient to pass down only the actual parts of the dentry that matter: the parent inode and the name, instead of allocating a struct dentry on the stack. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 07 10月, 2008 1 次提交
-
-
由 Theodore Ts'o 提交于
This fixes some very common warnings reported by kerneloops.org Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 14 9月, 2008 2 次提交
-
-
由 Eric Sandeen 提交于
lg_prealloc_list seems to cry out for a per-cpu data structure; on a large smp system I think this should be better. I've lightly tested this change on a 4-cpu system. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Acked-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Pick an ioctl number for EXT4_IOC_MIGRATE that won't conflict with other ext4 ioctl's. Since there haven't been any major userspace users of this ioctl, we can afford to change this now, to avoid potential problems later. Also, reorder the ioctl numbers in ext4.h to avoid this sort of mistake in the future. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 10月, 2008 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
This patch hooks the ext3 to ext4 migrate interface to EXT4_IOC_SETFLAGS ioctl. The userspace interface is via chattr +e. We only allow setting extent flags. Clearing extent flag (migrating from ext4 to ext3) is not supported. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 14 9月, 2008 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
The migrate ioctl writes to the filsystem, so we need to elevate the write count. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 08 9月, 2008 1 次提交
-
-
由 Li Zefan 提交于
If there group descriptors are corrupted we need unlock the block group lock before returning from the function; else we will oops when freeing a spinlock which is still being held. Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 17 9月, 2008 1 次提交
-
-
由 Theodore Ts'o 提交于
Calculate the journal device name once and stash it away in the journal_s structure. This avoids needing to call bdevname() everywhere and reduces stack usage by not needing to allocate an on-stack buffer. In addition, we eliminate the '/' that can appear in device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that can cause problems when creating proc directory names, and include the inode number to support ocfs2 which creates multiple journals with different inode numbers. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 14 9月, 2008 1 次提交
-
-
由 Alexey Dobriyan 提交于
ext4 creates per-suberblock directory in /proc/ext4/ . Name used as basis is taken from bdevname, which, surprise, can contain slash. However, proc while allowing to use proc_create("a/b", parent) form of PDE creation, assumes that parent/a was already created. bdevname in question is 'cciss/c0d0p9', directory is not created and all this stuff goes directly into /proc (which is real bug). Warning comes when _second_ partition is mounted. http://bugzilla.kernel.org/show_bug.cgi?id=11321Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 08 9月, 2008 1 次提交
-
-
由 Frederic Bohe 提交于
This fixes a bug which prevented the newly created inodes after a resize from being used on filesystems with flex_bg. Signed-off-by: NFrederic Bohe <frederic.bohe@bull.net> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 10月, 2008 1 次提交
-
-
由 Eric Sandeen 提交于
Note: some people thinks this represents a security bug, since it might make the system go away while it is printing a large number of console messages, especially if a serial console is involved. Hence, it has been assigned CVE-2008-3528, but it requires that the attacker either has physical access to your machine to insert a USB disk with a corrupted filesystem image (at which point why not just hit the power button), or is otherwise able to convince the system administrator to mount an arbitrary filesystem image (at which point why not just include a setuid shell or world-writable hard disk device file or some such). Me, I think they're just being silly. --tytso Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: Eugene Teo <eugeneteo@kernel.sg>
-
- 14 9月, 2008 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
With delayed allocation we use i_data_sem to update i_disksize. We need to update i_disksize only if the new size specified is greater than the current value and we need to make sure we don't race with other i_disksize update. With delayed allocation we will switch to the write_begin function for non-delayed allocation if we are low on free blocks. This means the write_begin function for non-delayed allocation also needs to use the same locking. We also need to check and update i_disksize even if the new size is less that inode.i_size because of delayed allocation. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Aneesh Kumar K.V 提交于
For blocksize < pagesize we need to remove blocks that got allocated in block_write_begin() if we fail with ENOSPC for later blocks. block_write_begin() internally does this if it allocated pages locally. This makes sure we don't have blocks outside inode.i_size during ENOSPC. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 9月, 2008 3 次提交
-
-
由 Aneesh Kumar K.V 提交于
When we truncate files, the meta-data blocks released are not reused untill we commit the truncate transaction. That means delayed get_block request will return ENOSPC even if we have free blocks left. Force a journal commit and retry block allocation if we get ENOSPC with free blocks left. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Aneesh Kumar K.V 提交于
Make sure we don't add the inode to the journal handle until after the block allocation, so that a journal commit will not include the inode in case of block allocation failure. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Aneesh Kumar K.V 提交于
We run into ENOSPC error on nonmballoc ext4, even when there is free blocks on the filesystem. The patch includes two changes: a) Set reservation to NULL if we trying to allocate near group_target_block from the goal group if the free block in the group is less than windows. This should give us a better chance to allocate near group_target_block. This also ensures that if we are not allocating near group_target_block then we don't trun off reservation. This should enable us to allocate with reservation from other groups that have large free blocks count. b) we don't need to check the window size if the block reservation is off. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 10月, 2008 2 次提交
-
-
由 Aneesh Kumar K.V 提交于
This patch converts some usage of ext4_fsblk_t to s64. This is needed so that some of the sign conversion works as expected in if loops. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Aneesh Kumar K.V 提交于
The delayed allocation code allocates blocks during writepages(), which can not handle block allocation failures. To deal with this, we switch away from delayed allocation mode when we are running low on free blocks. This also allows us to avoid needing to reserve a large number of meta-data blocks in case all of the requested blocks are discontiguous. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 10 10月, 2008 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
This patch adds dirty block accounting using percpu_counters. Delayed allocation block reservation is now done by updating dirty block counter. In a later patch we switch to non delalloc mode if the filesystem free blocks is greater than 150% of total filesystem dirty blocks Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao<cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 9月, 2008 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
During block reservation if we don't have enough blocks left, retry block reservation with smaller block counts. This makes sure we try fallocate and DIO with smaller request size and don't fail early. The delayed allocation reservation cannot try with smaller block count. So retry block reservation to handle temporary disk full conditions. Also print free blocks details if we fail block allocation during writepages. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 09 10月, 2008 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
With delayed allocation we need to make sure block are reserved before we attempt to allocate them. Otherwise we get block allocation failure (ENOSPC) during writepages which cannot be handled. This would mean silent data loss (We do a printk stating data will be lost). This patch updates the DIO and fallocate code path to do block reservation before block allocation. This is needed to make sure parallel DIO and fallocate request doesn't take block out of delayed reserve space. When free blocks count go below a threshold we switch to a slow patch which looks at other CPU's accumulated percpu counter values. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 20 8月, 2008 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
We are a bit agressive in invalidating all the pages. But it is ok because we really don't know why the block allocation failed and it is better to come of the writeback path so that user can look for more info. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
- 09 9月, 2008 3 次提交
-
-
由 Theodore Ts'o 提交于
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 10 10月, 2008 1 次提交
-
-
由 Mingming Cao 提交于
percpu_counter_sum_and_set() and percpu_counter_sum() is the same except the former updates the global counter after accounting. Since we are taking the fbc->lock to calculate the precise value of the counter in percpu_counter_sum() anyway, it should simply set fbc->count too, as the percpu_counter_sum_and_set() does. This patch merges these two interfaces into one. Signed-off-by: NMingming Cao <cmm@us.ibm.com> Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 03 8月, 2008 1 次提交
-
-
由 Eric Sandeen 提交于
The variables 'from' and 'to' are not used anywhere. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Acked-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 01 8月, 2008 1 次提交
-
-
由 Al Viro 提交于
* new helper: vfs_quota_on_path(); equivalent of vfs_quota_on() sans the pathname resolution. * callers of vfs_quota_on() that do their own pathname resolution and checks based on it are switched to vfs_quota_on_path(); that way we avoid the races. * reiserfs leaked dentry/vfsmount references on several failure exits. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 7月, 2008 1 次提交
-
-
由 Hisashi Hifumi 提交于
When we read some part of a file through pagecache, if there is a pagecache of corresponding index but this page is not uptodate, read IO is issued and this page will be uptodate. I think this is good for pagesize == blocksize environment but there is room for improvement on pagesize != blocksize environment. Because in this case a page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate. So I suggest that when all buffers which correspond to a part of a file that we want to read are uptodate, use this pagecache and copy data from this pagecache to user buffer even if a page is not uptodate. This can reduce read IO and improve system throughput. I wrote a benchmark program and got result number with this program. This benchmark do: 1: mount and open a test file. 2: create a 512MB file. 3: close a file and umount. 4: mount and again open a test file. 5: pwrite randomly 300000 times on a test file. offset is aligned by IO size(1024bytes). 6: measure time of preading randomly 100000 times on a test file. The result was: 2.6.26 330 sec 2.6.26-patched 226 sec Arch:i386 Filesystem:ext3 Blocksize:1024 bytes Memory: 1GB On ext3/4, a file is written through buffer/block. So random read/write mixed workloads or random read after random write workloads are optimized with this patch under pagesize != blocksize environment. This test result showed this. The benchmark program is as follows: #include <stdio.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> #include <time.h> #include <stdlib.h> #include <string.h> #include <sys/mount.h> #define LEN 1024 #define LOOP 1024*512 /* 512MB */ main(void) { unsigned long i, offset, filesize; int fd; char buf[LEN]; time_t t1, t2; if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } memset(buf, 0, LEN); fd = open("/root/test1/testfile", O_CREAT|O_RDWR|O_TRUNC); if (fd < 0) { perror("cannot open file\n"); exit(1); } for (i = 0; i < LOOP; i++) write(fd, buf, LEN); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } if (mount("/dev/sda1", "/root/test1/", "ext3", 0, 0) < 0) { perror("cannot mount\n"); exit(1); } fd = open("/root/test1/testfile", O_RDWR); if (fd < 0) { perror("cannot open file\n"); exit(1); } filesize = LEN * LOOP; for (i = 0; i < 300000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pwrite(fd, buf, LEN, offset); } printf("start test\n"); time(&t1); for (i = 0; i < 100000; i++){ offset = (random() % filesize) & (~(LEN - 1)); pread(fd, buf, LEN, offset); } time(&t2); printf("%ld sec\n", t2-t1); close(fd); if (umount("/root/test1/") < 0) { perror("cannot umount\n"); exit(1); } } Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Christoph Hellwig <hch@infradead.org> Cc: Jan Kara <jack@ucw.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 8月, 2008 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
For small file block allocations, mballoc uses per cpu prealloc space. Use goal block when searching for the right prealloc space. Also make sure ext4_da_writepages tries to write all the pages for small files in single attempt Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-