- 24 2月, 2013 6 次提交
-
-
由 Zhang Yanfei 提交于
The three variables are calculated from nr_free_buffer_pages so change their types to unsigned long in case of overflow. Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhang Yanfei 提交于
max_buffer_heads is calculated from nr_free_buffer_pages(), so change its type to unsigned long in case of overflow. Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Shaohua Li 提交于
When I use several fast SSD to do swap, swapper_space.tree_lock is heavily contended. This makes each swap partition have one address_space to reduce the lock contention. There is an array of address_space for swap. The swap entry type is the index to the array. In my test with 3 SSD, this increases the swapout throughput 20%. [akpm@linux-foundation.org: revert unneeded change to __add_to_swap_cache] Signed-off-by: NShaohua Li <shli@fusionio.com> Cc: Hugh Dickins <hughd@google.com> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Xishi Qiu 提交于
Since MCE is an x86 concept, and this code is in mm/, it would be better to use the name num_poisoned_pages instead of mce_bad_pages. [akpm@linux-foundation.org: fix mm/sparse.c] Signed-off-by: NXishi Qiu <qiuxishi@huawei.com> Signed-off-by: NJiang Liu <jiang.liu@huawei.com> Suggested-by: NBorislav Petkov <bp@alien8.de> Reviewed-by: NWanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE multiple, however there was no equivalent code in mm_populate(), which caused issues. This could be fixed by introduced the same rounding in mm_populate(), however I think it's preferable to make do_mmap_pgoff() return populate as a size rather than as a boolean, so we don't have to duplicate the size rounding logic in mm_populate(). Signed-off-by: NMichel Lespinasse <walken@google.com> Acked-by: NRik van Riel <riel@redhat.com> Tested-by: NAndy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michel Lespinasse 提交于
When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or with MCL_FUTURE in effect), we want to populate the pages within the newly created vmas. This may take a while as we may have to read pages from disk, so ideally we want to do this outside of the write-locked mmap_sem region. This change introduces mm_populate(), which is used to defer populating such mappings until after the mmap_sem write lock has been released. This is implemented as a generalization of the former do_mlock_pages(), which accomplished the same task but was using during mlock() / mlockall(). Signed-off-by: NMichel Lespinasse <walken@google.com> Reported-by: NAndy Lutomirski <luto@amacapital.net> Acked-by: NRik van Riel <riel@redhat.com> Tested-by: NAndy Lutomirski <luto@amacapital.net> Cc: Greg Ungerer <gregungerer@westnet.com.au> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 2月, 2013 3 次提交
-
-
由 Al Viro 提交于
It's safe only under namespace_sem or vfsmount_lock; all places in fs/namespace.c that want mnt->mnt_ns->user_ns actually want to use current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in there). Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Lukas Czerner 提交于
ext4_has_free_clusters() should tell us whether there is enough free clusters to allocate, however number of free clusters in the file system is converted to blocks using EXT4_C2B() which is not only wrong use of the macro (we should have used EXT4_NUM_B2C) but it's also completely wrong concept since everything else is in cluster units. Moreover when calculating number of root clusters we should be using macro EXT4_NUM_B2C() instead of EXT4_B2C() otherwise the result might be off by one. However r_blocks_count should always be a multiple of the cluster ratio so doing a plain bit shift should be enough here. We avoid using EXT4_B2C() because it's confusing. As a result of the first problem number of free clusters is much bigger than it should have been and ext4_has_free_clusters() would return 1 even if there is really not enough free clusters available. Fix this by removing the EXT4_C2B() conversion of free clusters and using bit shift when calculating number of root clusters. This bug affects number of xfstests tests covering file system ENOSPC situation handling. With this patch most of the ENOSPC problems with bigalloc file system disappear, especially the errors caused by delayed allocation not having enough space when the actual allocation is finally requested. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
-
由 Eryu Guan 提交于
len is 0 means no extent needs to be removed, so return immediately. Otherwise it could trigger the following BUG_ON() in ext4_es_remove_extent() end = lblk + len - 1; BUG_ON(end < lblk); This could be reproduced by a simple truncate(1) command by an unprivileged user truncate -s $(($((2**32 - 1)) * 4096)) /mnt/ext4/testfile The same is true for __es_insert_extent(). Patched kernel passed xfstests regression test. Signed-off-by: NEryu Guan <guaneryu@gmail.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
-
- 22 2月, 2013 11 次提交
-
-
由 Zhang Yanfei 提交于
In fill_elf_header(), elf->e_ident[EI_OSABI] is always set to ELF_OSABI, so remove the unused argument 'osabi'. Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
When stable pages are required, we have to wait if the page is just going to disk and we want to modify it. Add proper callback to ubifs_vm_page_mkwrite(). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
When stable pages are required, we have to wait if the page is just going to disk and we want to modify it. Add proper callback to ocfs2_grab_pages_for_write(). Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Darrick J. Wong 提交于
This provides a band-aid to provide stable page writes on jbd without needing to backport the fixed locking and page writeback bit handling schemes of jbd2. The band-aid works by using bounce buffers to snapshot page contents instead of waiting. For those wondering about the ext3 bandage -- fixing the jbd locking (which was done as part of ext4dev years ago) is a lot of surgery, and setting PG_writeback on data pages when we actually hold the page lock dropped ext3 performance by nearly an order of magnitude. If we're going to migrate iscsi and raid to use stable page writes, the complaints about high latency will likely return. We might as well centralize their page snapshotting thing to one place. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Tested-by: NAndy Lutomirski <luto@amacapital.net> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Artem Bityutskiy <dedekind1@gmail.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Darrick J. Wong 提交于
Fix up the ->page_mkwrite handler to provide stable page writes if necessary. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Jan Kara <jack@suse.cz> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Darrick J. Wong 提交于
Create a helper function to check if a backing device requires stable page writes and, if so, performs the necessary wait. Then, make it so that all points in the memory manager that handle making pages writable use the helper function. This should provide stable page write support to most filesystems, while eliminating unnecessary waiting for devices that don't require the feature. Before this patchset, all filesystems would block, regardless of whether or not it was necessary. ext3 would wait, but still generate occasional checksum errors. The network filesystems were left to do their own thing, so they'd wait too. After this patchset, all the disk filesystems except ext3 and btrfs will wait only if the hardware requires it. ext3 (if necessary) snapshots pages instead of blocking, and btrfs provides its own bdi so the mm will never wait. Network filesystems haven't been touched, so either they provide their own stable page guarantees or they don't block at all. The blocking behavior is back to what it was before 3.0 if you don't have a disk requiring stable page writes. Here's the result of using dbench to test latency on ext2: 3.8.0-rc3: Operation Count AvgLat MaxLat ---------------------------------------- WriteX 109347 0.028 59.817 ReadX 347180 0.004 3.391 Flush 15514 29.828 287.283 Throughput 57.429 MB/sec 4 clients 4 procs max_latency=287.290 ms 3.8.0-rc3 + patches: WriteX 105556 0.029 4.273 ReadX 335004 0.005 4.112 Flush 14982 30.540 298.634 Throughput 55.4496 MB/sec 4 clients 4 procs max_latency=298.650 ms As you can see, the maximum write latency drops considerably with this patch enabled. The other filesystems (ext3/ext4/xfs/btrfs) behave similarly, but see the cover letter for those results. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Acked-by: NSteven Whitehouse <swhiteho@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Junxiao Bi 提交于
If lockres refresh failed, the super lock will never be released which will cause some processes on other cluster nodes hung forever. Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tim Gardner 提交于
smatch analysis indicates a number of redundant NULL checks before calling kfree(), eg: fs/ocfs2/alloc.c:6138 ocfs2_begin_truncate_log_recovery() info: redundant null check on *tl_copy calling kfree() fs/ocfs2/alloc.c:6755 ocfs2_zero_range_for_truncate() info: redundant null check on pages calling kfree() etc.... [akpm@linux-foundation.org: revert dubious change in ocfs2_begin_truncate_log_recovery()] Signed-off-by: NTim Gardner <tim.gardner@canonical.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: NJoel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yongjun 提交于
The dereference should be moved below the NULL test. spatch with a semantic match is used to found this. (http://coccinelle.lip6.fr/) Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 MITSUNARI Shigeo 提交于
We found that bdev->bd_invalidated was left set once revalidate_disk() is called, which results in page cache flush every time that device is open. Specifically, we found this problem in MD block device. Once we resize a MD device, mdadm --monitor periodically flush all page cache for that device every 60 or 1000 seconds when it opens the device. This bug lies since at least 3.2.0 till the latest kernel(3.6.2). Patch is attached. The following steps will reproduce the problem. 1. prepair a block device (eg /dev/sdb). 2. create two partitions: sudo parted /dev/sdb mklabel gpt mkpart primary 0% 50% mkpart primary 50% 100% 3. create a md device. sudo mdadm -C /dev/md/hoge -l 1 -n 2 -e 1.2 --assume-clean --auto=md --symlink=no /dev/sdb1 /dev/sdb2 4. create file system and mount it sudo mkfs.ext3 /dev/md/hoge sudo mkdir /mnt/test sudo mount /dev/md/hoge /mnt/test 5. try to resize the device sudo mdadm -G /dev/md/hoge --size=max 6. create a file to fill file cache. sudo dd if=/dev/urandom of=/mnt/test/data bs=1M count=10 and verify the current status of file by free command. 7. mdadm monitor will open the md device every 1000 seconds and you will find all file cache on the device are cleared. The timing can be reduced by the following steps. a) kill mdadm and restart it with --delay option /sbin/mdadm --monitor --delay=30 --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog or open the md device directly. sudo dd if=/dev/md/hoge of=/dev/null bs=4096 count=1 Signed-off-by: NMITSUNARI Shigeo <herumi@nifty.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jim Somerville 提交于
Running the command: inotifywait -e unmount /mnt/disk immediately aborts with a -EINVAL return code. This is however a valid parameter. This abort occurs only if unmount is the sole event parameter. If other event parameters are supplied, then the unmount event wait will work. The problem was introduced by commit 44b350fc ("inotify: Fix mask checks"). In that commit, it states: The mask checks in inotify_update_existing_watch() and inotify_new_watch() are useless because inotify_arg_to_mask() sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway. But instead of removing the useless checks, it did this: mask = inotify_arg_to_mask(arg); - if (unlikely(!mask)) + if (unlikely(!(mask & IN_ALL_EVENTS))) return -EINVAL; The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS. So the check should be: if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT)))) But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask anyway, so the check is always going to pass and thus should simply be removed. Also note that inotify_arg_to_mask completely controls what mask bits get set from arg, there's no way for invalid bits to get enabled there. Lets fix it by simply removing the useless broken checks. Signed-off-by: NJim Somerville <Jim.Somerville@windriver.com> Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Cc: <stable@vger.kernel.org> [2.6.37+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 2月, 2013 1 次提交
-
-
由 Trond Myklebust 提交于
Currently, nlmclnt_lock will break out of the for(;;) loop when the reclaimer wakes up the blocking lock thread by setting nlm_lck_denied_grace_period. This causes the lock request to fail with an ENOLCK error. The intention was always to ensure that we resend the lock request after the grace period has expired. Reported-by: NWangyuan Zhang <Wangyuan.Zhang@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
-
- 19 2月, 2013 4 次提交
-
-
由 Thomas Gleixner 提交于
The static lock initializers want to be fed the proper name of the lock and not some random string. In mainline random strings are obfuscating the readability of debug output, but for RT they prevent the spinlock substitution. Fix it up. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Gao feng 提交于
proc_net_remove has been replaced by remove_proc_entry. we can remove it now. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gao feng 提交于
proc_net_fops_create has been replaced by proc_create, we can remove it now. Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lukas Czerner 提交于
Currently when new xattr block is created or released we we would call dquot_free_block() or dquot_alloc_block() respectively, among the else decrementing or incrementing the number of blocks assigned to the inode by one block. This however does not work for bigalloc file system because we always allocate/free the whole cluster so we have to count with that in dquot_free_block() and dquot_alloc_block() as well. Use the clusters-to-blocks conversion EXT4_C2B() when passing number of blocks to the dquot_alloc/free functions to fix the problem. The problem has been revealed by xfstests #117 (and possibly others). Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org
-
- 18 2月, 2013 12 次提交
-
-
由 Zheng Liu 提交于
Although extent status is loaded on-demand, we also need to reclaim extent from the tree when we are under a heavy memory pressure because in some cases fragmented extent tree causes status tree costs too much memory. Here we maintain a lru list in super_block. When the extent status of an inode is accessed and changed, this inode will be move to the tail of the list. The inode will be dropped from this list when it is cleared. In the inode, a counter is added to count the number of cached objects in extent status tree. Here only written/unwritten/hole extent is counted because delayed extent doesn't be reclaimed due to fiemap, bigalloc and seek_data/hole need it. The counter will be increased as a new extent is allocated, and it will be decreased as a extent is freed. In this commit we use normal shrinker framework to reclaim memory from the status tree. ext4_es_reclaim_extents_count() traverses the lru list to count the number of reclaimable extents. ext4_es_shrink() tries to reclaim written/unwritten/hole extents from extent status tree. The inode that has been shrunk is moved to the tail of lru list. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
This commit changes some interfaces in extent status tree because we need to use inode to count the cached objects in a extent status tree. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
Single extent cache could be removed because we have extent status tree as a extent cache, and it would be better. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
After tracking all extent status, we already have a extent cache in memory. Every time we want to lookup a block mapping, we can first try to lookup it in extent status tree to avoid a potential disk I/O. A new function called ext4_es_lookup_extent is defined to finish this work. When we try to lookup a block mapping, we always call ext4_map_blocks and/or ext4_da_map_blocks. So in these functions we first try to lookup a block mapping in extent status tree. A new flag EXT4_GET_BLOCKS_NO_PUT_HOLE is used in ext4_da_map_blocks in order not to put a hole into extent status tree because this hole will be converted to delayed extent in the tree immediately. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
By recording the phycisal block and status, extent status tree is able to track the status of every extents. When we call _map_blocks functions to lookup an extent or create a new written/unwritten/delayed extent, this extent will be inserted into extent status tree. We don't load all extents from disk in alloc_inode() because it costs too much memory, and if a file is opened and closed frequently it will takes too much time to load all extent information. So currently when we create/lookup an extent, this extent will be inserted into extent status tree. Hence, the extent status tree may not comprehensively contain all of the extents found in the file. Here a condition we need to take care is that an extent might contains unwritten and delayed status simultaneously because an extent is delayed allocated and could be allocated by fallocate. At this time we need to keep delayed status because later we need to update delayed reservation space using it. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
This commit lets ext4_ext_map_blocks return EXT4_MAP_UNWRITTEN flag because in later commit ext4_map_blocks needs to use this flag to determine the extent status. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Zheng Liu 提交于
This commit renames ext4_es_find_extent with ext4_es_find_delayed_extent and improve this function. First, we split input and output parameter. Second, this function never return the first block of the next delayed extent after 'es'. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Cc: Jan kara <jack@suse.cz>
-
由 Zheng Liu 提交于
This commit adds two members in extent_status structure to let it record physical block and extent status. Here es_pblk is used to record both of them because physical block only has 48 bits. So extent status could be stashed into it so that we can save some memory. Now written, unwritten, delayed and hole are defined as status. Due to new member is added into extent status tree, all interfaces need to be adjusted. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 Zheng Liu 提交于
This commit refines the extent status tree code. 1) A prefix 'es_' is added to to the extent status tree structure members. 2) Refactored es_remove_extent() so that __es_remove_extent() can be used by es_insert_extent() to remove the old extent entry(-ies) before inserting a new one. 3) Rename extent_status_end() to ext4_es_end() 4) ext4_es_can_be_merged() is define to check whether two extents can be merged or not. 5) Update and clarified comments. Signed-off-by: NZheng Liu <wenqing.lz@taobao.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Reviewed-by: NJan Kara <jack@suse.cz>
-
由 fanchaoting 提交于
now pnfs client uses block layout, maybe we can remove blocklayoutdriver first. if we umount later, it can cause oops in unset_pnfs_layoutdriver. because nfss->pnfs_curr_ld->clear_layoutdriver is invalid. reproduce it: modprobe blocklayoutdriver mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/ rmmod blocklayoutdriver umount /mnt then you can see following CPU 0 Pid: 17023, comm: umount.nfs4 Tainted: GF O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffffa04cfe6d>] [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP: 0018:ffff8800022d9e48 EFLAGS: 00010286 RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001 RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800 RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400 R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550 FS: 00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0) Stack: ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7 ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400 Call Trace: [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4] [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs] [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs] [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70 [<ffffffff8117986a>] deactivate_super+0x4a/0x70 [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130 [<ffffffff81194d62>] sys_umount+0x72/0xe0 [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2 RIP [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4] RSP <ffff8800022d9e48> CR2: ffffffffa04a1b38 ---[ end trace 29f75aaedda058bf ]--- Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
-
由 Tim Gardner 提交于
smatch analysis: fs/nfs/getroot.c:130 nfs_get_root() info: redundant null check on name calling kfree() fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null check on devname_garbage calling kfree() Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: linux-nfs@vger.kernel.org Signed-off-by: NTim Gardner <tim.gardner@canonical.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Weston Andros Adamson 提交于
layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0). Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean that the RPC was successfully sent, received and parsed. To fix this, use the result's len member to see if parsing took place. This fixes the following OOPS -- calling xdr_init_decode() with a buffer length 0 doesn't set the stream's 'p' member and ends up using uninitialized memory in filelayout_decode_layout. BUG: unable to handle kernel paging request at 0000000000008050 IP: [<ffffffff81282e78>] memcpy+0x18/0x120 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq CPU 1 Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib] Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffff81282e78>] [<ffffffff81282e78>] memcpy+0x18/0x120 RSP: 0018:ffff88003dfab588 EFLAGS: 00010206 RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009 RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000 RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024 R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0 FS: 0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540) Stack: ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0 <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70 <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc] [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files] [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files] [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90 Signed-off-by: NWeston Andros Adamson <dros@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
-
- 15 2月, 2013 3 次提交
-
-
由 Theodore Ts'o 提交于
Use ERR_PTR()/IS_ERR() abstraction instead of passing in a separate pointer to an integer for the error code, as a code cleanup. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Theodore Ts'o 提交于
The code to read in directory blocks and verify their metadata checksums was replicated in ten different places across fs/ext4/namei.c, and the code was buggy in subtle ways in a number of those replicated sites. In some cases, ext4_error() was called with a training newline. In others, in particularly in empty_dir(), it was possible to call ext4_dirent_csum_verify() on an index block, which would trigger false warnings requesting the system adminsitrator to run e2fsck. By refactoring the code, we make the code more readable, as well as shrinking the compiled object file by over 700 bytes and 50 lines of code. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dave Chinner 提交于
When we are converting local data to an extent format as a result of adding an attribute, the type of data contained in the local fork determines the behaviour that needs to occur. xfs_bmap_add_attrfork_local() already handles the directory data case specially by using S_ISDIR() and calling out to xfs_dir2_sf_to_block(), but with verifiers we now need to handle each different type of metadata specially and different metadata formats require different verifiers (and eventually block header initialisation). There is only a single place that we add and attribute fork to the inode, but that is in the attribute code and it knows nothing about the specific contents of the data fork. It is only the case of local data that is the issue here, so adding code to hadnle this case in the attribute specific code is wrong. Hence we are really stuck trying to detect the data fork contents in xfs_bmap_add_attrfork_local() and performing the correct callout there. Luckily the current cases can be determined by S_IS* macros, and we can push the work off to data specific callouts, but each of those callouts does a lot of work in common with xfs_bmap_local_to_extents(). The only reason that this fails for symlinks right now is is that xfs_bmap_local_to_extents() assumes the data fork contains extent data, and so attaches a a bmap extent data verifier to the buffer and simply copies the data fork information straight into it. To fix this, allow us to pass a "formatting" callback into xfs_bmap_local_to_extents() which is responsible for setting the buffer type, initialising it and copying the data fork contents over to the new buffer. This allows callers to specify how they want to format the new buffer (which is necessary for the upcoming CRC enabled metadata blocks) and hence make xfs_bmap_local_to_extents() useful for any type of data fork content. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-