- 18 1月, 2010 1 次提交
-
-
由 Jan Engelhardt 提交于
parent 49313cdac7b34c9f7ecbb1780cfc648b1c082cd7 (v2.6.32-1-g49313cd) commit ff48c08e1c05c67e8348ab6f8a24de8034e0e34d Author: Jan Engelhardt <jengelh@medozas.de> Date: Wed Dec 9 22:57:36 2009 +0100 Btrfs: fix missing last-entry in readdir(3) When one does a 32-bit readdir(3), the last entry of a directory is missing. This is however not due to passing a large value to filldir, but it seems to have to do with glibc doing telldir or something quirky. In any case, this patch fixes it in practice. Signed-off-by: NJan Engelhardt <jengelh@medozas.de> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 18 12月, 2009 15 次提交
-
-
由 Chris Mason 提交于
The recent patch to make fallocate enospc friendly would send down a NULL trans handle to the allocator. This moves the transaction start to properly fix things. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Josef Bacik 提交于
This patch makes us a bit less zealous about making sure we have enough free metadata space by pearing down the size of new metadata chunks to 256mb instead of 1gb. Also, we used to try an allocate metadata chunks when allocating data, but that sort of thing is done elsewhere now so we can just remove it. With my -ENOSPC test I used to have 3gb reserved for metadata out of 75gb, now I have 1.7gb. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Matthew Wilcox 提交于
Christoph's patch e244a0ae doesn't display the discard option in /proc/mounts, leading to some confusion for me. Here's the missing bit. Signed-off-by: NMatthew Wilcox <willy@linux.intel.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 TARUISI Hiroaki 提交于
I rebased Christian Parpart's patch to deny hard link across subvolumes. Original patch modifies also btrfs_rename, but I excluded it because we can move across subvolumes now and it make no problem. ----------------- Hard link across subvolumes should not allowed in Btrfs. btrfs_link checks root of 'to' directory is same as root of 'from' file. If not same, btrfs_link returns -EPERM. Signed-off-by: NTARUISI Hiroaki <taruishi.hiroak@jp.fujitsu.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
We shouldn't silently ignore unrecognized options. Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
If block group 0 is completely free, btrfs_read_block_groups will add extent [0, BTRFS_SUPER_INFO_OFFSET) to the free space cache. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
The bytes_used field in root item was originally planned to trace the amount of used data and tree blocks. But it never worked right since we can't trace freeing of data accurately. This patch changes it to only trace the amount of tree blocks. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
The check for skip pinned case is wrong, it may breaks the while loop too soon. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
iput() can trigger new transactions if we are dropping the final reference, so calling it in btrfs_commit_transaction may end up deadlock. This patch adds delayed iput to avoid the issue. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
Pass transaction handle down to security and ACL initialization functions, so we can avoid starting nested transactions Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
truncating and deleting regular files are unbound operations, so it's not good to do them in a single transaction. This patch makes btrfs_truncate and btrfs_delete_inode start a new transaction after all items in a tree leaf are deleted. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
fallocate(2) may allocate large number of file extents, so it's not good to do it in a single transaction. This patch make fallocate(2) start a new transaction for each file extents it allocates. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
btrfs_lookup_dentry may trigger orphan cleanup, so it's not good to call it while committing a transaction. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
We do log replay in a single transaction, so it's not good to do unbound operations. This patch cleans up orphan inodes cleanup after replaying the log. It also avoids doing other unbound operations such as truncating a file during replaying log. These unbound operations are postponed to the orphan inode cleanup stage. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
There are some cases file extents are inserted without involving ordered struct. In these cases, we update disk_i_size directly, without checking pending ordered extent and DELALLOC bit. This patch extends btrfs_ordered_update_i_size() to handle these cases. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 16 12月, 2009 3 次提交
-
-
由 Yan, Zheng 提交于
Rewrite btrfs_drop_extents by using btrfs_duplicate_item, so we can avoid calling lock_extent within transaction. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
btrfs_duplicate_item duplicates item with new key, guaranteeing the source item and the new items are in the same tree leaf and contiguous. It allows us to split file extent in place, without using lock_extent to prevent bookend extent race. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Yan, Zheng 提交于
We allow two log transactions at a time, but use same flag to mark dirty tree-log btree blocks. So we may flush dirty blocks belonging to newer log transaction when committing a log transaction. This patch fixes the issue by using two flags to mark dirty tree-log btree blocks. Signed-off-by: NYan Zheng <zheng.yan@oracle.com> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 01 12月, 2009 3 次提交
-
-
由 Marc Dionne 提交于
When IMA is active, using dentry_open without updating the IMA counters will result in free/open imbalance errors when fput is eventually called. Signed-off-by: NMarc Dionne <marc.c.dionne@gmail.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
While building 2.6.32-rc8-git2 for Fedora I noticed the following thinko in commit 201a1542 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions"): fs/9p/cache.c: In function '__v9fs_fscache_release_page': fs/9p/cache.c:346: error: 'vnode' undeclared (first use in this function) fs/9p/cache.c:346: error: (Each undeclared identifier is reported only once fs/9p/cache.c:346: error: for each function it appears in.) make[2]: *** [fs/9p/cache.o] Error 1 Fix the 9P filesystem to correctly construct the argument to fscache_maybe_release_page(). Signed-off-by: NKyle McMartin <kyle@redhat.com> Signed-off-by: Xiaotian Feng <dfeng@redhat.com> [from identical patch] Signed-off-by: Stefan Lippers-Hollmann <s.l-h@gmx.de> [from identical patch] Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Woodhouse 提交于
In 2.6.23 kernel, commit a32ea1e1 ("Fix read/truncate race") fixed a race in the generic code, and as a side effect, now do_generic_file_read() can ask us to readpage() past the i_size. This seems to be correctly handled by the block routines (e.g. block_read_full_page() fills the page with zeroes in case if somebody is trying to read past the last inode's block). JFFS2 doesn't handle this; it assumes that it won't be asked to read pages which don't exist -- and thus that there will be at least _one_ valid 'frag' on the page it's being asked to read. It will fill any holes with the following memset: memset(buf, 0, min(end, frag->ofs + frag->size) - offset); When the 'closest smaller match' returned by jffs2_lookup_node_frag() is actually on a previous page and ends before 'offset', that results in: memset(buf, 0, <huge unsigned negative>); Hopefully, in most cases the corruption is fatal, and quickly causing random oopses, like this: root@10.0.0.4:~/ltp-fs-20090531# ./testcases/kernel/fs/ftest/ftest01 Unable to handle kernel paging request for data at address 0x00000008 Faulting instruction address: 0xc01cd980 Oops: Kernel access of bad area, sig: 11 [#1] [...] NIP [c01cd980] rb_insert_color+0x38/0x184 LR [c0043978] enqueue_hrtimer+0x88/0xc4 Call Trace: [c6c63b60] [c004f9a8] tick_sched_timer+0xa0/0xe4 (unreliable) [c6c63b80] [c0043978] enqueue_hrtimer+0x88/0xc4 [c6c63b90] [c0043a48] __run_hrtimer+0x94/0xbc [c6c63bb0] [c0044628] hrtimer_interrupt+0x140/0x2b8 [c6c63c10] [c000f8e8] timer_interrupt+0x13c/0x254 [c6c63c30] [c001352c] ret_from_except+0x0/0x14 --- Exception: 901 at memset+0x38/0x5c LR = jffs2_read_inode_range+0x144/0x17c [c6c63cf0] [00000000] (null) (unreliable) This patch fixes the issue, plus fixes all LTP tests on NAND/UBI with JFFS2 filesystem that were failing since 2.6.23 (seems like the bug above also broke the truncation). Reported-By: NAnton Vorontsov <avorontsov@ru.mvista.com> Tested-By: NAnton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 11月, 2009 1 次提交
-
-
由 Csaba Henk 提交于
The comment in fuse_open about O_DIRECT: "VFS checks this, but only _after_ ->open()" also holds for fuse_create, however, the same kind of check was missing there. As an impact of this bug, open(newfile, O_RDWR|O_CREAT|O_DIRECT) fails, but a stub newfile will remain if the fuse server handled the implied FUSE_CREATE request appropriately. Other impact: in the above situation ima_file_free() will complain to open/free imbalance if CONFIG_IMA is set. Signed-off-by: NCsaba Henk <csaba@gluster.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Harshavardhana <harsha@gluster.com> Cc: stable@kernel.org
-
- 25 11月, 2009 3 次提交
-
-
由 Steve French 提交于
Also update CHANGES file Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Steve French 提交于
SMB writes are sent with a starting offset and length. When the server supports the newer SMB trans2 posix open (rather than using the SMB NTCreateX) a file can be opened with SMB_O_APPEND flag, and for that case Samba server assumes that the offset sent in SMBWriteX is unneeded since the write should go to the end of the file - which can cause problems if the write was cached (since the beginning part of a page could be written twice by the client mm). Jeff suggested that masking the flag on posix open on the client is easiest for the time being. Note that recent Samba server also had an unrelated problem with SMB NTCreateX and append (see samba bugzilla bug number 6898) which should not affect current Linux clients (unless cifs Unix Extensions are disabled). The cifs client did not send the O_APPEND flag on posix open before 2.6.29 so the fix is unneeded on early kernels. In the future, for the non-cached case (O_DIRECT, and forcedirectio mounts) it would be possible and useful to send O_APPEND on posix open (for Windows case: FILE_APPEND_DATA but not FILE_WRITE_DATA on SMB NTCreateX) but for cached writes although the vfs sets the offset to end of file it may fragment a write across pages - so we can't send O_APPEND on open (could result in sending part of a page twice). CC: Stable <stable@kernel.org> Reviewed-by: NShirish Pargaonkar <shirishp@us.ibm.com> Signed-off-by: NJeff Layton <jlayton@redhat.com> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
由 Steve French 提交于
Fixes bugzilla.kernel.org bug number 14641 Lookup called during network boot (network root filesystem for diskless workstation) has case where nd is null in lookup. This patch fixes that in cifs_lookup. (Shirish noted that 2.6.30 and 2.6.31 stable need the same check) Signed-off-by: NShirish Pargaonkar <shirishp@us.ibm.com> Acked-by: NJeff Layton <jlayton@redhat.com> Tested-by: NVladimir Stavrinov <vs@inist.ru> CC: Stable <stable@kernel.org> Signed-off-by: NSteve French <sfrench@us.ibm.com>
-
- 21 11月, 2009 3 次提交
-
-
由 David Howells 提交于
Provide nop fscache_stat_d() macro if CONFIG_FSCACHE_STATS=n lest errors like the following occur: fs/fscache/cache.c: In function 'fscache_withdraw_cache': fs/fscache/cache.c:386: error: implicit declaration of function 'fscache_stat_d' fs/fscache/cache.c:386: error: 'fscache_n_cop_sync_cache' undeclared (first use in this function) fs/fscache/cache.c:386: error: (Each undeclared identifier is reported only once fs/fscache/cache.c:386: error: for each function it appears in.) fs/fscache/cache.c:392: error: 'fscache_n_cop_dissociate_pages' undeclared (first use in this function) Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
GFS2 has been altered to pass THIS_MODULE to slow_work_register_user(), but hasn't been altered to #include <linux/module.h> to provide it, resulting in the following error: fs/gfs2/recovery.c:596: error: 'THIS_MODULE' undeclared here (not in a function) Add the missing #include. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
As of the patch: SLOW_WORK: Wait for outstanding work items belonging to a module to clear Wait for outstanding slow work items belonging to a module to clear when unregistering that module as a user of the facility. This prevents the put_ref code of a work item from being taken away before it returns. slow_work_register_user() takes a module pointer as an argument. CIFS must now pass THIS_MODULE as that argument, lest the following error be observed: fs/cifs/cifsfs.c: In function 'init_cifs': fs/cifs/cifsfs.c:1040: error: too few arguments to function 'slow_work_register_user' Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
- 20 11月, 2009 11 次提交
-
-
由 David Howells 提交于
Don't log the CacheFiles lookup/create object routined failing with ENOBUFS as under high memory load or high cache load they can do this quite a lot. This error simply means that the requested object cannot be created on disk due to lack of space, or due to failure of the backing filesystem to find sufficient resources. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Catch an overly long wait for an old, dying active object when we want to replace it with a new one. The probability is that all the slow-work threads are hogged, and the delete can't get a look in. What we do instead is: (1) if there's nothing in the slow work queue, we sleep until either the dying object has finished dying or there is something in the slow work queue behind which we can queue our object. (2) if there is something in the slow work queue, we return ETIMEDOUT to fscache_lookup_object(), which then puts us back on the slow work queue, presumably behind the deletion that we're blocked by. We are then deferred for a while until we work our way back through the queue - without blocking a slow-work thread unnecessarily. A backtrace similar to the following may appear in the log without this patch: INFO: task kslowd004:5711 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kslowd004 D 0000000000000000 0 5711 2 0x00000080 ffff88000340bb80 0000000000000046 ffff88002550d000 0000000000000000 ffff88002550d000 0000000000000007 ffff88000340bfd8 ffff88002550d2a8 000000000000ddf0 00000000000118c0 00000000000118c0 ffff88002550d2a8 Call Trace: [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles] [<ffffffffa011c4e1>] cachefiles_wait_bit+0x9/0xd [cachefiles] [<ffffffff81353153>] __wait_on_bit+0x43/0x76 [<ffffffff8111ae39>] ? ext3_xattr_get+0x1ec/0x270 [<ffffffff813531ef>] out_of_line_wait_on_bit+0x69/0x74 [<ffffffffa011c4d8>] ? cachefiles_wait_bit+0x0/0xd [cachefiles] [<ffffffff8104c125>] ? wake_bit_function+0x0/0x2e [<ffffffffa011bc79>] cachefiles_mark_object_active+0x203/0x23b [cachefiles] [<ffffffffa011c209>] cachefiles_walk_to_object+0x558/0x827 [cachefiles] [<ffffffffa011a429>] cachefiles_lookup_object+0xac/0x12a [cachefiles] [<ffffffffa00aa1e9>] fscache_lookup_object+0x1c7/0x214 [fscache] [<ffffffffa00aafc5>] fscache_object_state_machine+0xa5/0x52d [fscache] [<ffffffffa00ab4ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache] [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308 [<ffffffff8104be91>] kthread+0x7a/0x82 [<ffffffff8100beda>] child_rip+0xa/0x20 [<ffffffff8100b87c>] ? restore_args+0x0/0x30 [<ffffffff8104be17>] ? kthread+0x0/0x82 [<ffffffff8100bed0>] ? child_rip+0x0/0x20 1 lock held by kslowd004/5711: #0: (&sb->s_type->i_mutex_key#7/1){+.+.+.}, at: [<ffffffffa011be64>] cachefiles_walk_to_object+0x1b3/0x827 [cachefiles] Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Show more debugging information if cachefiles_mark_object_active() is asked to activate an active object. This may happen, for instance, if the netfs tries to register an object with the same key multiple times. The code is changed to (a) get the appropriate object lock to protect the cookie pointer whilst we dereference it, and (b) get and display the cookie key if available. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Mark parent directory locks as I_MUTEX_PARENT in the callers of cachefiles_bury_object() so that lockdep doesn't complain when that invokes vfs_unlink(): ============================================= [ INFO: possible recursive locking detected ] 2.6.32-rc6-cachefs #47 --------------------------------------------- kslowd002/3089 is trying to acquire lock: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff810bbf72>] vfs_unlink+0x8b/0x128 but task is already holding lock: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffffa00e4e61>] cachefiles_walk_to_object+0x1b0/0x831 [cachefiles] other info that might help us debug this: 1 lock held by kslowd002/3089: #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffffa00e4e61>] cachefiles_walk_to_object+0x1b0/0x831 [cachefiles] stack backtrace: Pid: 3089, comm: kslowd002 Not tainted 2.6.32-rc6-cachefs #47 Call Trace: [<ffffffff8105ad7b>] __lock_acquire+0x1649/0x16e3 [<ffffffff8118170e>] ? inode_has_perm+0x5f/0x61 [<ffffffff8105ae6c>] lock_acquire+0x57/0x6d [<ffffffff810bbf72>] ? vfs_unlink+0x8b/0x128 [<ffffffff81353ac3>] mutex_lock_nested+0x54/0x292 [<ffffffff810bbf72>] ? vfs_unlink+0x8b/0x128 [<ffffffff8118179e>] ? selinux_inode_permission+0x8e/0x90 [<ffffffff8117e271>] ? security_inode_permission+0x1c/0x1e [<ffffffff810bb4fb>] ? inode_permission+0x99/0xa5 [<ffffffff810bbf72>] vfs_unlink+0x8b/0x128 [<ffffffff810adb19>] ? kfree+0xed/0xf9 [<ffffffffa00e3f00>] cachefiles_bury_object+0xb6/0x420 [cachefiles] [<ffffffff81058e21>] ? trace_hardirqs_on+0xd/0xf [<ffffffffa00e7e24>] ? cachefiles_check_object_xattr+0x233/0x293 [cachefiles] [<ffffffffa00e51b0>] cachefiles_walk_to_object+0x4ff/0x831 [cachefiles] [<ffffffff81032238>] ? finish_task_switch+0x0/0xb2 [<ffffffffa00e3429>] cachefiles_lookup_object+0xac/0x12a [cachefiles] [<ffffffffa00741e9>] fscache_lookup_object+0x1c7/0x214 [fscache] [<ffffffffa0074fc5>] fscache_object_state_machine+0xa5/0x52d [fscache] [<ffffffffa00754ac>] fscache_object_slow_work_execute+0x5f/0xa0 [fscache] [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308 [<ffffffff8104be91>] kthread+0x7a/0x82 [<ffffffff8100beda>] child_rip+0xa/0x20 [<ffffffff8100b87c>] ? restore_args+0x0/0x30 [<ffffffff8104be17>] ? kthread+0x0/0x82 [<ffffffff8100bed0>] ? child_rip+0x0/0x20 Signed-off-by: NDaivd Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Handle truncate unlocking the page we're attempting to read from the backing device before the read has completed. This was causing reports like the following to occur: Pid: 4765, comm: kslowd Not tainted 2.6.30.1 #1 Call Trace: [<ffffffffa0331d7a>] ? cachefiles_read_waiter+0xd9/0x147 [cachefiles] [<ffffffff804b74bd>] ? __wait_on_bit+0x60/0x6f [<ffffffff8022bbbb>] ? __wake_up_common+0x3f/0x71 [<ffffffff8022cc32>] ? __wake_up+0x30/0x44 [<ffffffff8024a41f>] ? __wake_up_bit+0x28/0x2d [<ffffffffa003a793>] ? ext3_truncate+0x4d7/0x8ed [ext3] [<ffffffff80281f90>] ? pagevec_lookup+0x17/0x1f [<ffffffff8028c2ff>] ? unmap_mapping_range+0x59/0x1ff [<ffffffff8022cc32>] ? __wake_up+0x30/0x44 [<ffffffff8028e286>] ? vmtruncate+0xc2/0xe2 [<ffffffff802b82cf>] ? inode_setattr+0x22/0x10a [<ffffffffa003baa5>] ? ext3_setattr+0x17b/0x1e6 [ext3] [<ffffffff802b853d>] ? notify_change+0x186/0x2c9 [<ffffffffa032d9de>] ? cachefiles_attr_changed+0x133/0x1cd [cachefiles] [<ffffffffa032df7f>] ? cachefiles_lookup_object+0xcf/0x12a [cachefiles] [<ffffffffa0318165>] ? fscache_lookup_object+0x110/0x122 [fscache] [<ffffffffa03188c3>] ? fscache_object_slow_work_execute+0x590/0x6bc [fscache] [<ffffffff80278f82>] ? slow_work_thread+0x285/0x43a [<ffffffff8024a446>] ? autoremove_wake_function+0x0/0x2e [<ffffffff80278cfd>] ? slow_work_thread+0x0/0x43a [<ffffffff8024a317>] ? kthread+0x54/0x81 [<ffffffff8020c93a>] ? child_rip+0xa/0x20 [<ffffffff8024a2c3>] ? kthread+0x0/0x81 [<ffffffff8020c930>] ? child_rip+0x0/0x20 CacheFiles: I/O Error: Readpage failed on backing file 200000000000810 FS-Cache: Cache cachefiles stopped due to I/O error Reported-by: NChristian Kujau <lists@nerdbynature.de> Reported-by: NTakashi Iwai <tiwai@suse.de> Reported-by: NDuc Le Minh <duclm.vn@gmail.com> Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
cachefiles_write_page() writes a full page to the backing file for the last page of the netfs file, even if the netfs file's last page is only a partial page. This causes the EOF on the backing file to be extended beyond the EOF of the netfs, and thus the backing file will be truncated by cachefiles_attr_changed() called from cachefiles_lookup_object(). So we need to limit the write we make to the backing file on that last page such that it doesn't push the EOF too far. Also, if a backing file that has a partial page at the end is expanded, we discard the partial page and refetch it on the basis that we then have a hole in the file with invalid data, and should the power go out... A better way to deal with this could be to record a note that the partial page contains invalid data until the correct data is written into it. This isn't a problem for netfs's that discard the whole backing file if the file size changes (such as NFS). Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
FS-Cache objects have an FSCACHE_OBJECT_EV_REQUEUE event that can theoretically be raised to ask the state machine to requeue the object for further processing before the work function returns to the slow-work facility. However, fscache_object_work_execute() was clearing that bit before checking the event mask to see whether the object has any pending events that require it to be requeued immediately. Instead, the bit should be cleared after the check and enqueue. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Start processing an object's operations when that object moves into the DYING state as the object cannot be destroyed until all its outstanding operations have completed. Furthermore, make sure that read and allocation operations handle being woken up on a dead object. Such events are recorded in the Allocs.abt and Retrvls.abt statistics as viewable through /proc/fs/fscache/stats. The code for waiting for object activation for the read and allocation operations is also extracted into its own function as it is much the same in all cases, differing only in the stats incremented. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
We must make sure that FSCACHE_COOKIE_LOOKING_UP is cleared on lookup failure (if an object reaches the LC_DYING state), and we should clear it before clearing FSCACHE_COOKIE_CREATING. If this doesn't happen then fscache_wait_for_deferred_lookup() may hold allocation and retrieval operations indefinitely until they're interrupted by signals - which in turn pins the dying object until they go away. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Add a stat counter to count retirement events rather than ordinary release events (the retire argument to fscache_relinquish_cookie()). Signed-off-by: NDavid Howells <dhowells@redhat.com>
-
由 David Howells 提交于
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache under OOM conditions, but that are waiting for write to the cache. Under these conditions, vmscan calls the releasepage() function of the netfs, asking if a page can be discarded. The problem is typified by the following trace of a stuck process: kslowd005 D 0000000000000000 0 4253 2 0x00000080 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8 Call Trace: [<ffffffffa00782d8>] __fscache_wait_on_page_write+0x8b/0xa7 [fscache] [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffffa0078240>] ? __fscache_check_page_write+0x63/0x70 [fscache] [<ffffffffa00b671d>] nfs_fscache_release_page+0x4e/0xc4 [nfs] [<ffffffffa00927f0>] nfs_release_page+0x3c/0x41 [nfs] [<ffffffff810885d3>] try_to_release_page+0x32/0x3b [<ffffffff81093203>] shrink_page_list+0x316/0x4ac [<ffffffff8109372b>] shrink_inactive_list+0x392/0x67c [<ffffffff813532fa>] ? __mutex_unlock_slowpath+0x100/0x10b [<ffffffff81058df0>] ? trace_hardirqs_on_caller+0x10c/0x130 [<ffffffff8135330e>] ? mutex_unlock+0x9/0xb [<ffffffff81093aa2>] shrink_list+0x8d/0x8f [<ffffffff81093d1c>] shrink_zone+0x278/0x33c [<ffffffff81052d6c>] ? ktime_get_ts+0xad/0xba [<ffffffff81094b13>] try_to_free_pages+0x22e/0x392 [<ffffffff81091e24>] ? isolate_pages_global+0x0/0x212 [<ffffffff8108e743>] __alloc_pages_nodemask+0x3dc/0x5cf [<ffffffff81089529>] grab_cache_page_write_begin+0x65/0xaa [<ffffffff8110f8c0>] ext3_write_begin+0x78/0x1eb [<ffffffff81089ec5>] generic_file_buffered_write+0x109/0x28c [<ffffffff8103cb69>] ? current_fs_time+0x22/0x29 [<ffffffff8108a509>] __generic_file_aio_write+0x350/0x385 [<ffffffff8108a588>] ? generic_file_aio_write+0x4a/0xae [<ffffffff8108a59e>] generic_file_aio_write+0x60/0xae [<ffffffff810b2e82>] do_sync_write+0xe3/0x120 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff810b18e1>] ? __dentry_open+0x1a5/0x2b8 [<ffffffff810b1a76>] ? dentry_open+0x82/0x89 [<ffffffffa00e693c>] cachefiles_write_page+0x298/0x335 [cachefiles] [<ffffffffa0077147>] fscache_write_op+0x178/0x2c2 [fscache] [<ffffffffa0075656>] fscache_op_execute+0x7a/0xd1 [fscache] [<ffffffff81082093>] slow_work_execute+0x18f/0x2d1 [<ffffffff8108239a>] slow_work_thread+0x1c5/0x308 [<ffffffff8104c0f1>] ? autoremove_wake_function+0x0/0x34 [<ffffffff810821d5>] ? slow_work_thread+0x0/0x308 [<ffffffff8104be91>] kthread+0x7a/0x82 [<ffffffff8100beda>] child_rip+0xa/0x20 [<ffffffff8100b87c>] ? restore_args+0x0/0x30 [<ffffffff8102ef83>] ? tg_shares_up+0x171/0x227 [<ffffffff8104be17>] ? kthread+0x0/0x82 [<ffffffff8100bed0>] ? child_rip+0x0/0x20 In the above backtrace, the following is happening: (1) A page storage operation is being executed by a slow-work thread (fscache_write_op()). (2) FS-Cache farms the operation out to the cache to perform (cachefiles_write_page()). (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's standard write (do_sync_write()) under KERNEL_DS directly from the netfs page. (4) However, for Ext3 to perform the write, it must allocate some memory, in particular, it must allocate at least one page cache page into which it can copy the data from the netfs page. (5) Under OOM conditions, the memory allocator can't immediately come up with a page, so it uses vmscan to find something to discard (try_to_free_pages()). (6) vmscan finds a clean netfs page it might be able to discard (possibly the one it's trying to write out). (7) The netfs is called to throw the page away (nfs_release_page()) - but it's called with __GFP_WAIT, so the netfs decides to wait for the store to complete (__fscache_wait_on_page_write()). (8) This blocks a slow-work processing thread - possibly against itself. The system ends up stuck because it can't write out any netfs pages to the cache without allocating more memory. To avoid this, we make FS-Cache cancel some writes that aren't in the middle of actually being performed. This means that some data won't make it into the cache this time. To support this, a new FS-Cache function is added fscache_maybe_release_page() that replaces what the netfs releasepage() functions used to do with respect to the cache. The decisions fscache_maybe_release_page() makes are counted and displayed through /proc/fs/fscache/stats on a line labelled "VmScan". There are four counters provided: "nos=N" - pages that weren't pending storage; "gon=N" - pages that were pending storage when we first looked, but weren't by the time we got the object lock; "bsy=N" - pages that we ignored as they were actively being written when we looked; and "can=N" - pages that we cancelled the storage of. What I'd really like to do is alter the behaviour of the cancellation heuristics, depending on how necessary it is to expel pages. If there are plenty of other pages that aren't waiting to be written to the cache that could be ejected first, then it would be nice to hold up on immediate cancellation of cache writes - but I don't see a way of doing that. Signed-off-by: NDavid Howells <dhowells@redhat.com>
-