- 18 4月, 2013 2 次提交
-
-
由 Maxim Patlasov 提交于
The patch implements passing "struct fuse_io_priv *io" down the stack up to fuse_send_read/write where it is used to submit request asynchronously. io->async==0 designates synchronous processing. Non-trivial part of the patch is changes in fuse_direct_io(): resources like fuse requests and user pages cannot be released immediately in async case. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Normally blocked_waitq will be inactive, so optimize this case. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 17 4月, 2013 4 次提交
-
-
由 Maxim Patlasov 提交于
The patch solves thundering herd problem. So far as previous patches ensured that only allocations for background may block, it's safe to wake up one waiter. Whoever it is, it will wake up another one in request_end() afterwards. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Maxim Patlasov 提交于
A task may have at most one synchronous request allocated. So these requests need not be otherwise limited. The patch re-works fuse_get_req() to follow this idea. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Maxim Patlasov 提交于
Existing flag fc->blocked is used to suspend request allocation both in case of many background request submitted and period of time before init_reply arrives from userspace. Next patch will skip blocking allocations of synchronous request (disregarding fc->blocked). This is mostly OK, but we still need to suspend allocations if init_reply is not arrived yet. The patch introduces flag fc->initialized which will serve this purpose. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Maxim Patlasov 提交于
There are two types of processing requests in FUSE: synchronous (via fuse_request_send()) and asynchronous (via adding to fc->bg_queue). Fortunately, the type of processing is always known in advance, at the time of request allocation. This preparatory patch utilizes this fact making fuse_get_req() aware about the type. Next patches will use it. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 28 2月, 2013 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 2月, 2013 1 次提交
-
-
由 Eric Wong 提交于
The all pointers within fuse_req must point to valid memory once fuse_force_forget() returns. This bug appeared in "fuse: implement NFS-like readdirplus support" and was never in any official Linux release. I tested the fuse_force_forget() code path by injecting to fake -ENOMEM and verified the FORGET operation was called properly in userspace. Signed-off-by: NEric Wong <normalperson@yhbt.net> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 24 1月, 2013 6 次提交
-
-
由 Maxim Patlasov 提交于
Previously, anyone who set flag 'argpages' only filled req->pages[] and set per-request page_offset. This patch re-works all cases where argpages=1 to fill req->page_descs[] properly. Having req->page_descs[] filled properly allows to re-work fuse_copy_pages() to copy page fragments described by req->page_descs[]. This will be useful for next patches optimizing direct_IO. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Maxim Patlasov 提交于
The ability to save page pointers along with lengths and offsets in fuse_req will be useful to cover several iovec-s with a single fuse_req. Per-request page_offset is removed because anybody who need it can use req->page_descs[0].offset instead. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Maxim Patlasov 提交于
The patch reworks fuse_retrieve() to allocate only so many page pointers as needed. The core part of the patch is the following calculation: num_pages = (num + offset + PAGE_SIZE - 1) >> PAGE_SHIFT; (thanks Miklos for formula). All other changes are mostly shuffling lines. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Maxim Patlasov 提交于
The patch categorizes all fuse_get_req() invocations into two categories: - fuse_get_req_nopages(fc) - when caller doesn't care about req->pages - fuse_get_req(fc, n) - when caller need n page pointers (n > 0) Adding fuse_get_req_nopages() helps to avoid numerous fuse_get_req(fc, 0) scattered over code. Now it's clear from the first glance when a caller need fuse_req with page pointers. The patch doesn't make any logic changes. In multi-page case, it silly allocates array of FUSE_MAX_PAGES_PER_REQ page pointers. This will be amended by future patches. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Maxim Patlasov 提交于
The patch removes inline array of FUSE_MAX_PAGES_PER_REQ page pointers from fuse_req. Instead of that, req->pages may now point either to small inline array or to an array allocated dynamically. This essentially means that all callers of fuse_request_alloc[_nofs] should pass the number of pages needed explicitly. The patch doesn't make any logic changes. Signed-off-by: NMaxim Patlasov <mpatlasov@parallels.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Anand V. Avati 提交于
This patch implements readdirplus support in FUSE, similar to NFS. The payload returned in the readdirplus call contains 'fuse_entry_out' structure thereby providing all the necessary inputs for 'faking' a lookup() operation on the spot. If the dentry and inode already existed (for e.g. in a re-run of ls -l) then just the inode attributes timeout and dentry timeout are refreshed. With a simple client->network->server implementation of a FUSE based filesystem, the following performance observations were made: Test: Performing a filesystem crawl over 20,000 files with sh# time ls -lR /mnt Without readdirplus: Run 1: 18.1s Run 2: 16.0s Run 3: 16.2s With readdirplus: Run 1: 4.1s Run 2: 3.8s Run 3: 3.8s The performance improvement is significant as it avoided 20,000 upcalls calls (lookup). Cache consistency is no worse than what already is. Signed-off-by: NAnand V. Avati <avati@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 17 1月, 2013 1 次提交
-
-
由 Wei Yongjun 提交于
The variables mapping,index are initialized but never used otherwise, so remove the unused variables. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 15 11月, 2012 1 次提交
-
-
由 Eric W. Biederman 提交于
Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data. The connection between between a fuse filesystem and a fuse daemon is established when a fuse filesystem is mounted and provided with a file descriptor the fuse daemon created by opening /dev/fuse. For now restrict the communication of uids and gids between the fuse filesystem and the fuse daemon to the initial user namespace. Enforce this by verifying the file descriptor passed to the mount of fuse was opened in the initial user namespace. Ensuring the mount happens in the initial user namespace is not necessary as mounts from non-initial user namespaces are not yet allowed. In fuse_req_init_context convert the currrent fsuid and fsgid into the initial user namespace for the request that will be sent to the fuse daemon. In fuse_fill_attr convert the uid and gid passed from the fuse daemon from the initial user namespace into kuids and kgids. In iattr_to_fattr called from fuse_setattr convert kuids and kgids into the uids and gids in the initial user namespace before passing them to the fuse filesystem. In fuse_change_attributes_common called from fuse_dentry_revalidate, fuse_permission, fuse_geattr, and fuse_setattr, and fuse_iget convert the uid and gid from the fuse daemon into a kuid and a kgid to store on the fuse inode. By default fuse mounts are restricted to task whose uid, suid, and euid matches the fuse user_id and whose gid, sgid, and egid matches the fuse group id. Convert the user_id and group_id mount options into kuids and kgids at mount time, and use uid_eq and gid_eq to compare the in fuse_allow_task. Cc: Miklos Szeredi <miklos@szeredi.hu> Acked-by: NSerge Hallyn <serge.hallyn@canonical.com> Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
-
- 27 9月, 2012 1 次提交
-
-
由 Al Viro 提交于
simplifies a bunch of callers... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 9月, 2012 1 次提交
-
-
由 Miklos Szeredi 提交于
In some cases fuse_retrieve() would return a short byte count if offset was non-zero. The data returned was correct, though. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org
-
- 20 3月, 2012 1 次提交
-
-
由 Cong Wang 提交于
Signed-off-by: NCong Wang <amwang@redhat.com>
-
- 13 12月, 2011 2 次提交
-
-
由 John Muir 提交于
Allows a FUSE file-system to tell the kernel when a file or directory is deleted. If the specified dentry has the specified inode number, the kernel will unhash it. The current 'fuse_notify_inval_entry' does not cause the kernel to clean up directories that are in use properly, and as a result the users of those directories see incorrect semantics from the file-system. The error condition seen when 'fuse_notify_inval_entry' is used to notify of a deleted directory is avoided when 'fuse_notify_delete' is used instead. The following scenario demonstrates the difference: 1. User A chdirs into 'testdir' and starts reading 'testfile'. 2. User B rm -rf 'testdir'. 3. User B creates 'testdir'. 4. User C chdirs into 'testdir'. If you run the above within the same machine on any file-system (including fuse file-systems), there is no problem: user C is able to chdir into the new testdir. The old testdir is removed from the dentry tree, but still open by user A. If operations 2 and 3 are performed via the network such that the fuse file-system uses one of the notify functions to tell the kernel that the nodes are gone, then the following error occurs for user C while user A holds the original directory open: muirj@empacher:~> ls /test/testdir ls: cannot access /test/testdir: No such file or directory The issue here is that the kernel still has a dentry for testdir, and so it is requesting the attributes for the old directory, while the file-system is responding that the directory no longer exists. If on the other hand, if the file-system can notify the kernel that the directory is deleted using the new 'fuse_notify_delete' function, then the above ls will find the new directory as expected. Signed-off-by: NJohn Muir <john@jmuir.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Fix two bugs in fuse_retrieve(): - retrieving more than one page would yield repeated instances of the first page - if more than FUSE_MAX_PAGES_PER_REQ pages were requested than the request page array would overflow fuse_retrieve() was added in 2.6.36 and these bugs had been there since the beginning. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org
-
- 13 9月, 2011 1 次提交
-
-
由 Miklos Szeredi 提交于
kmemleak is reporting that 32 bytes are being leaked by FUSE: unreferenced object 0xe373b270 (size 32): comm "fusermount", pid 1207, jiffies 4294707026 (age 2675.187s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<b05517d7>] kmemleak_alloc+0x27/0x50 [<b0196435>] kmem_cache_alloc+0xc5/0x180 [<b02455be>] fuse_alloc_forget+0x1e/0x20 [<b0245670>] fuse_alloc_inode+0xb0/0xd0 [<b01b1a8c>] alloc_inode+0x1c/0x80 [<b01b290f>] iget5_locked+0x8f/0x1a0 [<b0246022>] fuse_iget+0x72/0x1a0 [<b02461da>] fuse_get_root_inode+0x8a/0x90 [<b02465cf>] fuse_fill_super+0x3ef/0x590 [<b019e56f>] mount_nodev+0x3f/0x90 [<b0244e95>] fuse_mount+0x15/0x20 [<b019d1bc>] mount_fs+0x1c/0xc0 [<b01b5811>] vfs_kern_mount+0x41/0x90 [<b01b5af9>] do_kern_mount+0x39/0xd0 [<b01b7585>] do_mount+0x2e5/0x660 [<b01b7966>] sys_mount+0x66/0xa0 This leak report is consistent and happens once per boot on 3.1.0-rc5-dirty. This happens if a FORGET request is queued after the fuse device was released. Reported-by: NSitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Tested-by: NSitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 8月, 2011 1 次提交
-
-
由 Miklos Szeredi 提交于
FUSE_NOTIFY_INVAL_ENTRY didn't check the length of the write so the message processing could overrun and result in a "kernel BUG at fs/fuse/dev.c:629!" Reported-by: NHan-Wen Nienhuys <hanwenn@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
-
- 23 3月, 2011 1 次提交
-
-
由 Miklos Szeredi 提交于
This function basically does: remove_from_page_cache(old); page_cache_release(old); add_to_page_cache_locked(new); Except it does this atomically, so there's no possibility for the "add" to fail because of a race. If memory cgroups are enabled, then the memory cgroup charge is also moved from the old page to the new. This function is currently used by fuse to move pages into the page cache on read, instead of copying the page contents. [minchan.kim@gmail.com: add freepage() hook to replace_page_cache_page()] Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: NMinchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 3月, 2011 1 次提交
-
-
由 Bryan Green 提交于
If a fuse dev connection is broken, wake up any processes that are blocking, in a poll system call, on one of the files in the now defunct filesystem. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 08 12月, 2010 2 次提交
-
-
由 Miklos Szeredi 提交于
Terje Malmedal reports that a fuse filesystem with 32 million inodes on a machine with lots of memory can take up to 30 minutes to process FORGET requests when all those inodes are evicted from the icache. To solve this, create a BATCH_FORGET request that allows up to about 8000 FORGET requests to be sent in a single message. This request is only sent if userspace supports interface version 7.16 or later, otherwise fall back to sending individual FORGET messages. Reported-by: NTerje Malmedal <terje.malmedal@usit.uio.no> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Terje Malmedal reports that a fuse filesystem with 32 million inodes on a machine with lots of memory can go unresponsive for up to 30 minutes when all those inodes are evicted from the icache. The reason is that FORGET messages, sent when the inode is evicted, are queued up together with regular filesystem requests, and while the huge queue of FORGET messages are processed no other filesystem operation can proceed. Since a full fuse request structure is allocated for each inode, these take up quite a bit of memory as well. To solve these issues, create a slim 'fuse_forget_link' structure containing just the minimum of information required to send the FORGET request and chain these on a separate queue. When userspace is asking for a request make sure that FORGET and non-FORGET requests are selected fairly: for each 8 non-FORGET allow 16 FORGET requests. This will make sure FORGETs do not pile up, yet other requests are also allowed to proceed while the queued FORGETs are processed. Reported-by: NTerje Malmedal <terje.malmedal@usit.uio.no> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 28 10月, 2010 1 次提交
-
-
由 Miklos Szeredi 提交于
Replace iterated page_cache_release() with release_pages(), which is faster and shorter. Needs release_pages() to be exported to modules. Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 10月, 2010 2 次提交
-
-
由 Miklos Szeredi 提交于
Commit 7909b1c6 ("fuse: don't use atomic kmap") removed KM_USER0 usage from fuse/dev.c. Switch KM_USER1 uses to KM_USER0 for clarity. Also replace open coded clear_highpage(). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Beulich 提交于
After all that's what they are intended for. Signed-off-by: NJan Beulich <jbeulich@novell.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 10月, 2010 1 次提交
-
-
由 Geert Uytterhoeven 提交于
fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this function Initialize total_len to zero, else its value will be undefined. Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 07 9月, 2010 2 次提交
-
-
由 Miklos Szeredi 提交于
Sparse doesn't understand lock annotations of the form __releases(&foo->lock). Change them to __releases(foo->lock). Same for __acquires(). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
David Bartly reported that fuse can hang in fuse_get_req_nofail() when the connection to the filesystem server is no longer active. If bg_queue is not empty then flush_bg_queue() called from request_end() can put more requests on to the pending queue. If this happens while ending requests on the processing queue then those background requests will be queued to the pending list and never ended. Another problem is that fuse_dev_release() didn't wake up processes sleeping on blocked_waitq. Solve this by: a) flushing the background queue before calling end_requests() on the pending and processing queues b) setting blocked = 0 and waking up processes waiting on blocked_waitq() Thanks to David for an excellent bug report. Reported-by: NDavid Bartley <andareed@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
-
- 12 7月, 2010 3 次提交
-
-
由 Miklos Szeredi 提交于
Userspace filesystem can request data to be retrieved from the inode's mapping. This request is synchronous and the retrieved data is queued as a new request. If the write to the fuse device returns an error then the retrieve request was not completed and a reply will not be sent. Only present pages are returned in the retrieve reply. Retrieving stops when it finds a non-present page and only data prior to that is returned. This request doesn't change the dirty state of pages. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Userspace filesystem can request data to be stored in the inode's mapping. This request is synchronous and has no reply. If the write to the fuse device returns an error then the store request was not fully completed (but may have updated some pages). If the stored data overflows the current file size, then the size is extended, similarly to a write(2) on the filesystem. Pages which have been completely stored are marked uptodate. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Don't use atomic kmap for mapping userspace buffers in device read/write/splice. This is necessary because the next patch (adding store notify) requires that caller of fuse_copy_page() may sleep between invocations. The simplest way to ensure this is to change the atomic kmaps to non-atomic ones. Thankfully architectures where kmap() is not a no-op are going out of fashion, so we can ignore the (probably negligible) performance impact of this change. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 26 5月, 2010 1 次提交
-
-
由 Kay Sievers 提交于
This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 25 5月, 2010 3 次提交
-
-
由 Miklos Szeredi 提交于
Allow userspace filesystem implementation to use splice() to read from the fuse device. The userspace filesystem can now transfer data coming from a WRITE request to an arbitrary file descriptor (regular file, block device or socket) without having to go through a userspace buffer. The semantics of using splice() to read messages are: 1) with a single splice() call move the whole message from the fuse device to a temporary pipe 2) read the header from the pipe and determine the message type 3a) if message is a WRITE then splice data from pipe to destination 3b) else read rest of message to userspace buffer Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
When splicing buffers to the fuse device with SPLICE_F_MOVE, try to move pages from the pipe buffer into the page cache. This allows populating the fuse filesystem's cache without ever touching the page contents, i.e. zero copy read capability. The following steps are performed when trying to move a page into the page cache: - buf->ops->confirm() to make sure the new page is uptodate - buf->ops->steal() to try to remove the new page from it's previous place - remove_from_page_cache() on the old page - add_to_page_cache_locked() on the new page If any of the above steps fail (non fatally) then the code falls back to copying the page. In particular ->steal() will fail if there are external references (other than the page cache and the pipe buffer) to the page. Also since the remove_from_page_cache() + add_to_page_cache_locked() are non-atomic it is possible that the page cache is repopulated in between the two and add_to_page_cache_locked() will fail. This could be fixed by creating a new atomic replace_page_cache_page() function. fuse_readpages_end() needed to be reworked so it works even if page->mapping is NULL for some or all pages which can happen if the add_to_page_cache_locked() failed. A number of sanity checks were added to make sure the stolen pages don't have weird flags set, etc... These could be moved into generic splice/steal code. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Allow userspace filesystem implementation to use splice() to write to the fuse device. The semantics of using splice() are: 1) buffer the message header and data in a temporary pipe 2) with a *single* splice() call move the message from the temporary pipe to the fuse device The READ reply message has the most interesting use for this, since now the data from an arbitrary file descriptor (which could be a regular file, a block device or a socket) can be tranferred into the fuse device without having to go through a userspace buffer. It will also allow zero copy moving of pages. One caveat is that the protocol on the fuse device requires the length of the whole message to be written into the header. But the length of the data transferred into the temporary pipe may not be known in advance. The current library implementation works around this by using vmplice to write the header and modifying the header after splicing the data into the pipe (error handling omitted): struct fuse_out_header out; iov.iov_base = &out; iov.iov_len = sizeof(struct fuse_out_header); vmsplice(pip[1], &iov, 1, 0); len = splice(input_fd, input_offset, pip[1], NULL, len, 0); /* retrospectively modify the header: */ out.len = len + sizeof(struct fuse_out_header); splice(pip[0], NULL, fuse_chan_fd(req->ch), NULL, out.len, flags); This works since vmsplice only saves a pointer to the data, it does not copy the data itself. Since pipes are currently limited to 16 pages and messages need to be spliced atomically, the length of the data is limited to 15 pages (or 60kB for 4k pages). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-