- 25 2月, 2011 1 次提交
-
-
由 Miklos Szeredi 提交于
Single threaded NTFS-3G could get stuck if a delayed RELEASE reply triggered a DESTROY request via path_put(). Fix this by a) making RELEASE requests synchronous, whenever possible, on fuseblk filesystems b) if not possible (triggered by an asynchronous read/write) then do the path_put() in a separate thread with schedule_work(). Reported-by: NOliver Neukum <oneukum@suse.de> Cc: stable@kernel.org Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 13 1月, 2011 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 07 1月, 2011 4 次提交
-
-
由 Nick Piggin 提交于
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Require filesystems be aware of .d_revalidate being called in rcu-walk mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning -ECHILD from all implementations. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
Reduce some branches and memory accesses in dcache lookup by adding dentry flags to indicate common d_ops are set, rather than having to check them. This saves a pointer memory access (dentry->d_op) in common path lookup situations, and saves another pointer load and branch in cases where we have d_op but not the particular operation. Patched with: git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
由 Nick Piggin 提交于
RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
- 08 12月, 2010 4 次提交
-
-
由 Miklos Szeredi 提交于
In kernel ABI version 7.16 and later FUSE_IOCTL_RETRY reply from a unrestricted IOCTL request shall return with an array of 'struct fuse_ioctl_iovec' instead of 'struct iovec'. This fixes the ABI ambiguity of 32bit vs. 64bit. Reported-by: N"ccmail111" <ccmail111@yahoo.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: Tejun Heo <tj@kernel.org>
-
由 Miklos Szeredi 提交于
Terje Malmedal reports that a fuse filesystem with 32 million inodes on a machine with lots of memory can take up to 30 minutes to process FORGET requests when all those inodes are evicted from the icache. To solve this, create a BATCH_FORGET request that allows up to about 8000 FORGET requests to be sent in a single message. This request is only sent if userspace supports interface version 7.16 or later, otherwise fall back to sending individual FORGET messages. Reported-by: NTerje Malmedal <terje.malmedal@usit.uio.no> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Terje Malmedal reports that a fuse filesystem with 32 million inodes on a machine with lots of memory can go unresponsive for up to 30 minutes when all those inodes are evicted from the icache. The reason is that FORGET messages, sent when the inode is evicted, are queued up together with regular filesystem requests, and while the huge queue of FORGET messages are processed no other filesystem operation can proceed. Since a full fuse request structure is allocated for each inode, these take up quite a bit of memory as well. To solve these issues, create a slim 'fuse_forget_link' structure containing just the minimum of information required to send the FORGET request and chain these on a separate queue. When userspace is asking for a request make sure that FORGET and non-FORGET requests are selected fairly: for each 8 non-FORGET allow 16 FORGET requests. This will make sure FORGETs do not pile up, yet other requests are also allowed to proceed while the queued FORGETs are processed. Reported-by: NTerje Malmedal <terje.malmedal@usit.uio.no> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Get rid of unnecessary page_address()-es. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: Tejun Heo <tj@kernel.org>
-
- 30 11月, 2010 2 次提交
-
-
由 Miklos Szeredi 提交于
Verify that the total length of the iovec returned in FUSE_IOCTL_RETRY doesn't overflow iov_length(). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: Tejun Heo <tj@kernel.org> CC: <stable@kernel.org> [2.6.31+]
-
由 Miklos Szeredi 提交于
If a 32bit CUSE server is run on 64bit this results in EIO being returned to the caller. The reason is that FUSE_IOCTL_RETRY reply was defined to use 'struct iovec', which is different on 32bit and 64bit archs. Work around this by looking at the size of the reply to determine which struct was used. This is only needed if CONFIG_COMPAT is defined. A more permanent fix for the interface will be to use the same struct on both 32bit and 64bit. Reported-by: N"ccmail111" <ccmail111@yahoo.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: Tejun Heo <tj@kernel.org> CC: <stable@kernel.org> [2.6.31+]
-
- 25 11月, 2010 1 次提交
-
-
由 Ken Sumrall 提交于
The attribute cache for a file was not being cleared when a file is opened with O_TRUNC. If the filesystem's open operation truncates the file ("atomic_o_trunc" feature flag is set) then the kernel should invalidate the cached st_mtime and st_ctime attributes. Also i_size should be explicitly be set to zero as it is used sometimes without refreshing the cache. Signed-off-by: NKen Sumrall <ksumrall@android.com> Cc: Anfei <anfei.zhou@gmail.com> Cc: "Anand V. Avati" <avati@gluster.com> Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 10月, 2010 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and switch of the obvious get_sb_bdev() users to ->mount() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 10月, 2010 1 次提交
-
-
由 Miklos Szeredi 提交于
Replace iterated page_cache_release() with release_pages(), which is faster and shorter. Needs release_pages() to be exported to modules. Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 10月, 2010 2 次提交
-
-
由 Miklos Szeredi 提交于
Commit 7909b1c6 ("fuse: don't use atomic kmap") removed KM_USER0 usage from fuse/dev.c. Switch KM_USER1 uses to KM_USER0 for clarity. Also replace open coded clear_highpage(). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Cc: Jan Beulich <jbeulich@novell.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Beulich 提交于
After all that's what they are intended for. Signed-off-by: NJan Beulich <jbeulich@novell.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 10月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 10月, 2010 1 次提交
-
-
由 Arnd Bergmann 提交于
All file_operations should get a .llseek operation so we can make nonseekable_open the default for future file operations without a .llseek pointer. The three cases that we can automatically detect are no_llseek, seq_lseek and default_llseek. For cases where we can we can automatically prove that the file offset is always ignored, we use noop_llseek, which maintains the current behavior of not returning an error from a seek. New drivers should normally not use noop_llseek but instead use no_llseek and call nonseekable_open at open time. Existing drivers can be converted to do the same when the maintainer knows for certain that no user code relies on calling seek on the device file. The generated code is often incorrectly indented and right now contains comments that clarify for each added line why a specific variant was chosen. In the version that gets submitted upstream, the comments will be gone and I will manually fix the indentation, because there does not seem to be a way to do that using coccinelle. Some amount of new code is currently sitting in linux-next that should get the same modifications, which I will do at the end of the merge window. Many thanks to Julia Lawall for helping me learn to write a semantic patch that does all this. ===== begin semantic patch ===== // This adds an llseek= method to all file operations, // as a preparation for making no_llseek the default. // // The rules are // - use no_llseek explicitly if we do nonseekable_open // - use seq_lseek for sequential files // - use default_llseek if we know we access f_pos // - use noop_llseek if we know we don't access f_pos, // but we still want to allow users to call lseek // @ open1 exists @ identifier nested_open; @@ nested_open(...) { <+... nonseekable_open(...) ...+> } @ open exists@ identifier open_f; identifier i, f; identifier open1.nested_open; @@ int open_f(struct inode *i, struct file *f) { <+... ( nonseekable_open(...) | nested_open(...) ) ...+> } @ read disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ read_no_fpos disable optional_qualifier exists @ identifier read_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off) { ... when != off } @ write @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; expression E; identifier func; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { <+... ( *off = E | *off += E | func(..., off, ...) | E = *off ) ...+> } @ write_no_fpos @ identifier write_f; identifier f, p, s, off; type ssize_t, size_t, loff_t; @@ ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off) { ... when != off } @ fops0 @ identifier fops; @@ struct file_operations fops = { ... }; @ has_llseek depends on fops0 @ identifier fops0.fops; identifier llseek_f; @@ struct file_operations fops = { ... .llseek = llseek_f, ... }; @ has_read depends on fops0 @ identifier fops0.fops; identifier read_f; @@ struct file_operations fops = { ... .read = read_f, ... }; @ has_write depends on fops0 @ identifier fops0.fops; identifier write_f; @@ struct file_operations fops = { ... .write = write_f, ... }; @ has_open depends on fops0 @ identifier fops0.fops; identifier open_f; @@ struct file_operations fops = { ... .open = open_f, ... }; // use no_llseek if we call nonseekable_open //////////////////////////////////////////// @ nonseekable1 depends on !has_llseek && has_open @ identifier fops0.fops; identifier nso ~= "nonseekable_open"; @@ struct file_operations fops = { ... .open = nso, ... +.llseek = no_llseek, /* nonseekable */ }; @ nonseekable2 depends on !has_llseek @ identifier fops0.fops; identifier open.open_f; @@ struct file_operations fops = { ... .open = open_f, ... +.llseek = no_llseek, /* open uses nonseekable */ }; // use seq_lseek for sequential files ///////////////////////////////////// @ seq depends on !has_llseek @ identifier fops0.fops; identifier sr ~= "seq_read"; @@ struct file_operations fops = { ... .read = sr, ... +.llseek = seq_lseek, /* we have seq_read */ }; // use default_llseek if there is a readdir /////////////////////////////////////////// @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier readdir_e; @@ // any other fop is used that changes pos struct file_operations fops = { ... .readdir = readdir_e, ... +.llseek = default_llseek, /* readdir is present */ }; // use default_llseek if at least one of read/write touches f_pos ///////////////////////////////////////////////////////////////// @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read.read_f; @@ // read fops use offset struct file_operations fops = { ... .read = read_f, ... +.llseek = default_llseek, /* read accesses f_pos */ }; @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, ... + .llseek = default_llseek, /* write accesses f_pos */ }; // Use noop_llseek if neither read nor write accesses f_pos /////////////////////////////////////////////////////////// @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; identifier write_no_fpos.write_f; @@ // write fops use offset struct file_operations fops = { ... .write = write_f, .read = read_f, ... +.llseek = noop_llseek, /* read and write both use no f_pos */ }; @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier write_no_fpos.write_f; @@ struct file_operations fops = { ... .write = write_f, ... +.llseek = noop_llseek, /* write uses no f_pos */ }; @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; identifier read_no_fpos.read_f; @@ struct file_operations fops = { ... .read = read_f, ... +.llseek = noop_llseek, /* read uses no f_pos */ }; @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @ identifier fops0.fops; @@ struct file_operations fops = { ... +.llseek = noop_llseek, /* no read or write fn */ }; ===== End semantic patch ===== Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Julia Lawall <julia@diku.dk> Cc: Christoph Hellwig <hch@infradead.org>
-
- 04 10月, 2010 1 次提交
-
-
由 Geert Uytterhoeven 提交于
fs/fuse/dev.c:1357: warning: ‘total_len’ may be used uninitialized in this function Initialize total_len to zero, else its value will be undefined. Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 07 9月, 2010 2 次提交
-
-
由 Miklos Szeredi 提交于
Sparse doesn't understand lock annotations of the form __releases(&foo->lock). Change them to __releases(foo->lock). Same for __acquires(). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
David Bartly reported that fuse can hang in fuse_get_req_nofail() when the connection to the filesystem server is no longer active. If bg_queue is not empty then flush_bg_queue() called from request_end() can put more requests on to the pending queue. If this happens while ending requests on the processing queue then those background requests will be queued to the pending list and never ended. Another problem is that fuse_dev_release() didn't wake up processes sleeping on blocked_waitq. Solve this by: a) flushing the background queue before calling end_requests() on the pending and processing queues b) setting blocked = 0 and waking up processes waiting on blocked_waitq() Thanks to David for an excellent bug report. Reported-by: NDavid Bartley <andareed@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org
-
- 10 8月, 2010 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Make sure we check the truncate constraints early on in ->setattr by adding those checks to inode_change_ok. Also clean up and document inode_change_ok to make this obvious. As a fallout we don't have to call inode_newsize_ok from simple_setsize and simplify it down to a truncate_setsize which doesn't return an error. This simplifies a lot of setattr implementations and means we use truncate_setsize almost everywhere. Get rid of fat_setsize now that it's trivial and mark ext2_setsize static to make the calling convention obvious. Keep the inode_newsize_ok in vmtruncate for now as all callers need an audit for its removal anyway. Note: setattr code in ecryptfs doesn't call inode_change_ok at all and needs a deeper audit, but that is left for later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Make sure we call inode_change_ok before doing any changes in ->setattr, and make sure to call it even if our fs wants to ignore normal UNIX permissions, but use the ATTR_FORCE to skip those. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 02 8月, 2010 1 次提交
-
-
由 Eric Paris 提交于
Currently MAY_ACCESS means that filesystems must check the permissions right then and not rely on cached results or the results of future operations on the object. This can be because of a call to sys_access() or because of a call to chdir() which needs to check search without relying on any future operations inside that dir. I plan to use MAY_ACCESS for other purposes in the security system, so I split the MAY_ACCESS and the MAY_CHDIR cases. Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NStephen D. Smalley <sds@tycho.nsa.gov> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 12 7月, 2010 3 次提交
-
-
由 Miklos Szeredi 提交于
Userspace filesystem can request data to be retrieved from the inode's mapping. This request is synchronous and the retrieved data is queued as a new request. If the write to the fuse device returns an error then the retrieve request was not completed and a reply will not be sent. Only present pages are returned in the retrieve reply. Retrieving stops when it finds a non-present page and only data prior to that is returned. This request doesn't change the dirty state of pages. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Userspace filesystem can request data to be stored in the inode's mapping. This request is synchronous and has no reply. If the write to the fuse device returns an error then the store request was not fully completed (but may have updated some pages). If the stored data overflows the current file size, then the size is extended, similarly to a write(2) on the filesystem. Pages which have been completely stored are marked uptodate. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Don't use atomic kmap for mapping userspace buffers in device read/write/splice. This is necessary because the next patch (adding store notify) requires that caller of fuse_copy_page() may sleep between invocations. The simplest way to ensure this is to change the atomic kmaps to non-atomic ones. Thankfully architectures where kmap() is not a no-op are going out of fashion, so we can ignore the (probably negligible) performance impact of this change. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 28 5月, 2010 1 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 5月, 2010 1 次提交
-
-
由 Kay Sievers 提交于
This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 25 5月, 2010 6 次提交
-
-
由 Miklos Szeredi 提交于
Allow userspace filesystem implementation to use splice() to read from the fuse device. The userspace filesystem can now transfer data coming from a WRITE request to an arbitrary file descriptor (regular file, block device or socket) without having to go through a userspace buffer. The semantics of using splice() to read messages are: 1) with a single splice() call move the whole message from the fuse device to a temporary pipe 2) read the header from the pipe and determine the message type 3a) if message is a WRITE then splice data from pipe to destination 3b) else read rest of message to userspace buffer Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
When splicing buffers to the fuse device with SPLICE_F_MOVE, try to move pages from the pipe buffer into the page cache. This allows populating the fuse filesystem's cache without ever touching the page contents, i.e. zero copy read capability. The following steps are performed when trying to move a page into the page cache: - buf->ops->confirm() to make sure the new page is uptodate - buf->ops->steal() to try to remove the new page from it's previous place - remove_from_page_cache() on the old page - add_to_page_cache_locked() on the new page If any of the above steps fail (non fatally) then the code falls back to copying the page. In particular ->steal() will fail if there are external references (other than the page cache and the pipe buffer) to the page. Also since the remove_from_page_cache() + add_to_page_cache_locked() are non-atomic it is possible that the page cache is repopulated in between the two and add_to_page_cache_locked() will fail. This could be fixed by creating a new atomic replace_page_cache_page() function. fuse_readpages_end() needed to be reworked so it works even if page->mapping is NULL for some or all pages which can happen if the add_to_page_cache_locked() failed. A number of sanity checks were added to make sure the stolen pages don't have weird flags set, etc... These could be moved into generic splice/steal code. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Allow userspace filesystem implementation to use splice() to write to the fuse device. The semantics of using splice() are: 1) buffer the message header and data in a temporary pipe 2) with a *single* splice() call move the message from the temporary pipe to the fuse device The READ reply message has the most interesting use for this, since now the data from an arbitrary file descriptor (which could be a regular file, a block device or a socket) can be tranferred into the fuse device without having to go through a userspace buffer. It will also allow zero copy moving of pages. One caveat is that the protocol on the fuse device requires the length of the whole message to be written into the header. But the length of the data transferred into the temporary pipe may not be known in advance. The current library implementation works around this by using vmplice to write the header and modifying the header after splicing the data into the pipe (error handling omitted): struct fuse_out_header out; iov.iov_base = &out; iov.iov_len = sizeof(struct fuse_out_header); vmsplice(pip[1], &iov, 1, 0); len = splice(input_fd, input_offset, pip[1], NULL, len, 0); /* retrospectively modify the header: */ out.len = len + sizeof(struct fuse_out_header); splice(pip[0], NULL, fuse_chan_fd(req->ch), NULL, out.len, flags); This works since vmsplice only saves a pointer to the data, it does not copy the data itself. Since pipes are currently limited to 16 pages and messages need to be spliced atomically, the length of the data is limited to 15 pages (or 60kB for 4k pages). Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Acquire a page ref on pages in ->readpages() and release them when the read has finished. Not acquiring a reference didn't seem to cause any trouble since the page is locked and will not be kicked out of the page cache during the read. However the following patches will want to remove the page from the cache so a separate ref is needed. Making the reference in req->pages explicit also makes the code easier to understand. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Miklos Szeredi 提交于
Replace uses of get_user_pages() with get_user_pages_fast(). It looks nicer and should be faster in most cases. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
由 Dan Carpenter 提交于
"map" isn't needed any more after: 0bd87182 "fuse: fix kunmap in fuse_ioctl_copy_user" Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
-
- 30 3月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: NTejun Heo <tj@kernel.org> Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
-