Commit 917f6dfb, author: Stefan Hajnoczi, committer: Caspar Zhang

virtio-fs: add virtiofs filesystem

task #28910367
commit a62a8ef9d97da23762a588592c8b8eb50a8deb6a upstream

Add a basic file system module for virtio-fs.  This does not yet contain
shared data support between host and guest or metadata coherency speedups.
However it is already significantly faster than virtio-9p.

Design Overview
===============

With the goal of designing something with better performance and local file
system semantics, several ideas were proposed.

 - Use fuse protocol (instead of 9p) for communication between guest and
   host.  Guest kernel will be fuse client and a fuse server will run on
   host to serve the requests.

 - For data access inside guest, mmap portion of file in QEMU address space
   and guest accesses this memory using dax.  That way guest page cache is
   bypassed and there is only one copy of data (on host).  This will also
   enable mmap(MAP_SHARED) between guests.

 - For metadata coherency, a shared memory region holds a version number
   associated with the metadata.  Any guest that changes metadata updates
   the version number, and other guests refresh their metadata on next
   access.  This is yet to be implemented.
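
The version-number scheme in the last item is not implemented by this
commit.  The following is only a minimal user-space sketch of the idea,
assuming a table of per-slot counters stands in for the shared memory
region.  All names (vertab, cached_attr, vertab_bump, ...) are hypothetical
and do not appear in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VERTAB_SLOTS 16 /* hypothetical table size */

/* Stand-in for the shared memory region: one version counter per slot. */
struct vertab {
	_Atomic uint64_t version[VERTAB_SLOTS];
};

/* A guest's private copy of metadata plus the version it was read at. */
struct cached_attr {
	uint64_t seen_version;
	uint64_t size; /* example metadata field */
	bool valid;
};

/* Called by whichever guest changes the metadata tracked by `slot`. */
static void vertab_bump(struct vertab *vt, size_t slot)
{
	assert(slot < VERTAB_SLOTS);
	atomic_fetch_add(&vt->version[slot], 1);
}

/* True when the cached copy may be used without asking the host. */
static bool cached_attr_is_fresh(struct vertab *vt, size_t slot,
				 const struct cached_attr *c)
{
	assert(slot < VERTAB_SLOTS);
	return c->valid && c->seen_version == atomic_load(&vt->version[slot]);
}

/*
 * Record metadata together with the current version.  A real
 * implementation would have to sample the version before fetching the
 * metadata from the host so that a concurrent update is not missed.
 */
static void cached_attr_fill(struct vertab *vt, size_t slot,
			     struct cached_attr *c, uint64_t size)
{
	assert(slot < VERTAB_SLOTS);
	c->seen_version = atomic_load(&vt->version[slot]);
	c->size = size;
	c->valid = true;
}
```

In this sketch the common case (metadata unchanged) costs one shared
memory read and no request to the host.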

How virtio-fs differs from existing approaches
==============================================

The unique idea behind virtio-fs is to take advantage of the co-location of
the virtual machine and hypervisor to avoid communication (vmexits).

DAX allows file contents to be accessed without communication with the
hypervisor.  The shared memory region for metadata avoids communication in
the common case where metadata is unchanged.

By replacing expensive communication with cheaper shared memory accesses,
we expect to achieve better performance than approaches based on network
file system protocols.  In addition, this also makes it easier to achieve
local file system semantics (coherency).

These techniques are not applicable to network file system protocols since
the communications channel is bypassed by taking advantage of shared memory
on a local machine.  This is why we decided to build virtio-fs rather than
focus on 9P or NFS.

Caching Modes
=============

Like virtio-9p, different caching modes are supported which determine the
coherency level as well.  The “cache=FOO” and “writeback” options control
the level of coherence between the guest and host filesystems.

 - cache=none
   metadata, data and pathname lookup are not cached in guest.  They are
   always fetched from host and any changes are immediately pushed to host.

 - cache=always
   metadata, data and pathname lookup are cached in guest and never expire.

 - cache=auto
   metadata and pathname lookup cache expires after a configured amount of
   time (default is 1 second).  Data is cached while the file is open
   (close to open consistency).

 - writeback/no_writeback
   These options control the writeback strategy.  If writeback is disabled,
   then normal writes will immediately be synchronized with the host fs.
   If writeback is enabled, then writes may be cached in the guest until
   the file is closed or an fsync(2) performed.  This option has no effect
   on mmap-ed writes or writes going through the DAX mechanism.
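
As a rough illustration of how the three cache= modes differ for metadata
and pathname lookups, the decision can be sketched as a single predicate.
This is a user-space sketch under stated assumptions, not code from the
patch: the enum, the function name and the millisecond units are all
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the cache=none/auto/always options. */
enum cache_mode { CACHE_NONE, CACHE_AUTO, CACHE_ALWAYS };

/* The text above gives 1 second as the default timeout for cache=auto. */
#define DEFAULT_TIMEOUT_MS 1000u

/*
 * Decide whether a cached metadata or lookup entry that is `age_ms` old
 * may still be used without going back to the host.
 */
static bool metadata_cache_valid(enum cache_mode mode, uint64_t age_ms,
				 uint64_t timeout_ms)
{
	switch (mode) {
	case CACHE_NONE:
		return false;               /* always fetch from host */
	case CACHE_ALWAYS:
		return true;                /* entries never expire */
	case CACHE_AUTO:
		return age_ms < timeout_ms; /* expire after the timeout */
	}
	assert(!"unknown cache mode");
	return false;
}
```

With the default timeout, an entry that is 999 ms old is still used under
cache=auto, while one that is 1000 ms old is refetched.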

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

(cherry picked from commit a62a8ef9d97da23762a588592c8b8eb50a8deb6a)
[Liubo: given that 4.19 lacks the support of fs_context to parse mount
option, here I just change it back to the 4.19 way, so we still use -o
tag=myfs-1 to get virtiofs mount.]
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Parent 3fcc16fc
......@@ -26,3 +26,14 @@ config CUSE
If you want to develop or use a userspace character device
based on CUSE, answer Y or M.
config VIRTIO_FS
tristate "Virtio Filesystem"
depends on FUSE_FS
select VIRTIO
help
The Virtio Filesystem allows guests to mount file systems from the
host.
If you want to share files between guests or with the host, answer Y
or M.
......@@ -4,5 +4,6 @@
obj-$(CONFIG_FUSE_FS) += fuse.o
obj-$(CONFIG_CUSE) += cuse.o
obj-$(CONFIG_VIRTIO_FS) += virtio_fs.o
fuse-objs := dev.o dir.o file.o inode.o control.o xattr.o acl.o
......@@ -56,17 +56,39 @@ extern unsigned max_user_congthresh;
/** Mount options */
struct fuse_mount_data {
int fd;
const char *tag; /* lifetime: .fill_super() data argument */
unsigned rootmode;
kuid_t user_id;
kgid_t group_id;
unsigned fd_present:1;
unsigned tag_present:1;
unsigned rootmode_present:1;
unsigned user_id_present:1;
unsigned group_id_present:1;
unsigned default_permissions:1;
unsigned allow_other:1;
unsigned dax:1;
unsigned destroy:1;
unsigned max_read;
unsigned blksize;
/* DAX device, may be NULL */
struct dax_device *dax_dev;
/* fuse input queue operations */
const struct fuse_iqueue_ops *fiq_ops;
/* device-specific state for fuse_iqueue */
void *fiq_priv;
/* fuse_dev pointer to fill in, should contain NULL on entry */
void **fudptr;
/* version table length in bytes */
size_t vertab_len;
/* version table kernel address */
void *vertab_kaddr;
};
/* One forget request */
......@@ -398,6 +420,11 @@ struct fuse_req {
/** Request is stolen from fuse_file->reserved_req */
struct file *stolen_file;
#if IS_ENABLED(CONFIG_VIRTIO_FS)
/** virtio-fs's physically contiguous buffer for in and out args */
void *argbuf;
#endif
};
struct fuse_iqueue;
......@@ -428,6 +455,11 @@ struct fuse_iqueue_ops {
*/
void (*wake_pending_and_unlock)(struct fuse_iqueue *fiq)
__releases(fiq->lock);
/**
* Clean up when fuse_iqueue is destroyed
*/
void (*release)(struct fuse_iqueue *fiq);
};
/** /dev/fuse input queue operations */
......@@ -981,12 +1013,16 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
* @fudptr: fuse_dev pointer to fill in, should contain NULL on entry
*/
int fuse_fill_super_common(struct super_block *sb,
struct fuse_mount_data *mount_data,
const struct fuse_iqueue_ops *fiq_ops,
void *fiq_priv,
void **fudptr);
struct fuse_mount_data *mount_data);
void fuse_send_init(struct fuse_conn *fc, struct fuse_req *req);
/**
* Disassociate fuse connection from superblock and kill the superblock
*
* Calls kill_anon_super(), do not use with bdev mounts.
*/
void fuse_kill_sb_anon(struct super_block *sb);
/**
* Add connection to control filesystem
*/
......
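
The release hook added to fuse_iqueue_ops above is optional, and
fuse_conn_put() in this commit checks it for NULL before calling it once
the last reference is dropped.  The pattern can be sketched in user space
as follows.  The struct and function names here are simplified,
hypothetical stand-ins for the kernel types, and the plain int counter
replaces the kernel's refcount_t.

```c
#include <assert.h>
#include <stddef.h>

struct iqueue;

/* Operations table; `release` is optional and may be NULL. */
struct iqueue_ops {
	void (*release)(struct iqueue *q);
};

struct iqueue {
	const struct iqueue_ops *ops;
	void *priv; /* transport-specific state */
};

struct conn {
	int refcount; /* not atomic: single-threaded sketch only */
	struct iqueue iq;
};

/* Drop one reference; returns 1 if this was the last one. */
static int conn_put(struct conn *c)
{
	assert(c->refcount > 0);
	if (--c->refcount != 0)
		return 0;
	/* Give the transport a chance to clean up, if it asked for one. */
	if (c->iq.ops->release)
		c->iq.ops->release(&c->iq);
	return 1;
}

/* Example hook: counts how often it was invoked via q->priv. */
static void count_release(struct iqueue *q)
{
	++*(int *)q->priv;
}
```

Keeping the hook optional means a transport with nothing to clean up can
leave the pointer NULL instead of supplying an empty function.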
......@@ -432,6 +432,7 @@ static int fuse_statfs(struct dentry *dentry, struct kstatfs *buf)
enum {
OPT_FD,
OPT_TAG,
OPT_ROOTMODE,
OPT_USER_ID,
OPT_GROUP_ID,
......@@ -444,6 +445,7 @@ enum {
static const match_table_t tokens = {
{OPT_FD, "fd=%u"},
{OPT_TAG, "tag=%s"},
{OPT_ROOTMODE, "rootmode=%o"},
{OPT_USER_ID, "user_id=%u"},
{OPT_GROUP_ID, "group_id=%u"},
......@@ -466,7 +468,7 @@ static int fuse_match_uint(substring_t *s, unsigned int *res)
}
int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
struct user_namespace *user_ns)
struct user_namespace *user_ns)
{
char *p;
memset(d, 0, sizeof(struct fuse_mount_data));
......@@ -490,6 +492,11 @@ int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev,
d->fd_present = 1;
break;
case OPT_TAG:
d->tag = args[0].from;
d->tag_present = 1;
break;
case OPT_ROOTMODE:
if (match_octal(&args[0], &value))
return 0;
......@@ -624,8 +631,12 @@ EXPORT_SYMBOL_GPL(fuse_conn_init);
void fuse_conn_put(struct fuse_conn *fc)
{
if (refcount_dec_and_test(&fc->count)) {
struct fuse_iqueue *fiq = &fc->iq;
if (fc->destroy_req)
fuse_request_free(fc->destroy_req);
if (fiq->ops->release)
fiq->ops->release(fiq);
put_pid_ns(fc->pid_ns);
put_user_ns(fc->user_ns);
fc->release(fc);
......@@ -1062,10 +1073,7 @@ void fuse_dev_free(struct fuse_dev *fud)
EXPORT_SYMBOL_GPL(fuse_dev_free);
int fuse_fill_super_common(struct super_block *sb,
struct fuse_mount_data *mount_data,
const struct fuse_iqueue_ops *fiq_ops,
void *fiq_priv,
void **fudptr)
struct fuse_mount_data *mount_data)
{
struct fuse_dev *fud;
struct fuse_conn *fc;
......@@ -1112,7 +1120,8 @@ int fuse_fill_super_common(struct super_block *sb,
if (!fc)
goto err;
fuse_conn_init(fc, sb->s_user_ns, fiq_ops, fiq_priv);
fuse_conn_init(fc, sb->s_user_ns, mount_data->fiq_ops,
mount_data->fiq_priv);
fc->release = fuse_free_conn;
fud = fuse_dev_alloc_install(fc);
......@@ -1148,7 +1157,7 @@ int fuse_fill_super_common(struct super_block *sb,
/* Root dentry doesn't have .d_revalidate */
sb->s_d_op = &fuse_dentry_operations;
if (is_bdev) {
if (mount_data->destroy) {
fc->destroy_req = fuse_request_alloc(0);
if (!fc->destroy_req)
goto err_put_root;
......@@ -1156,7 +1165,7 @@ int fuse_fill_super_common(struct super_block *sb,
mutex_lock(&fuse_mutex);
err = -EINVAL;
if (*fudptr)
if (*mount_data->fudptr)
goto err_unlock;
err = fuse_ctl_add_conn(fc);
......@@ -1165,7 +1174,7 @@ int fuse_fill_super_common(struct super_block *sb,
list_add_tail(&fc->entry, &fuse_conn_list);
sb->s_root = root_dentry;
*fudptr = fud;
*mount_data->fudptr = fud;
mutex_unlock(&fuse_mutex);
return 0;
......@@ -1214,8 +1223,11 @@ static int fuse_fill_super(struct super_block *sb, void *data, int silent)
goto err_fput;
__set_bit(FR_BACKGROUND, &init_req->flags);
err = fuse_fill_super_common(sb, &d, &fuse_dev_fiq_ops, NULL,
&file->private_data);
d.fiq_ops = &fuse_dev_fiq_ops;
d.fiq_priv = NULL;
d.fudptr = &file->private_data;
d.destroy = is_bdev;
err = fuse_fill_super_common(sb, &d);
if (err < 0)
goto err_free_init_req;
/*
......@@ -1258,11 +1270,12 @@ static void fuse_sb_destroy(struct super_block *sb)
}
}
static void fuse_kill_sb_anon(struct super_block *sb)
void fuse_kill_sb_anon(struct super_block *sb)
{
fuse_sb_destroy(sb);
kill_anon_super(sb);
}
EXPORT_SYMBOL_GPL(fuse_kill_sb_anon);
static struct file_system_type fuse_fs_type = {
.owner = THIS_MODULE,
......
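
The OPT_TAG handling above relies on the kernel's match_token() machinery.
To illustrate what a tag=myfs-1 mount option looks like to a parser, here
is a minimal user-space sketch that locates the tag value in a
comma-separated option string.  The function name find_tag_opt is
hypothetical, and unlike the kernel code this sketch does not modify the
option string in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find "tag=<value>" in a comma-separated option string.
 * On success, stores a pointer to the value and its length and returns 1.
 * Returns 0 if no tag option is present.
 */
static int find_tag_opt(const char *opts, const char **val, size_t *len)
{
	static const char key[] = "tag=";
	const size_t klen = sizeof(key) - 1;
	const char *p = opts;

	assert(opts && val && len);
	while (*p) {
		const char *end = strchr(p, ',');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		if (n >= klen && strncmp(p, key, klen) == 0) {
			*val = p + klen;
			*len = n - klen;
			return 1;
		}
		p += n;       /* skip this option */
		if (*p == ',')
			p++;  /* and the separator, if any */
	}
	return 0;
}
```

The value is returned as a pointer and length rather than a copy, so the
caller decides how long it needs to live.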
(This diff is collapsed.)
/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
#ifndef _UAPI_LINUX_VIRTIO_FS_H
#define _UAPI_LINUX_VIRTIO_FS_H
#include <linux/types.h>
#include <linux/virtio_ids.h>
#include <linux/virtio_config.h>
#include <linux/virtio_types.h>
struct virtio_fs_config {
/* Filesystem name (UTF-8, not NUL-terminated, padded with NULs) */
__u8 tag[36];
/* Number of request queues */
__u32 num_request_queues;
} __attribute__((packed));
#endif /* _UAPI_LINUX_VIRTIO_FS_H */
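
Because the tag field is padded with NULs but not guaranteed to be
NUL-terminated, a consumer cannot treat it as a C string directly when all
36 bytes are in use.  A minimal sketch of a safe conversion follows.  The
helper name tag_to_cstr and the VIRTIO_FS_TAG_LEN macro are hypothetical
and are not defined by the header above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Size of virtio_fs_config.tag in the header above. */
#define VIRTIO_FS_TAG_LEN 36

/*
 * Copy a NUL-padded, possibly unterminated tag into a C string.
 * `out` must have room for VIRTIO_FS_TAG_LEN + 1 bytes.
 * Returns the length of the tag in bytes.
 */
static size_t tag_to_cstr(const uint8_t tag[VIRTIO_FS_TAG_LEN],
			  char out[VIRTIO_FS_TAG_LEN + 1])
{
	size_t len = 0;

	assert(tag && out);
	/* Stop at the first NUL pad byte or at the end of the field. */
	while (len < VIRTIO_FS_TAG_LEN && tag[len] != 0)
		len++;
	memcpy(out, tag, len);
	out[len] = '\0';
	return len;
}
```

The output buffer is one byte larger than the field so that a tag using
all 36 bytes still gets a terminator.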