    virtio-fs: add virtiofs filesystem · 917f6dfb
    Committed by Stefan Hajnoczi
    task #28910367
    commit a62a8ef9d97da23762a588592c8b8eb50a8deb6a upstream
    
    Add a basic file system module for virtio-fs.  This does not yet contain
    shared data support between host and guest or metadata coherency speedups.
    However it is already significantly faster than virtio-9p.
    
    Design Overview
    ===============
    
    With the goal of better performance and local file system semantics,
    several ideas were proposed:
    
     - Use the FUSE protocol (instead of 9P) for communication between guest
       and host.  The guest kernel acts as the FUSE client and a FUSE server
       runs on the host to serve the requests.
    
     - For data access inside the guest, mmap a portion of the file into the
       QEMU address space; the guest accesses this memory using DAX.  That way
       the guest page cache is bypassed and there is only one copy of the data
       (on the host).  This will also enable mmap(MAP_SHARED) between guests.
    
     - For metadata coherency, there is a shared memory region which contains
       a version number associated with the metadata.  Any guest changing
       metadata updates the version number, and other guests refresh their
       metadata on next access.  This is yet to be implemented.
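    
    The version-number scheme in the last item can be sketched as follows.
    This is only an illustration of the idea, not code from this patch (the
    feature is not implemented yet).  The struct and function names are
    hypothetical, and a real shared region between guests would need a
    defined layout and memory-ordering rules beyond plain C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slot in the shared memory region, visible to all guests. */
struct shared_meta_slot {
    _Atomic uint64_t version;   /* bumped whenever the metadata changes */
};

/* Hypothetical per-guest cached copy of one file's metadata. */
struct guest_meta_cache {
    uint64_t seen_version;      /* slot version when the cache was filled */
    uint32_t mode;              /* example cached attribute */
    bool valid;                 /* false until the first refresh */
};

/* A guest that changes metadata announces it by bumping the version. */
void meta_change(struct shared_meta_slot *slot)
{
    atomic_fetch_add(&slot->version, 1);
}

/* Cheap check on access: a shared memory read, no request to the host. */
bool meta_needs_refresh(struct guest_meta_cache *c,
                        struct shared_meta_slot *slot)
{
    return !c->valid || c->seen_version != atomic_load(&slot->version);
}

/* Slow path: called after fetching fresh metadata from the host. */
void meta_refresh(struct guest_meta_cache *c,
                  struct shared_meta_slot *slot,
                  uint32_t mode_from_host)
{
    c->seen_version = atomic_load(&slot->version);
    c->mode = mode_from_host;
    c->valid = true;
}
```

    In the common case the version is unchanged, so meta_needs_refresh()
    returns false and the guest keeps using its cached copy.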
    
    How virtio-fs differs from existing approaches
    ==============================================
    
    The unique idea behind virtio-fs is to take advantage of the co-location of
    the virtual machine and hypervisor to avoid communication (vmexits).
    
    DAX allows file contents to be accessed without communication with the
    hypervisor.  The shared memory region for metadata avoids communication in
    the common case where metadata is unchanged.
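    
    As a rough userspace analogy for the data path (not the virtio-fs or QEMU
    code itself), the sketch below maps a window of a file with MAP_SHARED so
    that later loads and stores go straight to the mapped pages rather than
    through read(2)/write(2) calls.  The helper name map_file_window() is
    hypothetical; the offset is assumed to be page-aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Map `len` bytes of the file behind `fd`, starting at the page-aligned
 * `offset`, as a shared read/write mapping.  Returns NULL on failure.
 * Every process mapping the same range with MAP_SHARED sees one copy of
 * the data.
 */
void *map_file_window(int fd, off_t offset, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);

    return p == MAP_FAILED ? NULL : p;
}

/* Release a window obtained from map_file_window(). */
int unmap_file_window(void *p, size_t len)
{
    return munmap(p, len);
}
```

    Once the window is mapped, accessing file contents is an ordinary memory
    access, which is the property DAX gives the guest.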
    
    By replacing expensive communication with cheaper shared memory accesses,
    we expect to achieve better performance than approaches based on network
    file system protocols.  In addition, this also makes it easier to achieve
    local file system semantics (coherency).
    
    These techniques are not applicable to network file system protocols since
    the communications channel is bypassed by taking advantage of shared memory
    on a local machine.  This is why we decided to build virtio-fs rather than
    focus on 9P or NFS.
    
    Caching Modes
    =============
    
    As with virtio-9p, several caching modes are supported.  The “cache=FOO”
    and “writeback” options control the level of coherence between the guest
    and host filesystems.
    
     - cache=none
       Metadata, data and pathname lookups are not cached in the guest.  They
       are always fetched from the host and any changes are immediately pushed
       to the host.
    
     - cache=always
       Metadata, data and pathname lookups are cached in the guest and never
       expire.
    
     - cache=auto
       The metadata and pathname lookup caches expire after a configured
       amount of time (default is 1 second).  Data is cached while the file is
       open (close-to-open consistency).
    
     - writeback/no_writeback
       These options control the writeback strategy.  If writeback is disabled,
       then normal writes will immediately be synchronized with the host fs.
       If writeback is enabled, then writes may be cached in the guest until
       the file is closed or an fsync(2) performed.  This option has no effect
       on mmap-ed writes or writes going through the DAX mechanism.
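    
    The metadata side of the three cache= modes can be summarised by a small
    decision function.  This is a sketch of the policy described above, not
    the implementation in this patch; the enum and function names are
    hypothetical, and the timeout is passed in rather than read from a mount
    option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three cache= settings. */
enum cache_mode { CACHE_NONE, CACHE_AUTO, CACHE_ALWAYS };

/*
 * Decide whether a cached metadata or lookup entry may be used without
 * asking the host.  `age_sec` is the time since the entry was filled and
 * `timeout_sec` is the configured expiry (1 second by default for
 * cache=auto according to the description above).
 */
bool meta_cache_usable(enum cache_mode mode, double age_sec,
                       double timeout_sec)
{
    switch (mode) {
    case CACHE_NONE:
        return false;                   /* always fetch from the host */
    case CACHE_ALWAYS:
        return true;                    /* entries never expire */
    case CACHE_AUTO:
        return age_sec < timeout_sec;   /* valid until the timeout */
    }
    return false;                       /* unknown mode: be conservative */
}
```

    For example, with cache=auto and the default timeout, an entry that is
    0.5 seconds old is reused while one that is 1.5 seconds old is refetched.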
    
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
    Acked-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    
    (cherry picked from commit a62a8ef9d97da23762a588592c8b8eb50a8deb6a)
    [Liubo: 4.19 lacks fs_context support for parsing mount options, so this
    backport reverts to the 4.19 way of parsing them; a virtiofs mount is
    therefore still selected with -o tag=myfs-1.]
    Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
    Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
inode.c 34.0 KB