1. 27 Nov 2019: 2 commits
  2. 22 Nov 2019: 3 commits
  3. 16 Nov 2019: 1 commit
      pipe: Allow pipes to have kernel-reserved slots · 6718b6f8
      Committed by David Howells
      Split pipe->ring_size into two numbers:
      
       (1) pipe->ring_size - indicates the hard size of the pipe ring.
      
       (2) pipe->max_usage - indicates the maximum number of pipe ring slots that
           userspace-orchestrated events can fill.
      
      This allows for a pipe that is both writable by the general kernel
      notification facility and by userspace, allowing plenty of ring space for
      notifications to be added whilst preventing userspace from being able to
      pin too much unswappable kernel space.

      Signed-off-by: David Howells <dhowells@redhat.com>
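      To make the ring_size/max_usage split concrete, here is a minimal C
      sketch. It assumes head/tail indexing (as introduced by the "Use head and
      tail pointers" commit) and uses a hypothetical struct pipe_model with
      illustrative helpers pipe_model_full(), user_may_write() and
      kernel_may_post(). It is not the kernel's struct pipe_inode_info, and it
      leaves out all locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the split described above. The field
 * names follow the commit message; the struct and helpers are illustrative,
 * not the kernel's pipe structures or helpers. */
struct pipe_model {
	unsigned int head;	/* producer index, wraps naturally */
	unsigned int tail;	/* consumer index */
	unsigned int ring_size;	/* hard size of the ring */
	unsigned int max_usage;	/* slots userspace-driven writes may fill */
};

/* True once occupancy (head - tail) has reached the given limit. */
static bool pipe_model_full(const struct pipe_model *p, unsigned int limit)
{
	return p->head - p->tail >= limit;
}

/* Userspace writes stop at max_usage... */
static bool user_may_write(const struct pipe_model *p)
{
	return !pipe_model_full(p, p->max_usage);
}

/* ...while kernel notifications may also use the reserved slots. */
static bool kernel_may_post(const struct pipe_model *p)
{
	return !pipe_model_full(p, p->ring_size);
}
```

      With ring_size = 8 and max_usage = 6, a ring holding six buffers refuses
      further userspace writes but still has two slots for kernel
      notifications, which is the reservation the commit describes.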
  4. 12 Nov 2019: 4 commits
  5. 31 Oct 2019: 1 commit
      pipe: Use head and tail pointers for the ring, not cursor and length · 8cefc107
      Committed by David Howells
      Convert pipes to use head and tail pointers for the buffer ring rather than
      a pointer and a length, as the latter requires two atomic ops to update (or
      a combined op) whereas the former requires only one.
      
       (1) The head pointer is the point at which production occurs and points to
           the slot in which the next buffer will be placed.  This is equivalent
           to pipe->curbuf + pipe->nrbufs.
      
           The head pointer belongs to the write-side.
      
       (2) The tail pointer is the point at which consumption occurs.  It points
           to the next slot to be consumed.  This is equivalent to pipe->curbuf.
      
           The tail pointer belongs to the read-side.
      
       (3) head and tail are allowed to run to UINT_MAX and wrap naturally.  They
           are only masked off when the array is being accessed, e.g.:
      
      	pipe->bufs[head & mask]
      
           This means that it is not necessary to have a dead slot in the ring as
           head == tail isn't ambiguous.
      
       (4) The ring is empty if "head == tail".
      
           A helper, pipe_empty(), is provided for this.
      
       (5) The occupancy of the ring is "head - tail".
      
           A helper, pipe_occupancy(), is provided for this.
      
       (6) The number of free slots in the ring is "pipe->ring_size - occupancy".
      
           A helper, pipe_space_for_user(), is provided to indicate how many slots
           userspace may use.
      
       (7) The ring is full if "head - tail >= pipe->ring_size".
      
           A helper, pipe_full(), is provided for this.

      Signed-off-by: David Howells <dhowells@redhat.com>
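      The following standalone C sketch models points (1) to (7) of the commit
      above. The names ring_empty(), ring_occupancy(), ring_space() and
      ring_full() are hypothetical stand-ins for pipe_empty(),
      pipe_occupancy(), pipe_space_for_user() and pipe_full(); the sketch is
      single-threaded and omits the locking and memory barriers a real
      concurrent ring needs. It assumes a power-of-two ring size, which is what
      lets the free-running indices be masked correctly across wraparound.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 4u	/* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

_Static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

/* Standalone model of the head/tail scheme; not the kernel's pipe code. */
struct ring {
	unsigned int head;	/* next slot to fill; written by the producer */
	unsigned int tail;	/* next slot to consume; written by the consumer */
	int bufs[RING_SIZE];
};

static bool ring_empty(unsigned int head, unsigned int tail)
{
	return head == tail;
}

/* Unsigned subtraction gives the right answer across wraparound. */
static unsigned int ring_occupancy(unsigned int head, unsigned int tail)
{
	return head - tail;
}

static unsigned int ring_space(unsigned int head, unsigned int tail)
{
	return RING_SIZE - ring_occupancy(head, tail);
}

static bool ring_full(unsigned int head, unsigned int tail, unsigned int limit)
{
	return ring_occupancy(head, tail) >= limit;
}

/* Producer: updates only head. Returns false if the ring is full. */
static bool ring_push(struct ring *r, int v)
{
	if (ring_full(r->head, r->tail, RING_SIZE))
		return false;
	r->bufs[r->head & RING_MASK] = v;	/* mask only on array access */
	r->head++;
	return true;
}

/* Consumer: updates only tail. Returns false if the ring is empty. */
static bool ring_pop(struct ring *r, int *out)
{
	if (ring_empty(r->head, r->tail))
		return false;
	*out = r->bufs[r->tail & RING_MASK];
	r->tail++;
	return true;
}
```

      Starting both indices just below UINT_MAX exercises point (3): the
      indices wrap through zero, head - tail still yields the correct
      occupancy, and a full ring (occupancy 4) is distinguishable from an empty
      one (occupancy 0) without a dead slot.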
  6. 23 Oct 2019: 5 commits
  7. 21 Oct 2019: 7 commits
      virtiofs: Retry request submission from worker context · a9bfd9dd
      Committed by Vivek Goyal
      If the regular request queue gets full, we currently sleep for a bit and
      retry submission in the submitter's context. This assumes the submitter is
      not holding any spinlock, but that assumption does not hold for background
      requests: for those, we are called with fc->bg_lock held.

      This can lead to a deadlock in which one thread retries submission with
      fc->bg_lock held while the request completion thread has called
      fuse_request_end(), which tries to acquire fc->bg_lock and blocks. Because
      the completion thread is blocked, it makes no further progress, so the
      queue never drains and the submitter can never submit more requests.

      To solve this, retry submission from a worker instead of in the
      submitter's context. We already do this for hiprio/forget requests.

      Reported-by: Chirantan Ekbote <chirantan@chromium.org>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
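      A rough single-threaded C model of this change: when the queue is full,
      the submitter parks the request on a list instead of sleeping, and a
      worker retries it later. All names (struct model_fsvq, vq_add(),
      submit_request(), dispatch_work(), complete_one()) and the capacities are
      hypothetical; the real driver uses a virtqueue, spinlocks and a kernel
      work item, none of which are modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VQ_CAPACITY 2	/* hypothetical virtqueue size */
#define MAX_PENDING 8

/* Hypothetical single-threaded model; not the virtiofs data structures. */
struct model_fsvq {
	int vq[VQ_CAPACITY];		/* requests on the "virtqueue" */
	size_t vq_len;
	int pending[MAX_PENDING];	/* requests parked for the worker */
	size_t n_pending;
};

/* Stand-in for adding to the virtqueue; fails when it is full. */
static bool vq_add(struct model_fsvq *q, int req)
{
	if (q->vq_len == VQ_CAPACITY)
		return false;
	q->vq[q->vq_len++] = req;
	return true;
}

/* Submitter context (may hold a spinlock such as fc->bg_lock): never sleep;
 * park the request for the worker if the queue is full. */
static bool submit_request(struct model_fsvq *q, int req)
{
	if (vq_add(q, req))
		return true;
	if (q->n_pending == MAX_PENDING)
		return false;	/* this sketch just reports failure */
	q->pending[q->n_pending++] = req;
	return true;
}

/* Worker context: retry parked requests in order and keep those that
 * still do not fit. */
static void dispatch_work(struct model_fsvq *q)
{
	size_t done = 0;

	while (done < q->n_pending && vq_add(q, q->pending[done]))
		done++;
	for (size_t i = done; i < q->n_pending; i++)
		q->pending[i - done] = q->pending[i];
	q->n_pending -= done;
}

/* Completion path: the device finished the oldest request, freeing a slot. */
static void complete_one(struct model_fsvq *q)
{
	if (q->vq_len == 0)
		return;
	for (size_t i = 1; i < q->vq_len; i++)
		q->vq[i - 1] = q->vq[i];
	q->vq_len--;
}
```

      The submitter never waits on the completion path, so it no longer matters
      whether it holds a lock that completion needs; progress is made when the
      worker runs after a completion frees a slot.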
      virtiofs: Count pending forgets as in_flight forgets · c17ea009
      Committed by Vivek Goyal
      If the virtqueue is full, we put forget requests on a list and dispatch
      them later from a worker. Currently these forgets are not counted in the
      fsvq->in_flight variable, so draining the queue needs special logic that
      first drains these pending requests and then waits for fsvq->in_flight to
      reach zero.

      By counting pending forgets in fsvq->in_flight, we can remove the special
      logic and simply wait for in_flight to reach zero. The worker thread will
      run and drain all the forgets anyway, bringing in_flight down to zero.

      I also need similar logic for the normal request queue in the next patch,
      which defers request submission to worker context when the queue is full.

      This simplifies the code a bit.

      Also add two helper functions to increment and decrement in_flight. The
      decrement helper will later be used to signal completion when in_flight
      reaches zero.

      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
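      A minimal C sketch of the counting rule, assuming a single-threaded model
      with hypothetical names (struct model_vq, inc_in_flight(),
      dec_in_flight(), send_forget(), forget_worker(), model_complete(),
      drained()): every forget increments in_flight as soon as it is queued or
      parked, only a completion decrements it, so "drained" reduces to a single
      in_flight == 0 check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model; not the virtiofs structures. */
struct model_vq {
	unsigned int in_flight;	/* forgets on the queue + parked forgets */
	unsigned int queued;	/* forgets actually on the virtqueue */
	unsigned int parked;	/* forgets waiting for the worker */
	unsigned int capacity;	/* virtqueue size */
};

static void inc_in_flight(struct model_vq *q) { q->in_flight++; }
static void dec_in_flight(struct model_vq *q) { q->in_flight--; }

/* Counted as in flight immediately, whether it fits or gets parked. */
static void send_forget(struct model_vq *q)
{
	inc_in_flight(q);
	if (q->queued < q->capacity)
		q->queued++;
	else
		q->parked++;
}

/* Worker: move parked forgets onto the queue; in_flight does not change. */
static void forget_worker(struct model_vq *q)
{
	while (q->parked > 0 && q->queued < q->capacity) {
		q->parked--;
		q->queued++;
	}
}

/* Device completion: the only place in_flight goes down. */
static void model_complete(struct model_vq *q)
{
	if (q->queued == 0)
		return;
	q->queued--;
	dec_in_flight(q);
}

/* Draining needs just one condition; parked forgets are already counted. */
static bool drained(const struct model_vq *q)
{
	return q->in_flight == 0;
}
```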
      virtiofs: Set FR_SENT flag only after request has been sent · 5dbe190f
      Committed by Vivek Goyal
      The FR_SENT flag should be set only after the request has been
      successfully sent over the virtqueue. The interrupt logic uses this flag
      to decide whether an interrupt request should be sent.

      Also add the request to the fpq->processing list only after it has been
      sent successfully.

      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
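      A small C sketch of the ordering this commit enforces. MODEL_FR_SENT,
      struct model_req, model_send() and model_needs_interrupt() are
      hypothetical; on_processing stands in for membership of the
      fpq->processing list, and the interrupt check is deliberately simplified
      to "only a request the device has received needs an interrupt".

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_FR_SENT (1u << 0)	/* hypothetical bit value */

/* Hypothetical request model; on_processing stands in for being on the
 * fpq->processing list. */
struct model_req {
	unsigned int flags;
	bool on_processing;
};

/* Returns 0 on success, -1 if the (simulated) virtqueue had no room. */
static int model_send(struct model_req *req, bool vq_has_room)
{
	if (!vq_has_room)
		return -1;	/* not sent: leave FR_SENT clear, stay off the list */

	/* Only now has the request actually reached the device. */
	req->flags |= MODEL_FR_SENT;
	req->on_processing = true;
	return 0;
}

/* Simplified interrupt-side check based on the flag. */
static bool model_needs_interrupt(const struct model_req *req)
{
	return (req->flags & MODEL_FR_SENT) != 0;
}
```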
      virtiofs: No need to check fpq->connected state · 7ee1e2e6
      Committed by Vivek Goyal
      In virtiofs we keep per-queue connected state in virtio_fs_vq->connected
      and use it to end requests if the queue is not connected. virtiofs does
      not even touch the fpq->connected state.

      We probably need to merge these two at some point. For now, simplify the
      code a bit and do not bother checking the state of fpq->connected.

      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      virtiofs: Do not end request in submission context · 51fecdd2
      Committed by Vivek Goyal
      The submission context can hold locks that the end-request code tries to
      take again, so a deadlock can occur. fc->bg_lock is one example: a
      background request may be submitted with fc->bg_lock held, and if the
      request cannot be submitted (because the device went away) and we try to
      end it right there, we deadlock. During testing, I also got a warning from
      the deadlock detection code.

      So put such requests on a list and end them from a worker thread.

      I got the following warning from the deadlock detector.
      
      [  603.137138] WARNING: possible recursive locking detected
      [  603.137142] --------------------------------------------
      [  603.137144] blogbench/2036 is trying to acquire lock:
      [  603.137149] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_request_end+0xdf/0x1c0 [fuse]
      [  603.140701]
      [  603.140701] but task is already holding lock:
      [  603.140703] 00000000f0f51107 (&(&fc->bg_lock)->rlock){+.+.}, at: fuse_simple_background+0x92/0x1d0 [fuse]
      [  603.140713]
      [  603.140713] other info that might help us debug this:
      [  603.140714]  Possible unsafe locking scenario:
      [  603.140714]
      [  603.140715]        CPU0
      [  603.140716]        ----
      [  603.140716]   lock(&(&fc->bg_lock)->rlock);
      [  603.140718]   lock(&(&fc->bg_lock)->rlock);
      [  603.140719]
      [  603.140719]  *** DEADLOCK ***

      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      fuse: don't advise readdirplus for negative lookup · 6c26f717
      Committed by Miklos Szeredi
      If the FUSE_READDIRPLUS_AUTO feature is enabled, then lookups on a
      directory before/during readdir are used as an indication that READDIRPLUS
      should be used instead of READDIR.  However if the lookup turns out to be
      negative, then selecting READDIRPLUS makes no sense.

      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
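      A tiny C sketch of the changed heuristic. struct model_dir and
      note_lookup() are hypothetical names; the point is only that a lookup now
      sets the "use READDIRPLUS" hint when FUSE_READDIRPLUS_AUTO is enabled and
      the lookup found the name.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the FUSE_READDIRPLUS_AUTO hint on a directory. */
struct model_dir {
	bool advise_rdplus;	/* prefer READDIRPLUS for the next readdir */
};

/* Called after a lookup in dir; positive is false when the name was not found. */
static void note_lookup(struct model_dir *dir, bool rdplus_auto, bool positive)
{
	/* A negative lookup is for a name that does not exist, so it says
	 * nothing about needing attributes of existing entries. */
	if (rdplus_auto && positive)
		dir->advise_rdplus = true;
}
```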
      fuse: don't dereference req->args on finished request · 2b319d1f
      Committed by Miklos Szeredi
      Move the check for an async request after the check for whether the
      request has already finished and is done with.
      
      Reported-by: syzbot+ae0bb7aae3de6b4594e2@syzkaller.appspotmail.com
      Fixes: d4993774 ("fuse: stop copying args to fuse_req")
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
  8. 15 Oct 2019: 1 commit
  9. 14 Oct 2019: 1 commit
  10. 24 Sep 2019: 7 commits
  11. 19 Sep 2019: 1 commit
      virtio-fs: add virtiofs filesystem · a62a8ef9
      Committed by Stefan Hajnoczi
      Add a basic file system module for virtio-fs.  This does not yet contain
      shared data support between host and guest or metadata coherency speedups.
      However it is already significantly faster than virtio-9p.
      
      Design Overview
      ===============
      
      With the goal of designing something with better performance and local file
      system semantics, a bunch of ideas were proposed.
      
       - Use the fuse protocol (instead of 9p) for communication between guest
         and host.  The guest kernel will be the fuse client and a fuse server
         will run on the host to serve requests.

       - For data access inside the guest, mmap a portion of the file into the
         QEMU address space and have the guest access this memory using DAX.
         That way the guest page cache is bypassed and there is only one copy of
         the data (on the host).  This will also enable mmap(MAP_SHARED) between
         guests.

       - For metadata coherency, a shared memory region contains a version
         number associated with the metadata.  Any guest that changes metadata
         updates the version number, and other guests refresh their metadata on
         next access.  This is yet to be implemented.
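      Since the metadata-coherency scheme is described but not yet implemented
      here, the following is only a hypothetical C sketch of the idea: a writer
      bumps a shared version counter, and a reader refetches its cached
      attributes only when the counter has moved. It ignores concurrency and
      memory ordering, and struct shared_meta, struct guest_cache,
      change_size() and get_size() are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the proposed (not yet implemented) scheme. */
struct shared_meta {
	uint64_t version;	/* lives in the shared memory region */
};

struct guest_cache {
	uint64_t seen_version;	/* version when the attributes were cached */
	long size;		/* one example cached attribute */
	bool valid;
};

/* Writer: change the metadata on the host, then bump the shared version. */
static void change_size(struct shared_meta *sm, long *host_size, long new_size)
{
	*host_size = new_size;
	sm->version++;
}

/* Reader: a cheap shared-memory read decides whether a round trip to the
 * host (counted in *round_trips) is needed. */
static long get_size(struct guest_cache *gc, const struct shared_meta *sm,
		     const long *host_size, int *round_trips)
{
	if (!gc->valid || gc->seen_version != sm->version) {
		gc->size = *host_size;	/* stands in for a request to the host */
		gc->seen_version = sm->version;
		gc->valid = true;
		(*round_trips)++;
	}
	return gc->size;
}
```

      In the common case where nothing changed, the second get_size() call is
      served from the guest cache without a round trip, which is the saving
      the text attributes to the shared memory region.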
      
      How virtio-fs differs from existing approaches
      ==============================================
      
      The unique idea behind virtio-fs is to take advantage of the co-location of
      the virtual machine and hypervisor to avoid communication (vmexits).
      
      DAX allows file contents to be accessed without communication with the
      hypervisor.  The shared memory region for metadata avoids communication in
      the common case where metadata is unchanged.
      
      By replacing expensive communication with cheaper shared memory accesses,
      we expect to achieve better performance than approaches based on network
      file system protocols.  In addition, this also makes it easier to achieve
      local file system semantics (coherency).
      
      These techniques are not applicable to network file system protocols since
      the communications channel is bypassed by taking advantage of shared memory
      on a local machine.  This is why we decided to build virtio-fs rather than
      focus on 9P or NFS.
      
      Caching Modes
      =============
      
      Like virtio-9p, different caching modes are supported which determine the
      coherency level as well.  The “cache=FOO” and “writeback” options control
      the level of coherence between the guest and host filesystems.
      
       - cache=none
         metadata, data and pathname lookup are not cached in guest.  They are
         always fetched from host and any changes are immediately pushed to host.
      
       - cache=always
         metadata, data and pathname lookup are cached in guest and never expire.
      
       - cache=auto
         metadata and pathname lookup cache expires after a configured amount of
         time (default is 1 second).  Data is cached while the file is open
         (close to open consistency).
      
       - writeback/no_writeback
         These options control the writeback strategy.  If writeback is disabled,
         then normal writes will immediately be synchronized with the host fs.
         If writeback is enabled, then writes may be cached in the guest until
         the file is closed or an fsync(2) is performed.  This option has no
         effect on mmap-ed writes or writes going through the DAX mechanism.

      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
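      As a rough illustration of how the three cache= modes differ, this
      hypothetical C sketch maps each mode to a metadata timeout and a
      data-caching policy, following the descriptions in the commit above.
      The enums, NEVER_EXPIRES, meta_timeout() and data_policy_for() are
      invented for illustration and are not the virtiofs or server-side
      implementation; the writeback option is not modeled.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical encoding of the modes; not a real virtiofs interface. */
enum cache_mode { CACHE_NONE, CACHE_AUTO, CACHE_ALWAYS };
enum data_policy { DATA_BYPASS, DATA_DROP_ON_OPEN, DATA_KEEP };

#define NEVER_EXPIRES UINT_MAX

/* Seconds that cached metadata and pathname lookups stay valid. */
static unsigned int meta_timeout(enum cache_mode m, unsigned int auto_secs)
{
	switch (m) {
	case CACHE_NONE:
		return 0;		/* always fetched from the host */
	case CACHE_AUTO:
		return auto_secs;	/* configured; the text gives 1 s as default */
	case CACHE_ALWAYS:
		return NEVER_EXPIRES;
	}
	return 0;
}

/* How file data is cached in the guest. */
static enum data_policy data_policy_for(enum cache_mode m)
{
	switch (m) {
	case CACHE_NONE:
		return DATA_BYPASS;		/* not cached in the guest */
	case CACHE_AUTO:
		return DATA_DROP_ON_OPEN;	/* close-to-open: cached while open */
	case CACHE_ALWAYS:
		return DATA_KEEP;		/* cached and never expires */
	}
	return DATA_BYPASS;
}
```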
  12. 12 Sep 2019: 7 commits