1. 01 July 2017, 2 commits
  2. 27 June 2017, 3 commits
    • lightnvm: pblk: redesign GC algorithm · b20ba1bc
      Javier González authored
      At the moment, in order to get enough read parallelism, we recycle
      several lines at the same time. This approach has proven not to work
      well when reaching capacity, since we end up mixing valid data from all
      lines, thus failing to maintain a sustainable free/recycled line ratio.

      The new design relies on a two-level workqueue mechanism. In the first
      level, we read the metadata for a number of lines based on the GC list
      they reside on (this is governed by the number of valid sectors in each
      line). In the second level, we recycle a single line at a time. Here, we
      issue reads in parallel, while a single GC write thread places the data
      in the write buffer. This design allows us to (i) move data from only
      one line at a time, thus maintaining a sane free/recycled ratio, and
      (ii) keep the GC writer busy with recycled data (a sketch of the
      two-level flow follows this entry).
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
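      Below is a minimal userspace C sketch of the two-level flow described in this commit. All names (gc_line, gc_read_meta, gc_recycle_line) are invented for illustration and are not pblk symbols; the real driver uses kernel workqueues and pblk's write buffer, which this model only mimics sequentially.

      ```c
      /*
       * Minimal userspace model of the two-level GC flow (illustrative
       * names only; not pblk symbols).
       */
      #include <stdio.h>

      #define SECS_PER_LINE 8

      struct gc_line {
      	int id;
      	int valid_secs;                 /* sectors that must be moved */
      	int valid_lba[SECS_PER_LINE];   /* their logical addresses */
      };

      /* Level 1: read the line's metadata (its valid-sector map). */
      static void gc_read_meta(struct gc_line *line)
      {
      	printf("line %d: metadata read, %d valid sectors\n",
      	       line->id, line->valid_secs);
      }

      /*
       * Level 2: recycle exactly one line at a time. In pblk the reads are
       * issued in parallel and a single GC write thread copies the data into
       * the write buffer; this model just logs the moves sequentially.
       */
      static void gc_recycle_line(struct gc_line *line)
      {
      	for (int i = 0; i < line->valid_secs; i++)
      		printf("  move lba %d into the write buffer\n",
      		       line->valid_lba[i]);
      	printf("line %d recycled, returned to the free list\n", line->id);
      }

      int main(void)
      {
      	struct gc_line lines[] = {
      		{ .id = 3, .valid_secs = 2, .valid_lba = { 40, 41 } },
      		{ .id = 7, .valid_secs = 1, .valid_lba = { 99 } },
      	};

      	for (unsigned int i = 0; i < sizeof(lines) / sizeof(lines[0]); i++) {
      		gc_read_meta(&lines[i]);     /* level 1 */
      		gc_recycle_line(&lines[i]);  /* level 2, one line at a time */
      	}
      	return 0;
      }
      ```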
    • lightnvm: pblk: choose optimal victim GC line · d45ebd47
      Javier González authored
      At the moment, we separate the closed lines into three different lists
      based on their number of valid sectors. GC recycles lines from each list
      based on capacity. Lines from each list are taken in FIFO fashion.

      Since the number of lines is limited (it corresponds to the number of
      blocks in a LUN, which is somewhere between 1000 and 2000), we can
      afford to scan the lists and choose the optimal line to recycle. This
      helps especially with lines that hold a high number of valid sectors (a
      sketch of the selection scan follows this entry).

      If the number of blocks per LUN increases, we will consider a more
      efficient policy.
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
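      A minimal sketch of the victim-selection scan, written as standalone C rather than pblk code (choose_victim and struct line are hypothetical names). It only illustrates why a linear scan over a few thousand lines is acceptable.

      ```c
      /* Standalone sketch of the victim-selection scan (hypothetical names). */
      #include <stddef.h>
      #include <stdio.h>

      struct line {
      	int id;
      	int valid_secs;   /* data that GC would have to move */
      };

      /* With roughly 1000-2000 lines per LUN, a linear scan is cheap. */
      static struct line *choose_victim(struct line *lines, size_t nr)
      {
      	struct line *victim = NULL;

      	for (size_t i = 0; i < nr; i++)
      		if (!victim || lines[i].valid_secs < victim->valid_secs)
      			victim = &lines[i];
      	return victim;
      }

      int main(void)
      {
      	struct line closed[] = { { 1, 900 }, { 2, 120 }, { 3, 445 } };
      	struct line *victim = choose_victim(closed, 3);

      	printf("recycle line %d (%d valid sectors to move)\n",
      	       victim->id, victim->valid_secs);
      	return 0;
      }
      ```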
    • lightnvm: pblk: sched. metadata on write thread · dd2a4343
      Javier González authored
      At the moment, line metadata is persisted on a separate work queue that
      is kicked each time a line is closed. The assumption when designing this
      was that freeing the write thread from creating a new write request was
      better than the potential impact of writes colliding on the media (user
      I/O and metadata I/O). Experimentation has proven that this assumption
      is wrong; collisions can cost up to 25% of the bandwidth and introduce
      long tail latencies on the write thread, which potentially causes user
      write threads to spend more time spinning for a free entry on the write
      buffer.

      This patch moves the metadata logic to the write thread. When a line is
      closed, the remaining metadata is written in memory and placed on a
      metadata queue. The write thread then takes the metadata corresponding
      to the previous line, creates the write request and schedules it so as
      to minimize collisions on the media (a sketch of this scheme follows
      this entry). Using this approach, we see that we can saturate the
      media's bandwidth, which helps reduce both write latencies and the
      spinning time for user writer threads.
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
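      Below is a small userspace model of the deferred-metadata scheme: closing a line queues its emeta, and the write thread drains that queue between user writes. The queue layout and names (line_close, write_thread_iter, meta_q) are invented for illustration and do not reflect pblk's actual structures.

      ```c
      /* Userspace model of deferring emeta writes to the write thread. */
      #include <stdio.h>

      #define META_Q_DEPTH 4

      static int meta_q[META_Q_DEPTH];   /* line ids with pending emeta */
      static unsigned int meta_head, meta_tail;

      /* Called when a line is closed: defer its emeta to the write thread. */
      static void line_close(int line_id)
      {
      	meta_q[meta_tail++ % META_Q_DEPTH] = line_id;
      }

      /* One write-thread iteration: a user write, then any pending emeta. */
      static void write_thread_iter(int user_lba)
      {
      	printf("write user lba %d\n", user_lba);

      	if (meta_head != meta_tail) {
      		int line_id = meta_q[meta_head++ % META_Q_DEPTH];

      		printf("write emeta for closed line %d\n", line_id);
      	}
      }

      int main(void)
      {
      	line_close(5);                    /* line 5 just filled up */
      	for (int lba = 0; lba < 3; lba++)
      		write_thread_iter(lba);   /* emeta goes out after lba 0 */
      	return 0;
      }
      ```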
  3. 24 April 2017, 1 commit
    • lightnvm: pblk: fix erase counters on error fail · a44f53fa
      Javier González authored
      When block erases fail, these blocks are marked bad. The number of valid
      blocks in the line was not updated, which could cause an infinite loop
      on the erase path.

      Fix this atomic counter and, in order to avoid taking an irq lock in
      interrupt context, make the erase counters atomic too (a sketch of the
      counters follows this entry).

      Also, in the case that a significant number of blocks become bad in a
      line, the result is that the double shared metadata buffer (emeta)
      stops the pipeline until all metadata is flushed to the media. Increase
      the number of metadata lines from 2 to 4 to avoid this case.
      
      Fixes: a4bd217b "lightnvm: physical block device (pblk) target"
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Reviewed-by: Matias Bjørling <matias@cnexlabs.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
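      A simplified illustration of the counter fix, using C11 atomics in place of the kernel's atomic_t. The struct and function names are hypothetical; the point is only that a failed erase both marks the block bad and decrements the per-line counters atomically, so the erase path cannot loop forever.

      ```c
      /* C11 atomics standing in for the kernel's atomic_t (names invented). */
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      struct line {
      	atomic_int blk_in_line;   /* good blocks still in the line */
      	atomic_int left_eblks;    /* blocks still waiting to be erased */
      };

      /* Erase completion; in the driver this can run in interrupt context. */
      static void erase_done(struct line *line, bool failed)
      {
      	if (failed)
      		/* The block is marked bad: it no longer counts for the line. */
      		atomic_fetch_sub(&line->blk_in_line, 1);

      	/* Either way, one fewer erase is outstanding. */
      	atomic_fetch_sub(&line->left_eblks, 1);
      }

      int main(void)
      {
      	struct line line = { .blk_in_line = 4, .left_eblks = 4 };

      	erase_done(&line, false);
      	erase_done(&line, true);          /* erase failure: block goes bad */

      	printf("good blocks: %d, erases pending: %d\n",
      	       atomic_load(&line.blk_in_line),
      	       atomic_load(&line.left_eblks));
      	return 0;
      }
      ```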
  4. 17 April 2017, 2 commits
    • lightnvm: pblk-gc: fix an error pointer dereference in init · 503ec94e
      Dan Carpenter authored
      These labels are reversed, so we could end up dereferencing an error
      pointer or leaking (a sketch of the corrected unwind order follows this
      entry).
      
      Fixes: 7f347ba6bb3a ("lightnvm: physical block device (pblk) target")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
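      A generic sketch of the bug class fixed here: with goto-based unwinding, each failure path must jump to the label that undoes only what has already succeeded, and the labels must appear in reverse order of allocation. The stub functions below are placeholders, not pblk's init code.

      ```c
      /* Generic goto-unwind pattern; stub functions, not pblk code. */
      #include <stdlib.h>

      static void *create_workqueue_stub(void) { return malloc(1); }
      static void *create_cache_stub(void)     { return malloc(1); }

      static int gc_init(void)
      {
      	void *wq, *cache;

      	wq = create_workqueue_stub();
      	if (!wq)
      		goto fail;             /* nothing allocated yet */

      	cache = create_cache_stub();
      	if (!cache)
      		goto fail_free_wq;     /* only the workqueue exists */

      	free(cache);                   /* demo only: tear down and succeed */
      	free(wq);
      	return 0;

      fail_free_wq:                          /* labels must stay in this order; */
      	free(wq);                      /* swapping them frees the wrong thing */
      fail:
      	return -1;
      }

      int main(void)
      {
      	return gc_init();
      }
      ```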
    • lightnvm: physical block device (pblk) target · a4bd217b
      Javier González authored
      This patch introduces pblk, a host-side translation layer for
      Open-Channel SSDs that exposes them as block devices. The translation
      layer allows data placement decisions and I/O scheduling to be managed
      by the host, enabling users to optimize the SSD for their specific
      workloads.
      
      An open-channel SSD has a set of LUNs (parallel units) and a collection
      of blocks. Each block can be read in any order, but writes must be
      sequential. Writes may also fail and, if a block requires it, the block
      must be reset before new writes can be applied.
      
      To manage these constraints, pblk maintains a logical-to-physical
      address (L2P) table, a write cache, garbage collection logic, a
      recovery scheme, and logic to rate-limit user I/Os versus garbage
      collection I/Os.
      
      The L2P table is fully associative and manages sectors at a 4KB
      granularity. Pblk stores the L2P table in two places: in the
      out-of-band area of the media and on the last page of a line. In the
      case of a power failure, pblk performs a scan to recover the L2P table
      (a sketch of the table follows this entry).
      
      User data is organized into lines. A line is data striped across blocks
      and LUNs. Lines enable the host to reduce the amount of metadata
      maintained alongside the user data and make it easier to implement RAID
      or erasure coding in the future.
      
      pblk implements multi-tenant support and can be instantiated multiple
      times on the same drive. Each instance owns a portion of the SSD - both
      in terms of I/O bandwidth and capacity - providing I/O isolation
      between instances.
      
      Finally, pblk also exposes a sysfs interface that allows
      user-space to peek into the internals of pblk. The interface
      is available at /dev/block/*/pblk/ where * is the block
      device name exposed.
      
      This work also contains contributions from:
        Matias Bjørling <matias@cnexlabs.com>
        Simon A. F. Lund <slund@cnexlabs.com>
        Young Tack Jin <youngtack.jin@gmail.com>
        Huaicheng Li <huaicheng@cs.uchicago.edu>
      Signed-off-by: Javier González <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <matias@cnexlabs.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
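      To make the L2P description above concrete, here is a tiny userspace model of a fully associative L2P table at 4KB granularity, where each entry points either at the media or at the write cache. The field layout, flag bit and names are invented for illustration and are not pblk's actual format.

      ```c
      /* Tiny model of a fully associative 4KB-granularity L2P table. */
      #include <stdint.h>
      #include <stdio.h>

      #define CACHE_FLAG (1ULL << 63)      /* entry points into the write cache */

      static uint64_t l2p[1024];           /* one entry per 4KB logical sector */

      static void map_to_media(uint64_t lba, uint64_t ppa)
      {
      	l2p[lba] = ppa;                      /* sector lives on the media */
      }

      static void map_to_cache(uint64_t lba, uint64_t cacheline)
      {
      	l2p[lba] = CACHE_FLAG | cacheline;   /* sector still in the write cache */
      }

      static void lookup(uint64_t lba)
      {
      	uint64_t e = l2p[lba];

      	if (e & CACHE_FLAG)
      		printf("lba %llu -> write cache entry %llu\n",
      		       (unsigned long long)lba,
      		       (unsigned long long)(e & ~CACHE_FLAG));
      	else
      		printf("lba %llu -> physical address 0x%llx\n",
      		       (unsigned long long)lba, (unsigned long long)e);
      }

      int main(void)
      {
      	map_to_cache(7, 3);       /* freshly written data sits in the cache */
      	map_to_media(8, 0x1234);  /* older data already placed on a line */
      	lookup(7);
      	lookup(8);
      	return 0;
      }
      ```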