    exofs: RAID0 support · 5d952b83
    Committed by Boaz Harrosh
    We now support striping over mirror devices, including a
    variable-sized stripe_unit.
    
    Some limits:
    * stripe_unit must be a multiple of PAGE_SIZE
    * stripe_unit * stripe_count must fit in 32 bits (4 GB at most)
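
    The two limits above can be sketched as a small validity check. This
    is an illustration only, not the code from this patch: the function
    name layout_limits_ok and the constant SKETCH_PAGE_SIZE are
    hypothetical, the page size is assumed to be 4096 bytes, rejecting
    zero values is an added assumption, and the 32-bit limit is read as
    "the product is representable in an unsigned 32-bit integer".

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: returns true when a layout respects both limits.
 * Arguments are taken as 64-bit so the caller's values cannot wrap
 * before they are checked.
 */
static bool layout_limits_ok(uint64_t stripe_unit, uint64_t stripe_count)
{
    /* Assumption: a zero unit or count is never a valid layout. */
    if (stripe_unit == 0 || stripe_count == 0)
        return false;

    /* Limit 1: stripe_unit must be a multiple of the page size. */
    if (stripe_unit % SKETCH_PAGE_SIZE != 0)
        return false;

    /*
     * Limit 2: stripe_unit * stripe_count must fit in 32 bits.
     * Dividing instead of multiplying avoids overflowing the product.
     */
    return stripe_unit <= UINT32_MAX / stripe_count;
}
```

    For example, a 64 KiB stripe_unit with 65536 stripe units is
    rejected, because the product is exactly 2^32 and no longer fits.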
    
    Tested RAID0 over mirrors, RAID0 only, and mirrors only. All check out.
    
    Design notes:
    * I'm not using a vectored raid-engine mechanism yet. Following the
      pnfs-objects-layout data-map structure, "Mirror" is just a special
      case of "group_width" == 1, and RAID0 is a special case of
      "Mirrors" == 1. The performance loss of the general case compared
      with a dedicated optimization for each special case is negligible,
      especially considering the extra code size such optimizations
      would add.
    
    * In general, I added a prepare_stripes() stage that divides the
      to-be-io pages among the participating devices. The previous
      exofs_ios_write/read now becomes _write/read_mirrors, and a new
      write/read upper layer loops over all devices, calling
      _write/read_mirrors. Effectively, the prepare_stripes stage is the
      whole secret.
      Also, truncate needs fixing to accommodate striping.
    
    * In a RAID0 arrangement under regular usage, if all inode layouts
      start at the same device, small files fill up the first device
      while the later devices stay empty; the farther a device is from
      the start, the emptier it is.
    
      To fix that, each inode starts at a different stripe_unit,
      according to its obj_id modulo the number of stripe-units. It
      then spans all stripe-units in the same incrementing order,
      wrapping back to the beginning of the device table. We call it
      a stripe-units moving window.
    
      Special consideration was taken to keep all devices in a mirror
      arrangement identical, so a broken osd-device can simply be cloned
      from one of its mirrors and no FS scrubbing is needed. (We do that
      by rotating a stripe-unit at a time and not a single device at a
      time.)
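
    The design notes above can be illustrated with a minimal sketch of
    the address arithmetic. It is not the code from this patch: struct
    layout, map_offset and first_dev_of_comp are hypothetical names, and
    the device table is assumed to hold group_width * mirrors entries
    with the mirrors of one stripe unit stored next to each other. The
    first function is the usual RAID0 split of a file offset; the second
    applies the moving window, shifting by whole stripe units so that a
    set of mirrors always stays together.

```c
#include <stdint.h>

/* Hypothetical layout description; field names are illustrative. */
struct layout {
    uint32_t stripe_unit;  /* bytes per stripe unit */
    uint32_t group_width;  /* stripe units per stripe */
    uint32_t mirrors;      /* copies of each stripe unit, at least 1 */
};

/*
 * Split a file offset into the stripe-unit column it falls in (comp)
 * and the byte offset inside that column's object (obj_off).
 */
static void map_offset(const struct layout *l, uint64_t file_off,
                       uint32_t *comp, uint64_t *obj_off)
{
    uint64_t unit_no = file_off / l->stripe_unit;   /* global unit index */
    uint64_t stripe_no = unit_no / l->group_width;  /* full stripes before it */

    *comp = (uint32_t)(unit_no % l->group_width);
    *obj_off = stripe_no * l->stripe_unit + file_off % l->stripe_unit;
}

/*
 * Moving window: index in the device table of the first mirror that
 * holds column comp of object obj_id. The window start is
 * obj_id modulo group_width, and the result advances in steps of
 * l->mirrors, so devices [result, result + mirrors) are always the
 * same set of mirrors regardless of obj_id.
 */
static uint32_t first_dev_of_comp(const struct layout *l, uint64_t obj_id,
                                  uint32_t comp)
{
    uint32_t start = (uint32_t)(obj_id % l->group_width);
    uint32_t column = (start + comp) % l->group_width;

    return column * l->mirrors;
}
```

    With group_width == 1 the mapping degenerates to plain mirroring,
    and with mirrors == 1 to plain RAID0, which matches the "special
    case" view in the first design note.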
    
    TODO:
     We no longer verify object_length == inode->i_size in exofs_iget
     (since i_size is now striped across multiple objects).
     I should introduce a multiple-device attribute read, and use
     it in exofs_iget.
    Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
super.c 20.7 KB