1. 09 Sep, 2009 26 commits
  2. 30 Aug, 2009 14 commits
    • md/raid456: distribute raid processing over multiple cores · 07a3b417
      Dan Williams committed
      Now that the resources to handle stripe_head operations are allocated
      per-CPU, raid5d can distribute stripe handling over multiple cores.
      This conversion also adds a call to cond_resched() in the
      non-multicore case to prevent one core from being monopolized by
      raid operations.
      
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • md/raid6: remove synchronous infrastructure · b774ef49
      Yuri Tikhonov committed
      These routines have been replaced by their asynchronous counterparts.
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • md/raid6: asynchronous handle_stripe6 · 6c0069c0
      Yuri Tikhonov committed
      1/ Use STRIPE_OP_BIOFILL to offload completion of read requests to
         raid_run_ops
      2/ Implement a handler for sh->reconstruct_state similar to the raid5 case
         (adds handling of Q parity)
      3/ Prevent handle_parity_checks6 from running concurrently with 'compute'
         operations
      4/ Hook up raid_run_ops
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • md/raid6: asynchronous handle_parity_check6 · d82dfee0
      Dan Williams committed
      [ Based on an original patch by Yuri Tikhonov ]
      
      Implement the state machine for handling the RAID-6 parity check and
      repair functionality.  Note that the raid6 case does not need to check
      for new failures, as raid5 does, because it will always write back the
      correct disks.  The raid5 case can be updated to check zero_sum_result
      to avoid getting confused by new failures rather than retrying the
      entire check operation.
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • md/raid6: asynchronous handle_stripe_dirtying6 · a9b39a74
      Yuri Tikhonov committed
      In the synchronous implementation of stripe dirtying we processed a
      degraded stripe with one call to handle_stripe_dirtying6().  I.e.
      compute the missing blocks from the other drives, then copy in the new
      data and reconstruct the parities.
      
      In the asynchronous case we do not perform stripe operations directly.
      Instead, operations are scheduled with flags to be later serviced by
      raid_run_ops.  So, for the degraded case, the final reconstruction step
      can only be carried out after all blocks have been brought up to date by
      being read or computed.  As in the raid5 case, schedule_reconstruction()
      sets STRIPE_OP_RECONSTRUCT to request a parity generation pass and,
      through operation chaining, can handle compute and reconstruct in a
      single raid_run_ops pass.
      
      [dan.j.williams@intel.com: fixup handle_stripe_dirtying6 gating]
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • md/raid6: asynchronous handle_stripe_fill6 · 5599becc
      Yuri Tikhonov committed
      Modify handle_stripe_fill6 to work asynchronously by introducing
      fetch_block6 as the raid6 analog of fetch_block5 (schedule compute
      operations for missing/out-of-sync disks).
      
      [dan.j.williams@intel.com: compute D+Q in one pass]
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • md/raid5,6: common schedule_reconstruction for raid5/6 · c0f7bddb
      Yuri Tikhonov committed
      Extend schedule_reconstruction5 for reuse by the raid6 path.  Add
      support for generating Q and BUG() if a request is made to perform
      'prexor'.
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • md/raid6: asynchronous raid6 operations · ac6b53b6
      Dan Williams committed
      [ Based on an original patch by Yuri Tikhonov ]
      
      The raid_run_ops routine uses the asynchronous offload api and
      the stripe_operations member of a stripe_head to carry out xor+pq+copy
      operations asynchronously, outside the lock.
      
      The operations performed by RAID-6 are the same as in the RAID-5 case,
      except that STRIPE_OP_PREXOR operations are not supported.  All the
      others are supported:
      STRIPE_OP_BIOFILL
       - copy data into request buffers to satisfy a read request
      STRIPE_OP_COMPUTE_BLK
       - generate missing blocks (1 or 2) in the cache from the other blocks
      STRIPE_OP_BIODRAIN
       - copy data out of request buffers to satisfy a write request
      STRIPE_OP_RECONSTRUCT
       - recalculate parity for new data that has entered the cache
      STRIPE_OP_CHECK
       - verify that the parity is correct
      
      The flow is the same as in the RAID-5 case, and reuses some routines, namely:
      1/ ops_complete_postxor (renamed to ops_complete_reconstruct)
      2/ ops_complete_compute (updated to set up to 2 targets uptodate)
      3/ ops_run_check (renamed to ops_run_check_p for xor parity checks)
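
The operation flags above can be pictured with a toy model. This is a minimal Python sketch, not the kernel code: the names `StripeOp`, `Stripe`, `parity` and `run_ops` are hypothetical, each block is reduced to a single byte, only one missing data block is recomputed (from P), and the fixed ordering inside `run_ops` is an assumption of the sketch. The real raid_run_ops works on pages and chains async_tx descriptors instead of running each step synchronously.

```python
from enum import IntFlag


class StripeOp(IntFlag):
    """Hypothetical flag values; only the names come from the commit text."""
    BIOFILL = 1
    COMPUTE_BLK = 2
    PREXOR = 4
    BIODRAIN = 8
    RECONSTRUCT = 16
    CHECK = 32


def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def parity(data):
    """Return (P, Q) for one-byte data blocks: P is XOR, Q weights block i by {02}^i."""
    p = q = 0
    coef = 1
    for block in data:
        p ^= block
        q ^= gf_mul(coef, block)
        coef = gf_mul(coef, 2)
    return p, q


class Stripe:
    def __init__(self, data):
        self.data = list(data)        # cached data blocks, None when missing
        self.p, self.q = parity(data)
        self.pending_reads = []       # block indexes requested by readers
        self.pending_writes = {}      # block index -> new byte from writers
        self.read_results = []
        self.check_ok = None


def run_ops(stripe, ops):
    """Service every requested operation in one pass, in a fixed order."""
    if ops & StripeOp.PREXOR:
        raise ValueError("prexor is not supported on the RAID-6 path")
    if ops & StripeOp.BIOFILL:        # copy cached data to read requests
        stripe.read_results = [stripe.data[i] for i in stripe.pending_reads]
        stripe.pending_reads = []
    if ops & StripeOp.COMPUTE_BLK:    # rebuild one missing block from P
        missing = stripe.data.index(None)
        rebuilt = stripe.p
        for i, block in enumerate(stripe.data):
            if i != missing:
                rebuilt ^= block
        stripe.data[missing] = rebuilt
    if ops & StripeOp.BIODRAIN:       # copy new data from write requests
        for i, value in stripe.pending_writes.items():
            stripe.data[i] = value
        stripe.pending_writes = {}
    if ops & StripeOp.RECONSTRUCT:    # regenerate P and Q for the new data
        stripe.p, stripe.q = parity(stripe.data)
    if ops & StripeOp.CHECK:          # verify that parity matches the data
        stripe.check_ok = (stripe.p, stripe.q) == parity(stripe.data)


# Degraded write: block 1 is missing and block 0 receives new data.
stripe = Stripe([0x12, 0x34, 0x56, 0x78])
stripe.data[1] = None
stripe.pending_writes[0] = 0xAB
run_ops(stripe, StripeOp.COMPUTE_BLK | StripeOp.BIODRAIN | StripeOp.RECONSTRUCT)
run_ops(stripe, StripeOp.CHECK)
```

In this model a degraded write is handled by requesting compute, drain and reconstruct together, which mirrors the chaining of operations within a single pass.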
      
      [neilb@suse.de: fixes to get it to pass mdadm regression suite]
      Reviewed-by: Andre Noll <maan@systemlinux.org>
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • md/raid5: factor out mark_uptodate from ops_complete_compute5 · 4e7d2c0a
      Dan Williams committed
      ops_complete_compute5 can be reused in the raid6 path if it is updated to
      generically handle a second target.
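
The idea of a per-target helper can be sketched as follows. This is a hedged Python model only: `Dev`, `mark_target_uptodate` and `complete_compute` are hypothetical names, and using a negative index to mean "no second target" is an assumption of the sketch rather than something stated in the commit message.

```python
class Dev:
    """Toy stand-in for one per-device slot of a stripe."""

    def __init__(self, want_compute=False):
        self.want_compute = want_compute
        self.uptodate = False


def mark_target_uptodate(devs, target):
    """Mark one computed block as up to date; a negative target is skipped."""
    if target < 0:
        return
    dev = devs[target]
    # A block should only complete a compute that was actually requested.
    assert dev.want_compute
    dev.uptodate = True
    dev.want_compute = False


def complete_compute(devs, target, target2=-1):
    """Completion handler shared by one-target and two-target computes."""
    mark_target_uptodate(devs, target)
    mark_target_uptodate(devs, target2)


# One-target completion (raid5 style) and two-target completion (raid6 style).
single = [Dev(), Dev(want_compute=True), Dev()]
complete_compute(single, 1)

double = [Dev(want_compute=True), Dev(), Dev(want_compute=True)]
complete_compute(double, 0, 2)
```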
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • async_tx: raid6 recovery self test · cb3c8299
      Dan Williams committed
      Port drivers/md/raid6test/test.c to use the async raid6 recovery
      routines.  This is meant as a unit test for raid6 acceleration drivers.  In
      addition to the 16-drive test case, this implements tests for the 4-disk and
      5-disk special cases (dma devices cannot generically handle fewer than 2
      sources), and adds a test for the D+Q case.
      Reviewed-by: Andre Noll <maan@systemlinux.org>
      Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • dmatest: add pq support · 58691d64
      Dan Williams committed
      Test raid6 p+q operations with a simple "always multiply by 1" q
      calculation to fit into dmatest's current destination verification
      scheme.
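
Why "always multiply by 1" keeps verification simple can be shown with a small model: when every coefficient is {01}, the Galois field multiplication is the identity, so Q is the plain XOR of the sources and equals P. This is a Python sketch and not dmatest's code; `gf_mul` and `gen_pq` are hypothetical helpers, and the field polynomial 0x11d is the one assumed for the raid6 Q syndrome.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gen_pq(sources, coefs):
    """Return (P, Q): P is the XOR of the sources, Q the coefficient-weighted sum."""
    size = len(sources[0])
    p = bytearray(size)
    q = bytearray(size)
    for src, coef in zip(sources, coefs):
        for i, byte in enumerate(src):
            p[i] ^= byte
            q[i] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


sources = [
    bytes([0x11, 0x22, 0x33]),
    bytes([0xA0, 0x0B, 0xFF]),
    bytes([0x5C, 0x00, 0x7E]),
]

# With all coefficients equal to 1, Q degenerates to the same XOR as P.
p, q = gen_pq(sources, [1, 1, 1])
```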
      Reviewed-by: Andre Noll <maan@systemlinux.org>
      Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • async_tx: add support for asynchronous RAID6 recovery operations · 0a82a623
      Dan Williams committed
       async_raid6_2data_recov() recovers two data disk failures
      
       async_raid6_datap_recov() recovers a data disk and the P disk
      
      These routines are a port of the synchronous versions found in
      drivers/md/raid6recov.c.  The primary difference is breaking out the xor
      operations into separate calls to async_xor.  Two helper routines are
      introduced to perform scalar multiplication where needed.
      async_sum_product() multiplies two sources by scalar coefficients and
      then sums (xor) the result.  async_mult() simply multiplies a single
      source by a scalar.
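
How the two helpers combine in a two-data-disk recovery can be sketched in a synchronous Python model. All names here (`mult`, `sum_product`, `gen_syndrome`, `recover_2data`) are hypothetical stand-ins rather than the kernel routines, the field polynomial 0x11d and generator {02} are assumptions of the sketch, and the coefficients follow the standard RAID-6 algebra for failed data blocks x < y: Dx = A*(P + Pxy) + B*(Q + Qxy) and Dy = (P + Pxy) + Dx.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    # The multiplicative group has order 255, so a^254 is the inverse of a.
    return gf_pow(a, 254)


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def mult(coef, src):
    """Multiply a single source by a scalar."""
    return bytes(gf_mul(coef, byte) for byte in src)


def sum_product(coef_a, src_a, coef_b, src_b):
    """Multiply two sources by scalar coefficients, then sum (xor) the results."""
    return xor(mult(coef_a, src_a), mult(coef_b, src_b))


def gen_syndrome(data):
    """Return (P, Q) where block i is weighted by {02}^i in Q."""
    p = bytes(len(data[0]))
    q = bytes(len(data[0]))
    for index, block in enumerate(data):
        p = xor(p, block)
        q = xor(q, mult(gf_pow(2, index), block))
    return p, q


def recover_2data(data, p, q, x, y):
    """Recover data blocks x < y from the surviving blocks plus P and Q."""
    zero = bytes(len(p))
    survivors = [zero if i in (x, y) else blk for i, blk in enumerate(data)]
    pxy, qxy = gen_syndrome(survivors)
    dp, dq = xor(p, pxy), xor(q, qxy)          # Dx + Dy and g^x*Dx + g^y*Dy
    g_yx = gf_pow(2, y - x)
    denom_inv = gf_inv(g_yx ^ 1)
    coef_a = gf_mul(g_yx, denom_inv)
    coef_b = gf_mul(gf_inv(gf_pow(2, x)), denom_inv)
    dx = sum_product(coef_a, dp, coef_b, dq)
    dy = xor(dp, dx)
    return dx, dy


data = [bytes([i * 17 + j for j in range(4)]) for i in range(6)]
p, q = gen_syndrome(data)
damaged = list(data)
damaged[1] = damaged[4] = None                  # simulate two failed data disks
dx, dy = recover_2data(damaged, p, q, 1, 4)
```

In the asynchronous port each xor and multiply step becomes a separately submitted operation; the sketch only checks the arithmetic.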
      
      This implementation also includes, in contrast to the original
      synchronous-only code, special case handling for the 4-disk and 5-disk
      array cases.  In these situations the default N-disk algorithm will
      present 0-source or 1-source operations to dma devices.  To cover for
      dma devices where the minimum source count is 2 we implement 4-disk and
      5-disk handling in the recovery code.
      
      [ Impact: asynchronous raid6 recovery routines for 2data and datap cases ]
      
      Cc: Yuri Tikhonov <yur@emcraft.com>
      Cc: Ilya Yanok <yanok@emcraft.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Reviewed-by: Andre Noll <maan@systemlinux.org>
      Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • async_tx: add support for asynchronous GF multiplication · b2f46fd8
      Dan Williams committed
      [ Based on an original patch by Yuri Tikhonov ]
      
      This adds support for doing asynchronous GF multiplication by adding
      two additional functions to the async_tx API:
      
       async_gen_syndrome() does simultaneous XOR and Galois field
          multiplication of sources.
      
       async_syndrome_val() validates the given source buffers against known P
          and Q values.
      
      When a request is made to run async_pq against more than the hardware
      maximum number of supported sources, we need to reuse the previously
      generated P and Q values as sources into the next operation.  Care must
      be taken to remove Q from P' and P from Q'.  For example, to perform a
      5-source pq op with hardware that only supports 4 sources at a time, the
      following approach is taken:
      
      p, q = PQ(src0, src1, src2, src3, COEF({01}, {02}, {04}, {08}))
      p', q' = PQ(p, q, q, src4, COEF({00}, {01}, {00}, {10}))
      
      p' = p + q + q + src4 = p + src4
      q' = {00}*p + {01}*q + {00}*q + {10}*src4 = q + {10}*src4
      
      Note: 4 is the minimum acceptable maxpq; otherwise we punt to the
      synchronous software path.
      
      The DMA_PREP_CONTINUE flag indicates to the driver to reuse p and q as
      sources (in the above manner) and fill the remaining slots up to maxpq
      with the new sources/coefficients.
      
      Note1: Some devices have native support for P+Q continuation and can skip
      this extra work.  Devices with this capability can advertise it with
      dma_set_maxpq.  It is up to each driver how to handle the
      DMA_PREP_CONTINUE flag.
      
      Note2: The api supports disabling the generation of P when generating Q;
      this is ignored by the synchronous path but is implemented by some dma
      devices to save unnecessary writes.  In this case the continuation
      algorithm is simplified to only reuse Q as a source.
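
The 5-source continuation example in this message can be checked numerically. The following is a Python sketch under stated assumptions: `gf_mul` and `pq` are hypothetical helpers standing in for a 4-source hardware PQ engine, and the field polynomial 0x11d is assumed. It confirms that feeding (p, q, q, src4) with coefficients ({00}, {01}, {00}, {10}) yields the same P and Q as a direct 5-source computation.

```python
def gf_mul(a, b, poly=0x11D):
    """Multiply two bytes in GF(2^8), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result


def pq(sources, coefs):
    """Return (P, Q): P is the XOR of all sources, Q the coefficient-weighted sum."""
    size = len(sources[0])
    p = bytearray(size)
    q = bytearray(size)
    for src, coef in zip(sources, coefs):
        for i, byte in enumerate(src):
            p[i] ^= byte
            q[i] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)


srcs = [bytes([(7 * i + 3 * j + 1) & 0xFF for j in range(8)]) for i in range(5)]

# Reference: all five sources in a single operation.
p_ref, q_ref = pq(srcs, [0x01, 0x02, 0x04, 0x08, 0x10])

# Step 1: the first four sources fit the 4-source limit.
p, q = pq(srcs[:4], [0x01, 0x02, 0x04, 0x08])

# Step 2: continue with p and q as sources.  q appears twice so that it
# cancels out of P', and p gets coefficient {00} so that it drops out of Q'.
p_cont, q_cont = pq([p, q, q, srcs[4]], [0x00, 0x01, 0x00, 0x10])
```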
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
      Signed-off-by: Ilya Yanok <yanok@emcraft.com>
      Reviewed-by: Andre Noll <maan@systemlinux.org>
      Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • async_tx: remove walk of tx->parent chain in dma_wait_for_async_tx · 95475e57
      Dan Williams committed
      We currently walk the parent chain when waiting for a given tx to
      complete; however, this walk may race with the driver cleanup routine.
      The routines in async_raid6_recov.c may fall back to the synchronous
      path at any point, so we need to be prepared to call async_tx_quiesce()
      (which calls dma_wait_for_async_tx).  To remove the ->parent walk we
      guarantee that every time a dependency is attached ->issue_pending() is
      invoked; then we can simply poll the initial descriptor until
      completion.
      
      This also allows for a lighter weight 'issue pending' implementation as
      there is no longer a requirement to iterate through all the channels'
      ->issue_pending() routines as long as operations have been submitted in
      an ordered chain.  async_tx_issue_pending() is added for this case.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>