1. 31 Mar, 2009 14 commits
    • md: be more consistent about setting WriteMostly flag when adding a drive to an array · 575a80fa
      NeilBrown committed
      When a drive is added to an array using ADD_NEW_DISK, there are two
      places we can get certain flags from:  the metadata on the disk or the
      flags passed through the IOCTL.
      
      For the WriteMostly flag (aka MD_DISK_WRITEMOSTLY) we take the value
      from either of those sources depending on if it is set (i.e. we
      effectively 'or' the two sources together).
      
      This makes it awkward to clear, and is at best inconsistent.
      
      As documented code (in mdadm) requires that setting
      MD_DISK_WRITEMOSTLY in the ioctl will be effective, we resolve the
      inconsistency by always using the value for this flag from the ioctl,
      and ignoring the value on disk.
      Signed-off-by: NeilBrown <neilb@suse.de>
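The change described in the commit above can be sketched in a few lines of C. This is an illustration only, not the driver code: `DEMO_WRITEMOSTLY_BIT`, `write_mostly_old` and `write_mostly_new` are hypothetical names, and the bit position is an assumption chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit position of the write-mostly flag in the ioctl state word. */
#define DEMO_WRITEMOSTLY_BIT 9

/* Old behaviour as described: the flag ends up set if either source sets it. */
bool write_mostly_old(bool on_disk, unsigned int ioctl_state)
{
    bool from_ioctl = ((ioctl_state >> DEMO_WRITEMOSTLY_BIT) & 1u) != 0;
    return on_disk || from_ioctl;
}

/* New behaviour as described: only the ioctl value counts. */
bool write_mostly_new(bool on_disk, unsigned int ioctl_state)
{
    (void)on_disk; /* the on-disk value is deliberately ignored */
    return ((ioctl_state >> DEMO_WRITEMOSTLY_BIT) & 1u) != 0;
}

/* Usage example: a flag recorded on disk can now be cleared through the ioctl. */
void write_mostly_self_check(void)
{
    assert(write_mostly_old(true, 0u) == true);   /* could not be cleared */
    assert(write_mostly_new(true, 0u) == false);  /* cleared by the ioctl */
    assert(write_mostly_new(false, 1u << DEMO_WRITEMOSTLY_BIT) == true);
}
```

With the 'or' semantics there is no ioctl input that clears a flag already recorded on disk, which is the awkwardness the commit message refers to.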
    • md: occasionally checkpoint drive recovery to reduce duplicate effort after a crash · 97e4f42d
      NeilBrown committed
      Version 1.x metadata has the ability to record the status of a
      partially completed drive recovery.
      However we only update that record on a clean shutdown.
      It would be nice to update it on unclean shutdowns too, particularly
      when using a bitmap that removes much of the 'sync' effort after an
      unclean shutdown.
      
      One complication with checkpointing recovery is that we only know
      where we are up to in terms of IO requests started, not which ones
      have completed.  And we need to know what has completed to record
      how much is recovered.  So occasionally pause the recovery until all
      submitted requests are completed, then update the record of where
      we are up to.
      
      When we have a bitmap, we already do that pause occasionally to keep
      the bitmap up-to-date.  So enhance that code to record the recovery
      offset and schedule a superblock update.
      And when there is no bitmap, just pause 16 times during the resync to
      do a checkpoint.
      '16' is a fairly arbitrary number.  But we don't really have any good
      way to judge how often is acceptable, and it seems like a reasonable
      number for now.
      Signed-off-by: NeilBrown <neilb@suse.de>
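The "pause 16 times" rule for the no-bitmap case can be pictured as a simple interval check. The sketch below is an assumption about the shape of such a check, not the code from the patch: `DEMO_CHECKPOINTS`, `checkpoint_due` and `count_checkpoints` are hypothetical names, and draining of in-flight requests is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The commit message calls 16 "fairly arbitrary"; the sketch reuses it. */
#define DEMO_CHECKPOINTS 16

/* A checkpoint is due once 1/16 of the device has been started since the
 * last recorded position. */
bool checkpoint_due(uint64_t started, uint64_t last_checkpoint,
                    uint64_t total_sectors)
{
    assert(started >= last_checkpoint);
    assert(started <= total_sectors);
    return started - last_checkpoint >= total_sectors / DEMO_CHECKPOINTS;
}

/* Walk over the device in fixed steps and count how many checkpoints the
 * rule above would trigger. */
unsigned int count_checkpoints(uint64_t total_sectors, uint64_t step)
{
    uint64_t started = 0, last = 0;
    unsigned int n = 0;

    assert(step > 0);
    while (started < total_sectors) {
        uint64_t remaining = total_sectors - started;

        started += step < remaining ? step : remaining;
        if (checkpoint_due(started, last, total_sectors)) {
            /* A real implementation would first wait for all submitted
             * requests to complete, then record the offset. */
            last = started;
            n++;
        }
    }
    return n;
}
```

For a 1600-sector device walked one sector at a time, the interval is 100 sectors and the loop records a checkpoint at 100, 200, and so on up to 1600.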
    • md: move md_k.h from include/linux/raid/ to drivers/md/ · 43b2e5d8
      NeilBrown committed
      It really is nicer to keep related code together.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move lots of #include lines out of .h files and into .c · bff61975
      NeilBrown committed
      This makes the includes more explicit, and is preparation for moving
      md_k.h to drivers/md/md.h
      
      Remove include/raid/md.h as its only remaining use was to #include
      other files.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move most content from md.h to md_k.h · 92022950
      NeilBrown committed
      The extern function definitions are kernel-internal definitions, so
      they belong in md_k.h
      
      The MD_*_VERSION values could reasonably go in a number of places,
      but md_u.h seems most reasonable.
      
      This leaves almost nothing in md.h.  It will go soon.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move LEVEL_* definition from md_k.h to md_u.h · 8b2b5c21
      NeilBrown committed
      .. as they are part of the user-space interface.
      Also move MdpMinorShift into there so we can remove duplication.
      
      Lastly move mdp_major in.  It is less obviously part of the user-space
      interface, but do_mounts_md.c uses it, and it is acting a bit like
      user-space.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move headers out of include/linux/raid/ · ef740c37
      Christoph Hellwig committed
      Move the headers with the local structures for the disciplines and
      bitmap.h into drivers/md/ so that they are more easily grepable for
      hacking and not far away.  md.h is left where it is for now as there
      are some uses from the outside.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • cleanup drivers/md/Makefile · 2a40a8ae
      Christoph Hellwig committed
      Use the -y variables instead of the old -objs so we can easily add
      conditional objects to the modules.  Also always use += to add
      subobjects to avoid problems when placing additional objects in
      some place in the file.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: stop defining MAJOR_NR · 3dbd8c2e
      Christoph Hellwig committed
      MAJOR_NR was only required for magic in linux/blk.h in 2.4 or earlier
      kernels, so no need to keep it around.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • MD data integrity support · 3f9d99c1
      Martin K. Petersen committed
      md: Add support for data integrity to MD
      
      If all subdevices support the same protection format the MD device is
      flagged as integrity capable.
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
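The rule stated in the commit above (the array is integrity capable only if every member supports the same protection format) amounts to an all-members comparison. The following is a minimal sketch under stated assumptions: `struct demo_member`, its fields and `demo_array_integrity_capable` are hypothetical, and comparing a format name plus a tuple size is only one plausible way to define "the same format".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one member device. 'format' must be a valid
 * string whenever 'has_integrity' is true. */
struct demo_member {
    bool has_integrity;
    const char *format;
    unsigned int tuple_size;
};

/* Returns true only if every member has an integrity format and all of the
 * formats match the first member's. */
bool demo_array_integrity_capable(const struct demo_member *devs, size_t n)
{
    assert(devs != NULL || n == 0);

    if (n == 0 || !devs[0].has_integrity)
        return false;

    for (size_t i = 1; i < n; i++) {
        if (!devs[i].has_integrity)
            return false;
        if (strcmp(devs[i].format, devs[0].format) != 0 ||
            devs[i].tuple_size != devs[0].tuple_size)
            return false;
    }
    return true;
}
```

A single member without integrity support, or with a different format, is enough to leave the whole array unflagged.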
    • md: write bitmap information to devices that are undergoing recovery. · 355a43e6
      NeilBrown committed
      When we add some spares to an array and start recovery, and we have
      a bitmap which is stored 'internally' on all devices, we call
      bitmap_write_all to make sure the bitmap is correct on the new
      device(s).
      However that doesn't work as write_sb_page only writes to
      'In_sync' devices, and devices undergoing recovery are not
      'In_sync' until recovery finishes.
      
      So extend write_sb_page (actually next_active_rdev) to include devices
      that are under recovery.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: never clear bit from the write-intent bitmap when the array is degraded. · d0a4bb49
      NeilBrown committed
      It is safe to clear a bit from the write-intent bitmap for a raid1
      if we know the data has been written to all devices, which is
      what the current test does.
      
      But it is not always safe to update the 'events_cleared' counter in
      that case.  This is because one request could complete successfully
      after some other request has partially failed.
      
      So simply disable the clearing and updating of events_cleared whenever
      the array is degraded.  This might end up not clearing some bits that
      could safely be cleared, but it is the safest approach.
      
      Note that the bug fixed here did not risk corrupting data by letting
      the array get out-of-sync.  Rather it meant that when a device is
      removed and re-added to the array, it might incorrectly require a full
      recovery rather than just recovering based on the bitmap.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: Allow write-intent bitmaps to have chunksize < PAGE_SIZE · 1187cf0a
      NeilBrown committed
      md currently insists that the chunk size used for write-intent
      bitmaps (the amount of data that corresponds to one chunk)
      be at least one page.
      
      The reason for this restriction is lost in the mists of time,
      but a review of the code (and a vague memory) suggests that the only
      problem would be related to resync.  Resync tries very hard to
      work in multiples of a page, but also needs to sync with units
      of a bitmap_chunk too.
      
      This connection comes out in the bitmap_start_sync call.
      
      So change bitmap_start_sync to always work in multiples of a page.
      If the bitmap chunk size is less than one page, we flag multiple
      chunks as 'syncing' and generally make them all appear to the
      resync routines like one chunk.
      
      All other code either already works with data ranges that could
      span multiple chunks, or explicitly only cares about a single chunk.
      Signed-off-by: Neil Brown <neilb@suse.de>
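The idea in the commit above, making several small chunks look like one page-sized unit to the resync code, can be sketched as a loop around a per-chunk helper. Everything here is hypothetical (`struct demo_bitmap`, `demo_start_sync_chunk`, `demo_start_sync`), and the page is assumed to be 4096 bytes, i.e. 8 sectors of 512 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SECTORS 8u /* assumed: 4096-byte page, 512-byte sectors */

/* Hypothetical bitmap: one "needs sync" and one "syncing" flag per chunk. */
struct demo_bitmap {
    uint64_t chunk_sectors;
    size_t nchunks;
    const bool *needs_sync;
    bool *syncing;
};

/* Handle the single chunk containing 'offset'. Reports in *blocks how many
 * sectors remain up to the end of that chunk. */
bool demo_start_sync_chunk(struct demo_bitmap *bm, uint64_t offset,
                           uint64_t *blocks)
{
    uint64_t chunk = offset / bm->chunk_sectors;

    *blocks = bm->chunk_sectors - (offset % bm->chunk_sectors);
    if (chunk >= bm->nchunks || !bm->needs_sync[chunk])
        return false;
    bm->syncing[chunk] = true;
    return true;
}

/* Keep consuming chunks until at least one page worth of sectors is
 * covered; the range needs syncing if any chunk in it does. */
bool demo_start_sync(struct demo_bitmap *bm, uint64_t offset, uint64_t *blocks)
{
    bool need = false;

    assert(bm->chunk_sectors > 0);
    *blocks = 0;
    while (*blocks < DEMO_PAGE_SECTORS) {
        uint64_t step;

        if (demo_start_sync_chunk(bm, offset, &step))
            need = true;
        offset += step;
        *blocks += step;
    }
    return need;
}
```

With 2-sector chunks, one call starting at sector 0 covers four chunks and reports 8 sectors; with chunks of a page or larger, the loop runs once and behaves like the per-chunk helper.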
    • md: Fix is_mddev_idle test (again). · eea1bf38
      NeilBrown committed
      There are two problems with is_mddev_idle.
      
      1/ sync_io is 'atomic_t' and hence 'int'.  curr_events and all the
         rest are 'long'.
         So if sync_io were to wrap on a 64bit host, the value of
         curr_events would go very negative suddenly, and take a very
         long time to return to positive.
      
         So do all calculations as 'int'.  That gives us plenty of precision
         for what we need.
      
      2/ To initialise rdev->last_events we simply call is_mddev_idle, on
         the assumption that it will make sure that last_events is in a
         suitable range.  It used to do this, but now it does not.
         So now we need to be more explicit about initialisation.
      Signed-off-by: NeilBrown <neilb@suse.de>
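Point 1/ of the commit above is easier to see with concrete numbers. The sketch below is not the kernel function; `demo_curr_events_wide` and `demo_curr_events_narrow` are hypothetical helpers that subtract a 32-bit resync counter from a larger total, once in 64-bit arithmetic and once entirely in 32 bits. The narrowing cast assumes the usual two's-complement conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Mixed widths: a wrapped 32-bit counter is subtracted in 64-bit
 * arithmetic, so the result is off by 2^32 once the counter has wrapped. */
int64_t demo_curr_events_wide(uint64_t total_sectors, int32_t sync_io)
{
    return (int64_t)total_sectors - (int64_t)sync_io;
}

/* Everything in 32 bits: both operands wrap the same way, so their
 * difference stays meaningful. Unsigned arithmetic is used for the
 * subtraction because signed overflow is undefined in C. */
int32_t demo_curr_events_narrow(uint64_t total_sectors, int32_t sync_io)
{
    uint32_t diff = (uint32_t)total_sectors - (uint32_t)sync_io;

    return (int32_t)diff; /* assumes two's-complement conversion */
}

/* Usage example: 2^31 + 100 sectors in total, of which 2^31 were resync
 * I/O, so the 32-bit resync counter has wrapped to INT32_MIN. The true
 * amount of non-resync activity is 100 sectors. */
void demo_idle_self_check(void)
{
    uint64_t total = (UINT64_C(1) << 31) + 100;
    int32_t wrapped_sync_io = INT32_MIN;

    assert(demo_curr_events_narrow(total, wrapped_sync_io) == 100);
    assert(demo_curr_events_wide(total, wrapped_sync_io) ==
           (INT64_C(1) << 32) + 100);
}
```

Only differences between successive samples matter for an idleness test, which is why 32 bits of precision is enough.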
  2. 10 Mar, 2009 7 commits
  3. 09 Mar, 2009 19 commits