1. 22 May, 2012 39 commits
    • md/bitmap: record the space available for the bitmap in the superblock. · 1dff2b87
      NeilBrown committed
      Now that bitmaps can grow and shrink it is best if we record
      how much space is available.  This means that when
      we reduce the size of the bitmap we won't "lose" the space
      for later, when we might want to increase the size of the bitmap
      again.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: Remove extras after reshape to smaller number of devices. · 63aced61
      NeilBrown committed
      When a reshape which reduced the number of devices finishes
      we must remove the extra devices.
      
      So ensure that raid10_remove_disk won't try to keep them, and
      have raid10_finish_reshape clear the 'in_sync' flag.  Then
      remove_and_add_spares will be able to remove them.
      Reported-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: improve removal of extra devices after reshape. · da7613b8
      NeilBrown committed
      After a reshape which reduced the number of devices we need
      to disconnect the extra devices.
      The code for this doesn't currently handle 'replacement' devices.
      It is very unlikely that such devices will be present, but it is
      safest to handle them anyway.
      
      So simplify the handling.  Just clear In_sync and leave it
      to remove_and_add_spares (which will be called soon) to do
      the real work.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: check the return of mddev_find() · 0c098220
      Yuanhan Liu committed
      Check the return of mddev_find(), since it may fail when out of
      memory or out of usable minor numbers.
      
      The reason I chose -ENODEV instead of -ENOMEM or something else is
      that the md_alloc() function chose it ;)
      Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • MD RAID1: Further conditionalize 'fullsync' · 4f0a5e01
      Jonathan Brassow committed
      A RAID1 device does not necessarily need a fullsync if the bitmap can be used instead.
      
      Similar to commit d6b212f4 in raid5.c, if a raid1
      device can be brought back (i.e. from a transient failure) it shouldn't need a
      complete resync.  Provided the bitmap is not too old, it will have recorded the areas
      of the disk that need recovery.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • DM RAID: Use md_error() in place of simply setting Faulty bit · c32fb9e7
      Jonathan Brassow committed
      When encountering an error while reading the superblock, call md_error.
      
      We are currently setting the 'Faulty' bit on one of the array devices when an
      error is encountered while reading the superblock of a dm-raid array.  We should
      be calling md_error(), as it handles the error more completely.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • DM RAID: Record and handle missing devices · 81f382f9
      Jonathan Brassow committed
      Missing dm-raid devices should be recorded in the superblock.
      
      When specifying the devices that compose a DM RAID array, it is possible to denote
      failed or missing devices with '-'s.  When this occurs, we must record this in the
      superblock.  We do this by checking if the array position's data device is missing
      and then forcing MD to record the superblock by setting 'MD_CHANGE_DEVS' in
      'raid_resume'.  If we do not cause the superblock to be rewritten by the resume
      function, it is possible for a stale superblock to be written by an outgoing
      inactive table (during 'raid_dtr').
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • DM RAID: Set recovery flags on resume · 47525e59
      Jonathan Brassow committed
      Properly initialize MD recovery flags when resuming device-mapper devices.
      
      When a device-mapper device is suspended, all I/O must stop.  This is done by
      calling 'md_stop_writes' and 'mddev_suspend'.  These calls in turn manipulate
      the recovery flags - including setting 'MD_RECOVERY_FROZEN'.  The DM device
      may have been suspended while recovery was not yet complete, so the process
      needs to pick up where it left off.  Since 'mddev_resume' does not unset
      'MD_RECOVERY_FROZEN' and set 'MD_RECOVERY_NEEDED', we must do it ourselves.
      'MD_RECOVERY_NEEDED' can safely be set in 'mddev_resume', but 'MD_RECOVERY_FROZEN'
      must be cleared outside of 'mddev_resume' due to how MD handles RAID reshaping.
      (e.g.  It is possible for a user to delay reshaping a RAID5->RAID6 by purposefully
      setting 'MD_RECOVERY_FROZEN'.  Clearing it in 'mddev_resume' would override the
      desired behavior.)
      
      Because 'mddev_resume' already unconditionally calls 'md_wakeup_thread(mddev->thread)'
      there is no need to make this call from 'raid_resume' since it calls 'mddev_resume'.
      
      Also clean up where level_store calls mddev_resume() - it currently
      duplicates some of the functions of that call. - NB
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
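The flag handling described in "DM RAID: Set recovery flags on resume" can be modelled in user space. This is a minimal sketch, not the kernel code: `struct toy_mddev`, the `TOY_*` bit numbers and every `toy_*` function are hypothetical stand-ins, and it assumes (as the message suggests) that the patched `mddev_resume` sets the NEEDED flag while the caller clears FROZEN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers; the real ones live in the kernel's md headers. */
enum { TOY_RECOVERY_NEEDED = 0, TOY_RECOVERY_FROZEN = 1 };

struct toy_mddev {
    unsigned long recovery; /* recovery flag word */
    bool suspended;
};

static bool toy_test(const struct toy_mddev *m, int nr)
{
    return (m->recovery >> nr) & 1UL;
}

/* Suspending stops I/O and, per the commit message, leaves FROZEN set. */
static void toy_suspend(struct toy_mddev *m)
{
    m->recovery |= 1UL << TOY_RECOVERY_FROZEN;
    m->suspended = true;
}

/* Models the resume helper: sets NEEDED but never touches FROZEN,
 * so a user who froze a reshape on purpose is not overridden. */
static void toy_mddev_resume(struct toy_mddev *m)
{
    assert(m->suspended);
    m->suspended = false;
    m->recovery |= 1UL << TOY_RECOVERY_NEEDED;
}

/* Models the dm-raid resume path: clear FROZEN here, then resume. */
static void toy_raid_resume(struct toy_mddev *m)
{
    m->recovery &= ~(1UL << TOY_RECOVERY_FROZEN);
    toy_mddev_resume(m);
}
```

Calling `toy_suspend` followed by `toy_raid_resume` leaves FROZEN clear and NEEDED set, which is the state in which recovery can continue where it left off.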
    • md/raid5: Allow reshape while a bitmap is present. · 30b67645
      NeilBrown committed
      We always should have allowed this.  A raid5 reshape doesn't change
      the size of the bitmap, so there is no need to restrict it.
      
      Also add a test to make sure we don't try to start a reshape on a
      failed array.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: resize bitmap when required during reshape. · bb63a701
      NeilBrown committed
      If a reshape changes the size of the array, then we can now
      update the bitmap to suit - so do so.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: allow array to be resized while bitmap is present. · a4a6125a
      NeilBrown committed
      Now that bitmaps can be resized, we can allow an array to be resized
      while the bitmap is present.
      
      This only covers resizing that involves changing the effective size
      of member devices, not resizing that changes the number of devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: make sure reshape request are reflected in superblock. · b81a0404
      NeilBrown committed
      As a reshape may change the sync_size and/or chunk_size, we need
      to update these whenever we write out the bitmap superblock.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: add bitmap_resize function to allow bitmap resizing. · d60b479d
      NeilBrown committed
      This function will allocate the new data structures and copy
      bits across from old to new, allowing for the possibility that the
      chunksize has changed.
      
      Use the same function for performing the initial allocation
      of the structures.  This improves test coverage.
      
      When bitmap_resize is used to resize an existing bitmap, it
      only copies '1' bits in, not '0' bits.
      So when allocating the bitmap, ensure everything is initialised
      to ZERO.
      Signed-off-by: NeilBrown <neilb@suse.de>
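The "copy only '1' bits" rule from the bitmap_resize message can be illustrated with a small user-space model. This is a sketch under assumptions, not the kernel's bitmap_resize: `toy_bitmap_copy` is a hypothetical name, each chunk's bit is stored in one byte for readability, and chunk sizes are given in sectors.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy dirty bits from an old bitmap to a new one whose chunk size may
 * differ. One byte per chunk: 1 means "dirty", 0 means "clean". */
static void toy_bitmap_copy(const unsigned char *old_bits, size_t old_chunks,
                            unsigned long old_chunk_sectors,
                            unsigned char *new_bits, size_t new_chunks,
                            unsigned long new_chunk_sectors)
{
    assert(old_chunk_sectors > 0 && new_chunk_sectors > 0);

    /* Only '1' bits are copied below, so the destination must start
     * out fully zeroed. */
    memset(new_bits, 0, new_chunks);

    for (size_t i = 0; i < old_chunks; i++) {
        if (!old_bits[i])
            continue;
        /* Sector range [start, end) covered by old chunk i. */
        unsigned long start = i * old_chunk_sectors;
        unsigned long end = start + old_chunk_sectors;
        /* Mark every new chunk that overlaps that range. */
        for (size_t j = start / new_chunk_sectors;
             j < new_chunks && j * new_chunk_sectors < end; j++)
            new_bits[j] = 1;
    }
}
```

Halving the chunk size turns one dirty old chunk into two dirty new chunks. Doubling it marks a new chunk dirty if any old chunk inside it was dirty, so the copy can over-mark but never loses a dirty region.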
    • md/bitmap: use DIV_ROUND_UP instead of open-code · 15702d7f
      NeilBrown committed
      Also take the opportunity to simplify CHUNK_BLOCK_RATIO.
      Signed-off-by: NeilBrown <neilb@suse.de>
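For readers unfamiliar with the macro, the sketch below shows the rounding-up division that DIV_ROUND_UP performs. The macro body matches the usual kernel definition; `toy_chunks_needed` is a hypothetical helper added only for illustration and is not taken from the patch.

```c
#include <assert.h>

/* Integer division that rounds up instead of truncating. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of bitmap chunks needed to cover
 * 'sectors' when each chunk spans 'chunk_sectors'. */
static unsigned long toy_chunks_needed(unsigned long sectors,
                                       unsigned long chunk_sectors)
{
    assert(chunk_sectors != 0);
    return DIV_ROUND_UP(sectors, chunk_sectors);
}
```

With 128-sector chunks, 1000 sectors need 8 chunks, 1024 sectors need exactly 8, and 1025 sectors need 9.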
    • md/bitmap: create a 'struct bitmap_counts' substructure of 'struct bitmap' · 40cffcc0
      NeilBrown committed
      The new "struct bitmap_counts" contains all the fields that are
      related to counting the number of active writes in each bitmap chunk.
      
      Having this separate will make it easier to change the chunksize
      or overall size of a bitmap atomically.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: make bitmap bitops atomic. · 63c68268
      NeilBrown committed
      This allows us to remove spinlock protection which is
      more heavy-weight than simple atomics.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: make _page_attr bitops atomic. · bdfd1140
      NeilBrown committed
      Using e.g. set_bit instead of __set_bit and using test_and_clear_bit
      allow us to remove some locking and contract other locked ranges.
      
      It is rare that we set or clear a lot of these bits, so gain should
      outweigh any cost.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: merge bitmap_file_unmap and bitmap_file_put. · fae7d326
      NeilBrown committed
      These functions really do one thing together: release the
      'bitmap_storage'.  So make them just one function.
      
      Since we removed the locking (previous patch), we don't need to zero
      any fields before freeing them, so it all becomes a bit simpler.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: remove async freeing of bitmap file. · 62f82faa
      NeilBrown committed
      There is no real value in freeing things the moment there is an error.
      It is just as good to free the bitmap file and pages when the bitmap
      is explicitly removed (and replaced?) or at shutdown.
      
      With this gone, the bitmap will only disappear when the array is
      quiescent, so we can remove some locking.
      
      As the 'filemap' doesn't disappear now, include extra checks before
      trying to write any of it out.
      Also remove the check for "has it disappeared" in
      bitmap_daemon_write().
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: convert some spin_lock_irqsave to spin_lock_irq · 74667123
      NeilBrown committed
      All of these sites can only be called from process context with
      irqs enabled, so using irqsave/irqrestore just adds noise.
      Remove it.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: use set_bit, test_bit, etc for operation on bitmap->flags. · b405fe91
      NeilBrown committed
      We currently use '&' and '|' which isn't the norm in the kernel
      and doesn't allow easy atomicity.
      So change to bit numbers and {set,clear,test}_bit.
      This allows us to remove a spinlock/unlock (which was dubious anyway)
      and some other simplifications.
      Signed-off-by: NeilBrown <neilb@suse.de>
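The move from '&' and '|' on mask values to bit numbers with {set,clear,test}_bit can be illustrated in portable C11. This is a user-space analogy using `<stdatomic.h>`, not the kernel bitops: the `toy_*` helpers and the `TOY_BITMAP_*` bit numbers are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Flags are identified by bit number rather than by mask value. */
enum { TOY_BITMAP_STALE = 1, TOY_BITMAP_WRITE_ERROR = 2 };

static void toy_set_bit(int nr, atomic_ulong *flags)
{
    assert(nr >= 0);
    atomic_fetch_or(flags, 1UL << nr); /* one atomic read-modify-write */
}

static void toy_clear_bit(int nr, atomic_ulong *flags)
{
    atomic_fetch_and(flags, ~(1UL << nr));
}

static bool toy_test_bit(int nr, atomic_ulong *flags)
{
    return (atomic_load(flags) >> nr) & 1UL;
}

/* Clears the bit and reports whether it was set beforehand, in a single
 * atomic step, so no lock is needed around a test-then-clear sequence. */
static bool toy_test_and_clear_bit(int nr, atomic_ulong *flags)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_and(flags, ~mask) & mask) != 0;
}
```

Because each helper is a single atomic operation, two threads updating different flags in the same word cannot lose each other's update. A plain `flags |= mask` offers no such guarantee without an external lock.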
    • md/bitmap: remove single-bit manipulation on sb->state · 84e92345
      NeilBrown committed
      Just do single-bit manipulations on bitmap->flags and copy whole
      value between that and sb->state.
      
      This will allow the next patch, which changes how bit manipulations are
      performed on bitmap->flags.
      
      This does result in BITMAP_STALE not being set in sb by
      bitmap_read_sb, however as the setting is determined by other
      information in the 'sb' we do not lose information this way.
      Normally, bitmap_load will be called shortly which will clear
      BITMAP_STALE anyway.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: remove bitmap_mask_state · edbb79df
      NeilBrown committed
      This function isn't really needed.  It sets or clears a flag in both
      bitmap->flags and sb->state.
      However both times it is called, bitmap_update_sb is called soon
      afterwards which copies bitmap->flags to sb->state.
      So just make changes to bitmap->flags, and open-code those rather than
      hiding in a function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: move storage allocation from bitmap_load to bitmap_create. · bc9891a8
      NeilBrown committed
      We should allocate memory for the storage-bitmap at create-time, not
      load time.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: separate bitmap file allocation to its own function. · d1244cb0
      NeilBrown committed
      This will allow allocation before swapping in a new bitmap.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: store bytes in file rather than just in last page. · 9b1215c1
      NeilBrown committed
      This number is more generally useful, and bytes-in-last-page is
      easily extracted from it.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: move some fields of 'struct bitmap' into a 'storage' substruct. · 1ec885cd
      NeilBrown committed
      This new 'struct bitmap_storage' reflects the external storage of the
      bitmap.
      Having this clearly defined will make it easier to change the storage
      used while the array is active.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: change *_page_attr() to take a page number, not a page. · d189122d
      NeilBrown committed
      Most often we have the page number, not the page.  And that is what
      the *_page_attr() functions really want.  So change the arguments to
      take that number.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: centralise allocation of bitmap file pages. · 27581e5a
      NeilBrown committed
      Instead of allocating pages in read_sb_page, read_page and
      bitmap_read_sb, allocate them all in bitmap_init_from_disk.
      
      Also replace the hack of calling "attach_page_buffers(page, NULL)" to
      ensure that free_buffer() won't complain, by putting a test for
      PagePrivate in free_buffer().
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: allow a bitmap with no backing storage. · ef99bf48
      NeilBrown committed
      An md bitmap comprises two parts
       - internal counting of active writes per 'chunk'.
       - external storage of whether there are any active writes on
         each chunk
      
      The second requires the first, but the first doesn't require the
      second.
      
      Not having backing storage means that the bitmap cannot expedite
      resync after a crash, but it still allows us to expedite the recovery
      of a recently-removed device.
      
      So: allow a bitmap to exist even if there is no backing device.
      In that case we default to 128M chunks.
      
      A particular value of this is that we can remove and re-add a bitmap
      (possibly of a different granularity) on a degraded array, and not
      lose the information needed to fast-recover the missing device.
      
      We don't actually activate these bitmaps yet - that will come
      in a later patch.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: add new 'space' attribute for bitmaps. · 6409bb05
      NeilBrown committed
      If we are to allow bitmaps to be resized when the array is resized,
      we need to know how much space there is.
      
      So create an attribute to store this information and set appropriate
      defaults.
      
      It can be set more precisely via sysfs, or future metadata extensions
      may allow it to be recorded.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: disentangle two different 'pending' flags. · bf07bb7d
      NeilBrown committed
      There are two different 'pending' concepts in the handling of the
      write intent bitmap.
      
      Firstly, a 'page' from the bitmap (which contains PAGE_SIZE*8 bits)
      may have changes (bits cleared) that should be written in due course.
      There is no hurry for these and the page will transition from
      PENDING to NEEDWRITE and will then be written, though if it ever
      becomes DIRTY it will be written much sooner and PENDING will be
      cleared.
      
      Secondly, a page of counters - which contains PAGE_SIZE/2 counters, one
      for each bit, can usefully have a 'pending' flag which indicates if
      any of the counters are low (2 or 1) and ready to be processed by
      bitmap_daemon_work().  If this flag is clear we can skip the whole
      page.
      
      These two concepts are currently combined in the bitmap-file flag.
      This causes a tighter connection between the counters and the bitmap
      file than I would like - as I want to add some flexibility to the
      bitmap file.
      
      So introduce a new flag with the page-of-counters, and rewrite
      bitmap_daemon_work() so that it handles the two different 'pending'
      concepts separately.
      
      This also allows us to clear BITMAP_PAGE_PENDING when we write out
      a dirty page, which may occasionally reduce the number of times we
      write a page.
      Signed-off-by: NeilBrown <neilb@suse.de>
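The two page granularities behind the two 'pending' flags follow from the figures in the message: a bitmap-file page holds PAGE_SIZE*8 bits, while a page of counters holds PAGE_SIZE/2 counters. The sketch below works out which page of each kind a given chunk falls in. The function names are hypothetical, and it assumes bit 0 sits at the very start of the first file page, ignoring any header stored at the start of the bitmap file.

```c
#include <assert.h>

/* Index of the bitmap-file page holding the bit for 'chunk'
 * (PAGE_SIZE * 8 bits per page). */
static unsigned long toy_file_page_of_chunk(unsigned long chunk,
                                            unsigned long page_size)
{
    assert(page_size >= 2);
    return chunk / (page_size * 8);
}

/* Index of the counter page holding the counter for 'chunk'
 * (PAGE_SIZE / 2 counters per page). */
static unsigned long toy_counter_page_of_chunk(unsigned long chunk,
                                               unsigned long page_size)
{
    assert(page_size >= 2);
    return chunk / (page_size / 2);
}
```

With 4096-byte pages, one file page covers 32768 chunks and one counter page covers 2048, so each file page corresponds to 16 counter pages. That mismatch is one reason a single shared 'pending' flag ties the two structures together more tightly than necessary.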
    • raid5: support sync request · bc0934f0
      Shaohua Li committed
      REQ_SYNC is ignored in the current raid5 code. The block layer does use it
      to make policy decisions, for example in the I/O scheduler. This patch adds
      support for it.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: remove unused variables · cceeca43
      Shaohua Li committed
      The two variables are useless.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: Fix memleak in r10buf_pool_alloc · 5fdd2cf8
      majianpeng committed
      If the allocation of rep1_bio fails, we currently don't free the 'bio'
      of the same dev.
      
      Reported by kmemleak.
      Signed-off-by: majianpeng <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: allow fix_read_error to read from recovering device. · da8840a7
      majianpeng committed
      When attempting to fix a read error, it is acceptable to read from a
      device that is recovering, provided the recovery has got past the
      place we are reading from.  This makes the test for "can we read from
      here" the same as the test in read_balance.
      Signed-off-by: majianpeng <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
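The "can we read from here" test described for fix_read_error can be sketched as a simple predicate. This is an illustrative user-space model, not the raid1 code: `struct toy_rdev` and `toy_can_read` are hypothetical, and excluding faulty devices is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a member device. */
struct toy_rdev {
    bool in_sync;                       /* fully up to date */
    bool faulty;                        /* marked failed */
    unsigned long long recovery_offset; /* sectors already recovered */
};

/* A device is readable for [sector, sector + nr_sectors) if it is in
 * sync, or if recovery has already passed the end of that range. */
static bool toy_can_read(const struct toy_rdev *rdev,
                         unsigned long long sector,
                         unsigned int nr_sectors)
{
    assert(nr_sectors > 0);
    if (rdev->faulty)
        return false; /* assumption: never read from a failed device */
    if (rdev->in_sync)
        return true;
    return rdev->recovery_offset >= sector + nr_sectors;
}
```

A device that has recovered its first 1000 sectors can serve a read of sectors 992 to 999, but not one that extends to sector 1000 or beyond.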
    • md: move freeing of badblocks.page into md_rdev_clear · 4fa2f327
      NeilBrown committed
      This ensures that it is always freed - there were cases where
      we failed to free the page.
      Reported-by: majianpeng <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: dm-raid should call helper function to clear rdev. · 545c8795
      NeilBrown committed
      dm-raid currently open-codes the freeing of some members of
      an rdev.  It is more maintainable to have it call common code
      from md.c which does this for all call-sites.
      
      So rename free_disk_sb to md_rdev_clear, export it, and use it in
      dm-raid.c
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: add reshape support · 3ea7daa5
      NeilBrown committed
      A 'near' or 'offset' layout RAID10 array can be reshaped to a different
      'near' or 'offset' layout, a different chunk size, and a different
      number of devices.
      However the number of copies cannot change.
      
      Unlike RAID5/6, we do not support having user-space backup data that
      is being relocated during a 'critical section'.  Rather, the
      data_offset of each device must change so that when writing any block
      to a new location, it will not over-write any data that is still
      'live'.
      
      This means that RAID10 reshape is not supportable on v0.90 metadata.
      
      The difference between the old data_offset and the new_offset must be
      at least the larger of the chunksize multiplied by offset copies of
      each of the old and new layout. (for 'near' mode, offset_copies == 1).
      
      A larger difference of around 64M seems useful for in-place reshapes
      as more data can be moved between metadata updates.
      Very large differences (e.g. 512M) seem to slow the process down due
      to lots of long seeks (on oldish consumer-grade devices at least).
      
      Metadata needs to be updated whenever the place we are about to write
      to is considered - by the current metadata - to still contain data in
      the old layout.
      
      [unbalanced locking fix from Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NeilBrown <neilb@suse.de>
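The minimum data_offset gap stated in the RAID10 reshape message is the larger of (chunk size × offset copies) for the old and the new layout. The sketch below computes it. `toy_min_offset_diff` is a hypothetical helper, and expressing chunk sizes in sectors is an assumption of the example.

```c
#include <assert.h>

/* Smallest allowed difference between the old and new data_offset:
 * the larger of chunk_size * offset_copies over both layouts.
 * For 'near' layouts, offset_copies is 1. */
static unsigned long toy_min_offset_diff(unsigned long old_chunk,
                                         unsigned int old_offset_copies,
                                         unsigned long new_chunk,
                                         unsigned int new_offset_copies)
{
    assert(old_offset_copies >= 1 && new_offset_copies >= 1);
    unsigned long old_need = old_chunk * old_offset_copies;
    unsigned long new_need = new_chunk * new_offset_copies;
    return old_need > new_need ? old_need : new_need;
}
```

Going from a 'near' layout with 128-sector chunks to an 'offset' layout with 1024-sector chunks and 2 offset copies requires a gap of at least 2048 sectors. As the message notes, a considerably larger gap may still be preferable in practice.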
  2. 21 May, 2012 1 commit