1. 26 Jul, 2010 7 commits
    • md/bitmap: separate out loading a bitmap from initialising the structures. · 69e51b44
      NeilBrown committed
      dm makes this distinction between ->ctr and ->resume, so we need to
      too.
      
      Also get the new bitmap_load to clear out the bitmap first, as this is
      most consistent with the dm suspend/resume approach.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log. · e384e585
      NeilBrown committed
      This allows md/raid5 to fully work as a dm target.
      
      Normally md uses a 'filemap' which contains a list of pages of bits
      each of which may be written separately.
      dm-log uses an all-or-nothing approach to writing the log, so
      when using a dm-log, ->filemap is NULL and the flags normally stored
      in filemap_attr are stored in ->logattrs instead.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: optimise scanning of empty bitmaps. · ef425673
      NeilBrown committed
      A bitmap is stored as one page per 2048 bits.
      If none of the bits are set, the page is not allocated.
      
      When bitmap_get_counter finds that a page isn't allocated,
      it just reports that one bit's worth of space isn't flagged,
      rather than reporting that 2048 bits' worth of space are
      unflagged.
      This can cause searches for flagged bits (e.g. bitmap_close_sync)
      to do more work than is really necessary.
      
      So change bitmap_get_counter (when creating) to report a number of
      blocks that more accurately reports the range of the device for which
      no counter currently exists.
      Signed-off-by: NeilBrown <neilb@suse.de>
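
As a rough illustration of the idea in this commit (not the kernel's actual code), the sketch below models a bitmap whose counter pages are allocated lazily. The names `sketch_bitmap`, `sketch_get_counter` and `sketch_scan_lookups` are hypothetical; only the figure of 2048 entries per page comes from the commit message. When a page is missing, the lookup reports how many chunks remain up to the end of that page, so a scanning loop can skip them in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNTERS_PER_PAGE 2048u  /* from the commit message: one page per 2048 bits */

/* Hypothetical, simplified model: pages[i] is NULL while nothing in page i is set. */
struct sketch_bitmap {
    uint16_t **pages;
    size_t     npages;
};

/*
 * Look up the counter for `chunk`. On return, *span holds how many chunks,
 * starting at `chunk`, the answer covers: 1 when the page exists, or the
 * rest of the page when the page was never allocated.
 */
static uint16_t *sketch_get_counter(const struct sketch_bitmap *bm,
                                    size_t chunk, size_t *span)
{
    size_t page = chunk / COUNTERS_PER_PAGE;
    size_t offset = chunk % COUNTERS_PER_PAGE;

    assert(bm != NULL && span != NULL);
    if (page >= bm->npages || bm->pages[page] == NULL) {
        *span = COUNTERS_PER_PAGE - offset;
        return NULL;
    }
    *span = 1;
    return &bm->pages[page][offset];
}

/* Count the lookups a full scan needs; fewer lookups means less work. */
static size_t sketch_scan_lookups(const struct sketch_bitmap *bm)
{
    size_t total = bm->npages * COUNTERS_PER_PAGE;
    size_t chunk = 0, lookups = 0, span;

    while (chunk < total) {
        (void)sketch_get_counter(bm, chunk, &span);
        chunk += span;
        lookups++;
    }
    return lookups;
}
```

With one unallocated page followed by one allocated page, a full scan in this model costs one lookup for the empty page plus one per entry of the allocated page, instead of one lookup per entry throughout.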
    • md/bitmap: clean up plugging calls. · b63d7c2e
      NeilBrown committed
      1/ use md_unplug in bitmap.c as we will soon be using bitmaps under
        arrays with no queue attached.
      
      2/ Don't bother plugging the queue when we set a bit in the bitmap.
         The reason for this was to encourage as many bits as possible to
         get set before we unplug and write stuff out.
         However every personality already plugs the queue after
         bitmap_startwrite either directly (raid1/raid10) or by setting
         STRIPE_BIT_DELAY which causes the queue to be plugged later
         (raid5).
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: reduce dependence on sysfs. · 5ff5afff
      NeilBrown committed
      For dm-raid45 we will want to use bitmaps in dm-targets which don't
      have entries in sysfs, so cope with the mddev not living in sysfs.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: white space clean up and similar. · ac2f40be
      NeilBrown committed
      Fixed some whitespace problems.
      Fixed some checkpatch.pl complaints.
      Replaced kmalloc ... memset(0), with kzalloc
      Fixed an unlikely memory leak on an error path.
      Reformatted a number of 'if/else' sets, sometimes
      replacing goto with an else clause.
      Removed some old comments and commented-out code.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: be more careful setting MD_CHANGE_CLEAN · 676e42d8
      NeilBrown committed
      When MD_CHANGE_CLEAN is set we might block in md_write_start.
      So we should only set it when fairly sure that something will clear
      it.
      
      There are two places where it is set so as to encourage a metadata
      update to record the progress of resync/recovery.  This should only
      be done if the internal metadata update mechanisms are in use, which
      can be tested by inspecting '->persistent'.
      Signed-off-by: NeilBrown <neilb@suse.de>
  2. 22 May, 2010 2 commits
    • sanitize vfs_fsync calling conventions · 8018ab05
      Christoph Hellwig committed
      Now that the last user passing a NULL file pointer is gone we can remove
      the redundant dentry argument and associated hacks inside vfs_fsync_range.
      
      The next step will be removing the dentry argument from ->fsync, but given
      the luck with the last round of method prototype changes I'd rather
      defer this until after the main merge window.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • sysfs: Implement sysfs tagged directory support. · 3ff195b0
      Eric W. Biederman committed
      The problem.  When implementing a network namespace I need to be able
      to have multiple network devices with the same name.  Currently this
      is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
      potentially a few other directories of the form /sys/ ... /net/*.
      
      What this patch does is to add an additional tag field to the
      sysfs dirent structure.  For directories that should show different
      contents depending on the context such as /sys/class/net/, and
      /sys/devices/virtual/net/ this tag field is used to specify the
      context in which those directories should be visible.  Effectively
      this is the same as creating multiple distinct directories with
      the same name but internally to sysfs the result is nicer.
      
      I am calling the concept of a single directory that looks like multiple
      directories all at the same path in the filesystem tagged directories.
      
      For the networking namespace the set of directories whose contents I need
      to filter with tags can depend on the presence or absence of hotplug
      hardware or which modules are currently loaded.  Which means I need
      a simple race-free way to set up those directories as tagged.
      
      To achieve a race-free design all tagged directories are created
      and managed by sysfs itself.
      
      Users of this interface:
      - define a type in the sysfs_tag_type enumeration.
      - call sysfs_register_ns_types with the type and its operations
      - sysfs_exit_ns when an individual tag is no longer valid
      
      - Implement mount_ns() which returns the ns of the calling process
        so we can attach it to a sysfs superblock.
      - Implement ktype.namespace() which returns the ns of a sysfs kobject.
      
      Everything else is left up to sysfs and the driver layer.
      
      For the network namespace mount_ns and namespace() are essentially
      one line functions, and look to remain that.
      
      Tags are currently represented as const void * pointers as that is
      both generic, provides enough information for equality comparisons,
      and is trivial to create for current users, as it is just the
      existing namespace pointer.
      
      The work needed in sysfs is more extensive.  At each directory
      or symlink creation I need to check if the directory it is being
      created in is a tagged directory and if so generate the appropriate
      tag to place on the sysfs_dirent.  Likewise at each symlink or
      directory removal I need to check if the sysfs directory it is
      being removed from is a tagged directory and if so figure out
      which tag goes along with the name I am deleting.
      
      Currently only directories which hold kobjects, and
      symlinks are supported.  There is not enough information
      in the current file attribute interfaces to give us anything
      to discriminate on which makes it useless, and there are
      no potential users which makes it an uninteresting problem
      to solve.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  3. 18 May, 2010 3 commits
  4. 14 Dec, 2009 10 commits
  5. 16 Oct, 2009 1 commit
  6. 23 Sep, 2009 1 commit
  7. 26 May, 2009 1 commit
    • md: bitmap: improve bitmap maintenance code. · be512691
      NeilBrown committed
      The code for checking which bits in the bitmap can be cleared
      has 2 problems:
       1/ it repeatedly takes and drops a spinlock, where it would make
          more sense to just hold on to it most of the time.
       2/ it doesn't make use of some opportunities to skip large sections
          of the bitmap
      
      This patch fixes those.  It will only affect CPU consumption, not
      correctness.
      Signed-off-by: NeilBrown <neilb@suse.de>
  8. 23 May, 2009 1 commit
  9. 07 May, 2009 2 commits
    • md: fix some (more) errors with bitmaps on devices larger than 2TB. · db305e50
      NeilBrown committed
      If a write intent bitmap covers more than 2TB, we sometimes work with
      values beyond 32bit, so these need to be sector_t.  This patch
      adds the required casts to some unsigned longs that are being shifted
      up.
      
      This will affect any raid10 larger than 2TB, or any raid1/4/5/6 with
      member devices that are larger than 2TB.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reported-by: "Mario 'BitKoenig' Holbe" <Mario.Holbe@TU-Ilmenau.DE>
      Cc: stable@kernel.org
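
The class of bug fixed here can be reproduced in user-space C. The following is a minimal sketch, assuming a 32-bit integer standing in for `unsigned long` on a 32-bit kernel and a 64-bit stand-in for `sector_t`; the type and function names are hypothetical and not taken from the patch.

```c
#include <stdint.h>

typedef uint64_t sketch_sector_t;   /* stand-in for a 64-bit sector_t */

/*
 * Buggy pattern: the shift is evaluated in 32 bits, so the high bits are
 * lost before the result is widened to 64 bits.
 */
static sketch_sector_t chunk_to_sector_buggy(uint32_t chunk, unsigned shift)
{
    return chunk << shift;
}

/* Fixed pattern: widen to the 64-bit type first, then shift. */
static sketch_sector_t chunk_to_sector_fixed(uint32_t chunk, unsigned shift)
{
    return (sketch_sector_t)chunk << shift;
}
```

For example, chunk number 2^22 shifted up by 10 should give sector 2^32, which is the 2 TiB boundary with 512-byte sectors. The buggy variant wraps around to 0, while the fixed variant gives the correct value.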
    • md: fix loading of out-of-date bitmap. · b74fd282
      NeilBrown committed
      When md is loading a bitmap which it knows is out of date, it fills
      each page with 1s and writes it back out again.  However the
      write_page call makes use of bitmap->file_pages and
      bitmap->last_page_size which haven't been set correctly yet.  So this
      can sometimes fail.
      
      Move the setting of file_pages and last_page_size to before the call
      to write_page.
      
      This bug can cause the assembly of an array to fail, thus making the
      data inaccessible.  Hence I think it is a suitable candidate for
      -stable.
      
      Cc: stable@kernel.org
      Reported-by: Vojtech Pavlik <vojtech@suse.cz>
      Signed-off-by: NeilBrown <neilb@suse.de>
  10. 20 Apr, 2009 1 commit
  11. 14 Apr, 2009 1 commit
    • md: improve usefulness and accuracy of sysfs file md/sync_completed. · acb180b0
      NeilBrown committed
      The sync_completed file reports how much of a resync (or recovery or
      reshape) has been completed.
      However due to the possibility of out-of-order completion of writes,
      it is not certain to be accurate.
      
      We have an internal value - mddev->curr_resync_completed - which is an
      accurate value (though it might not always be quite so uptodate).
      
      So:
       - make curr_resync_completed be uptodate a little more often,
         particularly when raid5 reshape updates status in the metadata
       - report curr_resync_completed in the sysfs file
       - allow poll/select to report all updates to md/sync_completed.
      
      This makes sync_completed usable by any external metadata
      handler that wants to record this status information in its metadata.
      Signed-off-by: NeilBrown <neilb@suse.de>
  12. 31 Mar, 2009 8 commits
    • md: Make mddev->size sector-based. · 58c0fed4
      Andre Noll committed
      This patch renames the "size" field of struct mddev_s to "dev_sectors"
      and stores the number of 512-byte sectors instead of the number of
      1K-blocks in it.
      
      All users of that field, including raid levels 1,4-6,10, are adjusted
      accordingly. This simplifies the code a bit because it allows us to get
      rid of a couple of divisions/multiplications by two.
      
      In order to make checkpatch happy, some minor coding style issues
      have also been addressed. In particular, size_store() now uses
      strict_strtoull() instead of simple_strtoull().
      Signed-off-by: Andre Noll <maan@systemlinux.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: occasionally checkpoint drive recovery to reduce duplicate effort after a crash · 97e4f42d
      NeilBrown committed
      Version 1.x metadata has the ability to record the status of a
      partially completed drive recovery.
      However we only update that record on a clean shutdown.
      It would be nice to update it on unclean shutdowns too, particularly
      when using a bitmap that removes much of the 'sync' effort after an
      unclean shutdown.
      
      One complication with checkpointing recovery is that we only know
      where we are up to in terms of IO requests started, not which ones
      have completed.  And we need to know what has completed to record
      how much is recovered.  So occasionally pause the recovery until all
      submitted requests are completed, then update the record of where
      we are up to.
      
      When we have a bitmap, we already do that pause occasionally to keep
      the bitmap up-to-date.  So enhance that code to record the recovery
      offset and schedule a superblock update.
      And when there is no bitmap, just pause 16 times during the resync to
      do a checkpoint.
      '16' is a fairly arbitrary number.  But we don't really have any good
      way to judge how often is acceptable, and it seems like a reasonable
      number for now.
      Signed-off-by: NeilBrown <neilb@suse.de>
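
The "pause 16 times" policy for the no-bitmap case can be sketched as a simple progress check. This is only an illustration under stated assumptions: the names `sketch_checkpoint_due` and `sketch_count_checkpoints` are hypothetical, positions are plain sector counts, and the real md code may decide when to pause differently.

```c
#include <stdint.h>

#define SKETCH_CHECKPOINTS 16u  /* the "fairly arbitrary" count from the commit message */

/*
 * Return 1 when enough progress has been made since the last checkpoint
 * that recovery should pause, wait for in-flight I/O to finish and then
 * record its position.
 */
static int sketch_checkpoint_due(uint64_t pos, uint64_t last_checkpoint,
                                 uint64_t total_sectors)
{
    uint64_t interval = total_sectors / SKETCH_CHECKPOINTS;

    if (interval == 0)
        interval = 1;  /* tiny devices: avoid a zero-length interval */
    return pos - last_checkpoint >= interval;
}

/*
 * Walk a simulated recovery in fixed steps and count the checkpoints taken.
 * `step` must be nonzero, otherwise the loop would never advance.
 */
static unsigned sketch_count_checkpoints(uint64_t total_sectors, uint64_t step)
{
    uint64_t pos = 0, last = 0;
    unsigned taken = 0;

    while (pos < total_sectors) {
        pos += step;
        if (sketch_checkpoint_due(pos, last, total_sectors)) {
            last = pos;  /* here the real code would also schedule a superblock update */
            taken++;
        }
    }
    return taken;
}
```

In this model a 1600-sector recovery advanced 10 sectors at a time takes a checkpoint every 100 sectors, 16 in total.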
    • md: move md_k.h from include/linux/raid/ to drivers/md/ · 43b2e5d8
      NeilBrown committed
      It really is nicer to keep related code together.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move lots of #include lines out of .h files and into .c · bff61975
      NeilBrown committed
      This makes the includes more explicit, and is preparation for moving
      md_k.h to drivers/md/md.h
      
      Remove include/raid/md.h as its only remaining use was to #include
      other files.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move headers out of include/linux/raid/ · ef740c37
      Christoph Hellwig committed
      Move the headers with the local structures for the disciplines and
      bitmap.h into drivers/md/ so that they are more easily grepable for
      hacking and not far away.  md.h is left where it is for now as there
      are some uses from the outside.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: write bitmap information to devices that are undergoing recovery. · 355a43e6
      NeilBrown committed
      When we add some spares to an array and start recovery, and we have
      a bitmap which is stored 'internally' on all devices, we call
      bitmap_write_all to make sure the bitmap is correct on the new
      device(s).
      However that doesn't work as write_sb_page only writes to
      'In_sync' devices, and devices undergoing recovery are not
      'In_sync' until recovery finishes.
      
      So extend write_sb_page (actually next_active_rdev) to include devices
      that are under recovery.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: never clear bit from the write-intent bitmap when the array is degraded. · d0a4bb49
      NeilBrown committed
      It is safe to clear a bit from the write-intent bitmap for a raid1
      if we know the data has been written to all devices, which is
      what the current test does.
      
      But it is not always safe to update the 'events_cleared' counter in
      that case.  This is because one request could complete successfully
      after some other request has partially failed.
      
      So simply disable the clearing and updating of events_cleared whenever
      the array is degraded.  This might end up not clearing some bits that
      could safely be cleared, but it is the safest approach.
      
      Note that the bug fixed here did not risk corrupting data by letting
      the array get out-of-sync.  Rather it meant that when a device is
      removed and re-added to the array, it might incorrectly require a full
      recovery rather than just recovering based on the bitmap.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: Allow write-intent bitmaps to have chunksize < PAGE_SIZE · 1187cf0a
      NeilBrown committed
      md currently insists that the chunk size used for write-intent
      bitmaps (the amount of data that corresponds to one chunk)
      be at least one page.
      
      The reason for this restriction is lost in the mists of time,
      but a review of the code (and a vague memory) suggests that the only
      problem would be related to resync.  Resync tries very hard to
      work in multiples of a page, but also needs to sync with units
      of a bitmap_chunk too.
      
      This connection comes out in the bitmap_start_sync call.
      
      So change bitmap_start_sync to always work in multiples of a page.
      If the bitmap chunk size is less than one page, we flag multiple
      chunks as 'syncing' and generally make them all appear to the
      resync routines like one chunk.
      
      All other code either already works with data ranges that could
      span multiple chunks, or explicitly only cares about a single chunk.
      Signed-off-by: Neil Brown <neilb@suse.de>
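
The approach described in this commit, making the sync entry point always cover at least one page by grouping small chunks, can be sketched roughly as follows. This is a user-space illustration and not the kernel's implementation: the helper names, the `dirty` array and the fixed 4096-byte page with 512-byte sectors are all assumptions made for the example.

```c
#include <stdint.h>

#define SKETCH_PAGE_SECTORS 8u   /* assumes 4096-byte pages and 512-byte sectors */

/*
 * Hypothetical per-chunk step: looks at the chunk containing `offset`,
 * reports in *sectors how many sectors remain in that chunk, and returns
 * nonzero if the chunk is marked as needing a sync in `dirty`.
 */
static int sketch_start_sync_one(const uint8_t *dirty, uint64_t chunk_sectors,
                                 uint64_t offset, uint64_t *sectors)
{
    uint64_t chunk = offset / chunk_sectors;

    *sectors = chunk_sectors - offset % chunk_sectors;
    return dirty[chunk] != 0;
}

/*
 * Page-granular wrapper: keeps consuming chunks until at least one page
 * worth of sectors is covered, so several small chunks look like a single
 * chunk to the caller. The caller must ensure `dirty` covers that range.
 */
static int sketch_start_sync(const uint8_t *dirty, uint64_t chunk_sectors,
                             uint64_t offset, uint64_t *sectors)
{
    int need_sync = 0;
    uint64_t one;

    *sectors = 0;
    while (*sectors < SKETCH_PAGE_SECTORS) {
        need_sync |= sketch_start_sync_one(dirty, chunk_sectors, offset, &one);
        offset += one;
        *sectors += one;
    }
    return need_sync;
}
```

With 1 KiB chunks (2 sectors each), one call covers four chunks and reports that a sync is needed if any of the four is dirty. With chunks larger than a page, the loop runs once and the wrapper behaves like the per-chunk step.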
  13. 09 Jan, 2009 2 commits
    • md: use list_for_each_entry macro directly · 159ec1fc
      Cheng Renquan committed
      The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to
      list_for_each_entry_safe from <linux/list.h>, so it should be defined in
      terms of list_for_each_entry_safe instead of reinventing the wheel.
      
      But some users of the safe variant don't really need it; a plain
      list_for_each_entry is enough, which saves a temporary variable (tmp)
      in every function that used rdev_for_each.
      
      In this patch, most rdev_for_each loops are replaced by list_for_each_entry,
      saving many tmp variables; the safe version is kept only where the loop
      body may call list_del to delete an entry.
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix bitmap-on-external-file bug. · 53845270
      NeilBrown committed
      Commit a2ed9615 fixed a bug with 'internal' bitmaps, but in the process
      broke 'in a file' bitmaps.  So they are broken in 2.6.28.
      
      This fixes it, and needs to go in 2.6.28-stable.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: stable@kernel.org