1. 01 Nov, 2011 13 commits
    • dm log userspace: add log device dependency · 5a25f0eb
      Jonathan E Brassow committed
      Allow userspace dm log implementations to register their log device so it
      is no longer missing from the list of device dependencies.
      
      When device mapper targets use a device they normally call dm_get_device
      which includes it in the device list returned to userspace applications
      such as LVM through the DM_TABLE_DEPS ioctl.  Userspace log devices
      don't use dm_get_device as userspace opens them so they are missing from
      the list of dependencies.
      
      This patch extends the DM_ULOG_CTR operation to allow userspace to
      respond with the name of the log device (if appropriate) to be
      registered via 'dm_get_device'.  DM_ULOG_REQUEST_VERSION is incremented.
      
      This is backwards compatible.  If the kernel and userspace log server
      have both been updated, the new information will be passed down to the
      kernel and the device will be registered.  If the kernel is new, but
      the log server is old, the log server will not pass down any device
      information and the kernel will simply bypass the device registration
      as before.  If the kernel is old but the log server is new, the log
      server will see the old version number and not pass the device info.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
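The three compatibility cases described in this commit can be sketched in plain C. This is only an illustration of the negotiation logic, not the kernel or log-server code: the version constants, both function names, and the device-name string used in the usage note are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical version numbers; the commit only says the version is incremented. */
enum { SKETCH_ULOG_VERSION_OLD = 1, SKETCH_ULOG_VERSION_NEW = 2 };

/*
 * Hypothetical log-server side of the constructor reply.
 * Returns the device name to pass down, or NULL when nothing is sent.
 */
const char *sketch_server_reply(int server_version, int kernel_version,
                                const char *log_dev)
{
    assert(log_dev != NULL);
    if (server_version < SKETCH_ULOG_VERSION_NEW)
        return NULL;    /* old server: never sends device information */
    if (kernel_version < SKETCH_ULOG_VERSION_NEW)
        return NULL;    /* new server sees the old version number */
    return log_dev;     /* both sides updated: pass the name down */
}

/*
 * Hypothetical kernel side: register the device only when a new kernel
 * actually received a name; otherwise bypass registration as before.
 */
bool sketch_kernel_registers(int kernel_version, const char *reply)
{
    if (kernel_version < SKETCH_ULOG_VERSION_NEW)
        return false;
    return reply != NULL && reply[0] != '\0';
}
```

With both sides at the new version, `sketch_kernel_registers(2, sketch_server_reply(2, 2, "/dev/sdb1"))` is true; in the two mixed cases the reply is NULL and registration is skipped.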
    • dm log userspace: fix comment hyphens · b8954457
      Jonathan Brassow committed
      Fix comments: clustered-disk needs a hyphen not an underscore.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: add thin provisioning target · 991d9fa0
      Joe Thornber committed
      Initial EXPERIMENTAL implementation of device-mapper thin provisioning
      with snapshot support.  The 'thin' target is used to create instances of
      the virtual devices that are hosted in the 'thin-pool' target.  The
      thin-pool target provides data sharing among devices.  This sharing is
      made possible using the persistent-data library in the previous patch.
      
      The main highlight of this implementation, compared to the previous
      implementation of snapshots, is that it allows many virtual devices to
      be stored on the same data volume, simplifying administration and
      allowing sharing of data between volumes (thus reducing disk usage).
      
      Another big feature is support for arbitrary depth of recursive
      snapshots (snapshots of snapshots of snapshots ...).  The previous
      implementation of snapshots did this by chaining together lookup tables,
      and so performance was O(depth).  This new implementation uses a single
      data structure so we don't get this degradation with depth.
      
      For further information and examples of how to use this, please read
      Documentation/device-mapper/thin-provisioning.txt
      Signed-off-by: Joe Thornber <thornber@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: add persistent data library · 3241b1d3
      Joe Thornber committed
      The persistent-data library offers a re-usable framework for the storage
      and management of on-disk metadata in device-mapper targets.
      
      It's used by the thin-provisioning target in the next patch and in an
      upcoming hierarchical storage target.
      
      For further information, please read
      Documentation/device-mapper/persistent-data.txt
      Signed-off-by: Joe Thornber <thornber@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: add bufio · 95d402f0
      Mikulas Patocka committed
      The dm-bufio interface allows you to do cached I/O on devices,
      holding recently-read blocks in memory and performing delayed writes.
      
      We don't use the buffer cache or page cache already present in the kernel because:
      * we need to handle block sizes larger than a page
      * we can't allocate memory to perform reads or we'd have deadlocks
      
      Currently, when a cache is required, we limit its size to a fraction of
      available memory.  Usage can be viewed and changed in
      /sys/module/dm_bufio/parameters/ .
      
      The first user is thin provisioning, but more dm users are planned.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: export dm get md · 3cf2e4ba
      Alasdair G Kergon committed
      Export dm_get_md() for the new thin provisioning target to use.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: add immutable feature · 36a0456f
      Alasdair G Kergon committed
      Introduce DM_TARGET_IMMUTABLE to indicate that the target type cannot be mixed
      with any other target type, and once loaded into a device, it cannot be
      replaced with a table containing a different type.
      
      The thin provisioning pool device will use this.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: add always writeable feature · cc6cbe14
      Alasdair G Kergon committed
      Add a target feature flag DM_TARGET_ALWAYS_WRITEABLE to indicate that a target
      does not support read-only mode.
      
      The initial implementation of the thin provisioning target uses this.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: add singleton feature · 3791e2fc
      Alasdair G Kergon committed
      Introduce the concept of a singleton table which contains exactly one target.
      
      If a target type sets the DM_TARGET_SINGLETON feature bit, device-mapper
      will ensure that any table that includes that target contains no others.
      
      The thin provisioning pool target uses this.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
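The singleton rule can be pictured as a small table check. The sketch below is a userspace illustration only: the struct, the bit value, and the function name are assumptions for the example and do not reproduce device-mapper's own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bit; the real value of DM_TARGET_SINGLETON is not given above. */
#define SKETCH_TARGET_SINGLETON 0x1u

/* Hypothetical, simplified description of a target type. */
struct sketch_target_type {
    const char *name;
    unsigned features;
};

/*
 * A table is acceptable if no target in it carries the singleton bit,
 * or if the one target that does is the only entry in the table.
 */
bool sketch_table_is_valid(const struct sketch_target_type *targets, size_t n)
{
    assert(targets != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((targets[i].features & SKETCH_TARGET_SINGLETON) && n != 1)
            return false;
    }
    return true;
}
```

A table holding only a singleton target passes, while the same target next to any other target is rejected.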
    • dm kcopyd: add dm_kcopyd_zero to zero an area · 7f069653
      Mikulas Patocka committed
      This patch introduces dm_kcopyd_zero() to make it easy to use
      kcopyd to write zeros into the requested areas instead
      of copying.  It is implemented by passing a NULL
      copying source to dm_kcopyd_copy().
      
      The forthcoming thin provisioning target uses this.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
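The "NULL source means zero" convention can be shown with an in-memory analogue. This is a sketch of the pattern only: `sketch_copy` and `sketch_zero` are hypothetical stand-ins operating on byte buffers, not the kcopyd functions, whose real signatures are not shown in this commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical copy routine: a NULL source is interpreted as
 * "fill the destination with zeros" rather than as an error.
 */
void sketch_copy(unsigned char *dest, const unsigned char *src, size_t len)
{
    assert(dest != NULL);
    if (src == NULL)
        memset(dest, 0, len);   /* zeroing path */
    else
        memcpy(dest, src, len); /* ordinary copy path */
}

/* The zeroing helper is then just the copy routine with no source. */
void sketch_zero(unsigned char *dest, size_t len)
{
    sketch_copy(dest, NULL, len);
}
```

Routing the zeroing case through the copy routine lets both operations share one code path, which is the design choice the commit message describes.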
    • dm: remove superfluous smp_mb · fbdc86f3
      Namhyung Kim committed
      Since set_current_state() contains a memory barrier in it,
      an additional barrier isn't needed.
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: use local printk ratelimit · 71a16736
      Namhyung Kim committed
      printk_ratelimit() shares global ratelimiting state with all
      other subsystems, so its usage is discouraged. Instead,
      define and use dm's local state.
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: propagate non rotational flag · 4693c966
      Mandeep Singh Baines committed
      Allow QUEUE_FLAG_NONROT to propagate up the device stack if all
      underlying devices are non-rotational.  Tools like ureadahead will
      schedule IOs differently based on the rotational flag.
      
      With this patch, I see boot time go from 7.75 s to 7.46 s on my device.
      Suggested-by: J. Richard Barnette <jrbarnette@chromium.org>
      Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
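The propagation rule amounts to a logical AND over the underlying devices. The sketch below assumes a hypothetical flat array of per-device flags; the real code walks the table's devices and their request queues, which is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true only if every underlying device is non-rotational.
 * dev_nonrot[i] is a hypothetical per-device flag for this sketch.
 * An empty list yields true here; that is a choice of the sketch,
 * not a statement about the kernel's behaviour.
 */
bool sketch_stack_is_nonrot(const bool *dev_nonrot, size_t n)
{
    assert(dev_nonrot != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!dev_nonrot[i])
            return false;   /* one rotational device taints the stack */
    }
    return true;
}
```

A single rotational device anywhere in the stack keeps the flag clear on the stacked device.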
  2. 24 Oct, 2011 1 commit
  3. 26 Sep, 2011 4 commits
  4. 21 Sep, 2011 1 commit
    • md: Avoid waking up a thread after it has been freed. · 01f96c0a
      NeilBrown committed
      Two related problems:
      
      1/ some error paths call "md_unregister_thread(mddev->thread)"
         without subsequently clearing ->thread.  A subsequent call
         to mddev_unlock will try to wake the thread, and crash.
      
      2/ Most calls to md_wakeup_thread are protected against the thread
         disappearing either by:
            - holding the ->mutex
            - having an active request, so something else must be keeping
              the array active.
         However mddev_unlock calls md_wakeup_thread after dropping the
         mutex and without any certainty of an active request, so the
         ->thread could theoretically disappear.
         So we need a spinlock to provide some protection.
      
      So change md_unregister_thread to take a pointer to the thread
      pointer, and ensure that it always does the required locking, and
      clears the pointer properly.
      Reported-by: "Moshe Melnikov" <moshe@zadarastorage.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      cc: stable@kernel.org
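The pointer-to-pointer approach can be sketched in userspace C, with a pthread mutex standing in for the spinlock. Everything here is an assumption made for illustration: the struct, the lock, and both function names are hypothetical, and the wake-up helper in particular is not claimed to match how the kernel's wake-up path was changed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a worker thread handle. */
struct sketch_thread {
    int wakeups;
};

/* One lock guards every read and write of the thread pointer. */
static pthread_mutex_t sketch_thread_lock = PTHREAD_MUTEX_INITIALIZER;

/* Wake the thread if it still exists; returns 1 if a wake-up happened. */
int sketch_wakeup_thread(struct sketch_thread **threadp)
{
    int woke = 0;

    assert(threadp != NULL);
    pthread_mutex_lock(&sketch_thread_lock);
    if (*threadp != NULL) {
        (*threadp)->wakeups++;
        woke = 1;
    }
    pthread_mutex_unlock(&sketch_thread_lock);
    return woke;
}

/*
 * Takes a pointer to the thread pointer so that clearing it is part of
 * unregistering and cannot be forgotten on an error path.
 */
void sketch_unregister_thread(struct sketch_thread **threadp)
{
    struct sketch_thread *t;

    assert(threadp != NULL);
    pthread_mutex_lock(&sketch_thread_lock);
    t = *threadp;
    *threadp = NULL;    /* later wake-ups see NULL, never a freed thread */
    pthread_mutex_unlock(&sketch_thread_lock);
    free(t);            /* free(NULL) is a no-op, so a repeat call is safe */
}
```

Because the callee clears the caller's pointer under the lock, a wake-up that races with unregistration either runs before the pointer is cleared or sees NULL and does nothing.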
  5. 10 Sep, 2011 3 commits
    • md: Fix handling for devices from 2TB to 4TB in 0.90 metadata. · 27a7b260
      NeilBrown committed
      0.90 metadata uses an unsigned 32bit number to count the number of
      kilobytes used from each device.
      This should allow up to 4TB per device.
      However we multiply this by 2 (to get sectors) before casting to a
      larger type, so sizes above 2TB get truncated.
      
      Also we allow rdev->sectors to be larger than 4TB, so it is possible
      for the array to be resized larger than the metadata can handle.
      So make sure rdev->sectors never exceeds 4TB when 0.90 metadata is in
      use.
      
      Also the sanity check at the end of super_90_load should include level
      1 as it uses ->size too (RAID0 and Linear don't use ->size at all).
      Reported-by: Pim Zandbergen <P.Zandbergen@macroscoop.nl>
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
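The truncation described here is ordinary 32-bit wrap-around: doubling a 32-bit kilobyte count before widening it loses the top bit. The sketch below reproduces the arithmetic in standalone C. The function names are hypothetical, and the clamp limit (twice the largest 32-bit kilobyte count) is an assumption derived from the description, not a quote from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply in 32 bits first, then widen: sizes above 2 TiB wrap around. */
uint64_t sketch_sectors_buggy(uint32_t size_kb)
{
    uint32_t doubled = size_kb * 2u;    /* wraps modulo 2^32 */
    return doubled;
}

/* Widen first, then multiply: the full range up to 4 TiB survives. */
uint64_t sketch_sectors_fixed(uint32_t size_kb)
{
    return (uint64_t)size_kb * 2u;
}

/*
 * Assumed limit for this sketch: the largest sector count whose kilobyte
 * size still fits in an unsigned 32-bit field.
 */
uint64_t sketch_clamp_sectors(uint64_t sectors)
{
    const uint64_t max_sectors = (uint64_t)UINT32_MAX * 2u;

    assert(max_sectors / 2u == UINT32_MAX);
    return sectors > max_sectors ? max_sectors : sectors;
}
```

For a 3 TiB device (3221225472 KiB), `sketch_sectors_fixed` returns 6442450944 sectors, while `sketch_sectors_buggy` returns 2147483648, which is the sector count of a 1 TiB device.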
    • md/raid1,10: Remove use-after-free bug in make_request. · 079fa166
      NeilBrown committed
      A single request to RAID1 or RAID10 might result in multiple
      requests if there are known bad blocks that need to be avoided.
      
      To detect if we need to submit another write request we test:
       	if (sectors_handled < (bio->bi_size >> 9)) {
      
      However this is after we call **_write_done() so the 'bio' no longer
      belongs to us - the writes could have completed and the bio freed.
      
      So move the **_write_done call to after the test against
      bio->bi_size.
      
      This addresses https://bugzilla.kernel.org/show_bug.cgi?id=41862
      Reported-by: Bruno Wolff III <bruno@wolff.to>
      Tested-by: Bruno Wolff III <bruno@wolff.to>
      Signed-off-by: NeilBrown <neilb@suse.de>
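The ordering fix can be illustrated with a small reference-counted object: read what you need from it before the call that may free it. This is a userspace sketch with hypothetical types and names; it mirrors the shape of the test quoted in the commit message but is not the RAID1 or RAID10 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio that is freed when its last user is done. */
struct sketch_bio {
    unsigned size_bytes;
    int remaining;      /* outstanding users, including the submitter */
};

/* Drops one reference; frees the bio and clears the pointer on the last one. */
void sketch_write_done(struct sketch_bio **biop)
{
    assert(biop != NULL && *biop != NULL);
    if (--(*biop)->remaining == 0) {
        free(*biop);
        *biop = NULL;
    }
}

/*
 * Decide whether another request is needed BEFORE dropping our reference,
 * because the bio may no longer exist once sketch_write_done() returns.
 */
bool sketch_finish_chunk(struct sketch_bio **biop, unsigned sectors_handled)
{
    bool more = sectors_handled < ((*biop)->size_bytes >> 9);

    sketch_write_done(biop);
    return more;
}
```

Swapping the two statements in `sketch_finish_chunk` would read `size_bytes` from memory that may already have been freed, which is the bug pattern the commit removes.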
    • md/raid10: unify handling of write completion. · 19d5f834
      NeilBrown committed
      A write can complete at two different places:
      1/ when the last member-device write completes, through
         raid10_end_write_request
      2/ in make_request() when we remove the initial bias from ->remaining.
      
      These two should do exactly the same thing and the comment says they
      do, but they don't.
      
      So factor the correct code out into a function and call it in both
      places.  This makes the code much more similar to RAID1.
      
      The difference is only significant if there is an error, and errors
      usually take a while to surface, so it is unlikely that there will be
      an error already when make_request is completing, so this is unlikely
      to cause real problems.
      Signed-off-by: NeilBrown <neilb@suse.de>
  6. 31 Aug, 2011 1 commit
    • md/raid5: fix a hang on device failure. · 43220aa0
      NeilBrown committed
      Waiting for a 'blocked' rdev to become unblocked in the raid5d thread
      cannot work with internal metadata as it is the raid5d thread which
      will clear the blocked flag.
      This wasn't a problem in 3.0 and earlier as we only set the blocked
      flag when external metadata was used then.
      However we now set it always, so we need to be more careful.
      Signed-off-by: NeilBrown <neilb@suse.de>
  7. 30 Aug, 2011 1 commit
  8. 25 Aug, 2011 4 commits
  9. 02 Aug, 2011 12 commits