1. 28 Mar, 2006 10 commits
    • [PATCH] dm: tidy mdptr · 9ade92a9
      Alasdair G Kergon committed
      Change dm_get_mdptr() to take a struct mapped_device instead of dev_t.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dm: store md name · 7e51f257
      Mike Anderson committed
      The patch stores a printable device number in struct mapped_device for use in
      warning messages and with a proposed netlink interface.
      Signed-off-by: Mike Anderson <andmike@us.ibm.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dm flush queue EINTR · 1ecac7fd
      Jun'ichi Nomura committed
      If dm_suspend() is cancelled, bios already added to the deferred list need to
      be submitted.  Otherwise they remain 'in limbo' until there's a dm_resume().
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dm snapshot: fix kcopyd destructor · 138728dc
      Alasdair G Kergon committed
      Before removing a snapshot, wait for the completion of any kcopyd jobs using
      it.
      
      Do this by maintaining a count (nr_jobs) of how many outstanding jobs each
      kcopyd_client has.
      
      The snapshot destructor first unregisters the snapshot so that no new kcopyd
      jobs (created by writes to the origin) will reference that particular
      snapshot.  kcopyd_client_destroy() is now run next to wait for the completion
      of any outstanding jobs before the snapshot exception structures (that those
      jobs reference) are freed.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dm: make sure QUEUE_FLAG_CLUSTER is set properly · 969429b5
      NeilBrown committed
      This flag should be set for a virtual device iff it is set for all
      underlying devices.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
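      
      The "iff it is set for all underlying devices" rule amounts to a logical
      AND over the underlying devices.  A minimal userspace sketch, assuming a
      hypothetical struct dev_limits and helper combined_cluster() (neither is
      taken from the patch):
      
      ```c
      #include <stdbool.h>
      #include <stddef.h>
      
      /* Hypothetical stand-in for one underlying device's queue settings. */
      struct dev_limits {
          bool cluster;   /* true if the flag is set on this device */
      };
      
      /*
       * The virtual device gets the flag only if every underlying device
       * has it.  With no devices at all this sketch returns true.
       */
      static bool combined_cluster(const struct dev_limits *devs, size_t n)
      {
          bool cluster = true;
      
          for (size_t i = 0; i < n; i++)
              cluster = cluster && devs[i].cluster;
          return cluster;
      }
      ```
      
      A single underlying device without the flag is enough to clear it on the
      virtual device.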
    • [PATCH] dm: remove SECTOR_FORMAT · 4ee218cd
      Andrew Morton committed
      We don't know what type sector_t has.  Sometimes it's unsigned long, sometimes
      it's unsigned long long.  For example on ppc64 it's unsigned long with
      CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n.
      
      The way to handle all of this is to always use unsigned long long and to
      always typecast the sector_t when printing it.
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
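      
      A minimal userspace sketch of the convention described above: cast the
      value to unsigned long long and print it with %llu.  The sector_t typedef
      here is only a stand-in (in the kernel its width depends on the
      configuration), and format_sector() is a hypothetical helper, not part of
      the patch.
      
      ```c
      #include <stdio.h>
      #include <stddef.h>
      
      /* Stand-in for the kernel's sector_t; the real width varies. */
      typedef unsigned long sector_t;
      
      /*
       * Hypothetical helper: whatever sector_t really is, cast it to
       * unsigned long long so that one format string always matches.
       */
      static int format_sector(char *buf, size_t len, sector_t s)
      {
          return snprintf(buf, len, "%llu", (unsigned long long)s);
      }
      ```
      
      The cast makes the argument type match the format specifier regardless of
      how sector_t is defined on a given build.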
    • [PATCH] drivers/md/dm-raid1.c: Fix inconsistent mirroring after interrupted recovery · 930d332a
      Jun'ichi Nomura committed
      dm-mirror has a potential data corruption problem: while the on-disk log
      shows that all disk contents are in-sync, the actual contents of the disks
      are not synchronized.  This problem occurs if the initial recovery
      (synching) is interrupted and resumed.
      
      The attached patch fixes this problem.
      
      Background:
      
      rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN
      (in-sync), which results in the corresponding bit of clean_bits being set.
      
      This is harmful if on-disk log is used and the map is removed/suspended
      before the initial sync is completed.  The clean_bits is written down to
      the on-disk log at the map removal, and, upon resume, it's read and copied
      to sync_bits.  Since the recovery process refers to the sync_bits to find a
      region to be recovered, the region whose state was changed from RH_NOSYNC
      to RH_CLEAN is no longer recovered.
      
      If you haven't applied dm-raid1-read-balancing.patch, proposed on dm-devel
      some time ago, the contents of the mirrored disk are just corrupted
      silently.  If you have, balanced reads may get bogus data from out-of-sync
      disks.
      
      The patch keeps the RH_NOSYNC state unchanged.  It will be changed to
      RH_RECOVERING when recovery starts and gets reclaimed when the recovery
      completes.  So it doesn't leak the region hash entry.
      
      Description:
      
      Keep RH_NOSYNC state unchanged when I/O on the region completes.
      
      rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN
      (in-sync), which results in the corresponding bit of clean_bits being set.
      
      This is harmful if on-disk log is used and the map is removed/suspended
      before the initial sync is completed.  The clean_bits is written down to
      the on-disk log at the map removal, and, upon resume, it's read and copied
      to sync_bits.  Since the recovery process refers to the sync_bits to find a
      region to be recovered, the region whose state was changed from RH_NOSYNC
      to RH_CLEAN is no longer recovered.
      
      If you haven't applied dm-raid1-read-balancing.patch, proposed on dm-devel
      some time ago, the contents of the mirrored disk are just corrupted
      silently.  If you have, balanced reads may get bogus data from out-of-sync
      disks.
      
      The RH_NOSYNC region will be changed to RH_RECOVERING when recovery starts
      on the region and gets reclaimed when the recovery completes.  So it
      doesn't leak the region hash entry.
      
      Alasdair said:
      
        I've analysed the relevant part of the state machine and I believe that
        the patch is correct.
      
        (Further work on this code is still needed - this patch has the
        side-effect of holding onto memory unnecessarily for long periods of time
        under certain workloads - but better that than corrupting data.)
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
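      
      The rule "keep RH_NOSYNC state unchanged when I/O on the region completes"
      can be modelled as a small state function.  This is a simplified sketch,
      not the kernel's rh_dec(): state_after_last_io() is a hypothetical name,
      and RH_DIRTY is assumed here as the state of a region that has been
      written to (it is not named in the commit message).
      
      ```c
      /* States from the commit message, plus RH_DIRTY (an assumption). */
      enum rh_state { RH_CLEAN, RH_DIRTY, RH_NOSYNC, RH_RECOVERING };
      
      /*
       * Hypothetical model: the state a region is left in once its last
       * pending I/O completes.
       */
      static enum rh_state state_after_last_io(enum rh_state s)
      {
          if (s == RH_DIRTY)
              return RH_CLEAN;    /* no I/O left, region is in-sync */
          return s;               /* RH_NOSYNC is never promoted here */
      }
      ```
      
      Because an RH_NOSYNC region never becomes RH_CLEAN through this path, its
      bit is not set in clean_bits and the region is still found by recovery
      after a resume.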
    • [PATCH] device-mapper snapshot: fix invalidation · 76df1c65
      Alasdair G Kergon committed
      When a snapshot becomes invalid, s->valid is set to 0.  In this state, a
      snapshot can no longer be accessed.
      
      When s->lock is acquired, before doing anything else, s->valid must be checked
      to ensure the snapshot remains valid.
      
      This patch eliminates some races (that may cause panics) by adding some
      missing checks.  At the same time, some unnecessary levels of indentation are
      removed and snapshot invalidation is moved into a single function that always
      generates a device-mapper event.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
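      
      The "check s->valid right after taking s->lock" pattern might look like
      the following userspace sketch.  The struct, the spinlock built on
      atomic_flag, the function names and the -EIO return value are all
      assumptions made for illustration; the kernel code uses its own locking
      primitives.
      
      ```c
      #include <errno.h>
      #include <stdatomic.h>
      
      /* Hypothetical, stripped-down snapshot. */
      struct snapshot {
          atomic_flag lock;   /* stand-in for s->lock */
          int valid;          /* 0 once the snapshot has been invalidated */
      };
      
      static void snap_lock(struct snapshot *s)
      {
          while (atomic_flag_test_and_set(&s->lock))
              ;   /* spin until the lock is free */
      }
      
      static void snap_unlock(struct snapshot *s)
      {
          atomic_flag_clear(&s->lock);
      }
      
      /* Every entry point checks validity first, while holding the lock. */
      static int snapshot_access(struct snapshot *s)
      {
          int r = 0;
      
          snap_lock(s);
          if (!s->valid)
              r = -EIO;   /* snapshot was invalidated: do not touch it */
          /* else: the real work on the snapshot would go here */
          snap_unlock(s);
          return r;
      }
      ```
      
      Checking under the lock matters because the snapshot may have been
      invalidated between the caller's last look and the lock acquisition.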
    • [PATCH] device-mapper snapshot: replace sibling list · b4b610f6
      Alasdair G Kergon committed
      The siblings "list" is used unsafely at the moment.
      
      Firstly, only the element on the list being changed gets locked (via the
      snapshot lock), not the next and previous elements which have pointers that
      are also being changed.
      
      Secondly, if you have two or more snapshots and write to the same chunk a
      second time before every snapshot has finished making its private copy of the
      data, if you're unlucky, _origin_write() could attempt its list_merge() and
      dereference a 'last' pointer to a pending_exception structure that has just
      been freed.
      
      Analysis reveals that the list is actually only there for reference counting.
      If 5 pending_exceptions are needed in origin_write, then the 5 are joined
      together into a 5-element list - without a separate list head because there's
      nowhere suitable to store it.  As the pending_exceptions complete, they are
      removed from the list one-by-one and any contents of origin_bios get moved
      across to one of the remaining pending_exceptions on the list.  Whichever one
      is last is detected because list_empty() is then true and the origin_bios get
      submitted.
      
      The fix proposed here uses an alternative reference counting mechanism by
      choosing one of the pending_exceptions as primary and maintaining an atomic
      counter there.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
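      
      The alternative reference counting scheme can be sketched in userspace
      with C11 atomics.  The field and function names below (primary,
      sibling_count, link_siblings, complete_one) are hypothetical and chosen
      for illustration; only the idea of one primary pending_exception holding
      an atomic counter comes from the description above.
      
      ```c
      #include <stdatomic.h>
      #include <stdbool.h>
      
      /* Hypothetical, stripped-down pending exception. */
      struct pending_exception {
          struct pending_exception *primary;  /* holder of the counter */
          atomic_int sibling_count;           /* only used on the primary */
      };
      
      /* Choose pes[0] as primary and count all n exceptions against it. */
      static void link_siblings(struct pending_exception *pes, int n)
      {
          atomic_init(&pes[0].sibling_count, n);
          for (int i = 0; i < n; i++)
              pes[i].primary = &pes[0];
      }
      
      /*
       * Called when one exception completes.  Returns true for exactly one
       * caller, the last one, which would then submit the origin bios.
       */
      static bool complete_one(struct pending_exception *pe)
      {
          return atomic_fetch_sub(&pe->primary->sibling_count, 1) == 1;
      }
      ```
      
      Unlike the sibling list, completing one exception touches only the shared
      counter, not pointers stored inside its neighbours.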
    • [PATCH] device-mapper snapshot: fix origin_write pending_exception submission · eccf0817
      Alasdair G Kergon committed
      Say you have several snapshots of the same origin and then you issue a write
      to some place in the origin for the first time.
      
      Before the device-mapper snapshot target lets the write go through to the
      underlying device, it needs to make a copy of the data that is about to be
      overwritten.  Each snapshot is independent, so it makes one copy for each
      snapshot.
      
      __origin_write() loops through each snapshot and checks to see whether a copy
      is needed for that snapshot.  (A copy is only needed the first time that data
      changes.)
      
      If a copy is needed, the code allocates a 'pending_exception' structure
      holding the details.  It links these together for all the snapshots, then
      works its way through this list and submits the copying requests to the kcopyd
      thread by calling start_copy().  When each request is completed, the original
      pending_exception structure gets freed in pending_complete().
      
      If you're very unlucky, this structure can get freed *before* the submission
      process has finished walking the list.
      
      This patch:
      
        1) Creates a new temporary list pe_queue to hold the pending exception
           structures;
      
        2) Does all the bookkeeping up-front, then walks through the new list
           safely and calls start_copy() for each pending_exception that needed it;
      
        3) Avoids attempting to add pe->siblings to the list if it's already
           connected.
      
      [NB This does not fix all the races in this code.  More patches will follow.]
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 27 Mar, 2006 7 commits
  3. 26 Mar, 2006 1 commit
  4. 25 Mar, 2006 2 commits
  5. 24 Mar, 2006 1 commit
  6. 22 Mar, 2006 1 commit
  7. 19 Mar, 2006 1 commit
  8. 17 Mar, 2006 1 commit
    • [PATCH] dm stripe: Fix bounds · 8ba32fde
      Kevin Corry committed
      The dm-stripe target currently does not enforce that the size of a stripe
      device be a multiple of the chunk-size.  Under certain conditions, this can
      lead to I/O requests going off the end of an underlying device.  This
      test-case shows one example.
      
      echo "0 100 linear /dev/hdb1 0" | dmsetup create linear0
      echo "0 100 linear /dev/hdb1 100" | dmsetup create linear1
      echo "0 200 striped 2 32 /dev/mapper/linear0 0 /dev/mapper/linear1 0" | \
         dmsetup create stripe0
      dd if=/dev/zero of=/dev/mapper/stripe0 bs=1k
      
      This will produce the output:
      dd: writing '/dev/mapper/stripe0': Input/output error
      97+0 records in
      96+0 records out
      
      And in the kernel log will be:
      attempt to access beyond end of device
      dm-0: rw=0, want=104, limit=100
      
      The patch will check that the table size is a multiple of the stripe
      chunk-size when the table is created, which will prevent the above striped
      device from being created.
      
      This should not affect tools like LVM or EVMS, since in all the cases I can
      think of, striped devices are always created with the sizes being a
      multiple of the chunk-size.
      
      The size of a stripe device must be a multiple of its chunk-size.
      
      (akpm: that typecast is quite gratuitous)
      Signed-off-by: Kevin Corry <kevcorry@us.ibm.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
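      
      The size check and the reason it is needed can be sketched in userspace
      as follows.  Both helpers are hypothetical (stripe_len_ok and stripe_map
      are not names from the patch), and the mapping is the usual round-robin
      striping arithmetic written out for illustration.
      
      ```c
      #include <stdbool.h>
      #include <stdint.h>
      
      /* The table length (in sectors) must be a whole number of chunks. */
      static bool stripe_len_ok(uint64_t len, uint32_t chunk_size)
      {
          return chunk_size != 0 && len % chunk_size == 0;
      }
      
      /* Map a sector of the striped device to (stripe, sector on device). */
      static void stripe_map(uint64_t sector, uint32_t stripes,
                             uint32_t chunk_size,
                             uint32_t *stripe, uint64_t *dev_sector)
      {
          uint64_t chunk = sector / chunk_size;   /* chunk index overall */
      
          *stripe = (uint32_t)(chunk % stripes);  /* underlying device */
          *dev_sector = (chunk / stripes) * chunk_size + sector % chunk_size;
      }
      ```
      
      For the test case above (length 200, 2 stripes, chunk size 32),
      stripe_len_ok(200, 32) is false.  Sector 192 maps to sector 96 of the
      first device, so its chunk would cover sectors 96 to 127 of a device that
      is only 100 sectors long.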
  9. 10 Mar, 2006 1 commit
  10. 25 Feb, 2006 2 commits
  11. 04 Feb, 2006 3 commits
  12. 03 Feb, 2006 5 commits
  13. 02 Feb, 2006 5 commits