1. 30 Oct, 2008 2 commits
    • dm snapshot: wait for chunks in destructor · 879129d2
      Mikulas Patocka committed
      If there are several snapshots sharing an origin and one is removed
      while the origin is being written to, the snapshot's mempool may get
      deleted while elements are still referenced.
      
      Prior to dm-snapshot-use-per-device-mempools.patch the pending
      exceptions may still have been referenced after the snapshot was
      destroyed, but this was not a problem because the shared mempool
      was still there.
      
      This patch fixes the problem by tracking the number of mempool elements
      in use.
      
      The scenario:
      - You have an origin and two snapshots 1 and 2.
      - Someone writes to the origin.
      - It creates two exceptions in the snapshots: the one in snapshot 1
      becomes the primary exception, and snapshot 2's
      pending_exception->primary_pe points to the exception in snapshot 1.
      - The exceptions are being relocated, relocation of exception 1 finishes
      (but its pending_exception is still allocated, because it is referenced
      by an exception from snapshot 2)
      - The user lvremoves snapshot 1 --- it calls just suspend (does nothing)
      and destructor. md->pending is zero (there is no I/O submitted to the
      snapshot by md layer), so it won't help us.
      - The destructor waits for kcopyd jobs to finish on snapshot 1 --- but
      there are none.
      - The destructor on snapshot 1 cleans up everything.
      - The relocation of the exception on snapshot 2 finishes and it drops
      its reference on primary_pe. This frees primary_pe, which points to the
      pending exception created for snapshot 1, so the memory is freed into a
      mempool that no longer exists.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
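The counting described in this commit can be pictured with a small userspace sketch in C11. It is not the kernel patch: `snapshot_sketch`, `pe_alloc`, `pe_free` and `snapshot_try_destroy` are hypothetical names, and `malloc` stands in for the per-snapshot mempool. The real destructor would wait until the count reaches zero; the sketch only reports whether teardown is currently safe.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a snapshot that owns its own pool. */
struct snapshot_sketch {
	atomic_int pending_in_use;	/* elements handed out and not yet freed */
	bool pool_alive;		/* false once the pool has been torn down */
};

struct pe_sketch {
	struct snapshot_sketch *snap;	/* owner whose pool the element returns to */
};

void snapshot_init(struct snapshot_sketch *s)
{
	atomic_init(&s->pending_in_use, 0);
	s->pool_alive = true;
}

/* Allocate an element and count it against its snapshot. */
struct pe_sketch *pe_alloc(struct snapshot_sketch *s)
{
	struct pe_sketch *pe = malloc(sizeof(*pe));

	if (!pe)
		return NULL;
	pe->snap = s;
	atomic_fetch_add(&s->pending_in_use, 1);
	return pe;
}

/* Return an element; the owning pool must still exist at this point. */
void pe_free(struct pe_sketch *pe)
{
	struct snapshot_sketch *s = pe->snap;

	assert(s->pool_alive);
	free(pe);
	atomic_fetch_sub(&s->pending_in_use, 1);
}

/* Tear down only when nothing is in use; a real destructor would wait. */
bool snapshot_try_destroy(struct snapshot_sketch *s)
{
	if (atomic_load(&s->pending_in_use) != 0)
		return false;
	s->pool_alive = false;
	return true;
}
```

With two elements allocated, `snapshot_try_destroy()` keeps returning false until both have gone through `pe_free()`, which is the property the scenario above needs.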
    • dm snapshot: fix register_snapshot deadlock · 60c856c8
      Mikulas Patocka committed
      register_snapshot() performs a GFP_KERNEL allocation while holding
      _origins_lock for write, but that could write out dirty pages onto a
      device that attempts to acquire _origins_lock for read, resulting in
      deadlock.
      
      So move the allocation up before taking the lock.
      
      This path is not performance-critical, so it doesn't matter that we
      allocate memory and free it if we find that we won't need it.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
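The "allocate before taking the lock, free if unneeded" pattern from this commit might look roughly like the following userspace sketch. All names are hypothetical, the boolean `origins_locked` merely stands in for holding `_origins_lock` for write, and `calloc` stands in for the GFP_KERNEL allocation. The assertion in `alloc_origin()` makes the rule explicit: the allocation must never run while the lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ORIGINS 8

struct origin_sketch {
	char name[32];
};

static struct origin_sketch *origins[MAX_ORIGINS];
static int nr_origins;
static bool origins_locked;	/* stand-in for holding the lock for write */

/* An allocation that may block: must not be called with the lock held. */
static struct origin_sketch *alloc_origin(void)
{
	assert(!origins_locked);
	return calloc(1, sizeof(struct origin_sketch));
}

static struct origin_sketch *lookup_origin(const char *name)
{
	for (int i = 0; i < nr_origins; i++)
		if (strcmp(origins[i]->name, name) == 0)
			return origins[i];
	return NULL;
}

/* Returns 0 on success, -1 on allocation failure or a full table. */
int register_snapshot_sketch(const char *origin_name)
{
	int r = 0;
	/* Allocate up front, while no lock is held. */
	struct origin_sketch *new_o = alloc_origin();

	if (!new_o)
		return -1;

	origins_locked = true;		/* take the lock for write */
	if (lookup_origin(origin_name)) {
		/* Origin already registered: the spare allocation is unused. */
	} else if (nr_origins < MAX_ORIGINS) {
		/* calloc zeroed the buffer, so the copy stays NUL-terminated. */
		strncpy(new_o->name, origin_name, sizeof(new_o->name) - 1);
		origins[nr_origins++] = new_o;
		new_o = NULL;		/* ownership moved into the table */
	} else {
		r = -1;
	}
	origins_locked = false;		/* release the lock */

	free(new_o);			/* free(NULL) is a no-op */
	return r;
}
```

Registering the same origin twice allocates twice but keeps only one entry; the second allocation is simply freed after the lock is dropped, which is acceptable because the path is not performance-critical.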
  2. 22 Oct, 2008 2 commits
    • dm snapshot: drop unused last_percent · f68d4f3d
      Mikulas Patocka committed
      The last_percent field is unused - remove it.
      (It dates from when events were triggered as each X% filled up.)
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: fix primary_pe race · 7c5f78b9
      Mikulas Patocka committed
      Fix a race condition with primary_pe ref_count handling.
      
      put_pending_exception runs under dm_snapshot->lock; it does atomic_dec_and_test
      on primary_pe->ref_count, and later does atomic_read on primary_pe->ref_count.
      
      __origin_write does atomic_dec_and_test on primary_pe->ref_count without holding
      dm_snapshot->lock.
      
      This opens the following race condition:
      Assume two CPUs, CPU1 is executing put_pending_exception (and holding
      dm_snapshot->lock). CPU2 is executing __origin_write in parallel.
      primary_pe->ref_count == 2.
      
      CPU1:
      if (primary_pe && atomic_dec_and_test(&primary_pe->ref_count))
      	origin_bios = bio_list_get(&primary_pe->origin_bios);
      ... decrements primary_pe->ref_count to 1. Doesn't load origin_bios
      
      CPU2:
      if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
      	flush_bios(bio_list_get(&primary_pe->origin_bios));
      	free_pending_exception(primary_pe);
      	/* If we got here, pe_queue is necessarily empty. */
      	return r;
      }
      ... decrements primary_pe->ref_count to 0, submits pending bios, frees
      primary_pe.
      
      CPU1:
      if (!primary_pe || primary_pe != pe)
      	free_pending_exception(pe);
      ... this has no effect.
      if (primary_pe && !atomic_read(&primary_pe->ref_count))
      	free_pending_exception(primary_pe);
      ... sees ref_count == 0 (written by CPU 2), does double free !!
      
      This bug can happen only if someone is simultaneously writing to both the
      origin and the snapshot.
      
      If someone is writing only to the origin, __origin_write will submit kcopyd
      request after it decrements primary_pe->ref_count (so it can't happen that the
      finished copy races with primary_pe->ref_count decrementation).
      
      If someone is writing only to the snapshot, __origin_write isn't invoked at all
      and the race can't happen.
      
      The race happens when someone writes to the snapshot --- this creates
      pending_exception with primary_pe == NULL and starts copying. Then, someone
      writes to the same chunk in the snapshot, and __origin_write races with
      termination of already submitted request in pending_complete (that calls
      put_pending_exception).
      
      This race may be the reason for these bugs:
        http://bugzilla.kernel.org/show_bug.cgi?id=11636
        https://bugzilla.redhat.com/show_bug.cgi?id=465825
      
      The patch fixes the code to make sure that:
      1. If atomic_dec_and_test(&primary_pe->ref_count) returns false, the process
      must no longer dereference primary_pe (because someone else may free it under
      us).
      2. If atomic_dec_and_test(&primary_pe->ref_count) returns true, the process
      is responsible for freeing primary_pe.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: stable@kernel.org
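The two rules stated at the end of this commit message can be sketched in userspace C11 as a single "put" helper. This is an illustration under assumptions, not the kernel code: `pe_ref_sketch`, `pe_ref_new` and `pe_ref_put` are hypothetical names, `atomic_fetch_sub` plays the role of `atomic_dec_and_test`, and `frees_done` exists only so that a double free would be visible.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct pe_ref_sketch {
	atomic_int ref_count;
};

int frees_done;	/* counts frees, so a double free would show up */

struct pe_ref_sketch *pe_ref_new(int refs)
{
	struct pe_ref_sketch *pe = malloc(sizeof(*pe));

	if (pe)
		atomic_init(&pe->ref_count, refs);
	return pe;
}

/*
 * Drop one reference.
 * Returns true if this caller dropped the last reference; it is then the
 * one that frees the object (rule 2). After a false return the caller
 * must not dereference pe again, because another holder may free it at
 * any moment (rule 1). There is deliberately no later re-read of
 * ref_count, which is what caused the double free described above.
 */
bool pe_ref_put(struct pe_ref_sketch *pe)
{
	/* atomic_fetch_sub returns the old value: 1 means we reached zero. */
	if (atomic_fetch_sub(&pe->ref_count, 1) != 1)
		return false;

	free(pe);
	frees_done++;
	return true;
}
```

With a count of 2, the first `pe_ref_put()` returns false and frees nothing; the second returns true and performs the only free.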
  3. 21 Jul, 2008 3 commits
  4. 25 Apr, 2008 5 commits
  5. 29 Mar, 2008 1 commit
  6. 08 Feb, 2008 2 commits
  7. 20 Oct, 2007 1 commit
  8. 10 Oct, 2007 1 commit
  9. 13 Jul, 2007 3 commits
  10. 09 Dec, 2006 3 commits
  11. 08 Dec, 2006 1 commit
  12. 22 Nov, 2006 1 commit
  13. 03 Oct, 2006 8 commits
  14. 01 Jul, 2006 1 commit
  15. 27 Jun, 2006 2 commits
  16. 28 Mar, 2006 4 commits
    • [PATCH] dm snapshot: fix kcopyd destructor · 138728dc
      Alasdair G Kergon committed
      Before removing a snapshot, wait for the completion of any kcopyd jobs using
      it.
      
      Do this by maintaining a count (nr_jobs) of how many outstanding jobs each
      kcopyd_client has.
      
      The snapshot destructor first unregisters the snapshot so that no new kcopyd
      jobs (created by writes to the origin) will reference that particular
      snapshot.  kcopyd_client_destroy() is now run next to wait for the completion
      of any outstanding jobs before the snapshot exception structures (that those
      jobs reference) are freed.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] dm: remove SECTOR_FORMAT · 4ee218cd
      Andrew Morton committed
      We don't know what type sector_t has.  Sometimes it's unsigned long, sometimes
      it's unsigned long long.  For example on ppc64 it's unsigned long with
      CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n.
      
      The way to handle all of this is to always use unsigned long long and to
      always typecast the sector_t when printing it.
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
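The convention described in this commit (always cast, always print as unsigned long long) can be shown in a few lines of C. The `typedef` below is an assumption made for the sketch, since the real width of `sector_t` depends on the configuration, and `format_sector` is a hypothetical helper name.

```c
#include <stdio.h>

/* Assumption for the sketch: the real type varies by configuration. */
typedef unsigned long sector_t;

/* Format a sector number portably: cast to unsigned long long, use %llu. */
int format_sector(char *buf, size_t len, sector_t sector)
{
	return snprintf(buf, len, "%llu", (unsigned long long)sector);
}
```

Because the cast happens at the call to the formatting function, the same format string stays correct whether `sector_t` is `unsigned long` or `unsigned long long`.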
    • [PATCH] device-mapper snapshot: fix invalidation · 76df1c65
      Alasdair G Kergon committed
      When a snapshot becomes invalid, s->valid is set to 0.  In this state, a
      snapshot can no longer be accessed.
      
      When s->lock is acquired, before doing anything else, s->valid must be checked
      to ensure the snapshot remains valid.
      
      This patch eliminates some races (that may cause panics) by adding some
      missing checks.  At the same time, some unnecessary levels of indentation are
      removed and snapshot invalidation is moved into a single function that always
      generates a device-mapper event.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
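The rule "after acquiring s->lock, check s->valid before doing anything else", together with a single invalidation function, might be sketched as follows. This is a userspace illustration with hypothetical names: the `locked` flag stands in for the snapshot lock, the `events` counter stands in for raising a device-mapper event, and ignoring repeated invalidation is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct snap_valid_sketch {
	bool locked;	/* stand-in for s->lock */
	int valid;	/* 1 while the snapshot may be used, 0 afterwards */
	int events;	/* number of events raised so far */
};

void snap_lock(struct snap_valid_sketch *s)
{
	assert(!s->locked);
	s->locked = true;
}

void snap_unlock(struct snap_valid_sketch *s)
{
	assert(s->locked);
	s->locked = false;
}

/* The single place that invalidates a snapshot; caller holds the lock. */
void snap_invalidate_locked(struct snap_valid_sketch *s)
{
	assert(s->locked);
	if (!s->valid)
		return;		/* assumption: repeated calls are ignored */
	s->valid = 0;
	s->events++;		/* stand-in for raising an event */
}

/* Any access: take the lock, then check validity before anything else. */
int snap_access(struct snap_valid_sketch *s)
{
	int r = 0;

	snap_lock(s);
	if (!s->valid)
		r = -EIO;
	/* ... otherwise the real work would happen here ... */
	snap_unlock(s);
	return r;
}
```

Once `snap_invalidate_locked()` has run, every later `snap_access()` fails with `-EIO`, and the event is raised from exactly one code path.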
    • [PATCH] device-mapper snapshot: replace sibling list · b4b610f6
      Alasdair G Kergon committed
      The siblings "list" is used unsafely at the moment.
      
      Firstly, only the element on the list being changed gets locked (via the
      snapshot lock), not the next and previous elements which have pointers that
      are also being changed.
      
      Secondly, if you have two or more snapshots and write to the same chunk a
      second time before every snapshot has finished making its private copy of the
      data, if you're unlucky, _origin_write() could attempt its list_merge() and
      dereference a 'last' pointer to a pending_exception structure that has just
      been freed.
      
      Analysis reveals that the list is actually only there for reference counting.
      If 5 pending_exceptions are needed in origin_write, then the 5 are joined
      together into a 5-element list - without a separate list head because there's
      nowhere suitable to store it.  As the pending_exceptions complete, they are
      removed from the list one-by-one and any contents of origin_bios get moved
      across to one of the remaining pending_exceptions on the list.  Whichever one
      is last is detected because list_empty() is then true and the origin_bios get
      submitted.
      
      The fix proposed here uses an alternative reference counting mechanism by
      choosing one of the pending_exceptions as primary and maintaining an atomic
      counter there.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
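The replacement mechanism (one pending exception chosen as primary, carrying an atomic counter) can be sketched in userspace C11. The names are hypothetical and the model is simplified: `origin_bios` is just a count of held-back origin writes rather than a bio list, and nothing is freed. The point is only that whichever completion brings the counter to zero is the one that submits the held writes.

```c
#include <stdatomic.h>

struct pending_sketch {
	struct pending_sketch *primary;	/* the chosen primary (may be itself) */
	atomic_int ref_count;		/* only meaningful on the primary */
	int origin_bios;		/* held-back origin writes, on the primary */
};

/* Link n pending exceptions to pes[0] as primary, one reference each. */
void link_to_primary(struct pending_sketch *pes, int n, int held_bios)
{
	for (int i = 0; i < n; i++) {
		pes[i].primary = &pes[0];
		atomic_init(&pes[i].ref_count, 0);
		pes[i].origin_bios = 0;
	}
	atomic_init(&pes[0].ref_count, n);
	pes[0].origin_bios = held_bios;
}

/*
 * Called as each copy completes. Returns the number of origin writes to
 * submit now: nonzero only for the completion that drops the last
 * reference on the primary.
 */
int pending_complete_sketch(struct pending_sketch *pe)
{
	struct pending_sketch *primary = pe->primary;
	int n;

	/* atomic_fetch_sub returns the old value: 1 means we were last. */
	if (atomic_fetch_sub(&primary->ref_count, 1) != 1)
		return 0;

	n = primary->origin_bios;
	primary->origin_bios = 0;
	return n;
}
```

With five linked pending exceptions, the first four completions (in any order) return 0 and the fifth returns the held writes, without any sibling pointers having to be updated under several locks.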