1. 22 October 2008, 2 commits
    • M
      dm snapshot: drop unused last_percent · f68d4f3d
      Committed by Mikulas Patocka
      The last_percent field is unused - remove it.
      (It dates from when events were triggered as each X% filled up.)
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • M
      dm snapshot: fix primary_pe race · 7c5f78b9
      Committed by Mikulas Patocka
      Fix a race condition with primary_pe ref_count handling.
      
      put_pending_exception runs under dm_snapshot->lock; it does atomic_dec_and_test
      on primary_pe->ref_count and later does an atomic_read of primary_pe->ref_count.
      
      __origin_write does atomic_dec_and_test on primary_pe->ref_count without holding
      dm_snapshot->lock.
      
      This opens the following race condition:
      Assume two CPUs: CPU1 is executing put_pending_exception (and holding
      dm_snapshot->lock), while CPU2 is executing __origin_write in parallel.
      Initially primary_pe->ref_count == 2.
      
      CPU1:
      if (primary_pe && atomic_dec_and_test(&primary_pe->ref_count))
      	origin_bios = bio_list_get(&primary_pe->origin_bios);
      ... decrements primary_pe->ref_count to 1; the test fails, so origin_bios is
      not loaded.
      
      CPU2:
      if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
      	flush_bios(bio_list_get(&primary_pe->origin_bios));
      	free_pending_exception(primary_pe);
      	/* If we got here, pe_queue is necessarily empty. */
      	return r;
      }
      ... decrements primary_pe->ref_count to 0, submits pending bios, frees
      primary_pe.
      
      CPU1:
      if (!primary_pe || primary_pe != pe)
      	free_pending_exception(pe);
      ... this has no effect.
      if (primary_pe && !atomic_read(&primary_pe->ref_count))
      	free_pending_exception(primary_pe);
      ... sees ref_count == 0 (written by CPU2) and does a double free!
      
      This bug can happen only if someone is simultaneously writing to both the
      origin and the snapshot.
      
      If someone is writing only to the origin, __origin_write submits the kcopyd
      request after it decrements primary_pe->ref_count (so the finished copy cannot
      race with the primary_pe->ref_count decrement).
      
      If someone is writing only to the snapshot, __origin_write isn't invoked at all
      and the race can't happen.
      
      The race happens when someone writes to the snapshot --- this creates a
      pending_exception with primary_pe == NULL and starts copying. Then someone
      writes to the same chunk in the origin, and __origin_write races with the
      termination of the already-submitted request in pending_complete (which calls
      put_pending_exception).
      
      This race may be the reason for these bugs:
        http://bugzilla.kernel.org/show_bug.cgi?id=11636
        https://bugzilla.redhat.com/show_bug.cgi?id=465825
      
      The patch fixes the code to make sure that:
      1. If atomic_dec_and_test(&primary_pe->ref_count) returns false, the process
      must no longer dereference primary_pe (because someone else may free it from
      under us).
      2. If atomic_dec_and_test(&primary_pe->ref_count) returns true, the process
      is responsible for freeing primary_pe (see the sketch below).
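      A minimal sketch of that ownership rule, reusing the helper names quoted
      above (the struct name is assumed and the body is illustrative, not the
      actual patch):

      /* Illustrative only: whoever drops ref_count to zero owns primary_pe. */
      static void put_primary_pe(struct dm_snap_pending_exception *primary_pe)
      {
      	struct bio *origin_bios;

      	if (atomic_dec_and_test(&primary_pe->ref_count)) {
      		/* We took the count to zero, so we alone may still touch
      		 * primary_pe: drain its queued bios and free it. */
      		origin_bios = bio_list_get(&primary_pe->origin_bios);
      		free_pending_exception(primary_pe);
      		flush_bios(origin_bios);
      		return;
      	}

      	/* The count did not reach zero: another holder may free primary_pe
      	 * at any moment, so no further atomic_read() or dereference here. */
      }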
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: stable@kernel.org
  2. 21 July 2008, 3 commits
  3. 25 April 2008, 5 commits
  4. 29 March 2008, 1 commit
  5. 08 February 2008, 2 commits
  6. 20 October 2007, 1 commit
  7. 10 October 2007, 1 commit
  8. 13 July 2007, 3 commits
  9. 09 December 2006, 3 commits
  10. 08 December 2006, 1 commit
  11. 22 November 2006, 1 commit
  12. 03 October 2006, 8 commits
  13. 01 July 2006, 1 commit
  14. 27 June 2006, 2 commits
  15. 28 March 2006, 5 commits
    • A
      [PATCH] dm snapshot: fix kcopyd destructor · 138728dc
      Committed by Alasdair G Kergon
      Before removing a snapshot, wait for the completion of any kcopyd jobs using
      it.
      
      Do this by maintaining a count (nr_jobs) of how many outstanding jobs each
      kcopyd_client has.
      
      The snapshot destructor first unregisters the snapshot so that no new kcopyd
      jobs (created by writes to the origin) will reference that particular
      snapshot.  kcopyd_client_destroy() is now run next to wait for the completion
      of any outstanding jobs before the snapshot exception structures (that those
      jobs reference) are freed.
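      A rough sketch of that counting scheme, assuming a wait queue named
      destroyq on the client (only nr_jobs and kcopyd_client_destroy() come
      from the description above):

      /* Illustrative only: count outstanding jobs and let the destructor
       * sleep until the last one completes. */
      static void job_done(struct kcopyd_client *kc)
      {
      	if (atomic_dec_and_test(&kc->nr_jobs))
      		wake_up(&kc->destroyq);		/* destroyq: assumed field */
      }

      void kcopyd_client_destroy(struct kcopyd_client *kc)
      {
      	/* The snapshot was unregistered first, so no new jobs can arrive;
      	 * wait for the ones already in flight. */
      	wait_event(kc->destroyq, !atomic_read(&kc->nr_jobs));

      	/* Only now may the exception structures those jobs used be freed. */
      }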
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • A
      [PATCH] dm: remove SECTOR_FORMAT · 4ee218cd
      Committed by Andrew Morton
      We don't know what type sector_t has.  Sometimes it's unsigned long, sometimes
      it's unsigned long long.  For example on ppc64 it's unsigned long with
      CONFIG_LBD=n and on x86_64 it's unsigned long long with CONFIG_LBD=n.
      
      The way to handle all of this is to always use unsigned long long and to
      always typecast the sector_t when printing it.
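      In practice that means printing with %llu plus an explicit cast, for
      example (a generic illustration, not a specific hunk from this patch):

      	/* sector_t may be unsigned long or unsigned long long, so widen it
      	 * explicitly instead of relying on a SECTOR_FORMAT macro. */
      	printk(KERN_ERR "device-mapper: invalid sector %llu\n",
      	       (unsigned long long)sector);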
      Acked-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • A
      [PATCH] device-mapper snapshot: fix invalidation · 76df1c65
      Committed by Alasdair G Kergon
      When a snapshot becomes invalid, s->valid is set to 0.  In this state, a
      snapshot can no longer be accessed.
      
      When s->lock is acquired, before doing anything else, s->valid must be checked
      to ensure the snapshot remains valid.
      
      This patch eliminates some races (that may cause panics) by adding some
      missing checks.  At the same time, some unnecessary levels of indentation are
      removed and snapshot invalidation is moved into a single function that always
      generates a device-mapper event.
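      The pattern being added looks roughly like this (an illustrative
      fragment, not a literal hunk from the patch):

      	down_write(&s->lock);

      	if (!s->valid) {
      		/* The snapshot was invalidated while we waited for the lock:
      		 * stop before touching any exception data. */
      		up_write(&s->lock);
      		return -EIO;
      	}

      	/* ... normal snapshot processing under the lock ... */
      	up_write(&s->lock);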
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • A
      [PATCH] device-mapper snapshot: replace sibling list · b4b610f6
      Committed by Alasdair G Kergon
      The siblings "list" is used unsafely at the moment.
      
      Firstly, only the element on the list being changed gets locked (via the
      snapshot lock), not the next and previous elements which have pointers that
      are also being changed.
      
      Secondly, if you have two or more snapshots and write to the same chunk a
      second time before every snapshot has finished making its private copy of the
      data, then, if you're unlucky, __origin_write() could attempt its list_merge()
      and dereference a 'last' pointer to a pending_exception structure that has just
      been freed.
      
      Analysis reveals that the list is actually only there for reference counting.
      If 5 pending_exceptions are needed in origin_write, then the 5 are joined
      together into a 5-element list - without a separate list head because there's
      nowhere suitable to store it.  As the pending_exceptions complete, they are
      removed from the list one-by-one and any contents of origin_bios get moved
      across to one of the remaining pending_exceptions on the list.  Whichever one
      is last is detected because list_empty() is then true and the origin_bios get
      submitted.
      
      The fix proposed here uses an alternative reference counting mechanism by
      choosing one of the pending_exceptions as primary and maintaining an atomic
      counter there.
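      Conceptually the replacement works like this (primary_pe and ref_count
      follow the description above; the surrounding code is an illustrative
      sketch, not the literal patch):

      	/* Illustrative only: pick one pending_exception as the primary and
      	 * count references on it instead of linking siblings together. */
      	if (!primary_pe) {
      		primary_pe = pe;
      		atomic_set(&primary_pe->ref_count, 1);	/* caller's reference */
      	}
      	atomic_inc(&primary_pe->ref_count);		/* one per copy in flight */

      	/* Each completing copy drops a reference; whoever drops the last one
      	 * submits primary_pe->origin_bios and frees primary_pe. */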
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • A
      [PATCH] device-mapper snapshot: fix origin_write pending_exception submission · eccf0817
      Committed by Alasdair G Kergon
      Say you have several snapshots of the same origin and then you issue a write
      to some place in the origin for the first time.
      
      Before the device-mapper snapshot target lets the write go through to the
      underlying device, it needs to make a copy of the data that is about to be
      overwritten.  Each snapshot is independent, so it makes one copy for each
      snapshot.
      
      __origin_write() loops through each snapshot and checks to see whether a copy
      is needed for that snapshot.  (A copy is only needed the first time that data
      changes.)
      
      If a copy is needed, the code allocates a 'pending_exception' structure
      holding the details.  It links these together for all the snapshots, then
      works its way through this list and submits the copying requests to the kcopyd
      thread by calling start_copy().  When each request is completed, the original
      pending_exception structure gets freed in pending_complete().
      
      If you're very unlucky, this structure can get freed *before* the submission
      process has finished walking the list.
      
      This patch:
      
        1) Creates a new temporary list pe_queue to hold the pending exception
           structures;
      
        2) Does all the bookkeeping up-front, then walks through the new list
           safely and calls start_copy() for each pending_exception that needed it
           (sketched below);
      
        3) Avoids attempting to add pe->siblings to the list if it's already
           connected.
      
      [NB This does not fix all the races in this code.  More patches will follow.]
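      A condensed sketch of the reordered flow (pe_queue and start_copy() come
      from the description above; the struct and member names are assumed for
      illustration):

      	struct dm_snap_pending_exception *pe;	/* struct name assumed */
      	LIST_HEAD(pe_queue);			/* temporary, local list */

      	/* Phase 1: per-snapshot bookkeeping only; queue each pending
      	 * exception locally, and link pe->siblings only if it is not
      	 * already on the shared list. */
      	list_add_tail(&pe->list, &pe_queue);	/* 'list' member assumed */

      	/* Phase 2: submit the copies only after all bookkeeping is done, so
      	 * an early pending_complete() can no longer free a structure that
      	 * the submission walk still needs. */
      	list_for_each_entry(pe, &pe_queue, list)
      		start_copy(pe);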
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  16. 27 March 2006, 1 commit