1. 12 Aug, 2010 11 commits
    • dm crypt: use kstrdup · a9c88f2e
      Julia Lawall authored
      Use kstrdup when the goal of an allocation is to copy a string into the
      allocated region.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression from,to;
      expression flag,E1,E2;
      statement S;
      @@
      
      -  to = kmalloc(strlen(from) + 1,flag);
      +  to = kstrdup(from, flag);
         ... when != \(from = E1 \| to = E1 \)
         if (to==NULL || ...) S
         ... when != \(from = E2 \| to = E2 \)
      -  strcpy(to, from);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
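
The transformation in the semantic patch above can be illustrated in user space. This is a minimal sketch, not kernel code: malloc() stands in for kmalloc(), and dup_string() is a hypothetical stand-in for kstrdup() (the allocation flag is omitted because user space has no equivalent).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Open-coded pattern that the semantic patch matches:
 * allocate strlen(from) + 1 bytes, check for failure, then copy. */
char *copy_open_coded(const char *from)
{
	char *to = malloc(strlen(from) + 1);

	if (to == NULL)
		return NULL;
	strcpy(to, from);
	return to;
}

/* Hypothetical user-space stand-in for kstrdup(from, flag):
 * allocation and copy live in one helper. The caller frees the result. */
char *dup_string(const char *from)
{
	size_t len;
	char *to;

	assert(from != NULL);
	len = strlen(from) + 1;		/* include the terminating NUL */
	to = malloc(len);
	if (to != NULL)
		memcpy(to, from, len);
	return to;
}
```

Both functions return an equal, independently allocated copy; the second form removes the chance of getting the length arithmetic wrong at each call site.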
    • dm ioctl: use nonseekable_open · 402ab352
      Arnd Bergmann authored
      The dm control device does not implement read/write, so it has no use for
      seeking.  Using no_llseek prevents falling back to default_llseek, which
      requires the BKL.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: separate device deletion from dm_put · 3f77316d
      Kiyoshi Ueda authored
      This patch separates the device deletion code from dm_put()
      to make sure the deletion happens in the process context.
      
      By this patch, device deletion always occurs in an ioctl (process)
      context and dm_put() can be called in interrupt context.
      As a result, the request-based dm's bad dm_put() usage pointed out
      by Mikulas below disappears.
          http://marc.info/?l=dm-devel&m=126699981019735&w=2
      
      Without this patch, I confirmed there is a case to crash the system:
          dm_put() => dm_table_destroy() => vfree() => BUG_ON(in_interrupt())
      
      Some more background and details:
      In request-based dm, a device opener can remove a mapped_device
      while the last request is still completing, because bios in the last
      request complete first and then the device opener can close and remove
      the mapped_device before the last request completes:
        CPU0                                          CPU1
        =================================================================
        <<INTERRUPT>>
        blk_end_request_all(clone_rq)
          blk_update_request(clone_rq)
            bio_endio(clone_bio) == end_clone_bio
              blk_update_request(orig_rq)
                bio_endio(orig_bio)
                                                      <<I/O completed>>
                                                      dm_blk_close()
                                                      dev_remove()
                                                        dm_put(md)
                                                          <<Free md>>
         blk_finish_request(clone_rq)
           ....
           dm_end_request(clone_rq)
             free_rq_clone(clone_rq)
             blk_end_request_all(orig_rq)
             rq_completed(md)
      
      So request-based dm used dm_get()/dm_put() to hold md for each I/O
      until its request completion handling is fully done.
      However, the final dm_put() can call the device deletion code which
      must not be run in interrupt context and may cause kernel panic.
      
      To solve the problem, this patch moves the device deletion code,
      dm_destroy(), to predetermined places that are actually deleting
      the mapped_device in ioctl (process) context, and changes dm_put()
      just to decrement the reference count of the mapped_device.
      By this change, dm_put() can be used in any context and the symmetric
      model below is introduced:
          dm_create():  create a mapped_device
          dm_destroy(): destroy a mapped_device
          dm_get():     increment the reference count of a mapped_device
          dm_put():     decrement the reference count of a mapped_device
      
      dm_destroy() waits for all references of the mapped_device to disappear,
      then deletes the mapped_device.
      
      dm_destroy() uses active waiting with msleep(1), since deleting
      the mapped_device isn't a performance-critical task.
      And since at this point, nobody opens the mapped_device and no new
      reference will be taken, the pending counts are just for racing
      completing activity and will eventually decrease to zero.
      
      For the unlikely case of the forced module unload, dm_destroy_immediate(),
      which doesn't wait and forcibly deletes the mapped_device, is also
      introduced and used in dm_hash_remove_all().  Otherwise, "rmmod -f"
      may be stuck and never return.
      And now, because the mapped_device is deleted at this point, subsequent
      accesses to the mapped_device may cause NULL pointer references.
      
      Cc: stable@kernel.org
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
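
The symmetric create/destroy/get/put model described in this commit can be sketched in user space as follows. All names here (struct mapped_dev, md_create(), md_get(), md_put(), md_destroy_step()) are hypothetical stand-ins rather than the kernel functions, and the polling step is an assumption about how the "wait with msleep(1)" loop might be structured.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mapped_device. */
struct mapped_dev {
	atomic_int holders;	/* outstanding references */
};

struct mapped_dev *md_create(void)
{
	struct mapped_dev *md = calloc(1, sizeof(*md));

	if (md != NULL)
		atomic_init(&md->holders, 0);
	return md;
}

void md_get(struct mapped_dev *md)
{
	atomic_fetch_add(&md->holders, 1);
}

/* Only decrements the count and never frees, so it could be called
 * from any context, including a completion handler. */
void md_put(struct mapped_dev *md)
{
	int old = atomic_fetch_sub(&md->holders, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
}

/* One polling step of destruction, intended for process context only.
 * Returns true and frees md once no references remain; otherwise returns
 * false and the caller sleeps briefly and retries. */
bool md_destroy_step(struct mapped_dev *md)
{
	if (atomic_load(&md->holders) > 0)
		return false;
	free(md);
	return true;
}
```

A caller in process context would loop on md_destroy_step(), sleeping between attempts, while late completions simply call md_put().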
    • dm ioctl: release _hash_lock between devices in remove_all · 98f33285
      Kiyoshi Ueda authored
      This patch changes dm_hash_remove_all() to release _hash_lock when
      removing a device.  After removing the device, dm_hash_remove_all()
      takes _hash_lock and searches the hash from scratch again.
      
      This patch is a preparation for the next patch, which changes device
      deletion code to wait for md reference to be 0.  Without this patch,
      the wait in the next patch may cause AB-BA deadlock:
        CPU0                                CPU1
        -----------------------------------------------------------------------
        dm_hash_remove_all()
          down_write(_hash_lock)
                                            table_status()
                                              md = find_device()
                                                     dm_get(md)
                                                       <increment md->holders>
                                              dm_get_live_or_inactive_table()
                                                dm_get_inactive_table()
                                                  down_write(_hash_lock)
          <in the md deletion code>
            <wait for md->holders to be 0>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
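
The "drop the lock, remove one device, rescan from scratch" loop described in this commit might look roughly like the sketch below. It is a single-threaded model under stated assumptions: struct table, the lock_held flag (modelling _hash_lock) and destroy_device() are hypothetical, and the asserts only check that the potentially blocking step runs with the lock released.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SLOTS 4

/* Hypothetical device table; lock_held models _hash_lock. */
struct table {
	bool lock_held;
	bool present[NR_SLOTS];
	int destroyed;		/* number of devices destroyed so far */
};

static void table_lock(struct table *t)
{
	assert(!t->lock_held);
	t->lock_held = true;
}

static void table_unlock(struct table *t)
{
	assert(t->lock_held);
	t->lock_held = false;
}

/* May wait for references to drain, so it must run without the lock. */
static void destroy_device(struct table *t, int slot)
{
	(void)slot;
	assert(!t->lock_held);
	t->destroyed++;
}

void remove_all(struct table *t)
{
	int i;

retry:
	table_lock(t);
	for (i = 0; i < NR_SLOTS; i++) {
		if (!t->present[i])
			continue;
		t->present[i] = false;	/* unhash while holding the lock */
		table_unlock(t);
		destroy_device(t, i);	/* lock released: waiting here cannot deadlock */
		goto retry;		/* table may have changed, so rescan */
	}
	table_unlock(t);
}
```

Restarting the scan costs extra passes over the table, but it avoids holding the lock across a wait, which is the AB-BA scenario shown in the diagram.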
    • dm: prevent access to md being deleted · abdc568b
      Kiyoshi Ueda authored
      This patch prevents access to mapped_device which is being deleted.
      
      Currently, even after a mapped_device has been removed from the hash,
      it can still be accessed through idr_find() using its minor number.
      That can cause the race and NULL pointer dereference below:
        CPU0                          CPU1
        ------------------------------------------------------------------
        dev_remove(param)
          down_write(_hash_lock)
          dm_lock_for_deletion(md)
            spin_lock(_minor_lock)
            set_bit(DMF_DELETING)
            spin_unlock(_minor_lock)
          __hash_remove(hc)
          up_write(_hash_lock)
                                      dev_status(param)
                                        md = find_device(param)
                                               down_read(_hash_lock)
                                               __find_device_hash_cell(param)
                                                 dm_get_md(param->dev)
                                                   md = dm_find_md(dev)
                                                          spin_lock(_minor_lock)
                                                          md = idr_find(MINOR(dev))
                                                          spin_unlock(_minor_lock)
          dm_put(md)
            free_dev(md)
                                                   dm_get(md)
                                               up_read(_hash_lock)
                                        __dev_status(md, param)
                                        dm_put(md)
      
      This patch fixes such problems.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
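
The commit message does not spell out the fix, so the following is only a guess at its shape: a lookup by minor number that refuses to hand out a device already marked for deletion. struct dev, its deleting flag and find_and_get() are hypothetical names; in the kernel the check and the reference increment would need to happen under the same lock (the diagram suggests _minor_lock).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record. */
struct dev {
	bool deleting;	/* set before the device is unhashed */
	int holders;	/* reference count */
};

/*
 * Look up a device by minor number in 'table' (n entries).
 * Returns NULL for an unknown minor or a device that is being deleted;
 * otherwise takes a reference and returns the device.
 */
struct dev *find_and_get(struct dev **table, size_t n, size_t minor)
{
	struct dev *d;

	assert(table != NULL);
	if (minor >= n)
		return NULL;
	d = table[minor];
	if (d == NULL || d->deleting)
		return NULL;	/* never resurrect a dying device */
	d->holders++;
	return d;
}
```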
    • dm ioctl: return uevent flag after rename · 856a6f1d
      Peter Rajnoha authored
      All the dm ioctls that generate uevents set the DM_UEVENT_GENERATED flag so
      that userspace knows whether or not to wait for a uevent to be processed
      before continuing.
      
      The dm rename ioctl sets this flag but was not structured to return it
      to userspace.  This patch restructures the rename ioctl processing to
      behave like the other ioctls that return data, and so fixes this.
      Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: make __dev_status void · 094ea9a0
      Alasdair G Kergon authored
      __dev_status() cannot fail so make it void and simplify callers.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: remove __dev_status from geometry and target message · 6be54494
      Peter Rajnoha authored
      Remove useless __dev_status call while processing an ioctl that sets up
      device geometry and target message.  The data is not returned to
      userspace so there is no point collecting it and in the case of
      target_message it is collected before processing the message so if it
      did return it might be stale.
      Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: test chunk size against both origin and snapshot · c2411045
      Mikulas Patocka authored
      Validate chunk size against both origin and snapshot sector size
      
      Don't allow chunk size smaller than either origin or snapshot logical
      sector size. Reading or writing data not aligned to sector size is not
      allowed and causes immediate errors.
      
      This requires us to open the origin before initialising the
      exception store and to export dm_snap_origin.
      
      Cc: stable@kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
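
The rule stated in this commit (the chunk must not be smaller than the logical sector size of either device) can be sketched as a small check. chunk_size_ok() and its parameters are hypothetical; the sketch assumes the chunk size is given in 512-byte sectors and the logical block sizes in bytes, and it implements only the "not smaller than" condition from the message.

```c
#include <assert.h>
#include <stdbool.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/*
 * chunk_sectors: chunk size in 512-byte sectors.
 * origin_lbs, cow_lbs: logical block sizes in bytes of the origin and
 * snapshot (COW) devices.
 */
bool chunk_size_ok(unsigned int chunk_sectors,
		   unsigned int origin_lbs, unsigned int cow_lbs)
{
	unsigned int larger;

	assert(origin_lbs >= 512 && cow_lbs >= 512);
	/* The stricter of the two devices decides the minimum. */
	larger = origin_lbs > cow_lbs ? origin_lbs : cow_lbs;
	return chunk_sectors >= (larger >> SECTOR_SHIFT);
}
```

For example, with a 512-byte origin and a 4096-byte snapshot device, an 8-sector chunk passes while a 4-sector chunk is rejected.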
    • dm snapshot: iterate origin and cow devices · 1e5554c8
      Mikulas Patocka authored
      Iterate both origin and snapshot devices
      
      iterate_devices method should call the callback for all the devices where
      the bio may be remapped. Thus, snapshot_iterate_devices should call the callback
      for both snapshot and origin underlying devices because it remaps some bios
      to the snapshot and some to the origin.
      
      snapshot_iterate_devices called the callback only for the origin device.
      This led to badly calculated device limits if snapshot and origin were placed
      on different types of disks.
      
      Cc: stable@kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
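
The idea of calling the callback for every device a bio may be remapped to can be sketched as below. The types and names (iterate_fn, struct snapshot, snapshot_iterate(), count_devices()) are simplified, hypothetical stand-ins; the real iterate_devices callback takes more arguments than shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified callback type: returns non-zero to stop the iteration. */
typedef int (*iterate_fn)(const char *dev_name, void *data);

/* Hypothetical snapshot target with its two underlying devices. */
struct snapshot {
	const char *origin;
	const char *cow;
};

/* Visit both devices, because bios may be remapped to either one. */
int snapshot_iterate(const struct snapshot *s, iterate_fn fn, void *data)
{
	int r;

	assert(s != NULL && fn != NULL);
	r = fn(s->origin, data);
	if (r == 0)
		r = fn(s->cow, data);
	return r;
}

/* Example callback: counts how many devices were visited. */
int count_devices(const char *dev_name, void *data)
{
	(void)dev_name;
	++*(int *)data;
	return 0;
}
```

A limit-calculating callback that sees only the origin would miss constraints imposed by the snapshot device, which is the miscalculation the commit describes.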
    • dm mpath: fix NULL pointer dereference when path parameters missing · 6bbf79a1
      Alasdair G Kergon authored
      multipath_ctr() forgets to return an error after detecting
      missing path parameters.  Fix this.
      Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
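
The class of bug fixed here (detecting an error but not returning it) can be illustrated with a small argument parser. This is a generic sketch, not the multipath code: struct args and consume_path_args() are hypothetical, and the error string is made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical argument cursor. */
struct args {
	unsigned int argc;
	const char **argv;
};

/*
 * Consume nr_params path parameters from 'as'.
 * Returns 0 on success, or -EINVAL (with *error set) if too few remain.
 */
int consume_path_args(struct args *as, unsigned int nr_params,
		      const char **error)
{
	assert(as != NULL && error != NULL);
	if (as->argc < nr_params) {
		*error = "not enough path parameters";
		/* Returning here is the important part: setting the message
		 * alone would let the caller run on with missing arguments. */
		return -EINVAL;
	}
	as->argv += nr_params;
	as->argc -= nr_params;
	return 0;
}
```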
  2. 08 Aug, 2010 8 commits
  3. 07 Aug, 2010 1 commit
  4. 26 Jul, 2010 16 commits
    • md/bitmap: separate out loading a bitmap from initialising the structures. · 69e51b44
      NeilBrown authored
      dm makes this distinction between ->ctr and ->resume, so we need to
      too.
      
      Also get the new bitmap_load to clear out the bitmap first, as this is
      most consistent with the dm suspend/resume approach.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log. · e384e585
      NeilBrown authored
      This allows md/raid5 to fully work as a dm target.
      
      Normally md uses a 'filemap' which contains a list of pages of bits
      each of which may be written separately.
      dm-log uses an all-or-nothing approach to writing the log, so
      when using a dm-log, ->filemap is NULL and the flags normally stored
      in filemap_attr are stored in ->logattrs instead.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: optimise scanning of empty bitmaps. · ef425673
      NeilBrown authored
      A bitmap is stored as one page per 2048 bits.
      If none of the bits are set, the page is not allocated.
      
      When bitmap_get_counter finds that a page isn't allocated,
      it just reports that one bit's worth of space isn't flagged,
      rather than reporting that 2048 bits worth of space are
      unflagged.
      This can cause searches for flagged bits (e.g. bitmap_close_sync)
      to do more work than is really necessary.
      
      So change bitmap_get_counter (when creating) to report a number of
      blocks that more accurately reports the range of the device for which
      no counter currently exists.
      Signed-off-by: NeilBrown <neilb@suse.de>
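
The optimisation can be illustrated with a little arithmetic. The sketch below works in units of bitmap bits and assumes, as the commit message states, 2048 bits per page; unflagged_span() is a hypothetical helper, not the signature of bitmap_get_counter.

```c
#include <assert.h>
#include <stdbool.h>

#define BITS_PER_BITMAP_PAGE 2048UL	/* from the commit message */

/*
 * How many consecutive bits, starting at 'bit', can be reported as
 * unflagged in a single step.
 */
unsigned long unflagged_span(unsigned long bit, bool page_allocated)
{
	if (page_allocated)
		return 1;	/* only this bit's counter was examined */
	/* No page at all: every bit up to the end of the page is clear. */
	return BITS_PER_BITMAP_PAGE - (bit % BITS_PER_BITMAP_PAGE);
}
```

A scan that advances by the returned span skips an unallocated page in one step instead of 2048.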
    • md/bitmap: clean up plugging calls. · b63d7c2e
      NeilBrown authored
      1/ use md_unplug in bitmap.c as we will soon be using bitmaps under
        arrays with no queue attached.
      
      2/ Don't bother plugging the queue when we set a bit in the bitmap.
         The reason for this was to encourage as many bits as possible to
         get set before we unplug and write stuff out.
         However every personality already plugs the queue after
         bitmap_startwrite either directly (raid1/raid10) or by setting
         STRIPE_BIT_DELAY which causes the queue to be plugged later
         (raid5).
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: reduce dependence on sysfs. · 5ff5afff
      NeilBrown authored
      For dm-raid45 we will want to use bitmaps in dm-targets which don't
      have entries in sysfs, so cope with the mddev not living in sysfs.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: white space clean up and similar. · ac2f40be
      NeilBrown authored
      Fixed some whitespace problems.
      Fixed some checkpatch.pl complaints.
      Replaced kmalloc ... memset(0), with kzalloc
      Fixed an unlikely memory leak on an error path.
      Reformatted a number of 'if/else' sets, sometimes
      replacing goto with an else clause.
      Removed some old comments and commented-out code.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: export raid5 unplugging interface. · 9f7c2220
      NeilBrown authored
      Also remove remaining accesses to ->queue and ->gendisk when ->queue
      is NULL (as it is in a DM target).
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/plug: optionally use plugger to unplug an array during resync/recovery. · 252ac522
      NeilBrown authored
      If an array doesn't have a 'queue' then md_do_sync cannot
      unplug it.
      In that case it will have a 'plugger', so make that available
      to the mddev, and use it to unplug the array if needed.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: add simple plugging infrastructure. · 2ac87401
      NeilBrown authored
      md/raid5 uses the plugging infrastructure provided by the block layer
      and 'struct request_queue'.  However when we plug raid5 under dm there
      is no request queue so we cannot use that.
      
      So create a similar infrastructure that is much lighter weight and use
      it for raid5.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: export is_congested test · 11d8a6e3
      NeilBrown authored
      The dm module will need this for dm-raid45.
      
      Also only access ->queue->backing_dev_info->congested_fn
      if ->queue actually exists.  It won't in a dm target.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: Don't set read-ahead when there is no queue · 4a5add49
      NeilBrown authored
      dm-raid456 does not provide a 'queue' for raid5 to use,
      so we must make raid5 stop depending on the queue.
      
      First: read_ahead
      dm handles read-ahead adjustment fully in userspace, so
      simply don't do any readahead adjustments if there is
      no queue.
      
      Also re-arrange code slightly so all the accesses to ->queue are
      together.
      
      Finally, move the blk_queue_merge_bvec function into the 'if' as
      the ->split_io setting in dm-raid456 has the same effect.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: add support for raising dm events. · 768a418d
      NeilBrown authored
      dm uses scheduled work to raise events to user-space.
      So allow md device to have work_structs and schedule them on an error.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: export various start/stop interfaces · 390ee602
      NeilBrown authored
      Export entry points for starting and stopping md arrays.
      This will be used by a module to make md/raid5 work under
      dm.
      Also stop calling md_stop_writes from md_stop, as that won't
      work well with dm - it will want to call the two separately.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: split out md_rdev_init · e8bb9a83
      NeilBrown authored
      This functionality will be needed separately in a subsequent patch, so
      split it into its own exported function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: be more careful setting MD_CHANGE_CLEAN · 676e42d8
      NeilBrown authored
      When MD_CHANGE_CLEAN is set we might block in md_write_start.
      So we should only set it when fairly sure that something will clear
      it.
      
      There are two places where it is set so as to encourage a metadata
      update to record the progress of resync/recovery.  This should only
      be done if the internal metadata update mechanisms are in use, which
      can be tested by inspecting '->persistent'.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: ensure we create a unique name for kmem_cache when mddev has no gendisk · f4be6b43
      NeilBrown authored
      We will shortly allow md devices with no gendisk (they are attached to
      a dm-target instead).  That will cause mdname() to return 'mdX'.
      There is one place where mdname really needs to be unique: when
      creating the name for a slab cache.
      So in that case, if there is no gendisk, use the address of the mddev
      formatted in HEX to provide a unique name.
      Signed-off-by: NeilBrown <neilb@suse.de>
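
The naming fallback described in this commit can be sketched with snprintf(). cache_name() and its parameters are hypothetical, and the "raid%d-" prefix is an assumption for illustration; the point is only that the object's address gives a per-instance unique suffix when no disk name exists.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a slab-cache style name into buf.
 * disk_name may be NULL when the array has no gendisk; in that case the
 * address of the array object is used as the unique part of the name.
 * Returns the value returned by snprintf().
 */
int cache_name(char *buf, size_t len, int level,
	       const char *disk_name, const void *mddev)
{
	assert(buf != NULL && len > 0);
	if (disk_name != NULL)
		return snprintf(buf, len, "raid%d-%s", level, disk_name);
	/* %p output is implementation-defined but distinct per address. */
	return snprintf(buf, len, "raid%d-%p", level, mddev);
}
```

Two arrays without a gendisk then get different cache names even though mdname() would return the same string for both.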
  5. 21 Jul, 2010 2 commits
  6. 24 Jun, 2010 2 commits
    • md/raid5: don't include 'spare' drives when reshaping to fewer devices. · 3424bf6a
      NeilBrown authored
      There are few situations where it would make any sense to add a spare
      when reducing the number of devices in an array, but it is
      conceivable:  A 6 drive RAID6 with two missing devices could be
      reshaped to a 5 drive RAID6, and a spare could become available
      just in time for the reshape, but not early enough to have been
      recovered first.  'freezing' recovery can make this easy to
      do without any races.
      
      However doing such a thing is a bad idea.  md will not record the
      partially-recovered state of the 'spare' and when the reshape
      finishes it will think that the spare is still spare.
      The easiest way to avoid this confusion is to simply disallow it.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: add a missing 'continue' in a loop. · 2f115882
      NeilBrown authored
      As the comment says, the tail of this loop only applies to devices
      that are not fully in sync, so if In_sync was set, we should avoid
      the rest of the loop.
      
      This bug will hardly ever cause an actual problem.  The worst it
      can do is allow an array to be assembled that is dirty and degraded,
      which is not generally a good idea (without warning the sysadmin
      first).
      
      This will only happen if the array is RAID4 or a RAID5/6 in an
      intermediate state during a reshape and so has one drive that is
      all 'parity' - no data - while some other device has failed.
      
      This is certainly possible, but not at all common.
      Signed-off-by: NeilBrown <neilb@suse.de>