1. 06 Jan, 2009 13 commits
    • dm: add name and uuid to sysfs · 784aae73
      Milan Broz committed
      Implement simple read-only sysfs entry for device-mapper block device.
      
      This patch adds a simple sysfs directory named "dm" under block device
      properties and implements
      	- name attribute (string containing mapped device name)
      	- uuid attribute (string containing UUID, or empty string if not set)
      
      The kobject is embedded in mapped_device struct, so no additional
      memory allocation is needed for initializing sysfs entry.
      
      During the processing of sysfs attribute we need to lock mapped device
      which is done by a new function dm_get_from_kobj, which returns the md
      associated with kobject and increases the usage count.
      
      Each 'show attribute' function is responsible for its own locking.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
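From userspace, the two attributes can be read like any other sysfs file. The following is a minimal sketch, assuming the device shows up under a block directory such as /sys/block/dm-0; that path and the helper name read_dm_attr are illustrative assumptions and are not part of the patch.

```c
#include <stdio.h>
#include <string.h>

/* Read <block_dir>/dm/<attr> (attr is "name" or "uuid") into buf and strip
 * a trailing newline if present.
 * Returns 0 on success, -1 if the attribute cannot be read. */
static int read_dm_attr(const char *block_dir, const char *attr,
                        char *buf, size_t len)
{
    char path[512];
    FILE *f;
    int n;

    if (len == 0)
        return -1;
    n = snprintf(path, sizeof(path), "%s/dm/%s", block_dir, attr);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path did not fit */
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f))
        buf[0] = '\0';          /* empty attribute, e.g. uuid not set */
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

A caller would pass something like read_dm_attr("/sys/block/dm-0", "name", buf, sizeof(buf)) and treat an empty string from "uuid" as "no UUID set", matching the description above.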
    • dm table: rework reference counting · d5816876
      Mikulas Patocka committed
      Rework table reference counting.
      
      The existing code uses a reference counter. When the last reference is
      dropped and the counter reaches zero, the table destructor is called.
      Table reference counters are acquired/released from upcalls from other
      kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
      If the reference counter reaches zero in one of the upcalls, the table
      destructor is called from almost random kernel code.
      
      This leads to various problems:
      * dm_any_congested being called under a spinlock, which calls the
        destructor, which calls some sleeping function.
      * the destructor attempting to take a lock that is already taken by the
        same process.
      * stale reference from some other kernel code keeps the table
        constructed, which keeps some devices open, even after successful
        return from "dmsetup remove". This can confuse lvm and prevent closing
        of underlying devices or reusing device minor numbers.
      
      The patch changes reference counting so that the table destructor can be
      called only at predetermined places.
      
      The table has always exactly one reference from either mapped_device->map
      or hash_cell->new_map. After this patch, this reference is not counted
      in table->holders.  A pair of dm_create_table/dm_destroy_table functions
      is used for table creation/destruction.
      
      Temporary references from the other code increase table->holders. A pair
      of dm_table_get/dm_table_put functions is used to manipulate it.
      
      When the table is about to be destroyed, we wait for table->holders to
      reach 0. Then, we call the table destructor.  We use active waiting with
      msleep(1), because the situation happens rarely (to one user in 5 years)
      and removing the device isn't a performance-critical task: the user doesn't
      care if it takes one tick more or not.
      
      This way, the destructor is called only at specific points
      (dm_table_destroy function) and the above problems associated with lazy
      destruction can't happen.
      
      Finally remove the temporary protection added to dm_any_congested().
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
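The holders scheme described in this commit can be sketched in userspace. This is only an illustration under stated assumptions: C11 atomics stand in for the kernel's atomic counter, nanosleep() stands in for msleep(1), and the names table_get, table_put and table_destroy are hypothetical stand-ins for dm_table_get, dm_table_put and the destroy function named in the message. It also leaves out how the real code prevents new references from being taken once the table is unbound.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the table: only the holders counter matters. */
struct table {
    atomic_int holders;   /* temporary references only; the owner is not counted */
    bool destroyed;       /* set by the destructor, for illustration */
};

static void table_init(struct table *t)
{
    atomic_init(&t->holders, 0);
    t->destroyed = false;
}

/* Take a temporary reference. */
static void table_get(struct table *t)
{
    atomic_fetch_add(&t->holders, 1);
}

/* Drop a temporary reference; this never runs the destructor. */
static void table_put(struct table *t)
{
    atomic_fetch_sub(&t->holders, 1);
}

/* Called only by the owner: wait for holders to drain, then destroy. */
static void table_destroy(struct table *t)
{
    const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000 };

    while (atomic_load(&t->holders) != 0)
        nanosleep(&one_ms, NULL);   /* userspace analogue of msleep(1) */
    t->destroyed = true;
}
```

The point of the design is visible in table_put(): dropping a reference only decrements the counter, so an upcall running under a spinlock can never end up in the destructor.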
    • dm: support barriers on simple devices · ab4c1424
      Andi Kleen committed
      Implement barrier support for single device DM devices
      
      This patch implements barrier support in DM for the common case of dm linear
      just remapping a single underlying device. In this case we can safely
      pass the barrier through because there can be no reordering between
      devices.
      
       NB. Any DM device might cease to support barriers if it gets
           reconfigured so code must continue to allow for a possible
           -EOPNOTSUPP on every barrier bio submitted.  - agk
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm request: add caches · 8fbf26ad
      Kiyoshi Ueda committed
      This patch prepares some kmem_caches for request-based dm.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: allow dm_copy_name_and_uuid to return only one field · 23d39f63
      Milan Broz committed
      Allow NULL buffer in dm_copy_name_and_uuid if you only want to return one of
      the fields.
      
      (Required by a following patch that adds these fields to sysfs.)
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Reviewed-by: Alasdair G Kergon <agk@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
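The "NULL means skip this field" convention can be sketched as follows. The struct dev_ident and the function copy_name_and_uuid below are hypothetical stand-ins written for illustration; they are not the kernel's dm_copy_name_and_uuid or its data structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the device's identity; not the kernel struct. */
struct dev_ident {
    const char *name;
    const char *uuid;   /* NULL when no UUID is set */
};

/* Copy name and/or uuid; a NULL destination means "skip this field".
 * Callers must supply buffers large enough for the strings. */
static void copy_name_and_uuid(const struct dev_ident *d, char *name, char *uuid)
{
    if (name)
        strcpy(name, d->name);
    if (uuid)
        strcpy(uuid, d->uuid ? d->uuid : "");
}
```

A caller that only needs the name passes NULL for the uuid buffer, and the other way round, which is the usage the sysfs attributes need.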
    • dm log: ensure log bitmap fits on log device · ac1f0ac2
      Milan Broz committed
      Check that the log bitmap will fit within the log device.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm log: move region_size validation · 2045e88e
      Milan Broz committed
      Move log size validation from mirror target to log constructor.
      
      Removed PAGE_SIZE restriction we no longer think necessary.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm log: avoid reinitialising io_req on every operation · 6f3af01c
      Takahiro Yasui committed
      The rw_header function updates three members of the io_req data every
      time I/O is processed. bi_rw and notify.fn are never modified once
      they have been initialized, so they can be set in advance.
      
      header_to_disk() can also be pulled out of write_header() since only one
      caller needs it and write_header() can be replaced by rw_header()
      directly.
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: consolidate target deregistration error handling · 10d3bd09
      Mikulas Patocka committed
      Change dm_unregister_target to return void and use BUG() for error
      reporting.
      
      dm_unregister_target can only fail because of a programming bug in the
      target driver. It can't fail because of user behavior or disk errors.
      
      This patch changes dm_unregister_target to return void and use BUG() if
      someone tries to unregister a target that is not registered or a target
      that is in use.
      
      This patch removes code duplication (testing of error codes in all dm
      targets) and reports bugs in just one place, in dm_unregister_target. In
      some target drivers, these return codes were ignored, which could lead
      to a situation where bugs could be missed.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm raid1: fix error count · d460c65a
      Jonathan Brassow committed
      Always increase the error count when I/O on a leg of a mirror fails.
      
      The error count is used to decide whether to select an alternative
      mirror leg.  If the target doesn't use the "handle_errors" feature, the
      error count is not updated and the bio can get requeued forever by the
      read callback.
      
      Fix it by increasing error_count before the handle_errors feature
      checking.
      
      Cc: stable@kernel.org
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
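The ordering this fix describes can be sketched as below. The structs and the functions fail_mirror and mirror_usable are simplified, hypothetical stand-ins, not the dm-raid1 code; the only point is that the counter is bumped before the handle_errors check rather than after it.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for a mirror leg and its mirror set. */
struct mirror {
    unsigned error_count;
};

struct mirror_set {
    bool handle_errors;   /* the "handle_errors" feature flag */
    unsigned events;      /* error-handling actions triggered (illustrative) */
};

/* Record a failed I/O on one leg. */
static void fail_mirror(struct mirror_set *ms, struct mirror *m)
{
    m->error_count++;          /* always counted, as the fix requires */

    if (!ms->handle_errors)
        return;                /* no further handling without the feature */

    ms->events++;              /* placeholder for the real error handling */
}

/* A leg with recorded errors should not be picked again for reads. */
static bool mirror_usable(const struct mirror *m)
{
    return m->error_count == 0;
}
```

With the increment placed after the early return instead, a set without handle_errors would keep reporting the failed leg as usable, which is the requeue-forever situation described in the message.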
    • dm log: fix dm_io_client leak on error paths · c7a2bd19
      Takahiro Yasui committed
      In create_log_context, dm_io_client_destroy must be called when the
      memory allocation of disk_header, sync_bits or recovering_bits fails,
      but currently it is not.
      
      Cc: stable@kernel.org
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      Acked-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
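The class of bug fixed here, a resource acquired early and not released on a later failure, is commonly handled with goto-based unwinding. The sketch below is a userspace illustration under stated assumptions: io_client_create, io_client_destroy, create_ctx and the injectable allocator are hypothetical names, and the live_clients counter exists only to make a leak observable.

```c
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);

static int live_clients;   /* clients created but not yet destroyed */

struct io_client { int unused; };              /* stand-in for dm_io_client */
struct log_ctx {
    struct io_client *io;
    void *disk_header;
};

static struct io_client *io_client_create(void)
{
    struct io_client *c = malloc(sizeof(*c));

    if (c)
        live_clients++;
    return c;
}

static void io_client_destroy(struct io_client *c)
{
    free(c);
    live_clients--;
}

/* Allocator that always fails, used to exercise the error path. */
static void *failing_alloc(size_t size)
{
    (void)size;
    return NULL;
}

static struct log_ctx *create_ctx(alloc_fn alloc)
{
    struct log_ctx *lc = malloc(sizeof(*lc));

    if (!lc)
        return NULL;
    lc->io = io_client_create();
    if (!lc->io)
        goto bad_ctx;
    lc->disk_header = alloc(512);
    if (!lc->disk_header)
        goto bad_client;       /* must release the client acquired above */
    return lc;

bad_client:
    io_client_destroy(lc->io);
bad_ctx:
    free(lc);
    return NULL;
}

static void destroy_ctx(struct log_ctx *lc)
{
    free(lc->disk_header);
    io_client_destroy(lc->io);
    free(lc);
}
```

Calling create_ctx(failing_alloc) returns NULL and leaves live_clients at zero; without the bad_client label the client would be leaked, which is the situation the commit describes.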
    • dm snapshot: change yield to msleep · 90fa1527
      Mikulas Patocka committed
      Change yield() to msleep(1). If the thread has realtime priority,
      yield() doesn't really yield, so the yielding process would loop
      indefinitely and cause a machine lockup.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: drop reference at unbind · a1b51e98
      Mikulas Patocka committed
      Move one dm_table_put() so that the last reference in the thread
      gets dropped in __unbind().
      
      This is required for a following patch,
      dm-table-rework-reference-counting.patch, which will change the logic in
      such a way that table destructor is called only at specific points in
      the code.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  2. 29 Dec, 2008 1 commit
    • bio: allow individual slabs in the bio_set · bb799ca0
      Jens Axboe committed
      Instead of having a global bio slab cache, add a reference to one
      in each bio_set that is created. This allows for personalized slabs
      in each bio_set, so that they can have bios of different sizes.
      
      This means we can personalize the bios we return. File systems may
      want to embed the bio inside another structure, to avoid allocating
      more items (and stuffing them in ->bi_private) after they get a bio.
      Or we may want to embed a number of bio_vecs directly at the end
      of a bio, to avoid doing two allocations to return a bio. This is now
      possible.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  3. 19 Dec, 2008 1 commit
    • md: Don't read past end of bitmap when reading bitmap. · a2ed9615
      NeilBrown committed
      When we read the write-intent-bitmap off the device, we currently
      read a whole number of pages.
      When PAGE_SIZE is 4K, this works due to the alignment we enforce
      on the superblock and bitmap.
      When PAGE_SIZE is 64K, this can read past the end of the device,
      which causes an error.
      
      When we write the superblock, we take care to clip the last page
      to just the required size.  Copy that code into the read path
      so that we read only the required number of sectors.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Cc: stable@kernel.org
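The clipping idea can be sketched as a small length calculation. This is an illustration, not the md code: the function name bitmap_read_len, the 512-byte sector size and the example sizes in the usage note are assumptions chosen for the sketch.

```c
#include <stddef.h>

#define SECTOR_SIZE 512u   /* assumed sector size for this sketch */

/* Number of bytes to read for page `page_index` of a bitmap that occupies
 * `total_bytes` on disk. Full pages are read whole; the last, partial page
 * is clipped to the data that exists, rounded up to whole sectors.
 * page_size is assumed to be a multiple of SECTOR_SIZE. */
static size_t bitmap_read_len(size_t total_bytes, size_t page_size,
                              size_t page_index)
{
    size_t offset = page_index * page_size;
    size_t remaining;

    if (offset >= total_bytes)
        return 0;                       /* nothing left to read */
    remaining = total_bytes - offset;
    if (remaining >= page_size)
        return page_size;
    return (remaining + SECTOR_SIZE - 1) / SECTOR_SIZE * SECTOR_SIZE;
}
```

For a hypothetical 5000-byte bitmap, a 4K page size gives reads of 4096 and 1024 bytes, while a 64K page size gives a single 5120-byte read instead of a full 65536-byte page.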
  4. 03 Dec, 2008 1 commit
    • block: fix setting of max_segment_size and seg_boundary mask · 0e435ac2
      Milan Broz committed
      Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
      devices.
      
      When stacking devices (LVM over MD over SCSI) some of the request queue
      parameters are not set up correctly in some cases by default, namely
      max_segment_size and seg_boundary mask.
      
      If you create an MD device over SCSI, these attributes are zeroed.
      
      The problem appears when a further device-mapper mapping is stacked on
      top of this one; the queue attributes are then set in DM this way:
      
      request_queue   max_segment_size  seg_boundary_mask
      SCSI                65536             0xffffffff
      MD RAID1                0                      0
      LVM                 65536                 -1 (64bit)
      
      Unfortunately bio_add_page (resp. bio_phys_segments) calculates the number
      of physical segments according to these parameters.
      
      During generic_make_request() the segment count is recalculated and can
      increase bio->bi_phys_segments beyond the allowed limit.  (After
      bio_clone() in stack operation.)
      
      This is especially a problem in the CCISS driver, where it produces an
      oops here:
      
          BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
      
      (MAXSGENTRIES is 31 by default.)
      
      Sometimes even this command is enough to cause oops:
      
        dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10
      
      This command generates bios with 250 sectors, allocated in 32 4k-pages
      (last page uses only 1024 bytes).
      
      For the LVM layer, it allocates a bio with 31 segments (still OK for CCISS);
      unfortunately, on the lower layer it is recalculated to 32 segments, which
      violates the CCISS restriction and triggers the BUG_ON().
      
      The patch tries to fix it by:
      
       * initializing the attributes above in the request queue constructor
         blk_queue_make_request()
      
       * making sure that blk_queue_stack_limits() inherits the settings
      
       (DM uses its own function to set the limits because
       blk_queue_stack_limits() was introduced later.  It should probably switch
       to use the generic stack limit function too.)
      
       * setting the default seg_boundary value in one place (blkdev.h)
      
       * using this mask as the default in DM (instead of -1, which differs in 64bit)
      
      Bugs related to this:
      https://bugzilla.redhat.com/show_bug.cgi?id=471639
      http://bugzilla.kernel.org/show_bug.cgi?id=8672
      
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Reviewed-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
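The figures quoted for the dd example (bs=128000 giving 250 sectors spread over 32 4k pages, with 1024 bytes in the last page) can be checked with a few lines of arithmetic. The helper names below are made up for this sketch, and it assumes 512-byte sectors and a page-aligned buffer; it does not model how the block layer merges pages into segments.

```c
#include <stddef.h>

#define SECTOR_SIZE  512u    /* assumed sector size */
#define PAGE_SIZE_4K 4096u   /* the 4k page size from the commit message */

/* Sectors needed to cover `bytes`. */
static size_t sectors_for(size_t bytes)
{
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* 4k pages touched by a page-aligned buffer of `bytes`. */
static size_t pages_for(size_t bytes)
{
    return (bytes + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K;
}

/* Bytes used in the last page of a non-empty, page-aligned buffer. */
static size_t last_page_bytes(size_t bytes)
{
    size_t rem = bytes % PAGE_SIZE_4K;

    return rem ? rem : PAGE_SIZE_4K;
}
```

For 128000 bytes this gives 128000 / 512 = 250 sectors, 31 full pages (126976 bytes) plus one partial page, so 32 pages, with 128000 - 126976 = 1024 bytes in the last one. With one segment per page that is 32 segments, one more than the 31 that CCISS allows.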
  5. 26 Nov, 2008 2 commits
  6. 14 Nov, 2008 6 commits
  7. 06 Nov, 2008 3 commits
    • md: linear: Fix a division by zero bug for very small arrays. · f1cd14ae
      Andre Noll committed
      We currently oops with a divide error on starting a linear software
      raid array consisting of at least two very small (< 500K) devices.
      
      The bug is caused by the calculation of the hash table size which
      tries to compute sector_div(sz, base) with "base" being zero due to
      the small size of the component devices of the array.
      
      Fix this by requiring the hash spacing to be at least one which
      implies that also "base" is non-zero.
      
      This bug has existed since about 2.6.14.
      
      Cc: stable@kernel.org
      Signed-off-by: Andre Noll <maan@systemlinux.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix bug in raid10 recovery. · a53a6c85
      NeilBrown committed
      Adding a spare to a raid10 doesn't cause recovery to start.
      This is due to a silly typo in
        commit 6c2fce2e
      and so is a bug in 2.6.27 and .28-rc.
      
      Thanks to Thomas Backlund for bisecting to find this.
      
      Cc: Thomas Backlund <tmb@mandriva.org>
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: revert the recent addition of a call to the BLKRRPART ioctl. · cb3ac42b
      NeilBrown committed
      It turns out that it is only safe to call blkdev_ioctl when the device
      is actually open (as ->bd_disk is set to NULL on last close).  And it
      is quite possible for do_md_stop to be called when the device is not
      open.  So discard the call to blkdev_ioctl(BLKRRPART) which was
      added in
         commit 934d9c23
      
      It is just as easy to call this ioctl from userspace when needed (on
      mdadm -S), so leave it out of the kernel.
      Signed-off-by: NeilBrown <neilb@suse.de>
  8. 30 Oct, 2008 3 commits
    • dm snapshot: wait for chunks in destructor · 879129d2
      Mikulas Patocka committed
      If there are several snapshots sharing an origin and one is removed
      while the origin is being written to, the snapshot's mempool may get
      deleted while elements are still referenced.
      
      Prior to dm-snapshot-use-per-device-mempools.patch the pending
      exceptions may still have been referenced after the snapshot was
      destroyed, but this was not a problem because the shared mempool
      was still there.
      
      This patch fixes the problem by tracking the number of mempool elements
      in use.
      
      The scenario:
      - You have an origin and two snapshots 1 and 2.
      - Someone writes to the origin.
      - It creates two exceptions in the snapshots, snapshot 1 will be primary
      exception, snapshot 2's pending_exception->primary_pe will point to the
      exception in snapshot 1.
      - The exceptions are being relocated, relocation of exception 1 finishes
      (but its pending_exception is still allocated, because it is referenced
      by an exception from snapshot 2)
      - The user lvremoves snapshot 1 --- it calls just suspend (does nothing)
      and destructor. md->pending is zero (there is no I/O submitted to the
      snapshot by md layer), so it won't help us.
      - The destructor waits for kcopyd jobs to finish on snapshot 1 --- but
      there are none.
      - The destructor on snapshot 1 cleans up everything.
      - The relocation of exception on snapshot 2 finishes, it drops reference
      on primary_pe. This frees its primary_pe pointer. Primary_pe points to
      pending exception created for snapshot 1. So it frees memory into
      non-existing mempool.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: fix register_snapshot deadlock · 60c856c8
      Mikulas Patocka committed
      register_snapshot() performs a GFP_KERNEL allocation while holding
      _origins_lock for write, but that could write out dirty pages onto a
      device that attempts to acquire _origins_lock for read, resulting in
      deadlock.
      
      So move the allocation up before taking the lock.
      
      This path is not performance-critical, so it doesn't matter that we
      allocate memory and free it if we find that we won't need it.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm raid1: fix do_failures · b34578a4
      Ilpo Jarvinen committed
      Missing braces.  Commit 1f965b19 (dm raid1: separate region_hash interface
      part1) broke it.
      Signed-off-by: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Heinz Mauelshagen <hjm@redhat.com>
  9. 28 Oct, 2008 1 commit
    • md: destroy partitions and notify udev when md array is stopped. · 934d9c23
      NeilBrown committed
      md arrays are not currently destroyed when they are stopped - they
      remain in /sys/block.  Last time I tried this I tripped over locking
      too much.
      
      A consequence of this is that udev doesn't remove anything from /dev.
      This is rather ugly.
      
      As an interim measure until proper device removal can be achieved,
      make sure all partitions are removed using the BLKRRPART ioctl, and
      send a KOBJ_CHANGE when an md array is stopped.
      Signed-off-by: NeilBrown <neilb@suse.de>
  10. 23 Oct, 2008 1 commit
  11. 22 Oct, 2008 8 commits