1. 12 Aug 2010: 23 commits
    • dm snapshot: implement merge · b1d55528
      Mikulas Patocka committed
      Implement a merge method for the snapshot origin to improve read
      performance.
      
      Without a merge method, dm asks the upper layers to submit the smallest
      possible bios: one page. Submitting such small bios hurts performance
      when reading from or writing to the origin device.
      
      Without this patch, CPU consumption when reading the origin on LVM on
      md-raid0 was 6 to 12%; with this patch, it drops to 1 to 4%.
      
      Note: in my testing, the patch actually degraded performance in some
      settings. I traced this to Maxtor disks having problems with requests
      larger than 512 sectors. Reducing /sys/block/sd*/queue/max_sectors_kb to
      256 fixed the read performance. I think we don't have to care about weird
      disks whose performance degrades when large requests are sent to them.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: do not initialise full request queue when bio based · 4a0b4ddf
      Mike Snitzer committed
      Change bio-based mapped devices so that they no longer have a fully
      initialized request_queue (request_fn, elevator, etc.).  This means
      bio-based DM devices no longer register elevator sysfs attributes (the
      'iosched/' tree, or a 'scheduler' other than "none").
      
      In contrast, a request-based DM device will continue to have a full
      request_queue and will register elevator sysfs attributes.  Therefore
      a user can determine a DM device's type by checking if elevator sysfs
      attributes exist.
      
      First, allocate a minimal request_queue structure for a DM device
      (needed for both bio-based and request-based DM).
      
      Initialization of a full request_queue is deferred until it is known
      that the DM device is request-based, at the end of the table load
      sequence.
      
      Factor DM device's request_queue initialization:
      - common to both request-based and bio-based into dm_init_md_queue().
      - specific to request-based into dm_init_request_based_queue().
      
      The md->type_lock mutex is used to protect md->queue, in addition to
      md->type, during table_load().
      
      A DM device's first table_load will establish the immutable md->type.
      But md->queue initialization, based on md->type, may fail at that time
      (if blk_init_allocated_queue cannot allocate memory).  Therefore
      any subsequent table_load must (re)try dm_setup_md_queue independently of
      establishing md->type.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: make bio or request based device type immutable · a5664dad
      Mike Snitzer committed
      Determine whether a mapped device is bio-based or request-based when
      loading its first (inactive) table and don't allow that to be changed
      later.
      
      This patch performs different device initialisation in each of the two
      cases.  (We don't think it's necessary to add code to support changing
      between the two types.)
      
      Allowed md->type transitions:
        DM_TYPE_NONE to DM_TYPE_BIO_BASED
        DM_TYPE_NONE to DM_TYPE_REQUEST_BASED
      
      We now prevent table_load from replacing the inactive table with a
      conflicting type of table even after an explicit table_clear.
      
      Introduce 'type_lock' into struct mapped_device to protect md->type
      and to prepare for the next patch, which will change the queue
      initialization and allocate memory while md->type_lock is held.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      
       drivers/md/dm-ioctl.c    |   15 +++++++++++++++
       drivers/md/dm.c          |   37 ++++++++++++++++++++++++++++++-------
       drivers/md/dm.h          |    5 +++++
       include/linux/dm-ioctl.h |    4 ++--
       4 files changed, 52 insertions(+), 9 deletions(-)
    • dm: skip second flush on bio unsupported error · 708e9295
      Mikulas Patocka committed
      When processing barriers, skip the second flush if processing the bio
      failed with -EOPNOTSUPP.  This can happen with discard+barrier requests.
      If the device doesn't support discard, there would be two useless
      SYNCHRONIZE CACHE commands.  The first dm_flush cannot be so easily
      optimized out, so we leave it there.
      
      Previously, -EOPNOTSUPP could be received in dec_pending only with empty
      barriers, and we ignored that error, assuming that a device that does not
      support cache flushes always has a consistent cache.  With the addition
      of discard barriers, this -EOPNOTSUPP can also be generated by discards,
      and we must record it in md->barrier_error for process_barrier.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: persistent use define for disk header chunk size · 87c961cb
      Tomohiro Kusumi committed
      This patch fixes a hard-coded value for the number of chunks holding the
      disk header of a persistent snapshot: the existing macro
      NUM_SNAPSHOT_HDR_CHUNKS is now used instead of the hard-coded value 1.
      Signed-off-by: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm crypt: use kstrdup · a9c88f2e
      Julia Lawall committed
      Use kstrdup when the goal of an allocation is to copy a string into the
      allocated region.
      
      The semantic patch that makes this change is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression from,to;
      expression flag,E1,E2;
      statement S;
      @@
      
      -  to = kmalloc(strlen(from) + 1,flag);
      +  to = kstrdup(from, flag);
         ... when != \(from = E1 \| to = E1 \)
         if (to==NULL || ...) S
         ... when != \(from = E2 \| to = E2 \)
      -  strcpy(to, from);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: use nonseekable_open · 402ab352
      Arnd Bergmann committed
      The dm control device does not implement read/write, so it has no use for
      seeking.  Using no_llseek prevents falling back to default_llseek, which
      requires the BKL.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: separate device deletion from dm_put · 3f77316d
      Kiyoshi Ueda committed
      This patch separates the device deletion code from dm_put()
      to make sure the deletion happens in the process context.
      
      With this patch, device deletion always occurs in an ioctl (process)
      context, and dm_put() can be called in interrupt context.
      As a result, the bad dm_put() usage in request-based dm, pointed out
      by Mikulas below, disappears.
          http://marc.info/?l=dm-devel&m=126699981019735&w=2
      
      Without this patch, I confirmed a case that crashes the system:
          dm_put() => dm_table_destroy() => vfree() => BUG_ON(in_interrupt())
      
      Some more background and details:
      In request-based dm, a device opener can remove a mapped_device
      while the last request is still completing, because bios in the last
      request complete first and then the device opener can close and remove
      the mapped_device before the last request completes:
        CPU0                                          CPU1
        =================================================================
        <<INTERRUPT>>
        blk_end_request_all(clone_rq)
          blk_update_request(clone_rq)
            bio_endio(clone_bio) == end_clone_bio
              blk_update_request(orig_rq)
                bio_endio(orig_bio)
                                                      <<I/O completed>>
                                                      dm_blk_close()
                                                      dev_remove()
                                                        dm_put(md)
                                                          <<Free md>>
         blk_finish_request(clone_rq)
           ....
           dm_end_request(clone_rq)
             free_rq_clone(clone_rq)
             blk_end_request_all(orig_rq)
             rq_completed(md)
      
      So request-based dm used dm_get()/dm_put() to hold md for each I/O
      until its request completion handling is fully done.
      However, the final dm_put() can call the device deletion code, which
      must not run in interrupt context and may cause a kernel panic.
      
      To solve the problem, this patch moves the device deletion code,
      dm_destroy(), to the predetermined places that actually delete
      the mapped_device in ioctl (process) context, and changes dm_put()
      to only decrement the reference count of the mapped_device.
      With this change, dm_put() can be used in any context and the
      following symmetric model is introduced:
          dm_create():  create a mapped_device
          dm_destroy(): destroy a mapped_device
          dm_get():     increment the reference count of a mapped_device
          dm_put():     decrement the reference count of a mapped_device
      
      dm_destroy() waits for all references of the mapped_device to disappear,
      then deletes the mapped_device.
      
      dm_destroy() uses active waiting with msleep(1), since deleting
      the mapped_device isn't a performance-critical task.
      And since at this point nobody has the mapped_device open and no new
      reference will be taken, the pending counts come only from racing
      completion activity and will eventually decrease to zero.
      
      For the unlikely case of a forced module unload, dm_destroy_immediate(),
      which doesn't wait and forcibly deletes the mapped_device, is also
      introduced and used in dm_hash_remove_all().  Otherwise, "rmmod -f"
      may get stuck and never return.
      Because the mapped_device is deleted immediately in that case, subsequent
      accesses to it may cause NULL pointer dereferences.
      
      Cc: stable@kernel.org
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: release _hash_lock between devices in remove_all · 98f33285
      Kiyoshi Ueda committed
      This patch changes dm_hash_remove_all() to release _hash_lock when
      removing a device.  After removing the device, dm_hash_remove_all()
      takes _hash_lock and searches the hash from scratch again.
      
      This patch is a preparation for the next patch, which changes the device
      deletion code to wait for the md reference count to reach 0.  Without
      this patch, the wait in the next patch may cause an AB-BA deadlock:
        CPU0                                CPU1
        -----------------------------------------------------------------------
        dm_hash_remove_all()
          down_write(_hash_lock)
                                            table_status()
                                              md = find_device()
                                                     dm_get(md)
                                                       <increment md->holders>
                                              dm_get_live_or_inactive_table()
                                                dm_get_inactive_table()
                                                  down_write(_hash_lock)
          <in the md deletion code>
            <wait for md->holders to be 0>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: prevent access to md being deleted · abdc568b
      Kiyoshi Ueda committed
      This patch prevents access to mapped_device which is being deleted.
      
      Currently, even after a mapped_device has been removed from the hash,
      it could be accessed through idr_find() using minor number.
      That could cause the race and NULL pointer dereference shown below:
        CPU0                          CPU1
        ------------------------------------------------------------------
        dev_remove(param)
          down_write(_hash_lock)
          dm_lock_for_deletion(md)
            spin_lock(_minor_lock)
            set_bit(DMF_DELETING)
            spin_unlock(_minor_lock)
          __hash_remove(hc)
          up_write(_hash_lock)
                                      dev_status(param)
                                        md = find_device(param)
                                               down_read(_hash_lock)
                                               __find_device_hash_cell(param)
                                                 dm_get_md(param->dev)
                                                   md = dm_find_md(dev)
                                                          spin_lock(_minor_lock)
                                                          md = idr_find(MINOR(dev))
                                                          spin_unlock(_minor_lock)
          dm_put(md)
            free_dev(md)
                                                   dm_get(md)
                                               up_read(_hash_lock)
                                        __dev_status(md, param)
                                        dm_put(md)
      
      This patch fixes such problems.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: return uevent flag after rename · 856a6f1d
      Peter Rajnoha committed
      All the dm ioctls that generate uevents set the DM_UEVENT_GENERATED flag so
      that userspace knows whether or not to wait for a uevent to be processed
      before continuing.
      
      The dm rename ioctl sets this flag but was not structured to return it
      to userspace.  This patch restructures the rename ioctl processing to
      behave like the other ioctls that return data and so fix this.
      Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: make __dev_status void · 094ea9a0
      Alasdair G Kergon committed
      __dev_status() cannot fail so make it void and simplify callers.
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: remove __dev_status from geometry and target message · 6be54494
      Peter Rajnoha committed
      Remove the useless __dev_status call when processing the ioctls that set
      up device geometry and send a target message.  The data is not returned
      to userspace, so there is no point collecting it; and in the case of
      target_message, it is collected before the message is processed, so even
      if it were returned it might be stale.
      Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: test chunk size against both origin and snapshot · c2411045
      Mikulas Patocka committed
      Validate chunk size against both origin and snapshot sector size.
      
      Don't allow a chunk size smaller than the logical sector size of either
      the origin or the snapshot. Reading or writing data not aligned to the
      sector size is not allowed and causes immediate errors.
      
      This requires us to open the origin before initialising the
      exception store and to export dm_snap_origin.
      
      Cc: stable@kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm snapshot: iterate origin and cow devices · 1e5554c8
      Mikulas Patocka committed
      Iterate both origin and snapshot devices.
      
      The iterate_devices method should call the callback for all the devices
      to which a bio may be remapped. Thus, snapshot_iterate_devices should call
      the callback for both the snapshot and the origin underlying devices,
      because it remaps some bios to the snapshot and some to the origin.
      
      snapshot_iterate_devices called the callback only for the origin device.
      This led to incorrectly calculated device limits if the snapshot and
      origin were placed on different types of disks.
      
      Cc: stable@kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm mpath: fix NULL pointer dereference when path parameters missing · 6bbf79a1
      Alasdair G Kergon committed
      multipath_ctr() forgets to return an error after detecting
      missing path parameters.  Fix this.
      Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 5af568cb
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
        isofs: Fix lseek() to position beyond 4 GB
        vfs: remove unused MNT_STRICTATIME
        vfs: show unreachable paths in getcwd and proc
        vfs: only add " (deleted)" where necessary
        vfs: add prepend_path() helper
        vfs: __d_path: dont prepend the name of the root dentry
        ia64: perfmon: add d_dname method
        vfs: add helpers to get root and pwd
        cachefiles: use path_get instead of lone dget
        fs/sysv/super.c: add support for non-PDP11 v7 filesystems
        V7: Adjust sanity checks for some volumes
        Add v7 alias
        v9fs: fixup for inode_setattr being removed
      
      Manual merge to take Al's version of the fs/sysv/super.c file: it merged
      cleanly, but Al had removed an unnecessary header include, so his side
      was better.
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus · 062e27ec
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
        Squashfs: fix checkpatch.pl warnings
        Squashfs: fix filename typo
        Squashfs: update Kconfig and documentation for LZO
        Squashfs: fix block size use in LZO decompressor
        Squashfs: Add LZO compression support
        squashfs: fix filename in header comment
        Squashfs: Make XATTR config name consistent with other file systems
        squashfs: fix compiler inline warning
    • Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd · bf25db36
      Linus Torvalds committed
      * 'for-linus' of git://git.open-osd.org/linux-open-osd:
        exofs: Fix groups code when num_devices is not divisible by group_width
        exofs: Remove useless optimization
        exofs: exofs_file_fsync and exofs_file_flush correctness
        exofs: Remove superfluous dependency on buffer_head and writeback
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 682c30ed
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (39 commits)
        ceph: generalize mon requests, add pool op support
        ceph: only queue async writeback on cap revocation if there is dirty data
        ceph: do not ignore osd_idle_ttl mount option
        ceph: constify dentry_operations
        ceph: whitespace cleanup
        ceph: add flock/fcntl lock support
        ceph: define on-wire types, constants for file locking support
        ceph: add CEPH_FEATURE_FLOCK to the supported feature bits
        ceph: support v2 reconnect encoding
        ceph: support v2 client_caps encoding
        ceph: move AES iv definition to shared header
        ceph: fix decoding of pool snap info
        ceph: make ->sync_fs not wait if wait==0
        ceph: warn on missing snap realm
        ceph: print useful error message when crush rule not found
        ceph: use %pU to print uuid (fsid)
        ceph: sync header defs with server code
        ceph: clean up header guards
        ceph: strip misleading/obsolete version, feature info
        ceph: specify supported features in super.h
        ...
    • Merge branch 'msm-video' of git://codeaurora.org/quic/kernel/dwalker/linux-msm · 84479f3c
      Linus Torvalds committed
      * 'msm-video' of git://codeaurora.org/quic/kernel/dwalker/linux-msm:
        video: msm: Fix section mismatch in mddi.c.
        drivers: video: msm: drop some unused variables
    • Merge branch 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6 · 946880fa
      Linus Torvalds committed
      * 'ixp4xx' of git://git.kernel.org/pub/scm/linux/kernel/git/chris/linux-2.6:
        IXP4xx: Fix LL debugging on little-endian CPU.
        IXP4xx: Fix sparse warnings in I/O primitives.
        IXP4xx: Make mdio_bus struct static in the Ethernet driver.
        IXP4xx: Fix ixp4xx_crypto little-endian operation.
        IXP4xx: Prevent HSS transmitter lockup by disabling FRaMe signals.
        ixp4xx/vulcan: add PCI support
        ixp4xx: base support for Arcom Vulcan
    • Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm · 636d1742
      Linus Torvalds committed
      * 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (226 commits)
        ARM: 6323/1: cam60: don't use __init for cam60_spi_{flash_platform_data,partitions}
        ARM: 6324/1: cam60: move cam60_spi_devices to .init.data
        ARM: 6322/1: imx/pca100: Fix name of spi platform data
        ARM: 6321/1: fix syntax error in main Kconfig file
        ARM: 6297/1: move U300 timer to dynamic clock lookup
        ARM: 6296/1: clock U300 intcon and timer properly
        ARM: 6295/1: fix U300 apb_pclk split
        ARM: 6306/1: fix inverted MMC card detect in U300
        ARM: 6299/1: errata: TLBIASIDIS and TLBIMVAIS operations can broadcast a faulty ASID
        ARM: 6294/1: etm: do a dummy read from OSSRR during initialization
        ARM: 6292/1: coresight: add ETM management registers
        ARM: 6288/1: ftrace: document mcount formats
        ARM: 6287/1: ftrace: clean up mcount assembly indentation
        ARM: 6286/1: fix Thumb-2 decompressor broken by "Auto calculate ZRELADDR"
        ARM: 6281/1: video/imxfb.c: allow usage without BACKLIGHT_CLASS_DEVICE
        ARM: 6280/1: imx: Fix build failure when including <mach/gpio.h> without <linux/spinlock.h>
        ARM: S5PV210: Fix on missing s3c-sdhci card detection method for hsmmc3
        ARM: S5P: Fix on missing S5P_DEV_FIMC in plat-s5p/Kconfig
        ARM: S5PV210: Override FIMC driver name on Aquila board
        ARM: S5PC100: enable FIMC on SMDKC100
        ...
      
      Fix up conflicts in arch/arm/mach-{s5pc100,s5pv210}/cpu.c due to
      different subsystem 'setname' calls, and trivial port types in
      include/linux/serial_core.h
  2. 11 Aug 2010: 17 commits