1. 18 12月, 2015 1 次提交
  2. 01 11月, 2015 2 次提交
  3. 24 10月, 2015 2 次提交
  4. 12 10月, 2015 1 次提交
    • G
      md-cluster: Improve md_reload_sb to be less error prone · 70bcecdb
      Goldwyn Rodrigues 提交于
      md_reload_sb is too simplistic and it explicitly needs to determine
      the changes made by the writing node. However, there are multiple areas
      where a simple reload could fail.
      
      Instead, read the superblock of one of the "good" rdevs and update
      the necessary information:
      
      - read the superblock into a newly allocated page, by temporarily
        swapping out rdev->sb_page and calling ->load_super.
      - if that fails return
      - if it succeeds, call check_sb_changes
        1. iterates over list of active devices and checks the matching
         dev_roles[] value.
         	If that is 'faulty', the device must be  marked as faulty
      	 - call md_error to mark the device as faulty. Make sure
      	   not to set CHANGE_DEVS and wakeup mddev->thread or else
      	   it would initiate a resync process, which is the responsibility
      	   of the "primary" node.
      	 - clear the Blocked bit
      	 - Call remove_and_add_spares() to hot remove the device.
      	If the device is 'spare':
      	 - call remove_and_add_spares() to get the number of spares
      	   added in this operation.
      	 - Reduce mddev->degraded to mark the array as not degraded.
        2. reset recovery_cp
      - read the rest of the rdevs to update recovery_offset. If recovery_offset
        is equal to MaxSector, call spare_active() to set it In_sync
      
      This required that recovery_offset be initialized to MaxSector, as
      opposed to zero so as to communicate the end of sync for a rdev.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      70bcecdb
  5. 14 8月, 2015 1 次提交
    • K
      block: kill merge_bvec_fn() completely · 8ae12666
      Kent Overstreet 提交于
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: NDongsu Park <dpark@posteo.net>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8ae12666
  6. 02 6月, 2015 1 次提交
    • T
      writeback: separate out include/linux/backing-dev-defs.h · 66114cad
      Tejun Heo 提交于
      With the planned cgroup writeback support, backing-dev related
      declarations will be more widely used across block and cgroup;
      unfortunately, including backing-dev.h from include/linux/blkdev.h
      makes cyclic include dependency quite likely.
      
      This patch separates out backing-dev-defs.h which only has the
      essential definitions and updates blkdev.h to include it.  c files
      which need access to more backing-dev details now include
      backing-dev.h directly.  This takes backing-dev.h off the common
      include dependency chain making it a lot easier to use it across block
      and cgroup.
      
      v2: fs/fat build failure fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      66114cad
  7. 22 4月, 2015 3 次提交
  8. 23 2月, 2015 5 次提交
    • G
      Add new disk to clustered array · 1aee41f6
      Goldwyn Rodrigues 提交于
      Algorithm:
      1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
         ioctl(ADD_NEW_DISC with disc.state set to MD_DISK_CLUSTER_ADD)
      2. Node 1 sends NEWDISK with uuid and slot number
      3. Other nodes issue kobject_uevent_env with uuid and slot number
      (Steps 4,5 could be a udev rule)
      4. In userspace, the node searches for the disk, perhaps
         using blkid -t SUB_UUID=""
      5. Other nodes issue either of the following depending on whether the disk
         was found:
         ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
      	 disc.number set to slot number)
         ioctl(CLUSTERED_DISK_NACK)
      6. Other nodes drop lock on no-new-devs (CR) if device is found
      7. Node 1 attempts EX lock on no-new-devs
      8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk
         as SpareLocal
      9. If not (get no-new-dev lock), it fails the operation and sends METADATA_UPDATED
      10. Other nodes understand if the device is added or not by reading the superblock again after receiving the METADATA_UPDATED message.
      Signed-off-by: NLidong Zhong <lzhong@suse.com>
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      1aee41f6
    • G
      Reload superblock if METADATA_UPDATED is received · 1d7e3e96
      Goldwyn Rodrigues 提交于
      Re-reads the devices by invalidating the cache.
      Since we don't write to faulty devices, this is detected using
      events recorded in the devices. If it is old as compared to the mddev
      mark it is faulty.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      1d7e3e96
    • G
      Add node recovery callbacks · cf921cc1
      Goldwyn Rodrigues 提交于
      DLM offers callbacks when a node fails and the lock remastery
      is performed:
      
      1. recover_prep: called when DLM discovers a node is down
      2. recover_slot: called when DLM identifies the node and recovery
      		can start
      3. recover_done: called when all nodes have completed recover_slot
      
      recover_slot() and recover_done() are also called when the node joins
      initially in order to inform the node with its slot number. These slot
      numbers start from one, so we deduct one to make it start with zero
      which the cluster-md code uses.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      cf921cc1
    • G
      Introduce md_cluster_info · c4ce867f
      Goldwyn Rodrigues 提交于
      md_cluster_info stores the cluster information in the MD device.
      
      The join() is called when mddev detects it is a clustered device.
      The main responsibilities are:
      	1. Setup a DLM lockspace
      	2. Setup all initial locks such as super block locks and bitmap lock (will come later)
      
      The leave() clears up the lockspace and all the locks held.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      c4ce867f
    • G
      Introduce md_cluster_operations to handle cluster functions · edb39c9d
      Goldwyn Rodrigues 提交于
      This allows dynamic registering of cluster hooks.
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      edb39c9d
  9. 06 2月, 2015 5 次提交
  10. 04 2月, 2015 5 次提交
    • N
      md: protect ->pers changes with mddev->lock · 36d091f4
      NeilBrown 提交于
      ->pers is already protected by ->reconfig_mutex, and
      cannot possibly change when there are threads running or
      outstanding IO.
      
      However there are some places where we access ->pers
      not in a thread or IO context, and where ->reconfig_mutex
      is unnecessarily heavy-weight:  level_show and md_seq_show().
      
      So protect all changes, and those accesses, with ->lock.
      This is a step toward taking those accesses out from under
      reconfig_mutex.
      
      [Fixed missing "mddev->pers" -> "pers" conversion, thanks to
       Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NNeilBrown <neilb@suse.de>
      36d091f4
    • N
      md: rename ->stop to ->free · afa0f557
      NeilBrown 提交于
      Now that the ->stop function only frees the private data,
      rename is accordingly.
      
      Also pass in the private pointer as an arg rather than using
      mddev->private.  This flexibility will be useful in level_store().
      
      Finally, don't clear ->private.  It doesn't make sense to clear
      it seeing that isn't what we free, and it is no longer necessary
      to clear ->private (it was some time ago before  ->to_remove was
      introduced).
      
      Setting ->to_remove in ->free() is a bit of a wart, but not a
      big problem at the moment.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      afa0f557
    • N
      md: make merge_bvec_fn more robust in face of personality changes. · 64590f45
      NeilBrown 提交于
      There is no locking around calls to merge_bvec_fn(), so
      it is possible that calls which coincide with a level (or personality)
      change could go wrong.
      
      So create a central dispatch point for these functions and use
      rcu_read_lock().
      If the array is suspended, reject any merge that can be rejected.
      If not, we know it is safe to call the function.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      64590f45
    • N
      md: make ->congested robust against personality changes. · 5c675f83
      NeilBrown 提交于
      There is currently no locking around calls to the 'congested'
      bdi function.  If called at an awkward time while an array is
      being converted from one level (or personality) to another, there
      is a tiny chance of running code in an unreferenced module etc.
      
      So add a 'congested' function to the md_personality operations
      structure, and call it with appropriate locking from a central
      'mddev_congested'.
      
      When the array personality is changing the array will be 'suspended'
      so no IO is processed.
      If mddev_congested detects this, it simply reports that the
      array is congested, which is a safe guess.
      As mddev_suspend calls synchronize_rcu(), mddev_congested can
      avoid races by included the whole call inside an rcu_read_lock()
      region.
      This require that the congested functions for all subordinate devices
      can be run under rcu_lock.  Fortunately this is the case.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      5c675f83
    • N
      md: rename mddev->write_lock to mddev->lock · 85572d7c
      NeilBrown 提交于
      This lock is used for (slightly) more than helping with writing
      superblocks, and it will soon be extended further.  So the
      name is inappropriate.
      
      Also, the _irq variant hasn't been needed since 2.6.37 as it is
      never taking from interrupt or bh context.
      
      So:
        -rename write_lock to lock
        -document what it protects
        -remove _irq ... except in md_flush_request() as there
           is no wait_event_lock() (with no _irq).  This can be
           cleaned up after appropriate changes to wait.h.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      85572d7c
  11. 14 10月, 2014 1 次提交
  12. 09 4月, 2014 1 次提交
    • N
      md/bitmap: don't abuse i_writecount for bitmap files. · 035328c2
      NeilBrown 提交于
      md bitmap code currently tries to use i_writecount to stop any other
      process from writing to out bitmap file.  But that is really an abuse
      and has bit-rotted so locking is all wrong.
      
      So discard that - root should be allowed to shoot self in foot.
      
      Still use it in a much less intrusive way to stop the same file being
      used as bitmap on two different array, and apply other checks to
      ensure the file is at least vaguely usable for bitmap storage
      (is regular, is open for write.  Support for ->bmap is already checked
      elsewhere).
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      035328c2
  13. 14 1月, 2014 1 次提交
    • N
      md: fix problem when adding device to read-only array with bitmap. · 8313b8e5
      NeilBrown 提交于
      If an array is started degraded, and then the missing device
      is found it can be re-added and a minimal bitmap-based recovery
      will bring it fully up-to-date.
      
      If the array is read-only a recovery would not be allowed.
      But also if the array is read-only and the missing device was
      present very recently, then there could be no need for any
      recovery at all, so we simply include the device in the read-only
      array without any recovery.
      
      However... if the missing device was removed a little longer ago
      it could be missing some updates, but if a bitmap is present it will
      be conditionally accepted pending a bitmap-based update.  We don't
      currently detect this case properly and will include that old
      device into the read-only array with no recovery even though it really
      needs a recovery.
      
      This patch keeps track of whether a bitmap-based-recovery is really
      needed or not in the new Bitmap_sync rdev flag.  If that is set,
      then the device will not be added to a read-only array.
      
      Cc: Andrei Warkentin <andreiw@vmware.com>
      Fixes: d70ed2e4
      Cc: stable@vger.kernel.org (3.2+)
      Signed-off-by: NNeilBrown <neilb@suse.de>
      8313b8e5
  14. 12 12月, 2013 1 次提交
    • T
      kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly · 324a56e1
      Tejun Heo 提交于
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_elem_dir/kernfs_elem_dir/
      * s/sysfs_elem_symlink/kernfs_elem_symlink/
      * s/sysfs_elem_attr/kernfs_elem_file/
      * s/sysfs_dirent/kernfs_node/
      * s/sd/kn/ in kernfs proper
      * s/parent_sd/parent/
      * s/target_sd/target/
      * s/dir_sd/parent/
      * s/to_sysfs_dirent()/rb_to_kn()/
      * misc renames of local vars when they conflict with the above
      
      Because md, mic and gpio dig into sysfs details, this patch ends up
      modifying them.  All are sysfs_dirent renames and trivial.  While we
      can avoid these by introducing a dummy wrapping struct sysfs_dirent
      around kernfs_node, given the limited usage outside kernfs and sysfs
      proper, I don't think such workaround is called for.
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      
      - mic / gpio renames were missing.  Spotted by kbuild test robot.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      324a56e1
  15. 09 11月, 2013 1 次提交
  16. 27 9月, 2013 1 次提交
    • T
      sysfs: clean up sysfs_get_dirent() · 388975cc
      Tejun Heo 提交于
      The pre-existing sysfs interfaces which take explicit namespace
      argument are weird in that they place the optional @ns in front of
      @name which is contrary to the established convention.  For example,
      we end up forcing vast majority of sysfs_get_dirent() users to do
      sysfs_get_dirent(parent, NULL, name), which is silly and error-prone
      especially as @ns and @name may be interchanged without causing
      compilation warning.
      
      This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swap the
      positions of @name and @ns, and sysfs_get_dirent() is now a wrapper
      around sysfs_get_dirent_ns().  This makes confusions a lot less
      likely.
      
      There are other interfaces which take @ns before @name.  They'll be
      updated by following patches.
      
      This patch doesn't introduce any functional changes.
      
      v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
          error on module builds.  Reported by build test robot.  Fixed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      388975cc
  17. 27 8月, 2013 2 次提交
    • N
      md: avoid deadlock when dirty buffers during md_stop. · 260fa034
      NeilBrown 提交于
      When the last process closes /dev/mdX sync_blockdev will be called so
      that all buffers get flushed.
      So if it is then opened for the STOP_ARRAY ioctl to be sent there will
      be nothing to flush.
      
      However if we open /dev/mdX in order to send the STOP_ARRAY ioctl just
      moments before some other process which was writing closes their file
      descriptor, then there won't be a 'last close' and the buffers might
      not get flushed.
      
      So do_md_stop() calls sync_blockdev().  However at this point it is
      holding ->reconfig_mutex.  So if the array is currently 'clean' then
      the writes from sync_blockdev() will not complete until the array
      can be marked dirty and that won't happen until some other thread
      can get ->reconfig_mutex.  So we deadlock.
      
      We need to move the sync_blockdev() call to before we take
      ->reconfig_mutex.
      However then some other thread could open /dev/mdX and write to it
      after we call sync_blockdev() and before we actually stop the array.
      This can leave dirty data in the page cache which is awkward.
      
      So introduce new flag MD_STILL_CLOSED.  Set it before calling
      sync_blockdev(), clear it if anyone does open the file, and abort the
      STOP_ARRAY attempt if it gets set before we lock against further
      opens.
      
      It is still possible to get problems if you open /dev/mdX, write to
      it, then issue the STOP_ARRAY ioctl.  Just don't do that.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      260fa034
    • N
      md: Don't test all of mddev->flags at once. · 7a0a5355
      NeilBrown 提交于
      mddev->flags is mostly used to record if an update of the
      metadata is needed.  Sometimes the whole field is tested
      instead of just the important bits.  This makes it difficult
      to introduce more state bits.
      
      So replace all bare tests of mddev->flags with tests for the bits
      that actually need testing.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      7a0a5355
  18. 26 6月, 2013 1 次提交
    • J
      MD: Remember the last sync operation that was performed · c4a39551
      Jonathan Brassow 提交于
      MD:  Remember the last sync operation that was performed
      
      This patch adds a field to the mddev structure to track the last
      sync operation that was performed.  This is especially useful when
      it comes to what is recorded in mismatch_cnt in sysfs.  If the
      last operation was "data-check", then it reports the number of
      descrepancies found by the user-initiated check.  If it was a
      "repair" operation, then it is reporting the number of
      descrepancies repaired.  etc.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      c4a39551
  19. 24 4月, 2013 1 次提交
  20. 20 3月, 2013 1 次提交
    • J
      MD: Prevent sysfs operations on uninitialized kobjects · 90584fc9
      Jonathan Brassow 提交于
      MD: Prevent sysfs operations on uninitialized kobjects
      
      Device-mapper does not use sysfs; but when device-mapper is leveraging
      MD's RAID personalities, MD sometimes attempts to update sysfs.  This
      patch adds checks for 'mddev-kobj.sd' in sysfs_[un]link_rdev to ensure
      it is about to operate on something valid.  This patch also checks for
      'mddev->kobj.sd' before calling 'sysfs_notify' in 'remove_and_add_spares'.
      Although 'sysfs_notify' already makes this check, doing so in
      'remove_and_add_spares' prevents an additional mutex operation.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      90584fc9
  21. 13 12月, 2012 1 次提交
  22. 30 11月, 2012 1 次提交
    • L
      wait: add wait_event_lock_irq() interface · eed8c02e
      Lukas Czerner 提交于
      New wait_event{_interruptible}_lock_irq{_cmd} macros added. This commit
      moves the private wait_event_lock_irq() macro from MD to regular wait
      includes, introduces new macro wait_event_lock_irq_cmd() instead of using
      the old method with omitting cmd parameter which is ugly and makes a use
      of new macros in the MD. It also introduces the _interruptible_ variant.
      
      The use of new interface is when one have a special lock to protect data
      structures used in the condition, or one also needs to invoke "cmd"
      before putting it to sleep.
      
      All new macros are expected to be called with the lock taken. The lock
      is released before sleep and is reacquired afterwards. We will leave the
      macro with the lock held.
      
      Note to DM: IMO this should also fix theoretical race on waitqueue while
      using simultaneously wait_event_lock_irq() and wait_event() because of
      lack of locking around current state setting and wait queue removal.
      Signed-off-by: NLukas Czerner <lczerner@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      eed8c02e
  23. 11 10月, 2012 1 次提交