1. 23 Feb, 2015 3 commits
    • Use separate bitmaps for each nodes in the cluster · b97e9257
      Committed by Goldwyn Rodrigues
      On-disk format:
      
      0                    4k                     8k                    12k
      -------------------------------------------------------------------
      | idle                | md super            | bm super [0] + bits |
      | bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
      | bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
      | bm bits [3, contd]  |                     |                     |
      
      Bitmap super has a field nodes, which defines the maximum number
      of nodes the device can use. While reading the bitmap super, if
      the cluster finds out that the number of nodes is > 0:
      1. Requests the md-cluster module.
      2. Calls md_cluster_ops->join(), which sets up clustering such as
         joining DLM lockspace.
      
      The first time, the first bitmap is read. After the call
      to cluster_setup, the bitmap offset is adjusted and the
      superblock is re-read. This also ensures the bitmap is read
      under the bitmap lock (once the bitmap lock is introduced in later patches).
      
      Questions:
      1. cluster name is repeated in all bitmap supers. Is that okay?
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
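The layout in the diagram above can be sketched as a small offset calculation. This is only an illustration: the two constants are read off the diagram (bm super[0] at 8k, one 4k "super + bits" block plus one 4k continuation per node), and `node_bitmap_offset` is a hypothetical helper, not a function from the md code.

```c
#include <stdint.h>

/* Illustrative values taken from the diagram, not kernel constants. */
#define FIRST_BITMAP_OFFSET 8192u /* bm super[0] starts at 8k */
#define PER_NODE_BYTES      8192u /* 4k super+bits block plus 4k continuation */

/* Byte offset of the bitmap superblock for a zero-based node index. */
static uint64_t node_bitmap_offset(unsigned int node)
{
	return (uint64_t)FIRST_BITMAP_OFFSET + (uint64_t)node * PER_NODE_BYTES;
}
```

Under these assumptions node 3 lands at 32k, which matches the position of "bm super[3]" in the diagram.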
    • Add node recovery callbacks · cf921cc1
      Committed by Goldwyn Rodrigues
      DLM offers callbacks when a node fails and the lock remastery
      is performed:
      
      1. recover_prep: called when DLM discovers a node is down
      2. recover_slot: called when DLM identifies the node and recovery
      		can start
      3. recover_done: called when all nodes have completed recover_slot
      
      recover_slot() and recover_done() are also called when the node joins
      initially, in order to inform the node of its slot number. These slot
      numbers start from one, so we subtract one to make them start from zero,
      which is what the cluster-md code uses.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
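The slot adjustment described in this message amounts to a one-line conversion. A minimal sketch, where `dlm_slot_to_md_index` is a hypothetical name chosen for illustration:

```c
/* DLM slot numbers start at 1; cluster-md node indices start at 0. */
static int dlm_slot_to_md_index(int dlm_slot)
{
	return dlm_slot - 1;
}
```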
    • Introduce md_cluster_info · c4ce867f
      Committed by Goldwyn Rodrigues
      md_cluster_info stores the cluster information in the MD device.
      
      join() is called when mddev detects it is a clustered device.
      Its main responsibilities are:
      	1. Set up a DLM lockspace
      	2. Set up all initial locks such as super block locks and bitmap lock (will come later)
      
      leave() cleans up the lockspace and all the locks held.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
  2. 06 Feb, 2015 2 commits
  3. 02 Feb, 2015 1 commit
  4. 09 Oct, 2014 1 commit
    • md/bitmap: always wait for writes on unplug. · 4b5060dd
      Committed by NeilBrown
      If two threads call bitmap_unplug at the same time, then
      one might schedule all the writes, and the other might
      decide that it doesn't need to wait.  But really it does.
      
      It rarely hurts to wait when it isn't absolutely necessary,
      and the current code doesn't really focus on 'absolutely necessary'
      anyway.  So just wait always.
      
      This can potentially lead to data corruption if a crash happens
      at an awkward time and data was written before the bitmap was
      updated.  It is very unlikely, but this should go to -stable
      just to be safe.  Appropriate for any -stable.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: stable@vger.kernel.org (please delay until 3.18 is released)
  5. 29 May, 2014 1 commit
  6. 09 Apr, 2014 1 commit
    • md/bitmap: don't abuse i_writecount for bitmap files. · 035328c2
      Committed by NeilBrown
      md bitmap code currently tries to use i_writecount to stop any other
      process from writing to our bitmap file.  But that is really an abuse
      and has bit-rotted so locking is all wrong.
      
      So discard that - root should be allowed to shoot self in foot.
      
      Still use it in a much less intrusive way to stop the same file being
      used as bitmap on two different arrays, and apply other checks to
      ensure the file is at least vaguely usable for bitmap storage
      (is regular, is open for write.  Support for ->bmap is already checked
      elsewhere).
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NeilBrown <neilb@suse.de>
  7. 12 Dec, 2013 1 commit
    • kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly · 324a56e1
      Committed by Tejun Heo
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_elem_dir/kernfs_elem_dir/
      * s/sysfs_elem_symlink/kernfs_elem_symlink/
      * s/sysfs_elem_attr/kernfs_elem_file/
      * s/sysfs_dirent/kernfs_node/
      * s/sd/kn/ in kernfs proper
      * s/parent_sd/parent/
      * s/target_sd/target/
      * s/dir_sd/parent/
      * s/to_sysfs_dirent()/rb_to_kn()/
      * misc renames of local vars when they conflict with the above
      
      Because md, mic and gpio dig into sysfs details, this patch ends up
      modifying them.  All are sysfs_dirent renames and trivial.  While we
      can avoid these by introducing a dummy wrapping struct sysfs_dirent
      around kernfs_node, given the limited usage outside kernfs and sysfs
      proper, I don't think such workaround is called for.
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      
      - mic / gpio renames were missing.  Spotted by kbuild test robot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  8. 27 Sep, 2013 1 commit
    • sysfs: clean up sysfs_get_dirent() · 388975cc
      Committed by Tejun Heo
      The pre-existing sysfs interfaces which take explicit namespace
      argument are weird in that they place the optional @ns in front of
      @name which is contrary to the established convention.  For example,
      we end up forcing vast majority of sysfs_get_dirent() users to do
      sysfs_get_dirent(parent, NULL, name), which is silly and error-prone
      especially as @ns and @name may be interchanged without causing
      compilation warning.
      
      This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swaps the
      positions of @name and @ns, and sysfs_get_dirent() is now a wrapper
      around sysfs_get_dirent_ns().  This makes confusion a lot less
      likely.
      
      There are other interfaces which take @ns before @name.  They'll be
      updated by following patches.
      
      This patch doesn't introduce any functional changes.
      
      v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
          error on module builds.  Reported by build test robot.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
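The wrapper pattern this message describes (required @name first, optional @ns last, plus a thin wrapper for the common no-namespace case) can be sketched in user-space C. Everything here is a stand-in: `struct dirent_stub`, `get_dirent_ns_stub` and `get_dirent_stub` are hypothetical names, and the linear search is an assumption made only to keep the example runnable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry; not the kernel's struct. */
struct dirent_stub {
	const char *name;
	const void *ns;
};

/* Namespace-aware lookup: @name first, optional @ns last. */
static const struct dirent_stub *
get_dirent_ns_stub(const struct dirent_stub *entries, size_t count,
		   const char *name, const void *ns)
{
	for (size_t i = 0; i < count; i++) {
		if (entries[i].ns == ns && strcmp(entries[i].name, name) == 0)
			return &entries[i];
	}
	return NULL;
}

/* Common case: no namespace, so callers never spell out NULL. */
static const struct dirent_stub *
get_dirent_stub(const struct dirent_stub *entries, size_t count,
		const char *name)
{
	return get_dirent_ns_stub(entries, count, name, NULL);
}
```

With the optional argument last and hidden behind the wrapper, most callers pass only a name, so there is no NULL to misplace.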
  9. 14 Jun, 2013 1 commit
  10. 24 Apr, 2013 1 commit
  11. 23 Feb, 2013 1 commit
  12. 11 Oct, 2012 2 commits
  13. 02 Aug, 2012 1 commit
    • md/raid1: submit IO from originating thread instead of md thread. · f54a9d0e
      Committed by NeilBrown
      queuing writes to the md thread means that all requests go through the
      one processor which may not be able to keep up with very high request
      rates.
      
      So use the plugging infrastructure to submit all requests on unplug.
      If a 'schedule' is needed, we fall back on the old approach of handing
      the requests to the thread for it to handle.
      Signed-off-by: NeilBrown <neilb@suse.de>
  14. 22 May, 2012 22 commits
    • md/bitmap: record the space available for the bitmap in the superblock. · 1dff2b87
      Committed by NeilBrown
      Now that bitmaps can grow and shrink it is best if we record
      how much space is available.  This means that when
      we reduce the size of the bitmap we won't "lose" the space
      for later when we might want to increase the size of the bitmap
      again.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: make sure reshape request are reflected in superblock. · b81a0404
      Committed by NeilBrown
      As a reshape may change the sync_size and/or chunk_size, we need
      to update these whenever we write out the bitmap superblock.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: add bitmap_resize function to allow bitmap resizing. · d60b479d
      Committed by NeilBrown
      This function will allocate the new data structures and copy
      bits across from old to new, allowing for the possibility that the
      chunksize has changed.
      
      Use the same function for performing the initial allocation
      of the structures.  This improves test coverage.
      
      When bitmap_resize is used to resize an existing bitmap, it
      only copies '1' bits in, not '0' bits.
      So when allocating the bitmap, ensure everything is initialised
      to ZERO.
      Signed-off-by: NeilBrown <neilb@suse.de>
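The "copy only '1' bits into a zeroed destination" idea can be sketched as follows. This is not the kernel's bitmap_resize; `copy_set_bits` and its helpers are hypothetical, and the sketch assumes one bit per chunk with chunk sizes given in sectors.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int test_bit_u8(const uint8_t *map, size_t bit)
{
	return (map[bit / 8] >> (bit % 8)) & 1;
}

static void set_bit_u8(uint8_t *map, size_t bit)
{
	map[bit / 8] |= (uint8_t)(1u << (bit % 8));
}

/*
 * Copy set bits from an old map (old_chunk sectors per bit) into a new
 * map (new_chunk sectors per bit). The new map is zeroed first, so only
 * '1' bits ever need to be transferred.
 */
static void copy_set_bits(const uint8_t *old_map, size_t old_bits,
			  uint64_t old_chunk, uint8_t *new_map,
			  size_t new_bits, uint64_t new_chunk)
{
	memset(new_map, 0, (new_bits + 7) / 8);
	for (size_t i = 0; i < old_bits; i++) {
		if (!test_bit_u8(old_map, i))
			continue;
		uint64_t start = (uint64_t)i * old_chunk;
		uint64_t end = start + old_chunk; /* exclusive */
		/* Mark every new chunk that overlaps [start, end). */
		for (uint64_t b = start / new_chunk;
		     b < new_bits && b * new_chunk < end; b++)
			set_bit_u8(new_map, (size_t)b);
	}
}
```

Because the destination starts out all zero, a finer or coarser chunk size only changes which destination bits get set; there is never a need to clear anything.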
    • md/bitmap: use DIV_ROUND_UP instead of open-code · 15702d7f
      Committed by NeilBrown
      Also take the opportunity to simplify CHUNK_BLOCK_RATIO.
      Signed-off-by: NeilBrown <neilb@suse.de>
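For reference, a minimal sketch of what DIV_ROUND_UP does, using the usual definition of the macro. `chunks_for` is a hypothetical helper added only to show a typical use (counting the chunks needed to cover a device).

```c
/* Integer division that rounds up instead of truncating. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of chunks needed to cover `sectors` at `chunk_sectors` per chunk. */
static unsigned long chunks_for(unsigned long sectors,
				unsigned long chunk_sectors)
{
	return DIV_ROUND_UP(sectors, chunk_sectors);
}
```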
    • md/bitmap: create a 'struct bitmap_counts' substructure of 'struct bitmap' · 40cffcc0
      Committed by NeilBrown
      The new "struct bitmap_counts" contains all the fields that are
      related to counting the number of active writes in each bitmap chunk.
      
      Having this separate will make it easier to change the chunksize
      or overall size of a bitmap atomically.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: make bitmap bitops atomic. · 63c68268
      Committed by NeilBrown
      This allows us to remove spinlock protection which is
      more heavy-weight than simple atomics.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: make _page_attr bitops atomic. · bdfd1140
      Committed by NeilBrown
      Using e.g. set_bit instead of __set_bit and using test_and_clear_bit
      allows us to remove some locking and contract other locked ranges.
      
      It is rare that we set or clear a lot of these bits, so the gain should
      outweigh any cost.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: merge bitmap_file_unmap and bitmap_file_put. · fae7d326
      Committed by NeilBrown
      These functions really do one thing together: release the
      'bitmap_storage'.  So make them just one function.
      
      Since we removed the locking (previous patch), we don't need to zero
      any fields before freeing them, so it all becomes a bit simpler.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: remove async freeing of bitmap file. · 62f82faa
      Committed by NeilBrown
      There is no real value in freeing things the moment there is an error.
      It is just as good to free the bitmap file and pages when the bitmap
      is explicitly removed (and replaced?) or at shutdown.
      
      With this gone, the bitmap will only disappear when the array is
      quiescent, so we can remove some locking.
      
      As the 'filemap' doesn't disappear now, include extra checks before
      trying to write any of it out.
      Also remove the check for "has it disappeared" in
      bitmap_daemon_write().
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: convert some spin_lock_irqsave to spin_lock_irq · 74667123
      Committed by NeilBrown
      All of these sites can only be called from process context with
      irqs enabled, so using irqsave/irqrestore just adds noise.
      Remove it.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: use set_bit, test_bit, etc for operation on bitmap->flags. · b405fe91
      Committed by NeilBrown
      We currently use '&' and '|' which isn't the norm in the kernel
      and doesn't allow easy atomicity.
      So change to bit numbers and {set,clear,test}_bit.
      This allows us to remove a spinlock/unlock (which was dubious anyway)
      and some other simplifications.
      Signed-off-by: NeilBrown <neilb@suse.de>
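The move from mask constants to bit numbers with atomic helpers can be illustrated in user-space C. This sketch uses C11 `<stdatomic.h>` as a stand-in for the kernel's set_bit/clear_bit/test_bit family; the `flag_*` helpers and the `EXAMPLE_*` bit numbers are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Flags are identified by bit number rather than by mask (example values). */
enum example_flag {
	EXAMPLE_STALE = 1,
	EXAMPLE_WRITE_ERROR = 2,
};

static void flag_set(atomic_ulong *flags, unsigned int nr)
{
	atomic_fetch_or(flags, 1UL << nr);
}

static void flag_clear(atomic_ulong *flags, unsigned int nr)
{
	atomic_fetch_and(flags, ~(1UL << nr));
}

static bool flag_test(atomic_ulong *flags, unsigned int nr)
{
	return (atomic_load(flags) >> nr) & 1UL;
}

/* Clear the bit and report whether it was set, in one atomic step. */
static bool flag_test_and_clear(atomic_ulong *flags, unsigned int nr)
{
	unsigned long old = atomic_fetch_and(flags, ~(1UL << nr));

	return (old >> nr) & 1UL;
}
```

Each helper is a single atomic read-modify-write, which is why a lock around a plain `flags |= MASK` update is no longer needed.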
    • md/bitmap: remove single-bit manipulation on sb->state · 84e92345
      Committed by NeilBrown
      Just do single-bit manipulations on bitmap->flags and copy whole
      value between that and sb->state.
      
      This will allow next patch which changes how bit manipulations are
      performed on bitmap->flags.
      
      This does result in BITMAP_STALE not being set in sb by
      bitmap_read_sb, however as the setting is determined by other
      information in the 'sb' we do not lose information this way.
      Normally, bitmap_load will be called shortly which will clear
      BITMAP_STALE anyway.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: remove bitmap_mask_state · edbb79df
      Committed by NeilBrown
      This function isn't really needed.  It sets or clears a flag in both
      bitmap->flags and sb->state.
      However both times it is called, bitmap_update_sb is called soon
      afterwards which copies bitmap->flags to sb->state.
      So just make changes to bitmap->flags, and open-code those rather than
      hiding in a function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: move storage allocation from bitmap_load to bitmap_create. · bc9891a8
      Committed by NeilBrown
      We should allocate memory for the storage-bitmap at create-time, not
      load time.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: separate bitmap file allocation to its own function. · d1244cb0
      Committed by NeilBrown
      This will allow allocation before swapping in a new bitmap.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: store bytes in file rather than just in last page. · 9b1215c1
      Committed by NeilBrown
      This number is more generally useful, and bytes-in-last-page is
      easily extracted from it.
      Signed-off-by: NeilBrown <neilb@suse.de>
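How bytes-in-last-page falls out of the total byte count can be sketched as below. This is an illustration under assumptions, not the kernel code: `EXAMPLE_PAGE_SIZE` assumes 4 KiB pages and `last_page_bytes` is a hypothetical helper.

```c
#define EXAMPLE_PAGE_SIZE 4096UL /* assumed page size for the example */

/* Bytes used in the final page of a file holding total_bytes bytes. */
static unsigned long last_page_bytes(unsigned long total_bytes)
{
	unsigned long rem = total_bytes % EXAMPLE_PAGE_SIZE;

	if (total_bytes == 0)
		return 0;
	/* An exact multiple means the last page is completely full. */
	return rem ? rem : EXAMPLE_PAGE_SIZE;
}
```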
    • md/bitmap: move some fields of 'struct bitmap' into a 'storage' substruct. · 1ec885cd
      Committed by NeilBrown
      This new 'struct bitmap_storage' reflects the external storage of the
      bitmap.
      Having this clearly defined will make it easier to change the storage
      used while the array is active.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: change *_page_attr() to take a page number, not a page. · d189122d
      Committed by NeilBrown
      Most often we have the page number, not the page.  And that is what
      the *_page_attr() functions really want.  So change the arguments to
      take that number.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: centralise allocation of bitmap file pages. · 27581e5a
      Committed by NeilBrown
      Instead of allocating pages in read_sb_page, read_page and
      bitmap_read_sb, allocate them all in bitmap_init_from_disk.
      
      Also replace the hack of calling "attach_page_buffers(page, NULL)" to
      ensure that free_buffer() won't complain, by putting a test for
      PagePrivate in free_buffer().
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: allow a bitmap with no backing storage. · ef99bf48
      Committed by NeilBrown
      An md bitmap comprises two parts
       - internal counting of active writes per 'chunk'.
       - external storage of whether there are any active writes on
         each chunk
      
      The second requires the first, but the first doesn't require the
      second.
      
      Not having backing storage means that the bitmap cannot expedite
      resync after a crash, but it still allows us to expedite the recovery
      of a recently-removed device.
      
      So: allow a bitmap to exist even if there is no backing device.
      In that case we default to 128M chunks.
      
      A particular value of this is that we can remove and re-add a bitmap
      (possibly of a different granularity) on a degraded array, and not
      lose the information needed to fast-recover the missing device.
      
      We don't actually activate these bitmaps yet - that will come
      in a later patch.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: add new 'space' attribute for bitmaps. · 6409bb05
      Committed by NeilBrown
      If we are to allow bitmaps to be resized when the array is resized,
      we need to know how much space there is.
      
      So create an attribute to store this information and set appropriate
      defaults.
      
      It can be set more precisely via sysfs, or future metadata extensions
      may allow it to be recorded.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: disentangle two different 'pending' flags. · bf07bb7d
      Committed by NeilBrown
      There are two different 'pending' concepts in the handling of the
      write intent bitmap.
      
      Firstly, a 'page' from the bitmap (which contains PAGE_SIZE*8 bits)
      may have changes (bits cleared) that should be written in due course.
      There is no hurry for these and the page will transition from
      PENDING to NEEDWRITE and will then be written, though if it ever
      becomes DIRTY it will be written much sooner and PENDING will be
      cleared.
      
      Secondly, a page of counters - which contains PAGE_SIZE/2 counters, one
      for each bit, can usefully have a 'pending' flag which indicates if
      any of the counters are low (2 or 1) and ready to be processed by
      bitmap_daemon_work().  If this flag is clear we can skip the whole
      page.
      
      These two concepts are currently combined in the bitmap-file flag.
      This causes a tighter connection between the counters and the bitmap
      file than I would like - as I want to add some flexibility to the
      bitmap file.
      
      So introduce a new flag with the page-of-counters, and rewrite
      bitmap_daemon_work() so that it handles the two different 'pending'
      concepts separately.
      
      This also allows us to clear BITMAP_PAGE_PENDING when we write out
      a dirty page, which may occasionally reduce the number of times we
      write a page.
      Signed-off-by: NeilBrown <neilb@suse.de>
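The two page granularities named in this message (PAGE_SIZE*8 bits per bitmap page, PAGE_SIZE/2 counters per counter page) mean one chunk maps to two different page indices. A rough sketch, assuming 4 KiB pages and ignoring any superblock that shares the first bitmap page; the macro and function names are hypothetical:

```c
#define EXAMPLE_PAGE_SIZE 4096UL /* assumed page size for the example */

/* One bit per chunk: PAGE_SIZE * 8 chunks per bitmap page. */
#define BITS_PER_BITMAP_PAGE      (EXAMPLE_PAGE_SIZE * 8)
/* One 16-bit counter per chunk: PAGE_SIZE / 2 chunks per counter page. */
#define COUNTERS_PER_COUNTER_PAGE (EXAMPLE_PAGE_SIZE / 2)

/* Index of the bitmap page holding this chunk's bit. */
static unsigned long bitmap_page_of(unsigned long chunk)
{
	return chunk / BITS_PER_BITMAP_PAGE;
}

/* Index of the counter page holding this chunk's counter. */
static unsigned long counter_page_of(unsigned long chunk)
{
	return chunk / COUNTERS_PER_COUNTER_PAGE;
}
```

With these numbers a single bitmap page spans 16 counter pages, which is one way to see why a 'pending' flag attached to the bitmap page is a coarse fit for tracking counters.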
  15. 04 May, 2012 1 commit