1. 05 May, 2016 3 commits
    • md-cluster/bitmap: fix wrong calculation of offset · 7f86ffed
      Committed by Guoqing Jiang
      The offset is wrong in bitmap_storage_alloc; it should be
      set the same way as in bitmap_init_from_disk():
      
      node_offset = bitmap->cluster_slot * (DIV_ROUND_UP(store->bytes, PAGE_SIZE));
      
      'offset' is only assigned to 'page->index', which is
      usually overwritten by read_sb_page, so this does not
      cause a problem in general, but it still needs to be fixed.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
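The node_offset formula quoted in this commit can be tried out in user space. The following is a minimal sketch, not the kernel code: the SKETCH_ macros and the function name are hypothetical stand-ins, and a 4096-byte page is assumed.

```c
/* Assumed page size for illustration; the kernel value is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL
/* Same rounding-up division the commit message uses via DIV_ROUND_UP. */
#define SKETCH_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Offset, in pages, of the bitmap belonging to a given cluster slot.
 * Each slot occupies a whole number of pages, so the size of one
 * bitmap is rounded up to pages before multiplying by the slot.
 */
static unsigned long sketch_node_offset(unsigned long cluster_slot,
                                        unsigned long store_bytes)
{
    return cluster_slot * SKETCH_DIV_ROUND_UP(store_bytes, SKETCH_PAGE_SIZE);
}
```

With a 5000-byte bitmap, each slot takes two pages, so slot 2 starts four pages in.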
    • md-cluster: sync bitmap when node received RESYNCING msg · 18c9ff7f
      Committed by Guoqing Jiang
      If a node receives a RESYNCING message, another node is
      about to resync that area, so we don't want to resync it
      again on this node.
      
      Set RESYNC_MASK and clear NEEDED_MASK for the region from
      old-low to new-low, which has finished syncing, and for the
      region from old-hi to new-hi, which is about to be synced.
      bitmap_sync_with_cluster is introduced for this purpose.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
    • md-cluster: always setup in-memory bitmap · c9d65032
      Committed by Guoqing Jiang
      The in-memory bitmap for raid is allocated on demand, so
      in the cluster scenario a slave node that received a
      RESYNCING message may not have the in-memory bitmap while
      the master node is resyncing, and the bitmaps cannot be
      kept matched across nodes.
      
      So for the cluster scenario we always need to preserve the
      bitmap and ensure the page is never freed. A no_hijack
      flag is added to both bitmap_checkpage and
      bitmap_get_counter, which makes cluster raid return
      failure as soon as an allocation fails.
      
      The next patch relies on this change, since it keeps the
      bitmap in sync among all nodes during the resync stage.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
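The effect of the no_hijack flag described in this commit can be pictured with a small user-space model. This is only a sketch under assumptions: struct sketch_page, sketch_alloc and sketch_checkpage are hypothetical names, and the "hijacked" field merely stands in for whatever fallback the non-cluster path takes when allocation fails.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-page bookkeeping for the model. */
struct sketch_page {
    void *map;      /* allocated counter page, or NULL */
    int   hijacked; /* fallback mode used when allocation failed */
};

/* Set to non-zero to simulate memory pressure in the model. */
static int sketch_fail_alloc;

static void *sketch_alloc(size_t n)
{
    return sketch_fail_alloc ? NULL : calloc(1, n);
}

/*
 * Make sure the page has backing storage.  With no_hijack set (the
 * cluster case in this model) an allocation failure is reported to
 * the caller instead of silently switching to the fallback mode.
 */
static int sketch_checkpage(struct sketch_page *p, int no_hijack)
{
    if (p->map || p->hijacked)
        return 0;               /* already usable */
    p->map = sketch_alloc(4096);
    if (p->map)
        return 0;
    if (no_hijack)
        return -ENOMEM;         /* cluster raid: fail immediately */
    p->hijacked = 1;            /* non-cluster: degrade but continue */
    return 0;
}
```

Returning an error lets the caller notice that the bitmap page is missing, which is the property the commit message asks for in the cluster scenario.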
  2. 05 April, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to
      the PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation
      will also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
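To see why the rewrites listed in this commit preserve behaviour, it helps to write the old and new forms side by side. The sketch below is illustrative only: the SKETCH_ macros are hypothetical stand-ins for the kernel names, and a page shift of 12 is assumed.

```c
/* Assumed values for illustration; the real ones are per-architecture. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_PAGE_SIZE         (1UL << SKETCH_PAGE_SHIFT)
/* Stand-ins for the old names, equal to the plain page constants. */
#define SKETCH_PAGE_CACHE_SHIFT  SKETCH_PAGE_SHIFT
#define SKETCH_PAGE_CACHE_SIZE   SKETCH_PAGE_SIZE

/* Old style: shift by a difference that is always zero. */
static unsigned long sketch_index_old(unsigned long idx)
{
    return idx << (SKETCH_PAGE_CACHE_SHIFT - SKETCH_PAGE_SHIFT);
}

/* New style: the shift is dropped entirely. */
static unsigned long sketch_index_new(unsigned long idx)
{
    return idx;
}

/* Old style: byte count to pages using the page-cache constants. */
static unsigned long sketch_pages_old(unsigned long bytes)
{
    return (bytes + SKETCH_PAGE_CACHE_SIZE - 1) >> SKETCH_PAGE_CACHE_SHIFT;
}

/* New style: same computation with the plain page constants. */
static unsigned long sketch_pages_new(unsigned long bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

Because the two sets of constants are equal, each old/new pair returns the same value for every input, which is what makes a mechanical rewrite safe.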
  3. 02 April, 2016 1 commit
  4. 15 March, 2016 1 commit
  5. 08 March, 2016 1 commit
  6. 25 January, 2016 1 commit
  7. 12 October, 2015 2 commits
    • md-cluster: Use a small window for resync · c40f341f
      Committed by Goldwyn Rodrigues
      Suspending the entire device for resync could take too long. Resync
      in small chunks.
      
      cluster's resync window (32M) is maintained in r1conf as
      cluster_sync_low and cluster_sync_high and processed in
      raid1's sync_request(). If the current resync is outside the cluster
      resync window:
      
      1. Set the cluster_sync_low to curr_resync_completed.
      2. Check if the sync will fit in the new window, if not issue a
         wait_barrier() and set cluster_sync_low to sector_nr.
      3. Set cluster_sync_high to cluster_sync_low + resync_window.
      4. Send a message to all nodes so they may add it in their suspension
         list.
      
      bitmap_cond_end_sync is modified to allow forcing a sync, in order
      to bring curr_resync_completed up to date with the sector passed.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
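The four window-update steps in this commit can be sketched as a small user-space model. This is one reading of the commit message, not the raid1 code: the struct, the counters that stand in for wait_barrier() and for the cluster message, and the sector-based 32M window are all assumptions made for illustration.

```c
typedef unsigned long long sketch_sector_t;

/* 32M window expressed in 512-byte sectors (assumed unit). */
#define SKETCH_WINDOW_SECTORS ((32ULL << 20) / 512)

struct sketch_conf {
    sketch_sector_t cluster_sync_low;
    sketch_sector_t cluster_sync_high;
    int barrier_waits;   /* counts stand-in wait_barrier() calls */
    int messages_sent;   /* counts stand-in cluster notifications */
};

/* Move the window only when the current request falls outside it. */
static void sketch_update_window(struct sketch_conf *c,
                                 sketch_sector_t curr_resync_completed,
                                 sketch_sector_t sector_nr,
                                 sketch_sector_t nr_sectors)
{
    if (sector_nr >= c->cluster_sync_low &&
        sector_nr + nr_sectors <= c->cluster_sync_high)
        return;                                   /* still inside */

    c->cluster_sync_low = curr_resync_completed;  /* step 1 */
    if (sector_nr + nr_sectors >
        c->cluster_sync_low + SKETCH_WINDOW_SECTORS) {
        c->barrier_waits++;                       /* step 2 */
        c->cluster_sync_low = sector_nr;
    }
    c->cluster_sync_high =
        c->cluster_sync_low + SKETCH_WINDOW_SECTORS;  /* step 3 */
    c->messages_sent++;                           /* step 4 */
}
```

In this model other nodes are only notified when the window actually moves, so the cost of suspending I/O is paid per 32M window rather than for the whole device.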
    • md: Increment version for clustered bitmaps · 3c462c88
      Committed by Goldwyn Rodrigues
      Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels
      from assembling a clustered device.
      
      In order to maximize compatibility, the major version is set to
      BITMAP_MAJOR_CLUSTERED *only* if the bitmap is clustered.
      
      Added MD_FEATURE_CLUSTERED in order to return error for older
      kernels which would assemble MD even if the bitmap is corrupted.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
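The versioning idea in this commit can be sketched as follows. Only the value 5 for BITMAP_MAJOR_CLUSTERED comes from the commit message; the value 4 for the highest major understood by older kernels, the function names, and the omission of any lower bound check are assumptions for illustration.

```c
/* Assumed highest bitmap major version known to older kernels. */
#define SKETCH_BITMAP_MAJOR_HI        4
/* Value stated in the commit message. */
#define SKETCH_BITMAP_MAJOR_CLUSTERED 5

/* Use the clustered major only when the bitmap really is clustered. */
static int sketch_choose_major(int nodes)
{
    return nodes > 0 ? SKETCH_BITMAP_MAJOR_CLUSTERED
                     : SKETCH_BITMAP_MAJOR_HI;
}

/* An older kernel rejects any major above the highest one it knows. */
static int sketch_old_kernel_accepts(int major)
{
    return major <= SKETCH_BITMAP_MAJOR_HI;
}
```

Non-clustered arrays keep the old major and therefore still assemble on older kernels, while clustered ones are refused there.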
  8. 02 October, 2015 1 commit
  9. 24 July, 2015 2 commits
  10. 23 July, 2015 1 commit
    • md: Skip cluster setup for dm-raid · d3b178ad
      Committed by Goldwyn Rodrigues
      There is a bug that the bitmap superblock isn't initialised properly for
      dm-raid, so new fields can contain garbage.
      (dm-raid does initialisation in the kernel - md initialises the
       superblock in mdadm).
      
      This means that for dm-raid we cannot currently trust the new ->nodes
      field. So:
       - use __GFP_ZERO to initialise the superblock properly for all new
          arrays
       - initialise all fields in bitmap_info in bitmap_new_disk_sb
       - ignore ->nodes for dm arrays (yes, this is a hack)
      
      This bug exposes dm-raid to a bug in the (still experimental) md-cluster
      code, so it is suitable for -stable.  It does cause crashes.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=100491
      Cc: stable@vger.kernel.org (v4.1)
      Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
  11. 24 June, 2015 2 commits
  12. 21 May, 2015 1 commit
  13. 22 April, 2015 1 commit
  14. 25 March, 2015 1 commit
  15. 04 March, 2015 2 commits
  16. 23 February, 2015 5 commits
    • Copy set bits from another slot · 11dd35da
      Committed by Goldwyn Rodrigues
      bitmap_copy_from_slot reads the bitmap from the slot mentioned.
      It then copies the set bits to the node local bitmap.
      
      This is a helper function for the resync operation on node failure.
      
      bitmap_set_memory_bits() currently assumes it is only run at startup and that
      the bitmap is currently empty.  So if it finds that a region is already
      marked as dirty, it won't mark it dirty again. Change bitmap_set_memory_bits()
      to always set the NEEDED_MASK bit if 'needed' is set.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
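The behaviour change to bitmap_set_memory_bits() described in this commit can be modelled on a single counter. This is a sketch, not the kernel function: the 16-bit counter type, the top-bit position of the NEEDED flag and the initial count of 2 are assumptions, and the function names are hypothetical.

```c
#include <stdint.h>

typedef uint16_t sketch_counter_t;

/* Assumed flag position: top bit of the 16-bit counter. */
#define SKETCH_NEEDED_MASK ((sketch_counter_t)(1u << 15))

/* Old behaviour: only touches counters that are still empty. */
static void sketch_set_bits_old(sketch_counter_t *bmc, int needed)
{
    if (!*bmc)
        *bmc = (sketch_counter_t)(2 | (needed ? SKETCH_NEEDED_MASK : 0));
}

/* New behaviour: NEEDED is set even if the region is already dirty. */
static void sketch_set_bits_new(sketch_counter_t *bmc, int needed)
{
    if (!*bmc)
        *bmc = 2;
    if (needed)
        *bmc |= SKETCH_NEEDED_MASK;
}
```

The difference only shows up when the counter is already non-zero, which is the situation when bits are copied from another node's slot into a live bitmap.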
    • bitmap_create returns bitmap pointer · f9209a32
      Committed by Goldwyn Rodrigues
      This is done to have multiple bitmaps open at the same time.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    • Use separate bitmaps for each node in the cluster · b97e9257
      Committed by Goldwyn Rodrigues
      On-disk format:
      
      0                    4k                     8k                    12k
      -------------------------------------------------------------------
      | idle                | md super            | bm super [0] + bits |
      | bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
      | bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
      | bm bits [3, contd]  |                     |                     |
      
      Bitmap super has a field nodes, which defines the maximum number
      of nodes the device can use. While reading the bitmap super, if
      the cluster finds out that the number of nodes is > 0:
      1. Requests the md-cluster module.
      2. Calls md_cluster_ops->join(), which sets up clustering such as
         joining DLM lockspace.
      
      The first time through, the first bitmap is read. After the call
      to cluster_setup, the bitmap offset is adjusted and the
      superblock is re-read. This also ensures the bitmap is read
      with the bitmap lock held (once the bitmap lock is introduced in
      later patches).
      
      Questions:
      1. cluster name is repeated in all bitmap supers. Is that okay?
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    • Add node recovery callbacks · cf921cc1
      Committed by Goldwyn Rodrigues
      DLM offers callbacks when a node fails and the lock remastery
      is performed:
      
      1. recover_prep: called when DLM discovers a node is down
      2. recover_slot: called when DLM identifies the node and recovery
      		can start
      3. recover_done: called when all nodes have completed recover_slot
      
      recover_slot() and recover_done() are also called when the node joins
      initially in order to inform the node with its slot number. These slot
      numbers start from one, so we deduct one to make it start with zero
      which the cluster-md code uses.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
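The callback sequence and the slot renumbering described in this commit can be sketched in plain C. The signatures below are deliberately simplified and hypothetical; they are not the DLM interface, and struct sketch_cluster is an invented stand-in for the per-device cluster state.

```c
/* Hypothetical per-device cluster state for the model. */
struct sketch_cluster {
    int slot_number;      /* our own slot, zero-based */
    int recovering_slot;  /* slot being recovered, zero-based */
    int in_recovery;
};

/* Simplified callback table; the real DLM signatures differ. */
struct sketch_recover_ops {
    void (*recover_prep)(void *arg);
    void (*recover_slot)(void *arg, int dlm_slot);
    void (*recover_done)(void *arg, int our_dlm_slot);
};

/* Slot numbers handed to the callbacks start at one; deduct one. */
static int sketch_md_slot(int dlm_slot)
{
    return dlm_slot - 1;
}

static void sketch_prep(void *arg)
{
    ((struct sketch_cluster *)arg)->in_recovery = 1;
}

static void sketch_slot(void *arg, int dlm_slot)
{
    ((struct sketch_cluster *)arg)->recovering_slot = sketch_md_slot(dlm_slot);
}

static void sketch_done(void *arg, int our_dlm_slot)
{
    struct sketch_cluster *c = arg;

    c->slot_number = sketch_md_slot(our_dlm_slot);
    c->in_recovery = 0;
}

static const struct sketch_recover_ops sketch_ops = {
    .recover_prep = sketch_prep,
    .recover_slot = sketch_slot,
    .recover_done = sketch_done,
};
```

Doing the subtraction in one helper keeps the one-based numbering at the boundary, so the rest of the model only ever sees zero-based slots.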
    • Introduce md_cluster_info · c4ce867f
      Committed by Goldwyn Rodrigues
      md_cluster_info stores the cluster information in the MD device.
      
      The join() is called when mddev detects it is a clustered device.
      The main responsibilities are:
      	1. Setup a DLM lockspace
      	2. Setup all initial locks such as super block locks and bitmap lock (will come later)
      
      The leave() clears up the lockspace and all the locks held.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
  17. 06 February, 2015 2 commits
  18. 02 February, 2015 1 commit
  19. 09 October, 2014 1 commit
    • md/bitmap: always wait for writes on unplug. · 4b5060dd
      Committed by NeilBrown
      If two threads call bitmap_unplug at the same time, then
      one might schedule all the writes, and the other might
      decide that it doesn't need to wait.  But really it does.
      
      It rarely hurts to wait when it isn't absolutely necessary,
      and the current code doesn't really focus on 'absolutely necessary'
      anyway.  So just wait always.
      
      This can potentially lead to data corruption if a crash happens
      at an awkward time and data was written before the bitmap was
      updated.  It is very unlikely, but this should go to -stable
      just to be safe.  Appropriate for any -stable.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: stable@vger.kernel.org (please delay until 3.18 is released)
  20. 29 May, 2014 1 commit
  21. 09 April, 2014 1 commit
    • md/bitmap: don't abuse i_writecount for bitmap files. · 035328c2
      Committed by NeilBrown
      md bitmap code currently tries to use i_writecount to stop any other
      process from writing to our bitmap file.  But that is really an abuse
      and has bit-rotted so locking is all wrong.
      
      So discard that - root should be allowed to shoot self in foot.
      
      Still use it in a much less intrusive way to stop the same file being
      used as a bitmap on two different arrays, and apply other checks to
      ensure the file is at least vaguely usable for bitmap storage
      (is regular, is open for write.  Support for ->bmap is already checked
      elsewhere).
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NeilBrown <neilb@suse.de>
  22. 12 December, 2013 1 commit
    • kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly · 324a56e1
      Committed by Tejun Heo
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_elem_dir/kernfs_elem_dir/
      * s/sysfs_elem_symlink/kernfs_elem_symlink/
      * s/sysfs_elem_attr/kernfs_elem_file/
      * s/sysfs_dirent/kernfs_node/
      * s/sd/kn/ in kernfs proper
      * s/parent_sd/parent/
      * s/target_sd/target/
      * s/dir_sd/parent/
      * s/to_sysfs_dirent()/rb_to_kn()/
      * misc renames of local vars when they conflict with the above
      
      Because md, mic and gpio dig into sysfs details, this patch ends up
      modifying them.  All are sysfs_dirent renames and trivial.  While we
      can avoid these by introducing a dummy wrapping struct sysfs_dirent
      around kernfs_node, given the limited usage outside kernfs and sysfs
      proper, I don't think such workaround is called for.
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      
      - mic / gpio renames were missing.  Spotted by kbuild test robot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  23. 27 September, 2013 1 commit
    • sysfs: clean up sysfs_get_dirent() · 388975cc
      Committed by Tejun Heo
      The pre-existing sysfs interfaces which take explicit namespace
      argument are weird in that they place the optional @ns in front of
      @name which is contrary to the established convention.  For example,
      we end up forcing vast majority of sysfs_get_dirent() users to do
      sysfs_get_dirent(parent, NULL, name), which is silly and error-prone
      especially as @ns and @name may be interchanged without causing
      compilation warning.
      
      This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swaps the
      positions of @name and @ns, and sysfs_get_dirent() is now a wrapper
      around sysfs_get_dirent_ns().  This makes confusion a lot less
      likely.
      
      There are other interfaces which take @ns before @name.  They'll be
      updated by following patches.
      
      This patch doesn't introduce any functional changes.
      
      v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
          error on module builds.  Reported by build test robot.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
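The wrapper pattern this commit describes, with the optional namespace moved to the last argument, can be illustrated in user space. The sketch below does not reproduce the sysfs functions: struct sketch_node, the array-based "parent" and both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry: a name plus an optional namespace tag. */
struct sketch_node {
    const char *name;
    const void *ns;
};

/* Full lookup: required @name first, optional @ns last. */
static const struct sketch_node *
sketch_get_dirent_ns(const struct sketch_node *children, size_t n,
                     const char *name, const void *ns)
{
    for (size_t i = 0; i < n; i++)
        if (children[i].ns == ns && strcmp(children[i].name, name) == 0)
            return &children[i];
    return NULL;
}

/* Common case: no namespace, so callers never spell out NULL. */
static const struct sketch_node *
sketch_get_dirent(const struct sketch_node *children, size_t n,
                  const char *name)
{
    return sketch_get_dirent_ns(children, n, name, NULL);
}
```

Most callers then use the short form, and the optional argument no longer sits between two required ones where it could be swapped by mistake.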
  24. 14 June, 2013 1 commit
  25. 24 April, 2013 1 commit
  26. 23 February, 2013 1 commit
  27. 11 October, 2012 2 commits
  28. 02 August, 2012 1 commit
    • md/raid1: submit IO from originating thread instead of md thread. · f54a9d0e
      Committed by NeilBrown
      Queuing writes to the md thread means that all requests go through
      one processor, which may not be able to keep up with very high
      request rates.
      
      So use the plugging infrastructure to submit all requests on unplug.
      If a 'schedule' is needed, we fall back on the old approach of handing
      the requests to the thread for it to handle.
      Signed-off-by: NeilBrown <neilb@suse.de>