1. 08 Jun, 2016 2 commits
  2. 10 May, 2016 1 commit
    • md-cluster: gather resync infos and enable recv_thread after bitmap is ready · 51e453ae
      Committed by Guoqing Jiang
      The in-memory bitmap is not ready when a node joins the cluster,
      so it doesn't make sense to call gather_all_resync_info() that
      early; it needs to be called after the node's bitmap is set up.
      Also, recv_thread could be woken up after the node joins the
      cluster, which could cause a problem if the node receives a
      RESYNCING message without a personality, since
      mddev->pers->quiesce is called in process_suspend_info.
      
      This commit introduces a new cluster interface, load_bitmaps,
      to fix the above problems. load_bitmaps is called in bitmap_load,
      where the bitmap and personality are ready, and it does the
      following tasks:
      
      1. Call gather_all_resync_info to load all the nodes'
         bitmap info.
      2. Set the MD_CLUSTER_ALREADY_IN_CLUSTER bit so that recv_thread
         can be woken up, and wake up recv_thread if there is a
         pending recv event.
      
      ack_bast then only wakes up recv_thread once the IN_CLUSTER
      bit is set; otherwise MD_CLUSTER_PENDING_RESYNC_EVENT is
      set.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
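The handshake between ack_bast and load_bitmaps described in this commit can be sketched as a small user-space model. This is only an illustration under assumptions: the struct, the function names prefixed with `model_`, and the plain (non-atomic) flag handling are hypothetical, and step 1 (gather_all_resync_info) is left out.

```c
/* Hypothetical user-space model; flag names follow the commit text. */
enum {
    ALREADY_IN_CLUSTER   = 1u << 0,
    PENDING_RESYNC_EVENT = 1u << 1
};

struct cluster_model {
    unsigned flags;
    int recv_wakeups;   /* how many times recv_thread was woken */
};

/* Models ack_bast(): wake recv_thread only once the node is fully set up. */
static void model_ack_bast(struct cluster_model *c)
{
    if (c->flags & ALREADY_IN_CLUSTER)
        c->recv_wakeups++;
    else
        c->flags |= PENDING_RESYNC_EVENT;   /* remember the event for later */
}

/* Models load_bitmaps(): runs once bitmap and personality are ready. */
static void model_load_bitmaps(struct cluster_model *c)
{
    /* Step 1 (gather_all_resync_info) is omitted in this model. */
    c->flags |= ALREADY_IN_CLUSTER;
    if (c->flags & PENDING_RESYNC_EVENT) {
        c->flags &= ~(unsigned)PENDING_RESYNC_EVENT;
        c->recv_wakeups++;                  /* deliver the deferred wakeup */
    }
}
```

An event that arrives before load_bitmaps is only recorded; the wakeup happens once the node is ready. A real implementation would need atomic bit operations, which this sketch does not model.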
  3. 05 May, 2016 6 commits
  4. 05 Apr, 2016 1 commit
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely that it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE, and it is a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header files,
      so I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
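Why the shift rewrites in this commit are safe can be shown with a short user-space sketch. It assumes 4 KiB pages and defines its own stand-in macros; the helper names are hypothetical and nothing here is kernel code.

```c
/* Stand-in definitions; 4 KiB pages are an assumption of this sketch. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)
#define PAGE_CACHE_SHIFT  PAGE_SHIFT   /* the old alias: always equal */

/* Old style: convert a page index to a page-cache index. */
static unsigned long old_style_index(unsigned long idx)
{
    return idx << (PAGE_CACHE_SHIFT - PAGE_SHIFT);   /* shift by zero */
}

/* New style after the semantic patch: the shift is simply dropped. */
static unsigned long new_style_index(unsigned long idx)
{
    return idx;
}

/* Byte offset to page index, written with the surviving macro only. */
static unsigned long offset_to_index(unsigned long pos)
{
    return pos >> PAGE_SHIFT;
}
```

Because both shift constants are identical, the difference is zero and both forms return the same value for every input.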
  5. 02 Apr, 2016 1 commit
  6. 15 Mar, 2016 1 commit
  7. 08 Mar, 2016 1 commit
  8. 25 Jan, 2016 1 commit
  9. 12 Oct, 2015 2 commits
    • md-cluster: Use a small window for resync · c40f341f
      Committed by Goldwyn Rodrigues
      Suspending the entire device for resync could take too long. Resync
      in small chunks.
      
      The cluster's resync window (32M) is maintained in r1conf as
      cluster_sync_low and cluster_sync_high and processed in
      raid1's sync_request(). If the current resync is outside the cluster
      resync window:
      
      1. Set the cluster_sync_low to curr_resync_completed.
      2. Check if the sync will fit in the new window, if not issue a
         wait_barrier() and set cluster_sync_low to sector_nr.
      3. Set cluster_sync_high to cluster_sync_low + resync_window.
      4. Send a message to all nodes so they may add it in their suspension
         list.
      
      bitmap_cond_end_sync is modified to allow forcing a sync in order
      to bring curr_resync_completed up to date with the sector passed.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
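The four window-update steps listed in this commit can be sketched as a user-space model. The struct, the `advance_window` name, the counters standing in for wait_barrier() and the cluster message, and the expression of the 32M window as 65536 sectors of 512 bytes are all assumptions of this sketch, not the raid1 code.

```c
typedef unsigned long long sector_t;

/* 32 MiB in 512-byte sectors; the unit conversion is an assumption. */
#define WINDOW_SECTORS 65536ULL

struct resync_model {
    sector_t cluster_sync_low;
    sector_t cluster_sync_high;
    sector_t curr_resync_completed;
    int barriers_waited;   /* stands in for wait_barrier() calls */
    int messages_sent;     /* stands in for the message to other nodes */
};

/* Move the window if [sector_nr, sector_nr + nr) falls outside it. */
static void advance_window(struct resync_model *m, sector_t sector_nr,
                           sector_t nr)
{
    if (sector_nr >= m->cluster_sync_low &&
        sector_nr + nr <= m->cluster_sync_high)
        return;                                   /* still inside */

    m->cluster_sync_low = m->curr_resync_completed;           /* step 1 */
    if (m->cluster_sync_low + WINDOW_SECTORS < sector_nr + nr) {
        m->barriers_waited++;                                 /* step 2 */
        m->cluster_sync_low = sector_nr;
    }
    m->cluster_sync_high = m->cluster_sync_low + WINDOW_SECTORS; /* step 3 */
    m->messages_sent++;                                       /* step 4 */
}
```

Requests that stay inside the current window cost nothing extra; only a request crossing the window boundary triggers a new window and a cluster message.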
    • md: Increment version for clustered bitmaps · 3c462c88
      Committed by Goldwyn Rodrigues
      Add BITMAP_MAJOR_CLUSTERED as 5, in order to prevent older kernels
      from assembling a clustered device.
      
      In order to maximize compatibility, the major version is set to
      BITMAP_MAJOR_CLUSTERED *only* if the bitmap is clustered.
      
      Add MD_FEATURE_CLUSTERED in order to return an error on older
      kernels, which would otherwise assemble MD even if the bitmap is corrupted.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
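The version-gating idea in this commit can be illustrated with a minimal sketch. The value 5 for BITMAP_MAJOR_CLUSTERED comes from the commit message; the value 4 for the previous highest major version and both helper functions are assumptions made for illustration.

```c
#define BITMAP_MAJOR_HI         4   /* assumption: highest pre-cluster major */
#define BITMAP_MAJOR_CLUSTERED  5   /* value stated in the commit message */

/* Pick the major version to write: bump it only for clustered bitmaps. */
static int bitmap_major_for(int nodes)
{
    return nodes > 0 ? BITMAP_MAJOR_CLUSTERED : BITMAP_MAJOR_HI;
}

/* Hypothetical check: a kernel refuses versions newer than it knows. */
static int can_assemble(int kernel_max_major, int sb_major)
{
    return sb_major <= kernel_max_major;
}
```

Under these assumptions, a non-clustered array keeps the older version and stays usable on older kernels, while a clustered one is rejected there.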
  10. 02 Oct, 2015 1 commit
  11. 24 Jul, 2015 2 commits
  12. 23 Jul, 2015 1 commit
    • md: Skip cluster setup for dm-raid · d3b178ad
      Committed by Goldwyn Rodrigues
      There is a bug where the bitmap superblock isn't initialised properly for
      dm-raid, so new fields can contain garbage.
      (dm-raid does the initialisation in the kernel; md initialises the
       superblock in mdadm.)
      
      This means that for dm-raid we cannot currently trust the new ->nodes
      field. So:
       - use __GFP_ZERO to initialise the superblock properly for all new
          arrays
       - initialise all fields in bitmap_info in bitmap_new_disk_sb
       - ignore ->nodes for dm arrays (yes, this is a hack)
      
      This bug exposes dm-raid to a bug in the (still experimental) md-cluster
      code, so it is suitable for -stable.  It does cause crashes.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=100491
      Cc: stable@vger.kernel.org (v4.1)
      Signed-off-By: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
  13. 24 Jun, 2015 2 commits
  14. 21 May, 2015 1 commit
  15. 22 Apr, 2015 1 commit
  16. 25 Mar, 2015 1 commit
  17. 04 Mar, 2015 2 commits
  18. 23 Feb, 2015 5 commits
    • Copy set bits from another slot · 11dd35da
      Committed by Goldwyn Rodrigues
      bitmap_copy_from_slot reads the bitmap from the slot mentioned.
      It then copies the set bits to the node local bitmap.
      
      This is a helper function for the resync operation on node failure.
      
      bitmap_set_memory_bits() currently assumes it is only run at startup and that
      the bitmap is currently empty.  So if it finds that a region is already
      marked as dirty, it won't mark it dirty again. Change bitmap_set_memory_bits()
      to always set the NEEDED_MASK bit if 'needed' is set.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
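The behavioural change to bitmap_set_memory_bits() can be sketched with a simplified per-region counter. The counter type, the bit position chosen for NEEDED_MASK, the initial value 2 standing for "dirty", and both function names are assumptions of this model.

```c
#include <stdint.h>

typedef uint16_t counter_t;

/* Assumption for this model: the top bit marks "resync needed". */
#define NEEDED_MASK ((counter_t)(1u << 15))

/* Before: an already-dirty region is left untouched. */
static void set_memory_bit_old(counter_t *bmc, int needed)
{
    if (*bmc == 0)
        *bmc = (counter_t)(2 | (needed ? NEEDED_MASK : 0));
}

/* After: NEEDED_MASK is applied even if the region was already dirty. */
static void set_memory_bit_new(counter_t *bmc, int needed)
{
    if (*bmc == 0)
        *bmc = 2;
    if (needed)
        *bmc = (counter_t)(*bmc | NEEDED_MASK);
}
```

The difference only shows when the region is already dirty, which is the situation when bits are copied from a failed node's slot into a live bitmap.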
    • bitmap_create returns bitmap pointer · f9209a32
      Committed by Goldwyn Rodrigues
      This is done to have multiple bitmaps open at the same time.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
    • Use separate bitmaps for each node in the cluster · b97e9257
      Committed by Goldwyn Rodrigues
      On-disk format:
      
      0                    4k                     8k                    12k
      -------------------------------------------------------------------
      | idle                | md super            | bm super [0] + bits |
      | bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
      | bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
      | bm bits [3, contd]  |                     |                     |
      
      The bitmap super has a field, nodes, which defines the maximum number
      of nodes the device can use. While reading the bitmap super, if
      the number of nodes is found to be > 0, the code:
      1. Requests the md-cluster module.
      2. Calls md_cluster_ops->join(), which sets up clustering, such as
         joining the DLM lockspace.
      
      The first time, the first bitmap is read. After the call
      to the cluster setup, the bitmap offset is adjusted and the
      superblock is re-read. This also ensures the bitmap is read
      under the bitmap lock (once the bitmap lock is introduced in later patches).
      
      Questions:
      1. cluster name is repeated in all bitmap supers. Is that okay?
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
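The on-disk layout diagram in this commit implies a simple offset calculation for each node's bitmap superblock. The sketch below takes its constants directly from the diagram (4k blocks, two blocks before the first bitmap, two blocks per slot); the real per-slot size depends on the bitmap size, so these numbers and the helper name are illustrative assumptions.

```c
#define BLOCK_BYTES         4096UL
#define FIRST_BITMAP_BLOCK  2UL   /* diagram: idle, md super, then bm super[0] */
#define BLOCKS_PER_SLOT     2UL   /* diagram only; really depends on bitmap size */

/* Byte offset of the bitmap superblock for a zero-based slot. */
static unsigned long slot_super_offset(unsigned long slot)
{
    return (FIRST_BITMAP_BLOCK + slot * BLOCKS_PER_SLOT) * BLOCK_BYTES;
}
```

With these constants, slot 0 lands at 8k and slot 3 at 32k, matching the positions drawn in the diagram.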
    • Add node recovery callbacks · cf921cc1
      Committed by Goldwyn Rodrigues
      DLM offers callbacks when a node fails and the lock remastery
      is performed:
      
      1. recover_prep: called when DLM discovers a node is down
      2. recover_slot: called when DLM identifies the node and recovery
      		can start
      3. recover_done: called when all nodes have completed recover_slot
      
      recover_slot() and recover_done() are also called when the node joins
      initially, in order to inform the node of its slot number. These slot
      numbers start from one, so we subtract one to make them start from zero,
      which is what the cluster-md code uses.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
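The callback sequence and the one-based to zero-based slot conversion can be sketched as follows. This is a user-space model: the struct, the `model_` functions and their simplified signatures are hypothetical and do not reproduce the DLM callback prototypes.

```c
/* Hypothetical model of the three recovery callbacks named above. */
struct recovery_model {
    int in_recovery;       /* set by recover_prep, cleared by recover_done */
    int recovering_slot;   /* zero-based slot being recovered, -1 if none */
    int our_slot;          /* zero-based slot of this node, -1 until known */
};

/* DLM slot numbers start at 1; the md side counts from 0. */
static int dlm_to_md_slot(int dlm_slot)
{
    return dlm_slot - 1;
}

static void model_recover_prep(struct recovery_model *m)
{
    m->in_recovery = 1;
}

static void model_recover_slot(struct recovery_model *m, int dlm_slot)
{
    m->recovering_slot = dlm_to_md_slot(dlm_slot);
}

static void model_recover_done(struct recovery_model *m, int our_dlm_slot)
{
    m->our_slot = dlm_to_md_slot(our_dlm_slot);
    m->in_recovery = 0;
}
```

Keeping the conversion in one helper avoids off-by-one mistakes when a slot number is later used as an index into per-node bitmaps.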
    • Introduce md_cluster_info · c4ce867f
      Committed by Goldwyn Rodrigues
      md_cluster_info stores the cluster information in the MD device.
      
      join() is called when mddev detects that it is a clustered device.
      Its main responsibilities are:
      	1. Set up a DLM lockspace
      	2. Set up all initial locks, such as the superblock locks and the bitmap lock (will come later)
      
      leave() cleans up the lockspace and all the locks held.
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
  19. 06 Feb, 2015 2 commits
  20. 02 Feb, 2015 1 commit
  21. 09 Oct, 2014 1 commit
    • md/bitmap: always wait for writes on unplug. · 4b5060dd
      Committed by NeilBrown
      If two threads call bitmap_unplug at the same time, then
      one might schedule all the writes, and the other might
      decide that it doesn't need to wait.  But really it does.
      
      It rarely hurts to wait when it isn't absolutely necessary,
      and the current code doesn't really focus on 'absolutely necessary'
      anyway.  So just wait always.
      
      This can potentially lead to data corruption if a crash happens
      at an awkward time and data was written before the bitmap was
      updated.  It is very unlikely, but this should go to -stable
      just to be safe.  Appropriate for any -stable.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: stable@vger.kernel.org (please delay until 3.18 is released)
  22. 29 May, 2014 1 commit
  23. 09 Apr, 2014 1 commit
    • md/bitmap: don't abuse i_writecount for bitmap files. · 035328c2
      Committed by NeilBrown
      The md bitmap code currently tries to use i_writecount to stop any other
      process from writing to our bitmap file.  But that is really an abuse
      and has bit-rotted so locking is all wrong.
      
      So discard that - root should be allowed to shoot self in foot.
      
      Still use it in a much less intrusive way to stop the same file being
      used as a bitmap on two different arrays, and apply other checks to
      ensure the file is at least vaguely usable for bitmap storage
      (is regular, is open for write.  Support for ->bmap is already checked
      elsewhere).
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NeilBrown <neilb@suse.de>
  24. 12 Dec, 2013 1 commit
    • kernfs: s/sysfs_dirent/kernfs_node/ and rename its friends accordingly · 324a56e1
      Committed by Tejun Heo
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_elem_dir/kernfs_elem_dir/
      * s/sysfs_elem_symlink/kernfs_elem_symlink/
      * s/sysfs_elem_attr/kernfs_elem_file/
      * s/sysfs_dirent/kernfs_node/
      * s/sd/kn/ in kernfs proper
      * s/parent_sd/parent/
      * s/target_sd/target/
      * s/dir_sd/parent/
      * s/to_sysfs_dirent()/rb_to_kn()/
      * misc renames of local vars when they conflict with the above
      
      Because md, mic and gpio dig into sysfs details, this patch ends up
      modifying them.  All are sysfs_dirent renames and trivial.  While we
      can avoid these by introducing a dummy wrapping struct sysfs_dirent
      around kernfs_node, given the limited usage outside kernfs and sysfs
      proper, I don't think such workaround is called for.
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      
      - mic / gpio renames were missing.  Spotted by kbuild test robot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  25. 27 Sep, 2013 1 commit
    • sysfs: clean up sysfs_get_dirent() · 388975cc
      Committed by Tejun Heo
      The pre-existing sysfs interfaces which take an explicit namespace
      argument are weird in that they place the optional @ns in front of
      @name, which is contrary to the established convention.  For example,
      we end up forcing the vast majority of sysfs_get_dirent() users to do
      sysfs_get_dirent(parent, NULL, name), which is silly and error-prone,
      especially as @ns and @name may be interchanged without causing
      a compilation warning.
      
      This renames sysfs_get_dirent() to sysfs_get_dirent_ns() and swaps the
      positions of @name and @ns; sysfs_get_dirent() is now a wrapper
      around sysfs_get_dirent_ns().  This makes confusion a lot less
      likely.
      
      There are other interfaces which take @ns before @name.  They'll be
      updated by following patches.
      
      This patch doesn't introduce any functional changes.
      
      v2: EXPORT_SYMBOL_GPL() wasn't updated leading to undefined symbol
          error on module builds.  Reported by build test robot.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
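The wrapper pattern described in this commit, with the optional namespace moved to the last parameter, can be sketched in user-space C. The `dirent_model` struct, the flat lookup table and both function names are hypothetical stand-ins, not the sysfs API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry with an optional namespace. */
struct dirent_model {
    const char *name;
    const void *ns;    /* NULL means "no namespace" */
};

/* Full variant: the optional namespace tag comes last, after the name. */
static const struct dirent_model *
get_dirent_ns(const struct dirent_model *table, size_t n,
              const char *name, const void *ns)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (table[i].ns == ns && strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}

/* Common case: a thin wrapper so most callers never mention a namespace. */
static const struct dirent_model *
get_dirent(const struct dirent_model *table, size_t n, const char *name)
{
    return get_dirent_ns(table, n, name, NULL);
}
```

Most callers use the short form, so a swapped name and namespace can only happen in the few places that call the `_ns` variant explicitly.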