1. 04 Jun 2018 (1 commit)
    • GFS2: gfs2_free_extlen can return an extent that is too long · dc8fbb03
      By Bob Peterson
      Function gfs2_free_extlen calculates the length of an extent of
      free blocks that may be reserved. The end pointer was calculated as
      end = start + bh->b_size but b_size is incorrect because the
      bitmap usually stops prior to the end of the buffer data on
      the last bitmap.
      
      What this means is that when you do a write, you can reserve a
      chunk of blocks that runs off the end of the last bitmap. For
      example, I've got a file system where there is only one bitmap
      for each rgrp, so ri_length==1. I saw cases in which iozone
      tried to do a big write, grabbed a large block reservation,
      chose rgrp 5464152, which has ri_data0 5464153 and ri_data 8188.
      So 5464153 + 8188 = 5472341 which is the end of the rgrp.
      
      When it grabbed a reservation it got back: 5470936, length 7229.
      But 5470936 + 7229 = 5478165. So the reservation starts inside
      the rgrp but runs 5824 blocks past the end of the bitmap.
      
      This patch fixes the calculation so it won't exceed the last
      bitmap. It also adds a BUG_ON to guard against overflows in the
      future.
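
      A minimal userspace sketch of the corrected idea follows; the type and
      field names here are illustrative stand-ins, not the actual fields in
      fs/gfs2/rgrp.c:

      /* Sketch: clamp the free-extent scan to the bitmap's own length rather
       * than the full buffer size, and guard against running past the rgrp. */
      #include <assert.h>
      #include <stdint.h>

      struct bitmap_slice {
          const uint8_t *data;    /* bitmap bytes inside the buffer */
          unsigned int bytes;     /* valid bitmap bytes; may be < buffer size */
          unsigned int buf_size;  /* full buffer size (the old, wrong bound) */
      };

      static unsigned int free_extent_len(const struct bitmap_slice *bi,
                                          unsigned int start_byte)
      {
          /* the old code effectively used: end = data + buf_size */
          const uint8_t *end = bi->data + bi->bytes;   /* stop at the bitmap end */
          const uint8_t *p = bi->data + start_byte;
          unsigned int len = 0;

          while (p < end && *p == 0x00) {   /* 0x00: all four 2-bit entries free */
              len += 4;                     /* four blocks per bitmap byte */
              p++;
          }
          assert((start_byte + len / 4) <= bi->bytes);  /* BUG_ON analogue */
          return len;
      }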
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  2. 31 Jan 2018 (1 commit)
  3. 23 Jan 2018 (2 commits)
  4. 17 Jan 2018 (1 commit)
  5. 13 Dec 2017 (3 commits)
    • gfs2: Add a crc field to resource group headers · 850d2d91
      By Andrew Price
      Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This
      allows us to check resource group headers' integrity and removes the
      requirement to check them against the rindex entries in fsck. If this
      field is found to be zero, it should be ignored (or updated with an
      accurate value).
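
      A sketch of the zero-then-checksum pattern that such a field implies
      (the toy checksum32() below stands in for the crc32 the kernel actually
      uses, and the struct layout is simplified):

      #include <stddef.h>
      #include <stdint.h>

      struct rgrp_hdr {
          uint32_t rg_flags;
          uint32_t rg_free;
          uint32_t rg_crc;    /* 0 means "not set": ignore or recompute */
          /* ... remaining on-disk fields ... */
      };

      static uint32_t checksum32(const void *buf, size_t len)  /* toy stand-in */
      {
          const uint8_t *p = buf;
          uint32_t sum = 0;

          while (len--)
              sum = sum * 31 + *p++;
          return sum;
      }

      static void rgrp_set_crc(struct rgrp_hdr *rg)
      {
          rg->rg_crc = 0;                           /* exclude the field itself */
          rg->rg_crc = checksum32(rg, sizeof(*rg));
      }

      static int rgrp_crc_ok(struct rgrp_hdr *rg)
      {
          uint32_t stored = rg->rg_crc;
          uint32_t calc;

          if (stored == 0)
              return 1;                             /* zero: ignore, per the patch */
          rg->rg_crc = 0;
          calc = checksum32(rg, sizeof(*rg));
          rg->rg_crc = stored;
          return calc == stored;
      }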
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Add rindex fields to rgrp headers · 166725d9
      By Andrew Price
      Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields
      are identical to their counterparts in struct gfs2_rindex and are
      intended to reduce the use of the rindex. For now the fields are only
      written back as the in-memory equivalents in struct gfs2_rgrpd are set
      using values from the rindex. However, they are needed at this point so
      that userspace can make use of them, allowing a migration away from the
      rindex over time.
      
      The new fields take up previously reserved space which was explicitly
      zeroed on write, so in clusters with mixed kernels these fields could
      get zeroed after being set; this should not be treated as an error.
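
      As a rough sketch, the duplicated fields mirror their rindex
      counterparts (types simplified here; gfs2_ondisk.h has the
      authoritative big-endian definitions):

      #include <stdint.h>

      /* Illustrative only: field meanings taken from the commit message. */
      struct rgrp_hdr_extra {
          uint64_t rg_data0;     /* like ri_data0: first data block in the rgrp */
          uint32_t rg_data;      /* like ri_data: number of data blocks */
          uint32_t rg_bitbytes;  /* like ri_bitbytes: size of the bitmap in bytes */
      };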
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Add a next-resource-group pointer to resource groups · 65adc273
      By Andrew Price
      Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The
      rg_skip field has the following meaning:
      
      - If rg_skip is zero, it is considered unset and not useful.
      - If rg_skip is non-zero, its value will be the number of blocks between
        this rgrp's address and the next rgrp's address. This can be used as a
        hint by fsck.gfs2 when rebuilding a bad rindex, for example.
      
      This will provide less dependency on the rindex in future, and allow
      tools such as fsck.gfs2 to iterate the resource groups without keeping
      the rindex around.
      
      The field is updated in gfs2_rgrp_out() so that existing file systems
      will have it set. This means that any resource groups that aren't ever
      written will not be updated. The final rgrp is a special case as there
      is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is
      extended).
      
      Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so
      the rg_skip field can get set back to 0 in cases where nodes with and
      without this patch are mixed in a cluster. In some cases, the field may
      bounce between being set by one node and then zeroed by another which
      may harm performance slightly, e.g. when two nodes create many small
      files. In testing this situation is rare but it becomes more likely as
      the filesystem fills up and there are fewer resource groups to choose
      from. The problem goes away when all nodes are running with this patch.
      Dipping into the space currently occupied by the rg_reserved field would
      have resulted in the same problem as it is also explicitly zeroed, so
      unfortunately there is no other way around it.
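
      A hypothetical sketch of how a repair tool could walk resource groups
      by following rg_skip instead of the rindex (read_rgrp_hdr() is an
      assumed helper, not a real gfs2 or fsck.gfs2 function):

      #include <stdint.h>

      struct rgrp_hdr { uint32_t rg_skip; /* blocks to the next rgrp, 0 = unset */ };

      int read_rgrp_hdr(uint64_t blkno, struct rgrp_hdr *out);  /* assumed helper */

      static void walk_rgrps(uint64_t first_rgrp_blk, uint64_t fs_last_blk)
      {
          uint64_t blk = first_rgrp_blk;
          struct rgrp_hdr rg;

          while (blk <= fs_last_blk && read_rgrp_hdr(blk, &rg) == 0) {
              /* ... rebuild the rindex entry for the rgrp at 'blk' ... */
              if (rg.rg_skip == 0)
                  break;              /* unset (old kernel) or the final rgrp */
              blk += rg.rg_skip;      /* jump straight to the next rgrp */
          }
      }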
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  6. 28 Nov 2017 (1 commit)
    • GFS2: Combine gfs2_free_di with gfs2_free_uninit_di · a18c78c5
      By Bob Peterson
      Before this patch, function gfs2_free_di was 4 lines of code, and
      one of those lines was to call gfs2_free_uninit_di. Although
      unlikely, if function gfs2_free_uninit_di encountered an error
      finding the block to be freed, the error was silently ignored by the
      caller, which went ahead and improperly did a quota-change operation
      and meta_wipe despite the error. This patch combines the two
      functions into one to make the code more readable and fixes the bug
      by returning from the combined function before it takes those next
      incorrect steps.
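
      A sketch of the combined function's shape as described above (the
      helpers are declared stand-ins for the real gfs2 routines):

      /* Sketch only: the point is that an error from the block lookup is now
       * returned before the quota change and meta_wipe steps run. */
      struct inode_stub { int dummy; };

      int find_and_free_block(struct inode_stub *ip);   /* stand-in for the old
                                                           gfs2_free_uninit_di body */
      void quota_change(struct inode_stub *ip);         /* stand-in helpers */
      void meta_wipe(struct inode_stub *ip);

      int free_di(struct inode_stub *ip)
      {
          int error = find_and_free_block(ip);

          if (error)
              return error;   /* previously this error was silently ignored */
          quota_change(ip);
          meta_wipe(ip);
          return 0;
      }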
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  7. 30 Aug 2017 (1 commit)
  8. 09 Aug 2017 (1 commit)
  9. 05 Jul 2017 (1 commit)
  10. 19 Apr 2017 (1 commit)
    • GFS2: Non-recursive delete · d552a2b9
      By Bob Peterson
      Implement truncate/delete as a non-recursive algorithm. The older
      algorithm was implemented with recursion to strip off one layer
      at a time (going by height, starting with the maximum height).
      This version tries to do the same thing but without recursion,
      and without needing to allocate new structures or lists in memory.
      
      For example, say you want to truncate a very large file to 1 byte,
      and its end-of-file metapath is: 0.505.463.428. The starting
      metapath would be 0.0.0.0. Since it's a truncate to non-zero, it
      needs to preserve that byte, and all metadata pointing to it.
      So it would start at 0.0.0.0, look up all its metadata buffers,
      then free all data blocks pointed to at the highest level.
      After that buffer is "swept", it moves on to 0.0.0.1, then
      0.0.0.2, etc., reading in buffers and sweeping them clean.
      When it gets to the end of the 0.0.0 metadata buffer (for 4K
      blocks the last valid one is 0.0.0.508), it backs up to the
      previous height and starts working on 0.0.1.0, then 0.0.1.1,
      and so forth. After it reaches the end and sweeps 0.0.1.508,
      it continues with 0.0.2.0, and so on. When that height is
      exhausted, and it reaches 0.0.508.508 it backs up another level,
      to 0.1.0.0, then 0.1.0.1, through 0.1.0.508. So it has to keep
      marching backwards and forwards through the metadata until it's
      all swept clean. Once it has all the data blocks freed, it
      lowers the strip height, and begins the process all over again,
      but with one less height. This time it sweeps 0.0.0 through
      0.505.463. When that's clean, it lowers the strip height again
      and works to free 0.505. Eventually it strips the lowest height, 0.
      For a delete or truncate to 0, all metadata for all heights of
      0.0.0.0 would be freed. For a truncate to 1 byte, 0.0.0.0 would
      be preserved.
      
      This isn't much different from normal integer incrementing,
      where an integer gets incremented from 0000 (0.0.0.0) to 3021
      (3.0.2.1). So 0000 gets incremented to 0001, 0002, up to 0009,
      then on to 0010, 0011 up to 0099, then 0100 and so forth. It's
      just that each "digit" goes from 0 to 508 (for a total of 509
      pointers) rather than from 0 to 9.
      
      Note that the dinode will only have 483 pointers due to the
      dinode structure itself.
      
      Also note: this is just an example. These numbers (509 and 483)
      are based on a standard 4K block size. Smaller block sizes will
      yield smaller numbers of indirect pointers accordingly.
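
      The marching behaviour can be modelled as a mixed-radix counter; a
      small userspace sketch (509 and 483 are the 4K-block-size figures from
      the example above):

      #include <stdbool.h>

      /* Userspace model: each "digit" of a metapath wraps at the number of
       * pointers its height can hold (fewer in the dinode at height 0). */
      static unsigned int hptrs_model(int height)
      {
          return height == 0 ? 483 : 509;
      }

      /* Advance mp[0..height-1] like an odometer; false when fully swept. */
      static bool metapath_next(unsigned int mp[], int height)
      {
          for (int h = height - 1; h >= 0; h--) {
              if (++mp[h] < hptrs_model(h))
                  return true;    /* no carry needed at this height */
              mp[h] = 0;          /* wrap and carry into the next height up */
          }
          return false;
      }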
      
      The truncation process is accomplished with the help of two
      major functions and a few helper functions.
      
      Functions do_strip and recursive_scan are obsolete, so removed.
      
      New function sweep_bh_for_rgrps cleans a buffer_head pointed to
      by the given metapath and height. By cleaning, I mean it frees
      all blocks starting at the offset passed in metapath. It starts
      at the first block in the buffer pointed to by the metapath and
      identifies its resource group (rgrp). From there it frees all
      subsequent block pointers that lie within that rgrp. If it's
      already inside a transaction, it stays within it as long as it
      can. In other words, it doesn't close a transaction until it knows
      it's freed what it can from the resource group. In this way,
      multiple buffers may be cleaned in a single transaction, as long
      as those blocks in the buffer all lie within the same rgrp.
      
      If it's not in a transaction, it starts one. If the buffer_head
      has references to blocks within multiple rgrps, it frees all the
      blocks inside the first rgrp it finds, then closes the
      transaction. Then it repeats the cycle: identifies the next
      unfreed block, uses it to find its rgrp, then starts a new
      transaction for that set. This process repeats until the
      buffer_head contains no more references to any blocks past the
      given metapath.
      
      Function trunc_dealloc has been reworked into a finite state
      automaton. It has basically 3 active states:
      DEALLOC_MP_FULL, DEALLOC_MP_LOWER, and DEALLOC_FILL_MP:
      
      The DEALLOC_MP_FULL state implies the metapath has a full set
      of buffers out to the "shrink height", and therefore, it can
      call function sweep_bh_for_rgrps to free the blocks within the
      highest height of the metapath. If it's just swept the lowest
      level (or an error has occurred) the state machine is ended.
      Otherwise it proceeds to the DEALLOC_MP_LOWER state.
      
      The DEALLOC_MP_LOWER state implies we are finished with a given
      buffer_head, which may now be released, and therefore we are
      then missing some buffer information from the metapath. So we
      need to find more buffers to read in. In most cases, this is
      just a matter of releasing the buffer_head and moving to the
      next pointer from the previous height, so it may be read in and
      swept as well. If it can't find another non-null pointer to
      process, it checks whether it's reached the end of a height
      and needs to lower the strip height, or whether it still needs to
      move forward through the previous height's metadata. In this
      state, all zero-pointers are skipped. From this state, it can
      only loop around (once more backing up another height) or,
      once a valid metapath is found (one that has non-zero
      pointers), proceed to state DEALLOC_FILL_MP.
      
      The DEALLOC_FILL_MP state implies that we have a metapath
      but not all its buffers are read in. So we must proceed to read
      in buffer_heads until the metapath has a valid buffer for every
      height. If the previous state backed us up 3 heights, we may
      need to read in a buffer, increment the height, then repeat the
      process until buffers have been read in for all required heights.
      If it's successful reading a buffer, and it's at the highest
      height we need, it proceeds back to the DEALLOC_MP_FULL state.
      If it's unable to fill in a buffer (it encounters a hole, etc.)
      it tries to find another non-zero block pointer. If they're all
      zero, it lowers the height and returns to the DEALLOC_MP_LOWER
      state. If it finds a good non-null pointer, it loops around and
      reads it in, while keeping the metapath in lock-step with the
      pointers it examines.
      
      The state machine runs until the truncation request is
      satisfied. Then any transactions are ended, the quota and
      statfs data are updated, and the function is complete.
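
      A bare skeleton of that loop, using the state names from the text (the
      DEALLOC_DONE terminal state and the pass counter are added only to make
      this model self-contained; the real work in each state is elided):

      enum dealloc_state {
          DEALLOC_MP_FULL,
          DEALLOC_MP_LOWER,
          DEALLOC_FILL_MP,
          DEALLOC_DONE        /* model-only terminal state */
      };

      static void trunc_dealloc_model(void)
      {
          enum dealloc_state state = DEALLOC_MP_FULL;
          int passes = 3;     /* stand-in for "strip heights left to sweep" */

          while (state != DEALLOC_DONE) {
              switch (state) {
              case DEALLOC_MP_FULL:
                  /* buffers filled out to the strip height: sweep the highest
                   * height, then either finish or go lower */
                  state = (--passes > 0) ? DEALLOC_MP_LOWER : DEALLOC_DONE;
                  break;
              case DEALLOC_MP_LOWER:
                  /* release the finished buffer, skip zero pointers, back up a
                   * height when this one is exhausted */
                  state = DEALLOC_FILL_MP;
                  break;
              case DEALLOC_FILL_MP:
                  /* read buffers until every height has one again (a hole
                   * sends the real code back to DEALLOC_MP_LOWER) */
                  state = DEALLOC_MP_FULL;
                  break;
              default:
                  state = DEALLOC_DONE;
                  break;
              }
          }
      }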
      
      Helper function metaptr1 was introduced to be an easy way to
      determine the start of a buffer_head's indirect pointers.
      
      Helper function lookup_mp_height was introduced to find a
      metapath index and read in the buffer that corresponds to it.
      In this way, function lookup_metapath becomes a simple loop to
      call it for every height.
      
      Helper function fillup_metapath is similar to lookup_metapath
      except it can do partial lookups. If the state machine
      backed up multiple levels (like 2999 wrapping to 3000) it
      needs to find out the next starting point and start issuing
      metadata reads at that point.
      
      Helper function hptrs is a shortcut to determine how many
      pointers should be expected in a buffer. Height 0 is the dinode
      which has fewer pointers than the others.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  11. 13 Jul 2016 (1 commit)
    • GFS2: Check rs_free with rd_rsspin protection · 44f52122
      By Bob Peterson
      For the last process to close a file opened for write, function
      gfs2_rsqa_delete was deleting the file's inode's block reservation
      out of the rgrp reservations tree. Then it was checking to make sure
      rs_free was 0, but it was performing the check outside the protection
      of rd_rsspin spin_lock. The rd_rsspin spin_lock protection is needed
      to prevent a race between the process freeing the reservation and
      another who is allocating a new set of blocks inside the same rgrp
      for the same inode, thus changing its value.
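
      A userspace model of the corrected ordering, with a pthread mutex
      standing in for the rd_rsspin spinlock and an assert() standing in for
      the kernel warning:

      #include <assert.h>
      #include <pthread.h>

      struct rgrp_model {
          pthread_mutex_t rd_rsspin;   /* stand-in for the rgrp spinlock */
      };

      struct rsv_model {
          unsigned int rs_free;        /* blocks still reserved */
      };

      static void delete_reservation(struct rgrp_model *rgd, struct rsv_model *rs)
      {
          pthread_mutex_lock(&rgd->rd_rsspin);
          /* ... remove rs from the rgrp's reservation tree ... */
          assert(rs->rs_free == 0);    /* checked while still under the lock */
          pthread_mutex_unlock(&rgd->rd_rsspin);
      }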
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  12. 27 Jun 2016 (1 commit)
    • gfs2: Lock holder cleanup · 6df9f9a2
      By Andreas Gruenbacher
      Make the code more readable by cleaning up the different ways of
      initializing lock holders and checking for initialized lock holders:
      mark lock holders as uninitialized by setting the holder's glock to NULL
      (gfs2_holder_mark_uninitialized) instead of zeroing out the entire
      object or using a separate flag.  Recognize initialized holders by their
      non-NULL glock (gfs2_holder_initialized).  Don't zero out holder objects
      which are immediately initialized via gfs2_holder_init or
      gfs2_glock_nq_init.
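
      The convention is small enough to sketch with stand-in types (the real
      helpers live in the gfs2 glock headers):

      #include <stdbool.h>
      #include <stddef.h>

      /* Stand-in types; the idea is just "NULL glock pointer == uninitialized". */
      struct glock_stub { int dummy; };
      struct holder_stub { struct glock_stub *gh_gl; /* plus state, flags, ... */ };

      static void holder_mark_uninitialized(struct holder_stub *gh)
      {
          gh->gh_gl = NULL;              /* no memset of the whole object needed */
      }

      static bool holder_initialized(const struct holder_stub *gh)
      {
          return gh->gh_gl != NULL;      /* non-NULL glock == holder in use */
      }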
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  13. 10 Jun 2016 (1 commit)
    • GFS2: don't set rgrp gl_object until it's inserted into rgrp tree · 36e4ad03
      By Bob Peterson
      Before this patch, function read_rindex_entry would set a rgrp
      glock's gl_object pointer to itself before inserting the rgrp into
      the rgrp rbtree. The problem is: if another process was also reading
      the rgrp in, and had already inserted its newly created rgrp, then
      the second call to read_rindex_entry would overwrite that value,
      then return a bad return code to the caller. Later, other functions
      would reference the now-freed rgrp memory by way of gl_object.
      In some cases, that could result in gfs2_rgrp_brelse being called
      twice for the same rgrp: once for the failed attempt and once for
      the "real" rgrp release. Eventually the kernel would panic.
      There are also a number of other things that could go wrong when
      a kernel module is accessing freed storage. For example, this could
      result in rgrp corruption because the fake rgrp would point to a
      fake bitmap in memory too, causing gfs2_inplace_reserve to search
      some random memory for free blocks, and find some, since we were
      never setting rgd->rd_bits to NULL before freeing it.
      
      This patch fixes the problem by not setting gl_object until we
      have successfully inserted the rgrp into the rbtree. Also, it sets
      rd_bits to NULL as it frees them, which will ensure any accidental
      access to the wrong rgrp will result in a kernel panic rather than
      file system corruption, which is preferred.
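
      A simplified model of the reordering (rgrp_tree_insert() here is an
      assumed helper standing in for the rbtree insertion the kernel does
      under its rindex lock):

      #include <stddef.h>

      /* Simplified model: only publish the back-pointer (gl_object) after the
       * rgrp has actually won the race to be inserted. */
      struct rgrp_stub;
      struct glock_stub { struct rgrp_stub *gl_object; };

      struct rgrp_stub {
          struct glock_stub *rd_gl;
          void *rd_bits;
      };

      /* assumed helper: returns 0 if this rgrp was inserted, nonzero if another
       * process already inserted one for the same address */
      int rgrp_tree_insert(struct rgrp_stub *rgd);

      static int attach_rgrp(struct rgrp_stub *rgd)
      {
          if (rgrp_tree_insert(rgd) != 0) {
              rgd->rd_bits = NULL;            /* poison before freeing */
              /* free rgd without ever having touched gl_object */
              return -1;
          }
          rgd->rd_gl->gl_object = rgd;        /* only set after a successful insert */
          return 0;
      }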
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  14. 02 May 2016 (1 commit)
  15. 05 Apr 2016 (1 commit)
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      By Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to
      implement the page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And it is unlikely it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion as to whether
      PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
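
      For example, a typical filesystem helper would be converted roughly
      like this (illustrative kernel-context fragment, not taken from any
      particular file):

      /* before the conversion */
      static void example_before(struct page *page, loff_t pos)
      {
          pgoff_t index = pos >> PAGE_CACHE_SHIFT;
          unsigned int offset = pos & (PAGE_CACHE_SIZE - 1);

          page_cache_get(page);
          /* ... use index/offset against the page ... */
          page_cache_release(page);
      }

      /* after the conversion */
      static void example_after(struct page *page, loff_t pos)
      {
          pgoff_t index = pos >> PAGE_SHIFT;
          unsigned int offset = pos & (PAGE_SIZE - 1);

          get_page(page);
          /* ... use index/offset against the page ... */
          put_page(page);
      }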
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 19 Dec 2015 (1 commit)
    • GFS2: Always use iopen glock for gl_deletes · 5ea31bc0
      By Bob Peterson
      Before this patch, when function try_rgrp_unlink queued a glock for
      delete_work to reclaim the space, it used the inode glock to do so.
      That's different from the iopen callback which uses the iopen glock
      for the same purpose. We should be consistent and always use the
      iopen glock. This may also save us reference counting problems with
      the inode glock, since clear_glock does an extra glock_put() for the
      inode glock.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  17. 15 Dec 2015 (1 commit)
    • GFS2: Make rgrp reservations part of the gfs2_inode structure · a097dc7e
      By Bob Peterson
      Before this patch, multi-block reservation structures were allocated
      from a special slab. This patch folds the structure into the gfs2_inode
      structure. The disadvantage is that the gfs2_inode needs more memory,
      even when a file is opened read-only. The advantages are: (a) we don't
      need the special slab and the extra time it takes to allocate and
      deallocate from it. (b) we no longer need to worry that the structure
      exists for things like quota management. (c) This also allows us to
      remove the calls to get_write_access and put_write_access since we
      know the structure will exist.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  18. 24 Nov 2015 (1 commit)
    • GFS2: Extract quota data from reservations structure (revert 5407e242) · b54e9a0b
      By Bob Peterson
      This patch basically reverts the majority of patch 5407e242.
      That patch eliminated the gfs2_qadata structure in favor of just
      using the reservations structure. The problem with doing that is that
      it increases the size of the reservations structure. That is not an
      issue until it comes time to fold the reservations structure into the
      inode in memory so we know it's always there. By separating out the
      quota structure again, we aren't punishing the non-quota users by
      making all the inodes bigger, requiring more slab space. This patch
      creates a new slab area to allocate the quota stuff so it's managed
      a little more sanely.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  19. 17 Nov 2015 (1 commit)
  20. 09 Nov 2015 (1 commit)
  21. 30 Oct 2015 (1 commit)
  22. 04 Sep 2015 (2 commits)
  23. 19 Jun 2015 (1 commit)
  24. 19 May 2015 (1 commit)
  25. 06 May 2015 (1 commit)
    • gfs2: handle NULL rgd in set_rgrp_preferences · 959b6717
      By Abhi Das
      The function set_rgrp_preferences() does not handle the (rarely
      returned) NULL value from gfs2_rgrpd_get_next() and this patch
      fixes that.
      
      The fs image in question is only 150MB in size which allows for
      only 1 rgrp to be created. The in-memory rb tree has only 1 node
      and when gfs2_rgrpd_get_next() is called on this sole rgrp, it
      returns NULL. (Default behavior is to wrap around the rb tree and
      return the first node to give the illusion of a circular linked
      list. In the case of only 1 rgrp, we can't have
      gfs2_rgrpd_get_next() return the same rgrp (first, last, next all
      point to the same rgrp)... that would cause unintended consequences
      and infinite loops.)
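
      A sketch of the guarded loop (model code with an assumed iterator
      helper; the point is only that the NULL return is now checked before
      use):

      #include <stddef.h>

      struct rgrp_stub { int rd_flags; };
      #define RDF_PREFERRED 0x1

      struct rgrp_stub *rgrpd_get_next(struct rgrp_stub *rgd);  /* assumed iterator */

      /* Mark every nth rgrp starting from 'first' as preferred. */
      static void mark_every_nth(struct rgrp_stub *first, int n)
      {
          struct rgrp_stub *rgd = first;

          do {
              rgd->rd_flags |= RDF_PREFERRED;
              for (int i = 0; i < n; i++) {
                  rgd = rgrpd_get_next(rgd);
                  if (!rgd)                 /* single-rgrp fs: nothing to wrap to */
                      break;                /* the previously missing guard */
              }
          } while (rgd && rgd != first);    /* NULL also terminates the walk */
      }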
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  26. 24 Apr 2015 (2 commits)
  27. 19 Mar 2015 (1 commit)
    • gfs2: allow quota_check and inplace_reserve to return available blocks · 25435e5e
      By Abhi Das
      struct gfs2_alloc_parms is passed to gfs2_quota_check() and
      gfs2_inplace_reserve() with ap->target containing the number of
      blocks being requested for allocation in the current operation.
      
      We add a new field to struct gfs2_alloc_parms called 'allowed'.
      gfs2_quota_check() and gfs2_inplace_reserve() return the max
      blocks allowed by quota and the max blocks allowed by the chosen
      rgrp respectively in 'allowed'.
      
      A new field 'min_target', when non-zero, tells gfs2_quota_check()
      and gfs2_inplace_reserve() to not return -EDQUOT/-ENOSPC when
      there are at least 'min_target' blocks allowable/available. The
      assumption is that the caller is ok with just 'min_target' blocks
      and will likely proceed with allocating them.
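
      An illustrative model of how a caller might use the new fields (field
      names come from the commit; reserve_blocks() and the retry logic are
      assumptions, not the real gfs2 call sequence):

      #include <stdint.h>

      struct alloc_parms_model {
          uint32_t target;       /* blocks the caller would like */
          uint32_t min_target;   /* smallest amount the caller can live with */
          uint32_t allowed;      /* filled in: max blocks quota/rgrp permit */
      };

      /* assumed stand-in for gfs2_quota_check()/gfs2_inplace_reserve() */
      int reserve_blocks(struct alloc_parms_model *ap);

      static int write_some_blocks(uint32_t want)
      {
          struct alloc_parms_model ap = {
              .target = want,
              .min_target = 1,        /* don't fail if at least 1 block fits */
          };
          int error = reserve_blocks(&ap);
          uint32_t got;

          if (error)
              return error;           /* fails only if even min_target can't be met */
          got = ap.allowed < want ? ap.allowed : want;
          /* ... allocate and write 'got' blocks, loop for the remainder ... */
          return (int)got;
      }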
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
  28. 04 Nov 2014 (2 commits)
  29. 03 Oct 2014 (1 commit)
  30. 19 Sep 2014 (1 commit)
    • GFS2: fix bad inode i_goal values during block allocation · 00a158be
      By Abhi Das
      This patch checks whether i_goal is either zero or doesn't exist
      within any rgrp (i.e. gfs2_blk2rgrpd() returns NULL). If so, it
      assigns the ip->i_no_addr block as the i_goal.
      
      There are two scenarios where a bad i_goal can result in a
      -EBADSLT error.
      
      1. Attempting to allocate to an existing inode:
      Control reaches gfs2_inplace_reserve() and ip->i_goal is bad.
      We need to fix i_goal here.
      
      2. A new inode is created in a directory whose i_goal is hosed:
      In this case, the parent dir's i_goal is copied onto the new
      inode. Since the new inode is not yet created, the ip->i_no_addr
      field is invalid and so, the fix in gfs2_inplace_reserve() as per
      1) won't work in this scenario. We need to catch and fix it sooner
      in the parent dir itself (gfs2_create_inode()), before it is
      copied to the new inode.
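
      A sketch of the check itself (stand-in types; blk2rgrpd() models
      gfs2_blk2rgrpd() returning NULL for a block that lies in no rgrp):

      #include <stddef.h>
      #include <stdint.h>

      struct rgrp_stub;
      struct inode_model {
          uint64_t i_goal;
          uint64_t i_no_addr;
      };

      struct rgrp_stub *blk2rgrpd(uint64_t blkno);   /* assumed lookup helper */

      static void fix_goal(struct inode_model *ip)
      {
          if (ip->i_goal == 0 || blk2rgrpd(ip->i_goal) == NULL)
              ip->i_goal = ip->i_no_addr;    /* always a valid, owned block */
      }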
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  31. 18 Jul 2014 (1 commit)
  32. 14 May 2014 (1 commit)
    • GFS2: remove transaction glock · 24972557
      By Benjamin Marzinski
      GFS2 has a transaction glock, which must be grabbed for every
      transaction, whose purpose is to deal with freezing the filesystem.
      Aside from this involving a large amount of locking, it is very easy to
      make the current fsfreeze code hang on unfreezing.
      
      This patch rewrites how gfs2 handles freezing the filesystem. The
      transaction glock is removed. In its place is a freeze glock, which is
      cached (but not held) in a shared state by every node in the cluster
      when the filesystem is mounted. This lock only needs to be grabbed on
      freezing, and actions which need to be safe from freezing, like
      recovery.
      
      When a node wants to freeze the filesystem, it grabs this glock
      exclusively.  When the freeze glock state changes on the nodes (either
      from shared to unlocked, or shared to exclusive), the filesystem does a
      special log flush.  gfs2_log_flush() does all the work of flushing out
      and shutting down the incore log, and then it tries to grab the
      freeze glock in a shared state again.  Since the filesystem is stuck in
      gfs2_log_flush, no new transaction can start, and nothing can be written
      to disk. Unfreezing the filesystem simply involves dropping the freeze
      glock, allowing gfs2_log_flush() to grab and then release the shared
      lock, so it is cached for next time.
      
      However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
      shared lock on the filesystem root directory inode to check permissions.
      If that glock has already been grabbed exclusively, fsfreeze will be
      unable to get the shared lock and unfreeze the filesystem.
      
      In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
      on the filesystem root directory during the freeze, and hold it until it
      unfreezes the filesystem.  The functions which need to grab a shared
      lock in order to allow the unfreeze ioctl to be issued now use the lock
      grabbed by the freeze code instead.
      
      The freeze and unfreeze code take care to make sure that this shared
      lock will not be dropped while another process is using it.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  33. 07 Mar 2014 (2 commits)