1. 26 Sep, 2017 1 commit
  2. 31 Aug, 2017 3 commits
    • gfs2: preserve i_mode if __gfs2_set_acl() fails · 309e8cda
      Ernesto A. Fernández committed
      When changing a file's acl mask, __gfs2_set_acl() will first set the
      group bits of i_mode to the value of the mask, and only then set the
      actual extended attribute representing the new acl.
      
      If the second part fails (due to lack of space, for example) and the
      file had no acl attribute to begin with, the system will from now on
      assume that the mask permission bits are actual group permission bits,
      potentially granting access to the wrong users.
      
      Prevent this by only changing the inode mode after the acl has been set.
      Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
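The ordering described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `struct fake_inode`, `store_acl_xattr()` and `set_acl_then_mode()` are hypothetical names, not GFS2 code, and the "out of space" condition is simulated with a flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode: a mode and an "ACL stored" marker. */
struct fake_inode {
    unsigned int mode;
    bool has_acl;
};

/* Hypothetical xattr write that can fail, here with -ENOSPC. */
static int store_acl_xattr(struct fake_inode *inode, bool out_of_space)
{
    if (out_of_space)
        return -ENOSPC;
    inode->has_acl = true;
    return 0;
}

/* Store the ACL first; change the mode only once that has succeeded. */
static int set_acl_then_mode(struct fake_inode *inode, unsigned int new_mode,
                             bool out_of_space)
{
    int error = store_acl_xattr(inode, out_of_space);

    if (error)
        return error;       /* the mode is left exactly as it was */
    inode->mode = new_mode;
    return 0;
}
```

With this ordering, a failed attribute write leaves the group permission bits untouched, so the mask is never mistaken for real group permissions.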
    • gfs2: don't return ENODATA in __gfs2_xattr_set unless replacing · 54aae14b
      Ernesto A. Fernández committed
      The function __gfs2_xattr_set() will return -ENODATA when called to
      remove an xattr that does not exist. The result is that setfacl will
      show an exit status of 1 when called to set only a file's mode bits
      (on a file with no ACLs), despite succeeding. A "No data available"
      error will be printed as well.
      
      To fix this, return 0 instead, except when the XATTR_REPLACE flag is
      set, in which case -ENODATA is appropriate. This is consistent with
      how most xattr-setting functions work in other filesystems.
      Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
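A minimal sketch of the return-value rule from this commit, assuming a hypothetical `sketch_xattr_set()` in which a NULL value means "remove". `SKETCH_XATTR_REPLACE` is a local stand-in for the real XATTR_REPLACE flag, and all actual attribute handling is left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for XATTR_REPLACE so the sketch needs no kernel headers. */
#define SKETCH_XATTR_REPLACE 0x2

/*
 * value == NULL means "remove the attribute"; `exists` says whether the
 * attribute is currently present on the file.
 */
static int sketch_xattr_set(bool exists, const char *value, int flags)
{
    if (value == NULL && !exists) {
        /* Removing a missing xattr is a no-op unless a replace was demanded. */
        return (flags & SKETCH_XATTR_REPLACE) ? -ENODATA : 0;
    }
    return 0;   /* create, replace and remove paths are elided here */
}
```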
    • GFS2: Fix non-recursive truncate bug · c4a9d189
      Bob Peterson committed
      Before this patch, truncating a file to a smaller size did not
      free all the blocks properly. There are two reasons.
      
      First, the metapath comparison was not comparing previous heights.
      I added a function, mp_eq_to_hgt, which checks the metapath at
      all heights prior to the target height.
      
      Second, function find_nonnull_ptr needed to zero out all
      pointers for heights following the target height. Translated into
      decimal integer terms, this way a number like 299, when incremented,
      becomes 300, not 399. The 2 gets incremented to 3, and the following
      digits need to be reset.
      
      These two things allow the truncate state machine to properly find
      the blocks it needs to delete.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
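The two ideas in this commit (comparing all earlier heights, and resetting all later ones on increment) can be sketched as follows. `struct sketch_path`, `sketch_eq_to_hgt()` and `sketch_advance()` are hypothetical names chosen for illustration; they model a metapath as one index per tree height and are not the GFS2 implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_HEIGHT 3

/* Hypothetical stand-in for a metapath: one index per tree height. */
struct sketch_path {
    unsigned int idx[SKETCH_MAX_HEIGHT];
};

/* True if both paths agree at every height before `hgt`. */
static bool sketch_eq_to_hgt(const struct sketch_path *a,
                             const struct sketch_path *b, size_t hgt)
{
    for (size_t h = 0; h < hgt; h++)
        if (a->idx[h] != b->idx[h])
            return false;
    return true;
}

/* Advance the index at `hgt` and reset every height after it to zero. */
static void sketch_advance(struct sketch_path *p, size_t hgt)
{
    p->idx[hgt]++;
    for (size_t h = hgt + 1; h < SKETCH_MAX_HEIGHT; h++)
        p->idx[h] = 0;
}
```

Advancing the path {2, 9, 9} at height 0 yields {3, 0, 0}, which mirrors the 299 to 300 analogy in the commit message.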
  3. 30 Aug, 2017 2 commits
  4. 26 Aug, 2017 1 commit
  5. 25 Aug, 2017 2 commits
    • gfs2: Silence gcc format-truncation warning · 561b7969
      Andreas Gruenbacher committed
      Enlarge sd_fsname to be big enough for the longest lock table name
      and an arbitrary journal number.  This silences two -Wformat-truncation
      warnings with gcc 7.1.1.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
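As a rough illustration of sizing a buffer for "table name plus journal number", here is a userspace sketch. `SKETCH_NAME_LEN`, `SKETCH_FSNAME_LEN` and `sketch_format_fsname()` are assumptions made for the example, and the "%s.%d" layout is illustrative rather than taken from the patch.

```c
#include <stdio.h>

/* Hypothetical capacity for the lock table name, including its NUL byte. */
#define SKETCH_NAME_LEN 16

/*
 * Room for the name, a '.', and any int value: three decimal digits per
 * byte is a safe upper bound, and the extra 2 covers the '.' and a sign.
 */
#define SKETCH_FSNAME_LEN (SKETCH_NAME_LEN + 3 * sizeof(int) + 2)

/* Returns 0 if "<table>.<jid>" fits into buf, -1 if it would be truncated. */
static int sketch_format_fsname(char *buf, size_t len, const char *table, int jid)
{
    int n = snprintf(buf, len, "%s.%d", table, jid);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Sizing the destination from the worst case of each component is what lets the compiler prove that no truncation can occur.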
    • GFS2: Withdraw for IO errors writing to the journal or statfs · 942b0cdd
      Bob Peterson committed
      Before this patch, if GFS2 encountered IO errors while writing to
      the journal, it would not report the problem, so the errors would go
      unnoticed, sometimes for many hours. Sometimes they would only be
      noticed later, when recovery tried to do journal replay and failed
      due to invalid metadata at the blocks that resulted in IO errors.
      
      This patch makes GFS2's log daemon check for IO errors. If it
      encounters one, it withdraws from the file system and reports
      why in dmesg. A similar action is taken when IO errors occur when
      writing to the system statfs file.
      
      These errors are also reported back to any callers of fsync, since
      that requires the journal to be flushed. Therefore, any IO errors
      that would previously go unnoticed are now noticed and the file
      system is withdrawn as early as possible, thus preventing further
      file system damage.
      
      Also note that this reintroduces superblock variable sd_log_error,
      which Christoph removed with commit f729b66f.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  6. 24 Aug, 2017 1 commit
    • block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig committed
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different lifetime rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 16 Aug, 2017 1 commit
    • gfs2: fix slab corruption during mounting and umounting gfs file system · cc1dfa8b
      Thomas Tai committed
      When using cman-3.0.12.1 and gfs2-utils-3.0.12.1, mounting and
      unmounting a GFS2 file system would cause the kernel to hang. The slab
      allocator suggests that it is likely a double-free memory corruption.
      The issue is traced back to v3.9-rc6, where a patch was submitted to
      use kzalloc() for storing a bitmap instead of using a local variable.
      The intention was to allocate memory during mount and to free it
      during unmount. The original patch missed a code path that had
      already freed the memory, which caused memory corruption. This patch sets
      the memory pointer to NULL after the memory is freed, so that the
      double free can no longer happen.
      
      gdlm_mount()
        '-- set_recover_size() which uses kzalloc()
        '-- if dlm does not support ops callbacks then
                '--- free_recover_size() which uses kfree()
      
      gdlm_unmount()
        '-- free_recover_size() which uses kfree()
      
      The patch that introduced the double-free issue is
      commit 57c7310b ("GFS2: use kmalloc for lvb bitmap")
      Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
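The "clear the pointer after freeing" pattern from this commit can be shown with a userspace sketch that uses calloc() and free() in place of kzalloc() and kfree(). `struct sketch_ls`, `sketch_set_size()` and `sketch_free_size()` are hypothetical names, not the functions in the patch.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the lock-space state that owns the bitmap. */
struct sketch_ls {
    unsigned char *bitmap;
};

/* Allocate a zeroed bitmap of n bytes; returns 0 on success, -1 on failure. */
static int sketch_set_size(struct sketch_ls *ls, size_t n)
{
    ls->bitmap = calloc(n, 1);
    return ls->bitmap ? 0 : -1;
}

/* Free the bitmap and clear the pointer so a repeated call is harmless. */
static void sketch_free_size(struct sketch_ls *ls)
{
    free(ls->bitmap);
    ls->bitmap = NULL;  /* a second call now reaches free(NULL), a no-op */
}
```

Because both the mount error path and the unmount path call the same free helper, clearing the pointer makes the second call safe regardless of which path ran first.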
  8. 10 Aug, 2017 6 commits
    • gfs2: forcibly flush ail to relieve memory pressure · b066a4ee
      Abhi Das committed
      On systems with low memory, it is possible for gfs2 to infinitely
      loop in balance_dirty_pages() under heavy IO (creating sparse files).
      
      balance_dirty_pages() attempts to write out the dirty pages via
      gfs2_writepages() but none are found because these dirty pages are
      being used by the journaling code in the ail. Normally, the journal
      has an upper threshold which when hit triggers an automatic flush
      of the ail. But this threshold can be higher than the number of
      allowable dirty pages and result in the ail never being flushed.
      
      This patch forces an ail flush when gfs2_writepages() fails to write
      anything. This is a good indication that the ail might be holding
      some dirty pages.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
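The trigger described in this commit, requesting a flush when a writeback pass wrote nothing, might look like this in outline. `struct sketch_sb`, its `force_ail_flush` field and `sketch_writepages()` are hypothetical; the number of pages written is passed in directly instead of being produced by real writeback.

```c
#include <stdbool.h>

/* Hypothetical per-filesystem state with a "please flush the ail" request. */
struct sketch_sb {
    bool force_ail_flush;
};

/*
 * Models the end of a writeback pass: if no pages could be written, the
 * dirty pages are probably pinned by the journal, so request an ail flush.
 */
static long sketch_writepages(struct sketch_sb *sb, long pages_written)
{
    if (pages_written == 0)
        sb->force_ail_flush = true;
    return pages_written;
}
```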
    • gfs2: Clean up waiting on glocks · a91323e2
      Andreas Gruenbacher committed
      The prepare_to_wait_on_glock and finish_wait_on_glock functions introduced in
      commit 56a365be "gfs2: gfs2_glock_get: Wait on freeing glocks" are
      better removed, resulting in cleaner code.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Defer deleting inodes under memory pressure · 6a1c8f6d
      Andreas Gruenbacher committed
      When under memory pressure and an inode's link count has dropped to
      zero, defer deleting the inode to the delete workqueue.  This avoids
      calling into DLM under memory pressure, which can deadlock.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: gfs2_evict_inode: Put glocks asynchronously · 71c1b213
      Andreas Gruenbacher committed
      gfs2_evict_inode is called to free inodes under memory pressure.  The
      function calls into DLM when an inode's last cluster-wide reference goes
      away (remote unlink) and to release the glock and associated DLM lock
      before finally destroying the inode.  However, if DLM is blocked waiting
      for memory to become available, calling into DLM again will deadlock.
      
      Avoid that by decoupling releasing glocks from destroying inodes in that
      case: with gfs2_glock_queue_put, glocks will be dequeued asynchronously
      in work queue context, when the associated inodes have likely already
      been destroyed.
      
      With this change, inodes can end up being unlinked, remote-unlink can be
      triggered, and then the inode can be reallocated before all
      remote-unlink callbacks are processed.  To detect that, revalidate the
      link count in gfs2_evict_inode to make sure we're not deleting an
      allocated, referenced inode.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Get rid of gfs2_set_nlink · eebd2e81
      Andreas Gruenbacher committed
      Remove gfs2_set_nlink, which prevents the link count of an inode from
      becoming non-zero once it has reached zero.  The next commit reduces the
      amount of waiting on glocks when an inode is evicted from memory.  With
      that, an inode can become reallocated before all the remote-unlink
      callbacks from a previous delete are processed, which causes the link
      count to change from zero to non-zero.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: gfs2_glock_get: Wait on freeing glocks · 0515480a
      Andreas Gruenbacher committed
      Keep glocks in their hash table until they are freed instead of removing
      them when their last reference is dropped.  This allows waiting for any
      previous instances of a glock to go away in gfs2_glock_get before
      creating a new glock.
      
      Special thanks to Andy Price for finding and fixing a problem which also
      required us to delete the rcu_read_unlock from the error case in function
      gfs2_glock_get.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  9. 09 Aug, 2017 6 commits
  10. 01 Aug, 2017 1 commit
  11. 21 Jul, 2017 4 commits
    • GFS2: Set gl_object in inode lookup only after block type check · 4d7c18c7
      Bob Peterson committed
      Before this patch, the inode glock's gl_object was set after a
      reference was acquired, but before the block type was verified.
      In cases where the block was unlinked, then freed and reused on
      another node, a residual delete callback (delete_work) would try
      to look up the inode, eventually failing the block check, but
      only after it overwrites gl_object with a pointer to the wrong
      inode. This patch moves the assignment of gl_object after the
      block check so it won't be improperly overwritten.
      
      Likewise, at the end of the function, gfs2_inode_lookup was
      clearing gl_object after it unlocked the glock, which meant
      another process might free the glock in the meantime. This
      patch guards against that case.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • GFS2: Introduce helper for clearing gl_object · df3d87bd
      Bob Peterson committed
      This patch introduces a new helper function in glock.h that
      clears gl_object, with an added integrity check. An additional
      integrity check has been added to glock_set_object, plus comments.
      This is step 1 in a series to ensure gl_object integrity.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: add flag REQ_PRIO for metadata I/O · e477b24b
      Coly Li committed
      When gfs2 does metadata I/O, only REQ_META is used as a metadata hint of
      the bio. But flag REQ_META is just a hint for block trace, not for block
      layer code to handle a bio as a metadata request.
      
      For some of the metadata I/Os of gfs2, a REQ_PRIO flag on the metadata bio
      would be very informative to block layer code. For example, if bcache is
      used as an I/O cache for gfs2, it will be possible for bcache code to get
      the hint and cache the pre-fetched metadata blocks on the cache device. This
      behavior may help to improve metadata I/O performance if the
      following requests hit the cache.
      
      Here are the locations in gfs2 code where a REQ_PRIO flag should be added:
      - All places where REQ_READAHEAD is used; gfs2 code uses this flag for
        metadata read ahead.
      - In gfs2_meta_ra(), where the first metadata block is read in.
      - In gfs2_write_buf_to_page(), which reads in quota metadata blocks to have
        them up to date.
      These metadata blocks will probably be accessed again in the future, so adding
      a REQ_PRIO flag may let bcache keep such metadata on the fast cache
      device. For systems without a cache layer, REQ_PRIO can still provide a hint
      to the block layer to handle metadata requests more appropriately.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • GFS2: fix code parameter error in inode_go_lock · e7cb550d
      Wang Xibo committed
      In the inode_go_lock() function, the parameter order of list_add() is wrong.
      According to the definition of list_add(), the first parameter is the new entry
      and the second is the list head, so ip->i_trunc_list should be the
      first parameter and sdp->sd_trunc_list should be the second.
      
      Signed-off-by: Wang Xibo <wang.xibo@zte.com.cn>
      Signed-off-by: Xiao Likun <xiao.likun@zte.com.cn>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
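The argument order discussed in this commit is easy to see with a minimal userspace list. `struct sketch_list`, `sketch_list_init()` and `sketch_list_add()` are hypothetical re-creations of the circular doubly linked list idea, written for illustration rather than copied from the kernel.

```c
/* Minimal userspace model of a circular doubly linked list node. */
struct sketch_list {
    struct sketch_list *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
static void sketch_list_init(struct sketch_list *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert `entry` right after `head`: new entry first, list head second. */
static void sketch_list_add(struct sketch_list *entry, struct sketch_list *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}
```

Swapping the two arguments would splice the shared list head in after the entry, treating the entry as if it were the head of the list.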
  12. 20 Jul, 2017 1 commit
  13. 19 Jul, 2017 1 commit
    • gfs2: Don't clear SGID when inheriting ACLs · 914cea93
      Jan Kara committed
      When a new directory 'DIR1' is created in a directory 'DIR0' with the SGID bit
      set, DIR1 is expected to have the SGID bit set (and owning group equal to
      the owning group of 'DIR0'). However, when 'DIR0' also has some default
      ACLs that 'DIR1' inherits, setting these ACLs will result in the SGID bit on
      'DIR1' getting cleared if the user is not a member of the owning group.
      
      Fix the problem by moving posix_acl_update_mode() out of
      __gfs2_set_acl() into gfs2_set_acl(). That way the function will not be
      called when inheriting ACLs which is what we want as it prevents SGID
      bit clearing and the mode has been properly set by posix_acl_create()
      anyway.
      
      Fixes: 07393101
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  14. 18 Jul, 2017 1 commit
  15. 17 Jul, 2017 1 commit
    • GFS2: Prevent double brelse in gfs2_meta_indirect_buffer · 61eaadcd
      Bob Peterson committed
      Before this patch, problems reading in indirect buffers would send
      an IO error back to the caller and release the buffer_head with
      brelse() in function gfs2_meta_indirect_buffer. However, it would
      still return the address of the buffer_head it released. After the
      error was discovered, function gfs2_block_map would call function
      release_metapath to free all buffers. That checked:
      if (mp->mp_bh[i] == NULL) but since the value was set after the
      error, it was non-zero, so brelse was called a second time. This
      resulted in the following error:
      
      kernel: WARNING: at fs/buffer.c:1224 __brelse+0x3a/0x40() (Tainted: G        W  -- ------------   )
      kernel: Hardware name: RHEV Hypervisor
      kernel: VFS: brelse: Trying to free free buffer
      
      This patch changes gfs2_meta_indirect_buffer so it only sets
      the buffer_head pointer in cases where it isn't released.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
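The fix, handing a buffer back only when it was not released, can be sketched with a reference-counted stand-in. `struct sketch_buf`, `sketch_read_indirect()` and `sketch_cleanup()` are hypothetical names; the I/O failure is simulated with a flag.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical buffer with a reference count. */
struct sketch_buf {
    int refs;
};

static void sketch_release(struct sketch_buf *b)
{
    b->refs--;
}

/*
 * On failure the buffer is released here and *out is left untouched,
 * so the caller's cleanup cannot release it a second time.
 */
static int sketch_read_indirect(struct sketch_buf *b, int io_failed,
                                struct sketch_buf **out)
{
    b->refs++;              /* take a reference for the read */
    if (io_failed) {
        sketch_release(b);
        return -EIO;
    }
    *out = b;               /* only publish the pointer on success */
    return 0;
}

/* Caller-side cleanup: release only what was actually handed back. */
static void sketch_cleanup(struct sketch_buf **slot)
{
    if (*slot == NULL)
        return;
    sketch_release(*slot);
    *slot = NULL;
}
```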
  16. 08 Jul, 2017 1 commit
  17. 06 Jul, 2017 2 commits
    • buffer: set errors in mapping at the time that the error occurs · 87354e5d
      Jeff Layton committed
      I noticed on xfs that I could still sometimes get back an error on fsync
      on a fd that was opened after the error condition had been cleared.
      
      The problem is that the buffer code sets the write_io_error flag and
      then later checks that flag to set the error in the mapping. That flag
      persists for quite a while, however. If the file is later opened with
      O_TRUNC, the buffers will then be invalidated and the mapping's error
      set such that a subsequent fsync will return an error. I think this is
      incorrect, as there was no writeback between the open and fsync.
      
      Add a new mark_buffer_write_io_error operation that sets the flag and
      the error in the mapping at the same time. Replace all calls to
      set_buffer_write_io_error with mark_buffer_write_io_error, and remove
      the places that check this flag in order to set the error in the
      mapping.
      
      This sets the error in the mapping earlier, at the time that it's first
      detected.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
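A rough model of "set the flag and the mapping error at the same time" is given below. `struct sketch_mapping`, `struct sketch_buffer` and `sketch_mark_write_io_error()` are hypothetical simplifications; recording -EIO as a plain integer is an assumption of the sketch, not a description of the kernel's error tracking.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address space that remembers an error. */
struct sketch_mapping {
    int error;
};

/* Hypothetical buffer with a sticky error flag and a link to its mapping. */
struct sketch_buffer {
    bool write_io_error;
    struct sketch_mapping *mapping;
};

/* Set the per-buffer flag and record the error in the mapping in one step. */
static void sketch_mark_write_io_error(struct sketch_buffer *bh)
{
    bh->write_io_error = true;
    if (bh->mapping != NULL)
        bh->mapping->error = -EIO;
}
```

Recording the error when it is detected means nothing needs to re-derive it from the sticky flag later, which is the stale-error case the commit describes.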
    • VFS: Provide empty name qstr · cdf01226
      David Howells committed
      Provide an empty name (i.e. "") qstr for general use.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  18. 05 Jul, 2017 5 commits