1. 09 Aug, 2017 6 commits
  2. 21 Jul, 2017 4 commits
    • GFS2: Set gl_object in inode lookup only after block type check · 4d7c18c7
      Committed by Bob Peterson
      Before this patch, the inode glock's gl_object was set after a
      reference was acquired, but before the block type was verified.
      In cases where the block was unlinked, then freed and reused on
      another node, a residual delete callback (delete_work) would try
      to look up the inode, eventually failing the block check, but
      only after it overwrites gl_object with a pointer to the wrong
      inode. This patch moves the assignment of gl_object after the
      block check so it won't be improperly overwritten.
      
      Likewise, at the end of the function, gfs2_inode_lookup was
      clearing gl_object after it unlocked the glock, which meant
      another process might free the glock in the meantime. This
      patch guards against that case.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
    • GFS2: Introduce helper for clearing gl_object · df3d87bd
      Committed by Bob Peterson
      This patch introduces a new helper function in glock.h that
      clears gl_object, with an added integrity check. An additional
      integrity check has been added to glock_set_object, plus comments.
      This is step 1 in a series to ensure gl_object integrity.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
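As an illustration of what a set/clear helper pair with integrity checks might look like, here is a minimal userspace sketch. The names `demo_glock`, `demo_glock_set_object` and `demo_glock_clear_object` are hypothetical, locking is omitted, and the choice to leave the pointer untouched on a failed check is an assumption of this sketch, not a description of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a glock; only the field discussed here. */
struct demo_glock {
    void *gl_object;   /* object currently attached to the lock, or NULL */
};

/* Attach obj. Integrity check: nothing may be attached already. */
static bool demo_glock_set_object(struct demo_glock *gl, void *obj)
{
    assert(gl != NULL);
    if (gl->gl_object != NULL)
        return false;          /* refuse to overwrite an existing object */
    gl->gl_object = obj;
    return true;
}

/* Detach obj. Integrity check: obj must be the object that is attached. */
static bool demo_glock_clear_object(struct demo_glock *gl, void *obj)
{
    assert(gl != NULL);
    if (gl->gl_object != obj)
        return false;          /* someone else's object; leave it alone */
    gl->gl_object = NULL;
    return true;
}
```

Returning a status instead of warning keeps the sketch testable; kernel code would more likely log the inconsistency.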
    • gfs2: add flag REQ_PRIO for metadata I/O · e477b24b
      Committed by Coly Li
      When gfs2 does metadata I/O, only REQ_META is used as a metadata hint on
      the bio. But the REQ_META flag is just a hint for block tracing; it does
      not tell block layer code to handle the bio as a metadata request.
      
      For some gfs2 metadata I/O, a REQ_PRIO flag on the metadata bio would be
      very informative to block layer code. For example, if bcache is used as
      an I/O cache for gfs2, the bcache code can pick up the hint and cache
      the pre-fetched metadata blocks on the cache device. This may improve
      metadata I/O performance if subsequent requests hit the cache.
      
      Here are the locations in gfs2 code where a REQ_PRIO flag should be added:
      - All places where REQ_READAHEAD is used, gfs2 code uses this flag for
        metadata read ahead.
      - In gfs2_meta_rq() where the first metadata block is read in.
      - In gfs2_write_buf_to_page(), read in quota metadata blocks to have them
        up to date.
      These metadata blocks are likely to be accessed again in the future, so
      adding a REQ_PRIO flag may lead bcache to keep such metadata on the fast
      cache device. For systems without a cache layer, REQ_PRIO can still
      hint to the block layer to handle metadata requests more appropriately.
      Signed-off-by: Coly Li <colyli@suse.de>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • GFS2: fix code parameter error in inode_go_lock · e7cb550d
      Committed by Wang Xibo
      In the inode_go_lock() function, the parameters of list_add() are in the
      wrong order. According to the definition of list_add(), the first
      parameter is the new entry and the second is the list head, so
      ip->i_trunc_list should be the first parameter and sdp->sd_trunc_list
      the second.
      
      Signed-off-by: Wang Xibo <wang.xibo@zte.com.cn>
      Signed-off-by: Xiao Likun <xiao.likun@zte.com.cn>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
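The argument order described above can be seen in a small userspace model of a circular doubly linked list. This is a sketch modeled on the kernel's list_head idea, not the kernel's code; `list_node`, `list_init`, `list_add_after` and `list_length` are names made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (illustrative only). */
struct list_node {
    struct list_node *next, *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert new_entry right after head: new entry first, list head second. */
static void list_add_after(struct list_node *new_entry, struct list_node *head)
{
    assert(new_entry != NULL && head != NULL);
    new_entry->next = head->next;
    new_entry->prev = head;
    head->next->prev = new_entry;
    head->next = new_entry;
}

/* Count the entries on the list, not including the head itself. */
static size_t list_length(const struct list_node *head)
{
    size_t n = 0;
    for (const struct list_node *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

Because both parameters have the same type, the compiler cannot catch swapped arguments; with the order reversed, the shared head would be spliced in after the entry instead of the other way around.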
  3. 20 Jul, 2017 1 commit
  4. 19 Jul, 2017 1 commit
    • gfs2: Don't clear SGID when inheriting ACLs · 914cea93
      Committed by Jan Kara
      When a new directory 'DIR1' is created in a directory 'DIR0' with the SGID
      bit set, DIR1 is expected to have the SGID bit set (and an owning group
      equal to the owning group of 'DIR0'). However, when 'DIR0' also has some
      default ACLs that 'DIR1' inherits, setting these ACLs results in the SGID
      bit on 'DIR1' being cleared if the user is not a member of the owning group.
      
      Fix the problem by moving posix_acl_update_mode() out of
      __gfs2_set_acl() into gfs2_set_acl(). That way the function will not be
      called when inheriting ACLs which is what we want as it prevents SGID
      bit clearing and the mode has been properly set by posix_acl_create()
      anyway.
      
      Fixes: 07393101
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  5. 18 Jul, 2017 1 commit
  6. 17 Jul, 2017 1 commit
    • GFS2: Prevent double brelse in gfs2_meta_indirect_buffer · 61eaadcd
      Committed by Bob Peterson
      Before this patch, problems reading in indirect buffers would send
      an IO error back to the caller and release the buffer_head with
      brelse() in function gfs2_meta_indirect_buffer. However, the function
      would still return the address of the buffer_head it had released.
      After the error was discovered, function gfs2_block_map would call
      function release_metapath to free all buffers. That checked
      if (mp->mp_bh[i] == NULL), but since the value was set after the
      error, it was non-zero, so brelse was called a second time. This
      resulted in the following error:
      
      kernel: WARNING: at fs/buffer.c:1224 __brelse+0x3a/0x40() (Tainted: G        W  -- ------------   )
      kernel: Hardware name: RHEV Hypervisor
      kernel: VFS: brelse: Trying to free free buffer
      
      This patch changes gfs2_meta_indirect_buffer so it only sets
      the buffer_head pointer in cases where it isn't released.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
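The pattern behind this fix, publishing a pointer to the caller only when it has not already been released, can be sketched in userspace. Everything here (`demo_buf`, `demo_read_buffer`, `demo_release_slot`) is a hypothetical stand-in with a plain reference count, not the gfs2 code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reference-counted buffer. */
struct demo_buf {
    int refcount;
    int io_error;   /* nonzero simulates a failed read */
};

static void demo_release(struct demo_buf *b)
{
    assert(b->refcount > 0);   /* would trip on a double release */
    b->refcount--;
}

/*
 * Take a reference and "read" the buffer. On success the buffer is handed
 * to the caller through *out. On failure the reference is dropped here and
 * *out is left untouched, so the caller never sees a released buffer.
 */
static int demo_read_buffer(struct demo_buf *b, struct demo_buf **out)
{
    b->refcount++;
    if (b->io_error) {
        demo_release(b);
        return -EIO;
    }
    *out = b;
    return 0;
}

/* Caller-side cleanup: release only slots that actually hold a buffer. */
static void demo_release_slot(struct demo_buf **slot)
{
    if (*slot == NULL)
        return;
    demo_release(*slot);
    *slot = NULL;
}
```

If `demo_read_buffer` stored `b` in `*out` before checking for the error, the later `demo_release_slot` call would release it a second time, which is the situation the warning above reports.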
  7. 08 Jul, 2017 3 commits
    • exec: Limit arg stack to at most 75% of _STK_LIM · da029c11
      Committed by Kees Cook
      To avoid pathological stack usage or the need to special-case setuid
      execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
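The 6MB figure follows from simple arithmetic if _STK_LIM is taken to be 8 MiB, which is an assumption of this sketch (it is consistent with the number quoted in the message). `DEMO_STK_LIM` and both helper functions are hypothetical names.

```c
#include <assert.h>

/* Assumed default stack limit: 8 MiB. */
#define DEMO_STK_LIM (8UL * 1024 * 1024)

/* The division below is exact only if the limit is a multiple of four. */
static_assert(DEMO_STK_LIM % 4 == 0, "limit must divide evenly by 4");

/* Cap for argument stack usage: three quarters of the default limit. */
static unsigned long demo_arg_stack_limit(void)
{
    return DEMO_STK_LIM / 4 * 3;
}

/* Returns nonzero if a requested argument size fits under the cap. */
static int demo_arg_size_ok(unsigned long bytes)
{
    return bytes <= demo_arg_stack_limit();
}
```

Dividing before multiplying avoids any chance of overflow for larger limits, at the cost of requiring the limit to be a multiple of four.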
    • vfs: fix flock compat thinko · b59eea55
      Committed by Linus Torvalds
      Michael Ellerman reported that commit 8c6657cb ("Switch flock
      copyin/copyout primitives to copy_{from,to}_user()") broke his
      networking on a bunch of PPC machines (64-bit kernel, 32-bit userspace).
      
      The reason is a brown-paper bug by that commit, which had the arguments
      to "copy_flock_fields()" in the wrong order, breaking the compat
      handling for file locking.  Apparently very few people run 32-bit user
      space on x86 any more, so the PPC people got the honor of noticing this
      "feature".
      
      Michael also sent a minimal diff that just changed the order of the
      arguments in that macro.
      
      This is not that minimal diff.
      
      This not only changes the order of the arguments in the macro, it also
      changes them to be pointers (to be consistent with all the other uses of
      those pointers), and makes the functions that do all of this also have
      the proper "const" attribution on the source pointers in order to make
      issues like that (using the source as a destination) be really obvious.
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
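The value of "const" on the source pointer can be shown with a small copy helper. The structures and the function below are hypothetical simplifications, not the kernel's flock types or its copy_flock_fields() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical narrow and wide lock descriptions. */
struct demo_flock32 { short type; int start; int len; };
struct demo_flock64 { short type; long long start; long long len; };

/*
 * Destination first, source second. Because 'from' points to const data,
 * any statement that assigned through it (the copy going the wrong way)
 * would be rejected by the compiler.
 */
static void demo_copy_to_64(struct demo_flock64 *to,
                            const struct demo_flock32 *from)
{
    assert(to != NULL && from != NULL);
    to->type = from->type;
    to->start = from->start;
    to->len = from->len;
}
```

Writing `from->type = to->type;` inside the helper fails to compile, which is the kind of early, obvious failure the commit message is after.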
    • gfs2: Fix glock rhashtable rcu bug · 961ae1d8
      Committed by Andreas Gruenbacher
      Before commit 88ffbf3e "GFS2: Use resizable hash table for glocks",
      glocks were freed via call_rcu to allow reading the glock hashtable
      locklessly using rcu.  This was then changed to free glocks immediately,
      which made reading the glock hashtable unsafe.  Bring back the original
      code for freeing glocks via call_rcu.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Cc: stable@vger.kernel.org # 4.3+
  8. 07 Jul, 2017 9 commits
  9. 06 Jul, 2017 14 commits
    • btrfs: minimal conversion to errseq_t writeback error reporting on fsync · 333427a5
      Committed by Jeff Layton
      Just check and advance the errseq_t in the file before returning, and
      use an errseq_t based check for writeback errors.
      
      Other internal callers of filemap_* functions are left as-is.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • xfs: minimal conversion to errseq_t writeback error reporting · 1b180274
      Committed by Jeff Layton
      Just check and advance the data errseq_t in struct file before
      returning from fsync on normal files. Internal filemap_*
      callers are left as-is.
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • ext4: use errseq_t based error handling for reporting data writeback errors · 6acec592
      Committed by Jeff Layton
      Add a call to filemap_report_wb_err at the end of ext4_sync_file. This
      will ensure that we check and advance the errseq_t in the file, which
      allows us to track and report errors on all open fds when they occur.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • fs: convert __generic_file_fsync to use errseq_t based reporting · 383aa543
      Committed by Jeff Layton
      Many simple, block-based filesystems use generic_file_fsync as their
      fsync operation. Some others (ext* and fat) also call this function
      to handle syncing out data.
      
      Switch this code over to use errseq_t based error reporting so that
      all of these filesystems get reliable error reporting via fsync,
      fdatasync and msync.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • block: convert to errseq_t based writeback error tracking · 372cf243
      Committed by Jeff Layton
      This is a very minimal conversion to errseq_t based error tracking
      for raw block device access. Just have it use the standard
      file_write_and_wait_range call.
      
      Note that there are internal callers that call sync_blockdev
      and the like that are not affected by this. They'll continue
      to use the AS_EIO/AS_ENOSPC flags for error reporting like
      they always have for now.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • dax: set errors in mapping when writeback fails · 819ec6b9
      Committed by Jeff Layton
      Jan Kara's description for this patch is much better than mine, so I'm
      quoting it verbatim here:
      
      DAX currently doesn't set errors in the mapping when cache flushing
      fails in dax_writeback_mapping_range(). Since this function can get
      called only from fsync(2) or sync(2), this is actually as good as it can
      currently get since we correctly propagate the error up from
      dax_writeback_mapping_range() to filemap_fdatawrite().
      
      However, in the future better writeback error handling will enable us to
      properly report these errors on fsync(2) even if there are multiple file
      descriptors open against the file or if sync(2) gets called before
      fsync(2). So convert DAX to using standard error reporting through the
      mapping.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
    • fs: new infrastructure for writeback error handling and reporting · 5660e13d
      Committed by Jeff Layton
      Most filesystems currently use mapping_set_error and
      filemap_check_errors for setting and reporting/clearing writeback errors
      at the mapping level. filemap_check_errors is indirectly called from
      most of the filemap_fdatawait_* functions and from
      filemap_write_and_wait*. These functions are called from all sorts of
      contexts to wait on writeback to finish -- e.g. mostly in fsync, but
      also in truncate calls, getattr, etc.
      
      The non-fsync callers are problematic. We should be reporting writeback
      errors during fsync, but many places spread over the tree clear out
      errors before they can be properly reported, or report errors at
      nonsensical times.
      
      If I get -EIO on a stat() call, there is no reason for me to assume that
      it is because some previous writeback failed. The fact that it also
      clears out the error such that a subsequent fsync returns 0 is a bug,
      and a nasty one since that's potentially silent data corruption.
      
      This patch adds a small bit of new infrastructure for setting and
      reporting errors during address_space writeback. While the above was my
      original impetus for adding this, I think it's also the case that
      current fsync semantics are just problematic for userland. Most
      applications that call fsync do so to ensure that the data they wrote
      has hit the backing store.
      
      In the case where there are multiple writers to the file at the same
      time, this is really hard to determine. The first one to call fsync will
      see any stored error, and the rest get back 0. The processes with open
      fds may not be associated with one another in any way. They could even
      be in different containers, so ensuring coordination between all fsync
      callers is not really an option.
      
      One way to remedy this would be to track what file descriptor was used
      to dirty the file, but that's rather cumbersome and would likely be
      slow. However, there is a simpler way to improve the semantics here
      without incurring too much overhead.
      
      This set adds an errseq_t to struct address_space, and a corresponding
      one is added to struct file. Writeback errors are recorded in the
      mapping's errseq_t, and the one in struct file is used as the "since"
      value.
      
      This changes the semantics of the Linux fsync implementation such that
      applications can now use it to determine whether there were any
      writeback errors since fsync(fd) was last called (or since the file was
      opened in the case of fsync having never been called).
      
      Note that those writeback errors may have occurred when writing data
      that was dirtied via an entirely different fd, but that's the case now
      with the current mapping_set_error/filemap_check_errors infrastructure.
      This will at least prevent you from getting a false report of success.
      
      The new behavior is still consistent with the POSIX spec, and is more
      reliable for application developers. This patch just adds some basic
      infrastructure for doing this, and ensures that the f_wb_err "cursor"
      is properly set when a file is opened. Later patches will change the
      existing code to use this new infrastructure for reporting errors at
      fsync time.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
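The "since" cursor idea described in this message can be modeled in a few lines of userspace C. This is a deliberately simplified sketch: the real errseq_t packs its state into a single value and is updated atomically, while the hypothetical `demo_mapping` and `demo_file` types below use a plain counter and no locking.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the per-mapping error state. */
struct demo_mapping {
    unsigned int wb_seq;   /* bumped each time an error is recorded */
    int wb_err;            /* most recent error, as a negative errno */
};

/* Simplified stand-in for an open file with its own cursor. */
struct demo_file {
    struct demo_mapping *mapping;
    unsigned int since;    /* value of wb_seq this file has already seen */
};

/* On open, sample the current sequence so older errors are not reported. */
static void demo_open(struct demo_file *f, struct demo_mapping *m)
{
    assert(f != NULL && m != NULL);
    f->mapping = m;
    f->since = m->wb_seq;
}

/* Writeback failure: store the error and advance the sequence. */
static void demo_record_error(struct demo_mapping *m, int err)
{
    assert(err < 0);
    m->wb_err = err;
    m->wb_seq++;
}

/* fsync-time check: report a new error once per file, then catch up. */
static int demo_check_and_advance(struct demo_file *f)
{
    if (f->since == f->mapping->wb_seq)
        return 0;
    f->since = f->mapping->wb_seq;
    return f->mapping->wb_err;
}
```

With two files open on the same mapping, one recorded error is returned by the next check on each file, and a second check on either file returns 0. Under the old flag-based scheme only the first caller would have seen the error.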
    • jbd2: don't clear and reset errors after waiting on writeback · 76341cab
      Committed by Jeff Layton
      Resetting this flag is almost certainly racy, and will be problematic
      with some coming changes.
      
      Make filemap_fdatawait_keep_errors return int, but not clear the flag(s).
      Have jbd2 call it instead of filemap_fdatawait and don't attempt to
      re-set the error flag if it fails.
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
    • buffer: set errors in mapping at the time that the error occurs · 87354e5d
      Committed by Jeff Layton
      I noticed on xfs that I could still sometimes get back an error on fsync
      on a fd that was opened after the error condition had been cleared.
      
      The problem is that the buffer code sets the write_io_error flag and
      then later checks that flag to set the error in the mapping. That flag
      persists for quite a while, however. If the file is later opened with
      O_TRUNC, the buffers will then be invalidated and the mapping's error
      set such that a subsequent fsync will return error. I think this is
      incorrect, as there was no writeback between the open and fsync.
      
      Add a new mark_buffer_write_io_error operation that sets the flag and
      the error in the mapping at the same time. Replace all calls to
      set_buffer_write_io_error with mark_buffer_write_io_error, and remove
      the places that check this flag in order to set the error in the
      mapping.
      
      This sets the error in the mapping earlier, at the time that it's first
      detected.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    • fs: check for writeback errors after syncing out buffers in generic_file_fsync · dac257f7
      Committed by Jeff Layton
      ext2 currently does a test+clear of the AS_EIO flag, which is
      problematic for some coming changes.
      
      What we really need to do instead is call filemap_check_errors
      in __generic_file_fsync after syncing out the buffers. That
      will be sufficient for this case, and help other callers detect
      these errors properly as well.
      
      With that, we don't need to twiddle it in ext2.
      Suggested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
    • CIFS: fix circular locking dependency · 966681c9
      Committed by Rabin Vincent
      When a CIFS filesystem is mounted with the forcemand option and the
      following command is run on it, lockdep warns about a circular locking
      dependency between CifsInodeInfo::lock_sem and the inode lock.
      
       while echo foo > hello; do :; done & while touch -c hello; do :; done
      
      cifs_writev() takes the locks in the wrong order, but note that we can't
      simply flip the order around, because it releases the inode lock before
      the call to generic_write_sync() while it holds the lock_sem across that
      call.
      
      But, AFAICS, there is no need to hold the CifsInodeInfo::lock_sem across
      the generic_write_sync() call either, so we can release both the locks
      before generic_write_sync(), and change the order.
      
       ======================================================
       WARNING: possible circular locking dependency detected
       4.12.0-rc7+ #9 Not tainted
       ------------------------------------------------------
       touch/487 is trying to acquire lock:
        (&cifsi->lock_sem){++++..}, at: cifsFileInfo_put+0x88f/0x16a0
      
       but task is already holding lock:
        (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&sb->s_type->i_mutex_key#11){+.+.+.}:
              __lock_acquire+0x1f74/0x38f0
              lock_acquire+0x1cc/0x600
              down_write+0x74/0x110
              cifs_strict_writev+0x3cb/0x8c0
              __vfs_write+0x4c1/0x930
              vfs_write+0x14c/0x2d0
              SyS_write+0xf7/0x240
              entry_SYSCALL_64_fastpath+0x1f/0xbe
      
       -> #0 (&cifsi->lock_sem){++++..}:
              check_prevs_add+0xfa0/0x1d10
              __lock_acquire+0x1f74/0x38f0
              lock_acquire+0x1cc/0x600
              down_write+0x74/0x110
              cifsFileInfo_put+0x88f/0x16a0
              cifs_setattr+0x992/0x1680
              notify_change+0x61a/0xa80
              utimes_common+0x3d4/0x870
              do_utimes+0x1c1/0x220
              SyS_utimensat+0x84/0x1a0
              entry_SYSCALL_64_fastpath+0x1f/0xbe
      
       other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&sb->s_type->i_mutex_key#11);
                                      lock(&cifsi->lock_sem);
                                      lock(&sb->s_type->i_mutex_key#11);
         lock(&cifsi->lock_sem);
      
        *** DEADLOCK ***
      
       2 locks held by touch/487:
        #0:  (sb_writers#10){.+.+.+}, at: mnt_want_write+0x41/0xb0
        #1:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: utimes_common+0x3ad/0x870
      
       stack backtrace:
       CPU: 0 PID: 487 Comm: touch Not tainted 4.12.0-rc7+ #9
       Call Trace:
        dump_stack+0xdb/0x185
        print_circular_bug+0x45b/0x790
        __lock_acquire+0x1f74/0x38f0
        lock_acquire+0x1cc/0x600
        down_write+0x74/0x110
        cifsFileInfo_put+0x88f/0x16a0
        cifs_setattr+0x992/0x1680
        notify_change+0x61a/0xa80
        utimes_common+0x3d4/0x870
        do_utimes+0x1c1/0x220
        SyS_utimensat+0x84/0x1a0
        entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Fixes: 19dfc1f5 ("cifs: fix the race in cifs_writev()")
      Signed-off-by: Rabin Vincent <rabinv@axis.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
    • cifs: set oparms.create_options rather than or'ing in CREATE_OPEN_BACKUP_INTENT · 709340a0
      Committed by Colin Ian King
      Currently oparms.create_options is uninitialized and the code is logically
      or'ing in CREATE_OPEN_BACKUP_INTENT onto a garbage value of
      oparms.create_options from the stack.  Fix this by just setting the value
      rather than or'ing in the setting.
      
      Detected by CoverityScan, CID#1447220 ("Uninitialized scale value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
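The difference between assigning a flag and OR-ing it into a field that was never initialised can be sketched as follows. `demo_open_parms`, `demo_prepare_parms` and the flag value `DEMO_BACKUP_INTENT` are illustrative assumptions, not the CIFS definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag value, chosen for the example only. */
#define DEMO_BACKUP_INTENT 0x4000u

/* Hypothetical open parameters, typically declared on the caller's stack. */
struct demo_open_parms {
    unsigned int create_options;
    unsigned int desired_access;
};

/*
 * Fill in parameters the caller has not initialised. Using '=' gives the
 * field a defined value; '|=' would first read whatever happened to be in
 * memory and keep those stray bits.
 */
static void demo_prepare_parms(struct demo_open_parms *p, int backup_intent)
{
    assert(p != NULL);
    p->create_options = backup_intent ? DEMO_BACKUP_INTENT : 0;
    p->desired_access = 0;
}
```

An alternative with the same effect is to zero the whole structure when it is declared, after which OR-ing in flags is safe.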
    • cifs: Do not modify mid entry after submitting I/O in cifs_call_async · 93d2cb6c
      Committed by Long Li
      In cifs_call_async, the server may respond as soon as I/O is submitted.
      Because the mid entry is freed on the return path, it should not be
      modified after I/O is submitted.
      
      cifs_save_when_sent modifies the sent timestamp in the mid entry, and
      should not be called after I/O is submitted. Call it before I/O.
      Signed-off-by: Long Li <longli@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: Steve French <smfrench@gmail.com>