1. 11 Jul, 2017 7 commits
  2. 10 Jul, 2017 3 commits
    • G
      btrfs: nowait aio: Correct assignment of pos · ff0fa732
      Committed by Goldwyn Rodrigues
      Assigning pos for usage early messes up in append mode, where the pos is
      re-assigned in generic_write_checks(). Assign pos later to get the
      correct position to write from iocb->ki_pos.
      
      Since check_can_nocow also uses the value of pos, we shift
      generic_write_checks() before check_can_nocow(). Checks with IOCB_DIRECT
      are present in generic_write_checks(), so checking for IOCB_NOWAIT is
      enough.
      
      Also, put locking sequence in the fast path.
      
      This fixes a user visible bug, as reported:
      
      "apparently breaks several shell related features on my system.
      In zsh history stopped working, because no new entries are added
      anymore.
      I first noticed the issue when I tried to build mplayer. It uses a shell
      script to generate a help_mp.h file:
      [...]
      
      Here is a simple testcase:
      
       % echo "foo" >> test
       % echo "foo" >> test
       % cat test
       foo
       %
      "
      
      Fixes: edf064e7 ("btrfs: nowait aio support")
      CC: Jens Axboe <axboe@kernel.dk>
      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Link: https://lkml.kernel.org/r/20170704042306.GA274@x4
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      ff0fa732
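
The ordering problem described in this commit can be modelled in user space. The following is a minimal Python sketch, not the btrfs code: `Iocb`, `write_buggy`, `write_fixed` and `append_twice` are hypothetical names, and `generic_write_checks` here models only the one side effect that matters (moving the position to end of file in append mode).

```python
from dataclasses import dataclass


@dataclass
class Iocb:
    """Toy stand-in for struct kiocb: a position plus an append flag."""
    ki_pos: int
    append: bool


def generic_write_checks(iocb: Iocb, file_size: int) -> None:
    """Model of the relevant side effect: append mode moves ki_pos to EOF."""
    if iocb.append:
        iocb.ki_pos = file_size


def write_buggy(data: bytearray, iocb: Iocb, buf: bytes) -> None:
    pos = iocb.ki_pos                      # captured too early
    generic_write_checks(iocb, len(data))  # ki_pos changes, pos is now stale
    data[pos:pos + len(buf)] = buf


def write_fixed(data: bytearray, iocb: Iocb, buf: bytes) -> None:
    generic_write_checks(iocb, len(data))
    pos = iocb.ki_pos                      # read only after the checks
    data[pos:pos + len(buf)] = buf


def append_twice(write) -> bytes:
    """Mimic running `echo "foo" >> test` twice against an empty file."""
    data = bytearray()
    for _ in range(2):
        # Each shell redirection opens the file afresh, so ki_pos starts at 0.
        write(data, Iocb(ki_pos=0, append=True), b"foo\n")
    return bytes(data)
```

With the early assignment the second write lands on top of the first, which matches the single `foo` in the reported testcase; reading the position after the checks yields both lines.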
    • D
      afs: Add metadata xattrs · d3e3b7ea
      Committed by David Howells
      Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
      available.  The following xattrs are now available:
      
       - "afs.cell"
      
         The name of the cell in which the vnode's volume resides.
      
       - "afs.fid"
      
         The volume ID, vnode ID and vnode uniquifier of the file as three hex
         numbers separated by colons.
      
       - "afs.volume"
      
         The name of the volume in which the vnode resides.
      
      For example:
      
      	# getfattr -d -m ".*" /mnt/scratch
      	getfattr: Removing leading '/' from absolute path names
      	# file: mnt/scratch
      	afs.cell="mycell.myorg.org"
      	afs.fid="10000b:1:1"
      	afs.volume="scratch"
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3e3b7ea
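
A consumer of the "afs.fid" value has to split it into its three hex fields. The sketch below is one possible way to do that in Python; `AfsFid` and `parse_afs_fid` are hypothetical helper names, and the only format assumption is the one stated in the commit message (three hex numbers separated by colons).

```python
from typing import NamedTuple


class AfsFid(NamedTuple):
    volume_id: int
    vnode_id: int
    uniquifier: int


def parse_afs_fid(value: str) -> AfsFid:
    """Parse 'volume:vnode:uniquifier', each field written in hex."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated fields, got {value!r}")
    return AfsFid(*(int(part, 16) for part in parts))
```

On Linux the raw value could be fetched with `os.getxattr(path, "afs.fid")`, which returns bytes that need decoding before being passed to the parser.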
    • M
      afs: Ignore AFS_ACE_READ and AFS_ACE_WRITE for directories · fd249821
      Committed by Marc Dionne
      The AFS_ACE_READ and AFS_ACE_WRITE permission bits should not
      be used to make access decisions for the directory itself.  They
      are meant to control access for the objects contained in that
      directory.
      
      Reading a directory is allowed if the AFS_ACE_LOOKUP bit is set.
      This would cause an incorrect access denied error for a directory
      with AFS_ACE_LOOKUP but not AFS_ACE_READ.
      
      The AFS_ACE_WRITE bit does not allow operations that modify the
      directory.  For a directory with AFS_ACE_WRITE but neither
      AFS_ACE_INSERT nor AFS_ACE_DELETE, this would result in trying
      operations that would ultimately be denied by the server.
      Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fd249821
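
The rule described above can be written as two small predicates. This is a simplified Python model rather than the kernel's permission code: the function names are hypothetical, and the bit values are the ones I believe fs/afs/afs.h uses, so treat them as illustrative.

```python
# Assumed bit values for the AFS access control entry flags (illustrative).
AFS_ACE_READ = 0x01
AFS_ACE_WRITE = 0x02
AFS_ACE_INSERT = 0x04
AFS_ACE_LOOKUP = 0x08
AFS_ACE_DELETE = 0x10


def dir_may_read(access: int) -> bool:
    """Reading a directory depends on LOOKUP only; READ is ignored."""
    return bool(access & AFS_ACE_LOOKUP)


def dir_may_modify(access: int) -> bool:
    """Modifying a directory needs INSERT or DELETE; WRITE is ignored."""
    return bool(access & (AFS_ACE_INSERT | AFS_ACE_DELETE))
```

A directory with only `AFS_ACE_LOOKUP` is readable under this model, and one with only `AFS_ACE_WRITE` is not treated as modifiable, so the client no longer attempts operations the server would refuse.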
  3. 08 Jul, 2017 5 commits
    • K
      exec: Limit arg stack to at most 75% of _STK_LIM · da029c11
      Committed by Kees Cook
      To avoid pathological stack usage or the need to special-case setuid
      execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB).
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      da029c11
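
The 6MB figure follows from simple arithmetic. The Python sketch below assumes `_STK_LIM` is 8 MiB (the value implied by "75% ... (6MB)") and uses hypothetical names; it shows only the cap itself, not any other limit the kernel may apply.

```python
STK_LIM = 8 * 1024 * 1024  # assumed value of _STK_LIM: 8 MiB


def arg_stack_limit(stk_lim: int = STK_LIM) -> int:
    """Return 75% of the stack limit using integer arithmetic."""
    return stk_lim // 4 * 3


def args_fit(arg_bytes: int) -> bool:
    """True if the argument/environment strings stay within the cap."""
    return arg_bytes <= arg_stack_limit()
```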
    • D
      xfs: don't crash on unexpected holes in dir/attr btrees · cd87d867
      Committed by Darrick J. Wong
      In quite a few places we call xfs_da_read_buf with a mappedbno that we
      don't control, then assume that the function passes back either an error
      code or a buffer pointer.  Unfortunately, if mappedbno == -2 and bno
      maps to a hole, we get a return code of zero and a NULL buffer, which
      means that we crash if we actually try to use that buffer pointer.  This
      happens immediately when we set the buffer type for transaction context.
      
      Therefore, check that we have no error code and a non-NULL bp before
      trying to use bp.  This patch is a follow-up to an incomplete fix in
      96a3aefb ("xfs: don't crash if reading a directory results in an
      unexpected hole").
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      cd87d867
    • A
      dentry name snapshots · 49d31c2f
      Committed by Al Viro
      take_dentry_name_snapshot() takes a safe snapshot of dentry name;
      if the name is a short one, it gets copied into caller-supplied
      structure, otherwise an extra reference to external name is grabbed
      (those are never modified).  In either case the pointer to stable
      string is stored into the same structure.
      
      dentry must be held by the caller of take_dentry_name_snapshot(),
      but may be freely dropped afterwards - the snapshot will stay
      until destroyed by release_dentry_name_snapshot().
      
      Intended use:
      	struct name_snapshot s;
      
      	take_dentry_name_snapshot(&s, dentry);
      	...
      	access s.name
      	...
      	release_dentry_name_snapshot(&s);
      
      Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
      to pass down with event.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      49d31c2f
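
The copy-or-reference behaviour described above can be illustrated with a toy Python model. Everything here is an assumption made for illustration: the `INLINE_MAX` cutoff, the `Dentry`, `ExternalName` and `NameSnapshot` classes, and the explicit reference counter only mimic the idea and do not reflect the kernel's data layout.

```python
from dataclasses import dataclass
from typing import Optional

INLINE_MAX = 32  # hypothetical cutoff between "short" and "long" names


@dataclass
class ExternalName:
    text: str
    refcount: int = 1  # the owning dentry holds the first reference


class Dentry:
    def __init__(self, name: str) -> None:
        self.inline: Optional[str] = None
        self.external: Optional[ExternalName] = None
        self._set_name(name)

    def _set_name(self, name: str) -> None:
        short = len(name) < INLINE_MAX
        self.inline = name if short else None
        self.external = None if short else ExternalName(name)

    def rename(self, new_name: str) -> None:
        if self.external is not None:
            self.external.refcount -= 1  # dentry drops its old external name
        self._set_name(new_name)


class NameSnapshot:
    def __init__(self) -> None:
        self.name: Optional[str] = None
        self.external: Optional[ExternalName] = None


def take_dentry_name_snapshot(snap: NameSnapshot, dentry: Dentry) -> None:
    if dentry.external is not None:
        dentry.external.refcount += 1    # long name: grab an extra reference
        snap.external = dentry.external
        snap.name = dentry.external.text
    else:
        snap.external = None
        snap.name = dentry.inline        # short name: keep a private copy


def release_dentry_name_snapshot(snap: NameSnapshot) -> None:
    if snap.external is not None:
        snap.external.refcount -= 1
        snap.external = None
    snap.name = None
```

In this model a rename after the snapshot leaves `snap.name` unchanged in both cases, which is the property fsnotify relies on when passing the old name down with an event.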
    • L
      vfs: fix flock compat thinko · b59eea55
      Committed by Linus Torvalds
      Michael Ellerman reported that commit 8c6657cb ("Switch flock
      copyin/copyout primitives to copy_{from,to}_user()") broke his
      networking on a bunch of PPC machines (64-bit kernel, 32-bit userspace).
      
      The reason is a brown-paper bug by that commit, which had the arguments
      to "copy_flock_fields()" in the wrong order, breaking the compat
      handling for file locking.  Apparently very few people run 32-bit user
      space on x86 any more, so the PPC people got the honor of noticing this
      "feature".
      
      Michael also sent a minimal diff that just changed the order of the
      arguments in that macro.
      
      This is not that minimal diff.
      
      This not only changes the order of the arguments in the macro, it also
      changes them to be pointers (to be consistent with all the other uses of
      those pointers), and makes the functions that do all of this also have
      the proper "const" attribution on the source pointers in order to make
      issues like that (using the source as a destination) be really obvious.
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b59eea55
    • A
      gfs2: Fix glock rhashtable rcu bug · 961ae1d8
      Committed by Andreas Gruenbacher
      Before commit 88ffbf3e "GFS2: Use resizable hash table for glocks",
      glocks were freed via call_rcu to allow reading the glock hashtable
      locklessly using rcu.  This was then changed to free glocks immediately,
      which made reading the glock hashtable unsafe.  Bring back the original
      code for freeing glocks via call_rcu.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Cc: stable@vger.kernel.org # 4.3+
      961ae1d8
  4. 07 Jul, 2017 11 commits
  5. 06 Jul, 2017 14 commits
    • A
      move file_{start,end}_write() out of do_iter_write() · 62473a2d
      Committed by Al Viro
      ... and do *not* grab it in vfs_write_iter().
      
      Fixes: "fs: implement vfs_iter_read using do_iter_read"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      62473a2d
    • J
      btrfs: minimal conversion to errseq_t writeback error reporting on fsync · 333427a5
      Committed by Jeff Layton
      Just check and advance the errseq_t in the file before returning, and
      use an errseq_t based check for writeback errors.
      
      Other internal callers of filemap_* functions are left as-is.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      333427a5
    • J
      xfs: minimal conversion to errseq_t writeback error reporting · 1b180274
      Committed by Jeff Layton
      Just check and advance the data errseq_t in struct file before
      returning from fsync on normal files. Internal filemap_*
      callers are left as-is.
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      1b180274
    • J
      ext4: use errseq_t based error handling for reporting data writeback errors · 6acec592
      Committed by Jeff Layton
      Add a call to filemap_report_wb_err at the end of ext4_sync_file. This
      will ensure that we check and advance the errseq_t in the file, which
      allows us to track and report errors on all open fds when they occur.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      6acec592
    • J
      fs: convert __generic_file_fsync to use errseq_t based reporting · 383aa543
      Committed by Jeff Layton
      Many simple, block-based filesystems use generic_file_fsync as their
      fsync operation. Some others (ext* and fat) also call this function
      to handle syncing out data.
      
      Switch this code over to use errseq_t based error reporting so that
      all of these filesystems get reliable error reporting via fsync,
      fdatasync and msync.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      383aa543
    • J
      block: convert to errseq_t based writeback error tracking · 372cf243
      Committed by Jeff Layton
      This is a very minimal conversion to errseq_t based error tracking
      for raw block device access. Just have it use the standard
      file_write_and_wait_range call.
      
      Note that there are internal callers that call sync_blockdev
      and the like that are not affected by this. They'll continue
      to use the AS_EIO/AS_ENOSPC flags for error reporting like
      they always have for now.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      372cf243
    • J
      dax: set errors in mapping when writeback fails · 819ec6b9
      Committed by Jeff Layton
      Jan Kara's description for this patch is much better than mine, so I'm
      quoting it verbatim here:
      
      DAX currently doesn't set errors in the mapping when cache flushing
      fails in dax_writeback_mapping_range(). Since this function can get
      called only from fsync(2) or sync(2), this is actually as good as it can
      currently get since we correctly propagate the error up from
      dax_writeback_mapping_range() to filemap_fdatawrite()
      
      However, in the future better writeback error handling will enable us to
      properly report these errors on fsync(2) even if there are multiple file
      descriptors open against the file or if sync(2) gets called before
      fsync(2). So convert DAX to using standard error reporting through the
      mapping.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-and-tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      819ec6b9
    • J
      fs: new infrastructure for writeback error handling and reporting · 5660e13d
      Committed by Jeff Layton
      Most filesystems currently use mapping_set_error and
      filemap_check_errors for setting and reporting/clearing writeback errors
      at the mapping level. filemap_check_errors is indirectly called from
      most of the filemap_fdatawait_* functions and from
      filemap_write_and_wait*. These functions are called from all sorts of
      contexts to wait on writeback to finish -- e.g. mostly in fsync, but
      also in truncate calls, getattr, etc.
      
      The non-fsync callers are problematic. We should be reporting writeback
      errors during fsync, but many places spread over the tree clear out
      errors before they can be properly reported, or report errors at
      nonsensical times.
      
      If I get -EIO on a stat() call, there is no reason for me to assume that
      it is because some previous writeback failed. The fact that it also
      clears out the error such that a subsequent fsync returns 0 is a bug,
      and a nasty one since that's potentially silent data corruption.
      
      This patch adds a small bit of new infrastructure for setting and
      reporting errors during address_space writeback. While the above was my
      original impetus for adding this, I think it's also the case that
      current fsync semantics are just problematic for userland. Most
      applications that call fsync do so to ensure that the data they wrote
      has hit the backing store.
      
      In the case where there are multiple writers to the file at the same
      time, this is really hard to determine. The first one to call fsync will
      see any stored error, and the rest get back 0. The processes with open
      fds may not be associated with one another in any way. They could even
      be in different containers, so ensuring coordination between all fsync
      callers is not really an option.
      
      One way to remedy this would be to track what file descriptor was used
      to dirty the file, but that's rather cumbersome and would likely be
      slow. However, there is a simpler way to improve the semantics here
      without incurring too much overhead.
      
      This set adds an errseq_t to struct address_space, and a corresponding
      one is added to struct file. Writeback errors are recorded in the
      mapping's errseq_t, and the one in struct file is used as the "since"
      value.
      
      This changes the semantics of the Linux fsync implementation such that
      applications can now use it to determine whether there were any
      writeback errors since fsync(fd) was last called (or since the file was
      opened in the case of fsync having never been called).
      
      Note that those writeback errors may have occurred when writing data
      that was dirtied via an entirely different fd, but that's the case now
      with the current mapping_set_error/filemap_check_error infrastructure.
      This will at least prevent you from getting a false report of success.
      
      The new behavior is still consistent with the POSIX spec, and is more
      reliable for application developers. This patch just adds some basic
      infrastructure for doing this, and ensures that the f_wb_err "cursor"
      is properly set when a file is opened. Later patches will change the
      existing code to use this new infrastructure for reporting errors at
      fsync time.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      5660e13d
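
The "since" semantics described in this commit can be sketched with a small Python model. It is deliberately simplified: the real errseq_t packs an error code, a flag and a counter into one integer, whereas this sketch uses a plain `(sequence, errno)` tuple. `Mapping`, `File`, `set_error` and `fsync` are hypothetical stand-ins for struct address_space, struct file and the kernel helpers.

```python
import errno


class Mapping:
    """Toy address_space: wb_err holds (sequence counter, last errno)."""

    def __init__(self) -> None:
        self.wb_err = (0, 0)

    def set_error(self, err: int) -> None:
        # Record a writeback failure and bump the sequence so that every
        # open file can tell that something happened since it last looked.
        seq, _ = self.wb_err
        self.wb_err = (seq + 1, err)


class File:
    """Toy struct file: f_wb_err is the per-file 'since' cursor."""

    def __init__(self, mapping: Mapping) -> None:
        self.mapping = mapping
        self.f_wb_err = mapping.wb_err  # sample the cursor at open time

    def fsync(self) -> int:
        current = self.mapping.wb_err
        if current == self.f_wb_err:
            return 0                    # nothing new since the last check
        self.f_wb_err = current         # advance the cursor
        return -current[1]              # report the error exactly once
```

Under this model two files open on the same mapping each see a writeback error once on their next `fsync()`, instead of the first caller consuming it, while a file opened after the error starts with a fresh cursor and gets 0.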
    • J
      jbd2: don't clear and reset errors after waiting on writeback · 76341cab
      Committed by Jeff Layton
      Resetting this flag is almost certainly racy, and will be problematic
      with some coming changes.
      
      Make filemap_fdatawait_keep_errors return int, but not clear the flag(s).
      Have jbd2 call it instead of filemap_fdatawait and don't attempt to
      re-set the error flag if it fails.
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      76341cab
    • J
      buffer: set errors in mapping at the time that the error occurs · 87354e5d
      Committed by Jeff Layton
      I noticed on xfs that I could still sometimes get back an error on fsync
      on a fd that was opened after the error condition had been cleared.
      
      The problem is that the buffer code sets the write_io_error flag and
      then later checks that flag to set the error in the mapping. That flag
      persists for quite a while however. If the file is later opened with
      O_TRUNC, the buffers will then be invalidated and the mapping's error
      set such that a subsequent fsync will return error. I think this is
      incorrect, as there was no writeback between the open and fsync.
      
      Add a new mark_buffer_write_io_error operation that sets the flag and
      the error in the mapping at the same time. Replace all calls to
      set_buffer_write_io_error with mark_buffer_write_io_error, and remove
      the places that check this flag in order to set the error in the
      mapping.
      
      This sets the error in the mapping earlier, at the time that it's first
      detected.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      87354e5d
    • J
      fs: check for writeback errors after syncing out buffers in generic_file_fsync · dac257f7
      Committed by Jeff Layton
      ext2 currently does a test+clear of the AS_EIO flag, which is
      problematic for some coming changes.
      
      What we really need to do instead is call filemap_check_errors
      in __generic_file_fsync after syncing out the buffers. That
      will be sufficient for this case, and help other callers detect
      these errors properly as well.
      
      With that, we don't need to twiddle it in ext2.
      Suggested-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
      dac257f7
    • T
      ext4: fix __ext4_new_inode() journal credits calculation · af65207c
      Committed by Tahsin Erdogan
      ea_inode feature allows creating extended attributes that are up to
      64k in size. Update __ext4_new_inode() to pick increased credit limits.
      
      To avoid overallocating too many journal credits, update
      __ext4_xattr_set_credits() to make a distinction between xattr create
      vs update. This helps __ext4_new_inode() because all attributes are
      known to be new, so we can save credits that are normally needed to
      delete old values.
      
      Also, have fscrypt specify its maximum context size so that we don't
      end up allocating credits for 64k size.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      af65207c
    • T
      ext4: skip ext4_init_security() and encryption on ea_inodes · ad47f953
      Committed by Tahsin Erdogan
      Extended attribute inodes are internal to ext4. Adding encryption/security
      related attributes on them would mean dealing with nested calls into ea code.
      Since they have no direct exposure to user mode, just avoid creating ea
      entries for them.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      ad47f953