1. 12 Jul, 2018 1 commit
    • introduce FMODE_CREATED and switch to it · 73a09dd9
      Al Viro committed
      Parallel to FILE_CREATED, goes into ->f_mode instead of *opened.
      NFS is a bit of a wart here - it doesn't have file at the point
      where FILE_CREATED used to be set, so we need to propagate it
      there (for now).  IMA is another one (here and everywhere)...
      
      Note that this needs do_dentry_open() to leave old bits in ->f_mode
      alone - we want it to preserve FMODE_CREATED if it had been already
      set (no other bit can be there).
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
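The idea in this commit message can be sketched in user-space C: the "created" fact travels in the file's own mode bits instead of a separate out-parameter, and the open helper ORs in new bits rather than overwriting what is already there. Every name here (`toy_file`, `TOY_FMODE_*`, `toy_do_open`) and every bit value is a hypothetical stand-in, not the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define TOY_FMODE_READ    0x1u
#define TOY_FMODE_WRITE   0x2u
#define TOY_FMODE_CREATED 0x100u

struct toy_file {
    unsigned int f_mode;
};

/* Called by the (toy) filesystem at the point where it creates the file. */
static void toy_mark_created(struct toy_file *f)
{
    f->f_mode |= TOY_FMODE_CREATED;
}

/*
 * Toy analogue of the open helper: it adds the access bits but leaves
 * an already-set CREATED bit alone, as the commit message requires.
 */
static void toy_do_open(struct toy_file *f, unsigned int access)
{
    /* No bit other than CREATED may be set before the open. */
    assert((f->f_mode & ~TOY_FMODE_CREATED) == 0);
    f->f_mode |= access;
}

/* Later consumers ask the file itself whether it was created. */
static bool toy_was_created(const struct toy_file *f)
{
    return (f->f_mode & TOY_FMODE_CREATED) != 0;
}
```

If `toy_do_open()` assigned `f->f_mode = access` instead, the creation bit set earlier would be lost, which is the hazard the note about do_dentry_open() describes.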
  2. 05 Jun, 2018 1 commit
  3. 01 Jun, 2018 1 commit
  4. 29 May, 2018 3 commits
  5. 11 Apr, 2018 4 commits
  6. 28 Nov, 2017 1 commit
    • Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds committed
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
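The distinction drawn above (MS_* is what mount(2) receives, SB_* is what lives in sb->s_flags, with identical values for now) can be illustrated with a small user-space sketch. The `EX_` names, the `ex_super_block` struct, and the `ex_sb_flags_from_mount()` helper are hypothetical; only the two numeric values are taken from the long-standing mount(2) flag definitions.

```c
/* mount(2) flag values as seen by user space (two of them, for brevity). */
#define EX_MS_RDONLY      1UL
#define EX_MS_SYNCHRONOUS 16UL

/* Internal superblock flags: same values for now, different namespace. */
#define EX_SB_RDONLY      1UL
#define EX_SB_SYNCHRONOUS 16UL

/* The rename is only safe as a pure search-and-replace if these hold. */
_Static_assert(EX_SB_RDONLY == EX_MS_RDONLY, "SB_* mirrors MS_* for now");
_Static_assert(EX_SB_SYNCHRONOUS == EX_MS_SYNCHRONOUS, "SB_* mirrors MS_* for now");

struct ex_super_block {
    unsigned long s_flags;
};

/* Hypothetical helper: keep only the bits that describe the superblock. */
static unsigned long ex_sb_flags_from_mount(unsigned long ms_flags)
{
    return ms_flags & (EX_SB_RDONLY | EX_SB_SYNCHRONOUS);
}

/* Code that inspects sb->s_flags uses the SB_* spelling. */
static int ex_sb_rdonly(const struct ex_super_block *sb)
{
    return (sb->s_flags & EX_SB_RDONLY) != 0;
}
```

Keeping the two namespaces apart means the internal values could later diverge from the system-call ABI without touching code that only looks at the superblock.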
  7. 18 Nov, 2017 2 commits
  8. 25 Oct, 2017 1 commit
    • locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE() · 6aa7de05
      Mark Rutland committed
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
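The two rewrite rules in the coccinelle script can be seen on a concrete before/after pair. The macros below are simplified user-space stand-ins built on a volatile cast and the GNU `__typeof__` extension; the kernel's real definitions are more elaborate, and the `TOY_` names are hypothetical.

```c
/* Simplified stand-ins; assumes a compiler that supports __typeof__. */
#define TOY_ACCESS_ONCE(x)     (*(volatile __typeof__(x) *)&(x))
#define TOY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int toy_flag;

/* Before: one accessor serves as both the store target and the load. */
static int toy_old_style(int v)
{
    TOY_ACCESS_ONCE(toy_flag) = v;     /* matched by the first rule */
    return TOY_ACCESS_ONCE(toy_flag);  /* matched by the second rule */
}

/* After: the same accesses, with the store and the load spelled apart. */
static int toy_new_style(int v)
{
    TOY_WRITE_ONCE(toy_flag, v);
    return TOY_READ_ONCE(toy_flag);
}
```

The order of the rules matters: the assignment form has to be rewritten first, otherwise the generic rule would turn the left-hand side of a store into a read accessor.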
  9. 17 Oct, 2017 4 commits
    • NFS: remove special-case revalidate in nfs_opendir() · 1fea73ac
      NeilBrown committed
      Commit f5a73672 ("NFS: allow close-to-open cache semantics to
      apply to root of NFS filesystem") added a call to
      __nfs_revalidate_inode() to nfs_opendir(), as the lookup
      process wouldn't reliably do this.

      Subsequent commit a3fbbde7 ("VFS: we need to set LOOKUP_JUMPED
      on mountpoint crossing") made this unnecessary.  So remove the
      unnecessary code.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: revalidate "." etc correctly on "open". · b688741c
      NeilBrown committed
      For correct close-to-open semantics, NFS must validate
      the change attribute of a directory (or file) on open.
      
      Since commit ecf3d1f1 ("vfs: kill FS_REVAL_DOT by adding a
      d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
      not revalidated reliably (except when that directory is a mount point).
      
      Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
      which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
      set.
      Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
      ignores the flags) and nothing is used for NFSv4.
      
      This is fixed by using nfs_lookup_verify_inode() in
      nfs_weak_revalidate().  This does the revalidation exactly when needed.
      Also, add a definition of .d_weak_revalidate for NFSv4.
      
      The incorrect behavior is easily demonstrated by running "echo *" in
      some non-mountpoint NFS directory while watching network traffic.
      Without this patch, "echo *" sometimes doesn't produce any traffic.
      With the patch it always does.
      
      Fixes: ecf3d1f1 ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
      cc: stable@vger.kernel.org (3.9+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: Don't compare apples to elephants to determine access bits · 1750d929
      Anna Schumaker committed
      The NFS_ACCESS_* flags aren't a 1:1 mapping to the MAY_* flags, so
      checking for MAY_WHATEVER might have surprising results in
      nfs*_proc_access().  Let's simplify this check when determining which
      bits to ask for, and do it in a generic place instead of copying code
      for each NFS version.
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: Create NFS_ACCESS_* flags · 3c181827
      Anna Schumaker committed
      Passing the NFS v4 flags into the v3 code seems weird to me, even if
      they are defined to the same values.  This patch adds in generic flags
      to help me feel better.
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
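The two access-flag commits above turn on the fact that the NFS access bits and the VFS MAY_* bits are different vocabularies. A rough user-space sketch of why testing one against the other misleads, and of what an explicit translation could look like, follows. All `TOY_` names and the `toy_access_to_may()` helper are hypothetical, and the numeric values are assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumed VFS-style permission bits (illustrative values). */
#define TOY_MAY_EXEC  0x1u
#define TOY_MAY_WRITE 0x2u
#define TOY_MAY_READ  0x4u

/* Assumed protocol-neutral NFS access bits (illustrative values). */
#define TOY_ACCESS_READ    0x01u
#define TOY_ACCESS_LOOKUP  0x02u
#define TOY_ACCESS_MODIFY  0x04u
#define TOY_ACCESS_EXTEND  0x08u
#define TOY_ACCESS_DELETE  0x10u
#define TOY_ACCESS_EXECUTE 0x20u

/* Translate granted access bits into MAY_* bits; one plausible mapping. */
static unsigned int toy_access_to_may(unsigned int access, bool is_dir)
{
    unsigned int may = 0;
    unsigned int write_bits = TOY_ACCESS_MODIFY | TOY_ACCESS_EXTEND;

    if (access & TOY_ACCESS_READ)
        may |= TOY_MAY_READ;
    if (is_dir) {
        /* Writing to a directory also needs the right to delete entries. */
        write_bits |= TOY_ACCESS_DELETE;
        if (access & TOY_ACCESS_LOOKUP)
            may |= TOY_MAY_EXEC;
    } else if (access & TOY_ACCESS_EXECUTE) {
        may |= TOY_MAY_EXEC;
    }
    if ((access & write_bits) == write_bits)
        may |= TOY_MAY_WRITE;
    return may;
}
```

With these values, `TOY_ACCESS_READ` and `TOY_MAY_EXEC` share the same bit, so a test written against the wrong vocabulary would silently answer a different question.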
  10. 07 Sep, 2017 1 commit
  11. 21 Jul, 2017 3 commits
  12. 14 Jul, 2017 5 commits
    • nfs: replace d_add with d_splice_alias in atomic_open · 774d9513
      Peng Tao committed
      It's a trivial change but follows the knfsd export document, which asks
      for d_splice_alias during lookup.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: guard against confused server in nfs_atomic_open() · eaa2b82c
      NeilBrown committed
      In reply to an NFSv4 OPEN request, a confused server could return a
      filehandle that it previously returned for a directory.
      So the inode returned by ->open_context() in nfs_atomic_open()
      could conceivably be a directory inode.
      
      This has particular implications for the call to
      nfs_file_set_open_context() in nfs_finish_open().
      If that is called on a directory inode, then the nfs_open_context
      that gets stored in the filp->private_data will be linked to
      nfs_inode->open_files.
      
      When the directory is closed, nfs_closedir() will (ultimately)
      free the ->private_data, but not unlink it from nfs_inode->open_files
      (because it doesn't expect an nfs_open_context there).
      
      Subsequently the memory could get used for something else and eventually
      if the ->open_files list is walked, the walker will fall off the end and
      crash.
      
      So: change nfs_finish_open() to only call nfs_file_set_open_context()
      for regular-file inodes.
      
      This failure mode has been seen in a production setting (unknown NFS
      server implementation).  The kernel was v3.0 and the specific sequence
      seen would not affect more recent kernels, but I think a risk is still
      present, and caution is wise.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: only invalidate dentrys that are clearly invalid. · cc89684c
      NeilBrown committed
      Since commit bafc9b75 ("vfs: More precise tests in d_invalidate")
      in v3.18, a return of '0' from ->d_revalidate() will cause the dentry
      to be invalidated even if it has filesystems mounted on it or on a
      descendant.  The mounted filesystem is unmounted.
      
      This means we need to be careful not to return 0 unless the directory
      referred to truly is invalid.  So -ESTALE or -ENOENT should invalidate
      the directory.  Other errors such as -EPERM or -ERESTARTSYS should be
      returned from ->d_revalidate() so they are propagated to the caller.
      
      A particular problem can be demonstrated by:
      
      1/ mount an NFS filesystem using NFSv3 on /mnt
      2/ mount any other filesystem on /mnt/foo
      3/ ls /mnt/foo
      4/ turn off network, or otherwise make the server unable to respond
      5/ ls /mnt/foo &
      6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack
      7/ kill -9 $! # this results in -ERESTARTSYS being returned
      8/ observe that /mnt/foo has been unmounted.
      
      This patch changes nfs_lookup_revalidate() to only treat
        -ESTALE from nfs_lookup_verify_inode() and
        -ESTALE or -ENOENT from ->lookup()
      as indicating an invalid inode.  Other errors are returned.
      
      Also nfs_check_inode_attributes() is changed to return -ESTALE rather
      than -EIO.  This is consistent with the error returned in similar
      circumstances from nfs_update_inode().
      
      As this bug allows any user to unmount a filesystem mounted on an NFS
      filesystem, this fix is suitable for stable kernels.
      
      Fixes: bafc9b75 ("vfs: More precise tests in d_invalidate")
      Cc: stable@vger.kernel.org (v3.18+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
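The error classification this commit message describes can be written as a small decision function. The sketch below is a user-space illustration, not the kernel code: `toy_revalidate_result()` and `enum toy_source` are hypothetical, and `TOY_ERESTARTSYS` is defined locally because user-space `<errno.h>` does not expose that kernel-internal value.

```c
#include <errno.h>

/* Kernel-internal errno value; not defined by user-space <errno.h>. */
#define TOY_ERESTARTSYS 512

/* Which step produced the error (hypothetical labels). */
enum toy_source { TOY_FROM_VERIFY_INODE, TOY_FROM_LOOKUP };

/*
 * Returns 1 (dentry still valid), 0 (dentry invalid and may be dropped),
 * or a negative errno that the caller should see unchanged.
 */
static int toy_revalidate_result(enum toy_source src, int err)
{
    if (err == 0)
        return 1;
    if (err == -ESTALE)
        return 0;
    if (src == TOY_FROM_LOOKUP && err == -ENOENT)
        return 0;
    return err;  /* e.g. -EPERM or -TOY_ERESTARTSYS: propagate, don't invalidate */
}
```

Under this rule the `kill -9` in step 7 yields a propagated error instead of a 0 return, so nothing mounted below the dentry is torn down.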
    • NFS: nfs_rename() - revalidate directories on -ERESTARTSYS · 818a8dbe
      Benjamin Coddington committed
      An interrupted rename will leave the old dentry behind if the rename
      succeeds.  Fix this by forcing a lookup the next time through
      ->d_revalidate.
      
      A previous attempt at solving this problem completed the work of the
      rename asynchronously; however, that approach was wrong
      since it would allow the d_move() to occur after the directory's i_mutex
      had been dropped by the original process.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • NFS: convert flags to bool · a7a3b1e9
      Benjamin Coddington committed
      NFS uses a mix of int, unsigned int :1, and bool for flags in structs
      and args.  Assert the preference for uniformly replacing these with the
      bool type.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  13. 28 Jun, 2017 1 commit
  14. 06 May, 2017 1 commit
  15. 21 Apr, 2017 1 commit
    • NFS: switch back to ->iterate() · b044f645
      Benjamin Coddington committed
      NFS has some optimizations for readdir to choose between using READDIR or
      READDIRPLUS based on workload, and which NFS operation to use is determined
      by subsequent interactions with lookup, d_revalidate, and getattr.
      
      Concurrent use of nfs_readdir() via ->iterate_shared() can cause those
      optimizations to repeatedly invalidate the pagecache used to store
      directory entries during readdir(), which causes some very bad performance
      for directories with many entries (more than about 10000).
      
      There are a couple of ways to fix this in NFS, but no fix would be as simple
      as going back to ->iterate() to serialize nfs_readdir(), and neither fix I
      tested performed as well as going back to ->iterate().
      
      The first required taking the directory's i_lock for each entry, with the
      result of terrible contention.
      
      The second way adds another flag to the nfs_inode, and so keeps the
      optimizations working for large directories.  The difference from using
      ->iterate() here is that much more memory is consumed for a given workload
      without any performance gain.
      
      The workings of nfs_readdir() are such that concurrent users are serialized
      within read_cache_page() waiting to retrieve pages of entries from the
      server.  By serializing this work in iterate_dir() instead, contention for
      cache pages is reduced.  Waiting processes can have an uncontended pass at
      the entirety of the directory's pagecache once previous processes have
      completed filling it.
      
      v2 - Keep the bits needed for parallel lookup
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  16. 28 Mar, 2017 1 commit
  17. 09 Feb, 2017 1 commit
  18. 20 Dec, 2016 2 commits
  19. 10 Dec, 2016 1 commit
  20. 05 Dec, 2016 1 commit
  21. 03 Dec, 2016 3 commits
  22. 02 Dec, 2016 1 commit
    • NFSv4: add flock_owner to open context · 532d4def
      NeilBrown committed
      An open file description (struct file) in a given process can be
      associated with two different lock owners.
      
      It can have a Posix lock owner which will be different in each process
      that has a fd on the file.
      It can have a Flock owner which will be the same in all processes.
      
      When searching for a lock stateid to use, we need to consider both of these
      owners.
      
      So add a new "flock_owner" to the "nfs_open_context" (of which there
      is one for each open file description).
      
      This flock_owner does not need to be reference-counted as there is a
      1-1 relation between 'struct file' and nfs open contexts,
      and it will never be part of a list of contexts.  So there is no need
      for a 'flock_context' - just the owner is enough.
      
      The io_count included in the (Posix) lock_context provides no
      guarantee that all read-aheads that could use the state have
      completed, so not supporting it for flock locks is not a serious
      problem.  Synchronization between flock and read-ahead can be added
      later if needed.

      When creating an open_context for a non-opening create call, we don't have
      a 'struct file' to pass in, so the lock context gets initialized with
      a NULL owner, but this will never be used.
      
      The flock_owner is not used at all in this patch, that will come later.
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
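The motivation given above, that a lock stateid search has to consider both the Posix owner and the flock owner of an open context, can be sketched as follows. This only illustrates the lookup idea; the commit itself notes that flock_owner is not yet used. The types and the `toy_find_lock_state()` helper are hypothetical.

```c
#include <stddef.h>

/* Hypothetical record of a lock state held under some owner. */
struct toy_lock_state {
    const void *owner;
    int stateid;
};

/* Hypothetical open context: one per open file description. */
struct toy_open_context {
    const void *posix_owner;  /* differs between processes sharing the file */
    const void *flock_owner;  /* the same in every process sharing the file */
};

/* Return the first lock state owned by either owner, or NULL if none. */
static const struct toy_lock_state *
toy_find_lock_state(const struct toy_lock_state *states, size_t n,
                    const struct toy_open_context *ctx)
{
    for (size_t i = 0; i < n; i++) {
        const void *owner = states[i].owner;

        if (owner == NULL)
            continue;  /* a NULL owner never matches anything */
        if (owner == ctx->posix_owner || owner == ctx->flock_owner)
            return &states[i];
    }
    return NULL;
}
```

Because the flock owner is a plain pointer stored in the context rather than a separately allocated object, the sketch needs no reference counting, mirroring the reasoning in the message.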