  1. 07 Jul, 2022 3 commits
    • step_into(): lose inode argument · a4f5b521
      Al Viro authored
      make handle_mounts() always fetch it.  This is just the first step -
      the callers of step_into() will stop trying to calculate the sucker,
      etc.
      
      The passed value should be equal to dentry->d_inode in all cases;
      in RCU mode - fetched after we'd sampled ->d_seq.  Might as well
      fetch it here.  We do need to validate ->d_seq, which duplicates
      the check currently done in lookup_fast(); that duplication will
      go away shortly.
      
      After that change handle_mounts() always ignores the initial value of
      *inode and always sets it on success.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • namei: stash the sampled ->d_seq into nameidata · 03fa86e9
      Al Viro authored
      New field: nd->next_seq.  Set to 0 outside of RCU mode, holds the sampled
      value for the next dentry to be considered.  Used instead of an arseload
      of local variables, arguments, etc.
      
      step_into() has lost seq argument; nd->next_seq is used, so dentry passed
      to it must be the one ->next_seq is about.
      
      There are two requirements for RCU pathwalk:
      	1) it should not give a hard failure (other than -ECHILD) unless
      non-RCU pathwalk might fail that way given suitable timings.
      	2) it should not succeed unless non-RCU pathwalk might succeed
      with the same end location given suitable timings.
      
      The use of seq numbers is the way we achieve that.  Invariant we want
      to maintain is:
      	if RCU pathwalk can reach the state with given nd->path, nd->inode
      and nd->seq after having traversed some part of pathname, it must be possible
      for non-RCU pathwalk to reach the same nd->path and nd->inode after having
      traversed the same part of pathname, and observe the nd->path.dentry->d_seq
      equal to what RCU pathwalk has in nd->seq
      
      	For transition from parent to child, we sample child's ->d_seq
      and verify that parent's ->d_seq remains unchanged.  Anything that
      disrupts parent-child relationship would've bumped ->d_seq on both.
      	For transitions from child to parent we sample parent's ->d_seq
      and verify that child's ->d_seq has not changed.  Same reasoning as
      for the previous case applies.
      	For transition from mountpoint to root of mounted we sample
      the ->d_seq of root and verify that nobody has touched mount_lock since
      the beginning of pathwalk.  That guarantees that mount we'd found had
      been there all along, with these mountpoint and root of the mounted.
      It would be possible for a non-RCU pathwalk to reach the previous state,
      find the same mount and observe its root at the moment we'd sampled
      the ->d_seq of that root.
      	For transitions from root of mounted to mountpoint we sample
      ->d_seq of mountpoint and verify that mount_lock had not been touched
      since the beginning of pathwalk.  The same reasoning as in the
      previous case applies.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
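The parent-to-child rule from the commit message above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: `toy_dentry`, `toy_nameidata` and every `toy_*` helper are hypothetical stand-ins, the counters are plain integers with no memory barriers or concurrency, and none of this is the kernel's actual seqcount API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry: only the sequence counter matters here. */
struct toy_dentry {
	unsigned seq;               /* plays the role of ->d_seq */
};

/* Hypothetical stand-in for the relevant part of struct nameidata. */
struct toy_nameidata {
	struct toy_dentry *dentry;  /* current position */
	unsigned seq;               /* counter value sampled for 'dentry' */
	unsigned next_seq;          /* counter value sampled for the next dentry */
};

/* Sample the counter of a dentry. */
static unsigned toy_read_begin(const struct toy_dentry *d)
{
	return d->seq;
}

/* True if the counter moved since 'start', i.e. the earlier sample is stale. */
static bool toy_read_retry(const struct toy_dentry *d, unsigned start)
{
	return d->seq != start;
}

/* Anything that disrupts the parent/child relationship bumps both counters. */
static void toy_disrupt(struct toy_dentry *parent, struct toy_dentry *child)
{
	parent->seq += 2;
	child->seq += 2;
}

/*
 * Parent-to-child step: sample the child's counter first, then verify
 * that the parent's counter still matches what was recorded for it.
 * Returns false when the lockless walk has to give up.
 */
static bool toy_step_to_child(struct toy_nameidata *nd, struct toy_dentry *child)
{
	assert(nd->dentry != NULL);
	nd->next_seq = toy_read_begin(child);
	if (toy_read_retry(nd->dentry, nd->seq))
		return false;
	nd->dentry = child;
	nd->seq = nd->next_seq;
	return true;
}
```

A failed step leaves the position untouched, so the toy walk either advances to a state a locked walk could also have observed or stops where it was.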
    • namei: move clearing LOOKUP_RCU towards rcu_read_unlock() · 6e180327
      Al Viro authored
      try_to_unlazy()/try_to_unlazy_next() drop LOOKUP_RCU in the
      very beginning and do rcu_read_unlock() only at the very end.
      However, nothing done in between even looks at the flag in
      question; might as well clear it at the same time we unlock.
      
      Note that try_to_unlazy_next() used to call legitimize_mnt(),
      which might drop/regain rcu_read_lock() in some cases.  This
      is no longer true, so we really have rcu_read_lock() held
      all along until the end.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  2. 06 Jul, 2022 4 commits
    • switch try_to_unlazy_next() to __legitimize_mnt() · 7e4745a0
      Al Viro authored
      The tricky case (__legitimize_mnt() failing after having grabbed
      a reference) can be trivially dealt with by leaving nd->path.mnt
      non-NULL, for terminate_walk() to drop it.
      
      legitimize_mnt() becomes static after that.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • follow_dotdot{,_rcu}(): change calling conventions · 51c6546c
      Al Viro authored
      Instead of returning NULL when we are in root, just make it return
      the current position (and set *seqp and *inodep accordingly).
      That collapses the calls of step_into() in handle_dots()
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • namei: get rid of pointless unlikely(read_seqcount_retry(...)) · 82ef0698
      Al Viro authored
      read_seqcount_retry() et al. are inlined and there are enough annotations
      for the compiler to figure out that those are unlikely to return non-zero.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • __follow_mount_rcu(): verify that mount_lock remains unchanged · 20aac6c6
      Al Viro authored
      Validate mount_lock seqcount as soon as we cross into mount in RCU
      mode.  Sure, ->mnt_root is pinned and will remain so until we
      do rcu_read_unlock() anyway, and we will eventually fail to unlazy if
      the mount_lock had been touched, but we might run into a hard error
      (e.g. -ENOENT) before trying to unlazy.  And it's possible to end
      up with RCU pathwalk racing with rename() and umount() in a way
      that would fail with -ENOENT while non-RCU pathwalk would've
      succeeded with any timings.
      
      Once upon a time we hadn't needed that, but analysis had been subtle,
      brittle and went out of window as soon as RENAME_EXCHANGE had been
      added.
      
      It's narrow, hard to hit and won't get you anything other than
      stray -ENOENT that could be arranged in much easier way with the
      same privileges, but it's a bug all the same.
      
      Cc: stable@kernel.org
      X-sky-is-falling: unlikely
      Fixes: da1ce067 "vfs: add cross-rename"
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 11 Jun, 2022 4 commits
  4. 10 Jun, 2022 2 commits
    • netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context · 874c8ca1
      David Howells authored
      While randstruct was satisfied with using an open-coded "void *" offset
      cast for the netfs_i_context <-> inode casting, __builtin_object_size() as
      used by FORTIFY_SOURCE was not as easily fooled.  This was causing the
      following complaint[1] from gcc v12:
      
        In file included from include/linux/string.h:253,
                         from include/linux/ceph/ceph_debug.h:7,
                         from fs/ceph/inode.c:2:
        In function 'fortify_memset_chk',
            inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
            inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
        include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
          242 |                         __write_overflow_field(p_size_field, size);
              |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      Fix this by embedding a struct inode into struct netfs_i_context (which
      should perhaps be renamed to struct netfs_inode).  The struct inode
      vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
      structs and vfs_inode is then simply changed to "netfs.inode" in those
      filesystems.
      
      Further, rename netfs_i_context to netfs_inode, get rid of the
      netfs_inode() function that converted a netfs_i_context pointer to an
      inode pointer (that can now be done with &ctx->inode) and rename the
      netfs_i_context() function to netfs_inode() (which is now a wrapper
      around container_of()).
      
      Most of the changes were done with:
      
        perl -p -i -e 's/vfs_inode/netfs.inode/'g \
              `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`
      
      Kees suggested doing it with a pair structure[2] and a special
      declarator to insert that into the network filesystem's inode
      wrapper[3], but I think it's cleaner to embed it - and then it doesn't
      matter if struct randomisation reorders things.
      
      Dave Chinner suggested using a filesystem-specific VFS_I() function in
      each filesystem to convert that filesystem's own inode wrapper struct
      into the VFS inode struct[4].
      
      Version #2:
       - Fix a couple of missed name changes due to a disabled cifs option.
       - Rename netfs_i_context to netfs_inode
       - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
         structs.
      
      [ This also undoes commit 507160f4 ("netfs: gcc-12: temporarily
        disable '-Wattribute-warning' for now") that is no longer needed ]
      
      Fixes: bc899ee1 ("netfs: Add a netfs inode context")
      Reported-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Xiubo Li <xiubli@redhat.com>
      cc: Jonathan Corbet <corbet@lwn.net>
      cc: Eric Van Hensbergen <ericvh@gmail.com>
      cc: Latchesar Ionkov <lucho@ionkov.net>
      cc: Dominique Martinet <asmadeus@codewreck.org>
      cc: Christian Schoenebeck <linux_oss@crudebyte.com>
      cc: Marc Dionne <marc.dionne@auristor.com>
      cc: Ilya Dryomov <idryomov@gmail.com>
      cc: Steve French <smfrench@gmail.com>
      cc: William Kucharski <william.kucharski@oracle.com>
      cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      cc: Dave Chinner <david@fromorbit.com>
      cc: linux-doc@vger.kernel.org
      cc: v9fs-developer@lists.sourceforge.net
      cc: linux-afs@lists.infradead.org
      cc: ceph-devel@vger.kernel.org
      cc: linux-cifs@vger.kernel.org
      cc: samba-technical@lists.samba.org
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-hardening@vger.kernel.org
      Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1]
      Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2]
      Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3]
      Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4]
      Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
      Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
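The embedding approach described in the commit message above can be illustrated with a small user-space sketch. The struct layouts are heavily cut down, `toy_fs_inode`, `TOY_FS_I()` and the `toy_state` field are hypothetical names invented for illustration, and the `container_of()` macro is a simplified version without the kernel's type checking.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-in for the VFS inode; the real struct has many more fields. */
struct inode {
	unsigned long i_ino;
};

/* The netfs context now embeds the VFS inode instead of sitting next to it. */
struct netfs_inode {
	struct inode inode;         /* embedded VFS inode */
	long toy_state;             /* hypothetical placeholder for netfs fields */
};

/* Hypothetical per-filesystem wrapper; "netfs" replaces the old vfs_inode member. */
struct toy_fs_inode {
	struct netfs_inode netfs;
	int fs_private;
};

/* Simplified container_of(): step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* VFS inode -> netfs context, with no assumptions about adjacent memory. */
static struct netfs_inode *netfs_inode(struct inode *inode)
{
	assert(inode != NULL);
	return container_of(inode, struct netfs_inode, inode);
}

/* VFS inode -> filesystem wrapper, in the style of a per-fs VFS_I() helper. */
static struct toy_fs_inode *TOY_FS_I(struct inode *inode)
{
	assert(inode != NULL);
	return container_of(inode, struct toy_fs_inode, netfs.inode);
}
```

Because both conversions go through `offsetof()`, the result stays correct even if the compiler or struct randomisation reorders the surrounding members.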
    • netfs: gcc-12: temporarily disable '-Wattribute-warning' for now · 507160f4
      Linus Torvalds authored
      This is a pure band-aid so that I can continue merging stuff from people
      while some of the gcc-12 fallout gets sorted out.
      
      In particular, gcc-12 is very unhappy about the kinds of pointer
      arithmetic tricks that netfs does, and that makes the fortify checks
      trigger in afs and ceph:
      
        In function ‘fortify_memset_chk’,
            inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2,
            inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2,
            inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2:
        include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
          258 |                         __write_overflow_field(p_size_field, size);
              |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      and the reason is that netfs_i_context_init() is passed a 'struct inode'
      pointer, and then it does
      
              struct netfs_i_context *ctx = netfs_i_context(inode);
      
              memset(ctx, 0, sizeof(*ctx));
      
      where that netfs_i_context() function just does pointer arithmetic on
      the inode pointer, knowing that the netfs_i_context is laid out
      immediately after it in memory.
      
      This is all truly disgusting, since the whole "netfs_i_context is laid
      out immediately after it in memory" is not actually remotely true in
      general, but is just made to be that way for afs and ceph.
      
      See for example fs/cifs/cifsglob.h:
      
        struct cifsInodeInfo {
              struct {
                      /* These must be contiguous */
                      struct inode    vfs_inode;      /* the VFS's inode record */
                      struct netfs_i_context netfs_ctx; /* Netfslib context */
              };
      	[...]
      
      and realize that this is all entirely wrong, and the pointer arithmetic
      that netfs_i_context() is doing is also very very wrong and wouldn't
      give the right answer if netfs_ctx had different alignment rules from a
      'struct inode', for example.
      
      Anyway, that's just a long-winded way to say "the gcc-12 warning is
      actually quite reasonable, and our code happens to work but is pretty
      disgusting".
      
      This is getting fixed properly, but for now I made the mistake of
      thinking "the week right after the merge window tends to be calm for me
      as people take a breather" and I did a system upgrade.  And I got gcc-12
      as a result, so to continue merging fixes from people and not have the
      end result drown in warnings, I am fixing all these gcc-12 issues I hit.
      
      Including with these kinds of temporary fixes.
      
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
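The alignment concern raised in the message above can be reproduced in user space. The following is a sketch with hypothetical types (`toy_inode`, `toy_ctx`, `toy_wrapper`), where the context is deliberately given stricter alignment than the inode so that "immediately after the inode" and "where the member really lives" differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct inode. */
struct toy_inode {
	long i_ino;
};

/* Hypothetical context with deliberately stricter alignment than toy_inode. */
struct toy_ctx {
	_Alignas(16) long cache;
};

/* The two members are declared back to back, as in the quoted cifs struct. */
struct toy_wrapper {
	struct toy_inode vfs_inode;
	struct toy_ctx netfs_ctx;
};

/* Fragile: assume the context starts right after the inode in memory. */
static void *ctx_by_arithmetic(struct toy_inode *inode)
{
	assert(inode != NULL);
	return (void *)(inode + 1);
}

/* Robust: ask the compiler where each member is placed in the wrapper. */
static struct toy_ctx *ctx_by_offsetof(struct toy_inode *inode)
{
	char *base;

	assert(inode != NULL);
	base = (char *)inode - offsetof(struct toy_wrapper, vfs_inode);
	return (struct toy_ctx *)(void *)(base + offsetof(struct toy_wrapper, netfs_ctx));
}
```

With these types the compiler inserts padding before `netfs_ctx`, so only the `offsetof()` version lands on the member; the arithmetic version points into the padding.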
  5. 08 Jun, 2022 3 commits
    • zonefs: fix zonefs_iomap_begin() for reads · c1c1204c
      Damien Le Moal authored
      If a readahead is issued to a sequential zone file with an offset
      exactly equal to the current file size, the iomap type is set to
      IOMAP_UNWRITTEN, which will prevent an IO, but the iomap length is
      calculated as 0. This causes a WARN_ON() in iomap_iter():
      
      [17309.548939] WARNING: CPU: 3 PID: 2137 at fs/iomap/iter.c:34 iomap_iter+0x9cf/0xe80
      [...]
      [17309.650907] RIP: 0010:iomap_iter+0x9cf/0xe80
      [...]
      [17309.754560] Call Trace:
      [17309.757078]  <TASK>
      [17309.759240]  ? lock_is_held_type+0xd8/0x130
      [17309.763531]  iomap_readahead+0x1a8/0x870
      [17309.767550]  ? iomap_read_folio+0x4c0/0x4c0
      [17309.771817]  ? lockdep_hardirqs_on_prepare+0x400/0x400
      [17309.778848]  ? lock_release+0x370/0x750
      [17309.784462]  ? folio_add_lru+0x217/0x3f0
      [17309.790220]  ? reacquire_held_locks+0x4e0/0x4e0
      [17309.796543]  read_pages+0x17d/0xb60
      [17309.801854]  ? folio_add_lru+0x238/0x3f0
      [17309.807573]  ? readahead_expand+0x5f0/0x5f0
      [17309.813554]  ? policy_node+0xb5/0x140
      [17309.819018]  page_cache_ra_unbounded+0x27d/0x450
      [17309.825439]  filemap_get_pages+0x500/0x1450
      [17309.831444]  ? filemap_add_folio+0x140/0x140
      [17309.837519]  ? lock_is_held_type+0xd8/0x130
      [17309.843509]  filemap_read+0x28c/0x9f0
      [17309.848953]  ? zonefs_file_read_iter+0x1ea/0x4d0 [zonefs]
      [17309.856162]  ? trace_contention_end+0xd6/0x130
      [17309.862416]  ? __mutex_lock+0x221/0x1480
      [17309.868151]  ? zonefs_file_read_iter+0x166/0x4d0 [zonefs]
      [17309.875364]  ? filemap_get_pages+0x1450/0x1450
      [17309.881647]  ? __mutex_unlock_slowpath+0x15e/0x620
      [17309.888248]  ? wait_for_completion_io_timeout+0x20/0x20
      [17309.895231]  ? lock_is_held_type+0xd8/0x130
      [17309.901115]  ? lock_is_held_type+0xd8/0x130
      [17309.906934]  zonefs_file_read_iter+0x356/0x4d0 [zonefs]
      [17309.913750]  new_sync_read+0x2d8/0x520
      [17309.919035]  ? __x64_sys_lseek+0x1d0/0x1d0
      
      Furthermore, this causes iomap_readahead() to loop forever as
      iomap_readahead_iter() always returns 0, making no progress.
      
      Fix this by treating reads after the file size as access to holes,
      setting the iomap type to IOMAP_HOLE, the iomap addr to IOMAP_NULL_ADDR
      and using the length argument as is for the iomap length. To simplify
      the code with this change, zonefs_iomap_begin() is split into the read
      variant, zonefs_read_iomap_begin() and zonefs_read_iomap_ops, and the
      write variant, zonefs_write_iomap_begin() and zonefs_write_iomap_ops.
      Reported-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
      Fixes: 8dcc1a9d ("fs: New zonefs file system")
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Reviewed-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
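The read-side rule described in the fix above (reads at or past the file size are reported as a hole covering the requested length) can be sketched as follows. This is a simplified user-space model, not the zonefs code: `toy_iomap`, `TOY_IOMAP_*`, `TOY_NULL_ADDR` and `toy_read_iomap_begin()` are hypothetical names, and block-size alignment and locking are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the two mapping types that matter for reads. */
enum toy_iomap_type { TOY_IOMAP_HOLE, TOY_IOMAP_MAPPED };

/* Hypothetical sentinel meaning "no device address". */
#define TOY_NULL_ADDR UINT64_MAX

struct toy_iomap {
	enum toy_iomap_type type;
	uint64_t addr;      /* device byte address, or TOY_NULL_ADDR */
	uint64_t offset;    /* file offset the mapping starts at */
	uint64_t length;    /* number of bytes the mapping covers */
};

/*
 * Fill in a read mapping for a file holding 'isize' bytes whose data
 * starts at device byte address 'zone_start'.
 */
static void toy_read_iomap_begin(uint64_t isize, uint64_t zone_start,
				 uint64_t offset, uint64_t length,
				 struct toy_iomap *iomap)
{
	assert(length > 0);
	iomap->offset = offset;
	if (offset >= isize) {
		/* At or past EOF: a hole, with the requested length kept as is. */
		iomap->type = TOY_IOMAP_HOLE;
		iomap->addr = TOY_NULL_ADDR;
		iomap->length = length;
	} else {
		/* Inside the file: map up to the current file size. */
		iomap->type = TOY_IOMAP_MAPPED;
		iomap->addr = zone_start + offset;
		iomap->length = isize - offset;
	}
}
```

The point of the hole branch is that the returned length is never zero, so an iterator consuming these mappings always makes progress.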
    • zonefs: Do not ignore explicit_open with active zone limit · 96eca145
      Damien Le Moal authored
      A zoned device may have no limit on the number of open zones but may
      have a limit on the number of active zones it can support. In such
      case, the explicit_open mount option should not be ignored to ensure
      that the open() system call activates the zone with an explicit zone
      open command, thus guaranteeing that the zone can be written.
      
      Enforce this by ignoring the explicit_open mount option only for
      devices that have both the open and active zone limits equal to 0.
      
      Fixes: 87c9ce3f ("zonefs: Add active seq file accounting")
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
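The condition stated in the message above can be written as a small predicate applied after option parsing. This is a sketch under assumptions: `toy_sb_info`, `TOY_MNTOPT_EXPLICIT_OPEN` and `toy_fixup_explicit_open()` are hypothetical names, and a limit of 0 is taken to mean "no limit" as the message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit for the explicit_open mount option. */
#define TOY_MNTOPT_EXPLICIT_OPEN (1u << 0)

/* Hypothetical superblock info: parsed options plus the device limits. */
struct toy_sb_info {
	unsigned mount_opts;        /* set by option parsing */
	unsigned max_open_zones;    /* 0 means no limit */
	unsigned max_active_zones;  /* 0 means no limit */
};

/*
 * Meant to run after mount options have been parsed: drop explicit_open
 * only when the device has neither an open nor an active zone limit.
 */
static void toy_fixup_explicit_open(struct toy_sb_info *sbi)
{
	assert(sbi != NULL);
	if ((sbi->mount_opts & TOY_MNTOPT_EXPLICIT_OPEN) &&
	    sbi->max_open_zones == 0 && sbi->max_active_zones == 0)
		sbi->mount_opts &= ~TOY_MNTOPT_EXPLICIT_OPEN;
}
```

A device with no open-zone limit but an active-zone limit keeps the option, which is the case this commit is about.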
    • zonefs: fix handling of explicit_open option on mount · a2a513be
      Damien Le Moal authored
      Ignoring the explicit_open mount option on mount for devices that do not
      have a limit on the number of open zones must be done after the mount
      options are parsed and set in s_mount_opts. Move the check to ignore
      the explicit_open option after the call to zonefs_parse_options() in
      zonefs_fill_super().
      
      Fixes: b5c00e97 ("zonefs: open/close zone on file open/close")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  6. 07 Jun, 2022 1 commit
  7. 06 Jun, 2022 5 commits
  8. 05 Jun, 2022 1 commit
  9. 04 Jun, 2022 1 commit
    • cifs: skip trailing separators of prefix paths · ef605e86
      Paulo Alcantara authored
      During DFS failover, prefix paths may change, so make sure to not
      leave trailing separators when parsing them in
      dfs_cache_get_tgt_share().  The separators of prefix paths are already
      handled by build_path_from_dentry_optional_prefix().
      
      Consider the following DFS link:
      
        //dom/dfs/link: [\srv1\share\dir1, \srv2\share\dir1]
      
      Before commit:
      
        mount.cifs //dom/dfs/link
        tree connect to \\srv1\share; prefix_path=dir1
        disconnect srv1; failover to srv2
        tree connect to \\srv2\share; prefix_path=dir1\
        mv foo bar
      
        ...
        SMB2 430 Create Request File: dir1\\foo;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
        SMB2 582 Create Response File: dir1\\foo;GetInfo Response;Close Response
        SMB2 430 Create Request File: dir1\\bar;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
        SMB2 286 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;GetInfo Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;Close Response, Error: STATUS_OBJECT_NAME_NOT_FOUND
        SMB2 462 Create Request File: dir1\\foo;SetInfo Request FILE_INFO/SMB2_FILE_RENAME_INFO NewName:dir1\\bar;Close Request
        SMB2 478 Create Response File: dir1\\foo;SetInfo Response, Error: STATUS_OBJECT_NAME_INVALID;Close Response
      
      After commit:
      
        mount.cifs //dom/dfs/link
        tree connect to \\srv1\share; prefix_path=dir1
        disconnect srv1; failover to srv2
        tree connect to \\srv2\share; prefix_path=dir1
        mv foo bar
      
        ...
        SMB2 430 Create Request File: dir1\foo;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
        SMB2 582 Create Response File: dir1\foo;GetInfo Response;Close Response
        SMB2 430 Create Request File: dir1\bar;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
        SMB2 286 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;GetInfo Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;Close Response, Error: STATUS_OBJECT_NAME_NOT_FOUND
        SMB2 462 Create Request File: dir1\foo;SetInfo Request FILE_INFO/SMB2_FILE_RENAME_INFO NewName:dir1\bar;Close Request
        SMB2 478 Create Response File: dir1\foo;SetInfo Response;Close Response
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
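The effect shown in the before/after traces above (`prefix_path=dir1\` becoming `prefix_path=dir1`) amounts to trimming trailing separators from the parsed prefix path. The helper below is a hypothetical user-space sketch of that idea, not the code in dfs_cache_get_tgt_share(); treating both `\` and `/` as separators is an assumption made here.

```c
#include <assert.h>
#include <string.h>

/* Assumption: both backslash and forward slash count as separators. */
static int toy_is_sep(char c)
{
	return c == '\\' || c == '/';
}

/*
 * Remove trailing separators from 'path' in place and return the new
 * length, so "dir1\" becomes "dir1" and later joins do not yield "dir1\\foo".
 */
static size_t toy_trim_trailing_seps(char *path)
{
	size_t len;

	assert(path != NULL);
	len = strlen(path);
	while (len > 0 && toy_is_sep(path[len - 1]))
		path[--len] = '\0';
	return len;
}
```

Separators inside the path are left alone; only the tail is trimmed, so a prefix consisting solely of separators collapses to an empty string.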
  10. 03 Jun, 2022 1 commit
  11. 02 Jun, 2022 6 commits
  12. 01 Jun, 2022 6 commits
  13. 31 May, 2022 3 commits