1. 09 6月, 2018 1 次提交
    • S
      nfsd: fix potential use-after-free in nfsd4_decode_getdeviceinfo · 3171822f
      Scott Mayhew 提交于
      When running a fuzz tester against a KASAN-enabled kernel, the following
      splat periodically occurs.
      
      The problem occurs when the test sends a GETDEVICEINFO request with a
      malformed xdr array (size but no data) for gdia_notify_types and the
      array size is > 0x3fffffff, which results in an overflow in the value of
      nbytes which is passed to read_buf().
      
      If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after
      the overflow occurs, the value of nbytes 0, and when that happens the
      pointer returned by read_buf() points to the end of the xdr data (i.e.
      argp->end) when really it should be returning NULL.
      
      Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this
      value is arbitrary, but it's the same threshold used by
      nfsd4_decode_bitmap()... in could really be any value >= 1 since it's
      expected to get at most a single bitmap in gdia_notify_types).
      
      [  119.256854] ==================================================================
      [  119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
      [  119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538
      
      [  119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1
      [  119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
      [  119.261202] Call Trace:
      [  119.262265]  dump_stack+0x71/0xab
      [  119.263371]  print_address_description+0x6a/0x270
      [  119.264609]  kasan_report+0x258/0x380
      [  119.265854]  ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
      [  119.267291]  nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
      [  119.268549]  ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
      [  119.269873]  ? nfsd4_decode_sequence+0x490/0x490 [nfsd]
      [  119.271095]  nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
      [  119.272393]  ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd]
      [  119.273658]  nfsd_dispatch+0x183/0x850 [nfsd]
      [  119.274918]  svc_process+0x161c/0x31a0 [sunrpc]
      [  119.276172]  ? svc_printk+0x190/0x190 [sunrpc]
      [  119.277386]  ? svc_xprt_release+0x451/0x680 [sunrpc]
      [  119.278622]  nfsd+0x2b9/0x430 [nfsd]
      [  119.279771]  ? nfsd_destroy+0x1c0/0x1c0 [nfsd]
      [  119.281157]  kthread+0x2db/0x390
      [  119.282347]  ? kthread_create_worker_on_cpu+0xc0/0xc0
      [  119.283756]  ret_from_fork+0x35/0x40
      
      [  119.286041] Allocated by task 436:
      [  119.287525]  kasan_kmalloc+0xa0/0xd0
      [  119.288685]  kmem_cache_alloc+0xe9/0x1f0
      [  119.289900]  get_empty_filp+0x7b/0x410
      [  119.291037]  path_openat+0xca/0x4220
      [  119.292242]  do_filp_open+0x182/0x280
      [  119.293411]  do_sys_open+0x216/0x360
      [  119.294555]  do_syscall_64+0xa0/0x2f0
      [  119.295721]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  119.298068] Freed by task 436:
      [  119.299271]  __kasan_slab_free+0x130/0x180
      [  119.300557]  kmem_cache_free+0x78/0x210
      [  119.301823]  rcu_process_callbacks+0x35b/0xbd0
      [  119.303162]  __do_softirq+0x192/0x5ea
      
      [  119.305443] The buggy address belongs to the object at ffff880113ada000
                      which belongs to the cache filp of size 256
      [  119.308556] The buggy address is located 0 bytes inside of
                      256-byte region [ffff880113ada000, ffff880113ada100)
      [  119.311376] The buggy address belongs to the page:
      [  119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780
      [  119.314428] flags: 0x17ffe000000100(slab)
      [  119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001
      [  119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000
      [  119.319050] page dumped because: kasan: bad access detected
      
      [  119.321652] Memory state around the buggy address:
      [  119.322993]  ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  119.324515]  ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  119.327547]                    ^
      [  119.328730]  ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  119.330218]  ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      [  119.331740] ==================================================================
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      3171822f
  2. 06 6月, 2018 1 次提交
    • D
      vfs: change inode times to use struct timespec64 · 95582b00
      Deepa Dinamani 提交于
      struct timespec is not y2038 safe. Transition vfs to use
      y2038 safe struct timespec64 instead.
      
      The change was made with the help of the following cocinelle
      script. This catches about 80% of the changes.
      All the header file and logic changes are included in the
      first 5 rules. The rest are trivial substitutions.
      I avoid changing any of the function signatures or any other
      filesystem specific data structures to keep the patch simple
      for review.
      
      The script can be a little shorter by combining different cases.
      But, this version was sufficient for my usecase.
      
      virtual patch
      
      @ depends on patch @
      identifier now;
      @@
      - struct timespec
      + struct timespec64
        current_time ( ... )
        {
      - struct timespec now = current_kernel_time();
      + struct timespec64 now = current_kernel_time64();
        ...
      - return timespec_trunc(
      + return timespec64_trunc(
        ... );
        }
      
      @ depends on patch @
      identifier xtime;
      @@
       struct \( iattr \| inode \| kstat \) {
       ...
      -       struct timespec xtime;
      +       struct timespec64 xtime;
       ...
       }
      
      @ depends on patch @
      identifier t;
      @@
       struct inode_operations {
       ...
      int (*update_time) (...,
      -       struct timespec t,
      +       struct timespec64 t,
      ...);
       ...
       }
      
      @ depends on patch @
      identifier t;
      identifier fn_update_time =~ "update_time$";
      @@
       fn_update_time (...,
      - struct timespec *t,
      + struct timespec64 *t,
       ...) { ... }
      
      @ depends on patch @
      identifier t;
      @@
      lease_get_mtime( ... ,
      - struct timespec *t
      + struct timespec64 *t
        ) { ... }
      
      @te depends on patch forall@
      identifier ts;
      local idexpression struct inode *inode_node;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn_update_time =~ "update_time$";
      identifier fn;
      expression e, E3;
      local idexpression struct inode *node1;
      local idexpression struct inode *node2;
      local idexpression struct iattr *attr1;
      local idexpression struct iattr *attr2;
      local idexpression struct iattr attr;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      @@
      (
      (
      - struct timespec ts;
      + struct timespec64 ts;
      |
      - struct timespec ts = current_time(inode_node);
      + struct timespec64 ts = current_time(inode_node);
      )
      
      <+... when != ts
      (
      - timespec_equal(&inode_node->i_xtime, &ts)
      + timespec64_equal(&inode_node->i_xtime, &ts)
      |
      - timespec_equal(&ts, &inode_node->i_xtime)
      + timespec64_equal(&ts, &inode_node->i_xtime)
      |
      - timespec_compare(&inode_node->i_xtime, &ts)
      + timespec64_compare(&inode_node->i_xtime, &ts)
      |
      - timespec_compare(&ts, &inode_node->i_xtime)
      + timespec64_compare(&ts, &inode_node->i_xtime)
      |
      ts = current_time(e)
      |
      fn_update_time(..., &ts,...)
      |
      inode_node->i_xtime = ts
      |
      node1->i_xtime = ts
      |
      ts = inode_node->i_xtime
      |
      <+... attr1->ia_xtime ...+> = ts
      |
      ts = attr1->ia_xtime
      |
      ts.tv_sec
      |
      ts.tv_nsec
      |
      btrfs_set_stack_timespec_sec(..., ts.tv_sec)
      |
      btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
      |
      - ts = timespec64_to_timespec(
      + ts =
      ...
      -)
      |
      - ts = ktime_to_timespec(
      + ts = ktime_to_timespec64(
      ...)
      |
      - ts = E3
      + ts = timespec_to_timespec64(E3)
      |
      - ktime_get_real_ts(&ts)
      + ktime_get_real_ts64(&ts)
      |
      fn(...,
      - ts
      + timespec64_to_timespec(ts)
      ,...)
      )
      ...+>
      (
      <... when != ts
      - return ts;
      + return timespec64_to_timespec(ts);
      ...>
      )
      |
      - timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
      |
      - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
      + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
      |
      - timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
      + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
      |
      node1->i_xtime1 =
      - timespec_trunc(attr1->ia_xtime1,
      + timespec64_trunc(attr1->ia_xtime1,
      ...)
      |
      - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
      + attr1->ia_xtime1 =  timespec64_trunc(attr2->ia_xtime2,
      ...)
      |
      - ktime_get_real_ts(&attr1->ia_xtime1)
      + ktime_get_real_ts64(&attr1->ia_xtime1)
      |
      - ktime_get_real_ts(&attr.ia_xtime1)
      + ktime_get_real_ts64(&attr.ia_xtime1)
      )
      
      @ depends on patch @
      struct inode *node;
      struct iattr *attr;
      identifier fn;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      expression e;
      @@
      (
      - fn(node->i_xtime);
      + fn(timespec64_to_timespec(node->i_xtime));
      |
       fn(...,
      - node->i_xtime);
      + timespec64_to_timespec(node->i_xtime));
      |
      - e = fn(attr->ia_xtime);
      + e = fn(timespec64_to_timespec(attr->ia_xtime));
      )
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      identifier i_xtime =~ "^i_[acm]time$";
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier fn;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      )
      ...+>
      }
      
      @ depends on patch forall @
      struct inode *node;
      struct iattr *attr;
      struct kstat *stat;
      identifier ia_xtime =~ "^ia_[acm]time$";
      identifier i_xtime =~ "^i_[acm]time$";
      identifier xtime =~ "^[acm]time$";
      identifier fn, ret;
      @@
      {
      + struct timespec ts;
      <+...
      (
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(node->i_xtime);
      ret = fn (...,
      - &node->i_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime,
      + &ts,
      ...);
      |
      + ts = timespec64_to_timespec(attr->ia_xtime);
      ret = fn (...,
      - &attr->ia_xtime);
      + &ts);
      |
      + ts = timespec64_to_timespec(stat->xtime);
      ret = fn (...,
      - &stat->xtime);
      + &ts);
      )
      ...+>
      }
      
      @ depends on patch @
      struct inode *node;
      struct inode *node2;
      identifier i_xtime1 =~ "^i_[acm]time$";
      identifier i_xtime2 =~ "^i_[acm]time$";
      identifier i_xtime3 =~ "^i_[acm]time$";
      struct iattr *attrp;
      struct iattr *attrp2;
      struct iattr attr ;
      identifier ia_xtime1 =~ "^ia_[acm]time$";
      identifier ia_xtime2 =~ "^ia_[acm]time$";
      struct kstat *stat;
      struct kstat stat1;
      struct timespec64 ts;
      identifier xtime =~ "^[acmb]time$";
      expression e;
      @@
      (
      ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1  ;
      |
       node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \);
      |
       stat->xtime = node2->i_xtime1;
      |
       stat1.xtime = node2->i_xtime1;
      |
      ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1  ;
      |
      ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
      |
      - e = node->i_xtime1;
      + e = timespec64_to_timespec( node->i_xtime1 );
      |
      - e = attrp->ia_xtime1;
      + e = timespec64_to_timespec( attrp->ia_xtime1 );
      |
      node->i_xtime1 = current_time(...);
      |
       node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
       node->i_xtime1 = node->i_xtime3 =
      - e;
      + timespec_to_timespec64(e);
      |
      - node->i_xtime1 = e;
      + node->i_xtime1 = timespec_to_timespec64(e);
      )
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Cc: <anton@tuxera.com>
      Cc: <balbi@kernel.org>
      Cc: <bfields@fieldses.org>
      Cc: <darrick.wong@oracle.com>
      Cc: <dhowells@redhat.com>
      Cc: <dsterba@suse.com>
      Cc: <dwmw2@infradead.org>
      Cc: <hch@lst.de>
      Cc: <hirofumi@mail.parknet.co.jp>
      Cc: <hubcap@omnibond.com>
      Cc: <jack@suse.com>
      Cc: <jaegeuk@kernel.org>
      Cc: <jaharkes@cs.cmu.edu>
      Cc: <jslaby@suse.com>
      Cc: <keescook@chromium.org>
      Cc: <mark@fasheh.com>
      Cc: <miklos@szeredi.hu>
      Cc: <nico@linaro.org>
      Cc: <reiserfs-devel@vger.kernel.org>
      Cc: <richard@nod.at>
      Cc: <sage@redhat.com>
      Cc: <sfrench@samba.org>
      Cc: <swhiteho@redhat.com>
      Cc: <tj@kernel.org>
      Cc: <trond.myklebust@primarydata.com>
      Cc: <tytso@mit.edu>
      Cc: <viro@zeniv.linux.org.uk>
      95582b00
  3. 08 5月, 2018 1 次提交
  4. 04 4月, 2018 2 次提交
    • J
      nfsd: fix incorrect umasks · 880a3a53
      J. Bruce Fields 提交于
      We're neglecting to clear the umask after it's set, which can cause a
      later unrelated rpc to (incorrectly) use the same umask if it happens to
      be processed by the same thread.
      
      There's a more subtle problem here too:
      
      An NFSv4 compound request is decoded all in one pass before any
      operations are executed.
      
      Currently we're setting current->fs->umask at the time we decode the
      compound.  In theory a single compound could contain multiple creates
      each setting a umask.  In that case we'd end up using whichever umask
      was passed in the *last* operation as the umask for all the creates,
      whether that was correct or not.
      
      So, we should just be saving the umask at decode time and waiting to set
      it until we actually process the corresponding operation.
      
      In practice it's unlikely any client would do multiple creates in a
      single compound.  And even if it did they'd likely be from the same
      process (hence carry the same umask).  So this is a little academic, but
      we should get it right anyway.
      
      Fixes: 47057abd (nfsd: add support for the umask attribute)
      Cc: stable@vger.kernel.org
      Reported-by: NLucash Stach <l.stach@pengutronix.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      880a3a53
    • C
      nfsd: Add I/O trace points in the NFSv4 read proc · 87c5942e
      Chuck Lever 提交于
      NFSv4 read compound processing invokes nfsd_splice_read and
      nfs_readv directly, so the trace points currently in nfsd_read are
      not invoked for NFSv4 reads.
      
      Move the NFSD READ trace points to common helpers so that NFSv4
      reads are captured.
      
      Also, record any local I/O error that occurs, the total count of
      bytes that were actually returned, and whether splice or vectored
      read was used.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      87c5942e
  5. 20 3月, 2018 1 次提交
  6. 09 2月, 2018 3 次提交
    • A
      NFSD: hide unused svcxdr_dupstr() · 2285ae76
      Arnd Bergmann 提交于
      There is now only one caller left for svcxdr_dupstr() and this is inside
      of an #ifdef, so we can get a warning when the option is disabled:
      
      fs/nfsd/nfs4xdr.c:241:1: error: 'svcxdr_dupstr' defined but not used [-Werror=unused-function]
      
      This changes the remaining caller to use a nicer IS_ENABLED() check,
      which lets the compiler drop the unused code silently.
      
      Fixes: e40d99e6183e ("NFSD: Clean up symlink argument XDR decoders")
      Suggested-by: NRasmus Villemoes <rasmus.villemoes@prevas.dk>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2285ae76
    • A
      nfsd: store stat times in fill_pre_wcc() instead of inode times · 39ca1bf6
      Amir Goldstein 提交于
      The time values in stat and inode may differ for overlayfs and stat time
      values are the correct ones to use. This is also consistent with the fact
      that fill_post_wcc() also stores stat time values.
      
      This means introducing a stat call that could fail, where previously we
      were just copying values out of the inode.  To be conservative about
      changing behavior, we fall back to copying values out of the inode in
      the error case.  It might be better just to clear fh_pre_saved (though
      note the BUG_ON in set_change_info).
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      39ca1bf6
    • J
      nfsd: return RESOURCE not GARBAGE_ARGS on too many ops · 0078117c
      J. Bruce Fields 提交于
      A client that sends more than a hundred ops in a single compound
      currently gets an rpc-level GARBAGE_ARGS error.
      
      It would be more helpful to return NFS4ERR_RESOURCE, since that gives
      the client a better idea how to recover (for example by splitting up the
      compound into smaller compounds).
      
      This is all a bit academic since we've never actually seen a reason for
      clients to send such long compounds, but we may as well fix it.
      
      While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the
      constant we already use in the 4.1 case, instead of hard-coding 100.
      Chances anyone actually uses even 16 ops per compound are small enough
      that I think there's a neglible risk or any regression.
      
      This fixes pynfs test COMP6.
      Reported-by: N"Lu, Xinyu" <luxy.fnst@cn.fujitsu.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0078117c
  7. 06 9月, 2017 1 次提交
    • C
      nfsd: Incoming xdr_bufs may have content in tail buffer · eae03e2a
      Chuck Lever 提交于
      Since the beginning, svcsock has built a received RPC Call message
      by populating the xdr_buf's head, then placing the remaining
      message bytes in the xdr_buf's page list. The xdr_buf's tail is
      never populated.
      
      This means that an NFSv4 COMPOUND containing an NFS WRITE operation
      plus trailing operations has a page list that contains the WRITE
      data payload followed by the trailing operations. NFSv4 XDR decoders
      will not look in the xdr_buf's tail, ever, because svcsock never put
      anything there.
      
      To support transports that can pass the write payload in the
      xdr_buf's pagelist and trailing content in the xdr_buf's tail,
      introduce logic in READ_BUF that switches to the xdr_buf's tail vec
      when the decoder runs out of content in rq_arg.pages.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      eae03e2a
  8. 25 8月, 2017 6 次提交
  9. 14 7月, 2017 3 次提交
  10. 13 7月, 2017 1 次提交
    • J
      nfsd4: factor ctime into change attribute · 630458e7
      J. Bruce Fields 提交于
      Factoring ctime into the nfsv4 change attribute gives us better
      properties than just i_version alone.
      
      Eventually we'll likely also expose this (as opposed to raw i_version)
      to userspace, at which point we'll want to move it to a common helper,
      called from either userspace or individual filesystems.  For now, nfsd
      is the only user.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      630458e7
  11. 15 5月, 2017 3 次提交
  12. 11 5月, 2017 2 次提交
  13. 03 3月, 2017 1 次提交
    • D
      statx: Add a system call to make enhanced file info available · a528d35e
      David Howells 提交于
      Add a system call to make extended file information available, including
      file creation and some attribute flags where available through the
      underlying filesystem.
      
      The getattr inode operation is altered to take two additional arguments: a
      u32 request_mask and an unsigned int flags that indicate the
      synchronisation mode.  This change is propagated to the vfs_getattr*()
      function.
      
      Functions like vfs_stat() are now inline wrappers around new functions
      vfs_statx() and vfs_statx_fd() to reduce stack usage.
      
      ========
      OVERVIEW
      ========
      
      The idea was initially proposed as a set of xattrs that could be retrieved
      with getxattr(), but the general preference proved to be for a new syscall
      with an extended stat structure.
      
      A number of requests were gathered for features to be included.  The
      following have been included:
      
       (1) Make the fields a consistent size on all arches and make them large.
      
       (2) Spare space, request flags and information flags are provided for
           future expansion.
      
       (3) Better support for the y2038 problem [Arnd Bergmann] (tv_sec is an
           __s64).
      
       (4) Creation time: The SMB protocol carries the creation time, which could
           be exported by Samba, which will in turn help CIFS make use of
           FS-Cache as that can be used for coherency data (stx_btime).
      
           This is also specified in NFSv4 as a recommended attribute and could
           be exported by NFSD [Steve French].
      
       (5) Lightweight stat: Ask for just those details of interest, and allow a
           netfs (such as NFS) to approximate anything not of interest, possibly
           without going to the server [Trond Myklebust, Ulrich Drepper, Andreas
           Dilger] (AT_STATX_DONT_SYNC).
      
       (6) Heavyweight stat: Force a netfs to go to the server, even if it thinks
           its cached attributes are up to date [Trond Myklebust]
           (AT_STATX_FORCE_SYNC).
      
      And the following have been left out for future extension:
      
       (7) Data version number: Could be used by userspace NFS servers [Aneesh
           Kumar].
      
           Can also be used to modify fill_post_wcc() in NFSD which retrieves
           i_version directly, but has just called vfs_getattr().  It could get
           it from the kstat struct if it used vfs_xgetattr() instead.
      
           (There's disagreement on the exact semantics of a single field, since
           not all filesystems do this the same way).
      
       (8) BSD stat compatibility: Including more fields from the BSD stat such
           as creation time (st_btime) and inode generation number (st_gen)
           [Jeremy Allison, Bernd Schubert].
      
       (9) Inode generation number: Useful for FUSE and userspace NFS servers
           [Bernd Schubert].
      
           (This was asked for but later deemed unnecessary with the
           open-by-handle capability available and caused disagreement as to
           whether it's a security hole or not).
      
      (10) Extra coherency data may be useful in making backups [Andreas Dilger].
      
           (No particular data were offered, but things like last backup
           timestamp, the data version number and the DOS archive bit would come
           into this category).
      
      (11) Allow the filesystem to indicate what it can/cannot provide: A
           filesystem can now say it doesn't support a standard stat feature if
           that isn't available, so if, for instance, inode numbers or UIDs don't
           exist or are fabricated locally...
      
           (This requires a separate system call - I have an fsinfo() call idea
           for this).
      
      (12) Store a 16-byte volume ID in the superblock that can be returned in
           struct xstat [Steve French].
      
           (Deferred to fsinfo).
      
      (13) Include granularity fields in the time data to indicate the
           granularity of each of the times (NFSv4 time_delta) [Steve French].
      
           (Deferred to fsinfo).
      
      (14) FS_IOC_GETFLAGS value.  These could be translated to BSD's st_flags.
           Note that the Linux IOC flags are a mess and filesystems such as Ext4
           define flags that aren't in linux/fs.h, so translation in the kernel
           may be a necessity (or, possibly, we provide the filesystem type too).
      
           (Some attributes are made available in stx_attributes, but the general
           feeling was that the IOC flags were to ext[234]-specific and shouldn't
           be exposed through statx this way).
      
      (15) Mask of features available on file (eg: ACLs, seclabel) [Brad Boyer,
           Michael Kerrisk].
      
           (Deferred, probably to fsinfo.  Finding out if there's an ACL or
           seclabal might require extra filesystem operations).
      
      (16) Femtosecond-resolution timestamps [Dave Chinner].
      
           (A __reserved field has been left in the statx_timestamp struct for
           this - if there proves to be a need).
      
      (17) A set multiple attributes syscall to go with this.
      
      ===============
      NEW SYSTEM CALL
      ===============
      
      The new system call is:
      
      	int ret = statx(int dfd,
      			const char *filename,
      			unsigned int flags,
      			unsigned int mask,
      			struct statx *buffer);
      
      The dfd, filename and flags parameters indicate the file to query, in a
      similar way to fstatat().  There is no equivalent of lstat() as that can be
      emulated with statx() by passing AT_SYMLINK_NOFOLLOW in flags.  There is
      also no equivalent of fstat() as that can be emulated by passing a NULL
      filename to statx() with the fd of interest in dfd.
      
      Whether or not statx() synchronises the attributes with the backing store
      can be controlled by OR'ing a value into the flags argument (this typically
      only affects network filesystems):
      
       (1) AT_STATX_SYNC_AS_STAT tells statx() to behave as stat() does in this
           respect.
      
       (2) AT_STATX_FORCE_SYNC will require a network filesystem to synchronise
           its attributes with the server - which might require data writeback to
           occur to get the timestamps correct.
      
       (3) AT_STATX_DONT_SYNC will suppress synchronisation with the server in a
           network filesystem.  The resulting values should be considered
           approximate.
      
      mask is a bitmask indicating the fields in struct statx that are of
      interest to the caller.  The user should set this to STATX_BASIC_STATS to
      get the basic set returned by stat().  It should be noted that asking for
      more information may entail extra I/O operations.
      
      buffer points to the destination for the data.  This must be 256 bytes in
      size.
      
      ======================
      MAIN ATTRIBUTES RECORD
      ======================
      
      The following structures are defined in which to return the main attribute
      set:
      
      	struct statx_timestamp {
      		__s64	tv_sec;
      		__s32	tv_nsec;
      		__s32	__reserved;
      	};
      
      	struct statx {
      		__u32	stx_mask;
      		__u32	stx_blksize;
      		__u64	stx_attributes;
      		__u32	stx_nlink;
      		__u32	stx_uid;
      		__u32	stx_gid;
      		__u16	stx_mode;
      		__u16	__spare0[1];
      		__u64	stx_ino;
      		__u64	stx_size;
      		__u64	stx_blocks;
      		__u64	__spare1[1];
      		struct statx_timestamp	stx_atime;
      		struct statx_timestamp	stx_btime;
      		struct statx_timestamp	stx_ctime;
      		struct statx_timestamp	stx_mtime;
      		__u32	stx_rdev_major;
      		__u32	stx_rdev_minor;
      		__u32	stx_dev_major;
      		__u32	stx_dev_minor;
      		__u64	__spare2[14];
      	};
      
      The defined bits in request_mask and stx_mask are:
      
      	STATX_TYPE		Want/got stx_mode & S_IFMT
      	STATX_MODE		Want/got stx_mode & ~S_IFMT
      	STATX_NLINK		Want/got stx_nlink
      	STATX_UID		Want/got stx_uid
      	STATX_GID		Want/got stx_gid
      	STATX_ATIME		Want/got stx_atime{,_ns}
      	STATX_MTIME		Want/got stx_mtime{,_ns}
      	STATX_CTIME		Want/got stx_ctime{,_ns}
      	STATX_INO		Want/got stx_ino
      	STATX_SIZE		Want/got stx_size
      	STATX_BLOCKS		Want/got stx_blocks
      	STATX_BASIC_STATS	[The stuff in the normal stat struct]
      	STATX_BTIME		Want/got stx_btime{,_ns}
      	STATX_ALL		[All currently available stuff]
      
      stx_btime is the file creation time, stx_mask is a bitmask indicating the
      data provided and __spares*[] are where as-yet undefined fields can be
      placed.
      
      Time fields are structures with separate seconds and nanoseconds fields
      plus a reserved field in case we want to add even finer resolution.  Note
      that times will be negative if before 1970; in such a case, the nanosecond
      fields will also be negative if not zero.
      
      The bits defined in the stx_attributes field convey information about a
      file, how it is accessed, where it is and what it does.  The following
      attributes map to FS_*_FL flags and are the same numerical value:
      
      	STATX_ATTR_COMPRESSED		File is compressed by the fs
      	STATX_ATTR_IMMUTABLE		File is marked immutable
      	STATX_ATTR_APPEND		File is append-only
      	STATX_ATTR_NODUMP		File is not to be dumped
      	STATX_ATTR_ENCRYPTED		File requires key to decrypt in fs
      
      Within the kernel, the supported flags are listed by:
      
      	KSTAT_ATTR_FS_IOC_FLAGS
      
      [Are any other IOC flags of sufficient general interest to be exposed
      through this interface?]
      
      New flags include:
      
      	STATX_ATTR_AUTOMOUNT		Object is an automount trigger
      
      These are for the use of GUI tools that might want to mark files specially,
      depending on what they are.
      
      Fields in struct statx come in a number of classes:
      
       (0) stx_dev_*, stx_blksize.
      
           These are local system information and are always available.
      
       (1) stx_mode, stx_nlinks, stx_uid, stx_gid, stx_[amc]time, stx_ino,
           stx_size, stx_blocks.
      
           These will be returned whether the caller asks for them or not.  The
           corresponding bits in stx_mask will be set to indicate whether they
           actually have valid values.
      
           If the caller didn't ask for them, then they may be approximated.  For
           example, NFS won't waste any time updating them from the server,
           unless as a byproduct of updating something requested.
      
           If the values don't actually exist for the underlying object (such as
           UID or GID on a DOS file), then the bit won't be set in the stx_mask,
           even if the caller asked for the value.  In such a case, the returned
           value will be a fabrication.
      
           Note that there are instances where the type might not be valid, for
           instance Windows reparse points.
      
       (2) stx_rdev_*.
      
           This will be set only if stx_mode indicates we're looking at a
           blockdev or a chardev, otherwise will be 0.
      
       (3) stx_btime.
      
           Similar to (1), except this will be set to 0 if it doesn't exist.
      
      =======
      TESTING
      =======
      
      The following test program can be used to test the statx system call:
      
      	samples/statx/test-statx.c
      
      Just compile and run, passing it paths to the files you want to examine.
      The file is built automatically if CONFIG_SAMPLES is enabled.
      
      Here's some example output.  Firstly, an NFS directory that crosses to
      another FSID.  Note that the AUTOMOUNT attribute is set because transiting
      this directory will cause d_automount to be invoked by the VFS.
      
      	[root@andromeda ~]# /tmp/test-statx -A /warthog/data
      	statx(/warthog/data) = 0
      	results=7ff
      	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
      	Device: 00:26           Inode: 1703937     Links: 125
      	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
      	Access: 2016-11-24 09:02:12.219699527+0000
      	Modify: 2016-11-17 10:44:36.225653653+0000
      	Change: 2016-11-17 10:44:36.225653653+0000
      	Attributes: 0000000000001000 (-------- -------- -------- -------- -------- -------- ---m---- --------)
      
      Secondly, the result of automounting on that directory.
      
      	[root@andromeda ~]# /tmp/test-statx /warthog/data
      	statx(/warthog/data) = 0
      	results=7ff
      	  Size: 4096            Blocks: 8          IO Block: 1048576  directory
      	Device: 00:27           Inode: 2           Links: 125
      	Access: (3777/drwxrwxrwx)  Uid:     0   Gid:  4041
      	Access: 2016-11-24 09:02:12.219699527+0000
      	Modify: 2016-11-17 10:44:36.225653653+0000
      	Change: 2016-11-17 10:44:36.225653653+0000
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a528d35e
  14. 18 2月, 2017 1 次提交
  15. 07 2月, 2017 1 次提交
    • N
      NFSDv4: use export cache flushtime for changeid on V4ROOT objects. · b8800921
      NeilBrown 提交于
      If you change the set of filesystems that are exported, then
      the contents of various directories in the NFSv4 pseudo-root
      is likely to change.  However the change-id of those
      directories is currently tied to the underlying directory,
      so the client may not see the changes in a timely fashion.
      
      This patch changes the change-id number to be derived from the
      "flush_time" of the export cache.  Whenever any changes are
      made to the set of exported filesystems, this flush_time is
      updated.  The result is that clients see changes to the set
      of exported filesystems much more quickly, often immediately.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b8800921
  16. 01 2月, 2017 3 次提交
    • J
      nfsd: opt in to labeled nfs per export · 32ddd944
      J. Bruce Fields 提交于
      Currently turning on NFSv4.2 results in 4.2 clients suddenly seeing the
      individual file labels as they're set on the server.  This is not what
      they've previously seen, and not appropriate in may cases.  (In
      particular, if clients have heterogenous security policies then one
      client's labels may not even make sense to another.)  Labeled NFS should
      be opted in only in those cases when the administrator knows it makes
      sense.
      
      It's helpful to be able to turn 4.2 on by default, and otherwise the
      protocol upgrade seems free of regressions.  So, default labeled NFS to
      off and provide an export flag to reenable it.
      
      Users wanting labeled NFS support on an export will henceforth need to:
      
      	- make sure 4.2 support is enabled on client and server (as
      	  before), and
      	- upgrade the server nfs-utils to a version supporting the new
      	  "security_label" export flag.
      	- set that "security_label" flag on the export.
      
      This is commit may be seen as a regression to anyone currently depending
      on security labels.  We believe those cases are currently rare.
      
      Reported-by: tibbs@math.uh.edu
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      32ddd944
    • J
      nfsd: constify nfsd_suppatttrs · 5cf23dbb
      J. Bruce Fields 提交于
      To keep me from accidentally writing to this again....
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      5cf23dbb
    • K
      NFSD: pass an integer for stable type to nfsd_vfs_write · 54bbb7d2
      Kinglong Mee 提交于
      After fae5096a "nfsd: assume writeable exportabled filesystems have
      f_sync" we no longer modify this argument.
      
      This is just cleanup, no change in functionality.
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      54bbb7d2
  17. 13 1月, 2017 1 次提交
  18. 16 12月, 2016 1 次提交
  19. 09 12月, 2016 1 次提交
  20. 02 11月, 2016 2 次提交
    • J
      nfsd: catch errors in decode_fattr earlier · e864c189
      J. Bruce Fields 提交于
      3c8e0316 "NFSv4: do exact check about attribute specified" fixed
      some handling of unsupported-attribute errors, but it also delayed
      checking for unwriteable attributes till after we decode them.  This
      could lead to odd behavior in the case a client attemps to set an
      attribute we don't know about followed by one we try to parse.  In that
      case the parser for the known attribute will attempt to parse the
      unknown attribute.  It should fail in some safe way, but the error might
      at least be incorrect (probably bad_xdr instead of inval).  So, it's
      better to do that check at the start.
      
      As far as I know this doesn't cause any problems with current clients
      but it might be a minor issue e.g. if we encounter a future client that
      supports a new attribute that we currently don't.
      
      Cc: Yu Zhiguo <yuzg@cn.fujitsu.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      e864c189
    • J
      nfsd: clean up supported attribute handling · 916d2d84
      J. Bruce Fields 提交于
      Minor cleanup, no change in behavior.
      
      Provide helpers for some common attribute bitmap operations.  Drop some
      comments that just echo the code.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      916d2d84
  21. 08 10月, 2016 1 次提交
  22. 23 9月, 2016 1 次提交
  23. 16 7月, 2016 1 次提交
  24. 14 7月, 2016 1 次提交
    • A
      nfsd: implement machine credential support for some operations · ed941643
      Andrew Elble 提交于
      This addresses the conundrum referenced in RFC5661 18.35.3,
      and will allow clients to return state to the server using the
      machine credentials.
      
      The biggest part of the problem is that we need to allow the client
      to send a compound op with integrity/privacy on mounts that don't
      have it enabled.
      
      Add server support for properly decoding and using spo_must_enforce
      and spo_must_allow bits. Add support for machine credentials to be
      used for CLOSE, OPEN_DOWNGRADE, LOCKU, DELEGRETURN,
      and TEST/FREE STATEID.
      Implement a check so as to not throw WRONGSEC errors when these
      operations are used if integrity/privacy isn't turned on.
      
      Without this, Linux clients with credentials that expired while holding
      delegations were getting stuck in an endless loop.
      Signed-off-by: NAndrew Elble <aweits@rit.edu>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ed941643