1. 13 Nov 2017: 17 commits
    • afs: Overhaul the callback handling · c435ee34
      Committed by David Howells
      Overhaul the AFS callback handling by the following means:
      
       (1) Don't give up callback promises on vnodes that we are no longer using,
           rather let them just expire on the server or let the server break
           them.  This is actually more efficient for the server as the callback
           lookup is expensive if there are lots of extant callbacks.
      
       (2) Only give up the callback promises we have from a server when the
           server record is destroyed.  Then we can just give up *all* the
           callback promises on it in one go.
      
       (3) Servers can end up being shared between cells if cells are aliased, so
           don't add all the vnodes being backed by a particular server into a
           big FID-indexed tree on that server as there may be duplicates.
      
           Instead have each volume instance (~= superblock) register an interest
           in a server as it starts to make use of it and use this to allow the
           processor for callbacks from the server to find the superblock and
           thence the inode corresponding to the FID being broken by means of
           ilookup_nowait().
      
       (4) Rather than iterating over the entire callback list when a mass-break
           comes in from the server, maintain a counter of mass-breaks in
           afs_server (cb_seq) and make afs_validate() check it against the copy
           in afs_vnode.
      
           It would be nice not to have to take a read_lock whilst doing this,
           but that's tricky without using RCU.
      
       (5) Save a ref on the fileserver we're using for a call in the afs_call
           struct so that we can access its cb_s_break during call decoding.
      
       (6) Write-lock around callback and status storage in a vnode and read-lock
           around getattr so that we don't see the status mid-update.
      
      This has the following consequences:
      
       (1) Data invalidation isn't seen until someone calls afs_validate() on a
           vnode.  Unfortunately, we need to use a key to query the server, but
           getting one from a background thread is tricky without caching loads
           of keys all over the place.
      
       (2) Mass invalidation isn't seen until someone calls afs_validate().
      
       (3) Callback breaking is going to hit the inode_hash_lock quite a bit.
           Could this be replaced with rcu_read_lock(), since inodes are destroyed
           under RCU conditions?
      Signed-off-by: David Howells <dhowells@redhat.com>
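      Point (4) above replaces a walk over the callback list with a counter comparison. The counter is called cb_seq in (4) and cb_s_break in (5). Below is a minimal userspace sketch of that check. It assumes hypothetical struct srv/struct vn types and helper names, and it leaves out the kernel's real structures and the read_lock mentioned in (4):

```c
#include <stdbool.h>

/* Hypothetical server record: its counter is bumped once per mass break. */
struct srv {
	unsigned int cb_s_break;
};

/* Hypothetical vnode: remembers the server counter it last saw. */
struct vn {
	struct srv *server;
	unsigned int cb_s_break;   /* snapshot of server->cb_s_break */
	bool cb_promised;          /* we believe we hold a callback promise */
};

/* Record a fresh callback promise and snapshot the server counter. */
static void vn_set_callback(struct vn *v)
{
	v->cb_s_break = v->server->cb_s_break;
	v->cb_promised = true;
}

/* A mass break costs O(1): nothing iterates over the vnodes. */
static void srv_mass_break(struct srv *s)
{
	s->cb_s_break++;
}

/* Validate-time check: a promise recorded before the latest mass break
 * is dropped, forcing the caller to re-fetch status from the server. */
static bool vn_validate(struct vn *v)
{
	if (v->cb_promised && v->cb_s_break != v->server->cb_s_break)
		v->cb_promised = false;
	return v->cb_promised;
}
```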
    • afs: Rename struct afs_call server member to cm_server · d0676a16
      Committed by David Howells
      Rename the server member of struct afs_call to cm_server as we're only
      going to be using it for incoming calls for the Cache Manager service.
      This makes it easier to differentiate from the pointer to the target server
      for the client, which will point to a different structure to allow for
      callback handling.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix the afs_uuid struct to make the char-sized fields signed · 03dc2cfc
      Committed by David Howells
      In AFS's encoding of a UUID, the eight 'char' fields are all signed, so
      represent them with __s8 rather than __u8.  This makes the compiler
      sign-extend them correctly when XDR-encoding them.
      Signed-off-by: David Howells <dhowells@redhat.com>
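      Assume each of these fields travels as one 4-byte big-endian XDR integer, since XDR widens a char to a full word. The signedness of the in-memory type then decides how the upper 24 bits are filled. A small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Store a 32-bit word big-endian, as XDR requires. */
static void put_be32(uint32_t w, uint8_t out[4])
{
	out[0] = (uint8_t)(w >> 24);
	out[1] = (uint8_t)(w >> 16);
	out[2] = (uint8_t)(w >> 8);
	out[3] = (uint8_t)w;
}

/* Signed field (the fixed behaviour): -128..-1 sign-extend to
 * 0xffffff80..0xffffffff. */
static void xdr_encode_s8(int8_t v, uint8_t out[4])
{
	put_be32((uint32_t)(int32_t)v, out);
}

/* Unsigned field (the old behaviour): 0x80..0xff zero-extend instead. */
static void xdr_encode_u8(uint8_t v, uint8_t out[4])
{
	put_be32((uint32_t)v, out);
}
```

      With __u8 fields, a byte such as 0x80 would go out as 0x00000080 rather than 0xffffff80.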
    • afs: Connect up the CB.ProbeUuid · f4b3526d
      Committed by David Howells
      The handler for the CB.ProbeUuid operation in the cache manager is
      implemented, but isn't listed in the switch-statement of operation
      selection, so won't be used.  Fix this by adding it.
      Signed-off-by: David Howells <dhowells@redhat.com>
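      This sketch shows why an implemented handler can still be dead code: operation selection is a switch, and only the operations it lists are reachable. The enum values and handler names are hypothetical stand-ins, not the kernel's CB opcode table:

```c
#include <stddef.h>

/* Hypothetical operation IDs; the real CB opcode numbers are not shown. */
enum cb_op { CB_OP_CALLBACK, CB_OP_PROBE, CB_OP_PROBE_UUID, CB_OP_OTHER };

typedef int (*cb_handler)(void);

/* Stub handlers standing in for the real delivery functions. */
static int deliver_cb_callback(void)   { return 1; }
static int deliver_cb_probe(void)      { return 2; }
static int deliver_cb_probe_uuid(void) { return 3; }

/* Operation selection: a handler that exists but has no case here is
 * never reached, which is the bug described above. */
static cb_handler cb_select_op(enum cb_op op)
{
	switch (op) {
	case CB_OP_CALLBACK:   return deliver_cb_callback;
	case CB_OP_PROBE:      return deliver_cb_probe;
	case CB_OP_PROBE_UUID: return deliver_cb_probe_uuid; /* the added case */
	default:               return NULL; /* unknown op: reject the call */
	}
}
```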
    • afs: Potentially return call->reply[0] from afs_make_call() · 33cd7f2b
      Committed by David Howells
      If call->ret_reply0 is set, return call->reply[0] on success.  Change the
      return type of afs_make_call() to long so that this can be passed back
      without bit loss and then cast to a pointer if required.
      Signed-off-by: David Howells <dhowells@redhat.com>
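      Below is a minimal sketch of the return convention. It assumes callers decode the long the way the kernel's IS_ERR_VALUE() does: negative errno values occupy the top 4095 values of the unsigned range, and anything else is a pointer. make_call() is a hypothetical stand-in for afs_make_call():

```c
#include <stdint.h>

#define MAX_ERRNO 4095

/* Mirrors the IS_ERR_VALUE() test: the top MAX_ERRNO values of the
 * unsigned range are negative errnos; anything else is a real value. */
static int is_err_value(long v)
{
	return (unsigned long)v >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for afs_make_call(): a negative errno on failure,
 * reply[0] cast to long on success when ret_reply0 is set, else 0.
 * long is as wide as a pointer on Linux ABIs, so no bits are lost. */
static long make_call(int error, int ret_reply0, void *reply0)
{
	if (error)
		return error;
	if (ret_reply0)
		return (long)(intptr_t)reply0;
	return 0;
}
```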
    • afs: Condense afs_call's reply{,2,3,4} into an array · 97e3043a
      Committed by David Howells
      Condense struct afs_call's reply anchor members - reply{,2,3,4} - into an
      array.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Consolidate abort_to_error translators · f780c8ea
      Committed by David Howells
      The AFS abort code space is shared across all services, so there's no need
      for separate abort_to_error translators for each service.
      
      Consolidate them into a single function and remove the function pointers
      for them.
      Signed-off-by: David Howells <dhowells@redhat.com>
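      With one shared code space, a single switch can serve every service. A sketch of such a translator follows. The abort-code values are the AFS volume-server codes as we understand them from OpenAFS, so verify them against the real headers. The errno chosen for each code is illustrative, not the kernel's table:

```c
#include <errno.h>
#include <stdint.h>

/* AFS volume-server abort codes (OpenAFS values; treat as illustrative). */
#define VNOVNODE   102
#define VDISKFULL  108
#define VOVERQUOTA 109
#define VBUSY      110

/* A single translator shared by all services, since the abort code space
 * is shared. The errno picked for each code is an illustrative choice. */
static int abort_to_error(uint32_t abort_code)
{
	switch (abort_code) {
	case VNOVNODE:   return -ENOENT;
	case VDISKFULL:  return -ENOSPC;
	case VOVERQUOTA: return -EDQUOT;
	case VBUSY:      return -EBUSY;
	default:         return -EIO;   /* unknown abort: generic I/O error */
	}
}
```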
    • afs: Allow IPv6 address specification of VL servers · 3838d3ec
      Committed by David Howells
      Allow VL server specifications to be given IPv6 addresses as well as IPv4
      addresses, for example as:
      
      	echo add foo.org 1111:2222:3333:0:4444:5555:6666:7777 >/proc/fs/afs/cells
      
      Note that ':' is the expected separator between IPv4 addresses, but
      if a ',' is detected or no '.' is detected in the string, the delimiter is
      switched to ','.
      
      This also works with DNS AFSDB or SRV record strings fetched by upcall from
      userspace.
      Signed-off-by: David Howells <dhowells@redhat.com>
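      The delimiter rule above is small enough to state as code. Below is a sketch in which vl_list_delimiter() is a hypothetical name; the kernel's actual parser is not shown:

```c
#include <string.h>

/* Pick the separator for a VL server address list: ':' by default, but
 * ',' if the string contains a ',' or contains no '.' at all (which is
 * the case for a list of IPv6 addresses). */
static char vl_list_delimiter(const char *s)
{
	if (strchr(s, ',') || !strchr(s, '.'))
		return ',';
	return ':';
}
```

      The example from the text, "1111:2222:3333:0:4444:5555:6666:7777", contains no '.', so it is split on ',' and its colons stay intact.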
    • afs: Keep and pass sockaddr_rxrpc addresses rather than in_addr · 4d9df986
      Committed by David Howells
      Keep and pass sockaddr_rxrpc addresses around rather than keeping and
      passing in_addr addresses to allow for the use of IPv6 and non-standard
      port numbers in future.
      
      This also allows the port and service_id fields to be removed from the
      afs_call struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
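      To show what an in_addr cannot carry, here is a hypothetical address record. Like struct sockaddr_rxrpc, it holds either an IPv4 or an IPv6 transport address together with its port. It uses only standard POSIX socket types, and peer_addr and peer_addr_set() are made-up names:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical address record: unlike a bare struct in_addr it can hold
 * an IPv6 address and carries its own port. */
struct peer_addr {
	sa_family_t family;
	union {
		struct sockaddr_in  sin;
		struct sockaddr_in6 sin6;
	} u;
};

/* Fill @pa from a textual IPv4 or IPv6 address; returns 0 or -1. */
static int peer_addr_set(struct peer_addr *pa, const char *ip, uint16_t port)
{
	struct in_addr a4;
	struct in6_addr a6;

	memset(pa, 0, sizeof(*pa));
	if (inet_pton(AF_INET, ip, &a4) == 1) {
		pa->family = AF_INET;
		pa->u.sin.sin_family = AF_INET;
		pa->u.sin.sin_port = htons(port);
		pa->u.sin.sin_addr = a4;
		return 0;
	}
	if (inet_pton(AF_INET6, ip, &a6) == 1) {
		pa->family = AF_INET6;
		pa->u.sin6.sin6_family = AF_INET6;
		pa->u.sin6.sin6_port = htons(port);
		pa->u.sin6.sin6_addr = a6;
		return 0;
	}
	return -1;
}
```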
    • afs: Update the cache index structure · ad6a942a
      Committed by David Howells
      Update the cache index structure in the following ways:
      
       (1) Don't use the volume name followed by the volume type as levels in the
           cache index.  Volumes can be renamed.  Use the volume ID instead.
      
       (2) Don't store the VLDB data for a volume in the tree.  If the volume
           database should be cached locally, then it should be done in a separate
           tree.
      
       (3) Expand the volume ID stored in the cache to 64 bits.
      
       (4) Expand the file/vnode ID stored in the cache to 96 bits.
      
       (5) Increment the cache structure version number to 1.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Add some protocol defs · 91a90380
      Committed by David Howells
      Add some protocol definitions, including max field lengths, flag defs, an
      XDR-encoded UUID def, more VL operation IDs and more fileserver abort
      codes.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Push the net ns pointer to more places · 9ed900b1
      Committed by David Howells
      Push the network namespace pointer to more places in AFS, including the
      afs_server structure (which doesn't hold a ref on the netns).
      
      In particular, afs_put_cell() now requires a net ns parameter so that
      it can safely alter the netns after decrementing the cell usage count - the
      cell will be deallocated by a background thread after being cached for a
      period, which means that it's not safe to access it after reducing its
      usage count.
      Signed-off-by: David Howells <dhowells@redhat.com>
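      The ordering constraint is the key point: once the usage count has been dropped, the cell may be freed at any time, so only the separately passed namespace may be touched. Below is a sketch with C11 atomics. It assumes hypothetical struct net_ns/struct cell types and a hypothetical manager_kicks counter standing in for whatever the real code does to the netns:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-namespace state. */
struct net_ns {
	atomic_int manager_kicks;   /* times the cell manager was poked */
};

/* Hypothetical cell: freed asynchronously by a reaper once unused. */
struct cell {
	atomic_int usage;
};

/* Drop a cell reference. After the decrement a background reaper may
 * free @cell at any moment, so the code below it touches only @net,
 * which the caller passes in for exactly that reason. */
static void put_cell(struct net_ns *net, struct cell *cell)
{
	if (!cell)
		return;
	if (atomic_fetch_sub(&cell->usage, 1) == 1)
		atomic_fetch_add(&net->manager_kicks, 1);
}
```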
    • afs: Note the cell in the superblock info also · 49566f6f
      Committed by David Howells
      Keep a reference to the cell in the superblock info structure in addition
      to the volume and net pointers.  This will make it easier to clean up in a
      future patch in which afs_put_volume() will need the cell pointer.
      
      Whilst we're at it, make the cell and volume get functions return a
      pointer to the object obtained, to make the call sites look neater.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Fix server reaping · 59fa1c4a
      Committed by David Howells
      Fix server reaping and make sure it's all done before we start trying to
      purge cells, given that servers currently pin cells.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • afs: Close the rxrpc socket only after purging the servers · e3b2ffe0
      Committed by David Howells
      Close the rxrpc socket only after we've purged the server records (and also
      cell and volume records which might refer to servers) so that we can give
      up the callbacks on each server.
      Signed-off-by: David Howells <dhowells@redhat.com>
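      Taken together with the server-reaping commit listed above (59fa1c4a), the teardown order is: servers first (giving up their callbacks over the still-open socket), then cells, then the socket. Below is a hypothetical sketch of that order; none of these names come from the kernel:

```c
#include <stdbool.h>

/* Hypothetical namespace state for the teardown-order sketch. */
struct afs_net_state {
	bool socket_open;
	int servers;             /* live server records */
	int cells;               /* live cell records (pinned by servers) */
	int callbacks_given_up;  /* give-up RPCs sent over the socket */
};

/* Giving up a server's callback promises needs an RPC, hence the socket. */
static int purge_servers(struct afs_net_state *n)
{
	if (n->servers && !n->socket_open)
		return -1;           /* too late: cannot tell the servers */
	n->callbacks_given_up += n->servers;
	n->servers = 0;
	return 0;
}

/* Safe only once no server pins a cell any more. */
static void purge_cells(struct afs_net_state *n)
{
	n->cells = 0;
}

/* Servers, then cells, then the socket. */
static int net_exit(struct afs_net_state *n)
{
	int ret = purge_servers(n);

	purge_cells(n);
	n->socket_open = false;
	return ret;
}
```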
    • afs: Lay the groundwork for supporting network namespaces · f044c884
      Committed by David Howells
      Lay the groundwork for supporting network namespaces (netns) in the AFS
      filesystem by moving various global features to a network-namespace struct
      (afs_net) and providing an instance of this as a temporary global variable
      that everything uses via accessor functions for the moment.
      
      The following changes have been made:
      
       (1) Store the netns in the superblock info.  This will be obtained from
           the mounter's nsproxy on a manual mount and inherited from the parent
           superblock on an automount.
      
       (2) The cell list is made per-netns.  It can be viewed through
           /proc/net/afs/cells and also be modified by writing commands to that
           file.
      
       (3) The local workstation cell is set per-ns in /proc/net/afs/rootcell.
           This is unset by default.
      
       (4) The 'rootcell' module parameter, which sets a cell and VL server list,
           modifies the init net namespace, thereby allowing an AFS root fs to be
           theoretically used.
      
       (5) The volume location lists and the file lock manager are made
           per-netns.
      
       (6) The AF_RXRPC socket and associated I/O bits are made per-ns.
      
      The various workqueues remain global for the moment.
      
      Changes still to be made:
      
       (1) /proc/fs/afs/ should be moved to /proc/net/afs/ and a symlink emplaced
           from the old name.
      
       (2) A per-netns subsys needs to be registered for AFS into which it can
           store its per-netns data.
      
       (3) Rather than the AF_RXRPC socket being opened on module init, it needs
           to be opened on the creation of a superblock in that netns.
      
       (4) The socket needs to be closed when the last superblock using it is
           destroyed and all outstanding client calls on it have been completed.
           This prevents a reference loop on the namespace.
      
       (5) It is possible that several namespaces will want to use AFS, in which
           case each one will need its own UDP port.  These can either be set
           through /proc/net/afs/cm_port or the kernel can pick one at random.
           The init_ns gets 7001 by default.
      
      Other issues that need resolving:
      
       (1) The DNS keyring needs net-namespacing.
      
       (2) Where do upcalls go (eg. DNS request-key upcall)?
      
       (3) Need something like open_socket_in_file_ns() syscall so that AFS
           command line tools attempting to operate on an AFS file/volume have
           their RPC calls go to the right place.
      Signed-off-by: David Howells <dhowells@redhat.com>
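      The "temporary global instance behind accessor functions" pattern described above can be sketched as follows. struct afs_net's fields, afs_net_global and afs_net_get() are illustrative names. The 7001 value is borrowed from the planned init_ns default listed above:

```c
#include <stddef.h>

/* Hypothetical per-network-namespace state. */
struct afs_net {
	const char *rootcell;     /* local workstation cell; unset by default */
	unsigned short cm_port;   /* cache manager UDP port */
};

/* For now, one global instance stands in for every namespace... */
static struct afs_net afs_net_global = { .rootcell = NULL, .cm_port = 7001 };

/* ...and all code reaches it through an accessor, so only this function
 * needs to change once the instance is looked up per namespace. */
static struct afs_net *afs_net_get(void)
{
	return &afs_net_global;
}
```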
    • Pass mode to wait_on_atomic_t() action funcs and provide default actions · 5e4def20
      Committed by David Howells
      Make wait_on_atomic_t() pass the TASK_* mode onto its action function as an
      extra argument and make it 'unsigned int' throughout.
      
      Also, consolidate a bunch of identical action functions into a default
      function that can do the appropriate thing for the mode.
      
      Also, change the argument name in the bit_wait*() function declarations to
      reflect the fact that it's the mode and not the bit number.
      
      [Peter Z gives this a grudging ACK, but thinks that the whole atomic_t wait
      should be done differently, though he's not immediately sure as to how]
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      cc: Ingo Molnar <mingo@kernel.org>
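      A userspace sketch of the new calling convention: the action callback receives the sleep mode, so one default action can decide whether a pending signal should end the wait. The task-state values, the simulated signal flag and all function names here are stand-ins, not the kernel's API:

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel task states (illustrative values). */
#define TASK_INTERRUPTIBLE   0x1u
#define TASK_UNINTERRUPTIBLE 0x2u

static bool fake_signal_pending;   /* simulated "current has a signal" */

/* Action callbacks now receive the sleep mode as an extra argument. */
typedef int (*wait_action_t)(atomic_int *counter, unsigned int mode);

/* One default action instead of many identical copies: a real one would
 * sleep first; a pending signal aborts only interruptible waits. */
static int default_wait_action(atomic_int *counter, unsigned int mode)
{
	(void)counter;
	if (fake_signal_pending && (mode & TASK_INTERRUPTIBLE))
		return -EINTR;
	return 0;
}

/* Wait for *counter to reach zero, passing the mode through to @action.
 * This sketch re-checks in a loop rather than sleeping. */
static int wait_on_atomic(atomic_int *counter, wait_action_t action,
			  unsigned int mode)
{
	while (atomic_load(counter) != 0) {
		int ret = action(counter, mode);

		if (ret)
			return ret;
	}
	return 0;
}
```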
  2. 12 Nov 2017: 1 commit
    • pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() · df27067e
      Committed by Arnd Bergmann
      __getnstimeofday() is a rather odd interface, with a number of quirks:
      
      - The caller may come from NMI context, but the implementation is not NMI safe.
        One way to get there from NMI is:
      
            NMI handler:
              something bad
                panic()
                  kmsg_dump()
                    pstore_dump()
                       pstore_record_init()
                         __getnstimeofday()
      
      - The calling conventions are different from any other timekeeping functions,
        to deal with returning an error code during suspended timekeeping.
      
      Address the above issues by using a completely different method to get the
      time: ktime_get_real_fast_ns() is NMI safe and has a reasonable behavior
      when timekeeping is suspended: it returns the time at which it got
      suspended. As Thomas Gleixner explained, this is safe, as
      ktime_get_real_fast_ns() does not call into the clocksource driver that
      might be suspended.
      
      The result can easily be transformed into a timespec structure. Since
      ktime_get_real_fast_ns() was not exported to modules, add the export.
      
      The pstore behavior for the suspended case changes slightly, as it now
      stores the timestamp at which timekeeping was suspended instead of storing
      a zero timestamp.
      
      This change does not address y2038 safety; that is the subject of a more
      complex follow-up patch.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Colin Cross <ccross@android.com>
      Link: https://lkml.kernel.org/r/20171110152530.1926955-1-arnd@arndb.de
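      The transformation mentioned above is plain division. The sketch below treats the input as the unsigned 64-bit nanosecond count that ktime_get_real_fast_ns() returns. ns_to_timespec_u64() is a hypothetical name, and the kernel has its own helpers for this:

```c
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL

/* Split a non-negative nanosecond wall-clock timestamp into seconds and
 * the nanosecond remainder. */
static struct timespec ns_to_timespec_u64(uint64_t ns)
{
	struct timespec ts;

	ts.tv_sec = (time_t)(ns / NSEC_PER_SEC);
	ts.tv_nsec = (long)(ns % NSEC_PER_SEC);
	return ts;
}
```

      On systems with a 32-bit time_t the seconds field still overflows in 2038, which is the follow-up issue noted above.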
  3. 03 Nov 2017: 4 commits
  4. 02 Nov 2017: 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Committed by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boilerplate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information.
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few thousand files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      The criteria used to select files for SPDX license identifier tagging were:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note", otherwise it was "GPL-2.0".  The results were:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were any new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with the SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types).  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
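      The first heuristic and the final tagging step can be sketched together. This is not the script Thomas and Greg used. It is a hypothetical C illustration that picks the default identifier from the path and emits it in the comment style the kernel's license rules ask for: '//' in .c files and a block comment in headers.

```c
#include <stdio.h>
#include <string.h>

/* Does @s end with @suffix? */
static int ends_with(const char *s, const char *suffix)
{
	size_t ls = strlen(s), lx = strlen(suffix);

	return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* First heuristic: with no license text found, the COPYING default
 * applies, plus the syscall note for uapi files. */
static const char *default_spdx_id(const char *path)
{
	return strstr(path, "/uapi/") ? "GPL-2.0 WITH Linux-syscall-note"
				      : "GPL-2.0";
}

/* Build the tag line in the comment style the file type expects. Other
 * file types (Makefiles, Kconfig, ...) need other syntax and are rejected
 * here. Returns 0 on success, -1 on failure or truncation. */
static int spdx_line(const char *path, char *buf, size_t len)
{
	const char *id = default_spdx_id(path);
	int n;

	if (ends_with(path, ".c"))
		n = snprintf(buf, len, "// SPDX-License-Identifier: %s", id);
	else if (ends_with(path, ".h"))
		n = snprintf(buf, len, "/* SPDX-License-Identifier: %s */", id);
	else
		return -1;
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```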
  5. 30 Oct 2017: 1 commit
  6. 27 Oct 2017: 1 commit
  7. 26 Oct 2017: 5 commits
  8. 25 Oct 2017: 2 commits
  9. 24 Oct 2017: 4 commits
    • ovl: do not cleanup unsupported index entries · fa0096e3
      Committed by Amir Goldstein
      With index=on, ovl_indexdir_cleanup() tries to clean up invalid index
      entries (e.g. bad index name). This behavior could result in cleaning of
      entries created by newer kernels and is therefore undesirable.
      Instead, abort mount if such entries are encountered. We still clean up
      'stale' entries and 'orphan' entries; both of those cases can be a result
      of offline changes to lower and upper dirs.
      
      When encountering an index entry of type directory or whiteout, the kernel
      was supposed to fall back to a read-only mount, but the fill_super()
      operation returns EROFS in this case instead of returning success with
      the read-only mount flag, so mount fails when encountering directory or
      whiteout index entries. Bless this behavior by returning -EINVAL on
      directory and whiteout index entries as we do for all unsupported index
      entries.
      
      Fixes: 61b67471 ("ovl: do not cleanup directory and whiteout index..")
      Cc: <stable@vger.kernel.org> # v4.13
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
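      The resulting policy can be summarised as a small decision function. The enum and function names are hypothetical; the real checks are made during ovl_indexdir_cleanup():

```c
#include <errno.h>

/* Hypothetical classification of an entry found in the index dir. */
enum index_entry_kind {
	INDEX_OK,        /* valid, matching entry */
	INDEX_STALE,     /* 'stale' entry */
	INDEX_ORPHAN,    /* 'orphan' entry */
	INDEX_BAD_NAME,  /* unparsable, possibly from a newer kernel */
	INDEX_DIR,       /* directory index entry */
	INDEX_WHITEOUT,  /* whiteout index entry */
};

/* Returns 1 to remove the entry, 0 to keep it, or -EINVAL to abort the
 * mount without touching the entry. */
static int index_entry_action(enum index_entry_kind kind)
{
	switch (kind) {
	case INDEX_OK:
		return 0;
	case INDEX_STALE:
	case INDEX_ORPHAN:
		return 1;        /* explainable by offline lower/upper changes */
	default:
		return -EINVAL;  /* unsupported: refuse to mount */
	}
}
```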
    • ovl: handle ENOENT on index lookup · 7937a56f
      Committed by Amir Goldstein
      Treat ENOENT from index entry lookup the same way as a returned
      negative dentry. Apparently, either could be returned if the file is not
      found, depending on the underlying file system.
      
      Fixes: 359f392c ("ovl: lookup index entry for copy up origin")
      Cc: <stable@vger.kernel.org> # v4.13
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
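      Below is a sketch of the normalisation. A hypothetical lookup_result type stands in for the dentry-or-error that a filesystem lookup can return:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup result: an error code, or an entry that may be
 * "negative" (a name with no file behind it). */
struct lookup_result {
	int err;
	const void *entry;
	int negative;
};

/* Fold both "not found" spellings into one answer: 1 if the index entry
 * exists, 0 if it was not found, or a negative errno for real errors. */
static int index_found(const struct lookup_result *r)
{
	if (r->err == -ENOENT)
		return 0;
	if (r->err)
		return r->err;
	return (r->entry && !r->negative) ? 1 : 0;
}
```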
    • ovl: fix EIO from lookup of non-indexed upper · 6eaf0111
      Committed by Amir Goldstein
      Commit fbaf94ee ("ovl: don't set origin on broken lower hardlink")
      attempted to avoid the condition of a non-indexed upper inode with a lower
      hardlink as origin. If this condition is found, lookup returns EIO.
      
      The protection of the commit mentioned above does not cover the case of a
      lower that is not a hardlink when it is copied up (with either index=off/on)
      and then the lower is hardlinked while overlay is offline.
      
      Changes to the lower layer while overlayfs is offline should not result in
      unexpected behavior, so a permanent EIO error after creating a link in the
      lower layer should not be considered correct behavior.
      
      This fix replaces EIO error with success in cases where upper has origin
      but no index is found, or index is found that does not match upper
      inode. In those cases, lookup will not fail and the returned overlay inode
      will be hashed by upper inode instead of by lower origin inode.
      
      Fixes: 359f392c ("ovl: lookup index entry for copy up origin")
      Cc: <stable@vger.kernel.org> # v4.13
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
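      Below is a simplified sketch of the new outcome. It covers only the case discussed above: an upper inode with an origin, looked up with an index. The type and function names are hypothetical, and the real decision in overlayfs depends on more state:

```c
#include <stdbool.h>

/* Hypothetical inputs for an upper inode that records a lower origin. */
struct origin_lookup {
	bool index_found;     /* an index entry exists for the origin */
	bool index_matches;   /* and it refers to this upper inode */
};

enum hash_key { HASH_BY_ORIGIN, HASH_BY_UPPER };

/* After the fix, neither "no index" nor "index for another inode" fails
 * the lookup with EIO; such inodes are hashed by the upper inode. */
static enum hash_key pick_hash_key(const struct origin_lookup *l)
{
	if (l->index_found && l->index_matches)
		return HASH_BY_ORIGIN;   /* assumed: consistent index keeps origin hashing */
	return HASH_BY_UPPER;
}
```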
    • xfs: fix AIM7 regression · 942491c9
      Committed by Christoph Hellwig
      Apparently our current rwsem code doesn't like doing the trylock, then
      lock for real scheme.  So change our read/write methods to just do the
      trylock for the RWF_NOWAIT case.  This fixes a ~25% regression in
      AIM7.
      
      Fixes: 91f9943e ("fs: support RWF_NOWAIT for buffered reads")
      Reported-by: kernel test robot <xiaolong.ye@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  10. 20 Oct 2017: 1 commit
    • membarrier: Provide register expedited private command · a961e409
      Committed by Mathieu Desnoyers
      This introduces a "register private expedited" membarrier command which
      allows eventual removal of important memory barrier constraints on the
      scheduler fast-paths. It changes how the "private expedited" membarrier
      command (new to 4.14) is used from user-space.
      
      This new command allows processes to register their intent to use the
      private expedited command.  This affects how the expedited private
      command introduced in 4.14-rc is meant to be used, and should be merged
      before 4.14 final.
      
      Processes are now required to register before using
      MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM.
      
      This fixes a problem that arose when designing requested extensions to
      sys_membarrier() to allow JITs to efficiently flush old code from
      instruction caches.  Several potential algorithms are much less painful
      if the user registers intent to use this functionality early on, for
      example, before the process spawns the second thread.  Registering at
      this time removes the need to interrupt each and every thread in that
      process at the first expedited sys_membarrier() system call.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
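      Below is a userspace sketch of the registration sequence. It uses the membarrier(2) system call and the constants from <linux/membarrier.h>. It assumes Linux 4.14 or later headers, and it reports rather than fails when the running kernel lacks the command:

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Raw system call; glibc has historically provided no wrapper. */
static int sys_membarrier(int cmd, unsigned int flags)
{
	return (int)syscall(__NR_membarrier, cmd, flags);
}

/* Returns 0 after registering and issuing one private expedited barrier,
 * 1 if the kernel does not offer the registration command, -1 on error. */
static int private_expedited_barrier(void)
{
	int mask = sys_membarrier(MEMBARRIER_CMD_QUERY, 0);

	if (mask < 0 || !(mask & MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED))
		return 1;
	/* Register intent first, ideally before the process spawns threads;
	 * without this step the command below fails with EPERM. */
	if (sys_membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED, 0) != 0)
		return -1;
	return sys_membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED, 0) == 0 ? 0 : -1;
}
```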
  11. 19 Oct 2017: 3 commits