1. 17 Oct 2020, 1 commit
  2. 06 Oct 2020, 1 commit
    • D
      x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}() · ec6347bb
      Dan Williams authored
      In reaction to a proposal to introduce a memcpy_mcsafe_fast()
      implementation Linus points out that memcpy_mcsafe() is poorly named
      relative to communicating the scope of the interface. Specifically what
      addresses are valid to pass as source, destination, and what faults /
      exceptions are handled.
      
      Of particular concern is that even though x86 might be able to handle
      the semantics of copy_mc_to_user() with its common copy_user_generic()
      implementation other archs likely need / want an explicit path for this
      case:
      
        On Fri, May 1, 2020 at 11:28 AM Linus Torvalds <torvalds@linux-foundation.org> wrote:
        >
        > On Thu, Apr 30, 2020 at 6:21 PM Dan Williams <dan.j.williams@intel.com> wrote:
        > >
        > > However now I see that copy_user_generic() works for the wrong reason.
        > > It works because the exception on the source address due to poison
        > > looks no different than a write fault on the user address to the
        > > caller, it's still just a short copy. So it makes copy_to_user() work
        > > for the wrong reason relative to the name.
        >
        > Right.
        >
        > And it won't work that way on other architectures. On x86, we have a
        > generic function that can take faults on either side, and we use it
        > for both cases (and for the "in_user" case too), but that's an
        > artifact of the architecture oddity.
        >
        > In fact, it's probably wrong even on x86 - because it can hide bugs -
        > but writing those things is painful enough that everybody prefers
        > having just one function.
      
      Replace a single top-level memcpy_mcsafe() with either
      copy_mc_to_user(), or copy_mc_to_kernel().
      
      Introduce an x86 copy_mc_fragile() name as the rename for the
      low-level x86 implementation formerly named memcpy_mcsafe(). It is used
      as the slow / careful backend that is supplanted by a fast
      copy_mc_generic() in a follow-on patch.
      
      One side-effect of this reorganization is that separating copy_mc_64.S
      to its own file means that perf no longer needs to track dependencies
      for its memcpy_64.S benchmarks.
      
       [ bp: Massage a bit. ]
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: <stable@vger.kernel.org>
      Link: http://lore.kernel.org/r/CAHk-=wjSqtXAqfUJxFtWNwmguFASTgB0dz1dT3V-78Quiezqbg@mail.gmail.com
      Link: https://lkml.kernel.org/r/160195561680.2163339.11574962055305783722.stgit@dwillia2-desk3.amr.corp.intel.com
      ec6347bb
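The split entry points named in this commit can be sketched in userspace C. This is illustrative only: the real kernel versions recover from machine-check faults via exception-table entries, which cannot be modeled here, and `copy_mc_fragile()` merely stands in for the former `memcpy_mcsafe()` backend. All three return the number of bytes NOT copied, 0 on full success.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the renamed entry points (illustrative only).
 * Return value: number of bytes NOT copied; 0 means full success. */
static size_t copy_mc_fragile(void *dst, const void *src, size_t len)
{
        /* stand-in for the former memcpy_mcsafe() slow/careful backend */
        memcpy(dst, src, len);
        return 0;
}

static size_t copy_mc_to_kernel(void *dst, const void *src, size_t len)
{
        /* source may be poisoned memory; destination is a kernel buffer */
        return copy_mc_fragile(dst, src, len);
}

static size_t copy_mc_to_user(void *udst, const void *src, size_t len)
{
        /* a real arch must also handle faults on the user-side destination */
        return copy_mc_fragile(udst, src, len);
}
```

The point of the rename is visible in the signatures: each name states which side may fault, where memcpy_mcsafe() said nothing about either.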
  3. 03 Oct 2020, 2 commits
    • C
      iov_iter: transparently handle compat iovecs in import_iovec · 89cd35c5
      Christoph Hellwig authored
      Use in compat_syscall to import either native or the compat iovecs, and
      remove the now superfluous compat_import_iovec.
      
      This removes the need for special compat logic in most callers, and
      the remaining ones can still be simplified by using __import_iovec
      with a bool compat parameter.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      89cd35c5
    • C
      iov_iter: refactor rw_copy_check_uvector and import_iovec · bfdc5970
      Christoph Hellwig authored
      Split rw_copy_check_uvector into two new helpers with more sensible
      calling conventions:
      
       - iovec_from_user copies an iovec from userspace either into the provided
         stack buffer if it fits, or allocates a new buffer for it.  Returns
         the actually used iovec.  It also verifies that iov_len does fit a
         signed type, and handles compat iovecs if the compat flag is set.
       - __import_iovec consolidates the native and compat versions of
         import_iovec. It calls iovec_from_user, then validates each iovec
         actually points to user addresses, and ensures the total length
         doesn't overflow.
      
      This has two major implications:
      
       - the access_process_vm case loses the total length checking, which
         wasn't required anyway, given that each call receives two iovecs
         for the local and remote side of the operation, and it verifies
         the total length on the local side already.
       - instead of a single loop there now are two loops over the iovecs.
         Given that the iovecs are cache hot this doesn't make a major
         difference.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bfdc5970
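The iovec_from_user shape described above (stack buffer when it fits, heap allocation otherwise, iov_len checked against a signed type) can be sketched in userspace. Names carry an `_m` suffix and the struct is redefined locally because this is a model of the commit text, not the kernel code; compat handling is omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

struct iovec_m { void *iov_base; size_t iov_len; };

/* Userspace sketch: copy the vector into the caller-provided fast array
 * when it fits, otherwise allocate; reject any iov_len that does not fit
 * a signed type.  Returns the iovec actually used, or NULL on error. */
static struct iovec_m *iovec_from_user_model(const struct iovec_m *uvec,
                                             unsigned long nr_segs,
                                             unsigned long fast_segs,
                                             struct iovec_m *fast_iov)
{
        struct iovec_m *iov = fast_iov;
        unsigned long i;

        if (nr_segs > fast_segs) {
                iov = malloc(nr_segs * sizeof(*iov));
                if (!iov)
                        return NULL;
        }
        for (i = 0; i < nr_segs; i++) {
                if ((ssize_t)uvec[i].iov_len < 0) {  /* must fit a signed type */
                        if (iov != fast_iov)
                                free(iov);
                        return NULL;
                }
                iov[i] = uvec[i];
        }
        return iov;
}
```

__import_iovec would then loop again over the returned array to validate addresses and the total length, which is the second loop the commit mentions.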
  4. 25 Sep 2020, 1 commit
  5. 21 Aug 2020, 3 commits
    • A
      saner calling conventions for csum_and_copy_..._user() · c693cc46
      Al Viro authored
      All callers of these primitives will
      	* discard anything we might've copied in case of error
      	* ignore the csum value in case of error
      	* always pass 0xffffffff as the initial sum, so the
      resulting csum value (in case of success, that is) will never be 0.
      
      That suggests the following calling conventions:
      	* don't pass err_ptr - just return 0 on error.
      	* don't bother with zeroing destination, etc. in case of error
      	* don't pass the initial sum - just use 0xffffffff.
      
      This commit does the minimal conversion in the instances of csum_and_copy_...();
      the changes of actual asm code behind them are done later in the series.
      Note that this asm code is often shared with csum_partial_copy_nocheck();
      the difference is that csum_partial_copy_nocheck() passes 0 for initial
      sum while csum_and_copy_..._user() passes 0xffffffff.  Fortunately, we are
      free to pass 0xffffffff in all cases and subsequent patches will use that
      freedom without any special comments.
      
      A part that could be split off: parisc and uml/i386 claimed to have
      csum_and_copy_to_user() instances of their own, but those were identical
      to the generic one, so we simply drop them.  Not sure if it's worth
      a separate commit...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      c693cc46
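The property this series relies on can be demonstrated directly: a folded 16-bit ones'-complement checksum started from 0xffffffff is equal mod 0xffff to one started from 0, and can never come out as 0 on success. The following is a plain userspace model of such a checksum, not the kernel's asm implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator down to 16 bits with end-around carry. */
static uint16_t csum_fold16(uint64_t sum)
{
        while (sum >> 16)
                sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
}

/* Ones'-complement sum of 16-bit big-endian words, from a given initial
 * sum.  Model only; the kernel versions are arch-specific asm. */
static uint16_t csum_model(const uint8_t *p, size_t n, uint32_t initial)
{
        uint64_t sum = initial;
        size_t i;

        for (i = 0; i + 1 < n; i += 2)
                sum += (uint32_t)((p[i] << 8) | p[i + 1]);
        if (n & 1)
                sum += (uint32_t)(p[n - 1] << 8);
        return csum_fold16(sum);
}
```

Since 0xffffffff folds to 0xffff ("negative zero"), adding it changes nothing mod 0xffff, but it forces the folded result to be nonzero, so 0 becomes unambiguously usable as the error return.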
    • A
      csum_and_copy_..._user(): pass 0xffffffff instead of 0 as initial sum · 99a2c96d
      Al Viro authored
      Preparation for the change of calling conventions; right now all
      callers pass 0 as initial sum.  Passing 0xffffffff instead yields
      the values comparable mod 0xffff and guarantees that 0 will not
      be returned on success.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      99a2c96d
    • A
      csum_partial_copy_nocheck(): drop the last argument · cc44c17b
      Al Viro authored
      It's always 0.  Note that we theoretically could use ~0U as well -
      result will be the same modulo 0xffff, _if_ the damn thing did the
      right thing for any value of initial sum; later we'll make use of
      that when convenient.
      
      However, unlike csum_and_copy_..._user(), there are instances that
      did not work for arbitrary initial sums; c6x is one such.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      cc44c17b
  6. 30 Jun 2020, 1 commit
    • H
      iov_iter: Move unnecessary inclusion of crypto/hash.h · 7999096f
      Herbert Xu authored
      The header file linux/uio.h includes crypto/hash.h which pulls in
      most of the Crypto API.  Since linux/uio.h is used throughout the
      kernel this means that every tiny bit of change to the Crypto API
      causes the entire kernel to get rebuilt.
      
      This patch fixes this by moving it into lib/iov_iter.c instead
      where it is actually used.
      
      This patch also fixes the ifdef to use CRYPTO_HASH instead of just
      CRYPTO which does not guarantee the existence of ahash.
      
      Unfortunately a number of drivers were relying on linux/uio.h to
      provide access to linux/slab.h.  This patch adds inclusions of
      linux/slab.h as detected by build failures.
      
      Also skbuff.h was relying on this to provide a declaration for
      ahash_request.  This patch adds a forward declaration instead.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      7999096f
  7. 21 Mar 2020, 1 commit
  8. 17 Dec 2019, 1 commit
  9. 16 Nov 2019, 1 commit
    • D
      pipe: Allow pipes to have kernel-reserved slots · 6718b6f8
      David Howells authored
      Split pipe->ring_size into two numbers:
      
       (1) pipe->ring_size - indicates the hard size of the pipe ring.
      
       (2) pipe->max_usage - indicates the maximum number of pipe ring slots that
           userspace orchestrated events can fill.
      
      This allows for a pipe that is both writable by the general kernel
      notification facility and by userspace, allowing plenty of ring space for
      notifications to be added whilst preventing userspace from being able to
      pin too much unswappable kernel space.
      Signed-off-by: David Howells <dhowells@redhat.com>
      6718b6f8
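The ring_size/max_usage split can be modeled with plain unsigned arithmetic: the ring has ring_size slots, but userspace-driven fills are capped at max_usage so kernel notifications keep reserved room. The helper name below mirrors the kernel's pipe_space_for_user(), but the body is a hypothetical userspace sketch.

```c
#include <assert.h>

/* How many slots userspace may still fill, given free-running head/tail
 * counters and a cap (max_usage) below the hard ring size.  Sketch only. */
static unsigned int pipe_space_for_user_model(unsigned int head,
                                              unsigned int tail,
                                              unsigned int max_usage)
{
        unsigned int occupancy = head - tail;   /* wrap-safe unsigned diff */

        return occupancy < max_usage ? max_usage - occupancy : 0;
}
```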
  10. 31 Oct 2019, 1 commit
    • D
      pipe: Use head and tail pointers for the ring, not cursor and length · 8cefc107
      David Howells authored
      Convert pipes to use head and tail pointers for the buffer ring rather than
      pointer and length as the latter requires two atomic ops to update (or a
      combined op) whereas the former only requires one.
      
       (1) The head pointer is the point at which production occurs and points to
           the slot in which the next buffer will be placed.  This is equivalent
           to pipe->curbuf + pipe->nrbufs.
      
           The head pointer belongs to the write-side.
      
       (2) The tail pointer is the point at which consumption occurs.  It points
           to the next slot to be consumed.  This is equivalent to pipe->curbuf.
      
           The tail pointer belongs to the read-side.
      
       (3) head and tail are allowed to run to UINT_MAX and wrap naturally.  They
           are only masked off when the array is being accessed, e.g.:
      
      	pipe->bufs[head & mask]
      
           This means that it is not necessary to have a dead slot in the ring as
           head == tail isn't ambiguous.
      
       (4) The ring is empty if "head == tail".
      
           A helper, pipe_empty(), is provided for this.
      
       (5) The occupancy of the ring is "head - tail".
      
           A helper, pipe_occupancy(), is provided for this.
      
       (6) The number of free slots in the ring is "pipe->ring_size - occupancy".
      
           A helper, pipe_space_for_user() is provided to indicate how many slots
           userspace may use.
      
       (7) The ring is full if "head - tail >= pipe->ring_size".
      
           A helper, pipe_full(), is provided for this.
      Signed-off-by: David Howells <dhowells@redhat.com>
      8cefc107
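The helpers enumerated above can be modeled directly in userspace C (with `_m` suffixes to mark them as sketches). The ring size must be a power of two so that `index & mask` works; head and tail are free-running unsigned counters that wrap naturally at UINT_MAX, which is exactly why no dead slot is needed.

```c
#include <assert.h>
#include <limits.h>

/* (4) the ring is empty iff head == tail */
static int pipe_empty_m(unsigned int head, unsigned int tail)
{
        return head == tail;
}

/* (5) occupancy is head - tail; well-defined even across wrap-around,
 * because unsigned subtraction is modular */
static unsigned int pipe_occupancy_m(unsigned int head, unsigned int tail)
{
        return head - tail;
}

/* (7) the ring is full iff occupancy reaches the slot limit */
static int pipe_full_m(unsigned int head, unsigned int tail,
                       unsigned int limit)
{
        return pipe_occupancy_m(head, tail) >= limit;
}
```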
  11. 23 Oct 2019, 1 commit
    • A
      compat_ioctl: reimplement SG_IO handling · 98aaaec4
      Arnd Bergmann authored
      There are two code locations that implement the SG_IO ioctl: the old
      sg.c driver, and the generic scsi_ioctl helper that is in turn used by
      multiple drivers.
      
      To eradicate the old compat_ioctl conversion handler for the SG_IO
      command, I implement a readable pair of put_sg_io_hdr() / get_sg_io_hdr()
      helper functions that can be used for both compat and native mode,
      and then I call this from both drivers.
      
      For the iovec handling, there is already a compat_import_iovec() function
      that can simply be called in place of import_iovec().
      
      To avoid having to pass the compat/native state through multiple
      indirections, I mark the SG_IO command itself as compatible in
      fs/compat_ioctl.c and use in_compat_syscall() to figure out where
      we are called from.
      
      As a side-effect of this, the sg.c driver now also accepts the 32-bit
      sg_io_hdr format in compat mode using the read/write interface, not
      just ioctl. This should improve compatibility with old 32-bit binaries,
      but it would break if any application intentionally passes the 64-bit
      data structure in compat mode here.
      
      Steffen Maier helped debug an issue in an earlier version of this patch.
      
      Cc: Steffen Maier <maier@linux.ibm.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Doug Gilbert <dgilbert@interlog.com>
      Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      98aaaec4
  12. 25 Sep 2019, 1 commit
  13. 01 Jun 2019, 1 commit
  14. 21 May 2019, 1 commit
  15. 15 May 2019, 1 commit
    • I
      mm/gup: change GUP fast to use flags rather than a write 'bool' · 73b0140b
      Ira Weiny authored
      To facilitate additional options to get_user_pages_fast() change the
      singular write parameter to be gup_flags.
      
      This patch does not change any functionality.  New functionality will
      follow in subsequent patches.
      
      Some of the get_user_pages_fast() call sites were unchanged because they
      already passed FOLL_WRITE or 0 for the write parameter.
      
      NOTE: It was suggested to change the ordering of the get_user_pages_fast()
      arguments to ensure that callers were converted.  This breaks the current
      GUP call site convention of having the returned pages be the final
      parameter.  So the suggestion was rejected.
      
      Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Reviewed-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      73b0140b
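The bool-to-flags conversion described above is mechanical and easy to sketch. The flag values below are illustrative stand-ins (hence the `_M` suffix), not the kernel's actual FOLL_* constants; the point is that a flags word, unlike a bool, leaves room for later options to be OR'd in.

```c
#include <assert.h>

#define FOLL_WRITE_M    0x01u
#define FOLL_LONGTERM_M 0x02u   /* example later flag; hypothetical value */

/* What each converted call site effectively does with its old bool. */
static unsigned int gup_flags_from_write(int write)
{
        return write ? FOLL_WRITE_M : 0;
}
```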
  16. 04 Apr 2019, 1 commit
  17. 27 Feb 2019, 1 commit
    • E
      iov_iter: optimize page_copy_sane() · 6daef95b
      Eric Dumazet authored
      Avoid cache line miss dereferencing struct page if we can.
      
      page_copy_sane() mostly deals with order-0 pages.
      
      Extra cache line miss is visible on TCP recvmsg() calls dealing
      with GRO packets (typically 45 page frags are attached to one skb).
      
      Bringing the 45 struct pages into cpu cache while copying the data
      is not free, since the freeing of the skb (and associated
      page frags put_page()) can happen after cache lines have been evicted.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6daef95b
  18. 04 Jan 2019, 1 commit
    • L
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds authored
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96d4f267
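What remains after dropping the type argument is a pure range check. The following userspace model shows the two-argument shape this commit leaves behind; TASK_SIZE_M is a made-up user-space limit and the body is a sketch, not any architecture's actual access_ok().

```c
#include <assert.h>
#include <stdint.h>

#define TASK_SIZE_M 0x0000800000000000ull   /* hypothetical user-space limit */

/* Two-argument access_ok(): does [addr, addr+size) fit below the limit,
 * with no overflow?  No VERIFY_READ/VERIFY_WRITE distinction remains. */
static int access_ok_m(uint64_t addr, uint64_t size)
{
        return size <= TASK_SIZE_M && addr <= TASK_SIZE_M - size;
}
```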
  19. 13 Dec 2018, 2 commits
  20. 28 Nov 2018, 1 commit
  21. 26 Nov 2018, 1 commit
  22. 24 Oct 2018, 3 commits
    • D
      iov_iter: Add I/O discard iterator · 9ea9ce04
      David Howells authored
      Add a new iterator, ITER_DISCARD, that can only be used in READ mode and
      just discards any data copied to it.
      
      This is useful in a network filesystem for discarding any unwanted data
      sent by a server.
      Signed-off-by: David Howells <dhowells@redhat.com>
      9ea9ce04
    • D
      iov_iter: Separate type from direction and use accessor functions · aa563d7b
      David Howells authored
      In the iov_iter struct, separate the iterator type from the iterator
      direction and use accessor functions to access them in most places.
      
      Convert a bunch of places to use switch-statements to access them rather
      than chains of bitwise-AND statements.  This makes it easier to add further
      iterator types.  It can also be more efficient: to implement a switch over
      small contiguous integers, the compiler can use roughly 50% fewer compare
      instructions than it needs for a chain of bitwise-AND tests.
      
      Further, cease passing the iterator type into the iterator setup function.
      The iterator function can set that itself.  Only the direction is required.
      Signed-off-by: David Howells <dhowells@redhat.com>
      aa563d7b
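The separation described above can be sketched with the direction in the low bit of the type field and the iterator kind in the bits above it, behind accessors so callers can switch on the kind. The `_M` names and values are illustrative, not the kernel's actual constants.

```c
#include <assert.h>

#define ITER_READ_M    0u
#define ITER_WRITE_M   1u
#define ITER_IOVEC_M   4u
#define ITER_KVEC_M    8u
#define ITER_BVEC_M   16u

/* Accessor: the iterator kind, with the direction bit stripped. */
static unsigned int iov_iter_type_m(unsigned int type_field)
{
        return type_field & ~1u;
}

/* Accessor: the direction only (READ or WRITE). */
static unsigned int iov_iter_rw_m(unsigned int type_field)
{
        return type_field & 1u;
}
```

Because iov_iter_type_m() returns one of a few small constants, call sites can use a switch statement instead of a chain of bitwise-AND tests.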
    • D
      iov_iter: Use accessor function · 00e23707
      David Howells authored
      Use accessor functions to access an iterator's type and direction.  This
      allows for the possibility of using some other method of determining the
      type of iterator than if-chains with bitwise-AND conditions.
      Signed-off-by: David Howells <dhowells@redhat.com>
      00e23707
  23. 16 Jul 2018, 3 commits
  24. 15 May 2018, 1 commit
  25. 03 May 2018, 2 commits
  26. 12 Oct 2017, 1 commit
  27. 21 Sep 2017, 1 commit
  28. 07 Jul 2017, 1 commit
    • A
      iov_iter: saner checks on copyin/copyout · 09fc68dc
      Al Viro authored
      * might_fault() is better checked in caller (and e.g. fault-in + kmap_atomic
      codepath also needs might_fault() coverage)
      * we have already done object size checks
      * we have *NOT* done access_ok() recently enough; we rely upon the
      iovec array having passed sanity checks back when it had been created
      and nothing having buggered it since.  However, that's very much
      non-local, so we'd better recheck that.
      
      So the thing we want does not match anything in uaccess - we need
      access_ok + kasan checks + raw copy without any zeroing.  Just define
      such helpers and use them here.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      09fc68dc
  29. 30 Jun 2017, 2 commits
  30. 10 Jun 2017, 1 commit
    • D
      x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Dan Williams authored
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
      otherwise.
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      0aed55af
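The config gating described in this commit can be sketched as follows. The macro name and function body are userspace stand-ins: when the architecture does not advertise flushcache-capable uaccess, memcpy_flushcache() degrades to a plain memcpy with no cache-bypass guarantee (the real x86 path uses movnt stores fenced by sfence, which cannot be modeled portably here).

```c
#include <assert.h>
#include <string.h>

/* Stand-in for CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE; left undefined to
 * exercise the fallback path, as on non-x86 architectures. */
#ifndef ARCH_HAS_UACCESS_FLUSHCACHE_M
static void memcpy_flushcache_m(void *dst, const void *src, size_t n)
{
        /* fallback: ordinary copy, destination may remain cache-dirty */
        memcpy(dst, src, n);
}
#endif
```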