1. 27 Feb 2019, 1 commit
    • iov_iter: optimize page_copy_sane() · 6daef95b
      Committed by Eric Dumazet
      Avoid a cache line miss from dereferencing struct page if we can.
      
      page_copy_sane() mostly deals with order-0 pages.
      
      The extra cache line miss is visible on TCP recvmsg() calls dealing
      with GRO packets (typically 45 page frags are attached to one skb).
      
      Bringing those 45 struct pages into the cpu cache while copying the data
      is not free, since the freeing of the skb (and the put_page() on the
      associated page frags) can happen after the cache lines have been evicted.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6daef95b
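      
      A minimal sketch of the fast path described above (hypothetical helper name;
      the actual lib/iov_iter.c code may differ in detail):
      
        /* Sketch: only dereference struct page when the copy can span pages. */
        static bool page_copy_sane_sketch(struct page *page, size_t offset, size_t n)
        {
                struct page *head;
                size_t v = n + offset;
        
                /* Order-0 fast path: no struct page access, no cache line miss. */
                if (n <= v && v <= PAGE_SIZE)
                        return true;
        
                /* Slow path: account for tail pages of a compound page. */
                head = compound_head(page);
                v += (page - head) << PAGE_SHIFT;
                return n <= v && v <= (PAGE_SIZE << compound_order(head));
        }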
  2. 04 Jan 2019, 1 commit
    • Remove 'type' argument from access_ok() function · 96d4f267
      Committed by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96d4f267
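      
      The shape of the conversion, as an illustrative before/after (buf and len
      are placeholders):
      
        /* Before: the type argument was required but unused. */
        if (!access_ok(VERIFY_WRITE, buf, len))
                return -EFAULT;
        
        /* After: only the user pointer and the length are checked. */
        if (!access_ok(buf, len))
                return -EFAULT;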
  3. 13 Dec 2018, 2 commits
  4. 28 Nov 2018, 1 commit
  5. 26 Nov 2018, 1 commit
  6. 24 Oct 2018, 3 commits
    • iov_iter: Add I/O discard iterator · 9ea9ce04
      Committed by David Howells
      Add a new iterator, ITER_DISCARD, that can only be used in READ mode and
      just discards any data copied to it.
      
      This is useful in a network filesystem for discarding any unwanted data
      sent by a server.
      Signed-off-by: David Howells <dhowells@redhat.com>
      9ea9ce04
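      
      A hedged usage sketch (variable names are placeholders): initialize a discard
      iterator over the unwanted byte count and feed it to the usual copy primitive,
      which then drops the data.
      
        struct iov_iter iter;
        size_t copied;
        
        /* READ direction: data flows from the source into the iterator. */
        iov_iter_discard(&iter, READ, unwanted_bytes);
        
        /* The bytes are counted but never stored anywhere. */
        copied = copy_to_iter(src_buf, unwanted_bytes, &iter);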
    • iov_iter: Separate type from direction and use accessor functions · aa563d7b
      Committed by David Howells
      In the iov_iter struct, separate the iterator type from the iterator
      direction and use accessor functions to access them in most places.
      
      Convert a bunch of places to use switch statements to access them rather
      than chains of bitwise-AND tests.  This makes it easier to add further
      iterator types.  It can also be more efficient: to implement a switch
      over small contiguous integers, the compiler can use roughly 50% fewer
      compare instructions than it needs for the equivalent bitwise-AND chain.
      
      Further, cease passing the iterator type into the iterator setup function.
      The setup function can set that itself; only the direction needs to be passed in.
      Signed-off-by: David Howells <dhowells@redhat.com>
      aa563d7b
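      
      A sketch of the new style, assuming the accessor and initializer names from
      this commit; iter, iov, nr_segs and count stand in for a caller's locals:
      
        /* Setup now takes only the direction; the flavour is implied. */
        iov_iter_init(&iter, READ, iov, nr_segs, count);
        
        /* Dispatch on the flavour with a switch instead of bitwise-AND chains. */
        switch (iov_iter_type(&iter)) {
        case ITER_IOVEC:
        case ITER_KVEC:
                /* memory-vector backed */
                break;
        case ITER_BVEC:
                /* bio_vec backed */
                break;
        case ITER_PIPE:
                /* pipe backed */
                break;
        default:
                break;
        }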
    • iov_iter: Use accessor function · 00e23707
      Committed by David Howells
      Use accessor functions to access an iterator's type and direction.  This
      allows the type of an iterator to be determined by some method other
      than if-chains with bitwise-AND conditions.
      Signed-off-by: David Howells <dhowells@redhat.com>
      00e23707
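      
      An illustrative before/after, assuming a predicate-style accessor along the
      lines of iov_iter_is_bvec() (do_bvec_thing() is a hypothetical callee):
      
        /* Before: open-coded bit test on i->type. */
        if (i->type & ITER_BVEC)
                return do_bvec_thing(i);
        
        /* After: the accessor hides the representation. */
        if (iov_iter_is_bvec(i))
                return do_bvec_thing(i);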
  7. 16 Jul 2018, 3 commits
  8. 15 May 2018, 1 commit
  9. 03 May 2018, 2 commits
  10. 12 Oct 2017, 1 commit
  11. 21 Sep 2017, 1 commit
  12. 07 Jul 2017, 1 commit
    • iov_iter: saner checks on copyin/copyout · 09fc68dc
      Committed by Al Viro
      * might_fault() is better checked in the caller (and e.g. the fault-in +
      kmap_atomic codepath also needs might_fault() coverage)
      * we have already done object size checks
      * we have *NOT* done access_ok() recently enough; we rely upon the
      iovec array having passed sanity checks back when it was created
      and on nothing having buggered it since.  However, that's very much
      non-local, so we'd better recheck it.
      
      So the thing we want does not match anything in uaccess - we need
      access_ok + kasan checks + raw copy without any zeroing.  Just define
      such helpers and use them here.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      09fc68dc
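      
      A sketch of the kind of helper described above - access_ok(), then the KASAN
      check, then a raw copy with no zeroing of the tail (hypothetical name; the
      in-tree helpers may differ; the VERIFY_WRITE argument predates the 2019
      access_ok() change listed above):
      
        static int copyout_sketch(void __user *to, const void *from, size_t n)
        {
                if (access_ok(VERIFY_WRITE, to, n)) {
                        /* Tell KASAN we are about to read n bytes from 'from'. */
                        kasan_check_read(from, n);
                        n = raw_copy_to_user(to, from, n);
                }
                return n;       /* bytes NOT copied */
        }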
  13. 30 Jun 2017, 2 commits
  14. 10 Jun 2017, 1 commit
    • x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Committed by Dan Williams
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() is gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol, with fallbacks to copy_from_iter_nocache() and plain memcpy()
      otherwise.
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      0aed55af
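      
      A hedged sketch of the config gating described above; the fallback wiring is
      illustrative rather than the exact in-tree form:
      
        #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
        /* The architecture provides cache-bypassing (non-temporal) user copies. */
        size_t copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i);
        #else
        /* No arch support: fall back to the plain nocache copy. */
        #define copy_from_iter_flushcache copy_from_iter_nocache
        #endif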
  15. 09 May 2017, 2 commits
    • treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Committed by Michal Hocko
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) basically never fail and
      will invoke the OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - vmalloc.  On
      the other hand, those requests might previously have fallen back to
      vmalloc even when the page allocator would have succeeded after several
      more reclaim/compaction attempts.  There is no guarantee something like
      that happens though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      752ade68
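      
      The typical shape of the conversion, as an illustrative before/after (size
      and buf are placeholders):
      
        /* Before: opencoded fallback that second-guesses the allocator. */
        buf = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
        if (!buf)
                buf = vmalloc(size);
        
        /* After: the helper picks the strategy and stays conservative. */
        buf = kvmalloc(size, GFP_KERNEL);
        
        /* Either way, free with kvfree(), which handles both cases. */
        kvfree(buf);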
    • fix braino in generic_file_read_iter() · 5b47d59a
      Committed by Al Viro
      Wrong sign of the iov_iter_revert() argument.  Unfortunately, it slipped
      through testing, since most of the time we don't do anything to the
      iterator afterwards, and a potential oops from walking iter->iov too far
      backwards is too infrequent to be triggered easily.
      
      Add a sanity check in iov_iter_revert() to catch bugs like this one;
      fortunately, the same braino hadn't happened in other callers, but we'd
      better have a warning if such a thing crops up.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5b47d59a
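      
      For context, a sketch of the revert-by-consumed-bytes pattern involved
      (do_partial_copy() is hypothetical); the second argument must be the
      positive number of bytes to walk back:
      
        size_t before = iov_iter_count(iter);
        
        ret = do_partial_copy(iter);
        
        /* Walk the iterator back by what was actually consumed. */
        iov_iter_revert(iter, before - iov_iter_count(iter));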
  16. 30 Apr 2017, 1 commit
  17. 03 Apr 2017, 1 commit
  18. 29 Mar 2017, 2 commits
  19. 15 Jan 2017, 1 commit
  20. 23 Dec 2016, 1 commit
    • [iov_iter] fix iterate_all_kinds() on empty iterators · 33844e66
      Committed by Al Viro
      The problem is similar to the ones dealt with in "fold checks into
      iterate_and_advance()" and its followups, except that in this case we
      really want to do nothing when asked for a zero-length operation - unlike
      zero-length iterate_and_advance(), zero-length iterate_all_kinds() has no
      side effects, and callers are simpler that way.
      
      That got exposed when copy_from_iter_full() had been used by tipc, which
      builds an msghdr with zero payload and (now) feeds it to a primitive
      based on iterate_all_kinds() instead of iterate_and_advance().
      Reported-by: Jon Maloy <jon.maloy@ericsson.com>
      Tested-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      33844e66
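      
      The gist, as a sketch: primitives built on iterate_all_kinds() want an early
      no-op for zero-length requests (hypothetical names; the actual guards live in
      lib/iov_iter.c):
      
        static size_t iter_primitive_sketch(void *addr, size_t bytes, struct iov_iter *i)
        {
                /* A zero-length request must be a no-op with no side effects. */
                if (unlikely(!bytes))
                        return 0;
        
                return do_real_work(addr, bytes, i);    /* hypothetical */
        }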
  21. 06 Dec 2016, 1 commit
    • [iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8
      Committed by Al Viro
      copy_from_iter_full(), copy_from_iter_full_nocache() and
      csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
      et al., advancing the iterator only in case of a successful full copy
      and returning whether it had been successful or not.
      
      Convert some obvious users.  *NOTE* - do not blindly assume that
      something is a good candidate for those unless you are sure that
      not advancing the iov_iter in the failure case is the right thing
      here.  Anything that does short-read/short-write kind of stuff
      (or is in a loop, etc.) is unlikely to be a good one.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      cbbd26b8
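      
      A usage sketch of the all-or-nothing semantics (struct foo_hdr and the error
      path are illustrative):
      
        struct foo_hdr hdr;
        
        /*
         * Either the whole header is copied and the iterator advances,
         * or nothing is consumed and we bail out.
         */
        if (!copy_from_iter_full(&hdr, sizeof(hdr), &iter))
                return -EFAULT;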
  22. 17 Nov 2016, 1 commit
    • fix iov_iter_advance() for ITER_PIPE · 680bb946
      Committed by Abhi Das
      iov_iter_advance() needs to decrement iter->count by the number of
      bytes we have advanced past.  Normal flavours do that, but ITER_PIPE
      doesn't, and with ITER_PIPE generic_file_read_iter() for O_DIRECT files
      ends up with a bogus fallback to a page cache read, resulting in
      incorrect values for the file offset and bytes read.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      680bb946
  23. 01 Nov 2016, 1 commit
  24. 15 Oct 2016, 1 commit
  25. 12 Oct 2016, 1 commit
  26. 06 Oct 2016, 2 commits
    • pipe: add pipe_buf_release() helper · a779638c
      Committed by Miklos Szeredi
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a779638c
    • new iov_iter flavour: pipe-backed · 241699cd
      Committed by Al Viro
      An iov_iter variant for passing data into a pipe.  copy_to_iter()
      copies data into page(s) it has allocated and stuffs them into
      the pipe; copy_page_to_iter() stuffs a reference to the page given
      to it into the pipe.  Both will try to coalesce if possible.
      iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages()
      and friends will do as copy_to_iter() would have and return the
      pages where the data would have been copied.  iov_iter_advance()
      will truncate everything past the spot it has advanced to.
      
      New primitive: iov_iter_pipe(), used for initializing those.
      The pipe should be locked all along.
      
      Running out of space acts as a fault would for iovec-backed ones;
      in other words, giving it to ->read_iter() may result in a short
      read if the pipe overflows, or -EFAULT if that happens with nothing
      copied there yet.
      
      In other words, ->read_iter() on those acts pretty much like
      ->splice_read().  Moreover, all generic_file_splice_read() users,
      as well as many other ->splice_read() instances, can be switched
      to that scheme - that'll happen in the next commit.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      241699cd
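      
      A hedged sketch of a splice-style read using the new flavour; the exact
      iov_iter_pipe() flag form has varied between kernel versions, and in, pipe,
      len and ppos stand in for a ->splice_read() implementation's parameters:
      
        struct iov_iter to;
        struct kiocb kiocb;
        ssize_t ret;
        
        /* The pipe must already be locked, as noted above. */
        iov_iter_pipe(&to, READ, pipe, len);
        
        init_sync_kiocb(&kiocb, in);
        kiocb.ki_pos = *ppos;
        
        /* Acts like ->splice_read(): a short read if the pipe fills up,
         * -EFAULT only if it overflows before anything was copied. */
        ret = in->f_op->read_iter(&kiocb, &to);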
  27. 28 Sep 2016, 1 commit
    • get rid of separate multipage fault-in primitives · 4bce9f6e
      Committed by Al Viro
      * the only remaining callers of "short" fault-ins are just as happy with
      the generic variants (both in lib/iov_iter.c); switch them to the
      multipage variants, kill the "short" ones
      * rename the multipage variants to the now-available plain names
      * get rid of the compat macro defining iov_iter_fault_in_multipage_readable
      by expanding it in its only user
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      4bce9f6e
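      
      The net effect on callers, as an illustrative before/after:
      
        /* Before: a separate multipage primitive with its own name. */
        if (iov_iter_fault_in_multipage_readable(i, bytes))
                return -EFAULT;
        
        /* After: the generic behaviour under the plain name. */
        if (iov_iter_fault_in_readable(i, bytes))
                return -EFAULT;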
  28. 18 Sep 2016, 1 commit
  29. 29 Jul 2016, 1 commit
    • mm: optimize copy_page_to/from_iter_iovec · 3fa6c507
      Committed by Mikulas Patocka
      copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data
      to userspace or from userspace.  These functions have a fast path where
      they map a page using kmap_atomic and a slow path where they use kmap.
      
      kmap is slower than kmap_atomic, so the fast path is preferred.
      
      However, on kernels without highmem support, kmap just calls
      page_address, so there is no need to avoid kmap.  On kernels without
      highmem support, the fast path just increases code size (and cache
      footprint) and it doesn't improve copy performance in any way.
      
      This patch enables the fast path only if CONFIG_HIGHMEM is defined.
      
      Code size reduced by this patch (bytes):
        x86 (without highmem)    928
        x86-64                   960
        sparc64                  848
        alpha                   1136
        pa-risc                 1200
      
      [akpm@linux-foundation.org: use IS_ENABLED(), per Andi]
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.com
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3fa6c507
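      
      A hedged sketch of the gating described above (hypothetical helper, simplified
      to a single page fragment; the real code also handles partial faults and walks
      the iovec segments):
      
        static size_t copy_page_frag_to_user_sketch(struct page *page, size_t offset,
                                                    size_t bytes, void __user *to)
        {
                void *kaddr;
                size_t left;
        
                if (IS_ENABLED(CONFIG_HIGHMEM)) {
                        /* Fast path: atomic kmap; the user copy must not sleep. */
                        kaddr = kmap_atomic(page);
                        left = __copy_to_user_inatomic(to, kaddr + offset, bytes);
                        kunmap_atomic(kaddr);
                        if (!left)
                                return bytes;
                }
        
                /* Slow path: without highmem, kmap() is just page_address(). */
                kaddr = kmap(page);
                left = copy_to_user(to, kaddr + offset, bytes);
                kunmap(page);
                return bytes - left;
        }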
  30. 10 Jun 2016, 1 commit