1. 17 1月, 2020 1 次提交
  2. 22 5月, 2019 1 次提交
    • E
      iov_iter: optimize page_copy_sane() · 627bb2d9
      Eric Dumazet 提交于
      commit 6daef95b8c914866a46247232a048447fff97279 upstream.
      
      Avoid cache line miss dereferencing struct page if we can.
      
      page_copy_sane() mostly deals with order-0 pages.
      
      Extra cache line miss is visible on TCP recvmsg() calls dealing
      with GRO packets (typically 45 page frags are attached to one skb).
      
      Bringing the 45 struct pages into cpu cache while copying the data
      is not free, since the freeing of the skb (and associated
      page frags put_page()) can happen after cache lines have been evicted.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      627bb2d9
  3. 16 7月, 2018 3 次提交
  4. 15 5月, 2018 1 次提交
  5. 03 5月, 2018 2 次提交
  6. 12 10月, 2017 1 次提交
  7. 21 9月, 2017 1 次提交
  8. 07 7月, 2017 1 次提交
    • A
      iov_iter: saner checks on copyin/copyout · 09fc68dc
      Al Viro 提交于
      * might_fault() is better checked in caller (and e.g. fault-in + kmap_atomic
      codepath also needs might_fault() coverage)
      * we have already done object size checks
      * we have *NOT* done access_ok() recently enough; we rely upon the
      iovec array having passed sanity checks back when it had been created
      and not nothing having buggered it since.  However, that's very much
      non-local, so we'd better recheck that.
      
      So the thing we want does not match anything in uaccess - we need
      access_ok + kasan checks + raw copy without any zeroing.  Just define
      such helpers and use them here.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      09fc68dc
  9. 30 6月, 2017 2 次提交
  10. 10 6月, 2017 1 次提交
    • D
      x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations · 0aed55af
      Dan Williams 提交于
      The pmem driver has a need to transfer data with a persistent memory
      destination and be able to rely on the fact that the destination writes are not
      cached. It is sufficient for the writes to be flushed to a cpu-store-buffer
      (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync()
      to ensure data-writes have reached a power-fail-safe zone in the platform. The
      fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn
      around and fence previous writes with an "sfence".
      
      Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and
      memcpy_flushcache, that guarantee that the destination buffer is not dirty in
      the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines
      will be used to replace the "pmem api" (include/linux/pmem.h +
      arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache()
      and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
      config symbol, and fallback to copy_from_iter_nocache() and plain memcpy()
      otherwise.
      
      This is meant to satisfy the concern from Linus that if a driver wants to do
      something beyond the normal nocache semantics it should be something private to
      that driver [1], and Al's concern that anything uaccess related belongs with
      the rest of the uaccess code [2].
      
      The first consumer of this interface is a new 'copy_from_iter' dax operation so
      that pmem can inject cache maintenance operations without imposing this
      overhead on other dax-capable drivers.
      
      [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html
      [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html
      
      Cc: <x86@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      0aed55af
  11. 09 5月, 2017 2 次提交
    • M
      treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Michal Hocko 提交于
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) are basically never failing
      and invoke OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - the vmalloc.
      On the other hand those requests might fallback to vmalloc even when the
      memory allocator would succeed after several more reclaim/compaction
      attempts previously.  There is no guarantee something like that happens
      though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      752ade68
    • A
      fix braino in generic_file_read_iter() · 5b47d59a
      Al Viro 提交于
      Wrong sign of iov_iter_revert() argument.  Unfortunately, slipped through
      the testing, since most of the time we don't do anything to the iterator
      afterwards and potential oops on walking the iter->iov too far backwards
      is too infrequent to be easily triggered.
      
      Add a sanity check in iov_iter_revert() to catch bugs like this one;
      fortunately, the same braino hadn't happened in other callers, but we'd
      better have a warning if such thing crops up.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5b47d59a
  12. 30 4月, 2017 1 次提交
  13. 03 4月, 2017 1 次提交
  14. 29 3月, 2017 2 次提交
  15. 15 1月, 2017 1 次提交
  16. 23 12月, 2016 1 次提交
    • A
      [iov_iter] fix iterate_all_kinds() on empty iterators · 33844e66
      Al Viro 提交于
      Problem similar to ones dealt with in "fold checks into iterate_and_advance()"
      and followups, except that in this case we really want to do nothing when
      asked for zero-length operation - unlike zero-length iterate_and_advance(),
      zero-length iterate_all_kinds() has no side effects, and callers are simpler
      that way.
      
      That got exposed when copy_from_iter_full() had been used by tipc, which
      builds an msghdr with zero payload and (now) feeds it to a primitive
      based on iterate_all_kinds() instead of iterate_and_advance().
      Reported-by: NJon Maloy <jon.maloy@ericsson.com>
      Tested-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      33844e66
  17. 06 12月, 2016 1 次提交
    • A
      [iov_iter] new primitives - copy_from_iter_full() and friends · cbbd26b8
      Al Viro 提交于
      copy_from_iter_full(), copy_from_iter_full_nocache() and
      csum_and_copy_from_iter_full() - counterparts of copy_from_iter()
      et.al., advancing iterator only in case of successful full copy
      and returning whether it had been successful or not.
      
      Convert some obvious users.  *NOTE* - do not blindly assume that
      something is a good candidate for those unless you are sure that
      not advancing iov_iter in failure case is the right thing in
      this case.  Anything that does short read/short write kind of
      stuff (or is in a loop, etc.) is unlikely to be a good one.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cbbd26b8
  18. 17 11月, 2016 1 次提交
    • A
      fix iov_iter_advance() for ITER_PIPE · 680bb946
      Abhi Das 提交于
      iov_iter_advance() needs to decrement iter->count by the number of
      bytes we'd moved beyond.  Normal flavours do that, but ITER_PIPE
      doesn't and ITER_PIPE generic_file_read_iter() for O_DIRECT files
      ends up with a bogus fallback to page cache read, resulting in incorrect
      values for file offset and bytes read.
      Signed-off-by: NAbhi Das <adas@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      680bb946
  19. 01 11月, 2016 1 次提交
  20. 15 10月, 2016 1 次提交
  21. 12 10月, 2016 1 次提交
  22. 06 10月, 2016 2 次提交
    • M
      pipe: add pipe_buf_release() helper · a779638c
      Miklos Szeredi 提交于
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a779638c
    • A
      new iov_iter flavour: pipe-backed · 241699cd
      Al Viro 提交于
      iov_iter variant for passing data into pipe.  copy_to_iter()
      copies data into page(s) it has allocated and stuffs them into
      the pipe; copy_page_to_iter() stuffs there a reference to the
      page given to it.  Both will try to coalesce if possible.
      iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages()
      and friends will do as copy_to_iter() would have and return the
      pages where the data would've been copied.  iov_iter_advance()
      will truncate everything past the spot it has advanced to.
      
      New primitive: iov_iter_pipe(), used for initializing those.
      pipe should be locked all along.
      
      Running out of space acts as fault would for iovec-backed ones;
      in other words, giving it to ->read_iter() may result in short
      read if the pipe overflows, or -EFAULT if it happens with nothing
      copied there.
      
      In other words, ->read_iter() on those acts pretty much like
      ->splice_read().  Moreover, all generic_file_splice_read() users,
      as well as many other ->splice_read() instances can be switched
      to that scheme - that'll happen in the next commit.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      241699cd
  23. 28 9月, 2016 1 次提交
    • A
      get rid of separate multipage fault-in primitives · 4bce9f6e
      Al Viro 提交于
      * the only remaining callers of "short" fault-ins are just as happy with generic
      variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
      "short" ones
      * rename the multipage variants to now available plain ones.
      * get rid of compat macro defining iov_iter_fault_in_multipage_readable by
      expanding it in its only user.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4bce9f6e
  24. 18 9月, 2016 1 次提交
  25. 29 7月, 2016 1 次提交
    • M
      mm: optimize copy_page_to/from_iter_iovec · 3fa6c507
      Mikulas Patocka 提交于
      copy_page_to_iter_iovec() and copy_page_from_iter_iovec() copy some data
      to userspace or from userspace.  These functions have a fast path where
      they map a page using kmap_atomic and a slow path where they use kmap.
      
      kmap is slower than kmap_atomic, so the fast path is preferred.
      
      However, on kernels without highmem support, kmap just calls
      page_address, so there is no need to avoid kmap.  On kernels without
      highmem support, the fast path just increases code size (and cache
      footprint) and it doesn't improve copy performance in any way.
      
      This patch enables the fast path only if CONFIG_HIGHMEM is defined.
      
      Code size reduced by this patch:
        x86 (without highmem)	  928
        x86-64		  960
        sparc64		  848
        alpha			 1136
        pa-risc		 1200
      
      [akpm@linux-foundation.org: use IS_ENABLED(), per Andi]
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1607221711410.4818@file01.intranet.prod.int.rdu2.redhat.comSigned-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3fa6c507
  26. 10 6月, 2016 1 次提交
  27. 26 5月, 2016 1 次提交
    • A
      do "fold checks into iterate_and_advance()" right · 19f18459
      Al Viro 提交于
      the only case when we should skip the iterate_and_advance() guts
      is when nothing's left in the iterator, _not_ just when requested
      amount is 0.  Said guts will do nothing in the latter case anyway;
      the problem we tried to deal with in the aforementioned commit is
      that when there's nothing left *and* the amount requested is 0,
      we might end up deferencing one iovec too many; the value we fetch
      from there is discarded in that case, but theoretically it might
      oops if the iovec array ends exactly at the end of page with the
      next page not mapped.
      
      Bailing out on zero size requested had an unexpected side effect -
      zero-length segment in the beginning of iovec array ended up
      throwing do_loop_readv_writev() into infinite spin; we do not
      advance past the empty segment at all.  Reproducer is trivial:
      echo '#include <sys/uio.h>' >a.c
      echo 'main() {char c; struct iovec v[] = {{&c,0},{&c,1}}; readv(0,v,2);}' >>a.c
      cc a.c && ./a.out </proc/uptime
      
      which should end up with the process not hanging.  Probably ought to
      go into LTP or xfstests...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      19f18459
  28. 10 5月, 2016 1 次提交
  29. 09 4月, 2016 1 次提交
  30. 07 12月, 2015 2 次提交
  31. 12 4月, 2015 1 次提交
  32. 30 3月, 2015 1 次提交
    • A
      saner iov_iter initialization primitives · bc917be8
      Al Viro 提交于
      iovec-backed iov_iter instances are assumed to satisfy several properties:
      	* no more than UIO_MAXIOV elements in iovec array
      	* total size of all ranges is no more than MAX_RW_COUNT
      	* all ranges pass access_ok().
      
      The problem is, invariants of data structures should be established in the
      primitives creating those data structures, not in the code using those
      primitives.  And iov_iter_init() violates that principle.  For a while we
      managed to get away with that, but once the use of iov_iter started to
      spread, it didn't take long for shit to hit the fan - missed check in
      sys_sendto() had introduced a roothole.
      
      We _do_ have primitives for importing and validating iovecs (both native and
      compat ones) and those primitives are almost always followed by shoving the
      resulting iovec into iov_iter.  Life would be considerably simpler (and safer)
      if we combined those primitives with initializing iov_iter.
      
      That gives us two new primitives - import_iovec() and compat_import_iovec().
      Calling conventions:
      	iovec = iov_array;
      	err = import_iovec(direction, uvec, nr_segs,
      			   ARRAY_SIZE(iov_array), &iovec,
      			   &iter);
      imports user vector into kernel space (into iov_array if it fits, allocated
      if it doesn't fit or if iovec was NULL), validates it and sets iter up to
      refer to it.  On success 0 is returned and allocated kernel copy (or NULL
      if the array had fit into caller-supplied one) is returned via iovec.
      On failure all allocations are undone and -E... is returned.  If the total
      size of ranges exceeds MAX_RW_COUNT, the excess is silently truncated.
      
      compat_import_iovec() expects uvec to be a pointer to user array of compat_iovec;
      otherwise it's identical to import_iovec().
      
      Finally, import_single_range() sets iov_iter backed by single-element iovec
      covering a user-supplied range -
      
      	err = import_single_range(direction, address, size, iovec, &iter);
      
      does validation and sets iter up.  Again, size in excess of MAX_RW_COUNT gets
      silently truncated.
      
      Next commits will be switching the things up to use of those and reducing
      the amount of iov_iter_init() instances.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      bc917be8