1. 06 Apr, 2018 3 commits
    • R
      headers: untangle kmemleak.h from mm.h · 514c6032
      Randy Dunlap committed
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h
      from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
      don't already #include it.  Also remove <linux/kmemleak.h> from source
      files that do not use it.
      
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
      be good to run it through the 0day bot for other $ARCHes.  I have
      neither the horsepower nor the storage space for the other $ARCHes.
      
      Update: This patch has been extensively build-tested by both the 0day
      bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
      for which patches are included here (in v2).
      
      [ slab.h is the second most used header file after module.h; kernel.h is
        right there with slab.h. There could be some minor error in the
        counting due to some #includes having comments after them and I didn't
        combine all of those. ]
      
      [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      514c6032
    • C
      net/9p/client.c: fix potential refcnt problem of trans module · 9421c3e6
      Chengguang Xu committed
      When specifying trans_mod multiple times in a mount, it will cause an
      inaccurate refcount of the trans module.  Also, in the error case of
      option parsing, we should put the trans module if we have already got
      it.
      
      Link: http://lkml.kernel.org/r/1522154942-57339-1-git-send-email-cgxu519@gmx.com
      Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9421c3e6
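The reference-count rule described in the commit above can be illustrated with a small userspace sketch. This is not the net/9p code: `toy_trans`, `toy_get_trans`, `toy_put_trans`, and `toy_parse_opts` are hypothetical names, and the option list is modelled as an array of pointers where NULL stands for an option that fails to parse. The point is only that a repeated option releases the previously taken reference, and the error path releases whatever is still held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transport module with a bare reference counter. */
struct toy_trans {
    int refcount;
};

static struct toy_trans *toy_get_trans(struct toy_trans *t)
{
    t->refcount++;
    return t;
}

static void toy_put_trans(struct toy_trans *t)
{
    if (t)
        t->refcount--;
}

/*
 * Each entry of opts[] stands for one "trans=" option; NULL models an
 * option that fails to parse. Returns 0 on success, -1 on error.
 */
static int toy_parse_opts(struct toy_trans **opts, size_t n,
                          struct toy_trans **out)
{
    struct toy_trans *cur = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (!opts[i]) {
            toy_put_trans(cur);   /* error path: drop what we hold */
            return -1;
        }
        toy_put_trans(cur);       /* repeated option: release the old one */
        cur = toy_get_trans(opts[i]);
    }
    *out = cur;
    return 0;
}
```

With this shape, every module other than the one finally selected ends up with the same count it started with, whether parsing succeeds or fails.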
    • G
      net/9p: avoid -ERESTARTSYS leak to userspace · a8522243
      Greg Kurz committed
      If it was interrupted by a signal, the 9p client may need to send some
      more requests to the server for cleanup before returning to userspace.
      
      To prevent such a last-minute request from being interrupted right away,
      the client records whether a signal is pending, clears TIF_SIGPENDING,
      handles the request and calls recalc_sigpending() before returning.
      
      Unfortunately, if the transmission of this cleanup request fails for any
      reason, the transport returns an error and the client propagates it
      right away, without calling recalc_sigpending().
      
      This ends up with -ERESTARTSYS from the initially interrupted request
      crawling up to syscall exit, with TIF_SIGPENDING cleared by the cleanup
      request.  The specific signal handling code, which is responsible for
      converting -ERESTARTSYS to -EINTR, is not called, and userspace receives
      the confusing errno value:
      
        open: Unknown error 512 (512)
      
      This is really hard to hit in real life.  I discovered the issue while
      working on hot-unplug of a virtio-9p-pci device with an instrumented
      QEMU that allows controlling request completion.
      
      Both p9_client_zc_rpc() and p9_client_rpc() functions have this buggy
      error path actually.  Their code flow is a bit obscure and the best
      thing to do would probably be a full rewrite: to really ensure this
      situation of clearing TIF_SIGPENDING and returning -ERESTARTSYS can
      never happen.
      
      But given the general lack of interest in the 9p code, I won't risk
      breaking more things.  So this patch simply fixes the buggy paths in
      both functions with a trivial label+goto.
      
      Thanks to Laurent Dufour for his help and suggestions on how to find the
      root cause and how to fix it.
      
      Link: http://lkml.kernel.org/r/152062809886.10599.7361006774123053312.stgit@bahia.lan
      Signed-off-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ron Minnich <rminnich@sandia.gov>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: David Miller <davem@davemloft.net>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a8522243
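The "label+goto" fix described in the commit above can be sketched in userspace as follows. This is a toy model, not the p9_client_rpc() code: `sigpending_flag` and `signal_queued` are hypothetical stand-ins for TIF_SIGPENDING and the task's real signal state, and `toy_recalc_sigpending()`, `toy_send_cleanup()`, and `toy_rpc_interrupted()` are invented for illustration. The idea is that a failing cleanup request jumps to a common label, so the pending-signal flag is restored on every exit path.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_ERESTARTSYS 512  /* value seen in "Unknown error 512" */
#define TOY_EIO 5

/* Hypothetical stand-ins for TIF_SIGPENDING and the real signal state. */
static bool sigpending_flag;   /* what the syscall exit path inspects */
static bool signal_queued;     /* whether a signal is actually waiting */

static void toy_recalc_sigpending(void)
{
    sigpending_flag = signal_queued;
}

/* Hypothetical cleanup request: 0 on success, negative on failure. */
static int toy_send_cleanup(bool transport_ok)
{
    return transport_ok ? 0 : -TOY_EIO;
}

static int toy_rpc_interrupted(bool transport_ok)
{
    int err = -TOY_ERESTARTSYS;        /* original request was interrupted */
    bool sigpending = sigpending_flag; /* remember the pending signal */

    sigpending_flag = false;           /* let the cleanup run undisturbed */

    if (toy_send_cleanup(transport_ok) < 0)
        goto recalc;                   /* buggy version returned here */

    /* ... wait for the cleanup reply ... */
recalc:
    if (sigpending)
        toy_recalc_sigpending();       /* restore the flag on all paths */
    return err;
}
```

Because the flag is set again before returning, the signal-handling code still runs at syscall exit and can convert the restart code into -EINTR.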
  2. 05 Apr, 2018 1 commit
  3. 04 Apr, 2018 21 commits
    • G
      tipc: Fix namespace violation in tipc_sk_fill_sock_diag · 4b2e6877
      GhantaKrishnamurthy MohanKrishna committed
      To fetch UID info for socket diagnostics, we determine the
      namespace of the user context using the tipc socket instance. This
      may cause a namespace violation, as the kernel will remap based
      on UID.
      
      We fix this by fetching namespace info using the calling userspace
      netlink socket.
      
      Fixes: c30b70de (tipc: implement socket diagnostics for AF_TIPC)
      Reported-by: syzbot+326e587eff1074657718@syzkaller.appspotmail.com
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b2e6877
    • P
      net: avoid unneeded atomic operation in ip*_append_data() · 9e8445a5
      Paolo Abeni committed
      After commit 694aba69 ("ipv4: factorize sk_wmem_alloc updates
      done by __ip_append_data()") and commit 1f4c6eb2 ("ipv6:
      factorize sk_wmem_alloc updates done by __ip6_append_data()"),
      when transmitting a sub-MTU datagram, an additional, unneeded atomic
      operation is performed in ip*_append_data() to update wmem_alloc:
      in the above condition the delta is 0.
      
      This causes a small but measurable performance regression in UDP
      transmit throughput tests with packet sizes below the MTU.
      
      This change avoids that overhead by updating wmem_alloc only if
      wmem_alloc_delta is nonzero.
      
      The error path is left intentionally unmodified: it's a slow path
      and simplicity is preferred to performance.
      
      Fixes: 694aba69 ("ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()")
      Fixes: 1f4c6eb2 ("ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9e8445a5
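The optimization in the commit above amounts to guarding an atomic update with a cheap test on the accumulated delta. A minimal userspace sketch using C11 atomics follows; `toy_sock`, `toy_commit_wmem()`, and the `atomic_ops` counter are hypothetical and exist only to make the skipped operation observable. The kernel uses its own refcount and atomic primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical socket; wmem_alloc stands in for sk_wmem_alloc. */
struct toy_sock {
    atomic_int wmem_alloc;
    int atomic_ops;   /* counts atomic updates, for illustration only */
};

/* Apply the accumulated delta, skipping the atomic op when it is zero. */
static void toy_commit_wmem(struct toy_sock *sk, int wmem_alloc_delta)
{
    if (wmem_alloc_delta) {
        atomic_fetch_add(&sk->wmem_alloc, wmem_alloc_delta);
        sk->atomic_ops++;
    }
}
```

When the delta is zero, the shared counter is never touched, which is the case the commit message describes for datagrams smaller than the MTU.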
    • J
      tipc: Fix missing list initializations in struct tipc_subscription · b714295a
      Jon Maloy committed
      When an item of struct tipc_subscription is created, we fail to
      initialize the two lists aggregated into the struct. This has so far
      never been a problem, since the items are just added to a root
      object by list_add(), which does not require the addee list to be
      pre-initialized. However, syzbot is provoking situations where this
      addition fails, whereupon the attempted removal of the item from
      the list causes a crash.
      
      This problem seems to always have been around, even though the code
      for creating this object was rewritten in commit 242e82cc ("tipc:
      collapse subscription creation functions"), which is still in net-next.
      
      We fix this for that commit by initializing the two lists properly.
      
      Fixes: 242e82cc ("tipc: collapse subscription creation functions")
      Reported-by: syzbot+0bb443b74ce09197e970@syzkaller.appspotmail.com
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b714295a
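Why initialization matters here can be shown with a simplified userspace re-implementation of the kernel's circular doubly linked list. The helpers below mirror the shape of `INIT_LIST_HEAD()`, `list_add()`, `list_del()`, and `list_empty()` but are simplified (for example, no pointer poisoning on delete). `toy_subscription` and its field names `first` and `second` are hypothetical. A node that points to itself can be deleted safely even if it was never added, whereas an uninitialized node would dereference garbage pointers.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
    struct list_head *next, *prev;
};

/* An initialized node points to itself. */
static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Insert entry right after head; entry need not be initialized. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Simplified delete: relies on entry->next and entry->prev being valid. */
static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
}

/* Hypothetical object aggregating two lists, as in the commit text. */
struct toy_subscription {
    struct list_head first;
    struct list_head second;
};

static void toy_sub_init(struct toy_subscription *sub)
{
    INIT_LIST_HEAD(&sub->first);
    INIT_LIST_HEAD(&sub->second);
}
```

After `toy_sub_init()`, calling `list_del()` on a member that was never linked into a root list only rewrites the node's own pointers, which is the safe behavior the fix relies on.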
    • A
      ipv6: udp: set dst cache for a connected sk if current not valid · 4f858c56
      Alexey Kodanev committed
      A new RTF_CACHE route can be created between ip6_sk_dst_lookup_flow()
      and ip6_dst_store() calls in udpv6_sendmsg(), when sending a datagram
      results in an ICMPV6_PKT_TOOBIG error:
      
          udp_v6_send_skb(), for example with vti6 tunnel:
              vti6_xmit(), get ICMPV6_PKT_TOOBIG error
                  skb_dst_update_pmtu(), can create a RTF_CACHE clone
                  icmpv6_send()
          ...
          udpv6_err()
              ip6_sk_update_pmtu()
                 ip6_update_pmtu(), can create a RTF_CACHE clone
                 ...
                 ip6_datagram_dst_update()
                      ip6_dst_store()
      
      And after commit 33c162a9 ("ipv6: datagram: Update dst cache of
      a connected datagram sk during pmtu update"), the UDPv6 error handler
      can update the socket's dst cache, but it can happen before the update
      at the end of udpv6_sendmsg(), preventing subsequent udpv6_sendmsg()
      calls from getting the new dst cache.
      
      In order to fix it, save dst in a connected socket only if the current
      socket's dst cache is invalid.
      
      The previous patch prepared ip6_sk_dst_lookup_flow() to do that with
      the new argument, and this patch enables it in udpv6_sendmsg().
      
      Fixes: 33c162a9 ("ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update")
      Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4f858c56
    • A
      ipv6: udp: convert 'connected' to bool type in udpv6_sendmsg() · 9f542f61
      Alexey Kodanev committed
      This should make it consistent with ip6_sk_dst_lookup_flow(),
      which accepts the new 'connected' parameter of type bool.
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f542f61
    • A
      ipv6: allow to cache dst for a connected sk in ip6_sk_dst_lookup_flow() · 96818159
      Alexey Kodanev committed
      Add 'connected' parameter to ip6_sk_dst_lookup_flow() and update
      the cache only if ip6_sk_dst_check() returns NULL and a socket
      is connected.
      
      The function is used as before, the new behavior for UDP sockets
      in udpv6_sendmsg() will be enabled in the next patch.
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      96818159
    • A
      ipv6: add a wrapper for ip6_dst_store() with flowi6 checks · 7d6850f7
      Alexey Kodanev committed
      Move commonly used pattern of ip6_dst_store() usage to a separate
      function - ip6_sk_dst_store_flow(), which will check the addresses
      for equality using the flow information, before saving them.
      
      There are no functional changes in this patch. In addition, it will
      be used in the next patch, in ip6_sk_dst_lookup_flow().
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d6850f7
    • C
      af_unix: remove redundant lockdep class · 3848ec5d
      Cong Wang committed
      After commit 581319c5 ("net/socket: use per af lockdep classes for sk queues")
      sock queue locks now have per-af lockdep classes, including unix socket.
      It is no longer necessary to work around it.
      
      I noticed this while looking at a syzbot deadlock report. This patch
      itself doesn't fix it (which is why I don't add Reported-by).
      
      Fixes: 581319c5 ("net/socket: use per af lockdep classes for sk queues")
      Cc: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3848ec5d
    • D
      rxrpc: Fix undefined packet handling · b41d7cfe
      David Howells committed
      By analogy with other Rx implementations, RxRPC packet types 9, 10 and 11
      should just be discarded rather than being aborted like other undefined
      packet types.
      Reported-by: Jeffrey Altman <jaltman@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b41d7cfe
    • E
      sunrpc: remove incorrect HMAC request initialization · f3aefb6a
      Eric Biggers committed
      make_checksum_hmac_md5() is allocating an HMAC transform and doing
      crypto API calls in the following order:
      
          crypto_ahash_init()
          crypto_ahash_setkey()
          crypto_ahash_digest()
      
      This is wrong because it makes no sense to init() the request before a
      key has been set, given that the initial state depends on the key.  And
      digest() is short for init() + update() + final(), so in this case
      there's no need to explicitly call init() at all.
      
      Before commit 9fa68f62 ("crypto: hash - prevent using keyed hashes
      without setting key") the extra init() had no real effect, at least for
      the software HMAC implementation.  (There are also hardware drivers that
      implement HMAC-MD5, and it's not immediately obvious how gracefully they
      handle init() before setkey().)  But now the crypto API detects this
      incorrect initialization and returns -ENOKEY.  This is breaking NFS
      mounts in some cases.
      
      Fix it by removing the incorrect call to crypto_ahash_init().
      Reported-by: Michael Young <m.a.young@durham.ac.uk>
      Fixes: 9fa68f62 ("crypto: hash - prevent using keyed hashes without setting key")
      Fixes: fffdaef2 ("gss_krb5: Add support for rc4-hmac encryption")
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      f3aefb6a
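The ordering problem in the commit above can be modelled with a toy keyed hash in userspace. Nothing here is the kernel crypto API: `toy_hash`, `toy_setkey()`, `toy_init()`, `toy_update()`, `toy_final()`, `toy_digest()`, and `TOY_ENOKEY` are all hypothetical, and the "hash" is a trivial mixing function chosen only so the example runs. It illustrates two points from the message: the initial state depends on the key, so init before setkey must fail, and digest already performs init, update, and final in one call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ENOKEY 126   /* hypothetical error code for "no key set" */

struct toy_hash {
    bool has_key;
    uint32_t key;
    uint32_t state;
};

static void toy_setkey(struct toy_hash *h, uint32_t key)
{
    h->key = key;
    h->has_key = true;
}

/* The initial state depends on the key, so a key must be set first. */
static int toy_init(struct toy_hash *h)
{
    if (!h->has_key)
        return -TOY_ENOKEY;
    h->state = h->key;
    return 0;
}

static void toy_update(struct toy_hash *h, const uint8_t *data, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        h->state = h->state * 31u + data[i];   /* toy mixing, not a MAC */
}

static uint32_t toy_final(const struct toy_hash *h)
{
    return h->state ^ h->key;
}

/* digest = init + update + final, so callers need no separate init. */
static int toy_digest(struct toy_hash *h, const uint8_t *data, size_t len,
                      uint32_t *out)
{
    int err = toy_init(h);

    if (err)
        return err;
    toy_update(h, data, len);
    *out = toy_final(h);
    return 0;
}
```

In this model the correct call sequence is `toy_setkey()` followed by `toy_digest()`; an explicit `toy_init()` before the key is set returns the error, mirroring the -ENOKEY behavior the commit describes.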
    • C
      NFSD: Clean up legacy NFS SYMLINK argument XDR decoders · 38a70315
      Chuck Lever committed
      Move common code in NFSD's legacy SYMLINK decoders into a helper.
      The immediate benefits include:
      
       - one fewer data copy on transports that support DDP
       - consistent error checking across all versions
       - reduction of code duplication
       - support for both legal forms of SYMLINK requests on RDMA
         transports for all versions of NFS (in particular, NFSv2, for
         completeness)
      
      In the long term, this helper is an appropriate spot to perform a
      per-transport call-out to fill the pathname argument using, say,
      RDMA Reads.
      
      Filling the pathname in the proc function also means that eventually
      the incoming filehandle can be interpreted so that filesystem-
      specific memory can be allocated as a sink for the pathname
      argument, rather than using anonymous pages.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      38a70315
    • C
      NFSD: Clean up legacy NFS WRITE argument XDR decoders · 8154ef27
      Chuck Lever committed
      Move common code in NFSD's legacy NFS WRITE decoders into a helper.
      The immediate benefit is reduction of code duplication and some nice
      micro-optimizations (see below).
      
      In the long term, this helper can perform a per-transport call-out
      to fill the rq_vec (say, using RDMA Reads).
      
      The legacy WRITE decoders and procs are changed to work like NFSv4,
      which constructs the rq_vec just before it is about to call
      vfs_writev.
      
      Why? Calling a transport call-out from the proc instead of the XDR
      decoder means that the incoming FH can be resolved to a particular
      filesystem and file. This would allow pages from the backing file to
      be presented to the transport to be filled, rather than presenting
      anonymous pages and copying or flipping them into the file's page
      cache later.
      
      I also prefer using the pages in rq_arg.pages, instead of pulling
      the data pages directly out of the rqstp::rq_pages array. This is
      currently the way the NFSv3 write decoder works, but the other two
      do not seem to take this approach. Fixing this removes the only
      reference to rq_pages found in NFSD, eliminating an NFSD assumption
      about how transports use the pages in rq_pages.
      
      Lastly, avoid setting up the first element of rq_vec as a zero-
      length buffer. This happens with an RDMA transport when a normal
      Read chunk is present because the data payload is in rq_arg's
      page list (none of it is in the head buffer).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      8154ef27
    • C
      svc: Report xprt dequeue latency · 55f5088c
      Chuck Lever committed
      Record the time between when a rqstp is enqueued on a transport
      and when it is dequeued. This includes how long the rqstp waits on
      the queue and how long it takes the kernel scheduler to wake a
      nfsd thread to service it.
      
      The svc_xprt_dequeue trace point is altered to include the number
      of microseconds between xprt_enqueue and xprt_dequeue.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      55f5088c
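The measurement described in the commit above is a timestamp taken at enqueue and subtracted at dequeue, reported in microseconds. A minimal sketch follows, with hypothetical names (`toy_xprt`, `toy_enqueue()`, `toy_dequeue_latency_us()`) and caller-supplied nanosecond timestamps so the arithmetic is deterministic. The kernel would take these timestamps from its own clock helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transport that remembers when it was queued. */
struct toy_xprt {
    uint64_t enqueue_ns;
};

static void toy_enqueue(struct toy_xprt *xprt, uint64_t now_ns)
{
    xprt->enqueue_ns = now_ns;
}

/* Time spent on the queue plus wake-up delay, in microseconds. */
static uint64_t toy_dequeue_latency_us(const struct toy_xprt *xprt,
                                       uint64_t now_ns)
{
    return (now_ns - xprt->enqueue_ns) / 1000;
}
```

For example, a transport queued at 1,000,000 ns and dequeued at 3,500,000 ns reports a latency of 2500 microseconds.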
    • C
      sunrpc: Report per-RPC execution stats · aaba72cd
      Chuck Lever committed
      Introduce a mechanism to report the server-side execution latency of
      each RPC. The goal is to enable user space to filter the trace
      record for latency outliers, build histograms, etc.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      aaba72cd
    • C
      sunrpc: Re-purpose trace_svc_process · 0b9547bf
      Chuck Lever committed
      Currently, trace_svc_process has two call sites:
      
      1. Just after a call to svc_send. svc_send already invokes
         trace_svc_send with the same arguments just before returning
      
      2. Just before a call to svc_drop. svc_drop already invokes
         trace_svc_drop with the same arguments just after it is called
      
      Therefore trace_svc_process does not provide any additional
      information not already provided by these other trace points.
      
      However, it would be useful to record the incoming RPC procedure.
      So reuse trace_svc_process for this purpose.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      0b9547bf
    • C
      sunrpc: Save remote presentation address in svc_xprt for trace events · ece200dd
      Chuck Lever committed
      TP_printk defines a format string that is passed to user space for
      converting raw trace event records to something human-readable.
      
      My user space's printf (Oracle Linux 7), however, does not have a
      %pI format specifier. The result is that what is supposed to be an
      IP address in the output of "trace-cmd report" is just a string that
      says the field couldn't be displayed.
      
      To fix this, adopt the same approach as the client: maintain a
      pre-formatted presentation address for occasions when %pI is not
      available.
      
      The location of the trace_svc_send trace point is adjusted so that
      rqst->rq_xprt is not NULL when the trace event is recorded.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      ece200dd
    • C
      sunrpc: Simplify trace_svc_recv · 41f306d0
      Chuck Lever committed
      There doesn't seem to be a lot of value in calling trace_svc_recv
      in the failing case.
      
      1. There are two very common cases: one is the transport is not
      ready, and the other is shutdown. Neither is terribly interesting.
      
      2. The trace record for the failing case contains nothing but
      the status code.
      
      Therefore the trace point call site in the error exit is removed.
      Since the trace point is now recording a length instead of a
      status, rename the status field and remove the case that records a
      zero XID.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      41f306d0
    • C
      sunrpc: Simplify do_enqueue tracing · 7dbb53ba
      Chuck Lever committed
      There are three cases where svc_xprt_do_enqueue() returns without
      waking an nfsd thread:
      
      1. There is no work to do
      
      2. The transport is already busy
      
      3. There are no available nfsd threads
      
      Only 3. is truly interesting. Move the trace point so it records
      that there was work to do and either an nfsd thread was awoken, or
      a free one could not be found.
      
      As an additional clean up, remove a redundant comment and a couple
      of dprintk call sites.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      7dbb53ba
    • C
      sunrpc: Move trace_svc_xprt_dequeue() · caa3e106
      Chuck Lever committed
      Reduce the amount of noise generated by trace_svc_xprt_dequeue by
      moving it to the end of svc_get_next_xprt. This generates exactly
      one trace event when a ready xprt is found, rather than spurious
      events when there is no work to do. The empty events contain no
      information that can't be obtained simply by tracing function calls
      to svc_xprt_dequeue.
      
      A small additional benefit is simplification of the svc_xprt_event
      trace class, which no longer has to handle the case when the @xprt
      parameter is NULL.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      caa3e106
    • C
      svc: Simplify ->xpo_secure_port · 989f881e
      Chuck Lever committed
      Clean up: Instead of returning a value that is used to set or clear
      a bit, just make ->xpo_secure_port mangle that bit, and return void.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      989f881e
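The cleanup described in the commit above replaces "return a value the caller uses to set or clear a bit" with "set or clear the bit directly and return void". A rough userspace sketch of the resulting shape is shown below. `toy_rqst`, `TOY_RQ_SECURE`, and `toy_secure_port()` are hypothetical, and treating ports below 1024 as privileged is an assumption made for the example rather than something stated in the commit.

```c
#include <assert.h>

#define TOY_RQ_SECURE (1UL << 0)   /* hypothetical flag bit */

struct toy_rqst {
    unsigned long flags;
};

/* Sets or clears the flag itself instead of returning a value. */
static void toy_secure_port(struct toy_rqst *rqstp, unsigned short port)
{
    if (port < 1024)   /* assumption: privileged port range */
        rqstp->flags |= TOY_RQ_SECURE;
    else
        rqstp->flags &= ~TOY_RQ_SECURE;
}
```

The caller no longer needs its own branch to translate a return value into a bit operation, which is the simplification the commit is after.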
    • C
      sunrpc: Remove unneeded pointer dereference · 63a1b156
      Chuck Lever committed
      Clean up: Noticed during code inspection that there is already a
      local automatic variable "xprt" so dereferencing rqst->rq_xprt
      again is unnecessary.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      63a1b156
  4. 03 Apr, 2018 15 commits