1. 30 Sep 2018, 1 commit
    • D
      Merge tag 'rxrpc-fixes-20180928' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · f810dcec
      Committed by David S. Miller
      David Howells says:
      
      ====================
      rxrpc: Fixes
      
      Here are some miscellaneous fixes for AF_RXRPC:
      
       (1) Remove a duplicate variable initialisation.
      
       (2) Fix one of the checks made when we decide whether to set up a new
           incoming service call: a flag was being checked in the wrong field
           of the packet header.  The check is abstracted out into helper
           functions.
      
       (3) Fix RTT gathering.  The code has been trying to make use of socket
           timestamps, but wasn't actually enabling them.  The code has also been
           recording the transmit time for the RTT-measured outgoing packet only
           after sending the message - but the reply can arrive before we get to
           that, resulting in a negative RTT being recorded.
      
       (4) Fix the emission of BUSY packets (we are emitting ABORTs instead).
      
       (5) Improve error checking on incoming packets.
      
       (6) Try to fix a bug in new service call handling whereby a BUG we should
           never be able to reach somehow got triggered.  Do this by moving much
           of the checking as early as possible and not repeating it later
           (depends on (5) above).
      
       (7) Fix the sockopts set on a UDP6 socket to include the ones set on a
           UDP4 socket so that we receive UDP4 errors and packet-too-large
           notifications too.
      
       (8) Fix the distribution of errors so that it is done at the point of
           receiving an error in the UDP callback rather than being deferred,
           thereby cutting short any transmissions that would otherwise occur in
           that window.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f810dcec
  2. 29 Sep 2018, 19 commits
    • D
      Merge branch 'netpoll-second-round-of-fixes' · f13d1b48
      Committed by David S. Miller
      Eric Dumazet says:
      
      ====================
      netpoll: second round of fixes.
      
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC).

      This capture, showing one ksoftirqd eating all cycles,
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      It seems that networking drivers that use NAPI
      for their TX completions should not provide an ndo_poll_controller():

      Most NAPI drivers already have netpoll support handled
      in the core networking stack, since netpoll_poll_dev()
      uses poll_napi(dev) to iterate through the registered
      NAPI contexts for a device.

      The first patch is a fix in poll_one_napi().

      The following patches take care of ten drivers.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f13d1b48
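      The per-driver patches that follow all make the same removal; here is a
      minimal, hypothetical sketch of what an affected ops table looks like
      afterwards (driver name and callbacks are invented for illustration and
      not taken from any of the drivers below):

      #include <linux/netdevice.h>

      /* Hypothetical driver: TX completions are serviced by NAPI, so netpoll
       * can rely on netpoll_poll_dev()/poll_napi() and no private
       * .ndo_poll_controller callback is provided.
       */
      static int foo_open(struct net_device *dev);
      static int foo_stop(struct net_device *dev);
      static netdev_tx_t foo_start_xmit(struct sk_buff *skb,
                                        struct net_device *dev);

      static const struct net_device_ops foo_netdev_ops = {
              .ndo_open       = foo_open,
              .ndo_stop       = foo_stop,
              .ndo_start_xmit = foo_start_xmit,
              /* .ndo_poll_controller = foo_netpoll,  <- removed */
      };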
    • E
      ibmvnic: remove ndo_poll_controller · 0c3b9d1b
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      ibmvnic uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.

      ibmvnic_netpoll_controller() was completely wrong anyway,
      as it was scheduling NAPI to service RX queues (instead of TX),
      so I doubt netpoll ever worked on this driver.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c3b9d1b
    • E
      sfc-falcon: remove ndo_poll_controller · a4f570be
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      sfc-falcon uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
      Cc: Edward Cree <ecree@solarflare.com>
      Cc: Bert Kenward <bkenward@solarflare.com>
      Acked-by: Bert Kenward <bkenward@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4f570be
    • E
      sfc: remove ndo_poll_controller · 9447a10f
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      sfc uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Edward Cree <ecree@solarflare.com>
      Cc: Bert Kenward <bkenward@solarflare.com>
      Cc: Solarflare linux maintainers <linux-net-drivers@solarflare.com>
      Acked-by: Bert Kenward <bkenward@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9447a10f
    • E
      net: ena: remove ndo_poll_controller · 21627982
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      ena uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Netanel Belgazal <netanel@amazon.com>
      Cc: Saeed Bishara <saeedb@amazon.com>
      Cc: Zorik Machulsky <zorik@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21627982
    • E
      qlogic: netxen: remove ndo_poll_controller · 3548fcf7
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      netxen uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Manish Chopra <manish.chopra@cavium.com>
      Cc: Rahul Verma <rahul.verma@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3548fcf7
    • E
      qlcnic: remove ndo_poll_controller · 81b059b2
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      qlcnic uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Harish Patil <harish.patil@cavium.com>
      Cc: Manish Chopra <manish.chopra@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      81b059b2
    • E
      virtio_net: remove ndo_poll_controller · 260dd2c3
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      virtio_net uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      260dd2c3
    • E
      net: hns: remove ndo_poll_controller · 4bd2c03b
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      hns uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
      Cc: Salil Mehta <salil.mehta@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4bd2c03b
    • E
      ehea: remove ndo_poll_controller · 226a2dd6
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      ehea uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      226a2dd6
    • E
      hinic: remove ndo_poll_controller · e71fb423
      Committed by Eric Dumazet
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for an unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.

      hinic uses NAPI for TX completions, so we had better let the core
      networking stack call napi->poll() to avoid the capture.

      Note that hinic_netpoll() was incorrectly scheduling NAPI
      on both RX and TX queues.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Aviad Krawczyk <aviad.krawczyk@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e71fb423
    • E
      netpoll: do not test NAPI_STATE_SCHED in poll_one_napi() · c24498c6
      Committed by Eric Dumazet
      Since we no longer require NAPI drivers to provide
      an ndo_poll_controller(), napi_schedule() has not been called
      before the poll_one_napi() invocation.

      So testing NAPI_STATE_SCHED is likely to cause early returns.

      While we are at it, remove an outdated comment.

      Note to future bisections: this change might surface prior
      bugs in drivers.  See commit 73f21c65 ("bnxt_en: Fix TX
      timeout during netpoll.") for one occurrence.

      Fixes: ac3d9dd0 ("netpoll: make ndo_poll_controller() optional")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Tested-by: Song Liu <songliubraving@fb.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c24498c6
    • D
      Merge tag 'mac80211-for-davem-2018-09-27' of... · 05c5e9ff
      Committed by David S. Miller
      Merge tag 'mac80211-for-davem-2018-09-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      More patches than I'd like perhaps, but each seems reasonable:
       * two new spectre-v1 mitigations in nl80211
       * TX status fix in general, and mesh in particular
       * powersave vs. offchannel fix
       * regulatory initialization fix
       * fix for a queue hang due to a bad return value
       * allocate TXQs for active monitor interfaces, fixing my
         earlier patch to avoid unnecessary allocations, where I
         missed that this case needed them
       * fix TDLS data frames priority assignment
       * fix scan results processing to take into account duplicate
         channel numbers (over different operating classes, but we
         don't necessarily know the operating class)
       * various hwsim fixes for radio destruction and new radio
         announcement messages
       * remove an extraneous kernel-doc line
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      05c5e9ff
    • S
      qed: Fix shmem structure inconsistency between driver and the mfw. · 5f672090
      Committed by Sudarsana Reddy Kalluru
      The structure shared between the driver and the management FW (mfw) differs
      in size.  This leads to issues when the driver tries to access structure
      members that are not aligned with the mfw copy, e.g., data_ptr usage in the
      case of an mfw_tlv request.
      Align the driver structure with the mfw copy and add reserved field(s) to
      the driver structure for the members not used by the driver.

      Fixes: dd006921 ("qed: Add MFW interfaces for TLV request support.")
      Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
      Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
      5f672090
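      The alignment technique itself, sketched with invented names (this is not
      the actual qed/mfw layout): members the driver does not use are kept as
      explicit reserved fields so that the members it does use, such as the data
      pointer, sit at the same offsets as in the firmware's copy.

      #include <linux/types.h>

      /* Hypothetical shared block: the firmware defines more members than the
       * driver cares about, so the driver-side definition reserves that space
       * instead of omitting it, keeping data_ptr at the firmware's offset.
       */
      struct shmem_tlv_block {
              u32 flags;
              u32 reserved[4];        /* used only by the firmware */
              u32 data_ptr;           /* offset must match the mfw copy */
      };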
    • S
      MAINTAINERS: change bridge maintainers · ce7d17d6
      Committed by Stephen Hemminger
      I have been doing reviews only, but not active development, on bridge
      code for several years.  Roopa and Nikolay have been doing most of
      the new features and have agreed to take over as new co-maintainers.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      ce7d17d6
    • D
      Merge branch 's390-qeth-fixes' · 26258cb3
      Committed by David S. Miller
      Julian Wiedmann says:

      ====================
      s390/qeth: fixes 2018-09-26

      Please apply two qeth patches for -net.  The first is a trivial cleanup
      required for patch #2 by Jean, which fixes a potential endless loop.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26258cb3
    • J
      s390: qeth: Fix potential array overrun in cmd/rc lookup · 048a7f8b
      Committed by Jean Delvare
      Functions qeth_get_ipa_msg and qeth_get_ipa_cmd_name are modifying
      the last member of global arrays without any locking that I can see.
      If two instances of either function are running at the same time,
      it could cause a race ultimately leading to an array overrun (the
      contents of the last entry of the array are the only guarantee that
      the loop will ever stop).
      
      Performing the lookups without modifying the arrays is admittedly
      slower (two comparisons per iteration instead of one) but these
      are operations which are rare (should only be needed in error
      cases or when debugging, not during successful operation) and it
      seems still less costly than introducing a mutex to protect the
      arrays in question.
      
      As a side bonus, it allows us to declare both arrays as const data.
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Cc: Julian Wiedmann <jwi@linux.ibm.com>
      Cc: Ursula Braun <ubraun@linux.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      048a7f8b
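      The race-free lookup pattern described above, reduced to a sketch with
      invented names (not the actual qeth tables): rather than writing the
      search key into the array's last slot as a sentinel, keep the table const
      and bound the loop by its size, at the cost of one extra comparison per
      iteration.

      #include <linux/kernel.h>

      struct rc_msg {
              int rc;
              const char *msg;
      };

      /* const is now possible because lookups never modify the table */
      static const struct rc_msg rc_msgs[] = {
              { 1, "first error" },
              { 2, "second error" },
      };

      static const char *rc_to_msg(int rc)
      {
              unsigned int i;

              /* two comparisons per iteration: bound check plus key match */
              for (i = 0; i < ARRAY_SIZE(rc_msgs); i++)
                      if (rc_msgs[i].rc == rc)
                              return rc_msgs[i].msg;
              return "unknown";
      }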
    • Z
      s390: qeth_core_mpc: Use ARRAY_SIZE instead of reimplementing its function · 065a2cdc
      Committed by zhong jiang
      Use the common code ARRAY_SIZE macro instead of a private implementation.
      Reviewed-by: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      065a2cdc
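      For reference, a sketch of what the macro does (the in-kernel version in
      <linux/kernel.h> also adds a compile-time check that its argument really
      is an array; the table below is invented for illustration):

      #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

      static const char * const rc_names[] = { "ok", "busy", "invalid" };

      static unsigned int rc_name_count(void)
      {
              /* 3 today, and automatically correct if entries are added */
              return ARRAY_SIZE(rc_names);
      }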
  3. 28 Sep 2018, 7 commits
    • D
      rxrpc: Fix error distribution · f3344303
      Committed by David Howells
      Fix error distribution by immediately delivering the errors to all the
      affected calls rather than deferring them to a worker thread.  The problem
      with deferral is that retries and other activity can happen in the
      meantime, whereas we want to cut them short as soon as the error is known.
      
      To this end:
      
       (1) Stop the error distributor from removing calls from the error_targets
           list so that peer->lock isn't needed to synchronise against other adds
           and removals.
      
       (2) Require the peer's error_targets list to be accessed with RCU, thereby
           avoiding the need to take peer->lock over distribution.
      
       (3) Don't attempt to affect a call's state if it is already marked complete.
      Signed-off-by: David Howells <dhowells@redhat.com>
      f3344303
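      A generic sketch of the locking pattern from points (1) and (2) above
      (structure and function names are invented, not the rxrpc code): once the
      reader path no longer removes entries, an error_targets-style list can be
      walked under RCU without taking the peer lock.

      #include <linux/rculist.h>

      struct error_target {
              struct list_head link;
              int last_error;
      };

      /* Reader side: deliver an error to every target on the list.  The list
       * is only modified under the writer's lock elsewhere, so RCU protection
       * is enough here.
       */
      static void distribute_error(struct list_head *targets, int error)
      {
              struct error_target *t;

              rcu_read_lock();
              list_for_each_entry_rcu(t, targets, link)
                      t->last_error = error;
              rcu_read_unlock();
      }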
    • D
      rxrpc: Fix transport sockopts to get IPv4 errors on an IPv6 socket · 37a675e7
      Committed by David Howells
      It seems that enabling IPV6_RECVERR on an IPv6 socket doesn't also turn on
      IP_RECVERR, so neither local errors nor ICMP-transported remote errors from
      IPv4 peer addresses are returned to the AF_RXRPC protocol.
      
      Make the sockopt setting code in rxrpc_open_socket() fall through from the
      AF_INET6 case to the AF_INET case to turn on all the AF_INET options too in
      the AF_INET6 case.
      
      Fixes: f2aeed3a ("rxrpc: Fix error reception on AF_INET6 sockets")
      Signed-off-by: David Howells <dhowells@redhat.com>
      37a675e7
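      The same requirement can be illustrated from userspace (a hedged sketch,
      not the kernel's rxrpc_open_socket() code): on an AF_INET6 UDP socket,
      both the IPv6 and the IPv4 error-queue options have to be enabled before
      errors from IPv4 peers are reported.

      #include <netinet/in.h>
      #include <sys/socket.h>

      /* Sketch: open a UDP6 socket and enable both error queues (Linux-
       * specific socket options), mirroring the fall-through added here.
       */
      static int open_udp6_with_error_queues(void)
      {
              int on = 1;
              int fd = socket(AF_INET6, SOCK_DGRAM, 0);

              if (fd < 0)
                      return -1;
              setsockopt(fd, IPPROTO_IPV6, IPV6_RECVERR, &on, sizeof(on));
              setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
              return fd;
      }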
    • D
      rxrpc: Make service call handling more robust · 0099dc58
      Committed by David Howells
      Make the following changes to improve the robustness of the code that sets
      up a new service call:
      
       (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
           service ID check and pass that along to rxrpc_new_incoming_call().
           This means that I can remove the check from rxrpc_new_incoming_call()
           without the need to worry about the socket attached to the local
           endpoint getting replaced - which would invalidate the check.
      
       (2) Cache the rxrpc_peer struct, thereby allowing the peer search to be
           done once.  The peer is passed to rxrpc_new_incoming_call(), thereby
           saving the need to repeat the search.
      
           This also reduces the possibility of rxrpc_publish_service_conn()
           BUG()'ing due to the detection of a duplicate connection, despite the
           initial search done by rxrpc_find_connection_rcu() having turned up
           nothing.
      
           This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
           non-reentrant and the result of the initial search should still hold
           true, but it has proven possible to hit.
      
           I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
           the iteration over the hash table if it finds a matching peer with a
           zero usage count, but I don't know for sure since it's only ever been
           hit once that I know of.
      
           Another possibility is that a bug in rxrpc_data_ready() that checked
           the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
           might've let through a packet that caused a spurious and invalid call
           to be set up.  That is addressed in another patch.
      
       (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
           usage count rather than stopping and returning not found, just in case
           there's another peer record behind it in the bucket.
      
       (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
           rather either use the peer cached in (2) or, if one wasn't found,
           preemptively install a new one.
      
      Fixes: 8496af50 ("rxrpc: Use RCU to access a peer's service connection tree")
      Signed-off-by: David Howells <dhowells@redhat.com>
      0099dc58
    • D
      rxrpc: Improve up-front incoming packet checking · 403fc2a1
      Committed by David Howells
      Do more up-front checking on incoming packets to weed out invalid ones and
      also ones aimed at services that we don't support.
      
      Whilst we're at it, replace the clearing of call and skew if we don't find
      a connection with just initialising the variables to zero at the top of the
      function.
      Signed-off-by: David Howells <dhowells@redhat.com>
      403fc2a1
    • D
      rxrpc: Emit BUSY packets when supposed to rather than ABORTs · ece64fec
      Committed by David Howells
      In the input path, a received sk_buff can be marked for rejection by
      setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
      (such as an abort code) in skb->priority.  The rejection is handled by
      queueing the sk_buff up for dealing with in process context.  The output
      code reads the mark and priority and, theoretically, generates an
      appropriate response packet.
      
      However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
      message with a random abort code is generated (since skb->priority wasn't
      set to anything).
      
      Fix this by outputting the appropriate sort of packet.
      
      Also, whilst we're at it, most of the marks are no longer used, so remove
      them and rename the remaining two to something more obvious.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      ece64fec
    • D
      rxrpc: Fix RTT gathering · b604dd98
      Committed by David Howells
      Fix RTT information gathering in AF_RXRPC by the following means:
      
       (1) Enable Rx timestamping on the transport socket with SO_TIMESTAMPNS.
      
       (2) If the sk_buff doesn't have a timestamp set when rxrpc_data_ready()
           collects it, set it at that point.
      
       (3) Allow ACKs to be requested on the last packet of a client call, but
           not a service call.  We need to be careful lest we undo:
      
      	bf7d620a
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Thu Oct 6 08:11:51 2016 +0100
      	rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase
      
           but that only really applies to service calls that we're handling,
           since the client side gets to send the final ACK (or not).
      
       (4) When about to transmit an ACK or DATA packet, record the Tx timestamp
           beforehand only; don't update the timestamp afterwards.
      
       (5) Switch the ordering between recording the serial and recording the
           timestamp to always set the serial number first.  The serial number
           shouldn't be seen referenced by an ACK packet until we've transmitted
           the packet bearing it - so in the Rx path, we don't need the timestamp
           until we've checked the serial number.
      
      Fixes: cf1a6474 ("rxrpc: Add per-peer RTT tracker")
      Signed-off-by: David Howells <dhowells@redhat.com>
      b604dd98
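      Point (1) relies on standard socket timestamping; a hedged userspace
      sketch of that mechanism (not the kernel-internal path used by AF_RXRPC):
      enable SO_TIMESTAMPNS and read the reception time back from the
      SCM_TIMESTAMPNS control message.

      #include <stdio.h>
      #include <string.h>
      #include <sys/socket.h>
      #include <sys/uio.h>
      #include <time.h>

      /* Userspace sketch only: receive one datagram and print the kernel's
       * nanosecond RX timestamp delivered via SCM_TIMESTAMPNS.
       */
      static void recv_with_rx_timestamp(int fd)
      {
              char buf[2048], ctrl[256];
              struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
              struct msghdr msg = {
                      .msg_iov = &iov, .msg_iovlen = 1,
                      .msg_control = ctrl, .msg_controllen = sizeof(ctrl),
              };
              struct cmsghdr *cm;
              struct timespec ts;
              int on = 1;

              setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on));
              if (recvmsg(fd, &msg, 0) < 0)
                      return;
              for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
                      if (cm->cmsg_level != SOL_SOCKET ||
                          cm->cmsg_type != SCM_TIMESTAMPNS)
                              continue;
                      memcpy(&ts, CMSG_DATA(cm), sizeof(ts));
                      printf("rx at %lld.%09ld\n",
                             (long long)ts.tv_sec, ts.tv_nsec);
              }
      }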
    • D
      rxrpc: Fix checks as to whether we should set up a new call · dc71db34
      Committed by David Howells
      There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED
      flag in the packet type field rather than in the packet flags field.
      
      Fix this by creating a pair of helper functions to check whether the packet
      is going to the client or to the server and use them generally.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      dc71db34
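      A minimal sketch of the helper-function approach (header layout, flag
      value and names are invented for illustration, not the actual rxrpc
      definitions): the direction test belongs on the flags byte, not on the
      packet type byte.

      #include <stdbool.h>

      struct wire_header {
              unsigned char type;     /* packet type */
              unsigned char flags;    /* per-packet flags */
      };

      #define HDR_CLIENT_INITIATED 0x01       /* hypothetical flag bit */

      /* Was this packet sent by a client towards a service? */
      static inline bool hdr_is_to_server(const struct wire_header *hdr)
      {
              return hdr->flags & HDR_CLIENT_INITIATED;
      }

      /* Or by a service back to a client? */
      static inline bool hdr_is_to_client(const struct wire_header *hdr)
      {
              return !hdr_is_to_server(hdr);
      }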
  4. 27 Sep 2018, 13 commits
    • D
      rxrpc: Remove dup code from rxrpc_find_connection_rcu() · 092ffc51
      Committed by David Howells
      rxrpc_find_connection_rcu() initialises variable k twice with the same
      information.  Remove one of the initialisations.
      Signed-off-by: David Howells <dhowells@redhat.com>
      092ffc51
    • M
      nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds · 1222a160
      Committed by Masashi Honma
      Use array_index_nospec() to sanitize i with respect to speculation.
      
      Note that the user doesn't control i directly, but can make it out
      of bounds by not finding a threshold in the array.
      Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
      [add note about user control, as explained by Masashi]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      1222a160
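      The mitigation pattern, shown generically (the array and function are
      invented; array_index_nospec() is the real helper from <linux/nospec.h>):
      after the bounds check, clamp the index so a mispredicted branch cannot
      use it to read out of bounds.

      #include <linux/errno.h>
      #include <linux/nospec.h>

      static int get_threshold(const int *thresholds, unsigned int count,
                               unsigned int i)
      {
              if (i >= count)
                      return -EINVAL;
              /* Clamp i under speculation as well as architecturally. */
              i = array_index_nospec(i, count);
              return thresholds[i];
      }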
    • M
      net-tcp: /proc/sys/net/ipv4/tcp_probe_interval is a u32 not int · d4ce5808
      Committed by Maciej Żenczykowski
      (fix documentation and sysctl access to treat it as such)
      
      Tested:
        # zcat /proc/config.gz | egrep ^CONFIG_HZ
        CONFIG_HZ_1000=y
        CONFIG_HZ=1000
        # echo $[(1<<32)/1000 + 1] | tee /proc/sys/net/ipv4/tcp_probe_interval
        4294968
        tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
        # echo $[(1<<32)/1000] | tee /proc/sys/net/ipv4/tcp_probe_interval
        4294967
        # echo 0 | tee /proc/sys/net/ipv4/tcp_probe_interval
        # echo -1 | tee /proc/sys/net/ipv4/tcp_probe_interval
        -1
        tee: /proc/sys/net/ipv4/tcp_probe_interval: Invalid argument
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4ce5808
    • M
      bnxt_en: Fix TX timeout during netpoll. · 73f21c65
      Committed by Michael Chan
      The current netpoll implementation in the bnxt_en driver has problems
      that may miss TX completion events.  bnxt_poll_work() in effect is
      only handling at most 1 TX packet before exiting.  In addition,
      there may be in-flight TX completions that ->poll() may miss even
      after we fix bnxt_poll_work() to handle all visible TX completions.
      netpoll may not call ->poll() again and HW may not generate an IRQ
      because the driver does not arm the IRQ when the budget (0 for netpoll)
      is reached.

      We fix it by handling all TX completions and by always arming the IRQ
      when we exit ->poll() with 0 budget.

      Also, the logic to ACK the completion ring in case it is almost filled
      with TX completions needs to be adjusted to take care of the 0-budget
      case, as discussed with Eric Dumazet <edumazet@google.com>.
      Reported-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Song Liu <songliubraving@fb.com>
      Tested-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      73f21c65
    • H
      vxlan: fill ttl inherit info · 8fd78069
      Committed by Hangbin Liu
      When adding vxlan ttl inherit support, I forgot to fill it in when dumping
      vxlan info.  Fix it now.

      Fixes: 72f6d71e ("vxlan: add ttl inherit support")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8fd78069
    • A
      net: phy: sfp: Fix unregistering of HWMON SFP device · 3e322474
      Committed by Andrew Lunn
      A HWMON device is only registered if the SFP module supports the
      diagnostic page and is compliant with SFF-8472.  Don't unconditionally
      unregister the hwmon device when the SFP module is removed, otherwise
      we access data structures which don't exist.
      Reported-by: Florian Fainelli <f.fainelli@gmail.com>
      Fixes: 1323061a ("net: phy: sfp: Add HWMON support for module sensors")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Tested-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e322474
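      The defensive pattern, sketched with invented state (not the actual sfp
      driver structures): remember whether the optional hwmon device was
      created, and only tear down what was actually registered.

      #include <linux/hwmon.h>

      struct sfp_state {                      /* invented for illustration */
              struct device *hwmon_dev;       /* NULL if never registered */
      };

      static void sfp_hwmon_teardown(struct sfp_state *s)
      {
              if (!s->hwmon_dev)
                      return;                 /* module had no diagnostics */
              hwmon_device_unregister(s->hwmon_dev);
              s->hwmon_dev = NULL;
      }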
    • N
      qed: Avoid implicit enum conversion in qed_iwarp_parse_rx_pkt · 77f2d753
      Committed by Nathan Chancellor
      Clang warns when one enumerated type is implicitly converted to another.
      
      drivers/net/ethernet/qlogic/qed/qed_iwarp.c:1713:25: warning: implicit
      conversion from enumeration type 'enum tcp_ip_version' to different
      enumeration type 'enum qed_tcp_ip_version' [-Wenum-conversion]
                      cm_info->ip_version = TCP_IPV4;
                                          ~ ^~~~~~~~
      drivers/net/ethernet/qlogic/qed/qed_iwarp.c:1733:25: warning: implicit
      conversion from enumeration type 'enum tcp_ip_version' to different
      enumeration type 'enum qed_tcp_ip_version' [-Wenum-conversion]
                      cm_info->ip_version = TCP_IPV6;
                                          ~ ^~~~~~~~
      2 warnings generated.
      
      Use the appropriate values from the expected type, qed_tcp_ip_version:
      
      TCP_IPV4 = QED_TCP_IPV4 = 0
      TCP_IPV6 = QED_TCP_IPV6 = 1
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/125
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      77f2d753
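      The shape of this kind of fix, with invented enums (not the qed
      definitions): assign from the enum type the field is declared with,
      instead of relying on two enums that happen to share numeric values.

      enum fw_ip_version  { FW_TCP_IPV4 = 0,  FW_TCP_IPV6 = 1 };
      enum drv_ip_version { DRV_TCP_IPV4 = 0, DRV_TCP_IPV6 = 1 };

      struct conn_info {
              enum drv_ip_version ip_version;
      };

      static void conn_set_ipv4(struct conn_info *ci)
      {
              /* ci->ip_version = FW_TCP_IPV4;   triggers -Wenum-conversion */
              ci->ip_version = DRV_TCP_IPV4;  /* same value, expected type */
      }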
    • N
      qed: Avoid constant logical operation warning in qed_vf_pf_acquire · 1c492a9d
      Committed by Nathan Chancellor
      Clang warns when a constant is used in a boolean context as it thinks a
      bitwise operation may have been intended.
      
      drivers/net/ethernet/qlogic/qed/qed_vf.c:415:27: warning: use of logical
      '&&' with constant operand [-Wconstant-logical-operand]
              if (!p_iov->b_pre_fp_hsi &&
                                       ^
      drivers/net/ethernet/qlogic/qed/qed_vf.c:415:27: note: use '&' for a
      bitwise operation
              if (!p_iov->b_pre_fp_hsi &&
                                       ^~
                                       &
      drivers/net/ethernet/qlogic/qed/qed_vf.c:415:27: note: remove constant
      to silence this warning
              if (!p_iov->b_pre_fp_hsi &&
                                      ~^~
      1 warning generated.
      
      This has been here since commit 1fe614d1 ("qed: Relax VF firmware
      requirements") and I am not entirely sure why since 0 isn't a special
      case. Just remove the statement causing Clang to warn since it isn't
      required.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/126
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1c492a9d
    • M
      bonding: avoid possible dead-lock · d4859d74
      Committed by Mahesh Bandewar
      Syzkaller reported this on a slightly older kernel but it's still
      applicable to the current kernel -
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-next-20180823+ #46 Not tainted
      ------------------------------------------------------
      syz-executor4/26841 is trying to acquire lock:
      00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652
      
      but task is already holding lock:
      00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
      00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (rtnl_mutex){+.+.}:
             __mutex_lock_common kernel/locking/mutex.c:925 [inline]
             __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
             mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
             rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
             bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
             bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
             process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
             worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
             kthread+0x35a/0x420 kernel/kthread.c:246
             ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
      
      -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
             process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
             worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
             kthread+0x35a/0x420 kernel/kthread.c:246
             ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
      
      -> #0 ((wq_completion)bond_dev->name){+.+.}:
             lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
             flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
             drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
             destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
             __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
             bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
             register_netdevice+0x337/0x1100 net/core/dev.c:8410
             bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
             rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
             rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
             netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
             rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
             netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
             netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
             netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
             sock_sendmsg_nosec net/socket.c:622 [inline]
             sock_sendmsg+0xd5/0x120 net/socket.c:632
             ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
             __sys_sendmsg+0x11d/0x290 net/socket.c:2153
             __do_sys_sendmsg net/socket.c:2162 [inline]
             __se_sys_sendmsg net/socket.c:2160 [inline]
             __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
             do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
      Chain exists of:
        (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock((work_completion)(&(&nnw->work)->work));
                                     lock(rtnl_mutex);
        lock((wq_completion)bond_dev->name);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor4/26841:
      
      stack backtrace:
      CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
       check_prev_add kernel/locking/lockdep.c:1862 [inline]
       check_prevs_add kernel/locking/lockdep.c:1975 [inline]
       validate_chain kernel/locking/lockdep.c:2416 [inline]
       __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
       lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
       flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
       drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
       destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
       __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
       bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
       register_netdevice+0x337/0x1100 net/core/dev.c:8410
       bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
       rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
       __sys_sendmsg+0x11d/0x290 net/socket.c:2153
       __do_sys_sendmsg net/socket.c:2162 [inline]
       __se_sys_sendmsg net/socket.c:2160 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457089
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4859d74
    • M
      bonding: pass link-local packets to bonding master also. · 6a9e461f
      Committed by Mahesh Bandewar
      Commit b89f04c6 ("bonding: deliver link-local packets with
      skb->dev set to link that packets arrived on") changed the behavior
      of how link-local-multicast packets are processed.  The change in
      behavior broke some legacy use cases where these packets are
      expected to arrive on the bonding master device as well.

      This patch passes the packet to the stack with the link it arrived
      on and also passes it to the bonding-master device, preserving the
      legacy use case.
      
      Fixes: b89f04c6 ("bonding: deliver link-local packets with skb->dev set to link that packets arrived on")
      Reported-by: Michal Soltys <soltys@ziu.info>
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6a9e461f
    • N
      qed: Avoid implicit enum conversion in qed_roce_mode_to_flavor · d3a31579
      Committed by Nathan Chancellor
      Clang warns when one enumerated type is implicitly converted to another.
      
      drivers/net/ethernet/qlogic/qed/qed_roce.c:153:12: warning: implicit
      conversion from enumeration type 'enum roce_mode' to different
      enumeration type 'enum roce_flavor' [-Wenum-conversion]
                      flavor = ROCE_V2_IPV6;
                             ~ ^~~~~~~~~~~~
      drivers/net/ethernet/qlogic/qed/qed_roce.c:156:12: warning: implicit
      conversion from enumeration type 'enum roce_mode' to different
      enumeration type 'enum roce_flavor' [-Wenum-conversion]
                      flavor = MAX_ROCE_MODE;
                             ~ ^~~~~~~~~~~~~
      2 warnings generated.
      
      Use the appropriate values from the expected type, roce_flavor:
      
      ROCE_V2_IPV6 = RROCE_IPV6 = 2
      MAX_ROCE_MODE = MAX_ROCE_FLAVOR = 3
      
      While we're at it, ditch the local variable flavor; we can just return
      the value directly from the switch statement.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/125
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3a31579
    • N
      qed: Fix mask parameter in qed_vf_prep_tunn_req_tlv · db803f36
      Committed by Nathan Chancellor
      Clang complains when one enumerated type is implicitly converted to
      another.
      
      drivers/net/ethernet/qlogic/qed/qed_vf.c:686:6: warning: implicit
      conversion from enumeration type 'enum qed_tunn_mode' to different
      enumeration type 'enum qed_tunn_clss' [-Wenum-conversion]
                                       QED_MODE_L2GENEVE_TUNN,
                                       ^~~~~~~~~~~~~~~~~~~~~~
      
      Update mask's parameter to expect qed_tunn_mode, which is what was
      intended.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/125
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db803f36
    • N
      qed: Avoid implicit enum conversion in qed_set_tunn_cls_info · a898fba3
      Committed by Nathan Chancellor
      Clang warns when one enumerated type is implicitly converted to another.
      
      drivers/net/ethernet/qlogic/qed/qed_sp_commands.c:163:25: warning:
      implicit conversion from enumeration type 'enum tunnel_clss' to
      different enumeration type 'enum qed_tunn_clss' [-Wenum-conversion]
              p_tun->vxlan.tun_cls = type;
                                   ~ ^~~~
      drivers/net/ethernet/qlogic/qed/qed_sp_commands.c:165:26: warning:
      implicit conversion from enumeration type 'enum tunnel_clss' to
      different enumeration type 'enum qed_tunn_clss' [-Wenum-conversion]
              p_tun->l2_gre.tun_cls = type;
                                    ~ ^~~~
      drivers/net/ethernet/qlogic/qed/qed_sp_commands.c:167:26: warning:
      implicit conversion from enumeration type 'enum tunnel_clss' to
      different enumeration type 'enum qed_tunn_clss' [-Wenum-conversion]
              p_tun->ip_gre.tun_cls = type;
                                    ~ ^~~~
      drivers/net/ethernet/qlogic/qed/qed_sp_commands.c:169:29: warning:
      implicit conversion from enumeration type 'enum tunnel_clss' to
      different enumeration type 'enum qed_tunn_clss' [-Wenum-conversion]
              p_tun->l2_geneve.tun_cls = type;
                                       ~ ^~~~
      drivers/net/ethernet/qlogic/qed/qed_sp_commands.c:171:29: warning:
      implicit conversion from enumeration type 'enum tunnel_clss' to
      different enumeration type 'enum qed_tunn_clss' [-Wenum-conversion]
              p_tun->ip_geneve.tun_cls = type;
                                       ~ ^~~~
      5 warnings generated.
      
      Avoid this by changing type to an int.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/125
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a898fba3