1. 18 Jan, 2017 16 commits
    • S
      qed: Replace memset with eth_zero_addr · 0ee28e31
      Shyam Saini committed
      Use eth_zero_addr to assign zero address to the given address array
      instead of memset when the second argument in memset is address
      of zero. Also, it makes the code clearer.
      Signed-off-by: Shyam Saini <mayhs11saini@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ee28e31
    • L
      bridge: sparse fixes in br_ip6_multicast_alloc_query() · 53631a5f
      Lance Richardson committed
      Changed type of csum field in struct igmpv3_query from __be16 to
      __sum16 to eliminate type warning, made same change in struct
      igmpv3_report for consistency.
      
      Fixed up an ntohs() where htons() should have been used instead.
      Signed-off-by: Lance Richardson <lrichard@redhat.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53631a5f
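The ntohs()/htons() swap fixed above does not change runtime behavior, because both helpers perform the same byte swap on any given machine; the distinction matters for the endianness annotations that sparse checks. A quick Python check of that equivalence using the standard socket module (the value 0x1234 is an arbitrary example, not taken from the commit):

```python
import socket

value = 0x1234  # arbitrary 16-bit example value

# Both helpers either swap the two bytes (little-endian host) or do
# nothing (big-endian host), so they always agree with each other.
via_htons = socket.htons(value)
via_ntohs = socket.ntohs(value)

# Converting to network order and back is the identity.
round_trip = socket.ntohs(socket.htons(value))
```

The two names exist to document direction (host to network versus network to host), which is exactly what a type checker such as sparse can verify and a compiler cannot.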
    • D
      580bdf56
    • D
      Merge branch 'mpls-packet-stats' · e60a4263
      David S. Miller committed
      Robert Shearman says:
      
      ====================
      mpls: Packet stats
      
      This patchset records per-interface packet stats in the MPLS
      forwarding path and exports them using a nest of attributes rooted at a
      new IFLA_STATS_AF_SPEC attribute as part of RTM_GETSTATS messages:
      
      [IFLA_STATS_AF_SPEC]
       -> [AF_MPLS]
        -> [MPLS_STATS_LINK]
         -> struct mpls_link_stats
      
      The first patch adds the rtnl infrastructure for this, including new
      per-AF ops callbacks, fill_stats_af and get_stats_af_size. The
      second patch records MPLS stats and makes use of the infrastructure to
      export them. The rtnl infrastructure could also be used to export IPv6
      stats in the future.
      
      Changes in v2:
       - make incrementing IPv6 stats in mpls_stats_inc_outucastpkts
         conditional on CONFIG_IPV6 to fix build with CONFIG_IPV6=n
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e60a4263
    • R
      mpls: Packet stats · 27d69105
      Robert Shearman committed
      Having MPLS packet stats is useful for observing network operation and
      for diagnosing network problems. In the absence of anything better,
      RFC2863 and RFC3813 are used for guidance for which stats to expose
      and the semantics of them. In particular rx_noroutes maps to in
      unknown protos in RFC2863. The stats are exposed to userspace via
      AF_MPLS attributes embedded in the IFLA_STATS_AF_SPEC attribute of
      RTM_GETSTATS messages.
      
      All the introduced fields are 64-bit, even error ones, to ensure no
      overflow with long uptimes. Per-CPU counters are used to avoid
      cache-line contention on the commonly used fields. The other fields
      have also been made per-CPU for code to avoid performance problems in
      error conditions on the assumption that on some platforms the cost of
      atomic operations could be more expensive than sending the packet
      (which is what would be done in the success case). If that's not the
      case, we could instead not use per-CPU counters for these fields.
      
      Only unicast and non-fragment are exposed at the moment, but other
      counters can be exposed in the future either by adding to the end of
      struct mpls_link_stats or by additional netlink attributes in the
      AF_MPLS IFLA_STATS_AF_SPEC nested attribute.
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      27d69105
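The per-CPU counter idea described in this commit can be modeled in a few lines. This is a toy Python sketch, not the kernel implementation: the class name PerCpuStats and the field rx_packets are illustrative assumptions, while rx_noroutes is the field named in the commit message.

```python
from collections import Counter


class PerCpuStats:
    """Toy model of per-CPU counters: one private counter set per CPU."""

    def __init__(self, num_cpus):
        self._per_cpu = [Counter() for _ in range(num_cpus)]

    def inc(self, cpu, field, amount=1):
        # Each CPU touches only its own slot, so writers never contend.
        self._per_cpu[cpu][field] += amount

    def read(self, field):
        # Readers pay the cost instead: they sum every CPU's slot.
        return sum(slot[field] for slot in self._per_cpu)


stats = PerCpuStats(num_cpus=4)
stats.inc(0, "rx_packets")
stats.inc(3, "rx_packets", 2)
stats.inc(1, "rx_noroutes")
```

Here stats.read("rx_packets") sums the contributions of CPU 0 and CPU 3. The trade-off is the one the commit message discusses: cheap, contention-free increments on the packet path in exchange for a more expensive read when the stats are dumped.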
    • R
      net: AF-specific RTM_GETSTATS attributes · aefb4d4a
      Robert Shearman committed
      Add the functionality for including address-family-specific per-link
      stats in RTM_GETSTATS messages. This is done through adding a new
      IFLA_STATS_AF_SPEC attribute under which address family attributes are
      nested and then the AF-specific attributes can be further nested. This
      follows the model of IFLA_AF_SPEC on RTM_*LINK messages and it has the
      advantage of presenting an easily extended hierarchy. The rtnl_af_ops
      structure is extended to provide AFs with the opportunity to fill and
      provide the size of their stats attributes.
      
      One alternative would have been to provide AFs with the ability to add
      attributes directly into the RTM_GETSTATS message without a nested
      hierarchy. I discounted this approach as it increases the rate at
      which the 32 attribute number space is used up and it makes
      implementation a little more tricky for stats dump resuming (at the
      moment the order in which attributes are added to the message has to
      match the numeric order of the attributes).
      
      Another alternative would have been to register per-AF RTM_GETSTATS
      handlers. I discounted this approach as I perceived a common use-case
      to be getting all the stats for an interface and this approach would
      necessitate multiple requests/dumps to retrieve them all.
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aefb4d4a
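The nesting described in this commit can be pictured with plain dictionaries. The following Python sketch is only a model of the attribute hierarchy: the function build_stats_af_spec, the dictionary layout, and the counter values are assumptions made for illustration, while the attribute names and the fill_stats_af callback name come from the commit messages.

```python
def build_stats_af_spec(af_ops_list, dev):
    """Collect each address family's stats under one top-level nest."""
    nest = {}
    for ops in af_ops_list:
        fill = ops.get("fill_stats_af")
        if fill is None:
            continue  # this AF exports no per-link stats
        nest[ops["family"]] = fill(dev)
    return {"IFLA_STATS_AF_SPEC": nest} if nest else {}


def mpls_fill_stats_af(dev):
    # Hypothetical counters; the real payload is struct mpls_link_stats.
    return {"MPLS_STATS_LINK": {"rx_packets": dev["rx"], "tx_packets": dev["tx"]}}


af_ops_list = [
    {"family": "AF_MPLS", "fill_stats_af": mpls_fill_stats_af},
    {"family": "AF_INET6"},  # no stats callback registered
]
message = build_stats_af_spec(af_ops_list, {"rx": 10, "tx": 7})
```

A single request returns every family's stats in one message, and a new family only has to register a callback, which is the extensibility argument made above.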
    • J
      stmmac: add missing of_node_put · a249708b
      Julia Lawall committed
      The function stmmac_dt_phy provides several possibilities for initializing
      plat->mdio_node, all of which have the effect of increasing the reference
      count of the assigned value.  This field is not updated elsewhere, so the
      value is live until the end of the lifetime of plat (devm_allocated), just
      after the end of stmmac_remove_config_dt.  Thus, add an of_node_put on
      plat->mdio_node in stmmac_remove_config_dt.  It is possible that the field
      mdio_node is never initialized, but of_node_put is NULL-safe, so it is also
      safe to call of_node_put in that case.
      Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
      Acked-by: Alexandre TORGUE <alexandre.torgue@st.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a249708b
    • R
      virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit · 501db511
      Rolf Neugebauer committed
      This patch partly reverts fd2a0437 and e858fae2, which introduced a
      subtle change in how the virtio_net flags are derived from the SKB's
      ip_summed field.
      
      With the above commits, the flags are set to VIRTIO_NET_HDR_F_DATA_VALID
      when ip_summed == CHECKSUM_UNNECESSARY, thus treating it differently to
      ip_summed == CHECKSUM_NONE, which should be the same.
      
      Further, the virtio spec 1.0 / CS04 explicitly says that
      VIRTIO_NET_HDR_F_DATA_VALID must not be set by the driver.
      
      Fixes: fd2a0437 ("virtio_net: introduce virtio_net_hdr_{from,to}_skb")
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      501db511
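The behavior change described in this commit can be summarized as two small mapping functions. This Python sketch is a model, not the driver code: the function names are hypothetical and the numeric constants are illustrative stand-ins for the kernel's definitions.

```python
# Illustrative stand-ins for the kernel's ip_summed states.
CHECKSUM_NONE = 0
CHECKSUM_UNNECESSARY = 1
CHECKSUM_PARTIAL = 3

# Illustrative stand-ins for the virtio_net header flags.
VIRTIO_NET_HDR_F_NEEDS_CSUM = 1
VIRTIO_NET_HDR_F_DATA_VALID = 2


def xmit_flags_before(ip_summed):
    """Behavior introduced by the two commits being partly reverted."""
    if ip_summed == CHECKSUM_PARTIAL:
        return VIRTIO_NET_HDR_F_NEEDS_CSUM
    if ip_summed == CHECKSUM_UNNECESSARY:
        return VIRTIO_NET_HDR_F_DATA_VALID  # the flag a driver must not set
    return 0


def xmit_flags_after(ip_summed):
    """Behavior after this patch: DATA_VALID is never set on transmit."""
    if ip_summed == CHECKSUM_PARTIAL:
        return VIRTIO_NET_HDR_F_NEEDS_CSUM
    return 0
```

After the change, CHECKSUM_UNNECESSARY and CHECKSUM_NONE produce the same flags again, which is the equivalence the commit message asks for.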
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4b19a9e2
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Handle multicast packets properly in fast-RX path of mac80211, from
          Johannes Berg.
      
       2) Because of a logic bug, the user can't actually force SW
          checksumming on r8152 devices. This makes diagnosis of hw
          checksumming bugs really annoying. Fix from Hayes Wang.
      
       3) VXLAN route lookup does not take the source and destination ports
          into account, which means IPSEC policies cannot be matched properly.
          Fix from Martynas Pumputis.
      
       4) Do proper RCU locking in netvsc callbacks, from Stephen Hemminger.
      
       5) Fix SKB leaks in mlxsw driver, from Arkadi Sharshevsky.
      
       6) If lwtunnel_fill_encap() fails, we do not abort the netlink message
          construction properly in fib_dump_info(), from David Ahern.
      
       7) Do not use kernel stack for DMA buffers in atusb driver, from Stefan
          Schmidt.
      
       8) Openvswitch conntrack actions need to maintain a correct checksum,
          fix from Lance Richardson.
      
       9) ax25_disconnect() is missing a check for ax25->sk being NULL, in
          fact it already checks this, but not in all of the necessary spots.
          Fix from Basil Gunn.
      
      10) Action GET operations in the packet scheduler can erroneously bump
          the reference count of the entry, making it unreleasable. Fix from
          Jamal Hadi Salim. Jamal gives a great set of example command lines
          that trigger this in the commit message.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
        net sched actions: fix refcnt when GETing of action after bind
        net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
        net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
        net/mlx4_core: Fix racy CQ (Completion Queue) free
        net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered
        net/mlx5e: Fix a -Wmaybe-uninitialized warning
        ax25: Fix segfault after sock connection timeout
        bpf: rework prog_digest into prog_tag
        tipc: allocate user memory with GFP_KERNEL flag
        net: phy: dp83867: allow RGMII_TXID/RGMII_RXID interface types
        ip6_tunnel: Account for tunnel header in tunnel MTU
        mld: do not remove mld souce list info when set link down
        be2net: fix MAC addr setting on privileged BE3 VFs
        be2net: don't delete MAC on close on unprivileged BE3 VFs
        be2net: fix status check in be_cmd_pmac_add()
        cpmac: remove hopeless #warning
        ravb: do not use zero-length alignment DMA descriptor
        mlx4: do not call napi_schedule() without care
        openvswitch: maintain correct checksum state in conntrack actions
        tcp: fix tcp_fastopen unaligned access complaints on sparc
        ...
      4b19a9e2
    • L
      Merge branch 'stable/for-linus-4.10' of... · 203f80f1
      Linus Torvalds committed
      Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
      
      Pull swiotlb fix from Konrad Rzeszutek Wilk:
       "A tiny fix to make sure that page-sized mappings are page-aligned (and
        not say straddle two pages). This is important for some drivers (such
        as NVME)"
      
      * 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
        swiotlb: ensure that page-sized mappings are page-aligned
      203f80f1
    • L
      Merge tag 'mmc-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 7e84b303
      Linus Torvalds committed
      Pull MMC fixes from Ulf Hansson:
       "MMC core:
         - fix regressions detecting HS/HS DDR eMMC cards related to CMD6
      
        MMC host:
         - mmc: mxs-mmc: Fix additional cycles after transmission stop
         - sdhci-acpi: Only powered up enabled acpi child devices
         - meson: avoid possible NULL dereference"
      
      * tag 'mmc-v4.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: core: Restore parts of the polling policy when switch to HS/HS DDR
        mmc: mxs-mmc: Fix additional cycles after transmission stop
        mmc: sdhci-acpi: Only powered up enabled acpi child devices
        MMC: meson: avoid possible NULL dereference
      7e84b303
    • L
      Merge tag 'for-linus-20170116' of git://git.infradead.org/linux-mtd · 7d8b8c09
      Linus Torvalds committed
      Pull MTD fixes from Brian Norris:
       "Just NAND updates from Boris:
      
         - avoid compiling xway NAND controller driver as a module (which
           didn't work)
      
         - fix tango NAND DT binding and make sure the controller is in a
           clean state at probe time
      
         - add dependency on HAS_IOMEM to the oxnas NAND driver
      
         - fix irq number validity check in the lpc32xx driver"
      
      * tag 'for-linus-20170116' of git://git.infradead.org/linux-mtd:
        mtd: nand: lpc32xx: fix invalid error handling of a requested irq
        mtd: nand: tango: Reset pbus to raw mode in probe
        mtd: nand: tango: Update DT binding description
        mtd: nand: oxnas_nand: fix build errors on arch/um, require HAS_IOMEM
        mtd: nand: xway: fix build because of module functions
        mtd: nand: xway: disable module support
      7d8b8c09
    • P
      net: marvell: sky2: use new api ethtool_{get|set}_link_ksettings · 55f78fcd
      Philippe Reynes committed
      The ethtool api {get|set}_settings is deprecated.
      We move this driver to new api {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55f78fcd
    • P
      net: marvell: skge: use new api ethtool_{get|set}_link_ksettings · 0f826385
      Philippe Reynes committed
      The ethtool api {get|set}_settings is deprecated.
      We move this driver to new api {get|set}_link_ksettings.
      
      The callback set_link_ksettings no longer updates the value
      of advertising, as the struct ethtool_link_ksettings is
      defined as const.
      
      As I don't have the hardware, I'd be very pleased if
      someone could test this patch.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0f826385
    • P
      net: jme: use new api ethtool_{get|set}_link_ksettings · c523838c
      Philippe Reynes committed
      The ethtool api {get|set}_settings is deprecated.
      We move this driver to new api {get|set}_link_ksettings.
      
      As I don't have the hardware, I'd be very pleased if
      someone could test this patch.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c523838c
    • P
      net: korina: use new api ethtool_{get|set}_link_ksettings · af473688
      Philippe Reynes committed
      The ethtool api {get|set}_settings is deprecated.
      We move this driver to new api {get|set}_link_ksettings.
      Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af473688
  2. 17 Jan, 2017 24 commits
    • D
      Merge branch 'mvneta-xmit_more-bql' · b8128c42
      David S. Miller committed
      Marcin Wojtas says:
      
      ====================
      mvneta xmit_more and bql support
      
      This is a delayed v2 of a short patchset, which introduces xmit_more and BQL
      to the mvneta driver. The only change is in the xmit_more support:
      a condition check preventing excessive descriptor concatenation before
      flushing in HW.
      
      Any comments or feedback would be welcome.
      
      Changelog:
      v1 -> v2:
      
      * Add a condition check that ensures too many descriptors are not
        concatenated before flushing in HW.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8128c42
    • M
      net: mvneta: add BQL support · a29b6235
      Marcin Wojtas committed
      Tests showed that when the whole bandwidth is consumed, the latency for
      various kinds of traffic can reach high values. With a saturated
      link (e.g. with iperf from target to host) a simple ping could take
      a significant amount of time. BQL proved to improve this situation
      when implemented in the mvneta driver. Measurements of ping latency
      for 3 link speeds:
      Speed | Latency w/o BQL | Latency with BQL
      10    | 7-14 ms         | 3.5 ms
      100   | 2-12 ms         | 0.6 ms
      1000  | often timeout   | up to 2 ms
      
      Decreasing latency as above results in a slight performance cost: 4 kpps
      (-1.4%) when pushing 64B packets via two bridged interfaces of Armada 38x.
      For 1500B packets in the same setup, the mpstat tool showed +8% of
      CPU occupation (default affinity, second CPU idle). Even so, this
      cost seems reasonable to take, considering the other improvements.
      
      This commit adds the byte queue limit mechanism to the mvneta driver.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a29b6235
    • S
      net: mvneta: add xmit_more support · 2a90f7e1
      Simon Guinot committed
      Based on the xmit_more flag of the skb, TX descriptors can be concatenated
      before flushing. This commit delays the Tx descriptor flush if the queue is
      running and if there are more skbs to send.
      
      The maximum number of descriptors that can be flushed at once is 255, due
      to a limitation of the MVNETA_TXQ_UPDATE_REG(q) registers. Because of that,
      a new macro (MVNETA_TXQ_DEC_SENT_MASK) was added to ensure that the number
      of concatenated descriptors does not exceed that value.
      Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a90f7e1
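The batching rule described in this commit can be sketched as a small state machine. This Python model is a simplification and not the driver's exact logic: the class TxQueue and its methods are hypothetical, the stopped-queue case is left out, and only the 255-descriptor limit comes from the commit message.

```python
MAX_PENDING = 255  # per-update limit stated in the commit message


class TxQueue:
    def __init__(self):
        self.pending = 0   # descriptors queued but not yet flushed
        self.flushes = []  # sizes of the updates written to the hardware

    def flush(self):
        if self.pending:
            self.flushes.append(self.pending)
            self.pending = 0

    def xmit(self, descs, xmit_more):
        # Assumes a single packet never needs more than MAX_PENDING descriptors.
        if self.pending + descs > MAX_PENDING:
            self.flush()  # never let one update exceed the cap
        self.pending += descs
        if not xmit_more:
            self.flush()  # last packet of the burst: tell the hardware


queue = TxQueue()
for i in range(100):
    # 100 packets of 3 descriptors each; only the last one ends the burst.
    queue.xmit(descs=3, xmit_more=(i < 99))
```

In this run the 300 descriptors reach the hardware in two updates instead of 100, and neither update exceeds the limit.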
    • J
      net sched actions: fix refcnt when GETing of action after bind · 0faa9cb5
      Jamal Hadi Salim committed
      Demonstrating the issue:
      
      .. add a drop action
      $ sudo $TC actions add action drop index 10
      
      .. retrieve it
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 29 sec used 29 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... bug 1 above: reference is two.
          Reference is actually 1 but we forget to subtract 1.
      
      ... do a GET again and we see the same issue
          try a few times and nothing changes
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 0 installed 31 sec used 31 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      ... let's try to bind the action to a filter..
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      ... and now a few GETs:
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 3 bind 1 installed 204 sec used 204 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 4 bind 1 installed 206 sec used 206 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 5 bind 1 installed 235 sec used 235 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      .... as can be observed the reference count keeps going up.
      
      After the fix
      
      $ sudo $TC actions add action drop index 10
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 4 sec used 4 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 1 bind 0 installed 6 sec used 6 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC qdisc add dev lo ingress
      $ sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
        u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 10
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 32 sec used 32 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      $ sudo $TC -s actions get action gact index 10
      
      	action order 1: gact action drop
      	 random type none pass val 0
      	 index 10 ref 2 bind 1 installed 33 sec used 33 sec
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      Fixes: aecc5cef ("net sched actions: fix GETing actions")
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0faa9cb5
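The counts printed in the transcripts above can be reproduced with a toy model. This Python sketch encodes one reading that is consistent with the observed output and is an assumption, not the kernel code: a GET takes a temporary reference, the old code reports the count including that reference, and the old code fails to drop it while the action is bound. All names are hypothetical.

```python
class Action:
    def __init__(self):
        self.refcnt = 1
        self.bindcnt = 0


def bind(action):
    # Binding to a filter adds one reference and one binding.
    action.refcnt += 1
    action.bindcnt += 1


def get_before_fix(action):
    action.refcnt += 1        # the lookup takes a temporary reference
    shown = action.refcnt     # bug 1: the temporary reference is reported
    if action.bindcnt == 0:   # bug 2: the release is skipped while bound
        action.refcnt -= 1
    return shown


def get_after_fix(action):
    action.refcnt += 1
    shown = action.refcnt - 1  # report the count without our own reference
    action.refcnt -= 1         # always drop the temporary reference
    return shown


old = Action()
before_unbound = [get_before_fix(old), get_before_fix(old)]
bind(old)
before_bound = [get_before_fix(old) for _ in range(3)]

new = Action()
after_unbound = [get_after_fix(new), get_after_fix(new)]
bind(new)
after_bound = [get_after_fix(new), get_after_fix(new)]
```

The four lists correspond, in order, to the "ref" values in the four groups of GET output shown in the commit message.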
    • L
      Merge tag 'nfs-for-4.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 5cf7a0f3
      Linus Torvalds committed
      Pull NFS client bugfixes from Trond Myklebust:
      
       - fix invalid fget()/fput() calls when doing file locking
      
       - fix multiple directory cache invalidation issues due to the client
         failing to recognise that the directory wasn't changed
      
       - fix client recovery when server reboots multiple times
      
      * tag 'nfs-for-4.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        NFSv4: Fix client recovery when server reboots multiple times
        NFSv4: update_changeattr should update the attribute timestamp
        NFSv4: Don't call update_changeattr() unless the unlink is successful
        NFSv4: Don't apply change_info4 twice on rename within a directory
        NFSv4: Call update_changeattr() from _nfs4_proc_open only if a file was created
        nfs: Don't take a reference on fl->fl_file for LOCK operation
      5cf7a0f3
    • D
      Merge branch 'mlx4-core-fixes' · 617125e7
      David S. Miller committed
      Tariq Toukan says:
      
      ====================
      mlx4 core fixes
      
      This patchset contains bug fixes from Jack to the mlx4 Core driver.
      
      Patch 1 solves a race in the flow of CQ free.
      Patch 2 moves some qp context flags update to the correct qp transition.
      Patch 3 eliminates warnings from the path of SRQ_LIMIT that flood the message log,
      and keeps them only in the path of SRQ_CATAS_ERROR.
      
      Series generated against net commit:
      1a717fcf Merge tag 'mac80211-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      617125e7
    • J
      net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV · 9577b174
      Jack Morgenstein committed
      When running SRIOV, warnings for SRQ LIMIT events flood the Hypervisor's
      message log when (correct, normally operating) apps use SRQ LIMIT events
      as a trigger to post WQEs to SRQs.
      
      Add more information to the existing debug printout for SRQ_LIMIT, and
      output the warning messages only for the SRQ CATAS ERROR event.
      
      Fixes: acba2420 ("mlx4_core: Add wrapper functions and comm channel and slave event support to EQs")
      Fixes: e0debf9c ("mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9577b174
    • J
      net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions · 7c3945bc
      Jack Morgenstein committed
      Save the qp context flags byte containing the flag disabling vlan stripping
      in the RESET to INIT qp transition, rather than in the INIT to RTR
      transition. Per the firmware spec, the flags in this byte are active
      in the RESET to INIT transition.
      
      As a result of saving the flags in the incorrect qp transition, when
      switching dynamically from VGT to VST and back to VGT, the vlan
      remained stripped (as is required for VST) and did not return to
      not-stripped (as is required for VGT).
      
      Fixes: f0f829bf ("net/mlx4_core: Add immediate activate for VGT->VST->VGT")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7c3945bc
    • J
      net/mlx4_core: Fix racy CQ (Completion Queue) free · 291c566a
      Jack Morgenstein committed
      In functions mlx4_cq_completion() and mlx4_cq_event(), the
      radix_tree_lookup requires an rcu_read_lock.
      This is mandatory: if another core frees the CQ, it could
      run the radix_tree_node_rcu_free() call_rcu() callback while
      it's being used by the radix tree lookup function.
      
      Additionally, in function mlx4_cq_event(), since we are adding
      the rcu lock around the radix-tree lookup, we no longer need to take
      the spinlock. Also, the synchronize_irq() call for the async event
      eliminates the need for incrementing the cq reference count in
      mlx4_cq_event().
      
      Other changes:
      1. In function mlx4_cq_free(), replace spin_lock_irq with spin_lock:
         we no longer take this spinlock in the interrupt context.
         The spinlock here, therefore, simply protects against different
         threads simultaneously invoking mlx4_cq_free() for different cq's.
      
      2. In function mlx4_cq_free(), we move the radix tree delete to before
         the synchronize_irq() calls. This guarantees that we will not
         access this cq during any subsequent interrupts, and therefore can
         safely free the CQ after the synchronize_irq calls. The rcu_read_lock
         in the interrupt handlers only needs to protect against corrupting the
         radix tree; the interrupt handlers may access the cq outside the
         rcu_read_lock due to the synchronize_irq calls which protect against
         premature freeing of the cq.
      
      3. In function mlx4_cq_event(), we change the mlx4_warn message to mlx4_dbg.
      
      4. We leave the cq reference count mechanism in place, because it is
         still needed for the cq completion tasklet mechanism.
      
      Fixes: 6d90aa5c ("net/mlx4_core: Make sure there are no pending async events when freeing CQ")
      Fixes: 225c7b1f ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      291c566a
    • P
      net/sched: cls_flower: Disallow duplicate internal elements · a3308d8f
      Paul Blakey committed
      Flower currently allows having the same filter twice with the same
      priority. Actions (and statistics update) will always execute on the
      first inserted rule, leaving the second rule unused.
      This patch disallows that.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3308d8f
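The duplicate check introduced by this commit can be illustrated with a keyed table. This Python sketch is a model under stated assumptions: FlowerTable, DuplicateFilterError and the match fields are hypothetical names, and the real classifier uses its own hash table rather than a dictionary.

```python
class DuplicateFilterError(Exception):
    """Raised when an identical filter already exists at this priority."""


class FlowerTable:
    def __init__(self):
        self._rules = {}

    def insert(self, priority, match, action):
        # Identity of a rule: its priority plus its full match key.
        key = (priority, tuple(sorted(match.items())))
        if key in self._rules:
            raise DuplicateFilterError(f"duplicate filter at priority {priority}")
        self._rules[key] = action

    def __len__(self):
        return len(self._rules)


table = FlowerTable()
table.insert(1, {"dst_ip": "10.0.0.1"}, "drop")
try:
    table.insert(1, {"dst_ip": "10.0.0.1"}, "pass")
    rejected = False
except DuplicateFilterError:
    rejected = True
table.insert(2, {"dst_ip": "10.0.0.1"}, "pass")  # same match, other priority
```

Rejecting the second insert up front avoids the situation described above, where the later rule would sit in the table and never execute.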
    • H
      net: stmmac: don't use netdev_[dbg, info, ..] before net_device is registered · b618ab45
      Heiner Kallweit committed
      Don't use netdev_info and friends before the net_device is registered.
      This avoids ugly messages like
      "meson8b-dwmac c9410000.ethernet (unnamed net_device) (uninitialized):
      Enable RX Mitigation via HW Watchdog Timer"
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b618ab45
    • A
      net/mlx5e: Fix a -Wmaybe-uninitialized warning · abeffce9
      Arnd Bergmann committed
      As found by Olof's build bot, we gain a harmless warning about a
      potential uninitialized variable reference in mlx5:
      
      drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function 'parse_tc_fdb_actions':
      drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:769:13: warning: 'out_dev' may be used uninitialized in this function [-Wmaybe-uninitialized]
      drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:811:21: note: 'out_dev' was declared here
      
      This was introduced through the addition of an 'IS_ERR/PTR_ERR' pair
      that gcc is unfortunately unable to completely figure out.
      
      The problem is that gcc cannot tell that if(IS_ERR()) in
      mlx5e_route_lookup_ipv4() is equivalent to checking if(err) later,
      so it assumes that 'out_dev' is used after the 'return PTR_ERR(rt)'.
      
      The PTR_ERR_OR_ZERO() case by comparison is fairly easy to detect
      by gcc, so it can't get that wrong, so it no longer warns.
      
      Hadar Hen Zion already attempted to fix the warning earlier by adding fake
      initializations, but that ended up not fully addressing all warnings, so
      I'm reverting it now that it is no longer needed.
      
      Link: http://arm-soc.lixom.net/buildlogs/mainline/v4.10-rc3-98-gcff3b2c/
      Fixes: a42485eb ("net/mlx5e: TC ipv4 tunnel encap offload error flow fixes")
      Fixes: a757d108 ("net/mlx5e: Fix kbuild warnings for uninitialized parameters")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      abeffce9
    • D
      ipv6: sr: add missing Kbuild export for header files · a50a05f4
      David Lebrun committed
      Add missing IPv6-SR header files in include/uapi/linux/Kbuild.
      
      Also, prevent seg6_lwt_headroom() from being exported and add
      missing linux/types.h include.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a50a05f4
    • D
      bpf, trace: make ctx access checks more robust · 2d071c64
      Daniel Borkmann committed
      Make sure that ctx cannot potentially be accessed oob by asserting
      explicitly that ctx access size into pt_regs for BPF_PROG_TYPE_KPROBE
      programs must be within limits. In case some 32bit archs have pt_regs
      not being a multiple of 8, then BPF_DW access could cause such access.
      
      BPF_PROG_TYPE_KPROBE progs don't have a ctx conversion function since
      there's no extra mapping needed. kprobe_prog_is_valid_access() didn't
      enforce sizeof(long) as the only allowed access size, since LLVM can
      generate non BPF_W/BPF_DW access to regs from time to time.
      
      For BPF_PROG_TYPE_TRACEPOINT we don't have a ctx conversion either, so
      add a BUILD_BUG_ON() check to make sure that BPF_DW access will not be
      a similar issue in future (ctx works on event buffer as opposed to
      pt_regs there).
      
      Fixes: 2541517c ("tracing, perf: Implement BPF programs attached to kprobes")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d071c64
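The bounds check described in the commit above can be modelled in plain userspace C. This is a minimal sketch, not the kernel's code: `ctx_access_ok` and its `ctx_size` parameter (standing in for `sizeof(struct pt_regs)`) are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of a ctx access check. ctx_size stands in
 * for sizeof(struct pt_regs); both names are assumptions of this sketch.
 */
static bool ctx_access_ok(int off, int size, size_t ctx_size)
{
	/* The access must start inside the context. */
	if (off < 0 || (size_t)off >= ctx_size)
		return false;
	/* The access must be naturally aligned to its own size. */
	if (size <= 0 || off % size != 0)
		return false;
	/* The whole access, not only its first byte, must fit. */
	if ((size_t)off + (size_t)size > ctx_size)
		return false;
	return true;
}
```

With a 68-byte context (not a multiple of 8), a 4-byte read at offset 64 passes, while an 8-byte read at the same offset is rejected by the last test even though it starts in bounds and is aligned.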
    • B
      ax25: Fix segfault after sock connection timeout · 8a367e74
      Basil Gunn authored
      The ax.25 socket connection timed out and the sock struct has
      already been taken down, i.e. the sock struct is now a NULL pointer.
      Checking the sock_flag causes the segfault. Check if the socket
      struct pointer is NULL before checking sock_flag. This segfault is
      seen in timed out netrom connections.
      
      Please submit to -stable.
      Signed-off-by: Basil Gunn <basil@pacabunga.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a367e74
    • M
      ipvlan: fix dev_id creation corner case. · 019ec003
      Mahesh Bandewar authored
      In the last patch da36e13c ("ipvlan: improvise dev_id generation
      logic in IPvlan") I missed some part of Dave's suggestion and because
      of that the dev_id creation could fail in a corner case scenario. This
      happens when roughly 64k devices have already been created and several
      have been deleted, and the devices that remain occupy the last n bits
      of the bitmap. In this scenario, even if lower bits are available, the
      dev_id search range is so narrow that it always fails.
      
      Fixes: da36e13c ("ipvlan: improvise dev_id generation logic in IPvlan")
      CC: David Miller <davem@davemloft.org>
      CC: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      019ec003
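The corner case in the commit above is easiest to see with a toy ID allocator. The following is a rough sketch under stated assumptions, not the ipvlan code: the tiny ID space (`ID_MIN`, `ID_MAX`), the `used` array, and the helpers `find_free` and `alloc_id` are all hypothetical. It shows a search that starts at a moving offset and, if the upper range is exhausted, retries over the lower range instead of failing.

```c
#include <stdbool.h>

#define ID_MIN 1
#define ID_MAX 8 /* exclusive upper bound; tiny on purpose for illustration */

/* Return the first free id in [lo, hi), or -1 if none is free. */
static int find_free(const bool *used, int lo, int hi)
{
	for (int id = lo; id < hi; id++)
		if (!used[id])
			return id;
	return -1;
}

/*
 * Allocate an id, searching upward from *next_start first. If that
 * narrow range is full, fall back to the ids below *next_start.
 */
static int alloc_id(bool *used, int *next_start)
{
	int id = find_free(used, *next_start, ID_MAX);

	if (id < 0)
		id = find_free(used, ID_MIN, *next_start);
	if (id < 0)
		return -1; /* the whole id space is in use */

	used[id] = true;
	*next_start = (id + 1 < ID_MAX) ? id + 1 : ID_MIN;
	return id;
}
```

Without the second `find_free` call, an allocator whose start offset sits near the top of the range would report failure even though low ids have been freed, which is the situation the commit message describes.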
    • D
      bpf: rework prog_digest into prog_tag · f1f7714e
      Daniel Borkmann authored
      Commit 7bd509e3 ("bpf: add prog_digest and expose it via
      fdinfo/netlink") was recently discussed, partly because of the
      admittedly suboptimal name "prog_digest" in combination with
      sha1 hash usage, which inevitably and rightfully raised
      concerns about its collision resistance with regard to
      use cases.
      
      The intended use cases are debugging and introspection only:
      providing a stable "tag" over the instruction sequence that
      both kernel and user space can calculate independently.
      It's not usable at all for making a security-relevant decision.
      So collisions where two different instruction sequences generate
      the same tag can happen, but ideally at a rather low rate. The
      "tag" will be dumped in hex and is short enough to introspect
      in tracepoints or kallsyms output along with other data such
      as stack trace, etc. Thus, this patch performs a rename into
      prog_tag and truncates the tag to a short output (64 bits) to
      make it obvious it's not collision-free.
      
      Should in future a hash or facility be needed with a security
      relevant focus, then we can think about requirements, constraints,
      etc that would fit to that situation. For now, rework the exposed
      parts for the current use cases as long as nothing has been
      released yet. Tested on x86_64 and s390x.
      
      Fixes: 7bd509e3 ("bpf: add prog_digest and expose it via fdinfo/netlink")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f1f7714e
    • E
      sfc: get PIO buffer size from the NIC · c634700f
      Edward Cree authored
      The 8000 series SFC NICs have 4K PIO buffers, rather than the 2K of
       the 7000 series.  Rather than having a hard-coded PIO buffer size
       (ER_DZ_TX_PIOBUF_SIZE), read it from the GET_CAPABILITIES_V2 MCDI
       response.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c634700f
    • E
      sfc: allow PIO more often · de1deff9
      Edward Cree authored
      If an option descriptor has been sent on a queue but not followed by a
       packet, there will have been no completion event, so the read and write
       counts won't match and we'll think we can't do PIO.  This combines with
       the fact that we have two TX queues (for en/disable checksum offload),
       and that both must be empty for PIO to happen.
      This patch adds a separate "packet_write_count" that tracks the most
       recent write_count we expect to see a completion event for; this excludes
       option descriptors but _includes_ PIO descriptors (even though they look
       like option descriptors).  This is then used, rather than write_count,
       in efx_nic_tx_is_empty().
      We only bother to maintain packet_write_count on EF10, since on Siena
       (a) there are no option descriptors and it always equals write_count, and
       (b) there's no PIO, so we don't need it anyway.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de1deff9
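The counter bookkeeping described in the commit above can be illustrated with a simplified model. This is only a sketch: `struct txq_model` and its helper functions are hypothetical and ignore everything the real driver does apart from the three counters named in the message.

```c
#include <stdbool.h>

/* Simplified, hypothetical model of one TX queue's counters. */
struct txq_model {
	unsigned int write_count;        /* every descriptor written */
	unsigned int packet_write_count; /* last write_count expecting a completion */
	unsigned int read_count;         /* advanced by completion events */
};

/* Option descriptors generate no completion event on their own. */
static void push_option_desc(struct txq_model *q)
{
	q->write_count++;
}

/* Packet (and, per the commit message, PIO) descriptors do. */
static void push_packet_desc(struct txq_model *q)
{
	q->write_count++;
	q->packet_write_count = q->write_count;
}

/* Model a completion event covering everything sent so far. */
static void completion_event(struct txq_model *q)
{
	q->read_count = q->packet_write_count;
}

/* Emptiness is judged against packet_write_count, not write_count. */
static bool txq_is_empty(const struct txq_model *q)
{
	return q->read_count == q->packet_write_count;
}
```

After one completed packet followed by a lone option descriptor, `write_count` and `read_count` differ, yet `txq_is_empty()` still reports the queue as empty, which is the condition under which PIO can go ahead.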
    • M
      sctp: remove useless code from sctp_apply_peer_addr_params · cdfb1a9f
      Marcelo Ricardo Leitner authored
      sctp_frag_point() doesn't store anything, and thus just calling it
      cannot do anything useful.
      
      sctp_apply_peer_addr_params is only called by
      sctp_setsockopt_peer_addr_params. When operating on an asoc,
      sctp_setsockopt_peer_addr_params will call sctp_apply_peer_addr_params
      once for the asoc, and then once for each transport this asoc has,
      meaning that the frag_point will be recomputed when updating the
      transports and calling it when updating the asoc is not necessary.
      IOW, no action is needed here and we can remove this call.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cdfb1a9f
    • C
      flow dissector: check if arp_eth is null rather than arp · 57b68ec2
      Colin Ian King authored
      arp is being checked instead of arp_eth to see if the call to
      __skb_header_pointer failed. Fix this by checking whether arp_eth
      is null instead of arp. Also use length hlen rather than
      hlen - sizeof(_arp); thanks to Eric Dumazet for spotting
      this latter issue.
      
      CoverityScan CID#1396428 ("Logically dead code") on 2nd
      arp comparison (which should be arp_eth instead).
      
      Fixes: commit 55733350 ("flow disector: ARP support")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      57b68ec2
    • E
      netlink: do not enter direct reclaim from netlink_trim() · e89df813
      Eric Dumazet authored
      In commit d35c99ff ("netlink: do not enter direct reclaim from
      netlink_dump()") we made sure to not trigger expensive memory reclaim.
      
      Problem is that a bit later, netlink_trim() might be called and
      trigger memory reclaim.
      
      netlink_trim() should be best effort, and really as fast as possible.
      Under memory pressure, it is fine to not trim this skb.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e89df813
    • H
      cxgb4: Shutdown adapter if firmware times out or errors out · 3be0679b
      Hariprasad Shenai authored
      Perform an emergency shutdown of the adapter and stop it from
      continuing any further communication on the ports or DMA to the
      host. This is typically used when the adapter and/or firmware
      have crashed and we want to prevent any further accidental
      communication with the rest of the world. This will also force
      the port Link Status to go down -- if register writes work --
      which should help our peers figure out that we're down.
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3be0679b