1. 11 Jan, 2016 40 commits
    • G
      net: bfin_mac: Use phy_find_first() instead of open-coding it · 713d4024
      Guenter Roeck committed
      Use phy_find_first() to find the first phy device instead of
      open-coding it.
      
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      713d4024
    • D
      Merge branch 'ovs-cleanups' · 50ee6382
      David S. Miller committed
      Jean Sacren says:
      
      ====================
      Trivial fix-ups for openvswitch
      
      This series does trivial fix-ups for openvswitch as follows:
      
      1) Clean up the leftover of the unused function.
      
      2) Fix up the twisted struct geneve_port member name.
      
      3) Update the kernel doc to reflect the changes in struct vport.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      50ee6382
    • J
      openvswitch: update kernel doc for struct vport · c5420eb1
      Jean Sacren committed
      commit be4ace6e ("openvswitch: Move dev pointer into vport itself")
      
      The commit above added @dev and moved @rcu to the bottom of struct
      vport, but the change was not reflected in the kernel doc. So let's
      update the kernel doc as well.
      
      Signed-off-by: Jean Sacren <sakiwit@gmail.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5420eb1
    • J
      openvswitch: fix struct geneve_port member name · 2f7066ad
      Jean Sacren committed
      commit 6b001e68 ("openvswitch: Use Geneve device.")
      
      The commit above introduced 'port_no' as the name for the member of
      struct geneve_port. The correct name should be 'dst_port' as described
      in the kernel doc. Let's fix that member name and all the pertinent
      instances so that both doc and code are consistent.
      
      Signed-off-by: Jean Sacren <sakiwit@gmail.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2f7066ad
    • J
      openvswitch: clean up unused function · 5ea03042
      Jean Sacren committed
      commit 6b001e68 ("openvswitch: Use Geneve device.")
      
      The commit above deleted the only call site of ovs_tunnel_route_lookup()
      and now that function is not used any more. So let's delete the function
      definition as well.
      
      Signed-off-by: Jean Sacren <sakiwit@gmail.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5ea03042
    • G
      net: ti: cpmac: Fix build error due to missed API change · 3c6396d6
      Guenter Roeck committed
      Commit 7f854420 ("phy: Add API for {un}registering an mdio device to
      a bus") introduces an API to access mii_bus structures, but did not
      update the TI cpmac driver. This results in the following error message.
      
      drivers/net/ethernet/ti/cpmac.c: In function 'cpmac_probe':
      drivers/net/ethernet/ti/cpmac.c:1119:18: error:
      	'struct mii_bus' has no member named 'phy_map'
      
      Fixes: 7f854420 ("phy: Add API for {un}registering an mdio device to a bus")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c6396d6
    • G
      net: tc35815: Drop unused variable · e253e8fb
      Guenter Roeck committed
      Commit e7f4dc35 ("mdio: Move allocation of interrupts into core")
      removes some code from tc_mii_init(), but does not remove a now unused
      variable. This results in the following build warning.
      
      drivers/net/ethernet/toshiba/tc35815.c: In function 'tc_mii_init':
      drivers/net/ethernet/toshiba/tc35815.c:670:6: warning: unused variable 'i'
      
      Fixes: e7f4dc35 ("mdio: Move allocation of interrupts into core")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e253e8fb
    • G
      net: tc35815: Fix build error due to missed API change · a05876b3
      Guenter Roeck committed
      Commit 7f854420 ("phy: Add API for {un}registering an mdio device to
      a bus") introduces an API to access mii_bus structures, but did not
      update the tc35815 driver. This results in the following error message.
      
      drivers/net/ethernet/toshiba/tc35815.c: In function 'tc_mii_probe':
      drivers/net/ethernet/toshiba/tc35815.c:617:18: error:
      	'struct mii_bus' has no member named 'phy_map'
      drivers/net/ethernet/toshiba/tc35815.c:623:24: error:
      	'struct mii_bus' has no member named 'phy_map'
      
      Instead of looping over the list of phy addresses to find a phy chip,
      use phy_find_first(). While the intent of the original code was to return
      an error if more than one phy was specified, this code path was never
      executed because the loop aborted after finding the first phy. The
      original code is therefore semantically identical to phy_find_first(),
      thus it is simpler and more straightforward to use phy_find_first()
      directly.
      
      Fixes: 7f854420 ("phy: Add API for {un}registering an mdio device to a bus")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a05876b3
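
The equivalence argued in the tc35815 message above can be illustrated with a small model. This is a Python sketch, not kernel code: the bus is a hypothetical 32-entry list standing in for the MDIO address space, and `open_coded_probe` is an assumed reconstruction of the loop shape the message describes (abort at the first hit), not the actual driver source.

```python
PHY_MAX_ADDR = 32  # an MDIO bus addresses at most 32 devices (0..31)


def phy_find_first(bus):
    """Model of the helper: return the first populated slot, or None."""
    for addr in range(PHY_MAX_ADDR):
        if bus[addr] is not None:
            return bus[addr]
    return None


def open_coded_probe(bus):
    """Assumed shape of the old driver loop (hypothetical reconstruction)."""
    phydev = None
    for addr in range(PHY_MAX_ADDR):
        if bus[addr] is not None:
            if phydev is not None:
                # Intended "multiple PHYs" error. It can never trigger,
                # because the break below ends the loop at the first PHY.
                raise RuntimeError("multiple PHYs")
            phydev = bus[addr]
            break
    return phydev


def make_bus(populated):
    """Build a model bus with PHYs at the given addresses (illustrative)."""
    bus = [None] * PHY_MAX_ADDR
    for addr in populated:
        bus[addr] = f"phy@{addr}"
    return bus
```

With two PHYs on the model bus, both functions return the lower-addressed one and the error branch is never taken, which is the behaviour the commit message relies on.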
    • J
      net: phy: Add support for SMSC LAN8740 PHY · 26706d43
      Joshua Henderson committed
      LAN8740 has a different phy_id than LAN8710/LAN8720.
      
      Signed-off-by: Joshua Henderson <joshua.henderson@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26706d43
    • D
      Merge tag 'wireless-drivers-next-for-davem-2016-01-09' of... · 7d7f5d04
      David S. Miller committed
      Merge tag 'wireless-drivers-next-for-davem-2016-01-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
      
      Kalle Valo says:
      
      ====================
      brcmfmac
      
      * query features through firmware command
      * ARP offload through inet notifier
      * force probe to succeed for debugging purposes
      * random mac support for scheduled scan
      * support wowl upon net detect
      
      iwlwifi
      
      * bug fixes and improvements for firmware debug system
      * advertise support for Rx A-MSDU in A-MPDU
      * support -20.ucode
      * fix WoWLAN for iwldvm
      * preparations towards multiple Rx queues
      * platform power improvements for GO mode when no clients are associated
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d7f5d04
    • E
      net: add scheduling point in recvmmsg/sendmmsg · a78cb84c
      Eric Dumazet committed
      Applications often have to reduce the number of datagrams
      they receive or send per system call to avoid starvation problems.
      
      Really the kernel should take care of this by using cond_resched(),
      so that applications can experiment with bigger batch sizes.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a78cb84c
    • L
      ipv6: always add flag an address that failed DAD with DADFAILED · 3d171f39
      Lubomir Rintel committed
      Userspace needs to know why the address is being removed so that it can
      perhaps obtain a new address.
      
      Without the DADFAILED flag it's impossible to distinguish removal of a
      temporary and tentative address due to DAD failure from other reasons (device
      removed, manual address removal).
      
      Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3d171f39
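
A minimal sketch of how userspace might act on the flag described above. It assumes the address flags have already been read from the kernel's address-removal notification (the parsing is not shown); the flag values are those of the `IFA_F_*` constants in the UAPI header `<linux/if_addr.h>`, and `removal_reason` is a hypothetical helper name.

```python
# Flag bits as defined in <linux/if_addr.h>.
IFA_F_TEMPORARY = 0x01
IFA_F_DADFAILED = 0x08
IFA_F_TENTATIVE = 0x40


def removal_reason(flags):
    """Classify an address removal from the flags it was last reported with."""
    if flags & IFA_F_DADFAILED:
        # Duplicate Address Detection failed: a new address may be worth requesting.
        return "dad-failed"
    # Device removal, manual deletion, and so on look identical from here.
    return "other"
```

Without the flag being set by the kernel, a temporary and tentative address removed after a DAD failure would fall into the "other" branch, which is the ambiguity the commit removes.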
    • F
      net: lpc_eth: Remove unused variables · 541b8e29
      Fabio Estevam committed
      Commit e7f4dc35 ("mdio: Move allocation of interrupts into core")
      introduced the following build warnings:
      
      drivers/net/ethernet/nxp/lpc_eth.c: In function 'lpc_mii_init':
      drivers/net/ethernet/nxp/lpc_eth.c:865:1: warning: label 'err_out_1' defined but not used [-Wunused-label]
      drivers/net/ethernet/nxp/lpc_eth.c:826:20: warning: unused variable 'i' [-Wunused-variable]
      
      Remove the unused variables to fix them.
      
      Reported-by: Olof's autobuilder <build@lixom.net>
      Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      541b8e29
    • S
      bfin_mac: fix error path · fdffd2e8
      Sudip Mukherjee committed
      While building blackfin defconfig we were getting a build warning:
      warning: label 'out_err_irq_alloc' defined but not used.
      
      Commit e7f4dc35 ("mdio: Move allocation of interrupts into core")
      removed the label out_err_mdiobus_register but then mistakenly jumped to
      out_err_alloc. But it was actually supposed to jump to out_err_irq_alloc.
      
      Fixes: e7f4dc35 ("mdio: Move allocation of interrupts into core")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fdffd2e8
    • S
      phy: fix blackfin build failure · 053842a8
      Sudip Mukherjee committed
      The build of blackfin defconfig is failing with the error:
      error: 'struct mii_bus' has no member named 'phy_map'
      
      A new API mdiobus_get_phy() was introduced and phy_map was removed but
      it was not changed here.
      
      Fixes: 7f854420 ("phy: Add API for {un}registering an mdio device to a bus.")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      053842a8
    • H
      cxgb4: Fixes static checker warning in mps_tcam_show() · 89e7a154
      Hariprasad Shenai committed
      The commit 115b56af ("cxgb4: Update mps_tcam output to include T6
      fields") from Dec 23, 2015, leads to the following static checker
      warning:
      
              drivers/net/ethernet/chelsio/cxgb4/cxgb4_debugfs.c:1735 mps_tcam_show()
              warn: we tested 'lookup_type' before and it was 'true'
      
      Fix it.
      
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      89e7a154
    • D
      Merge branch 'emac-RK3036' · 1f76f731
      David S. Miller committed
      Xing Zheng says:
      
      ====================
      Add emac support for the RK3036 SoC platform
      
      We already support the emac on the RK3066/RK3188, but the RK3036
      differs from them in some of its configuration. The emac_rockchip
      driver should be made compatible with other Rockchip SoCs.
      
      Changes in v2:
      - Separate DTS from patch series.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f76f731
    • X
      net: ethernet: arc: Add support emac for RK3036 · af72261f
      Xing Zheng committed
      The RK3036's GRF offsets differ from those of the RK3066/RK3188, and the
      MAC TX/RX clock needs to be set before probing the emac.
      
      Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      af72261f
    • X
      net: ethernet: arc: Keep emac compatibility for more Rockchip SoCs · f4c9d3ee
      Xing Zheng committed
      On the RK3066/RK3188, the GRF offset used to configure the emac was fixed,
      as was the DIV2 MAC TX/RX clock. Make both easy to set so that the driver
      fits other SoCs (such as the RK3036), which may have a different GRF offset
      and need the MAC TX/RX clock adjusted.
      
      Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f4c9d3ee
    • X
      net: ethernet: arc: Probe emac after set RMII clock · c9bca2fe
      Xing Zheng committed
      After entering arc_emac_probe, the emac calls get_phy_id, phy_poll_reset and
      other routines that talk to the PHY via mdiobus_read, so we need to set the
      correct ref clock rate for the emac before probing it.
      
      Signed-off-by: Xing Zheng <zhengxing@rock-chips.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9bca2fe
    • D
      Merge branch 'bnxt_en-zeropad-fw-and-reset' · 0652cb5b
      David S. Miller committed
      Michael Chan says:
      
      ====================
      bnxt_en: Zero pad fw messages and add fw reset.
      
      2 patches related to firmware for net-next.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0652cb5b
    • R
      bnxt_en: Reset embedded processor after applying firmware upgrade · d2d6318c
      Rob Swindell committed
      Use HWRM_FW_RESET command to request a self-reset of the embedded
      processor(s) after successfully applying a firmware update. For the boot
      processor, the self-reset is currently deferred until the next PCIe reset.
      
      Signed-off-by: Rob Swindell <swindell@broadcom.com>
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2d6318c
    • M
      bnxt_en: Zero pad firmware messages to 128 bytes. · d79979a1
      Michael Chan committed
      For future compatibility, zero pad all messages that the driver sends
      to the firmware to 128 bytes.  If these messages are extended in the
      future with new byte enables, zero padding these messages now will
      guarantee future compatibility.
      
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d79979a1
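
The padding rule described above can be sketched in a few lines. This is an illustrative Python model only, not the bnxt_en implementation; `pad_fw_message` and `FW_MSG_LEN` are hypothetical names, and only the 128-byte size comes from the commit message.

```python
FW_MSG_LEN = 128  # padded length named in the commit message


def pad_fw_message(msg: bytes) -> bytes:
    """Return msg extended with trailing zero bytes up to FW_MSG_LEN."""
    if len(msg) > FW_MSG_LEN:
        raise ValueError("message longer than the padded size")
    # Bytes past the end of today's message read as zero, so fields that a
    # later firmware version adds there appear unset rather than random.
    return msg.ljust(FW_MSG_LEN, b"\x00")
```

The design point is that a receiver which later learns about extra fields sees zeros in them, instead of whatever happened to follow a shorter message in memory.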
    • D
      net, sched: add clsact qdisc · 1f211a1b
      Daniel Borkmann committed
      This work adds a generalization of the ingress qdisc as a qdisc holding
      only classifiers. The clsact qdisc works on ingress, but also on egress.
      In both cases, its execution happens without taking the qdisc lock, and
      the main difference for the egress part compared to prior version of [1]
      is that this can be applied with _any_ underlying real egress qdisc (also
      classless ones).
      
      Besides solving the use-case of [1], that is, allowing for more programmability
      on assigning skb->priority for the mqprio case that is supported by most
      popular 10G+ NICs, it also opens up a lot more flexibility for other tc
      applications. The main work on classification can already be done at clsact
      egress time if the use-case allows and state stored for later retrieval
      f.e. again in skb->priority with major/minors (which is checked by most
      classful qdiscs before consulting tc_classify()) and/or in other skb fields
      like skb->tc_index for some light-weight post-processing to get to the
      eventual classid in case of a classful qdisc. Another use case is that
      the clsact egress part allows to have a central egress counterpart to
      the ingress classifiers, so that classifiers can easily share state (e.g.
      in cls_bpf via eBPF maps) for ingress and egress.
      
      Currently, default setups like mq + pfifo_fast would require for this to
      use, for example, prio qdisc instead (to get a tc_classify() run) and to
      duplicate the egress classifier for each queue. With clsact, it allows
      for leaving the setup as is, it can additionally assign skb->priority to
      put the skb in one of pfifo_fast's bands and it can share state with maps.
      Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
      w/o the need to perform a skb_dst_force() to hold on to it any longer. In
      lwt case, we can also use this facility to setup dst metadata via cls_bpf
      (bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
      that (case of IFF_NO_QUEUE devices, for example).
      
      The realization can be done without any changes to the scheduler core
      framework. All it takes is that we have two a-priori defined minors/child
      classes, where we can mux between ingress and egress classifier list
      (dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
      dev->_tx to avoid extra cacheline miss for moderate loads). The egress
      part is a bit similar modelled to handle_ing() and patched to a noop in
      case the functionality is not used. Both handlers are now called
      sch_handle_ingress() and sch_handle_egress(), code sharing among the two
      doesn't seem practical as there are various minor differences in both
      paths, so that making them conditional in a single handler would rather
      slow things down.
      
      Full compatibility to ingress qdisc is provided as well. Since both
      piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
      per netdevice, and thus ingress qdisc specific behaviour can be retained
      for user space. This means, either a user does 'tc qdisc add dev foo ingress'
      and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
      alternative, where both, ingress and egress classifier can be configured
      as in the below example. ingress qdisc supports attaching classifier to any
      minor number whereas clsact has two fixed minors for muxing between the
      lists, therefore to not break user space setups, they are better done as
      two separate qdiscs.
      
      I decided to extend the sch_ingress module with clsact functionality so
      that commonly used code can be reused, the module is being aliased with
      sch_clsact so that it can be auto-loaded properly. Alternative would have been
      to add a flag when initializing ingress to alter its behaviour plus aliasing
      to a different name (as it's more than just ingress). However, the first would
      end up, based on the flag, choosing the new/old behaviour by calling different
      function implementations to handle each anyway, the latter would require to
      register ingress qdisc once again under different alias. So, this really begs
      to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
      by its own that share callbacks used by both.
      
      Example, adding qdisc:
      
         # tc qdisc add dev foo clsact
         # tc qdisc show dev foo
         qdisc mq 0: root
         qdisc pfifo_fast 0: parent :1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
         qdisc pfifo_fast 0: parent :2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
         qdisc pfifo_fast 0: parent :3 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
         qdisc pfifo_fast 0: parent :4 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
         qdisc clsact ffff: parent ffff:fff1
      
      Adding filters (deleting, etc works analogous by specifying ingress/egress):
      
         # tc filter add dev foo ingress bpf da obj bar.o sec ingress
         # tc filter add dev foo egress  bpf da obj bar.o sec egress
         # tc filter show dev foo ingress
         filter protocol all pref 49152 bpf
         filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
         # tc filter show dev foo egress
         filter protocol all pref 49152 bpf
         filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action
      
      A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
      show an empty list for clsact. Either using the parent names (ingress/egress)
      or specifying the full major/minor will then show the related filter lists.
      
      Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.
      
        [1] http://patchwork.ozlabs.org/patch/512949/
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f211a1b
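
The "two a-priori defined minors" mentioned in the clsact message can be made concrete with a little handle arithmetic. The Python sketch below mirrors the `TC_H_*` macros from `<linux/pkt_sched.h>` as best recalled; treat the numeric values as assumptions to check against your headers. `tc_h_make` and `fmt_handle` are illustrative helper names.

```python
TC_H_MAJ_MASK = 0xFFFF0000
TC_H_MIN_MASK = 0x0000FFFF
TC_H_CLSACT = 0xFFFFFFF1       # shared with TC_H_INGRESS (assumed value)
TC_H_MIN_INGRESS = 0xFFF2      # minor selecting the ingress classifier list
TC_H_MIN_EGRESS = 0xFFF3       # minor selecting the egress classifier list


def tc_h_make(major, minor):
    """Combine the major half of one handle with the minor half of another."""
    return (major & TC_H_MAJ_MASK) | (minor & TC_H_MIN_MASK)


def fmt_handle(handle):
    """Render a 32-bit handle in tc's major:minor hexadecimal notation."""
    return f"{handle >> 16:x}:{handle & TC_H_MIN_MASK:x}"


ingress_parent = tc_h_make(TC_H_CLSACT, TC_H_MIN_INGRESS)
egress_parent = tc_h_make(TC_H_CLSACT, TC_H_MIN_EGRESS)
```

Under these assumptions, `fmt_handle(TC_H_CLSACT)` gives the `ffff:fff1` parent seen in the `tc qdisc show` output above, and the two filter parents are the "full major/minor" forms that the `ingress` and `egress` keywords stand for.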
    • A
      ethernet: amd: au1000: Remove pointless warning · ede55997
      Andrew Lunn committed
      The warning about being able to read any MDIO device, not just the
      attached Ethernet device's PHY, applies to all MDIO drivers. So remove
      it. This also removes a reference to a member in phy_device which has
      moved.
      
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ede55997
    • A
      staging: netlogic: Fix build error due to missed API change · 3fe01e24
      Andrew Lunn committed
      Fix a number of build errors due to moving the phy_map and centralizing
      interrupt allocation.
      
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3fe01e24
    • G
      net: ethernet: faraday: Use phy_find_first() instead of open coding it · e574f398
      Guenter Roeck committed
      Use phy_find_first() to find the first phy device instead of
      open coding it.
      
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e574f398
    • G
      net: ethernet: broadcom: Fix build errors · ee64f08e
      Guenter Roeck committed
      Commit 7f854420 ("phy: Add API for {un}registering an mdio device to
      a bus") introduces an API to access mii_bus structures, but did not
      update the sb1250 driver. This results in the following build error.
      
      drivers/net/ethernet/broadcom/sb1250-mac.c: In function 'sbmac_mii_probe':
      drivers/net/ethernet/broadcom/sb1250-mac.c:2360:24: error:
      	'struct mii_bus' has no member named 'phy_map'
      
      Use phy_find_first() instead of open coding it.
      
      Commit 2220943a ("phy: Centralise print about attached phy") introduces
      the following build error.
      
      drivers/net/ethernet/broadcom/sb1250-mac.c: In function 'sbmac_mii_probe':
      drivers/net/ethernet/broadcom/sb1250-mac.c:2383:20: error: 'phydev' undeclared
      
      Fixes: 7f854420 ("phy: Add API for {un}registering an mdio device to a bus")
      Fixes: 2220943a ("phy: Centralise print about attached phy")
      Cc: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee64f08e
    • D
      Merge branch 'mdio-device-fixes' · 5c721d56
      David S. Miller committed
      Andrew Lunn says:
      
      ====================
      Fix breakage from mdio device
      
      These two patches fix MIPS platforms which got broken by
      the recent mdio device patchset.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5c721d56
    • A
      net: ethernet-rgmii.c: Fix breakage from moving phdev bus · 0c129bf7
      Andrew Lunn committed
      The mdio device patches moved the bus member in phy_device into a
      substructure. This driver got missed. Fix it.
      
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c129bf7
    • A
      net: lantiq_etop.c: Use helper to find first phy · 2a4fc4ea
      Andrew Lunn committed
      Make use of the helper to find the first phy device.
      This also fixes the compile breakage.
      
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a4fc4ea
    • R
      stmmac: Don't exit mdio registration when mdio subnode is not found in the DTS · 6c672c9b
      Romain Perier committed
      Originally, most of the platforms using this driver did not define an mdio subnode
      in the devicetree. Commit e34d65 ("stmmac: create of compatible mdio bus for stmmac driver")
      introduced a backward compatibility issue by using of_mdiobus_register explicitly
      with an mdio subnode. This patch fixes the issue by calling
      mdiobus_register when the mdio subnode is not found. The driver is now compatible
      with both modes.
      
      Fixes: e34d6569 ("stmmac: create of compatible mdio bus for stmmac driver")
      Signed-off-by: Romain Perier <romain.perier@gmail.com>
      Tested-by: Phil Reid <preid@electromag.com.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6c672c9b
    • D
      Merge branch 'bpf-next' · 749f7df1
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      BPF update
      
      Fixes a csum issue on ingress. As mentioned previously, net-next
      seems just fine imho. Later on, will follow up with couple of
      replacements like ovs_skb_postpush_rcsum() etc.
      
      Thanks!
      
      v1 -> v2:
        - Added patch 1 with helper
        - Implemented Hannes' idea to just use csum_partial, thanks!
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      749f7df1
    • D
      bpf: add skb_postpush_rcsum and fix dev_forward_skb occasions · f8ffad69
      Daniel Borkmann committed
      Add a small helper skb_postpush_rcsum() and fix up redirect locations
      that need CHECKSUM_COMPLETE fixups on ingress. dev_forward_skb() expects
      a proper csum that covers also Ethernet header, f.e. since 2c26d34b
      ("net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding"), we
      also do skb_postpull_rcsum() after pulling Ethernet header off via
      eth_type_trans().
      
      When using eBPF in a netns setup f.e. with vxlan in collect metadata mode,
      I can trigger the following csum issue with an IPv6 setup:
      
        [  505.144065] dummy1: hw csum failure
        [...]
        [  505.144108] Call Trace:
        [  505.144112]  <IRQ>  [<ffffffff81372f08>] dump_stack+0x44/0x5c
        [  505.144134]  [<ffffffff81607cea>] netdev_rx_csum_fault+0x3a/0x40
        [  505.144142]  [<ffffffff815fee3f>] __skb_checksum_complete+0xcf/0xe0
        [  505.144149]  [<ffffffff816f0902>] nf_ip6_checksum+0xb2/0x120
        [  505.144161]  [<ffffffffa08c0e0e>] icmpv6_error+0x17e/0x328 [nf_conntrack_ipv6]
        [  505.144170]  [<ffffffffa0898eca>] ? ip6t_do_table+0x2fa/0x645 [ip6_tables]
        [  505.144177]  [<ffffffffa08c0725>] ? ipv6_get_l4proto+0x65/0xd0 [nf_conntrack_ipv6]
        [  505.144189]  [<ffffffffa06c9a12>] nf_conntrack_in+0xc2/0x5a0 [nf_conntrack]
        [  505.144196]  [<ffffffffa08c039c>] ipv6_conntrack_in+0x1c/0x20 [nf_conntrack_ipv6]
        [  505.144204]  [<ffffffff8164385d>] nf_iterate+0x5d/0x70
        [  505.144210]  [<ffffffff816438d6>] nf_hook_slow+0x66/0xc0
        [  505.144218]  [<ffffffff816bd302>] ipv6_rcv+0x3f2/0x4f0
        [  505.144225]  [<ffffffff816bca40>] ? ip6_make_skb+0x1b0/0x1b0
        [  505.144232]  [<ffffffff8160b77b>] __netif_receive_skb_core+0x36b/0x9a0
        [  505.144239]  [<ffffffff8160bdc8>] ? __netif_receive_skb+0x18/0x60
        [  505.144245]  [<ffffffff8160bdc8>] __netif_receive_skb+0x18/0x60
        [  505.144252]  [<ffffffff8160ccff>] process_backlog+0x9f/0x140
        [  505.144259]  [<ffffffff8160c4a5>] net_rx_action+0x145/0x320
        [...]
      
      What happens is that on ingress, we push Ethernet header back in, either
      from cls_bpf or right before skb_do_redirect(), but without updating csum.
      The "hw csum failure" can be fixed by using the new skb_postpush_rcsum()
      helper for the dev_forward_skb() case to correct the csum diff again.
      
      Thanks to Hannes Frederic Sowa for the csum_partial() idea!
      
      Fixes: 3896d655 ("bpf: introduce bpf_clone_redirect() helper")
      Fixes: 27b29f63 ("bpf: add bpf_redirect() helper")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8ffad69
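
The checksum fixup described above relies on the Internet checksum being a ones'-complement sum: pulling a header subtracts its contribution, and pushing it back adds it again. The Python model below is a sketch of that arithmetic only, not the kernel helpers. The function names echo the kernel ones for readability, the byte strings in the usage note are made up, and the model assumes even-length segments (as with a 14-byte Ethernet header).

```python
def csum_fold(total):
    """Fold carries back into 16 bits (ones'-complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum_partial(data, csum=0):
    """Add the 16-bit big-endian words of data onto a running checksum."""
    if len(data) % 2:
        data += b"\x00"
    for i in range(0, len(data), 2):
        csum += (data[i] << 8) | data[i + 1]
    return csum_fold(csum)


def postpull_rcsum(csum, pulled):
    """Remove a pulled header's contribution (ones'-complement subtraction)."""
    return csum_fold(csum + (~csum_partial(pulled) & 0xFFFF))


def postpush_rcsum(csum, pushed):
    """Add a pushed-back header's contribution, as the new helper is said to do."""
    return csum_partial(pushed, csum)
```

For example, with `eth = bytes(range(14))` and `payload = bytes(range(20, 60))`, pulling `eth` from the checksum of `eth + payload` yields the checksum of `payload` alone, and pushing `eth` back restores the original value. Skipping the push step leaves a checksum that no longer covers the bytes actually in the packet, which is the mismatch reported as "hw csum failure".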
    • D
      net, sched: add skb_at_tc_ingress helper · fdc5432a
      Daniel Borkmann committed
      Add a skb_at_tc_ingress() as this will be needed elsewhere as well and
      can hide the ugly ifdef.
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fdc5432a
    • D
      Merge branch 'tcp-keepalive-namespaceify' · 4156afaf
      David S. Miller committed
      Nikolay Borisov says:
      
      ====================
      Namespaceify tcp keepalive machinery
      
      The following patch series enables the tcp keepalive mechanism
      to be configured per net namespace. This is especially useful
      if you have multiple containers hosted on one node and one of
      them is under DoS. In such situations, one thing which could
      be done is to configure the tcp keepalive settings such that
      connections for that particular container are reset
      faster.
      
      Another scenario where not being able to control those knobs
      per container is problematic occurs when the value of
      net.netfilter.nf_conntrack_tcp_timeout_established is set
      below the keepalive interval. In such situations the server won't
      send an RST packet, resulting in applications not trying to
      reconnect and stale connections lingering. Changing the global
      keepalive value is a possible solution but it might interfere
      with other containers.
      
      The three patches gradually convert each of the affected knobs
      to be per netns. I thought it would be easier for review than
      put everything in one patch. If people deem it more appropriate
      to squash everything in one patch (maybe after review) I'd
      be more than happy to do it.
      
      The patches have been compile-tested on 4.4 and functionally
      tested on 3.12 and they work as expected.
      
      These are based off 4.4-rc8
      ====================
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4156afaf
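
To see why per-namespace knobs matter for the scenario above, it helps to work out how long a dead peer goes unnoticed. The sketch below is plain arithmetic in Python: the first call uses the documented kernel defaults for `tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` (7200 s, 75 s, 9), while the second set of values is a made-up example of what a single container might be given.

```python
def keepalive_detection_seconds(time_s, intvl_s, probes):
    """Idle time before the first probe plus the span of all unanswered probes."""
    return time_s + intvl_s * probes


# Kernel defaults, applied host-wide before this series.
host = keepalive_detection_seconds(7200, 75, 9)

# Hypothetical tighter settings for one container's namespace.
container = keepalive_detection_seconds(600, 10, 3)
```

With the defaults, a dead connection lingers for a little over two hours; the hypothetical container settings bring that down to about ten minutes without touching any other namespace.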
    • N
      ipv4: Namespecify the tcp_keepalive_intvl sysctl knob · b840d15d
      Nikolay Borisov committed
      This is the final part required to namespaceify the tcp
      keep alive mechanism.
      
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b840d15d
    • N
      ipv4: Namespecify tcp_keepalive_probes sysctl knob · 9bd6861b
      Nikolay Borisov committed
      This is required to have full tcp keepalive mechanism namespace
      support.
      
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9bd6861b
    • N
      ipv4: Namespaceify tcp_keepalive_time sysctl knob · 13b287e8
      Nikolay Borisov committed
      Different net namespaces might have different requirements as to
      the keepalive time of tcp sockets. This might be required in cases
      where different firewall rules are in place which require tcp
      timeout sockets to be increased/decreased independently of the host.
      
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      13b287e8
    • D
      Merge branch 'mlxsw-layer2-multicast' · d3517f19
      David S. Miller committed
      Jiri Pirko says:
      
      ====================
      mlxsw: Adding layer 2 multicast
      
      Elad says:
      
      This patchset adds Linux hardware reflection for L2 multicast offload and adds
      MC support in mlxsw. For every bridge MDB entry insertion, either by IGMP
      snooping or by static insertion/removal, a switchdev op is called.
      In mlxsw, a new multicast group (MID) is created and ports are assigned.
      When all ports are removed, the multicast group is deleted.
      
      ---
      v1->v2:
      - GFP_ATOMIC->GFP_KERNEL change in patch 7/8
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3517f19