1. 14 May, 2020 1 commit
    • net: phy: broadcom: add cable test support · 11ecf8c5
      Committed by Michael Walle
      Most modern broadcom PHYs support ECD (enhanced cable diagnostics). Add
      support for it in the bcm-phy-lib so they can easily be used in the PHY
      driver.
      
      There are two access methods for ECD: legacy by expansion registers and
      via the new RDB registers which are exclusive. Provide functions in two
      variants that the PHY driver can choose from. To keep things simple for
      now, we just switch the register access to expansion registers in the
      RDB variant. On the flipside, we have to keep a bus lock to
      prevent any other non-legacy access on the PHY.
      
      The results of the intra-pair tests are inconclusive (at least for the
      BCM54140). Most of the time half the length is reported but sometimes
      the length is correct.
      Signed-off-by: Michael Walle <michael@walle.cc>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 13 May, 2020 5 commits
    • net: dsa: tag_sja1105: implement sub-VLAN decoding · 84eeb5d4
      Committed by Vladimir Oltean
      Create a subvlan_map as part of each port's tagger private structure.
      This keeps reverse mappings of bridge-to-dsa_8021q VLAN retagging rules.
      
      Note that as of this patch, this piece of code is never engaged, due to
      the fact that the driver hasn't installed any retagging rule, so we'll
      always see packets with a subvlan code of 0 (untagged).
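
A minimal sketch of what such a per-port reverse map could look like, written as plain user-space C. The struct, the function names and the use of 0 as the "no rule installed" marker are assumptions made for illustration; they are not taken from the driver.

```c
#include <stdint.h>

#define EXAMPLE_NUM_SUBVLANS 8 /* sub-VLAN codes 0..7 */
#define EXAMPLE_VID_NONE     0 /* assumed marker: no retagging rule installed */

/* Hypothetical per-port tagger state; not the driver's actual layout. */
struct example_port_tagger {
	uint16_t subvlan_map[EXAMPLE_NUM_SUBVLANS];
};

/* Remember that sub-VLAN code `subvlan` stands for bridge VLAN `vid`. */
static int example_subvlan_set(struct example_port_tagger *t,
			       unsigned int subvlan, uint16_t vid)
{
	/* Code 0 is reserved for untagged traffic and cannot be mapped. */
	if (subvlan == 0 || subvlan >= EXAMPLE_NUM_SUBVLANS)
		return -1;
	t->subvlan_map[subvlan] = vid;
	return 0;
}

/* Return the bridge VID behind a received sub-VLAN code, if any. */
static uint16_t example_subvlan_decode(const struct example_port_tagger *t,
				       unsigned int subvlan)
{
	if (subvlan == 0 || subvlan >= EXAMPLE_NUM_SUBVLANS)
		return EXAMPLE_VID_NONE;
	return t->subvlan_map[subvlan];
}
```

With an all-zero table, every lookup returns EXAMPLE_VID_NONE, which matches the situation described above where no retagging rule has been installed yet.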
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs · 3eaae1d0
      Committed by Vladimir Oltean
      For switches that support VLAN retagging, such as sja1105, we extend
      dsa_8021q by encoding a "sub-VLAN" into the remaining 3 free bits in the
      dsa_8021q tag.
      
      A sub-VLAN is nothing more than a number in the range 0-7, which serves
      as an index into a per-port driver lookup table. The sub-VLAN value of
      zero means that traffic is untagged (this is also backwards-compatible
      with dsa_8021q without retagging).
      
      The switch should be configured to retag VLAN-tagged traffic that gets
      transmitted towards the CPU port (and towards the CPU only). Example:
      
      bridge vlan add dev sw1p0 vid 100
      
      The switch retags frames received on port 0, going to the CPU, and
      having VID 100, to the VID of 1104 (0x0450). In dsa_8021q language:
      
       | 11  | 10  |  9  |  8  |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
       +-----------+-----+-----------------+-----------+-----------------------+
       |    DIR    | SVL |    SWITCH_ID    |  SUBVLAN  |          PORT         |
       +-----------+-----+-----------------+-----------+-----------------------+
      
      0x0450 means:
       - DIR = 0b01: this is an RX VLAN
       - SUBVLAN = 0b001: this is subvlan #1
       - SWITCH_ID = 0b001: this is switch 1 (see the name "sw1p0")
       - PORT = 0b0000: this is port 0 (see the name "sw1p0")
      
      The driver also remembers the "1 -> 100" mapping. In the hotpath, if the
      sub-VLAN from the tag encodes a non-untagged frame, this mapping is used
      to create a VLAN hwaccel tag, with the value of 100.
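
The field extraction for the 0x0450 example can be sketched in user-space C as follows. The shifts and widths are read off the bit diagram above; the function names are illustrative. Treating the single SVL bit (bit 9) as the high bit of the 3-bit sub-VLAN, with bits 5:4 as the low two bits, is an assumption based on the diagram and on the statement that the sub-VLAN occupies 3 bits.

```c
#include <stdint.h>

/* DIR: bits 11:10 */
static unsigned int example_vid_dir(uint16_t vid)
{
	return (vid >> 10) & 0x3u;
}

/* SWITCH_ID: bits 8:6 */
static unsigned int example_vid_switch_id(uint16_t vid)
{
	return (vid >> 6) & 0x7u;
}

/* Sub-VLAN: bit 9 (assumed high bit) combined with bits 5:4 (low bits) */
static unsigned int example_vid_subvlan(uint16_t vid)
{
	unsigned int hi = (vid >> 9) & 0x1u;
	unsigned int lo = (vid >> 4) & 0x3u;

	return (hi << 2) | lo;
}

/* PORT: bits 3:0 */
static unsigned int example_vid_port(uint16_t vid)
{
	return vid & 0xfu;
}
```

Applied to VID 1104 (0x0450), these helpers yield DIR 1, switch 1, sub-VLAN 1 and port 0, which agrees with the breakdown given in the commit message.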
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously · 38b5beea
      Committed by Vladimir Oltean
      In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of
      0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering
      mode, it needs to work with normal VLAN TPID values.
      
      A complication arises when we must transmit a VLAN-tagged packet to the
      switch when it's in VLAN-aware mode. We need to construct a packet with
      2 VLAN tags, and the switch will use the outer header for routing and
      pop it on egress. But sadly, here the 2 hardware generations don't
      behave the same:
      
      - E/T switches won't pop an ETH_P_8021AD tag on egress, it seems
        (packets will remain double-tagged).
      - P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks
        like it tries to prevent VLAN hopping).
      
      But it looks like the reverse is also true:
      
      - E/T switches have no problem popping the outer tag from packets with
        2 ETH_P_8021Q tags.
      - P/Q/R/S will have no problem popping a single tag even if that is
        ETH_P_8021AD.
      
      So it is clear that if we want the hardware to work with dsa_8021q
      tagging in VLAN-aware mode, we need to send different TPIDs depending on
      revision. Keep that information in priv->info->qinq_tpid.
      
      The per-port tagger structure will hold an xmit_tpid value that depends
      not only upon the qinq_tpid, but also upon the VLAN awareness state
      itself (in case we must transmit using 0xdadb).
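
The TPID selection described above can be sketched as a small decision function. This is an illustration in user-space C, not the driver code: the enum and function names are hypothetical, while the numeric values are the standard ones for ETH_P_8021Q (0x8100) and ETH_P_8021AD (0x88A8) plus the custom 0xdadb mentioned in the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_TPID_8021Q   0x8100 /* ETH_P_8021Q */
#define EXAMPLE_TPID_8021AD  0x88A8 /* ETH_P_8021AD */
#define EXAMPLE_TPID_SJA1105 0xDADB /* custom TPID for VLAN-unaware mode */

/* Hypothetical label for the two hardware generations. */
enum example_generation {
	EXAMPLE_GEN_ET,   /* pops the outer of two 802.1Q tags */
	EXAMPLE_GEN_PQRS, /* drops double 802.1Q, pops a single 802.1ad tag */
};

/* Outer TPID each generation tolerates in VLAN-aware mode. */
static uint16_t example_qinq_tpid(enum example_generation gen)
{
	return gen == EXAMPLE_GEN_ET ? EXAMPLE_TPID_8021Q
				     : EXAMPLE_TPID_8021AD;
}

/* TPID to transmit with: depends on generation and VLAN awareness. */
static uint16_t example_xmit_tpid(enum example_generation gen, bool vlan_aware)
{
	return vlan_aware ? example_qinq_tpid(gen) : EXAMPLE_TPID_SJA1105;
}
```

Keeping the per-generation choice in one function and the awareness check in another mirrors the split between qinq_tpid and xmit_tpid described in the message.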
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: sja1105: save/restore VLANs using a delta commit method · ec5ae610
      Committed by Vladimir Oltean
      Managing the VLAN table that is present in hardware will become very
      difficult once we add a third operating state
      (best_effort_vlan_filtering). That is because correct cleanup (not too
      little, not too much) becomes virtually impossible, when VLANs can be
      added from the bridge layer, from dsa_8021q for basic tagging, for
      cross-chip bridging, as well as retagging rules for sub-VLANs and
      cross-chip sub-VLANs. So we need to rethink VLAN interaction with the
      switch in a more scalable way.
      
      In preparation for that, use the priv->expect_dsa_8021q boolean to
      classify any VLAN request received through .port_vlan_add or
      .port_vlan_del towards either one of 2 internal lists: bridge VLANs and
      dsa_8021q VLANs.
      
      Then, implement a central sja1105_build_vlan_table method that creates a
      VLAN configuration from scratch based on the 2 lists of VLANs kept by
      the driver, and based on the VLAN awareness state. Currently, if we are
      VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs.
      
      Then, implement a delta commit procedure that identifies which VLANs
      from this new configuration are actually different from the config
      previously committed to hardware. We apply the delta through the dynamic
      configuration interface (we don't reset the switch). The result is that
      the hardware should see the exact sequence of operations as before this
      patch.
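
The delta commit idea can be illustrated with a simplified model in which each VID maps to a port-membership bitmask. Everything below (table layout, callback type, function name) is a hypothetical sketch and is much simpler than the real driver, which tracks more per-VLAN state.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_VID 4096

/* Hypothetical table: one port bitmask per VID (0 means VLAN absent). */
struct example_vlan_table {
	uint8_t port_mask[EXAMPLE_MAX_VID];
};

/* Hypothetical stand-in for one dynamic-configuration write. */
typedef void (*example_commit_fn)(uint16_t vid, uint8_t port_mask, void *ctx);

/*
 * Push only the entries of `want` that differ from `hw`, then update `hw`
 * so it mirrors what was committed. Returns the number of entries written.
 */
static size_t example_commit_delta(struct example_vlan_table *hw,
				   const struct example_vlan_table *want,
				   example_commit_fn commit, void *ctx)
{
	size_t changed = 0;
	unsigned int vid;

	for (vid = 0; vid < EXAMPLE_MAX_VID; vid++) {
		if (hw->port_mask[vid] == want->port_mask[vid])
			continue; /* unchanged: no hardware access needed */
		if (commit)
			commit((uint16_t)vid, want->port_mask[vid], ctx);
		hw->port_mask[vid] = want->port_mask[vid];
		changed++;
	}
	return changed;
}
```

Committing the same desired table twice in a row writes nothing the second time, which is the property that lets the table be rebuilt from scratch without resetting the switch.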
      
      This also helps remove the "br" argument passed to
      dsa_8021q_crosschip_bridge_join, which it was only using to figure out
      whether it should commit the configuration back to us or not, based on
      the VLAN awareness state of the bridge. We can simplify that, by always
      allowing those VLANs inside of our dsa_8021q_vlans list, and committing
      those to hardware when necessary.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_8021q: introduce a vid_is_dsa_8021q helper · 1f66b0f0
      Committed by Vladimir Oltean
      This function returns a boolean denoting whether the VLAN passed as
      argument is part of the 1024-3071 range that the dsa_8021q tagging
      scheme uses.
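
A sketch of such a range check in user-space C. The function names are illustrative and not the kernel helper itself. The second variant expresses the same test on bits 11:10 of a 12-bit VID, since 1024-2047 and 2048-3071 are exactly the values whose top two bits are 0b01 and 0b10.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct comparison against the 1024-3071 range. */
static bool example_vid_is_dsa_8021q(uint16_t vid)
{
	return vid >= 1024 && vid <= 3071;
}

/* Equivalent check on bits 11:10, valid for 12-bit VIDs (0..4095). */
static bool example_vid_is_dsa_8021q_bits(uint16_t vid)
{
	unsigned int top = (vid >> 10) & 0x3u;

	return top == 1 || top == 2;
}
```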
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 12 May, 2020 3 commits
    • net: cleanly handle kernel vs user buffers for ->msg_control · 1f466e1f
      Committed by Christoph Hellwig
      The msg_control field in struct msghdr can either contain a user
      pointer when used with the recvmsg system call, or a kernel pointer
      when used with sendmsg.  To complicate things further kernel_recvmsg
      can stuff a kernel pointer in and then use set_fs to make the uaccess
      helpers accept it.
      
      Replace it with a union of a kernel pointer msg_control field, and
      a user pointer msg_control_user one, and allow kernel_recvmsg to operate
      on a proper kernel pointer using a bitfield to override the normal
      choice of a user pointer for recvmsg.
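
The shape of that change can be sketched outside the kernel as a union plus a one-bit flag. The struct and helper names below are hypothetical, the msg_control_is_user flag name is an assumption, and in the kernel the user-side pointer would carry a __user annotation that plain C does not have.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the relevant part of struct msghdr. */
struct example_msghdr {
	union {
		void *msg_control;      /* kernel pointer */
		void *msg_control_user; /* user pointer (__user in the kernel) */
	};
	bool msg_control_is_user : 1;   /* which union member is valid */
	size_t msg_controllen;
};

/* Attach a kernel buffer, as a kernel_recvmsg-style caller would. */
static void example_set_kernel_control(struct example_msghdr *m,
				       void *buf, size_t len)
{
	m->msg_control = buf;
	m->msg_control_is_user = false;
	m->msg_controllen = len;
}

/* Attach a user buffer, as the recvmsg system call path would. */
static void example_set_user_control(struct example_msghdr *m,
				     void *ubuf, size_t len)
{
	m->msg_control_user = ubuf;
	m->msg_control_is_user = true;
	m->msg_controllen = len;
}
```

Because the flag travels with the pointer, code that copies control data can pick the right copy routine from the header alone instead of relying on a global address-limit override.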
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add a CMSG_USER_DATA macro · 0462b6bd
      Committed by Christoph Hellwig
      Add a variant of CMSG_DATA that operates on user pointer to avoid
      sparse warnings about casting to/from user pointers.  Also fix up
      CMSG_DATA to rely on the gcc extension that allows void pointer
      arithmetics to cut down on the amount of casts.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • team: Replace zero-length array with flexible-array · 9c8255c8
      Committed by Gustavo A. R. Silva
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get completely rid of those sorts of issues.
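
As a small user-space illustration of the point about allocations, the sketch below declares a C99 flexible array member and sizes the allocation from the struct header plus the element count. The struct and function names are made up for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* The flexible array member must be the last member of the struct. */
struct example_record {
	size_t count;
	int items[]; /* C99 flexible array member, no storage of its own */
};

/* Allocate a record with room for `count` zero-initialised items. */
static struct example_record *example_record_alloc(size_t count)
{
	struct example_record *r;

	/* Header size plus count elements; same formula as with items[0]. */
	r = calloc(1, sizeof(*r) + count * sizeof(r->items[0]));
	if (!r)
		return NULL;
	r->count = count;
	return r;
}
```

Note that sizeof(r->items[0]) is the size of one element and is well defined; it is sizeof applied to the array member itself that the compiler rejects.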
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 11 May, 2020 5 commits
    • net: dsa: sja1105: implement cross-chip bridging operations · ac02a451
      Committed by Vladimir Oltean
      sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart
      and which is compatible with cascading. A complete description of this
      tagging format is in net/dsa/tag_8021q.c, but a quick summary is that
      each external-facing port tags incoming frames with a unique pvid, and
      this special VLAN is transmitted as tagged towards the inside of the
      system, and as untagged towards the exterior. The tag encodes the switch
      id and the source port index.
      
      This means that cross-chip bridging for dsa_8021q only entails adding
      the dsa_8021q pvids of one switch to the RX filter of the other
      switches. Everything else falls naturally into place, as long as the
      bottom-end of ports (the leaves in the tree) is comprised exclusively of
      dsa_8021q-compatible switches (i.e. sja1105). Otherwise, there would be
      a chance that a front-panel switch transmits a packet tagged with a
      dsa_8021q header, header which it wouldn't be able to remove, and which
      would hence "leak" out.
      
      The only use case I tested (due to lack of board availability) was when
      the sja1105 switches are part of disjoint trees (however, this doesn't
      change the fact that multiple sja1105 switches still need unique switch
      identifiers in such a system). But in principle, even "true" single-tree
      setups (with DSA links) should work just as well, except for a small
      change which I can't test: dsa_towards_port should be used instead of
      dsa_upstream_port (I made the assumption that the routing port that any
      sja1105 should use towards its neighbours is the CPU port. That might
      not hold true in other setups).
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethtool: Add helpers for reporting test results · 1e2dc145
      Committed by Andrew Lunn
      The PHY drivers can use these helpers for reporting the results. The
      results get translated into netlink attributes which are added to the
      pre-allocated skbuf.
      
      v3:
      Poison phydev->skb
      Return -EMSGSIZE when ethnl_bcastmsg_put() fails
      Return valid error code when nla_nest_start() fails
      Use u8 for results
      Actually put u32 length into message
      
      v4:
      s/ENOTSUPP/EOPNOTSUPP/g
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: ethtool: Add infrastructure for reporting cable test results · 1dd3f212
      Committed by Andrew Lunn
      Provide infrastructure for PHY drivers to report the cable test
      results.  A netlink skb is associated to the phydev. Helpers will be
      added which can add results to this skb. Once the test has finished
      the results are sent to user space.
      
      When netlink ethtool is not part of the kernel configuration stubs are
      provided. It is also impossible to trigger a cable test, so the error
      code returned by the alloc function is of no consequence.
      
      v2:
      Include the status complete in the netlink notification message
      
      v4:
      Replace -EINVAL with -EMSGSIZE
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: Add support for polling cable test · 97c22438
      Committed by Andrew Lunn
      Some PHYs are not capable of generating interrupts when a cable test
      has finished. They do however support interrupts for normal operations,
      like link up/down. As such, the PHY state machine would normally not
      poll the PHY.
      
      Add support for indicating the PHY state machine must poll the PHY
      when performing a cable test.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: phy: Add cable test support to state machine · a68a8138
      Committed by Andrew Lunn
      Running a cable test is disruptive to normal operation of the PHY and
      can take 5 to 10 seconds to complete. The RTNL lock cannot be held
      for this amount of time, so add a new state to the state machine for
      running a cable test.
      
      The driver is expected to implement two functions. The first is used
      to start a cable test. Once the test has started, it should return.
      
      The second function is called once per second, or on interrupt to
      check if the cable test is complete, and to allow the PHY to report
      the status.
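
The two-callback contract can be modelled in user-space C as shown below. All type and function names are hypothetical, the "hardware" is simulated by a countdown, and the once-per-second wait (or interrupt) of a real state machine is only indicated by a comment.

```c
#include <stdbool.h>

struct example_phy;

/* Hypothetical driver callbacks: start the test, then report its status. */
struct example_phy_ops {
	int (*cable_test_start)(struct example_phy *phy);
	int (*cable_test_get_status)(struct example_phy *phy, bool *finished);
};

struct example_phy {
	const struct example_phy_ops *ops;
	int polls_until_done; /* simulated hardware progress */
};

/* Returns the number of status polls needed, or a negative error. */
static int example_run_cable_test(struct example_phy *phy, int max_polls)
{
	bool finished = false;
	int polls = 0;
	int err;

	err = phy->ops->cable_test_start(phy); /* must return promptly */
	if (err)
		return err;

	while (!finished) {
		if (polls >= max_polls)
			return -1; /* give up: test never completed */
		/* A real state machine would sleep ~1 s or wait for an IRQ. */
		err = phy->ops->cable_test_get_status(phy, &finished);
		if (err)
			return err;
		polls++;
	}
	return polls;
}

/* Simulated PHY whose test completes on the third status poll. */
static int example_sim_start(struct example_phy *phy)
{
	phy->polls_until_done = 3;
	return 0;
}

static int example_sim_get_status(struct example_phy *phy, bool *finished)
{
	*finished = (--phy->polls_until_done <= 0);
	return 0;
}

static const struct example_phy_ops example_sim_ops = {
	.cable_test_start = example_sim_start,
	.cable_test_get_status = example_sim_get_status,
};
```

Splitting start from status is what allows the caller to drop long-held locks between polls instead of blocking for the whole test.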
      
      v2:
      Rename phy_cable_test_abort to phy_abort_cable_test
      Return different extack when already running test
      Use phy_init_hw() to reset the PHY
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. 10 May, 2020 1 commit
    • IB/mlx4: Replace zero-length array with flexible-array · e7bb7ece
      Committed by Gustavo A. R. Silva
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get completely rid of those sorts of issues.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 08 May, 2020 4 commits
  7. 07 May, 2020 4 commits
    • ethtool: provide UAPI for PHY master/slave configuration. · bdbdac76
      Committed by Oleksij Rempel
      This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of
      auto-negotiation support, we needed to be able to configure the
      MASTER-SLAVE role of the port manually or from an application in user
      space.
      
      The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to
      force MASTER or SLAVE role. See IEEE 802.3-2018:
      22.2.4.3.7 MASTER-SLAVE control register (Register 9)
      22.2.4.3.8 MASTER-SLAVE status register (Register 10)
      40.5.2 MASTER-SLAVE configuration resolution
      45.2.1.185.1 MASTER-SLAVE config value (1.2100.14)
      45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32)
      
      The MASTER-SLAVE role affects the clock configuration:
      
      -------------------------------------------------------------------------------
      When the  PHY is configured as MASTER, the PMA Transmit function shall
      source TX_TCLK from a local clock source. When configured as SLAVE, the
      PMA Transmit function shall source TX_TCLK from the clock recovered from
      data stream provided by MASTER.
      
      iMX6Q                     KSZ9031                XXX
      ------\                /-----------\        /------------\
            |                |           |        |            |
       MAC  |<----RGMII----->| PHY Slave |<------>| PHY Master |
            |<--- 125 MHz ---+-<------/  |        | \          |
      ------/                \-----------/        \------------/
                                                     ^
                                                      \-TX_TCLK
      
      -------------------------------------------------------------------------------
      
      Since some clock or link related issues are only reproducible in a
      specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial
      to provide a generic (not 100BASE-T1 specific) interface to the user space
      for configuration flexibility and troubleshooting.
      Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stricter validation of untrusted gso packets · 9274124f
      Committed by Willem de Bruijn
      Syzkaller again found a path to a kernel crash through bad gso input:
      a packet with transport header extending beyond skb_headlen(skb).
      
      Tighten validation at kernel entry:
      
      - Verify that the transport header lies within the linear section.
      
          To avoid pulling linux/tcp.h, verify just sizeof tcphdr.
          tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use.
      
      - Match the gso_type against the ip_proto found by the flow dissector.
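
The two checks listed above can be sketched as pure functions in user-space C. This is only an illustration of the logic, not the kernel code: the names are hypothetical, the 20-byte figure is the size of a TCP header without options, and the protocol numbers are the IANA values for TCP (6) and UDP (17).

```c
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_TCPHDR_MIN_LEN 20 /* TCP header without options */
#define EXAMPLE_IPPROTO_TCP    6
#define EXAMPLE_IPPROTO_UDP    17

/* Hypothetical GSO kinds for the sketch. */
enum example_gso_kind { EXAMPLE_GSO_TCP, EXAMPLE_GSO_UDP };

/*
 * True if a minimal TCP header starting at `transport_off` lies entirely
 * within the first `headlen` (linear) bytes of the packet.
 */
static bool example_thdr_in_linear(size_t transport_off, size_t headlen)
{
	/* Written as a subtraction to avoid overflow in offset + length. */
	return transport_off <= headlen &&
	       headlen - transport_off >= EXAMPLE_TCPHDR_MIN_LEN;
}

/* True if the claimed GSO kind agrees with the dissected IP protocol. */
static bool example_gso_matches_proto(enum example_gso_kind kind, int ip_proto)
{
	switch (kind) {
	case EXAMPLE_GSO_TCP:
		return ip_proto == EXAMPLE_IPPROTO_TCP;
	case EXAMPLE_GSO_UDP:
		return ip_proto == EXAMPLE_IPPROTO_UDP;
	}
	return false;
}
```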
      
      Fixes: bfd5f4a3 ("packet: Add GSO/csum offload support.")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • timer: add fsleep for flexible sleeping · c6af13d3
      Committed by Heiner Kallweit
      Sleeping for a certain amount of time requires use of different
      functions, depending on the time period.
      Documentation/timers/timers-howto.rst explains when to use which
      function, and also checkpatch checks for some potentially
      problematic cases.
      
      So let's create a helper that automatically chooses the appropriate
      sleep function -> fsleep(), for flexible sleeping
      
      If the delay is a constant, then the compiler should be able to ensure
      that the new helper doesn't create overhead. If the delay is not
      constant, then the new helper can save some code.
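
The selection logic can be modelled in user-space C as a function that names the sleep primitive it would pick. The thresholds used here (up to 10 us busy-wait, up to 20 ms range sleep, longer delays in milliseconds) are assumptions loosely following the guidance in timers-howto; the commit message itself does not state them, and the enum and function names are hypothetical.

```c
/* Which kind of kernel sleep primitive a given delay would map to. */
enum example_sleep_kind {
	EXAMPLE_SLEEP_UDELAY,       /* busy-wait, e.g. udelay() */
	EXAMPLE_SLEEP_USLEEP_RANGE, /* hrtimer-based, e.g. usleep_range() */
	EXAMPLE_SLEEP_MSLEEP,       /* jiffies-based, e.g. msleep() */
};

/* Pick a primitive from the requested delay (assumed thresholds). */
static enum example_sleep_kind example_pick_sleep(unsigned long usecs)
{
	if (usecs <= 10)
		return EXAMPLE_SLEEP_UDELAY;
	if (usecs <= 20000)
		return EXAMPLE_SLEEP_USLEEP_RANGE;
	return EXAMPLE_SLEEP_MSLEEP;
}

/* Round microseconds up to whole milliseconds for the msleep case. */
static unsigned long example_usecs_to_msecs_roundup(unsigned long usecs)
{
	return (usecs + 999) / 1000;
}
```

When the argument is a compile-time constant, an inlined chain of comparisons like this folds down to a single call, which is the "no overhead" property the message refers to.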
      Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: add concept of shared storage for PHYs · 63490847
      Committed by Michael Walle
      There are packages which contain multiple PHY devices, eg. a quad PHY
      transceiver. Provide functions to allocate and free shared storage.
      
      Usually, a quad PHY contains global registers, which don't belong to any
      PHY. Provide convenience functions to access these registers.
      Signed-off-by: Michael Walle <michael@walle.cc>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 06 May, 2020 1 commit
  9. 05 May, 2020 3 commits
  10. 03 May, 2020 1 commit
  11. 02 May, 2020 3 commits
  12. 01 May, 2020 6 commits
    • net: phy: bcm54140: add second PHY ID · e4e51da6
      Committed by Michael Walle
      This PHY has two PHY IDs depending on its mode. Adjust the mask so that
      it includes both IDs.
      Signed-off-by: Michael Walle <michael@walle.cc>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ptp_qoriq: output PPS signal on FIPER2 in default · f256356f
      Committed by Yangbo Lu
      Output the PPS signal on FIPER2 (Fixed Period Interval Pulse) by default,
      which is what users more commonly want.
      Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: add tp->dup_ack_counter · 2b195850
      Committed by Eric Dumazet
      In commit 86de5921 ("tcp: defer SACK compression after DupThresh")
      I added a TCP_FASTRETRANS_THRESH bias to tp->compressed_ack in order
      to enable sack compression only after 3 dupacks.
      
      Since we plan to relax this rule for flows that involve
      stacks not requiring this old rule, this patch adds
      a distinct tp->dup_ack_counter.
      
      This means the TCP_FASTRETRANS_THRESH value is now used
      in a single location that a future patch can adjust:
      
      	if (tp->dup_ack_counter < TCP_FASTRETRANS_THRESH) {
      		tp->dup_ack_counter++;
      		goto send_now;
      	}
      
      This patch also introduces tcp_sack_compress_send_ack()
      helper to ease following patch comprehension.
      
      This patch refines LINUX_MIB_TCPACKCOMPRESSED to not
      count the acks that we had to send if the timer expires
      or tcp_sack_compress_send_ack() is sending an ack.
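
A user-space model of the counter logic quoted above, as a sketch only. The struct and function names are hypothetical, the threshold of 3 corresponds to the "3 dupacks" mentioned in the message, and the real code additionally arms a timer and handles many cases this model ignores.

```c
#include <stdbool.h>

#define EXAMPLE_FASTRETRANS_THRESH 3 /* "only after 3 dupacks" */

/* Hypothetical subset of per-connection state. */
struct example_tp {
	unsigned int dup_ack_counter; /* dupacks sent without compression */
	unsigned int compressed_ack;  /* acks currently being held back */
};

/*
 * Decide what to do with one more duplicate ACK.
 * Returns true if it must be sent immediately, false if it may be
 * compressed (held back and merged with later ones).
 */
static bool example_on_dupack(struct example_tp *tp)
{
	if (tp->dup_ack_counter < EXAMPLE_FASTRETRANS_THRESH) {
		tp->dup_ack_counter++;
		return true; /* corresponds to "goto send_now" */
	}
	tp->compressed_ack++;
	return false;
}
```

Keeping the dupack count separate from compressed_ack means the threshold appears in exactly one comparison, which is the property the message highlights.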
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: networking: convert netdev-features.txt to ReST · ea5bacaa
      Committed by Mauro Carvalho Chehab
      Not much to be done here:
      
      - add SPDX header;
      - adjust titles and chapters, adding proper markups;
      - add to networking/index.rst.
      Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet_diag: add cgroup id attribute · 6e3a401f
      Committed by Dmitry Yakunin
      This patch adds cgroup v2 ID to common inet diag message attributes.
      Cgroup v2 ID is kernfs ID (ino or ino+gen). This attribute allows filtering
      inet diag output by cgroup ID obtained by name_to_handle_at() syscall.
      When net_cls or net_prio cgroup is activated this ID is equal to 1 (root
      cgroup ID) for newly created sockets.
      
      Some notes about this ID:
      
      1) gets initialized in socket() syscall
      2) incoming socket gets ID from listening socket
         (not during accept() syscall)
      3) not changed when the process gets moved to another cgroup
      4) can point to deleted cgroup (refcounting)
      
      v2:
        - use CONFIG_SOCK_CGROUP_DATA instead of CONFIG_CGROUPS
      
      v3:
        - fix attr size by using nla_total_size_64bit() (Eric Dumazet)
        - more detailed commit message (Konstantin Khlebnikov)
      Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
      Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-By: Tejun Heo <tj@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: move option parsing into mptcp_incoming_options() · cfde141e
      Committed by Paolo Abeni
      The mptcp_options_received structure carries several per
      packet flags (mp_capable, mp_join, etc.). Such fields must
      be cleared on each packet, even on dropped ones or packets
      not carrying any MPTCP options, but the current mptcp
      code clears them only on TCP option reset.
      
      In several races/corner cases we end up with stray bits in
      incoming options, leading to WARN_ON splats, e.g.:
      
      [  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
      [  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
      [  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
      [  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
      [  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
      [  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
      [  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
      [  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
      [  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
      [  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
      [  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
      [  171.232586] Call Trace:
      [  171.233109]  <IRQ>
      [  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
      [  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
      [  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
      [  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
      [  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
      [  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
      [  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
      [  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
      [  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
      [  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
      [  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
      [  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
      [  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
      [  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
      [  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
      [  171.282358]  </IRQ>
      
      We could address the issue by explicitly clearing the relevant fields
      in several places - tcp_parse_option, tcp_fast_parse_options,
      possibly others.
      
      Instead we move the MPTCP option parsing into the already existing
      mptcp ingress hook, so that we need to clear the fields in a single
      place.
      
      This allows us to drop an MPTCP hook from the TCP code and
      remove the quite large mptcp_options_received from the tcp_sock
      struct. On the flip side, the MPTCP sockets will traverse the
      option space twice (in tcp_parse_option() and in
      mptcp_incoming_options()). That looks acceptable: we already
      do that for syn and 3rd ack packets, plain TCP sockets will
      benefit from it, and even MPTCP sockets will experience better
      code locality, reducing the jumps between TCP and MPTCP code.
      
      v1 -> v2:
       - rebased on current '-net' tree
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 30 April, 2020 2 commits
  14. 29 April, 2020 1 commit