1. 12 5月, 2020 4 次提交
    • C
      net: cleanly handle kernel vs user buffers for ->msg_control · 1f466e1f
      Christoph Hellwig 提交于
      The msg_control field in struct msghdr can either contain a user
      pointer when used with the recvmsg system call, or a kernel pointer
      when used with sendmsg.  To complicate things further kernel_recvmsg
      can stuff a kernel pointer in and then use set_fs to make the uaccess
      helpers accept it.
      
      Replace it with a union of a kernel pointer msg_control field, and
      a user pointer msg_control_user one, and allow kernel_recvmsg operate
      on a proper kernel pointer using a bitfield to override the normal
      choice of a user pointer for recvmsg.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f466e1f
    • C
      net: add a CMSG_USER_DATA macro · 0462b6bd
      Christoph Hellwig 提交于
      Add a variant of CMSG_DATA that operates on user pointer to avoid
      sparse warnings about casting to/from user pointers.  Also fix up
      CMSG_DATA to rely on the gcc extension that allows void pointer
      arithmetics to cut down on the amount of casts.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0462b6bd
    • G
      team: Replace zero-length array with flexible-array · 9c8255c8
      Gustavo A. R. Silva 提交于
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get completely rid of those sorts of issues.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c8255c8
    • G
      ipv6: Replace zero-length array with flexible-array · 0fa39d6d
      Gustavo A. R. Silva 提交于
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get completely rid of those sorts of issues.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fa39d6d
  2. 11 5月, 2020 10 次提交
    • V
      net: dsa: sja1105: implement cross-chip bridging operations · ac02a451
      Vladimir Oltean 提交于
      sja1105 uses dsa_8021q for DSA tagging, a format which is VLAN at heart
      and which is compatible with cascading. A complete description of this
      tagging format is in net/dsa/tag_8021q.c, but a quick summary is that
      each external-facing port tags incoming frames with a unique pvid, and
      this special VLAN is transmitted as tagged towards the inside of the
      system, and as untagged towards the exterior. The tag encodes the switch
      id and the source port index.
      
      This means that cross-chip bridging for dsa_8021q only entails adding
      the dsa_8021q pvids of one switch to the RX filter of the other
      switches. Everything else falls naturally into place, as long as the
      bottom-end of ports (the leaves in the tree) is comprised exclusively of
      dsa_8021q-compatible (i.e. sja1105 switches). Otherwise, there would be
      a chance that a front-panel switch transmits a packet tagged with a
      dsa_8021q header, header which it wouldn't be able to remove, and which
      would hence "leak" out.
      
      The only use case I tested (due to lack of board availability) was when
      the sja1105 switches are part of disjoint trees (however, this doesn't
      change the fact that multiple sja1105 switches still need unique switch
      identifiers in such a system). But in principle, even "true" single-tree
      setups (with DSA links) should work just as fine, except for a small
      change which I can't test: dsa_towards_port should be used instead of
      dsa_upstream_port (I made the assumption that the routing port that any
      sja1105 should use towards its neighbours is the CPU port. That might
      not hold true in other setups).
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      ac02a451
    • V
      net: dsa: introduce a dsa_switch_find function · 3b7bc1f0
      Vladimir Oltean 提交于
      Somewhat similar to dsa_tree_find, dsa_switch_find returns a dsa_switch
      structure pointer by searching for its tree index and switch index (the
      parameters from dsa,member). To be used, for example, by drivers who
      implement .crosschip_bridge_join and need a reference to the other
      switch indicated to by the tree_index and sw_index arguments.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      3b7bc1f0
    • V
      net: dsa: permit cross-chip bridging between all trees in the system · f66a6a69
      Vladimir Oltean 提交于
      One way of utilizing DSA is by cascading switches which do not all have
      compatible taggers. Consider the following real-life topology:
      
            +---------------------------------------------------------------+
            | LS1028A                                                       |
            |               +------------------------------+                |
            |               |      DSA master for Felix    |                |
            |               |(internal ENETC port 2: eno2))|                |
            |  +------------+------------------------------+-------------+  |
            |  | Felix embedded L2 switch                                |  |
            |  |                                                         |  |
            |  | +--------------+   +--------------+   +--------------+  |  |
            |  | |DSA master for|   |DSA master for|   |DSA master for|  |  |
            |  | |  SJA1105 1   |   |  SJA1105 2   |   |  SJA1105 3   |  |  |
            |  | |(Felix port 1)|   |(Felix port 2)|   |(Felix port 3)|  |  |
            +--+-+--------------+---+--------------+---+--------------+--+--+
      
      +-----------------------+ +-----------------------+ +-----------------------+
      |   SJA1105 switch 1    | |   SJA1105 switch 2    | |   SJA1105 switch 3    |
      +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
      |sw1p0|sw1p1|sw1p2|sw1p3| |sw2p0|sw2p1|sw2p2|sw2p3| |sw3p0|sw3p1|sw3p2|sw3p3|
      +-----+-----+-----+-----+ +-----+-----+-----+-----+ +-----+-----+-----+-----+
      
      The above can be described in the device tree as follows (obviously not
      complete):
      
      mscc_felix {
      	dsa,member = <0 0>;
      	ports {
      		port@4 {
      			ethernet = <&enetc_port2>;
      		};
      	};
      };
      
      sja1105_switch1 {
      	dsa,member = <1 1>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port1>;
      		};
      	};
      };
      
      sja1105_switch2 {
      	dsa,member = <2 2>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port2>;
      		};
      	};
      };
      
      sja1105_switch3 {
      	dsa,member = <3 3>;
      	ports {
      		port@4 {
      			ethernet = <&mscc_felix_port3>;
      		};
      	};
      };
      
      Basically we instantiate one DSA switch tree for every hardware switch
      in the system, but we still give them globally unique switch IDs (will
      come back to that later). Having 3 disjoint switch trees makes the
      tagger drivers "just work", because net devices are registered for the
      3 Felix DSA master ports, and they are also DSA slave ports to the ENETC
      port. So packets received on the ENETC port are stripped of their
      stacked DSA tags one by one.
      
      Currently, hardware bridging between ports on the same sja1105 chip is
      possible, but switching between sja1105 ports on different chips is
      handled by the software bridge. This is fine, but we can do better.
      
      In fact, the dsa_8021q tag used by sja1105 is compatible with cascading.
      In other words, a sja1105 switch can correctly parse and route a packet
      containing a dsa_8021q tag. So if we could enable hardware bridging on
      the Felix DSA master ports, cross-chip bridging could be completely
      offloaded.
      
      Such as system would be used as follows:
      
      ip link add dev br0 type bridge && ip link set dev br0 up
      for port in sw0p0 sw0p1 sw0p2 sw0p3 \
      	    sw1p0 sw1p1 sw1p2 sw1p3 \
      	    sw2p0 sw2p1 sw2p2 sw2p3; do
      	ip link set dev $port master br0
      done
      
      The above makes switching between ports on the same row be performed in
      hardware, and between ports on different rows in software. Now assume
      the Felix switch ports are called swp0, swp1, swp2. By running the
      following extra commands:
      
      ip link add dev br1 type bridge && ip link set dev br1 up
      for port in swp0 swp1 swp2; do
      	ip link set dev $port master br1
      done
      
      the CPU no longer sees packets which traverse sja1105 switch boundaries
      and can be forwarded directly by Felix. The br1 bridge would not be used
      for any sort of traffic termination.
      
      For this to work, we need to give drivers an opportunity to listen for
      bridging events on DSA trees other than their own, and pass that other
      tree index as argument. I have made the assumption, for the moment, that
      the other existing DSA notifiers don't need to be broadcast to other
      trees. That assumption might turn out to be incorrect. But in the
      meantime, introduce a dsa_broadcast function, similar in purpose to
      dsa_port_notify, which is used only by the bridging notifiers.
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      f66a6a69
    • V
      net: bridge: allow enslaving some DSA master network devices · 9eb8eff0
      Vladimir Oltean 提交于
      Commit 8db0a2ee ("net: bridge: reject DSA-enabled master netdevices
      as bridge members") added a special check in br_if.c in order to check
      for a DSA master network device with a tagging protocol configured. This
      was done because back then, such devices, once enslaved in a bridge
      would become inoperative and would not pass DSA tagged traffic anymore
      due to br_handle_frame returning RX_HANDLER_CONSUMED.
      
      But right now we have valid use cases which do require bridging of DSA
      masters. One such example is when the DSA master ports are DSA switch
      ports themselves (in a disjoint tree setup). This should be completely
      equivalent, functionally speaking, from having multiple DSA switches
      hanging off of the ports of a switchdev driver. So we should allow the
      enslaving of DSA tagged master network devices.
      
      Instead of the regular br_handle_frame(), install a new function
      br_handle_frame_dummy() on these DSA masters, which returns
      RX_HANDLER_PASS in order to call into the DSA specific tagging protocol
      handlers, and lift the restriction from br_add_if.
      Suggested-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Suggested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      9eb8eff0
    • A
      net: ethtool: Add helpers for reporting test results · 1e2dc145
      Andrew Lunn 提交于
      The PHY drivers can use these helpers for reporting the results. The
      results get translated into netlink attributes which are added to the
      pre-allocated skbuf.
      
      v3:
      Poison phydev->skb
      Return -EMSGSIZE when ethnl_bcastmsg_put() fails
      Return valid error code when nla_nest_start() fails
      Use u8 for results
      Actually put u32 length into message
      
      v4:
      s/ENOTSUPP/EOPNOTSUPP/g
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      1e2dc145
    • A
      net: ethtool: Add infrastructure for reporting cable test results · 1dd3f212
      Andrew Lunn 提交于
      Provide infrastructure for PHY drivers to report the cable test
      results.  A netlink skb is associated to the phydev. Helpers will be
      added which can add results to this skb. Once the test has finished
      the results are sent to user space.
      
      When netlink ethtool is not part of the kernel configuration stubs are
      provided. It is also impossible to trigger a cable test, so the error
      code returned by the alloc function is of no consequence.
      
      v2:
      Include the status complete in the netlink notification message
      
      v4:
      Replace -EINVAL with -EMSGSIZE
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      1dd3f212
    • A
      net: ethtool: Add attributes for cable test reports · b28efb93
      Andrew Lunn 提交于
      Add the attributes needed to report cable test results to userspace.
      The reports are expected to be per twisted pair. A nested property per
      pair can report the result of the cable test. A nested property can
      also report the length of the cable to any fault.
      
      v2:
      Grammar fixes
      Change length from u16 to u32
      s/DEV/HEADER/g
      Add status attributes
      Rename pairs from numbers to letters.
      
      v3:
      Fixed example in document
      Add ETHTOOL_A_CABLE_NEST_* enum
      Add ETHTOOL_MSG_CABLE_TEST_NTF to documentation
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      b28efb93
    • A
      net: ethtool: netlink: Add support for triggering a cable test · 11ca3c42
      Andrew Lunn 提交于
      Add new ethtool netlink calls to trigger the starting of a PHY cable
      test.
      
      Add Kconfig'ury to ETHTOOL_NETLINK so that PHYLIB is not a module when
      ETHTOOL_NETLINK is builtin, which would result in kernel linking errors.
      
      v2:
      Remove unwanted white space change
      Remove ethnl_cable_test_act_ops and use doit handler
      Rename cable_test_set_policy cable_test_act_policy
      Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY
      
      v3:
      Remove ETHTOOL_MSG_CABLE_TEST_ACT_REPLY from documentation
      Remove unused cable_test_get_policy
      Add Reviewed-by tags
      
      v4:
      Remove unwanted blank line
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NMichal Kubecek <mkubecek@suse.cz>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      11ca3c42
    • A
      net: phy: Add support for polling cable test · 97c22438
      Andrew Lunn 提交于
      Some PHYs are not capable of generating interrupts when a cable test
      finished. They do however support interrupts for normal operations,
      like link up/down. As such, the PHY state machine would normally not
      poll the PHY.
      
      Add support for indicating the PHY state machine must poll the PHY
      when performing a cable test.
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      97c22438
    • A
      net: phy: Add cable test support to state machine · a68a8138
      Andrew Lunn 提交于
      Running a cable test is desruptive to normal operation of the PHY and
      can take a 5 to 10 seconds to complete. The RTNL lock cannot be held
      for this amount of time, and add a new state to the state machine for
      running a cable test.
      
      The driver is expected to implement two functions. The first is used
      to start a cable test. Once the test has started, it should return.
      
      The second function is called once per second, or on interrupt to
      check if the cable test is complete, and to allow the PHY to report
      the status.
      
      v2:
      Rename phy_cable_test_abort to phy_abort_cable_test
      Return different extack when already running test
      Use phy_init_hw() to reset the PHY
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      a68a8138
  3. 10 5月, 2020 1 次提交
    • G
      IB/mlx4: Replace zero-length array with flexible-array · e7bb7ece
      Gustavo A. R. Silva 提交于
      The current codebase makes use of the zero-length array language
      extension to the C90 standard, but the preferred mechanism to declare
      variable-length types such as these ones is a flexible array member[1][2],
      introduced in C99:
      
      struct foo {
              int stuff;
              struct boo array[];
      };
      
      By making use of the mechanism above, we will get a compiler warning
      in case the flexible array does not occur last in the structure, which
      will help us prevent some kind of undefined behavior bugs from being
      inadvertently introduced[3] to the codebase from now on.
      
      Also, notice that, dynamic memory allocations won't be affected by
      this change:
      
      "Flexible array members have incomplete type, and so the sizeof operator
      may not be applied. As a quirk of the original implementation of
      zero-length arrays, sizeof evaluates to zero."[1]
      
      sizeof(flexible-array-member) triggers a warning because flexible array
      members have incomplete type[1]. There are some instances of code in
      which the sizeof operator is being incorrectly/erroneously applied to
      zero-length arrays and the result is zero. Such instances may be hiding
      some bugs. So, this work (flexible-array member conversions) will also
      help to get completely rid of those sorts of issues.
      
      This issue was found with the help of Coccinelle.
      
      [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
      [2] https://github.com/KSPP/linux/issues/21
      [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
      Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e7bb7ece
  4. 09 5月, 2020 1 次提交
  5. 08 5月, 2020 6 次提交
  6. 07 5月, 2020 8 次提交
    • P
      net: flow_offload: skip hw stats check for FLOW_ACTION_HW_STATS_DONT_CARE · 16f80360
      Pablo Neira Ayuso 提交于
      This patch adds FLOW_ACTION_HW_STATS_DONT_CARE which tells the driver
      that the frontend does not need counters, this hw stats type request
      never fails. The FLOW_ACTION_HW_STATS_DISABLED type explicitly requests
      the driver to disable the stats, however, if the driver cannot disable
      counters, it bails out.
      
      TCA_ACT_HW_STATS_* maintains the 1:1 mapping with FLOW_ACTION_HW_STATS_*
      except by disabled which is mapped to FLOW_ACTION_HW_STATS_DISABLED
      (this is 0 in tc). Add tc_act_hw_stats() to perform the mapping between
      TCA_ACT_HW_STATS_* and FLOW_ACTION_HW_STATS_*.
      
      Fixes: 319a1d19 ("flow_offload: check for basic action hw stats type")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16f80360
    • O
      ethtool: provide UAPI for PHY master/slave configuration. · bdbdac76
      Oleksij Rempel 提交于
      This UAPI is needed for BroadR-Reach 100BASE-T1 devices. Due to lack of
      auto-negotiation support, we needed to be able to configure the
      MASTER-SLAVE role of the port manually or from an application in user
      space.
      
      The same UAPI can be used for 1000BASE-T or MultiGBASE-T devices to
      force MASTER or SLAVE role. See IEEE 802.3-2018:
      22.2.4.3.7 MASTER-SLAVE control register (Register 9)
      22.2.4.3.8 MASTER-SLAVE status register (Register 10)
      40.5.2 MASTER-SLAVE configuration resolution
      45.2.1.185.1 MASTER-SLAVE config value (1.2100.14)
      45.2.7.10 MultiGBASE-T AN control 1 register (Register 7.32)
      
      The MASTER-SLAVE role affects the clock configuration:
      
      -------------------------------------------------------------------------------
      When the  PHY is configured as MASTER, the PMA Transmit function shall
      source TX_TCLK from a local clock source. When configured as SLAVE, the
      PMA Transmit function shall source TX_TCLK from the clock recovered from
      data stream provided by MASTER.
      
      iMX6Q                     KSZ9031                XXX
      ------\                /-----------\        /------------\
            |                |           |        |            |
       MAC  |<----RGMII----->| PHY Slave |<------>| PHY Master |
            |<--- 125 MHz ---+-<------/  |        | \          |
      ------/                \-----------/        \------------/
                                                     ^
                                                      \-TX_TCLK
      
      -------------------------------------------------------------------------------
      
      Since some clock or link related issues are only reproducible in a
      specific MASTER-SLAVE-role, MAC and PHY configuration, it is beneficial
      to provide generic (not 100BASE-T1 specific) interface to the user space
      for configuration flexibility and trouble shooting.
      Signed-off-by: NOleksij Rempel <o.rempel@pengutronix.de>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdbdac76
    • E
      tcp: refine tcp_pacing_delay() for very low pacing rates · 8dc242ad
      Eric Dumazet 提交于
      With the addition of horizon feature to sch_fq, we noticed some
      suboptimal behavior of extremely low pacing rate TCP flows, especially
      when TCP is not aware of a drop happening in lower stacks.
      
      Back in commit 3f80e08f ("tcp: add tcp_reset_xmit_timer() helper"),
      tcp_pacing_delay() was added to estimate an extra delay to add to standard
      rto timers.
      
      This patch removes the skb argument from this helper and
      tcp_reset_xmit_timer() because it makes more sense to simply
      consider the time at which next packet is allowed to be sent,
      instead of the time of whatever packet has been sent.
      
      This avoids arming RTO timer too soon and removes
      spurious horizon drops.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dc242ad
    • W
      net: stricter validation of untrusted gso packets · 9274124f
      Willem de Bruijn 提交于
      Syzkaller again found a path to a kernel crash through bad gso input:
      a packet with transport header extending beyond skb_headlen(skb).
      
      Tighten validation at kernel entry:
      
      - Verify that the transport header lies within the linear section.
      
          To avoid pulling linux/tcp.h, verify just sizeof tcphdr.
          tcp_gso_segment will call pskb_may_pull (th->doff * 4) before use.
      
      - Match the gso_type against the ip_proto found by the flow dissector.
      
      Fixes: bfd5f4a3 ("packet: Add GSO/csum offload support.")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9274124f
    • V
      net: dsa: ocelot: the MAC table on Felix is twice as large · 21ce7f3e
      Vladimir Oltean 提交于
      When running 'bridge fdb dump' on Felix, sometimes learnt and static MAC
      addresses would appear, sometimes they wouldn't.
      
      Turns out, the MAC table has 4096 entries on VSC7514 (Ocelot) and 8192
      entries on VSC9959 (Felix), so the existing code from the Ocelot common
      library only dumped half of Felix's MAC table. They are both organized
      as a 4-way set-associative TCAM, so we just need a single variable
      indicating the correct number of rows.
      
      Fixes: 56051948 ("net: dsa: ocelot: add driver for Felix switch family")
      Signed-off-by: NVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21ce7f3e
    • H
      timer: add fsleep for flexible sleeping · c6af13d3
      Heiner Kallweit 提交于
      Sleeping for a certain amount of time requires use of different
      functions, depending on the time period.
      Documentation/timers/timers-howto.rst explains when to use which
      function, and also checkpatch checks for some potentially
      problematic cases.
      
      So let's create a helper that automatically chooses the appropriate
      sleep function -> fsleep(), for flexible sleeping
      
      If the delay is a constant, then the compiler should be able to ensure
      that the new helper doesn't create overhead. If the delay is not
      constant, then the new helper can save some code.
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c6af13d3
    • F
      ipv6: Implement draft-ietf-6man-rfc4941bis · 969c5464
      Fernando Gont 提交于
      Implement the upcoming rev of RFC4941 (IPv6 temporary addresses):
      https://tools.ietf.org/html/draft-ietf-6man-rfc4941bis-09
      
      * Reduces the default Valid Lifetime to 2 days
        The number of extra addresses employed when Valid Lifetime was
        7 days exacerbated the stress caused on network
        elements/devices. Additionally, the motivation for temporary
        addresses is indeed privacy and reduced exposure. With a
        default Valid Lifetime of 7 days, an address that becomes
        revealed by active communication is reachable and exposed for
        one whole week. The only use case for a Valid Lifetime of 7
        days could be some application that is expecting to have long
        lived connections. But if you want to have a long lived
        connections, you shouldn't be using a temporary address in the
        first place. Additionally, in the era of mobile devices, general
        applications should nevertheless be prepared and robust to
        address changes (e.g. nodes swap wifi <-> 4G, etc.)
      
      * Employs different IIDs for different prefixes
        To avoid network activity correlation among addresses configured
        for different prefixes
      
      * Uses a simpler algorithm for IID generation
        No need to store "history" anywhere
      Signed-off-by: NFernando Gont <fgont@si6networks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      969c5464
    • M
      net: phy: add concept of shared storage for PHYs · 63490847
      Michael Walle 提交于
      There are packages which contain multiple PHY devices, eg. a quad PHY
      transceiver. Provide functions to allocate and free shared storage.
      
      Usually, a quad PHY contains global registers, which don't belong to any
      PHY. Provide convenience functions to access these registers.
      Signed-off-by: NMichael Walle <michael@walle.cc>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      63490847
  7. 06 5月, 2020 2 次提交
  8. 05 5月, 2020 7 次提交
    • C
      bonding: remove useless stats_lock_key · e7511f56
      Cong Wang 提交于
      After commit b3e80d44
      ("bonding: fix lockdep warning in bond_get_stats()") the dynamic
      key is no longer necessary, as we compute nest level at run-time.
      So, we can just remove it to save some lockdep key entries.
      
      Test commands:
       ip link add bond0 type bond
       ip link add bond1 type bond
       ip link set bond0 master bond1
       ip link set bond0 nomaster
       ip link set bond1 master bond0
      
      Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7511f56
    • C
      net: partially revert dynamic lockdep key changes · 1a33e10e
      Cong Wang 提交于
      This patch reverts the folowing commits:
      
      commit 064ff66e
      "bonding: add missing netdev_update_lockdep_key()"
      
      commit 53d37497
      "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()"
      
      commit 1f26c0d3
      "net: fix kernel-doc warning in <linux/netdevice.h>"
      
      commit ab92d68f
      "net: core: add generic lockdep keys"
      
      but keeps the addr_list_lock_key because we still lock
      addr_list_lock nestedly on stack devices, unlikely xmit_lock
      this is safe because we don't take addr_list_lock on any fast
      path.
      
      Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a33e10e
    • E
      net_sched: sch_fq: add horizon attribute · 39d01050
      Eric Dumazet 提交于
      QUIC servers would like to use SO_TXTIME, without having CAP_NET_ADMIN,
      to efficiently pace UDP packets.
      
      As far as sch_fq is concerned, we need to add safety checks, so
      that a buggy application does not fill the qdisc with packets
      having delivery time far in the future.
      
      This patch adds a configurable horizon (default: 10 seconds),
      and a configurable policy when a packet is beyond the horizon
      at enqueue() time:
      - either drop the packet (default policy)
      - or cap its delivery time to the horizon.
      
      $ tc -s -d qd sh dev eth0
      qdisc fq 8022: root refcnt 257 limit 10000p flow_limit 100p buckets 1024
       orphan_mask 1023 quantum 10Kb initial_quantum 51160b low_rate_threshold 550Kbit
       refill_delay 40.0ms timer_slack 10.000us horizon 10.000s
       Sent 1234215879 bytes 837099 pkt (dropped 21, overlimits 0 requeues 6)
       backlog 0b 0p requeues 6
        flows 1191 (inactive 1177 throttled 0)
        gc 0 highprio 0 throttled 692 latency 11.480us
        pkts_too_long 0 alloc_errors 0 horizon_drops 21 horizon_caps 0
      
      v2: fixed an overflow on 32bit kernels in fq_init(), reported
          by kbuild test robot <lkp@intel.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39d01050
    • C
      net_sched: fix tcm_parent in tc filter dump · a7df4870
      Cong Wang 提交于
      When we tell kernel to dump filters from root (ffff:ffff),
      those filters on ingress (ffff:0000) are matched, but their
      true parents must be dumped as they are. However, kernel
      dumps just whatever we tell it, that is either ffff:ffff
      or ffff:0000:
      
       $ nl-cls-list --dev=dummy0 --parent=root
       cls basic dev dummy0 id none parent root prio 49152 protocol ip match-all
       cls basic dev dummy0 id :1 parent root prio 49152 protocol ip match-all
       $ nl-cls-list --dev=dummy0 --parent=ffff:
       cls basic dev dummy0 id none parent ffff: prio 49152 protocol ip match-all
       cls basic dev dummy0 id :1 parent ffff: prio 49152 protocol ip match-all
      
      This is confusing and misleading, more importantly this is
      a regression since 4.15, so the old behavior must be restored.
      
      And, when tc filters are installed on a tc class, the parent
      should be the classid, rather than the qdisc handle. Commit
      edf6711c ("net: sched: remove classid and q fields from tcf_proto")
      removed the classid we save for filters, we can just restore
      this classid in tcf_block.
      
      Steps to reproduce this:
       ip li set dev dummy0 up
       tc qd add dev dummy0 ingress
       tc filter add dev dummy0 parent ffff: protocol arp basic action pass
       tc filter show dev dummy0 root
      
      Before this patch:
       filter protocol arp pref 49152 basic
       filter protocol arp pref 49152 basic handle 0x1
      	action order 1: gact action pass
      	 random type none pass val 0
      	 index 1 ref 1 bind 1
      
      After this patch:
       filter parent ffff: protocol arp pref 49152 basic
       filter parent ffff: protocol arp pref 49152 basic handle 0x1
       	action order 1: gact action pass
       	 random type none pass val 0
      	 index 1 ref 1 bind 1
      
      Fixes: a10fa201 ("net: sched: propagate q and parent from caller down to tcf_fill_node")
      Fixes: edf6711c ("net: sched: remove classid and q fields from tcf_proto")
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7df4870
    • H
      net: add helper eth_hw_addr_crc · b86cd700
      Heiner Kallweit 提交于
      Several drivers use the same code as basis for filter hashes. Therefore
      let's factor it out to a helper. This way drivers don't have to access
      struct netdev_hw_addr internals.
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b86cd700
    • G
      uapi: revert flexible-array conversions · 1e6e9d0f
      Gustavo A. R. Silva 提交于
      These structures can get embedded in other structures in user-space
      and cause all sorts of warnings and problems. So, we better don't take
      any chances and keep the zero-length arrays in place for now.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      1e6e9d0f
    • L
      gcc-10 warnings: fix low-hanging fruit · 9d82973e
      Linus Torvalds 提交于
      Due to a bug-report that was compiler-dependent, I updated one of my
      machines to gcc-10.  That shows a lot of new warnings.  Happily they
      seem to be mostly the valid kind, but it's going to cause a round of
      churn for getting rid of them..
      
      This is the really low-hanging fruit of removing a couple of zero-sized
      arrays in some core code.  We have had a round of these patches before,
      and we'll have many more coming, and there is nothing special about
      these except that they were particularly trivial, and triggered more
      warnings than most.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d82973e
  9. 03 5月, 2020 1 次提交