1. 04 5月, 2019 10 次提交
  2. 03 5月, 2019 18 次提交
    • D
      Merge branch 'NXP-SJA1105-DSA-driver' · 8ef988b9
      David S. Miller 提交于
      Vladimir Oltean says:
      
      ====================
      NXP SJA1105 DSA driver
      
      This patchset adds a DSA driver for the SPI-controlled NXP SJA1105
      switch.  Due to the hardware's unfriendliness, most of its state needs
      to be shadowed in kernel memory by the driver. To support this and keep
      a decent amount of cleanliness in the code, a new generic API for
      converting between CPU-accessible ("unpacked") structures and
      hardware-accessible ("packed") structures is proposed and used.
      
      The driver is GPL-2.0 licensed. The source code files which are licensed
      as BSD-3-Clause are hardware support files and derivative of the
      userspace NXP sja1105-tool program, which is BSD-3-Clause licensed.
      
      TODO items:
      * Add support for traffic.
      * Add full support for the P/Q/R/S series. The patches were mostly
        tested on a first-generation T device.
      * Add timestamping support and PTP clock manipulation.
      * Figure out how the tc-taprio hardware offload that was just proposed
        by Vinicius can be used to configure the switch's time-aware scheduler.
      * Rework link state callbacks to use phylink once the SGMII port
        is supported.
      
      Changes in v5:
      1. Removed trailing empty lines at the end of files.
      2. Moved the lib/packing.c file under a CONFIG_PACKING option instead of
         having it always built-in. The module is GPL licensed, which applies
         to its distribution in binary form, but the code is dual-licensed
         which means it can be used in projects with other licenses as well.
      3. Made SJA1105 driver select CONFIG_PACKING and CONFIG_CRC32.
      
      v4 patchset can be found at:
      https://lwn.net/Articles/787077/
      
      Changes in v4:
      1. Previous patchset was broken apart, and for the moment the driver is
         configuring the switch as unmanaged. Support for regular and management
         traffic, as well as for PTP timestamping, will be submitted once the
         basic driver is accepted. Some core DSA patches were also broken out
         of the series, and are a dependency for this series:
         https://patchwork.ozlabs.org/project/netdev/list/?series=105069
      2. Addressed Jiri Pirko's feedback about too generic function and macro
         naming.
      3. Re-introduced ETH_P_DSA_8021Q.
      
      v3 patchset can be found at:
      https://lkml.org/lkml/2019/4/12/978
      
      Changes in v3:
      1. Removed the patch for a dedicated Ethertype to use with 802.1Q DSA
         tagging
      2. Changed the SJA1105 switch tagging protocol sysfs label from
         "sja1105" to "8021q" to denote to users such as tcpdump that the
         structure is more generic.
      3. Respun previous patch "net: dsa: Allow drivers to modulate between
         presence and absence of tagging". Current equivalent patch is called
         "net: dsa: Allow drivers to filter packets they can decode source
         port from" and at least allows reception of management traffic during
         the time when switch tagging is not enabled.
      4. Added DSA-level fixes for the bridge core not unsetting
         vlan_filtering when ports leave. The global VLAN filtering is treated
         as a special case. Made the mt7530 driver use this. This patch
         benefits the SJA1105 because otherwise traffic in standalone mode
         would no longer work after removing the ports from a vlan_filtering
         bridge, since the driver and the hardware would be in an inconsistent
         state.
      5. Restructured the documentation as rst. This depends upon the recently
         submitted "[PATCH net-next] Documentation: net: dsa: transition to
         the rst format": https://patchwork.ozlabs.org/patch/1084658/.
      
      v2 patchset can be found at:
      https://www.spinics.net/lists/netdev/msg563454.html
      
      Changes in v2:
      1. Device ID is no longer auto-detected but enforced based on explicit DT
         compatible string. This helps with stricter checking of DT bindings.
      2. Group all device-specific operations into a sja1105_info structure and
         avoid using the IS_ET() and IS_PQRS() macros at runtime as much as possible.
      3. Added more verbiage to commit messages and documentation.
      4. Treat the case where RGMII internal delays are requested through DT bindings
         and return error.
      5. Miscellaneous cosmetic cleanup in sja1105_clocking.c
      6. Not advertising link features that are not supported, such as pause frames
         and the half duplex modes.
      7. Fixed a mistake in previous patchset where the switch tagging was not
         actually enabled (lost during a rebase). This brought up another uncaught
         issue where switching at runtime between tagging and no-tagging was not
         supported by DSA. Fixed up the mistake in "net: dsa: sja1105: Add support
         for traffic through standalone ports", and added the new patch "net: dsa:
         Allow drivers to modulate between presence and absence of tagging" to
         address the other issue.
      8. Added a workaround for switch resets cutting a frame in the middle of
         transmission, which would throw off some link partners.
      9. Changed the TPID from ETH_P_EDSA (0xDADA) to a newly introduced one:
         ETH_P_DSA_8021Q (0xDADB). Uncovered another mistake in the previous patchset
         with a missing ntohs(), which was not caught because 0xDADA is
         endian-agnostic.
      10. Made NET_DSA_TAG_8021Q select VLAN_8021Q
      11. Renamed __dsa_port_vlan_add to dsa_port_vid_add and not to
          dsa_port_vlan_add_trans, as suggested, because the corresponding _del function
          does not have a transactional phase and the naming is more uniform this way.
      
      v1 patchset can be found at:
      https://www.spinics.net/lists/netdev/msg561589.html
      
      Changes from RFC:
      1. Removed the packing code for the static configuration tables that were
         not currently used
      2. Removed the code for unpacking a static configuration structure from
         a memory buffer (not used)
      3. Completely removed the SGMII stubs, since the configuration is not
         complete anyway.
      4. Moved some code from the SJA1105 introduction commit into the patch
         that used it.
      5. Made the code for checking global VLAN filtering generic and made b53
         driver use it.
      6. Made mt7530 driver use the new generic dp->vlan_filtering
      7. Fixed check for stringset in .get_sset_count
      8. Minor cleanup in sja1105_clocking.c
      9. Fixed a confusing typo in DSA
      
      RFC can be found at:
      https://www.mail-archive.com/netdev@vger.kernel.org/msg291717.html
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ef988b9
    • V
    • V
    • V
      net: dsa: sja1105: Reject unsupported link modes for AN · ad9f299a
      Vladimir Oltean 提交于
      Ethernet flow control:
      
      The switch MAC does not consume, nor does it emit pause frames. It
      simply forwards them as any other Ethernet frame (and since the DMAC is,
      per IEEE spec, 01-80-C2-00-00-01, it means they are filtered as
      link-local traffic and forwarded to the CPU, which can't do anything
      useful with them).
      
      Duplex:
      
      There is no duplex setting in the SJA1105 MAC. It is known to forward
      traffic at line rate on the same port in both directions. Therefore it
      must be that it only supports full duplex.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad9f299a
    • V
      net: dsa: sja1105: Prevent PHY jabbering during switch reset · 1a4c6940
      Vladimir Oltean 提交于
      Resetting the switch at runtime is currently done while changing the
      vlan_filtering setting (due to the required TPID change).
      
      But reset is asynchronous with packet egress, and the switch core will
      not wait for egress to finish before carrying on with the reset
      operation.
      
      As a result, a connected PHY such as the BCM5464 would see an
      unterminated Ethernet frame and start to jabber (repeat the last seen
      Ethernet symbols - jabber is by definition an oversized Ethernet frame
      with bad FCS). This behavior is strange in itself, but it also causes
      the MACs of some link partners (such as the FRDM-LS1012A) to completely
      lock up.
      
      So as a remedy for this situation, when switch reset is required, simply
      inhibit Tx on all ports, and wait for the necessary time for the
      eventual one frame left in the egress queue (not even the Tx inhibit
      command is instantaneous) to be flushed.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a4c6940
    • V
      net: dsa: sja1105: Add support for configuring address ageing time · 8456721d
      Vladimir Oltean 提交于
      If STP is active, this setting is applied on bridged ports each time an
      Ethernet link is established (topology changes).
      
      Since the setting is global to the switch and a reset is required to
      change it, resets are prevented if the new callback does not change the
      value that the hardware already is programmed for.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8456721d
    • V
    • V
      net: dsa: sja1105: Add support for VLAN operations · 6666cebc
      Vladimir Oltean 提交于
      VLAN filtering cannot be properly disabled in SJA1105. So in order to
      emulate the "no VLAN awareness" behavior (not dropping traffic that is
      tagged with a VID that isn't configured on the port), we need to hack
      another switch feature: programmable TPID (which is 0x8100 for 802.1Q).
      We are reprogramming the TPID to a bogus value which leaves the switch
      thinking that all traffic is untagged, and therefore accepts it.
      
      Under a vlan_filtering bridge, the proper TPID of ETH_P_8021Q is
      installed again, and the switch starts identifying 802.1Q-tagged
      traffic.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6666cebc
    • V
      ether: Add dedicated Ethertype for pseudo-802.1Q DSA tagging · bf5bc3ce
      Vladimir Oltean 提交于
      There are two possible utilizations so far:
      
      - Switch devices that don't support a native insertion/extraction header
        on the CPU port may still enjoy the benefits of port isolation with a
        custom VLAN tag.
      
        For this, they need to have a customizable TPID in hardware and a new
        Ethertype to distinguish between real 802.1Q traffic and the private
        tags used for port separation.
      
      - Switches that don't support the deactivation of VLAN awareness, but
        still want to have a mode in which they accept all traffic, including
        frames that are tagged with a VLAN not configured on their ports, may
        use this as a fake to trick the hardware into thinking that the TPID
        for VLAN is something other than 0x8100.
      
      What follows after the ETH_P_DSA_8021Q EtherType is a regular VLAN
      header (TCI), however there is no other EtherType that can be used for
      this purpose and doesn't already have a well-defined meaning.
      ETH_P_8021AD, ETH_P_QINQ1, ETH_P_QINQ2 and ETH_P_QINQ3 expect that
      another follow-up VLAN tag is present, which is not the case here.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Suggested-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf5bc3ce
    • V
      net: dsa: sja1105: Error out if RGMII delays are requested in DT · f5b8631c
      Vladimir Oltean 提交于
      Documentation/devicetree/bindings/net/ethernet.txt is confusing because
      it says what the MAC should not do, but not what it *should* do:
      
        * "rgmii-rxid" (RGMII with internal RX delay provided by the PHY, the MAC
           should not add an RX delay in this case)
      
      The gap in semantics is threefold:
      1. Is it illegal for the MAC to apply the Rx internal delay by itself,
         and simplify the phy_mode (mask off "rgmii-rxid" into "rgmii") before
         passing it to of_phy_connect? The documentation would suggest yes.
      1. For "rgmii-rxid", while the situation with the Rx clock skew is more
         or less clear (needs to be added by the PHY), what should the MAC
         driver do about the Tx delays? Is it an implicit wild card for the
         MAC to apply delays in the Tx direction if it can? What if those were
         already added as serpentine PCB traces, how could that be made more
         obvious through DT bindings so that the MAC doesn't attempt to add
         them twice and again potentially break the link?
      3. If the interface is a fixed-link and therefore the PHY object is
         fixed (a purely software entity that obviously cannot add clock
         skew), what is the meaning of the above property?
      
      So an interpretation of the RGMII bindings was chosen that hopefully
      does not contradict their intention but also makes them more applied.
      The SJA1105 driver understands to act upon "rgmii-*id" phy-mode bindings
      if the port is in the PHY role (either explicitly, or if it is a
      fixed-link). Otherwise it always passes the duty of setting up delays to
      the PHY driver.
      
      The error behavior that this patch adds is required on SJA1105E/T where
      the MAC really cannot apply internal delays. If the other end of the
      fixed-link cannot apply RGMII delays either (this would be specified
      through its own DT bindings), then the situation requires PCB delays.
      
      For SJA1105P/Q/R/S, this is however hardware supported and the error is
      thus only temporary. I created a stub function pointer for configuring
      delays per-port on RXC and TXC, and will implement it when I have access
      to a board with this hardware setup.
      
      Meanwhile do not allow the user to select an invalid configuration.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f5b8631c
    • V
      net: dsa: sja1105: Add support for FDB and MDB management · 291d1e72
      Vladimir Oltean 提交于
      Currently only the (more difficult) first generation E/T series is
      supported. Here the TCAM is only 4-way associative, and to know where
      the hardware will search for a FDB entry, we need to perform the same
      hash algorithm in order to install the entry in the correct bin.
      
      On P/Q/R/S, the TCAM should be fully associative. However the SPI
      command interface is different, and because I don't have access to a
      new-generation device at the moment, support for it is TODO.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      291d1e72
    • V
      net: dsa: Introduce driver for NXP SJA1105 5-port L2 switch · 8aa9ebcc
      Vladimir Oltean 提交于
      At this moment the following is supported:
      * Link state management through phylib
      * Autonomous L2 forwarding managed through iproute2 bridge commands.
      
      IP termination must be done currently through the master netdevice,
      since the switch is unmanaged at this point and using
      DSA_TAG_PROTO_NONE.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NGeorg Waibel <georg.waibel@sensor-technik.de>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8aa9ebcc
    • V
      lib: Add support for generic packing operations · 554aae35
      Vladimir Oltean 提交于
      This provides an unified API for accessing register bit fields
      regardless of memory layout. The basic unit of data for these API
      functions is the u64. The process of transforming an u64 from native CPU
      encoding into the peripheral's encoding is called 'pack', and
      transforming it from peripheral to native CPU encoding is 'unpack'.
      Signed-off-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      554aae35
    • N
      net: macb: shrink macb_platform_data structure · 8b952747
      Nicolas Ferre 提交于
      This structure was used intensively for machine specific values
      when DT was not used. Since the removal of AVR32 from the kernel,
      this structure is only used for passing clocks from PCI macb wrapper, all
      other fields being 0.
      All other known platforms use DT.
      
      Remove the leftovers but make sure that PCI macb still works as
      expected by using default values:
      - phydev->irq is set to PHY_POLL by mdiobus_alloc()
      - mii_bus->phy_mask is cleared while allocating it
      - bp->phy_interface is set to PHY_INTERFACE_MODE_MII if mode not found
      in DT.
      
      This simplifies driver probe path and particularly phy handling.
      Signed-off-by: NNicolas Ferre <nicolas.ferre@microchip.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b952747
    • N
      net: macb: remove redundant struct phy_device declaration · d5aeb176
      Nicolas Ferre 提交于
      While moving the chunk of code during 739de9a1
      ("net: macb: Reorganize macb_mii bringup"), the declaration of
      struct phy_device declaration was kept. It's not useful in this
      function as we alrady have a phydev pointer.
      Signed-off-by: NNicolas Ferre <nicolas.ferre@microchip.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5aeb176
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ff24e498
      David S. Miller 提交于
      Three trivial overlapping conflicts.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff24e498
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ea986679
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Out of bounds access in xfrm IPSEC policy unlink, from Yue Haibing.
      
       2) Missing length check for esp4 UDP encap, from Sabrina Dubroca.
      
       3) Fix byte order of RX STBC access in mac80211, from Johannes Berg.
      
       4) Inifnite loop in bpftool map create, from Alban Crequy.
      
       5) Register mark fix in ebpf verifier after pkt/null checks, from Paul
          Chaignon.
      
       6) Properly use rcu_dereference_sk_user_data in L2TP code, from Eric
          Dumazet.
      
       7) Buffer overrun in marvell phy driver, from Andrew Lunn.
      
       8) Several crash and statistics handling fixes to bnxt_en driver, from
          Michael Chan and Vasundhara Volam.
      
       9) Several fixes to the TLS layer from Jakub Kicinski (copying negative
          amounts of data in reencrypt, reencrypt frag copying, blind nskb->sk
          NULL deref, etc).
      
      10) Several UDP GRO fixes, from Paolo Abeni and Eric Dumazet.
      
      11) PID/UID checks on ipv6 flow labels are inverted, from Willem de
          Bruijn.
      
      12) Use after free in l2tp, from Eric Dumazet.
      
      13) IPV6 route destroy races, also from Eric Dumazet.
      
      14) SCTP state machine can erroneously run recursively, fix from Xin
          Long.
      
      15) Adjust AF_PACKET msg_name length checks, add padding bytes if
          necessary. From Willem de Bruijn.
      
      16) Preserve skb_iif, so that forwarded packets have consistent values
          even if fragmentation is involved. From Shmulik Ladkani.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
        udp: fix GRO packet of death
        ipv6: A few fixes on dereferencing rt->from
        rds: ib: force endiannes annotation
        selftests: fib_rule_tests: print the result and return 1 if any tests failed
        ipv4: ip_do_fragment: Preserve skb_iif during fragmentation
        net/tls: avoid NULL pointer deref on nskb->sk in fallback
        selftests: fib_rule_tests: Fix icmp proto with ipv6
        packet: validate msg_namelen in send directly
        packet: in recvmsg msg_name return at least sizeof sockaddr_ll
        sctp: avoid running the sctp state machine recursively
        stmmac: pci: Fix typo in IOT2000 comment
        Documentation: fix netdev-FAQ.rst markup warning
        ipv6: fix races in ip6_dst_destroy()
        l2ip: fix possible use-after-free
        appletalk: Set error code if register_snap_client failed
        net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc
        rxrpc: Fix net namespace cleanup
        ipv6/flowlabel: wait rcu grace period before put_pid()
        vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach
        tcp: add sanity tests in tcp_add_backlog()
        ...
      ea986679
    • L
      Merge tag 'for-linus-20190502' of git://git.kernel.dk/linux-block · 5ce3307b
      Linus Torvalds 提交于
      Pull io_uring fixes from Jens Axboe:
       "This is mostly io_uring fixes/tweaks. Most of these were actually done
        in time for the last -rc, but I wanted to ensure that everything
        tested out great before including them. The code delta looks larger
        than it really is, as it's mostly just comment additions/changes.
      
        Outside of the comment additions/changes, this is mostly removal of
        unnecessary barriers. In all, this pull request contains:
      
         - Tweak to how we handle errors at submission time. We now post a
           completion event if the error occurs on behalf of an sqe, instead
           of returning it through the system call. If the error happens
           outside of a specific sqe, we return the error through the system
           call. This makes it nicer to use and makes the "normal" use case
           behave the same as the offload cases. (me)
      
         - Fix for a missing req reference drop from async context (me)
      
         - If an sqe is submitted with RWF_NOWAIT, don't punt it to async
           context. Return -EAGAIN directly, instead of using it as a hint to
           do async punt. (Stefan)
      
         - Fix notes on barriers (Stefan)
      
         - Remove unnecessary barriers (Stefan)
      
         - Fix potential double free of memory in setup error (Mark)
      
         - Further improve sq poll CPU validation (Mark)
      
         - Fix page allocation warning and leak on buffer registration error
           (Mark)
      
         - Fix iov_iter_type() for new no-ref flag (Ming)
      
         - Fix a case where dio doesn't honor bio no-page-ref (Ming)"
      
      * tag 'for-linus-20190502' of git://git.kernel.dk/linux-block:
        io_uring: avoid page allocation warnings
        iov_iter: fix iov_iter_type
        block: fix handling for BIO_NO_PAGE_REF
        io_uring: drop req submit reference always in async punt
        io_uring: free allocated io_memory once
        io_uring: fix SQPOLL cpu validation
        io_uring: have submission side sqe errors post a cqe
        io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP
        io_uring: remove unnecessary barrier after incrementing dropped counter
        io_uring: remove unnecessary barrier before reading SQ tail
        io_uring: remove unnecessary barrier after updating SQ head
        io_uring: remove unnecessary barrier before reading cq head
        io_uring: remove unnecessary barrier before wq_has_sleeper
        io_uring: fix notes on barriers
        io_uring: fix handling SQEs requesting NOWAIT
      5ce3307b
  3. 02 5月, 2019 12 次提交
    • L
      Merge tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · b7a5b22b
      Linus Torvalds 提交于
      Pull PCI fixes from Bjorn Helgaas:
       "I apologize for sending these so late in the cycle. We went back and
        forth about how to deal with the unexpected logging of intentional
        link state changes and finally decided to just config them off by
        default.
      
        PCI fixes:
      
         - Stop ignoring "pci=disable_acs_redir" parameter (Logan Gunthorpe)
      
         - Use shared MSI/MSI-X vector for Link Bandwidth Management (Alex
           Williamson)
      
         - Add Kconfig option for Link Bandwidth notification messages (Keith
           Busch)"
      
      * tag 'pci-v5.1-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/LINK: Add Kconfig option (default off)
        PCI/portdrv: Use shared MSI/MSI-X vector for Bandwidth Management
        PCI: Fix issue with "pci=disable_acs_redir" parameter being ignored
      b7a5b22b
    • L
      Merge tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · e2a4b102
      Linus Torvalds 提交于
      Pull MTD fix from Richard Weinberger:
       "A single regression fix for the marvell nand driver"
      
      * tag 'mtd/fixes-for-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
        mtd: rawnand: marvell: Clean the controller state before each operation
      e2a4b102
    • K
      PCI/LINK: Add Kconfig option (default off) · 2078e1e7
      Keith Busch 提交于
      e8303bb7 ("PCI/LINK: Report degraded links via link bandwidth
      notification") added dmesg logging whenever a link changes speed or width
      to a state that is considered degraded.  Unfortunately, it cannot
      differentiate signal integrity-related link changes from those
      intentionally initiated by an endpoint driver, including drivers that may
      live in userspace or VMs when making use of vfio-pci.  Some GPU drivers
      actively manage the link state to save power, which generates a stream of
      messages like this:
      
        vfio-pci 0000:07:00.0: 32.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x16 link at 0000:00:02.0 (capable of 64.000 Gb/s with 5 GT/s x16 link)
      
      Since we can't distinguish the intentional changes from the signal
      integrity issues, leave the reporting turned off by default.  Add a Kconfig
      option to turn it on if desired.
      
      Fixes: e8303bb7 ("PCI/LINK: Report degraded links via link bandwidth notification")
      Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.comSigned-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      2078e1e7
    • E
      net: ll_temac: Fix typo bug for 32-bit · 26f146ed
      Esben Haabendal 提交于
      Fixes: d84aec42 ("net: ll_temac: Fix support for 64-bit platforms")
      Signed-off-by: NEsben Haabendal <esben@geanix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26f146ed
    • E
      udp: fix GRO packet of death · 4dd2b82d
      Eric Dumazet 提交于
      syzbot was able to crash host by sending UDP packets with a 0 payload.
      
      TCP does not have this issue since we do not aggregate packets without
      payload.
      
      Since dev_gro_receive() sets gso_size based on skb_gro_len(skb)
      it seems not worth trying to cope with padded packets.
      
      BUG: KASAN: slab-out-of-bounds in skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
      Read of size 16 at addr ffff88808893fff0 by task syz-executor612/7889
      
      CPU: 0 PID: 7889 Comm: syz-executor612 Not tainted 5.1.0-rc7+ #96
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       __asan_report_load16_noabort+0x14/0x20 mm/kasan/generic_report.c:133
       skb_gro_receive+0xf5f/0x10e0 net/core/skbuff.c:3826
       udp_gro_receive_segment net/ipv4/udp_offload.c:382 [inline]
       call_gro_receive include/linux/netdevice.h:2349 [inline]
       udp_gro_receive+0xb61/0xfd0 net/ipv4/udp_offload.c:414
       udp4_gro_receive+0x763/0xeb0 net/ipv4/udp_offload.c:478
       inet_gro_receive+0xe72/0x1110 net/ipv4/af_inet.c:1510
       dev_gro_receive+0x1cd0/0x23c0 net/core/dev.c:5581
       napi_gro_frags+0x36b/0xd10 net/core/dev.c:5843
       tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
       call_write_iter include/linux/fs.h:1866 [inline]
       do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
       do_iter_write fs/read_write.c:957 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:938
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
       do_writev+0x15e/0x370 fs/read_write.c:1037
       __do_sys_writev fs/read_write.c:1110 [inline]
       __se_sys_writev fs/read_write.c:1107 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x441cc0
      Code: 05 48 3d 01 f0 ff ff 0f 83 9d 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 83 3d 51 93 29 00 00 75 14 b8 14 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f 83 74 09 fc ff c3 48 83 ec 08 e8 ba 2b 00 00
      RSP: 002b:00007ffe8c716118 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 00007ffe8c716150 RCX: 0000000000441cc0
      RDX: 0000000000000001 RSI: 00007ffe8c716170 RDI: 00000000000000f0
      RBP: 0000000000000000 R08: 000000000000ffff R09: 0000000000a64668
      R10: 0000000020000040 R11: 0000000000000246 R12: 000000000000c2d9
      R13: 0000000000402b50 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 5143:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_kmalloc mm/kasan/common.c:497 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:470
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:505
       slab_post_alloc_hook mm/slab.h:437 [inline]
       slab_alloc mm/slab.c:3393 [inline]
       kmem_cache_alloc+0x11a/0x6f0 mm/slab.c:3555
       mm_alloc+0x1d/0xd0 kernel/fork.c:1030
       bprm_mm_init fs/exec.c:363 [inline]
       __do_execve_file.isra.0+0xaa3/0x23f0 fs/exec.c:1791
       do_execveat_common fs/exec.c:1865 [inline]
       do_execve fs/exec.c:1882 [inline]
       __do_sys_execve fs/exec.c:1958 [inline]
       __se_sys_execve fs/exec.c:1953 [inline]
       __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5351:
       save_stack+0x45/0xd0 mm/kasan/common.c:75
       set_track mm/kasan/common.c:87 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:459
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:467
       __cache_free mm/slab.c:3499 [inline]
       kmem_cache_free+0x86/0x260 mm/slab.c:3765
       __mmdrop+0x238/0x320 kernel/fork.c:677
       mmdrop include/linux/sched/mm.h:49 [inline]
       finish_task_switch+0x47b/0x780 kernel/sched/core.c:2746
       context_switch kernel/sched/core.c:2880 [inline]
       __schedule+0x81b/0x1cc0 kernel/sched/core.c:3518
       preempt_schedule_irq+0xb5/0x140 kernel/sched/core.c:3745
       retint_kernel+0x1b/0x2d
       arch_local_irq_restore arch/x86/include/asm/paravirt.h:767 [inline]
       kmem_cache_free+0xab/0x260 mm/slab.c:3766
       anon_vma_chain_free mm/rmap.c:134 [inline]
       unlink_anon_vmas+0x2ba/0x870 mm/rmap.c:401
       free_pgtables+0x1af/0x2f0 mm/memory.c:394
       exit_mmap+0x2d1/0x530 mm/mmap.c:3144
       __mmput kernel/fork.c:1046 [inline]
       mmput+0x15f/0x4c0 kernel/fork.c:1067
       exec_mmap fs/exec.c:1046 [inline]
       flush_old_exec+0x8d9/0x1c20 fs/exec.c:1279
       load_elf_binary+0x9bc/0x53f0 fs/binfmt_elf.c:864
       search_binary_handler fs/exec.c:1656 [inline]
       search_binary_handler+0x17f/0x570 fs/exec.c:1634
       exec_binprm fs/exec.c:1698 [inline]
       __do_execve_file.isra.0+0x1394/0x23f0 fs/exec.c:1818
       do_execveat_common fs/exec.c:1865 [inline]
       do_execve fs/exec.c:1882 [inline]
       __do_sys_execve fs/exec.c:1958 [inline]
       __se_sys_execve fs/exec.c:1953 [inline]
       __x64_sys_execve+0x8f/0xc0 fs/exec.c:1953
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff88808893f7c0
       which belongs to the cache mm_struct of size 1496
      The buggy address is located 600 bytes to the right of
       1496-byte region [ffff88808893f7c0, ffff88808893fd98)
      The buggy address belongs to the page:
      page:ffffea0002224f80 count:1 mapcount:0 mapping:ffff88821bc40ac0 index:0xffff88808893f7c0 compound_mapcount: 0
      flags: 0x1fffc0000010200(slab|head)
      raw: 01fffc0000010200 ffffea00025b4f08 ffffea00027b9d08 ffff88821bc40ac0
      raw: ffff88808893f7c0 ffff88808893e440 0000000100000001 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88808893fe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88808893ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88808893ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                                   ^
       ffff888088940000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff888088940080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      Fixes: e20cf8d3 ("udp: implement GRO for plain UDP sockets.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4dd2b82d
    • L
      Merge tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply · 600d7258
      Linus Torvalds 提交于
      Pull power supply fixes from Sebastian Reichel:
       "Two more fixes for the 5.1 cycle.
      
        One division by zero fix in a specific driver and one core workaround
        for bad userspace behaviour from systemd regarding uevents. IMHO this
        can be considered to be a userspace bug, but the debug messages are
        useless anyways
      
         - cpcap-battery: fix a division by zero
      
         - core: fix systemd issue due to log messages produced by uevent"
      
      * tag 'for-v5.1-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply:
        power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
        power: supply: cpcap-battery: Fix division by zero
      600d7258
    • M
      ipv6: A few fixes on dereferencing rt->from · 886b7a50
      Martin KaFai Lau 提交于
      It is a followup after the fix in
      commit 9c69a132 ("route: Avoid crash from dereferencing NULL rt->from")
      
      rt6_do_redirect():
      1. NULL checking is needed on rt->from because a parallel
         fib6_info delete could happen that sets rt->from to NULL.
         (e.g. rt6_remove_exception() and fib6_drop_pcpu_from()).
      
      2. fib6_info_hold() is not enough.  Same reason as (1).
         Meaning, holding dst->__refcnt cannot ensure
         rt->from is not NULL or rt->from->fib6_ref is not 0.
      
         Instead of using fib6_info_hold_safe() which ip6_rt_cache_alloc()
         is already doing, this patch chooses to extend the rcu section
         to keep "from" dereference-able after checking for NULL.
      
      inet6_rtm_getroute():
      1. NULL checking is also needed on rt->from for a similar reason.
         Note that inet6_rtm_getroute() is using RTNL_FLAG_DOIT_UNLOCKED.
      
      Fixes: a68886a6 ("net/ipv6: Make from in rt6_info rcu protected")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NWei Wang <weiwan@google.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      886b7a50
    • N
      rds: ib: force endiannes annotation · f3505745
      Nicholas Mc Guire 提交于
      While the endiannes is being handled correctly as indicated by the comment
      above the offending line - sparse was unhappy with the missing annotation
      as be64_to_cpu() expects a __be64 argument. To mitigate this annotation
      all involved variables are changed to a consistent __le64 and the
       conversion to uint64_t delayed to the call to rds_cong_map_updated().
      Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f3505745
    • D
      Merge branch 'net-mvpp2-cls-Add-classification' · f76c4b57
      David S. Miller 提交于
      Maxime Chevallier says:
      
      ====================
      net: mvpp2: cls: Add classification
      
      This series is a rework of the previously standalone patch adding
      classification support for mvpp2 :
      
      https://lore.kernel.org/netdev/20190423075031.26074-1-maxime.chevallier@bootlin.com/
      
      This patch has been reworked according to Saeed's review, to make sure
      that the location of the rule is always respected and serves as a way to
      prioritize rules between each other. This the 3rd iteration of this
      submission, but since it's now a series, I reset the revision numbering.
      
      This series implements that in a limited configuration for now, since we
      limit the total number of rules per port to 4.
      
      The main factors for this limitation are that :
       - We share the classification tables between all ports (4 max, although
         one is only used for internal loopback), hence we have to perform a
         logical separation between rules, which is done today by dedicated
         ranges for each port in each table
      
       - The "Flow table", which dictates which lookups operations are
         performed for an ingress packet, in subdivided into 22 "sub flows",
         each corresponding to a traffic type based on the L3 proto, L4
         proto, the presence or not of a VLAN tag and the L3 fragmentation.
      
         This makes so that when adding a rule, it has to be added into each
         of these subflows, introducing duplications of entries and limiting
         our max number of entries.
      
      These limitations can be overcomed in several ways, but for readability
      sake, I'd rather submit basic classification offload support for now,
      and improve it gradually.
      
      This series also adds a small cosmetic cleanup patch (1), and also adds
      support for the "Drop" action compared to the first submission of this
      feature. It is simple enough to be added with this basic support.
      
      Compared to the first submissions, the NETIF_F_NTUPLE flag was also
      removed, following Saeed's comment.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f76c4b57
    • M
      net: mvpp2: cls: Allow dropping packets with classification offload · bec2d46d
      Maxime Chevallier 提交于
      This commit introduces support for the "Drop" action in classification
      offload. This corresponds to the "-1" action with ethtool -N.
      
      This is achieved using the color marking actions available in the C2
      engine, which associate a color to a packet. These colors can be either
      Green, Yellow or Red, Red meaning that the packet should be dropped.
      
      Green and Yellow colors are interpreted by the Policer, which isn't
      supported yet.
      
      This method of dropping using the Classifier is different than the
      already existing early-drop features, such as VLAN filtering and MAC
      UC/MC filtering, which are performed during the Parsing step, and
      therefore take precedence over classification actions.
      Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bec2d46d
    • M
      net: mvpp2: cls: Add Classification offload support · 90b509b3
      Maxime Chevallier 提交于
      This commit introduces basic classification offloading support for the
      PPv2 controller.
      
      The PPv2 classifier has many classification engines, for now we only use
      the C2 TCAM match engine.
      
      This engine allows to perform ternary lookups on 64 bits keys (called
      Header Extracted Key), that are built by extracting fields from the packet
      header and concatenating them. At most 4 fields can be extracted for a
      single lookup.
      
      This basic implementation allows to build the HEK from the following
      fields :
       - L4 source and destination ports (for UDP and TCP)
      
      More fields are to be added in the future.
      
      Classification flows are added through the ethtool interface, using the
      newly introduced flow_rule infrastructure as an internal rule
      representation, allowing to more easily implement tc flower rules if
      need be.
      
      The internal design for now allocates one range of 4 rules per port
      due to the internal design of the flow table, which uses 22 sub-flows.
      
      When inserting a classification rule, the rule is created in every
      relevant sub-flow.
      
      This low rule-count is a very simple design which reaches quickly the
      limitations of the flow table ordering, but guarantees that the rule
      ordering will always be respected.
      
      This commit only introduces support for the "steer to rxq" action.
      Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90b509b3
    • M
      net: mvpp2: cls: Use a bitfield to represent the flow_type · 84e90b0b
      Maxime Chevallier 提交于
      As of today, the classification code is used only for RSS. We split the
      incoming traffic into multiple flows, that correspond to the ethtool
      flow_type parameter.
      
      We don't want to use the ethtool flow definitions such as TCP_V4_FLOW,
      for several reason :
      
       - We want to decorrelate the driver code from ethtool as much as
         possible, so that we can easily use other interfaces such as tc flower,
      
       - We want the flow_type to be a bitfield, so that we can match flows
         embedded into each other, such as TCP4 which is a subset of IP4.
      
      This commit does the conversion to the newer type.
      Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84e90b0b