1. 28 Feb, 2020 28 commits
  2. 27 Feb, 2020 12 commits
    • Merge branch 'VLANs-DSA-switches-and-multiple-bridges' · 2b99e54b
      Committed by David S. Miller
      Russell King says:
      
      ====================
      VLANs, DSA switches and multiple bridges
      
      This is a repost of the previously posted RFC back in December, which
      did not get fully reviewed.  I've dropped the RFC tag this time as no
      one really found anything too problematical in the RFC posting.
      
      I've been trying to configure DSA for VLANs and not having much success.
      The setup is quite simple:
      
      - The main network is untagged
      - The wifi network is a vlan tagged with id $VN running over the main
        network.
      
      I have an Armada 388 Clearfog with a PCIe wifi card which I'm trying to
      set up to provide wifi access to the vlan $VN network, while the switch
      is also part of the main network.
      
      However, I'm encountering problems:
      
      1) vlan support in DSA has a different behaviour from the Linux
         software bridge implementation.
      
          # bridge vlan
          port    vlan ids
          lan1     1 PVID Egress Untagged
          ...
      
         shows the default setup - the bridge ports are all configured for
         vlan 1, untagged egress, and vlan 1 as the port vid.  Issuing:
      
          # ip li set dev br0 type bridge vlan_filtering 1
      
         with no other vlan configuration commands on a Linux software bridge
         continues to allow untagged traffic to flow across the bridge.
      
         This difference in behaviour is because the MV88E6xxx VTU is
         completely empty - because net/dsa ignores all vlan settings for
         a port if br_vlan_enabled(dp->bridge_dev) is false - this reflects
         the vlan filtering state of the bridge, not whether the bridge is
         vlan aware.
      
         What this means is that attempting to configure the bridge port
         vlans before enabling vlan filtering works for Linux software
         bridges, but fails for DSA bridges.
      
      2) Assuming the above is sorted, we move on to the next issue, which
         is altogether more weird.  Let's take a setup where we have a
         DSA bridge with lan1..6 in a bridge device, br0, with vlan
         filtering enabled.  lan1 is the upstream port, lan2 is a downstream
         port that also wants to see traffic on vlan id $VN.
      
         Both lan1 and lan2 are configured for that:
      
           # bridge vlan add vid $VN dev lan1
           # bridge vlan add vid $VN dev lan2
           # ip li set br0 type bridge vlan_filtering 1
      
         Untagged traffic can now pass between all the six lan ports, and
         vlan $VN between lan1 and lan2 only.  The MV88E6xxx 8021q_mode
         debugfs file shows all lan ports are in mode "secure" - this is
         important!  /sys/class/net/br0/bridge/vlan_filtering contains 1.
      
         tcpdumping from another machine on lan4 shows that no $VN traffic
         reaches it.  Everything seems to be working correctly...
      
         In order to further bridge vlan $VN traffic to hostapd's wifi
         interface, things get a little more complex - we can't add hostapd's
         wifi interface to br0 directly, because hostapd will bring up the
         wifi interface and leak the main, untagged traffic onto the wifi.
         (hostapd does have vlan support, but only as a dynamic per-client
         thing, and there are no hooks I can see to allow script-based config
         of the network setup before hostapd brings up the wifi interface.)
      
         So, what I tried was:
      
           # ip li add link br0 name br0.$VN type vlan id $VN
           # bridge vlan add vid $VN dev br0 self
           # ip li set dev br0.$VN up
      
         So far so good, we get a vlan interface on top of the bridge, and
         tcpdumping it shows we get traffic.  The 8021q_mode file has not
         changed state.  Everything still seems to be correct.
      
           # bridge addbr br1
      
         Still nothing has changed.
      
           # bridge addif br1 br0.$VN
      
         And now the 8021q_mode debugfs file shows that all ports are now in
         "disabled" mode, but /sys/class/net/br0/bridge/vlan_filtering still
         contains '1'.  In other words, br0 still thinks vlan filtering is
         enabled, but the hardware has had vlan filtering disabled.
      
         Adding some stack traces to an appropriate point indicates that this
         is because __switchdev_handle_port_attr_set() recurses down through
         the tree of interfaces, skipping over the vlan interface, applying
         br1's configuration to br0's ports.
      
         This surely cannot be right - surely
         __switchdev_handle_port_attr_set() and similar should stop recursing
         down through another master bridge device?  There are probably other
         network device classes that switchdev shouldn't recurse through either.
      
         I've considered whether switchdev is the right level to do it, and
         I think it is - as we want the check/set callbacks to be called for
         the top level device even if it is a master bridge device, but we
         don't want to recurse through a lower master bridge device.
      
      v2: dropped patch 3, since that has an outstanding issue, and my
      question on it has not been answered.  Otherwise, these are the
      same patches.  Maybe we can move forward with just these two?
      
      v3: include DSA ports in patch 2
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: fix duplicate vlan warning · 933b4425
      Committed by Russell King
      When setting VLANs on DSA switches, dsa_slave_vlan_add() adds the VLAN
      to the port concerned, to the CPU port, and to any DSA ports.  If
      multiple ports are configured with the same VLAN ID, this triggers a
      warning on the CPU and DSA ports.
      
      Avoid this warning for CPU and DSA ports.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: switchdev: do not propagate bridge updates across bridges · 07c6f980
      Committed by Russell King
      When configuring a tree of independent bridges, propagating changes
      from the upper bridge across a bridge master to the lower bridge
      ports brings surprises.
      
      For example, a lower bridge may have vlan filtering enabled.  It
      may have a vlan interface attached to the bridge master, which may
      then be incorporated into another bridge.  As soon as the lower
      bridge vlan interface is attached to the upper bridge, the lower
      bridge has vlan filtering disabled.
      
      This occurs because switchdev recursively applies its changes to
      all lower devices no matter what.
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
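The propagation rule described in this commit can be sketched with a small userspace model. This is only an illustration under stated assumptions: `struct model_dev`, its fields, and `model_attr_set()` are hypothetical stand-ins, not the kernel's `struct net_device` or the switchdev code itself. The sketch applies a setting to the top-level device's lower devices, but refuses to cross into a lower device that is itself a bridge master.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOWER 8

/* Hypothetical, simplified stand-in for a network device. */
struct model_dev {
	const char *name;
	bool is_bridge_master;   /* device is itself a bridge, e.g. br0 or br1 */
	bool is_switch_port;     /* device is a port that takes the setting */
	bool vlan_filtering;     /* last value applied to this port */
	struct model_dev *lower[MAX_LOWER];
	size_t n_lower;
};

/*
 * Apply a vlan_filtering change starting at dev.  The device passed in is
 * always processed, even if it is a bridge master; only bridge masters
 * found further down the tree are skipped.
 */
static void model_attr_set(struct model_dev *dev, bool vlan_filtering)
{
	size_t i;

	if (dev->is_switch_port) {
		dev->vlan_filtering = vlan_filtering;
		return;
	}

	for (i = 0; i < dev->n_lower; i++) {
		struct model_dev *lower = dev->lower[i];

		/* Do not propagate into an independent lower bridge. */
		if (lower->is_bridge_master)
			continue;

		model_attr_set(lower, vlan_filtering);
	}
}
```

With a tree shaped like the one in the cover letter (br1 over a vlan interface over br0 over lan1), a change made on br1 no longer reaches lan1 in this model, while a change made directly on br0 still does.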
    • net: qrtr: Fix error pointer vs NULL bugs · 9baeea50
      Committed by Dan Carpenter
      The callers only expect NULL pointers, so returning an error pointer
      will lead to an Oops.
      
      Fixes: 0c2204a4 ("net: qrtr: Migrate nameservice to kernel from userspace")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
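The class of bug named in this commit can be shown with a userspace model. The helpers below only imitate the kernel's error-pointer convention (an errno value encoded in the top of the address range); `model_err_ptr()`, `model_is_err()`, `struct model_node` and the lookup functions are hypothetical names for illustration and are not taken from the qrtr code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (model of the kernel idiom). */
static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if ptr falls in the range reserved for encoded errno values. */
static inline bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_node {
	int id;
};

static struct model_node *find_node(struct model_node *table, size_t n, int id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i].id == id)
			return &table[i];
	return NULL;
}

/* Problematic variant: failure is reported as an error pointer. */
static struct model_node *lookup_buggy(struct model_node *table, size_t n, int id)
{
	struct model_node *node = find_node(table, n, id);

	return node ? node : model_err_ptr(-ENOENT);
}

/* Variant matching what the callers expect: failure is reported as NULL. */
static struct model_node *lookup_fixed(struct model_node *table, size_t n, int id)
{
	return find_node(table, n, id);
}

/* A caller that only tests for NULL before using the result. */
static bool caller_would_dereference(const struct model_node *node)
{
	return node != NULL;
}
```

An error pointer is not NULL, so a caller written like `caller_would_dereference()` treats the failure result of `lookup_buggy()` as a valid object, which is the path to an Oops described above.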
    • net: phy: mscc: add missing shift for media operation mode selection · 1ac7b090
      Committed by Antoine Tenart
      This patch adds a missing shift for the media operation mode selection.
      It does not change current behaviour, because the only mode in use
      (copper) has a value of 0, but without the shift other modes would not
      work.
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
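Why a missing shift is harmless for a value of 0 but wrong for any other value can be seen in a short sketch. The field position, mask, and mode values below are assumptions chosen for illustration (a 3-bit field in bits 10:8 of a 16-bit register); they are not quoted from the mscc driver.

```c
#include <stdint.h>

/* Assumed layout: a 3-bit mode field in bits 10:8 of a 16-bit register. */
#define MODEL_MODE_MASK    0x0700u
#define MODEL_MODE_POS     8
#define MODEL_MODE_COPPER  0u
#define MODEL_MODE_OTHER   2u   /* hypothetical non-zero mode */

/* Writes the mode value without shifting it into the field. */
static uint16_t set_mode_unshifted(uint16_t reg, uint16_t mode)
{
	return (uint16_t)((reg & ~MODEL_MODE_MASK) | mode);
}

/* Shifts the mode value into the field before merging it. */
static uint16_t set_mode_shifted(uint16_t reg, uint16_t mode)
{
	return (uint16_t)((reg & ~MODEL_MODE_MASK) |
			  (((unsigned int)mode << MODEL_MODE_POS) & MODEL_MODE_MASK));
}
```

For `MODEL_MODE_COPPER` both helpers produce the same register value, since 0 shifted by any amount is still 0. For `MODEL_MODE_OTHER` the unshifted helper sets bit 1 of the register instead of the mode field.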
    • net: ena: fix broken interface between ENA driver and FW · 92040c6d
      Committed by Arthur Kiyanovski
      In this commit we revert the part of
      commit 1a63443a ("net/amazon: Ensure that driver version is aligned to the linux kernel"),
      which breaks the interface between the ENA driver and FW.
      
      We also replace the use of DRIVER_VERSION with DRIVER_GENERATION
      when we bring back the deleted constants that are used in the
      interface with the ENA device FW.
      
      This commit does not change the driver version reported to the user via
      ethtool, which remains the kernel version.
      
      Fixes: 1a63443a ("net/amazon: Ensure that driver version is aligned to the linux kernel")
      Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mptcp-update-mptcp-ack-sequence-outside-of-recv-path' · 621135a0
      Committed by David S. Miller
      Florian Westphal says:
      
      ====================
      mptcp: update mptcp ack sequence outside of recv path
      
      This series moves mptcp-level ack sequence update outside of the recvmsg path.
      Current approach has two problems:
      
      1. There is a delay between the arrival of new data and the time we
         can ack this data.
      2. If userspace doesn't call recv for some time, mptcp ack_seq is not
         updated at all, even if this data is queued in the subflow socket
         receive queue.
      
      Move skbs from the subflow socket receive queue to the mptcp-level
      receive queue, updating the mptcp-level ack sequence and have recv
      take skbs from the mptcp-level receive queue.
      
      The first place where we will attempt to update the mptcp level acks
      is from the subflows' data_ready callback, even before we make userspace
      aware of new data.
      
      Because of a possible deadlock (we need to take the mptcp socket lock
      while already holding the subflow socket's lock), we may still need to
      defer the mptcp-level ack update.  In that case, this work will be
      done either from the work queue or from the recv path, whichever runs
      sooner.
      
      In order to avoid pointless scheduling of the work queue, work
      will be queued from the mptcp socket's lock release callback.
      This makes it possible to detect when the socket owner has already
      drained the subflow socket receive queue.
      
      Please see individual patches for more information.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: defer work schedule until mptcp lock is released · 14c441b5
      Committed by Paolo Abeni
      Don't schedule the work queue right away, instead defer this
      to the lock release callback.
      
      This has the advantage that it will give the recv path a chance to
      complete -- this might have moved all pending packets from the
      subflow to the mptcp receive queue, which makes it possible to skip
      the schedule_work().
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: avoid work queue scheduling if possible · 2e52213c
      Committed by Florian Westphal
      We can't lock_sock() the mptcp socket from the subflow data_ready callback:
      it would result in an ABBA deadlock with the subflow socket lock.
      
      We can however grab the spinlock: if that succeeds and the mptcp socket
      is not owned at the moment, we can process the new skbs right away
      without deferring this to the work queue.
      
      This avoids the schedule_work and hence the small delay until the
      work item is processed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: remove mptcp_read_actor · bfae9dae
      Committed by Florian Westphal
      Only used to discard stale data from the subflow, so move
      it where needed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add rmem queue accounting · 600911ff
      Committed by Florian Westphal
      If userspace never drains the receive buffers we must stop draining
      the subflow socket(s) at some point.
      
      This adds the needed rmem accounting for this.
      If the threshold is reached, we stop draining the subflows.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
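The stop condition described in this commit can be sketched as a bounded drain loop. This is a simplified model, not the mptcp implementation: `struct model_msk`, its fields, and the exact comparison against the limit are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical model of the mptcp-level socket's receive accounting. */
struct model_msk {
	size_t rmem_alloc;   /* bytes currently charged to the receive queue */
	size_t rcvbuf;       /* threshold above which draining stops */
};

/*
 * Move packets from a subflow (modelled as an array of packet sizes) to the
 * mptcp-level queue, charging each one to rmem_alloc.  Draining stops once
 * the charged memory exceeds the threshold.  Returns the number moved.
 */
static size_t model_drain_subflow(struct model_msk *msk,
				  const size_t *pkt_sizes, size_t n_pkts)
{
	size_t moved = 0;

	while (moved < n_pkts) {
		if (msk->rmem_alloc > msk->rcvbuf)
			break;

		msk->rmem_alloc += pkt_sizes[moved];
		moved++;
	}

	return moved;
}
```

Packets that are not moved stay in the subflow, so a reader that never drains the mptcp-level queue stops pulling data out of the subflows once the threshold is crossed.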
    • mptcp: update mptcp ack sequence from work queue · 6771bfd9
      Committed by Florian Westphal
      If userspace is not reading data, all the mptcp-level acks contain the
      ack_seq from the last time userspace read data rather than the most
      recent in-sequence value.
      
      This causes pointless retransmissions for data that is already queued.
      
      The reason for this is that all the mptcp protocol level processing
      happens at mptcp_recv time.
      
      This adds a work queue to move skbs from the subflow sockets' receive
      queues to the mptcp socket receive queue (which was not used so far).
      
      This allows us to announce the correct mptcp ack sequence in a timely
      fashion, even when the application does not call recv() on the mptcp socket
      for some time.
      
      We still wake userspace tasks waiting for POLLIN immediately:
      If the mptcp level receive queue is empty (because the work queue is
      still pending) it can be filled from in-sequence subflow sockets at
      recv time without a need to wait for the worker.
      
      The skb_orphan when moving skbs from subflow to mptcp level is needed,
      because the destructor (sock_rfree) relies on skb->sk (ssk!) lock
      being taken.
      
      A followup patch will add the needed rmem accounting for the moved skbs.
      
      Other problem: if the application behaves as expected and calls
      recv() as soon as the mptcp socket becomes readable, the work queue will
      only waste cpu cycles.  This will also be addressed in followup patches.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
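The effect of moving skbs out of the subflow queue ahead of recv() can be sketched as follows. This is a deliberately reduced model: `struct model_skb`, `map_seq`, and `model_move_in_sequence()` are hypothetical names, and real mptcp sequence mapping is considerably more involved. The point shown is only that the ack sequence advances as in-sequence data is moved, independent of when userspace reads it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a queued segment with its mptcp-level sequence. */
struct model_skb {
	uint64_t map_seq;   /* mptcp-level sequence number of the first byte */
	uint32_t len;       /* payload length in bytes */
};

/*
 * Move segments from the head of a subflow queue while they are in
 * sequence, advancing *ack_seq past each moved segment.  Stops at the
 * first segment that does not start at the current ack sequence.
 * Returns the number of segments moved.
 */
static size_t model_move_in_sequence(uint64_t *ack_seq,
				     const struct model_skb *queue, size_t n)
{
	size_t moved = 0;

	while (moved < n && queue[moved].map_seq == *ack_seq) {
		*ack_seq += queue[moved].len;
		moved++;
	}

	return moved;
}
```

In this model, calling the function from a worker rather than from the read path means the value in `*ack_seq` is already current when the next ack is built, even if the application has not called recv() for a while.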