1. 08 Feb, 2017 13 commits
  2. 07 Feb, 2017 27 commits
    • D
      Merge branch 'dsa2-pdata' · 521613c5
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      net: dsa: Support for pdata in dsa2
      
      This is not exactly new, and was sent before, although back then, I did not
      have a user of the pre-declared MDIO board information, but now we do. Note
      that I have additional changes queued up to have b53 register platform data for
      MIPS bcm47xx and bcm63xx.
      
      Yes I know that we should have the Orion platforms eventually be converted to
      Device Tree, but until that happens, I don't want any remaining users of the
      old "dsa" platform device (hence the previous DTS submissions for ARM/mvebu)
      and there will be platforms out there that most likely will never see DT
      coming their way (BCM47xx is almost 100% sure, BCM63xx maybe not in a distant
      future).
      
      We would probably want the whole series to be merged via David Miller's tree
      to simplify things.
      
      Thanks!
      
      Changes in v5:
      
      - dropped changes to drivers/base/ because after more than a month, we cannot
        get any answer from Greg KH
      
      Changes in v4:
      
      - Changed device_find_class() to device_find_in_class_name()
      - Added kerneldoc above device_find_in_class_name() to explain what it does
        and the calling convention regarding device reference counts
      - Changed dev_to_net_device to device_to_net_device() added comments
        about what it does and the caller conventions regarding reference counts
      
      Changes in v3:
      
      - Tested EPROBE_DEFER from a mockup MDIO/DSA switch driver and everything
        is fine, once the driver finally probes we have access to platform data
        as expected
      
      - added comment above dsa_port_is_valid() that port->name is mandatory
        for platform data cases
      
      - added an extra check in dsa_parse_member() for a NULL pdata pointer
      
      - fixed a bunch of checkpatch errors and warnings
      
      Changes in v2:
      
      - Rebased against latest net-next/master
      
      - Moved dev_find_class() to device_find_class() into drivers/base/core.c
      
      - Moved dev_to_net_device into net/core/dev.c
      
      - Utilize dsa_chip_data directly instead of dsa_platform_data
      
      - Augmented dsa_chip_data to be multi-CPU port ready
      
      Changes from last submission (few months back):
      
      - rebased against latest net-next
      
      - do not introduce dsa2_platform_data which was overkill and was meant to
        allow us to do exactly the same things with platform data and Device Tree;
        we use the existing dsa_platform_data instead
      
      - properly register MDIO devices when the MDIO bus is registered and associate
        platform_data with them
      
      - add a change to the Orion platform code to demonstrate how this can be used
      
      Thank you
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      521613c5
    • F
      ARM: orion: Register DSA switch as a MDIO device · 575e93f7
      Florian Fainelli committed
      Utilize the ability to pass board specific MDIO bus information towards a
      particular MDIO device thus allowing us to provide the per-port switch layout
      to the Marvell 88E6XXX switch driver.
      
      Since we would end-up with conflicting registration paths, do not register the
      "dsa" platform device anymore.
      
      Note that the MDIO devices registered by code in net/dsa/dsa2.c do not
      parse a dsa_platform_data, but directly take a dsa_chip_data (specific
      to a single switch chip), so we update the different call sites to pass
      this structure down to orion_ge00_switch_init().
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      575e93f7
    • F
      net: phy: Allow pre-declaration of MDIO devices · 648ea013
      Florian Fainelli committed
      Allow board support code to collect pre-declarations for MDIO devices by
      registering them with mdiobus_register_board_info(). SPI and I2C buses
      have a similar feature; we were missing this for MDIO devices, but this
      is particularly useful for e.g. MDIO-connected switches which need to
      provide their port layout (often board-specific) to an MDIO Ethernet
      switch driver.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      648ea013
    • F
      net: dsa: Add support for platform data · 71e0bbde
      Florian Fainelli committed
      Allow drivers to use the new DSA API with platform data. Most of the
      code in net/dsa/dsa2.c does not rely so much on device_nodes and can get
      the same information from platform_data instead.
      
      We purposely do not support distributed configurations with platform
      data, so drivers should be providing a pointer to a 'struct
      dsa_chip_data' structure if they wish to communicate per-port layout.
      
      Multiple CPU ports could potentially be supported and dsa_chip_data is
      extended to receive up to one reference to an upstream network device
      per port described by a dsa_chip_data structure.
      
      dsa_dev_to_net_device() increments the network device's reference count,
      so we intentionally call dev_put() to be consistent with the DT-enabled
      path, until we have a generic notifier based solution.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      71e0bbde
    • F
      net: dsa: Rename and export dev_to_net_device() · 14b89f36
      Florian Fainelli committed
      In preparation for using this function in net/dsa/dsa2.c, rename the function
      to make its scope DSA specific, and export it.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14b89f36
    • A
      net: dsa: mv88e6xxx: Refactor remaining port setup · a23b2961
      Andrew Lunn committed
      Move the remaining port configuration code which varies per device
      into port.c, using ops where necessary. This makes
      mv88e6xxx_6185_family() and mv88e6xxx_6095_family() unused, so remove
      them.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a23b2961
    • A
      net: dsa: mv88e6xxx: Implement Clause 45 access to SMI devices · cf3e80df
      Andrew Lunn committed
      The mv88e6390 MDIO bus controllers can support clause 45 accesses.
      The internal SERDES interfaces need this, and it is likely external
      10Gbps PHYs will be clause 45.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cf3e80df
    • D
      Merge branch 'mv88e6390-CMODE' · 8661a631
      David S. Miller committed
      Andrew Lunn says:
      
      ====================
      Set the CMODE for mv88e6390 ports
      
      The mv88e6390 ports 9 & 10 allow their CMODE to be set. CMODE is part
      of what linux defines as phy-mode. Add the needed phy-modes to linux,
      and add code which will act upon the phy-mode property to configure
      the switch port.
      
      These patches have been posted before as part of a bigger patchset
      which has now been broken up. I've added the Reviewed-by tags
      I received, and added device tree documentation.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8661a631
    • A
      net: dsa: mv88e6xxx: Set the CMODE for mv88e6390 ports 9 & 10 · f39908d3
      Andrew Lunn committed
      Unlike most ports, ports 9 and 10 of the 6390X family have configurable
      PHY modes. Set the mode as part of adjust_link().
      
      Ordering is important, because the SERDES interfaces connected to
      ports 9 and 10 can be split and assigned to other ports. The CMODE has
      to be correctly set before the SERDES interface on another port can be
      configured. Such configuration is likely to be performed in
      port_enable() and port_disable(), called on slave_open() and
      slave_close().
      
      The simple case is when ports 9 and 10 are used for 'CPU' or 'DSA'. In this
      case, the CMODE is set via a phy-mode in dsa_cpu_dsa_setup(), which is
      called early in the switch setup.
      
      When ports 9 or 10 are used as user ports and have a fixed-phy, when
      the fixed-phy is attached, dsa_slave_adjust_link() is called,
      which results in the adjust_link function being called, setting the
      cmode. The port_enable() for other ports will be called much
      later.
      
      When ports 9 or 10 are used as user ports and have a real phy attached
      which does not use all the available SERDES interface, e.g. a 1Gbps
      SGMII, there is currently no mechanism in place to set the CMODE of
      the port from software. It must be hoped the strapping resistors are
      correct.
      
      At the same time, add a function to get the cmode. This will be needed
      when configuring the SERDES interfaces.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f39908d3
    • A
      net: phy: Add 2000base-x, 2500base-x and rxaui modes · 55601a88
      Andrew Lunn committed
      The mv88e6390 ports 9 and 10 support some additional PHY modes. Add
      these modes to the PHY core so they can be used in the binding.
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      55601a88
    • D
      Merge branch 'virtio_net-XDP-adjust_head' · 108d9c71
      David S. Miller committed
      John Fastabend says:
      
      ====================
      XDP adjust head support for virtio
      
      This series adds adjust head support for virtio. The following is my
      test setup. I use qemu + virtio as follows,
      
      ./x86_64-softmmu/qemu-system-x86_64 \
        -hda /var/lib/libvirt/images/Fedora-test0.img \
        -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
        -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
      
      In order to use XDP with virtio, until LRO is supported, TSO must be
      turned off in the host. The important fields in the above command line
      are the following:
      
        guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off
      
      Also note it is possible to consume more queues than can be supported
      because, when XDP is enabled for retransmit, XDP attempts to use a queue
      per cpu. My standard queue count is 'queues=4'.
      
      After loading the VM I run the relevant XDP test programs in,
      
        ./samples/bpf
      
      For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
      with iperf (-d option to get bidirectional traffic), ping, and pktgen.
      I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
      the normal traffic path to the stack continues to work with XDP loaded.
      
      It would be great to automate this soon. At the moment I do it by hand
      which is starting to get tedious.
      
      v2: original series dropped trace points after merge.
      ====================
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      108d9c71
    • J
      virtio_net: XDP support for adjust_head · 2de2f7f4
      John Fastabend committed
      Add support for XDP adjust head by allocating a 256B header region
      that XDP programs can grow into. This is only enabled when an XDP
      program is loaded.
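
A minimal userspace sketch of the headroom idea described above, not the virtio_net code itself. The 256-byte figure comes from the commit message; the names `rx_buf`, `rx_buf_init`, `adjust_head` and the 1500-byte payload limit are hypothetical and chosen only for illustration. A negative delta grows the packet into the reserved headroom, a positive delta trims bytes from the front.

```c
#include <assert.h>
#include <stddef.h>

#define XDP_HEADROOM 256   /* headroom size named in the commit message */
#define MAX_PKT_LEN  1500  /* assumed payload limit for this sketch */

/* Hypothetical receive buffer: the packet initially starts after the headroom. */
struct rx_buf {
	unsigned char mem[XDP_HEADROOM + MAX_PKT_LEN];
	size_t data; /* offset of the first packet byte */
	size_t end;  /* offset one past the last packet byte */
};

static void rx_buf_init(struct rx_buf *b, size_t pkt_len)
{
	assert(pkt_len <= MAX_PKT_LEN);
	b->data = XDP_HEADROOM;
	b->end = XDP_HEADROOM + pkt_len;
}

/*
 * Move the packet start by delta bytes. Returns 0 on success and -1 if
 * the new start would leave the buffer or reach the end of the packet;
 * on failure the buffer is left unchanged.
 */
static int adjust_head(struct rx_buf *b, long delta)
{
	long new_data = (long)b->data + delta;

	if (new_data < 0 || (size_t)new_data >= b->end)
		return -1;
	b->data = (size_t)new_data;
	return 0;
}
```

With a 64-byte packet, `adjust_head(&b, -20)` succeeds and makes room for a 20-byte encapsulation header, while `adjust_head(&b, -300)` fails because only 256 bytes were reserved.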
      
      In order to ensure that we do not have to unwind queue headroom, push
      queue setup below bpf_prog_add. It reads better to do a prog ref
      unwind vs another queue setup call.
      
      At the moment this code must do a full reset to ensure old buffers
      without headroom on program add or with headroom on program removal
      are not used incorrectly in the datapath. Ideally we would only
      have to disable/enable the RX queues being updated, but there is no
      API to do this at the moment in virtio, so use the big hammer. In
      practice it is likely not that big of a problem, as this will only
      happen when XDP is enabled/disabled; changing programs does not
      require the reset. There is some risk that the driver may either
      have an allocation failure or for some reason fail to correctly
      negotiate with the underlying backend; in this case the driver will
      be left uninitialized. I have not seen this ever happen on my test
      systems and, for what it's worth, this same failure case can occur
      from probe and other contexts in the virtio framework.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2de2f7f4
    • J
      virtio_net: refactor freeze/restore logic into virtnet reset logic · 9fe7bfce
      John Fastabend committed
      For XDP we will need to reset the queues to allow for buffer headroom
      to be configured. In order to do this we need to essentially run the
      freeze()/restore() code path. Unfortunately, the locking requirements
      of the freeze/restore and reset paths are different, so
      we cannot simply reuse the code.
      
      This patch refactors the code path and adds a reset helper routine.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9fe7bfce
    • J
      virtio_net: remove duplicate queue pair binding in XDP · 722d8283
      John Fastabend committed
      Factor out qp assignment.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      722d8283
    • J
      virtio_net: factor out xdp handler for readability · 0354e4d1
      John Fastabend committed
      At this point the do_xdp_prog is mostly if/else branches handling
      the different modes of virtio_net. So remove it and handle running
      the program in the per mode handlers.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0354e4d1
    • J
      virtio_net: wrap rtnl_lock in test for calling with lock already held · 47315329
      John Fastabend committed
      For XDP use case and to allow ethtool reset tests it is useful to be
      able to use reset paths from contexts where rtnl lock is already
      held.
      
      This requires updating virtnet_set_queues and free_receive_bufs, the
      two places where rtnl_lock is taken in virtio_net. To do this we
      use the following pattern,
      
      	_foo(...) { do stuff }
      	foo(...) { rtnl_lock(); _foo(...); rtnl_unlock(); }
      
      this allows us to use the freeze()/restore() flow from both contexts.
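
The locked/unlocked wrapper pattern above can be sketched in userspace C as follows. This is only an illustration: a pthread mutex stands in for the RTNL lock, and `set_queues_locked`, `set_queues`, `reset_device` and `queue_pairs` are hypothetical names, not functions from virtio_net.

```c
#include <pthread.h>

/* Stand-in for the RTNL mutex (assumption: any non-recursive lock behaves alike). */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static int queue_pairs;

/* The "_foo" half of the pattern: does the work, caller must hold cfg_lock. */
static int set_queues_locked(int pairs)
{
	if (pairs < 1)
		return -1;
	queue_pairs = pairs;
	return 0;
}

/* The "foo" half: entry point for callers that do not hold the lock. */
static int set_queues(int pairs)
{
	int err;

	pthread_mutex_lock(&cfg_lock);
	err = set_queues_locked(pairs);
	pthread_mutex_unlock(&cfg_lock);
	return err;
}

/*
 * A reset-style path that already holds the lock and therefore calls the
 * locked helper directly; calling set_queues() here would self-deadlock.
 */
static int reset_device(int pairs)
{
	int err;

	pthread_mutex_lock(&cfg_lock);
	err = set_queues_locked(pairs);
	pthread_mutex_unlock(&cfg_lock);
	return err;
}
```

Splitting the function this way keeps the locking decision at the call site, so the same body can serve both the context that takes the lock itself and the context that enters with it held.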
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47315329
    • D
      Merge branch 'bridge-improve-cache-utilization' · 152bff37
      David S. Miller committed
      Nikolay Aleksandrov says:
      
      ====================
      bridge: improve cache utilization
      
      This is the first set which begins to deal with the bad bridge cache
      access patterns. The first patch rearranges the bridge and port structs
      a little so the frequently (and closely) accessed members are in the same
      cache line. The second patch then moves the garbage collection to a
      workqueue trying to improve system responsiveness under load (many fdbs)
      and more importantly removes the need to check if the matched entry is
      expired in __br_fdb_get which was a major source of false-sharing.
      The third patch is a preparation for the final one, which limits writes
      to the used and updated fields to at most once per jiffy.
      If properly configured, i.e. ports bound to CPUs (thus updating "updated"
      locally) then the bridge's HitM goes from 100% to 0%, but even without
      binding we get a win because previously every lookup that iterated over
      the hash chain caused false-sharing due to the first cache line being
      used for both mac/vid and used/updated fields.
      
      Some results from tests I've run:
      (note that these were run in good conditions for the baseline, everything
       ran on a single NUMA node and there were only 3 fdbs)
      
      1. baseline
      100% Load HitM on the fdbs (between everyone who has done lookups and hit
                                  one of the 3 hash chains of the communicating
                                  src/dst fdbs)
      Overall 5.06% Load HitM for the bridge, first place in the list
      
      2. patched & ports bound to CPUs
      0% Local load HitM, bridge is not even in the c2c report list
      Also there's 3% consistent improvement in netperf tests.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      152bff37
    • N
      bridge: fdb: write to used and updated at most once per jiffy · 83a718d6
      Nikolay Aleksandrov committed
      Writing once per jiffy is enough to limit the bridge's false sharing.
      After this change the bridge doesn't show up in the local load HitM stats.
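
One way to read "write at most once per jiffy" is a compare-before-store: the field is written only when the timestamp has actually changed, so repeated packets within the same tick leave the cache line clean. The sketch below assumes that reading; `fdb_entry` and `fdb_touch` are hypothetical names and `now` stands in for the kernel's jiffies counter.

```c
#include <stdbool.h>

/* Hypothetical, reduced fdb entry holding only the two timestamp fields. */
struct fdb_entry {
	unsigned long updated;
	unsigned long used;
};

/*
 * Store `now` into *field only if it differs from the current value.
 * Returns true when a write actually happened. Reads do not dirty the
 * cache line, so skipping the redundant store avoids invalidating the
 * line in other CPUs' caches.
 */
static bool fdb_touch(unsigned long *field, unsigned long now)
{
	if (*field == now)
		return false;
	*field = now;
	return true;
}
```

Calling `fdb_touch(&f.updated, now)` for every received packet then results in one store per tick instead of one store per packet.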
      Suggested-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83a718d6
    • N
      bridge: move write-heavy fdb members in their own cache line · 1214628c
      Nikolay Aleksandrov committed
      Fdb's used and updated fields are written to on every packet forward and
      packet receive respectively. Thus if we are receiving packets from a
      particular fdb, they'll cause false-sharing with everyone who has looked
      it up (even if it didn't match, since mac/vid share cache line!). The
      "used" field is even worse since it is updated on every packet forward
      to that fdb, thus the standard config where X ports use a single gateway
      results in 100% fdb false-sharing. Note that this patch does not prevent
      the last scenario, but it makes it better for other bridge participants
      which are not using that fdb (and are only doing lookups over it).
      The point is that with this move we make sure that only communicating parties
      get the false-sharing; in a later patch we'll show how to avoid that too.
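
The layout change described above can be illustrated in plain C11 with `alignas`. This is a sketch under assumptions: a 64-byte cache line, a reduced hypothetical `fdb_entry`, and standard C alignment in place of whatever annotation the kernel patch actually uses.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumed line size for this sketch */

/* Lookup key compared on every hash-chain walk. */
struct fdb_key {
	unsigned char addr[6];
	unsigned short vlan_id;
};

struct fdb_entry {
	/* Read-mostly members: touched by every lookup that walks the chain. */
	struct fdb_key key;
	void *dst;

	/*
	 * Write-heavy members: forced onto their own cache line so that
	 * stores to them do not invalidate the line holding the key.
	 */
	alignas(CACHE_LINE_SIZE) unsigned long updated;
	unsigned long used;
};
```

With this layout a CPU that only compares `key` during a lookup never shares a cache line with the CPUs that are writing `updated` or `used`.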
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1214628c
    • N
      bridge: move to workqueue gc · f7cdee8a
      Nikolay Aleksandrov committed
      Move the fdb garbage collector to a workqueue which fires at least 10
      milliseconds apart and cleans chain by chain allowing for other tasks
      to run in the meantime. When having thousands of fdbs the system is much
      more responsive. Most importantly, remove the need to check if the
      matched entry has expired in __br_fdb_get, which causes false-sharing and
      is completely unnecessary if we clean up entries; at worst we'll get 10ms
      of traffic for that entry before it gets deleted.
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f7cdee8a
    • N
      bridge: modify bridge and port to have often accessed fields in one cache line · 1f90c7f3
      Nikolay Aleksandrov committed
      Move around net_bridge so the vlan fields are in the beginning since
      they're checked on every packet even if vlan filtering is disabled.
      For the port move flags & vlan group to the beginning, so they're in the
      same cache line with the port's state (both flags and state are checked
      on each packet).
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f90c7f3
    • W
      bpf: enable verifier to add 0 to packet ptr · 63dfef75
      William Tu committed
      The patch fixes the case when adding a zero value to the packet
      pointer.  The zero value could come from src_reg equals type
      BPF_K or CONST_IMM.  The patch fixes both, otherwise the verifier
      reports the following error:
        [...]
          R0=imm0,min_value=0,max_value=0
          R1=pkt(id=0,off=0,r=4)
          R2=pkt_end R3=fp-12
          R4=imm4,min_value=4,max_value=4
          R5=pkt(id=0,off=4,r=4)
        269: (bf) r2 = r0     // r2 becomes imm0
        270: (77) r2 >>= 3
        271: (bf) r4 = r1     // r4 becomes pkt ptr
        272: (0f) r4 += r2    // r4 += 0
        addition of negative constant to packet pointer is not allowed
      Signed-off-by: William Tu <u9012063@gmail.com>
      Signed-off-by: Mihai Budiu <mbudiu@vmware.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63dfef75
    • J
      bpf: test for AND edge cases · 29200c19
      Josef Bacik committed
      These two tests are based on the work done for f23cc643.  The first test is
      just a basic one to make sure we don't allow AND'ing negative values, even if it
      would result in a valid index for the array.  The second is a cleaned up version
      of the original testcase provided by Jann Horn that resulted in the commit.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      29200c19
    • D
      Merge branch 'dsa-add-fabric-notifier' · 9172d2a0
      David S. Miller committed
      Vivien Didelot says:
      
      ====================
      net: dsa: add fabric notifier
      
      When a switch fabric is composed of multiple switch chips, these chips
      must be programmed accordingly when an event occurs on one of them.
      
      An example of such an event is hardware bridging: when a Linux bridge
      spans interconnected chips, they must be programmed to allow external
      ports to ingress frames on their internal ports.
      
      Another example is cross-chip hardware VLANs. Switch chips in-between
      interconnected bridge ports must also configure a given VLAN to allow
      packets to pass through them.
      
      In order to support that, this patchset introduces a non-intrusive
      notifier mechanism. It adds a notifier head in every DSA switch tree
      (the said fabric), and a notifier block in every DSA switch chip.
      
      When an event occurs, it is chained to all notifiers of the fabric.
      Switch chips can react accordingly if they are cross-chip capable.
      
      On a dynamic debug enabled system, bridging a port in a multi-chip
      fabric will print something like this (ZII Rev B board):
      
          # brctl addif br0 lan3
          mv88e6085 0.1:00: crosschip DSA port 1.0 bridged to br0
          mv88e6085 0.4:00: crosschip DSA port 1.0 bridged to br0
          # brctl delif br0 lan3
          mv88e6085 0.1:00: crosschip DSA port 1.0 unbridged from br0
          mv88e6085 0.4:00: crosschip DSA port 1.0 unbridged from br0
      
      Currently only bridging events are added. A patchset introducing support
      for cross-chip hardware bridging configuration in mv88e6xxx will follow
      right after. Then events for switchdev operations are next on the line.
      
      We should note that non-switchdev events do not support rolling-back
      switch-wide operations. We'll have to work on closer integration with
      switchdev for that, like introducing new attributes or objects, to
      benefit from the prepare and commit phases.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9172d2a0
    • V
      net: dsa: introduce bridge notifier · 04d3a4c6
      Vivien Didelot committed
      A slave device will now notify the switch fabric once its port is
      bridged or unbridged, instead of directly calling its switch operations.
      
      This code allows propagating cross-chip bridging events in the fabric.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      04d3a4c6
    • V
      net: dsa: add switch notifier · f515f192
      Vivien Didelot committed
      Add a notifier block per DSA switch, registered against a notifier head
      in the switch fabric they belong to.
      
      This infrastructure will allow propagating fabric-wide events such as
      port bridging, VLAN configuration, etc. If a DSA switch driver cares
      about cross-chip configuration, such events can be caught.
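
The head-per-fabric, block-per-switch arrangement described above can be sketched as a simple singly linked notifier chain. This is a self-contained userspace illustration, not the kernel notifier API: `struct notifier`, `struct fabric`, `fabric_register`, `fabric_notify` and `count_event` are hypothetical names.

```c
#include <stddef.h>

struct notifier;
typedef int (*notifier_fn)(struct notifier *nb, unsigned long event, void *info);

/* One block per switch chip; events_seen exists only to make the demo observable. */
struct notifier {
	notifier_fn call;
	struct notifier *next;
	int switch_index;
	int events_seen;
};

/* One head per switch tree (the fabric). */
struct fabric {
	struct notifier *head;
};

/* Link a switch's notifier block into the fabric's chain. */
static void fabric_register(struct fabric *f, struct notifier *nb)
{
	nb->next = f->head;
	f->head = nb;
}

/* Deliver one event to every registered switch; returns how many were called. */
static int fabric_notify(struct fabric *f, unsigned long event, void *info)
{
	int notified = 0;

	for (struct notifier *nb = f->head; nb != NULL; nb = nb->next) {
		nb->call(nb, event, info);
		notified++;
	}
	return notified;
}

/* Example callback: a switch that merely counts the events it receives. */
static int count_event(struct notifier *nb, unsigned long event, void *info)
{
	(void)event;
	(void)info;
	nb->events_seen++;
	return 0;
}
```

A switch that does not care about cross-chip configuration can ignore the event in its callback, which is what keeps the mechanism non-intrusive for existing drivers.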
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f515f192
    • V
      net: dsa: change state setter scope · c5d35cb3
      Vivien Didelot committed
      The scope of the functions inside net/dsa/slave.c must be the slave
      net_device pointer. Change the state setter helper accordingly to
      simplify callers.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5d35cb3