1. 19 5月, 2014 5 次提交
  2. 17 5月, 2014 35 次提交
    • D
      Merge branch 'cdc_ncm-coalesce' · 33fcc5e0
      David S. Miller 提交于
      Bjørn Mork says:
      
      ====================
      cdc_ncm: add buffer tuning and stats using ethtool
      
      Quoting the previous description of this series (skip to the
      changelog below if you only want a summary of the changes):
      
      "I have got quite a few reports from frustrated users of OpenWRT
      hosts trying to use some powerful LTE modem, but not achieving
      full speed.  This is typically caused by a combination of
      big buffers and little memory, giving in allocation errors and
      bad performance as a result.
      
      This series is an attempt to let users adjust the size of these
      buffers without having to rebuild the driver.
      
      Patches 1 - 4 are mostly rearranging existing code, in preparing
      for the dynamic buffer size changes.
      
      Patch 5 adds userspace control (ab)using the ethtool coalescing
      API. This isn't a perfect match, which is the main reason why I
      post this series as a RFC.
      
      Patch 6 is an unrelated framing optimization, reducing the
      overhead quite a bit and allowing for better use of smaller
      buffers.
      
      Patch 7 changes the way we calculate frame padding cutoff. The
      problem with big buffers is made much worse by the current padding
      strategy where zero padding often can account for more than 90% of
      the frames.
      
      Patch 8 add some counters giving some insight into how well the
      NCM/MBIM protocol works, supporting further tuning.
      
      Patch 9 reduce the initial maximum buffer size from 32kB to 16kB
      in an attempt to make the default better suit all. It is still
      possible to tune this up again to the old fixed max, using the
      new tuning knobs.
      
      I must admit that I had higher hopes for this series before I
      tested it on my own modems.  One really unexpected result was
      that one of the MBIM modems accepted the new rx buffer size we
      set, but happily continued sending buffers of the same size as
      before.  Needless to say:  This did not work very well...
      
      So don't really expect to be able to use any values with any
      given device. Firmware implementations are still... I don't
      think I have words suitable for a public mailing list.
      
      But I am hoping this will help the many users who have had success
      rebuilding the driver with lower fixed limits.
      
      Please test and/or comment!"
      
      Changes:
      
      ** RFC -> v1 **
      
      Patch 10 - a follow-up to a comment Joe Perches made in November
                 2013.  I don't always forget :-)
      Patch 11 - removes the redundant "connected" driver state, and the
                 associated .check_connect callbacks.
      
      ** v1 -> v2 **
      
      Patch 1  - Better handling of minium rx buffer size, based on feedback
                 from Oliver Neukum and Enrico Mioso
      Patch 5  - fixed locking around timer interval update
      Patch 9  - fixed whitespace error
      Patch 12 - new fix related to the tuneable tx timer
      
      ...and spelling fixes all over the commit messages.  I have finally
      added a spelling hook, which I'm sure may of you will appreciate :-)
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33fcc5e0
    • B
      net: cdc_ncm: do not start timer on an empty skb · 046c6594
      Bjørn Mork 提交于
      We can end up with a freshly allocated tx_curr_skb with no frames
      in it.  In this case it does not make any sense to start the timer.
      This avoids the timer periodically trying to start tx when there
      is nothing in the queue.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      046c6594
    • B
      net: cdc_ncm: remove redundant "disconnected" flag · fa83dbee
      Bjørn Mork 提交于
      Calling netif_carrier_{on,off} is sufficient.  There is no need
      to duplicate the carrier state in a driver specific flag.
      Acked-by: NEnrico Mioso <mrkiko.rs@gmail.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa83dbee
    • B
      net: cdc_ncm: fix argument alignment · 916f7640
      Bjørn Mork 提交于
      Reported-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      916f7640
    • B
      net: cdc_ncm: use sane defaults for rx/tx buffers · 50f1cb1c
      Bjørn Mork 提交于
      Lots of devices request much larger buffers than reasonable. This
      cause real problems for users of hosts with limited resources.
      
      Reducing the default buffer size to 16kB for such devices is
      a reasonable trade-off between allowing them to aggregate traffic
      and avoiding memory exhaustion on resource restrained hosts.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50f1cb1c
    • B
      net: cdc_ncm/cdc_mbim: adding NCM protocol statistics · beeecd42
      Bjørn Mork 提交于
      To have an idea of the effects of the protocol coalescing
      it's useful to have some counters showing the different
      aspects.
      
      Due to the asymmetrical usbnet interface the netdev
      rx_bytes counter has been counting real received payload,
      while the tx_bytes counter has included the NCM/MBIM
      framing overhead. This overhead can be many times the
      payload because of the aggressive padding strategy of
      this driver, and will vary a lot depending on device
      and traffic.
      
      With very few exceptions, users are only interested in
      the payload size.  Having an somewhat accurate payload
      byte counter is particularly important for mobile
      broadband devices, which many NCM devices and of course
      all MBIM devices are. Users and userspace applications
      will use this counter to monitor account quotas.
      
      Having protocol specific counters for the overhead, we are
      now able to correct the tx_bytes netdev counter so that
      it shows the real payload
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      beeecd42
    • B
      net: cdc_ncm: set reasonable padding limits · 43e4c6df
      Bjørn Mork 提交于
      We pad frames larger than X to maximum size for devices which
      don't need a ZLP after maximum sized frames. This allows the
      device to optimize its transfers for one fixed buffer size.
      
      X was arbitrarily set at 512 bytes regardless of real buffer
      maximum, causing extreme overheads due to excessive padding of
      larger tx buffers. Limit the padding to at most 3 full USB
      packets, still allowing the overhead to payload ratio of 3/1.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43e4c6df
    • B
      net: cdc_ncm: use true max dgram count for header estimates · 70559b89
      Bjørn Mork 提交于
      Many newer NCM and MBIM devices will request a maximum tx
      datagram count which is much smaller than our hard-coded
      absolute max. We can reduce the overhead without sacrificing
      any of the simplicity for these devices, by simply using the
      true negotiated count in when calculated the maximum NTH and
      NDP header sizes.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70559b89
    • B
      net: cdc_ncm: use ethtool to tune coalescing settings · 6c4e548f
      Bjørn Mork 提交于
      Datagram coalescing is an integral part of the NCM and MBIM
      protocols, intended to reduce the interrupt load primarily
      on the device end of the USB link.  As with all coalescing
      solutions, there is a trade-off between buffering and
      interrupts.
      
      The current defaults are based on the assumption that device
      side buffers should be the limiting factor.  However, many
      modern high speed LTE modems suffers from buffer-bloat,
      making this assumption fail. This results in sub-optimal
      performance due to excessive coalescing.  And in cases where
      such modems are connected to cheap embedded hosts there is
      often severe buffer allocation issues, giving very noticeable
      performance degradation .
      
      A start on improving this is going from build time hard
      coded limits to per device user configurable limits.  The
      ethtool coalescing API was selected as user interface
      because, although the tuned values are buffer sizes, these
      settings directly control datagram coalescing.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c4e548f
    • B
      net: cdc_ncm: support rx_max/tx_max updates when running · 68864abf
      Bjørn Mork 提交于
      Finish the rx_max/tx_max setup by flushing buffers and
      informing usbnet about the changes.  This way, the settings
      can be modified while the netdev is up and running.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68864abf
    • B
      net: cdc_ncm: split .bind device initialization · 08c74fc9
      Bjørn Mork 提交于
      Now that we have split out the part of the device setup
      which MUST be done with the data interface in altsetting 0,
      we can delay the rest of the initialization. This allows us
      to move some of post-init buffer size config from bind to
      the appropriate setup function.
      
      The purpose of this refactoring is to collect all code
      adjusting the rx_max and tx_max buffers in one place, so
      that it is easier to call it from multiple call sites.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08c74fc9
    • B
      net: cdc_ncm: factor out one-time device initialization · f8afb73d
      Bjørn Mork 提交于
      Split the parts of setup dealing with device initialization from
      parts just setting defaults for attributes which might be
      changed after initialization.
      
      Some commands of the device initialization are only allowed when
      the data interface is in its disabled altsetting, so we must
      separate them out of we are to allow rerunning parts of setup.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8afb73d
    • B
      net: cdc_ncm: split out rx_max/tx_max update of setup · 5aa73d5d
      Bjørn Mork 提交于
      Split out the part of setup dealing with updating the rx_max
      and tx_max buffer sizes so that this code can be reused for
      dynamically updating the limits.
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5aa73d5d
    • T
      97dc48e2
    • D
      Merge branch 'ieee802154-next' · a47e8f5a
      David S. Miller 提交于
      Phoebe Buckheister says:
      
      ====================
      802154: implement link-layer security
      
      This patch series implements 802.15.4-2011 link layer security.
      
      Patches 1 and 2 prepare for llsec by adding data structures to represent the
      llsec PIB as specified in 802.15.4-2011. I've changed some structures from
      their specification to be more sensible, since 802.15.4 specifies some
      structures in not-exactly-useful ways. Nested lists are common, but not very
      accessible for netlink methods, and not very fast to traverse when searching
      for specific elements either.
      
      Patch 3 implements backends for these structures in mac802154.
      
      Patch 4 and 5 implement the encryption and decryption methods, split from patch
      3 to ease review. The encryption and decryption methods are almost entirely
      compliant with the specified outgoing/incoming frame procedures. Decryption
      deviates from the specification slightly where the specification makes no
      sense, i.e. encrypted frames with security level 0 may be sent, but must be
      dropped an reception - but transforms for processing such frames are given a
      few lines in the standard. I've opted to not drop these frames instead of not
      implementing the transforms that wouldn't be used if they were dropped.
      
      Patch 6 links the mac802154 llsec with the SoftMAC devices. This is mainly
      init//fini code for llsec context, handling of security subheaders and calling
      the encryption/decryption methods.
      
      Patch 7 adds sockopts to 802.15.4 dgram sockets to modifiy outgoing security
      parameters on a per-socket basis. Ideally, this would also be available for
      sockets on 6lowpan devices, but I'm not sure how to do that nicely.
      
      Patch 8 adds forwarders to the llsec configuration methods for netlink, patch
      10 implements these netlink accessors. This is mainly mechanical.
      
      Patch 11, implements a key tracking option for devices that previous patches
      haven't, because I'm not entirely sure whether this is the best approach to the
      problem. It performs reasonably well though, so I decided to include it as a
      separate patch in this series instead of sending an RFC just for this one
      option.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a47e8f5a
    • P
      ieee802154, mac802154: implement devkey record option · f0f77dc6
      Phoebe Buckheister 提交于
      The 802.15.4-2011 standard states that for each key, a list of devices
      that use this key shall be kept. Previous patches have only considered
      two options:
      
       * a device "uses" (or may use) all keys, rendering the list useless
       * a device is restricted to a certain set of keys
      
      Another option would be that a device *may* use all keys, but need not
      do so, and we are interested in the actual set of keys the device uses.
      Recording keys used by any given device may have a noticable performance
      impact and might not be needed as often. The common case, in which a
      device will not switch keys too often, should still perform well.
      Signed-off-by: NPhoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f0f77dc6
    • P
      ieee802154: add netlink interfaces for llsec · 3e9c156e
      Phoebe Buckheister 提交于
      This patch adds user-visible interfaces for the llsec infrastructure.
      For the added methods, the only major difference between all add/remove
      implementation lies in how the specific object is parsed, and for dump
      requests, how objects are written into netlink messages.
      
      To save on boilerplate code, table dumps are routed through a helper
      function that handles netlink dump state, leaving the actual dumping
      code to care only about iterating over the table to be dumped and
      filling netlink messages. For add/remove methods, the boilerplate
      required to work is not quite as large, but still enough to also move
      into a local helper.
      Signed-off-by: NPhoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3e9c156e
    • P
    • P
    • P
      ieee802154: add dgram sockopts for security control · af9eed5b
      Phoebe Buckheister 提交于
      Allow datagram sockets to override the security settings of the device
      they send from on a per-socket basis. Requires CAP_NET_ADMIN or
      CAP_NET_RAW, since raw sockets can send arbitrary packets anyway.
      Signed-off-by: NPhoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af9eed5b
    • P
    • P
    • P
    • P
      mac802154: add llsec structures and mutators · 5d637d5a
      Phoebe Buckheister 提交于
      This patch adds containers and mutators for the major ieee802154_llsec
      structures to mac802154. Most of the (rather simple) ieee802154_llsec
      structs are wrapped only to provide an rcu_head for orderly disposal,
      but some structs - llsec keys notably - require more complex
      bookkeeping.
      
      Since each llsec key may be referenced by a number of llsec key table
      entries (with differing key ids, but the same actual key), we want to
      save memory and not allocate crypto transforms for each entry in the
      table. Thus, the mac802154 llsec key is reference-counted instead.
      Further, each key will have four associated crypto transforms - three
      CCM transforms for the authsizes 4/8/16 and one CTR transform for
      unauthenticated encryption. If we had a CCM* transform that allowed
      authsize 0, and authsize as part of requests instead of transforms, this
      would not be necessary.
      Signed-off-by: NPhoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5d637d5a
    • P
      mac802154: update Kconfig · 87de726c
      Phoebe Buckheister 提交于
      Link-layer security requires AES CCM for authenticated modes and AES CTR
      for the unauthenticated encryption mode.
      Signed-off-by: NPhoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87de726c
    • P
      ieee802154: add types for link-layer security · dc20759f
      Phoebe Buckheister 提交于
      The added structures match 802.15.4-2011 link-layer security PIBs as
      closely as is reasonable. Some lists required by the standard were
      modeled as bitmaps (frame_types and command_frame_ids in *llsec_key,
      802.15.4-2011 7.5/Table 61), since using lists for those seems a bit
      excessive and not particularly useful. The DeviceDescriptorHandleList
      was inverted and is here a per-device list, since operations on this
      list are likely to have both a key and a device at hand, and per-device
      lists of keys are shorter than per-key lists of devices.
      Signed-off-by: NPhoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc20759f
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch · e54740e6
      David S. Miller 提交于
      Jesse Gross says:
      
      ====================
      A set of OVS changes for net-next/3.16.
      
      The major change here is a switch from per-CPU to per-NUMA flow
      statistics. This improves scalability by reducing kernel overhead
      in flow setup and maintenance.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e54740e6
    • D
      Merge branch 'dt_fixed_phy' · ad2ebb3d
      David S. Miller 提交于
      Thomas Petazzoni says:
      
      ====================
      Add DT support for fixed PHYs
      
      Here is a fourth version of the patch set that adds a Device Tree
      binding and the related code to support fixed PHYs. I'm hoping to get
      this merged in 3.16.
      
      Changes since v3:
      
       * Rebased on top of v3.15-rc5
      
       * In patch "net: phy: decouple PHY id and PHY address in fixed PHY
         driver", changed the PHY ID of fixed PHYs from 0xdeadbeef to 0x0,
         as suggested by Grant Likely.
      
       * Fixed the !CONFIG_PHY_FIXED case in patch "net: phy: extend fixed
         driver with fixed_phy_register()". Noticed by Florian Fainelli.
      
       * Added Acked-by from Grant Likely and Florian Fainelli on patch
         "net: phy: extend fixed driver with fixed_phy_register()".
      
       * Reworked the new fixed-link DT binding to be just a sub-node of the
         Ethernet MAC node, and not a node referenced by the 'phy'
         property. This was requested by Grant Likely.
      
       * Reworked the code implementing the new DT binding to also make it
         accept the old, single property based, DT binding.
      
       * Added a patch that actually uses the new fixed link DT binding for
         the Armada XP Matrix board.
      
      Changes since v2:
      
       * Rebased on top of v3.14-rc1, and re-tested on hardware.
      
       * Removed the RFC tag, since there seems to be some real interest in
         this feature, and the code has gone through several iterations
         already.
      
       * The error handling in fixed_phy_register() has been fixed.
      
      Changes since v1:
      
       * Instead of using a 'fixed-link' property inside the Ethernet device
         DT node, with a fairly cryptic succession of integer values, we now
         use a PHY subnode under the Ethernet device DT node, with explicit
         properties to configure the duplex, speed, pause and other PHY
         properties.
      
       * The PHY address is automatically allocated by the kernel and no
         longer visible in the Device Tree binding.
      
       * The PHY device is created directly when the network driver calls
         of_phy_connect_fixed_link(), and associated to the PHY DT node,
         which allows the existing of_phy_connect() function to work,
         without the need to use the deprecated of_phy_connect_fixed_link().
      
      Posts of previous versions:
      
        RFCv1:   http://www.spinics.net/lists/netdev/msg243253.html
        RFCv2:   http://lists.infradead.org/pipermail/linux-arm-kernel/2013-September/196919.html
        PATCHv3: http://www.spinics.net/lists/netdev/msg273117.html
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad2ebb3d
    • T
      ARM: mvebu: use the fixed-link PHY DT binding for the Armada XP Matrix board · 84f6e11f
      Thomas Petazzoni 提交于
      The Armada XP Matrix board has an Ethernet PHY that isn't configurable
      through the MDIO bus, so we use the newly introduced fixed-link PHY DT
      binding to represent the PHY of this platform and get network working.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84f6e11f
    • T
      net: mvneta: add support for fixed links · 83895bed
      Thomas Petazzoni 提交于
      Following the introduction of of_phy_register_fixed_link(), this patch
      introduces fixed link support in the mvneta driver, for Marvell Armada
      370/XP SOCs.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83895bed
    • T
      of: provide a binding for fixed link PHYs · 3be2a49e
      Thomas Petazzoni 提交于
      Some Ethernet MACs have a "fixed link", and are not connected to a
      normal MDIO-managed PHY device. For those situations, a Device Tree
      binding allows to describe a "fixed link" using a special PHY node.
      
      This patch adds:
      
       * A documentation for the fixed PHY Device Tree binding.
      
       * An of_phy_is_fixed_link() function that an Ethernet driver can call
         on its PHY phandle to find out whether it's a fixed link PHY or
         not. It should typically be used to know if
         of_phy_register_fixed_link() should be called.
      
       * An of_phy_register_fixed_link() function that instantiates the
         fixed PHY into the PHY subsystem, so that when the driver calls
         of_phy_connect(), the PHY device associated to the OF node will be
         found.
      
      These two additional functions also support the old fixed-link Device
      Tree binding used on PowerPC platforms, so that ultimately, the
      network device drivers for those platforms could be converted to use
      of_phy_is_fixed_link() and of_phy_register_fixed_link() instead of
      of_phy_connect_fixed_link(), while keeping compatibility with their
      respective Device Tree bindings.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3be2a49e
    • T
      net: phy: extend fixed driver with fixed_phy_register() · a7595121
      Thomas Petazzoni 提交于
      The existing fixed_phy_add() function has several drawbacks that
      prevents it from being used as is for OF-based declaration of fixed
      PHYs:
      
       * The address of the PHY on the fake bus needs to be passed, while a
         dynamic allocation is desired.
      
       * Since the phy_device instantiation is post-poned until the next
         mdiobus scan, there is no way to associate the fixed PHY with its
         OF node, which later prevents of_phy_connect() from finding this
         fixed PHY from a given OF node.
      
      To solve this, this commit introduces fixed_phy_register(), which will
      allocate an available PHY address, add the PHY using fixed_phy_add()
      and instantiate the phy_device structure associated with the provided
      OF node.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NGrant Likely <grant.likely@linaro.org>
      Tested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7595121
    • T
      net: phy: decouple PHY id and PHY address in fixed PHY driver · 9b744942
      Thomas Petazzoni 提交于
      Until now, the fixed_phy_add() function was taking as argument
      'phy_id', which was used both as the PHY address on the fake fixed
      MDIO bus, and as the PHY id, as available in the MII_PHYSID1 and
      MII_PHYSID2 registers. However, those two informations are completely
      unrelated.
      
      This patch decouples them. The PHY id of fixed PHYs is hardcoded to be
      0x0. Ideally, a really reserved value would be nicer, but there
      doesn't seem to be an easy of making sure a dummy value can be
      assigned to the Linux kernel for such usage.
      
      The PHY address remains passed by the caller of phy_fixed_add().
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Tested-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b744942
    • D
      Merge branch 'bridge-non-promisc' · 2770abcc
      David S. Miller 提交于
      Vlad Yasevich says:
      
      ====================
      bridge: Non-promisc bridge ports support
      
      This series adds functionality to the bridge device to enable
      operations without setting all ports to promiscuous mode.
      
      The basic concept is this.  The bridge keeps track of the ports
      that support learning and flooding packets to unknown destinations.
      We call these ports auto-discovery ports since they automatically
      discover who is behind them through learning and flooding.
      
      If flooding and learning are disabled via flags, then the port
      requires static configuration to tell it which mac addresses
      are behind it.  This is accomplished through adding of fdbs.
      These fdbs should be static as dynamic fdbs can expire and systems
      will become unreachable due to lack of flooding.
      
      If the user marks all ports as needing static configuration then
      we can safely make them non-promiscuous since we will know all the
      information about them.
      
      If the user leaves only 1 port as automatic, then we can mark
      that port as not-promiscuous as well.  One could think of
      this a edge relay similar to what's support by embedded switches
      in SRIOV devices.  Since we have all the information about the
      other ports, we can just program the mac addresses into the
      single automatic port to receive all necessary traffic.
      More information about this is patch 6.
      
      In other cases, we keep all ports promiscuous as before.
      
      There are some other cases when promiscuous mode has to be turned
      back on.  One is when the bridge itself if placed in promiscuous
      mode (user sets promisc flag).  The other is if vlan filtering is
      turned off.  Since this is the default configuration, the default
      bridge operation is not changed.
      
      Changes since v2:
       - White space and spelling fixes from Michael Tsirkin
       - Squash patches 6, 7 and 8 to prevent bisect breakage.
      
      Changes since v1:
       - Address issues rasied by Stephen Heminger
       - Address initializer comments raised by Sergey Shtylyov
       - Rebased recent net-next.
      
      Changes since rfc v2:
       - Better description of in the commit logs
       - Leave port in promiscuous mode if IFF_UNICAST_FLT is disabled on the
         device.
       - Fix issue with flag masking
       - Rework patch ordering a bit.
      
      Changes since rfc v1:
       - Removed private list.  We now traverse the fdb hashtable itself
         to write necessary addresses to the ports (Stephen's concern)
       - Add learning flag to the mask for flags that decides if the port
         is 'auto' or not (suggest by MST and Jamal).
       - Simplified tracking of such ports at the cost of a loop over all
         ports (suggested by MST)
      
      I've played with quite a large number of ports and the current approach
      seems to work fairly well.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2770abcc
    • V
      bridge: Automatically manage port promiscuous mode. · 2796d0c6
      Vlad Yasevich 提交于
      There exist configurations where the administrator or another management
      entity has the foreknowledge of all the mac addresses of end systems
      that are being bridged together.
      
      In these environments, the administrator can statically configure known
      addresses in the bridge FDB and disable flooding and learning on ports.
      This makes it possible to turn off promiscuous mode on the interfaces
      connected to the bridge.
      
      Here is why disabling flooding and learning allows us to control
      promiscuity:
       Consider port X.  All traffic coming into this port from outside the
      bridge (ingress) will be either forwarded through other ports of the
      bridge (egress) or dropped.  Forwarding (egress) is defined by FDB
      entries and by flooding in the event that no FDB entry exists.
      In the event that flooding is disabled, only FDB entries define
      the egress.  Once learning is disabled, only static FDB entries
      provided by a management entity define the egress.  If we provide
      information from these static FDBs to the ingress port X, then we'll
      be able to accept all traffic that can be successfully forwarded and
      drop all the other traffic sooner without spending CPU cycles to
      process it.
       Another way to define the above is as following equations:
          ingress = egress + drop
       expanding egress
          ingress = static FDB + learned FDB + flooding + drop
       disabling flooding and learning we a left with
          ingress = static FDB + drop
      
      By adding addresses from the static FDB entries to the MAC address
      filter of an ingress port X, we fully define what the bridge can
      process without dropping and can thus turn off promiscuous mode,
      thus dropping packets sooner.
      
      There have been suggestions that we may want to allow learning
      and update the filters with learned addresses as well.  This
      would require mac-level authentication similar to 802.1x to
      prevent attacks against the hw filters as they are limited
      resource.
      
      Additionally, if the user places the bridge device in promiscuous mode,
      all ports are placed in promiscuous mode regardless of the changes
      to flooding and learning.
      
      Since the above functionality depends on full static configuration,
      we have also require that vlan filtering be enabled to take
      advantage of this.  The reason is that the bridge has to be
      able to receive and process VLAN-tagged frames and the there
      are only 2 ways to accomplish this right now: promiscuous mode
      or vlan filtering.
      Suggested-by: NMichael S. Tsirkin <mst@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2796d0c6