1. 02 3月, 2016 12 次提交
    • J
      Support to encoding decoding skb mark on IFE action · 084e2f65
      Jamal Hadi Salim 提交于
      Example usage:
      Set the skb using skbedit then allow it to be encoded
      
      sudo tc qdisc add dev $ETH root handle 1: prio
      sudo tc filter add dev $ETH parent 1: protocol ip prio 10 \
      u32 match ip protocol 1 0xff flowid 1:2 \
      action skbedit mark 17 \
      action ife encode \
      allow mark \
      dst 02:15:15:15:15:15
      
      Note: You dont need the skbedit action if you are already encoding the
      skb mark earlier. A zero skb mark, when seen, will not be encoded.
      
      Alternative hard code static mark of 0x12 every time the filter matches
      
      sudo $TC filter add dev $ETH parent 1: protocol ip prio 10 \
      u32 match ip protocol 1 0xff flowid 1:2 \
      action ife encode \
      type 0xDEAD \
      use mark 0x12 \
      dst 02:15:15:15:15:15
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      084e2f65
    • J
      introduce IFE action · ef6980b6
      Jamal Hadi Salim 提交于
      This action allows for a sending side to encapsulate arbitrary metadata
      which is decapsulated by the receiving end.
      The sender runs in encoding mode and the receiver in decode mode.
      Both sender and receiver must specify the same ethertype.
      At some point we hope to have a registered ethertype and we'll
      then provide a default so the user doesnt have to specify it.
      For now we enforce the user specify it.
      
      Lets show example usage where we encode icmp from a sender towards
      a receiver with an skbmark of 17; both sender and receiver use
      ethertype of 0xdead to interop.
      
      YYYY: Lets start with Receiver-side policy config:
      xxx: add an ingress qdisc
      sudo tc qdisc add dev $ETH ingress
      
      xxx: any packets with ethertype 0xdead will be subjected to ife decoding
      xxx: we then restart the classification so we can match on icmp at prio 3
      sudo $TC filter add dev $ETH parent ffff: prio 2 protocol 0xdead \
      u32 match u32 0 0 flowid 1:1 \
      action ife decode reclassify
      
      xxx: on restarting the classification from above if it was an icmp
      xxx: packet, then match it here and continue to the next rule at prio 4
      xxx: which will match based on skb mark of 17
      sudo tc filter add dev $ETH parent ffff: prio 3 protocol ip \
      u32 match ip protocol 1 0xff flowid 1:1 \
      action continue
      
      xxx: match on skbmark of 0x11 (decimal 17) and accept
      sudo tc filter add dev $ETH parent ffff: prio 4 protocol ip \
      handle 0x11 fw flowid 1:1 \
      action ok
      
      xxx: Lets show the decoding policy
      sudo tc -s filter ls dev $ETH parent ffff: protocol 0xdead
      xxx:
      filter pref 2 u32
      filter pref 2 u32 fh 800: ht divisor 1
      filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 0 success 0)
        match 00000000/00000000 at 0 (success 0 )
              action order 1: ife decode action reclassify
               index 1 ref 1 bind 1 installed 14 sec used 14 sec
               type: 0x0
               Metadata: allow mark allow hash allow prio allow qmap
              Action statistics:
              Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
              backlog 0b 0p requeues 0
      xxx:
      Observe that above lists all metadatum it can decode. Typically these
      submodules will already be compiled into a monolithic kernel or
      loaded as modules
      
      YYYY: Lets show the sender side now ..
      
      xxx: Add an egress qdisc on the sender netdev
      sudo tc qdisc add dev $ETH root handle 1: prio
      xxx:
      xxx: Match all icmp packets to 192.168.122.237/24, then
      xxx: tag the packet with skb mark of decimal 17, then
      xxx: Encode it with:
      xxx:	ethertype 0xdead
      xxx:	add skb->mark to whitelist of metadatum to send
      xxx:	rewrite target dst MAC address to 02:15:15:15:15:15
      xxx:
      sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
      match ip dst 192.168.122.237/24 \
      match ip protocol 1 0xff \
      flowid 1:2 \
      action skbedit mark 17 \
      action ife encode \
      type 0xDEAD \
      allow mark \
      dst 02:15:15:15:15:15
      
      xxx: Lets show the encoding policy
      sudo tc -s filter ls dev $ETH parent 1: protocol ip
      xxx:
      filter pref 10 u32
      filter pref 10 u32 fh 800: ht divisor 1
      filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule hit 0 success 0)
        match c0a87aed/ffffffff at 16 (success 0 )
        match 00010000/00ff0000 at 8 (success 0 )
      
      	action order 1:  skbedit mark 17
      	 index 6 ref 1 bind 1
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      
      	action order 2: ife encode action pipe
      	 index 3 ref 1 bind 1
      	 dst MAC: 02:15:15:15:15:15 type: 0xDEAD
       	 Metadata: allow mark
       	Action statistics:
      	Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
      	backlog 0b 0p requeues 0
      xxx:
      
      test by sending ping from sender to destination
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef6980b6
    • N
      bridge: mcast: add support for more router port information dumping · 59f78f9f
      Nikolay Aleksandrov 提交于
      Allow for more multicast router port information to be dumped such as
      timer and type attributes. For that that purpose we need to extend the
      MDBA_ROUTER_PORT attribute similar to how it was done for the mdb entries
      recently. The new format is thus:
      [MDBA_ROUTER_PORT] = { <- nested attribute
          u32 ifindex <- router port ifindex for user-space compatibility
          [MDBA_ROUTER_PATTR attributes]
      }
      This way it remains compatible with older users (they'll simply retrieve
      the u32 in the beginning) and new users can parse the remaining
      attributes. It would also allow to add future extensions to the router
      port without breaking compatibility.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59f78f9f
    • N
      bridge: mcast: add support for temporary port router · a55d8246
      Nikolay Aleksandrov 提交于
      Add support for a temporary router port which doesn't depend only on the
      incoming query. It can be refreshed if set to the same value, which is
      a no-op for the rest.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a55d8246
    • N
      bridge: mcast: do nothing if port's multicast_router is set to the same val · 4950cfd1
      Nikolay Aleksandrov 提交于
      This is needed for the upcoming temporary port router. There's no point
      to go through the logic if the value is the same.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4950cfd1
    • N
      bridge: mcast: use names for the different multicast_router types · 7f0aec7a
      Nikolay Aleksandrov 提交于
      Using raw values makes it difficult to extend and also understand the
      code, give them names and do explicit per-option manipulation in
      br_multicast_set_port_router.
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f0aec7a
    • V
      net: dsa: support VLAN filtering switchdev attr · fb2dabad
      Vivien Didelot 提交于
      When a user explicitly requests VLAN filtering with something like:
      
          # echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering
      
      Switchdev propagates a SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING port
      attribute.
      
      Add support for it in the DSA layer with a new port_vlan_filtering
      function to let drivers toggle 802.1Q filtering on user demand.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb2dabad
    • J
      Introduce devlink infrastructure · bfcd3a46
      Jiri Pirko 提交于
      Introduce devlink infrastructure for drivers to register and expose to
      userspace via generic Netlink interface.
      
      There are two basic objects defined:
      devlink - one instance for every "parent device", for example switch ASIC
      devlink port - one instance for every physical port of the device.
      
      This initial portion implements basic get/dump of objects to userspace.
      Also, port splitter and port type setting is implemented.
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfcd3a46
    • J
      net: sched: cls_u32 add bit to specify software only rules · 9e8ce79c
      John Fastabend 提交于
      In the initial implementation the only way to stop a rule from being
      inserted into the hardware table was via the device feature flag.
      However this doesn't work well when working on an end host system
      where packets are expect to hit both the hardware and software
      datapaths.
      
      For example we can imagine a rule that will match an IP address and
      increment a field. If we install this rule in both hardware and
      software we may increment the field twice. To date we have only
      added support for the drop action so we have been able to ignore
      these cases. But as we extend the action support we will hit this
      example plus more such cases. Arguably these are not even corner
      cases in many working systems these cases will be common.
      
      To avoid forcing the driver to always abort (i.e. the above example)
      this patch adds a flag to add a rule in software only. A careful
      user can use this flag to build software and hardware datapaths
      that work together. One example we have found particularly useful
      is to use hardware resources to set the skb->mark on the skb when
      the match may be expensive to run in software but a mark lookup
      in a hash table is cheap. The idea here is hardware can do in one
      lookup what the u32 classifier may need to traverse multiple lists
      and hash tables to compute. The flag is only passed down on inserts.
      On deletion to avoid stale references in hardware we always try
      to remove a rule if it exists.
      
      The flags field is part of the classifier specific options. Although
      it is tempting to lift this into the generic structure doing this
      proves difficult do to how the tc netlink attributes are implemented
      along with how the dump/change routines are called. There is also
      precedence for putting seemingly generic pieces in the specific
      classifier options such as TCA_U32_POLICE, TCA_U32_ACT, etc. So
      although not ideal I've left FLAGS in the u32 options as well as it
      simplifies the code greatly and user space has already learned how
      to manage these bits ala 'tc' tool.
      
      Another thing if trying to update a rule we require the flags to
      be unchanged. This is to force user space, software u32 and
      the hardware u32 to keep in sync. Thanks to Simon Horman for
      catching this case.
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e8ce79c
    • J
      net: sched: consolidate offload decision in cls_u32 · 6843e7a2
      John Fastabend 提交于
      The offload decision was originally very basic and tied to if the dev
      implemented the appropriate ndo op hook. The next step is to allow
      the user to more flexibly define if any paticular rule should be
      offloaded or not. In order to have this logic in one function lift
      the current check into a helper routine tc_should_offload().
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6843e7a2
    • P
      ovs: propagate per dp max headroom to all vports · 3a927bc7
      Paolo Abeni 提交于
      This patch implements bookkeeping support to compute the maximum
      headroom for all the devices in each datapath. When said value
      changes, the underlying devs are notified via the
      ndo_set_rx_headroom method.
      
      This also increases the internal vports xmit performance.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a927bc7
    • P
      bridge: notify enslaved devices of headroom changes · 45493d47
      Paolo Abeni 提交于
      On bridge needed_headroom changes, the enslaved devices are
      notified via the ndo_set_rx_headroom method
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45493d47
  2. 01 3月, 2016 4 次提交
  3. 27 2月, 2016 3 次提交
    • A
      GSO: Provide software checksum of tunneled UDP fragmentation offload · 22463876
      Alexander Duyck 提交于
      On reviewing the code I realized that GRE and UDP tunnels could cause a
      kernel panic if we used GSO to segment a large UDP frame that was sent
      through the tunnel with an outer checksum and hardware offloads were not
      available.
      
      In order to correct this we need to update the feature flags that are
      passed to the skb_segment function so that in the event of UDP
      fragmentation being requested for the inner header the segmentation
      function will correctly generate the checksum for the payload if we cannot
      segment the outer header.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22463876
    • D
      net: l3mdev: prefer VRF master for source address selection · 17b693cd
      David Lamparter 提交于
      When selecting an address in context of a VRF, the vrf master should be
      preferred for address selection.  If it isn't, the user has a hard time
      getting the system to select to their preference - the code will pick
      the address off the first in-VRF interface it can find, which on a
      router could well be a non-routable address.
      Signed-off-by: NDavid Lamparter <equinox@diac24.net>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      [dsa: Fixed comment style and removed extra blank link ]
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17b693cd
    • D
      net: l3mdev: address selection should only consider devices in L3 domain · 3f2fb9a8
      David Ahern 提交于
      David Lamparter noted a use case where the source address selection fails
      to pick an address from a VRF interface - unnumbered interfaces.
      
      Relevant commands from his script:
          ip addr add 9.9.9.9/32 dev lo
          ip link set lo up
      
          ip link add name vrf0 type vrf table 101
          ip rule add oif vrf0 table 101
          ip rule add iif vrf0 table 101
          ip link set vrf0 up
          ip addr add 10.0.0.3/32 dev vrf0
      
          ip link add name dummy2 type dummy
          ip link set dummy2 master vrf0 up
      
          --> note dummy2 has no address - unnumbered device
      
          ip route add 10.2.2.2/32 dev dummy2 table 101
          ip neigh add 10.2.2.2 dev dummy2 lladdr 02:00:00:00:00:02
      
          tcpdump -ni dummy2 &
      
      And using ping instead of his socat example:
          $ ping -I vrf0 -c1 10.2.2.2
          ping: Warning: source address might be selected on device other than vrf0.
          PING 10.2.2.2 (10.2.2.2) from 9.9.9.9 vrf0: 56(84) bytes of data.
      
      >From tcpdump:
          12:57:29.449128 IP 9.9.9.9 > 10.2.2.2: ICMP echo request, id 2491, seq 1, length 64
      
      Note the source address is from lo and is not a VRF local address. With
      this patch:
      
          $ ping -I vrf0 -c1 10.2.2.2
          PING 10.2.2.2 (10.2.2.2) from 10.0.0.3 vrf0: 56(84) bytes of data.
      
      >From tcpdump:
          12:59:25.096426 IP 10.0.0.3 > 10.2.2.2: ICMP echo request, id 2113, seq 1, length 64
      
      Now the source address comes from vrf0.
      
      The ipv4 function for selecting source address takes a const argument.
      Removing the const requires touching a lot of places, so instead
      l3mdev_master_ifindex_rcu is changed to take a const argument and then
      do the typecast to non-const as required by netdev_master_upper_dev_get_rcu.
      This is similar to what l3mdev_fib_table_rcu does.
      
      IPv6 for unnumbered interfaces appears to be selecting the addresses
      properly.
      
      Cc: David Lamparter <david@opensourcerouting.org>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f2fb9a8
  4. 26 2月, 2016 14 次提交
    • D
      net: ethtool: remove unused __ethtool_get_settings · 3237fc63
      David Decotigny 提交于
      replaced by __ethtool_get_link_ksettings.
      Signed-off-by: NDavid Decotigny <decot@googlers.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3237fc63
    • D
      7cad1bac
    • D
      702b26a2
    • D
      57709798
    • D
      net: ethtool: add new ETHTOOL_xLINKSETTINGS API · 3f1ac7a7
      David Decotigny 提交于
      This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
      handled by the new get_link_ksettings/set_link_ksettings callbacks.
      This API provides support for most legacy ethtool_cmd fields, adds
      support for larger link mode masks (up to 4064 bits, variable length),
      and removes ethtool_cmd deprecated
      fields (transceiver/maxrxpkt/maxtxpkt).
      
      This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides
      the following backward compatibility properties:
       - legacy ethtool with legacy drivers: no change, still using the
         get_settings/set_settings callbacks.
       - legacy ethtool with new get/set_link_ksettings drivers: the new
         driver callbacks are used, data internally converted to legacy
         ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link
         mode mask. ETHTOOL_SSET will fail if user tries to set the
         ethtool_cmd deprecated fields to
         non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if
         driver sets higher bits.
       - future ethtool with legacy drivers: no change, still using the
         get_settings/set_settings callbacks, internally converted to new data
         structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be
         ignored and seen as 0 from user space. Note that that "future"
         ethtool tool will not allow changes to these deprecated fields.
       - future ethtool with new drivers: direct call to the new callbacks.
      
      By "future" ethtool, what is meant is:
       - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if
         fails
       - set: query first and remember which of ETHTOOL_GLINKSETTINGS or
         ETHTOOL_GSET was successful
         + if ETHTOOL_GLINKSETTINGS was successful, then change config with
           ETHTOOL_SLINKSETTINGS. A failure there is final (do not try
           ETHTOOL_SSET).
         + otherwise ETHTOOL_GSET was successful, change config with
           ETHTOOL_SSET. A failure there is final (do not try
           ETHTOOL_SLINKSETTINGS).
      
      The interaction user/kernel via the new API requires a small
      ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link
      mode bitmaps. If kernel doesn't agree with user, it returns the bitmap
      length it is expecting from user as a negative length (and cmd field is
      0). When kernel and user agree, kernel returns valid info in all
      fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS).
      
      Data structure crossing user/kernel boundary is 32/64-bit
      agnostic. Converted internally to a legal kernel bitmap.
      
      The internal __ethtool_get_settings kernel helper will gradually be
      replaced by __ethtool_get_link_ksettings by the time the first
      "link_settings" drivers start to appear. So this patch doesn't change
      it, it will be removed before it needs to be changed.
      Signed-off-by: NDavid Decotigny <decot@googlers.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f1ac7a7
    • T
      net: Facility to report route quality of connected sockets · a87cb3e4
      Tom Herbert 提交于
      This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
      purpose is to allow an application to give feedback to the kernel about
      the quality of the network path for a connected socket. The value
      argument indicates the type of quality report. For this initial patch
      the only supported advice is a value of 1 which indicates "bad path,
      please reroute"-- the action taken by the kernel is to call
      dst_negative_advice which will attempt to choose a different ECMP route,
      reset the TX hash for flow label and UDP source port in encapsulation,
      etc.
      
      This facility should be useful for connected UDP sockets where only the
      application can provide any feedback about path quality. It could also
      be useful for TCP applications that have additional knowledge about the
      path outside of the normal TCP control loop.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a87cb3e4
    • D
      net: ipv6: Make address flushing on ifdown optional · f1705ec1
      David Ahern 提交于
      Currently, all ipv6 addresses are flushed when the interface is configured
      down, including global, static addresses:
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          << nothing; all addresses have been flushed>>
      
      Add a new sysctl to make this behavior optional. The new setting defaults to
      flush all addresses to maintain backwards compatibility. When the set global
      addresses with no expire times are not flushed on an admin down. The sysctl
      is per-interface or system-wide for all interfaces
      
          $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
      or
          $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
      
      Will keep addresses on eth1 on an admin down.
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
              inet6 2100:1::2/120 scope global tentative
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
                 valid_lft forever preferred_lft forever
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1705ec1
    • F
      tipc: fix null deref crash in compat config path · 619b1745
      Florian Westphal 提交于
      msg.dst_sk needs to be set up with a valid socket because some callbacks
      later derive the netns from it.
      
      Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation")
      Reported-by: NJon Maloy <maloy@donjonn.com>
      Bisected-by: NJon Maloy <maloy@donjonn.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      619b1745
    • J
      tipc: fix crash during node removal · d25a0125
      Jon Paul Maloy 提交于
      When the TIPC module is unloaded, we have identified a race condition
      that allows a node reference counter to go to zero and the node instance
      being freed before the node timer is finished with accessing it. This
      leads to occasional crashes, especially in multi-namespace environments.
      
      The scenario goes as follows:
      
      CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
      
      1:                                          if(!mod_timer())
      2: if (del_timer())
      3:   tipc_node_put()                                        // ref -> 1
      4: tipc_node_put()                                          // ref -> 0
      5:   kfree_rcu(node);
      6:                                               tipc_node_get(node)
      7:                                               // BOOM!
      
      We now clean up this functionality as follows:
      
      1) We remove the node pointer from the node lookup table before we
         attempt deactivating the timer. This way, we reduce the risk that
         tipc_node_find() may obtain a valid pointer to an instance marked
         for deletion; a harmless but undesirable situation.
      
      2) We use del_timer_sync() instead of del_timer() to safely deactivate
         the node timer without any risk that it might be reactivated by the
         timeout handler. There is no risk of deadlock here, since the two
         functions never touch the same spinlocks.
      
      3: We remove a pointless tipc_node_get() + tipc_node_put() from the
         timeout handler.
      Reported-by: NZhijiang Hu <huzhijiang@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d25a0125
    • J
      tipc: eliminate risk of finding to-be-deleted node instance · b170997a
      Jon Paul Maloy 提交于
      Although we have never seen it happen, we have identified the
      following problematic scenario when nodes are stopped and deleted:
      
      CPU0:                            CPU1:
      
      tipc_node_xxx()                                   //ref == 1
         tipc_node_put()                                //ref -> 0
                                       tipc_node_find() // node still in table
             tipc_node_delete()
               list_del_rcu(n. list)
                                       tipc_node_get()  //ref -> 1, bad
               kfree_rcu()
      
                                       tipc_node_put() //ref to 0 again.
                                       kfree_rcu()     // BOOM!
      
      We fix this by introducing use of the conditional kref_get_if_not_zero()
      instead of kref_get() in the function tipc_node_find(). This eliminates
      any risk of post-mortem access.
      Reported-by: NZhijiang Hu <huzhijiang@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b170997a
    • V
      net: dsa: drop vlan_getnext · 477b1845
      Vivien Didelot 提交于
      The VLAN GetNext operation is specific to some switches, and thus can be
      complicated to implement for some drivers.
      
      Remove the support for the vlan_getnext/port_pvid_get approach in favor
      of the generic and simpler port_vlan_dump function.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      477b1845
    • V
      net: dsa: add port_vlan_dump routine · 65aebfc0
      Vivien Didelot 提交于
      Similar to port_fdb_dump, add a port_vlan_dump function to DSA drivers
      which gets passed the switchdev VLAN object and callback.
      
      This function, if implemented, takes precedence over the soon legacy
      vlan_getnext/port_pvid_get approach.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65aebfc0
    • W
      net_sched: add network namespace support for tc actions · ddf97ccd
      WANG Cong 提交于
      Currently tc actions are stored in a per-module hashtable,
      therefore are visible to all network namespaces. This is
      probably the last part of the tc subsystem which is not
      aware of netns now. This patch makes them per-netns,
      several tc action API's need to be adjusted for this.
      
      The tc action API code is ugly due to historical reasons,
      we need to refactor that code in the future.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddf97ccd
    • W
      net_sched: prepare tcf_hashinfo_destroy() for netns support · 1d4150c0
      WANG Cong 提交于
      We only release the memory of the hashtable itself, not its
      entries inside. This is not a problem yet since we only call
      it in module release path, and module is refcount'ed by
      actions. This would be a problem after we move the per module
      hinfo into per netns in the latter patch.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d4150c0
  5. 25 2月, 2016 6 次提交
  6. 24 2月, 2016 1 次提交
新手
引导
客服 返回
顶部