1. 26 2月, 2016 10 次提交
    • D
      net: ethtool: add new ETHTOOL_xLINKSETTINGS API · 3f1ac7a7
      David Decotigny 提交于
      This patch defines a new ETHTOOL_GLINKSETTINGS/SLINKSETTINGS API,
      handled by the new get_link_ksettings/set_link_ksettings callbacks.
      This API provides support for most legacy ethtool_cmd fields, adds
      support for larger link mode masks (up to 4064 bits, variable length),
      and removes ethtool_cmd deprecated
      fields (transceiver/maxrxpkt/maxtxpkt).
      
      This API is deprecating the legacy ETHTOOL_GSET/SSET API and provides
      the following backward compatibility properties:
       - legacy ethtool with legacy drivers: no change, still using the
         get_settings/set_settings callbacks.
       - legacy ethtool with new get/set_link_ksettings drivers: the new
         driver callbacks are used, data internally converted to legacy
         ethtool_cmd. ETHTOOL_GSET will return only the 1st 32b of each link
         mode mask. ETHTOOL_SSET will fail if user tries to set the
         ethtool_cmd deprecated fields to
         non-0 (transceiver/maxrxpkt/maxtxpkt). A kernel warning is logged if
         driver sets higher bits.
       - future ethtool with legacy drivers: no change, still using the
         get_settings/set_settings callbacks, internally converted to new data
         structure. Deprecated fields (transceiver/maxrxpkt/maxtxpkt) will be
         ignored and seen as 0 from user space. Note that that "future"
         ethtool tool will not allow changes to these deprecated fields.
       - future ethtool with new drivers: direct call to the new callbacks.
      
      By "future" ethtool, what is meant is:
       - query: first try ETHTOOL_GLINKSETTINGS, and revert to ETHTOOL_GSET if
         fails
       - set: query first and remember which of ETHTOOL_GLINKSETTINGS or
         ETHTOOL_GSET was successful
         + if ETHTOOL_GLINKSETTINGS was successful, then change config with
           ETHTOOL_SLINKSETTINGS. A failure there is final (do not try
           ETHTOOL_SSET).
         + otherwise ETHTOOL_GSET was successful, change config with
           ETHTOOL_SSET. A failure there is final (do not try
           ETHTOOL_SLINKSETTINGS).
      
      The interaction user/kernel via the new API requires a small
      ETHTOOL_GLINKSETTINGS handshake first to agree on the length of the link
      mode bitmaps. If kernel doesn't agree with user, it returns the bitmap
      length it is expecting from user as a negative length (and cmd field is
      0). When kernel and user agree, kernel returns valid info in all
      fields (ie. link mode length > 0 and cmd is ETHTOOL_GLINKSETTINGS).
      
      Data structure crossing user/kernel boundary is 32/64-bit
      agnostic. Converted internally to a legal kernel bitmap.
      
      The internal __ethtool_get_settings kernel helper will gradually be
      replaced by __ethtool_get_link_ksettings by the time the first
      "link_settings" drivers start to appear. So this patch doesn't change
      it, it will be removed before it needs to be changed.
      Signed-off-by: NDavid Decotigny <decot@googlers.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f1ac7a7
    • T
      net: Facility to report route quality of connected sockets · a87cb3e4
      Tom Herbert 提交于
      This patch add the SO_CNX_ADVICE socket option (setsockopt only). The
      purpose is to allow an application to give feedback to the kernel about
      the quality of the network path for a connected socket. The value
      argument indicates the type of quality report. For this initial patch
      the only supported advice is a value of 1 which indicates "bad path,
      please reroute"-- the action taken by the kernel is to call
      dst_negative_advice which will attempt to choose a different ECMP route,
      reset the TX hash for flow label and UDP source port in encapsulation,
      etc.
      
      This facility should be useful for connected UDP sockets where only the
      application can provide any feedback about path quality. It could also
      be useful for TCP applications that have additional knowledge about the
      path outside of the normal TCP control loop.
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a87cb3e4
    • D
      net: ipv6: Make address flushing on ifdown optional · f1705ec1
      David Ahern 提交于
      Currently, all ipv6 addresses are flushed when the interface is configured
      down, including global, static addresses:
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          << nothing; all addresses have been flushed>>
      
      Add a new sysctl to make this behavior optional. The new setting defaults to
      flush all addresses to maintain backwards compatibility. When the set global
      addresses with no expire times are not flushed on an admin down. The sysctl
      is per-interface or system-wide for all interfaces
      
          $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
      or
          $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
      
      Will keep addresses on eth1 on an admin down.
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
              inet6 2100:1::2/120 scope global tentative
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
                 valid_lft forever preferred_lft forever
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1705ec1
    • F
      tipc: fix null deref crash in compat config path · 619b1745
      Florian Westphal 提交于
      msg.dst_sk needs to be set up with a valid socket because some callbacks
      later derive the netns from it.
      
      Fixes: 263ea09084d172d ("Revert "genl: Add genlmsg_new_unicast() for unicast message allocation")
      Reported-by: NJon Maloy <maloy@donjonn.com>
      Bisected-by: NJon Maloy <maloy@donjonn.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Acked-by Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      619b1745
    • J
      tipc: fix crash during node removal · d25a0125
      Jon Paul Maloy 提交于
      When the TIPC module is unloaded, we have identified a race condition
      that allows a node reference counter to go to zero and the node instance
      being freed before the node timer is finished with accessing it. This
      leads to occasional crashes, especially in multi-namespace environments.
      
      The scenario goes as follows:
      
      CPU0:(node_stop)                       CPU1:(node_timeout)  // ref == 2
      
      1:                                          if(!mod_timer())
      2: if (del_timer())
      3:   tipc_node_put()                                        // ref -> 1
      4: tipc_node_put()                                          // ref -> 0
      5:   kfree_rcu(node);
      6:                                               tipc_node_get(node)
      7:                                               // BOOM!
      
      We now clean up this functionality as follows:
      
      1) We remove the node pointer from the node lookup table before we
         attempt deactivating the timer. This way, we reduce the risk that
         tipc_node_find() may obtain a valid pointer to an instance marked
         for deletion; a harmless but undesirable situation.
      
      2) We use del_timer_sync() instead of del_timer() to safely deactivate
         the node timer without any risk that it might be reactivated by the
         timeout handler. There is no risk of deadlock here, since the two
         functions never touch the same spinlocks.
      
      3: We remove a pointless tipc_node_get() + tipc_node_put() from the
         timeout handler.
      Reported-by: NZhijiang Hu <huzhijiang@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d25a0125
    • J
      tipc: eliminate risk of finding to-be-deleted node instance · b170997a
      Jon Paul Maloy 提交于
      Although we have never seen it happen, we have identified the
      following problematic scenario when nodes are stopped and deleted:
      
      CPU0:                            CPU1:
      
      tipc_node_xxx()                                   //ref == 1
         tipc_node_put()                                //ref -> 0
                                       tipc_node_find() // node still in table
             tipc_node_delete()
               list_del_rcu(n. list)
                                       tipc_node_get()  //ref -> 1, bad
               kfree_rcu()
      
                                       tipc_node_put() //ref to 0 again.
                                       kfree_rcu()     // BOOM!
      
      We fix this by introducing use of the conditional kref_get_if_not_zero()
      instead of kref_get() in the function tipc_node_find(). This eliminates
      any risk of post-mortem access.
      Reported-by: NZhijiang Hu <huzhijiang@gmail.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b170997a
    • V
      net: dsa: drop vlan_getnext · 477b1845
      Vivien Didelot 提交于
      The VLAN GetNext operation is specific to some switches, and thus can be
      complicated to implement for some drivers.
      
      Remove the support for the vlan_getnext/port_pvid_get approach in favor
      of the generic and simpler port_vlan_dump function.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      477b1845
    • V
      net: dsa: add port_vlan_dump routine · 65aebfc0
      Vivien Didelot 提交于
      Similar to port_fdb_dump, add a port_vlan_dump function to DSA drivers
      which gets passed the switchdev VLAN object and callback.
      
      This function, if implemented, takes precedence over the soon legacy
      vlan_getnext/port_pvid_get approach.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65aebfc0
    • W
      net_sched: add network namespace support for tc actions · ddf97ccd
      WANG Cong 提交于
      Currently tc actions are stored in a per-module hashtable,
      therefore are visible to all network namespaces. This is
      probably the last part of the tc subsystem which is not
      aware of netns now. This patch makes them per-netns,
      several tc action API's need to be adjusted for this.
      
      The tc action API code is ugly due to historical reasons,
      we need to refactor that code in the future.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddf97ccd
    • W
      net_sched: prepare tcf_hashinfo_destroy() for netns support · 1d4150c0
      WANG Cong 提交于
      We only release the memory of the hashtable itself, not its
      entries inside. This is not a problem yet since we only call
      it in module release path, and module is refcount'ed by
      actions. This would be a problem after we move the per module
      hinfo into per netns in the latter patch.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d4150c0
  2. 25 2月, 2016 6 次提交
  3. 24 2月, 2016 2 次提交
  4. 23 2月, 2016 20 次提交
  5. 22 2月, 2016 2 次提交