1. 23 Aug, 2012 1 commit
    • net: remove delay at device dismantle · 0115e8e3
      Committed by Eric Dumazet
      I noticed an extra one-second delay in device dismantle, tracked down to
      a call to dst_dev_event() while some call_rcu() callbacks are still in RCU queues.
      
      These call_rcu() were posted by rt_free(struct rtable *rt) calls.
      
      We then wait a little (one full second) in netdev_wait_allrefs() before
      kicking NETDEV_UNREGISTER again.
      
      As the call_rcu() are now completed, dst_dev_event() can do the needed
      device swap on busy dst.
      
      To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
      after a rcu_barrier(), but outside of RTNL lock.
      
      Use NETDEV_UNREGISTER_FINAL with care!

      Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL.

      Also remove NETDEV_UNREGISTER_BATCH, as it's not used anymore after
      IP cache removal.
      
      With help from Gao feng
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 10 Aug, 2012 2 commits
  3. 30 Jul, 2012 1 commit
    • ipv6: fix incorrect route 'expires' value passed to userspace · 8253947e
      Committed by Li Wei
      When userspace uses RTM_GETROUTE to dump the route table and hits an
      already expired route entry, it always gets an 'expires' value (2147157)
      calculated based on INT_MAX.

      The cause of this problem is the following statement:
      	rt->dst.expires - jiffies < INT_MAX
      gcc promoted the type of both sides of '<' to unsigned long, thus
      a small negative value would be considered greater than INT_MAX.
      
      With the help of Eric Dumazet, do the out of bound checks in
      rtnl_put_cacheinfo(), _after_ conversion to clock_t.
      Signed-off-by: Li Wei <lw@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
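The promotion issue described in this commit can be reproduced in user space. The following is a minimal sketch, not the kernel fix itself: `expires` and `now` stand in for `rt->dst.expires` and `jiffies`, and both helper names are hypothetical.

```c
#include <limits.h>

/* Buggy form: the subtraction is done in unsigned long, so an already
 * expired entry (expires < now) wraps around to a huge value and the
 * comparison against INT_MAX comes out false. */
int delta_fits_buggy(unsigned long expires, unsigned long now)
{
	return expires - now < INT_MAX;
}

/* Signed form: reinterpret the difference as a signed long before
 * comparing, so a small negative delta stays small. */
int delta_fits_signed(unsigned long expires, unsigned long now)
{
	long delta = (long)(expires - now);

	return delta < INT_MAX;
}
```

With `expires = 100` and `now = 200`, the buggy form rejects the value as out of range while the signed form accepts it.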
  4. 28 Jul, 2012 1 commit
    • net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling · b1beb681
      Committed by Jiri Benc
      When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
      flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
      IFF_ALLMULTI bits in dev->gflags according to the passed value but
      do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
      from dev->flags.
      
      This can be easily triggered by doing:
      
      tcpdump -i eth0 &
      ip l s up eth0
      
      ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
      IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
      IFF_PROMISC in gflags.
      Reported-by: Max Matveev <makc@redhat.com>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
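The flag mix-up can be modeled in a few lines of user-space C. This is only a sketch under assumptions: `struct demo_dev`, the `DEMO_IFF_*` constants and both helpers are hypothetical stand-ins, and the second helper shows one plausible way to avoid the problem rather than the patch's actual code.

```c
/* Illustrative flag bits, defined locally to keep the sketch standalone. */
#define DEMO_IFF_UP       0x1u
#define DEMO_IFF_PROMISC  0x100u
#define DEMO_IFF_ALLMULTI 0x200u
#define DEMO_GMASK        (DEMO_IFF_PROMISC | DEMO_IFF_ALLMULTI)

/* Hypothetical stand-in for the two flag words of a net device. */
struct demo_dev {
	unsigned int flags;  /* effective state, e.g. promisc set by tcpdump */
	unsigned int gflags; /* what user space explicitly requested */
};

/* Problematic combination: bits outside ifi_change come from dev->flags,
 * so an internally set IFF_PROMISC leaks into the requested value. */
unsigned int combine_from_flags(const struct demo_dev *dev,
				unsigned int ifi_flags,
				unsigned int ifi_change)
{
	return (ifi_flags & ifi_change) | (dev->flags & ~ifi_change);
}

/* Alternative: take PROMISC/ALLMULTI from gflags, everything else
 * from flags, before merging in the requested changes. */
unsigned int combine_from_gflags(const struct demo_dev *dev,
				 unsigned int ifi_flags,
				 unsigned int ifi_change)
{
	unsigned int base = (dev->flags & ~DEMO_GMASK) |
			    (dev->gflags & DEMO_GMASK);

	return (ifi_flags & ifi_change) | (base & ~ifi_change);
}
```

With a device whose `flags` has `DEMO_IFF_PROMISC` set and whose `gflags` is zero, a request that only changes `DEMO_IFF_UP` yields `UP | PROMISC` from the first helper and plain `UP` from the second.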
  5. 23 Jul, 2012 1 commit
  6. 21 Jul, 2012 2 commits
  7. 15 Jul, 2012 1 commit
  8. 11 Jul, 2012 2 commits
  9. 30 Jun, 2012 1 commit
    • netlink: add netlink_kernel_cfg parameter to netlink_kernel_create · a31f2d17
      Committed by Pablo Neira Ayuso
      This patch adds the following structure:
      
      struct netlink_kernel_cfg {
              unsigned int    groups;
              void            (*input)(struct sk_buff *skb);
              struct mutex    *cb_mutex;
      };
      
      That can be passed to netlink_kernel_create to set optional configurations
      for netlink kernel sockets.
      
      I've populated this structure by looking for NULL and zero parameters in the
      existing code. The remaining parameters that always need to be set are still
      left in the original interface.

      The structure holds the optional parameters for netlink socket creation, which
      allows easy extensibility of this interface in the future.
      
      This patch also adapts all callers to use this new interface.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
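The design idea here, keeping required arguments positional and moving optional ones into a config structure, can be illustrated outside the kernel. The sketch below is a user-space analogy only; `demo_cfg`, `demo_sock` and `demo_create` are hypothetical names and do not reflect the real `netlink_kernel_create` signature.

```c
#include <stddef.h>

/* Hypothetical mirror of an optional-parameter structure. */
struct demo_cfg {
	unsigned int groups;
	void (*input)(const char *msg);
};

/* Hypothetical socket object built from required + optional settings. */
struct demo_sock {
	int unit;
	unsigned int groups;
	void (*input)(const char *msg);
};

/* The required argument (unit) stays positional. Optional settings come
 * from cfg; in this sketch a NULL cfg simply means "all defaults". */
struct demo_sock demo_create(int unit, const struct demo_cfg *cfg)
{
	struct demo_sock sk = { .unit = unit, .groups = 0, .input = NULL };

	if (cfg) {
		sk.groups = cfg->groups;
		sk.input = cfg->input;
	}
	return sk;
}
```

Adding a new optional field later only touches the structure and the callers that care about it, which is the extensibility benefit the commit message points to.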
  10. 28 Jun, 2012 1 commit
  11. 16 May, 2012 1 commit
  12. 16 Apr, 2012 4 commits
    • net: rtnetlink notify events for FDB NTF_SELF adds and deletes · 3ff661c3
      Committed by John Fastabend
      It is useful to be able to monitor for FDB events in user space.
      This patch adds support to generate netlink events when a change
      is made to a device supporting the FDB ops.
      
      This brings embedded switches in line with the SW net/bridge, which
      also triggers events on FDB updates.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add fdb generic dump routine · d83b0603
      Committed by John Fastabend
      This adds a generic dump routine drivers can call. It
      should be sufficient to handle any bridging model that
      uses the unicast address list. This should be most SR-IOV
      enabled NICs.
      
      v2: return error on nlmsg_put and use -EMSGSIZE instead
          of -ENOMEM; this is in line with other usages
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add generic PF_BRIDGE:RTM_ FDB hooks · 77162022
      Committed by John Fastabend
      This adds two new flags NTF_MASTER and NTF_SELF that can
      now be used to specify where PF_BRIDGE netlink commands should
      be sent. NTF_MASTER sends the commands to the 'dev->master'
      device for parsing. Typically this will be the linux net/bridge,
      or open-vswitch devices. Also without any flags set the command
      will be handled by the master device as well so that current user
      space tools continue to work as expected.
      
      The NTF_SELF flag will push the PF_BRIDGE commands to the
      device. In the basic example below the commands are then parsed
      and programmed in the embedded bridge.
      
      Note that if both the NTF_SELF and NTF_MASTER bits are set, the
      command will be sent to both 'dev->master' and 'dev'. This allows
      user space to easily keep the embedded bridge and software bridge
      in sync.
      
      There is a slight complication in the case with both flags set
      when an error occurs. To resolve this the rtnl handler clears
      the NTF_ flag in the netlink ack to indicate which sets completed
      successfully. The add/del handlers will abort as soon as any
      error occurs.
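The flag handling described above can be sketched in user-space C. This is a simplified model, not the rtnl handler: the `DEMO_NTF_*` bits are local copies for illustration, the callback type and helper names are hypothetical, and calling the master handler before the device handler is an assumption of this sketch.

```c
/* Local flag bits for the sketch. */
#define DEMO_NTF_SELF   0x02u
#define DEMO_NTF_MASTER 0x04u

/* Hypothetical handler type: returns 0 on success, negative on error. */
typedef int (*demo_fdb_op)(void);

/* Dispatch one add/del request. Each flag is cleared once its target
 * completed, so the bits still set after an error show what was not
 * done. Processing aborts on the first error. */
int demo_fdb_dispatch(unsigned int *flags, demo_fdb_op master_op,
		      demo_fdb_op self_op)
{
	int err;

	/* No flags at all behaves like NTF_MASTER for older user space. */
	if (*flags == 0 || (*flags & DEMO_NTF_MASTER)) {
		err = master_op();
		if (err)
			return err;
		*flags &= ~DEMO_NTF_MASTER;
	}

	if (*flags & DEMO_NTF_SELF) {
		err = self_op();
		if (err)
			return err;
		*flags &= ~DEMO_NTF_SELF;
	}
	return 0;
}

/* Trivial handlers used to exercise the dispatcher. */
int demo_op_ok(void)   { return 0; }
int demo_op_fail(void) { return -1; }
```

If both bits are set and only the device handler fails, the caller gets the error back with `DEMO_NTF_SELF` still set and `DEMO_NTF_MASTER` cleared.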
      
      To support this, new net device ops were added to call into
      the device and the existing bridging code was refactored
      to use these. There should be no required changes in user space
      to support the current bridge behavior.

      A basic setup with an SR-IOV enabled NIC looks like this:
      
                veth0  veth2
                  |      |
                ------------
                |  bridge0 |   <---- software bridging
                ------------
                     /
                     /
        ethx.y      ethx
          VF         PF
           \         \          <---- propagate FDB entries to HW
           \         \
        --------------------
        |  Embedded Bridge |    <---- hardware offloaded switching
        --------------------
      
      In this case the embedded bridge must be managed to allow 'veth0'
      to communicate with 'ethx.y' correctly. At present drivers managing
      the embedded bridge either send frames onto the network which
      then get dropped by the switch OR the embedded bridge will flood
      these frames. With this patch we have a mechanism to manage the
      embedded bridge correctly from user space. This example is specific
      to SR-IOV but replacing the VF with another PF or dropping this
      into the DSA framework generates similar management issues.
      
      Example session using the 'br'[1] tool to add, dump and then
      delete a MAC address with a new "embedded" option and an enabled
      ixgbe driver:
      
      # br fdb add 22:35:19:ac:60:59 dev eth3
      # br fdb
      port    mac addr                flags
      veth0   22:35:19:ac:60:58       static
      veth0   9a:5f:81:f7:f6:ec       local
      eth3    00:1b:21:55:23:59       local
      eth3    22:35:19:ac:60:59       static
      veth0   22:35:19:ac:60:57       static
      # br fdb add 22:35:19:ac:60:59 embedded dev eth3
      # br fdb
      port    mac addr                flags
      veth0   22:35:19:ac:60:58       static
      veth0   9a:5f:81:f7:f6:ec       local
      eth3    00:1b:21:55:23:59       local
      eth3    22:35:19:ac:60:59       static
      veth0   22:35:19:ac:60:57       static
      eth3    22:35:19:ac:60:59       local embedded
      # br fdb del 22:35:19:ac:60:59 embedded dev eth3
      
      I only added a couple of lines to 'br' to set the flags correctly. In
      my opinion, the merit of this patch is that embedded and SW bridges
      can now both be modeled correctly in user space using very nearly
      the same message passing.
      
      [1] 'br' tool was published as an RFC here and will be renamed 'bridge'
          http://patchwork.ozlabs.org/patch/117664/
      
      Thanks to Jamal Hadi Salim, Stephen Hemminger and Ben Hutchings for
      valuable feedback, suggestions, and review.
      
      v2: fixed api descriptions and error case with both NTF_SELF and
          NTF_MASTER set plus updated patch description.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cleanup unsigned to unsigned int · 95c96174
      Committed by Eric Dumazet
      Use of "unsigned int" is preferred to bare "unsigned" in net tree.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 14 Apr, 2012 1 commit
  14. 02 Apr, 2012 2 commits
  15. 29 Mar, 2012 1 commit
  16. 05 Mar, 2012 1 commit
  17. 01 Mar, 2012 1 commit
  18. 27 Feb, 2012 1 commit
  19. 22 Feb, 2012 1 commit
    • rtnetlink: Fix problem with buffer allocation · 115c9b81
      Committed by Greg Rose
      Implement a new netlink attribute type IFLA_EXT_MASK.  The mask
      is a 32 bit value that can be used to indicate to the kernel that
      certain extended ifinfo values are requested by the user application.
      At this time the only mask value defined is RTEXT_FILTER_VF to
      indicate that the user wants the ifinfo dump to send information
      about the VFs belonging to the interface.
      
      This patch fixes a bug in which certain applications do not have
      large enough buffers to accommodate the extra information returned
      by the kernel with large numbers of SR-IOV virtual functions.
      Those applications will not send the new netlink attribute with
      the interface info dump request netlink messages so they will
      not get unexpectedly large request buffers returned by the kernel.
      
      Modifies the rtnl_calcit function to traverse the list of net
      devices and compute the minimum buffer size that can hold the
      info dumps of all matching devices based upon the filter passed
      in via the new netlink attribute filter mask.  If no filter
      mask is sent then the buffer allocation defaults to NLMSG_GOODSIZE.
      
      With this change it is possible to add yet to be defined netlink
      attributes to the dump request which should make it fairly extensible
      in the future.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
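The sizing rule described in this commit can be sketched as follows. All byte sizes are made-up placeholders, `DEMO_GOODSIZE` merely stands in for NLMSG_GOODSIZE, and the structure and function names are hypothetical; only the shape of the calculation follows the commit text.

```c
#include <stddef.h>

#define DEMO_RTEXT_FILTER_VF (1u << 0) /* local stand-in for the VF filter bit */
#define DEMO_GOODSIZE        4096u     /* placeholder default buffer size */
#define DEMO_BASE_SIZE       1024u     /* placeholder per-link size, no VF info */
#define DEMO_VF_SIZE         128u      /* placeholder size of one VF record */

/* Hypothetical minimal view of a net device. */
struct demo_link {
	unsigned int num_vfs;
};

/* Size of one link's info dump for a given filter mask. */
size_t demo_link_size(const struct demo_link *l, unsigned int ext_mask)
{
	size_t size = DEMO_BASE_SIZE;

	if (ext_mask & DEMO_RTEXT_FILTER_VF)
		size += (size_t)l->num_vfs * DEMO_VF_SIZE;
	return size;
}

/* Smallest buffer that can hold the dump of every link. Without a
 * filter mask the default size is used unchanged. */
size_t demo_min_dump_size(const struct demo_link *links, size_t n,
			  unsigned int ext_mask)
{
	size_t min_size = DEMO_GOODSIZE;
	size_t i;

	if (!ext_mask)
		return min_size;

	for (i = 0; i < n; i++) {
		size_t s = demo_link_size(&links[i], ext_mask);

		if (s > min_size)
			min_size = s;
	}
	return min_size;
}
```

Applications that never send the mask keep getting the default-sized buffer, while those that ask for VF information get a buffer sized for the largest matching device.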
  20. 27 Jan, 2012 1 commit
  21. 06 Jan, 2012 1 commit
  22. 14 Dec, 2011 1 commit
  23. 17 Oct, 2011 1 commit
    • if_link: Add additional parameter to IFLA_VF_INFO for spoof checking · 5f8444a3
      Committed by Greg Rose
      Add configuration setting for drivers to turn spoof checking on or off
      for discrete VFs.
      
      v2 - Fix indentation problem, wrap the ifla_vf_info structure in
           #ifdef __KERNEL__ to prevent user space from accessing and
           change function parameter for the spoof check setting netdev
           op from u8 to bool.
      v3 - Preset spoof check setting to -1 so that user space tools such
           as ip can detect that the driver didn't report a spoofcheck
           setting.  Prevents incorrect display of spoof check settings
           for drivers that don't report it.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  24. 11 Aug, 2011 1 commit
  25. 02 Jul, 2011 1 commit
    • rtnl: provide link dump consistency info · 4e985ada
      Committed by Thomas Graf
      This patch adds a change sequence counter to each net namespace
      which is bumped whenever a netdevice is added or removed from
      the list. If such a change occurred while a link dump took place,
      the dump will have the NLM_F_DUMP_INTR flag set in the first
      message which has been interrupted and in all subsequent messages
      of the same dump.
      
      Note that links may still be modified or renamed while a dump is
      taking place but we can guarantee for userspace to receive a
      complete list of links and not miss any.
      
      Testing:
      I have added 500 VLAN netdevices to make sure the dump is split
      over multiple messages. Then while continuously dumping links in
      one process I also continuously deleted and re-added a dummy
      netdevice in another process. Multiple dumps per second
      had the NLM_F_DUMP_INTR flag set.

      I guess we can wait for Johannes' patch to hit net-next via the
      wireless tree.  I just wanted to give this some testing right away.
      Signed-off-by: Thomas Graf <tgraf@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
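The consistency check can be modeled with a generation counter. The sketch below is a simplified user-space model of the behavior described in the commit message, not the kernel helper: the structures and functions are hypothetical, and `DEMO_NLM_F_DUMP_INTR` is a local stand-in for the real flag.

```c
#define DEMO_NLM_F_DUMP_INTR 0x10u /* local stand-in for NLM_F_DUMP_INTR */

/* Hypothetical per-namespace state: bumped on every device add/remove. */
struct demo_net {
	unsigned int dev_base_seq;
};

/* Hypothetical per-dump state: the counter value seen when the dump began. */
struct demo_dump {
	unsigned int start_seq;
};

/* Remember the list generation at the start of a dump. */
void demo_dump_start(struct demo_dump *d, const struct demo_net *net)
{
	d->start_seq = net->dev_base_seq;
}

/* Flags for the next message of the dump: once the list has changed,
 * this and every later message carries the interrupted flag. */
unsigned int demo_msg_flags(const struct demo_dump *d,
			    const struct demo_net *net)
{
	return net->dev_base_seq != d->start_seq ? DEMO_NLM_F_DUMP_INTR : 0u;
}
```

A reader that sees the flag on any message can discard the partial result and restart the dump.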
  26. 10 Jun, 2011 1 commit
    • rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Committed by Greg Rose
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  27. 26 May, 2011 1 commit
    • net: hold rtnl again in dump callbacks · 2907c35f
      Committed by Eric Dumazet
      Commit e67f88dd (dont hold rtnl mutex during netlink dump callbacks)
      missed the fact that rtnl_fill_ifinfo() must be called with rtnl held.

      Because of possible deadlocks between two mutexes (cb_mutex and rtnl),
      it's not easy to solve this problem, so revert this part of the patch.
      
      It also forgot one rcu_read_unlock() in FIB dump_rules()
      
      Add one ASSERT_RTNL() in rtnl_fill_ifinfo() to remind us the rule.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 23 May, 2011 1 commit
  29. 11 May, 2011 1 commit
  30. 10 May, 2011 1 commit
    • net: use batched device unregister in veth and macvlan · 226bd341
      Committed by Eric Dumazet
      veth devices don't use the batched device unregisters yet.

      Since veth devices come in pairs, it makes sense to use a batch of two
      unregisters. This roughly divides dismantle time by two.
      
      Fix this by changing dellink() callers to always provide a non NULL
      head. (Idea from Michał Mirosław)
      
      This patch also handles the macvlan case: we now dismantle all macvlans on
      top of a lower dev at once.
      Reported-by: Alex Bligh <alex@alex.org.uk>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Michał Mirosław <mirqus@gmail.com>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ben Greear <greearb@candelatech.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. 06 May, 2011 1 commit
  32. 03 May, 2011 1 commit
    • net: dont hold rtnl mutex during netlink dump callbacks · e67f88dd
      Committed by Eric Dumazet
      Four years ago, Patrick made a change to hold rtnl mutex during netlink
      dump callbacks.
      
      I believe it was a wrong move. This slows down concurrent dumps, making
      good old /proc/net/ files faster than rtnetlink in some situations.
      
      This occurred to me because one "ip link show dev ..." was _very_ slow
      on a workload adding/removing network devices in background.
      
      All dump callbacks are able to use RCU locking now, so this patch is
      roughly a revert of commits:
      
      1c2d670f : [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacks
      6313c1e0 : [RTNETLINK]: Remove unnecessary locking in dump callbacks
      
      This lets writers fight for the rtnl mutex while readers go full speed.

      It also takes care of phonet: phonet_route_get() is now called from an RCU
      read section, so I renamed it to phonet_route_get_rcu().
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Remi Denis-Courmont <remi.denis-courmont@nokia.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  33. 31 Mar, 2011 1 commit