1. 16 9月, 2015 1 次提交
  2. 01 9月, 2015 1 次提交
    • D
      tcp: use dctcp if enabled on the route to the initiator · c3a8d947
      Daniel Borkmann 提交于
      Currently, the following case doesn't use DCTCP, even if it should:
      A responder has f.e. Cubic as system wide default, but for a specific
      route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
      initiating host then uses DCTCP as congestion control, but since the
      initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
      and we have to fall back to Reno after 3WHS completes.
      
      We were thinking on how to solve this in a minimal, non-intrusive
      way without bloating tcp_ecn_create_request() needlessly: lets cache
      the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
      is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
      contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
      to only do a single metric feature lookup inside tcp_ecn_create_request().
      
      Joint work with Florian Westphal.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3a8d947
  3. 22 7月, 2015 1 次提交
  4. 16 7月, 2015 2 次提交
  5. 09 7月, 2015 1 次提交
    • D
      rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver · 4f7d2cdf
      Daniel Borkmann 提交于
      Jason Gunthorpe reported that since commit c02db8c6 ("rtnetlink: make
      SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
      anymore with respect to their policy, that is, ifla_vfinfo_policy[].
      
      Before, they were part of ifla_policy[], but they have been nested since
      placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
      which is another nested attribute for the actual VF attributes such as
      IFLA_VF_MAC, IFLA_VF_VLAN, etc.
      
      Despite the policy being split out from ifla_policy[] in this commit,
      it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
      testing for struct nlattr, but it doesn't know about the data context and
      their requirements.
      
      Fix, on top of Jason's initial work, does 1) parsing of the attributes
      with the right policy, and 2) using the resulting parsed attribute table
      from 1) instead of the nla_for_each_nested() loop (just like we used to
      do when still part of ifla_policy[]).
      
      Reference: http://thread.gmane.org/gmane.linux.network/368913
      Fixes: c02db8c6 ("rtnetlink: make SR-IOV VF interface symmetric")
      Reported-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Cc: Greg Rose <gregory.v.rose@intel.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Rony Efraim <ronye@mellanox.com>
      Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NVlad Zolotarov <vladz@cloudius-systems.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f7d2cdf
  6. 23 6月, 2015 1 次提交
    • S
      switchdev; add VLAN support for port's bridge_getlink · 7d4f8d87
      Scott Feldman 提交于
      One more missing piece of the puzzle.  Add vlan dump support to switchdev
      port's bridge_getlink.  iproute2 "bridge vlan show" cmd already knows how
      to show the vlans installed on the bridge and the device , but (until now)
      no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
      msg.  Before this patch, "bridge vlan show":
      
      	$ bridge -c vlan show
      	port    vlan ids
      	sw1p1    30-34			<< bridge side vlans
      		 57
      
      	sw1p1				<< device side vlans (missing)
      
      	sw1p2    57
      
      	sw1p2
      
      	sw1p3
      
      	sw1p4
      
      	br0     None
      
      (When the port is bridged, the output repeats the vlan list for the vlans
      on the bridge side of the port and the vlans on the device side of the
      port.  The listing above show no vlans for the device side even though they
      are installed).
      
      After this patch:
      
      	$ bridge -c vlan show
      	port    vlan ids
      	sw1p1    30-34			<< bridge side vlan
      		 57
      
      	sw1p1    30-34			<< device side vlans
      		 57
      		 3840 PVID
      
      	sw1p2    57
      
      	sw1p2    57
      		 3840 PVID
      
      	sw1p3    3842 PVID
      
      	sw1p4    3843 PVID
      
      	br0     None
      
      I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func.
      switchdev support adds an obj dump for VLAN objects, using the same
      call-back scheme as FDB dump.  Support included for both compressed and
      un-compressed vlan dumps.
      Signed-off-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d4f8d87
  7. 16 6月, 2015 1 次提交
  8. 18 5月, 2015 1 次提交
    • N
      rtnl/bond: don't send rtnl msg for unregistered iface · ed2a80ab
      Nicolas Dichtel 提交于
      Before the patch, the command 'ip link add bond2 type bond mode 802.3ad'
      causes the kernel to send a rtnl message for the bond2 interface, with an
      ifindex 0.
      
      'ip monitor' shows:
      0: bond2: <BROADCAST,MULTICAST,MASTER> mtu 1500 state DOWN group default
          link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
      9: bond2@NONE: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN group default
          link/ether ea:3e:1f:53:92:7b brd ff:ff:ff:ff:ff:ff
      [snip]
      
      The patch fixes the spotted bug by checking in bond driver if the interface
      is registered before calling the notifier chain.
      It also adds a check in rtmsg_ifinfo() to prevent this kind of bug in the
      future.
      
      Fixes: d4261e56 ("bonding: create netlink event when bonding option is changed")
      CC: Jiri Pirko <jiri@resnulli.us>
      Reported-by: NJulien Meunier <julien.meunier@6wind.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed2a80ab
  9. 14 5月, 2015 1 次提交
  10. 13 5月, 2015 2 次提交
  11. 10 5月, 2015 1 次提交
  12. 30 4月, 2015 1 次提交
    • N
      bridge/nl: remove wrong use of NLM_F_MULTI · 46c264da
      Nicolas Dichtel 提交于
      NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
      it is sent only at the end of a dump.
      
      Libraries like libnl will wait forever for NLMSG_DONE.
      
      Fixes: e5a55a89 ("net: create generic bridge ops")
      Fixes: 815cccbf ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
      CC: John Fastabend <john.r.fastabend@intel.com>
      CC: Sathya Perla <sathya.perla@emulex.com>
      CC: Subbu Seetharaman <subbu.seetharaman@emulex.com>
      CC: Ajit Khaparde <ajit.khaparde@emulex.com>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: intel-wired-lan@lists.osuosl.org
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Scott Feldman <sfeldma@gmail.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: bridge@lists.linux-foundation.org
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46c264da
  13. 11 4月, 2015 2 次提交
  14. 10 4月, 2015 1 次提交
  15. 03 4月, 2015 1 次提交
  16. 25 3月, 2015 2 次提交
    • W
      net: allow to delete a whole device group · 66400d54
      WANG Cong 提交于
      With dev group, we can change a batch of net devices,
      so we should allow to delete them together too.
      
      Group 0 is not allowed to be deleted since it is
      the default group.
      
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66400d54
    • W
      net: use for_each_netdev_safe() in rtnl_group_changelink() · d079535d
      WANG Cong 提交于
      In case we move the whole dev group to another netns,
      we should call for_each_netdev_safe(), otherwise we get
      a soft lockup:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ip:798]
       irq event stamp: 255424
       hardirqs last  enabled at (255423): [<ffffffff81a2aa95>] restore_args+0x0/0x30
       hardirqs last disabled at (255424): [<ffffffff81a2ad5a>] apic_timer_interrupt+0x6a/0x80
       softirqs last  enabled at (255422): [<ffffffff81079ebc>] __do_softirq+0x2c1/0x3a9
       softirqs last disabled at (255417): [<ffffffff8107a190>] irq_exit+0x41/0x95
       CPU: 0 PID: 798 Comm: ip Not tainted 4.0.0-rc4+ #881
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8800d1b88000 ti: ffff880119530000 task.ti: ffff880119530000
       RIP: 0010:[<ffffffff810cad11>]  [<ffffffff810cad11>] debug_lockdep_rcu_enabled+0x28/0x30
       RSP: 0018:ffff880119533778  EFLAGS: 00000246
       RAX: ffff8800d1b88000 RBX: 0000000000000002 RCX: 0000000000000038
       RDX: 0000000000000000 RSI: ffff8800d1b888c8 RDI: ffff8800d1b888c8
       RBP: ffff880119533778 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 000000000000b5c2 R12: 0000000000000246
       R13: ffff880119533708 R14: 00000000001d5a40 R15: ffff88011a7d5a40
       FS:  00007fc01315f740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00007f367a120988 CR3: 000000011849c000 CR4: 00000000000007f0
       Stack:
        ffff880119533798 ffffffff811ac868 ffffffff811ac831 ffffffff811ac828
        ffff8801195337c8 ffffffff811ac8c9 ffff8801195339b0 ffff8801197633e0
        0000000000000000 ffff8801195339b0 ffff8801195337d8 ffffffff811ad2d7
       Call Trace:
        [<ffffffff811ac868>] rcu_read_lock+0x37/0x6e
        [<ffffffff811ac831>] ? rcu_read_unlock+0x5f/0x5f
        [<ffffffff811ac828>] ? rcu_read_unlock+0x56/0x5f
        [<ffffffff811ac8c9>] __fget+0x2a/0x7a
        [<ffffffff811ad2d7>] fget+0x13/0x15
        [<ffffffff811be732>] proc_ns_fget+0xe/0x38
        [<ffffffff817c7714>] get_net_ns_by_fd+0x11/0x59
        [<ffffffff817df359>] rtnl_link_get_net+0x33/0x3e
        [<ffffffff817df3d7>] do_setlink+0x73/0x87b
        [<ffffffff810b28ce>] ? trace_hardirqs_off+0xd/0xf
        [<ffffffff81a2aa95>] ? retint_restore_args+0xe/0xe
        [<ffffffff817e0301>] rtnl_newlink+0x40c/0x699
        [<ffffffff817dffe0>] ? rtnl_newlink+0xeb/0x699
        [<ffffffff81a29246>] ? _raw_spin_unlock+0x28/0x33
        [<ffffffff8143ed1e>] ? security_capable+0x18/0x1a
        [<ffffffff8107da51>] ? ns_capable+0x4d/0x65
        [<ffffffff817de5ce>] rtnetlink_rcv_msg+0x181/0x194
        [<ffffffff817de407>] ? rtnl_lock+0x17/0x19
        [<ffffffff817de407>] ? rtnl_lock+0x17/0x19
        [<ffffffff817de44d>] ? __rtnl_unlock+0x17/0x17
        [<ffffffff818327c6>] netlink_rcv_skb+0x4d/0x93
        [<ffffffff817de42f>] rtnetlink_rcv+0x26/0x2d
        [<ffffffff81830f18>] netlink_unicast+0xcb/0x150
        [<ffffffff8183198e>] netlink_sendmsg+0x501/0x523
        [<ffffffff8115cba9>] ? might_fault+0x59/0xa9
        [<ffffffff817b5398>] ? copy_from_user+0x2a/0x2c
        [<ffffffff817b7b74>] sock_sendmsg+0x34/0x3c
        [<ffffffff817b7f6d>] ___sys_sendmsg+0x1b8/0x255
        [<ffffffff8115c5eb>] ? handle_pte_fault+0xbd5/0xd4a
        [<ffffffff8100a2b0>] ? native_sched_clock+0x35/0x37
        [<ffffffff8109e94b>] ? sched_clock_local+0x12/0x72
        [<ffffffff8109eb9c>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810cadbf>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff811ac1d8>] ? __fcheck_files+0x4c/0x58
        [<ffffffff811ac946>] ? __fget_light+0x2d/0x52
        [<ffffffff817b8adc>] __sys_sendmsg+0x42/0x60
        [<ffffffff817b8b0c>] SyS_sendmsg+0x12/0x1c
        [<ffffffff81a29e32>] system_call_fastpath+0x12/0x17
      
      Fixes: e7ed828f ("netlink: support setting devgroup parameters")
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d079535d
  17. 19 3月, 2015 1 次提交
  18. 11 3月, 2015 1 次提交
  19. 01 3月, 2015 3 次提交
    • E
      net: do not use rcu in rtnl_dump_ifinfo() · cac5e65e
      Eric Dumazet 提交于
      We did a failed attempt in the past to only use rcu in rtnl dump
      operations (commit e67f88dd "net: dont hold rtnl mutex during
      netlink dump callbacks")
      
      Now that dumps are holding RTNL anyway, there is no need to also
      use rcu locking, as it forbids any scheduling ability, like
      GFP_KERNEL allocations that controlling path should use instead
      of GFP_ATOMIC whenever possible.
      
      This should fix following splat Cong Wang reported :
      
       [ INFO: suspicious RCU usage. ]
       3.19.0+ #805 Tainted: G        W
      
       include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 1, debug_locks = 0
       2 locks held by ip/771:
        #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff8182b8f4>] netlink_dump+0x21/0x26c
        #1:  (rcu_read_lock){......}, at: [<ffffffff817d785b>] rcu_read_lock+0x0/0x6e
      
       stack backtrace:
       CPU: 3 PID: 771 Comm: ip Tainted: G        W       3.19.0+ #805
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000001 ffff8800d51e7718 ffffffff81a27457 0000000029e729e6
        ffff8800d6108000 ffff8800d51e7748 ffffffff810b539b ffffffff820013dd
        00000000000001c8 0000000000000000 ffff8800d7448088 ffff8800d51e7758
       Call Trace:
        [<ffffffff81a27457>] dump_stack+0x4c/0x65
        [<ffffffff810b539b>] lockdep_rcu_suspicious+0x107/0x110
        [<ffffffff8109796f>] rcu_preempt_sleep_check+0x45/0x47
        [<ffffffff8109e457>] ___might_sleep+0x1d/0x1cb
        [<ffffffff8109e67d>] __might_sleep+0x78/0x80
        [<ffffffff814b9b1f>] idr_alloc+0x45/0xd1
        [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff814b9f9d>] ? idr_for_each+0x53/0x101
        [<ffffffff817c1383>] alloc_netid+0x61/0x69
        [<ffffffff817c14c3>] __peernet2id+0x79/0x8d
        [<ffffffff817c1ab7>] peernet2id+0x13/0x1f
        [<ffffffff817d8673>] rtnl_fill_ifinfo+0xa8d/0xc20
        [<ffffffff810b17d9>] ? __lock_is_held+0x39/0x52
        [<ffffffff817d894f>] rtnl_dump_ifinfo+0x149/0x213
        [<ffffffff8182b9c2>] netlink_dump+0xef/0x26c
        [<ffffffff8182bcba>] netlink_recvmsg+0x17b/0x2c5
        [<ffffffff817b0adc>] __sock_recvmsg+0x4e/0x59
        [<ffffffff817b1b40>] sock_recvmsg+0x3f/0x51
        [<ffffffff817b1f9a>] ___sys_recvmsg+0xf6/0x1d9
        [<ffffffff8115dc67>] ? handle_pte_fault+0x6e1/0xd3d
        [<ffffffff8100a3a0>] ? native_sched_clock+0x35/0x37
        [<ffffffff8109f45b>] ? sched_clock_local+0x12/0x72
        [<ffffffff8109f6ac>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff811abde8>] ? __fcheck_files+0x4c/0x58
        [<ffffffff811ac556>] ? __fget_light+0x2d/0x52
        [<ffffffff817b376f>] __sys_recvmsg+0x42/0x60
        [<ffffffff817b379f>] SyS_recvmsg+0x12/0x1c
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reported-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cac5e65e
    • E
      net: Verify permission to link_net in newlink · 06615bed
      Eric W. Biederman 提交于
      When applicable verify that the caller has permisson to the underlying
      network namespace for a newly created network device.
      
      Similary checks exist for the network namespace a network device will
      be created in.
      
      Fixes: 317f4810 ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06615bed
    • E
      net: Verify permission to dest_net in newlink · 505ce415
      Eric W. Biederman 提交于
      When applicable verify that the caller has permision to create a
      network device in another network namespace.  This check is already
      present when moving a network device between network namespaces in
      setlink so all that is needed is to duplicate that check in newlink.
      
      This change almost backports cleanly, but there are context conflicts
      as the code that follows was added in v4.0-rc1
      
      Fixes: b51642f6 net: Enable a userns root rtnl calls that are safe for unprivilged users
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      505ce415
  20. 25 2月, 2015 1 次提交
  21. 16 2月, 2015 1 次提交
  22. 08 2月, 2015 1 次提交
    • D
      rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY · 364d5716
      Daniel Borkmann 提交于
      ifla_vf_policy[] is wrong in advertising its individual member types as
      NLA_BINARY since .type = NLA_BINARY in combination with .len declares the
      len member as *max* attribute length [0, len].
      
      The issue is that when do_setvfinfo() is being called to set up a VF
      through ndo handler, we could set corrupted data if the attribute length
      is less than the size of the related structure itself.
      
      The intent is exactly the opposite, namely to make sure to pass at least
      data of minimum size of len.
      
      Fixes: ebc08a6f ("rtnetlink: Add VF config code to rtnetlink")
      Cc: Mitch Williams <mitch.a.williams@intel.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      364d5716
  23. 05 2月, 2015 1 次提交
  24. 02 2月, 2015 1 次提交
  25. 30 1月, 2015 1 次提交
    • N
      rtnetlink: pass link_net to the newlink handler · 7b4ce694
      Nicolas Dichtel 提交于
      When IFLA_LINK_NETNSID is used, the netdevice should be built in this link netns
      and moved at the end to another netns (pointed by the socket netns or
      IFLA_NET_NS_[PID|FD]).
      
      Existing user of the newlink handler will use the netns argument (src_net) to
      find a link netdevice or to check some other information into the link netns.
      For example, to find a netdevice, two information are required: an ifindex
      (usually from IFLA_LINK) and a netns (this link netns).
      
      Note: when using IFLA_LINK_NETNSID and IFLA_NET_NS_[PID|FD], a user may create a
      netdevice that stands in netnsX and with its link part in netnsY, by sending a
      rtnl message from netnsZ.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b4ce694
  26. 29 1月, 2015 1 次提交
  27. 24 1月, 2015 1 次提交
  28. 20 1月, 2015 2 次提交
  29. 19 1月, 2015 2 次提交
  30. 18 1月, 2015 2 次提交
    • J
      netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg 提交于
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      
      This makes the very common pattern of
      
        if (genlmsg_end(...) < 0) { ... }
      
      be a whole bunch of dead code. Many places also simply do
      
        return nlmsg_end(...);
      
      and the caller is expected to deal with it.
      
      This also commonly (at least for me) causes errors, because it is very
      common to write
      
        if (my_function(...))
          /* error condition */
      
      and if my_function() does "return nlmsg_end()" this is of course wrong.
      
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      
      -	return nlmsg_end(...);
      +	nlmsg_end(...);
      +	return 0;
      
      I could have preserved all the function's return values by returning
      skb->len, but instead I've audited all the places calling the affected
      functions and found that none cared. A few places actually compared
      the return value with <= 0 in dump functionality, but that could just
      be changed to < 0 with no change in behaviour, so I opted for the more
      efficient version.
      
      One instance of the error I've made numerous times now is also present
      in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
      check for <0 or <=0 and thus broke out of the loop every single time.
      I've preserved this since it will (I think) have caused the messages to
      userspace to be formatted differently with just a single message for
      every SKB returned to userspace. It's possible that this isn't needed
      for the tools that actually use this, but I don't even know what they
      are so couldn't test that changing this behaviour would be acceptable.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      053c095a
    • R
      bridge: fix setlink/dellink notifications · 02dba438
      Roopa Prabhu 提交于
      problems with bridge getlink/setlink notifications today:
              - bridge setlink generates two notifications to userspace
                      - one from the bridge driver
                      - one from rtnetlink.c (rtnl_bridge_notify)
              - dellink generates one notification from rtnetlink.c. Which
      	means bridge setlink and dellink notifications are not
      	consistent
      
              - Looking at the code it appears,
      	If both BRIDGE_FLAGS_MASTER and BRIDGE_FLAGS_SELF were set,
              the size calculation in rtnl_bridge_notify can be wrong.
              Example: if you set both BRIDGE_FLAGS_MASTER and BRIDGE_FLAGS_SELF
              in a setlink request to rocker dev, rtnl_bridge_notify will
      	allocate skb for one set of bridge attributes, but,
      	both the bridge driver and rocker dev will try to add
      	attributes resulting in twice the number of attributes
      	being added to the skb.  (rocker dev calls ndo_dflt_bridge_getlink)
      
      There are multiple options:
      1) Generate one notification including all attributes from master and self:
         But, I don't think it will work, because both master and self may use
         the same attributes/policy. Cannot pack the same set of attributes in a
         single notification from both master and slave (duplicate attributes).
      
      2) Generate one notification from master and the other notification from
         self (This seems to be ideal):
           For master: the master driver will send notification (bridge in this
      	example)
           For self: the self driver will send notification (rocker in the above
      	example. It can use helpers from rtnetlink.c to do so. Like the
      	ndo_dflt_bridge_getlink api).
      
      This patch implements 2) (leaving the 'rtnl_bridge_notify' around to be used
      with 'self').
      
      v1->v2 :
      	- rtnl_bridge_notify is now called only for self,
      	so, remove 'BRIDGE_FLAGS_SELF' check and cleanup a few things
      	- rtnl_bridge_dellink used to always send a RTM_NEWLINK msg
      	earlier. So, I have changed the notification from br_dellink to
      	go as RTM_NEWLINK
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02dba438
  31. 06 1月, 2015 1 次提交
    • D
      net: tcp: add RTAX_CC_ALGO fib handling · ea697639
      Daniel Borkmann 提交于
      This patch adds the minimum necessary for the RTAX_CC_ALGO congestion
      control metric to be set up and dumped back to user space.
      
      While the internal representation of RTAX_CC_ALGO is handled as a u32
      key, we avoided to expose this implementation detail to user space, thus
      instead, we chose the netlink attribute that is being exchanged between
      user space to be the actual congestion control algorithm name, similarly
      as in the setsockopt(2) API in order to allow for maximum flexibility,
      even for 3rd party modules.
      
      It is a bit unfortunate that RTAX_QUICKACK used up a whole RTAX slot as
      it should have been stored in RTAX_FEATURES instead, we first thought
      about reusing it for the congestion control key, but it brings more
      complications and/or confusion than worth it.
      
      Joint work with Florian Westphal.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea697639