1. 18 11月, 2015 1 次提交
  2. 23 10月, 2015 1 次提交
  3. 22 10月, 2015 1 次提交
    • A
      netlink: Rightsize IFLA_AF_SPEC size calculation · b1974ed0
      Arad, Ronen 提交于
      if_nlmsg_size() overestimates the minimum allocation size of netlink
      dump request (when called from rtnl_calcit()) or the size of the
      message (when called from rtnl_getlink()). This is because
      ext_filter_mask is not supported by rtnl_link_get_af_size() and
      rtnl_link_get_size().
      
      The over-estimation is significant when at least one netdev has many
      VLANs configured (8 bytes for each configured VLAN).
      
      This patch-set "rightsizes" the protocol specific attribute size
      calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
      and adding this a argument to get_link_af_size op in rtnl_af_ops.
      
      Bridge module already used filtering aware sizing for notifications.
      br_get_link_af_size_filtered() is consistent with the modified
      get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
      br_get_link_af_size() becomes unused and thus removed.
      Signed-off-by: NRonen Arad <ronen.arad@intel.com>
      Acked-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1974ed0
  4. 09 10月, 2015 1 次提交
  5. 03 10月, 2015 1 次提交
  6. 16 9月, 2015 2 次提交
  7. 01 9月, 2015 1 次提交
    • D
      tcp: use dctcp if enabled on the route to the initiator · c3a8d947
      Daniel Borkmann 提交于
      Currently, the following case doesn't use DCTCP, even if it should:
      A responder has f.e. Cubic as system wide default, but for a specific
      route to the initiating host, DCTCP is being set in RTAX_CC_ALGO. The
      initiating host then uses DCTCP as congestion control, but since the
      initiator sets ECT(0), tcp_ecn_create_request() doesn't set ecn_ok,
      and we have to fall back to Reno after 3WHS completes.
      
      We were thinking on how to solve this in a minimal, non-intrusive
      way without bloating tcp_ecn_create_request() needlessly: lets cache
      the CA ecn option flag in RTAX_FEATURES. In other words, when ECT(0)
      is set on the SYN packet, set ecn_ok=1 iff route RTAX_FEATURES
      contains the unexposed (internal-only) DST_FEATURE_ECN_CA. This allows
      to only do a single metric feature lookup inside tcp_ecn_create_request().
      
      Joint work with Florian Westphal.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3a8d947
  8. 22 7月, 2015 1 次提交
  9. 16 7月, 2015 2 次提交
  10. 09 7月, 2015 1 次提交
    • D
      rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver · 4f7d2cdf
      Daniel Borkmann 提交于
      Jason Gunthorpe reported that since commit c02db8c6 ("rtnetlink: make
      SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes
      anymore with respect to their policy, that is, ifla_vfinfo_policy[].
      
      Before, they were part of ifla_policy[], but they have been nested since
      placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO,
      which is another nested attribute for the actual VF attributes such as
      IFLA_VF_MAC, IFLA_VF_VLAN, etc.
      
      Despite the policy being split out from ifla_policy[] in this commit,
      it's never applied anywhere. nla_for_each_nested() only does basic nla_ok()
      testing for struct nlattr, but it doesn't know about the data context and
      their requirements.
      
      Fix, on top of Jason's initial work, does 1) parsing of the attributes
      with the right policy, and 2) using the resulting parsed attribute table
      from 1) instead of the nla_for_each_nested() loop (just like we used to
      do when still part of ifla_policy[]).
      
      Reference: http://thread.gmane.org/gmane.linux.network/368913
      Fixes: c02db8c6 ("rtnetlink: make SR-IOV VF interface symmetric")
      Reported-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com>
      Cc: Greg Rose <gregory.v.rose@intel.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Rony Efraim <ronye@mellanox.com>
      Cc: Vlad Zolotarov <vladz@cloudius-systems.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NVlad Zolotarov <vladz@cloudius-systems.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f7d2cdf
  11. 23 6月, 2015 1 次提交
    • S
      switchdev; add VLAN support for port's bridge_getlink · 7d4f8d87
      Scott Feldman 提交于
      One more missing piece of the puzzle.  Add vlan dump support to switchdev
      port's bridge_getlink.  iproute2 "bridge vlan show" cmd already knows how
      to show the vlans installed on the bridge and the device , but (until now)
      no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK
      msg.  Before this patch, "bridge vlan show":
      
      	$ bridge -c vlan show
      	port    vlan ids
      	sw1p1    30-34			<< bridge side vlans
      		 57
      
      	sw1p1				<< device side vlans (missing)
      
      	sw1p2    57
      
      	sw1p2
      
      	sw1p3
      
      	sw1p4
      
      	br0     None
      
      (When the port is bridged, the output repeats the vlan list for the vlans
      on the bridge side of the port and the vlans on the device side of the
      port.  The listing above show no vlans for the device side even though they
      are installed).
      
      After this patch:
      
      	$ bridge -c vlan show
      	port    vlan ids
      	sw1p1    30-34			<< bridge side vlan
      		 57
      
      	sw1p1    30-34			<< device side vlans
      		 57
      		 3840 PVID
      
      	sw1p2    57
      
      	sw1p2    57
      		 3840 PVID
      
      	sw1p3    3842 PVID
      
      	sw1p4    3843 PVID
      
      	br0     None
      
      I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func.
      switchdev support adds an obj dump for VLAN objects, using the same
      call-back scheme as FDB dump.  Support included for both compressed and
      un-compressed vlan dumps.
      Signed-off-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d4f8d87
  12. 16 6月, 2015 1 次提交
  13. 18 5月, 2015 1 次提交
    • N
      rtnl/bond: don't send rtnl msg for unregistered iface · ed2a80ab
      Nicolas Dichtel 提交于
      Before the patch, the command 'ip link add bond2 type bond mode 802.3ad'
      causes the kernel to send a rtnl message for the bond2 interface, with an
      ifindex 0.
      
      'ip monitor' shows:
      0: bond2: <BROADCAST,MULTICAST,MASTER> mtu 1500 state DOWN group default
          link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
      9: bond2@NONE: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN group default
          link/ether ea:3e:1f:53:92:7b brd ff:ff:ff:ff:ff:ff
      [snip]
      
      The patch fixes the spotted bug by checking in bond driver if the interface
      is registered before calling the notifier chain.
      It also adds a check in rtmsg_ifinfo() to prevent this kind of bug in the
      future.
      
      Fixes: d4261e56 ("bonding: create netlink event when bonding option is changed")
      CC: Jiri Pirko <jiri@resnulli.us>
      Reported-by: NJulien Meunier <julien.meunier@6wind.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ed2a80ab
  14. 14 5月, 2015 1 次提交
  15. 13 5月, 2015 2 次提交
  16. 10 5月, 2015 1 次提交
  17. 30 4月, 2015 1 次提交
    • N
      bridge/nl: remove wrong use of NLM_F_MULTI · 46c264da
      Nicolas Dichtel 提交于
      NLM_F_MULTI must be used only when a NLMSG_DONE message is sent. In fact,
      it is sent only at the end of a dump.
      
      Libraries like libnl will wait forever for NLMSG_DONE.
      
      Fixes: e5a55a89 ("net: create generic bridge ops")
      Fixes: 815cccbf ("ixgbe: add setlink, getlink support to ixgbe and ixgbevf")
      CC: John Fastabend <john.r.fastabend@intel.com>
      CC: Sathya Perla <sathya.perla@emulex.com>
      CC: Subbu Seetharaman <subbu.seetharaman@emulex.com>
      CC: Ajit Khaparde <ajit.khaparde@emulex.com>
      CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      CC: intel-wired-lan@lists.osuosl.org
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Scott Feldman <sfeldma@gmail.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: bridge@lists.linux-foundation.org
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      46c264da
  18. 11 4月, 2015 2 次提交
  19. 10 4月, 2015 1 次提交
  20. 03 4月, 2015 1 次提交
  21. 25 3月, 2015 2 次提交
    • W
      net: allow to delete a whole device group · 66400d54
      WANG Cong 提交于
      With dev group, we can change a batch of net devices,
      so we should allow to delete them together too.
      
      Group 0 is not allowed to be deleted since it is
      the default group.
      
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      66400d54
    • W
      net: use for_each_netdev_safe() in rtnl_group_changelink() · d079535d
      WANG Cong 提交于
      In case we move the whole dev group to another netns,
      we should call for_each_netdev_safe(), otherwise we get
      a soft lockup:
      
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ip:798]
       irq event stamp: 255424
       hardirqs last  enabled at (255423): [<ffffffff81a2aa95>] restore_args+0x0/0x30
       hardirqs last disabled at (255424): [<ffffffff81a2ad5a>] apic_timer_interrupt+0x6a/0x80
       softirqs last  enabled at (255422): [<ffffffff81079ebc>] __do_softirq+0x2c1/0x3a9
       softirqs last disabled at (255417): [<ffffffff8107a190>] irq_exit+0x41/0x95
       CPU: 0 PID: 798 Comm: ip Not tainted 4.0.0-rc4+ #881
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8800d1b88000 ti: ffff880119530000 task.ti: ffff880119530000
       RIP: 0010:[<ffffffff810cad11>]  [<ffffffff810cad11>] debug_lockdep_rcu_enabled+0x28/0x30
       RSP: 0018:ffff880119533778  EFLAGS: 00000246
       RAX: ffff8800d1b88000 RBX: 0000000000000002 RCX: 0000000000000038
       RDX: 0000000000000000 RSI: ffff8800d1b888c8 RDI: ffff8800d1b888c8
       RBP: ffff880119533778 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 000000000000b5c2 R12: 0000000000000246
       R13: ffff880119533708 R14: 00000000001d5a40 R15: ffff88011a7d5a40
       FS:  00007fc01315f740(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 00007f367a120988 CR3: 000000011849c000 CR4: 00000000000007f0
       Stack:
        ffff880119533798 ffffffff811ac868 ffffffff811ac831 ffffffff811ac828
        ffff8801195337c8 ffffffff811ac8c9 ffff8801195339b0 ffff8801197633e0
        0000000000000000 ffff8801195339b0 ffff8801195337d8 ffffffff811ad2d7
       Call Trace:
        [<ffffffff811ac868>] rcu_read_lock+0x37/0x6e
        [<ffffffff811ac831>] ? rcu_read_unlock+0x5f/0x5f
        [<ffffffff811ac828>] ? rcu_read_unlock+0x56/0x5f
        [<ffffffff811ac8c9>] __fget+0x2a/0x7a
        [<ffffffff811ad2d7>] fget+0x13/0x15
        [<ffffffff811be732>] proc_ns_fget+0xe/0x38
        [<ffffffff817c7714>] get_net_ns_by_fd+0x11/0x59
        [<ffffffff817df359>] rtnl_link_get_net+0x33/0x3e
        [<ffffffff817df3d7>] do_setlink+0x73/0x87b
        [<ffffffff810b28ce>] ? trace_hardirqs_off+0xd/0xf
        [<ffffffff81a2aa95>] ? retint_restore_args+0xe/0xe
        [<ffffffff817e0301>] rtnl_newlink+0x40c/0x699
        [<ffffffff817dffe0>] ? rtnl_newlink+0xeb/0x699
        [<ffffffff81a29246>] ? _raw_spin_unlock+0x28/0x33
        [<ffffffff8143ed1e>] ? security_capable+0x18/0x1a
        [<ffffffff8107da51>] ? ns_capable+0x4d/0x65
        [<ffffffff817de5ce>] rtnetlink_rcv_msg+0x181/0x194
        [<ffffffff817de407>] ? rtnl_lock+0x17/0x19
        [<ffffffff817de407>] ? rtnl_lock+0x17/0x19
        [<ffffffff817de44d>] ? __rtnl_unlock+0x17/0x17
        [<ffffffff818327c6>] netlink_rcv_skb+0x4d/0x93
        [<ffffffff817de42f>] rtnetlink_rcv+0x26/0x2d
        [<ffffffff81830f18>] netlink_unicast+0xcb/0x150
        [<ffffffff8183198e>] netlink_sendmsg+0x501/0x523
        [<ffffffff8115cba9>] ? might_fault+0x59/0xa9
        [<ffffffff817b5398>] ? copy_from_user+0x2a/0x2c
        [<ffffffff817b7b74>] sock_sendmsg+0x34/0x3c
        [<ffffffff817b7f6d>] ___sys_sendmsg+0x1b8/0x255
        [<ffffffff8115c5eb>] ? handle_pte_fault+0xbd5/0xd4a
        [<ffffffff8100a2b0>] ? native_sched_clock+0x35/0x37
        [<ffffffff8109e94b>] ? sched_clock_local+0x12/0x72
        [<ffffffff8109eb9c>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810cadbf>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff811ac1d8>] ? __fcheck_files+0x4c/0x58
        [<ffffffff811ac946>] ? __fget_light+0x2d/0x52
        [<ffffffff817b8adc>] __sys_sendmsg+0x42/0x60
        [<ffffffff817b8b0c>] SyS_sendmsg+0x12/0x1c
        [<ffffffff81a29e32>] system_call_fastpath+0x12/0x17
      
      Fixes: e7ed828f ("netlink: support setting devgroup parameters")
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d079535d
  22. 19 3月, 2015 1 次提交
  23. 11 3月, 2015 1 次提交
  24. 01 3月, 2015 3 次提交
    • E
      net: do not use rcu in rtnl_dump_ifinfo() · cac5e65e
      Eric Dumazet 提交于
      We did a failed attempt in the past to only use rcu in rtnl dump
      operations (commit e67f88dd "net: dont hold rtnl mutex during
      netlink dump callbacks")
      
      Now that dumps are holding RTNL anyway, there is no need to also
      use rcu locking, as it forbids any scheduling ability, like
      GFP_KERNEL allocations that controlling path should use instead
      of GFP_ATOMIC whenever possible.
      
      This should fix following splat Cong Wang reported :
      
       [ INFO: suspicious RCU usage. ]
       3.19.0+ #805 Tainted: G        W
      
       include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 1, debug_locks = 0
       2 locks held by ip/771:
        #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff8182b8f4>] netlink_dump+0x21/0x26c
        #1:  (rcu_read_lock){......}, at: [<ffffffff817d785b>] rcu_read_lock+0x0/0x6e
      
       stack backtrace:
       CPU: 3 PID: 771 Comm: ip Tainted: G        W       3.19.0+ #805
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000001 ffff8800d51e7718 ffffffff81a27457 0000000029e729e6
        ffff8800d6108000 ffff8800d51e7748 ffffffff810b539b ffffffff820013dd
        00000000000001c8 0000000000000000 ffff8800d7448088 ffff8800d51e7758
       Call Trace:
        [<ffffffff81a27457>] dump_stack+0x4c/0x65
        [<ffffffff810b539b>] lockdep_rcu_suspicious+0x107/0x110
        [<ffffffff8109796f>] rcu_preempt_sleep_check+0x45/0x47
        [<ffffffff8109e457>] ___might_sleep+0x1d/0x1cb
        [<ffffffff8109e67d>] __might_sleep+0x78/0x80
        [<ffffffff814b9b1f>] idr_alloc+0x45/0xd1
        [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff814b9f9d>] ? idr_for_each+0x53/0x101
        [<ffffffff817c1383>] alloc_netid+0x61/0x69
        [<ffffffff817c14c3>] __peernet2id+0x79/0x8d
        [<ffffffff817c1ab7>] peernet2id+0x13/0x1f
        [<ffffffff817d8673>] rtnl_fill_ifinfo+0xa8d/0xc20
        [<ffffffff810b17d9>] ? __lock_is_held+0x39/0x52
        [<ffffffff817d894f>] rtnl_dump_ifinfo+0x149/0x213
        [<ffffffff8182b9c2>] netlink_dump+0xef/0x26c
        [<ffffffff8182bcba>] netlink_recvmsg+0x17b/0x2c5
        [<ffffffff817b0adc>] __sock_recvmsg+0x4e/0x59
        [<ffffffff817b1b40>] sock_recvmsg+0x3f/0x51
        [<ffffffff817b1f9a>] ___sys_recvmsg+0xf6/0x1d9
        [<ffffffff8115dc67>] ? handle_pte_fault+0x6e1/0xd3d
        [<ffffffff8100a3a0>] ? native_sched_clock+0x35/0x37
        [<ffffffff8109f45b>] ? sched_clock_local+0x12/0x72
        [<ffffffff8109f6ac>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff811abde8>] ? __fcheck_files+0x4c/0x58
        [<ffffffff811ac556>] ? __fget_light+0x2d/0x52
        [<ffffffff817b376f>] __sys_recvmsg+0x42/0x60
        [<ffffffff817b379f>] SyS_recvmsg+0x12/0x1c
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reported-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cac5e65e
    • E
      net: Verify permission to link_net in newlink · 06615bed
      Eric W. Biederman 提交于
      When applicable verify that the caller has permisson to the underlying
      network namespace for a newly created network device.
      
      Similary checks exist for the network namespace a network device will
      be created in.
      
      Fixes: 317f4810 ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06615bed
    • E
      net: Verify permission to dest_net in newlink · 505ce415
      Eric W. Biederman 提交于
      When applicable verify that the caller has permision to create a
      network device in another network namespace.  This check is already
      present when moving a network device between network namespaces in
      setlink so all that is needed is to duplicate that check in newlink.
      
      This change almost backports cleanly, but there are context conflicts
      as the code that follows was added in v4.0-rc1
      
      Fixes: b51642f6 net: Enable a userns root rtnl calls that are safe for unprivilged users
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      505ce415
  25. 25 2月, 2015 1 次提交
  26. 16 2月, 2015 1 次提交
  27. 08 2月, 2015 1 次提交
    • D
      rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY · 364d5716
      Daniel Borkmann 提交于
      ifla_vf_policy[] is wrong in advertising its individual member types as
      NLA_BINARY since .type = NLA_BINARY in combination with .len declares the
      len member as *max* attribute length [0, len].
      
      The issue is that when do_setvfinfo() is being called to set up a VF
      through ndo handler, we could set corrupted data if the attribute length
      is less than the size of the related structure itself.
      
      The intent is exactly the opposite, namely to make sure to pass at least
      data of minimum size of len.
      
      Fixes: ebc08a6f ("rtnetlink: Add VF config code to rtnetlink")
      Cc: Mitch Williams <mitch.a.williams@intel.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      364d5716
  28. 05 2月, 2015 1 次提交
  29. 02 2月, 2015 1 次提交
  30. 30 1月, 2015 1 次提交
    • N
      rtnetlink: pass link_net to the newlink handler · 7b4ce694
      Nicolas Dichtel 提交于
      When IFLA_LINK_NETNSID is used, the netdevice should be built in this link netns
      and moved at the end to another netns (pointed by the socket netns or
      IFLA_NET_NS_[PID|FD]).
      
      Existing user of the newlink handler will use the netns argument (src_net) to
      find a link netdevice or to check some other information into the link netns.
      For example, to find a netdevice, two information are required: an ifindex
      (usually from IFLA_LINK) and a netns (this link netns).
      
      Note: when using IFLA_LINK_NETNSID and IFLA_NET_NS_[PID|FD], a user may create a
      netdevice that stands in netnsX and with its link part in netnsY, by sending a
      rtnl message from netnsZ.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b4ce694
  31. 29 1月, 2015 1 次提交
  32. 24 1月, 2015 1 次提交
  33. 20 1月, 2015 1 次提交