1. 22 4月, 2017 1 次提交
    • M
      bonding: fix wq initialization for links created via netlink · ea8ffc08
      Mahesh Bandewar 提交于
      Earlier patch 4493b81b ("bonding: initialize work-queues during
      creation of bond") moved the work-queue initialization from bond_open()
      to bond_create(). However this caused the link those are created using
      netlink 'create bond option' (ip link add bondX type bond); create the
      new trunk without initializing work-queues. Prior to the above mentioned
      change, ndo_open was in both paths and things worked correctly. The
      consequence is visible in the report shared by Joe Stringer -
      
      I've noticed that this patch breaks bonding within namespaces if
      you're not careful to perform device cleanup correctly.
      
      Here's my repro script, you can run on any net-next with this patch
      and you'll start seeing some weird behaviour:
      
      ip netns add foo
      ip li add veth0 type veth peer name veth0+ netns foo
      ip li add veth1 type veth peer name veth1+ netns foo
      ip netns exec foo ip li add bond0 type bond
      ip netns exec foo ip li set dev veth0+ master bond0
      ip netns exec foo ip li set dev veth1+ master bond0
      ip netns exec foo ip addr add dev bond0 192.168.0.1/24
      ip netns exec foo ip li set dev bond0 up
      ip li del dev veth0
      ip li del dev veth1
      
      The second to last command segfaults, last command hangs. rtnl is now
      permanently locked. It's not a problem if you take bond0 down before
      deleting veths, or delete bond0 before deleting veths. If you delete
      either end of the veth pair as per above, either inside or outside the
      namespace, it hits this problem.
      
      Here's some kernel logs:
      [ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link
      [ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link
      [ 1281.193863] bond0: Releasing backup interface veth0+
      [ 1281.193866] bond0: the permanent HWaddr of veth0+ -
      16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of
      veth0+ to a different address to avoid conflicts
      [ 1281.193867] ------------[ cut here ]------------
      [ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511
      __queue_delayed_work+0x13f/0x150
      [ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6
      nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs
      lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse
      serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc
      configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt
      shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
      nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid
      hid mptspi mptscsih e1000 mptbase ahci libahci
      [ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G        W
      4.10.0-bisect-bond-v0.14 #37
      [ 1281.193906] Hardware name: VMware, Inc. VMware Virtual
      Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
      [ 1281.193906] Call Trace:
      [ 1281.193912]  dump_stack+0x63/0x89
      [ 1281.193915]  __warn+0xd1/0xf0
      [ 1281.193917]  warn_slowpath_null+0x1d/0x20
      [ 1281.193918]  __queue_delayed_work+0x13f/0x150
      [ 1281.193920]  queue_delayed_work_on+0x27/0x40
      [ 1281.193929]  bond_change_active_slave+0x25b/0x670 [bonding]
      [ 1281.193932]  ? synchronize_rcu_expedited+0x27/0x30
      [ 1281.193935]  __bond_release_one+0x489/0x510 [bonding]
      [ 1281.193939]  ? addrconf_notify+0x1b7/0xab0
      [ 1281.193942]  bond_netdev_event+0x2c5/0x2e0 [bonding]
      [ 1281.193944]  ? netconsole_netdev_event+0x124/0x190 [netconsole]
      [ 1281.193947]  notifier_call_chain+0x49/0x70
      [ 1281.193948]  raw_notifier_call_chain+0x16/0x20
      [ 1281.193950]  call_netdevice_notifiers_info+0x35/0x60
      [ 1281.193951]  rollback_registered_many+0x23b/0x3e0
      [ 1281.193953]  unregister_netdevice_many+0x24/0xd0
      [ 1281.193955]  rtnl_delete_link+0x3c/0x50
      [ 1281.193956]  rtnl_dellink+0x8d/0x1b0
      [ 1281.193960]  rtnetlink_rcv_msg+0x95/0x220
      [ 1281.193962]  ? __kmalloc_node_track_caller+0x35/0x280
      [ 1281.193964]  ? __netlink_lookup+0xf1/0x110
      [ 1281.193966]  ? rtnl_newlink+0x830/0x830
      [ 1281.193967]  netlink_rcv_skb+0xa7/0xc0
      [ 1281.193969]  rtnetlink_rcv+0x28/0x30
      [ 1281.193970]  netlink_unicast+0x15b/0x210
      [ 1281.193971]  netlink_sendmsg+0x319/0x390
      [ 1281.193974]  sock_sendmsg+0x38/0x50
      [ 1281.193975]  ___sys_sendmsg+0x25c/0x270
      [ 1281.193978]  ? mem_cgroup_commit_charge+0x76/0xf0
      [ 1281.193981]  ? page_add_new_anon_rmap+0x89/0xc0
      [ 1281.193984]  ? lru_cache_add_active_or_unevictable+0x35/0xb0
      [ 1281.193985]  ? __handle_mm_fault+0x4e9/0x1170
      [ 1281.193987]  __sys_sendmsg+0x45/0x80
      [ 1281.193989]  SyS_sendmsg+0x12/0x20
      [ 1281.193991]  do_syscall_64+0x6e/0x180
      [ 1281.193993]  entry_SYSCALL64_slow_path+0x25/0x25
      [ 1281.193995] RIP: 0033:0x7f6ec122f5a0
      [ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX:
      000000000000002e
      [ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0
      [ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003
      [ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003
      [ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f
      [ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450
      [ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]---
      
      Fixes: 4493b81b ("bonding: initialize work-queues during creation of bond")
      Reported-by: NJoe Stringer <joe@ovn.org>
      Tested-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Acked-by: NAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea8ffc08
  2. 06 4月, 2017 1 次提交
    • J
      bonding: attempt to better support longer hw addresses · faeeb317
      Jarod Wilson 提交于
      People are using bonding over Infiniband IPoIB connections, and who knows
      what else. Infiniband has a hardware address length of 20 octets
      (INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32.
      Various places in the bonding code are currently hard-wired to 6 octets
      (ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides,
      only alb is currently possible on Infiniband links right now anyway, due
      to commit 1533e773, so the alb code is where most of the changes are.
      
      One major component of this change is the addition of a bond_hw_addr_copy
      function that takes a length argument, instead of using ether_addr_copy
      everywhere that hardware addresses need to be copied about. The other
      major component of this change is converting the bonding code from using
      struct sockaddr for address storage to struct sockaddr_storage, as the
      former has an address storage space of only 14, while the latter is 128
      minus a few, which is necessary to support bonding over device with up to
      MAX_ADDR_LEN octet hardware addresses. Additionally, this probably fixes
      up some memory corruption issues with the current code, where it's
      possible to write an infiniband hardware address into a sockaddr declared
      on the stack.
      
      Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet
      hardware address now:
      
      $ cat /proc/net/bonding/bond0
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      
      Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
      Primary Slave: mlx4_ib0 (primary_reselect always)
      Currently Active Slave: mlx4_ib0
      MII Status: up
      MII Polling Interval (ms): 100
      Up Delay (ms): 100
      Down Delay (ms): 100
      
      Slave Interface: mlx4_ib0
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr:
      80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01
      Slave queue ID: 0
      
      Slave Interface: mlx4_ib1
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr:
      80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02
      Slave queue ID: 0
      
      Also tested with a standard 1Gbps NIC bonding setup (with a mix of
      e1000 and e1000e cards), running LNST's bonding tests.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      faeeb317
  3. 28 3月, 2017 1 次提交
  4. 18 11月, 2016 1 次提交
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan 提交于
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into an zero based array and
      thus is unsigned entity. Using negative value is out-of-bound
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preffered to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is 3 byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a semmingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d03a00
  5. 28 9月, 2016 1 次提交
    • A
      bonding: quit messing with IOCTL · 4ad41c1e
      Al Viro 提交于
      The only remaining users are issuing SIOCGMIIPHY and SIOCGMIIREG,
      neither of which deals with userland pointers.  Simply calling
      ->ndo_do_ioctl() is fine; no messing with set_fs() is needed.
      It used to mess with SIOCETHTOOL, which would've needed set_fs(),
      but that has been killed in "[NET] ethtool ops are the only way"
      9 years ago...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4ad41c1e
  6. 01 7月, 2016 1 次提交
  7. 19 3月, 2016 1 次提交
    • E
      bonding: fix bond_get_stats() · fe30937b
      Eric Dumazet 提交于
      bond_get_stats() can be called from rtnetlink (with RTNL held)
      or from /proc/net/dev seq handler (with RCU held)
      
      The logic added in commit 5f0c5f73 ("bonding: make global bonding
      stats more reliable") kind of assumed only one cpu could run there.
      
      If multiple threads are reading /proc/net/dev, stats can be really
      messed up after a while.
      
      A second problem is that some fields are 32bit, so we need to properly
      handle the wrap around problem.
      
      Given that RTNL is not always held, we need to use
      bond_for_each_slave_rcu().
      
      Fixes: 5f0c5f73 ("bonding: make global bonding stats more reliable")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe30937b
  8. 04 12月, 2015 2 次提交
  9. 31 8月, 2015 1 次提交
  10. 11 5月, 2015 3 次提交
    • M
      bonding: Implement user key part of port_key in an AD system. · d22a5fc0
      Mahesh Bandewar 提交于
      The port key has three components - user-key, speed-part, and duplex-part.
      The LSBit is for the duplex-part, next 5 bits are for the speed while the
      remaining 10 bits are the user defined key bits. Get these 10 bits
      from the user-space (through the SysFs interface) and use it to form the
      admin port-key. Allowed range for the user-key is 0 - 1023 (10 bits). If
      it is not provided then use zero for the user-key-bits (default).
      
      It can set using following example code -
      
         # modprobe bonding mode=4
         # usr_port_key=$(( RANDOM & 0x3FF ))
         # echo $usr_port_key > /sys/class/net/bond0/bonding/ad_user_port_key
         # echo +eth1 > /sys/class/net/bond0/bonding/slaves
         ...
         # ip link set bond0 up
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@redhat.com>
      [jt: * fixed up style issues reported by checkpatch
           * fixed up context from change in ad_actor_sys_prio patch]
      Signed-off-by: NJonathan Toppins <jtoppins@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d22a5fc0
    • M
      bonding: Allow userspace to set actors' macaddr in an AD-system. · 74514957
      Mahesh Bandewar 提交于
      In an AD system, the communication between actor and partner is the
      business between these two entities. In the current setup anyone on the
      same L2 can "guess" the LACPDU contents and then possibly send the
      spoofed LACPDUs and trick the partner causing connectivity issues for
      the AD system. This patch allows to use a random mac-address obscuring
      it's identity making it harder for someone in the L2 is do the same thing.
      
      This patch allows user-space to choose the mac-address for the AD-system.
      This mac-address can not be NULL or a Multicast. If the mac-address is set
      from user-space; kernel will honor it and will not overwrite it. In the
      absence (value from user space); the logic will default to using the
      masters' mac as the mac-address for the AD-system.
      
      It can be set using example code below -
      
         # modprobe bonding mode=4
         # sys_mac_addr=$(printf '%02x:%02x:%02x:%02x:%02x:%02x' \
                          $(( (RANDOM & 0xFE) | 0x02 )) \
                          $(( RANDOM & 0xFF )) \
                          $(( RANDOM & 0xFF )) \
                          $(( RANDOM & 0xFF )) \
                          $(( RANDOM & 0xFF )) \
                          $(( RANDOM & 0xFF )))
         # echo $sys_mac_addr > /sys/class/net/bond0/bonding/ad_actor_system
         # echo +eth1 > /sys/class/net/bond0/bonding/slaves
         ...
         # ip link set bond0 up
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@redhat.com>
      [jt: fixed up style issues reported by checkpatch]
      Signed-off-by: NJonathan Toppins <jtoppins@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74514957
    • M
      bonding: Allow userspace to set actors' system_priority in AD system · 6791e466
      Mahesh Bandewar 提交于
      This patch allows user to randomize the system-priority in an ad-system.
      The allowed range is 1 - 0xFFFF while default value is 0xFFFF. If user
      does not specify this value, the system defaults to 0xFFFF, which is
      what it was before this patch.
      
      Following example code could set the value -
          # modprobe bonding mode=4
          # sys_prio=$(( 1 + RANDOM + RANDOM ))
          # echo $sys_prio > /sys/class/net/bond0/bonding/ad_actor_sys_prio
          # echo +eth1 > /sys/class/net/bond0/bonding/slaves
          ...
          # ip link set bond0 up
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Reviewed-by: NNikolay Aleksandrov <nikolay@redhat.com>
      [jt: * fixed up style issues reported by checkpatch
           * changed how the default value is set in bond_check_params(), this
             makes the default consistent between what gets set for a new bond
             and what the default is claimed to be in the bonding options.]
      Signed-off-by: NJonathan Toppins <jtoppins@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6791e466
  11. 27 4月, 2015 1 次提交
  12. 10 2月, 2015 1 次提交
  13. 05 2月, 2015 2 次提交
  14. 28 1月, 2015 1 次提交
  15. 11 11月, 2014 1 次提交
  16. 01 11月, 2014 1 次提交
  17. 07 10月, 2014 2 次提交
  18. 30 9月, 2014 1 次提交
    • A
      bonding: make global bonding stats more reliable · 5f0c5f73
      Andy Gospodarek 提交于
      As the code stands today, bonding stats are based simply on the stats
      from the member interfaces.  If a member was to be removed from a bond,
      the stats would instantly drop.  This would be confusing to an admin
      would would suddonly see interface stats drop while traffic is still
      flowing.
      
      In addition to preventing the stats drops mentioned above, new members
      will now be added to the bond and only traffic received after the member
      was added to the bond will be counted as part of bonding stats.  Bonding
      counters will also be updated when any slaves are dropped to make sure
      the reported stats are reliable.
      
      v2: Changes suggested by Nik to properly allocate/free stats memory.
      v3: Properly destroy workqueue and fix netlink configuration path.
      v4: Moved cached stats into bonding and slave structs as there does not
      seem to be a complexity/performance benefit to using alloc'd memory vs
      in-struct memory.
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f0c5f73
  19. 16 9月, 2014 1 次提交
  20. 14 9月, 2014 3 次提交
  21. 10 9月, 2014 2 次提交
  22. 21 7月, 2014 1 次提交
  23. 17 7月, 2014 1 次提交
    • M
      bonding: Do not try to send packets over dead link in TLB mode. · 6b794c1c
      Mahesh Bandewar 提交于
      In TLB mode if tlb_dynamic_lb is NOT set, slaves from the bond
      group are selected based on the hash distribution. This does not
      exclude dead links which are part of the bond. Also if there is a
      temporary link event which brings down the interface, packets
      hashed on that interface would be dropped too.
      
      This patch fixes these issues and distributes flows across the
      UP links only. Also the array construction of links which are
      capable of sending packets happen in the control path leaving
      only link-selection during the data-path.
      
      One possible side effect of this is - at a link event; all
      flows will be shuffled to get good distribution. But impact of
      this should be minimum with the assumption that a member or
      members of the bond group are not available is a very temporary
      situation.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b794c1c
  24. 16 7月, 2014 3 次提交
  25. 05 6月, 2014 1 次提交
    • V
      bonding: Support macvlans on top of tlb/rlb mode bonds · 14af9963
      Vlad Yasevich 提交于
      To make TLB mode work, the patch allows learning packets
      to be sent using mac addresses assigned to macvlan devices,
      also taking into an account vlans that may be between the
      bond and macvlan device.
      
      To make RLB work, all we have to do is accept ARP packets
      for addresses added to the bond dev->uc list.  Since RLB
      mode will take care to update the peers directly with
      correct mac addresses, learning packets for these addresses
      do not have be send to switch.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14af9963
  26. 23 5月, 2014 1 次提交
  27. 17 5月, 2014 4 次提交