  1. 21 Jul 2017, 1 commit
  2. 08 Jul 2017, 1 commit
    • bonding: avoid NETDEV_CHANGEMTU event when unregistering slave · f51048c3
      Committed by WANG Cong
      As Hongjun/Nicolas summarized in their original patch:
      
      "
      When a device changes from one netns to another, it's first unregistered,
      then the netns reference is updated and the dev is registered in the new
      netns. Thus, when a slave moves to another netns, it is first
      unregistered. This triggers a NETDEV_UNREGISTER event which is caught by
      the bonding driver. The driver calls bond_release(), which calls
      dev_set_mtu() and thus triggers NETDEV_CHANGEMTU (the device is still in
      the old netns).
      "
      
      This is a very special case. Because the device is being unregistered,
      no one should still care about the NETDEV_CHANGEMTU event triggered
      at this point, so we can avoid broadcasting this event on this path
      and avoid touching the inetdev_event()/addrconf_notify() path.
      
      This requires exporting __dev_set_mtu() to the bonding driver.
      Reported-by: Hongjun Li <hongjun.li@6wind.com>
      Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
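The fix can be pictured with a small userspace model. A quiet setter only updates the MTU and plays the role of __dev_set_mtu(). A notifying setter plays the role of dev_set_mtu(). The release path picks the quiet setter while the slave is being unregistered. This is a sketch under stated assumptions: every fake_* name and the event counter are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a net device (not struct net_device). */
struct fake_netdev {
    int mtu;
    bool unregistering;   /* true while NETDEV_UNREGISTER is being handled */
};

static int changemtu_events;  /* counts simulated NETDEV_CHANGEMTU broadcasts */

/* Quiet setter: only updates the MTU (plays the role of __dev_set_mtu()). */
static void fake_set_mtu_quiet(struct fake_netdev *dev, int new_mtu)
{
    dev->mtu = new_mtu;
}

/* Notifying setter: updates the MTU, then broadcasts the change
 * (plays the role of dev_set_mtu()). */
static void fake_set_mtu(struct fake_netdev *dev, int new_mtu)
{
    fake_set_mtu_quiet(dev, new_mtu);
    changemtu_events++;
}

/* Hypothetical release path: restore the slave's original MTU, but do not
 * broadcast NETDEV_CHANGEMTU for a slave that is already being unregistered. */
static void fake_bond_release(struct fake_netdev *slave, int original_mtu)
{
    assert(slave != NULL);
    if (slave->unregistering)
        fake_set_mtu_quiet(slave, original_mtu);
    else
        fake_set_mtu(slave, original_mtu);
}
```

Both branches restore the MTU. Only the notification differs, and that notification is the side effect the commit wants to suppress.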
  3. 09 Jun 2017, 1 commit
  4. 08 Jun 2017, 1 commit
    • net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      Committed by David S. Miller
      Network devices can allocate resources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally also does a free_netdev().
      
      This creates a problem for the logic in register_netdevice(),
      because all callers of register_netdevice() manage the freeing
      of the netdev and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: David S. Miller <davem@davemloft.net>
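A userspace model of the new contract may help. ndo_init() allocates private state, and priv_destructor() releases only that state. needs_free_netdev decides whether the unregister path also frees the device itself. This sketch assumes a single late failure point inside registration, and all fake_*/demo_* names are hypothetical. The real register_netdevice() and unregister_netdevice() do far more, and they invoke these hooks from different places.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct fake_netdev;

/* Hypothetical subset of netdev_ops. */
struct fake_netdev_ops {
    int  (*ndo_init)(struct fake_netdev *dev);
    void (*ndo_uninit)(struct fake_netdev *dev);
};

struct fake_netdev {
    const struct fake_netdev_ops *ops;
    void (*priv_destructor)(struct fake_netdev *dev); /* frees private state only, never dev */
    bool needs_free_netdev;                            /* should unregister free dev itself? */
    void *priv;                                        /* private memory set up by ndo_init() */
};

static bool simulate_late_failure; /* forces a failure after ndo_init() succeeded */

/* On a late failure, undo ndo_init() and the private state, but leave freeing
 * dev itself to the caller (which would call free_netdev()). */
static int fake_register_netdevice(struct fake_netdev *dev)
{
    assert(dev != NULL && dev->ops != NULL);
    int err = dev->ops->ndo_init ? dev->ops->ndo_init(dev) : 0;
    if (err)
        return err;
    if (simulate_late_failure) {
        if (dev->ops->ndo_uninit)
            dev->ops->ndo_uninit(dev);
        if (dev->priv_destructor)
            dev->priv_destructor(dev);
        return -1;
    }
    return 0;
}

/* Tail of unregistration: release private state, then free dev only if asked.
 * Returns true when dev was freed here. */
static bool fake_unregister_netdevice(struct fake_netdev *dev)
{
    if (dev->ops->ndo_uninit)
        dev->ops->ndo_uninit(dev);
    if (dev->priv_destructor)
        dev->priv_destructor(dev);
    if (dev->needs_free_netdev) {
        free(dev);
        return true;
    }
    return false;
}

/* A hypothetical driver following the new contract. */
static int priv_frees;

static int demo_init(struct fake_netdev *dev)
{
    dev->priv = malloc(64);
    return dev->priv ? 0 : -1;
}

static void demo_priv_destructor(struct fake_netdev *dev)
{
    free(dev->priv);
    dev->priv = NULL;
    priv_frees++;
}

static const struct fake_netdev_ops demo_ops = { demo_init, NULL };
```

In the failure case the caller still owns dev and frees it, following the rule that callers invoke free_netdev(dev) when registration fails. When needs_free_netdev is set, the unregister path frees dev instead.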
  5. 28 May 2017, 1 commit
    • bonding: Prevent duplicate userspace notification · 7a7e96e0
      Committed by Vlad Yasevich
      Whenever a user changes bonding options, a NETDEV_CHANGEINFODATA
      notification is generated, which results in an rtnetlink message
      being sent.  While running 'ip monitor', we can actually see 2 messages:
      one as a result of the event, and the other as a result of the state
      change generated by netdev_state_change().  However, this is not
      always the case. If bonding changes were done via sysfs or ifenslave
      (old ioctl interface), then only 1 message is seen.
      
      This patch removes duplicate messages in the case of using netlink
      to configure bonding.  It introduces a separate function that
      triggers a netdev event and uses that function in the sysfs and ioctl
      cases.
      
      This was discovered while auditing all the different events and
      continues the effort of cleaning up duplicated netlink messages.
      
      CC: David Ahern <dsa@cumulusnetworks.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
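This toy model counts the messages a listener such as 'ip monitor' would see per option change, after the fix as described. Following the message above, it assumes the netlink path already produces one message through the state change, so only the sysfs and ioctl paths need an explicit event. The names are hypothetical, and the helper is not the driver's actual function.

```c
#include <assert.h>

/* Hypothetical count of rtnetlink messages seen by a listener such as 'ip monitor'. */
static int rtnl_messages;

enum config_path { VIA_NETLINK, VIA_SYSFS, VIA_IOCTL };

/* Stands in for sending NETDEV_CHANGEINFODATA, which yields one message. */
static void fake_notify_changeinfodata(void) { rtnl_messages++; }

/* Stands in for the state-change message the netlink path already produces. */
static void fake_netlink_state_change(void) { rtnl_messages++; }

/* After the fix as described: the explicit event is raised only for the
 * sysfs and ioctl paths, so every path yields exactly one message. */
static void fake_change_bond_option(enum config_path path)
{
    /* ... apply the option itself here ... */
    if (path == VIA_NETLINK)
        fake_netlink_state_change();
    else
        fake_notify_changeinfodata();
}
```

Before the fix, the netlink branch also raised the event, and that produced the second message seen in 'ip monitor'.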
  6. 26 May 2017, 1 commit
    • bonding: Don't update slave->link until ready to commit · 797a9364
      Committed by Nithin Sujir
      In the loadbalance arp monitoring scheme, when a slave link change is
      detected, the slave->link is immediately updated and slave_state_changed
      is set. Later down the function, the rtnl_lock is acquired and the
      changes are committed, updating the bond link state.
      
      However, the acquisition of the rtnl_lock can fail. The next time the
      monitor runs, since slave->link is already updated, it determines that
      link is unchanged. This results in the bond link state permanently out
      of sync with the slave link.
      
      This patch modifies bond_loadbalance_arp_mon() to handle link changes
      identically to bond_ab_arp_{inspect/commit}(). The new link state is
      maintained in slave->new_link until we're ready to commit, at which point
      it's copied into slave->link.
      
      NOTE: miimon_{inspect/commit}() has a more complex state machine
      requiring the use of the bond_{propose,commit}_link_state() functions,
      which maintain the intermediate state in slave->link_new_state. The ARP
      monitors don't require that.
      
      Testing: This bug is very easy to reproduce with the following steps.
      1. In a loop, toggle a slave link of a bond slave interface.
      2. In a separate loop, do ifconfig up/down of an unrelated interface to
      create contention for rtnl_lock.
      Within a few iterations, the bond link goes out of sync with the slave
      link.
      Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
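Here is a sketch of the staging pattern. The monitor records what it saw in new_link and copies it into link only after taking the lock. A failed trylock therefore leaves link stale, and the next pass still sees a difference. Everything here is a hypothetical model: lock_available stands in for rtnl_trylock(), and the real bond_loadbalance_arp_mon() tracks much more state.

```c
#include <assert.h>
#include <stdbool.h>

enum link_state { LINK_DOWN, LINK_UP };

/* Hypothetical slave with only the two fields relevant here. */
struct fake_slave {
    enum link_state link;      /* committed state the bond acts on */
    enum link_state new_link;  /* state observed but not yet committed */
};

static bool bond_link_up;      /* the bond's view of this slave */
static bool lock_available;    /* stands in for rtnl_trylock() succeeding */

/* One monitor pass: stage the observation, commit it only under the lock. */
static void fake_monitor_pass(struct fake_slave *s, enum link_state observed)
{
    s->new_link = observed;
    if (s->new_link == s->link)
        return;                /* nothing changed */
    if (!lock_available)
        return;                /* link is untouched, so the next pass retries */
    s->link = s->new_link;     /* commit */
    bond_link_up = (s->link == LINK_UP);
}
```

Suppose the pass wrote s->link before attempting the lock. The second pass would then see no change, and bond_link_up would stay false permanently, which is the bug described above.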
  7. 23 May 2017, 1 commit
    • bonding: fix randomly populated arp target array · 72ccc471
      Committed by Jarod Wilson
      In commit dc9c4d0f, the arp_target array moved from a static global
      to a local variable. By the nature of static globals, the array used to
      be initialized to all 0. At present, it's full of random data, which
      gets interpreted as arp_target values when none have actually been
      specified. Systems end up booting with spew along these lines:
      
      [   32.161783] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.168475] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.175089] 8021q: adding VLAN 0 to HW filter on device lacp0
      [   32.193091] IPv6: ADDRCONF(NETDEV_UP): lacp0: link is not ready
      [   32.204892] lacp0: Setting MII monitoring interval to 100
      [   32.211071] lacp0: Removing ARP target 216.124.228.17
      [   32.216824] lacp0: Removing ARP target 218.160.255.255
      [   32.222646] lacp0: Removing ARP target 185.170.136.184
      [   32.228496] lacp0: invalid ARP target 255.255.255.255 specified for removal
      [   32.236294] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
      [   32.243987] lacp0: Removing ARP target 56.125.228.17
      [   32.249625] lacp0: Removing ARP target 218.160.255.255
      [   32.255432] lacp0: Removing ARP target 15.157.233.184
      [   32.261165] lacp0: invalid ARP target 255.255.255.255 specified for removal
      [   32.268939] lacp0: option arp_ip_target: invalid value (-255.255.255.255)
      [   32.276632] lacp0: Removing ARP target 16.0.0.0
      [   32.281755] lacp0: Removing ARP target 218.160.255.255
      [   32.287567] lacp0: Removing ARP target 72.125.228.17
      [   32.293165] lacp0: Removing ARP target 218.160.255.255
      [   32.298970] lacp0: Removing ARP target 8.125.228.17
      [   32.304458] lacp0: Removing ARP target 218.160.255.255
      
      None of these were actually specified as ARP targets, and the driver does
      seem to clean up the mess okay, but it's rather noisy and confusing, leaks
      values to userspace, and the 255.255.255.255 spew shows up even when debug
      prints are disabled.
      
      The fix: just zero out arp_target at init time.
      
      While we're in here, init arp_all_targets_value in the right place.
      
      Fixes: dc9c4d0f ("bonding: reduce scope of some global variables")
      CC: Mahesh Bandewar <maheshb@google.com>
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      CC: stable@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
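The core of the fix is that a function-local array, unlike a static global, has indeterminate contents until it is initialized. The minimal sketch below assumes that 0 means "no target" and uses a table size of 16 chosen for illustration. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define N_TARGETS 16  /* illustrative table size */

/* 0 marks an unused slot, so an all-zero table means "no ARP targets". */
static int count_targets(const unsigned int targets[N_TARGETS])
{
    int n = 0;

    for (int i = 0; i < N_TARGETS; i++)
        if (targets[i] != 0)
            n++;
    return n;
}

/* Hypothetical parameter setup: copy the user's targets into a local table.
 * Without the "= { 0 }" initializer, the unused slots would hold whatever
 * was left on the stack, which is how random "targets" appeared in the log. */
static int configured_target_count(const unsigned int *user_targets, int n_user)
{
    unsigned int arp_target[N_TARGETS] = { 0 };

    for (int i = 0; i < n_user && i < N_TARGETS; i++)
        arp_target[i] = user_targets[i];
    return count_targets(arp_target);
}
```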
  8. 29 Apr 2017, 1 commit
    • bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal · 19cdead3
      Committed by Paolo Abeni
      On slave list updates, the bonding driver computes its hard_header_len
      as the maximum of all enslaved devices' hard_header_len.
      If the slave list is empty, e.g. on last enslaved device removal,
      ETH_HLEN is used.
      
      Since the bonding header_ops are set only when the first enslaved
      device is attached, the above can lead to header_ops->create()
      being called with the wrong skb headroom in place.
      
      If bond0 is configured on top of ipoib devices, with the
      following commands:
      
      ifup bond0
      for slave in $BOND_SLAVES_LIST; do
      	ip link set dev $slave nomaster
      done
      ping -c 1 <ip on bond0 subnet>
      
      we will obtain a skb_under_panic() with a similar call trace:
      	skb_push+0x3d/0x40
      	push_pseudo_header+0x17/0x30 [ib_ipoib]
      	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
      	arp_create+0x12f/0x220
      	arp_send_dst.part.19+0x28/0x50
      	arp_solicit+0x115/0x290
      	neigh_probe+0x4d/0x70
      	__neigh_event_send+0xa7/0x230
      	neigh_resolve_output+0x12e/0x1c0
      	ip_finish_output2+0x14b/0x390
      	ip_finish_output+0x136/0x1e0
      	ip_output+0x76/0xe0
      	ip_local_out+0x35/0x40
      	ip_send_skb+0x19/0x40
      	ip_push_pending_frames+0x33/0x40
      	raw_sendmsg+0x7d3/0xb50
      	inet_sendmsg+0x31/0xb0
      	sock_sendmsg+0x38/0x50
      	SYSC_sendto+0x102/0x190
      	SyS_sendto+0xe/0x10
      	do_syscall_64+0x67/0x180
      	entry_SYSCALL64_slow_path+0x25/0x25
      
      This change addresses the issue by not updating the bonding device's
      hard_header_len when the slave list becomes empty, so that it cannot
      shrink below the value used by header_ops->create().
      
      The bug has been present since commit 54ef3137 ("[PATCH] bonding: Handle large
      hard_header_len"), but the panic can only be triggered since
      commit fc791b63 ("IB/ipoib: move back IB LL address into the hard
      header").
      Reported-by: Norbert P <noe@physik.uzh.ch>
      Fixes: 54ef3137 ("[PATCH] bonding: Handle large hard_header_len")
      Fixes: fc791b63 ("IB/ipoib: move back IB LL address into the hard header")
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
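The rule the patch describes can be sketched as below: take the maximum over the slaves, starting from ETH_HLEN, but leave the value alone once the list is empty. The function name is hypothetical. The constant 14 mirrors ETH_HLEN. The header length 24 in the usage example is an arbitrary stand-in for a larger IPoIB header, not a value taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_ETH_HLEN 14   /* same value as ETH_HLEN */

/* Hypothetical recomputation of the bond's hard_header_len after a slave-list
 * change. With slaves: the maximum over them, never below ETH_HLEN.
 * Without slaves: keep the current value, since header_ops installed by an
 * earlier slave may still rely on that much headroom. */
static unsigned short fake_bond_hard_header_len(unsigned short current,
                                                const unsigned short *slave_hhl,
                                                size_t n_slaves)
{
    unsigned short max_hhl = FAKE_ETH_HLEN;

    if (n_slaves == 0)
        return current;
    for (size_t i = 0; i < n_slaves; i++)
        if (slave_hhl[i] > max_hhl)
            max_hhl = slave_hhl[i];
    return max_hhl;
}
```

For example, with two slaves reporting 24, the bond keeps 24 after both are released, rather than dropping back to 14.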
  9. 22 Apr 2017, 1 commit
    • bonding: fix wq initialization for links created via netlink · ea8ffc08
      Committed by Mahesh Bandewar
      Earlier patch 4493b81b ("bonding: initialize work-queues during
      creation of bond") moved the work-queue initialization from bond_open()
      to bond_create(). However, this caused links created via the netlink
      'create bond' option (ip link add bondX type bond) to create the
      new trunk without initializing the work-queues. Prior to the above-mentioned
      change, ndo_open was in both paths and things worked correctly. The
      consequence is visible in the report shared by Joe Stringer:
      
      I've noticed that this patch breaks bonding within namespaces if
      you're not careful to perform device cleanup correctly.
      
      Here's my repro script, you can run on any net-next with this patch
      and you'll start seeing some weird behaviour:
      
      ip netns add foo
      ip li add veth0 type veth peer name veth0+ netns foo
      ip li add veth1 type veth peer name veth1+ netns foo
      ip netns exec foo ip li add bond0 type bond
      ip netns exec foo ip li set dev veth0+ master bond0
      ip netns exec foo ip li set dev veth1+ master bond0
      ip netns exec foo ip addr add dev bond0 192.168.0.1/24
      ip netns exec foo ip li set dev bond0 up
      ip li del dev veth0
      ip li del dev veth1
      
      The second to last command segfaults, last command hangs. rtnl is now
      permanently locked. It's not a problem if you take bond0 down before
      deleting veths, or delete bond0 before deleting veths. If you delete
      either end of the veth pair as per above, either inside or outside the
      namespace, it hits this problem.
      
      Here's some kernel logs:
      [ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link
      [ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link
      [ 1281.193863] bond0: Releasing backup interface veth0+
      [ 1281.193866] bond0: the permanent HWaddr of veth0+ -
      16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of
      veth0+ to a different address to avoid conflicts
      [ 1281.193867] ------------[ cut here ]------------
      [ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511
      __queue_delayed_work+0x13f/0x150
      [ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6
      nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs
      lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse
      serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc
      configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt
      shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
      nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid
      hid mptspi mptscsih e1000 mptbase ahci libahci
      [ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G        W
      4.10.0-bisect-bond-v0.14 #37
      [ 1281.193906] Hardware name: VMware, Inc. VMware Virtual
      Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
      [ 1281.193906] Call Trace:
      [ 1281.193912]  dump_stack+0x63/0x89
      [ 1281.193915]  __warn+0xd1/0xf0
      [ 1281.193917]  warn_slowpath_null+0x1d/0x20
      [ 1281.193918]  __queue_delayed_work+0x13f/0x150
      [ 1281.193920]  queue_delayed_work_on+0x27/0x40
      [ 1281.193929]  bond_change_active_slave+0x25b/0x670 [bonding]
      [ 1281.193932]  ? synchronize_rcu_expedited+0x27/0x30
      [ 1281.193935]  __bond_release_one+0x489/0x510 [bonding]
      [ 1281.193939]  ? addrconf_notify+0x1b7/0xab0
      [ 1281.193942]  bond_netdev_event+0x2c5/0x2e0 [bonding]
      [ 1281.193944]  ? netconsole_netdev_event+0x124/0x190 [netconsole]
      [ 1281.193947]  notifier_call_chain+0x49/0x70
      [ 1281.193948]  raw_notifier_call_chain+0x16/0x20
      [ 1281.193950]  call_netdevice_notifiers_info+0x35/0x60
      [ 1281.193951]  rollback_registered_many+0x23b/0x3e0
      [ 1281.193953]  unregister_netdevice_many+0x24/0xd0
      [ 1281.193955]  rtnl_delete_link+0x3c/0x50
      [ 1281.193956]  rtnl_dellink+0x8d/0x1b0
      [ 1281.193960]  rtnetlink_rcv_msg+0x95/0x220
      [ 1281.193962]  ? __kmalloc_node_track_caller+0x35/0x280
      [ 1281.193964]  ? __netlink_lookup+0xf1/0x110
      [ 1281.193966]  ? rtnl_newlink+0x830/0x830
      [ 1281.193967]  netlink_rcv_skb+0xa7/0xc0
      [ 1281.193969]  rtnetlink_rcv+0x28/0x30
      [ 1281.193970]  netlink_unicast+0x15b/0x210
      [ 1281.193971]  netlink_sendmsg+0x319/0x390
      [ 1281.193974]  sock_sendmsg+0x38/0x50
      [ 1281.193975]  ___sys_sendmsg+0x25c/0x270
      [ 1281.193978]  ? mem_cgroup_commit_charge+0x76/0xf0
      [ 1281.193981]  ? page_add_new_anon_rmap+0x89/0xc0
      [ 1281.193984]  ? lru_cache_add_active_or_unevictable+0x35/0xb0
      [ 1281.193985]  ? __handle_mm_fault+0x4e9/0x1170
      [ 1281.193987]  __sys_sendmsg+0x45/0x80
      [ 1281.193989]  SyS_sendmsg+0x12/0x20
      [ 1281.193991]  do_syscall_64+0x6e/0x180
      [ 1281.193993]  entry_SYSCALL64_slow_path+0x25/0x25
      [ 1281.193995] RIP: 0033:0x7f6ec122f5a0
      [ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX:
      000000000000002e
      [ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0
      [ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003
      [ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003
      [ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f
      [ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450
      [ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]---
      
      Fixes: 4493b81b ("bonding: initialize work-queues during creation of bond")
      Reported-by: Joe Stringer <joe@ovn.org>
      Tested-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 18 Apr 2017, 1 commit
  11. 14 Apr 2017, 1 commit
  12. 06 Apr 2017, 1 commit
    • bonding: attempt to better support longer hw addresses · faeeb317
      Committed by Jarod Wilson
      People are using bonding over Infiniband IPoIB connections, and who knows
      what else. Infiniband has a hardware address length of 20 octets
      (INFINIBAND_ALEN), and the network core defines a MAX_ADDR_LEN of 32.
      Various places in the bonding code are currently hard-wired to 6 octets
      (ETH_ALEN), such as the 3ad code, which I've left untouched here. Besides,
      only alb is currently possible on Infiniband links anyway, due to
      commit 1533e773, so the alb code is where most of the changes are.
      
      One major component of this change is the addition of a bond_hw_addr_copy
      function that takes a length argument, instead of using ether_addr_copy
      everywhere that hardware addresses need to be copied about. The other
      major component of this change is converting the bonding code from using
      struct sockaddr for address storage to struct sockaddr_storage, as the
      former has an address storage space of only 14 octets, while the latter has
      128 minus a few, which is necessary to support bonding over devices with
      hardware addresses of up to MAX_ADDR_LEN octets. Additionally, this probably
      fixes some memory corruption issues in the current code, where it's
      possible to write an Infiniband hardware address into a sockaddr declared
      on the stack.
      
      Lightly tested on a dual mlx4 IPoIB setup, which properly shows a 20-octet
      hardware address now:
      
      $ cat /proc/net/bonding/bond0
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      
      Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
      Primary Slave: mlx4_ib0 (primary_reselect always)
      Currently Active Slave: mlx4_ib0
      MII Status: up
      MII Polling Interval (ms): 100
      Up Delay (ms): 100
      Down Delay (ms): 100
      
      Slave Interface: mlx4_ib0
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr:
      80:00:02:08:fe:80:00:00:00:00:00:00:e4:1d:2d:03:00:1d:67:01
      Slave queue ID: 0
      
      Slave Interface: mlx4_ib1
      MII Status: up
      Speed: Unknown
      Duplex: Unknown
      Link Failure Count: 0
      Permanent HW addr:
      80:00:02:09:fe:80:00:00:00:00:00:01:e4:1d:2d:03:00:1d:67:02
      Slave queue ID: 0
      
      Also tested with a standard 1Gbps NIC bonding setup (with a mix of
      e1000 and e1000e cards), running LNST's bonding tests.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: netdev@vger.kernel.org
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
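Two ideas from this commit can be shown in plain userspace C. The first is a copy helper that takes an explicit length, in the spirit of bond_hw_addr_copy(), whose real signature may differ. The second is storing a 20-octet address in a struct sockaddr_storage, which has room for it, whereas the 14-octet sa_data of struct sockaddr does not. The helper names and FAKE_* constants are assumptions; the constants simply mirror the values quoted in the message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define FAKE_INFINIBAND_ALEN 20  /* value quoted above for INFINIBAND_ALEN */
#define FAKE_MAX_ADDR_LEN    32  /* value quoted above for MAX_ADDR_LEN */

/* The address bytes start where struct sockaddr's sa_data would start. */
#define HW_ADDR_OFFSET offsetof(struct sockaddr, sa_data)

/* sockaddr_storage must be able to hold the largest hardware address. */
_Static_assert(FAKE_MAX_ADDR_LEN <= sizeof(struct sockaddr_storage) - HW_ADDR_OFFSET,
               "sockaddr_storage too small for MAX_ADDR_LEN");

/* Length-aware copy instead of a fixed 6-octet ether_addr_copy(). */
static void hw_addr_copy(unsigned char *dst, const unsigned char *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Stores a hardware address of any supported length; returns -1 if too long. */
static int store_hw_addr(struct sockaddr_storage *ss, sa_family_t family,
                         const unsigned char *addr, size_t len)
{
    if (len > FAKE_MAX_ADDR_LEN)
        return -1;
    memset(ss, 0, sizeof(*ss));
    ss->ss_family = family;
    /* Offset from the storage object itself, so the copy stays inside it. */
    hw_addr_copy((unsigned char *)ss + HW_ADDR_OFFSET, addr, len);
    return 0;
}
```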
  13. 05 Apr 2017, 1 commit
    • bonding: fix active-backup transition · 3f3c278c
      Committed by Mahesh Bandewar
      Earlier patch c4adfc82 ("bonding: make speed, duplex setting
      consistent with link state") made an attempt to keep slave state
      consistent with speed and duplex settings. Unfortunately, link-state
      transitions are used to change the active link, especially when used
      in conjunction with mii-mon, and the above-mentioned patch broke that
      logic. Also, when the speed and duplex settings for a link are updated
      during a link event, the link status should not be changed, so that
      the correct transition logic is invoked.
      
      This patch fixes the issue by moving the link-state update out of
      bond_update_speed_duplex() and into the places where that function
      is called, which now update the link state selectively.
      
      Fixes: c4adfc82 ("bonding: make speed, duplex setting consistent with link state")
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 31 Mar 2017, 1 commit
    • bonding: refine bond_fold_stats() wrap detection · 142c6594
      Committed by Eric Dumazet
      Some device drivers reset their stats at down/up events, possibly
      fooling bonding stats, since they operate with relative deltas.
      
      It is nearly impossible to fix the drivers, since some of them compute
      the tx/rx counters from per-rx/tx-queue stats, and the queues can be
      reconfigured (ethtool -L) between the down/up sequence.
      
      Let's avoid accumulating 'negative' values that render bonding stats
      useless.
      
      It is better to lose small deltas, assuming the bonding stats are
      fetched at a reasonable frequency.
      
      Fixes: 5f0c5f73 ("bonding: make global bonding stats more reliable")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
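The filtering idea can be sketched on a single counter. Compute the difference between the new and old samples, and accumulate it only when it is positive. A counter that went backwards because of a driver reset then contributes nothing. This is a simplified sketch of that idea, not the kernel's bond_fold_stats(), and fold_counter() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Adds (new_val - old_val) to *total only when that difference is positive.
 * If a driver reset its counter, new_val < old_val and the unsigned
 * subtraction wraps to a huge value; reinterpreted as signed it is negative
 * and is dropped. (Converting such a uint64_t to int64_t is
 * implementation-defined in ISO C; GCC and Clang use two's complement.) */
static void fold_counter(uint64_t *total, uint64_t new_val, uint64_t old_val)
{
    uint64_t delta = new_val - old_val;

    if ((int64_t)delta > 0)
        *total += delta;
}
```

The cost is losing whatever was counted between the last sample and the reset. This is the "lose small deltas" trade-off the message accepts.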
  15. 28 Mar 2017, 3 commits
    • bonding: correctly update link status during mii-commit phase · b5bf0f5b
      Committed by Mahesh Bandewar
      bond_miimon_commit() marks the link UP after attempting to get the speed
      and duplex settings for the link. There is a possibility that
      bond_update_speed_duplex() could fail. This is another place where it
      could result in an inconsistent bonding link state.
      
      With this patch, the link is marked UP only if the retrieved speed and
      duplex values are sane; only then is the link processed further.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: make speed, duplex setting consistent with link state · c4adfc82
      Committed by Mahesh Bandewar
      bond_update_speed_duplex() retrieves speed and duplex settings. There
      is a possibility of failure in retrieving these values, but the caller
      has to assume it's always successful. This leads to inconsistent
      slave link settings. If these (speed, duplex) values cannot be
      retrieved, then keeping the link UP causes problems.
      
      The updated bond_update_speed_duplex() returns 0 on success if it
      retrieves sane values for speed and duplex. On failure it returns 1
      and marks the link down.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
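The contract described in this commit is: return 0 and store the values when they are sane, otherwise return 1 and mark the link down. It might look like the sketch below. Note that 3f3c278c, listed above, later moved the link-state update out of this function and into its callers. The struct layout, the query struct, and the FAKE_* constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical duplex encodings; FAKE_DUPLEX_UNKNOWN models an "unknown" answer. */
#define FAKE_DUPLEX_HALF    0
#define FAKE_DUPLEX_FULL    1
#define FAKE_DUPLEX_UNKNOWN 0xff

struct fake_slave {
    bool link_up;
    int speed;              /* Mb/s */
    unsigned char duplex;
};

/* What the driver answered when asked for speed/duplex (hypothetical). */
struct link_query {
    bool ok;                /* did the query itself succeed? */
    int speed;
    unsigned char duplex;
};

/* Returns 0 and stores the values if they are sane; otherwise returns 1 and
 * marks the link down, as this commit describes. */
static int fake_update_speed_duplex(struct fake_slave *s, struct link_query q)
{
    if (!q.ok || q.speed <= 0 ||
        (q.duplex != FAKE_DUPLEX_HALF && q.duplex != FAKE_DUPLEX_FULL)) {
        s->link_up = false;
        return 1;
    }
    s->speed = q.speed;
    s->duplex = q.duplex;
    return 0;
}
```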
    • bonding: improve link-status update in mii-monitoring · de77ecd4
      Committed by Mahesh Bandewar
      The primary issue is that the mii-inspect phase updates the link state
      and expects the changes to be committed during the mii-commit phase.
      If, after the inspect phase, it fails to acquire the rtnl mutex, the
      commit phase (bond_mii_commit) doesn't get to run. This partially updated
      state stays and makes the internal state inconsistent.
      
      e.g. setup bond0 => slaves: eth1, eth2
      eth1 goes DOWN -> UP
         mii_monitor()
      	mii-inspect()
      	    bond_set_slave_link_state(eth1, UP, DontNotify)
      	rtnl_trylock() <- fails!
      
      Next mii-monitor round
      eth1: No change
         mii_monitor()
      	mii-inspect()
      	    eth1->link == current-status (ethtool_ops->get_link)
      	    no-change-detected
      
      End result:
          eth1:
            Link = BOND_LINK_UP
            Speed = 0xfffff  [SpeedUnknown]
            Duplex = 0xff    [DuplexUnknown]
      
      This doesn't always happen, but for some unlucky machines in a large set
      of machines it creates problems.
      
      The fix for this is to avoid making changes during inspect phase and
      postpone them until acquiring the rtnl-mutex / invoking commit phase.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 10 Mar 2017, 4 commits
  17. 03 Mar 2017, 1 commit
  18. 07 Feb 2017, 1 commit
  19. 04 Feb 2017, 1 commit
  20. 09 Jan 2017, 1 commit
  21. 18 Nov 2016, 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Committed by Alexey Dobriyan
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      thus is an unsigned entity. Using a negative value is out-of-bounds
      access by definition.
      
      2)
      On x86_64, unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added to or subtracted from pointers
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction, which isn't necessary if the variable is
      unsigned, because x86_64 zero-extends by default.
      
      Now, there is the net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      The patch snips ~1730 bytes off an allyesconfig kernel (without all the
      junk messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately, some functions actually grow bigger.
      This is a seemingly random artefact of code generation, with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers, and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, the overall balance is in the negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
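Here is a minimal sketch of a net_generic()-style lookup with an unsigned id, assuming 1-based ids as in the excerpt above. The struct and function names are hypothetical. The bounds check is an addition for this example; the excerpt elides whatever checking the kernel does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-namespace table of subsystem pointers; id N lives in ptr[N - 1]. */
struct fake_net_generic {
    unsigned int len;       /* number of valid slots */
    void *ptr[8];
};

/* With an unsigned id, a negative index cannot be expressed at all, and
 * (per the commit message) x86_64 needs no MOVSX before indexing. */
static void *fake_net_generic(const struct fake_net_generic *ng, unsigned int id)
{
    if (id == 0 || id > ng->len)   /* bounds check added for this example */
        return NULL;
    return ng->ptr[id - 1];
}
```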
  22. 31 Oct 2016, 1 commit
  23. 18 Oct 2016, 1 commit
  24. 28 Sep 2016, 1 commit
    • bonding: quit messing with IOCTL · 4ad41c1e
      Committed by Al Viro
      The only remaining users are issuing SIOCGMIIPHY and SIOCGMIIREG,
      neither of which deals with userland pointers.  Simply calling
      ->ndo_do_ioctl() is fine; no messing with set_fs() is needed.
      It used to mess with SIOCETHTOOL, which would've needed set_fs(),
      but that has been killed in "[NET] ethtool ops are the only way"
      9 years ago...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  25. 05 Sep 2016, 1 commit
    • bonding: Fix bonding crash · 24b27fc4
      Committed by Mahesh Bandewar
      The following steps will crash the kernel:
      
        (a) Create bonding master
            > modprobe bonding miimon=50
        (b) Create macvlan bridge on eth2
            > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
      	   type macvlan
        (c) Now try adding eth2 into the bond
            > echo +eth2 > /sys/class/net/bond0/bonding/slaves
            <crash>
      
      Bonding does lots of things before checking whether the device being
      enslaved is busy.
      
      In this case, when the notifier call-chain sends notifications,
      bond_netdev_event() assumes that the rx_handler/rx_handler_data is
      registered, while bond_enslave() hasn't progressed far enough to
      register the rx_handler for the new slave.
      
      This patch adds an rx_handler check that can be performed right at the
      beginning of the enslave code to avoid getting into this situation.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
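The ordering fix can be modeled as follows: refuse a slave whose rx_handler is already claimed, before doing any other enslave work. This sketch assumes that a non-NULL rx_handler means another upper device (here, a macvlan) owns the slave. Every fake_* name is hypothetical. -EBUSY is simply a plausible error code for the example, not a claim about the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device: a non-NULL rx_handler means another upper device
 * already owns this device's receive path. */
struct fake_netdev {
    void (*rx_handler)(void *data);
    void *rx_handler_data;
};

static int enslave_steps_done; /* counts work done past the busy check */

static void fake_bond_rx(void *data) { (void)data; }
static void fake_macvlan_rx(void *data) { (void)data; }

/* Check for a busy rx_handler first, before any other enslave work. */
static int fake_bond_enslave(struct fake_netdev *bond, struct fake_netdev *slave)
{
    if (slave->rx_handler)
        return -EBUSY;
    enslave_steps_done++;      /* ... MTU, MAC address, notifiers, etc. ... */
    slave->rx_handler = fake_bond_rx;
    slave->rx_handler_data = bond;
    return 0;
}
```

Because the check runs first, a busy device such as eth2 under a macvlan is rejected before any notifier can observe a half-enslaved slave.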
  26. 02 Sep 2016, 1 commit
  27. 10 Aug 2016, 1 commit
  28. 26 Jul 2016, 1 commit
  29. 06 Jul 2016, 2 commits
  30. 10 Jun 2016, 1 commit
  31. 08 Jun 2016, 1 commit
  32. 19 Mar 2016, 2 commits
  33. 26 Feb 2016, 1 commit