1. 18 Nov, 2016 1 commit
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      is thus an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned, because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      The patch shaves ~1730 bytes off an allyesconfig kernel (without all the junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, the overall balance is negative:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c7d03a00
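      The indexing pattern discussed in this commit can be sketched in user
      space as follows. This is only an illustration: struct net_generic_model,
      lookup_signed() and lookup_unsigned() are hypothetical names, not kernel
      code. Both functions return the same slot for a valid id; the difference
      the commit describes is that the signed variant typically needs a
      sign-extension of the index on x86_64, while the unsigned one does not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace pointer table. */
struct net_generic_model {
	unsigned int len;
	void *ptr[8];
};

/* Signed id: the compiler usually has to sign-extend (id - 1) to 64 bits
 * before it can be used as an array index on x86_64. */
void *lookup_signed(const struct net_generic_model *ng, int id)
{
	assert(id >= 1 && (unsigned int)id <= ng->len);
	return ng->ptr[id - 1];
}

/* Unsigned id: 32-bit operations zero-extend implicitly on x86_64,
 * so no separate extension instruction is needed. */
void *lookup_unsigned(const struct net_generic_model *ng, unsigned int id)
{
	assert(id >= 1 && id <= ng->len);
	return ng->ptr[id - 1];
}
```

      Compiling both functions with optimization and comparing the generated
      assembly is one way to observe the difference on a given compiler.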
  2. 31 Oct, 2016 1 commit
  3. 18 Oct, 2016 1 commit
  4. 28 Sep, 2016 1 commit
    • A
      bonding: quit messing with IOCTL · 4ad41c1e
      Al Viro committed
      The only remaining users are issuing SIOCGMIIPHY and SIOCGMIIREG,
      neither of which deals with userland pointers.  Simply calling
      ->ndo_do_ioctl() is fine; no messing with set_fs() is needed.
      It used to mess with SIOCETHTOOL, which would've needed set_fs(),
      but that has been killed in "[NET] ethtool ops are the only way"
      9 years ago...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      4ad41c1e
  5. 05 Sep, 2016 1 commit
    • M
      bonding: Fix bonding crash · 24b27fc4
      Mahesh Bandewar committed
      The following steps will crash the kernel:
      
        (a) Create bonding master
            > modprobe bonding miimon=50
        (b) Create macvlan bridge on eth2
            > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
      	   type macvlan
        (c) Now try adding eth2 into the bond
            > echo +eth2 > /sys/class/net/bond0/bonding/slaves
            <crash>
      
      Bonding does lots of things before checking if the device enslaved is
      busy or not.
      
      In this case when the notifier call-chain sends notifications, the
      bond_netdev_event() assumes that the rx_handler /rx_handler_data is
      registered while the bond_enslave() hasn't progressed far enough to
      register rx_handler for the new slave.
      
      This patch adds a rx_handler check that can be performed right at the
      beginning of the enslave code to avoid getting into this situation.
      Signed-off-by: Mahesh Bandewar <maheshb@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      24b27fc4
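      The early check described in this commit can be modelled roughly as
      below. This is a user-space sketch under stated assumptions: struct
      netdev_model and enslave_model() are hypothetical, and returning -EBUSY
      is an assumed error code chosen for illustration. The point is only the
      ordering: the busy test runs before any enslave work is done.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a network device. */
struct netdev_model {
	void *rx_handler; /* non-NULL when another upper device owns the rx path */
	int enslaved;     /* set once the enslave work has completed */
};

/* Refuse a device whose rx_handler is already taken, before doing any
 * of the setup that could trigger notifications. */
int enslave_model(struct netdev_model *slave)
{
	assert(slave != NULL);

	if (slave->rx_handler != NULL)
		return -EBUSY; /* assumed error code */

	/* ... the remaining enslave steps would run here ... */
	slave->enslaved = 1;
	return 0;
}
```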
  6. 02 Sep, 2016 1 commit
  7. 10 Aug, 2016 1 commit
  8. 26 Jul, 2016 1 commit
  9. 06 Jul, 2016 2 commits
  10. 10 Jun, 2016 1 commit
  11. 08 Jun, 2016 1 commit
  12. 19 Mar, 2016 2 commits
  13. 26 Feb, 2016 1 commit
  14. 17 Feb, 2016 1 commit
    • J
      bonding: don't use stale speed and duplex information · 266b495f
      Jay Vosburgh committed
      There is presently a race condition between the bonding periodic
      link monitor and the updating of a slave's speed and duplex.  The former
      occurs on a periodic basis, and the latter in response to a driver's
      calling of netif_carrier_on.
      
      	It is possible for the periodic monitor to run between the
      driver call of netif_carrier_on and the receipt of the NETDEV_CHANGE
      event that causes bonding to update the slave's speed and duplex.  This
      manifests most notably as a report that a slave is up and "0 Mbps full
      duplex" after enslavement, but in principle could report an incorrect
      speed and duplex after any link up event if the device comes up with a
      different speed or duplex.  This affects the 802.3ad aggregator
      selection, as the speed and duplex are selection criteria.
      
      	This is fixed by updating the speed and duplex in the periodic
      monitor, prior to using that information.
      
      	This was done historically in bonding, but the call to
      bond_update_speed_duplex was removed in commit 876254ae ("bonding:
      don't call update_speed_duplex() under spinlocks"), as it might sleep
      under lock.  Later, the locking was changed to only hold RTNL, and so
      after commit 876254ae ("bonding: don't call update_speed_duplex()
      under spinlocks") this call is again safe.
      Tested-by: "Tantilov, Emil S" <emil.s.tantilov@intel.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: dingtianhong <dingtianhong@huawei.com>
      Fixes: 876254ae ("bonding: don't call update_speed_duplex() under spinlocks")
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Acked-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      266b495f
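      The fix in this commit amounts to refreshing the cached link parameters
      inside the periodic monitor before they are read. A simplified sketch,
      assuming a hypothetical struct slave_model in which hw_* fields stand
      for what the driver currently reports and cached_* fields stand for
      what bonding last recorded:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a bonding slave's link state. */
struct slave_model {
	int carrier_ok;         /* driver has called its carrier-on routine */
	int hw_speed;           /* speed currently reported by the driver */
	int hw_full_duplex;     /* duplex currently reported by the driver */
	int cached_speed;       /* value bonding last recorded */
	int cached_full_duplex; /* value bonding last recorded */
};

/* Copy the driver-reported values into the cache. */
void update_speed_duplex_model(struct slave_model *s)
{
	s->cached_speed = s->hw_speed;
	s->cached_full_duplex = s->hw_full_duplex;
}

/* Periodic monitor: refresh first, then use the cached speed.
 * Returns the speed to report, or 0 when the link is down. */
int monitor_inspect_model(struct slave_model *s)
{
	assert(s != NULL);
	if (!s->carrier_ok)
		return 0;
	update_speed_duplex_model(s); /* avoids reading a stale cache */
	return s->cached_speed;
}
```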
  15. 13 Feb, 2016 1 commit
    • J
      bonding: Fix ARP monitor validation · 21a75f09
      Jay Vosburgh committed
      The current logic in bond_arp_rcv will accept an incoming ARP for
      validation if (a) the receiving slave is either "active" (which includes
      the currently active slave, or the current ARP slave) or, (b) there is a
      currently active slave, and it has received an ARP since it became active.
      For case (b), the receiving slave isn't the currently active slave, and is
      receiving the original broadcast ARP request, not an ARP reply from the
      target.
      
      	This logic can fail if there is no currently active slave.  In
      this situation, the ARP probe logic cycles through all slaves, assigning
      each in turn as the "current_arp_slave" for one arp_interval, then setting
      that one as "active," and sending an ARP probe from that slave.  The
      current logic expects the ARP reply to arrive on the sending
      current_arp_slave, however, due to switch FDB updating delays, the reply
      may be directed to another slave.
      
      	This can arise if the bonding slaves and switch are working, but
      the ARP target is not responding.  When the ARP target recovers, a
      condition may result wherein the ARP target host replies faster than the
      switch can update its forwarding table, causing each ARP reply to be sent
      to the previous current_arp_slave.  This will never pass the logic in
      bond_arp_rcv, as neither of the above conditions (a) or (b) are met.
      
      	Some experimentation on a LAN shows ARP reply round trips in the
      200 usec range, but my available switches never update their FDB in less
      than 4000 usec.
      
      	This patch changes the logic in bond_arp_rcv to additionally
      accept an ARP reply for validation on any slave if there is a current ARP
      slave and it sent an ARP probe during the previous arp_interval.
      
      Fixes: aeea64ac ("bonding: don't trust arp requests unless active slave really works")
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      21a75f09
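      The acceptance rule described in this commit can be written as a single
      predicate. The sketch below is an assumption-laden model, not the code
      in bond_arp_rcv: struct arp_state_model and its fields are hypothetical,
      and "sent a probe during the previous arp_interval" is approximated as
      "last transmit no more than one interval ago".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state consulted when an ARP arrives. */
struct arp_state_model {
	bool rx_slave_active;             /* (a) receiving slave counts as active */
	bool have_curr_active;            /* (b) a currently active slave exists... */
	bool curr_active_rx_since_active; /*     ...and has received an ARP since */
	bool have_curr_arp_slave;         /* new: a current ARP slave exists... */
	unsigned long now;                /* current time, arbitrary ticks */
	unsigned long arp_slave_last_tx;  /*     ...and last probed at this time */
	unsigned long arp_interval;       /* probe interval in the same ticks */
};

bool should_validate_model(const struct arp_state_model *s)
{
	assert(s != NULL);
	if (s->rx_slave_active)
		return true;
	if (s->have_curr_active && s->curr_active_rx_since_active)
		return true;
	/* Added condition: accept on any slave if the current ARP slave
	 * probed recently, even if the reply arrived elsewhere. */
	if (s->have_curr_arp_slave &&
	    s->now - s->arp_slave_last_tx <= s->arp_interval)
		return true;
	return false;
}
```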
  16. 11 Feb, 2016 1 commit
  17. 08 Feb, 2016 1 commit
  18. 06 Feb, 2016 2 commits
  19. 12 Jan, 2016 1 commit
  20. 16 Dec, 2015 1 commit
    • T
      net: Rename NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK · a188222b
      Tom Herbert committed
      The name NETIF_F_ALL_CSUM is a misnomer. This does not correspond to the
      set of features for offloading all checksums. This is a mask of the
      checksum offload related features bits. It is incorrect to set both
      NETIF_F_HW_CSUM and NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM at the same time for
      features of a device.
      
      This patch:
        - Changes instances of NETIF_F_ALL_CSUM to NETIF_F_CSUM_MASK (where
          NETIF_F_ALL_CSUM is being used as a mask).
        - Changes bonding, sfc/efx, ipvlan, macvlan, vlan, and team drivers to
          use NETIF_F_HW_CSUM in features list instead of NETIF_F_ALL_CSUM.
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a188222b
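      The distinction this commit draws between a mask and a feature set can
      be illustrated with a small model. The F_*_MODEL bit values below are
      made up for the example and do not match the kernel's definitions;
      csum_features_valid() and strip_csum() are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit assignments, for illustration only. */
#define F_IP_CSUM_MODEL   (1u << 0)
#define F_HW_CSUM_MODEL   (1u << 1)
#define F_IPV6_CSUM_MODEL (1u << 2)
#define F_OTHER_MODEL     (1u << 3) /* some unrelated feature bit */

/* A mask covering every checksum-related bit; useful for testing or
 * clearing bits, but not meaningful as a device's feature set. */
#define F_CSUM_MASK_MODEL \
	(F_IP_CSUM_MODEL | F_HW_CSUM_MODEL | F_IPV6_CSUM_MODEL)

/* The generic flag must not be combined with the protocol-specific ones. */
bool csum_features_valid(unsigned int features)
{
	unsigned int csum = features & F_CSUM_MASK_MODEL;

	if ((csum & F_HW_CSUM_MODEL) &&
	    (csum & (F_IP_CSUM_MODEL | F_IPV6_CSUM_MODEL)))
		return false;
	return true;
}

/* Typical mask usage: drop all checksum offload bits at once. */
unsigned int strip_csum(unsigned int features)
{
	return features & ~F_CSUM_MASK_MODEL;
}
```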
  21. 04 Dec, 2015 7 commits
  22. 08 Nov, 2015 1 commit
    • J
      bonding: fix panic on non-ARPHRD_ETHER enslave failure · 40baec22
      Jay Vosburgh committed
      Since commit 7d5cd2ce529b, when bond_enslave fails on devices that
      are not ARPHRD_ETHER, if needed, it resets the bonding device back to
      ARPHRD_ETHER by calling ether_setup.
      
      	Unfortunately, ether_setup clobbers dev->flags, clearing IFF_UP
      if the bond device is up, leaving it in a quasi-down state without
      having actually gone through dev_close.  For bonding, if any periodic
      work queue items are active (miimon, arp_interval, etc), those will
      remain running, as they are stopped by bond_close.  At this point, if
      the bonding module is unloaded or the bond is deleted, the system will
      panic when the work function is called.
      
      	This panic is resolved by calling dev_close on the bond itself
      prior to calling ether_setup.
      
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Fixes: 7d5cd2ce ("bonding: correctly handle bonding type change on enslave failure")
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      40baec22
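      The ordering problem in this commit can be shown with a toy model. All
      names below (struct bond_model, dev_close_model(), ether_setup_model(),
      the FLAG_* values) are hypothetical; the sketch only demonstrates why
      closing the device first stops the periodic work, whereas resetting
      the flags alone leaves it running.

```c
#include <assert.h>
#include <stddef.h>

#define FLAG_UP_MODEL        0x1u
#define FLAG_BROADCAST_MODEL 0x2u
#define FLAG_MULTICAST_MODEL 0x4u

/* Hypothetical model of the bond device. */
struct bond_model {
	unsigned int flags;
	int work_running; /* periodic monitor work is queued */
};

/* Proper shutdown: stops periodic work, then clears the UP flag. */
void dev_close_model(struct bond_model *b)
{
	if (b->flags & FLAG_UP_MODEL) {
		b->work_running = 0;
		b->flags &= ~FLAG_UP_MODEL;
	}
}

/* Type reset: overwrites flags wholesale and knows nothing about work. */
void ether_setup_model(struct bond_model *b)
{
	b->flags = FLAG_BROADCAST_MODEL | FLAG_MULTICAST_MODEL;
}

/* Fixed error path: close first, then reset the type. */
void restore_ether_type_model(struct bond_model *b)
{
	assert(b != NULL);
	dev_close_model(b);
	ether_setup_model(b);
}
```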
  23. 03 Nov, 2015 1 commit
  24. 16 Oct, 2015 1 commit
  25. 18 Sep, 2015 1 commit
    • E
      bonding: use l4 hash if available · 4b1b865e
      Eric Dumazet committed
      If the skb carries an l4 hash, there is no need to perform a flow dissection.
      
      Performance is slightly better :
      
      lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
      2.39012e+06
      lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
      2.39393e+06
      lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
      2.39988e+06
      
      After patch :
      
      lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
      2.43579e+06
      lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
      2.44304e+06
      lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
      2.44312e+06
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b1b865e
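      The shortcut in this commit can be sketched as follows. This is a
      user-space model with hypothetical names (struct skb_model,
      flow_dissect_hash_model(), xmit_hash_model()); the fallback mixing
      function is an arbitrary placeholder, and any additional conditions
      the real driver applies before taking the shortcut are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the packet fields relevant to hashing. */
struct skb_model {
	uint32_t hash;    /* hash already computed for the packet, if any */
	bool l4_hash;     /* true when that hash covers L4 (ports) */
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Placeholder for the slower path that inspects the headers. */
uint32_t flow_dissect_hash_model(const struct skb_model *skb)
{
	uint32_t h = skb->saddr ^ skb->daddr;

	h ^= ((uint32_t)skb->sport << 16) | skb->dport;
	h ^= h >> 16;
	return h;
}

/* Reuse the existing L4 hash when present; dissect otherwise. */
uint32_t xmit_hash_model(const struct skb_model *skb)
{
	assert(skb != NULL);
	if (skb->l4_hash)
		return skb->hash;
	return flow_dissect_hash_model(skb);
}
```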
  26. 02 Sep, 2015 1 commit
  27. 29 Aug, 2015 1 commit
    • N
      bonding: fix bond_poll_controller bh_enable warning · b0d4943e
      Nikolay Aleksandrov committed
      The problem is rcu_read_unlock_bh() which triggers a warning when irqs are
      disabled. ndo_poll_controller should run with irqs disabled always so we
      can drop the rcu_read_lock_bh.
      
      [   98.502922] bond0: making interface eth1 the new active one
      [   98.503039] ------------[ cut here ]------------
      [   98.503039] WARNING: CPU: 0 PID: 1744 at kernel/softirq.c:150 __local_bh_enable_ip+0x96/0xc0()
      [   98.503039] Modules linked in: bonding(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netconsole ppdev joydev parport_pc serio_raw parport i2c_piix4 video acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_net e1000 ata_generic pcnet32 mii virtio_pci virtio_ring virtio pata_acpi
      [   98.503039] CPU: 0 PID: 1744 Comm: ifenslave Tainted: G           OE   4.2.0-rc7+ #56
      [   98.503039] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   98.503039]  0000000000000000 00000000e96ba230 ffff880020c236b8 ffffffff8183f105
      [   98.503039]  0000000000000000 0000000000000000 ffff880020c236f8 ffffffff810a9496
      [   98.503039]  ffff88002ea99e08 0000000000000200 ffffffffa02a8e06 ffff88002ea99e08
      [   98.503039] Call Trace:
      [   98.503039]  [<ffffffff8183f105>] dump_stack+0x4c/0x65
      [   98.503039]  [<ffffffff810a9496>] warn_slowpath_common+0x86/0xc0
      [   98.503039]  [<ffffffffa02a8e06>] ? bond_poll_controller+0x146/0x250 [bonding]
      [   98.503039]  [<ffffffff810a95ca>] warn_slowpath_null+0x1a/0x20
      [   98.503039]  [<ffffffff810ae376>] __local_bh_enable_ip+0x96/0xc0
      [   98.503039]  [<ffffffffa02a8e2f>] bond_poll_controller+0x16f/0x250 [bonding]
      [   98.503039]  [<ffffffffa02a8cf3>] ? bond_poll_controller+0x33/0x250 [bonding]
      [   98.503039]  [<ffffffff810feaed>] ? trace_hardirqs_off+0xd/0x10
      [   98.503039]  [<ffffffff81848afb>] ? _raw_spin_unlock_irqrestore+0x5b/0x60
      [   98.503039]  [<ffffffff816ec48e>] netpoll_poll_dev+0x6e/0x350
      [   98.503039]  [<ffffffff816eb977>] ? netpoll_start_xmit+0x137/0x1d0
      [   98.503039]  [<ffffffff816b2e8b>] ? __alloc_skb+0x5b/0x210
      [   98.503039]  [<ffffffff816ec89d>] netpoll_send_skb_on_dev+0x12d/0x2a0
      [   98.503039]  [<ffffffff816eccde>] netpoll_send_udp+0x2ce/0x430
      [   98.503039]  [<ffffffffa0190850>] write_msg+0xb0/0xf0 [netconsole]
      [   98.503039]  [<ffffffff81116b63>] call_console_drivers.constprop.25+0x133/0x260
      [   98.503039]  [<ffffffff81117934>] console_unlock+0x2f4/0x580
      [   98.503039]  [<ffffffff81117ea5>] ? vprintk_emit+0x2e5/0x630
      [   98.503039]  [<ffffffff81117ee5>] vprintk_emit+0x325/0x630
      [   98.503039]  [<ffffffff81118379>] vprintk_default+0x29/0x40
      [   98.503039]  [<ffffffff8183de4f>] printk+0x55/0x6b
      [   98.503039]  [<ffffffff816c754c>] __netdev_printk+0x16c/0x260
      [   98.503039]  [<ffffffff816c7a12>] netdev_info+0x62/0x80
      [   98.503039]  [<ffffffffa02ab464>] bond_change_active_slave+0x134/0x6a0 [bonding]
      [   98.503039]  [<ffffffffa02aba95>] bond_select_active_slave+0xc5/0x310 [bonding]
      [   98.503039]  [<ffffffffa02aeb78>] bond_enslave+0x1088/0x10c0 [bonding]
      [   98.503039]  [<ffffffffa02af46b>] bond_do_ioctl+0x37b/0x400 [bonding]
      [   98.503039]  [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10
      [   98.503039]  [<ffffffff816dc437>] ? rtnl_lock+0x17/0x20
      [   98.503039]  [<ffffffff816e5fd1>] dev_ifsioc+0x331/0x3e0
      [   98.503039]  [<ffffffff816e62dc>] dev_ioctl+0xec/0x6c0
      [   98.503039]  [<ffffffff816a6c6a>] sock_do_ioctl+0x4a/0x60
      [   98.503039]  [<ffffffff816a7300>] sock_ioctl+0x1c0/0x250
      [   98.503039]  [<ffffffff81271bfe>] do_vfs_ioctl+0x2ee/0x540
      [   98.503039]  [<ffffffff810fd943>] ? up_read+0x23/0x40
      [   98.503039]  [<ffffffff81070993>] ? __do_page_fault+0x1d3/0x420
      [   98.503039]  [<ffffffff8127e246>] ? __fget_light+0x66/0x90
      [   98.503039]  [<ffffffff81271ec9>] SyS_ioctl+0x79/0x90
      [   98.503039]  [<ffffffff8184936e>] entry_SYSCALL_64_fastpath+0x12/0x76
      [   98.503039] ---[ end trace 00cfa804b0670051 ]---
      
      Fixes: 616f4541 ("bonding: implement bond_poll_controller()")
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Mahesh Bandewar <maheshb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b0d4943e
  28. 19 Aug, 2015 1 commit
  29. 13 Aug, 2015 1 commit
  30. 21 Jul, 2015 2 commits
    • D
      bonding: correct the MAC address for "follow" fail_over_mac policy · a951bc1e
      dingtianhong committed
      The "follow" fail_over_mac policy is useful for multiport devices that
      either become confused or incur a performance penalty when multiple
      ports are programmed with the same MAC address. Even under this policy,
      however, the following steps can still produce duplicate MAC addresses:
      
      1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
         bond0 has the same mac address with eth0, it is MAC1.
      
      2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
         eth1 is backup, eth1 has MAC2.
      
      3) ifconfig eth0 down
         eth1 became active slave, bond will swap MAC for eth0 and eth1,
         so eth1 has MAC1, and eth0 has MAC2.
      
      4) ifconfig eth1 down
         there is no active slave, and eth1 still has MAC1, eth0 has MAC2.
      
      5) ifconfig eth0 up
         the eth0 became active slave again, the bond set eth0 to MAC1.
      
      Something is wrong here: if you then set eth1 up, eth0 and eth1 will
      have the same MAC address, which breaks this policy for ACTIVE_BACKUP mode.
      
      This patch fixes the problem by finding the old active slave and
      swapping the MAC addresses before changing the active slave.
      Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
      Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a951bc1e
    • N
      bonding: correctly handle bonding type change on enslave failure · 7d5cd2ce
      Nikolay Aleksandrov committed
      If the bond is enslaving a device of a different type, the bond is set up
      by that device. If the enslave then fails, the bond doesn't switch back
      to its original type and also keeps pointers to foreign structures that
      can be long gone. Thus revert any type changes if the enslave failed and
      the bond had to change its type.
      Example:
       Before patch:
      $ echo lo > bond0/bonding/slaves
      -bash: echo: write error: Cannot assign requested address
      $ ip l sh bond0
      20: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
      mode DEFAULT group default
          link/loopback 16:54:78:34:bd:41 brd 00:00:00:00:00:00
      $ echo +eth1 > bond0/bonding/slaves
      $ ip l sh bond0
      20: bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
      DEFAULT group default qlen 1000
          link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff
      (notice the MASTER flag is gone)
      
       After patch:
      $ echo lo > bond0/bonding/slaves
      -bash: echo: write error: Cannot assign requested address
      $ ip l sh bond0
      21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
      mode DEFAULT group default qlen 1000
          link/ether 6e:66:94:f6:07:fc brd ff:ff:ff:ff:ff:ff
      $ echo +eth1 > bond0/bonding/slaves
      $ ip l sh bond0
      21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
      mode DEFAULT group default qlen 1000
          link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Fixes: e36b9d16 ("bonding: clean muticast addresses when device changes type")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d5cd2ce