1. 17 Feb, 2016 20 commits
    • net: arc_emac: fix koops caused by sk_buff free · c278c253
      Alexander Kochetkov committed
      There is a race between arc_emac_tx() and arc_emac_tx_clean():
      an sk_buff can be freed by arc_emac_tx_clean() while arc_emac_tx()
      is still submitting it.
      
      Before freeing an sk_buff, arc_emac_tx_clean() checks:
          if ((info & FOR_EMAC) || !txbd->data)
              break;
          ...
          dev_kfree_skb_irq(skb);
      
      If the condition is false, arc_emac_tx_clean() frees the sk_buff.
      
      To submit a txbd, arc_emac_tx() does:
          priv->tx_buff[*txbd_curr].skb = skb;
          ...
          priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
          ...
          ...  <== arc_emac_tx_clean() check condition here
          ...  <== (info & FOR_EMAC) is false
          ...  <== !txbd->data is false
          ...
          *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
      
      To reproduce the situation,
      run on the device:
          # iperf -s
      run on the host:
          # iperf -t 600 -c <device-ip-addr>
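The window described above can be modelled in plain C. This is a minimal user-space sketch, not driver code: the struct names, the helper functions and the value of FOR_EMAC are assumptions made for illustration. It only shows that the quoted check cannot tell a half-submitted descriptor from a completed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FOR_EMAC 0x80000000u  /* ownership bit; the value is assumed for this model */

/* Simplified stand-in for a TX buffer descriptor. */
struct txbd_model {
    uint32_t info;  /* ownership bit and length */
    uint32_t data;  /* DMA address, 0 when unused */
};

/* Simplified stand-in for one TX ring slot. */
struct tx_slot_model {
    struct txbd_model bd;
    const void *skb;  /* NULL when no buffer is attached */
};

/* The check quoted in the commit message: true means "free the skb". */
static bool clean_would_free(const struct tx_slot_model *slot)
{
    assert(slot != NULL);
    return !((slot->bd.info & FOR_EMAC) || !slot->bd.data);
}

/* Slot in the middle of a submit: skb and data are set, info is not yet written. */
static struct tx_slot_model mid_submit_state(const void *skb, uint32_t addr)
{
    struct tx_slot_model slot = { .bd = { .info = 0, .data = addr }, .skb = skb };
    return slot;
}

/* Slot after the submit finished: the descriptor is owned by the hardware. */
static struct tx_slot_model submitted_state(const void *skb, uint32_t addr, uint32_t len)
{
    struct tx_slot_model slot = mid_submit_state(skb, addr);
    slot.bd.info = FOR_EMAC | len;
    return slot;
}
```

In this model clean_would_free() returns true for the mid-submit state, which is the premature free the commit describes, and false once the ownership bit has been written.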
      
      [   28.396284] ------------[ cut here ]------------
      [   28.400912] kernel BUG at .../net/core/skbuff.c:1355!
      [   28.414019] Internal error: Oops - BUG: 0 [#1] SMP ARM
      [   28.419150] Modules linked in:
      [   28.422219] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B           4.4.0+ #120
      [   28.429516] Hardware name: Rockchip (Device Tree)
      [   28.434216] task: c0665070 ti: c0660000 task.ti: c0660000
      [   28.439622] PC is at skb_put+0x10/0x54
      [   28.443381] LR is at arc_emac_poll+0x260/0x474
      [   28.447821] pc : [<c03af580>]    lr : [<c028fec4>]    psr: a0070113
      [   28.447821] sp : c0661e58  ip : eea68502  fp : ef377000
      [   28.459280] r10: 0000012c  r9 : f08b2000  r8 : eeb57100
      [   28.464498] r7 : 00000000  r6 : ef376594  r5 : 00000077  r4 : ef376000
      [   28.471015] r3 : 0030488b  r2 : ef13e880  r1 : 000005ee  r0 : eeb57100
      [   28.477534] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [   28.484658] Control: 10c5387d  Table: 8eaf004a  DAC: 00000051
      [   28.490396] Process swapper/0 (pid: 0, stack limit = 0xc0660210)
      [   28.496393] Stack: (0xc0661e58 to 0xc0662000)
      [   28.500745] 1e40:                                                       00000002 00000000
      [   28.508913] 1e60: 00000000 ef376520 00000028 f08b23b8 00000000 ef376520 ef7b6900 c028fc64
      [   28.517082] 1e80: 2f158000 c0661ea8 c0661eb0 0000012c c065e900 c03bdeac ffff95e9 c0662100
      [   28.525250] 1ea0: c0663924 00000028 c0661ea8 c0661ea8 c0661eb0 c0661eb0 0000001e c0660000
      [   28.533417] 1ec0: 40000003 00000008 c0695a00 0000000a c066208c 00000100 c0661ee0 c0027410
      [   28.541584] 1ee0: ef0fb700 2f158000 00200000 ffff95e8 00000004 c0662100 c0662080 00000003
      [   28.549751] 1f00: 00000000 00000000 00000000 c065b45c 0000001e ef005000 c0647a30 00000000
      [   28.557919] 1f20: 00000000 c0027798 00000000 c005cf40 f0802100 c0662ffc c0661f60 f0803100
      [   28.566088] 1f40: c0661fb8 c00093bc c000ffb4 60070013 ffffffff c0661f94 c0661fb8 c00137d4
      [   28.574267] 1f60: 00000001 00000000 00000000 c001ffa0 00000000 c0660000 00000000 c065a364
      [   28.582441] 1f80: c0661fb8 c0647a30 00000000 00000000 00000000 c0661fb0 c000ffb0 c000ffb4
      [   28.590608] 1fa0: 60070013 ffffffff 00000051 00000000 00000000 c005496c c0662400 c061bc40
      [   28.598776] 1fc0: ffffffff ffffffff 00000000 c061b680 00000000 c0647a30 00000000 c0695294
      [   28.606943] 1fe0: c0662488 c0647a2c c066619c 6000406a 413fc090 6000807c 00000000 00000000
      [   28.615127] [<c03af580>] (skb_put) from [<ef376520>] (0xef376520)
      [   28.621218] Code: e5902054 e590c090 e3520000 0a000000 (e7f001f2)
      [   28.627307] ---[ end trace 4824734e2243fdb6 ]---
      
      [   34.377068] Internal error: Oops: 17 [#1] SMP ARM
      [   34.382854] Modules linked in:
      [   34.385947] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.4.0+ #120
      [   34.392219] Hardware name: Rockchip (Device Tree)
      [   34.396937] task: ef02d040 ti: ef05c000 task.ti: ef05c000
      [   34.402376] PC is at __dev_kfree_skb_irq+0x4/0x80
      [   34.407121] LR is at arc_emac_poll+0x130/0x474
      [   34.411583] pc : [<c03bb640>]    lr : [<c028fd94>]    psr: 60030013
      [   34.411583] sp : ef05de68  ip : 0008e83c  fp : ef377000
      [   34.423062] r10: c001bec4  r9 : 00000000  r8 : f08b24c8
      [   34.428296] r7 : f08b2400  r6 : 00000075  r5 : 00000019  r4 : ef376000
      [   34.434827] r3 : 00060000  r2 : 00000042  r1 : 00000001  r0 : 00000000
      [   34.441365] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      [   34.448507] Control: 10c5387d  Table: 8f25c04a  DAC: 00000051
      [   34.454262] Process ksoftirqd/0 (pid: 3, stack limit = 0xef05c210)
      [   34.460449] Stack: (0xef05de68 to 0xef05e000)
      [   34.464827] de60:                   ef376000 c028fd94 00000000 c0669480 c0669480 ef376520
      [   34.473022] de80: 00000028 00000001 00002ae4 ef376520 ef7b6900 c028fc64 2f158000 ef05dec0
      [   34.481215] dea0: ef05dec8 0000012c c065e900 c03bdeac ffff983f c0662100 c0663924 00000028
      [   34.489409] dec0: ef05dec0 ef05dec0 ef05dec8 ef05dec8 ef7b6000 ef05c000 40000003 00000008
      [   34.497600] dee0: c0695a00 0000000a c066208c 00000100 ef05def8 c0027410 ef7b6000 40000000
      [   34.505795] df00: 04208040 ffff983e 00000004 c0662100 c0662080 00000003 ef05c000 ef027340
      [   34.513985] df20: ef05c000 c0666c2c 00000000 00000001 00000002 00000000 00000000 c0027568
      [   34.522176] df40: ef027340 c003ef48 ef027300 00000000 ef027340 c003edd4 00000000 00000000
      [   34.530367] df60: 00000000 c003c37c ffffff7f 00000001 00000000 ef027340 00000000 00030003
      [   34.538559] df80: ef05df80 ef05df80 00000000 00000000 ef05df90 ef05df90 ef05dfac ef027300
      [   34.546750] dfa0: c003c2a4 00000000 00000000 c000f578 00000000 00000000 00000000 00000000
      [   34.554939] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      [   34.563129] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffffffff dfff7fff
      [   34.571360] [<c03bb640>] (__dev_kfree_skb_irq) from [<c028fd94>] (arc_emac_poll+0x130/0x474)
      [   34.579840] [<c028fd94>] (arc_emac_poll) from [<c03bdeac>] (net_rx_action+0xdc/0x28c)
      [   34.587712] [<c03bdeac>] (net_rx_action) from [<c0027410>] (__do_softirq+0xcc/0x1f8)
      [   34.595482] [<c0027410>] (__do_softirq) from [<c0027568>] (run_ksoftirqd+0x2c/0x50)
      [   34.603168] [<c0027568>] (run_ksoftirqd) from [<c003ef48>] (smpboot_thread_fn+0x174/0x18c)
      [   34.611466] [<c003ef48>] (smpboot_thread_fn) from [<c003c37c>] (kthread+0xd8/0xec)
      [   34.619075] [<c003c37c>] (kthread) from [<c000f578>] (ret_from_fork+0x14/0x3c)
      [   34.626317] Code: e8bd8010 e3a00000 e12fff1e e92d4010 (e59030a4)
      [   34.632572] ---[ end trace cca5a3d86a82249a ]---
      Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Copy inner L3 and L4 headers as unaligned on GRE TEB · 78565208
      Alexander Duyck committed
      This patch corrects the unaligned accesses seen on GRE TEB tunnels when
      generating hash keys.  Specifically, it forces the use of skb_copy_bits()
      when the GRE inner headers will be unaligned due to NET_IP_ALIGN being
      a non-zero value.
      Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
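The idea behind the GRE TEB fix above, copying header bytes out rather than dereferencing a possibly misaligned pointer, can be sketched in user-space C. The helper name read_u32_unaligned() is hypothetical and memcpy() stands in for skb_copy_bits(); this illustrates the technique and is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit field at an arbitrary byte offset without assuming
 * alignment. Casting buf + offset to uint32_t * and dereferencing it
 * may fault or be slow on architectures that require aligned loads.
 */
static uint32_t read_u32_unaligned(const uint8_t *buf, size_t len, size_t offset)
{
    uint32_t value;

    assert(buf != NULL && offset <= len && len - offset >= sizeof(value));
    memcpy(&value, buf + offset, sizeof(value));
    return value;
}
```

A caller would use read_u32_unaligned(pkt, pkt_len, 2) to fetch a field that starts two bytes into a buffer; the value comes back in the buffer's byte order.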
    • Merge branch 'mlx5-fixes' · 7b4c534e
      David S. Miller committed
      Saeed Mahameed says:
      
      ====================
      mlx5 driver fixes for 4.5-rc2
      
      This series includes a patch from Matan and Alaa that addresses Linus's
      comments on the mess w.r.t. reserved field names in the driver/firmware
      auto-generated file.
      
      Once the patch hits Linus's tree, we'll ask Doug to rebase his tree on that
      rc so both net-next and rdma-next development for 4.6 will be done under
      the fixed robust form.
      
      Also included are two patches that address the dynamic ndo initialization
      issue of the mlx5e netdevice.
      
      Or and Saeed.
      
      changes from V1: (Only first patch was changed)
      In this version we fixed the issues raised in Or's previous e-mail.
      	1. Offsets took into account two dimensional u8 arrays
      	2. Offsets took into account nesting unions and structs
      	3. Offsets for unions
      	4. Offsets for any reserved field
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5e: Use static constant netdevice ndos · b0eed40e
      Saeed Mahameed committed
      Currently our netdevice ops is a single static global variable that
      is referenced by all mlx5e netdevice instances. This can be
      problematic when different driver instances do not share the same
      HW capabilities (e.g. SRIOV PF and VFs probed to the host).
      
      Now we have two constant global netdevice ops variables, one
      for basic netdevice ops and the other with extended SRIOV ops.
      On netdevice construction we choose the one suitable for the
      current device capabilities.
      
      Fixes: 66e49ded ("net/mlx5e: Add support for SR-IOV ndos")
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
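The "two constant ops tables, chosen at construction" pattern from the commit above can be sketched as follows. The struct ndo_model and every function name here are hypothetical stand-ins; the real struct net_device_ops is much larger and the mlx5e code is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down ops table. */
struct ndo_model {
    int (*open)(void);
    int (*set_vf_mac)(int vf);  /* NULL when SR-IOV is not supported */
};

static int model_open(void) { return 0; }
static int model_set_vf_mac(int vf) { return vf >= 0 ? 0 : -1; }

/* Two constant tables instead of one mutable global patched at probe time. */
static const struct ndo_model basic_ops = {
    .open = model_open,
    .set_vf_mac = NULL,
};

static const struct ndo_model sriov_ops = {
    .open = model_open,
    .set_vf_mac = model_set_vf_mac,
};

/* Pick the table that matches this device's capabilities. */
static const struct ndo_model *select_ops(bool has_sriov)
{
    return has_sriov ? &sriov_ops : &basic_ops;
}

/* Callers must tolerate a missing callback; -1 stands in for an errno here. */
static int try_set_vf_mac(const struct ndo_model *ops, int vf)
{
    assert(ops != NULL);
    return ops->set_vf_mac ? ops->set_vf_mac(vf) : -1;
}
```

Because both tables are const, one device instance can no longer change the callbacks seen by another, which is the hazard the commit message describes for a shared mutable table.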
    • net/mlx5e: Remove select queue ndo initialization · b2368727
      Saeed Mahameed committed
      Currently mlx5e_select_queue is redundant since num_tc is always 1.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Use offset based reserved field names in the IFC header file · b4ff3a36
      Matan Barak committed
      mlx5_ifc.h is a header file representing the API and ABI between
      the driver and the firmware and hardware. This file is used by
      both the mlx5_ib and mlx5_core drivers.
      
      Previously, this file used an incrementing counter to name
      reserved fields, for example:
      
      struct mlx5_ifc_odp_per_transport_service_cap_bits {
              u8         send[0x1];
              u8         receive[0x1];
              u8         write[0x1];
              u8         read[0x1];
              u8         reserved_0[0x1];
              u8         srq_receive[0x1];
              u8         reserved_1[0x1a];
      };
      
      If one developer implements feature A through net-next and it uses
      reserved_0, they replace it with featureA and rename reserved_1 to
      reserved_0. In the same kernel cycle, a second developer could implement
      feature B through the rdma tree, using reserved_1 and splitting it into
      featureB and a smaller reserved_1 field. This will cause a conflict
      when the two trees are merged.
      
      The source of this conflict is that the first developer changed *all*
      reserved fields.
      
      As Linus suggested, we change the layout of structs to:
      
      struct mlx5_ifc_odp_per_transport_service_cap_bits {
      	u8         send[0x1];
      	u8         receive[0x1];
      	u8         write[0x1];
      	u8         read[0x1];
      	u8         reserved_at_4[0x1];
      	u8         srq_receive[0x1];
      	u8         reserved_at_6[0x1a];
      };
      
      This makes conflicts much rarer and preserves the locality of
      changes.
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
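The offset-based naming from the commit above can be checked mechanically, since each array length in these structs is a field width in bits and every element is one byte. In the sketch below the first struct is the one quoted in the commit message; the second struct and the field feature_b are hypothetical, added only to show that carving a feature out of one reserved field leaves the other names untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;

/* Layout quoted in the commit message. */
struct mlx5_ifc_odp_per_transport_service_cap_bits {
    u8 send[0x1];
    u8 receive[0x1];
    u8 write[0x1];
    u8 read[0x1];
    u8 reserved_at_4[0x1];
    u8 srq_receive[0x1];
    u8 reserved_at_6[0x1a];
};

/* Hypothetical later revision: feature_b takes the first bit of reserved_at_6. */
struct cap_bits_with_feature_b {
    u8 send[0x1];
    u8 receive[0x1];
    u8 write[0x1];
    u8 read[0x1];
    u8 reserved_at_4[0x1];  /* name unchanged: its offset did not move */
    u8 srq_receive[0x1];
    u8 feature_b[0x1];
    u8 reserved_at_7[0x19]; /* only the field that was split is renamed */
};

/* Each reserved field's name encodes its bit offset within the struct. */
static_assert(offsetof(struct mlx5_ifc_odp_per_transport_service_cap_bits,
                       reserved_at_4) == 0x4, "name must match offset");
static_assert(offsetof(struct mlx5_ifc_odp_per_transport_service_cap_bits,
                       reserved_at_6) == 0x6, "name must match offset");
static_assert(offsetof(struct cap_bits_with_feature_b,
                       reserved_at_7) == 0x7, "name must match offset");
static_assert(sizeof(struct cap_bits_with_feature_b) ==
              sizeof(struct mlx5_ifc_odp_per_transport_service_cap_bits),
              "total width must stay the same");
```

With counter-based names the same change would have shifted every later reserved_N, which is where the cross-tree merge conflicts came from.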
    • bonding: don't use stale speed and duplex information · 266b495f
      Jay Vosburgh committed
      There is presently a race condition between the bonding periodic
      link monitor and the updating of a slave's speed and duplex.  The former
      occurs on a periodic basis, and the latter in response to a driver's
      calling of netif_carrier_on.
      
      	It is possible for the periodic monitor to run between the
      driver call of netif_carrier_on and the receipt of the NETDEV_CHANGE
      event that causes bonding to update the slave's speed and duplex.  This
      manifests most notably as a report that a slave is up and "0 Mbps full
      duplex" after enslavement, but in principle could report an incorrect
      speed and duplex after any link up event if the device comes up with a
      different speed or duplex.  This affects the 802.3ad aggregator
      selection, as the speed and duplex are selection criteria.
      
      	This is fixed by updating the speed and duplex in the periodic
      monitor, prior to using that information.
      
      	This was done historically in bonding, but the call to
      bond_update_speed_duplex was removed in commit 876254ae ("bonding:
      don't call update_speed_duplex() under spinlocks"), as it might sleep
      under lock.  Later, the locking was changed to only hold RTNL, and so
      after commit 876254ae ("bonding: don't call update_speed_duplex()
      under spinlocks") this call is again safe.
      Tested-by: "Tantilov, Emil S" <emil.s.tantilov@intel.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: dingtianhong <dingtianhong@huawei.com>
      Fixes: 876254ae ("bonding: don't call update_speed_duplex() under spinlocks")
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Acked-by: Ding Tianhong <dingtianhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: am79c961a: avoid %? in inline assembly · a5a23ad5
      Arnd Bergmann committed
      The am79c961a.c driver fails to build with clang because of an
      unusual inline assembly construct:
      
      drivers/net/ethernet/amd/am79c961a.c:53:7: error: invalid % escape in inline assembly string
       "str%?h        %1, [%2]        @ NET_RAP\n\t"
      
      The same change was made a decade ago in arch/arm as of
      6a39dd62 ("[ARM] 3759/2: Remove uses of %?"), but apparently
      some drivers were missed.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: smc91x: propagate irq return code · bd59cfc5
      Robert Jarzmik committed
      The smc91x driver doesn't honor the probe deferral mechanism when the
      interrupt source is not yet available, such as one provided by a GPIO
      controller that has not been probed yet.
      
      Fix this by propagating the platform_get_irq() error code as the probe
      return value.
      Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
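Why propagating the IRQ lookup error matters for probe deferral can be shown with a small model. Everything below is a sketch under assumptions: the callback type stands in for platform_get_irq(), the MODEL_ constants mirror the usual kernel errno values, and the "flattening" variant assumes the old code mapped every failure to one fixed error, which the commit message does not spell out.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_EPROBE_DEFER 517  /* kernel-internal code: retry the probe later */
#define MODEL_ENODEV       19

/* Hypothetical stand-in for the IRQ lookup: IRQ number or negative errno. */
typedef int (*get_irq_fn)(void);

/* Assumed old behaviour: any failure is flattened into one fixed error. */
static int probe_flattening(get_irq_fn get_irq)
{
    assert(get_irq != NULL);
    int irq = get_irq();
    if (irq < 0)
        return -MODEL_ENODEV;
    return 0;
}

/* Behaviour described in the commit: pass the lookup's error code through. */
static int probe_propagating(get_irq_fn get_irq)
{
    assert(get_irq != NULL);
    int irq = get_irq();
    if (irq < 0)
        return irq;
    return 0;
}

/* Two example lookups: provider not probed yet, and provider ready. */
static int irq_not_ready(void) { return -MODEL_EPROBE_DEFER; }
static int irq_ready(void) { return 42; }
```

Only the propagating variant lets the driver core see the deferral request and retry the probe once the interrupt provider has appeared.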
    • Merge branch 'bcm7xxx-fixes' · bfb3a9df
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      Subject: [PATCH net v2 0/4] net: phy: bcm7xxx 40nm PHY fixes
      
      Here is a collection of fixes for the 40nm Ethernet PHY supported
      by the 7xxx PHY driver, please also queue these fixes for stable.
      
      Changes in v2:
      
      - dropped the cleanup patch, not appropriate
      - added another patch removing bogus wildcard entries
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: bcm7xxx: Remove wildcard entries · 815717d1
      Florian Fainelli committed
      Remove the two wildcard entries: they serve no purpose and will match far
      too many devices, some of which are covered by the driver in
      drivers/net/phy/broadcom.c. Also remove the now unused
      bcm7xxx_dummy_config_init() function, which would otherwise produce a warning.
      
      Fixes: b560a58c ("net: phy: add Broadcom BCM7xxx internal PHY driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: bcm7xxx: Fix bcm7xxx_config_init() check · 258bf443
      Florian Fainelli committed
      Since we were wrongly advertising gigabit features for these 10/100 only
      Ethernet PHYs, bcm7xxx_config_init(), which is supposed to apply the
      workaround, would not have run because the check would be true. Now that
      we have fixed the PHY features, remove that check since there is no
      reason for it to be there anymore.
      
      Fixes: e18556ee ("net: phy: bcm7xxx: do not use PHY_BRCM_100MBPS_WAR")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: bcm7xxx: Fix 40nm EPHY features · c6dd213a
      Florian Fainelli committed
      The PHY entries for BCM7425/29/35 declare the 40nm Ethernet PHY as being
      10/100/1000 capable, while it is only a 10/100 capable PHY device. Fix that.
      
      Fixes: d068b02c ("net: phy: add BCM7425 and BCM7429 PHYs")
      Fixes: 9458ceab ("net: phy: bcm7xxx: Add entry for BCM7435")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: bcm7xxx: Fix shadow mode 2 disabling · 50d89980
      Florian Fainelli committed
      The clear and set masks in the call to phy_set_clr_bits() made from
      bcm7xxx_config_init() are inverted. Fix this by swapping the two
      arguments, that is, set 0 bits, but clear the shadow mode 2 enable bit.
      
      Fixes: b560a58c ("net: phy: add Broadcom BCM7xxx internal PHY driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
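The effect of swapping the set and clear masks, as in the shadow mode 2 commit above, is easy to see on a pure function. This is a model, not the driver's phy_set_clr_bits(): it works on a plain value instead of a PHY register, the clear-then-set order is an assumption, and the bit position of SHADOW_MODE_2_ENABLE is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SHADOW_MODE_2_ENABLE 0x0080u  /* illustrative bit position, assumed */

/* Model of a read-modify-write helper: clear clr_mask, then set set_mask. */
static uint16_t set_clr_bits(uint16_t reg, uint16_t set_mask, uint16_t clr_mask)
{
    /* Setting and clearing the same bit in one call is almost certainly a bug. */
    assert((set_mask & clr_mask) == 0);
    return (uint16_t)((reg & ~clr_mask) | set_mask);
}
```

With the arguments in the right order, set_clr_bits(reg, 0, SHADOW_MODE_2_ENABLE) clears the enable bit. With them inverted, set_clr_bits(reg, SHADOW_MODE_2_ENABLE, 0) sets the bit instead, so shadow mode 2 is never disabled.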
    • Merge branch 'ravb-fixes' · 5d8e498f
      David S. Miller committed
      Sergei Shtylyov says:
      
      ====================
      ravb: fix the fallout of R-Car gen3 gPTP support
      
         Here's a set of 2 patches against DaveM's 'net.git' repo fixing up the
      incomplete commit f5d7837f ("ravb: ptp: Add CONFIG mode support").
      I'm proposing these as fixes but they can be merged as cleanups as well...
      
      [1/2] ravb: kill duplicate setting of CCC.CSEL
      [2/2] ravb: skip gPTP start/stop on R-Car gen3
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ravb: skip gPTP start/stop on R-Car gen3 · 50bfd838
      Sergei Shtylyov committed
      When adding support for the R-Car gen3 gPTP active in configuration mode,
      some call sites of ravb_ptp_{init|stop}() were missed due to an oversight.
      Add checks for the R-Car gen2 SoCs around these...
      
      Fixes: f5d7837f ("ravb: ptp: Add CONFIG mode support")
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ravb: kill duplicate setting of CCC.CSEL · d0c5f45a
      Sergei Shtylyov committed
      When adding support for the R-Car gen3 gPTP active in configuration mode,
      the code setting the CCC.CSEL field was duplicated due to an oversight.
      For R-Car gen2 it's just redundant and for R-Car gen3 the write at this
      time is probably ignored due to CCC.GAC bit being already set...
      
      Fixes: f5d7837f ("ravb: ptp: Add CONFIG mode support")
      Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · dba36b38
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains a rather large batch for your net tree
      that includes accumulated bugfixes. They are:
      
      1) Run conntrack cleanup from workqueue process context to avoid hitting
         soft lockup via watchdog for large tables. This is required by the
         IPv6 masquerading extension. From Florian Westphal.
      
      2) Use original skbuff from nfnetlink batch when calling netlink_ack()
         on error since this needs to access the skb->sk pointer.
      
      3) Incremental fix on top of recent Sasha Levin's lock fix for conntrack
         resizing.
      
      4) Fix several problems in nfnetlink batch message header sanitization
         and error handling, from Phil Turnbull.
      
      5) Select NF_DUP_IPV6 based on CONFIG_IPV6, from Arnd Bergmann.
      
      6) Fix wrong signedness in return values on nf_tables counter expression,
         from Anton Protopopov.
      
      Due to the NetDev 1.1 organization burden, I had no chance to pass
      this up to you any sooner in this release cycle, sorry about that.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • af_unix: Guard against other == sk in unix_dgram_sendmsg · a5527dda
      Rainer Weikusat committed
      The unix_dgram_sendmsg routine uses the following test
      
      if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
      
      to determine if sk and other are in an n:1 association (either
      established via connect or by using sendto to send messages to an
      unrelated socket identified by address). This isn't correct as the
      specified address could have been bound to the sending socket itself or
      because this socket could have been connected to itself by the time of
      the unix_peer_get but disconnected before the unix_state_lock(other). In
      both cases, the if-block would be entered despite other == sk which
      might either block the sender unintentionally or lead to trying to unlock
      the same spin lock twice for a non-blocking send. Add an other != sk
      check to guard against this.
      
      Fixes: 7d267278 ("unix: avoid use-after-free in ep_remove_wait_queue")
      Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
      Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Tested-by: Philipp Hahn <pmhahn@pmhahn.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
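The self-send case from the commit above can be reproduced on a toy model of the condition. The struct sock_model and both predicate names are hypothetical; peer stands in for unix_peer() and recvq_full for unix_recvq_full(). The sketch covers only the boolean test, not the locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal socket model: only the fields the test needs. */
struct sock_model {
    const struct sock_model *peer;  /* connected peer, or NULL */
    bool recvq_full;
};

/* The test quoted in the commit message. */
static bool enters_block_old(const struct sock_model *sk,
                             const struct sock_model *other)
{
    assert(sk != NULL && other != NULL);
    return other->peer != sk && other->recvq_full;
}

/* The same test with the added other != sk guard. */
static bool enters_block_new(const struct sock_model *sk,
                             const struct sock_model *other)
{
    assert(sk != NULL && other != NULL);
    return other != sk && other->peer != sk && other->recvq_full;
}
```

For a socket that sends to its own address while unconnected and with a full receive queue, the old predicate is true even though other == sk; the guarded predicate is false, so the if-block is skipped.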
    • af_unix: Don't set err in unix_stream_read_generic unless there was an error · 1b92ee3d
      Rainer Weikusat committed
      The present unix_stream_read_generic contains various code sequences of
      the form
      
      err = -EDISASTER;
      if (<test>)
      	goto out;
      
      This has the unfortunate side effect of possibly causing the error code
      to bleed through to the final
      
      out:
      	return copied ? : err;
      
      and then to be wrongly returned if no data was copied because the caller
      didn't supply a data buffer, as demonstrated by the program available at
      
      http://pad.lv/1540731
      
      Change it such that err is only set if an error condition was detected.
      
      Fixes: 3822b5c2 ("af_unix: Revert 'lock_interruptible' in stream receive code")
      Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
      Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
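The "error code bleeds through" problem in the commit above comes from assigning err before the test rather than inside it. The two functions below are a reduced model with hypothetical names and parameters; MODEL_EOPNOTSUPP is an illustrative value and the portable copied ? copied : err replaces the GNU ?: shorthand quoted in the message.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EOPNOTSUPP 95  /* illustrative errno value */

/* Old pattern: err is assigned before the test, so it survives a passing test. */
static int read_old(bool unsupported_request, size_t buf_len)
{
    int copied = 0;
    int err;

    assert(buf_len <= INT_MAX);

    err = -MODEL_EOPNOTSUPP;
    if (unsupported_request)
        goto out;

    copied = (int)buf_len;  /* pretend enough data was queued */
out:
    return copied ? copied : err;
}

/* New pattern: err is assigned only once an error has been detected. */
static int read_new(bool unsupported_request, size_t buf_len)
{
    int copied = 0;
    int err = 0;

    assert(buf_len <= INT_MAX);

    if (unsupported_request) {
        err = -MODEL_EOPNOTSUPP;
        goto out;
    }

    copied = (int)buf_len;  /* pretend enough data was queued */
out:
    return copied ? copied : err;
}
```

When the caller passes a zero-length buffer and nothing is wrong, read_old() returns the stale error while read_new() returns 0.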
  2. 13 Feb, 2016 19 commits
    • dscc4: Undefined signed int shift · db92ea5d
      Michael McConville committed
      My analysis in the below mail applies, although the second part is
      unnecessary because i isn't used in arithmetic operations here:
      
      https://marc.info/?l=openbsd-tech&m=145377854103866&w=2
      
      Thanks for your time.
      Signed-off-by: Michael McConville <mmcco@mykolab.com>
      Acked-by: Francois Romieu <romieu@fr.zoreil.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: do not leave reserved VLANs · 66d9cd0f
      Vivien Didelot committed
      BRIDGE_VLAN_FILTERING automatically adds a newly bridged port to the
      VLAN with the bridge's default_pvid.
      
      The mv88e6xxx driver currently reserves VLANs 4000+ to isolate unbridged
      ports. When a port joins a bridge, it leaves its reserved VLAN. When
      a port leaves a bridge, it rejoins its reserved VLAN.
      
      But if the VLAN filtering is disabled, or if this hardware VLAN is
      already in use, the bridged port ends up with no default VLAN, and the
      communication with the CPU is thus broken.
      
      To fix this, make a port join its reserved VLAN once on setup, never
      leave it, and restore its PVID after another one was eventually used.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: mv88e6xxx: fix software VLAN deletion · 3c06f08b
      Vivien Didelot committed
      The current bridge code calls switchdev_port_obj_del on a VLAN port even
      if the corresponding switchdev_port_obj_add call returned -EOPNOTSUPP.
      
      If the DSA driver doesn't return -EOPNOTSUPP for a software port VLAN in
      its port_vlan_del function, the VLAN is not deleted. Unbridging the port
      also generates a stack trace for the same reason.
      
      This can be quickly tested on a VLAN filtering enabled system with:
      
          # brctl addbr br0
          # brctl addif br0 lan0
          # brctl addbr br1
          # brctl addif br1 lan1
          # brctl delif br1 lan1
      
      Both bridges have a default default_pvid set to 1. lan0 uses the
      hardware VLAN 1 while lan1 falls back to the software VLAN 1.
      
      Unbridging lan1 does not delete its software VLAN, and thus generates
      the following stack trace:
      
          [ 2991.681705] device lan1 left promiscuous mode
          [ 2991.686237] br1: port 1(lan1) entered disabled state
          [ 2991.725094] ------------[ cut here ]------------
          [ 2991.729761] WARNING: CPU: 0 PID: 869 at net/bridge/br_vlan.c:314 __vlan_group_free+0x4c/0x50()
          [ 2991.738437] Modules linked in:
          [ 2991.741546] CPU: 0 PID: 869 Comm: ip Not tainted 4.4.0 #16
          [ 2991.747039] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
          [ 2991.753511] Backtrace:
          [ 2991.756008] [<80014450>] (dump_backtrace) from [<8001469c>] (show_stack+0x20/0x24)
          [ 2991.763604]  r6:80512644 r5:00000009 r4:00000000 r3:00000000
          [ 2991.769343] [<8001467c>] (show_stack) from [<80268e44>] (dump_stack+0x24/0x28)
          [ 2991.776618] [<80268e20>] (dump_stack) from [<80025568>] (warn_slowpath_common+0x98/0xc4)
          [ 2991.784750] [<800254d0>] (warn_slowpath_common) from [<80025650>] (warn_slowpath_null+0x2c/0x34)
          [ 2991.793557]  r8:00000000 r7:9f786a8c r6:9f76c440 r5:9f786a00 r4:9f68ac00
          [ 2991.800366] [<80025624>] (warn_slowpath_null) from [<80512644>] (__vlan_group_free+0x4c/0x50)
          [ 2991.808946] [<805125f8>] (__vlan_group_free) from [<80514488>] (nbp_vlan_flush+0x44/0x68)
          [ 2991.817147]  r4:9f68ac00 r3:9ec70000
          [ 2991.820772] [<80514444>] (nbp_vlan_flush) from [<80506f08>] (del_nbp+0xac/0x130)
          [ 2991.828201]  r5:9f56f800 r4:9f786a00
          [ 2991.831841] [<80506e5c>] (del_nbp) from [<8050774c>] (br_del_if+0x40/0xbc)
          [ 2991.838724]  r7:80590f68 r6:00000000 r5:9ec71c38 r4:9f76c440
          [ 2991.844475] [<8050770c>] (br_del_if) from [<80503dc0>] (br_del_slave+0x1c/0x20)
          [ 2991.851802]  r5:9ec71c38 r4:9f56f800
          [ 2991.855428] [<80503da4>] (br_del_slave) from [<80484a34>] (do_setlink+0x324/0x7b8)
          [ 2991.863043] [<80484710>] (do_setlink) from [<80485e90>] (rtnl_newlink+0x508/0x6f4)
          [ 2991.870616]  r10:00000000 r9:9ec71ba8 r8:00000000 r7:00000000 r6:9f6b0400 r5:9f56f800
          [ 2991.878548]  r4:8076278c
          [ 2991.881110] [<80485988>] (rtnl_newlink) from [<80484048>] (rtnetlink_rcv_msg+0x18c/0x22c)
          [ 2991.889315]  r10:9f7d4e40 r9:00000000 r8:00000000 r7:00000000 r6:9f7d4e40 r5:9f6b0400
          [ 2991.897250]  r4:00000000
          [ 2991.899814] [<80483ebc>] (rtnetlink_rcv_msg) from [<80497c74>] (netlink_rcv_skb+0xb0/0xcc)
          [ 2991.908104]  r8:00000000 r7:9f7d4e40 r6:9f7d4e40 r5:80483ebc r4:9f6b0400
          [ 2991.914928] [<80497bc4>] (netlink_rcv_skb) from [<80483eb4>] (rtnetlink_rcv+0x34/0x3c)
          [ 2991.922874]  r6:9f5ea000 r5:00000028 r4:9f7d4e40 r3:80483e80
          [ 2991.928622] [<80483e80>] (rtnetlink_rcv) from [<80497604>] (netlink_unicast+0x180/0x200)
          [ 2991.936742]  r4:9f4edc00 r3:80483e80
          [ 2991.940362] [<80497484>] (netlink_unicast) from [<80497a88>] (netlink_sendmsg+0x33c/0x350)
          [ 2991.948648]  r8:00000000 r7:00000028 r6:00000000 r5:9f5ea000 r4:9ec71f4c
          [ 2991.955481] [<8049774c>] (netlink_sendmsg) from [<80457ff0>] (sock_sendmsg+0x24/0x34)
          [ 2991.963342]  r10:00000000 r9:9ec71e28 r8:00000000 r7:9f1e2140 r6:00000000 r5:00000000
          [ 2991.971276]  r4:9ec71f4c
          [ 2991.973849] [<80457fcc>] (sock_sendmsg) from [<80458af0>] (___sys_sendmsg+0x1fc/0x204)
          [ 2991.981809] [<804588f4>] (___sys_sendmsg) from [<804598d0>] (__sys_sendmsg+0x4c/0x7c)
          [ 2991.989640]  r10:00000000 r9:9ec70000 r8:80010824 r7:00000128 r6:7ee946c4 r5:00000000
          [ 2991.997572]  r4:9f1e2140
          [ 2992.000128] [<80459884>] (__sys_sendmsg) from [<80459918>] (SyS_sendmsg+0x18/0x1c)
          [ 2992.007725]  r6:00000000 r5:7ee9c7b8 r4:7ee946e0
          [ 2992.012430] [<80459900>] (SyS_sendmsg) from [<80010660>] (ret_fast_syscall+0x0/0x3c)
          [ 2992.020182] ---[ end trace 5d4bc29f4da04280 ]---
      
      To fix this, return -EOPNOTSUPP in _mv88e6xxx_port_vlan_del instead of
      -ENOENT if the hardware VLAN doesn't exist or the port is not a member.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Tested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cavium: liquidio: fix check for in progress flag · 19a6d156
      Colin Ian King committed
      smatch detected a suspicious looking bitop condition:
      
      drivers/net/ethernet/cavium/liquidio/lio_main.c:2529
        handle_timestamp() warn: suspicious bitop condition
      
      (skb_shinfo(skb)->tx_flags | SKBTX_IN_PROGRESS) is always non-zero,
      so the logic is definitely not correct.  Use & to mask the correct
      bit.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
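The bitop mistake flagged by smatch in the commit above can be demonstrated in isolation. The flag name and its value below are placeholders, not the kernel's SKBTX_IN_PROGRESS definition, and the two helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_IN_PROGRESS (1u << 2)  /* illustrative flag bit, value assumed */

static_assert(TX_IN_PROGRESS != 0, "mask must select at least one bit");

/* Buggy form: OR-ing with a non-zero constant can never yield zero. */
static bool in_progress_buggy(uint8_t tx_flags)
{
    return (tx_flags | TX_IN_PROGRESS) != 0;
}

/* Fixed form: AND masks out the single bit of interest. */
static bool in_progress_fixed(uint8_t tx_flags)
{
    return (tx_flags & TX_IN_PROGRESS) != 0;
}
```

The buggy helper reports "in progress" for every input, including 0, whereas the fixed helper is true only when the flag bit is actually set.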
    • hv_netvsc: Restore needed_headroom request · 14a03cf8
      Vitaly Kuznetsov committed
      Commit c0eb4540 ("hv_netvsc: Don't ask for additional head room in the
      skb") got rid of the needed_headroom setting for the driver. With that
      change I hit the following issue trying to use the pktgen module:
      
      [   57.522021] kernel BUG at net/core/skbuff.c:1128!
      [   57.522021] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
      ...
      [   58.721068] Call Trace:
      [   58.721068]  [<ffffffffa0144e86>] netvsc_start_xmit+0x4c6/0x8e0 [hv_netvsc]
      ...
      [   58.721068]  [<ffffffffa02f87fc>] ? pktgen_finalize_skb+0x25c/0x2a0 [pktgen]
      [   58.721068]  [<ffffffff814f5760>] ? __netdev_alloc_skb+0xc0/0x100
      [   58.721068]  [<ffffffffa02f9907>] pktgen_thread_worker+0x257/0x1920 [pktgen]
      
      Basically, we're calling skb_cow_head(skb, RNDIS_AND_PPI_SIZE) and crash on
          if (skb_shared(skb))
              BUG();
      
      We probably need to restore the needed_headroom setting (but shrunk to
      RNDIS_AND_PPI_SIZE, as we don't need more) to request the required
      headroom space. In theory, this should not incur a performance penalty.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mvneta-fixes' · 603607de
      David S. Miller committed
      Gregory CLEMENT says:
      
      ====================
      mvneta fixes for SMP
      
      Following this bug report:
      http://thread.gmane.org/gmane.linux.ports.arm.kernel/468173 and the
      suggestions from Russell King, I reviewed all the code involving
      multi-CPU. It ended with this series of patches which should improve
      the stability of the driver.
      
      During my tests I found another bug, which is fixed by a new patch (the
      second one in this new version of the series).
      
      The first two patches fix real bugs; the others fix potential issues
      in the driver.
      
      Changelog:
      
      v1 -> v2
       - Fix spinlock comment. Pointed out by David Miller
      
      v2 -> v3
       - Fix typos and mistakes in the comments. Pointed out by Sergei Shtylyov
       - Add a new patch fixing the CPU choice in mvneta_percpu_elect
       - Use a lock in the last patch to prevent the remaining race condition.
         Pointed out by Jisheng
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: Fix race condition during stopping · 120cfa50
      Gregory CLEMENT committed
      When stopping the port, the CPU notifier is still registered while the
      mvneta_stop_dev function calls mvneta_percpu_disable() on each CPU.
      A new CPU could come online at this point, which would be racy.
      
      This patch adds a flag that prevents the notifier code from running for
      a new CPU while the port is stopping. It also uses the spinlock
      introduced previously. To avoid a deadlock, the lock has been moved
      outside the mvneta_percpu_elect function.
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: The mvneta_percpu_elect function should be atomic · 5888511e
      Gregory CLEMENT committed
      Electing a CPU must be done atomically: it should happen either before
      or after the removal/insertion of a CPU, and this function is not
      reentrant.
      
      During the loop in mvneta_percpu_elect we associate the queues with the
      CPUs. If there is a topology change during this loop, the mapping
      between the CPUs and the queues could be wrong. During this loop the
      interrupt mask is also updated for each CPU; it should not be changed
      at the same time by other parts of the driver.
      
      This patch adds a spinlock to create the needed critical sections.
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: Modify the queue related fields from each cpu · db488c10
      Gregory CLEMENT committed
      In the MVNETA_INTR_* registers, the queue-related fields are per CPU,
      according to the datasheet (comments in [] were added by me):
      "In a multi-CPU system, bits of RX[or TX] queues for which the access by
      the reading[or writing] CPU is disabled are read as 0, and cannot be
      cleared[or written]."
      
      That means that each time we want to manipulate these bits, we have to
      do it on each CPU and not only on the current CPU.
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: Remove unused code · cde4c0fe
      Gregory CLEMENT committed
      Since commit 2dcf75e2 ("net: mvneta: Associate RX queues with
      each CPU"), all the percpu IRQs are used and disabled at initialization,
      so there is no point in disabling them first.
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: Use on_each_cpu when possible · 6b125d63
      Gregory CLEMENT committed
      Instead of using a for_each_* loop in which we just call the
      smp_call_function_single macro, it is simpler to use the on_each_cpu
      macro directly. Moreover, this macro ensures that the calls will be
      done all at once.
      Suggested-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvneta: Fix the CPU choice in mvneta_percpu_elect · cad5d847
      Gregory CLEMENT committed
      When support for multiple RX queues was introduced, the
      mvneta_percpu_elect function was broken. The use of the modulo can lead
      to electing the wrong CPU. For example, with rxq_def=2, if CPU 2 goes
      offline and then online, we end up with the third RX queue activated at
      the same time on CPU 0 and CPU 2, which leads to a kernel crash.
      
      With this fix, we no longer try to pick "the closest" CPU if the default
      CPU is gone; we just use CPU 0, which will always be there. Thanks to
      this, the code becomes more readable, easier to maintain and more
      predictable.
      
      Cc: stable@vger.kernel.org
      Fixes: 2dcf75e2 ("net: mvneta: Associate RX queues with each CPU")
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
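The election rule described in the mvneta_percpu_elect commit above can be sketched in user space as follows. This is only an illustration under stated assumptions: the function name, the online[] array and the bounds check are hypothetical and do not reproduce the driver's code. It shows the stated policy: use the default CPU when it is online, otherwise fall back to CPU 0.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: pick the CPU that should own the default RX queue.
 * online[] marks which CPUs are currently online. CPU 0 is assumed to
 * always be available, as the commit message states.
 */
static int elect_cpu(int default_cpu, const bool *online, int nr_cpus)
{
    /* Use the default CPU only if it exists and is online. */
    if (default_cpu >= 0 && default_cpu < nr_cpus && online[default_cpu])
        return default_cpu;

    /* No "closest CPU" search: simply fall back to CPU 0. */
    return 0;
}
```

Because the fallback is a fixed CPU rather than a computed one, two different CPUs cannot both be elected for the same queue across an offline/online cycle in this sketch.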
    • net: mvneta: Fix for_each_present_cpu usage · 129219e4
      Gregory CLEMENT committed
      This patch converts the for_each_present_cpu loop into on_each_cpu, so
      that instead of being applied to the present CPUs it is applied only to
      the online CPUs. This fixes a bug reported at
      http://thread.gmane.org/gmane.linux.ports.arm.kernel/468173.
      
      Using the macro on_each_cpu (instead of a for_each_* loop) also ensures
      that all the calls will be done all at once.
      
      Fixes: f8642885 ("net: mvneta: Statically assign queues to CPUs")
      Reported-by: Stefan Roese <stefan.roese@gmail.com>
      Suggested-by: Jisheng Zhang <jszhang@marvell.com>
      Suggested-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vsock: Fix blocking ops call in prepare_to_wait · 59888180
      Laura Abbott committed
      We received a bug report from someone using vmware:
      
      WARNING: CPU: 3 PID: 660 at kernel/sched/core.c:7389
      __might_sleep+0x7d/0x90()
      do not call blocking ops when !TASK_RUNNING; state=1 set at
      [<ffffffff810fa68d>] prepare_to_wait+0x2d/0x90
      Modules linked in: vmw_vsock_vmci_transport vsock snd_seq_midi
      snd_seq_midi_event snd_ens1371 iosf_mbi gameport snd_rawmidi
      snd_ac97_codec ac97_bus snd_seq coretemp snd_seq_device snd_pcm
      snd_timer snd soundcore ppdev crct10dif_pclmul crc32_pclmul
      ghash_clmulni_intel vmw_vmci vmw_balloon i2c_piix4 shpchp parport_pc
      parport acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc btrfs
      xor raid6_pq 8021q garp stp llc mrp crc32c_intel serio_raw mptspi vmwgfx
      drm_kms_helper ttm drm scsi_transport_spi mptscsih e1000 ata_generic
      mptbase pata_acpi
      CPU: 3 PID: 660 Comm: vmtoolsd Not tainted
      4.2.0-0.rc1.git3.1.fc23.x86_64 #1
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop
      Reference Platform, BIOS 6.00 05/20/2014
       0000000000000000 0000000049e617f3 ffff88006ac37ac8 ffffffff818641f5
       0000000000000000 ffff88006ac37b20 ffff88006ac37b08 ffffffff810ab446
       ffff880068009f40 ffffffff81c63bc0 0000000000000061 0000000000000000
      Call Trace:
       [<ffffffff818641f5>] dump_stack+0x4c/0x65
       [<ffffffff810ab446>] warn_slowpath_common+0x86/0xc0
       [<ffffffff810ab4d5>] warn_slowpath_fmt+0x55/0x70
       [<ffffffff8112551d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
       [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
       [<ffffffff810fa68d>] ? prepare_to_wait+0x2d/0x90
       [<ffffffff810da2bd>] __might_sleep+0x7d/0x90
       [<ffffffff812163b3>] __might_fault+0x43/0xa0
       [<ffffffff81430477>] copy_from_iter+0x87/0x2a0
       [<ffffffffa039460a>] __qp_memcpy_to_queue+0x9a/0x1b0 [vmw_vmci]
       [<ffffffffa0394740>] ? qp_memcpy_to_queue+0x20/0x20 [vmw_vmci]
       [<ffffffffa0394757>] qp_memcpy_to_queue_iov+0x17/0x20 [vmw_vmci]
       [<ffffffffa0394d50>] qp_enqueue_locked+0xa0/0x140 [vmw_vmci]
       [<ffffffffa039593f>] vmci_qpair_enquev+0x4f/0xd0 [vmw_vmci]
       [<ffffffffa04847bb>] vmci_transport_stream_enqueue+0x1b/0x20
      [vmw_vsock_vmci_transport]
       [<ffffffffa047ae05>] vsock_stream_sendmsg+0x2c5/0x320 [vsock]
       [<ffffffff810fabd0>] ? wake_atomic_t_function+0x70/0x70
       [<ffffffff81702af8>] sock_sendmsg+0x38/0x50
       [<ffffffff81702ff4>] SYSC_sendto+0x104/0x190
       [<ffffffff8126e25a>] ? vfs_read+0x8a/0x140
       [<ffffffff817042ee>] SyS_sendto+0xe/0x10
       [<ffffffff8186d9ae>] entry_SYSCALL_64_fastpath+0x12/0x76
      
      transport->stream_enqueue may call copy_to_user so it should
      not be called inside a prepare_to_wait. Narrow the scope of
      the prepare_to_wait to avoid the bad call. This also applies
      to vsock_stream_recvmsg.
      Reported-by: Vinson Lee <vlee@freedesktop.org>
      Tested-by: Vinson Lee <vlee@freedesktop.org>
      Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169:fix system hange problem. · a2cb7ec0
      Chun-Hao Lin committed
      There are typos in the RTL8168H hardware parameter settings. If another
      version of the driver is installed on the system, that may cause a
      system hang.
      Signed-off-by: Chunhao Lin <hau@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: fix memory leaks in ip_cmsg_send() callers · 91948309
      Eric Dumazet committed
      Dmitry reported memory leaks of IP options allocated in
      ip_cmsg_send() when/if this function returns an error.
      
      Callers are responsible for the freeing.
      
      Many thanks to Dmitry for the report and diagnostic.
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
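The ownership rule stated in the ip_cmsg_send() commit above (the callee may allocate and still return an error, and the caller must free) can be sketched in user space as follows. Every name here (demo_opts, demo_cmsg_parse, demo_send, live_allocs) is hypothetical and exists only for this illustration; none of it is the kernel's API.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for options allocated while parsing control data. */
struct demo_opts { int len; };

static int live_allocs; /* outstanding allocations, tracked for the demo */

static void demo_free(struct demo_opts *opts)
{
    if (opts) {
        free(opts);
        live_allocs--;
    }
}

/* Hypothetical parser: may allocate *optp and still fail afterwards. */
static int demo_cmsg_parse(struct demo_opts **optp, int fail_late)
{
    *optp = calloc(1, sizeof(**optp));
    if (!*optp)
        return -ENOMEM;
    live_allocs++;
    return fail_late ? -EINVAL : 0;
}

/* Caller: responsible for freeing on both the error and success paths. */
static int demo_send(int fail_late)
{
    struct demo_opts *opts = NULL;
    int err = demo_cmsg_parse(&opts, fail_late);

    if (err) {
        demo_free(opts); /* without this, the late-failure path leaks */
        return err;
    }
    /* ... opts would be used here ... */
    demo_free(opts);
    return 0;
}
```

Dropping the demo_free() call on the error path leaves live_allocs at 1 after a late failure, which is the shape of leak the commit describes.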
    • net: mvpp2: Return correct error codes · c2bb7bc5
      Amitoj Kaur Chawla committed
      When kzalloc fails to allocate memory, the function should return
      -ENOMEM and not -1.
      
      Found using Coccinelle. A simplified version of the semantic patch
      used is:
      
      //<smpl>
      @@
      expression *e;
      position p,q;
      @@
      
      e@q = kzalloc(...);
      if@p (e == NULL) {
      ...
      return
      - -1
      + -ENOMEM
      ;
      }
      //</smpl>
      
      This function may also return -1 after calling mvpp2_prs_tcam_port_map_get.
      So that the function consistently returns meaningful error values on
      failure, that -1 is changed to -EINVAL.
      Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: cavium: liquidio: Return correct error code · 08a965ec
      Amitoj Kaur Chawla committed
      When vmalloc fails to allocate memory, the function should return
      -ENOMEM and not -1.
      
      Found using Coccinelle. A simplified version of the semantic patch
      used is:
      
      //<smpl>
      @@
      expression *e;
      identifier l1;
      position p,q;
      @@
      
      e@q = vmalloc(...);
      if@p (e == NULL) {
      ...
      goto l1;
      }
      l1:
      ...
      return
      - -1
      + -ENOMEM
      ;
      //</smpl>
      
      The single call site of the containing function checks whether the
      returned value is -1, so this check is changed as well. The single call
      site of this call site, however, only checks whether the value is not 0,
      so no further change was required.
      Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
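The liquidio error-code commit above notes that a caller comparing against -1 had to be updated, while a caller testing for non-zero did not. A minimal user-space sketch of that point follows. The function names and the alloc_ok parameter are assumptions made for illustration and are not the driver's code.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical setup routine. 'alloc_ok' simulates whether the allocation
 * succeeded, so the failure path can be exercised deterministically.
 */
static int demo_setup(bool alloc_ok)
{
    if (!alloc_ok)
        return -ENOMEM; /* the old code would have returned -1 here */
    return 0;
}

/* Stale check: only recognises the old magic value. */
static bool failed_stale_check(int ret)
{
    return ret == -1;
}

/* Robust check: any non-zero return is treated as failure. */
static bool failed_nonzero_check(int ret)
{
    return ret != 0;
}
```

Once the callee returns -ENOMEM, the stale comparison silently treats the failure as success, which is why a call site written that way has to change together with the return value.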
    • bonding: Fix ARP monitor validation · 21a75f09
      Jay Vosburgh committed
      The current logic in bond_arp_rcv will accept an incoming ARP for
      validation if (a) the receiving slave is either "active" (which includes
      the currently active slave, or the current ARP slave) or, (b) there is a
      currently active slave, and it has received an ARP since it became active.
      For case (b), the receiving slave isn't the currently active slave, and is
      receiving the original broadcast ARP request, not an ARP reply from the
      target.
      
      	This logic can fail if there is no currently active slave.  In
      this situation, the ARP probe logic cycles through all slaves, assigning
      each in turn as the "current_arp_slave" for one arp_interval, then setting
      that one as "active," and sending an ARP probe from that slave.  The
      current logic expects the ARP reply to arrive on the sending
      current_arp_slave, however, due to switch FDB updating delays, the reply
      may be directed to another slave.
      
      	This can arise if the bonding slaves and switch are working, but
      the ARP target is not responding.  When the ARP target recovers, a
      condition may result wherein the ARP target host replies faster than the
      switch can update its forwarding table, causing each ARP reply to be sent
      to the previous current_arp_slave.  This will never pass the logic in
      bond_arp_rcv, as neither of the above conditions (a) or (b) are met.
      
      	Some experimentation on a LAN shows ARP reply round trips in the
      200 usec range, but my available switches never update their FDB in less
      than 4000 usec.
      
      	This patch changes the logic in bond_arp_rcv to additionally
      accept an ARP reply for validation on any slave if there is a current ARP
      slave and it sent an ARP probe during the previous arp_interval.
      
      Fixes: aeea64ac ("bonding: don't trust arp requests unless active slave really works")
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
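The acceptance conditions described in the bonding commit above can be written as a small predicate. This is a sketch that encodes only the prose of the commit message: the struct, its boolean fields and both function names are hypothetical, and the real bond_arp_rcv works on slave state and timestamps rather than precomputed flags.

```c
#include <stdbool.h>

/* Hypothetical summary of the state the commit message talks about. */
struct demo_arp_ctx {
    bool rx_slave_is_active;        /* (a) receiving slave is "active" */
    bool have_active_slave;         /* (b) there is a currently active slave */
    bool active_slave_got_arp;      /* (b) ...and it received an ARP since then */
    bool is_arp_reply;              /* incoming packet is an ARP reply */
    bool have_current_arp_slave;    /* new: a current ARP slave exists */
    bool arp_slave_probed_recently; /* new: it probed in the previous interval */
};

/* Conditions (a) and (b) as described for the old logic. */
static bool accept_old(const struct demo_arp_ctx *c)
{
    return c->rx_slave_is_active ||
           (c->have_active_slave && c->active_slave_got_arp);
}

/* Old conditions plus the additional case for ARP replies on any slave. */
static bool accept_new(const struct demo_arp_ctx *c)
{
    return accept_old(c) ||
           (c->is_arp_reply && c->have_current_arp_slave &&
            c->arp_slave_probed_recently);
}
```

In the failure scenario from the commit message (no active slave, reply delivered to a slave other than the current ARP slave), accept_old() rejects the reply while accept_new() accepts it.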
  3. 12 Feb, 2016 1 commit