1. 24 Sep, 2015 14 commits
    • D
      8139cp: Fix TSO/scatter-gather descriptor setup · a3b80404
      David Woodhouse committed
      When sending a TSO frame in multiple buffers, we were neglecting to set
      the first descriptor up in TSO mode.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a3b80404
    • D
      8139cp: Fix tx_queued debug message to print correct slot numbers · 26b0bad6
      David Woodhouse committed
      After a certain amount of staring at the debug output of this driver, I
      realised it was lying to me.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      26b0bad6
    • D
      8139cp: Do not re-enable RX interrupts in cp_tx_timeout() · aaa0062e
      David Woodhouse committed
      If an RX interrupt was already received but NAPI has not yet run when
      the RX timeout happens, we end up in cp_tx_timeout() with RX interrupts
      already disabled. Blindly re-enabling them will cause an IRQ storm.
      
      (This is made particularly horrid by the fact that cp_interrupt() always
      returns that it's handled the interrupt, even when it hasn't actually
      done anything. If it didn't do that, the core IRQ code would have
      detected the storm and handled it, I'd have had a clear smoking gun
      backtrace instead of just a spontaneously resetting router, and I'd have
      at *least* two days of my life back. Changing the return value of
      cp_interrupt() will be argued about under separate cover.)
      
      Unconditionally leave RX interrupts disabled after the reset, and
      schedule NAPI to check the receive ring and re-enable them.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aaa0062e
    • D
      Merge branch 'netcp-fixes' · 3c6cb3ac
      David S. Miller committed
      Murali Karicheri says:
      
      ====================
      net: netcp: a set of bug fixes
      
      This patch series fixes a set of issues in netcp driver seen during internal
      testing of the driver. While at it, do some clean up as well.
      
      The fixes are tested on K2HK, K2L and K2E EVMs and the boot up logs can be
      seen at
      
       http://pastebin.ubuntu.com/12533100/
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c6cb3ac
    • K
      net: netcp: fix deadlock reported by lockup detector · 8ceaf361
      Karicheri, Muralidharan committed
      A deadlock trace is seen in netcp driver with lockup detector enabled.
      The trace log is provided below for reference. This patch fixes the
      bug by removing the usage of netcp_modules_lock within ndo_ops functions.
      ndo_{open/close/ioctl}() is already called with rtnl_lock held. So there
      is no need to hold another mutex for serialization across processes on
      multiple cores.  So remove use of netcp_modules_lock mutex from these
      ndo ops functions.
      
      ndo_set_rx_mode() shouldn't be using a mutex as it is called from atomic
      context. In the case of ndo_set_rx_mode(), there can be call to this API
      without rtnl_lock held from an atomic context. As the underlying modules
      are expected to add address to a hardware table, it is to be protected
      across concurrent updates and hence a spin lock is used to synchronize
      the access. Same with ndo_vlan_rx_add_vid() & ndo_vlan_rx_kill_vid().
      
      Probably the netcp_modules_lock is used to protect the module not being
      removed as part of rmmod. Currently this is not fully implemented and
      assumes the interface is brought down before doing rmmod of modules.
      The support for rmmod while interface is up is expected in a future
      patch set when additional modules such as pa, qos are added. For now
      all of the tests such as if up/down, reboot, iperf work fine with this
      patch applied.
      
      Deadlock trace seen with lockup detector enabled is shown below for
      reference.
      
      [   16.863014] ======================================================
      [   16.869183] [ INFO: possible circular locking dependency detected ]
      [   16.875441] 4.1.6-01265-gfb1e101 #1 Tainted: G        W
      [   16.881176] -------------------------------------------------------
      [   16.887432] ifconfig/1662 is trying to acquire lock:
      [   16.892386]  (netcp_modules_lock){+.+.+.}, at: [<c03e8110>]
      netcp_ndo_open+0x168/0x518
      [   16.900321]
      [   16.900321] but task is already holding lock:
      [   16.906144]  (rtnl_mutex){+.+.+.}, at: [<c053a418>] devinet_ioctl+0xf8/0x7e4
      [   16.913206]
      [   16.913206] which lock already depends on the new lock.
      [   16.913206]
      [   16.921372]
      [   16.921372] the existing dependency chain (in reverse order) is:
      [   16.928844]
      -> #1 (rtnl_mutex){+.+.+.}:
      [   16.932865]        [<c06023f0>] mutex_lock_nested+0x68/0x4a8
      [   16.938521]        [<c04c5758>] register_netdev+0xc/0x24
      [   16.943831]        [<c03e65c0>] netcp_module_probe+0x214/0x2ec
      [   16.949660]        [<c03e8a54>] netcp_register_module+0xd4/0x140
      [   16.955663]        [<c089654c>] keystone_gbe_init+0x10/0x28
      [   16.961233]        [<c000977c>] do_one_initcall+0xb8/0x1f8
      [   16.966714]        [<c0867e04>] kernel_init_freeable+0x148/0x1e8
      [   16.972720]        [<c05f9994>] kernel_init+0xc/0xe8
      [   16.977682]        [<c0010038>] ret_from_fork+0x14/0x3c
      [   16.982905]
      -> #0 (netcp_modules_lock){+.+.+.}:
      [   16.987619]        [<c006eab0>] lock_acquire+0x118/0x320
      [   16.992928]        [<c06023f0>] mutex_lock_nested+0x68/0x4a8
      [   16.998582]        [<c03e8110>] netcp_ndo_open+0x168/0x518
      [   17.004064]        [<c04c48f0>] __dev_open+0xa8/0x10c
      [   17.009112]        [<c04c4b74>] __dev_change_flags+0x94/0x144
      [   17.014853]        [<c04c4c3c>] dev_change_flags+0x18/0x48
      [   17.020334]        [<c053a9fc>] devinet_ioctl+0x6dc/0x7e4
      [   17.025729]        [<c04a59ec>] sock_ioctl+0x1d0/0x2a8
      [   17.030865]        [<c0142844>] do_vfs_ioctl+0x41c/0x688
      [   17.036173]        [<c0142ae4>] SyS_ioctl+0x34/0x5c
      [   17.041046]        [<c000ff60>] ret_fast_syscall+0x0/0x54
      [   17.046441]
      [   17.046441] other info that might help us debug this:
      [   17.046441]
      [   17.054434]  Possible unsafe locking scenario:
      [   17.054434]
      [   17.060343]        CPU0                    CPU1
      [   17.064862]        ----                    ----
      [   17.069381]   lock(rtnl_mutex);
      [   17.072522]                                lock(netcp_modules_lock);
      [   17.078875]                                lock(rtnl_mutex);
      [   17.084532]   lock(netcp_modules_lock);
      [   17.088366]
      [   17.088366]  *** DEADLOCK ***
      [   17.088366]
      [   17.094279] 1 lock held by ifconfig/1662:
      [   17.098278]  #0:  (rtnl_mutex){+.+.+.}, at: [<c053a418>]
      devinet_ioctl+0xf8/0x7e4
      [   17.105774]
      [   17.105774] stack backtrace:
      [   17.110124] CPU: 1 PID: 1662 Comm: ifconfig Tainted: G        W
      4.1.6-01265-gfb1e101 #1
      [   17.118637] Hardware name: Keystone
      [   17.122123] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
      (show_stack+0x10/0x14)
      [   17.129862] [<c0013cbc>] (show_stack) from [<c05ff450>]
      (dump_stack+0x84/0xc4)
      [   17.137079] [<c05ff450>] (dump_stack) from [<c0068e34>]
      (print_circular_bug+0x210/0x330)
      [   17.145161] [<c0068e34>] (print_circular_bug) from [<c006ab7c>]
      (validate_chain.isra.35+0xf98/0x13ac)
      [   17.154372] [<c006ab7c>] (validate_chain.isra.35) from [<c006da60>]
      (__lock_acquire+0x52c/0xcc0)
      [   17.163149] [<c006da60>] (__lock_acquire) from [<c006eab0>]
      (lock_acquire+0x118/0x320)
      [   17.171058] [<c006eab0>] (lock_acquire) from [<c06023f0>]
      (mutex_lock_nested+0x68/0x4a8)
      [   17.179140] [<c06023f0>] (mutex_lock_nested) from [<c03e8110>]
      (netcp_ndo_open+0x168/0x518)
      [   17.187484] [<c03e8110>] (netcp_ndo_open) from [<c04c48f0>]
      (__dev_open+0xa8/0x10c)
      [   17.195133] [<c04c48f0>] (__dev_open) from [<c04c4b74>]
      (__dev_change_flags+0x94/0x144)
      [   17.203129] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
      (dev_change_flags+0x18/0x48)
      [   17.211560] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
      (devinet_ioctl+0x6dc/0x7e4)
      [   17.219729] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
      (sock_ioctl+0x1d0/0x2a8)
      [   17.227378] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
      (do_vfs_ioctl+0x41c/0x688)
      [   17.234939] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
      (SyS_ioctl+0x34/0x5c)
      [   17.242242] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
      (ret_fast_syscall+0x0/0x54)
      [   17.258855] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
      control off
      [   17.271282] BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:616
      [   17.279712] in_atomic(): 1, irqs_disabled(): 0, pid: 1662, name: ifconfig
      [   17.286500] INFO: lockdep is turned off.
      [   17.290413] Preemption disabled at:[<  (null)>]   (null)
      [   17.295728]
      [   17.297214] CPU: 1 PID: 1662 Comm: ifconfig Tainted: G        W
      4.1.6-01265-gfb1e101 #1
      [   17.305735] Hardware name: Keystone
      [   17.309223] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
      (show_stack+0x10/0x14)
      [   17.316970] [<c0013cbc>] (show_stack) from [<c05ff450>]
      (dump_stack+0x84/0xc4)
      [   17.324194] [<c05ff450>] (dump_stack) from [<c06023b0>]
      (mutex_lock_nested+0x28/0x4a8)
      [   17.332112] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
      (netcp_set_rx_mode+0x160/0x210)
      [   17.340724] [<c03e9840>] (netcp_set_rx_mode) from [<c04c483c>]
      (dev_set_rx_mode+0x1c/0x28)
      [   17.348982] [<c04c483c>] (dev_set_rx_mode) from [<c04c490c>]
      (__dev_open+0xc4/0x10c)
      [   17.356724] [<c04c490c>] (__dev_open) from [<c04c4b74>]
      (__dev_change_flags+0x94/0x144)
      [   17.364729] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
      (dev_change_flags+0x18/0x48)
      [   17.373166] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
      (devinet_ioctl+0x6dc/0x7e4)
      [   17.381344] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
      (sock_ioctl+0x1d0/0x2a8)
      [   17.388994] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
      (do_vfs_ioctl+0x41c/0x688)
      [   17.396563] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
      (SyS_ioctl+0x34/0x5c)
      [   17.403873] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
      (ret_fast_syscall+0x0/0x54)
      [   17.413772] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
      udhcpc (v1.20.2) started
      Sending discover...
      [   18.690666] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
      control off
      Sending discover...
      [   22.250972] netcp-1.0 2620110.netcp eth0: Link is Up - 1Gbps/Full - flow
      control off
      [   22.258721] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
      [   22.265458] BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:616
      [   22.273896] in_atomic(): 1, irqs_disabled(): 0, pid: 342, name: kworker/1:1
      [   22.280854] INFO: lockdep is turned off.
      [   22.284767] Preemption disabled at:[<  (null)>]   (null)
      [   22.290074]
      [   22.291568] CPU: 1 PID: 342 Comm: kworker/1:1 Tainted: G        W
      4.1.6-01265-gfb1e101 #1
      [   22.300255] Hardware name: Keystone
      [   22.303750] Workqueue: ipv6_addrconf addrconf_dad_work
      [   22.308895] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
      (show_stack+0x10/0x14)
      [   22.316643] [<c0013cbc>] (show_stack) from [<c05ff450>]
      (dump_stack+0x84/0xc4)
      [   22.323867] [<c05ff450>] (dump_stack) from [<c06023b0>]
      (mutex_lock_nested+0x28/0x4a8)
      [   22.331786] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
      (netcp_set_rx_mode+0x160/0x210)
      [   22.340394] [<c03e9840>] (netcp_set_rx_mode) from [<c04c9d18>]
      (__dev_mc_add+0x54/0x68)
      [   22.348401] [<c04c9d18>] (__dev_mc_add) from [<c05ab358>]
      (igmp6_group_added+0x168/0x1b4)
      [   22.356580] [<c05ab358>] (igmp6_group_added) from [<c05ad2cc>]
      (ipv6_dev_mc_inc+0x4f0/0x5a8)
      [   22.365019] [<c05ad2cc>] (ipv6_dev_mc_inc) from [<c058f0d0>]
      (addrconf_dad_work+0x21c/0x33c)
      [   22.373460] [<c058f0d0>] (addrconf_dad_work) from [<c0042850>]
      (process_one_work+0x214/0x8d0)
      [   22.381986] [<c0042850>] (process_one_work) from [<c0042f54>]
      (worker_thread+0x48/0x4bc)
      [   22.390071] [<c0042f54>] (worker_thread) from [<c004868c>]
      (kthread+0xf0/0x108)
      [   22.397381] [<c004868c>] (kthread) from [<c0010038>]
      
      Trace related to incorrect usage of mutex inside ndo_set_rx_mode
      
      [   24.086066] BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:616
      [   24.094506] in_atomic(): 1, irqs_disabled(): 0, pid: 1682, name: ifconfig
      [   24.101291] INFO: lockdep is turned off.
      [   24.105203] Preemption disabled at:[<  (null)>]   (null)
      [   24.110511]
      [   24.112005] CPU: 2 PID: 1682 Comm: ifconfig Tainted: G        W
      4.1.6-01265-gfb1e101 #1
      [   24.120518] Hardware name: Keystone
      [   24.124018] [<c00178e4>] (unwind_backtrace) from [<c0013cbc>]
      (show_stack+0x10/0x14)
      [   24.131772] [<c0013cbc>] (show_stack) from [<c05ff450>]
      (dump_stack+0x84/0xc4)
      [   24.138989] [<c05ff450>] (dump_stack) from [<c06023b0>]
      (mutex_lock_nested+0x28/0x4a8)
      [   24.146908] [<c06023b0>] (mutex_lock_nested) from [<c03e9840>]
      (netcp_set_rx_mode+0x160/0x210)
      [   24.155523] [<c03e9840>] (netcp_set_rx_mode) from [<c04c483c>]
      (dev_set_rx_mode+0x1c/0x28)
      [   24.163787] [<c04c483c>] (dev_set_rx_mode) from [<c04c490c>]
      (__dev_open+0xc4/0x10c)
      [   24.171531] [<c04c490c>] (__dev_open) from [<c04c4b74>]
      (__dev_change_flags+0x94/0x144)
      [   24.179528] [<c04c4b74>] (__dev_change_flags) from [<c04c4c3c>]
      (dev_change_flags+0x18/0x48)
      [   24.187966] [<c04c4c3c>] (dev_change_flags) from [<c053a9fc>]
      (devinet_ioctl+0x6dc/0x7e4)
      [   24.196145] [<c053a9fc>] (devinet_ioctl) from [<c04a59ec>]
      (sock_ioctl+0x1d0/0x2a8)
      [   24.203803] [<c04a59ec>] (sock_ioctl) from [<c0142844>]
      (do_vfs_ioctl+0x41c/0x688)
      [   24.211373] [<c0142844>] (do_vfs_ioctl) from [<c0142ae4>]
      (SyS_ioctl+0x34/0x5c)
      [   24.218676] [<c0142ae4>] (SyS_ioctl) from [<c000ff60>]
      (ret_fast_syscall+0x0/0x54)
      [   24.227156] IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8ceaf361
    • K
      net: netcp: allocate buffers to desc before re-enable interrupt · 99f8ef5d
      Karicheri, Muralidharan committed
      Currently netcp_rxpool_refill(), which refills descriptors and attached
      buffers to the fdq, is called while the interrupt is enabled as part of
      NAPI poll. Doing it while the interrupt is disabled could be beneficial,
      as the hardware will not be starved when the CPU is busy processing interrupts.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99f8ef5d
    • K
      net: netcp: check for interface handle in netcp_module_probe() · 915c5857
      Karicheri, Muralidharan committed
      Currently netcp_module_probe() doesn't check the return value of
      of_parse_phandle(), which points to the interface data for the
      module, and then passes the node ptr to the module, which is incorrect.
      Check the return value and free the intf_modpriv if there is an error.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      915c5857
    • K
      net: netcp: add error check to netcp_allocate_rx_buf() · e558b1fb
      Karicheri, Muralidharan committed
      Currently, if netcp_allocate_rx_buf() fails due to no descriptors
      in the rx free descriptor queue, inside the netcp_rxpool_refill() function
      the iterative loop to fill buffers doesn't terminate right away. So modify
      netcp_allocate_rx_buf() to return an error code and use it to break the
      loop when there is an error.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e558b1fb
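The loop termination described in the netcp_allocate_rx_buf() commit above can be illustrated outside the driver. This is a minimal sketch, not the driver code: `struct rx_pool`, `allocate_rx_buf()` and `rxpool_refill()` are hypothetical stand-ins, and the pool is modelled as a simple counter of free descriptors.

```c
#include <errno.h>

/* Hypothetical model of a receive pool: a count of free descriptors
 * and a count of buffers that have been attached so far. */
struct rx_pool {
    int free_descs;
    int filled;
};

/* Returns 0 on success or a negative errno when no descriptor is left,
 * so that the caller can stop instead of retrying pointlessly. */
static int allocate_rx_buf(struct rx_pool *pool)
{
    if (pool->free_descs == 0)
        return -ENOMEM;
    pool->free_descs--;
    pool->filled++;
    return 0;
}

/* Refill up to `wanted` buffers; break out on the first failure.
 * Returns the number of buffers actually attached. */
static int rxpool_refill(struct rx_pool *pool, int wanted)
{
    int i;

    for (i = 0; i < wanted; i++) {
        if (allocate_rx_buf(pool) < 0)
            break;
    }
    return i;
}
```

With a void allocator the loop would keep iterating after the pool ran dry; returning an error code lets the refill stop at the first failed allocation.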
    • K
      net: netcp: move netcp_register_interface() to after attach module · 736532a0
      Karicheri, Muralidharan committed
      The netcp interface is not fully initialized before the modules are
      attached to the interface. For example, the tx pipe/rx pipe is initialized
      in ethss module as part of attach(). So until this is complete, the
      interface can't be registered.  So move registration of interface to
      net device outside the current loop that attaches the modules to the
      interface.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      736532a0
    • K
      net: netcp: remove dead code from the driver · 156e3c21
      Karicheri, Muralidharan committed
      netcp_core is the first driver that will get initialized and the modules
      (ethss, pa etc) will then get initialized. So the code at the end of
      netcp_probe() that iterates over the modules is dead code, as the module
      list will always be empty. So remove this code.
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      156e3c21
    • W
      net: netcp: ethss: fix error in calling sgmii api with incorrect offset · 8c85151d
      WingMan Kwok committed
      On K2HK, sgmii module registers of slave 0 and 1 are mem
      mapped to one contiguous block, while those of slave 2
      and 3 are mapped to another contiguous block.  However,
      on K2E and K2L, sgmii module registers of all slaves are
      mem mapped to one contiguous block.  SGMII APIs expect
      slave 0 sgmii base when API is invoked for slave 0 and 1,
      and slave 2 sgmii base when invoked for other slaves.
      Before this patch, slave 0 sgmii base is always passed to
      sgmii API for K2E regardless of which slave the API is invoked
      for.  This patch fixes the problem.
      Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8c85151d
    • D
      Fix AF_PACKET ABI breakage in 4.2 · d3869efe
      David Woodhouse committed
      Commit 7d824109 ("virtio: add explicit big-endian support to memory
      accessors") accidentally changed the virtio_net header used by
      AF_PACKET with PACKET_VNET_HDR from host-endian to big-endian.
      
      Since virtio_legacy_is_little_endian() is a very long identifier,
      define a vio_le macro and use that throughout the code instead of the
      hard-coded 'false' for little-endian.
      
      This restores the ABI to match 4.1 and earlier kernels, and makes my
      test program work again.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3869efe
    • N
      netpoll: Close race condition between poll_one_napi and napi_disable · 2d8bff12
      Neil Horman committed
      Drivers might call napi_disable while not holding the napi instance poll_lock.
      In those instances, it's possible for a race condition to exist between
      poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
      NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
      the following may happen:
      
      CPU0				CPU1
      ndo_tx_timeout			napi_poll_dev
       napi_disable			 poll_one_napi
        test_and_set_bit (ret 0)
      				  test_bit (ret 1)
         reset adapter		   napi_poll_routine
      
      If the adapter gets a tx timeout without a napi instance scheduled, it's possible
      for the adapter to think it has exclusive access to the hardware (as the napi
      instance is now scheduled via the napi_disable call), while the netpoll code
      thinks there is simply work to do.  The result is parallel hardware access
      leading to corrupt data structures in the driver, and a crash.
      
      Additionally, there is another, more critical race between netpoll and
      napi_disable.  The disabled napi state is actually identical to the scheduled
      state for a given napi instance.  The implication being that, if a napi instance
      is disabled, a netconsole instance would see the napi state of the device as
      having been scheduled, and poll it, likely while the driver was doing something
      requiring exclusive access.  In the case above, it's fairly clear that not having
      the rings in a state ready to be polled will cause any number of crashes.
      
      The fix should be pretty easy.  netpoll uses its own bit to indicate that
      the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
      We can just gate disabling on that bit as well as the sched bit.  That should
      prevent netpoll from conducting a napi poll if we convert its set bit to a
      test_and_set_bit operation to provide mutual exclusion.
      
      Change notes:
      V2)
      	Remove a trailing whitespace
      	Resubmit with proper subject prefix
      
      V3)
      	Clean up spacing nits
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: jmaxwell@redhat.com
      Tested-by: jmaxwell@redhat.com
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d8bff12
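The mutual exclusion described in the netpoll commit above can be sketched in user space with C11 atomics. This is an illustration under stated assumptions rather than the kernel code: the bit values, `struct napi_sketch`, `try_disable()` and `netpoll_try_poll()` are hypothetical, and the disable side makes a single non-blocking attempt where the kernel would wait.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's NAPI_STATE_* numbering may differ. */
enum {
    STATE_SCHED = 1u << 0,  /* instance is scheduled (or disabled) */
    STATE_NPSVC = 1u << 1   /* instance is being serviced by netpoll */
};

struct napi_sketch {
    atomic_uint state;
};

/* Atomically set `bit` and report whether it was already set. */
static bool test_and_set(struct napi_sketch *n, unsigned int bit)
{
    return (atomic_fetch_or(&n->state, bit) & bit) != 0;
}

static void clear_state_bit(struct napi_sketch *n, unsigned int bit)
{
    atomic_fetch_and(&n->state, ~bit);
}

/* Disable side: must own both bits before touching the hardware.
 * One attempt only; on failure the SCHED bit is handed back. */
static bool try_disable(struct napi_sketch *n)
{
    if (test_and_set(n, STATE_SCHED))
        return false;
    if (test_and_set(n, STATE_NPSVC)) {
        clear_state_bit(n, STATE_SCHED);
        return false;
    }
    return true;
}

/* netpoll side: poll only if work is scheduled AND the service bit
 * was free, so a disabled instance is never polled. */
static bool netpoll_try_poll(struct napi_sketch *n)
{
    if (!(atomic_load(&n->state) & STATE_SCHED))
        return false;               /* nothing scheduled */
    if (test_and_set(n, STATE_NPSVC))
        return false;               /* disabled or already being serviced */
    /* ... the driver's poll routine would run here ... */
    clear_state_bit(n, STATE_NPSVC);
    return true;
}
```

Because the disabled state and the scheduled state share the SCHED bit, the service bit is what lets the poller tell the two apart in this sketch.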
    • E
      tcp: add proper TS val into RST packets · 675ee231
      Eric Dumazet committed
      RST packets sent on behalf of TCP connections with TS option (RFC 7323
      TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.
      
      A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100
      ecr 0], length 0
      B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss
      1460,nop,nop,TS val 7264344 ecr 100], length 0
      A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr
      7264344], length 0
      
      B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0
      ecr 110], length 0
      
      We need to call skb_mstamp_get() to get proper TS val,
      derived from skb->skb_mstamp
      
      Note that RFC 1323 was advocating to not send TS option in RST segment,
      but RFC 7323 recommends the opposite :
      
        Once TSopt has been successfully negotiated, that is both <SYN> and
        <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
        segment for the duration of the connection, and SHOULD be sent in an
        <RST> segment (see Section 5.2 for details)
      
      Note this RFC recommends to send TS val = 0, but we believe it is
      premature : We do not know if all TCP stacks are properly
      handling the receive side :
      
         When an <RST> segment is
         received, it MUST NOT be subjected to the PAWS check by verifying an
         acceptable value in SEG.TSval, and information from the Timestamps
         option MUST NOT be used to update connection state information.
         SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.
      
      In 5 years, if/when all TCP stacks are RFC 7323 ready, we might consider
      sending TS val = 0, if it buys something.
      
      Fixes: 7faee5c0 ("tcp: remove TCP_SKB_CB(skb)->when")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      675ee231
  2. 23 Sep, 2015 6 commits
    • N
      net: dsa: Fix Marvell Egress Trailer check · fbd03513
      Neil Armstrong committed
      The Marvell Egress rx trailer check must be fixed to
      correctly detect bad bits in the third byte of the
      Egress trailer as described in Table 28 of the
      88E6060 datasheet.
      The current code incorrectly omits to check the third
      byte and checks the fourth byte twice.
      Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fbd03513
    • D
      lib: fix data race in rhashtable_rehash_one · 7def0f95
      Dmitriy Vyukov committed
      rhashtable_rehash_one() uses complex logic to update entry->next field,
      after INIT_RHT_NULLS_HEAD and NULLS_MARKER expansion:
      
      entry->next = 1 | ((base + off) << 1)
      
      This can be compiled along the lines of:
      
      entry->next = base + off
      entry->next <<= 1
      entry->next |= 1
      
      Which will break concurrent readers.
      
      NULLS value recomputation is not needed here, so just remove
      the complex logic.
      
      The data race was found with KernelThreadSanitizer (KTSAN).
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7def0f95
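The hazard in the rhashtable commit above is that a value built up in place can be observed half-finished by a concurrent reader. The patch itself simply drops the recomputation; the sketch below shows a different, general remedy for the same hazard, namely computing the value in a local and publishing it with one store. The function names are hypothetical and C11 atomics stand in for the kernel's own primitives.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Encode an end-of-list marker: low bit set, payload in the upper bits.
 * This mirrors the expression quoted in the commit message. */
static uintptr_t nulls_marker(uintptr_t value)
{
    return (uintptr_t)1 | (value << 1);
}

/* Compute the full value first, then publish it with a single store,
 * so a reader sees either the old or the new value and never an
 * intermediate such as (base + off) without the shift and the low bit. */
static void publish_next(_Atomic uintptr_t *next, uintptr_t base, uintptr_t off)
{
    uintptr_t v = nulls_marker(base + off);

    atomic_store_explicit(next, v, memory_order_release);
}
```

A plain assignment gives the compiler freedom to split the update into several writes to the shared field; an atomic store of a precomputed local removes that freedom.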
    • T
      ch9200: Convert to use module_usb_driver · 23eedbc2
      Tobias Klauser committed
      Converts the ch9200 driver to use the module_usb_driver() macro which
      makes the code smaller and a bit simpler.
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23eedbc2
    • J
      openvswitch: Zero flows on allocation. · ae5f2fb1
      Jesse Gross committed
      When support for megaflows was introduced, OVS needed to start
      installing flows with a mask applied to them. Since masking is an
      expensive operation, OVS also had an optimization that would only
      take the parts of the flow keys that were covered by a non-zero
      mask. The values stored in the remaining pieces should not matter
      because they are masked out.
      
      While this works fine for the purposes of matching (which must always
      look at the mask), serialization to netlink can be problematic. Since
      the flow and the mask are serialized separately, the uninitialized
      portions of the flow can be encoded with whatever values happen to be
      present.
      
      In terms of functionality, this has little effect since these fields
      will be masked out by definition. However, it leaks kernel memory to
      userspace, which is a potential security vulnerability. It is also
      possible that other code paths could look at the masked key and get
      uninitialized data, although this does not currently appear to be an
      issue in practice.
      
      This removes the mask optimization for flows that are being installed.
      This was always intended to be the case as the mask optimizations were
      really targeting per-packet flow operations.
      
      Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
      Signed-off-by: Jesse Gross <jesse@nicira.com>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ae5f2fb1
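The difference between the optimized and the full masking described in the openvswitch commit above can be shown with a small model. This is a sketch under assumptions, not the OVS implementation: `struct flow_key`, `KEY_LEN`, `mask_range()` and `mask_full()` are hypothetical, and a key is modelled as a flat byte array.

```c
#include <stddef.h>

#define KEY_LEN 16  /* illustrative key size, not the real sw_flow_key size */

struct flow_key {
    unsigned char bytes[KEY_LEN];
};

/* Optimized variant: writes only bytes in [start, end). Bytes of dst
 * outside that range keep whatever they held before the call. */
static void mask_range(struct flow_key *dst, const struct flow_key *src,
                       const struct flow_key *mask, size_t start, size_t end)
{
    for (size_t i = start; i < end; i++)
        dst->bytes[i] = src->bytes[i] & mask->bytes[i];
}

/* Full variant: writes every byte, so masked-out bytes become zero
 * and nothing stale can be serialized later. */
static void mask_full(struct flow_key *dst, const struct flow_key *src,
                      const struct flow_key *mask)
{
    mask_range(dst, src, mask, 0, KEY_LEN);
}
```

The range variant is cheaper per packet, but only the full variant guarantees that a key copied out to userspace contains no leftover memory contents.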
    • R
      net: dsa: actually force the speed on the CPU port · 53adc9e8
      Russell King committed
      Commit 54d792f2 ("net: dsa: Centralise global and port setup
      code into mv88e6xxx.") merged in the 4.2 merge window broke the link
      speed forcing for the CPU port of Marvell DSA switches.  The original
      code was:
      
              /* MAC Forcing register: don't force link, speed, duplex
               * or flow control state to any particular values on physical
               * ports, but force the CPU port and all DSA ports to 1000 Mb/s
               * full duplex.
               */
              if (dsa_is_cpu_port(ds, p) || ds->dsa_port_mask & (1 << p))
                      REG_WRITE(addr, 0x01, 0x003e);
              else
                      REG_WRITE(addr, 0x01, 0x0003);
      
      but the new code does a read-modify-write:
      
                      reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
                      if (dsa_is_cpu_port(ds, port) ||
                          ds->dsa_port_mask & (1 << port)) {
                              reg |= PORT_PCS_CTRL_FORCE_LINK |
                                      PORT_PCS_CTRL_LINK_UP |
                                      PORT_PCS_CTRL_DUPLEX_FULL |
                                      PORT_PCS_CTRL_FORCE_DUPLEX;
                              if (mv88e6xxx_6065_family(ds))
                                      reg |= PORT_PCS_CTRL_100;
                              else
                                      reg |= PORT_PCS_CTRL_1000;
      
      The link speed in the PCS control register is a two bit field.  Forcing
      the link speed in this way doesn't ensure that the bit field is set to
      the correct value - on the hardware I have here, the speed bitfield
      remains set to 0x03, resulting in the speed not being forced to gigabit.
      
      We must clear both bits before forcing the link speed.
      
      Fixes: 54d792f2 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53adc9e8
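The read-modify-write pitfall in the mv88e6xxx commit above can be reproduced outside the kernel. The following is a minimal sketch, assuming a two-bit speed field in the low bits of a 16-bit register; the macro names and values are illustrative and are not taken from the driver.

```c
#include <stdint.h>

/* Assumed layout for illustration only: speed field in bits 1:0. */
#define SPEED_MASK 0x0003u
#define SPEED_100  0x0001u
#define SPEED_1000 0x0002u

/* Buggy variant: OR-ing into a multi-bit field cannot clear bits that
 * are already set, so 0x3 stays 0x3 whatever speed is requested. */
static uint16_t force_speed_buggy(uint16_t reg, uint16_t speed)
{
    return (uint16_t)(reg | speed);
}

/* Fixed variant: clear the whole field first, then set the new value. */
static uint16_t force_speed(uint16_t reg, uint16_t speed)
{
    reg &= (uint16_t)~SPEED_MASK;
    reg |= (uint16_t)(speed & SPEED_MASK);
    return reg;
}
```

Starting from a register whose speed field reads 0x3, the buggy variant leaves the field at 0x3 while the fixed variant yields the requested 0x2, which matches the behaviour the commit message reports on real hardware.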
    • J
      geneve: ensure ECN info is handled properly in all tx/rx paths · 08399efc
      John W. Linville committed
      Partially due to a pre-existing "thinko", the new metadata-based tx/rx
      paths were handling ECN propagation differently than the traditional
      tx/rx paths.  This patch removes the "thinko" (involving multiple
      ip_hdr assignments) on the rx path and corrects the ECN handling on
      both the rx and tx paths.
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Reviewed-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      08399efc
  3. 22 Sep, 2015 14 commits
  4. 21 Sep, 2015 6 commits
    • H
      netlink: Fix autobind race condition that leads to zero port ID · 1f770c0a
      Herbert Xu committed
      The commit c0bb07df ("netlink:
      Reset portid after netlink_insert failure") introduced a race
      condition where if two threads try to autobind the same socket
      one of them may end up with a zero port ID.  This led to kernel
      deadlocks that were observed by multiple people.
      
      This patch reverts that commit and instead fixes it by introducing
      a separate rhash_portid variable so that the real portid is only set
      after the socket has been successfully hashed.
      
      Fixes: c0bb07df ("netlink: Reset portid after netlink_insert failure")
      Reported-by: Tejun Heo <tj@kernel.org>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1f770c0a
    • M
      macvtap: fix TUNSETSNDBUF values > 64k · 3ea79249
      Michael S. Tsirkin committed
      Upon TUNSETSNDBUF, macvtap reads the requested sndbuf size into
      a local variable u.
      Commit 39ec7de7 ("macvtap: fix uninitialized access on
      TUNSETIFF") changed its type to u16 (which is the right thing to
      do for all other macvtap ioctls), breaking all values > 64k.
      
      The value of TUNSETSNDBUF is actually a signed 32-bit integer, so
      the right thing to do is to read it into an int.
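
A simplified userspace model of the truncation, for illustration only: the helpers `sndbuf_via_u16()` and `sndbuf_via_int()` are hypothetical and stand in for reading the requested size through a 16-bit temporary versus an int. The kernel reads the value from user memory rather than by a cast, so this only demonstrates why values above 64k cannot survive a u16.

```c
#include <stdint.h>

/* Hypothetical helper: models passing the requested size through a
 * 16-bit variable, which keeps only the low 16 bits. */
int sndbuf_via_u16(int requested)
{
    uint16_t u = (uint16_t)requested;
    return u;
}

/* Hypothetical helper: models reading the value into an int,
 * which preserves the full signed 32-bit range. */
int sndbuf_via_int(int requested)
{
    int s = requested;
    return s;
}
```

With these helpers, a request of 70000 comes back as 70000 - 65536 = 4464 through the 16-bit path and unchanged through the int path, while small values such as 1024 are identical on both paths, which is why the regression only showed up for large buffers.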
      
      Cc: David S. Miller <davem@davemloft.net>
      Fixes: 39ec7de7 ("macvtap: fix uninitialized access on TUNSETIFF")
      Reported-by: Mark A. Peloquin
      Bisected-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
      Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3ea79249
    • N
      ip6tunnel: make rx/tx bytes counters consistent · 83cf9a25
      Nicolas Dichtel committed
      Like the previous patch, which fixes ipv4 tunnels, here is the ipv6 part.
      
      Before the patch, the external ipv6 header + gre header were included on
      tx.
      
      After the patch:
      $ ping -c1 192.168.6.121 ; ip -s l ls dev ip6gre1
      PING 192.168.6.121 (192.168.6.121) 56(84) bytes of data.
      64 bytes from 192.168.6.121: icmp_req=1 ttl=64 time=1.92 ms
      
      --- 192.168.6.121 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 1.923/1.923/1.923/0.000 ms
      7: ip6gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/gre6 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:23 peer 20:01:06:60:30:08:c1:c3:00:00:00:00:00:00:01:21
          RX: bytes  packets  errors  dropped overrun mcast
          84         1        0       0       0       0
          TX: bytes  packets  errors  dropped carrier collsns
          84         1        0       0       0       0
      $ ping -c1 192.168.1.121 ; ip -s l ls dev ip6tnl1
      PING 192.168.1.121 (192.168.1.121) 56(84) bytes of data.
      64 bytes from 192.168.1.121: icmp_req=1 ttl=64 time=2.28 ms
      
      --- 192.168.1.121 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 2.288/2.288/2.288/0.000 ms
      8: ip6tnl1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/tunnel6 2001:660:3008:c1c3::123 peer 2001:660:3008:c1c3::121
          RX: bytes  packets  errors  dropped overrun mcast
          84         1        0       0       0       0
          TX: bytes  packets  errors  dropped carrier collsns
          84         1        0       0       0       0
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      83cf9a25
    • N
      iptunnel: make rx/tx bytes counters consistent · bc22a0e2
      Nicolas Dichtel committed
      This was already done a long time ago in
      commit 64194c31 ("inet: Make tunnel RX/TX byte counters more consistent")
      but tx path was broken (at least since 3.10).
      
      Before the patch the gre header was included on tx.
      
      After the patch:
      $ ping -c1 192.168.0.121 ; ip -s l ls dev gre1
      PING 192.168.0.121 (192.168.0.121) 56(84) bytes of data.
      64 bytes from 192.168.0.121: icmp_req=1 ttl=64 time=2.95 ms
      
      --- 192.168.0.121 ping statistics ---
      1 packets transmitted, 1 received, 0% packet loss, time 0ms
      rtt min/avg/max/mdev = 2.955/2.955/2.955/0.000 ms
      7: gre1@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1468 qdisc noqueue state UNKNOWN mode DEFAULT group default
          link/gre 10.16.0.249 peer 10.16.0.121
          RX: bytes  packets  errors  dropped overrun mcast
          84         1        0       0       0       0
          TX: bytes  packets  errors  dropped carrier collsns
          84         1        0       0       0       0
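
The 84-byte figure in both counters matches the inner packet size that ping reports as "56(84)". The arithmetic can be sketched as follows; the function names are hypothetical, and the header sizes are the standard minimums (20-byte IPv4 header without options, 8-byte ICMP echo header). The encapsulation overhead is left as a parameter because the exact value counted before the patch is not stated here; a base GRE header without optional fields is 4 bytes.

```c
/* Standard minimum header sizes, in bytes. */
#define IPV4_HDR_LEN 20 /* IPv4 header without options */
#define ICMP_HDR_LEN 8  /* ICMP echo request/reply header */

/* Hypothetical helper: size of the inner IPv4 packet for an ICMP echo
 * carrying `payload` bytes of data. */
int inner_ipv4_icmp_len(int payload)
{
    return IPV4_HDR_LEN + ICMP_HDR_LEN + payload;
}

/* Hypothetical helper: byte count if `encap_hdr` bytes of tunnel header
 * were added on top of the inner packet. */
int len_with_encap(int inner_len, int encap_hdr)
{
    return inner_len + encap_hdr;
}
```

For the ping above, `inner_ipv4_icmp_len(56)` gives 84, which is the value both RX and TX show after the patch. Counting any tunnel header on one side only, as in `len_with_encap(84, 4)` for an assumed 4-byte GRE header, would make the two directions disagree for the same packet.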
      Reported-by: Julien Meunier <julien.meunier@6wind.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc22a0e2
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · ac813744
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree:
      
      1) nf_log_unregister() should set to NULL only the logger that is being
         unregistered, not every other logger. Patch from Florian Westphal.
      
      2) Fix a crash when accessing physoutdev from PREROUTING in br_netfilter.
         This partially reverts the patch that shrank nf_bridge_info to 32 bytes.
         Also from Florian.
      
      3) Use existing match/target extensions in the internal nft_compat extension
         lists when the extension is family unspecific (i.e. NFPROTO_UNSPEC).
      
      4) Wait for an RCU grace period before leaving nf_log_unregister().
      ====================
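
Item 1 can be illustrated with a small userspace sketch. This is not the kernel implementation: `struct logger`, `NPROTO` and `unregister_matching()` are hypothetical, and the RCU handling mentioned in item 4 is left out entirely. It only shows the intended behaviour, where a slot is cleared when it points at the logger being unregistered and every other slot is left alone.

```c
#include <stddef.h>

#define NPROTO 4 /* hypothetical number of per-family logger slots */

/* Toy stand-in for a registered logger. */
struct logger {
    const char *name;
};

/* Clear only the slots that currently point at `l`; slots holding
 * other loggers (or already empty) are not modified. */
void unregister_matching(const struct logger *slots[NPROTO],
                         const struct logger *l)
{
    for (size_t i = 0; i < NPROTO; i++) {
        if (slots[i] == l)
            slots[i] = NULL;
    }
}
```

If two loggers share the table, unregistering one of them leaves the other's slots intact, which is the property the fix restores.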
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac813744
    • E
      tipc: reinitialize pointer after skb linearize · 4e3ae001
      Erik Hugne committed
      The msg pointer into the header may change after skb linearization.
      We must reinitialize it after calling skb_linearize to prevent
      operating on a freed or invalid pointer.
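
The same hazard exists in plain userspace C whenever a buffer can be moved to a new allocation while a pointer into it is still held. The sketch below is an analogy, not TIPC code: `struct buf`, `buf_relocate()`, `buf_hdr()` and `read_type_after_relocate()` are hypothetical, with `buf_relocate()` standing in for a step that, like linearization, may move the data.

```c
#include <stdlib.h>
#include <string.h>

/* Toy buffer whose data may be moved to a new allocation. */
struct buf {
    unsigned char *data;
    size_t len;
};

/* Hypothetical stand-in for a linearize-like step: copies the data
 * into a fresh allocation and frees the old one. */
int buf_relocate(struct buf *b)
{
    unsigned char *fresh = malloc(b->len);

    if (!fresh)
        return -1;
    memcpy(fresh, b->data, b->len);
    free(b->data);
    b->data = fresh;
    return 0;
}

/* Pointer to the start of the header inside the buffer. */
unsigned char *buf_hdr(struct buf *b)
{
    return b->data;
}

/* Read the first header byte, re-deriving the header pointer after
 * the buffer may have moved. */
int read_type_after_relocate(struct buf *b)
{
    unsigned char *hdr = buf_hdr(b); /* valid only until relocation */

    if (buf_relocate(b) != 0)
        return -1;
    hdr = buf_hdr(b);                /* old pointer may now be freed */
    return hdr[0];
}
```

Dropping the second `buf_hdr()` call would leave `hdr` pointing into memory that `buf_relocate()` has already freed, which is the kind of stale access the patch avoids.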
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Reported-by: Tamás Végh <tamas.vegh@ericsson.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4e3ae001