1. 07 Feb, 2017 1 commit
  2. 06 Feb, 2017 7 commits
  3. 05 Feb, 2017 2 commits
  4. 04 Feb, 2017 3 commits
    • E
      net: use a work queue to defer net_disable_timestamp() work · 5fa8bbda
      Committed by Eric Dumazet
      Dmitry reported a warning [1] showing that we were calling
      net_disable_timestamp() -> static_key_slow_dec() from a non
      process context.
      
      Grabbing a mutex while holding a spinlock or rcu_read_lock()
      is not allowed.
      
      As Cong suggested, we now use a work queue.
      
      It is possible netstamp_clear() exits while netstamp_needed_deferred
      is not zero, but it is probably not worth trying to do better than that.
      
      netstamp_needed_deferred atomic tracks the exact number of deferred
      decrements.
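
The deferral pattern described above can be modelled in userspace. This is a minimal sketch, not the kernel patch itself: it uses C11 atomics, and key_enable_count, request_disable() and apply_deferred() are hypothetical stand-ins for the static key, the caller that must not sleep, and the work handler.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the static key's enable count. */
static int key_enable_count;

/* Exact number of decrements requested but not yet applied. */
static atomic_int deferred_decrements;

void key_enable(void)
{
    key_enable_count++;
}

int key_count(void)
{
    return key_enable_count;
}

/* Safe from any context: it only records the request.
 * A kernel version would also schedule the work item here. */
void request_disable(void)
{
    atomic_fetch_add(&deferred_decrements, 1);
}

/* Runs later, in a context that is allowed to sleep.
 * Returns how many deferred decrements were applied. */
int apply_deferred(void)
{
    int pending = atomic_exchange(&deferred_decrements, 0);
    int applied = pending;

    while (pending-- > 0) {
        assert(key_enable_count > 0);
        key_enable_count--; /* the step that may sleep in the kernel */
    }
    return applied;
}
```

In this model a request that arrives after the exchange stays in the counter until the handler runs again, which corresponds to the remark that the handler may exit while the counter is not zero.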
      
      [1]
      [ INFO: suspicious RCU usage. ]
      4.10.0-rc5+ #192 Not tainted
      -------------------------------
      ./include/linux/rcupdate.h:561 Illegal context switch in RCU read-side
      critical section!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 0
      2 locks held by syz-executor14/23111:
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>] lock_sock
      include/net/sock.h:1454 [inline]
       #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff83a35c35>]
      rawv6_sendmsg+0x1e65/0x3ec0 net/ipv6/raw.c:919
       #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>] nf_hook
      include/linux/netfilter.h:201 [inline]
       #1:  (rcu_read_lock){......}, at: [<ffffffff83ae2678>]
      __ip6_local_out+0x258/0x840 net/ipv6/output_core.c:160
      
      stack backtrace:
      CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4452
       rcu_preempt_sleep_check include/linux/rcupdate.h:560 [inline]
       ___might_sleep+0x560/0x650 kernel/sched/core.c:7748
       __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
       mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
       atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
       __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
       static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
       net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
       sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
       __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sock_wfree+0xae/0x120 net/core/sock.c:1645
       skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put include/net/inet_frag.h:133 [inline]
       nf_ct_frag6_gather+0x1106/0x3840
      net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook include/linux/netfilter.h:212 [inline]
       __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
       rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x600 net/socket.c:848
       do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
       do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
       vfs_writev+0x87/0xc0 fs/read_write.c:911
       do_writev+0x110/0x2c0 fs/read_write.c:944
       SYSC_writev fs/read_write.c:1017 [inline]
       SyS_writev+0x27/0x30 fs/read_write.c:1014
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x445559
      RSP: 002b:00007f6f46fceb58 EFLAGS: 00000292 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000445559
      RDX: 0000000000000001 RSI: 0000000020f1eff0 RDI: 0000000000000005
      RBP: 00000000006e19c0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000292 R12: 0000000000700000
      R13: 0000000020f59000 R14: 0000000000000015 R15: 0000000000020400
      BUG: sleeping function called from invalid context at
      kernel/locking/mutex.c:752
      in_atomic(): 1, irqs_disabled(): 0, pid: 23111, name: syz-executor14
      INFO: lockdep is turned off.
      CPU: 2 PID: 23111 Comm: syz-executor14 Not tainted 4.10.0-rc5+ #192
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
       ___might_sleep+0x47e/0x650 kernel/sched/core.c:7780
       __might_sleep+0x95/0x1a0 kernel/sched/core.c:7739
       mutex_lock_nested+0x24f/0x1730 kernel/locking/mutex.c:752
       atomic_dec_and_mutex_lock+0x119/0x160 kernel/locking/mutex.c:1060
       __static_key_slow_dec+0x7a/0x1e0 kernel/jump_label.c:149
       static_key_slow_dec+0x51/0x90 kernel/jump_label.c:174
       net_disable_timestamp+0x3b/0x50 net/core/dev.c:1728
       sock_disable_timestamp+0x98/0xc0 net/core/sock.c:403
       __sk_destruct+0x27d/0x6b0 net/core/sock.c:1441
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sock_wfree+0xae/0x120 net/core/sock.c:1645
       skb_release_head_state+0xfc/0x200 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4c0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put include/net/inet_frag.h:133 [inline]
       nf_ct_frag6_gather+0x1106/0x3840
      net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x1be/0x2b0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn include/linux/netfilter.h:102 [inline]
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook include/linux/netfilter.h:212 [inline]
       __ip6_local_out+0x489/0x840 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613 [inline]
       rawv6_sendmsg+0x2d1a/0x3ec0 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x600 net/socket.c:848
       do_iter_readv_writev+0x2e3/0x5b0 fs/read_write.c:695
       do_readv_writev+0x42c/0x9b0 fs/read_write.c:872
       vfs_writev+0x87/0xc0 fs/read_write.c:911
       do_writev+0x110/0x2c0 fs/read_write.c:944
       SYSC_writev fs/read_write.c:1017 [inline]
       SyS_writev+0x27/0x30 fs/read_write.c:1014
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x445559
      
      Fixes: b90e5794 ("net: dont call jump_label_dec from irq context")
      Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • S
      ethtool: do not vzalloc(0) on registers dump · 3808d348
      Committed by Stanislaw Gruszka
      If the ->get_regs_len() callback returns 0, we allocate 0 bytes of memory,
      which prints an ugly warning in dmesg, shown further below.
      
      This happens on mac80211 devices, where ieee80211_get_regs_len() just
      returns 0 and the driver only fills the ethtool_regs structure without
      actually providing any dump. However, I assume this can happen with other
      drivers too, e.g. when a driver provides a regs dump for some devices and
      not for others. Hence preventing the warning in the ethtool code
      seems reasonable.
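
The idea of skipping a zero-length allocation can be sketched in userspace as follows. This is an illustration only: alloc_dump_buffer() is a hypothetical helper, and calloc() stands in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed dump buffer only when the
 * reported length is non-zero. Returns 0 on success, -ENOMEM on failure. */
int alloc_dump_buffer(size_t len, void **buf)
{
    assert(buf != NULL);

    *buf = NULL;
    if (len == 0)
        return 0; /* nothing to dump; not an error, and no allocation */

    *buf = calloc(1, len);
    return *buf ? 0 : -ENOMEM;
}
```

A caller in this sketch treats a NULL buffer with a zero return value as "no dump available" rather than as an allocation failure.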
      
      ethtool: vmalloc: allocation failure: 0 bytes, mode:0x24080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO)
      <snip>
      Call Trace:
      [<ffffffff813bde47>] dump_stack+0x63/0x8c
      [<ffffffff811b0a1f>] warn_alloc+0x13f/0x170
      [<ffffffff811f0476>] __vmalloc_node_range+0x1e6/0x2c0
      [<ffffffff811f0874>] vzalloc+0x54/0x60
      [<ffffffff8169986c>] dev_ethtool+0xb4c/0x1b30
      [<ffffffff816adbb1>] dev_ioctl+0x181/0x520
      [<ffffffff816714d2>] sock_do_ioctl+0x42/0x50
      <snip>
      Mem-Info:
      active_anon:435809 inactive_anon:173951 isolated_anon:0
       active_file:835822 inactive_file:196932 isolated_file:0
       unevictable:0 dirty:8 writeback:0 unstable:0
       slab_reclaimable:157732 slab_unreclaimable:10022
       mapped:83042 shmem:306356 pagetables:9507 bounce:0
       free:130041 free_pcp:1080 free_cma:0
      Node 0 active_anon:1743236kB inactive_anon:695804kB active_file:3343288kB inactive_file:787728kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:332168kB dirty:32kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1225424kB writeback_tmp:0kB unstable:0kB pages_scanned:0 all_unreclaimable? no
      Node 0 DMA free:15900kB min:136kB low:168kB high:200kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15984kB managed:15900kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      lowmem_reserve[]: 0 3187 7643 7643
      Node 0 DMA32 free:419732kB min:28124kB low:35152kB high:42180kB active_anon:541180kB inactive_anon:248988kB active_file:1466388kB inactive_file:389632kB unevictable:0kB writepending:0kB present:3370280kB managed:3290932kB mlocked:0kB slab_reclaimable:217184kB slab_unreclaimable:4180kB kernel_stack:160kB pagetables:984kB bounce:0kB free_pcp:2236kB local_pcp:660kB free_cma:0kB
      lowmem_reserve[]: 0 0 4456 4456
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      ipv6: sr: remove cleanup flag and fix HMAC computation · 013e8167
      Committed by David Lebrun
      In the latest version of the IPv6 Segment Routing IETF draft [1] the
      cleanup flag is removed and the flags field length is shrunk from 16 bits
      to 8 bits. As a consequence, the input of the HMAC computation is modified
      in a non-backward compatible way by covering the whole octet of flags
      instead of only the cleanup bit. As such, if an implementation compatible
      with the latest draft computes the HMAC of an SRH that has other flags set
      to 1, then the HMAC result would differ from the current implementation.
      
      This patch carries those modifications to prevent conflict with other
      implementations of IPv6 SR.
      
      [1] https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-05
      
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 03 Feb, 2017 4 commits
    • M
      net: phy: Fix lack of reference count on PHY driver · cafe8df8
      Committed by Mao Wenan
      There is currently no reference count being held on the PHY driver,
      which makes it possible to remove the PHY driver module while the PHY
      state machine is running and polling the PHY. This could cause crashes
      similar to this one to show up:
      
      [   43.361162] BUG: unable to handle kernel NULL pointer dereference at 0000000000000140
      [   43.361162] IP: phy_state_machine+0x32/0x490
      [   43.361162] PGD 59dc067
      [   43.361162] PUD 0
      [   43.361162]
      [   43.361162] Oops: 0000 [#1] SMP
      [   43.361162] Modules linked in: dsa_loop [last unloaded: broadcom]
      [   43.361162] CPU: 0 PID: 1299 Comm: kworker/0:3 Not tainted 4.10.0-rc5+ #415
      [   43.361162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
      [   43.361162] Workqueue: events_power_efficient phy_state_machine
      [   43.361162] task: ffff880006782b80 task.stack: ffffc90000184000
      [   43.361162] RIP: 0010:phy_state_machine+0x32/0x490
      [   43.361162] RSP: 0018:ffffc90000187e18 EFLAGS: 00000246
      [   43.361162] RAX: 0000000000000000 RBX: ffff8800059e53c0 RCX:
      ffff880006a15c60
      [   43.361162] RDX: ffff880006782b80 RSI: 0000000000000000 RDI:
      ffff8800059e5428
      [   43.361162] RBP: ffffc90000187e48 R08: ffff880006a15c40 R09:
      0000000000000000
      [   43.361162] R10: 0000000000000000 R11: 0000000000000000 R12:
      ffff8800059e5428
      [   43.361162] R13: ffff8800059e5000 R14: 0000000000000000 R15:
      ffff880006a15c40
      [   43.361162] FS:  0000000000000000(0000) GS:ffff880006a00000(0000)
      knlGS:0000000000000000
      [   43.361162] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   43.361162] CR2: 0000000000000140 CR3: 0000000005979000 CR4:
      00000000000006f0
      [   43.361162] Call Trace:
      [   43.361162]  process_one_work+0x1b4/0x3e0
      [   43.361162]  worker_thread+0x43/0x4d0
      [   43.361162]  ? __schedule+0x17f/0x4e0
      [   43.361162]  kthread+0xf7/0x130
      [   43.361162]  ? process_one_work+0x3e0/0x3e0
      [   43.361162]  ? kthread_create_on_node+0x40/0x40
      [   43.361162]  ret_from_fork+0x29/0x40
      [   43.361162] Code: 56 41 55 41 54 4c 8d 67 68 53 4c 8d af 40 fc ff ff
      48 89 fb 4c 89 e7 48 83 ec 08 e8 c9 9d 27 00 48 8b 83 60 ff ff ff 44 8b
      73 98 <48> 8b 90 40 01 00 00 44 89 f0 48 85 d2 74 08 4c 89 ef ff d2 8b
      
      Keep references on the PHY driver module right before we are going to
      utilize it in phy_attach_direct(), and conversely when we don't use it
      anymore in phy_detach().
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      [florian: rebase, rework commit message]
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'mlx4-queue-reinit' · 2372bcda
      Committed by David S. Miller
      Martin KaFai Lau says:
      
      ====================
      mlx4: Misc bug fixes after reinitializing queues
      
      This patchset fixes misc bugs after reinitializing
      queues (e.g. by ethtool -L).
      
      v2:
      * Add another fix to mem leak in tx_ring[t] and tx_cq[t]
      * In mlx4_en_try_alloc_resources(),
        move all xdp_prog logic after calling mlx4_en_alloc_resources()
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G' · 770f8225
      Committed by Martin KaFai Lau
      After calling mlx4_en_try_alloc_resources (e.g. by changing the
      number of rx-queues with ethtool -L), the existing xdp_prog becomes
      inactive.
      
      The bug is that the xdp_prog pointer has not been carried over from
      the old rx-queues to the new rx-queues.
      
      Fixes: 47a38e15 ("net/mlx4_en: add support for fast rx drop bpf program")
      Cc: Brenden Blanco <bblanco@plumgrid.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      mlx4: Fix memory leak after mlx4_en_update_priv() · f32b20e8
      Committed by Martin KaFai Lau
      In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t]
      are over-written by src->tx_ring[t] and src->tx_cq[t] without
      first calling kfree.
      
      One of the reproducible code paths is by doing 'ethtool -L'.
      
      The fix is to do the kfree in mlx4_en_free_resources().
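
The ownership problem can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the driver code: struct priv, NUM_TX_TYPES, MAX_RINGS and the three functions are hypothetical, and free() stands in for kfree().

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_TX_TYPES 2 /* hypothetical number of ring types */
#define MAX_RINGS 8    /* hypothetical array length */

struct priv {
    void **tx_ring[NUM_TX_TYPES]; /* per-type array of ring pointers */
};

/* Allocate the per-type arrays; returns 0 on success, -1 on failure. */
int alloc_resources(struct priv *p)
{
    for (int t = 0; t < NUM_TX_TYPES; t++) {
        p->tx_ring[t] = calloc(MAX_RINGS, sizeof(void *));
        if (!p->tx_ring[t]) {
            while (t-- > 0) {
                free(p->tx_ring[t]);
                p->tx_ring[t] = NULL;
            }
            return -1;
        }
    }
    return 0;
}

/* Release the arrays themselves, so nothing is lost when they are replaced. */
void free_resources(struct priv *p)
{
    for (int t = 0; t < NUM_TX_TYPES; t++) {
        free(p->tx_ring[t]);
        p->tx_ring[t] = NULL;
    }
}

/* Move ownership of src's arrays to dst; dst must have been released first. */
void update_priv(struct priv *dst, struct priv *src)
{
    for (int t = 0; t < NUM_TX_TYPES; t++) {
        assert(dst->tx_ring[t] == NULL); /* overwriting a live pointer would leak */
        dst->tx_ring[t] = src->tx_ring[t];
        src->tx_ring[t] = NULL;
    }
}
```

In this model, calling free_resources() on the old state before update_priv() is what keeps the overwritten arrays from leaking.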
      
      Here is the kmemleak report:
      unreferenced object 0xffff880841211800 (size 2048):
        comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
          [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
          [<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0
          [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
          [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
          [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
          [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
          [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
          [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
          [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
          [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
          [<ffffffffffffffff>] 0xffffffffffffffff
      unreferenced object 0xffff880841213000 (size 2048):
        comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
          [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
          [<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0
          [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
          [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
          [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
          [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
          [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
          [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
          [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
          [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      (gdb) list *mlx4_en_try_alloc_resources+0x118
      0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145).
      2140                    if (!dst->tx_ring_num[t])
      2141                            continue;
      2142
      2143                    dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) *
      2144                                              MAX_TX_RINGS, GFP_KERNEL);
      2145                    if (!dst->tx_ring[t])
      2146                            goto err_free_tx;
      2147
      2148                    dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
      2149                                            MAX_TX_RINGS, GFP_KERNEL);
      (gdb) list *mlx4_en_try_alloc_resources+0x13b
      0xffffffff8170e0cb is in mlx4_en_try_alloc_resources (drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150).
      2145                    if (!dst->tx_ring[t])
      2146                            goto err_free_tx;
      2147
      2148                    dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
      2149                                            MAX_TX_RINGS, GFP_KERNEL);
      2150                    if (!dst->tx_cq[t]) {
      2151                            kfree(dst->tx_ring[t]);
      2152                            goto err_free_tx;
      2153                    }
      2154            }
      
      Fixes: ec25bc04 ("net/mlx4_en: Add resilience in low memory systems")
      Cc: Eugenia Emantayev <eugenia@mellanox.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 02 Feb, 2017 10 commits
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 6d04dfc8
      Committed by Linus Torvalds
      Pull networking fixes from David Miller:
      
       1) Fix handling of interrupt status in stmmac driver. Just because we
          have masked the event from generating interrupts, doesn't mean the
          bit won't still be set in the interrupt status register. From Alexey
          Brodkin.
      
       2) Fix DMA API debugging splats in gianfar driver, from Arseny Solokha.
      
       3) Fix off-by-one error in __ip6_append_data(), from Vlad Yasevich.
      
       4) cls_flower does not match on icmpv6 codes properly, from Simon Horman.
      
       5) Initial MAC address can be set incorrectly in some scenarios, from
          Ivan Vecera.
      
       6) Packet header pointer arithmetic fix in ip6_tnl_parse_tlv_enc_lim(),
          from Dan Carpenter.
      
       7) Fix divide by zero in __tcp_select_window(), from Eric Dumazet.
      
       8) Fix crash in iwlwifi when unregistering thermal zone, from Jens
          Axboe.
      
       9) Check for DMA mapping errors in starfire driver, from Alexey
          Khoroshilov.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (31 commits)
        tcp: fix 0 divide in __tcp_select_window()
        ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
        net: fix ndo_features_check/ndo_fix_features comment ordering
        net/sched: matchall: Fix configuration race
        be2net: fix initial MAC setting
        ipv6: fix flow labels when the traffic class is non-0
        net: thunderx: avoid dereferencing xcv when NULL
        net/sched: cls_flower: Correct matching on ICMPv6 code
        ipv6: Paritially checksum full MTU frames
        net/mlx4_core: Avoid command timeouts during VF driver device shutdown
        gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
        net: ethtool: add support for 2500BaseT and 5000BaseT link modes
        can: bcm: fix hrtimer/tasklet termination in bcm op removal
        net: adaptec: starfire: add checks for dma mapping errors
        net: phy: micrel: KSZ8795 do not set SUPPORTED_[Asym_]Pause
        can: Fix kernel panic at security_sock_rcv_skb
        net: macb: Fix 64 bit addressing support for GEM
        stmmac: Discard masked flags in interrupt status register
        net/mlx5e: Check ets capability before ets query FW command
        net/mlx5e: Fix update of hash function/key via ethtool
        ...
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 2883aaea
      Committed by Linus Torvalds
      Pull fscache fixes from Al Viro.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fscache: Fix dead object requeue
        fscache: Clear outstanding writes when disabling a cookie
        FS-Cache: Initialise stores_lock in netfs cookie
    • E
      tcp: fix 0 divide in __tcp_select_window() · 06425c30
      Committed by Eric Dumazet
      The syzkaller fuzzer was able to trigger a divide by zero when
      TCP window scaling is not enabled.
      
      SO_RCVBUF can be used not only to increase sk_rcvbuf, but also
      to decrease it below the current receive buffer utilization.
      
      If mss is negative or 0, just return a zero TCP window.
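
The guard described above can be sketched as a standalone function. This is a simplified model, not __tcp_select_window() itself: select_window() is a hypothetical name, and rounding the free space down to a multiple of mss is an assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical model: round the free space down to a multiple of mss,
 * returning a zero window when mss is zero or negative. */
int select_window(int free_space, int mss)
{
    /* Assumption of this sketch: callers pass a non-negative free space. */
    assert(free_space >= 0);

    if (mss <= 0)
        return 0; /* avoids the division by zero */

    return (free_space / mss) * mss;
}
```

Checking mss before the division keeps the function total for every mss value.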
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() · 63117f09
      Committed by Dan Carpenter
      Casting is a high-precedence operation, but "off" and "i" are in terms of
      bytes, so we need some parentheses here.
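
The precedence issue can be reproduced in a few lines. This is an illustrative sketch, not the patched kernel code: struct opt and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-byte option header. */
struct opt {
    uint8_t type;
    uint8_t len;
};

/* Wrong: the cast binds tighter than '+', so off and i are scaled by
 * sizeof(struct opt) instead of being treated as byte offsets. */
const struct opt *opt_at_wrong(const uint8_t *buf, size_t off, size_t i)
{
    assert(buf != NULL);
    return (const struct opt *)buf + off + i;
}

/* Right: add the byte offsets first, then cast the resulting address. */
const struct opt *opt_at(const uint8_t *buf, size_t off, size_t i)
{
    assert(buf != NULL);
    return (const struct opt *)(buf + off + i);
}
```

With off = 4 and i = 2, the parenthesized form points 6 bytes into the buffer, while the unparenthesized form points 12 bytes in because each unit counts as one 2-byte struct.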
      
      Fixes: fbfa743a ("ipv6: fix ip6_tnl_parse_tlv_enc_lim()")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · e387dc12
      Committed by Linus Torvalds
      Pull crypto fixes from Herbert Xu:
       "This fixes a bug in CBC/CTR on ARM64 that breaks chaining as well as a
        bug in the core API that causes registration failures when a driver
        unloads and then reloads an algorithm"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: arm64/aes-blk - honour iv_out requirement in CBC and CTR modes
        crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg
    • L
      Merge tag 'dmaengine-fix-4.10-rc7' of git://git.infradead.org/users/vkoul/slave-dma · 35609502
      Committed by Linus Torvalds
      Pull dmaengine fixes from Vinod Koul:
       "A couple of fixes showed up late in the cycle so sending them up and
        sending early in the week and not on Friday :).
      
        They fix a double lock in pl330 driver and runtime pm fixes for cppi
        driver"
      
      * tag 'dmaengine-fix-4.10-rc7' of git://git.infradead.org/users/vkoul/slave-dma:
        dmaengine: pl330: fix double lock
        dmaengine: cppi41: Clean up pointless warnings
        dmaengine: cppi41: Fix oops in cppi41_runtime_resume
        dmaengine: cppi41: Fix runtime PM timeouts with USB mass storage
    • D
      net: fix ndo_features_check/ndo_fix_features comment ordering · 1a2a1444
      Committed by Dimitris Michailidis
      Commit cdba756f ("net: move ndo_features_check() close to
      ndo_start_xmit()") inadvertently moved the doc comment for
      .ndo_fix_features instead of .ndo_features_check. Fix the comment
      ordering.
      
      Fixes: cdba756f ("net: move ndo_features_check() close to ndo_start_xmit()")
      Signed-off-by: Dimitris Michailidis <dmichail@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Y
      net/sched: matchall: Fix configuration race · fd62d9f5
      Committed by Yotam Gigi
      In the current version, the matchall internal state is split into two
      structs: cls_matchall_head and cls_matchall_filter. This makes little
      sense, as a matchall instance supports only one filter, and there is no
      situation where one exists and the other does not. In addition, that led
      to some races when a filter was deleted while a packet was being processed.
      
      Unify the two structs into one, thus simplifying the process of matchall
      creation and deletion. As a result, the new, delete and get callbacks have
      a dummy implementation where all the work is done in the destroy and change
      callbacks, as was done in cls_cgroup.
      
      Fixes: bf3994d2 ("net/sched: introduce Match-all classifier")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      Merge tag 'pinctrl-v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · c325b353
      Committed by Linus Torvalds
      Pull pin control fixes from Linus Walleij:
       "Another week, another set of pin control fixes. The subsystem has seen
        high patch-spot activity recently.
      
        The majority of the patches are for Intel; I vaguely think they mostly
        concern phones, tablets and maybe chromebooks and even laptops with
        these Intel Atom family chips.
      
        Driver fixes only:
      
         - one fix to the Berlin driver making the SD card work fully again.
      
         - one fix to the Allwinner/sunxi bias function: one premature change
           needs to be partially reverted.
      
         - the remaining four patches are to Intel embedded SoCs: baytrail
           (three patches) and merrifield (one patch): register access
           debounce fixes and a missing spinlock"
      
      * tag 'pinctrl-v4.10-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
        pinctrl: baytrail: Add missing spinlock usage in byt_gpio_irq_handler
        pinctrl: baytrail: Debounce register is one per community
        pinctrl: baytrail: Rectify debounce support (part 2)
        pinctrl: intel: merrifield: Add missed check in mrfld_config_set()
        pinctrl: sunxi: Don't enforce bias disable (for now)
        pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
    • I
      be2net: fix initial MAC setting · 4993b39a
      Committed by Ivan Vecera
      Recent commit 34393529 ("be2net: fix MAC addr setting on privileged
      BE3 VFs") allows privileged BE3 VFs to set their MAC address during
      initialization. Although the initial MAC for such VFs is already
      programmed by the parent PF, the subsequent setting performed by the VF
      is OK, but in certain cases (after a fresh boot) this command in the VF
      can fail.
      
      The MAC should be initialized only when:
      1) no MAC is programmed (always except BE3 VFs during first init)
      2) programmed MAC is different from requested (e.g. MAC is set when
         interface is down). In this case the initial MAC programmed by PF
         needs to be deleted.
      
      The adapter->dev_mac contains MAC address currently programmed in HW so
      it should be zeroed when the MAC is deleted from HW and should not be
      filled when MAC is set when interface is down in be_mac_addr_set() as
      no programming is performed in this case.
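
The two conditions above, together with the rule that the tracked address is zeroed on deletion, can be sketched as follows. This is a userspace illustration under stated assumptions, not the driver code: mac_needs_programming() and mac_mark_deleted() are hypothetical names, and an all-zero address is assumed to mean "no MAC programmed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/* Assumption of this sketch: an all-zero address means nothing is programmed. */
static bool mac_is_zero(const uint8_t mac[MAC_LEN])
{
    static const uint8_t zero[MAC_LEN];

    return memcmp(mac, zero, MAC_LEN) == 0;
}

/* Program the MAC only when none is programmed, or when the programmed
 * address differs from the requested one. */
bool mac_needs_programming(const uint8_t programmed[MAC_LEN],
                           const uint8_t requested[MAC_LEN])
{
    assert(programmed != NULL && requested != NULL);

    if (mac_is_zero(programmed))
        return true;
    return memcmp(programmed, requested, MAC_LEN) != 0;
}

/* Record that the address was deleted from hardware. */
void mac_mark_deleted(uint8_t programmed[MAC_LEN])
{
    assert(programmed != NULL);
    memset(programmed, 0, MAC_LEN);
}
```

Keeping the tracked copy in step with the hardware is what lets the first check stay a simple comparison.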
      
      Example of failure without the fix (immediately after fresh boot):
      
      # ip link set eth0 up  <- eth0 is BE3 PF
      be2net 0000:01:00.0 eth0: Link is Up
      
      # echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create 1 VF
      ...
      be2net 0000:01:04.0: Emulex OneConnect(be3): VF  port 0
      
      # ip link set eth8 up  <- eth8 is created privileged VF
      be2net 0000:01:04.0: opcode 59-1 failed:status 1-76
      RTNETLINK answers: Input/output error
      
      # echo 0 > /sys/class/net/eth0/device/sriov_numvfs  <- Delete VF
      iommu: Removing device 0000:01:04.0 from group 33
      ...
      
      # echo 1 > /sys/class/net/eth0/device/sriov_numvfs  <- Create it again
      iommu: Removing device 0000:01:04.0 from group 33
      ...
      
      # ip link set eth8 up
      be2net 0000:01:04.0 eth8: Link is Up
      
      Initialization is now OK.
      
      v2 - Corrected the comment and condition check suggested by Suresh & Harsha
      
      Fixes: 34393529 ("be2net: fix MAC addr setting on privileged BE3 VFs")
      Cc: Sathya Perla <sathya.perla@broadcom.com>
      Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
      Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Cc: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Ivan Vecera <cera@cera.cz>
      Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 01 Feb, 2017 12 commits
    • L
      Merge tag 'trace-4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · a2ca3d61
      Committed by Linus Torvalds
      Pull tracing fix from Steven Rostedt:
       "It was reported to me that the thread created by the hwlat tracer does
        not migrate after the first instance. I found that there was a small
        bug in the logic, and fixed it. It's minor, but should be fixed
        regardless. There's not much impact outside the hwlat tracer"
      
      * tag 'trace-4.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing: Fix hwlat kthread migration
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 283725af
      Linus Torvalds committed
      Pull input subsystem fixes from Dmitry Torokhov:
       "A fix for a crash in the wm97xx driver and synaptics-rmi4 will stop
        throwing erroneous warnings."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics-rmi4 - fix reversed conditions in enable/disable_irq_wake
        Input: wm97xx - make missing platform data non-fatal
    • L
      Merge branch 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · f1774f46
      Linus Torvalds committed
      Pull cgroup fix from Tejun Heo:
       "The cgroup creation path was getting the order of operations wrong and
        exposing cgroups which don't have their names set yet to controllers
        which can lead to NULL derefs.
      
        This contains the fix for the bug"
      
      * 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: don't online subsystems before cgroup_name/path() are operational
    • L
      Merge branch 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu · 298a2d87
      Linus Torvalds committed
      Pull percpu fix from Tejun Heo:
       "Douglas found and fixed a ref leak bug in percpu_ref_tryget[_live]().
      
        The bug is caused by storing the return value of atomic_long_inc_not_zero()
        into an int temp variable before returning it as a bool. The interim
        cast to int loses the upper bits and can lead to false negatives. As
        percpu_ref uses a high bit to mark a draining counter, this can happen
        relatively easily.
      
        Fixed by using bool for the temp variable"
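
      The truncation described above can be illustrated in user space. This is a minimal sketch, not the kernel code: via_int() and via_bool() are hypothetical helpers, and the 64-bit argument merely stands in for a wide return value whose only set bits are high ones. It assumes a mainstream ABI on which converting an out-of-range 64-bit value to int keeps the low 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy pattern: the wide result passes through an int temporary first. */
static bool via_int(int64_t wide_result)
{
    int tmp = (int)wide_result; /* assumption: keeps only the low 32 bits */
    return tmp;                 /* false whenever the low 32 bits are zero */
}

/* Fixed pattern: a bool temporary is true for any non-zero value. */
static bool via_bool(int64_t wide_result)
{
    bool tmp = wide_result;     /* conversion to bool tests for non-zero */
    return tmp;
}
```

      With only bit 62 set, via_int() reports false while via_bool() reports true, which is the kind of false negative the message describes.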
      
      * 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
        percpu-refcount: fix reference leak during percpu-atomic transition
    • L
      Merge branch 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · 52e02f27
      Linus Torvalds committed
      Pull libata fixes from Tejun Heo:
       "Three libata fixes: an error handling fix, blacklist addition for
        another fallout from upping the default max sectors, and fix for a
        sense data reporting bug which affects new harddrives which can report
        sense data"
      
      * 'for-4.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
        ata: sata_mv:- Handle return value of devm_ioremap.
        libata: Fix ATA request sense
        libata: apply MAX_SEC_1024 to all CX1-JB*-HP devices
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid · c9194b99
      Linus Torvalds committed
      Pull HID fixes from Jiri Kosina:
      
       - regression fix (sleeping while atomic) for cp2112, from Johan Hovold
      
       - regression fix for proximity handling under certain circumstances in
         Wacom driver, from Jason Gerecke
      
       - functional fix for Logitech Rumblepad 2, from Ardinartsev Nikita
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
        HID: cp2112: fix gpio-callback error handling
        HID: cp2112: fix sleep-while-atomic
        HID: hid-lg: Fix immediate disconnection of Logitech Rumblepad 2
        HID: usbhid: Quirk a AMI virtual mouse and keyboard with ALWAYS_POLL
        HID: wacom: Fix poor prox handling in 'wacom_pl_irq'
    • L
      Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · 415f9b71
      Linus Torvalds committed
      Pull cifs fix from Steve French:
       "A small cifs fix for stable"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: initialize file_info_lock
    • D
      fscache: Fix dead object requeue · e26bfebd
      David Howells committed
      Under some circumstances, an fscache object can become queued such that
      fscache_object_work_func() can be called once the object is in the
      OBJECT_DEAD state.  This results in the kernel oopsing when it tries to
      invoke the handler for the state (which is hard coded to 0x2).
      
      The way this comes about is something like the following:
      
       (1) The object dispatcher is processing a work state for an object.  This
           is done in workqueue context.
      
       (2) An out-of-band event comes in that isn't masked, causing the object to
           be queued, say EV_KILL.
      
       (3) The object dispatcher finishes processing the current work state on
           that object and then sees there's another event to process, so,
           without returning to the workqueue core, it processes that event too.
           It then follows the chain of events that initiates until we reach
           OBJECT_DEAD without going through a wait state (such as
           WAIT_FOR_CLEARANCE).
      
           At this point, object->events may be 0, object->event_mask will be 0
           and oob_event_mask will be 0.
      
       (4) The object dispatcher returns to the workqueue processor, and in due
           course, this sees that the object's work item is still queued and
           invokes it again.
      
       (5) The current state is a work state (OBJECT_DEAD), so the dispatcher
           jumps to it - resulting in an OOPS.
      
      When I'm seeing this, the work state in (1) appears to have been either
      LOOK_UP_OBJECT or CREATE_OBJECT (object->oob_table is
      fscache_osm_lookup_oob).
      
      The window for (2) is very small:
      
       (A) object->event_mask is cleared whilst the event dispatch process is
           underway - though there's no memory barrier to force this to the top
           of the function.
      
           The window, therefore, is from the time the object was selected by the
           workqueue processor and made requeueable to the time the mask was
           cleared.
      
       (B) fscache_raise_event() will only queue the object if it manages to set
           the event bit and the corresponding event_mask bit was set.
      
           The enqueuement is then deferred slightly whilst we get a ref on the
           object and get the per-CPU variable for workqueue congestion.  This
           deferral slightly increases the probability of the race by allowing
           extra time for the workqueue to make the item requeueable.
      
      Handle this by giving the dead state a processor function and checking
      for the dead state address rather than seeing if the processor function is
      address 0x2.  The dead state processor function can then set a flag to
      indicate that it's occurred and give a warning if it occurs more than once
      per object.
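
      The approach in the paragraph above can be sketched as a tiny state machine. This is a user-space illustration under stated assumptions, not the fscache code: struct state, struct object, dead_work(), is_dead() and dispatch_once() are all hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>

struct object;

/* A state carries a real processor function, never a magic address. */
struct state {
    const char *name;
    const struct state *(*work)(struct object *obj);
};

struct object {
    const struct state *state;
    bool dead_seen; /* set the first time the dead state runs */
    int dead_runs;  /* how often the dead state was dispatched */
};

static const struct state *dead_work(struct object *obj);

static const struct state dead_state = { "DEAD", dead_work };

/* Dead-state processor: safe to call, warns on a repeat visit. */
static const struct state *dead_work(struct object *obj)
{
    if (obj->dead_seen)
        fprintf(stderr, "warning: dead state dispatched again\n");
    obj->dead_seen = true;
    obj->dead_runs++;
    return &dead_state;
}

/* Death is detected by comparing state addresses. */
static bool is_dead(const struct object *obj)
{
    return obj->state == &dead_state;
}

/* Run the current state's processor once and adopt the state it returns. */
static void dispatch_once(struct object *obj)
{
    obj->state = obj->state->work(obj);
}
```

      Dispatching an already dead object twice in this sketch prints one warning instead of jumping to an invalid address.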
      
      If this race occurs, an oops similar to the following is seen (note the RIP
      value):
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      IP: [<0000000000000002>] 0x1
      PGD 0
      Oops: 0010 [#1] SMP
      Modules linked in: ...
      CPU: 17 PID: 16077 Comm: kworker/u48:9 Not tainted 3.10.0-327.18.2.el7.x86_64 #1
      Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 12/27/2015
      Workqueue: fscache_object fscache_object_work_func [fscache]
      task: ffff880302b63980 ti: ffff880717544000 task.ti: ffff880717544000
      RIP: 0010:[<0000000000000002>]  [<0000000000000002>] 0x1
      RSP: 0018:ffff880717547df8  EFLAGS: 00010202
      RAX: ffffffffa0368640 RBX: ffff880edf7a4480 RCX: dead000000200200
      RDX: 0000000000000002 RSI: 00000000ffffffff RDI: ffff880edf7a4480
      RBP: ffff880717547e18 R08: 0000000000000000 R09: dfc40a25cb3a4510
      R10: dfc40a25cb3a4510 R11: 0000000000000400 R12: 0000000000000000
      R13: ffff880edf7a4510 R14: ffff8817f6153400 R15: 0000000000000600
      FS:  0000000000000000(0000) GS:ffff88181f420000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000002 CR3: 000000000194a000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
       ffffffffa0363695 ffff880edf7a4510 ffff88093f16f900 ffff8817faa4ec00
       ffff880717547e60 ffffffff8109d5db 00000000faa4ec18 0000000000000000
       ffff8817faa4ec18 ffff88093f16f930 ffff880302b63980 ffff88093f16f900
      Call Trace:
       [<ffffffffa0363695>] ? fscache_object_work_func+0xa5/0x200 [fscache]
       [<ffffffff8109d5db>] process_one_work+0x17b/0x470
       [<ffffffff8109e4ac>] worker_thread+0x21c/0x400
       [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400
       [<ffffffff810a5acf>] kthread+0xcf/0xe0
       [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
       [<ffffffff816460d8>] ret_from_fork+0x58/0x90
       [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Jeremy McNicoll <jeremymc@redhat.com>
      Tested-by: Frank Sorenson <sorenson@redhat.com>
      Tested-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • D
      fscache: Clear outstanding writes when disabling a cookie · 6bdded59
      David Howells committed
      fscache_disable_cookie() needs to clear the outstanding writes on the
      cookie it's disabling because they cannot be completed after.
      
      Without this, nfs_fscache_open_file() gets stuck because it disables the
      cookie when the file is opened for writing but can't uncache the pages till
      afterwards - otherwise there's a race between the open routine and anyone
      who already has it open R/O and is still reading from it.
      
      Looking in /proc/pid/stack of the offending process shows:
      
      [<ffffffffa0142883>] __fscache_wait_on_page_write+0x82/0x9b [fscache]
      [<ffffffffa014336e>] __fscache_uncache_all_inode_pages+0x91/0xe1 [fscache]
      [<ffffffffa01740fa>] nfs_fscache_open_file+0x59/0x9e [nfs]
      [<ffffffffa01ccf41>] nfs4_file_open+0x17f/0x1b8 [nfsv4]
      [<ffffffff8117350e>] do_dentry_open+0x16d/0x2b7
      [<ffffffff811743ac>] vfs_open+0x5c/0x65
      [<ffffffff81184185>] path_openat+0x785/0x8fb
      [<ffffffff81184343>] do_filp_open+0x48/0x9e
      [<ffffffff81174710>] do_sys_open+0x13b/0x1cb
      [<ffffffff811747b9>] SyS_open+0x19/0x1b
      [<ffffffff81001c44>] do_syscall_64+0x80/0x17a
      [<ffffffff8165c2da>] return_from_SYSCALL_64+0x0/0x7a
      [<ffffffffffffffff>] 0xffffffffffffffff
      Reported-by: Jianhong Yin <jiyin@redhat.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: Steve Dickson <steved@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • D
      FS-Cache: Initialise stores_lock in netfs cookie · 62deb818
      David Howells committed
      Initialise the stores_lock in fscache netfs cookies.  Technically, it
      shouldn't be necessary, since the netfs cookie is an index and stores no
      data, but initialising it anyway adds insignificant overhead.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: Steve Dickson <steved@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • D
      ipv6: fix flow labels when the traffic class is non-0 · 90427ef5
      Dimitris Michailidis committed
      ip6_make_flowlabel() determines the flow label for IPv6 packets. It's
      supposed to be passed a flow label, which it returns as is if non-0 (and
      in some other cases); otherwise it calculates a new value.
      
      The problem is callers often pass a flowi6.flowlabel, which may also
      contain traffic class bits. If the traffic class is non-0,
      ip6_make_flowlabel() mistakes the non-0 value it gets for a flow label and
      returns the whole thing. Thus it can return a 'flow label' longer than
      20 bits, and the low 20 bits of that are typically 0, resulting in packets
      with a 0 label. Moreover, different packets of a flow may be labeled
      differently. For a TCP flow with ECN, non-payload and payload packets get
      different labels, as exemplified by this pair of consecutive packets:

      (pure ACK)
      Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
          0110 .... = Version: 6
          .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
          .... .... .... 0001 1100 1110 0100 1001 = Flow Label: 0x1ce49
          Payload Length: 32
          Next Header: TCP (6)
      
      (payload)
      Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
          0110 .... = Version: 6
          .... 0000 0010 .... .... .... .... .... = Traffic Class: 0x02 (DSCP: CS0, ECN: ECT(0))
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..10 .... .... .... .... .... = Explicit Congestion Notification: ECN-Capable Transport codepoint '10' (2)
          .... .... .... 0000 0000 0000 0000 0000 = Flow Label: 0x00000
          Payload Length: 688
          Next Header: TCP (6)
      
      This patch allows ip6_make_flowlabel() to be passed more than just a
      flow label and has it extract the part it really wants. This was simpler
      than modifying the callers. With this patch packets like the above become
      
      Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
          0110 .... = Version: 6
          .... 0000 0000 .... .... .... .... .... = Traffic Class: 0x00 (DSCP: CS0, ECN: Not-ECT)
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..00 .... .... .... .... .... = Explicit Congestion Notification: Not ECN-Capable Transport (0)
          .... .... .... 1010 1111 1010 0101 1110 = Flow Label: 0xafa5e
          Payload Length: 32
          Next Header: TCP (6)
      
      Internet Protocol Version 6, Src: 2002:af5:11a3::, Dst: 2002:af5:11a2::
          0110 .... = Version: 6
          .... 0000 0010 .... .... .... .... .... = Traffic Class: 0x02 (DSCP: CS0, ECN: ECT(0))
              .... 0000 00.. .... .... .... .... .... = Differentiated Services Codepoint: Default (0)
              .... .... ..10 .... .... .... .... .... = Explicit Congestion Notification: ECN-Capable Transport codepoint '10' (2)
          .... .... .... 1010 1111 1010 0101 1110 = Flow Label: 0xafa5e
          Payload Length: 688
          Next Header: TCP (6)
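
      The masking idea behind the fix can be sketched in host byte order. This is a simplified illustration, not the kernel function: FLOWLABEL_MASK, TCLASS_MASK, tclass_of(), make_label_buggy() and make_label_fixed() are hypothetical names, the "some other cases" mentioned earlier are left out, and the `computed` argument stands in for whatever label would otherwise be calculated.

```c
#include <stdint.h>

/* First 32 bits of an IPv6 header: 4-bit version, 8-bit class, 20-bit label. */
#define FLOWLABEL_MASK 0x000FFFFFu
#define TCLASS_MASK    0x0FF00000u

/* Extract the traffic class bits from a combined flow-info word. */
static uint32_t tclass_of(uint32_t flowinfo)
{
    return (flowinfo & TCLASS_MASK) >> 20;
}

/* Buggy pattern: any non-zero word, class bits included, counts as a label. */
static uint32_t make_label_buggy(uint32_t flowinfo, uint32_t computed)
{
    return flowinfo ? flowinfo : computed;
}

/* Fixed pattern: only the low 20 bits decide whether a label was supplied. */
static uint32_t make_label_fixed(uint32_t flowinfo, uint32_t computed)
{
    uint32_t label = flowinfo & FLOWLABEL_MASK;

    return label ? label : computed;
}
```

      With a traffic class of 0x02 and no label (0x00200000), the buggy variant returns a word whose low 20 bits are zero, while the fixed variant falls back to the computed label.
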
      Signed-off-by: Dimitris Michailidis <dmichail@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: thunderx: avoid dereferencing xcv when NULL · c73e4426
      Vincent committed
      This fixes the following smatch and coccinelle warnings:
      
        drivers/net/ethernet/cavium/thunder/thunder_xcv.c:119 xcv_setup_link() error: we previously assumed 'xcv' could be null (see line 118) [smatch]
        drivers/net/ethernet/cavium/thunder/thunder_xcv.c:119:16-20: ERROR: xcv is NULL but dereferenced. [coccinelle]
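
      The warnings above describe a pointer that is tested for NULL and then dereferenced on the NULL branch. A minimal sketch of that pattern's remedy, using hypothetical types and names (struct dev, struct xcv_like, setup_link_checked()) rather than the driver's own code:

```c
#include <stddef.h>
#include <stdio.h>

struct dev { const char *name; };
struct xcv_like { struct dev *dev; int link_up; };

/*
 * Returns 0 on success, -1 if the context is missing. The error path
 * must not touch ctx, so it logs without going through ctx->dev.
 */
static int setup_link_checked(struct xcv_like *ctx, int link_up)
{
    if (ctx == NULL) {
        fprintf(stderr, "context missing, nothing to set up\n");
        return -1;
    }
    ctx->link_up = link_up;
    return 0;
}
```

      The point of the sketch is only that the error message on the NULL branch cannot be built from fields of the pointer that was just found to be NULL.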
      
      Fixes: 6465859a ("net: thunderx: Add RGMII interface type support")
      Signed-off-by: Vincent Stehlé <vincent.stehle@laposte.net>
      Cc: Sunil Goutham <sgoutham@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 31 Jan, 2017 1 commit
    • S
      tracing: Fix hwlat kthread migration · 79c6f448
      Steven Rostedt (VMware) committed
      The hwlat tracer creates a kernel thread at start of the tracer. It is
      pinned to a single CPU and will move to the next CPU after each period of
      running. If the user modifies the thread's affinity, the tracer will not
      move the thread after that happens.
      
      The original code created the thread the first time the tracer was
      started. It was later changed to destroy the thread when the tracer
      finished and to create a new one the next time the tracer was started.
      The code that initialized the affinity, however, ran only on the initial
      instantiation of the tracer. On later instantiations the saved affinity
      was not reinitialized, so it did not match that of the newly created
      thread. This made it appear that the user had modified the thread's
      affinity when they had not, and the thread failed to migrate again.
      
      Cc: stable@vger.kernel.org
      Fixes: 0330f7aa ("tracing: Have hwlat trace migrate across tracing_cpumask CPUs")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>