1. 22 Dec, 2017 5 commits
    • skbuff: skb_copy_ubufs must release uarg even without user frags · b90ddd56
      Committed by Willem de Bruijn
      skb_copy_ubufs creates a private copy of frags[] to release its hold
      on user frags, then calls uarg->callback to notify the owner.
      
      Call uarg->callback even when no frags exist. This edge case can
      happen when zerocopy_sg_from_iter finds enough room in skb_headlen
      to copy all the data.
      
      Fixes: 3ece7826 ("sock: skb_copy_ubufs support for compound pages")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
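
The edge case can be illustrated with a toy Python model, not kernel code. The names `copy_ubufs` and `notify_owner` are hypothetical stand-ins for skb_copy_ubufs and uarg->callback; the sketch only shows that the owner is notified whether or not any frags exist.

```python
def copy_ubufs(frags, notify_owner):
    """Copy user frags into private buffers, then release the owner.

    Toy model: frags is a list of bytes-like objects and notify_owner is a
    hypothetical stand-in for uarg->callback.
    """
    # With no frags this list is simply empty; there is nothing to copy.
    private = [bytes(frag) for frag in frags]
    # Notify unconditionally. Returning early when frags is empty would
    # leave the owner waiting for a completion that never arrives.
    notify_owner()
    return private
```

Calling `copy_ubufs([], cb)` still invokes `cb` exactly once, which is the behavior the commit message asks for.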
    • skbuff: orphan frags before zerocopy clone · 268b7906
      Committed by Willem de Bruijn
      Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
      calls to skb_uarg(skb)->callback for the same data.
      
      skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
      with each segment. This is only safe for uargs that do refcounting,
      which is those that pass skb_orphan_frags without dropping their
      shared frags. For others, skb_orphan_frags drops the user frags and
      sets the uarg to NULL, after which sock_zerocopy_clone has no effect.
      
      Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
      calls for the same data, causing the vhost_net_ubuf_ref->refcount to
      drop below zero.
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
      Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY")
      Reported-by: Andreas Hartmann <andihartmann@01019freenet.de>
      Reported-by: David Hill <dhill@redhat.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
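
The ordering problem can be sketched with a toy Python model. This is not the kernel implementation: `Owner`, `Skb`, `release` and `segment` are hypothetical names, and the owner is assumed to be a non-refcounting one that expects exactly one completion per buffer.

```python
class Owner:
    """Toy non-refcounting zerocopy owner: expects exactly one completion."""

    def __init__(self):
        self.refcount = 1

    def callback(self):
        self.refcount -= 1


class Skb:
    def __init__(self, uarg=None):
        self.uarg = uarg


def orphan_frags(skb):
    # Models the non-refcounting case: copy the user frags, notify the
    # owner once, and clear uarg.
    if skb.uarg is not None:
        skb.uarg.callback()
        skb.uarg = None


def zerocopy_clone(seg, frag_skb):
    # Associates the source uarg with the segment; a no-op if it is None.
    seg.uarg = frag_skb.uarg


def release(skb):
    # Freeing a segment that still holds a uarg notifies the owner again.
    if skb.uarg is not None:
        skb.uarg.callback()
        skb.uarg = None


def segment(frag_skb, nsegs, orphan_first):
    segs = [Skb() for _ in range(nsegs)]
    if orphan_first:
        orphan_frags(frag_skb)
    for seg in segs:
        zerocopy_clone(seg, frag_skb)
    if not orphan_first:
        orphan_frags(frag_skb)
    for seg in segs:
        release(seg)
```

In this model, cloning before orphaning drives the owner's count to -2 for two segments, while orphaning first leaves it at 0 because the clones see a NULL uarg.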
    • net: reevalulate autoflowlabel setting after sysctl setting · 513674b5
      Committed by Shaohua Li
      sysctl.ipv6.auto_flowlabels is 1 by default. On our hosts, we set it to 2.
      If a sockopt doesn't set autoflowlabel, outgoing packets from the hosts
      are not supposed to include a flowlabel. This is true for normal packets,
      but not for reset packets.
      
      The reason is that ipv6_pinfo.autoflowlabel is set at sock creation. If
      we later change sysctl.ipv6.auto_flowlabels, ipv6_pinfo.autoflowlabel
      isn't changed, so the sock keeps the old auto flowlabel behavior. Reset
      packets suffer from this problem because they are sent from a special
      control socket, which is created at boot time. Since
      sysctl.ipv6.auto_flowlabels is 1 by default, the control socket always
      has its ipv6_pinfo.autoflowlabel set, even after the user sets
      sysctl.ipv6.auto_flowlabels to 2, so reset packets always carry a
      flowlabel. Normal socks created before the sysctl change suffer from
      the same issue. We can't even turn off autoflowlabel unless we kill all
      socks on the hosts.
      
      To fix this, if the IPV6_AUTOFLOWLABEL sockopt is used, we use the
      autoflowlabel setting from the user; otherwise we always call
      ip6_default_np_autolabel(), which reflects the current sysctl setting.
      
      Note that this changes behavior a little. Before commit 42240901
      (ipv6: Implement different admin modes for automatic flow labels), the
      autoflowlabel behavior of a sock wasn't sticky, e.g., if the sysctl
      changed, existing connections changed their autoflowlabel behavior.
      After that commit, autoflowlabel behavior was sticky for the whole life
      of the sock. With this patch, the behavior is no longer sticky.
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
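
The lookup order described in this commit can be sketched in Python. This is a toy model under stated assumptions: `Netns`, `Sock`, `default_autolabel` and `autoflowlabel` are hypothetical names, and the mapping of sysctl modes to a default (1 and 3 label, 0 and 2 do not) is an assumption about what ip6_default_np_autolabel() returns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Netns:
    # Models the sysctl; 1 is the default named in the commit message.
    auto_flowlabels: int = 1


@dataclass
class Sock:
    # None means the IPV6_AUTOFLOWLABEL sockopt was never used on this sock.
    autoflowlabel_opt: Optional[bool] = None


def default_autolabel(net: Netns) -> bool:
    # Assumption: modes 1 and 3 label by default, modes 0 and 2 do not.
    return net.auto_flowlabels in (1, 3)


def autoflowlabel(net: Netns, sk: Sock) -> bool:
    # An explicit sockopt wins; otherwise the sysctl is re-read on every
    # call, so socks created earlier follow later sysctl changes.
    if sk.autoflowlabel_opt is not None:
        return sk.autoflowlabel_opt
    return default_autolabel(net)
```

A control socket created while the sysctl was 1 stops labeling as soon as the sysctl becomes 2, because nothing was cached at creation time.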
    • openvswitch: Fix pop_vlan action for double tagged frames · c48e7473
      Committed by Eric Garver
      skb_vlan_pop() expects skb->protocol to be a valid TPID for double
      tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
      shift the true ethertype into position for us.
      
      Fixes: 5108bbad ("openvswitch: add processing of L3 packets")
      Signed-off-by: Eric Garver <e@erig.me>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
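
To make the "shift the true ethertype into position" step concrete, here is a byte-level Python sketch of popping the outer tag from a double-tagged Ethernet frame. It is not skb_vlan_pop(): the function name `pop_outer_vlan` is hypothetical, and the sketch assumes both tags sit in the frame payload with TPIDs 0x88A8 or 0x8100.

```python
import struct

VLAN_TPIDS = {0x8100, 0x88A8}  # 802.1Q and 802.1ad tag protocol identifiers


def pop_outer_vlan(frame: bytes):
    """Remove the outermost VLAN tag.

    Returns (vlan_id, next_protocol, stripped_frame), where next_protocol
    is whatever now sits in the ethertype position: the inner TPID if the
    frame was double tagged, otherwise the true ethertype.
    """
    (tpid,) = struct.unpack_from("!H", frame, 12)
    if tpid not in VLAN_TPIDS:
        raise ValueError("frame is not VLAN tagged")
    (tci,) = struct.unpack_from("!H", frame, 14)
    # Drop the 4-byte tag (TPID + TCI) that follows the two MAC addresses.
    stripped = frame[:12] + frame[16:]
    (next_proto,) = struct.unpack_from("!H", stripped, 12)
    return tci & 0x0FFF, next_proto, stripped
```

After the first pop of a double-tagged frame the protocol field holds the inner TPID, and only a second pop exposes the real ethertype such as 0x0800.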
    • ipv6: Honor specified parameters in fibmatch lookup · 58acfd71
      Committed by Ido Schimmel
      Currently, parameters such as oif and source address are not taken into
      account during fibmatch lookup. Example (IPv4 for reference) before
      patch:
      
      $ ip -4 route show
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
      
      $ ip -6 route show
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
      fe80::/64 dev dummy0 proto kernel metric 256 pref medium
      fe80::/64 dev dummy1 proto kernel metric 256 pref medium
      
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy0
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy1
      RTNETLINK answers: No route to host
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      
      After:
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      RTNETLINK answers: Network is unreachable
      
      The problem stems from the fact that the necessary route lookup flags
      are not set based on these parameters.
      
      Instead of duplicating the same logic for fibmatch, we can simply
      resolve the original route from its copy and dump it instead.
      
      Fixes: 18c3a61c ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Dec, 2017 25 commits
  3. 20 Dec, 2017 10 commits
    • bpf: Fix tools and testing build. · 19c832ed
      Committed by David Miller
      I'm getting various build failures on sparc64.  The key is
      usually that the userland tools get built 32-bit.
      
      1) clock_gettime() is in librt, so that must be added to the link
         libraries.
      
      2) "sizeof(x)" must be printed with "%Z" printf prefix.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • net/mlx5: Stay in polling mode when command EQ destroy fails · a2fba188
      Committed by Moshe Shemesh
      During unload, in mlx5_stop_eqs, we move the command interface from
      events mode to polling mode, but if destroying the command interface EQ
      fails we move back to events mode.
      That's wrong, since even if we fail to destroy the command interface EQ,
      we do release its irq, so no interrupts will be received.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Cleanup IRQs in case of unload failure · d6b2785c
      Committed by Moshe Shemesh
      When mlx5_stop_eqs fails to destroy any of the EQs, it returns with an
      error. In that failure flow the function returns without releasing all
      EQs' irqs, and then pci_free_irq_vectors will fail.
      Fix this by only warning on EQ destroy failure and continuing to release
      the other EQs and their irqs.
      
      It fixes the following kernel trace:
      kernel: kernel BUG at drivers/pci/msi.c:352!
      ...
      ...
      kernel: Call Trace:
      kernel: pci_disable_msix+0xd3/0x100
      kernel: pci_free_irq_vectors+0xe/0x20
      kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
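
The "warn and continue" teardown pattern can be sketched in Python as follows. This is an illustration only: `stop_eqs`, `destroy_eq` and `free_irq` are hypothetical callables, not the mlx5 driver's functions, and a failed destroy is modeled as an `OSError`.

```python
import logging


def stop_eqs(eqs, destroy_eq, free_irq):
    """Tear down every EQ, warning on failure instead of bailing out.

    Returns the list of EQs whose destroy step failed.
    """
    failed = []
    for eq in eqs:
        try:
            destroy_eq(eq)
        except OSError as err:
            # Only warn: stopping here would leave later irqs allocated.
            logging.warning("failed to destroy EQ %s: %s", eq, err)
            failed.append(eq)
        # The irq is released regardless of the destroy result.
        free_irq(eq)
    return failed
```

Because the loop never returns early, every irq has been released by the time the caller frees the irq vectors, even when one destroy step fails.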
    • net/mlx5: Fix steering memory leak · 139ed6c6
      Committed by Maor Gottlieb
      Flow steering priority and namespace are software only objects that
      didn't have the proper destructors and were not freed during steering
      cleanup.
      
      Fix it by adding destructor functions for these objects.
      
      Fixes: bd71b08e ("net/mlx5: Support multiple updates of steering rules in parallel")
      Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Prevent possible races in VXLAN control flow · 0c1cc8b2
      Committed by Gal Pressman
      When calling add/remove VXLAN port, a lock must be held in order to
      prevent race scenarios when more than one add/remove happens at the
      same time.
      Fix by holding our state_lock (mutex) as done by all other parts of the
      driver.
      Note that the spinlock protecting the radix-tree is still needed in
      order to synchronize radix-tree access from softirq context.
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Add refcount to VXLAN structure · 23f4cc2c
      Committed by Gal Pressman
      A refcount mechanism must be implemented in order to prevent unwanted
      scenarios such as:
      - Open an IPv4 VXLAN interface
      - Open an IPv6 VXLAN interface (different socket)
      - Remove one of the interfaces
      
      With the current implementation, the UDP port is removed from our VXLAN
      database and the offloads are turned off for the other interface, which
      is still active.
      The reference count mechanism only allows a UDP port to be removed once
      all its consumers are gone.
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
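
A minimal Python sketch of the reference counting idea, assuming a table keyed by UDP port. The class `VxlanPortTable` and its methods are hypothetical and do not mirror the driver's data structures; the return values stand in for "program the offload" and "remove the offload".

```python
class VxlanPortTable:
    """Toy refcounted table of offloaded VXLAN UDP ports."""

    def __init__(self):
        self._refs = {}

    def add_port(self, port):
        """Return True only for the first user, when the offload is enabled."""
        count = self._refs.get(port, 0)
        self._refs[port] = count + 1
        return count == 0

    def del_port(self, port):
        """Return True only when the last user is gone and the offload is removed."""
        count = self._refs.get(port, 0)
        if count == 0:
            return False  # unknown port, nothing to remove
        if count == 1:
            del self._refs[port]
            return True
        self._refs[port] = count - 1
        return False

    def lookup(self, port):
        return port in self._refs
```

With an IPv4 and an IPv6 interface both using port 4789, removing one of them leaves the port in the table, so the offloads for the remaining interface stay on.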
    • net/mlx5e: Fix possible deadlock of VXLAN lock · 63235141
      Committed by Gal Pressman
      mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user
      context) and mlx5e_features_check (softirq), but the lock acquired does
      not disable bottom halves, which might result in a deadlock. Fix it by
      simply replacing spin_lock() with spin_lock_bh().
      While at it, replace all unnecessary spin_lock_irq() calls with
      spin_lock_bh().
      
      lockdep's WARNING: inconsistent lock state
      [  654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
      [  654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes:
      [  654.028321]  (&(&vxlan_db->lock)->rlock){+.?.}, at: [<ffffffffa06e7f0e>] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
      [  654.028528] {SOFTIRQ-ON-W} state was registered at:
      [  654.028607]   _raw_spin_lock+0x3c/0x70
      [  654.028689]   mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core]
      [  654.028794]   mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core]
      [  654.028878]   process_one_work+0x1e9/0x640
      [  654.028942]   worker_thread+0x4a/0x3f0
      [  654.029002]   kthread+0x141/0x180
      [  654.029056]   ret_from_fork+0x24/0x30
      [  654.029114] irq event stamp: 579088
      [  654.029174] hardirqs last  enabled at (579088): [<ffffffff818f475a>] ip6_finish_output2+0x49a/0x8c0
      [  654.029309] hardirqs last disabled at (579087): [<ffffffff818f470e>] ip6_finish_output2+0x44e/0x8c0
      [  654.029446] softirqs last  enabled at (579030): [<ffffffff810b3b3d>] irq_enter+0x6d/0x80
      [  654.029567] softirqs last disabled at (579031): [<ffffffff810b3c05>] irq_exit+0xb5/0xc0
      [  654.029684] other info that might help us debug this:
      [  654.029781]  Possible unsafe locking scenario:
      
      [  654.029868]        CPU0
      [  654.029908]        ----
      [  654.029947]   lock(&(&vxlan_db->lock)->rlock);
      [  654.030045]   <Interrupt>
      [  654.030090]     lock(&(&vxlan_db->lock)->rlock);
      [  654.030162]
       *** DEADLOCK ***
      
      Fixes: b3f63c3d ("net/mlx5e: Add netdev support for VXLAN tunneling")
      Signed-off-by: Gal Pressman <galp@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix error flow in CREATE_QP command · dbff26e4
      Committed by Moni Shoua
      In the error flow, when the DESTROY_QP command should be executed, the
      wrong mailbox was filled with data, not the one that is written to
      hardware. Fix that.
      
      Fixes: 09a7d9ec '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc'
      Signed-off-by: Moni Shoua <monis@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5: Fix misspelling in the error message and comment · 777ec2b2
      Committed by Eugenia Emantayev
      Fix misspelling in word syndrome.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
    • net/mlx5e: Fix defaulting RX ring size when not needed · 696a97cf
      Committed by Eugenia Emantayev
      Fix a bug where turning the CQE compression mechanism on or off
      resets the RX ring size to the default value when that is not
      needed.
      
      Fixes: 2fc4bfb7 ("net/mlx5e: Dynamic RQ type infrastructure")
      Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>