1. 02 9月, 2016 1 次提交
    • P
      tipc: fix random link resets while adding a second bearer · d2f394dc
      Parthasarathy Bhuvaragan 提交于
      In a dual bearer configuration, if the second tipc link becomes
      active while the first link still has pending nametable "bulk"
      updates, it randomly leads to reset of the second link.
      
      When a link is established, the function named_distribute(),
      fills the skb based on node mtu (allows room for TUNNEL_PROTOCOL)
      with NAME_DISTRIBUTOR message for each PUBLICATION.
      However, the function named_distribute() allocates the buffer by
      increasing the node mtu by INT_H_SIZE (to insert NAME_DISTRIBUTOR).
      This consumes the space allocated for TUNNEL_PROTOCOL.
      
      When establishing the second link, the link shall tunnel all the
      messages in the first link queue including the "bulk" update.
      As size of the NAME_DISTRIBUTOR messages while tunnelling, exceeds
      the link mtu the transmission fails (-EMSGSIZE).
      
      Thus, the synch point based on the message count of the tunnel
      packets is never reached leading to link timeout.
      
      In this commit, we adjust the size of name distributor message so that
      they can be tunnelled.
      Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2f394dc
  2. 01 9月, 2016 2 次提交
  3. 30 8月, 2016 2 次提交
  4. 26 8月, 2016 4 次提交
  5. 25 8月, 2016 6 次提交
  6. 24 8月, 2016 7 次提交
  7. 23 8月, 2016 3 次提交
    • J
      net sched: fix encoding to use real length · 28a10c42
      Jamal Hadi Salim 提交于
      Encoding of the metadata was using the padded length as opposed to
      the real length of the data which is a bug per specification.
      This has not been an issue todate because all metadatum specified
      so far has been 32 bit where aligned and data length are the same width.
      This also includes a bug fix for validating the length of a u16 field.
      But since there is no metadata of size u16 yes we are fine to include it
      here.
      
      While at it get rid of magic numbers.
      
      Fixes: ef6980b6 ("net sched: introduce IFE action")
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28a10c42
    • S
      net: ip_finish_output_gso: Allow fragmenting segments of tunneled skbs if their DF is unset · c0451fe1
      Shmulik Ladkani 提交于
      In b8247f09,
      
         "net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs"
      
      gso skbs arriving from an ingress interface that go through UDP
      tunneling, are allowed to be fragmented if the resulting encapulated
      segments exceed the dst mtu of the egress interface.
      
      This aligned the behavior of gso skbs to non-gso skbs going through udp
      encapsulation path.
      
      However the non-gso vs gso anomaly is present also in the following
      cases of a GRE tunnel:
       - ip_gre in collect_md mode, where TUNNEL_DONT_FRAGMENT is not set
         (e.g. OvS vport-gre with df_default=false)
       - ip_gre in nopmtudisc mode, where IFLA_GRE_IGNORE_DF is set
      
      In both of the above cases, the non-gso skbs get fragmented, whereas the
      gso skbs (having skb_gso_network_seglen that exceeds dst mtu) get dropped,
      as they don't go through the segment+fragment code path.
      
      Fix: Setting IPSKB_FRAG_SEGS if the tunnel specified IP_DF bit is NOT set.
      
      Tunnels that do set IP_DF, will not go to fragmentation of segments.
      This preserves behavior of ip_gre in (the default) pmtudisc mode.
      
      Fixes: b8247f09 ("net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, allow segmentation for local udp tunneled skbs")
      Reported-by: Nwenxu <wenxu@ucloud.cn>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com>
      Tested-by: Nwenxu <wenxu@ucloud.cn>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0451fe1
    • M
      net: ipv6: Remove addresses for failures with strict DAD · 85b51b12
      Mike Manning 提交于
      If DAD fails with accept_dad set to 2, global addresses and host routes
      are incorrectly left in place. Even though disable_ipv6 is set,
      contrary to documentation, the addresses are not dynamically deleted
      from the interface. It is only on a subsequent link down/up that these
      are removed. The fix is not only to set the disable_ipv6 flag, but
      also to call addrconf_ifdown(), which is the action to carry out when
      disabling IPv6. This results in the addresses and routes being deleted
      immediately. The DAD failure for the LL addr is determined as before
      via netlink, or by the absence of the LL addr (which also previously
      would have had to be checked for in case of an intervening link down
      and up). As the call to addrconf_ifdown() requires an rtnl lock, the
      logic to disable IPv6 when DAD fails is moved to addrconf_dad_work().
      
      Previous behavior:
      
      root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
      net.ipv6.conf.eth3.accept_dad = 2
      root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2000::10/64 scope global
             valid_lft forever preferred_lft forever
          inet6 fe80::5054:ff:fe43:dd5a/64 scope link tentative dadfailed
             valid_lft forever preferred_lft forever
      root@vm1:/# ip -6 route show dev eth3
      2000::/64  proto kernel  metric 256
      fe80::/64  proto kernel  metric 256
      root@vm1:/# ip link set down eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      root@vm1:/# ip -6 route show dev eth3
      root@vm1:/#
      
      New behavior:
      
      root@vm1:/# sysctl net.ipv6.conf.eth3.accept_dad=2
      net.ipv6.conf.eth3.accept_dad = 2
      root@vm1:/# ip -6 addr add 2000::10/64 dev eth3
      root@vm1:/# ip link set up eth3
      root@vm1:/# ip -6 addr show dev eth3
      root@vm1:/# ip -6 route show dev eth3
      root@vm1:/#
      Signed-off-by: NMike Manning <mmanning@brocade.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85b51b12
  8. 20 8月, 2016 2 次提交
    • G
      l2tp: Fix the connect status check in pppol2tp_getname · 56cff471
      Gao Feng 提交于
      The sk->sk_state is bits flag, so need use bit operation check
      instead of value check.
      Signed-off-by: NGao Feng <fgao@ikuai8.com>
      Tested-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56cff471
    • M
      sctp: linearize early if it's not GSO · 4c2f2454
      Marcelo Ricardo Leitner 提交于
      Because otherwise when crc computation is still needed it's way more
      expensive than on a linear buffer to the point that it affects
      performance.
      
      It's so expensive that netperf test gives a perf output as below:
      
      Overhead  Command         Shared Object       Symbol
        18,62%  netserver       [kernel.vmlinux]    [k] crc32_generic_shift
         2,57%  netserver       [kernel.vmlinux]    [k] __pskb_pull_tail
         1,94%  netserver       [kernel.vmlinux]    [k] fib_table_lookup
         1,90%  netserver       [kernel.vmlinux]    [k] copy_user_enhanced_fast_string
         1,66%  swapper         [kernel.vmlinux]    [k] intel_idle
         1,63%  netserver       [kernel.vmlinux]    [k] _raw_spin_lock
         1,59%  netserver       [sctp]              [k] sctp_packet_transmit
         1,55%  netserver       [kernel.vmlinux]    [k] memcpy_erms
         1,42%  netserver       [sctp]              [k] sctp_rcv
      
      # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      212992 212992  12000    10.00      3016.42   2.88     3.78     1.874   2.462
      
      After patch:
      Overhead  Command         Shared Object      Symbol
         2,75%  netserver       [kernel.vmlinux]   [k] memcpy_erms
         2,63%  netserver       [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
         2,39%  netserver       [kernel.vmlinux]   [k] fib_table_lookup
         2,04%  netserver       [kernel.vmlinux]   [k] __pskb_pull_tail
         1,91%  netserver       [kernel.vmlinux]   [k] _raw_spin_lock
         1,91%  netserver       [sctp]             [k] sctp_packet_transmit
         1,72%  netserver       [mlx4_en]          [k] mlx4_en_process_rx_cq
         1,68%  netserver       [sctp]             [k] sctp_rcv
      
      # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      212992 212992  12000    10.00      3681.77   3.83     3.46     2.045   1.849
      
      Fixes: 3acb50c1 ("sctp: delay as much as possible skb_linearize")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c2f2454
  9. 19 8月, 2016 1 次提交
  10. 18 8月, 2016 9 次提交
  11. 17 8月, 2016 2 次提交
  12. 16 8月, 2016 1 次提交
    • V
      tipc: fix NULL pointer dereference in shutdown() · d2fbdf76
      Vegard Nossum 提交于
      tipc_msg_create() can return a NULL skb and if so, we shouldn't try to
      call tipc_node_xmit_skb() on it.
      
          general protection fault: 0000 [#1] PREEMPT SMP KASAN
          CPU: 3 PID: 30298 Comm: trinity-c0 Not tainted 4.7.0-rc7+ #19
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
          task: ffff8800baf09980 ti: ffff8800595b8000 task.ti: ffff8800595b8000
          RIP: 0010:[<ffffffff830bb46b>]  [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140
          RSP: 0018:ffff8800595bfce8  EFLAGS: 00010246
          RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000003023b0e0
          RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffffffff83d12580
          RBP: ffff8800595bfd78 R08: ffffed000b2b7f32 R09: 0000000000000000
          R10: fffffbfff0759725 R11: 0000000000000000 R12: 1ffff1000b2b7f9f
          R13: ffff8800595bfd58 R14: ffffffff83d12580 R15: dffffc0000000000
          FS:  00007fcdde242700(0000) GS:ffff88011af80000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00007fcddde1db10 CR3: 000000006874b000 CR4: 00000000000006e0
          DR0: 00007fcdde248000 DR1: 00007fcddd73d000 DR2: 00007fcdde248000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000090602
          Stack:
           0000000000000018 0000000000000018 0000000041b58ab3 ffffffff83954208
           ffffffff830bb400 ffff8800595bfd30 ffffffff8309d767 0000000000000018
           0000000000000018 ffff8800595bfd78 ffffffff8309da1a 00000000810ee611
          Call Trace:
           [<ffffffff830c84a3>] tipc_shutdown+0x553/0x880
           [<ffffffff825b4a3b>] SyS_shutdown+0x14b/0x170
           [<ffffffff8100334c>] do_syscall_64+0x19c/0x410
           [<ffffffff83295ca5>] entry_SYSCALL64_slow_path+0x25/0x25
          Code: 90 00 b4 0b 83 c7 00 f1 f1 f1 f1 4c 8d 6d e0 c7 40 04 00 00 00 f4 c7 40 08 f3 f3 f3 f3 48 89 d8 48 c1 e8 03 c7 45 b4 00 00 00 00 <80> 3c 30 00 75 78 48 8d 7b 08 49 8d 75 c0 48 b8 00 00 00 00 00
          RIP  [<ffffffff830bb46b>] tipc_node_xmit_skb+0x6b/0x140
           RSP <ffff8800595bfce8>
          ---[ end trace 57b0484e351e71f1 ]---
      
      I feel like we should maybe return -ENOMEM or -ENOBUFS, but I'm not sure
      userspace is equipped to handle that. Anyway, this is better than a GPF
      and looks somewhat consistent with other tipc_msg_create() callers.
      Signed-off-by: NVegard Nossum <vegard.nossum@oracle.com>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2fbdf76