1. 14 Aug, 2013 3 commits
  2. 10 Aug, 2013 2 commits
  3. 08 Aug, 2013 2 commits
    • tcp: cubic: fix bug in bictcp_acked() · cd6b423a
      Eric Dumazet authored
      While investigating a strange increase in retransmit rates
      on hosts ~24 days after boot, Van found that hystart was disabled
      if ca->epoch_start was 0, because the following condition is true
      when the high-order bit of tcp_time_stamp is set:
      
      (s32)(tcp_time_stamp - ca->epoch_start) < HZ
      
      Quoting Van:
      
       At initialization & after every loss ca->epoch_start is set to zero so
       I believe that the above line will turn off hystart as soon as the 2^31
       bit is set in tcp_time_stamp & hystart will stay off for 24 days.
       I think we've observed that cubic's restart is too aggressive without
       hystart so this might account for the higher drop rate we observe.
      Diagnosed-by: Van Jacobson <vanj@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
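The wraparound described above can be reproduced in user space. The sketch below is only a model: it assumes HZ=1000 (the case the ~24 day figure refers to), uses fixed-width types in place of the kernel's s32/u32, and the guarded variant is one plausible way to skip the test when no epoch has started. The commit's actual diff is not shown on this page.

```c
#include <stdbool.h>
#include <stdint.h>

#define HZ 1000u /* assumed tick rate: 2^31 ticks is roughly 24.8 days */

/* The quoted condition, modelled with fixed-width types.
 * Converting an out-of-range value to int32_t is implementation-defined
 * in older C standards; mainstream compilers wrap modulo 2^32. */
static bool within_first_second(uint32_t now, uint32_t epoch_start)
{
    return (int32_t)(now - epoch_start) < (int32_t)HZ;
}

/* Hypothetical guard: treat epoch_start == 0 as "no epoch yet". */
static bool within_first_second_guarded(uint32_t now, uint32_t epoch_start)
{
    return epoch_start != 0 && within_first_second(now, epoch_start);
}
```

With epoch_start equal to 0 and the top bit of the timestamp set, the unguarded test reads the difference as a large negative number and stays true, which matches the "hystart will stay off" behaviour in the quote.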
    • tcp: cubic: fix overflow error in bictcp_update() · 2ed0edf9
      Eric Dumazet authored
      commit 17a6e9f1 ("tcp_cubic: fix clock dependency") introduced an
      overflow error in bictcp_update(), in the following code:
      
      /* change the unit from HZ to bictcp_HZ */
      t = ((tcp_time_stamp + msecs_to_jiffies(ca->delay_min>>3) -
            ca->epoch_start) << BICTCP_HZ) / HZ;
      
      Because msecs_to_jiffies() returns unsigned long, the compiler
      implicitly promotes the whole expression to that type.
      
      We really want to constrain (tcp_time_stamp - ca->epoch_start)
      to a signed 32-bit value, or else 't' has unexpectedly high values.
      
      This bug triggers an increase of retransmit rates ~24 days after
      boot [1], as the high-order bit of tcp_time_stamp flips.
      
      [1] for hosts with HZ=1000
      
      Big thanks to Van Jacobson for spotting this problem.
      Diagnosed-by: Van Jacobson <vanj@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
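A small user-space model may help show why the promotion matters. It assumes a 64-bit unsigned long (modelled here as uint64_t), HZ=1000, and BICTCP_HZ=10. The function names are hypothetical, and the second function is only one way of constraining the difference to 32 bits, not necessarily the committed code.

```c
#include <stdint.h>

#define HZ        1000u /* assumed tick rate */
#define BICTCP_HZ 10    /* assumed shift, i.e. 1024 bictcp units per second */

/* Mirrors the quoted expression when unsigned long is 64 bits wide:
 * the timestamp is widened before the subtraction, so the usual
 * modulo-2^32 wraparound of the 32-bit counters is lost. */
static uint64_t elapsed_promoted(uint32_t now, uint32_t epoch_start,
                                 uint64_t delay_jiffies)
{
    return (((uint64_t)now + delay_jiffies - epoch_start) << BICTCP_HZ) / HZ;
}

/* Takes the difference in 32-bit arithmetic first, then widens. */
static uint64_t elapsed_constrained(uint32_t now, uint32_t epoch_start,
                                    uint64_t delay_jiffies)
{
    int32_t delta = (int32_t)(now - epoch_start); /* 32-bit modular difference */
    uint64_t t = (uint64_t)(int64_t)delta + delay_jiffies;

    return (t << BICTCP_HZ) / HZ;
}
```

When the 32-bit timestamp has wrapped past epoch_start, the promoted form yields an enormous elapsed time while the constrained form yields the real, small one. The two agree whenever no wrap has occurred.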
  4. 06 Aug, 2013 2 commits
  5. 03 Aug, 2013 1 commit
  6. 25 Jul, 2013 1 commit
  7. 20 Jul, 2013 1 commit
  8. 17 Jul, 2013 1 commit
    • ipv4: set transport header earlier · 21d1196a
      Eric Dumazet authored
      commit 45f00f99 ("ipv4: tcp: clean up tcp_v4_early_demux()") added a
      performance regression for non GRO traffic, basically disabling
      IP early demux.
      
      IPv6 stack resets transport header in ip6_rcv() before calling
      IP early demux in ip6_rcv_finish(), while IPv4 does this only in
      ip_local_deliver_finish(), _after_ IP early demux.
      
      GRO traffic happened to enable IP early demux because the transport
      header is also set in inet_gro_receive().
      
      Instead of reverting the faulty commit, we can make IPv4 and IPv6 behave
      the same: transport_header should be set in ip_rcv() instead of
      ip_local_deliver_finish().
      
      ip_local_deliver_finish() can also use skb_network_header_len(), which is
      faster than ip_hdrlen().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
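As a rough illustration of what "setting the transport header" amounts to, the sketch below computes where the transport header starts from the IPv4 IHL field. It works on a plain byte buffer rather than an sk_buff, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the transport header inside pkt, or -1 if the
 * IPv4 header at network_offset is truncated or malformed. */
static long transport_offset(const uint8_t *pkt, size_t len,
                             size_t network_offset)
{
    if (network_offset >= len)
        return -1;

    uint8_t version = pkt[network_offset] >> 4;
    size_t ihl_bytes = (size_t)(pkt[network_offset] & 0x0f) * 4;

    /* A valid IPv4 header is at least 20 bytes and must fit in the buffer. */
    if (version != 4 || ihl_bytes < 20 || network_offset + ihl_bytes > len)
        return -1;

    return (long)(network_offset + ihl_bytes);
}
```

Once this offset is known at receive time, later stages such as early demux can locate the TCP or UDP header without recomputing it.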
  9. 13 Jul, 2013 1 commit
  10. 12 Jul, 2013 3 commits
    • gre: Fix MTU sizing check for gretap tunnels · 8c91e162
      Alexander Duyck authored
      This change fixes an MTU sizing issue seen with gretap tunnels when non-gso
      packets are sent from the interface.
      
      In my case I was able to reproduce the issue by simply sending a ping of
      1421 bytes with the gretap interface created on a device with a standard
      1500 mtu.
      
      This fix is based on the fact that the tunnel mtu is already adjusted by
      dev->hard_header_len so it would make sense that any packets being compared
      against that mtu should also be adjusted by hard_header_len and the tunnel
      header instead of just the tunnel header.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Reported-by: Cong Wang <amwang@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
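One reading that is consistent with the 1421-byte reproducer is sketched below. The header sizes are assumptions: a 20-byte outer IPv4 header, a 4-byte GRE header without options, a 14-byte inner Ethernet header, and "1421 bytes" taken as the ICMP payload size. The function names are hypothetical and the comparison is a simplification of the driver logic.

```c
enum {
    INNER_ETH_HLEN  = 14, /* assumed hard_header_len of the gretap device */
    OUTER_IPV4_HLEN = 20,
    GRE_BASE_HLEN   = 4,
    INNER_IPV4_HLEN = 20,
    ICMP_HLEN       = 8
};

/* Tunnel MTU already reduced by the inner Ethernet header. */
static int gretap_mtu(int underlay_mtu)
{
    return underlay_mtu - OUTER_IPV4_HLEN - GRE_BASE_HLEN - INNER_ETH_HLEN;
}

/* Length of the inner Ethernet frame carrying an ICMP echo request. */
static int ping_frame_len(int payload)
{
    return INNER_ETH_HLEN + INNER_IPV4_HLEN + ICMP_HLEN + payload;
}

/* Compares the full frame against an MTU that excludes the Ethernet header. */
static int exceeds_mtu_unadjusted(int frame_len, int mtu)
{
    return frame_len > mtu;
}

/* Also removes the Ethernet header from the length before comparing. */
static int exceeds_mtu_adjusted(int frame_len, int mtu)
{
    return frame_len - INNER_ETH_HLEN > mtu;
}
```

Under these assumptions the tunnel MTU is 1462, a 1421-byte ping produces a 1463-byte frame, and only the unadjusted comparison rejects it.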
    • gso: Update tunnel segmentation to support Tx checksum offload · cdbaa0bb
      Alexander Duyck authored
      This change makes it so that the GRE and VXLAN tunnels can make use of Tx
      checksum offload support provided by some drivers via the hw_enc_features.
      Without this fix enabling GSO means sacrificing Tx checksum offload and
      this actually leads to a performance regression as shown below:
      
                  Utilization
                  Send
      Throughput  local         GSO
      10^6bits/s  % S           state
        6276.51   8.39          enabled
        7123.52   8.42          disabled
      
      To resolve this it was necessary to address two items.  First
      netif_skb_features needed to be updated so that it would correctly handle
      the Trans Ether Bridging protocol without impacting the need to check for
      Q-in-Q tagging.  To do this it was necessary to update harmonize_features
      so that it used skb_network_protocol instead of just using the outer
      protocol.
      
      Second it was necessary to update the GRE and UDP tunnel segmentation
      offloads so that they would reset the encapsulation bit and inner header
      offsets after the offload was complete.
      
      As a result of this change I have seen the following results on an interface
      with Tx checksum enabled for encapsulated frames:
      
                  Utilization
                  Send
      Throughput  local         GSO
      10^6bits/s  % S           state
        7123.52   8.42          disabled
        8321.75   5.43          enabled
      
      v2: Instead of replacing the reference to skb->protocol with
          skb_network_protocol, just replace the protocol reference in
          harmonize_features to allow for double VLAN tag checks.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
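The idea of looking past the outer protocol to the one that actually describes the payload can be sketched in user space as follows. This is a simplified illustration that only skips stacked VLAN tags (the Q-in-Q case); it does not handle Trans Ether Bridging, and it is not the kernel's skb_network_protocol. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_8021Q  0x8100u /* customer VLAN tag */
#define ETHERTYPE_8021AD 0x88a8u /* service VLAN tag (outer Q-in-Q tag) */

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns the EtherType found after any stacked VLAN tags,
 * or 0 if the frame is too short to contain one. */
static uint16_t inner_ethertype(const uint8_t *frame, size_t len)
{
    size_t off = 12; /* EtherType follows two 6-byte MAC addresses */

    while (off + 2 <= len) {
        uint16_t type = read_be16(frame + off);

        if (type != ETHERTYPE_8021Q && type != ETHERTYPE_8021AD)
            return type;
        off += 4; /* skip the 2-byte TPID and the 2-byte TCI */
    }
    return 0;
}
```

A feature check keyed on the value this returns sees IPv4 (0x0800) for a double-tagged IPv4 frame instead of the outer tag type.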
    • inet: fix spacing in assignment · 3b8ccd44
      Camelia Groza authored
      Found using checkpatch.pl
      Signed-off-by: Camelia Groza <camelia.groza@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 11 Jul, 2013 2 commits
  12. 09 Jul, 2013 1 commit
  13. 04 Jul, 2013 2 commits
  14. 03 Jul, 2013 2 commits
    • ip_tunnels: Use skb-len to PMTU check. · 23a3647b
      Pravin B Shelar authored
      In the path MTU check, the IP header total length works for a gre device
      but not for a gre-tap device.  Use skb->len, which is consistent
      for all tunneling types.  This is an old bug in gre.
      This also fixes an MTU calculation bug introduced by
      commit c5441932 (GRE: Refactor GRE tunneling code).
      Reported-by: Timo Teras <timo.teras@iki.fi>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET pending data · 8822b64a
      Hannes Frederic Sowa authored
      We accidentally call down to ip6_push_pending_frames when uncorking
      pending AF_INET data on an IPv6 socket. This results in the following
      splat (from Dave Jones):
      
      skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL>
      ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:126!
      invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth
      +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c
      CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37
      task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000
      RIP: 0010:[<ffffffff816e759c>]  [<ffffffff816e759c>] skb_panic+0x63/0x65
      RSP: 0018:ffff8801e6431de8  EFLAGS: 00010282
      RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006
      RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520
      RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800
      R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800
      FS:  00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Stack:
       ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4
       ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6
       ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0
      Call Trace:
       [<ffffffff8159a9aa>] skb_push+0x3a/0x40
       [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0
       [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140
       [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0
       [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20
       [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0
       [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0
       [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40
       [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20
       [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0
       [<ffffffff816f5d54>] tracesys+0xdd/0xe2
      Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55
      RIP  [<ffffffff816e759c>] skb_panic+0x63/0x65
       RSP <ffff8801e6431de8>
      
      This patch adds a check for whether the pending data is of address family
      AF_INET and, if so, calls udp_push_pending_frames directly from
      udp_v6_push_pending_frames.
      
      This bug was found by Dave Jones with trinity.
      
      (Also move the initialization of fl6 below the AF_INET check, even if
      not strictly necessary.)
      
      Cc: Dave Jones <davej@redhat.com>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
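The dispatch described in this commit can be modelled as a choice based on the address family of the corked data. The struct and enum below are hypothetical stand-ins rather than kernel types. They only illustrate that the family of the pending data, not the family of the socket, selects the push routine.

```c
#include <sys/socket.h> /* AF_INET, AF_INET6 */

enum push_path { PUSH_NONE, PUSH_IPV4, PUSH_IPV6 };

/* Hypothetical model of the cork state: 0 when nothing is pending,
 * otherwise the address family of the data that was corked. */
struct cork_state {
    int pending;
};

/* Picks the push routine from the family of the pending data. */
static enum push_path choose_push_path(const struct cork_state *s)
{
    if (s->pending == AF_INET)
        return PUSH_IPV4; /* IPv4 data queued on an IPv6 socket */
    if (s->pending == AF_INET6)
        return PUSH_IPV6;
    return PUSH_NONE;
}
```

Sending IPv4 data down the IPv6 path means pushing a 40-byte IPv6 header into headroom sized for IPv4, which fits the "put:40" in the skb_under_panic line of the trace.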
  15. 02 Jul, 2013 3 commits
    • ipip: fix a regression in ioctl · 3b7b514f
      Cong Wang authored
      This is a regression introduced by
      commit fd58156e (IPIP: Use ip-tunneling code.)
      
      As with the GRE tunnel, we previously checked the parameters only
      for SIOCADDTUNNEL and SIOCCHGTUNNEL; after that commit, the
      check is applied to all commands.
      
      So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
      
      Also, the check for i_key, o_key etc., which did not exist
      before, is suspicious too; reset them before passing
      to ip_tunnel_ioctl().
      
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vti: remove duplicated code to fix a memory leak · ab6c7a0a
      Cong Wang authored
      The vti module allocates dev->tstats twice, in vti_fb_tunnel_init()
      and in vti_tunnel_init(), which leads to a memory leak of
      dev->tstats.
      
      Just remove the duplicated operations in vti_fb_tunnel_init().
      
      (candidate for -stable)
      
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Saurabh Mohan <saurabh.mohan@vyatta.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Acked-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: fix a regression in ioctl · 6c734fb8
      Cong Wang authored
      When testing GRE tunnel, I got:
      
       # ip tunnel show
       get tunnel gre0 failed: Invalid argument
       get tunnel gre1 failed: Invalid argument
      
      This is a regression introduced by commit c5441932
      ("GRE: Refactor GRE tunneling code.") because previously we
      only checked the parameters for SIOCADDTUNNEL and SIOCCHGTUNNEL;
      after that commit, the check is applied to all commands.
      
      So, just check for SIOCADDTUNNEL and SIOCCHGTUNNEL.
      
      After this patch I got:
      
       # ip tunnel show
       gre0: gre/ip  remote any  local any  ttl inherit  nopmtudisc
       gre1: gre/ip  remote 192.168.122.101  local 192.168.122.45  ttl inherit
      
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
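The shape of this regression, which is shared with the ipip commit above, can be shown with a small model. The command enum stands in for SIOCGETTUNNEL, SIOCADDTUNNEL, SIOCDELTUNNEL and SIOCCHGTUNNEL, and the parameter struct and validity rule are hypothetical simplifications. A "get" request arrives with mostly empty parameters, so validating it unconditionally fails.

```c
#include <errno.h>

/* Stand-ins for the tunnel ioctl commands. */
enum tunnel_cmd { CMD_GET, CMD_ADD, CMD_DEL, CMD_CHG };

/* Hypothetical subset of the parameters a caller passes in. */
struct tunnel_params {
    int version;
    int ihl;
};

static int params_valid(const struct tunnel_params *p)
{
    return p->version == 4 && p->ihl == 5;
}

/* Regressed behaviour: every command is validated. */
static int ioctl_checked_always(enum tunnel_cmd cmd,
                                const struct tunnel_params *p)
{
    (void)cmd;
    return params_valid(p) ? 0 : -EINVAL;
}

/* Intended behaviour: only commands that create or change a tunnel
 * need well-formed parameters. */
static int ioctl_checked_on_write(enum tunnel_cmd cmd,
                                  const struct tunnel_params *p)
{
    if ((cmd == CMD_ADD || cmd == CMD_CHG) && !params_valid(p))
        return -EINVAL;
    return 0;
}
```

In this model the always-validating variant returns -EINVAL for a CMD_GET with zeroed parameters, which mirrors the "Invalid argument" output from `ip tunnel show` quoted in the commit message.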
  16. 29 Jun, 2013 1 commit
    • ipv4: use next hop exceptions also for input routes · 2ffae99d
      Timo Teräs authored
      Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
      assumed that "locally destined, and routed packets, never trigger
      PMTU events or redirects that will be processed by us".
      
      However, it seems that tunnel devices do trigger PMTU events in certain
      cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
      skb_dst(skb)->ops->update_pmtu to propagate MTU information from the
      outer flows. These can cause the inner flow MTU to be decreased. If
      next hop exceptions are not consulted for PMTU, IP fragmentation will
      not be done properly for these routes.
      
      It also seems that we really need to have the PMTU information always
      for netfilter TCPMSS clamp-to-pmtu feature to work properly.
      
      So for the time being, cache separate copies of input routes for
      each next hop exception.
      Signed-off-by: Timo Teräs <timo.teras@iki.fi>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 28 Jun, 2013 2 commits
    • netlink: fix splat in skb_clone with large messages · 3a36515f
      Pablo Neira authored
      Since (c05cdb1b netlink: allow large data transfers from user-space),
      netlink splats if it invokes skb_clone on large netlink skbs since:
      
      * skb_shared_info was not correctly initialized.
      * skb->destructor is not set in the cloned skb.
      
      This was spotted by trinity:
      
      [  894.990671] BUG: unable to handle kernel paging request at ffffc9000047b001
      [  894.991034] IP: [<ffffffff81a212c4>] skb_clone+0x24/0xc0
      [...]
      [  894.991034] Call Trace:
      [  894.991034]  [<ffffffff81ad299a>] nl_fib_input+0x6a/0x240
      [  894.991034]  [<ffffffff81c3b7e6>] ? _raw_read_unlock+0x26/0x40
      [  894.991034]  [<ffffffff81a5f189>] netlink_unicast+0x169/0x1e0
      [  894.991034]  [<ffffffff81a601e1>] netlink_sendmsg+0x251/0x3d0
      
      Fix it by:
      
      1) introducing a new netlink_skb_clone function that is used in nl_fib_input,
         that sets our special skb->destructor in the cloned skb. Moreover, handle
         the release of the large cloned skb head area in the destructor path.
      
      2) not allowing large skbuffs in the netlink broadcast path. I cannot find
         any reasonable use of the large data transfer using netlink in that path,
         moreover this helps to skip extra skb_clone handling.
      
      I found two more netlink clients that are cloning the skbs, but they are
      not in the sendmsg path. Therefore, the sole client cloning that I found
      seems to be the fib frontend.
      
      Thanks to Eric Dumazet for helping to address this issue.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: add support of x-netns · 5e6700b3
      Nicolas Dichtel authored
      This patch allows switching the netns when a packet is encapsulated or
      decapsulated. In other words, the encapsulated packet is received in one netns,
      where the lookup is done to find the tunnel. Once the tunnel is found, the
      packet is decapsulated and injected into the corresponding interface, which
      resides in another netns.
      
      When one of the two netns is removed, the tunnel is destroyed.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 27 Jun, 2013 1 commit
  19. 26 Jun, 2013 1 commit
  20. 24 Jun, 2013 1 commit
  21. 20 Jun, 2013 7 commits