1. 27 Mar 2013, 1 commit
  2. 25 Mar 2013, 1 commit
  3. 22 Mar 2013, 1 commit
    • tcp: preserve ACK clocking in TSO · f4541d60
      Committed by Eric Dumazet
      A long-standing problem with TSO is that tcp_tso_should_defer()
      rearms the deferred timer when it should not.
      
      The current code leads to the following bad, bursty behavior:
      
      20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
      20:11:24.484337 IP B > A: . ack 263721 win 1117
      20:11:24.485086 IP B > A: . ack 265241 win 1117
      20:11:24.485925 IP B > A: . ack 266761 win 1117
      20:11:24.486759 IP B > A: . ack 268281 win 1117
      20:11:24.487594 IP B > A: . ack 269801 win 1117
      20:11:24.488430 IP B > A: . ack 271321 win 1117
      20:11:24.489267 IP B > A: . ack 272841 win 1117
      20:11:24.490104 IP B > A: . ack 274361 win 1117
      20:11:24.490939 IP B > A: . ack 275881 win 1117
      20:11:24.491775 IP B > A: . ack 277401 win 1117
      20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
      20:11:24.492620 IP B > A: . ack 278921 win 1117
      20:11:24.493448 IP B > A: . ack 280441 win 1117
      20:11:24.494286 IP B > A: . ack 281961 win 1117
      20:11:24.495122 IP B > A: . ack 283481 win 1117
      20:11:24.495958 IP B > A: . ack 285001 win 1117
      20:11:24.496791 IP B > A: . ack 286521 win 1117
      20:11:24.497628 IP B > A: . ack 288041 win 1117
      20:11:24.498459 IP B > A: . ack 289561 win 1117
      20:11:24.499296 IP B > A: . ack 291081 win 1117
      20:11:24.500133 IP B > A: . ack 292601 win 1117
      20:11:24.500970 IP B > A: . ack 294121 win 1117
      20:11:24.501388 IP B > A: . ack 295641 win 1117
      20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
      
      The expected behavior is more like:
      
      20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
      20:19:49.260446 IP B > A: . ack 154281 win 1212
      20:19:49.261282 IP B > A: . ack 155801 win 1212
      20:19:49.262125 IP B > A: . ack 157321 win 1212
      20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
      20:19:49.262958 IP B > A: . ack 158841 win 1212
      20:19:49.263795 IP B > A: . ack 160361 win 1212
      20:19:49.264628 IP B > A: . ack 161881 win 1212
      20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
      20:19:49.265465 IP B > A: . ack 163401 win 1212
      20:19:49.265886 IP B > A: . ack 164921 win 1212
      20:19:49.266722 IP B > A: . ack 166441 win 1212
      20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
      20:19:49.267559 IP B > A: . ack 167961 win 1212
      20:19:49.268394 IP B > A: . ack 169481 win 1212
      20:19:49.269232 IP B > A: . ack 171001 win 1212
      20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 21 Mar 2013, 2 commits
    • ipconfig: Fix newline handling in log message. · 283951f9
      Committed by Martin Fuzzey
      When using ipconfig, the logs currently look like this:
      
      Single name server:
      [    3.467270] IP-Config: Complete:
      [    3.470613]      device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
      [    3.480670]      host=infigo-1, domain=, nis-domain=(none)
      [    3.486166]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
      [    3.492910]      nameserver0=172.16.42.1[    3.496853] ALSA device list:
      
      Three name servers:
      [    3.496949] IP-Config: Complete:
      [    3.500293]      device=eth0, hwaddr=ac:de:48:00:00:01, ipaddr=172.16.42.2, mask=255.255.255.0, gw=172.16.42.1
      [    3.510367]      host=infigo-1, domain=, nis-domain=(none)
      [    3.515864]      bootserver=172.16.42.1, rootserver=172.16.42.1, rootpath=
      [    3.522635]      nameserver0=172.16.42.1, nameserver1=172.16.42.100
      [    3.529149] , nameserver2=172.16.42.200
      
      Fix newline handling for these cases.
      Signed-off-by: Martin Fuzzey <mfuzzey@parkeon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
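The fix amounts to emitting exactly one line terminator after the last name server, whatever the number of servers. The following is a minimal userspace sketch of that rule; `format_nameservers()` is a hypothetical helper, not ipconfig's actual code, and it writes into a caller-supplied buffer instead of calling printk.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the name-server part of the IP-Config
 * summary. Entries after the first are prefixed with ", " and a single
 * '\n' is written after the last one, so the line is always terminated. */
static void format_nameservers(char *buf, size_t len,
                               const char *const servers[], int count)
{
    size_t used = 0;
    int i;

    if (len == 0)
        return;
    buf[0] = '\0';
    if (count <= 0)
        return;                 /* nothing to print, so no stray newline */

    for (i = 0; i < count && used < len; i++) {
        int n = snprintf(buf + used, len - used, "%snameserver%d=%s",
                         i ? ", " : "", i, servers[i]);
        if (n < 0)
            return;
        used += (size_t)n;
    }
    if (used < len)
        snprintf(buf + used, len - used, "\n");  /* terminate once, at the end */
}
```

With one server the output is `nameserver0=172.16.42.1\n`; with three, all entries stay on one line and the newline follows the third, which is the layout the broken logs above were missing.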
    • udp: add encap_destroy callback · 44046a59
      Committed by Tom Parkin
      Users of UDP encapsulation currently have an encap_rcv callback that they can
      use to hook into the UDP receive path.
      
      In situations where an encapsulation user allocates resources associated with a
      UDP encap socket, it may be convenient to also be able to hook the proto
      .destroy operation. For example, if an encap user holds a reference to the
      UDP socket, the destroy hook might be used to relinquish this reference.
      
      This patch adds a socket destroy hook to UDP, which is set and enabled
      in the same way as the existing encap_rcv hook.
      Signed-off-by: Tom Parkin <tparkin@katalix.com>
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
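To make the hook pattern concrete, here is a simplified model in C. The structure and function names (`struct encap_sock`, `encap_sock_destroy()`, `example_user_destroy()`) are hypothetical and only mirror the description: the destroy hook is stored next to encap_rcv and invoked from the socket's destroy path when encapsulation is enabled.

```c
#include <stddef.h>

/* Hypothetical, simplified socket carrying encapsulation hooks; this is
 * not the kernel's struct udp_sock. */
struct encap_sock {
    int encap_type;                                   /* non-zero once encap is enabled */
    int (*encap_rcv)(struct encap_sock *sk, const void *pkt);
    void (*encap_destroy)(struct encap_sock *sk);     /* the new hook */
    void *encap_user_data;                            /* resource held by the encap user */
};

/* Called from the protocol's destroy path: give the encap user a chance
 * to release whatever it attached to the socket. */
static void encap_sock_destroy(struct encap_sock *sk)
{
    if (sk->encap_type && sk->encap_destroy)
        sk->encap_destroy(sk);
}

/* Example encap user: drops the reference it holds and counts the call. */
static int released;

static void example_user_destroy(struct encap_sock *sk)
{
    sk->encap_user_data = NULL;
    released++;
}
```

Registering the hook the same way as encap_rcv keeps the two callbacks symmetric: a user that never sets `encap_destroy` sees no behavior change.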
  5. 20 Mar 2013, 1 commit
  6. 19 Mar 2013, 2 commits
    • inet: limit length of fragment queue hash table bucket lists · 5a3da1fe
      Committed by Hannes Frederic Sowa
      This patch introduces a constant limit on the length of the fragment queue
      hash table bucket lists. The limit of 128 is currently chosen somewhat
      arbitrarily and merely ensures that we can fill up the fragment cache with
      empty packets up to the default ip_frag_high_thresh limits. It should
      only protect against list iteration eating considerable amounts of CPU.
      
      If we reach the maximum length in one hash bucket, a warning is printed.
      This is implemented on the caller side of inet_frag_find to distinguish
      between the different users of inet_fragment.c.
      
      I dropped the out-of-memory warning in the IPv4 fragment lookup path,
      because we already get a warning from the slab allocator.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jesper Dangaard Brouer <jbrouer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
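A sketch of a depth-limited bucket walk with the warning issued by the caller, as described above. `struct frag_queue`, `frag_find()` and `frag_lookup()` are hypothetical simplifications; the real inet_frag_find() has a different signature and reports the condition differently.

```c
#include <stdio.h>
#include <stddef.h>

#define FRAG_BUCKET_MAX_DEPTH 128   /* the limit quoted in the commit message */

/* Hypothetical, simplified fragment queue chained in one hash bucket. */
struct frag_queue {
    unsigned int id;
    struct frag_queue *next;
};

/* Walk one bucket looking for id. Gives up once FRAG_BUCKET_MAX_DEPTH
 * entries have been visited and reports that through *too_long, which
 * bounds the CPU spent on a single lookup. */
static struct frag_queue *frag_find(struct frag_queue *q, unsigned int id,
                                    int *too_long)
{
    int depth = 0;

    *too_long = 0;
    for (; q; q = q->next) {
        if (q->id == id)
            return q;
        if (++depth >= FRAG_BUCKET_MAX_DEPTH) {
            *too_long = 1;
            return NULL;
        }
    }
    return NULL;
}

/* Caller-side wrapper: each user (IPv4, IPv6, ...) prints its own warning. */
static struct frag_queue *frag_lookup(struct frag_queue *bucket,
                                      unsigned int id, const char *user)
{
    int too_long;
    struct frag_queue *q = frag_find(bucket, id, &too_long);

    if (too_long)
        fprintf(stderr, "%s: fragment queue hash bucket exceeded %d entries\n",
                user, FRAG_BUCKET_MAX_DEPTH);
    return q;
}
```

Keeping the warning in the caller lets each protocol name itself in the message while the walk stays generic.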
    • tcp: dont handle MTU reduction on LISTEN socket · 0d4f0608
      Committed by Eric Dumazet
      When an ICMP ICMP_FRAG_NEEDED (or ICMPV6_PKT_TOOBIG) message finds a
      LISTEN socket, and this socket is currently owned by the user, we
      set TCP_MTU_REDUCED_DEFERRED flag in listener tsq_flags.
      
      This is bad because if we clone the parent before it had a chance to
      clear the flag, the child inherits the tsq_flags value, and next
      tcp_release_cb() on the child will decrement sk_refcnt.
      
      Result is that we might free a live TCP socket, as reported by
      Dormando.
      
      IPv4: Attempt to release TCP socket in state 1
      
      Fix this issue by testing sk_state against TCP_LISTEN early, so that we
      set TCP_MTU_REDUCED_DEFERRED only on appropriate sockets (never on a LISTEN one).
      
      This bug was introduced in commit 563d34d0
      (tcp: dont drop MTU reduction indications).
      Reported-by: dormando <dormando@rydia.net>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
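The reference-count hazard can be reproduced with a small model. Everything below is hypothetical (`struct model_sock`, `MODEL_MTU_REDUCED_DEFERRED`, the helper names). Its assumptions are that deferring the MTU work takes a socket reference which the release callback later drops (as the message implies when it says tcp_release_cb() decrements sk_refcnt), and that TCP_ESTABLISHED is 1 and TCP_LISTEN is 10.

```c
#include <stdbool.h>

enum { MODEL_TCP_ESTABLISHED = 1, MODEL_TCP_LISTEN = 10 };
#define MODEL_MTU_REDUCED_DEFERRED (1UL << 0)   /* hypothetical bit */

struct model_sock {
    int state;
    bool owned_by_user;        /* socket lock held by a process */
    unsigned long tsq_flags;
    int refcnt;
    unsigned int mtu_info;     /* MTU reported by the ICMP message */
    unsigned int pmtu;         /* MTU actually applied */
};

static void model_mtu_reduced(struct model_sock *sk)
{
    sk->pmtu = sk->mtu_info;
}

/* ICMP_FRAG_NEEDED / ICMPV6_PKT_TOOBIG handler with the fix applied. */
static void model_icmp_frag_needed(struct model_sock *sk, unsigned int mtu)
{
    if (sk->state == MODEL_TCP_LISTEN)
        return;                          /* the fix: never defer work on a listener */

    sk->mtu_info = mtu;
    if (!sk->owned_by_user) {
        model_mtu_reduced(sk);           /* handle immediately */
    } else if (!(sk->tsq_flags & MODEL_MTU_REDUCED_DEFERRED)) {
        sk->tsq_flags |= MODEL_MTU_REDUCED_DEFERRED;
        sk->refcnt++;                    /* held until the release callback runs */
    }
}

/* Runs when the owner releases the socket; drops the deferred reference. */
static void model_release_cb(struct model_sock *sk)
{
    if (sk->tsq_flags & MODEL_MTU_REDUCED_DEFERRED) {
        sk->tsq_flags &= ~MODEL_MTU_REDUCED_DEFERRED;
        model_mtu_reduced(sk);
        sk->refcnt--;
    }
}
```

Without the early return, the listener would carry the flag, a child cloned before the listener's release callback ran would inherit it, and the child's first release would drop a reference it never took.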
  7. 17 Mar 2013, 1 commit
  8. 14 Mar 2013, 1 commit
  9. 12 Mar 2013, 1 commit
  10. 08 Mar 2013, 1 commit
  11. 06 Mar 2013, 1 commit
    • net/ipv4: Timestamp option cannot overflow with prespecified addresses · fa2b04f4
      Committed by David Ward
      When a router forwards a packet that contains the IPv4 timestamp option,
      if there is no space left in the option for the router to add its own
      timestamp, then the router increments the Overflow value in the option.
      
      However, if the addresses of the routers are prespecified in the option,
      then the overflow condition cannot happen: the option is structured so
      that each prespecified router has a place to write its timestamp. Other
      routers do not add a timestamp, so there will never be a lack of space.
      
      This fix ensures that the Overflow value in the IPv4 timestamp option is
      not incremented when the addresses of the routers are prespecified, even
      if the Pointer value is greater than the Length value.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
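The rule can be sketched against the option layout from RFC 791: type 68, length, pointer, then one byte holding a 4-bit overflow counter and a 4-bit flag, where flag 3 means the addresses are prespecified. `ts_option_check()` is a hypothetical helper, not the kernel's ip_options_compile(), and it leaves out how a router actually fills a free slot.

```c
#include <stdint.h>

/* Flag values (low 4 bits of the fourth option byte), per RFC 791. */
#define TS_ONLY      0   /* 4-byte timestamps only */
#define TS_TSANDADDR 1   /* 8-byte address + timestamp pairs */
#define TS_PRESPEC   3   /* addresses prespecified by the sender */

/* opt[0] = type (68), opt[1] = length, opt[2] = pointer (1-based offset of
 * the next free slot), opt[3] = overflow << 4 | flag.
 * Returns 1 if there is a free slot, 0 if the option is full (and, unless
 * addresses are prespecified, the overflow counter was bumped), or -1 if
 * the 4-bit overflow counter would wrap. */
static int ts_option_check(uint8_t *opt)
{
    unsigned int len = opt[1], ptr = opt[2], flag = opt[3] & 0x0F;
    unsigned int slot = (flag == TS_ONLY) ? 4 : 8;
    unsigned int overflow;

    if (ptr + slot - 1 <= len)
        return 1;                       /* room left for this router's entry */
    if (flag == TS_PRESPEC)
        return 0;                       /* the fix: each listed router had its own slot */
    overflow = opt[3] >> 4;
    if (overflow == 15)
        return -1;                      /* RFC 791: counter overflow, discard */
    opt[3] = (uint8_t)(((overflow + 1) << 4) | flag);
    return 0;
}
```

For a 12-byte timestamp-only option with pointer 13, both 4-byte slots are used, so the counter goes from 0 to 1; the same pointer with flag 3 leaves the option untouched.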
  12. 05 Mar 2013, 1 commit
  13. 02 Mar 2013, 1 commit
  14. 28 Feb 2013, 1 commit
    • hlist: drop the node parameter from iterators · b67bfe0d
      Committed by Sasha Levin
      I'm not sure why, but the hlist for each entry iterators were conceived
      differently from the list ones. While the list ones are nice and elegant:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only do
      they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
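For readers comparing the two shapes, here is a self-contained userspace sketch of both iterators. The new-style macros follow the three-argument form `hlist_for_each_entry(pos, head, member)` described in the commit, but they are simplified copies written for illustration and rely on the GCC/Clang `__typeof__` extension; `hlist_for_each_entry_old`, `struct item` and the `sum_*` helpers are hypothetical names.

```c
#include <stddef.h>

/* Simplified userspace copies of the kernel's hlist types. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Old shape: the caller supplies a separate struct hlist_node *pos cursor. */
#define hlist_for_each_entry_old(tpos, pos, head, member)                      \
    for ((pos) = (head)->first;                                                 \
         (pos) && ((tpos) = container_of(pos, __typeof__(*(tpos)), member), 1); \
         (pos) = (pos)->next)

/* New shape: the entry pointer itself is the only cursor. */
#define hlist_entry_safe(ptr, type, member) \
    ((ptr) ? container_of(ptr, type, member) : NULL)

#define hlist_for_each_entry(pos, head, member)                               \
    for ((pos) = hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
         (pos);                                                               \
         (pos) = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

/* Hypothetical entry type used for the demonstration. */
struct item {
    int value;
    struct hlist_node link;
};

static int sum_items_old(struct hlist_head *head)
{
    struct item *it;
    struct hlist_node *node;   /* the extra cursor the commit removes */
    int sum = 0;

    hlist_for_each_entry_old(it, node, head, link)
        sum += it->value;
    return sum;
}

static int sum_items(struct hlist_head *head)
{
    struct item *it;
    int sum = 0;

    hlist_for_each_entry(it, head, link)
        sum += it->value;
    return sum;
}
```

The new form ends the loop by letting `hlist_entry_safe()` turn a NULL `next` into a NULL entry pointer, which is what makes the separate node cursor unnecessary.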
  15. 26 Feb 2013, 3 commits
    • Revert "ip_gre: propogate target device GSO capability to the tunnel device" · 7992ae6d
      Committed by Pravin B Shelar
      This reverts commit eb6b9a8c.
      
      The above commit limits the GSO capability of the GRE device to just TSO, but
      software GRE-GSO is capable of handling all GSO capabilities.
      
      This patch also fixes the following panic, which the reverted commit introduced:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000a2
      IP: [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre]
      PGD 42bc19067 PUD 42bca9067 PMD 0
      Oops: 0000 [#1] SMP
      Pid: 2636, comm: ip Tainted: GF            3.8.0+ #83 Dell Inc. PowerEdge R620/0KCKR5
      RIP: 0010:[<ffffffffa0680fd1>]  [<ffffffffa0680fd1>] ipgre_tunnel_bind_dev+0x161/0x1f0 [ip_gre]
      RSP: 0018:ffff88042bfcb708  EFLAGS: 00010246
      RAX: 00000000000005b6 RBX: ffff88042d2fa000 RCX: 0000000000000044
      RDX: 0000000000000018 RSI: 0000000000000078 RDI: 0000000000000060
      RBP: ffff88042bfcb748 R08: 0000000000000018 R09: 000000000000000c
      R10: 0000000000000020 R11: 000000000101010a R12: ffff88042d2fa800
      R13: 0000000000000000 R14: ffff88042d2fa800 R15: ffff88042cd7f650
      FS:  00007fa784f55700(0000) GS:ffff88043fd20000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000a2 CR3: 000000042d8b9000 CR4: 00000000000407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process ip (pid: 2636, threadinfo ffff88042bfca000, task ffff88042d142a80)
      Stack:
       0000000100000000 002f000000000000 0a01010100000000 000000000b010101
       ffff88042d2fa800 ffff88042d2fa000 ffff88042bfcb858 ffff88042f418c00
       ffff88042bfcb798 ffffffffa068199a ffff88042bfcb798 ffff88042d2fa830
      Call Trace:
       [<ffffffffa068199a>] ipgre_newlink+0xca/0x160 [ip_gre]
       [<ffffffff8143b692>] rtnl_newlink+0x532/0x5f0
       [<ffffffff8143b2fc>] ? rtnl_newlink+0x19c/0x5f0
       [<ffffffff81438978>] rtnetlink_rcv_msg+0x2c8/0x340
       [<ffffffff814386b0>] ? rtnetlink_rcv+0x40/0x40
       [<ffffffff814560f9>] netlink_rcv_skb+0xa9/0xd0
       [<ffffffff81438695>] rtnetlink_rcv+0x25/0x40
       [<ffffffff81455ddc>] netlink_unicast+0x1ac/0x230
       [<ffffffff81456a45>] netlink_sendmsg+0x265/0x380
       [<ffffffff814138c0>] sock_sendmsg+0xb0/0xe0
       [<ffffffff8141141e>] ? move_addr_to_kernel+0x4e/0x90
       [<ffffffff81420445>] ? verify_iovec+0x85/0xf0
       [<ffffffff81414ffd>] __sys_sendmsg+0x3fd/0x420
       [<ffffffff8114b701>] ? handle_mm_fault+0x251/0x3b0
       [<ffffffff8114f39f>] ? vma_link+0xcf/0xe0
       [<ffffffff81415239>] sys_sendmsg+0x49/0x90
       [<ffffffff814ffd19>] system_call_fastpath+0x16/0x1b
      
      CC: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • IP_GRE: Fix GRE_CSUM case. · 8f10098f
      Committed by Pravin B Shelar
      Commit aa0e51cd ("ip_gre: allow CSUM capable devices to handle packets")
      broke the GRE_CSUM case.
      GRE_CSUM needs the checksum to be computed for the inner packet. Therefore
      the checksum calculation cannot be offloaded if the tunnel device requires
      GRE_CSUM. The following patch fixes this by computing the inner packet
      checksum for the GRE_CSUM type; for all other types of GRE devices, the
      checksum is offloaded.
      
      CC: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Dmitry Kravkov <dmitry@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
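Why the inner checksum cannot be left to hardware: GRE's optional checksum field (RFC 2784) holds the Internet checksum of the GRE header and payload, so every payload byte, including the inner packet's own checksum field, must be final before it is computed. A minimal sketch of that Internet checksum (RFC 1071) follows; `inet_checksum()` is a hypothetical name, not the kernel's csum helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: one's complement of the one's complement
 * sum of 16-bit big-endian words; an odd trailing byte is padded with a
 * zero byte. The 32-bit accumulator is ample for packet-sized inputs. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Because the sum covers the payload, any byte a NIC would later rewrite (such as an offloaded inner checksum) would invalidate a GRE checksum computed beforehand.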
    • IP_GRE: Fix IP-Identification. · 490ab081
      Committed by Pravin B Shelar
      GRE-GSO generates IP fragments with IDs 0, 2, 3, 4, ... for every
      GSO packet, which is not correct. The following patch fixes this
      by giving each IP header a unique ID if fragmentation is allowed.
      As Eric Dumazet suggested, this is optimized by using the inner IP header
      whenever the inner packet is IPv4.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
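A sketch of the intended numbering under stated assumptions: each segment cut from one GSO packet takes the next ID after the original header's ID when the DF bit is clear, and IDs are left unchanged when DF is set (the DF-set behavior is an assumption of this sketch, not something the commit message states). `assign_segment_ids()` is hypothetical; per the message, the base ID could come from the inner IPv4 header.

```c
#include <stddef.h>
#include <stdint.h>

#define IP_DF 0x4000   /* "don't fragment" bit in the host-order frag_off field */

/* Fill ids[0..nsegs-1] with the IP IDs for the outer headers of nsegs
 * segments of one GSO packet whose original header carried base_id. */
static void assign_segment_ids(uint16_t *ids, size_t nsegs,
                               uint16_t base_id, uint16_t frag_off)
{
    size_t i;

    for (i = 0; i < nsegs; i++)
        ids[i] = (frag_off & IP_DF) ? base_id
                                    : (uint16_t)(base_id + i);  /* wraps at 16 bits */
}
```

Sequential IDs keep fragments of different segments from colliding during reassembly, which is what the 0, 2, 3, 4 pattern above could not guarantee.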
  16. 23 Feb 2013, 3 commits
  17. 22 Feb 2013, 2 commits
  18. 20 Feb 2013, 2 commits
  19. 19 Feb 2013, 6 commits
  20. 16 Feb 2013, 2 commits
  21. 14 Feb 2013, 4 commits
    • net: Fix possible wrong checksum generation. · c9af6db4
      Committed by Pravin B Shelar
      Patch cef401de (net: fix possible wrong checksum
      generation) fixed a wrong checksum calculation, but it broke TSO by
      defining a new GSO type without a netdev feature for that type.
      net_gso_ok() would not allow hardware checksum/segmentation
      offload of such packets without the feature.
      
      The following patch fixes both TSO and the wrong checksum. It uses
      the same logic that Eric Dumazet used. The patch introduces a new flag,
      SKBTX_SHARED_FRAG, which is set if at least one frag can be modified by
      the user, but the flag is kept in the skb shared info tx_flags rather
      than in gso_type.

      tx_flags is a better place than gso_type because an skb can have a
      shared frag without being a GSO packet. This does not tie SHARED_FRAG
      to GSO, so there is no need to define a netdev feature for it.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: send packets with a socket timestamp · ee684b6f
      Committed by Andrey Vagin
      A socket timestamp is the sum of the global tcp_time_stamp and
      a per-socket offset.

      The socket offset is added in places where the externally visible
      TCP timestamp option is parsed/initialized.

      Connections in the SYN_RECV state are not supported; the global
      tcp_time_stamp is used for them, because repair mode doesn't support
      this state. In the future, this can be implemented in a similar way
      as for TIME_WAIT sockets.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: set and get per-socket timestamp · 93be6ce0
      Committed by Andrey Vagin
      A timestamp can be set only if a socket is in repair mode.

      This patch adds a new socket option, TCP_TIMESTAMP, which allows
      getting and setting the current TCP timestamp.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: adding a per-socket timestamp offset · ceaa1fef
      Committed by Andrey Vagin
      This functionality is used for restoring TCP sockets. A TCP timestamp
      depends on how long a system has been running, so it differs from host
      to host. The solution is to set a per-socket offset.
      
      A per-socket offset for a TIME_WAIT socket is inherited from a proper
      tcp socket.
      
      tcp_request_sock doesn't have a timestamp offset, because repair
      mode is not implemented for request sockets.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
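The offset arithmetic behind these three timestamp commits fits in a few lines. The model below is hypothetical (`struct ts_sock`, `sock_tsval()`, `sock_restore_ts()`): the advertised TSval is the host clock plus the per-socket offset, and on restore the offset is chosen so the new host continues from the value the peer last saw. All arithmetic is modulo 2^32, like the TCP timestamp field itself.

```c
#include <stdint.h>

/* Hypothetical per-socket state: only the timestamp offset. */
struct ts_sock {
    uint32_t tsoffset;
};

/* Value placed in the TSval field: host clock plus per-socket offset.
 * Unsigned arithmetic wraps modulo 2^32, matching the 32-bit field. */
static uint32_t sock_tsval(const struct ts_sock *sk, uint32_t tcp_time_stamp)
{
    return tcp_time_stamp + sk->tsoffset;
}

/* Restore in repair mode: pick the offset so that, at local time `now`,
 * the socket advertises `saved_tsval`, the last value the peer saw. */
static void sock_restore_ts(struct ts_sock *sk, uint32_t saved_tsval,
                            uint32_t now)
{
    sk->tsoffset = saved_tsval - now;
}
```

After a restore, the advertised timestamps keep increasing at the local clock rate from the saved value, so the peer's PAWS checks continue to see monotonic values even though the new host's uptime differs.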
  22. 11 Feb 2013, 1 commit
  23. 07 Feb 2013, 1 commit