1. 25 Mar 2013, 5 commits
  2. 23 Mar 2013, 2 commits
  3. 22 Mar 2013, 7 commits
    • tcp: preserve ACK clocking in TSO · f4541d60
      Eric Dumazet committed
      A long-standing problem with TSO is that tcp_tso_should_defer()
      rearms the deferral timer, while it should not.
      
      The current code leads to the following bad, bursty behavior:
      
      20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
      20:11:24.484337 IP B > A: . ack 263721 win 1117
      20:11:24.485086 IP B > A: . ack 265241 win 1117
      20:11:24.485925 IP B > A: . ack 266761 win 1117
      20:11:24.486759 IP B > A: . ack 268281 win 1117
      20:11:24.487594 IP B > A: . ack 269801 win 1117
      20:11:24.488430 IP B > A: . ack 271321 win 1117
      20:11:24.489267 IP B > A: . ack 272841 win 1117
      20:11:24.490104 IP B > A: . ack 274361 win 1117
      20:11:24.490939 IP B > A: . ack 275881 win 1117
      20:11:24.491775 IP B > A: . ack 277401 win 1117
      20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
      20:11:24.492620 IP B > A: . ack 278921 win 1117
      20:11:24.493448 IP B > A: . ack 280441 win 1117
      20:11:24.494286 IP B > A: . ack 281961 win 1117
      20:11:24.495122 IP B > A: . ack 283481 win 1117
      20:11:24.495958 IP B > A: . ack 285001 win 1117
      20:11:24.496791 IP B > A: . ack 286521 win 1117
      20:11:24.497628 IP B > A: . ack 288041 win 1117
      20:11:24.498459 IP B > A: . ack 289561 win 1117
      20:11:24.499296 IP B > A: . ack 291081 win 1117
      20:11:24.500133 IP B > A: . ack 292601 win 1117
      20:11:24.500970 IP B > A: . ack 294121 win 1117
      20:11:24.501388 IP B > A: . ack 295641 win 1117
      20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
      
      The expected behavior is more like:
      
      20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
      20:19:49.260446 IP B > A: . ack 154281 win 1212
      20:19:49.261282 IP B > A: . ack 155801 win 1212
      20:19:49.262125 IP B > A: . ack 157321 win 1212
      20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
      20:19:49.262958 IP B > A: . ack 158841 win 1212
      20:19:49.263795 IP B > A: . ack 160361 win 1212
      20:19:49.264628 IP B > A: . ack 161881 win 1212
      20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
      20:19:49.265465 IP B > A: . ack 163401 win 1212
      20:19:49.265886 IP B > A: . ack 164921 win 1212
      20:19:49.266722 IP B > A: . ack 166441 win 1212
      20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
      20:19:49.267559 IP B > A: . ack 167961 win 1212
      20:19:49.268394 IP B > A: . ack 169481 win 1212
      20:19:49.269232 IP B > A: . ack 171001 win 1212
      20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
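      A minimal sketch of the core of the fix (a reconstruction; the real
      tcp_tso_should_defer() applies several more heuristics that are
      elided here):
      
          /* Record the deferral timestamp only once instead of rearming it
           * on every incoming ACK, which is what broke ACK clocking and
           * produced the bursts in the first trace above.
           */
          static bool tso_should_defer_sketch(struct tcp_sock *tp)
          {
                  /* ... heuristics deciding that deferring is advisable ... */
      
                  /* Do not rearm the timer if already set; the low bit keeps
                   * the value nonzero even when jiffies << 1 wraps to zero.
                   */
                  if (!tp->tso_deferred)
                          tp->tso_deferred = 1 | (jiffies << 1);
                  return true;
          }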
    • rtnetlink: Remove passing of attributes into rtnl_doit functions · 661d2967
      Thomas Graf committed
      With decnet converted, we can finally get rid of rta_buf and the
      computations around it. This also removes the minimal header length
      verification, since all message handlers do that explicitly anyway.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
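      A hedged sketch of what the conversion means for an individual
      handler (example_doit is hypothetical; nlmsg_parse() and ifla_policy
      are existing rtnetlink APIs). The decnet entry below performs exactly
      this kind of conversion:
      
          /* Before: int doit(struct sk_buff *skb, struct nlmsghdr *nlh, void *arg);
           * After:  int doit(struct sk_buff *skb, struct nlmsghdr *nlh);
           * The void *arg (the pre-parsed global rta_buf) is gone; each
           * handler parses its own attributes.
           */
          static int example_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
          {
                  struct nlattr *tb[IFLA_MAX + 1];
                  int err;
      
                  err = nlmsg_parse(nlh, sizeof(struct ifinfomsg), tb,
                                    IFLA_MAX, ifla_policy);
                  if (err < 0)
                          return err;
      
                  /* ... act on tb[IFLA_*] ... */
                  return 0;
          }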
    • decnet: Parse netlink attributes on our own · 58d7d8f9
      Thomas Graf committed
      decnet is the only subsystem left that relies on the global netlink
      attribute buffer rta_buf. It's a horrible design, and we want to get
      rid of it.
      
      This converts all of decnet to do implicit attribute parsing. It
      also gets rid of the error-prone struct dn_kern_rta.
      
      Yes, the fib_magic() stuff is not pretty.
      
      It's compile-tested, but I need someone with appropriate hardware
      to test the patch, since I don't have access to any.
      
      Cc: linux-decnet-user@lists.sourceforge.net
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: increase inner ip header ID during segmentation · d6a8c36d
      Cong Wang committed
      Similar to the GRE tunnel, the UDP tunnel should take care of the
      IP header ID too.
      
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip_gre: increase inner ip header ID during segmentation · 10c0d7ed
      Cong Wang committed
      According to a previous discussion [1] on the netdev list, DaveM
      insists we should increase the IP header ID for each segmented
      packet. This patch fixes that.
      
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      
      1. http://marc.info/?t=136384172700001&r=1&w=2
      Signed-off-by: David S. Miller <davem@davemloft.net>
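      Both this patch and the preceding UDP one implement the same
      pattern. A hedged sketch (the helper name is hypothetical, and it
      assumes the network header already points at the inner IP header at
      this stage of the tunnel GSO path):
      
          /* Give each GSO segment its own incremented inner IP ID and
           * refresh the header checksum; encapsulation details elided.
           */
          static void bump_inner_ip_ids(struct sk_buff *segs, __be16 first_id)
          {
                  struct sk_buff *skb = segs;
                  u16 id = ntohs(first_id);
      
                  do {
                          struct iphdr *iph = ip_hdr(skb);
      
                          iph->id = htons(id++);
                          iph->check = 0;
                          iph->check = ip_fast_csum(iph, iph->ihl);
                  } while ((skb = skb->next));
          }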
    • netlink: Diag core and basic socket info dumping (v2) · eaaa3139
      Andrey Vagin committed
      netlink_diag can be built as a module, just as is done for unix
      sockets.
      
      The dump message produced by the diag core carries the basic info
      about netlink sockets: family, type and protocol, portid, dst_group,
      dst_portid, state.
      
      Groups can be received via the optional NETLINK_DIAG_GROUPS attribute.
      
      Netlink sockets can be filtered by protocol.
      
      The socket inode number and cookie are reserved for future per-socket
      info retrieval. Per-protocol filtering is also reserved for the future
      by requiring sdiag_protocol to be zero.
      
      The file /proc/net/netlink doesn't provide enough information for
      dumping netlink sockets: it doesn't provide dst_group, dst_portid, or
      groups above 32.
      
      v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
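      A hedged userspace sketch of requesting such a dump, against the
      UAPI in linux/netlink_diag.h as merged (NDIAG_PROTO_ALL and
      NDIAG_SHOW_GROUPS are assumed from that header; the reply-reading
      loop is trimmed):
      
          #include <linux/netlink.h>
          #include <linux/netlink_diag.h>
          #include <linux/sock_diag.h>
          #include <string.h>
          #include <sys/socket.h>
          #include <unistd.h>
      
          /* Ask the kernel to dump all netlink sockets, including the
           * optional NETLINK_DIAG_GROUPS attribute. Returns the fd on
           * which SOCK_DIAG_BY_FAMILY replies should be recv()'d until
           * NLMSG_DONE, or -1 on error.
           */
          int request_netlink_diag_dump(void)
          {
                  struct {
                          struct nlmsghdr nlh;
                          struct netlink_diag_req req;
                  } msg;
                  int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SOCK_DIAG);
      
                  if (fd < 0)
                          return -1;
      
                  memset(&msg, 0, sizeof(msg));
                  msg.nlh.nlmsg_len = sizeof(msg);
                  msg.nlh.nlmsg_type = SOCK_DIAG_BY_FAMILY;
                  msg.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
                  msg.req.sdiag_family = AF_NETLINK;
                  msg.req.sdiag_protocol = NDIAG_PROTO_ALL;
                  msg.req.ndiag_show = NDIAG_SHOW_GROUPS;
      
                  if (send(fd, &msg, sizeof(msg), 0) < 0) {
                          close(fd);
                          return -1;
                  }
                  return fd;
          }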
    • net: prepare netlink code for netlink diag · 0f29c768
      Andrey Vagin committed
      Move a few declarations into a header.
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 21 Mar 2013, 23 commits
  5. 20 Mar 2013, 2 commits
    • netfilter: remove unused "config IP_NF_QUEUE" · 3dd6664f
      Paul Bolle committed
      Kconfig symbol IP_NF_QUEUE is unused since commit
      d16cf20e ("netfilter: remove ip_queue
      support"). Let's remove it too.
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • packet: packet fanout rollover during socket overload · 77f65ebd
      Willem de Bruijn committed
      Changes:
        v3->v2: rebase (no other changes)
                passes selftest
        v2->v1: read f->num_members only once
                fix bug: test rollover mode + flag
      
      Minimize packet drop in a fanout group. If one socket is full,
      roll over packets to another from the group. Maintain flow
      affinity during normal load using an rxhash fanout policy, while
      dispersing unexpected traffic storms that hit a single cpu, such
      as spoofed-source DoS flows. Rollover breaks affinity for flows
      arriving at saturated sockets during those conditions.
      
      The patch adds a fanout policy ROLLOVER that rotates between sockets,
      filling each socket before moving to the next. It also adds a fanout
      flag ROLLOVER. If passed along with any other fanout policy, the
      primary policy is applied until the chosen socket is full. Then,
      rollover selects another socket, to delay packet drop until the
      entire system is saturated.
      
      Probing sockets is not free. Selecting the last used socket, as
      rollover does, is a greedy approach that maximizes the chance of
      success at the cost of extreme load imbalance. In practice, with
      sufficiently long queues to absorb bursts, sockets are drained in
      parallel and the load balance looks uniform in `top`.
      
      To avoid contention, the patch scales the counters with the number
      of sockets and accesses them lock-free. Values are bounds-checked
      to ensure correctness.
      
      Tested using an application with 9 threads pinned to CPUs, one socket
      per thread, and sufficient busywork per packet operation to limit each
      thread to handling 32 Kpps. When sent a single 500 Kpps UDP stream, a
      FANOUT_CPU setup processes 32 Kpps in total without this patch and
      270 Kpps with it. Tested with read() and with a packet ring (V1).
      
      It also passes the psock_fanout.c unit test added to selftests.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
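      A hedged userspace sketch of selecting the new policy and flag
      (encoding as in linux/if_packet.h: group id in the low 16 bits, type
      and flags in the high 16; the group id 42 is arbitrary, and fd must
      be a bound AF_PACKET socket):
      
          #include <linux/if_packet.h>
          #include <sys/socket.h>
      
          /* Join a fanout group: either pure rollover, or hash dispatch
           * with rollover as a safety valve for saturated sockets.
           */
          int join_fanout(int fd, int pure_rollover)
          {
                  unsigned int type = pure_rollover
                          ? PACKET_FANOUT_ROLLOVER
                          : (PACKET_FANOUT_HASH | PACKET_FANOUT_FLAG_ROLLOVER);
                  unsigned int arg = 42 | (type << 16);
      
                  return setsockopt(fd, SOL_PACKET, PACKET_FANOUT,
                                    &arg, sizeof(arg));
          }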
  6. 19 Mar 2013, 1 commit
    • xfrm: use xfrm direction when lookup policy · b5fb82c4
      Baker Zhang committed
      Because the xfrm policy direction has the same value as the
      corresponding flow direction, this problem has so far been masked.
      
      In xfrm_lookup and __xfrm_policy_check, flow_cache_lookup is used to
      accelerate the lookup.
      
      Flow direction is given to flow_cache_lookup by policy_to_flow_dir.
      
      When the flow cache misses, the 'resolver' callback is invoked.
      
      The 'resolver' requires the xfrm direction, so the flow direction
      must be converted back to the xfrm direction.
      Signed-off-by: Baker Zhang <baker.zhang@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
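      The conversion the fix adds can be sketched as the inverse of
      policy_to_flow_dir (constant names from net/flow.h and net/xfrm.h;
      a reconstruction, not the verbatim patch):
      
          /* Map a flow-cache direction back to an xfrm policy direction.
           * When the two constant sets are numerically identical the
           * conversion is a no-op, which is why the bug stayed hidden.
           */
          static inline int flow_to_policy_dir(int dir)
          {
                  if (XFRM_POLICY_IN == FLOW_DIR_IN &&
                      XFRM_POLICY_OUT == FLOW_DIR_OUT &&
                      XFRM_POLICY_FWD == FLOW_DIR_FWD)
                          return dir;
      
                  switch (dir) {
                  default:
                  case FLOW_DIR_IN:
                          return XFRM_POLICY_IN;
                  case FLOW_DIR_OUT:
                          return XFRM_POLICY_OUT;
                  case FLOW_DIR_FWD:
                          return XFRM_POLICY_FWD;
                  }
          }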