1. 26 May, 2015 15 commits
    • ipv6: don't increase size when refragmenting forwarded ipv6 skbs · 485fca66
      Florian Westphal committed
      Since commit 6aafeef0 ("netfilter: push reasm skb through instead of
      original frag skbs"), we sometimes end up re-fragmenting skbs
      that we have reassembled.
      
      ipv6 defrag preserves the original skbs using the skb frag list, i.e. as long
      as the skb frag list is preserved there is no problem since we keep
      original geometry of fragments intact.
      
      However, in the rare case where the frag list is munged or skb
      is linearized, we might send larger fragments than what we originally
      received.
      
      A router in the path might then send packet-too-big errors even if
      sender never sent fragments exceeding the reported mtu:
      
      mtu 1500 - 1500:1400 - 1400:1280 - 1280
           A         R1         R2        B
      
      1 - A sends to B, fragment size 1400
      2 - R2 sends pkttoobig error for 1280
      3 - A sends to B, fragment size 1280
      4 - R2 sends pkttoobig error for 1280 again because it sees fragments of size 1400.
      
      Make sure ip6_fragment always caps the MTU at the largest packet size seen
      when a defragmented skb is forwarded.
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
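The capping rule described in commit 485fca66 can be sketched as follows. This is a minimal illustration, not the kernel's code: the helper `frag_mtu()` and its parameters are hypothetical names, and `frag_max_size` is assumed to hold the largest fragment seen during reassembly (0 if the packet was never reassembled). The sketch leaves out the case where the largest received fragment exceeds the outgoing MTU.

```c
#include <assert.h>
#include <stdint.h>

#define IPV6_MIN_MTU 1280u  /* minimum link MTU that IPv6 requires */

/*
 * Hypothetical helper: choose the size limit used when re-fragmenting
 * a forwarded packet.
 *   path_mtu      - MTU of the outgoing route
 *   frag_max_size - largest fragment seen at reassembly, 0 if none
 */
static uint32_t frag_mtu(uint32_t path_mtu, uint32_t frag_max_size)
{
    uint32_t mtu = path_mtu;

    assert(path_mtu >= IPV6_MIN_MTU);

    /* Never emit fragments larger than the ones that were received. */
    if (frag_max_size != 0 && frag_max_size < mtu)
        mtu = frag_max_size;

    /* Do not go below the IPv6 minimum MTU. */
    if (mtu < IPV6_MIN_MTU)
        mtu = IPV6_MIN_MTU;

    return mtu;
}
```

In the scenario above, R1 forwards onto a 1400-byte link after A has dropped to 1280-byte fragments, so `frag_mtu(1400, 1280)` yields 1280 and R2 no longer sees 1400-byte fragments.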
    • atm:he - Change 1 to true for bool type variable. · 376cd36d
      Shailendra Verma committed
      The variable irq_coalesce is bool type.
      So assign the value true instead of 1.
      Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net:xen-netback - Change 1 to true for bool type variable. · c489dbb1
      Shailendra Verma committed
      The variable separate_tx_rx_irq is bool type, so assign true
      instead of 1.
      Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv6_route_sharing' · c1a34035
      David S. Miller committed
      Martin KaFai Lau says:
      
      ====================
      ipv6: Only create RTF_CACHE route after encountering pmtu exception
      
      v4 -> v5:
      - Patch 1 is new. Clean up the ipv6_select_ident() and ip6_fragment().
      
      - Further simplify the newly added rt6_get_pcpu_route().  If there is a
        'prev' after cmpxchg, return prev instead of the newly created percpu
        clone.
      
      v3 -> v4:
      - Patch 8 is new. It keeps track of the DST_NOCACHE routes in a list to handle
        the iface down/unregister event.
      
      - Remove rcu from the newly added rt6i_pcpu variable.  It is not needed
        because it has already been protected by the existing reader/writer lock.
      
      - Thanks to 'Julian Anastasov <ja@ssi.bg>' for testing the FLOWI_FLAG_KNOWN_NH
        patches.
      
      v2 -> v3:
      - Patches 5 to 7 are new.  They take care of cases where the daddr in
        skb is not the one used to do the route look-up.  There are also
        related changes to rt6_nexthop() since v2, which are in patch 2/9.
        Thanks to 'Julian Anastasov <ja@ssi.bg>' for pointing it out.
      
      - Fix a few problems in __ip6_rt_update_pmtu(), such as setting the expire
        and mtu before inserting into the tree and not calling dst_destroy() after
        a tree insertion failure.  Also update the rt6i_pmtu in fib6_add_rt2node().
        Thanks to 'Steffen Klassert <steffen.klassert@secunet.com>' for pointing
        it out.
      
      - Merge ip6_pmtu_rt_cache_alloc() into ip6_rt_cache_alloc().
      
      v1 -> v2:
      - Move the /128 route bug fixes to another series (accepted).
      - Create a function for checking (rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)).
      - Avoid shuffling the skb network_header.  Instead, change the function
        signature to take iph instead of skb.
      
      - Many thanks to 'Hannes Frederic Sowa <hannes@stressinduktion.org>' for
        reviewing v1 and v2 and giving advice.
      
      --Martin
      
      ~~~ start: v1 compose message (with the out-dated parts removed) ~~~
      
      This series is to avoid creating a RTF_CACHE route whenever we are consulting
      the fib6 tree with a new destination.  Instead, only create RTF_CACHE route
      when we see a pmtu exception.
      
      Out of all ipv6 RTF_CACHE routes that are created, the percentage that has a
      different mtu is very small. In one of our end-user facing proxy servers,
      only 1k out of 80k RTF_CACHE routes have a smaller MTU.  For our DC
      traffic, there is no mtu exception.
      
      A large fib6 tree causes problems: 'ip -6 r show' takes a long time and
      gc may kick in too often.  Also, when a service has restarted and a lot
      of new TCP connection requests come in, it creates pressure on the tree by
      inserting a lot of RTF_CACHE routes in a short time, which currently
      requires a write lock.
      
      The first few patches are prep work to remove the assumption that the
      returned rt is always RTF_CACHE.
      
      The patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception'
      does the lazy RTF_CACHE route creation.
      
      The following patches add a percpu rt to compensate for the performance loss
      after doing the RTF_CACHE lazy creation.
      
      Here are some numbers from the udpflood test.  The udpflood has been
      slightly modified to have a time limit instead of a count limit.
      
      A /64 via gateway route is used for the test. Each udpflood uses 10000 dst
      addresses.  The dst addresses of different udpflood processes do not overlap
      with each other.
      
      1                    16M                          15M
      10                   61M                          61M
      20                   65M                          62M
      40                   88M                          83M
      
      ~~~ end: v1 compose message ~~~
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
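The v4 -> v5 note about rt6_get_pcpu_route() says that if cmpxchg finds a 'prev' entry, prev is returned instead of the newly created percpu clone. A rough userspace sketch of that pattern with C11 atomics is shown here. The types `sk_route` and `sk_route_slot` and the function `install_pcpu_route()` are hypothetical stand-ins, and freeing the losing clone is left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct sk_route { int id; };

/* Hypothetical stand-in for one CPU's cached route pointer. */
typedef _Atomic(struct sk_route *) sk_route_slot;

/*
 * Try to publish 'clone' in an empty slot.  Returns the route that is
 * installed afterwards: 'clone' if the slot was empty, otherwise the
 * entry that was already there ('prev').
 */
static struct sk_route *install_pcpu_route(sk_route_slot *slot,
                                           struct sk_route *clone)
{
    struct sk_route *prev = NULL;

    assert(slot != NULL && clone != NULL);

    /* Succeeds only if the slot still holds NULL. */
    if (atomic_compare_exchange_strong(slot, &prev, clone))
        return clone;

    /* On failure 'prev' has been updated to the current occupant. */
    return prev;
}
```

Returning the existing entry keeps all users of a slot on the same route object, so a clone that loses the race is never handed out.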
    • ipv6: Create percpu rt6_info · d52d3997
      Martin KaFai Lau committed
      After the patch
      'ipv6: Only create RTF_CACHE routes after encountering pmtu exception',
      we need to compensate for the performance hit (bouncing dst->__refcnt).
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Break up ip6_rt_copy() · 83a09abd
      Martin KaFai Lau committed
      This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and
      ip6_rt_cache_alloc().
      
      In a later patch, we need to create a percpu rt6_info copy. Hence,
      refactor the common rt6_info init code into ip6_rt_copy_init().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister · 8d0b94af
      Martin KaFai Lau committed
      This patch keeps track of the DST_NOCACHE routes in a list and replaces their
      dev with loopback during the iface down/unregister event.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
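The idea in commit 8d0b94af, keeping routes that are not in the tree on a list so they can be repointed at loopback when their device goes away, might look roughly like this in plain C. All names here (`sk_dev`, `sk_uncached_rt`, `repoint_uncached_routes()`) are hypothetical, and the locking and reference counting that a kernel implementation would need are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct sk_dev { const char *name; };

/* Hypothetical entry on the list of routes that live outside the tree. */
struct sk_uncached_rt {
    struct sk_dev *dev;
    struct sk_uncached_rt *next;
};

/*
 * Walk the list and repoint every route that still references 'gone'
 * at 'loopback'.  Returns the number of routes that were updated.
 */
static int repoint_uncached_routes(struct sk_uncached_rt *head,
                                   const struct sk_dev *gone,
                                   struct sk_dev *loopback)
{
    int updated = 0;

    assert(gone != NULL && loopback != NULL);

    for (struct sk_uncached_rt *rt = head; rt != NULL; rt = rt->next) {
        if (rt->dev == gone) {
            rt->dev = loopback;
            updated++;
        }
    }
    return updated;
}
```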
    • ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set · 3da59bd9
      Martin KaFai Lau committed
      This patch always creates RTF_CACHE clone with DST_NOCACHE
      when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to
      the fl6->daddr.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Julian Anastasov <ja@ssi.bg>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Set FLOWI_FLAG_KNOWN_NH at flowi6_flags · 48e8aa6e
      Martin KaFai Lau committed
      The neighbor look-up used to depend on the rt6i_gateway (if
      there is a gateway) or the rt6i_dst (if it is a RTF_CACHE clone)
      as the nexthop address.  Note that rt6i_dst is set to fl6->daddr
      for the RTF_CACHE clone where fl6->daddr is the one used to do
      the route look-up.
      
      Now, we only create RTF_CACHE clone after encountering exception.
      When doing the neighbor look-up with a route that is neither a gateway
      nor a RTF_CACHE clone, the daddr in skb will be used as the nexthop.
      
      In some cases, the daddr in skb is not the one used to do
      the route look-up.  One example is in ip_vs_dr_xmit_v6() where the
      real nexthop server address is different from the one in the skb.
      
      This patch is going to follow the IPv4 approach and ask the
      ip6_pol_route() callers to set the FLOWI_FLAG_KNOWN_NH properly.
      
      In the next patch, ip6_pol_route() will honor the FLOWI_FLAG_KNOWN_NH
      and create a RTF_CACHE clone.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Julian Anastasov <ja@ssi.bg>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
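The nexthop choice described in commit 48e8aa6e can be sketched as a three-way selection. This is an illustration only: the struct layout, the flag values `SK_RTF_GATEWAY` and `SK_RTF_CACHE`, and the function `pick_nexthop()` are hypothetical placeholders, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder flag values for this sketch, not the kernel's. */
#define SK_RTF_GATEWAY 0x1u
#define SK_RTF_CACHE   0x2u

struct sk_addr6 { unsigned char b[16]; };

struct sk_route {
    unsigned int flags;
    struct sk_addr6 gateway;  /* used when the route goes via a gateway */
    struct sk_addr6 dst;      /* look-up address stored in a cache clone */
};

/*
 * Choose the address used for the neighbor look-up:
 *   gateway route -> the gateway address
 *   cache clone   -> the destination recorded in the clone
 *   otherwise     -> the daddr supplied by the caller (from the skb)
 */
static const struct sk_addr6 *pick_nexthop(const struct sk_route *rt,
                                           const struct sk_addr6 *daddr)
{
    assert(rt != NULL && daddr != NULL);

    if (rt->flags & SK_RTF_GATEWAY)
        return &rt->gateway;
    if (rt->flags & SK_RTF_CACHE)
        return &rt->dst;
    return daddr;
}
```

The third branch is the one that goes wrong when the daddr in the skb is not the address used for the route look-up, which is why callers are asked to set FLOWI_FLAG_KNOWN_NH in that case.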
    • ipv6: Add rt6_get_cookie() function · b197df4f
      Martin KaFai Lau committed
      Instead of doing the rt6->rt6i_node check whenever we need
      to get the route's cookie, refactor it into rt6_get_cookie().
      It is prep work to handle FLOWI_FLAG_KNOWN_NH and also
      percpu rt6_info later.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
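A helper of the kind commit b197df4f describes could take roughly the following shape. The struct names and `get_cookie()` are hypothetical, and returning 0 for a route that is not attached to a tree node is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct sk_node { unsigned int sernum; };

struct sk_rt {
    struct sk_node *node;  /* NULL when the route is not in the tree */
};

/*
 * Centralize the "is there a node?" check so callers do not repeat it.
 * Assumption: a detached route yields cookie 0.
 */
static unsigned int get_cookie(const struct sk_rt *rt)
{
    assert(rt != NULL);
    return rt->node != NULL ? rt->node->sernum : 0;
}
```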
    • ipv6: Only create RTF_CACHE routes after encountering pmtu exception · 45e4fd26
      Martin KaFai Lau committed
      This patch creates RTF_CACHE routes only after encountering a pmtu
      exception.
      
      After ip6_rt_update_pmtu() has inserted the RTF_CACHE route to the fib6
      tree, the rt->rt6i_node->fn_sernum is bumped which will fail the
      ip6_dst_check() and trigger a relookup.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
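The invalidation mechanism mentioned in commit 45e4fd26 (a bumped serial number makes the dst check fail and forces a relookup) can be modelled in a few lines. This is a simplified sketch with hypothetical names: `sk_fib_node`, `sk_cached_dst`, `cache_dst()`, `dst_still_valid()` and `insert_exception_route()` are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_fib_node { unsigned int sernum; };

/* A socket-side cached route plus the serial number seen at cache time. */
struct sk_cached_dst {
    struct sk_fib_node *node;
    unsigned int cookie;
};

static void cache_dst(struct sk_cached_dst *c, struct sk_fib_node *n)
{
    assert(c != NULL && n != NULL);
    c->node = n;
    c->cookie = n->sernum;
}

/* The cached entry is usable only while the node's serial is unchanged. */
static bool dst_still_valid(const struct sk_cached_dst *c)
{
    assert(c != NULL);
    return c->node != NULL && c->node->sernum == c->cookie;
}

/* Inserting an exception route under the node bumps its serial number. */
static void insert_exception_route(struct sk_fib_node *n)
{
    assert(n != NULL);
    n->sernum++;
}
```

After `insert_exception_route()` runs, every holder of an older cookie fails `dst_still_valid()` and has to look the route up again, at which point it can find the new exception entry.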
    • ipv6: Combine rt6_alloc_cow and rt6_alloc_clone · 8b9df265
      Martin KaFai Lau committed
      Prep work for creating RTF_CACHE on exception only.  After this
      patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY))
      is checked twice. This redundancy will be removed in a later patch.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST · 2647a9b0
      Martin KaFai Lau committed
      When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst.
      Also, rt6i_gateway is always set to the nexthop while the nexthop
      could be a gateway or the rt6i_dst.addr.
      
      After removing the rt6i_dst and rt6i_src dependency in the last patch,
      we also need to stop the caller from depending on rt6i_gateway and
      RTF_ANYCAST.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Remove external dependency on rt6i_dst and rt6i_src · fd0273d7
      Martin KaFai Lau committed
      This patch removes the assumptions that the returned rt is always
      a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the
      destination and source address.  The dst and src can be recovered from
      the calling site.
      
      We may consider renaming (rt6i_dst, rt6i_src) to
      (rt6i_key_dst, rt6i_key_src) later.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Clean up ipv6_select_ident() and ip6_fragment() · 286c2349
      Martin KaFai Lau committed
      This patch changes the ipv6_select_ident() signature to return a
      fragment id instead of taking a whole frag_hdr as a param to
      only set the frag_hdr->identification.
      
      It also cleans up ip6_fragment() to obtain the fragment id at the
      beginning instead of using multiple "if" checks later to test whether
      the fragment id has been generated.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
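The API shape that commit 286c2349 describes, a selector that returns the fragment id and a caller that fetches it once up front, can be illustrated as below. Everything here is a sketch under stated assumptions: `sk_frag_hdr`, `select_ident()` and `fill_fragments()` are hypothetical, and the counter is only a placeholder for whatever id generation the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified fragment header; only 'identification' matters here. */
struct sk_frag_hdr {
    uint8_t  nexthdr;
    uint8_t  reserved;
    uint16_t frag_off;
    uint32_t identification;
};

/* Placeholder id source: a simple counter, purely for illustration. */
static uint32_t sk_next_id = 1;

/* Returns an id instead of writing into a header passed by the caller. */
static uint32_t select_ident(void)
{
    return sk_next_id++;
}

/* Obtain the id once at the start, then stamp it on every fragment. */
static void fill_fragments(struct sk_frag_hdr *hdrs, size_t n)
{
    uint32_t frag_id = select_ident();

    assert(hdrs != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        hdrs[i].identification = frag_id;
}
```

Because the id is known before the loop starts, no later branch has to ask whether it has been generated yet.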
  2. 25 May, 2015 21 commits
  3. 23 May, 2015 4 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 36583eb5
      David S. Miller committed
      Conflicts:
      	drivers/net/ethernet/cadence/macb.c
      	drivers/net/phy/phy.c
      	include/linux/skbuff.h
      	net/ipv4/tcp.c
      	net/switchdev/switchdev.c
      
      Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD}
      renaming overlapping with net-next changes of various
      sorts.
      
      phy.c was a case of two changes, one adding a local
      variable to a function whilst the second was removing
      one.
      
      tcp.c overlapped a deadlock fix with the addition of new tcp_info
      statistic values.
      
      macb.c involved the addition of two zynq device entries.
      
      skbuff.h involved adding back ipv4_daddr to nf_bridge_info
      whilst net-next changes put two other existing members of
      that struct into a union.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'pktgen-new-scripts' · fa7912be
      David S. Miller committed
      Jesper Dangaard Brouer says:
      
      ====================
      pktgen: cleanups and introducing new samples/pktgen scripts
      
      v3:
       - Aborted the v2 send because it was not generating a diff stat
         (this is a bug in stg-mail when not run from the root directory)
      
      v2: address nitpicks from Cong Wang
       - Remove useless cat's, but keep them for old pgset()
       - Comment on: Due to pgctrl, cannot use exit code $? from grep
       - Use arithmetic compare in pktgen_sample03_burst_single_flow.sh
      
      This patchset is focused on making pktgen easier to use and better
      documented. It contains a number of documentation updates and minor
      changes to pktgen.  The major contribution is introduction of common
      helper function for sample scripts.
      
      Instead of the old pgset() function, three new shell functions for
      configuring the different components of pktgen are introduced:
       pg_ctrl(), pg_thread() and pg_set().
      
      The new functions correspond to pktgen's different components.
       * pg_ctrl()   control "pgctrl" (/proc/net/pktgen/pgctrl)
       * pg_thread() control the kernel threads and binding to devices
       * pg_set()    control setup of individual devices
      
      Helpers also provide consistent parameter parsing across the sample
      scripts.
      
      Usage example:
       ./pktgen_sample01_simple.sh -i eth41 -m 00:12:C0:02:AC:5A -d 192.168.41.2
      
      Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
        -i : ($DEV)       output interface/device (required)
        -s : ($PKT_SIZE)  packet size
        -d : ($DEST_IP)   destination IP
        -m : ($DST_MAC)   destination MAC-addr
        -t : ($THREADS)   threads to start
        -c : ($SKB_CLONE) SKB clones send before alloc new SKB
        -b : ($BURST)     HW level bursting of SKBs
        -v : ($VERBOSE)   verbose
        -x : ($DEBUG)     debug
      
      These scripts are borrowed from:
       https://github.com/netoptimizer/network-testing/tree/master/pktgen
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pktgen: add benchmark script pktgen_bench_xmit_mode_netif_receive.sh · 05a14d5e
      Jesper Dangaard Brouer committed
      The script pktgen_bench_xmit_mode_netif_receive.sh is a benchmark
      script that can be used for benchmarking part of the network stack.
      It can be used for improving performance or catching regressions in
      that area.
      
      The script was developed for benchmarking the ingress qdisc path; the
      original idea is by Alexei Starovoitov.  The script does not really need
      any hardware.  This is achieved via the recently introduced stack inject
      feature "xmit_mode netif_receive". See commit 62f64aed ("pktgen:
      introduce xmit_mode '<start_xmit|netif_receive>'").
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pktgen: add sample script pktgen_sample03_burst_single_flow.sh · 1d73ba16
      Jesper Dangaard Brouer committed
      Add the pktgen sample script pktgen_sample03_burst_single_flow.sh
      that demonstrates how to achieve maximum performance.
      
      If correctly tuned[1], single-CPU 10Gbit/s wirespeed with small packets is
      possible[2], which is 14.88Mpps.  The trick is to take advantage of the
      "burst" feature introduced in commit 38b2cf29 ("net: pktgen:
      packet bursting via skb->xmit_more").
      
      [1] http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html
      [2] http://netoptimizer.blogspot.dk/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>