1. 23 11月, 2011 1 次提交
  2. 19 11月, 2011 1 次提交
    • H
      ipv6: Remove all uses of LL_ALLOCATED_SPACE · a7ae1992
      Herbert Xu 提交于
      ipv6: Remove all uses of LL_ALLOCATED_SPACE
      
      The macro LL_ALLOCATED_SPACE was ill-conceived.  It applies the
      alignment to the sum of needed_headroom and needed_tailroom.  As
      the amount that is then reserved for head room is needed_headroom
      with alignment, this means that the tail room left may be too small.
      
      This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6
      with the macro LL_RESERVED_SPACE and direct reference to
      needed_tailroom.
      
      This also fixes the problem with needed_headroom changing between
      allocating the skb and reserving the head room.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a7ae1992
  3. 17 11月, 2011 3 次提交
  4. 16 11月, 2011 1 次提交
  5. 15 11月, 2011 2 次提交
  6. 14 11月, 2011 3 次提交
    • E
      neigh: new unresolved queue limits · 8b5c171b
      Eric Dumazet 提交于
      Le mercredi 09 novembre 2011 à 16:21 -0500, David Miller a écrit :
      > From: David Miller <davem@davemloft.net>
      > Date: Wed, 09 Nov 2011 16:16:44 -0500 (EST)
      >
      > > From: Eric Dumazet <eric.dumazet@gmail.com>
      > > Date: Wed, 09 Nov 2011 12:14:09 +0100
      > >
      > >> unres_qlen is the number of frames we are able to queue per unresolved
      > >> neighbour. Its default value (3) was never changed and is responsible
      > >> for strange drops, especially if IP fragments are used, or multiple
      > >> sessions start in parallel. Even a single tcp flow can hit this limit.
      > >  ...
      > >
      > > Ok, I've applied this, let's see what happens :-)
      >
      > Early answer, build fails.
      >
      > Please test build this patch with DECNET enabled and resubmit.  The
      > decnet neigh layer still refers to the removed ->queue_len member.
      >
      > Thanks.
      
      Ouch, this was fixed on one machine yesterday, but not the other one I
      used this morning, sorry.
      
      [PATCH V5 net-next] neigh: new unresolved queue limits
      
      unres_qlen is the number of frames we are able to queue per unresolved
      neighbour. Its default value (3) was never changed and is responsible
      for strange drops, especially if IP fragments are used, or multiple
      sessions start in parallel. Even a single tcp flow can hit this limit.
      
      $ arp -d 192.168.20.108 ; ping -c 2 -s 8000 192.168.20.108
      PING 192.168.20.108 (192.168.20.108) 8000(8028) bytes of data.
      8008 bytes from 192.168.20.108: icmp_seq=2 ttl=64 time=0.322 ms
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b5c171b
    • J
      ip6_tunnel: copy parms.name after register_netdevice · 731abb9c
      Josh Boyer 提交于
      Commit 1c5cae81 removed an explicit call to dev_alloc_name in ip6_tnl_create
      because register_netdevice will now create a valid name.  This works for the
      net_device itself.
      
      However the tunnel keeps a copy of the name in the parms structure for the
      ip6_tnl associated with the tunnel.  parms.name is set by copying the net_device
      name in ip6_tnl_dev_init_gen.  That function is called from ip6_tnl_dev_init in
      ip6_tnl_create, but it is done before register_netdevice is called so the name
      is set to a bogus value in the parms.name structure.
      
      This shows up if you do a simple tunnel add, followed by a tunnel show:
      
      [root@localhost ~]# ip -6 tunnel add remote fec0::100 local fec0::200
      [root@localhost ~]# ip -6 tunnel show
      ip6tnl0: ipv6/ipv6 remote :: local :: encaplimit 0 hoplimit 0 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
      ip6tnl%d: ipv6/ipv6 remote fec0::100 local fec0::200 encaplimit 4 hoplimit 64 tclass 0x00 flowlabel 0x00000 (flowinfo 0x00000000)
      [root@localhost ~]#
      
      Fix this by moving the strcpy out of ip6_tnl_dev_init_gen, and calling it after
      register_netdevice has successfully returned.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJosh Boyer <jwboyer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      731abb9c
    • E
      ipv6: reduce percpu needs for icmpv6msg mibs · 2a24444f
      Eric Dumazet 提交于
      Reading /proc/net/snmp6 on a machine with a lot of cpus is very
      expensive (can be ~88000 us).
      
      This is because ICMPV6MSG MIB uses 4096 bytes per cpu, and folding
      values for all possible cpus can read 16 Mbytes of memory (32MBytes on
      non x86 arches)
      
      ICMP messages are not considered as fast path on a typical server, and
      eventually few cpus handle them anyway. We can afford an atomic
      operation instead of using percpu data.
      
      This saves 4096 bytes per cpu and per network namespace.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a24444f
  7. 13 11月, 2011 1 次提交
  8. 10 11月, 2011 3 次提交
    • E
      ipv4: PKTINFO doesnt need dst reference · d826eb14
      Eric Dumazet 提交于
      Le lundi 07 novembre 2011 à 15:33 +0100, Eric Dumazet a écrit :
      
      > At least, in recent kernels we dont change dst->refcnt in forwarding
      > patch (usinf NOREF skb->dst)
      >
      > One particular point is the atomic_inc(dst->refcnt) we have to perform
      > when queuing an UDP packet if socket asked PKTINFO stuff (for example a
      > typical DNS server has to setup this option)
      >
      > I have one patch somewhere that stores the information in skb->cb[] and
      > avoid the atomic_{inc|dec}(dst->refcnt).
      >
      
      OK I found it, I did some extra tests and believe its ready.
      
      [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference
      
      When a socket uses IP_PKTINFO notifications, we currently force a dst
      reference for each received skb. Reader has to access dst to get needed
      information (rt_iif & rt_spec_dst) and must release dst reference.
      
      We also forced a dst reference if skb was put in socket backlog, even
      without IP_PKTINFO handling. This happens under stress/load.
      
      We can instead store the needed information in skb->cb[], so that only
      softirq handler really access dst, improving cache hit ratios.
      
      This removes two atomic operations per packet, and false sharing as
      well.
      
      On a benchmark using a mono threaded receiver (doing only recvmsg()
      calls), I can reach 720.000 pps instead of 570.000 pps.
      
      IP_PKTINFO is typically used by DNS servers, and any multihomed aware
      UDP application.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d826eb14
    • N
      ah: Read nexthdr value before overwriting it in ahash input callback. · b7ea81a5
      Nick Bowler 提交于
      The AH4/6 ahash input callbacks read out the nexthdr field from the AH
      header *after* they overwrite that header.  This is obviously not going
      to end well.  Fix it up.
      Signed-off-by: NNick Bowler <nbowler@elliptictech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7ea81a5
    • N
      ah: Correctly pass error codes in ahash output callback. · 069294e8
      Nick Bowler 提交于
      The AH4/6 ahash output callbacks pass nexthdr to xfrm_output_resume
      instead of the error code.  This appears to be a copy+paste error from
      the input case, where nexthdr is expected.  This causes the driver to
      continuously add AH headers to the datagram until either an allocation
      fails and the packet is dropped or the ahash driver hits a synchronous
      fallback and the resulting monstrosity is transmitted.
      
      Correct this issue by simply passing the error code unadulterated.
      Signed-off-by: NNick Bowler <nbowler@elliptictech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      069294e8
  9. 09 11月, 2011 4 次提交
  10. 02 11月, 2011 1 次提交
  11. 01 11月, 2011 3 次提交
  12. 30 10月, 2011 1 次提交
    • A
      ipv6: fix route lookup in addrconf_prefix_rcv() · 14ef37b6
      Andreas Hofmeister 提交于
      The route lookup to find a previously auto-configured route for a prefixes used
      to use rt6_lookup(), with the prefix from the RA used as an address. However,
      that kind of lookup ignores routing tables, the prefix length and route flags,
      so when there were other matching routes, even in different tables and/or with
      a different prefix length, the wrong route would be manipulated.
      
      Now, a new function "addrconf_get_prefix_route()" is used for the route lookup,
      which searches in RT6_TABLE_PREFIX and takes the prefix-length and route flags
      into account.
      Signed-off-by: NAndreas Hofmeister <andi@collax.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14ef37b6
  13. 29 10月, 2011 1 次提交
  14. 28 10月, 2011 1 次提交
  15. 27 10月, 2011 1 次提交
  16. 25 10月, 2011 1 次提交
    • A
      ipv6: Do not use routes from locally generated RAs · 9f56220f
      Andreas Hofmeister 提交于
      When hybrid mode is enabled (accept_ra == 2), the kernel also sees RAs
      generated locally. This is useful since it allows the kernel to auto-configure
      its own interface addresses.
      
      However, if 'accept_ra_defrtr' and/or 'accept_ra_rtr_pref' are set and the
      locally generated RAs announce the default route and/or other route information,
      the kernel happily inserts bogus routes with its own address as gateway.
      
      With this patch, adding routes from an RA will be skiped when the RAs source
      address matches any local address, just as if 'accept_ra_defrtr' and
      'accept_ra_rtr_pref' were set to 0.
      Signed-off-by: NAndreas Hofmeister <andi@collax.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f56220f
  17. 24 10月, 2011 1 次提交
  18. 21 10月, 2011 2 次提交
  19. 20 10月, 2011 1 次提交
  20. 19 10月, 2011 4 次提交
  21. 18 10月, 2011 1 次提交
  22. 14 10月, 2011 1 次提交
    • E
      net: more accurate skb truesize · 87fb4b7b
      Eric Dumazet 提交于
      skb truesize currently accounts for sk_buff struct and part of skb head.
      kmalloc() roundings are also ignored.
      
      Considering that skb_shared_info is larger than sk_buff, its time to
      take it into account for better memory accounting.
      
      This patch introduces SKB_TRUESIZE(X) macro to centralize various
      assumptions into a single place.
      
      At skb alloc phase, we put skb_shared_info struct at the exact end of
      skb head, to allow a better use of memory (lowering number of
      reallocations), since kmalloc() gives us power-of-two memory blocks.
      
      Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are
      aligned to cache lines, as before.
      
      Note: This patch might trigger performance regressions because of
      misconfigured protocol stacks, hitting per socket or global memory
      limits that were previously not reached. But its a necessary step for a
      more accurate memory accounting.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <ak@linux.intel.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      87fb4b7b
  23. 11 10月, 2011 1 次提交
  24. 05 10月, 2011 1 次提交