1. 12 Dec, 2011 1 commit
  2. 10 Nov, 2011 1 commit
    • ipv4: PKTINFO doesnt need dst reference · d826eb14
      Eric Dumazet committed
      On Monday 07 November 2011 at 15:33 +0100, Eric Dumazet wrote:
      
      > At least, in recent kernels we don't change dst->refcnt in the forwarding
      > path (using NOREF skb->dst)
      >
      > One particular point is the atomic_inc(dst->refcnt) we have to perform
      > when queuing a UDP packet if the socket asked for PKTINFO stuff (for example a
      > typical DNS server has to set up this option)
      >
      > I have one patch somewhere that stores the information in skb->cb[] and
      > avoids the atomic_{inc|dec}(dst->refcnt).
      >
      
      OK I found it, I did some extra tests and believe it's ready.
      
      [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference
      
      When a socket uses IP_PKTINFO notifications, we currently force a dst
      reference for each received skb. Reader has to access dst to get needed
      information (rt_iif & rt_spec_dst) and must release dst reference.
      
      We also forced a dst reference if skb was put in socket backlog, even
      without IP_PKTINFO handling. This happens under stress/load.
      
      We can instead store the needed information in skb->cb[], so that only
      the softirq handler really accesses dst, improving cache hit ratios.
      
      This removes two atomic operations per packet, and false sharing as
      well.
      
      On a benchmark using a single-threaded receiver (doing only recvmsg()
      calls), I can reach 720,000 pps instead of 570,000 pps.
      
      IP_PKTINFO is typically used by DNS servers, and any multihomed aware
      UDP application.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
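For context on what the receive path above has to serve, here is a minimal userspace sketch of how a UDP application might request and read IP_PKTINFO data. It uses only the standard sockets API; the helper names `enable_pktinfo` and `recv_with_pktinfo` are hypothetical and not part of the patch.

```c
#define _GNU_SOURCE   /* makes struct in_pktinfo visible on glibc in strict modes */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Ask the kernel to attach an IP_PKTINFO control message to each datagram. */
int enable_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/*
 * Receive one datagram and copy out its struct in_pktinfo.
 * Returns the payload length, or -1 on error or when no IP_PKTINFO
 * control message was attached.
 */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len, struct in_pktinfo *out)
{
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char raw[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;
    } ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.raw;
    msg.msg_controllen = sizeof(ctrl.raw);

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    /* Walk the ancillary data looking for the IP_PKTINFO record. */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return n;
        }
    }
    return -1;
}
```

A multihomed server would typically read `ipi_ifindex` and `ipi_addr` from the result to decide which interface and source address to reply from.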
  3. 22 Oct, 2011 1 commit
  4. 21 Oct, 2011 1 commit
  5. 08 Aug, 2011 1 commit
  6. 29 Apr, 2011 1 commit
    • inet: add RCU protection to inet->opt · f6d8bd05
      Eric Dumazet committed
      We lack proper synchronization to manipulate inet->opt ip_options.
      
      Problem is ip_make_skb() calls ip_setup_cork() and
      ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
      without any protection against another thread manipulating inet->opt.
      
      Another thread can change inet->opt pointer and free old one under us.
      
      Use RCU to protect inet->opt (changed to inet->inet_opt).
      
      Instead of handling atomic refcounts, just copy ip_options when
      necessary, to avoid cache line dirtying.
      
      We can't insert an rcu_head in struct ip_options since it's included in
      skb->cb[], so this patch is large because I had to introduce a new
      ip_options_rcu structure.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 23 Apr, 2011 1 commit
  8. 26 Oct, 2010 1 commit
  9. 14 Sep, 2010 1 commit
  10. 24 Jun, 2010 1 commit
  11. 11 Jun, 2010 1 commit
    • ip: ip_ra_control() rcu fix · 592fcb9d
      Eric Dumazet committed
      commit 66018506 (ip: Router Alert RCU conversion) introduced RCU
      lookups to ip_call_ra_chain(). It missed a proper deinit phase:
      When ip_ra_control() deletes an ip_ra_chain, it should make sure
      ip_call_ra_chain() users can not start to use socket during the rcu
      grace period. It should also delay the sock_put() after the grace
      period, or we risk a premature socket freeing and corruptions, as
      raw sockets are not rcu protected yet.
      
      This delay avoids using expensive atomic_inc_not_zero() in
      ip_call_ra_chain().
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 08 Jun, 2010 1 commit
  13. 29 Apr, 2010 1 commit
  14. 02 Apr, 2010 1 commit
  15. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  16. 12 Jan, 2010 1 commit
  17. 29 Oct, 2009 1 commit
  18. 20 Oct, 2009 2 commits
  19. 19 Oct, 2009 1 commit
    • inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet committed
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfer fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
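The point about macros and name clashes can be illustrated with a small userspace sketch. The struct and function names below (`demo_sock_common`, `demo_inet_sock`, `demo_peer`) are hypothetical stand-ins, not the kernel definitions; the sketch only assumes that a prefixed field name is later aliased to a member of an embedded common struct.

```c
#include <stdint.h>

/* Hypothetical stand-in for the read-mostly common part of a socket. */
struct demo_sock_common {
    uint32_t skc_daddr;
    uint32_t skc_rcv_saddr;
};

/* Hypothetical stand-in for the inet socket that embeds it. */
struct demo_inet_sock {
    struct demo_sock_common common;
    uint16_t inet_id;   /* a field that stays in the outer struct */
};

/*
 * Because the alias carries the inet_ prefix, it cannot collide with
 * unrelated identifiers. A macro named plain `daddr` would rewrite every
 * local variable or struct member called daddr in any file that sees it.
 */
#define inet_daddr     common.skc_daddr
#define inet_rcv_saddr common.skc_rcv_saddr

uint32_t demo_peer(const struct demo_inet_sock *inet)
{
    uint32_t daddr = inet->inet_daddr;  /* the local `daddr` is left alone */
    return daddr;
}
```

Callers keep writing `inet->inet_daddr` while the storage moves into the embedded struct, which is why the rename has to land before the layout change.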
  20. 01 Oct, 2009 1 commit
  21. 25 Sep, 2009 1 commit
  22. 03 Jun, 2009 1 commit
  23. 02 Jun, 2009 2 commits
  24. 20 Nov, 2008 1 commit
  25. 17 Nov, 2008 1 commit
  26. 03 Nov, 2008 1 commit
  27. 01 Oct, 2008 1 commit
  28. 12 Jun, 2008 1 commit
  29. 29 Apr, 2008 1 commit
  30. 28 Apr, 2008 1 commit
  31. 10 Apr, 2008 1 commit
    • [IPV4]: Fix byte value boundary check in do_ip_getsockopt(). · 951e07c9
      David S. Miller committed
      This fixes kernel bugzilla 10371.
      
      As reported by M.Piechaczek@osmosys.tv, if we try to grab a
      char sized socket option value, as in:
      
        unsigned char ttl = 255;
        socklen_t     len = sizeof(ttl);
        setsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, len);
      
        getsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len);
      
      The ttl returned will be wrong on big-endian, and on both little-
      endian and big-endian the next three bytes in userspace are written
      with garbage.
      
      It's because of this test in do_ip_getsockopt():
      
      	if (len < sizeof(int) && len > 0 && val>=0 && val<255) {
      
      It should allow a 'val' of 255 to pass here, but it doesn't so it
      copies a full 'int' back to userspace.
      
      On little-endian that will write the correct value into the location
      but it spams on the next three bytes in userspace.  On big endian it
      writes the wrong value into the location and spams the next three
      bytes.
      Signed-off-by: David S. Miller <davem@davemloft.net>
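The boundary problem can be reproduced in isolation. The sketch below models only the condition quoted in the message, not the surrounding kernel function. The helper names are hypothetical, and the corrected form (`val <= 255`) is an assumption inferred from the statement that a `val` of 255 should pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* The test as quoted in the commit message: val == 255 is rejected. */
bool byte_path_old(size_t len, int val)
{
    return len < sizeof(int) && len > 0 && val >= 0 && val < 255;
}

/* Assumed correction: make the upper bound inclusive so 255 is accepted. */
bool byte_path_fixed(size_t len, int val)
{
    return len < sizeof(int) && len > 0 && val >= 0 && val <= 255;
}
```

With `len == 1` and `val == 255`, the old predicate is false, so the caller would not take the single-byte copy path. The fixed predicate is true for every value that fits in an `unsigned char`.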
  32. 26 Mar, 2008 1 commit
  33. 25 Mar, 2008 3 commits
  34. 18 Mar, 2008 1 commit
  35. 06 Mar, 2008 1 commit
  36. 13 Feb, 2008 1 commit
    • [IPV4]: Remove IP_TOS setting privilege checks. · e4f8b5d4
      David S. Miller committed
      Various RFCs have all sorts of things to say about the CS field of the
      DSCP value.  In particular they try to make the distinction between
      values that should be used by "user applications" and things like
      routing daemons.
      
      This seems to have influenced the CAP_NET_ADMIN check which exists for
      IP_TOS socket option settings, but in fact it has an off-by-one error
      so it wasn't allowing CS5 which is meant for "user applications" as
      well.
      
      Further adding to the inconsistency and brokenness here, IPV6 does not
      validate the DSCP values specified for the IPV6_TCLASS socket option.
      
      The real actual uses of these TOS values are system specific in the
      final analysis, and these RFC recommendations are just that, "a
      recommendation".  In fact the standards very purposefully use
      "SHOULD" and "SHOULD NOT" when describing how these values can be
      used.
      
      In the final analysis the only clean way to provide consistency here
      is to remove the CAP_NET_ADMIN check.  The alternatives just don't
      work out:
      
      1) If we add the CAP_NET_ADMIN check to ipv6, this can break existing
         setups.
      
      2) If we just fix the off-by-one error in the class comparison in
         IPV4, certain DSCP values can be used in IPV6 but not IPV4 by
         default.  So people will just ask for a sysctl asking to
         override that.
      
      I checked several other freely available kernel trees and they
      do not make any privilege checks in this area like we do.  For
      the BSD stacks, this goes back all the way to Stevens Volume 2
      and beyond.
      Signed-off-by: David S. Miller <davem@davemloft.net>
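As an illustration of the userspace side of this change, the following sketch sets a DSCP class selector through IP_TOS. The helper names (`dscp_to_tos`, `set_dscp`, `get_tos`) and the `DSCP_CS5` constant are hypothetical. The sketch relies only on the standard layout in which DSCP occupies the upper six bits of the TOS byte, so CS5 (decimal 40) maps to a TOS value of 0xa0.

```c
#include <netinet/in.h>
#include <sys/socket.h>

enum { DSCP_CS5 = 40 };  /* class selector 5, binary 101000 */

/* DSCP sits in the upper six bits of the TOS byte; the low two bits are ECN. */
int dscp_to_tos(int dscp)
{
    return (dscp & 0x3f) << 2;
}

/* Apply a DSCP value to a socket via IP_TOS; returns 0 on success, -1 on error. */
int set_dscp(int fd, int dscp)
{
    int tos = dscp_to_tos(dscp);
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Read back the current TOS byte, or -1 on error. */
int get_tos(int fd)
{
    int tos = 0;
    socklen_t len = sizeof(tos);
    if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) < 0)
        return -1;
    return tos;
}
```

On a kernel that includes this change, `set_dscp(fd, DSCP_CS5)` on a UDP socket is expected to succeed without CAP_NET_ADMIN.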