1. 16 4月, 2012 2 次提交
  2. 22 2月, 2012 1 次提交
  3. 13 2月, 2012 1 次提交
    • J
      net: implement IP_RECVTOS for IP_PKTOPTIONS · 4c507d28
      Jiri Benc 提交于
      Currently, it is not easily possible to get TOS/DSCP value of packets from
      an incoming TCP stream. The mechanism is there, IP_PKTOPTIONS getsockopt
      with IP_RECVTOS set, the same way as incoming TTL can be queried. This is
      not actually implemented for TOS, though.
      
      This patch adds this functionality, both for IPv4 (IP_PKTOPTIONS) and IPv6
      (IPV6_2292PKTOPTIONS). For IPv4, like in the IP_RECVTTL case, the value of
      the TOS field is stored from the other party's ACK.
      
      This is needed for proxies which require DSCP transparency. One such example
      is at http://zph.bratcheda.org/.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c507d28
  4. 09 2月, 2012 1 次提交
    • E
      ipv4: Implement IP_UNICAST_IF socket option. · 76e21053
      Erich E. Hoover 提交于
      The IP_UNICAST_IF feature is needed by the Wine project.  This patch
      implements the feature by setting the outgoing interface in a similar
      fashion to that of IP_MULTICAST_IF.  A separate option is needed to
      handle this feature since the existing options do not provide all of
      the characteristics required by IP_UNICAST_IF, a summary is provided
      below.
      
      SO_BINDTODEVICE:
      * SO_BINDTODEVICE requires administrative privileges, IP_UNICAST_IF
      does not.  From reading some old mailing list articles my
      understanding is that SO_BINDTODEVICE requires administrative
      privileges because it can override the administrator's routing
      settings.
      * The SO_BINDTODEVICE option restricts both outbound and inbound
      traffic, IP_UNICAST_IF only impacts outbound traffic.
      
      IP_PKTINFO:
      * Since IP_PKTINFO and IP_UNICAST_IF are independent options,
      implementing IP_UNICAST_IF with IP_PKTINFO will likely break some
      applications.
      * Implementing IP_UNICAST_IF on top of IP_PKTINFO significantly
      complicates the Wine codebase and reduces the socket performance
      (doing this requires a lot of extra communication between the
      "server" and "user" layers).
      
      bind():
      * bind() does not work on broadcast packets, IP_UNICAST_IF is
      specifically intended to work with broadcast packets.
      * Like SO_BINDTODEVICE, bind() restricts both outbound and inbound
      traffic.
      Signed-off-by: NErich E. Hoover <ehoover@mines.edu>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76e21053
  5. 12 12月, 2011 1 次提交
  6. 10 11月, 2011 1 次提交
    • E
      ipv4: PKTINFO doesnt need dst reference · d826eb14
      Eric Dumazet 提交于
      Le lundi 07 novembre 2011 à 15:33 +0100, Eric Dumazet a écrit :
      
      > At least, in recent kernels we dont change dst->refcnt in forwarding
      > patch (usinf NOREF skb->dst)
      >
      > One particular point is the atomic_inc(dst->refcnt) we have to perform
      > when queuing an UDP packet if socket asked PKTINFO stuff (for example a
      > typical DNS server has to setup this option)
      >
      > I have one patch somewhere that stores the information in skb->cb[] and
      > avoid the atomic_{inc|dec}(dst->refcnt).
      >
      
      OK I found it, I did some extra tests and believe its ready.
      
      [PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference
      
      When a socket uses IP_PKTINFO notifications, we currently force a dst
      reference for each received skb. Reader has to access dst to get needed
      information (rt_iif & rt_spec_dst) and must release dst reference.
      
      We also forced a dst reference if skb was put in socket backlog, even
      without IP_PKTINFO handling. This happens under stress/load.
      
      We can instead store the needed information in skb->cb[], so that only
      softirq handler really access dst, improving cache hit ratios.
      
      This removes two atomic operations per packet, and false sharing as
      well.
      
      On a benchmark using a mono threaded receiver (doing only recvmsg()
      calls), I can reach 720.000 pps instead of 570.000 pps.
      
      IP_PKTINFO is typically used by DNS servers, and any multihomed aware
      UDP application.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d826eb14
  7. 22 10月, 2011 1 次提交
  8. 21 10月, 2011 1 次提交
  9. 08 8月, 2011 1 次提交
  10. 29 4月, 2011 1 次提交
    • E
      inet: add RCU protection to inet->opt · f6d8bd05
      Eric Dumazet 提交于
      We lack proper synchronization to manipulate inet->opt ip_options
      
      Problem is ip_make_skb() calls ip_setup_cork() and
      ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
      without any protection against another thread manipulating inet->opt.
      
      Another thread can change inet->opt pointer and free old one under us.
      
      Use RCU to protect inet->opt (changed to inet->inet_opt).
      
      Instead of handling atomic refcounts, just copy ip_options when
      necessary, to avoid cache line dirtying.
      
      We cant insert an rcu_head in struct ip_options since its included in
      skb->cb[], so this patch is large because I had to introduce a new
      ip_options_rcu structure.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6d8bd05
  11. 23 4月, 2011 1 次提交
  12. 26 10月, 2010 1 次提交
  13. 14 9月, 2010 1 次提交
  14. 24 6月, 2010 1 次提交
  15. 11 6月, 2010 1 次提交
    • E
      ip: ip_ra_control() rcu fix · 592fcb9d
      Eric Dumazet 提交于
      commit 66018506 (ip: Router Alert RCU conversion) introduced RCU
      lookups to ip_call_ra_chain(). It missed proper deinit phase :
      When ip_ra_control() deletes an ip_ra_chain, it should make sure
      ip_call_ra_chain() users can not start to use socket during the rcu
      grace period. It should also delay the sock_put() after the grace
      period, or we risk a premature socket freeing and corruptions, as
      raw sockets are not rcu protected yet.
      
      This delay avoids using expensive atomic_inc_not_zero() in
      ip_call_ra_chain().
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      592fcb9d
  16. 08 6月, 2010 1 次提交
  17. 29 4月, 2010 1 次提交
  18. 02 4月, 2010 1 次提交
  19. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  20. 12 1月, 2010 1 次提交
  21. 29 10月, 2009 1 次提交
  22. 20 10月, 2009 2 次提交
  23. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  24. 01 10月, 2009 1 次提交
  25. 25 9月, 2009 1 次提交
  26. 03 6月, 2009 1 次提交
  27. 02 6月, 2009 2 次提交
  28. 20 11月, 2008 1 次提交
  29. 17 11月, 2008 1 次提交
  30. 03 11月, 2008 1 次提交
  31. 01 10月, 2008 1 次提交
  32. 12 6月, 2008 1 次提交
  33. 29 4月, 2008 1 次提交
  34. 28 4月, 2008 1 次提交
  35. 10 4月, 2008 1 次提交
    • D
      [IPV4]: Fix byte value boundary check in do_ip_getsockopt(). · 951e07c9
      David S. Miller 提交于
      This fixes kernel bugzilla 10371.
      
      As reported by M.Piechaczek@osmosys.tv, if we try to grab a
      char sized socket option value, as in:
      
        unsigned char ttl = 255;
        socklen_t     len = sizeof(ttl);
        setsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len);
      
        getsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len);
      
      The ttl returned will be wrong on big-endian, and on both little-
      endian and big-endian the next three bytes in userspace are written
      with garbage.
      
      It's because of this test in do_ip_getsockopt():
      
      	if (len < sizeof(int) && len > 0 && val>=0 && val<255) {
      
      It should allow a 'val' of 255 to pass here, but it doesn't so it
      copies a full 'int' back to userspace.
      
      On little-endian that will write the correct value into the location
      but it spams on the next three bytes in userspace.  On big endian it
      writes the wrong value into the location and spams the next three
      bytes.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      951e07c9
  36. 26 3月, 2008 1 次提交
  37. 25 3月, 2008 1 次提交