1. 22 Jun, 2011 3 commits
  2. 21 Jun, 2011 1 commit
  3. 19 Jun, 2011 1 commit
  4. 18 Jun, 2011 2 commits
    • E
      inet_diag: fix inet_diag_bc_audit() · eeb14972
      Eric Dumazet committed
      A malicious user or buggy application can inject code and trigger an
      infinite loop in inet_diag_bc_audit()
      
      Also make sure each instruction is aligned on 4 bytes boundary, to avoid
      unaligned accesses.
      Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eeb14972
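The two checks described in this commit (every instruction must advance, and every offset must be a multiple of 4) can be illustrated with a small userspace sketch. This is not the kernel's inet_diag_bc_audit(): the `struct bc_op` layout, the function name `bc_audit`, and the exact bounds are assumptions chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte instruction header: `yes` and `no` are forward
 * byte offsets taken when the condition matches or fails. */
struct bc_op {
    uint8_t  code;
    uint8_t  yes;
    uint16_t no;
};

/* Walk the program along the `yes` path.
 * Returns 1 if every visited instruction is well-formed, 0 otherwise. */
int bc_audit(const unsigned char *bc, size_t len)
{
    assert(bc != NULL || len == 0);

    while (len > 0) {
        struct bc_op op;

        if (len < sizeof(op))
            return 0;
        memcpy(&op, bc, sizeof(op));   /* safe for any alignment of bc */

        /* An offset smaller than one instruction never advances (this is
         * the infinite-loop case); an offset that is not a multiple of 4
         * would leave the next instruction unaligned. */
        if (op.yes < sizeof(op) || op.yes > len || (op.yes & 3))
            return 0;
        /* Assumption: `no` may point up to 4 bytes past the end,
         * meaning "reject". */
        if (op.no < sizeof(op) || op.no > len + 4 || (op.no & 3))
            return 0;

        bc  += op.yes;
        len -= op.yes;
    }
    return 1;
}
```

Because `op.yes` is at least 4 on every iteration, `len` strictly decreases and the loop terminates for any input.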
    • E
      net: rfs: enable RFS before first data packet is received · 1eddcead
      Eric Dumazet committed
      On Thursday 16 June 2011 at 23:38 -0400, David Miller wrote:
      > From: Ben Hutchings <bhutchings@solarflare.com>
      > Date: Fri, 17 Jun 2011 00:50:46 +0100
      >
      > > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote:
      > >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
      > >>  			goto discard;
      > >>
      > >>  		if (nsk != sk) {
      > >> +			sock_rps_save_rxhash(nsk, skb->rxhash);
      > >>  			if (tcp_child_process(sk, nsk, skb)) {
      > >>  				rsk = nsk;
      > >>  				goto reset;
      > >>
      > >
      > > I haven't tried this, but it looks reasonable to me.
      > >
      > > What about IPv6?  The logic in tcp_v6_do_rcv() looks very similar.
      >
      > Indeed ipv6 side needs the same fix.
      >
      > Eric please add that part and resubmit.  And in fact I might stick
      > this into net-2.6 instead of net-next-2.6
      >
      
      OK, here is the net-2.6 based one then, thanks !
      
      [PATCH v2] net: rfs: enable RFS before first data packet is received
      
      First packet received on a passive tcp flow is not correctly RFS
      steered.
      
      One sock_rps_record_flow() call is missing in inet_accept()
      
      But before that, we also must record rxhash when child socket is setup.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Tom Herbert <therbert@google.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@conan.davemloft.net>
      1eddcead
  5. 16 Jun, 2011 5 commits
  6. 12 Jun, 2011 1 commit
    • E
      snmp: reduce percpu needs by 50% · 8f0ea0fe
      Eric Dumazet committed
      SNMP mibs use two percpu arrays, one used in BH context, another in USER
      context. With increasing number of cpus in machines, and fact that ipv6
      uses per network device ipstats_mib, this is consuming a lot of memory
      if many network devices are registered.
      
      commit be281e55 (ipv6: reduce per device ICMP mib sizes) shrank
      percpu needs for ipv6, but we can reduce memory use a bit more.
      
      With recent percpu infrastructure (irqsafe_cpu_inc() ...), we no longer
      need this BH/USER separation since we can update counters in a single
      x86 instruction, regardless of the BH/USER context.
      
      Other arches than x86 might need to disable irq in their
      irqsafe_cpu_inc() implementation : If this happens to be a problem, we
      can make SNMP_ARRAY_SZ arch dependent, but a previous poll
      ( https://lkml.org/lkml/2011/3/17/174 ) to arch maintainers did not
      raise strong opposition.
      
      Only on 32bit arches, we need to disable BH for 64bit counters updates
      done from USER context (currently used for IP MIB)
      
      This also reduces vmlinux size :
      
      1) x86_64 build
      $ size vmlinux.before vmlinux.after
         text	   data	    bss	    dec	    hex	filename
      7853650	1293772	1896448	11043870	 a8841e	vmlinux.before
      7850578	1293772	1896448	11040798	 a8781e	vmlinux.after
      
      2) i386  build
      $ size vmlinux.before vmlinux.afterpatch
         text	   data	    bss	    dec	    hex	filename
      6039335	 635076	3670016	10344427	 9dd7eb	vmlinux.before
      6037342	 635076	3670016	10342434	 9dd022	vmlinux.afterpatch
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Tejun Heo <tj@kernel.org>
      CC: Christoph Lameter <cl@linux-foundation.org>
      CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      CC: linux-arch@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f0ea0fe
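The idea of keeping one counter array per CPU, instead of separate BH and USER arrays, can be sketched in userspace as follows. This is only an analogy: the CPU index is passed in explicitly, a relaxed C11 atomic add stands in for an interrupt-safe per-CPU increment such as irqsafe_cpu_inc(), and all names and sizes ending in `_SKETCH` or starting with `mib_` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count */
#define MIB_MAX_SKETCH 8   /* hypothetical number of counters */

/* One array per CPU; no second copy for a different execution context. */
struct mib_sketch {
    atomic_ulong counters[MIB_MAX_SKETCH];
};

static struct mib_sketch percpu_mib[NR_CPUS_SKETCH];

/* Stand-in for a single-instruction, interrupt-safe per-CPU increment. */
void mib_inc(unsigned int cpu, unsigned int field)
{
    assert(cpu < NR_CPUS_SKETCH && field < MIB_MAX_SKETCH);
    atomic_fetch_add_explicit(&percpu_mib[cpu].counters[field], 1UL,
                              memory_order_relaxed);
}

/* Readers fold the per-CPU values into one total. */
unsigned long mib_fold(unsigned int field)
{
    unsigned long sum = 0;

    assert(field < MIB_MAX_SKETCH);
    for (unsigned int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += atomic_load_explicit(&percpu_mib[cpu].counters[field],
                                    memory_order_relaxed);
    return sum;
}
```

With a single array per CPU, the memory cost is one `struct mib_sketch` per CPU rather than two, which is the 50% reduction the commit title refers to.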
  7. 10 Jun, 2011 2 commits
  8. 09 Jun, 2011 3 commits
    • E
      net: pmtu_expires fixes · fe6fe792
      Eric Dumazet committed
      commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer)
      added some racy peer->pmtu_expires accesses.
      
      As its value can be changed by another cpu/thread, we should be more
      careful, reading its value once.
      
      Add peer_pmtu_expired() and peer_pmtu_cleaned() helpers
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe6fe792
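The "read the value once" pattern behind these helpers can be sketched in userspace with C11 atomics. This is an analogue under stated assumptions, not the kernel code: `struct peer_sketch`, the `_sketch` function names, and the explicit `now` parameter (standing in for the kernel's tick counter) are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct peer_sketch {
    atomic_ulong pmtu_expires;   /* 0 means "nothing pending" */
};

/* True only if the deadline has passed AND this caller is the one that
 * cleared it. The field is loaded exactly once into `orig`, so every
 * test below sees the same value even if another thread changes it. */
bool peer_pmtu_expired_sketch(struct peer_sketch *peer, unsigned long now)
{
    assert(peer != NULL);
    unsigned long orig = atomic_load(&peer->pmtu_expires);

    return orig != 0 &&
           (long)(now - orig) >= 0 &&   /* wrap-tolerant "now >= orig" */
           atomic_compare_exchange_strong(&peer->pmtu_expires, &orig, 0UL);
}

/* True if a non-zero deadline was present and this caller cleared it. */
bool peer_pmtu_cleaned_sketch(struct peer_sketch *peer)
{
    assert(peer != NULL);
    unsigned long orig = atomic_load(&peer->pmtu_expires);

    return orig != 0 &&
           atomic_compare_exchange_strong(&peer->pmtu_expires, &orig, 0UL);
}
```

The compare-and-exchange makes the clear conditional on the value still being the one that was tested, so two racing callers cannot both act on the same expiry.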
    • E
      inetpeer: remove unused list · 4b9d9be8
      Eric Dumazet committed
      Andi Kleen and Tim Chen reported huge contention on inetpeer
      unused_peers.lock, on memcached workload on a 40 core machine, with
      disabled route cache.
      
      It appears we constantly flip peers refcnt between 0 and 1 values, and
      we must insert/remove peers from unused_peers.list, holding a contended
      spinlock.
      
      Remove this list completely and perform a garbage collection on-the-fly,
      at lookup time, using the expired nodes we met during the tree
      traversal.
      
      This removes a lot of code, makes locking more standard, and obsoletes
      two sysctls (inet_peer_gc_mintime and inet_peer_gc_maxtime). This also
      removes two pointers in inet_peer structure.
      
      There is still a false sharing effect because refcnt is in first cache
      line of object [where the links and keys used by lookups are located], we
      might move it at the end of inet_peer structure to let this first cache
      line mostly read by cpus.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Andi Kleen <andi@firstfloor.org>
      CC: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b9d9be8
    • J
      tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side · 9ad7c049
      Jerry Chu committed
      This patch lowers the default initRTO from 3secs to 1sec per
      RFC2988bis. It falls back to 3secs if the SYN or SYN-ACK packet
      has been retransmitted, AND the TCP timestamp option is not on.
      
      It also adds support to take RTT sample during 3WHS on the passive
      open side, just like its active open counterpart, and uses it, if
      valid, to seed the initRTO for the data transmission phase.
      
      The patch also resets ssthresh to its initial default at the
      beginning of the data transmission phase, and reduces cwnd to 1 if
      there has been MORE THAN ONE retransmission during 3WHS per RFC5681.
      Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9ad7c049
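The decision rules in this commit message can be written out as a small sketch. It is not the kernel implementation: the struct and function names are hypothetical, and seeding the RTO as three times the first RTT sample is an assumption taken from the first-measurement rule in RFC 6298 (SRTT = R, RTTVAR = R/2, RTO = SRTT + 4 * RTTVAR), ignoring clock granularity and any minimum or maximum clamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INIT_RTO_MS     1000u   /* lowered default initRTO */
#define FALLBACK_RTO_MS 3000u   /* previous default, used as fallback */

/* Hypothetical summary of what happened during the three-way handshake. */
struct handshake_sketch {
    unsigned int retrans;        /* SYN or SYN-ACK retransmissions */
    bool         timestamps_on;  /* TCP timestamp option negotiated */
    bool         rtt_valid;      /* a usable RTT sample was taken */
    unsigned int rtt_ms;         /* the sample, if rtt_valid */
};

unsigned int initial_rto_ms(const struct handshake_sketch *hs)
{
    assert(hs != NULL);

    if (hs->rtt_valid)
        return 3u * hs->rtt_ms;          /* assumption: RFC 6298 seeding */
    if (hs->retrans > 0 && !hs->timestamps_on)
        return FALLBACK_RTO_MS;
    return INIT_RTO_MS;
}

/* cwnd drops to 1 only after MORE THAN ONE handshake retransmission. */
unsigned int initial_cwnd(const struct handshake_sketch *hs,
                          unsigned int default_cwnd)
{
    assert(hs != NULL);
    return hs->retrans > 1 ? 1u : default_cwnd;
}
```

The caller is expected to leave `rtt_valid` false when the handshake segment was retransmitted without timestamps, since such a sample would be ambiguous.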
  9. 06 Jun, 2011 4 commits
  10. 02 Jun, 2011 1 commit
  11. 01 Jun, 2011 1 commit
    • C
      ip_options_compile: properly handle unaligned pointer · 48bdf072
      Chris Metcalf committed
      The current code takes an unaligned pointer and does htonl() on it to
      make it big-endian, then does a memcpy().  The problem is that the
      compiler decides that since the pointer is to a __be32, it is legal
      to optimize the copy into a processor word store.  However, on an
      architecture that does not handle unaligned writes in kernel space,
      this produces an unaligned exception fault.
      
      The solution is to track the pointer as a "char *" (which removes a bunch
      of unpleasant casts in any case), and then just use put_unaligned_be32()
      to write the value to memory.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: David S. Miller <davem@zippy.davemloft.net>
      48bdf072
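What an unaligned big-endian store amounts to can be shown with a portable userspace sketch. The function name `store_be32_unaligned` is hypothetical; it illustrates the byte-wise approach and makes no claim about how put_unaligned_be32() is implemented on any particular architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 32-bit value in big-endian order one byte at a time.
 * Because the destination is accessed only through unsigned char,
 * the compiler cannot assume word alignment and turn the writes
 * into a single (possibly faulting) aligned word store. */
void store_be32_unaligned(unsigned char *p, uint32_t val)
{
    assert(p != NULL);
    p[0] = (unsigned char)(val >> 24);
    p[1] = (unsigned char)(val >> 16);
    p[2] = (unsigned char)(val >> 8);
    p[3] = (unsigned char)val;
}
```

Tracking the pointer as a byte pointer, as the commit does with "char *", is what keeps the alignment assumption from leaking back in through the pointer's type.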
  12. 28 May, 2011 1 commit
  13. 25 May, 2011 1 commit
  14. 24 May, 2011 3 commits
    • E
      seqlock: Get rid of SEQLOCK_UNLOCKED · c4dbe54e
      Eric Dumazet committed
      All static seqlocks should be initialized with the lockdep friendly
      __SEQLOCK_UNLOCKED() macro.
      
      Remove legacy SEQLOCK_UNLOCKED() macro.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/%3C1306238888.3026.31.camel%40edumazet-laptop%3E
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c4dbe54e
    • D
      net: convert %p usage to %pK · 71338aa7
      Dan Rosenberg committed
      The %pK format specifier is designed to hide exposed kernel pointers,
      specifically via /proc interfaces.  Exposing these pointers provides an
      easy target for kernel write vulnerabilities, since they reveal the
      locations of writable structures containing easily triggerable function
      pointers.  The behavior of %pK depends on the kptr_restrict sysctl.
      
      If kptr_restrict is set to 0, no deviation from the standard %p behavior
      occurs.  If kptr_restrict is set to 1, the default, if the current user
      (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
      (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
      If kptr_restrict is set to 2, kernel pointers using %pK are printed as
      0's regardless of privileges.  Replacing with 0's was chosen over the
      default "(null)", which cannot be parsed by userland %p, which expects
      "(nil)".
      
      The supporting code for kptr_restrict and %pK are currently in the -mm
      tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
      pointers to the syslog are not covered, since this would eliminate useful
      information for postmortem debugging and the reading of the syslog is
      already optionally protected by the dmesg_restrict sysctl.
      Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Thomas Graf <tgraf@infradead.org>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      71338aa7
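The three kptr_restrict settings described in this commit message form a small decision table, sketched here in userspace. The function names are hypothetical, the capability check is reduced to a boolean argument, and the fixed-width hex output is an assumption made so that a hidden pointer prints as all zeros.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Decide whether a %pK-style pointer is shown, following the commit text:
 * 0 = always shown, 1 = shown only with CAP_SYSLOG, 2 = never shown. */
bool pk_shows_pointer(int kptr_restrict, bool has_cap_syslog)
{
    switch (kptr_restrict) {
    case 0:
        return true;
    case 1:
        return has_cap_syslog;
    default:
        return false;
    }
}

/* Format the pointer, or zeros of the same width when it must be hidden.
 * Returns the snprintf() result. */
int format_pk(char *buf, size_t size, const void *ptr,
              int kptr_restrict, bool has_cap_syslog)
{
    assert(buf != NULL && size > 0);
    uintptr_t v = pk_shows_pointer(kptr_restrict, has_cap_syslog)
                      ? (uintptr_t)ptr : 0;

    return snprintf(buf, size, "%0*" PRIxPTR,
                    (int)(2 * sizeof(void *)), v);
}
```

Printing zeros instead of a word such as "(null)" keeps the output parseable by tools that expect a hexadecimal pointer field, which matches the rationale given in the message.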
    • E
      net: ping: cleanups ping_v4_unhash() · 19a76fa9
      Eric Dumazet committed
      net/ipv4/ping.c: In function ‘ping_v4_unhash’:
      net/ipv4/ping.c:140:28: warning: variable ‘hslot’ set but not used
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Vasiliy Kulikov <segoon@openwall.com>
      Acked-by: Vasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      19a76fa9
  15. 23 May, 2011 3 commits
  16. 20 May, 2011 3 commits
  17. 19 May, 2011 4 commits
  18. 18 May, 2011 1 commit