1. 01 Aug, 2011 4 commits
  2. 27 Jul, 2011 1 commit
  3. 26 Jul, 2011 1 commit
    • ipv6: Do not leave router anycast address for /127 prefixes. · 32019e65
      YOSHIFUJI Hideaki committed
      Original commit 2bda8a0c... "Disable router anycast
      address for /127 prefixes" says:
      
      |   No need for matching code in addrconf_leave_anycast() as it
      |   will silently ignore any attempt to leave an unknown anycast
      |   address.
      
      After analysis, this is not sufficient: because 1) two or more prefixes
      may be added on the same interface, or 2) the user may have manually
      joined that anycast address, the interface may hold an anycast address
      that looks as if it had been generated from the /127 prefix. We should
      therefore not leave the subnet-router anycast address unconditionally.
      
      CC: Bjørn Mork <bjorn@mork.no>
      CC: Brian Haley <brian.haley@hp.com>
      Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 22 Jul, 2011 2 commits
    • ipv6: make fragment identifications less predictable · 87c48fa3
      Eric Dumazet committed
      IPv6 fragment identification generation is way behind what we use for
      IPv4: it uses a single generator. It's not scalable and allows DoS
      attacks.
      
      Now that inetpeer is IPv6 aware, we can use it to provide a more secure
      and scalable frag ident generator (per destination, instead of system
      wide).
      
      This patch:
      1) defines a new secure_ipv6_id() helper
      2) extends inet_getid() to provide 32bit results
      3) extends ipv6_select_ident() with a new dest parameter
      Reported-by: Fernando Gont <fernando@gont.com.ar>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: unshare inetpeers · 21efcfa0
      Eric Dumazet committed
      We currently COW (copy-on-write) metrics a bit too soon in the IPv6
      case: all routes are tied to a single inetpeer entry.
      
      Change ip6_rt_copy() to get destination address as second argument, so
      that we fill rt6i_dst before the dst_copy_metrics() call.
      
      icmp6_dst_alloc() must set rt6i_dst before calling dst_metric_set(), or
      else the COW is done while rt6i_dst is still NULL.
      
      If orig route points to readonly metrics, we can share the pointer
      instead of performing the memory allocation and copy.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 18 Jul, 2011 4 commits
  6. 17 Jul, 2011 4 commits
  7. 14 Jul, 2011 1 commit
    • net: Embed hh_cache inside of struct neighbour. · f6b72b62
      David S. Miller committed
      Now that there is a one-to-one correspondence between neighbour
      and hh_cache entries, we no longer need:
      
      1) dynamic allocation
      2) attachment to dst->hh
      3) refcounting
      
      Initialization of the hh_cache entry is indicated by hh_len
      being non-zero, and such initialization is always done with
      the neighbour's lock held as a writer.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 07 Jul, 2011 1 commit
  9. 05 Jul, 2011 1 commit
    • net: bind() fix error return on wrong address family · c349a528
      Marcus Meissner committed
      Hi,
      
      Reinhard Max also pointed out that the error should be EAFNOSUPPORT
      according to POSIX.
      
      The Linux manpages have it as EINVAL, some other OSes (Minix, HPUX, perhaps BSD) use
      EAFNOSUPPORT. Windows uses WSAEFAULT according to MSDN.
      
      Error values returned by other protocols' af bind() methods in current
      mainline git, as far as a brief look shows:
      	EAFNOSUPPORT: atm, appletalk, l2tp, llc, phonet, rxrpc
      	EINVAL: ax25, bluetooth, decnet, econet, ieee802154, iucv, netlink, netrom, packet, rds, rose, unix, x25
      	No check?: can/raw, ipv6/raw, irda, l2tp/l2tp_ip
      
      Ciao, Marcus
      Signed-off-by: Marcus Meissner <meissner@suse.de>
      Cc: Reinhard Max <max@suse.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 02 Jul, 2011 3 commits
  11. 22 Jun, 2011 2 commits
    • udp/recvmsg: Clear MSG_TRUNC flag when starting over for a new packet · 9cfaa8de
      Xufeng Zhang committed
      Consider this scenario: when the size of the first received UDP packet
      is bigger than the receive buffer, the MSG_TRUNC bit is set in
      msg->msg_flags. However, if a checksum error happens and this is a
      blocking socket, execution goes to the try_again loop to receive the
      next packet.  If the next UDP packet is smaller than the receive buffer,
      MSG_TRUNC should not be set, but because the bit is not cleared in
      msg->msg_flags before receiving the next packet, MSG_TRUNC is still set,
      which is wrong.
      
      Fix this problem by clearing MSG_TRUNC flag when starting over for a
      new packet.
      Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6/udp: Use the correct variable to determine non-blocking condition · 32c90254
      Xufeng Zhang committed
      The udpv6_recvmsg() function does not use the correct variable to
      determine whether or not the socket is in non-blocking operation; this
      leads to unexpected behavior when a UDP checksum error occurs.
      
      Consider a non-blocking udp receive scenario: when udpv6_recvmsg() is
      called by sock_common_recvmsg(), MSG_DONTWAIT bit of flags variable in
      udpv6_recvmsg() is cleared by "flags & ~MSG_DONTWAIT" in this call:
      
          err = sk->sk_prot->recvmsg(iocb, sk, msg, size, flags & MSG_DONTWAIT,
                         flags & ~MSG_DONTWAIT, &addr_len);
      
      i.e. with udpv6_recvmsg() getting these values:
      
      	int noblock = flags & MSG_DONTWAIT
      	int flags = flags & ~MSG_DONTWAIT
      
      So, when a UDP checksum error occurs, execution goes to
      csum_copy_err, and then the problem happens:
      
          csum_copy_err:
                  ...............
                  if (flags & MSG_DONTWAIT)
                          return -EAGAIN;
                  goto try_again;
                  ...............
      
      But it will always go to try_again as MSG_DONTWAIT has been cleared
      from flags at call time -- only noblock contains the original value
      of MSG_DONTWAIT, so the test should be:
      
                  if (noblock)
                          return -EAGAIN;
      
      This is also consistent with what the ipv4/udp code does.
      Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 18 Jun, 2011 1 commit
    • net: rfs: enable RFS before first data packet is received · 1eddcead
      Eric Dumazet committed
      On Thursday 16 June 2011 at 23:38 -0400, David Miller wrote:
      > From: Ben Hutchings <bhutchings@solarflare.com>
      > Date: Fri, 17 Jun 2011 00:50:46 +0100
      >
      > > On Wed, 2011-06-15 at 04:15 +0200, Eric Dumazet wrote:
      > >> @@ -1594,6 +1594,7 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
      > >>  			goto discard;
      > >>
      > >>  		if (nsk != sk) {
      > >> +			sock_rps_save_rxhash(nsk, skb->rxhash);
      > >>  			if (tcp_child_process(sk, nsk, skb)) {
      > >>  				rsk = nsk;
      > >>  				goto reset;
      > >>
      > >
      > > I haven't tried this, but it looks reasonable to me.
      > >
      > > What about IPv6?  The logic in tcp_v6_do_rcv() looks very similar.
      >
      > Indeed ipv6 side needs the same fix.
      >
      > Eric please add that part and resubmit.  And in fact I might stick
      > this into net-2.6 instead of net-next-2.6
      >
      
      OK, here is the net-2.6 based one then, thanks !
      
      [PATCH v2] net: rfs: enable RFS before first data packet is received
      
      First packet received on a passive tcp flow is not correctly RFS
      steered.
      
      One sock_rps_record_flow() call is missing in inet_accept().
      
      But before that, we must also record rxhash when the child socket is set up.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Tom Herbert <therbert@google.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@conan.davemloft.net>
  13. 16 Jun, 2011 1 commit
    • netfilter: fix looped (broad|multi)cast's MAC handling · 2c38de4c
      Nicolas Cavallari committed
      By default, when broadcast or multicast packets are sent from a local
      application, they are sent to the interface and then looped by the kernel
      to other local applications, going through netfilter hooks in the
      process.
      
      These looped packets have their MAC header removed from the skb by the
      kernel looping code. This confuses netfilter's netlink queue, netlink
      log and the legacy ip_queue, because they try to extract a hardware
      address from these packets but extract a part of the IP header
      instead.
      
      This patch prevents NFQUEUE, NFLOG and ip_QUEUE from including a MAC
      header if there is none in the packet.
      Signed-off-by: Nicolas Cavallari <cavallar@lri.fr>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  14. 10 Jun, 2011 1 commit
    • rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Greg Rose committed
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  15. 09 Jun, 2011 2 commits
  16. 07 Jun, 2011 1 commit
  17. 06 Jun, 2011 3 commits
  18. 24 May, 2011 2 commits
    • net: convert %p usage to %pK · 71338aa7
      Dan Rosenberg committed
      The %pK format specifier is designed to hide exposed kernel pointers,
      specifically via /proc interfaces.  Exposing these pointers provides an
      easy target for kernel write vulnerabilities, since they reveal the
      locations of writable structures containing easily triggerable function
      pointers.  The behavior of %pK depends on the kptr_restrict sysctl.
      
      If kptr_restrict is set to 0, no deviation from the standard %p behavior
      occurs.  If kptr_restrict is set to 1, the default, if the current user
      (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
      (currently in the LSM tree), kernel pointers using %pK are printed as 0's.
      If kptr_restrict is set to 2, kernel pointers using %pK are printed as
      0's regardless of privileges.  Replacing with 0's was chosen over the
      default "(null)", which cannot be parsed by userland %p, which expects
      "(nil)".
      
      The supporting code for kptr_restrict and %pK is currently in the -mm
      tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
      pointers to the syslog are not covered, since this would eliminate useful
      information for postmortem debugging and the reading of the syslog is
      already optionally protected by the dmesg_restrict sysctl.
      Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Thomas Graf <tgraf@infradead.org>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: Fix return of xfrm6_tunnel_rcv() · 6ac3f664
      David S. Miller committed
      Like ipv4, just return xfrm6_rcv_spi()'s return value directly.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 21 May, 2011 1 commit
  20. 20 May, 2011 1 commit
    • ipv6: reduce per device ICMP mib sizes · be281e55
      Eric Dumazet committed
      ipv6 has per device ICMP SNMP counters, taking too much space because
      they use percpu storage.
      
      The needed size per device is:
      (512+4)*sizeof(long)*number_of_possible_cpus*2
      
      On a 32bit kernel with 16 possible cpus, this wastes more than 64 kbytes
      of memory per ipv6 enabled network device, taken from the vmalloc pool.
      
      Since ICMP messages are rare, just use shared counters (atomic_long_t).
      
      Per network space ICMP counters are still using percpu memory, we might
      also convert them to shared counters in a future patch.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Denys Fedoryshchenko <denys@visp.net.lb>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 11 May, 2011 1 commit
    • xfrm: Assign the inner mode output function to the dst entry · 43a4dea4
      Steffen Klassert committed
      As it is, we assign the outer mode's output function to the dst entry
      when we create the xfrm bundle. This leads to two problems in interfamily
      scenarios. First, we might insert ipv4 packets into ip6_fragment when
      called from xfrm6_output. The system crashes if we try to fragment an ipv4
      packet with ip6_fragment. This issue was introduced with git commit
      ad0081e4 (ipv6: Fragment locally generated tunnel-mode IPSec6 packets
      as needed). The second issue is that we might insert ipv4 packets into
      netfilter6, and vice versa, in interfamily scenarios.
      
      With this patch we assign the inner mode's output function to the dst
      entry when we create the xfrm bundle. So xfrm4_output/xfrm6_output from
      the inner mode is used and the right fragmentation and netfilter functions
      are called. We then switch to outer mode with the output_finish functions.
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 10 May, 2011 1 commit
  23. 09 May, 2011 1 commit
    • inet: Pass flowi to ->queue_xmit(). · d9d8da80
      David S. Miller committed
      This allows us to acquire the exact route keying information from the
      protocol, however that might be managed.
      
      It handles all of the possibilities, from the simplest case of storing
      the key in inet->cork.fl to the more complex setup SCTP has where
      individual transports determine the flow.
      Signed-off-by: David S. Miller <davem@davemloft.net>