1. 01 8月, 2012 5 次提交
    • M
      netvm: set PF_MEMALLOC as appropriate during SKB processing · b4b9e355
      Mel Gorman 提交于
      In order to make sure pfmemalloc packets receive all memory needed to
      proceed, ensure processing of pfmemalloc SKBs happens under PF_MEMALLOC.
      This is limited to a subset of protocols that are expected to be used for
      writing to swap.  Taps are not allowed to use PF_MEMALLOC as these are
      expected to communicate with userspace processes which could be paged out.
      
      [a.p.zijlstra@chello.nl: Ideas taken from various patches]
      [jslaby@suse.cz: Lock imbalance fix]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4b9e355
    • M
      netvm: allow skb allocation to use PFMEMALLOC reserves · c93bdd0e
      Mel Gorman 提交于
      Change the skb allocation API to indicate RX usage and use this to fall
      back to the PFMEMALLOC reserve when needed.  SKBs allocated from the
      reserve are tagged in skb->pfmemalloc.  If an SKB is allocated from the
      reserve and the socket is later found to be unrelated to page reclaim, the
      packet is dropped so that the memory remains available for page reclaim.
      Network protocols are expected to recover from this packet loss.
      
      [a.p.zijlstra@chello.nl: Ideas taken from various patches]
      [davem@davemloft.net: Use static branches, coding style corrections]
      [sebastian@breakpoint.cc: Avoid unnecessary cast, fix !CONFIG_NET build]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c93bdd0e
    • M
      netvm: allow the use of __GFP_MEMALLOC by specific sockets · 7cb02404
      Mel Gorman 提交于
      Allow specific sockets to be tagged SOCK_MEMALLOC and use __GFP_MEMALLOC
      for their allocations.  These sockets will be able to go below watermarks
      and allocate from the emergency reserve.  Such sockets are to be used to
      service the VM (iow.  to swap over).  They must be handled kernel side,
      exposing such a socket to user-space is a bug.
      
      There is a risk that the reserves be depleted so for now, the
      administrator is responsible for increasing min_free_kbytes as necessary
      to prevent deadlock for their workloads.
      
      [a.p.zijlstra@chello.nl: Original patches]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7cb02404
    • M
      net: introduce sk_gfp_atomic() to allow addition of GFP flags depending on the individual socket · 99a1dec7
      Mel Gorman 提交于
      Introduce sk_gfp_atomic(), this function allows to inject sock specific
      flags to each sock related allocation.  It is only used on allocation
      paths that may be required for writing pages back to network storage.
      
      [davem@davemloft.net: Use sk_gfp_atomic only when necessary]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99a1dec7
    • A
      memcg: rename config variables · c255a458
      Andrew Morton 提交于
      Sanity:
      
      CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
      CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
      CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
      CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM
      
      [mhocko@suse.cz: fix missed bits]
      Cc: Glauber Costa <glommer@parallels.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c255a458
  2. 31 7月, 2012 1 次提交
  3. 28 7月, 2012 5 次提交
    • J
      tcp: perform DMA to userspace only if there is a task waiting for it · 59ea33a6
      Jiri Kosina 提交于
      Back in 2006, commit 1a2449a8 ("[I/OAT]: TCP recv offload to I/OAT")
      added support for receive offloading to IOAT dma engine if available.
      
      The code in tcp_rcv_established() tries to perform early DMA copy if
      applicable. It however does so without checking whether the userspace
      task is actually expecting the data in the buffer.
      
      This is not a problem under normal circumstances, but there is a corner
      case where this doesn't work -- and that's when MSG_TRUNC flag to
      recvmsg() is used.
      
      If the IOAT dma engine is not used, the code properly checks whether
      there is a valid ucopy.task and the socket is owned by userspace, but
      misses the check in the dmaengine case.
      
      This problem can be observed in real trivially -- for example 'tbench' is a
      good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
      IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
      have been already early-copied in tcp_rcv_established() using dma engine.
      
      This patch introduces the same check we are performing in the simple
      iovec copy case to the IOAT case as well. It fixes the indefinite
      recvmsg(MSG_TRUNC) hangs.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59ea33a6
    • J
      Revert "openvswitch: potential NULL deref in sample()" · 60810307
      Jesse Gross 提交于
      This reverts commit 5b3e7e6c.
      
      The problem that the original commit was attempting to fix can
      never happen in practice because validation is done one a per-flow
      basis rather than a per-packet basis.  Adding additional checks at
      runtime is unnecessary and inconsistent with the rest of the code.
      
      CC: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60810307
    • E
      ipv4: fix TCP early demux · 505fbcf0
      Eric Dumazet 提交于
      commit 92101b3b (ipv4: Prepare for change of rt->rt_iif encoding.)
      invalidated TCP early demux, because rx_dst_ifindex is not properly
      initialized and checked.
      
      Also remove the use of inet_iif(skb) in favor or skb->skb_iif
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      505fbcf0
    • J
      net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling · b1beb681
      Jiri Benc 提交于
      When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
      flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
      IFF_ALLMULTI bits in dev->gflags according to the passed value but
      do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
      from dev->flags.
      
      This can be easily trigerred by doing:
      
      tcpdump -i eth0 &
      ip l s up eth0
      
      ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
      IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
      IFF_PROMISC in gflags.
      Reported-by: NMax Matveev <makc@redhat.com>
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1beb681
    • H
      tcp: Add TCP_USER_TIMEOUT negative value check · 42493570
      Hangbin Liu 提交于
      TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But
      patch "tcp: Add TCP_USER_TIMEOUT socket option"(dca43c75) didn't check the negative
      values. If a user assign -1 to it, the socket will set successfully and wait
      for 4294967295 miliseconds. This patch add a negative value check to avoid
      this issue.
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42493570
  4. 27 7月, 2012 2 次提交
  5. 26 7月, 2012 1 次提交
  6. 25 7月, 2012 2 次提交
  7. 24 7月, 2012 10 次提交
  8. 23 7月, 2012 14 次提交
    • W
      rds: set correct msg_namelen · 06b6a1cf
      Weiping Pan 提交于
      Jay Fenlason (fenlason@redhat.com) found a bug,
      that recvfrom() on an RDS socket can return the contents of random kernel
      memory to userspace if it was called with a address length larger than
      sizeof(struct sockaddr_in).
      rds_recvmsg() also fails to set the addr_len paramater properly before
      returning, but that's just a bug.
      There are also a number of cases wher recvfrom() can return an entirely bogus
      address. Anything in rds_recvmsg() that returns a non-negative value but does
      not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
      at the end of the while(1) loop will return up to 128 bytes of kernel memory
      to userspace.
      
      And I write two test programs to reproduce this bug, you will see that in
      rds_server, fromAddr will be overwritten and the following sock_fd will be
      destroyed.
      Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
      better to make the kernel copy the real length of address to user space in
      such case.
      
      How to run the test programs ?
      I test them on 32bit x86 system, 3.5.0-rc7.
      
      1 compile
      gcc -o rds_client rds_client.c
      gcc -o rds_server rds_server.c
      
      2 run ./rds_server on one console
      
      3 run ./rds_client on another console
      
      4 you will see something like:
      server is waiting to receive data...
      old socket fd=3
      server received data from client:data from client
      msg.msg_namelen=32
      new socket fd=-1067277685
      sendmsg()
      : Bad file descriptor
      
      /***************** rds_client.c ********************/
      
      int main(void)
      {
      	int sock_fd;
      	struct sockaddr_in serverAddr;
      	struct sockaddr_in toAddr;
      	char recvBuffer[128] = "data from client";
      	struct msghdr msg;
      	struct iovec iov;
      
      	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
      	if (sock_fd < 0) {
      		perror("create socket error\n");
      		exit(1);
      	}
      
      	memset(&serverAddr, 0, sizeof(serverAddr));
      	serverAddr.sin_family = AF_INET;
      	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	serverAddr.sin_port = htons(4001);
      
      	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
      		perror("bind() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	memset(&toAddr, 0, sizeof(toAddr));
      	toAddr.sin_family = AF_INET;
      	toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	toAddr.sin_port = htons(4000);
      	msg.msg_name = &toAddr;
      	msg.msg_namelen = sizeof(toAddr);
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      
      	if (sendmsg(sock_fd, &msg, 0) == -1) {
      		perror("sendto() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("client send data:%s\n", recvBuffer);
      
      	memset(recvBuffer, '\0', 128);
      
      	msg.msg_name = &toAddr;
      	msg.msg_namelen = sizeof(toAddr);
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = 128;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      	if (recvmsg(sock_fd, &msg, 0) == -1) {
      		perror("recvmsg() error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("receive data from server:%s\n", recvBuffer);
      
      	close(sock_fd);
      
      	return 0;
      }
      
      /***************** rds_server.c ********************/
      
      int main(void)
      {
      	struct sockaddr_in fromAddr;
      	int sock_fd;
      	struct sockaddr_in serverAddr;
      	unsigned int addrLen;
      	char recvBuffer[128];
      	struct msghdr msg;
      	struct iovec iov;
      
      	sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
      	if(sock_fd < 0) {
      		perror("create socket error\n");
      		exit(0);
      	}
      
      	memset(&serverAddr, 0, sizeof(serverAddr));
      	serverAddr.sin_family = AF_INET;
      	serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
      	serverAddr.sin_port = htons(4000);
      	if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
      		perror("bind error\n");
      		close(sock_fd);
      		exit(1);
      	}
      
      	printf("server is waiting to receive data...\n");
      	msg.msg_name = &fromAddr;
      
      	/*
      	 * I add 16 to sizeof(fromAddr), ie 32,
      	 * and pay attention to the definition of fromAddr,
      	 * recvmsg() will overwrite sock_fd,
      	 * since kernel will copy 32 bytes to userspace.
      	 *
      	 * If you just use sizeof(fromAddr), it works fine.
      	 * */
      	msg.msg_namelen = sizeof(fromAddr) + 16;
      	/* msg.msg_namelen = sizeof(fromAddr); */
      	msg.msg_iov = &iov;
      	msg.msg_iovlen = 1;
      	msg.msg_iov->iov_base = recvBuffer;
      	msg.msg_iov->iov_len = 128;
      	msg.msg_control = 0;
      	msg.msg_controllen = 0;
      	msg.msg_flags = 0;
      
      	while (1) {
      		printf("old socket fd=%d\n", sock_fd);
      		if (recvmsg(sock_fd, &msg, 0) == -1) {
      			perror("recvmsg() error\n");
      			close(sock_fd);
      			exit(1);
      		}
      		printf("server received data from client:%s\n", recvBuffer);
      		printf("msg.msg_namelen=%d\n", msg.msg_namelen);
      		printf("new socket fd=%d\n", sock_fd);
      		strcat(recvBuffer, "--data from server");
      		if (sendmsg(sock_fd, &msg, 0) == -1) {
      			perror("sendmsg()\n");
      			close(sock_fd);
      			exit(1);
      		}
      	}
      
      	close(sock_fd);
      	return 0;
      }
      Signed-off-by: NWeiping Pan <wpan@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06b6a1cf
    • D
      openvswitch: potential NULL deref in sample() · 5b3e7e6c
      Dan Carpenter 提交于
      If there is no OVS_SAMPLE_ATTR_ACTIONS set then "acts_list" is NULL and
      it leads to a NULL dereference when we call nla_len(acts_list).  This
      is a static checker fix, not something I have seen in testing.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b3e7e6c
    • E
      tcp: dont drop MTU reduction indications · 563d34d0
      Eric Dumazet 提交于
      ICMP messages generated in output path if frame length is bigger than
      mtu are actually lost because socket is owned by user (doing the xmit)
      
      One example is the ipgre_tunnel_xmit() calling
      icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
      
      We had a similar case fixed in commit a34a101e (ipv6: disable GSO on
      sockets hitting dst_allfrag).
      
      Problem of such fix is that it relied on retransmit timers, so short tcp
      sessions paid a too big latency increase price.
      
      This patch uses the tcp_release_cb() infrastructure so that MTU
      reduction messages (ICMP messages) are not lost, and no extra delay
      is added in TCP transmits.
      Reported-by: NMaciej Żenczykowski <maze@google.com>
      Diagnosed-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Tore Anderson <tore@fud.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      563d34d0
    • J
      tcp: avoid oops in tcp_metrics and reset tcpm_stamp · 9a0a9502
      Julian Anastasov 提交于
      	In tcp_tw_remember_stamp we incorrectly checked tw
      instead of tm, it can lead to oops if the cached entry is
      not found.
      
      	tcpm_stamp was not updated in tcpm_check_stamp when
      tcpm_suck_dst was called, move the update into tcpm_suck_dst,
      so that we do not call it infinitely on every next cache hit
      after TCP_METRICS_TIMEOUT.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a0a9502
    • J
      net: Fix references to out-of-scope variables in put_cmsg_compat() · 81881047
      Jesper Juhl 提交于
      In net/compat.c::put_cmsg_compat() we may assign 'data' the address of
      either the 'ctv' or 'cts' local variables inside the 'if
      (!COMPAT_USE_64BIT_TIME)' branch.
      
      Those variables go out of scope at the end of the 'if' statement, so
      when we use 'data' further down in 'copy_to_user(CMSG_COMPAT_DATA(cm),
      data, cmlen - sizeof(struct compat_cmsghdr))' there's no telling what
      it may be refering to - not good.
      
      Fix the problem by simply giving 'ctv' and 'cts' function scope.
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81881047
    • A
      get rid of ->scm_work_list · 6120d3db
      Al Viro 提交于
      recursion in __scm_destroy() will be cut by delaying final fput()
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6120d3db
    • J
      net: netprio_cgroup: rework update socket logic · 406a3c63
      John Fastabend 提交于
      Instead of updating the sk_cgrp_prioidx struct field on every send
      this only updates the field when a task is moved via cgroup
      infrastructure.
      
      This allows sockets that may be used by a kernel worker thread
      to be managed. For example in the iscsi case today a user can
      put iscsid in a netprio cgroup and control traffic will be sent
      with the correct sk_cgrp_prioidx value set but as soon as data
      is sent the kernel worker thread isssues a send and sk_cgrp_prioidx
      is updated with the kernel worker threads value which is the
      default case.
      
      It seems more correct to only update the field when the user
      explicitly sets it via control group infrastructure. This allows
      the users to manage sockets that may be used with other threads.
      Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      406a3c63
    • M
      skbuff: export skb_copy_ubufs · dcc0fb78
      Michael S. Tsirkin 提交于
      Export skb_copy_ubufs so that modules can orphan frags.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dcc0fb78
    • M
      net: orphan frags on receive · 1080e512
      Michael S. Tsirkin 提交于
      zero copy packets are normally sent to the outside
      network, but bridging, tun etc might loop them
      back to host networking stack. If this happens
      destructors will never be called, so orphan
      the frags immediately on receive.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1080e512
    • M
      skbuff: convert to skb_orphan_frags · 70008aa5
      Michael S. Tsirkin 提交于
      Reduce code duplication a bit using the new helper.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70008aa5
    • M
      rtnl: Add #ifdef CONFIG_RPS around num_rx_queues reference · 1d69c2b3
      Mark A. Greer 提交于
      Commit 76ff5cc9
      (rtnl: allow to specify number of rx and tx queues
      on device creation) added a reference to the net_device
      structure's 'num_rx_queues' member in
      
      	net/core/rtnetlink.c:rtnl_fill_ifinfo()
      
      However, the definition for 'num_rx_queues' is surrounded
      by an '#ifdef CONFIG_RPS' while the new reference to it is
      not.  This causes a compile error when CONFIG_RPS is not
      defined.
      
      Fix the compile error by surrounding the new reference to
      'num_rx_queues' by an '#ifdef CONFIG_RPS'.
      
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NMark A. Greer <mgreer@animalcreek.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d69c2b3
    • N
      sctp: Implement quick failover draft from tsvwg · 5aa93bcf
      Neil Horman 提交于
      I've seen several attempts recently made to do quick failover of sctp transports
      by reducing various retransmit timers and counters.  While its possible to
      implement a faster failover on multihomed sctp associations, its not
      particularly robust, in that it can lead to unneeded retransmits, as well as
      false connection failures due to intermittent latency on a network.
      
      Instead, lets implement the new ietf quick failover draft found here:
      http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
      
      This will let the sctp stack identify transports that have had a small number of
      errors, and avoid using them quickly until their reliability can be
      re-established.  I've tested this out on two virt guests connected via multiple
      isolated virt networks and believe its in compliance with the above draft and
      works well.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Sridhar Samudrala <sri@us.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      CC: joe@perches.com
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5aa93bcf
    • K
      net: fix race condition in several drivers when reading stats · e3906486
      Kevin Groeneveld 提交于
      Fix race condition in several network drivers when reading stats on 32bit
      UP architectures.  These drivers update their stats in a BH context and
      therefore should use u64_stats_fetch_begin_bh/u64_stats_fetch_retry_bh
      instead of u64_stats_fetch_begin/u64_stats_fetch_retry when reading the
      stats.
      Signed-off-by: NKevin Groeneveld <kgroeneveld@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3906486
    • E
      ipv4: tcp: set unicast_sock uc_ttl to -1 · 0980e56e
      Eric Dumazet 提交于
      Set unicast_sock uc_ttl to -1 so that we select the right ttl,
      instead of sending packets with a 0 ttl.
      
      Bug added in commit be9f4a44 (ipv4: tcp: remove per net tcp_sock)
      Signed-off-by: NHiroaki SHIMODA <shimoda.hiroaki@gmail.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0980e56e