1. 31 10月, 2009 3 次提交
  2. 30 10月, 2009 1 次提交
  3. 29 10月, 2009 2 次提交
    • G
      net: Fix 'Re: PACKET_TX_RING: packet size is too long' · b5dd884e
      Gabor Gombas 提交于
      Currently PACKET_TX_RING forces certain amount of every frame to remain
      unused. This probably originates from an early version of the
      PACKET_TX_RING patch that in fact used the extra space when the (since
      removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current
      code does not make any use of this extra space.
      
      This patch removes the extra space reservation and lets userspace make
      use of the full frame size.
      Signed-off-by: NGabor Gombas <gombasg@sztaki.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5dd884e
    • N
      AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2) · 55888dfb
      Neil Horman 提交于
      Augment raw_send_hdrinc to correct for incorrect ip header length values
      
      A series of oopses was reported to me recently.  Apparently when using AF_RAW
      sockets to send data to peers that were reachable via ipsec encapsulation,
      people could panic or BUG halt their systems.
      
      I've tracked the problem down to user space sending an invalid ip header over an
      AF_RAW socket with IP_HDRINCL set to 1.
      
      Basically what happens is that userspace sends down an ip frame that includes
      only the header (no data), but sets the ip header ihl value to a large number,
      one that is larger than the total amount of data passed to the sendmsg call.  In
      raw_send_hdrincl, we allocate an skb based on the size of the data in the msghdr
      that was passed in, but assume the data is all valid.  Later during ipsec
      encapsulation, xfrm4_tranport_output moves the entire frame back in the skbuff
      to provide headroom for the ipsec headers.  During this operation, the
      skb->transport_header is repointed to a spot computed by
      skb->network_header + the ip header length (ihl).  Since so little data was
      passed in relative to the value of ihl provided by the raw socket, we point
      transport header to an unknown location, resulting in various crashes.
      
      This fix for this is pretty straightforward, simply validate the value of of
      iph->ihl when sending over a raw socket.  If (iph->ihl*4U) > user data buffer
      size, drop the frame and return -EINVAL.  I just confirmed this fixes the
      reported crashes.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55888dfb
  4. 28 10月, 2009 5 次提交
  5. 24 10月, 2009 1 次提交
  6. 23 10月, 2009 1 次提交
  7. 20 10月, 2009 8 次提交
    • H
      tcp: Try to catch MSG_PEEK bug · b6b39e8f
      Herbert Xu 提交于
      This patch tries to print out more information when we hit the
      MSG_PEEK bug in tcp_recvmsg.  It's been around since at least
      2005 and it's about time that we finally fix it.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6b39e8f
    • E
      net: Fix IP_MULTICAST_IF · 55b80503
      Eric Dumazet 提交于
      ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls.
      
      This function should be called only with RTNL or dev_base_lock held, or reader
      could see a corrupt hash chain and eventually enter an endless loop.
      
      Fix is to call dev_get_by_index()/dev_put().
      
      If this happens to be performance critical, we could define a new dev_exist_by_index()
      function to avoid touching dev refcount.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55b80503
    • D
      bluetooth: static lock key fix · 45054dc1
      Dave Young 提交于
      When shutdown ppp connection, lockdep waring about non-static key
      will happen, it is caused by the lock is not initialized properly
      at that time.
      
      Fix with tuning the lock/skb_queue_head init order
      
      [   94.339261] INFO: trying to register non-static key.
      [   94.342509] the code is fine but needs lockdep annotation.
      [   94.342509] turning off the locking correctness validator.
      [   94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2
      [   94.342509] Call Trace:
      [   94.342509]  [<c0248fbe>] register_lock_class+0x58/0x241
      [   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
      [   94.342509]  [<c024ab34>] __lock_acquire+0xac/0xb73
      [   94.342509]  [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de
      [   94.342509]  [<c024b662>] lock_acquire+0x67/0x84
      [   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
      [   94.342509]  [<c054a857>] _spin_lock_irqsave+0x2f/0x3f
      [   94.342509]  [<c04cd1eb>] ? skb_dequeue+0x15/0x41
      [   94.342509]  [<c04cd1eb>] skb_dequeue+0x15/0x41
      [   94.342509]  [<c054a648>] ? _read_unlock+0x1d/0x20
      [   94.342509]  [<c04cd641>] skb_queue_purge+0x14/0x1b
      [   94.342509]  [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap]
      [   94.342509]  [<c024b5df>] ? __lock_acquire+0xb57/0xb73
      [   94.342509]  [<c0249c04>] ? mark_lock+0x1e/0x1c7
      [   94.342509]  [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth]
      [   94.342509]  [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap]
      [   94.342509]  [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth]
      [   94.342509]  [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap]
      [   94.342509]  [<c02302c4>] tasklet_action+0x69/0xc1
      [   94.342509]  [<c022fbef>] __do_softirq+0x94/0x11e
      [   94.342509]  [<c022fcaf>] do_softirq+0x36/0x5a
      [   94.342509]  [<c022fe14>] irq_exit+0x35/0x68
      [   94.342509]  [<c0204ced>] do_IRQ+0x72/0x89
      [   94.342509]  [<c02038ee>] common_interrupt+0x2e/0x34
      [   94.342509]  [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d
      [   94.342509]  [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238
      [   94.342509]  [<c049d238>] cpuidle_idle_call+0x5c/0x94
      [   94.342509]  [<c02023f8>] cpu_idle+0x4e/0x6f
      [   94.342509]  [<c0534153>] rest_init+0x53/0x55
      [   94.342509]  [<c0781894>] start_kernel+0x2f0/0x2f5
      [   94.342509]  [<c0781091>] i386_start_kernel+0x91/0x96
      Reported-by: NOliver Hartkopp <oliver@hartkopp.net>
      Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
      Tested-by: NOliver Hartkopp <oliver@hartkopp.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45054dc1
    • D
      bluetooth: scheduling while atomic bug fix · f74c77cb
      Dave Young 提交于
      Due to driver core changes dev_set_drvdata will call kzalloc which should be
      in might_sleep context, but hci_conn_add will be called in atomic context
      
      Like dev_set_name move dev_set_drvdata to work queue function.
      
      oops as following:
      
      Oct  2 17:41:59 darkstar kernel: [  438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546
      Oct  2 17:41:59 darkstar kernel: [  438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool
      Oct  2 17:41:59 darkstar kernel: [  438.001348] 2 locks held by sdptool/2133:
      Oct  2 17:41:59 darkstar kernel: [  438.001350]  #0:  (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<faa1d2f5>] lock_sock+0xa/0xc [l2cap]
      Oct  2 17:41:59 darkstar kernel: [  438.001360]  #1:  (&hdev->lock){+.-.+.}, at: [<faa20e16>] l2cap_sock_connect+0x103/0x26b [l2cap]
      Oct  2 17:41:59 darkstar kernel: [  438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2
      Oct  2 17:41:59 darkstar kernel: [  438.001373] Call Trace:
      Oct  2 17:41:59 darkstar kernel: [  438.001381]  [<c022433f>] __might_sleep+0xde/0xe5
      Oct  2 17:41:59 darkstar kernel: [  438.001386]  [<c0298843>] __kmalloc+0x4a/0x15a
      Oct  2 17:41:59 darkstar kernel: [  438.001392]  [<c03f0065>] ? kzalloc+0xb/0xd
      Oct  2 17:41:59 darkstar kernel: [  438.001396]  [<c03f0065>] kzalloc+0xb/0xd
      Oct  2 17:41:59 darkstar kernel: [  438.001400]  [<c03f04ff>] device_private_init+0x15/0x3d
      Oct  2 17:41:59 darkstar kernel: [  438.001405]  [<c03f24c5>] dev_set_drvdata+0x18/0x26
      Oct  2 17:41:59 darkstar kernel: [  438.001414]  [<fa51fff7>] hci_conn_init_sysfs+0x40/0xd9 [bluetooth]
      Oct  2 17:41:59 darkstar kernel: [  438.001422]  [<fa51cdc0>] ? hci_conn_add+0x128/0x186 [bluetooth]
      Oct  2 17:41:59 darkstar kernel: [  438.001429]  [<fa51ce0f>] hci_conn_add+0x177/0x186 [bluetooth]
      Oct  2 17:41:59 darkstar kernel: [  438.001437]  [<fa51cf8a>] hci_connect+0x3c/0xfb [bluetooth]
      Oct  2 17:41:59 darkstar kernel: [  438.001442]  [<faa20e87>] l2cap_sock_connect+0x174/0x26b [l2cap]
      Oct  2 17:41:59 darkstar kernel: [  438.001448]  [<c04c8df5>] sys_connect+0x60/0x7a
      Oct  2 17:41:59 darkstar kernel: [  438.001453]  [<c024b703>] ? lock_release_non_nested+0x84/0x1de
      Oct  2 17:41:59 darkstar kernel: [  438.001458]  [<c028804b>] ? might_fault+0x47/0x81
      Oct  2 17:41:59 darkstar kernel: [  438.001462]  [<c028804b>] ? might_fault+0x47/0x81
      Oct  2 17:41:59 darkstar kernel: [  438.001468]  [<c033361f>] ? __copy_from_user_ll+0x11/0xce
      Oct  2 17:41:59 darkstar kernel: [  438.001472]  [<c04c9419>] sys_socketcall+0x82/0x17b
      Oct  2 17:41:59 darkstar kernel: [  438.001477]  [<c020329d>] syscall_call+0x7/0xb
      Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f74c77cb
    • J
      tcp: fix TCP_DEFER_ACCEPT retrans calculation · b103cf34
      Julian Anastasov 提交于
      Fix TCP_DEFER_ACCEPT conversion between seconds and
      retransmission to match the TCP SYN-ACK retransmission periods
      because the time is converted to such retransmissions. The old
      algorithm selects one more retransmission in some cases. Allow
      up to 255 retransmissions.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b103cf34
    • J
      tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT · 0c3d79bc
      Julian Anastasov 提交于
      Change SYN-ACK retransmitting code for the TCP_DEFER_ACCEPT
      users to not retransmit SYN-ACKs during the deferring period if
      ACK from client was received. The goal is to reduce traffic
      during the deferring period. When the period is finished
      we continue with sending SYN-ACKs (at least one) but this time
      any traffic from client will change the request to established
      socket allowing application to terminate it properly.
      Also, do not drop acked request if sending of SYN-ACK fails.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c3d79bc
    • J
      tcp: accept socket after TCP_DEFER_ACCEPT period · d1b99ba4
      Julian Anastasov 提交于
      Willy Tarreau and many other folks in recent years
      were concerned what happens when the TCP_DEFER_ACCEPT period
      expires for clients which sent ACK packet. They prefer clients
      that actively resend ACK on our SYN-ACK retransmissions to be
      converted from open requests to sockets and queued to the
      listener for accepting after the deferring period is finished.
      Then application server can decide to wait longer for data
      or to properly terminate the connection with FIN if read()
      returns EAGAIN which is an indication for accepting after
      the deferring period. This change still can have side effects
      for applications that expect always to see data on the accepted
      socket. Others can be prepared to work in both modes (with or
      without TCP_DEFER_ACCEPT period) and their data processing can
      ignore the read=EAGAIN notification and to allocate resources for
      clients which proved to have no data to send during the deferring
      period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as flag (not
      as a timeout) to wait for data will notice clients that didn't
      send data for 3 seconds but that still resend ACKs.
      Thanks to Willy Tarreau for the initial idea and to
      Eric Dumazet for the review and testing the change.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d1b99ba4
    • D
      Revert "tcp: fix tcp_defer_accept to consider the timeout" · a1a2ad91
      David S. Miller 提交于
      This reverts commit 6d01a026.
      
      Julian Anastasov, Willy Tarreau and Eric Dumazet have come up
      with a more correct way to deal with this.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1a2ad91
  8. 19 10月, 2009 1 次提交
    • T
      AF_UNIX: Fix deadlock on connecting to shutdown socket · 77238f2b
      Tomoki Sekiyama 提交于
      I found a deadlock bug in UNIX domain socket, which makes able to DoS
      attack against the local machine by non-root users.
      
      How to reproduce:
      1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstruct
          namespace(*), and shutdown(2) it.
       2. Repeat connect(2)ing to the listening socket from the other sockets
          until the connection backlog is full-filled.
       3. connect(2) takes the CPU forever. If every core is taken, the
          system hangs.
      
      PoC code: (Run as many times as cores on SMP machines.)
      
      int main(void)
      {
      	int ret;
      	int csd;
      	int lsd;
      	struct sockaddr_un sun;
      
      	/* make an abstruct name address (*) */
      	memset(&sun, 0, sizeof(sun));
      	sun.sun_family = PF_UNIX;
      	sprintf(&sun.sun_path[1], "%d", getpid());
      
      	/* create the listening socket and shutdown */
      	lsd = socket(AF_UNIX, SOCK_STREAM, 0);
      	bind(lsd, (struct sockaddr *)&sun, sizeof(sun));
      	listen(lsd, 1);
      	shutdown(lsd, SHUT_RDWR);
      
      	/* connect loop */
      	alarm(15); /* forcely exit the loop after 15 sec */
      	for (;;) {
      		csd = socket(AF_UNIX, SOCK_STREAM, 0);
      		ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun));
      		if (-1 == ret) {
      			perror("connect()");
      			break;
      		}
      		puts("Connection OK");
      	}
      	return 0;
      }
      
      (*) Make sun_path[0] = 0 to use the abstruct namespace.
          If a file-based socket is used, the system doesn't deadlock because
          of context switches in the file system layer.
      
      Why this happens:
       Error checks between unix_socket_connect() and unix_wait_for_peer() are
       inconsistent. The former calls the latter to wait until the backlog is
       processed. Despite the latter returns without doing anything when the
       socket is shutdown, the former doesn't check the shutdown state and
       just retries calling the latter forever.
      
      Patch:
       The patch below adds shutdown check into unix_socket_connect(), so
       connect(2) to the shutdown socket will return -ECONREFUSED.
      Signed-off-by: NTomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
      Signed-off-by: NMasanori Yoshida <masanori.yoshida.tv@hitachi.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77238f2b
  9. 13 10月, 2009 5 次提交
    • E
      udp: Fix udp_poll() and ioctl() · 85584672
      Eric Dumazet 提交于
      udp_poll() can in some circumstances drop frames with incorrect checksums.
      
      Problem is we now have to lock the socket while dropping frames, or risk
      sk_forward corruption.
      
      This bug is present since commit 95766fff
      ([UDP]: Add memory accounting.)
      
      While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85584672
    • W
      tcp: fix tcp_defer_accept to consider the timeout · 6d01a026
      Willy Tarreau 提交于
      I was trying to use TCP_DEFER_ACCEPT and noticed that if the
      client does not talk, the connection is never accepted and
      remains in SYN_RECV state until the retransmits expire, where
      it finally is deleted. This is bad when some firewall such as
      netfilter sits between the client and the server because the
      firewall sees the connection in ESTABLISHED state while the
      server will finally silently drop it without sending an RST.
      
      This behaviour contradicts the man page which says it should
      wait only for some time :
      
             TCP_DEFER_ACCEPT (since Linux 2.4)
                Allows a listener to be awakened only when data arrives
                on the socket.  Takes an integer value  (seconds), this
                can  bound  the  maximum  number  of attempts TCP will
                make to complete the connection. This option should not
                be used in code intended to be portable.
      
      Also, looking at ipv4/tcp.c, a retransmit counter is correctly
      computed :
      
              case TCP_DEFER_ACCEPT:
                      icsk->icsk_accept_queue.rskq_defer_accept = 0;
                      if (val > 0) {
                              /* Translate value in seconds to number of
                               * retransmits */
                              while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                                     val > ((TCP_TIMEOUT_INIT / HZ) <<
                                             icsk->icsk_accept_queue.rskq_defer_accept))
                                      icsk->icsk_accept_queue.rskq_defer_accept++;
                              icsk->icsk_accept_queue.rskq_defer_accept++;
                      }
                      break;
      
      ==> rskq_defer_accept is used as a counter of retransmits.
      
      But in tcp_minisocks.c, this counter is only checked. And in
      fact, I have found no location which updates it. So I think
      that what was intended was to decrease it in tcp_minisocks
      whenever it is checked, which the trivial patch below does.
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d01a026
    • J
      mac80211: document ieee80211_rx() context requirement · d20ef63d
      Johannes Berg 提交于
      ieee80211_rx() must be called with softirqs disabled
      since the networking stack requires this for netif_rx()
      and some code in mac80211 can assume that it can not
      be processing its own tasklet and this call at the same
      time.
      
      It may be possible to remove this requirement after a
      careful audit of mac80211 and doing any needed locking
      improvements in it along with disabling softirqs around
      netif_rx(). An alternative might be to push all packet
      processing to process context in mac80211, instead of
      to the tasklet, and add other synchronisation.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      d20ef63d
    • J
      mac80211: fix ibss race · 51f98f13
      Johannes Berg 提交于
      When a scan completes, we call ieee80211_sta_find_ibss(),
      which is also called from other places. When the scan was
      done in software, there's no problem as both run from the
      single-threaded mac80211 workqueue and are thus serialised
      against each other, but with hardware scan the completion
      can be in a different context and race against callers of
      this function from the workqueue (e.g. due to beacon RX).
      So instead of calling ieee80211_sta_find_ibss() directly,
      just arm the timer and have it fire, scheduling the work,
      which will invoke ieee80211_sta_find_ibss() (if that is
      appropriate in the current state).
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      51f98f13
    • F
  10. 12 10月, 2009 1 次提交
  11. 09 10月, 2009 1 次提交
  12. 08 10月, 2009 3 次提交
  13. 07 10月, 2009 1 次提交
  14. 05 10月, 2009 3 次提交
    • J
      wext: let get_wireless_stats() sleep · a160ee69
      Johannes Berg 提交于
      A number of drivers (recently including cfg80211-based ones)
      assume that all wireless handlers, including statistics, can
      sleep and they often also implicitly assume that the rtnl is
      held around their invocation. This is almost always true now
      except when reading from sysfs:
      
        BUG: sleeping function called from invalid context at kernel/mutex.c:280
        in_atomic(): 1, irqs_disabled(): 0, pid: 10450, name: head
        2 locks held by head/10450:
         #0:  (&buffer->mutex){+.+.+.}, at: [<c10ceb99>] sysfs_read_file+0x24/0xf4
         #1:  (dev_base_lock){++.?..}, at: [<c12844ee>] wireless_show+0x1a/0x4c
        Pid: 10450, comm: head Not tainted 2.6.32-rc3 #1
        Call Trace:
         [<c102301c>] __might_sleep+0xf0/0xf7
         [<c1324355>] mutex_lock_nested+0x1a/0x33
         [<f8cea53b>] wdev_lock+0xd/0xf [cfg80211]
         [<f8cea58f>] cfg80211_wireless_stats+0x45/0x12d [cfg80211]
         [<c13118d6>] get_wireless_stats+0x16/0x1c
         [<c12844fe>] wireless_show+0x2a/0x4c
      
      Fix this by using the rtnl instead of dev_base_lock.
      Reported-by: NMiles Lane <miles.lane@gmail.com>
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a160ee69
    • E
      pktgen: restore nanosec delays · 9240d715
      Eric Dumazet 提交于
      Commit fd29cf72 (pktgen: convert to use ktime_t)
      inadvertantly converted "delay" parameter from nanosec to microsec.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9240d715
    • E
      pktgen: Fix multiqueue handling · 896a7cf8
      Eric Dumazet 提交于
      It is not currently possible to instruct pktgen to use one selected tx queue.
      
      When Robert added multiqueue support in commit 45b270f8, he added
      an interval (queue_map_min, queue_map_max), and his code doesnt take
      into account the case of min = max, to select one tx queue exactly.
      
      I suspect a high performance setup on a eight txqueue device wants
      to use exactly eight cpus, and assign one tx queue to each sender.
      
      This patchs makes pktgen select the right tx queue, not the first one.
      
      Also updates Documentation to reflect Robert changes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      896a7cf8
  15. 03 10月, 2009 1 次提交
  16. 02 10月, 2009 3 次提交
    • A
      net: Use sk_mark for routing lookup in more places · 914a9ab3
      Atis Elsts 提交于
      This patch against v2.6.31 adds support for route lookup using sk_mark in some 
      more places. The benefits from this patch are the following.
      First, SO_MARK option now has effect on UDP sockets too.
      Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing 
      lookup correctly if TCP sockets with SO_MARK were used.
      Signed-off-by: NAtis Elsts <atis@mikrotik.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      914a9ab3
    • O
      IPv4 TCP fails to send window scale option when window scale is zero · 89e95a61
      Ori Finkelman 提交于
      Acknowledge TCP window scale support by inserting the proper option in SYN/ACK
      and SYN headers even if our window scale is zero.
      
      This fixes the following observed behavior:
      
      1. Client sends a SYN with TCP window scaling option and non zero window scale
      value to a Linux box.
      2. Linux box notes large receive window from client.
      3. Linux decides on a zero value of window scale for its part.
      4. Due to compare against requested window scale size option, Linux does not to
       send windows scale TCP option header on SYN/ACK at all.
      
      With the following result:
      
      Client box thinks TCP window scaling is not supported, since SYN/ACK had no
      TCP window scale option, while Linux thinks that TCP window scaling is
      supported (and scale might be non zero), since SYN had  TCP window scale
      option and we have a mismatched idea between the client and server
      regarding window sizes.
      
      Probably it also fixes up the following bug (not observed in practice):
      
      1. Linux box opens TCP connection to some server.
      2. Linux decides on zero value of window scale.
      3. Due to compare against computed window scale size option, Linux does
      not to set windows scale TCP  option header on SYN.
      
      With the expected result that the server OS does not use window scale option
      due to not receiving such an option in the SYN headers, leading to suboptimal
      performance.
      Signed-off-by: NGilad Ben-Yossef <gilad@codefidence.com>
      Signed-off-by: NOri Finkelman <ori@comsleep.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89e95a61
    • A
      net/ipv4/tcp.c: fix min() type mismatch warning · 4fdb78d3
      Andrew Morton 提交于
      net/ipv4/tcp.c: In function 'do_tcp_setsockopt':
      net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fdb78d3