1. 21 Jul, 2012 4 commits
  2. 20 Jul, 2012 9 commits
    • net-tcp: Fast Open client - cookie-less mode · 67da22d2
      Yuchung Cheng authored
      In trusted networks, e.g., intranet, data-center, the client does not
      need to use Fast Open cookie to mitigate DoS attacks. In cookie-less
      mode, sendmsg() with MSG_FASTOPEN flag will send SYN-data regardless
      of cookie availability.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tcp: Fast Open client - detecting SYN-data drops · aab48743
      Yuchung Cheng authored
      On paths with firewalls dropping SYN with data or experimental TCP options,
      Fast Open connections will experience SYN timeouts and bad performance.
      The solution is to track such incidents in the cookie cache and disable
      Fast Open temporarily.

      Since only the original SYN includes data and/or Fast Open option, the
      SYN-ACK has some tell-tale sign (tcp_rcv_fastopen_synack()) to detect
      such drops. If a path has recurring Fast Open SYN drops, Fast Open is
      disabled for 2^(recurring_losses) minutes, starting from four minutes up to
      roughly one and a half days. sendmsg() with the MSG_FASTOPEN flag will still
      succeed, but it behaves as connect() followed by write().
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
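The back-off described in this commit can be illustrated with a small sketch. It is not the kernel code: the function name, the rule that a single loss does not yet disable Fast Open, and the cap at 2^11 minutes (about 1.4 days, read from "roughly one and a half days") are assumptions made for illustration.

```python
def fastopen_disable_minutes(recurring_losses, max_exponent=11):
    """Minutes for which Fast Open stays disabled on a path.

    Assumptions (not taken from the kernel source):
    - fewer than two losses do not count as "recurring", so nothing is disabled;
    - the exponent is capped at max_exponent (2**11 = 2048 minutes).
    """
    if recurring_losses < 2:
        return 0
    # 2^(recurring_losses) minutes: 2 losses give 4 minutes, 3 give 8, and so on.
    return 2 ** min(recurring_losses, max_exponent)


if __name__ == "__main__":
    for losses in (1, 2, 3, 11, 20):
        print(losses, fastopen_disable_minutes(losses))
```

During the disabled period the application-visible behavior stays the same; only the SYN stops carrying data.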
    • net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN) · cf60af03
      Yuchung Cheng authored
      sendmsg() (or sendto()) with MSG_FASTOPEN is a combo of connect(2)
      and write(2). The application should replace connect() with it to
      send data in the opening SYN packet.
      
      For a blocking socket, sendmsg() blocks until all the data are buffered
      locally and the handshake is completed, like a connect() call. It
      returns an errno similar to connect() if the TCP handshake fails.

      For a non-blocking socket, it returns the number of bytes queued (and
      transmitted in the SYN-data packet) if a cookie is available. If a cookie
      is not available, it transmits a data-less SYN packet with the Fast Open
      cookie request option and returns -EINPROGRESS like connect().

      Using MSG_FASTOPEN on a connecting or connected socket will result in
      an errno similar to that of repeated connect() calls. Therefore the
      application should only use this flag on new sockets.
      
      The buffer size of sendmsg() is independent of the MSS of the connection.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
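From user space, the replacement of connect() described above might look like the following sketch. It uses a blocking socket and Python's socket.sendto() with the MSG_FASTOPEN flag. The helper name send_with_fast_open and the fallback to a plain connect() when the flag is missing or rejected are assumptions for illustration, not part of the patch.

```python
import socket


def send_with_fast_open(addr, payload):
    """Open a TCP connection to addr and send payload, preferring SYN-data.

    Hypothetical helper: falls back to connect() + sendall() when
    MSG_FASTOPEN is unavailable or the kernel rejects it.
    """
    flag = getattr(socket, "MSG_FASTOPEN", None)  # only defined on some platforms
    if flag is not None:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # Takes the place of connect(); on a blocking socket this returns
            # once the handshake has completed and data has been queued.
            sent = sock.sendto(payload, flag, addr)
        except OSError:
            sock.close()  # nothing was sent, so retry the classic way below
        else:
            sock.sendall(payload[sent:])  # push any bytes not yet queued
            return sock
    sock = socket.create_connection(addr)
    sock.sendall(payload)
    return sock
```

The flag is used only on a fresh socket, matching the advice in the commit message. Whether the data actually travels in the SYN depends on the kernel configuration and on a cookie being cached for the destination.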
    • net-tcp: Fast Open client - sending SYN-data · 783237e8
      Yuchung Cheng authored
      This patch implements sending SYN-data in tcp_connect(). The data is
      from tcp_sendmsg() with flag MSG_FASTOPEN (implemented in a later patch).
      
      The length of the cookie in tcp_fastopen_req, init'd to 0, controls the
      type of the SYN. If the cookie is not cached (len == 0), the host sends a
      data-less SYN with the Fast Open cookie request option to solicit a cookie
      from the remote. If the cookie is available (len > 0), the host sends
      a SYN-data with the Fast Open cookie option. If the cookie length is negative,
      the SYN will not include any Fast Open option (for fallback operations).
      
      To deal with middleboxes that may drop SYN with data or experimental TCP
      option, the SYN-data is only sent once. SYN retransmits do not include
      data or Fast Open options. The connection will fall back to regular TCP
      handshake.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
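The three cases selected by the cookie length, plus the rule that retransmitted SYNs carry neither data nor Fast Open options, can be summarized in a short sketch. The function name and the dictionary layout are hypothetical and only mirror the commit text.

```python
def describe_syn(cookie_len, is_retransmit=False):
    """Return what the SYN carries for a given cached cookie length.

    Illustrative only; the names here are not kernel identifiers.
    """
    if is_retransmit or cookie_len < 0:
        # Fallback: a regular SYN without any Fast Open option or data.
        return {"fast_open_option": None, "carries_data": False}
    if cookie_len == 0:
        # No cached cookie: ask the server for one, send no data yet.
        return {"fast_open_option": "cookie_request", "carries_data": False}
    # Cookie available: send it together with the data.
    return {"fast_open_option": "cookie", "carries_data": True}


if __name__ == "__main__":
    for length in (-1, 0, 8):
        print(length, describe_syn(length))
```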
    • net-tcp: Fast Open client - cookie cache · 1fe4c481
      Yuchung Cheng authored
      With help from Eric Dumazet, add Fast Open metrics to the TCP metrics cache.
      The basic ones are MSS and the cookies. A later patch will cache more to
      handle unfriendly middleboxes.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net-tcp: Fast Open base · 2100c8d2
      Yuchung Cheng authored
      This patch implements the common code for both the client and server.

      1. TCP Fast Open option processing. Since Fast Open does not have an
         option number assigned by IANA yet, it shares the experimental option
         code 254 by implementing draft-ietf-tcpm-experimental-options
         with a 16-bit magic number 0xF989. This enables global experiments
         without clashing over the two scarce experimental option codes
         available for TCP.

         When the draft status becomes standard (maybe), the client should
         switch to the newly assigned option number while the server supports
         both numbers for the transition.

      2. The new sysctl tcp_fastopen

      3. A placeholder init function
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
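As a rough illustration of the shared experimental option, the sketch below packs and parses an option with kind 254 and the 16-bit magic number 0xF989. The byte layout (kind, total length, magic, then cookie) follows the experimental-options draft as understood here, and treating an empty cookie as the cookie request is an assumption. The function names are hypothetical.

```python
import struct

TCPOPT_EXP = 254                 # shared experimental option kind
TCPOPT_FASTOPEN_MAGIC = 0xF989   # 16-bit magic identifying Fast Open


def build_fastopen_option(cookie=b""):
    """Encode the option; an empty cookie models a cookie request (assumed)."""
    length = 4 + len(cookie)  # kind (1) + length (1) + magic (2) + cookie
    header = struct.pack("!BBH", TCPOPT_EXP, length, TCPOPT_FASTOPEN_MAGIC)
    return header + cookie


def parse_fastopen_option(option):
    """Return the cookie bytes, or None if this is not a Fast Open option."""
    if len(option) < 4:
        return None
    kind, length, magic = struct.unpack("!BBH", option[:4])
    if kind != TCPOPT_EXP or magic != TCPOPT_FASTOPEN_MAGIC:
        return None  # some other experiment sharing kind 254
    if length != len(option):
        return None  # malformed length field
    return option[4:]
```

The magic number check is what lets several experiments share option kind 254 without misreading each other's payloads.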
    • net: Fix warnings in dst_ops.h · d8f1641b
      David S. Miller authored
      include/net/dst_ops.h:28:20: warning: ‘struct sock’ declared inside parameter list
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: tcp: remove per net tcp_sock · be9f4a44
      Eric Dumazet authored
      tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
      per network namespace.
      
      This leads to bad behavior on multiqueue NICs, because many CPUs
      contend for the socket lock, and once the socket lock is acquired, extra
      false sharing on various socket fields slows down the operations.

      To better resist attacks, we use a percpu socket. Each CPU can
      run without contention, using appropriate memory (local node).

      Additional features:

      1) We also mirror the queue_mapping of the incoming skb, so that
      answers use the same queue if possible.

      2) Setting the SOCK_USE_WRITE_QUEUE socket flag speeds up sock_wfree()
      
      3) We now limit the number of in-flight RST/ACK [1] packets
      per cpu, instead of per namespace, and we honor the sysctl_wmem_default
      limit dynamically. (Prior to this patch, sysctl_wmem_default value was
      copied at boot time, so any further change would not affect tcp_sock
      limit)
      
      [1] These packets are only generated when no socket was matched for
      the incoming packet.
      Reported-by: Bill Sommerfeld <wsommerfeld@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: use seqlock for nh_exceptions · aee06da6
      Julian Anastasov authored
      Use global seqlock for the nh_exceptions. Call
      fnhe_oldest with the right hash chain. Correct the diff
      value for dst_set_expires.
      
      v2: after suggestions from Eric Dumazet:
      * get rid of spin lock fnhe_lock, rearrange update_or_create_fnhe
      * continue daddr search in rt_bind_exception
      
      v3:
      * remove the daddr check before seqlock in rt_bind_exception
      * restart lookup in rt_bind_exception on detected seqlock change,
      as suggested by David Miller
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 19 Jul, 2012 2 commits
  4. 18 Jul, 2012 1 commit
  5. 17 Jul, 2012 7 commits
    • ipv4: Add FIB nexthop exceptions. · 4895c771
      David S. Miller authored
      In a regime where we have subnetted route entries, we need a way to
      keep persistent storage for destination-specific learned values
      such as redirects and PMTU values.

      This is implemented here via nexthop exceptions.

      The initial implementation is a 2048 entry hash table with reclaiming
      starting at chain length 5.  A more sophisticated scheme can be
      devised if that proves necessary.
      Signed-off-by: David S. Miller <davem@davemloft.net>
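A toy model of such a table is sketched below: 2048 buckets, with the oldest entry of a chain recycled once that chain already holds 5 entries. The class, its field names, the use of Python's hash() as the hash function, and the exact reclaim threshold are assumptions made for illustration. The kernel's data structures and locking are not modeled.

```python
from itertools import count

HASH_SIZE = 2048     # number of buckets, as stated in the commit
RECLAIM_DEPTH = 5    # chain length at which entries start being recycled


class ExceptionTable:
    """Hypothetical per-destination store for learned values such as PMTU."""

    def __init__(self):
        self.buckets = [[] for _ in range(HASH_SIZE)]
        self._clock = count()  # logical timestamps for "oldest" decisions

    def update(self, daddr, pmtu):
        chain = self.buckets[hash(daddr) % HASH_SIZE]
        stamp = next(self._clock)
        for entry in chain:
            if entry["daddr"] == daddr:
                entry.update(pmtu=pmtu, stamp=stamp)  # refresh existing entry
                return entry
        if len(chain) >= RECLAIM_DEPTH:
            # Chain is full: reuse the least recently updated entry.
            entry = min(chain, key=lambda e: e["stamp"])
            entry.update(daddr=daddr, pmtu=pmtu, stamp=stamp)
            return entry
        entry = {"daddr": daddr, "pmtu": pmtu, "stamp": stamp}
        chain.append(entry)
        return entry

    def lookup(self, daddr):
        for entry in self.buckets[hash(daddr) % HASH_SIZE]:
            if entry["daddr"] == daddr:
                return entry
        return None
```

Bounding the chain length keeps lookups cheap, at the cost of forgetting the least recently updated destination in a crowded bucket.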
    • net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller authored
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cfg80211: add cellular base station regulatory hint support · 57b5ce07
      Luis R. Rodriguez authored
      Cellular base stations can provide hints to cfg80211 about
      where they think we are. This can be done for example on
      a cell phone. To enable these hints we simply allow them
      through as user regulatory hints but we allow userspace
      to classify the hint as either coming directly from the
      user or coming from a cellular base station. This option
      is only available when you enable
      CONFIG_CFG80211_CERTIFICATION_ONUS.
      
      The base station hints themselves will not be processed
      by the core unless at least one device on the system
      supports this feature.
      Signed-off-by: Luis R. Rodriguez <mcgrof@qca.qualcomm.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • ipvs: fix oops on NAT reply in br_nf context · 9e33ce45
      Lin Ming authored
      IPVS should not reset skb->nf_bridge in FORWARD hook
      by calling nf_reset for NAT replies. It triggers oops in
      br_nf_forward_finish.
      
      [  579.781508] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [  579.781669] IP: [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
      [  579.781792] PGD 218f9067 PUD 0
      [  579.781865] Oops: 0000 [#1] SMP
      [  579.781945] CPU 0
      [  579.781983] Modules linked in:
      [  579.782047]
      [  579.782080]
      [  579.782114] Pid: 4644, comm: qemu Tainted: G        W    3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard  /30E8
      [  579.782300] RIP: 0010:[<ffffffff817b1ca5>]  [<ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
      [  579.782455] RSP: 0018:ffff88007b003a98  EFLAGS: 00010287
      [  579.782541] RAX: 0000000000000008 RBX: ffff8800762ead00 RCX: 000000000001670a
      [  579.782653] RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff8800762ead00
      [  579.782845] RBP: ffff88007b003ac8 R08: 0000000000016630 R09: ffff88007b003a90
      [  579.782957] R10: ffff88007b0038e8 R11: ffff88002da37540 R12: ffff88002da01a02
      [  579.783066] R13: ffff88002da01a80 R14: ffff88002d83c000 R15: ffff88002d82a000
      [  579.783177] FS:  0000000000000000(0000) GS:ffff88007b000000(0063) knlGS:00000000f62d1b70
      [  579.783306] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
      [  579.783395] CR2: 0000000000000004 CR3: 00000000218fe000 CR4: 00000000000027f0
      [  579.783505] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  579.783684] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  579.783795] Process qemu (pid: 4644, threadinfo ffff880021b20000, task ffff880021aba760)
      [  579.783919] Stack:
      [  579.783959]  ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
      [  579.784110]  ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
      [  579.784260]  ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
      [  579.784477] Call Trace:
      [  579.784523]  <IRQ>
      [  579.784562]
      [  579.784603]  [<ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
      [  579.784707]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
      [  579.784797]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
      [  579.784906]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
      [  579.784995]  [<ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
      [  579.785175]  [<ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
      [  579.785179]  [<ffffffff817ac417>] __br_forward+0x97/0xa2
      [  579.785179]  [<ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
      [  579.785179]  [<ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
      [  579.785179]  [<ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
      [  579.785179]  [<ffffffff81704b58>] nf_iterate+0x47/0x7d
      [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
      [  579.785179]  [<ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
      [  579.785179]  [<ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
      [  579.785179]  [<ffffffff81551525>] ? sky2_poll+0xb35/0xb54
      [  579.785179]  [<ffffffff817ad62a>] br_handle_frame+0x213/0x229
      [  579.785179]  [<ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
      [  579.785179]  [<ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
      [  579.785179]  [<ffffffff816e69fc>] process_backlog+0x99/0x1e2
      [  579.785179]  [<ffffffff816e6800>] net_rx_action+0xdf/0x242
      [  579.785179]  [<ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
      [  579.785179]  [<ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
      [  579.785179]  [<ffffffff8188812c>] call_softirq+0x1c/0x30
      
      Steps to reproduce:

      1. On Host1, set up bridge br0 (192.168.1.106)
      2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
      3. Start IPVS service on Host1
         ipvsadm -A -t 192.168.1.106:80 -s rr
         ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
      4. Run apache benchmark on Host2(192.168.1.101)
         ab -n 1000 http://192.168.1.106/
      
      ip_vs_reply4
        ip_vs_out
          handle_response
            ip_vs_notrack
              nf_reset()
              {
                skb->nf_bridge = NULL;
              }
      
      Actually, IPVS wants in this case just to replace nfct
      with untracked version. So replace the nf_reset(skb) call
      in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.
      Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • cfg80211: support TX error rate CQM · 84f10708
      Thomas Pedersen authored
      Let the user configure several TX error connection quality monitoring
      parameters: % error rate, survey interval, and # of attempted packets.
      
      On exceeding the TX failure rate over the given interval, the driver
      will send a CQM notify event with the actual TX failure rate and
      packets attempted.
      Signed-off-by: Thomas Pedersen <c_tpeder@qca.qualcomm.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
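One way to picture the check a driver might run at the end of each survey interval is sketched below. The function name, the integer percentage arithmetic, and the reading of "# of attempted packets" as a minimum sample size are assumptions. The commit text does not spell out these details.

```python
def txe_notification(attempted, failed, rate_pct, min_pkts):
    """Decide whether a CQM TX-error event is due for one survey interval.

    Returns (observed_rate_pct, attempted) when the failure rate exceeds
    rate_pct, otherwise None. Hypothetical helper, not a cfg80211 API.
    """
    if attempted == 0 or attempted < min_pkts:
        return None  # assumed: too few packets to judge the link
    observed = failed * 100 // attempted
    if observed > rate_pct:
        return observed, attempted
    return None


if __name__ == "__main__":
    print(txe_notification(attempted=100, failed=30, rate_pct=20, min_pkts=50))
```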
    • tcp: implement RFC 5961 3.2 · 282f23c6
      Eric Dumazet authored
      Implement the RFC 5961 mitigation against the Blind
      Reset attack using the RST bit.

      The idea is to validate the incoming RST sequence number
      so that it matches the RCV.NXT value exactly, instead of the previously
      accepted window: (RCV.NXT <= SEG.SEQ < RCV.NXT+RCV.WND)

      If the sequence is in window but not an exact match, send
      a "challenge ACK", so that the other party can resend an
      RST with the appropriate sequence.

      Add a new sysctl, tcp_challenge_ack_limit, to limit the
      number of challenge ACKs sent per second.

      Add a new SNMP counter to count the number of challenge ACKs sent.
      (netstat -s | grep TCPChallengeACK)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kiran Kumar Kella <kkiran@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
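The decision described in this commit can be sketched as a three-way classification of an incoming RST. The function name and return strings are hypothetical, and the per-second rate limit on challenge ACKs is left out. Sequence numbers are compared modulo 2^32 so the window test also holds across wraparound.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers are 32-bit and wrap around


def classify_rst(seg_seq, rcv_nxt, rcv_wnd):
    """Classify an incoming RST segment by its sequence number.

    Illustrative sketch of the RFC 5961 section 3.2 check, not kernel code.
    """
    if seg_seq == rcv_nxt:
        return "reset"            # exact match: accept the RST
    if (seg_seq - rcv_nxt) % SEQ_MOD < rcv_wnd:
        return "challenge_ack"    # in window but not exact: ask peer to confirm
    return "drop"                 # out of window: ignore silently


if __name__ == "__main__":
    print(classify_rst(1000, 1000, 500))
    print(classify_rst(1200, 1000, 500))
    print(classify_rst(1500, 1000, 500))
```

A blind attacker now has to guess one exact sequence number instead of any value inside the receive window.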
    • net: make sock diag per-namespace · 51d7cccf
      Andrey Vagin authored
      Before this patch sock_diag works for init_net only and dumps
      information about sockets from all namespaces.

      This patch extends sock_diag to all namespaces.
      It creates a netlink kernel socket for each netns and filters
      data during dumping.

      v2: filter according to netns in all places
          remove an unused variable.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Andrew Vagin <avagin@openvz.org>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 16 Jul, 2012 3 commits
  7. 15 Jul, 2012 1 commit
    • Bluetooth: Use tx window from config response for ack timing · c20f8e35
      Mat Martineau authored
      This change addresses an L2CAP ERTM throughput problem when a remote
      device does not fully utilize the available transmit window.
      
      The L2CAP ERTM transmit window size determines the maximum number of
      unacked frames that may be outstanding at any time. It is configured
      separately for each direction of an ERTM connection. Each side sends a
      configuration request with a tx_win field indicating how many unacked
      frames it is capable of receiving before sending an ack. The
      configuration response's tx_win field shows how many frames the
      transmitter will actually send before waiting for an ack.
      
      It's important to trace both the actual transmit window (to check for
      validity of incoming frames) and the number of frames that the
      transmitter will send before waiting (to send acks at the appropriate
      time). Now there are separate tx_win and ack_win values. ack_win is
      updated based on configuration responses, and is used to determine
      when acks are sent.
      Signed-off-by: Mat Martineau <mathewm@codeaurora.org>
      Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
  8. 13 Jul, 2012 4 commits
  9. 12 Jul, 2012 9 commits