1. 12 2月, 2015 7 次提交
  2. 11 2月, 2015 14 次提交
  3. 10 2月, 2015 8 次提交
  4. 09 2月, 2015 4 次提交
  5. 08 2月, 2015 7 次提交
    • N
      tcp: mitigate ACK loops for connections as tcp_timewait_sock · 4fb17a60
      Neal Cardwell 提交于
      Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is
      represented by a tcp_timewait_sock, we rate limit dupacks in response
      to incoming packets (a) with TCP timestamps that fail PAWS checks, or
      (b) with sequence numbers that are out of the acceptable window.
      
      We do not send a dupack in response to out-of-window packets if it has
      been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we
      last sent a dupack in response to an out-of-window packet.
      Reported-by: NAvery Fay <avery@mixpanel.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fb17a60
    • N
      tcp: mitigate ACK loops for connections as tcp_sock · f2b2c582
      Neal Cardwell 提交于
      Ensure that in state ESTABLISHED, where the connection is represented
      by a tcp_sock, we rate limit dupacks in response to incoming packets
      (a) with TCP timestamps that fail PAWS checks, or (b) with sequence
      numbers or ACK numbers that are out of the acceptable window.
      
      We do not send a dupack in response to out-of-window packets if it has
      been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we
      last sent a dupack in response to an out-of-window packet.
      
      There is already a similar (although global) rate-limiting mechanism
      for "challenge ACKs". When deciding whether to send a challence ACK,
      we first consult the new per-connection rate limit, and then the
      global rate limit.
      Reported-by: NAvery Fay <avery@mixpanel.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f2b2c582
    • N
      tcp: mitigate ACK loops for connections as tcp_request_sock · a9b2c06d
      Neal Cardwell 提交于
      In the SYN_RECV state, where the TCP connection is represented by
      tcp_request_sock, we now rate-limit SYNACKs in response to a client's
      retransmitted SYNs: we do not send a SYNACK in response to client SYN
      if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms)
      since we last sent a SYNACK in response to a client's retransmitted
      SYN.
      
      This allows the vast majority of legitimate client connections to
      proceed unimpeded, even for the most aggressive platforms, iOS and
      MacOS, which actually retransmit SYNs 1-second intervals for several
      times in a row. They use SYN RTO timeouts following the progression:
      1,1,1,1,1,2,4,8,16,32.
      Reported-by: NAvery Fay <avery@mixpanel.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9b2c06d
    • N
      tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks · 032ee423
      Neal Cardwell 提交于
      Helpers for mitigating ACK loops by rate-limiting dupacks sent in
      response to incoming out-of-window packets.
      
      This patch includes:
      
      - rate-limiting logic
      - sysctl to control how often we allow dupacks to out-of-window packets
      - SNMP counter for cases where we rate-limited our dupack sending
      
      The rate-limiting logic in this patch decides to not send dupacks in
      response to out-of-window segments if (a) they are SYNs or pure ACKs
      and (b) the remote endpoint is sending them faster than the configured
      rate limit.
      
      We rate-limit our responses rather than blocking them entirely or
      resetting the connection, because legitimate connections can rely on
      dupacks in response to some out-of-window segments. For example, zero
      window probes are typically sent with a sequence number that is below
      the current window, and ZWPs thus expect to thus elicit a dupack in
      response.
      
      We allow dupacks in response to TCP segments with data, because these
      may be spurious retransmissions for which the remote endpoint wants to
      receive DSACKs. This is safe because segments with data can't
      realistically be part of ACK loops, which by their nature consist of
      each side sending pure/data-less ACKs to each other.
      
      The dupack interval is controlled by a new sysctl knob,
      tcp_invalid_ratelimit, given in milliseconds, in case an administrator
      needs to dial this upward in the face of a high-rate DoS attack. The
      name and units are chosen to be analogous to the existing analogous
      knob for ICMP, icmp_ratelimit.
      
      The default value for tcp_invalid_ratelimit is 500ms, which allows at
      most one such dupack per 500ms. This is chosen to be 2x faster than
      the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
      2.4). We allow the extra 2x factor because network delay variations
      can cause packets sent at 1 second intervals to be compressed and
      arrive much closer.
      Reported-by: NAvery Fay <avery@mixpanel.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      032ee423
    • J
      net: openvswitch: Support masked set actions. · 83d2b9ba
      Jarno Rajahalme 提交于
      OVS userspace already probes the openvswitch kernel module for
      OVS_ACTION_ATTR_SET_MASKED support.  This patch adds the kernel module
      implementation of masked set actions.
      
      The existing set action sets many fields at once.  When only a subset
      of the IP header fields, for example, should be modified, all the IP
      fields need to be exact matched so that the other field values can be
      copied to the set action.  A masked set action allows modification of
      an arbitrary subset of the supported header bits without requiring the
      rest to be matched.
      
      Masked set action is now supported for all writeable key types, except
      for the tunnel key.  The set tunnel action is an exception as any
      input tunnel info is cleared before action processing starts, so there
      is no tunnel info to mask.
      
      The kernel module converts all (non-tunnel) set actions to masked set
      actions.  This makes action processing more uniform, and results in
      less branching and duplicating the action processing code.  When
      returning actions to userspace, the fully masked set actions are
      converted back to normal set actions.  We use a kernel internal action
      code to be able to tell the userspace provided and converted masked
      set actions apart.
      Signed-off-by: NJarno Rajahalme <jrajahalme@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83d2b9ba
    • S
      x86/tlb/trace: Do not trace on CPU that is offline · 6c8465a8
      Steven Rostedt (Red Hat) 提交于
      When taking a CPU down for suspend and resume, a tracepoint may be called
      when the CPU has been designated offline. As tracepoints require RCU for
      protection, they must not be called if the current CPU is offline.
      
      Unfortunately, trace_tlb_flush() is called in this scenario as was noted
      by LOCKDEP:
      
      ...
      
       Disabling non-boot CPUs ...
       intel_pstate CPU 1 exiting
      
       ===============================
       smpboot: CPU 1 didn't die...
       [ INFO: suspicious RCU usage. ]
       3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
       -------------------------------
       include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       RCU used illegally from offline CPU!
       rcu_scheduler_active = 1, debug_locks = 0
       no locks held by swapper/1/0.
      
       stack backtrace:
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
       Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
        0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
        ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
        0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
       Call Trace:
        [<ffffffff817e370d>] dump_stack+0x4c/0x65
        [<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
        [<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
        [<ffffffff81054c4e>] play_dead_common+0xe/0x50
        [<ffffffff81054ca5>] native_play_dead+0x15/0x140
        [<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
        [<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
        [<ffffffff81053e20>] start_secondary+0x140/0x150
       intel_pstate CPU 2 exiting
      
      ...
      
      By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
      condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
      code when the CPU is offline.
      
      Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
      
      Cc: stable@vger.kernel.org # 3.17+
      Fixes: d17d8f9d "x86/mm: Add tracepoints for TLB flushes"
      Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NDave Hansen <dave@sr71.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6c8465a8
    • S
      tracing: Add condition check to RCU lockdep checks · a05d59a5
      Steven Rostedt (Red Hat) 提交于
      The trace_tlb_flush() tracepoint can be called when a CPU is going offline.
      When a CPU is offline, RCU is no longer watching that CPU and since the
      tracepoint is protected by RCU, it must not be called. To prevent the
      tlb_flush tracepoint from being called when the CPU is offline, it was
      converted to a TRACE_EVENT_CONDITION where the condition checks if the
      CPU is online before calling the tracepoint.
      
      Unfortunately, this was not enough to stop lockdep from complaining about
      it. Even though the RCU protected code of the tracepoint will never be
      called, the condition is hidden within the tracepoint, and even though the
      condition prevents RCU code from being called, the lockdep checks are
      outside the tracepoint (this is to test tracepoints even when they are not
      enabled).
      
      Even though tracepoints should be checked to be RCU safe when they are not
      enabled, the condition should still be considered when checking RCU.
      
      Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
      
      Fixes: 3a630178 "tracing: generate RCU warnings even when tracepoints are disabled"
      Cc: stable@vger.kernel.org # 3.18+
      Acked-by: NDave Hansen <dave@sr71.net>
      Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a05d59a5