1. 03 11月, 2015 1 次提交
  2. 22 10月, 2015 4 次提交
  3. 21 10月, 2015 5 次提交
    • Y
      tcp: use RACK to detect losses · 4f41b1c5
      Yuchung Cheng 提交于
      This patch implements the second half of RACK that uses the the most
      recent transmit time among all delivered packets to detect losses.
      
      tcp_rack_mark_lost() is called upon receiving a dubious ACK.
      It then checks if an not-yet-sacked packet was sent at least
      "reo_wnd" prior to the sent time of the most recently delivered.
      If so the packet is deemed lost.
      
      The "reo_wnd" reordering window starts with 1msec for fast loss
      detection and changes to min-RTT/4 when reordering is observed.
      We found 1msec accommodates well on tiny degree of reordering
      (<3 pkts) on faster links. We use min-RTT instead of SRTT because
      reordering is more of a path property but SRTT can be inflated by
      self-inflicated congestion. The factor of 4 is borrowed from the
      delayed early retransmit and seems to work reasonably well.
      
      Since RACK is still experimental, it is now used as a supplemental
      loss detection on top of existing algorithms. It is only effective
      after the fast recovery starts or after the timeout occurs. The
      fast recovery is still triggered by FACK and/or dupack threshold
      instead of RACK.
      
      We introduce a new sysctl net.ipv4.tcp_recovery for future
      experiments of loss recoveries. For now RACK can be disabled by
      setting it to 0.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f41b1c5
    • Y
      tcp: track the packet timings in RACK · 659a8ad5
      Yuchung Cheng 提交于
      This patch is the first half of the RACK loss recovery.
      
      RACK loss recovery uses the notion of time instead
      of packet sequence (FACK) or counts (dupthresh). It's inspired by the
      previous FACK heuristic in tcp_mark_lost_retrans(): when a limited
      transmit (new data packet) is sacked, then current retransmitted
      sequence below the newly sacked sequence must been lost,
      since at least one round trip time has elapsed.
      
      But it has several limitations:
      1) can't detect tail drops since it depends on limited transmit
      2) is disabled upon reordering (assumes no reordering)
      3) only enabled in fast recovery ut not timeout recovery
      
      RACK (Recently ACK) addresses these limitations with the notion
      of time instead: a packet P1 is lost if a later packet P2 is s/acked,
      as at least one round trip has passed.
      
      Since RACK cares about the time sequence instead of the data sequence
      of packets, it can detect tail drops when later retransmission is
      s/acked while FACK or dupthresh can't. For reordering RACK uses a
      dynamically adjusted reordering window ("reo_wnd") to reduce false
      positives on ever (small) degree of reordering.
      
      This patch implements tcp_advanced_rack() which tracks the
      most recent transmission time among the packets that have been
      delivered (ACKed or SACKed) in tp->rack.mstamp. This timestamp
      is the key to determine which packet has been lost.
      
      Consider an example that the sender sends six packets:
      T1: P1 (lost)
      T2: P2
      T3: P3
      T4: P4
      T100: sack of P2. rack.mstamp = T2
      T101: retransmit P1
      T102: sack of P2,P3,P4. rack.mstamp = T4
      T205: ACK of P4 since the hole is repaired. rack.mstamp = T101
      
      We need to be careful about spurious retransmission because it may
      falsely advance tp->rack.mstamp by an RTT or an RTO, causing RACK
      to falsely mark all packets lost, just like a spurious timeout.
      
      We identify spurious retransmission by the ACK's TS echo value.
      If TS option is not applicable but the retransmission is acknowledged
      less than min-RTT ago, it is likely to be spurious. We refrain from
      using the transmission time of these spurious retransmissions.
      
      The second half is implemented in the next patch that marks packet
      lost using RACK timestamp.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      659a8ad5
    • Y
      tcp: skb_mstamp_after helper · 625a5e10
      Yuchung Cheng 提交于
      a helper to prepare the first main RACK patch.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      625a5e10
    • Y
      tcp: remove tcp_mark_lost_retrans() · af82f4e8
      Yuchung Cheng 提交于
      Remove the existing lost retransmit detection because RACK subsumes
      it completely. This also stops the overloading the ack_seq field of
      the skb control block.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af82f4e8
    • Y
      tcp: track min RTT using windowed min-filter · f6722583
      Yuchung Cheng 提交于
      Kathleen Nichols' algorithm for tracking the minimum RTT of a
      data stream over some measurement window. It uses constant space
      and constant time per update. Yet it almost always delivers
      the same minimum as an implementation that has to keep all
      the data in the window. The measurement window is tunable via
      sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes.
      
      The algorithm keeps track of the best, 2nd best & 3rd best min
      values, maintaining an invariant that the measurement time of
      the n'th best >= n-1'th best. It also makes sure that the three
      values are widely separated in the time window since that bounds
      the worse case error when that data is monotonically increasing
      over the window.
      
      Upon getting a new min, we can forget everything earlier because
      it has no value - the new min is less than everything else in the
      window by definition and it's the most recent. So we restart fresh
      on every new min and overwrites the 2nd & 3rd choices. The same
      property holds for the 2nd & 3rd best.
      
      Therefore we have to maintain two invariants to maximize the
      information in the samples, one on values (1st.v <= 2nd.v <=
      3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <=
      now). These invariants determine the structure of the code
      
      The RTT input to the windowed filter is the minimum RTT measured
      from ACK or SACK, or as the last resort from TCP timestamps.
      
      The accessor tcp_min_rtt() returns the minimum RTT seen in the
      window. ~0U indicates it is not available. The minimum is 1usec
      even if the true RTT is below that.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f6722583
  4. 19 10月, 2015 2 次提交
  5. 17 10月, 2015 4 次提交
    • E
      net: add pfmemalloc check in sk_add_backlog() · c7c49b8f
      Eric Dumazet 提交于
      Greg reported crashes hitting the following check in __sk_backlog_rcv()
      
      	BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
      
      The pfmemalloc bit is currently checked in sk_filter().
      
      This works correctly for TCP, because sk_filter() is ran in
      tcp_v[46]_rcv() before hitting the prequeue or backlog checks.
      
      For UDP or other protocols, this does not work, because the sk_filter()
      is ran from sock_queue_rcv_skb(), which might be called _after_ backlog
      queuing if socket is owned by user by the time packet is processed by
      softirq handler.
      
      Fixes: b4b9e355 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NGreg Thelen <gthelen@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7c49b8f
    • A
      netfilter: turn NF_HOOK into an inline function · 008027c3
      Arnd Bergmann 提交于
      A recent change to the dst_output handling caused a new warning
      when the call to NF_HOOK() is the only used of a local variable
      passed as 'dev', and CONFIG_NETFILTER is disabled:
      
      net/ipv6/ip6_output.c: In function 'ip6_output':
      net/ipv6/ip6_output.c:135:21: warning: unused variable 'dev' [-Wunused-variable]
      
      The reason for this is that the NF_HOOK macro in this case does
      not reference the variable at all, and the call to dev_net(dev)
      got removed from the ip6_output function. To avoid that warning now
      and in the future, this changes the macro into an equivalent
      inline function, which tells the compiler that the variable is
      passed correctly but still unused.
      
      The dn_forward function apparently had the same problem in
      the past and added a local workaround that no longer works
      with the inline function. In order to avoid a regression, we
      have to also remove the #ifdef from decnet in the same patch.
      
      Fixes: ede2059d ("dst: Pass net into dst->output")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      008027c3
    • F
      netfilter: make nf_queue_entry_get_refs return void · ed78d09d
      Florian Westphal 提交于
      We don't care if module is being unloaded anymore since hook unregister
      handling will destroy queue entries using that hook.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ed78d09d
    • F
      netfilter: remove hook owner refcounting · 2ffbceb2
      Florian Westphal 提交于
      since commit 8405a8ff ("netfilter: nf_qeueue: Drop queue entries on
      nf_unregister_hook") all pending queued entries are discarded.
      
      So we can simply remove all of the owner handling -- when module is
      removed it also needs to unregister all its hooks.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2ffbceb2
  6. 16 10月, 2015 3 次提交
  7. 15 10月, 2015 12 次提交
  8. 14 10月, 2015 1 次提交
  9. 13 10月, 2015 8 次提交
    • A
      can: at91: remove at91_can_data · 42160a04
      Alexandre Belloni 提交于
      struct at91_can_data was used to pass a callback to the driver, allowing it
      to switch the transceiver on and off. As all at91 boards are now using DT,
      this is not used anymore, remove that structure.
      Signed-off-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      42160a04
    • A
      can: avoid using timeval for uapi · ba61a8d9
      Arnd Bergmann 提交于
      The can subsystem communicates with user space using a bcm_msg_head
      header, which contains two timestamps. This is problematic for
      multiple reasons:
      
      a) The structure layout is currently incompatible between 64-bit
         user space and 32-bit user space, and cannot work in compat
         mode (other than x32).
      
      b) The timeval structure layout will change in 32-bit user
         space when we fix the y2038 overflow problem by redefining
         time_t to 64-bit, making new 32-bit user space incompatible
         with the current kernel interface.
         Cars last a long time and often use old kernels, so the actual
         users of this code are the most likely ones to migrate to y2038
         safe user space.
      
      This tries to work around part of the problem by changing the
      publicly visible user interface in the header, but not the binary
      interface. Fortunately, the values passed around in the structure
      are relative times and do not actually suffer from the y2038
      overflow, so 32-bit is enough here.
      
      We replace the use of 'struct timeval' with a newly defined
      'struct bcm_timeval' that uses the exact same binary layout
      as before and that still suffers from problem a) but not problem
      b).
      
      The downside of this approach is that any user space program
      that currently assigns a timeval structure to these members
      rather than writing the tv_sec/tv_usec portions individually
      will suffer a compile-time error when built with an updated
      kernel header. Fixing this error makes it work fine with old
      and new headers though.
      
      We could address problem a) by using '__u32' or 'int' members
      rather than 'long', but that would have a more significant
      downside in also breaking support for all existing 64-bit user
      binaries that might be using this interface, which is likely
      not acceptable.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Cc: linux-can@vger.kernel.org
      Cc: linux-api@vger.kernel.org
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      ba61a8d9
    • D
      net: Add IPv6 support to l3mdev · ccf3c8c3
      David Ahern 提交于
      Add operations to retrieve cached IPv6 dst entry from l3mdev device
      and lookup IPv6 source address.
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccf3c8c3
    • A
      cfg80211: Add multiple scan plans for scheduled scan · 3b06d277
      Avraham Stern 提交于
      Add the option to configure multiple 'scan plans' for scheduled scan.
      Each 'scan plan' defines the number of scan cycles and the interval
      between scans. The scan plans are executed in the order they were
      configured. The last scan plan will always run infinitely and thus
      defines only the interval between scans.
      The maximum number of scan plans supported by the device and the
      maximum number of iterations in a single scan plan are advertised
      to userspace so it can configure the scan plans appropriately.
      
      When scheduled scan results are received there is no way to know which
      scan plan is being currently executed, so there is no way to know when
      the next scan iteration will start. This is not a problem, however.
      The scan start timestamp is only used for flushing old scan results,
      and there is no difference between flushing all results received until
      the end of the previous iteration or the start of the current one,
      since no results will be received in between.
      Signed-off-by: NAvraham Stern <avraham.stern@intel.com>
      Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      3b06d277
    • J
      wireless: add WNM action frame categories · af614261
      Johannes Berg 提交于
      Add the WNM and unprotected WNM categories and mark the latter
      as not robust.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      af614261
    • J
      wireless: update robust action frame list · a4288289
      Johannes Berg 提交于
      Unprotected DMG and VHT action frames are not protected, reflect
      that in the list.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      a4288289
    • D
      nl80211: allow BSS data to include CLOCK_BOOTTIME timestamp · 6e19bc4b
      Dmitry Shmidt 提交于
      For location and connectivity services, userspace would often like
      to know the time when the BSS was last seen. The current "last seen"
      value is calculated in a way that makes it less useful, especially
      if the system suspended in the meantime.
      
      Add the ability for the driver to report a real CLOCK_BOOTTIME stamp
      that can then be reported to userspace (if present).
      
      Drivers wishing to use this must be converted to the new API to call
      cfg80211_inform_bss_data() or cfg80211_inform_bss_frame_data(). They
      need to ensure the reported value is accurate enough even when the
      frame might have been buffered in the device (e.g. firmware.)
      Signed-off-by: NDmitry Shmidt <dimitrysh@google.com>
      [modified to use struct, inlines]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      6e19bc4b
    • T
      Revert "mac80211: remove exposing 'mfp' to drivers" · 93f0490e
      Tamizh chelvam 提交于
      This reverts commit 5c48f120.
      
      Some device drivers (ath10k) offload part of aggregation including AddBA/DelBA
      negotiations to firmware. In such scenario, the PMF configuration of
      the station needs to be provided to driver to enable encryption of
      AddBA/DelBA action frames.
      Signed-off-by: NTamizh chelvam <c_traja@qti.qualcomm.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      93f0490e