1. 06 Apr, 2016 6 commits
    • J
      mac80211: enable collecting station statistics per-CPU · c9c5962b
      Committed by Johannes Berg
      If the driver advertises the new HW flag USE_RSS, make the
      station statistics on the fast-rx path per-CPU. This will
      enable calling the RX in parallel, only hitting locking or
      shared cachelines when the fast-RX path isn't available.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
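The per-CPU idea described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions: every name below (`SKETCH_NR_CPUS`, `struct sketch_rx_stats`, `sketch_rx_account`, `sketch_rx_total`) is hypothetical and does not reflect the actual mac80211 structures.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4 /* hypothetical fixed CPU count, for illustration */

/* Hypothetical RX counters; not the real mac80211 layout. */
struct sketch_rx_stats {
    uint64_t packets;
    uint64_t bytes;
};

/* Hypothetical station with one counter slot per CPU. */
struct sketch_sta {
    struct sketch_rx_stats pcpu[SKETCH_NR_CPUS];
};

/* Fast path: each CPU only touches its own slot, so no shared
 * cacheline or lock is involved. */
static void sketch_rx_account(struct sketch_sta *sta, int cpu, uint32_t len)
{
    sta->pcpu[cpu].packets++;
    sta->pcpu[cpu].bytes += len;
}

/* Slow path (e.g. a statistics query): combine all per-CPU slots. */
static struct sketch_rx_stats sketch_rx_total(const struct sketch_sta *sta)
{
    struct sketch_rx_stats total = {0, 0};

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
        total.packets += sta->pcpu[cpu].packets;
        total.bytes += sta->pcpu[cpu].bytes;
    }
    return total;
}
```

The cost moves from the hot RX path to the rare reader, which has to walk every slot.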
    • J
      mac80211: add fast-rx path · 49ddf8e6
      Committed by Johannes Berg
      The regular RX path has a lot of code, but with a few
      assumptions on the hardware it's possible to reduce the
      amount of code significantly. Currently the assumptions
      on the driver are the following:
       * hardware/driver reordering buffer (if supporting aggregation)
       * hardware/driver decryption & PN checking (if using encryption)
       * hardware/driver did de-duplication
       * hardware/driver did A-MSDU deaggregation
       * AP_LINK_PS is used (in AP mode)
       * no client powersave handling in mac80211 (in client mode)
      
      of which some are actually checked per packet:
       * de-duplication
       * PN checking
       * decryption
      and additionally packets must
       * not be A-MSDU (have been deaggregated by driver/device)
       * be data packets
       * not be fragmented
       * be unicast
       * have RFC 1042 header
      
      Additionally, we dynamically assume:
       * no encryption or CCMP/GCMP, TKIP/WEP/other not allowed
       * station must be authorized
       * 4-addr format not enabled
      
      Some data needed for the RX path is cached in a new per-station
      "fast_rx" structure, so that we only need to look at this and
      the packet, no other memory when processing packets on the fast
      RX path.
      
      After doing the above per-packet checks, the data path collapses
      down to a pretty simple conversion function taking advantage of
      the data cached in the small fast_rx struct.
      
      This should speed up the RX processing, and will make it easier
      to reason about parallelizing RX (for which statistics will need
      to be per-CPU still.)
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
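The per-packet and per-station conditions listed in this commit can be read as a single eligibility check. The following is a rough sketch only: the struct and field names are hypothetical stand-ins, not the real `fast_rx` structure or the real RX status flags.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-packet summary; field names are illustrative only. */
struct sketch_rx_pkt {
    bool is_data;
    bool is_unicast;
    bool is_fragmented;
    bool is_amsdu;
    bool has_rfc1042;
    bool hw_decrypted;  /* device already decrypted the frame */
    bool hw_pn_checked; /* device already validated the PN */
    bool hw_dedup;      /* device already dropped duplicates */
};

/* Hypothetical cached per-station state, standing in for "fast_rx". */
struct sketch_fast_rx {
    bool authorized;
    bool uses_4addr;
    bool encrypted; /* assumed to mean a CCMP/GCMP key is in use */
};

/* Returns true when the packet may take the fast path; otherwise the
 * caller would fall back to the regular RX path. */
static bool sketch_fast_rx_ok(const struct sketch_fast_rx *fast_rx,
                              const struct sketch_rx_pkt *pkt)
{
    /* Station-level conditions, read from the cached struct only. */
    if (!fast_rx || !fast_rx->authorized || fast_rx->uses_4addr)
        return false;

    /* Frame-format conditions. */
    if (!pkt->is_data || !pkt->is_unicast || pkt->is_fragmented ||
        pkt->is_amsdu || !pkt->has_rfc1042)
        return false;

    /* Work that must already have been done by hardware/driver. */
    if (!pkt->hw_dedup)
        return false;
    if (fast_rx->encrypted && !(pkt->hw_decrypted && pkt->hw_pn_checked))
        return false;

    return true;
}
```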
    • J
      mac80211: fix RX u64 stats consistency on 32-bit platforms · 0f9c5a61
      Committed by Johannes Berg
      On 32-bit platforms, the 64-bit counters we keep need to be protected
      to be consistently read. Use the u64_stats_sync mechanism to do that.
      
      In order to not end up with overly long lines, refactor the tidstats
      assignments a bit.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
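The consistency problem arises because a 64-bit counter is updated as two 32-bit halves on such platforms, so a reader may observe one new half and one old half. A sequence counter lets the reader detect this and retry. The sketch below illustrates that general technique with C11 atomics in userspace; it is not the kernel's `u64_stats_sync` implementation, all names are hypothetical, and a single writer is assumed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit counter stored as two 32-bit halves, guarded by
 * a sequence counter that is odd while a write is in progress. */
struct sketch_u64_stat {
    _Atomic uint32_t seq;
    _Atomic uint32_t lo;
    _Atomic uint32_t hi;
};

/* Writer side; assumes there is only one writer at a time. */
static void sketch_stat_add(struct sketch_u64_stat *s, uint64_t delta)
{
    uint64_t v = ((uint64_t)atomic_load(&s->hi) << 32) | atomic_load(&s->lo);

    v += delta;
    atomic_fetch_add(&s->seq, 1); /* seq becomes odd: write begins */
    atomic_store(&s->lo, (uint32_t)v);
    atomic_store(&s->hi, (uint32_t)(v >> 32));
    atomic_fetch_add(&s->seq, 1); /* seq becomes even: write done */
}

/* Reader side: retry until both halves were read within one stable,
 * even sequence number. */
static uint64_t sketch_stat_read(struct sketch_u64_stat *s)
{
    uint32_t start, lo, hi;

    do {
        start = atomic_load(&s->seq);
        lo = atomic_load(&s->lo);
        hi = atomic_load(&s->hi);
    } while ((start & 1u) || atomic_load(&s->seq) != start);

    return ((uint64_t)hi << 32) | lo;
}
```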
    • J
      mac80211: fix last RX rate data consistency · 4f6b1b3d
      Committed by Johannes Berg
      When storing the last_rate_* values in the RX code, there's nothing
      to guarantee consistency, so a concurrent reader could see, e.g.
      last_rate_idx on the new value, but last_rate_flag still on the old,
      getting completely bogus values in the end.
      
      To fix this, I lifted the sta_stats_encode_rate() function from my
      old rate statistics code, which encodes the entire rate data into a
      single 16-bit value, avoiding the consistency issue.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
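Packing all rate fields into one 16-bit value means a single store publishes them together, so a reader can never see a mix of old and new fields. The bit layout below is an assumption made for illustration; the commit does not describe the layout that `sta_stats_encode_rate()` actually uses, and the names here are hypothetical.

```c
#include <stdint.h>

/* Hypothetical rate description; not the real mac80211 fields. */
struct sketch_rate {
    uint8_t idx;   /* rate or MCS index, 8 bits */
    uint8_t flags; /* 4 bits used */
    uint8_t bw;    /* bandwidth code, 4 bits used */
};

/* Assumed layout: bits 0-7 idx, bits 8-11 flags, bits 12-15 bw. */
static uint16_t sketch_encode_rate(struct sketch_rate r)
{
    return (uint16_t)(r.idx | ((r.flags & 0xFu) << 8) | ((r.bw & 0xFu) << 12));
}

/* Inverse of sketch_encode_rate(). */
static struct sketch_rate sketch_decode_rate(uint16_t v)
{
    struct sketch_rate r;

    r.idx = (uint8_t)(v & 0xFFu);
    r.flags = (uint8_t)((v >> 8) & 0xFu);
    r.bw = (uint8_t)((v >> 12) & 0xFu);
    return r;
}
```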
    • J
      mac80211: add separate last_ack variable · b8da6b6a
      Committed by Johannes Berg
      Instead of touching the rx_stats.last_rx from the status path, introduce
      and use a status_stats.last_ack variable. This will make rx_stats.last_rx
      indicate when the last frame was received, making it available for real
      "last_rx" and statistics gathering; statistics, when done per-CPU, will
      need to figure out which place was updated last for those items where the
      "last" value is exposed.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • J
      mac80211: move averaged values out of rx_stats · 0be6ed13
      Committed by Johannes Berg
      Move the averaged values out of rx_stats and into rx_stats_avg,
      to cleanly split them out. The averaged ones cannot be supported
      for parallel RX in a per-CPU fashion, while the other values can
      be collected per CPU and then combined/selected when needed.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
  2. 05 Apr, 2016 4 commits
  3. 24 Feb, 2016 2 commits
  4. 04 Dec, 2015 1 commit
  5. 21 Oct, 2015 3 commits
  6. 15 Oct, 2015 1 commit
  7. 22 Sep, 2015 1 commit
  8. 14 Aug, 2015 1 commit
    • J
      mac80211: use DECLARE_EWMA · 40d9a38a
      Committed by Johannes Berg
      Instead of using the out-of-line average calculation, use the new
      DECLARE_EWMA() macro to declare a signal EWMA, and use that.
      
      This actually *reduces* the code size slightly (on x86-64) while
      also reducing the station info size by 80 bytes.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
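For readers unfamiliar with the technique, an exponentially weighted moving average can be kept in fixed point with shifts only. The sketch below shows the general idea; it is not the expansion of the kernel's `DECLARE_EWMA()` macro, and the names and the chosen precision and weight are assumptions.

```c
#include <stdint.h>

#define SKETCH_PRECISION_SHIFT 10 /* assumed: 10 fractional bits */
#define SKETCH_WEIGHT_SHIFT 3     /* assumed: each new sample weighs 1/8 */

/* Hypothetical EWMA state; 0 is treated as "no sample yet". */
struct sketch_ewma {
    uint64_t internal;
};

/* Fold a new sample in: avg = avg * 7/8 + sample * 1/8. */
static void sketch_ewma_add(struct sketch_ewma *e, uint64_t val)
{
    uint64_t scaled = val << SKETCH_PRECISION_SHIFT;

    if (e->internal == 0)
        e->internal = scaled; /* first sample seeds the average */
    else
        e->internal = ((e->internal << SKETCH_WEIGHT_SHIFT) -
                       e->internal + scaled) >> SKETCH_WEIGHT_SHIFT;
}

/* Return the average with the fractional bits dropped. */
static uint64_t sketch_ewma_read(const struct sketch_ewma *e)
{
    return e->internal >> SKETCH_PRECISION_SHIFT;
}
```

Keeping the weight a power of two is what allows the division to become a shift.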
  9. 17 Jul, 2015 8 commits
  10. 10 Jun, 2015 1 commit
  11. 06 May, 2015 1 commit
  12. 05 May, 2015 1 commit
  13. 22 Apr, 2015 2 commits
    • J
      mac80211: extend fast-xmit for more ciphers · e495c247
      Committed by Johannes Berg
      When crypto is offloaded then in some cases it's all handled
      by the device, and in others only some space for the IV must
      be reserved in the frame. Handle both of these cases in the
      fast-xmit path, up to a limit of 18 bytes of space for IVs.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • J
      mac80211: add TX fastpath · 17c18bf8
      Committed by Johannes Berg
      In order to speed up mac80211's TX path, add the "fast-xmit" cache
      that will cache the data frame 802.11 header and other data to be
      able to build the frame more quickly. This cache is rebuilt when
      external triggers imply changes, but a lot of the checks done per
      packet today are simplified away to the check for the cache.
      
      There's also a more detailed description in the code.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
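The caching idea can be sketched as follows: a prebuilt header is copied in front of each payload, and the many per-packet checks reduce to one validity test on the cache. This is an illustration only; the struct, its size limit, and the function names are hypothetical and do not mirror the real fast-xmit code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_HDR_MAX 36 /* hypothetical upper bound on cached header size */

/* Hypothetical per-station TX cache. */
struct sketch_fast_tx {
    bool valid; /* cleared whenever an external change invalidates it */
    uint8_t hdr[SKETCH_HDR_MAX];
    size_t hdr_len;
};

/* Called by external triggers (key change, state change, ...). */
static void sketch_fast_tx_invalidate(struct sketch_fast_tx *c)
{
    c->valid = false;
}

/* Build a frame from the cached header plus payload. Returns the frame
 * length, or 0 if the cache cannot be used and the caller must take
 * the regular (slow) TX path. */
static size_t sketch_fast_xmit(const struct sketch_fast_tx *c,
                               const uint8_t *payload, size_t payload_len,
                               uint8_t *out, size_t out_cap)
{
    if (!c->valid || c->hdr_len + payload_len > out_cap)
        return 0;

    memcpy(out, c->hdr, c->hdr_len);
    memcpy(out + c->hdr_len, payload, payload_len);
    return c->hdr_len + payload_len;
}
```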
  14. 20 Apr, 2015 2 commits
    • J
      mac80211: lock rate control · 35c347ac
      Committed by Johannes Berg
      Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
      control aren't properly taking concurrency into account. It's
      likely that the same is true for other rate control algorithms.
      
      In the case of minstrel this manifests itself in crashes when an
      update and other data access are run concurrently, for example
      when the stations change bandwidth or similar. In iwlwifi, this
      can cause firmware crashes.
      
      Since fixing all rate control algorithms will be very difficult,
      just provide locking for invocations. This protects the internal
      data structures the algorithms maintain.
      
      I've manipulated hostapd to test this, by having it change its
      advertised bandwidth roughly every 150ms. At the same time, I'm
      running a flood ping between the client and the AP, which causes
      this race of update vs. get_rate/status to easily happen on the
      client. With this change, the system survives this test.
      Reported-by: Sven Eckelmann <sven@open-mesh.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    • B
      mac80211: introduce plink lock for plink fields · 48bf6bed
      Committed by Bob Copeland
      The mesh plink code uses sta->lock to serialize access to the
      plink state fields between the peer link state machine and the
      peer link timer.  Some paths (e.g. those involving
      mps_qos_null_tx()) unfortunately hold this spinlock across
      frame tx, which is soon to be disallowed.  Add a new spinlock
      just for plink access.
      Signed-off-by: Bob Copeland <me@bobcopeland.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
  15. 02 Apr, 2015 1 commit
    • F
      mac80211: add an intermediate software queue implementation · ba8c3d6f
      Committed by Felix Fietkau
      This allows drivers to request per-vif and per-sta-tid queues from which
      they can pull frames. This makes it easier to keep the hardware queues
      short, and to improve fairness between clients and vifs.
      
      The task of scheduling packet transmission is left up to the driver -
      queueing is controlled by mac80211. Drivers can only dequeue packets by
      calling ieee80211_tx_dequeue. This makes it possible to add active queue
      management later without changing drivers using this code.
      
      This can also be used as a starting point to implement A-MSDU
      aggregation in a way that does not add artificially induced latency.
      Signed-off-by: Felix Fietkau <nbd@openwrt.org>
      [resolved minor context conflict, minor changes, endian annotations]
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
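The pull model described here, where the stack owns the queue and the driver takes frames only through a dequeue call, can be sketched with a small ring buffer. The names and the queue size below are hypothetical; the real interface is `ieee80211_tx_dequeue`, whose signature is not shown in this log and is not reproduced here.

```c
#include <stdbool.h>

#define SKETCH_QLEN 8 /* hypothetical capacity; power of two so the
                         free-running indices wrap correctly */

/* Hypothetical per-station, per-TID queue holding frame identifiers. */
struct sketch_txq {
    int frames[SKETCH_QLEN];
    unsigned int head; /* next slot to dequeue from */
    unsigned int tail; /* next slot to enqueue into */
};

/* Stack side: queue a frame; fails when the queue is full. */
static bool sketch_txq_enqueue(struct sketch_txq *q, int frame)
{
    if (q->tail - q->head == SKETCH_QLEN)
        return false;
    q->frames[q->tail++ % SKETCH_QLEN] = frame;
    return true;
}

/* Driver side: pull the next frame when the hardware has room.
 * Returns false when nothing is queued. */
static bool sketch_txq_dequeue(struct sketch_txq *q, int *frame)
{
    if (q->head == q->tail)
        return false;
    *frame = q->frames[q->head++ % SKETCH_QLEN];
    return true;
}
```

Because the driver never sees the queue internals, the queueing policy behind the dequeue call can change later without touching drivers, which is the point the commit message makes about active queue management.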
  16. 01 Apr, 2015 2 commits
    • J
      mac80211: fix RX A-MPDU session reorder timer deletion · 788211d8
      Committed by Johannes Berg
      There's an issue with the way the RX A-MPDU reorder timer is
      deleted that can cause a kernel crash like this:
      
       * tid_rx is removed - call_rcu(ieee80211_free_tid_rx)
       * station is destroyed
       * reorder timer fires before ieee80211_free_tid_rx() runs,
         accessing the station, thus potentially crashing due to
         the use-after-free
      
      The station deletion is protected by synchronize_net(), but
      that isn't enough -- ieee80211_free_tid_rx() need not have
      run when that returns (it deletes the timer.) We could use
      rcu_barrier() instead of synchronize_net(), but that's much
      more expensive.
      
      Instead, to fix this, add a field tracking that the session
      is being deleted. In this case, the only re-arming of the
      timer happens with the reorder spinlock held, so make that
      code not rearm it if the session is being deleted and also
      delete the timer after setting that field. This ensures the
      timer cannot fire after ___ieee80211_stop_rx_ba_session()
      returns, which fixes the problem.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
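The shape of the fix, a "being removed" flag that the re-arm path checks, can be sketched in a single-threaded model. Locking and the real timer API are deliberately left out and only described in comments; all names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical RX aggregation session state. */
struct sketch_tid_rx {
    bool removed;     /* set once teardown has started */
    bool timer_armed; /* models whether the reorder timer is pending */
};

/* Re-arm path. In the real code this runs with the reorder spinlock
 * held; the lock is omitted in this model. */
static void sketch_reorder_rearm(struct sketch_tid_rx *t)
{
    if (t->removed)
        return; /* session is going away: never re-arm */
    t->timer_armed = true;
}

/* Teardown: set the flag first, then delete the timer. After this
 * returns, no path can make the timer pending again. */
static void sketch_stop_session(struct sketch_tid_rx *t)
{
    t->removed = true;      /* would be done under the reorder lock */
    t->timer_armed = false; /* stands in for a synchronous timer delete */
}
```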
    • J
      mac80211: use rhashtable for station table · 7bedd0cf
      Committed by Johannes Berg
      We currently have a hand-rolled table with 256 entries and are
      using the last byte of the MAC address as the hash. This hash
      is obviously very fast, but collisions are easily created and
      we waste a lot of space in the common case of just connecting
      as a client to an AP where we just have a single station. The
      other common case of an AP is also suboptimal due to the size
      of the hash table and the ease of causing collisions.
      
      Convert all of this to use rhashtable with jhash, which gives
      us the advantage of a far better hash function (with random
      perturbation to avoid hash collision attacks) and of course
      that the hash table grows and shrinks dynamically with chain
      length, improving both cases above.
      
      Use a specialised hash function (using jhash, but with fixed
      length) to achieve better compiler optimisation as suggested
      by Sergey Ryazanov.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
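The weakness of the old hash is easy to demonstrate: any two MAC addresses sharing a last byte collide. The sketch below contrasts it with a seeded hash over all six bytes. Note that it uses FNV-1a as a simple stand-in, not the kernel's jhash, and the function names are hypothetical; the fixed-length loop echoes the commit's point about giving the compiler a constant length.

```c
#include <stdint.h>

/* The old scheme as described above: last byte of the MAC address. */
static uint32_t sketch_hash_last_byte(const uint8_t mac[6])
{
    return mac[5];
}

/* Stand-in for a better hash: 32-bit FNV-1a over all six bytes, with a
 * seed mixed into the initial state to model random perturbation.
 * The constant trip count lets the compiler unroll the loop. */
static uint32_t sketch_hash_full(const uint8_t mac[6], uint32_t seed)
{
    uint32_t h = 2166136261u ^ seed; /* FNV offset basis, perturbed */

    for (int i = 0; i < 6; i++) {
        h ^= mac[i];
        h *= 16777619u; /* FNV prime */
    }
    return h;
}
```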
  17. 01 Mar, 2015 1 commit
    • J
      mac80211: remove TX latency measurement code · abfbc3af
      Committed by Johannes Berg
      Revert commit ad38bfc9 ("mac80211: Tx frame latency statistics")
      (along with some follow-up fixes).
      
      This code turned out not to be as useful in the current form as we
      thought, and we've internally hacked it up more, but that's not
      very suitable for upstream (for now), and we might just do that
      with tracing instead.
      
      Therefore, for now at least, remove this code. We might also need
      to use the skb->tstamp field for the TCP performance issue, which
      is more important than the debugging.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
  18. 08 Jan, 2015 1 commit
  19. 20 Nov, 2014 1 commit