1. 11 8月, 2016 1 次提交
  2. 09 6月, 2016 2 次提交
    • M
      mac80211: implement fair queueing per txq · fa962b92
      Michal Kazior 提交于
      mac80211's software queues were designed to work
      very closely with device tx queues. They are
      required to make use of 802.11 packet aggregation
      easily and efficiently.
      
      Due to the way 802.11 aggregation is designed it
      only makes sense to keep fair queuing as close to
      hardware as possible to reduce induced latency and
      inertia and provide the best flow responsiveness.
      
      This change doesn't translate directly to
      immediate and significant gains. End result
      depends on driver's induced latency. Best results
      can be achieved if driver keeps its own tx
      queue/fifo fill level to a minimum.
      Signed-off-by: NMichal Kazior <michal.kazior@tieto.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      fa962b92
    • M
      mac80211: skip netdev queue control with software queuing · 80a83cfc
      Michal Kazior 提交于
      Qdiscs are designed with no regard to 802.11
      aggregation requirements and hand out
      packet-by-packet with no guarantee they are
      destined to the same tid. This does more bad than
      good no matter how fairly a given qdisc may behave
      on an ethernet interface.
      
      Software queuing used per-AC netdev subqueue
      congestion control whenever a global AC limit was
      hit. This meant in practice a single station or
      tid queue could starve others rather easily. This
      could resonate with qdiscs in a bad way or could
      just end up with poor aggregation performance.
      Increasing the AC limit would increase induced
      latency which is also bad.
      
      Disabling qdiscs by default and performing
      taildrop instead of netdev subqueue congestion
      control on the other hand makes it possible for
      tid queues to fill up "in the meantime" while
      preventing stations starving each other.
      
      This increases aggregation opportunities and
      should allow software queuing based drivers
      achieve better performance by utilizing airtime
      more efficiently with big aggregates.
      Signed-off-by: NMichal Kazior <michal.kazior@tieto.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      80a83cfc
  3. 06 4月, 2016 8 次提交
    • F
      mac80211: add A-MSDU tx support · 6e0456b5
      Felix Fietkau 提交于
      Requires software tx queueing and fast-xmit support. For good
      performance, drivers need frag_list support as well. This avoids the
      need for copying data of aggregated frames. Running without it is only
      supported for debugging purposes.
      
      To avoid performance and packet size issues, the rate control module or
      driver needs to limit the maximum A-MSDU size by setting
      max_rc_amsdu_len in struct ieee80211_sta.
      Signed-off-by: NFelix Fietkau <nbd@openwrt.org>
      [fix locking issue]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      6e0456b5
    • J
      mac80211: enable collecting station statistics per-CPU · c9c5962b
      Johannes Berg 提交于
      If the driver advertises the new HW flag USE_RSS, make the
      station statistics on the fast-rx path per-CPU. This will
      enable calling the RX in parallel, only hitting locking or
      shared cachelines when the fast-RX path isn't available.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      c9c5962b
    • J
      mac80211: add fast-rx path · 49ddf8e6
      Johannes Berg 提交于
      The regular RX path has a lot of code, but with a few
      assumptions on the hardware it's possible to reduce the
      amount of code significantly. Currently the assumptions
      on the driver are the following:
       * hardware/driver reordering buffer (if supporting aggregation)
       * hardware/driver decryption & PN checking (if using encryption)
       * hardware/driver did de-duplication
       * hardware/driver did A-MSDU deaggregation
       * AP_LINK_PS is used (in AP mode)
       * no client powersave handling in mac80211 (in client mode)
      
      of which some are actually checked per packet:
       * de-duplication
       * PN checking
       * decryption
      and additionally packets must
       * not be A-MSDU (have been deaggregated by driver/device)
       * be data packets
       * not be fragmented
       * be unicast
       * have RFC 1042 header
      
      Additionally dynamically we assume:
       * no encryption or CCMP/GCMP, TKIP/WEP/other not allowed
       * station must be authorized
       * 4-addr format not enabled
      
      Some data needed for the RX path is cached in a new per-station
      "fast_rx" structure, so that we only need to look at this and
      the packet, no other memory when processing packets on the fast
      RX path.
      
      After doing the above per-packet checks, the data path collapses
      down to a pretty simple conversion function taking advantage of
      the data cached in the small fast_rx struct.
      
      This should speed up the RX processing, and will make it easier
      to reason about parallelizing RX (for which statistics will need
      to be per-CPU still.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      49ddf8e6
    • J
      mac80211: fix RX u64 stats consistency on 32-bit platforms · 0f9c5a61
      Johannes Berg 提交于
      On 32-bit platforms, the 64-bit counters we keep need to be protected
      to be consistently read. Use the u64_stats_sync mechanism to do that.
      
      In order to not end up with overly long lines, refactor the tidstats
      assignments a bit.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      0f9c5a61
    • J
      mac80211: fix last RX rate data consistency · 4f6b1b3d
      Johannes Berg 提交于
      When storing the last_rate_* values in the RX code, there's nothing
      to guarantee consistency, so a concurrent reader could see, e.g.
      last_rate_idx on the new value, but last_rate_flag still on the old,
      getting completely bogus values in the end.
      
      To fix this, I lifted the sta_stats_encode_rate() function from my
      old rate statistics code, which encodes the entire rate data into a
      single 16-bit value, avoiding the consistency issue.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      4f6b1b3d
    • J
      mac80211: add separate last_ack variable · b8da6b6a
      Johannes Berg 提交于
      Instead of touching the rx_stats.last_rx from the status path, introduce
      and use a status_stats.last_ack variable. This will make rx_stats.last_rx
      indicate when the last frame was received, making it available for real
      "last_rx" and statistics gathering; statistics, when done per-CPU, will
      need to figure out which place was updated last for those items where the
      "last" value is exposed.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      b8da6b6a
    • J
      mac80211: move averaged values out of rx_stats · 0be6ed13
      Johannes Berg 提交于
      Move the averaged values out of rx_stats and into rx_stats_avg,
      to cleanly split them out. The averaged ones cannot be supported
      for parallel RX in a per-CPU fashion, while the other values can
      be collected per CPU and then combined/selected when needed.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      0be6ed13
    • A
      mac80211: track and tell driver about GO client P2P PS abilities · 52cfa1d6
      Ayala Beker 提交于
      Legacy clients don't support P2P power save mechanism, and thus if a P2P GO
      has a legacy client connected to it, it should disable P2P PS mechanisms.
      Let the driver know about this with a new bss_conf parameter.
      Signed-off-by: NAyala Beker <ayala.beker@intel.com>
      Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      52cfa1d6
  4. 05 4月, 2016 3 次提交
  5. 24 2月, 2016 4 次提交
  6. 14 1月, 2016 1 次提交
    • E
      mac80211: fix PS-Poll handling · 1a57081a
      Emmanuel Grumbach 提交于
      My commit below broken PS-Poll handling. In case the driver
      has no frames buffered, driver_release_tids will be 0, but
      calling find_highest_prio_tid() with 0 as a parameter is
      not a good idea:
      fls(0) - 1 = -1.
      This bug caused mac80211 to think that frames were buffered
      in the driver which in turn was confused because mac80211
      was asking to release frames that were not reported to
      exist.
      On iwlwifi, this led to the WARNING below:
      
      WARNING: CPU: 0 PID: 11230 at drivers/net/wireless/intel/iwlwifi/mvm/sta.c:1733 iwl_mvm_sta_modify_sleep_tx_count+0x2af/0x320 [iwlmvm]()
      ffffffffc0627c60 ffff8800069b7648 ffffffff81888913 0000000000000000
      0000000000000000 ffff8800069b7688 ffffffff81089d6a ffff8800069b7678
      0000000000000001 ffff88003b35abf0 ffff88000698b128 ffff8800069b76d4
      Call Trace:
      [<ffffffff81888913>] dump_stack+0x4c/0x65
      [<ffffffff81089d6a>] warn_slowpath_common+0x8a/0xc0
      [<ffffffff81089e5a>] warn_slowpath_null+0x1a/0x20
      [<ffffffffc05f36bf>] iwl_mvm_sta_modify_sleep_tx_count+0x2af/0x320 [iwlmvm]
      [<ffffffffc05dae41>] iwl_mvm_mac_release_buffered_frames+0x31/0x40 [iwlmvm]
      [<ffffffffc045d8b6>] ieee80211_sta_ps_deliver_response+0x6e6/0xd80 [mac80211]
      [<ffffffffc0461296>] ieee80211_sta_ps_deliver_poll_response+0x26/0x30 [mac80211]
      [<ffffffffc048f743>] ieee80211_rx_handlers+0xa83/0x2900 [mac80211]
      [<ffffffffc04917ad>] ieee80211_prepare_and_rx_handle+0x1ed/0xa70 [mac80211]
      [<ffffffffc045e3d5>] ? sta_info_get_bss+0x5/0x4a0 [mac80211]
      [<ffffffffc04925b6>] ieee80211_rx_napi+0x586/0xcd0 [mac80211]
      [<ffffffffc05eaa3e>] iwl_mvm_rx_rx_mpdu+0x59e/0xc60 [iwlmvm]
      
      Fixes: 0ead2510 ("mac80211: allow the driver to send EOSP when needed")
      Signed-off-by: NEmmanuel Grumbach <emmanuel.grumbach@intel.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      1a57081a
  7. 04 12月, 2015 3 次提交
  8. 21 10月, 2015 2 次提交
    • J
      mac80211: move station statistics into sub-structs · e5a9f8d0
      Johannes Berg 提交于
      Group station statistics by where they're (mostly) updated
      (TX, RX and TX-status) and group them into sub-structs of
      the struct sta_info.
      
      Also rename the variables since the grouping now makes it
      obvious where they belong.
      
      This makes it easier to identify where the statistics are
      updated in the code, and thus easier to think about them.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      e5a9f8d0
    • J
      mac80211: move beacon_loss_count into ifmgd · 976bd9ef
      Johannes Berg 提交于
      There's little point in keeping (and even sending to userspace)
      the beacon_loss_count value per station, since it can only apply
      to the AP on a managed-mode connection. Move the value to ifmgd,
      advertise it only in managed mode, and remove it from ethtool as
      it's available through better interfaces.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      976bd9ef
  9. 15 10月, 2015 1 次提交
  10. 05 10月, 2015 1 次提交
    • A
      mac80211: use ktime_get_seconds · 84b00607
      Arnd Bergmann 提交于
      The mac80211 code uses ktime_get_ts to measure the connected time.
      As this uses monotonic time, it is y2038 safe on 32-bit systems,
      but we still want to deprecate the use of 'timespec' because most
      other users are broken.
      
      This changes the code to use ktime_get_seconds() instead, which
      avoids the timespec structure and is slightly more efficient.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: linux-wireless@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84b00607
  11. 14 8月, 2015 1 次提交
    • J
      mac80211: use DECLARE_EWMA · 40d9a38a
      Johannes Berg 提交于
      Instead of using the out-of-line average calculation, use the new
      DECLARE_EWMA() macro to declare a signal EWMA, and use that.
      
      This actually *reduces* the code size slightly (on x86-64) while
      also reducing the station info size by 80 bytes.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      40d9a38a
  12. 17 7月, 2015 3 次提交
  13. 10 6月, 2015 1 次提交
    • J
      mac80211: convert HW flags to unsigned long bitmap · 30686bf7
      Johannes Berg 提交于
      As we're running out of hardware capability flags pretty quickly,
      convert them to use the regular test_bit() style unsigned long
      bitmaps.
      
      This introduces a number of helper functions/macros to set and to
      test the bits, along with new debugfs code.
      
      The occurrences of an explicit __clear_bit() are intentional, the
      drivers were never supposed to change their supported bits on the
      fly. We should investigate changing this to be a per-frame flag.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      30686bf7
  14. 24 4月, 2015 1 次提交
  15. 23 4月, 2015 2 次提交
  16. 22 4月, 2015 1 次提交
    • J
      mac80211: add TX fastpath · 17c18bf8
      Johannes Berg 提交于
      In order to speed up mac80211's TX path, add the "fast-xmit" cache
      that will cache the data frame 802.11 header and other data to be
      able to build the frame more quickly. This cache is rebuilt when
      external triggers imply changes, but a lot of the checks done per
      packet today are simplified away to the check for the cache.
      
      There's also a more detailed description in the code.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      17c18bf8
  17. 20 4月, 2015 2 次提交
    • J
      mac80211: lock rate control · 35c347ac
      Johannes Berg 提交于
      Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
      control aren't properly taking concurrency into account. It's
      likely that the same is true for other rate control algorithms.
      
      In the case of minstrel this manifests itself in crashes when an
      update and other data access are run concurrently, for example
      when the stations change bandwidth or similar. In iwlwifi, this
      can cause firmware crashes.
      
      Since fixing all rate control algorithms will be very difficult,
      just provide locking for invocations. This protects the internal
      data structures the algorithms maintain.
      
      I've manipulated hostapd to test this, by having it change its
      advertised bandwidth roughly ever 150ms. At the same time, I'm
      running a flood ping between the client and the AP, which causes
      this race of update vs. get_rate/status to easily happen on the
      client. With this change, the system survives this test.
      Reported-by: NSven Eckelmann <sven@open-mesh.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      35c347ac
    • B
      mac80211: introduce plink lock for plink fields · 48bf6bed
      Bob Copeland 提交于
      The mesh plink code uses sta->lock to serialize access to the
      plink state fields between the peer link state machine and the
      peer link timer.  Some paths (e.g. those involving
      mps_qos_null_tx()) unfortunately hold this spinlock across
      frame tx, which is soon to be disallowed.  Add a new spinlock
      just for plink access.
      Signed-off-by: NBob Copeland <me@bobcopeland.com>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      48bf6bed
  18. 02 4月, 2015 1 次提交
    • F
      mac80211: add an intermediate software queue implementation · ba8c3d6f
      Felix Fietkau 提交于
      This allows drivers to request per-vif and per-sta-tid queues from which
      they can pull frames. This makes it easier to keep the hardware queues
      short, and to improve fairness between clients and vifs.
      
      The task of scheduling packet transmission is left up to the driver -
      queueing is controlled by mac80211. Drivers can only dequeue packets by
      calling ieee80211_tx_dequeue. This makes it possible to add active queue
      management later without changing drivers using this code.
      
      This can also be used as a starting point to implement A-MSDU
      aggregation in a way that does not add artificially induced latency.
      Signed-off-by: NFelix Fietkau <nbd@openwrt.org>
      [resolved minor context conflict, minor changes, endian annotations]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      ba8c3d6f
  19. 01 4月, 2015 1 次提交
    • J
      mac80211: use rhashtable for station table · 7bedd0cf
      Johannes Berg 提交于
      We currently have a hand-rolled table with 256 entries and are
      using the last byte of the MAC address as the hash. This hash
      is obviously very fast, but collisions are easily created and
      we waste a lot of space in the common case of just connecting
      as a client to an AP where we just have a single station. The
      other common case of an AP is also suboptimal due to the size
      of the hash table and the ease of causing collisions.
      
      Convert all of this to use rhashtable with jhash, which gives
      us the advantage of a far better hash function (with random
      perturbation to avoid hash collision attacks) and of course
      that the hash table grows and shrinks dynamically with chain
      length, improving both cases above.
      
      Use a specialised hash function (using jhash, but with fixed
      length) to achieve better compiler optimisation as suggested
      by Sergey Ryazanov.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      7bedd0cf
  20. 20 3月, 2015 1 次提交