  1. 06 Oct, 2015 2 commits
  2. 17 Sep, 2015 1 commit
  3. 09 Sep, 2015 1 commit
  4. 26 Aug, 2015 1 commit
  5. 30 Jul, 2015 1 commit
    • ath10k: initialize msdu ext. descriptor before use · ae7d3821
      Committed by Peter Oh
      Initial QCA99X0 support has a known issue with TCP Tx throughput.
      All other paths, such as UDP Tx/Rx and TCP Rx, meet expectations
      (> 900Mbps), but TCP Tx drops as low as 5Mbps when a single pair is
      used with iperf.
      
      The root cause turned out to be that the TSO flags are not initialized
      properly, so the firmware configures TSO the wrong way.
      The TSO flags in the msdu extension descriptor must be reset to tell
      the firmware that TSO is not enabled; otherwise it may act as if TSO
      is enabled, which causes a huge throughput drop.
      
      In fact, resetting only the TSO flags is enough to prevent the
      unexpected behavior, but initializing the whole msdu ext. descriptor
      also guards against uncertainty the firmware could introduce, as it
      is constantly updated.
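
      The fix described above can be sketched roughly as follows. This is a
      minimal illustration, not the driver's code: `struct msdu_ext_desc_sketch`,
      its fields, and `fill_ext_desc()` are hypothetical stand-ins for the real
      descriptor layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the MSDU extension descriptor. */
struct msdu_ext_desc_sketch {
    uint32_t tso_flags[3];  /* assumed TSO flag words read by the firmware */
    uint32_t frag_paddr;    /* assumed fragment DMA address */
    uint32_t frag_len;      /* assumed fragment length */
};

/* Zero the whole descriptor first, then fill in only the fields in use,
 * so stale TSO flags can never reach the firmware. */
static void fill_ext_desc(struct msdu_ext_desc_sketch *desc,
                          uint32_t paddr, uint32_t len)
{
    assert(desc != NULL);
    memset(desc, 0, sizeof(*desc));
    desc->frag_paddr = paddr;
    desc->frag_len = len;
}
```

      Clearing the whole structure rather than just the TSO words means any
      field added to the descriptor later also starts from a known state.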
      Signed-off-by: Peter Oh <poh@qca.qualcomm.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
  6. 29 Jul, 2015 2 commits
    • ath10k: enable raw encap mode and software crypto engine · ccec9038
      Committed by David Liu
      This patch enables raw Rx/Tx encap mode to support software based
      crypto engine. This patch introduces a new module param 'cryptmode'.
      
       cryptmode:
      
         0: Use hardware crypto engine globally with native Wi-Fi mode TX/RX
            encapsulation to the firmware. This is the default mode.
         1: Use software crypto engine globally with raw mode TX/RX
            encapsulation to the firmware.
      
      Known limitation:
         A-MSDU must be disabled for RAW Tx encap mode to perform well when
         heavy traffic is applied.
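
      As a rough illustration of the two cryptmode values and the A-MSDU
      limitation, the mapping could be modelled as below. All enum and
      function names here are hypothetical and chosen for this sketch; they
      are not taken from the driver.

```c
#include <assert.h>

/* Hypothetical names modelled on the two cryptmode values described above. */
enum crypt_mode_sketch {
    CRYPT_MODE_HW = 0,  /* hardware crypto, native Wi-Fi encap (default) */
    CRYPT_MODE_SW = 1   /* software crypto, raw encap */
};

enum encap_mode_sketch {
    ENCAP_NATIVE_WIFI,
    ENCAP_RAW
};

/* Pick the TX/RX encapsulation implied by the cryptmode value. */
static enum encap_mode_sketch encap_for_cryptmode(unsigned int cryptmode)
{
    assert(cryptmode <= (unsigned int)CRYPT_MODE_SW); /* only 0 and 1 exist */
    return cryptmode == (unsigned int)CRYPT_MODE_SW ? ENCAP_RAW
                                                    : ENCAP_NATIVE_WIFI;
}

/* A-MSDU is treated as usable only outside raw TX, per the limitation. */
static int amsdu_allowed(enum encap_mode_sketch encap)
{
    return encap != ENCAP_RAW;
}
```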
      
      Testing: (by Michal Kazior <michal.kazior@tieto.com>)
      
           a) Performance Testing
      
            cryptmode=1
             ap=qca988x sta=killer1525
              killer1525  ->  qca988x     194.496 mbps [tcp1 ip4]
              killer1525  ->  qca988x     238.309 mbps [tcp5 ip4]
              killer1525  ->  qca988x     266.958 mbps [udp1 ip4]
              killer1525  ->  qca988x     477.468 mbps [udp5 ip4]
              qca988x     ->  killer1525  301.378 mbps [tcp1 ip4]
              qca988x     ->  killer1525  297.949 mbps [tcp5 ip4]
              qca988x     ->  killer1525  331.351 mbps [udp1 ip4]
              qca988x     ->  killer1525  371.528 mbps [udp5 ip4]
             ap=killer1525 sta=qca988x
              qca988x     ->  killer1525  331.447 mbps [tcp1 ip4]
              qca988x     ->  killer1525  328.783 mbps [tcp5 ip4]
              qca988x     ->  killer1525  375.309 mbps [udp1 ip4]
              qca988x     ->  killer1525  403.379 mbps [udp5 ip4]
              killer1525  ->  qca988x     203.689 mbps [tcp1 ip4]
              killer1525  ->  qca988x     222.339 mbps [tcp5 ip4]
              killer1525  ->  qca988x     264.199 mbps [udp1 ip4]
              killer1525  ->  qca988x     479.371 mbps [udp5 ip4]
      
            Note:
             - only open network tested for RAW vs nwifi performance comparison
             - killer1525 (qca6174 hw2.2) is 2x2 device (hence max 866mbps)
             - used iperf
             - OTA, devices a few cm apart from each other, no shielding
             - tcpX/udpX, X - means number of threads used
      
            Overview:
             - relative Tx performance drop is seen but is within reasonable and
               expected threshold (A-MSDU must be disabled with RAW Tx)
      
           b) Connectivity Testing
      
            cryptmode=1
             ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta          OK
             ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta          OK
             ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta          OK
             ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta          OK
             ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta          OK
             ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta          OK
             ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta          OK
             ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta          OK
             ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta2br       OK
             ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta2br       OK
             ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta2br       OK
             ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta2br       OK
             ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta2br       OK
             ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta2br       OK
             ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta2br       OK
             ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta2br       OK
             ap=iwl6205 sta1=qca988x crypto=open     topology-1ap1sta2br1vlan  OK
             ap=iwl6205 sta1=qca988x crypto=wep1     topology-1ap1sta2br1vlan  OK
             ap=iwl6205 sta1=qca988x crypto=wpa      topology-1ap1sta2br1vlan  OK
             ap=iwl6205 sta1=qca988x crypto=wpa-ccmp topology-1ap1sta2br1vlan  OK
             ap=qca988x sta1=iwl6205 crypto=open     topology-1ap1sta2br1vlan  OK
             ap=qca988x sta1=iwl6205 crypto=wep1     topology-1ap1sta2br1vlan  OK
             ap=qca988x sta1=iwl6205 crypto=wpa      topology-1ap1sta2br1vlan  OK
             ap=qca988x sta1=iwl6205 crypto=wpa-ccmp topology-1ap1sta2br1vlan  OK
      
            Note:
             - each test takes all possible endpoint pairs and pings
             - each pair-ping flushes arp table
             - ip6 is used
      
           c) Testbed Topology:
      
            1ap1sta:
              [ap] ---- [sta]
      
              endpoints: ap, sta
      
            1ap1sta2br:
              [veth0] [ap] ---- [sta] [veth2]
                 |     |          |     |
              [veth1]  |          \   [veth3]
                  \   /            \  /
                  [br0]            [br1]
      
              endpoints: veth0, veth2, br0, br1
              note: STA works in 4addr mode, AP has wds_sta=1
      
            1ap1sta2br1vlan:
              [veth0] [ap] ---- [sta] [veth2]
                 |     |          |     |
              [veth1]  |          \   [veth3]
                  \   /            \  /
                [br0]              [br1]
                  |                  |
                [vlan0_id2]        [vlan1_id2]
      
              endpoints: vlan0_id2, vlan1_id2
              note: STA works in 4addr mode, AP has wds_sta=1
      
      Credits:
      
          Thanks to Michal Kazior <michal.kazior@tieto.com> who helped find the
          amsdu issue, contributed a workaround (already squashed into this
          patch), and contributed the throughput and connectivity test results.
      Signed-off-by: David Liu <cfliu.tw@gmail.com>
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Tested-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
    • ath10k: Improve performance by reducing tx_lock contention · 005fb161
      Committed by Qi Zhou
      During tx completion, tx_lock is held for longer than required, preventing
      efficient refill of htt->pending_tx. Refactor the code so that only MSDU
      related operations are protected by the lock.
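
      The refactoring idea can be sketched in plain C as below. This is only a
      model under stated assumptions: a flag replaces the real spinlock so the
      amount of work done under the lock can be counted, and the struct, the
      three completion steps, and the function names are all hypothetical.

```c
#include <assert.h>

/* Hypothetical model: 'locked' stands in for tx_lock being held. */
struct tx_state_sketch {
    int locked;             /* 1 while the simulated lock is held */
    int work_under_lock;    /* steps executed with the lock held */
    int work_outside_lock;  /* steps executed with the lock released */
    int num_pending_tx;     /* models the pending tx counter */
};

/* Record one unit of completion work and where it ran. */
static void step(struct tx_state_sketch *s)
{
    if (s->locked)
        s->work_under_lock++;
    else
        s->work_outside_lock++;
}

/* Before: every completion step runs with the lock held. */
static void tx_complete_wide(struct tx_state_sketch *s)
{
    assert(s->num_pending_tx > 0);
    s->locked = 1;
    s->num_pending_tx--;
    step(s);  /* MSDU id bookkeeping (needs the lock) */
    step(s);  /* unmap buffer (assumed not to need the lock) */
    step(s);  /* report status and free (assumed not to need the lock) */
    s->locked = 0;
}

/* After: only the MSDU bookkeeping is protected. */
static void tx_complete_narrow(struct tx_state_sketch *s)
{
    assert(s->num_pending_tx > 0);
    s->locked = 1;
    s->num_pending_tx--;
    step(s);  /* MSDU id bookkeeping (needs the lock) */
    s->locked = 0;
    step(s);  /* unmap buffer, now outside the lock */
    step(s);  /* report status and free, now outside the lock */
}
```

      With the narrow variant the lock is released sooner, so a transmit path
      waiting on the same lock can refill the pending queue earlier.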
      
      Improves downstream performance on a dual-core ARM Freescale LS1024A
      (f.k.a. Mindspeed Comcerto 2000) AP with a 3x3 client from 495 to 580 Mbps.
      Other CPU bound multicore systems may also benefit.
      Signed-off-by: Denton Gentry <dgentry@google.com>
      Signed-off-by: Avery Pennarun <apenwarr@google.com>
      [mfaltesek@google.com: removed conflicting code for tracking msdu_ids.]
      Signed-off-by: Marty Faltesek <mfaltesek@google.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
  7. 24 Jul, 2015 3 commits
  8. 02 Jul, 2015 1 commit
    • ath10k: configure frag desc memory to target for qca99X0 · d9156b5f
      Committed by Raja Mani
      Pre-qca99X0 chipsets follow a model where memory for the frag desc
      is allocated dynamically whenever a new skb arrives for TX. This is
      not the case for qca99X0. It expects frag desc memory to be
      allocated at boot time and lets the driver reuse the allocated memory
      after every TX completion. So there won't be any dynamic frag
      memory allocation in qca99X0 during data transmission.
      
      qca99X0 hardware doesn't need the fragment desc address to be programmed
      in the msdu descriptor for every data transaction. It only needs to know
      the starting address of the fragment descriptors at boot time.
      During data transmission, qca99X0 hardware can retrieve the corresponding
      frag addr by adding the programmed frag desc base addr + msdu id.
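
      The address lookup described above might look like the following sketch.
      It assumes one fixed-size descriptor per msdu id, laid out back to back,
      so the id is scaled by the descriptor size; that scaling, the struct
      layout, and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-size fragment descriptor; the real layout differs. */
struct frag_desc_sketch {
    uint32_t paddr;
    uint32_t len;
};

/* Address of the descriptor for msdu_id inside one bank whose base
 * address was handed to the target at boot. */
static uint64_t frag_desc_addr(uint64_t bank_base, uint16_t msdu_id,
                               uint16_t num_desc)
{
    assert(msdu_id < num_desc);  /* id must index into the bank */
    return bank_base + (uint64_t)msdu_id * sizeof(struct frag_desc_sketch);
}
```

      Because the address is derived from the id alone, nothing has to be
      allocated or programmed per frame once the base is known.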
      
      Allocate contiguous fragment descriptor memory (same size as the number
      of descriptors) at target initialization and pass the allocated
      dma address to the target via HTT_H2T_MSG_TYPE_FRAG_DESC_BANK_CFG.
      
      How this contiguous memory is going to be used is not covered in
      this patch. It just allocates the memory and hands it over to firmware.
      If we don't do it at init time, qca99X0 will stall when firmware tries
      to do TX.
      Signed-off-by: Raja Mani <rmani@qti.qualcomm.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
  9. 02 Apr, 2015 1 commit
  10. 30 Mar, 2015 1 commit
  11. 04 Feb, 2015 1 commit
  12. 27 Jan, 2015 2 commits
  13. 13 Jan, 2015 1 commit
  14. 08 Dec, 2014 1 commit
  15. 26 Nov, 2014 1 commit
  16. 17 Nov, 2014 1 commit
  17. 31 Oct, 2014 1 commit
    • ath10k: speed up hw recovery · 7962b0d8
      Committed by Michal Kazior
      In some cases hw recovery was taking an absurdly
      long time due to ath10k waiting for things that
      would never really complete.
      
      Instead of waiting for inevitable timeouts, poke
      all completions and waitqueues and check if it's
      still worth waiting.
      
      Reading/writing ar->state requires conf_mutex.
      Since waiters might be holding it, introduce a new
      flag CRASH_FLUSH so it's possible to tell waiters
      to abort whatever they were waiting for.
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
  18. 08 Oct, 2014 1 commit
  19. 07 Oct, 2014 1 commit
  20. 18 Sep, 2014 2 commits
  21. 27 Aug, 2014 1 commit
  22. 22 Jul, 2014 1 commit
    • ath10k: prevent some tx flushing failures · 708b9bde
      Committed by Michal Kazior
      Firmware could request inspection of some
      submitted tx requests. Since the callback wasn't
      implemented it was possible to bleed tx msdu_ids
      which could translate to tx flushing timeouts.
      
      There's nothing ath10k can do to help firmware
      with tx processing now so just report all tx
      frames as already inspected to prevent firmware
      from sending up inspection events and force it to
      report regular tx completion indications with
      discard status.
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
  23. 14 Jul, 2014 1 commit
  24. 23 May, 2014 1 commit
  25. 28 Feb, 2014 3 commits
    • ath10k: reduce htt tx/rx spinlock overhead · 45967089
      Committed by Michal Kazior
      It is inefficient to grab irqsave spinlocks for
      skb lists for each queue/dequeue action.
      
      Using rx_ring.lock and tx_lock makes it possible to use
      the lighter bh spinlock functions, and moving locking
      upwards means spinlocks are toggled less often.
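
      Moving the lock upwards can be illustrated with a counting model, as
      sketched below. The lock here is simulated by a counter of acquisitions
      rather than a real spinlock, and all names are hypothetical.

```c
#include <assert.h>

/* Hypothetical queue model that counts how often the lock is taken. */
struct queue_sketch {
    int lock_toggles;  /* number of lock/unlock pairs */
    int items;         /* number of queued items */
};

static void sketch_lock(struct queue_sketch *q)   { q->lock_toggles++; }
static void sketch_unlock(struct queue_sketch *q) { (void)q; }

/* Queue one item; the caller must already hold the lock. */
static void enqueue_locked(struct queue_sketch *q)
{
    q->items++;
}

/* Before: the lock is taken and dropped around every single item. */
static void enqueue_each(struct queue_sketch *q, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        sketch_lock(q);
        enqueue_locked(q);
        sketch_unlock(q);
    }
}

/* After: the lock is moved up to the caller and taken once per batch. */
static void enqueue_batch(struct queue_sketch *q, int n)
{
    assert(n >= 0);
    sketch_lock(q);
    for (int i = 0; i < n; i++)
        enqueue_locked(q);
    sketch_unlock(q);
}
```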
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
    • ath10k: bypass htc for htt tx path · a16942e6
      Committed by Michal Kazior
      Going through the full htc tx path for htt tx is a
      waste of resources. By skipping it, it's possible
      to easily submit scatter-gather to the pci hif for
      reduced host cpu load and improved performance.
      
      The new approach uses dma pool to store the
      following metadata for each tx request:
       * msdu fragment list
       * htc header
       * htt tx command
      
      The htt tx command contains a msdu prefetch.
      Instead of copying it, the original mapped msdu address
      is used to submit a second scatter-gather item to
      hif to make a complete htt tx command.
      
      The htt tx command itself hands over dma mapped
      pointers to msdus and completion of the command
      itself doesn't mean the frame has been sent and
      can be unmapped/freed. This is why htc tx
      completion is skipped for htt tx as all tx related
      resources are freed upon htt tx completion
      indication event (which also implicitly means htt
      tx command itself was completed).
      
      Since now each htt tx request effectively consists
      of 2 copy engine items CE_HTT_H2T_MSG_SRC_NENTRIES
      is updated to allow maximum of
      TARGET_10X_NUM_MSDU_DESC msdus being queued. This
      keeps the tx path resource management simple.
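
      The two-item submission and the resulting ring sizing can be sketched as
      below. This is an illustration under assumptions: the item struct, the
      prefetch length parameter, and the "two entries per msdu" sizing rule
      are inferred from the description above and are not the driver's
      definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter-gather item: a DMA address and a length. */
struct sg_item_sketch {
    uint32_t paddr;
    uint32_t len;
};

#define ITEMS_PER_TX 2  /* metadata item + msdu prefetch item */

/* Build the two items for one tx request: first the pool-allocated
 * metadata, then the start of the already mapped msdu, which serves as
 * the prefetch without being copied. */
static size_t build_tx_sg(struct sg_item_sketch items[ITEMS_PER_TX],
                          uint32_t meta_paddr, uint32_t meta_len,
                          uint32_t msdu_paddr, uint32_t msdu_len,
                          uint32_t prefetch_len)
{
    assert(items != NULL);
    items[0].paddr = meta_paddr;
    items[0].len = meta_len;
    items[1].paddr = msdu_paddr;
    /* Never prefetch more than the frame actually contains. */
    items[1].len = msdu_len < prefetch_len ? msdu_len : prefetch_len;
    return ITEMS_PER_TX;
}

/* Ring entries needed so that max_msdus requests can all be queued. */
static size_t src_ring_entries(size_t max_msdus)
{
    return max_msdus * ITEMS_PER_TX;
}
```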
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
    • ath10k: remove DMA mapping wrappers · 767d34fc
      Committed by Michal Kazior
      There's no real benefit from using them. DMA-API
      already provides debugging. Some skbuffs are
      already mapped directly with DMA-API since wrapper
      arguments were insufficient and extending them
      would be pointless.
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
  26. 13 Feb, 2014 1 commit
  27. 21 Oct, 2013 1 commit
  28. 27 Sep, 2013 1 commit
  29. 20 Sep, 2013 4 commits