1. 11 March 2014 (2 commits)
    • gianfar: Use Single-Queue polling for "fsl,etsec2" · 71ff9e3d
      Claudiu Manoil authored
      For the "fsl,etsec2" compatible models the driver currently
      supports 8 Tx and 8 Rx DMA rings (aka HW queues).  However,
      there are only 2 pairs of Rx/Tx interrupt lines, as these
      controllers are integrated in low-power SoCs with at most
      2 CPUs.  As a result, at most 2 NAPI instances have to service
      multiple Tx and Rx queues for these devices.  This complicates
      the NAPI polling routine, which has to iterate over the multiple
      Rx/Tx queues hooked to the same interrupt lines.  There is also
      overhead at the HW level, as the controller needs to service
      all 8 Tx rings in a round-robin manner.  The combined overhead
      shows up under multiple parallel Tx flows transmitted by the
      kernel stack, when the driver usually starts returning
      NETDEV_TX_BUSY, leading to the NETDEV WATCHDOG triggering a Tx
      timeout if the Tx path is congested for too long.
      
      As an alternative, this patch makes the driver support only one
      Tx/Rx DMA ring per NAPI instance (per interrupt group, i.e. pair
      of Tx/Rx interrupt lines) by default.  The simplified
      single-queue polling routine (gfar_poll_sq) becomes the default
      NAPI poll routine for the etsec2 devices too (a hedged sketch of
      such a routine follows this entry).  Some adjustments were
      needed to link the Tx/Rx HW queues with each NAPI instance
      (2 in this case).  gfar_poll_sq() is already used successfully
      by the older SQ_SG_MODE (single interrupt group) controllers.
      This patch fixes the Tx timeout triggering under heavy Tx
      traffic load (e.g. iperf -c -P 8) for the "fsl,etsec2" devices
      (currently the only MQ_MG_MODE devices).  There is also a
      significant memory footprint reduction from supporting at most
      2 Rx/Tx DMA rings instead of 8 for these devices.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
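
      Below is a minimal sketch of a single-queue NAPI poll routine of
      this kind, assuming one Rx ring and one Tx ring per interrupt
      group.  Only napi_struct, napi_complete(), and container_of()
      are real kernel API; the my_* types and helpers are hypothetical
      stand-ins, not the actual gianfar code.

      #include <linux/kernel.h>
      #include <linux/netdevice.h>

      struct my_rx_q;
      struct my_tx_q;

      /* Per-interrupt-group state: one Rx ring, one Tx ring, one NAPI
       * instance (all names hypothetical). */
      struct my_grp {
              struct napi_struct napi;
              struct my_rx_q *rx_queue;
              struct my_tx_q *tx_queue;
      };

      static void my_clean_tx_ring(struct my_tx_q *txq);
      static int my_clean_rx_ring(struct my_rx_q *rxq, int budget);
      static void my_enable_grp_irqs(struct my_grp *grp);

      static int my_poll_sq(struct napi_struct *napi, int budget)
      {
              struct my_grp *grp = container_of(napi, struct my_grp,
                                                napi);
              int work_done;

              /* Reclaim the single Tx ring first; Tx completion work
               * does not consume the NAPI budget. */
              my_clean_tx_ring(grp->tx_queue);

              /* Process received frames on the single Rx ring, up to
               * the budget. */
              work_done = my_clean_rx_ring(grp->rx_queue, budget);

              if (work_done < budget) {
                      /* All Rx work done: leave polled mode and
                       * re-enable this group's Rx/Tx interrupts. */
                      napi_complete(napi);
                      my_enable_grp_irqs(grp);
              }

              return work_done;
      }

      No iteration over ring arrays is needed, which is the point of
      the patch: each NAPI instance owns exactly one Rx/Tx ring pair.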
    • gianfar: Separate out the Tx interrupt handling (Tx NAPI) · aeb12c5e
      Claudiu Manoil authored
      There are concurrency issues on devices with 2 CPUs related
      to the handling of Rx and Tx interrupts.  eTSEC has separate
      interrupt lines for Rx and Tx, but a single imask register
      to mask these interrupts and a single NAPI instance to handle
      both Rx and Tx work.  As a result, the Rx and Tx ISRs are
      identical, both invoking gfar_schedule_cleanup(); however,
      both handlers can be entered at the same time when the Rx and
      Tx interrupts are taken by different CPUs.  In this case
      spurious interrupts (SPU) show up in /proc/interrupts,
      indicating a concurrency issue.  Tx overruns followed by Tx
      timeouts have also been observed under heavy Tx traffic load.
      
      To address these issues, the schedule-cleanup part of the ISRs
      has been changed to handle the Rx and Tx interrupts
      independently.  The patch adds a separate NAPI poll routine for
      Tx cleanup, triggered independently by the Tx confirmation
      interrupts only (see the sketch after this entry).  The existing
      poll functions are modified to handle only the Rx path
      processing.  The Tx poll routine does not need a budget, since
      Tx processing doesn't consume NAPI budget, and hence it is
      registered with the minimum NAPI weight.  NAPI scheduling does
      not require locking, since the Rx and Tx confirmation paths now
      use different NAPI instances.
      So, the patch fixes the occurrence of spurious Rx/Tx interrupts.
      Tx overruns also occur less frequently now.
      Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
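
      As a sketch of the split described above: each interrupt group
      registers two NAPI instances, Rx with the standard weight and Tx
      with a minimal weight, and the Tx poll routine always completes
      without consuming budget.  netif_napi_add() (the 4-argument
      signature of this kernel era), napi_complete(), and
      NAPI_POLL_WEIGHT are real API; the my_* names and the Tx weight
      of 2 are assumptions, not the driver's actual values.

      #include <linux/kernel.h>
      #include <linux/netdevice.h>

      struct my_tx_q;

      /* Hypothetical group state with separate Rx/Tx NAPI instances. */
      struct my_grp {
              struct napi_struct napi_rx;
              struct napi_struct napi_tx;
              struct my_tx_q *tx_queue;
      };

      static int my_poll_rx(struct napi_struct *napi, int budget);
      static void my_clean_tx_ring(struct my_tx_q *txq);
      static void my_enable_tx_irq(struct my_grp *grp);

      static int my_poll_tx(struct napi_struct *napi, int budget)
      {
              struct my_grp *grp = container_of(napi, struct my_grp,
                                                napi_tx);

              /* Reclaim completed Tx descriptors; no NAPI budget is
               * consumed on the Tx confirmation path. */
              my_clean_tx_ring(grp->tx_queue);

              /* Tx work is always "complete" from NAPI's viewpoint:
               * leave polled mode and unmask only the Tx confirmation
               * interrupt, independent of the Rx path. */
              napi_complete(napi);
              my_enable_tx_irq(grp);

              return 0;
      }

      static void my_napi_setup(struct net_device *ndev,
                                struct my_grp *grp)
      {
              /* Rx keeps a normal budgeted poll; Tx gets a minimal
               * weight (2 assumed here) since its poll never consumes
               * budget. */
              netif_napi_add(ndev, &grp->napi_rx, my_poll_rx,
                             NAPI_POLL_WEIGHT);
              netif_napi_add(ndev, &grp->napi_tx, my_poll_tx, 2);
      }

      Because each direction now has its own napi_struct, the two ISRs
      can call napi_schedule() on different instances concurrently
      without locking.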
  2. 08 March 2014 (15 commits)
  3. 07 March 2014 (14 commits)
  4. 05 March 2014 (1 commit)
    • be2net: dma_sync each RX frag before passing it to the stack · e50287be
      Sathya Perla authored
      The driver currently maps a page for DMA, divides the page into
      multiple frags, and posts them to the HW.  It un-maps the page
      after data has been received on all the frags of the page.  This
      scheme doesn't work when bounce buffers are used for DMA (the
      swiotlb=force kernel param), because received data is copied
      back from the bounce buffer to the original buffer only on a
      sync or unmap, so frags passed up before the page is un-mapped
      would carry stale data.

      This patch fixes the problem by calling dma_sync_single_for_cpu()
      for each frag (except the last one) so that the data is copied
      from the bounce buffers.  The page is un-mapped only when DMA
      finishes on the last frag of the page (a sketch of this scheme
      follows the entry).
      (Thanks Ben H. for suggesting the dma_sync API!)

      This patch also renames the "last_page_user" field of the
      be_rx_page_info{} struct to "last_frag" to improve readability
      of the fixed code.
      Reported-by: Li Fengmao <li.fengmao@zte.com.cn>
      Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
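
      A minimal sketch of the per-frag sync scheme described above:
      dma_sync_single_for_cpu() and dma_unmap_page() are the real DMA
      API; the my_rx_page_info layout and the helper are illustrative
      stand-ins for the be2net structures, not the driver's code.

      #include <linux/dma-mapping.h>

      /* Hypothetical stand-in for be2net's be_rx_page_info. */
      struct my_rx_page_info {
              dma_addr_t bus;         /* DMA address of the mapped page */
              u32 page_offset;        /* offset of this frag in the page */
              bool last_frag;         /* final frag of this page? */
      };

      static void my_sync_rx_frag(struct device *dev,
                                  struct my_rx_page_info *info,
                                  size_t frag_size)
      {
              if (info->last_frag) {
                      /* DMA is finished for the whole page; un-mapping
                       * also copies back any bounce buffer. */
                      dma_unmap_page(dev, info->bus, PAGE_SIZE,
                                     DMA_FROM_DEVICE);
              } else {
                      /* The HW still owns later frags of this page:
                       * sync only this frag so bounce-buffer contents
                       * reach the real buffer before the stack reads
                       * them. */
                      dma_sync_single_for_cpu(dev,
                                              info->bus +
                                              info->page_offset,
                                              frag_size,
                                              DMA_FROM_DEVICE);
              }
      }

      With swiotlb=force every mapping goes through a bounce buffer,
      so skipping the per-frag sync would hand the stack frags whose
      real, not-yet-synced buffer still holds stale data.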
  5. 04 March 2014 (6 commits)
  6. 03 March 2014 (2 commits)