1. 17 1月, 2014 21 次提交
    • S
      net: stmmac: move hardware setup for stmmac_open to new function · 523f11b5
      Srinivas Kandagatla 提交于
      This patch moves hardware setup part of the code in stmmac_open to a new
      function stmmac_hw_setup, the reason for doing this is to make hw
      initialization independent function so that PM functions can re-use it to
      re-initialize the IP after returning from low power state.
      This will also avoid code duplication across stmmac_resume/restore and
      stmmac_open.
      Signed-off-by: NSrinivas Kandagatla <srinivas.kandagatla@st.com>
      Acked-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      523f11b5
    • S
      net: stmmac: move dma allocation to new function · 09f8d696
      Srinivas Kandagatla 提交于
      This patch moves dma resource allocation to a new function
      alloc_dma_desc_resources, the reason for moving this to a new function
      is to keep the memory allocations in a separate function. One more reason
      it to get suspend and hibernation cases working without releasing and
      allocating these resources during suspend-resume and freeze-restore
      cases.
      Signed-off-by: NSrinivas Kandagatla <srinivas.kandagatla@st.com>
      Acked-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09f8d696
    • S
      net: stmmac: mdio: remove reset gpio free · 984203ce
      Srinivas Kandagatla 提交于
      This patch removes gpio_free for reset line of the phy, driver stores
      the gpio number in its private data-structure to use in future. As the
      driver uses this pin in future this pin should not be freed.
      Signed-off-by: NSrinivas Kandagatla <srinivas.kandagatla@st.com>
      Acked-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      984203ce
    • S
      net: stmmac: support max-speed device tree property · 9cbadf09
      Srinivas Kandagatla 提交于
      This patch adds support to "max-speed" property which is a standard
      Ethernet device tree property. max-speed specifies maximum speed
      (specified in megabits per second) supported the device.
      
      Depending on the clocking schemes some of the boards can only support
      few link speeds, so having a way to limit the link speed in the mac
      driver would allow such setups to work reliably.
      
      Without this patch there is no way to tell the driver to limit the
      link speed.
      Signed-off-by: NSrinivas Kandagatla <srinivas.kandagatla@st.com>
      Acked-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9cbadf09
    • D
      Merge branch 'mvneta' · 82a342d1
      David S. Miller 提交于
      Willy Tarreau says:
      
      ====================
      Assorted mvneta fixes and improvements
      
      this series provides some fixes for a number of issues met with the
      mvneta driver, then adds some improvements. Patches 1-5 are fixes
      and would be needed in 3.13 and likely -stable. The next ones are
      performance improvements and cleanups :
      
        - driver lockup when reading stats while sending traffic from multiple
          CPUs : this obviously only happens on SMP and is the result of missing
          locking on the driver. The problem was present since the introduction
          of the driver in 3.8. The first patch performs some changes that are
          needed for the second one which actually fixes the issue by using
          per-cpu counters. It could make sense to backport this to the relevant
          stable versions.
      
        - mvneta_tx_timeout calls various functions to reset the NIC, and these
          functions sleep, which is not allowed here, resulting in a panic.
          Better completely disable this Tx timeout handler for now since it is
          never called. The problem was encountered while developing some new
          features, it's uncertain whether it's possible to reproduce it with
          regular usage, so maybe a backport to stable is not needed.
      
        - replace the Tx timer with a real Tx IRQ. As first reported by Arnaud
          Ebalard and explained by Eric Dumazet, there is no way this driver
          can work correctly if it uses a driver to recycle the Tx descriptors.
          If too many packets are sent at once, the driver quickly ends up with
          no descriptors (which happens twice as easily in GSO) and has to wait
          10ms for recycling its descriptors and being able to send again. Eric
          has worked around this in the core GSO code. But still when routing
          traffic or sending UDP packets, the limitation is very visible. Using
          Tx IRQs allows Tx descriptors to be recycled when sent. The coalesce
          value is still configurable using ethtool. This fix turns the UDP
          send bitrate from 134 Mbps to 987 Mbps (ie: line rate). It's made of
          two patches, one to add the relevant bits from the original Marvell's
          driver, and another one to implement the change. I don't know if it
          should be backported to stable, as the bug only causes poor performance.
      
        - Patches 6..8 are essentially cleanups, code deduplication and minor
          optimizations for not re-fetching a value we already have (status).
      
        - patch 9 changes the prefetch of Rx descriptor from current one to
          next one. In benchmarks, it results in about 1% general performance
          increase on HTTP traffic, probably because prefetching the current
          descriptor does not leave enough time between the start of prefetch
          and its usage.
      
        - patch 10 implements support for build_skb() on Rx path. The driver
          now preallocates frags instead of skbs and builds an skb just before
          delivering it. This results in a 2% performance increase on HTTP
          traffic, and up to 5% on small packet Rx rate.
      
        - patch 11 implements rx_copybreak for small packets (256 bytes). It
          avoids a dma_map_single()/dma_unmap_single() and increases the Rx
          rate by 16.4%, from 486kpps to 573kpps. Further improvements up to
          711kpps are possible depending how the DMA is used.
      
        - patches 12 and 13 are extra cleanups made possible by some of the
          simplifications above.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82a342d1
    • A
      net: mvneta: make mvneta_txq_done() return void · cd713199
      Arnaud Ebalard 提交于
      The function return parameter is not used in mvneta_tx_done_gbe(),
      where the function is called. This patch makes the function return
      void.
      Reviewed-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd713199
    • A
      net: mvneta: mvneta_tx_done_gbe() cleanups · 0713a86a
      Arnaud Ebalard 提交于
      mvneta_tx_done_gbe() return value and third parameter are no more
      used. This patch changes the function prototype and removes a useless
      variable where the function is called.
      Reviewed-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0713a86a
    • W
      net: mvneta: implement rx_copybreak · f19fadfc
      willy tarreau 提交于
      calling dma_map_single()/dma_unmap_single() is quite expensive compared
      to copying a small packet. So let's copy short frames and keep the buffers
      mapped. We set the limit to 256 bytes which seems to give good results both
      on the XP-GP board and on the AX3/4.
      
      The Rx small packet rate increased by 16.4% doing this, from 486kpps to
      573kpps. It is worth noting that even the call to the function
      dma_sync_single_range_for_cpu() is expensive (300 ns) although less
      than dma_unmap_single(). Without it, the packet rate raises to 711kpps
      (+24% more). Thus on systems where coherency from device to CPU is
      guaranteed by a snoop control unit, this patch should provide even more
      gains, and probably rx_copybreak could be increased.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f19fadfc
    • W
      net: mvneta: convert to build_skb() · 8ec2cd48
      willy tarreau 提交于
      Make use of build_skb() to allocate frags on the RX path. When frag size
      is lower than a page size, we can use netdev_alloc_frag(), and we fall back
      to kmalloc() for larger sizes. The frag size is stored into the mvneta_port
      struct. The alloc/free functions check the frag size to decide what alloc/
      free method to use. MTU changes are safe because the MTU change function
      stops the device and clears the queues before applying the change.
      
      With this patch, I observed a reproducible 2% performance improvement on
      HTTP-based benchmarks, and 5% on small packet RX rate.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ec2cd48
    • W
      net: mvneta: prefetch next rx descriptor instead of current one · 34e4179d
      willy tarreau 提交于
      Currently, the mvneta driver tries to prefetch the current Rx
      descriptor during read. Tests have shown that prefetching the
      next one instead increases general performance by about 1% on
      HTTP traffic.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34e4179d
    • W
      net: mvneta: simplify access to the rx descriptor status · 5428213c
      willy tarreau 提交于
      At several places, we already know the value of the rx status but
      we call functions which dereference the pointer again to get it
      and don't need the descriptor for anything else. Simplify this
      task by replacing the rx desc pointer by the status word itself.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5428213c
    • W
      net: mvneta: factor rx refilling code · a1a65ab1
      willy tarreau 提交于
      Make mvneta_rxq_fill() use mvneta_rx_refill() instead of using
      duplicate code.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1a65ab1
    • W
      net: mvneta: remove tests for impossible cases in the tx_done path · 6c498974
      willy tarreau 提交于
      Currently, mvneta_txq_bufs_free() calls mvneta_tx_done_policy() with
      a non-null cause to retrieve the pointer to the next queue to process.
      There are useless tests on the return queue number and on the pointer,
      all of which are well defined within a known limited set. This code
      path is fast, although not critical. Removing 3 tests here that the
      compiler could not optimize (verified) is always desirable.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c498974
    • W
      net: mvneta: replace Tx timer with a real interrupt · 71f6d1b3
      willy tarreau 提交于
      Right now the mvneta driver doesn't handle Tx IRQ, and relies on two
      mechanisms to flush Tx descriptors : a flush at the end of mvneta_tx()
      and a timer. If a burst of packets is emitted faster than the device
      can send them, then the queue is stopped until next wake-up of the
      timer 10ms later. This causes jerky output traffic with bursts and
      pauses, making it difficult to reach line rate with very few streams.
      
      A test on UDP traffic shows that it's not possible to go beyond 134
      Mbps / 12 kpps of outgoing traffic with 1500-bytes IP packets. Routed
      traffic tends to observe pauses as well if the traffic is bursty,
      making it even burstier after the wake-up.
      
      It seems that this feature was inherited from the original driver but
      nothing there mentions any reason for not using the interrupt instead,
      which the chip supports.
      
      Thus, this patch enables Tx interrupts and removes the timer. It does
      the two at once because it's not really possible to make the two
      mechanisms coexist, so a split patch doesn't make sense.
      
      First tests performed on a Mirabox (Armada 370) show that less CPU
      seems to be used when sending traffic. One reason might be that we now
      call the mvneta_tx_done_gbe() with a mask indicating which queues have
      been done instead of looping over all of them.
      
      The same UDP test above now happily reaches 987 Mbps / 87.7 kpps.
      Single-stream TCP traffic can now more easily reach line rate. HTTP
      transfers of 1 MB objects over a single connection went from 730 to
      840 Mbps. It is even possible to go significantly higher (>900 Mbps)
      by tweaking tcp_tso_win_divisor.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Cc: Arnaud Ebalard <arno@natisbad.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71f6d1b3
    • W
      net: mvneta: add missing bit descriptions for interrupt masks and causes · 40ba35e7
      willy tarreau 提交于
      Marvell has not published the chip's datasheet yet, so it's very hard
      to find the relevant bits to manipulate to change the IRQ behaviour.
      Fortunately, these bits are described in the proprietary LSP patch set
      which is publicly available here :
      
          http://www.plugcomputer.org/downloads/mirabox/
      
      So let's put them back in the driver in order to reduce the burden of
      current and future maintenance.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40ba35e7
    • W
      net: mvneta: do not schedule in mvneta_tx_timeout · 29021366
      willy tarreau 提交于
      If a queue timeout is reported, we can oops because of some
      schedules while the caller is atomic, as shown below :
      
        mvneta d0070000.ethernet eth0: tx timeout
        BUG: scheduling while atomic: bash/1528/0x00000100
        Modules linked in: slhttp_ethdiv(C) [last unloaded: slhttp_ethdiv]
        CPU: 2 PID: 1528 Comm: bash Tainted: G        WC   3.13.0-rc4-mvebu-nf #180
        [<c0011bd9>] (unwind_backtrace+0x1/0x98) from [<c000f1ab>] (show_stack+0xb/0xc)
        [<c000f1ab>] (show_stack+0xb/0xc) from [<c02ad323>] (dump_stack+0x4f/0x64)
        [<c02ad323>] (dump_stack+0x4f/0x64) from [<c02abe67>] (__schedule_bug+0x37/0x4c)
        [<c02abe67>] (__schedule_bug+0x37/0x4c) from [<c02ae261>] (__schedule+0x325/0x3ec)
        [<c02ae261>] (__schedule+0x325/0x3ec) from [<c02adb97>] (schedule_timeout+0xb7/0x118)
        [<c02adb97>] (schedule_timeout+0xb7/0x118) from [<c0020a67>] (msleep+0xf/0x14)
        [<c0020a67>] (msleep+0xf/0x14) from [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194)
        [<c01dcbe5>] (mvneta_stop_dev+0x21/0x194) from [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24)
        [<c01dcfe9>] (mvneta_tx_timeout+0x19/0x24) from [<c024afc7>] (dev_watchdog+0x18b/0x1c4)
        [<c024afc7>] (dev_watchdog+0x18b/0x1c4) from [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c)
        [<c0020b53>] (call_timer_fn.isra.27+0x17/0x5c) from [<c0020cad>] (run_timer_softirq+0x115/0x170)
        [<c0020cad>] (run_timer_softirq+0x115/0x170) from [<c001ccb9>] (__do_softirq+0xbd/0x1a8)
        [<c001ccb9>] (__do_softirq+0xbd/0x1a8) from [<c001cfad>] (irq_exit+0x61/0x98)
        [<c001cfad>] (irq_exit+0x61/0x98) from [<c000d4bf>] (handle_IRQ+0x27/0x60)
        [<c000d4bf>] (handle_IRQ+0x27/0x60) from [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8)
        [<c000843b>] (armada_370_xp_handle_irq+0x33/0xc8) from [<c000fba9>] (__irq_usr+0x49/0x60)
      
      Ben Hutchings attempted to propose a better fix consisting in using a
      scheduled work for this, but while it fixed this panic, it caused other
      random freezes and panics proving that the reset sequence in the driver
      is unreliable and that additional fixes should be investigated.
      
      When sending multiple streams over a link limited to 100 Mbps, Tx timeouts
      happen from time to time, and the driver correctly recovers only when the
      function is disabled.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29021366
    • W
      net: mvneta: use per_cpu stats to fix an SMP lock up · 74c41b04
      willy tarreau 提交于
      Stats writers are mvneta_rx() and mvneta_tx(). They don't lock anything
      when they update the stats, and as a result, it randomly happens that
      the stats freeze on SMP if two updates happen during stats retrieval.
      This is very easily reproducible by starting two HTTP servers and binding
      each of them to a different CPU, then consulting /proc/net/dev in loops
      during transfers, the interface should immediately lock up. This issue
      also randomly happens upon link state changes during transfers, because
      the stats are collected in this situation, but it takes more attempts to
      reproduce it.
      
      The comments in netdevice.h suggest using per_cpu stats instead to get
      rid of this issue.
      
      This patch implements this. It merges both rx_stats and tx_stats into
      a single "stats" member with a single syncp. Both mvneta_rx() and
      mvneta_rx() now only update the a single CPU's counters.
      
      In turn, mvneta_get_stats64() does the summing by iterating over all CPUs
      to get their respective stats.
      
      With this change, stats are still correct and no more lockup is encountered.
      
      Note that this bug was present since the first import of the mvneta
      driver.  It might make sense to backport it to some stable trees. If
      so, it depends on "d33dc73 net: mvneta: increase the 64-bit rx/tx stats
      out of the hot path".
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      74c41b04
    • W
      net: mvneta: increase the 64-bit rx/tx stats out of the hot path · dc4277dd
      willy tarreau 提交于
      Better count packets and bytes in the stack and on 32 bit then
      accumulate them at the end for once. This saves two memory writes
      and two memory barriers per packet. The incoming packet rate was
      increased by 4.7% on the Openblocks AX3 thanks to this.
      
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NArnaud Ebalard <arno@natisbad.org>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc4277dd
    • P
      drivers/net: delete non-required instances of include <linux/init.h> · a81ab36b
      Paul Gortmaker 提交于
      None of these files are actually using any __init type directives
      and hence don't need to include <linux/init.h>.   Most are just a
      left over from __devinit and __cpuinit removal, or simply due to
      code getting copied from one driver to the next.
      
      This covers everything under drivers/net except for wireless, which
      has been submitted separately.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a81ab36b
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nftables · 5ff1dd24
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      This small batch contains several Netfilter fixes for your net-next
      tree, more specifically:
      
      * Fix compilation warning in nft_ct in NF_CONNTRACK_MARK is not set,
        from Kristian Evensen.
      
      * Add dependency to IPV6 for NF_TABLES_INET. This one has been reported
        by the several robots that are testing .config combinations, from Paul
        Gortmaker.
      
      * Fix default base chain policy setting in nf_tables, from myself.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ff1dd24
    • J
      neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. · 89740ca7
      Jiri Pirko 提交于
      When ndo_neigh_setup is called, the bitfield used by NEIGH_VAR_SET is
      not initialized yet. This might cause confusion for the people who use
      NEIGH_VAR_SET in ndo_neigh_setup. So rather introduce NEIGH_VAR_INIT for
      usage in ndo_neigh_setup.
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89740ca7
  2. 16 1月, 2014 19 次提交