1. 13 Jul, 2018 40 commits
    • D
      Merge branch 'mvpp2-add-RSS-support' · 23c9ef2b
      David S. Miller committed
      Maxime Chevallier says:
      
      ====================
      net: mvpp2: add RSS support
      
      This series adds support for RSS on PPv2. There already was some code to
      handle the RSS tables, but the driver was missing all the classification
      steps required to actually use these tables.
      
      RSS is used through the classifier, using at least 2 lookups :
       - One using the C2 engine, a TCAM engine that matches the packet based on
         some header extracted fields, assigns the default rx queue for that
         packet and tags it for RSS
       - One using the C3Hx engine, which computes the hash that's used to perform
         the lookup in the RSS table.
      
      Since RSS spreads the load across CPUs, we need to make sure that packets
      from the same flow are always assigned the same rx queue, to prevent
      re-ordering.
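
The ordering guarantee can be illustrated with a minimal sketch. It assumes a stand-in hash (CRC32 from Python's zlib, which is not the hash the PPv2 hardware computes) and a 32-entry table; the names flow_hash and pick_rxq are hypothetical and only serve the illustration.

```python
import zlib

RSS_ENTRIES = 32  # table size chosen for illustration


def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Stand-in hash over the flow 4-tuple (not the hardware algorithm)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key)


def pick_rxq(table, src_ip, dst_ip, src_port, dst_port):
    """Index the RSS table with the hash to get the rx queue."""
    return table[flow_hash(src_ip, dst_ip, src_port, dst_port) % len(table)]


# Spread 4 rx queues round-robin over the table entries.
table = [i % 4 for i in range(RSS_ENTRIES)]

# Two packets of the same flow hash to the same entry, hence the same queue.
first = pick_rxq(table, "10.0.0.1", "10.0.0.2", 40000, 80)
again = pick_rxq(table, "10.0.0.1", "10.0.0.2", 40000, 80)
```

Because the queue depends only on the hashed header fields, every packet of a flow follows the same path as long as the table is not rewritten.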
      
      This series therefore adds a classification step based on the Header Parser,
      that separates ingress traffic into 52 flows, based on some L2, L3 and L4
      parameters.
      
      Patches 1 and 2 fix some header issues, from the driver splitting
      
      Patches 3 to 7 make sure the correct receive queue setup is used for RSS
      
      Patches 8 to 14 deal with the way we handle the RSS tables
      
      Patch 15 implements basic classifier configuration, by using it to assign the
      default receive queue
      
      Patch 16 implements the ingress traffic splitting into multiple flows
      
      Patch 17 adds RSS support, by using the needed classification steps
      
      Patch 18 adds the required ethtool ops to configure the flow hash parameters
      
      This was tested on MacchiatoBin, giving some nice performance improvements
      using ip forwarding (going from 5Gbps to 9.6Gbps total throughput).
      
      RSS is disabled by default.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      23c9ef2b
    • M
      net: mvpp2: allow setting RSS flow hash parameters with ethtool · 436d4fdb
      Maxime Chevallier committed
      This commit allows setting the RSS hash generation parameters from
      ethtool. When setting parameters for a given flow type from ethtool
      (e.g. tcp4), all the corresponding flows in the flow table are updated,
      according to the supported hash parameters.
      
      For example, when configuring TCP over IPv4 hash parameters to be
      src/dst IP  + src/dst port ("ethtool -N eth0 rx-flow-hash tcp4 sdfn"),
      we only set the "src/dst port" hash parameters on the non-fragmented TCP
      over IPv4 flows.
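
A rough sketch of that rule follows. The letter meanings come from the ethtool man page for rx-flow-hash; the flow records, field names and the apply_to_flows helper are hypothetical and do not mirror the driver's data structures.

```python
# Letters accepted by "ethtool -N <dev> rx-flow-hash" (discard option omitted).
HASH_LETTERS = {
    "m": "dst_mac",
    "v": "vlan",
    "t": "l3_proto",
    "s": "src_ip",
    "d": "dst_ip",
    "f": "src_port",  # bytes 0 and 1 of the L4 header
    "n": "dst_port",  # bytes 2 and 3 of the L4 header
}
L4_FIELDS = {"src_port", "dst_port"}


def parse_hash_opts(letters):
    """Translate an option string such as "sdfn" into a set of field names."""
    return {HASH_LETTERS[c] for c in letters}


def apply_to_flows(flows, letters):
    """Return {flow name: hash fields}, dropping L4 fields on fragmented flows."""
    wanted = parse_hash_opts(letters)
    result = {}
    for flow in flows:
        fields = wanted - L4_FIELDS if flow["frag"] else wanted
        result[flow["name"]] = fields
    return result


# Hypothetical subset of the TCP over IPv4 flows.
tcp4_flows = [
    {"name": "tcp4-vlan-nofrag", "frag": False},
    {"name": "tcp4-vlan-frag", "frag": True},
]
opts = apply_to_flows(tcp4_flows, "sdfn")
```

Here the fragmented flow ends up hashing on the IP addresses only, while the non-fragmented flow also hashes on the ports.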
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      436d4fdb
    • M
      net: mvpp2: add an RSS classification step for each flow · d33ec452
      Maxime Chevallier committed
      One of the classification actions that can be performed is to compute a
      hash of the packet header based on some header fields, and look up an RSS
      table based on this hash to determine the final RxQ.
      
      This is done by adding one lookup entry per flow per port, so that we
      can configure the hash generation parameters for each flow and each
      port.
      
      There are 2 possible engines that can be used for RSS hash generation :
      
       - C3HA, that generates a hash based on up to 4 header-extracted fields
       - C3HB, that does the same as C3HA, but also includes L4 info in the hash
      
      There are a lot of fields that can be extracted from the header. For now,
      we only use the ones that we can configure using ethtool :
       - DST MAC address
       - L3 info
       - Source IP
       - Destination IP
       - Source port
       - Destination port
      
      The C3HB engine is selected when we use L4 fields (src/dst port).
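
The selection rule can be written down as a small sketch. The field names and the select_engine helper are hypothetical; only the rule itself (L4 fields imply C3HB) comes from the text above.

```python
L4_FIELDS = {"src_port", "dst_port"}


def select_engine(hash_fields):
    """Pick C3HB when any L4 field is part of the hash, C3HA otherwise."""
    return "C3HB" if L4_FIELDS & set(hash_fields) else "C3HA"


l3_only = select_engine({"src_ip", "dst_ip"})
with_ports = select_engine({"src_ip", "dst_ip", "src_port", "dst_port"})
```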
      
                     Header parser          Dec table
       Ingress pkt  +-------------+ flow id +----------------------------+
      ------------->| TCAM + SRAM |-------->|TCP IPv4 w/ VLAN, not frag  |
                    +-------------+         |TCP IPv4 w/o VLAN, not frag |
                                            |TCP IPv4 w/ VLAN, frag      |--+
                                            |etc.                        |  |
                                            +----------------------------+  |
                                                                            |
                                                  Flow table                |
        +---------+   +------------+         +--------------------------+   |
        | RSS tbl |<--| Classifier |<--------| flow 0: C2 lookup        |   |
        +---------+   +------------+         |         C3 lookup port 0 |   |
                       |         |           |         C3 lookup port 1 |   |
               +-----------+ +-------------+ |         ...              |   |
               | C2 engine | | C3H engines | | flow 1: C2 lookup        |<--+
               +-----------+ +-------------+ |         C3 lookup port 0 |
                                             |         ...              |
                                             | ...                      |
                                             | flow 51 : C2 lookup      |
                                             |           ...            |
                                             +--------------------------+
      
      The C2 engine also gains the role of enabling and disabling the RSS
      table lookup for this packet.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d33ec452
    • M
      net: mvpp2: split ingress traffic into multiple flows · f9358e12
      Maxime Chevallier committed
      The PPv2 classifier can perform classification operations on each
      ingress packet, based on the flow the packet is assigned to.
      
      The current code uses only 1 flow per port, and the only classification
      action consists of assigning the rx queue to the packet, depending on the
      port.
      
      In preparation for adding RSS support, we have to split all incoming
      traffic into different flows. Since RSS assigns a rx queue depending on
      the hash of some header fields, we have to make sure that the hash is
      generated in a consistent way for all packets in the same flow.
      
      What we call a "flow" is actually a set of attributes attached to a
      packet that depends on various L2/L3/L4 info.
      
      This patch introduces 52 flows, which are a combination of various L2, L3
      and L4 attributes :
       - Whether or not the packet has a VLAN tag
       - Whether the packet is IPv4, IPv6 or something else
       - Whether the packet is TCP, UDP or something else
       - Whether or not the packet is fragmented at L3 level.
      
      The flow is associated to a packet by the Header Parser. Each flow
      corresponds to an entry in the decoding table. This entry then points to
      the sequence of classification lookups to be performed by the
      classifier, represented in the flow table.
      
      For now, the only lookup we perform is a C2 lookup to set the default
      rx queue.
      
                     Header parser          Dec table
       Ingress pkt  +-------------+ flow id +----------------------------+
      ------------->| TCAM + SRAM |-------->|TCP IPv4 w/ VLAN, not frag  |
                    +-------------+         |TCP IPv4 w/o VLAN, not frag |
                                            |TCP IPv4 w/ VLAN, frag      |--+
                                            |etc.                        |  |
                                            +----------------------------+  |
                                                                            |
                                                 Flow table                 |
                      +------------+        +---------------------+         |
           To RxQ <---| Classifier |<-------| flow 0: C2 lookup   |<--------+
                      +------------+        | flow 1: C2 lookup   |
                             |              | ...                 |
                      +------------+        | flow 51 : C2 lookup |
                      | C2 engine  |        +---------------------+
                      +------------+
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9358e12
    • M
      net: mvpp2: use classifier to assign default rx queue · b1a962c6
      Maxime Chevallier committed
      The PPv2 Controller has a classifier, that can perform multiple lookup
      operations for each packet, using different engines.
      
      One of these engines is the C2 engine, which performs TCAM based lookups
      on data extracted from the packet header. When a packet matches an
      entry, the engine sets various attributes, used to perform
      classification operations.
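
As a generic illustration of a TCAM lookup (not the C2 engine's actual entry layout), each entry can be modelled as a value, a mask of the bits that matter, and an action; the first matching entry wins. The 8-bit keys and the rxq numbers below are made up for the example.

```python
def tcam_lookup(entries, key):
    """Return the action of the first entry whose masked value matches key."""
    for value, mask, action in entries:
        if (key & mask) == (value & mask):
            return action
    return None


# Hypothetical 8-bit keys: the high nibble is a port id, the low nibble is ignored.
entries = [
    (0x10, 0xF0, {"rxq": 4}),  # port 1 -> rx queue 4
    (0x20, 0xF0, {"rxq": 8}),  # port 2 -> rx queue 8
    (0x00, 0x00, {"rxq": 0}),  # catch-all: a zero mask matches any key
]

hit = tcam_lookup(entries, 0x1A)
fallback = tcam_lookup(entries, 0x99)
```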
      
      One of these attributes is the rx queue in which the packet should be sent.
      The current code uses the lookup_id table (also called decoding table)
      to assign the rx queue. However, this only works if we use one entry per
      port in the decoding table, which won't be the case once we add RSS
      lookups.
      
      This patch uses the C2 engine to assign the rx queue to each packet.
      
      The C2 engine is used through the flow table, which dictates what
      classification operations are done for a given flow.
      
      Right now, we have one flow per port, which contains every ingress
      packet for this port.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b1a962c6
    • M
      net: mvpp2: rename per-port RSS init function · e6e21c02
      Maxime Chevallier committed
      mvpp22_init_rss function configures the RSS parameters for each port, so
      rename it accordingly. Since this function relies on classifier
      configuration, move its call right after the classifier config.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e6e21c02
    • M
      net: mvpp2: make sure we don't spread load on disabled CPUs · 2a2f467d
      Maxime Chevallier committed
      When filling the RSS table, we have to make sure that the rx queue is
      attached to an online CPU.
      
      This patch is not full support for cpu_hotplug, but rather a way to
      make sure that we don't break networking on systems booted with the maxcpus
      parameter.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2a2f467d
    • A
      net: mvpp2: improve the distribution of packets on CPUs when using RSS · 662ae3fe
      Antoine Tenart committed
      This patch adds an extra indirection when writing the indirection table
      into the RSS hardware table, to improve packet distribution across
      CPUs. For example, if 2 queues are used on a multi-core system, this new
      indirection will choose two queues on two different CPUs instead of the
      first two queues, which are both on the first CPU.
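
One way such an indirection could work is sketched below. It assumes that physical queues are laid out contiguously per CPU (queue q belongs to CPU q // rxqs_per_cpu); the spread_rxq and owner_cpu helpers are hypothetical and are not taken from the driver.

```python
def spread_rxq(index, rxqs_per_cpu, cpus):
    """Map a logical queue index to a physical queue so that consecutive
    indexes land on different CPUs."""
    total = rxqs_per_cpu * cpus
    return (index * rxqs_per_cpu + index // cpus) % total


def owner_cpu(rxq, rxqs_per_cpu):
    """CPU that owns a physical queue under the assumed layout."""
    return rxq // rxqs_per_cpu


# 4 CPUs with 4 queues each: the first logical queues go to distinct CPUs.
mapping = [spread_rxq(i, 4, 4) for i in range(16)]
first_two_cpus = [owner_cpu(q, 4) for q in mapping[:2]]
```

With this layout, logical queues 0 and 1 map to physical queues 0 and 4, which sit on CPU 0 and CPU 1 respectively, and every physical queue is still used exactly once.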
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      662ae3fe
    • A
      net: mvpp2: RSS indirection table support · 8179642b
      Antoine Tenart committed
      This patch adds RSS indirection table support, so that the
      ethtool -x and -X options can be used to dump and set this table.
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      [Maxime: Small warning fixes, use one table per port]
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8179642b
    • M
      net: mvpp2: use one RSS table per port · a27a254c
      Maxime Chevallier committed
      PPv2 Controller has 8 RSS Tables, of 32 entries each. A lookup in the
      RXQ2RSS_TABLE is performed for each incoming packet, and the RSS Table
      to be used is chosen according to the default rx queue that would be
      used for the packet.
      
      This default rx queue is set in the Lookup_id Table (also called
      Decoding Table), and is equal to the port->first_rxq.
      
      Since the Classifier itself isn't active at any time for the moment,
      this doesn't have a direct effect: the default rx queue at the moment is
      the one where all packets end up.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a27a254c
    • M
      net: mvpp2: fix RSS register definitions · 4b86097b
      Maxime Chevallier committed
      There is no RSS_TABLE register in PPv2 Controller. The register 0x1510
      which was specified is actually named "RSS_HASH_SEL", but isn't used by
      this driver at all.
      
      Based on how this register was used, it should have been the
      RXQ2RSS_TABLE register, which selects the RSS table that will
      be used for the incoming packet.
      
      The RSS_TABLE_POINTER is actually a field of this RXQ2RSS_TABLE
      register.
      
      Since RSS tables are actually not used by the driver for now, this
      commit does not fix a runtime bug.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b86097b
    • A
      net: mvpp2: fix a typo in the RSS code · 132baa03
      Antoine Tenart committed
      Cosmetic patch fixing a typo in one of the RSS comments.
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      132baa03
    • M
      net: mvpp2: use only one rx queue per port per CPU · f8c6ba84
      Maxime Chevallier committed
      The number of receive queues per port is:
       - MVPP2_DEFAULT_RXQ if in single queue mode
       - MVPP2_DEFAULT_RXQ * num_possible_cpus if in multi queue mode
      
      with MVPP2_DEFAULT_RXQ = 4.
      
      However, we don't use the extra rx queues at the moment, we really only
      need one per port per CPU, until some more advanced classification rules
      are implemented.
      Suggested-by: Stefan Chulski <stefanc@marvell.com>
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8c6ba84
    • M
      net: mvpp2: fix hardcoded number of rx queues · 790d32c6
      Maxime Chevallier committed
      There's a dedicated #define that indicates the number of rx queues per
      port per cpu; this commit removes a hardcoded use of that value.
      
      This doesn't fix any runtime bugs since the hardcoded value matches the
      expected value.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      790d32c6
    • Y
      net: mvpp2: use RSS only when using multi-queue mode · 4c4a5686
      Yan Markman committed
      Since RSS only applies when we have per-cpu rx queues, it should only
      be enabled when the driver is configured to make use of multi-queue
      mode.
      Signed-off-by: Yan Markman <ymarkman@marvell.com>
      [Maxime: Commit message]
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4c4a5686
    • M
      net: mvpp2: make multi queue mode the default mode · 3f6aaf72
      Maxime Chevallier committed
      The multi queue mode is needed to have RSS available, and offers some
      nice advantages, being able to have one rx queue vector per CPU.
      
      This mode has been usable through the use of a module parameter, this
      commit makes it the default value.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3f6aaf72
    • M
      net: mvpp2: make sure we use single queue mode on PPv2.1 · 1e27a628
      Maxime Chevallier committed
      The PPv2 driver defines 2 "queue_modes" :
       - QDIST_SINGLE_MODE, where each port share one rx queue vector
         between all CPUs
       - QDIST_MULTI_MODE, where each port has one rx queue vector per CPU.
      
      Multi queue mode isn't available on PPv2.1, so make sure we fall back to
      single mode when running on this revision.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1e27a628
    • M
      net: mvpp2: define the number of RSS entries per table in mvpp2.h · 0ad2f539
      Maxime Chevallier committed
      The size of the RSS indirection tables should be defined in mvpp2.h,
      so that we can use it in all files of the PPv2 driver.
      
      This commit moves the define in mvpp2.h, and adds the missing #include
      in mvpp2_cls.h.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ad2f539
    • M
      net: mvpp2: fix include guards in mvpp2_prs.h · 53a40025
      Maxime Chevallier committed
      Include guards should be put before #includes. This doesn't fix any bug,
      but prevents future compilation issues when adding new files in the mvpp2
      driver.
      
      The Header Parser init function needs the platform_device definition,
      and with the fixed include guards we need to add the missing include.
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      53a40025
    • P
      net: gro: properly remove skb from list · 68d2f84a
      Prashant Bhole committed
      The following crash occurs in validate_xmit_skb_list() when the same skb is
      iterated multiple times in the loop and consume_skb() is called.
      
      The root cause is calling list_del_init(&skb->list) and not clearing
      skb->next in d4546c25. list_del_init(&skb->list) sets skb->next
      to point to skb itself. skb->next needs to be cleared because other
      parts of the network stack use another kind of SKB list.
      validate_xmit_skb_list() uses such a list.
      
      A similar type of bugfix was reported by Jesper Dangaard Brouer.
      https://patchwork.ozlabs.org/patch/942541/
      
      This patch clears skb->next and changes list_del_init() to list_del()
      so that list->prev will maintain the list poison.
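
The failure mode can be reproduced with a toy model. This is a minimal sketch, assuming a Node class that stands in for an skb embedding a list_head; it does not model list poisoning, and walk() only imitates a NULL-terminated walk such as the one in validate_xmit_skb_list().

```python
class Node:
    """Toy stand-in for an skb that embeds a list_head (next/prev)."""

    def __init__(self, name):
        self.name = name
        self.next = self
        self.prev = self


def list_add_tail(node, head):
    """Insert node just before head, i.e. at the tail of the list."""
    node.prev, node.next = head.prev, head
    head.prev.next = node
    head.prev = node


def list_del_init(node):
    """Unlink, then re-point the node at itself (an empty list)."""
    node.prev.next, node.next.prev = node.next, node.prev
    node.next = node.prev = node


def unlink_and_clear(node):
    """Unlink, then clear next so NULL-terminated walkers stop here.
    (prev is left untouched; the kernel would leave a poison value there.)"""
    node.prev.next, node.next.prev = node.next, node.prev
    node.next = None


def walk(node, limit=4):
    """Follow next pointers until None, giving up after `limit` steps."""
    seen = []
    while node is not None and len(seen) < limit:
        seen.append(node.name)
        node = node.next
    return seen


head = Node("head")
buggy = Node("skb-a")
list_add_tail(buggy, head)
list_del_init(buggy)          # next now points back at the same node
fixed = Node("skb-b")
list_add_tail(fixed, head)
unlink_and_clear(fixed)       # next is None, the walk terminates
```

Walking from `buggy` keeps returning the same node until the step limit is hit, which mirrors the same skb being iterated (and freed) more than once; walking from `fixed` visits it exactly once.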
      
      [  148.185511] ==================================================================
      [  148.187865] BUG: KASAN: use-after-free in validate_xmit_skb_list+0x4b/0xa0
      [  148.190158] Read of size 8 at addr ffff8801e52eefc0 by task swapper/1/0
      [  148.192940]
      [  148.193642] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.18.0-rc3+ #25
      [  148.195423] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fedoraproject.org-1.fc28 04/01/2014
      [  148.199129] Call Trace:
      [  148.200565]  <IRQ>
      [  148.201911]  dump_stack+0xc6/0x14c
      [  148.203572]  ? dump_stack_print_info.cold.1+0x2f/0x2f
      [  148.205083]  ? kmsg_dump_rewind_nolock+0x59/0x59
      [  148.206307]  ? validate_xmit_skb+0x2c6/0x560
      [  148.207432]  ? debug_show_held_locks+0x30/0x30
      [  148.208571]  ? validate_xmit_skb_list+0x4b/0xa0
      [  148.211144]  print_address_description+0x6c/0x23c
      [  148.212601]  ? validate_xmit_skb_list+0x4b/0xa0
      [  148.213782]  kasan_report.cold.6+0x241/0x2fd
      [  148.214958]  validate_xmit_skb_list+0x4b/0xa0
      [  148.216494]  sch_direct_xmit+0x1b0/0x680
      [  148.217601]  ? dev_watchdog+0x4e0/0x4e0
      [  148.218675]  ? do_raw_spin_trylock+0x10/0x120
      [  148.219818]  ? do_raw_spin_lock+0xe0/0xe0
      [  148.221032]  __dev_queue_xmit+0x1167/0x1810
      [  148.222155]  ? sched_clock+0x5/0x10
      [...]
      
      [  148.474257] Allocated by task 0:
      [  148.475363]  kasan_kmalloc+0xbf/0xe0
      [  148.476503]  kmem_cache_alloc+0xb4/0x1b0
      [  148.477654]  __build_skb+0x91/0x250
      [  148.478677]  build_skb+0x67/0x180
      [  148.479657]  e1000_clean_rx_irq+0x542/0x8a0
      [  148.480757]  e1000_clean+0x652/0xd10
      [  148.481772]  net_rx_action+0x4ea/0xc20
      [  148.482808]  __do_softirq+0x1f9/0x574
      [  148.483831]
      [  148.484575] Freed by task 0:
      [  148.485504]  __kasan_slab_free+0x12e/0x180
      [  148.486589]  kmem_cache_free+0xb4/0x240
      [  148.487634]  kfree_skbmem+0xed/0x150
      [  148.488648]  consume_skb+0x146/0x250
      [  148.489665]  validate_xmit_skb+0x2b7/0x560
      [  148.490754]  validate_xmit_skb_list+0x70/0xa0
      [  148.491897]  sch_direct_xmit+0x1b0/0x680
      [  148.493949]  __dev_queue_xmit+0x1167/0x1810
      [  148.495103]  br_dev_queue_push_xmit+0xce/0x250
      [  148.496196]  br_forward_finish+0x276/0x280
      [  148.497234]  __br_forward+0x44f/0x520
      [  148.498260]  br_forward+0x19f/0x1b0
      [  148.499264]  br_handle_frame_finish+0x65e/0x980
      [  148.500398]  NF_HOOK.constprop.10+0x290/0x2a0
      [  148.501522]  br_handle_frame+0x417/0x640
      [  148.502582]  __netif_receive_skb_core+0xaac/0x18f0
      [  148.503753]  __netif_receive_skb_one_core+0x98/0x120
      [  148.504958]  netif_receive_skb_internal+0xe3/0x330
      [  148.506154]  napi_gro_complete+0x190/0x2a0
      [  148.507243]  dev_gro_receive+0x9f7/0x1100
      [  148.508316]  napi_gro_receive+0xcb/0x260
      [  148.509387]  e1000_clean_rx_irq+0x2fc/0x8a0
      [  148.510501]  e1000_clean+0x652/0xd10
      [  148.511523]  net_rx_action+0x4ea/0xc20
      [  148.512566]  __do_softirq+0x1f9/0x574
      [  148.513598]
      [  148.514346] The buggy address belongs to the object at ffff8801e52eefc0
      [  148.514346]  which belongs to the cache skbuff_head_cache of size 232
      [  148.517047] The buggy address is located 0 bytes inside of
      [  148.517047]  232-byte region [ffff8801e52eefc0, ffff8801e52ef0a8)
      [  148.519549] The buggy address belongs to the page:
      [  148.520726] page:ffffea000794bb00 count:1 mapcount:0 mapping:ffff880106f4dfc0 index:0xffff8801e52ee840 compound_mapcount: 0
      [  148.524325] flags: 0x17ffffc0008100(slab|head)
      [  148.525481] raw: 0017ffffc0008100 ffff880106b938d0 ffff880106b938d0 ffff880106f4dfc0
      [  148.527503] raw: ffff8801e52ee840 0000000000190011 00000001ffffffff 0000000000000000
      [  148.529547] page dumped because: kasan: bad access detected
      
      Fixes: d4546c25 ("net: Convert GRO SKB handling to list_head.")
      Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Reported-by: Tyler Hicks <tyhicks@canonical.com>
      Tested-by: Tyler Hicks <tyhicks@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      68d2f84a
    • D
      Merge branch 's390-qeth-updates' · c8c81de9
      David S. Miller committed
      Julian Wiedmann says:
      
      ====================
      s390/qeth: updates 2018-07-11
      
      please apply this first batch of qeth patches for net-next. It brings the
      usual cleanups, and some performance improvements to the transmit paths.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c8c81de9
    • J
      s390/qeth: speed-up IPv4 OSA xmit · fb321f25
      Julian Wiedmann committed
      Move the xmit of offload-eligible (ie IPv4) traffic on OSA over to the
      new, copy-free path.
      As with L2, we'll need to preserve the skb_orphan() behaviour of the
      old code path until TX completion is sufficiently fast.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fb321f25
    • J
      s390/qeth: speed-up L3 IQD xmit · a647a025
      Julian Wiedmann committed
      This implements a new xmit path for L3 HiperSockets, which carves the
      HW header from skb headroom instead of allocating it from the hdr cache.
      It also adds NETIF_F_SG support.
      
      The delta in qeth_l3_xmit() is all just removal of IQD-specific code and
      some minor consolidation.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a647a025
    • J
      s390/qeth: add a L3 xmit wrapper · ea1d4a0c
      Julian Wiedmann committed
      In preparation for future work, move the high-level xmit work into a
      separate wrapper. This matches the L2 xmit code.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ea1d4a0c
    • J
      s390/qeth: increase GSO max size for eligible L3 devices · 371a1e7a
      Julian Wiedmann committed
      When a L3 device doesn't offer TSO, allow the stack to build full-size
      GSO skbs.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      371a1e7a
    • J
      s390/qeth: clean up exported symbols · 09960b3a
      Julian Wiedmann committed
      Remove some redundant EXPORTs. While at it, also move some L2-only
      prototypes into the proper header file.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09960b3a
    • J
      s390/qeth: consolidate ccwgroup driver definition · 6d8769ab
      Julian Wiedmann committed
      Reshuffle the code a bit so that everything is in one place.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6d8769ab
    • J
      s390/qeth: clean up Output Queue selection · 86c0cdb9
      Julian Wiedmann committed
      Consolidate duplicated code, fix the misuse of RTN_UNSPEC and simplify
      the handling of non-unicast traffic on IQD devices.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86c0cdb9
    • J
      s390/qeth: fine-tune RX modesetting · 9aa17df3
      Julian Wiedmann committed
      Changing a device's address lists (or its promisc mode) already triggers
      an RX modeset, there's no need to do it manually from the L2 driver's
      ndo_vlan_rx_kill_vid() hook.
      
      Also when setting a device online, dev_open() already calls
      dev_set_rx_mode(). So a manual modeset is only necessary from the
      recovery path.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9aa17df3
    • J
      s390/qeth: remove unused buffer->aob pointer · f67a43a7
      Julian Wiedmann committed
      Except for tracing, the pointer is not used.
      
      At the same time, accessing it from qeth_qdio_output_handler() is racy:
      whenever qeth_qdio_cq_handler() gets control, its call to
      qeth_qdio_handle_aob() frees the AOB.
      
      So the AOB pointer that qeth_qdio_output_handler() stores into 'buffer'
      can go stale at any time, and trigger a use-after-free.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f67a43a7
    • J
      s390/qeth: various buffer management cleanups · 3b346c18
      Julian Wiedmann committed
      Use the new qeth_scrub_qdio_buffer() helper, remove an extra parameter
      from qeth_clear_output_buffer(), init the bufstates.user field just once
      (in qeth_flush_buffers()) and remove some noisy trace messages.
      Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b346c18
    • J
      net: ipv4: fix listify ip_rcv_finish in case of forwarding · 0761680d
      Jesper Dangaard Brouer committed
      In commit 5fa12739 ("net: ipv4: listify ip_rcv_finish") calling
      dst_input(skb) was split-out.  The ip_sublist_rcv_finish() just calls
      dst_input(skb) in a loop.
      
      The problem is that ip_sublist_rcv_finish() forgot to remove the SKB
      from the list before invoking dst_input().  Furthermore, we need to
      clear skb->next as other parts of the network stack use another kind
      of SKB lists for xmit_more (see dev_hard_start_xmit).
      
      A crash occurs if e.g. dst_input() invokes ip_forward(), which calls
      dst_output()/ip_output(), which eventually calls __dev_queue_xmit() +
      sch_direct_xmit(); the crash is then hit in validate_xmit_skb_list().
      
      This patch only fixes the crash, but there is a huge potential for
      a performance boost if we can pass an SKB-list through to ip_forward.
      
      Fixes: 5fa12739 ("net: ipv4: listify ip_rcv_finish")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0761680d
    • A
      nfp: avoid using getnstimeofday64() · 51bef926
      Arnd Bergmann committed
      getnstimeofday64 is deprecated in favor of the ktime_get() family of
      functions. The direct replacement would be ktime_get_real_ts64(),
      but I'm picking the basic ktime_get() instead:
      
      - using a ktime_t simplifies the code compared to timespec64
      - using monotonic time instead of real time avoids issues caused
        by a concurrent settimeofday() or during a leap second adjustment.
      Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      51bef926
    • A
      liquidio: use ktime_get_real_ts64() instead of getnstimeofday64() · 44c58899
      Arnd Bergmann committed
      The two do the same thing, but we want to have a consistent
      naming in the kernel.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Felix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44c58899
    • D
      Merge branch 'net-sched-act_skbedit-lockless-data-path' · ccb06fba
      David S. Miller committed
      Davide Caratti says:
      
      ====================
      net/sched: act_skbedit: lockless data path
      
      the data path of act_skbedit can be faster if we avoid using spinlocks:
       - patch 1 converts act_skbedit statistics to use per-cpu counters
       - patch 2 lets act_skbedit use RCU to read/update its configuration
      
      test procedure (using pktgen from https://github.com/netoptimizer):
      
       # ip link add name eth1 type dummy
       # ip link set dev eth1 up
       # tc qdisc add dev eth1 clsact
       # tc filter add dev eth1 egress matchall action skbedit priority c1a0:c1a0
       # for c in 1 2 4 ; do
       > ./pktgen_bench_xmit_mode_queue_xmit.sh -v -s 64 -t $c -n 5000000 -i eth1
       > done
      
      test results (avg. pps/thread)
      
        $c | before patch |  after patch | improvement
       ----+--------------+--------------+------------
         1 | 3917464 ± 3% | 4000458 ± 3% |  irrelevant
         2 | 3455367 ± 4% | 3953076 ± 1% |        +14%
         4 | 2496594 ± 2% | 3801123 ± 3% |        +52%
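
The improvement column can be re-derived from the two pps columns of the table above. The short sketch below does the arithmetic; the improvement_pct helper is introduced only for this illustration.

```python
def improvement_pct(before, after):
    """Relative change from before to after, in percent."""
    return (after / before - 1.0) * 100.0


# threads -> (avg pps/thread before, avg pps/thread after), from the table above
results = {
    1: (3917464, 4000458),
    2: (3455367, 3953076),
    4: (2496594, 3801123),
}
gains = {threads: round(improvement_pct(b, a)) for threads, (b, a) in results.items()}
```

The single-thread gain works out to about 2%, which is smaller than the reported ±3% spread and is consistent with the "irrelevant" entry.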
      
      v2: rebased on latest net-next
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ccb06fba
    • D
      net/sched: act_skbedit: don't use spinlock in the data path · c749cdda
      Davide Caratti committed
      use RCU instead of spin_{,un}lock_bh, to protect concurrent read/write on
      act_skbedit configuration. This reduces the effects of contention in the
      data path, in case multiple readers are present.
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c749cdda
    • D
      net/sched: skbedit: use per-cpu counters · 6f3dfb0d
      Davide Caratti committed
      use per-CPU counters, instead of sharing a single set of stats with all
      cores: this removes the need of spinlocks when stats are read/updated.
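
The data layout behind per-CPU counters can be sketched as follows. This only illustrates the idea (one slot per CPU, summed on read); the PerCpuCounter class is hypothetical and a Python list obviously does not model real CPU-local memory or the kernel's per-cpu API.

```python
class PerCpuCounter:
    """One slot per CPU: each CPU updates only its own slot, readers sum them."""

    def __init__(self, ncpus):
        self.slots = [0] * ncpus

    def add(self, cpu, value=1):
        # No lock is needed in this model: each slot has a single writer.
        self.slots[cpu] += value

    def read(self):
        # Readers pay the cost of walking all slots instead.
        return sum(self.slots)


packets = PerCpuCounter(ncpus=4)
packets.add(cpu=0)
packets.add(cpu=3, value=5)
total = packets.read()
```

The trade-off is a cheaper update path at the price of a slightly more expensive read, which suits statistics that are updated per packet but read rarely.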
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6f3dfb0d
    • A
      tcp: use monotonic timestamps for PAWS · cca9bab1
      Arnd Bergmann committed
      Using get_seconds() for timestamps is deprecated since it can lead
      to overflows on 32-bit systems. While the interface generally doesn't
      overflow until year 2106, the specific implementation of the TCP PAWS
      algorithm breaks in 2038 when the intermediate signed 32-bit timestamps
      overflow.
      
      A related problem is that the local timestamps in CLOCK_REALTIME form
      lead to unexpected behavior when settimeofday is called to set the system
      clock backwards or forwards by more than 24 days.
      
      While the first problem could be solved by using an overflow-safe method
      of comparing the timestamps, a nicer solution is to use a monotonic
      clocksource with ktime_get_seconds() that simply doesn't overflow (at
      least not until 136 years after boot) and that doesn't change during
      settimeofday().
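
The numbers quoted above, and the "overflow-safe method of comparing" that the message mentions as the alternative, can be checked with a short sketch. The before32 helper is written for this illustration (it mimics a signed 32-bit difference test) and is not code from the patch.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def before32(a, b):
    """True if 32-bit timestamp a is earlier than b, tolerating wraparound.
    Equivalent to testing (s32)(a - b) < 0 in C."""
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 0x80000000


# Signed 32-bit seconds since the 1970 epoch run out in this year.
signed_wrap_year = 1970 + 2**31 / SECONDS_PER_YEAR

# Unsigned 32-bit seconds since boot last this many years.
unsigned_span_years = 2**32 / SECONDS_PER_YEAR

# A timestamp taken just before the counter wraps is still "before" one taken after.
wrapped_ok = before32(0xFFFFFFF0, 0x00000010)
```

The wraparound-safe comparison only works while the two timestamps are less than 2^31 seconds apart, which is one reason a monotonic clock that never jumps is the simpler fix.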
      
      To make 32-bit and 64-bit architectures behave the same way here, and
      also save a few bytes in the tcp_options_received structure, I'm changing
      the type to a 32-bit integer, which is now safe on all architectures.
      
      Finally, the ts_recent_stamp field also (confusingly) gets used to store
      a jiffies value in tcp_synq_overflow()/tcp_synq_no_recent_overflow().
      This is currently safe, but changing the type to 32-bit requires
      some small changes there to keep it working.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cca9bab1
    • V
      net/tls: Use aead_request_alloc/free for request alloc/free · d2bdd268
      Vakul Garg committed
      Instead of kzalloc/free for aead_request allocation and free, use
      functions aead_request_alloc(), aead_request_free(). It ensures that
      any sensitive crypto material held in crypto transforms is securely
      erased from memory.
      Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
      Acked-by: Dave Watson <davejwatson@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2bdd268
    • P
      tc-testing: add geneve options in tunnel_key unit tests · cba54f9c
      Pieter Jansen van Vuuren committed
      Extend tc tunnel_key action unit tests with geneve options. Tests
      include testing single and multiple geneve options, as well as
      testing geneve options that are expected to fail.
      Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Acked-by: Lucas Bates <lucasb@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cba54f9c