  1. 06 Sep, 2017: 23 commits
    • vhost_net: correctly check tx avail during rx busy polling · 8b949bef
      Jason Wang committed
      We used to check tx avail through vhost_enable_notify(), which is
      wrong since it only checks whether the guest has filled more
      available buffers since the last avail idx synchronization, which was
      just done by vhost_vq_avail_empty() before. What we really want is to
      check for pending buffers in the avail ring. Fix this by calling
      vhost_vq_avail_empty() instead.
      
      This issue can be observed by running the netperf TCP_RR benchmark with
      the guest (but not the host) as the client. With this fix, TCP_RR from
      guest to localhost recovers from 1375.91 trans per sec to 55235.28 trans
      per sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).
      
      Fixes: 03088137 ("vhost_net: basic polling support")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
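A minimal userspace model of the two checks, under stated assumptions: `struct vq_model`, `avail_empty()` and `new_since_sync()` are hypothetical stand-ins, not the vhost API. They only mirror the index comparisons described above: the guest's avail idx, the host's cached copy from the last synchronization, and the host's last consumed index.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a virtqueue's avail-ring indices. */
struct vq_model {
    uint16_t guest_avail_idx; /* avail->idx as last written by the guest */
    uint16_t avail_idx;       /* host's cached copy from the last sync */
    uint16_t last_avail_idx;  /* next avail entry the host will consume */
};

/* Modeled on vhost_vq_avail_empty(): true only if no buffers are pending.
 * Re-reads (synchronizes) the guest index when the cache looks empty. */
static bool avail_empty(struct vq_model *vq)
{
    if (vq->avail_idx != vq->last_avail_idx)
        return false;                    /* cached entries still pending */
    vq->avail_idx = vq->guest_avail_idx; /* synchronize with the guest */
    return vq->avail_idx == vq->last_avail_idx;
}

/* Modeled on the comparison in vhost_enable_notify(): it only reports
 * buffers added *since the last synchronization*, not pending ones. */
static bool new_since_sync(const struct vq_model *vq)
{
    return vq->guest_avail_idx != vq->avail_idx;
}
```

With three buffers posted and none consumed, `avail_empty()` synchronizes and reports work; right after that, `new_since_sync()` returns false even though all three buffers are still pending, which is the wrong "nothing to send" answer the commit describes.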
    • net: mdio-mux: add mdio_mux parameter to mdio_mux_init() · 5482a978
      Corentin Labbe committed
      mdio_mux_init() uses the parameter dev for two distinct things:
      1) Provide a device for all devm_ functions
      2) Get the device_node from it
      
      Since these are two distinct purposes, this patch adds a parameter
      mdio_mux that is linked to task 2.
      
      This will also permit registering an of_node mdio-mux that lacks a direct
      owning device, for example an mdio-mux which is a subnode of a real device.
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Make service connection lookup always check for retry · fdade4f6
      David Howells committed
      When an RxRPC service packet comes in, the target connection is looked up
      by an rb-tree search under RCU and a read-locked seqlock; the seqlock retry
      check is, however, currently skipped if we got a match, but probably
      shouldn't be in case the connection we found gets replaced whilst we're
      doing a search.
      
      Make the lookup procedure always go through need_seqretry(), even if the
      lookup was successful.  This makes sure we always pick up on a write-lock
      event.
      
      On the other hand, since we don't take a ref on the object, but rely on RCU
      to prevent its destruction after dropping the seqlock, I'm not sure this is
      necessary.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
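The retry rule can be sketched with a simplified userspace seqcount built on C11 atomics. This is an assumption-laden model, not the kernel's `read_seqbegin_or_lock()`/`need_seqretry()` API: `seq_begin()`, `seq_retry()` and the array-backed `lookup()` are hypothetical. What it shows is that the retry check runs on every pass, including the pass that found a match.

```c
#include <stdatomic.h>
#include <stddef.h>

struct conn { unsigned int id; };

#define TABLE_SIZE 4

/* Writers (not shown) increment seq before and after an update, so it is
 * odd while an update is in progress. */
static atomic_uint seq;
static struct conn *table[TABLE_SIZE];

static unsigned int seq_begin(void)
{
    unsigned int s;

    /* Wait out an in-progress writer. */
    while ((s = atomic_load_explicit(&seq, memory_order_acquire)) & 1u)
        ;
    return s;
}

static int seq_retry(unsigned int start)
{
    /* Order the data reads above before re-reading the sequence. */
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&seq, memory_order_relaxed) != start;
}

/* Single-threaded illustration: a real concurrent reader would also need
 * atomic (or RCU-protected) loads of the table slots. */
static struct conn *lookup(unsigned int id)
{
    struct conn *found;
    unsigned int start;

    do {
        start = seq_begin();
        found = NULL;
        for (size_t i = 0; i < TABLE_SIZE; i++) {
            if (table[i] && table[i]->id == id) {
                found = table[i];
                break; /* leave the scan, but NOT the retry check */
            }
        }
    } while (seq_retry(start)); /* checked on hits and misses alike */
    return found;
}
```

Jumping straight out of the loop on a hit would skip `seq_retry()` and could return an entry that a concurrent writer was replacing; as the commit notes, whether that matters depends on how the object's lifetime is protected afterwards.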
    • net: stmmac: Delete dead code for MDIO registration · 5e369aef
      Romain Perier committed
      This code is no longer used; the logging function was changed by commit
      fbca1647 ("net: stmmac: Use the right logging function in stmmac_mdio_register").
      It previously showed the type of the IRQ: polled, ignored or a normal
      interrupt. As we don't want to lose this information, I have moved this
      code to phy_attached_print().
      
      Fixes: fbca1647 ("net: stmmac: Use the right logging function in stmmac_mdio_register")
      Signed-off-by: Romain Perier <romain.perier@collabora.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix Tx flow control deactivation · 5d621672
      Claudiu Manoil committed
      The wrong register is checked for the Tx flow control bit;
      it should have been maccfg1, not maccfg2.
      This probably went unnoticed for so long because the impact is
      hardly visible, not to mention the tangled code from adjust_link().
      First, link flow control (i.e. handling of Rx/Tx link level pause frames)
      is disabled by default (it needs to be enabled via 'ethtool -A').
      Second, maccfg2 always returns 0 for tx_flow_oldval (except for a few
      old boards), which results in Tx flow control always remaining on
      once activated.
      
      Fixes: 45b679c9 ("gianfar: Implement PAUSE frame generation support")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
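A small register model of the bug. It assumes a Tx flow-control bit value of 0x10 in MACCFG1 (an assumption of this sketch) and uses hypothetical `struct regs_model` and `update_tx_flow()` names. Reading the old value from `maccfg2` always yields 0 here, so a request to disable Tx flow control looks like "no change" and the bit stays set.

```c
#include <stdbool.h>
#include <stdint.h>

#define MACCFG1_TX_FLOW 0x00000010u /* assumed bit position */

struct regs_model {
    uint32_t maccfg1; /* holds the Tx flow-control enable bit */
    uint32_t maccfg2; /* unrelated register; the bit is normally clear here */
};

/* Apply a new Tx flow-control setting only if it differs from the old one.
 * 'check_maccfg1' selects the fixed (true) or buggy (false) old-value read. */
static void update_tx_flow(struct regs_model *r, bool enable, bool check_maccfg1)
{
    uint32_t oldval = (check_maccfg1 ? r->maccfg1 : r->maccfg2) & MACCFG1_TX_FLOW;
    uint32_t newval = enable ? MACCFG1_TX_FLOW : 0u;

    if (oldval != newval)
        r->maccfg1 = (r->maccfg1 & ~MACCFG1_TX_FLOW) | newval;
}
```

Enabling works with either read (old 0, new set), which is why the bug only shows up on the later disable path.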
    • cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 · ef18e3b9
      Ganesh Goudar committed
      MPS_TX_INT_CAUSE[Bubble] is a normal condition for T6, hence
      ignore this interrupt for T6.
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Fix pause frame count in t4_get_port_stats · 2de489f4
      Ganesh Goudar committed
      MPS_STAT_CTL[CountPauseStatTx] and MPS_STAT_CTL[CountPauseStatRx]
      only control whether or not Pause Frames will be counted as part
      of the 64-Byte Tx/Rx Frame counters.  These bits do not control
      whether Pause Frames are counted in the Total Tx/Rx Frames/Bytes
      counters.
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: fix memory leak · 128416ac
      Ganesh Goudar committed
      Do not reuse the loop counter that is used to iterate over
      the ports, so that sched_tbl is freed for all the ports.
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
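The bug pattern is a nested loop that reuses the outer loop's counter. Below is a hypothetical sketch of the corrected shape; `struct port_model` and `free_sched_tables()` are illustrative names, not cxgb4 code. The inner loop gets its own counter, so the outer loop visits every port and frees each port's `sched_tbl`.

```c
#include <stdlib.h>

struct port_model {
    int nqsets;     /* number of queue sets on this port */
    int *sched_tbl; /* per-port table that must be freed */
};

/* Frees every port's sched_tbl; returns how many tables were freed. */
static int free_sched_tables(struct port_model *ports, int nports)
{
    int freed = 0;

    for (int i = 0; i < nports; i++) {
        /* Separate inner counter 'j': reusing 'i' here would clobber the
         * outer loop, so later ports would be skipped and leak. */
        for (int j = 0; j < ports[i].nqsets; j++) {
            /* per-queue cleanup would go here */
        }
        if (ports[i].sched_tbl) {
            free(ports[i].sched_tbl);
            ports[i].sched_tbl = NULL;
            freed++;
        }
    }
    return freed;
}
```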
    • tun: rename generic_xdp to skb_xdp · 1cfe6e93
      Jason Wang committed
      Rename "generic_xdp" to "skb_xdp" to avoid confusing it with the
      generic XDP which will be done at netif_receive_skb().
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: reserve extra headroom only when XDP is set · 7df13219
      Jason Wang committed
      We reserve headroom unconditionally, which could cause unnecessary
      stress on socket memory accounting because of increased truesize. Fix
      this by only reserving extra headroom when XDP is set.
      
      Cc: Jakub Kicinski <kubakici@wp.pl>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
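A sketch of the conditional padding under hypothetical constants: `BASE_PAD` stands in for the normal skb padding, and `XDP_HEADROOM` is 256 bytes (the value of the kernel's XDP_PACKET_HEADROOM). The helpers `rx_headroom()` and `rx_buflen()` are illustrative only; the real tun code computes this inside its receive path.

```c
#include <stdbool.h>
#include <stddef.h>

#define BASE_PAD     64u  /* hypothetical baseline padding */
#define XDP_HEADROOM 256u /* headroom an XDP program may use */

/* Reserve the extra XDP headroom only when a program is attached, so the
 * buffer (and hence the truesize charged to the socket) stays smaller. */
static size_t rx_headroom(bool xdp_attached)
{
    return BASE_PAD + (xdp_attached ? XDP_HEADROOM : 0u);
}

static size_t rx_buflen(size_t payload_len, bool xdp_attached)
{
    return rx_headroom(xdp_attached) + payload_len;
}
```

Under these assumed numbers, a 1500-byte payload needs a 1564-byte buffer without XDP instead of 1820 bytes.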
    • Merge branch 'dsa-tx-queues' · 9e776f22
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      net: dsa: Allow switch drivers to indicate number of TX queues
      
      This patch series extracts the parts of the patch set that are likely not to be
      controversial and that actually bring multi-queue support to DSA-created network
      devices.
      
      With these patches, we can now use sch_multiq as documented under
      Documentation/networking/multique.txt and let applications decide which switch
      port output queue they want to use. Currently only Broadcom tags utilize that
      information.
      
      Resending based on David's feedback regarding the patches not in patchwork.
      
      Changes in v2:
      - use a proper define for the number of TX queues in bcm_sf2.c (Andrew)
      
      Changes from RFC:
      
      - dropped the ability to configure RX queues since we don't do anything with
        those just yet
      - dropped the patches that dealt with binding the DSA slave network devices'
        queues with their master network devices' queues; this will be worked on
        separately.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping · c837fc81
      Florian Fainelli committed
      Even though TC2QOS mapping is for switch egress queues, we need to
      configure it correctly in order for the Broadcom tag ingress (CPU ->
      switch) queue selection to work, since there is a 1:1 mapping
      between switch egress queues and ingress queues.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Advertise number of egress queues · 18118377
      Florian Fainelli committed
      The switch supports 8 egress queues per port, so advertise that number so
      that net/dsa/slave.c::dsa_slave_create can allocate the right number of TX
      queues. While at it, use SF2_NUM_EGRESS_QUEUE as a define for the number of
      queues we support.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_brcm: Set output queue from skb queue mapping · 0f15b098
      Florian Fainelli committed
      We originally used skb->priority but that was not quite correct as this
      bitfield needs to contain the egress switch queue we intend to send this
      SKB to.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
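A sketch of where the queue ends up. It assumes the Broadcom tag layout used by net/dsa/tag_brcm.c at the time: opcode in bits 7:5 of the first tag byte and the 3-bit ingress traffic class (the switch queue) in bits 4:2. The constants below restate that assumption; `struct skb_model` and `brcm_tag_byte0()` are hypothetical.

```c
#include <stdint.h>

#define BRCM_OPCODE_SHIFT 5
#define BRCM_IG_TC_SHIFT  2
#define BRCM_IG_TC_MASK   0x7

/* Hypothetical subset of an skb: only the two fields discussed. */
struct skb_model {
    uint32_t priority;      /* socket/QoS priority: not a switch queue */
    uint16_t queue_mapping; /* TX queue chosen on the DSA slave device */
};

/* First tag byte: opcode 1 plus the egress switch queue, now taken from
 * queue_mapping rather than priority. */
static uint8_t brcm_tag_byte0(const struct skb_model *skb)
{
    uint16_t queue = skb->queue_mapping;

    return (uint8_t)((1u << BRCM_OPCODE_SHIFT) |
                     ((queue & BRCM_IG_TC_MASK) << BRCM_IG_TC_SHIFT));
}
```

Because only three bits are available, queue numbers are masked to 0..7, which matches the eight egress queues per port advertised by bcm_sf2.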
    • net: dsa: Allow switch drivers to indicate number of TX queues · 55199df6
      Florian Fainelli committed
      Let switch drivers indicate how many TX queues they support. Some
      switches, such as Broadcom Starfighter 2, are designed with 8 egress
      queues. Future changes will allow us to leverage the queue mapping and
      direct the transmission towards a particular queue.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • f1c2eddf
    • net/mlx4_core: Use ARRAY_SIZE macro · 691223ec
      Thomas Meyer committed
      Use ARRAY_SIZE macro, rather than explicitly coding some variant of it
      yourself.
      Found with:
      find -type f -name "*.c" -o -name "*.h" | xargs perl -p -i -e 's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\/\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)/ARRAY_SIZE(\1)/g'
      and manual check/verification.
      Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
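What the macro replaces, as a minimal sketch. The definition below is a simplified form of the kernel's `ARRAY_SIZE`, which additionally rejects pointers at compile time via `__must_be_array()`; this plain version does not. `port_types` is a hypothetical table.

```c
#include <stddef.h>

/* Simplified ARRAY_SIZE: element count of a true array (not a pointer). */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const char *const port_types[] = { "none", "fibre", "copper", "da" };

/* The open-coded form that the perl substitution above rewrites. */
static size_t count_open_coded(void)
{
    return sizeof(port_types) / sizeof(port_types[0]);
}

/* The same count after the rewrite. */
static size_t count_with_macro(void)
{
    return ARRAY_SIZE(port_types);
}
```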
    • Merge branch 'flow_dissector-fixes' · c4492d8a
      David S. Miller committed
      Tom Herbert says:
      
      ====================
      flow_dissector: Flow dissector fixes
      
      This patch set fixes some basic issues with the __skb_flow_dissect function.
      
      Items addressed:
        - Cleanup control flow in the function; in particular eliminate a
          bunch of goto's and implement a simplified control flow model
        - Add limits for number of encapsulations and headers that can be
          dissected
      
      v2:
        - Simplify the logic for limits on flow dissection. Just set the
          limit based on the number of headers the flow dissector can
          process. The accounted headers include encapsulation headers,
          extension headers, or other shim headers.
      
      Tested:
      
      Ran normal traffic, GUE, and VXLAN traffic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Add limit for number of headers to dissect · 1eed4dfb
      Tom Herbert committed
      In the flow dissector there are no limits to the number of nested
      encapsulations or headers that might be dissected, which makes for a
      nice DOS attack. This patch sets a limit on the number of headers
      that the flow dissector will parse.
      
      Headers include network layer headers, transport layer headers, shim
      headers for encapsulation, IPv6 extension headers, etc. The limit for
      the maximum number of headers to parse has been set to fifteen to account
      for a reasonable number of encapsulations, extension headers, and VLAN
      tags in a packet. Note that this limit does not supersede the STOP_AT_*
      flags, which may stop processing before the headers limit is reached.
      Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
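The limit amounts to a counter bumped once per parsed header. A minimal sketch, assuming a hypothetical `dissect()` that models a packet only by its number of stacked headers; `flow_dissect_allowed()` illustrates the idea of the patch's check rather than quoting it, with the limit of fifteen taken from the message above.

```c
#include <stdbool.h>

#define MAX_FLOW_DISSECT_HDRS 15

/* Count one more header; false once the limit is exceeded. */
static bool flow_dissect_allowed(int *num_hdrs)
{
    ++*num_hdrs;
    return *num_hdrs <= MAX_FLOW_DISSECT_HDRS;
}

/* Hypothetical packet model: 'stacked_hdrs' nested headers (encapsulations,
 * extension headers, VLAN tags, ...). Returns how many were parsed. */
static int dissect(int stacked_hdrs)
{
    int num_hdrs = 0;
    int parsed = 0;

    while (parsed < stacked_hdrs) {
        if (!flow_dissect_allowed(&num_hdrs))
            break; /* too many headers: stop instead of looping on */
        parsed++;
    }
    return parsed;
}
```

A crafted packet with a thousand nested headers is cut off after fifteen, bounding the work per packet.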
    • flow_dissector: Cleanup control flow · 3a1214e8
      Tom Herbert committed
      __skb_flow_dissect is riddled with gotos that make discerning the flow,
      debugging, and extending the capability difficult. This patch
      reorganizes things so that we only perform gotos after the two main
      switch statements (no gotos within the cases now). It also eliminates
      several goto labels so that there are only two labels that can be
      targets of a goto.
      Reported-by: Alexander Popov <alex.popov@linux.com>
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: ti/knav_dma: include dmaengine header · 2c08ab3f
      Arnd Bergmann committed
      A header file cleanup apparently caused a build regression
      with one driver using the knav infrastructure:
      
      In file included from drivers/net/ethernet/ti/netcp_core.c:30:0:
      include/linux/soc/ti/knav_dma.h:129:30: error: field 'direction' has incomplete type
        enum dma_transfer_direction direction;
                                    ^~~~~~~~~
      drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_txpipe_open':
      drivers/net/ethernet/ti/netcp_core.c:1349:21: error: 'DMA_MEM_TO_DEV' undeclared (first use in this function); did you mean 'DMA_MEMORY_MAP'?
        config.direction = DMA_MEM_TO_DEV;
                           ^~~~~~~~~~~~~~
                           DMA_MEMORY_MAP
      drivers/net/ethernet/ti/netcp_core.c:1349:21: note: each undeclared identifier is reported only once for each function it appears in
      drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_setup_navigator_resources':
      drivers/net/ethernet/ti/netcp_core.c:1659:22: error: 'DMA_DEV_TO_MEM' undeclared (first use in this function); did you mean 'DMA_DESC_HOST'?
        config.direction  = DMA_DEV_TO_MEM;
      
      As the header is no longer included implicitly through netdevice.h,
      we should include it in the header that references the enum.
      
      Fixes: 0dd5759d ("net: remove dmaengine.h inclusion from netdevice.h")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references · fd0c88b7
      Arnd Bergmann committed
      We get a new link error in allmodconfig kernels after ftgmac100
      started using the ncsi helpers:
      
      ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      
      Related to that, we get another error when CONFIG_NET_NCSI is disabled:
      
      drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'?
      drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'?
      
      This fixes both problems at once, using a 'static inline' stub helper
      for the disabled case, and exporting the functions when they are present.
      
      Fixes: 51564585 ("ftgmac100: Support NCSI VLAN filtering when available")
      Fixes: 21acf630 ("net/ncsi: Configure VLAN tag filter")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
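The stub pattern the fix describes, as a self-contained userspace sketch. `CONFIG_NET_NCSI` is left undefined here to model the disabled case. `struct net_device`, `struct ops_model` and the `uint16_t` parameter types are simplified stand-ins for the kernel's types (assumptions of this sketch). When the option is enabled, only prototypes are visible and the real, exported definitions are linked in.

```c
#include <stdint.h>

struct net_device; /* opaque, as seen by a driver */

#ifdef CONFIG_NET_NCSI
/* Enabled: real definitions live elsewhere and must be exported so that
 * modules such as ftgmac100 can link against them. */
int ncsi_vlan_rx_add_vid(struct net_device *dev, uint16_t proto, uint16_t vid);
int ncsi_vlan_rx_kill_vid(struct net_device *dev, uint16_t proto, uint16_t vid);
#else
/* Disabled: static inline stubs keep callers compiling and linking. */
static inline int ncsi_vlan_rx_add_vid(struct net_device *dev,
                                       uint16_t proto, uint16_t vid)
{
    (void)dev; (void)proto; (void)vid;
    return 0;
}

static inline int ncsi_vlan_rx_kill_vid(struct net_device *dev,
                                        uint16_t proto, uint16_t vid)
{
    (void)dev; (void)proto; (void)vid;
    return 0;
}
#endif

/* Like a net_device_ops table: the helpers are used as initializers, which
 * is why the identifiers must exist even when the feature is off. */
struct ops_model {
    int (*add_vid)(struct net_device *, uint16_t, uint16_t);
    int (*kill_vid)(struct net_device *, uint16_t, uint16_t);
};

static const struct ops_model demo_ops = {
    .add_vid  = ncsi_vlan_rx_add_vid,
    .kill_vid = ncsi_vlan_rx_kill_vid,
};
```

Taking the address of a `static inline` function is valid C, so the same ops table builds in both configurations.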
    • bpf: fix numa_node validation · 96e5ae4e
      Eric Dumazet committed
      syzkaller reported crashes in bpf map creation or map update [1]
      
      The problem is that nr_node_ids is a signed integer and
      NUMA_NO_NODE is also an integer, so it is very tempting
      to declare numa_node as a signed integer.
      
      This means the typical test to validate a user-provided value:
      
              if (numa_node != NUMA_NO_NODE &&
                  (numa_node >= nr_node_ids ||
                   !node_online(numa_node)))
      
      must be written as:
      
              if (numa_node != NUMA_NO_NODE &&
                  ((unsigned int)numa_node >= nr_node_ids ||
                   !node_online(numa_node)))
      
      [1]
      kernel BUG at mm/slab.c:3256!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 2946 Comm: syzkaller916108 Not tainted 4.13.0-rc7+ #35
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d2bc60c0 task.stack: ffff8801c0c90000
      RIP: 0010:____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292
      RSP: 0018:ffff8801c0c97638 EFLAGS: 00010096
      RAX: ffffffffffff8b7b RBX: 0000000001080220 RCX: 0000000000000000
      RDX: 00000000ffff8b7b RSI: 0000000001080220 RDI: ffff8801dac00040
      RBP: ffff8801c0c976c0 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff8801c0c97620 R11: 0000000000000001 R12: ffff8801dac00040
      R13: ffff8801dac00040 R14: 0000000000000000 R15: 00000000ffff8b7b
      FS:  0000000002119940(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020001fec CR3: 00000001d2980000 CR4: 00000000001406f0
      Call Trace:
       __do_kmalloc_node mm/slab.c:3688 [inline]
       __kmalloc_node+0x33/0x70 mm/slab.c:3696
       kmalloc_node include/linux/slab.h:535 [inline]
       alloc_htab_elem+0x2a8/0x480 kernel/bpf/hashtab.c:740
       htab_map_update_elem+0x740/0xb80 kernel/bpf/hashtab.c:820
       map_update_elem kernel/bpf/syscall.c:587 [inline]
       SYSC_bpf kernel/bpf/syscall.c:1468 [inline]
       SyS_bpf+0x20c5/0x4c40 kernel/bpf/syscall.c:1443
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x440409
      RSP: 002b:00007ffd1f1792b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440409
      RDX: 0000000000000020 RSI: 0000000020006000 RDI: 0000000000000002
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401d70
      R13: 0000000000401e00 R14: 0000000000000000 R15: 0000000000000000
      Code: 83 c2 01 89 50 18 4c 03 70 08 e8 38 f4 ff ff 4d 85 f6 0f 85 3e ff ff ff 44 89 fe 4c 89 ef e8 94 fb ff ff 49 89 c6 e9 2b ff ff ff <0f> 0b 0f 0b 0f 0b 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41
      RIP: ____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: ffff8801c0c97638
      ---[ end trace d745f355da2e33ce ]---
      Kernel panic - not syncing: Fatal exception
      
      Fixes: 96eabe7a ("bpf: Allow selecting numa node during map creation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
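A userspace sketch of why the cast matters, assuming a hypothetical two-node machine. `nr_node_ids`, `node_online()` (modeled as a bit test on a mask) and `NUMA_NO_NODE` are local stand-ins for the kernel's. Because the bit test is only safe for 0 <= node < nr_node_ids, the range check has to reject negative values before `node_online()` runs.

```c
#include <errno.h>
#include <stdbool.h>

#define NUMA_NO_NODE (-1)

static int nr_node_ids = 2;                     /* hypothetical: nodes 0, 1 */
static const unsigned long online_mask = 0x3ul; /* both nodes online */

/* Bit test: only meaningful for 0 <= node < nr_node_ids. */
static bool node_online(int node)
{
    return (online_mask >> node) & 1ul;
}

/* Signed range check alone: negative values such as -2 slip through. */
static bool range_ok_signed(int numa_node)
{
    return numa_node == NUMA_NO_NODE || numa_node < nr_node_ids;
}

/* The corrected validation from the commit message. */
static int validate_numa_node(int numa_node)
{
    if (numa_node != NUMA_NO_NODE &&
        ((unsigned int)numa_node >= (unsigned int)nr_node_ids ||
         !node_online(numa_node)))
        return -EINVAL;
    return 0;
}
```

With the cast, -2 becomes a huge unsigned value, fails the range test, and `node_online()` is never evaluated for it thanks to short-circuiting.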
  2. 05 Sep, 2017: 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 2ff81cd3
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for next-net (part 2)
      
      The following patchset contains Netfilter updates for net-next. This
      patchset includes updates for nf_tables, removal of
      CONFIG_NETFILTER_DEBUG and a new mode for xt_hashlimit. More
      specifically, they:
      
      1) Add a new rate match mode for hashlimit; this introduces a new revision
         for this match. The idea is to stop matching packets until the ratelimit
         criteria hold true. Patch from Vishwanath Pai.
      
      2) Add ->select_ops indirection to nf_tables named objects, so we can
         choose between different flavours of the same object type, patch from
         Pablo M. Bermudo.
      
      3) Shorter function names in nft_limit, basically:
         s/nft_limit_pkt_bytes/nft_limit_bytes, also from Pablo M. Bermudo.
      
      4) Add new stateful limit named object type, this allows us to create
         limit policies that you can identify via name, also from Pablo.
      
      5) Remove unused hooknum parameter in conntrack ->packet indirection.
         From Florian Westphal.
      
      6) Patches to remove CONFIG_NETFILTER_DEBUG and macros such as
         IP_NF_ASSERT and IP_NF_ASSERT. From Varsha Rao.
      
      7) Add nf_tables_updchain() helper function and use it from
         nf_tables_newchain() to make it more maintainable. Similarly,
         add nf_tables_addchain() and use it too.
      
      8) Add new netlink NLM_F_NONREC flag, this flag should only be used for
         deletion requests, specifically, to support non-recursive deletion.
         Based on what we discussed during NFWS'17 in Faro.
      
      9) Use NLM_F_NONREC from table and sets in nf_tables.
      
      10) Support for recursive chain deletion. Table and set deletion
          commands come with an implicit content flush on deletion, while
          chains do not. This patch addresses this inconsistency by adding
          the code to perform recursive chain deletions. This also comes with
          the bits to deal with the new NLM_F_NONREC netlink flag.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Sep, 2017: 16 commits