1. 06 Sep, 2017 25 commits
    • rds: Fix non-atomic operation on shared flag variable · f530f39f
      Committed by Håkon Bugge
      The bits in m_flags in struct rds_message are used for a plurality of
      reasons, and from different contexts. To avoid any missing updates to
      m_flags, use the atomic set_bit() instead of the non-atomic equivalent.
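A minimal userspace sketch of the difference, assuming C11 atomics as a stand-in for the kernel's set_bit(). The helper names and bit numbers are hypothetical and are not taken from the RDS code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit numbers for illustration; not the real RDS_MSG_* values. */
enum { DEMO_BIT_ON_SOCK = 1, DEMO_BIT_ACK_REQUIRED = 3 };

/* Plain read-modify-write: two contexts doing this concurrently can
 * overwrite each other's update to a different bit in the same word. */
static void demo_set_bit_plain(unsigned long *flags, unsigned int nr)
{
    *flags |= 1UL << nr;
}

/* Atomic read-modify-write: the OR is applied as one indivisible step. */
static void demo_set_bit_atomic(atomic_ulong *flags, unsigned int nr)
{
    atomic_fetch_or(flags, 1UL << nr);
}

/* Single-threaded check that both helpers set the same bits. */
static unsigned long demo_set_two_bits(void)
{
    unsigned long plain = 0;
    atomic_ulong shared;

    atomic_init(&shared, 0UL);
    demo_set_bit_plain(&plain, DEMO_BIT_ON_SOCK);
    demo_set_bit_plain(&plain, DEMO_BIT_ACK_REQUIRED);
    demo_set_bit_atomic(&shared, DEMO_BIT_ON_SOCK);
    demo_set_bit_atomic(&shared, DEMO_BIT_ACK_REQUIRED);
    assert(plain == atomic_load(&shared));
    return atomic_load(&shared);
}
```

Both helpers give the same result when run from one thread; only the atomic variant keeps that guarantee when several contexts update the word at once.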
      Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
      Reviewed-by: Knut Omang <knut.omang@oracle.com>
      Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: don't use GFP_KERNEL under spin lock · 2c8468dc
      Committed by Jakub Kicinski
      The new TC IDR code uses GFP_KERNEL under spin lock.  Which leads
      to:
      
      [  582.621091] BUG: sleeping function called from invalid context at ../mm/slab.h:416
      [  582.629721] in_atomic(): 1, irqs_disabled(): 0, pid: 3379, name: tc
      [  582.636939] 2 locks held by tc/3379:
      [  582.641049]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff910354ce>] rtnetlink_rcv_msg+0x92e/0x1400
      [  582.650958]  #1:  (&(&tn->idrinfo->lock)->rlock){+.-.+.}, at: [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0
      [  582.662217] Preemption disabled at:
      [  582.662222] [<ffffffff9110a5e0>] tcf_idr_create+0x2f0/0x8e0
      [  582.672592] CPU: 9 PID: 3379 Comm: tc Tainted: G        W       4.13.0-rc7-debug-00648-g43503a79b9f0 #287
      [  582.683432] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.3.4 11/08/2016
      [  582.691937] Call Trace:
      ...
      [  582.742460]  kmem_cache_alloc+0x286/0x540
      [  582.747055]  radix_tree_node_alloc.constprop.6+0x4a/0x450
      [  582.753209]  idr_get_free_cmn+0x627/0xf80
      ...
      [  582.815525]  idr_alloc_cmn+0x1a8/0x270
      ...
      [  582.833804]  tcf_idr_create+0x31b/0x8e0
      ...
      
      Try to preallocate the memory with idr_prealloc(GFP_KERNEL)
      (as suggested by Eric Dumazet), and change the allocation
      flags under spin lock.
      
      Fixes: 65a206c0 ("net/sched: Change act_api and act_xxx modules to use IDR")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost_net: correctly check tx avail during rx busy polling · 8b949bef
      Committed by Jason Wang
      In the past we checked tx avail through vhost_enable_notify(), which is
      wrong: it only checks whether the guest has filled more available
      buffers since the last avail idx synchronization, which was just done
      by vhost_vq_avail_empty(). What we really want is to check for pending
      buffers in the avail ring. Fix this by calling vhost_vq_avail_empty()
      instead.
      
      This issue could be noticed by doing netperf TCP_RR benchmark as
      client from guest (but not host). With this fix, TCP_RR from guest to
      localhost restores from 1375.91 trans per sec to 55235.28 trans per
      sec on my laptop (Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz).
      
      Fixes: 03088137 ("vhost_net: basic polling support")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mdio-mux: add mdio_mux parameter to mdio_mux_init() · 5482a978
      Committed by Corentin Labbe
      mdio_mux_init() uses the parameter dev for two distinct things:
      1) Have a device for all devm_ functions
      2) Get device_node from it

      Since these are two distinct purposes, this patch adds a parameter mdio_mux
      that is linked to task 2.

      This will also permit registering an of_node mdio-mux that lacks a direct
      owning device, for example a mdio-mux which is a subnode of a real device.
      Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Make service connection lookup always check for retry · fdade4f6
      Committed by David Howells
      When an RxRPC service packet comes in, the target connection is looked up
      by an rb-tree search under RCU and a read-locked seqlock; the seqlock retry
      check is, however, currently skipped if we got a match, but probably
      shouldn't be in case the connection we found gets replaced whilst we're
      doing a search.
      
      Make the lookup procedure always go through need_seqretry(), even if the
      lookup was successful.  This makes sure we always pick up on a write-lock
      event.
      
      On the other hand, since we don't take a ref on the object, but rely on RCU
      to prevent its destruction after dropping the seqlock, I'm not sure this is
      necessary.
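A rough userspace sketch of the "always check for retry" idea, assuming a bare sequence counter built on C11 atomics and a single slot standing in for the rb-tree. All names are hypothetical, and the kernel's read_seqbegin_or_lock()/need_seqretry() helpers behave differently in detail (they can fall back to taking the lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_conn { unsigned int id; };

struct demo_table {
    atomic_uint seq;                    /* odd while a writer is active */
    _Atomic(struct demo_conn *) slot;   /* stands in for the rb-tree */
};

static void demo_table_init(struct demo_table *t)
{
    atomic_init(&t->seq, 0u);
    atomic_init(&t->slot, (struct demo_conn *)NULL);
}

/* Writer side: bump the counter before and after the update. */
static void demo_replace(struct demo_table *t, struct demo_conn *c)
{
    atomic_fetch_add(&t->seq, 1u);   /* now odd: update in progress */
    atomic_store(&t->slot, c);
    atomic_fetch_add(&t->seq, 1u);   /* even again: update complete */
}

/* Reader side: the retry check runs whether or not a match was found. */
static struct demo_conn *demo_lookup(struct demo_table *t, unsigned int id)
{
    struct demo_conn *c, *found;
    unsigned int start;

    assert(t != NULL);
    do {
        do {
            start = atomic_load(&t->seq);
        } while (start & 1u);            /* wait out an active writer */

        c = atomic_load(&t->slot);
        found = (c != NULL && c->id == id) ? c : NULL;

        /* No early return on a hit: a changed counter forces a new pass. */
    } while (atomic_load(&t->seq) != start);

    return found;
}
```

The point of the sketch is only the loop shape: a successful match does not bypass the sequence comparison.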
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: Delete dead code for MDIO registration · 5e369aef
      Committed by Romain Perier
      This code is no longer used, the logging function was changed by commit
      fbca1647 ("net: stmmac: Use the right logging function in stmmac_mdio_register").
      It was previously showing information about the type of the IRQ, if it's
      polled, ignored or a normal interrupt. As we don't want information loss,
      I have moved this code to phy_attached_print().
      
      Fixes: fbca1647 ("net: stmmac: Use the right logging function in stmmac_mdio_register")
      Signed-off-by: Romain Perier <romain.perier@collabora.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: Fix Tx flow control deactivation · 5d621672
      Committed by Claudiu Manoil
      The wrong register is checked for the Tx flow control bit,
      it should have been maccfg1 not maccfg2.
      This went unnoticed for so long probably because the impact is
      hardly visible, not to mention the tangled code from adjust_link().
      First, link flow control (i.e. handling of Rx/Tx link level pause frames)
      is disabled by default (needs to be enabled via 'ethtool -A').
      Secondly, maccfg2 always returns 0 for tx_flow_oldval (except for a few
      old boards), which results in Tx flow control remaining always on
      once activated.
      
      Fixes: 45b679c9 ("gianfar: Implement PAUSE frame generation support")
      Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6 · ef18e3b9
      Committed by Ganesh Goudar
      MPS_TX_INT_CAUSE[Bubble] is a normal condition for T6, hence
      ignore this interrupt for T6.
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: Fix pause frame count in t4_get_port_stats · 2de489f4
      Committed by Ganesh Goudar
      MPS_STAT_CTL[CountPauseStatTx] and MPS_STAT_CTL[CountPauseStatRx]
      only control whether or not Pause Frames will be counted as part
      of the 64-Byte Tx/Rx Frame counters.  These bits do not control
      whether Pause Frames are counted in the Total Tx/Rx Frames/Bytes
      counters.
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: Casey Leedom <leedom@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: fix memory leak · 128416ac
      Committed by Ganesh Goudar
      Do not reuse the loop counter that is used to iterate over
      the ports, so that sched_tbl will be freed for all the ports.
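A small sketch of the pattern, assuming a per-port table that owns a few inner allocations. The structure, sizes, and function names are hypothetical and do not mirror the cxgb4 source.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_NPORTS   4
#define DEMO_NCLASSES 3

/* Hypothetical stand-in for a per-port scheduler table. */
struct demo_sched {
    void *classes[DEMO_NCLASSES];
};

static int demo_setup(struct demo_sched *tbl[DEMO_NPORTS])
{
    for (int i = 0; i < DEMO_NPORTS; i++)
        tbl[i] = NULL;

    for (int i = 0; i < DEMO_NPORTS; i++) {
        tbl[i] = calloc(1, sizeof(*tbl[i]));
        if (!tbl[i])
            return -1;
        for (int j = 0; j < DEMO_NCLASSES; j++)
            tbl[i]->classes[j] = malloc(16);
    }
    return 0;
}

/* Frees every port's table and returns how many tables were freed. */
static int demo_cleanup(struct demo_sched *tbl[DEMO_NPORTS])
{
    int freed = 0;

    for (int i = 0; i < DEMO_NPORTS; i++) {
        if (!tbl[i])
            continue;
        /* The inner loop has its own counter j. Reusing i here would
         * leave i modified when the inner loop ends, so the outer loop
         * could skip ports and leave their tables allocated. */
        for (int j = 0; j < DEMO_NCLASSES; j++)
            free(tbl[i]->classes[j]);
        free(tbl[i]);
        tbl[i] = NULL;
        freed++;
    }
    return freed;
}
```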
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: rename generic_xdp to skb_xdp · 1cfe6e93
      Committed by Jason Wang
      Rename "generic_xdp" to "skb_xdp" to avoid confusing it with the
      generic XDP which will be done at netif_receive_skb().
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: reserve extra headroom only when XDP is set · 7df13219
      Committed by Jason Wang
      We reserve headroom unconditionally, which could cause unnecessary
      stress on socket memory accounting because of increased truesize. Fix
      this by only reserving extra headroom when XDP is set.
      
      Cc: Jakub Kicinski <kubakici@wp.pl>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dsa-tx-queues' · 9e776f22
      Committed by David S. Miller
      Florian Fainelli says:
      
      ====================
      net: dsa: Allow switch drivers to indicate number of TX queues
      
      This patch series extracts the parts of the patch set that are likely not to be
      controversial and actually bring multi-queue support to DSA-created network
      devices.

      With these patches, we can now use sch_multiq as documented under
      Documentation/networking/multiqueue.txt and let applications decide the switch
      port output queue they want to use. Currently only Broadcom tags utilize that
      information.
      
      Resending based on David's feedback regarding the patches not in patchwork.
      
      Changes in v2:
      - use a proper define for the number of TX queues in bcm_sf2.c (Andrew)
      
      Changes from RFC:
      
      - dropped the ability to configure RX queues since we don't do anything with
        those just yet
      - dropped the patches that dealt with binding the DSA slave network devices
        queues with their master network devices queues; this will be worked on
        separately.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping · c837fc81
      Committed by Florian Fainelli
      Even though TC2QOS mapping is for switch egress queues, we need to
      configure it correctly in order for the Broadcom tag ingress (CPU ->
      switch) queue selection to work correctly since there is a 1:1 mapping
      between switch egress queues and ingress queues.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: bcm_sf2: Advertise number of egress queues · 18118377
      Committed by Florian Fainelli
      The switch supports 8 egress queues per port, so indicate that such that
      net/dsa/slave.c::dsa_slave_create can allocate the right number of TX queues.
      While at it use SF2_NUM_EGRESS_QUEUE as a define for the number of queues we
      support.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_brcm: Set output queue from skb queue mapping · 0f15b098
      Committed by Florian Fainelli
      We originally used skb->priority but that was not quite correct as this
      bitfield needs to contain the egress switch queue we intend to send this
      SKB to.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Allow switch drivers to indicate number of TX queues · 55199df6
      Committed by Florian Fainelli
      Let switch drivers indicate how many TX queues they support. Some
      switches, such as Broadcom Starfighter 2 are designed with 8 egress
      queues. Future changes will allow us to leverage the queue mapping and
      direct the transmission towards a particular queue.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • f1c2eddf
    • net/mlx4_core: Use ARRAY_SIZE macro · 691223ec
      Committed by Thomas Meyer
      Use ARRAY_SIZE macro, rather than explicitly coding some variant of it
      yourself.
      Found with: find -type f -name "*.c" -o -name "*.h" | xargs perl -p -i -e
      's/\bsizeof\s*\(\s*(\w+)\s*\)\s*\ /\s*sizeof\s*\(\s*\1\s*\[\s*0\s*\]\s*\)
      /ARRAY_SIZE(\1)/g' and manual check/verification.
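For illustration, a userspace sketch of the open-coded form next to a macro form. DEMO_ARRAY_SIZE is a simplified stand-in defined here; the kernel's ARRAY_SIZE additionally rejects non-array arguments at build time. The array contents are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro, without the array type check. */
#define DEMO_ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

static const int demo_ports[] = { 1, 2, 3, 5, 8 };

/* Open-coded element count, the form the cleanup replaces. */
static size_t demo_count_open_coded(void)
{
    return sizeof(demo_ports) / sizeof(demo_ports[0]);
}

/* Same count expressed through the macro. */
static size_t demo_count_macro(void)
{
    return DEMO_ARRAY_SIZE(demo_ports);
}
```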
      Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'flow_dissector-fixes' · c4492d8a
      Committed by David S. Miller
      Tom Herbert says:
      
      ====================
      flow_dissector: Flow dissector fixes
      
      This patch set fixes some basic issues with __skb_flow_dissect function.
      
      Items addressed:
        - Cleanup control flow in the function; in particular eliminate a
          bunch of goto's and implement a simplified control flow model
        - Add limits for number of encapsulations and headers that can be
          dissected
      
      v2:
        - Simplify the logic for limits on flow dissection. Just set the
          limit based on the number of headers the flow dissector can
          process. The accounted headers include encapsulation headers,
          extension headers, or other shim headers.
      
      Tested:
      
      Ran normal traffic, GUE, and VXLAN traffic.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Add limit for number of headers to dissect · 1eed4dfb
      Committed by Tom Herbert
      In flow dissector there are no limits to the number of nested
      encapsulations or headers that might be dissected, which makes for a
      nice DoS attack. This patch sets a limit on the number of headers
      that flow dissector will parse.

      Headers include network layer headers, transport layer headers, shim
      headers for encapsulation, IPv6 extension headers, etc. The limit for
      the maximum number of headers to parse has been set to fifteen to account
      for a reasonable number of encapsulations, extension headers, and VLAN
      headers in a packet. Note that this limit does not supersede the STOP_AT_*
      flags, which may stop processing before the headers limit is reached.
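A toy sketch of a bounded parse loop, assuming each nested header costs one unit against a budget of fifteen. The function and macro names are hypothetical; this is not the __skb_flow_dissect code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_HDRS 15   /* the limit named in the commit message */

/* Walks up to nested_headers headers and returns how many were parsed.
 * *hit_limit is set when parsing stopped because the budget ran out. */
static int demo_dissect(int nested_headers, bool *hit_limit)
{
    int parsed = 0;

    *hit_limit = false;
    while (parsed < nested_headers) {
        if (parsed == DEMO_MAX_HDRS) {
            *hit_limit = true;   /* refuse to go deeper */
            break;
        }
        parsed++;                /* one more header accounted */
    }
    return parsed;
}
```

With this shape, a packet crafted with hundreds of nested encapsulations costs no more work than one with fifteen headers.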
      Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • flow_dissector: Cleanup control flow · 3a1214e8
      Committed by Tom Herbert
      __skb_flow_dissect is riddled with gotos that make discerning the flow,
      debugging, and extending the capability difficult. This patch
      reorganizes things so that we only perform goto's after the two main
      switch statements (no gotos within the cases now). It also eliminates
      several goto labels so that there are only two labels that can be the
      target of a goto.
      Reported-by: Alexander Popov <alex.popov@linux.com>
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • soc: ti/knav_dma: include dmaengine header · 2c08ab3f
      Committed by Arnd Bergmann
      A header file cleanup apparently caused a build regression
      with one driver using the knav infrastructure:
      
      In file included from drivers/net/ethernet/ti/netcp_core.c:30:0:
      include/linux/soc/ti/knav_dma.h:129:30: error: field 'direction' has incomplete type
        enum dma_transfer_direction direction;
                                    ^~~~~~~~~
      drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_txpipe_open':
      drivers/net/ethernet/ti/netcp_core.c:1349:21: error: 'DMA_MEM_TO_DEV' undeclared (first use in this function); did you mean 'DMA_MEMORY_MAP'?
        config.direction = DMA_MEM_TO_DEV;
                           ^~~~~~~~~~~~~~
                           DMA_MEMORY_MAP
      drivers/net/ethernet/ti/netcp_core.c:1349:21: note: each undeclared identifier is reported only once for each function it appears in
      drivers/net/ethernet/ti/netcp_core.c: In function 'netcp_setup_navigator_resources':
      drivers/net/ethernet/ti/netcp_core.c:1659:22: error: 'DMA_DEV_TO_MEM' undeclared (first use in this function); did you mean 'DMA_DESC_HOST'?
        config.direction  = DMA_DEV_TO_MEM;
      
      As the header is no longer included implicitly through netdevice.h,
      we should include it in the header that references the enum.
      
      Fixes: 0dd5759d ("net: remove dmaengine.h inclusion from netdevice.h")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references · fd0c88b7
      Committed by Arnd Bergmann
      We get a new link error in allmodconfig kernels after ftgmac100
      started using the ncsi helpers:
      
      ERROR: "ncsi_vlan_rx_kill_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      ERROR: "ncsi_vlan_rx_add_vid" [drivers/net/ethernet/faraday/ftgmac100.ko] undefined!
      
      Related to that, we get another error when CONFIG_NET_NCSI is disabled:
      
      drivers/net/ethernet/faraday/ftgmac100.c:1626:25: error: 'ncsi_vlan_rx_add_vid' undeclared here (not in a function); did you mean 'ncsi_start_dev'?
      drivers/net/ethernet/faraday/ftgmac100.c:1627:26: error: 'ncsi_vlan_rx_kill_vid' undeclared here (not in a function); did you mean 'ncsi_vlan_rx_add_vid'?
      
      This fixes both problems at once, using a 'static inline' stub helper
      for the disabled case, and exporting the functions when they are present.
      
      Fixes: 51564585 ("ftgmac100: Support NCSI VLAN filtering when available")
      Fixes: 21acf630 ("net/ncsi: Configure VLAN tag filter")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix numa_node validation · 96e5ae4e
      Committed by Eric Dumazet
      syzkaller reported crashes in bpf map creation or map update [1]
      
      Problem is that nr_node_ids is a signed integer,
      NUMA_NO_NODE is also an integer, so it is very tempting
      to declare numa_node as a signed integer.
      
      This means the typical test to validate a user provided value :
      
              if (numa_node != NUMA_NO_NODE &&
                  (numa_node >= nr_node_ids ||
                   !node_online(numa_node)))
      
      must be written :
      
              if (numa_node != NUMA_NO_NODE &&
                  ((unsigned int)numa_node >= nr_node_ids ||
                   !node_online(numa_node)))
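The two range checks can be compared in isolation with a userspace sketch. The helper names and DEMO_NUMA_NO_NODE are stand-ins defined here, and the node_online() part of the test is left out because it has no userspace equivalent.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NUMA_NO_NODE (-1)   /* stand-in for NUMA_NO_NODE */

/* Signed comparison: any negative value passes the "< nr_node_ids" test. */
static bool demo_node_ok_signed(int numa_node, int nr_node_ids)
{
    return numa_node == DEMO_NUMA_NO_NODE || numa_node < nr_node_ids;
}

/* Unsigned comparison: negative values wrap to huge numbers and fail. */
static bool demo_node_ok_unsigned(int numa_node, int nr_node_ids)
{
    return numa_node == DEMO_NUMA_NO_NODE ||
           (unsigned int)numa_node < (unsigned int)nr_node_ids;
}
```

A user-supplied value such as -2 is accepted by the signed variant and rejected by the unsigned one, which is the gap the cast closes.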
      
      [1]
      kernel BUG at mm/slab.c:3256!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 2946 Comm: syzkaller916108 Not tainted 4.13.0-rc7+ #35
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d2bc60c0 task.stack: ffff8801c0c90000
      RIP: 0010:____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292
      RSP: 0018:ffff8801c0c97638 EFLAGS: 00010096
      RAX: ffffffffffff8b7b RBX: 0000000001080220 RCX: 0000000000000000
      RDX: 00000000ffff8b7b RSI: 0000000001080220 RDI: ffff8801dac00040
      RBP: ffff8801c0c976c0 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff8801c0c97620 R11: 0000000000000001 R12: ffff8801dac00040
      R13: ffff8801dac00040 R14: 0000000000000000 R15: 00000000ffff8b7b
      FS:  0000000002119940(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020001fec CR3: 00000001d2980000 CR4: 00000000001406f0
      Call Trace:
       __do_kmalloc_node mm/slab.c:3688 [inline]
       __kmalloc_node+0x33/0x70 mm/slab.c:3696
       kmalloc_node include/linux/slab.h:535 [inline]
       alloc_htab_elem+0x2a8/0x480 kernel/bpf/hashtab.c:740
       htab_map_update_elem+0x740/0xb80 kernel/bpf/hashtab.c:820
       map_update_elem kernel/bpf/syscall.c:587 [inline]
       SYSC_bpf kernel/bpf/syscall.c:1468 [inline]
       SyS_bpf+0x20c5/0x4c40 kernel/bpf/syscall.c:1443
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x440409
      RSP: 002b:00007ffd1f1792b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440409
      RDX: 0000000000000020 RSI: 0000000020006000 RDI: 0000000000000002
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401d70
      R13: 0000000000401e00 R14: 0000000000000000 R15: 0000000000000000
      Code: 83 c2 01 89 50 18 4c 03 70 08 e8 38 f4 ff ff 4d 85 f6 0f 85 3e ff ff ff 44 89 fe 4c 89 ef e8 94 fb ff ff 49 89 c6 e9 2b ff ff ff <0f> 0b 0f 0b 0f 0b 66 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41
      RIP: ____cache_alloc_node+0x1d4/0x1e0 mm/slab.c:3292 RSP: ffff8801c0c97638
      ---[ end trace d745f355da2e33ce ]---
      Kernel panic - not syncing: Fatal exception
      
      Fixes: 96eabe7a ("bpf: Allow selecting numa node during map creation")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 05 Sep, 2017 1 commit
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 2ff81cd3
      Committed by David S. Miller
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for next-net (part 2)
      
      The following patchset contains Netfilter updates for net-next. This
      patchset includes updates for nf_tables, removal of
      CONFIG_NETFILTER_DEBUG and a new mode for xt_hashlimit. More
      specifically, they:
      
      1) Add new rate match mode for hashlimit, this introduces a new revision
         for this match. The idea is to stop matching packets until ratelimit
         criteria stands true. Patch from Vishwanath Pai.
      
      2) Add ->select_ops indirection to nf_tables named objects, so we can
         choose between different flavours of the same object type, patch from
         Pablo M. Bermudo.
      
      3) Shorter function names in nft_limit, basically:
         s/nft_limit_pkt_bytes/nft_limit_bytes, also from Pablo M. Bermudo.
      
      4) Add new stateful limit named object type, this allows us to create
         limit policies that you can identify via name, also from Pablo.
      
      5) Remove unused hooknum parameter in conntrack ->packet indirection.
         From Florian Westphal.
      
      6) Patches to remove CONFIG_NETFILTER_DEBUG and macros such as
         IP_NF_ASSERT and NF_CT_ASSERT. From Varsha Rao.
      
      7) Add nf_tables_updchain() helper function and use it from
         nf_tables_newchain() to make it more maintainable. Similarly,
         add nf_tables_addchain() and use it too.
      
      8) Add new netlink NLM_F_NONREC flag, this flag should only be used for
         deletion requests, specifically, to support non-recursive deletion.
         Based on what we discussed during NFWS'17 in Faro.
      
      9) Use NLM_F_NONREC from table and sets in nf_tables.
      
      10) Support for recursive chain deletion. Table and set deletion
          commands come with an implicit content flush on deletion, while
          chains do not. This patch addresses this inconsistency by adding
          the code to perform recursive chain deletions. This also comes with
          the bits to deal with the new NLM_F_NONREC netlink flag.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 04 Sep, 2017 14 commits
    • netfilter: nf_tables: support for recursive chain deletion · 9dee1474
      Committed by Pablo Neira Ayuso
      This patch sorts out an asymmetry in deletions. Currently, table and set
      deletion commands come with an implicit content flush on deletion.
      However, chain deletion results in -EBUSY if there is content in this
      chain, so no implicit flush happens. So you have to send a flush command
      first in order to delete chains; this is inconsistent and it can be
      annoying in terms of user experience.
      
      This patch uses the new NLM_F_NONREC flag to request non-recursive chain
      deletion, ie. if the chain to be removed contains rules, then this
      returns EBUSY. This problem was discussed during the NFWS'17 in Faro,
      Portugal. In iptables, you hit -EBUSY if you try to delete a chain that
      contains rules, so you have to flush first before you can remove
      anything. Since iptables-compat uses the nf_tables netlink interface, it
      has to use the NLM_F_NONREC flag from userspace to retain the original
      iptables semantics, ie.  bail out on removing chains that contain rules.
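The deletion semantics described above can be sketched as a small decision function. The struct, the function, and DEMO_F_NONREC are hypothetical stand-ins (the flag value is illustrative); this is not the nf_tables implementation.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_F_NONREC 0x100u   /* illustrative stand-in for NLM_F_NONREC */

/* Hypothetical chain model: just a rule count and a deleted marker. */
struct demo_chain {
    int nrules;
    int deleted;
};

/* With the non-recursive flag, a populated chain is refused with -EBUSY.
 * Without it, the rules are flushed implicitly and the chain is removed. */
static int demo_delchain(struct demo_chain *chain, unsigned int nlmsg_flags)
{
    if ((nlmsg_flags & DEMO_F_NONREC) && chain->nrules > 0)
        return -EBUSY;

    chain->nrules = 0;    /* implicit flush */
    chain->deleted = 1;
    return 0;
}
```

In this model an iptables-compat style caller would pass the flag to keep the "flush first" behaviour, while a native caller would omit it.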
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: use NLM_F_NONREC for deletion requests · a8278400
      Committed by Pablo Neira Ayuso
      Bail out if user requests non-recursive deletion for tables and sets.
      This new flag tells the nf_tables netlink interface to reject deletions if
      tables and sets have content.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netlink: add NLM_F_NONREC flag for deletion requests · 2335ba70
      Committed by Pablo Neira Ayuso
      In the last NFWS in Faro, Portugal, we discussed that netlink is lacking
      the semantics to request non recursive deletions, ie. do not delete an
      object iff it has child objects that hang from this parent object that
      the user requests to be deleted.
      
      We need this new flag to solve a problem for the iptables-compat
      backward compatibility utility, that runs iptables commands using the
      existing nf_tables netlink interface. Specifically, custom chains in
      iptables cannot be deleted if there are rules in them; however, nf_tables
      allows removing any chain that is populated with content. To sort out
      this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC
      flag to obtain the same semantics that iptables provides.
      
      This new flag should only be used for deletion requests. Note this new
      flag value overlaps with the existing:
      
      * NLM_F_ROOT for get requests.
      * NLM_F_REPLACE for new requests.
      
      However, those flags should not ever be used in deletion requests.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add nf_tables_addchain() · 4035285f
      Committed by Pablo Neira Ayuso
      Wrap the chain addition path in a function to make it more maintainable.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add nf_tables_updchain() · 2c4a488a
      Committed by Pablo Neira Ayuso
      nf_tables_newchain() is too large, wrap the chain update path in a
      function to make it more maintainable.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: Remove CONFIG_NETFILTER_DEBUG and _ASSERT() macros. · 9efdb14f
      Committed by Varsha Rao
      This patch removes CONFIG_NETFILTER_DEBUG and _ASSERT() macros as they
      are no longer required. Replace _ASSERT() macros with WARN_ON().
      Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • net: Replace NF_CT_ASSERT() with WARN_ON(). · 44d6e2f2
      Committed by Varsha Rao
      This patch removes NF_CT_ASSERT() and instead uses WARN_ON().
      Signed-off-by: Varsha Rao <rvarsha016@gmail.com>
    • netfilter: remove unused hooknum arg from packet functions · d1c1e39d
      Committed by Florian Westphal
      Tested with allmodconfig build.
      Signed-off-by: Florian Westphal <fw@strlen.de>
    • netfilter: nft_limit: add stateful object type · a6912055
      Committed by Pablo M. Bermudo Garay
      Register a new limit stateful object type into the stateful object
      infrastructure.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nft_limit: replace pkt_bytes with bytes · 6e323887
      Committed by Pablo M. Bermudo Garay
      Just a small refactor patch in order to improve the code readability.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add select_ops for stateful objects · dfc46034
      Committed by Pablo M. Bermudo Garay
      This patch adds support for overloading stateful objects operations
      through the select_ops() callback, just as it is implemented for
      expressions.
      
      This change is needed for upcoming additions to the stateful objects
      infrastructure.
      Signed-off-by: Pablo M. Bermudo Garay <pablombg@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: xt_hashlimit: add rate match mode · bea74641
      Committed by Vishwanath Pai
      This patch adds a new feature to hashlimit that allows matching on the
      current packet/byte rate without rate limiting. This can be enabled
      with a new flag --hashlimit-rate-match. The match returns true if the
      current rate of packets is above/below the user specified value.
      
      The main difference between the existing algorithm and the new one is
      that the existing algorithm rate-limits the flow whereas the new
      algorithm does not. Instead it *classifies* the flow based on whether
      it is above or below a certain rate. I will demonstrate this with an
      example below. Let us assume this rule:
      
      iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain
      
      If the packet rate is 15/s, the existing algorithm would ACCEPT 10
      packets every second and send 5 packets to "new_chain".
      
      But with the new algorithm, as long as the rate of 15/s is sustained,
      all packets will continue to match and every packet is sent to new_chain.
      
      This new functionality will let us classify different flows based on
      their current rate, so that further decisions can be made on them based on
      what the current rate is.
      
      This is how the new algorithm works:
      We divide time into intervals of 1 (sec/min/hour) as specified by
      the user. We keep track of the number of packets/bytes processed in the
      current interval. After each interval we reset the counter to 0.
      
      When we receive a packet for match, we look at the packet rate
      during the current interval and the previous interval to make a
      decision:
      
      if [ prev_rate < user and cur_rate < user ]
              return Below
      else
              return Above
      
      Where cur_rate is the number of packets/bytes seen in the current
      interval, prev_rate is the number of packets/bytes seen in the previous
      interval and 'user' is the rate specified by the user.
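The classification rule can be written out as a short C sketch. The state layout, the names, and the handling of gaps longer than one interval (the previous count is treated as zero) are assumptions made for illustration; they are not taken from xt_hashlimit.

```c
#include <assert.h>
#include <stdint.h>

enum demo_verdict { DEMO_BELOW, DEMO_ABOVE };

/* Hypothetical per-flow state: counts for the previous and current interval. */
struct demo_rate_state {
    uint64_t interval_start;  /* start time of the current interval */
    uint64_t interval_len;    /* interval length, same unit as timestamps */
    uint64_t prev_count;
    uint64_t cur_count;
};

/* Advance to the interval containing 'now', resetting the counter. */
static void demo_roll(struct demo_rate_state *s, uint64_t now)
{
    uint64_t elapsed = (now - s->interval_start) / s->interval_len;

    if (elapsed == 0)
        return;
    /* Assumption: if more than one interval passed, the previous one was idle. */
    s->prev_count = (elapsed == 1) ? s->cur_count : 0;
    s->cur_count = 0;
    s->interval_start += elapsed * s->interval_len;
}

/* Count one packet and classify the flow against the user-specified rate. */
static enum demo_verdict demo_rate_match(struct demo_rate_state *s,
                                         uint64_t now, uint64_t user_rate)
{
    demo_roll(s, now);
    s->cur_count++;

    if (s->prev_count < user_rate && s->cur_count < user_rate)
        return DEMO_BELOW;
    return DEMO_ABOVE;
}
```

Note that the flow stays classified as above for one full interval after the rate drops, because the previous interval's count still exceeds the threshold.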
      
      We also provide flexibility to the user for choosing the time
      interval using the option --hashlimit-rate-interval. For example the user can
      keep a low rate like x/hour but still keep the interval as small as 1
      second.
      
      To preserve backwards compatibility we have to add this feature in a new
      revision, so I've created revision 3 for hashlimit. The two new options
      we add are:
      
      --hashlimit-rate-match
      --hashlimit-rate-interval
      
      I have updated the help text to add these new options. Also added a few
      tests for the new options.
      Suggested-by: Igor Lubashev <ilubashe@akamai.com>
      Reviewed-by: Josh Hunt <johunt@akamai.com>
      Signed-off-by: Vishwanath Pai <vpai@akamai.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge branch 'for-upstream' of... · 45865dab
      Committed by David S. Miller
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
      
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth-next 2017-09-03
      
      Here's one last bluetooth-next pull request for the 4.14 kernel:
      
       - NULL pointer fix in ca8210 802.15.4 driver
       - A few "const" fixes
       - New Kconfig option for disabling legacy interfaces
      
      Please let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'qualcomm-rmnet-Fix-comments-on-initial-patchset' · f98ce389
      Committed by David S. Miller
      Subash Abhinov Kasiviswanathan says:
      
      ====================
      net: qualcomm: rmnet: Fix comments on initial patchset
      
      This series fixes the comments from Dan on the first patch series.
      
      Fixes a memory corruption which could occur if mux_id was higher than 32.
      Remove the RMNET_LOCAL_LOGICAL_ENDPOINT which is no longer used.
      Make a log message more useful.
      Combine __rmnet_set_endpoint_config() with rmnet_set_endpoint_config().
      Set the mux_id in rmnet_vnd_newlink().
      Set the ingress and egress data format directly in newlink.
      Implement ndo_get_iflink to find the real_dev.
      Rename the real_dev_info to port to make it similar to other drivers.
      
      The conversion of rmnet_devices to a list and hash lookup will be sent
      as part of a separate patch.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>