1. 01 7月, 2017 22 次提交
  2. 30 6月, 2017 18 次提交
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 52a623bd
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter updates for net-next
      
      The following patchset contains Netfilter updates for your net-next
      tree. This batch contains connection tracking updates for the cleanup
      iteration path, patches from Florian Westphal:
      
      X) Skip unconfirmed conntracks in nf_ct_iterate_cleanup_net(), just set
         dying bit to let the CPU release them.
      
      X) Add nf_ct_iterate_destroy() to be used on module removal, to kill
         conntrack from all namespace.
      
      X) Restart iteration on hashtable resizing, since both may occur at
         the same time.
      
      X) Use the new nf_ct_iterate_destroy() to remove conntrack with NAT
         mapping on module removal.
      
      X) Use nf_ct_iterate_destroy() to remove conntrack entries helper
         module removal, from Liping Zhang.
      
      X) Use nf_ct_iterate_cleanup_net() to remove the timeout extension
         if user requests this, also from Liping.
      
      X) Add net_ns_barrier() and use it from FTP helper, so make sure
         no concurrent namespace removal happens at the same time while
         the helper module is being removed.
      
      X) Use NFPROTO_MAX in layer 3 conntrack protocol array, to reduce
         module size. Same thing in nf_tables.
      
      Updates for the nf_tables infrastructure:
      
      X) Prepare usage of the extended ACK reporting infrastructure for
         nf_tables.
      
      X) Remove unnecessary forward declaration in nf_tables hash set.
      
      X) Skip set size estimation if number of element is not specified.
      
      X) Changes to accomodate a (faster) unresizable hash set implementation,
         for anonymous sets and dynamic size fixed sets with no timeouts.
      
      X) Faster lookup function for unresizable hash table for 2 and 4
         bytes key.
      
      And, finally, a bunch of asorted small updates and cleanups:
      
      X) Do not hold reference to netdev from ipt_CLUSTER, instead subscribe
         to device events and look up for index from the packet path, this
         is fixing an issue that is present since the very beginning, patch
         from Xin Long.
      
      X) Use nf_register_net_hook() in ipt_CLUSTER, from Florian Westphal.
      
      X) Use ebt_invalid_target() whenever possible in the ebtables tree,
         from Gao Feng.
      
      X) Calm down compilation warning in nf_dup infrastructure, patch from
         stephen hemminger.
      
      X) Statify functions in nftables rt expression, also from stephen.
      
      X) Update Makefile to use canonical method to specify nf_tables-objs.
         From Jike Song.
      
      X) Use nf_conntrack_helpers_register() in amanda and H323.
      
      X) Space cleanup for ctnetlink, from linzhang.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52a623bd
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4d8a991d
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Need to access netdev->num_rx_queues behind an accessor in netvsc
          driver otherwise the build breaks with some configs, from Arnd
          Bergmann.
      
       2) Add dummy xfrm_dev_event() so that build doesn't fail when
          CONFIG_XFRM_OFFLOAD is not set. From Hangbin Liu.
      
       3) Don't OOPS when pfkey_msg2xfrm_state() signals an erros, from Dan
          Carpenter.
      
       4) Fix MCDI command size for filter operations in sfc driver, from
          Martin Habets.
      
       5) Fix UFO segmenting so that we don't calculate incorrect checksums,
          from Michal Kubecek.
      
       6) When ipv6 datagram connects fail, reset destination address and
          port. From Wei Wang.
      
       7) TCP disconnect must reset the cached receive DST, from WANG Cong.
      
       8) Fix sign extension bug on 32-bit in dev_get_stats(), from Eric
          Dumazet.
      
       9) fman driver has to depend on HAS_DMA, from Madalin Bucur.
      
      10) Fix bpf pointer leak with xadd in verifier, from Daniel Borkmann.
      
      11) Fix negative page counts with GFO, from Michal Kubecek.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
        sfc: fix attempt to translate invalid filter ID
        net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
        bpf: prevent leaking pointer via xadd on unpriviledged
        arcnet: com20020-pci: add missing pdev setup in netdev structure
        arcnet: com20020-pci: fix dev_id calculation
        arcnet: com20020: remove needless base_addr assignment
        Trivial fix to spelling mistake in arc_printk message
        arcnet: change irq handler to lock irqsave
        rocker: move dereference before free
        mlxsw: spectrum_router: Fix NULL pointer dereference
        net: sched: Fix one possible panic when no destroy callback
        virtio-net: serialize tx routine during reset
        net: usb: asix88179_178a: Add support for the Belkin B2B128
        fsl/fman: add dependency on HAS_DMA
        net: prevent sign extension in dev_get_stats()
        tcp: reset sk_rx_dst in tcp_disconnect()
        net: ipv6: reset daddr and dport in sk if connect() fails
        bnx2x: Don't log mc removal needlessly
        bnxt_en: Fix netpoll handling.
        bnxt_en: Add missing logic to handle TPA end error conditions.
        ...
      4d8a991d
    • L
      Merge tag 'for-4.12/dm-fixes-5' of... · 27bc3440
      Linus Torvalds 提交于
      Merge tag 'for-4.12/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
      
      Pull device mapper fixes from Mike Snitzer:
      
       - dm thinp fix for crash that will occur when metadata device failure
         races with discard passdown to the underlying data device.
      
       - dm raid fix to not access the superblock's >= 1.9.0 'sectors' member
         unconditionally.
      
      * tag 'for-4.12/dm-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm thin: do not queue freed thin mapping for next stage processing
        dm raid: fix oops on upgrading to extended superblock format
      27bc3440
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 374bf883
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
       "Two fixes that should go into this release.
      
        One is an nvme regression fix from Keith, fixing a missing queue
        freeze if the controller is being reset. This causes the reset to
        hang.
      
        The other is a fix for a leak of the bio protection info, if smaller
        sized O_DIRECT is used. This fix should be more involved as we have
        other problematic paths in the kernel, but given as this isn't a
        regression in this series, we'll tackle those for 4.13"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        block: provide bio_uninit() free freeing integrity/task associations
        nvme/pci: Fix stuck nvme reset
      374bf883
    • E
      sfc: fix attempt to translate invalid filter ID · d58299a4
      Edward Cree 提交于
      When filter insertion fails with no rollback, we were trying to convert
       EFX_EF10_FILTER_ID_INVALID to an id to store in 'ids' (which is either
       vlan->uc or vlan->mc).  This would WARN_ON_ONCE and then record a bogus
       filter ID of 0x1fff, neither of which is a good thing.
      
      Fixes: 0ccb998b ("sfc: fix filter_id misinterpretation in edge case")
      Signed-off-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d58299a4
    • D
      Merge branch 'mlx4-dynamic-tc-tx-queues' · fcce2fdb
      David S. Miller 提交于
      Tariq Toukan says:
      
      ====================
      mlx4_en dynamic TC tx queues
      
      This patchset from Inbar aligns the number of TX queues
      to the actual need, according to the TC configuration.
      
      Series generated against net-next commit:
      2ee87db3 Merge branch 'nfp-get_phys_port_name-for-representors-and-SR-IOV-reorder'
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcce2fdb
    • I
      net/mlx4_en: Do not allocate redundant TX queues when TC is disabled · ec327f7a
      Inbar Karmy 提交于
      Currently the number of TX queues that are allocated doesn't depend
      on the number of TCs, the module always loads with max num of UP
      per channel.
      In order to prevent the allocation of unnecessary memory, the
      module will load with minimum number of UPs per channel, and the
      user will be able to control the number of TX queues per channel
      by changing the number of TC to 8 using the tc command.
      The variable num_up will hold the information about the current
      number of UPs.
      Due to the change, needed to remove the lines that set the value of
      UP to be different than zero in the func "mlx4_en_select_queue",
      since now the num of TX queues that are allocated is only one per channel
      in default.
      In order not to force the UP to be zero in case of only one TC, added
      a condition before forcing it in the func "mlx4_en_fill_qp_context".
      
      Tested:
      After the module is loaded with minimum number of UP per channel, to
      increase num of TCs to 8, use:
      tc qdisc add dev ens8 root mqprio num_tc 8
      In order to decrease the number of TCs to minimum number of UP per channel,
      use:
      tc qdisc del dev ens8 root
      Signed-off-by: NInbar Karmy <inbark@mellanox.com>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Cc: Tarick Bedeir <tarick@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec327f7a
    • I
      net/mlx4_en: Add dynamic variable to hold the number of user priorities (UP) · f21ad614
      Inbar Karmy 提交于
      Until this patch, the number of UPs was hard coded for eight.
      Replace this with a variable in struct "mlx4_en_port_profile".
      Currently, the variable will hold the maximum number of UP,
      as before.
      The patch creates an infrastructure to add an option for dynamic
      change of the actual number of TCs.
      Signed-off-by: NInbar Karmy <inbark@mellanox.com>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Cc: Tarick Bedeir <tarick@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f21ad614
    • M
      net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish() · e44699d2
      Michal Kubeček 提交于
      Recently I started seeing warnings about pages with refcount -1. The
      problem was traced to packets being reused after their head was merged into
      a GRO packet by skb_gro_receive(). While bisecting the issue pointed to
      commit c21b48cc ("net: adjust skb->truesize in ___pskb_trim()") and
      I have never seen it on a kernel with it reverted, I believe the real
      problem appeared earlier when the option to merge head frag in GRO was
      implemented.
      
      Handling NAPI_GRO_FREE_STOLEN_HEAD state was only added to GRO_MERGED_FREE
      branch of napi_skb_finish() so that if the driver uses napi_gro_frags()
      and head is merged (which in my case happens after the skb_condense()
      call added by the commit mentioned above), the skb is reused including the
      head that has been merged. As a result, we release the page reference
      twice and eventually end up with negative page refcount.
      
      To fix the problem, handle NAPI_GRO_FREE_STOLEN_HEAD in napi_frags_finish()
      the same way it's done in napi_skb_finish().
      
      Fixes: d7e8883c ("net: make GRO aware of skb->head_frag")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e44699d2
    • A
      net: bridge: constify attribute_group structures. · cddbb79f
      Arvind Yadav 提交于
      attribute_groups are not supposed to change at runtime. All functions
      working with attribute_groups provided by <linux/sysfs.h> work with const
      attribute_group. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
         2645	    896	      0	   3541	    dd5	net/bridge/br_sysfs_br.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
         2701	    832	      0	   3533	    dcd	net/bridge/br_sysfs_br.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cddbb79f
    • A
      net: constify attribute_group structures. · 38ef00cc
      Arvind Yadav 提交于
      attribute_groups are not supposed to change at runtime. All functions
      working with attribute_groups provided by <linux/device.h> work with const
      attribute_group. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
         9968	   3168	     16	  13152	   3360	net/core/net-sysfs.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        10160	   2976	     16	  13152	   3360	net/core/net-sysfs.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38ef00cc
    • A
      net: freescale: gianfar : constify dev_pm_ops structures. · ee27244b
      Arvind Yadav 提交于
      dev_pm_ops are not supposed to change at runtime. All functions
      working with dev_pm_ops provided by <linux/device.h> work with const
      dev_pm_ops. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        19057	    392	      0	  19449	   4bf9	drivers/net/ethernet/freescale/gianfar.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        19249	    192	      0	  19441	   4bf1	drivers/net/ethernet/freescale/gianfar.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee27244b
    • A
      net: smc91x: constify dev_pm_ops structures. · d19724ec
      Arvind Yadav 提交于
      dev_pm_ops are not supposed to change at runtime. All functions
      working with dev_pm_ops provided by <linux/device.h> work with const
      dev_pm_ops. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        18709	    401	      0	  19110	   4aa6	drivers/net/ethernet/smsc/smc91x.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        18901	    201	      0	  19102	   4a9e	drivers/net/ethernet/smsc/smc91x.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d19724ec
    • A
      net: ibm: ibmveth: constify dev_pm_ops structures. · eb60a73d
      Arvind Yadav 提交于
      dev_pm_ops are not supposed to change at runtime. All functions
      working with dev_pm_ops provided by <linux/device.h> work with const
      dev_pm_ops. So mark the non-const structs as const.
      
      File size before:
         text	   data	    bss	    dec	    hex	filename
        15426	   1256	      0	  16682	   412a	drivers/net/ethernet/ibm/ibmveth.o
      
      File size After adding 'const':
         text	   data	    bss	    dec	    hex	filename
        15618	   1064	      0	  16682	   412a	drivers/net/ethernet/ibm/ibmveth.o
      Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb60a73d
    • D
      bpf: prevent leaking pointer via xadd on unpriviledged · 6bdf6abc
      Daniel Borkmann 提交于
      Leaking kernel addresses on unpriviledged is generally disallowed,
      for example, verifier rejects the following:
      
        0: (b7) r0 = 0
        1: (18) r2 = 0xffff897e82304400
        3: (7b) *(u64 *)(r1 +48) = r2
        R2 leaks addr into ctx
      
      Doing pointer arithmetic on them is also forbidden, so that they
      don't turn into unknown value and then get leaked out. However,
      there's xadd as a special case, where we don't check the src reg
      for being a pointer register, e.g. the following will pass:
      
        0: (b7) r0 = 0
        1: (7b) *(u64 *)(r1 +48) = r0
        2: (18) r2 = 0xffff897e82304400 ; map
        4: (db) lock *(u64 *)(r1 +48) += r2
        5: (95) exit
      
      We could store the pointer into skb->cb, loose the type context,
      and then read it out from there again to leak it eventually out
      of a map value. Or more easily in a different variant, too:
      
         0: (bf) r6 = r1
         1: (7a) *(u64 *)(r10 -8) = 0
         2: (bf) r2 = r10
         3: (07) r2 += -8
         4: (18) r1 = 0x0
         6: (85) call bpf_map_lookup_elem#1
         7: (15) if r0 == 0x0 goto pc+3
         R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
         8: (b7) r3 = 0
         9: (7b) *(u64 *)(r0 +0) = r3
        10: (db) lock *(u64 *)(r0 +0) += r6
        11: (b7) r0 = 0
        12: (95) exit
      
        from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
        11: (b7) r0 = 0
        12: (95) exit
      
      Prevent this by checking xadd src reg for pointer types. Also
      add a couple of test cases related to this.
      
      Fixes: 1be7f75d ("bpf: enable non-root eBPF programs")
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bdf6abc
    • T
      ibmvnic: Fix assignment of RX/TX IRQ's · 5df969c3
      Thomas Falcon 提交于
      The driver currently creates RX/TX queues during device probe, but
      assigns IRQ's to them during device open. On reset, however,
      IRQ's are assigned when resetting the queues. If there is a reset
      while the device is closed and the device is later opened, the driver will
      request IRQ's twice, causing the open to fail. This patch assigns
      the IRQ's in the ibmvnic_init function after the queues are reset or
      initialized, ensuring IRQ's are only requested once.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5df969c3
    • D
      net: ipmr: Add ipmr_rtm_getroute · 4f75ba69
      Donald Sharp 提交于
      Add to RTNL_FAMILY_IPMR, RTM_GETROUTE the ability
      to retrieve one S,G mroute from a specified table.
      
      *,G will return mroute information for just that
      particular mroute if it exists.  This is because
      it is entirely possible to have more S's then
      can fit in one skb to return to the requesting
      process.
      Signed-off-by: NDonald Sharp <sharpd@cumulusnetworks.com>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f75ba69
    • M
      bpf: Fix out-of-bound access on interpreters[] · 8007e40a
      Martin KaFai Lau 提交于
      The index is off-by-one when fp->aux->stack_depth
      has already been rounded up to 32.  In particular,
      if stack_depth is 512, the index will be 16.
      
      The fix is to round_up and then takes -1 instead of round_down.
      
      [   22.318680] ==================================================================
      [   22.319745] BUG: KASAN: global-out-of-bounds in bpf_prog_select_runtime+0x48a/0x670
      [   22.320737] Read of size 8 at addr ffffffff82aadae0 by task sockex3/1946
      [   22.321646]
      [   22.321858] CPU: 1 PID: 1946 Comm: sockex3 Tainted: G        W       4.12.0-rc6-01680-g2ee87db3 #22
      [   22.323061] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014
      [   22.324260] Call Trace:
      [   22.324612]  dump_stack+0x67/0x99
      [   22.325081]  print_address_description+0x1e8/0x290
      [   22.325734]  ? bpf_prog_select_runtime+0x48a/0x670
      [   22.326360]  kasan_report+0x265/0x350
      [   22.326860]  __asan_report_load8_noabort+0x19/0x20
      [   22.327484]  bpf_prog_select_runtime+0x48a/0x670
      [   22.328109]  bpf_prog_load+0x626/0xd40
      [   22.328637]  ? __bpf_prog_charge+0xc0/0xc0
      [   22.329222]  ? check_nnp_nosuid.isra.61+0x100/0x100
      [   22.329890]  ? __might_fault+0xf6/0x1b0
      [   22.330446]  ? lock_acquire+0x360/0x360
      [   22.331013]  SyS_bpf+0x67c/0x24d0
      [   22.331491]  ? trace_hardirqs_on+0xd/0x10
      [   22.332049]  ? __getnstimeofday64+0xaf/0x1c0
      [   22.332635]  ? bpf_prog_get+0x20/0x20
      [   22.333135]  ? __audit_syscall_entry+0x300/0x600
      [   22.333770]  ? syscall_trace_enter+0x540/0xdd0
      [   22.334339]  ? exit_to_usermode_loop+0xe0/0xe0
      [   22.334950]  ? do_syscall_64+0x48/0x410
      [   22.335446]  ? bpf_prog_get+0x20/0x20
      [   22.335954]  do_syscall_64+0x181/0x410
      [   22.336454]  entry_SYSCALL64_slow_path+0x25/0x25
      [   22.337121] RIP: 0033:0x7f263fe81f19
      [   22.337618] RSP: 002b:00007ffd9a3440c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
      [   22.338619] RAX: ffffffffffffffda RBX: 0000000000aac5fb RCX: 00007f263fe81f19
      [   22.339600] RDX: 0000000000000030 RSI: 00007ffd9a3440d0 RDI: 0000000000000005
      [   22.340470] RBP: 0000000000a9a1e0 R08: 0000000000a9a1e0 R09: 0000009d00000001
      [   22.341430] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000010000
      [   22.342411] R13: 0000000000a9a023 R14: 0000000000000001 R15: 0000000000000003
      [   22.343369]
      [   22.343593] The buggy address belongs to the variable:
      [   22.344241]  interpreters+0x80/0x980
      [   22.344708]
      [   22.344908] Memory state around the buggy address:
      [   22.345556]  ffffffff82aad980: 00 00 00 04 fa fa fa fa 04 fa fa fa fa fa fa fa
      [   22.346449]  ffffffff82aada00: 00 00 00 00 00 fa fa fa fa fa fa fa 00 00 00 00
      [   22.347361] >ffffffff82aada80: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa
      [   22.348301]                                                        ^
      [   22.349142]  ffffffff82aadb00: 00 01 fa fa fa fa fa fa 00 00 00 00 00 00 00 00
      [   22.350058]  ffffffff82aadb80: 00 00 07 fa fa fa fa fa 00 00 05 fa fa fa fa fa
      [   22.350984] ==================================================================
      
      Fixes: b870aa90 ("bpf: use different interpreter depending on required stack size")
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8007e40a