1. 23 10月, 2016 9 次提交
  2. 21 10月, 2016 11 次提交
    • W
      ipv6: fix a potential deadlock in do_ipv6_setsockopt() · 8651be8f
      WANG Cong 提交于
      Baozeng reported this deadlock case:
      
             CPU0                    CPU1
             ----                    ----
        lock([  165.136033] sk_lock-AF_INET6);
                                     lock([  165.136033] rtnl_mutex);
                                     lock([  165.136033] sk_lock-AF_INET6);
        lock([  165.136033] rtnl_mutex);
      
      Similar to commit 87e9f031
      ("ipv4: fix a potential deadlock in mcast getsockopt() path")
      this is due to we still have a case, ipv6_sock_mc_close(),
      where we acquire sk_lock before rtnl_lock. Close this deadlock
      with the similar solution, that is always acquire rtnl lock first.
      
      Fixes: baf606d9 ("ipv4,ipv6: grab rtnl before locking the socket")
      Reported-by: NBaozeng Ding <sploving1@gmail.com>
      Tested-by: NBaozeng Ding <sploving1@gmail.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8651be8f
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 8dbad1a8
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree,
      they are:
      
      1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from
         Geert Uytterhoeven.
      
      2) Fix wrong timeout in set elements added from packet path via
         nft_dynset, from Anders K. Pedersen.
      
      3) Remove obsolete nf_conntrack_events_retry_timeout sysctl
         documentation, from Nicolas Dichtel.
      
      4) Ensure proper initialization of log flags via xt_LOG, from
         Liping Zhang.
      
      5) Missing alias to autoload ipcomp, also from Liping Zhang.
      
      6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping.
      
      7) Wrong integer type in the new nft_parse_u32_check() function,
         from Dan Carpenter.
      
      8) Another wrong integer type declaration in nft_exthdr_init, also
         from Dan Carpenter.
      
      9) Fix insufficient mode validation in nft_range.
      
      10) Fix compilation warning in nft_range due to possible uninitialized
          value, from Arnd Bergmann.
      
      11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to
          calm down kmemcheck, from Florian Westphal.
      
      12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached,
          from Nicolas Dichtel.
      
      13) Fix nf_queue() after conversion to single-linked hook list, related
          to incorrect bypass flag handling and incorrect hook point of
          reinjection.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dbad1a8
    • F
      kexec: Export kexec_in_progress to modules · 97dcaa0f
      Florian Fainelli 提交于
      The bcm_sf2 driver uses kexec_in_progress to know whether it can power
      down an integrated PHY during shutdown, and can be built as a module.
      Other modules may be using this in the future, so export it.
      
      Fixes: 2399d614 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97dcaa0f
    • E
      ipv4: disable BH in set_ping_group_range() · a681574c
      Eric Dumazet 提交于
      In commit 4ee3bd4a ("ipv4: disable BH when changing ip local port
      range") Cong added BH protection in set_local_port_range() but missed
      that same fix was needed in set_ping_group_range()
      
      Fixes: b8f1a556 ("udp: Add function to make source port for UDP tunnels")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NEric Salo <salo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a681574c
    • E
      udp: must lock the socket in udp_disconnect() · 286c72de
      Eric Dumazet 提交于
      Baozeng Ding reported KASAN traces showing uses after free in
      udp_lib_get_port() and other related UDP functions.
      
      A CONFIG_DEBUG_PAGEALLOC=y kernel would eventually crash.
      
      I could write a reproducer with two threads doing :
      
      static int sock_fd;
      static void *thr1(void *arg)
      {
      	for (;;) {
      		connect(sock_fd, (const struct sockaddr *)arg,
      			sizeof(struct sockaddr_in));
      	}
      }
      
      static void *thr2(void *arg)
      {
      	struct sockaddr_in unspec;
      
      	for (;;) {
      		memset(&unspec, 0, sizeof(unspec));
      	        connect(sock_fd, (const struct sockaddr *)&unspec,
      			sizeof(unspec));
              }
      }
      
      Problem is that udp_disconnect() could run without holding socket lock,
      and this was causing list corruptions.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      286c72de
    • F
      net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels · 2399d614
      Florian Fainelli 提交于
      For a kernel that is being kexec'd we re-enable the integrated GPHY in
      order for the subsequent MDIO bus scan to succeed and properly bind to
      the bcm7xxx PHY driver. If we did not do that, the GPHY would be shut
      down by the time the MDIO driver is probing the bus, and it would fail
      to read the correct PHY OUI and therefore bind to an appropriate PHY
      driver. Later on, this would cause DSA not to be able to successfully
      attach to the PHY, and the interface would not be created at all.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2399d614
    • D
      bpf, test: fix ld_abs + vlan push/pop stress test · 0d906b1e
      Daniel Borkmann 提交于
      After commit 636c2628 ("net: skbuff: Remove errornous length
      validation in skb_vlan_pop()") mentioned test case stopped working,
      throwing a -12 (ENOMEM) return code. The issue however is not due to
      636c2628, but rather due to a buggy test case that got uncovered
      from the change in behaviour in 636c2628.
      
      The data_size of that test case for the skb was set to 1. In the
      bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that
      loop with: reading skb data, pushing 68 tags, reading skb data,
      popping 68 tags, reading skb data, etc, in order to force a skb
      expansion and thus trigger that JITs recache skb->data. Problem is
      that initial data_size is too small.
      
      While before 636c2628, the test silently bailed out due to the
      skb->len < VLAN_ETH_HLEN check with returning 0, and now throwing an
      error from failing skb_ensure_writable(). Set at least minimum of
      ETH_HLEN as an initial length so that on first push of data, equivalent
      pop will succeed.
      
      Fixes: 4d9c5c53 ("test_bpf: add bpf_skb_vlan_push/pop() tests")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d906b1e
    • S
      net: add recursion limit to GRO · fcd91dd4
      Sabrina Dubroca 提交于
      Currently, GRO can do unlimited recursion through the gro_receive
      handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
      to one level with encap_mark, but both VLAN and TEB still have this
      problem.  Thus, the kernel is vulnerable to a stack overflow, if we
      receive a packet composed entirely of VLAN headers.
      
      This patch adds a recursion counter to the GRO layer to prevent stack
      overflow.  When a gro_receive function hits the recursion limit, GRO is
      aborted for this skb and it is processed normally.  This recursion
      counter is put in the GRO CB, but could be turned into a percpu counter
      if we run out of space in the CB.
      
      Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
      
      Fixes: CVE-2016-7039
      Fixes: 9b174d88 ("net: Add Transparent Ethernet Bridging GRO support.")
      Fixes: 66e5133f ("vlan: Add GRO support for non hardware accelerated vlan")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NJiri Benc <jbenc@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcd91dd4
    • J
      ipv6: properly prevent temp_prefered_lft sysctl race · 7aa8e63f
      Jiri Bohac 提交于
      The check for an underflow of tmp_prefered_lft is always false
      because tmp_prefered_lft is unsigned. The intention of the check
      was to guard against racing with an update of the
      temp_prefered_lft sysctl, potentially resulting in an underflow.
      
      As suggested by David Miller, the best way to prevent the race is
      by reading the sysctl variable using READ_ONCE.
      Signed-off-by: NJiri Bohac <jbohac@suse.cz>
      Reported-by: NJulia Lawall <julia.lawall@lip6.fr>
      Fixes: 76506a98 ("IPv6: fix DESYNC_FACTOR")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7aa8e63f
    • P
      netfilter: fix nf_queue handling · 7034b566
      Pablo Neira Ayuso 提交于
      nf_queue handling is broken since e3b37f11 ("netfilter: replace
      list_head with single linked list") for two reasons:
      
      1) If the bypass flag is set on, there are no userspace listeners and
         we still have more hook entries to iterate over, then jump to the
         next hook. Otherwise accept the packet. On nf_reinject() path, the
         okfn() needs to be invoked.
      
      2) We should not re-enter the same hook on packet reinjection. If the
         packet is accepted, we have to skip the current hook from where the
         packet was enqueued, otherwise the packets gets enqueued over and
         over again.
      
      This restores the previous list_for_each_entry_continue() behaviour
      happening from nf_iterate() that was dealing with these two cases.
      This patch introduces a new nf_queue() wrapper function so this fix
      becomes simpler.
      
      Fixes: e3b37f11 ("netfilter: replace list_head with single linked list")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      7034b566
    • N
      netfilter: conntrack: restart gc immediately if GC_MAX_EVICTS is reached · 7bb6615d
      Nicolas Dichtel 提交于
      When the maximum evictions number is reached, do not wait 5 seconds before
      the next run.
      
      CC: Florian Westphal <fw@strlen.de>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      7bb6615d
  3. 20 10月, 2016 13 次提交
  4. 19 10月, 2016 7 次提交
    • E
      tcp: do not export sysctl_tcp_low_latency · 82454581
      Eric Dumazet 提交于
      Since commit b2fb4f54 ("tcp: uninline tcp_prequeue()") we no longer
      access sysctl_tcp_low_latency from a module.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      82454581
    • J
      rtnetlink: Add rtnexthop offload flag to compare mask · 85dda4e5
      Jiri Pirko 提交于
      The offload flag is a status flag and should not be used by
      FIB semantics for comparison.
      
      Fixes: 37ed9493 ("rtnetlink: add RTNH_F_EXTERNAL flag for fib offload")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Reviewed-by: NAndy Gospodarek <andy@greyhouse.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85dda4e5
    • I
      switchdev: Execute bridge ndos only for bridge ports · 97c24290
      Ido Schimmel 提交于
      We recently got the following warning after setting up a vlan device on
      top of an offloaded bridge and executing 'bridge link':
      
      WARNING: CPU: 0 PID: 18566 at drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c:81 mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
      [...]
       CPU: 0 PID: 18566 Comm: bridge Not tainted 4.8.0-rc7 #1
       Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox switch, BIOS 4.6.5 05/21/2015
        0000000000000286 00000000e64ab94f ffff880406e6f8f0 ffffffff8135eaa3
        0000000000000000 0000000000000000 ffff880406e6f930 ffffffff8108c43b
        0000005106e6f988 ffff8803df398840 ffff880403c60108 ffff880406e6f990
       Call Trace:
        [<ffffffff8135eaa3>] dump_stack+0x63/0x90
        [<ffffffff8108c43b>] __warn+0xcb/0xf0
        [<ffffffff8108c56d>] warn_slowpath_null+0x1d/0x20
        [<ffffffffa01420d5>] mlxsw_sp_port_orig_get.part.9+0x55/0x70 [mlxsw_spectrum]
        [<ffffffffa0142195>] mlxsw_sp_port_attr_get+0xa5/0xb0 [mlxsw_spectrum]
        [<ffffffff816f151f>] switchdev_port_attr_get+0x4f/0x140
        [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
        [<ffffffff816f15d0>] switchdev_port_attr_get+0x100/0x140
        [<ffffffff816f1d6b>] switchdev_port_bridge_getlink+0x5b/0xc0
        [<ffffffff816f2680>] ? switchdev_port_fdb_dump+0x90/0x90
        [<ffffffff815f5427>] rtnl_bridge_getlink+0xe7/0x190
        [<ffffffff8161a1b2>] netlink_dump+0x122/0x290
        [<ffffffff8161b0df>] __netlink_dump_start+0x15f/0x190
        [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
        [<ffffffff815fab46>] rtnetlink_rcv_msg+0x1a6/0x220
        [<ffffffff81208118>] ? __kmalloc_node_track_caller+0x208/0x2c0
        [<ffffffff815f5340>] ? rtnl_bridge_dellink+0x230/0x230
        [<ffffffff815fa9a0>] ? rtnl_newlink+0x890/0x890
        [<ffffffff8161cf54>] netlink_rcv_skb+0xa4/0xc0
        [<ffffffff815f56f8>] rtnetlink_rcv+0x28/0x30
        [<ffffffff8161c92c>] netlink_unicast+0x18c/0x240
        [<ffffffff8161ccdb>] netlink_sendmsg+0x2fb/0x3a0
        [<ffffffff815c5a48>] sock_sendmsg+0x38/0x50
        [<ffffffff815c6031>] SYSC_sendto+0x101/0x190
        [<ffffffff815c7111>] ? __sys_recvmsg+0x51/0x90
        [<ffffffff815c6b6e>] SyS_sendto+0xe/0x10
        [<ffffffff817017f2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      The problem is that the 8021q module propagates the call to
      ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't
      recognize the netdev, as it's not offloaded.
      
      While we can ignore calls being made to non-bridge ports inside the
      driver, a better fix would be to push this check up to the switchdev
      layer.
      
      Note that these ndos can be called for non-bridged netdev, but this only
      happens in certain PF drivers which don't call the corresponding
      switchdev functions anyway.
      
      Fixes: 99f44bb3 ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NTamir Winetroub <tamirw@mellanox.com>
      Tested-by: NTamir Winetroub <tamirw@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      97c24290
    • I
      net: core: Correctly iterate over lower adjacency list · e4961b07
      Ido Schimmel 提交于
      Tamir reported the following trace when processing ARP requests received
      via a vlan device on top of a VLAN-aware bridge:
      
       NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
      [...]
       CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W       4.8.0-rc7 #1
       Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
       task: ffff88017edfea40 task.stack: ffff88017ee10000
       RIP: 0010:[<ffffffff815dcc73>]  [<ffffffff815dcc73>] netdev_all_lower_get_next_rcu+0x33/0x60
      [...]
       Call Trace:
        <IRQ>
        [<ffffffffa015de0a>] mlxsw_sp_port_lower_dev_hold+0x5a/0xa0 [mlxsw_spectrum]
        [<ffffffffa016f1b0>] mlxsw_sp_router_netevent_event+0x80/0x150 [mlxsw_spectrum]
        [<ffffffff810ad07a>] notifier_call_chain+0x4a/0x70
        [<ffffffff810ad13a>] atomic_notifier_call_chain+0x1a/0x20
        [<ffffffff815ee77b>] call_netevent_notifiers+0x1b/0x20
        [<ffffffff815f2eb6>] neigh_update+0x306/0x740
        [<ffffffff815f38ce>] neigh_event_ns+0x4e/0xb0
        [<ffffffff8165ea3f>] arp_process+0x66f/0x700
        [<ffffffff8170214c>] ? common_interrupt+0x8c/0x8c
        [<ffffffff8165ec29>] arp_rcv+0x139/0x1d0
        [<ffffffff816e505a>] ? vlan_do_receive+0xda/0x320
        [<ffffffff815e3794>] __netif_receive_skb_core+0x524/0xab0
        [<ffffffff815e6830>] ? dev_queue_xmit+0x10/0x20
        [<ffffffffa06d612d>] ? br_forward_finish+0x3d/0xc0 [bridge]
        [<ffffffffa06e5796>] ? br_handle_vlan+0xf6/0x1b0 [bridge]
        [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
        [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
        [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
        [<ffffffffa06d7856>] br_pass_frame_up+0xc6/0x160 [bridge]
        [<ffffffffa06d63d7>] ? deliver_clone+0x37/0x50 [bridge]
        [<ffffffffa06d656c>] ? br_flood+0xcc/0x160 [bridge]
        [<ffffffffa06d7b14>] br_handle_frame_finish+0x224/0x4f0 [bridge]
        [<ffffffffa06d7f94>] br_handle_frame+0x174/0x300 [bridge]
        [<ffffffff815e3599>] __netif_receive_skb_core+0x329/0xab0
        [<ffffffff81374815>] ? find_next_bit+0x15/0x20
        [<ffffffff8135e802>] ? cpumask_next_and+0x32/0x50
        [<ffffffff810c9968>] ? load_balance+0x178/0x9b0
        [<ffffffff815e3d38>] __netif_receive_skb+0x18/0x60
        [<ffffffff815e3dc0>] netif_receive_skb_internal+0x40/0xb0
        [<ffffffff815e3e4c>] netif_receive_skb+0x1c/0x70
        [<ffffffffa01544a1>] mlxsw_sp_rx_listener_func+0x61/0xb0 [mlxsw_spectrum]
        [<ffffffffa005c9f7>] mlxsw_core_skb_receive+0x187/0x200 [mlxsw_core]
        [<ffffffffa007332a>] mlxsw_pci_cq_tasklet+0x63a/0x9b0 [mlxsw_pci]
        [<ffffffff81091986>] tasklet_action+0xf6/0x110
        [<ffffffff81704556>] __do_softirq+0xf6/0x280
        [<ffffffff8109213f>] irq_exit+0xdf/0xf0
        [<ffffffff817042b4>] do_IRQ+0x54/0xd0
        [<ffffffff8170214c>] common_interrupt+0x8c/0x8c
      
      The problem is that netdev_all_lower_get_next_rcu() never advances the
      iterator, thereby causing the loop over the lower adjacency list to run
      forever.
      
      Fix this by advancing the iterator and avoid the infinite loop.
      
      Fixes: 7ce856aa ("mlxsw: spectrum: Add couple of lower device helper functions")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reported-by: NTamir Winetroub <tamirw@mellanox.com>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4961b07
    • E
      flow_dissector: Check skb for VLAN only if skb specified. · 3805a938
      Eric Garver 提交于
      Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.
      
      Fixes: d5709f7a ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci")
      Signed-off-by: NEric Garver <e@erig.me>
      Reviewed-by: NJakub Sitnicki <jkbs@redhat.com>
      Acked-by: NAmir Vadai <amir@vadai.me>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3805a938
    • W
      qed: Use list_move_tail instead of list_del/list_add_tail · b4f0fd4b
      Wei Yongjun 提交于
      Using list_move_tail() instead of list_del() + list_add_tail().
      Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
      Acked-by: NYuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4f0fd4b
    • A
      rocker: fix maybe-uninitialized warning · ecf244f7
      Arnd Bergmann 提交于
      In some rare configurations, we get a warning about the 'index' variable
      being used without an initialization:
      
      drivers/net/ethernet/rocker/rocker_ofdpa.c: In function ‘ofdpa_port_fib_ipv4.isra.16.constprop’:
      drivers/net/ethernet/rocker/rocker_ofdpa.c:2425:92: warning: ‘index’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      This is a false positive, the logic is just a bit too complex for gcc
      to follow here. Moving the intialization of 'index' a little further
      down makes it clear to gcc that the function always returns an error
      if it is not initialized.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ecf244f7