1. 25 Aug, 2017 11 commits
    • tipc: reassign pointers after skb reallocation / linearization · 60d1d936
      Parthasarathy Bhuvaragan committed
      In tipc_msg_reverse(), we assign skb attributes to local pointers
      on the stack at startup. This is followed by skb_linearize() and, for
      cloned buffers, skb relocation using pskb_expand_head().
      Both of these operations may update the skb attributes, making
      the pointers incorrect.
      
      In this commit, we fix this error by ensuring that the pointers
      are re-assigned after any of these skb operations.
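The hazard described above can be sketched in userspace C. This is only an analogy, not the kernel code: `struct buf`, `buf_expand()` and `first_byte_after_expand()` are hypothetical names, and `realloc()` stands in for an operation that may relocate the data, as pskb_expand_head() can.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an sk_buff: owns a heap buffer that may move. */
struct buf {
    unsigned char *data;
    size_t len;
};

/* Grow the buffer; realloc() may relocate the data to a new address. */
static int buf_expand(struct buf *b, size_t extra)
{
    unsigned char *p = realloc(b->data, b->len + extra);

    if (!p)
        return -1;
    memset(p + b->len, 0, extra);
    b->data = p;
    b->len += extra;
    return 0;
}

/* Returns the first header byte, re-deriving the pointer after the resize. */
static int first_byte_after_expand(struct buf *b, size_t extra)
{
    unsigned char *hdr = b->data;   /* pointer taken at function entry */

    if (buf_expand(b, extra))
        return -1;
    hdr = b->data;                  /* re-assign: the old value may be stale */
    return hdr[0];
}
```

Without the second assignment, `hdr` could still point at the old, freed allocation.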
      
      Fixes: 29042e19 ("tipc: let function tipc_msg_reverse() expand header
      when needed")
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: perform skb_linearize() before parsing the inner header · 27163138
      Parthasarathy Bhuvaragan committed
      In tipc_rcv(), we linearize only the header and usually the packets
      are consumed as the nodes permit direct reception. However, if the
      skb contains a tunnelled message due to failover or synchronization,
      we parse it in tipc_node_check_state() without performing
      linearization. This will cause link disturbances if the skb was
      non-linear.
      
      In this commit, we perform linearization for the above messages.
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: fix a refcount_t issue with noop_qdisc · 551143d8
      Eric Dumazet committed
      syzkaller reported a refcount_t warning [1]
      
      Issue here is that noop_qdisc refcnt was never really considered as
      a true refcount, since qdisc_destroy() found TCQ_F_BUILTIN set :
      
      if (qdisc->flags & TCQ_F_BUILTIN ||
          !refcount_dec_and_test(&qdisc->refcnt))
      	return;
      
      Meaning that all atomic_inc() we did on noop_qdisc.refcnt were not
      really needed, but harmless until refcount_t came.
      
      To fix this problem, we simply need to not increment noop_qdisc.refcnt,
      since we never decrement it.
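One way to picture the rule "never increment what is never decremented" is the userspace sketch below. It is an illustration under assumptions, not the kernel patch: `QDISC_F_BUILTIN`, `struct qdisc`, `qdisc_hold()` and `qdisc_put()` are hypothetical names modelled loosely on the description above, and a plain counter replaces refcount_t.

```c
#include <assert.h>
#include <stdbool.h>

#define QDISC_F_BUILTIN 0x1   /* hypothetical flag, modelled on TCQ_F_BUILTIN */

struct qdisc {
    unsigned int flags;
    unsigned int refcnt;
};

/* Take a reference, except on built-in objects that are never released. */
static void qdisc_hold(struct qdisc *q)
{
    if (q->flags & QDISC_F_BUILTIN)
        return;
    q->refcnt++;
}

/* Drop a reference; returns true when the caller should free the object. */
static bool qdisc_put(struct qdisc *q)
{
    if (q->flags & QDISC_F_BUILTIN)
        return false;
    assert(q->refcnt > 0);
    return --q->refcnt == 0;
}
```

Because both paths skip built-in objects, their counter stays untouched and can never trip an "increment on 0" check.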
      
      [1]
      refcount_t: increment on 0; use-after-free.
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 21754 at lib/refcount.c:152 refcount_inc+0x47/0x50 lib/refcount.c:152
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 0 PID: 21754 Comm: syz-executor7 Not tainted 4.13.0-rc6+ #20
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       panic+0x1e4/0x417 kernel/panic.c:180
       __warn+0x1c4/0x1d9 kernel/panic.c:541
       report_bug+0x211/0x2d0 lib/bug.c:183
       fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
       do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
       do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
       do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
       do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
       invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
      RIP: 0010:refcount_inc+0x47/0x50 lib/refcount.c:152
      RSP: 0018:ffff8801c43477a0 EFLAGS: 00010282
      RAX: 000000000000002b RBX: ffffffff86093c14 RCX: 0000000000000000
      RDX: 000000000000002b RSI: ffffffff8159314e RDI: ffffed0038868ee8
      RBP: ffff8801c43477a8 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff86093ac0
      R13: 0000000000000001 R14: ffff8801d0f3bac0 R15: dffffc0000000000
       attach_default_qdiscs net/sched/sch_generic.c:792 [inline]
       dev_activate+0x7d3/0xaa0 net/sched/sch_generic.c:833
       __dev_open+0x227/0x330 net/core/dev.c:1380
       __dev_change_flags+0x695/0x990 net/core/dev.c:6726
       dev_change_flags+0x88/0x140 net/core/dev.c:6792
       dev_ifsioc+0x5a6/0x930 net/core/dev_ioctl.c:256
       dev_ioctl+0x2bc/0xf90 net/core/dev_ioctl.c:554
       sock_do_ioctl+0x94/0xb0 net/socket.c:968
       sock_ioctl+0x2c2/0x440 net/socket.c:1058
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
      
      Fixes: 7b936405 ("net, sched: convert Qdisc.refcnt from atomic_t to refcount_t")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Reshetova, Elena <elena.reshetova@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: Free DMA coherent descriptors on errors · c2062ee3
      Florian Fainelli committed
      In case bcm_sysport_init_tx_ring() is not able to allocate ring->cbs, we
      would return with an error, and call bcm_sysport_fini_tx_ring() and it
      would see that ring->cbs is NULL and do nothing. This would leak the
      coherent DMA descriptor area, so we need to free it on error before
      returning.
      Reported-by: Eric Dumazet <edumazet@gmail.com>
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bcmgenet: Be drop monitor friendly · d4fec855
      Florian Fainelli committed
      There are 3 spots where we call dev_kfree_skb() but we are actually
      just doing a normal SKB consumption: __bcmgenet_tx_reclaim() for normal
      TX reclamation, bcmgenet_alloc_rx_buffers() during the initial RX ring
      setup and bcmgenet_free_rx_buffers() during RX ring cleanup.
      
      Fixes: d6707bec ("net: bcmgenet: rewrite bcmgenet_rx_refill()")
      Fixes: f48bed16 ("net: bcmgenet: Free skb after last Tx frag")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix bpf_setsockopts return value · 4e458deb
      Yuchung Cheng committed
      This patch fixes a bug causing any sock operations to always return EINVAL.
      
      Fixes: a5192c52 ("bpf: fix to bpf_setsockops").
      Reported-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Craig Gallek <kraig@google.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: systemport: Be drop monitor friendly · c45182eb
      Florian Fainelli committed
      Utilize dev_consume_skb_any(cb->skb) in bcm_sysport_free_cb() which is
      used when a TX packet is completed, as well as when the RX ring is
      cleaned on shutdown. Neither of these two cases is a packet drop, so be
      drop monitor friendly.
      Suggested-by: Eric Dumazet <edumazet@gmail.com>
      Fixes: 80105bef ("net: systemport: add Broadcom SYSTEMPORT Ethernet MAC driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: Fix tipc_sk_reinit handling of -EAGAIN · 6c7e983b
      Bob Peterson committed
      In commit 9dbbfb0a, function tipc_sk_reinit() had additional logic
      added to loop in the event that function rhashtable_walk_next()
      returned -EAGAIN. No worries.
      
      However, if rhashtable_walk_start returns -EAGAIN, it does "continue",
      and therefore skips the call to rhashtable_walk_stop(). That has
      the effect of calling rcu_read_lock() without its paired call to
      rcu_read_unlock(). Since rcu_read_lock() may be nested, the problem
      may not be apparent for a while, especially since resize events may
      be rare. But the comments to rhashtable_walk_start() state:
      
       * ...Note that we take the RCU lock in all
       * cases including when we return an error.  So you must always call
       * rhashtable_walk_stop to clean up.
      
      This patch replaces the continue with a goto and label to ensure a
      matching call to rhashtable_walk_stop().
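The difference between "continue" and a goto to a cleanup label can be sketched as follows. This is a userspace model under assumptions: `walk_start()`, `walk_stop()` and `walk_all()` are hypothetical helpers, and a simple counter models the RCU read-side nesting that the real walker takes even when it returns -EAGAIN.

```c
#include <assert.h>
#include <errno.h>

static int lock_depth;          /* models RCU read-side nesting */

/* Hypothetical walker: start always takes the lock, even on -EAGAIN. */
static int walk_start(int fail_times, int attempt)
{
    lock_depth++;
    return attempt < fail_times ? -EAGAIN : 0;
}

static void walk_stop(void)
{
    lock_depth--;
}

/* Retry until start succeeds; every start is matched by a stop. */
static int walk_all(int fail_times)
{
    int attempt = 0;
    int err;

    do {
        err = walk_start(fail_times, attempt++);
        if (err)
            goto stop;          /* a bare "continue" would skip the stop */
        /* ... visit the table entries here ... */
stop:
        walk_stop();
    } while (err == -EAGAIN);

    return attempt;             /* number of start attempts made */
}
```

After `walk_all()` returns, `lock_depth` is back to its initial value no matter how many retries were needed.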
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • qlge: avoid memcpy buffer overflow · e58f9583
      Arnd Bergmann committed
      gcc-8.0.0 (snapshot) points out that we copy a variable-length string
      into a fixed length field using memcpy() with the destination length,
      and that ends up copying whatever follows the string:
      
          inlined from 'ql_core_dump' at drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:1106:2:
      drivers/net/ethernet/qlogic/qlge/qlge_dbg.c:708:2: error: 'memcpy' reading 15 bytes from a region of size 14 [-Werror=stringop-overflow=]
        memcpy(seg_hdr->description, desc, (sizeof(seg_hdr->description)) - 1);
      
      Changing it to use strncpy() will instead zero-pad the destination,
      which seems to be the right thing to do here.
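A small userspace sketch of the strncpy() behaviour relied on here: it stops reading at the source string's terminator and fills the rest of the requested length with zero bytes. The struct layout and the 16-byte field size are assumptions for illustration, not the driver's definitions.

```c
#include <assert.h>
#include <string.h>

struct seg_hdr {
    char description[16];   /* fixed-size field; size chosen for illustration */
};

/* Copy a possibly shorter string into the fixed field without over-reading. */
static void set_description(struct seg_hdr *h, const char *desc)
{
    /* strncpy() zero-pads up to the given length when desc is shorter. */
    strncpy(h->description, desc, sizeof(h->description) - 1);
    h->description[sizeof(h->description) - 1] = '\0';
}
```

A memcpy() with the same length argument would instead read `sizeof(description) - 1` bytes from `desc` regardless of where the string ends.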
      
      The bug is probably harmless, but it seems like a good idea to address
      it in stable kernels as well, if only for the purpose of building with
      gcc-8 without warnings.
      
      Fixes: a61f8026 ("qlge: Add ethtool register dump function.")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio_net: be drop monitor friendly · dadc0736
      Eric Dumazet committed
      This change is needed to not fool drop monitor.
      (perf record ... -e skb:kfree_skb )
      
      Packets were properly sent and are consumed after TX completion.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · af57d2b7
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains Netfilter fixes for your net tree,
      they are:
      
      1) Fix use after free of struct proc_dir_entry in ipt_CLUSTERIP, patch
         from Sabrina Dubroca.
      
      2) Fix spurious EINVAL errors from iptables over nft compatibility layer.
      
      3) Reload pointer to ip header only if there is non-terminal verdict,
         ie. XT_CONTINUE, otherwise invalid memory access may happen, patch
         from Taehee Yoo.
      
      4) Fix interaction between SYNPROXY and NAT: SYNPROXY already adds the
         sequence adjustment, however nf_nat_setup() assumes it has not.
         Patch from Xin Long.
      
      5) Fix burst arithmetic in nft_limit as Joe Stringer mentioned during
         NFWS in Faro. Patch from Andy Zhou.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 24 Aug, 2017 17 commits
    • netfilter: nf_tables: Fix nft limit burst handling · c26844ed
      andy zhou committed
      Current implementation treats the burst configuration the same as
      rate configuration. This can cause the per packet cost to be lower
      than configured. In effect, this bug causes the token bucket to be
      refilled at a higher rate than what user has specified.
      
      This patch changes the implementation so that the token bucket size
      is controlled by "rate + burst", while keeping the token bucket
      refill rate the same as user specified.
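The intended behaviour can be sketched with a simplified token bucket. This is an illustration only, not the nft_limit code: `struct limit`, `limit_init()` and `limit_allow()` are hypothetical, time is counted in whole seconds, and one token equals one packet. The bucket size is `rate + burst`, while the refill rate stays at `rate`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct limit {
    uint64_t rate;      /* tokens added per second */
    uint64_t max;       /* bucket size: rate + burst */
    uint64_t tokens;    /* tokens currently available */
    uint64_t last;      /* time of last refill, in seconds */
};

static void limit_init(struct limit *l, uint64_t rate, uint64_t burst,
                       uint64_t now)
{
    l->rate = rate;
    l->max = rate + burst;      /* burst enlarges the bucket only */
    l->tokens = l->max;
    l->last = now;
}

/* Returns true if one packet may pass at time "now" (seconds). */
static bool limit_allow(struct limit *l, uint64_t now)
{
    uint64_t refill = (now - l->last) * l->rate;   /* refill at "rate" */
    uint64_t total = l->tokens + refill;

    l->last = now;
    l->tokens = total > l->max ? l->max : total;
    if (l->tokens == 0)
        return false;
    l->tokens--;
    return true;
}
```

With rate 2 and burst 3, five packets pass immediately, and afterwards only two more are admitted per elapsed second.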
      
      Fixes: 96518518 ("netfilter: add nftables")
      Signed-off-by: Andy Zhou <azhou@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: check for seqadj ext existence before adding it in nf_nat_setup_info · ab6dd1be
      Xin Long committed
      Commit 4440a2ab ("netfilter: synproxy: Check oom when adding synproxy
      and seqadj ct extensions") wanted to drop the packet when it fails to add
      seqadj ext due to no memory by checking if nfct_seqadj_ext_add returns
      NULL.
      
      But nfct_seqadj_ext_add() can also return NULL when the seqadj ext
      already exists in an nf_conn. As a result, the userspace protocol doesn't
      work when both dnat and snat are configured.
      
      Li Shuang found this issue in the case:
      
      Topo:
         ftp client                   router                  ftp server
        10.167.131.2  <-> 10.167.131.254  10.167.141.254 <-> 10.167.141.1
      
      Rules:
        # iptables -t nat -A PREROUTING -i eth1 -p tcp -m tcp --dport 21 -j \
          DNAT --to-destination 10.167.141.1
        # iptables -t nat -A POSTROUTING -o eth2 -p tcp -m tcp --dport 21 -j \
          SNAT --to-source 10.167.141.254
      
      In the router, when both dnat and snat are added, nf_nat_setup_info will be
      called twice. The packet can be dropped at the 2nd time for DNAT because the
      seqadj ext was already added at the 1st time for SNAT.
      
      This patch is to fix it by checking for seqadj ext existence before adding
      it, so that the packet will not be dropped if seqadj ext already exists.
      
      Note that as Florian mentioned, as a long term, we should review ext_add()
      behaviour, it's better to return a pointer to the existing ext instead.
      
      Fixes: 4440a2ab ("netfilter: synproxy: Check oom when adding synproxy and seqadj ct extensions")
      Reported-by: Li Shuang <shuali@redhat.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • Merge branch 'bnxt_en-bug-fixes' · d0273ef3
      David S. Miller committed
      Michael Chan says:
      
      ====================
      bnxt_en: bug fixes.
      
      3 bug fixes related to XDP ring accounting in bnxt_setup_tc(), freeing
      MSIX vectors when bnxt_re unregisters, and preserving the user-administered
      PF MAC address when disabling SRIOV.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Do not setup MAC address in bnxt_hwrm_func_qcaps(). · a22a6ac2
      Michael Chan committed
      bnxt_hwrm_func_qcaps() is called during probe to get all device
      resources and it also sets up the factory MAC address.  The same function
      is called when SRIOV is disabled to reclaim all resources.  If
      the MAC address has been overridden by a user administered MAC
      address, calling this function will overwrite it.
      
      Separate the logic that sets up the default MAC address into a new
      function bnxt_init_mac_addr() that is only called during probe time.
      
      Fixes: 4a21b49b ("bnxt_en: Improve VF resource accounting.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Free MSIX vectors when unregistering the device from bnxt_re. · 146ed3c5
      Michael Chan committed
      Take back ownership of the MSIX vectors when unregistering the device
      from bnxt_re.
      
      Fixes: a588e458 ("bnxt_en: Add interface to support RDMA driver.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnxt_en: Fix .ndo_setup_tc() to include XDP rings. · 87e9b377
      Michael Chan committed
      When the number of TX rings is changed in bnxt_setup_tc(), we need to
      include the XDP rings in the total TX ring count.
      
      Fixes: 38413406 ("bnxt_en: Add support for XDP_TX action.")
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: TX time stamp packets before HW doorbell is rung · 46f1c52e
      Jakub Kicinski committed
      TX completion may happen any time after HW queue was kicked.
      We can't access the skb afterwards.  Move the time stamping
      before ringing the doorbell.
      
      Fixes: 4c352362 ("net: add driver for Netronome NFP4000/NFP6000 NIC VFs")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Avoid out-of-bounds reads from address storage · ee6c88bb
      Stefano Brivio committed
      inet_diag_msg_sctp{,l}addr_fill() and sctp_get_sctp_info() copy
      sizeof(sockaddr_storage) bytes to fill in sockaddr structs used
      to export diagnostic information to userspace.
      
      However, the memory allocated to store sockaddr information is
      smaller than that and depends on the address family, so we leak
      up to 100 uninitialized bytes to userspace. Just use the size of
      the source structs instead, in all the three cases this is what
      userspace expects. Zero out the remaining memory.
      
      Unused bytes (i.e. when IPv4 addresses are used) in source
      structs sctp_sockaddr_entry and sctp_transport are already
      cleared by sctp_add_bind_addr() and sctp_transport_new(),
      respectively.
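The "copy only the source size, zero the rest" idea can be sketched in userspace C. This is a simplified illustration, not the sctp_diag code: `export_addr()` is a hypothetical helper, and the destination is assumed to be a slot of `sizeof(struct sockaddr_storage)` bytes.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Export an address into a sockaddr_storage-sized slot without over-reading. */
static void export_addr(void *dst, const void *src, size_t src_len)
{
    assert(src_len <= sizeof(struct sockaddr_storage));
    memset(dst, 0, sizeof(struct sockaddr_storage));  /* zero unused bytes */
    memcpy(dst, src, src_len);      /* read only what the source holds */
}
```

Copying `sizeof(struct sockaddr_storage)` bytes from a smaller source object would read past its end, which is the kind of access the KASAN report below flags.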
      
      Noticed while testing KASAN-enabled kernel with 'ss':
      
      [ 2326.885243] BUG: KASAN: slab-out-of-bounds in inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag] at addr ffff881be8779800
      [ 2326.896800] Read of size 128 by task ss/9527
      [ 2326.901564] CPU: 0 PID: 9527 Comm: ss Not tainted 4.11.0-22.el7a.x86_64 #1
      [ 2326.909236] Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
      [ 2326.917585] Call Trace:
      [ 2326.920312]  dump_stack+0x63/0x8d
      [ 2326.924014]  kasan_object_err+0x21/0x70
      [ 2326.928295]  kasan_report+0x288/0x540
      [ 2326.932380]  ? inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
      [ 2326.938500]  ? skb_put+0x8b/0xd0
      [ 2326.942098]  ? memset+0x31/0x40
      [ 2326.945599]  check_memory_region+0x13c/0x1a0
      [ 2326.950362]  memcpy+0x23/0x50
      [ 2326.953669]  inet_sctp_diag_fill+0x42c/0x6c0 [sctp_diag]
      [ 2326.959596]  ? inet_diag_msg_sctpasoc_fill+0x460/0x460 [sctp_diag]
      [ 2326.966495]  ? __lock_sock+0x102/0x150
      [ 2326.970671]  ? sock_def_wakeup+0x60/0x60
      [ 2326.975048]  ? remove_wait_queue+0xc0/0xc0
      [ 2326.979619]  sctp_diag_dump+0x44a/0x760 [sctp_diag]
      [ 2326.985063]  ? sctp_ep_dump+0x280/0x280 [sctp_diag]
      [ 2326.990504]  ? memset+0x31/0x40
      [ 2326.994007]  ? mutex_lock+0x12/0x40
      [ 2326.997900]  __inet_diag_dump+0x57/0xb0 [inet_diag]
      [ 2327.003340]  ? __sys_sendmsg+0x150/0x150
      [ 2327.007715]  inet_diag_dump+0x4d/0x80 [inet_diag]
      [ 2327.012979]  netlink_dump+0x1e6/0x490
      [ 2327.017064]  __netlink_dump_start+0x28e/0x2c0
      [ 2327.021924]  inet_diag_handler_cmd+0x189/0x1a0 [inet_diag]
      [ 2327.028045]  ? inet_diag_rcv_msg_compat+0x1b0/0x1b0 [inet_diag]
      [ 2327.034651]  ? inet_diag_dump_compat+0x190/0x190 [inet_diag]
      [ 2327.040965]  ? __netlink_lookup+0x1b9/0x260
      [ 2327.045631]  sock_diag_rcv_msg+0x18b/0x1e0
      [ 2327.050199]  netlink_rcv_skb+0x14b/0x180
      [ 2327.054574]  ? sock_diag_bind+0x60/0x60
      [ 2327.058850]  sock_diag_rcv+0x28/0x40
      [ 2327.062837]  netlink_unicast+0x2e7/0x3b0
      [ 2327.067212]  ? netlink_attachskb+0x330/0x330
      [ 2327.071975]  ? kasan_check_write+0x14/0x20
      [ 2327.076544]  netlink_sendmsg+0x5be/0x730
      [ 2327.080918]  ? netlink_unicast+0x3b0/0x3b0
      [ 2327.085486]  ? kasan_check_write+0x14/0x20
      [ 2327.090057]  ? selinux_socket_sendmsg+0x24/0x30
      [ 2327.095109]  ? netlink_unicast+0x3b0/0x3b0
      [ 2327.099678]  sock_sendmsg+0x74/0x80
      [ 2327.103567]  ___sys_sendmsg+0x520/0x530
      [ 2327.107844]  ? __get_locked_pte+0x178/0x200
      [ 2327.112510]  ? copy_msghdr_from_user+0x270/0x270
      [ 2327.117660]  ? vm_insert_page+0x360/0x360
      [ 2327.122133]  ? vm_insert_pfn_prot+0xb4/0x150
      [ 2327.126895]  ? vm_insert_pfn+0x32/0x40
      [ 2327.131077]  ? vvar_fault+0x71/0xd0
      [ 2327.134968]  ? special_mapping_fault+0x69/0x110
      [ 2327.140022]  ? __do_fault+0x42/0x120
      [ 2327.144008]  ? __handle_mm_fault+0x1062/0x17a0
      [ 2327.148965]  ? __fget_light+0xa7/0xc0
      [ 2327.153049]  __sys_sendmsg+0xcb/0x150
      [ 2327.157133]  ? __sys_sendmsg+0xcb/0x150
      [ 2327.161409]  ? SyS_shutdown+0x140/0x140
      [ 2327.165688]  ? exit_to_usermode_loop+0xd0/0xd0
      [ 2327.170646]  ? __do_page_fault+0x55d/0x620
      [ 2327.175216]  ? __sys_sendmsg+0x150/0x150
      [ 2327.179591]  SyS_sendmsg+0x12/0x20
      [ 2327.183384]  do_syscall_64+0xe3/0x230
      [ 2327.187471]  entry_SYSCALL64_slow_path+0x25/0x25
      [ 2327.192622] RIP: 0033:0x7f41d18fa3b0
      [ 2327.196608] RSP: 002b:00007ffc3b731218 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      [ 2327.205055] RAX: ffffffffffffffda RBX: 00007ffc3b731380 RCX: 00007f41d18fa3b0
      [ 2327.213017] RDX: 0000000000000000 RSI: 00007ffc3b731340 RDI: 0000000000000003
      [ 2327.220978] RBP: 0000000000000002 R08: 0000000000000004 R09: 0000000000000040
      [ 2327.228939] R10: 00007ffc3b730f30 R11: 0000000000000246 R12: 0000000000000003
      [ 2327.236901] R13: 00007ffc3b731340 R14: 00007ffc3b7313d0 R15: 0000000000000084
      [ 2327.244865] Object at ffff881be87797e0, in cache kmalloc-64 size: 64
      [ 2327.251953] Allocated:
      [ 2327.254581] PID = 9484
      [ 2327.257215]  save_stack_trace+0x1b/0x20
      [ 2327.261485]  save_stack+0x46/0xd0
      [ 2327.265179]  kasan_kmalloc+0xad/0xe0
      [ 2327.269165]  kmem_cache_alloc_trace+0xe6/0x1d0
      [ 2327.274138]  sctp_add_bind_addr+0x58/0x180 [sctp]
      [ 2327.279400]  sctp_do_bind+0x208/0x310 [sctp]
      [ 2327.284176]  sctp_bind+0x61/0xa0 [sctp]
      [ 2327.288455]  inet_bind+0x5f/0x3a0
      [ 2327.292151]  SYSC_bind+0x1a4/0x1e0
      [ 2327.295944]  SyS_bind+0xe/0x10
      [ 2327.299349]  do_syscall_64+0xe3/0x230
      [ 2327.303433]  return_from_SYSCALL_64+0x0/0x6a
      [ 2327.308194] Freed:
      [ 2327.310434] PID = 4131
      [ 2327.313065]  save_stack_trace+0x1b/0x20
      [ 2327.317344]  save_stack+0x46/0xd0
      [ 2327.321040]  kasan_slab_free+0x73/0xc0
      [ 2327.325220]  kfree+0x96/0x1a0
      [ 2327.328530]  dynamic_kobj_release+0x15/0x40
      [ 2327.333195]  kobject_release+0x99/0x1e0
      [ 2327.337472]  kobject_put+0x38/0x70
      [ 2327.341266]  free_notes_attrs+0x66/0x80
      [ 2327.345545]  mod_sysfs_teardown+0x1a5/0x270
      [ 2327.350211]  free_module+0x20/0x2a0
      [ 2327.354099]  SyS_delete_module+0x2cb/0x2f0
      [ 2327.358667]  do_syscall_64+0xe3/0x230
      [ 2327.362750]  return_from_SYSCALL_64+0x0/0x6a
      [ 2327.367510] Memory state around the buggy address:
      [ 2327.372855]  ffff881be8779700: fc fc fc fc 00 00 00 00 00 00 00 00 fc fc fc fc
      [ 2327.380914]  ffff881be8779780: fb fb fb fb fb fb fb fb fc fc fc fc 00 00 00 00
      [ 2327.388972] >ffff881be8779800: 00 00 00 00 fc fc fc fc fb fb fb fb fb fb fb fb
      [ 2327.397031]                                ^
      [ 2327.401792]  ffff881be8779880: fc fc fc fc fb fb fb fb fb fb fb fb fc fc fc fc
      [ 2327.409850]  ffff881be8779900: 00 00 00 00 00 04 fc fc fc fc fc fc 00 00 00 00
      [ 2327.417907] ==================================================================
      
      This fixes CVE-2017-7558.
      
      References: https://bugzilla.redhat.com/show_bug.cgi?id=1480266
      Fixes: 8f840e47 ("sctp: add the sctp_diag.c file")
      Cc: Xin Long <lucien.xin@gmail.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: use consume_skb() · 2b33bc8a
      Eric Dumazet committed
      Two kfree_skb() calls should be consume_skb(), to be friendly to drop monitor
      (perf record ... -e skb:kfree_skb)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'nfp-fixes' · 40e607cb
      David S. Miller committed
      Jakub Kicinski says:
      
      ====================
      nfp: fix SR-IOV deadlock and representor bugs
      
      This series tackles the bug I've already tried to fix in commit
      6d48ceb2 ("nfp: allocate a private workqueue for driver work").
      I created a separate workqueue to avoid possible deadlock, and
      the lockdep error disappeared, coincidentally.  The way workqueues
      are operating, separate workqueue doesn't necessarily mean separate
      thread of execution.  Luckily we can safely forego the lock.
      
      Second fix changes the order in which vNIC netdevs and representors
      are created/destroyed.  The fix is kept small and should be sufficient
      for net because of how flower uses representors, a more thorough fix
      will be targeted at net-next.
      
      Third fix avoids leaking mapped frame buffers if FW sent a frame with
      unknown portid.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: avoid buffer leak when representor is missing · 1691a4c0
      Jakub Kicinski committed
      When the driver receives a muxed frame but can't find the representor
      netdev it is destined to, it will try to "drop" that frame, i.e. reuse
      the buffer.  The issue is that the replacement buffer has already been
      allocated at this point, and reusing the buffer from received frame
      will leak it.  Change the code to put the new buffer on the ring
      earlier and not reuse the old buffer (make the buffer parameter
      to nfp_net_rx_drop() a NULL).
      
      Fixes: 91bf82ca ("nfp: add support for tx/rx with metadata portid")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: make sure representors are destroyed before their lower netdev · 326ce603
      Jakub Kicinski committed
      App start/stop callbacks can perform application initialization.
      Unfortunately, flower app started using them for creating and
      destroying representors.  This can lead to a situation where
      lower vNIC netdev is destroyed while representors still try
      to pass traffic.  This will most likely lead to a NULL-dereference
      on the lower netdev TX path.
      
      Move the start/stop callbacks, so that representors are created/
      destroyed when vNICs are fully initialized.
      
      Fixes: 5de73ee4 ("nfp: general representor implementation")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: don't hold PF lock while enabling SR-IOV · d6e1ab9e
      Jakub Kicinski committed
      Enabling SR-IOV VFs will cause the PCI subsystem to schedule a
      work and flush its workqueue.  Since the nfp driver schedules its
      own work we can't enable VFs while holding the driver lock.  Commit
      6d48ceb2 ("nfp: allocate a private workqueue for driver work")
      tried to avoid this deadlock by creating a separate workqueue.
      Unfortunately, due to the architecture of workqueue subsystem this
      does not guarantee a separate thread of execution.  Luckily
      we can simply take pci_enable_sriov() from under the driver lock.
      
      Take pci_disable_sriov() from under the lock too for symmetry.
      
      Fixes: 6d48ceb2 ("nfp: allocate a private workqueue for driver work")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'dst-tag-ksz-fix' · 2f19f50e
      David S. Miller committed
      Florian Fainelli says:
      
      ====================
      net: dsa: Fix tag_ksz.c
      
      This implements David's suggestion of providing low-level functions
      to control whether skb_pad() and skb_put_padto() should be freeing
      the passed skb.
      
      We make use of it to fix a double free in net/dsa/tag_ksz.c that would
      occur if we kept using skb_put_padto() in both places.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: skb_put_padto() already frees nskb · 49716679
      Florian Fainelli committed
      The first call of skb_put_padto() will free up the SKB on error, but we
      return NULL which tells dsa_slave_xmit() that the original SKB should be
      freed so this would lead to a double free here.
      
      The second skb_put_padto() already frees the passed sk_buff reference
      upon error, so calling kfree_skb() on it again is not necessary.
      
      Detected by CoverityScan, CID#1416687 ("USE_AFTER_FREE")
      
      Fixes: e71cb9e0 ("net: dsa: ksz: fix skb freeing")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: core: Specify skb_pad()/skb_put_padto() SKB freeing · cd0a137a
      Florian Fainelli committed
      Rename skb_pad() into __skb_pad() and make it take a third argument:
      free_on_error which controls whether kfree_skb() should be called or
      not, skb_pad() directly makes use of it and passes true to preserve its
      existing behavior. Do exactly the same thing with __skb_put_padto() and
      skb_put_padto().
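The wrapper pattern described above can be sketched in userspace C. The names here (`struct pkt`, `pkt_free()`, `do_pkt_pad_to()`, `pkt_pad_to()`) are hypothetical stand-ins, not the kernel helpers: a low-level function takes a `free_on_error` argument, and the public wrapper passes true to keep the historical behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct pkt {
    unsigned char *data;
    size_t len;     /* bytes in use */
    size_t cap;     /* bytes allocated */
};

static void pkt_free(struct pkt *p)
{
    free(p->data);
    free(p);
}

/* Zero-pad to "len" bytes; on failure, free the packet only if asked to. */
static int do_pkt_pad_to(struct pkt *p, size_t len, bool free_on_error)
{
    if (len > p->cap) {
        if (free_on_error)
            pkt_free(p);
        return -1;
    }
    if (len > p->len) {
        memset(p->data + p->len, 0, len - p->len);
        p->len = len;
    }
    return 0;
}

/* Existing behaviour preserved: the packet is always freed on error. */
static int pkt_pad_to(struct pkt *p, size_t len)
{
    return do_pkt_pad_to(p, len, true);
}
```

A caller that must free the packet itself on error calls the low-level variant with false, which avoids a double free.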
      Suggested-by: David Miller <davem@davemloft.net>
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: stmmac: socfgpa: Ensure emac bit set in sys manager for MII/GMII/SGMII. · 013dae5d
      Stephan Gatzka committed
      When using MII/GMII/SGMII in the Altera SoC, the phy needs to be
      wired through the FPGA. To ensure correct behavior, the appropriate
      bit in the System Manager FPGA Interface Group register needs to be
      set.
      Signed-off-by: Stephan Gatzka <stephan.gatzka@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 23 Aug, 2017 12 commits
    • fsl/man: Inherit parent device and of_node · a1a50c8e
      Florian Fainelli committed
      Junote Cai reported that he was not able to get a DSA setup involving the
      Freescale DPAA/FMAN driver to work and narrowed it down to
      of_find_net_device_by_node(). This function requires the network device's
      device reference to be correctly set which is the case here, though we have
      lost any device_node association there.
      
      The problem is that dpaa_eth_add_device() allocates a "dpaa-ethernet" platform
      device, and later on dpaa_eth_probe() is called but SET_NETDEV_DEV() won't be
      propagating &pdev->dev.of_node properly. Fix this by inheriting both the parent
      device and the of_node when dpaa_eth_add_device() creates the platform device.
      
      Fixes: 39339616 ("fsl/fman: Add FMan MAC driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1a50c8e
    • D
      bpf: fix map value attribute for hash of maps · 33ba43ed
      Daniel Borkmann committed
      Currently, iproute2's BPF ELF loader works fine with array of maps
      when retrieving the fd from a pinned node and doing a selfcheck
      against the provided map attributes from the object file, but we
      fail to do the same for hash of maps and thus refuse to get the
      map from pinned node.
      
      Reason is that when allocating hash of maps, fd_htab_map_alloc() will
      set the value size to sizeof(void *), and any user space map creation
      requests are forced to set 4 bytes as value size. Thus, selfcheck
      will complain about exposed 8 bytes on 64 bit archs vs. 4 bytes from
      object file as value size. Contract is that fdinfo or BPF_MAP_GET_FD_BY_ID
      returns the value size used to create the map.
      
      Fix it by handling it the same way as we do for array of maps, which
      means that we leave value size at 4 bytes and in the allocation phase
      round up value size to 8 bytes. alloc_htab_elem() needs an adjustment
      in order to copy rounded up 8 bytes due to bpf_fd_htab_map_update_elem()
      calling into htab_map_update_elem() with the pointer of the map
      pointer as value. Unlike array of maps, where we just xchg(), we use
      the generic htab_map_update_elem() callback, which is also used from
      helper calls and has already published the key/value on return, so we
      need to make sure we memcpy() the right size.
      
      Fixes: bcc6b1b7 ("bpf: Add hash of maps support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33ba43ed
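The size handling described in the commit above (report 4 bytes to user space, reserve 8 bytes internally) can be illustrated with a small user-space sketch. The names `struct toy_map`, `toy_fd_map_create()` and `round_up_pow2()` are hypothetical and exist only for this example; the real kernel code is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of align (align must be a power of two). */
static size_t round_up_pow2(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Toy map descriptor: keeps the user-visible size and the internal
 * per-element storage size as two separate fields. */
struct toy_map {
    uint32_t value_size;   /* what user space requested and reads back */
    size_t   slot_size;    /* what is actually reserved per element */
};

static struct toy_map toy_fd_map_create(uint32_t value_size)
{
    struct toy_map m = {
        .value_size = value_size,                    /* left untouched */
        .slot_size  = round_up_pow2(value_size, 8),  /* room for a pointer */
    };
    return m;
}
```

Keeping the two sizes apart is what lets a self-check against the object file's 4-byte value size succeed while the storage still holds an 8-byte pointer; any copy into the slot then has to use `slot_size`, not `value_size`.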
    • F
      net: phy: Deal with unbound PHY driver in phy_attached_print() · fcd03e36
      Florian Fainelli committed
      Priit reported that stmmac was crashing with the trace below. This is because
      phy_attached_print() is called too early right after the PHY device has been
      found, but before it has a driver attached, since that is only done in
      phy_probe() which occurs later.
      
      Fix this by dealing with a possibly NULL phydev->drv pointer, since that
      can happen here, but could also happen if we voluntarily unbound the
      PHY device from the PHY driver.
      
      sun7i-dwmac 1c50000.ethernet: PTP uses main clock
      sun7i-dwmac 1c50000.ethernet: no reset control found
      sun7i-dwmac 1c50000.ethernet: no regulator found
      sun7i-dwmac 1c50000.ethernet: Ring mode enabled
      sun7i-dwmac 1c50000.ethernet: DMA HW capability register supported
      sun7i-dwmac 1c50000.ethernet: Normal descriptors
      libphy: stmmac: probed
      Unable to handle kernel NULL pointer dereference at virtual address 00000048
      pgd = c0004000
      [00000048] *pgd=00000000
      Internal error: Oops: 5 [#1] SMP ARM
      Modules linked in:
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc6-00318-g0065bd7fa384 #1
      Hardware name: Allwinner sun7i (A20) Family
      task: ee868000 task.stack: ee85c000
      PC is at phy_attached_print+0x1c/0x8c
      LR is at stmmac_mdio_register+0x12c/0x200
      pc : [<c04510ac>]    lr : [<c045e6b4>]    psr: 60000013
      sp : ee85ddc8  ip : 00000000  fp : c07dfb5c
      r10: ee981210  r9 : 00000001  r8 : eea73000
      r7 : eeaa6dd0  r6 : eeb49800  r5 : 00000000  r4 : 00000000
      r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : eeb49800
      Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
      Control: 10c5387d  Table: 4000406a  DAC: 00000051
      Process swapper/0 (pid: 1, stack limit = 0xee85c210)
      Stack: (0xee85ddc8 to 0xee85e000)
      ddc0:                   00000000 00000002 eeb49400 eea72000 00000000 eeb49400
      dde0: c045e6b4 00000000 ffffffff eeab0810 00000000 c08051f8 ee9292c0 c016d480
      de00: eea725c0 eea73000 eea72000 00000001 eea726c0 c0457d0c 00000040 00000020
      de20: 00000000 c045b850 00000001 00000000 ee981200 eeab0810 eeaa6ed0 ee981210
      de40: 00000000 c094a4a0 00000000 c0465180 eeaa7550 f08d0000 c9ffb90c 00000032
      de60: fffffffa 00000032 ee981210 ffffffed c0a46620 fffffdfb c0a46620 c03f7be8
      de80: ee981210 c0a9a388 00000000 00000000 c0a46620 c03f63e0 ee981210 c0a46620
      dea0: ee981244 00000000 00000007 000000c6 c094a4a0 c03f6534 00000000 c0a46620
      dec0: c03f6490 c03f49ec ee828a58 ee9217b4 c0a46620 eeaa4b00 c0a43230 c03f59fc
      dee0: c08051f8 c094a49c c0a46620 c0a46620 00000000 c091c668 c093783c c03f6dfc
      df00: ffffe000 00000000 c091c668 c010177c eefe0938 eefe0935 c085e200 000000c6
      df20: 00000005 c0136bc8 60000013 c080b3a4 00000006 00000006 c07ce7b4 00000000
      df40: c07d7ddc c07cef28 eefe0938 eefe093e c0a0b2f0 c0a641c0 c0a641c0 c0a641c0
      df60: c0937834 00000007 000000c6 c094a4a0 00000000 c0900d88 00000006 00000006
      df80: 00000000 c09005a8 00000000 c060ecf4 00000000 00000000 00000000 00000000
      dfa0: 00000000 c060ecfc 00000000 c0107738 00000000 00000000 00000000 00000000
      dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
      dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 ffdeffff ffffffff
      [<c04510ac>] (phy_attached_print) from [<c045e6b4>] (stmmac_mdio_register+0x12c/0x200)
      [<c045e6b4>] (stmmac_mdio_register) from [<c045b850>] (stmmac_dvr_probe+0x850/0x96c)
      [<c045b850>] (stmmac_dvr_probe) from [<c0465180>] (sun7i_gmac_probe+0x120/0x180)
      [<c0465180>] (sun7i_gmac_probe) from [<c03f7be8>] (platform_drv_probe+0x50/0xac)
      [<c03f7be8>] (platform_drv_probe) from [<c03f63e0>] (driver_probe_device+0x234/0x2e4)
      [<c03f63e0>] (driver_probe_device) from [<c03f6534>] (__driver_attach+0xa4/0xa8)
      [<c03f6534>] (__driver_attach) from [<c03f49ec>] (bus_for_each_dev+0x4c/0x9c)
      [<c03f49ec>] (bus_for_each_dev) from [<c03f59fc>] (bus_add_driver+0x190/0x214)
      [<c03f59fc>] (bus_add_driver) from [<c03f6dfc>] (driver_register+0x78/0xf4)
      [<c03f6dfc>] (driver_register) from [<c010177c>] (do_one_initcall+0x44/0x168)
      [<c010177c>] (do_one_initcall) from [<c0900d88>] (kernel_init_freeable+0x144/0x1d0)
      [<c0900d88>] (kernel_init_freeable) from [<c060ecfc>] (kernel_init+0x8/0x110)
      [<c060ecfc>] (kernel_init) from [<c0107738>] (ret_from_fork+0x14/0x3c)
      Code: e59021c8 e59d401c e590302c e3540000 (e5922048)
      ---[ end trace 39ae87c7923562d0 ]---
      Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
      Tested-By: Priit Laes <plaes@plaes.org>
      Fixes: fbca1647 ("net: stmmac: Use the right logging function in stmmac_mdio_register")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fcd03e36
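The guard described in the commit above amounts to checking the driver pointer before dereferencing it. A minimal user-space sketch, assuming hypothetical types `struct toy_driver` and `struct toy_phy` and a placeholder string chosen for this example:

```c
#include <assert.h>
#include <stddef.h>

struct toy_driver {
    const char *name;
};

struct toy_phy {
    const struct toy_driver *drv;   /* NULL until a driver is bound */
};

/* Return a printable driver name without ever dereferencing a NULL drv.
 * The "unbound" placeholder is an assumption of this sketch. */
static const char *toy_phy_drv_name(const struct toy_phy *phy)
{
    assert(phy != NULL);
    return phy->drv ? phy->drv->name : "unbound";
}
```

The same helper covers both cases named in the commit message: a device that has been found but not yet probed, and a device that was deliberately unbound from its driver.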
    • D
      Merge branch 'net-sched-couple-of-chain-fixes' · e188245d
      David S. Miller committed
      Jiri Pirko says:
      
      ====================
      net: sched: couple of chain fixes
      
      Jiri Pirko (2):
        net: sched: fix use after free when tcf_chain_destroy is called
          multiple times
        net: sched: don't do tcf_chain_flush from tcf_chain_destroy
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e188245d
    • J
      net: sched: don't do tcf_chain_flush from tcf_chain_destroy · 30d65e8f
      Jiri Pirko committed
      tcf_chain_flush needs to be called with RTNL. However, on
      free_tcf->
       tcf_action_goto_chain_fini->
        tcf_chain_put->
         tcf_chain_destroy->
          tcf_chain_flush
      call path, it is called without RTNL.
      This issue was reported by the following warning:
      
      [  155.599052] WARNING: suspicious RCU usage
      [  155.603165] 4.13.0-rc5jiri+ #54 Not tainted
      [  155.607456] -----------------------------
      [  155.611561] net/sched/cls_api.c:195 suspicious rcu_dereference_protected() usage!
      
      Since on this call path the chain is already guaranteed to be empty
      by the check in tcf_chain_put, move the tcf_chain_flush call out and call
      it only where it is needed, in tcf_block_put.
      
      Fixes: db50514f ("net: sched: add termination action to allow goto chain")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30d65e8f
    • J
      net: sched: fix use after free when tcf_chain_destroy is called multiple times · 744a4cf6
      Jiri Pirko committed
      The goto_chain termination action takes a reference on a chain. In that
      case, there is an issue when block_put calls tcf_chain_destroy
      directly. The follow-up call of tcf_chain_put, made when the goto_chain
      action is freed, works with memory that is already freed. This was caught
      by KASAN:
      
      [  220.337908] BUG: KASAN: use-after-free in tcf_chain_put+0x1b/0x50
      [  220.344103] Read of size 4 at addr ffff88036d1f2cec by task systemd-journal/261
      [  220.353047] CPU: 0 PID: 261 Comm: systemd-journal Not tainted 4.13.0-rc5jiri+ #54
      [  220.360661] Hardware name: Mellanox Technologies Ltd. Mellanox switch/Mellanox x86 mezzanine board, BIOS 4.6.5 08/02/2016
      [  220.371784] Call Trace:
      [  220.374290]  <IRQ>
      [  220.376355]  dump_stack+0xd5/0x150
      [  220.391485]  print_address_description+0x86/0x410
      [  220.396308]  kasan_report+0x181/0x4c0
      [  220.415211]  tcf_chain_put+0x1b/0x50
      [  220.418949]  free_tcf+0x95/0xc0
      
      So allow tcf_chain_destroy to be called multiple times, free only in
      case the reference count drops to 0.
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      744a4cf6
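The rule stated in the commit above (several teardown paths may run, but memory is released only when the reference count reaches 0) can be sketched as follows. `struct toy_chain` and its helpers are hypothetical names for this example; the sketch is single-threaded and uses a plain counter, not the kernel's actual data structures or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_chain {
    unsigned int refcnt;
};

static struct toy_chain *toy_chain_create(void)
{
    struct toy_chain *c = malloc(sizeof(*c));

    assert(c);
    c->refcnt = 1;   /* reference owned by the block */
    return c;
}

/* Taken by a user that needs the chain to stay alive, such as an
 * action that jumps to it. */
static void toy_chain_hold(struct toy_chain *c)
{
    c->refcnt++;
}

/* Drop one reference. The chain is freed only when the last reference
 * goes away; returns true if the memory was released. */
static bool toy_chain_put(struct toy_chain *c)
{
    assert(c->refcnt > 0);
    if (--c->refcnt > 0)
        return false;
    free(c);
    return true;
}
```

With this shape, block teardown and the later release by an action can each drop their own reference in either order, and neither touches freed memory.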
    • E
      udp: on peeking bad csum, drop packets even if not at head · fd6055a8
      Eric Dumazet committed
      When peeking, if a bad csum is discovered, the skb is unlinked from
      the queue with __sk_queue_drop_skb and the peek operation restarted.
      
      __sk_queue_drop_skb only drops packets that match the queue head.
      
      This fails if the skb was found after the head, using SO_PEEK_OFF
      socket option. This causes an infinite loop.
      
      We MUST drop this problematic skb, and we can simply check if skb was
      already removed by another thread, by looking at skb->next:

      This pointer is set to NULL by the __skb_unlink() operation, which can
      only have happened under the spinlock protection.
      
      Many thanks to syzkaller team (and particularly Dmitry Vyukov who
      provided us nice C reproducers exhibiting the lockup) and Willem de
      Bruijn who provided first version for this patch and a test program.
      
      Fixes: 627d2d6b ("udp: enable MSG_PEEK at non-zero offset")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd6055a8
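The check described in the commit above (treat a NULL `next` pointer as "already unlinked" instead of comparing against the queue head) can be sketched with a toy circular list. All names here (`struct toy_skb`, `struct toy_queue`, `toy_drop()` and friends) are hypothetical, and the spinlock that protects the real queue is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_skb {
    struct toy_skb *next, *prev;
};

/* Circular doubly linked list with a sentinel node. */
struct toy_queue {
    struct toy_skb head;
    unsigned int qlen;
};

static void toy_queue_init(struct toy_queue *q)
{
    q->head.next = q->head.prev = &q->head;
    q->qlen = 0;
}

static void toy_queue_tail(struct toy_queue *q, struct toy_skb *skb)
{
    skb->prev = q->head.prev;
    skb->next = &q->head;
    q->head.prev->next = skb;
    q->head.prev = skb;
    q->qlen++;
}

/* Unlink and clear the link pointers so that "already removed" can be
 * detected later by looking at skb->next. */
static void toy_unlink(struct toy_queue *q, struct toy_skb *skb)
{
    assert(q->qlen > 0);
    skb->prev->next = skb->next;
    skb->next->prev = skb->prev;
    skb->next = skb->prev = NULL;
    q->qlen--;
}

/* Drop skb wherever it sits in the queue, not only at the head.
 * Returns false if someone else already unlinked it. */
static bool toy_drop(struct toy_queue *q, struct toy_skb *skb)
{
    if (skb->next == NULL)
        return false;
    toy_unlink(q, skb);
    return true;
}
```

A head-only comparison would refuse to drop a bad packet found at a non-zero peek offset, so the peek would find the same packet again on every retry; the `next == NULL` test removes that dependency on queue position.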
    • S
      macsec: add genl family module alias · 78362998
      Sabrina Dubroca committed
      This lets tools such as wpa_supplicant start even if the macsec
      module isn't loaded yet.
      
      Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      78362998
    • D
      Merge branch 'tipc-topology-server-fixes' · bfe9a6d7
      David S. Miller committed
      Parthasarathy Bhuvaragan says:
      
      ====================
      tipc: topology server fixes
      
      The following commits fix two race conditions causing general
      protection faults.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfe9a6d7
    • Y
      tipc: fix a race condition of releasing subscriber object · fd849b7c
      Ying Xue committed
      Whether a request is inserted into the workqueue as a work item to
      cancel a subscription or to delete a subscription's subscriber
      asynchronously, the work items may be executed by different workers.
      As a result, a request that is raised before another request is not
      guaranteed to be handled first. If the later request is executed
      before the earlier one, the error below may happen:
      
      [  656.183644] BUG: spinlock bad magic on CPU#0, kworker/u8:0/12117
      [  656.184487] general protection fault: 0000 [#1] SMP
      [  656.185160] Modules linked in: tipc ip6_udp_tunnel udp_tunnel 9pnet_virtio 9p 9pnet virtio_net virtio_pci virtio_ring virtio [last unloaded: ip6_udp_tunnel]
      [  656.187003] CPU: 0 PID: 12117 Comm: kworker/u8:0 Not tainted 4.11.0-rc7+ #6
      [  656.187920] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  656.188690] Workqueue: tipc_rcv tipc_recv_work [tipc]
      [  656.189371] task: ffff88003f5cec40 task.stack: ffffc90004448000
      [  656.190157] RIP: 0010:spin_bug+0xdd/0xf0
      [  656.190678] RSP: 0018:ffffc9000444bcb8 EFLAGS: 00010202
      [  656.191375] RAX: 0000000000000034 RBX: ffff88003f8d1388 RCX: 0000000000000000
      [  656.192321] RDX: ffff88003ba13708 RSI: ffff88003ba0cd08 RDI: ffff88003ba0cd08
      [  656.193265] RBP: ffffc9000444bcd0 R08: 0000000000000030 R09: 000000006b6b6b6b
      [  656.194208] R10: ffff8800bde3e000 R11: 00000000000001b4 R12: 6b6b6b6b6b6b6b6b
      [  656.195157] R13: ffffffff81a3ca64 R14: ffff88003f8d1388 R15: ffff88003f8d13a0
      [  656.196101] FS:  0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
      [  656.197172] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  656.197935] CR2: 00007f0b3d2e6000 CR3: 000000003ef9e000 CR4: 00000000000006f0
      [  656.198873] Call Trace:
      [  656.199210]  do_raw_spin_lock+0x66/0xa0
      [  656.199735]  _raw_spin_lock_bh+0x19/0x20
      [  656.200258]  tipc_subscrb_subscrp_delete+0x28/0xf0 [tipc]
      [  656.200990]  tipc_subscrb_rcv_cb+0x45/0x260 [tipc]
      [  656.201632]  tipc_receive_from_sock+0xaf/0x100 [tipc]
      [  656.202299]  tipc_recv_work+0x2b/0x60 [tipc]
      [  656.202872]  process_one_work+0x157/0x420
      [  656.203404]  worker_thread+0x69/0x4c0
      [  656.203898]  kthread+0x138/0x170
      [  656.204328]  ? process_one_work+0x420/0x420
      [  656.204889]  ? kthread_create_on_node+0x40/0x40
      [  656.205527]  ret_from_fork+0x29/0x40
      [  656.206012] Code: 48 8b 0c 25 00 c5 00 00 48 c7 c7 f0 24 a3 81 48 81 c1 f0 05 00 00 65 8b 15 61 ef f5 7e e8 9a 4c 09 00 4d 85 e4 44 8b 4b 08 74 92 <45> 8b 84 24 40 04 00 00 49 8d 8c 24 f0 05 00 00 eb 8d 90 0f 1f
      [  656.208504] RIP: spin_bug+0xdd/0xf0 RSP: ffffc9000444bcb8
      [  656.209798] ---[ end trace e2a800e6eb0770be ]---
      
      In the above scenario, the request to delete the subscriber was
      performed earlier than the request to cancel a subscription, although
      the latter was issued first, which means tipc_subscrb_delete() was
      called before tipc_subscrp_cancel(). As a result, when
      tipc_subscrb_subscrp_delete() was called by tipc_subscrp_cancel() to
      cancel a subscription, the subscriber's refcnt had already been
      decreased to 1. tipc_subscrp_delete() then decremented the refcnt to
      zero and freed the subscriber, but the subscriber's lock still had to
      be released afterwards, and as a consequence the panic happened.

      By contrast, if we increase the subscriber's refcnt before
      tipc_subscrb_subscrp_delete() is called in tipc_subscrp_cancel(),
      the panic can be avoided.
      
      Fixes: d094c4d5 ("tipc: add subscription refcount to avoid invalid delete")
      Reported-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd849b7c
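The idea in the commit above, taking an extra reference on the subscriber before running a delete path that may drop the last subscription reference while the subscriber's lock is still held, can be sketched as follows. Everything here (`struct toy_subscriber`, the `locked` flag standing in for a spinlock, the `toy_freed` counter) is a hypothetical, single-threaded model, not the TIPC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_subscriber {
    int refcnt;
    bool locked;   /* stands in for the subscriber's spinlock */
};

static int toy_freed;   /* number of subscribers released so far */

static struct toy_subscriber *toy_subscriber_new(void)
{
    struct toy_subscriber *s = malloc(sizeof(*s));

    assert(s);
    s->refcnt = 1;   /* reference held by one subscription */
    s->locked = false;
    return s;
}

static void toy_subscriber_get(struct toy_subscriber *s)
{
    s->refcnt++;
}

/* Drop one reference; returns true if the subscriber was freed. */
static bool toy_subscriber_put(struct toy_subscriber *s)
{
    if (--s->refcnt > 0)
        return false;
    free(s);
    toy_freed++;
    return true;
}

/* Removes a subscription under the subscriber's lock. The unlock touches
 * the subscriber, so it must still be alive after the put. */
static void toy_subscription_delete(struct toy_subscriber *s)
{
    bool freed;

    s->locked = true;
    freed = toy_subscriber_put(s);   /* the subscription's reference */
    assert(!freed);                  /* caller must hold an extra one */
    s->locked = false;
}

/* Cancel path: pin the subscriber across the delete, then release. */
static void toy_subscription_cancel(struct toy_subscriber *s)
{
    toy_subscriber_get(s);
    toy_subscription_delete(s);
    toy_subscriber_put(s);
}
```

Without the `toy_subscriber_get()` call, a subscriber whose count had already fallen to 1 would be freed inside the delete path and the unlock would write to freed memory, which is the shape of the failure the commit message describes.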
    • P
      tipc: remove subscription references only for pending timers · 458be024
      Parthasarathy Bhuvaragan committed
      In commit 139bb36f ("tipc: advance the time of deleting
      subscription from subscriber->subscrp_list"), we delete the
      subscription from the subscriber's list and from the nametable
      unconditionally. This leads to the following bug if the timer
      running tipc_subscrp_timeout() on another CPU accesses the
      subscription list after the subscription delete request.
      
      [39.570] general protection fault: 0000 [#1] SMP
      ::
      [39.574] task: ffffffff81c10540 task.stack: ffffffff81c00000
      [39.575] RIP: 0010:tipc_subscrp_timeout+0x32/0x80 [tipc]
      [39.576] RSP: 0018:ffff88003ba03e90 EFLAGS: 00010282
      [39.576] RAX: dead000000000200 RBX: ffff88003f0f3600 RCX: 0000000000000101
      [39.577] RDX: dead000000000100 RSI: 0000000000000201 RDI: ffff88003f0d7948
      [39.578] RBP: ffff88003ba03ea0 R08: 0000000000000001 R09: ffff88003ba03ef8
      [39.579] R10: 000000000000014f R11: 0000000000000000 R12: ffff88003f0d7948
      [39.580] R13: ffff88003f0f3618 R14: ffffffffa006c250 R15: ffff88003f0f3600
      [39.581] FS:  0000000000000000(0000) GS:ffff88003ba00000(0000) knlGS:0000000000000000
      [39.582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [39.583] CR2: 00007f831c6e0714 CR3: 000000003d3b0000 CR4: 00000000000006f0
      [39.584] Call Trace:
      [39.584]  <IRQ>
      [39.585]  call_timer_fn+0x3d/0x180
      [39.585]  ? tipc_subscrb_rcv_cb+0x260/0x260 [tipc]
      [39.586]  run_timer_softirq+0x168/0x1f0
      [39.586]  ? sched_clock_cpu+0x16/0xc0
      [39.587]  __do_softirq+0x9b/0x2de
      [39.587]  irq_exit+0x60/0x70
      [39.588]  smp_apic_timer_interrupt+0x3d/0x50
      [39.588]  apic_timer_interrupt+0x86/0x90
      [39.589] RIP: 0010:default_idle+0x20/0xf0
      [39.589] RSP: 0018:ffffffff81c03e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10
      [39.590] RAX: 0000000000000000 RBX: ffffffff81c10540 RCX: 0000000000000000
      [39.591] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [39.592] RBP: ffffffff81c03e68 R08: 0000000000000000 R09: 0000000000000000
      [39.593] R10: ffffc90001cbbe00 R11: 0000000000000000 R12: 0000000000000000
      [39.594] R13: ffffffff81c10540 R14: 0000000000000000 R15: 0000000000000000
      [39.595]  </IRQ>
      ::
      [39.603] RIP: tipc_subscrp_timeout+0x32/0x80 [tipc] RSP: ffff88003ba03e90
      [39.604] ---[ end trace 79ce94b7216cb459 ]---
      
      Fixes: 139bb36f ("tipc: advance the time of deleting subscription from subscriber->subscrp_list")
      Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      458be024
    • N
      mlxsw: spectrum_switchdev: Fix mrouter flag update · 4eb6a3bd
      Nogah Frankel committed
      Update the value of the mrouter flag in struct mlxsw_sp_bridge_port when
      it is being changed.
      
      Fixes: c57529e1 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
      Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4eb6a3bd