1. 28 7月, 2019 40 次提交
    • P
      KVM: nVMX: do not use dangling shadow VMCS after guest reset · 967bc679
      Paolo Bonzini 提交于
      commit 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 upstream.
      
      If a KVM guest is reset while running a nested guest, free_nested will
      disable the shadow VMCS execution control in the vmcs01.  However,
      on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
      the VMCS12 to the shadow VMCS which has since been freed.
      
      This causes a vmptrld of a NULL pointer on my machime, but Jan reports
      the host to hang altogether.  Let's see how much this trivial patch fixes.
      Reported-by: NJan Kiszka <jan.kiszka@siemens.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      967bc679
    • T
      ext4: allow directory holes · 3a17ca86
      Theodore Ts'o 提交于
      commit 4e19d6b65fb4fc42e352ce9883649e049da14743 upstream.
      
      The largedir feature was intended to allow ext4 directories to have
      unmapped directory blocks (e.g., directory holes).  And so the
      released e2fsprogs no longer enforces this for largedir file systems;
      however, the corresponding change to the kernel-side code was not made.
      
      This commit fixes this oversight.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a17ca86
    • R
      ext4: use jbd2_inode dirty range scoping · caa4e082
      Ross Zwisler 提交于
      commit 73131fbb003b3691cfcf9656f234b00da497fcd6 upstream.
      
      Use the newly introduced jbd2_inode dirty range scoping to prevent us
      from waiting forever when trying to complete a journal transaction.
      Signed-off-by: NRoss Zwisler <zwisler@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      caa4e082
    • R
      jbd2: introduce jbd2_inode dirty range scoping · af3812b6
      Ross Zwisler 提交于
      commit 6ba0e7dc64a5adcda2fbe65adc466891795d639e upstream.
      
      Currently both journal_submit_inode_data_buffers() and
      journal_finish_inode_data_buffers() operate on the entire address space
      of each of the inodes associated with a given journal entry.  The
      consequence of this is that if we have an inode where we are constantly
      appending dirty pages we can end up waiting for an indefinite amount of
      time in journal_finish_inode_data_buffers() while we wait for all the
      pages under writeback to be written out.
      
      The easiest way to cause this type of workload is do just dd from
      /dev/zero to a file until it fills the entire filesystem.  This can
      cause journal_finish_inode_data_buffers() to wait for the duration of
      the entire dd operation.
      
      We can improve this situation by scoping each of the inode dirty ranges
      associated with a given transaction.  We do this via the jbd2_inode
      structure so that the scoping is contained within jbd2 and so that it
      follows the lifetime and locking rules for that structure.
      
      This allows us to limit the writeback & wait in
      journal_submit_inode_data_buffers() and
      journal_finish_inode_data_buffers() respectively to the dirty range for
      a given struct jdb2_inode, keeping us from waiting forever if the inode
      in question is still being appended to.
      Signed-off-by: NRoss Zwisler <zwisler@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      af3812b6
    • R
      mm: add filemap_fdatawait_range_keep_errors() · 4becd6c1
      Ross Zwisler 提交于
      commit aa0bfcd939c30617385ffa28682c062d78050eba upstream.
      
      In the spirit of filemap_fdatawait_range() and
      filemap_fdatawait_keep_errors(), introduce
      filemap_fdatawait_range_keep_errors() which both takes a range upon
      which to wait and does not clear errors from the address space.
      Signed-off-by: NRoss Zwisler <zwisler@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4becd6c1
    • T
      ext4: enforce the immutable flag on open files · c9ea4620
      Theodore Ts'o 提交于
      commit 02b016ca7f99229ae6227e7b2fc950c4e140d74a upstream.
      
      According to the chattr man page, "a file with the 'i' attribute
      cannot be modified..."  Historically, this was only enforced when the
      file was opened, per the rest of the description, "... and the file
      can not be opened in write mode".
      
      There is general agreement that we should standardize all file systems
      to prevent modifications even for files that were opened at the time
      the immutable flag is set.  Eventually, a change to enforce this at
      the VFS layer should be landing in mainline.  Until then, enforce this
      at the ext4 level to prevent xfstests generic/553 from failing.
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c9ea4620
    • D
      ext4: don't allow any modifications to an immutable file · 29171e82
      Darrick J. Wong 提交于
      commit 2e53840362771c73eb0a5ff71611507e64e8eecd upstream.
      
      Don't allow any modifications to a file that's marked immutable, which
      means that we have to flush all the writable pages to make the readonly
      and we have to check the setattr/setflags parameters more closely.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29171e82
    • P
      perf/core: Fix race between close() and fork() · 4a5cc64d
      Peter Zijlstra 提交于
      commit 1cf8dfe8a661f0462925df943140e9f6d1ea5233 upstream.
      
      Syzcaller reported the following Use-after-Free bug:
      
      	close()						clone()
      
      							  copy_process()
      							    perf_event_init_task()
      							      perf_event_init_context()
      							        mutex_lock(parent_ctx->mutex)
      								inherit_task_group()
      								  inherit_group()
      								    inherit_event()
      								      mutex_lock(event->child_mutex)
      								      // expose event on child list
      								      list_add_tail()
      								      mutex_unlock(event->child_mutex)
      							        mutex_unlock(parent_ctx->mutex)
      
      							    ...
      							    goto bad_fork_*
      
      							  bad_fork_cleanup_perf:
      							    perf_event_free_task()
      
      	  perf_release()
      	    perf_event_release_kernel()
      	      list_for_each_entry()
      		mutex_lock(ctx->mutex)
      		mutex_lock(event->child_mutex)
      		// event is from the failing inherit
      		// on the other CPU
      		perf_remove_from_context()
      		list_move()
      		mutex_unlock(event->child_mutex)
      		mutex_unlock(ctx->mutex)
      
      							      mutex_lock(ctx->mutex)
      							      list_for_each_entry_safe()
      							        // event already stolen
      							      mutex_unlock(ctx->mutex)
      
      							    delayed_free_task()
      							      free_task()
      
      	     list_for_each_entry_safe()
      	       list_del()
      	       free_event()
      	         _free_event()
      		   // and so event->hw.target
      		   // is the already freed failed clone()
      		   if (event->hw.target)
      		     put_task_struct(event->hw.target)
      		       // WHOOPSIE, already quite dead
      
      Which puts the lie to the the comment on perf_event_free_task():
      'unexposed, unused context' not so much.
      
      Which is a 'fun' confluence of fail; copy_process() doing an
      unconditional free_task() and not respecting refcounts, and perf having
      creative locking. In particular:
      
        82d94856 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
      
      seems to have overlooked this 'fun' parade.
      
      Solve it by using the fact that detached events still have a reference
      count on their (previous) context. With this perf_event_free_task()
      can detect when events have escaped and wait for their destruction.
      Debugged-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Reported-by: syzbot+a24c397a29ad22d86c98@syzkaller.appspotmail.com
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 82d94856 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4a5cc64d
    • A
      perf/core: Fix exclusive events' grouping · 75100ec5
      Alexander Shishkin 提交于
      commit 8a58ddae23796c733c5dfbd717538d89d036c5bd upstream.
      
      So far, we tried to disallow grouping exclusive events for the fear of
      complications they would cause with moving between contexts. Specifically,
      moving a software group to a hardware context would violate the exclusivity
      rules if both groups contain matching exclusive events.
      
      This attempt was, however, unsuccessful: the check that we have in the
      perf_event_open() syscall is both wrong (looks at wrong PMU) and
      insufficient (group leader may still be exclusive), as can be illustrated
      by running:
      
        $ perf record -e '{intel_pt//,cycles}' uname
        $ perf record -e '{cycles,intel_pt//}' uname
      
      ultimately successfully.
      
      Furthermore, we are completely free to trigger the exclusivity violation
      by:
      
         perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'
      
      even though the helpful perf record will not allow that, the ABI will.
      
      The warning later in the perf_event_open() path will also not trigger, because
      it's also wrong.
      
      Fix all this by validating the original group before moving, getting rid
      of broken safeguards and placing a useful one to perf_install_in_context().
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: mathieu.poirier@linaro.org
      Cc: will.deacon@arm.com
      Fixes: bed5b25a ("perf: Add a pmu capability for "exclusive" events")
      Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      75100ec5
    • P
      MIPS: lb60: Fix pin mappings · 0e6ef184
      Paul Cercueil 提交于
      commit 1323c3b72a987de57141cabc44bf9cd83656bc70 upstream.
      
      The pin mappings introduced in commit 636f8ba6
      ("MIPS: JZ4740: Qi LB60: Add pinctrl configuration for several drivers")
      are completely wrong. The pinctrl driver name is incorrect, and the
      function and group fields are swapped.
      
      Fixes: 636f8ba6 ("MIPS: JZ4740: Qi LB60: Add pinctrl configuration for several drivers")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NPaul Cercueil <paul@crapouillou.net>
      Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NPaul Burton <paul.burton@mips.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: od@zcrc.me
      Cc: linux-mips@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0e6ef184
    • K
      gpio: davinci: silence error prints in case of EPROBE_DEFER · dd5994ab
      Keerthy 提交于
      commit 541e4095f388c196685685633c950cb9b97f8039 upstream.
      
      Silence error prints in case of EPROBE_DEFER. This avoids
      multiple/duplicate defer prints during boot.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NKeerthy <j-keerthy@ti.com>
      Signed-off-by: NBartosz Golaszewski <bgolaszewski@baylibre.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd5994ab
    • C
      dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc · c947cf3e
      Chris Wilson 提交于
      commit f5b07b04e5f090a85d1e96938520f2b2b58e4a8e upstream.
      
      If we have to drop the seqcount & rcu lock to perform a krealloc, we
      have to restart the loop. In doing so, be careful not to lose track of
      the already acquired exclusive fence.
      
      Fixes: fedf5413 ("dma-buf: Restart reservation_object_get_fences_rcu() after writes")
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: stable@vger.kernel.org #v4.10
      Reviewed-by: NChristian König <christian.koenig@amd.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190604125323.21396-1-chris@chris-wilson.co.ukSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c947cf3e
    • J
      dma-buf: balance refcount inbalance · 95ee55ca
      Jérôme Glisse 提交于
      commit 5e383a9798990c69fc759a4930de224bb497e62c upstream.
      
      The debugfs take reference on fence without dropping them.
      Signed-off-by: NJérôme Glisse <jglisse@redhat.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: linux-media@vger.kernel.org
      Cc: dri-devel@lists.freedesktop.org
      Cc: linaro-mm-sig@lists.linaro.org
      Cc: Stéphane Marchesin <marcheu@chromium.org>
      Cc: stable@vger.kernel.org
      Reviewed-by: NChristian König <christian.koenig@amd.com>
      Signed-off-by: NSumit Semwal <sumit.semwal@linaro.org>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181206161840.6578-1-jglisse@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      95ee55ca
    • N
      net: bridge: stp: don't cache eth dest pointer before skb pull · b72fb8de
      Nikolay Aleksandrov 提交于
      [ Upstream commit 2446a68ae6a8cee6d480e2f5b52f5007c7c41312 ]
      
      Don't cache eth dest pointer before calling pskb_may_pull.
      
      Fixes: cf0f02d0 ("[BRIDGE]: use llc for receiving STP packets")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b72fb8de
    • N
      net: bridge: don't cache ether dest pointer on input · 78701843
      Nikolay Aleksandrov 提交于
      [ Upstream commit 3d26eb8ad1e9b906433903ce05f775cf038e747f ]
      
      We would cache ether dst pointer on input in br_handle_frame_finish but
      after the neigh suppress code that could lead to a stale pointer since
      both ipv4 and ipv6 suppress code do pskb_may_pull. This means we have to
      always reload it after the suppress code so there's no point in having
      it cached just retrieve it directly.
      
      Fixes: 057658cb ("bridge: suppress arp pkts on BR_NEIGH_SUPPRESS ports")
      Fixes: ed842fae ("bridge: suppress nd pkts on BR_NEIGH_SUPPRESS ports")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      78701843
    • N
      net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query · 41a8df71
      Nikolay Aleksandrov 提交于
      [ Upstream commit 3b26a5d03d35d8f732d75951218983c0f7f68dff ]
      
      We get a pointer to the ipv6 hdr in br_ip6_multicast_query but we may
      call pskb_may_pull afterwards and end up using a stale pointer.
      So use the header directly, it's just 1 place where it's needed.
      
      Fixes: 08b202b6 ("bridge br_multicast: IPv6 MLD support.")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: NMartin Weinelt <martin@linuxlounge.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      41a8df71
    • N
      net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling · caf4488f
      Nikolay Aleksandrov 提交于
      [ Upstream commit e57f61858b7cf478ed6fa23ed4b3876b1c9625c4 ]
      
      We take a pointer to grec prior to calling pskb_may_pull and use it
      afterwards to get nsrcs so record nsrcs before the pull when handling
      igmp3 and we get a pointer to nsrcs and call pskb_may_pull when handling
      mld2 which again could lead to reading 2 bytes out-of-bounds.
      
       ==================================================================
       BUG: KASAN: use-after-free in br_multicast_rcv+0x480c/0x4ad0 [bridge]
       Read of size 2 at addr ffff8880421302b4 by task ksoftirqd/1/16
      
       CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G           OE     5.2.0-rc6+ #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
       Call Trace:
        dump_stack+0x71/0xab
        print_address_description+0x6a/0x280
        ? br_multicast_rcv+0x480c/0x4ad0 [bridge]
        __kasan_report+0x152/0x1aa
        ? br_multicast_rcv+0x480c/0x4ad0 [bridge]
        ? br_multicast_rcv+0x480c/0x4ad0 [bridge]
        kasan_report+0xe/0x20
        br_multicast_rcv+0x480c/0x4ad0 [bridge]
        ? br_multicast_disable_port+0x150/0x150 [bridge]
        ? ktime_get_with_offset+0xb4/0x150
        ? __kasan_kmalloc.constprop.6+0xa6/0xf0
        ? __netif_receive_skb+0x1b0/0x1b0
        ? br_fdb_update+0x10e/0x6e0 [bridge]
        ? br_handle_frame_finish+0x3c6/0x11d0 [bridge]
        br_handle_frame_finish+0x3c6/0x11d0 [bridge]
        ? br_pass_frame_up+0x3a0/0x3a0 [bridge]
        ? virtnet_probe+0x1c80/0x1c80 [virtio_net]
        br_handle_frame+0x731/0xd90 [bridge]
        ? select_idle_sibling+0x25/0x7d0
        ? br_handle_frame_finish+0x11d0/0x11d0 [bridge]
        __netif_receive_skb_core+0xced/0x2d70
        ? virtqueue_get_buf_ctx+0x230/0x1130 [virtio_ring]
        ? do_xdp_generic+0x20/0x20
        ? virtqueue_napi_complete+0x39/0x70 [virtio_net]
        ? virtnet_poll+0x94d/0xc78 [virtio_net]
        ? receive_buf+0x5120/0x5120 [virtio_net]
        ? __netif_receive_skb_one_core+0x97/0x1d0
        __netif_receive_skb_one_core+0x97/0x1d0
        ? __netif_receive_skb_core+0x2d70/0x2d70
        ? _raw_write_trylock+0x100/0x100
        ? __queue_work+0x41e/0xbe0
        process_backlog+0x19c/0x650
        ? _raw_read_lock_irq+0x40/0x40
        net_rx_action+0x71e/0xbc0
        ? __switch_to_asm+0x40/0x70
        ? napi_complete_done+0x360/0x360
        ? __switch_to_asm+0x34/0x70
        ? __switch_to_asm+0x40/0x70
        ? __schedule+0x85e/0x14d0
        __do_softirq+0x1db/0x5f9
        ? takeover_tasklets+0x5f0/0x5f0
        run_ksoftirqd+0x26/0x40
        smpboot_thread_fn+0x443/0x680
        ? sort_range+0x20/0x20
        ? schedule+0x94/0x210
        ? __kthread_parkme+0x78/0xf0
        ? sort_range+0x20/0x20
        kthread+0x2ae/0x3a0
        ? kthread_create_worker_on_cpu+0xc0/0xc0
        ret_from_fork+0x35/0x40
      
       The buggy address belongs to the page:
       page:ffffea0001084c00 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x0
       flags: 0xffffc000000000()
       raw: 00ffffc000000000 ffffea0000cfca08 ffffea0001098608 0000000000000000
       raw: 0000000000000000 0000000000000003 00000000ffffff7f 0000000000000000
       page dumped because: kasan: bad access detected
      
       Memory state around the buggy address:
       ffff888042130180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff888042130200: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       > ffff888042130280: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                           ^
       ffff888042130300: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff888042130380: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ==================================================================
       Disabling lock debugging due to kernel taint
      
      Fixes: bc8c20ac ("bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave")
      Reported-by: NMartin Weinelt <martin@linuxlounge.net>
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Tested-by: NMartin Weinelt <martin@linuxlounge.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      caf4488f
    • X
      sctp: not bind the socket in sctp_connect · bc9a2f36
      Xin Long 提交于
      [ Upstream commit 9b6c08878e23adb7cc84bdca94d8a944b03f099e ]
      
      Now when sctp_connect() is called with a wrong sa_family, it binds
      to a port but doesn't set bp->port, then sctp_get_af_specific will
      return NULL and sctp_connect() returns -EINVAL.
      
      Then if sctp_bind() is called to bind to another port, the last
      port it has bound will leak due to bp->port is NULL by then.
      
      sctp_connect() doesn't need to bind ports, as later __sctp_connect
      will do it if bp->port is NULL. So remove it from sctp_connect().
      While at it, remove the unnecessary sockaddr.sa_family len check
      as it's already done in sctp_inet_connect.
      
      Fixes: 644fbdea ("sctp: fix the issue that flags are ignored when using kernel_connect")
      Reported-by: syzbot+079bf326b38072f849d9@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc9a2f36
    • J
      net/tls: make sure offload also gets the keys wiped · fde351ae
      Jakub Kicinski 提交于
      [ Upstream commit acd3e96d53a24d219f720ed4012b62723ae05da1 ]
      
      Commit 86029d10 ("tls: zero the crypto information from tls_context
      before freeing") added memzero_explicit() calls to clear the key material
      before freeing struct tls_context, but it missed tls_device.c has its
      own way of freeing this structure. Replace the missing free.
      
      Fixes: 86029d10 ("tls: zero the crypto information from tls_context before freeing")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NDirk van der Merwe <dirk.vandermerwe@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fde351ae
    • C
      net_sched: unset TCQ_F_CAN_BYPASS when adding filters · d9571a9f
      Cong Wang 提交于
      [ Upstream commit 3f05e6886a595c9a29a309c52f45326be917823c ]
      
      For qdisc's that support TC filters and set TCQ_F_CAN_BYPASS,
      notably fq_codel, it makes no sense to let packets bypass the TC
      filters we setup in any scenario, otherwise our packets steering
      policy could not be enforced.
      
      This can be reproduced easily with the following script:
      
       ip li add dev dummy0 type dummy
       ifconfig dummy0 up
       tc qd add dev dummy0 root fq_codel
       tc filter add dev dummy0 parent 8001: protocol arp basic action mirred egress redirect dev lo
       tc filter add dev dummy0 parent 8001: protocol ip basic action mirred egress redirect dev lo
       ping -I dummy0 192.168.112.1
      
      Without this patch, packets are sent directly to dummy0 without
      hitting any of the filters. With this patch, packets are redirected
      to loopback as expected.
      
      This fix is not perfect, it only unsets the flag but does not set it back
      because we have to save the information somewhere in the qdisc if we
      really want that. Note, both fq_codel and sfq clear this flag in their
      ->bind_tcf() but this is clearly not sufficient when we don't use any
      class ID.
      
      Fixes: 23624935 ("net_sched: TCQ_F_CAN_BYPASS generalization")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9571a9f
    • C
      netrom: hold sock when setting skb->destructor · 69cd5845
      Cong Wang 提交于
      [ Upstream commit 4638faac032756f7eab5524be7be56bee77e426b ]
      
      sock_efree() releases the sock refcnt, if we don't hold this refcnt
      when setting skb->destructor to it, the refcnt would not be balanced.
      This leads to several bug reports from syzbot.
      
      I have checked other users of sock_efree(), all of them hold the
      sock refcnt.
      
      Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")
      Reported-and-tested-by: <syzbot+622bdabb128acc33427d@syzkaller.appspotmail.com>
      Reported-and-tested-by: <syzbot+6eaef7158b19e3fec3a0@syzkaller.appspotmail.com>
      Reported-and-tested-by: <syzbot+9399c158fcc09b21d0d2@syzkaller.appspotmail.com>
      Reported-and-tested-by: <syzbot+a34e5f3d0300163f0c87@syzkaller.appspotmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69cd5845
    • C
      netrom: fix a memory leak in nr_rx_frame() · dc59a2ab
      Cong Wang 提交于
      [ Upstream commit c8c8218ec5af5d2598381883acbefbf604e56b5e ]
      
      When the skb is associated with a new sock, just assigning
      it to skb->sk is not sufficient, we have to set its destructor
      to free the sock properly too.
      
      Reported-by: syzbot+d6636a36d3c34bd88938@syzkaller.appspotmail.com
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dc59a2ab
    • A
      macsec: fix checksumming after decryption · 0c5cb5a1
      Andreas Steinmetz 提交于
      [ Upstream commit 7d8b16b9facb0dd81d1469808dd9a575fa1d525a ]
      
      Fix checksumming after decryption.
      Signed-off-by: NAndreas Steinmetz <ast@domdv.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c5cb5a1
    • A
      macsec: fix use-after-free of skb during RX · 21252f49
      Andreas Steinmetz 提交于
      [ Upstream commit 095c02da80a41cf6d311c504d8955d6d1c2add10 ]
      
      Fix use-after-free of skb when rx_handler returns RX_HANDLER_PASS.
      Signed-off-by: NAndreas Steinmetz <ast@domdv.de>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21252f49
    • A
      net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn · a8ba53da
      Aya Levin 提交于
      [ Upstream commit ef1ce7d7b67b46661091c7ccc0396186b7a247ef ]
      
      Check return value from mlx5e_attach_netdev, add error path on failure.
      
      Fixes: 48935bbb ("net/mlx5e: IPoIB, Add netdevice profile skeleton")
      Signed-off-by: NAya Levin <ayal@mellanox.com>
      Reviewed-by: NFeras Daoud <ferasda@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a8ba53da
    • P
      vrf: make sure skb->data contains ip header to make routing · a2aa162a
      Peter Kosyh 提交于
      [ Upstream commit 107e47cc80ec37cb332bd41b22b1c7779e22e018 ]
      
      vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing
      using ip/ipv6 addresses, but don't make sure the header is available
      in skb->data[] (skb_headlen() is less then header size).
      
      Case:
      
      1) igb driver from intel.
      2) Packet size is greater then 255.
      3) MPLS forwards to VRF device.
      
      So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound()
      functions.
      Signed-off-by: NPeter Kosyh <p.kosyh@gmail.com>
      Reviewed-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2aa162a
    • C
      tcp: Reset bytes_acked and bytes_received when disconnecting · 1b200acd
      Christoph Paasch 提交于
      [ Upstream commit e858faf556d4e14c750ba1e8852783c6f9520a0e ]
      
      If an app is playing tricks to reuse a socket via tcp_disconnect(),
      bytes_acked/received needs to be reset to 0. Otherwise tcp_info will
      report the sum of the current and the old connection..
      
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Fixes: bdd1f9ed ("tcp: add tcpi_bytes_received to tcp_info")
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1b200acd
    • E
      tcp: fix tcp_set_congestion_control() use from bpf hook · c60f57df
      Eric Dumazet 提交于
      [ Upstream commit 8d650cdedaabb33e85e9b7c517c0c71fcecc1de9 ]
      
      Neal reported incorrect use of ns_capable() from bpf hook.
      
      bpf_setsockopt(...TCP_CONGESTION...)
        -> tcp_set_congestion_control()
         -> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)
          -> ns_capable_common()
           -> current_cred()
            -> rcu_dereference_protected(current->cred, 1)
      
      Accessing 'current' in bpf context makes no sense, since packets
      are processed from softirq context.
      
      As Neal stated : The capability check in tcp_set_congestion_control()
      was written assuming a system call context, and then was reused from
      a BPF call site.
      
      The fix is to add a new parameter to tcp_set_congestion_control(),
      so that the ns_capable() call is only performed under the right
      context.
      
      Fixes: 91b5b21c ("bpf: Add support for changing congestion control")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Reported-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c60f57df
    • E
      tcp: be more careful in tcp_fragment() · 6323c238
      Eric Dumazet 提交于
      [ Upstream commit b617158dc096709d8600c53b6052144d12b89fab ]
      
      Some applications set tiny SO_SNDBUF values and expect
      TCP to just work. Recent patches to address CVE-2019-11478
      broke them in case of losses, since retransmits might
      be prevented.
      
      We should allow these flows to make progress.
      
      This patch allows the first and last skb in retransmit queue
      to be split even if memory limits are hit.
      
      It also adds the some room due to the fact that tcp_sendmsg()
      and tcp_sendpage() might overshoot sk_wmem_queued by about one full
      TSO skb (64KB size). Note this allowance was already present
      in stable backports for kernels < 4.15
      
      Note for < 4.15 backports :
       tcp_rtx_queue_tail() will probably look like :
      
      static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
      {
      	struct sk_buff *skb = tcp_send_head(sk);
      
      	return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
      }
      
      Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrew Prout <aprout@ll.mit.edu>
      Tested-by: NAndrew Prout <aprout@ll.mit.edu>
      Tested-by: NJonathan Lemon <jonathan.lemon@gmail.com>
      Tested-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NChristoph Paasch <cpaasch@apple.com>
      Cc: Jonathan Looney <jtl@netflix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6323c238
    • T
      sky2: Disable MSI on ASUS P6T · b640ade0
      Takashi Iwai 提交于
      [ Upstream commit a261e3797506bd561700be643fe1a85bf81e9661 ]
      
      The onboard sky2 NIC on ASUS P6T WS PRO doesn't work after PM resume
      due to the infamous IRQ problem.  Disabling MSI works around it, so
      let's add it to the blacklist.
      
      Unfortunately the BIOS on the machine doesn't fill the standard
      DMI_SYS_* entry, so we pick up DMI_BOARD_* entries instead.
      
      BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1142496Reported-and-tested-by: NMarcus Seyfarth <m.seyfarth@gmail.com>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b640ade0
    • M
      sctp: fix error handling on stream scheduler initialization · d9ee5afd
      Marcelo Ricardo Leitner 提交于
      [ Upstream commit 4d1415811e492d9a8238f8a92dd0d51612c788e9 ]
      
      It allocates the extended area for outbound streams only on sendmsg
      calls, if they are not yet allocated.  When using the priority
      stream scheduler, this initialization may imply into a subsequent
      allocation, which may fail.  In this case, it was aborting the stream
      scheduler initialization but leaving the ->ext pointer (allocated) in
      there, thus in a partially initialized state.  On a subsequent call to
      sendmsg, it would notice the ->ext pointer in there, and trip on
      uninitialized stuff when trying to schedule the data chunk.
      
      The fix is undo the ->ext initialization if the stream scheduler
      initialization fails and avoid the partially initialized state.
      
      Although syzkaller bisected this to commit 4ff40b86262b ("sctp: set
      chunk transport correctly when it's a new asoc"), this bug was actually
      introduced on the commit I marked below.
      
      Reported-by: syzbot+c1a380d42b190ad1e559@syzkaller.appspotmail.com
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Tested-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d9ee5afd
    • D
      rxrpc: Fix send on a connected, but unbound socket · bfa79135
      David Howells 提交于
      [ Upstream commit e835ada07091f40dcfb1bc735082bd0a7c005e59 ]
      
      If sendmsg() or sendmmsg() is called on a connected socket that hasn't had
      bind() called on it, then an oops will occur when the kernel tries to
      connect the call because no local endpoint has been allocated.
      
      Fix this by implicitly binding the socket if it is in the
      RXRPC_CLIENT_UNBOUND state, just like it does for the RXRPC_UNBOUND state.
      
      Further, the state should be transitioned to RXRPC_CLIENT_BOUND after this
      to prevent further attempts to bind it.
      
      This can be tested with:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <string.h>
      	#include <sys/socket.h>
      	#include <arpa/inet.h>
      	#include <linux/rxrpc.h>
      	static const unsigned char inet6_addr[16] = {
      		0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, -1, 0xac, 0x14, 0x14, 0xaa
      	};
      	int main(void)
      	{
      		struct sockaddr_rxrpc srx;
      		struct cmsghdr *cm;
      		struct msghdr msg;
      		unsigned char control[16];
      		int fd;
      		memset(&srx, 0, sizeof(srx));
      		srx.srx_family = 0x21;
      		srx.srx_service = 0;
      		srx.transport_type = AF_INET;
      		srx.transport_len = 0x1c;
      		srx.transport.sin6.sin6_family = AF_INET6;
      		srx.transport.sin6.sin6_port = htons(0x4e22);
      		srx.transport.sin6.sin6_flowinfo = htons(0x4e22);
      		srx.transport.sin6.sin6_scope_id = htons(0xaa3b);
      		memcpy(&srx.transport.sin6.sin6_addr, inet6_addr, 16);
      		cm = (struct cmsghdr *)control;
      		cm->cmsg_len	= CMSG_LEN(sizeof(unsigned long));
      		cm->cmsg_level	= SOL_RXRPC;
      		cm->cmsg_type	= RXRPC_USER_CALL_ID;
      		*(unsigned long *)CMSG_DATA(cm) = 0;
      		msg.msg_name = NULL;
      		msg.msg_namelen = 0;
      		msg.msg_iov = NULL;
      		msg.msg_iovlen = 0;
      		msg.msg_control = control;
      		msg.msg_controllen = cm->cmsg_len;
      		msg.msg_flags = 0;
      		fd = socket(AF_RXRPC, SOCK_DGRAM, AF_INET);
      		connect(fd, (struct sockaddr *)&srx, sizeof(srx));
      		sendmsg(fd, &msg, 0);
      		return 0;
      	}
      
      Leading to the following oops:
      
      	BUG: kernel NULL pointer dereference, address: 0000000000000018
      	#PF: supervisor read access in kernel mode
      	#PF: error_code(0x0000) - not-present page
      	...
      	RIP: 0010:rxrpc_connect_call+0x42/0xa01
      	...
      	Call Trace:
      	 ? mark_held_locks+0x47/0x59
      	 ? __local_bh_enable_ip+0xb6/0xba
      	 rxrpc_new_client_call+0x3b1/0x762
      	 ? rxrpc_do_sendmsg+0x3c0/0x92e
      	 rxrpc_do_sendmsg+0x3c0/0x92e
      	 rxrpc_sendmsg+0x16b/0x1b5
      	 sock_sendmsg+0x2d/0x39
      	 ___sys_sendmsg+0x1a4/0x22a
      	 ? release_sock+0x19/0x9e
      	 ? reacquire_held_locks+0x136/0x160
      	 ? release_sock+0x19/0x9e
      	 ? find_held_lock+0x2b/0x6e
      	 ? __lock_acquire+0x268/0xf73
      	 ? rxrpc_connect+0xdd/0xe4
      	 ? __local_bh_enable_ip+0xb6/0xba
      	 __sys_sendmsg+0x5e/0x94
      	 do_syscall_64+0x7d/0x1bf
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 2341e077 ("rxrpc: Simplify connect() implementation and simplify sendmsg() op")
      Reported-by: syzbot+7966f2a0b2c7da8939b4@syzkaller.appspotmail.com
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfa79135
    • H
      r8169: fix issue with confused RX unit after PHY power-down on RTL8411b · 3e4e6b71
      Heiner Kallweit 提交于
      [ Upstream commit fe4e8db0392a6c2e795eb89ef5fcd86522e66248 ]
      
      On RTL8411b the RX unit gets confused if the PHY is powered-down.
      This was reported in [0] and confirmed by Realtek. Realtek provided
      a sequence to fix the RX unit after PHY wakeup.
      
      The issue itself seems to have been there longer, the Fixes tag
      refers to where the fix applies properly.
      
      [0] https://bugzilla.redhat.com/show_bug.cgi?id=1692075
      
      Fixes: a99790bf ("r8169: Reinstate ASPM Support")
      Tested-by: NIonut Radu <ionut.radu@gmail.com>
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3e4e6b71
    • AlbinYang's avatar
      nfc: fix potential illegal memory access · 97739e5c
      AlbinYang 提交于
      [ Upstream commit dd006fc434e107ef90f7de0db9907cbc1c521645 ]
      
      The frags_q is not properly initialized, it may result in illegal memory
      access when conn_info is NULL.
      The "goto free_exit" should be replaced by "goto exit".
      Signed-off-by: AlbinYang's avatarYang Wei <albin_yang@163.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97739e5c
    • J
      net: stmmac: Re-work the queue selection for TSO packets · f47f68cc
      Jose Abreu 提交于
      [ Upstream commit 4993e5b37e8bcb55ac90f76eb6d2432647273747 ]
      
      Ben Hutchings says:
      	"This is the wrong place to change the queue mapping.
      	stmmac_xmit() is called with a specific TX queue locked,
      	and accessing a different TX queue results in a data race
      	for all of that queue's state.
      
      	I think this commit should be reverted upstream and in all
      	stable branches.  Instead, the driver should implement the
      	ndo_select_queue operation and override the queue mapping there."
      
      Fixes: c5acdbee22a1 ("net: stmmac: Send TSO packets always from Queue 0")
      Suggested-by: NBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: NJose Abreu <joabreu@synopsys.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f47f68cc
    • A
      net: phy: sfp: hwmon: Fix scaling of RX power · 201d7d62
      Andrew Lunn 提交于
      [ Upstream commit 0cea0e1148fe134a4a3aaf0b1496f09241fb943a ]
      
      The RX power read from the SFP uses units of 0.1uW. This must be
      scaled to units of uW for HWMON. This requires a divide by 10, not the
      current 100.
      
      With this change in place, sensors(1) and ethtool -m agree:
      
      sff2-isa-0000
      Adapter: ISA adapter
      in0:          +3.23 V
      temp1:        +33.1 C
      power1:      270.00 uW
      power2:      200.00 uW
      curr1:        +0.01 A
      
              Laser output power                        : 0.2743 mW / -5.62 dBm
              Receiver signal average optical power     : 0.2014 mW / -6.96 dBm
      
      Reported-by: chris.healy@zii.aero
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Fixes: 1323061a ("net: phy: sfp: Add HWMON support for module sensors")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      201d7d62
    • J
      net: openvswitch: fix csum updates for MPLS actions · c60bce64
      John Hurley 提交于
      [ Upstream commit 0e3183cd2a64843a95b62f8bd4a83605a4cf0615 ]
      
      Skbs may have their checksum value populated by HW. If this is a checksum
      calculated over the entire packet then the CHECKSUM_COMPLETE field is
      marked. Changes to the data pointer on the skb throughout the network
      stack still try to maintain this complete csum value if it is required
      through functions such as skb_postpush_rcsum.
      
      The MPLS actions in Open vSwitch modify a CHECKSUM_COMPLETE value when
      changes are made to packet data without a push or a pull. This occurs when
      the ethertype of the MAC header is changed or when MPLS lse fields are
      modified.
      
      The modification is carried out using the csum_partial function to get the
      csum of a buffer and add it into the larger checksum. The buffer is an
      inversion of the data to be removed followed by the new data. Because the
      csum is calculated over 16 bits and these values align with 16 bits, the
      effect is the removal of the old value from the CHECKSUM_COMPLETE and
      addition of the new value.
      
      However, the csum fed into the function and the outcome of the
      calculation are also inverted. This would only make sense if it was the
      new value rather than the old that was inverted in the input buffer.
      
      Fix the issue by removing the bit inverts in the csum_partial calculation.
      
      The bug was verified and the fix tested by comparing the folded value of
      the updated CHECKSUM_COMPLETE value with the folded value of a full
      software checksum calculation (reset skb->csum to 0 and run
      skb_checksum_complete(skb)). Prior to the fix the outcomes differed but
      after they produce the same result.
      
      Fixes: 25cd9ba0 ("openvswitch: Add basic MPLS support to kernel")
      Fixes: bc7cc599 ("openvswitch: update checksum in {push,pop}_mpls")
      Signed-off-by: NJohn Hurley <john.hurley@netronome.com>
      Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c60bce64
    • L
      net: neigh: fix multiple neigh timer scheduling · 257441a0
      Lorenzo Bianconi 提交于
      [ Upstream commit 071c37983d99da07797294ea78e9da1a6e287144 ]
      
      Neigh timer can be scheduled multiple times from userspace adding
      multiple neigh entries and forcing the neigh timer scheduling passing
      NTF_USE in the netlink requests.
      This will result in a refcount leak and in the following dump stack:
      
      [   32.465295] NEIGH: BUG, double timer add, state is 8
      [   32.465308] CPU: 0 PID: 416 Comm: double_timer_ad Not tainted 5.2.0+ #65
      [   32.465311] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.12.0-2.fc30 04/01/2014
      [   32.465313] Call Trace:
      [   32.465318]  dump_stack+0x7c/0xc0
      [   32.465323]  __neigh_event_send+0x20c/0x880
      [   32.465326]  ? ___neigh_create+0x846/0xfb0
      [   32.465329]  ? neigh_lookup+0x2a9/0x410
      [   32.465332]  ? neightbl_fill_info.constprop.0+0x800/0x800
      [   32.465334]  neigh_add+0x4f8/0x5e0
      [   32.465337]  ? neigh_xmit+0x620/0x620
      [   32.465341]  ? find_held_lock+0x85/0xa0
      [   32.465345]  rtnetlink_rcv_msg+0x204/0x570
      [   32.465348]  ? rtnl_dellink+0x450/0x450
      [   32.465351]  ? mark_held_locks+0x90/0x90
      [   32.465354]  ? match_held_lock+0x1b/0x230
      [   32.465357]  netlink_rcv_skb+0xc4/0x1d0
      [   32.465360]  ? rtnl_dellink+0x450/0x450
      [   32.465363]  ? netlink_ack+0x420/0x420
      [   32.465366]  ? netlink_deliver_tap+0x115/0x560
      [   32.465369]  ? __alloc_skb+0xc9/0x2f0
      [   32.465372]  netlink_unicast+0x270/0x330
      [   32.465375]  ? netlink_attachskb+0x2f0/0x2f0
      [   32.465378]  netlink_sendmsg+0x34f/0x5a0
      [   32.465381]  ? netlink_unicast+0x330/0x330
      [   32.465385]  ? move_addr_to_kernel.part.0+0x20/0x20
      [   32.465388]  ? netlink_unicast+0x330/0x330
      [   32.465391]  sock_sendmsg+0x91/0xa0
      [   32.465394]  ___sys_sendmsg+0x407/0x480
      [   32.465397]  ? copy_msghdr_from_user+0x200/0x200
      [   32.465401]  ? _raw_spin_unlock_irqrestore+0x37/0x40
      [   32.465404]  ? lockdep_hardirqs_on+0x17d/0x250
      [   32.465407]  ? __wake_up_common_lock+0xcb/0x110
      [   32.465410]  ? __wake_up_common+0x230/0x230
      [   32.465413]  ? netlink_bind+0x3e1/0x490
      [   32.465416]  ? netlink_setsockopt+0x540/0x540
      [   32.465420]  ? __fget_light+0x9c/0xf0
      [   32.465423]  ? sockfd_lookup_light+0x8c/0xb0
      [   32.465426]  __sys_sendmsg+0xa5/0x110
      [   32.465429]  ? __ia32_sys_shutdown+0x30/0x30
      [   32.465432]  ? __fd_install+0xe1/0x2c0
      [   32.465435]  ? lockdep_hardirqs_off+0xb5/0x100
      [   32.465438]  ? mark_held_locks+0x24/0x90
      [   32.465441]  ? do_syscall_64+0xf/0x270
      [   32.465444]  do_syscall_64+0x63/0x270
      [   32.465448]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fix the issue unscheduling neigh_timer if selected entry is in 'IN_TIMER'
      receiving a netlink request with NTF_USE flag set
      Reported-by: NMarek Majkowski <marek@cloudflare.com>
      Fixes: 0c5c2d30 ("neigh: Allow for user space users of the neighbour table")
      Signed-off-by: NLorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      257441a0
    • F
      net: make skb_dst_force return true when dst is refcounted · 832d0ea7
      Florian Westphal 提交于
      [ Upstream commit b60a77386b1d4868f72f6353d35dabe5fbe981f2 ]
      
      netfilter did not expect that skb_dst_force() can cause skb to lose its
      dst entry.
      
      I got a bug report with a skb->dst NULL dereference in netfilter
      output path.  The backtrace contains nf_reinject(), so the dst might have
      been cleared when skb got queued to userspace.
      
      Other users were fixed via
      if (skb_dst(skb)) {
      	skb_dst_force(skb);
      	if (!skb_dst(skb))
      		goto handle_err;
      }
      
      But I think its preferable to make the 'dst might be cleared' part
      of the function explicit.
      
      In netfilter case, skb with a null dst is expected when queueing in
      prerouting hook, so drop skb for the other hooks.
      
      v2:
       v1 of this patch returned true in case skb had no dst entry.
       Eric said:
         Say if we have two skb_dst_force() calls for some reason
         on the same skb, only the first one will return false.
      
       This now returns false even when skb had no dst, as per Erics
       suggestion, so callers might need to check skb_dst() first before
       skb_dst_force().
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      832d0ea7
    • B
      net: dsa: mv88e6xxx: wait after reset deactivation · 6ab30a4c
      Baruch Siach 提交于
      [ Upstream commit 7b75e49de424ceb53d13e60f35d0a73765626fda ]
      
      Add a 1ms delay after reset deactivation. Otherwise the chip returns
      bogus ID value. This is observed with 88E6390 (Peridot) chip.
      Signed-off-by: NBaruch Siach <baruch@tkos.co.il>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6ab30a4c