1. 12 1月, 2019 15 次提交
    • P
      net: clear skb->tstamp in bridge forwarding path · 41d1c883
      Paolo Abeni 提交于
      Matteo reported forwarding issues inside the linux bridge,
      if the enslaved interfaces use the fq qdisc.
      
      Similar to commit 8203e2d8 ("net: clear skb->tstamp in
      forwarding paths"), we need to clear the tstamp field in
      the bridge forwarding path.
      
      Fixes: 80b14dee ("net: Add a new socket option for a future transmit time.")
      Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
      Reported-and-tested-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41d1c883
    • D
      Merge branch 'bpfilter-fixes' · 3f4261d4
      David S. Miller 提交于
      Taehee Yoo says:
      
      ====================
      net: bpfilter: fix two bugs in bpfilter
      
      This patches fix two bugs in the bpfilter_umh which are related in
      iptables command.
      
      The first patch adds an exit code for UMH process.
      This provides an opportunity to cleanup members of the umh_info
      to modules which use the UMH.
      In order to identify UMH processes, a new flag PF_UMH is added.
      
      The second patch makes the bpfilter_umh use UMH cleanup callback.
      
      The third patch adds re-start routine for the bpfilter_umh.
      The bpfilter_umh does not re-start after error occurred.
      because there is no re-start routine in the module.
      
      The fourth patch ensures that the bpfilter.ko module will not removed while
      it's being used.
      The bpfilter.ko is not protected by locks or module reference counter.
      Therefore that can be removed while module is being used.
      In order to protect that, mutex is used.
      
      The first and second patch are preparation patches for the third and
      fourth patch.
      
      TEST #1
         while :
         do
      	modprobe bpfilter
      	kill -9 <pid of the bpfilter_umh>
      	iptables -vnL
         done
      
      TEST #2
         while :
         do
      	iptables -I FORWARD -m string --string ap --algo kmp &
      	iptables -F &
      	modprobe -rv bpfilter &
         done
      
      TEST #3
         while :
         do
      	modprobe bpfilter &
      	modprobe -rv bpfilter &
         done
      
      The TEST1 makes a failure of iptables command.
      This is fixed by the third patch.
      
      The TEST2 makes a panic because of a race condition in the bpfilter_umh
      module.
      This is fixed by the fourth patch.
      
      The TEST3 makes a double-create UMH process.
      This is fixed by the third and fourth patch.
      
      v4 :
       - declare the exit_umh() as static inline
       - check stop flag in the load_umh() to avoid a double-create UMH
      v3 :
       - Avoid unnecessary list lookup for non-UMH processes
       - Add a new PF_UMH flag
      v2 : add the first and second patch
      v1 : Initial patch
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f4261d4
    • T
      net: bpfilter: disallow to remove bpfilter module while being used · 71a85084
      Taehee Yoo 提交于
      The bpfilter.ko module can be removed while functions of the bpfilter.ko
      are executing. so panic can occurred. in order to protect that, locks can
      be used. a bpfilter_lock protects routines in the
      __bpfilter_process_sockopt() but it's not enough because __exit routine
      can be executed concurrently.
      
      Now, the bpfilter_umh can not run in parallel.
      So, the module do not removed while it's being used and it do not
      double-create UMH process.
      The members of the umh_info and the bpfilter_umh_ops are protected by
      the bpfilter_umh_ops.lock.
      
      test commands:
         while :
         do
      	iptables -I FORWARD -m string --string ap --algo kmp &
      	modprobe -rv bpfilter &
         done
      
      splat looks like:
      [  298.623435] BUG: unable to handle kernel paging request at fffffbfff807440b
      [  298.628512] #PF error: [normal kernel read fault]
      [  298.633018] PGD 124327067 P4D 124327067 PUD 11c1a3067 PMD 119eb2067 PTE 0
      [  298.638859] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  298.638859] CPU: 0 PID: 2997 Comm: iptables Not tainted 4.20.0+ #154
      [  298.638859] RIP: 0010:__mutex_lock+0x6b9/0x16a0
      [  298.638859] Code: c0 00 00 e8 89 82 ff ff 80 bd 8f fc ff ff 00 0f 85 d9 05 00 00 48 8b 85 80 fc ff ff 48 bf 00 00 00 00 00 fc ff df 48 c1 e8 03 <80> 3c 38 00 0f 85 1d 0e 00 00 48 8b 85 c8 fc ff ff 49 39 47 58 c6
      [  298.638859] RSP: 0018:ffff88810e7777a0 EFLAGS: 00010202
      [  298.638859] RAX: 1ffffffff807440b RBX: ffff888111bd4d80 RCX: 0000000000000000
      [  298.638859] RDX: 1ffff110235ff806 RSI: ffff888111bd5538 RDI: dffffc0000000000
      [  298.638859] RBP: ffff88810e777b30 R08: 0000000080000002 R09: 0000000000000000
      [  298.638859] R10: 0000000000000000 R11: 0000000000000000 R12: fffffbfff168a42c
      [  298.638859] R13: ffff888111bd4d80 R14: ffff8881040e9a05 R15: ffffffffc03a2000
      [  298.638859] FS:  00007f39e3758700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      [  298.638859] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  298.638859] CR2: fffffbfff807440b CR3: 000000011243e000 CR4: 00000000001006f0
      [  298.638859] Call Trace:
      [  298.638859]  ? mutex_lock_io_nested+0x1560/0x1560
      [  298.638859]  ? kasan_kmalloc+0xa0/0xd0
      [  298.638859]  ? kmem_cache_alloc+0x1c2/0x260
      [  298.638859]  ? __alloc_file+0x92/0x3c0
      [  298.638859]  ? alloc_empty_file+0x43/0x120
      [  298.638859]  ? alloc_file_pseudo+0x220/0x330
      [  298.638859]  ? sock_alloc_file+0x39/0x160
      [  298.638859]  ? __sys_socket+0x113/0x1d0
      [  298.638859]  ? __x64_sys_socket+0x6f/0xb0
      [  298.638859]  ? do_syscall_64+0x138/0x560
      [  298.638859]  ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  298.638859]  ? __alloc_file+0x92/0x3c0
      [  298.638859]  ? init_object+0x6b/0x80
      [  298.638859]  ? cyc2ns_read_end+0x10/0x10
      [  298.638859]  ? cyc2ns_read_end+0x10/0x10
      [  298.638859]  ? hlock_class+0x140/0x140
      [  298.638859]  ? sched_clock_local+0xd4/0x140
      [  298.638859]  ? sched_clock_local+0xd4/0x140
      [  298.638859]  ? check_flags.part.37+0x440/0x440
      [  298.638859]  ? __lock_acquire+0x4f90/0x4f90
      [  298.638859]  ? set_rq_offline.part.89+0x140/0x140
      [ ... ]
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71a85084
    • T
      net: bpfilter: restart bpfilter_umh when error occurred · 61fbf593
      Taehee Yoo 提交于
      The bpfilter_umh will be stopped via __stop_umh() when the bpfilter
      error occurred.
      The bpfilter_umh() couldn't start again because there is no restart
      routine.
      
      The section of the bpfilter_umh_{start/end} is no longer .init.rodata
      because these area should be reused in the restart routine. hence
      the section name is changed to .bpfilter_umh.
      
      The bpfilter_ops->start() is restart callback. it will be called when
      bpfilter_umh is stopped.
      The stop bit means bpfilter_umh is stopped. this bit is set by both
      start and stop routine.
      
      Before this patch,
      Test commands:
         $ iptables -vnL
         $ kill -9 <pid of bpfilter_umh>
         $ iptables -vnL
         [  480.045136] bpfilter: write fail -32
         $ iptables -vnL
      
      All iptables commands will fail.
      
      After this patch,
      Test commands:
         $ iptables -vnL
         $ kill -9 <pid of bpfilter_umh>
         $ iptables -vnL
         $ iptables -vnL
      
      Now, all iptables commands will work.
      
      Fixes: d2ba09c1 ("net: add skeleton of bpfilter kernel module")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61fbf593
    • T
      net: bpfilter: use cleanup callback to release umh_info · 5b4cb650
      Taehee Yoo 提交于
      Now, UMH process is killed, do_exit() calls the umh_info->cleanup callback
      to release members of the umh_info.
      This patch makes bpfilter_umh's cleanup routine to use the
      umh_info->cleanup callback.
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b4cb650
    • T
      umh: add exit routine for UMH process · 73ab1cb2
      Taehee Yoo 提交于
      A UMH process which is created by the fork_usermode_blob() such as
      bpfilter needs to release members of the umh_info when process is
      terminated.
      But the do_exit() does not release members of the umh_info. hence module
      which uses UMH needs own code to detect whether UMH process is
      terminated or not.
      But this implementation needs extra code for checking the status of
      UMH process. it eventually makes the code more complex.
      
      The new PF_UMH flag is added and it is used to identify UMH processes.
      The exit_umh() does not release members of the umh_info.
      Hence umh_info->cleanup callback should release both members of the
      umh_info and the private data.
      Suggested-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73ab1cb2
    • J
      isdn: i4l: isdn_tty: Fix some concurrency double-free bugs · 2ff33d66
      Jia-Ju Bai 提交于
      The functions isdn_tty_tiocmset() and isdn_tty_set_termios() may be
      concurrently executed.
      
      isdn_tty_tiocmset
        isdn_tty_modem_hup
          line 719: kfree(info->dtmf_state);
          line 721: kfree(info->silence_state);
          line 723: kfree(info->adpcms);
          line 725: kfree(info->adpcmr);
      
      isdn_tty_set_termios
        isdn_tty_modem_hup
          line 719: kfree(info->dtmf_state);
          line 721: kfree(info->silence_state);
          line 723: kfree(info->adpcms);
          line 725: kfree(info->adpcmr);
      
      Thus, some concurrency double-free bugs may occur.
      
      These possible bugs are found by a static tool written by myself and
      my manual code review.
      
      To fix these possible bugs, the mutex lock "modem_info_mutex" used in
      isdn_tty_tiocmset() is added in isdn_tty_set_termios().
      Signed-off-by: NJia-Ju Bai <baijiaju1990@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ff33d66
    • Z
      vhost/vsock: fix vhost vsock cid hashing inconsistent · 7fbe078c
      Zha Bin 提交于
      The vsock core only supports 32bit CID, but the Virtio-vsock spec define
      CID (dst_cid and src_cid) as u64 and the upper 32bits is reserved as
      zero. This inconsistency causes one bug in vhost vsock driver. The
      scenarios is:
      
        0. A hash table (vhost_vsock_hash) is used to map an CID to a vsock
        object. And hash_min() is used to compute the hash key. hash_min() is
        defined as:
        (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
        That means the hash algorithm has dependency on the size of macro
        argument 'val'.
        0. In function vhost_vsock_set_cid(), a 64bit CID is passed to
        hash_min() to compute the hash key when inserting a vsock object into
        the hash table.
        0. In function vhost_vsock_get(), a 32bit CID is passed to hash_min()
        to compute the hash key when looking up a vsock for an CID.
      
      Because the different size of the CID, hash_min() returns different hash
      key, thus fails to look up the vsock object for an CID.
      
      To fix this bug, we keep CID as u64 in the IOCTLs and virtio message
      headers, but explicitly convert u64 to u32 when deal with the hash table
      and vsock core.
      
      Fixes: 834e772c ("vhost/vsock: fix use-after-free in network stack callers")
      Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.texSigned-off-by: NZha Bin <zhabin@linux.alibaba.com>
      Reviewed-by: NLiu Jiang <gerry@linux.alibaba.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fbe078c
    • D
      Merge branch 'stmmac-fixes' · 5fea7f10
      David S. Miller 提交于
      Jose Abreu says:
      
      ====================
      net: stmmac: Misc Fixes
      
      Some small fixes for stmmac targeting -net. Detailed info in commit log.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fea7f10
    • J
      net: stmmac: Prevent RX starvation in stmmac_napi_poll() · fa0be0a4
      Jose Abreu 提交于
      Currently, TX is given a budget which is consumed by stmmac_tx_clean()
      and stmmac_rx() is given the remaining non-consumed budget.
      
      This is wrong and in case we are sending a large number of packets this
      can starve RX because remaining budget will be low.
      
      Let's give always the same budget for RX and TX clean.
      
      While at it, check if we missed any interrupts while we were in NAPI
      callback by looking at DMA interrupt status.
      
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: NJose Abreu <joabreu@synopsys.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fa0be0a4
    • J
      net: stmmac: Fix the logic of checking if RX Watchdog must be enabled · 3b509466
      Jose Abreu 提交于
      RX Watchdog can be disabled by platform definitions but currently we are
      initializing the descriptors before checking if Watchdog must be
      disabled or not.
      
      Fix this by checking earlier if user wants Watchdog disabled or not.
      
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: NJose Abreu <joabreu@synopsys.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b509466
    • J
      net: stmmac: Check if CBS is supported before configuring · 0650d401
      Jose Abreu 提交于
      Check if CBS is currently supported before trying to configure it in HW.
      
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: NJose Abreu <joabreu@synopsys.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0650d401
    • J
      net: stmmac: dwxgmac2: Only clear interrupts that are active · fcc509eb
      Jose Abreu 提交于
      In DMA interrupt handler we were clearing all interrupts status, even
      the ones that were not active. Fix this and only clear the active
      interrupts.
      
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Signed-off-by: NJose Abreu <joabreu@synopsys.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcc509eb
    • J
      net: stmmac: Fix PCI module removal leak · 6dea7e18
      Jose Abreu 提交于
      Since commit b7d0f08e, the enable / disable of PCI device is not
      managed which will result in IO regions not being automatically unmapped.
      As regions continue mapped it is currently not possible to remove and
      then probe again the PCI module of stmmac.
      
      Fix this by manually unmapping regions on remove callback.
      
      Changes from v1:
      - Fix build error
      
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Fixes: b7d0f08e ("net: stmmac: Fix WoL for PCI-based setups")
      Signed-off-by: NJose Abreu <joabreu@synopsys.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6dea7e18
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · e8b108b0
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-01-11
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix TCP-BPF support for correctly setting the initial window
         via TCP_BPF_IW on an active TFO sender, from Yuchung.
      
      2) Fix a panic in BPF's stack_map_get_build_id()'s ELF parsing on
         32 bit archs caused by page_address() returning NULL, from Song.
      
      3) Fix BTF pretty print in kernel and bpftool when bitfield member
         offset is greater than 256. Also add test cases, from Yonghong.
      
      4) Fix improper argument handling in xdp1 sample, from Ioana.
      
      5) Install missing tcp_server.py and tcp_client.py files from
         BPF selftests, from Anders.
      
      6) Add test_libbpf to gitignore in libbpf and BPF selftests,
         from Stanislav.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8b108b0
  2. 11 1月, 2019 8 次提交
    • D
      Merge branch 'bpf-fix-bitfield-printing' · fb4129b9
      Daniel Borkmann 提交于
      Yonghong Song says:
      
      ====================
      The previous BTF kind_flag support patch set introduced a bug
      for kernel bpffs pretty printing and another bug for bpftool
      map pretty printing. If a bitfield struct member offset is
      greater than 256 bits, printed value for that struct
      member will be incorrect.
      
      - Patch #1 fixed the bug in kernel bpffs pretty printing.
      - Patch #2 enhanced the test_btf test case to cover the
                 issue exposed by patch #1.
      - Patch #3 fixed the bug in bpftool map pretty printing.
      ====================
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      fb4129b9
    • Y
      tools/bpf: fix bpftool map dump with bitfields · 298e59d3
      Yonghong Song 提交于
      Commit 8772c8bc ("tools: bpftool: support pretty print
      with kind_flag set") added bpftool map dump with kind_flag
      support. When bitfield_size can be retrieved directly from
      btf_member, function btf_dumper_bitfield() is called to
      dump the bitfield. The implementation passed the
      wrong parameter "bit_offset" to the function. The excepted
      value is the bit_offset within a byte while the passed-in
      value is the struct member offset.
      
      This commit fixed the bug with passing correct "bit_offset"
      with adjusted data pointer.
      
      Fixes: 8772c8bc ("tools: bpftool: support pretty print with kind_flag set")
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      298e59d3
    • Y
      tools/bpf: test btf bitfield with >=256 struct member offset · e43207fa
      Yonghong Song 提交于
      This patch modified test_btf pretty print test to cover
      the bitfield with struct member equal to or greater 256.
      
      Without the previous kernel patch fix, the modified test will fail:
      
        $ test_btf -p
        ......
        BTF pretty print array(#1)......unexpected pprint output
        expected: 0: {0,0,0,0x3,0x0,0x3,{0|[0,0,0,0,0,0,0,0]},ENUM_ZERO,4,0x1}
            read: 0: {0,0,0,0x3,0x0,0x3,{0|[0,0,0,0,0,0,0,0]},ENUM_ZERO,4,0x0}
      
        BTF pretty print array(#2)......unexpected pprint output
        expected: 0: {0,0,0,0x3,0x0,0x3,{0|[0,0,0,0,0,0,0,0]},ENUM_ZERO,4,0x1}
            read: 0: {0,0,0,0x3,0x0,0x3,{0|[0,0,0,0,0,0,0,0]},ENUM_ZERO,4,0x0}
      
        PASS:6 SKIP:0 FAIL:2
      
      With the kernel fix, the modified test will succeed:
        $ test_btf -p
        ......
        BTF pretty print array(#1)......OK
        BTF pretty print array(#2)......OK
        PASS:8 SKIP:0 FAIL:0
      
      Fixes: 9d5f9f70 ("bpf: btf: fix struct/union/fwd types with kind_flag")
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      e43207fa
    • Y
      bpf: fix bpffs bitfield pretty print · 17e3ac81
      Yonghong Song 提交于
      Commit 9d5f9f70 ("bpf: btf: fix struct/union/fwd types
      with kind_flag") introduced kind_flag and used bitfield_size
      in the btf_member to directly pretty print member values.
      
      The commit contained a bug where the incorrect parameters could be
      passed to function btf_bitfield_seq_show(). The bits_offset
      parameter in the function expects a value less than 8.
      Instead, the member offset in the structure is passed.
      
      The below is btf_bitfield_seq_show() func signature:
        void btf_bitfield_seq_show(void *data, u8 bits_offset,
                                   u8 nr_bits, struct seq_file *m)
      both bits_offset and nr_bits are u8 type. If the bitfield
      member offset is greater than 256, incorrect value will
      be printed.
      
      This patch fixed the issue by calculating correct proper
      data offset and bits_offset similar to non kind_flag case.
      
      Fixes: 9d5f9f70 ("bpf: btf: fix struct/union/fwd types with kind_flag")
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      17e3ac81
    • H
      net: ethernet: mediatek: fix warning in phy_start_aneg · b19bce03
      Heiner Kallweit 提交于
      linux 5.0-rc1 shows following warning on bpi-r2/mt7623 bootup:
      
      [ 5.170597] WARNING: CPU: 3 PID: 1 at drivers/net/phy/phy.c:548 phy_start_aneg+0x110/0x144
      [ 5.178826] called from state READY
      ....
      [ 5.264111] [<c0629fd4>] (phy_start_aneg) from [<c0e3e720>] (mtk_init+0x414/0x47c)
      [ 5.271630] r7:df5f5eec r6:c0f08c48 r5:00000000 r4:dea67800
      [ 5.277256] [<c0e3e30c>] (mtk_init) from [<c07dabbc>] (register_netdevice+0x98/0x51c)
      [ 5.285035] r8:00000000 r7:00000000 r6:c0f97080 r5:c0f08c48 r4:dea67800
      [ 5.291693] [<c07dab24>] (register_netdevice) from [<c07db06c>] (register_netdev+0x2c/0x44)
      [ 5.299989] r8:00000000 r7:dea2e608 r6:deacea00 r5:dea2e604 r4:dea67800
      [ 5.306646] [<c07db040>] (register_netdev) from [<c06326d8>] (mtk_probe+0x668/0x7ac)
      [ 5.314336] r5:dea2e604 r4:dea2e040
      [ 5.317890] [<c0632070>] (mtk_probe) from [<c05a78fc>] (platform_drv_probe+0x58/0xa8)
      [ 5.325670] r10:c0f86bac r9:00000000 r8:c0fbe578 r7:00000000 r6:c0f86bac r5:00000000
      [ 5.333445] r4:deacea10
      [ 5.335963] [<c05a78a4>] (platform_drv_probe) from [<c05a5248>] (really_probe+0x2d8/0x424)
      
      maybe other boards using this generic driver are affected
      
      v2:
      optimization:
      
      - phy_set_max_speed() is only needed if you want to reduce the
        max speed, typically if the PHY supports 1Gbps but the MAC
        supports 100Mbps only.
      
      - The pause parameters are autonegotiated. Except you have a specific
        need you normally don't need to manually fiddle with this.
      
      - phy_start_aneg() is called implicitly by the phylib state machine,
        you shouldn't call it manually except you have a good excuse.
      
      - netif_carrier_on/netif_carrier_off in mtk_phy_link_adjust() isn't
        needed. It's done by phy_link_change() in phylib.
      Signed-off-by: NFrank Wunderlich <frank-w@public-files.de>
      Reviewed-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Acked-by: NSean Wang <sean.wang@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b19bce03
    • Y
      tcp: change txhash on SYN-data timeout · c5715b8f
      Yuchung Cheng 提交于
      Previously upon SYN timeouts the sender recomputes the txhash to
      try a different path. However this does not apply on the initial
      timeout of SYN-data (active Fast Open). Therefore an active IPv6
      Fast Open connection may incur one second RTO penalty to take on
      a new path after the second SYN retransmission uses a new flow label.
      
      This patch removes this undesirable behavior so Fast Open changes
      the flow label just like the regular connections. This also helps
      avoid falsely disabling Fast Open on the sender which triggers
      after two consecutive SYN timeouts on Fast Open.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Reviewed-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5715b8f
    • A
      net: dsa: mv88x6xxx: mv88e6390 errata · ea89098e
      Andrew Lunn 提交于
      The 6390 copper ports have an errata which require poking magic values
      into undocumented magic registers and then performing a software
      reset.
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea89098e
    • W
      bonding: update nest level on unlink · 001e465f
      Willem de Bruijn 提交于
      A network device stack with multiple layers of bonding devices can
      trigger a false positive lockdep warning. Adding lockdep nest levels
      fixes this. Update the level on both enslave and unlink, to avoid the
      following series of events ..
      
          ip netns add test
          ip netns exec test bash
          ip link set dev lo addr 00:11:22:33:44:55
          ip link set dev lo down
      
          ip link add dev bond1 type bond
          ip link add dev bond2 type bond
      
          ip link set dev lo master bond1
          ip link set dev bond1 master bond2
      
          ip link set dev bond1 nomaster
          ip link set dev bond2 master bond1
      
      .. from still generating a splat:
      
          [  193.652127] ======================================================
          [  193.658231] WARNING: possible circular locking dependency detected
          [  193.664350] 4.20.0 #8 Not tainted
          [  193.668310] ------------------------------------------------------
          [  193.674417] ip/15577 is trying to acquire lock:
          [  193.678897] 00000000a40e3b69 (&(&bond->stats_lock)->rlock#3/3){+.+.}, at: bond_get_stats+0x58/0x290
          [  193.687851]
          	       but task is already holding lock:
          [  193.693625] 00000000807b9d9f (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0x58/0x290
      
          [..]
      
          [  193.851092]        lock_acquire+0xa7/0x190
          [  193.855138]        _raw_spin_lock_nested+0x2d/0x40
          [  193.859878]        bond_get_stats+0x58/0x290
          [  193.864093]        dev_get_stats+0x5a/0xc0
          [  193.868140]        bond_get_stats+0x105/0x290
          [  193.872444]        dev_get_stats+0x5a/0xc0
          [  193.876493]        rtnl_fill_stats+0x40/0x130
          [  193.880797]        rtnl_fill_ifinfo+0x6c5/0xdc0
          [  193.885271]        rtmsg_ifinfo_build_skb+0x86/0xe0
          [  193.890091]        rtnetlink_event+0x5b/0xa0
          [  193.894320]        raw_notifier_call_chain+0x43/0x60
          [  193.899225]        netdev_change_features+0x50/0xa0
          [  193.904044]        bond_compute_features.isra.46+0x1ab/0x270
          [  193.909640]        bond_enslave+0x141d/0x15b0
          [  193.913946]        do_set_master+0x89/0xa0
          [  193.918016]        do_setlink+0x37c/0xda0
          [  193.921980]        __rtnl_newlink+0x499/0x890
          [  193.926281]        rtnl_newlink+0x48/0x70
          [  193.930238]        rtnetlink_rcv_msg+0x171/0x4b0
          [  193.934801]        netlink_rcv_skb+0xd1/0x110
          [  193.939103]        rtnetlink_rcv+0x15/0x20
          [  193.943151]        netlink_unicast+0x3b5/0x520
          [  193.947544]        netlink_sendmsg+0x2fd/0x3f0
          [  193.951942]        sock_sendmsg+0x38/0x50
          [  193.955899]        ___sys_sendmsg+0x2ba/0x2d0
          [  193.960205]        __x64_sys_sendmsg+0xad/0x100
          [  193.964687]        do_syscall_64+0x5a/0x460
          [  193.968823]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 7e2556e4 ("bonding: avoid lockdep confusion in bond_get_stats()")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      001e465f
  3. 10 1月, 2019 11 次提交
    • S
      bpf: fix panic in stack_map_get_build_id() on i386 and arm32 · beaf3d19
      Song Liu 提交于
      As Naresh reported, test_stacktrace_build_id() causes panic on i386 and
      arm32 systems. This is caused by page_address() returns NULL in certain
      cases.
      
      This patch fixes this error by using kmap_atomic/kunmap_atomic instead
      of page_address.
      
      Fixes: 615755a7 (" bpf: extend stackmap to save binary_build_id+offset instead of address")
      Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      beaf3d19
    • A
      selftests: bpf: install files tcp_(server|client)*.py · f98937c6
      Anders Roxell 提交于
      When test_tcpbpf_user runs it complains that it can't find files
      tcp_server.py and tcp_client.py.
      
      Rework so that tcp_server.py and tcp_client.py gets installed, added them
      to the variable TEST_PROGS_EXTENDED.
      
      Fixes: d6d4f60c ("bpf: add selftest for tcpbpf")
      Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f98937c6
    • I
      samples: bpf: user proper argument index · 11b36abc
      Ioana Ciornei 提交于
      Use optind as index for argv instead of a hardcoded value.
      When the program has options this leads to improper parameter handling.
      
      Fixes: dc378a1a ("samples: bpf: get ifindex from ifname")
      Signed-off-by: NIoana Ciornei <ioana.ciornei@nxp.com>
      Acked-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      11b36abc
    • S
      selftests/bpf: add missing executables to .gitignore · e3ca63de
      Stanislav Fomichev 提交于
      We build test_libbpf with CXX to make sure linking against C++ works.
      
      $ make -s -C tools/lib/bpf
      $ git status -sb
      ? tools/lib/bpf/test_libbpf
      $ make -s -C tools/testing/selftests/bpf
      $ git status -sb
      ? tools/lib/bpf/test_libbpf
      ? tools/testing/selftests/bpf/test_libbpf
      
      Fixes: 8c4905b9 ("libbpf: make sure bpf headers are c++ include-able")
      Signed-off-by: NStanislav Fomichev <sdf@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      e3ca63de
    • E
      ipv6: fix kernel-infoleak in ipv6_local_error() · 7d033c9f
      Eric Dumazet 提交于
      This patch makes sure the flow label in the IPv6 header
      forged in ipv6_local_error() is initialized.
      
      BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
      CPU: 1 PID: 24675 Comm: syz-executor1 Not tainted 4.20.0-rc7+ #4
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x173/0x1d0 lib/dump_stack.c:113
       kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
       kmsan_internal_check_memory+0x455/0xb00 mm/kmsan/kmsan.c:675
       kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601
       _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
       copy_to_user include/linux/uaccess.h:177 [inline]
       move_addr_to_user+0x2e9/0x4f0 net/socket.c:227
       ___sys_recvmsg+0x5d7/0x1140 net/socket.c:2284
       __sys_recvmsg net/socket.c:2327 [inline]
       __do_sys_recvmsg net/socket.c:2337 [inline]
       __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
       __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x457ec9
      Code: 6d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 3b b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f8750c06c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002f
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457ec9
      RDX: 0000000000002000 RSI: 0000000020000400 RDI: 0000000000000005
      RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f8750c076d4
      R13: 00000000004c4a60 R14: 00000000004d8140 R15: 00000000ffffffff
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:219 [inline]
       kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:439
       __msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200
       ipv6_recv_error+0x1e3f/0x1eb0 net/ipv6/datagram.c:475
       udpv6_recvmsg+0x398/0x2ab0 net/ipv6/udp.c:335
       inet_recvmsg+0x4fb/0x600 net/ipv4/af_inet.c:830
       sock_recvmsg_nosec net/socket.c:794 [inline]
       sock_recvmsg+0x1d1/0x230 net/socket.c:801
       ___sys_recvmsg+0x4d5/0x1140 net/socket.c:2278
       __sys_recvmsg net/socket.c:2327 [inline]
       __do_sys_recvmsg net/socket.c:2337 [inline]
       __se_sys_recvmsg+0x2fa/0x450 net/socket.c:2334
       __x64_sys_recvmsg+0x4a/0x70 net/socket.c:2334
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
       kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
       kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
       kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
       slab_post_alloc_hook mm/slab.h:446 [inline]
       slab_alloc_node mm/slub.c:2759 [inline]
       __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
       __kmalloc_reserve net/core/skbuff.c:137 [inline]
       __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
       alloc_skb include/linux/skbuff.h:998 [inline]
       ipv6_local_error+0x1a7/0x9e0 net/ipv6/datagram.c:334
       __ip6_append_data+0x129f/0x4fd0 net/ipv6/ip6_output.c:1311
       ip6_make_skb+0x6cc/0xcf0 net/ipv6/ip6_output.c:1775
       udpv6_sendmsg+0x3f8e/0x45d0 net/ipv6/udp.c:1384
       inet_sendmsg+0x54a/0x720 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg net/socket.c:631 [inline]
       __sys_sendto+0x8c4/0xac0 net/socket.c:1788
       __do_sys_sendto net/socket.c:1800 [inline]
       __se_sys_sendto+0x107/0x130 net/socket.c:1796
       __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
       do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
       entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      Bytes 4-7 of 28 are uninitialized
      Memory access of size 28 starts at ffff8881937bfce0
      Data copied to user address 0000000020000000
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7d033c9f
    • K
      net/core/neighbour: tell kmemleak about hash tables · 85704cb8
      Konstantin Khlebnikov 提交于
      This fixes false-positive kmemleak reports about leaked neighbour entries:
      
      unreferenced object 0xffff8885c6e4d0a8 (size 1024):
        comm "softirq", pid 0, jiffies 4294922664 (age 167640.804s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 20 2c f3 83 ff ff ff ff  ........ ,......
          08 c0 ef 5f 84 88 ff ff 01 8c 7d 02 01 00 00 00  ..._......}.....
        backtrace:
          [<00000000748509fe>] ip6_finish_output2+0x887/0x1e40
          [<0000000036d7a0d8>] ip6_output+0x1ba/0x600
          [<0000000027ea7dba>] ip6_send_skb+0x92/0x2f0
          [<00000000d6e2111d>] udp_v6_send_skb.isra.24+0x680/0x15e0
          [<000000000668a8be>] udpv6_sendmsg+0x18c9/0x27a0
          [<000000004bd5fa90>] sock_sendmsg+0xb3/0xf0
          [<000000008227b29f>] ___sys_sendmsg+0x745/0x8f0
          [<000000008698009d>] __sys_sendmsg+0xde/0x170
          [<00000000889dacf1>] do_syscall_64+0x9b/0x400
          [<0000000081cdb353>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000005767ed39>] 0xffffffffffffffff
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85704cb8
    • C
      net: cxgb4: fix various indentation issues · fd21c89b
      Colin Ian King 提交于
      There are some lines that have indentation issues, fix these.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd21c89b
    • C
      net: cxgb3: fix various indentation issues · 2acc0abc
      Colin Ian King 提交于
      There are handful of lines that have indentation issues, fix these.
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2acc0abc
    • W
      ip: on queued skb use skb_header_pointer instead of pskb_may_pull · 4a06fa67
      Willem de Bruijn 提交于
      Commit 2efd4fca ("ip: in cmsg IP(V6)_ORIGDSTADDR call
      pskb_may_pull") avoided a read beyond the end of the skb linear
      segment by calling pskb_may_pull.
      
      That function can trigger a BUG_ON in pskb_expand_head if the skb is
      shared, which it is when when peeking. It can also return ENOMEM.
      
      Avoid both by switching to safer skb_header_pointer.
      
      Fixes: 2efd4fca ("ip: in cmsg IP(V6)_ORIGDSTADDR call pskb_may_pull")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a06fa67
    • S
      tun: publish tfile after it's fully initialized · 0b7959b6
      Stanislav Fomichev 提交于
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
      Call Trace:
       ? napi_gro_frags+0xa7/0x2c0
       tun_get_user+0xb50/0xf20
       tun_chr_write_iter+0x53/0x70
       new_sync_write+0xff/0x160
       vfs_write+0x191/0x1e0
       __x64_sys_write+0x5e/0xd0
       do_syscall_64+0x47/0xf0
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      I think there is a subtle race between sending a packet via tap and
      attaching it:
      
      CPU0:                    CPU1:
      tun_chr_ioctl(TUNSETIFF)
        tun_set_iff
          tun_attach
            rcu_assign_pointer(tfile->tun, tun);
                               tun_fops->write_iter()
                                 tun_chr_write_iter
                                   tun_napi_alloc_frags
                                     napi_get_frags
                                       napi->skb = napi_alloc_skb
            tun_napi_init
              netif_napi_add
                napi->skb = NULL
                                    napi->skb is NULL here
                                    napi_gro_frags
                                      napi_frags_skb
      				  skb = napi->skb
      				  skb_reset_mac_header(skb)
      				  panic()
      
      Move rcu_assign_pointer(tfile->tun) and rcu_assign_pointer(tun->tfiles) to
      be the last thing we do in tun_attach(); this should guarantee that when we
      call tun_get() we always get an initialized object.
      
      v2 changes:
      * remove extra napi_mutex locks/unlocks for napi operations
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NStanislav Fomichev <sdf@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b7959b6
    • Y
      bpf: correctly set initial window on active Fast Open sender · 31aa6503
      Yuchung Cheng 提交于
      The existing BPF TCP initial congestion window (TCP_BPF_IW) does not
      to work on (active) Fast Open sender. This is because it changes the
      (initial) window only if data_segs_out is zero -- but data_segs_out
      is also incremented on SYN-data.  This patch fixes the issue by
      proerly accounting for SYN-data additionally.
      
      Fixes: fc747810 ("bpf: Adds support for setting initial cwnd")
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Reviewed-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NLawrence Brakmo <brakmo@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      31aa6503
  4. 09 1月, 2019 6 次提交
    • J
      packet: Do not leak dev refcounts on error exit · d972f3dc
      Jason Gunthorpe 提交于
      'dev' is non NULL when the addr_len check triggers so it must goto a label
      that does the dev_put otherwise dev will have a leaked refcount.
      
      This bug causes the ib_ipoib module to become unloadable when using
      systemd-network as it triggers this check on InfiniBand links.
      
      Fixes: 99137b78 ("packet: validate address length")
      Reported-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d972f3dc
    • D
      Merge branch 'mlxsw-fixes' · 4314b1f6
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2019-01-08
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix BSD'ism in sendmsg(2) to rewrite unspecified IPv6 dst for
         unconnected UDP sockets with [::1] _after_ cgroup BPF invocation,
         from Andrey.
      
      2) Follow-up fix to the speculation fix where we need to reject a
         corner case for sanitation when ptr and scalars are mixed in the
         same alu op. Also, some unrelated minor doc fixes, from Daniel.
      
      3) Fix BPF kselftest's incorrect uses of create_and_get_cgroup()
         by not assuming fd of zero value to be the result of an error
         case, from Stanislav.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4314b1f6
    • I
      selftests: forwarding: Add a test for VLAN deletion · 4fabf3bf
      Ido Schimmel 提交于
      Add a VLAN on a bridge port, delete it and make sure the PVID VLAN is
      not affected.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fabf3bf
    • I
      mlxsw: spectrum_switchdev: Set PVID correctly during VLAN deletion · 674bed5d
      Ido Schimmel 提交于
      When a VLAN is deleted from a bridge port we should not change the PVID
      unless the deleted VLAN is the PVID.
      
      Fixes: fe9ccc78 ("mlxsw: spectrum_switchdev: Don't batch VLAN operations")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      674bed5d
    • I
      selftests: forwarding: Fix test for different devices · 289fb44d
      Ido Schimmel 提交于
      When running the test on the Spectrum ASIC the generated packets are
      counted on the ingress filter and injected back to the pipeline because
      of the 'pass' action. The router block then drops the packets due to
      checksum error, as the test generates packets with zero checksum.
      
      When running the test on an emulator that is not as strict about
      checksum errors the test fails since packets are counted twice. Once by
      the emulated ASIC on its ingress filter and again by the kernel as the
      emulator does not perform checksum validation and allows the packets to
      be trapped by a matching host route.
      
      Fix this by changing the action to 'drop', which will prevent the packet
      from continuing further in the pipeline to the router block.
      
      For veth pairs this change is essentially a NOP given packets are only
      processed once (by the kernel).
      
      Fixes: a0b61f3d ("selftests: forwarding: vxlan_bridge_1d: Add an ECN decap test")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NPetr Machata <petrm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      289fb44d
    • I
      net: bridge: Fix VLANs memory leak · 27973793
      Ido Schimmel 提交于
      When adding / deleting VLANs to / from a bridge port, the bridge driver
      first tries to propagate the information via switchdev and falls back to
      the 8021q driver in case the underlying driver does not support
      switchdev. This can result in a memory leak [1] when VXLAN and mlxsw
      ports are enslaved to the bridge:
      
      $ ip link set dev vxlan0 master br0
      # No mlxsw ports are enslaved to 'br0', so mlxsw ignores the switchdev
      # notification and the bridge driver adds the VLAN on 'vxlan0' via the
      # 8021q driver
      $ bridge vlan add vid 10 dev vxlan0 pvid untagged
      # mlxsw port is enslaved to the bridge
      $ ip link set dev swp1 master br0
      # mlxsw processes the switchdev notification and the 8021q driver is
      # skipped
      $ bridge vlan del vid 10 dev vxlan0
      
      This results in 'struct vlan_info' and 'struct vlan_vid_info' being
      leaked, as they were allocated by the 8021q driver during VLAN addition,
      but never freed as the 8021q driver was skipped during deletion.
      
      Fix this by introducing a new VLAN private flag that indicates whether
      the VLAN was added on the port by switchdev or the 8021q driver. If the
      VLAN was added by the 8021q driver, then we make sure to delete it via
      the 8021q driver as well.
      
      [1]
      unreferenced object 0xffff88822d20b1e8 (size 256):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.830s)
        hex dump (first 32 bytes):
          e0 42 97 ce 81 88 ff ff 00 00 00 00 00 00 00 00  .B..............
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<00000000e0178b02>] vlan_vid_add+0x661/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      unreferenced object 0xffff888227454308 (size 32):
        comm "bridge", pid 2532, jiffies 4295216998 (age 1188.882s)
        hex dump (first 32 bytes):
          88 b2 20 2d 82 88 ff ff 88 b2 20 2d 82 88 ff ff  .. -...... -....
          81 00 0a 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000f82d851d>] kmem_cache_alloc_trace+0x1be/0x330
          [<0000000018050631>] vlan_vid_add+0x3e6/0x920
          [<00000000218ebd5f>] __vlan_add+0x1be9/0x3a00
          [<000000006eafa1ca>] nbp_vlan_add+0x8b3/0xd90
          [<000000003535392c>] br_vlan_info+0x132/0x410
          [<00000000aedaa9dc>] br_afspec+0x75c/0x870
          [<00000000f5716133>] br_setlink+0x3dc/0x6d0
          [<00000000aceca5e2>] rtnl_bridge_setlink+0x615/0xb30
          [<00000000a2f2d23e>] rtnetlink_rcv_msg+0x3a3/0xa80
          [<0000000064097e69>] netlink_rcv_skb+0x152/0x3c0
          [<000000008be8d614>] rtnetlink_rcv+0x21/0x30
          [<000000009ab2ca25>] netlink_unicast+0x52f/0x740
          [<00000000e7d9ac96>] netlink_sendmsg+0x9c7/0xf50
          [<000000005d1e2050>] sock_sendmsg+0xbe/0x120
          [<00000000d51426bc>] ___sys_sendmsg+0x778/0x8f0
          [<00000000b9d7b2cc>] __sys_sendmsg+0x112/0x270
      
      Fixes: d70e42b2 ("mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NPetr Machata <petrm@mellanox.com>
      Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: bridge@lists.linux-foundation.org
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27973793