1. 04 Mar, 2021 7 commits
  2. 03 Mar, 2021 5 commits
    • iwlwifi: don't call netif_napi_add() with rxq->lock held (was Re: Lockdep... · 295d4cd8
      Jiri Kosina committed
      iwlwifi: don't call netif_napi_add() with rxq->lock held (was Re: Lockdep warning in iwl_pcie_rx_handle())
      
      We can't call netif_napi_add() with rxq->lock held, as there is a potential
      for deadlock as spotted by lockdep (see below). rxq->lock is not
      protecting anything over the netif_napi_add() codepath anyway, so let's
      drop it just before calling into NAPI.
      
       ========================================================
       WARNING: possible irq lock inversion dependency detected
       5.12.0-rc1-00002-gbada49429032 #5 Not tainted
       --------------------------------------------------------
       irq/136-iwlwifi/565 just changed the state of lock:
       ffff89f28433b0b0 (&rxq->lock){+.-.}-{2:2}, at: iwl_pcie_rx_handle+0x7f/0x960 [iwlwifi]
       but this lock took another, SOFTIRQ-unsafe lock in the past:
        (napi_hash_lock){+.+.}-{2:2}
      
       and interrupts could create inverse lock ordering between them.
      
       other info that might help us debug this:
        Possible interrupt unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(napi_hash_lock);
                                      local_irq_disable();
                                      lock(&rxq->lock);
                                      lock(napi_hash_lock);
         <Interrupt>
           lock(&rxq->lock);
      
        *** DEADLOCK ***
      
       1 lock held by irq/136-iwlwifi/565:
        #0: ffff89f2b1440170 (sync_cmd_lockdep_map){+.+.}-{0:0}, at: iwl_pcie_irq_handler+0x5/0xb30
      
       the shortest dependencies between 2nd lock and 1st lock:
        -> (napi_hash_lock){+.+.}-{2:2} {
           HARDIRQ-ON-W at:
                             lock_acquire+0x277/0x3d0
                             _raw_spin_lock+0x2c/0x40
                             netif_napi_add+0x14b/0x270
                             e1000_probe+0x2fe/0xee0 [e1000e]
                             local_pci_probe+0x42/0x90
                             pci_device_probe+0x10b/0x1c0
                             really_probe+0xef/0x4b0
                             driver_probe_device+0xde/0x150
                             device_driver_attach+0x4f/0x60
                             __driver_attach+0x9c/0x140
                             bus_for_each_dev+0x79/0xc0
                             bus_add_driver+0x18d/0x220
                             driver_register+0x5b/0xf0
                             do_one_initcall+0x5b/0x300
                             do_init_module+0x5b/0x21c
                             load_module+0x1dae/0x22c0
                             __do_sys_finit_module+0xad/0x110
                             do_syscall_64+0x33/0x80
                             entry_SYSCALL_64_after_hwframe+0x44/0xae
           SOFTIRQ-ON-W at:
                             lock_acquire+0x277/0x3d0
                             _raw_spin_lock+0x2c/0x40
                             netif_napi_add+0x14b/0x270
                             e1000_probe+0x2fe/0xee0 [e1000e]
                             local_pci_probe+0x42/0x90
                             pci_device_probe+0x10b/0x1c0
                             really_probe+0xef/0x4b0
                             driver_probe_device+0xde/0x150
                             device_driver_attach+0x4f/0x60
                             __driver_attach+0x9c/0x140
                             bus_for_each_dev+0x79/0xc0
                             bus_add_driver+0x18d/0x220
                             driver_register+0x5b/0xf0
                             do_one_initcall+0x5b/0x300
                             do_init_module+0x5b/0x21c
                             load_module+0x1dae/0x22c0
                             __do_sys_finit_module+0xad/0x110
                             do_syscall_64+0x33/0x80
                             entry_SYSCALL_64_after_hwframe+0x44/0xae
           INITIAL USE at:
                            lock_acquire+0x277/0x3d0
                            _raw_spin_lock+0x2c/0x40
                            netif_napi_add+0x14b/0x270
                            e1000_probe+0x2fe/0xee0 [e1000e]
                            local_pci_probe+0x42/0x90
                            pci_device_probe+0x10b/0x1c0
                            really_probe+0xef/0x4b0
                            driver_probe_device+0xde/0x150
                            device_driver_attach+0x4f/0x60
                            __driver_attach+0x9c/0x140
                            bus_for_each_dev+0x79/0xc0
                            bus_add_driver+0x18d/0x220
                            driver_register+0x5b/0xf0
                            do_one_initcall+0x5b/0x300
                            do_init_module+0x5b/0x21c
                            load_module+0x1dae/0x22c0
                            __do_sys_finit_module+0xad/0x110
                            do_syscall_64+0x33/0x80
                            entry_SYSCALL_64_after_hwframe+0x44/0xae
         }
         ... key      at: [<ffffffffae84ef38>] napi_hash_lock+0x18/0x40
         ... acquired at:
          _raw_spin_lock+0x2c/0x40
          netif_napi_add+0x14b/0x270
          _iwl_pcie_rx_init+0x1f4/0x710 [iwlwifi]
          iwl_pcie_rx_init+0x1b/0x3b0 [iwlwifi]
          iwl_trans_pcie_start_fw+0x2ac/0x6a0 [iwlwifi]
          iwl_mvm_load_ucode_wait_alive+0x116/0x460 [iwlmvm]
          iwl_run_init_mvm_ucode+0xa4/0x3a0 [iwlmvm]
          iwl_op_mode_mvm_start+0x9ed/0xbf0 [iwlmvm]
          _iwl_op_mode_start.isra.4+0x42/0x80 [iwlwifi]
          iwl_opmode_register+0x71/0xe0 [iwlwifi]
          iwl_mvm_init+0x34/0x1000 [iwlmvm]
          do_one_initcall+0x5b/0x300
          do_init_module+0x5b/0x21c
          load_module+0x1dae/0x22c0
          __do_sys_finit_module+0xad/0x110
          do_syscall_64+0x33/0x80
          entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [ ... lockdep output trimmed .... ]
      
      Fixes: 25edc8f2 ("iwlwifi: pcie: properly implement NAPI")
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2103021134060.12405@cbobk.fhfr.pm
    • iwlwifi: fix ARCH=i386 compilation warnings · 436b2656
      Pierre-Louis Bossart committed
      An unsigned long variable should use the '%lu' format specifier, not '%zd'.
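Here is a minimal userspace illustration of the mismatch, assuming GCC or Clang with -Wformat enabled. On i386, size_t is unsigned int, so '%zd' expects an int-sized signed argument and the compiler warns when it is given an unsigned long. '%lu' matches unsigned long on every architecture. The helper name and message text are hypothetical and are not taken from the PNVM code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an unsigned long length into buf.
 * '%lu' is the conversion that matches unsigned long everywhere.
 * With '%zd' the types only line up in width on x86_64 (where size_t is
 * unsigned long); on i386 size_t is unsigned int and -Wformat warns. */
static int format_len(char *buf, size_t bufsz, unsigned long len)
{
	return snprintf(buf, bufsz, "data size: %lu bytes", len);
}
```

For example, `format_len(buf, sizeof(buf), 4096UL)` produces the string "data size: 4096 bytes".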
      
      Fixes: a1a6a4cf ("iwlwifi: pnvm: implement reading PNVM from UEFI")
      Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Acked-by: Luca Coelho <luciano.coelho@intel.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210302011640.1276636-1-pierre-louis.bossart@linux.intel.com
    • iwlwifi: mvm: add terminate entry for dmi_system_id tables · a22549f1
      Wei Yongjun committed
      Make sure dmi_system_id tables are NULL terminated. This crashed when LTO was enabled:
      
      BUG: KASAN: global-out-of-bounds in dmi_check_system+0x5a/0x70
      Read of size 1 at addr ffffffffc16af750 by task NetworkManager/1913
      
      CPU: 4 PID: 1913 Comm: NetworkManager Not tainted 5.12.0-rc1+ #10057
      Hardware name: LENOVO 20THCTO1WW/20THCTO1WW, BIOS N2VET27W (1.12 ) 12/21/2020
      Call Trace:
       dump_stack+0x90/0xbe
       print_address_description.constprop.0+0x1d/0x140
       ? dmi_check_system+0x5a/0x70
       ? dmi_check_system+0x5a/0x70
       kasan_report.cold+0x7b/0xd4
       ? dmi_check_system+0x5a/0x70
       __asan_load1+0x4d/0x50
       dmi_check_system+0x5a/0x70
       iwl_mvm_up+0x1360/0x1690 [iwlmvm]
       ? iwl_mvm_send_recovery_cmd+0x270/0x270 [iwlmvm]
       ? setup_object.isra.0+0x27/0xd0
       ? kasan_poison+0x20/0x50
       ? ___slab_alloc.constprop.0+0x483/0x5b0
       ? mempool_kmalloc+0x17/0x20
       ? ftrace_graph_ret_addr+0x2a/0xb0
       ? kasan_poison+0x3c/0x50
       ? cfg80211_iftype_allowed+0x2e/0x90 [cfg80211]
       ? __kasan_check_write+0x14/0x20
       ? mutex_lock+0x86/0xe0
       ? __mutex_lock_slowpath+0x20/0x20
       __iwl_mvm_mac_start+0x49/0x290 [iwlmvm]
       iwl_mvm_mac_start+0x37/0x50 [iwlmvm]
       drv_start+0x73/0x1b0 [mac80211]
       ieee80211_do_open+0x53e/0xf10 [mac80211]
       ? ieee80211_check_concurrent_iface+0x266/0x2e0 [mac80211]
       ieee80211_open+0xb9/0x100 [mac80211]
       __dev_open+0x1b8/0x280
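This sketch shows why the terminating entry matters. It uses a hypothetical, trimmed-down table type in place of the real struct dmi_system_id. The walker keeps advancing until it reaches an empty entry, as dmi_check_system() does. A table that lacks one is therefore read past its end, which is the global-out-of-bounds access KASAN reports above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct dmi_system_id. */
struct dmi_id_model {
	const char *ident;  /* NULL marks the terminating entry */
	const char *vendor;
};

/* Hypothetical approved list; the final empty entry is what the fix adds.
 * Without it, the loop below would keep reading beyond the array. */
static const struct dmi_id_model approved_list[] = {
	{ "vendor-a", "VENDOR A" },
	{ "vendor-b", "VENDOR B" },
	{ NULL, NULL } /* terminating entry */
};

/* Walk the table until the empty entry and count vendor matches. */
static int count_matches(const struct dmi_id_model *list, const char *vendor)
{
	int n = 0;

	for (const struct dmi_id_model *d = list; d->ident; d++)
		if (strcmp(d->vendor, vendor) == 0)
			n++;
	return n;
}
```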
      
      Fixes: a2ac0f48 ("iwlwifi: mvm: implement approved list for the PPAG feature")
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Victor Michel <vic.michel.web@gmail.com>
      Acked-by: Luca Coelho <luciano.coelho@intel.com>
      [kvalo@codeaurora.org: improve commit log]
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      Link: https://lore.kernel.org/r/20210223140039.1708534-1-weiyongjun1@huawei.com
    • net: ethernet: mtk-star-emac: fix wrong unmap in RX handling · 95b39f07
      Biao Huang committed
      mtk_star_dma_unmap_rx() should unmap the dma_addr of the old skb rather
      than that of the new skb.
      Assign new_dma_addr to desc_data.dma_addr only after all handling of the
      old skb has finished, to avoid unexpected receive-side errors.
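The ordering can be sketched in userspace. All names here are hypothetical stand-ins: desc_data_model replaces the driver's descriptor data, unmap_rx_model replaces mtk_star_dma_unmap_rx(), and recording an address replaces a real DMA unmap. The only point is that the old address must be unmapped before desc_data.dma_addr is overwritten.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_model_t; /* stand-in for the kernel's dma_addr_t */

/* Hypothetical stand-in for the driver's per-descriptor data. */
struct desc_data_model {
	dma_addr_model_t dma_addr;
};

/* Records which address the fake unmap released, for inspection. */
static dma_addr_model_t last_unmapped;

/* Stand-in for the driver's unmap helper: a real driver would release
 * the descriptor's current DMA mapping here. */
static void unmap_rx_model(const struct desc_data_model *desc)
{
	last_unmapped = desc->dma_addr;
}

/* Correct ordering: unmap while desc still holds the OLD address,
 * then publish the new one. */
static void rx_swap_buffer(struct desc_data_model *desc,
			   dma_addr_model_t new_dma_addr)
{
	unmap_rx_model(desc);          /* releases the old skb's mapping */
	/* ... hand the old skb up the stack here ... */
	desc->dma_addr = new_dma_addr; /* overwrite only at the very end */
}
```

The buggy variant assigns new_dma_addr first, so the unmap call then releases the new buffer's mapping instead of the old one.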
      
      Fixes: f96e9641 ("net: ethernet: mtk-star-emac: fix error path in RX handling")
      Signed-off-by: Biao Huang <biao.huang@mediatek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: intel: Fix mdio bus registration issue for TGL-H/ADL-S · fa706dce
      Wong Vee Khee committed
      On Intel platforms that have two Ethernet controllers, such as
      TGL-H and ADL-S, a unique MDIO bus id is required for the MDIO bus to be
      successfully registered:
      
      [   13.076133] sysfs: cannot create duplicate filename '/class/mdio_bus/stmmac-1'
      [   13.083404] CPU: 8 PID: 1898 Comm: systemd-udevd Tainted: G     U            5.11.0-net-next #106
      [   13.092410] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-S ADP-S DRR4 CRB, BIOS ADLIFSI1.R00.1494.B00.2012031421 12/03/2020
      [   13.105709] Call Trace:
      [   13.108176]  dump_stack+0x64/0x7c
      [   13.111553]  sysfs_warn_dup+0x56/0x70
      [   13.115273]  sysfs_do_create_link_sd.isra.2+0xbd/0xd0
      [   13.120371]  device_add+0x4df/0x840
      [   13.123917]  ? complete_all+0x2a/0x40
      [   13.127636]  __mdiobus_register+0x98/0x310 [libphy]
      [   13.132572]  stmmac_mdio_register+0x1c5/0x3f0 [stmmac]
      [   13.137771]  ? stmmac_napi_add+0xa5/0xf0 [stmmac]
      [   13.142493]  stmmac_dvr_probe+0x806/0xee0 [stmmac]
      [   13.147341]  intel_eth_pci_probe+0x1cb/0x250 [dwmac_intel]
      [   13.152884]  pci_device_probe+0xd2/0x150
      [   13.156897]  really_probe+0xf7/0x4d0
      [   13.160527]  driver_probe_device+0x5d/0x140
      [   13.164761]  device_driver_attach+0x4f/0x60
      [   13.168996]  __driver_attach+0xa2/0x140
      [   13.172891]  ? device_driver_attach+0x60/0x60
      [   13.177300]  bus_for_each_dev+0x76/0xc0
      [   13.181188]  bus_add_driver+0x189/0x230
      [   13.185083]  ? 0xffffffffc0795000
      [   13.188446]  driver_register+0x5b/0xf0
      [   13.192249]  ? 0xffffffffc0795000
      [   13.195577]  do_one_initcall+0x4d/0x210
      [   13.199467]  ? kmem_cache_alloc_trace+0x2ff/0x490
      [   13.204228]  do_init_module+0x5b/0x21c
      [   13.208031]  load_module+0x2a0c/0x2de0
      [   13.211838]  ? __do_sys_finit_module+0xb1/0x110
      [   13.216420]  __do_sys_finit_module+0xb1/0x110
      [   13.220825]  do_syscall_64+0x33/0x40
      [   13.224451]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   13.229515] RIP: 0033:0x7fc2b1919ccd
      [   13.233113] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 31 0c 00 f7 d8 64 89 01 48
      [   13.251912] RSP: 002b:00007ffcea2e5b98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [   13.259527] RAX: ffffffffffffffda RBX: 0000560558920f10 RCX: 00007fc2b1919ccd
      [   13.266706] RDX: 0000000000000000 RSI: 00007fc2b1a881e3 RDI: 0000000000000012
      [   13.273887] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
      [   13.281036] R10: 0000000000000012 R11: 0000000000000246 R12: 00007fc2b1a881e3
      [   13.288183] R13: 0000000000000000 R14: 0000000000000000 R15: 00007ffcea2e5d58
      [   13.295389] libphy: mii_bus stmmac-1 failed to register
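The log shows both controllers trying to register 'stmmac-1'. The sketch below shows the naming clash and one way to avoid it. It assumes the bus name is formatted as "stmmac-%x" from a per-device bus_id. The id scheme is a hypothetical choice: it packs the PCI bus number and devfn the way the kernel's PCI_DEVID() macro does, then adds one. The commit message does not say how the fix derives the id.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: the MDIO bus name depends only on bus_id, so two
 * controllers sharing a bus_id produce the same sysfs name. */
static void mdio_bus_name(char *buf, size_t len, int bus_id)
{
	snprintf(buf, len, "stmmac-%x", bus_id);
}

/* Assumed scheme, not taken from the patch: pack PCI bus number and
 * devfn (bus << 8 | devfn) and add 1 so the id is never 0. */
static int unique_bus_id(unsigned int bus, unsigned int devfn)
{
	return (int)(((bus & 0xffu) << 8) | (devfn & 0xffu)) + 1;
}
```

With a fixed bus_id of 1, both controllers end up as "stmmac-1". With ids derived from distinct devfn values (for example, the hypothetical 0x90 and 0xa0), the names differ.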
      
      Fixes: 88af9bd4 ("stmmac: intel: Add ADL-S 1Gbps PCI IDs")
      Fixes: 8450e23f ("stmmac: intel: Add PCI IDs for TGL-H platform")
      Signed-off-by: Wong Vee Khee <vee.khee.wong@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Mar, 2021 28 commits
    • tcp: add sanity tests to TCP_QUEUE_SEQ · 8811f4a9
      Eric Dumazet committed
      Qingyu Li reported a syzkaller bug where the repro
      changes RCV SEQ _after_ restoring data in the receive queue.
      
      mprotect(0x4aa000, 12288, PROT_READ)    = 0
      mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
      mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
      mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
      socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
      setsockopt(3, SOL_TCP, TCP_REPAIR, [1], 4) = 0
      connect(3, {sa_family=AF_INET6, sin6_port=htons(0), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_scope_id=0}, 28) = 0
      setsockopt(3, SOL_TCP, TCP_REPAIR_QUEUE, [1], 4) = 0
      sendmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="0x0000000000000003\0\0", iov_len=20}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
      setsockopt(3, SOL_TCP, TCP_REPAIR, [0], 4) = 0
      setsockopt(3, SOL_TCP, TCP_QUEUE_SEQ, [128], 4) = 0
      recvfrom(3, NULL, 20, 0, NULL, NULL)    = -1 ECONNRESET (Connection reset by peer)
      
      syslog shows:
      [  111.205099] TCP recvmsg seq # bug 2: copied 80, seq 0, rcvnxt 80, fl 0
      [  111.207894] WARNING: CPU: 1 PID: 356 at net/ipv4/tcp.c:2343 tcp_recvmsg_locked+0x90e/0x29a0
      
      This should not be allowed. TCP_QUEUE_SEQ should only be used
      when queues are empty.
      
      This patch fixes this case, and the tx path as well.
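Here is a reduced model of the rule stated above, as a sketch only. struct tcp_model, set_queue_seq() and the enum names are hypothetical, and the commit message does not spell out the exact kernel checks. The receive-side sequence may only change while rcv_nxt equals copied_seq, meaning nothing has been restored but left unread. The send side may only change while no restored data is queued for transmit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names mirroring the TCP_REPAIR_QUEUE selector. */
enum repair_queue_model { NO_QUEUE, RECV_QUEUE, SEND_QUEUE };

/* Reduced, hypothetical model of the socket state involved. */
struct tcp_model {
	enum repair_queue_model repair_queue;
	uint32_t rcv_nxt;    /* next sequence number expected */
	uint32_t copied_seq; /* next sequence number handed to the reader */
	uint32_t write_seq;  /* next sequence number to send */
	bool tx_queue_empty;
};

/* Accept a new queue sequence number only while that queue is empty. */
static int set_queue_seq(struct tcp_model *tp, uint32_t val)
{
	switch (tp->repair_queue) {
	case RECV_QUEUE:
		if (tp->rcv_nxt != tp->copied_seq)
			return -EPERM;    /* restored data not yet read */
		tp->rcv_nxt = val;
		tp->copied_seq = val; /* keep reader and receiver in step */
		return 0;
	case SEND_QUEUE:
		if (!tp->tx_queue_empty)
			return -EPERM;    /* restored data still queued */
		tp->write_seq = val;
		return 0;
	default:
		return -EINVAL;
	}
}
```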
      
      Fixes: ee995283 ("tcp: Initial repair mode")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=212005
      Reported-by: Qingyu Li <ieatmuttonchuan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hv_netvsc: Fix validation in netvsc_linkstatus_callback() · 3946688e
      Andrea Parri (Microsoft) committed
      Contrary to the RNDIS protocol specification, certain (pre-Fe)
      implementations of Hyper-V's vSwitch did not account for the status
      buffer field in the length of an RNDIS packet; the bug was fixed in
      newer implementations.  Validate the status buffer fields using the
      length of the 'vmtransfer_page' packet (all implementations), which
      is known/validated to be less than or equal to the receive section
      size and not smaller than the length of the RNDIS message.
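Here is a simplified sketch of the containment check. It assumes the status buffer offset and length are measured against the same, already validated packet length. The function and parameter names are hypothetical, and the driver's real offset base may differ. The check tests offset first, then compares len against pkt_len - offset. This avoids the unsigned wrap-around that a naive offset + len <= pkt_len test would allow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does [offset, offset + len) lie inside a packet of
 * pkt_len bytes? pkt_len - offset cannot underflow once offset <= pkt_len,
 * so no intermediate sum can wrap. */
static bool status_buf_fits(uint32_t pkt_len, uint32_t offset, uint32_t len)
{
	return offset <= pkt_len && len <= pkt_len - offset;
}
```

For instance, offset 0xFFFFFFF0 with length 0x20 wraps to 0x10 under the naive sum and would pass it. The check above rejects it.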
      Reported-by: Dexuan Cui <decui@microsoft.com>
      Suggested-by: Haiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: Andrea Parri (Microsoft) <parri.andrea@gmail.com>
      Fixes: 505e3f00 ("hv_netvsc: Add (more) validation for untrusted Hyper-V values")
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_mtk: fix 802.1ad VLAN egress · 9200f515
      DENG Qingfang committed
      A different TPID bit is used for 802.1ad VLAN frames.
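Here is a sketch of the selection logic, under stated assumptions. The TPID values 0x8100 (802.1Q) and 0x88A8 (802.1ad) are standard. The enum and its numeric encodings are hypothetical placeholders for the driver's own tag constants. The protocol is taken in host byte order for simplicity.

```c
#include <assert.h>
#include <stdint.h>

#define TPID_8021Q  0x8100 /* 802.1Q C-tag */
#define TPID_8021AD 0x88A8 /* 802.1ad S-tag */

/* Hypothetical encodings for the special tag's "tagged" field. */
enum xmit_tpid_model { XMIT_UNTAGGED = 0, XMIT_TPID_8100 = 1, XMIT_TPID_88A8 = 2 };

/* Pick the tag encoding from the frame's outer protocol, so that
 * 802.1ad frames no longer share the 802.1Q encoding. */
static enum xmit_tpid_model select_tpid(uint16_t protocol)
{
	switch (protocol) {
	case TPID_8021AD:
		return XMIT_TPID_88A8;
	case TPID_8021Q:
		return XMIT_TPID_8100;
	default:
		return XMIT_UNTAGGED;
	}
}
```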
      Reported-by: Ilario Gelmetti <iochesonome@gmail.com>
      Fixes: f0af3431 ("net: dsa: mediatek: combine MediaTek tag with VLAN tag")
      Signed-off-by: DENG Qingfang <dqfext@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: expand textsearch ts_state to fit skb_seq_state · b228c9b0
      Willem de Bruijn committed
      The referenced commit expands the skb_seq_state used by
      skb_find_text with a 4B frag_off field, growing it to 48B.
      
      This exceeds container ts_state->cb, causing a stack corruption:
      
      [   73.238353] Kernel panic - not syncing: stack-protector: Kernel stack
      is corrupted in: skb_find_text+0xc5/0xd0
      [   73.247384] CPU: 1 PID: 376 Comm: nping Not tainted 5.11.0+ #4
      [   73.252613] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
      BIOS 1.14.0-2 04/01/2014
      [   73.260078] Call Trace:
      [   73.264677]  dump_stack+0x57/0x6a
      [   73.267866]  panic+0xf6/0x2b7
      [   73.270578]  ? skb_find_text+0xc5/0xd0
      [   73.273964]  __stack_chk_fail+0x10/0x10
      [   73.277491]  skb_find_text+0xc5/0xd0
      [   73.280727]  string_mt+0x1f/0x30
      [   73.283639]  ipt_do_table+0x214/0x410
      
      The struct is passed between skb_find_text and its callbacks
      skb_prepare_seq_read, skb_seq_read and skb_abort_seq_read through
      the textsearch interface using TS_SKB_CB.
      
      I assumed that this mapped to skb->cb like other .._SKB_CB wrappers.
      skb->cb is 48B. But it maps to ts_state->cb, which is only 40B.
      
      skb->cb was increased from 40B to 48B after ts_state was introduced,
      in commit 3e3850e9 ("[NETFILTER]: Fix xfrm lookup in
      ip_route_me_harder/ip6_route_me_harder").
      
      Increase ts_state.cb[] to 48 to fit the struct.
      
      Also add a BUILD_BUG_ON to avoid a repeat.
      
      The alternative is to directly add a dependency from textsearch onto
      linux/skbuff.h, but I think the intent is textsearch to have no such
      dependencies on its callers.
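The pattern the commit adopts is a larger scratch area plus a build-time size check. It can be sketched in C11, with _Static_assert standing in for the kernel's BUILD_BUG_ON. The struct names are hypothetical, and seq_state_model is only loosely modeled on skb_seq_state. On a typical LP64 build seq_state_model occupies 48 bytes, so it would not have fit in the old 40-byte area.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic state with an opaque scratch area (cf. ts_state). */
struct ts_state_model {
	unsigned int offset;
	char cb[48]; /* enlarged from 40 so the caller's state fits */
};

/* Hypothetical caller state kept in cb (loosely cf. skb_seq_state). */
struct seq_state_model {
	uint32_t lower_offset;
	uint32_t upper_offset;
	uint32_t frag_idx;
	uint32_t stepped_offset;
	const void *root_skb;
	const void *cur_skb;
	const uint8_t *frag_data;
	uint32_t frag_off;
};

/* Build fails if the caller's state ever outgrows the scratch area. */
_Static_assert(sizeof(struct seq_state_model) <=
	       sizeof(((struct ts_state_model *)0)->cb),
	       "seq_state_model does not fit in ts_state_model.cb");

static void save_seq_state(struct ts_state_model *ts,
			   const struct seq_state_model *st)
{
	memcpy(ts->cb, st, sizeof(*st)); /* in bounds thanks to the assert */
}

static void load_seq_state(const struct ts_state_model *ts,
			   struct seq_state_model *st)
{
	memcpy(st, ts->cb, sizeof(*st));
}
```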
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=211911
      Fixes: 97550f6f ("net: compound page support in skb_seq_read")
      Reported-by: Kris Karas <bugs-a17@moonlit-rail.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • docs: networking: bonding.rst Fix a typo in bonding.rst · 2353db75
      Masanari Iida committed
      This patch fixes a spelling typo in bonding.rst.
      Signed-off-by: Masanari Iida <standby24x7@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-fixes-for-5.12-20210301' of... · 2eb48982
      David S. Miller committed
      Merge tag 'linux-can-fixes-for-5.12-20210301' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-03-01
      
      this is a pull request of 6 patches for net/master.
      
      The first 3 patches are by Joakim Zhang for the flexcan driver and fix
      the probing and starting of the chip.
      
      The next patch is by me, for the mcp251xfd driver and reverts the BQL
      support. BQL support got mainline with rc1 and assumes that CAN frames
      are always echoed, which is not the case. A proper fix requires
      more changes and will be rolled out via linux-can-next later.
      
      Oleksij Rempel's patch fixes the socket ref counting if socket was
      closed before setting skb ownership.
      
      Torin Cooper-Bennun's patch for the tcan4x5x driver fixes a race
      condition, where the chip is first attached to the bus and then the MRAM
      is initialized, which may result in lost data.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'enetc-fixes' · 8a00946e
      David S. Miller committed
      Vladimir Oltean says:
      
      ====================
      Fixes for NXP ENETC driver
      
      This contains an assorted set of fixes collected over the past 2 weeks
      on the enetc driver. Some are related to VLAN processing, some to
      physical link settings, some are fixups of previous hardware workarounds,
      and some are simply zero-day data path bugs that for some reason were
      never caught or at least identified.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: keep RX ring consumer index in sync with hardware · 3a5d12c9
      Vladimir Oltean committed
      The RX rings have a producer index owned by hardware, where newly
      received frame buffers are placed, and a consumer index owned by
      software, where newly allocated buffers are placed, in expectation of
      hardware being able to place frame data in them.
      
      Hardware increments the producer index when a frame is received; however,
      it is not allowed to increment the producer index to match the consumer
      index (RBCIR) since the ring can hold at most RBLENR[LENGTH]-1 received
      BDs. Whenever the producer index matches the value of the consumer
      index, the ring has no unprocessed received frames and all BDs in the
      ring have been initialized/prepared by software, i.e. hardware owns all
      BDs in the ring.
      
      The code uses the next_to_clean variable to keep track of the producer
      index, and the next_to_use variable to keep track of the consumer index.
      
      The RX rings are seeded from enetc_refill_rx_ring, which is called from
      two places:
      
      1. initially the ring is seeded until full with enetc_bd_unused(rx_ring),
         i.e. with 511 buffers. This will make next_to_clean=0 and next_to_use=511:
      
      .ndo_open
      -> enetc_open
         -> enetc_setup_bdrs
            -> enetc_setup_rxbdr
               -> enetc_refill_rx_ring
      
      2. then during the data path processing, it is refilled with 16 buffers
         at a time:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_clean_rx_ring
               -> enetc_refill_rx_ring
      
      There is just one problem: the initial seeding done during .ndo_open
      updates just the producer index (ENETC_RBPIR) with 0, and the software
      next_to_clean and next_to_use variables. Notably, it will not update the
      consumer index to make the hardware aware of the newly added buffers.
      
      Wait, what? So how does it work?
      
      Well, the reset values of the producer index and of the consumer index
      of a ring are both zero. As per the description in the second paragraph,
      it means that the ring is full of buffers waiting for hardware to put
      frames in them, which by coincidence is almost true, because we have in
      fact seeded 511 buffers into the ring.
      
      But will the hardware attempt to access the 512th entry of the ring,
      which has an invalid BD in it? Well, no, because in order to do that, it
      would have to first populate the first 511 entries, and the NAPI
      enetc_poll will kick in by then. Eventually, after 16 processed slots
      have become available in the RX ring, enetc_clean_rx_ring will call
      enetc_refill_rx_ring and then will [ finally ] update the consumer index
      with the new software next_to_use variable. From now on, the
      next_to_clean and next_to_use variables are in sync with the producer
      and consumer ring indices.
      
      So the day is saved, right? Well, not quite. Freeing the memory
      allocated for the rings is done in:
      
      enetc_close
      -> enetc_clear_bdrs
         -> enetc_clear_rxbdr
            -> this just disables the ring
      -> enetc_free_rxtx_rings
         -> enetc_free_rx_ring
            -> sets next_to_clean and next_to_use to 0
      
      but again, nothing is committed to the hardware producer and consumer
      indices (yay!). The assumption is that the ring is disabled, so the
      indices don't matter anyway, and it's the responsibility of the "open"
      code path to set those up.
      
      .. Except that the "open" code path does not set those up properly.
      
      While initially, things almost work, during subsequent enetc_close ->
      enetc_open sequences, we have problems. To be precise, the enetc_open
      that is subsequent to enetc_close will again refill the ring with 511
      entries, but it will leave the consumer index untouched. Untouched
      means, of course, equal to the value it had before disabling the ring
      and draining the old buffers in enetc_close.
      
      But as mentioned, enetc_setup_rxbdr will at least update the producer
      index though, through this line of code:
      
      	enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
      
      so at this stage we'll have:
      
      next_to_clean=0 (in hardware 0)
      next_to_use=511 (in hardware we'll have the refill index prior to enetc_close)
      
      Again, the next_to_clean and producer index are in sync and set to
      correct values, so the driver manages to limp on. Eventually, 16 ring
      entries will be consumed by enetc_poll, and the savior
      enetc_clean_rx_ring will come and call enetc_refill_rx_ring, and then
      update the hardware consumer ring based upon the new next_to_use.
      
      So.. it works?
      Well, by coincidence, it almost does, but there's a circumstance where
      enetc_clean_rx_ring won't be there to save us. If the previous value of
      the consumer index was 15, there's a problem, because the NAPI poll
      sequence will only issue a refill when 16 or more buffers have been
      consumed.
      
      It's easiest to illustrate this with an example:
      
      ip link set eno0 up
      ip addr add 192.168.100.1/24 dev eno0
      ping 192.168.100.1 -c 20 # ping this port from another board
      ip link set eno0 down
      ip link set eno0 up
      ping 192.168.100.1 -c 20 # ping it again from the same other board
      
      One by one:
      
      1. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 0)
      
      2. ping 192.168.100.1 -c 20 # ping this port from another board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=15 next_to_clean 14 (in hw 15) next_to_use 511 (in hw 0)
      enetc_clean_rx_ring: enetc_refill_rx_ring(16) increments next_to_use by 16 (mod 512) and writes it to hw
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=0 next_to_clean 15 (in hw 16) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 16 (in hw 17) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 17 (in hw 18) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 18 (in hw 19) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 19 (in hw 20) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 20 (in hw 21) next_to_use 15 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 21 (in hw 22) next_to_use 15 (in hw 15)
      
      20 packets transmitted, 20 packets received, 0% packet loss
      
      3. ip link set eno0 down
      enetc_free_rx_ring: next_to_clean 0 (in hw 22), next_to_use 0 (in hw 15)
      
      4. ip link set eno0 up
      -> calls enetc_setup_rxbdr:
         -> calls enetc_refill_rx_ring(511 buffers)
         -> next_to_clean=0 (in hw 0)
         -> next_to_use=511 (in hw 15)
      
      5. ping 192.168.100.1 -c 20 # ping it again from the same other board
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=1 next_to_clean 0 (in hw 1) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=2 next_to_clean 1 (in hw 2) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=3 next_to_clean 2 (in hw 3) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=4 next_to_clean 3 (in hw 4) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=5 next_to_clean 4 (in hw 5) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=6 next_to_clean 5 (in hw 6) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=7 next_to_clean 6 (in hw 7) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=8 next_to_clean 7 (in hw 8) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=9 next_to_clean 8 (in hw 9) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=10 next_to_clean 9 (in hw 10) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=11 next_to_clean 10 (in hw 11) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=12 next_to_clean 11 (in hw 12) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=13 next_to_clean 12 (in hw 13) next_to_use 511 (in hw 15)
      enetc_clean_rx_ring: rx_frm_cnt=1 cleaned_cnt=14 next_to_clean 13 (in hw 14) next_to_use 511 (in hw 15)
      
      20 packets transmitted, 12 packets received, 40% packet loss
      
      And there it dies. No enetc_refill_rx_ring (because cleaned_cnt must be equal
      to 15 for that to happen), no nothing. The hardware enters the condition where
      the producer (14) + 1 is equal to the consumer (15) index, which makes it
      believe it has no more free buffers to put packets in, so it starts discarding
      them:
      
      ip netns exec ns0 ethtool -S eno0 | grep -v ': 0'
      NIC statistics:
           Rx ring  0 discarded frames: 8
      
      Summarized, if the interface receives between 16 and 32 (mod 512) frames
      and then there is a link flap, the port will eventually die with no
      way to recover. If it receives fewer than 16 (mod 512) frames, the
      initial NAPI poll (before the link flap) will not update the consumer
      index in hardware (it will remain zero), which is fine when the buffers
      are later reinitialized. If more than 32 (mod 512) frames are received,
      the initial NAPI poll has the chance to refill the ring twice, updating
      the consumer index to at least 32. So after the link flap, the consumer
      index is still wrong, but the post-flap NAPI poll gets a chance to
      refill the ring once (because it passes through cleaned_cnt=15), which
      brings the consumer index back in sync with next_to_use.
      
      The solution to this problem is actually simple: we just need to write
      next_to_use into the hardware consumer index at enetc_open time, which
      always brings it back in sync after an initial buffer seeding process.
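
A minimal userspace model of the failure and of the fix, as a sketch only: the struct, its fields and the helper names are hypothetical, the 512-entry ring size comes from the text above, and the software refill path is deliberately left out. It assumes, as described above, that the hardware stops accepting frames once producer + 1 equals the consumer index.

```c
#include <stdbool.h>

#define RING_SIZE 512 /* BDs per RX ring, as in the commit message */

/* Hypothetical model of one RX ring; field names are illustrative. */
struct rx_ring_model {
	int next_to_use; /* software: next BD to be refilled */
	int hw_pir;      /* hardware producer index: next BD it will fill */
	int hw_cir;      /* hardware consumer index, as last written by software */
};

/* BDs the hardware considers usable; it stops once pir + 1 == cir. */
static int hw_free_bds(const struct rx_ring_model *r)
{
	return (r->hw_cir - r->hw_pir - 1 + RING_SIZE) % RING_SIZE;
}

/* Re-seed the ring as an open would; sync_cir models the proposed fix. */
static void ring_open(struct rx_ring_model *r, bool sync_cir)
{
	r->hw_pir = 0;                  /* producer restarts from BD 0 */
	r->next_to_use = RING_SIZE - 1; /* all but one BD seeded with buffers */
	if (sync_cir)
		r->hw_cir = r->next_to_use; /* commit next_to_use to hardware */
	/* else: hw_cir keeps the stale value from before the link flap */
}

/* Hardware side: accept up to n frames, discarding the rest once full. */
static int hw_receive(struct rx_ring_model *r, int n)
{
	int accepted = 0;

	while (n-- > 0 && hw_free_bds(r) > 0) {
		r->hw_pir = (r->hw_pir + 1) % RING_SIZE;
		accepted++;
	}
	return accepted;
}
```

With a stale consumer index of 15 left over from before the flap, this model accepts 14 frames and then wedges with producer 14 and consumer 15, the same state as in the log above; writing next_to_use at open time avoids it.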
      
      The simpler thing would be to put the write to the consumer index into
      enetc_refill_rx_ring directly, but there are issues with the MDIO
      locking: in the NAPI poll code we have the enetc_lock_mdio() taken from
      top-level and we use the unlocked enetc_wr_reg_hot, whereas in
      enetc_open, the enetc_lock_mdio() is not taken at the top level, but
      instead by each individual enetc_wr_reg, so we are forced to put an
      additional enetc_wr_reg in enetc_setup_rxbdr. Better organization of
      the code is left as a refactoring exercise.
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: remove bogus write to SIRXIDR from enetc_setup_rxbdr · 96a5223b
      Vladimir Oltean committed
      The Station Interface Receive Interrupt Detect Register (SIRXIDR)
      contains a 16-bit wide mask of 'interrupt detected' events for each ring
      associated with a port. Bit i is write-1-to-clear for RX ring i.
      
      I have no explanation whatsoever how this line of code came to be
      inserted in the blamed commit. I checked the downstream versions of that
      patch and none of them have it.
      
      The somewhat comical aspect of it is that we're writing a plain number,
      derived from enetc_bd_unused(rx_ring), to the SIRXIDR register, which
      is a bitmask.
      Since the RX rings have 512 buffer descriptors, we end up writing 511 to
      this register, which is 0x1ff, so we are effectively clearing the
      'interrupt detected' event for rings 0-8.
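
To make the arithmetic concrete, a small sketch of write-1-to-clear semantics (the helper names are hypothetical, not enetc code): writing 511 = 0x1ff to a 16-bit W1C register clears bits 0 through 8, i.e. the flags of nine rings.

```c
#include <stdint.h>

/* Write-1-to-clear (W1C) model: every bit set in the written value clears
 * the corresponding status bit; zero bits leave the status untouched. */
static uint16_t w1c_write(uint16_t status, uint16_t written)
{
	return status & (uint16_t)~written;
}

/* Number of per-ring flags (one bit per ring) that a write targets. */
static int rings_cleared(uint16_t written)
{
	int n = 0;

	for (int i = 0; i < 16; i++)
		if (written & (1u << i))
			n++;
	return n;
}
```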
      
      This register is not what is used for interrupt handling though - it
      only provides a summary for the entire SI. The hardware provides one
      separate Interrupt Detect Register per RX ring, which auto-clears upon
      read. So there doesn't seem to be any adverse effect caused by this
      bogus write.
      
      There is, however, one reason why this should be handled as a bugfix:
      next_to_clean _should_ be committed to hardware, just not to that
      register, and this was obscuring the fact that it wasn't. This is fixed
      in the next patch, and removing the bogus line now allows the fix patch
      to be backported beyond that point.
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: force the RGMII speed and duplex instead of operating in inband mode · c76a9721
      Vladimir Oltean committed
      The ENETC port 0 MAC supports in-band status signaling coming from a PHY
      when operating in RGMII mode, and this feature is enabled by default.
      
      It has been reported that RGMII is broken in fixed-link, and that is not
      surprising, considering that in that case the MAC is attached not to a
      PHY but to a switch.
      
      This brings us to the topic of the patch: the enetc driver should not have
      enabled the optional in-band status signaling for RGMII unconditionally,
      but should have forced the speed and duplex to what was resolved by
      phylink.
      
      Note that phylink does not accept the RGMII modes as valid for in-band
      signaling, and these operate a bit differently than 1000base-x and SGMII
      (notably there is no clause 37 state machine so no ACK required from the
      MAC, instead the PHY sends extra code words on RXD[3:0] whenever it is
      not transmitting something else, so it should be safe to leave a PHY
      with this option unconditionally enabled even if we ignore it). The spec
      talks about this here:
      https://e2e.ti.com/cfs-file/__key/communityserver-discussions-components-files/138/RGMIIv1_5F00_3.pdf
      
      Fixes: 71b77a7a ("enetc: Migrate to PHYLINK and PCS_LYNX")
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: don't disable VLAN filtering in IFF_PROMISC mode · a74dbce9
      Vladimir Oltean committed
      Quoting from the blamed commit:
      
          In promiscuous mode, it is more intuitive that all traffic is received,
          including VLAN tagged traffic. It appears that it is necessary to set
          the flag in PSIPVMR for that to be the case, so VLAN promiscuous mode is
          also temporarily enabled. On exit from promiscuous mode, the setting
          made by ethtool is restored.
      
      Intuitive or not, there isn't any definition issued by a standards body
      which says that promiscuity has anything to do with VLAN filtering - it
      only has to do with accepting packets regardless of destination MAC address.
      
      In fact people are already trying to use this misunderstanding/bug of
      the enetc driver as a justification to transform promiscuity into
      something it never was about: accepting every packet (maybe that would
      be the "rx-all" netdev feature?):
      https://lore.kernel.org/netdev/20201110153958.ci5ekor3o2ekg3ky@ipetronik.com/
      
      This is relevant because there are use cases in the kernel (such as
      tc-flower rules with the protocol 802.1Q and a vlan_id key) which do not
      (yet) use the vlan_vid_add API to be compatible with VLAN-filtering NICs
      such as enetc, so for those, disabling rx-vlan-filter is currently the
      only right solution to make these setups work:
      https://lore.kernel.org/netdev/CA+h21hoxwRdhq4y+w8Kwgm74d4cA0xLeiHTrmT-VpSaM7obhkg@mail.gmail.com/
      
      The blamed patch has unintentionally introduced one more way for this to
      work, which is to enable IFF_PROMISC. However, this is non-portable,
      because port promiscuity is not meant to disable VLAN filtering.
      Therefore, it could invite people to write broken scripts for enetc, and
      then wonder why they are broken when migrating to other drivers that
      don't handle promiscuity in the same way.
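
As a sketch of the distinction being drawn (a hypothetical, unicast-only filter model that ignores broadcast and multicast; none of these names are enetc's): promiscuity relaxes only the destination MAC check, while the VLAN filter keeps applying to tagged traffic.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, unicast-only receive filter: promiscuity and VLAN filtering
 * are independent knobs. Broadcast and multicast are ignored here. */
struct rx_filter {
	bool promisc;           /* IFF_PROMISC: skip the destination MAC check */
	bool vlan_filtering;    /* rx-vlan-filter */
	uint8_t own_mac[6];
	bool vid_allowed[4096]; /* what vlan_vid_add() would populate */
};

static bool rx_accept(const struct rx_filter *f, const uint8_t dmac[6],
		      bool tagged, uint16_t vid)
{
	/* Promiscuity only relaxes the destination MAC address check... */
	bool mac_ok = f->promisc || memcmp(dmac, f->own_mac, 6) == 0;
	/* ...while VLAN filtering still applies to tagged frames. */
	bool vlan_ok = !f->vlan_filtering || !tagged ||
		       f->vid_allowed[vid & 0xfff];

	return mac_ok && vlan_ok;
}
```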
      
      Fixes: 7070eea5 ("enetc: permit configuration of rx-vlan-filter with ethtool")
      Cc: Markus Blöchl <Markus.Bloechl@ipetronik.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: fix incorrect TPID when receiving 802.1ad tagged packets · 827b6fd0
      Vladimir Oltean committed
      When the enetc ports have rx-vlan-offload enabled, they report a TPID of
      ETH_P_8021Q regardless of what was actually in the packet. When
      rx-vlan-offload is disabled, packets have the proper TPID. Fix this
      inconsistency by finishing the TODO left in the code.
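
A hedged sketch of the idea: derive the protocol from what the hardware reports instead of hardcoding ETH_P_8021Q. The selector encoding below (0 for 802.1Q, 1 for 802.1ad) is an assumption for illustration, not the enetc descriptor layout; only the two TPID values are standard.

```c
#include <stdint.h>

#define ETH_P_8021Q  0x8100 /* 802.1Q VLAN TPID, as in <linux/if_ether.h> */
#define ETH_P_8021AD 0x88A8 /* 802.1ad service VLAN TPID */

/* Hypothetical selector encoding reported by the RX path:
 * 0 = 802.1Q, 1 = 802.1ad. Unknown values fall back to 802.1Q. */
static uint16_t rx_vlan_proto(unsigned int tpid_sel)
{
	switch (tpid_sel) {
	case 1:
		return ETH_P_8021AD;
	case 0:
	default:
		return ETH_P_8021Q;
	}
}
```

In a driver, the result would be passed as htons(proto) to __vlan_hwaccel_put_tag() together with the VLAN TCI.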
      
      Fixes: d4fd0404 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: take the MDIO lock only once per NAPI poll cycle · 6d36ecdb
      Vladimir Oltean committed
      The workaround for the ENETC MDIO erratum caused a performance
      degradation of 82 Kpps (seen with IP forwarding of two 1Gbps streams of
      64B packets). This is due to excessive locking and unlocking in the fast
      path, which can be avoided.
      
      By taking the MDIO read-side lock only once per NAPI poll cycle, we are
      able to regain 54 Kpps (65%) of the performance hit. The rest of the
      performance degradation comes from the TX data path, but unfortunately
      it doesn't look like we can optimize that away easily: even with
      netdev_xmit_more(), there just isn't any skb batching done that would
      help with taking the MDIO lock less often than once per packet.
      
      We need to change the register accessor type for enetc_get_tx_tstamp,
      because it now runs under the enetc_lock_mdio as per the new call path
      detailed below:
      
      enetc_msix
      -> napi_schedule
         -> enetc_poll
            -> enetc_lock_mdio
            -> enetc_clean_tx_ring
               -> enetc_get_tx_tstamp
            -> enetc_clean_rx_ring
            -> enetc_unlock_mdio
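
The locking change can be sketched as follows, with a boolean standing in for the MDIO read-side lock and a counter of acquisitions (all names hypothetical; single-threaded model). The "hot" accessor asserts that the caller already holds the lock, which is the contract the unlocked accessors rely on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the MDIO read-side lock (single-threaded). */
static bool mdio_locked;
static unsigned long lock_ops; /* number of lock acquisitions */

static void lock_mdio(void)
{
	assert(!mdio_locked);
	mdio_locked = true;
	lock_ops++;
}

static void unlock_mdio(void)
{
	assert(mdio_locked);
	mdio_locked = false;
}

/* "Hot" accessor: takes no lock itself, so the caller must hold it. */
static void rd_reg_hot(void)
{
	assert(mdio_locked);
}

static void clean_tx_ring(int n) { for (int i = 0; i < n; i++) rd_reg_hot(); }
static void clean_rx_ring(int n) { for (int i = 0; i < n; i++) rd_reg_hot(); }

/* Before: every register access takes and drops the lock on its own. */
static void poll_locked_per_access(int tx_done, int rx_done)
{
	for (int i = 0; i < tx_done + rx_done; i++) {
		lock_mdio();
		rd_reg_hot();
		unlock_mdio();
	}
}

/* After: one lock/unlock pair brackets the whole poll cycle, mirroring
 * the enetc_poll call path shown above. */
static void napi_poll_model(int tx_done, int rx_done)
{
	lock_mdio();
	clean_tx_ring(tx_done); /* would include the TX timestamp read */
	clean_rx_ring(rx_done);
	unlock_mdio();
}
```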
      
      Fixes: fd5736bf ("enetc: Workaround for MDIO register access issue")
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: initialize RFS/RSS memories for unused ports too · 3222b5b6
      Vladimir Oltean committed
      Michael reports that since linux-next-20210211, the AER messages for ECC
      errors have started reappearing, and this time they can be reliably
      reproduced with the first ping on one of his LS1028A boards.
      
      $ ping 172.16.0.1
      [   33.258069] pcieport 0000:00:1f.0: AER: Multiple Corrected error received: 0000:00:00.0
      [   33.267050] pcieport 0000:00:1f.0: AER: can't find device of ID0000
      PING 172.16.0.1 (172.16.0.1): 56 data bytes
      64 bytes from 172.16.0.1: seq=0 ttl=64 time=17.124 ms
      64 bytes from 172.16.0.1: seq=1 ttl=64 time=0.273 ms
      
      $ devmem 0x1f8010e10 32
      0xC0000006
      
      It isn't clear why this is necessary, but it seems that for the errors
      to go away, we must clear the entire RFS and RSS memory, not just for
      the ports in use.
      
      Sadly the code is structured in such a way that we can't have unified
      logic for the used and unused ports. For the minimal initialization of
      an unused port, we just need to enable and ioremap the PF memory space,
      and set up a control buffer descriptor ring. Unused ports must then free
      the CBDR because the driver will exit, but used ports cannot pick up
      where that code path left off, since the CBDR API does not reinitialize a
      ring when setting it up, so its producer and consumer indices are out of
      sync between the software and hardware state. So a separate
      enetc_init_unused_port function was created, and it gets called right
      after the PF memory space is enabled.
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Reported-by: Michael Walle <michael@walle.cc>
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Tested-by: Michael Walle <michael@walle.cc>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: enetc: don't overwrite the RSS indirection table when initializing · c646d10d
      Vladimir Oltean committed
      After the blamed patch, all RX traffic gets hashed to CPU 0 because the
      hashing indirection table set up in:
      
      enetc_pf_probe
      -> enetc_alloc_si_resources
         -> enetc_configure_si
            -> enetc_setup_default_rss_table
      
      is overwritten later in:
      
      enetc_pf_probe
      -> enetc_init_port_rss_memory
      
      which zero-initializes the entire port RSS table in order to avoid ECC errors.
      
      The real trouble is that enetc_init_port_rss_memory needs
      enetc_alloc_si_resources to be called first, because it depends upon
      enetc_alloc_cbdr and enetc_setup_cbdr. But that whole enetc_configure_si
      thing could have been better thought out: it has no business being in a
      function called "alloc_si_resources", especially since its counterpart,
      "free_si_resources", does nothing to unwind the configuration of the SI.
      
      The point is, we need to pull enetc_configure_si out of
      enetc_alloc_resources, and move it after enetc_init_port_rss_memory.
      This allows us to set up the default RSS indirection table after
      initializing the memory.
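
The ordering problem, reduced to a sketch: the round-robin default table and the table size are assumptions for illustration, and the helper names only loosely mirror the functions named above.

```c
#include <stdbool.h>
#include <string.h>

#define RSS_TABLE_SIZE 64 /* hypothetical table size, for illustration */

/* Hypothetical default table: spread hash buckets round-robin over rings. */
static void setup_default_rss_table(int *table, int num_rings)
{
	for (int i = 0; i < RSS_TABLE_SIZE; i++)
		table[i] = i % num_rings;
}

/* Models zero-initializing the whole RSS memory to avoid ECC errors. */
static void init_rss_memory(int *table)
{
	memset(table, 0, RSS_TABLE_SIZE * sizeof(*table));
}

/* If the memory init runs last, it wipes the default table and every
 * bucket points at ring 0; running it first keeps the spread. */
static void probe(int *table, int num_rings, bool configure_after_init)
{
	if (configure_after_init) {
		init_rss_memory(table);
		setup_default_rss_table(table, num_rings);
	} else {
		setup_default_rss_table(table, num_rings);
		init_rss_memory(table);
	}
}
```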
      
      Fixes: 07bf34a5 ("net: enetc: initialize the RFS and RSS memories")
      Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inetpeer: use div64_ul() and clamp_val() calculate inet_peer_threshold · 8bd2a055
      Yejune Deng committed
      In inet_initpeers(), struct inet_peer on IA32 nowadays uses 128 bytes.
      Get rid of the cascade and use div64_ul() and clamp_val() to calculate
      a threshold that will not need to be adjusted in the future, as
      suggested by Eric Dumazet.
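
A userspace sketch of the "fraction of memory, then clamp" approach (1% in this sketch); plain 64-bit division stands in for div64_ul(), a helper stands in for clamp_val(), and the bounds are parameters rather than the values used in the kernel.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's clamp_val(). */
static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Budget 1% of RAM for entries of entry_size bytes, then clamp; a single
 * formula replaces a cascade of per-memory-size cases. lo/hi are
 * illustrative parameters. */
static uint64_t peer_threshold(uint64_t ram_bytes, uint64_t entry_size,
			       uint64_t lo, uint64_t hi)
{
	uint64_t nr_entries = ram_bytes / (100 * entry_size); /* cf. div64_ul() */

	return clamp_u64(nr_entries, lo, hi);
}
```

For example, with 128-byte entries (the IA32 size quoted above) and illustrative bounds of 4096 and 65664, 512 MiB of RAM gives 41943 entries, while 32 MiB and 8 GiB are clamped to the lower and upper bound respectively.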
      
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Yejune Deng <yejune.deng@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/qrtr: fix __netdev_alloc_skb call · 093b036a
      Pavel Skripkin committed
      syzbot found a WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER.
      It was caused by a huge length value passed from userspace to qrtr_tun_write_iter(),
      which tries to allocate an skb. Since the value comes from an untrusted source,
      there is no need to raise a warning in __alloc_pages_nodemask().
      
      [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014
      Call Trace:
       __alloc_pages include/linux/gfp.h:511 [inline]
       __alloc_pages_node include/linux/gfp.h:524 [inline]
       alloc_pages_node include/linux/gfp.h:538 [inline]
       kmalloc_large_node+0x60/0x110 mm/slub.c:3999
       __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496
       __kmalloc_reserve net/core/skbuff.c:150 [inline]
       __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210
       __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446
       netdev_alloc_skb include/linux/skbuff.h:2832 [inline]
       qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442
       qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98
       call_write_iter include/linux/fs.h:1901 [inline]
       new_sync_write+0x426/0x650 fs/read_write.c:518
       vfs_write+0x791/0xa30 fs/read_write.c:605
       ksys_write+0x12d/0x250 fs/read_write.c:658
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com
      Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
      Acked-by: Alexander Lobakin <alobakin@pm.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sh_eth-masks' · 5db4f74e
      David S. Miller committed
      Sergey Shtylyov says:
      
      ====================
      Fix TRSCER masks in the Ether driver
      
      Here are 3 patches against DaveM's 'net' repo. I'm fixing the TRSCER masks in
      the driver to match the manuals...
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: fix TRSCER mask for R7S9210 · 165bc5a4
      Sergey Shtylyov committed
      According to the RZ/A2M Group User's Manual: Hardware, Rev. 2.00,
      the TRSCER register has bit 9 reserved, hence we can't use the driver's
      default TRSCER mask.  Add the explicit initializer for sh_eth_cpu_data::
      trscer_err_mask for R7S9210.
      
      Fixes: 6e0bb04d ("sh_eth: Add R7S9210 support")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: fix TRSCER mask for R7S72100 · 75be7fb7
      Sergey Shtylyov committed
      According to the RZ/A1H Group, RZ/A1M Group User's Manual: Hardware,
      Rev. 4.00, the TRSCER register has bit 9 reserved, hence we can't use
      the driver's default TRSCER mask.  Add the explicit initializer for
      sh_eth_cpu_data::trscer_err_mask for R7S72100.
      
      Fixes: db893473 ("sh_eth: Add support for r7s72100")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: fix TRSCER mask for SH771x · 8c91bc3d
      Sergey Shtylyov committed
      According to the SH7710, SH7712, SH7713 Group User's Manual: Hardware,
      Rev. 3.00, the TRSCER register actually has only bit 7 valid (and named
      differently), with all the other bits reserved. Apparently, this was not
      the case with some early revisions of the manual, as we have the other
      bits declared (and set) in the original driver.  Follow suit and add
      the explicit sh_eth_cpu_data::trscer_err_mask initializer for SH771x...
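
The pattern these three patches share can be sketched as an explicit per-SoC mask with a fallback to a driver-wide default. The default value and the zero-means-default convention here are assumptions for illustration; only the reserved and valid bits come from the manuals quoted above.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* Hypothetical driver-wide default; the real sh_eth value is not reproduced. */
#define DEFAULT_TRSCER_ERR_MASK 0x0000039fu

/* Cut-down, hypothetical stand-in for struct sh_eth_cpu_data. */
struct cpu_data {
	const char *name;
	uint32_t trscer_err_mask; /* 0 means "use the driver default" */
};

static uint32_t trscer_mask(const struct cpu_data *cd)
{
	return cd->trscer_err_mask ? cd->trscer_err_mask : DEFAULT_TRSCER_ERR_MASK;
}

/* Explicit per-SoC masks following the manuals: bit 9 reserved on the
 * RZ/A parts, only bit 7 valid on SH771x. */
static const struct cpu_data r7s9210  = { "R7S9210",  DEFAULT_TRSCER_ERR_MASK & ~BIT(9) };
static const struct cpu_data r7s72100 = { "R7S72100", DEFAULT_TRSCER_ERR_MASK & ~BIT(9) };
static const struct cpu_data sh771x   = { "SH771x",   BIT(7) };
static const struct cpu_data other    = { "other",    0 };
```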
      
      Fixes: 86a74ff2 ("net: sh_eth: add support for Renesas SuperH Ethernet")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omprussia.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: lanai: dont run lanai_dev_close if not open · a2bd4583
      Tong Zhang committed
      lanai_dev_open() can fail. When it fails, lanai->base is unmapped and the
      PCI device is disabled. The caller, lanai_init_one(), then tries to run
      atm_dev_deregister(), which subsequently calls lanai_dev_close() and
      uses the already released MMIO area.
      
      To fix this issue, set lanai->base to NULL if open fails,
      and test it in lanai_dev_close().
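
A userspace sketch of the fix, with calloc()/free() standing in for ioremap()/iounmap() and a cut-down, hypothetical struct: the failed open leaves base set to NULL, and close treats NULL as "nothing to tear down".

```c
#include <stdlib.h>

/* Cut-down, hypothetical stand-in for the lanai device structure. */
struct lanai_dev {
	unsigned int *base; /* models the ioremap()ed register window */
};

/* Models lanai_dev_open(); fail_start simulates lanai_start() failing. */
static int lanai_dev_open(struct lanai_dev *lanai, int fail_start)
{
	lanai->base = calloc(64, sizeof(*lanai->base)); /* "map" the registers */
	if (!lanai->base)
		return -1;
	if (fail_start) {
		free(lanai->base);  /* unmapped on failure... */
		lanai->base = NULL; /* ...and, with the fix, marked as such */
		return -1;
	}
	return 0;
}

/* Still reached via atm_dev_deregister() even if open failed. */
static void lanai_dev_close(struct lanai_dev *lanai)
{
	if (!lanai->base) /* open failed: nothing is mapped */
		return;
	lanai->base[0] = 0; /* stands in for a register write */
	free(lanai->base);
	lanai->base = NULL;
}
```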
      
      [    8.324153] lanai: lanai_start() failed, err=19
      [    8.324819] lanai(itf 0): shutting down interface
      [    8.325211] BUG: unable to handle page fault for address: ffffc90000180024
      [    8.325781] #PF: supervisor write access in kernel mode
      [    8.326215] #PF: error_code(0x0002) - not-present page
      [    8.326641] PGD 100000067 P4D 100000067 PUD 100139067 PMD 10013a067 PTE 0
      [    8.327206] Oops: 0002 [#1] SMP KASAN NOPTI
      [    8.327557] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7-00090-gdcc0b490 #12
      [    8.328229] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd9c812dda519-4
      [    8.329145] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai]
      [    8.329587] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80
      [    8.330917] RSP: 0018:ffff8881029ef680 EFLAGS: 00010246
      [    8.331196] RAX: 000000000003fffe RBX: ffff888102fb4800 RCX: ffffffffc001a98a
      [    8.331572] RDX: ffffc90000180000 RSI: 0000000000000246 RDI: ffff888102fb4000
      [    8.331948] RBP: ffff888102fb4000 R08: ffffffff8115da8a R09: ffffed102053deaa
      [    8.332326] R10: 0000000000000003 R11: ffffed102053dea9 R12: ffff888102fb48a4
      [    8.332701] R13: ffffffffc00123c0 R14: ffff888102fb4b90 R15: ffff888102fb4b88
      [    8.333077] FS:  00007f08eb9056a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000
      [    8.333502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    8.333806] CR2: ffffc90000180024 CR3: 0000000102a28000 CR4: 00000000000006f0
      [    8.334182] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    8.334557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    8.334932] Call Trace:
      [    8.335066]  atm_dev_deregister+0x161/0x1a0 [atm]
      [    8.335324]  lanai_init_one.cold+0x20c/0x96d [lanai]
      [    8.335594]  ? lanai_send+0x2a0/0x2a0 [lanai]
      [    8.335831]  local_pci_probe+0x6f/0xb0
      [    8.336039]  pci_device_probe+0x171/0x240
      [    8.336255]  ? pci_device_remove+0xe0/0xe0
      [    8.336475]  ? kernfs_create_link+0xb6/0x110
      [    8.336704]  ? sysfs_do_create_link_sd.isra.0+0x76/0xe0
      [    8.336983]  really_probe+0x161/0x420
      [    8.337181]  driver_probe_device+0x6d/0xd0
      [    8.337401]  device_driver_attach+0x82/0x90
      [    8.337626]  ? device_driver_attach+0x90/0x90
      [    8.337859]  __driver_attach+0x60/0x100
      [    8.338065]  ? device_driver_attach+0x90/0x90
      [    8.338298]  bus_for_each_dev+0xe1/0x140
      [    8.338511]  ? subsys_dev_iter_exit+0x10/0x10
      [    8.338745]  ? klist_node_init+0x61/0x80
      [    8.338956]  bus_add_driver+0x254/0x2a0
      [    8.339164]  driver_register+0xd3/0x150
      [    8.339370]  ? 0xffffffffc0028000
      [    8.339550]  do_one_initcall+0x84/0x250
      [    8.339755]  ? trace_event_raw_event_initcall_finish+0x150/0x150
      [    8.340076]  ? free_vmap_area_noflush+0x1a5/0x5c0
      [    8.340329]  ? unpoison_range+0xf/0x30
      [    8.340532]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
      [    8.340806]  ? unpoison_range+0xf/0x30
      [    8.341014]  ? unpoison_range+0xf/0x30
      [    8.341217]  do_init_module+0xf8/0x350
      [    8.341419]  load_module+0x3fe6/0x4340
      [    8.341621]  ? vm_unmap_ram+0x1d0/0x1d0
      [    8.341826]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
      [    8.342101]  ? module_frob_arch_sections+0x20/0x20
      [    8.342358]  ? __do_sys_finit_module+0x108/0x170
      [    8.342604]  __do_sys_finit_module+0x108/0x170
      [    8.342841]  ? __ia32_sys_init_module+0x40/0x40
      [    8.343083]  ? file_open_root+0x200/0x200
      [    8.343298]  ? do_sys_open+0x85/0xe0
      [    8.343491]  ? filp_open+0x50/0x50
      [    8.343675]  ? exit_to_user_mode_prepare+0xfc/0x130
      [    8.343935]  do_syscall_64+0x33/0x40
      [    8.344132]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [    8.344401] RIP: 0033:0x7f08eb887cf7
      [    8.344594] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f7 48 89 d6 41
      [    8.345565] RSP: 002b:00007ffcd5c98ad8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [    8.345962] RAX: ffffffffffffffda RBX: 00000000008fea70 RCX: 00007f08eb887cf7
      [    8.346336] RDX: 0000000000000000 RSI: 00000000008fd9e0 RDI: 0000000000000003
      [    8.346711] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
      [    8.347085] R10: 00007f08eb8eb300 R11: 0000000000000246 R12: 00000000008fd9e0
      [    8.347460] R13: 0000000000000000 R14: 00000000008fddd0 R15: 0000000000000001
      [    8.347836] Modules linked in: lanai(+) atm
      [    8.348065] CR2: ffffc90000180024
      [    8.348244] ---[ end trace 7fdc1c668f2003e5 ]---
      [    8.348490] RIP: 0010:lanai_dev_close+0x4f/0xe5 [lanai]
      [    8.348772] Code: 00 48 c7 c7 00 d3 01 c0 e8 49 4e 0a c2 48 8d bd 08 02 00 00 e8 6e 52 14 c1 48 80
      [    8.349745] RSP: 0018:ffff8881029ef680 EFLAGS: 00010246
      [    8.350022] RAX: 000000000003fffe RBX: ffff888102fb4800 RCX: ffffffffc001a98a
      [    8.350397] RDX: ffffc90000180000 RSI: 0000000000000246 RDI: ffff888102fb4000
      [    8.350772] RBP: ffff888102fb4000 R08: ffffffff8115da8a R09: ffffed102053deaa
      [    8.351151] R10: 0000000000000003 R11: ffffed102053dea9 R12: ffff888102fb48a4
      [    8.351525] R13: ffffffffc00123c0 R14: ffff888102fb4b90 R15: ffff888102fb4b88
      [    8.351918] FS:  00007f08eb9056a0(0000) GS:ffff88815b400000(0000) knlGS:0000000000000000
      [    8.352343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    8.352647] CR2: ffffc90000180024 CR3: 0000000102a28000 CR4: 00000000000006f0
      [    8.353022] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    8.353397] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    8.353958] modprobe (95) used greatest stack depth: 26216 bytes left
      
      Signed-off-by: Tong Zhang <ztong0001@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • atm: eni: dont release is never initialized · 4deb550b
      Tong Zhang committed
      The label err_eni_release is reachable when eni_start() fails.
      eni_start() calls dev->phy->start() as its last step. If start() fails,
      we don't need to call phy->stop(), and if start() is never called, we
      don't need to call phy->stop() either. Calling it anyway results in a
      null-ptr-deref.
      
      In order to fix this issue, don't call phy->stop() in label err_eni_release.
      
      [    4.875714] ==================================================================
      [    4.876091] BUG: KASAN: null-ptr-deref in suni_stop+0x47/0x100 [suni]
      [    4.876433] Read of size 8 at addr 0000000000000030 by task modprobe/95
      [    4.876778]
      [    4.876862] CPU: 0 PID: 95 Comm: modprobe Not tainted 5.11.0-rc7-00090-gdcc0b490 #2
      [    4.877290] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-48-gd94
      [    4.877876] Call Trace:
      [    4.878009]  dump_stack+0x7d/0xa3
      [    4.878191]  kasan_report.cold+0x10c/0x10e
      [    4.878410]  ? __slab_free+0x2f0/0x340
      [    4.878612]  ? suni_stop+0x47/0x100 [suni]
      [    4.878832]  suni_stop+0x47/0x100 [suni]
      [    4.879043]  eni_do_release+0x3b/0x70 [eni]
      [    4.879269]  eni_init_one.cold+0x1152/0x1747 [eni]
      [    4.879528]  ? _raw_spin_lock_irqsave+0x7b/0xd0
      [    4.879768]  ? eni_ioctl+0x270/0x270 [eni]
      [    4.879990]  ? __mutex_lock_slowpath+0x10/0x10
      [    4.880226]  ? eni_ioctl+0x270/0x270 [eni]
      [    4.880448]  local_pci_probe+0x6f/0xb0
      [    4.880650]  pci_device_probe+0x171/0x240
      [    4.880864]  ? pci_device_remove+0xe0/0xe0
      [    4.881086]  ? kernfs_create_link+0xb6/0x110
      [    4.881315]  ? sysfs_do_create_link_sd.isra.0+0x76/0xe0
      [    4.881594]  really_probe+0x161/0x420
      [    4.881791]  driver_probe_device+0x6d/0xd0
      [    4.882010]  device_driver_attach+0x82/0x90
      [    4.882233]  ? device_driver_attach+0x90/0x90
      [    4.882465]  __driver_attach+0x60/0x100
      [    4.882671]  ? device_driver_attach+0x90/0x90
      [    4.882903]  bus_for_each_dev+0xe1/0x140
      [    4.883114]  ? subsys_dev_iter_exit+0x10/0x10
      [    4.883346]  ? klist_node_init+0x61/0x80
      [    4.883557]  bus_add_driver+0x254/0x2a0
      [    4.883764]  driver_register+0xd3/0x150
      [    4.883971]  ? 0xffffffffc0038000
      [    4.884149]  do_one_initcall+0x84/0x250
      [    4.884355]  ? trace_event_raw_event_initcall_finish+0x150/0x150
      [    4.884674]  ? unpoison_range+0xf/0x30
      [    4.884875]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
      [    4.885150]  ? unpoison_range+0xf/0x30
      [    4.885352]  ? unpoison_range+0xf/0x30
      [    4.885557]  do_init_module+0xf8/0x350
      [    4.885760]  load_module+0x3fe6/0x4340
      [    4.885960]  ? vm_unmap_ram+0x1d0/0x1d0
      [    4.886166]  ? ____kasan_kmalloc.constprop.0+0x84/0xa0
      [    4.886441]  ? module_frob_arch_sections+0x20/0x20
      [    4.886697]  ? __do_sys_finit_module+0x108/0x170
      [    4.886941]  __do_sys_finit_module+0x108/0x170
      [    4.887178]  ? __ia32_sys_init_module+0x40/0x40
      [    4.887419]  ? file_open_root+0x200/0x200
      [    4.887634]  ? do_sys_open+0x85/0xe0
      [    4.887826]  ? filp_open+0x50/0x50
      [    4.888009]  ? fpregs_assert_state_consistent+0x4d/0x60
      [    4.888287]  ? exit_to_user_mode_prepare+0x2f/0x130
      [    4.888547]  do_syscall_64+0x33/0x40
      [    4.888739]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [    4.889010] RIP: 0033:0x7ff62fcf1cf7
      [    4.889202] Code: 48 89 57 30 48 8b 04 24 48 89 47 38 e9 1d a0 02 00 48 89 f8 48 89 f71
      [    4.890172] RSP: 002b:00007ffe6644ade8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      [    4.890570] RAX: ffffffffffffffda RBX: 0000000000f2ca70 RCX: 00007ff62fcf1cf7
      [    4.890944] RDX: 0000000000000000 RSI: 0000000000f2b9e0 RDI: 0000000000000003
      [    4.891318] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000001
      [    4.891691] R10: 00007ff62fd55300 R11: 0000000000000246 R12: 0000000000f2b9e0
      [    4.892064] R13: 0000000000000000 R14: 0000000000f2bdd0 R15: 0000000000000001
      [    4.892439] ==================================================================
      
      Signed-off-by: Tong Zhang <ztong0001@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: fix save wrong speed and duplex problem if autoneg is on · d9032dba
      Guangbin Huang committed
      If the PHY uses the generic driver and autoneg is on, the command
      "ethtool -s eth0 speed 50" does not actually change the PHY speed, but
      "ethtool eth0" then shows the speed as 50Mb/s, because phydev->speed
      has been set to 50 and is never updated afterwards.
      
      The duplex setting has the same problem.
      
      However, if autoneg is on, the PHY only changes speed and duplex according
      to phydev->advertising, not phydev->speed and phydev->duplex. So in this
      case, phydev->speed and phydev->duplex don't need to be set in
      phy_ethtool_ksettings_set().
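
The rule reduces to something like the following sketch (a cut-down, hypothetical struct; the real phy_ethtool_ksettings_set() also updates the advertising mask and restarts negotiation, which is omitted here). The AUTONEG_* and DUPLEX_* values follow the ethtool UAPI header.

```c
/* Values as in the ethtool UAPI header (AUTONEG_*, DUPLEX_*). */
enum { AUTONEG_DISABLE = 0, AUTONEG_ENABLE = 1 };
enum { DUPLEX_HALF = 0, DUPLEX_FULL = 1 };

/* Cut-down, hypothetical stand-in for struct phy_device. */
struct phy_state {
	int autoneg;
	int speed;  /* Mb/s */
	int duplex;
};

/* Store the requested speed/duplex only when they will actually be forced;
 * with autoneg on, the link result comes from the advertised modes. */
static void ksettings_set(struct phy_state *phydev, int autoneg, int speed,
			  int duplex)
{
	phydev->autoneg = autoneg;
	if (autoneg == AUTONEG_DISABLE) {
		phydev->speed = speed;
		phydev->duplex = duplex;
	}
}
```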
      
      Fixes: 51e2a384 ("PHY: Avoid unnecessary aneg restarts")
      Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: always use icmp{,v6}_ndo_send from ndo_start_xmit · 4372339e
      Jason A. Donenfeld committed
      There were a few remaining tunnel drivers that didn't receive the prior
      conversion to icmp{,v6}_ndo_send. Knowing now that this could lead to
      memory corruption (see ee576c47 ("net: icmp: pass zeroed opts from
      icmp{,v6}_ndo_send before sending") for details), it is even more
      imperative to have them all converted. So this commit goes through the
      remaining cases that I could find and does a boring translation to the
      ndo variety.
      
      The Fixes: line below is the merge that originally added icmp{,v6}_
      ndo_send and converted the first batch of icmp{,v6}_send users. The
      rationale then for the change applies equally to this patch. It's just
      that these drivers were left out of the initial conversion because these
      network devices are hiding in net/ rather than in drivers/net/.
      
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: David Ahern <dsahern@kernel.org>
      Cc: Jakub Kicinski <kuba@kernel.org>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Fixes: 803381f9 ("Merge branch 'icmp-account-for-NAT-when-sending-icmps-from-ndo-layer'")
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: tag_rtl4_a: fix egress tags · 9eb8bc59
      DENG Qingfang committed
      Commit 86dd9868 has several issues, but was accepted too soon
      before anyone could take a look.
      
      - Double free. dsa_slave_xmit() will free the skb if the xmit function
        returns NULL, but the skb is already freed by eth_skb_pad(). Use
        __skb_put_padto() to avoid that.
      - Unnecessary allocation. It has been done by DSA core since commit
        a3b0b647.
      - A u16 pointer points to skb data. It should be __be16 for network
        byte order.
      - Typo in comments. "numer" -> "number".
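
For the byte-order point, a userspace sketch of storing a 16-bit field into packet data in network byte order with htons()/ntohs(); in the kernel, the same intent is expressed by the __be16 type. The helper names and the example value are illustrative only.

```c
#include <arpa/inet.h> /* htons(), ntohs() */
#include <stdint.h>
#include <string.h>

/* Store a host-order 16-bit value into packet bytes in network
 * (big-endian) order; memcpy avoids unaligned pointer access. */
static void put_be16(uint8_t *p, uint16_t host_val)
{
	uint16_t be = htons(host_val);

	memcpy(p, &be, sizeof(be));
}

/* Read a network-order 16-bit field back into host order. */
static uint16_t get_be16(const uint8_t *p)
{
	uint16_t be;

	memcpy(&be, p, sizeof(be));
	return ntohs(be);
}
```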
      
      Fixes: 86dd9868 ("net: dsa: tag_rtl4_a: Support also egress tags")
      Signed-off-by: DENG Qingfang <dqfext@gmail.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xen-netback: use local var in xenvif_tx_check_gop() instead of re-calculating · 826d8217
      Jan Beulich committed
      shinfo already holds the result of skb_shinfo(skb) at this point, so
      there is no need to re-invoke the construct, let alone twice.
      
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: ti: take into account all possible interrupt sources · 73f476aa
      Ioana Ciornei committed
      The previous implementation of .handle_interrupt() did not take into
      account the fact that all the interrupt status registers should be
      acknowledged since multiple interrupt sources could be asserted.
      
      Fix this by reading all the status registers before exiting with
      IRQ_NONE or triggering the PHY state machine.
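
The handling described above, as a sketch with hypothetical clear-on-read status registers (not the TI PHY register map): every register is read, and therefore acknowledged, before the handler decides whether to return IRQ_NONE.

```c
#include <stdbool.h>

enum { IRQ_NONE = 0, IRQ_HANDLED = 1 }; /* same values as the kernel's irqreturn_t */

#define NUM_STATUS_REGS 3

/* Hypothetical PHY with several clear-on-read interrupt status registers. */
struct fake_phy {
	unsigned int status[NUM_STATUS_REGS];
	unsigned int enabled[NUM_STATUS_REGS];
};

static unsigned int read_and_clear(struct fake_phy *phy, int i)
{
	unsigned int v = phy->status[i];

	phy->status[i] = 0; /* reading acknowledges the pending sources */
	return v;
}

/* Read (and so acknowledge) every status register before deciding,
 * rather than stopping after the first one. */
static int handle_interrupt(struct fake_phy *phy, bool *trigger_state_machine)
{
	bool any = false;

	for (int i = 0; i < NUM_STATUS_REGS; i++) {
		if (read_and_clear(phy, i) & phy->enabled[i])
			any = true;
	}

	*trigger_state_machine = any;
	return any ? IRQ_HANDLED : IRQ_NONE;
}
```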
      
      Fixes: 1d1ae3c6 ("net: phy: ti: implement generic .handle_interrupt() callback")
      Reported-by: Sven Schuchmann <schuchmann@schleissheimer.de>
      Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
      Link: https://lore.kernel.org/r/20210226153020.867852-1-ciorneiioana@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>