1. 05 May, 2022 3 commits
  2. 04 May, 2022 22 commits
    • Merge tag 'mlx5-fixes-2022-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · ad0724b9
      David S. Miller committed
      
      Saeed Mahameed says:
      
      ====================
      mlx5 fixes 2022-05-03
      
      This series provides bug fixes to mlx5 driver.
      Please pull and let me know if there is any problem.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx5: Fix matching on inner TTC · a042d7f5
      Mark Bloch committed
      The cited commits didn't use proper matching on inner TTC; as a
      result, distribution of encapsulated packets wasn't symmetric
      between the physical ports.
      
      Fixes: 4c71ce50 ("net/mlx5: Support partial TTC rules")
      Fixes: 8e25a2bc ("net/mlx5: Lag, add support to create TTC tables for LAG port selection")
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Avoid double clear or set of sync reset requested · fc3d3db0
      Moshe Shemesh committed
      A double clear of the reset requested state can lead to a NULL pointer
      dereference, as it tries to delete the timer twice. This can happen,
      for example, on a race between an abort from FW and a PCI error or
      reset. Avoid this by using test_and_clear_bit() to ensure the reset
      requested state is cleared only once. Similarly, use test_and_set_bit()
      to ensure the reset requested state is set only once.
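As a rough illustration of the guard described above, here is a minimal user-space sketch in Python, not the driver code. The class `ResetState` and its fields are hypothetical names, and a lock stands in for the atomicity that test_and_set_bit() and test_and_clear_bit() provide in the kernel.

```python
import threading


class ResetState:
    """Toy model of a flag guarded like test_and_set_bit()/test_and_clear_bit()."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for the atomic bit operation
        self.reset_requested = False
        self.timer_deletions = 0       # how many times the "timer" was torn down

    def test_and_set(self):
        """Set the flag and return its previous value."""
        with self._lock:
            old = self.reset_requested
            self.reset_requested = True
            return old

    def test_and_clear(self):
        """Clear the flag and return its previous value."""
        with self._lock:
            old = self.reset_requested
            self.reset_requested = False
            return old

    def clear_reset_requested(self):
        # Only the caller that actually flipped the flag tears down the timer,
        # so a second, racing caller returns without touching it.
        if not self.test_and_clear():
            return False
        self.timer_deletions += 1
        return True
```

With this shape, two racing "clear" paths cannot both reach the timer teardown, which is the property the commit message relies on.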
      
      Fixes: 7dd6df32 ("net/mlx5: Handle sync reset abort event")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix deadlock in sync reset flow · cb7786a7
      Moshe Shemesh committed
      The sync reset flow can deadlock when poll_sync_reset() is called from
      the timer softirq and then waits in del_timer_sync() on the same
      timer. Fix that by moving the part of the flow that waits for the
      timer to reset_reload_work.
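The shape of this deadlock can be imitated in user space. The following Python sketch is only an analogy, assuming a `threading.Timer` in place of the kernel timer and a queue in place of the work item; `run_demo`, `poll_buggy` and `poll_fixed` are hypothetical names. Python refuses to let a thread wait for itself and raises `RuntimeError`, whereas the kernel would simply hang.

```python
import queue
import threading


def run_demo():
    results = {}
    work = queue.Queue()  # stands in for the deferred work item

    def poll_buggy():
        # Waiting for the timer from inside its own callback can never finish.
        try:
            buggy_timer.join()
        except RuntimeError as exc:
            results["buggy"] = str(exc)

    def poll_fixed():
        # Defer the wait to another context instead of doing it here.
        work.put(fixed_timer)

    buggy_timer = threading.Timer(0.01, poll_buggy)
    fixed_timer = threading.Timer(0.01, poll_fixed)
    buggy_timer.start()
    fixed_timer.start()

    timer = work.get(timeout=5)  # the "work handler" picks up the request
    timer.join(timeout=5)        # safe: a different thread waits for the timer
    results["fixed"] = not timer.is_alive()
    buggy_timer.join(timeout=5)
    return results
```

The fixed path mirrors the idea in the commit message: the callback only requests the wait, and a separate context performs it.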
      
      This fixes the following kernel trace:
      RIP: 0010:del_timer_sync+0x32/0x40
      ...
      Call Trace:
       <IRQ>
       mlx5_sync_reset_clear_reset_requested+0x26/0x50 [mlx5_core]
       poll_sync_reset.cold+0x36/0x52 [mlx5_core]
       call_timer_fn+0x32/0x130
       __run_timers.part.0+0x180/0x280
       ? tick_sched_handle+0x33/0x60
       ? tick_sched_timer+0x3d/0x80
       ? ktime_get+0x3e/0xa0
       run_timer_softirq+0x2a/0x50
       __do_softirq+0xe1/0x2d6
       ? hrtimer_interrupt+0x136/0x220
       irq_exit+0xae/0xb0
       smp_apic_timer_interrupt+0x7b/0x140
       apic_timer_interrupt+0xf/0x20
       </IRQ>
      
      Fixes: 3c5193a8 ("net/mlx5: Use del_timer_sync in fw reset flow of halting poll")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix trust state reset in reload · b781bff8
      Moshe Tal committed
      Setting dscp2prio during driver reload can leave the dcb ieee app list
      non-empty after the reload finishes, which results in a conflict between
      the priority trust state reported by the app and the state in the device
      register.

      Reset the dcb ieee app list on initialization if it conflicts with the
      register status.
      
      Fixes: 2a5e7a13 ("net/mlx5e: Add dcbnl dscp to priority support")
      Signed-off-by: Moshe Tal <moshet@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Avoid checking offload capability in post_parse action · 0e322efd
      Ariel Levkovich committed
      During TC action parsing, the can_offload callback is called
      before calling the action's main parsing callback.
      
      Later on, the can_offload callback is called again before handling
      the action's post_parse callback, if it exists.
      
      Since the main parsing callback might have changed and set parsing
      params for the rule, following can_offload checks might fail because
      some parsing params were already set.
      
      Specifically, the ct action main parsing sets the ct param in the
      parsing status structure and when the second can_offload for ct action
      is called, before handling the ct post parsing, it will return an error
      since it checks this ct param to indicate multiple ct actions which are
      not supported.
      
      Therefore, the can_offload call is removed from the post parsing
      handling to prevent such cases.
      This is allowed since the first can_offload call will ensure that the
      action can be offloaded and the fact the code reached the post parsing
      handling already means that the action can be offloaded.
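The double check described above can be modelled in a few lines of Python. This is a simplified sketch under assumed names (`ParseState`, `ct_can_offload`, `ct_parse`, `parse_rule` are hypothetical), not the driver's parsing code; it only shows why repeating the check after the main parse rejects a valid single ct action.

```python
class ParseState:
    def __init__(self):
        self.ct = False  # set once a ct action has been parsed


def ct_can_offload(state):
    # Multiple ct actions are not supported, so an already-set flag is an error.
    return not state.ct


def ct_parse(state):
    state.ct = True


def parse_rule(state, recheck_before_post_parse):
    if not ct_can_offload(state):
        return "rejected in parse"
    ct_parse(state)
    # Old flow: can_offload was called again here and saw the flag that the
    # main parse of this very action had just set.
    if recheck_before_post_parse and not ct_can_offload(state):
        return "rejected in post_parse"
    return "offloaded"
```

Calling `parse_rule(ParseState(), True)` reproduces the spurious rejection, while passing `False` models the flow after the second check is dropped.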
      
      Fixes: 8300f225 ("net/mlx5e: Create new flow attr for multi table actions")
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: CT: Fix queued up restore put() executing after relevant ft release · b069e14f
      Paul Blakey committed
      __mlx5_tc_ct_entry_put() queues the release of a tuple related to some
      ct FT. If that is the last reference to that tuple, the actual deletion
      of the tuple can happen after the FT is already destroyed and freed.
      
      Flush the used workqueue before destroying the ct FT.
      
      Fixes: a2173131 ("net/mlx5e: CT: manage the lifetime of the ct entry object")
      Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
      Signed-off-by: Paul Blakey <paulb@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, fix decap fallback to uplink when int port not supported · e3fdc71b
      Ariel Levkovich committed
      When resolving the decap route device for a tunnel decap rule,
      the result may be an OVS internal port device.
      
      Prior to adding the support for internal port offload, such case
      would result in using the uplink as the default decap route device
      which allowed devices that can't support internal port offload
      to offload this decap rule.
      
      This behavior got broken by adding the internal port offload which
      will fail in case the device can't support internal port offload.
      
      To restore the old behavior, use the uplink device as the decap
      route as before when internal port offload is not supported.
      
      Fixes: b16eb3c8 ("net/mlx5: Support internal port as decap route device")
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: TC, Fix ct_clear overwriting ct action metadata · 087032ee
      Ariel Levkovich committed
      ct_clear action is translated to clearing reg_c metadata
      which holds ct state and zone information using mod header
      actions.
      These actions are allocated during the actions parsing, as
      part of the flow attributes main mod header action list.
      
      If ct action exists in the rule, the flow's main mod header
      is used only in the post action table rule, after the ct tables
      which set the ct info in the reg_c as part of the ct actions.
      
      Therefore, if the original rule has a ct_clear action followed
      by a ct action, the ct action reg_c setting will be done first and
      will be followed by the ct_clear resetting reg_c and overwriting
      the ct info.
      
      Fix this by moving the ct_clear mod header actions allocation from
      the ct action parsing stage to the ct action post parsing stage where
      it is already known if ct_clear is followed by a ct action.
      In such case, we skip the mod header actions allocation for the ct
      clear since the ct action will write to reg_c anyway after clearing it.
      
      Fixes: 806401c2 ("net/mlx5e: CT, Fix multiple allocations and memleak of mod acts")
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Paul Blakey <paulb@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Lag, Don't skip fib events on current dst · 4a2a664e
      Vlad Buslov committed
      The referenced change added a check to skip updating the fib when the
      new fib instance has the same or lower priority. However, the new fib
      instance can be an update on the same dst address as the existing one,
      even though the structure is another instance that has a different
      address. Ignoring events on such instances causes the multipath LAG
      state to not be correctly updated.

      Track the 'dst' and 'dst_len' fields of the fib event
      fib_entry_notifier_info structure and don't skip events that have the
      same value of those fields.
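The skip decision can be sketched as a small predicate. This Python model is an assumption-laden illustration, not the driver logic: `FibEntry` and `should_skip` are hypothetical names, and it assumes that a numerically larger priority value means a lower priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FibEntry:
    dst: int       # destination prefix, as an integer
    dst_len: int   # prefix length
    priority: int  # assumed: larger value means lower priority


def should_skip(tracked: Optional[FibEntry], event: FibEntry) -> bool:
    """Return True when the event can be ignored."""
    if tracked is None:
        return False
    same_dst = tracked.dst == event.dst and tracked.dst_len == event.dst_len
    if same_dst:
        # An update to the route we already track must always be handled.
        return False
    # Otherwise ignore entries with the same or lower priority.
    return event.priority >= tracked.priority
```

The `same_dst` branch is the part the fix adds: identity is decided by `dst` and `dst_len` rather than by the address of the fib structure.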
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Lag, Fix fib_info pointer assignment · a6589155
      Vlad Buslov committed
      The referenced change incorrectly sets the single path fib_info even
      when LAG is not active. Fix it by moving the call to mlx5_lag_fib_set()
      into the conditional that verifies the LAG state.
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Lag, Fix use-after-free in fib event handler · 27b0420f
      Vlad Buslov committed
      A recent commit that modified the fib route event handler to handle
      events according to their priority introduced a use-after-free[0] in
      mp->mfi pointer usage. The pointer is now not just cached in order to
      be compared to following fib_info instances, but is also dereferenced
      to obtain fib_priority. However, since mlx5 lag code doesn't hold a
      reference to fib_info during the whole mp->mfi lifetime, it could be
      used after the fib_info instance has already been freed by kernel
      infrastructure code.
      
      Don't ever dereference mp->mfi pointer. Refactor it to be 'const void*'
      type and cache fib_info priority in dedicated integer. Group
      fib_info-related data into dedicated 'fib' structure that will be further
      extended by following patches in the series.
      
      [0]:
      
      [  203.588029] ==================================================================
      [  203.590161] BUG: KASAN: use-after-free in mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.592386] Read of size 4 at addr ffff888144df2050 by task kworker/u20:4/138
      
      [  203.594766] CPU: 3 PID: 138 Comm: kworker/u20:4 Tainted: G    B             5.17.0-rc7+ #6
      [  203.596751] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      [  203.598813] Workqueue: mlx5_lag_mp mlx5_lag_fib_update [mlx5_core]
      [  203.600053] Call Trace:
      [  203.600608]  <TASK>
      [  203.601110]  dump_stack_lvl+0x48/0x5e
      [  203.601860]  print_address_description.constprop.0+0x1f/0x160
      [  203.602950]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.604073]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.605177]  kasan_report.cold+0x83/0xdf
      [  203.605969]  ? mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.607102]  mlx5_lag_fib_update+0xabd/0xd60 [mlx5_core]
      [  203.608199]  ? mlx5_lag_init_fib_work+0x1c0/0x1c0 [mlx5_core]
      [  203.609382]  ? read_word_at_a_time+0xe/0x20
      [  203.610463]  ? strscpy+0xa0/0x2a0
      [  203.611463]  process_one_work+0x722/0x1270
      [  203.612344]  worker_thread+0x540/0x11e0
      [  203.613136]  ? rescuer_thread+0xd50/0xd50
      [  203.613949]  kthread+0x26e/0x300
      [  203.614627]  ? kthread_complete_and_exit+0x20/0x20
      [  203.615542]  ret_from_fork+0x1f/0x30
      [  203.616273]  </TASK>
      
      [  203.617174] Allocated by task 3746:
      [  203.617874]  kasan_save_stack+0x1e/0x40
      [  203.618644]  __kasan_kmalloc+0x81/0xa0
      [  203.619394]  fib_create_info+0xb41/0x3c50
      [  203.620213]  fib_table_insert+0x190/0x1ff0
      [  203.621020]  fib_magic.isra.0+0x246/0x2e0
      [  203.621803]  fib_add_ifaddr+0x19f/0x670
      [  203.622563]  fib_inetaddr_event+0x13f/0x270
      [  203.623377]  blocking_notifier_call_chain+0xd4/0x130
      [  203.624355]  __inet_insert_ifa+0x641/0xb20
      [  203.625185]  inet_rtm_newaddr+0xc3d/0x16a0
      [  203.626009]  rtnetlink_rcv_msg+0x309/0x880
      [  203.626826]  netlink_rcv_skb+0x11d/0x340
      [  203.627626]  netlink_unicast+0x4cc/0x790
      [  203.628430]  netlink_sendmsg+0x762/0xc00
      [  203.629230]  sock_sendmsg+0xb2/0xe0
      [  203.629955]  ____sys_sendmsg+0x58a/0x770
      [  203.630756]  ___sys_sendmsg+0xd8/0x160
      [  203.631523]  __sys_sendmsg+0xb7/0x140
      [  203.632294]  do_syscall_64+0x35/0x80
      [  203.633045]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  203.634427] Freed by task 0:
      [  203.635063]  kasan_save_stack+0x1e/0x40
      [  203.635844]  kasan_set_track+0x21/0x30
      [  203.636618]  kasan_set_free_info+0x20/0x30
      [  203.637450]  __kasan_slab_free+0xfc/0x140
      [  203.638271]  kfree+0x94/0x3b0
      [  203.638903]  rcu_core+0x5e4/0x1990
      [  203.639640]  __do_softirq+0x1ba/0x5d3
      
      [  203.640828] Last potentially related work creation:
      [  203.641785]  kasan_save_stack+0x1e/0x40
      [  203.642571]  __kasan_record_aux_stack+0x9f/0xb0
      [  203.643478]  call_rcu+0x88/0x9c0
      [  203.644178]  fib_release_info+0x539/0x750
      [  203.644997]  fib_table_delete+0x659/0xb80
      [  203.645809]  fib_magic.isra.0+0x1a3/0x2e0
      [  203.646617]  fib_del_ifaddr+0x93f/0x1300
      [  203.647415]  fib_inetaddr_event+0x9f/0x270
      [  203.648251]  blocking_notifier_call_chain+0xd4/0x130
      [  203.649225]  __inet_del_ifa+0x474/0xc10
      [  203.650016]  devinet_ioctl+0x781/0x17f0
      [  203.650788]  inet_ioctl+0x1ad/0x290
      [  203.651533]  sock_do_ioctl+0xce/0x1c0
      [  203.652315]  sock_ioctl+0x27b/0x4f0
      [  203.653058]  __x64_sys_ioctl+0x124/0x190
      [  203.653850]  do_syscall_64+0x35/0x80
      [  203.654608]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  203.666952] The buggy address belongs to the object at ffff888144df2000
                      which belongs to the cache kmalloc-256 of size 256
      [  203.669250] The buggy address is located 80 bytes inside of
                      256-byte region [ffff888144df2000, ffff888144df2100)
      [  203.671332] The buggy address belongs to the page:
      [  203.672273] page:00000000bf6c9314 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x144df0
      [  203.674009] head:00000000bf6c9314 order:2 compound_mapcount:0 compound_pincount:0
      [  203.675422] flags: 0x2ffff800010200(slab|head|node=0|zone=2|lastcpupid=0x1ffff)
      [  203.676819] raw: 002ffff800010200 0000000000000000 dead000000000122 ffff888100042b40
      [  203.678384] raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
      [  203.679928] page dumped because: kasan: bad access detected
      
      [  203.681455] Memory state around the buggy address:
      [  203.682421]  ffff888144df1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.683863]  ffff888144df1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.685310] >ffff888144df2000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  203.686701]                                                  ^
      [  203.687820]  ffff888144df2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [  203.689226]  ffff888144df2100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  203.690620] ==================================================================
      
      Fixes: ad11c4f1 ("net/mlx5e: Lag, Only handle events from highest priority multipath entry")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix the calling of update_buffer_lossy() API · c4d963a5
      Mark Zhang committed
      The arguments of update_buffer_lossy() are in the wrong order. Fix it.
      
      Fixes: 88b3d5c9 ("net/mlx5e: Fix port buffers cell size value")
      Signed-off-by: Mark Zhang <markzhang@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Don't match double-vlan packets if cvlan is not set · ada09af9
      Vlad Buslov committed
      Currently, a rule that matches on VLAN also matches packets that have
      multiple VLAN headers. This behavior is similar to the buggy flower
      classifier behavior that has recently been fixed. Fix the issue by
      matching on outer_second_cvlan_tag with value 0, which causes the HW
      to verify that the packet doesn't contain a second VLAN header.
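To make the effect of the extra match concrete, here is a toy Python model of rule matching. Only `outer_second_cvlan_tag` comes from the commit message; `rule_matches`, the dictionary layout and the other field names are illustrative assumptions.

```python
def rule_matches(rule, packet):
    """A rule matches when every field it names has the required value."""
    return all(packet.get(field, 0) == value for field, value in rule.items())


# Packets described by the header fields a matcher could inspect.
single_tagged = {"cvlan_tag": 1, "first_vid": 100, "outer_second_cvlan_tag": 0}
double_tagged = {"cvlan_tag": 1, "first_vid": 100, "outer_second_cvlan_tag": 1}

# Old rule: silent about a second VLAN header, so both packets match.
old_rule = {"cvlan_tag": 1, "first_vid": 100}

# New rule: additionally requires that no second VLAN header is present.
new_rule = {**old_rule, "outer_second_cvlan_tag": 0}
```

A field that a rule does not mention is effectively a wildcard, which is why the old rule accepted double-tagged traffic.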
      
      Fixes: 699e96dd ("net/mlx5e: Support offloading tc double vlan headers match")
      Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Fix slab-out-of-bounds while reading resource dump menu · 7ba2d9d8
      Aya Levin committed
      The resource dump menu may span more than a single page; support that.
      Otherwise, a menu read may result in a memory access violation: reading
      outside of the allocated page.
      Note that the first menu page contains menu headers, while the
      subsequent menu pages contain only records.
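A multi-page read of this kind might look like the following Python sketch. The binary layout is invented for illustration (a 4-byte record count as the header and fixed 8-byte records); `read_menu`, `HEADER_FMT` and `RECORD_SIZE` are hypothetical and do not describe the real menu format.

```python
import struct

HEADER_FMT = "<I"                       # hypothetical header: total record count
HEADER_SIZE = struct.calcsize(HEADER_FMT)
RECORD_SIZE = 8                         # hypothetical fixed record size


def read_menu(pages):
    """Collect records from a menu that may span several pages."""
    (total,) = struct.unpack_from(HEADER_FMT, pages[0], 0)
    records = []
    for index, page in enumerate(pages):
        # Only the first page starts with a header; later pages hold records only.
        offset = HEADER_SIZE if index == 0 else 0
        # Never read past the end of the current page.
        while len(records) < total and offset + RECORD_SIZE <= len(page):
            records.append(page[offset:offset + RECORD_SIZE])
            offset += RECORD_SIZE
    if len(records) != total:
        raise ValueError("menu truncated: missing pages or records")
    return records
```

The bound check on each page is the point of the sketch: the reader moves to the next page instead of continuing past the end of the current one.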
      
      The KASAN logs are as follows:
      BUG: KASAN: slab-out-of-bounds in strcmp+0x9b/0xb0
      Read of size 1 at addr ffff88812b2e1fd0 by task systemd-udevd/496
      
      CPU: 5 PID: 496 Comm: systemd-udevd Tainted: G    B  5.16.0_for_upstream_debug_2022_01_10_23_12 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x57/0x7d
       print_address_description.constprop.0+0x1f/0x140
       ? strcmp+0x9b/0xb0
       ? strcmp+0x9b/0xb0
       kasan_report.cold+0x83/0xdf
       ? strcmp+0x9b/0xb0
       strcmp+0x9b/0xb0
       mlx5_rsc_dump_init+0x4ab/0x780 [mlx5_core]
       ? mlx5_rsc_dump_destroy+0x80/0x80 [mlx5_core]
       ? lockdep_hardirqs_on_prepare+0x286/0x400
       ? raw_spin_unlock_irqrestore+0x47/0x50
       ? aomic_notifier_chain_register+0x32/0x40
       mlx5_load+0x104/0x2e0 [mlx5_core]
       mlx5_init_one+0x41b/0x610 [mlx5_core]
       ....
      The buggy address belongs to the object at ffff88812b2e0000
       which belongs to the cache kmalloc-4k of size 4096
      The buggy address is located 4048 bytes to the right of
       4096-byte region [ffff88812b2e0000, ffff88812b2e1000)
      The buggy address belongs to the page:
      page:000000009d69807a refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88812b2e6000 pfn:0x12b2e0
      head:000000009d69807a order:3 compound_mapcount:0 compound_pincount:0
      flags: 0x8000000000010200(slab|head|zone=2)
      raw: 8000000000010200 0000000000000000 dead000000000001 ffff888100043040
      raw: ffff88812b2e6000 0000000080040000 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88812b2e1e80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88812b2e1f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88812b2e1f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                                                       ^
       ffff88812b2e2000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88812b2e2080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: 12206b17 ("net/mlx5: Add support for resource dump")
      Signed-off-by: Aya Levin <ayal@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5e: Fix wrong source vport matching on tunnel rule · cb0d54cb
      Ariel Levkovich committed
      When OVS internal port is the vtep device, the first decap
      rule is matching on the internal port's vport metadata value
      and then changes the metadata to be the uplink's value.
      
      Therefore, following rules on the tunnel, in chain > 0, should
      avoid matching on internal port metadata and use the uplink
      vport metadata instead.
      
      Select the uplink's metadata value for the source vport match
      in case the rule is in chain greater than zero, even if the tunnel
      route device is internal port.
      
      Fixes: 166f431e ("net/mlx5e: Add indirect tc offload of ovs internal port")
      Signed-off-by: Ariel Levkovich <lariel@nvidia.com>
      Reviewed-by: Maor Dickman <maord@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • Merge branch 'bnxt_en-bug-fixes' · 0a806ecc
      Jakub Kicinski committed
      Michael Chan says:
      
      ====================
      bnxt_en: Bug fixes
      
      This patch series includes 3 fixes:
       - Fix an occasional VF open failure.
       - Fix a PTP spinlock usage before initialization
       - Fix unnecessary RX packet drops under high TX traffic load.
      ====================
      
      Link: https://lore.kernel.org/r/1651540392-2260-1-git-send-email-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: Fix unnecessary dropping of RX packets · 195af579
      Michael Chan committed
      In bnxt_poll_p5(), we first check cpr->has_more_work.  If it is true,
      we are in NAPI polling mode and we will call __bnxt_poll_cqs() to
      continue polling.  It is possible to exhaust the budget again when
      __bnxt_poll_cqs() returns.
      
      We then enter the main while loop to check for new entries in the NQ.
      If we had previously exhausted the NAPI budget, we may call
      __bnxt_poll_work() to process an RX entry with zero budget.  This will
      cause packets to be dropped unnecessarily, thinking that we are in the
      netpoll path.  Fix it by breaking out of the while loop if we need
      to process an RX NQ entry with no budget left.  We will then exit
      NAPI and stay in polling mode.
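The budget problem can be shown with a toy loop. This Python sketch is a heavily simplified model, not the driver's poll function: `poll_nq` and its parameters are hypothetical, each RX entry carries a packet count, and leftover packets of a partially processed entry are not modelled.

```python
def poll_nq(entries, budget, work_done, break_on_no_budget):
    """Walk NQ entries; return (work_done, entries_handled, packets_dropped)."""
    handled = 0
    dropped = 0
    for kind, packets in entries:
        remaining = budget - work_done
        if kind == "rx":
            if remaining <= 0:
                if break_on_no_budget:
                    # Fixed flow: stop and leave the entry for the next poll.
                    break
                # Old flow: a zero budget looks like the netpoll path,
                # so the RX packets are dropped.
                dropped += packets
                handled += 1
                continue
            work_done += min(packets, remaining)
        handled += 1
    return work_done, handled, dropped
```

For example, with the budget already exhausted (`work_done == budget`), the old flow drops every pending RX packet while the fixed flow leaves the entries untouched.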
      
      Fixes: 389a877a ("bnxt_en: Process the NQ under NAPI continuous polling.")
      Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: Initiallize bp->ptp_lock first before using it · 2b156fb5
      Michael Chan committed
      bnxt_ptp_init() calls bnxt_ptp_init_rtc() which will acquire the ptp_lock
      spinlock.  The spinlock is not initialized until later.  Move the
      bnxt_ptp_init_rtc() call after the spinlock is initialized.
      
      Fixes: 24ac1ecd ("bnxt_en: Add driver support to use Real Time Counter for PTP")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Reviewed-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
      Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
      Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • bnxt_en: Fix possible bnxt_open() failure caused by wrong RFS flag · 13ba7943
      Somnath Kotur committed
      bnxt_open() can fail in this code path, especially on a VF when
      it fails to reserve default rings:
      
      bnxt_open()
        __bnxt_open_nic()
          bnxt_clear_int_mode()
          bnxt_init_dflt_ring_mode()
      
      RX rings would be set to 0 when we hit this error path.
      
      It is possible for a subsequent bnxt_open() call to potentially succeed
      with a code path like this:
      
      bnxt_open()
        bnxt_hwrm_if_change()
          bnxt_fw_init_one()
            bnxt_fw_init_one_p3()
              bnxt_set_dflt_rfs()
                bnxt_rfs_capable()
                  bnxt_hwrm_reserve_rings()
      
      On older chips, RFS is capable if we can reserve the number of vnics that
      is equal to RX rings + 1.  But since RX rings is still set to 0 in this
      code path, we may mistakenly think that RFS is supported for 0 RX rings.
      
      Later, when the default RX rings are reserved and we try to enable
      RFS, it would fail and cause bnxt_open() to fail unnecessarily.
      
      We fix this in 2 places.  bnxt_rfs_capable() will always return false if
      RX rings is not yet set.  bnxt_init_dflt_ring_mode() will call
      bnxt_set_dflt_rfs() which will always clear the RFS flags if RFS is not
      supported.
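The arithmetic behind the wrong RFS decision is small enough to model directly. In this Python sketch, `rfs_capable`, `max_vnics` and the `fixed` switch are hypothetical; only the "RX rings + 1" rule for older chips and the "return false if RX rings is not yet set" behavior come from the description above.

```python
def rfs_capable(rx_rings, max_vnics, fixed):
    """Model of the capability check on older chips."""
    if fixed and rx_rings == 0:
        # RX rings are not set yet, so no meaningful answer is possible.
        return False
    vnics_needed = rx_rings + 1
    return max_vnics >= vnics_needed
```

With `rx_rings` still 0 after a failed open, the unfixed check only asks for a single vnic and reports RFS as supported, which is the mistaken result the commit describes.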
      
      Fixes: 20d7d1c5 ("bnxt_en: reliably allocate IRQ table on reset to avoid crash")
      Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • smsc911x: allow using IRQ0 · 5ef9b803
      Sergey Shtylyov committed
      The AlphaProject AP-SH4A-3A/AP-SH4AD-0A SH boards use IRQ0 for their SMSC
      LAN911x Ethernet chip, so the networking on them must have been broken by
      commit 965b2aa7 ("net/smsc911x: fix irq resource allocation failure")
      which filtered out 0 as well as the negative error codes -- it was kinda
      correct at the time, as platform_get_irq() could return 0 on of_irq_get()
      failure and on the actual 0 in an IRQ resource.  This issue was fixed by
      me (back in 2016!), so we should be able to fix this driver to allow IRQ0
      usage again...
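The difference between the two checks comes down to how the value 0 is treated. The Python sketch below is a hedged model with hypothetical helper names (`irq_usable_old`, `irq_usable_new`); it assumes, as the message states, that errors are now reported only as negative codes.

```python
import errno


def irq_usable_old(irq):
    # Old check: filters out 0 together with the negative error codes.
    return irq > 0


def irq_usable_new(irq):
    # New check: only negative values are errors, so IRQ0 is accepted.
    return irq >= 0
```

A board that wires the Ethernet chip to IRQ0 passes the new check and fails the old one, while an error such as `-errno.ENXIO` is rejected by both.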
      
      When merging this to the stable kernels, make sure you also merge commit
      e330b9a6 ("platform: don't return 0 from platform_get_irq[_byname]()
      on error") -- that's my fix to platform_get_irq() for the DT platforms...
      
      Fixes: 965b2aa7 ("net/smsc911x: fix irq resource allocation failure")
      Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Link: https://lore.kernel.org/r/656036e4-6387-38df-b8a7-6ba683b16e63@omp.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • net: sfp: Add tx-fault workaround for Huawei MA5671A SFP ONT · 2069624d
      Matthew Hagan committed
      As noted elsewhere, various GPON SFP modules exhibit non-standard
      TX-fault behaviour. In the tested case, the Huawei MA5671A, when used
      in combination with a Marvell mv88e6085 switch, was found to
      persistently assert TX-fault, resulting in the module being disabled.
      
      This patch adds a quirk to ignore the SFP_F_TX_FAULT state, allowing the
      module to function.
      
      Change from v1: removal of erroneous return statement (Andrew Lunn)
      Signed-off-by: Matthew Hagan <mnhagan88@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Link: https://lore.kernel.org/r/20220502223315.1973376-1-mnhagan88@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  3. 03 May, 2022 7 commits
  4. 02 May, 2022 2 commits
  5. 01 May, 2022 4 commits
    • Merge branch 'nfc-fixes' · b6693611
      David S. Miller committed
      Duoming Zhou says:
      
      ====================
      Replace improper checks and fix bugs in nfc subsystem
      
      The first patch is used to replace improper checks in netlink related
      functions of nfc core, the second patch is used to fix bugs in
      nfcmrvl driver.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs · d270453a
      Duoming Zhou committed
      There are destructive operations such as nfcmrvl_fw_dnld_abort and
      gpio_free in nfcmrvl_nci_unregister_dev. Resources such as the firmware
      and gpio could be destroyed while upper layer functions such as
      nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame are executing, which
      leads to double-free, use-after-free and null-ptr-deref bugs.
      
      There are three situations that could lead to double-free bugs.
      
      The first situation is shown below:
      
         (Thread 1)                 |      (Thread 2)
      nfcmrvl_fw_dnld_start         |
       ...                          |  nfcmrvl_nci_unregister_dev
       release_firmware()           |   nfcmrvl_fw_dnld_abort
        kfree(fw) //(1)             |    fw_dnld_over
                                    |     release_firmware
        ...                         |      kfree(fw) //(2)
                                    |     ...
      
      The second situation is shown below:
      
         (Thread 1)                 |      (Thread 2)
      nfcmrvl_fw_dnld_start         |
       ...                          |
       mod_timer                    |
       (wait a time)                |
       fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
         fw_dnld_over               |   nfcmrvl_fw_dnld_abort
          release_firmware          |    fw_dnld_over
           kfree(fw) //(1)          |     release_firmware
           ...                      |      kfree(fw) //(2)
      
      The third situation is shown below:
      
             (Thread 1)               |       (Thread 2)
      nfcmrvl_nci_recv_frame          |
       if(..->fw_download_in_progress)|
        nfcmrvl_fw_dnld_recv_frame    |
         queue_work                   |
                                      |
      fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
       fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
        release_firmware              |   fw_dnld_over
         kfree(fw) //(1)              |    release_firmware
                                      |     kfree(fw) //(2)
      
      The firmware struct is deallocated in position (1) and deallocated
      in position (2) again.
      
      The crash trace triggered by POC is like below:
      
      BUG: KASAN: double-free or invalid-free in fw_dnld_over
      Call Trace:
        kfree
        fw_dnld_over
        nfcmrvl_nci_unregister_dev
        nci_uart_tty_close
        tty_ldisc_kill
        tty_ldisc_hangup
        __tty_hangup.part.0
        tty_release
        ...
      
      What's more, there are also use-after-free and null-ptr-deref bugs
      in nfcmrvl_fw_dnld_start. If we deallocate the firmware struct or gpio,
      or set the members of priv->fw_dnld to NULL in
      nfcmrvl_nci_unregister_dev, and then dereference them in
      nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
      
      This patch reorders destructive operations after nci_unregister_device
      in order to synchronize between cleanup routine and firmware download
      routine.
      
      The nci_unregister_device is well synchronized. If the device is
      detaching, the firmware download routine will goto error. If firmware
      download routine is executing, nci_unregister_device will wait until
      firmware download routine is finished.
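
The ordering argument above can be illustrated with a small userspace sketch. This is not the driver code: `toy_dev`, `toy_fw_download`, `toy_unregister` and `toy_teardown` are hypothetical names, and a pthread mutex stands in for whatever synchronization nci_unregister_device actually provides. The only point is that teardown first flips the state under the lock and frees the firmware buffer afterwards, so a later download attempt takes the error path instead of touching freed memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's private data. */
struct toy_dev {
    pthread_mutex_t lock;  /* serializes download against unregister */
    bool registered;       /* cleared once the device is detaching */
    char *fw;              /* stands in for the firmware struct */
};

static void toy_dev_init(struct toy_dev *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->registered = true;
    d->fw = NULL;
}

/* Download path: runs under the lock and bails out if detaching. */
static int toy_fw_download(struct toy_dev *d)
{
    int ret = 0;

    pthread_mutex_lock(&d->lock);
    if (!d->registered) {
        ret = -1;          /* device is going away: go to the error path */
        goto out;
    }
    if (!d->fw)
        d->fw = malloc(16);
    if (!d->fw)
        ret = -2;
out:
    pthread_mutex_unlock(&d->lock);
    return ret;
}

/* Waits for a running download (via the lock), then blocks new ones. */
static void toy_unregister(struct toy_dev *d)
{
    pthread_mutex_lock(&d->lock);
    d->registered = false;
    pthread_mutex_unlock(&d->lock);
}

/* Teardown: synchronize first, destroy second. */
static void toy_teardown(struct toy_dev *d)
{
    toy_unregister(d);
    free(d->fw);           /* no download can be running or start now */
    d->fw = NULL;
}
```

Swapping the two steps in `toy_teardown` would reopen the window in which a download still holds or allocates `fw` while it is being freed.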
      
      Fixes: 3194c687 ("NFC: nfcmrvl: add firmware download support")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d270453a
    • D
      nfc: replace improper check device_is_registered() in netlink related functions · da5c0f11
      Duoming Zhou authored
      The device_is_registered() in nfc core is used to check whether
      nfc device is registered in netlink related functions such as
      nfc_fw_download(), nfc_dev_up() and so on. Although device_is_registered()
      is protected by device_lock, there is still a race condition between
      device_del() and device_is_registered(). The root cause is that
      kobject_del() in device_del() is not protected by device_lock.
      
         (cleanup task)         |     (netlink task)
                                |
      nfc_unregister_device     | nfc_fw_download
       device_del               |  device_lock
        ...                     |   if (!device_is_registered)//(1)
        kobject_del//(2)        |   ...
       ...                      |  device_unlock
      
      device_is_registered() returns the value of state_in_sysfs, and
      state_in_sysfs is set to zero in kobject_del(). If the check at
      position (1) passes and state_in_sysfs is then cleared at position (2),
      the check at position (1) is useless.
      
      This patch uses a bool variable instead of device_is_registered() to
      judge whether the nfc device is registered; that variable is well
      synchronized.
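
The race described above can be replayed deterministically in a single-threaded sketch. Everything here is an assumption made for illustration: `toy_nfc_dev`, the toy lock, and the flag name `shutting_down` are hypothetical and are not taken from the patch. The sketch contrasts a check on state that the cleanup path clears without the lock against a check on a flag that is only written while the lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device state involved in the race. */
struct toy_nfc_dev {
    bool lock_held;       /* toy stand-in for device_lock */
    bool state_in_sysfs;  /* cleared by the unlocked deletion step */
    bool shutting_down;   /* only written while the lock is held */
};

static bool toy_trylock(struct toy_nfc_dev *d)
{
    if (d->lock_held)
        return false;
    d->lock_held = true;
    return true;
}

static void toy_unlock(struct toy_nfc_dev *d)
{
    d->lock_held = false;
}

/* Old-style check: reads state that can change mid-operation. */
static bool old_is_registered(const struct toy_nfc_dev *d)
{
    return d->state_in_sysfs;
}

/* New-style check: reads a flag that is stable while the lock is held. */
static bool new_is_usable(const struct toy_nfc_dev *d)
{
    return !d->shutting_down;
}

/* Cleanup step that ignores the lock, like the unprotected deletion. */
static void cleanup_unlocked_del(struct toy_nfc_dev *d)
{
    d->state_in_sysfs = false;
}

/* Cleanup step that must own the lock before flipping the flag. */
static bool cleanup_mark_shutdown(struct toy_nfc_dev *d)
{
    if (!toy_trylock(d))
        return false;     /* a netlink operation is still in flight */
    d->shutting_down = true;
    toy_unlock(d);
    return true;
}
```

While the netlink side holds the toy lock, `cleanup_unlocked_del()` can still invalidate the old check, whereas `cleanup_mark_shutdown()` fails until the lock is released, so the new check cannot go stale inside the critical section.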
      
      Fixes: 3e256b8f ("NFC: add nfc subsystem core")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      da5c0f11
    • T
      net: stmmac: disable Split Header (SPH) for Intel platforms · 47f753c1
      Tan Tee Min authored
      According to the DesignWare Ethernet QoS datasheet, the Split Header
      (SPH) feature does not support IPv4 fragmented packets. This limitation
      causes ping failures when the packet size exceeds the MTU size: once a
      basic ping packet is larger than the configured MTU, the data inside
      the fragmented packet is lost, replaced by zeros or corrupted values,
      and the ping fails.
      
      So, disable the Split Header for Intel platforms.
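
As a rough aid for reproducing the condition, the sketch below computes how many IPv4 packets an ICMP echo request needs for a given MTU. It assumes an IPv4 header without options (20 bytes) and the 8-byte ICMP echo header; the function name `icmp_echo_fragments` is hypothetical and nothing here comes from the stmmac driver.

```c
#include <assert.h>

#define IPV4_HDR_LEN 20u  /* IPv4 header without options */
#define ICMP_HDR_LEN 8u   /* ICMP echo header */

/*
 * Number of IPv4 packets needed for an ICMP echo carrying `payload`
 * data bytes on a link with the given MTU. Assumes mtu >= 68.
 */
static unsigned int icmp_echo_fragments(unsigned int mtu, unsigned int payload)
{
    unsigned int l4_len = ICMP_HDR_LEN + payload;
    unsigned int room = mtu - IPV4_HDR_LEN;
    /* Every fragment except the last carries a multiple of 8 bytes. */
    unsigned int per_frag = room & ~7u;

    if (l4_len <= room)
        return 1;         /* fits in a single, unfragmented packet */
    return (l4_len + per_frag - 1) / per_frag;
}
```

With a 1500-byte MTU, a 1472-byte payload (`ping -s 1472`) still fits in one packet, while 1473 bytes is split into two fragments, which is the kind of traffic the message describes as failing with SPH enabled.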
      
      v2: Add fixes tag in commit message.
      
      Fixes: 67afd6d1 ("net: stmmac: Add Split Header support and enable it in XGMAC cores")
      Cc: <stable@vger.kernel.org> # 5.10.x
      Suggested-by: Ong, Boon Leong <boon.leong.ong@intel.com>
      Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail@intel.com>
      Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
      Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      47f753c1
  6. 30 Apr, 2022 2 commits
    • E
      mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() · a9384a4c
      Eric Dumazet authored
      Whenever an RCU-protected list replaces an object,
      the pointer to the new object needs to be updated
      _before_ the call to kfree_rcu() or call_rcu().
      
      Also ip6_mc_msfilter() needs to update the pointer
      before releasing the mc_lock mutex.
      
      Note that linux-5.13 already supported kfree_rcu(NULL, rcu),
      so this fix does not need the conditional test I was
      forced to use in the equivalent patch for IPv4.
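
The ordering rule can be sketched in userspace C. This is a toy model, not the kernel code: `toy_sock`, `src_list`, `toy_kfree_rcu` and `toy_grace_period_end` are hypothetical, the deferred free is a single remembered pointer rather than a real grace period, and C11 atomics stand in for the RCU pointer accessors. It only shows the shape of the rule: publish the new object first, hand the old one to the deferred free second.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct src_list {
    int count;
};

struct toy_sock {
    _Atomic(struct src_list *) sflist;  /* readers load this pointer */
};

/* Toy deferred free: one slot, released when the "grace period" ends. */
static struct src_list *pending_free;

/* Accepts NULL, like the kfree_rcu() behaviour mentioned in the message. */
static void toy_kfree_rcu(struct src_list *old)
{
    if (!old)
        return;
    assert(pending_free == NULL);  /* toy limitation: single slot */
    pending_free = old;
}

static void toy_grace_period_end(void)
{
    free(pending_free);
    pending_free = NULL;
}

/* Replace the list: publish the new pointer, then defer the free. */
static int replace_list(struct toy_sock *sk, int new_count)
{
    struct src_list *newl = malloc(sizeof(*newl));
    struct src_list *old;

    if (!newl)
        return -1;
    newl->count = new_count;
    old = atomic_load_explicit(&sk->sflist, memory_order_relaxed);
    /* Step 1: readers arriving from now on see only the new object. */
    atomic_store_explicit(&sk->sflist, newl, memory_order_release);
    /* Step 2: only now may the old object be queued for freeing. */
    toy_kfree_rcu(old);
    return 0;
}

static int reader_count(struct toy_sock *sk)
{
    struct src_list *l =
        atomic_load_explicit(&sk->sflist, memory_order_acquire);

    return l ? l->count : 0;
}
```

If the two steps in `replace_list` were swapped, a reader starting after the deferred free was queued could still load the old pointer and use it after the grace period had ended.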
      
      Fixes: 882ba1f7 ("mld: convert ipv6_mc_socklist->sflist to RCU")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9384a4c
    • E
      net: igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() · dba5bdd5
      Eric Dumazet authored
      syzbot reported a UAF in ip_mc_sf_allow() [1]
      
      Whenever an RCU-protected list replaces an object,
      the pointer to the new object needs to be updated
      _before_ the call to kfree_rcu() or call_rcu().
      
      Because kfree_rcu(ptr, rcu) got support for NULL ptr
      only recently in commit 12edff04 ("rcu: Make kfree_rcu()
      ignore NULL pointers"), I chose to use the conditional
      to make sure stable backports won't miss this detail.
      
      if (psl)
          kfree_rcu(psl, rcu);
      
      net/ipv6/mcast.c has similar issues, addressed in a separate patch.
      
      [1]
      BUG: KASAN: use-after-free in ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
      Read of size 4 at addr ffff88807d37b904 by task syz-executor.5/908
      
      CPU: 0 PID: 908 Comm: syz-executor.5 Not tainted 5.18.0-rc4-syzkaller-00064-g8f4dd166 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0xeb/0x467 mm/kasan/report.c:313
       print_report mm/kasan/report.c:429 [inline]
       kasan_report.cold+0xf4/0x1c6 mm/kasan/report.c:491
       ip_mc_sf_allow+0x6bb/0x6d0 net/ipv4/igmp.c:2655
       raw_v4_input net/ipv4/raw.c:190 [inline]
       raw_local_deliver+0x4d1/0xbe0 net/ipv4/raw.c:218
       ip_protocol_deliver_rcu+0xcf/0xb30 net/ipv4/ip_input.c:193
       ip_local_deliver_finish+0x2ee/0x4c0 net/ipv4/ip_input.c:233
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_local_deliver+0x1b3/0x200 net/ipv4/ip_input.c:254
       dst_input include/net/dst.h:461 [inline]
       ip_rcv_finish+0x1cb/0x2f0 net/ipv4/ip_input.c:437
       NF_HOOK include/linux/netfilter.h:307 [inline]
       NF_HOOK include/linux/netfilter.h:301 [inline]
       ip_rcv+0xaa/0xd0 net/ipv4/ip_input.c:556
       __netif_receive_skb_one_core+0x114/0x180 net/core/dev.c:5405
       __netif_receive_skb+0x24/0x1b0 net/core/dev.c:5519
       netif_receive_skb_internal net/core/dev.c:5605 [inline]
       netif_receive_skb+0x13e/0x8e0 net/core/dev.c:5664
       tun_rx_batched.isra.0+0x460/0x720 drivers/net/tun.c:1534
       tun_get_user+0x28b7/0x3e30 drivers/net/tun.c:1985
       tun_chr_write_iter+0xdb/0x200 drivers/net/tun.c:2015
       call_write_iter include/linux/fs.h:2050 [inline]
       new_sync_write+0x38a/0x560 fs/read_write.c:504
       vfs_write+0x7c0/0xac0 fs/read_write.c:591
       ksys_write+0x127/0x250 fs/read_write.c:644
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f3f12c3bbff
      Code: 89 54 24 18 48 89 74 24 10 89 7c 24 08 e8 99 fd ff ff 48 8b 54 24 18 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 48 89 44 24 08 e8 cc fd ff ff 48
      RSP: 002b:00007f3f13ea9130 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007f3f12d9bf60 RCX: 00007f3f12c3bbff
      RDX: 0000000000000036 RSI: 0000000020002ac0 RDI: 00000000000000c8
      RBP: 00007f3f12ce308d R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000036 R11: 0000000000000293 R12: 0000000000000000
      R13: 00007fffb68dd79f R14: 00007f3f13ea9300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 908:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:45 [inline]
       set_alloc_info mm/kasan/common.c:436 [inline]
       ____kasan_kmalloc mm/kasan/common.c:515 [inline]
       ____kasan_kmalloc mm/kasan/common.c:474 [inline]
       __kasan_kmalloc+0xa6/0xd0 mm/kasan/common.c:524
       kasan_kmalloc include/linux/kasan.h:234 [inline]
       __do_kmalloc mm/slab.c:3710 [inline]
       __kmalloc+0x209/0x4d0 mm/slab.c:3719
       kmalloc include/linux/slab.h:586 [inline]
       sock_kmalloc net/core/sock.c:2501 [inline]
       sock_kmalloc+0xb5/0x100 net/core/sock.c:2492
       ip_mc_source+0xba2/0x1100 net/ipv4/igmp.c:2392
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1296 [inline]
       ip_setsockopt+0x2312/0x3ab0 net/ipv4/ip_sockglue.c:1432
       raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Freed by task 753:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:45
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free+0x13d/0x180 mm/kasan/common.c:328
       kasan_slab_free include/linux/kasan.h:200 [inline]
       __cache_free mm/slab.c:3439 [inline]
       kmem_cache_free_bulk+0x69/0x460 mm/slab.c:3774
       kfree_bulk include/linux/slab.h:437 [inline]
       kfree_rcu_work+0x51c/0xa10 kernel/rcu/tree.c:3318
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
       kvfree_call_rcu+0x74/0x990 kernel/rcu/tree.c:3595
       ip_mc_msfilter+0x712/0xb60 net/ipv4/igmp.c:2510
       do_ip_setsockopt net/ipv4/ip_sockglue.c:1257 [inline]
       ip_setsockopt+0x32e1/0x3ab0 net/ipv4/ip_sockglue.c:1432
       raw_setsockopt+0x274/0x2c0 net/ipv4/raw.c:861
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Second to last potentially related work creation:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       __kasan_record_aux_stack+0x7e/0x90 mm/kasan/generic.c:348
       call_rcu+0x99/0x790 kernel/rcu/tree.c:3074
       mpls_dev_notify+0x552/0x8a0 net/mpls/af_mpls.c:1656
       notifier_call_chain+0xb5/0x200 kernel/notifier.c:84
       call_netdevice_notifiers_info+0xb5/0x130 net/core/dev.c:1938
       call_netdevice_notifiers_extack net/core/dev.c:1976 [inline]
       call_netdevice_notifiers net/core/dev.c:1990 [inline]
       unregister_netdevice_many+0x92e/0x1890 net/core/dev.c:10751
       default_device_exit_batch+0x449/0x590 net/core/dev.c:11245
       ops_exit_list+0x125/0x170 net/core/net_namespace.c:167
       cleanup_net+0x4ea/0xb00 net/core/net_namespace.c:594
       process_one_work+0x996/0x1610 kernel/workqueue.c:2289
       worker_thread+0x665/0x1080 kernel/workqueue.c:2436
       kthread+0x2e9/0x3a0 kernel/kthread.c:376
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:298
      
      The buggy address belongs to the object at ffff88807d37b900
       which belongs to the cache kmalloc-64 of size 64
      The buggy address is located 4 bytes inside of
       64-byte region [ffff88807d37b900, ffff88807d37b940)
      
      The buggy address belongs to the physical page:
      page:ffffea0001f4dec0 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88807d37b180 pfn:0x7d37b
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 ffff888010c41340 ffffea0001c795c8 ffff888010c40200
      raw: ffff88807d37b180 ffff88807d37b000 000000010000001f 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x342040(__GFP_IO|__GFP_NOWARN|__GFP_COMP|__GFP_HARDWALL|__GFP_THISNODE), pid 2963, tgid 2963 (udevd), ts 139732238007, free_ts 139730893262
       prep_new_page mm/page_alloc.c:2441 [inline]
       get_page_from_freelist+0xba2/0x3e00 mm/page_alloc.c:4182
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5408
       __alloc_pages_node include/linux/gfp.h:587 [inline]
       kmem_getpages mm/slab.c:1378 [inline]
       cache_grow_begin+0x75/0x350 mm/slab.c:2584
       cache_alloc_refill+0x27f/0x380 mm/slab.c:2957
       ____cache_alloc mm/slab.c:3040 [inline]
       ____cache_alloc mm/slab.c:3023 [inline]
       __do_cache_alloc mm/slab.c:3267 [inline]
       slab_alloc mm/slab.c:3309 [inline]
       __do_kmalloc mm/slab.c:3708 [inline]
       __kmalloc+0x3b3/0x4d0 mm/slab.c:3719
       kmalloc include/linux/slab.h:586 [inline]
       kzalloc include/linux/slab.h:714 [inline]
       tomoyo_encode2.part.0+0xe9/0x3a0 security/tomoyo/realpath.c:45
       tomoyo_encode2 security/tomoyo/realpath.c:31 [inline]
       tomoyo_encode+0x28/0x50 security/tomoyo/realpath.c:80
       tomoyo_realpath_from_path+0x186/0x620 security/tomoyo/realpath.c:288
       tomoyo_get_realpath security/tomoyo/file.c:151 [inline]
       tomoyo_path_perm+0x21b/0x400 security/tomoyo/file.c:822
       security_inode_getattr+0xcf/0x140 security/security.c:1350
       vfs_getattr fs/stat.c:157 [inline]
       vfs_statx+0x16a/0x390 fs/stat.c:232
       vfs_fstatat+0x8c/0xb0 fs/stat.c:255
       __do_sys_newfstatat+0x91/0x110 fs/stat.c:425
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1356 [inline]
       free_pcp_prepare+0x549/0xd20 mm/page_alloc.c:1406
       free_unref_page_prepare mm/page_alloc.c:3328 [inline]
       free_unref_page+0x19/0x6a0 mm/page_alloc.c:3423
       __vunmap+0x85d/0xd30 mm/vmalloc.c:2667
       __vfree+0x3c/0xd0 mm/vmalloc.c:2715
       vfree+0x5a/0x90 mm/vmalloc.c:2746
       __do_replace+0x16b/0x890 net/ipv6/netfilter/ip6_tables.c:1117
       do_replace net/ipv6/netfilter/ip6_tables.c:1157 [inline]
       do_ip6t_set_ctl+0x90d/0xb90 net/ipv6/netfilter/ip6_tables.c:1639
       nf_setsockopt+0x83/0xe0 net/netfilter/nf_sockopt.c:101
       ipv6_setsockopt+0x122/0x180 net/ipv6/ipv6_sockglue.c:1026
       tcp_setsockopt+0x136/0x2520 net/ipv4/tcp.c:3696
       __sys_setsockopt+0x2db/0x6a0 net/socket.c:2180
       __do_sys_setsockopt net/socket.c:2191 [inline]
       __se_sys_setsockopt net/socket.c:2188 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2188
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Memory state around the buggy address:
       ffff88807d37b800: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
       ffff88807d37b880: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      >ffff88807d37b900: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                         ^
       ffff88807d37b980: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff88807d37ba00: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: c85bb41e ("igmp: fix ip_mc_sf_allow race [v5]")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Flavio Leitner <fbl@sysclose.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dba5bdd5