1. 16 12月, 2022 3 次提交
    • R
      arm64: fix a concurrency issue in emulation_proc_handler() · f676dd4c
      ruanjinjie 提交于
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I65T0J
      CVE: NA
      
      -------------------------------
      
      In emulation_proc_handler(), read and write operations are performed on
      insn->current_mode. In the concurrency scenario, mutex only protects
      writing insn->current_mode, and not protects the read. Suppose there are
      two concurrent tasks, task1 updates insn->current_mode to INSN_EMULATE
      in the critical section, the prev_mode of task2 is still the old data
      INSN_UNDEF of insn->current_mode. As a result, two tasks call
      update_insn_emulation_mode twice with prev_mode = INSN_UNDEF and
      current_mode = INSN_EMULATE, then call register_emulation_hooks twice,
      resulting in a list_add double problem.
      
      Call trace:
       __list_add_valid+0xd8/0xe4
       register_undef_hook+0x94/0x13c
       update_insn_emulation_mode+0xd0/0x12c
       emulation_proc_handler+0xd8/0xf4
       proc_sys_call_handler+0x140/0x250
       proc_sys_write+0x1c/0x2c
       new_sync_write+0xec/0x18c
       vfs_write+0x214/0x2ac
       ksys_write+0x70/0xfc
       __arm64_sys_write+0x24/0x30
       el0_svc_common.constprop.0+0x7c/0x1bc
       do_el0_svc+0x2c/0x94
       el0_svc+0x20/0x30
       el0_sync_handler+0xb0/0xb4
       el0_sync+0x160/0x180
      
      Fixes: 08f3f0b2 ("arm64: fix oops in concurrently setting insn_emulation sysctls")
      Signed-off-by: Nruanjinjie <ruanjinjie@huawei.com>
      Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com>
      Reviewed-by: NLiao Chang <liaochang1@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      f676dd4c
    • Z
      dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata · 6ae2a8a9
      Zhihao Cheng 提交于
      mainline inclusion
      from mainline-v6.1
      commit 8111964f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I65I9A
      CVE: NA
      
      -------------------------------
      
      Following concurrent processes:
      
                P1(drop cache)                P2(kworker)
      drop_caches_sysctl_handler
       drop_slab
        shrink_slab
         down_read(&shrinker_rwsem)  - LOCK A
         do_shrink_slab
          super_cache_scan
           prune_icache_sb
            dispose_list
             evict
              ext4_evict_inode
      	 ext4_clear_inode
      	  ext4_discard_preallocations
      	   ext4_mb_load_buddy_gfp
      	    ext4_mb_init_cache
      	     ext4_read_block_bitmap_nowait
      	      ext4_read_bh_nowait
      	       submit_bh
      	        dm_submit_bio
      		                 do_worker
      				  process_deferred_bios
      				   commit
      				    metadata_operation_failed
      				     dm_pool_abort_metadata
      				      down_write(&pmd->root_lock) - LOCK B
      		                      __destroy_persistent_data_objects
      				       dm_block_manager_destroy
      				        dm_bufio_client_destroy
      				         unregister_shrinker
      					  down_write(&shrinker_rwsem)
      		 thin_map                            |
      		  dm_thin_find_block                 ↓
      		   down_read(&pmd->root_lock) --> ABBA deadlock
      
      , which triggers hung task:
      
      [   76.974820] INFO: task kworker/u4:3:63 blocked for more than 15 seconds.
      [   76.976019]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
      [   76.978521] task:kworker/u4:3    state:D stack:0     pid:63    ppid:2
      [   76.978534] Workqueue: dm-thin do_worker
      [   76.978552] Call Trace:
      [   76.978564]  __schedule+0x6ba/0x10f0
      [   76.978582]  schedule+0x9d/0x1e0
      [   76.978588]  rwsem_down_write_slowpath+0x587/0xdf0
      [   76.978600]  down_write+0xec/0x110
      [   76.978607]  unregister_shrinker+0x2c/0xf0
      [   76.978616]  dm_bufio_client_destroy+0x116/0x3d0
      [   76.978625]  dm_block_manager_destroy+0x19/0x40
      [   76.978629]  __destroy_persistent_data_objects+0x5e/0x70
      [   76.978636]  dm_pool_abort_metadata+0x8e/0x100
      [   76.978643]  metadata_operation_failed+0x86/0x110
      [   76.978649]  commit+0x6a/0x230
      [   76.978655]  do_worker+0xc6e/0xd90
      [   76.978702]  process_one_work+0x269/0x630
      [   76.978714]  worker_thread+0x266/0x630
      [   76.978730]  kthread+0x151/0x1b0
      [   76.978772] INFO: task test.sh:2646 blocked for more than 15 seconds.
      [   76.979756]       Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910
      [   76.982111] task:test.sh         state:D stack:0     pid:2646  ppid:2459
      [   76.982128] Call Trace:
      [   76.982139]  __schedule+0x6ba/0x10f0
      [   76.982155]  schedule+0x9d/0x1e0
      [   76.982159]  rwsem_down_read_slowpath+0x4f4/0x910
      [   76.982173]  down_read+0x84/0x170
      [   76.982177]  dm_thin_find_block+0x4c/0xd0
      [   76.982183]  thin_map+0x201/0x3d0
      [   76.982188]  __map_bio+0x5b/0x350
      [   76.982195]  dm_submit_bio+0x2b6/0x930
      [   76.982202]  __submit_bio+0x123/0x2d0
      [   76.982209]  submit_bio_noacct_nocheck+0x101/0x3e0
      [   76.982222]  submit_bio_noacct+0x389/0x770
      [   76.982227]  submit_bio+0x50/0xc0
      [   76.982232]  submit_bh_wbc+0x15e/0x230
      [   76.982238]  submit_bh+0x14/0x20
      [   76.982241]  ext4_read_bh_nowait+0xc5/0x130
      [   76.982247]  ext4_read_block_bitmap_nowait+0x340/0xc60
      [   76.982254]  ext4_mb_init_cache+0x1ce/0xdc0
      [   76.982259]  ext4_mb_load_buddy_gfp+0x987/0xfa0
      [   76.982263]  ext4_discard_preallocations+0x45d/0x830
      [   76.982274]  ext4_clear_inode+0x48/0xf0
      [   76.982280]  ext4_evict_inode+0xcf/0xc70
      [   76.982285]  evict+0x119/0x2b0
      [   76.982290]  dispose_list+0x43/0xa0
      [   76.982294]  prune_icache_sb+0x64/0x90
      [   76.982298]  super_cache_scan+0x155/0x210
      [   76.982303]  do_shrink_slab+0x19e/0x4e0
      [   76.982310]  shrink_slab+0x2bd/0x450
      [   76.982317]  drop_slab+0xcc/0x1a0
      [   76.982323]  drop_caches_sysctl_handler+0xb7/0xe0
      [   76.982327]  proc_sys_call_handler+0x1bc/0x300
      [   76.982331]  proc_sys_write+0x17/0x20
      [   76.982334]  vfs_write+0x3d3/0x570
      [   76.982342]  ksys_write+0x73/0x160
      [   76.982347]  __x64_sys_write+0x1e/0x30
      [   76.982352]  do_syscall_64+0x35/0x80
      [   76.982357]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Function metadata_operation_failed() is called when operations failed
      on dm pool metadata, dm pool will destroy and recreate metadata. So,
      shrinker will be unregistered and registered, which could down write
      shrinker_rwsem under pmd_write_lock.
      
      Fix it by allocating dm_block_manager before locking pmd->root_lock
      and destroying old dm_block_manager after unlocking pmd->root_lock,
      then old dm_block_manager is replaced with new dm_block_manager under
      pmd->root_lock. So, shrinker register/unregister could be done without
      holding pmd->root_lock.
      
      Fetch a reproducer in [Link].
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216676
      Fixes: e49e5829 ("dm thin: add read only and fail io modes")
      
      Conflicts:
      	drivers/md/dm-thin-metadata.c
      	[ 873f258b("dm thin metadata: do not write metadata if no
      	  changes occurred") is not applied.
      	  6a1b1ddc("dm thin metadata: add wrappers for managing
      	  write locking of metadata") is not applied. ]
      Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      6ae2a8a9
    • Z
      sched/qos: Don't unthrottle cfs_rq when cfs_rq is throttled by qos · fbea24f5
      Zhang Qiao 提交于
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I64OUS
      CVE: NA
      
      -------------------------------
      
      When a cfs_rq throttled by qos, mark cfs_rq->throttled as 1,
      and cfs bw will unthrottled this cfs_rq by mistake, it cause
      a list_del_valid warning.
      So add macro QOS_THROTTLED(=2), when a cfs_rq is throttled by
      qos, we mark the cfs_rq->throttled as QOS_THROTTLED, will check
      the value of cfs_rq->throttled before unthrottle a cfs_rq.
      Signed-off-by: NZhang Qiao <zhangqiao22@huawei.com>
      Reviewed-by: NChen Hui <judy.chenhui@huawei.com>
      Reviewed-by: Nzheng zucheng <zhengzucheng@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      fbea24f5
  2. 13 12月, 2022 3 次提交
  3. 08 12月, 2022 26 次提交
    • C
      mm/sharepool: Fix a double free problem caused by init_local_group · e7162989
      Chen Jun 提交于
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I64Y5Y
      CVE: NA
      
      -------------------------------
      
      If local_group_add_task fails in init_local_group. ida free the
      same id twice.
      
      init_local_group
        local_group_add_task    // failed
        goto free_spg
      
      free_spg:
        free_sp_group_locked
          free_sp_group_id      // free spg->id
      free_spg_id:
        free_new_spg_id         // double free spg->id
      
      To fix it, return before calling free_new_spg_id.
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com>
      Reviewed-by: Nchenweilong <chenweilong@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      e7162989
    • B
      bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb() · 12f3b19c
      Baisong Zhong 提交于
      stable inclusion
      from stable-v4.19.267
      commit 730fb1ef974a13915bc7651364d8b3318891cd70
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit d3fd203f upstream.
      
      We got a syzkaller problem because of aarch64 alignment fault
      if KFENCE enabled. When the size from user bpf program is an odd
      number, like 399, 407, etc, it will cause the struct skb_shared_info's
      unaligned access. As seen below:
      
        BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
      
        Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
         __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
         arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
         arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
         atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
         __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
         skb_clone+0xf4/0x214 net/core/skbuff.c:1481
         ____bpf_clone_redirect net/core/filter.c:2433 [inline]
         bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
         bpf_prog_d3839dd9068ceb51+0x80/0x330
         bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
         bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
         bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
         bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
         __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
         __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
      
        kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512
      
        allocated by task 15074 on cpu 0 at 1342.585390s:
         kmalloc include/linux/slab.h:568 [inline]
         kzalloc include/linux/slab.h:675 [inline]
         bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
         bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
         bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
         __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
         __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
         __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
      
      To fix the problem, we adjust @size so that (@size + @hearoom) is a
      multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info
      is aligned to a cache line.
      
      Fixes: 1cf1cae9 ("bpf: introduce BPF_PROG_TEST_RUN command")
      Signed-off-by: NBaisong Zhong <zhongbaisong@huawei.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/bpf/20221102081620.1465154-1-zhongbaisong@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      12f3b19c
    • E
      macvlan: enforce a consistent minimal mtu · 8364d44b
      Eric Dumazet 提交于
      stable inclusion
      from stable-v4.19.267
      commit 650137a7c0b2892df2e5b0bc112d7b09a78c93c8
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit b64085b0 upstream.
      
      macvlan should enforce a minimal mtu of 68, even at link creation.
      
      This patch avoids the current behavior (which could lead to crashes
      in ipv6 stack if the link is brought up)
      
      $ ip link add macvlan1 link eno1 mtu 8 type macvlan  # This should fail !
      $ ip link sh dev macvlan1
      5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop
          state DOWN mode DEFAULT group default qlen 1000
          link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff
      $ ip link set macvlan1 mtu 67
      Error: mtu less than device minimum.
      $ ip link set macvlan1 mtu 68
      $ ip link set macvlan1 mtu 8
      Error: mtu less than device minimum.
      
      Fixes: 91572088 ("net: use core MTU range checking in core net infra")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      8364d44b
    • C
      net: macvlan: fix memory leaks of macvlan_common_newlink · aa800fa8
      Chuang Wang 提交于
      stable inclusion
      from stable-v4.19.267
      commit a81b44d1df1f07f00c0dcc0a0b3d2fa24a46289e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 23569b56 ]
      
      kmemleak reports memory leaks in macvlan_common_newlink, as follows:
      
       ip link add link eth0 name .. type macvlan mode source macaddr add
       <MAC-ADDR>
      
      kmemleak reports:
      
      unreferenced object 0xffff8880109bb140 (size 64):
        comm "ip", pid 284, jiffies 4294986150 (age 430.108s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 b8 aa 5a 12 80 88 ff ff  ..........Z.....
          80 1b fa 0d 80 88 ff ff 1e ff ac af c7 c1 6b 6b  ..............kk
        backtrace:
          [<ffffffff813e06a7>] kmem_cache_alloc_trace+0x1c7/0x300
          [<ffffffff81b66025>] macvlan_hash_add_source+0x45/0xc0
          [<ffffffff81b66a67>] macvlan_changelink_sources+0xd7/0x170
          [<ffffffff81b6775c>] macvlan_common_newlink+0x38c/0x5a0
          [<ffffffff81b6797e>] macvlan_newlink+0xe/0x20
          [<ffffffff81d97f8f>] __rtnl_newlink+0x7af/0xa50
          [<ffffffff81d98278>] rtnl_newlink+0x48/0x70
          ...
      
      In the scenario where the macvlan mode is configured as 'source',
      macvlan_changelink_sources() will be execured to reconfigure list of
      remote source mac addresses, at the same time, if register_netdevice()
      return an error, the resource generated by macvlan_changelink_sources()
      is not cleaned up.
      
      Using this patch, in the case of an error, it will execute
      macvlan_flush_sources() to ensure that the resource is cleaned up.
      
      Fixes: aa5fd0fb ("driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.")
      Signed-off-by: NChuang Wang <nashuiliang@gmail.com>
      Link: https://lore.kernel.org/r/20221109090735.690500-1-nashuiliang@gmail.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      aa800fa8
    • A
      ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to network · ebaa932a
      Alexander Potapenko 提交于
      stable inclusion
      from stable-v4.19.267
      commit 6d26d0587abccb9835382a0b53faa7b9b1cd83e3
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit c23fb2c8 ]
      
      When copying a `struct ifaddrlblmsg` to the network, __ifal_reserved
      remained uninitialized, resulting in a 1-byte infoleak:
      
        BUG: KMSAN: kernel-network-infoleak in __netdev_start_xmit ./include/linux/netdevice.h:4841
         __netdev_start_xmit ./include/linux/netdevice.h:4841
         netdev_start_xmit ./include/linux/netdevice.h:4857
         xmit_one net/core/dev.c:3590
         dev_hard_start_xmit+0x1dc/0x800 net/core/dev.c:3606
         __dev_queue_xmit+0x17e8/0x4350 net/core/dev.c:4256
         dev_queue_xmit ./include/linux/netdevice.h:3009
         __netlink_deliver_tap_skb net/netlink/af_netlink.c:307
         __netlink_deliver_tap+0x728/0xad0 net/netlink/af_netlink.c:325
         netlink_deliver_tap net/netlink/af_netlink.c:338
         __netlink_sendskb net/netlink/af_netlink.c:1263
         netlink_sendskb+0x1d9/0x200 net/netlink/af_netlink.c:1272
         netlink_unicast+0x56d/0xf50 net/netlink/af_netlink.c:1360
         nlmsg_unicast ./include/net/netlink.h:1061
         rtnl_unicast+0x5a/0x80 net/core/rtnetlink.c:758
         ip6addrlbl_get+0xfad/0x10f0 net/ipv6/addrlabel.c:628
         rtnetlink_rcv_msg+0xb33/0x1570 net/core/rtnetlink.c:6082
        ...
        Uninit was created at:
         slab_post_alloc_hook+0x118/0xb00 mm/slab.h:742
         slab_alloc_node mm/slub.c:3398
         __kmem_cache_alloc_node+0x4f2/0x930 mm/slub.c:3437
         __do_kmalloc_node mm/slab_common.c:954
         __kmalloc_node_track_caller+0x117/0x3d0 mm/slab_common.c:975
         kmalloc_reserve net/core/skbuff.c:437
         __alloc_skb+0x27a/0xab0 net/core/skbuff.c:509
         alloc_skb ./include/linux/skbuff.h:1267
         nlmsg_new ./include/net/netlink.h:964
         ip6addrlbl_get+0x490/0x10f0 net/ipv6/addrlabel.c:608
         rtnetlink_rcv_msg+0xb33/0x1570 net/core/rtnetlink.c:6082
         netlink_rcv_skb+0x299/0x550 net/netlink/af_netlink.c:2540
         rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6109
         netlink_unicast_kernel net/netlink/af_netlink.c:1319
         netlink_unicast+0x9ab/0xf50 net/netlink/af_netlink.c:1345
         netlink_sendmsg+0xebc/0x10f0 net/netlink/af_netlink.c:1921
        ...
      
      This patch ensures that the reserved field is always initialized.
      
      Reported-by: syzbot+3553517af6020c4f2813f1003fe76ef3cbffe98d@syzkaller.appspotmail.com
      Fixes: 2a8cc6c8 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.")
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Reviewed-by: NDavid Ahern <dsahern@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      ebaa932a
    • J
      net: gso: fix panic on frag_list with mixed head alloc types · d6f805ed
      Jiri Benc 提交于
      stable inclusion
      from stable-v4.19.267
      commit bd5362e58721e4d0d1a37796593bd6e51536ce7a
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 9e4b7a99 ]
      
      Since commit 3dcbdb13 ("net: gso: Fix skb_segment splat when
      splitting gso_size mangled skb having linear-headed frag_list"), it is
      allowed to change gso_size of a GRO packet. However, that commit assumes
      that "checking the first list_skb member suffices; i.e if either of the
      list_skb members have non head_frag head, then the first one has too".
      
      It turns out this assumption does not hold. We've seen BUG_ON being hit
      in skb_segment when skbs on the frag_list had differing head_frag with
      the vmxnet3 driver. This happens because __netdev_alloc_skb and
      __napi_alloc_skb can return a skb that is page backed or kmalloced
      depending on the requested size. As the result, the last small skb in
      the GRO packet can be kmalloced.
      
      There are three different locations where this can be fixed:
      
      (1) We could check head_frag in GRO and not allow GROing skbs with
          different head_frag. However, that would lead to performance
          regression on normal forward paths with unmodified gso_size, where
          !head_frag in the last packet is not a problem.
      
      (2) Set a flag in bpf_skb_net_grow and bpf_skb_net_shrink indicating
          that NETIF_F_SG is undesirable. That would need to eat a bit in
          sk_buff. Furthermore, that flag can be unset when all skbs on the
          frag_list are page backed. To retain good performance,
          bpf_skb_net_grow/shrink would have to walk the frag_list.
      
      (3) Walk the frag_list in skb_segment when determining whether
          NETIF_F_SG should be cleared. This of course slows things down.
      
      This patch implements (3). To limit the performance impact in
      skb_segment, the list is walked only for skbs with SKB_GSO_DODGY set
      that have gso_size changed. Normal paths thus will not hit it.
      
      We could check only the last skb but since we need to walk the whole
      list anyway, let's stay on the safe side.
      
      Fixes: 3dcbdb13 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list")
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Reviewed-by: NWillem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/e04426a6a91baf4d1081e1b478c82b5de25fdf21.1667407944.git.jbenc@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      d6f805ed
    • K
      tcp/udp: Make early_demux back namespacified. · 41b39f2b
      Kuniyuki Iwashima 提交于
      stable inclusion
      from stable-v4.19.265
      commit 7162f05f1f0f2b9554ea207842a1389c067cc1db
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 11052589 upstream.
      
      Commit e21145a9 ("ipv4: namespacify ip_early_demux sysctl knob") made
      it possible to enable/disable early_demux on a per-netns basis.  Then, we
      introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for
      TCP/UDP in commit dddb64bc ("net: Add sysctl to toggle early demux for
      tcp and udp").  However, the .proc_handler() was wrong and actually
      disabled us from changing the behaviour in each netns.
      
      We can execute early_demux if net.ipv4.ip_early_demux is on and each proto
      .early_demux() handler is not NULL.  When we toggle (tcp|udp)_early_demux,
      the change itself is saved in each netns variable, but the .early_demux()
      handler is a global variable, so the handler is switched based on the
      init_net's sysctl variable.  Thus, netns (tcp|udp)_early_demux knobs have
      nothing to do with the logic.  Whether we CAN execute proto .early_demux()
      is always decided by init_net's sysctl knob, and whether we DO it or not is
      by each netns ip_early_demux knob.
      
      This patch namespacifies (tcp|udp)_early_demux again.  For now, the users
      of the .early_demux() handler are TCP and UDP only, and they are called
      directly to avoid retpoline.  So, we can remove the .early_demux() handler
      from inet6?_protos and need not dereference them in ip6?_rcv_finish_core().
      If another proto needs .early_demux(), we can restore it at that time.
      
      Fixes: dddb64bc ("net: Add sysctl to toggle early demux for tcp and udp")
      Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      41b39f2b
    • Z
      ipv6: fix WARNING in ip6_route_net_exit_late() · 05a9ea59
      Zhengchao Shao 提交于
      stable inclusion
      from stable-v4.19.265
      commit 83fbf246ced54dadd7b9adc2a16efeff30ba944d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 768b3c74 ]
      
      During the initialization of ip6_route_net_init_late(), if file
      ipv6_route or rt6_stats fails to be created, the initialization is
      successful by default. Therefore, the ipv6_route or rt6_stats file
      doesn't be found during the remove in ip6_route_net_exit_late(). It
      will cause WRNING.
      
      The following is the stack information:
      name 'rt6_stats'
      WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460
      Modules linked in:
      Workqueue: netns cleanup_net
      RIP: 0010:remove_proc_entry+0x389/0x460
      PKRU: 55555554
      Call Trace:
      <TASK>
      ops_exit_list+0xb0/0x170
      cleanup_net+0x4ea/0xb00
      process_one_work+0x9bf/0x1710
      worker_thread+0x665/0x1080
      kthread+0x2e4/0x3a0
      ret_from_fork+0x1f/0x30
      </TASK>
      
      Fixes: cdb18761 ("[NETNS][IPV6] route6 - create route6 proc files for the namespace")
      Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      05a9ea59
    • C
      net, neigh: Fix null-ptr-deref in neigh_table_clear() · d31519a4
      Chen Zhongjin 提交于
      stable inclusion
      from stable-v4.19.265
      commit b736592de2aa53aee2d48d6b129bc0c892007bbe
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit f8017317 ]
      
      When IPv6 module gets initialized but hits an error in the middle,
      kenel panic with:
      
      KASAN: null-ptr-deref in range [0x0000000000000598-0x000000000000059f]
      CPU: 1 PID: 361 Comm: insmod
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      RIP: 0010:__neigh_ifdown.isra.0+0x24b/0x370
      RSP: 0018:ffff888012677908 EFLAGS: 00000202
      ...
      Call Trace:
       <TASK>
       neigh_table_clear+0x94/0x2d0
       ndisc_cleanup+0x27/0x40 [ipv6]
       inet6_init+0x21c/0x2cb [ipv6]
       do_one_initcall+0xd3/0x4d0
       do_init_module+0x1ae/0x670
      ...
      Kernel panic - not syncing: Fatal exception
      
      When ipv6 initialization fails, it will try to cleanup and calls:
      
      neigh_table_clear()
        neigh_ifdown(tbl, NULL)
          pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev == NULL))
          # dev_net(NULL) triggers null-ptr-deref.
      
      Fix it by passing NULL to pneigh_queue_purge() in neigh_ifdown() if dev
      is NULL, to make kernel not panic immediately.
      
      Fixes: 66ba215c ("neigh: fix possible DoS due to net iface start/stop loop")
      Signed-off-by: NChen Zhongjin <chenzhongjin@huawei.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NDenis V. Lunev <den@openvz.org>
      Link: https://lore.kernel.org/r/20221101121552.21890-1-chenzhongjin@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      d31519a4
    • N
      tcp: fix indefinite deferral of RTO with SACK reneging · e6d23a4b
      Neal Cardwell 提交于
      stable inclusion
      from stable-v4.19.264
      commit 633da7b30b240000b1d9b690e43848406a0d060f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 3d2af9cc ]
      
      This commit fixes a bug that can cause a TCP data sender to repeatedly
      defer RTOs when encountering SACK reneging.
      
      The bug is that when we're in fast recovery in a scenario with SACK
      reneging, every time we get an ACK we call tcp_check_sack_reneging()
      and it can note the apparent SACK reneging and rearm the RTO timer for
      srtt/2 into the future. In some SACK reneging scenarios that can
      happen repeatedly until the receive window fills up, at which point
      the sender can't send any more, the ACKs stop arriving, and the RTO
      fires at srtt/2 after the last ACK. But that can take far too long
      (O(10 secs)), since the connection is stuck in fast recovery with a
      low cwnd that cannot grow beyond ssthresh, even if more bandwidth is
      available.
      
      This fix changes the logic in tcp_check_sack_reneging() to only rearm
      the RTO timer if data is cumulatively ACKed, indicating forward
      progress. This avoids this kind of nearly infinite loop of RTO timer
      re-arming. In addition, this meets the goals of
      tcp_check_sack_reneging() in handling Windows TCP behavior that looks
      temporarily like SACK reneging but is not really.
      
      Many thanks to Jakub Kicinski and Neil Spring, who reported this issue
      and provided critical packet traces that enabled root-causing this
      issue. Also, many thanks to Jakub Kicinski for testing this fix.
      
      Fixes: 5ae344c9 ("tcp: reduce spurious retransmits due to transient SACK reneging")
      Reported-by: NJakub Kicinski <kuba@kernel.org>
      Reported-by: NNeil Spring <ntspring@fb.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Tested-by: NJakub Kicinski <kuba@kernel.org>
      Link: https://lore.kernel.org/r/20221021170821.1093930-1-ncardwell.kernel@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      e6d23a4b
    • Z
      net: fix UAF issue in nfqnl_nf_hook_drop() when ops_init() failed · c50c8779
      Zhengchao Shao 提交于
      stable inclusion
      from stable-v4.19.264
      commit 5a2ea549be94924364f6911227d99be86e8cf34a
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit d266935a ]
      
      When the ops_init() interface is invoked to initialize the net, but
      ops->init() fails, data is released. However, the ptr pointer in
      net->gen is invalid. In this case, when nfqnl_nf_hook_drop() is invoked
      to release the net, invalid address access occurs.
      
      The process is as follows:
      setup_net()
      	ops_init()
      		data = kzalloc(...)   ---> alloc "data"
      		net_assign_generic()  ---> assign "date" to ptr in net->gen
      		...
      		ops->init()           ---> failed
      		...
      		kfree(data);          ---> ptr in net->gen is invalid
      	...
      	ops_exit_list()
      		...
      		nfqnl_nf_hook_drop()
      			*q = nfnl_queue_pernet(net) ---> q is invalid
      
      The following is the Call Trace information:
      BUG: KASAN: use-after-free in nfqnl_nf_hook_drop+0x264/0x280
      Read of size 8 at addr ffff88810396b240 by task ip/15855
      Call Trace:
      <TASK>
      dump_stack_lvl+0x8e/0xd1
      print_report+0x155/0x454
      kasan_report+0xba/0x1f0
      nfqnl_nf_hook_drop+0x264/0x280
      nf_queue_nf_hook_drop+0x8b/0x1b0
      __nf_unregister_net_hook+0x1ae/0x5a0
      nf_unregister_net_hooks+0xde/0x130
      ops_exit_list+0xb0/0x170
      setup_net+0x7ac/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      </TASK>
      
      Allocated by task 15855:
      kasan_save_stack+0x1e/0x40
      kasan_set_track+0x21/0x30
      __kasan_kmalloc+0xa1/0xb0
      __kmalloc+0x49/0xb0
      ops_init+0xe7/0x410
      setup_net+0x5aa/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Freed by task 15855:
      kasan_save_stack+0x1e/0x40
      kasan_set_track+0x21/0x30
      kasan_save_free_info+0x2a/0x40
      ____kasan_slab_free+0x155/0x1b0
      slab_free_freelist_hook+0x11b/0x220
      __kmem_cache_free+0xa4/0x360
      ops_init+0xb9/0x410
      setup_net+0x5aa/0xbd0
      copy_net_ns+0x2e6/0x6b0
      create_new_namespaces+0x382/0xa50
      unshare_nsproxy_namespaces+0xa6/0x1c0
      ksys_unshare+0x3a4/0x7e0
      __x64_sys_unshare+0x2d/0x40
      do_syscall_64+0x35/0x80
      entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: f875bae0 ("net: Automatically allocate per namespace data.")
      Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      c50c8779
    • I
      serial: 8250: Flush DMA Rx on RLSI · 176d2de6
      Ilpo Järvinen 提交于
      stable inclusion
      from stable-v4.19.267
      commit 40f5fa26c11bca5c947d218cc4fe6e0c64932070
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 1980860e upstream.
      
      Returning true from handle_rx_dma() without flushing DMA first creates
      a data ordering hazard. If DMA Rx has handled any character at the
      point when RLSI occurs, the non-DMA path handles any pending characters
      jumping them ahead of those characters that are pending under DMA.
      
      Fixes: 75df022b ("serial: 8250_dma: Fix RX handling")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      176d2de6
    • I
      serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs · 80703cbe
      Ilpo Järvinen 提交于
      stable inclusion
      from stable-v4.19.267
      commit 62cda857457c7de0922852d54d69b140bd6eeb7e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit a931237c upstream.
      
      DW UART sometimes triggers IIR_RDI during DMA Rx when IIR_RX_TIMEOUT
      should have been triggered instead. Since IIR_RDI has higher priority
      than IIR_RX_TIMEOUT, this causes the Rx to hang into interrupt loop.
      The problem seems to occur at least with some combinations of
      small-sized transfers (I've reproduced the problem on Elkhart Lake PSE
      UARTs).
      
      If there's already an on-going Rx DMA and IIR_RDI triggers, fall
      graciously back to non-DMA Rx. That is, behave as if IIR_RX_TIMEOUT had
      occurred.
      
      8250_omap already considers IIR_RDI similar to this change so its
      nothing unheard of.
      
      Fixes: 75df022b ("serial: 8250_dma: Fix RX handling")
      Cc: <stable@vger.kernel.org>
      Co-developed-by: NSrikanth Thokala <srikanth.thokala@intel.com>
      Signed-off-by: NSrikanth Thokala <srikanth.thokala@intel.com>
      Co-developed-by: NAman Kumar <aman.kumar@intel.com>
      Signed-off-by: NAman Kumar <aman.kumar@intel.com>
      Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Link: https://lore.kernel.org/r/20221108121952.5497-2-ilpo.jarvinen@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      80703cbe
    • G
      capabilities: fix potential memleak on error path from vfs_getxattr_alloc() · 779c11cb
      Gaosheng Cui 提交于
      stable inclusion
      from stable-v4.19.265
      commit 90577bcc01c4188416a47269f8433f70502abe98
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 8cf0a1bc upstream.
      
      In cap_inode_getsecurity(), we will use vfs_getxattr_alloc() to
      complete the memory allocation of tmpbuf, if we have completed
      the memory allocation of tmpbuf, but failed to call handler->get(...),
      there will be a memleak in below logic:
      
        |-- ret = (int)vfs_getxattr_alloc(mnt_userns, ...)
          |           /* ^^^ alloc for tmpbuf */
          |-- value = krealloc(*xattr_value, error + 1, flags)
          |           /* ^^^ alloc memory */
          |-- error = handler->get(handler, ...)
          |           /* error! */
          |-- *xattr_value = value
          |           /* xattr_value is &tmpbuf (memory leak!) */
      
      So we will try to free(tmpbuf) after vfs_getxattr_alloc() fails to fix it.
      
      Cc: stable@vger.kernel.org
      Fixes: 8db6c34f ("Introduce v3 namespaced file capabilities")
      Signed-off-by: NGaosheng Cui <cuigaosheng1@huawei.com>
      Acked-by: NSerge Hallyn <serge@hallyn.com>
      [PM: subject line and backtrace tweaks]
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      779c11cb
    • A
      security: commoncap: fix -Wstringop-overread warning · 7e2d8bfc
      Arnd Bergmann 提交于
      stable inclusion
      from stable-v4.19.191
      commit 2f34dd12fd7a28888286924d74c0313532bc52d8
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 82e5d8cc upstream.
      
      gcc-11 introdces a harmless warning for cap_inode_getsecurity:
      
      security/commoncap.c: In function ‘cap_inode_getsecurity’:
      security/commoncap.c:440:33: error: ‘memcpy’ reading 16 bytes from a region of size 0 [-Werror=stringop-overread]
        440 |                                 memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32);
            |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The problem here is that tmpbuf is initialized to NULL, so gcc assumes
      it is not accessible unless it gets set by vfs_getxattr_alloc().  This is
      a legitimate warning as far as I can tell, but the code is correct since
      it correctly handles the error when that function fails.
      
      Add a separate NULL check to tell gcc about it as well.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NChristian Brauner <christian.brauner@ubuntu.com>
      Signed-off-by: NJames Morris <jamorris@linux.microsoft.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      7e2d8bfc
    • D
      ring_buffer: Do not deactivate non-existant pages · ee1cf85b
      Daniil Tatianin 提交于
      stable inclusion
      from stable-v4.19.267
      commit 455ea324770205525cbc13f155806a5346794339
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 56f4ca0a upstream.
      
      rb_head_page_deactivate() expects cpu_buffer to contain a valid list of
      ->pages, so verify that the list is actually present before calling it.
      
      Found by Linux Verification Center (linuxtesting.org) with the SVACE
      static analysis tool.
      
      Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru
      
      Cc: stable@vger.kernel.org
      Fixes: 77ae365e ("ring-buffer: make lockless")
      Signed-off-by: NDaniil Tatianin <d-tatianin@yandex-team.ru>
      Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      ee1cf85b
    • X
      ftrace: Fix null pointer dereference in ftrace_add_mod() · 33ffa638
      Xiu Jianfeng 提交于
      stable inclusion
      from stable-v4.19.267
      commit b5bfc61f541d3f092b13dedcfe000d86eb8e133c
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 19ba6c8a upstream.
      
      The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next}
      of @ftrace_mode->list are NULL, it's not a valid state to call list_del().
      If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free
      tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del()
      will write prev->next and next->prev, where null pointer dereference
      happens.
      
      BUG: kernel NULL pointer dereference, address: 0000000000000008
      Oops: 0002 [#1] PREEMPT SMP NOPTI
      Call Trace:
       <TASK>
       ftrace_mod_callback+0x20d/0x220
       ? do_filp_open+0xd9/0x140
       ftrace_process_regex.isra.51+0xbf/0x130
       ftrace_regex_write.isra.52.part.53+0x6e/0x90
       vfs_write+0xee/0x3a0
       ? __audit_filter_op+0xb1/0x100
       ? auditd_test_task+0x38/0x50
       ksys_write+0xa5/0xe0
       do_syscall_64+0x3a/0x90
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      Kernel panic - not syncing: Fatal exception
      
      So call INIT_LIST_HEAD() to initialize the list member to fix this issue.
      
      Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com
      
      Cc: stable@vger.kernel.org
      Fixes: 673feb9d ("ftrace: Add :mod: caching infrastructure to trace_array")
      Signed-off-by: NXiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      33ffa638
    • W
      ftrace: Optimize the allocation for mcount entries · 3d193fb1
      Wang Wensheng 提交于
      stable inclusion
      from stable-v4.19.267
      commit d110bb57a7e9831465aa3abb6c0d1cc658b05fbe
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit bcea02b0 upstream.
      
      If we can't allocate this size, try something smaller with half of the
      size. Its order should be decreased by one instead of divided by two.
      
      Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Cc: <mark.rutland@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: a7900875 ("ftrace: Allocate the mcount record pages as groups")
      Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com>
      Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      3d193fb1
    • L
      kprobe: reverse kp->flags when arm_kprobe failed · 04150923
      Li Qiang 提交于
      stable inclusion
      from stable-v4.19.265
      commit d608ed66abfaccc233404be2583ab89c37e560fc
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 4a6f316d upstream.
      
      In aggregate kprobe case, when arm_kprobe failed,
      we need set the kp->flags with KPROBE_FLAG_DISABLED again.
      If not, the 'kp' kprobe will been considered as enabled
      but it actually not enabled.
      
      Link: https://lore.kernel.org/all/20220902155820.34755-1-liq3ea@163.com/
      
      Fixes: 12310e34 ("kprobes: Propagate error from arm_kprobe_ftrace()")
      Cc: stable@vger.kernel.org
      Signed-off-by: NLi Qiang <liq3ea@163.com>
      Acked-by: NMasami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: NMasami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      04150923
    • A
      mm: fs: initialize fsdata passed to write_begin/write_end interface · c01f46a9
      Alexander Potapenko 提交于
      stable inclusion
      from stable-v4.19.267
      commit 8a5be2948f350d34b1f6acb9ca3be4c89359a057
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 1468c6f4 upstream.
      
      Functions implementing the a_ops->write_end() interface accept the `void
      *fsdata` parameter that is supposed to be initialized by the corresponding
      a_ops->write_begin() (which accepts `void **fsdata`).
      
      However not all a_ops->write_begin() implementations initialize `fsdata`
      unconditionally, so it may get passed uninitialized to a_ops->write_end(),
      resulting in undefined behavior.
      
      Fix this by initializing fsdata with NULL before the call to
      write_begin(), rather than doing so in all possible a_ops implementations.
      
      This patch covers only the following cases found by running x86 KMSAN
      under syzkaller:
      
       - generic_perform_write()
       - cont_expand_zero() and generic_cont_expand_simple()
       - page_symlink()
      
      Other cases of passing uninitialized fsdata may persist in the codebase.
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.comSigned-off-by: NAlexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      c01f46a9
    • Z
      nfs4: Fix kmemleak when allocate slot failed · 920f74ac
      Zhang Xiaoxu 提交于
      stable inclusion
      from stable-v4.19.265
      commit 86ce0e93cf6fb4d0c447323ac66577c642628b9d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 7e843672 ]
      
      If one of the slot allocate failed, should cleanup all the other
      allocated slots, otherwise, the allocated slots will leak:
      
        unreferenced object 0xffff8881115aa100 (size 64):
          comm ""mount.nfs"", pid 679, jiffies 4294744957 (age 115.037s)
          hex dump (first 32 bytes):
            00 cc 19 73 81 88 ff ff 00 a0 5a 11 81 88 ff ff  ...s......Z.....
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<000000007a4c434a>] nfs4_find_or_create_slot+0x8e/0x130
            [<000000005472a39c>] nfs4_realloc_slot_table+0x23f/0x270
            [<00000000cd8ca0eb>] nfs40_init_client+0x4a/0x90
            [<00000000128486db>] nfs4_init_client+0xce/0x270
            [<000000008d2cacad>] nfs4_set_client+0x1a2/0x2b0
            [<000000000e593b52>] nfs4_create_server+0x300/0x5f0
            [<00000000e4425dd2>] nfs4_try_get_tree+0x65/0x110
            [<00000000d3a6176f>] vfs_get_tree+0x41/0xf0
            [<0000000016b5ad4c>] path_mount+0x9b3/0xdd0
            [<00000000494cae71>] __x64_sys_mount+0x190/0x1d0
            [<000000005d56bdec>] do_syscall_64+0x35/0x80
            [<00000000687c9ae4>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: abf79bb3 ("NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking")
      Signed-off-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      920f74ac
    • C
      kernfs: fix use-after-free in __kernfs_remove · a1a691b4
      Christian A. Ehrhardt 提交于
      stable inclusion
      from stable-v4.19.264
      commit 028cf780743eea79abffa7206b9dcfc080ad3546
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 4abc9965 upstream.
      
      Syzkaller managed to trigger concurrent calls to
      kernfs_remove_by_name_ns() for the same file resulting in
      a KASAN detected use-after-free. The race occurs when the root
      node is freed during kernfs_drain().
      
      To prevent this acquire an additional reference for the root
      of the tree that is removed before calling __kernfs_remove().
      
      Found by syzkaller with the following reproducer (slab_nomerge is
      required):
      
      syz_mount_image$ext4(0x0, &(0x7f0000000100)='./file0\x00', 0x100000, 0x0, 0x0, 0x0, 0x0)
      r0 = openat(0xffffffffffffff9c, &(0x7f0000000080)='/proc/self/exe\x00', 0x0, 0x0)
      close(r0)
      pipe2(&(0x7f0000000140)={0xffffffffffffffff, <r1=>0xffffffffffffffff}, 0x800)
      mount$9p_fd(0x0, &(0x7f0000000040)='./file0\x00', &(0x7f00000000c0), 0x408, &(0x7f0000000280)={'trans=fd,', {'rfdno', 0x3d, r0}, 0x2c, {'wfdno', 0x3d, r1}, 0x2c, {[{@cache_loose}, {@mmap}, {@loose}, {@loose}, {@mmap}], [{@mask={'mask', 0x3d, '^MAY_EXEC'}}, {@fsmagic={'fsmagic', 0x3d, 0x10001}}, {@dont_hash}]}})
      
      Sample report:
      
      ==================================================================
      BUG: KASAN: use-after-free in kernfs_type include/linux/kernfs.h:335 [inline]
      BUG: KASAN: use-after-free in kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline]
      BUG: KASAN: use-after-free in __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369
      Read of size 2 at addr ffff8880088807f0 by task syz-executor.2/857
      
      CPU: 0 PID: 857 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3 #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x6e/0x91 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:317 [inline]
       print_report.cold+0x5e/0x5e5 mm/kasan/report.c:433
       kasan_report+0xa3/0x130 mm/kasan/report.c:495
       kernfs_type include/linux/kernfs.h:335 [inline]
       kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline]
       __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369
       __kernfs_remove fs/kernfs/dir.c:1356 [inline]
       kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589
       sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943
       __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
       create_cache mm/slab_common.c:229 [inline]
       kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
       p9_client_create+0xd4d/0x1190 net/9p/client.c:993
       v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
       v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
       legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
       vfs_get_tree+0x85/0x2e0 fs/super.c:1530
       do_new_mount fs/namespace.c:3040 [inline]
       path_mount+0x675/0x1d00 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x282/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f725f983aed
      Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f725f0f7028 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 00007f725faa3f80 RCX: 00007f725f983aed
      RDX: 00000000200000c0 RSI: 0000000020000040 RDI: 0000000000000000
      RBP: 00007f725f9f419c R08: 0000000020000280 R09: 0000000000000000
      R10: 0000000000000408 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000006 R14: 00007f725faa3f80 R15: 00007f725f0d7000
       </TASK>
      
      Allocated by task 855:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:45 [inline]
       set_alloc_info mm/kasan/common.c:437 [inline]
       __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:470
       kasan_slab_alloc include/linux/kasan.h:224 [inline]
       slab_post_alloc_hook mm/slab.h:727 [inline]
       slab_alloc_node mm/slub.c:3243 [inline]
       slab_alloc mm/slub.c:3251 [inline]
       __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
       kmem_cache_alloc+0xbf/0x200 mm/slub.c:3268
       kmem_cache_zalloc include/linux/slab.h:723 [inline]
       __kernfs_new_node+0xd4/0x680 fs/kernfs/dir.c:593
       kernfs_new_node fs/kernfs/dir.c:655 [inline]
       kernfs_create_dir_ns+0x9c/0x220 fs/kernfs/dir.c:1010
       sysfs_create_dir_ns+0x127/0x290 fs/sysfs/dir.c:59
       create_dir lib/kobject.c:63 [inline]
       kobject_add_internal+0x24a/0x8d0 lib/kobject.c:223
       kobject_add_varg lib/kobject.c:358 [inline]
       kobject_init_and_add+0x101/0x160 lib/kobject.c:441
       sysfs_slab_add+0x156/0x1e0 mm/slub.c:5954
       __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
       create_cache mm/slab_common.c:229 [inline]
       kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
       p9_client_create+0xd4d/0x1190 net/9p/client.c:993
       v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
       v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
       legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
       vfs_get_tree+0x85/0x2e0 fs/super.c:1530
       do_new_mount fs/namespace.c:3040 [inline]
       path_mount+0x675/0x1d00 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x282/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 857:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:45
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:367 [inline]
       ____kasan_slab_free mm/kasan/common.c:329 [inline]
       __kasan_slab_free+0x108/0x190 mm/kasan/common.c:375
       kasan_slab_free include/linux/kasan.h:200 [inline]
       slab_free_hook mm/slub.c:1754 [inline]
       slab_free_freelist_hook mm/slub.c:1780 [inline]
       slab_free mm/slub.c:3534 [inline]
       kmem_cache_free+0x9c/0x340 mm/slub.c:3551
       kernfs_put.part.0+0x2b2/0x520 fs/kernfs/dir.c:547
       kernfs_put+0x42/0x50 fs/kernfs/dir.c:521
       __kernfs_remove.part.0+0x72d/0x960 fs/kernfs/dir.c:1407
       __kernfs_remove fs/kernfs/dir.c:1356 [inline]
       kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589
       sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943
       __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
       create_cache mm/slab_common.c:229 [inline]
       kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
       p9_client_create+0xd4d/0x1190 net/9p/client.c:993
       v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
       v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
       legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
       vfs_get_tree+0x85/0x2e0 fs/super.c:1530
       do_new_mount fs/namespace.c:3040 [inline]
       path_mount+0x675/0x1d00 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x282/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The buggy address belongs to the object at ffff888008880780
       which belongs to the cache kernfs_node_cache of size 128
      The buggy address is located 112 bytes inside of
       128-byte region [ffff888008880780, ffff888008880800)
      
      The buggy address belongs to the physical page:
      page:00000000732833f8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8880
      flags: 0x100000000000200(slab|node=0|zone=1)
      raw: 0100000000000200 0000000000000000 dead000000000122 ffff888001147280
      raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888008880680: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
       ffff888008880700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      >ffff888008880780: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                   ^
       ffff888008880800: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
       ffff888008880880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      ==================================================================
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: stable <stable@kernel.org> # -rc3
      Signed-off-by: NChristian A. Ehrhardt <lk@c--e.de>
      Link: https://lore.kernel.org/r/20220913121723.691454-1-lk@c--e.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      a1a691b4
    • R
      mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages · 7a5b0955
      Rik van Riel 提交于
      stable inclusion
      from stable-v4.19.264
      commit 2b35432d324898ec41beb27031d2a1a864a4d40e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 12df140f upstream.
      
      The h->*_huge_pages counters are protected by the hugetlb_lock, but
      alloc_huge_page has a corner case where it can decrement the counter
      outside of the lock.
      
      This could lead to a corrupted value of h->resv_huge_pages, which we have
      observed on our systems.
      
      Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a
      potential race.
      
      Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
      Fixes: a88c7695 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
      Signed-off-by: NRik van Riel <riel@surriel.com>
      Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Glen McCready <gkmccready@meta.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      7a5b0955
    • S
      mm: /proc/pid/smaps_rollup: fix no vma's null-deref · 33213b46
      Seth Jenkins 提交于
      stable inclusion
      from stable-v4.19.264
      commit dbe863bce7679c7f5ec0e993d834fe16c5e687b5
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      Commit 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value
      seq_file") introduced a null-deref if there are no vma's in the task in
      show_smaps_rollup.
      
      Fixes: 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value seq_file")
      Signed-off-by: NSeth Jenkins <sethjenkins@google.com>
      Reviewed-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Tested-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      33213b46
    • X
      signal handling: don't use BUG_ON() for debugging · a2f88993
      Xia Fukun 提交于
      stable inclusion
      from stable-v4.19.267
      commit 93d9cef55f8fe463e3b9f6c73c7a32619222c657
      category: bugfix
      bugzilla: 187828, https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit a382f8fe ]
      
      These are indeed "should not happen" situations, but it turns out recent
      changes made the 'task_is_stopped_or_trace()' case trigger (fix for that
      exists, is pending more testing), and the BUG_ON() makes it
      unnecessarily hard to actually debug for no good reason.
      
      It's been that way for a long time, but let's make it clear: BUG_ON() is
      not good for debugging, and should never be used in situations where you
      could just say "this shouldn't happen, but we can continue".
      
      Use WARN_ON_ONCE() instead to make sure it gets logged, and then just
      continue running.  Instead of making the system basically unusuable
      because you crashed the machine while potentially holding some very core
      locks (eg this function is commonly called while holding 'tasklist_lock'
      for writing).
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NXia Fukun <xiafukun@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      a2f88993
    • X
      ida: don't use BUG_ON() for debugging · eaab6483
      Xia Fukun 提交于
      stable inclusion
      from stable-v4.19.267
      commit 33d2f83e3f2c1fdabb365d25bed3aa630041cbc0
      category: bugfix
      bugzilla: 188002, https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit fc82bbf4 upstream.
      
      This is another old BUG_ON() that just shouldn't exist (see also commit
      a382f8fe: "signal handling: don't use BUG_ON() for debugging").
      
      In fact, as Matthew Wilcox points out, this condition shouldn't really
      even result in a warning, since a negative id allocation result is just
      a normal allocation failure:
      
        "I wonder if we should even warn here -- sure, the caller is trying to
         free something that wasn't allocated, but we don't warn for
         kfree(NULL)"
      
      and goes on to point out how that current error check is only causing
      people to unnecessarily do their own index range checking before freeing
      it.
      
      This was noted by Itay Iellin, because the bluetooth HCI socket cookie
      code does *not* do that range checking, and ends up just freeing the
      error case too, triggering the BUG_ON().
      
      The HCI code requires CAP_NET_RAW, and seems to just result in an ugly
      splat, but there really is no reason to BUG_ON() here, and we have
      generally striven for allocation models where it's always ok to just do
      
          free(alloc());
      
      even if the allocation were to fail for some random reason (usually
      obviously that "random" reason being some resource limit).
      
      Fixes: 88eca020 ("ida: simplified functions for id allocation")
      Reported-by: NItay Iellin <ieitayie@gmail.com>
      Suggested-by: NMatthew Wilcox <willy@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NXia Fukun <xiafukun@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      eaab6483
  4. 06 12月, 2022 1 次提交
    • O
      !272 [openEuler-1.0-LTS] Add MWAIT Cx support for Zhaoxin CPUs. · 75ea48ac
      openeuler-ci-bot 提交于
      Merge Pull Request from: @leoliu-oc 
       
      When the processor is idle,low-power idle states (C-states) can be used to save power. For Zhaoxin processors,there are two methods to enter idle states. One is HLT instruction and legacy method of I/O reads from the CPI-defined register (known as P_LVLx),the other one is MWAIT instruction with idle states hints.
      
      Default for legacy operating system,HLT and P_LVLx I/O reads are used for Zhaoxin Processors to enter idle states, but we have checked on some Zhaoxin platform that MWAIT instruction is more efficient than P_LVLx I/O reads and HLT, so we add MWAIT Cx support for Zhaoxin Processors.
      
      ### Issue
      https://gitee.com/openeuler/kernel/issues/I62TOM
      
      ### Test
      N/A
      
      ### Known Issue
      N/A
      
      ### Default config change
      N/A
      
       
       
      Link:https://gitee.com/openeuler/kernel/pulls/272 
      Reviewed-by: Laibin Qiu <qiulaibin@huawei.com> 
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> 
      75ea48ac
  5. 05 12月, 2022 3 次提交
  6. 29 11月, 2022 4 次提交