1. 20 Jun, 2023 6 commits
  2. 19 Jun, 2023 2 commits
  3. 16 Jun, 2023 2 commits
  4. 15 Jun, 2023 9 commits
    • H
      sched: Fix negative count for jump label · cde6dbb8
      Hui Tang committed
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7DA63
      CVE: NA
      
      --------------------------------
      
      Add mutex lock to prevent negative count for jump label.
      
      [28612.530675] ------------[ cut here ]------------
      [28612.532708] jump label: negative count!
      [28612.535031] WARNING: CPU: 4 PID: 3899 at kernel/jump_label.c:202
      	__static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.538216] Kernel panic - not syncing: panic_on_warn set ...
      [28612.538216]
      [28612.540487] CPU: 4 PID: 3899 Comm: sh Kdump: loaded Not tainted
      [28612.542788] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      [28612.546455] Call Trace:
      [28612.547339]  dump_stack+0xc6/0x11e
      [28612.548546]  ? __static_key_slow_dec_cpuslocked+0x200/0x240
      [28612.550352]  panic+0x1d6/0x46b
      [28612.551375]  ? refcount_error_report+0x2a5/0x2a5
      [28612.552915]  ? kmsg_dump_rewind_nolock+0xde/0xde
      [28612.554358]  ? sched_clock_cpu+0x18/0x1b0
      [28612.555699]  ? __warn+0x1d1/0x210
      [28612.556799]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.558548]  __warn+0x1ec/0x210
      [28612.559621]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.561536]  report_bug+0x1ee/0x2b0
      [28612.562706]  fixup_bug.part.4+0x37/0x80
      [28612.563937]  do_error_trap+0x21c/0x260
      [28612.565109]  ? fixup_bug.part.4+0x80/0x80
      [28612.566453]  ? check_preemption_disabled+0x34/0x1f0
      [28612.567991]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [28612.569534]  ? lockdep_hardirqs_off+0x1cb/0x2b0
      [28612.570993]  ? error_entry+0x9a/0x130
      [28612.572138]  ? trace_hardirqs_off_caller+0x59/0x1a0
      [28612.573710]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [28612.575232]  invalid_op+0x14/0x20
      [28612.576387]  ? vprintk_func+0x68/0x1a0
      [28612.577827]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.579662]  ? __static_key_slow_dec_cpuslocked+0x204/0x240
      [28612.581781]  ? static_key_disable+0x30/0x30
      [28612.583248]  ? static_key_slow_dec+0x57/0x90
      [28612.584997]  ? tg_set_dynamic_affinity_mode+0x42/0x70
      [28612.586714]  ? cgroup_file_write+0x471/0x6a0
      [28612.588162]  ? cgroup_css.part.4+0x100/0x100
      [28612.589579]  ? cgroup_css.part.4+0x100/0x100
      [28612.591031]  ? kernfs_fop_write+0x2af/0x430
      [28612.592625]  ? kernfs_vma_page_mkwrite+0x230/0x230
      [28612.594274]  ? __vfs_write+0xef/0x680
      [28612.595590]  ? kernel_read+0x110/0x110
      [28612.596899]  ? check_preemption_disabled+0x34/0x1f0
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
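The commit message only says that a mutex was added. As a minimal user-space sketch of why serializing the mode switch keeps the count from going negative, assume the count is changed only on a real state transition and that the check and the update happen under one lock. The names `mode_lock`, `key_count`, `mode_enabled` and `set_mode()` are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the static key and the mode flag. */
static pthread_mutex_t mode_lock = PTHREAD_MUTEX_INITIALIZER;
static int key_count;      /* plays the role of the static key's enable count */
static bool mode_enabled;  /* current dynamic affinity mode */

/* Switch the mode; the count changes only on a real state transition. */
static int set_mode(bool enable)
{
    int count;

    pthread_mutex_lock(&mode_lock);
    if (enable != mode_enabled) {
        key_count += enable ? 1 : -1;
        mode_enabled = enable;
    }
    count = key_count;
    pthread_mutex_unlock(&mode_lock);
    return count;
}
```

Without the lock, two concurrent writers could both observe the mode as enabled and both decrement, which is the kind of race that would produce a "negative count" warning.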
    • H
      sched: Fix possible deadlock in tg_set_dynamic_affinity_mode · 21e5d85e
      Hui Tang committed
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7CGD0
      CVE: NA
      
      ----------------------------------------
      
      Deadlock occurs in two situations as follows:
      
      The first case:
      
      tg_set_dynamic_affinity_mode    --- raw_spin_lock_irq(&auto_affi->lock);
        ->start_auto_affinity         --- trigger timer
          ->tg_update_task_prefer_cpus
            ->css_task_iter_next
              ->raw_spin_unlock_irq

      hrtimer_run_queues
        ->sched_auto_affi_period_timer --- try spin lock (&auto_affi->lock)

      The second case is as follows:
      
      [  291.470810] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
      [  291.472715] rcu:     1-...0: (0 ticks this GP) idle=a6a/1/0x4000000000000002 softirq=78516/78516 fqs=5249
      [  291.475268] rcu:     (detected by 6, t=21006 jiffies, g=202169, q=9862)
      [  291.477038] Sending NMI from CPU 6 to CPUs 1:
      [  291.481268] NMI backtrace for cpu 1
      [  291.481273] CPU: 1 PID: 1923 Comm: sh Kdump: loaded Not tainted 4.19.90+ #150
      [  291.481278] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
      [  291.481281] RIP: 0010:queued_spin_lock_slowpath+0x136/0x9a0
      [  291.481289] Code: c0 74 3f 49 89 dd 48 89 dd 48 b8 00 00 00 00 00 fc ff df 49 c1 ed 03 83 e5 07 49 01 c5 83 c5 03 48 83 05 c4 66 b9 05 01 f3 90 <41> 0f b6 45 00 40 38 c5 7c 08 84 c0 0f 85 ad 07 00 00 0
      [  291.481292] RSP: 0018:ffff88801de87cd8 EFLAGS: 00000002
      [  291.481297] RAX: 0000000000000101 RBX: ffff888001be0a28 RCX: ffffffffb8090f7d
      [  291.481301] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff888001be0a28
      [  291.481304] RBP: 0000000000000003 R08: ffffed100037c146 R09: ffffed100037c146
      [  291.481307] R10: 000000001106b143 R11: ffffed100037c145 R12: 1ffff11003bd0f9c
      [  291.481311] R13: ffffed100037c145 R14: fffffbfff7a38dee R15: dffffc0000000000
      [  291.481315] FS:  00007fac4f306740(0000) GS:ffff88801de80000(0000) knlGS:0000000000000000
      [  291.481318] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  291.481321] CR2: 00007fac4f4bb650 CR3: 00000000046b6000 CR4: 00000000000006e0
      [  291.481323] Call Trace:
      [  291.481324]  <IRQ>
      [  291.481326]  ? osq_unlock+0x2a0/0x2a0
      [  291.481329]  ? check_preemption_disabled+0x4c/0x290
      [  291.481331]  ? rcu_accelerate_cbs+0x33/0xed0
      [  291.481333]  _raw_spin_lock_irqsave+0x83/0xa0
      [  291.481336]  sched_auto_affi_period_timer+0x251/0x820
      [  291.481338]  ? __remove_hrtimer+0x151/0x200
      [  291.481340]  __hrtimer_run_queues+0x39d/0xa50
      [  291.481343]  ? tg_update_affinity_domain_down+0x460/0x460
      [  291.481345]  ? enqueue_hrtimer+0x2e0/0x2e0
      [  291.481348]  ? ktime_get_update_offsets_now+0x1d7/0x2c0
      [  291.481350]  hrtimer_run_queues+0x243/0x470
      [  291.481352]  run_local_timers+0x5e/0x150
      [  291.481354]  update_process_times+0x36/0xb0
      [  291.481357]  tick_sched_handle.isra.4+0x7c/0x180
      [  291.481359]  tick_nohz_handler+0xd1/0x1d0
      [  291.481365]  smp_apic_timer_interrupt+0x12c/0x4e0
      [  291.481368]  apic_timer_interrupt+0xf/0x20
      [  291.481370]  </IRQ>
      [  291.481372]  ? smp_call_function_many+0x68c/0x840
      [  291.481375]  ? smp_call_function_many+0x6ab/0x840
      [  291.481377]  ? arch_unregister_cpu+0x60/0x60
      [  291.481379]  ? native_set_fixmap+0x100/0x180
      [  291.481381]  ? arch_unregister_cpu+0x60/0x60
      [  291.481384]  ? set_task_select_cpus+0x116/0x940
      [  291.481386]  ? smp_call_function+0x53/0xc0
      [  291.481388]  ? arch_unregister_cpu+0x60/0x60
      [  291.481390]  ? on_each_cpu+0x49/0xf0
      [  291.481393]  ? set_task_select_cpus+0x115/0x940
      [  291.481395]  ? text_poke_bp+0xff/0x180
      [  291.481397]  ? poke_int3_handler+0xc0/0xc0
      [  291.481400]  ? __set_prefer_cpus_ptr.constprop.4+0x1cd/0x900
      [  291.481402]  ? hrtick+0x1b0/0x1b0
      [  291.481404]  ? set_task_select_cpus+0x115/0x940
      [  291.481407]  ? __jump_label_transform.isra.0+0x3a1/0x470
      [  291.481409]  ? kernel_init+0x280/0x280
      [  291.481411]  ? kasan_check_read+0x1d/0x30
      [  291.481413]  ? mutex_lock+0x96/0x100
      [  291.481415]  ? __mutex_lock_slowpath+0x30/0x30
      [  291.481418]  ? arch_jump_label_transform+0x52/0x80
      [  291.481420]  ? set_task_select_cpus+0x115/0x940
      [  291.481422]  ? __jump_label_update+0x1a1/0x1e0
      [  291.481424]  ? jump_label_update+0x2ee/0x3b0
      [  291.481427]  ? static_key_slow_inc_cpuslocked+0x1c8/0x2d0
      [  291.481430]  ? start_auto_affinity+0x190/0x200
      [  291.481432]  ? tg_set_dynamic_affinity_mode+0xad/0xf0
      [  291.481435]  ? cpu_affinity_mode_write_u64+0x22/0x30
      [  291.481437]  ? cgroup_file_write+0x46f/0x660
      [  291.481439]  ? cgroup_init_cftypes+0x300/0x300
      [  291.481441]  ? __mutex_lock_slowpath+0x30/0x30
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
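The commit message does not show the fix itself. The hazard in the first case can be sketched in user space as follows, assuming the general remedy is to drop the lock before triggering anything that takes the same lock. `struct auto_affinity`, `period_timer()` and both `set_mode_*` helpers are hypothetical; a pthread mutex with `pthread_mutex_trylock()` stands in for the raw spinlock so that the sketch reports the conflict instead of hanging.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct auto_affinity {
    pthread_mutex_t lock;   /* stands in for auto_affi->lock */
    int mode;
    bool timer_got_lock;    /* set by the callback for inspection */
};

/* Stands in for the period timer: it needs the same lock. */
static void period_timer(struct auto_affinity *a)
{
    if (pthread_mutex_trylock(&a->lock) == 0) {
        a->timer_got_lock = true;
        pthread_mutex_unlock(&a->lock);
    } else {
        a->timer_got_lock = false;  /* a real spinlock would spin here forever */
    }
}

/* Buggy ordering: the timer fires while the lock is still held. */
static void set_mode_locked_start(struct auto_affinity *a, int mode)
{
    pthread_mutex_lock(&a->lock);
    a->mode = mode;
    period_timer(a);
    pthread_mutex_unlock(&a->lock);
}

/* Safer ordering: publish the state, drop the lock, then start the timer. */
static void set_mode_unlocked_start(struct auto_affinity *a, int mode)
{
    pthread_mutex_lock(&a->lock);
    a->mode = mode;
    pthread_mutex_unlock(&a->lock);
    period_timer(a);
}
```

In the first variant the callback cannot take the lock; in the second it can, because nothing that needs the lock runs while it is held.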
    • H
      sched: fix WARN found by deadlock detect · 217edab9
      Hui Tang committed
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      A WARNING is reported when running:
      echo 1 > /sys/fs/cgroup/cpu/cpu.dynamic_affinity_mode
      
      [  147.276757] WARNING: CPU: 5 PID: 1770 at kernel/cpu.c:326 \
      	lockdep_assert_cpus_held+0xac/0xd0
      [  147.279670] Kernel panic - not syncing: panic_on_warn set ...
      [  147.279670]
      [  147.282211] CPU: 5 PID: 1770 Comm: bash Kdump: loaded Not tainted 4.19
      [  147.284796] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)..
      [  147.290963] Call Trace:
      [  147.292459]  dump_stack+0xc6/0x11e
      [  147.294295]  ? lockdep_assert_cpus_held+0xa0/0xd0
      [  147.296876]  panic+0x1d6/0x46b
      [  147.298591]  ? refcount_error_report+0x2a5/0x2a5
      [  147.301131]  ? kmsg_dump_rewind_nolock+0xde/0xde
      [  147.303738]  ? sched_clock_cpu+0x18/0x1b0
      [  147.305943]  ? __warn+0x1d1/0x210
      [  147.307831]  ? lockdep_assert_cpus_held+0xac/0xd0
      [  147.310469]  __warn+0x1ec/0x210
      [  147.312271]  ? lockdep_assert_cpus_held+0xac/0xd0
      [  147.314838]  report_bug+0x1ee/0x2b0
      [  147.316798]  fixup_bug.part.4+0x37/0x80
      [  147.318946]  do_error_trap+0x21c/0x260
      [  147.321062]  ? fixup_bug.part.4+0x80/0x80
      [  147.323253]  ? check_preemption_disabled+0x34/0x1f0
      [  147.324886]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [  147.326277]  ? lockdep_hardirqs_off+0x1cb/0x2b0
      [  147.327505]  ? error_entry+0x9a/0x130
      [  147.328523]  ? trace_hardirqs_off_caller+0x59/0x1a0
      [  147.329844]  ? trace_hardirqs_off_thunk+0x1a/0x1c
      [  147.331124]  invalid_op+0x14/0x20
      [  147.332057]  ? vprintk_func+0x68/0x1a0
      [  147.333082]  ? lockdep_assert_cpus_held+0xac/0xd0
      [  147.334355]  ? lockdep_assert_cpus_held+0xac/0xd0
      [  147.335624]  ? static_key_slow_inc_cpuslocked+0x5a/0x230
      [  147.337079]  ? tg_set_dynamic_affinity_mode+0x4f/0x70
      [  147.338444]  ? cgroup_file_write+0x471/0x6a0
      [  147.339604]  ? cgroup_css.part.4+0x100/0x100
      [  147.340782]  ? cgroup_css.part.4+0x100/0x100
      [  147.341943]  ? kernfs_fop_write+0x2af/0x430
      [  147.343083]  ? kernfs_vma_page_mkwrite+0x230/0x230
      [  147.344401]  ? __vfs_write+0xef/0x680
      [  147.345404]  ? kernel_read+0x110/0x110
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • H
      sched: fix smart grid usage count · d9099163
      Hui Tang committed
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7D98G
      CVE: NA
      
      ----------------------------------------
      
      smart_grid_usage_dec() should be called when freeing a task group
      if the mode is auto.
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • H
      sched: Add static key to reduce noise · 373fd236
      Hui Tang committed
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7A718
      
      --------------------------------
      
      Add a static key to reduce noise when dynamic affinity is not enabled.
      This gives better performance in some cases, such as lmbench.
      
      Fixes: 243865da ("cpuset: Introduce new interface for scheduler ...")
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • D
      net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment() · 822fb46b
      Dong Chenchen committed
      stable inclusion
      from stable-v4.19.283
      commit d2309e0cb27b6871b273fbc1725e93be62570d86
      category: bugfix
      bugzilla: 188702, https://gitee.com/openeuler/kernel/issues/I7DUPI
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d2309e0cb27b6871b273fbc1725e93be62570d86
      
      --------------------------------
      
      [ Upstream commit c83b4938 ]
      
      As the call trace shows, skb_panic was caused by wrong skb->mac_header
      in nsh_gso_segment():
      
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
      CPU: 3 PID: 2737 Comm: syz Not tainted 6.3.0-next-20230505 #1
      RIP: 0010:skb_panic+0xda/0xe0
      Call Trace:
       skb_push+0x91/0xa0
       nsh_gso_segment+0x4f3/0x570
       skb_mac_gso_segment+0x19e/0x270
       __skb_gso_segment+0x1e8/0x3c0
       validate_xmit_skb+0x452/0x890
       validate_xmit_skb_list+0x99/0xd0
       sch_direct_xmit+0x294/0x7c0
       __dev_queue_xmit+0x16f0/0x1d70
       packet_xmit+0x185/0x210
       packet_snd+0xc15/0x1170
       packet_sendmsg+0x7b/0xa0
       sock_sendmsg+0x14f/0x160
      
      The root cause is:
      nsh_gso_segment() uses skb->network_header - nhoff to reset mac_header
      in skb_gso_error_unwind() if inner-layer protocol GSO fails.
      However, skb->network_header may be reset by the inner-layer protocol
      GSO function, e.g. mpls_gso_segment(). An skb->mac_header reset from the
      inaccurate network_header will be larger than the skb headroom.
      
      nsh_gso_segment
          nhoff = skb->network_header - skb->mac_header;
          __skb_pull(skb,nsh_len)
          skb_mac_gso_segment
              mpls_gso_segment
                  skb_reset_network_header(skb);//skb->network_header+=nsh_len
                  return -EINVAL;
          skb_gso_error_unwind
              skb_push(skb, nsh_len);
              skb->mac_header = skb->network_header - nhoff;
              // skb->mac_header > skb->headroom, cause skb_push panic
      
      Use correct mac_offset to restore mac_header and get rid of nhoff.
      
      Fixes: c411ed85 ("nsh: add GSO support")
      Reported-by: syzbot+632b5d9964208bfef8c0@syzkaller.appspotmail.com
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
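The offset arithmetic described in this commit can be replayed with a toy model. This is only a sketch: `struct toy_skb` and the helper names are hypothetical, offsets are plain integers rather than real sk_buff fields, and the inner handler simply mimics a GSO function that resets network_header and then fails.

```c
#include <assert.h>

/* Toy model of the sk_buff header offsets; not the kernel structure. */
struct toy_skb {
    int mac_header;
    int network_header;
    int data;            /* offset of the current data pointer */
};

/* Inner GSO handler that resets network_header and then fails. */
static int inner_segment_fails(struct toy_skb *skb)
{
    skb->network_header = skb->data;
    return -1;
}

/* Old unwind: derive mac_header from network_header, which may have moved. */
static int unwind_with_nhoff(struct toy_skb *skb, int nsh_len)
{
    int nhoff = skb->network_header - skb->mac_header;

    skb->data += nsh_len;                 /* models __skb_pull() */
    if (inner_segment_fails(skb) < 0) {
        skb->data -= nsh_len;             /* models skb_push() */
        skb->mac_header = skb->network_header - nhoff;
    }
    return skb->mac_header;
}

/* Fixed unwind: remember the MAC offset itself and restore it. */
static int unwind_with_mac_offset(struct toy_skb *skb, int nsh_len)
{
    int mac_offset = skb->mac_header;

    skb->data += nsh_len;
    if (inner_segment_fails(skb) < 0) {
        skb->data -= nsh_len;
        skb->mac_header = mac_offset;
    }
    return skb->mac_header;
}
```

With a MAC header at offset 0 and an 8-byte NSH header, the old unwind leaves mac_header shifted by nsh_len, while the saved offset restores it exactly.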
    • O
      !1134 【openEuler-1.0-LTS】cpufreq:conservative: Fix load in fast_dbs_update() · f59c1a3e
      openeuler-ci-bot committed
      Merge Pull Request from: @xuesinian 
       
      Remove the "dbs_update(policy)" call that fetched the load in fast_dbs_update(); the load is now passed in from cs_dbs_update().
      
      Load results are inaccurate after two consecutive updates, which leads to inaccurate frequency scaling.
      
      Related issue : #I7DJU2  
       
      Link:https://gitee.com/openeuler/kernel/pulls/1134 
      
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> 
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com> 
    • C
      firewire: fix potential uaf in outbound_phy_packet_callback() · 160e0014
      Chengfeng Ye committed
      stable inclusion
      from stable-v4.19.242
      commit 34380b5647f13fecb458fea9a3eb3d8b3a454709
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I7BYU9
      CVE: CVE-2023-3159
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=34380b5647f1
      
      --------------------------------
      
      commit b7c81f80 upstream.
      
      &e->event and e point to the same address, and &e->event could
      be freed in queue_event. So there is a potential uaf issue if
      we dereference e after calling queue_event(). Fix this by adding
      a temporary variable to maintain e->client in advance, this can
      avoid the potential uaf issue.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Chengfeng Ye <cyeaa@connect.ust.hk>
      Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
      Link: https://lore.kernel.org/r/20220409041243.603210-2-o-takashi@sakamocchi.jp
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Wei Li <liwei391@huawei.com>
      Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
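The pattern described above, reading a field into a local before handing the object to a function that may free it, can be sketched in user space. The types and the `queue_event()` and `client_put()` helpers below are simplified, hypothetical stand-ins for the firewire code; here `queue_event()` always frees the event to make the hazard explicit.

```c
#include <assert.h>
#include <stdlib.h>

struct client { int refs; };

struct event_wrapper {
    int event;               /* first member, so &e->event and e share an address */
    struct client *client;
};

/* Hypothetical consumer that frees the event, and with it the wrapper. */
static void queue_event(int *event)
{
    free(event);
}

static void client_put(struct client *c)
{
    c->refs--;
}

/* Read e->client before handing the event over; do not touch e afterwards. */
static void packet_callback(struct event_wrapper *e)
{
    struct client *client = e->client;

    queue_event(&e->event);
    client_put(client);
}
```

Writing `client_put(e->client)` after `queue_event()` would dereference freed memory; the local copy avoids that.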
    • X
      cpufreq: conservative: fix load in fast_dbs_update() · 0dfa77a2
      XueSinian committed
      driver inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7DJU2
      CVE: NA
      
      ----------------------------------------
      
      Remove the "dbs_update(policy)" call that fetched the load in
      fast_dbs_update(); the load is now passed in from cs_dbs_update().

      Reason:
      Load results are inaccurate after two consecutive updates, which leads
      to inaccurate frequency scaling.
      
      Fixes: 75704b66 ("cpufreq: conservative: Add a switch to enable fast mode")
      Signed-off-by: Xue Sinian <xuesinian@huawei.com>
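Why two consecutive updates distort the load can be illustrated with a toy delta-based sampler. This is a sketch under the assumption that load is computed from the busy and wall time elapsed since the previous sample; `struct sampler`, `sample_load()` and `wants_higher_freq()` are hypothetical and much simpler than the governor code.

```c
#include <assert.h>

/* Toy sampling state; the real governor tracks idle and wall time per CPU. */
struct sampler {
    unsigned long busy;       /* cumulative busy time */
    unsigned long wall;       /* cumulative wall time */
    unsigned long prev_busy;
    unsigned long prev_wall;
};

/* Return load in percent since the previous call, then move the window. */
static unsigned int sample_load(struct sampler *s)
{
    unsigned long d_busy = s->busy - s->prev_busy;
    unsigned long d_wall = s->wall - s->prev_wall;

    s->prev_busy = s->busy;
    s->prev_wall = s->wall;
    return d_wall ? (unsigned int)(100 * d_busy / d_wall) : 0;
}

/* The fast path decides from a load value supplied by its caller. */
static int wants_higher_freq(unsigned int load, unsigned int up_threshold)
{
    return load > up_threshold;
}
```

The first sample sees the whole window; a second sample taken immediately afterwards sees an empty window and reports a load that no longer reflects the workload, so passing the first value along is the safer choice.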
  5. 12 Jun, 2023 5 commits
  6. 09 Jun, 2023 2 commits
    • W
      sched: smart grid: init sched_grid_qos structure on QOS purpose · ce35ded5
      Wang ShaoBo committed
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      As smart grid scheduling (SGS) may shrink resources and affect task QOS,
      we provide methods for evaluating task QOS in each divided grid. We mainly
      focus on the following two aspects:
      
         1. Evaluate whether resources (such as CPU or memory) meet our demand
         2. Ensure the least impact when working with the cpufreq and cpuidle governors
      
      To tackle these questions, we have summarized several sampling methods
      that obtain tasks' characteristics while reducing scheduling noise
      as much as possible:
      
        1. We detect the key factors that determine how sensitive a process is
           to cpufreq or cpuidle adjustment, and use them to guide the
           cpufreq/cpuidle governor
        2. We dynamically monitor process memory bandwidth and adjust memory
           allocation to minimize remote memory access
        3. We provide a variety of load tracking mechanisms to adapt to different
           types of task load change
      
           ---------------------------------     -----------------
          |            class A              |   |     class B     |
          |    --------        --------     |   |     --------    |
          |   | group0 |      | group1 |    |---|    | group2 |   |----------+
          |    --------        --------     |   |     --------    |          |
          |    CPU/memory sensitive type    |   |   balance type  |          |
           ----------------+----------------     --------+--------           |
                           v                             v                   | (target cpufreq)
           -------------------------------------------------------           | (sensitivity)
          |              Not satisfied with QOS?                  |          |
           --------------------------+----------------------------           |
                                     v                                       v
           -------------------------------------------------------     ----------------
          |              expand or shrink resource                |<--|  energy model  |
           ----------------------------+--------------------------     ----------------
                                       v                                     |
           -----------          -----------          ------------            v
          |           |        |           |        |            |     ---------------
          |   GRID0   +--------+   GRID1   +--------+   GRID2    |<-- |   governor    |
          |           |        |           |        |            |     ---------------
           -----------          -----------          ------------
                         \            |            /
                          \  -------------------  /
                            |  pages migration  |
                             -------------------
      
      We will introduce the energy model in the follow-up implementation, and consider
      the dynamic affinity adjustment between each divided grid in the runtime.
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
    • H
      sched: Introduce smart grid scheduling strategy for cfs · 713cfd26
      Hui Tang committed
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      We want to dynamically expand or shrink the affinity range of tasks
      based on the CPU topology level while meeting the minimum resource
      requirements of tasks.
      
      We divide affinity domains into several levels according to sched domains:
      
      level4   * SOCKET  [                                                  ]
      level3   * DIE     [                             ]
      level2   * MC      [             ] [             ]
      level1   * SMT     [     ] [     ] [     ] [     ]
      level0   * CPU      0   1   2   3   4   5   6   7
      
      Whether users prefer power saving or performance affects the strategy
      for adjusting affinity. In power saving mode, we choose a more
      appropriate affinity based on the energy model to reduce power
      consumption, while considering the QOS of resources such as CPU and
      memory consumption. For instance, if the current task CPU load is less
      than required, smart grid judges, according to the energy model, whether
      to aggregate tasks into a smaller range.
      
      The main difference from EAS is that we pay more attention to the impact
      on power consumption of mechanisms such as cpuidle and DVFS, and classify
      tasks to reduce interference and ensure resource QOS in each divided unit,
      which is more suitable for general-purpose use on non-heterogeneous CPUs.
      
              --------        --------        --------
             | group0 |      | group1 |      | group2 |
              --------        --------        --------
                 |                |              |
                 v                |              v
             ---------------------+-----     -----------------
            |                  ---v--   |   |
            |       DIE0      |  MC1 |  |   |   DIE1
            |                  ------   |   |
             ---------------------------     -----------------
      
      We regularly measure how well each group's resource requirements are
      satisfied and adjust the affinity accordingly. Scheduling balance and
      memory migration will be considered based on memory location to better
      meet resource requirements.
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
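The level diagram in this commit can be read as a mapping from a CPU and a level to a CPU range. The sketch below assumes a symmetric topology in which each level doubles the span of the one below it (level 0 is one CPU, level 1 an SMT pair, level 2 an MC of four CPUs, and so on). `struct cpu_range` and `affinity_domain()` are hypothetical; the actual patch derives domains from sched domains rather than from arithmetic.

```c
#include <assert.h>

struct cpu_range {
    int first;
    int last;
};

/* Range of CPUs sharing the level-N domain with `cpu`, assuming each
 * level spans 2^level consecutive CPUs aligned to that size. */
static struct cpu_range affinity_domain(int cpu, int level)
{
    int span = 1 << level;
    struct cpu_range r;

    r.first = cpu / span * span;
    r.last = r.first + span - 1;
    return r;
}
```

For CPU 5 this yields 5..5 at level 0, 4..5 at level 1, 4..7 at level 2 and 0..7 at level 3, matching the boxes drawn in the diagram. Expanding or shrinking affinity then amounts to moving one level up or down.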
  7. 08 Jun, 2023 14 commits
    • Z
      ipmi: fix SSIF not responding under certain cond. · aaf2ccb4
      Zhang Yuchen committed
      stable inclusion
      from stable-v4.19.283
      commit ba810999356bffa4627985123c15327c692318e5
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ba810999356bffa4627985123c15327c692318e5
      
      --------------------------------
      
      [ Upstream commit 6d2555cd ]
      
      The ipmi communication is not restored after a specific version of BMC is
      upgraded on our server.
      The ipmi driver does not respond after printing the following log:
      
          ipmi_ssif: Invalid response getting flags: 1c 1
      
      I found that after entering this branch, ssif_info->ssif_state always
      holds SSIF_GETTING_FLAGS and never returns to IDLE.
      
      As a result, the driver cannot be unloaded, because the driver status is
      checked during the unload process and must be IDLE in shutdown_ssif():
      
              while (ssif_info->ssif_state != SSIF_IDLE)
                      schedule_timeout(1);
      
      The process that triggers this problem is:
      
      1. One msg times out and the next msg starts sending, calling
      ssif_set_need_watch().
      
      2. ssif_set_need_watch()->watch_timeout()->start_flag_fetch() changes
      ssif_state to SSIF_GETTING_FLAGS.
      
      3. In msg_done_handler() with ssif_state == SSIF_GETTING_FLAGS, if an error
      message is received, the second branch does not modify the ssif_state.
      
      4. All retry actions need IS_SSIF_IDLE() == True, including the retry
      actions in watch_timeout() and msg_done_handler(). Sending a msg does not
      work either, since SSIF_IDLE is also checked in start_next_msg().
      
      5. The only thing that can be triggered in the SSIF driver is
      watch_timeout(); after destroy_user(), this timer will stop too.
      
      So, once this branch is entered, ssif_state remains SSIF_GETTING_FLAGS:
      no msg can be sent, no timer is started, and the driver can't be unloaded.
      
      We did a comparative test before and after adding this patch, and the
      result is effective.
      
      Fixes: 25930707 ("ipmi: Add SMBus interface driver (SSIF)")
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Zhang Yuchen <zhangyuchen.lcr@bytedance.com>
      Message-Id: <20230412074907.80046-1-zhangyuchen.lcr@bytedance.com>
      Signed-off-by: Corey Minyard <minyard@acm.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yi Yang <yiyang13@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
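The stuck-state problem described in this commit can be modelled with a two-state machine. This is a sketch, not the driver code: it assumes the remedy is to return to the idle state on the error path as well, and `struct ssif`, `flags_done()` and `can_start_next_msg()` are simplified, hypothetical counterparts of the functions named in the message.

```c
#include <assert.h>
#include <stdbool.h>

enum ssif_state { SSIF_IDLE, SSIF_GETTING_FLAGS };

struct ssif {
    enum ssif_state state;
};

static void start_flag_fetch(struct ssif *s)
{
    s->state = SSIF_GETTING_FLAGS;
}

/* Handle the flags response; every exit path leaves GETTING_FLAGS. */
static void flags_done(struct ssif *s, bool response_valid)
{
    if (!response_valid) {
        s->state = SSIF_IDLE;   /* the error path that used to stay stuck */
        return;
    }
    /* ... a real handler would process the flags here ... */
    s->state = SSIF_IDLE;
}

/* New messages are only started from the idle state. */
static bool can_start_next_msg(const struct ssif *s)
{
    return s->state == SSIF_IDLE;
}
```

If the error path skipped the state reset, `can_start_next_msg()` would stay false forever, which mirrors the symptom of a driver that neither sends messages nor unloads.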
    • C
      ipmi_ssif: Rename idle state and check · b0afa0fa
      Corey Minyard committed
      stable inclusion
      from stable-v4.19.283
      commit 13b3a05b5b03b4fce5a9ea03dc91159dea1f6ef9
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=13b3a05b5b03b4fce5a9ea03dc91159dea1f6ef9
      
      --------------------------------
      
      [ Upstream commit 8230831c ]
      
      Rename the SSIF_IDLE() to IS_SSIF_IDLE(), since that is more clear, and
      rename SSIF_NORMAL to SSIF_IDLE, since that's more accurate.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Corey Minyard <cminyard@mvista.com>
      Stable-dep-of: 6d2555cd ("ipmi: fix SSIF not responding under certain cond.")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      
      conflict:
      	drivers/char/ipmi/ipmi_ssif.c
      Signed-off-by: Yi Yang <yiyang13@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • T
      mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock · c68cb237
      Tetsuo Handa committed
      stable inclusion
      from stable-v4.19.283
      commit 90c4e02baef3eed8640c9375bd0e75bddd0ec08d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      commit 1007843a upstream.
      
      syzbot is reporting circular locking dependency which involves
      zonelist_update_seq seqlock [1], for this lock is checked by memory
      allocation requests which do not need to be retried.
      
      One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler.
      
        CPU0
        ----
        __build_all_zonelists() {
          write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd
          // e.g. timer interrupt handler runs at this moment
            some_timer_func() {
              kmalloc(GFP_ATOMIC) {
                __alloc_pages_slowpath() {
                  read_seqbegin(&zonelist_update_seq) {
                    // spins forever because zonelist_update_seq.seqcount is odd
                  }
                }
              }
            }
          // e.g. timer interrupt handler finishes
          write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even
        }
      
      This deadlock scenario can be easily eliminated by not calling
      read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation
      requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation
      requests.  But Michal Hocko does not know whether we should go with this
      approach.
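
The first scenario hinges on the sequence count being odd while the interrupted writer cannot run. A single-threaded toy model makes the parity rule concrete. `struct toy_seqcount` and its helpers are hypothetical, use no atomics or barriers, and replace the spinning `read_seqbegin()` with a non-spinning check so the sketch terminates.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_seqcount {
    unsigned int sequence;
};

/* Entering the write side makes the count odd. */
static void write_begin(struct toy_seqcount *s)
{
    s->sequence++;
}

/* Leaving the write side makes the count even again. */
static void write_end(struct toy_seqcount *s)
{
    s->sequence++;
}

/* Non-spinning reader entry: fails while a write is in progress. */
static bool try_read_begin(const struct toy_seqcount *s, unsigned int *seq)
{
    *seq = s->sequence;
    return (*seq & 1) == 0;
}
```

A reader that spins until the count is even, while running in an interrupt on the CPU that holds the write side, can never succeed, because the writer only resumes after the interrupt returns.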
      
      Another deadlock scenario which syzbot is reporting is a race between
      kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with
      port->lock held and printk() from __build_all_zonelists() with
      zonelist_update_seq held.
      
        CPU0                                   CPU1
        ----                                   ----
        pty_write() {
          tty_insert_flip_string_and_push_buffer() {
                                               __build_all_zonelists() {
                                                 write_seqlock(&zonelist_update_seq);
                                                 build_zonelists() {
                                                   printk() {
                                                     vprintk() {
                                                       vprintk_default() {
                                                         vprintk_emit() {
                                                           console_unlock() {
                                                             console_flush_all() {
                                                               console_emit_next_record() {
                                                                 con->write() = serial8250_console_write() {
            spin_lock_irqsave(&port->lock, flags);
            tty_insert_flip_string() {
              tty_insert_flip_string_fixed_flag() {
                __tty_buffer_request_room() {
                  tty_buffer_alloc() {
                    kmalloc(GFP_ATOMIC | __GFP_NOWARN) {
                      __alloc_pages_slowpath() {
                        zonelist_iter_begin() {
                          read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd
                                                                   spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
            spin_unlock_irqrestore(&port->lock, flags);
                                                                   // message is printed to console
                                                                   spin_unlock_irqrestore(&port->lock, flags);
                                                                 }
                                                               }
                                                             }
                                                           }
                                                         }
                                                       }
                                                     }
                                                   }
                                                 }
                                                 write_sequnlock(&zonelist_update_seq);
                                               }
          }
        }
      
      This deadlock scenario can be eliminated by
      
        preventing interrupt context from calling kmalloc(GFP_ATOMIC)
      
      and
      
        preventing printk() from calling console_flush_all()
      
      while zonelist_update_seq.seqcount is odd.
      
      Since Petr Mladek thinks that __build_all_zonelists() can become a
      candidate for deferring printk() [2], let's address this problem by
      
        disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC)
      
      and
      
        disabling synchronous printk() in order to avoid console_flush_all()
      
      .
      
      As a side effect of minimizing duration of zonelist_update_seq.seqcount
      being odd by disabling synchronous printk(), latency at
      read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and
      __GFP_DIRECT_RECLAIM allocation requests will be reduced.  Although, from
      lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e.
      do not record unnecessary locking dependency) from interrupt context is
      still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC)
      inside
      write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq)
      section...
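
To illustrate why the reader on CPU0 keeps spinning while the sequence count is odd, here is a minimal userspace sketch of a sequence counter. It is not the kernel's seqlock implementation: `toy_seqcount` and the `toy_*` helpers are hypothetical names introduced only for this example, and the sketch models a single attempt rather than the spin loop itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy sequence counter; NOT the kernel's seqlock_t. */
struct toy_seqcount {
    atomic_uint seq;
};

/* Writer entry: the count becomes odd while an update is in progress. */
static void toy_write_begin(struct toy_seqcount *s)
{
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_acq_rel);
}

/* Writer exit: the count becomes even again. */
static void toy_write_end(struct toy_seqcount *s)
{
    atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

/*
 * One attempt at entering a read section. Returns false while a writer
 * is active (odd count); a real reader would loop until this succeeds,
 * which is where the "spins forever" in the diagram comes from.
 */
static bool toy_read_try_begin(struct toy_seqcount *s, unsigned int *start)
{
    unsigned int v = atomic_load_explicit(&s->seq, memory_order_acquire);

    if (v & 1u)
        return false;
    *start = v;
    return true;
}

/* Returns true if a writer ran since toy_read_try_begin() succeeded. */
static bool toy_read_retry(struct toy_seqcount *s, unsigned int start)
{
    return atomic_load_explicit(&s->seq, memory_order_acquire) != start;
}
```

If the writer cannot reach `toy_write_end()` because it is itself waiting on a lock held by the reader, neither side makes progress. That is the cycle the commit message describes, and the reason it keeps the odd window free of both interrupt-context allocations and synchronous console output.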
      
      Link: https://lkml.kernel.org/r/8796b95c-3da3-5885-fddd-6ef55f30e4d3@I-love.SAKURA.ne.jp
      Fixes: 3d36424b ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation")
      Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2]
      Reported-by: syzbot <syzbot+223c7461c58c58a4cb10@syzkaller.appspotmail.com>
      Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1]
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Cc: John Ogness <john.ogness@linutronix.de>
      Cc: Patrick Daly <quic_pdaly@quicinc.com>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      c68cb237
    • printk: declare printk_deferred_{enter,safe}() in include/linux/printk.h · 13e4745b
      Tetsuo Handa committed
      stable inclusion
      from stable-v4.19.283
      commit 09b28fe9ff2fce03efc7d71dc79b58a49b01d0e9
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      commit 85e3e7fb upstream.
      
      [This patch implements the subset of original commit 85e3e7fb ("printk:
      remove NMI tracking") that commit 1007843a ("mm/page_alloc: fix
      potential deadlock on zonelist_update_seq seqlock") depends on, because
      commit 3d36424b ("mm/page_alloc: fix race condition between
      build_all_zonelists and page allocation") was backported to stable.]
      
      All NMI contexts are handled the same as the safe context: store the
      message and defer printing. There is no need to have special NMI
      context tracking for this. Using in_nmi() is enough.
      
      There are several parts of the kernel that are manually calling into
      the printk NMI context tracking in order to cause general printk
      deferred printing:
      
          arch/arm/kernel/smp.c
          arch/powerpc/kexec/crash.c
          kernel/trace/trace.c
      
      For arm/kernel/smp.c and powerpc/kexec/crash.c, provide a new
      function pair printk_deferred_enter/exit that explicitly achieves the
      same objective.
      
      For ftrace, remove the printk context manipulation completely. It was
      added in commit 03fc7f9c ("printk/nmi: Prevent deadlock when
      accessing the main log buffer in NMI"). The purpose was to enforce
      storing messages directly into the ring buffer even in NMI context.
      It really should have only modified the behavior in NMI context.
      There is no need for a special behavior any longer. All messages are
      always stored directly now. The console deferring is handled
      transparently in vprintk().
      
      Signed-off-by: John Ogness <john.ogness@linutronix.de>
      [pmladek@suse.com: Remove special handling in ftrace.c completely.]
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: https://lore.kernel.org/r/20210715193359.25946-5-john.ogness@linutronix.de
      [penguin-kernel: Copy only printk_deferred_{enter,safe}() definition ]
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Conflicts:
      	include/linux/printk.h
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • serial: 8250: Fix serial8250_tx_empty() race with DMA Tx · e84cc7b8
      Ilpo Järvinen committed
      stable inclusion
      from stable-v4.19.283
      commit 3f9cab5766daa1c1e5b389cd12b6e717ce95852f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      There's a potential race before THRE/TEMT deasserts when DMA Tx is
      starting up (or the next batch of continuous Tx is being submitted).
      This can lead to misdetecting Tx empty condition.
      
      It is entirely normal for THRE/TEMT to be set for some time after the
      DMA Tx has been set up in serial8250_tx_dma(). As the Tx side is
      definitely not empty at that point, it seems incorrect for
      serial8250_tx_empty() to claim Tx is empty.
      
      Fix the race by also checking in serial8250_tx_empty() whether there's
      DMA Tx active.
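
The check described above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the driver code: `toy_uart`, `toy_tx_empty()` and the `TOY_LSR_*` macros are hypothetical names, although the two bit positions follow the standard 16550 line status register layout.

```c
#include <stdbool.h>

/* 16550 line status register bits (hypothetical macro names). */
#define TOY_LSR_THRE 0x20u /* transmit holding register empty */
#define TOY_LSR_TEMT 0x40u /* transmitter (shift register) empty */
#define TOY_LSR_BOTH_EMPTY (TOY_LSR_THRE | TOY_LSR_TEMT)

/* Hypothetical snapshot of the port state relevant to the check. */
struct toy_uart {
    unsigned int lsr;    /* last value read from the LSR */
    bool dma_tx_running; /* a DMA Tx transfer has been submitted */
};

/*
 * Report "Tx empty" only when no DMA Tx is in flight AND the hardware
 * says both the holding and shift registers are empty. Checking the
 * LSR alone can misreport "empty" right after DMA Tx was set up.
 */
static bool toy_tx_empty(const struct toy_uart *u)
{
    if (u->dma_tx_running)
        return false;
    return (u->lsr & TOY_LSR_BOTH_EMPTY) == TOY_LSR_BOTH_EMPTY;
}
```

The DMA flag is tested first so that a stale "both empty" status cannot win during the window in which the transfer has been queued but the hardware has not yet deasserted THRE/TEMT.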
      
      Note: This fix only addresses the in-kernel race, mainly to make using
      TCSADRAIN/FLUSH robust. Userspace can still cause other races, but those
      appear to be userspace concurrency control problems.
      
      Fixes: 9ee4b83e ("serial: 8250: Add support for dmaengine")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Link: https://lore.kernel.org/r/20230317113318.31327-3-ilpo.jarvinen@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      (cherry picked from commit 146a37e0)
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • tty: Prevent writing chars during tcsetattr TCSADRAIN/FLUSH · 5195a946
      Ilpo Järvinen committed
      stable inclusion
      from stable-v4.19.283
      commit 3271859c2d2709515db4ad7a6cbdd3617da04405
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      If userspace races tcsetattr() with a write, the drained condition
      might not be guaranteed by the kernel. There is a race window after
      checking Tx is empty before tty_set_termios() takes termios_rwsem for
      write. During that race window, more characters can be queued by a
      racing writer.
      
      Any ongoing transmission might produce garbage during HW's
      ->set_termios() call. The intent of TCSADRAIN/FLUSH seems to be
      preventing such character corruption. If those flags are set, take
      tty's write lock to stop any writer before performing the lower layer
      Tx empty check and wait for the pending characters to be sent (if any).
      
      The initial wait for all-writers-done must be placed outside of tty's
      write lock to avoid deadlock which makes it impossible to use
      tty_wait_until_sent(). The write lock is retried if a racing write is
      detected.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Link: https://lore.kernel.org/r/20230317113318.31327-2-ilpo.jarvinen@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      (cherry picked from commit 094fb49a)
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • af_packet: Don't send zero-byte data in packet_sendmsg_spkt(). · dd9c02c5
      Kuniyuki Iwashima committed
      stable inclusion
      from stable-v4.19.283
      commit 0a607752b4ef5d48d81b1a1be6ac010f991fdf63
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 6a341729 ]
      
      syzkaller reported a warning below [0].
      
      We can reproduce it by sending 0-byte data from the (AF_PACKET,
      SOCK_PACKET) socket via some devices whose dev->hard_header_len
      is 0.
      
          struct sockaddr_pkt addr = {
              .spkt_family = AF_PACKET,
              .spkt_device = "tun0",
          };
          int fd;
      
          fd = socket(AF_PACKET, SOCK_PACKET, 0);
          sendto(fd, NULL, 0, 0, (struct sockaddr *)&addr, sizeof(addr));
      
      We have a similar fix for the (AF_PACKET, SOCK_RAW) socket as
      commit dc633700 ("net/af_packet: check len when min_header_len
      equals to 0").
      
      Let's add the same test for the SOCK_PACKET socket.
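
As a rough illustration of the kind of test meant here, the following userspace sketch rejects a zero-byte payload even when the device requires no link-layer header. It is a simplified assumption, not the kernel code: `toy_check_send_len()` and its parameters are hypothetical, and the choice of `-EINVAL` as the error is part of that assumption.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical length check for a packet about to be queued.
 * min_header_len stands in for the device's minimum link-layer header
 * size, which is 0 for the devices mentioned above.
 */
static int toy_check_send_len(size_t len, size_t min_header_len)
{
    /* The payload must at least cover the required header. */
    if (len < min_header_len)
        return -EINVAL;
    /* With min_header_len == 0 the test above passes for len == 0,
     * so an explicit zero-length check is still needed. */
    if (len == 0)
        return -EINVAL;
    return 0;
}
```

The second test is the important one: a header-length comparison alone is vacuous for devices that need no header, so an empty packet would otherwise reach the transmit path.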
      
      [0]:
      skb_assert_len
      WARNING: CPU: 1 PID: 19945 at include/linux/skbuff.h:2552 skb_assert_len include/linux/skbuff.h:2552 [inline]
      WARNING: CPU: 1 PID: 19945 at include/linux/skbuff.h:2552 __dev_queue_xmit+0x1f26/0x31d0 net/core/dev.c:4159
      Modules linked in:
      CPU: 1 PID: 19945 Comm: syz-executor.0 Not tainted 6.3.0-rc7-02330-gca6270c12e20 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      RIP: 0010:skb_assert_len include/linux/skbuff.h:2552 [inline]
      RIP: 0010:__dev_queue_xmit+0x1f26/0x31d0 net/core/dev.c:4159
      Code: 89 de e8 1d a2 85 fd 84 db 75 21 e8 64 a9 85 fd 48 c7 c6 80 2a 1f 86 48 c7 c7 c0 06 1f 86 c6 05 23 cf 27 04 01 e8 fa ee 56 fd <0f> 0b e8 43 a9 85 fd 0f b6 1d 0f cf 27 04 31 ff 89 de e8 e3 a1 85
      RSP: 0018:ffff8880217af6e0 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffc90001133000
      RDX: 0000000000040000 RSI: ffffffff81186922 RDI: 0000000000000001
      RBP: ffff8880217af8b0 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000001 R12: ffff888030045640
      R13: ffff8880300456b0 R14: ffff888030045650 R15: ffff888030045718
      FS:  00007fc5864da640(0000) GS:ffff88806cd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020005740 CR3: 000000003f856003 CR4: 0000000000770ee0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       dev_queue_xmit include/linux/netdevice.h:3085 [inline]
       packet_sendmsg_spkt+0xc4b/0x1230 net/packet/af_packet.c:2066
       sock_sendmsg_nosec net/socket.c:724 [inline]
       sock_sendmsg+0x1b4/0x200 net/socket.c:747
       ____sys_sendmsg+0x331/0x970 net/socket.c:2503
       ___sys_sendmsg+0x11d/0x1c0 net/socket.c:2557
       __sys_sendmmsg+0x18c/0x430 net/socket.c:2643
       __do_sys_sendmmsg net/socket.c:2672 [inline]
       __se_sys_sendmmsg net/socket.c:2669 [inline]
       __x64_sys_sendmmsg+0x9c/0x100 net/socket.c:2669
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3c/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7fc58791de5d
      Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 9f 1b 00 f7 d8 64 89 01 48
      RSP: 002b:00007fc5864d9cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004bbf80 RCX: 00007fc58791de5d
      RDX: 0000000000000001 RSI: 0000000020005740 RDI: 0000000000000004
      RBP: 00000000004bbf80 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000000000b R14: 00007fc58797e530 R15: 0000000000000000
       </TASK>
      ---[ end trace 0000000000000000 ]---
      skb len=0 headroom=16 headlen=0 tailroom=304
      mac=(16,0) net=(16,-1) trans=-1
      shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
      csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
      hash(0x0 sw=0 l4=0) proto=0x0000 pkttype=0 iif=0
      dev name=sit0 feat=0x00000006401d7869
      sk family=17 type=10 proto=0
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • nohz: Add TICK_DEP_BIT_RCU · 5fd1535b
      Frederic Weisbecker committed
      stable inclusion
      from stable-v4.19.283
      commit fe2ae32a7ec9fa64f61993b808f25315b9996b03
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 01b4c399 ]
      
      If a nohz_full CPU is looping in the kernel, the scheduling-clock tick
      might nevertheless remain disabled.  In !PREEMPT kernels, this can
      prevent RCU's attempts to enlist the aid of that CPU's executions of
      cond_resched(), which can in turn result in an arbitrarily delayed grace
      period and thus an OOM.  RCU therefore needs a way to enable a holdout
      nohz_full CPU's scheduler-clock interrupt.
      
      This commit therefore provides a new TICK_DEP_BIT_RCU value which RCU can
      pass to tick_dep_set_cpu() and friends to force on the scheduler-clock
      interrupt for a specified CPU or task.  In some cases, rcutorture needs
      to turn on the scheduler-clock tick, so this commit also exports the
      relevant symbols to GPL-licensed modules.
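
The idea of a per-CPU tick dependency bit can be sketched in userspace like this. The sketch is an assumption-laden simplification: `toy_cpu`, the `TOY_TICK_DEP_BIT_*` values and the helper functions are hypothetical stand-ins and do not reproduce the kernel's enum values or its atomic handling.

```c
#include <stdbool.h>

/* Hypothetical dependency bits; the kernel's actual enum differs. */
enum toy_tick_dep_bit {
    TOY_TICK_DEP_BIT_SCHED,
    TOY_TICK_DEP_BIT_RCU,
};

/* Hypothetical per-CPU state: one bit per subsystem needing the tick. */
struct toy_cpu {
    unsigned long tick_dep_mask;
};

/* A subsystem asks for the tick to be kept on this CPU. */
static void toy_tick_dep_set_cpu(struct toy_cpu *c, enum toy_tick_dep_bit bit)
{
    c->tick_dep_mask |= 1UL << bit;
}

/* The subsystem no longer needs the tick on this CPU. */
static void toy_tick_dep_clear_cpu(struct toy_cpu *c, enum toy_tick_dep_bit bit)
{
    c->tick_dep_mask &= ~(1UL << bit);
}

/* The tick may be stopped only when nobody depends on it. */
static bool toy_can_stop_tick(const struct toy_cpu *c)
{
    return c->tick_dep_mask == 0;
}
```

Giving RCU its own bit means it can force the tick on for a holdout CPU and later release it without disturbing dependencies registered by other subsystems.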
      
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Stable-dep-of: 58d76682 ("tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • perf/core: Fix hardlockup failure caused by perf throttle · b17c9247
      Yang Jihong committed
      stable inclusion
      from stable-v4.19.283
      commit 6805c1fcbe37989dd6fa4d545d04ccbea5e69d1f
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 15def34e ]
      
      commit e050e3f0 ("perf: Fix broken interrupt rate throttling")
      introduced a change in the throttling threshold check. Before it, the
      code compared hwc->interrupts with max_samples_per_tick and then
      increased hwc->interrupts by 1; that commit reversed the order of these
      two steps, changing the semantics of max_samples_per_tick.
      In the literal sense of "max_samples_per_tick", if hwc->interrupts ==
      max_samples_per_tick, the event should not be throttled. Therefore, the
      condition should be changed to "hwc->interrupts > max_samples_per_tick".
      
      In fact, this may cause the hardlockup detector to fail. The minimum
      value of max_samples_per_tick may be 1; in this case, the return value
      of the __perf_event_account_interrupt() function is 1.
      As a result, nmi_watchdog gets throttled, which would stop the PMU (using
      the x86 architecture as an example, see x86_pmu_handle_irq).
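
The off-by-one in the threshold can be seen in a small userspace sketch. It is deliberately simplified and does not model the kernel's per-tick reset of the counter: `toy_hwc`, `toy_account_ge()` and `toy_account_gt()` are hypothetical names that only contrast the two comparisons once the increment happens first.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-event interrupt counter. */
struct toy_hwc {
    unsigned int interrupts;
};

/* Increment first, then throttle when the count has REACHED the limit. */
static bool toy_account_ge(struct toy_hwc *h, unsigned int max_samples_per_tick)
{
    h->interrupts++;
    return h->interrupts >= max_samples_per_tick;
}

/* Increment first, then throttle only when the count EXCEEDS the limit. */
static bool toy_account_gt(struct toy_hwc *h, unsigned int max_samples_per_tick)
{
    h->interrupts++;
    return h->interrupts > max_samples_per_tick;
}
```

With a limit of 2, the first variant throttles on the second sample while the second variant allows it and throttles on the third, which matches the literal reading of "maximum samples per tick".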
      
      Fixes: e050e3f0 ("perf: Fix broken interrupt rate throttling")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20230227023508.102230-1-yangjihong1@huawei.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • of: Fix modalias string generation · 56ce204e
      Miquel Raynal committed
      from stable-v4.19.283
      commit d72e2dc104e65827798e116f9e4853b85488d3e8
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit b19a4266 ]
      
      The helper generating an OF based modalias (of_device_get_modalias())
      works fine, but due to the use of snprintf() internally it needs a
      buffer one byte longer than what should be needed just for the entire
      string (excluding the '\0'). Most users of this helper are sysfs hooks
      providing the modalias string to users. They all provide a PAGE_SIZE
      buffer which is way above the number of bytes required to fit the
      modalias string and hence do not suffer from this issue.
      
      There is another user though, of_device_request_module(), which is only
      called by drivers/usb/common/ulpi.c. This request module function is
      faulty, but maybe because in most cases there is an alternative, ULPI
      driver users have not noticed it.
      
      In this function, of_device_get_modalias() is called twice. The first
      time without buffer just to get the number of bytes required by the
      modalias string (excluding the null byte), and a second time, after
      buffer allocation, to fill the buffer. The allocation asks for an
      additional byte, in order to store the trailing '\0'. However, the
      buffer *length* provided to of_device_get_modalias() excludes this extra
      byte. The internal use of snprintf() with a length that is exactly the
      number of bytes to be written has the effect of using the last available
      byte to store a '\0', which then smashes the last character of the
      modalias string.
      
      Provide the actual size of the buffer to of_device_get_modalias() to fix
      this issue.
      
      Note: the "str[size - 1] = '\0';" line is not really needed as snprintf
      will anyway end the string with a null byte, but there is a possibility
      that this function might be called on a struct device_node without
      compatible, in this case snprintf() would not be executed. So we keep it
      just to avoid possible unbounded strings.
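
The truncation is easy to reproduce in userspace with plain snprintf(). The following is a minimal sketch, not the kernel code: `TOY_MODALIAS` is an arbitrary made-up string and `toy_modalias_copy()` is a hypothetical helper whose flag selects between passing the string length and passing the real buffer size.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary example string; not a real device's modalias. */
#define TOY_MODALIAS "of:NdemoTdemo"

/*
 * Two-pass pattern: first ask snprintf() how many bytes are needed
 * (excluding the '\0'), then allocate one extra byte and format again.
 * pass_full_size == 0 mimics the bug: the size handed to the second
 * call omits the extra byte, so snprintf() uses the last byte it was
 * given for the terminator and drops the final character.
 */
static char *toy_modalias_copy(int pass_full_size)
{
    int len = snprintf(NULL, 0, "%s", TOY_MODALIAS);
    size_t size;
    char *str;

    if (len < 0)
        return NULL;
    size = (size_t)len + 1; /* room for the trailing '\0' */
    str = malloc(size);
    if (!str)
        return NULL;
    snprintf(str, pass_full_size ? size : size - 1, "%s", TOY_MODALIAS);
    str[size - 1] = '\0'; /* defensive terminator, as in the note above */
    return str;
}
```

Calling `toy_modalias_copy(0)` yields the string with its last character missing, while `toy_modalias_copy(1)` yields it intact; the caller frees the result in both cases.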
      
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: Peter Chen <peter.chen@kernel.org>
      Fixes: 9c829c09 ("of: device: Support loading a module with OF based modalias")
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
      Link: https://lore.kernel.org/r/20230404172148.82422-2-srinivas.kandagatla@linaro.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. · 16067223
      Kuniyuki Iwashima committed
      stable inclusion
      from stable-v4.19.283
      commit 1f69c086b20e27763af28145981435423f088268
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 50749f2d ]
      
      syzkaller reported [0] memory leaks of a UDP socket and ZEROCOPY
      skbs.  We can reproduce the problem with this sequence:
      
        sk = socket(AF_INET, SOCK_DGRAM, 0)
        sk.setsockopt(SOL_SOCKET, SO_TIMESTAMPING, SOF_TIMESTAMPING_TX_SOFTWARE)
        sk.setsockopt(SOL_SOCKET, SO_ZEROCOPY, 1)
        sk.sendto(b'', MSG_ZEROCOPY, ('127.0.0.1', 53))
        sk.close()
      
      sendmsg() calls msg_zerocopy_alloc(), which allocates a skb, sets
      skb->cb->ubuf.refcnt to 1, and calls sock_hold().  Here, struct
      ubuf_info_msgzc indirectly holds a refcnt of the socket.  When the
      skb is sent, __skb_tstamp_tx() clones it and puts the clone into
      the socket's error queue with the TX timestamp.
      
      When the original skb is received locally, skb_copy_ubufs() calls
      skb_unclone(), and pskb_expand_head() increments skb->cb->ubuf.refcnt.
      This additional count is decremented while freeing the skb, but struct
      ubuf_info_msgzc still has a refcnt, so __msg_zerocopy_callback() is
      not called.
      
      The last refcnt is not released unless we retrieve the TX timestamped
      skb by recvmsg().  Since we clear the error queue in inet_sock_destruct()
      after the socket's refcnt reaches 0, there is a circular dependency.
      If we close() the socket holding such skbs, we never call sock_put()
      and leak the count, sk, and skb.
      
      TCP has the same problem, and commit e0c8bccd ("net: stream:
      purge sk_error_queue in sk_stream_kill_queues()") tried to fix it
      by calling skb_queue_purge() during close().  However, there is a
      small chance that skb queued in a qdisc or device could be put
      into the error queue after the skb_queue_purge() call.
      
      In __skb_tstamp_tx(), the cloned skb should not have a reference
      to the ubuf to remove the circular dependency, but skb_clone() does
      not call skb_copy_ubufs() for zerocopy skb.  So, we need to call
      skb_orphan_frags_rx() for the cloned skb to call skb_copy_ubufs().
      
      [0]:
      BUG: memory leak
      unreferenced object 0xffff88800c6d2d00 (size 1152):
        comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 cd af e8 81 00 00 00 00  ................
          02 00 07 40 00 00 00 00 00 00 00 00 00 00 00 00  ...@............
        backtrace:
          [<0000000055636812>] sk_prot_alloc+0x64/0x2a0 net/core/sock.c:2024
          [<0000000054d77b7a>] sk_alloc+0x3b/0x800 net/core/sock.c:2083
          [<0000000066f3c7e0>] inet_create net/ipv4/af_inet.c:319 [inline]
          [<0000000066f3c7e0>] inet_create+0x31e/0xe40 net/ipv4/af_inet.c:245
          [<000000009b83af97>] __sock_create+0x2ab/0x550 net/socket.c:1515
          [<00000000b9b11231>] sock_create net/socket.c:1566 [inline]
          [<00000000b9b11231>] __sys_socket_create net/socket.c:1603 [inline]
          [<00000000b9b11231>] __sys_socket_create net/socket.c:1588 [inline]
          [<00000000b9b11231>] __sys_socket+0x138/0x250 net/socket.c:1636
          [<000000004fb45142>] __do_sys_socket net/socket.c:1649 [inline]
          [<000000004fb45142>] __se_sys_socket net/socket.c:1647 [inline]
          [<000000004fb45142>] __x64_sys_socket+0x73/0xb0 net/socket.c:1647
          [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
          [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      BUG: memory leak
      unreferenced object 0xffff888017633a00 (size 240):
        comm "syz-executor392", pid 264, jiffies 4294785440 (age 13.044s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 2d 6d 0c 80 88 ff ff  .........-m.....
        backtrace:
          [<000000002b1c4368>] __alloc_skb+0x229/0x320 net/core/skbuff.c:497
          [<00000000143579a6>] alloc_skb include/linux/skbuff.h:1265 [inline]
          [<00000000143579a6>] sock_omalloc+0xaa/0x190 net/core/sock.c:2596
          [<00000000be626478>] msg_zerocopy_alloc net/core/skbuff.c:1294 [inline]
          [<00000000be626478>] msg_zerocopy_realloc+0x1ce/0x7f0 net/core/skbuff.c:1370
          [<00000000cbfc9870>] __ip_append_data+0x2adf/0x3b30 net/ipv4/ip_output.c:1037
          [<0000000089869146>] ip_make_skb+0x26c/0x2e0 net/ipv4/ip_output.c:1652
          [<00000000098015c2>] udp_sendmsg+0x1bac/0x2390 net/ipv4/udp.c:1253
          [<0000000045e0e95e>] inet_sendmsg+0x10a/0x150 net/ipv4/af_inet.c:819
          [<000000008d31bfde>] sock_sendmsg_nosec net/socket.c:714 [inline]
          [<000000008d31bfde>] sock_sendmsg+0x141/0x190 net/socket.c:734
          [<0000000021e21aa4>] __sys_sendto+0x243/0x360 net/socket.c:2117
          [<00000000ac0af00c>] __do_sys_sendto net/socket.c:2129 [inline]
          [<00000000ac0af00c>] __se_sys_sendto net/socket.c:2125 [inline]
          [<00000000ac0af00c>] __x64_sys_sendto+0xe1/0x1c0 net/socket.c:2125
          [<0000000066999e0e>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
          [<0000000066999e0e>] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
          [<0000000017f238c1>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Fixes: f214f915 ("tcp: enable MSG_ZEROCOPY")
      Fixes: b5947e5d ("udp: msg_zerocopy")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • ipv4: Fix potential uninit variable access bug in __ip_make_skb() · e76e6556
      Ziyang Xuan committed
      stable inclusion
      from stable-v4.19.283
      commit 022ea4374c319690c804706bda9dc42946d1556d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 99e5acae ]
      
      Like commit ea30388b ("ipv6: Fix an uninit variable access bug in
      __ip6_make_skb()"), icmphdr is not in the skb linear region in the
      SOCK_RAW socket scenario. Accessing icmp_hdr(skb)->type directly will
      trigger the uninit variable access bug.
      
      Use a local variable icmp_type to carry the correct value in different
      scenarios.
      
      Fixes: 96793b48 ("[IPV4]: Add ICMPMsgStats MIB (RFC 4293)")
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • crypto: drbg - Only fail when jent is unavailable in FIPS mode · 262311ea
      Herbert Xu committed
      stable inclusion
      from stable-v4.19.283
      commit 1fd247c1ded58f9bc1130fe4b26fb187fa1af55d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 686cd976 ]
      
      When jent initialisation fails for any reason other than ENOENT,
      the entire drbg fails to initialise, even when we're not in FIPS
      mode.  This is wrong because we can still use the kernel RNG when
      we're not in FIPS mode.
      
      Change it so that it only fails when we are in FIPS mode.
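
The intended behaviour can be sketched in userspace as follows. This is a hedged illustration rather than the crypto code itself: `toy_drbg`, `toy_prepare_hrng()` and its parameters are hypothetical, and the allocation result is passed in instead of being produced by a real RNG allocation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical DRBG state; jent is the optional jitter entropy source. */
struct toy_drbg {
    void *jent;
};

/*
 * alloc_err is 0 on success or a negative errno from the (simulated)
 * jitter entropy allocation. Outside FIPS mode a failure is tolerated
 * and the DRBG continues without jent; in FIPS mode it is fatal.
 */
static int toy_prepare_hrng(struct toy_drbg *d, void *jent, int alloc_err,
                            bool fips_enabled)
{
    if (alloc_err) {
        d->jent = NULL;
        if (fips_enabled)
            return alloc_err;
        return 0; /* fall back to the other entropy source */
    }
    d->jent = jent;
    return 0;
}
```

Clearing `jent` before deciding keeps later code from dereferencing an error value, whichever branch is taken.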
      
      Fixes: 57225e67 ("crypto: drbg - Use callback API for random readiness")
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reviewed-by: Stephan Mueller <smueller@chronox.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • crypto: drbg - make drbg_prepare_hrng() handle jent instantiation errors · 72a9e727
      Nicolai Stange committed
      stable inclusion
      from stable-v4.19.283
      commit f1943e5703861f89f4376596e3d28d0dd52c5ead
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BZ5U
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 559edd47 ]
      
      Now that drbg_prepare_hrng() doesn't do anything but instantiate a
      jitterentropy crypto_rng instance, it looks a little odd to have the
      related error handling at its only caller, drbg_instantiate().
      
      Move the handling of jitterentropy allocation failures from
      drbg_instantiate() close to the allocation itself in drbg_prepare_hrng().
      
      There is no change in behaviour.
      
      Signed-off-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Stephan Müller <smueller@chronox.de>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Stable-dep-of: 686cd976 ("crypto: drbg - Only fail when jent is unavailable in FIPS mode")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>