1. 12 Jun, 2023 (3 commits)
  2. 09 Jun, 2023 (2 commits)
    • W
      sched: smart grid: init sched_grid_qos structure on QOS purpose · ce35ded5
      Authored by Wang ShaoBo
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      As smart grid scheduling (SGS) may shrink resources and affect task QOS,
      we provide methods for evaluating task QOS in each divided grid. We mainly
      focus on the following two aspects:

         1. Evaluate whether resources (such as CPU or memory) meet our demand
         2. Minimize the impact on the governors (cpufreq and cpuidle) we work with

      To tackle these questions, we have summarized several sampling methods
      that obtain tasks' characteristics while reducing scheduling noise
      as much as possible:
      
        1. We detect the key factors that determine how sensitive a process is
           to cpufreq or cpuidle adjustment, and use them to guide the
           cpufreq/cpuidle governor
        2. We dynamically monitor process memory bandwidth and adjust memory
           allocation to minimize cross-node remote memory access
        3. We provide a variety of load tracking mechanisms to adapt to
           different patterns of task load change
      
           ---------------------------------     -----------------
          |            class A              |   |     class B     |
          |    --------        --------     |   |     --------    |
          |   | group0 |      | group1 |    |---|    | group2 |   |----------+
          |    --------        --------     |   |     --------    |          |
          |    CPU/memory sensitive type    |   |   balance type  |          |
           ----------------+----------------     --------+--------           |
                           v                             v                   | (target cpufreq)
           -------------------------------------------------------           | (sensitivity)
          |              Not satisfied with QOS?                  |          |
           --------------------------+----------------------------           |
                                     v                                       v
           -------------------------------------------------------     ----------------
          |              expand or shrink resource                |<--|  energy model  |
           ----------------------------+--------------------------     ----------------
                                       v                                     |
           -----------          -----------          ------------            v
          |           |        |           |        |            |     ---------------
          |   GRID0   +--------+   GRID1   +--------+   GRID2    |<-- |   governor    |
          |           |        |           |        |            |     ---------------
           -----------          -----------          ------------
                         \            |            /
                          \  -------------------  /
                            |  pages migration  |
                             -------------------
      
      We will introduce the energy model in a follow-up implementation, and will
      consider dynamic affinity adjustment between the divided grids at runtime.
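
      The per-grid QOS evaluation described above can be sketched as a small
      userspace model. This is purely illustrative: the struct and function
      names below are hypothetical and do not reflect the actual
      sched_grid_qos layout in the kernel.

      ```c
      /* Hypothetical model of per-grid QOS bookkeeping; field and function
       * names are illustrative only, not the real sched_grid_qos layout. */
      #include <assert.h>
      #include <stdint.h>

      struct grid_qos_sample {
          uint64_t cpu_avail;   /* CPU capacity available inside the grid */
          uint64_t cpu_demand;  /* CPU capacity the grid's tasks require  */
      };

      /* Return 1 when the grid's resources satisfy the demand, 0 when the
       * grid should be expanded (the "Not satisfied with QOS?" box above). */
      static int grid_qos_satisfied(const struct grid_qos_sample *s)
      {
          return s->cpu_avail >= s->cpu_demand;
      }

      int main(void)
      {
          struct grid_qos_sample ok      = { .cpu_avail = 80, .cpu_demand = 60 };
          struct grid_qos_sample starved = { .cpu_avail = 40, .cpu_demand = 60 };

          assert(grid_qos_satisfied(&ok) == 1);       /* keep current grid  */
          assert(grid_qos_satisfied(&starved) == 0);  /* expand resources   */
          return 0;
      }
      ```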
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      ce35ded5
    • H
      sched: Introduce smart grid scheduling strategy for cfs · 713cfd26
      Authored by Hui Tang
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7BQZ0
      CVE: NA
      
      ----------------------------------------
      
      We want to dynamically expand or shrink the affinity range of tasks
      based on the CPU topology level while meeting the minimum resource
      requirements of tasks.
      
      We divide several level of affinity domains according to sched domains:
      
      level4   * SOCKET  [                                                  ]
      level3   * DIE     [                             ]
      level2   * MC      [             ] [             ]
      level1   * SMT     [     ] [     ] [     ] [     ]
      level0   * CPU      0   1   2   3   4   5   6   7
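
      The five levels in the diagram above can be modeled as CPU bitmasks for
      this 8-CPU topology. This is an illustrative sketch (the function and
      enum names are hypothetical, not kernel code); in this single-socket
      drawing the DIE and SOCKET levels both span all 8 CPUs.

      ```c
      /* Sketch of the affinity levels above for an 8-CPU system, expressed
       * as CPU bitmasks; a task at level N may run on any set bit. */
      #include <assert.h>
      #include <stdint.h>

      enum grid_level { LV_CPU, LV_SMT, LV_MC, LV_DIE, LV_SOCKET };

      /* Width (in CPUs) of the affinity domain at each level, matching the
       * SMT=2, MC=4, DIE=8, SOCKET=8 topology drawn above. */
      static const unsigned level_width[] = { 1, 2, 4, 8, 8 };

      /* Affinity mask for `cpu` when its task is expanded to `level`. */
      static uint8_t affinity_mask(unsigned cpu, enum grid_level level)
      {
          unsigned w = level_width[level];
          unsigned base = (cpu / w) * w;        /* first CPU of the domain */
          return (uint8_t)(((1u << w) - 1) << base);
      }

      int main(void)
      {
          assert(affinity_mask(2, LV_CPU) == 0x04);     /* CPU 2 only */
          assert(affinity_mask(2, LV_SMT) == 0x0c);     /* CPUs 2-3   */
          assert(affinity_mask(2, LV_MC)  == 0x0f);     /* CPUs 0-3   */
          assert(affinity_mask(2, LV_SOCKET) == 0xff);  /* CPUs 0-7   */
          return 0;
      }
      ```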
      
      Whether users tend to choose power saving or performance affects the
      strategy for adjusting affinity. When the power saving mode is selected,
      we choose a more appropriate affinity based on the energy model to
      reduce power consumption, while considering the QOS of resources such
      as CPU and memory. For instance, if the current task's CPU load is less
      than required, smart grid judges, according to the energy model, whether
      to aggregate tasks into a smaller range or not.

      The main difference from EAS is that we pay more attention to the power
      consumption impact of mechanisms such as cpuidle and DVFS, and we
      classify tasks to reduce interference and ensure resource QOS in each
      divided unit, which is more suitable for general-purpose workloads on
      non-heterogeneous CPUs.
      
              --------        --------        --------
             | group0 |      | group1 |      | group2 |
              --------        --------        --------
      	   |                |              |
      	   v                |              v
             ---------------------+-----     -----------------
            |                  ---v--   |   |
            |       DIE0      |  MC1 |  |   |   DIE1
            |                  ------   |   |
             ---------------------------     -----------------
      
      We regularly count the resource satisfaction of groups and adjust the
      affinity accordingly; scheduling balance and memory migration are
      considered based on memory location to better meet resource requirements.
      Signed-off-by: Hui Tang <tanghui20@huawei.com>
      Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
      Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
      Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
      Signed-off-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      713cfd26
  3. 08 Jun, 2023 (25 commits)
  4. 07 Jun, 2023 (6 commits)
    • Z
      block: Fix the partition start may overflow in add_partition() · a325a269
      Authored by Zhong Jinghua
      hulk inclusion
      category: bugfix
      bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I76JDY
      CVE: NA
      
      ----------------------------------------
      
      Through block_ioctl, the unsigned number 0x8000000000000000 can be
      passed in as an input parameter, as shown below:
      
      block_ioctl
        blkdev_ioctl
          blkpg_ioctl
            blkpg_do_ioctl
              copy_from_user
              bdev_add_partition
                add_partition
                  p->start_sect = start; // start = 0x8000000000000000
      
      This then triggers a warning when submitting a bio:
      
      WARNING: CPU: 0 PID: 382 at fs/iomap/apply.c:54
      Call trace:
       iomap_apply+0x644/0x6e0
       __iomap_dio_rw+0x5cc/0xa24
       iomap_dio_rw+0x4c/0xcc
       ext4_dio_read_iter
       ext4_file_read_iter
       ext4_file_read_iter+0x318/0x39c
       call_read_iter
       lo_rw_aio.isra.0+0x748/0x75c
       do_req_filebacked+0x2d4/0x370
       loop_handle_cmd
       loop_queue_work+0x94/0x23c
       kthread_worker_fn+0x160/0x6bc
       loop_kthread_worker_fn+0x3c/0x50
       kthread+0x20c/0x25c
       ret_from_fork+0x10/0x18
      
      Stack:
      
      submit_bio_noacct
        submit_bio_checks
          blk_partition_remap
            bio->bi_iter.bi_sector += p->start_sect
            // bio->bi_iter.bi_sector = 0xffc0000000000000 + 65408
      ..
      loop_queue_work
       loop_handle_cmd
        do_req_filebacked
         pos = ((loff_t) blk_rq_pos(rq) << 9) + lo->lo_offset // pos < 0
         lo_rw_aio
           call_read_iter
            ext4_dio_read_iter
             __iomap_dio_rw
              iomap_apply
               ext4_iomap_begin
                 map.m_lblk = offset >> blkbits
                   ext4_set_iomap
                   iomap->offset = (u64) map->m_lblk << blkbits
                   // iomap->offset = 64512
               WARN_ON(iomap.offset > pos) // iomap.offset = 64512 and pos < 0
      
      A partition with start + length > disk->part0.nr_sects is invalid; a
      similar check already exists in blk_add_partition().
      Fix it by adding the same check to bdev_add_partition().
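
      The shape of such a range check can be sketched in userspace as follows.
      The names are illustrative, not the exact kernel code; the key detail is
      writing the comparison as `length > capacity - start` so the huge start
      value from the report cannot cause unsigned wraparound.

      ```c
      /* Illustrative partition range check: reject a partition whose start
       * or end sector falls outside the disk, without unsigned overflow. */
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      typedef uint64_t sector_t;

      static bool part_in_range(sector_t start, sector_t length, sector_t capacity)
      {
          if (start >= capacity)
              return false;                 /* start past end of disk */
          return length <= capacity - start; /* cannot overflow: start < capacity */
      }

      int main(void)
      {
          sector_t cap = 1 << 20;                      /* small test disk */
          assert(part_in_range(0, cap, cap));          /* whole disk: ok  */
          assert(!part_in_range(0x8000000000000000ULL, 128, cap)); /* the bug */
          assert(!part_in_range(cap - 1, 2, cap));     /* runs past the end */
          return 0;
      }
      ```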
      Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      a325a269
    • C
      block: refactor blkpg_ioctl · ad0628d3
      Authored by Christoph Hellwig
      mainline inclusion
      from mainline-v5.8-rc1
      commit fa9156ae
      category: bugfix
      bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I76JDY
      CVE: NA
      
      ----------------------------------------
      
      Split each sub-command out into a separate helper, and move those helpers
      to block/partitions/core.c instead of having a lot of partition
      manipulation logic open coded in block/ioctl.c.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      
      conflicts:
      	block/ioctl.c
      	block/partition-generic.c
      	include/linux/genhd.h
      Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      ad0628d3
    • Z
      nbd: get config_lock before sock_shutdown · 4b53fff3
      Authored by Zhong Jinghua
      hulk inclusion
      category: bugfix
      bugzilla: 188799, https://gitee.com/openeuler/kernel/issues/I79QWO
      CVE: NA
      
      ----------------------------------------
      
      Accessing config->socks in sock_shutdown may trigger a use-after-free
      (UAF). The reason is that sock_shutdown does not hold config_lock, so
      a concurrent nbd_ioctl can release config->socks in the meantime.
      
      T0: NBD_SET_SOCK
      T1: NBD_DO_IT
      
      T0						T1
      
      nbd_ioctl
        mutex_lock(&nbd->config_lock)
        // get lock
        __nbd_ioctl
      	nbd_start_device_ioctl
      	  nbd_start_device
      	  mutex_unlock(&nbd->config_lock)
      	  // release lock
      	  wait_event_interruptible
      	  (kill, enter sock_shutdown)
      	  sock_shutdown
      					nbd_ioctl
      					  mutex_lock(&nbd->config_lock)
      					  // get lock
      					  __nbd_ioctl
      					    nbd_add_socket
      					      krealloc
      						kfree(p)
      					        // old config->socks is freed
      	    nbd_sock *nsock = config->socks // use-after-free
      
      Fix it by acquiring config_lock before calling sock_shutdown.
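
      The fixed ordering can be modeled in userspace with a pthread mutex.
      This is a hedged sketch with hypothetical names, not the nbd driver
      code: the point is simply that config->socks is only walked while
      config_lock is held, so a concurrent ioctl cannot free it mid-walk.

      ```c
      /* Userspace model of the fix: shut sockets down only while holding
       * config_lock, mirroring "move config_lock up before sock_shutdown". */
      #include <assert.h>
      #include <pthread.h>
      #include <stdlib.h>

      struct config {
          pthread_mutex_t config_lock;
          int *socks;        /* stands in for the nbd config->socks array */
          int num_socks;
      };

      /* Walk the sock array; safe only because the caller holds config_lock. */
      static int sock_shutdown_locked(struct config *c)
      {
          int shut = 0;
          for (int i = 0; i < c->num_socks; i++)
              shut++;        /* stands in for kernel_sock_shutdown() */
          return shut;
      }

      int main(void)
      {
          struct config c = { PTHREAD_MUTEX_INITIALIZER, NULL, 0 };
          c.socks = calloc(4, sizeof(int));
          c.num_socks = 4;

          pthread_mutex_lock(&c.config_lock);   /* take the lock first...   */
          int shut = sock_shutdown_locked(&c);  /* ...then touch the socks  */
          pthread_mutex_unlock(&c.config_lock);

          assert(shut == 4);
          free(c.socks);
          return 0;
      }
      ```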
      Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4b53fff3
    • D
      ipv6: sr: fix out-of-bounds read when setting HMAC data. · d9d1ff02
      Authored by David Lebrun
      stable inclusion
      from stable-v4.19.258
      commit f684c16971ed5e77dfa25a9ad25b5297e1f58eab
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I7ASU6
      CVE: CVE-2023-2860
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f684c16971ed5e77dfa25a9ad25b5297e1f58eab
      
      --------------------------------
      
      [ Upstream commit 84a53580 ]
      
      The SRv6 layer allows defining HMAC data that can later be used to sign IPv6
      Segment Routing Headers. This configuration is realised via netlink through
      four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and
      SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual
      length of the SECRET attribute, it is possible to provide invalid combinations
      (e.g., secret = "", secretlen = 64). This case is not checked in the code and
      with an appropriately crafted netlink message, an out-of-bounds read of up
      to 64 bytes (max secret length) can occur past the skb end pointer and into
      skb_shared_info:
      
      Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
      208		memcpy(hinfo->secret, secret, slen);
      (gdb) bt
       #0  seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208
       #1  0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600,
          extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>,
          family=<optimized out>) at net/netlink/genetlink.c:731
       #2  0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00,
          family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775
       #3  genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792
       #4  0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>)
          at net/netlink/af_netlink.c:2501
       #5  0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803
       #6  0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000)
          at net/netlink/af_netlink.c:1319
       #7  netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>)
          at net/netlink/af_netlink.c:1345
       #8  0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921
      ...
      (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end
      $1 = 0xffff88800b1b76c0
      (gdb) p/x secret
      $2 = 0xffff88800b1b76c0
      (gdb) p slen
      $3 = 64 '@'
      
      The OOB data can then be read back from userspace by dumping HMAC state. This
      commit fixes this by ensuring SECRETLEN cannot exceed the actual length of
      SECRET.
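
      The essence of the fix can be sketched as the following userspace model.
      The function name and signature are illustrative (the real code lives in
      seg6_genl_sethmac()); what matters is that the declared secret length is
      bounded by the number of bytes the SECRET attribute actually carries.

      ```c
      /* Illustrative model of the bounds check: the declared secret length
       * must not exceed the bytes actually present in the SECRET attribute. */
      #include <assert.h>
      #include <errno.h>
      #include <stddef.h>
      #include <string.h>

      /* Returns 0 on success, -EINVAL on a bad slen, mirroring the commit's
       * "EINVAL when secretlen > len(secret)" behaviour. */
      static int set_hmac(char *dst, size_t dst_size,
                          const char *secret, size_t secret_attr_len, size_t slen)
      {
          if (slen > dst_size || slen > secret_attr_len)
              return -EINVAL;        /* would read past the attribute end */
          memcpy(dst, secret, slen); /* now provably in bounds */
          return 0;
      }

      int main(void)
      {
          char key[64];
          assert(set_hmac(key, sizeof(key), "topsecret", 9, 9) == 0);
          assert(set_hmac(key, sizeof(key), "", 0, 64) == -EINVAL); /* the CVE */
          return 0;
      }
      ```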
      Reported-by: Lucas Leong <wmliang.tw@gmail.com>
      Tested: verified that EINVAL is correctly returned when secretlen > len(secret)
      Fixes: 4f4853dc ("ipv6: sr: implement API to control SR HMAC structure")
      Signed-off-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      d9d1ff02
    • L
      dm: add disk before alloc dax · ee776389
      Authored by Li Lingfeng
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I78SWJ
      CVE: NA
      
      -------------------------------
      
      In dm_create(), alloc_dev() may trigger a panic if alloc_dax() fails,
      since del_gendisk() will be called even though add_disk() was never
      called beforehand.

      Call add_disk() before alloc_dax() to avoid this.
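
      The underlying pattern is that an error path must only undo steps that
      actually ran. The following userspace model (hypothetical names, not the
      dm code) shows why reordering fixes the panic: once add_disk() runs
      first, the del_gendisk() in the error path always has something to undo.

      ```c
      /* Model of setup/teardown symmetry: del_gendisk() must never run
       * unless add_disk() ran first. */
      #include <assert.h>
      #include <stdbool.h>

      static bool disk_added;

      static void add_disk(void)    { disk_added = true; }
      static void del_gendisk(void) { assert(disk_added); disk_added = false; }
      static bool alloc_dax(bool ok){ return ok; } /* ok flag simulates failure */

      /* Fixed ordering: add_disk() before alloc_dax(), so the error path's
       * del_gendisk() is always safe. */
      static int alloc_dev(bool dax_ok)
      {
          add_disk();
          if (!alloc_dax(dax_ok)) {
              del_gendisk();       /* safe: add_disk() already ran */
              return -1;
          }
          return 0;
      }

      int main(void)
      {
          assert(alloc_dev(true) == 0);     /* success path */
          assert(alloc_dev(false) == -1);   /* error path no longer trips */
          return 0;
      }
      ```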
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      ee776389
    • L
      dm thin: Fix ABBA deadlock by resetting dm_bufio_client · c7ca9c06
      Authored by Li Lingfeng
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I79ZEK
      CVE: NA
      
      --------------------------------
      
      As described in commit 273494b2 ("dm thin: Fix ABBA deadlock between
      shrink_slab and dm_pool_abort_metadata"), an ABBA deadlock will be
      triggered because shrinker_rwsem needs to be held when operations on
      the dm pool metadata fail.
      
      We have noticed the following three problem scenarios:
      1) Described by commit 273494b2 ("dm thin: Fix ABBA deadlock between
      shrink_slab and dm_pool_abort_metadata")
      
      2) shrinker_rwsem and throttle->lock
                P1(drop cache)                        P2(kworker)
      drop_caches_sysctl_handler
       drop_slab
        shrink_slab
         down_read(&shrinker_rwsem)  - LOCK A
         do_shrink_slab
          super_cache_scan
           prune_icache_sb
            dispose_list
             evict
              ext4_evict_inode
               ext4_clear_inode
                ext4_discard_preallocations
                 ext4_mb_load_buddy_gfp
                  ext4_mb_init_cache
                   ext4_wait_block_bitmap
                    __ext4_error
                     ext4_handle_error
                      ext4_commit_super
                       ...
                       dm_submit_bio
                                           do_worker
                                            throttle_work_update
                                             down_write(&t->lock) -- LOCK B
                                            process_deferred_bios
                                             commit
                                              metadata_operation_failed
                                               dm_pool_abort_metadata
                                                dm_block_manager_create
                                                 dm_bufio_client_create
                                                  register_shrinker
                                                   down_write(&shrinker_rwsem)
                                                   -- LOCK A
                       thin_map
                        thin_bio_map
                         thin_defer_bio_with_throttle
                          throttle_lock
                           down_read(&t->lock)  - LOCK B
      
      3) shrinker_rwsem and wait_on_buffer
                P1(drop cache)                            P2(kworker)
      drop_caches_sysctl_handler
       drop_slab
        shrink_slab
         down_read(&shrinker_rwsem)  - LOCK A
         do_shrink_slab
         ...
          ext4_wait_block_bitmap
           __ext4_error
            ext4_handle_error
             jbd2_journal_abort
              jbd2_journal_update_sb_errno
               jbd2_write_superblock
                submit_bh
                 // LOCK B
                 // RELEASE B
                                   do_worker
                                    throttle_work_update
                                     down_write(&t->lock) - LOCK B
                                    process_deferred_bios
                                     process_bio
                                     commit
                                      metadata_operation_failed
                                       dm_pool_abort_metadata
                                        dm_block_manager_create
                                         dm_bufio_client_create
                                          register_shrinker
                                           register_shrinker_prepared
                                            down_write(&shrinker_rwsem)  - LOCK A
                                     bio_endio
            wait_on_buffer
             __wait_on_buffer
      
      Fix these by resetting dm_bufio_client without holding shrinker_rwsem.
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      c7ca9c06
  5. 06 Jun, 2023 (4 commits)