1. 23 May, 2023 8 commits
  2. 22 May, 2023 10 commits
    • J
      bpf: support BPF_PROG_QUERY for progs attached to sockmap · 05038388
      JofDiamonds committed on
      mainline inclusion
      from mainline-v6.4-rc3
      commit 748cd572
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I776SR
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=748cd5729ac7421091316e32dcdffb0578563880
      
      ----------------------------------------------------------------------
      
      Right now there is no way to query whether BPF programs are
      attached to a sockmap or not.
      
      With this change, the standard libbpf interface can be used to query
      them, for example:
      bpf_prog_query(mapFd, BPF_SK_SKB_STREAM_PARSER, 0, NULL, ...);
      where mapFd is the fd of the sockmap.
      Signed-off-by: Di Zhu <zhudi2@huawei.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
      Link: https://lore.kernel.org/r/20220119014005.1209-1-zhudi2@huawei.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Conflicts:
      	net/core/sock_map.c
      	include/linux/bpf.h
      Signed-off-by: JofDiamonds <kwb0523@163.com>
      Reviewed-by: wuchangye <wuchangye@huawei.com>
      05038388
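      The query described in commit 05038388 can be sketched as follows. This
      is a minimal sketch, not the patch's own code: it assumes the raw bpf(2)
      syscall is used instead of libbpf's bpf_prog_query() so that no extra
      library is needed, and the helper name sockmap_query_progs is
      hypothetical. On kernels without this change, or without the required
      privileges, the call is expected to fail with a negative errno value.

      ```c
      #define _GNU_SOURCE
      #include <errno.h>
      #include <linux/bpf.h>
      #include <stdint.h>
      #include <string.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      /*
       * Hypothetical helper: ask the kernel which programs of the given attach
       * type are attached to the sockmap behind map_fd.
       * Returns the number of attached programs, or a negative errno value.
       * Up to `cap` program IDs are stored in `ids` (may be NULL when cap is 0).
       */
      static int sockmap_query_progs(int map_fd, enum bpf_attach_type type,
                                     uint32_t *ids, uint32_t cap)
      {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.query.target_fd = (uint32_t)map_fd;   /* the sockmap fd */
          attr.query.attach_type = type;             /* e.g. BPF_SK_SKB_STREAM_PARSER */
          attr.query.prog_ids = (uint64_t)(uintptr_t)ids;
          attr.query.prog_cnt = cap;

          if (syscall(SYS_bpf, BPF_PROG_QUERY, &attr, sizeof(attr)) < 0)
              return -errno;

          /* On success the kernel writes back the number of attached programs. */
          return (int)attr.query.prog_cnt;
      }
      ```

      A caller would pass the fd of an existing sockmap, for example
      sockmap_query_progs(mapFd, BPF_SK_SKB_STREAM_PARSER, ids, 1), and treat a
      return value of 0 as "no parser program attached".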
    • O
      !780 Backport 5.10.152 LTS · a74e16ec
      openeuler-ci-bot committed on
      Merge Pull Request from: @sanglipeng 
       
      Backport 5.10.152 LTS patches from upstream.
      
      Conflicts:
      
      Already merged(6):
      392536023da1 block: wbt: Remove unnecessary invoking of wbt_update_limits in wbt_init
      910ba49b3345 blk-wbt: call rq_qos_add() after wb_normal is initialized 
      51b96ecaedc0 arm64: errata: Remove AES hwcap for COMPAT tasks  
      7aa3d623c11b net: sched: fix race condition in qdisc_graft()  
      f687e2111b6f fcntl: fix potential deadlocks for &fown_struct.lock 
        (merged mainline commit f671a691 fcntl: fix potential deadlocks for &fown_struct.lock)
      31b1570677e8 blk-wbt: fix that 'rwb->wc' is always set to 1 in wbt_init() 
      
      Context conflict(3):
      dea47fefa6aa perf pmu: Validate raw event with sysfs exported format bits
      b1efc196446a fcntl: make F_GETOWN(EX) return 0 on dead owner task
      
      Rejected(1):
      a6e770733dc4 arm64: topology: move store_cpu_topology() to shared code
      
      Total patches: 72 - 6 - 1 = 65 
       
      Link:https://gitee.com/openeuler/kernel/pulls/780 
      
      Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> 
      a74e16ec
    • P
      netfilter: nf_tables: deactivate anonymous set from preparation phase · dcb69fcc
      Pablo Neira Ayuso committed on
      stable inclusion
      from stable-v5.10.180
      commit e044a24447189419c3a7ccc5fa6da7516036dc55
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I71F49
      CVE: CVE-2023-32233
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e044a24447189419c3a7ccc5fa6da7516036dc55
      
      --------------------------------
      
      commit c1592a89 upstream.
      
      Toggle deleted anonymous sets as inactive in the next generation, so
      users cannot perform any update on it. Clear the generation bitmask
      in case the transaction is aborted.
      
      The following KASAN splat shows a set element deletion for a bound
      anonymous set that has been already removed in the same transaction.
      
      [   64.921510] ==================================================================
      [   64.923123] BUG: KASAN: wild-memory-access in nf_tables_commit+0xa24/0x1490 [nf_tables]
      [   64.924745] Write of size 8 at addr dead000000000122 by task test/890
      [   64.927903] CPU: 3 PID: 890 Comm: test Not tainted 6.3.0+ #253
      [   64.931120] Call Trace:
      [   64.932699]  <TASK>
      [   64.934292]  dump_stack_lvl+0x33/0x50
      [   64.935908]  ? nf_tables_commit+0xa24/0x1490 [nf_tables]
      [   64.937551]  kasan_report+0xda/0x120
      [   64.939186]  ? nf_tables_commit+0xa24/0x1490 [nf_tables]
      [   64.940814]  nf_tables_commit+0xa24/0x1490 [nf_tables]
      [   64.942452]  ? __kasan_slab_alloc+0x2d/0x60
      [   64.944070]  ? nf_tables_setelem_notify+0x190/0x190 [nf_tables]
      [   64.945710]  ? kasan_set_track+0x21/0x30
      [   64.947323]  nfnetlink_rcv_batch+0x709/0xd90 [nfnetlink]
      [   64.948898]  ? nfnetlink_rcv_msg+0x480/0x480 [nfnetlink]
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      dcb69fcc
    • D
      xfs: verify buffer contents when we skip log replay · d38a530e
      Darrick J. Wong committed on
      mainline inclusion
      from mainline-v6.3-rc6
      commit 22ed903e
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6X4UN
      CVE: CVE-2023-2124
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=22ed903eee23a5b174e240f1cdfa9acf393a5210
      
      --------------------------------
      
      syzbot detected a crash during log recovery:
      
      XFS (loop0): Mounting V5 Filesystem bfdc47fc-10d8-4eed-a562-11a831b3f791
      XFS (loop0): Torn write (CRC failure) detected at log block 0x180. Truncating head block from 0x200.
      XFS (loop0): Starting recovery (logdev: internal)
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
      Read of size 8 at addr ffff88807e89f258 by task syz-executor132/5074
      
      CPU: 0 PID: 5074 Comm: syz-executor132 Not tainted 6.2.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
       print_address_description+0x74/0x340 mm/kasan/report.c:306
       print_report+0x107/0x1f0 mm/kasan/report.c:417
       kasan_report+0xcd/0x100 mm/kasan/report.c:517
       xfs_btree_lookup_get_block+0x15c/0x6d0 fs/xfs/libxfs/xfs_btree.c:1813
       xfs_btree_lookup+0x346/0x12c0 fs/xfs/libxfs/xfs_btree.c:1913
       xfs_btree_simple_query_range+0xde/0x6a0 fs/xfs/libxfs/xfs_btree.c:4713
       xfs_btree_query_range+0x2db/0x380 fs/xfs/libxfs/xfs_btree.c:4953
       xfs_refcount_recover_cow_leftovers+0x2d1/0xa60 fs/xfs/libxfs/xfs_refcount.c:1946
       xfs_reflink_recover_cow+0xab/0x1b0 fs/xfs/xfs_reflink.c:930
       xlog_recover_finish+0x824/0x920 fs/xfs/xfs_log_recover.c:3493
       xfs_log_mount_finish+0x1ec/0x3d0 fs/xfs/xfs_log.c:829
       xfs_mountfs+0x146a/0x1ef0 fs/xfs/xfs_mount.c:933
       xfs_fs_fill_super+0xf95/0x11f0 fs/xfs/xfs_super.c:1666
       get_tree_bdev+0x400/0x620 fs/super.c:1282
       vfs_get_tree+0x88/0x270 fs/super.c:1489
       do_new_mount+0x289/0xad0 fs/namespace.c:3145
       do_mount fs/namespace.c:3488 [inline]
       __do_sys_mount fs/namespace.c:3697 [inline]
       __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f89fa3f4aca
      Code: 83 c4 08 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fffd5fb5ef8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 00646975756f6e2c RCX: 00007f89fa3f4aca
      RDX: 0000000020000100 RSI: 0000000020009640 RDI: 00007fffd5fb5f10
      RBP: 00007fffd5fb5f10 R08: 00007fffd5fb5f50 R09: 000000000000970d
      R10: 0000000000200800 R11: 0000000000000206 R12: 0000000000000004
      R13: 0000555556c6b2c0 R14: 0000000000200800 R15: 00007fffd5fb5f50
       </TASK>
      
      The fuzzed image contains an AGF with an obviously garbage
      agf_refcount_level value of 32, and a dirty log with a buffer log item
      for that AGF.  The ondisk AGF has a higher LSN than the recovered log
      item.  xlog_recover_buf_commit_pass2 reads the buffer, compares the
      LSNs, and decides to skip replay because the ondisk buffer appears to be
      newer.
      
      Unfortunately, the ondisk buffer is corrupt, but recovery just read the
      buffer with no buffer ops specified:
      
      	error = xfs_buf_read(mp->m_ddev_targp, buf_f->blf_blkno,
      			buf_f->blf_len, buf_flags, &bp, NULL);
      
      Skipping the buffer leaves its contents in memory unverified.  This sets
      us up for a kernel crash because xfs_refcount_recover_cow_leftovers
      reads the buffer (which is still around in XBF_DONE state, so no read
      verification) and creates a refcountbt cursor of height 32.  This is
      impossible so we run off the end of the cursor object and crash.
      
      Fix this by invoking the verifier on all skipped buffers and aborting
      log recovery if the ondisk buffer is corrupt.  It might be smarter to
      force replay the log item atop the buffer and then see if it'll pass the
      write verifier (like ext4 does) but for now let's go with the
      conservative option where we stop immediately.
      
      Link: https://syzkaller.appspot.com/bug?extid=7e9494b8b399902e994e
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      d38a530e
    • Z
      iommu/arm-smmu-v3: Fix ECMDQs is not initialized correctly · 2dd184bf
      Zhen Lei committed on
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6WAZX
      
      --------------------------------
      
      When the number of cores is greater than the number of ECMDQs, the number
      of ECMDQs occupied by each NUMA node is less than the number of cores of
      the node. Therefore, the first smmu->nr_ecmdq cores do not cover all
      ECMDQs.
      
      For example:
       ---------------------------------------
      |       Node0       |       Node1       |
      |---------------------------------------|
      |   0   1   2   3   |   4   5   6   7   |  CPU ID
      |---------------------------------------|
      |      0      1     |      2      3     |  ECMDQ ID
       ---------------------------------------
      
      Fixes: 3965519b ("iommu/arm-smmu-v3: Add support for less than one ECMDQ per core")
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      2dd184bf
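      The coverage problem in commit 2dd184bf can be illustrated with a small
      model of the diagram. This is only a sketch under stated assumptions: the
      constants mirror the example (8 CPUs, 2 nodes, 4 ECMDQs), and the mapping
      in cpu_to_ecmdq() is an assumed even split read off the diagram, not the
      driver's actual allocation code. All names are hypothetical.

      ```c
      /* Example topology taken from the diagram in the commit message. */
      enum { EX_NR_CPUS = 8, EX_NR_NODES = 2, EX_NR_ECMDQ = 4 };

      /*
       * Assumed mapping: ECMDQs are split evenly between nodes, and the CPUs
       * of a node are spread evenly over that node's ECMDQs
       * (CPU 0,1 -> 0; CPU 2,3 -> 1; CPU 4,5 -> 2; CPU 6,7 -> 3).
       */
      static int cpu_to_ecmdq(int cpu)
      {
          int cpus_per_node = EX_NR_CPUS / EX_NR_NODES;
          int ecmdqs_per_node = EX_NR_ECMDQ / EX_NR_NODES;
          int node = cpu / cpus_per_node;
          int idx_in_node = cpu % cpus_per_node;

          return node * ecmdqs_per_node +
                 idx_in_node * ecmdqs_per_node / cpus_per_node;
      }

      /* Bitmask of the ECMDQs reached when only CPUs [0, ncpus) are visited. */
      static unsigned int ecmdqs_reached(int ncpus)
      {
          unsigned int mask = 0;

          for (int cpu = 0; cpu < ncpus; cpu++)
              mask |= 1u << cpu_to_ecmdq(cpu);
          return mask;
      }
      ```

      In this model, visiting only the first EX_NR_ECMDQ CPUs (0 to 3) reaches
      ECMDQs 0 and 1 and never touches ECMDQs 2 and 3 on Node1, which is the
      gap the commit message describes; visiting all CPUs reaches all four.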
    • O
      !791 crypto: hisilicon/qm - support dumping stop queue status · 81da3d0b
      openeuler-ci-bot committed on
      Merge Pull Request from: @xiao_jiang_shui 
       
      Add debugfs 'dev_state' to query the status of the stop queue.
      The root user can also set 'dev_timeout': if the task flow fails to
      stop, the driver waits dev_timeout * 20ms before releasing the queue.
      
      Related issue: https://gitee.com/openeuler/kernel/issues/I76TVJ 
       
      Link:https://gitee.com/openeuler/kernel/pulls/791 
      
      Reviewed-by: Yang Shen <shenyang39@huawei.com> 
      Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> 
      81da3d0b
    • O
      !794 scsi: hisi_sas: The IO timeout mechanism and error handling related bugfix · f7f184bb
      openeuler-ci-bot committed on
      Merge Pull Request from: @xia-bing1 
       
      1. Do not frequently enter the I/O exception handling process. Change the timeout interval of the DMA setup and data frame to 2.5s.
      2. When multiple I/Os are delivered in the NCQ scenario and one of the I/Os is faulty, the group slow disk problem will occur. Add patch at the hisi_sas layer to ensure consistency between the Linux community and the openEuler solution. 
       
      Link:https://gitee.com/openeuler/kernel/pulls/794 
      
      Reviewed-by: Yihang Li <liyihang9@huawei.com> 
      Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> 
      f7f184bb
    • O
      !608 Net: ethernet: Support 3snic 3s9xx network card · 7420251e
      openeuler-ci-bot committed on
      Merge Pull Request from: @steven-song3 
       
      The driver supports 3snic 3s9xx series network cards (100GE (40GE
      compatible)-3S930 and 25GE (10GE compatible)-3S910/3S920).
      
      Features:
      1. Support single-root I/O virtualization (SR-IOV)
      2. Support virtual machine multi queue (VMMQ)
      3. Support receive side scaling (RSS)
      4. Support physical function (PF) passthrough VMs
      5. Support the PF promiscuous mode, unicast or multicast MAC filtering, and
      all multicast mode
      6. Support IPv4/IPv6, checksum offload, TCP Segmentation Offload (TSO), and
      Large Receive Offload (LRO)
      7. Support in-band one-click logs collection
      8. Support loopback tests
      9. Support port location indicators
      ==================================
      Test:
      compile: pass
      insmod/rmmod: pass
      iperf: Pass 
       
      Link:https://gitee.com/openeuler/kernel/pulls/608 
      
      Reviewed-by: Liu Chao <liuchao173@huawei.com> 
      Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> 
      7420251e
    • W
      crypto: hisilicon/qm - support dumping stop queue status · 83430c8d
      Weili Qian committed on
      driver inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I76TVJ
      CVE: NA
      
      ----------------------------------------------------------------------
      
      The debugfs files 'dev_state' and 'dev_timeout' are added.
      
      dev_state: if dev_timeout is set, dev_state indicates the status
      of stopping the queue. 0 indicates that the queue was stopped
      successfully. Any other value indicates that stopping the queue
      failed. If dev_timeout is not set, the value of dev_state is 0.
      
      dev_timeout: if the queue fails to stop, the queue is released
      after waiting dev_timeout * 20ms.
      Signed-off-by: Weili Qian <qianweili@huawei.com>
      Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com>
      83430c8d
    • W
      crypto: hisilicon/qm - add debugfs to query the status of the stop queue · e7c81a6e
      Weili Qian committed on
      driver inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I76TVJ
      CVE: NA
      
      ----------------------------------------------------------------------
      
      Add debugfs files for querying the status of stopping a queue. The root
      user can set the waiting time applied after the task flow fails to stop.
      Signed-off-by: Weili Qian <qianweili@huawei.com>
      Signed-off-by: Jiangshui Yang <yangjiangshui@h-partners.com>
      e7c81a6e
  3. 20 May, 2023 6 commits
  4. 19 May, 2023 9 commits
    • N
      memcg: support ksm merge any mode per cgroup · 0f6fb357
      Nanyong Sun committed on
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B
      CVE: NA
      
      ----------------------------------------------------------------------
      
      Add control file "memory.ksm" to enable ksm per cgroup.
      Writing 1 to it sets all tasks currently in the cgroup to ksm merge
      any mode, which means ksm gets enabled for all vma's of a process.
      Writing 0 disables ksm for them and unmerges the merged pages.
      Reading the file shows the current state and the ksm-related profit
      of this cgroup.
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      0f6fb357
    • D
      mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0 · 351ceedb
      David Hildenbrand committed on
      mainline inclusion
      from mainline-v6.4-rc1
      commit 24139c07
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=24139c07f413ef4b555482c758343d71392a19bc
      
      ----------------------------------------------------------------------
      
      Patch series "mm/ksm: improve PR_SET_MEMORY_MERGE=0 handling and cleanup
      disabling KSM", v2.
      
      (1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE
      does, (2) add a selftest for it and (3) factor out disabling of KSM from
      s390/gmap code.
      
      This patch (of 3):
      
      Let's unmerge any KSM pages when setting PR_SET_MEMORY_MERGE=0, and clear
      the VM_MERGEABLE flag from all VMAs -- just like KSM would.  Of course,
      only do that if we previously set PR_SET_MEMORY_MERGE=1.
      
      Link: https://lkml.kernel.org/r/20230422205420.30372-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20230422205420.30372-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Stefan Roesch <shr@devkernel.io>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Janosch Frank <frankja@linux.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Conflicts:
      	mm/ksm.c
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      351ceedb
    • S
      mm: add new KSM process and sysfs knobs · a098d41e
      Stefan Roesch committed on
      mainline inclusion
      from mainline-v6.4-rc1
      commit d21077fb
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d21077fbc2fc987c2e593c34dc3b4d84e546dc9f
      
      ----------------------------------------------------------------------
      
      This adds the general_profit KSM sysfs knob and the process profit metric
      knobs to ksm_stat.
      
      1) expose general_profit metric
      
         The documentation mentions a general profit metric, however this
         metric is not calculated.  In addition the formula depends on the size
         of internal structures, which makes it more difficult for an
         administrator to make the calculation.  Adding the metric for a better
         user experience.
      
      2) document general_profit sysfs knob
      
      3) calculate ksm process profit metric
      
         The ksm documentation mentions the process profit metric and how to
         calculate it.  This adds the calculation of the metric.
      
      4) mm: expose ksm process profit metric in ksm_stat
      
         This exposes the ksm process profit metric in /proc/<pid>/ksm_stat.
         The documentation mentions the formula for the ksm process profit
         metric, however it does not calculate it.  In addition the formula
         depends on the size of internal structures.  So it makes sense to
         expose it.
      
      5) document new procfs ksm knobs
      
      Link: https://lkml.kernel.org/r/20230418051342.1919757-3-shr@devkernel.io
      Signed-off-by: Stefan Roesch <shr@devkernel.io>
      Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Rik van Riel <riel@surriel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      a098d41e
    • S
      mm: add new api to enable ksm per process · 2cd2cdfe
      Stefan Roesch committed on
      mainline inclusion
      from mainline-v6.4-rc1
      commit d7597f59
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d7597f59d1d33e9efbffa7060deb9ee5bd119e62
      
      ----------------------------------------------------------------------
      
      Patch series "mm: process/cgroup ksm support", v9.
      
      So far KSM can only be enabled by calling madvise for memory regions.  To
      be able to use KSM for more workloads, KSM needs to have the ability to be
      enabled / disabled at the process / cgroup level.
      
      Use case 1:
        The madvise call is not available in the programming language.  An
        example for this are programs with forked workloads using a garbage
        collected language without pointers.  In such a language madvise cannot
        be made available.
      
        In addition the addresses of objects get moved around as they are
        garbage collected.  KSM sharing needs to be enabled "from the outside"
        for these types of workloads.
      
      Use case 2:
        The same interpreter can also be used for workloads where KSM brings
        no benefit or even has overhead.  We'd like to be able to enable KSM on
        a workload by workload basis.
      
      Use case 3:
        With the madvise call sharing opportunities are only enabled for the
        current process: it is a workload-local decision.  A considerable number
        of sharing opportunities may exist across multiple workloads or jobs (if
        they are part of the same security domain).  Only a higher level entity
        like a job scheduler or container can know for certain if it's running
        one or more instances of a job.  That job scheduler however doesn't have
        the necessary internal workload knowledge to make targeted madvise
        calls.
      
      Security concerns:
      
        In previous discussions security concerns have been brought up.  The
        problem is that an individual workload does not have the knowledge about
        what else is running on a machine.  Therefore it has to be very
        conservative in what memory areas can be shared or not.  However, if the
        system is dedicated to running multiple jobs within the same security
        domain, it's the job scheduler that has the knowledge that sharing can be
        safely enabled and is even desirable.
      
      Performance:
      
        Experiments with using UKSM have shown a capacity increase of around 20%.
      
        Here are the metrics from an instagram workload (taken from a machine
        with 64GB main memory):
      
         full_scans: 445
         general_profit: 20158298048
         max_page_sharing: 256
         merge_across_nodes: 1
         pages_shared: 129547
         pages_sharing: 5119146
         pages_to_scan: 4000
         pages_unshared: 1760924
         pages_volatile: 10761341
         run: 1
         sleep_millisecs: 20
         stable_node_chains: 167
         stable_node_chains_prune_millisecs: 2000
         stable_node_dups: 2751
         use_zero_pages: 0
         zero_pages_sharing: 0
      
      After the service is running for 30 minutes to an hour, 4 to 5 million
      shared pages are common for this workload when using KSM.
      
      Detailed changes:
      
      1. New options for prctl system command
         This patch series adds two new options to the prctl system call.
         The first one allows to enable KSM at the process level and the second
         one to query the setting.
      
      The setting will be inherited by child processes.
      
      With the above setting, KSM can be enabled for the seed process of a cgroup
      and all processes in the cgroup will inherit the setting.
      
      2. Changes to KSM processing
         When KSM is enabled at the process level, the KSM code will iterate
         over all the VMA's and enable KSM for the eligible VMA's.
      
         When forking a process that has KSM enabled, the setting will be
         inherited by the new child process.
      
      3. Add general_profit metric
         The general_profit metric of KSM is specified in the documentation,
         but not calculated.  This adds the general profit metric to
         /sys/kernel/debug/mm/ksm.
      
      4. Add more metrics to ksm_stat
         This adds the process profit metric to /proc/<pid>/ksm_stat.
      
      5. Add more tests to ksm_tests and ksm_functional_tests
         This adds an option to specify the merge type to the ksm_tests.
         This allows to test madvise and prctl KSM.
      
         It also adds two new tests to ksm_functional_tests: one to test
         the new prctl options and the other one is a fork test to verify that
         the KSM process setting is inherited by client processes.
      
      This patch (of 3):
      
      So far KSM can only be enabled by calling madvise for memory regions.  To
      be able to use KSM for more workloads, KSM needs to have the ability to be
      enabled / disabled at the process / cgroup level.
      
      1. New options for prctl system command
      
         This patch series adds two new options to the prctl system call.
         The first one allows to enable KSM at the process level and the second
         one to query the setting.
      
         The setting will be inherited by child processes.
      
         With the above setting, KSM can be enabled for the seed process of a
         cgroup and all processes in the cgroup will inherit the setting.
      
      2. Changes to KSM processing
      
         When KSM is enabled at the process level, the KSM code will iterate
         over all the VMA's and enable KSM for the eligible VMA's.
      
         When forking a process that has KSM enabled, the setting will be
         inherited by the new child process.
      
        1) Introduce new MMF_VM_MERGE_ANY flag
      
           This introduces the new MMF_VM_MERGE_ANY flag.  When this flag
           is set, kernel samepage merging (ksm) gets enabled for all vma's of a
           process.
      
        2) Setting VM_MERGEABLE on VMA creation
      
           When a VMA is created, if the MMF_VM_MERGE_ANY flag is set, the
           VM_MERGEABLE flag will be set for this VMA.
      
        3) support disabling of ksm for a process
      
           This adds the ability to disable ksm for a process if ksm has been
           enabled for the process with prctl.
      
        4) add new prctl option to get and set ksm for a process
      
           This adds two new options to the prctl system call
           - enable ksm for all vmas of a process (if the vmas support it).
           - query if ksm has been enabled for a process.
      
      3. Disabling MMF_VM_MERGE_ANY for storage keys in s390
      
         In the s390 architecture when storage keys are used, the
         MMF_VM_MERGE_ANY will be disabled.
      
      Link: https://lkml.kernel.org/r/20230418051342.1919757-1-shr@devkernel.io
      Link: https://lkml.kernel.org/r/20230418051342.1919757-2-shr@devkernel.io
      Signed-off-by: Stefan Roesch <shr@devkernel.io>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Conflicts:
      	kernel/sys.c mm/ksm.c mm/mmap.c
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      2cd2cdfe
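      The two prctl options added by commit 2cd2cdfe can be exercised roughly
      as follows. This is a minimal sketch: the helper names are hypothetical,
      and the fallback values 67 and 68 are the mainline definitions of
      PR_SET_MEMORY_MERGE and PR_GET_MEMORY_MERGE, which this backport is
      assumed to share. On kernels without the feature both helpers return a
      negative errno value.

      ```c
      #include <errno.h>
      #include <sys/prctl.h>

      /* Fallbacks for older userspace headers (mainline values, assumed here). */
      #ifndef PR_SET_MEMORY_MERGE
      #define PR_SET_MEMORY_MERGE 67
      #endif
      #ifndef PR_GET_MEMORY_MERGE
      #define PR_GET_MEMORY_MERGE 68
      #endif

      /* Enable (non-zero) or disable (0) KSM for all eligible VMAs of the caller.
       * Returns 0 on success or a negative errno value. */
      static int ksm_set_merge_any(int enable)
      {
          if (prctl(PR_SET_MEMORY_MERGE, enable ? 1UL : 0UL, 0UL, 0UL, 0UL) < 0)
              return -errno;
          return 0;
      }

      /* Query the per-process setting.
       * Returns 1 if enabled, 0 if disabled, or a negative errno value. */
      static int ksm_get_merge_any(void)
      {
          int ret = prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);

          return ret < 0 ? -errno : ret;
      }
      ```

      Because the setting is inherited across fork, a seed process could call
      ksm_set_merge_any(1) before spawning its workers; calling
      ksm_set_merge_any(0) later turns the setting off again.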
    • X
      ksm: add profit monitoring documentation · ac02d6e7
      xu xin committed on
      mainline inclusion
      from mainline-v6.1-rc1
      commit 21b7bdb5
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=21b7bdb504ae6b0a795c8d63818611ce02b532c1
      
      ----------------------------------------------------------------------
      
      Add the description of KSM profit and how to determine it both
      system-wide and within a single process.
      
      Link: https://lkml.kernel.org/r/20220830144003.299870-1-xu.xin16@zte.com.cn
      Signed-off-by: xu xin <xu.xin16@zte.com.cn>
      Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
      Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
      Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Conflicts:
      	Documentation/admin-guide/mm/ksm.rst
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      ac02d6e7
    • X
      ksm: count allocated ksm rmap_items for each process · 8c3ecf85
      xu xin committed on
      mainline inclusion
      from mainline-v6.1-rc1
      commit cb4df4ca
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cb4df4cae4f2bd8cf7a32eff81178fce31600f7c
      
      ----------------------------------------------------------------------
      
      Patch series "ksm: count allocated rmap_items and update documentation",
      v5.
      
      KSM can save memory by merging identical pages, but also can consume
      additional memory, because it needs to generate rmap_items to save each
      scanned page's brief rmap information.
      
      To let users determine how beneficial the ksm-policy (like madvise) they
      are using is, we add a new interface /proc/<pid>/ksm_stat for each
      process.  The value "ksm_rmap_items" in it indicates the total number of
      ksm rmap_items allocated for this process.
      
      The detailed description can be seen in the following patches' commit
      message.
      
      This patch (of 2):
      
      KSM can save memory by merging identical pages, but also can consume
      additional memory, because it needs to generate rmap_items to save each
      scanned page's brief rmap information.  Some of these pages may be merged,
      but some may not be mergeable even after being checked several times, and
      the memory consumed for them is unprofitable.
      
      Whether KSM saves or consumes memory system-wide can be determined by a
      combined calculation over pages_sharing, pages_shared, pages_unshared and
      pages_volatile.  A simple approximate calculation:
      
      	profit =~ pages_sharing * sizeof(page) - (all_rmap_items) *
      	         sizeof(rmap_item);
      
      where all_rmap_items equals to the sum of pages_sharing, pages_shared,
      pages_unshared and pages_volatile.
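
The system-wide approximation above can be sketched as follows. This is a minimal illustration, not code from the patch: it assumes the four counters are read from /sys/kernel/mm/ksm/, and the 4096-byte page size and 64-byte sizeof(rmap_item) defaults are assumptions that vary by architecture and kernel configuration. The function names are hypothetical.

```python
from pathlib import Path

# Assumed location of the system-wide KSM counters.
KSM_SYSFS = Path("/sys/kernel/mm/ksm")
COUNTERS = ("pages_sharing", "pages_shared", "pages_unshared", "pages_volatile")


def system_ksm_profit(counters, page_size=4096, rmap_item_size=64):
    """Approximate bytes saved by KSM system-wide (negative means a net cost).

    page_size and rmap_item_size are assumed defaults, not values
    taken from the commit message.
    """
    # all_rmap_items is the sum of the four counters, as in the formula above.
    all_rmap_items = sum(counters[name] for name in COUNTERS)
    return counters["pages_sharing"] * page_size - all_rmap_items * rmap_item_size


def read_system_counters(root=KSM_SYSFS):
    """Read the four counters from sysfs (requires a kernel built with KSM)."""
    return {name: int((root / name).read_text()) for name in COUNTERS}
```

On a machine with KSM enabled, `system_ksm_profit(read_system_counters())` gives the estimate in bytes.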
      
      But we cannot calculate this kind of ksm profit for a single process,
      because the number of ksm rmap_items of a process is not available.
      If user applications could obtain this information, it would help
      their users know how beneficial the ksm-policy (like madvise) they
      are using is, and then optimize their app code.  For example, if an
      application madvises 1000 pages as MERGEABLE while only a few pages are
      really merged, then it's not cost-efficient.
      
      So we add a new interface /proc/<pid>/ksm_stat for each process, which
      currently shows only the value of ksm_rmap_items; more values can be
      added in the future.
      
      Similarly, we can calculate the ksm profit approximately for a single
      process by:
      
      	profit =~ ksm_merging_pages * sizeof(page) - ksm_rmap_items *
      		 sizeof(rmap_item);
      
      where ksm_merging_pages is shown at /proc/<pid>/ksm_merging_pages, and
      ksm_rmap_items is shown in /proc/<pid>/ksm_stat.
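
A minimal sketch of the per-process calculation follows. It assumes /proc/<pid>/ksm_stat contains whitespace-separated "name value" lines (such as "ksm_rmap_items 1000") and that /proc/<pid>/ksm_merging_pages holds a single integer. The 4096-byte page size and 64-byte sizeof(rmap_item) are again assumptions, and all function names are hypothetical.

```python
def parse_ksm_stat(text):
    """Parse 'name value' lines into a dict; lines in any other shape are skipped."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0].rstrip(":")] = int(parts[1])
    return stats


def process_ksm_profit(merging_pages, rmap_items, page_size=4096, rmap_item_size=64):
    """Approximate bytes saved by KSM for one process (negative means a net cost)."""
    return merging_pages * page_size - rmap_items * rmap_item_size


def read_process_profit(pid="self"):
    """Read both proc files for `pid` and return the estimated profit in bytes."""
    with open(f"/proc/{pid}/ksm_merging_pages") as f:
        merging_pages = int(f.read().strip())
    with open(f"/proc/{pid}/ksm_stat") as f:
        rmap_items = parse_ksm_stat(f.read())["ksm_rmap_items"]
    return process_ksm_profit(merging_pages, rmap_items)
```

With these assumed sizes, a process that holds 1000 rmap_items but has only 5 merging pages comes out negative, which matches the "not cost-efficient" case described earlier.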
      
      Link: https://lkml.kernel.org/r/20220830143731.299702-1-xu.xin16@zte.com.cn
      Link: https://lkml.kernel.org/r/20220830143838.299758-1-xu.xin16@zte.com.cn
      Signed-off-by: xu xin <xu.xin16@zte.com.cn>
      Reviewed-by: Xiaokai Ran <ran.xiaokai@zte.com.cn>
      Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
      Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Bagas Sanjaya <bagasdotme@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Izik Eidus <izik.eidus@ravellosystems.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Conflicts:
      	include/linux/mm_types.h
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      8c3ecf85
    • X
      ksm: count ksm merging pages for each process · 44acbc78
      xu xin committed
      mainline inclusion
      from mainline-v5.19-rc1
      commit 76093853
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I72R0B
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7609385337a4feb6236e42dcd0df2185683ce839
      
      ----------------------------------------------------------------------
      
      Some applications or containers want to use KSM by calling madvise() to
      advise areas of address space to be MERGEABLE.  But they may not know
      which applications are more likely to cause real merges in the
      deployment.  If this patch is applied, it helps them know their
      corresponding number of merged pages, and then optimize their app code.
      
      Because KSM currently only counts the number of KSM merging pages (e.g.
      ksm_pages_sharing and ksm_pages_shared) for the whole system, we cannot
      see KSM merging at a finer granularity.  When optimizing an upper-level
      application, the merging area cannot easily be chosen according to the
      KSM page merging probability of each process.  Therefore, it is
      necessary to add extra statistics so that upper-level users can know the
      detailed KSM merging information of each process.
      
      We add a new proc file named ksm_merging_pages under /proc/<pid>/ to
      indicate the number of ksm merging pages involved in this process.
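
As a rough usage sketch, a process could mark a private anonymous region as MERGEABLE and then read its own counter. This assumes Python 3.8+ on Linux, where `mmap.madvise` and `mmap.MADV_MERGEABLE` are available, and that the proc file holds a single integer; the helper names are hypothetical. The counter only rises after the ksmd scanner is running and has visited the region.

```python
import mmap


def mark_mergeable(region):
    """Advise the kernel that `region` may be merged by KSM; return True on success."""
    advice = getattr(mmap, "MADV_MERGEABLE", None)  # only defined on Linux builds
    if advice is None or not hasattr(region, "madvise"):
        return False
    try:
        region.madvise(advice)
    except OSError:
        return False  # e.g. a kernel built without KSM support
    return True


def read_merging_pages(pid="self"):
    """Return the value in /proc/<pid>/ksm_merging_pages, or None if unavailable."""
    try:
        with open(f"/proc/{pid}/ksm_merging_pages") as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # KSM works on private anonymous memory, so request a MAP_PRIVATE mapping.
    size = 1000 * mmap.PAGESIZE
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    region.write(b"\x42" * size)  # identical page contents are merge candidates
    print("advised:", mark_mergeable(region))
    print("ksm_merging_pages:", read_merging_pages())
```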
      
      [akpm@linux-foundation.org: fix comment typo, remove BUG_ON()s]
      Link: https://lkml.kernel.org/r/20220325082318.2352853-1-xu.xin16@zte.com.cn
      Signed-off-by: xu xin <xu.xin16@zte.com.cn>
      Reported-by: kernel test robot <lkp@intel.com>
      Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
      Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
      Reported-by: Zeal Robot <zealci@zte.com.cn>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Ohhoon Kwon <ohoono.kwon@samsung.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Yang Yang <yang.yang29@zte.com.cn>
      Cc: Ran Xiaokai <ran.xiaokai@zte.com.cn>
      Cc: Zeal Robot <zealci@zte.com.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Conflicts:
      	include/linux/mm_types.h
      Signed-off-by: Nanyong Sun <sunnanyong@huawei.com>
      44acbc78
    • S
      Net: ethernet: Support 3snic 3s9xx network card · ebbca448
      Steven Song committed
      3snic inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6TX4J
      CVE: NA
      
       --------------------------------
      
      The driver supports 3snic 3s9xx series network cards (100GE (40GE
      compatible)-3S930 and 25GE (10GE compatible)-3S910/3S920).
      
      Features:
      1. Support single-root I/O virtualization (SR-IOV)
      2. Support virtual machine multi queue (VMMQ)
      3. Support receive side scaling (RSS)
      4. Support physical function (PF) passthrough VMs
      5. Support the PF promiscuous mode, unicast or multicast MAC filtering, and
         all multicast mode
      6. Support IPv4/IPv6, checksum offload, TCP Segmentation Offload (TSO), and
         Large Receive Offload (LRO)
      7. Support in-band one-click log collection
      8. Support loopback tests
      9. Support port location indicators
      Reviewed-by: Chen Mou <chenmou@3snic.com>
      Reviewed-by: Wan Renyong <wanry@3snic.com>
      Reviewed-by: Yang Gan <yanggan@3snic.com>
      Reviewed-by: Wen Liang <wenliang@3snic.com>
      Signed-off-by: Steven Song <steven.song@3snic.com>
      ebbca448
    • O
      !778 [sync] PR-774: Backport CVEs and bugfixes · 665edcec
      openeuler-ci-bot committed
      Merge Pull Request from: @openeuler-sync-bot 
       
      
      Origin pull request: 
      https://gitee.com/openeuler/kernel/pulls/774 
       
      Pull new CVEs:
      CVE-2023-32269
      CVE-2023-2002
      CVE-2023-26544
      CVE-2023-0459
      
      mm bugfixes from Yu Kuai
      fs bugfix from yangerkun
      fs perfs from Zhihao Cheng
      
       
       
      Link:https://gitee.com/openeuler/kernel/pulls/778 
      
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      665edcec
  5. 18 May, 2023 7 commits