1. 01 Nov 2022, 22 commits
  2. 31 Oct 2022, 10 commits
    • !178 bpf: add a bpf_override_reg helper · ea43b636
      Committed by openeuler-ci-bot
      Merge Pull Request from: @JqyangCode 
       
      [Open Source Summer 2022] - Add a new helper function, bpf_override_reg, to eBPF; it modifies the registers of the target function, including input arguments, return values, etc.
      --------------------------------------
      Error injection is a very important method for testing system stability, but the
      kernel is still lacking in this regard. BPF can fill this gap well with its kprobe
      support.
      
      We can use validation methods to ensure that injection only fires on the calls we
      restrict. Although the existing bpf_override_function can complete some related
      operations, it bypasses the initial detection and only modifies the return value
      to a specified value. This does not cover some of our practical scenarios:
      
      1. Other registers (such as input registers) need to be modified: when we receive
      a network packet, we convert it into a structure and pass it to the corresponding
      function for processing. For fault tolerance of network data, we need to modify
      the members of this structure, which it cannot do.
      
      2. The function cannot be hooked, or what needs to be modified is not a function but an
      instruction: when a sensor reads IO data, we need to simulate an IO data error.
      In this case the IO read may not be a function call, just a few simple instructions.
      
      In summary, it is necessary to extend an interface that can modify any register, which
      gives us a simple way to implement system error injection.
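      
      For reference, a minimal sketch of the existing kprobe-based error injection path that
      this proposal extends, using the bpf_override_return() helper (which only changes the
      return value). The target should_failslab is chosen only as a well-known error-injection
      point; the proposed bpf_override_reg helper is not shown because its exact signature is
      not given in this description.
      
      #include <linux/bpf.h>
      #include <linux/ptrace.h>
      #include <bpf/bpf_helpers.h>
      
      /* Force the probed function to return -ENOMEM to its caller; requires
       * CONFIG_BPF_KPROBE_OVERRIDE and a target annotated for error injection. */
      SEC("kprobe/should_failslab")
      int inject_enomem(struct pt_regs *ctx)
      {
              bpf_override_return(ctx, (unsigned long)-12 /* -ENOMEM */);
              return 0;
      }
      
      char _license[] SEC("license") = "GPL";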
      
      bugzilla: #I55FLX 
       
      Link:https://gitee.com/openeuler/kernel/pulls/178 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      ea43b636
    • bpf: add a bpf_override_regs helper · e05ebae4
      Committed by JqyangCode
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/open_euler/dashboard?issue_id=I55FLX
      CVE: NA
      
      Reference: NA
      
      -------------------
      
      Error injection is a very important method for testing system stability, but the
      kernel is still lacking in this regard. BPF can fill this gap well with its kprobe
      support.
      
      We can use validation methods to ensure that injection only fires on the calls we
      restrict. Although the existing bpf_override_function can complete some related
      operations, it bypasses the initial detection and only modifies the return value
      to a specified value. This does not cover some of our practical scenarios:
      
      1. Other registers (such as input registers) need to be modified: when we receive
      a network packet, we convert it into a structure and pass it to the corresponding
      function for processing. For fault tolerance of network data, we need to modify
      the members of this structure, which it cannot do.
      
      2. The function cannot be hooked, or what needs to be modified is not a function but an
      instruction: when a sensor reads IO data, we need to simulate an IO data error.
      In this case the IO read may not be a function call, just a few simple instructions.
      
      In summary, it is necessary to extend an interface that can modify any register, which
      gives us a simple way to implement system error injection.
      Signed-off-by: JqyangCode <teanix@163.com>
      e05ebae4
    • !182 futex: introduce the direct-thread-switch mechanism · 525e6792
      Committed by openeuler-ci-bot
      Merge Pull Request from: @hizhisong 
       
      In some cases, we need to run several threads with low thrashing requirements together
      which act as logical operations like PV operations. Such a thread keeps falling asleep
      and waking other threads up, and each thread switch requires the kernel to pay several
      scheduling-related overheads (select the proper core to execute on, wake the task up,
      enqueue the task, mark the task's scheduling flag, pick the task at the proper time,
      dequeue the task and do the context switch).
      These overheads are not acceptable for the low-thrashing threads.
      Therefore, we need a mechanism that avoids this unnecessary overhead and
      swaps threads directly without affecting the fairness of CFS tasks.
      
      To achieve this goal, we implemented the direct-thread-switch mechanism
      based on the futex_swap patch*, which switches directly to the DTS task
      using a shared scheduling entity. We also ensured that the kernel basically
      remains secure and consistent.
      
      * https://lore.kernel.org/lkml/20200722234538.166697-2-posk@posk.io/
       
       
      Link:https://gitee.com/openeuler/kernel/pulls/182 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      525e6792
    • futex: introduce the direct-thread-switch mechanism · dad99a57
      Committed by briansun
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      Reference: https://lore.kernel.org/lkml/20200722234538.166697-2-posk@posk.io/
      
      -------------------
      
      In some scenarios, we need to run several threads with low thrashing requirements
      together which act as logical operations like PV operations. Such a thread keeps
      falling asleep and waking other threads up, and each thread switch requires the
      kernel to pay several scheduling-related overheads (select the proper core to
      execute on, wake the task up, enqueue the task, mark the task's scheduling flag,
      pick the task at the proper time, dequeue the task and do the context switch).
      These overheads are not acceptable for the low-thrashing threads. Therefore, we
      need a mechanism that avoids this unnecessary overhead and swaps threads directly
      without affecting the fairness of CFS tasks.
      
      To achieve this goal, we implemented the direct-thread-switch mechanism
      based on the futex_swap patch*, which switches directly to the DTS task
      using a shared scheduling entity. We also ensured that the kernel basically
      remains secure and consistent.
      Signed-off-by: Zhi Song <hizhisong@gmail.com>
      dad99a57
    • selftests/futex: add futex_swap selftest · 8871eed8
      Committed by Peter Oskolkov
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      -------------------
      
      This is the final patch in the FUTEX_SWAP patchset. It
      adds a test/benchmark to validate the behavior and
      compare the performance of the new FUTEX_SWAP futex operation.
      
      Detailed API design and behavior considerations are provided
      in the commit messages of the previous two patches.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      8871eed8
    • futex/sched: add wake_up_process_prefer_current_cpu, use in FUTEX_SWAP · fc4a2354
      Committed by Peter Oskolkov
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      -------------------
      
      As described in the previous patch in this patchset
      ("futex: introduce FUTEX_SWAP operation"), it is often
      beneficial to wake a task and run it on the same CPU
      where the current, about-to-sleep task is running.
      
      Internally at Google, the switchto_switch syscall not only
      migrates the wakee to the current CPU, but also moves
      the waker's load stats to the wakee, thus ensuring
      that the migration to the current CPU does not interfere
      with load balancing. switchto_switch also does the
      context switch into the wakee, bypassing schedule().
      
      This patchset does not go that far yet; it simply
      migrates the wakee to the current CPU and calls schedule().
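      
      A sketch of that wake-up path, with names following the FUTEX_SWAP patchset
      (WF_CURRENT_CPU and the placement check in select_task_rq_fair() are assumptions
      about the patch internals, not mainline code):
      
      /* Sketch (kernel/sched/core.c): wake @next with a hint to place it on the
       * waker's CPU; select_task_rq_fair() would honour WF_CURRENT_CPU when the
       * current CPU is in the wakee's allowed mask. */
      int wake_up_process_prefer_current_cpu(struct task_struct *next)
      {
              return try_to_wake_up(next, TASK_NORMAL, WF_CURRENT_CPU);
      }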
      
      In follow-up patches I will try to fine-tune the behavior by adjusting
      load stats and schedule(): our internal switchto_switch
      is still about 2x faster than FUTEX_SWAP (see numbers below).
      
      And now about performance: futex_swap benchmark
      from the last patch in this patchset produces this typical
      output:
      
      $ ./futex_swap -i 100000
      
      ------- running SWAP_WAKE_WAIT -----------
      
      completed 100000 swap and back iterations in 820683263 ns: 4103 ns per swap
      PASS
      
      ------- running SWAP_SWAP -----------
      
      completed 100000 swap and back iterations in 124034476 ns: 620 ns per swap
      PASS
      
      In the above, the first benchmark (SWAP_WAKE_WAIT) calls FUTEX_WAKE,
      then FUTEX_WAIT; the second benchmark (SWAP_SWAP) calls FUTEX_SWAP.
      
      If the benchmark is restricted to a single cpu:
      
      $ taskset -c 1 ./futex_swap -i 1000000
      
      The numbers are very similar, as expected (with wake+wait being
      a bit slower than swap due to two vs one syscalls).
      
      Please also note that switchto_switch is about 2x faster than
      FUTEX_SWAP because it does a context switch to the wakee immediately,
      bypassing schedule(), so this is one of the options I'll
      explore in further patches (if/when this initial patchset is
      accepted).
      
      Tested: see the last patch in this patchset.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      fc4a2354
    • futex: introduce FUTEX_SWAP operation · b87ae678
      Committed by Peter Oskolkov
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      -------------------
      
      As Paul Turner presented at LPC in 2013 ...
      - pdf: http://pdxplumbers.osuosl.org/2013/ocw//system/presentations/1653/original/LPC%20-%20User%20Threading.pdf
      - video: https://www.youtube.com/watch?v=KXuZi9aeGTw
      
      ... Google has developed an M:N userspace threading subsystem backed
      by Google-private SwitchTo Linux Kernel API (page 17 in the pdf referenced
      above). This subsystem provides latency-sensitive services at Google with
      fine-grained user-space control/scheduling over what is running when,
      and this subsystem is used widely internally (called schedulers or fibers).
      
      This patchset is the first step to open-source this work. As explained
      in the linked pdf and video, SwitchTo API has three core operations: wait,
      resume, and swap (=switch). So this patchset adds a FUTEX_SWAP operation
      that, in addition to FUTEX_WAIT and FUTEX_WAKE, will provide a foundation
      on top of which user-space threading libraries can be built.
      
      Another common use case for FUTEX_SWAP is message passing a-la RPC
      between tasks: task/thread T1 prepares a message,
      wakes T2 to work on it, and waits for the results; when T2 is done, it
      wakes T1 and waits for more work to arrive. Currently the simplest
      way to implement this is
      
      a. T1: futex-wake T2, futex-wait
      b. T2: wakes, does what it has been woken to do
      c. T2: futex-wake T1, futex-wait
      
      With FUTEX_SWAP, steps a and c above can be reduced to one futex operation
      that runs 5-10 times faster.
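      
      A hedged userspace sketch of that pattern: a single call that wakes one waiter on
      the peer's futex and waits on our own. FUTEX_SWAP is not in the mainline uapi
      headers, so the opcode value and the argument order below follow this patchset and
      should be treated as assumptions.
      
      #define _GNU_SOURCE
      #include <linux/futex.h>
      #include <sys/syscall.h>
      #include <unistd.h>
      #include <stdint.h>
      
      #ifndef FUTEX_SWAP
      #define FUTEX_SWAP 13   /* assumed opcode from this patchset */
      #endif
      
      /* Wait on @wait_on (expecting *wait_on == val) and wake one waiter on @wake
       * in a single syscall, replacing the separate FUTEX_WAKE + FUTEX_WAIT calls
       * in steps a and c above. */
      static long futex_swap(uint32_t *wait_on, uint32_t val, uint32_t *wake)
      {
              return syscall(SYS_futex, wait_on, FUTEX_SWAP, val, NULL, wake, 0);
      }
      
      With such a helper, T1 and T2 each issue one futex_swap() per round trip instead
      of a wake followed by a wait.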
      
      Patches in this patchset:
      
      Patch 1: (this patch) introduce FUTEX_SWAP futex operation that,
               internally, does wake + wait. The purpose of this patch is
               to work out the API.
      Patch 2: a first rough attempt to make FUTEX_SWAP faster than
               what wake + wait can do.
      Patch 3: a selftest that can also be used to benchmark FUTEX_SWAP vs
               FUTEX_WAKE + FUTEX_WAIT.
      
      Tested: see patch 3 in this patchset.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      b87ae678
    • !172 f2fs: extent cache: support unaligned extent · 79d1782a
      Committed by openeuler-ci-bot
      Merge Pull Request from: @gewus 
       
          Compressed inodes may suffer a read performance issue because they cannot
          use the extent cache, so I propose to add this unaligned extent support
          to improve it.

          Currently, it only works on readonly-format f2fs images.

          Unaligned extent: in one compressed cluster, the physical block count
          will be less than the logical block count, so we add an extra physical
          block length to the extent info in order to indicate such extent status.

          The idea is that if the blocks of one whole cluster are physically
          contiguous, once its mapping info has been read for the first time, we
          cache an unaligned (or aligned) extent info entry in the extent cache,
          expecting that the mapping info will be hit when rereading the cluster.
       
      Link:https://gitee.com/openeuler/kernel/pulls/172 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      79d1782a
    • f2fs: extent cache: support unaligned extent · 5282d0b8
      Committed by Chao Yu
      mainline inclusion
      from mainline-v5.19
      commit 94afd6d6
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5XX1X
      CVE: NA
      
      ----------------------
      
      Compressed inodes may suffer a read performance issue because they cannot
      use the extent cache, so I propose to add this unaligned extent support
      to improve it.
      
      Currently, it only works on readonly-format f2fs images.
      
      Unaligned extent: in one compressed cluster, the physical block count
      will be less than the logical block count, so we add an extra physical
      block length to the extent info in order to indicate such extent status.
      
      The idea is that if the blocks of one whole cluster are physically
      contiguous, once its mapping info has been read for the first time, we
      cache an unaligned (or aligned) extent info entry in the extent cache,
      expecting that the mapping info will be hit when rereading the cluster.
      
      Merge policy:
      - Aligned extents can be merged.
      - An aligned extent and an unaligned extent cannot be merged.
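      
      A sketch of the extent_info layout this implies; field names follow the mainline
      commit but the structure is simplified here and should be read as illustrative
      rather than the exact backported code.
      
      struct extent_info {
              unsigned int fofs;      /* start file offset (logical block) of the extent */
              unsigned int len;       /* length of the extent in logical blocks */
              u32 blk;                /* start block address of the extent */
      #ifdef CONFIG_F2FS_FS_COMPRESSION
              unsigned int c_len;     /* physical length of a compressed extent;
                                       * c_len < len marks the extent as unaligned */
      #endif
      };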
      Signed-off-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Gewus <1319579758@qq.com>
      5282d0b8
    • f2fs: enable extent cache for compression files in read-only · 6f08e1da
      Committed by Daeho Jeong
      mainline inclusion
      from mainline-v5.19
      commit 4215d054
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5XX1X
      CVE: NA
      
      -----------------
      
      Let's allow extent cache for RO partition.
      Signed-off-by: Daeho Jeong <daehojeong@google.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: Geuws <1319579758@qq.com>
      6f08e1da
  3. 29 Oct 2022, 2 commits
    • !189 mm: page_alloc: Add a tracepoint to trace the call of __alloc_pages() and export symbols · 3b7ab23f
      Committed by openeuler-ci-bot
      Merge Pull Request from: @AoDaMo 
       
      This is the result of the OSPP 2022 Project
      
      /proc/meminfo is the main way for users to learn about physical memory usage, but it cannot observe every physical page allocation. Some kernel modules can call alloc_pages to get physical pages directly, and that part of physical memory will not be counted by /proc/meminfo.
          
      In order to trace the specific process of physical page allocation and release, one solution is to insert a new module into the kernel and register tracepoint handlers to obtain the physical page allocation information. So I added a new tracepoint named mm_page_alloc_enter at the entrance of mm/page_alloc.c:__alloc_pages(), and exported the relevant tracepoint symbols for kernel module programming.
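      
      A hedged sketch of the kind of module this enables. The probe prototype below
      (gfp flags plus order) is an assumption mirroring __alloc_pages() arguments; the
      real TP_PROTO of mm_page_alloc_enter and its header location may differ.
      
      #include <linux/module.h>
      #include <linux/tracepoint.h>
      #include <linux/gfp.h>
      #include <trace/events/kmem.h>          /* assumed home of mm_page_alloc_enter */
      
      /* Probe invoked at the entrance of __alloc_pages(); prototype is assumed. */
      static void probe_alloc_enter(void *data, gfp_t gfp, unsigned int order)
      {
              pr_debug("__alloc_pages entered: order=%u gfp=%#x\n",
                       order, (unsigned int)gfp);
      }
      
      static int __init tp_demo_init(void)
      {
              /* register_trace_<name>() is generated from the tracepoint definition;
               * it is usable from a module only once the tracepoint symbol is
               * exported, which is what this PR does. */
              return register_trace_mm_page_alloc_enter(probe_alloc_enter, NULL);
      }
      
      static void __exit tp_demo_exit(void)
      {
              unregister_trace_mm_page_alloc_enter(probe_alloc_enter, NULL);
              tracepoint_synchronize_unregister();
      }
      
      module_init(tp_demo_init);
      module_exit(tp_demo_exit);
      MODULE_LICENSE("GPL");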
       
      Link:https://gitee.com/openeuler/kernel/pulls/189 
      Reviewed-by: Liu YongQiang <liuyongqiang13@huawei.com> 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      3b7ab23f
    • !167 Summer OSPP 2022: Support Multi Gen LRU in openEuler · 63906034
      Committed by openeuler-ci-bot
      Merge Pull Request from: @morymiss 
       
      Multi-Gen LRU (MGLRU) is a feature that optimizes the memory reclaim strategy. According to Google's tests, with MGLRU the CPU utilization of kswapd is reduced by 40%, background kills are reduced by 85% when 75% of memory is occupied, and rendering delay is reduced by 18% when 50% of memory is occupied. This feature has been merged into the linux-next branch. In this PR, the patches are integrated into the openEuler-22.09 branch.
      
      MGLRU is mainly designed for memory reclaim. First, the LRU lists that manage pages are divided into multiple generations. Within each generation, pages are further divided into tiers by their refault counts, enabling more fine-grained page management. In addition, when the kswapd daemon runs, it scans the page tables of processes that have been active since the last run, instead of scanning physical memory directly. This exploits spatial locality effectively, improves scanning efficiency, and reduces CPU overhead.
      
      In the current test results, the IOPS and BW numbers reported by the fio tool are significantly better than with the conventional reclaim path, which shows that the memory management performance of the system improves markedly under high IO load, and the CPU usage of the kswapd process drops significantly in the CPU-usage test. In addition, server, embedded, and mobile device scenarios were also tested. The results show good MGLRU performance for memory reclaim, especially in high-load scenarios.
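      
      As a quick runtime sanity check for the backport, a minimal userspace sketch
      assuming the upstream MGLRU sysfs interface (/sys/kernel/mm/lru_gen/enabled);
      whether the openEuler-22.09 backport exposes this exact path is an assumption.
      
      #include <stdio.h>
      
      /* Print the MGLRU enable mask; a non-zero value means the multi-generational
       * LRU path is active for the corresponding page types. */
      int main(void)
      {
              char buf[32];
              FILE *f = fopen("/sys/kernel/mm/lru_gen/enabled", "r");
      
              if (!f) {
                      perror("lru_gen sysfs node");
                      return 1;
              }
              if (fgets(buf, sizeof(buf), f))
                      printf("lru_gen enabled mask: %s", buf);
              fclose(f);
              return 0;
      }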
      
      bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
      
      Reference:
      https://lore.kernel.org/lkml/20220407031525.2368067-1-yuzhao@google.com/
      https://android-review.googlesource.com/c/kernel/common/+/2050906/10 
       
      Link:https://gitee.com/openeuler/kernel/pulls/167 
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      63906034
  4. 28 Oct 2022, 6 commits