1. 16 Mar 2020 (2 commits)
  2. 14 Mar 2020 (6 commits)
  3. 12 Mar 2020 (9 commits)
  4. 11 Mar 2020 (4 commits)
  5. 05 Mar 2020 (19 commits)
    • livepatch/x86: enable livepatch config openeuler · 4b90845b
      Committed by Cheng Jian
      hulk inclusion
      category: feature
      bugzilla: 5507
      CVE: NA
      
      ---------------------------
      
      We have completed livepatch without ftrace for x86_64, so we
      can now enable it.
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      4b90845b
    • livepatch/x86: enable livepatch config for hulk · 6a07930a
      Committed by Cheng Jian
      hulk inclusion
      category: feature
      bugzilla: 5507
      CVE: NA
      
      ---------------------------
      
      We have completed livepatch without ftrace for x86_64, so we
      can now enable it.
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      6a07930a
    • livepatch/arm64: check active func in consistency stack checking · b6f7ad60
      Committed by Cheng Jian
      hulk inclusion
      category: bugfix
      bugzilla: 5507/31358
      CVE: NA
      ---------------------------
      
      When doing consistency stack checking, if we try to patch a
      function which has already been patched, we should check the
      new function (not the original one) that is currently active;
      it is always the first entry in the list func_node->func_stack.
      
      Example :
      	module : origin			livepatch v1		livepatch v2
      	func   : old func A -[enable]=> new func A' -[enable]=> new func A''
      	check  :		A			A'
      
      When we try to patch function A to the new function A'' with
      livepatch v2, function A has already been patched to A' by
      livepatch v1, so the function A' provided by livepatch v1 is
      active on the stack instead of the original function A. Even
      when the long jump method is used, we jump to the new function
      A' using a call without LR, so the original function A will not
      appear on the stack. We must therefore check the active function
      A' in consistency stack checking.
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      b6f7ad60
    • livepatch/x86: check active func in consistency stack checking · 27057710
      Committed by Cheng Jian
      hulk inclusion
      category: bugfix
      bugzilla: 5507/31358
      CVE: NA
      ---------------------------
      
      When doing consistency stack checking, if we try to patch a
      function which has already been patched, we should check the
      new function (not the original one) that is currently active;
      it is always the first entry in the list func_node->func_stack.
      
      Example :
      	module : origin			livepatch v1		livepatch v2
      	func   : old func A -[enable]=> new func A' -[enable]=> new func A''
      	check  :		A			A'
      
      When we try to patch function A to the new function A'' with
      livepatch v2, function A has already been patched to A' by
      livepatch v1, so the function A' provided by livepatch v1 is
      active on the stack instead of the original function A. Even
      when the long jump method is used, we jump to the new function
      A' using a call without LR, so the original function A will not
      appear on the stack. We must therefore check the active function
      A' in consistency stack checking (a minimal sketch follows this
      entry).
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      27057710
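      A minimal sketch of the check described in the two entries above. The
      struct klp_func_node layout here is an assumption based on the commit
      text (upstream livepatch keeps the same list in struct klp_ops); struct
      klp_func with its new_func and stack_node members is the regular
      livepatch structure.

      #include <linux/list.h>
      #include <linux/livepatch.h>

      struct klp_func_node {
              struct list_head func_stack;    /* applied patches, newest first */
              void *old_func;                 /* address of the original function */
      };

      /* Address that must not be found on any task's stack before patching. */
      static void *klp_addr_to_check(struct klp_func_node *func_node)
      {
              struct klp_func *func;

              if (list_empty(&func_node->func_stack))
                      return func_node->old_func;     /* never patched: check the original */

              /* Already patched: the active code is the newest patch on the stack. */
              func = list_first_entry(&func_node->func_stack,
                                      struct klp_func, stack_node);
              return func->new_func;
      }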
    • livepatch/x86: support livepatch without ftrace · 7e2ab91e
      Committed by Cheng Jian
      hulk inclusion
      category: feature
      bugzilla: 5507
      CVE: NA
      
      ----------------------------------------
      
      Support livepatch without ftrace for x86_64.
      
      Supported now:
              livepatch relocation when init_patch after load_module;
              instruction patching when enabled;
              activeness function check;
              enforcing the patch stacking principle.
      
      x86_64 uses variable-length instructions, so no extra long-jump
      implementation is needed (a minimal sketch of the patched jump
      follows this entry).
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
      Signed-off-by: Li Bin <huawei.libin@huawei.com>
      Tested-by: Yang ZuoTing <yangzuoting@huawei.com>
      Tested-by: Cheng Jian <cj.chengjian@huawei.com>
      Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      7e2ab91e
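      A minimal user-space sketch of the instruction patching mentioned above:
      on x86_64 the old function can be redirected with a 5-byte relative jump
      (opcode 0xE9 followed by a 32-bit displacement), where the displacement
      is measured from the end of that instruction. The real kernel code must
      also deal with W^X protections and cross-modifying-code rules; this only
      shows the encoding.

      #include <stdint.h>
      #include <string.h>

      #define JMP_E9_SIZE 5

      static void encode_jmp_e9(const void *old_func, const void *new_func,
                                unsigned char insn[JMP_E9_SIZE])
      {
              /* displacement is relative to the byte after the 5-byte jump */
              int32_t rel = (int32_t)((intptr_t)new_func -
                                      ((intptr_t)old_func + JMP_E9_SIZE));

              insn[0] = 0xe9;                         /* JMP rel32 */
              memcpy(&insn[1], &rel, sizeof(rel));    /* little-endian displacement */
      }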
    • KVM: nVMX: Check IO instruction VM-exit conditions · eb232fcd
      Committed by Oliver Upton
      commit 35a571346a94fb93b5b3b6a599675ef3384bc75c upstream.
      
      Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution
      controls when checking instruction interception. If the 'use IO bitmaps'
      VM-execution control is 1, check the instruction access against the IO
      bitmaps to determine if the instruction causes a VM-exit.
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      eb232fcd
    • KVM: nVMX: Refactor IO bitmap checks into helper function · 6959f6e4
      Committed by Oliver Upton
      commit e71237d3ff1abf9f3388337cfebf53b96df2020d upstream.
      
      Checks against the IO bitmap are useful for both instruction emulation
      and VM-exit reflection. Refactor the IO bitmap checks into a helper
      function.
      Signed-off-by: Oliver Upton <oupton@google.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      6959f6e4
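      A simplified, pure-C illustration of the I/O-bitmap rule used by the two
      KVM entries above (it is not the actual KVM helper, which also honors the
      'unconditional IO exiting' control and reads the bitmaps from guest
      memory): bitmap A covers ports 0x0000-0x7fff, bitmap B covers
      0x8000-0xffff, one bit per port, and a multi-byte access exits if any
      covered port's bit is set.

      #include <stdbool.h>
      #include <stdint.h>

      static bool io_access_intercepted(const uint8_t *bitmap_a,
                                        const uint8_t *bitmap_b,
                                        unsigned int port, unsigned int size)
      {
              unsigned int i;

              for (i = 0; i < size; i++) {
                      unsigned int p = port + i;
                      const uint8_t *bitmap;

                      if (p >= 0x10000)
                              return true;    /* beyond port space: treat as intercepted */

                      bitmap = (p < 0x8000) ? bitmap_a : bitmap_b;
                      if (bitmap[(p & 0x7fff) / 8] & (1u << (p & 7)))
                              return true;    /* this byte's port causes a VM-exit */
              }
              return false;
      }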
    • KVM: nVMX: Don't emulate instructions in guest mode · eb561a7a
      Committed by Paolo Bonzini
      commit 07721feee46b4b248402133228235318199b05ec upstream.
      
      vmx_check_intercept is not yet fully implemented. To avoid emulating
      instructions disallowed by the L1 hypervisor, refuse to emulate
      instructions by default.
      
      Cc: stable@vger.kernel.org
      [Made commit, added commit msg - Oliver]
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      eb561a7a
    • floppy: check FDC index for errors before assigning it · 2ad1a109
      Committed by Linus Torvalds
      commit 2e90ca68b0d2f5548804f22f0dd61145516171e3 upstream.
      
      Jordy Zomer reported a KASAN out-of-bounds read in the floppy driver in
      wait_til_ready().
      
      Which on the face of it can't happen, since as Willy Tarreau points out,
      the function does no particular memory access.  Except through the FDCS
      macro, which just indexes a static allocation through the current fdc,
      which is always checked against N_FDC.
      
      Except the checking happens after we've already assigned the value.
      
      The floppy driver is a disgrace (a lot of it going back to my original
      horrid "design"), and has no real maintainer.  Nobody has the hardware,
      and nobody really cares.  But it still gets used in virtual environment
      because it's one of those things that everybody supports.
      
      The whole thing should be re-written, or at least parts of it should be
      seriously cleaned up.  The 'current fdc' index, which is used by the
      FDCS macro, and which is often shadowed by a local 'fdc' variable, is a
      prime example of how not to write code.
      
      But because nobody has the hardware or the motivation, let's just fix up
      the immediate problem with a nasty band-aid: test the fdc index before
      actually assigning it to the static 'fdc' variable.
      Reported-by: Jordy Zomer <jordy@simplyhacker.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      2ad1a109
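      A minimal sketch of the band-aid described above (illustrative only, not
      the driver's actual code): validate the requested controller index before
      it is stored in the global, so later accesses that index a static array
      through that global can never go out of bounds.

      #include <linux/errno.h>

      #define N_FDC 2

      static int current_fdc;                 /* global controller index */

      static int set_fdc_index(int new_fdc)
      {
              if (new_fdc < 0 || new_fdc >= N_FDC)
                      return -EINVAL;         /* reject before the assignment happens */

              current_fdc = new_fdc;
              return 0;
      }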
    • ext4: add cond_resched() to __ext4_find_entry() · ef91f0f3
      Committed by Shijie Luo via Kernel
      mainline inclusion
      from mainline-v5.6-rc3
      commit 9424ef56
      category: bugfix
      bugzilla: 31127
      CVE: NA
      
      -------------------------------------------------
      We ran into a soft lockup problem in Linux 4.19 which can also be
      found in Linux 5.x.
      
      When a directory inode takes up a large number of blocks, and the
      directory keeps growing while we are searching, the restart branch
      may be taken many times, so the do-while loop can hold the CPU for
      a long time.
      
      Here is the call trace in linux 4.19.
      
      [  473.756186] Call trace:
      [  473.756196]  dump_backtrace+0x0/0x198
      [  473.756199]  show_stack+0x24/0x30
      [  473.756205]  dump_stack+0xa4/0xcc
      [  473.756210]  watchdog_timer_fn+0x300/0x3e8
      [  473.756215]  __hrtimer_run_queues+0x114/0x358
      [  473.756217]  hrtimer_interrupt+0x104/0x2d8
      [  473.756222]  arch_timer_handler_virt+0x38/0x58
      [  473.756226]  handle_percpu_devid_irq+0x90/0x248
      [  473.756231]  generic_handle_irq+0x34/0x50
      [  473.756234]  __handle_domain_irq+0x68/0xc0
      [  473.756236]  gic_handle_irq+0x6c/0x150
      [  473.756238]  el1_irq+0xb8/0x140
      [  473.756286]  ext4_es_lookup_extent+0xdc/0x258 [ext4]
      [  473.756310]  ext4_map_blocks+0x64/0x5c0 [ext4]
      [  473.756333]  ext4_getblk+0x6c/0x1d0 [ext4]
      [  473.756356]  ext4_bread_batch+0x7c/0x1f8 [ext4]
      [  473.756379]  ext4_find_entry+0x124/0x3f8 [ext4]
      [  473.756402]  ext4_lookup+0x8c/0x258 [ext4]
      [  473.756407]  __lookup_hash+0x8c/0xe8
      [  473.756411]  filename_create+0xa0/0x170
      [  473.756413]  do_mkdirat+0x6c/0x140
      [  473.756415]  __arm64_sys_mkdirat+0x28/0x38
      [  473.756419]  el0_svc_common+0x78/0x130
      [  473.756421]  el0_svc_handler+0x38/0x78
      [  473.756423]  el0_svc+0x8/0xc
      [  485.755156] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [tmp:5149]
      
      Add cond_resched() to avoid the soft lockup and to make the system
      more responsive (a minimal sketch follows this entry).
      
      Link: https://lore.kernel.org/r/20200215080206.13293-1-luoshijie1@huawei.com
      Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@kernel.org
      Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      ef91f0f3
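      A minimal sketch of the fix described above, not the actual ext4 code;
      scan_one_pass() is a hypothetical stand-in for one pass over the
      directory blocks. The point is simply that the restart path now yields
      the CPU, so a lookup racing with directory growth cannot keep a CPU
      busy long enough to trip the watchdog.

      #include <linux/errno.h>
      #include <linux/sched.h>

      static int scan_one_pass(void);         /* hypothetical helper */

      static int search_with_restart(void)
      {
              int ret;

      restart:
              cond_resched();                 /* let other tasks run during long scans */
              ret = scan_one_pass();
              if (ret == -EAGAIN)             /* directory grew while we were searching */
                      goto restart;
              return ret;
      }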
    • x86 / config: add openeuler_defconfig · 95a54772
      Committed by Xiongfeng Wang
      hulk inclusion
      category: config
      bugzilla: 31089
      CVE: NA
      
      -----------------------------
      
      Add openeuler_defconfig for openeuler itself.
      Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      95a54772
    • files_cgroup: Fix soft lockup when refcnt overflow. · 22f98d8e
      Committed by Zhang Xiaoxu
      hulk inclusion
      category: bugfix
      bugzilla: 31087
      CVE: NA
      
      ---------------------
      
      There is a soft lockup call trace as below:
        CPU: 0 PID: 1360 Comm: imapsvcd Kdump: loaded Tainted: G           OE
        task: ffff8a7296e1eeb0 ti: ffff8a7296aa0000 task.ti: ffff8a7296aa0000
        RIP: 0010:[<ffffffffb691ecb4>]  [<ffffffffb691ecb4>]
        __css_tryget+0x24/0x50
        RSP: 0018:ffff8a7296aa3db8  EFLAGS: 00000a87
        RAX: 0000000080000000 RBX: ffff8a7296aa3df8 RCX: ffff8a72820d9a08
        RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8a72820d9a00
        RBP: ffff8a7296aa3db8 R08: 000000000001c360 R09: ffffffffb6a478f4
        R10: ffffffffb6935e83 R11: ffffffffffffffd0 R12: 0000000057d35cd8
        R13: 000000d000000002 R14: ffffffffb6892fbe R15: 000000d000000002
        FS:  0000000000000000(0000) GS:ffff8a72fec00000(0063)
        knlGS:00000000c6e65b40
        CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
        CR2: 0000000057d35cd8 CR3: 00000007e8008000 CR4: 00000000003607f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         [<ffffffffb6a93578>] files_cgroup_assign+0x48/0x60
         [<ffffffffb6a47972>] dup_fd+0xb2/0x2f0
         [<ffffffffb6935e83>] ? audit_alloc+0xe3/0x180
         [<ffffffffb6893a03>] copy_process+0xbd3/0x1a40
         [<ffffffffb6894a21>] do_fork+0x91/0x320
         [<ffffffffb6f329e6>] ? trace_do_page_fault+0x56/0x150
         [<ffffffffb6894d36>] SyS_clone+0x16/0x20
         [<ffffffffb6f3bf8c>] ia32_ptregs_common+0x4c/0xfc
         code: 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 4f 08 48 89 e5 8b
               47 08 8d 90 00 00 00 80 85 c0 0f 49 d0 8d 72 01 89 d0 f0 0f b1
      
      When the child process exits, we do not decrement the refcnt, so the
      refcnt may overflow. Then task_get_css() loops forever because
      css_refcnt() returns an unbiased refcnt: once the refcnt goes negative,
      __css_tryget() always returns false, so task_get_css() spins in a dead
      loop.
      
      The child process always calls close_files() on exit, so add the
      refcnt decrement there (a minimal sketch follows this entry).
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      22f98d8e
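      A minimal sketch of the refcount symmetry being restored;
      files_cgroup_assign() appears in the trace above, while
      files_cgroup_remove() is a hypothetical name for the decrement the patch
      adds to close_files(). The point is that the reference taken when a
      files_struct is charged on fork must be dropped when that files_struct
      is torn down, otherwise the counter keeps growing until it overflows
      negative and __css_tryget() can never succeed.

      #include <linux/fdtable.h>

      void files_cgroup_assign(struct files_struct *files);  /* from the out-of-tree patch */
      void files_cgroup_remove(struct files_struct *files);  /* hypothetical counterpart */

      /* fork path: dup_fd() charges the new files_struct to the cgroup */
      static void example_fork_side(struct files_struct *newf)
      {
              files_cgroup_assign(newf);      /* takes a css reference */
      }

      /* exit path: close_files() must release what dup_fd() took */
      static void example_exit_side(struct files_struct *files)
      {
              files_cgroup_remove(files);     /* drops the css reference */
      }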
    • vt: selection, close sel_buffer race · 099d032f
      Committed by Jiri Slaby
      mainline inclusion
      from mainline-v5.6-rc2
      commit 07e6124a1a46b4b5a9b3cacc0c306b50da87abf5
      category: bugfix
      bugzilla: 13690
      CVE: CVE-2020-8648
      
      -------------------------------------------------
      
      syzkaller reported this UAF:
      BUG: KASAN: use-after-free in n_tty_receive_buf_common+0x2481/0x2940 drivers/tty/n_tty.c:1741
      Read of size 1 at addr ffff8880089e40e9 by task syz-executor.1/13184
      
      CPU: 0 PID: 13184 Comm: syz-executor.1 Not tainted 5.4.7 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
      Call Trace:
      ...
       kasan_report+0xe/0x20 mm/kasan/common.c:634
       n_tty_receive_buf_common+0x2481/0x2940 drivers/tty/n_tty.c:1741
       tty_ldisc_receive_buf+0xac/0x190 drivers/tty/tty_buffer.c:461
       paste_selection+0x297/0x400 drivers/tty/vt/selection.c:372
       tioclinux+0x20d/0x4e0 drivers/tty/vt/vt.c:3044
       vt_ioctl+0x1bcf/0x28d0 drivers/tty/vt/vt_ioctl.c:364
       tty_ioctl+0x525/0x15a0 drivers/tty/tty_io.c:2657
       vfs_ioctl fs/ioctl.c:47 [inline]
      
      It is due to a race between parallel paste_selection (TIOCL_PASTESEL)
      and set_selection_user (TIOCL_SETSEL) invocations. One uses sel_buffer,
      while the other frees it and reallocates a new one for another
      selection. Add a mutex to close this race.
      
      The mutex takes care properly of sel_buffer and sel_buffer_lth only. The
      other selection global variables (like sel_start, sel_end, and sel_cons)
      are protected only in set_selection_user. The other functions need quite
      some more work to close the races of the variables there. This is going
      to happen later.
      
      This likely fixes (I am unsure as there is no reproducer provided) bug
      206361 too. It was marked as CVE-2020-8648.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Reported-by: syzbot+59997e8d5cbdc486e6f6@syzkaller.appspotmail.com
      References: https://bugzilla.kernel.org/show_bug.cgi?id=206361
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200210081131.23572-2-jslaby@suse.cz
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      099d032f
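      A minimal sketch of the locking scheme described above (not the actual
      driver code): both the ioctl that replaces the selection buffer and the
      ioctl that pastes from it serialize on one mutex, so the paste path can
      never read a buffer that a concurrent set-selection has just freed.

      #include <linux/errno.h>
      #include <linux/mutex.h>
      #include <linux/slab.h>
      #include <linux/string.h>

      static DEFINE_MUTEX(sel_lock);          /* protects sel_buffer/sel_buffer_lth */
      static char *sel_buffer;
      static int sel_buffer_lth;

      static int example_set_selection(const char *src, int len)
      {
              char *new_buf = kmemdup(src, len, GFP_KERNEL);

              if (!new_buf)
                      return -ENOMEM;

              mutex_lock(&sel_lock);
              kfree(sel_buffer);              /* the old buffer dies under the lock */
              sel_buffer = new_buf;
              sel_buffer_lth = len;
              mutex_unlock(&sel_lock);
              return 0;
      }

      static void example_paste_selection(void (*push_char)(char c))
      {
              int i;

              mutex_lock(&sel_lock);          /* paste sees a consistent buffer */
              for (i = 0; i < sel_buffer_lth; i++)
                      push_char(sel_buffer[i]);
              mutex_unlock(&sel_lock);
      }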
    • vt: selection, handle pending signals in paste_selection · 734874b5
      Committed by Jiri Slaby
      mainline inclusion
      from mainline-v5.6-rc2
      commit 687bff0cd08f790d540cfb7b2349f0d876cdddec
      category: bugfix
      bugzilla: 13690
      CVE: CVE-2020-8648
      
      -------------------------------------------------
      
      When pasting a selection to a vt, the task is set as INTERRUPTIBLE while
      waiting for a tty to unthrottle. But signals are not handled at all.
      Normally, this is not a problem as tty_ldisc_receive_buf receives all
      the goods and a user has no reason to interrupt the task.
      
      There are two scenarios where this matters:
      1) when the tty is throttled and a signal is sent to the process, it
         spins on a CPU until the tty is unthrottled. schedule() does not
         really schedule, but returns immediately, of course.
      2) when the sel_buffer becomes invalid, KASAN prevents any reads from it
         and the loop simply does not proceed and spins forever (causing the
         tty to throttle, but the code never sleeps, the same as above). This
         sometimes happens as there is a race in the sel_buffer handling code.
      
      So add signal handling to this ioctl (TIOCL_PASTESEL) and return -EINTR
      in case a signal is pending.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20200210081131.23572-1-jslaby@suse.cz
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      734874b5
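      A minimal sketch of the change described above (not the actual vt code):
      the wait-until-unthrottled loop now checks for pending signals and bails
      out with -EINTR instead of spinning or sleeping with signals ignored.

      #include <linux/errno.h>
      #include <linux/sched.h>
      #include <linux/sched/signal.h>

      static int example_wait_until_unthrottled(bool (*throttled)(void))
      {
              set_current_state(TASK_INTERRUPTIBLE);
              while (throttled()) {
                      if (signal_pending(current)) {
                              __set_current_state(TASK_RUNNING);
                              return -EINTR;  /* let the user interrupt the paste */
                      }
                      schedule();             /* sleep until woken, e.g. on unthrottle */
                      set_current_state(TASK_INTERRUPTIBLE);
              }
              __set_current_state(TASK_RUNNING);
              return 0;
      }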
    • RDMA/hns: Compilation Configuration update · f8930e91
      Committed by Gao Xun
      driver inclusion
      category: Bugfix
      bugzilla: NA
      CVE: NA
      
      We updated the conditional compilation layout of the dfx module to
      ensure the driver builds properly when dfx is turned off in the
      .config file.
      Signed-off-by: Gao Xun <gaoxun3@huawei.com>
      Reviewed-by: Hu Chunzhi <huchunzhi@huawei.com>
      Reviewed-by: Wang Lin <wanglin137@huawei.com>
      Reviewed-by: Zhao Weibo <zhaoweibo3@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      f8930e91
    • jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer · 9e486a76
      Committed by zhangyi (F)
      [ Upstream commit c96dceea ]
      
      Commit 904cdbd4 ("jbd2: clear dirty flag when revoking a buffer from
      an older transaction") sets the BH_Freed flag when forgetting a metadata
      buffer which belongs to the committing transaction, to indicate that the
      committing process should clear the dirty bits when it is done with the
      buffer. But it also clears the BH_Mapped flag at the same time, which may
      trigger the NULL pointer oops below when block_size < PAGE_SIZE.
      
      rmdir 1             kjournald2                 mkdir 2
                          jbd2_journal_commit_transaction
      		    commit transaction N
      jbd2_journal_forget
      set_buffer_freed(bh1)
                          jbd2_journal_commit_transaction
                           commit transaction N+1
                           ...
                           clear_buffer_mapped(bh1)
                                                     ext4_getblk(bh2 ummapped)
                                                     ...
                                                     grow_dev_page
                                                      init_page_buffers
                                                       bh1->b_private=NULL
                                                       bh2->b_private=NULL
                           jbd2_journal_put_journal_head(jh1)
                            __journal_remove_journal_head(hb1)
      		       jh1 is NULL and trigger oops
      
      *) Directory entry blocks bh1 and bh2 belong to the same page, and bh2
         has already been unmapped.
      
      For the metadata buffer we are forgetting, keeping the mapped flag and
      clearing the dirty flags is enough, so this patch picks out these
      buffers and keeps their BH_Mapped flag (a minimal sketch follows this
      entry).
      
      Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com
      Fixes: 904cdbd4 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      9e486a76
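      A minimal sketch of the rule stated above (not the actual jbd2 commit
      code, which also has to look at the journal head's transaction state):
      when the committing transaction is done with a forgotten metadata
      buffer, dropping the dirty state is enough; the buffer's mapping to a
      disk block is still valid, so BH_Mapped is left alone.

      #include <linux/jbd2.h>

      static void example_forget_metadata_buffer(struct buffer_head *bh)
      {
              clear_buffer_freed(bh);         /* commit handled the "forget" request */
              clear_buffer_jbddirty(bh);      /* drop the journal dirty state */
              clear_buffer_dirty(bh);         /* drop the normal dirty state */
              /* BH_Mapped is deliberately not cleared */
      }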
    • jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() · c61ee205
      Committed by zhangyi (F)
      [ Upstream commit 6a66a7ded12baa6ebbb2e3e82f8cb91382814839 ]
      
      There is no need to delay clearing the b_modified flag until transaction
      commit time when unmapping a journalled buffer, so just move it to
      journal_unmap_buffer().
      
      Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Cc: stable@kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      c61ee205
    • iscsi: use dynamic single thread workqueue to improve performance · c2cc4a1f
      Committed by Biaoxiang Ye
      euleros inclusion
      category: feature
      feature: Implement NUMA affinity for order workqueue
      
      -------------------------------------------------
      
      On aarch64 NUMA machines, the iscsi kworker keeps jumping around
      across node boundaries. If it runs on a different node (or even a
      different CPU package) from the network interface's softirq, the
      memcpy in iscsi_tcp_segment_recv slows down and iscsi performance
      becomes terrible.
      
      In this patch, we track the CPU of the softirq and tell queue_work_on
      to execute iscsi_xmitworker on the same NUMA node (a minimal sketch
      follows this entry).
      
      The performance data as below:
      fio cmd:
      fio -filename=/dev/disk/by-id/wwn-0x6883fd3100a2ad260036281700000000
      -direct=1 -iodepth=32 -rw=read -bs=64k -size=30G -ioengine=libaio
      -numjobs=1 -group_reporting -name=mytest -time_based -ramp_time=60
      -runtime=60
      
      before patch:
      Jobs: 1 (f=1): [R] [52.5% done] [852.3MB/0KB/0KB /s] [13.7K/0/0 iops] [eta 00m:57s]
      Jobs: 1 (f=1): [R] [53.3% done] [861.4MB/0KB/0KB /s] [13.8K/0/0 iops] [eta 00m:56s]
      Jobs: 1 (f=1): [R] [54.2% done] [868.2MB/0KB/0KB /s] [13.9K/0/0 iops] [eta 00m:55s]
      
      after patch:
      Jobs: 1 (f=1): [R] [53.3% done] [1070MB/0KB/0KB /s] [17.2K/0/0 iops] [eta 00m:56s]
      Jobs: 1 (f=1): [R] [55.0% done] [1064MB/0KB/0KB /s] [17.3K/0/0 iops] [eta 00m:54s]
      Jobs: 1 (f=1): [R] [56.7% done] [1069MB/0KB/0KB /s] [17.1K/0/0 iops] [eta 00m:52s]
      
      cpu info:
      Architecture:          aarch64
      Byte Order:            Little Endian
      CPU(s):                128
      On-line CPU(s) list:   0-127
      Thread(s) per core:    1
      Core(s) per socket:    64
      Socket(s):             2
      NUMA node(s):          4
      Model:                 0
      CPU max MHz:           2600.0000
      CPU min MHz:           200.0000
      BogoMIPS:              200.00
      L1d cache:             64K
      L1i cache:             64K
      L2 cache:              512K
      L3 cache:              32768K
      NUMA node0 CPU(s):     0-31
      NUMA node1 CPU(s):     32-63
      NUMA node2 CPU(s):     64-95
      NUMA node3 CPU(s):     96-127
      Signed-off-by: Biaoxiang Ye <yebiaoxiang@huawei.com>
      Acked-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      c2cc4a1f
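      A minimal sketch of the idea described above (not the actual iscsi
      code): the receive/softirq path records the CPU it ran on, and the
      transmit work is queued on that same CPU so the kworker and the network
      softirq stay on one NUMA node.

      #include <linux/smp.h>
      #include <linux/workqueue.h>

      struct example_conn {
              int last_recv_cpu;              /* CPU that ran the rx softirq */
              struct work_struct xmitwork;
              struct workqueue_struct *wq;
      };

      /* called from the data_ready / softirq path */
      static void example_data_ready(struct example_conn *conn)
      {
              conn->last_recv_cpu = smp_processor_id();
      }

      /* called when there is data to transmit */
      static void example_kick_xmit(struct example_conn *conn)
      {
              /* run the xmit worker near the rx softirq instead of anywhere */
              queue_work_on(conn->last_recv_cpu, conn->wq, &conn->xmitwork);
      }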
    • workqueue: implement NUMA affinity for single thread workqueue · df86cc94
      Committed by Biaoxiang Ye
      euleros inclusion
      category: feature
      feature: Implement NUMA affinity for order workqueue
      
      -------------------------------------------------
      
      Currently, a single thread workqueue has only a single pwq, and all
      works are queued to the same worker pool. This is not optimal on NUMA
      machines and causes workers to jump around across nodes.
      
      This patch adds a new wq flag, __WQ_DYNAMIC. This new kind of single
      thread workqueue creates a separate pwq covering the intersecting CPUs
      for each NUMA node that has online CPUs in @attrs->cpumask, instead of
      mapping all entries of numa_pwq_tbl[] to the same pwq. After this, we
      can specify the @cpu of queue_work_on, so the work is executed on the
      same NUMA node as the specified @cpu (a minimal usage sketch follows
      this entry).
      This kind of wq only supports a single work; with multiple works the
      ordering cannot be guaranteed.
      Signed-off-by: Biaoxiang Ye <yebiaoxiang@huawei.com>
      Acked-by: Hanjun Guo <guohanjun@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      df86cc94
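      A minimal usage sketch for the behaviour described above. __WQ_DYNAMIC
      is the out-of-tree flag added by this patch and does not exist upstream,
      and the exact flag combination used to create such a queue here is an
      assumption; the point is only that, once the queue has per-node pwqs,
      queue_work_on() steers the single work to a worker on the chosen CPU's
      NUMA node.

      #include <linux/errno.h>
      #include <linux/workqueue.h>

      static struct workqueue_struct *example_wq;
      static struct work_struct example_work;

      static int example_init(void)
      {
              /* assumed flags: an unbound, ordered, per-node "dynamic" queue */
              example_wq = alloc_workqueue("example_wq",
                                           WQ_UNBOUND | __WQ_ORDERED | __WQ_DYNAMIC, 1);
              if (!example_wq)
                      return -ENOMEM;
              return 0;
      }

      static void example_submit_near(int cpu)
      {
              /* the single work runs on a worker bound to cpu's NUMA node */
              queue_work_on(cpu, example_wq, &example_work);
      }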