    iscsi: use dynamic single thread workqueue to improve performance · fc7b0914
    Committed by Biaoxiang Ye
    euleros inclusion
    category: feature
    feature: Implement NUMA affinity for order workqueue
    
    -------------------------------------------------
    
    On aarch64 NUMA machines, the kworker created for iscsi keeps jumping
    across node boundaries. If it runs on a different node, or even a
    different CPU package, from the softirq of the network interface, the
    memcpy in iscsi_tcp_segment_recv slows down and iscsi performance
    suffers badly.
    
    In this patch, we track the CPU on which the softirq runs and use
    queue_work_on to execute iscsi_xmitworker on the same NUMA node.
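
    The selection logic can be modelled in user space roughly as below.
    This is only an illustrative sketch, not the code in this patch:
    struct conn_model, record_softirq_cpu, pick_work_cpu and
    cpu_to_node_model are hypothetical names, and the fixed layout of 32
    consecutive CPUs per node is assumed from the cpu info in this
    message. In the kernel, the chosen CPU would be the first argument
    to queue_work_on.

```c
#include <assert.h>

/* Assumed topology (from the cpu info in this message):
 * 128 CPUs, 4 NUMA nodes, 32 consecutive CPUs per node. */
#define NR_CPUS_MODEL   128
#define CPUS_PER_NODE   32

/* Hypothetical per-connection state: the CPU last seen running the
 * network softirq for this connection, or -1 if none was seen yet. */
struct conn_model {
    int softirq_cpu;
};

/* Map a CPU number to its NUMA node under the assumed layout. */
static int cpu_to_node_model(int cpu)
{
    return cpu / CPUS_PER_NODE;
}

/* Receive-path side: remember which CPU handled the softirq.
 * Out-of-range values are ignored. */
static void record_softirq_cpu(struct conn_model *c, int cpu)
{
    if (cpu >= 0 && cpu < NR_CPUS_MODEL)
        c->softirq_cpu = cpu;
}

/* Transmit-path side: choose the CPU on which to queue the work.
 * Stay on the current CPU if it already shares a node with the
 * softirq; otherwise move to the recorded softirq CPU. */
static int pick_work_cpu(const struct conn_model *c, int current_cpu)
{
    assert(current_cpu >= 0 && current_cpu < NR_CPUS_MODEL);

    if (c->softirq_cpu < 0)
        return current_cpu;
    if (cpu_to_node_model(current_cpu) == cpu_to_node_model(c->softirq_cpu))
        return current_cpu;
    return c->softirq_cpu;
}
```

    With the softirq recorded on CPU 70 (node 2), a caller on CPU 5
    (node 0) is redirected to CPU 70, while a caller on CPU 90 (node 2)
    stays where it is.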
    
    Performance data is shown below.
    fio cmd:
    fio -filename=/dev/disk/by-id/wwn-0x6883fd3100a2ad260036281700000000
    -direct=1 -iodepth=32 -rw=read -bs=64k -size=30G -ioengine=libaio
    -numjobs=1 -group_reporting -name=mytest -time_based -ramp_time=60
    -runtime=60
    
    before patch:
    Jobs: 1 (f=1): [R] [52.5% done] [852.3MB/0KB/0KB /s] [13.7K/0/0 iops] [eta 00m:57s]
    Jobs: 1 (f=1): [R] [53.3% done] [861.4MB/0KB/0KB /s] [13.8K/0/0 iops] [eta 00m:56s]
    Jobs: 1 (f=1): [R] [54.2% done] [868.2MB/0KB/0KB /s] [13.9K/0/0 iops] [eta 00m:55s]
    
    after patch:
    Jobs: 1 (f=1): [R] [53.3% done] [1070MB/0KB/0KB /s] [17.2K/0/0 iops] [eta 00m:56s]
    Jobs: 1 (f=1): [R] [55.0% done] [1064MB/0KB/0KB /s] [17.3K/0/0 iops] [eta 00m:54s]
    Jobs: 1 (f=1): [R] [56.7% done] [1069MB/0KB/0KB /s] [17.1K/0/0 iops] [eta 00m:52s]
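
    For reference, the relative gain can be worked out from the three
    bandwidth samples on each side. The helper names mean and
    percent_gain are illustrative only. With the samples above, the
    averages are about 860.6 MB/s before and 1067.7 MB/s after, a gain
    of roughly 24%.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples; n must be non-zero. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative improvement of "after" over "before", in percent. */
static double percent_gain(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

    Calling percent_gain(mean(before, 3), mean(after, 3)) with the
    bandwidth values listed in the two fio excerpts gives the figure
    quoted here.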
    
    cpu info:
    Architecture:          aarch64
    Byte Order:            Little Endian
    CPU(s):                128
    On-line CPU(s) list:   0-127
    Thread(s) per core:    1
    Core(s) per socket:    64
    Socket(s):             2
    NUMA node(s):          4
    Model:                 0
    CPU max MHz:           2600.0000
    CPU min MHz:           200.0000
    BogoMIPS:              200.00
    L1d cache:             64K
    L1i cache:             64K
    L2 cache:              512K
    L3 cache:              32768K
    NUMA node0 CPU(s):     0-31
    NUMA node1 CPU(s):     32-63
    NUMA node2 CPU(s):     64-95
    NUMA node3 CPU(s):     96-127
    Signed-off-by: Biaoxiang Ye <yebiaoxiang@huawei.com>
    Acked-by: Hanjun Guo <guohanjun@huawei.com>
    Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>