- 10 Jan, 2022 3 commits
-
-
Committed by Li Ruilin
euleros inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4LOJ6 CVE: NA ------------------------------ Add a framework to pass I/O information to a userspace client and to process prefetch requests sent by that client. Create a char device named "acache" to connect kernel space and userspace. Information about all I/O requests is saved in a buffer and passed to the client when the client reads from the device. A prefetch request is treated like a normal I/O request, with two differences: it does not need to return data to userspace, and it should not append a readahead part. Add two parameters: acache_dev_size controls the size of the buffer that stores I/O information, and acache_prefetch_workers controls the maximum number of threads that process prefetch requests. Signed-off-by: Li Ruilin <liruilin4@huawei.com> Reviewed-by: Luan Jianhai <luanjianhai@huawei.com> Reviewed-by: Peng Junyi <pengjunyi1@huawei.com> Acked-by: Xie Xiuqi <xiexiuqi@huawei.com> Signed-off-by: Cheng Jian <cj.chengjian@huawei.com> Reviewed-by: Guangxing Deng <dengguangxing@huawei.com> Reviewed-by: chao song <chao.song@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Baisong Zhong
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PY1Q CVE: NA -------------------------------- Add stub info to some structures to keep the KABI consistent. Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Yue Haibing <yuehaibing@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Zheng Zengkai
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4ETXO CVE: NA ----------------------------------------- Fix the following build warnings in arm32 builds: drivers/net/ethernet/huawei/bma/edma_drv/bma_devintf.c: In function ‘bma_cdev_add_msg’: drivers/net/ethernet/huawei/bma/edma_drv/bma_pci.h:92:20: warning: format ‘%ld’ expects argument of type ‘long int’, but argument 5 has type ‘size_t {aka unsigned int}’ [-Wformat=] drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c: In function ‘veth_recv_pkt’: drivers/net/ethernet/huawei/bma/veth_drv/veth_hb.c:74:37: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 7 has type ‘dma_addr_t {aka unsigned int}’ [-Wformat=] Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
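Both warnings above come from format specifiers that assume a 64-bit type width. As a userspace sketch of the general remedy (not the patch itself, whose exact changes are not shown here), `%zu` matches `size_t` on any target, and widening a variable-width address to `unsigned long long` makes `%llx` valid everywhere. In kernel code, `%pad` with a pointer to the `dma_addr_t` is the usual alternative. The helper names `format_len` and `format_addr` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: %zu matches size_t on 32-bit and 64-bit targets,
 * whereas %ld only matches where size_t happens to be as wide as long. */
int format_len(char *buf, size_t buflen, size_t len)
{
	return snprintf(buf, buflen, "len=%zu", len);
}

/* Hypothetical helper: the address type here stands in for a type whose
 * width depends on the target. The explicit cast makes %llx correct
 * regardless of that width. */
int format_addr(char *buf, size_t buflen, uint32_t addr)
{
	return snprintf(buf, buflen, "addr=0x%llx", (unsigned long long)addr);
}
```

Calling `format_len(buf, sizeof(buf), 42)` fills `buf` with `len=42` without any `-Wformat` warning on either word size.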
-
- 08 Jan, 2022 21 commits
-
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- Add the /proc/sys/kernel/hugepage_pmem_allocall switch. Set it to 1 to allow all memory in pmem to be allocated for hugepages. Set it to 0 (default) and hugepage allocation is limited by the zone watermark as usual. Add the /proc/sys/kernel/hugepage_mig_noalloc switch. Set it to 1 to forbid new hugepage allocation during hugepage migration when the destination node runs out of hugepages. Set it to 0 (default) to allow hugepage allocation during hugepage migration as usual. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Fan Du
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- A userspace migration daemon can check /sys/bus/node/devices/nodeX/type for the node type. Software can query the memory type and distance of each node to pick a suitable target node for migration. grep -r . /sys/devices/system/node/*/type /sys/devices/system/node/node0/type:dram /sys/devices/system/node/node1/type:dram /sys/devices/system/node/node2/type:pmem /sys/devices/system/node/node3/type:pmem Along with the next patch, which exports `peer_node`, the migration daemon can easily find the memory type of the current node and the target node for migration. grep -r . /sys/devices/system/node/*/peer_node /sys/devices/system/node/node0/peer_node:2 /sys/devices/system/node/node1/peer_node:3 /sys/devices/system/node/node2/peer_node:0 /sys/devices/system/node/node3/peer_node:1 Signed-off-by: Fan Du <fan.du@intel.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
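A migration daemon could consume the `type` and `peer_node` attributes shown above roughly as follows. This is a minimal sketch, assuming each attribute file holds a single line such as `pmem` or `2`; the function names and the `enum node_type` are hypothetical and not part of the patch.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum node_type { NODE_UNKNOWN, NODE_DRAM, NODE_PMEM };

/* Read the first line of an attribute file into buf; 0 on success. */
int read_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return 0;
}

/* True if s is exactly word, optionally followed by a newline. */
static int attr_is(const char *s, const char *word)
{
	size_t n = strlen(word);

	return strncmp(s, word, n) == 0 && (s[n] == '\0' || s[n] == '\n');
}

/* Map the contents of a nodeX/type file to an enum. */
enum node_type parse_node_type(const char *s)
{
	if (attr_is(s, "dram"))
		return NODE_DRAM;
	if (attr_is(s, "pmem"))
		return NODE_PMEM;
	return NODE_UNKNOWN;
}

/* Parse the contents of a nodeX/peer_node file; -1 on bad input. */
int parse_peer_node(const char *s)
{
	char *end;
	long v = strtol(s, &end, 10);

	if (end == s || v < 0 || v > INT_MAX)
		return -1;
	return (int)v;
}
```

With the sample output above, reading `/sys/devices/system/node/node2/type` and passing the result to `parse_node_type` would yield `NODE_PMEM`, and `parse_peer_node` on node2's `peer_node` contents would yield 0.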
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- The dax_kmem driver exports pmem as a NUMA node. This patch records which nodes consist of persistent memory for further use. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- Add a callback in pte_hole during walk_page_range so that the user can scan pages that have no page table. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- Currently, cond_resched is called only after a full memslot has been scanned. When scanning a huge memslot, it can take a long time before cond_resched is reached. Call cond_resched after scanning each walk_step-sized chunk of memory instead. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- A KVM shadow page may be freed while etmem_scan is walking the EPT page table. Hold mmu_lock while walking the EPT page table to avoid a use-after-free (UAF). To avoid holding mmu_lock for too long, a walk step module parameter is added to bound the lock hold time. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
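The idea of bounding lock hold time with a walk step can be illustrated in userspace. The sketch below is an analogy only: a pthread mutex stands in for mmu_lock, `sched_yield()` stands in for cond_resched, and `scan_in_steps`, `table_lock`, and the `visit` callback are hypothetical names rather than anything from the etmem code.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Stand-in for the lock that protects the table being walked. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Scan [start, end) in chunks of at most walk_step units. The lock is
 * held for one chunk at a time and the CPU is yielded between chunks,
 * so neither the lock hold time nor the time between yields grows with
 * the size of the range. Returns the number of chunks processed.
 */
size_t scan_in_steps(unsigned long start, unsigned long end,
		     unsigned long walk_step,
		     void (*visit)(unsigned long addr))
{
	size_t chunks = 0;

	if (walk_step == 0)
		return 0;

	while (start < end) {
		unsigned long stop =
			(end - start > walk_step) ? start + walk_step : end;

		pthread_mutex_lock(&table_lock);
		for (unsigned long a = start; a < stop; a++)
			if (visit)
				visit(a);
		pthread_mutex_unlock(&table_lock);

		start = stop;
		chunks++;
		sched_yield();	/* let other work run between chunks */
	}
	return chunks;
}
```

A smaller `walk_step` shortens each critical section at the cost of more lock and unlock cycles, which is the trade-off a tunable module parameter exposes.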
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- The scan/swap modules and etmem access the exported file operations without protection. A kernel crash can be triggered as follows: 1. Insert the scan/swap module. 2. etmem checks whether the exported file operations are set. 3. Remove the scan/swap module. 4. etmem calls the checked file operation. 5. The kernel crashes. Fix this as follows: the scan/swap module sets and clears the operations with the lock held. etmem in the kernel calls try_module_get with the lock held. etmem then calls the read/open/release/ioctl callbacks without the lock, while holding the module reference. Another concurrent-access problem is that open on idle_pages and swap_pages succeeds even when the scan/swap module is not inserted. If the scan/swap module is inserted after the open, subsequent open/read/close calls use the file operations exported by scan/swap. This may also trigger a kernel crash as follows: 1. Open idle_pages or swap_pages. 2. modprobe the scan/swap module. 3. Close idle_pages or swap_pages (module_put is called without a matching try_module_get). 4. modprobe -r the scan/swap module finds an invalid module reference count in the path delete_module syscall->try_stop_module->try_release_module_ref and reports a BUG_ON for ret < 0. Fix this by letting open succeed only when the scan/swap module is inserted. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
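The fix described above follows a common pattern: check that a provider is registered and pin it in one critical section, then call into it without the lock. The following userspace sketch models that pattern with a mutex and a counter. It is an analogy under stated assumptions, not the etmem code: `ops_get` plays the role of the check plus try_module_get, `ops_put` the role of module_put, and all names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

struct scan_ops {
	int (*read)(void);
};

static pthread_mutex_t ops_lock = PTHREAD_MUTEX_INITIALIZER;
static const struct scan_ops *registered_ops;	/* set by the provider */
static int provider_refs;			/* pins held by users */

/* Provider load: publish the operations with the lock held. */
void ops_register(const struct scan_ops *ops)
{
	pthread_mutex_lock(&ops_lock);
	registered_ops = ops;
	pthread_mutex_unlock(&ops_lock);
}

/* Provider unload: refuse while any user still holds a pin. */
int ops_unregister(void)
{
	int ret = 0;

	pthread_mutex_lock(&ops_lock);
	if (provider_refs > 0)
		ret = -1;
	else
		registered_ops = NULL;
	pthread_mutex_unlock(&ops_lock);
	return ret;
}

/*
 * open(): check and pin in one critical section, so the provider cannot
 * disappear between the check and the call. Returns NULL when no
 * provider is registered, so open fails instead of succeeding unpinned.
 */
const struct scan_ops *ops_get(void)
{
	const struct scan_ops *ops;

	pthread_mutex_lock(&ops_lock);
	ops = registered_ops;
	if (ops)
		provider_refs++;
	pthread_mutex_unlock(&ops_lock);
	return ops;
}

/* release(): drop the pin taken by a successful ops_get(). */
void ops_put(void)
{
	pthread_mutex_lock(&ops_lock);
	provider_refs--;
	pthread_mutex_unlock(&ops_lock);
}
```

Because `ops_put` is only ever paired with a successful `ops_get`, the counter cannot go negative, which mirrors the second crash scenario where a put ran without a matching get.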
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- Free pic before returning from vm_idle_read in etmem scan. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- Before this patch, etmem_scan fails if the VM and the host use different page table levels. This patch supports scanning a 4-level EPT while 5-level paging is enabled on the host. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- 1. Add a hugetlb_entry callback to report hugetlb pages. 2. Try to walk the host page table when the EPT entry is not present. 3. Add SCAN_AS_HUGE to report EPT pages at the PMD level, because a host hugetlb page may be split into 4k EPT pages in the VM. 4. Add SCAN_IGN_HOST so that the user can ignore accesses from the host. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Kemeng Shi
euleros inclusion category: feature feature: etmem bugzilla: https://gitee.com/openeuler/kernel/issues/I4OODH?from=project-issue CVE: NA ------------------------------------------------- Support an ioctl for etmem scan to set the scan flag. Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Reviewed-by: louhongxiang <louhongxiang@huawei.com> Reviewed-by: Chen Wandun <chenwandun@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chengchang Tang
mainline inclusion from mainline-v5.16-rc5 commit 38d22088 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M CVE: NA --------------------------------------------------------------------- HIP06 is no longer supported. To reduce unnecessary maintenance, the HIP06 code is removed. Link: https://lore.kernel.org/r/20211220130558.61585-1-liangwenpeng@huawei.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Weihang Li
mainline inclusion from mainline-v5.15-rc1 commit ab5cbb9d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M CVE: NA --------------------------------------------------------------------- There is no need to print an error for hw_v1. Link: https://lore.kernel.org/r/1629985056-57004-5-git-send-email-liangwenpeng@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yixing Liu
mainline inclusion from mainline-v5.15-rc1 commit 0045e0d3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=0045e0d3f42ed7d05434bb5bc16acfc793ea4891 --------------------------------------------------------------------- The current write wqe mechanism is to write to DDR first, and then notify the hardware through the doorbell to read the data. Direct wqe is a mechanism to fill the wqe directly into the hardware. Under light load, the wqe is filled into the PCIe BAR space of the hardware, which saves one memory access and therefore reduces latency. SIMD instructions allow the CPU to write 512 bits at a time to device memory, so they can be used for posting direct wqe. Add the direct wqe enable switch and address mapping. Link: https://lore.kernel.org/r/20211207124901.42123-2-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yixing Liu
mainline inclusion from mainline-v5.15-rc3 commit 39d5534b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=39d5534b1302189c809e90641ffae8cbdc42a8fc --------------------------------------------------------------------- It is more general for ARM device drivers to use the device attribute to map PCI BAR spaces. Fixes: 9a443537 ("IB/hns: Add driver files for hns RoCE driver") Link: https://lore.kernel.org/r/20211206133652.27476-1-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yixing Liu
mainline inclusion from mainline-v5.15-rc1 commit ae2854c5 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=ae2854c5d318c8415e2f033b29fcfcb81a9e9aa7 --------------------------------------------------------------------- Encapsulate qp db into two functions: user and kernel. Link: https://lore.kernel.org/r/1629985056-57004-7-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Chengchang Tang
mainline inclusion from mainline-v5.15-rc1 commit 6d202d9f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=6d202d9f70a33560ab62b81da2b062c936437e54 --------------------------------------------------------------------- Add a new implementation for mmap by using the new mmap entry API. This makes way for further use of the dynamic mmap allocator in this driver. Link: https://lore.kernel.org/r/20211028105640.1056-1-liangwenpeng@huawei.com Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yangyang Li
mainline inclusion from mainline-v5.15-rc1 commit 8feafd90 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/commit/?id=8feafd9017ba5b01c3ea256b59ac2c867a762659 --------------------------------------------------------------------- Switch uar index allocation and release from hns' own bitmap interface to the IDA interface. Link: https://lore.kernel.org/r/1629336980-17499-2-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yixing Liu <liuyixing1@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wenpeng Liang
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4NI61 -------------------------------------------------------------- If the corresponding bit is not set, the user will not be able to create an AH. Fixes: 9a443537 ("IB/hns: Add driver files for hns RoCE driver") Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Guofeng Yue
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M ---------------------------------------------------------------------------- Add the DWQE flag bit to fix the problem that the DWQE function is not enabled. Fixes: a7c87b3e ("RDMA/hns: Add support of direct wqe") Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Reviewed-by: Wenpeng Liang <liangwenpeng@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Yixing Liu
mainline inclusion from mainline-v5.15-rc1 commit 260f64a4 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4O23M CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=260f64a40198309008026447f7fda277a73ed8c3 ---------------------------------------------------------------------------- The stash feature is enabled by default on HIP09. Fixes: f93c39bc ("RDMA/hns: Add support for QP stash") Fixes: bfefae9f ("RDMA/hns: Add support for CQ stash") Link: https://lore.kernel.org/r/1629539607-33217-3-git-send-email-liangwenpeng@huawei.com Signed-off-by: Yixing Liu <liuyixing1@huawei.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Guofeng Yue <yueguofeng@hisilicon.com> Reviewed-by: Yangyang Li <liyangyang20@huawei.com> Acked-by: Xie XiuQi <xiexiuqi@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 07 Jan, 2022 16 commits
-
-
Committed by Wang Yufen
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Add stub proto ops for the tcp compression socket. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang Hai
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- TCP compression reduces the amount of data transmitted between machines, which can increase transmission capacity. A local TCP connection stays on a single machine, so compressing it brings no benefit. Skip compression for local connections by default. Enable it by sysctl: echo 1 > /proc/sys/net/ipv4/tcp_compression_local Signed-off-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang Yufen
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Only enable compression for the given server ports. This means checking dport when sending a SYN, or sport when sending a SYN-ACK. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
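The port check above reduces to one rule: the server port is the destination port of a SYN and the source port of a SYN-ACK. A minimal sketch of that rule, assuming the configured ports are available as a plain array, might look like this. The names `port_listed` and `comp_wanted` are hypothetical and this is not the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if port appears in the configured list of server ports. */
bool port_listed(uint16_t port, const uint16_t *ports, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ports[i] == port)
			return true;
	return false;
}

/*
 * A SYN is sent by the client, so the server port is its destination
 * port. A SYN-ACK is sent by the server, so the server port is its
 * source port.
 */
bool comp_wanted(bool is_synack, uint16_t sport, uint16_t dport,
		 const uint16_t *ports, size_t n)
{
	return port_listed(is_synack ? sport : dport, ports, n);
}
```

With ports 4000 and 5000 configured, a SYN to destination port 4000 qualifies, while a SYN that merely originates from source port 4000 does not.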
-
Committed by Wei Yongjun
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Add a sysctl interface to enable or disable tcp compression by port. Example: $ echo 4000 > /proc/sys/net/ipv4/tcp_compression_ports will enable tcp compression for port 4000. $ echo 4000,5000 > /proc/sys/net/ipv4/tcp_compression_ports will enable tcp compression for both port 4000 and port 5000. $ echo > /proc/sys/net/ipv4/tcp_compression_ports will disable tcp compression. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
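The examples above imply a comma-separated list of port numbers, with an empty write meaning "disabled". The sketch below parses that format in userspace. It only illustrates the input format under that assumption and is not how the kernel handler is implemented; `parse_ports` is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a list such as "4000,5000\n" into out[]. Returns the number of
 * ports parsed, 0 for an empty string (compression disabled), or -1 on
 * malformed input, an out-of-range port, or too many entries.
 */
int parse_ports(const char *s, uint16_t *out, size_t max)
{
	size_t n = 0;

	while (*s == ' ' || *s == '\n')
		s++;
	if (*s == '\0')
		return 0;

	for (;;) {
		char *end;
		unsigned long v;

		errno = 0;
		v = strtoul(s, &end, 10);
		if (end == s || errno || v == 0 || v > 65535 || n == max)
			return -1;
		out[n++] = (uint16_t)v;

		if (*end == ',') {
			s = end + 1;	/* another port follows */
			continue;
		}
		if (*end == '\0' || *end == '\n')
			return (int)n;
		return -1;		/* unexpected separator */
	}
}
```

Writing a bare newline, as `echo >` does, parses to zero ports, which matches the "disable" example above.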
-
Committed by Wang Yufen
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- When a tcp connection is established or closed, tcp compression needs to be initialized or cleaned up at the same time. Add dummy init and cleanup hooks for tcp compression. They will be implemented later. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wang Yufen
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Add a new tcp COMP option to SYN and SYN-ACK when tcp COMP is enabled. A connection compresses its payload only when both sides support it. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Wei Yongjun
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4PNEK CVE: NA ------------------------------------------------- Add config item CONFIG_TCP_COMP for tcp payload compression. This allows payload compression handling of the TCP protocol to be done in-kernel. This patch only adds the CONFIG_TCP_COMP config; the tcp compression capability is implemented later. Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Wang Yufen <wangyufen@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Lu Wei <luwei32@huawei.com> Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Dave Chinner
mainline-inclusion from mainline-v5.14-rc4 commit 33c0dd78 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=33c0dd7898a11ef19169abe5c5049fa6aa099c64 ------------------------------------------------- We only use the CIL workqueue in the CIL, so it makes no sense to hang it off the xfs_mount and have to walk multiple pointers back up to the mount when we have the CIL structures right there. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Lihong Kou <koulihong@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Dave Chinner
mainline-inclusion from mainline-v5.14-rc4 commit 39823d0f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=39823d0fac9416cb89c252d78e262ee8cd76a7d8 ------------------------------------------------- Because we use a single work structure attached to the CIL rather than the CIL context, we can only queue a single work item at a time. This results in the CIL being single threaded and limits performance when it becomes CPU bound. The design of the CIL is that it is pipelined and multiple commits can be running concurrently, but the way the work is currently implemented means that it is not pipelining as it was intended. The critical work to switch the CIL context can take a few milliseconds to run, but the rest of the CIL context flush can take hundreds of milliseconds to complete. The context switching is the serialisation point of the CIL; once the context has been switched, the rest of the context push can run asynchronously with all other context pushes. Hence we can move the work to the CIL context so that we can run multiple CIL pushes at the same time and spread the majority of the work out over multiple CPUs. We can keep the per-cpu CIL commit state on the CIL rather than the context, because the context is pinned to the CIL until the switch is done and we aggregate and drain the per-cpu state held on the CIL during the context switch. However, because we no longer serialise the CIL work, we can have effectively unlimited CIL pushes in progress. We don't want to do this - not only does it create contention on the iclogs and the state machine locks, we can run the log right out of space with outstanding pushes. Instead, limit the work concurrency to 4 concurrent works being processed at a time.
This is enough concurrency to remove the CIL from being a CPU bound bottleneck but not enough to create new contention points or unbound concurrency issues. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Lihong Kou <koulihong@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Dave Chinner
mainline-inclusion from mainline-v5.14-rc4 commit 0020a190 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0020a190cf3eac16995143db41b21b82bacdcbe3 ------------------------------------------------- The AIL pushing is stalling on log forces when it comes across pinned items. This is happening on removal workloads where the AIL is dominated by stale items that are removed from AIL when the checkpoint that marks the items stale is committed to the journal. This results is relatively few items in the AIL, but those that are are often pinned as directories items are being removed from are still being logged. As a result, many push cycles through the CIL will first issue a blocking log force to unpin the items. This can take some time to complete, with tracing regularly showing push delays of half a second and sometimes up into the range of several seconds. Sequences like this aren't uncommon: .... 
399.829437: xfsaild: last lsn 0x11002dd000 count 101 stuck 101 flushing 0 tout 20 <wanted 20ms, got 270ms delay> 400.099622: xfsaild: target 0x11002f3600, prev 0x11002f3600, last lsn 0x0 400.099623: xfsaild: first lsn 0x11002f3600 400.099679: xfsaild: last lsn 0x1100305000 count 16 stuck 11 flushing 0 tout 50 <wanted 50ms, got 500ms delay> 400.589348: xfsaild: target 0x110032e600, prev 0x11002f3600, last lsn 0x0 400.589349: xfsaild: first lsn 0x1100305000 400.589595: xfsaild: last lsn 0x110032e600 count 156 stuck 101 flushing 30 tout 50 <wanted 50ms, got 460ms delay> 400.950341: xfsaild: target 0x1100353000, prev 0x110032e600, last lsn 0x0 400.950343: xfsaild: first lsn 0x1100317c00 400.950436: xfsaild: last lsn 0x110033d200 count 105 stuck 101 flushing 0 tout 20 <wanted 20ms, got 200ms delay> 401.142333: xfsaild: target 0x1100361600, prev 0x1100353000, last lsn 0x0 401.142334: xfsaild: first lsn 0x110032e600 401.142535: xfsaild: last lsn 0x1100353000 count 122 stuck 101 flushing 8 tout 10 <wanted 10ms, got 10ms delay> 401.154323: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x1100353000 401.154328: xfsaild: first lsn 0x1100353000 401.154389: xfsaild: last lsn 0x1100353000 count 101 stuck 101 flushing 0 tout 20 <wanted 20ms, got 300ms delay> 401.451525: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0 401.451526: xfsaild: first lsn 0x1100353000 401.451804: xfsaild: last lsn 0x1100377200 count 170 stuck 22 flushing 122 tout 50 <wanted 50ms, got 500ms delay> 401.933581: xfsaild: target 0x1100361600, prev 0x1100361600, last lsn 0x0 .... In each of these cases, every AIL pass saw 101 log items stuck on the AIL (pinned) with very few other items being found. Each pass, a log force was issued, and delay between last/first is the sleep time + the sync log force time. Some of these 101 items pinned the tail of the log. 
The tail of the log does slowly creep forward (first lsn), but the problem is that the log is actually out of reservation space because it's been running so many transactions that stale items that never reach the AIL but consume log space. Hence we have a largely empty AIL, with long term pins on items that pin the tail of the log that don't get pushed frequently enough to keep log space available. The problem is the hundreds of milliseconds that we block in the log force pushing the CIL out to disk. The AIL should not be stalled like this - it needs to run and flush items that are at the tail of the log with minimal latency. What we really need to do is trigger a log flush, but then not wait for it at all - we've already done our waiting for stuff to complete when we backed off prior to the log force being issued. Even if we remove the XFS_LOG_SYNC from the xfs_log_force() call, we still do a blocking flush of the CIL and that is what is causing the issue. Hence we need a new interface for the CIL to trigger an immediate background push of the CIL to get it moving faster but not to wait on that to occur. While the CIL is pushing, the AIL can also be pushing. We already have an internal interface to do this - xlog_cil_push_now() - but we need a wrapper for it to be used externally. xlog_cil_force_seq() can easily be extended to do what we need as it already implements the synchronous CIL push via xlog_cil_push_now(). Add the necessary flags and "push current sequence" semantics to xlog_cil_force_seq() and convert the AIL pushing to use it. One of the complexities here is that the CIL push does not guarantee that the commit record for the CIL checkpoint is written to disk. The current log force ensures this by submitting the current ACTIVE iclog that the commit record was written to. 
We need the CIL to actually write this commit record to disk for an async push to ensure that the checkpoint actually makes it to disk and unpins the pinned items in the checkpoint on completion. Hence we need to pass down to the CIL push that we are doing an async flush so that it can switch out the commit_iclog if necessary to get written to disk when the commit iclog is finally released.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
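The difference between the existing synchronous force and the async push described in this commit message can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions: struct toy_cil and every toy_* function are hypothetical stand-ins, not the kernel's xlog_cil_force_seq() or xlog_cil_push_now(), and the commit iclog handling from the final paragraph is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the CIL; all fields and names here are hypothetical. */
struct toy_cil {
	unsigned long current_seq;   /* sequence still accumulating items */
	unsigned long push_seq;      /* highest sequence queued for push */
	unsigned long committed_seq; /* highest sequence written to "disk" */
	unsigned int  blocked_waits; /* how often a caller had to wait */
};

/* Queue a background push of everything up to seq; never waits. */
static void toy_push_now(struct toy_cil *cil, unsigned long seq)
{
	assert(seq <= cil->current_seq);
	if (seq > cil->push_seq)
		cil->push_seq = seq;
}

/* Stand-in for the background worker completing the queued push. */
static void toy_push_worker(struct toy_cil *cil)
{
	cil->committed_seq = cil->push_seq;
}

/*
 * Force the CIL up to seq. A sync caller waits for completion (modelled
 * here by running the worker inline); an async caller only queues the
 * push and returns. Returns true if seq is committed on return.
 */
static bool toy_force_seq(struct toy_cil *cil, unsigned long seq, bool async)
{
	toy_push_now(cil, seq);
	if (!async) {
		cil->blocked_waits++;
		toy_push_worker(cil);
	}
	return cil->committed_seq >= seq;
}
```

In this model an async caller such as the AIL pusher gets control back with the push merely queued, so it can carry on flushing other items while the push completes in the background.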
-
Committed by Christoph Hellwig
mainline-inclusion
from mainline-v5.11-rc4
commit ae29e422
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae29e4220fd3047b5442e7e8db8027d7745093f5

-------------------------------------------------

If the inode is not pinned by the time fsync is called we don't need the ilock to protect against concurrent clearing of ili_fsync_fields as the inode won't need a log flush or clearing of these fields. Not taking the ilock allows for full concurrency of fsync and thus O_DSYNC completions with io_uring/aio write submissions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
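The idea in this commit message, checking the pin count first and only entering the locked path when a log flush is actually needed, can be sketched with a toy model. The struct and functions below are hypothetical illustrations, not the real struct xfs_inode or xfs_file_fsync(); the lock is represented by a counter so that the effect is observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; not the real struct xfs_inode. */
struct toy_inode {
	int pin_count;            /* > 0 while the inode is pinned in the log */
	unsigned int fsync_fields; /* dirty-field mask cleared after a flush */
	int lock_acquisitions;    /* counts how often the lock was taken */
};

/* Slow path: take the lock, flush the log for this inode, clear fields. */
static void toy_flush_log(struct toy_inode *ip)
{
	ip->lock_acquisitions++;   /* models lock + unlock around the body */
	if (ip->pin_count > 0) {
		ip->pin_count = 0; /* toy: the flush unpins the inode */
		ip->fsync_fields = 0;
	}
}

/*
 * fsync: only enter the locked slow path when the inode is pinned.
 * Returns true if a log flush was performed.
 */
static bool toy_fsync(struct toy_inode *ip)
{
	if (ip->pin_count == 0)
		return false;      /* nothing to flush, no lock taken */
	toy_flush_log(ip);
	return true;
}
```

An unpinned inode goes through toy_fsync() without touching the lock at all, which is the property that lets concurrent fsync callers avoid serialising on it.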
-
Committed by Christoph Hellwig
mainline-inclusion
from mainline-v5.11-rc4
commit f22c7f87
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f22c7f87777361f94aa17f746fbadfa499248dc8

-------------------------------------------------

Factor out the log syncing logic into two helpers to make the code easier to read and more maintainable.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Darrick J. Wong
mainline-inclusion
from mainline-v5.14-rc4
commit 40b1de00
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=40b1de007aca4f9ec4ee4322c29f026ebb60ac96

-------------------------------------------------

Now that we defer inode inactivation, we've decoupled the process of unlinking or closing an inode from the process of inactivating it. In theory this should lead to better throughput since we now inactivate the queued inodes in batches instead of one at a time.

Unfortunately, one of the primary risks with this decoupling is the loss of rate control feedback between the frontend and background threads. In other words, a rm -rf /* thread can run the system out of memory if it can queue inodes for inactivation and jump to a new CPU faster than the background threads can actually clear the deferred work. The workers can get scheduled off the CPU if they have to do IO, etc.

To solve this problem, we configure a shrinker so that it will activate the /second/ time the shrinkers are called. The custom shrinker will queue all percpu deferred inactivation workers immediately and set a flag to force frontend callers who are releasing a vfs inode to wait for the inactivation workers.

On my test VM with 560M of RAM and a 2TB filesystem, this seems to solve most of the OOMing problem when deleting 10 million inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
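The rate-control feedback described in this commit message can be pictured with a small single-threaded model: under memory pressure a flag is set, and the next frontend caller that releases an inode has to wait for the backlog to drain. This is a sketch under stated assumptions only. All names are hypothetical, the per-CPU queues are collapsed into one counter, and the "activate on the second shrinker call" detail is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of deferred inactivation with pressure feedback. */
struct toy_inodegc {
	unsigned int queued; /* inodes waiting for background inactivation */
	bool throttle;       /* set by the shrinker under memory pressure */
};

/* Background worker: inactivate everything currently queued. */
static void toy_worker_run(struct toy_inodegc *gc)
{
	gc->queued = 0;
}

/* Shrinker stand-in: if there is a backlog, make frontends wait for it. */
static void toy_shrinker_scan(struct toy_inodegc *gc)
{
	if (gc->queued > 0)
		gc->throttle = true;
}

/*
 * Frontend releasing an inode: normally queue it and return at once.
 * With the throttle flag set, wait for the backlog to clear (modelled
 * by running the worker inline). Returns true if the caller waited.
 */
static bool toy_release_inode(struct toy_inodegc *gc)
{
	gc->queued++;
	if (!gc->throttle)
		return false;
	toy_worker_run(gc);
	gc->throttle = false;
	return true;
}
```

Without the flag, toy_release_inode() can grow the backlog without bound; once toy_shrinker_scan() has run, the next release pays for draining it, which restores the coupling between frontend and background work.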
-
Committed by Darrick J. Wong
mainline-inclusion
from mainline-v5.14-rc4
commit a6343e4d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a6343e4d9278b3919c809fab9945c4d8f04fadf5

-------------------------------------------------

When we're servicing an INUMBERS or BULKSTAT request or running quotacheck, grab an empty transaction so that we can use its inherent recursive buffer locking abilities to detect inode btree cycles without hitting ABBA buffer deadlocks. This patch requires the deferred inode inactivation patchset because xfs_irele cannot directly call xfs_inactive when the iwalk itself has an (empty) transaction.

Found by fuzzing an inode btree pointer to introduce a cycle into the tree (xfs/365).

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Darrick J. Wong
mainline-inclusion
from mainline-v5.14-rc4
commit e8d04c2a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e8d04c2abcebd66bdbacd53bb273d824d4e27080

-------------------------------------------------

In xfs_trans_alloc, if the block reservation call returns ENOSPC, we call xfs_blockgc_free_space with a NULL icwalk structure to try to free space. Each frontend thread that encounters this situation starts its own walk of the inode cache to see if it can find anything, which is wasteful since we don't have any additional selection criteria. For this one common case, create a function that reschedules all pending background work immediately and flushes the workqueue so that the scan can run in parallel.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Committed by Darrick J. Wong
mainline-inclusion
from mainline-v5.14-rc4
commit 6f649091
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6f6490914d9b712004ddad648e47b1bf22647978

-------------------------------------------------

Now that we have the infrastructure to switch background workers on and off at will, fix the block gc worker code so that we don't actually run the worker when the filesystem is frozen, same as we do for deferred inactivation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Lihong Kou <koulihong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-