Alibaba Cloud Kernel release 18
Cloud Kernel release 18 is rolling out! This release is rebased onto v4.19.91 LTS. Here is what else it brings (fair warning: this is a long changelog):
Highlights: Features, Enhancements and Bug Fixes from the Alibaba Cloud Linux Internal Version
- alinux: mm: add proc interface to control context readahead (Xiaoguang Wang)
- alinux: Hookers: add arm64 support (Zou Cao)
- alinux: mm, memcg: export workingset counters on memcg v1 (Xu Yu)
- alinux: pci/iohub-sriov: Support for Alibaba PCIe IOHub SRIOV (liushanghui)
- alinux: mm, memcg: abort priority oom if with oom victim (Xu Yu)
- alinux: mm, memcg: account number of processes in the css (Xu Yu)
- alinux: mm, memcg: fix soft lockup in priority oom (Xu Yu)
- alinux: mm, memcg: record latency of memcg wmark reclaim (Xu Yu)
- alinux: doc: use unified official project name Cloud Kernel (Caspar Zhang)
- alinux: mm: oom_kill: show killed task's cgroup info in global oom (Wenwei Tao)
- alinux: mm: memcontrol: enable oom.group on cgroup-v1 (Wenwei Tao)
- alinux: doc: alibaba: Add priority oom descriptions (Wenwei Tao)
- alinux: mm: memcontrol: introduce memcg priority oom (Wenwei Tao)
- alinux: kernel: cgroup: account number of tasks in the css and its descendants (Wenwei Tao)
- alinux: doc: Add Documentation/alibaba/interfaces.rst (Xunlei Pang)
- alinux: memcg: Account throttled time due to memory.wmark_min_adj (Xunlei Pang)
- alinux: memcg: Introduce memory.wmark_min_adj (Xunlei Pang)
- alinux: memcg: Provide users the ability to reap zombie memcgs (Xunlei Pang)
- alinux: jbd2: track slow handle which is preventing transaction committing (Xiaoguang Wang)
- alinux: fs: record page or bio info while process is waitting on it (Xiaoguang Wang)
- alinux: blk: add iohang check function (Xiaoguang Wang)
- alinux: mm,memcg: export memory.{min,low} to cgroup v1 (Xu Yu)
- alinux: mm,memcg: export memory.{events,events.local} to v1 (Xu Yu)
- alinux: mm,memcg: export memory.high to v1 (Xu Yu)
- alinux: arm64: add livepatch support (Zou Cao)
- alinux: blk-throttle: fix logic error about BIO_THROTL_STATED in throtl_bio_end_io() (Xiaoguang Wang)
- alinux: jbd2: fix build errors (Xiaoguang Wang)
- alinux: mm: remove unused variable (Joseph Qi)
- alinux: jbd2: fix build warnings (Joseph Qi)
- alinux: mm: kidled: fix frame-larger-than build warning (Xu Yu)
- alinux: mm: thp: remove deferred split queue from mem_cgroup (Caspar Zhang)
- alinux: psi: using cpuacct_cgrp_id under CONFIG_CGROUP_CPUACCT (Joseph Qi)
- alinux: iocost: fix format mismatch build warning (Joseph Qi)
- alinux: mm: memcontrol: memcg_wmark_wq can be static (kbuild test robot)
New Features and Enhancements From Upstream
- AMD CPU Enhancements
- Hygon CPU Support
- IOUring Support
- cpuidle: Support guest halt polling (Yihao Wu)
- mm: fix trying to reclaim unevictable lru page when calling madvise_pageout (zhong jiang)
- mm: factor out common parts between MADV_COLD and MADV_PAGEOUT (Minchan Kim)
- mm: introduce MADV_PAGEOUT (Minchan Kim)
- mm: introduce MADV_COLD (Minchan Kim)
- mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM (Minchan Kim)
- arm64: mm: implement pte_devmap support (Shannon Zhao)
- add the support of patchable-function-entry for hotfix kpatch with gcc 9.2 (Zou Cao)
- KVM: arm64: Add support 1G hugepages at stage 2 (Shannon Zhao)
- spi: spi: add GPIO chipselect support (Baoyou Xie)
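
The MADV_COLD and MADV_PAGEOUT hints listed above are issued from user space through madvise(2). The following is a minimal sketch of how a process might try them on an anonymous mapping, not a definitive usage guide. It assumes Python 3.8 or later (for `mmap.madvise`) and an architecture that uses the generic `asm-generic/mman-common.h` values (x86_64 and arm64 do); the numeric constants are spelled out because Python's `mmap` module may not export these names. The helper names `advise` and `demo` are hypothetical.

```python
import mmap

# Values from include/uapi/asm-generic/mman-common.h (assumption: the
# running architecture uses the generic header, as x86_64 and arm64 do).
MADV_COLD = 20      # deactivate the pages, making them likelier reclaim targets
MADV_PAGEOUT = 21   # ask the kernel to reclaim the pages right away


def advise(buf, advice):
    """Apply a madvise hint to the whole mapping.

    Returns True if the kernel accepted the hint, False if the call
    failed (for example EINVAL on a kernel without the hint).
    """
    try:
        buf.madvise(advice)
        return True
    except OSError:
        return False


def demo(size=4 * mmap.PAGESIZE):
    # Private anonymous mapping, touched so that pages are really allocated.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        buf[:5] = b"hello"
        result = {
            "MADV_COLD": advise(buf, MADV_COLD),
            "MADV_PAGEOUT": advise(buf, MADV_PAGEOUT),
        }
        # Both hints are non-destructive: the data must still be readable,
        # whether or not the pages were actually reclaimed.
        result["data_intact"] = buf[:5] == b"hello"
        return result
    finally:
        buf.close()


if __name__ == "__main__":
    print(demo())
```

Both hints are advisory, so a True result only means the kernel accepted the call; whether pages are actually deactivated or reclaimed depends on memory pressure and swap availability.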
Kernel Config Changes
- configs: enable overlay redirect dir and inode index by default
- configs: Build support for Alibaba PCIe IOHub SRIOV
- configs: enable CONFIG_FTRACE_SYSCALLS on x86_64 kernel
- configs: Enable arm64 hookers support
- configs: enable CONFIG_LIVEPATCH for aarch64
- configs: enable NVME block device support
- configs: enable intel idle driver
- configs: enable guest halt polling support
- configs: enable X86 PM timer support
- configs: enable io wq for iouring
- configs: add CGROUP_BPF support on X86
- configs: add vmware support
- configs: enable SOFT_WATCHDOG
- configs: enable Hygon support
- configs: enable iocost for aarch64
- configs: enable CONFIG_BLK_DEBUG_FS by default
- configs: add aarch64 config base
- configs: enable deferred page init
- configs: always enable THP by default
- configs: enable iouring support
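
To confirm that a given build carries the config changes above, one option is to scan the kernel's config file for the relevant symbols. The sketch below is illustrative only: it parses config text in the usual `CONFIG_X=y` / `# CONFIG_X is not set` format and reports which requested options are neither built in nor modular. Only the three `CONFIG_*` names that appear verbatim in the list above are used; the sample text, `parse_kernel_config`, and `missing_options` are hypothetical.

```python
import re


def parse_kernel_config(text):
    """Map CONFIG_* names to their values.

    'CONFIG_X=y' yields 'y', 'CONFIG_X="str"' yields 'str', and
    '# CONFIG_X is not set' yields 'n'.
    """
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.match(r"# (CONFIG_\w+) is not set$", line)
        if unset:
            options[unset.group(1)] = "n"
            continue
        assigned = re.match(r"(CONFIG_\w+)=(.*)$", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2).strip('"')
    return options


def missing_options(text, wanted):
    """Return the wanted options that are neither 'y' nor 'm', in order."""
    options = parse_kernel_config(text)
    return [name for name in wanted if options.get(name, "n") not in ("y", "m")]


# Hypothetical sample input, not taken from a real Cloud Kernel build.
SAMPLE = """\
CONFIG_LIVEPATCH=y
# CONFIG_FTRACE_SYSCALLS is not set
CONFIG_BLK_DEBUG_FS=y
"""

WANTED = ["CONFIG_LIVEPATCH", "CONFIG_FTRACE_SYSCALLS", "CONFIG_BLK_DEBUG_FS"]

if __name__ == "__main__":
    print(missing_options(SAMPLE, WANTED))
```

On a running system the text would typically come from a file such as `/boot/config-<release>` or `/proc/config.gz`, depending on how the distribution ships its kernel config; which of these exists is an assumption to verify locally.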
Other Bug Fixes
- vfs: fix do_last() regression (Al Viro)
- io-wq: wait for io_wq_create() to setup necessary workers (Jens Axboe) {CVE-2019-19241}
- io_uring: async workers should inherit the user creds (Jens Axboe) {CVE-2019-19241}
- io-wq: have io_wq_create() take a 'data' argument (Jens Axboe) {CVE-2019-19241}
- io_wq: add get/put_work handlers to io_wq_create() (Jens Axboe) {CVE-2019-19241}
- dccp: Fix memleak in __feat_register_sp (YueHaibing) {CVE-2019-20096}
- scsi: libsas: stop discovering if oob mode is disconnected (Jason Yan) {CVE-2019-19965}
- drm/i915/gen9: Clear residual context state on context switch (Akeem G Abodunrin) {CVE-2019-14615}
- RDMA: Fix goto target to release the allocated memory (Navid Emamdoost) {CVE-2019-19077}
- ipmi: Fix memory leak in __ipmi_bmc_register (Navid Emamdoost) {CVE-2019-19046}
- vt: selection, close sel_buffer race (Jiri Slaby) {CVE-2020-8648}
- vgacon: Fix a UAF in vgacon_invert_region (Zhang Xiaoxu) {CVE-2020-8647,CVE-2020-8649}
- do_last(): fetch directory ->i_mode and ->i_uid before it's too late (Al Viro) {CVE-2020-8428}
- x86/kvm: Be careful not to clear KVM_VCPU_FLUSH_TLB bit (Boris Ostrovsky) {CVE-2019-3016}
- KVM: nVMX: Check IO instruction VM-exit conditions (Oliver Upton) {CVE-2020-2732}
- KVM: nVMX: Refactor IO bitmap checks into helper function (Oliver Upton) {CVE-2020-2732}
- KVM: nVMX: Don't emulate instructions in guest mode (Paolo Bonzini) {CVE-2020-2732}
- mm: fix tick timer stall during deferred page init (Shile Zhang)
- bpf/sockmap: Read psock ingress_msg before sk_receive_queue (Lingpeng Chen)
- mm: memcontrol: use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks() (Tetsuo Handa)
- io_uring: io_uring_enter(2) don't poll while SETUP_IOPOLL|SETUP_SQPOLL enabled (Xiaoguang Wang)
- md: make sure desc_nr less than MD_SB_DISKS (Yufen Yu)
- md: avoid invalid memory access for array sb->dev_roles (Yufen Yu)
- md: no longer compare spare disk superblock events in super_load (Yufen Yu)
- md: return -ENODEV if rdev has no mddev assigned (Pawel Baldysiak)
- md/raid10: Fix raid10 replace hang when new added disk faulty (Alex Wu)
- cpuidle: governor: Add new governors to cpuidle_governors again (Rafael J. Wysocki)
- kvm: x86: add host poll control msrs (Marcelo Tosatti)
- KVM: arm64: Opportunistically turn off WFI trapping when using direct LPI injection (Marc Zyngier)
- KVM: vgic-v4: Track the number of VLPIs per vcpu (Marc Zyngier)
- KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put (Marc Zyngier)
- EDAC, skx: Retrieve and print retry_rd_err_log registers (Tony Luck)
- tools headers uapi: Sync asm-generic/mman-common.h with the kernel (Arnaldo Carvalho de Melo)
- tools build: Check if gettid() is available before providing helper (Arnaldo Carvalho de Melo)
- efi: Make efi_rts_work accessible to efi page fault handler (Sai Praneeth)
- netfilter: conntrack: udp: set stream timeout to 2 minutes (Florian Westphal)
- netfilter: conntrack: udp: only extend timeout to stream mode after 2s (Florian Westphal)
- iomap: Allow forcing of waiting for running DIO in iomap_dio_rw() (Jan Kara)
- io_uring: fix poll_list race for SETUP_IOPOLL|SETUP_SQPOLL (Xiaoguang Wang)
- io_uring: add io_uring support (Joseph Qi)
- ext4: start to support iopoll method (Xiaoguang Wang)
- ext4: Move to shared i_rwsem even without dioread_nolock mount opt (Ritesh Harjani)
- ext4: Start with shared i_rwsem in case of DIO instead of exclusive (Ritesh Harjani)
- ext4: fix ext4_dax_read/write inode locking sequence for IOCB_NOWAIT (Ritesh Harjani)
- ext4: introduce direct I/O write using iomap infrastructure (Matthew Bobrowski)
- iomap: move the iomap_dio_rw ->end_io callback into a structure (Christoph Hellwig)
- ext4: update ext4_sync_file() to not use __generic_file_fsync() (Matthew Bobrowski)
- ext4: move inode extension check out from ext4_iomap_alloc() (Matthew Bobrowski)
- ext4: move inode extension/truncate code out from ->iomap_end() callback (Matthew Bobrowski)
- ext4: introduce direct I/O read using iomap infrastructure (Matthew Bobrowski)
- ext4: introduce new callback for IOMAP_REPORT (Matthew Bobrowski)
- iomap: use a srcmap for a read-modify-write I/O (Goldwyn Rodrigues)
- ext4: split IOMAP_WRITE branch in ext4_iomap_begin() into helper (Matthew Bobrowski)
- ext4: move set iomap routines into a separate helper ext4_set_iomap() (Matthew Bobrowski)
- ext4: iomap that extends beyond EOF should be marked dirty (Matthew Bobrowski)
- ext4: update direct I/O read lock pattern for IOCB_NOWAIT (Matthew Bobrowski)
- ext4: reorder map.m_flags checks within ext4_iomap_begin() (Matthew Bobrowski)
- x86/amd_nb: Make hygon_nb_misc_ids static (Pu Wen)
- io-wq: add support for bounded vs unbunded work (Jens Axboe)
- io-wq: io_wqe_run_queue() doesn't need to use list_empty_careful() (Jens Axboe)
- io-wq: use proper nesting IRQ disabling spinlocks for cancel (Jens Axboe)
- io-wq: use kfree_rcu() to simplify the code (YueHaibing)
- net: add __sys_accept4_file() helper (Jens Axboe)
- sched/core, workqueues: Distangle worker accounting from rq lock (Thomas Gleixner)
- sched: Remove stale PF_MUTEX_TESTER bit (Thomas Gleixner)
- ixgbe: Fix calculation of queue with VFs and flow director on interface flap (Cambda Zhu)
- tcp: do not leave dangling pointers in tp->highest_sack (Eric Dumazet)
- include/linux/notifier.h: SRCU: fix ctags (Sam Protsenko)
- mm: thp: don't need care deferred split queue in memcg charge move path (Wei Yang)
- signal: simplify set_user_sigmask/restore_user_sigmask (Oleg Nesterov)
- block: never take page references for ITER_BVEC (Christoph Hellwig)
- signal: remove the wrong signal_pending() check in restore_user_sigmask() (Oleg Nesterov)
- uio: make import_iovec()/compat_import_iovec() return bytes on success (Jens Axboe)
- blk-mq: fix NULL pointer deference in case no poll implementation (Joseph Qi)
- req->error only used for iopoll (Stefan Bühler)
- fs: add sync_file_range() helper (Jens Axboe)
- drm/amdgpu/gmc: fix compiler errors [-Werror,-Wmissing-braces] (V2) (Shirish S)
- add perf smmu-v3 support and fixed duplicate function (Zou Cao)
- iommu/dma: Use NUMA aware memory allocations in __iommu_dma_alloc_pages() (Ganapatrao Kulkarni)
- mm/hotplug: make remove_memory() interface usable (Pavel Tatashin)
- mm/memory_hotplug: make remove_memory() take the device_hotplug_lock (David Hildenbrand)
- mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections (Alexander Duyck)
- mm: implement new zone specific memblock iterator (Alexander Duyck)
- mm: drop meminit_pfn_in_nid as it is redundant (Alexander Duyck)
- mm: use mm_zero_struct_page from SPARC on all 64b architectures (Alexander Duyck)
- nvme-mpath: remove I/O polling support (Christoph Hellwig)
- amd-gpu: Don't undefine READ and WRITE (David Howells)
- blk-mq: grab .q_usage_counter when queuing request from plug code path (Ming Lei)
- block/bfq: fix ifdef for CONFIG_BFQ_GROUP_IOSCHED=y (Konstantin Khlebnikov)
- block: remove bogus check for queue_lock assignment (Jens Axboe)
- block: don't use bio->bi_vcnt to figure out segment number (Ming Lei)
- scsi: core: Run queue when state is set to running after being blocked (zhengbin)
- block: fix NULL pointer dereference in register_disk (zhengbin)
- blk-mq: Add a NULL check in blk_mq_free_map_and_requests() (Dan Carpenter)
- blk-mq: place trace_block_getrq() in correct place (Xiaoguang Wang)
- blk-mq: protect debugfs_create_files() from failures (Greg Kroah-Hartman)
- blk-mq: not embed .mq_kobj and ctx->kobj into queue instance (Ming Lei)
- blk-mq: fallback to previous nr_hw_queues when updating fails (Jianchao Wang)
- blk-mq: realloc hctx when hw queue is mapped to another node (Jianchao Wang)
- blk-mq: adjust debugfs and sysfs register when updating nr_hw_queues (Jianchao Wang)
- mm/memblock.c: skip kmemleak for kasan_init() (Qian Cai)
- tpm: tpm_tis_spi: Introduce a flow control callback (Stephen Boyd)
- tcp: Add snd_wnd to TCP_INFO (Thomas Higdon)
- tcp: Add TCP_INFO counter for packets received out-of-order (Thomas Higdon)
- mm: don't raise MEMCG_OOM event due to failed high-order allocation (Roman Gushchin)
- mm, memcg: introduce memory.events.local (Shakeel Butt)
- mm, memcg: consider subtrees in memory.events (Chris Down)
- iio: adc: ti-ads7950: use SPI_CS_WORD to reduce CPU usage (David Lechner)
- spi: spi-davinci: Add support for SPI_CS_WORD (David Lechner)
- spi: add software implementation for SPI_CS_WORD (David Lechner)
- spi: add new SPI_CS_WORD flag (David Lechner)
- spi: davinci: Remove chip select GPIO pdata (Linus Walleij)
- block: fix 32 bit overflow in __blkdev_issue_discard() (Dave Chinner)
- block: cleanup __blkdev_issue_discard() (Ming Lei)
- iov_iter: fix iov_iter_type (Ming Lei)
- tools headers: Update x86's syscall_64.tbl and uapi/asm-generic/unistd (Arnaldo Carvalho de Melo)
- block: add BIO_NO_PAGE_REF flag (Jens Axboe)
- iov_iter: add ITER_BVEC_FLAG_NO_REF flag (Jens Axboe)
- net: split out functions related to registering inflight socket files (Jens Axboe)
- block: implement bio helper to add iter bvec pages to bio (Jens Axboe)
- fs: add fget_many() and fput_many() (Jens Axboe)
- xfs: Fix stale data exposure when readahead races with hole punch (Jan Kara)
- fs: Export generic_fadvise() (Jan Kara)
- xfs: fix missed wakeup on l_flush_wait (Rik van Riel)
- fs: xfs: xfs_log: Don't use KM_MAYFAIL at xfs_log_reserve(). (Tetsuo Handa)
- xfs: fix off-by-one error in rtbitmap cross-reference (Darrick J. Wong)
- xfs: unlock inode when xfs_ioctl_setattr_get_trans can't get transaction (Darrick J. Wong)
- xfs: fix backwards endian conversion in scrub (Darrick J. Wong)
- xfs: libxfs: move xfs_perag_put late (Pan Bian)
- xfs: finobt AG reserves don't consider last AG can be a runt (Dave Chinner)
- exportfs: fix 'passing zero to ERR_PTR()' warning (YueHaibing)
- NFS: change sign of nfs_fh length (Frank Sorenson)
- nfs: fix xfstest generic/099 failed on nfsv3 (ZhangXiaoxu)
- fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback (Amir Goldstein)
- sysfs: convert BUG_ON to WARN_ON (Greg Kroah-Hartman)
- ext4: fix integer overflow when calculating commit interval (zhangyi (F))
- ext4: cond_resched in work-heavy group loops (Khazhismel Kumykov)
- jbd2: discard dirty data when forgetting an un-journalled buffer (zhangyi (F))
- ext4: replace opencoded i_writecount usage with inode_is_open_for_write() (Nikolay Borisov)
- block: introduce mp_bvec_for_each_page() for iterating over page (Ming Lei)
- block: introduce bvec_nth_page() (Joseph Qi)
- iomap: wire up the iopoll method (Christoph Hellwig)
- block: add bio_set_polled() helper (Jens Axboe)
- block: wire up block device iopoll method (Christoph Hellwig)
- fs: add an iopoll method to struct file_operations (Christoph Hellwig)
- block: clear REQ_HIPRI if polling is not supported (Christoph Hellwig)
- signal: Add restore_user_sigmask() (Deepa Dinamani)
- signal: Add set_user_sigmask() (Deepa Dinamani)
- block: remove ->poll_fn (Christoph Hellwig)
- block: make blk_poll() take a parameter on whether to spin or not (Jens Axboe)
- blk-mq: when polling for IO, look for any completion (Jens Axboe)
- block: Introduce get_current_ioprio() (Damien Le Moal)
- block: have ->poll_fn() return number of entries polled (Jens Axboe)
- block: for async O_DIRECT, mark us as polling if asked to (Jens Axboe)
- block: add REQ_HIPRI and inherit it from IOCB_HIPRI (Jens Axboe)
- iov_iter: Separate type from direction and use accessor functions (David Howells)
- iov_iter: Use accessor function (David Howells)
- EDAC: skx_common: downgrade message importance on missing PCI device (Aristeu Rozanski)
- tcp: Fix highest_sack and highest_sack_seq (Cambda Zhu)