- 30 September 2022, 2 commits
Committed by Jens Axboe
With end_io handlers now being able to potentially pass ownership of the request upon completion, we can allow requests with end_io handlers in the batch completion handling.

Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Co-developed-by: Stefan Roesch <shr@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
In preparation for allowing the end_io handler to pass ownership back to the block layer, rather than retain ownership of the request, everything is converted to returning RQ_END_IO_NONE; there should be no functional changes with this patch.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
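For context, the return-value contract described above plausibly takes the shape of a small enum; RQ_END_IO_NONE is named in the commit message, while the second value, its name, and the typedef are illustrative assumptions, not quoted from the patch.

	/* Sketch only: RQ_END_IO_NONE comes from the commit message; the
	 * second value and the typedef are assumptions for illustration.
	 */
	enum rq_end_io_ret {
		RQ_END_IO_NONE,	/* handler done, caller still owns the request */
		RQ_END_IO_FREE,	/* ownership passed back; block layer frees it */
	};

	typedef enum rq_end_io_ret (rq_end_io_fn)(struct request *, blk_status_t);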
-
- 29 September 2022, 1 commit
Committed by Pankaj Raghav
The current implementation of blk_mq_plug() disables plugging for all operations that involve a transfer to the device, since it only checks the lowest bit of the opcode via op_is_write(). Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and REQ_OP_WRITE_ZEROES, as those are the operations that might require a zone lock.

Suggested-by: Christoph Hellwig <hch@lst.de>
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220929074745.103073-2-p.raghav@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
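A hedged sketch of the narrowed check; the helper name is hypothetical and the two opcodes come from the commit message, so this is the idea rather than the patch itself.

	/* Sketch: only writes that can take a zone write lock should bypass
	 * plugging. Helper name is hypothetical; opcodes are from the message.
	 */
	static inline bool op_may_need_zone_lock(enum req_op op)
	{
		return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES;
	}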
-
- 27 September 2022, 3 commits
Committed by Christoph Hellwig
Replace blk_queue_nowait with a bdev_nowait helper that takes the block_device, given that the I/O submission path should not have to look into the request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20220927075815.269694-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
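The helper plausibly reduces to a one-line wrapper; a sketch under the assumption that the nowait state still lives in a queue flag and that bdev_get_queue() remains the accessor.

	/* Sketch: only the bdev-based entry point is new; the flag test is
	 * assumed to be unchanged underneath.
	 */
	static inline bool bdev_nowait(struct block_device *bdev)
	{
		return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
	}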
-
Committed by Christoph Hellwig
Mark them as unsigned so that we don't need extra casts, and define them relative to cdword0 instead of requiring extra shifts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
-
Committed by Christoph Hellwig
Pass the gendisk to blkcg_schedule_throttle as part of moving the blk-cgroup infrastructure to be gendisk based. Remove the unused !BLK_CGROUP stub while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220921180501.1539876-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 September 2022, 3 commits
Committed by ZiyangZhang
START_USER_RECOVERY and END_USER_RECOVERY are two new control commands to support the user recovery feature.

After a crash, the user should send START_USER_RECOVERY, which will:
(1) check that (a) the current ublk_device is UBLK_S_DEV_QUIESCED, which was set by quiesce_work, and (b) the chardev is released;
(2) reinit all ubqs, including: (a) putting the task_struct and resetting ->ubq_daemon to NULL, (b) resetting all ublk_io;
(3) reset ub->mm to NULL.

Then the user should start a new process and send FETCH_REQ on each ubq_daemon. Finally, the user should send END_USER_RECOVERY, which will:
(1) wait for all new ubq_daemons to get ready;
(2) update ublksrv_pid;
(3) unquiesce the request queue and expect incoming ublk_queue_rq();
(4) convert ub's state to UBLK_S_DEV_LIVE.

Note: we can handle STOP_DEV between START_USER_RECOVERY and END_USER_RECOVERY. This is helpful for users who cannot start a new process after sending the START_USER_RECOVERY ctrl-cmd.

Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220923153919.44078-7-ZiyangZhang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by ZiyangZhang
UBLK_F_USER_RECOVERY_REISSUE implies that, with a dying ubq_daemon, ublk_drv lets monitor_work requeue rqs that were issued to userspace (ublksrv) before the ubq_daemon started dying.

UBLK_F_USER_RECOVERY_REISSUE is designed for backends that:
(1) tolerate double writes, since ublk_drv may issue the same rq twice;
(2) must not let frontend users see I/O errors, such as a read-only FS or a VM backend.

Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220923153919.44078-6-ZiyangZhang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by ZiyangZhang
Define some macros for the recovery feature.

UBLK_S_DEV_QUIESCED implies that the ublk_device is quiesced and ready for recovery. This state can be observed by userspace.

UBLK_F_USER_RECOVERY implies that:
(1) ublk_drv enables the recovery feature; it won't let monitor_work automatically abort rqs and release the device;
(2) with a dying ubq_daemon, ublk_drv ends (aborts) rqs issued to userspace (ublksrv) before the crash;
(3) with a dying ubq_daemon, ublk_drv requeues rqs in task work and ublk_queue_rq().

Signed-off-by: ZiyangZhang <ZiyangZhang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220923153919.44078-3-ZiyangZhang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 23 September 2022, 1 commit
Committed by Will Deacon
Due to undocumented, hysterical raisins on x86, the CFI jump-table sections in .text are needlessly aligned to PMD_SIZE in the vmlinux linker script. When compiling a CFI-enabled arm64 kernel with a 64KiB page-size, a PMD maps 512MiB of virtual memory and so the .text section increases to a whopping 940MiB and blows the final Image up to 960MiB. Others report a link failure.

Since the CFI jump-table requires only instruction alignment, reduce the alignment directives to function alignment for parity with other parts of the .text section. This reduces the size of the .text section for the aforementioned 64KiB page size arm64 kernel to 19MiB, for a much more reasonable total Image size of 39MiB.

Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Mohan Rao .vanimina" <mailtoc.mohanrao@gmail.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/all/CAL_GTzigiNOMYkOPX1KDnagPhJtFNqSK=1USNbS0wUL4PW6-Uw@mail.gmail.com/
Fixes: cf68fffb ("add support for Clang CFI")
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220922215715.13345-1-will@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
-
- 22 September 2022, 10 commits
Committed by Phil Auld
As PAGE_SIZE is unsigned long, -1 > PAGE_SIZE when NR_CPUS <= 3. This leads to very large file sizes:

topology$ ls -l
total 0
-r--r--r-- 1 root root 18446744073709551615 Sep  5 11:59 core_cpus
-r--r--r-- 1 root root 4096 Sep  5 11:59 core_cpus_list
-r--r--r-- 1 root root 4096 Sep  5 10:58 core_id
-r--r--r-- 1 root root 18446744073709551615 Sep  5 10:10 core_siblings
-r--r--r-- 1 root root 4096 Sep  5 11:59 core_siblings_list
-r--r--r-- 1 root root 18446744073709551615 Sep  5 11:59 die_cpus
-r--r--r-- 1 root root 4096 Sep  5 11:59 die_cpus_list
-r--r--r-- 1 root root 4096 Sep  5 11:59 die_id
-r--r--r-- 1 root root 18446744073709551615 Sep  5 11:59 package_cpus
-r--r--r-- 1 root root 4096 Sep  5 11:59 package_cpus_list
-r--r--r-- 1 root root 4096 Sep  5 10:58 physical_package_id
-r--r--r-- 1 root root 18446744073709551615 Sep  5 10:10 thread_siblings
-r--r--r-- 1 root root 4096 Sep  5 11:59 thread_siblings_list

Adjust the inequality to catch the case when NR_CPUS is configured to a small value.

Fixes: 7ee951ac ("drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist")
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: stable@vger.kernel.org
Cc: feng xiangjun <fengxj325@gmail.com>
Reported-by: feng xiangjun <fengxj325@gmail.com>
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20220906203542.1796629-1-pauld@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
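The underlying pitfall is the usual arithmetic conversions: a negative signed value compared against an unsigned long is first converted to unsigned and becomes ULONG_MAX, exactly the 18446744073709551615 seen in the listing above. A self-contained userspace illustration of the pitfall, not the driver code itself:

	#include <stdio.h>

	int main(void)
	{
		unsigned long page_size = 4096;	/* PAGE_SIZE is unsigned long */
		long size = -1;			/* a size that went negative */

		/* 'size' converts to unsigned long, i.e. ULONG_MAX, so this is true */
		if (size > page_size)
			printf("%ld compares as %lu, which is > %lu\n",
			       size, (unsigned long)size, page_size);
		return 0;
	}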
-
Committed by Pavel Begunkov
Add a zerocopy version of sendmsg.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6aabc4bdfc0ec78df6ec9328137e394af9d4e7ef.1663668091.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Jens Axboe
We need the poll_flags to know how to poll for the IO, and we should have the batch structure in preparation for supporting batched completions with iopoll.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Kanchan Joshi
This is in preparation to support iopoll for nvme passthrough.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220823161443.49436-4-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Kanchan Joshi
Wire this up in the same way iopoll is done for regular read/write IO. Make room for storing a cookie in struct io_uring_cmd on submission. Perform the completion using the ->uring_cmd_iopoll handler.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20220823161443.49436-3-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Kanchan Joshi
io_uring will invoke this to do completion polling on uring-cmd operations.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Link: https://lore.kernel.org/r/20220823161443.49436-2-joshi.k@samsung.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Dylan Yudaken
Add tracing for io_run_local_task_work.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220830125013.570060-8-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Dylan Yudaken
Some workloads rely on a registered eventfd (via io_uring_register_eventfd(3)) in order to wake up and process the io_uring. In the case of a ring setup with IORING_SETUP_DEFER_TASKRUN, that eventfd also needs to be signalled when there are tasks to run.

This changes an old behaviour which assumed 1 eventfd signal implied at least 1 CQE, however only when this new flag is set (and so old users will not notice). This should be expected with the IORING_SETUP_DEFER_TASKRUN flag, as it is not guaranteed that every task will result in a CQE.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220830125013.570060-7-dylany@fb.com
[axboe: fold in call_rcu() serialization fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Dylan Yudaken
Allow deferring async tasks until the user calls io_uring_enter(2) with the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag at io_uring_setup time. This functionality requires that the later io_uring_enter will be called from the same submission task, and therefore restricts this flag to work only when IORING_SETUP_SINGLE_ISSUER is also set.

Being able to hand-pick when tasks are run prevents the problem where there is current work to be done, but task work runs anyway. For example, a common workload would obtain a batch of CQEs and process each one. Interrupting this with additional task work would add latency but not gain anything. If instead task work is deferred to just before more CQEs are obtained, then no additional latency is added.

The way this is implemented is by trying to keep task work local to an io_ring_ctx, rather than to the submission task. This is required, as the application will want to wake up only a single io_ring_ctx at a time to process work, and so the lists of work have to be kept separate. This has some other benefits, like not having to check the task continually in handle_tw_list (and potentially unlocking/locking those), and reducing locks in the submit & process completions path.

There are networking cases where using this option can reduce request latency by 50%. For example, a contrived example using [1], where the client sends 2k data and receives the same data back while doing some system calls (to trigger task work), shows this reduction. The reason ends up being that if sending responses is delayed by processing task work, then the client side sits idle. Whereas reordering the sends first means that the client runs its workload in parallel with the local task work.

[1]: Using https://github.com/DylanZA/netbench/tree/defer_run
Client:
./netbench --client_only 1 --control_port 10000 --host <host> --tx "epoll --threads 16 --per_thread 1 --size 2048 --resp 2048 --workload 1000"
Server:
./netbench --server_only 1 --control_port 10000 --rx "io_uring --defer_taskrun 0 --workload 100" --rx "io_uring --defer_taskrun 1 --workload 100"

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220830125013.570060-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
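From userspace, a hedged usage sketch with liburing, assuming a liburing and kernel new enough to expose both setup flags; the flag names come from the commit message, the rest is a plain liburing skeleton.

	#include <liburing.h>

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_params p = { 0 };

		/* DEFER_TASKRUN requires SINGLE_ISSUER per the commit message */
		p.flags = IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN;
		if (io_uring_queue_init_params(8, &ring, &p) < 0)
			return 1;	/* kernel too old, or setup failed */

		/* ... prepare and queue SQEs here ... */

		/* deferred task work runs when we enter the kernel to reap CQEs */
		io_uring_submit_and_wait(&ring, 1);

		io_uring_queue_exit(&ring);
		return 0;
	}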
-
Committed by Dylan Yudaken
Guard wakeups that the user can trigger, and that may end up triggering a call back into eventfd_signal. This is in addition to the current approach that only guards in eventfd_signal. Rename in_eventfd_signal -> in_eventfd at the same time to reflect this.

Without this there would be a deadlock in the following code using libaio:

	int main()
	{
		struct io_context *ctx = NULL;
		struct iocb iocb;
		struct iocb *iocbs[] = { &iocb };
		int evfd;
		uint64_t val = 1;

		evfd = eventfd(0, EFD_CLOEXEC);
		assert(!io_setup(2, &ctx));
		io_prep_poll(&iocb, evfd, POLLIN);
		io_set_eventfd(&iocb, evfd);
		assert(1 == io_submit(ctx, 1, iocbs));
		write(evfd, &val, 8);
	}

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220816135959.1490641-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 21 September 2022, 2 commits
Committed by Bart Van Assche
The documentation of the blk_eh_timer_return enumeration values does not reflect correctly how e.g. the SCSI core uses these values. Fix the documentation.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Fixes: 88b0cfad ("block: document the blk_eh_timer_return values")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220920200626.3422296-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Lu Baolu
This reverts commit 9cd4f143.

Some issues were reported on the original commit. Some thunderbolt devices don't work anymore due to the following DMA fault:

DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [09:00.0] fault index 0x8080 [fault reason 0x25] Blocked a compatibility format interrupt request

Bring it back for now to avoid functional regression.

Fixes: 9cd4f143 ("iommu/vt-d: Fix possible recursive locking in intel_iommu_init()")
Link: https://lore.kernel.org/linux-iommu/485A6EA5-6D58-42EA-B298-8571E97422DE@getmailspring.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216497
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: <stable@vger.kernel.org> # 5.19.x
Reported-and-tested-by: George Hilliard <thirtythreeforty@gmail.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20220920081701.3453504-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
- 20 September 2022, 2 commits
Committed by Christoph Hellwig
PSI accounting is now done by the VM code, where it should have been since the beginning.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220915094200.139713-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig
PSI tries to account for the cost of bringing back in pages discarded by the MM LRU management. Currently the prime place for that is hooked into the bio submission path, which is a rather bad place:

- it does not actually account I/O for non-block file systems, of which we have many
- it adds overhead and a layering violation to the block layer

Add the accounting into the two places in the core MM code that read pages into an address space by calling into ->read_folio and ->readahead so that the entire file system operations are covered, to broaden the coverage and allow removing the accounting in the block layer going forward.

As psi_memstall_enter can deal with nested calls this will not lead to double accounting even while the bio annotations are still present.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20220915094200.139713-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
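The call-site shape this describes is a memstall bracket around the read; psi_memstall_enter/leave are the existing PSI API, while the wrapper below and its surrounding details are assumptions for illustration.

	/* Sketch only: wraps a ->read_folio call in PSI memstall accounting;
	 * the function and variable names here are assumed, not from the patch.
	 */
	static int read_folio_accounted(struct file *file, struct folio *folio)
	{
		struct address_space *mapping = folio->mapping;
		unsigned long pflags;
		int error;

		psi_memstall_enter(&pflags);	/* a refault counts as a memory stall */
		error = mapping->a_ops->read_folio(file, folio);
		psi_memstall_leave(&pflags);
		return error;
	}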
-
- 16 September 2022, 2 commits
Committed by Benjamin Poirier
There are already a few definitions of arrays containing MULTICAST_LACPDU_ADDR and the next patch will add one more use. These all contain the same constant data so define one common instance for all bonding code.

Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Haimin Zhang
There is an uninit-value bug in the dgram_sendmsg function in net/ieee802154/socket.c when the length of the valid data pointed to by msg->msg_name isn't verified. Introduce a helper function, ieee802154_sockaddr_check_size, to check namelen: first check that there is an addr_type in ieee802154_addr_sa, then check namelen according to the addr_type. The same check is also applied in raw_bind, dgram_bind, and dgram_connect.

Signed-off-by: Haimin Zhang <tcs_kernel@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
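A hedged sketch of such a validation helper; the *_NAMELEN constants below are hypothetical placeholders, and only the overall flow (check addr_type first, then the per-type minimum length) comes from the commit message.

	/* Sketch: validate namelen before trusting any field beyond addr_type.
	 * The IEEE802154_*_NAMELEN constants are hypothetical placeholders.
	 */
	static int ieee802154_sockaddr_check_size(struct sockaddr_ieee802154 *daddr,
						  int len)
	{
		/* enough bytes to read addr_type at all? */
		if (len < IEEE802154_MIN_NAMELEN)
			return -EINVAL;

		switch (daddr->addr.addr_type) {
		case IEEE802154_ADDR_SHORT:
			return len < IEEE802154_NAMELEN_SHORT ? -EINVAL : 0;
		case IEEE802154_ADDR_LONG:
			return len < IEEE802154_NAMELEN_LONG ? -EINVAL : 0;
		default:
			return -EINVAL;
		}
	}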
-
- 13 September 2022, 1 commit
Committed by Jiangshan Yi
Fix spelling typo in comment.

Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jiangshan Yi <yijiangshan@kylinos.cn>
Signed-off-by: Helge Deller <deller@gmx.de>
-
- 12 September 2022, 2 commits
Committed by Yu Kuai
Test scripts:

	cd /sys/fs/cgroup/blkio/
	echo "8:0 1024" > blkio.throttle.write_bps_device
	echo $$ > cgroup.procs
	dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &
	dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct &

Test result:

	10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s
	10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s

The problem is that the second bio is finished after 10s instead of 20s.

Root cause:
1) The second bio will be flagged:

	__blk_throtl_bio
		while (true) {
			...
			if (sq->nr_queued[rw]) -> some bio is throttled already
				break
		};
		bio_set_flag(bio, BIO_THROTTLED); -> flag the bio

2) The flagged bio will be dispatched without waiting:

	throtl_dispatch_tg
		tg_may_dispatch
			tg_with_in_bps_limit
				if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED))
					*wait = 0; -> wait time is zero
					return true;

Commit 9f5ede3c ("block: throttle split bio in case of iops limit") added support for counting split bios toward the iops limit, and thus added flagged-bio checking in tg_with_in_bps_limit() so that split bios would only be counted once for the bps limit. However, it introduced a new problem: io throttling won't work if multiple bios are throttled.

In order to fix the problem, handle the iops and bps limits in different ways:
1) for the iops limit, there is no flag to record whether the bio is throttled, and the iops limit is always applied;
2) for the bps limit, the original bio will be flagged with BIO_BPS_THROTTLED, and io throttling will ignore bios with the flag.

Note that this patch also removes the code that sets the flag in __bio_clone(); it was introduced in commit 111be883 ("block-throttle: avoid double charge"), whose author assumed a split bio could be resubmitted and throttled again, which is wrong because a split bio will continue to be dispatched from the caller.

Fixes: 9f5ede3c ("block: throttle split bio in case of iops limit")
Cc: <stable@vger.kernel.org>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220829022240.3348319-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Keith Busch
Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679
Signed-off-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 11 September 2022, 1 commit
Committed by Lu Baolu
The global rwsem dmar_global_lock was introduced by commit 3a5670e8 ("iommu/vt-d: Introduce a rwsem to protect global data structures"). It is used to protect DMAR related global data from DMAR hotplug operations.

The dmar_global_lock used in intel_iommu_init() might cause a recursive locking issue; for example, intel_iommu_get_resv_regions() takes the dmar_global_lock from within a section where intel_iommu_init() already holds it via probe_acpi_namespace_devices().

The use of dmar_global_lock in intel_iommu_init() can be relaxed, since it is unlikely that any IO board must be hot added before the IOMMU subsystem is initialized. This eliminates the possible recursive locking issue by moving DMAR hotplug support down to after the IOMMU is initialized and removing the uses of dmar_global_lock in intel_iommu_init().

Fixes: d5692d4a ("iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()")
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/894db0ccae854b35c73814485569b634237b5538.1657034828.git.robin.murphy@arm.com
Link: https://lore.kernel.org/r/20220718235325.3952426-1-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
-
- 10 September 2022, 1 commit
Committed by Luiz Augusto von Dentz
Recent changes break HCIGETDEVINFO since they change the size of hci_dev_info.

Fixes: 26afbd82 ("Bluetooth: Add initial implementation of CIS connections")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
-
- 08 September 2022, 1 commit
Committed by Linus Torvalds
Commit d4252071 ("add barriers to buffer_uptodate and set_buffer_uptodate") added proper memory barriers to the buffer head BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date will be guaranteed to actually see initialized state.

However, that commit didn't _just_ add the memory barrier, it also ended up dropping the "was it already set" logic that the BUFFER_FNS() macro had.

That's conceptually the right thing for a generic "this is a memory barrier" operation, but in the case of the buffer contents, we really only care about the memory barrier for the _first_ time we set the bit, in that the only memory ordering protection we need is to avoid anybody seeing uninitialized memory contents. Any other access ordering wouldn't be about the BH_Uptodate bit anyway, and would require some other proper lock (typically BH_Lock or the folio lock). A reader that races with somebody invalidating the buffer head isn't an issue wrt the memory ordering, it's a serialization issue.

Now, you'd think that the buffer head operations don't matter in this day and age (and I certainly thought so), but apparently some loads still end up being heavy users of buffer heads. In particular, the kernel test robot reported that not having this bit access optimization in place caused a noticeable direct IO performance regression on ext4:

	fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression

although you presumably need a fast disk and a lot of cores to actually notice.

Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Fengwei Yin <fengwei.yin@intel.com>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
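The restored fast path plausibly looks like the sketch below, built from the standard bitops helpers; this is an illustration of the "barrier only on the first transition" idea, not a verbatim copy of the patch.

	/* Sketch: only the 0 -> 1 transition needs the publishing barrier;
	 * if the bit is already set, a previous setter provided the ordering.
	 */
	static inline void set_buffer_uptodate_sketch(struct buffer_head *bh)
	{
		if (test_bit(BH_Uptodate, &bh->b_state))
			return;	/* already set: skip the barrier and the atomic */

		smp_mb__before_atomic();	/* order data writes before the bit */
		set_bit(BH_Uptodate, &bh->b_state);
	}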
-
- 07 September 2022, 3 commits
Committed by Ilpo Järvinen
A very common pattern in the drivers is to advance the xmit tail index and do bookkeeping of Tx'ed characters. Create uart_xmit_advance() to handle it.

Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20220901143934.8850-2-ilpo.jarvinen@linux.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
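Given that description, the helper plausibly looks like the sketch below; the field names are assumed from the conventional serial core layout (circ_buf xmit ring plus the tx icount), not copied from the patch.

	/* Sketch: advance the circular xmit buffer tail by 'chars' and count
	 * them as transmitted. Layout of uart_port/circ_buf is assumed.
	 */
	static inline void uart_xmit_advance(struct uart_port *up, unsigned int chars)
	{
		struct circ_buf *xmit = &up->state->xmit;

		xmit->tail = (xmit->tail + chars) & (UART_XMIT_SIZE - 1);
		up->icount.tx += chars;
	}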
-
Committed by Menglong Dong
As Eric reported, the 'reason' field is not presented when tracing the kfree_skb event with perf:

	$ perf record -e skb:kfree_skb -a sleep 10
	$ perf script
	ip_defrag 14605 [021] 221.614303: skb:kfree_skb: skbaddr=0xffff9d2851242700 protocol=34525 location=0xffffffffa39346b1 reason:

The cause seems to be passing a kernel address directly to TP_printk(), which is not right. As the enum 'skb_drop_reason' is not exported to user space through TRACE_DEFINE_ENUM(), perf can't get the drop reason string from the 'reason' field, which is a number.

Therefore, we introduce the macro DEFINE_DROP_REASON(), which is used to define the trace enum by TRACE_DEFINE_ENUM(). With the help of DEFINE_DROP_REASON(), we can now remove the auto-generation that we introduced in commit ec43908d ("net: skb: use auto-generation to convert skb drop reason to string"), and define the string array 'drop_reasons'.

Hmmmm...now we come back to the situation of having to maintain drop reasons in both enum skb_drop_reason and DEFINE_DROP_REASON. But they are both in dropreason.h, which makes it easier.

After this commit, the format of kfree_skb looks like this:

	$ cat /tracing/events/skb/kfree_skb/format
	name: kfree_skb
	ID: 1524
	format:
		field:unsigned short common_type; offset:0; size:2; signed:0;
		field:unsigned char common_flags; offset:2; size:1; signed:0;
		field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
		field:int common_pid; offset:4; size:4; signed:1;
		field:void * skbaddr; offset:8; size:8; signed:0;
		field:void * location; offset:16; size:8; signed:0;
		field:unsigned short protocol; offset:24; size:2; signed:0;
		field:enum skb_drop_reason reason; offset:28; size:4; signed:0;
	print fmt: "skbaddr=%p protocol=%u location=%p reason: %s", REC->skbaddr, REC->protocol, REC->location, __print_symbolic(REC->reason, { 1, "NOT_SPECIFIED" }, { 2, "NO_SOCKET" } ......

Fixes: ec43908d ("net: skb: use auto-generation to convert skb drop reason to string")
Link: https://lore.kernel.org/netdev/CANn89i+bx0ybvE55iMYf5GJM48WwV1HNpdm9Q6t-HaEstqpCSA@mail.gmail.com/
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
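The X-macro pattern described above might look roughly like the sketch below; the entries are abbreviated to three for illustration, and the exact expansion helpers are assumed rather than quoted from dropreason.h.

	/* Sketch of the X-macro idea: one list expands into the enum, the
	 * TRACE_DEFINE_ENUM() calls, and the string table. Entries abbreviated.
	 */
	#define DEFINE_DROP_REASON(FN, FNe)	\
		FN(NOT_SPECIFIED)		\
		FN(NO_SOCKET)			\
		FNe(PKT_TOO_SMALL)

	/* e.g. expand it into the string array: */
	#define FN(reason) [SKB_DROP_REASON_##reason] = #reason,
	#define FNe(reason) [SKB_DROP_REASON_##reason] = #reason

	const char * const drop_reasons[] = {
		DEFINE_DROP_REASON(FN, FNe)
	};
	#undef FN
	#undef FNe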
-
Committed by Christoph Hellwig
Now that the remaining users in drivers are gone, this function can be marked static.

Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 06 September 2022, 1 commit
Committed by Vitaly Kuznetsov
There are already three places in kernel which define PCI_VENDOR_ID_MICROSOFT and two for PCI_DEVICE_ID_HYPERV_VIDEO, and there's a need to use these from core VMBus code. Move the defines where they belong.

No functional change.

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> # pci_ids.h
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Link: https://lore.kernel.org/r/20220827130345.1320254-2-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
-
- 05 September 2022, 3 commits
Remove the CONFIG_PREEMPT_RT symbol from the ifdef around do_softirq_own_stack() and move it to Kconfig instead. Enable softirq stacks based on SOFTIRQ_ON_OWN_STACK, which depends on HAVE_SOFTIRQ_ON_OWN_STACK and whose default value is set to !PREEMPT_RT. This ensures that softirq stacks are not used on PREEMPT_RT and avoids a 'select' statement on an option which has a 'depends' statement.

Link: https://lore.kernel.org/YvN5E%2FPrHfUhggr7@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-
Committed by Maher Sanalla
When the RDMA auxiliary driver probes, it sets its profile based on the devlink driverinit value. The latter might not be in sync with FW yet (in case devlink reload is not performed), thus causing a mismatch between the RDMA driver and FW. This results in the following FW syndrome when the RDMA driver tries to adjust the RoCE state, which fails the probe:

	"0xC1F678 | modify_nic_vport_context: roce_en set on a vport that doesn't support roce"

To prevent this, select the PF profile based on the FW RoCE capability instead of relying on the devlink driverinit value. To provide backward compatibility of the RoCE disable feature, on older FWs where roce_rw is not set (the FW RoCE capability is read-only), keep the current behavior, e.g. rely on the devlink driverinit value.

Fixes: fbfa97b4 ("net/mlx5: Disable roce at HCA level")
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Maher Sanalla <msanalla@nvidia.com>
Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
-
Committed by Greg Kroah-Hartman
There is a very common pattern of using debugfs_remove(debugfs_lookup(...)), which results in a dentry leak of the dentry that was looked up. Instead of having to open-code the correct pattern of calling dput() on the dentry, create debugfs_lookup_and_remove() to handle this pattern automatically and properly, without any memory leaks.

Cc: stable <stable@kernel.org>
Reported-by: Kuyo Chang <kuyo.chang@mediatek.com>
Tested-by: Kuyo Chang <kuyo.chang@mediatek.com>
Link: https://lore.kernel.org/r/YxIaQ8cSinDR881k@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
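The fix plausibly boils down to pairing the lookup with a dput(); a sketch under that assumption, with the shape inferred from the commit message rather than copied from the patch:

	/* Sketch: look up by name, remove, then drop the reference the
	 * lookup took, the step the open-coded pattern forgets.
	 */
	void debugfs_lookup_and_remove(const char *name, struct dentry *parent)
	{
		struct dentry *dentry;

		dentry = debugfs_lookup(name, parent);
		if (!dentry)
			return;

		debugfs_remove(dentry);
		dput(dentry);	/* balance the reference from debugfs_lookup() */
	}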
-
- 04 September 2022, 1 commit
Committed by Jens Axboe
This reverts commit 16ede669.

This is causing issues with CPU stalls on my test box, revert it for now until we understand what is going on. It looks like infinite looping off sbitmap_queue_wake_up(), but hard to tell with a lot of CPUs hitting this issue and the console scrolling infinitely.

Link: https://lore.kernel.org/linux-block/e742813b-ce5c-0d58-205b-1626f639b1bd@kernel.dk/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-