- 03 June 2021, 4 commits
-
-
Submitted by Martin Belanger

In our application, we need a way to force TCP connections to go out a specific IP interface instead of letting Linux select the interface based on the routing tables. Add the 'host-iface' option to allow specifying the interface to use. When the option host-iface is specified, the driver uses the specified interface to set the option SO_BINDTODEVICE on the TCP socket before connecting.

This new option is needed in addition to the existing host-traddr for the following reasons:

Specifying an IP interface by its associated IP address is less intuitive than specifying the actual interface name and, in some cases, simply doesn't work. That's because the association between interfaces and IP addresses is not predictable. IP addresses can be changed or can change by themselves over time (e.g. DHCP). Interface names are predictable [1] and will persist over time. Consider the following configuration.

  1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state ...
     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
     inet 100.0.0.100/24 scope global lo
        valid_lft forever preferred_lft forever
  2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
     link/ether 08:00:27:21:65:ec brd ff:ff:ff:ff:ff:ff
     inet 100.0.0.100/24 scope global enp0s3
        valid_lft forever preferred_lft forever
  3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
     link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
     inet 100.0.0.100/24 scope global enp0s8
        valid_lft forever preferred_lft forever

The above is a VM that I configured with the same IP address (100.0.0.100) on all interfaces. Doing a reverse lookup to identify the unique interface associated with 100.0.0.100 does not work here. And this is why the option host_iface is required. I understand that the above config does not represent a standard host system, but I'm using this to prove a point: "We can never know how users will configure their systems". By the way, the above configuration is perfectly fine by Linux.

The current TCP implementation for host_traddr performs a bind()-before-connect(). This is a common construct to set the source IP address on a TCP socket before connecting. This has no effect on how Linux selects the interface for the connection. That's because Linux uses the Weak End System model as described in RFC 1122 [2]. On the other hand, setting the source IP address has benefits and should be supported by linux-nvme. In fact, setting the source IP address is a mandatory FedGov requirement (e.g. connection to a RADIUS/TACACS+ server). Consider the following configuration.

  $ ip addr list dev enp0s8
  3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc ...
     link/ether 08:00:27:4f:95:5c brd ff:ff:ff:ff:ff:ff
     inet 192.168.56.101/24 brd 192.168.56.255 scope global enp0s8
        valid_lft 426sec preferred_lft 426sec
     inet 192.168.56.102/24 scope global secondary enp0s8
        valid_lft forever preferred_lft forever
     inet 192.168.56.103/24 scope global secondary enp0s8
        valid_lft forever preferred_lft forever
     inet 192.168.56.104/24 scope global secondary enp0s8
        valid_lft forever preferred_lft forever

Here we can see that several addresses are associated with interface enp0s8. By default, Linux always selects the default IP address, 192.168.56.101, as the source address when connecting over interface enp0s8. Some users, however, want the ability to specify a different source address (e.g., 192.168.56.102, 192.168.56.103, ...). The option host_traddr can be used as-is to perform this function.

In conclusion, I believe that we need 2 options for TCP connections. One that can be used to specify an interface (host-iface). And one that can be used to set the source address (host-traddr). Users should be allowed to use one or the other, or both, or none. Of course, the documentation for host_traddr will need some clarification. It should state that when used for TCP connections, this option only sets the source address. And the documentation for host_iface should say that this option is only available for TCP connections.

References:
[1] https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
[2] https://tools.ietf.org/html/rfc1122

Tested both IPv4 and IPv6 connections.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
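For readers less familiar with the two socket mechanisms contrasted above, here is a minimal userspace C sketch (interface name, addresses and port are made-up examples, and this is not the driver code itself) showing how SO_BINDTODEVICE pins the egress interface while bind()-before-connect() only pins the source address:

```c
/* Sketch only: illustrates SO_BINDTODEVICE vs. bind()-before-connect().
 * Interface name, addresses and port below are hypothetical examples. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int connect_example(void)
{
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	/* host-iface equivalent: force the connection out of enp0s8,
	 * regardless of what the routing tables would pick.
	 * (Requires CAP_NET_RAW in userspace.) */
	const char *iface = "enp0s8";
	if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE, iface, strlen(iface)) < 0)
		goto err;

	/* host-traddr equivalent: bind()-before-connect() only selects the
	 * source address; under the Weak End System model (RFC 1122) Linux
	 * may still send the packets out of any interface. */
	struct sockaddr_in src = { .sin_family = AF_INET };
	inet_pton(AF_INET, "192.168.56.102", &src.sin_addr);
	if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0)
		goto err;

	struct sockaddr_in dst = { .sin_family = AF_INET,
				   .sin_port = htons(4420) };
	inet_pton(AF_INET, "192.168.56.200", &dst.sin_addr);
	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0)
		goto err;
	return fd;
err:
	close(fd);
	return -1;
}
```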
-
Submitted by Mario Limonciello
The documentation around the StorageD3Enable property hints that it should be made on the PCI device. This is where newer AMD systems set the property and it's required for S0i3 support. So rather than look for nodes of the root port only present on Intel systems, switch to the companion ACPI device for all systems. David Box from Intel indicated this should work on Intel as well.

Link: https://lore.kernel.org/linux-nvme/YK6gmAWqaRmvpJXb@google.com/T/#m900552229fa455867ee29c33b854845fce80ba70
Link: https://docs.microsoft.com/en-us/windows-hardware/design/component-guidelines/power-management-for-storage-hardware-devices-intro
Fixes: df4f9bc4 ("nvme-pci: add support for ACPI StorageD3Enable property")
Suggested-by: Liang Prike <Prike.Liang@amd.com>
Acked-by: Raul E Rangel <rrangel@chromium.org>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Alexey Bogoslavsky

The algorithm that was used until now for building the APST configuration table has been found to produce entries with excessively long ITPT (idle time prior to transition) for devices declaring relatively long entry and exit latencies for non-operational power states. This leads to unnecessary waste of power and, as a result, failure to pass mandatory power consumption tests on Chromebook platforms.

The new algorithm is based on two predefined ITPT values and two predefined latency tolerances. Based on these values, as well as on exit and entry latencies reported by the device, the algorithm looks for up to 2 suitable non-operational power states to use as primary and secondary APST transition targets. The predefined values are supplied to the nvme driver as module parameters:

- apst_primary_timeout_ms (default: 100)
- apst_secondary_timeout_ms (default: 2000)
- apst_primary_latency_tol_us (default: 15000)
- apst_secondary_latency_tol_us (default: 100000)

The algorithm echoes the approach used by Intel's and Microsoft's drivers on Windows. The specific default parameter values are also based on those drivers. Yet, this patch doesn't introduce the ability to dynamically regenerate the APST table in the event of switching the power source from AC to battery and back. Adding this functionality may be considered in the future. In the meantime, the timeouts and tolerances reflect a compromise between the values used by Microsoft for AC and battery scenarios.

In most NVMe devices the new algorithm causes them to implement a more aggressive power saving policy. While beneficial in most cases, this sometimes comes at the price of a higher IO processing latency in certain scenarios as well as at the price of a potential impact on the drive's endurance (due to more frequent context saving when entering deep non-operational states). So in order to provide a fallback for systems where these regressions cannot be tolerated, the patch allows reverting to the legacy behavior by setting either the apst_primary_timeout_ms or the apst_primary_latency_tol_us parameter to 0. Eventually (and possibly after fine tuning the default values of the module parameters) the legacy behavior can be removed.

TESTING.

The new algorithm has been extensively tested. Initially, simulations were used to compare APST tables generated by old and new algorithms for a wide range of devices. After that, power consumption, performance and latencies were measured under different workloads on devices from multiple vendors (WD, Intel, Samsung, Hynix, Kioxia). Below is the description of the tests and the findings.

General observations. The effect the patch has on the APST table varies depending on the entry and exit latencies advertised by the devices. For some devices, the effect is negligible (e.g. Kioxia KBG40ZNS), for some significant, making the transitions to PS3 and PS4 much quicker (e.g. WD SN530, Intel 760P), or making the sleep deeper, PS4 rather than PS3 after a similar amount of time (e.g. SK Hynix BC511). For some devices (e.g. Samsung PM991) the effect is mixed: the initial transition happens after a longer idle time, but takes the device to a lower power state.

Workflows. In order to evaluate the patch's effect on power consumption and latency, 7 workflows were used for each device. The workflows were designed to test the scenarios where significant differences between the old and new behaviors are most likely. Each workflow was tested twice: with the new and with the old APST table generation implementation. Power consumption, performance and latency were measured in the process. The following workflows were used:
1) Consecutive write at the maximum rate with IO depth of 2, with no pauses
2) Repeated pattern of 1000 consecutive writes of 4K packets followed by 50ms idle time
3) Repeated pattern of 1000 consecutive writes of 4K packets followed by 150ms idle time
4) Repeated pattern of 1000 consecutive writes of 4K packets followed by 500ms idle time
5) Repeated pattern of 1000 consecutive writes of 4K packets followed by 1.5s idle time
6) Repeated pattern of 1000 consecutive writes of 4K packets followed by 5s idle time
7) Repeated pattern of a single random read of a 4K packet followed by 150ms idle time

Power consumption. Actual power consumption measurements produced predictable results in accordance with the APST mechanism's theory of operation. Devices with long entry and exit latencies such as the WD SN530 showed huge improvement on scenarios 4, 5 and 6 of up to 62%. Devices such as the Kioxia KBG40ZNS, where the resulting APST table looks virtually identical with both legacy and new algorithms, showed little or no change in the average power consumption on all workflows. Devices with extra short latencies such as the Samsung PM991 showed a moderate increase in power consumption of up to 18% in worst case scenarios. In addition, on Intel and Samsung devices a more complex impact was observed on scenarios 3, 4 and 7. Our understanding is that due to the longer stay in deep non-operational states between the writes the devices start performing background operations, leading to an increase of power consumption. With the old APST tables part of these operations are delayed until the scenario is over and a longer idle period begins, but eventually this extra power is consumed anyway.

Performance. In terms of performance measured on sustained write or read scenarios, the effect of the patch is minimal as in this case the device doesn't enter low power states.

Latency. As expected, in devices where the patch causes a more aggressive power saving policy (e.g. WD SN530, Intel 760P), an increase in latency was observed in certain scenarios. Workflow number 7, specifically designed to simulate the worst case scenario as far as latency is concerned, indeed shows a sharp increase in average latency (~2ms -> ~53ms on Intel 760P and 0.6ms -> 10ms on WD SN530). The latency increase on other workloads and other devices is much milder or non-existent.

Signed-off-by: Alexey Bogoslavsky <alexey.bogoslavsky@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
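A minimal sketch of the primary/secondary selection logic as described in this commit message; the structures, field names, and helper below are illustrative stand-ins, not the actual nvme_configure_apst() implementation in drivers/nvme/host/core.c:

```c
/* Illustrative only: pick an APST transition target the way the commit
 * message describes. All names here are hypothetical. */
#include <stdbool.h>
#include <stdint.h>

struct ps_info {
	uint32_t entry_lat_us;		/* entry latency reported by device */
	uint32_t exit_lat_us;		/* exit latency reported by device */
	bool	 non_operational;
};

struct apst_target {
	int	 state;			/* power state index, -1 if none */
	uint64_t itpt_ms;		/* idle time prior to transition */
};

/* Find the deepest non-operational state whose combined entry + exit
 * latency fits within the given tolerance, and pair it with the
 * predefined timeout (ITPT). */
static struct apst_target pick_target(const struct ps_info *ps, int nr_ps,
				      uint32_t latency_tol_us,
				      uint64_t timeout_ms)
{
	struct apst_target t = { .state = -1, .itpt_ms = timeout_ms };
	int i;

	for (i = nr_ps - 1; i >= 0; i--) {	/* deeper states have higher index */
		if (!ps[i].non_operational)
			continue;
		if (ps[i].entry_lat_us + ps[i].exit_lat_us <= latency_tol_us) {
			t.state = i;
			break;
		}
	}
	return t;
}

/* Usage mirroring the module parameter defaults: primary target with
 * (100 ms, 15000 us), secondary target with (2000 ms, 100000 us). */
```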
-
Submitted by Colin Ian King

The variable ret is being initialized with a value that is never read; it is updated later on. The assignment is redundant and can be removed.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 19 May 2021, 3 commits
-
-
Submitted by James Smart

The __nvmf_check_ready() routine used to bounce all filesystem I/O if the controller state isn't LIVE. However, a later patch changed the logic so that the rejection ends up being based on the queue-live check. The FC transport has a slightly different sequence from rdma and tcp for shutting down queues/marking them non-live. FC marks its queue non-live after aborting all I/Os and waiting for their termination, leaving a rather large window for filesystem I/O to continue to hit the transport. Unfortunately, this resulted in filesystem I/O or applications seeing I/O errors.

Change the FC transport to mark the queues non-live at the first sign of teardown for the association (when I/O is initially terminated).

Fixes: 73a53799 ("nvme-fabrics: allow to queue requests for live queues")
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Keith Busch

A possible race condition exists where a request to send data enqueued from nvme_tcp_handle_r2t() will not be observed by nvme_tcp_send_all() if it happens to be running. The driver relies on io_work to send the enqueued request when it runs again, but the concurrently running nvme_tcp_send_all() may not have released the send_mutex at that time. If no future commands are enqueued to re-kick the io_work, the request will time out in the SEND_H2C state, resulting in a timeout error like:

  nvme nvme0: queue 1: timeout request 0x3 type 6

Ensure the io_work continues to run as long as the req_list is not empty.

Fixes: db5ad6b7 ("nvme-tcp: try to send request in queue_rq context")
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Sagi Grimberg

Commit db5ad6b7 ("nvme-tcp: try to send request in queue_rq context") added a second context that may perform a network send. This means that now RX and TX are not serialized in nvme_tcp_io_work and can run concurrently.

While there is correct mutual exclusion in the TX path (where the send_mutex protects the queue socket send activity), RX activity, and more specifically request completion, may run concurrently. This means we must guarantee that any mutation of the request state related to its lifetime or bytes sent does not happen when a completion may have possibly arrived back (and been processed).

The race may trigger when a request completion arrives, is processed _and_ reused as a fresh new request, exactly in the (relatively short) window between the last data payload sent and before the request iov_iter is advanced.

Consider the following race:
1. 16K write request is queued
2. The nvme command and the data is sent to the controller (in-capsule or solicited by r2t)
3. After the last payload is sent but before the req.iter is advanced, the controller sends back a completion.
4. The completion is processed, the request is completed, and reused to transfer a new request (write or read)
5. The new request is queued, and the driver resets the request parameters (nvme_tcp_setup_cmd_pdu).
6. Now the context in (2) resumes execution and advances the req.iter ==> use-after-completion as this is already a new request.

Fix this by making sure the request is not advanced after the last data payload send, knowing that a completion may have arrived already.

An alternative solution would have been to delay the request completion or state change waiting for reference counting on the TX path, but besides adding atomic operations to the hot-path, it may present challenges in multi-stage R2T scenarios where an r2t handler needs to be deferred to an async execution.

Reported-by: Narayan Ayalasomayajula <narayan.ayalasomayajula@wdc.com>
Tested-by: Anil Mishra <anil.mishra@wdc.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 13 May 2021, 1 commit
-
-
Submitted by Hou Pu

The new ana_log_size should be used instead of the old one. Otherwise a kernel NULL pointer dereference will happen, like below:

[ 38.957849][ T69] BUG: kernel NULL pointer dereference, address: 000000000000003c
[ 38.975550][ T69] #PF: supervisor write access in kernel mode
[ 38.975955][ T69] #PF: error_code(0x0002) - not-present page
[ 38.976905][ T69] PGD 0 P4D 0
[ 38.979388][ T69] Oops: 0002 [#1] SMP NOPTI
[ 38.980488][ T69] CPU: 0 PID: 69 Comm: kworker/0:2 Not tainted 5.12.0+ #54
[ 38.981254][ T69] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 38.982502][ T69] Workqueue: events nvme_loop_execute_work
[ 38.985219][ T69] RIP: 0010:memcpy_orig+0x68/0x10f
[ 38.986203][ T69] Code: 83 c2 20 eb 44 48 01 d6 48 01 d7 48 83 ea 20 0f 1f 00 48 83 ea 20 4c 8b 46 f8 4c 8b 4e f0 4c 8b 56 e8 4c 8b 5e e0 48 8d 76 e0 <4c> 89 47 f8 4c 89 4f f0 4c 89 57 e8 4c 89 5f e0 48 8d 7f e0 73 d2
[ 38.987677][ T69] RSP: 0018:ffffc900001b7d48 EFLAGS: 00000287
[ 38.987996][ T69] RAX: 0000000000000020 RBX: 0000000000000024 RCX: 0000000000000010
[ 38.988327][ T69] RDX: ffffffffffffffe4 RSI: ffff8881084bc004 RDI: 0000000000000044
[ 38.988620][ T69] RBP: 0000000000000024 R08: 0000000100000000 R09: 0000000000000000
[ 38.988991][ T69] R10: 0000000100000000 R11: 0000000000000001 R12: 0000000000000024
[ 38.989289][ T69] R13: ffff8881084bc000 R14: 0000000000000000 R15: 0000000000000024
[ 38.989845][ T69] FS: 0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
[ 38.990234][ T69] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 38.990490][ T69] CR2: 000000000000003c CR3: 00000001085b2000 CR4: 00000000000006f0
[ 38.991105][ T69] Call Trace:
[ 38.994157][ T69]  sg_copy_buffer+0xb8/0xf0
[ 38.995357][ T69]  nvmet_copy_to_sgl+0x48/0x6d
[ 38.995565][ T69]  nvmet_execute_get_log_page_ana+0xd4/0x1cb
[ 38.995792][ T69]  nvmet_execute_get_log_page+0xc9/0x146
[ 38.995992][ T69]  nvme_loop_execute_work+0x3e/0x44
[ 38.996181][ T69]  process_one_work+0x1c3/0x3c0
[ 38.996393][ T69]  worker_thread+0x44/0x3d0
[ 38.996600][ T69]  ? cancel_delayed_work+0x90/0x90
[ 38.996804][ T69]  kthread+0xf7/0x130
[ 38.996961][ T69]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 38.997171][ T69]  ret_from_fork+0x22/0x30
[ 38.997705][ T69] Modules linked in:
[ 38.998741][ T69] CR2: 000000000000003c
[ 39.000104][ T69] ---[ end trace e719927b609d0fa0 ]---

Fixes: 5e1f6899 ("nvme-multipath: fix double initialization of ANA state")
Signed-off-by: Hou Pu <houpu.main@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 12 May 2021, 1 commit
-
-
Submitted by Christoph Hellwig
nvme_init_identify and thus nvme_mpath_init can be called multiple times and thus must not overwrite potentially initialized or in-use fields. Split out a helper for the basic initialization when the controller is initialized and make sure the init_identify path does not blindly change in-use data structures.

Fixes: 0d0b660f ("nvme: add ANA support")
Reported-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
-
- 04 May 2021, 6 commits
-
-
Submitted by Daniel Wagner

When a request finally completes in end_io() after it has failed over, the bdev pointer can be stale and thus the system can crash. Set the bdev back to the ns head, so the request is mapped to an active path when resubmitted.

Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Tao Chiu

reset_work() in nvme-pci may hang forever in the following scenario:
1) A reset caused by a command timeout occurs due to a controller being temporarily unresponsive.
2) nvme_reset_work() restarts the admin queue at nvme_alloc_admin_tags(). At the same time, a user-submitted admin command is queued and waiting for completion. Then, reset_work() changes its state to CONNECTING, and submits an identify command.
3) However, the controller still does not respond to any command, causing a timeout to be fired at the user-submitted command. Unfortunately, nvme_timeout() does not see the completion on the cq, and any timeout that takes place under the CONNECTING state causes a controller shutdown.
4) Normally, the identify command in reset_work() would be canceled with SC_HOST_ABORTED by nvme_dev_disable(), then reset_work could tear down the controller accordingly. But the controller happens to return online and respond to the identify command before nvme_dev_disable() has reaped it off.
5) reset_work() continues to setup_io_queues() as it observes no error in init_identify(). However, the admin queue has already been quiesced in dev_disable(). Thus, any following commands would be blocked forever in blk_execute_rq().

This can be fixed by restricting usercmd commands when the controller is not in a LIVE state in nvme_queue_rq(), as has been done previously in fabrics.

```
nvme_reset_work():                        |
    nvme_alloc_admin_tags()               |
                                          | nvme_submit_user_cmd():
    nvme_init_identify():                 |     ...
        __nvme_submit_sync_cmd():         |
            ...                           |     ...
 ----------------------------------------> nvme_timeout():
 (Controller starts responding commands)  |     nvme_dev_disable(, true):
    nvme_setup_io_queues():               |
        __nvme_submit_sync_cmd():         |
            (hung in blk_execute_rq       |
             since run_hw_queue sees      |
             queue quiesced)              |
```

Signed-off-by: Tao Chiu <taochiu@synology.com>
Signed-off-by: Cody Wong <codywong@synology.com>
Reviewed-by: Leon Chien <leonchien@synology.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Tao Chiu

queue_rq() in pci only checks if the dispatched queue (nvmeq) is ready, e.g. not being suspended. Since nvme_alloc_admin_tags() in the reset flow restarts the admin queue, users are able to submit admin commands to a controller before reset_work() completes. Commands submitted under this condition may interfere with commands that perform identify and IO queue setup in reset_work(), and may result in the hang described in the following patch.

As seen in the fabrics drivers, user commands are prevented from being executed under improper controller states. We may reuse this logic to maintain a clear admin queue during reset_work().

Signed-off-by: Tao Chiu <taochiu@synology.com>
Signed-off-by: Cody Wong <codywong@synology.com>
Reviewed-by: Leon Chien <leonchien@synology.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
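A rough sketch of the kind of gate this describes; the names and the exact condition are assumptions rather than the patch's actual helpers, but the idea is to bounce user-submitted passthrough commands in queue_rq() while the controller is not LIVE:

```c
/* Illustrative only: gate user passthrough commands while the controller
 * is resetting/connecting. Names are hypothetical, not the real helpers. */
enum ctrl_state { CTRL_LIVE, CTRL_RESETTING, CTRL_CONNECTING, CTRL_DELETING };

struct ctrl {
	enum ctrl_state state;
};

struct request_ctx {
	int is_user_cmd;	/* set for ioctl/passthrough submissions */
	int is_connect_cmd;	/* init/connect commands must still pass */
};

/* Return non-zero if the command may be dispatched right now. */
static int ctrl_ready_for_cmd(const struct ctrl *ctrl,
			      const struct request_ctx *rq)
{
	if (ctrl->state == CTRL_LIVE)
		return 1;
	/* During reset/connect, only internal init commands go through;
	 * user commands are failed or requeued so they cannot race with
	 * the identify/IO-queue setup done by reset_work(). */
	return rq->is_connect_cmd && !rq->is_user_cmd;
}
```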
-
Submitted by Kanchan Joshi

nvme_clear_nvme_request() clears the nvme_command, which is unnecessary for passthrough requests as the nvme_command is overwritten immediately. Move the clearing part from this helper to the caller, so that a double memset for passthrough requests is avoided.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Kanchan Joshi

Add a helper to avoid open-coding the ns->kref increment. The decrement is already done via the nvme_put_ns helper.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
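Such a helper is presumably the trivial counterpart of nvme_put_ns; a sketch of what the wrapper likely looks like (context: drivers/nvme/host/core.c; the exact body in the patch may differ):

```c
/* Sketch of a kref-get wrapper pairing with nvme_put_ns(); assumes the
 * ns->kref field used elsewhere in the nvme core. */
static void nvme_get_ns(struct nvme_ns *ns)
{
	kref_get(&ns->kref);
}
```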
-
Submitted by Minwoo Im

In the multipath case, we should consider namespace attachment with controllers in a subsystem when we find the live controller for the namespace. This patch manually reverts commit 3557a440 ("nvme: don't bother to look up a namespace for controller ioctls") with a few more updates to nvme_ns_head_chr_ioctl, which has been newly updated.

Fixes: 3557a440 ("nvme: don't bother to look up a namespace for controller ioctls")
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 22 April 2021, 5 commits
-
-
Submitted by Minwoo Im

Userspace has not been allowed to do I/O to a device that failed to be initialized. This patch introduces a generic per-namespace character device to allow userspace I/O regardless of whether the block device is there or not.

The chardev naming convention is similar to the existing blkdev naming, using an ng prefix instead of nvme, i.e.

- /dev/ngXnY

It also supports multipath, which means it will not expose chardevs for the hidden namespace blkdevs (e.g., nvmeXcYnZ). If /dev/ngXnY is created for a ns_head, then I/O requests will be routed to a specific controller selected by the iopolicy of the subsystem.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Tested-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Christoph Hellwig

Remove a level of indentation from the main code implementing the table search by using a goto for the APST-not-supported case. Also move the main comment above the function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
-
Submitted by Christoph Hellwig

Do not call nvme_configure_apst when the controller is not live, given that nvme_configure_apst will fail due to the lack of an admin queue when the controller is being torn down and nvme_set_latency_tolerance is called from dev_pm_qos_hide_latency_tolerance.

Fixes: 510a405d ("nvme: fix memory leak for power latency tolerance")
Reported-by: Peng Liu <liupeng17@lenovo.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
-
Submitted by Hannes Reinecke
Add a 'kato' controller sysfs attribute to display the current keep-alive timeout value (if any). This allows userspace to identify persistent discovery controllers, as these will have a non-zero KATO value.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
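A read-only controller attribute like this typically reduces to a one-line show function; a sketch assuming the value is kept in ctrl->kato as elsewhere in the nvme core (the patch's exact code may differ):

```c
/* Sketch of a read-only 'kato' device attribute; assumes ctrl->kato holds
 * the keep-alive timeout in seconds (0 for none). */
static ssize_t nvme_ctrl_kato_show(struct device *dev,
				   struct device_attribute *attr, char *buf)
{
	struct nvme_ctrl *ctrl = dev_get_drvdata(dev);

	return sysfs_emit(buf, "%u\n", ctrl->kato);
}
static DEVICE_ATTR(kato, 0444, nvme_ctrl_kato_show, NULL);
```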
-
Submitted by Hannes Reinecke
According to the NVMe base spec the KATO commands should be sent at half of the KATO interval, to properly account for round-trip times. As we now will only ever send one KATO command per connection we can easily use the recommended values. This also fixes a potential issue where the request timeout for the KATO command does not match the value in the connect command, which might be causing spurious connection drops from the target.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
- 15 April 2021, 17 commits
-
-
Submitted by Gopal Tiwari

Add an entry for dev_attr_fast_io_fail_tmo to avoid a kernel crash while reading and writing fast_io_fail_tmo.

Fixes: 09fbed63 ("nvme: export fast_io_fail_tmo to sysfs")
Signed-off-by: Gopal Tiwari <gtiwari@redhat.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
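For context, the usual shape of this kind of fix is making sure the attribute appears in the attrs[] array consumed by the device's attribute group; a generic sketch with illustrative names and a placeholder show function (the real show/store logic already exists in the nvme core):

```c
/* Generic sysfs wiring sketch; names and the placeholder body are
 * illustrative, not the actual nvme core code. */
static ssize_t fast_io_fail_tmo_show(struct device *dev,
				     struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "off\n");	/* placeholder body */
}
static DEVICE_ATTR_RO(fast_io_fail_tmo);

static struct attribute *nvme_dev_attrs_sketch[] = {
	&dev_attr_fast_io_fail_tmo.attr,	/* the entry being added */
	NULL,
};

static const struct attribute_group nvme_dev_attrs_group_sketch = {
	.attrs = nvme_dev_attrs_sketch,
};
```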
-
Submitted by Christoph Hellwig
Instead of failing to scan the namespace entirely when unsupported features are detected, just mark the gendisk hidden but allow other access like the upcoming per-namespace character device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
-
Submitted by Christoph Hellwig
These will be reused for the per-namespace character devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
Submitted by Christoph Hellwig
Move the multipath block_device_operations to multipath.c, where they belong.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
Submitted by Christoph Hellwig
Add a helper to avoid opencoding ns_head->ref manipulations.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
Submitted by Christoph Hellwig
Split out the ioctl code from core.c into a new file. Also update copyrights while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
Submitted by Christoph Hellwig

Don't bother to look up a namespace just to drop it after retrieving the controller for the multipath case. Just look up a live controller for the subsystem directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
-
Submitted by Christoph Hellwig

Only use the existing ioctl handler for the multipath case, and add a simpler one that reverts to the pre-multipath behavior for the non-shared use case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
-
Submitted by Christoph Hellwig

Don't bother defining a separate compat_ioctl handler, and just handle the NVME_IOCTL_SUBMIT_IO32 case inline. Also only define it for those ABIs (currently just i386 vs x86_64) that are affected.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
-
Submitted by Christoph Hellwig
Factor out a helper for the namespace based ioctls.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-
Submitted by Christoph Hellwig
Pass the proper user pointer instead of the not all that useful integer representation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
-
Submitted by Christoph Hellwig
Return false from nvme_set_disk_name and let the caller set the non-multipath name instead of duplicating the naming information in two places. Also remove the pointless local variables for the disk name and flags and the not needed ctrl argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
-
Submitted by Minwoo Im

Move the multipath gendisk out of #ifdef CONFIG_NVME_MULTIPATH and add a new nvme_ns_head_multipath helper that uses it to check if a ns_head has a multipath device associated with it.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
[hch: added the IS_ENABLED, converted a few existing users]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Javier González <javier.gonz@samsung.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
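Given the description (gendisk field moved out of the #ifdef, plus the bracketed IS_ENABLED note), the helper is plausibly a one-liner along these lines; treat the exact body as an assumption:

```c
/* Sketch of the helper described above; the IS_ENABLED() check lets the
 * compiler drop the multipath branch entirely when CONFIG_NVME_MULTIPATH
 * is off, while head->disk (now always present in the struct) tells us
 * whether this ns_head actually has a multipath gendisk. */
static inline bool nvme_ns_head_multipath(struct nvme_ns_head *head)
{
	return IS_ENABLED(CONFIG_NVME_MULTIPATH) && head->disk;
}
```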
-
Submitted by Niklas Cassel
There is a single trailing whitespace in core.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Niklas Cassel
There is a single trailing whitespace in multipath.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Niklas Cassel
There is a single trailing whitespace in pci.c. Since this is just a single whitespace, the chances of this affecting backports to stable should be quite low, so let's just remove it.

Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Niklas Cassel
According to the module parameter description for sgl_threshold, a value of 0 means that SGLs are disabled. If SGLs are disabled, we should respect that, even for the case where the request is made up of a single physical segment.

Fixes: 29791057 ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
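The fix presumably comes down to adding an sgl_threshold check to the single-segment SGL fast path in the PCIe mapping code; a small hedged sketch of that decision (kernel context assumed, helper and parameter names are illustrative, not the exact code):

```c
/* Sketch of the decision only; names are illustrative. qid != 0 excludes
 * the admin queue, and sgl_threshold == 0 means "SGLs disabled", so the
 * single-segment SGL optimization must not be used in that case. */
static bool use_simple_sgl(unsigned int qid, unsigned int sgl_threshold,
			   bool ctrl_supports_sgls)
{
	return qid && sgl_threshold && ctrl_supports_sgls;
}
```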
-
- 13 April 2021, 1 commit
-
-
Submitted by Chaitanya Kulkarni

This fixes the coccicheck warning:

drivers/nvme//host/lightnvm.c:1243:60-61: WARNING opportunity for kobj_to_dev()

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Matias Bjørling <matias.bjorling@wdc.com>
Link: https://lore.kernel.org/r/20210413105257.159260-2-matias.bjorling@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
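For context, kobj_to_dev() is the generic helper from <linux/device.h> that the warning points at; a sketch of the typical substitution (the surrounding lightnvm code is not reproduced here):

```c
#include <linux/device.h>

/* Hypothetical wrapper showing the substitution the coccicheck warning
 * asks for; kobj_to_dev() is container_of(kobj, struct device, kobj). */
static struct device *sketch_kobj_to_device(struct kobject *kobj)
{
	/* before: return container_of(kobj, struct device, kobj); */
	return kobj_to_dev(kobj);
}
```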
-
- 09 April 2021, 1 commit
-
-
Submitted by Sami Tolvanen
list_sort() internally casts the comparison function passed to it to a different type with constant struct list_head pointers, and uses this pointer to call the functions, which trips indirect call Control-Flow Integrity (CFI) checking. Instead of removing the consts, this change defines the list_cmp_func_t type and changes the comparison function types of all list_sort() callers to use const pointers, thus avoiding type mismatches.

Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210408182843.1754385-10-samitolvanen@google.com
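In practice a list_sort() caller's comparison function now takes const struct list_head pointers so its type matches the indirect call; a sketch of a conforming comparator with an illustrative element type (the typedef shown approximates the one added in <linux/list_sort.h>):

```c
#include <linux/list.h>

/* Approximation of the new typedef; see <linux/list_sort.h> for the
 * exact definition added by this series. */
typedef int (*list_cmp_func_t)(void *priv, const struct list_head *a,
			       const struct list_head *b);

/* Illustrative element type; not taken from the patch. */
struct item {
	struct list_head entry;
	int key;
};

/* Comparator matching the const-qualified type, so the indirect call
 * made inside list_sort() no longer trips CFI type checking. */
static int item_cmp(void *priv, const struct list_head *a,
		    const struct list_head *b)
{
	const struct item *ia = container_of(a, struct item, entry);
	const struct item *ib = container_of(b, struct item, entry);

	return ia->key - ib->key;
}
```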
-
- 06 April 2021, 1 commit
-
-
Submitted by Christoph Hellwig

Instead of overloading the passthrough fast path with the deprecated block layer bounce buffering, let the users that combine an old under-maintained driver with a highmem system pay the price by always falling back to copies in that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20210331073001.46776-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-