- 17 2月, 2020 7 次提交
-
-
由 Robert Richter 提交于
There is a limitation to report only EDAC_MAX_LABELS in e->label of the error descriptor. This is to prevent a potential string overflow. The current implementation falls back to "any memory" in this case and also stops all further processing to find a unique row and channel of the possible error location. Reporting "any memory" is wrong as the memory controller reported an error location for one of the layers. Instead, report "unknown memory" and also do not break early in the loop to further check row and channel for uniqueness. [ bp: Massage commit message. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-7-rrichter@marvell.com
-
由 Robert Richter 提交于
Carve out the error_count increment into a separate function edac_inc_csrow(). This better separates code and reduces the indentation level. Implementation note: The function edac_inc_csrow() counts the same as before, ->ce_count is only incremented if row >= 0. This is esp. true for the case of (!e->enable_per_layer_report). Here, a DIMM was not found, variable row still has a value of -1 and ->ce_count is not incremented. [ bp: Massage commit message. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200214141757.8976-1-rrichter@marvell.com
-
由 Robert Richter 提交于
Each struct mci has its own error descriptor. Create a function error_desc_to_mci() to determine the corresponding mci from an error descriptor. This removes @mci from the parameter list of edac_raw_mc_handle_error() as the mci pointer does not need to be passed any longer. [ bp: Massage commit message. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-5-rrichter@marvell.com
-
由 Robert Richter 提交于
Store the error type in struct edac_raw_error_desc. This makes the type parameter of edac_raw_mc_handle_error() obsolete. [ kernel-doc typo ] Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-4-rrichter@marvell.com
-
由 Robert Richter 提交于
Reorder the new created functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms() and move them before edac_mc_alloc(). No further code changes. Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-3-rrichter@marvell.com
-
由 Robert Richter 提交于
edac_mc_alloc() is huge. Factor out code by moving it to the two new functions edac_mc_alloc_csrows() and edac_mc_alloc_dimms(). Do not move code yet for better review. [ bp: sort local args in reversed fir tree order. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200123090210.26933-2-rrichter@marvell.com
-
由 Robert Richter 提交于
There are dimm and csrow devices linked to the mci device esp. to show up in sysfs. It must be granted that children devices are removed before its mci parent. Thus, the release functions must be called in the correct order and may not miss any child before releasing its parent. In the current implementation this is only granted by the correct order of release functions. A much better approach is to use put_device() that releases the device only after all users are gone. It is the recommended way to release a device and free its memory. The function uses the device's refcount and only frees it if there are no users of it anymore such as children. So implement a mci_release() function to remove mci devices, use put_device() to free them and early initialize the mci device right after its struct has been allocated. Change the release function so that it can be universally used no matter if the device is registered or not. Since subsequent dimm and csrow sysfs links are implemented as children devices, their refcounts will keep the parent mci device from being removed as long as sysfs entries exist and until all users have been unregistered in edac_remove_sysfs_mci_device(). Remove edac_unregister_sysfs() and merge mci sysfs removal into edac_remove_sysfs_mci_device(). There is only a single instance now that removes the sysfs entries. The function can now be used in the error paths for cleanup. Also, create device release functions for all involved devices (dev->release), remove device_type release functions (dev_type-> release) and also use dev->init_name instead of dev_set_name(). [ bp: Massage commit message and comments. ] Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NAristeu Rozanski <aris@redhat.com> Link: https://lkml.kernel.org/r/20200212120340.4764-5-rrichter@marvell.com
-
- 16 2月, 2020 1 次提交
-
-
由 Marek Behún 提交于
The input_read function declares the size of the hex array relative to sizeof(buf), but buf is a pointer argument of the function. The hex array is meant to contain hexadecimal representation of the bin array. Link: https://lore.kernel.org/r/20200215142130.22743-1-marek.behun@nic.cz Fixes: 5bc7f990 ("bus: Add support for Moxtet bus") Signed-off-by: NMarek Behún <marek.behun@nic.cz> Reported-by: Nsohu0106 <sohu0106@126.com> Signed-off-by: NOlof Johansson <olof@lixom.net>
-
- 15 2月, 2020 8 次提交
-
-
由 Gustavo A. R. Silva 提交于
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200214172132.GA28389@embeddedorSigned-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Gustavo A. R. Silva 提交于
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200214172022.GA27490@embeddedorSigned-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Gustavo A. R. Silva 提交于
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200214171907.GA26588@embeddedorSigned-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Jason Gunthorpe 提交于
On i386: ERROR: "__udivdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! ERROR: "__divdi3" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined! Fixes: f164be8c ("IB/mlx5: Extend caps stage to handle VAR capabilities") Reported-by: NRandy Dunlap <rdunlap@infradead.org> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Reported-by: NAlexander Lobakin <alobakin@dlink.ru> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Yi Zhang 提交于
nvme fw-activate operation will get bellow warning log, fix it by update the parameter order [ 113.231513] nvme nvme0: Get FW SLOT INFO log error Fixes: 0e98719b ("nvme: simplify the API for getting log pages") Reported-by: NSujith Pandel <sujith_pandel@dell.com> Reviewed-by: NDavid Milburn <dmilburn@redhat.com> Signed-off-by: NYi Zhang <yi.zhang@redhat.com> Signed-off-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Keith Busch 提交于
Many users have reported nvme triggered irq_startup() warnings during shutdown. The driver uses the nvme queue's irq to synchronize scanning for completions, and enabling an interrupt affined to only offline CPUs triggers the alarming warning. Move the final CQE check to after disabling the device and all registered interrupts have been torn down so that we do not have any IRQ to synchronize. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206509Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Nigel Kirkland 提交于
Delayed keep alive work is queued on system workqueue and may be cancelled via nvme_stop_keep_alive from nvme_reset_wq, nvme_fc_wq or nvme_wq. Check_flush_dependency detects mismatched attributes between the work-queue context used to cancel the keep alive work and system-wq. Specifically system-wq does not have the WQ_MEM_RECLAIM flag, whereas the contexts used to cancel keep alive work have WQ_MEM_RECLAIM flag. Example warning: workqueue: WQ_MEM_RECLAIM nvme-reset-wq:nvme_fc_reset_ctrl_work [nvme_fc] is flushing !WQ_MEM_RECLAIM events:nvme_keep_alive_work [nvme_core] To avoid the flags mismatch, delayed keep alive work is queued on nvme_wq. However this creates a secondary concern where work and a request to cancel that work may be in the same work queue - namely err_work in the rdma and tcp transports, which will want to flush/cancel the keep alive work which will now be on nvme_wq. After reviewing the transports, it looks like err_work can be moved to nvme_reset_wq. In fact that aligns them better with transition into RESETTING and performing related reset work in nvme_reset_wq. Change nvme-rdma and nvme-tcp to perform err_work in nvme_reset_wq. Signed-off-by: NNigel Kirkland <nigel.kirkland@broadcom.com> Signed-off-by: NJames Smart <jsmart2021@gmail.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Anton Eidelman 提交于
When nvme_tcp_io_work() fails to send to socket due to connection close/reset, error_recovery work is triggered from nvme_tcp_state_change() socket callback. This cancels all the active requests in the tagset, which requeues them. The failed request, however, was ended and thus requeued individually as well unless send returned -EPIPE. Another return code to be treated the same way is -ECONNRESET. Double requeue caused BUG_ON(blk_queued_rq(rq)) in blk_mq_requeue_request() from either the individual requeue of the failed request or the bulk requeue from blk_mq_tagset_busy_iter(, nvme_cancel_request, ); Signed-off-by: NAnton Eidelman <anton@lightbitslabs.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NKeith Busch <kbusch@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 14 2月, 2020 12 次提交
-
-
由 Guangbin Huang 提交于
The IPv6 address defined in struct in6_addr is specified as big endian, but there is no specified endian in struct hclge_fd_rule_tuples, so it will cause a problem if directly use memcpy() to copy ipv6 address between these two structures since this field in struct hclge_fd_rule_tuples is little endian. This patch fixes this problem by using be32_to_cpu() to convert endian of IPv6 address of struct in6_addr before copying. Fixes: d93ed94f ("net: hns3: add aRFS support for PF") Signed-off-by: NGuangbin Huang <huangguangbin2@huawei.com> Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yonglong Liu 提交于
When enabling 4 TC after setting the bandwidth of VF, the bandwidth of VF will resume to default value, because of the qset resources changed in this case. This patch fixes it by using a fixed VF's qset resources according to HNAE3_MAX_TC macro. Fixes: ee9e4424 ("net: hns3: add support for configuring bandwidth of VF on the host") Signed-off-by: NYonglong Liu <liuyonglong@huawei.com> Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yufeng Mo 提交于
In the current process, the management table is missing after the IMP reset. This patch adds the management table to the reset process. Fixes: f5aac71c ("net: hns3: add manager table initialization for hardware") Signed-off-by: NYufeng Mo <moyufeng@huawei.com> Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Benjamin Tissoires 提交于
The Yoga 11e is using LEN0049, but it doesn't have a trackstick. Thus, there is no need to create a software top buttons row. However, it seems that the device works under SMBus, so keep it as part of the smbus_pnp_ids. Signed-off-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200115013023.9710-1-benjamin.tissoires@redhat.comSigned-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Gaurav Agrawal 提交于
Add touchpad LEN2044 to the list, as it is capable of working with psmouse.synaptics_intertouch=1 Signed-off-by: NGaurav Agrawal <agrawalgaurav@gnome.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/CADdtggVzVJq5gGNmFhKSz2MBwjTpdN5YVOdr4D3Hkkv=KZRc9g@mail.gmail.comSigned-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Lyude Paul 提交于
This supports RMI4 and everything seems to work, including the touchpad buttons. So, let's enable this by default. Signed-off-by: NLyude Paul <lyude@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200204194322.112638-1-lyude@redhat.comSigned-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Gustavo A. R. Silva 提交于
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200213002600.GA31916@embeddedor.comSigned-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Gustavo A. R. Silva 提交于
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Link: https://lore.kernel.org/r/20200213002430.GA31056@embeddedor.comSigned-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Jason A. Donenfeld 提交于
Because wireguard is calling icmp from network device context, it should use the ndo helper so that the rate limiting applies correctly. This commit adds a small test to the wireguard test suite to ensure that the new functions continue doing the right thing in the context of wireguard. It does this by setting up a condition that will definately evoke an icmp error message from the driver, but along a nat'd path. Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason A. Donenfeld 提交于
Because sunvnet is calling icmp from network device context, it should use the ndo helper so that the rate limiting applies correctly. While we're at it, doing the additional route lookup before calling icmp_ndo_send is superfluous, since this is the job of the icmp code in the first place. Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com> Cc: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jason A. Donenfeld 提交于
Because gtp is calling icmp from network device context, it should use the ndo helper so that the rate limiting applies correctly. Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com> Cc: Harald Welte <laforge@gnumonks.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Leon Romanovsky 提交于
We don't need to set pkey as valid in case that user set only one of pkey index or port number, otherwise it will be resulted in NULL pointer dereference while accessing to uninitialized pkey list. The following crash from Syzkaller revealed it. kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0 Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48 8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8 RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010 RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430 R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000 FS: 00007f20777de700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: port_pkey_list_insert+0xd7/0x7c0 ib_security_modify_qp+0x6fa/0xfc0 _ib_modify_qp+0x8c4/0xbf0 modify_qp+0x10da/0x16d0 ib_uverbs_modify_qp+0x9a/0x100 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: d291f1a6 ("IB/core: Enforce PKey security on QPs") Link: https://lore.kernel.org/r/20200212080651.GB679970@unrealSigned-off-by: NMaor Gottlieb <maorg@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Message-Id: <20200212080651.GB679970@unreal>
-
- 13 2月, 2020 12 次提交
-
-
由 Coly Li 提交于
Macro nr_to_fifo_front() is only used once in btree_flush_write(), it is unncessary indeed. This patch removes this macro and does calculation directly in place. Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Coly Li 提交于
This reverts commit 1df3877f. In my testing, sometimes even all the cached btree nodes are freed, creating gc and allocator kernel threads may still fail. Finally it turns out that kthread_run() may fail if there is pending signal for current task. And the pending signal is sent from OOM killer which is triggered by memory consuption in bch_btree_check(). Therefore explicitly shrinking bcache btree node here does not help, and after the shrinker callback is improved, as well as pending signals are ignored before creating kernel threads, now such operation is unncessary anymore. This patch reverts the commit 1df3877f ("bcache: shrink btree node cache after bch_btree_check()") because we have better improvement now. Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Coly Li 提交于
When run a cache set, all the bcache btree node of this cache set will be checked by bch_btree_check(). If the bcache btree is very large, iterating all the btree nodes will occupy too much system memory and the bcache registering process might be selected and killed by system OOM killer. kthread_run() will fail if current process has pending signal, therefore the kthread creating in run_cache_set() for gc and allocator kernel threads are very probably failed for a very large bcache btree. Indeed such OOM is safe and the registering process will exit after the registration done. Therefore this patch flushes pending signals during the cache set start up, specificly in bch_cache_allocator_start() and bch_gc_thread_start(), to make sure run_cache_set() won't fail for large cahced data set. Signed-off-by: NColy Li <colyli@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Zhu Yanjun 提交于
When run stress tests with RXE, the following Call Traces often occur watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0] ... Call Trace: <IRQ> create_object+0x3f/0x3b0 kmem_cache_alloc_node_trace+0x129/0x2d0 __kmalloc_reserve.isra.52+0x2e/0x80 __alloc_skb+0x83/0x270 rxe_init_packet+0x99/0x150 [rdma_rxe] rxe_requester+0x34e/0x11a0 [rdma_rxe] rxe_do_task+0x85/0xf0 [rdma_rxe] tasklet_action_common.isra.21+0xeb/0x100 __do_softirq+0xd0/0x298 irq_exit+0xc5/0xd0 smp_apic_timer_interrupt+0x68/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> ... The root cause is that tasklet is actually a softirq. In a tasklet handler, another softirq handler is triggered. Usually these softirq handlers run on the same cpu core. So this will cause "soft lockup Bug". Fixes: 8700e3e7 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.orgSigned-off-by: NZhu Yanjun <yanjunz@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Leon Romanovsky 提交于
The cmd and index variables declared as u16 and the result is supposed to be stored in u64. The C arithmetic rules doesn't promote "(index >> 8) << 16" to be u64 and leaves the end result to be u16. Fixes: 7be76bef ("IB/mlx5: Introduce VAR object and its alloc/destroy methods") Link: https://lore.kernel.org/r/20200212072635.682689-10-leon@kernel.orgSigned-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Yonatan Cohen 提交于
When disassociating a device from umad we must ensure that the sysfs access is prevented before blocking the fops, otherwise assumptions in syfs don't hold: CPU0 CPU1 ib_umad_kill_port() ibdev_show() port->ib_dev = NULL dev_name(port->ib_dev) The prior patch made an error in moving the device_destroy(), it should have been split into device_del() (above) and put_device() (below). At this point we already have the split, so move the device_del() back to its original place. kernel stack PF: error_code(0x0000) - not-present page Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI RIP: 0010:ibdev_show+0x18/0x50 [ib_umad] RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000 RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870 RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000 R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40 R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58 FS: 00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0 Call Trace: dev_attr_show+0x15/0x50 sysfs_kf_seq_show+0xb8/0x1a0 seq_read+0x12d/0x350 vfs_read+0x89/0x140 ksys_read+0x55/0xd0 do_syscall_64+0x55/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9: Fixes: cf7ad303 ("IB/umad: Avoid destroying device while it is accessed") Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.orgSigned-off-by: NYonatan Cohen <yonatanc@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Yishai Hadas 提交于
As in the prior patch, the devx code is not fully cleaning up its event_lists before finishing driver_destroy allowing a later read to trigger user after free conditions. Re-arrange things so that the event_list is always empty after destroy and ensure it remains empty until the file is closed. Fixes: f7c8416c ("RDMA/core: Simplify destruction of FD uobjects") Link: https://lore.kernel.org/r/20200212072635.682689-7-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Michael Guralnik 提交于
When the uobject file scheme was revised to allow device disassociation from the file it became possible for read() to still happen the driver destroys the uobject. The old clode code was not tolerant to concurrent read, and when it was moved to the driver destroy it creates a bug. Ensure the event_list is empty after driver destroy by adding the missing list_del(). Otherwise read() can trigger a use after free and double kfree. Fixes: f7c8416c ("RDMA/core: Simplify destruction of FD uobjects") Link: https://lore.kernel.org/r/20200212072635.682689-6-leon@kernel.orgSigned-off-by: NMichael Guralnik <michaelgur@mellanox.com> Reviewed-by: NYishai Hadas <yishaih@mellanox.com> Signed-off-by: NLeon Romanovsky <leonro@mellanox.com> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
-
由 Robert Richter 提交于
All created csrow objects must be removed in the error path of edac_create_csrow_objects(). The objects have been added as devices. They need to be removed by doing a device_del() *and* put_device() call to also free their memory. The missing put_device() leaves a memory leak. Use device_unregister() instead of device_del() which properly unregisters the device doing both. Fixes: 7adc05d2 ("EDAC/sysfs: Drop device references properly") Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NJohn Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com
-
由 Robert Richter 提交于
A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and DEBUG_KMEMLEAK set, revealed several issues when removing an mci device: 1) Use-after-free: On 27.11.19 17:07:33, John Garry wrote: > [ 22.104498] BUG: KASAN: use-after-free in > edac_remove_sysfs_mci_device+0x148/0x180 The use-after-free is caused by the mci_for_each_dimm() macro called in edac_remove_sysfs_mci_device(). The iterator was introduced with c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator"). The iterator loop calls device_unregister(&dimm->dev), which removes the sysfs entry of the device, but also frees the dimm struct in dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(), the dimm struct is accessed again, after having been freed already. The fix is to free all the mci device's subsequent dimm and csrow objects at a later point, in _edac_mc_free(), when the mci device itself is being freed. This keeps the data structures intact and the mci device can be fully used until its removal. The change allows the safe usage of mci_for_each_dimm() to release dimm devices from sysfs. 2) Memory leaks: Following memory leaks have been detected: # grep edac /sys/kernel/debug/kmemleak | sort | uniq -c 1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0 # mci->csrows 16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0 # csr->channels 16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0 # csr->channels[chn] 1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0 # mci->dimms 34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8 # see edac_mc_alloc() All leaks are from memory allocated by edac_mc_alloc(). Note: The test above shows that edac_mc_alloc() was called here from ghes_edac_register(), thus both functions show up in the stack trace but the module causing the leaks is edac_mc. The comments with the data structures involved were made manually by analyzing the objdump. The data structures listed above and created by edac_mc_alloc() are not properly removed during device removal, which is done in edac_mc_free(). There are two paths implemented to remove the device depending on device registration, _edac_mc_free() is called if the device is not registered and edac_unregister_sysfs() otherwise. The implemenations differ. For the sysfs case, the mci device removal lacks the removal of subsequent data structures (csrows, channels, dimms). This causes the memory leaks (see mci_attr_release()). [ bp: Massage commit message. ] Fixes: c498afaf ("EDAC: Introduce an mci_for_each_dimm() iterator") Fixes: faa2ad09 ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.") Fixes: 7a623c03 ("edac: rewrite the sysfs code to use struct device") Reported-by: NJohn Garry <john.garry@huawei.com> Signed-off-by: NRobert Richter <rrichter@marvell.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Tested-by: NJohn Garry <john.garry@huawei.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com
-
由 Johan Hovold 提交于
Make sure that the driver compatible strings matches the binding by removing the space between the manufacturer and model. Fixes: aaafb7c8 ("hwmon: (pmbus) Add support for Infineon Multi-phase xdpe122 family controllers") Cc: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: NJohan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20200212092426.24012-1-johan@kernel.orgSigned-off-by: NGuenter Roeck <linux@roeck-us.net>
-
由 Tony Nguyen 提交于
This is a collection of trivial fixes including fixing whitespace, typos, function headers, reverse Christmas tree, etc. Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-