- 13 Feb, 2015 5 commits
-
-
Committed by Minchan Kim
bd_holders is increased only when a user opens the device file with FMODE_EXCL, so if something opens zram0 without FMODE_EXCL and requests I/O while another user resets zram0, we can see the following warning.

  zram0: detected capacity change from 0 to 64424509440
  Buffer I/O error on dev zram0, logical block 180823, lost async page write
  Buffer I/O error on dev zram0, logical block 180824, lost async page write
  Buffer I/O error on dev zram0, logical block 180825, lost async page write
  Buffer I/O error on dev zram0, logical block 180826, lost async page write
  Buffer I/O error on dev zram0, logical block 180827, lost async page write
  Buffer I/O error on dev zram0, logical block 180828, lost async page write
  Buffer I/O error on dev zram0, logical block 180829, lost async page write
  Buffer I/O error on dev zram0, logical block 180830, lost async page write
  Buffer I/O error on dev zram0, logical block 180831, lost async page write
  Buffer I/O error on dev zram0, logical block 180832, lost async page write
  ------------[ cut here ]------------
  WARNING: CPU: 11 PID: 1996 at fs/block_dev.c:57 __blkdev_put+0x1d7/0x210()
  Modules linked in:
  CPU: 11 PID: 1996 Comm: dd Not tainted 3.19.0-rc6-next-20150202+ #1125
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
  Call Trace:
    dump_stack+0x45/0x57
    warn_slowpath_common+0x8a/0xc0
    warn_slowpath_null+0x1a/0x20
    __blkdev_put+0x1d7/0x210
    blkdev_put+0x50/0x130
    blkdev_close+0x25/0x30
    __fput+0xdf/0x1e0
    ____fput+0xe/0x10
    task_work_run+0xa7/0xe0
    do_notify_resume+0x49/0x60
    int_signal+0x12/0x17
  ---[ end trace 274fbbc5664827d2 ]---

The warning comes from bdev_write_inode in the blkdev_put path.

  static void bdev_write_inode(struct inode *inode)
  {
          spin_lock(&inode->i_lock);
          while (inode->i_state & I_DIRTY) {
                  spin_unlock(&inode->i_lock);
                  WARN_ON_ONCE(write_inode_now(inode, true)); <========= here.
                  spin_lock(&inode->i_lock);
          }
          spin_unlock(&inode->i_lock);
  }

The reason is that the dd process encounters I/O failures because the block device suddenly disappears, so filemap_check_errors in __writeback_single_inode returns -EIO.

If we check bd_openers instead of bd_holders, we can address the problem. brd already uses bd_openers rather than bd_holders, so although I'm not an expert on the block layer, it seems to be the better choice.

I can trigger the warning with the simple script below. In addition, I added msleep(2000) below set_capacity(zram->disk, 0) after applying your patch to make the window huge (kudos to Ganesh!).

script:

  echo $((60<<30)) > /sys/block/zram0/disksize
  setsid dd if=/dev/zero of=/dev/zram0 &
  sleep 1
  setsid echo 1 > /sys/block/zram0/reset

Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Ganesh Mahendran <opensource.ganesh@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
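The difference between the two counters can be sketched in user-space C. This is only a toy model under stated assumptions, not the zram driver: `struct fake_bdev`, `fake_open()` and `fake_reset()` are hypothetical names, and only the roles of the two counters follow the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model of a block device's open accounting. */
struct fake_bdev {
	int openers;	/* counts every open, exclusive or not */
	int holders;	/* counts only exclusive (FMODE_EXCL-style) opens */
};

static void fake_open(struct fake_bdev *b, bool exclusive)
{
	b->openers++;
	if (exclusive)
		b->holders++;
}

/* Refuse to reset while anyone, exclusive or not, has the device open. */
static int fake_reset(const struct fake_bdev *b)
{
	assert(b->holders <= b->openers);
	if (b->openers > 0)
		return -EBUSY;
	return 0;
}
```

With a check on `holders` alone, a plain non-exclusive open would leave the counter at zero and the reset would proceed; checking `openers` rejects the reset in that case.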
-
Committed by Sergey Senozhatsky
We need to return set_capacity(disk, 0) from reset_store() back to zram_reset_device(), a catch by Ganesh Mahendran. Potentially, we can race set_capacity() calls from init and reset paths.

The problem is that zram_reset_device() is also getting called from zram_exit(), which performs operations in a misleading reversed order -- we first create_device() and then init it, while zram_exit() performs destroy_device() first and then does zram_reset_device(). This is done to remove the sysfs group before we reset the device, so we can continue with device reset/destruction without being raced by a sysfs attr write (e.g. disksize).

Apart from that, destroy_device() releases zram->disk (but we still have the ->disk pointer), so we cannot access zram->disk in a later zram_reset_device() call, which may cause additional errors in the future.

So, this patch reworks and cleans up the destroy path.

1) remove several unneeded goto labels in zram_init()
2) factor out the zram_init() error path and zram_exit() into a destroy_devices() function, which takes the number of devices to destroy as its argument.
3) remove the sysfs group in destroy_devices() first, so we can reorder operations -- reset device (as expected) goes before disk destroy and queue cleanup. So we can always access ->disk in zram_reset_device().
4) and, finally, return set_capacity() back under ->init_lock.

[akpm@linux-foundation.org: tweak comment]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
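The shape of a shared teardown helper that takes the number of devices to destroy can be sketched in user-space C. Everything here is an illustrative assumption rather than the driver's code: `struct fake_dev`, `create_device()`, `init_devices()` and the `live_disks` counter are hypothetical, and only the idea of one `destroy_devices(nr)` used by both the init error path and the exit path comes from the message above.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_DEVICES 4

/* Hypothetical device: "disk" stands in for the per-device resource. */
struct fake_dev { int *disk; };

static struct fake_dev devices[NUM_DEVICES];
static int live_disks;

static int create_device(struct fake_dev *d, int fail)
{
	if (fail)
		return -1;
	d->disk = malloc(sizeof(*d->disk));
	if (!d->disk)
		return -1;
	live_disks++;
	return 0;
}

/* Tear down the first nr devices; shared by the init error path and exit. */
static void destroy_devices(int nr)
{
	assert(nr >= 0 && nr <= NUM_DEVICES);
	for (int i = 0; i < nr; i++) {
		*devices[i].disk = 0;		/* reset while ->disk is still valid */
		free(devices[i].disk);		/* only then release the disk */
		devices[i].disk = NULL;
		live_disks--;
	}
}

/* fail_at selects which creation fails; pass -1 for no failure. */
static int init_devices(int fail_at)
{
	for (int i = 0; i < NUM_DEVICES; i++) {
		if (create_device(&devices[i], i == fail_at)) {
			destroy_devices(i);	/* undo only what was created */
			return -1;
		}
	}
	return 0;
}
```

Resetting before releasing mirrors the ordering argument in point 3: the resource is still reachable when the reset step touches it.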
-
Committed by Sergey Senozhatsky
Ganesh Mahendran was the first one who proposed to use bdev->bd_mutex to avoid the ->bd_holders race condition:

        CPU0                            CPU1
  umount /* zram->init_done is true */
  reset_store()
  bdev->bd_holders == 0                 mount
  ...                                   zram_make_request()
  zram_reset_device()

However, his solution required a considerable amount of code movement, which we can avoid.

Apart from using bdev->bd_mutex in reset_store(), this patch also simplifies zram_reset_device().

zram_reset_device() has a bool parameter reset_capacity which tells it whether the disk capacity and the disk itself should be reset. There are two zram_reset_device() callers:

-- zram_exit() passes reset_capacity=false
-- reset_store() passes reset_capacity=true

So we can move the reset_capacity-sensitive work out of zram_reset_device() and perform it unconditionally in reset_store(). This also lets us drop the reset_capacity parameter from zram_reset_device() and pass the zram pointer only.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Ganesh Mahendran
zram_meta_alloc() and zram_meta_free() are a pair. The meta table is allocated in zram_meta_alloc(), so it is better to free it in zram_meta_free().

Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Sergey Senozhatsky
A trivial cleanup of zram_meta_alloc() error handling.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 12 Feb, 2015 5 commits
-
-
Committed by Andrea Arcangeli
This allows those get_user_pages calls to pass FAULT_FLAG_ALLOW_RETRY to the page fault in order to release the mmap_sem during the I/O.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Michal Hocko
Commit 5695be14 ("OOM, PM: OOM killed task shouldn't escape PM suspend") has left a race window when the OOM killer manages to note_oom_kill after freeze_processes checks the counter. The race window is quite small and really unlikely, and a partial solution was deemed sufficient at the time of submission.

Tejun wasn't happy about this partial solution though and insisted on a full solution. That requires full exclusion between the OOM killer and the freezer's task freezing, though. This is done by this patch, which introduces the oom_sem RW lock and turns oom_killer_disable() into a full OOM barrier.

The oom_killer_disabled check is moved from the allocation path to the OOM level and we take oom_sem for reading for both the check and the whole OOM invocation.

oom_killer_disable() takes oom_sem for writing so it waits for all currently running OOM killer invocations. Then it disables all further OOMs by setting oom_killer_disabled and checks for any oom victims. Victims are counted via mark_tsk_oom_victim resp. unmark_oom_victim. The last victim wakes up all waiters enqueued by oom_killer_disable(). Therefore this function acts as the full OOM barrier.

The page fault path is covered now as well, although it was assumed to be safe before. As per Tejun, "We used to have freezing points deep in file system code which may be reachable from page fault." so it would be better and more robust to not rely on freezing points here. The same applies to the memcg OOM killer.

out_of_memory tells the caller whether the OOM was allowed to trigger and the callers are supposed to handle the situation. The page allocation path simply fails the allocation, same as before. The page fault path will retry the fault (more on that later) and the Sysrq OOM trigger will simply complain to the log.

Normally there wouldn't be any unfrozen user tasks after try_to_freeze_tasks so the function will not block. But if there was an OOM killer racing with try_to_freeze_tasks and the OOM victim didn't finish yet then we have to wait for it. This should complete in a finite time, though, because

- the victim cannot loop in the page fault handler (it would die on the way out from the exception)
- it cannot loop in the page allocator because all further allocations would fail and __GFP_NOFAIL allocations are not acceptable at this stage
- it shouldn't be blocked on any locks held by frozen tasks (try_to_freeze expects lockless context) and kernel threads and work queues are not frozen yet

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Michal Hocko
While touching this area let's convert printk to pr_*. This also makes continuation lines print properly.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Michal Hocko
This patchset addresses a race which was described in the changelog for 5695be14 ("OOM, PM: OOM killed task shouldn't escape PM suspend"):

: PM freezer relies on having all tasks frozen by the time devices are
: getting frozen so that no task will touch them while they are getting
: frozen. But OOM killer is allowed to kill an already frozen task in order
: to handle OOM situation. In order to protect from late wake ups OOM
: killer is disabled after all tasks are frozen. This, however, still keeps
: a window open when a killed task didn't manage to die by the time
: freeze_processes finishes.

The original patch hasn't closed the race window completely because that would require a more complex solution, as can be seen from this patchset.

The primary motivation was to close the race condition between the OOM killer and the PM freezer _completely_. As Tejun pointed out, even though the race condition is unlikely, that only makes it harder to debug weird bugs deep in the PM freezer, where the debugging options are reduced considerably. I can only speculate what might happen when a task is still runnable unexpectedly.

On the plus side, and as a side effect, the oom enable/disable has better (full barrier) semantics without polluting hot paths.

I have tested the series in KVM with 100M RAM:

- many small tasks (20M anon mmap) which are triggering OOM continually
- s2ram which resumes automatically is triggered in a loop

    echo processors > /sys/power/pm_test
    while true
    do
            echo mem > /sys/power/state
            sleep 1s
    done

- simple module which allocates and frees 20M in 8K chunks. If it sees freezing(current) then it tries another round of allocation before calling try_to_freeze
- debugging messages of PM stages and OOM killer enable/disable/fail added and unmark_oom_victim is delayed by 1s after it clears TIF_MEMDIE and before it wakes up waiters.
- rebased on top of the current mmotm which means some necessary updates in mm/oom_kill.c. mark_tsk_oom_victim is now called under task_lock but I think this should be OK because __thaw_task shouldn't interfere with any locking down wake_up_process. Oleg?

As expected there are no OOM killed tasks after oom is disabled and allocations requested by the kernel thread are failing after all the tasks are frozen and OOM disabled. I wasn't able to catch a race where oom_killer_disable would really have to wait but I kinda expected the race is really unlikely.

  [ 242.609330] Killed process 2992 (mem_eater) total-vm:24412kB, anon-rss:2164kB, file-rss:4kB
  [ 243.628071] Unmarking 2992 OOM victim. oom_victims: 1
  [ 243.636072] (elapsed 2.837 seconds) done.
  [ 243.641985] Trying to disable OOM killer
  [ 243.643032] Waiting for concurent OOM victims
  [ 243.644342] OOM killer disabled
  [ 243.645447] Freezing remaining freezable tasks ... (elapsed 0.005 seconds) done.
  [ 243.652983] Suspending console(s) (use no_console_suspend to debug)
  [ 243.903299] kmem_eater: page allocation failure: order:1, mode:0x204010
  [...]
  [ 243.992600] PM: suspend of devices complete after 336.667 msecs
  [ 243.993264] PM: late suspend of devices complete after 0.660 msecs
  [ 243.994713] PM: noirq suspend of devices complete after 1.446 msecs
  [ 243.994717] ACPI: Preparing to enter system sleep state S3
  [ 243.994795] PM: Saving platform NVS memory
  [ 243.994796] Disabling non-boot CPUs ...

The first 2 patches are simple cleanups for OOM. They should go in regardless of the rest IMO. Patches 3 and 4 are trivial printk -> pr_info conversions and they should go in as well. The main patch is the last one and I would appreciate acks from Tejun and Rafael. I think the OOM part should be OK (except for __thaw_task vs. task_lock where a look from Oleg would be appreciated) but I am not so sure I haven't screwed anything up in the freezer code. I have found several surprises there.

This patch (of 5):

This patch is just preparatory and doesn't introduce any functional change.

Note: I am utterly unhappy about the lowmemory killer abusing TIF_MEMDIE just to wait for the oom victim and to prevent new killing. This is just a side effect of the flag. The primary meaning is to give the oom victim access to the memory reserves and that shouldn't be necessary here.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Linus Torvalds
This one was driving me mad, with several lines of warnings during the allmodconfig build for a single bogus pointer cast. The warning was so verbose due to the indirect macro expansion explanation, and the whole thing was just for a debug printout.

The bogus pointer-to-integer cast was pointless anyway, so just remove it, and use '%p' to show the pointer.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
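As a user-space illustration of the same idea, a pointer can be printed with `%p` and no integer cast at all. This is a minimal sketch using standard C `snprintf()`, not the kernel's printk; `format_ptr()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Format a pointer for a debug message without casting it to an integer. */
static int format_ptr(char *buf, size_t len, const void *ptr)
{
	assert(buf != NULL && len > 0);
	/* %p takes a void pointer directly, so no pointer-to-integer cast
	 * (and no width mismatch between 32-bit and 64-bit builds) is needed. */
	return snprintf(buf, len, "buffer at %p", ptr);
}
```

The exact text produced by `%p` is implementation-defined, so the sketch deliberately does not depend on its format.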
-
- 11 Feb, 2015 2 commits
-
-
Committed by Linus Torvalds
Commit 84683a7e ("sata_dwc_460ex: enable COMPILE_TEST for the driver") enabled this driver for non-ppc460-ex platforms, but it was then disabled for ARM and ARM64 by commit 2de5a9c0 ("sata_dwc_460ex: disable compilation on ARM and ARM64") because it's too noisy and broken.

This disables it entirely, because it's too noisy on x86-64 too, and there's no point in disabling architectures one by one. At a minimum, the code isn't 64-bit clean, and even on 32-bit it is questionable whether it makes sense.

Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Kirill A. Shutemov
One bit in ->vm_flags is unused now!

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Feb, 2015 9 commits
-
-
Committed by Antonios Motakis
As already demonstrated with PCI [1] and the platform bus [2], a driver_override property in sysfs can be used to bypass the id matching of a device to an AMBA driver. This can be used by VFIO to bind to any AMBA device requested by the user.

[1] http://lists-archives.com/linux-kernel/28030441-pci-introduce-new-device-binding-path-using-pci_dev-driver_override.html
[2] https://www.redhat.com/archives/libvir-list/2014-April/msg00382.html

Signed-off-by: Antonios Motakis <a.motakis@virtualopensystems.com>
Reviewed-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-
Committed by Carolyn Wyborny
This patch fixes an indentation issue and an error in an argument, both reported by static analysis. Without this patch, sparse and other static analysis tools will report errors.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Reported-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Rafael J. Wysocki
Merge branch 'pci/host-generic' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into acpi-resources

modified: drivers/of/of_pci.c

This fixes a build failure after merging the 'acpi-resources' branch with the PCI tree caused by bad interactions between that branch and the only commit in 'pci/host-generic'. Also that commit contains a bug which can be fixed by removing one line of code, so do that too.

Link: http://marc.info/?l=linux-kernel&m=142344882101429&w=2
Link: http://marc.info/?l=linux-next&m=142346304003932&w=2
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Committed by Markus Elfring
The vunmap() function also performs input parameter validation. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
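The same pattern exists in user space, where standard C defines `free(NULL)` as a no-op, so a guard around the call is redundant. The sketch below uses `free()` purely as an analogue; it does not call or model vunmap(), and `struct buf` and `buf_release()` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner of an optional allocation. */
struct buf { void *mem; };

static void buf_release(struct buf *b)
{
	assert(b != NULL);
	/* No "if (b->mem)" guard: free() already accepts a null pointer
	 * and does nothing in that case. */
	free(b->mem);
	b->mem = NULL;
}
```

Dropping the caller-side test is only safe when the callee documents that it validates its argument, which is the property the commit relies on.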
-
Committed by Hariprasad Shenai
Add support for reporting the option/expansion ROM version flashed in the adapter via the ethtool getdrvinfo function.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Yishai Hadas
The driver exposes interfaces that directly relate to HW state. Upon fatal error, consumers of these interfaces (ULPs) that rely on completion of all their posted work requests could hang, thereby introducing dependencies in shutdown order. To prevent this from happening, we manage the relevant resources (CQs, QPs) that are used by the device. Upon a fatal error, we now generate simulated completions for outstanding WQEs that were not completed at the time the HW was reset.

This includes invoking the completion event handler for all involved CQs so that the ULPs will poll those CQs. When polled, we return simulated CQEs with the IB_WC_WR_FLUSH_ERR return code, enabling ULPs to clean up their resources and not wait forever for completions upon receiving remove_one.

The above change requires an extra check in the data path to make sure that when the device is in error state, the simulated CQEs will be returned and no further WQEs will be posted.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Moni Shoua
When attaching a QP to a multicast address in bonded mode, there was an assumption that the port of the QP must be #1. This assumption doesn't hold under the flow which enables maximal usage of the physical ports. Fix it by always checking the port of the original flow and creating the mirrored flow on the other port.

Fixes: c6215745 ('IB/mlx4: Load balance ports in port aggregation mode')
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Moni Shoua
When queuing work to send the NETDEV_BONDING_INFO netdev event, it's possible that by the time the work is executed, the pointer to the slave has become invalid. This can happen if, between queuing the event and the execution of the work, the net-device was un-enslaved and re-enslaved.

Fix that by queuing work with the data of the slave instead of the slave structure.

Fixes: 69e61133 ('net/bonding: Notify state change on slaves')
Reported-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by George Spelvin
There was a bad typo in commit 43759d4f ("random: use an improved fast_mix() function") and I didn't notice because it "looked right", so I saw what I expected to see when I reviewed it.

Only months later did I look and notice it's not the Threefish-inspired mix function that I had designed and optimized.

Mea Culpa. Each input bit still has a chance to affect each output bit, and the fast pool is spilled *long* before it fills, so it's not a total disaster, but it's definitely not the intended great improvement.

I'm still working on finding better rotation constants. These are good enough, but since it's unrolled twice, it's possible to get better mixing for free by using eight different constants rather than repeating the same four.

Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org # v3.16+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
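For readers unfamiliar with this style of mixer, the general add-rotate-xor structure over four 32-bit words, with four rotation constants repeated across an unrolled pair of double rounds, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the rotation constants are assumptions chosen for the example, not a statement of what the kernel's fast_mix() uses.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by s bits; s must be in 1..31. */
static uint32_t rol32(uint32_t w, unsigned s)
{
	return (w << s) | (w >> (32 - s));
}

/* One add-rotate-xor step over four words. Each word is rotated in place,
 * then xored with a sum from the other pair. */
static void arx_round(uint32_t p[4], unsigned r1, unsigned r2)
{
	uint32_t a = p[0], b = p[1], c = p[2], d = p[3];

	assert(r1 > 0 && r1 < 32 && r2 > 0 && r2 < 32);
	a += b;			c += d;
	b = rol32(b, r1);	d = rol32(d, r2);
	d ^= a;			b ^= c;
	p[0] = a; p[1] = b; p[2] = c; p[3] = d;
}

/* Four rounds using two pairs of constants, each pair used twice. */
static void mix(uint32_t p[4])
{
	arx_round(p, 6, 27);	/* illustrative constants (assumption) */
	arx_round(p, 16, 14);
	arx_round(p, 6, 27);
	arx_round(p, 16, 14);
}
```

Every step is invertible, so distinct inputs always produce distinct outputs. A slip in which word gets rotated still compiles and still "looks right", which is consistent with how the message says the typo escaped review.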
-
- 09 Feb, 2015 19 commits
-
-
Committed by Carolyn Wyborny
This patch adds a call to u64_stats_init to Rx setup. This is done in order to avoid lockdep errors with seqcount on newer kernels.

Change-ID: Ia8ba8f0bcbd1c0e926f97d70aeee4ce4fd055e93
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Anjali Singhai Jain
For all VSIs on a VEB, Loopback mode should be either on or off. Our configuration requires them to be ON so that VSIs can directly talk to each other without going out on the wire.

Change-ID: I77b8310bc846329972b13b185949ab1431a46c30
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Vasu Dev
Set a different dev_port value of 1 for the FCoE netdev, rather than the default dev_port value of zero used for the PF netdev. This helps the biosdevname user tool to differentiate them correctly while both are attached to the same PCI function.

Change-ID: I8fb90e4ef52a1242f7580e49a3f0918735aee8ef
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Greg Rose
s/enable/disable

Change-ID: Ic0572a6c59d03e05a0a35d2e2e9d532e0512638d
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Matt Jared
Make sure to clear the GPIO blink field, instead of OR'ing against zero if the field is already '1'.

Change-ID: Ie52a52abd48f6f52b20778a6b8b0c542dfc9245c
Signed-off-by: Matt Jared <matthew.a.jared@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
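The bit-manipulation point can be shown with a small user-space sketch: OR'ing zero into a register value can never clear a bit that is already set, so the field has to be masked out first. The mask position and the function names below are hypothetical and are not taken from the i40e register layout.

```c
#include <assert.h>
#include <stdint.h>

#define BLINK_SHIFT	11u			/* hypothetical bit position */
#define BLINK_MASK	(1u << BLINK_SHIFT)

/* Correct update: clear the field first, then set it if requested. */
static uint32_t set_blink(uint32_t reg, int blink)
{
	assert(blink == 0 || blink == 1);
	reg &= ~BLINK_MASK;
	if (blink)
		reg |= BLINK_MASK;
	return reg;
}

/* Buggy variant for comparison: OR'ing in zero leaves a set bit set. */
static uint32_t set_blink_or_only(uint32_t reg, int blink)
{
	reg |= (blink ? BLINK_MASK : 0u);
	return reg;
}
```

Calling `set_blink_or_only(reg, 0)` on a value that already has the bit set returns it unchanged, which is the failure mode the commit describes.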
-
Committed by Anjali Singhai Jain
This patch forces Tx descriptor writebacks on ITR by kicking off the SWINT interrupt when we notice that there are non-cache-aligned Tx descriptors waiting in the ring while interrupts are disabled under NAPI.

Change-ID: dd6d9675629bf266c7515ad7a201394618c35444
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Mitch Williams
Stop the service task in the shutdown handler, preventing it from accessing the admin queue after it has been closed. This fixes a panic that could occur when the system was shut down with a lot of VFs enabled.

Change-ID: I286735e3842de472385bbf7ad68d30331e508add
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Hariprasad Shenai
Handle the clip_tbl debugfs entry when clip_tbl isn't allocated. In commit b5a02f50 ("cxgb4: Update ipv6 address handling api") the wrong argument was passed to single_open for the clip_tbl debugfs entry, which led to the trace below. Fix it.

======
Call Trace:
  [<ffffffffa073c606>] clip_tbl_open+0x16/0x30 [cxgb4]
  [<ffffffff8119e2fa>] do_dentry_open+0x21a/0x370
  [<ffffffff8119e499>] vfs_open+0x49/0x50
  [<ffffffff811b0d0e>] do_last+0x21e/0x800
  [<ffffffff811b1382>] path_openat+0x92/0x470
  [<ffffffff8110569f>] ? rb_reserve_next_event+0xaf/0x380
  [<ffffffff8110569f>] ? rb_reserve_next_event+0xaf/0x380
  [<ffffffff811b189a>] do_filp_open+0x4a/0xa0
  [<ffffffff811bdc5d>] ? __alloc_fd+0xcd/0x140
  [<ffffffff8119fa4a>] do_sys_open+0x11a/0x230
  [<ffffffff8101219f>] ? syscall_trace_enter_phase2+0xaf/0x1b0
  [<ffffffff8119fb9e>] SyS_open+0x1e/0x20
  [<ffffffff815bf6f0>] tracesys_phase2+0xd4/0xd9
Code: 89 e5 66 66 66 66 90 48 8b 47 e0 48 8b 40 30 48 8b 40 58 c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 47 e0 <48> 8b 40 58 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48
RIP [<ffffffff8120898d>] PDE_DATA+0xd/0x20
 RSP <ffff8800b08c3c48>
CR2: 0000000000000058
=====

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Mitch Williams
Stop the watchdog during shutdown. Failing to do this causes a log full of admin queue errors and the occasional hang when the system is shut down.

Change-ID: Ib2fd11213cca2fa589eb68577e86b1000c23c250
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Mitch Williams
Occasionally on shutdown, the FW will hand us a bunch of messages filled with zeros, which can cause us to spin trying to handle them. Just ignore these and get on with shutting down.

Change-ID: I347e9648f7153ad5a7b7e0847b87f7aad5f3e0da
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Mitch Williams
When the module is being unloaded, don't wait for the PF to politely handle all of our admin queue requests, as that might take forever with a lot of VFs enabled. Instead, just stop everything and request a VF reset.

When the original shutdown code was written, VF resets were unreliable, so we avoided them. But with production hardware and firmware, and the 1.x PF driver, this is no longer the case.

This fixes a potential multi-minute delay on driver unload, VF disable, or system shutdown.

Change-ID: Ib43d6d860ef6b9b8f26e8dce0615a0302608c7d9
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Mitch Williams
During VF deallocation, we need to lock out the VF reset code. However, we cannot depend on simply masking the interrupt, as this does not lock out the service task, which can still call the reset routine. Instead, leave the interrupt enabled, but add locking around the VF disable and reset routines.

For the disable code, we wait to get the lock, as the reset code will take a finite amount of time to run. For the reset code, we just return if we fail to get the lock. Since we know that the VFs are being disabled, we don't need to handle the reset.

This fixes a panic when disabling SR-IOV.

Change-ID: Iea0a6cdef35c331f48c6d5b2f8e6f0e86322e7d8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
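The asymmetric locking described above (the disable path waits, the reset path gives up) can be sketched with a C11 `atomic_flag`. This is a user-space model under stated assumptions, not the driver's implementation: the flag, the function names and the `resets_done` counter are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag vf_busy = ATOMIC_FLAG_INIT;	/* hypothetical lock bit */
static int resets_done;

/* Reset path: return immediately if the flag is already held, since the
 * VFs are being disabled and there is nothing left to reset. */
static bool try_reset_vf(void)
{
	if (atomic_flag_test_and_set(&vf_busy))
		return false;
	resets_done++;			/* stand-in for the actual reset work */
	atomic_flag_clear(&vf_busy);
	return true;
}

/* Disable path: wait for the flag, because a reset that is already
 * running finishes in a finite amount of time. */
static void disable_begin(void)
{
	while (atomic_flag_test_and_set(&vf_busy))
		;	/* spin; a real driver would sleep between attempts */
}

static void disable_end(void)
{
	atomic_flag_clear(&vf_busy);
}
```

Because the reset side never blocks, it can still be called from a context such as a service task without risking a deadlock against the disable path.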
-
Committed by Mitch Williams
When enabling 64 VFs and loading the VF driver in the host kernel, we can easily overrun the PF's admin receive queue. Double the size of this queue, and increase the work limit to allow the PF to handle more requests in a single pass through the service task.

Change-ID: I0efbbdc61954bffad422a2f33c4b948a59370bf5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Mitch Williams
Delay a minimum of 10ms after VF reset, to allow the hardware's internal FIFOs to flush.

Change-ID: I8a02ddb28c9f0d7303a1eb21d0b2443e5b4c1cda
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by John W Linville
This I40E_FCOE block increments v_budget before it has been initialized, then v_budget gets overwritten a few lines later. This patch just reorders the code hunks in what I believe was the intended sequence.

Coverity: CID 12600999
Signed-off-by: John W Linville <linville@tuxdriver.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Rickard Strandqvist
Remove the function i40e_rx_is_fip() that is not used anywhere.

This was partially found by using a static code analysis program called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Committed by Rasmus Villemoes
src_ip is a pointer to a union vxlan_addr, one member of which is a struct sockaddr. Passing a pointer to src_ip is wrong; one should pass the value of src_ip itself. Since %pIS formally expects something of type struct sockaddr*, let's pass a pointer to the appropriate union member, though this of course doesn't change the generated code.

Fixes: e4c7ed41 ("vxlan: add ipv6 support")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Shrikrishna Khare
The hex constant chosen for VMXNET3_REV1_MAGIC is offensive; replace it with its decimal equivalent.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Reviewed-by: Shreyas Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
Receive Flow Steering is a nice solution but suffers from hash collisions when a mix of connected and unconnected traffic is received on the host, when the flow hash table is populated.

Also, clearing the flow in inet_release() makes RFS not very good for short-lived flows, as many packets can follow close() (FIN, ACK packets, ...).

This patch extends the information stored in the global hash table to include not only the cpu number, but also the upper part of the hash value.

I use a 32-bit value, and dynamically split it in two parts.

For hosts with fewer than 64 possible cpus, this gives 6 bits for the cpu number, and 26 (32-6) bits for the upper part of the hash.

Since hash bucket selection uses the low-order bits of the hash, we have a full hash match if /proc/sys/net/core/rps_sock_flow_entries is big enough.

If the hash found in the flow table does not match, we fall back to RPS (if it is enabled for the rxqueue).

This means that a packet for a non-connected flow can avoid the IPI through an unrelated/victim CPU.

This also means we no longer have to clear the table at socket close time, and this helps short-lived flow performance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
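The split of one 32-bit table entry into a cpu number (low bits) and the upper part of the flow hash can be sketched as follows. This is a user-space illustration of the encoding described above, with hypothetical function names; it is not the kernel's data structure or API.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest all-ones mask that can hold cpu ids 0..nr_cpus-1. */
static uint32_t cpu_mask_for(uint32_t nr_cpus)
{
	uint32_t mask = 0;

	assert(nr_cpus > 0);
	while (mask < nr_cpus - 1)
		mask = (mask << 1) | 1;
	return mask;
}

/* Pack the upper hash bits and the cpu number into one 32-bit entry. */
static uint32_t flow_entry(uint32_t hash, uint32_t cpu, uint32_t mask)
{
	return (hash & ~mask) | (cpu & mask);
}

/* Return the recorded cpu, or -1 when the upper hash bits do not match
 * (the caller would then fall back to another steering method). */
static int flow_lookup(uint32_t entry, uint32_t hash, uint32_t mask)
{
	if ((entry ^ hash) & ~mask)
		return -1;
	return (int)(entry & mask);
}
```

With 64 possible cpus the mask is 0x3f, leaving 26 bits of hash in the entry. The low hash bits that the cpu number overwrites are the ones already used for bucket selection, which is why the message can speak of a full hash match when the table is large enough.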
-