- 29 12月, 2018 15 次提交
-
-
由 Jérôme Glisse 提交于
Patch series "mmu notifier contextual informations", v2. This patchset adds contextual information, why an invalidation is happening, to mmu notifier callback. This is necessary for user of mmu notifier that wish to maintains their own data structure without having to add new fields to struct vm_area_struct (vma). For instance device can have they own page table that mirror the process address space. When a vma is unmap (munmap() syscall) the device driver can free the device page table for the range. Today we do not have any information on why a mmu notifier call back is happening and thus device driver have to assume that it is always an munmap(). This is inefficient at it means that it needs to re-allocate device page table on next page fault and rebuild the whole device driver data structure for the range. Other use case beside munmap() also exist, for instance it is pointless for device driver to invalidate the device page table when the invalidation is for the soft dirtyness tracking. Or device driver can optimize away mprotect() that change the page table permission access for the range. This patchset enables all this optimizations for device drivers. I do not include any of those in this series but another patchset I am posting will leverage this. The patchset is pretty simple from a code point of view. The first two patches consolidate all mmu notifier arguments into a struct so that it is easier to add/change arguments. The last patch adds the contextual information (munmap, protection, soft dirty, clear, ...). This patch (of 3): To avoid having to change many callback definition everytime we want to add a parameter use a structure to group all parameters for the mmu_notifier invalidate_range_start/end callback. No functional changes with this patch. [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc] Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.comSigned-off-by: NJérôme Glisse <jglisse@redhat.com> Acked-by: NJan Kara <jack@suse.cz> Acked-by: Jason Gunthorpe <jgg@mellanox.com> [infiniband] Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <zwisler@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
If there are lots of write IO with flash device, it could have a wearout problem of storage. To overcome the problem, admin needs to design write limitation to guarantee flash health for entire product life. This patch creates a new knob "writeback_limit" for zram. writeback_limit's default value is 0 so that it doesn't limit any writeback. If admin want to measure writeback count in a certain period, he could know it via /sys/block/zram0/bd_stat's 3rd column. If admin want to limit writeback as per-day 400M, he could do it like below. MB_SHIFT=20 4K_SHIFT=12 echo $((400<<MB_SHIFT>>4K_SHIFT)) > \ /sys/block/zram0/writeback_limit. If admin want to allow further write again, he could do it like below echo 0 > /sys/block/zram0/writeback_limit If admin want to see remaining writeback budget, cat /sys/block/zram0/writeback_limit The writeback_limit count will reset whenever you reset zram (e.g., system reboot, echo 1 > /sys/block/zramX/reset) so keeping how many of writeback happened until you reset the zram to allocate extra writeback budget in next setting is user's job. [minchan@kernel.org: v4] Link: http://lkml.kernel.org/r/20181203024045.153534-8-minchan@kernel.org Link: http://lkml.kernel.org/r/20181127055429.251614-8-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
bd_stat represents things that happened in the backing device. Currently it supports bd_counts, bd_reads and bd_writes which are helpful to understand wearout of flash and memory saving. [minchan@kernel.org: v4] Link: http://lkml.kernel.org/r/20181203024045.153534-7-minchan@kernel.org Link: http://lkml.kernel.org/r/20181127055429.251614-7-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
Add a new feature "zram idle/huge page writeback". In the zram-swap use case, zram usually has many idle/huge swap pages. It's pointless to keep them in memory (ie, zram). To solve this problem, this feature introduces idle/huge page writeback to the backing device so the goal is to save more memory space on embedded systems. Normal sequence to use idle/huge page writeback feature is as follows, while (1) { # mark allocated zram slot to idle echo all > /sys/block/zram0/idle # leave system working for several hours # Unless there is no access for some blocks on zram, # they are still IDLE marked pages. echo "idle" > /sys/block/zram0/writeback or/and echo "huge" > /sys/block/zram0/writeback # write the IDLE or/and huge marked slot into backing device # and free the memory. } Per the discussion at https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u, This patch removes direct incommpressibe page writeback feature (d2afd25114f4 ("zram: write incompressible pages to backing device")). Below concerns from Sergey: == &< == "IDLE writeback" is superior to "incompressible writeback". "incompressible writeback" is completely unpredictable and uncontrollable; it depens on data patterns and compression algorithms. While "IDLE writeback" is predictable. I even suspect, that, *ideally*, we can remove "incompressible writeback". "IDLE pages" is a super set which also includes "incompressible" pages. So, technically, we still can do "incompressible writeback" from "IDLE writeback" path; but a much more reasonable one, based on a page idling period. I understand that you want to keep "direct incompressible writeback" around. ZRAM is especially popular on devices which do suffer from flash wearout, so I can see "incompressible writeback" path becoming a dead code, long term. == &< == Below concerns from Minchan: == &< == My concern is if we enable CONFIG_ZRAM_WRITEBACK in this implementation, both hugepage/idlepage writeck will turn on. However someuser want to enable only idlepage writeback so we need to introduce turn on/off knob for hugepage or new CONFIG_ZRAM_IDLEPAGE_WRITEBACK for those usecase. I don't want to make it complicated *if possible*. Long term, I imagine we need to make VM aware of new swap hierarchy a little bit different with as-is. For example, first high priority swap can return -EIO or -ENOCOMP, swap try to fallback to next lower priority swap device. With that, hugepage writeback will work tranparently. So we could regard it as regression because incompressible pages doesn't go to backing storage automatically. Instead, user should do it via "echo huge" > /sys/block/zram/writeback" manually. == &< == Link: http://lkml.kernel.org/r/20181127055429.251614-6-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
To support idle page writeback with upcoming patches, this patch introduces a new ZRAM_IDLE flag. Userspace can mark zram slots as "idle" via "echo all > /sys/block/zramX/idle" which marks every allocated zram slot as ZRAM_IDLE. User could see it by /sys/kernel/debug/zram/zram0/block_state. 300 75.033841 ...i 301 63.806904 s..i 302 63.806919 ..hi Once there is IO for the slot, the mark will be disappeared. 300 75.033841 ... 301 63.806904 s..i 302 63.806919 ..hi Therefore, 300th block is idle zpage. With this feature, user can how many zram has idle pages which are waste of memory. Link: http://lkml.kernel.org/r/20181127055429.251614-5-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
Rename some variables and restructure some code for better readability in writeback and zs_free_page. Link: http://lkml.kernel.org/r/20181127055429.251614-4-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
If blkdev_get fails, we shouldn't do blkdev_put. Otherwise, kernel emits below log. This patch fixes it. WARNING: CPU: 0 PID: 1893 at fs/block_dev.c:1828 blkdev_put+0x105/0x120 Modules linked in: CPU: 0 PID: 1893 Comm: swapoff Not tainted 4.19.0+ #453 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 RIP: 0010:blkdev_put+0x105/0x120 Call Trace: __x64_sys_swapoff+0x46d/0x490 do_syscall_64+0x5a/0x190 entry_SYSCALL_64_after_hwframe+0x49/0xbe irq event stamp: 4466 hardirqs last enabled at (4465): __free_pages_ok+0x1e3/0x490 hardirqs last disabled at (4466): trace_hardirqs_off_thunk+0x1a/0x1c softirqs last enabled at (3420): __do_softirq+0x333/0x446 softirqs last disabled at (3407): irq_exit+0xd1/0xe0 Link: http://lkml.kernel.org/r/20181127055429.251614-3-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
Patch series "zram idle page writeback", v3. Inherently, swap device has many idle pages which are rare touched since it was allocated. It is never problem if we use storage device as swap. However, it's just waste for zram-swap. This patchset supports zram idle page writeback feature. * Admin can define what is idle page "no access since X time ago" * Admin can define when zram should writeback them * Admin can define when zram should stop writeback to prevent wearout Details are in each patch's description. This patch (of 7): ================================ WARNING: inconsistent lock state 4.19.0+ #390 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes: 00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50 {SOFTIRQ-ON-W} state was registered at: _raw_spin_lock+0x2c/0x40 zram_make_request+0x755/0xdc9 generic_make_request+0x373/0x6a0 submit_bio+0x6c/0x140 __swap_writepage+0x3a8/0x480 shrink_page_list+0x1102/0x1a60 shrink_inactive_list+0x21b/0x3f0 shrink_node_memcg.constprop.99+0x4f8/0x7e0 shrink_node+0x7d/0x2f0 do_try_to_free_pages+0xe0/0x300 try_to_free_pages+0x116/0x2b0 __alloc_pages_slowpath+0x3f4/0xf80 __alloc_pages_nodemask+0x2a2/0x2f0 __handle_mm_fault+0x42e/0xb50 handle_mm_fault+0x55/0xb0 __do_page_fault+0x235/0x4b0 page_fault+0x1e/0x30 irq event stamp: 228412 hardirqs last enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600 hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600 softirqs last enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427 softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&zram->bitmap_lock)->rlock); <Interrupt> lock(&(&zram->bitmap_lock)->rlock); *** DEADLOCK *** no locks held by zram_verify/2095. stack backtrace: CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: <IRQ> dump_stack+0x67/0x9b print_usage_bug+0x1bd/0x1d3 mark_lock+0x4aa/0x540 __lock_acquire+0x51d/0x1300 lock_acquire+0x90/0x180 _raw_spin_lock+0x2c/0x40 put_entry_bdev+0x1e/0x50 zram_free_page+0xf6/0x110 zram_slot_free_notify+0x42/0xa0 end_swap_bio_read+0x5b/0x170 blk_update_request+0x8f/0x340 scsi_end_request+0x2c/0x1e0 scsi_io_completion+0x98/0x650 blk_done_softirq+0x9e/0xd0 __do_softirq+0xcc/0x427 irq_exit+0xd1/0xe0 do_IRQ+0x93/0x120 common_interrupt+0xf/0xf </IRQ> With writeback feature, zram_slot_free_notify could be called in softirq context by end_swap_bio_read. However, bitmap_lock is not aware of that so lockdep yell out: get_entry_bdev spin_lock(bitmap->lock); irq softirq end_swap_bio_read zram_slot_free_notify zram_slot_lock <-- deadlock prone zram_free_page put_entry_bdev spin_lock(bitmap->lock); <-- deadlock prone With akpm's suggestion (i.e. bitmap operation is already atomic), we could remove bitmap lock. It might fail to find a empty slot if serious contention happens. However, it's not severe problem because huge page writeback has already possiblity to fail if there is severe memory pressure. Worst case is just keeping the incompressible in memory, not storage. The other problem is zram_slot_lock in zram_slot_slot_free_notify. To make it safe is this patch introduces zram_slot_trylock where zram_slot_free_notify uses it. Although it's rare to be contented, this patch adds new debug stat "miss_free" to keep monitoring how often it happens. Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Hildenbrand 提交于
Userspace should always be in charge of how to online memory and if memory should be onlined automatically in the kernel. Let's drop the parameter to overwrite this - XEN passes memhp_auto_online, just like add_memory(), so we can directly use that instead internally. Link: http://lkml.kernel.org/r/20181123123740.27652-1-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Acked-by: NJuergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Arun KS <arunks@codeaurora.org> Cc: Mathieu Malaterre <malat@debian.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wei Yang 提交于
In cb5e39b8 ("drivers: base: refactor add_memory_section() to add_memory_block()"), add_memory_block() is introduced, which is only invoked in memory_dev_init(). When combining these two loops in memory_dev_init() and add_memory_block(), they looks like this: for (i = 0; i < NR_MEM_SECTIONS; i += sections_per_block) for (j = i; (j < i + sections_per_block) && j < NR_MEM_SECTIONS; j++) Since it is sure the (i < NR_MEM_SECTIONS) and j sits in its own memory block, the check of (j < NR_MEM_SECTIONS) is not necessary. This patch just removes this check. Link: http://lkml.kernel.org/r/20181123222811.18216-1-richard.weiyang@gmail.comSigned-off-by: NWei Yang <richard.weiyang@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Seth Jennings <sjenning@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
At Maintainer Summit, Greg brought up a topic I proposed around EXPORT_SYMBOL_GPL usage. The motivation was considerations for when EXPORT_SYMBOL_GPL is warranted and the criteria for taking the exceptional step of reclassifying an existing export. Specifically, I wanted to make the case that although the line is fuzzy and hard to specify in abstract terms, it is nonetheless clear that devm_memremap_pages() and HMM (Heterogeneous Memory Management) have crossed it. The devm_memremap_pages() facility should have been EXPORT_SYMBOL_GPL from the beginning, and HMM as a derivative of that functionality should have naturally picked up that designation as well. Contrary to typical rules, the HMM infrastructure was merged upstream with zero in-tree consumers. There was a promise at the time that those users would be merged "soon", but it has been over a year with no drivers arriving. While the Nouveau driver is about to belatedly make good on that promise it is clear that HMM was targeted first and foremost at an out-of-tree consumer. HMM is derived from devm_memremap_pages(), a facility Christoph and I spearheaded to support persistent memory. It combines a device lifetime model with a dynamically created 'struct page' / memmap array for any physical address range. It enables coordination and control of the many code paths in the kernel built to interact with memory via 'struct page' objects. With HMM the integration goes even deeper by allowing device drivers to hook and manipulate page fault and page free events. One interpretation of when EXPORT_SYMBOL is suitable is when it is exporting stable and generic leaf functionality. The devm_memremap_pages() facility continues to see expanding use cases, peer-to-peer DMA being the most recent, with no clear end date when it will stop attracting reworks and semantic changes. It is not suitable to export devm_memremap_pages() as a stable 3rd party driver API due to the fact that it is still changing and manipulates core behavior. Moreover, it is not in the best interest of the long term development of the core memory management subsystem to permit any external driver to effectively define its own system-wide memory management policies with no encouragement to engage with upstream. I am also concerned that HMM was designed in a way to minimize further engagement with the core-MM. That, with these hooks in place, device-drivers are free to implement their own policies without much consideration for whether and how the core-MM could grow to meet that need. Going forward not only should HMM be EXPORT_SYMBOL_GPL, but the core-MM should be allowed the opportunity and stimulus to change and address these new use cases as first class functionality. Original changelog: hmm_devmem_add(), and hmm_devmem_add_resource() duplicated devm_memremap_pages() and are now simple now wrappers around the core facility to inject a dev_pagemap instance into the global pgmap_radix and hook page-idle events. The devm_memremap_pages() interface is base infrastructure for HMM. HMM has more and deeper ties into the kernel memory management implementation than base ZONE_DEVICE which is itself a EXPORT_SYMBOL_GPL facility. Originally, the HMM page structure creation routines copied the devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify the implementations was discussed during the initial review: http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html Recent work to extend devm_memremap_pages() for the peer-to-peer-DMA facility enabled this cleanup to move forward. In addition to the integration with devm_memremap_pages() HMM depends on other GPL-only symbols: mmu_notifier_unregister_no_release percpu_ref region_intersects __class_create It goes further to consume / indirectly expose functionality that is not exported to any other driver: alloc_pages_vma walk_page_range HMM is derived from devm_memremap_pages(), and extends deep core-kernel fundamentals. Similar to devm_memremap_pages(), mark its entry points EXPORT_SYMBOL_GPL(). [logang@deltatee.com: PCI/P2PDMA: match interface changes to devm_memremap_pages()] Link: http://lkml.kernel.org/r/20181130225911.2900-1-logang@deltatee.com Link: http://lkml.kernel.org/r/154275560565.76910.15919297436557795278.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Balbir Singh <bsingharora@gmail.com>, Cc: Michal Hocko <mhocko@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
The last step before devm_memremap_pages() returns success is to allocate a release action, devm_memremap_pages_release(), to tear the entire setup down. However, the result from devm_add_action() is not checked. Checking the error from devm_add_action() is not enough. The api currently relies on the fact that the percpu_ref it is using is killed by the time the devm_memremap_pages_release() is run. Rather than continue this awkward situation, offload the responsibility of killing the percpu_ref to devm_memremap_pages_release() directly. This allows devm_memremap_pages() to do the right thing relative to init failures and shutdown. Without this change we could fail to register the teardown of devm_memremap_pages(). The likelihood of hitting this failure is tiny as small memory allocations almost always succeed. However, the impact of the failure is large given any future reconfiguration, or disable/enable, of an nvdimm namespace will fail forever as subsequent calls to devm_memremap_pages() will fail to setup the pgmap_radix since there will be stale entries for the physical address range. An argument could be made to require that the ->kill() operation be set in the @pgmap arg rather than passed in separately. However, it helps code readability, tracking the lifetime of a given instance, to be able to grep the kill routine directly at the devm_memremap_pages() call site. Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com> Fixes: e8d51348 ("memremap: change devm_memremap_pages interface...") Reviewed-by: N"Jérôme Glisse" <jglisse@redhat.com> Reported-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arun KS 提交于
totalram_pages and totalhigh_pages are made static inline function. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.orgSigned-off-by: NArun KS <arunks@codeaurora.org> Suggested-by: NMichal Hocko <mhocko@suse.com> Suggested-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arun KS 提交于
totalram_pages, zone->managed_pages and totalhigh_pages updates are protected by managed_page_count_lock, but readers never care about it. Convert these variables to atomic to avoid readers potentially seeing a store tear. This patch converts zone->managed_pages. Subsequent patches will convert totalram_panges, totalhigh_pages and eventually managed_page_count_lock will be removed. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. Link: http://lkml.kernel.org/r/1542090790-21750-3-git-send-email-arunks@codeaurora.orgSigned-off-by: NArun KS <arunks@codeaurora.org> Suggested-by: NMichal Hocko <mhocko@suse.com> Suggested-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arun KS 提交于
Patch series "mm: convert totalram_pages, totalhigh_pages and managed pages to atomic", v5. This series converts totalram_pages, totalhigh_pages and zone->managed_pages to atomic variables. totalram_pages, zone->managed_pages and totalhigh_pages updates are protected by managed_page_count_lock, but readers never care about it. Convert these variables to atomic to avoid readers potentially seeing a store tear. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemes better to remove the lock and convert variables to atomic. With the change, preventing poteintial store-to-read tearing comes as a bonus. This patch (of 4): This is in preparation to a later patch which converts totalram_pages and zone->managed_pages to atomic variables. Please note that re-reading the value might lead to a different value and as such it could lead to unexpected behavior. There are no known bugs as a result of the current code but it is better to prevent from them in principle. Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.orgSigned-off-by: NArun KS <arunks@codeaurora.org> Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 12月, 2018 7 次提交
-
-
由 Colin Ian King 提交于
The two different assignments for pkt_len are actually the same and so the if statement is redundant and can be removed. Masking a u8 return value from inb() with 0xFF is also redundant and can also be emoved. Similarly, the two different outb calls are identical as the mask of 0xff on the second outb is redundant since a u8 is being written, so the if statement is also redundant and can be also removed. Detected by CoverityScan, CID#1475639 ("Identical code for different branches") V2: Remove the if statement for the outb calls, thanks to David Miller for spotting this. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ivan Mironov 提交于
This happened when I tried to boot normal Fedora 29 system with latest available kernel (from fedora rawhide, plus some unrelated custom patches): BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 0 P4D 0 Oops: 0010 [#1] SMP PTI CPU: 6 PID: 1422 Comm: libvirtd Tainted: G I 4.20.0-0.rc7.git3.hpsa2.1.fc29.x86_64 #1 Hardware name: HP ProLiant BL460c G6, BIOS I24 05/21/2018 RIP: 0010: (null) Code: Bad RIP value. RSP: 0018:ffffa47ccdc9fbe0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000000003e8 RCX: ffffa47ccdc9fbf8 RDX: ffffa47ccdc9fc00 RSI: ffff97d9ee7b01f8 RDI: ffff97d9f0150b80 RBP: ffff97d9f0150b80 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000003 R13: ffff97d9ef1e53e8 R14: 0000000000000009 R15: ffff97d9f0ac6730 FS: 00007f4d224ef700(0000) GS:ffff97d9fa200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffffffffd6 CR3: 00000011ece52006 CR4: 00000000000206e0 Call Trace: ? bnx2x_chip_cleanup+0x195/0x610 [bnx2x] ? bnx2x_nic_unload+0x1e2/0x8f0 [bnx2x] ? bnx2x_reload_if_running+0x24/0x40 [bnx2x] ? bnx2x_set_features+0x79/0xa0 [bnx2x] ? __netdev_update_features+0x244/0x9e0 ? netlink_broadcast_filtered+0x136/0x4b0 ? netdev_update_features+0x22/0x60 ? dev_disable_lro+0x1c/0xe0 ? devinet_sysctl_forward+0x1c6/0x211 ? proc_sys_call_handler+0xab/0x100 ? __vfs_write+0x36/0x1a0 ? rcu_read_lock_sched_held+0x79/0x80 ? rcu_sync_lockdep_assert+0x2e/0x60 ? __sb_start_write+0x14c/0x1b0 ? vfs_write+0x159/0x1c0 ? vfs_write+0xba/0x1c0 ? ksys_write+0x52/0xc0 ? do_syscall_64+0x60/0x1f0 ? entry_SYSCALL_64_after_hwframe+0x49/0xbe After some investigation I figured out that recently added cleanup code tries to call VLAN filtering de-initialization function which exist only for newer hardware. Corresponding function pointer is not set (== 0) for older hardware, namely these chips: #define CHIP_NUM_57710 0x164e #define CHIP_NUM_57711 0x164f #define CHIP_NUM_57711E 0x1650 And I have one of those in my test system: Broadcom Inc. and subsidiaries NetXtreme II BCM57711E 10-Gigabit PCIe [14e4:1650] Function bnx2x_init_vlan_mac_fp_objs() from drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.h decides whether to initialize relevant pointers in bnx2x_sp_objs.vlan_obj or not. This regression was introduced after v4.20-rc7, and still exists in v4.20 release. Fixes: 04f05230 ("bnx2x: Remove configured vlans as part of unload sequence.") Signed-off-by: NIvan Mironov <mironov.ivan@gmail.com> Signed-off-by: NIvan Mironov <mironov.ivan@gmail.com> Acked-by: NSudarsana Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julia Lawall 提交于
Drop LIST_HEAD where the variable it declares has never been used. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: c82e9aa0 ("mlx4_core: resource tracking for HCA resources used by guests") Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julia Lawall 提交于
Drop LIST_HEAD where the variable it declares is never used. The uses were removed in 244cd96a ("net_sched: remove list_head from tc_action"), but not the declaration. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: 244cd96a ("net_sched: remove list_head from tc_action") Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julia Lawall 提交于
Drop LIST_HEAD where the variable it declares is never used. These became useless in 244cd96a ("net_sched: remove list_head from tc_action") The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: 244cd96a ("net_sched: remove list_head from tc_action") Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 kbuild test robot 提交于
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:1339:57-58: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci Fixes: 4c8fb298 ("net/mlx5e: Increase VF representors' SQ size to 128") CC: Gavi Teitz <gavi@mellanox.com> Signed-off-by: Nkbuild test robot <fengguang.wu@intel.com> Reviewed-by: NLeon Romanovsky <leonro@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Florian Westphal 提交于
skb->sp doesn't exist anymore in the next-next tree, so mips defconfig no longer builds. Use helper instead to reset the secpath. Not even compile tested. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reported-by: NGuenter Roeck <linux@roeck-us.net> Fixes: 4165079b ("net: switch secpath to use skb extension infrastructure") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 12月, 2018 5 次提交
-
-
由 Anson Huang 提交于
i.MX PWM module's ipg_clk_s is for PWM register access, on most of i.MX SoCs, this ipg_clk_s is from system ipg clock or perclk which is always enabled, but on i.MX7D, the ipg_clk_s is from PWM1_CLK_ROOT which is controlled by CCGR132, that means the CCGR132 MUST be enabled first before accessing PWM registers on i.MX7D. This patch adds ipg clock operation to make sure register access successfully on i.MX7D and it fixes Linux kernel boot up hang during PWM driver probe. Fixes: 4a23e6ee ("ARM: dts: imx7d-sdb: Restore pwm backlight support") Signed-off-by: NAnson Huang <Anson.Huang@nxp.com> Signed-off-by: NThierry Reding <thierry.reding@gmail.com>
-
由 Alexander Shiyan 提交于
Adopt the SPDX license identifier headers to ease license compliance management. Signed-off-by: NAlexander Shiyan <shc_work@mail.ru> Signed-off-by: NThierry Reding <thierry.reding@gmail.com>
-
由 Alexander Shiyan 提交于
Commit e39c0df1 ("pwm: Introduce the pwm_args concept") has changed the variable for the period for clps711x-pwm driver, so now pwm_get/set_period() works with pwm->state.period variable instead of pwm->args.period. This patch changes the period variable in other places where it is used. Signed-off-by: NAlexander Shiyan <shc_work@mail.ru> Signed-off-by: NThierry Reding <thierry.reding@gmail.com>
-
由 Stefan Wahren 提交于
Adopt the SPDX license identifier headers to ease license compliance management. Cc: Bart Tanghe <bart.tanghe@thomasmore.be> Signed-off-by: NStefan Wahren <stefan.wahren@i2se.com> Reviewed-by: NEric Anholt <eric@anholt.net> Signed-off-by: NThierry Reding <thierry.reding@gmail.com>
-
由 Clément Péron 提交于
The Cygnus architecture uses a Kona PWM. This is already present in the device tree but can't be built actually. Hence, allow the Kona PWM to be built for the Cygnus architecture. Signed-off-by: NClément Péron <peron.clem@gmail.com> Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NScott Branden <scott.branden@broadcom.com> Acked-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: NThierry Reding <thierry.reding@gmail.com>
-
- 23 12月, 2018 13 次提交
-
-
由 Eric Biggers 提交于
Remove dead code related to internal IV generators, which are no longer used since they've been replaced with the "seqiv" and "echainiv" templates. The removed code includes: - The "givcipher" (GIVCIPHER) algorithm type. No algorithms are registered with this type anymore, so it's unneeded. - The "const char *geniv" member of aead_alg, ablkcipher_alg, and blkcipher_alg. A few algorithms still set this, but it isn't used anymore except to show via /proc/crypto and CRYPTO_MSG_GETALG. Just hardcode "<default>" or "<none>" in those cases. - The 'skcipher_givcrypt_request' structure, which is never used. Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Eric Biggers 提交于
Fixes: cf718eaa ("crypto: cavium/nitrox - Enabled Mailbox support") Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Nagadheeraj Rottela 提交于
Added support to offload AEAD ciphers to NITROX. Currently supported AEAD cipher is 'gcm(aes)'. Signed-off-by: NNagadheeraj Rottela <rnagadheeraj@marvell.com> Reviewed-by: NSrikanth Jampala <jsrikanth@marvell.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Fabio Estevam 提交于
The following build warnings are seen when building for ARM64 allmodconfig: drivers/crypto/mxc-scc.c:181:20: warning: format '%d' expects argument of type 'int', but argument 5 has type 'size_t' {aka 'long unsigned int'} [-Wformat=] drivers/crypto/mxc-scc.c:186:21: warning: format '%d' expects argument of type 'int', but argument 4 has type 'size_t' {aka 'long unsigned int'} [-Wformat=] drivers/crypto/mxc-scc.c:277:21: warning: format '%d' expects argument of type 'int', but argument 4 has type 'size_t' {aka 'long unsigned int'} [-Wformat=] drivers/crypto/mxc-scc.c:339:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] drivers/crypto/mxc-scc.c:340:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] Fix them by using the %zu specifier to print a size_t variable and using a plain %x to print the result of a readl(). Signed-off-by: NFabio Estevam <festevam@gmail.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Harsh Jain 提交于
Fix error counter increment in AEAD decrypt operation when validation of tag is done in Driver instead of H/W. Signed-off-by: NHarsh Jain <harsh@chelsio.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Harsh Jain 提交于
Reset the counters on receiving detach from Cxgb4. Signed-off-by: NAtul Gupta <atul.gupta@chelsio.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Harsh Jain 提交于
chcr receives "CXGB4_STATE_DETACH" event on PCI Shutdown. Wait for processing of inflight request and Mark the device unavailable. Signed-off-by: NHarsh Jain <harsh@chelsio.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Harsh Jain 提交于
Send dma address as value to function arguments instead of pointer. Signed-off-by: NHarsh Jain <harsh@chelsio.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Harsh Jain 提交于
Use tx_channel_id instead of rx_channel_id. Signed-off-by: NHarsh Jain <harsh@chelsio.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Harsh Jain 提交于
Send input as IV | AAD | Data. It will allow sending IV as Immediate Data and Creates space in Work request to add more dma mapped entries. Signed-off-by: NHarsh Jain <harsh@chelsio.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 YueHaibing 提交于
Fixes gcc '-Wunused-but-set-variable' warning: drivers/crypto/chelsio/chcr_ipsec.c: In function 'chcr_ipsec_xmit': drivers/crypto/chelsio/chcr_ipsec.c:674:33: warning: variable 'kctx_len' set but not used [-Wunused-but-set-variable] unsigned int flits = 0, ndesc, kctx_len; It not used since commit 8362ea16 ("crypto: chcr - ESN for Inline IPSec Tx") Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Nathan Chancellor 提交于
Clang warns when one enumerated type is implicitly converted to another: drivers/crypto/ux500/hash/hash_core.c:169:4: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion] direction, DMA_CTRL_ACK | DMA_PREP_INTERRUPT); ^~~~~~~~~ 1 warning generated. dmaengine_prep_slave_sg expects an enum from dma_transfer_direction. We know that the only direction supported by this function is DMA_TO_DEVICE because of the check at the top of this function so we can just use the equivalent value from dma_transfer_direction. DMA_TO_DEVICE = DMA_MEM_TO_DEV = 1 Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Nathan Chancellor 提交于
Clang warns when one enumerated type is implicitly converted to another: drivers/crypto/ux500/cryp/cryp_core.c:559:5: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion] direction, DMA_CTRL_ACK); ^~~~~~~~~ drivers/crypto/ux500/cryp/cryp_core.c:583:5: warning: implicit conversion from enumeration type 'enum dma_data_direction' to different enumeration type 'enum dma_transfer_direction' [-Wenum-conversion] direction, ^~~~~~~~~ 2 warnings generated. dmaengine_prep_slave_sg expects an enum from dma_transfer_direction. Because we know the value of the dma_data_direction enum from the switch statement, we can just use the proper value from dma_transfer_direction so there is no more conversion. DMA_TO_DEVICE = DMA_MEM_TO_DEV = 1 DMA_FROM_DEVICE = DMA_DEV_TO_MEM = 2 Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-