- 05 February 2019, 16 commits
-
-
Committed by Michael Mueller

Ensure a GISA is in use before accessing the IPM to avoid a null pointer dereference.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reported-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-16-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Initialize the GIB so that it can be used by the kvm host.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-15-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

The patch implements a handler for GIB alert interruptions on the host. Its task is to alert guests that interrupts are pending for them. A GIB alert interrupt statistic counter is added as well:

    $ cat /proc/interrupts
              CPU0       CPU1
    ...
    GAL:        23         37   [I/O] GIB Alert
    ...

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20190131085247.13826-14-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Function kvm_s390_gisa_clear() now clears the Interruption Pending Mask of the GISA asap. If the GISA is in the alert list at this time, it stays in the list but is removed by process_gib_alert_list().

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Message-Id: <20190131085247.13826-13-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Add the Interruption Alert Mask (IAM) to the architecture-specific kvm struct. This mask in the GISA defines the ISCs for which a GIB alert will be issued.

The functions kvm_s390_gisc_register() and kvm_s390_gisc_unregister() are used to register and unregister a GISC (guest ISC) with a virtual machine and its GISA. Upon successful completion, kvm_s390_gisc_register() returns the ISC to be used for GIB alert interruptions; a negative return code indicates an error during registration. These functions will be used by other adapter types, such as AP and PCI, to request pass-through interruption support.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-12-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
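A rough, hypothetical sketch of how a pass-through adapter driver might use this interface. Only kvm_s390_gisc_register() and kvm_s390_gisc_unregister() come from the patch; my_enable_passthrough_irq(), struct my_dev and its fields are invented for illustration:

    /* kernel-context sketch, not a complete or buildable driver */
    static int my_enable_passthrough_irq(struct kvm *kvm, struct my_dev *dev)
    {
            int host_isc;

            /* ask KVM to issue GIB alerts for the guest ISC this adapter uses */
            host_isc = kvm_s390_gisc_register(kvm, dev->guest_isc);
            if (host_isc < 0)
                    return host_isc;        /* registration failed */

            dev->alert_isc = host_isc;      /* ISC on which GIB alert interruptions arrive */
            return 0;
    }

    static void my_disable_passthrough_irq(struct kvm *kvm, struct my_dev *dev)
    {
            kvm_s390_gisc_unregister(kvm, dev->guest_isc);
    }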
-
Committed by Michael Mueller

Adding the kvm reference to struct sie_page2 makes it possible to determine the kvm a given gisa belongs to:

    container_of(gisa, struct sie_page2, gisa)->kvm

This functionality will be required to process a gisa in gib alert interruption context.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-11-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
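For readers unfamiliar with the container_of() idiom relied on here, a minimal, self-contained userspace illustration; the struct layouts below are simplified stand-ins, not the real s390 definitions:

    #include <stdio.h>
    #include <stddef.h>

    /* simplified stand-in for the kernel's container_of() */
    #define container_of(ptr, type, member) \
            ((type *)((char *)(ptr) - offsetof(type, member)))

    struct kvm { int id; };                     /* placeholder */
    struct kvm_s390_gisa { unsigned int ipm; }; /* placeholder */

    struct sie_page2 {
            struct kvm_s390_gisa gisa;          /* embedded GISA */
            struct kvm *kvm;                    /* back-reference added by the patch */
    };

    int main(void)
    {
            struct kvm vm = { .id = 7 };
            struct sie_page2 page = { .kvm = &vm };
            struct kvm_s390_gisa *gisa = &page.gisa;

            /* recover the owning kvm from the gisa pointer alone */
            struct kvm *owner = container_of(gisa, struct sie_page2, gisa)->kvm;

            printf("owner id = %d\n", owner->id);
            return 0;
    }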
-
Committed by Michael Mueller

The Guest Information Block (GIB) links the GISAs of all guests that have adapter interrupts pending. These interrupts cannot be delivered because all vcpus of these guests are currently in WAIT state or have masked the respective Interruption Sub Class (ISC). If enabled, a GIB alert is issued on the host to schedule these guests to run suitable vcpus that consume the pending interruptions. This mechanism makes it possible to process adapter interrupts for guests that are currently not running.

The GIB is created during host initialization and associated with the Adapter Interruption Facility in case an Adapter Interruption Virtualization Facility is available. The GIB initialization, and thus the activation of the related code, will be done in an upcoming patch of this series.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-10-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

This patch implements the Set Guest Information Block operation to request association or disassociation of a Guest Information Block (GIB) with the Adapter Interruption Facility. The operation is required to receive GIB alert interrupts for guest adapters in conjunction with AIV and GISA.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Acked-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Message-Id: <20190131085247.13826-9-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Use this struct analogously to the kvm interruption structs for kvm-emulated floating and local interruptions. GIB handling will add further fields to this structure as required.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-8-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

This will shorten the length of code lines. All GISA-related static inline functions are local to interrupt.c.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-7-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Interruption types that are not represented in the GISA shall use pending_irqs_no_gisa() to test for pending interruptions.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-6-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

The change helps to reduce line length and increases code readability.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-5-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

The vcpu idle_mask state is used by, but not specific to, the emulated floating interruptions. The state is relevant to gisa-related interruptions as well.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Acked-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-4-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

Use a consistent bitmap declaration throughout the code.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-3-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

The explicit else path specified in set_intercept_indicators_io is not required, as the function returns in case the first branch is taken anyway.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Message-Id: <20190131085247.13826-2-mimu@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
Committed by Michael Mueller

As suggested by our ID department, here are some kernel message updates.

Signed-off-by: Michael Mueller <mimu@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
-
- 07 January 2019, 1 commit
-
-
Committed by Linus Torvalds

Commit 594cc251 ("make 'user_access_begin()' do 'access_ok()'") broke both alpha and SH booting in qemu, as noticed by Guenter Roeck.

It turns out that the bug wasn't actually in that commit itself (which would have been surprising: it was mostly a no-op), but in how the addition of access_ok() to the strncpy_from_user() and strnlen_user() functions now triggered the case where those functions would test the access of the very last byte of the user address space.

The string functions actually did that user range test before too, but they did it manually by just comparing against user_addr_max(). But with user_access_begin() doing the check (using "access_ok()"), it now exposed problems in the architecture implementations of that function.

For example, on alpha, the access_ok() helper macro looked like this:

    #define __access_ok(addr, size) \
        ((get_fs().seg & (addr | size | (addr+size))) == 0)

and what it basically tests is if any of the high bits get set (the USER_DS masking value is 0xfffffc0000000000).

And that's completely wrong for the "addr+size" check. Because it's off-by-one for the case where we check to the very end of the user address space, which is exactly what the strn*_user() functions do. Why? Because "addr+size" will be exactly the size of the address space, so trying to access the last byte of the user address space will fail the __access_ok() check, even though it shouldn't. As a result, the user string accessor functions failed consistently - because they literally don't know how long the string is going to be, and the max access is going to be that last byte of the user address space.

Side note: that alpha macro is buggy for another reason too - it re-uses the arguments twice.

And SH has another version of almost the exact same bug:

    #define __addr_ok(addr) \
        ((unsigned long __force)(addr) < current_thread_info()->addr_limit.seg)

so far so good: yes, a user address must be below the limit. But then:

    #define __access_ok(addr, size) \
        (__addr_ok((addr) + (size)))

is wrong with the exact same off-by-one case: the case when "addr+size" is exactly _equal_ to the limit is actually perfectly fine (think "one byte access at the last address of the user address space").

The SH version is actually seriously buggy in another way: it doesn't actually check for overflow, even though it did copy the _comment_ that talks about overflow.

So it turns out that both SH and alpha actually have completely buggy implementations of access_ok(), but they happened to work in practice (although the SH overflow one is a serious serious security bug, not that anybody likely cares about SH security).

This fixes the problems by using a similar macro on both alpha and SH. It isn't trying to be clever, the end address is based on this logic:

    unsigned long __ao_end = __ao_a + __ao_b - !!__ao_b;

which basically says "add start and length, and then subtract one unless the length was zero". We can't subtract one for a zero length, or we'd just hit an underflow instead.

For a lot of access_ok() users the length is a constant, so this isn't actually as expensive as it initially looks.

Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
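A small, self-contained userspace sketch of the off-by-one described above; the limit value and the exact shape of the checks are illustrative, not the actual alpha or SH code:

    #include <stdio.h>

    #define LIMIT 0x1000UL  /* made-up user address space limit */

    /* buggy shape: requires "addr + size" to be strictly below the limit,
     * so an access that ends exactly at the last user byte is rejected */
    static int buggy_ok(unsigned long addr, unsigned long size)
    {
            return addr + size < LIMIT;
    }

    /* fixed shape: compute the last byte actually touched ("add start and
     * length, then subtract one unless the length was zero") and require
     * that it is still below the limit */
    static int fixed_ok(unsigned long addr, unsigned long size)
    {
            unsigned long end = addr + size - !!size;

            return end < LIMIT && end >= addr;  /* second term catches wraparound */
    }

    int main(void)
    {
            /* one-byte access at the very last user address */
            unsigned long addr = LIMIT - 1, size = 1;

            printf("buggy: %d, fixed: %d\n", buggy_ok(addr, size), fixed_ok(addr, size));
            /* prints "buggy: 0, fixed: 1" - the old check wrongly rejects it */
            return 0;
    }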
-
- 06 January 2019, 9 commits
-
-
Committed by Masahiro Yamada

You do not have to use define ... endef for filechk_* rules. For simple cases, the use of an assignment looks cleaner, IMHO. I updated the usage for scripts/Kbuild.include in case somebody misunderstands that 'define ... endef' is a requirement.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
-
Committed by Masahiro Yamada

Now that Kbuild automatically creates asm-generic wrappers for missing mandatory headers, it is redundant to list the same headers in both generic-y and mandatory-y.

Suggested-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
-
Committed by Masahiro Yamada

These comments are leftovers of commit fcc8487d ("uapi: export all headers under uapi directories"). Prior to that commit, exported headers had to be explicitly added to header-y. Now, all headers under the uapi/ directories are exported.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada

This commit removes redundant generic-y defines in arch/riscv/include/asm/Kbuild.

[1] It is redundant to define the same generic-y in both arch/$(ARCH)/include/asm/Kbuild and arch/$(ARCH)/include/uapi/asm/Kbuild. Remove the following generic-y:

    errno.h fcntl.h ioctl.h ioctls.h ipcbuf.h mman.h msgbuf.h param.h
    poll.h posix_types.h resource.h sembuf.h setup.h shmbuf.h signal.h
    socket.h sockios.h stat.h statfs.h swab.h termbits.h termios.h types.h

[2] It is redundant to define generic-y when an arch-specific implementation exists in arch/$(ARCH)/include/asm/*.h. Remove the following generic-y:

    cacheflush.h module.h

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada

filechk_* rules often consist of multiple 'echo' lines. They must be surrounded with { } or ( ) to work correctly. Otherwise, only the string from the last 'echo' would be written into the target. Let's take care of that in the 'filechk' in scripts/Kbuild.include to clean up filechk_* rules.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada

Since commit 9c2af1c7 ("kbuild: add .DELETE_ON_ERROR special target"), the target file is automatically deleted on failure. The boilerplate code

    ... || { rm -f $@; false; }

is unneeded.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada

Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label". The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this:

    #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
    # define HAVE_JUMP_LABEL
    #endif

We can improve this by testing 'asm goto' support in Kconfig, then making JUMP_LABEL depend on CC_HAS_ASM_GOTO. The ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match the real kernel capability.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
-
Committed by Masahiro Yamada

This commit removes redundant generic-y defines in arch/nds32/include/asm/Kbuild.

[1] It is redundant to define the same generic-y in both arch/$(ARCH)/include/asm/Kbuild and arch/$(ARCH)/include/uapi/asm/Kbuild. Remove the following generic-y:

    bitsperlong.h bpf_perf_event.h errno.h fcntl.h ioctl.h ioctls.h
    mman.h shmbuf.h stat.h

[2] It is redundant to define generic-y when an arch-specific implementation exists in arch/$(ARCH)/include/asm/*.h. Remove the following generic-y:

    ftrace.h

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-
Committed by Masahiro Yamada
kernel/dma/Kconfig globally defines HAS_DMA as follows:

    config HAS_DMA
            bool
            depends on !NO_DMA
            default y

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 05 January 2019, 14 commits
-
-
Committed by Christoph Hellwig

In many cases we don't have to create a GART mapping at all, which also means there is nothing to unmap. Fix the range check that was incorrectly modified when removing the mapping_error method.

Fixes: 9e8aa6b5 ("x86/amd_gart: remove the mapping_error dma_map_ops method")
Reported-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
-
Committed by Christoph Hellwig

Some non-generic ia64 configs don't build swiotlb, and thus should not pull in the generic non-coherent DMA infrastructure.

Fixes: 68c60834 ("swiotlb: remove dma_mark_clean")
Reported-by: Tony Luck <tony.luck@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Linus Torvalds

This has been broken forever, and nobody ever really noticed because it's purely a performance issue.

Long long ago, in commit 6175ddf0 ("x86: Clean up mem*io functions") Brian Gerst simplified the memory copies to and from iomem, since on x86, the instructions to access iomem are exactly the same as the regular instructions.

That is technically true, and things worked, and nobody said anything. Besides, back then the regular memcpy was pretty simple and worked fine.

Nobody noticed except for David Laight, that is. David was testing a TLP monitor he was writing for an FPGA, and has been occasionally complaining about how memcpy_toio() writes things one byte at a time. Which is completely unacceptable from a performance standpoint, even if it happens to technically work.

The reason it's writing one byte at a time is because while it's technically true that accesses to iomem are the same as accesses to regular memory on x86, the _granularity_ (and ordering) of accesses matter to iomem in ways that they don't matter to regular cached memory.

In particular, when ERMS is set, we default to using "rep movsb" for larger memory copies. That is indeed perfectly fine for real memory, since the whole point is that the CPU is going to do cacheline optimizations and executes the memory copy efficiently for cached memory.

With iomem? Not so much. With iomem, "rep movsb" will indeed work, but it will copy things one byte at a time. Slowly and ponderously.

Now, originally, back in 2010 when commit 6175ddf0 was done, we didn't use ERMS, and this was much less noticeable. Our normal memcpy() was simpler in other ways too.

Because in fact, it's not just about using the string instructions. Our memcpy() these days does things like "read and write overlapping values" to handle the last bytes of the copy. Again, for normal memory, overlapping accesses isn't an issue. For iomem? It can be.

So this re-introduces the specialized memcpy_toio(), memcpy_fromio() and memset_io() functions. It doesn't particularly optimize them, but it tries to at least not be horrid, or do overlapping accesses. In fact, this uses the existing __inline_memcpy() function that we still had lying around that uses our very traditional "rep movsl" loop followed by movsw/movsb for the final bytes.

Somebody may decide to try to improve on it, but if we've gone almost a decade with only one person really ever noticing and complaining, maybe it's not worth worrying about further, once it's not _completely_ broken?

Reported-by: David Laight <David.Laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Linus Torvalds
This actually enables the __put_user_goto() functionality in unsafe_put_user().

For an example of the effect of this, this is the code generated for the

    unsafe_put_user(signo, &infop->si_signo, Efault);

in the waitid() system call:

    movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_2]

It's just one single store instruction, along with generating an exception table entry pointing to the Efault label case in case that instruction faults.

Before, we would generate this:

    xorl %edx, %edx
    movl %ecx,(%rbx)        # signo, MEM[(struct __large_struct *)_3]
    testl %edx, %edx
    jne .L309

with the exception table generated for that 'mov' instruction causing us to jump to a stub that set %edx to -EFAULT and then jumped back to the 'testl' instruction.

So not only do we now get rid of the extra code in the normal sequence, we also avoid unnecessarily keeping that extra error register live across it all.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Linus Torvalds

This is finally the actual reason for the odd error handling in the "unsafe_get/put_user()" functions, introduced over three years ago.

Using a "jump to error label" interface is somewhat odd, but very convenient as a programming interface, and more importantly, it fits very well with simply making the target be the exception handler address directly from the inline asm.

The reason it took over three years to actually do this? We need "asm goto" support for it, which only became the default on x86 last year. It's now been a year that we've forced asm goto support (see commit e501ce95 "x86: Force asm-goto"), and so let's just do it here too.

[ Side note: this commit was originally done back in 2016. The above commentary about timing is obviously about it only now getting merged into my real upstream tree - Linus ]

Sadly, gcc still only supports "asm goto" with asms that do not have any outputs, so we are limited to only the put_user case for this. Maybe in several more years we can do the get_user case too.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
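For readers unfamiliar with the "asm goto" construct this depends on, a minimal compilable sketch (gcc, x86-64); check_nonzero() and its label are invented purely for illustration:

    #include <stdio.h>

    /* the inline asm may branch directly to a C label ("is_zero"), so no
     * error value has to be produced and tested afterwards; as noted in
     * the commit, the goto form cannot have output operands */
    static int check_nonzero(unsigned long v)
    {
            asm goto("test %0, %0\n\t"
                     "jz %l[is_zero]"
                     : /* no outputs permitted with asm goto */
                     : "r" (v)
                     : "cc"
                     : is_zero);
            return 1;

    is_zero:
            return 0;
    }

    int main(void)
    {
            printf("%d %d\n", check_nonzero(5), check_nonzero(0)); /* prints "1 0" */
            return 0;
    }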
-
Committed by Helge Deller

The alternative coding patch for parisc in kernel 4.20 broke booting machines with PA8500-PA8700 CPUs. The problem is that for such machines the parisc kernel automatically uses huge pages to access kernel text code, but the set_kernel_text_rw() function, which is used shortly before applying any alternative patches, didn't use correctly hugepage-aligned addresses to remap the kernel text read-writeable.

Fixes: 3847dab7 ("parisc: Add alternative coding infrastructure")
Cc: <stable@vger.kernel.org> [4.20]
Signed-off-by: Helge Deller <deller@gmx.de>
-
Committed by Masahiro Yamada

Enable the UniPhier MIO DMAC driver. This is used as the DMA engine for accelerating the SD/eMMC controller drivers.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
-
Committed by Davidlohr Bueso
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joel Fernandes (Google)

Moving page tables at the PMD level on x86 is known to be safe. Enable this option so that we can do fast mremap when possible.

Link: http://lkml.kernel.org/r/20181108181201.88826-4-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Joel Fernandes (Google)

Android needs to mremap large regions of memory during memory-management-related operations. The mremap system call can be really slow if THP is not enabled. The bottleneck is move_page_tables, which copies one pte at a time, and can be really slow across a large map. Turning on THP may not be a viable option, and is not for us. This patch speeds up the performance for non-THP systems by copying at the PMD level when possible.

The speedup is an order of magnitude on x86 (~20x). On a 1GB mremap, the mremap completion time drops from 3.4-3.6 milliseconds to 144-160 microseconds.

Before:
    Total mremap time for 1GB data: 3521942 nanoseconds.
    Total mremap time for 1GB data: 3449229 nanoseconds.
    Total mremap time for 1GB data: 3488230 nanoseconds.

After:
    Total mremap time for 1GB data: 150279 nanoseconds.
    Total mremap time for 1GB data: 144665 nanoseconds.
    Total mremap time for 1GB data: 158708 nanoseconds.

If THP is enabled the optimization is mostly skipped except in certain situations.

[joel@joelfernandes.org: fix 'move_normal_pmd' unused function warning]
Link: http://lkml.kernel.org/r/20181108224457.GB209347@google.com
Link: http://lkml.kernel.org/r/20181108181201.88826-3-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
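A rough userspace sketch of the kind of measurement quoted above; the size, flags and timing approach are illustrative and do not reproduce the author's exact benchmark:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <time.h>

    int main(void)
    {
            size_t len = 1UL << 30;         /* 1 GiB */
            struct timespec t0, t1;

            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (p == MAP_FAILED) { perror("mmap"); return 1; }
            memset(p, 1, len);              /* fault pages in so page tables exist */

            /* reserve a destination so the pages are forced to actually move */
            void *dst = mmap(NULL, len, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (dst == MAP_FAILED) { perror("mmap dst"); return 1; }

            clock_gettime(CLOCK_MONOTONIC, &t0);
            void *q = mremap(p, len, len, MREMAP_MAYMOVE | MREMAP_FIXED, dst);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            if (q == MAP_FAILED) { perror("mremap"); return 1; }

            long ns = (t1.tv_sec - t0.tv_sec) * 1000000000L + (t1.tv_nsec - t0.tv_nsec);
            printf("Total mremap time for 1GB data: %ld nanoseconds.\n", ns);
            return 0;
    }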
-
Committed by Joel Fernandes (Google)

Patch series "Add support for fast mremap".

This series speeds up the mremap(2) syscall by copying page tables at the PMD level even for non-THP systems. There is concern that the extra 'address' argument that mremap passes to pte_alloc may do something subtle, architecture-related in the future that may make the scheme not work. Also we find that there is no point in passing the 'address' to pte_alloc since it is unused. This patch therefore removes this argument tree-wide, resulting in a nice negative diff as well. Along the way, it also ensures that the enabled architectures do not do anything funky with the 'address' argument that goes unnoticed by the optimization.

Build and boot tested on x86-64. Build tested on arm64. The config enablement patch for arm64 will be posted in the future after more testing.

The changes were obtained by applying the following Coccinelle script (thanks Julia for answering all Coccinelle questions!). The following fix-ups were done manually:
* Removal of address argument from pte_fragment_alloc
* Removal of pte_alloc_one_fast definitions from m68k and microblaze.

// Options: --include-headers --no-includes
// Note: I split the 'identifier fn' line, so if you are manually
// running it, please unsplit it so it runs for you.

virtual patch

@pte_alloc_func_def depends on patch exists@
identifier E2;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
type T2;
@@

fn(...
- , T2 E2
 )
{ ... }

@pte_alloc_func_proto_noarg depends on patch exists@
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1, T2);
+ T3 fn(T1);
|
- T3 fn(T1, T2, T4);
+ T3 fn(T1, T2);
)

@pte_alloc_func_proto depends on patch exists@
identifier E1, E2, E4;
type T1, T2, T3, T4;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

(
- T3 fn(T1 E1, T2 E2);
+ T3 fn(T1 E1);
|
- T3 fn(T1 E1, T2 E2, T4 E4);
+ T3 fn(T1 E1, T2 E2);
)

@pte_alloc_func_call depends on patch exists@
expression E2;
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
@@

fn(...
-, E2
 )

@pte_alloc_macro depends on patch exists@
identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
identifier a, b, c;
expression e;
position p;
@@

(
- #define fn(a, b, c) e
+ #define fn(a, b) e
|
- #define fn(a, b) e
+ #define fn(a) e
)

Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Matthew Wilcox
When testing in userspace, UBSAN pointed out that shifting into the sign bit is undefined behaviour. It doesn't really make sense to ask for the highest set bit of a negative value, so just turn the argument type into an unsigned int. Some architectures (eg ppc) already had it declared as an unsigned int, so I don't expect too many problems.

Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
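A small, self-contained sketch of the idea; fls_demo() is a generic loop-based "find last set bit", not the kernel's optimized fls() implementation:

    #include <stdio.h>

    /* returns the 1-based position of the most significant set bit, or 0 if
     * no bit is set; taking an unsigned int means a negative caller value is
     * converted in a well-defined way instead of asking for the highest set
     * bit of a negative number */
    static int fls_demo(unsigned int x)
    {
            int r = 0;

            while (x) {
                    x >>= 1;
                    r++;
            }
            return r;
    }

    int main(void)
    {
            printf("%d %d %d\n", fls_demo(1), fls_demo(0x80000000u), fls_demo(0));
            /* prints "1 32 0" */
            return 0;
    }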
-
Committed by Linus Torvalds

Originally, the rule used to be that you'd have to do access_ok() separately, and then user_access_begin() before actually doing the direct (optimized) user access.

But experience has shown that people then decide not to do access_ok() at all, and instead rely on it being implied by other operations or similar. Which makes it very hard to verify that the access has actually been range-checked.

If you use the unsafe direct user accesses, hardware features (either SMAP - Supervisor Mode Access Protection - on x86, or PAN - Privileged Access Never - on ARM) do force you to use user_access_begin(). But nothing really forces the range check.

By putting the range check into user_access_begin(), we actually force people to do the right thing (tm), and the range check will be visible near the actual accesses. We have way too long a history of people trying to avoid them.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
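A minimal kernel-context sketch of the calling pattern this implies; put_status() and its error handling are invented for illustration, while user_access_begin()/user_access_end() and unsafe_put_user() are the interfaces discussed above:

    /* kernel-context sketch, not a buildable userspace program */
    static int put_status(int __user *uptr, int status)
    {
            /* user_access_begin() now performs the access_ok() range check itself */
            if (!user_access_begin(uptr, sizeof(*uptr)))
                    return -EFAULT;

            /* a fault in this store jumps straight to the Efault label */
            unsafe_put_user(status, uptr, Efault);

            user_access_end();
            return 0;

    Efault:
            user_access_end();
            return -EFAULT;
    }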
-
Committed by Linus Torvalds

These two architectures actually had an intentional use of the 'type' argument to access_ok() just to avoid warnings. I had actually noticed the powerpc one, but forgot to then fix it up. And I missed the sparc32 case entirely. This is hopefully all of it.

Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 96d4f267 ("Remove 'type' argument from access_ok() function")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-