- 19 10月, 2016 3 次提交
-
-
由 Lorenzo Stoakes 提交于
This removes the 'write' and 'force' use from get_user_pages_locked() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lorenzo Stoakes 提交于
This removes the 'write' and 'force' use from get_user_pages_unlocked() and replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lorenzo Stoakes 提交于
This removes the redundant 'write' and 'force' parameters from __get_user_pages_unlocked() to make the use of FOLL_FORCE explicit in callers as use of this flag can result in surprising behaviour (and hence bugs) within the mm subsystem. Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 7月, 2016 1 次提交
-
-
由 Kirill A. Shutemov 提交于
The idea borrowed from Peter's patch from patchset on speculative page faults[1]: Instead of passing around the endless list of function arguments, replace the lot with a single structure so we can change context without endless function signature changes. The changes are mostly mechanical with exception of faultaround code: filemap_map_pages() got reworked a bit. This patch is preparation for the next one. [1] http://lkml.kernel.org/r/20141020222841.302891540@infradead.org Link: http://lkml.kernel.org/r/1466021202-61880-9-git-send-email-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 5月, 2016 1 次提交
-
-
由 Linus Torvalds 提交于
The do_brk() and vm_brk() return value was "unsigned long" and returned the starting address on success, and an error value on failure. The reasons are entirely historical, and go back to it basically behaving like the mmap() interface does. However, nobody actually wanted that interface, and it causes totally pointless IS_ERR_VALUE() confusion. What every single caller actually wants is just the simpler integer return of zero for success and negative error number on failure. So just convert to that much clearer and more common calling convention, and get rid of all the IS_ERR_VALUE() uses wrt vm_brk(). Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 5月, 2016 2 次提交
-
-
由 Michal Hocko 提交于
All the callers of vm_mmap seem to check for the failure already and bail out in one way or another on the error which means that we can change it to use killable version of vm_mmap_pgoff and return -EINTR if the current task gets killed while waiting for mmap_sem. This also means that vm_mmap_pgoff can be killable by default and drop the additional parameter. This will help in the OOM conditions when the oom victim might be stuck waiting for the mmap_sem for write which in turn can block oom_reaper which relies on the mmap_sem for read to make a forward progress and reclaim the address space of the victim. Please note that load_elf_binary is ignoring vm_mmap error for current->personality & MMAP_PAGE_ZERO case but that shouldn't be a problem because the address is not used anywhere and we never return to the userspace if we got killed. Signed-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
This is a follow up work for oom_reaper [1]. As the async OOM killing depends on oom_sem for read we would really appreciate if a holder for write didn't stood in the way. This patchset is changing many of down_write calls to be killable to help those cases when the writer is blocked and waiting for readers to release the lock and so help __oom_reap_task to process the oom victim. Most of the patches are really trivial because the lock is help from a shallow syscall paths where we can return EINTR trivially and allow the current task to die (note that EINTR will never get to the userspace as the task has fatal signal pending). Others seem to be easy as well as the callers are already handling fatal errors and bail and return to userspace which should be sufficient to handle the failure gracefully. I am not familiar with all those code paths so a deeper review is really appreciated. As this work is touching more areas which are not directly connected I have tried to keep the CC list as small as possible and people who I believed would be familiar are CCed only to the specific patches (all should have received the cover though). This patchset is based on linux-next and it depends on down_write_killable for rw_semaphores which got merged into tip locking/rwsem branch and it is merged into this next tree. I guess it would be easiest to route these patches via mmotm because of the dependency on the tip tree but if respective maintainers prefer other way I have no objections. I haven't covered all the mmap_write(mm->mmap_sem) instances here $ git grep "down_write(.*\<mmap_sem\>)" next/master | wc -l 98 $ git grep "down_write(.*\<mmap_sem\>)" | wc -l 62 I have tried to cover those which should be relatively easy to review in this series because this alone should be a nice improvement. Other places can be changed on top. [0] http://lkml.kernel.org/r/1456752417-9626-1-git-send-email-mhocko@kernel.org [1] http://lkml.kernel.org/r/1452094975-551-1-git-send-email-mhocko@kernel.org [2] http://lkml.kernel.org/r/1456750705-7141-1-git-send-email-mhocko@kernel.org This patch (of 18): This is the first step in making mmap_sem write waiters killable. It focuses on the trivial ones which are taking the lock early after entering the syscall and they are not changing state before. Therefore it is very easy to change them to use down_write_killable and immediately return with -EINTR. This will allow the waiter to pass away without blocking the mmap_sem which might be required to make a forward progress. E.g. the oom reaper will need the lock for reading to dismantle the OOM victim address space. The only tricky function in this patch is vm_mmap_pgoff which has many call sites via vm_mmap. To reduce the risk keep vm_mmap with the original non-killable semantic for now. vm_munmap callers do not bother checking the return value so open code it into the munmap syscall path for now for simplicity. Signed-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 4月, 2016 1 次提交
-
-
由 Ingo Molnar 提交于
The pkeys changes brought about a truly hideous set of macros in: cde70140 ("mm/gup: Overload get_user_pages() functions") ... which macros are (ab-)using the fact that __VA_ARGS__ can be used to shift parameter positions in macro arguments without breaking the build and so can be used to call separate C functions depending on the number of arguments of the macro. This allowed easy migration of these 3 GUP APIs, as both these variants worked at the C level: old: ret = get_user_pages(current, current->mm, address, 1, 1, 0, &page, NULL); new: ret = get_user_pages(address, 1, 1, 0, &page, NULL); ... while we also generated a (functionally harmless but noticeable) build time warning if the old API was used. As there are over 300 uses of these APIs, this trick eased the migration of the API and avoided excessive migration pain in linux-next. Now, with its work done, get rid of all of that complication and ugliness: 3 files changed, 16 insertions(+), 140 deletions(-) ... where the linecount of the migration hack was further inflated by the fact that there are NOMMU variants of these GUP APIs as well. Much of the conversion was done in linux-next over the past couple of months, and Linus recently removed all remaining old API uses from the upstream tree in the following upstrea commit: cb107161 ("Convert straggling drivers to new six-argument get_user_pages()") There was one more old-API usage in mm/gup.c, in the CONFIG_HAVE_GENERIC_RCU_GUP code path that ARM, ARM64 and PowerPC uses. After this commit any old API usage will break the build. [ Also fixed a PowerPC/HAVE_GENERIC_RCU_GUP warning reported by Stephen Rothwell. ] Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 05 4月, 2016 1 次提交
-
-
由 Kirill A. Shutemov 提交于
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 3月, 2016 2 次提交
-
-
由 Andrey Ryabinin 提交于
Currently we have two copies of the same code which implements memory overcommitment logic. Let's move it into mm/util.c and hence avoid duplication. No functional changes here. Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
max_map_count sysctl unrelated to scheduler. Move its bits from include/linux/sched/sysctl.h to include/linux/mm.h. Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 2月, 2016 1 次提交
-
-
由 Dave Hansen 提交于
This plumbs a protection key through calc_vm_flag_bits(). We could have done this in calc_vm_prot_bits(), but I did not feel super strongly which way to go. It was pretty arbitrary which one to use. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arve Hjønnevåg <arve@android.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave@sr71.net> Cc: David Airlie <airlied@linux.ie> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Geliang Tang <geliangtang@163.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Leon Romanovsky <leon@leon.nu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Riley Andrews <riandrews@android.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: devel@driverdev.osuosl.org Cc: linux-api@vger.kernel.org Cc: linux-arch@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20160212210231.E6F1F0D6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 2月, 2016 1 次提交
-
-
由 Dave Hansen 提交于
The concept here was a suggestion from Ingo. The implementation horrors are all mine. This allows get_user_pages(), get_user_pages_unlocked(), and get_user_pages_locked() to be called with or without the leading tsk/mm arguments. We will give a compile-time warning about the old style being __deprecated and we will also WARN_ON() if the non-remote version is used for a remote-style access. Doing this, folks will get nice warnings and will not break the build. This should be nice for -next and will hopefully let developers fix up their own code instead of maintainers needing to do it at merge time. The way we do this is hideous. It uses the __VA_ARGS__ macro functionality to call different functions based on the number of arguments passed to the macro. There's an additional hack to ensure that our EXPORT_SYMBOL() of the deprecated symbols doesn't trigger a warning. We should be able to remove this mess as soon as -rc1 hits in the release after this is merged. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Dominik Dingel <dingel@linux.vnet.ibm.com> Cc: Geliang Tang <geliangtang@163.com> Cc: Jan Kara <jack@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Leon Romanovsky <leon@leon.nu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xie XiuQi <xiexiuqi@huawei.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20160212210155.73222EE1@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 1月, 2016 1 次提交
-
-
由 Vladimir Davydov 提交于
Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 11月, 2015 2 次提交
-
-
由 Geliang Tang 提交于
(1) For !CONFIG_BUG cases, the bug call is a no-op, so we couldn't care less and the change is ok. (2) ppc and mips, which HAVE_ARCH_BUG_ON, do not rely on branch predictions as it seems to be pointless[1] and thus callers should not be trying to push an optimization in the first place. (3) For CONFIG_BUG and !HAVE_ARCH_BUG_ON cases, BUG_ON() contains an unlikely compiler flag already. Hence, we can drop unlikely behind BUG_ON(). [1] http://lkml.iu.edu/hypermail/linux/kernel/1101.3/02289.htmlSigned-off-by: NGeliang Tang <geliangtang@163.com> Acked-by: NDavidlohr Bueso <dave@stgolabs.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexander Kuleshov 提交于
linux/mm.h provides offset_in_page() macro. Let's use already predefined macro instead of (addr & ~PAGE_MASK). Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 9月, 2015 1 次提交
-
-
由 Oleg Nesterov 提交于
Add the additional "vm_flags_t vm_flags" argument to do_mmap_pgoff(), rename it to do_mmap(), and re-introduce do_mmap_pgoff() as a simple wrapper on top of do_mmap(). Perhaps we should update the callers of do_mmap_pgoff() and kill it later. This way mpx_mmap() can simply call do_mmap(vm_flags => VM_MPX) and do not play with vm internals. After this change mmap_region() has a single user outside of mmap.c, arch/tile/mm/elf.c:arch_setup_additional_pages(). It would be nice to change arch/tile/ and unexport mmap_region(). [kirill@shutemov.name: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NDave Hansen <dave.hansen@linux.intel.com> Tested-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Minchan Kim <minchan@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 8月, 2015 1 次提交
-
-
由 Masahiro Yamada 提交于
Looks like the word "contiguous" is often mistyped. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NJiri Kosina <jkosina@suse.com>
-
- 10 7月, 2015 1 次提交
-
-
由 Eric W. Biederman 提交于
Today proc and sysfs do not contain any executable files. Several applications today mount proc or sysfs without noexec and nosuid and then depend on there being no exectuables files on proc or sysfs. Having any executable files show on proc or sysfs would cause a user space visible regression, and most likely security problems. Therefore commit to never allowing executables on proc and sysfs by adding a new flag to mark them as filesystems without executables and enforce that flag. Test the flag where MNT_NOEXEC is tested today, so that the only user visible effect will be that exectuables will be treated as if the execute bit is cleared. The filesystems proc and sysfs do not currently incoporate any executable files so this does not result in any user visible effects. This makes it unnecessary to vet changes to proc and sysfs tightly for adding exectuable files or changes to chattr that would modify existing files, as no matter what the individual file say they will not be treated as exectuable files by the vfs. Not having to vet changes to closely is important as without this we are only one proc_create call (or another goof up in the implementation of notify_change) from having problematic executables on proc. Those mistakes are all too easy to make and would create a situation where there are security issues or the assumptions of some program having to be broken (and cause userspace regressions). Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 25 6月, 2015 1 次提交
-
-
由 Leon Romanovsky 提交于
kenter/kleave/kdebug are wrapper macros to print functions flow and debug information. This set was written before pr_devel() was introduced, so it was controlled by "#if 0" construction. It is questionable if anyone is using them [1] now. This patch removes these macros, converts numerous printk(KERN_WARNING, ...) to use general pr_warn(...) and removes debug print line from validate_mmap_request() function. Signed-off-by: NLeon Romanovsky <leon@leon.nu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 17 6月, 2015 1 次提交
-
-
由 Paul Gortmaker 提交于
Compiling some arm/m68k configs with "# CONFIG_MMU is not set" reveals two more instances of module_init being used for code that can't possibly be modular, as CONFIG_MMU is either on or off. We replace them with subsys_initcall as per what was done in other mmu-enabled code. Note that direct use of __initcall is discouraged, vs. one of the priority categorized subgroups. As __initcall gets mapped onto device_initcall, our use of subsys_initcall (which makes sense for these files) will thus change this registration from level 6-device to level 4-subsys (i.e. slightly earlier). One might think that core_initcall (l2) or postcore_initcall (l3) would be more appropriate for anything in mm/ but if we look at the actual init functions themselves, we see they are just sysctl setup stuff, and hence the choice of subsys_initcall (l4) seems reasonable. At the same time it minimizes the risk of changing the priority too drastically all at once. We can adjust further in the future. Also, a couple instances of missing ";" at EOL are fixed. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-mm@kvack.org Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
-
- 12 4月, 2015 1 次提交
-
-
由 Al Viro 提交于
... instead of open-coding the call of ->read() Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 3月, 2015 1 次提交
-
-
由 gchen gchen 提交于
Several modules may need max_mapnr, so export, the related error with allmodconfig under c6x: MODPOST 3327 modules ERROR: "max_mapnr" [fs/pstore/ramoops.ko] undefined! ERROR: "max_mapnr" [drivers/media/v4l2-core/videobuf2-dma-contig.ko] undefined! Signed-off-by: NChen Gang <gang.chen.5i5j@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 3月, 2015 1 次提交
-
-
由 Joonsoo Kim 提交于
Maxime reported the following memory leak regression due to commit dbc8358c ("mm/nommu: use alloc_pages_exact() rather than its own implementation"). On v3.19, I am facing a memory leak. Each time I run a command one page is lost. Here an example with busybox's free command: / # free total used free shared buffers cached Mem: 7928 1972 5956 0 0 492 -/+ buffers/cache: 1480 6448 / # free total used free shared buffers cached Mem: 7928 1976 5952 0 0 492 -/+ buffers/cache: 1484 6444 / # free total used free shared buffers cached Mem: 7928 1980 5948 0 0 492 -/+ buffers/cache: 1488 6440 / # free total used free shared buffers cached Mem: 7928 1984 5944 0 0 492 -/+ buffers/cache: 1492 6436 / # free total used free shared buffers cached Mem: 7928 1988 5940 0 0 492 -/+ buffers/cache: 1496 6432 At some point, the system fails to sastisfy 256KB allocations: free: page allocation failure: order:6, mode:0xd0 CPU: 0 PID: 67 Comm: free Not tainted 3.19.0-05389-gacf2cf1-dirty #64 Hardware name: STM32 (Device Tree Support) show_stack+0xb/0xc warn_alloc_failed+0x97/0xbc __alloc_pages_nodemask+0x295/0x35c __get_free_pages+0xb/0x24 alloc_pages_exact+0x19/0x24 do_mmap_pgoff+0x423/0x658 vm_mmap_pgoff+0x3f/0x4e load_flat_file+0x20d/0x4f8 load_flat_binary+0x3f/0x26c search_binary_handler+0x51/0xe4 do_execveat_common+0x271/0x35c do_execve+0x19/0x1c ret_fast_syscall+0x1/0x4a Mem-info: Normal per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:123 dirty:0 writeback:0 unstable:0 free:1515 slab_reclaimable:17 slab_unreclaimable:139 mapped:0 shmem:0 pagetables:0 bounce:0 free_cma:0 Normal free:6060kB min:352kB low:440kB high:528kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:492kB isolated(anon):0ks lowmem_reserve[]: 0 0 Normal: 23*4kB (U) 22*8kB (U) 24*16kB (U) 23*32kB (U) 23*64kB (U) 23*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6060kB 123 total pagecache pages 2048 pages of RAM 1538 free pages 66 reserved pages 109 slab pages -46 pages shared 0 pages swap cached nommu: Allocation of length 221184 from process 67 (free) failed Normal per-cpu: CPU 0: hi: 0, btch: 1 usd: 0 active_anon:0 inactive_anon:0 isolated_anon:0 active_file:0 inactive_file:0 isolated_file:0 unevictable:123 dirty:0 writeback:0 unstable:0 free:1515 slab_reclaimable:17 slab_unreclaimable:139 mapped:0 shmem:0 pagetables:0 bounce:0 free_cma:0 Normal free:6060kB min:352kB low:440kB high:528kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:492kB isolated(anon):0ks lowmem_reserve[]: 0 0 Normal: 23*4kB (U) 22*8kB (U) 24*16kB (U) 23*32kB (U) 23*64kB (U) 23*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 6060kB 123 total pagecache pages Unable to allocate RAM for process text/data, errno 12 SEGV This problem happens because we allocate ordered page through __get_free_pages() in do_mmap_private() in some cases and we try to free individual pages rather than ordered page in free_page_series(). In this case, freeing pages whose refcount is not 0 won't be freed to the page allocator so memory leak happens. To fix the problem, this patch changes __get_free_pages() to alloc_pages_exact() since alloc_pages_exact() returns physically-contiguous pages but each pages are refcounted. Fixes: dbc8358c ("mm/nommu: use alloc_pages_exact() rather than its own implementation"). Reported-by: NMaxime Coquelin <mcoquelin.stm32@gmail.com> Tested-by: NMaxime Coquelin <mcoquelin.stm32@gmail.com> Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <stable@vger.kernel.org> [3.19] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 2月, 2015 3 次提交
-
-
由 Roman Gushchin 提交于
I noticed that "allowed" can easily overflow by falling below 0, because (total_vm / 32) can be larger than "allowed". The problem occurs in OVERCOMMIT_NONE mode. In this case, a huge allocation can success and overcommit the system (despite OVERCOMMIT_NONE mode). All subsequent allocations will fall (system-wide), so system become unusable. The problem was masked out by commit c9b1d098 ("mm: limit growth of 3% hardcoded other user reserve"), but it's easy to reproduce it on older kernels: 1) set overcommit_memory sysctl to 2 2) mmap() large file multiple times (with VM_SHARED flag) 3) try to malloc() large amount of memory It also can be reproduced on newer kernels, but miss-configured sysctl_user_reserve_kbytes is required. Fix this issue by switching to signed arithmetic here. Signed-off-by: NRoman Gushchin <klamm@yandex-team.ru> Cc: Andrew Shewmaker <agshew@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
Some callers (like KVM) may want to set the gup_flags like FOLL_HWPOSION to get a proper -EHWPOSION retval instead of -EFAULT to take a more appropriate action if get_user_pages runs into a memory failure. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reviewed-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Peter Feiner <pfeiner@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
FAULT_FOLL_ALLOW_RETRY allows the page fault to drop the mmap_sem for reading to reduce the mmap_sem contention (for writing), like while waiting for I/O completion. The problem is that right now practically no get_user_pages call uses FAULT_FOLL_ALLOW_RETRY, so we're not leveraging that nifty feature. Andres fixed it for the KVM page fault. However get_user_pages_fast remains uncovered, and 99% of other get_user_pages aren't using it either (the only exception being FOLL_NOWAIT in KVM which is really nonblocking and in fact it doesn't even release the mmap_sem). So this patchsets extends the optimization Andres did in the KVM page fault to the whole kernel. It makes most important places (including gup_fast) to use FAULT_FOLL_ALLOW_RETRY to reduce the mmap_sem hold times during I/O. The only few places that remains uncovered are drivers like v4l and other exceptions that tends to work on their own memory and they're not working on random user memory (for example like O_DIRECT that uses gup_fast and is fully covered by this patch). A follow up patch should probably also add a printk_once warning to get_user_pages that should go obsolete and be phased out eventually. The "vmas" parameter of get_user_pages makes it fundamentally incompatible with FAULT_FOLL_ALLOW_RETRY (vmas array becomes meaningless the moment the mmap_sem is released). While this is just an optimization, this becomes an absolute requirement for the userfaultfd feature http://lwn.net/Articles/615086/ . The userfaultfd allows to block the page fault, and in order to do so I need to drop the mmap_sem first. So this patch also ensures that all memory where userfaultfd could be registered by KVM, the very first fault (no matter if it is a regular page fault, or a get_user_pages) always has FAULT_FOLL_ALLOW_RETRY set. Then the userfaultfd blocks and it is waken only when the pagetable is already mapped. The second fault attempt after the wakeup doesn't need FAULT_FOLL_ALLOW_RETRY, so it's ok to retry without it. This patch (of 5): We can leverage the VM_FAULT_RETRY functionality in the page fault paths better by using either get_user_pages_locked or get_user_pages_unlocked. The former allows conversion of get_user_pages invocations that will have to pass a "&locked" parameter to know if the mmap_sem was dropped during the call. Example from: down_read(&mm->mmap_sem); do_something() get_user_pages(tsk, mm, ..., pages, NULL); up_read(&mm->mmap_sem); to: int locked = 1; down_read(&mm->mmap_sem); do_something() get_user_pages_locked(tsk, mm, ..., pages, &locked); if (locked) up_read(&mm->mmap_sem); The latter is suitable only as a drop in replacement of the form: down_read(&mm->mmap_sem); get_user_pages(tsk, mm, ..., pages, NULL); up_read(&mm->mmap_sem); into: get_user_pages_unlocked(tsk, mm, ..., pages); Where tsk, mm, the intermediate "..." paramters and "pages" can be any value as before. Just the last parameter of get_user_pages (vmas) must be NULL for get_user_pages_locked|unlocked to be usable (the latter original form wouldn't have been safe anyway if vmas wasn't null, for the former we just make it explicit by dropping the parameter). If vmas is not NULL these two methods cannot be used. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reviewed-by: NAndres Lagar-Cavilla <andreslc@google.com> Reviewed-by: NPeter Feiner <pfeiner@google.com> Reviewed-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2015 1 次提交
-
-
由 Kirill A. Shutemov 提交于
remap_file_pages(2) was invented to be able efficiently map parts of huge file into limited 32-bit virtual address space such as in database workloads. Nonlinear mappings are pain to support and it seems there's no legitimate use-cases nowadays since 64-bit systems are widely available. Let's drop it and get rid of all these special-cased code. The patch replaces the syscall with emulation which creates new VMA on each remap_file_pages(), unless they it can be merged with an adjacent one. I didn't find *any* real code that uses remap_file_pages(2) to test emulation impact on. I've checked Debian code search and source of all packages in ALT Linux. No real users: libc wrappers, mentions in strace, gdb, valgrind and this kind of stuff. There are few basic tests in LTP for the syscall. They work just fine with emulation. To test performance impact, I've written small test case which demonstrate pretty much worst case scenario: map 4G shmfs file, write to begin of every page pgoff of the page, remap pages in reverse order, read every page. The test creates 1 million of VMAs if emulation is in use, so I had to set vm.max_map_count to 1100000 to avoid -ENOMEM. Before: 23.3 ( +- 4.31% ) seconds After: 43.9 ( +- 0.85% ) seconds Slowdown: 1.88x I believe we can live with that. Test case: #define _GNU_SOURCE #include <assert.h> #include <stdlib.h> #include <stdio.h> #include <sys/mman.h> #define MB (1024UL * 1024) #define SIZE (4096 * MB) int main(int argc, char **argv) { unsigned long *p; long i, pass; for (pass = 0; pass < 10; pass++) { p = mmap(NULL, SIZE, PROT_READ|PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0); if (p == MAP_FAILED) { perror("mmap"); return -1; } for (i = 0; i < SIZE / 4096; i++) p[i * 4096 / sizeof(*p)] = i; for (i = 0; i < SIZE / 4096; i++) { if (remap_file_pages(p + i * 4096 / sizeof(*p), 4096, 0, (SIZE - 4096 * (i + 1)) >> 12, 0)) { perror("remap_file_pages"); return -1; } } for (i = SIZE / 4096 - 1; i >= 0; i--) assert(p[i * 4096 / sizeof(*p)] == SIZE / 4096 - i - 1); munmap(p, SIZE); } return 0; } [akpm@linux-foundation.org: fix spello] [sasha.levin@oracle.com: initialize populate before usage] [sasha.levin@oracle.com: grab file ref to prevent race while mmaping] Signed-off-by: N"Kirill A. Shutemov" <kirill@shutemov.name> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Armin Rigo <arigo@tunes.org> Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 2月, 2015 1 次提交
-
-
由 Arnd Bergmann 提交于
The symbol 'high_memory' is provided on both MMU- and NOMMU-kernels, but only one of them is exported, which leads to module build errors in drivers that work fine built-in: ERROR: "high_memory" [drivers/net/virtio_net.ko] undefined! ERROR: "high_memory" [drivers/net/ppp/ppp_mppe.ko] undefined! ERROR: "high_memory" [drivers/mtd/nand/nand.ko] undefined! ERROR: "high_memory" [crypto/tcrypt.ko] undefined! ERROR: "high_memory" [crypto/cts.ko] undefined! This exports the symbol to get these to work on NOMMU as well. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NGreg Ungerer <gerg@uclinux.org> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 1月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
Since "BDI: Provide backing device capability information [try #3]" the backing_dev_info structure also provides flags for the kind of mmap operation available in a nommu environment, which is entirely unrelated to it's original purpose. Introduce a new nommu-only file operation to provide this information to the nommu mmap code instead. Splitting this from the backing_dev_info structure allows to remove lots of backing_dev_info instance that aren't otherwise needed, and entirely gets rid of the concept of providing a backing_dev_info for a character device. It also removes the need for the mtd_inodefs filesystem. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NTejun Heo <tj@kernel.org> Acked-by: NBrian Norris <computersforpeace@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 12月, 2014 3 次提交
-
-
由 Joonsoo Kim 提交于
do_mmap_private() in nommu.c try to allocate physically contiguous pages with arbitrary size in some cases and we now have good abstract function to do exactly same thing, alloc_pages_exact(). So, change to use it. There is no functional change. This is the preparation step for support page owner feature accurately. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dave Hansen <dave@sr71.net> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Jungsoo Son <jungsoo.son@lge.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Shrinking/truncate logic can call nommu_shrink_inode_mappings() to verify that any shared mappings of the inode in question aren't broken (dead zone). afaict the only user being ramfs to handle the size change attribute. Pretty much a no-brainer to share the lock. Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Acked-by: N"Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: NHugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Convert all open coded mutex_lock/unlock calls to the i_mmap_[lock/unlock]_write() helpers. Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Acked-by: NRik van Riel <riel@redhat.com> Acked-by: N"Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: NHugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 9月, 2014 1 次提交
-
-
由 Tejun Heo 提交于
Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_counters too. We could have left percpu_counter_init() alone and added percpu_counter_init_gfp(); however, the number of users isn't that high and introducing _gfp variants to all percpu data structures would be quite ugly, so let's just do the conversion. This is the one with the most users. Other percpu data structures are a lot easier to convert. This patch doesn't make any functional difference. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NJan Kara <jack@suse.cz> Acked-by: N"David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org>
-
- 09 8月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
The core mm code will provide a default gate area based on FIXADDR_USER_START and FIXADDR_USER_END if !defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR). This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit UML, and x86_32 have their own code just to disable it. arm, 32-bit UML, and x86_64 have gate areas, but they have their own implementations. This gets rid of the default and moves the code into ia64. This should save some code on architectures without a gate area: it's now possible to inline the gate_area functions in the default case. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Acked-by: NNathan Lynch <nathan_lynch@mentor.com> Acked-by: NH. Peter Anvin <hpa@linux.intel.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle] Acked-by: Richard Weinberger <richard@nod.at> [for um] Acked-by: Will Deacon <will.deacon@arm.com> [for arm64] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Nathan Lynch <Nathan_Lynch@mentor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 6月, 2014 1 次提交
-
-
由 Steven Miao 提交于
mm could be removed from current task struct, using previous vma->vm_mm It will crash on blackfin after updated to Linux 3.15. The commit "mm: per-thread vma caching" caused the crash. mm could be removed from current task struct before mmput()-> exit_mmap()-> delete_vma_from_mm() the detailed fault information: NULL pointer access Kernel OOPS in progress Deferred Exception context CURRENT PROCESS: COMM=modprobe PID=278 CPU=0 invalid mm return address: [0x000531de]; contents of: 0x000531b0: c727 acea 0c42 181d 0000 0000 0000 a0a8 0x000531c0: b090 acaa 0c42 1806 0000 0000 0000 a0e8 0x000531d0: b0d0 e801 0000 05b3 0010 e522 0046 [a090] 0x000531e0: 6408 b090 0c00 17cc 3042 e3ff f37b 2fc8 CPU: 0 PID: 278 Comm: modprobe Not tainted 3.15.0-ADI-2014R1-pre-00345-gea9f446 #25 task: 0572b720 ti: 0569e000 task.ti: 0569e000 Compiled for cpu family 0x27fe (Rev 0), but running on:0x0000 (Rev 0) ADSP-BF609-0.0 500(MHz CCLK) 125(MHz SCLK) (mpu off) Linux version 3.15.0-ADI-2014R1-pre-00345-gea9f446 (steven@steven-OptiPlex-390) (gcc version 4.3.5 (ADI-trunk/svn-5962) ) #25 Tue Jun 10 17:47:46 CST 2014 SEQUENCER STATUS: Not tainted SEQSTAT: 00000027 IPEND: 8008 IMASK: ffff SYSCFG: 2806 EXCAUSE : 0x27 physical IVG3 asserted : <0xffa00744> { _trap + 0x0 } physical IVG15 asserted : <0xffa00d68> { _evt_system_call + 0x0 } logical irq 6 mapped : <0xffa003bc> { _bfin_coretmr_interrupt + 0x0 } logical irq 7 mapped : <0x00008828> { _bfin_fault_routine + 0x0 } logical irq 11 mapped : <0x00007724> { _l2_ecc_err + 0x0 } logical irq 13 mapped : <0x00008828> { _bfin_fault_routine + 0x0 } logical irq 39 mapped : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 } logical irq 40 mapped : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 } RETE: <0x00000000> /* Maybe null pointer? */ RETN: <0x0569fe50> /* kernel dynamic memory (maybe user-space) */ RETX: <0x00000480> /* Maybe fixed code section */ RETS: <0x00053384> { _exit_mmap + 0x28 } PC : <0x000531de> { _delete_vma_from_mm + 0x92 } DCPLB_FAULT_ADDR: <0x00000008> /* Maybe null pointer? */ ICPLB_FAULT_ADDR: <0x000531de> { _delete_vma_from_mm + 0x92 } PROCESSOR STATE: R0 : 00000004 R1 : 0569e000 R2 : 00bf3db4 R3 : 00000000 R4 : 057f9800 R5 : 00000001 R6 : 0569ddd0 R7 : 0572b720 P0 : 0572b854 P1 : 00000004 P2 : 00000000 P3 : 0569dda0 P4 : 0572b720 P5 : 0566c368 FP : 0569fe5c SP : 0569fd74 LB0: 057f523f LT0: 057f523e LC0: 00000000 LB1: 0005317c LT1: 00053172 LC1: 00000002 B0 : 00000000 L0 : 00000000 M0 : 0566f5bc I0 : 00000000 B1 : 00000000 L1 : 00000000 M1 : 00000000 I1 : ffffffff B2 : 00000001 L2 : 00000000 M2 : 00000000 I2 : 00000000 B3 : 00000000 L3 : 00000000 M3 : 00000000 I3 : 057f8000 A0.w: 00000000 A0.x: 00000000 A1.w: 00000000 A1.x: 00000000 USP : 056ffcf8 ASTAT: 02003024 Hardware Trace: 0 Target : <0x00003fb8> { _trap_c + 0x0 } Source : <0xffa006d8> { _exception_to_level5 + 0xa0 } JUMP.L 1 Target : <0xffa00638> { _exception_to_level5 + 0x0 } Source : <0xffa004f2> { _bfin_return_from_exception + 0x6 } RTX 2 Target : <0xffa004ec> { _bfin_return_from_exception + 0x0 } Source : <0xffa00590> { _ex_trap_c + 0x70 } JUMP.S 3 Target : <0xffa00520> { _ex_trap_c + 0x0 } Source : <0xffa0076e> { _trap + 0x2a } JUMP (P4) 4 Target : <0xffa00744> { _trap + 0x0 } FAULT : <0x000531de> { _delete_vma_from_mm + 0x92 } P0 = W[P2 + 2] Source : <0x000531da> { _delete_vma_from_mm + 0x8e } P2 = [P4 + 0x18] 5 Target : <0x000531da> { _delete_vma_from_mm + 0x8e } Source : <0x00053176> { _delete_vma_from_mm + 0x2a } IF CC JUMP pcrel 6 Target : <0x0005314c> { _delete_vma_from_mm + 0x0 } Source : <0x00053380> { _exit_mmap + 0x24 } JUMP.L 7 Target : <0x00053378> { _exit_mmap + 0x1c } Source : <0x00053394> { _exit_mmap + 0x38 } IF !CC JUMP pcrel (BP) 8 Target : <0x00053390> { _exit_mmap + 0x34 } Source : <0xffa020e0> { __cond_resched + 0x20 } RTS 9 Target : <0xffa020c0> { __cond_resched + 0x0 } Source : <0x0005338c> { _exit_mmap + 0x30 } JUMP.L 10 Target : <0x0005338c> { _exit_mmap + 0x30 } Source : <0x0005333a> { _delete_vma + 0xb2 } RTS 11 Target : <0x00053334> { _delete_vma + 0xac } Source : <0x0005507a> { _kmem_cache_free + 0xba } RTS 12 Target : <0x00055068> { _kmem_cache_free + 0xa8 } Source : <0x0005505e> { _kmem_cache_free + 0x9e } IF !CC JUMP pcrel (BP) 13 Target : <0x00055052> { _kmem_cache_free + 0x92 } Source : <0x0005501a> { _kmem_cache_free + 0x5a } IF CC JUMP pcrel 14 Target : <0x00054ff4> { _kmem_cache_free + 0x34 } Source : <0x00054fce> { _kmem_cache_free + 0xe } IF CC JUMP pcrel (BP) 15 Target : <0x00054fc0> { _kmem_cache_free + 0x0 } Source : <0x00053330> { _delete_vma + 0xa8 } JUMP.L Kernel Stack Stack info: SP: [0x0569ff24] <0x0569ff24> /* kernel dynamic memory (maybe user-space) */ Memory from 0x0569ff20 to 056a0000 0569ff20: 00000001 [04e8da5a] 00008000 00000000 00000000 056a0000 04e8da5a 04e8da5a 0569ff40: 04eb9eea ffa00dce 02003025 04ea09c5 057f523f 04ea09c4 057f523e 00000000 0569ff60: 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 0569ff80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0569ffa0: 0566f5bc 057f8000 057f8000 00000001 04ec0170 056ffcf8 056ffd04 057f9800 0569ffc0: 04d1d498 057f9800 057f8fe4 057f8ef0 00000001 057f928c 00000001 00000001 0569ffe0: 057f9800 00000000 00000008 00000007 00000001 00000001 00000001 <00002806> Return addresses in stack: address : <0x00002806> { _show_cpuinfo + 0x2d2 } Modules linked in: Kernel panic - not syncing: Kernel exception [ end Kernel panic - not syncing: Kernel exception Signed-off-by: NSteven Miao <realmz6@gmail.com> Acked-by: NDavidlohr Bueso <davidlohr@hp.com> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.15.x] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 6月, 2014 1 次提交
-
-
由 Mitchel Humpherys 提交于
printk is meant to be used with an associated log level. There are some instances of printk scattered around the mm code where the log level is missing. Add a log level and adhere to suggestions by scripts/checkpatch.pl by moving to the pr_* macros. Also add the typical pr_fmt definition so that print statements can be easily traced back to the modules where they occur, correlated one with another, etc. This will require the removal of some (now redundant) prefixes on a few print statements. Signed-off-by: NMitchel Humpherys <mitchelh@codeaurora.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 4月, 2014 3 次提交
-
-
由 Choi Gi-yong 提交于
Signed-off-by: NChoi Gi-yong <yong@gnoy.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Gideon Israel Dsouza 提交于
To increase compiler portability there is <linux/compiler.h> which provides convenience macros for various gcc constructs. Eg: __weak for __attribute__((weak)). I've replaced all instances of gcc attributes with the right macro in the memory management (/mm) subsystem. [akpm@linux-foundation.org: while-we're-there consistency tweaks] Signed-off-by: NGideon Israel Dsouza <gidisrael@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
This patch is a continuation of efforts trying to optimize find_vma(), avoiding potentially expensive rbtree walks to locate a vma upon faults. The original approach (https://lkml.org/lkml/2013/11/1/410), where the largest vma was also cached, ended up being too specific and random, thus further comparison with other approaches were needed. There are two things to consider when dealing with this, the cache hit rate and the latency of find_vma(). Improving the hit-rate does not necessarily translate in finding the vma any faster, as the overhead of any fancy caching schemes can be too high to consider. We currently cache the last used vma for the whole address space, which provides a nice optimization, reducing the total cycles in find_vma() by up to 250%, for workloads with good locality. On the other hand, this simple scheme is pretty much useless for workloads with poor locality. Analyzing ebizzy runs shows that, no matter how many threads are running, the mmap_cache hit rate is less than 2%, and in many situations below 1%. The proposed approach is to replace this scheme with a small per-thread cache, maximizing hit rates at a very low maintenance cost. Invalidations are performed by simply bumping up a 32-bit sequence number. The only expensive operation is in the rare case of a seq number overflow, where all caches that share the same address space are flushed. Upon a miss, the proposed replacement policy is based on the page number that contains the virtual address in question. Concretely, the following results are seen on an 80 core, 8 socket x86-64 box: 1) System bootup: Most programs are single threaded, so the per-thread scheme does improve ~50% hit rate by just adding a few more slots to the cache. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 50.61% | 19.90 | | patched | 73.45% | 13.58 | +----------------+----------+------------------+ 2) Kernel build: This one is already pretty good with the current approach as we're dealing with good locality. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 75.28% | 11.03 | | patched | 88.09% | 9.31 | +----------------+----------+------------------+ 3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload. +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 70.66% | 17.14 | | patched | 91.15% | 12.57 | +----------------+----------+------------------+ 4) Ebizzy: There's a fair amount of variation from run to run, but this approach always shows nearly perfect hit rates, while baseline is just about non-existent. The amounts of cycles can fluctuate between anywhere from ~60 to ~116 for the baseline scheme, but this approach reduces it considerably. For instance, with 80 threads: +----------------+----------+------------------+ | caching scheme | hit-rate | cycles (billion) | +----------------+----------+------------------+ | baseline | 1.06% | 91.54 | | patched | 99.97% | 14.18 | +----------------+----------+------------------+ [akpm@linux-foundation.org: fix nommu build, per Davidlohr] [akpm@linux-foundation.org: document vmacache_valid() logic] [akpm@linux-foundation.org: attempt to untangle header files] [akpm@linux-foundation.org: add vmacache_find() BUG_ON] [hughd@google.com: add vmacache_valid_mm() (from Oleg)] [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: adjust and enhance comments] Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Reviewed-by: NMichel Lespinasse <walken@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Tested-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-