- 22 4月, 2015 3 次提交
-
-
由 Herbert Xu 提交于
This patch adds the function crypto_rng_set_entropy. It is only meant to be used by testmgr when testing RNG implementations by providing fixed entropy data in order to verify test vectors. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts the low-level crypto_rng interface to the "new" style. This allows existing implementations to be converted over one- by-one. Once that is complete we can then remove the old rng interface. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
There is no reason why crypto_rng_reset should modify the seed so this patch marks it as const. Since our algorithms don't export a const seed function yet we have to go through some contortions for now. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 21 4月, 2015 3 次提交
-
-
由 Herbert Xu 提交于
This patch adds the new top-level function crypto_rng_generate which generates random numbers with additional input. It also extends the mid-level rng_gen_random function to take additional data as input. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch converts the top-level crypto_rng to the "new" style. It was the last algorithm type added before we switched over to the new way of doing things exemplified by shash. All users will automatically switch over to the new interface. Note that this patch does not touch the low-level interface to rng implementations. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Stephan Mueller 提交于
The creation of a shadow copy is intended to only hold a short term lock. But the drawback is that parallel users have a very similar DRBG state which only differs by a high-resolution time stamp. The DRBG will now hold a long term lock. Therefore, the lock is changed to a mutex which implies that the DRBG can only be used in process context. The lock now guards the instantiation as well as the entire DRBG generation operation. Therefore, multiple callers are fully serialized when generating a random number. As the locking is changed to use a long-term lock to avoid such similar DRBG states, the entire creation and maintenance of a shadow copy can be removed. Signed-off-by: NStephan Mueller <smueller@chronox.de> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 16 4月, 2015 24 次提交
-
-
由 Rasmus Villemoes 提交于
The macro BITMAP_LAST_WORD_MASK can be implemented without a conditional, which will generally lead to slightly better generated code (221 bytes saved for allmodconfig-GCOV_KERNEL, ~2k with GCOV_KERNEL). As a small bonus, this also ensures that the nbits parameter is expanded exactly once. In BITMAP_FIRST_WORD_MASK, if start is signed gcc is technically allowed to assume it is positive (or divisible by BITS_PER_LONG), and hence just do the simple mask. It doesn't seem to use this, and even on an architecture like x86 where the shift only depends on the lower 5 or 6 bits, and these bits are not affected by the signedness of the expression, gcc still generates code to compute the C99 mandated value of start % BITS_PER_LONG. So just use a mask explicitly, also for consistency with BITMAP_LAST_WORD_MASK. Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Tejun Heo <tj@kernel.org> Reviewed-by: NGeorge Spelvin <linux@horizon.com> Cc: Yury Norov <yury.norov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rasmus Villemoes 提交于
The current semantics of string_escape_mem are inadequate for one of its current users, vsnprintf(). If that is to honour its contract, it must know how much space would be needed for the entire escaped buffer, and string_escape_mem provides no way of obtaining that (short of allocating a large enough buffer (~4 times input string) to let it play with, and that's definitely a big no-no inside vsnprintf). So change the semantics for string_escape_mem to be more snprintf-like: Return the size of the output that would be generated if the destination buffer was big enough, but of course still only write to the part of dst it is allowed to, and (contrary to snprintf) don't do '\0'-termination. It is then up to the caller to detect whether output was truncated and to append a '\0' if desired. Also, we must output partial escape sequences, otherwise a call such as snprintf(buf, 3, "%1pE", "\123") would cause printf to write a \0 to buf[2] but leaving buf[0] and buf[1] with whatever they previously contained. This also fixes a bug in the escaped_string() helper function, which used to unconditionally pass a length of "end-buf" to string_escape_mem(); since the latter doesn't check osz for being insanely large, it would happily write to dst. For example, kasprintf(GFP_KERNEL, "something and then %pE", ...); is an easy way to trigger an oops. In test-string_helpers.c, the -ENOMEM test is replaced with testing for getting the expected return value even if the buffer is too small. We also ensure that nothing is written (by relying on a NULL pointer deref) if the output size is 0 by passing NULL - this has to work for kasprintf("%pE") to work. In net/sunrpc/cache.c, I think qword_add still has the same semantics. Someone should definitely double-check this. In fs/proc/array.c, I made the minimum possible change, but longer-term it should stop poking around in seq_file internals. [andriy.shevchenko@linux.intel.com: simplify qword_add] [andriy.shevchenko@linux.intel.com: add missed curly braces] Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Steven Rostedt 提交于
KERN_CONT is nicely commented in kern_levels.h, but pr_cont() is now used more often, and it lacks the comment stating what it is used for. It can be confused as continuing the log level, but that is not its purpose. Its purpose is to continue a line that had no newline enclosed. This should be documented by pr_cont() as well. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joel Stanley 提交于
The kernel has orderly_poweroff which allows the kernel to initiate a graceful shutdown of userspace, by running /sbin/poweroff. This adds orderly_reboot that will cause userspace to shut itself down by calling /sbin/reboot. This will be used for shutdown initiated by a system controller on platforms that do not use ACPI. orderly_reboot() should be used when the system wants to allow userspace to gracefully shut itself down. For cases where the system may imminently catch on fire, the existing emergency_restart() provides an immediate reboot without involving userspace. Signed-off-by: NJoel Stanley <joel@jms.id.au> Cc: Fabian Frederick <fabf@skynet.be> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jeremy Kerr <jk@ozlabs.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jakub Sitnicki 提交于
All users of __check_region(), check_region(), and check_mem_region() are gone. We got rid of the last user in v4.0-rc1. Remove them. bloat-o-meter on x86_64 shows: add/remove: 0/3 grow/shrink: 0/0 up/down: 0/-102 (-102) function old new delta __kstrtab___check_region 15 - -15 __ksymtab___check_region 16 - -16 __check_region 71 - -71 Signed-off-by: NJakub Sitnicki <jsitnicki@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Iulia Manda 提交于
There are a lot of embedded systems that run most or all of their functionality in init, running as root:root. For these systems, supporting multiple users is not necessary. This patch adds a new symbol, CONFIG_MULTIUSER, that makes support for non-root users, non-root groups, and capabilities optional. It is enabled under CONFIG_EXPERT menu. When this symbol is not defined, UID and GID are zero in any possible case and processes always have all capabilities. The following syscalls are compiled out: setuid, setregid, setgid, setreuid, setresuid, getresuid, setresgid, getresgid, setgroups, getgroups, setfsuid, setfsgid, capget, capset. Also, groups.c is compiled out completely. In kernel/capability.c, capable function was moved in order to avoid adding two ifdef blocks. This change saves about 25 KB on a defconfig build. The most minimal kernels have total text sizes in the high hundreds of kB rather than low MB. (The 25k goes down a bit with allnoconfig, but not that much. The kernel was booted in Qemu. All the common functionalities work. Adding users/groups is not possible, failing with -ENOSYS. Bloat-o-meter output: add/remove: 7/87 grow/shrink: 19/397 up/down: 1675/-26325 (-24650) [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NIulia Manda <iulia.manda21@gmail.com> Reviewed-by: NJosh Triplett <josh@joshtriplett.org> Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org> Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rasmus Villemoes 提交于
Commit 607ca46e ("UAPI: (Scripted) Disintegrate include/linux") left behind some empty conditional blocks. Since they are useless and may cause a reader to wonder whether something is missing, remove them. Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
This patch provides core functions for migration of zsmalloc. Migraion policy is simple as follows. for each size class { while { src_page = get zs_page from ZS_ALMOST_EMPTY if (!src_page) break; dst_page = get zs_page from ZS_ALMOST_FULL if (!dst_page) dst_page = get zs_page from ZS_ALMOST_EMPTY if (!dst_page) break; migrate(from src_page, to dst_page); } } For migration, we need to identify which objects in zspage are allocated to migrate them out. We could know it by iterating of freed objects in a zspage because first_page of zspage keeps free objects singly-linked list but it's not efficient. Instead, this patch adds a tag(ie, OBJ_ALLOCATED_TAG) in header of each object(ie, handle) so we could check whether the object is allocated easily. This patch adds another status bit in handle to synchronize between user access through zs_map_object and migration. During migration, we cannot move objects user are using due to data coherency between old object and new object. [akpm@linux-foundation.org: zsmalloc.c needs sched.h for cond_resched()] Signed-off-by: NMinchan Kim <minchan@kernel.org> Cc: Juneho Choi <juno.choi@lge.com> Cc: Gunho Lee <gunho.lee@lge.com> Cc: Luigi Semenzato <semenzato@google.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Boaz Harrosh 提交于
From: Yigal Korman <yigal@plexistor.com> [v1] Without this patch, c/mtime is not updated correctly when mmap'ed page is first read from and then written to. A new xfstest is submitted for testing this (generic/080) [v2] Jan Kara has pointed out that if we add the sb_start/end_pagefault pair in the new pfn_mkwrite we are then fixing another bug where: A user could start writing to the page while filesystem is frozen. Signed-off-by: NYigal Korman <yigal@plexistor.com> Signed-off-by: NBoaz Harrosh <boaz@plexistor.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Boaz Harrosh 提交于
This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs) to get notified when access is a write to a read-only PFN. This can happen if we mmap() a file then first mmap-read from it to page-in a read-only PFN, than we mmap-write to the same page. We need this functionality to fix a DAX bug, where in the scenario above we fail to set ctime/mtime though we modified the file. An xfstest is attached to this patchset that shows the failure and the fix. (A DAX patch will follow) This functionality is extra important for us, because upon dirtying of a pmem page we also want to RDMA the page to a remote cluster node. We define a new pfn_mkwrite and do not reuse page_mkwrite because 1 - The name ;-) 2 - But mainly because it would take a very long and tedious audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP users. To make sure they do not now CRASH. For example current DAX code (which this is for) would crash. If we would want to reuse page_mkwrite, We will need to first patch all users, so to not-crash-on-no-page. Then enable this patch. But even if I did that I would not sleep so well at night. Adding a new vector is the safest thing to do, and is not that expensive. an extra pointer at a static function vector per driver. Also the new vector is better for performance, because else we Will call all current Kernel vectors, so to: check-ha-no-page-do-nothing and return. No need to call it from do_shared_fault because do_wp_page is called to change pte permissions anyway. Signed-off-by: NYigal Korman <yigal@plexistor.com> Signed-off-by: NBoaz Harrosh <boaz@plexistor.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
Mempools keep allocated objects in reserved for situations when ordinary allocation may not be possible to satisfy. These objects shouldn't be accessed before they leave the pool. This patch poison elements when get into the pool and unpoison when they leave it. This will let KASan to detect use-after-free of mempool's elements. Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Tested-by: NDavid Rientjes <rientjes@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Chernenkov <drcheren@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Most-used page->mapping helper -- page_mapping() -- has already uninlined. Let's uninline also page_rmapping() and page_anon_vma(). It saves us depending on configuration around 400 bytes in text: text data bss dec hex filename 660318 99254 410000 1169572 11d8a4 mm/built-in.o-before 659854 99254 410000 1169108 11d6d4 mm/built-in.o I also tried to make code a bit more clean. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Christoph Lameter <cl@linux.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stefan Strogin 提交于
Add trace events for cma_alloc() and cma_release(). The cma_alloc tracepoint is used both for successful and failed allocations, in case of allocation failure pfn=-1UL is stored and printed. Signed-off-by: NStefan Strogin <stefan.strogin@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Nazarewicz <mpn@google.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Borislav Petkov 提交于
Flip the flag test so that it is the simplest. No functional change, just a small readability improvement: No code changed: # arch/x86/kernel/sys_x86_64.o: text data bss dec hex filename 1551 24 0 1575 627 sys_x86_64.o.before 1551 24 0 1575 627 sys_x86_64.o.after md5: 70708d1b1ad35cc891118a69dc1a63f9 sys_x86_64.o.before.asm 70708d1b1ad35cc891118a69dc1a63f9 sys_x86_64.o.after.asm Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
Now we have an easy access to hugepages' activeness, so existing helpers to get the information can be cleaned up. [akpm@linux-foundation.org: s/PageHugeActive/page_huge_active/] Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Hugh Dickins <hughd@google.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
All occurrences of mempools based on slab caches with object constructors have been removed from the tree, so disallow creating them. We can only dereference mem->ctor in mm/mempool.c without including mm/slab.h in include/linux/mempool.h. So simply note the restriction, just like the comment restricting usage of __GFP_ZERO, and warn on kernels with CONFIG_DEBUG_VM() if such a mempool is allocated from. We don't want to incur this check on every element allocation, so use VM_BUG_ON(). Signed-off-by: NDavid Rientjes <rientjes@google.com> Cc: Dave Kleikamp <shaggy@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
Make 'min_size=<value>' be an option when mounting a hugetlbfs. This option takes the same value as the 'size' option. min_size can be specified without specifying size. If both are specified, min_size must be less that or equal to size else the mount will fail. If min_size is specified, then at mount time an attempt is made to reserve min_size pages. If the reservation fails, the mount fails. At umount time, the reserved pages are released. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Kravetz 提交于
hugetlbfs allocates huge pages from the global pool as needed. Even if the global pool contains a sufficient number pages for the filesystem size at mount time, those global pages could be grabbed for some other use. As a result, filesystem huge page allocations may fail due to lack of pages. Applications such as a database want to use huge pages for performance reasons. hugetlbfs filesystem semantics with ownership and modes work well to manage access to a pool of huge pages. However, the application would like some reasonable assurance that allocations will not fail due to a lack of huge pages. At application startup time, the application would like to configure itself to use a specific number of huge pages. Before starting, the application can check to make sure that enough huge pages exist in the system global pools. However, there are no guarantees that those pages will be available when needed by the application. What the application wants is exclusive use of a subset of huge pages. Add a new hugetlbfs mount option 'min_size=<value>' to indicate that the specified number of pages will be available for use by the filesystem. At mount time, this number of huge pages will be reserved for exclusive use of the filesystem. If there is not a sufficient number of free pages, the mount will fail. As pages are allocated to and freeed from the filesystem, the number of reserved pages is adjusted so that the specified minimum is maintained. This patch (of 4): Add a field to the subpool structure to indicate the minimimum number of huge pages to always be used by this subpool. This minimum count includes allocated pages as well as reserved pages. If the minimum number of pages for the subpool have not been allocated, pages are reserved up to this minimum. An additional field (rsv_hpages) is used to track the number of pages reserved to meet this minimum size. The hstate pointer in the subpool is convenient to have when reserving and unreserving the pages. Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
"deactivate_page" was created for file invalidation so it has too specific logic for file-backed pages. So, let's change the name of the function and date to a file-specific one and yield the generic name. Signed-off-by: NMinchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shli@kernel.org> Cc: Wang, Yalin <Yalin.Wang@sonymobile.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Eric B Munson 提交于
Currently, pages which are marked as unevictable are protected from compaction, but not from other types of migration. The POSIX real time extension explicitly states that mlock() will prevent a major page fault, but the spirit of this is that mlock() should give a process the ability to control sources of latency, including minor page faults. However, the mlock manpage only explicitly says that a locked page will not be written to swap and this can cause some confusion. The compaction code today does not give a developer who wants to avoid swap but wants to have large contiguous areas available any method to achieve this state. This patch introduces a sysctl for controlling compaction behavior with respect to the unevictable lru. Users who demand no page faults after a page is present can set compact_unevictable_allowed to 0 and users who need the large contiguous areas can enable compaction on locked memory by leaving the default value of 1. To illustrate this problem I wrote a quick test program that mmaps a large number of 1MB files filled with random data. These maps are created locked and read only. Then every other mmap is unmapped and I attempt to allocate huge pages to the static huge page pool. When the compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages after fragmenting memory. When the value is set to 1, allocations succeed. Signed-off-by: NEric B Munson <emunson@akamai.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Christoph Lameter <cl@linux.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
THP uses tail page refcounting to be able to split huge pages at any time. Tail page refcounting is not needed for other users of compound pages and it's harmful because of overhead. We try to exclude non-THP pages from tail page refcounting using __compound_tail_refcounted() check. It excludes most common non-THP compound pages: SL*B and hugetlb, but it doesn't catch rest of __GFP_COMP users -- drivers. And it's not only about overhead. Drivers might want to use compound pages to get refcounting semantics suitable for mapping high-order pages to userspace. But tail page refcounting breaks it. Tail page refcounting uses ->_mapcount in tail pages to store GUP pins on them. It means GUP pins would affect page_mapcount() for tail pages. It's not a problem for THP, because it never maps tail pages. But unlike THP, drivers map parts of compound pages with PTEs and it makes page_mapcount() be called for tail pages. In particular, GUP pins would shift PSS up and affect /proc/kpagecount for such pages. But, I'm not aware about anything which can lead to crash or other serious misbehaviour. Since currently all THP pages are anonymous and all drivers pages are not, we can fix the __compound_tail_refcounted() check by requiring PageAnon() to enable tail page refcounting. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Currently we take a naive approach to page flags on compound pages - we set the flag on the page without consideration if the flag makes sense for tail page or for compound page in general. This patchset try to sort this out by defining per-flag policy on what need to be done if page-flag helper operate on compound page. The last patch in the patchset also sanitizes usege of page->mapping for tail pages. We don't define the meaning of page->mapping for tail pages. Currently it's always NULL, which can be inconsistent with head page and potentially lead to problems. For now I caught one case of illegal usage of page flags or ->mapping: sound subsystem allocates pages with __GFP_COMP and maps them with PTEs. It leads to setting dirty bit on tail pages and access to tail_page's ->mapping. I don't see any bad behaviour caused by this, but worth fixing anyway. This patchset makes more sense if you take my THP refcounting into account: we will see more compound pages mapped with PTEs and we need to define behaviour of flags on compound pages to avoid bugs. This patch (of 16): We have page-flags helper function declarations/definitions spread over several header files. Let's consolidate them in <linux/page-flags.h>. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhang Zhen 提交于
All callers of zone_movable_is_highmem are under #ifdef CONFIG_HIGHMEM, so the else branch return 0 is not needed. Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhang Zhen 提交于
vfs_readdir() was replaced by iterate_dir() in commit 5c0ba4e0 ("[readdir] introduce iterate_dir() and dir_context"). Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 4月, 2015 10 次提交
-
-
由 Vladimir Murzin 提交于
Since memtest might be used by other architectures pass input parameters as phys_addr_t instead of long to prevent overflow. Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com> Acked-by: NWill Deacon <will.deacon@arm.com> Tested-by: NMark Rutland <mark.rutland@arm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Murzin 提交于
Memtest is a simple feature which fills the memory with a given set of patterns and validates memory contents, if bad memory regions is detected it reserves them via memblock API. Since memblock API is widely used by other architectures this feature can be enabled outside of x86 world. This patch set promotes memtest to live under generic mm umbrella and enables memtest feature for arm/arm64. It was reported that this patch set was useful for tracking down an issue with some errant DMA on an arm64 platform. This patch (of 6): There is nothing platform dependent in the core memtest code, so other platforms might benefit from this feature too. [linux@roeck-us.net: MEMTEST depends on MEMBLOCK] Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com> Acked-by: NWill Deacon <will.deacon@arm.com> Tested-by: NMark Rutland <mark.rutland@arm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
Allocating a large number of elements in atomic context could quickly deplete memory reserves, so just disallow atomic resizing entirely. Nothing currently uses mempool_resize() with anything other than GFP_KERNEL, so convert existing callers to drop the gfp_mask. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NDavid Rientjes <rientjes@google.com> Acked-by: Steffen Maier <maier@linux.vnet.ibm.com> [zfcp] Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Steve French <sfrench@samba.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
If kernel panics due to oom, caused by a cgroup reaching its limit, when 'compulsory panic_on_oom' is enabled, then we will only see that the OOM happened because of "compulsory panic_on_oom is enabled" but this doesn't tell the difference between mempolicy and memcg. And dumping system wide information is plain wrong and more confusing. This patch provides the information of the cgroup whose limit triggerred panic Signed-off-by: NBalasubramani Vivekanandan <balasubramani_vivekanandan@mentor.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
This code is dead since commit 9e645ab6 ("sched/numa: Continue PTE scanning even if migrate rate limited") so remove it. Signed-off-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
The arch_randomize_brk() function is used on several architectures, even those that don't support ET_DYN ASLR. To avoid bulky extern/#define tricks, consolidate the support under CONFIG_ARCH_HAS_ELF_RANDOMIZE for the architectures that support it, while still handling CONFIG_COMPAT_BRK. Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Mller <dl9pf@gmx.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
When an architecture fully supports randomizing the ELF load location, a per-arch mmap_rnd() function is used to find a randomized mmap base. In preparation for randomizing the location of ET_DYN binaries separately from mmap, this renames and exports these functions as arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE for describing this feature on architectures that support it (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390 already supports a separated ET_DYN ASLR from mmap ASLR without the ARCH_BINFMT_ELF_RANDOMIZE_PIE logic). Signed-off-by: NKees Cook <keescook@chromium.org> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Russell King <linux@arm.linux.org.uk> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: "David A. Long" <dave.long@linaro.org> Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Arun Chandran <achandran@mvista.com> Cc: Yann Droneaud <ydroneaud@opteya.com> Cc: Min-Hua Chen <orca.chen@gmail.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Alex Smith <alex@alex-smith.me.uk> Cc: Markos Chandras <markos.chandras@imgtec.com> Cc: Vineeth Vijayan <vvijayan@mvista.com> Cc: Jeff Bailey <jeffbailey@google.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Behan Webster <behanw@converseincode.com> Cc: Ismael Ripoll <iripoll@upv.es> Cc: Jan-Simon Mller <dl9pf@gmx.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Toshi Kani 提交于
Change vunmap_pmd_range() and vunmap_pud_range() to tear down huge KVA mappings when they are set. pud_clear_huge() and pmd_clear_huge() return zero when no-operation is performed, i.e. huge page mapping was not used. These changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined on the architecture. [akpm@linux-foundation.org: use consistent code layout] Signed-off-by: NToshi Kani <toshi.kani@hp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Robert Elliott <Elliott@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Toshi Kani 提交于
ioremap_pud_range() and ioremap_pmd_range() are changed to create huge I/O mappings when their capability is enabled, and a request meets required conditions -- both virtual & physical addresses are aligned by their huge page size, and a requested range fufills their huge page size. When pud_set_huge() or pmd_set_huge() returns zero, i.e. no-operation is performed, the code simply falls back to the next level. The changes are only enabled when CONFIG_HAVE_ARCH_HUGE_VMAP is defined on the architecture. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Robert Elliott <Elliott@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Toshi Kani 提交于
Add ioremap_pud_enabled() and ioremap_pmd_enabled(), which return 1 when I/O mappings with pud/pmd are enabled on the kernel. ioremap_huge_init() calls arch_ioremap_pud_supported() and arch_ioremap_pmd_supported() to initialize the capabilities at boot-time. A new kernel option "nohugeiomap" is also added, so that user can disable the huge I/O map capabilities when necessary. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Robert Elliott <Elliott@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-