- 21 9月, 2016 2 次提交
-
-
由 Jiri Olsa 提交于
We hit hardened usercopy feature check for kernel text access by reading kcore file: usercopy: kernel memory exposure attempt detected from ffffffff8179a01f (<kernel text>) (4065 bytes) kernel BUG at mm/usercopy.c:75! Bypassing this check for kcore by adding bounce buffer for ktext data. Reported-by: NSteve Best <sbest@redhat.com> Fixes: f5509cc1 ("mm: Hardened usercopy") Suggested-by: NKees Cook <keescook@chromium.org> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Acked-by: NKees Cook <keescook@chromium.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jiri Olsa 提交于
Next patch adds bounce buffer for ktext area, so it's convenient to have single bounce buffer for both vmalloc/module and ktext cases. Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Acked-by: NKees Cook <keescook@chromium.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 9月, 2016 1 次提交
-
-
由 Dan Williams 提交于
Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings currently results in the following VM_BUG_ONs: kernel BUG at mm/huge_memory.c:1105! task: ffff88045f16b140 task.stack: ffff88045be14000 RIP: 0010:[<ffffffff81268f9b>] [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340 [..] Call Trace: [<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0 [<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0 [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0 [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80 [<ffffffff81307656>] show_smap+0xa6/0x2b0 kernel BUG at fs/proc/task_mmu.c:585! RIP: 0010:[<ffffffff81306469>] [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0 Call Trace: [<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0 [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0 [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80 [<ffffffff81307696>] show_smap+0xa6/0x2b0 These locations are sanity checking page flags that must be set for an anonymous transparent huge page, but are not set for the zone_device pages associated with dax mappings. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 01 9月, 2016 1 次提交
-
-
由 Mateusz Guzik 提交于
For more convenient access if one has a pointer to the task. As a minor nit take advantage of the fact that only task lock + rcu are needed to safely grab ->exe_file. This saves mm refcount dance. Use the helper in proc_exe_link. Signed-off-by: NMateusz Guzik <mguzik@redhat.com> Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: NRichard Guy Briggs <rgb@redhat.com> Cc: <stable@vger.kernel.org> # 4.3.x Signed-off-by: NPaul Moore <paul@paul-moore.com>
-
- 19 8月, 2016 1 次提交
-
-
由 Josh Poimboeuf 提交于
When printing call return addresses found on a stack, /proc/<pid>/stack can sometimes give a confusing result. If the call instruction was the last instruction in the function (which can happen when calling a noreturn function), '%pS' will incorrectly display the name of the function which happens to be next in the object code, rather than the name of the actual calling function. Use '%pB' instead, which was created for this exact purpose. Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nilay Vaish <nilayvaish@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/47ad2821e5ebdbed1fbf83fb85424ae4fbdf8b6e.1471535549.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 8月, 2016 2 次提交
-
-
由 Dmitry Torokhov 提交于
If net namespace is attached to a user namespace let's make container's root owner of sysctls affecting said network namespace instead of global root. This also allows us to clean up net_ctl_permissions() because we do not need to fudge permissions anymore for the container's owner since it now owns the objects in question. Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dmitry Torokhov 提交于
There are certain parameters that belong to net namespace and that are exported in /proc. They should be controllable by the container's owner, but are currently owned by global root and thus not available. Let's change proc code to inherit ownership of parent entry, and when create per-ns "net" proc entry set it up as owned by container's owner. Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 8月, 2016 1 次提交
-
-
由 Mel Gorman 提交于
meminfo_proc_show() and si_mem_available() are using the wrong helpers for calculating the size of the LRUs. The user-visible impact is that there appears to be an abnormally high number of unevictable pages. Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 8月, 2016 1 次提交
-
-
由 Eric W. Biederman 提交于
Passing nsproxy into sysctl_table_root.lookup was a premature optimization in attempt to avoid depending on current. The directory /proc/self/sys has not appeared and if and when it does this code will need to be reviewed closely and reworked anyway. So remove the premature optimization. Acked-by: NKees Cook <keescook@chromium.org> Acked-by: NSerge Hallyn <serge@hallyn.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 04 8月, 2016 1 次提交
-
-
由 Geert Uytterhoeven 提交于
With gcc < 4.2 (e.g. 4.1.2): CC fs/proc/task_mmu.o cc1: error: unrecognized command line option "-Wno-override-init" To fix this, only enable the compiler option when it is actually supported by the compiler. Fixes: ca52953f ("fs/proc/task_mmu.c: suppress compilation warnings with W=1") Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org> Acked-by: NValdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 8月, 2016 3 次提交
-
-
由 Valdis Kletnieks 提交于
Suppress a bunch of warnings of the form: fs/proc/task_mmu.c: In function 'show_smap_vma_flags': fs/proc/task_mmu.c:635:22: warning: initialized field overwritten [-Wt override-init] [ilog2(VM_READ)] = "rd", ^~~~ fs/proc/task_mmu.c:635:22: note: (near initialization for 'mnemonics[0]') They happen because of the way we intentionally build the table, so silence the warning when building with 'make W=1'. Link: http://lkml.kernel.org/r/8727.1470022083@turing-police.cc.vt.eduSigned-off-by: NValdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
/proc/stat shows (among lots of other things) the current boottime (i.e. number of seconds since boot). While a 32-bit number is sufficient for this particular case, we want to get rid of the 'struct timespec' suffers from a 32-bit overflow in 2038. This changes the code to use a struct timespec64, which is known to be safe in all cases. Link: http://lkml.kernel.org/r/20160617201247.2292101-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
This was needed before to ensure that ->signal != 0 and do_each_thread() is safe, see commit b95c35e7 ("oom: fix the unsafe usage of badness() in proc_oom_score()") for details. Today tsk->signal can't go away and for_each_thread(tsk) is always safe. Link: http://lkml.kernel.org/r/20160608211921.GA15508@redhat.comSigned-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 8月, 2016 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 31 7月, 2016 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 7月, 2016 8 次提交
-
-
由 Andy Lutomirski 提交于
Currently, NR_KERNEL_STACK tracks the number of kernel stacks in a zone. This only makes sense if each kernel stack exists entirely in one zone, and allowing vmapped stacks could break this assumption. Since frv has THREAD_SIZE < PAGE_SIZE, we need to track kernel stack allocations in a unit that divides both THREAD_SIZE and PAGE_SIZE on all architectures. Keep it simple and use KiB. Link: http://lkml.kernel.org/r/083c71e642c5fa5f1b6898902e1b2db7b48940d4.1468523549.git.luto@kernel.orgSigned-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
There are now a number of accounting oddities such as mapped file pages being accounted for on the node while the total number of file pages are accounted on the zone. This can be coped with to some extent but it's confusing so this patch moves the relevant file-based accounted. Due to throttling logic in the page allocator for reliable OOM detection, it is still necessary to track dirty and writeback pages on a per-zone basis. [mgorman@techsingularity.net: fix NR_ZONE_WRITE_PENDING accounting] Link: http://lkml.kernel.org/r/1468404004-5085-5-git-send-email-mgorman@techsingularity.net Link: http://lkml.kernel.org/r/1467970510-21195-20-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
NR_FILE_PAGES is the number of file pages. NR_FILE_MAPPED is the number of mapped file pages. NR_ANON_PAGES is the number of mapped anon pages. This is unhelpful naming as it's easy to confuse NR_FILE_MAPPED and NR_ANON_PAGES for mapped pages. This patch renames NR_ANON_PAGES so we have NR_FILE_PAGES is the number of file pages. NR_FILE_MAPPED is the number of mapped file pages. NR_ANON_MAPPED is the number of mapped anon pages. Link: http://lkml.kernel.org/r/1467970510-21195-19-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Reclaim makes decisions based on the number of pages that are mapped but it's mixing node and zone information. Account NR_FILE_MAPPED and NR_ANON_PAGES pages on the node. Link: http://lkml.kernel.org/r/1467970510-21195-18-git-send-email-mgorman@techsingularity.netSigned-off-by: NMel Gorman <mgorman@techsingularity.net> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Hillf Danton <hillf.zj@alibaba-inc.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Rik van Riel <riel@surriel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
oom_score_adj is shared for the thread groups (via struct signal) but this is not sufficient to cover processes sharing mm (CLONE_VM without CLONE_SIGHAND) and so we can easily end up in a situation when some processes update their oom_score_adj and confuse the oom killer. In the worst case some of those processes might hide from the oom killer altogether via OOM_SCORE_ADJ_MIN while others are eligible. OOM killer would then pick up those eligible but won't be allowed to kill others sharing the same mm so the mm wouldn't release the mm and so the memory. It would be ideal to have the oom_score_adj per mm_struct because that is the natural entity OOM killer considers. But this will not work because some programs are doing vfork() set_oom_adj() exec() We can achieve the same though. oom_score_adj write handler can set the oom_score_adj for all processes sharing the same mm if the task is not in the middle of vfork. As a result all the processes will share the same oom_score_adj. The current implementation is rather pessimistic and checks all the existing processes by default if there is more than 1 holder of the mm but we do not have any reliable way to check for external users yet. Link: http://lkml.kernel.org/r/1466426628-15074-5-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Currently we have two proc interfaces to set oom_score_adj. The legacy /proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj which both have their specific handlers. Big part of the logic is duplicated so extract the common code into __set_oom_adj helper. Legacy knob still expects some details slightly different so make sure those are handled same way - e.g. the legacy mode ignores oom_score_adj_min and it warns about the usage. This patch shouldn't introduce any functional changes. Link: http://lkml.kernel.org/r/1466426628-15074-4-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Oleg has pointed out that can simplify both oom_adj_{read,write} and oom_score_adj_{read,write} even further and drop the sighand lock. The main purpose of the lock was to protect p->signal from going away but this will not happen since ea6d290c ("signals: make task_struct->signal immutable/refcountable"). The other role of the lock was to synchronize different writers, especially those with CAP_SYS_RESOURCE. Introduce a mutex for this purpose. Later patches will need this lock anyway. Suggested-by: NOleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/1466426628-15074-3-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Series "Handle oom bypass more gracefully", V5 The following 10 patches should put some order to very rare cases of mm shared between processes and make the paths which bypass the oom killer oom reapable and therefore much more reliable finally. Even though mm shared outside of thread group is rare (either vforked tasks for a short period, use_mm by kernel threads or exotic thread model of clone(CLONE_VM) without CLONE_SIGHAND) it is better to cover them. Not only it makes the current oom killer logic quite hard to follow and reason about it can lead to weird corner cases. E.g. it is possible to select an oom victim which shares the mm with unkillable process or bypass the oom killer even when other processes sharing the mm are still alive and other weird cases. Patch 1 drops bogus task_lock and mm check from oom_{score_}adj_write. This can be considered a bug fix with a low impact as nobody has noticed for years. Patch 2 drops sighand lock because it is not needed anymore as pointed by Oleg. Patch 3 is a clean up of oom_score_adj handling and a preparatory work for later patches. Patch 4 enforces oom_adj_score to be consistent between processes sharing the mm to behave consistently with the regular thread groups. This can be considered a user visible behavior change because one thread group updating oom_score_adj will affect others which share the same mm via clone(CLONE_VM). I argue that this should be acceptable because we already have the same behavior for threads in the same thread group and sharing the mm without signal struct is just a different model of threading. This is probably the most controversial part of the series, I would like to find some consensus here. There were some suggestions to hook some counter/oom_score_adj into the mm_struct but I feel that this is not necessary right now and we can rely on proc handler + oom_kill_process to DTRT. I can be convinced otherwise but I strongly think that whatever we do the userspace has to have a way to see the current oom priority as consistently as possible. Patch 5 makes sure that no vforked task is selected if it is sharing the mm with oom unkillable task. Patch 6 ensures that all user tasks sharing the mm are killed which in turn makes sure that all oom victims are oom reapable. Patch 7 guarantees that task_will_free_mem will always imply reapable bypass of the oom killer. Patch 8 is new in this version and it addresses an issue pointed out by 0-day OOM report where an oom victim was reaped several times. Patch 9 puts an upper bound on how many times oom_reaper tries to reap a task and hides it from the oom killer to move on when no progress can be made. This will give an upper bound to how long an oom_reapable task can block the oom killer from selecting another victim if the oom_reaper is not able to reap the victim. Patch 10 tries to plug the (hopefully) last hole when we can still lock up when the oom victim is shared with oom unkillable tasks (kthreads and global init). We just try to be best effort in that case and rather fallback to kill something else than risk a lockup. This patch (of 10): Both oom_adj_write and oom_score_adj_write are using task_lock, check for task->mm and fail if it is NULL. This is not needed because the oom_score_adj is per signal struct so we do not need mm at all. The code has been introduced by 3d5992d2 ("oom: add per-mm oom disable count") but we do not do per-mm oom disable since c9f01245 ("oom: remove oom_disable_count"). The task->mm check is even not correct because the current thread might have exited but the thread group might be still alive - e.g. thread group leader would lead that echo $VAL > /proc/pid/oom_score_adj would always fail with EINVAL while /proc/pid/task/$other_tid/oom_score_adj would succeed. This is unexpected at best. Remove the lock along with the check to fix the unexpected behavior and also because there is not real need for the lock in the first place. Link: http://lkml.kernel.org/r/1466426628-15074-2-git-send-email-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 7月, 2016 1 次提交
-
-
由 Kirill A. Shutemov 提交于
Let's add ShmemHugePages and ShmemPmdMapped fields into meminfo and smaps. It indicates how many times we allocate and map shmem THP. NR_ANON_TRANSPARENT_HUGEPAGES is renamed to NR_ANON_THPS. Link: http://lkml.kernel.org/r/1466021202-61880-27-git-send-email-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 6月, 2016 3 次提交
-
-
由 Eric W. Biederman 提交于
Introduce a function may_open_dev that tests MNT_NODEV and a new superblock flab SB_I_NODEV. Use this new function in all of the places where MNT_NODEV was previously tested. Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those filesystems should never support device nodes, and a simple superblock flags makes that very hard to get wrong. With SB_I_NODEV set if any device nodes somehow manage to show up on on a filesystem those device nodes will be unopenable. Acked-by: NSeth Forshee <seth.forshee@canonical.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Move the call of get_pid_ns, the call of proc_parse_options, and the setting of s_iflags into proc_fill_super so that mount_ns can be used. Convert proc_mount to call mount_ns and remove the now unnecessary code. Acked-by: NSeth Forshee <seth.forshee@canonical.com> Reviewed-by: NDjalal Harouni <tixxdz@gmail.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Replace the call of fs_fully_visible in do_new_mount from before the new superblock is allocated with a call of mount_too_revealing after the superblock is allocated. This winds up being a much better location for maintainability of the code. The first change this enables is the replacement of FS_USERNS_VISIBLE with SB_I_USERNS_VISIBLE. Moving the flag from struct filesystem_type to sb_iflags on the superblock. Unfortunately mount_too_revealing fundamentally needs to touch mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate times. If the mnt_flags did not need to be touched the code could be easily moved into the filesystem specific mount code. Acked-by: NSeth Forshee <seth.forshee@canonical.com> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
- 11 6月, 2016 2 次提交
-
-
由 Linus Torvalds 提交于
We always mixed in the parent pointer into the dentry name hash, but we did it late at lookup time. It turns out that we can simplify that lookup-time action by salting the hash with the parent pointer early instead of late. A few other users of our string hashes also wanted to mix in their own pointers into the hash, and those are updated to use the same mechanism. Hash users that don't have any particular initial salt can just use the NULL pointer as a no-salt. Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jann Horn 提交于
This prevents stacking filesystems (ecryptfs and overlayfs) from using procfs as lower filesystem. There is too much magic going on inside procfs, and there is no good reason to stack stuff on top of procfs. (For example, procfs does access checks in VFS open handlers, and ecryptfs by design calls open handlers from a kernel thread that doesn't drop privileges or so.) Signed-off-by: NJann Horn <jannh@google.com> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 5月, 2016 1 次提交
-
-
由 Michal Hocko 提交于
CLEAR_REFS_MM_HIWATER_RSS and CLEAR_REFS_SOFT_DIRTY are relying on mmap_sem for write. If the waiting task gets killed by the oom killer and it would operate on the current's mm it would block oom_reaper from asynchronous address space reclaim and reduce the chances of timely OOM resolving. Wait for the lock in the killable mode and return with EINTR if the task got killed while waiting. This will also expedite the return to the userspace and do_exit even if the mm is remote. Signed-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Petr Cermak <petrcermak@chromium.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 5月, 2016 2 次提交
-
-
由 Janis Danisevskis 提交于
The PR_DUMPABLE flag causes the pid related paths of the proc file system to be owned by ROOT. The implementation of pthread_set/getname_np however needs access to /proc/<pid>/task/<tid>/comm. If PR_DUMPABLE is false this implementation is locked out. This patch installs a special permission function for the file "comm" that grants read and write access to all threads of the same group regardless of the ownership of the inode. For all other threads the function falls back to the generic inode permission check. [akpm@linux-foundation.org: fix spello in comment] Signed-off-by: NJanis Danisevskis <jdanis@google.com> Acked-by: NKees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Rientjes <rientjes@google.com> Cc: Minfei Huang <mnfhuang@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Jann Horn <jann@thejh.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Richard W.M. Jones 提交于
It's not possible to read the process umask without also modifying it, which is what umask(2) does. A library cannot read umask safely, especially if the main program might be multithreaded. Add a new status line ("Umask") in /proc/<PID>/status. It contains the file mode creation mask (umask) in octal. It is only shown for tasks which have task->fs. This patch is adapted from one originally written by Pierre Carrier. The use case is that we have endless trouble with people setting weird umask() values (usually on the grounds of "security"), and then everything breaking. I'm on the hook to fix these. We'd like to add debugging to our program so we can dump out the umask in debug reports. Previous versions of the patch used a syscall so you could only read your own umask. That's all I need. However there was quite a lot of push-back from those, so this new version exports it in /proc. See: https://lkml.org/lkml/2016/4/13/704 [umask2] https://lkml.org/lkml/2016/4/13/487 [getumask] Signed-off-by: NRichard W.M. Jones <rjones@redhat.com> Acked-by: NKonstantin Khlebnikov <koct9i@gmail.com> Acked-by: NJerome Marchand <jmarchan@redhat.com> Acked-by: NKees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pierre Carrier <pierre@spotify.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 5月, 2016 1 次提交
-
-
由 Joonsoo Kim 提交于
Many developers already know that field for reference count of the struct page is _count and atomic type. They would try to handle it directly and this could break the purpose of page reference count tracepoint. To prevent direct _count modification, this patch rename it to _refcount and add warning message on the code. After that, developer who need to handle reference count will find that field should not be accessed directly. [akpm@linux-foundation.org: fix comments, per Vlastimil] [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too] [sfr@canb.auug.org.au: sync ethernet driver changes] Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Sunil Goutham <sgoutham@cavium.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Manish Chopra <manish.chopra@qlogic.com> Cc: Yuval Mintz <yuval.mintz@qlogic.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 5月, 2016 1 次提交
-
-
由 Daniel Wagner 提交于
parse_crash_elf{32|64}_headers will check the headers via the elf_check_arch respectively vmcore_elf64_check_arch macro. The MIPS architecture implements those two macros differently. In order to make the differentiation more explicit, let's introduce an vmcore_elf32_check_arch to allow the archs to overwrite it. Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de> Suggested-by: NMaciej W. Rozycki <macro@imgtec.com> Reviewed-by: NMaciej W. Rozycki <macro@imgtec.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12535/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
- 10 5月, 2016 1 次提交
-
-
由 Robin Humble 提交于
This reverts the 4.6-rc1 commit 7e2bc81d ("proc/base: make prompt shell start from new line after executing "cat /proc/$pid/wchan") because it breaks /proc/$PID/whcan formatting in ps and top. Revert also because the patch is inconsistent - it adds a newline at the end of only the '0' wchan, and does not add a newline when /proc/$PID/wchan contains a symbol name. eg. $ ps -eo pid,stat,wchan,comm PID STAT WCHAN COMMAND ... 1189 S - dbus-launch 1190 Ssl 0 dbus-daemon 1198 Sl 0 lightdm 1299 Ss ep_pol systemd 1301 S - (sd-pam) 1304 Ss wait sh Signed-off-by: NRobin Humble <plaguedbypenguins@gmail.com> Cc: Minfei Huang <mnfhuang@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 5月, 2016 1 次提交
-
-
由 Mathias Krause 提交于
If /proc/<PID>/environ gets read before the envp[] array is fully set up in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to read more bytes than are actually written, as env_start will already be set but env_end will still be zero, making the range calculation underflow, allowing to read beyond the end of what has been written. Fix this as it is done for /proc/<PID>/cmdline by testing env_end for zero. It is, apparently, intentionally set last in create_*_tables(). This bug was found by the PaX size_overflow plugin that detected the arithmetic underflow of 'this_len = env_end - (env_start + src)' when env_end is still zero. The expected consequence is that userland trying to access /proc/<PID>/environ of a not yet fully set up process may get inconsistent data as we're in the middle of copying in the environment variables. Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363 Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461Signed-off-by: NMathias Krause <minipli@googlemail.com> Cc: Emese Revfy <re.emese@gmail.com> Cc: Pax Team <pageexec@freemail.hu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 5月, 2016 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
make it usable with directory locked shared Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... making it usable with directory locked shared Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 4月, 2016 1 次提交
-
-
由 Gerald Schaefer 提交于
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong because the layouts may differ depending on the architecture. On s390 this will lead to inaccurate numa_maps accounting in /proc because of misguided pte_present() and pte_dirty() checks on the fake pte. On other architectures pte_present() and pte_dirty() may work by chance, but there may be an issue with direct-access (dax) mappings w/o underlying struct pages when HAVE_PTE_SPECIAL is set and THP is available. In vm_normal_page() the fake pte will be checked with pte_special() and because there is no "special" bit in a pmd, this will always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be skipped. On dax mappings w/o struct pages, an invalid struct page pointer would then be returned that can crash the kernel. This patch fixes the numa_maps THP handling by introducing new "_pmd" variants of the can_gather_numa_stats() and vm_normal_page() functions. Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [4.3+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-