- 30 Jun, 2016 1 commit
-
-
Committed by Miklos Szeredi
Add missing documentation for the d_op->d_real() method and d_real() helper. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
- 06 Jun, 2016 1 commit
-
-
Committed by Eric W. Biederman
The /dev/ptmx device node is changed to look up the directory entry "pts" in the same directory as the /dev/ptmx device node was opened in. If there is a "pts" entry and that entry is a devpts filesystem, /dev/ptmx uses that filesystem. Otherwise the open of /dev/ptmx fails. The DEVPTS_MULTIPLE_INSTANCES configuration option is removed, so that userspace can now safely depend on each mount of devpts creating a new instance of the filesystem. Each mount of devpts is now a separate and equal filesystem. Reserved ttys are now available to all instances of devpts where the mounter is in the initial mount namespace. A new vfs helper path_pts is introduced that finds a directory entry named "pts" in the directory of the passed in path, and changes the passed in path to point to it. The helper path_pts uses a function path_parent_directory that was factored out of follow_dotdot. In the implementation of devpts:
  - devpts_mnt is killed as it is no longer meaningful if all mounts of devpts are equal.
  - pts_sb_from_inode is replaced by just inode->i_sb as all cached inodes in the tty layer are now from the devpts filesystem.
  - devpts_add_ref is rolled into the new function devpts_ptmx, and the unnecessary inode hold is removed.
  - devpts_del_ref is renamed devpts_release and reduced to just a deactivate_super.
  - The newinstance mount option continues to be accepted but is now ignored.
In devpts_fs.h definitions for when !CONFIG_UNIX98_PTYS are removed as they are never used. Documentation/filesystems/devices.txt is updated to describe the current situation. This has been verified to work properly on openwrt-15.05, centos5, centos6, centos7, debian-6.0.2, debian-7.9, debian-8.2, ubuntu-14.04.3, ubuntu-15.10, fedora23, magia-5, mint-17.3, opensuse-42.1, slackware-14.1, gentoo-20151225 (13.0?), archlinux-2015-12-01.
One caveat: on centos6 and on slackware-14.1 there wind up being two instances of the devpts filesystem mounted on /dev/pts, and the lower copy does not end up getting used. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg KH <greg@kroah.com> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
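The lookup rule described above (find the entry named "pts" next to the ptmx node that was opened) can be illustrated in userspace terms. This is only a sketch of the naming rule using string paths; the kernel helper path_pts works on the opened path object, and the function name `sibling_pts` and the chroot path are hypothetical.

```python
import posixpath


def sibling_pts(ptmx_path):
    """Return the path of the "pts" entry in the directory containing ptmx_path.

    Illustrative only: this mirrors the naming rule from the commit message,
    not the kernel's path_pts() implementation.
    """
    parent = posixpath.dirname(ptmx_path)
    return posixpath.join(parent, "pts")


if __name__ == "__main__":
    print(sibling_pts("/dev/ptmx"))
    print(sibling_pts("/srv/chroot/dev/ptmx"))  # hypothetical chroot layout
```

Under this rule a ptmx node inside a chroot or container resolves to the devpts instance mounted beside it rather than to a single global instance.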
-
- 28 May, 2016 1 commit
-
-
Committed by Al Viro
smack ->d_instantiate() uses ->setxattr(), so to be able to call it before we'd hashed the new dentry and attached it to inode, we need ->setxattr() instances getting the inode as an explicit argument rather than obtaining it from dentry. Similar change for ->getxattr() had been done in commit ce23e640. Unlike ->getxattr() (which is used by both selinux and smack instances of ->d_instantiate()) ->setxattr() is used only by smack one and unfortunately it got missed back then. Reported-by: Seung-Woo Kim <sw0312.kim@samsung.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 27 May, 2016 1 commit
-
-
Committed by Miklos Szeredi
Two "fixme" items are actually fixed now. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-
- 26 May, 2016 1 commit
-
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 24 May, 2016 1 commit
-
-
Committed by Ryusuke Konishi
To respond to a certain developer's request, this explicitly states that developers can reimplement the nilfs2 design for other operating systems to share data stored in that format. Link: http://lkml.kernel.org/r/1461935747-10380-7-git-send-email-konishi.ryusuke@lab.ntt.co.jp Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 21 May, 2016 1 commit
-
-
Committed by Richard W.M. Jones
It's not possible to read the process umask without also modifying it, which is what umask(2) does. A library cannot read umask safely, especially if the main program might be multithreaded. Add a new status line ("Umask") in /proc/<PID>/status. It contains the file mode creation mask (umask) in octal. It is only shown for tasks which have task->fs. This patch is adapted from one originally written by Pierre Carrier. The use case is that we have endless trouble with people setting weird umask() values (usually on the grounds of "security"), and then everything breaking. I'm on the hook to fix these. We'd like to add debugging to our program so we can dump out the umask in debug reports. Previous versions of the patch used a syscall so you could only read your own umask. That's all I need. However there was quite a lot of push-back from those, so this new version exports it in /proc. See: https://lkml.org/lkml/2016/4/13/704 [umask2] https://lkml.org/lkml/2016/4/13/487 [getumask] Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Acked-by: Konstantin Khlebnikov <koct9i@gmail.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pierre Carrier <pierre@spotify.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
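A library could read the new "Umask" line instead of calling umask(2). The following is a minimal sketch, assuming the line has the form `Umask:<whitespace><octal digits>` as the commit describes; the helper names `parse_umask` and `read_umask` and the sample text are illustrative, not part of the patch.

```python
def parse_umask(status_text):
    """Return the umask as an int from /proc/<PID>/status text, or None if absent."""
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key == "Umask":
            return int(value.strip(), 8)  # the value is printed in octal
    return None  # older kernels, or tasks without task->fs, have no Umask line


def read_umask(pid="self"):
    """Read the umask of a process without modifying it (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_umask(f.read())


if __name__ == "__main__":
    sample = "Name:\tcat\nUmask:\t0022\nState:\tR (running)\n"  # made-up sample
    print(oct(parse_umask(sample)))
```

Returning None when the line is missing lets a caller fall back gracefully on kernels that predate this patch.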
-
- 19 May, 2016 1 commit
-
-
Committed by Vishal Verma
In the truncate or hole-punch path in dax, we clear out sub-page ranges. If these sub-page ranges are sector aligned and sized, we can do the zeroing through the driver instead so that error-clearing is handled automatically. For sub-sector ranges, we still have to rely on clear_pmem and have the possibility of tripping over errors. Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
-
- 03 May, 2016 4 commits
-
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
New method: ->iterate_shared(). Same arguments as in ->iterate(), called with the directory locked only shared. Once all filesystems switch, the old one will be gone. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
ta-da! The main issue is the lack of down_write_killable(), so the places like readdir.c switched to plain inode_lock(); once killable variants of rwsem primitives appear, that'll be dealt with. lockdep side also might need more work. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
We'll need to verify that there's neither a hashed nor in-lookup dentry with desired parent/name before adding to in-lookup set. One possible solution would be to hold the parent's ->d_lock through both checks, but while the in-lookup set is relatively small at any time, dcache is not. And holding the parent's ->d_lock through something like __d_lookup_rcu() would suck too badly. So we leave the parent's ->d_lock alone, which means that we watch out for the following scenario:
  * we verify that there's no hashed match
  * existing in-lookup match gets hashed by another process
  * we verify that there's no in-lookup matches and decide that everything's fine.
Solution: per-directory kinda-sorta seqlock, bumped around the times we hash something that used to be in-lookup or move (and hash) something in place of in-lookup. Then the above would turn into
  * read the counter
  * do dcache lookup
  * if no matches found, check for in-lookup matches
  * if there had been none of those either, check if the counter has changed; repeat if it has.
The "kinda-sorta" part is due to the fact that we don't have much spare space in inode. There is a spare word (shared with i_bdev/i_cdev/i_pipe), so the counter part is not a problem, but spinlock is a different story. We could use the parent's ->d_lock, and it would be less painful in terms of contention, but for __d_add() it would be rather inconvenient to grab; we could do that (using lock_parent()), but... Fortunately, we can get serialization on the counter itself, and it might be a good idea in general; we can use cmpxchg() in a loop to get from even to odd and smp_store_release() from odd to even. This commit adds the counter and the update logic; the readers will be added in the next commit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
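The read, look up, recheck loop above can be modelled in a few lines. This is a toy, single-process sketch and not the kernel code: the class `DirSeq`, the two sets standing in for the dcache and the in-lookup set, and the use of a `threading.Lock` in place of cmpxchg()/smp_store_release() are all assumptions made for illustration. Rounding the initial read down to an even value, so that an update already in progress forces a retry, is likewise a choice of this sketch.

```python
import threading


class DirSeq:
    """Toy per-directory sequence counter: even = stable, odd = update in progress."""

    def __init__(self):
        self.count = 0
        self._lock = threading.Lock()  # stand-in for atomic cmpxchg()

    def start_update(self):
        # Writer side: spin until the counter is even, then make it odd.
        while True:
            with self._lock:
                if self.count % 2 == 0:
                    self.count += 1
                    return

    def end_update(self):
        # Writer side: odd -> even, publishing the update.
        with self._lock:
            self.count += 1


def hash_in_lookup(seq, hashed, in_lookup, name):
    """Move a name from the in-lookup set to the hashed set, bumping the counter."""
    seq.start_update()
    in_lookup.discard(name)
    hashed.add(name)
    seq.end_update()


def find_existing(seq, hashed, in_lookup, name):
    """Return "hashed", "in-lookup", or None, retrying if the counter moved."""
    while True:
        start = seq.count & ~1      # read the counter (rounded down to even)
        if name in hashed:          # dcache lookup
            return "hashed"
        if name in in_lookup:       # in-lookup check
            return "in-lookup"
        if seq.count == start:      # nothing was hashed in between
            return None
        # The counter changed: an entry may have moved between the sets; retry.


if __name__ == "__main__":
    seq, hashed, in_lookup = DirSeq(), set(), {"foo"}
    print(find_existing(seq, hashed, in_lookup, "foo"))
    hash_in_lookup(seq, hashed, in_lookup, "foo")
    print(find_existing(seq, hashed, in_lookup, "foo"), seq.count)
```

The point of the final counter check is that a reader can only conclude "no match" if no entry moved from the in-lookup set to the hash between its two checks.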
-
- 02 May, 2016 1 commit
-
-
Committed by Christoph Hellwig
Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually work, so eliminate the superfluous argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 28 Apr, 2016 1 commit
-
-
Committed by Kees Cook
This fixes several spelling mistakes in the Documentation/ tree, which are caught by checkpatch.pl's spell checking. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 11 Apr, 2016 1 commit
-
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 05 Apr, 2016 1 commit
-
-
Committed by Kirill A. Shutemov
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 Mar, 2016 2 commits
-
-
Committed by Maciej S. Szmigiero
FAT has long supported its own default file name encoding config setting, separate from CONFIG_NLS_DEFAULT. However, if UTF-8 encoded file names are desired, the FAT character set should not be set to utf8, since this would make file names case sensitive even if case insensitive matching is requested. Instead, the "utf8" mount option should be provided to enable UTF-8 file names in the FAT file system. Unfortunately, there was no way to set the default value of this option, so on a UTF-8 system the "utf8" mount option had to be added manually to most FAT mounts. This patch adds a config option to set such a default value. Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Gang He
This document describes the OCFS2 online file check feature. OCFS2 is often used in high-availability systems. However, OCFS2 usually converts the filesystem to read-only when it encounters an error. This may not be necessary, since turning the filesystem read-only would affect other running processes as well, decreasing availability. For that reason a mount option (errors=continue) was introduced, which returns the -EIO errno to the calling process and terminates further processing so that the filesystem is not corrupted further. The filesystem is not converted to read-only, and the problematic file's inode number is reported in the kernel log. The user can try to check/fix this file via the online filecheck feature. Signed-off-by: Gang He <ghe@suse.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 Mar, 2016 2 commits
-
-
Committed by Christoph Hellwig
This is a simple extension to the block layout driver to use SCSI persistent reservations for access control and fencing, as well as SCSI VPD pages for device identification. For this we need to pass the nfs4_client to the proc_getdeviceinfo method to generate the reservation key, and add a new fence_client method to allow for fence actions in the layout driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by John Stultz
This patch provides a proc/PID/timerslack_ns interface which exposes a task's timerslack value in nanoseconds and allows it to be changed. This allows power/performance management software to set timer slack for other threads according to its policy for the thread (such as when the thread is designated foreground vs. background activity). If the value written is non-zero, slack is set to that value. Otherwise it is set to the default for the thread. This interface checks that the calling task has permission to use PTRACE_MODE_ATTACH_FSCREDS on the target task, so that we can ensure arbitrary apps do not change the timer slack for other apps. Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Kees Cook <keescook@chromium.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oren Laadan <orenl@cellrox.com> Cc: Ruchi Kandoi <kandoiruchi@google.com> Cc: Rom Lemarchand <romlem@android.com> Cc: Android Kernel Team <kernel-team@android.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
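A management daemon might drive this interface roughly as follows. This is a hedged sketch: it assumes the file lives at /proc/<PID>/timerslack_ns and holds a plain decimal number of nanoseconds, and the helper names are hypothetical. Writing requires the ptrace-attach permission described above, so the write helper may raise PermissionError.

```python
def timerslack_path(pid):
    """Path of the timer slack file for a task (assumed layout)."""
    return f"/proc/{pid}/timerslack_ns"


def read_timerslack_ns(pid):
    """Return the task's current timer slack in nanoseconds."""
    with open(timerslack_path(pid)) as f:
        return int(f.read().strip())


def write_timerslack_ns(pid, ns):
    """Set the task's timer slack; per the commit text, 0 restores the default."""
    if ns < 0:
        raise ValueError("timer slack must be non-negative")
    with open(timerslack_path(pid), "w") as f:
        f.write(str(ns))


if __name__ == "__main__":
    print(timerslack_path(1234))  # 1234 is an arbitrary example PID
```

For example, a policy daemon could call `write_timerslack_ns(pid, 50_000_000)` when a task moves to the background and `write_timerslack_ns(pid, 0)` when it returns to the foreground; the 50 ms value is purely illustrative.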
-
- 12 Mar, 2016 1 commit
-
-
Committed by David Sterba
The document in the kernel sources is yet another place where the documentation would need to be updated, while it is not the primary source. We actively maintain the wiki pages. Signed-off-by: David Sterba <dsterba@suse.com>
-
- 10 Mar, 2016 2 commits
-
-
Committed by Masanari Iida
This patch fixes spelling typos found in Documentation/filesystems/nfs. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
Committed by Javi Merino
Some minor typos:
  - make is unbindable -> make it unbindable
  - a underlying -> an underlying
  - different version -> different versions
Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Javi Merino <javi.merino@arm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 08 Mar, 2016 1 commit
-
-
Committed by Konstantin Khlebnikov
The logic was changed in kernel 3.4 by commit e9aba515 ("tty: rework pty count limiting") but is still not documented. Sysctl kernel.pty.max works as a global limit; kernel.pty.reserve ptys are reserved for the initial devpts instance (mounted without "newinstance"). A per-instance limit can also be set with the mount option "max=%d". Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 Mar, 2016 1 commit
-
-
Committed by Christoph Hellwig
Replace the current NULL-terminated array of default groups with a linked list. This gets rid of lots of nasty code to size and/or dynamically allocate the array. While we're at it also provide a convenient helper to remove the default groups. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Felipe Balbi <balbi@kernel.org> [drivers/usb/gadget] Acked-by: Joel Becker <jlbec@evilplan.org> Acked-by: Nicholas Bellinger <nab@linux-iscsi.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
-
- 27 Feb, 2016 1 commit
-
-
Committed by Mike Marshall
Al Viro has cleaned up the way ops are processed and waited for; orangefs.txt now has an overview of how it works. Several recent related commits have added to the comments in the code as well. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
- 12 Feb, 2016 2 commits
-
-
Committed by Qu Wenruo
Introduce a new mount option "nologreplay" to co-operate with the "ro" mount option to get a real readonly mount, like "norecovery" in ext* and xfs. Since the new parse_options() needs to check new flags at remount time, add a new parameter for parse_options(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Tested-by: Austin S. Hemmelgarn <ahferroin7@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
-
Committed by Qu Wenruo
The current "recovery" mount option will only try to use the backup root. However, the word "recovery" is too generic and may be confusing for some users. Introduce a new and more specific mount option, "usebackuproot", to replace the "recovery" mount option. "recovery" will be kept for compatibility reasons, but will be deprecated. Also, since "usebackuproot" only affects mount behavior and has nothing to do with the filesystem after open_ctree(), clear the flag after the mount has succeeded. This provides the basis for a later unified "norecovery" mount option. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> [ dropped usebackuproot from show_mount, added note about 'recovery' to docs ] Signed-off-by: David Sterba <dsterba@suse.com>
-
- 11 Feb, 2016 1 commit
-
-
Committed by Peter Jones
"rm -rf" is bricking some peoples' laptops because of variables being used to store non-reinitializable firmware driver data that's required to POST the hardware. These are 100% bugs, and they need to be fixed, but in the mean time it shouldn't be easy to *accidentally* brick machines. We have to have delete working, and picking which variables do and don't work for deletion is quite intractable, so instead make everything immutable by default (except for a whitelist), and make tools that aren't quite so broad-spectrum unset the immutable flag. Signed-off-by: Peter Jones <pjones@redhat.com> Tested-by: Lee, Chun-Yi <jlee@suse.com> Acked-by: Matthew Garrett <mjg59@coreos.com> Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
-
- 04 Feb, 2016 2 commits
-
-
Committed by Konstantin Khlebnikov
  * add VM_STACK as alias for VM_GROWSUP/DOWN depending on architecture
  * always account VMAs with flag VM_STACK as stack (as it was before)
  * cleanup classifying helpers
  * update comments and documentation
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Tested-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Johannes Weiner
Commit b7643757 ("procfs: mark thread stack correctly in proc/<pid>/maps") added [stack:TID] annotation to /proc/<pid>/maps. Finding the task of a stack VMA requires walking the entire thread list, turning this into quadratic behavior: a thousand threads means a thousand stacks, so the rendering of /proc/<pid>/maps needs to look at a million combinations. The cost is not in proportion to the usefulness as described in the patch. Drop the [stack:TID] annotation to make /proc/<pid>/maps (and /proc/<pid>/numa_maps) usable again for higher thread counts. The [stack] annotation inside /proc/<pid>/task/<tid>/maps is retained, as identifying the stack VMA there is an O(1) operation. Siddhesh said: "The end users needed a way to identify thread stacks programmatically and there wasn't a way to do that. I'm afraid I no longer remember (or have access to the resources that would aid my memory since I changed employers) the details of their requirement. However, I did do this on my own time because I thought it was an interesting project for me and nobody really gave any feedback then as to its utility, so as far as I am concerned you could roll back the main thread maps information since the information is available in the thread-specific files" Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Cc: Shaohua Li <shli@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
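With [stack:TID] gone from the process-wide file, a tool that wants a thread's stack range would read the per-thread file instead. The sketch below assumes the usual maps line layout (address range first, optional name last); the function name and the sample lines are made up for illustration.

```python
def find_stack_range(maps_text):
    """Return (start, end) of the mapping annotated [stack], or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "[stack]":
            start, end = fields[0].split("-")
            return int(start, 16), int(end, 16)
    return None


def read_thread_stack(pid, tid):
    """Look up the stack mapping in /proc/<pid>/task/<tid>/maps (Linux only)."""
    with open(f"/proc/{pid}/task/{tid}/maps") as f:
        return find_stack_range(f.read())


if __name__ == "__main__":
    # Hypothetical excerpt of a maps file.
    sample = (
        "00400000-0040b000 r-xp 00000000 08:01 131 /bin/cat\n"
        "7ffc0a1e0000-7ffc0a201000 rw-p 00000000 00:00 0 [stack]\n"
    )
    start, end = find_stack_range(sample)
    print(hex(start), hex(end), (end - start) // 1024, "kB")
```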
-
- 21 Jan, 2016 1 commit
-
-
Committed by Namjae Jeon
Update the limitation for fat fallocate. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 15 Jan, 2016 6 commits
-
-
Committed by Rodrigo Freire
The Shared Memory accounting support has been present in the kernel since commit 4b02108a ("mm: oom analysis: add shmem vmstat") and in userland free(1) since 2014. This patch updates the Documentation to reflect this change. Signed-off-by: Rodrigo Freire <rfreire@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Jerome Marchand
There are several shortcomings with the accounting of shared memory (SysV shm, shared anonymous mapping, mapping of a tmpfs file). The values in /proc/<pid>/status and <...>/statm don't allow distinguishing between shmem memory and a shared mapping to a regular file, even though their implications on memory usage are quite different: during reclaim, file mapping can be dropped or written back on disk, while shmem needs a place in swap. Also, to distinguish the memory occupied by anonymous and file mappings, one has to read the /proc/pid/statm file, which has a field for the file mappings (again, including shmem) and total memory occupied by these mappings (i.e. equivalent to VmRSS in the <...>/status file). Getting the value for anonymous mappings only is thus not exactly user-friendly (the statm file is intended to be rather efficiently machine-readable). To address both of these shortcomings, this patch adds a breakdown of VmRSS in /proc/<pid>/status via new fields RssAnon, RssFile and RssShmem, making use of the previous preparatory patch. These fields tell the user the memory occupied by private anonymous pages, mapped regular files and shmem, respectively. Other existing fields in /status and /statm files are left without change. The /statm file can be extended in the future, if there's a need for that.
Example (part of) /proc/pid/status output including the new Rss* fields:
  VmPeak: 2001008 kB
  VmSize: 2001004 kB
  VmLck: 0 kB
  VmPin: 0 kB
  VmHWM: 5108 kB
  VmRSS: 5108 kB
  RssAnon: 92 kB
  RssFile: 1324 kB
  RssShmem: 3692 kB
  VmData: 192 kB
  VmStk: 136 kB
  VmExe: 4 kB
  VmLib: 1784 kB
  VmPTE: 3928 kB
  VmPMD: 20 kB
  VmSwap: 0 kB
  HugetlbPages: 0 kB
[vbabka@suse.cz: forward-porting, tweak changelog] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
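The breakdown can be checked against the figures quoted in the example output. The following sketch parses "Name: value kB" lines and confirms that the three Rss* fields add up to VmRSS for that sample; the parser name `parse_status_kb` is an illustrative choice, and real status files contain many lines that this parser simply skips.

```python
def parse_status_kb(status_text):
    """Collect the 'Name: <number> kB' fields of /proc/<pid>/status into a dict."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB":
            fields[key.strip()] = int(parts[0])
    return fields


# Values copied from the example output in the commit message.
SAMPLE = """\
VmRSS: 5108 kB
RssAnon: 92 kB
RssFile: 1324 kB
RssShmem: 3692 kB
VmSwap: 0 kB
"""

if __name__ == "__main__":
    f = parse_status_kb(SAMPLE)
    breakdown = f["RssAnon"] + f["RssFile"] + f["RssShmem"]
    print(breakdown, f["VmRSS"])
```

In the quoted sample, 92 + 1324 + 3692 = 5108 kB, which matches the VmRSS line.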
-
Committed by Vlastimil Babka
Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed mappings, even if the mapped portion does contain pages that were swapped out. This is because unlike private anonymous mappings, shmem does not change pte to swap entry, but pte_none when swapping the page out. In the smaps page walk, such page thus looks like it was never faulted in. This patch changes smaps_pte_entry() to determine the swap status for such pte_none entries for shmem mappings, similarly to how mincore_page() does it. Swapped out shmem pages are thus accounted for. For private mappings of tmpfs files that COWed some of the pages, swapped out status of the original shmem pages is naturally ignored. If some of the private copies were also swapped out, they are accounted via their page table swap entries, so the resulting reported swap usage is then a sum of both swapped out private copies, and swapped out shmem pages that were not COWed. No double accounting can thus happen. The accounting is arguably still not as precise as for private anonymous mappings, since now we will count also pages that the process in question never accessed, but another process populated them and then let them become swapped out. I believe it is still less confusing and subtle than not showing any swap usage by shmem mappings at all. The swapped out counter might be of interest to users who would like to prevent future swapins during performance critical operation and pre-fault them at their convenience. Especially for larger swapped out regions the cost of swapin is much higher than a fresh page allocation. So a differentiation between pte_none vs. swapped out is important for those usecases. One downside of this patch is that it makes /proc/pid/smaps more expensive for shmem mappings, as we consult the radix tree for each pte_none entry, so the overall complexity is O(n*log(n)).
I have measured this on a process that creates a 2GB mapping and dirties single pages with a stride of 2MB, and timed how long it takes to cat /proc/pid/smaps of this process 100 times.
  Private anonymous mapping: real 0m0.949s, user 0m0.116s, sys 0m0.348s
  Mapping of a /dev/shm/file: real 0m3.831s, user 0m0.180s, sys 0m3.212s
The difference is rather substantial, so the next patch will reduce the cost for shared or read-only mappings. In a less controlled experiment, I've gathered pids of processes on my desktop that have either '/dev/shm/*' or 'SYSV*' in smaps. This included the Chrome browser and some KDE processes. Again, I've run cat /proc/pid/smaps on each 100 times.
  Before this patch: real 0m9.050s, user 0m0.518s, sys 0m8.066s
  After this patch: real 0m9.221s, user 0m0.541s, sys 0m8.187s
This suggests low impact on average systems. Note that this patch doesn't attempt to adjust the SwapPss field for shmem mappings, which would need extra work to determine who else could have the pages mapped. Thus the value stays zero except for COWed swapped out pages in a shmem mapping, which are accounted as usual. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
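Once shmem mappings report a real "Swap:" value, a monitoring script can total swap usage across mappings by summing those lines. This is a minimal sketch, assuming each mapping in smaps carries a line of the form `Swap: <n> kB`; the function name and the sample excerpt are invented for illustration. Lines starting with "SwapPss:" are deliberately not matched.

```python
def total_swap_kb(smaps_text):
    """Sum the per-mapping 'Swap:' values (in kB) from /proc/<pid>/smaps text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Swap:"):  # does not match "SwapPss:"
            total += int(line.split()[1])
    return total


def read_total_swap_kb(pid="self"):
    """Total swap reported across all mappings of a process (Linux only)."""
    with open(f"/proc/{pid}/smaps") as f:
        return total_swap_kb(f.read())


if __name__ == "__main__":
    # Hypothetical smaps excerpt with two mappings.
    sample = (
        "7f0000000000-7f0080000000 rw-s 00000000 00:14 42 /dev/shm/file\n"
        "Rss:                  64 kB\n"
        "Swap:                128 kB\n"
        "SwapPss:               0 kB\n"
        "00400000-0040b000 r-xp 00000000 08:01 131 /bin/cat\n"
        "Rss:                  44 kB\n"
        "Swap:                  0 kB\n"
        "SwapPss:               0 kB\n"
    )
    print(total_swap_kb(sample), "kB")
```

As the commit notes, this sum can include shmem pages the process never touched itself, so it is an upper bound on what the process would have to swap in rather than an exact private figure.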
-
Committed by Vlastimil Babka
This series is based on Jerome Marchand's [1] so let me quote the first paragraph from there: There are several shortcomings with the accounting of shared memory (sysV shm, shared anonymous mapping, mapping to a tmpfs file). The values in /proc/<pid>/status and statm don't allow distinguishing between shmem memory and a shared mapping to a regular file, even though their implications on memory usage are quite different: at reclaim, file mapping can be dropped or written back on disk while shmem needs a place in swap. As for shmem pages that are swapped-out or in swap cache, they aren't accounted at all. The original motivation for myself is that a customer found it (IMHO rightfully) confusing that e.g. top output for process swap usage is unreliable with respect to swapped out shmem pages, which are not accounted for. The fundamental difference between private anonymous and shmem pages is that the latter have their PTEs converted to pte_none, and not swapents. As such, they are not accounted to the number of swapents visible e.g. in the /proc/pid/status VmSwap row. It might be theoretically possible to use swapents when swapping out shmem (without extra cost, as one has to change all mappers anyway), and on swap in only convert the swapent for the faulting process, leaving swapents in other processes until they also fault (so again no extra cost). But I don't know how many assumptions this would break, and it would be too disruptive a change for a relatively small benefit. Instead, my approach is to document the limitation of VmSwap, and provide means to determine the swap usage for shmem areas for those who are interested and willing to pay the price, using /proc/pid/smaps. Because outside of ipcs, I don't think it's currently possible to determine the usage at all. The previous patchset [1] did introduce new shmem-specific fields into smaps output, and functions to determine the values.
I take a simpler approach, noting that smaps output already has a "Swap: X kB" line, where currently X == 0 always for shmem areas. I think we can just consider this a bug and provide the proper value by consulting the radix tree, as e.g. mincore_page() does. In the patch changelog I explain why this is also not perfect (and cannot be without swapents), but still arguably much better than showing a 0. The last two patches are adapted from Jerome's patchset and provide a VmRSS breakdown to RssAnon, RssFile and RssShmem in /proc/pid/status. Hugh noted that this is a welcome addition, and I agree that it might help e.g. debugging process memory usage at an albeit non-zero, but still rather low cost of an extra per-mm counter and some page flag checks. [1] http://lwn.net/Articles/611966/ This patch (of 6): The documentation for /proc/pid/status does not mention that the value of VmSwap counts only swapped out anonymous private pages, and not swapped out pages of the underlying shmem objects (for shmem mappings). This is not obvious, so document this limitation. Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Al Viro
inode_nohighmem() is sufficient to make sure that page_get_link() won't try to allocate a highmem page. Moreover, it is sufficient to make sure that page_symlink/__page_symlink won't do the same thing. However, any filesystem that manually preseeds the symlink's page cache upon symlink(2) needs to make sure that the page it inserts there won't be a highmem one. Fortunately, only nfs and shmem have run afoul of that... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by SeongJae Park
The site for libhugetlbfs has moved from sourceforge to github. This commit updates the old url. Signed-off-by: SeongJae Park <sj38.park@gmail.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-
- 14 Jan, 2016 1 commit
-
-
Committed by Mike Marshall
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
-
- 04 Jan, 2016 1 commit
-
-
Committed by Pantelis Antoniou
ConfigFS lacked binary attributes up until now. This patch introduces support for binary attributes in a manner somewhat similar to sysfs binary attributes, albeit with changes that fit the configfs usage model. Problems that configfs binary attributes fix are everything that requires a binary blob as part of the configuration of a resource, such as bitstream loading for FPGAs, DTBs for dynamically created devices etc. Look at Documentation/filesystems/configfs/configfs.txt for internals and how to use them. This patch is against linux-next as of today that contains Christoph's configfs rework. Signed-off-by: Pantelis Antoniou <pantelis.antoniou@konsulko.com> [hch: folded a fix from Geert Uytterhoeven <geert+renesas@glider.be>] [hch: a few tiny updates based on review feedback] Signed-off-by: Christoph Hellwig <hch@lst.de>
-