- 17 12月, 2009 2 次提交
-
-
由 Al Viro 提交于
There are 2 groups of alloc_file() callers: * ones that are followed by ima_counts_get * ones giving non-regular files So let's pull that ima_counts_get() into alloc_file(); it's a no-op in case of non-regular files. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and have the caller grab both mnt and dentry; kill leak in infiniband, while we are at it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 12月, 2009 1 次提交
-
-
由 Al Viro 提交于
... we should call mm ->get_unmapped_area() instead and let our caller do the final checks. Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 9月, 2009 1 次提交
-
-
由 Alexey Dobriyan 提交于
* mark struct vm_area_struct::vm_ops as const * mark vm_ops in AGP code But leave TTM code alone, something is fishy there with global vm_ops being used. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 9月, 2009 1 次提交
-
-
由 Eric B Munson 提交于
This patchset adds a flag to mmap that allows the user to request that an anonymous mapping be backed with huge pages. This mapping will borrow functionality from the huge page shm code to create a file on the kernel internal mount and use it to approximate an anonymous mapping. The MAP_HUGETLB flag is a modifier to MAP_ANONYMOUS and will not work without both flags being preset. A new flag is necessary because there is no other way to hook into huge pages without creating a file on a hugetlbfs mount which wouldn't be MAP_ANONYMOUS. To userspace, this mapping will behave just like an anonymous mapping because the file is not accessible outside of the kernel. This patchset is meant to simplify the programming model. Presently there is a large chunk of boiler platecode, contained in libhugetlbfs, required to create private, hugepage backed mappings. This patch set would allow use of hugepages without linking to libhugetlbfs or having hugetblfs mounted. Unification of the VM code would provide these same benefits, but it has been resisted each time that it has been suggested for several reasons: it would break PAGE_SIZE assumptions across the kernel, it makes page-table abstractions really expensive, and it does not provide any benefit on architectures that do not support huge pages, incurring fast path penalties without providing any benefit on these architectures. This patch: There are two means of creating mappings backed by huge pages: 1. mmap() a file created on hugetlbfs 2. Use shm which creates a file on an internal mount which essentially maps it MAP_SHARED The internal mount is only used for shared mappings but there is very little that stops it being used for private mappings. This patch extends hugetlbfs_file_setup() to deal with the creation of files that will be mapped MAP_PRIVATE on the internal hugetlbfs mount. This extended API is used in a subsequent patch to implement the MAP_HUGETLB mmap() flag. Signed-off-by: NEric Munson <ebmunson@us.ibm.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 9月, 2009 1 次提交
-
-
由 Hugh Dickins 提交于
My 353d5c30 "mm: fix hugetlb bug due to user_shm_unlock call" broke the CONFIG_SYSVIPC !CONFIG_MMU build of both 2.6.31 and 2.6.30.6: "undefined reference to `user_shm_unlock'". gcc didn't understand my comment! so couldn't figure out to optimize away user_shm_unlock() from the error path in the hugetlb-less case, as it does elsewhere. Help it to do so, in a language it understands. Reported-by: NMike Frysinger <vapier@gentoo.org> Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 8月, 2009 1 次提交
-
-
由 Hugh Dickins 提交于
2.6.30's commit 8a0bdec1 removed user_shm_lock() calls in hugetlb_file_setup() but left the user_shm_unlock call in shm_destroy(). In detail: Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock() is not called in hugetlb_file_setup(). However, user_shm_unlock() is called in any case in shm_destroy() and in the following atomic_dec_and_lock(&up->__count) in free_uid() is executed and if up->__count gets zero, also cleanup_user_struct() is scheduled. Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set. However, the ref counter up->__count gets unexpectedly non-positive and the corresponding structs are freed even though there are live references to them, resulting in a kernel oops after a lots of shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles and CONFIG_USER_SCHED set. Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the time of shm_destroy() may give a different answer from at the time of hugetlb_file_setup(). And fixed newseg()'s no_id error path, which has missed user_shm_unlock() ever since it came in 2.6.9. Reported-by: NStefan Huber <shuber2@gmail.com> Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Tested-by: NStefan Huber <shuber2@gmail.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 6月, 2009 1 次提交
-
-
由 Mike Frysinger 提交于
The massive nommu update (8feae131) resulted in these warnings: ipc/shm.c: In function `sys_shmdt': ipc/shm.c:974: warning: unused variable `size' ipc/shm.c:972: warning: unused variable `next' Signed-off-by: NMike Frysinger <vapier@gentoo.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 5月, 2009 2 次提交
-
-
由 Mimi Zohar 提交于
Based on discussion on lkml (Andrew Morton and Eric Paris), move ima_counts_get down a layer into shmem/hugetlb__file_setup(). Resolves drm shmem_file_setup() usage case as well. HD comment: I still think you're doing this at the wrong level, but recognize that you probably won't be persuaded until a few more users of alloc_file() emerge, all wanting your ima_counts_get(). Resolving GEM's shmem_file_setup() is an improvement, so I'll say Acked-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: NMimi Zohar <zohar@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Mimi Zohar 提交于
- Add support in ima_path_check() for integrity checking without incrementing the counts. (Required for nfsd.) - rename and export opencount_get to ima_counts_get - replace ima_shm_check calls with ima_counts_get - export ima_path_check Signed-off-by: NMimi Zohar <zohar@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 03 4月, 2009 1 次提交
-
-
由 Tony Battersby 提交于
shm_get_stat() assumes idr_find(&shm_ids(ns).ipcs_idr) returns "struct shmid_kernel *"; all other callers assume that it returns "struct kern_ipc_perm *". This works because "struct kern_ipc_perm" is currently the first member of "struct shmid_kernel", but it would be better to use container_of() to prevent future breakage. Signed-off-by: NTony Battersby <tonyb@cybernetics.com> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2009 1 次提交
-
-
由 Mel Gorman 提交于
When overcommit is disabled, the core VM accounts for pages used by anonymous shared, private mappings and special mappings. It keeps track of VMAs that should be accounted for with VM_ACCOUNT and VMAs that never had a reserve with VM_NORESERVE. Overcommit for hugetlbfs is much riskier than overcommit for base pages due to contiguity requirements. It avoids overcommiting on both shared and private mappings using reservation counters that are checked and updated during mmap(). This ensures (within limits) that hugepages exist in the future when faults occurs or it is too easy to applications to be SIGKILLed. As hugetlbfs makes its own reservations of a different unit to the base page size, VM_ACCOUNT should never be set. Even if the units were correct, we would double account for the usage in the core VM and hugetlbfs. VM_NORESERVE may be set because an application can request no reserves be made for hugetlbfs at the risk of getting killed later. With commit fc8744ad, VM_NORESERVE and VM_ACCOUNT are getting unconditionally set for hugetlbfs-backed mappings. This breaks the accounting for both the core VM and hugetlbfs, can trigger an OOM storm when hugepage pools are too small lockups and corrupted counters otherwise are used. This patch brings hugetlbfs more in line with how the core VM treats VM_NORESERVE but prevents VM_ACCOUNT being set. Signed-off-by: NMel Gorman <mel@csn.ul.ie> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 06 2月, 2009 2 次提交
-
-
由 Mimi Zohar 提交于
The number of calls to ima_path_check()/ima_file_free() should be balanced. An extra call to fput(), indicates the file could have been accessed without first being measured. Although f_count is incremented/decremented in places other than fget/fput, like fget_light/fput_light and get_file, the current task must already hold a file refcnt. The call to __fput() is delayed until the refcnt becomes 0, resulting in ima_file_free() flagging any changes. - add hook to increment opencount for IPC shared memory(SYSV), shmat files, and /dev/zero - moved NULL iint test in opencount_get() Signed-off-by: NMimi Zohar <zohar@us.ibm.com> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 Tony Battersby 提交于
shm_get_stat() assumes that the inode is a "struct shmem_inode_info", which is incorrect for !CONFIG_SHMEM (see fs/ramfs/inode.c: ramfs_get_inode() vs. mm/shmem.c: shmem_get_inode()). This bad assumption can cause shmctl(SHM_INFO) to lockup when shm_get_stat() tries to spin_lock(&info->lock). Users of !CONFIG_SHMEM may encounter this lockup simply by invoking the 'ipcs' command. Reported by Jiri Olsa back in February 2008: http://lkml.org/lkml/2008/2/29/74Signed-off-by: NTony Battersby <tonyb@cybernetics.com> Cc: Jiri Kosina <jkosina@suse.cz> Reported-by: NJiri Olsa <olsajiri@gmail.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: <stable@kernel.org> [2.6.everything] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 2月, 2009 1 次提交
-
-
由 Linus Torvalds 提交于
The mmap_region() code would temporarily set the VM_ACCOUNT flag for anonymous shared mappings just to inform shmem_zero_setup() that it should enable accounting for the resulting shm object. It would then clear the flag after calling ->mmap (for the /dev/zero case) or doing shmem_zero_setup() (for the MAP_ANON case). This just resulted in vma merge issues, but also made for just unnecessary confusion. Use the already-existing VM_NORESERVE flag for this instead, and let shmem_{zero|file}_setup() just figure it out from that. This also happens to make it obvious that the new DRI2 GEM layer uses a non-reserving backing store for its object allocation - which is quite possibly not intentional. But since I didn't want to change semantics in this patch, I left it alone, and just updated the caller to use the new flag semantics. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 1月, 2009 1 次提交
-
-
由 Heiko Carstens 提交于
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
- 08 1月, 2009 1 次提交
-
-
由 David Howells 提交于
Make VMAs per mm_struct as for MMU-mode linux. This solves two problems: (1) In SYSV SHM where nattch for a segment does not reflect the number of shmat's (and forks) done. (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an exec'ing process when VM_EXECUTABLE is specified, regardless of the fact that a VMA might be shared and already have its vm_mm assigned to another process or a dead process. A new struct (vm_region) is introduced to track a mapped region and to remember the circumstances under which it may be shared and the vm_list_struct structure is discarded as it's no longer required. This patch makes the following additional changes: (1) Regions are now allocated with alloc_pages() rather than kmalloc() and with no recourse to __GFP_COMP, so the pages are not composite. Instead, each page has a reference on it held by the region. Anything else that is interested in such a page will have to get a reference on it to retain it. When the pages are released due to unmapping, each page is passed to put_page() and will be freed when the page usage count reaches zero. (2) Excess pages are trimmed after an allocation as the allocation must be made as a power-of-2 quantity of pages. (3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may end up with overlapping VMAs within the tree, the VMA struct address is appended to the sort key. (4) Non-anonymous VMAs are now added to the backing inode's prio list. (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of the backing region. The VMA and region structs will be split if necessary. (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory segment instead of all the attachments at that addresss. Multiple shmat()'s return the same address under NOMMU-mode instead of different virtual addresses as under MMU-mode. (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode. (8) /proc/maps is now the global list of mapped regions, and may list bits that aren't actually mapped anywhere. (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount of RAM currently allocated by mmap to hold mappable regions that can't be mapped directly. These are copies of the backing device or file if not anonymous. These changes make NOMMU mode more similar to MMU mode. The downside is that NOMMU mode requires some extra memory to track things over NOMMU without this patch (VMAs are no longer shared, and there are now region structs). Signed-off-by: NDavid Howells <dhowells@redhat.com> Tested-by: NMike Frysinger <vapier.adi@gmail.com> Acked-by: NPaul Mundt <lethal@linux-sh.org>
-
- 07 1月, 2009 1 次提交
-
-
由 WANG Cong 提交于
Use the macro shm_ids(). Remove useless check for a userspace pointer, because copy_to_user() will check it. Some style cleanups. Signed-off-by: NWANG Cong <wangcong@zeuux.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 1月, 2009 1 次提交
-
-
由 Al Viro 提交于
* get rid of allocations * make it return void * simplify callers Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 14 11月, 2008 3 次提交
-
-
由 David Howells 提交于
Wrap current->cred and a few other accessors to hide their actual implementation. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
由 David Howells 提交于
Wrap access to task credentials so that they can be separated more easily from the task_struct during the introduction of COW creds. Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id(). Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more sense to use RCU directly rather than a convenient wrapper; these will be addressed by later patches. Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NJames Morris <jmorris@namei.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NJames Morris <jmorris@namei.org>
-
- 21 10月, 2008 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 20 10月, 2008 1 次提交
-
-
由 Lee Schermerhorn 提交于
Shmem segments locked into memory via shmctl(SHM_LOCKED) should not be kept on the normal LRU, since scanning them is a waste of time and might throw off kswapd's balancing algorithms. Place them on the unevictable LRU list instead. Use the AS_UNEVICTABLE flag to mark address_space of SHM_LOCKed shared memory regions as unevictable. Then these pages will be culled off the normal LRU lists during vmscan. Add new wrapper function to clear the mapping's unevictable state when/if shared memory segment is munlocked. Add 'scan_mapping_unevictable_page()' to mm/vmscan.c to scan all pages in the shmem segment's mapping [struct address_space] for evictability now that they're no longer locked. If so, move them to the appropriate zone lru list. Changes depend on [CONFIG_]UNEVICTABLE_LRU. [kosaki.motohiro@jp.fujitsu.com: revert shm change] Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: NRik van Riel <riel@redhat.com> Signed-off-by: NKosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 7月, 2008 1 次提交
-
-
由 Nadia Derbey 提交于
Remove the ipc_lock_down() routines: they used to call idr_find() locklessly (given that the ipc ids lock was already held), so they are not needed anymore. Signed-off-by: NNadia Derbey <Nadia.Derbey@bull.net> Acked-by: N"Paul E. McKenney" <paulmck@us.ibm.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Jim Houston <jim.houston@comcast.net> Cc: Pierre Peiffer <peifferp@gmail.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 7月, 2008 1 次提交
-
-
由 Andi Kleen 提交于
The goal of this patchset is to support multiple hugetlb page sizes. This is achieved by introducing a new struct hstate structure, which encapsulates the important hugetlb state and constants (eg. huge page size, number of huge pages currently allocated, etc). The hstate structure is then passed around the code which requires these fields, they will do the right thing regardless of the exact hstate they are operating on. This patch adds the hstate structure, with a single global instance of it (default_hstate), and does the basic work of converting hugetlb to use the hstate. Future patches will add more hstate structures to allow for different hugetlbfs mounts to have different page sizes. [akpm@linux-foundation.org: coding-style fixes] Acked-by: NAdam Litke <agl@us.ibm.com> Acked-by: NNishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 6月, 2008 1 次提交
-
-
由 Paul Menage 提交于
sysvipc_shm_proc_show() picks between format strings (based on the expected maximum length of a SHM segment) in a way that prevents gcc from performing format checks on the seq_printf() parameters. This hid two format errors - shp->shm_segsz and shp->shm_nattach are both unsigned long, but were being printed as unsigned int and signed int respectively. This leads to 32-bit truncation of SHM segment sizes reported in /proc/sysvipc/shm. (And for nattach, but that's less of a problem for most users). This patch makes the format string directly visible to gcc's format specifier checker, and fixes the two broken format specifiers. Signed-off-by: NPaul Menage <menage@google.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 6月, 2008 1 次提交
-
-
由 Neil Horman 提交于
Found a silly double assignment of err is do_shmat. Silly, but good to clean up the useless code. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 4月, 2008 5 次提交
-
-
由 Pierre Peiffer 提交于
semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set of commands for each kind of IPC. They all start to do the same job (they retrieve the ipc and do some permission checks) before handling the commands on their own. This patch proposes to consolidate this by moving these same pieces of code into one common function called ipcctl_pre_down(). It simplifies a little these xxxctl_down() functions and increases a little the maintainability. Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pierre Peiffer 提交于
The IPC_SET command performs the same permission setting for all IPCs. This patch introduces a common ipc_update_perm() function to update these permissions and makes use of it for all IPCs. Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pierre Peiffer 提交于
All IPCs make use of an intermetiate *_setbuf structure to handle the IPC_SET command. This is not really needed and, moreover, it complicates a little bit the code. This patch gets rid of the use of it and uses directly the semid64_ds/ msgid64_ds/shmid64_ds structure. In addition of removing one struture declaration, it also simplifies and improves a little bit the common 64-bits path. Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pierre Peiffer 提交于
Currently, the way the different commands are handled in sys_shmctl introduces some duplicated code. This patch introduces the shmctl_down function to handle all the commands requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down for shared memory. This removes some duplicated code for handling these both commands and harmonizes the way they are handled among all IPCs. Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pierre Peiffer 提交于
By continuing to consolidate a little the IPC code, each id can be built directly in ipc_addid() instead of having it built from each callers of ipc_addid() And I also remove shm_addid() in order to have, as much as possible, the same code for shm/sem/msg. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 4月, 2008 2 次提交
-
-
由 Lee Schermerhorn 提交于
After further discussion with Christoph Lameter, it has become clear that my earlier attempts to clean up the mempolicy reference counting were a bit of overkill in some areas, resulting in superflous ref/unref in what are usually fast paths. In other areas, further inspection reveals that I botched the unref for interleave policies. A separate patch, suitable for upstream/stable trees, fixes up the known errors in the previous attempt to fix reference counting. This patch reworks the memory policy referencing counting and, one hopes, simplifies the code. Maybe I'll get it right this time. See the update to the numa_memory_policy.txt document for a discussion of memory policy reference counting that motivates this patch. Summary: Lookup of mempolicy, based on (vma, address) need only add a reference for shared policy, and we need only unref the policy when finished for shared policies. So, this patch backs out all of the unneeded extra reference counting added by my previous attempt. It then unrefs only shared policies when we're finished with them, using the mpol_cond_put() [conditional put] helper function introduced by this patch. Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma containing just the policy. read_swap_cache_async() can call alloc_page_vma() multiple times, so we can't let alloc_page_vma() unref the shared policy in this case. To avoid this, we make a copy of any non-null shared policy and remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading a page [or multiple pages] from swap, so the overhead should not be an issue here. I introduced a new static inline function "mpol_cond_copy()" to copy the shared policy to an on-stack policy and remove the flags that would require a conditional free. The current implementation of mpol_cond_copy() assumes that the struct mempolicy contains no pointers to dynamically allocated structures that must be duplicated or reference counted during copy. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
get_vma_policy() is not handling fallback to task policy correctly when the get_policy() vm_op returns NULL. The NULL overwrites the 'pol' variable that was holding the fallback task mempolicy. So, it was falling back directly to system default policy. Fix get_vma_policy() to use only non-NULL policy returned from the vma get_policy op. shm_get_policy() was falling back to current task's mempolicy if the "backing file system" [tmpfs vs hugetlbfs] does not support the get_policy vm_op and the vma policy is null. This is incorrect for show_numa_maps() which is likely querying the numa_maps of some task other than current. Remove this fallback. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 3月, 2008 1 次提交
-
-
由 Lee Schermerhorn 提交于
Address 3 known bugs in the current memory policy reference counting method. I have a series of patches to rework the reference counting to reduce overhead in the allocation path. However, that series will require testing in -mm once I repost it. 1) alloc_page_vma() does not release the extra reference taken for vma/shared mempolicy when the mode == MPOL_INTERLEAVE. This can result in leaking mempolicy structures. This is probably occurring, but not being noticed. Fix: add the conditional release of the reference. 2) hugezonelist unconditionally releases a reference on the mempolicy when mode == MPOL_INTERLEAVE. This can result in decrementing the reference count for system default policy [should have no ill effect] or premature freeing of task policy. If this occurred, the next allocation using task mempolicy would use the freed structure and probably BUG out. Fix: add the necessary check to the release. 3) The current reference counting method assumes that vma 'get_policy()' methods automatically add an extra reference a non-NULL returned mempolicy. This is true for shmem_get_policy() used by tmpfs mappings, including regular page shm segments. However, SHM_HUGETLB shm's, backed by hugetlbfs, just use the vma policy without the extra reference. This results in freeing of the vma policy on the first allocation, with reuse of the freed mempolicy structure on subsequent allocations. Fix: Rather than add another condition to the conditional reference release, which occur in the allocation path, just add a reference when returning the vma policy in shm_get_policy() to match the assumptions. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Cc: Greg KH <greg@kroah.com> Cc: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Cc: <eric.whitney@hp.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 2月, 2008 3 次提交
-
-
由 Pierre Peiffer 提交于
sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an ipc_namespace is released to free all ipcs of each type. But in fact, they do the same thing: they loop around all ipcs to free them individually by calling a specific routine. This patch proposes to consolidate this by introducing a common function, free_ipcs(), that do the job. The specific routine to call on each individual ipcs is passed as parameter. For this, these ipc-specific 'free' routines are reworked to take a generic 'struct ipc_perm' as parameter. Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pierre Peiffer 提交于
Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids' are dynamically allocated for each icp_namespace as the ipc_namespace itself (for the init namespace, they are initialized with pointers to static variables instead) It is so for historical reason: in fact, before the use of idr to store the ipcs, the ipcs were stored in tables of variable length, depending of the maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed size. As they are allocated in any cases for each new ipc_namespace, there is no gain of memory in having them allocated separately of the struct ipc_namespace. This patch proposes to make this table static in the struct ipc_namespace. Thus, we can allocate all in once and get rid of all the code needed to allocate and free these ipc_ids separately. Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net> Acked-by: NCedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Pavel Emelyanov 提交于
Currently the IPC namespace management code is spread over the ipc/*.c files. I moved this code into ipc/namespace.c file which is compiled out when needed. The linux/ipc_namespace.h file is used to store the prototypes of the functions in namespace.c and the stubs for NAMESPACES=n case. This is done so, because the stub for copy_ipc_namespace requires the knowledge of the CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in included into many many .c files via the sys.h->sem.h sequence so adding the sched.h into it will make all these .c depend on sched.h which is not that good. On the other hand the knowledge about the namespaces stuff is required in 4 .c files only. Besides, this patch compiles out some auxiliary functions from ipc/sem.c, msg.c and shm.c files. It turned out that moving these functions into namespaces.c is not that easy because they use many other calls and macros from the original file. Moving them would make this patch complicated. On the other hand all these functions can be consolidated, so I will send a separate patch doing this a bit later. Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Acked-by: NSerge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 2月, 2008 1 次提交
-
-
由 Pierre Peiffer 提交于
In the new implementation of the [sem|shm|msg]_lock[_check]() routines, we use the return value of ipc_lock() in container_of() without any check. But ipc_lock may return a errcode. The use of this errcode in container_of() may alter this errcode, and we don't want this. And in xxx_exit_ns, the pointer return by idr_find is of type 'struct kern_ipc_per'... Today, the code will work as is because the member used in these container_of() is the first member of its container (offset == 0), the errcode isn't changed then. But in the general case, we can't count on this assumption and this may lead later to a real bug if we don't correct this. Again, the proposed solution is simple and correct. But, as pointed by Nadia, with this solution, the same check will be done several times (in all sub-callers...), what is not very funny/optimal... Signed-off-by: NPierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-