- 26 7月, 2011 7 次提交
-
-
由 Hugh Dickins 提交于
We can now simplify shmem_getpage_gfp(): there is no longer a dilemma of filepage passed in via shmem_readpage(), then swappage found, which must then be copied over to it. Although at first it's tempting to replace the **pagep arg by returning struct page *, that makes a mess of IS_ERR_OR_NULL(page)s in all the callers, so leave as is. Insert BUG_ON(!PageUptodate) when we find and lock page: some of the complication came from uninitialized pages inserted into filecache prior to readpage; but now we're in control, and only release pagelock on filecache once it's uptodate (if an error occurs in reading back from swap, the page remains in swapcache, never moved to filecache). Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
The prealloc_page handling in shmem_getpage_gfp() is unnecessarily complicated: first simplify that before going on to filepage/swappage. That's right, don't report ENOMEM when the preallocation fails: we may or may not need the page. But simply report ENOMEM once we find we do need it, instead of dropping lock, repeating allocation, unwinding on failure etc. And leave the out label on the fast path, don't goto. Fix something that looks like a bug but turns out not to be: set PageSwapBacked on prealloc_page before its mem_cgroup_cache_charge(), as the removed case was doing. That's important before adding to LRU (determines which LRU the page goes on), and does affect which path it takes through memcontrol.c, but in the end MEM_CGROUP_CHANGE_TYPE_ SHMEM is handled no differently from CACHE. Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NShaohua Li <shaohua.li@intel.com> Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Remove that pernicious shmem_readpage() at last: the things we needed it for (splice, loop, sendfile, i915 GEM) are now fully taken care of by shmem_file_splice_read() and shmem_read_mapping_page_gfp(). This removal clears the way for a simpler shmem_getpage_gfp(), since page is never passed in; but leave most of that cleanup until after. sys_readahead() and sys_fadvise(POSIX_FADV_WILLNEED) will now EINVAL, instead of unexpectedly trying to read ahead on tmpfs: if that proves to be an issue for someone, then we can either arrange for them to return success instead, or try to implement async readahead on tmpfs. Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Make shmem_getpage() a wrapper, passing mapping_gfp_mask() down to shmem_getpage_gfp(), which in turn passes gfp down to shmem_swp_alloc(). Change shmem_read_mapping_page_gfp() to use shmem_getpage_gfp() in the CONFIG_SHMEM case; but leave tiny !SHMEM using read_cache_page_gfp(). Add a BUG_ON() in case anyone happens to call this on a non-shmem mapping; though we might later want to let that case route to read_cache_page_gfp(). It annoys me to have these two almost-redundant args, gfp and fault_type: I can't find a better way; but initialize fault_type only in shmem_fault(). Note that before, read_cache_page_gfp() was allocating i915_gem's pages with __GFP_NORETRY as intended; but the corresponding swap vector pages got allocated without it, leaving a small possibility of OOM. Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Tidy up shmem_file_splice_read(): Remove readahead: okay, we could implement shmem readahead on swap, but have never done so before, swap being the slow exceptional path. Use shmem_getpage() instead of find_or_create_page() plus ->readpage(). Remove several comments: sorry, I found them more distracting than helpful, and this will not be the reference version of splice_read(). Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Copy __generic_file_splice_read() and generic_file_splice_read() from fs/splice.c to shmem_file_splice_read() in mm/shmem.c. Make page_cache_pipe_buf_ops and spd_release_page() accessible to it. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
2.6.36's 7e496299 ("tmpfs: make tmpfs scalable with percpu_counter for used blocks") to make tmpfs scalable with percpu_counter used inode->i_lock in place of sbinfo->stat_lock around i_blocks updates; but that was adverse to scalability, and unnecessary, since info->lock is already held there in the fast paths. Remove those uses of i_lock, and add info->lock in the three error paths where it's then needed across shmem_free_blocks(). It's not actually needed across shmem_unacct_blocks(), but they're so often paired that it looks wrong to split them apart. Signed-off-by: NHugh Dickins <hughd@google.com> Acked-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 6月, 2011 2 次提交
-
-
由 Hugh Dickins 提交于
Although it is used (by i915) on nothing but tmpfs, read_cache_page_gfp() is unsuited to tmpfs, because it inserts a page into pagecache before calling the filesystem's ->readpage: tmpfs may have pages in swapcache which only it knows how to locate and switch to filecache. At present tmpfs provides a ->readpage method, and copes with this by copying pages; but soon we can simplify it by removing its ->readpage. Provide shmem_read_mapping_page_gfp() now, ready for that transition, Export shmem_read_mapping_page_gfp() and add it to list in shmem_fs.h, with shmem_read_mapping_page() inline for the common mapping_gfp case. (shmem_read_mapping_page_gfp or shmem_read_cache_page_gfp? Generally the read_mapping_page functions use the mapping's ->readpage, and the read_cache_page functions use the supplied filler, so I think read_cache_page_gfp was slightly misnamed.) Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
2.6.35's new truncate convention gave tmpfs the opportunity to control its file truncation, no longer enforced from outside by vmtruncate(). We shall want to build upon that, to handle pagecache and swap together. Slightly redefine the ->truncate_range interface: let it now be called between the unmap_mapping_range()s, with the filesystem responsible for doing the truncate_inode_pages_range() from it - just as the filesystem is nowadays responsible for doing that from its ->setattr. Let's rename shmem_notify_change() to shmem_setattr(). Instead of calling the generic truncate_setsize(), bring that code in so we can call shmem_truncate_range() - which will later be updated to perform its own variant of truncate_inode_pages_range(). Remove the punch_hole unmap_mapping_range() from shmem_truncate_range(): now that the COW's unmap_mapping_range() comes after ->truncate_range, there is no need to call it a third time. Export shmem_truncate_range() and add it to the list in shmem_fs.h, so that i915_gem_object_truncate() can call it explicitly in future; get this patch in first, then update drm/i915 once this is available (until then, i915 will just be doing the truncate_inode_pages() twice). Though introduced five years ago, no other filesystem is implementing ->truncate_range, and its only other user is madvise(,,MADV_REMOVE): we expect to convert it to fallocate(,FALLOC_FL_PUNCH_HOLE,,) shortly, whereupon ->truncate_range can be removed from inode_operations - shmem_truncate_range() will help i915 across that transition too. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 5月, 2011 1 次提交
-
-
由 Hugh Dickins 提交于
While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging (interruptibly), repeatedly failing to locate the owner of a 0xff entry in the swap_map. Although shmem_writepage() does abandon when it sees incoming page index is beyond eof, there was still a window in which shmem_truncate_range() could come in between writepage's dropping lock and updating swap_map, find the half-completed swap_map entry, and in trying to free it, leave it in a state that swap_shmem_alloc() could not correct. Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling of the different cases, but easiest to fix by moving swap_shmem_alloc() under cover of the lock. More interesting than the bug: it's been there since 2.6.33, why could I not see it with earlier kernels? The mmotm of two weeks ago seems to have some magic for generating races, this is just one of three I found. With yesterday's git I first saw this in mainline, bisected in search of that magic, but the easy reproducibility evaporated. Oh well, fix the bug. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 5月, 2011 1 次提交
-
-
由 Ying Han 提交于
Two new stats in per-memcg memory.stat which tracks the number of page faults and number of major page faults. "pgfault" "pgmajfault" They are different from "pgpgin"/"pgpgout" stat which count number of pages charged/discharged to the cgroup and have no meaning of reading/ writing page to disk. It is valuable to track the two stats for both measuring application's performance as well as the efficiency of the kernel page reclaim path. Counting pagefaults per process is useful, but we also need the aggregated value since processes are monitored and controlled in cgroup basis in memcg. Functional test: check the total number of pgfault/pgmajfault of all memcgs and compare with global vmstat value: $ cat /proc/vmstat | grep fault pgfault 1070751 pgmajfault 553 $ cat /dev/cgroup/memory.stat | grep fault pgfault 1071138 pgmajfault 553 total_pgfault 1071142 total_pgmajfault 553 $ cat /dev/cgroup/A/memory.stat | grep fault pgfault 199 pgmajfault 0 total_pgfault 199 total_pgmajfault 0 Performance test: run page fault test(pft) wit 16 thread on faulting in 15G anon pages in 16G container. There is no regression noticed on the "flt/cpu/s" Sample output from pft: TAG pft:anon-sys-default: Gb Thr CLine User System Wall flt/cpu/s fault/wsec 15 16 1 0.67s 233.41s 14.76s 16798.546 266356.260 +-------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 10 16682.962 17344.027 16913.524 16928.812 166.5362 + 10 16695.568 16923.896 16820.604 16824.652 84.816568 No difference proven at 95.0% confidence [akpm@linux-foundation.org: fix build] [hughd@google.com: shmem fix] Signed-off-by: NYing Han <yinghan@google.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 5月, 2011 1 次提交
-
-
由 Eric Paris 提交于
Implement generic xattrs for tmpfs filesystems. The Feodra project, while trying to replace suid apps with file capabilities, realized that tmpfs, which is used on the build systems, does not support file capabilities and thus cannot be used to build packages which use file capabilities. Xattrs are also needed for overlayfs. The xattr interface is a bit odd. If a filesystem does not implement any {get,set,list}xattr functions the VFS will call into some random LSM hooks and the running LSM can then implement some method for handling xattrs. SELinux for example provides a method to support security.selinux but no other security.* xattrs. As it stands today when one enables CONFIG_TMPFS_POSIX_ACL tmpfs will have xattr handler routines specifically to handle acls. Because of this tmpfs would loose the VFS/LSM helpers to support the running LSM. To make up for that tmpfs had stub functions that did nothing but call into the LSM hooks which implement the helpers. This new patch does not use the LSM fallback functions and instead just implements a native get/set/list xattr feature for the full security.* and trusted.* namespace like a normal filesystem. This means that tmpfs can now support both security.selinux and security.capability, which was not previously possible. The basic implementation is that I attach a: struct shmem_xattr { struct list_head list; /* anchored by shmem_inode_info->xattr_list */ char *name; size_t size; char value[0]; }; Into the struct shmem_inode_info for each xattr that is set. This implementation could easily support the user.* namespace as well, except some care needs to be taken to prevent large amounts of unswappable memory being allocated for unprivileged users. [mszeredi@suse.cz: new config option, suport trusted.*, support symlinks] Signed-off-by: NEric Paris <eparis@redhat.com> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Acked-by: NSerge Hallyn <serge.hallyn@ubuntu.com> Tested-by: NSerge Hallyn <serge.hallyn@ubuntu.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Acked-by: NHugh Dickins <hughd@google.com> Tested-by: NJordi Pujol <jordipujolp@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 5月, 2011 1 次提交
-
-
由 Hugh Dickins 提交于
Commit 778dd893 ("tmpfs: fix race between umount and swapoff") forgot the new rules for strict atomic kmap nesting, causing WARNING: at arch/x86/mm/highmem_32.c:81 from __kunmap_atomic(), then BUG: unable to handle kernel paging request at fffb9000 from shmem_swp_set() when shmem_unuse_inode() is handling swapoff with highmem in use. My disgrace again. See https://bugzilla.kernel.org/show_bug.cgi?id=35352Reported-by: NWitold Baryluk <baryluk@smp.if.uj.edu.pl> Signed-off-by: NHugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 5月, 2011 1 次提交
-
-
由 Hugh Dickins 提交于
Shame on me! Commit b1dea800 "tmpfs: fix race between umount and writepage" fixed the advertized race, but introduced another: as even its comment makes clear, we cannot safely rely on a peek at list_empty() while holding no lock - until info->swapped is set, shmem_unuse_inode() may delete any formerly-swapped inode from the shmem_swaplist, which in this case would leave a swap area impossible to swapoff. Although I don't relish taking the mutex every time, I don't care much for the alternatives either; and at least the peek at list_empty() in shmem_evict_inode() (a hotter path since most inodes would never have been swapped) remains safe, because we already truncated the whole file. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 5月, 2011 3 次提交
-
-
由 Hugh Dickins 提交于
Testing the shmem_swaplist replacements for igrab() revealed another bug: writes to /dev/loop0 on a tmpfs file which fills its filesystem were sometimes failing with "Buffer I/O error"s. These came from ENOSPC failures of shmem_getpage(), when racing with swapoff: the same could happen when racing with another shmem_getpage(), pulling the page in from swap in between our find_lock_page() and our taking the info->lock (though not in the single-threaded loop case). This is unacceptable, and surprising that I've not noticed it before: it dates back many years, but (presumably) was made a lot easier to reproduce in 2.6.36, which sited a page preallocation in the race window. Fix it by rechecking the page cache before settling on an ENOSPC error. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
The use of igrab() in swapoff's shmem_unuse_inode() is just as vulnerable to umount as that in shmem_writepage(). Fix this instance by extending the protection of shmem_swaplist_mutex right across shmem_unuse_inode(): while it's on the list, the inode cannot be evicted (and the filesystem cannot be unmounted) without shmem_evict_inode() taking that mutex to remove it from the list. But since shmem_writepage() might take that mutex, we should avoid making memory allocations or memcg charges while holding it: prepare them at the outer level in shmem_unuse(). When mem_cgroup_cache_charge() was originally placed, we didn't know until that point that the page from swap was actually a shmem page; but nowadays it's noted in the swap_map, so we're safe to charge upfront. For the radix_tree, do as is done in shmem_getpage(): preload upfront, but don't pin to the cpu; so we make a habit of refreshing the node pool, but might dip into GFP_NOWAIT reserves on occasion if subsequently preempted. With the allocation and charge moved out from shmem_unuse_inode(), we can also hold index map and info->lock over from finding the entry. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Konstanin Khlebnikov reports that a dangerous race between umount and shmem_writepage can be reproduced by this script: for i in {1..300} ; do mkdir $i while true ; do mount -t tmpfs none $i dd if=/dev/zero of=$i/test bs=1M count=$(($RANDOM % 100)) umount $i done & done on a 6xCPU node with 8Gb RAM: kernel very unstable after this accident. =) Kernel log: VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day... WARNING: at lib/list_debug.c:53 __list_del_entry+0x8d/0x98() list_del corruption. prev->next should be ffff880222fdaac8, but was (null) Pid: 11222, comm: mount.tmpfs Not tainted 2.6.39-rc2+ #4 Call Trace: warn_slowpath_common+0x80/0x98 warn_slowpath_fmt+0x41/0x43 __list_del_entry+0x8d/0x98 evict+0x50/0x113 iput+0x138/0x141 ... BUG: unable to handle kernel paging request at ffffffffffffffff IP: shmem_free_blocks+0x18/0x4c Pid: 10422, comm: dd Tainted: G W 2.6.39-rc2+ #4 Call Trace: shmem_recalc_inode+0x61/0x66 shmem_writepage+0xba/0x1dc pageout+0x13c/0x24c shrink_page_list+0x28e/0x4be shrink_inactive_list+0x21f/0x382 ... shmem_writepage() calls igrab() on the inode for the page which came from page reclaim, to add it later into shmem_swaplist for swapoff operation. This igrab() can race with super-block deactivating process: shrink_inactive_list() deactivate_super() pageout() tmpfs_fs_type->kill_sb() shmem_writepage() kill_litter_super() generic_shutdown_super() evict_inodes() igrab() atomic_read(&inode->i_count) skip-inode iput() if (!list_empty(&sb->s_inodes)) printk("VFS: Busy inodes after... This igrap-iput pair was added in commit 1b1b32f2 "tmpfs: fix shmem_swaplist races" based on incorrect assumptions: igrab() protects the inode from concurrent eviction by deletion, but it does nothing to protect it from concurrent unmounting, which goes ahead despite the raised i_count. So this use of igrab() was wrong all along, but the race made much worse in 2.6.37 when commit 63997e98 "split invalidate_inodes()" replaced two attempts at invalidate_inodes() by a single evict_inodes(). Konstantin posted a plausible patch, raising sb->s_active too: I'm unsure whether it was correct or not; but burnt once by igrab(), I am sure that we don't want to rely more deeply upon externals here. Fix it by adding the inode to shmem_swaplist earlier, while the page lock on page in page cache still secures the inode against eviction, without artifically raising i_count. It was originally added later because shmem_unuse_inode() is liable to remove an inode from the list while it's unswapped; but we can guard against that by taking spinlock before dropping mutex. Reported-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: NHugh Dickins <hughd@google.com> Tested-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 4月, 2011 1 次提交
-
-
由 Hugh Dickins 提交于
If you fill up a tmpfs, df was showing tmpfs 460800 - - - /tmp because of an off-by-one in the max_blocks checks. Fix it so df shows tmpfs 460800 460800 0 100% /tmp Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 3月, 2011 2 次提交
-
-
由 Hugh Dickins 提交于
Up to 2.6.22, you could use remap_file_pages(2) on a tmpfs file or a shared mapping of /dev/zero or a shared anonymous mapping. In 2.6.23 we disabled it by default, but set VM_CAN_NONLINEAR to enable it on safe mappings. We made sure to set it in shmem_mmap() for tmpfs files, but missed it in shmem_zero_setup() for the others. Fix that at last. Reported-by: NKenny Simpson <theonetruekenny@yahoo.com> Signed-off-by: NHugh Dickins <hughd@google.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
This patch series changes remove_from_page_cache()'s page ref counting rule. Page cache ref count is decreased in delete_from_page_cache(). So we don't need to decrease the page reference in callers. Signed-off-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NHugh Dickins <hughd@google.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 3月, 2011 1 次提交
-
-
由 Aneesh Kumar K.V 提交于
The exportfs encode handle function should return the minimum required handle size. This helps user to find out the handle size by passing 0 handle size in the first step and then redoing to the call again with the returned handle size value. Acked-by: NSerge Hallyn <serue@us.ibm.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 10 3月, 2011 1 次提交
-
-
由 Jens Axboe 提交于
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 01 3月, 2011 1 次提交
-
-
由 Justin P. Mattock 提交于
Signed-off-by: NJustin P. Mattock <justinmattock@gmail.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 02 2月, 2011 1 次提交
-
-
由 Eric Paris 提交于
SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and than userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a seperate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: NEric Paris <eparis@redhat.com>
-
- 07 1月, 2011 1 次提交
-
-
由 Nick Piggin 提交于
RCU free the struct inode. This will allow: - Subsequent store-free path walking patch. The inode must be consulted for permissions when walking, so an RCU inode reference is a must. - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - Could remove some nested trylock loops in dcache code - Could potentially simplify things a bit in VM land. Do not need to take the page lock to follow page->mapping. The downsides of this is the performance cost of using RCU. In a simple creat/unlink microbenchmark, performance drops by about 10% due to inability to reuse cache-hot slab objects. As iterations increase and RCU freeing starts kicking over, this increases to about 20%. In cases where inode lifetimes are longer (ie. many inodes may be allocated during the average life span of a single inode), a lot of this cache reuse is not applicable, so the regression caused by this patch is smaller. The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU, however this adds some complexity to list walking and store-free path walking, so I prefer to implement this at a later date, if it is shown to be a win in real situations. I haven't found a regression in any non-micro benchmark so I doubt it will be a problem. Signed-off-by: NNick Piggin <npiggin@kernel.dk>
-
- 29 10月, 2010 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 26 10月, 2010 3 次提交
-
-
由 Christoph Hellwig 提交于
Instead of always assigning an increasing inode number in new_inode move the call to assign it into those callers that actually need it. For now callers that need it is estimated conservatively, that is the call is added to all filesystems that do not assign an i_ino by themselves. For a few more filesystems we can avoid assigning any inode number given that they aren't user visible, and for others it could be done lazily when an inode number is actually needed, but that's left for later patches. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Clones an existing reference to inode; caller must already hold one. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
note: for race-free uses you inode_lock held Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 18 8月, 2010 1 次提交
-
-
由 Hugh Dickins 提交于
list_add() corruption messages reported from shmem_fill_super()'s recently introduced percpu_counter_init(): shmem_put_super() needs to remember to percpu_counter_destroy(). And also check error from percpu_counter_init(). Reported-bisected-and-tested-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 8月, 2010 6 次提交
-
-
由 Shaohua Li 提交于
I'm running a shmem pagefault test case (see attached file) under a 64 CPU system. Profile shows shmem_inode_info->lock is heavily contented and 100% CPUs time are trying to get the lock. In the pagefault (no swap) case, shmem_getpage gets the lock twice, the last one is avoidable if we prealloc a page so we could reduce one time of locking. This is what below patch does. The result of the test case: 2.6.35-rc3: ~20s 2.6.35-rc3 + patch: ~12s so this is 40% improvement. One might argue if we could have better locking for shmem. But even shmem is lockless, the pagefault will soon have pagecache lock heavily contented because shmem must add new page to pagecache. So before we have better locking for pagecache, improving shmem locking doesn't have too much improvement. I did a similar pagefault test against a ramfs file, the test result is ~10.5s. [akpm@linux-foundation.org: fix comment, clean up code layout, elimintate code duplication] Signed-off-by: NShaohua Li <shaohua.li@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tim Chen 提交于
The current implementation of tmpfs is not scalable. We found that stat_lock is contended by multiple threads when we need to get a new page, leading to useless spinning inside this spin lock. This patch makes use of the percpu_counter library to maintain local count of used blocks to speed up getting and returning of pages. So the acquisition of stat_lock is unnecessary for getting and returning blocks, improving the performance of tmpfs on system with large number of cpus. On a 4 socket 32 core NHM-EX system, we saw improvement of 270%. The implementation below has a slight chance of race between threads causing a slight overshoot of the maximum configured blocks. However, any overshoot is small, and is bounded by the number of cpus. This happens when the number of used blocks is slightly below the maximum configured blocks when a thread checks the used block count, and another thread allocates the last block before the current thread does. This should not be a problem for tmpfs, as the overshoot is most likely to be a few blocks and bounded. If a strict limit is really desired, then configured the max blocks to be the limit less the number of cpus in system. Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Make sure we check the truncate constraints early on in ->setattr by adding those checks to inode_change_ok. Also clean up and document inode_change_ok to make this obvious. As a fallout we don't have to call inode_newsize_ok from simple_setsize and simplify it down to a truncate_setsize which doesn't return an error. This simplifies a lot of setattr implementations and means we use truncate_setsize almost everywhere. Get rid of fat_setsize now that it's trivial and mark ext2_setsize static to make the calling convention obvious. Keep the inode_newsize_ok in vmtruncate for now as all callers need an audit for its removal anyway. Note: setattr code in ecryptfs doesn't call inode_change_ok at all and needs a deeper audit, but that is left for later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Make sure we call inode_change_ok before doing any changes in ->setattr, and make sure to call it even if our fs wants to ignore normal UNIX permissions, but use the ATTR_FORCE to skip those. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Despite its name it's now a generic implementation of ->setattr, but rather a helper to copy attributes from a struct iattr to the inode. Rename it to setattr_copy to reflect this fact. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 05 6月, 2010 1 次提交
-
-
由 Nick Piggin 提交于
mtime and ctime should be changed only if the file size has actually changed. Patches changing ext2 and tmpfs from vmtruncate to new truncate sequence has caused regressions where they always update timestamps. There is some strange cases in POSIX where truncate(2) must not update times unless the size has acutally changed, see 6e656be8. This area is all still rather buggy in different ways in a lot of filesystems and needs a cleanup and audit (ideally the vfs will provide a simple attribute or call to direct all filesystems exactly which attributes to change). But coming up with the best solution will take a while and is not appropriate for rc anyway. So fix recent regression for now. Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 28 5月, 2010 3 次提交
-
-
由 npiggin@suse.de 提交于
Cc: Christoph Hellwig <hch@lst.de> Acked-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
We don't name our generic fsync implementations very well currently. The no-op implementation for in-memory filesystems currently is called simple_sync_file which doesn't make too much sense to start with, the the generic one for simple filesystems is called simple_fsync which can lead to some confusion. This patch renames the generic file fsync method to generic_file_fsync to match the other generic_file_* routines it is supposed to be used with, and the no-op implementation to noop_fsync to make it obvious what to expect. In addition add some documentation for both methods. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Daisuke Nishimura 提交于
This patch adds support for moving charge of file pages, which include normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting bit 1 of <target cgroup>/memory.move_charge_at_immigrate. Unlike the case of anonymous pages, file pages(and swaps) in the range mmapped by the task will be moved even if the task hasn't done page fault, i.e. they might not be the task's "RSS", but other task's "RSS" that maps the same file. And mapcount of the page is ignored(the page can be moved even if page_mapcount(page) > 1). So, conditions that the page/swap should be met to be moved is that it must be in the range mmapped by the target task and it must be charged to the old cgroup. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix warning] Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-