- 26 Feb, 2013 2 commits
-
-
Committed by Namjae Jeon
This patch is a follow up on below patch:

[PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type
commit: 216b6cbd

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Vivek Trivedi <t.vivek@samsung.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Acked-by: Sage Weil <sage@inktank.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Note that provided ->d_dname() reproduces what we used to get for those guys in e.g. /proc/self/maps; it might be a good idea to change that to something less ugly, but for now let's keep the existing user-visible behaviour.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 24 Feb, 2013 3 commits
-
-
Committed by Greg Thelen
Fix several mempolicy leaks in the tmpfs mount logic. These leaks are slow - on the order of one object leaked per mount attempt.

Leak 1 (umount doesn't free mpol allocated in mount):
    while true; do
        mount -t tmpfs -o mpol=interleave,size=100M nodev /mnt
        umount /mnt
    done

Leak 2 (errors parsing remount options will leak mpol):
    mount -t tmpfs -o size=100M nodev /mnt
    while true; do
        mount -o remount,mpol=interleave,size=x /mnt 2> /dev/null
    done
    umount /mnt

Leak 3 (multiple mpol per mount leak mpol):
    while true; do
        mount -t tmpfs -o mpol=interleave,mpol=interleave,size=100M nodev /mnt
        umount /mnt
    done

This patch fixes all of the above. I could have broken the patch into three pieces but it seemed easier to review as one.

[akpm@linux-foundation.org: fix handling of mpol_parse_str() errors, per Hugh]
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Greg Thelen
The tmpfs remount logic preserves filesystem mempolicy if the mpol=M option is not specified in the remount request. A new policy can be specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying mpol= mount option in the remount request would set the filesystem's mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
    # mkdir /tmp/x

    # mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

    # mount -o remount,size=200M nodev /tmp/x

    # grep /tmp/x /proc/mounts
    nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
        # note ? garbage in mpol=... output above

    # dd if=/dev/zero of=/tmp/x/f count=1
        # panic here

Panic:
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [< (null)>] (null)
    [...]
    Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
    Call Trace:
      mpol_shared_policy_init+0xa5/0x160
      shmem_get_inode+0x209/0x270
      shmem_mknod+0x3e/0xf0
      shmem_create+0x18/0x20
      vfs_create+0xb5/0x130
      do_last+0x9a1/0xea0
      path_openat+0xb3/0x4d0
      do_filp_open+0x42/0xa0
      do_sys_open+0xfe/0x1e0
      compat_sys_open+0x1b/0x20
      cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the dangling mpol will not cause a fault. Instead the filesystem will reference a freed mempolicy object, which will cause unpredictable behavior.

The problem boils down to a dropped mpol reference below if shmem_parse_options() does not allocate a new mpol:

    config = *sbinfo
    shmem_parse_options(data, &config, true)
    mpol_put(sbinfo->mpol)
    sbinfo->mpol = config.mpol  /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if shmem_parse_options() doesn't create a new mpol.

How far back does this issue go? I see it in both 2.6.36 and 3.3. I did not look back further.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
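
The reference-count problem described in this message can be modelled in user space. The following is only a sketch under stated assumptions: `mpol_model`, `sbinfo_model`, `mpol_put_model`, `remount_buggy` and `remount_fixed` are hypothetical names invented for illustration, and the option parser is reduced to a `parsed` pointer that is NULL when no mpol= option was given. It is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mempolicy: only the reference count. */
struct mpol_model {
    int refs;
};

/* Hypothetical stand-in for the per-superblock tmpfs state. */
struct sbinfo_model {
    struct mpol_model *mpol;
};

/* Drop one reference; NULL is allowed. */
static void mpol_put_model(struct mpol_model *p)
{
    if (p) {
        assert(p->refs > 0);
        p->refs--;
    }
}

/*
 * Pre-patch pattern: config starts as a copy of *sb, so when the remount
 * options carry no mpol=, config.mpol still aliases sb->mpol. The put then
 * drops the only reference and the stale pointer is stored back.
 */
static void remount_buggy(struct sbinfo_model *sb, struct mpol_model *parsed)
{
    struct sbinfo_model config = *sb;

    if (parsed)
        config.mpol = parsed;
    mpol_put_model(sb->mpol);
    sb->mpol = config.mpol;
}

/* Patched pattern: only release the old policy when a new one was parsed. */
static void remount_fixed(struct sbinfo_model *sb, struct mpol_model *parsed)
{
    if (parsed) {
        mpol_put_model(sb->mpol);
        sb->mpol = parsed;
    }
}
```

In this model, calling `remount_buggy()` with `parsed == NULL` leaves the superblock pointing at a policy whose count has reached zero, which corresponds to the freed object in the report; `remount_fixed()` leaves the count untouched.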
-
Committed by Johannes Weiner
In shmem_find_get_pages_and_swap(), use the faster radix tree iterator construct from commit 78c1d784 ("radix-tree: introduce bit-optimized iterator").

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 23 Feb, 2013 3 commits
-
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Anatol Pomozov
Allocating a file structure in function get_empty_filp() might fail for several reasons:
 - not enough memory for file structures
 - operation is not allowed
 - user is over its limit

Currently the function returns NULL in all cases and we lose the exact reason of the error. All callers of get_empty_filp() assume that the function can fail with ENFILE only.

Return error through pointer. Change all callers to preserve this error code.

[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit (things remaining here deal with alloc_file()), removed pipe(2) behaviour change]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
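
"Return error through pointer" refers to the idiom of encoding a negative errno value in the returned pointer instead of returning NULL. A minimal user-space sketch of that idiom follows. All names here (`model_err_ptr`, `model_is_err`, `model_ptr_err`, `get_file_model`, `open_model`) are hypothetical illustrations and the failure conditions are passed in as flags; this is an assumption-laden model, not the kernel helpers or the patched function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the top 4095 address values are reserved for error codes. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer value lies in the reserved error range. */
static int model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static long model_ptr_err(const void *p)
{
    assert(model_is_err(p));
    return (long)(intptr_t)p;
}

struct file_model {
    int unused;
};

static struct file_model the_file;

/* Allocator model: each failure reason maps to its own error code. */
static struct file_model *get_file_model(int over_limit, int no_memory)
{
    if (over_limit)
        return model_err_ptr(-ENFILE);
    if (no_memory)
        return model_err_ptr(-ENOMEM);
    return &the_file;
}

/* Caller model: propagate the exact error instead of assuming ENFILE. */
static long open_model(int over_limit, int no_memory)
{
    struct file_model *f = get_file_model(over_limit, no_memory);

    if (model_is_err(f))
        return model_ptr_err(f);
    return 0;
}
```

With this shape a caller can distinguish "table full" from "out of memory", which a bare NULL return cannot express.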
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 27 Jan, 2013 1 commit
-
-
Committed by Eric W. Biederman
There is no backing store to tmpfs and file creation rules are the same as for any other filesystem, so it is semantically safe to allow unprivileged users to mount it. ramfs is safe for the same reasons, so allow either flavor of tmpfs to be mounted by a user namespace root user.

The memory control group successfully limits how much memory tmpfs can consume, so on any system that cares about a user namespace root using tmpfs to exhaust memory, the memory control group can be deployed.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-
- 03 Jan, 2013 1 commit
-
-
Committed by Hugh Dickins
Remove the unused argument (formerly no_context) from mpol_parse_str() and from mpol_to_str().

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 18 Dec, 2012 1 commit
-
-
Committed by Andrew Morton
But the kernel decided to call it "origin" instead. Fix most of the sites.

Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 13 Dec, 2012 1 commit
-
-
Committed by Hugh Dickins
Revert 3.5's commit f21f8062 ("tmpfs: revert SEEK_DATA and SEEK_HOLE") to reinstate 4fb5ef08 ("tmpfs: support SEEK_DATA and SEEK_HOLE"), with the intervening additional arg to generic_file_llseek_size().

In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper SEEK_DATA and SEEK_HOLE support; and a good case has now been made for it on tmpfs, so let's join the party.

It's quite easy for tmpfs to scan the radix_tree to support llseek's new SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are still on my mind (in particular, the !PageUptodate-ness of pages fallocated but still unwritten).

[akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
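
For readers unfamiliar with the two llseek options, here is a hedged user-space sketch of how a program might use them: it walks a file by alternating SEEK_DATA and SEEK_HOLE and sums the bytes that fall inside data ranges. The function names `count_data_bytes` and `count_data_bytes_path` are made up for this example, and the fallback numeric values 3 and 4 are the Linux ones, used only if the libc headers do not expose the macros. On a filesystem with only the generic support, the whole file is reported as a single data range.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux values, used only if the libc headers did not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE 4
#endif

/*
 * Sum the bytes of fd that lie in data ranges by alternating
 * SEEK_DATA and SEEK_HOLE. Returns -1 on an unexpected lseek() error.
 */
static off_t count_data_bytes(int fd)
{
    off_t total = 0;
    off_t pos = 0;

    assert(fd >= 0);
    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            /* ENXIO means there is no more data at or after pos. */
            return errno == ENXIO ? total : -1;
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        total += hole - data;
        pos = hole;
    }
}

/* Convenience wrapper: open a path read-only, count, and close. */
static off_t count_data_bytes_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    off_t n;

    if (fd < 0)
        return -1;
    n = count_data_bytes(fd);
    close(fd);
    return n;
}
```

A tool that copies sparse files could use the same loop to copy only the data ranges and skip the holes.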
-
- 07 Dec, 2012 1 commit
-
-
Committed by Mel Gorman
This fixes a regression in 3.7-rc, which has since gone into stable.

Commit 00442ad0 ("mempolicy: fix a memory corruption by refcount imbalance in alloc_pages_vma()") changed get_vma_policy() to raise the refcount on a shmem shared mempolicy; whereas shmem_alloc_page() went on expecting alloc_page_vma() to drop the refcount it had acquired. This deserves a rework: but for now fix the leak in shmem_alloc_page().

Hugh: shmem_swapin() did not need a fix, but surely it's clearer to use the same refcounting there as in shmem_alloc_page(), delete its onstack mempolicy, and the strange mpol_cond_copy() and __mpol_cond_copy() - those were invented to let swapin_readahead() make an unknown number of calls to alloc_pages_vma() with one mempolicy; but since 00442ad0, alloc_pages_vma() has kept refcount in balance, so now no problem.

Reported-and-tested-by: Tommi Rantala <tt.rantala@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 17 Nov, 2012 2 commits
-
-
Committed by Hugh Dickins
Under a particular load on one machine, I have hit shmem_evict_inode()'s BUG_ON(inode->i_blocks), enough times to narrow it down to a particular race between swapout and eviction.

It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(), and the lack of coherent locking between mapping's nrpages and shmem's swapped count. There's a window in shmem_writepage(), between lowering nrpages in shmem_delete_from_page_cache() and then raising swapped count, when the freed count appears to be +1 when it should be 0, and then the asymmetry stops it from being corrected with -1 before hitting the BUG.

One answer is coherent locking: using tree_lock throughout, without info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on used_blocks makes that messier than expected. Another answer may be a further effort to eliminate the weird shmem_recalc_inode() altogether, but previous attempts at that failed.

So far undecided, but for now change the BUG_ON to WARN_ON: in usual circumstances it remains a useful consistency check.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora has converted to WARNING) in shmem_getpage_gfp():

    WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
    Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
    Call Trace:
      warn_slowpath_common+0x7f/0xc0
      warn_slowpath_null+0x1a/0x20
      shmem_getpage_gfp+0xa5c/0xa70
      shmem_fault+0x4f/0xa0
      __do_fault+0x71/0x5c0
      handle_pte_fault+0x97/0xae0
      handle_mm_fault+0x289/0x350
      __do_page_fault+0x18e/0x530
      do_page_fault+0x2b/0x50
      page_fault+0x28/0x30
      tracesys+0xe1/0xe6

Thanks to Johannes for pointing to truncation: free_swap_and_cache() only does a trylock on the page, so the page lock we've held since before confirming swap is not enough to protect against truncation.

What cleanup is needed in this case? Just delete_from_swap_cache(), which takes care of the memcg uncharge.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 10 Oct, 2012 1 commit
-
-
Committed by Hugh Dickins
Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),

    u64 inum = fid->raw[2];

which is unhelpfully reported as at the end of shmem_alloc_inode():

    BUG: unable to handle kernel paging request at ffff880061cd3000
    IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    Call Trace:
      [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
      [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
      [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
      [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6

Right, tmpfs is being stupid to access fid->raw[2] before validating that fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may fall at the end of a page, and the next page not be present.

But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and could oops in the same way: add the missing fh_len checks to those.

Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Sage Weil <sage@inktank.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
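
The kind of length check this message asks for can be sketched in user space as follows. This is an illustration only: `decode_inum_model` is a hypothetical name, the handle is modelled as a plain array of 32-bit words with `fh_len` counting those words, and the way the two words are combined into a 64-bit inode number is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a 64-bit inode number from words 1 and 2 of a file handle.
 * fh_len is the number of valid 32-bit words in raw[]. The length is
 * validated before raw[2] is read, so a short handle is rejected
 * instead of reading past the end of the buffer.
 * Returns 0 on success, -1 if the handle is too short.
 */
static int decode_inum_model(const uint32_t *raw, int fh_len,
                             uint64_t *inum_out)
{
    uint64_t inum;

    assert(raw != NULL && inum_out != NULL);
    if (fh_len < 3)
        return -1;

    inum = raw[2];
    *inum_out = (inum << 32) | raw[1];
    return 0;
}
```

The important point is the ordering: the check on `fh_len` comes before any access to `raw[2]`.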
-
- 09 Oct, 2012 1 commit
-
-
Committed by Konstantin Khlebnikov
Move actual pte filling for non-linear file mappings into the new special vma operation: ->remap_pages().

Filesystems must implement this method to get non-linear mapping support; if a filesystem uses filemap_fault(), then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com> #arch/tile
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 Aug, 2012 1 commit
-
-
Committed by Aristeu Rozanski
Extract in-memory xattr APIs from tmpfs. Will be used by cgroup.

    $ size vmlinux.o
       text    data     bss      dec    hex filename
    4658782  880729 5195032 10734543 a3cbcf vmlinux.o
    $ size vmlinux.o
       text    data     bss      dec    hex filename
    4658957  880729 5195032 10734718 a3cc7e vmlinux.o

v7:
- checkpatch warnings fixed
- Implement the changes requested by Hugh Dickins:
  - make simple_xattrs_init and simple_xattrs_free inline
  - get rid of locking and list reinitialization in simple_xattrs_free, they're not needed
v6:
- no changes
v5:
- no changes
v4:
- move simple_xattrs_free() to fs/xattr.c
v3:
- in kmem_xattrs_free(), reinitialize the list
- use simple_xattr_* prefix
- introduce simple_xattr_add() to prevent direct list usage

Original-patch-by: Li Zefan <lizefan@huawei.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lennart Poettering <lpoetter@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 01 Aug, 2012 1 commit
-
-
Committed by Nathan Zimmer
When tmpfs has the interleave memory policy, it always starts allocating for each file from node 0 at offset 0. When there are many small files, the lower nodes fill up disproportionately.

This patch spreads out node usage by starting files at nodes other than 0, by using the inode number to bias the starting node for interleave.

Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
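
The effect of biasing by inode number can be shown with a deliberately simplified model. It assumes that interleave picks a node as "page offset modulo node count" over nodes numbered 0..nr_nodes-1, which ignores node masks and everything else the real policy code considers; `interleave_node_model` and its parameters are hypothetical names for this sketch.

```c
#include <assert.h>

/*
 * Simplified interleave: choose a node for page `index` of the file with
 * inode number `ino`. Without the bias every file starts at node 0; with
 * the bias the starting node is shifted by the inode number.
 */
static unsigned int interleave_node_model(unsigned long index,
                                          unsigned long ino,
                                          unsigned int nr_nodes,
                                          int bias_by_inode)
{
    unsigned long off;

    assert(nr_nodes > 0);
    off = bias_by_inode ? index + ino : index;
    return (unsigned int)(off % nr_nodes);
}
```

In this model, many one-page files all land on node 0 when the bias is off, while with the bias on, files with consecutive inode numbers start on consecutive nodes.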
-
- 14 Jul, 2012 1 commit
-
-
Committed by Al Viro
boolean "does it have to be exclusive?" flag is passed instead; local filesystems should just ignore it - the object is guaranteed not to be there yet.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 12 Jul, 2012 3 commits
-
-
Committed by Hugh Dickins
shmem_add_to_page_cache() has three callsites, but only one of them wants the radix_tree_preload() (an exceptional entry guarantees that the radix tree node is present in the other cases), and only that site can achieve mem_cgroup_uncharge_cache_page() (PageSwapCache makes it a no-op in the other cases). We did it this way originally to reflect add_to_page_cache_locked(); but it's confusing now, so move the radix_tree preloading and mem_cgroup uncharging to that one caller.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
When adding the page_private checks before calling shmem_replace_page(), I did realize that there is a further race, but thought it too unlikely to need a hurried fix.

But independently I've been chasing why a mem cgroup's memory.stat sometimes shows negative rss after all tasks have gone: I expected it to be a stats gathering bug, but actually it's shmem swapping's fault.

It's an old surprise, that when you lock_page(lookup_swap_cache(swap)), the page may have been removed from swapcache before getting the lock; or it may have been freed and reused and be back in swapcache; and it can even be using the same swap location as before (page_private same).

The swapoff case is already secure against this (swap cannot be reused until the whole area has been swapped off, and a new swapped on); and shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for the expected radix_tree entry - but a little too late.

By that time, we might have already decided to shmem_replace_page(): I don't know of a problem from that, but I'd feel more at ease not to do so spuriously. And we have already done mem_cgroup_cache_charge(), on perhaps the wrong mem cgroup: and this charge is not then undone on the error path, because PageSwapCache ends up preventing that.

It's this last case which causes the occasional negative rss in memory.stat: the page is charged here as cache, but (sometimes) found to be anon when eventually it's uncharged - and in between, it's an undeserved charge on the wrong memcg.

Fix this by adding an earlier check on the radix_tree entry: it's inelegant to descend the tree twice, but swapping is not the fast path, and a better solution would need a pair (try+commit) of memcg calls, and a rework of shmem_replace_page() to keep out of the swapcache.

We can use the added shmem_confirm_swap() function to replace the find_get_page+page_cache_release we were already doing on the error path. And add a comment on that -EEXIST: it seems a peculiar errno to be using, but originates from its use in radix_tree_insert().

[It can be surprising to see positive rss left in a memcg's memory.stat after all tasks have gone, since it is supposed to count anonymous but not shmem. Aside from sharing anon pages via fork with a task in some other memcg, it often happens after swapping: because a swap page can't be freed while under writeback, nor while locked. So it's not an error, and these residual pages are easily freed once pressure demands.]

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
Revert 4fb5ef08 ("tmpfs: support SEEK_DATA and SEEK_HOLE"). I believe it's correct, and it's been nice to have from rc1 to rc6; but as the original commit said:

    I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it would be of any use to them on tmpfs. This code adds 92 lines and 752 bytes on x86_64 - is that bloat or worthwhile?

Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs to the dumb generic support for v3.5. We can always reinstate it later if useful, and anyone needing it in a hurry can just get it out of git.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 14 Jun, 2012 1 commit
-
-
Committed by Eric Dumazet
Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe().

Commit 35f3d14d (pipe: add support for shrinking and growing pipes) added the capability to adjust pipe->buffers.

The problem is that some paths don't hold the pipe mutex and assume pipe->buffers doesn't change for their duration.

Fix this by adding a nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 08 Jun, 2012 1 commit
-
-
Committed by Hugh Dickins
Commit bde05d1c ("shmem: replace page if mapping excludes its zone") is not at all likely to break for anyone, but it was an earlier version from before review feedback was incorporated. Fix that up now.

* shmem_replace_page must flush_dcache_page after copy_highpage [akpm]
* Expand comment on why shmem_unuse_inode needs page_swapcount [akpm]
* Remove excess of VM_BUG_ONs from shmem_replace_page [wangcong]
* Check page_private matches swap before calling shmem_replace_page [hughd]
* shmem_replace_page allow for unexpected race in radix_tree lookup [hughd]

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Marchesin <marcheu@chromium.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 30 May, 2012 10 commits
-
-
Committed by Al Viro
pass inode + parent's inode or NULL instead of dentry + bool saying whether we want the parent or not.

NOTE: that needs ceph fix folded in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Hugh Dickins
It's quite easy for tmpfs to scan the radix_tree to support llseek's new SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are still on my mind (in particular, the !PageUptodate-ness of pages fallocated but still unwritten).

But I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it would be of any use to them on tmpfs. This code adds 92 lines and 752 bytes on x86_64 - is that bloat or worthwhile?

[akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
As it stands, a large fallocate() on tmpfs is liable to fill memory with pages, freed on failure except when they run into swap, at which point they become fixed into the file despite the failure. That feels quite wrong, to be consuming resources precisely when they're in short supply.

Go the other way instead: shmem_fallocate() indicate the range it has fallocated to shmem_writepage(), keeping count of pages it's allocating; shmem_writepage() reactivate instead of swapping out pages fallocated by this syscall (but happily swap out those from earlier occasions), keeping count; shmem_fallocate() compare counts and give up once the reactivated pages have started coming back to writepage (approximately: some zones would in fact recycle faster than others).

This is a little unusual, but works well: although we could consider the failure to swap as a bug, and fix it later with SWAP_MAP_FALLOC handling added in swapfile.c and memcontrol.c, I doubt that we shall ever want to.

(If there's no swap, an over-large fallocate() on tmpfs is limited in the same way as writing: stopped by rlimit, or by tmpfs mount size if that was set sensibly, or by __vm_enough_memory() heuristics if OVERCOMMIT_GUESS or OVERCOMMIT_NEVER. If OVERCOMMIT_ALWAYS, then it is liable to OOM-kill others as writing would, but stops and frees if interrupted.)

Now that everything is freed on failure, we can then skip updating ctime.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
In the previous episode, we left the already-fallocated pages attached to the file when shmem_fallocate() fails part way through.

Now try to do better, by extending the earlier optimization of !Uptodate pages (then always under page lock) to !Uptodate pages (outside of page lock), representing fallocated pages. And don't waste time clearing them at the time of fallocate(), leave that until later if necessary.

Adapt shmem_truncate_range() to shmem_undo_range(), so that a failing fallocate can recognize and remove precisely those !Uptodate allocations which it added (and were not independently allocated by racing tasks).

But unless we start playing with swapfile.c and memcontrol.c too, once one of our fallocated pages reaches shmem_writepage(), we do then have to instantiate it as an ordinarily allocated page, before swapping out. This is unsatisfactory, but improved in the next episode.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
The systemd plumbers expressed a wish that tmpfs support preallocation. Cong Wang wrote a patch, but several kernel guys expressed scepticism: https://lkml.org/lkml/2011/11/18/137

Christoph Hellwig: What for exactly? Please explain why preallocating on tmpfs would make any sense.

Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on the /dev/shm filesystem. The glibc fallback loop for -ENOSYS [or -EOPNOTSUPP] on fallocate is just ugly.

Hugh Dickins: If tmpfs is going to support fallocate(FALLOC_FL_PUNCH_HOLE), it would seem perverse to permit the deallocation but fail the allocation.

Christoph Hellwig: Agreed.

Now that we do have shmem_fallocate() for hole-punching, plumb in basic support for preallocation mode too. It's fairly straightforward (though quite a few details needed attention), except for when it fails part way through. What a pity that fallocate(2) was not specified to return the length allocated, permitting short fallocations!

As it is, when it fails part way through, we ought to free what has just been allocated by this system call; but must be very sure not to free any allocated earlier, or any allocated by racing accesses (not all excluded by i_mutex).

But we cannot distinguish them: so in this patch simply leak allocations on partial failure (they will be freed later if the file is removed).

An attractive alternative approach would have been for fallocate() not to allocate pages at all, but note reservations by entries in the radix-tree. But that would give less assurance, and, critically, would be hard to fit with mem cgroups (who owns the reservations?): allocating pages lets fallocate() behave in just the same way as write().

Based-on-patch-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
Remove vmtruncate_range(), and remove the truncate_range method from struct inode_operations: only tmpfs ever supported it, and tmpfs has now converted over to using the fallocate method of file_operations.

Update Documentation accordingly, adding (setlease and) fallocate lines. And while we're in mm.h, remove duplicate declarations of shmem_lock() and shmem_file_setup(): everyone is now using the ones in shmem_fs.h.

Based-on-patch-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
tmpfs has supported hole-punching since 2.6.16, via madvise(,,MADV_REMOVE).

But nowadays fallocate(,FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,,) is the agreed way to punch holes.

So add shmem_fallocate() to support that, and tweak shmem_truncate_range() to support partial pages at both the beginning and end of range (never needed for madvise, which demands rounded addr and rounds up length).

Based-on-patch-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
Nick proposed years ago that tmpfs should avoid clearing its pages where write will overwrite them with new data, as ramfs has long done. But I messed it up and just got bad data. Tried again recently, it works fine.

Here's time output for writing 4GiB 16 times on this Core i5 laptop:

    before: real 0m21.169s user 0m0.028s sys 0m21.057s
            real 0m21.382s user 0m0.016s sys 0m21.289s
            real 0m21.311s user 0m0.020s sys 0m21.217s

    after:  real 0m18.273s user 0m0.032s sys 0m18.165s
            real 0m18.354s user 0m0.020s sys 0m18.265s
            real 0m18.440s user 0m0.032s sys 0m18.337s

    ramfs:  real 0m16.860s user 0m0.028s sys 0m16.765s
            real 0m17.382s user 0m0.040s sys 0m17.273s
            real 0m17.133s user 0m0.044s sys 0m17.021s

Yes, I have done perf reports, but they need more explanation than they deserve: in summary, clear_page vanishes, its cache loading shifts into copy_user_generic_unrolled; shmem_getpage_gfp goes down, and surprisingly mark_page_accessed goes way up - I think because they are respectively where the cache gets to be reloaded after being purged by clear or copy.

Suggested-by: Nick Piggin <npiggin@gmail.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
Let tmpfs into the NOSEC optimization (avoiding file_remove_suid() overhead on most common writes): set MS_NOSEC on its superblocks.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
The GMA500 GPU driver uses GEM shmem objects, but with a new twist: the backing RAM has to be below 4GB. Not a problem while the boards supported only 4GB: but now Intel's D2700MUD boards support 8GB, and their GMA3600 is managed by the GMA500 driver.
shmem/tmpfs has never pretended to support hardware restrictions on the backing memory, but it might have appeared to do so before v3.1, and even now it works fine until a page is swapped out then back in. When read_cache_page_gfp() supplied a freshly allocated page for copy, that compensated for whatever choice might have been made by earlier swapin readahead; but swapoff was likely to destroy the illusion.
We'd like to continue to support GMA500, so now add a new shmem_should_replace_page() check on the zone when about to move a page from swapcache to filecache (in swapin and swapoff cases), with shmem_replace_page() to allocate and substitute a suitable page (given gma500/gem.c's mapping_set_gfp_mask GFP_KERNEL | __GFP_DMA32).
This does involve a minor extension to mem_cgroup_replace_page_cache() (the page may or may not have already been charged); and I've removed a comment and call to mem_cgroup_uncharge_cache_page(), which in fact is always a no-op while PageSwapCache.
Also removed optimization of an unlikely path in shmem_getpage_gfp(), now that we need to check PageSwapCache more carefully (a racing caller might already have made the copy). And at one point shmem_unuse_inode() needs to use the hitherto private page_swapcount(), to guard against racing with inode eviction.
It would make sense to extend shmem_should_replace_page(), to cover cpuset and NUMA mempolicy restrictions too, but set that aside for now: needs a cleanup of shmem mempolicy handling, and more testing, and ought to handle swap faults in do_swap_page() as well as shmem.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Marchesin <marcheu@chromium.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Clark <rob.clark@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
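The zone check described in this commit can be pictured with a simplified userspace model. Everything below is an assumption made for illustration: the model_zone enum, struct model_page and model_should_replace_page() are hypothetical stand-ins, not the kernel's types or the patch's code. The model only captures the comparison: a page must be replaced when it sits in a higher zone than the mapping's allocation mask permits.

```c
#include <stdbool.h>

/* Hypothetical zones, ordered from most to least restrictive. */
enum model_zone {
    MODEL_ZONE_DMA,
    MODEL_ZONE_DMA32,    /* memory below 4GB */
    MODEL_ZONE_NORMAL    /* any memory, including above 4GB */
};

/* Hypothetical page descriptor: only the zone matters for this check. */
struct model_page {
    enum model_zone zone;
};

/*
 * Return true when the page lives in a zone above the highest one the
 * mapping allows, meaning a suitable page must be allocated and
 * substituted before the page moves from swapcache to filecache.
 */
bool model_should_replace_page(const struct model_page *page,
                               enum model_zone highest_allowed)
{
    return page->zone > highest_allowed;
}
```

In this model, a mapping restricted to MODEL_ZONE_DMA32 (as a driver needing RAM below 4GB would request) rejects a page that swapin placed in MODEL_ZONE_NORMAL, while an unrestricted mapping accepts any page.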
-
- 16 May, 2012 1 commit
-
-
Committed by Eric W. Biederman
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
- 06 May, 2012 1 commit
-
-
Committed by Jan Kara
After we moved inode_sync_wait() out of end_writeback(), it no longer makes sense to call the function end_writeback(). Rename it to clear_inode(), which describes what the function really does: set the I_CLEAR flag.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
-
- 22 March, 2012 1 commit
-
-
Committed by Jarkko Sakkinen
Add to the generic xattr support introduced in Linux 3.0 by implementing the initxattrs callback. This allows security attributes from LSM and EVM to be consulted when an inode is created.
[hughd@google.com: moved under CONFIG_TMPFS_XATTR, with memcpy in shmem_xattr_alloc]
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com>
Reviewed-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 21 March, 2012 2 commits
-
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-