- 18 12月, 2009 6 次提交
-
-
由 Masami Hiramatsu 提交于
Introduce coredump parameter data structure (struct coredump_params) to simplify binfmt->core_dump() arguments. Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com> Suggested-by: NIngo Molnar <mingo@elte.hu> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Bernhard Walle 提交于
Fix following warning in linux-next by guarding the function definition (both the "extern" and the inline) with #ifdef __KERNEL__. usr/include/linux/vt.h:89: userspace cannot call function or variable defined in the kernel Introduced by commit 5ada918b ("vt: introduce and use vt_kmsg_redirect() function"). Signed-off-by: NBernhard Walle <bernhard@bwalle.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
This reverts commit e4c570c4, as requested by Alexey: "I think I gave a good enough arguments to not merge it. To iterate: * patch makes impossible to start using ext3 on EXT3_FS=n kernels without reboot. * this is done only for one pointer on task_struct" None of config options which define task_struct are tristate directly or effectively." Requested-by: NAlexey Dobriyan <adobriyan@gmail.com> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
This reverts commit e9496ff4. Quoth Al: "it's dependent on a lot of other stuff not currently in mainline and badly broken with current fs/namespace.c. Sorry, badly out-of-order cherry-pick from old queue. PS: there's a large pending series reworking the refcounting and lifetime rules for vfsmounts that will, among other things, allow to rip a subtree away _without_ dissolving connections in it, to be garbage-collected when all active references are gone. It's considerably saner wrt "is the subtree busy" logics, but it's nowhere near being ready for merge at the moment; this changeset is one of the things becoming possible with that sucker, but it certainly shouldn't have been picked during this cycle. My apologies..." Noticed-by: NEric Paris <eparis@redhat.com> Requested-by: NAl Viro <viro@ZenIV.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
After I_SYNC was split from I_LOCK the leftover is always used together with I_NEW and thus superflous. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
We recently go rid of all callers of do_sync_file_range as they're better served with vfs_fsync or the filemap_write_and_wait. Now that do_sync_file_range is down to a single caller fold it into it so that people don't start using it again accidentally. While at it also switch it from using __filemap_fdatawrite_range(..., WB_SYNC_ALL) to the more clear filemap_fdatawrite_range(). Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 12月, 2009 18 次提交
-
-
由 Erez Zadok 提交于
Copy the inode size and blocks from one inode to another correctly on 32-bit systems with CONFIG_SMP, CONFIG_PREEMPT, or CONFIG_LBDAF. Use proper inode spinlocks only when i_size/i_blocks cannot fit in one 32-bit word. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: NErez Zadok <ezk@cs.sunysb.edu> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Erez Zadok 提交于
This get_nlinks parameter was never used by the only mainline user, ecryptfs; and it has never been used by unionfs or wrapfs either. Acked-by: NDustin Kirkland <kirkland@canonical.com> Acked-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com> Signed-off-by: NErez Zadok <ezk@cs.sunysb.edu> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Liam Girdwood 提交于
consumer.h requires device.h for stand alone build. Signed-off-by: NLiam Girdwood <lrg@slimlogic.co.uk>
-
由 Mark Brown 提交于
Since some regulators in the system may not support suspend mode configuration we need to allow some regulators to have a missing suspend mode configuration. Do this by requiring that disabled regulators are explicitly flagged and then skip over regulators that have no state specified. Try to avoid surprises by warning the if we could set the state but no configuration is provided. This also ensures that an all zeros configuration generates a warning rather than silently disabling the regulator. Reported-by: NJoonyoung Shim <jy0922.shim@samsung.com> Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: NLiam Girdwood <lrg@slimlogic.co.uk>
-
由 Mark Brown 提交于
The BuckWise DC-DC convertors in WM831x devices support switching to a second output voltage using the logic level on one of the device pins. This is intended to allow rapid voltage switching for uses like cpufreq, replacing the I2C or SPI write used to configure the voltage of the regulator with a much faster GPIO status change. This is implemented by keeping the DVS voltage configured as the maximum voltage permitted for the regulator. If a request is made for the maximum voltage then the GPIO is used to switch to the DVS voltage, otherwise the normal ON voltage is updated and used. This follows the idiom used by most cpufreq drivers, which drop the minimum voltage as the core frequency is dropped but use a constant maximum - raising the voltage should normally be fast, but lowering it may be slower. Configuration of the DVS MFP on the device should be done externally, for example via OTP. Support is present in the hardware for monitoring the status of the transition using a second GPIO. This is not currently implemented but platform data is provided for it - the driver currently assumes that the device will be configured to transition immediately - but platform data is provided to reduce merge issues once it is. Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Acked-by: NSamuel Ortiz <sameo@linux.intel.com> Signed-off-by: NLiam Girdwood <lrg@slimlogic.co.uk>
-
由 Wolfram Sang 提交于
Tested with a MX25-based custom board. Signed-off-by: NWolfram Sang <w.sang@pengutronix.de> Acked-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: NLiam Girdwood <lrg@slimlogic.co.uk>
-
由 Dave Chinner 提交于
Change all async metadata buffers to use [READ|WRITE]_META I/O types so that the I/O doesn't get issued immediately. This allows merging of adjacent metadata requests but still prioritises them over bulk data. This shows a 10-15% improvement in sequential create speed of small files. Don't include the log buffers in this classification - leave them as sync types so they are issued immediately. Signed-off-by: NDave Chinner <dgc@sgi.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Shaohua Li 提交于
Signed-off-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Shaohua Li 提交于
Signed-off-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Shaohua Li 提交于
v2->v1: .improve debug info as suggedted by Bjorn,Kenji .API is using uuid string as suggested by Alexey Add an API to execute _OSC. A lot of devices can have this method, so add a generic API. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Christoph Hellwig 提交于
Currently the locking in blockdev_direct_IO is a mess, we have three different locking types and very confusing checks for some of them. The most complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be used. This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case is unused anyway, and the write side is almost identical to DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the create argument for the get_blocks callback to zero, but we can easily move that to the actual get_blocks callbacks. There are four users of the DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is fine with the new version, ocfs2 only errors out if create were ever set, and we can remove this dead code now, the block device code only ever uses create for an error message if we are fully beyond the device which can never happen, and last but not least XFS will need the new behavour for writes. Now we can replace the lock_type variable with a flags one, where no flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first flag. Separate out the check for not allowing to fill holes into a separate flag, although for now both flags always get set at the same time. Also revamp the documentation of the locking scheme to actually make sense. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Now that we cache the ACL pointers in the generic inode all the generic_acl cruft can go away and generic_acl.c can directly implement xattr handlers dealing with the full Posix ACL semantics for in-memory filesystems. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Add a flags argument to struct xattr_handler and pass it to all xattr handler methods. This allows using the same methods for multiple handlers, e.g. for the ACL methods which perform exactly the same action for the access and default ACLs, just using a different underlying attribute. With a little more groundwork it'll also allow sharing the methods for the regular user/trusted/secure handlers in extN, ocfs2 and jffs2 like it's already done for xfs in this patch. Also change the inode argument to the handlers to a dentry to allow using the handlers mechnism for filesystems that require it later, e.g. cifs. [with GFS2 bits updated by Steven Whitehouse <swhiteho@redhat.com>] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJames Morris <jmorris@namei.org> Acked-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Kill the 'update' argument of ima_path_check(), kill dead code in ima. Current rules: ima counters are bumped at the same time when the file switches from put_filp() fodder to fput() one. Which happens exactly in two places - alloc_file() and __dentry_open(). Nothing else needs to do that at all. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Paris 提交于
All users outside of fs/ of get_empty_filp() have been removed. This patch moves the definition from the include/ directory to internal.h so no new users crop up and removes the EXPORT_SYMBOL. I'd love to see open intents stop using it too, but that's a problem for another day and a smarter developer! Signed-off-by: NEric Paris <eparis@redhat.com> Acked-by: NMiklos Szeredi <miklos@szeredi.hu> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and have the caller grab both mnt and dentry; kill leak in infiniband, while we are at it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 16 12月, 2009 16 次提交
-
-
由 Akinobu Mita 提交于
Use bitmap library and kill some unused iommu helper functions. 1. s/iommu_area_free/bitmap_clear/ 2. s/iommu_area_reserve/bitmap_set/ 3. Use bitmap_find_next_zero_area instead of find_next_zero_area This cannot be simple substitution because find_next_zero_area doesn't check the last bit of the limit in bitmap 4. Remove iommu_area_free, iommu_area_reserve, and find_next_zero_area Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This introduces new bitmap functions: bitmap_set: Set specified bit area bitmap_clear: Clear specified bit area bitmap_find_next_zero_area: Find free bit area These are mostly stolen from iommu helper. The differences are: - Use find_next_bit instead of doing test_bit for each bit - Rewrite bitmap_set and bitmap_clear Instead of setting or clearing for each bit. - Check the last bit of the limit iommu-helper doesn't want to find such area - The return value if there is no zero area find_next_zero_area in iommu helper: returns -1 bitmap_find_next_zero_area: return >= bitmap size Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Lothar Wassmann <LW@KARO-electronics.de> Cc: Roland Dreier <rolandd@cisco.com> Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jean Delvare 提交于
resource_size() doesn't change the resource it operates on, so the res parameter can be marked const. Same for resource_type(). Signed-off-by: NJean Delvare <khali@linux-fr.org> Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Currently the locking in blockdev_direct_IO is a mess, we have three different locking types and very confusing checks for some of them. The most complicated one is DIO_OWN_LOCKING for reads, which happens to not actually be used. This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read case is unused anyway, and the write side is almost identical to DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the create argument for the get_blocks callback to zero, but we can easily move that to the actual get_blocks callbacks. There are four users of the DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is fine with the new version, ocfs2 only errors out if create were ever set, and we can remove this dead code now, the block device code only ever uses create for an error message if we are fully beyond the device which can never happen, and last but not least XFS will need the new behavour for writes. Now we can replace the lock_type variable with a flags one, where no flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first flag. Separate out the check for not allowing to fill holes into a separate flag, although for now both flags always get set at the same time. Also revamp the documentation of the locking scheme to actually make sense. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NChristoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alex Elder <aelder@sgi.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Shaohua Li 提交于
Don't know the reason, but it appears ki_wait field of iocb never gets used. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Zach Brown <zach.brown@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Amerigo Wang 提交于
Implement shrinking the reserved memory for crash kernel, if it is more than enough. For example, if you have already reserved 128M, now you just want 100M, you can do: # echo $((100*1024*1024)) > /sys/kernel/kexec_crash_size Note, you can only do this before loading the crash kernel. Signed-off-by: NWANG Cong <amwang@redhat.com> Cc: Neil Horman <nhorman@redhat.com> Acked-by: NEric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Amerigo Wang 提交于
We have HARD_MSGMAX lower on 64bit than on 32bit, since usually 64bit machines have more memory than 32bit machines. Making it higher on 64bit seems reasonable, and keep the original number on 32bit. Acked-by: NSerge E. Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: NWANG Cong <amwang@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Manfred Spraul 提交于
Based on Nick's findings: sysv sem has the concept of semaphore arrays that consist out of multiple semaphores. Atomic operations that affect multiple semaphores are supported. The patch is the first step for optimizing simple, single semaphore operations: In addition to the global list of all pending operations, a 2nd, per-semaphore list with the simple operations is added. Note: this patch does not make sense by itself, the new list is used nowhere. Signed-off-by: NManfred Spraul <manfred@colorfullife.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Pierre Peiffer <peifferp@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Kill force_sig_specific(), this trivial wrapper has no callers. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
No changes in compiled code. The patch adds the new helper, si_fromuser() and changes check_kill_permission() to use this helper. The real effect of this patch is that from now we "officially" consider SEND_SIG_NOINFO signal as "from user-space" signals. This is already true if we look at the code which uses SEND_SIG_NOINFO, except __send_signal() has another opinion - see the next patch. The naming of these special SEND_SIG_XXX siginfo's is really bad imho. From __send_signal()'s pov they mean SEND_SIG_NOINFO from user SEND_SIG_PRIV from kernel SEND_SIG_FORCED no info Signed-off-by: NOleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@redhat.com> Reviewed-by: NSukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Suggested by Roland. Change tracehook_report_syscall_exit() to look at step flag and send the trap signal if needed. This change affects ia64, microblaze, parisc, powerpc, sh. They pass nonzero "step" argument to tracehook but since it was ignored the tracee reports via ptrace_notify(), this is not right and not consistent. - PTRACE_SETSIGINFO doesn't work - if the tracer resumes the tracee with signr != 0 the new signal is generated rather than delivering it - If PT_TRACESYSGOOD is set the tracee reports the wrong exit_code I don't have a powerpc machine, but I think this test-case should see the difference: #include <unistd.h> #include <sys/ptrace.h> #include <sys/wait.h> #include <assert.h> #include <stdio.h> int main(void) { int pid, status; if (!(pid = fork())) { assert(ptrace(PTRACE_TRACEME) == 0); kill(getpid(), SIGSTOP); getppid(); return 0; } assert(pid == wait(&status)); assert(ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) == 0); assert(ptrace(PTRACE_SYSCALL, pid, 0,0) == 0); assert(pid == wait(&status)); assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0); assert(pid == wait(&status)); if (status == 0x57F) return 0; printf("kernel bug: status=%X shouldn't have 0x80\n", status); return 1; } Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Suggested by Roland. Currently there is no way to synthesize a single-stepping trap in the arch-independent manner. This patch adds the default helper which fills siginfo_t, arch/ can can override it. Architetures which implement user_enable_single_step() should add user_single_step_siginfo() also. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
No functional changes. ptrace_init_task() looks confusing, as if we always auto-attach when "bool ptrace" argument is true, while in fact we attach only if current is traced. Make the code more explicit and kill now unused ptrace_link(). Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daisuke Nishimura 提交于
mem_cgroup_move_parent() calls try_charge first and cancel_charge on failure. IMHO, charge/uncharge(especially charge) is high cost operation, so we should avoid it as far as possible. This patch tries to delay try_charge in mem_cgroup_move_parent() by re-ordering checks it does. And this patch renames mem_cgroup_move_account() to __mem_cgroup_move_account(), changes the return value of __mem_cgroup_move_account() from int to void, and adds a new wrapper(mem_cgroup_move_account()), which checks whether a @pc is valid for moving account and calls __mem_cgroup_move_account(). This patch removes the last caller of trylock_page_cgroup(), so removes its definition too. Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
In global VM, FILE_MAPPED is used but memcg uses MAPPED_FILE. This makes grep difficult. Replace memcg's MAPPED_FILE with FILE_MAPPED And in global VM, mapped shared memory is accounted into FILE_MAPPED. But memcg doesn't. fix it. Note: page_is_file_cache() just checks SwapBacked or not. So, we need to check PageAnon. Cc: Balbir Singh <balbir@in.ibm.com> Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
In massive parallel enviroment, res_counter can be a performance bottleneck. One strong techinque to reduce lock contention is reducing calls by coalescing some amount of calls into one. Considering charge/uncharge chatacteristic, - charge is done one by one via demand-paging. - uncharge is done by - in chunk at munmap, truncate, exit, execve... - one by one via vmscan/paging. It seems we have a chance to coalesce uncharges for improving scalability at unmap/truncation. This patch is a for coalescing uncharge. For avoiding scattering memcg's structure to functions under /mm, this patch adds memcg batch uncharge information to the task. A reason for per-task batching is for making use of caller's context information. We do batched uncharge (deleyed uncharge) when truncation/unmap occurs but do direct uncharge when uncharge is called by memory reclaim (vmscan.c). The degree of coalescing depends on callers - at invalidate/trucate... pagevec size - at unmap ....ZAP_BLOCK_SIZE (memory itself will be freed in this degree.) Then, we'll not coalescing too much. On x86-64 8cpu server, I tested overheads of memcg at page fault by running a program which does map/fault/unmap in a loop. Running a task per a cpu by taskset and see sum of the number of page faults in 60secs. [without memcg config] 40156968 page-faults # 0.085 M/sec ( +- 0.046% ) 27.67 cache-miss/faults [root cgroup] 36659599 page-faults # 0.077 M/sec ( +- 0.247% ) 31.58 miss/faults [in a child cgroup] 18444157 page-faults # 0.039 M/sec ( +- 0.133% ) 69.96 miss/faults [child with this patch] 27133719 page-faults # 0.057 M/sec ( +- 0.155% ) 47.16 miss/faults We can see some amounts of improvement. (root cgroup doesn't affected by this patch) Another patch for "charge" will follow this and above will be improved more. Changelog(since 2009/10/02): - renamed filed of memcg_batch (as pages to bytes, memsw to memsw_bytes) - some clean up and commentary/description updates. - added initialize code to copy_process(). (possible bug fix) Changelog(old): - fixed !CONFIG_MEM_CGROUP case. - rebased onto the latest mmotm + softlimit fix patches. - unified patch for callers - added commetns. - make ->do_batch as bool. - removed css_get() at el. We don't need it. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-