1. 18 December 2009, 6 commits
  2. 17 December 2009, 18 commits
  3. 16 December 2009, 16 commits
    • iommu-helper: use bitmap library · a66022c4
      Authored by Akinobu Mita
      Use bitmap library and kill some unused iommu helper functions.
      
      1. s/iommu_area_free/bitmap_clear/
      
      2. s/iommu_area_reserve/bitmap_set/
      
      3. Use bitmap_find_next_zero_area instead of find_next_zero_area
      
        This cannot be a simple substitution, because find_next_zero_area
        doesn't check the last bit of the limit in the bitmap.
      
      4. Remove iommu_area_free, iommu_area_reserve, and find_next_zero_area
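
      As a sketch only (the variables map, size, start, nr, pos, and mask
      belong to a hypothetical allocator, not to the patch itself), the
      substitutions above map over like this:

      	/* 1. was: iommu_area_free(map, pos, nr); */
      	bitmap_clear(map, pos, nr);

      	/* 2. was: iommu_area_reserve(map, pos, nr); */
      	bitmap_set(map, pos, nr);

      	/* 3. was: pos = find_next_zero_area(map, size, start, nr, mask); */
      	pos = bitmap_find_next_zero_area(map, size, start, nr, mask);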
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bitmap: introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area · c1a2a962
      Authored by Akinobu Mita
      This introduces new bitmap functions:
      
      bitmap_set: Set specified bit area
      bitmap_clear: Clear specified bit area
      bitmap_find_next_zero_area: Find free bit area
      
      These are mostly stolen from iommu helper. The differences are:
      
      - Use find_next_bit instead of doing test_bit for each bit
      
      - Rewrite bitmap_set and bitmap_clear
      
        Instead of setting or clearing each bit individually, they now
        operate on whole words of the range.
      
      - Check the last bit of the limit
      
        iommu-helper doesn't want to find such an area.
      
      - The return value if there is no zero area
      
        find_next_zero_area in iommu helper: returns -1
        bitmap_find_next_zero_area: returns a value >= the bitmap size
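
      A minimal sketch of the changed failure convention (variables are
      illustrative, not from the patch):

      	pos = bitmap_find_next_zero_area(map, size, 0, nr, 0);
      	if (pos >= size)	/* the old helper returned -1 instead */
      		return -ENOMEM;
      	bitmap_set(map, pos, nr);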
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Lothar Wassmann <LW@KARO-electronics.de>
      Cc: Roland Dreier <rolandd@cisco.com>
      Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • resource: constify arg to resource_size() and resource_type() · f65380c0
      Authored by Jean Delvare
      resource_size() doesn't change the resource it operates on, so the res
      parameter can be marked const.  Same for resource_type().
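
      For reference, a sketch of the const-qualified forms (the bodies
      follow the long-standing definitions; treat the exact code as
      illustrative):

      	static inline resource_size_t resource_size(const struct resource *res)
      	{
      		return res->end - res->start + 1;
      	}

      	static inline unsigned long resource_type(const struct resource *res)
      	{
      		return res->flags & IORESOURCE_TYPE_BITS;
      	}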
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • direct-io: cleanup blockdev_direct_IO locking · 5fe878ae
      Authored by Christoph Hellwig
      Currently the locking in blockdev_direct_IO is a mess; we have three
      different locking types, and very confusing checks for some of them.
      The most complicated one is DIO_OWN_LOCKING for reads, which happens
      not to be used at all.
      
      This patch gets rid of DIO_OWN_LOCKING - as mentioned above the read
      case is unused anyway, and the write side is almost identical to
      DIO_NO_LOCKING.  The difference is that DIO_NO_LOCKING always sets the
      create argument for the get_blocks callback to zero, but we can easily
      move that into the actual get_blocks callbacks.  There are four users
      of the DIO_NO_LOCKING mode: gfs already ignores the create argument
      and thus is fine with the new version; ocfs2 only errors out if create
      were ever set, and we can remove this dead code now; the block device
      code only ever uses create for an error message if we are fully beyond
      the device, which can never happen; and last but not least XFS will
      need the new behaviour for writes.
      
      Now we can replace the lock_type variable with a flags one, where no
      flag means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the
      first flag.  The check for not allowing holes to be filled is split
      out into a separate flag, although for now both flags always get set
      at the same time.
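
      A sketch of the flag layout this describes (DIO_LOCKING is named in
      the text; the name of the hole flag is assumed here):

      	enum {
      		DIO_LOCKING	= 0x01,	/* need locking between buffered and direct access */
      		DIO_SKIP_HOLES	= 0x02,	/* filesystem does not support filling holes */
      	};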
      
      Also revamp the documentation of the locking scheme to actually make
      sense.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: remove unused field · fac046ad
      Authored by Shaohua Li
      I don't know the reason, but it appears the ki_wait field of the iocb
      never gets used.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: permit reduction of the reserved memory size · 06a7f711
      Authored by Amerigo Wang
      Implement shrinking of the memory reserved for the crash kernel, in
      case more than enough has been reserved.
      
      For example, if you have already reserved 128M, now you just want 100M,
      you can do:
      
      # echo $((100*1024*1024)) > /sys/kernel/kexec_crash_size
      
      Note, you can only do this before loading the crash kernel.
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Cc: Neil Horman <nhorman@redhat.com>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc: HARD_MSGMAX should be higher not lower on 64bit · 9cf18e1d
      Authored by Amerigo Wang
      We currently have HARD_MSGMAX lower on 64bit than on 32bit, even
      though 64bit machines usually have more memory than 32bit machines.
      
      Making it higher on 64bit seems reasonable; keep the original number
      on 32bit.
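
      A hypothetical illustration of the intent (the constants and the
      exact expression here are assumptions, not the patch itself):

      	/* Hypothetical values only: the point is that 64bit now
      	 * gets the larger bound rather than the smaller one. */
      	#if BITS_PER_LONG == 64
      	# define HARD_MSGMAX	65536
      	#else
      	# define HARD_MSGMAX	32768
      	#endif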
      Acked-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: WANG Cong <amwang@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ipc/sem.c: add a per-semaphore pending list · b97e820f
      Authored by Manfred Spraul
      Based on Nick's findings:
      
      sysv sem has the concept of semaphore arrays that consist of multiple
      semaphores.  Atomic operations that affect multiple semaphores are
      supported.
      
      The patch is the first step for optimizing simple, single semaphore
      operations: In addition to the global list of all pending operations, a
      2nd, per-semaphore list with the simple operations is added.
      
      Note: this patch does not make sense by itself; the new list is not
      yet used anywhere.  A sketch of the idea follows.
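
      Sketch of the new layout (field names are assumed from the
      description, not copied from the patch):

      	struct sem {
      		int semval;			/* current value */
      		int sempid;			/* pid of last operation */
      		struct list_head sem_pending;	/* pending single-semaphore ops */
      	};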
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: kill force_sig_specific() · ad09750b
      Authored by Oleg Nesterov
      Kill force_sig_specific(); this trivial wrapper has no callers.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • signals: SEND_SIG_NOINFO should be considered as SI_FROMUSER() · 614c517d
      Authored by Oleg Nesterov
      No changes in compiled code.  The patch adds the new helper,
      si_fromuser(), and changes check_kill_permission() to use it.
      
      The real effect of this patch is that from now on we "officially"
      consider SEND_SIG_NOINFO signals as "from user-space" signals.  This
      is already true if we look at the code which uses SEND_SIG_NOINFO,
      except that __send_signal() has another opinion - see the next patch.
      
      The naming of these special SEND_SIG_XXX siginfo values is really bad,
      imho.  From __send_signal()'s point of view they mean:
      
      	SEND_SIG_NOINFO		from user
      	SEND_SIG_PRIV		from kernel
      	SEND_SIG_FORCED		no info
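
      The helper, as described, reads roughly like this sketch
      (is_si_special() is the existing check for the special SEND_SIG_XXX
      values; the exact body is an assumption):

      	static inline int si_fromuser(const struct siginfo *info)
      	{
      		return info == SEND_SIG_NOINFO ||
      			(!is_si_special(info) && SI_FROMUSER(info));
      	}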
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: change tracehook_report_syscall_exit() to handle stepping · 2f0edac5
      Authored by Oleg Nesterov
      Suggested by Roland.
      
      Change tracehook_report_syscall_exit() to look at step flag and send the
      trap signal if needed.
      
      This change affects ia64, microblaze, parisc, powerpc, sh.  They pass
      a nonzero "step" argument to the tracehook, but since it was ignored
      the tracee reports via ptrace_notify(); this is not right and not
      consistent:
      
      	- PTRACE_SETSIGINFO doesn't work
      
      	- if the tracer resumes the tracee with signr != 0, a new signal
      	  is generated rather than the original one being delivered
      
      	- if PT_TRACESYSGOOD is set, the tracee reports the wrong exit_code
      
      I don't have a powerpc machine, but I think this test-case should see the
      difference:
      
      	#include <unistd.h>
      	#include <signal.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <assert.h>
      	#include <stdio.h>
      
      	int main(void)
      	{
      		int pid, status;
      
      		if (!(pid = fork())) {
      			assert(ptrace(PTRACE_TRACEME) == 0);
      			kill(getpid(), SIGSTOP);
      
      			getppid();
      
      			return 0;
      		}
      
      		assert(pid == wait(&status));
      		assert(ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) == 0);
      
      		assert(ptrace(PTRACE_SYSCALL, pid, 0,0) == 0);
      		assert(pid == wait(&status));
      
      		assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
      		assert(pid == wait(&status));
      
      		if (status == 0x57F)
      			return 0;
      
      		printf("kernel bug: status=%X shouldn't have 0x80\n", status);
      		return 1;
      	}
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: introduce user_single_step_siginfo() helper · 85ec7fd9
      Authored by Oleg Nesterov
      Suggested by Roland.
      
      Currently there is no way to synthesize a single-stepping trap in an
      arch-independent manner.  This patch adds a default helper which fills
      in siginfo_t; arch code can override it.
      
      Architectures which implement user_enable_single_step() should also
      add user_single_step_siginfo().
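
      A minimal sketch of the default helper described above (it fills in a
      SIGTRAP siginfo; treat the exact body as an assumption):

      	static inline void user_single_step_siginfo(struct task_struct *tsk,
      					struct pt_regs *regs, siginfo_t *info)
      	{
      		memset(info, 0, sizeof(*info));
      		info->si_signo = SIGTRAP;
      	}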
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptrace: cleanup ptrace_init_task()->ptrace_link() path · c6a47cc2
      Authored by Oleg Nesterov
      No functional changes.
      
      ptrace_init_task() looks confusing, as if we always auto-attach when
      the "bool ptrace" argument is true, while in fact we attach only if
      current is itself traced.
      
      Make the code more explicit and kill the now-unused ptrace_link().
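
      A sketch of the more explicit form (details abbreviated and assumed,
      not copied from the patch):

      	static inline void ptrace_init_task(struct task_struct *child, bool ptrace)
      	{
      		INIT_LIST_HEAD(&child->ptrace_entry);
      		INIT_LIST_HEAD(&child->ptraced);
      		child->parent = child->real_parent;
      		child->ptrace = 0;
      		/* auto-attach only if current is itself being traced */
      		if (unlikely(ptrace) && (current->ptrace & PT_PTRACED)) {
      			child->ptrace = current->ptrace;
      			__ptrace_link(child, current->parent);
      		}
      	}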
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: cleanup mem_cgroup_move_parent() · 57f9fd7d
      Authored by Daisuke Nishimura
      mem_cgroup_move_parent() calls try_charge first and cancel_charge on
      failure.  IMHO, charge/uncharge (especially charge) is a high-cost
      operation, so we should avoid it as far as possible.
      
      This patch tries to delay try_charge in mem_cgroup_move_parent() by
      re-ordering the checks it does.
      
      It also renames mem_cgroup_move_account() to
      __mem_cgroup_move_account(), changes the return value of
      __mem_cgroup_move_account() from int to void, and adds a new wrapper,
      mem_cgroup_move_account(), which checks whether @pc is valid for
      moving the account and then calls __mem_cgroup_move_account().
      
      This patch removes the last caller of trylock_page_cgroup(), so its
      definition is removed too.
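
      A simplified sketch of the wrapper pattern this describes (the
      argument list and locking details are assumptions):

      	static int mem_cgroup_move_account(struct page_cgroup *pc,
      			struct mem_cgroup *from, struct mem_cgroup *to)
      	{
      		int ret = -EINVAL;
      
      		lock_page_cgroup(pc);
      		if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
      			__mem_cgroup_move_account(pc, from, to);
      			ret = 0;
      		}
      		unlock_page_cgroup(pc);
      		return ret;
      	}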
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: make memcg's file mapped consistent with global VM · d8046582
      Authored by KAMEZAWA Hiroyuki
      In the global VM, FILE_MAPPED is used, but memcg uses MAPPED_FILE.
      This makes grepping difficult.  Replace memcg's MAPPED_FILE with
      FILE_MAPPED.
      
      Also, in the global VM, mapped shared memory is accounted as
      FILE_MAPPED, but memcg doesn't do so; fix it.
      
      Note:
        page_is_file_cache() just checks SwapBacked or not,
        so we need to check PageAnon as well.
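
      A hypothetical helper expressing the check in the note (the name and
      exact placement are invented for illustration):

      	/* SwapBacked alone cannot separate anon pages from shmem,
      	 * so an explicit PageAnon() test is needed. */
      	static bool counts_as_file_mapped(struct page *page)
      	{
      		return page_mapped(page) && !PageAnon(page);
      	}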
      
      Cc: Balbir Singh <balbir@in.ibm.com>
      Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: coalesce uncharge during unmap/truncate · 569b846d
      Authored by KAMEZAWA Hiroyuki
      In a massively parallel environment, res_counter can be a performance
      bottleneck.  One strong technique for reducing lock contention is to
      reduce the number of calls by coalescing several calls into one.
      
      Considering charge/uncharge characteristics:
      	- charge is done one by one via demand-paging.
      	- uncharge is done
      		- in chunks at munmap, truncate, exit, execve...
      		- one by one via vmscan/paging.
      
      It seems we have a chance to coalesce uncharges for improving scalability
      at unmap/truncation.
      
      This patch implements coalescing for uncharge.  To avoid scattering
      memcg's structures into functions under mm/, it adds memcg batch
      uncharge information to the task.  The reason for per-task batching is
      to make use of the caller's context information.  We do batched
      (delayed) uncharge when truncation/unmap occurs, but direct uncharge
      when uncharge is called by memory reclaim (vmscan.c).
      
      The degree of coalescing depends on the caller:
        - at invalidate/truncate: the pagevec size
        - at unmap: ZAP_BLOCK_SIZE
      (Memory itself is freed at this granularity.)  Thus we will not
      coalesce too much.  A sketch of the per-task bookkeeping follows.
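
      Sketch of the per-task batch information described above (field names
      follow the changelog's pages -> bytes renaming; the layout is an
      assumption):

      	struct memcg_batch_info {
      		int do_batch;			/* nonzero while batching is active */
      		struct mem_cgroup *memcg;	/* group being uncharged */
      		unsigned long bytes;		/* coalesced uncharge, in bytes */
      		unsigned long memsw_bytes;	/* coalesced mem+swap uncharge */
      	};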
      
      On an x86-64 8-cpu server, I tested the overhead of memcg at page
      fault by running a program which does map/fault/unmap in a loop,
      running one task per cpu via taskset, and looking at the sum of the
      number of page faults in 60 seconds:
      
      [without memcg config]
        40156968  page-faults              #      0.085 M/sec   ( +-   0.046% )
        27.67 cache-miss/faults
      [root cgroup]
        36659599  page-faults              #      0.077 M/sec   ( +-   0.247% )
        31.58 miss/faults
      [in a child cgroup]
        18444157  page-faults              #      0.039 M/sec   ( +-   0.133% )
        69.96 miss/faults
      [child with this patch]
        27133719  page-faults              #      0.057 M/sec   ( +-   0.155% )
        47.16 miss/faults
      
      We can see some amount of improvement.
      (The root cgroup is not affected by this patch.)
      Another patch, for "charge", will follow this, and the above will
      improve further.
      
      Changelog (since 2009/10/02):
       - renamed fields of memcg_batch (pages to bytes, memsw to memsw_bytes)
       - some cleanup and commentary/description updates.
       - added initialization code to copy_process(). (possible bug fix)
      
      Changelog (old):
       - fixed the !CONFIG_MEM_CGROUP case.
       - rebased onto the latest mmotm + softlimit fix patches.
       - unified the patch for callers.
       - added comments.
       - made ->do_batch a bool.
       - removed css_get() et al.; we don't need it.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>