1. 19 6月, 2009 1 次提交
    • I
      perf_counter, x86: Improve interactions with fast-gup · 0c871971
      Ingo Molnar 提交于
      Improve a few details in perfcounter call-chain recording that
      makes use of fast-GUP:
      
      - Use ACCESS_ONCE() to observe the pte value. ptes are fundamentally
        racy and can be changed on another CPU, so we have to be careful
        about how we access them. The PAE branch is already careful with
        read-barriers - but the non-PAE and 64-bit side needs an
        ACCESS_ONCE() to make sure the pte value is observed only once.
      
      - make the checks a bit stricter so that we can feed it any kind of
        cra^H^H^H user-space input ;-)
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0c871971
  2. 05 2月, 2009 1 次提交
  3. 30 1月, 2009 1 次提交
  4. 24 1月, 2009 1 次提交
    • H
      x86: uaccess: introduce try and catch framework · fe40c0af
      Hiroshi Shimamoto 提交于
      Impact: introduce new uaccess exception handling framework
      
      Introduce {get|put}_user_try and {get|put}_user_catch as new uaccess exception
      handling framework.
      {get|put}_user_try begins exception block and {get|put}_user_catch(err) ends
      the block and gets err if an exception occured in {get|put}_user_ex() in the
      block. The exception is stored thread_info->uaccess_err.
      
      The example usage of this framework is below;
      int func()
      {
      	int err = 0;
      
      	get_user_try {
      		get_user_ex(...);
      		get_user_ex(...);
      		:
      	} get_user_catch(err);
      
      	return err;
      }
      
      Note: get_user_ex() is not clear the value when an exception occurs, it's
      different from the behavior of __get_user(), but I think it doesn't matter.
      Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      fe40c0af
  5. 21 1月, 2009 2 次提交
  6. 12 12月, 2008 1 次提交
  7. 23 10月, 2008 2 次提交
  8. 12 9月, 2008 1 次提交
  9. 11 9月, 2008 1 次提交
  10. 10 9月, 2008 1 次提交
    • N
      x86: some lock annotations for user copy paths · c10d38dd
      Nick Piggin 提交于
      copy_to/from_user and all its variants (except the atomic ones) can take a
      page fault and perform non-trivial work like taking mmap_sem and entering
      the filesyste/pagecache.
      
      Unfortunately, this often escapes lockdep because a common pattern is to
      use it to read in some arguments just set up from userspace, or write data
      back to a hot buffer. In those cases, it will be unlikely for page reclaim
      to get a window in to cause copy_*_user to fault.
      
      With the new might_lock primitives, add some annotations to x86. I don't
      know if I caught all possible faulting points (it's a bit of a maze, and I
      didn't really look at 32-bit). But this is a starting point.
      
      Boots and runs OK so far.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c10d38dd
  11. 27 7月, 2008 1 次提交
    • N
      x86: lockless get_user_pages_fast() · 8174c430
      Nick Piggin 提交于
      Implement get_user_pages_fast without locking in the fastpath on x86.
      
      Do an optimistic lockless pagetable walk, without taking mmap_sem or any
      page table locks or even mmap_sem.  Page table existence is guaranteed by
      turning interrupts off (combined with the fact that we're always looking
      up the current mm, means we can do the lockless page table walk within the
      constraints of the TLB shootdown design).  Basically we can do this
      lockless pagetable walk in a similar manner to the way the CPU's pagetable
      walker does not have to take any locks to find present ptes.
      
      This patch (combined with the subsequent ones to convert direct IO to use
      it) was found to give about 10% performance improvement on a 2 socket 8
      core Intel Xeon system running an OLTP workload on DB2 v9.5
      
       "To test the effects of the patch, an OLTP workload was run on an IBM
        x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
        2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel.  Comparing
        runs with and without the patch resulted in an overall performance
        benefit of ~9.8%.  Correspondingly, oprofiles showed that samples from
        __up_read and __down_read routines that is seen during thread contention
        for system resources was reduced from 2.8% down to .05%.  Monitoring the
        /proc/vmstat output from the patched run showed that the counter for
        fast_gup contained a very high number while the fast_gup_slow value was
        zero."
      
      (fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
      counter we had for the number of times the slowpath was invoked).
      
      The main reason for the improvement is that DB2 has multiple threads each
      issuing direct-IO.  Direct-IO uses get_user_pages, and thus the threads
      contend the mmap_sem cacheline, and can also contend on page table locks.
      
      I would anticipate larger performance gains on larger systems, however I
      think DB2 uses an adaptive mix of threads and processes, so it could be
      that thread contention remains pretty constant as machine size increases.
      In which case, we stuck with "only" a 10% gain.
      
      The downside of using get_user_pages_fast is that if there is not a pte
      with the correct permissions for the access, we end up falling back to
      get_user_pages and so the get_user_pages_fast is a bit of extra work.
      However this should not be the common case in most performance critical
      code.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: Kconfig fix]
      [akpm@linux-foundation.org: Makefile fix/cleanup]
      [akpm@linux-foundation.org: warning fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Reviewed-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8174c430
  12. 23 7月, 2008 1 次提交
    • V
      x86: consolidate header guards · 77ef50a5
      Vegard Nossum 提交于
      This patch is the result of an automatic script that consolidates the
      format of all the headers in include/asm-x86/.
      
      The format:
      
      1. No leading underscore. Names with leading underscores are reserved.
      2. Pathname components are separated by two underscores. So we can
         distinguish between mm_types.h and mm/types.h.
      3. Everything except letters and numbers are turned into single
         underscores.
      Signed-off-by: NVegard Nossum <vegard.nossum@gmail.com>
      77ef50a5
  13. 09 7月, 2008 9 次提交
  14. 11 10月, 2007 1 次提交