1. 10 2月, 2009 1 次提交
    • T
      x86: make lazy %gs optional on x86_32 · ccbeed3a
      Tejun Heo 提交于
      Impact: pt_regs changed, lazy gs handling made optional, add slight
              overhead to SAVE_ALL, simplifies error_code path a bit
      
      On x86_32, %gs hasn't been used by kernel and handled lazily.  pt_regs
      doesn't have place for it and gs is saved/loaded only when necessary.
      In preparation for stack protector support, this patch makes lazy %gs
      handling optional by doing the followings.
      
      * Add CONFIG_X86_32_LAZY_GS and place for gs in pt_regs.
      
      * Save and restore %gs along with other registers in entry_32.S unless
        LAZY_GS.  Note that this unfortunately adds "pushl $0" on SAVE_ALL
        even when LAZY_GS.  However, it adds no overhead to common exit path
        and simplifies entry path with error code.
      
      * Define different user_gs accessors depending on LAZY_GS and add
        lazy_save_gs() and lazy_load_gs() which are noop if !LAZY_GS.  The
        lazy_*_gs() ops are used to save, load and clear %gs lazily.
      
      * Define ELF_CORE_COPY_KERNEL_REGS() which always read %gs directly.
      
      xen and lguest changes need to be verified.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ccbeed3a
  2. 09 2月, 2009 1 次提交
  3. 06 2月, 2009 1 次提交
  4. 05 2月, 2009 4 次提交
  5. 04 2月, 2009 1 次提交
  6. 31 1月, 2009 4 次提交
    • J
      xen: setup percpu data pointers · 795f99b6
      Jeremy Fitzhardinge 提交于
      Impact: fix xen booting
      
      We need to access percpu data fairly early, so set up the percpu
      registers as soon as possible.  We only need to load the appropriate
      segment register.  We already have a GDT, but its hard to change it
      early because we need to manipulate the pagetable to do so, and that
      hasn't been set up yet.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      795f99b6
    • J
      x86/paravirt: use callee-saved convention for pte_val/make_pte/etc · da5de7c2
      Jeremy Fitzhardinge 提交于
      Impact: Optimization
      
      In the native case, pte_val, make_pte, etc are all just identity
      functions, so there's no need to clobber a lot of registers over them.
      
      (This changes the 32-bit callee-save calling convention to return both
      EAX and EDX so functions can return 64-bit values.)
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      da5de7c2
    • J
      x86/paravirt: add register-saving thunks to reduce caller register pressure · ecb93d1c
      Jeremy Fitzhardinge 提交于
      Impact: Optimization
      
      One of the problems with inserting a pile of C calls where previously
      there were none is that the register pressure is greatly increased.
      The C calling convention says that the caller must expect a certain
      set of registers may be trashed by the callee, and that the callee can
      use those registers without restriction.  This includes the function
      argument registers, and several others.
      
      This patch seeks to alleviate this pressure by introducing wrapper
      thunks that will do the register saving/restoring, so that the
      callsite doesn't need to worry about it, but the callee function can
      be conventional compiler-generated code.  In many cases (particularly
      performance-sensitive cases) the callee will be in assembler anyway,
      and need not use the compiler's calling convention.
      
      Standard calling convention is:
      	 arguments	    return	scratch
      x86-32	 eax edx ecx	    eax		?
      x86-64	 rdi rsi rdx rcx    rax		r8 r9 r10 r11
      
      The thunk preserves all argument and scratch registers.  The return
      register is not preserved, and is available as a scratch register for
      unwrapped callee code (and of course the return value).
      
      Wrapped function pointers are themselves wrapped in a struct
      paravirt_callee_save structure, in order to get some warning from the
      compiler when functions with mismatched calling conventions are used.
      
      The most common paravirt ops, both statically and dynamically, are
      interrupt enable/disable/save/restore, so handle them first.  This is
      particularly easy since their calls are handled specially anyway.
      
      XXX Deal with VMI.  What's their calling convention?
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ecb93d1c
    • J
      xen: move remaining mmu-related stuff into mmu.c · 319f3ba5
      Jeremy Fitzhardinge 提交于
      Impact: Cleanup
      
      Move remaining mmu-related stuff into mmu.c.
      A general cleanup, and lay the groundwork for later patches.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      319f3ba5
  7. 27 1月, 2009 1 次提交
  8. 23 1月, 2009 2 次提交
    • I
      x86, xen: fix hardirq.h merge fallout · 99d0000f
      Ingo Molnar 提交于
      Impact: build fix
      
      This build error:
      
       arch/x86/xen/suspend.c:22: error: implicit declaration of function 'fix_to_virt'
       arch/x86/xen/suspend.c:22: error: 'FIX_PARAVIRT_BOOTMAP' undeclared (first use in this function)
       arch/x86/xen/suspend.c:22: error: (Each undeclared identifier is reported only once
       arch/x86/xen/suspend.c:22: error: for each function it appears in.)
      
      triggers because the hardirq.h unification removed an implicit fixmap.h
      include - on which arch/x86/xen/suspend.c depended. Add the fixmap.h
      include explicitly.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      99d0000f
    • J
      x86/pvops: remove pte_flags pvop · ab897d20
      Jeremy Fitzhardinge 提交于
      pte_flags() was introduced as a new pvop in order to extract just the
      flags portion of a pte, which is a potentially cheaper operation than
      extracting the page number as well.  It turns out this operation is
      not needed, because simply using a mask to extract the flags from a
      pte is sufficient for all current users.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ab897d20
  9. 20 1月, 2009 1 次提交
  10. 18 1月, 2009 5 次提交
  11. 16 1月, 2009 3 次提交
    • I
      percpu: add optimized generic percpu accessors · 6dbde353
      Ingo Molnar 提交于
      It is an optimization and a cleanup, and adds the following new
      generic percpu methods:
      
        percpu_read()
        percpu_write()
        percpu_add()
        percpu_sub()
        percpu_and()
        percpu_or()
        percpu_xor()
      
      and implements support for them on x86. (other architectures will fall
      back to a default implementation)
      
      The advantage is that for example to read a local percpu variable,
      instead of this sequence:
      
       return __get_cpu_var(var);
      
       ffffffff8102ca2b:	48 8b 14 fd 80 09 74 	mov    -0x7e8bf680(,%rdi,8),%rdx
       ffffffff8102ca32:	81
       ffffffff8102ca33:	48 c7 c0 d8 59 00 00 	mov    $0x59d8,%rax
       ffffffff8102ca3a:	48 8b 04 10          	mov    (%rax,%rdx,1),%rax
      
      We can get a single instruction by using the optimized variants:
      
       return percpu_read(var);
      
       ffffffff8102ca3f:	65 48 8b 05 91 8f fd 	mov    %gs:0x7efd8f91(%rip),%rax
      
      I also cleaned up the x86-specific APIs and made the x86 code use
      these new generic percpu primitives.
      
      tj: * fixed generic percpu_sub() definition as Roel Kluin pointed out
          * added percpu_and() for completeness's sake
          * made generic percpu ops atomic against preemption
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6dbde353
    • T
      x86: misc clean up after the percpu update · 004aa322
      Tejun Heo 提交于
      Do the following cleanups:
      
      * kill x86_64_init_pda() which now is equivalent to pda_init()
      
      * use per_cpu_offset() instead of cpu_pda() when initializing
        initial_gs
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      004aa322
    • T
      x86: fold pda into percpu area on SMP · 1a51e3a0
      Tejun Heo 提交于
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      Currently pdas and percpu areas are allocated separately.  %gs points
      to local pda and percpu area can be reached using pda->data_offset.
      This patch folds pda into percpu area.
      
      Due to strange gcc requirement, pda needs to be at the beginning of
      the percpu area so that pda->stack_canary is at %gs:40.  To achieve
      this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
      added and used to reserve pda sized chunk at the start of the percpu
      area.
      
      After this change, for boot cpu, %gs first points to pda in the
      data.init area and later during setup_per_cpu_areas() gets updated to
      point to the actual pda.  This means that setup_per_cpu_areas() need
      to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
      already has modified it when control reaches setup_per_cpu_areas().
      
      This patch also removes now unnecessary get_local_pda() and its call
      sites.
      
      A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1a51e3a0
  12. 12 1月, 2009 1 次提交
    • R
      x86: change flush_tlb_others to take a const struct cpumask · 4595f962
      Rusty Russell 提交于
      Impact: reduce stack usage, use new cpumask API.
      
      This is made a little more tricky by uv_flush_tlb_others which
      actually alters its argument, for an IPI to be sent to the remaining
      cpus in the mask.
      
      I solve this by allocating a cpumask_var_t for this case and falling back
      to IPI should this fail.
      
      To eliminate temporaries in the caller, all flush_tlb_others implementations
      now do the this-cpu-elimination step themselves.
      
      Note also the curious "cpus_or(f->flush_cpumask, cpumask, f->flush_cpumask)"
      which has been there since pre-git and yet f->flush_cpumask is always zero
      at this point.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NMike Travis <travis@sgi.com>
      4595f962
  13. 31 12月, 2008 1 次提交
    • M
      [PATCH] idle cputime accounting · 79741dd3
      Martin Schwidefsky 提交于
      The cpu time spent by the idle process actually doing something is
      currently accounted as idle time. This is plain wrong, the architectures
      that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
      time spent doing nothing and the time spent by idle doing work. The first
      is accounted with account_idle_time and the second with account_system_time.
      The architectures that use the account_xxx_time interface directly and not
      the account_xxx_ticks interface now need to do the check for the idle
      process in their arch code. In particular to improve the system vs true
      idle time accounting the arch code needs to measure the true idle time
      instead of just testing for the idle process.
      To improve the tick based accounting as well we would need an architecture
      primitive that can tell us if the pt_regs of the interrupted context
      points to the magic instruction that halts the cpu.
      
      In addition idle time is no more added to the stime of the idle process.
      This field now contains the system time of the idle process as it should
      be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
      every tick that occurs while idle is running will be accounted as idle
      time.
      
      This patch contains the necessary common code changes to be able to
      distinguish idle system time and true idle time. The architectures with
      support for VIRT_CPU_ACCOUNTING need some changes to exploit this.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      79741dd3
  14. 17 12月, 2008 6 次提交
  15. 13 12月, 2008 1 次提交
  16. 01 12月, 2008 2 次提交
  17. 23 11月, 2008 1 次提交
    • I
      xen: pin correct PGD on suspend · 86bbc2c2
      Ian Campbell 提交于
      Impact: fix Xen guest boot failure
      
      commit eefb47f6 ("xen: use
      spin_lock_nest_lock when pinning a pagetable") changed xen_pgd_walk to
      walk over mm->pgd rather than taking pgd as an argument.
      
      This breaks xen_mm_(un)pin_all() because it makes init_mm.pgd readonly
      instead of the pgd we are interested in and therefore the pin subsequently
      fails.
      
      (XEN) mm.c:2280:d15 Bad type (saw 00000000e8000001 != exp 0000000060000000) for mfn bc464 (pfn 21ca7)
      (XEN) mm.c:2665:d15 Error while pinning mfn bc464
      
      [   14.586913] 1 multicall(s) failed: cpu 0
      [   14.586926] Pid: 14, comm: kstop/0 Not tainted 2.6.28-rc5-x86_32p-xenU-00172-gee2f6cc7 #200
      [   14.586940] Call Trace:
      [   14.586955]  [<c030c17a>] ? printk+0x18/0x1e
      [   14.586972]  [<c0103df3>] xen_mc_flush+0x163/0x1d0
      [   14.586986]  [<c0104bc1>] __xen_pgd_pin+0xa1/0x110
      [   14.587000]  [<c015a330>] ? stop_cpu+0x0/0xf0
      [   14.587015]  [<c0104d7b>] xen_mm_pin_all+0x4b/0x70
      [   14.587029]  [<c022bcb9>] xen_suspend+0x39/0xe0
      [   14.587042]  [<c015a330>] ? stop_cpu+0x0/0xf0
      [   14.587054]  [<c015a3cd>] stop_cpu+0x9d/0xf0
      [   14.587067]  [<c01417cd>] run_workqueue+0x8d/0x150
      [   14.587080]  [<c030e4b3>] ? _spin_unlock_irqrestore+0x23/0x40
      [   14.587094]  [<c014558a>] ? prepare_to_wait+0x3a/0x70
      [   14.587107]  [<c0141918>] worker_thread+0x88/0xf0
      [   14.587120]  [<c01453c0>] ? autoremove_wake_function+0x0/0x50
      [   14.587133]  [<c0141890>] ? worker_thread+0x0/0xf0
      [   14.587146]  [<c014509c>] kthread+0x3c/0x70
      [   14.587157]  [<c0145060>] ? kthread+0x0/0x70
      [   14.587170]  [<c0109d1b>] kernel_thread_helper+0x7/0x10
      [   14.587181]   call  1/3: op=14 arg=[c0415000] result=0
      [   14.587192]   call  2/3: op=14 arg=[e1ca2000] result=0
      [   14.587204]   call  3/3: op=26 arg=[c1808860] result=-22
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      86bbc2c2
  18. 07 11月, 2008 2 次提交
    • J
      xen: make sure stray alias mappings are gone before pinning · d05fdf31
      Jeremy Fitzhardinge 提交于
      Xen requires that all mappings of pagetable pages are read-only, so
      that they can't be updated illegally.  As a result, if a page is being
      turned into a pagetable page, we need to make sure all its mappings
      are RO.
      
      If the page had been used for ioremap or vmalloc, it may still have
      left over mappings as a result of not having been lazily unmapped.
      This change makes sure we explicitly mop them all up before pinning
      the page.
      
      Unlike aliases created by kmap, the there can be vmalloc aliases even
      for non-high pages, so we must do the flush unconditionally.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Linux Memory Management List <linux-mm@kvack.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d05fdf31
    • J
      x86, xen: fix use of pgd_page now that it really does return a page · 47cb2ed9
      Jeremy Fitzhardinge 提交于
      Impact: fix 32-bit Xen guest boot crash
      
      On 32-bit PAE, pud_page, for no good reason, didn't really return a
      struct page *.  Since Jan Beulich's fix "i386/PAE: fix pud_page()",
      pud_page does return a struct page *.
      
      Because PAE has 3 pagetable levels, the pud level is folded into the
      pgd level, so pgd_page() is the same as pud_page(), and now returns
      a struct page *.  Update the xen/mmu.c code which uses pgd_page()
      accordingly.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      47cb2ed9
  19. 27 10月, 2008 1 次提交
    • C
      xen: fix Xen domU boot with batched mprotect · 9f32d21c
      Chris Lalancette 提交于
      Impact: fix guest kernel boot crash on certain configs
      
      Recent i686 2.6.27 kernels with a certain amount of memory (between
      736 and 855MB) have a problem booting under a hypervisor that supports
      batched mprotect (this includes the RHEL-5 Xen hypervisor as well as
      any 3.3 or later Xen hypervisor).
      
      The problem ends up being that xen_ptep_modify_prot_commit() is using
      virt_to_machine to calculate which pfn to update.  However, this only
      works for pages that are in the p2m list, and the pages coming from
      change_pte_range() in mm/mprotect.c are kmap_atomic pages.  Because of
      this, we can run into the situation where the lookup in the p2m table
      returns an INVALID_MFN, which we then try to pass to the hypervisor,
      which then (correctly) denies the request to a totally bogus pfn.
      
      The right thing to do is to use arbitrary_virt_to_machine, so that we
      can be sure we are modifying the right pfn.  This unfortunately
      introduces a performance penalty because of a full page-table-walk,
      but we can avoid that penalty for pages in the p2m list by checking if
      virt_addr_valid is true, and if so, just doing the lookup in the p2m
      table.
      
      The attached patch implements this, and allows my 2.6.27 i686 based
      guest with 768MB of memory to boot on a RHEL-5 hypervisor again.
      Thanks to Jeremy for the suggestions about how to fix this particular
      issue.
      Signed-off-by: NChris Lalancette <clalance@redhat.com>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Chris Lalancette <clalance@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9f32d21c
  20. 21 10月, 2008 1 次提交