提交 · e180377f1ae48b3cbc559c9875d9b038f7f000c6 · Linux-御风守护者 / linux

13 12月, 2012 1 次提交

thp: change split_huge_page_pmd() interface · e180377f

由 Kirill A. Shutemov 提交于 12月 12, 2012

Pass vma instead of mm and add address parameter.

In most cases we already have vma on the stack. We provides
split_huge_page_pmd_mm() for few cases when we have mm, but not vma.

This change is preparation to huge zero pmd splitting implementation.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e180377f

20 9月, 2012 1 次提交

x86: get rid of TIF_IRET hackery · e76623d6

由 Al Viro 提交于 8月 02, 2012

TIF_NOTIFY_RESUME will work in precisely the same way; all that
is achieved by TIF_IRET is appearing that there's some work to be
done, so we end up on the iret exit path.  Just use NOTIFY_RESUME.
And for execve() do that in 32bit start_thread(), not sys_execve()
itself.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e76623d6

06 6月, 2012 1 次提交

x86/debug: Add KERN_<LEVEL> to bare printks, convert printks to pr_<level> · c767a54b

由 Joe Perches 提交于 5月 21, 2012

Use a more current logging style:

 - Bare printks should have a KERN_<LEVEL> for consistency's sake
 - Add pr_fmt where appropriate
 - Neaten some macro definitions
 - Convert some Ok output to OK
 - Use "%s: ", __func__ in pr_fmt for summit
 - Convert some printks to pr_<level>

Message output is not identical in all cases.
Signed-off-by: NJoe Perches <joe@perches.com>
Cc: levinsasha928@gmail.com
Link: http://lkml.kernel.org/r/1337655007.24226.10.camel@joe2Laptop
[ merged two similar patches, tidied up the changelog ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

c767a54b

22 3月, 2012 1 次提交

mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read mode · 1a5a9906

由 Andrea Arcangeli 提交于 3月 21, 2012

In some cases it may happen that pmd_none_or_clear_bad() is called with
the mmap_sem hold in read mode.  In those cases the huge page faults can
allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a
false positive from pmd_bad() that will not like to see a pmd
materializing as trans huge.

It's not khugepaged causing the problem, khugepaged holds the mmap_sem
in write mode (and all those sites must hold the mmap_sem in read mode
to prevent pagetables to go away from under them, during code review it
seems vm86 mode on 32bit kernels requires that too unless it's
restricted to 1 thread per process or UP builds).  The race is only with
the huge pagefaults that can convert a pmd_none() into a
pmd_trans_huge().

Effectively all these pmd_none_or_clear_bad() sites running with
mmap_sem in read mode are somewhat speculative with the page faults, and
the result is always undefined when they run simultaneously.  This is
probably why it wasn't common to run into this.  For example if the
madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page
fault, the hugepage will not be zapped, if the page fault runs first it
will be zapped.

Altering pmd_bad() not to error out if it finds hugepmds won't be enough
to fix this, because zap_pmd_range would then proceed to call
zap_pte_range (which would be incorrect if the pmd become a
pmd_trans_huge()).

The simplest way to fix this is to read the pmd in the local stack
(regardless of what we read, no need of actual CPU barriers, only
compiler barrier needed), and be sure it is not changing under the code
that computes its value.  Even if the real pmd is changing under the
value we hold on the stack, we don't care.  If we actually end up in
zap_pte_range it means the pmd was not none already and it was not huge,
and it can't become huge from under us (khugepaged locking explained
above).

All we need is to enforce that there is no way anymore that in a code
path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad
can run into a hugepmd.  The overhead of a barrier() is just a compiler
tweak and should not be measurable (I only added it for THP builds).  I
don't exclude different compiler versions may have prevented the race
too by caching the value of *pmd on the stack (that hasn't been
verified, but it wouldn't be impossible considering
pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines
and there's no external function called in between pmd_trans_huge and
pmd_none_or_clear_bad).

		if (pmd_trans_huge(*pmd)) {
			if (next-addr != HPAGE_PMD_SIZE) {
				VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem));
				split_huge_page_pmd(vma->vm_mm, pmd);
			} else if (zap_huge_pmd(tlb, vma, pmd, addr))
				continue;
			/* fall through */
		}
		if (pmd_none_or_clear_bad(pmd))

Because this race condition could be exercised without special
privileges this was reported in CVE-2012-1179.

The race was identified and fully explained by Ulrich who debugged it.
I'm quoting his accurate explanation below, for reference.

====== start quote =======
      mapcount 0 page_mapcount 1
      kernel BUG at mm/huge_memory.c:1384!

    At some point prior to the panic, a "bad pmd ..." message similar to the
    following is logged on the console:

      mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7).

    The "bad pmd ..." message is logged by pmd_clear_bad() before it clears
    the page's PMD table entry.

        143 void pmd_clear_bad(pmd_t *pmd)
        144 {
    ->  145         pmd_ERROR(*pmd);
        146         pmd_clear(pmd);
        147 }

    After the PMD table entry has been cleared, there is an inconsistency
    between the actual number of PMD table entries that are mapping the page
    and the page's map count (_mapcount field in struct page). When the page
    is subsequently reclaimed, __split_huge_page() detects this inconsistency.

       1381         if (mapcount != page_mapcount(page))
       1382                 printk(KERN_ERR "mapcount %d page_mapcount %d\n",
       1383                        mapcount, page_mapcount(page));
    -> 1384         BUG_ON(mapcount != page_mapcount(page));

    The root cause of the problem is a race of two threads in a multithreaded
    process. Thread B incurs a page fault on a virtual address that has never
    been accessed (PMD entry is zero) while Thread A is executing an madvise()
    system call on a virtual address within the same 2 MB (huge page) range.

               virtual address space
              .---------------------.
              |                     |
              |                     |
            .-|---------------------|
            | |                     |
            | |                     |<-- B(fault)
            | |                     |
      2 MB  | |/////////////////////|-.
      huge <  |/////////////////////|  > A(range)
      page  | |/////////////////////|-'
            | |                     |
            | |                     |
            '-|---------------------|
              |                     |
              |                     |
              '---------------------'

    - Thread A is executing an madvise(..., MADV_DONTNEED) system call
      on the virtual address range "A(range)" shown in the picture.

    sys_madvise
      // Acquire the semaphore in shared mode.
      down_read(&current->mm->mmap_sem)
      ...
      madvise_vma
        switch (behavior)
        case MADV_DONTNEED:
             madvise_dontneed
               zap_page_range
                 unmap_vmas
                   unmap_page_range
                     zap_pud_range
                       zap_pmd_range
                         //
                         // Assume that this huge page has never been accessed.
                         // I.e. content of the PMD entry is zero (not mapped).
                         //
                         if (pmd_trans_huge(*pmd)) {
                             // We don't get here due to the above assumption.
                         }
                         //
                         // Assume that Thread B incurred a page fault and
             .---------> // sneaks in here as shown below.
             |           //
             |           if (pmd_none_or_clear_bad(pmd))
             |               {
             |                 if (unlikely(pmd_bad(*pmd)))
             |                     pmd_clear_bad
             |                     {
             |                       pmd_ERROR
             |                         // Log "bad pmd ..." message here.
             |                       pmd_clear
             |                         // Clear the page's PMD entry.
             |                         // Thread B incremented the map count
             |                         // in page_add_new_anon_rmap(), but
             |                         // now the page is no longer mapped
             |                         // by a PMD entry (-> inconsistency).
             |                     }
             |               }
             |
             v
    - Thread B is handling a page fault on virtual address "B(fault)" shown
      in the picture.

    ...
    do_page_fault
      __do_page_fault
        // Acquire the semaphore in shared mode.
        down_read_trylock(&mm->mmap_sem)
        ...
        handle_mm_fault
          if (pmd_none(*pmd) && transparent_hugepage_enabled(vma))
              // We get here due to the above assumption (PMD entry is zero).
              do_huge_pmd_anonymous_page
                alloc_hugepage_vma
                  // Allocate a new transparent huge page here.
                ...
                __do_huge_pmd_anonymous_page
                  ...
                  spin_lock(&mm->page_table_lock)
                  ...
                  page_add_new_anon_rmap
                    // Here we increment the page's map count (starts at -1).
                    atomic_set(&page->_mapcount, 0)
                  set_pmd_at
                    // Here we set the page's PMD entry which will be cleared
                    // when Thread A calls pmd_clear_bad().
                  ...
                  spin_unlock(&mm->page_table_lock)

    The mmap_sem does not prevent the race because both threads are acquiring
    it in shared mode (down_read).  Thread B holds the page_table_lock while
    the page's map count and PMD table entry are updated.  However, Thread A
    does not synchronize on that lock.

====== end quote =======

[akpm@linux-foundation.org: checkpatch fixes]
Reported-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Jones <davej@redhat.com>
Acked-by: NLarry Woodman <lwoodman@redhat.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>		[2.6.38+]
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1a5a9906

13 3月, 2012 1 次提交

x86: Rename trap_no to trap_nr in thread_struct · 51e7dc70

由 Srikar Dronamraju 提交于 3月 12, 2012

There are precedences of trap number being referred to as
trap_nr. However thread struct refers trap number as trap_no.
Change it to trap_nr.

Also use enum instead of left-over literals for trap values.

This is pure cleanup, no functional change intended.
Suggested-by: NIngo Molnar <mingo@eltu.hu>
Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@linux.vnet.ibm.com>
Cc: Linux-mm <linux-mm@kvack.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120312092555.5379.942.sendpatchset@srdronam.in.ibm.com
[ Fixed the math-emu build ]
Signed-off-by: NIngo Molnar <mingo@elte.hu>

51e7dc70

18 1月, 2012 2 次提交

x86-32: Fix build failure with AUDIT=y, AUDITSYSCALL=n · 6015ff10

由 Al Viro 提交于 1月 18, 2012

JONGMAN HEO reports:

  With current linus git (commit a25a2b84), I got following build error,

  arch/x86/kernel/vm86_32.c: In function 'do_sys_vm86':
  arch/x86/kernel/vm86_32.c:340: error: implicit declaration of function '__audit_syscall_exit'
  make[3]: *** [arch/x86/kernel/vm86_32.o] Error 1

OK, I can reproduce it (32bit allmodconfig with AUDIT=y, AUDITSYSCALL=n)

It's due to commit d7e7528b: "Audit: push audit success and retcode
into arch ptrace.h".
Reported-by: NJONGMAN HEO <jongman.heo@samsung.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6015ff10

Audit: push audit success and retcode into arch ptrace.h · d7e7528b

由 Eric Paris 提交于 1月 03, 2012

The audit system previously expected arches calling to audit_syscall_exit to
supply as arguments if the syscall was a success and what the return code was.
Audit also provides a helper AUDITSC_RESULT which was supposed to simplify things
by converting from negative retcodes to an audit internal magic value stating
success or failure. This helper was wrong and could indicate that a valid
pointer returned to userspace was a failed syscall. The fix is to fix the
layering foolishness. We now pass audit_syscall_exit a struct pt_reg and it
in turns calls back into arch code to collect the return value and to
determine if the syscall was a success or failure. We also define a generic
is_syscall_success() macro which determines success/failure based on if the
value is < -MAX_ERRNO. This works for arches like x86 which do not use a
separate mechanism to indicate syscall failure.

We make both the is_syscall_success() and regs_return_value() static inlines
instead of macros. The reason is because the audit function must take a void*
for the regs. (uml calls theirs struct uml_pt_regs instead of just struct
pt_regs so audit_syscall_exit can't take a struct pt_regs). Since the audit
function takes a void* we need to use static inlines to cast it back to the
arch correct structure to dereference it.

The other major change is that on some arches, like ia64, MIPS and ppc, we
change regs_return_value() to give us the negative value on syscall failure.
THE only other user of this macro, kretprobe_example.c, won't notice and it
makes the value signed consistently for the audit functions across all archs.

In arch/sh/kernel/ptrace_64.c I see that we were using regs[9] in the old
audit code as the return value. But the ptrace_64.h code defined the macro
regs_return_value() as regs[3]. I have no idea which one is correct, but this
patch now uses the regs_return_value() function, so it now uses regs[3].

For powerpc we previously used regs->result but now use the
regs_return_value() function which uses regs->gprs[3]. regs->gprs[3] is
always positive so the regs_return_value(), much like ia64 makes it negative
before calling the audit code when appropriate.
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: H. Peter Anvin <hpa@zytor.com> [for x86 portion]
Acked-by: Tony Luck <tony.luck@intel.com> [for ia64]
Acked-by: Richard Weinberger <richard@nod.at> [for uml]
Acked-by: David S. Miller <davem@davemloft.net> [for sparc]
Acked-by: Ralf Baechle <ralf@linux-mips.org> [for mips]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [for ppc]

d7e7528b

14 1月, 2011 1 次提交

thp: split_huge_page_mm/vma · bae9c19b

由 Andrea Arcangeli 提交于 1月 13, 2011

split_huge_page_pmd compat code.  Each one of those would need to be
expanded to hundred of lines of complex code without a fully reliable
split_huge_page_pmd design.
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Acked-by: NRik van Riel <riel@redhat.com>
Acked-by: NMel Gorman <mel@csn.ul.ie>
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bae9c19b

24 9月, 2010 1 次提交

x86, vm86: Fix preemption bug for int1 debug and int3 breakpoint handlers. · 6554287b

由 Bart Oldeman 提交于 9月 23, 2010

Impact: fix kernel bug such as:
BUG: scheduling while atomic: dosemu.bin/19680/0x00000004
See also Ubuntu bug 455067 at
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/455067

Commits 4915a35e
("Use preempt_conditional_sti/cli in do_int3, like on x86_64.")
and 3d2a71a5
("x86, traps: converge do_debug handlers")
started disabling preemption in int1 and int3 handlers on i386.
The problem with vm86 is that the call to handle_vm86_trap() may jump
straight to entry_32.S and never returns so preempt is never enabled
again, and there is an imbalance in the preempt count.

Commit be716615 ("x86, vm86:
fix preemption bug"), which was later (accidentally?) reverted by commit
08d68323 ("hw-breakpoints: modifying
generic debug exception to use thread-specific debug registers")
fixed the problem for debug exceptions but not for breakpoints.

There are three solutions to this problem.

1. Reenable preemption before calling handle_vm86_trap(). This
was the approach that was later reverted.

2. Do not disable preemption for i386 in breakpoint and debug handlers.
This was the situation before October 2008. As far as I understand
preemption only needs to be disabled on x86_64 because a seperate stack is
used, but it's nice to have things work the same way on
i386 and x86_64.

3. Let handle_vm86_trap() return instead of jumping to assembly code.
By setting a flag in _TIF_WORK_MASK, either TIF_IRET or TIF_NOTIFY_RESUME,
the code in entry_32.S is instructed to return to 32 bit mode from
V86 mode. The logic in entry_32.S was already present to handle signals.
(I chose TIF_IRET because it's slightly more efficient in
do_notify_resume() in signal.c, but in fact TIF_IRET can probably be
replaced by TIF_NOTIFY_RESUME everywhere.)

I'm submitting approach 3, because I believe it is the most elegant
and prevents future confusion. Still, an obvious
preempt_conditional_cli(regs); is necessary in traps.c to correct the
bug.

[ hpa: This is technically a regression, but because:
  1. the regression is so old,
  2. the patch seems relatively high risk, justifying more testing, and
  3. we're late in the 2.6.36-rc cycle,

  I'm queuing it up for the 2.6.37 merge window.  It might, however,
  justify as a -stable backport at a latter time, hence Cc: stable. ]
Signed-off-by: NBart Oldeman <bartoldeman@users.sourceforge.net>
LKML-Reference: <alpine.DEB.2.00.1009231312330.4732@localhost.localdomain>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: <stable@kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

6554287b

10 12月, 2009 1 次提交

x86, 32-bit: Convert sys_vm86 & sys_vm86old · f1382f15

由 Brian Gerst 提交于 12月 09, 2009

Convert these to new PTREGSCALL stubs.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-6-git-send-email-brgerst@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

f1382f15

07 6月, 2009 1 次提交

x86: Fix non-lazy GS handling in sys_vm86() · 3aa6b186

由 Lubomir Rintel 提交于 6月 07, 2009

This fixes a stack corruption panic or null dereference oops
due to a bad GS in resume_userspace() when returning from
sys_vm86() and calling lockdep_sys_exit().

Only a problem when CONFIG_LOCKDEP and CONFIG_CC_STACKPROTECTOR
enabled.
Signed-off-by: NLubomir Rintel <lkundrak@v3.sk>
Cc: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1244384628.2323.4.camel@bimbo>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3aa6b186

07 5月, 2009 1 次提交

x86: use symbolic name for VM86_SIGNAL when used as vm86 default return · 975e5f45

由 Samuel Bronson 提交于 5月 06, 2009

This code has apparently used "0" and not VM86_SIGNAL since Linux
1.1.9, when Linus added VM86_SIGNAL to vm86.h. This patch changes the
code to use the symbolic name.

The magic 0 tripped me up in trying to extend the vm86(2) manpage to
actually explain vm86()'s interface -- my greps for VM86_SIGNAL came up
fruitless.

[ Impact: cleanup; no object code change ]
Signed-off-by: NSamuel Bronson <naesten@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

975e5f45

12 2月, 2009 1 次提交

x86: use regparm(3) for passed-in pt_regs pointer · b12bdaf1

由 Brian Gerst 提交于 2月 11, 2009

Some syscalls need to access the pt_regs structure, either to copy
user register state or to modifiy it.  This patch adds stubs to load
the address of the pt_regs struct into the %eax register, and changes
the syscalls to take the pointer as an argument instead of relying on
the assumption that the pt_regs structure overlaps the function
arguments.

Drop the use of regparm(1) due to concern about gcc bugs, and to move
in the direction of the eventual removal of regparm(0) for asmlinkage.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

b12bdaf1

11 2月, 2009 1 次提交

x86: pass in pt_regs pointer for syscalls that need it · 253f29a4

由 Brian Gerst 提交于 2月 10, 2009

Some syscalls need to access the pt_regs structure, either to copy
user register state or to modifiy it.  This patch adds stubs to load
the address of the pt_regs struct into the %eax register, and changes
the syscalls to regparm(1) to receive the pt_regs pointer as the
first argument.
Signed-off-by: NBrian Gerst <brgerst@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

253f29a4

10 2月, 2009 1 次提交

x86: add %gs accessors for x86_32 · d9a89a26

由 Tejun Heo 提交于 2月 09, 2009

Impact: cleanup

On x86_32, %gs is handled lazily.  It's not saved and restored on
kernel entry/exit but only when necessary which usually is during task
switch but there are few other places.  Currently, it's done by
calling savesegment() and loadsegment() explicitly.  Define
get_user_gs(), set_user_gs() and task_user_gs() and use them instead.

While at it, clean up register access macros in signal.c.

This cleans up code a bit and will help future changes.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d9a89a26

22 7月, 2008 1 次提交

x86: Introducing asm/syscalls.h · bbc1f698

由 Jaswinder Singh 提交于 7月 21, 2008

Declaring arch-dependent syscalls for x86 architecture
Signed-off-by: NJaswinder Singh <jaswinder@infradead.org>

bbc1f698

17 4月, 2008 4 次提交

G
x86: replace most VM86 flags with flags from processor-flags.h · a5c15d41
由 gorcunov@gmail.com 提交于 3月 28, 2008
```
Signed-off-by: NCyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
```
a5c15d41

x86: cleanup - rename VM_MASK to X86_VM_MASK · 6b6891f9

由 gorcunov@gmail.com 提交于 3月 28, 2008

This patch renames VM_MASK to X86_VM_MASK (which
in turn defined as alias to X86_EFLAGS_VM) to better
distinguish from virtual memory flags. We can't just
use X86_EFLAGS_VM instead because it is also used
for conditional compilation
Signed-off-by: NCyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6b6891f9

x86: handle_vm86_trap cleanup · 0f540910

由 Roland McGrath 提交于 3月 17, 2008

Use force_sig in handle_vm86_trap like other machine traps do.
Signed-off-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

0f540910

x86: coding style fixes to arch/x86/kernel/vm86_32.c · 83e714e8

由 Paolo Ciarrocchi 提交于 2月 22, 2008

Before:
   total: 64 errors, 18 warnings, 840 lines checked
After:
   total: 12 errors, 15 warnings, 844 lines checked

No code changed:

arch/x86/kernel/vm86_32.o:

   text	   data	    bss	    dec	    hex	filename
   4449	     28	    132	   4609	   1201	vm86_32.o.before
   4449	     28	    132	   4609	   1201	vm86_32.o.after

md5:
   e4e51ed7689d17f04148554a3c6d5bb6  vm86_32.o.before.asm
   e4e51ed7689d17f04148554a3c6d5bb6  vm86_32.o.after.asm
Signed-off-by: NPaolo Ciarrocchi <paolo.ciarrocchi@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

83e714e8

30 1月, 2008 5 次提交

arch/x86/: spelling fixes · ab4a574e

由 Joe Perches 提交于 1月 30, 2008

Spelling fixes.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

ab4a574e

x86: remove all definitions with fastcall · 75604d7f

由 Harvey Harrison 提交于 1月 30, 2008

fastcall is always defined to be empty, remove it from arch/x86
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

75604d7f

x86: remove last users of FASTCALL · 6b7d190b

由 Harvey Harrison 提交于 1月 30, 2008

FASTCALL() is always empty.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

6b7d190b

x86: use generic register name in the thread and tss structures · faca6227

由 H. Peter Anvin 提交于 1月 30, 2008

This changes size-specific register names (eip/rip, esp/rsp, etc.) to
generic names in the thread and tss structures.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

faca6227

x86: rename the struct pt_regs members for 32/64-bit consistency · 65ea5b03

由 H. Peter Anvin 提交于 1月 30, 2008

We have a lot of code which differs only by the naming of specific
members of structures that contain registers.  In order to enable
additional unifications, this patch drops the e- or r- size prefix
from the register names in struct pt_regs, and drops the x- prefixes
for segment registers on the 32-bit side.

This patch also performs the equivalent renames in some additional
places that might be candidates for unification in the future.
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

65ea5b03

14 10月, 2007 1 次提交

Delete filenames in comments. · 835c34a1

由 Dave Jones 提交于 10月 12, 2007

Since the x86 merge, lots of files that referenced their own filenames
are no longer correct.  Rather than keep them up to date, just delete
them, as they add no real value.

Additionally:
- fix up comment formatting in scx200_32.c
- Remove a credit from myself in setup_64.c from a time when we had no SCM
- remove longwinded history from tsc_32.c which can be figured out from
  git.
Signed-off-by: NDave Jones <davej@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

835c34a1

11 10月, 2007 2 次提交

i386: move kernel · 9a163ed8

由 Thomas Gleixner 提交于 10月 11, 2007

Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

9a163ed8

i386: prepare shared kernel/vm86.c · 7338d435

由 Thomas Gleixner 提交于 10月 11, 2007

Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7338d435

09 5月, 2007 1 次提交

header cleaning: don't include smp_lock.h when not used · e63340ae

由 Randy Dunlap 提交于 5月 08, 2007

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e63340ae

13 2月, 2007 1 次提交

[PATCH] i386: Convert i386 PDA code to use %fs · 464d1a78

由 Jeremy Fitzhardinge 提交于 2月 13, 2007

Convert the PDA code to use %fs rather than %gs as the segment for
per-processor data.  This is because some processors show a small but
measurable performance gain for reloading a NULL segment selector (as %fs
generally is in user-space) versus a non-NULL one (as %gs generally is).

On modern processors the difference is very small, perhaps undetectable.
Some old AMD "K6 3D+" processors are noticably slower when %fs is used
rather than %gs; I have no idea why this might be, but I think they're
sufficiently rare that it doesn't matter much.

This patch also fixes the math emulator, which had not been adjusted to
match the changed struct pt_regs.

[frederik.deweerdt@gmail.com: fixit with gdb]
[mingo@elte.hu: Fix KVM too]
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Cc: Ian Campbell <Ian.Campbell@XenSource.com>
Acked-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NZachary Amsden <zach@vmware.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NFrederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>

464d1a78

07 12月, 2006 1 次提交

[PATCH] i386: Update sys_vm86 to cope with changed pt_regs and %gs usage · 49d26b6e

由 Jeremy Fitzhardinge 提交于 12月 07, 2006

sys_vm86 uses a struct kernel_vm86_regs, which is identical to pt_regs, but
adds an extra space for all the segment registers.  Previously this structure
was completely independent, so changes in pt_regs had to be reflected in
kernel_vm86_regs.  This changes just embeds pt_regs in kernel_vm86_regs, and
makes the appropriate changes to vm86.c to deal with the new naming.

Also, since %gs is dealt with differently in the kernel, this change adjusts
vm86.c to reflect this.

While making these changes, I also cleaned up some frankly bizarre code which
was added when auditing was added to sys_vm86.
Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Cc: Chuck Ebbert <76306.1226@compuserve.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>

49d26b6e

05 10月, 2006 1 次提交

IRQ: Maintain regs pointer globally rather than passing to IRQ handlers · 7d12e780

由 David Howells 提交于 10月 05, 2006

Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.

The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around.  On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).

Where appropriate, an arch may override the generic storage facility and do
something different with the variable.  On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.

Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions.  Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller.  A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.

I've build this code with allyesconfig for x86_64 and i386.  I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.

This will affect all archs.  Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:

	struct pt_regs *old_regs = set_irq_regs(regs);

And put the old one back at the end:

	set_irq_regs(old_regs);

Don't pass regs through to generic_handle_irq() or __do_IRQ().

In timer_interrupt(), this sort of change will be necessary:

	-	update_process_times(user_mode(regs));
	-	profile_tick(CPU_PROFILING, regs);
	+	update_process_times(user_mode(get_irq_regs()));
	+	profile_tick(CPU_PROFILING);

I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().

Some notes on the interrupt handling in the drivers:

 (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
     the input_dev struct.

 (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
     something different depending on whether it's been supplied with a regs
     pointer or not.

 (*) Various IRQ handler function pointers have been moved to type
     irq_handler_t.
Signed-Off-By: NDavid Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)

7d12e780

01 7月, 2006 1 次提交

Remove obsolete #include <linux/config.h> · 6ab3d562

由 Jörn Engel 提交于 6月 30, 2006

Signed-off-by: NJörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

6ab3d562

01 5月, 2006 1 次提交

[PATCH] drop task argument of audit_syscall_{entry,exit} · 5411be59

由 Al Viro 提交于 3月 29, 2006

... it's always current, and that's a good thing - allows simpler locking.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

5411be59

21 3月, 2006 1 次提交

[PATCH] make vm86 call audit_syscall_exit · 7e7f8a03

由 Jason Baron 提交于 1月 31, 2006

hi,

The motivation behind the patch below was to address messages in
/var/log/messages such as:

Jan 31 10:54:15 mets kernel: audit(:0): major=252 name_count=0: freeing
multiple contexts (1)
Jan 31 10:54:15 mets kernel: audit(:0): major=113 name_count=0: freeing
multiple contexts (2)

I can reproduce by running 'get-edid' from:
http://john.fremlin.de/programs/linux/read-edid/.

These messages come about in the log b/c the vm86 calls do not exit via
the normal system call exit paths and thus do not call
'audit_syscall_exit'. The next system call will then free the context for
itself and for the vm86 context, thus generating the above messages. This
patch addresses the issue by simply adding a call to 'audit_syscall_exit'
from the vm86 code.

Besides fixing the above error messages the patch also now allows vm86
system calls to become auditable. This is useful since strace does not
appear to properly record the return values from sys_vm86.

I think this patch is also a step in the right direction in terms of
cleaning up some core auditing code. If we can correct any other paths
that do not properly call the audit exit and entries points, then we can
also eliminate the notion of context chaining.

I've tested this patch by verifying that the log messages no longer
appear, and that the audit records for sys_vm86 appear to be correct.
Also, 'read_edid' produces itentical output.

thanks,

-Jason
Signed-off-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7e7f8a03

15 1月, 2006 1 次提交

correct email address of Manfred Spraul · 624dffcb

由 Christian Kujau 提交于 1月 15, 2006

I  tried to send the forcedeth maintainer an email, but it came back with:

"The mail address manfreds@colorfullife.com is not read anymore.
Please resent your mail to manfred@ instead of manfreds@."

This patch fixes this.
Signed-off-by: NAdrian Bunk <bunk@stusta.de>

624dffcb

13 1月, 2006 1 次提交

[PATCH] i386: task_thread_info() · 06b425d8

由 Al Viro 提交于 1月 12, 2006

Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

06b425d8

12 1月, 2006 1 次提交

[PATCH] capable/capability.h (arch/) · a9415644

由 Randy Dunlap 提交于 1月 11, 2006

arch: Use <linux/capability.h> where capable() is used.
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a9415644

30 10月, 2005 1 次提交

[PATCH] mm: i386 sh sh64 ready for split ptlock · 60ec5585

由 Hugh Dickins 提交于 10月 29, 2005

Use pte_offset_map_lock, instead of pte_offset_map (or inappropriate
pte_offset_kernel) and mm-wide page_table_lock, in sundry arch places.

The i386 vm86 mark_screen_rdonly: yes, there was and is an assumption that the
screen fits inside the one page table, as indeed it does.

The sh __do_page_fault: which handles both kernel faults (without lock) and
user mm faults (locked - though it set_pte without locking before).

The sh64 flush_cache_range and helpers: which wrongly thought callers held
page_table_lock before (only its tlb_start_vma did, and no longer does so);
moved the flush loop down, and adjusted the large versus small range decision
to consider a range which spans page tables as large.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Acked-by: NPaul Mundt <lethal@linux-sh.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

60ec5585

05 9月, 2005 1 次提交

[PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task register management · 4d37e7e3

由 Zachary Amsden 提交于 9月 03, 2005

i386 inline assembler cleanup.

This change encapsulates descriptor and task register management.  Also,
it is possible to improve assembler generation in two cases; savesegment
may store the value in a register instead of a memory location, which
allows GCC to optimize stack variables into registers, and MOV MEM, SEG
is always a 16-bit write to memory, making the casting in math-emu
unnecessary.
Signed-off-by: NZachary Amsden <zach@vmware.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

4d37e7e3

Linux-御风守护者 / linux 与 Fork 源项目一致

Linux-御风守护者 / linux
与 Fork 源项目一致