1. 23 6月, 2008 29 次提交
  2. 21 6月, 2008 1 次提交
    • L
      Reinstate ZERO_PAGE optimization in 'get_user_pages()' and fix XIP · 89f5b7da
      Linus Torvalds 提交于
      KAMEZAWA Hiroyuki and Oleg Nesterov point out that since the commit
      557ed1fa ("remove ZERO_PAGE") removed
      the ZERO_PAGE from the VM mappings, any users of get_user_pages() will
      generally now populate the VM with real empty pages needlessly.
      
      We used to get the ZERO_PAGE when we did the "handle_mm_fault()", but
      since fault handling no longer uses ZERO_PAGE for new anonymous pages,
      we now need to handle that special case in follow_page() instead.
      
      In particular, the removal of ZERO_PAGE effectively removed the core
      file writing optimization where we would skip writing pages that had not
      been populated at all, and increased memory pressure a lot by allocating
      all those useless newly zeroed pages.
      
      This reinstates the optimization by making the unmapped PTE case the
      same as for a non-existent page table, which already did this correctly.
      
      While at it, this also fixes the XIP case for follow_page(), where the
      caller could not differentiate between the case of a page that simply
      could not be used (because it had no "struct page" associated with it)
      and a page that just wasn't mapped.
      
      We do that by simply returning an error pointer for pages that could not
      be turned into a "struct page *".  The error is arbitrarily picked to be
      EFAULT, since that was what get_user_pages() already used for the
      equivalent IO-mapped page case.
      
      [ Also removed an impossible test for pte_offset_map_lock() failing:
        that's not how that function works ]
      Acked-by: NOleg Nesterov <oleg@tv-sign.ru>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89f5b7da
  3. 19 6月, 2008 4 次提交
    • J
      x86, geode: add a VSA2 ID for General Software · ffe6e1da
      Jordan Crouse 提交于
      General Software writes their own VSA2 module for their version
      of the Geode BIOS, which returns a different ID then the standard
      VSA2.  This was causing the framebuffer driver to break for most
      GSW boards.
      Signed-off-by: NJordan Crouse <jordan.crouse@amd.com>
      Cc: tglx@linutronix.de
      Cc: linux-geode@lists.infradead.org
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ffe6e1da
    • B
      x86: use BOOTMEM_EXCLUSIVE on 32-bit · d3942cff
      Bernhard Walle 提交于
      This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for
      i386 and prints a error message on failure.
      
      The patch is still for 2.6.26 since it is only bug fixing. The unification
      of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27.
      Signed-off-by: NBernhard Walle <bwalle@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      d3942cff
    • M
      x86, 32-bit: fix boot failure on TSC-less processors · df17b1d9
      Mikael Pettersson 提交于
      Booting 2.6.26-rc6 on my 486 DX/4 fails with a "BUG: Int 6"
      (invalid opcode) and a kernel halt immediately after the
      kernel has been uncompressed. The BUG shows EIP pointing
      to an rdtsc instruction in native_read_tsc(), invoked from
      native_sched_clock().
      
      (This error occurs so early that not even the serial console
      can capture it.)
      
      A bisection showed that this bug first occurs in 2.6.26-rc3-git7,
      via commit 9ccc906c:
      
      >x86: distangle user disabled TSC from unstable
      >
      >tsc_enabled is set to 0 from the command line switch "notsc" and from
      >the mark_tsc_unstable code. Seperate those functionalities and replace
      >tsc_enable with tsc_disable. This makes also the native_sched_clock()
      >decision when to use TSC understandable.
      >
      >Preparatory patch to solve the sched_clock() issue on 32 bit.
      >
      >Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      The core reason for this bug is that native_sched_clock() gets
      called before tsc_init().
      
      Before the commit above, tsc_32.c used a "tsc_enabled" variable
      which defaulted to 0 == disabled, and which only got enabled late
      in tsc_init(). Thus early calls to native_sched_clock() would skip
      the TSC and use jiffies instead.
      
      After the commit above, tsc_32.c uses a "tsc_disabled" variable
      which defaults to 0, meaning that the TSC is Ok to use. Early calls
      to native_sched_clock() now erroneously try to use the TSC on
      !cpu_has_tsc processors, leading to invalid opcode exceptions.
      
      My proposed fix is to initialise tsc_disabled to a "soft disabled"
      state distinct from the hard disabled state set up by the "notsc"
      kernel option. This fixes the native_sched_clock() problem. It also
      allows tsc_init() to be simplified: instead of setting tsc_disabled = 1
      on every error return, we just set tsc_disabled = 0 once when all
      checks have succeeded.
      
      I've verified that this lets my 486 boot again. I've also verified
      that a Core2 machine still uses the TSC as clocksource after the patch.
      Signed-off-by: NMikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      df17b1d9
    • S
      x86: fix NULL pointer deref in __switch_to · 75118a82
      Suresh Siddha 提交于
      Patrick McHardy reported a crash:
      
      > > I get this oops once a day, its apparently triggered by something
      > > run by cron, but the process is a different one each time.
      > >
      > > Kernel is -git from yesterday shortly before the -rc6 release
      > > (last commit is the usb-2.6 merge, the x86 patches are missing),
      > > .config is attached.
      > >
      > > I'll retry with current -git, but the patches that have gone in
      > > since I last updated don't look related.
      > >
      > > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at
      > > 000001ff
      > > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118
      > > [62060.043009] *pde = 00000000
      > > [62060.043009] Oops: 0002 [#1] PREEMPT
      
      Vegard Nossum analyzed it:
      
      > This decodes to
      >
      >    0:   0f ae 00                fxsave (%eax)
      >
      > so it's related to the floating-point context. This is the exact
      > location of the crash:
      >
      > $ addr2line -e arch/x86/kernel/process_32.o -i ab0
      > include/asm/i387.h:232
      > include/asm/i387.h:262
      > arch/x86/kernel/process_32.c:595
      >
      > ...so it looks like prev_task->thread.xstate->fxsave has become NULL.
      > Or maybe it never had any other value.
      
      Somehow (as described below) TS_USEDFPU is set but the fpu is not
      allocated or freed.
      
      Another possible FPU pre-emption issue with the sleazy FPU optimization
      which was benign before but not so anymore, with the dynamic FPU allocation
      patch.
      
      New task is getting exec'd and it is prempted at the below point.
      
      flush_thread() {
      	...
      	/*
      	* Forget coprocessor state..
      	*/
      	clear_fpu(tsk);
      		<----- Preemption point
      	clear_used_math();
      	...
      }
      
      Now when it context switches in again, as the used_math() is still set
      and fpu_counter can be > 5, we will do a math_state_restore() which sets
      the task's TS_USEDFPU. After it continues from the above preemption point
      it does clear_used_math() and much later free_thread_xstate().
      
      Now, at the next context switch, it is quite possible that xstate is
      null, used_math() is not set and TS_USEDFPU is still set. This will
      trigger unlazy_fpu() causing kernel oops.
      
      Fix this  by clearing tsk's fpu_counter before clearing task's fpu.
      Reported-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      75118a82
  4. 18 6月, 2008 3 次提交
    • P
      [POWERPC] Clear sub-page HPTE present bits when demoting page size · 65ba6cdc
      Paul Mackerras 提交于
      When we demote a slice from 64k to 4k, and we are about to insert an
      HPTE for a 4k subpage and we notice that there is an existing 64k
      HPTE, we first invalidate that HPTE before inserting the new 4k
      subpage HPTE.  Since the bits that encode which hash bucket the old
      HPTE was in overlap with the bits that encode which of the 16 subpages
      have HPTEs, we need to clear out the subpage HPTE-present bits before
      starting to insert HPTEs for the 4k subpages.  If we don't do that, we
      can erroneously think that a subpage already has an HPTE when it
      doesn't.
      
      That in itself wouldn't be such a problem except that when we go to
      update the HPTE that we think is present on machines with a
      hypervisor, the hypervisor can tell us that the HPTE we think is there
      is actually there even though it isn't, which can lead to a process
      getting stuck in a loop, continually faulting.  The reason for the
      confusion is that the AVPN (abbreviated virtual page number) we are
      looking for in the HPTE for a 4k subpage can actually match the AVPN
      in a stale HPTE for another 64k page.  For example, the HPTE for
      the 4k subpage at 0x84000f000 will be in the same hash bucket and have
      the same AVPN as the HPTE for the 64k page at 0x8400f0000.
      
      This fixes the code to clear out the subpage HPTE-present bits.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      65ba6cdc
    • J
      [POWERPC] 4xx: Clear new TLB cache attribute bits in Data Storage vector · b17879f7
      Josh Boyer 提交于
      A recent commit added support for the new 440x6 and 464 cores that have the
      added WL1, IL1I, IL1D, IL2I, and ILD2 bits for the caching attributes in the
      TLBs.  The new bits were cleared in the finish_tlb_load function, however a
      similar bit of code was missed in the DataStorage interrupt vector.
      Signed-off-by: NJosh Boyer <jwboyer@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      b17879f7
    • L
      x86-64: Fix "bytes left to copy" return value for copy_from_user() · 42a886af
      Linus Torvalds 提交于
      Most users by far do not care about the exact return value (they only
      really care about whether the copy succeeded in its entirety or not),
      but a few special core routines actually care deeply about exactly how
      many bytes were copied from user space.
      
      And the unrolled versions of the x86-64 user copy routines would
      sometimes report that it had copied more bytes than it actually had.
      
      Very few uses actually have partial copies to begin with, but to make
      this bug even harder to trigger, most x86 CPU's use the "rep string"
      instructions for normal user copies, and that version didn't have this
      issue.
      
      To make it even harder to hit, the one user of this that really cared
      about the return value (and used the uncached version of the copy that
      doesn't use the "rep string" instructions) was the generic write
      routine, which pre-populated its source, once more hiding the problem by
      avoiding the exception case that triggers the bug.
      
      In other words, very special thanks to Bron Gondwana who not only
      triggered this, but created a test-program to show it, and bisected the
      behavior down to commit 08291429 ("mm:
      fix pagecache write deadlocks") which changed the access pattern just
      enough that you can now trigger it with 'writev()' with multiple
      iovec's.
      
      That commit itself was not the cause of the bug, it just allowed all the
      stars to align just right that you could trigger the problem.
      
      [ Side note: this is just the minimal fix to make the copy routines
        (with __copy_from_user_inatomic_nocache as the particular version that
        was involved in showing this) have the right return values.
      
        We really should improve on the exceptional case further - to make the
        copy do a byte-accurate copy up to the exact page limit that causes it
        to fail.  As it is, the callers have to do extra work to handle the
        limit case gracefully. ]
      Reported-by: NBron Gondwana <brong@fastmail.fm>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      
       (which didn't have this problem), and since
      most users that do the carethis was very hard to trigger, but
      42a886af
  5. 17 6月, 2008 2 次提交
  6. 16 6月, 2008 1 次提交