1. 12 11月, 2005 1 次提交
    • R
      [IA64] 4-level page tables · 837cd0bd
      Robin Holt 提交于
      This patch introduces 4-level page tables to ia64.  I have run
      some benchmarks and found nothing interesting.  Performance has
      consistently fallen within the noise range.
      
      It also introduces a config option (setting the default to 3
      levels).  The config option prevents having 4 level page
      tables with 64k base page size.
      Signed-off-by: NRobin Holt <holt@sgi.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      837cd0bd
  2. 11 11月, 2005 2 次提交
  3. 09 11月, 2005 10 次提交
    • N
      [PATCH] sched: resched and cpu_idle rework · 64c7c8f8
      Nick Piggin 提交于
      Make some changes to the NEED_RESCHED and POLLING_NRFLAG to reduce
      confusion, and make their semantics rigid.  Improves efficiency of
      resched_task and some cpu_idle routines.
      
      * In resched_task:
      - TIF_NEED_RESCHED is only cleared with the task's runqueue lock held,
        and as we hold it during resched_task, then there is no need for an
        atomic test and set there. The only other time this should be set is
        when the task's quantum expires, in the timer interrupt - this is
        protected against because the rq lock is irq-safe.
      
      - If TIF_NEED_RESCHED is set, then we don't need to do anything. It
        won't get unset until the task get's schedule()d off.
      
      - If we are running on the same CPU as the task we resched, then set
        TIF_NEED_RESCHED and no further action is required.
      
      - If we are running on another CPU, and TIF_POLLING_NRFLAG is *not* set
        after TIF_NEED_RESCHED has been set, then we need to send an IPI.
      
      Using these rules, we are able to remove the test and set operation in
      resched_task, and make clear the previously vague semantics of
      POLLING_NRFLAG.
      
      * In idle routines:
      - Enter cpu_idle with preempt disabled. When the need_resched() condition
        becomes true, explicitly call schedule(). This makes things a bit clearer
        (IMO), but haven't updated all architectures yet.
      
      - Many do a test and clear of TIF_NEED_RESCHED for some reason. According
        to the resched_task rules, this isn't needed (and actually breaks the
        assumption that TIF_NEED_RESCHED is only cleared with the runqueue lock
        held). So remove that. Generally one less locked memory op when switching
        to the idle thread.
      
      - Many idle routines clear TIF_POLLING_NRFLAG, and only set it in the inner
        most polling idle loops. The above resched_task semantics allow it to be
        set until before the last time need_resched() is checked before going into
        a halt requiring interrupt wakeup.
      
        Many idle routines simply never enter such a halt, and so POLLING_NRFLAG
        can be always left set, completely eliminating resched IPIs when rescheduling
        the idle task.
      
        POLLING_NRFLAG width can be increased, to reduce the chance of resched IPIs.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Con Kolivas <kernel@kolivas.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      64c7c8f8
    • N
      [PATCH] sched: disable preempt in idle tasks · 5bfb5d69
      Nick Piggin 提交于
      Run idle threads with preempt disabled.
      
      Also corrected a bugs in arm26's cpu_idle (make it actually call schedule()).
      How did it ever work before?
      
      Might fix the CPU hotplugging hang which Nigel Cunningham noted.
      
      We think the bug hits if the idle thread is preempted after checking
      need_resched() and before going to sleep, then the CPU offlined.
      
      After calling stop_machine_run, the CPU eventually returns from preemption and
      into the idle thread and goes to sleep.  The CPU will continue executing
      previous idle and have no chance to call play_dead.
      
      By disabling preemption until we are ready to explicitly schedule, this bug is
      fixed and the idle threads generally become more robust.
      
      From: alexs <ashepard@u.washington.edu>
      
        PPC build fix
      
      From: Yoichi Yuasa <yuasa@hh.iij4u.or.jp>
      
        MIPS build fix
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NYoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5bfb5d69
    • C
      [PATCH] remove ioctl32_handler_t · 7e4c54a2
      Christoph Hellwig 提交于
      Some architectures define and use this type in their compat_ioctl code, but
      all of them can easily use the identical ioctl_trans_handler_t type that is
      defined in common code.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7e4c54a2
    • B
      [IA64] add the MMIO regions that are translated to I/O port space to /proc/iomem · 4f41d5a4
      Bjorn Helgaas 提交于
      ia64 translates normal loads and stores to special MMIO regions into I/O port
      accesses.  Reserve these special MMIO regions in /proc/iomem.
      
      Sample /proc/iomem:
          f8100000000-f81003fffff : PCI Bus 0000:80 I/O Ports 00000000-00000fff
          f8100400000-f81007fffff : PCI Bus 0000:8e I/O Ports 00001000-00001fff
          f8100800000-f8100ffffff : PCI Bus 0000:9c I/O Ports 00002000-00003fff
          f8101000000-f81017fffff : PCI Bus 0000:aa I/O Ports 00004000-00005fff
      
      and corresponding /proc/ioports:
          00000000-00000fff : PCI Bus 0000:80
          00001000-00001fff : PCI Bus 0000:8e
          00002000-00003fff : PCI Bus 0000:9c
          00004000-00005fff : PCI Bus 0000:aa
      Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      4f41d5a4
    • M
      [IA64] altix: misc pci interrupt related fixes · 6fb93a92
      Mark Maule 提交于
      Fix a couple of altix interrupt related bugs.
      Signed-off-by: NMark Maule <maule@sgi.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      6fb93a92
    • R
      [IA64] MCA recovery: Bump reference count on bad pages · cbb92144
      Russ Anderson 提交于
      When a page has a memory uncorrectable ECC error, the recovery
      code wants to prevent the page from being reused.  This change
      bumps the reference count to prevent the page from getting back
      on the free list.
      
      Signed-off-by: Russ Anderson (rja@sgi.com)
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      cbb92144
    • R
      [IA64] MCA recovery: pfn_valid() needs a pfn · 56f87b82
      Russ Anderson 提交于
      paddr needs to be shifted by PAGE_SHIFT to be valid
      input for pfn_valid().
      Signed-off-by: NRuss Anderson <rja@sgi.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      56f87b82
    • R
      [IA64] MCA recovery based on PSP bits · a14f25a0
      Russ Anderson 提交于
      The determination of whether an MCA is recoverable or not must
      be based on the bits set in the PSP (Processor State Parameter).
      The specific bits are shown in the Intel IA-64 Architecture Software
      Developer's Manual, Vol 2, Table 11-6 Software Recovery Bits in
      Processor State Parameter.  Those bits should be consistent
      across the entire IA-64 family of processors.
      Signed-off-by: NRuss Anderson <rja@sgi.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      a14f25a0
    • D
      [IA64] align signal-frame even when not using alternate signal-stack · cf20d1ea
      David Mosberger-Tang 提交于
      At the moment, attempting to invoke a signal-handler on the normal
      stack is guaranteed to fail if the stack-pointer happens not to be
      16-byte aligned.  This is because the signal-trampoline will attempt
      to store fp-regs with stf.spill instructions, which will trap for
      misaligned addresses.  This isn't terribly useful behavior.  It's
      better to just always align the signal frame to the next lower 16-byte
      boundary.
      Signed-off-by: NDavid Mosberger-Tang <David.Mosberger@acm.org>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      cf20d1ea
    • B
      [IA64] fix memory less node allocation · 97835245
      Bob Picco 提交于
      The original memory less node allocation attempted to use NODEDATA_ALIGN for
      alignment.  The bootmem allocator only allows a power of two alignments. This
      causes a BUG_ON for some nodes. For cpu only nodes just allocate with a
      PERCPU_PAGE_SIZE alignment.
      
      Some older firmware reports SLIT distances of 0xff and results in bestnode
      not being computed. This is now treated correctly.
      
      The failed allocation check was removed because it's redundant.  The
      bootmem allocator already makes this check.
      
      This fix has been boot tested on 4 node machine which has 4 cpu only nodes
      and 1 memory node.  Thanks to Pete Keilty for reporting this and helping me
      test it.
      Signed-off-by: NBob Picco <bob.picco@hp.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      97835245
  4. 08 11月, 2005 1 次提交
    • K
      [IA64] Extend notify_die() hooks for IA64 · 9138d581
      Keith Owens 提交于
      notify_die() added for MCA_{MONARCH,SLAVE,RENDEZVOUS}_{ENTER,PROCESS,LEAVE} and
      INIT_{MONARCH,SLAVE}_{ENTER,PROCESS,LEAVE}.  We need multiple
      notification points for these events because they can take many seconds
      to run which has nasty effects on the behaviour of the rest of the
      system.
      
      DIE_SS replaced by a generic DIE_FAULT which checks the vector number,
      to allow interception of faults other than SS.
      
      DIE_MACHINE_{HALT,RESTART} added to allow last minute close down
      processing, especially when the halt/restart routines are called from
      error handlers.
      
      DIE_OOPS added.
      
      The check for kprobe's break numbers has been moved from traps.c to
      kprobes.c, allowing DIE_BREAK to be used for any additional break
      numbers, i.e. it is no longer kprobes specific.
      
      Hooks for kernel debuggers and kernel dumpers added, ENTER and LEAVE.
      Both of these disable the system for long periods which impact on
      watchdogs and heartbeat systems in general.  More patches to come that
      use these events to reset watchdogs and heartbeats.
      
      unregister_die_notifier() added and both routines exported.  Requested
      by Dean Nelson.
      
      Lock removed from {un,}register_die_notifier.  notifier_chain_register()
      already takes a lock.  Also the generic notifier chain locking is being
      reworked to distinguish between callbacks that can block and those that
      cannot, the lock in {un,}register_die_notifier would interfere with
      that change.  http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
      
      Leading white space removed from arch/ia64/kernel/kprobes.c.
      
      Typo in mca.c in original version of this patch found & fixed by Dean
      Nelson.
      Signed-off-by: NKeith Owens <kaos@sgi.com>
      Acked-by: NDean Nelson <dcn@sgi.com>
      Acked-by: NAnil Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      9138d581
  5. 07 11月, 2005 8 次提交
  6. 04 11月, 2005 2 次提交
  7. 01 11月, 2005 1 次提交
  8. 31 10月, 2005 3 次提交
  9. 30 10月, 2005 5 次提交
    • D
      [PATCH] memory hotplug locking: node_size_lock · 208d54e5
      Dave Hansen 提交于
      pgdat->node_size_lock is basically only neeeded in one place in the normal
      code: show_mem(), which is the arch-specific sysrq-m printing function.
      
      Strictly speaking, the architectures not doing memory hotplug do no need this
      locking in show_mem().  However, they are all included for completeness.  This
      should also make any future consolidation of all of the implementations a
      little more straightforward.
      
      This lock is also held in the sparsemem code during a memory removal, as
      sections are invalidated.  This is the place there pfn_valid() is made false
      for a memory area that's being removed.  The lock is only required when doing
      pfn_valid() operations on memory which the user does not already have a
      reference on the page, such as in show_mem().
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      208d54e5
    • H
      [PATCH] mm: flush_tlb_range outside ptlock · 663b97f7
      Hugh Dickins 提交于
      There was one small but very significant change in the previous patch:
      mprotect's flush_tlb_range fell outside the page_table_lock: as it is in 2.4,
      but that doesn't prove it safe in 2.6.
      
      On some architectures flush_tlb_range comes to the same as flush_tlb_mm, which
      has always been called from outside page_table_lock in dup_mmap, and is so
      proved safe.  Others required a deeper audit: I could find no reliance on
      page_table_lock in any; but in ia64 and parisc found some code which looks a
      bit as if it might want preemption disabled.  That won't do any actual harm,
      so pending a decision from the maintainers, disable preemption there.
      
      Remove comments on page_table_lock from flush_tlb_mm, flush_tlb_range and
      flush_tlb_page entries in cachetlb.txt: they were rather misleading (what
      generic code does is different from what usually happens), the rules are now
      changing, and it's not yet clear where we'll end up (will the generic
      tlb_flush_mmu happen always under lock?  never under lock?  or sometimes under
      and sometimes not?).
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      663b97f7
    • H
      [PATCH] mm: init_mm without ptlock · 872fec16
      Hugh Dickins 提交于
      First step in pushing down the page_table_lock.  init_mm.page_table_lock has
      been used throughout the architectures (usually for ioremap): not to serialize
      kernel address space allocation (that's usually vmlist_lock), but because
      pud_alloc,pmd_alloc,pte_alloc_kernel expect caller holds it.
      
      Reverse that: don't lock or unlock init_mm.page_table_lock in any of the
      architectures; instead rely on pud_alloc,pmd_alloc,pte_alloc_kernel to take
      and drop it when allocating a new one, to check lest a racing task already
      did.  Similarly no page_table_lock in vmalloc's map_vm_area.
      
      Some temporary ugliness in __pud_alloc and __pmd_alloc: since they also handle
      user mms, which are converted only by a later patch, for now they have to lock
      differently according to whether or not it's init_mm.
      
      If sources get muddled, there's a danger that an arch source taking
      init_mm.page_table_lock will be mixed with common source also taking it (or
      neither take it).  So break the rules and make another change, which should
      break the build for such a mismatch: remove the redundant mm arg from
      pte_alloc_kernel (ppc64 scrapped its distinct ioremap_mm in 2.6.13).
      
      Exceptions: arm26 used pte_alloc_kernel on user mm, now pte_alloc_map; ia64
      used pte_alloc_map on init_mm, now pte_alloc_kernel; parisc had bad args to
      pmd_alloc and pte_alloc_kernel in unused USE_HPPA_IOREMAP code; ppc64
      map_io_page forgot to unlock on failure; ppc mmu_mapin_ram and ppc64 im_free
      took page_table_lock for no good reason.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      872fec16
    • H
      [PATCH] mm: ia64 use expand_upwards · 46dea3d0
      Hugh Dickins 提交于
      ia64 has expand_backing_store function for growing its Register Backing Store
      vma upwards.  But more complete code for this purpose is found in the
      CONFIG_STACK_GROWSUP part of mm/mmap.c.  Uglify its #ifdefs further to provide
      expand_upwards for ia64 as well as expand_stack for parisc.
      
      The Register Backing Store vma should be marked VM_ACCOUNT.  Implement the
      intention of growing it only a page at a time, instead of passing an address
      outside of the vma to handle_mm_fault, with unknown consequences.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      46dea3d0
    • H
      [PATCH] mm: vm_stat_account unshackled · ab50b8ed
      Hugh Dickins 提交于
      The original vm_stat_account has fallen into disuse, with only one user, and
      only one user of vm_stat_unaccount.  It's easier to keep track if we convert
      them all to __vm_stat_account, then free it from its __shackles.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ab50b8ed
  10. 29 10月, 2005 1 次提交
  11. 28 10月, 2005 4 次提交
    • A
      [PATCH] gfp_t: remaining bits of arch/* · 53f9fc93
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      53f9fc93
    • A
      [PATCH] gfp_t: dma-mapping (ia64) · 06a54497
      Al Viro 提交于
      ... and related annotations for amd64 - swiotlb code is shared, but
      prototypes are not.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      06a54497
    • C
      [IA64] ptrace - find memory sharers on children list · 4ac0068f
      Cliff Wickman 提交于
      In arch/ia64/kernel/ptrace.c there is a test for a peek or poke of a
      register image (in register backing storage).
      The test can be unnecessarily long (and occurs while holding the tasklist_lock).
      Especially long on a large system with thousands of active tasks.
      
      The ptrace caller (presumably a debugger) specifies the pid of
      its target and an address to peek or poke.  But the debugger could be
      attached to several tasks.
      The idea of find_thread_for_addr() is to find whether the target address
      is in the RBS for any of those tasks.
      
      Currently it searches the thread-list of the target pid.  If that search
      does not find a match, and the shared mm-struct's user count indicates
      that there are other tasks sharing this address space (a rare occurrence),
      a search is made of all the tasks in the system.
      
      Another approach can drastically shorten this procedure.
      It depends upon the fact that in order to peek or poke from/to any task,
      the debugger must first attach to that task.  And when it does, the
      attached task is made a child of the debugger (is chained to its children list).
      
      Therefore we can search just the debugger's children list.
      Signed-off-by: NCliff Wickman <cpw@sgi.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      4ac0068f
    • D
      [IA64] - Avoid slow TLB purges on SGI Altix systems · c1902aae
      Dean Roe 提交于
      flush_tlb_all() can be a scaling issue on large SGI Altix systems
      since it uses the global call_lock and always executes on all cpus.
      When a process enters flush_tlb_range() to purge TLBs for another
      process, it is possible to avoid flush_tlb_all() and instead allow
      sn2_global_tlb_purge() to purge TLBs only where necessary.
      
      This patch modifies flush_tlb_range() so that this case can be handled
      by platform TLB purge functions and updates ia64_global_tlb_purge()
      accordingly.  sn2_global_tlb_purge() now calculates the region register
      value from the mm argument introduced with this patch.
      Signed-off-by: NDean Roe <roe@sgi.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      c1902aae
  12. 26 10月, 2005 2 次提交