1. 08 Sep 2005, 2 commits
    • [PATCH] ia64 cpuset + build_sched_domains() mangles structures · f68f447e
      Authored by John Hawkes
      I've already sent this to the maintainers, and this is now being sent to a
      larger community audience.  I have fixed a problem with the ia64 version of
      build_sched_domains(), but a similar fix still needs to be made to the
      generic build_sched_domains() in kernel/sched.c.
      
      The "dynamic sched domains" functionality has recently been merged into
      2.6.13-rcN that sees the dynamic declaration of a cpu-exclusive (a.k.a.
      "isolated") cpuset and rebuilds the CPU Scheduler sched domains and sched
      groups to separate away the CPUs in this cpu-exclusive cpuset from the
      remainder of the non-isolated CPUs.  This allows the non-isolated CPUs to
      completely ignore the isolated CPUs when doing load-balancing.
      
      Unfortunately, build_sched_domains() expects that a sched domain will
      include all the CPUs of each node in the domain, i.e., that no node will
      belong to both an isolated cpuset and a non-isolated cpuset.  Declaring
      a cpuset that violates this assumption produces flawed data structures
      and oopses the kernel.
      
      To trigger the problem (on a NUMA system with more than one CPU per node):
         cd /dev/cpuset
         mkdir newcpuset
         cd newcpuset
         echo 0 >cpus
         echo 0 >mems
         echo 1 >cpu_exclusive
      
      I have fixed this shortcoming for ia64 NUMA (with multiple CPUs per node).
      A similar shortcoming exists in the generic build_sched_domains() (in
      kernel/sched.c) for NUMA, and that needs to be fixed also.  The fix
      involves dynamically allocating sched_group_nodes[] and
      sched_group_allnodes[] for each invocation of build_sched_domains(), rather
      than using global arrays for these structures.  Care must be taken to
      remember kmalloc() addresses so that arch_destroy_sched_domains() can
      properly kfree() the new dynamic structures.
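      
      As a rough illustration of the shape of that fix, here is a minimal
      sketch: the two array names come from this commit, but everything
      around them is an assumption of the sketch rather than the actual patch.
      
         /* sketch: size and allocate the group arrays per invocation of
          * build_sched_domains() instead of using fixed global arrays, and
          * remember the pointers so arch_destroy_sched_domains() can kfree()
          * them later. */
         #include <linux/sched.h>
         #include <linux/slab.h>
         
         static struct sched_group **sched_group_nodes;    /* kmalloc'ed */
         static struct sched_group *sched_group_allnodes;  /* kmalloc'ed */
         
         static int build_sched_groups_sketch(void)
         {
                 sched_group_nodes = kmalloc(MAX_NUMNODES *
                                 sizeof(struct sched_group *), GFP_KERNEL);
                 if (!sched_group_nodes)
                         return -ENOMEM;
         
                 sched_group_allnodes = kmalloc(MAX_NUMNODES *
                                 sizeof(struct sched_group), GFP_KERNEL);
                 if (!sched_group_allnodes) {
                         kfree(sched_group_nodes);
                         sched_group_nodes = NULL;
                         return -ENOMEM;
                 }
                 /* build the domains and groups as before, but from these
                  * dynamically allocated arrays */
                 return 0;
         }
         
         static void destroy_sched_groups_sketch(void)
         {
                 /* the remembered kmalloc() addresses are freed here */
                 kfree(sched_group_nodes);
                 kfree(sched_group_allnodes);
                 sched_group_nodes = NULL;
                 sched_group_allnodes = NULL;
         }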
      Signed-off-by: John Hawkes <hawkes@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86/x86_64: deferred handling of writes to /proc/irqxx/smp_affinity · 54d5d424
      Authored by Ashok Raj
      When handling writes to /proc/irq, the current code reprograms RTE
      entries directly.  This is not recommended and could potentially cause
      chipsets to lock up, or cause missed interrupts.
      
      CONFIG_IRQ_BALANCE does this correctly: it reprograms the RTE only when
      the interrupt is pending.  The same needs to be done for /proc/irq
      handling as well, otherwise user-space IRQ balancers are really not
      doing the right thing.
      
      - Renamed pending_irq_balance_cpumask to pending_irq_migrate_cpumask,
        since the old name was not generic.
      - Moved move_irq out of CONFIG_IRQ_BALANCE, and added the same to x86_64.
      - Added a new proc handler for the write path, so the write can be
        deferred to interrupt-handling time.
      - /proc/irq/XX/smp_affinity used to display CPU_MASK_ALL; it now shows
        only the active cpu mask, or exactly what was set.
      - Provided a common move_irq implementation instead of duplicating it
        when using the generic irq framework.
      
      Tested on i386/x86_64 and ia64 with CONFIG_PCI_MSI turned on and off.
      Tested UP builds as well.
      
      MSI testing: TBD.  I have the cards but need to find a crossover cable,
      although I did test an earlier version of this patch.  Will test again
      in a couple of days.
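      
      A minimal sketch of the deferred-write idea, assuming era-appropriate
      helpers: the cpumask array name comes from the patch, while the bitmap,
      the helper names and the irq_desc handler access are assumptions of
      this sketch.
      
         #include <linux/cpumask.h>
         #include <linux/irq.h>
         
         static DECLARE_BITMAP(irq_move_pending, NR_IRQS);
         static cpumask_t pending_irq_migrate_cpumask[NR_IRQS];
         
         /* the /proc/irq/XX/smp_affinity write path only records the request */
         static void set_pending_irq_sketch(unsigned int irq, cpumask_t mask)
         {
                 pending_irq_migrate_cpumask[irq] = mask;
                 set_bit(irq, irq_move_pending);
         }
         
         /* called from the interrupt path, where the interrupt is known to be
          * in service, so reprogramming the RTE is safe */
         static void move_irq_sketch(int irq)
         {
                 if (!test_and_clear_bit(irq, irq_move_pending))
                         return;
                 if (!cpus_empty(pending_irq_migrate_cpumask[irq]))
                         irq_desc[irq].handler->set_affinity(irq,
                                         pending_irq_migrate_cpumask[irq]);
         }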
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Acked-by: Zwane Mwaikambo <zwane@holomorphy.com>
      Grudgingly-acked-by: Andi Kleen <ak@muc.de>
      Signed-off-by: Coywolf Qi Hunt <coywolf@lovecn.org>
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 01 Sep 2005, 1 commit
  3. 30 Aug 2005, 1 commit
    • [PATCH] convert signal handling of NODEFER to act like other Unix boxes. · 69be8f18
      Authored by Steven Rostedt
      It has been reported that the way Linux handles NODEFER for signals is
      not consistent with the way other Unix boxes handle it.  I've written a
      program to test how this flag affects signal behavior and have had
      several reports from people who ran it on various Unix boxes, confirming
      that Linux seems to be unique in the way this is handled.
      
      The way NODEFER affects signals on other Unix boxes is as follows:
      
      1) If NODEFER is set, other signals in sa_mask are still blocked.
      
      2) If NODEFER is set and the signal is in sa_mask, then the signal is
      still blocked.  (Note: this is the behavior of every system tested
      except Linux _and_ NetBSD 2.0 *.)
      
      The way NODEFER affects signals on Linux:
      
      1) If NODEFER is set, other signals are _not_ blocked regardless of
      sa_mask (Even NetBSD doesn't do this).
      
      2) If NODEFER is set and the signal is in sa_mask, then the signal being
      handled is not blocked.
      
      The patch converts signal handling in all current Linux architectures to
      the way most Unix boxes work.
      
      Unix boxes that were tested:  DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
      3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.
      
      * NetBSD was the only other Unix to behave like Linux on point #2.  The
      main concern is point #1, where even NetBSD does not behave like Linux.
      So with this patch we leave NetBSD as the lone system that still behaves
      differently on #2.
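      
      The semantics can be probed from user space with a small test in the
      spirit of the one mentioned above; this is an illustrative sketch, not
      the author's actual test program.
      
         #include <signal.h>
         #include <stdio.h>
         #include <string.h>
         
         static void handler(int sig)
         {
                 sigset_t cur;
         
                 sigprocmask(SIG_BLOCK, NULL, &cur);
                 /* With the corrected semantics, SIGUSR2 (listed in sa_mask)
                  * is blocked here, and so is SIGUSR1 itself, even though
                  * SA_NODEFER is set, because it too is in sa_mask. */
                 printf("in handler: SIGUSR1 blocked=%d, SIGUSR2 blocked=%d\n",
                        sigismember(&cur, SIGUSR1), sigismember(&cur, SIGUSR2));
         }
         
         int main(void)
         {
                 struct sigaction sa;
         
                 memset(&sa, 0, sizeof(sa));
                 sa.sa_handler = handler;
                 sa.sa_flags = SA_NODEFER;
                 sigemptyset(&sa.sa_mask);
                 sigaddset(&sa.sa_mask, SIGUSR1);  /* the handled signal itself */
                 sigaddset(&sa.sa_mask, SIGUSR2);  /* another signal in sa_mask */
                 sigaction(SIGUSR1, &sa, NULL);
                 raise(SIGUSR1);
                 return 0;
         }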
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 27 Aug 2005, 1 commit
  5. 25 Aug 2005, 1 commit
    • [IA64] Rationalise Region Definitions · 0a41e250
      Authored by Peter Chubb
      Currently, region numbers are defined in several files, with several 
      names.  For example, we have REGION_KERNEL in asm/page.h and 
      RGN_KERNEL in pgtable.h 
       
      We also have address definitions that should depend on the 
      RGN_XXX macros, but are currently just long constants. 
       
      The following patch reorganises all the definitions so that they have 
      the same form (RGN_XXX), are in one place, and that addresses that 
      depend on RGN_XXX are derived from them. 
      
      (This is a necessary but not sufficient patch to allow UML-like 
      operation on IA64). 
      
      Thanks to David Mosberger for catching the change I missed in mmu_context.h.
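      
      The consolidated form looks roughly like the sketch below; RGN_KERNEL as
      region 7 and the 3-bit region field in bits 63..61 are standard ia64,
      but the exact macro spellings here are illustrative.
      
         #define RGN_SHIFT       61
         #define RGN_BASE(r)     ((unsigned long)(r) << RGN_SHIFT)
         #define RGN_KERNEL      7
         
         /* addresses derived from the region macros instead of bare constants */
         #define PAGE_OFFSET     RGN_BASE(RGN_KERNEL)  /* 0xe000000000000000 */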
       
      Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au> 
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  6. 17 Aug 2005, 1 commit
  7. 16 Aug 2005, 1 commit
  8. 11 Aug 2005, 1 commit
    • [IA64] fix perfmon context load · 6bf11e8c
      Authored by stephane.eranian@hp.com
      The PFM_LOAD_CONTEXT command may fail silently and cause a session
      to remain reserved even though it should not.  This can happen when
      the command succeeds in reserving the session but then fails when it
      actually tries to attach to the load_pid.  In that case the command
      has failed, yet it returns 0 and, more importantly, the session
      remains reserved.  This patch fixes the problem.
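      
      The shape of the fix, as a sketch (the helper names below are
      illustrative, not the real perfmon symbols):
      
         static int pfm_load_context_sketch(struct pfm_context *ctx,
                                            pid_t load_pid)
         {
                 int ret;
         
                 ret = reserve_session(ctx);
                 if (ret)
                         return ret;
         
                 ret = attach_to_task(ctx, load_pid);
                 if (ret) {
                         /* previously this error was swallowed and 0 was
                          * returned, leaving the session reserved; now the
                          * reservation is released and the error propagated */
                         unreserve_session(ctx);
                         return ret;
                 }
                 return 0;
         }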
      
      Signed-off-by: <stephane.eranian@hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  9. 09 Aug 2005, 1 commit
  10. 02 Aug 2005, 1 commit
    • [PATCH] remove sys_set_zone_reclaim() · 6cb54819
      Authored by Ingo Molnar
      This removes sys_set_zone_reclaim() for now.  While I'm sure Martin is
      trying to solve a real problem, we must not hard-code an incomplete and
      insufficient approach into a syscall, because syscalls are pretty much
      for eternity.  I am quite strongly convinced that this syscall must not
      hit v2.6.13 in its current form.
      
      Firstly, the syscall lacks basic syscall design: e.g. it allows
      unprivileged users to set global VM policy. (!)  [ Imagine an Oracle
      installation and a SAP installation on the same NUMA box fighting over
      the 'optimal' setting for this flag.  What will they do?  Will they try
      to set the flag to their own preferred value every second or so? ]
      
      Secondly, it was added based on a single datapoint from Martin:
      
       http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2
      
      where Martin characterizes the numbers the following way:
      
       ' Run-to-run variability for "make -j" is huge, so these numbers aren't
         terribly useful except to see that with reclaim the benchmark still
         finishes in a reasonable amount of time. '
      
      in other words: the fundamental problem has likely not been solved; only
      a tendency in the right direction has been observed, and a handful of
      numbers were picked out of a set of hugely variable results, without
      showing the variability data.  How much variance is there run-to-run?
      
      I'd really suggest first walking the walk and seeing what's needed to
      get stable & predictable kernel compilation numbers on that NUMA box,
      before adding random syscalls to tune a particular aspect of the VM ...
      an approach which might not even matter once the whole picture has been
      analyzed and understood!
      
      The third, most important point is that the syscall exposes VM tuning
      internals in a completely unstructured way.  What sense does it make to
      have a _GLOBAL_ per-node setting for 'should we go to another node for
      reclaim'?  If anything, it might make sense to do this per-application,
      via numalib or the like.
      
      The change is minimalistic in that it doesn't remove the syscall and the
      underlying infrastructure changes, only the user-visible parts.  We
      could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, a la
      /proc/sys/vm/swappiness, but even that looks quite counterproductive
      when the generic approach is that we are trying to reduce the number of
      external factors in the VM balance picture.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 28 Jul 2005, 2 commits
    • [IA64] unwind.c uses wrong unat from switch_stack · b833961b
      Authored by Keith Owens
      unwind.c can read the wrong unat bits from switch_stack.
      sw->caller_unat is the value of ar.unat when the task was blocked.
      sw->ar_unat is the value of ar.unat after doing st8.spill for r4-7.
      IOW, ar_unat is caller_unat with 4 bits changed.
      
      unw_access_gr() uses sw->ar_unat for r4-7 (correct), but it also uses
      sw->ar_unat for other scratch registers (incorrect).  sw->ar_unat
      should only be used for r4-7, everything else should use
      sw->caller_unat, unless modified by unwind info.  Using sw->ar_unat
      risks picking up the 4 bits that were overwritten when r4-7 were saved.
      
      Also this line is wrong
      	unw.sw_off[unw.preg_index[UNW_REG_PFS]] = SW(AR_UNAT);
      and should be
      	unw.sw_off[unw.preg_index[UNW_REG_PFS]] = SW(AR_PFS);
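      
      The rule being fixed can be summarised in a small sketch (not the full
      unw_access_gr(); the helper name is invented for illustration):
      
         #include <asm/ptrace.h>
         
         /* sw->ar_unat carries NaT bits only for r4-r7, which were re-spilled
          * after caller_unat was saved; everything else read from the
          * switch_stack must consult sw->caller_unat. */
         static unsigned long *unat_for_reg(struct switch_stack *sw, int regnum)
         {
                 if (regnum >= 4 && regnum <= 7)
                         return &sw->ar_unat;
                 return &sw->caller_unat;
         }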
      Signed-off-by: Keith Owens <kaos@sgi.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [IA64] inotify: ia64 syscalls. · d108919b
      Authored by Robert Love
      Attached patch adds the inotify syscalls to ia64.
      Signed-off-by: Robert Love <rml@novell.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  12. 27 Jul 2005, 1 commit
  13. 15 Jul 2005, 1 commit
  14. 13 Jul 2005, 1 commit
  15. 12 Jul 2005, 4 commits
  16. 09 Jul 2005, 1 commit
  17. 07 Jul 2005, 3 commits
  18. 06 Jul 2005, 1 commit
  19. 29 Jun 2005, 3 commits
    • [IA64] Fix another IA64 preemption problem · a68db763
      Authored by Peter Chubb
      There's another problem shown up by Ingo's recent patch to make
      smp_processor_id() complain if it's called with preemption enabled.
      local_finish_flush_tlb_mm() calls activate_context() in a situation
      where it could be rescheduled to another processor.  This patch
      disables preemption around the call.
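      
      The fix amounts to bracketing the call with preempt_disable(); an
      abridged sketch of the idea, not the verbatim diff:
      
         #include <linux/preempt.h>
         #include <linux/sched.h>
         #include <asm/mmu_context.h>
         
         static void local_finish_flush_tlb_mm(struct mm_struct *mm)
         {
                 preempt_disable();
                 if (mm == current->active_mm)
                         activate_context(mm);
                 preempt_enable();
         }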
      Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [IA64] Speed up lfetch.fault [NULL] · 458f9355
      Authored by David Mosberger-Tang
      This patch greatly speeds up the handling of lfetch.fault instructions
      which result in NaT consumption. Due to the NaT-page mapped at address
      0, this is guaranteed to happen when lfetch.fault'ing a NULL pointer.
      With this patch in place, we can even define prefetch()/prefetchw() as
      lfetch.fault without significant performance degradation.  More
      importantly, it allows compilers to be more aggressive with using
      lfetch.fault on pointers that might be NULL.
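      
      With the fault path this cheap, prefetch() could be expressed in terms
      of lfetch.fault, roughly as below (illustrative inline asm, not the
      actual header change):
      
         static inline void prefetch_sketch(const void *x)
         {
                 asm volatile ("lfetch.fault [%0]" : : "r" (x));
         }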
      Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [IA64-SGI] pcdp: add PCDP pci interface support · 66b7f8a3
      Authored by Mark Maule
      Resend 2 with changes per Bjorn Helgaas comments.  Changes from original:
      
      + Change globals to vga_console_iobase/vga_console_membase and make them
        unconditional.
      + Address style-related comments.
      
      Patch to extend the PCDP vga setup code to support PCI io/mem translations
      for the legacy vga ioport and ram spaces on architectures (e.g. altix) which
      need them.
      
      Summary of the changes:
      
      drivers/firmware/pcdp.c
      drivers/firmware/pcdp.h
      -----------------------
      + add declaration for the spec-defined PCI interface struct (pcdp_if_pci)
        as well as support macros.
      
      + extend setup_vga_console() to know about pcdp_if_pci and add a couple of
        globals to hold the io and mem translation offsets if present.
      
      arch/ia64/kernel/setup.c
      ------------------------
      + tweak early_console_setup() to allow multiple early console setup
        routines to be called.
      
      include/asm-ia64/vga.h
      ----------------------
      + make VGA_MAP_MEM vga_console_membase aware
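      
      The translation idea, as a sketch: the two globals are the ones named
      above, while the helper below is purely illustrative.
      
         #include <asm/io.h>
         
         unsigned long vga_console_iobase;    /* legacy VGA I/O port offset */
         unsigned long vga_console_membase;   /* legacy VGA memory offset   */
         
         static void __iomem *legacy_vga_map_sketch(unsigned long legacy_addr,
                                                    unsigned long size)
         {
                 /* e.g. 0xA0000 becomes vga_console_membase + 0xA0000 on altix */
                 return ioremap(vga_console_membase + legacy_addr, size);
         }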
      Signed-off-by: Mark Maule <maule@sgi.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  20. 28 Jun 2005, 6 commits
    • [PATCH] ACPI based I/O APIC hot-plug: ia64 support · 0e888adc
      Authored by Kenji Kaneshige
      This is an ia64 implementation of acpi_register_ioapic() and
      acpi_unregister_ioapic() interfaces.
      Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • [PATCH] ACPI based I/O APIC hot-plug: add interfaces · b1bb248a
      Authored by Kenji Kaneshige
      This patch adds the following new interfaces for I/O xAPIC
      hotplug. The implementation of these interfaces depends on each
      architecture.
      
          o int acpi_register_ioapic(acpi_handle handle, u64 phys_addr,
      			       u32 gsi_base);
      
              This new interface adds a new I/O xAPIC specified by the
              phys_addr and gsi_base pair.  phys_addr is the physical address
              to which the I/O xAPIC is mapped and gsi_base is the global
              system interrupt base of the I/O xAPIC.  acpi_register_ioapic
              returns 0 on success, or a negative value on error.
      
          o int acpi_unregister_ioapic(acpi_handle handle, u32 gsi_base);
      
              This new interface removes an I/O xAPIC specified by gsi_base.
              acpi_unregister_ioapic returns 0 on success, or a negative value
              on error.
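      
      A hedged usage sketch of the two interfaces; how handle, phys_addr and
      gsi_base are discovered from ACPI is outside the scope of this sketch,
      and the wrapper names are invented.
      
         #include <linux/acpi.h>
         #include <linux/kernel.h>
         
         static int hotadd_ioapic_sketch(acpi_handle handle, u64 phys_addr,
                                         u32 gsi_base)
         {
                 int err = acpi_register_ioapic(handle, phys_addr, gsi_base);
         
                 if (err)
                         printk(KERN_ERR "I/O xAPIC hot-add failed: %d\n", err);
                 return err;
         }
         
         static int hotremove_ioapic_sketch(acpi_handle handle, u32 gsi_base)
         {
                 return acpi_unregister_ioapic(handle, gsi_base);
         }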
      Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • [PATCH] kprobes/ia64: refuse kprobe on ivt code · c7b645f9
      Authored by Keshavamurthy Anil S
      It is not safe to insert kprobes in IVT code.
      
      This patch checks whether the address at which a kprobe is being
      inserted lies within the IVT code and, if so, refuses to register the
      kprobe.
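      
      The check amounts to an address-range test, roughly as below; the linker
      symbols bracketing the IVT text and the helper names are assumptions of
      this sketch.
      
         #include <linux/kprobes.h>
         
         extern char __start_ivt_text[], __end_ivt_text[];
         
         static int in_ivt_code(unsigned long addr)
         {
                 return addr >= (unsigned long)__start_ivt_text &&
                        addr <  (unsigned long)__end_ivt_text;
         }
         
         /* called during kprobe registration: refuse probes inside the IVT */
         static int check_kprobe_addr_sketch(struct kprobe *p)
         {
                 if (in_ivt_code((unsigned long)p->addr)) {
                         printk(KERN_WARNING "kprobe on IVT code refused\n");
                         return -EINVAL;
                 }
                 return 0;
         }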
      Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Acked-by: David Mosberger <davidm@napali.hpl.hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kprobes/ia64: refuse inserting kprobe on slot 1 · a528e21c
      Authored by Rusty Lynch
      Without the ability to atomically write 16 bytes, we cannot update the
      middle slot of a bundle, slot 1, unless we stop the machine first.  This
      patch ensures that kprobes can be inserted and removed robustly by
      refusing to insert a kprobe on slot 1 until a mechanism is in place to
      handle this case safely.
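      
      The refusal itself is a simple slot test; a sketch with an invented
      helper name, relying on the fact that an ia64 bundle is 16 bytes and
      the kprobe address carries the slot index in its low bits.
      
         #include <linux/kprobes.h>
         
         static int slot_is_unsupported(struct kprobe *p)
         {
                 /* slot 1 sits in the middle of the 16-byte bundle and cannot
                  * be rewritten atomically */
                 return ((unsigned long)p->addr & 0xf) == 1;
         }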
      Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Return probe redesign: ia64 specific implementation · 9508dbfe
      Authored by Rusty Lynch
      The following patch implements function return probes for ia64 using
      the revised design.  With this new design we no longer need some of the
      odd hacks previously required in the last ia64 return probe port that I
      sent out for comments.
      
      Note that this new implementation still does not resolve the problem noted
      by Keith Owens where backtrace data is lost after a return probe is hit.
      
      Changes include:
       * Addition of kretprobe_trampoline to act as a dummy function for
         instrumented functions to return to, and for the return probe
         infrastructure to place a kprobe on, gaining control so that the
         return probe handler can be called and the instruction pointer can
         be moved back to the original return address.
       * Addition of arch_init(), allowing a kprobe to be registered on
         kretprobe_trampoline
       * Addition of trampoline_probe_handler(), which is used as the
         pre_handler for the kprobe inserted on kretprobe_trampoline.  This is
         the function that handles the details of calling the return probe
         handler function and returning control back to the original return
         address.
       * Addition of arch_prepare_kretprobe(), which is set up as the
         pre_handler for a kprobe registered at the beginning of the target
         function by kernel/kprobes.c, so that a return probe instance can be
         set up when a caller enters the target function.  (A return probe
         instance contains all the information trampoline_probe_handler needs
         to do its job.)
       * Hooks added to the exit path of a task so that we can clean up any
         left-over return probe instances (i.e. if a task dies while inside a
         targeted function, the instance reserved at function entry is never
         consumed because the function never returns, so we need to mark it as
         unused).
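      
      The entry-side hook can be sketched as follows; this is close in spirit
      to the new arch code but not verbatim (b0 is the ia64 return branch
      register, and the function name is suffixed to mark it as a sketch).
      
         #include <linux/kprobes.h>
         #include <linux/sched.h>
         #include <asm/ptrace.h>
         
         extern void kretprobe_trampoline(void);
         
         void arch_prepare_kretprobe_sketch(struct kretprobe *rp,
                                            struct pt_regs *regs)
         {
                 struct kretprobe_instance *ri = get_free_rp_inst(rp);
         
                 if (ri) {
                         ri->rp = rp;
                         ri->task = current;
                         ri->ret_addr = (kprobe_opcode_t *)regs->b0;
                         /* hijack the return: the probed function now
                          * "returns" into kretprobe_trampoline, whose kprobe
                          * fires trampoline_probe_handler() */
                         regs->b0 = (unsigned long)&kretprobe_trampoline;
                         add_rp_inst(ri);
                 }
         }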
      Signed-off-by: Rusty Lynch <rusty.lynch@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Update cfq io scheduler to time sliced design · 22e2c507
      Authored by Jens Axboe
      This updates the CFQ io scheduler to the new time sliced design (cfq
      v3).  It provides full process fairness, while giving excellent
      aggregate system throughput even for many competing processes.  It
      supports io priorities, either inherited from the cpu nice value or set
      directly with the ioprio_get/set syscalls.  The latter closely mimic
      set/getpriority.
      
      This import is based on my latest from -mm.
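      
      The io priorities can be driven from user space much like
      set/getpriority; a small illustrative example follows (glibc provides
      no wrapper or header for these, so the constants are spelled out here
      and should be treated as reproduced values, not a definitive API).
      
         #include <stdio.h>
         #include <sys/syscall.h>
         #include <unistd.h>
         
         #define IOPRIO_CLASS_SHIFT      13
         #define IOPRIO_CLASS_BE         2            /* best-effort class */
         #define IOPRIO_WHO_PROCESS      1
         #define IOPRIO_VALUE(cls, lvl)  (((cls) << IOPRIO_CLASS_SHIFT) | (lvl))
         
         int main(void)
         {
                 /* give the current process best-effort io priority 4 */
                 if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
                             IOPRIO_VALUE(IOPRIO_CLASS_BE, 4)) < 0) {
                         perror("ioprio_set");
                         return 1;
                 }
                 printf("ioprio is now %ld\n",
                        (long)syscall(SYS_ioprio_get, IOPRIO_WHO_PROCESS, 0));
                 return 0;
         }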
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 26 Jun 2005, 4 commits
  22. 24 Jun 2005, 2 commits