1. 21 Feb 2006 (1 commit)
  2. 18 Feb 2006 (6 commits)
    • [PATCH] powerpc: Fix accidentally-working typo in __pud_free_tlb · 200a4552
      David Gibson authored
      One of the parameters to the __pud_free_tlb() macro for powerpc is
      incorrect (see patch).  We get away with it by accident because, at the
      macro's one call site, the second parameter is a variable named "pud".
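      As an illustration of the failure mode (hypothetical code, not the actual
      powerpc macro): a macro whose body refers to "pud" instead of its own
      parameter still compiles, and still works, as long as every call site
      happens to pass a variable literally named "pud".

      #include <stdio.h>

      /* the body should use the parameter "pudp", but says "pud" instead */
      #define broken_free(tlb, pudp)  do { printf("freeing %p\n", (void *)pud); } while (0)

      int main(void)
      {
              int dummy = 0;
              int *pud = &dummy;          /* works purely by accident of the name */

              broken_free(NULL, pud);
              /* int *p = &dummy; broken_free(NULL, p);  <- would not compile */
              return 0;
      }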
      Signed-off-by: David Gibson <dwg@au1.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      200a4552
    • [PATCH] s390: additional_cpus parameter · 255acee7
      Heiko Carstens authored
      Introduce the additional_cpus command line option.  By default no additional
      cpu can be attached to the system anymore; only the cpus present at IPL time
      can be switched on/off.  If additional cpus are to be attached to the system,
      the maximum number of additional cpus needs to be specified with this option.
      
      This change is necessary in order to limit the waste of per_cpu data
      structures.
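      A minimal sketch of how such a boot parameter is commonly wired up
      (illustrative only; the actual s390 parsing code may differ):

      #include <linux/init.h>
      #include <linux/kernel.h>

      static int additional_cpus;     /* 0 by default: no extra cpus may be attached */

      /* parse "additional_cpus=N" from the kernel command line */
      static int __init setup_additional_cpus(char *s)
      {
              additional_cpus = simple_strtoul(s, NULL, 0);
              return 1;
      }
      __setup("additional_cpus=", setup_additional_cpus);

      Booting with, say, additional_cpus=2 would then reserve room for two cpus
      beyond those present at IPL time.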
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      255acee7
    • [PATCH] i386: fix singlestepping through a syscall · cfe91f9c
      Chuck Ebbert authored
      Do not mask TIF_SINGLESTEP bit in _TIF_WORK_MASK. Masking this stopped
      do_notify_resume() from being called when it should have been.
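      Schematically (illustrative flag values, not the real i386 thread_info.h):
      the return-to-user path only enters the notify path for flags that are left
      in the work mask, so excluding TIF_SINGLESTEP silently skips
      do_notify_resume().

      #include <stdio.h>

      #define _TIF_SYSCALL_TRACE  (1 << 0)   /* illustrative values */
      #define _TIF_SINGLESTEP     (1 << 4)

      /* buggy: SINGLESTEP masked out, so the slow path never sees it */
      #define WORK_MASK_OLD  (0xffff & ~(_TIF_SYSCALL_TRACE | _TIF_SINGLESTEP))
      /* fixed: SINGLESTEP stays in the mask, so do_notify_resume() runs */
      #define WORK_MASK_NEW  (0xffff & ~_TIF_SYSCALL_TRACE)

      int main(void)
      {
              unsigned flags = _TIF_SINGLESTEP;   /* debugger asked for a single step */

              printf("old mask -> notify: %s\n", (flags & WORK_MASK_OLD) ? "yes" : "no");
              printf("new mask -> notify: %s\n", (flags & WORK_MASK_NEW) ? "yes" : "no");
              return 0;
      }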
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cfe91f9c
    • [PATCH] Provide an interface for getting the current tick length · 726c14bf
      Paul Mackerras authored
      This provides an interface for arch code to find out how many
      nanoseconds are going to be added on to xtime by the next call to
      do_timer.  The value returned is a fixed-point number in 52.12 format
      in nanoseconds.  The reason for this format is that it gives the
      full precision that the timekeeping code is using internally.
      
      The motivation for this is to fix a problem that has arisen on 32-bit
      powerpc in that the value returned by do_gettimeofday drifts apart
      from xtime if NTP is being used.  PowerPC is now using a lockless
      do_gettimeofday based on reading the timebase register and performing
      some simple arithmetic.  (This method of getting the time is also
      exported to userspace via the VDSO.)  However, the factor and offset
      it uses were calculated based on the nominal tick length and weren't
      being adjusted when NTP varied the tick length.
      
      Note that 64-bit powerpc has had the lockless do_gettimeofday for a
      long time now.  It also had an extremely hairy routine that got called
      from the 32-bit compat routine for adjtimex, which adjusted the
      factor and offset according to what it thought the timekeeping code
      was going to do.  Not only was this only called if a 32-bit task did
      adjtimex (i.e. not if a 64-bit task did adjtimex), it was also
      duplicating computations from kernel/timer.c and it wasn't clear that
      it was (still) correct.
      
      The simple solution is to ask the timekeeping code how long the
      current jiffy will be on each timer interrupt, after calling
      do_timer.  If this jiffy will be a different length from the last one,
      we then need to compute new values for the factor and offset used in
      the lockless do_gettimeofday.  In this way we can keep xtime and
      do_gettimeofday in sync, even when NTP is varying the tick length.
      
      Note that when adjtimex varies the tick length, it almost always
      introduces the variation from the next tick on.  The only case I could
      see where adjtimex would vary the length of the current tick is when
      an old-style adjtime adjustment is being cancelled.  (It's not clear
      to me why the adjustment has to be cancelled immediately rather than
      from the next tick on.)  Thus I don't see any real need for a hook in
      adjtimex; the rare case of an old-style adjustment being cancelled can
      be fixed up at the next tick.
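      A rough sketch of the mechanism (not the actual powerpc code; the names,
      the scaling shift and the helper taking the 52.12 tick-length value are
      assumptions made for illustration):

      #include <stdint.h>

      #define GTOD_SHIFT 32                /* illustrative scaling shift */

      static uint64_t tb_last;             /* timebase value at the last tick      */
      static uint64_t ns_offset;           /* xtime-equivalent nanoseconds then    */
      static uint64_t ns_per_tb_scaled;    /* ns per timebase tick << GTOD_SHIFT   */
      static uint64_t last_tick_len;       /* previous 52.12 tick length           */

      /* timer interrupt tail, after do_timer() has advanced xtime by one tick */
      static void sync_gtod_factor(uint64_t tb_now, uint64_t tick_len_52_12,
                                   uint64_t tb_ticks_per_jiffy)
      {
              if (tick_len_52_12 != last_tick_len) {
                      /* NTP changed the tick length: rescale the factor so the
                       * lockless path keeps agreeing with xtime */
                      ns_per_tb_scaled = ((tick_len_52_12 >> 12) << GTOD_SHIFT)
                                         / tb_ticks_per_jiffy;
                      last_tick_len = tick_len_52_12;
              }
              ns_offset += tick_len_52_12 >> 12;   /* what do_timer just added */
              tb_last = tb_now;
      }

      /* the lockless do_gettimeofday: read the timebase, do simple arithmetic */
      static uint64_t gtod_ns(uint64_t tb_now)
      {
              return ns_offset + (((tb_now - tb_last) * ns_per_tb_scaled) >> GTOD_SHIFT);
      }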
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      726c14bf
    • [PATCH] x86_64: Disable tsc when apicpmtimer is active · 7fd67843
      Andi Kleen authored
      Otherwise it has no effect anyway.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7fd67843
    • [PATCH] x86_64: Add boot option to disable randomized mappings and cleanup · a62eaf15
      Andi Kleen authored
      AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution
      installation it's easiest if it's a boot time option.
      
      Also I moved the variable to a more appropriate place and made
      it independent from sysctl.
      
      And marked it __read_mostly, which it is.
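      A small sketch of that pattern (the option name and the exact variable are
      assumptions for illustration, not necessarily what the patch uses):

      #include <linux/init.h>
      #include <linux/cache.h>

      /* read on every exec/mmap, written essentially never */
      int randomize_va_space __read_mostly = 1;

      /* boot-time switch so an installer can turn randomized mappings off early */
      static int __init norandmaps_setup(char *s)
      {
              randomize_va_space = 0;
              return 1;
      }
      __setup("norandmaps", norandmaps_setup);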
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a62eaf15
  3. 17 Feb 2006 (1 commit)
  4. 16 Feb 2006 (12 commits)
  5. 15 Feb 2006 (16 commits)
    • [NETFILTER]: Fix xfrm lookup after SNAT · ee68cea2
      Patrick McHardy authored
      To find out if a packet needs to be handled by IPsec after SNAT, packets
      are currently rerouted in POST_ROUTING and a new xfrm lookup is done. This
      breaks SNAT of non-unicast packets to non-local addresses because the
      packet is routed as incoming packet and no neighbour entry is bound to the
      dst_entry. In general, it seems to be a bad idea to replace the dst_entry
      after the packet was already sent to the output routine because its state
      might not match what's expected.
      
      This patch changes the xfrm lookup in POST_ROUTING to re-use the original
      dst_entry without routing the packet again. This means no policy routing
      can be used for transport mode transforms (which keep the original route)
      when packets are SNATed to match the policy, but it looks like the best
      we can do for now.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee68cea2
    • [PATCH] FRV: Use virtual interrupt disablement · 28baebae
      David Howells authored
      Make the FRV arch use virtual interrupt disablement because accesses to the
      processor status register (PSR) are relatively slow and because we will
      soon have the need to deal with multiple interrupt controls at the same
      time (separate h/w and inter-core interrupts).
      
      The way this is done is to dedicate one of the four integer condition code
      registers (ICC2) to maintaining a virtual interrupt disablement state
      whilst inside the kernel.  This uses the ICC2.Z flag (Zero) to indicate
      whether the interrupts are virtually disabled and the ICC2.C flag (Carry)
      to indicate whether the interrupts are physically disabled.
      
      ICC2.Z is set to indicate interrupts are virtually disabled.  ICC2.C is set
      to indicate interrupts are physically enabled.  Under normal running
      conditions Z==0 and C==1.
      
      Disabling interrupts with local_irq_disable() doesn't then actually
      physically disable interrupts - it merely sets ICC2.Z to 1.  Should an
      interrupt then happen, the exception prologue will note ICC2.Z is set and
      branch out of line using one instruction (an unlikely BEQ).  Here it will
      physically disable interrupts and clear ICC2.C.
      
      When it comes time to enable interrupts (local_irq_enable()), this simply
      clears the ICC2.Z flag and invokes a trap #2 if both Z and C flags are
      clear (the HI integer condition).  This can be done with the TIHI
      conditional trap instruction.
      
      The trap then physically reenables interrupts and sets ICC2.C again.  Upon
      returning the interrupt will be taken as interrupts will then be enabled.
      Note that whilst processing the trap, the whole exceptions system is
      disabled, and so an interrupt can't happen till it returns.
      
      If no pending interrupt had happened, ICC2.C would still be set, the HI
      condition would not be fulfilled, and no trap will happen.
      
      Saving interrupts (local_irq_save) is simply a matter of pulling the ICC2.Z
      flag out of the CCR register, shifting it down and masking it off.  This
      gives a result of 0 if interrupts were enabled and 1 if they weren't.
      
      Restoring interrupts (local_irq_restore) is then a matter of taking the
      saved value mentioned previously and XOR'ing it against 1.  If it was one,
      the result will be zero, and if it was zero the result will be non-zero.
      This result is then used to affect the ICC2.Z flag directly (it is a
      condition code flag after all).  An XOR instruction does not affect the
      Carry flag, and so that bit of state is unchanged.  The two flags can then
      be sampled to see if they're both zero using the trap (TIHI) as for the
      unconditional reenablement (local_irq_enable).
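      A schematic C model of the scheme described above (the real implementation
      is FRV assembly in the entry code; the helper names here are invented):

      /* ICC2.Z set = virtually disabled, ICC2.C set = physically enabled.
       * Normal running state is Z == 0, C == 1. */
      struct icc2_model { int Z, C; };
      static struct icc2_model icc2 = { 0, 1 };

      static void hw_disable_interrupts(void) { /* mask at the CPU (PSR) */ }
      static void hw_enable_interrupts(void)  { /* unmask at the CPU (PSR) */ }

      /* exception prologue: an interrupt arrived while virtually disabled */
      static void irq_prologue(void)
      {
              if (icc2.Z) {                  /* one unlikely branch in the prologue */
                      hw_disable_interrupts();
                      icc2.C = 0;            /* remember: physically disabled now */
              }
      }

      /* trap #2: physically re-enable interrupts and set C again */
      static void trap2_reenable(void)
      {
              hw_enable_interrupts();
              icc2.C = 1;
      }

      static void model_irq_disable(void)    /* local_irq_disable(): no PSR access */
      {
              icc2.Z = 1;
      }

      static void model_irq_enable(void)     /* local_irq_enable() */
      {
              icc2.Z = 0;
              if (!icc2.Z && !icc2.C)        /* TIHI: trap only if both are clear */
                      trap2_reenable();
      }

      static int model_irq_save(void)        /* local_irq_save(): extract Z */
      {
              return icc2.Z;                 /* 1 = was disabled, 0 = was enabled */
      }

      static void model_irq_restore(int flags)   /* local_irq_restore() */
      {
              icc2.Z = ((flags ^ 1) == 0);   /* XOR against 1; a zero result sets Z */
              if (!icc2.Z && !icc2.C)
                      trap2_reenable();
      }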
      
      This patch also:
      
       (1) Modifies the debugging stub (break.S) to handle single-stepping crossing
           into the trap #2 handler and into virtually disabled interrupts.
      
       (2) Removes superseded fixup pointers from the second instructions in the trap
           tables (there's now a separate fixup table for this).
      
       (3) Declares the trap #3 vector for use in .org directives in the trap table.
      
       (4) Moves irq_enter() and irq_exit() in do_IRQ() to avoid problems with
           virtual interrupt handling, and removes the duplicate code that has now
           been folded into irq_exit() (softirq and preemption handling).
      
       (5) Tells the compiler in the arch Makefile that ICC2 is now reserved.
      
       (6) Documents the in-kernel ABI, including the virtual interrupts.
      
       (7) Renames the old irq management functions to different names.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      28baebae
    • [PATCH] FRV: Miscellaneous fixes · 68f624fc
      David Howells authored
      Make various alterations and fixes to the FRV arch:
      
       (1) Resyncs the FRV system call collection with the i386 arch.
      
       (2) Discards __iounmap() as it's not used.
      
       (3) Fixes the use of the SWAP/SWAPI instruction to get the arguments the right
           way around in atomic.h, and also to get the asm constraints correct.
      
       (4) Moves copy_to/from_user_page() to asm/cacheflush.h to be consistent with
           other archs.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      68f624fc
    • [PATCH] sched: revert "filter affine wakeups" · d6077cb8
      Chen, Kenneth W authored
      Revert commit d7102e95:
      
          [PATCH] sched: filter affine wakeups
      
      It apparently caused more than a 10% performance regression on the aim7
      benchmark.  The setup in use is a 16-cpu HP rx8620 with 64GB of memory and 12
      MSA1000s with 144 disks.  Each disk is 72GB with a single ext3 filesystem
      (courtesy of HP, who supplied the benchmark results).
      
      The problem is, for aim7, the wake-up pattern is random, but it still needs
      load balancing action in the wake-up path to achieve best performance.  With
      the above commit, lack of load balancing hurts that workload.
      
      However, for workloads like database transaction processing, the requirement
      is exactly opposite.  In the wake up path, best performance is achieved with
      absolutely zero load balancing.  We simply wake up the process on the CPU that
      it previously ran on.  Worst performance is obtained when we do load
      balancing at wake up.
      
      There isn't an easy way to auto-detect the workload characteristics.  Ingo's
      earlier patch that detects idle CPUs and decides whether to load balance or not
      doesn't perform well with aim7 either, since all CPUs are busy (it causes an
      even bigger perf regression).
      
      Revert commit d7102e95, which causes more
      than 10% performance regression with aim7.
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d6077cb8
    • [PATCH] madvise MADV_DONTFORK/MADV_DOFORK · f8225661
      Michael S. Tsirkin authored
      Currently, copy-on-write may change the physical address of a page even if the
      user requested that the page is pinned in memory (either by mlock or by
      get_user_pages).  This happens if the process forks meanwhile, and the parent
      writes to that page.  As a result, the page is orphaned: in case of
      get_user_pages, the application will never see any data the hardware DMAs into
      this page after the COW.  In case of mlock'd memory, the parent is not getting
      the realtime/security benefits of mlock.
      
      In particular, this affects the Infiniband modules which do DMA from and into
      user pages all the time.
      
      This patch adds madvise options to control whether memory range is inherited
      across fork.  Useful e.g.  for when hardware is doing DMA from/into these
      pages.  Could also be useful to an application wanting to speed up its forks
      by cutting large areas out of consideration.
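      A minimal userspace usage sketch (error handling trimmed): mark a DMA
      buffer so that it is not made available to children across fork(), then
      optionally restore normal inheritance with MADV_DOFORK.

      #define _GNU_SOURCE
      #include <sys/mman.h>
      #include <stdlib.h>
      #include <unistd.h>

      int main(void)
      {
              size_t len = 1 << 20;
              void *buf;

              /* page-aligned buffer that hardware will DMA into */
              if (posix_memalign(&buf, sysconf(_SC_PAGESIZE), len))
                      return 1;

              /* keep this range out of child address spaces across fork(),
               * so a parent-side COW can never move the pinned pages */
              if (madvise(buf, len, MADV_DONTFORK))
                      return 1;

              /* ... register buf for DMA, fork() worker processes, etc. ... */

              madvise(buf, len, MADV_DOFORK);   /* undo, if inheritance is wanted again */
              free(buf);
              return 0;
      }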
      Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f8225661
    • [PATCH] fix x86 topology export in sysfs for subarchitectures · 8b09fb34
      James Bottomley authored
      The correct way to export hyperthreading based functions is to predicate
      them on CONFIG_X86_HT.  Without this, the topology exporting patch breaks
      the build on all non-PC x86 subarchitectures.
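      In other words, the pattern is roughly (a sketch, not the actual file):

      #ifdef CONFIG_X86_HT
      /* hyperthread topology attributes are only meaningful, and only build,
       * when HT support is configured in */
      static void register_ht_topology(unsigned int cpu)
      {
              /* ... create the sibling/core sysfs attributes for this cpu ... */
      }
      #else
      static inline void register_ht_topology(unsigned int cpu) { }
      #endif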
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      8b09fb34
    • [PATCH] NLM: Fix the NLM_GRANTED callback checks · 5ac5f9d1
      Trond Myklebust authored
      If 2 threads attached to the same process are blocking on different locks on
      different files (maybe even on different servers) but have the same lock
      arguments (i.e.  same offset+length - actually quite common, since most
      processes try to lock the entire file) then the first GRANTED call that wakes
      one up will also wake the other.
      
      Currently when the NLM_GRANTED callback comes in, lockd walks the list of
      blocked locks in search of a match to the lock that the NLM server has
      granted.  Although it checks the lock pid, start and end, it fails to check
      the filehandle and the server address.
      
      By checking the filehandle and server IP address, we ensure that this only
      happens if the locks truly are referencing the same file.
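      Conceptually, the match now requires all of the following (a self-contained
      sketch with made-up structures, not the actual fs/lockd code):

      #include <string.h>

      struct lock_key {
              unsigned int  pid;
              long long     start, end;
              unsigned char fh[32];         /* NFS file handle */
              unsigned int  fh_len;
              unsigned char addr[16];       /* server address  */
              unsigned int  addr_len;
      };

      /* old check: pid/start/end only, so two blocked locks with identical
       * ranges on different files or servers could be confused */
      static int match_old(const struct lock_key *a, const struct lock_key *b)
      {
              return a->pid == b->pid && a->start == b->start && a->end == b->end;
      }

      /* fixed check: additionally require the same filehandle and server */
      static int match_new(const struct lock_key *a, const struct lock_key *b)
      {
              return match_old(a, b) &&
                     a->fh_len == b->fh_len &&
                     memcmp(a->fh, b->fh, a->fh_len) == 0 &&
                     a->addr_len == b->addr_len &&
                     memcmp(a->addr, b->addr, a->addr_len) == 0;
      }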
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5ac5f9d1
    • [PATCH] jbd: revert checkpoint list changes · 7c8903f6
      Mark Fasheh authored
      This patch reverts commit f93ea411:
        [PATCH] jbd: split checkpoint lists
      
      This broke journal_flush() for OCFS2, which is its method of being sure
      that metadata is sent to disk for another node.
      
      It also reverts two related commits, 8d3c7fce and
      43c3e6f5, with the subjects:
        [PATCH] jbd: log_do_checkpoint fix
        [PATCH] jbd: remove_transaction fix
      
      These seem to be incremental bugfixes on the original patch and as such are
      no longer needed.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Jan Kara <jack@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7c8903f6
    • [IA64] Count disabled cpus as potential hot-pluggable CPUs · a6b14fa6
      Ashok Raj authored
      Have a facility to account for potentially hot-pluggable CPUs.  ACPI doesn't
      give a deterministic method to find hot-pluggable CPUs, hence we use two
      methods to assist:
      
      - BIOS can mark potentially hot-pluggable CPUs as disabled in the MADT tables.
      - User can specify the number of hot-pluggable CPUs via parameter
        additional_cpus=X
      
      The option is enabled only if ACPI_CONFIG_HOTPLUG_CPU=y, which enables the
      physical hotplug option.  Without it, the user can still use logical onlining
      and offlining of CPUs by enabling CONFIG_HOTPLUG_CPU=y.
      
      Adds more bits to cpu_possible_map for potentially hot-pluggable cpus.
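      Schematically (a sketch, not the exact ia64 boot code), the possible map is
      widened at boot by the disabled-in-MADT count plus any additional_cpus value:

      #include <linux/init.h>
      #include <linux/cpumask.h>

      static void __init prefill_possible_map(int disabled_cpus, int additional_cpus)
      {
              int possible = num_present_cpus() + disabled_cpus + additional_cpus;
              int i;

              if (possible > NR_CPUS)
                      possible = NR_CPUS;

              for (i = 0; i < possible; i++)
                      cpu_set(i, cpu_possible_map);
      }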
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      a6b14fa6
    • M
    • R
    • [MIPS] More uaccess.h fixes with gcc >= 4.0.1. · 3218357c
      Ralf Baechle authored
          
      From Richard Sandiford <richard@codesourcery.com>:
          
      This patch caused a miscompilation of the restore_gp_regs() block
      in restore_sigcontext().  This was in a 32-bit kernel compiled with
      GCC CVS head.
          
      restore_gp_regs() copies 64-bit user fields into 32-bit variables,
      and in this combination, the new __get_user_asm_ll32() clobbers too
      many registers.  It says:
          
      /*
       * Get a long long 64 using 32 bit registers.
       */
      {									\
      	__asm__ __volatile__(						\
      	"1:	lw	%1, (%3)				\n"	\
      	"2:	lw	%D1, 4(%3)				\n"	\
      	"	move	%0, $0					\n"	\
      	"3:	.section	.fixup,\"ax\"			\n"	\
      	"4:	li	%0, %4					\n"	\
      	"	move	%1, $0					\n"	\
      	"	move	%D1, $0					\n"	\
      	"	j	3b					\n"	\
      	"	.previous					\n"	\
      	"	.section	__ex_table,\"a\"		\n"	\
      	"	" __UA_ADDR "	1b, 4b				\n"	\
      	"	" __UA_ADDR "	2b, 4b				\n"	\
      	"	.previous					\n"	\
      	: "=r" (__gu_err), "=&r" (val)					\
      	: "0" (0), "r" (addr), "i" (-EFAULT));				\
      }
      
      and this requires val (%1) to be a 64-bit value.  In the case I saw,
      gcc was using $3 for the 32-bit val, and wasn't expecting $4 to be
      clobbered.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      3218357c
    • [MIPS] Add protected_blast_icache_range, blast_icache_range, etc. · 41700e73
      Atsushi Nemoto authored
          
      Add blast_xxx_range(), protected_blast_xxx_range() etc. for common
      use.  They are built by __BUILD_BLAST_CACHE_RANGE().
      Use the protected_cache_op() macro for the various protected_ routines.
      The output code should be logically the same.
      Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      41700e73
    • 359bbd42
    • [MIPS] RM200: Give RM200 its own timex.h. · f32ec77b
      Ralf Baechle authored
          
      So we can get rid of config.h and the #ifdef crapola in the generic
      timex.h.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      f32ec77b
    • [PATCH] add scsi_execute_in_process_context() API · faead26d
      James Bottomley authored
      We have several points in the SCSI stack (primarily for our device
      functions) where we need to guarantee process context, but (given the
      place where the last reference was released) we cannot guarantee this.
      
      This API gets around the issue by executing the function directly if
      the caller has process context, but scheduling a workqueue to execute
      in process context if the caller doesn't have it.  Unfortunately, it
      requires memory allocation in interrupt context, but it's better than
      what we have previously.  The true solution will require a bit of
      re-engineering, so isn't appropriate for 2.6.16.
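      The shape of the mechanism is roughly the following (a sketch using the
      workqueue interface of that era; names and details are assumptions, not the
      actual drivers/scsi implementation):

      #include <linux/workqueue.h>
      #include <linux/interrupt.h>
      #include <linux/slab.h>
      #include <linux/errno.h>

      struct deferred_call {
              struct work_struct work;
              void (*fn)(void *);
              void *data;
      };

      static void deferred_call_worker(void *arg)      /* runs in process context */
      {
              struct deferred_call *dc = arg;

              dc->fn(dc->data);
              kfree(dc);
      }

      int execute_in_process_context_sketch(void (*fn)(void *), void *data)
      {
              struct deferred_call *dc;

              if (!in_interrupt()) {                   /* caller already has process context */
                      fn(data);
                      return 0;
              }

              dc = kmalloc(sizeof(*dc), GFP_ATOMIC);   /* the unfortunate allocation */
              if (!dc)
                      return -ENOMEM;

              dc->fn = fn;
              dc->data = data;
              INIT_WORK(&dc->work, deferred_call_worker, dc);   /* 3-arg form of the time */
              schedule_work(&dc->work);
              return 0;
      }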
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
      faead26d
  6. 14 Feb 2006 (1 commit)
  7. 13 Feb 2006 (3 commits)