1. 03 October 2007, 2 commits
  2. 01 October 2007, 1 commit
  3. 30 September 2007, 1 commit
    • i386: remove bogus comment about memory barrier · 4827bbb0
      Committed by Nick Piggin
      The comment being removed by this patch is incorrect and misleading.
      
      In the following situation:
      
      	1. load  ...
      	2. store 1 -> X
      	3. wmb
      	4. rmb
      	5. load  a <- Y
      	6. store ...
      
      4 will only ensure ordering of 1 with 5.
      3 will only ensure ordering of 2 with 6.
      
      Further, a CPU with strictly in-order stores will still only provide that
      2 and 6 are ordered (effectively, it is the same as a weakly ordered CPU
      with wmb after every store).
      
      In all cases, 5 may still be executed before 2 is visible to other CPUs!
      
      The additional piece of the puzzle that mb() provides is the store/load
      ordering, which fundamentally cannot be achieved with any combination of
      rmb()s and wmb()s.
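
      To make the store->load case concrete, here is a minimal userspace sketch of
      the classic store-buffering pattern (not part of the original patch;
      __sync_synchronize() stands in for smp_mb(), and both threads reading 0 is
      exactly the reordering that only a full barrier rules out):

         /* build with: gcc -O2 -pthread sb.c */
         #include <pthread.h>
         #include <stdio.h>

         volatile int X, Y;
         int r1, r2;

         static void *thread_x(void *arg)
         {
         	X = 1;
         	__sync_synchronize();	/* full barrier, analogous to smp_mb() */
         	r1 = Y;			/* load from the *other* location */
         	return NULL;
         }

         static void *thread_y(void *arg)
         {
         	Y = 1;
         	__sync_synchronize();
         	r2 = X;
         	return NULL;
         }

         int main(void)
         {
         	pthread_t a, b;

         	pthread_create(&a, NULL, thread_x, NULL);
         	pthread_create(&b, NULL, thread_y, NULL);
         	pthread_join(a, NULL);
         	pthread_join(b, NULL);

         	/* With the full barriers (mfence on x86), r1 == 0 && r2 == 0 is
         	 * ruled out.  Weaken them to wmb()/rmb() equivalents and it is not. */
         	printf("r1=%d r2=%d\n", r1, r2);
         	return 0;
         }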
      
      This can be an unexpected result if one expected any sort of global ordering
      guarantee from barriers (e.g. that the barriers themselves are sequentially
      consistent with other types of barriers).  However, sfence or lfence barriers
      need only provide a partial ordering of memory operations -- consider
      that wmb may be implemented as nothing more than inserting a special barrier
      entry in the store queue, or, in the case of x86, it can be a noop as the store
      queue is in order.  And an rmb may be implemented as a directive to prevent
      subsequent loads only so long as there are no previous outstanding loads (while
      there could be stores still in store queues).
      
      I can actually see the occasional load/store being reordered around lfence on
      my core2. That doesn't prove my above assertions, but it does show the comment
      is wrong (unless my program is -- I can send it out on request).
      
      So:
         mb() and smp_mb() always have required, and always will require, a full
         mfence or lock-prefixed instruction on x86.  And we should remove this
         comment.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Paul McKenney <paulmck@us.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 29 September 2007, 2 commits
    • [TCP]: Fix MD5 signature handling on big-endian. · f8ab18d2
      Committed by David S. Miller
      Based upon a report and initial patch by Peter Lieven.
      
      tcp4_md5sig_key and tcp6_md5sig_key need to start with
      the exact same members as tcp_md5sig_key, because they
      are both cast to that type by tcp_v{4,6}_md5_do_lookup().
      
      Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
      length instead of a u8, which is what tcp_md5sig_key
      uses.  This just so happens to work by accident on
      little-endian, but on big-endian it doesn't.
      
      Instead of casting, just place tcp_md5sig_key as the first member of
      the address-family specific structures, adjust the access sites, and
      kill off the ugly casts.
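
      A standalone sketch of the layout hazard (simplified stand-in types, not the
      real kernel structures): reading a u16 key length through a u8 field only
      happens to give the right answer when the low-order byte comes first, i.e. on
      little-endian.

         #include <stdio.h>
         #include <stdint.h>

         struct md5sig_key {		/* generic type: 8-bit key length */
         	uint8_t *key;
         	uint8_t  keylen;
         };

         struct v4_md5sig_key_old {	/* old per-family type: 16-bit key length */
         	uint8_t  *key;
         	uint16_t  keylen;
         };

         struct v4_md5sig_key_new {	/* fixed layout: generic type embedded first */
         	struct md5sig_key base;
         	/* address-family specific fields would follow */
         };

         int main(void)
         {
         	struct v4_md5sig_key_old old = { .key = NULL, .keylen = 20 };
         	struct v4_md5sig_key_new fixed = { .base = { NULL, 20 } };

         	/* The old code did the equivalent of this cast.  It reads a u8
         	 * where a u16 was stored: 20 on little-endian, 0 on big-endian. */
         	struct md5sig_key *k = (struct md5sig_key *)&old;
         	printf("keylen through the cast:    %u\n", (unsigned)k->keylen);
         	printf("keylen via embedded member: %u\n", (unsigned)fixed.base.keylen);
         	return 0;
         }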
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [MIPS] Fix CONFIG_BUILD_ELF64 kernels with symbols in CKSEG0. · 9ae6399f
      Committed by Ralf Baechle
      The __pa() for those assumed that all symbols have XKPHYS values, and
      the math fails for any other address range.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
  5. 28 September 2007, 1 commit
  6. 27 September 2007, 4 commits
    • Revert "[PATCH] x86-64: fix x86_64-mm-sched-clock-share" · ff0ce684
      Committed by Linus Torvalds
      This reverts commit 184c44d2.
      
      As noted by Dave Jones:
         "Linus, please revert the above cset.  It doesn't seem to be
          necessary (it was added to fix a miscompile in 'make allnoconfig'
          which doesn't seem to be repeatable with it reverted) and actively
         breaks the ARM SA1100 framebuffer driver."
      Requested-by: Dave Jones <davej@redhat.com>
      Cc: Russell King <rmk+lkml@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "x86-64: Disable local APIC timer use on AMD systems with C1E" · f7f847b0
      Committed by Linus Torvalds
      This reverts commit e66485d7, since
      Rafael Wysocki noticed that the change only works for him in -mm, not in
      mainline (and that both "noapictimer" _and_ "apicmaintimer" are broken
      on his hardware, but that's apparently not a regression, just a symptom
      of the same issue that causes the automatic apic timer disable to not
      work).
      
      It turns out that it really doesn't work correctly on x86-64, since
      x86-64 doesn't use the generic clock events for timers yet.
      
      Thanks to Rafael for testing, and here are the ugly details on x86-64 as
      per Thomas:
      
        "I just looked into the code and the logic vs.  noapictimer on SMP is
         completely broken.
      
         On i386 the noapictimer option not only disables the local APIC
         timer, it also registers the CPUs for broadcasting via IPI on SMP
         systems.
      
         The x86-64 code uses the broadcast only when the local apic timer is
         active, i.e.  "noapictimer" is not on the command line.  This defeats
         the whole purpose of "noapictimer".  It should be there to make boxen
         work, where the local APIC timer actually has a hardware problem,
         e.g.  the nx6325.
      
         The current implementation of x86_64 only fixes the ACPI c-states
         related problem where the APIC timer stops in C3(2), nothing else.
      
         On nx6325 and other AMD X2 equipped systems which have the C1E
         enabled we run into the following:
      
         PIT keeps jiffies (and the system) running, but the local APIC timer
         interrupts can get out of sync due to this C1E effect.
      
         I don't think this is a critical problem, but it is wrong
         nevertheless.
      
         I think it's safe to revert the C1E patch and postpone the fix to the
         clock events conversion."
      
      On further reflection, Thomas noted:
      
         "It's even worse than I thought on the first check:
      
          "noapictimer" on the command line of an SMP box prevents _ONLY_ the
          boot CPU apic timer from being used.  But the secondary CPU is still
          unconditionally setting up the APIC timer and uses the non
          calibrated variable calibration_result, which is of course 0, to
          setup the APIC timer.  Wreckage guaranteed."
      
      so we'll just have to wait for the x86 merge to hopefully fix this up
      for x86-64.
      Tested-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86-64: Disable local APIC timer use on AMD systems with C1E · e66485d7
      Committed by Thomas Gleixner
      commit 3556ddfa titled
      
       [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E
      
      solves a problem with AMD dual core laptops e.g. HP nx6325 (Turion 64
      X2) with C1E enabled:
      
      When both cores go into idle at the same time, then the system switches
      into C1E state, which is basically the same as C3. This stops the local
      apic timer.
      
      This was debugged right after the dyntick merge on i386 and despite the
      patch title it fixes only the 32 bit path.
      
      x86_64 is still missing this fix. It seems that mainline is not really
      affected by this issue, as the PIT is running and keeps jiffies
      incrementing, but that's just waiting for trouble.
      
      -mm suffers from this problem due to the x86_64 high resolution timer
      patches.
      
      This is a quick and dirty port of the i386 code to x86_64.
      
      I spent quite some time with Rafael debugging the -mm / hrt wreckage until
      someone pointed us to this. I really had forgotten that we had debugged this
      half a year ago already.
      
      Sigh, is it just me or is there something yelling arch/x86 into my ear?
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fix sctp_del_bind_addr() last argument type · 78bd8fbb
      Committed by Al Viro
      It gets a pointer to a fastcall function, expects a pointer to a normal
      one, and calls the sucker.
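
      A small standalone illustration of this class of bug (32-bit x86 only; the
      names here are made up, not the actual SCTP identifiers): a callee compiled
      as fastcall expects its argument in a register, while a caller going through
      a plain function-pointer type passes it on the stack.

         /* build with: gcc -m32 fastcall-bug.c */
         #include <stdio.h>

         typedef void (*plain_fn)(int *counter);	/* normal C calling convention */

         static void __attribute__((fastcall)) fast_fn(int *counter)
         {
         	(*counter)++;				/* expects 'counter' in %ecx */
         }

         int main(void)
         {
         	int n = 0;

         	/* The cast silences the compiler, but fast_fn still looks for its
         	 * argument in %ecx, which holds garbage here: undefined behaviour,
         	 * typically a crash.  This is the mismatch the patch removes. */
         	plain_fn fn = (plain_fn)fast_fn;
         	fn(&n);

         	printf("n = %d\n", n);
         	return 0;
         }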
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 26 September 2007, 3 commits
  8. 25 September 2007, 1 commit
  9. 23 September 2007, 1 commit
    • ACPI: disable lower idle C-states across suspend/resume · b04e7bdb
      Committed by Thomas Gleixner
      device_suspend() calls ACPI suspend functions, which seem to have undesired
      side effects on lower idle C-states. It took me some time to realize that
      especially the VAIO BIOSes (both Andrew's jinxed UP and my elfstruck SMP one)
      show this effect. I'm quite sure that other bug reports against suspend/resume
      about turning the system into a brick have the same root cause.
      
      After fishing in the dark for quite some time, I realized that removing the ACPI
      processor module before suspend (this removes the lower C-state functionality)
      made the problem disappear. Interestingly enough, the probability of having a
      bricked box is influenced by various factors (interrupts, size of the ram image,
      ...). Even adding a bunch of printks in the wrong places made the problem go
      away. The previous periodic tick implementation simply papered over the
      problem, which explains why the dyntick / clockevents changes made this more
      prominent.
      
      We avoid complex functionality during the boot process and we have to do the
      same during suspend/resume. It is a similar scenario and equally fragile.
      
      Add suspend / resume functions to the ACPI processor code and disable the lower
      idle C-states across suspend/resume. Fall back to the default idle
      implementation (halt) instead.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 22 September 2007, 1 commit
  11. 21 September 2007, 1 commit
    • signalfd simplification · b8fceee1
      Committed by Davide Libenzi
      This simplifies the signalfd code by no longer keeping it attached to the
      sighand for its whole lifetime.

      In this way, the signalfd remains attached to the sighand only during
      poll(2) (and select and epoll) and read(2).  This also allows removing
      all the custom "tsk == current" checks in kernel/signal.c, since
      dequeue_signal() will only be called by "current".

      I think this is also what Ben was suggesting some time ago.
      
      The external effect of this, is that a thread can extract only its own
      private signals and the group ones.  I think this is an acceptable
      behaviour, in that those are the signals the thread would be able to
      fetch w/out signalfd.
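
      For reference, a minimal userspace sketch of the interface in question (not
      part of the patch): the signal must be blocked so the fd can dequeue it, and
      the reader only sees its own private signals and the group ones.

         #include <sys/signalfd.h>
         #include <signal.h>
         #include <stdio.h>
         #include <unistd.h>

         int main(void)
         {
         	sigset_t mask;
         	struct signalfd_siginfo si;
         	int fd;

         	sigemptyset(&mask);
         	sigaddset(&mask, SIGUSR1);
         	sigprocmask(SIG_BLOCK, &mask, NULL);	/* block so signalfd can dequeue it */

         	fd = signalfd(-1, &mask, 0);
         	kill(getpid(), SIGUSR1);		/* queue a private signal to ourselves */

         	if (read(fd, &si, sizeof(si)) == (ssize_t)sizeof(si))
         		printf("dequeued signal %u\n", (unsigned)si.ssi_signo);

         	close(fd);
         	return 0;
         }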
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 20 September 2007, 5 commits
    • sched: add /proc/sys/kernel/sched_compat_yield · 1799e35d
      Committed by Ingo Molnar
      add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
      more aggressive, by moving the yielding task to the last position
      in the rbtree.
      
      with sched_compat_yield=0:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        2539 mingo     20   0  1576  252  204 R   50  0.0   0:02.03 loop_yield
        2541 mingo     20   0  1576  244  196 R   50  0.0   0:02.05 loop
      
      with sched_compat_yield=1:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        2584 mingo     20   0  1576  248  196 R   99  0.0   0:52.45 loop
        2582 mingo     20   0  1576  256  204 R    0  0.0   0:00.00 loop_yield
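
      A minimal sketch of the kind of test pair used above (illustrative, not
      Ingo's actual programs): run two instances on the same CPU, one as a plain
      busy loop and one passing any argument to get the yielding variant, then
      compare their %CPU with the sysctl set to 0 and then to 1.

         #include <sched.h>

         int main(int argc, char **argv)
         {
         	/* no argument: plain busy loop ("loop");
         	 * any argument: yield every iteration ("loop_yield").
         	 * Runs until interrupted (Ctrl-C). */
         	for (;;) {
         		if (argc > 1)
         			sched_yield();
         	}
         	return 0;
         }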
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    • [MIPS] cpu-bugs64.c: GCC 3.3 constraint workaround · 09abbcff
      Committed by Maciej W. Rozycki
      Add a workaround to address warnings generated on the "n" constraint by
      GCC 3.3 and below.
      Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • Fix NUMA Memory Policy Reference Counting · 480eccf9
      Committed by Lee Schermerhorn
      This patch proposes fixes to the reference counting of memory policy in the
      page allocation paths and in show_numa_map().  Extracted from my "Memory
      Policy Cleanups and Enhancements" series as stand-alone.
      
      Shared policy lookup [shmem] has always added a reference to the policy,
      but this was never unrefed after page allocation or after formatting the
      numa map data.
      
      Default system policy should not require additional ref counting, nor
      should the current task's task policy.  However, show_numa_map() calls
      get_vma_policy() to examine what may be [likely is] another task's policy.
      The latter case needs protection against freeing of the policy.
      
      This patch adds a reference count to a mempolicy returned by
      get_vma_policy() when the policy is a vma policy or another task's
      mempolicy.  Again, shared policy is already reference counted on lookup.  A
      matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
      shared and vma policies, and in show_numa_map() for shared and another
      task's mempolicy.  We can call __mpol_free() directly, saving an admittedly
      inexpensive inline NULL test, because we know we have a non-NULL policy.
      
      Handling policy ref counts for hugepages is a bit trickier.
      huge_zonelist() returns a zone list that might come from a shared or vma
      'BIND policy.  In this case, we should hold the reference until after the
      huge page allocation in dequeue_hugepage().  The patch modifies
      huge_zonelist() to return a pointer to the mempolicy if it needs to be
      unref'd after allocation.
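
      A standalone sketch of the conditional ref/unref pattern being added here
      (illustrative code, not the kernel's mempolicy implementation): the lookup
      reports whether it actually took a reference, and the allocation path drops
      it only in that case.

         #include <stdlib.h>

         struct policy {
         	int refcnt;
         };

         /* Mirrors the idea of get_vma_policy(): take a reference only for a
          * shared (or another task's) policy; the default policy needs none. */
         static struct policy *policy_lookup(struct policy *shared, int *took_ref)
         {
         	if (shared) {
         		shared->refcnt++;
         		*took_ref = 1;
         		return shared;
         	}
         	*took_ref = 0;
         	return NULL;		/* caller falls back to the default policy */
         }

         static void policy_put(struct policy *p)	/* matching "unref" */
         {
         	if (--p->refcnt == 0)
         		free(p);
         }

         int main(void)
         {
         	struct policy *shared = calloc(1, sizeof(*shared));
         	int took_ref;

         	shared->refcnt = 1;
         	struct policy *p = policy_lookup(shared, &took_ref);
         	/* ... allocate the page using p ... */
         	if (took_ref)
         		policy_put(p);	/* as alloc_page_vma() now does via __mpol_free() */

         	policy_put(shared);	/* drop the original reference */
         	return 0;
         }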
      
      Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:
      
      		w/o patch	w/ refcount patch
      	    Avg	  Std Devn	   Avg	  Std Devn
      Real:	 100.59	    0.38	 100.63	    0.43
      User:	1209.60	    0.37	1209.91	    0.31
      System:   81.52	    0.42	  81.64	    0.34
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Acked-by: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Fix user namespace exiting OOPs · 28f300d2
      Committed by Pavel Emelyanov
      It turned out that the user namespace is released during the do_exit() in
      exit_task_namespaces(), but the struct user_struct is released only during the
      put_task_struct(), i.e.  MUCH later.
      
      On debug kernels with poisoned slabs this will cause the oops in
      uid_hash_remove() because the head of the chain, which resides inside the
      struct user_namespace, will be already freed and poisoned.
      
      Since the uid hash itself is required only when someone can search it, i.e.
      when the namespace is alive, we can safely unhash all the user_struct-s from
      it during the namespace exiting.  The subsequent free_uid() will complete the
      user_struct destruction.
      
      For example, this simple program
      
         #define _GNU_SOURCE	/* for clone() */
         #include <sched.h>

         char stack[2 * 1024 * 1024];

         int f(void *foo)
         {
         	return 0;
         }

         int main(void)
         {
         	/* 0x10000000 is CLONE_NEWUSER */
         	clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
         	return 0;
         }
      
      run on a kernel with CONFIG_USER_NS enabled will oops the
      kernel immediately.
      
      This was spotted during OpenVZ kernel testing.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Convert uid hash to hlist · 735de223
      Committed by Pavel Emelyanov
      Surprisingly (as spotted by Alexey Dobriyan), the uid hash still uses
      list_heads, thus occupying twice as much space as it needs to.  Convert it to
      hlist_heads.
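
      A quick standalone illustration of the saving (struct definitions mirroring
      include/linux/list.h): each hash-bucket head shrinks from two pointers to
      one.

         #include <stdio.h>

         struct list_head  { struct list_head *next, *prev; };
         struct hlist_head { struct hlist_node *first; };
         struct hlist_node { struct hlist_node *next, **pprev; };

         int main(void)
         {
         	printf("list_head bucket:  %zu bytes\n", sizeof(struct list_head));
         	printf("hlist_head bucket: %zu bytes\n", sizeof(struct hlist_head));
         	return 0;
         }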
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 19 September 2007, 1 commit
  14. 17 September 2007, 5 commits
  15. 15 September 2007, 2 commits
  16. 13 September 2007, 1 commit
    • Define termios_1 functions for powerpc, s390, avr32 and frv · b0052fca
      Committed by Paul Mackerras
      Commit f629307c introduced uses of
      kernel_termios_to_user_termios_1 and user_termios_to_kernel_termios_1
      on all architectures.  However, powerpc, s390, avr32 and frv don't
      currently define those functions since their termios struct didn't
      need to be changed when the arbitrary baud rate stuff was added, and
      thus the kernel won't currently build on those architectures.
      
      This adds definitions of kernel_termios_to_user_termios_1 and
      user_termios_to_kernel_termios_1 to include/asm-generic/termios.h
      which are identical to kernel_termios_to_user_termios and
      user_termios_to_kernel_termios respectively.  The definitions are the
      same because the "old" termios and "new" termios are in fact the same
      on these architectures (which are the same ones that use
      asm-generic/termios.h).
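
      A simplified, standalone sketch of the idea (these stand-in types and helpers
      are not the actual asm-generic/termios.h code): when the "old" and "new"
      termios layouts are identical, the _1 helpers can simply reuse the existing
      conversion functions.

         #include <stdio.h>
         #include <string.h>

         struct ktermios { unsigned int c_iflag, c_oflag, c_cflag, c_lflag; };
         struct utermios { unsigned int c_iflag, c_oflag, c_cflag, c_lflag; };

         static int kernel_termios_to_user_termios(struct utermios *u, struct ktermios *k)
         {
         	memcpy(u, k, sizeof(*u));	/* identical layout: a straight copy */
         	return 0;
         }

         /* The _1 variant is identical because old termios == new termios here. */
         #define kernel_termios_to_user_termios_1(u, k) kernel_termios_to_user_termios(u, k)

         int main(void)
         {
         	struct ktermios k = { 1, 2, 3, 4 };
         	struct utermios u;

         	kernel_termios_to_user_termios_1(&u, &k);
         	printf("c_cflag = %u\n", u.c_cflag);
         	return 0;
         }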
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alan Cox <alan@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 12 September 2007, 8 commits