1. 03 4月, 2009 2 次提交
    • O
      signals: remove 'handler' parameter to tracehook functions · 43918f2b
      Oleg Nesterov 提交于
      Container-init must behave like global-init to processes within the
      container and hence it must be immune to unhandled fatal signals from
      within the container (i.e SIG_DFL signals that terminate the process).
      
      But the same container-init must behave like a normal process to processes
      in ancestor namespaces and so if it receives the same fatal signal from a
      process in ancestor namespace, the signal must be processed.
      
      Implementing these semantics requires that send_signal() determine pid
      namespace of the sender but since signals can originate from workqueues/
      interrupt-handlers, determining pid namespace of sender may not always be
      possible or safe.
      
      This patchset implements the design/simplified semantics suggested by
      Oleg Nesterov.  The simplified semantics for container-init are:
      
      	- container-init must never be terminated by a signal from a
      	  descendant process.
      
      	- container-init must never be immune to SIGKILL from an ancestor
      	  namespace (so a process in parent namespace must always be able
      	  to terminate a descendant container).
      
      	- container-init may be immune to unhandled fatal signals (like
      	  SIGUSR1) even if they are from ancestor namespace. SIGKILL/SIGSTOP
      	  are the only reliable signals to a container-init from ancestor
      	  namespace.
      
      This patch:
      
      Based on an earlier patch submitted by Oleg Nesterov and comments from
      Roland McGrath (http://lkml.org/lkml/2008/11/19/258).
      
      The handler parameter is currently unused in the tracehook functions.
      Besides, the tracehook functions are called with siglock held, so the
      functions can check the handler if they later need to.
      
      Removing the parameter simiplifies changes to sig_ignored() in a follow-on
      patch.
      Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Acked-by: NRoland McGrath <roland@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43918f2b
    • A
      Simplify copy_thread() · 6f2c55b8
      Alexey Dobriyan 提交于
      First argument unused since 2.3.11.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6f2c55b8
  2. 01 4月, 2009 1 次提交
  3. 31 3月, 2009 2 次提交
    • A
      proc 2/2: remove struct proc_dir_entry::owner · 99b76233
      Alexey Dobriyan 提交于
      Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
      as correctly noted at bug #12454. Someone can lookup entry with NULL
      ->owner, thus not pinning enything, and release it later resulting
      in module refcount underflow.
      
      We can keep ->owner and supply it at registration time like ->proc_fops
      and ->data.
      
      But this leaves ->owner as easy-manipulative field (just one C assignment)
      and somebody will forget to unpin previous/pin current module when
      switching ->owner. ->proc_fops is declared as "const" which should give
      some thoughts.
      
      ->read_proc/->write_proc were just fixed to not require ->owner for
      protection.
      
      rmmod'ed directories will be empty and return "." and ".." -- no harm.
      And directories with tricky enough readdir and lookup shouldn't be modular.
      We definitely don't want such modular code.
      
      Removing ->owner will also make PDE smaller.
      
      So, let's nuke it.
      
      Kudos to Jeff Layton for reminding about this, let's say, oversight.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12454Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      99b76233
    • R
      PM: Rework handling of interrupts during suspend-resume · 2ed8d2b3
      Rafael J. Wysocki 提交于
      Use the functions introduced in by the previous patch,
      suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(),
      to rework the handling of interrupts during suspend (hibernation) and
      resume.  Namely, interrupts will only be disabled on the CPU right
      before suspending sysdevs, while device drivers will be prevented
      from receiving interrupts, with the help of the new helper function,
      before their "late" suspend callbacks run (and analogously during
      resume).
      
      In addition, since the device interrups are now disabled before the
      CPU has turned all interrupts off and the CPU will ACK the interrupts
      setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
      any wake-up interrupts are pending and abort suspend if that's the
      case.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      2ed8d2b3
  4. 30 3月, 2009 1 次提交
  5. 26 3月, 2009 2 次提交
    • R
      Revert "x86: don't compile vsmp_64 for 32bit" · 70511134
      Ravikiran G Thirumalai 提交于
      Partial revert of commit 129d8bc8
      titled 'x86: don't compile vsmp_64 for 32bit'
      
      Commit reverted to compile vsmp_64.c if CONFIG_X86_64 is defined,
      since is_vsmp_box() needs to indicate that TSCs are not synchronized, and
      hence, not a valid time source, even when CONFIG_X86_VSMP is not defined.
      Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: shai@scalex86.org
      LKML-Reference: <20090324061429.GH7278@localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      70511134
    • R
      x86: Correct behaviour of irq affinity · e06b1b56
      Rusty Russell 提交于
      Impact: get correct smp_affinity as user requested
      
      The effect of setting desc->affinity (ie. from userspace via sysfs) has
      varied over time.  In 2.6.27, the 32-bit code anded the value with
      cpu_online_map, and both 32 and 64-bit did that anding whenever a cpu
      was unplugged.
      
      2.6.29 consolidated this into one routine (and fixed hotplug) but
      introduced another variation: anding the affinity with cfg->domain.
      
      We should just set it to what the user said - if possible.
      
      (cpu_mask_to_apicid_and already takes cpu_online_mask into account)
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <49C94DDF.2010703@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e06b1b56
  6. 25 3月, 2009 2 次提交
    • Y
      x86: use default_cpu_mask_to_apicid for 64bit · f56e5034
      Yinghai Lu 提交于
      Impact: cleanup
      
      Use online_mask directly on 64bit too.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <49C94DAE.9070300@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f56e5034
    • Y
      x86: fix set_extra_move_desc calling · fa74c907
      Yinghai Lu 提交于
      Impact: fix bug with irq-descriptor moving when logical flat
      
      Rusty observed:
      
      > The effect of setting desc->affinity (ie. from userspace via sysfs) has varied
      > over time.  In 2.6.27, the 32-bit code anded the value with cpu_online_map,
      > and both 32 and 64-bit did that anding whenever a cpu was unplugged.
      >
      > 2.6.29 consolidated this into one routine (and fixed hotplug) but introduced
      > another variation: anding the affinity with cfg->domain.  Is this right, or
      > should we just set it to what the user said?  Or as now, indicate that we're
      > restricting it.
      
      Eric pointed out that desc->affinity should be what the user requested,
      if it is at all possible to honor the user space request.
      
      This bug got introduced by commit 22f65d31 "x86: Update io_apic.c to use
      new cpumask API".
      
      Fix it by moving the masking to before the descriptor moving ...
      Reported-by: NRusty Russell <rusty@rustcorp.com.au>
      Reported-by: NEric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <49C94134.4000408@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fa74c907
  7. 23 3月, 2009 6 次提交
  8. 21 3月, 2009 12 次提交
  9. 19 3月, 2009 3 次提交
  10. 18 3月, 2009 9 次提交
    • J
      x86: kprobes.c fix compilation warning · cde5edbd
      Jaswinder Singh Rajput 提交于
      arch/x86/kernel/kprobes.c:196: warning: passing argument 1 of ‘search_exception_tables’ makes integer from pointer without a cast
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference:<49BED952.2050809@redhat.com>
      LKML-Reference: <1237378065.13488.2.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cde5edbd
    • J
      x86: cpu/mttr/cleanup.c fix compilation warning · 4e16c888
      Jaswinder Singh Rajput 提交于
      arch/x86/kernel/cpu/mtrr/cleanup.c:197: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long unsigned int’
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1237378015.13488.1.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4e16c888
    • R
      x86, uv: fix cpumask iterator in uv_bau_init() · 2c74d666
      Rusty Russell 提交于
      Impact: fix boot crash on UV systems
      
      Commit 76ba0ecd "cpumask: use
      cpumask_var_t in uv_flush_tlb_others" used cur_cpu as an iterator;
      it was supposed to be zero for the code below it.
      Reported-by: NCliff Wickman <cpw@sgi.com>
      Original-From: Cliff Wickman <cpw@sgi.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NMike Travis <travis@sgi.com>
      Cc: steiner@sgi.com
      Cc: <stable@kernel.org>
      LKML-Reference: <200903180822.31196.rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2c74d666
    • S
      x86: add x2apic_wrmsr_fence() to x2apic flush tlb paths · ce4e240c
      Suresh Siddha 提交于
      Impact: optimize APIC IPI related barriers
      
      Uncached MMIO accesses for xapic are inherently serializing and hence
      we don't need explicit barriers for xapic IPI paths.
      
      x2apic MSR writes/reads don't have serializing semantics and hence need
      a serializing instruction or mfence, to make all the previous memory
      stores globally visisble before the x2apic msr write for IPI.
      
      Add x2apic_wrmsr_fence() in flush tlb path to x2apic specific paths.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "steiner@sgi.com" <steiner@sgi.com>
      Cc: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <1237313814.27006.203.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ce4e240c
    • A
      x86: use smp_call_function_single() in arch/x86/kernel/cpu/mcheck/mce_amd_64.c · a6b6a14e
      Andrew Morton 提交于
      Attempting to rid us of the problematic work_on_cpu().  Just use
      smp_call_function_single() here.
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      LKML-Reference: <20090318042217.EF3F1DDF39@ozlabs.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a6b6a14e
    • S
      x86: fix broken irq migration logic while cleaning up multiple vectors · 68a8ca59
      Suresh Siddha 提交于
      Impact: fix spurious IRQs
      
      During irq migration, we send a low priority interrupt to the previous
      irq destination. This happens in non interrupt-remapping case after interrupt
      starts arriving at new destination and in interrupt-remapping case after
      modifying and flushing the interrupt-remapping table entry caches.
      
      This low priority irq cleanup handler can cleanup multiple vectors, as
      multiple irq's can be migrated at almost the same time. While
      there will be multiple invocations of irq cleanup handler (one cleanup
      IPI for each irq migration), first invocation of the cleanup handler
      can potentially cleanup more than one vector (as the first invocation can
      see the requests for more than vector cleanup). When we cleanup multiple
      vectors during the first invocation of the smp_irq_move_cleanup_interrupt(),
      other vectors that are to be cleanedup can still be pending in the local
      cpu's IRR (as smp_irq_move_cleanup_interrupt() runs with interrupts disabled).
      
      When we are ready to unhook a vector corresponding to an irq, check if that
      vector is registered in the local cpu's IRR. If so skip that cleanup and
      do a self IPI with the cleanup vector, so that we give a chance to
      service the pending vector interrupt and then cleanup that vector
      allocation once we execute the lowest priority handler.
      
      This fixes spurious interrupts seen when migrating multiple vectors
      at the same time.
      
      [ This is apparently possible even on conventional xapic, although to
        the best of our knowledge it has never been seen.  The stable
        maintainers may wish to consider this one for -stable. ]
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: stable@kernel.org
      68a8ca59
    • S
      x86, ioapic: Fix non atomic allocation with interrupts disabled · 05c3dc2c
      Suresh Siddha 提交于
      Impact: fix possible race
      
      save_mask_IO_APIC_setup() was using non atomic memory allocation while getting
      called with interrupts disabled. Fix this by splitting this into two different
      function. Allocation part save_IO_APIC_setup() now happens before
      disabling interrupts.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      05c3dc2c
    • S
      x86, x2apic: cleanup ifdef CONFIG_INTR_REMAP in io_apic code · 29b61be6
      Suresh Siddha 提交于
      Impact: cleanup
      
      Clean up #ifdefs and replace them with helper functions.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      29b61be6
    • S
      x86, x2apic: cleanup the IO-APIC level migration with interrupt-remapping · 0280f7c4
      Suresh Siddha 提交于
      Impact: simplification
      
      In the current code, for level triggered migration, we need to modify the
      io-apic RTE with the update vector information, along with modifying interrupt
      remapping table entry(IRTE) with vector and destination. This is to ensure that
      remote IRR bit inthe IOAPIC RTE gets cleared when the cpu does EOI.
      
      With this patch, for level triggered, we eliminate the io-apic RTE modification
      (with the updated vector information), by using a virtual vector (io-apic pin
      number).  Real vector that is used for interrupting cpu will be coming from
      the interrupt-remapping table entry. Trigger mode in the IRTE will always be
      edge, and the actual level or edge trigger will be setup in the IO-APIC RTE.
      So a level triggered interrupt will appear as an edge to the local apic
      cpu but still as level to the IO-APIC.
      
      With this change, level irq migration can be done by simply modifying
      the interrupt-remapping table entry with out changing the io-apic RTE.
      And as the interrupt appears as edge at the cpu, in addition to do the
      local apic EOI, we need to do IO-APIC directed EOI to clear the remote
      IRR bit in  the IO-APIC RTE.
      
      This simplies the irq migration in the presence of interrupt-remapping.
      Idea-by: NRajesh Sankaran <rajesh.sankaran@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      0280f7c4