1. 19 Dec 2008 - 3 commits
  2. 18 Dec 2008 - 1 commit
  3. 17 Dec 2008 - 1 commit
    • x86, sparseirq: move irq_desc according to smp_affinity, v7 · 48a1b10a
      Committed by Yinghai Lu
      Impact: improve NUMA handling by migrating irq_desc on smp_affinity changes
      
      if CONFIG_NUMA_MIGRATE_IRQ_DESC is set:
      
      -  make irq_desc follow smp_affinity changes, i.e. migrate the irq_desc
      -  call move_irq_desc() in irq_complete_move()
      -  legacy irq_descs are not moved, because they are allocated from a static array
      
      For logical APIC mode, move_desc_in_progress_in_same_domain is needed as
      well, otherwise the descriptor will not be moved; the migration may also
      take two phases to complete.
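      
      The following is a hedged, illustrative sketch of the migration idea only,
      not the actual patch; the helper and field names here are hypothetical:
      
        /* When an irq's affinity moves to a CPU on another NUMA node, allocate a
         * node-local irq_desc there and switch to it; legacy (statically
         * allocated) descriptors are left alone. */
        static struct irq_desc *move_irq_desc_sketch(struct irq_desc *desc, int cpu)
        {
                int node = cpu_to_node(cpu);      /* node of the new affinity target */
        
                if (desc->irq < NR_IRQS_LEGACY)
                        return desc;              /* static descriptor, never moved */
        
                /* hypothetical helper: copy into a kzalloc_node()ed descriptor
                 * on 'node' and publish the new pointer */
                return replace_irq_desc_on_node(desc, node);
        }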
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 16 Dec 2008 - 1 commit
  5. 13 Dec 2008 - 6 commits
    • cpumask: Use all NR_CPUS bits unless CONFIG_CPUMASK_OFFSTACK · 7be75853
      Committed by Rusty Russell
      Impact: futureproof as we convert more code to new APIs
      
      The old cpumask operators treat all NR_CPUS bits as relevant, the new
      ones use nr_cpumask_bits.  For large NR_CPUS and small nr_cpu_ids, this
      makes a difference.
      
      However, mixing the two can cause problems with undefined bits.  An
      arch which sets CONFIG_CPUMASK_OFFSTACK should have converted across
      to the new operators, so it's safe in that case.
      
      (Thanks to Stephen Rothwell for bisecting the initial unused-bits bug,
      and Mike Travis for this solution).
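      
      A hedged illustration of the mismatch this guards against (the scenario is
      made up; the operators are the real old/new cpumask APIs):
      
        cpumask_t mask;
        
        cpus_setall(mask);        /* old-style op: sets bits [0, NR_CPUS)           */
        cpumask_clear(&mask);     /* new-style op: clears only [0, nr_cpumask_bits) */
        
        /* If nr_cpumask_bits < NR_CPUS, the bits above nr_cpumask_bits are still
         * set, and an old-style test such as cpus_weight(mask) will see them.
         * Keeping nr_cpumask_bits == NR_CPUS unless CONFIG_CPUMASK_OFFSTACK is
         * set avoids this mixed-operator trap. */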
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Mike Travis <travis@sgi.com>
    • cpumask: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask · f0b848ce
      Committed by Rusty Russell
      Impact: New APIs
      
      The old node_to_cpumask/pcibus_to_cpumask returned a cpumask_t; the new
      ones return a pointer to a struct cpumask.  This is part of removing
      cpumasks from the stack.
      
      This defines them in the generic non-NUMA case.
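      
      A minimal before/after sketch of a caller being converted (the surrounding
      usage is illustrative; the APIs are the ones introduced here):
      
        /* old: returns a full cpumask_t by value (large on-stack copy for big NR_CPUS) */
        cpumask_t mask = node_to_cpumask(node);
        
        /* new: returns a pointer to a constant mask, nothing copied onto the stack */
        const struct cpumask *node_mask = cpumask_of_node(node);
        const struct cpumask *bus_mask  = cpumask_of_pcibus(bus);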
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
    • cpumask: convert struct clock_event_device to cpumask pointers. · 320ab2b0
      Committed by Rusty Russell
      Impact: change calling convention of existing clock_event APIs
      
      struct clock_event_device's cpumask field is changed to a pointer, and the
      ->broadcast function now takes a pointer as well.
      
      Another single-patch change.  For safety, we BUG_ON() in
      clockevents_register_device() if it's not set.
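      
      A small sketch of what a clockevents driver conversion looks like (driver
      variable names are illustrative):
      
        static struct clock_event_device my_clockevent;
        
        /* old: the field held a cpumask_t value:
         *     my_clockevent.cpumask = cpumask_of_cpu(cpu);          */
        
        /* new: the field is a pointer, so assign the per-cpu constant mask */
        my_clockevent.cpumask = cpumask_of(cpu);
        clockevents_register_device(&my_clockevent);   /* BUG_ON()s if cpumask is unset */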
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
    • cpumask: make irq_set_affinity() take a const struct cpumask · 0de26520
      Committed by Rusty Russell
      Impact: change existing irq_chip API
      
      Not much point with gentle transition here: the struct irq_chip's
      setaffinity method signature needs to change.
      
      Fortunately, not widely used code, but hits a few architectures.
      
      Note: In irq_select_affinity() I save a temporary by mangling
      irq_desc[irq].affinity directly.  Ingo, does this break anything?
      
      (Folded in fix from KOSAKI Motohiro)
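      
      An illustrative sketch of the signature change for a driver's setaffinity
      hook (the driver-side names are hypothetical; the core API is the one
      changed here):
      
        /* old:  static void my_set_affinity(unsigned int irq, cpumask_t dest);  */
        
        /* new: the mask is passed as a const pointer */
        static void my_set_affinity(unsigned int irq, const struct cpumask *dest)
        {
                unsigned int cpu = cpumask_first(dest);   /* pick a target CPU */
                /* ... reprogram the interrupt controller for 'cpu' ... */
        }
        
        /* callers now pass a pointer, e.g.: */
        irq_set_affinity(irq, cpumask_of(cpu));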
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Cc: ralf@linux-mips.org
      Cc: grundler@parisc-linux.org
      Cc: jeremy@xensource.com
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    • cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers · 29c0177e
      Committed by Rusty Russell
      
      Impact: change calling convention of existing cpumask APIs
      
      Most cpumask functions started with cpus_: these have been replaced by
      cpumask_ ones which take struct cpumask pointers as expected.
      
      These four functions don't have good replacement names; fortunately
      they're rarely used, so we just change them over.
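      
      A minimal before/after sketch of one of the four call sites (the buffer and
      mask here are illustrative):
      
        char buf[128];
        cpumask_t mask = CPU_MASK_ALL;
        
        /* old: the mask was passed by value:
         *     len = cpumask_scnprintf(buf, sizeof(buf), mask);       */
        
        /* new: the mask is passed by pointer */
        int len = cpumask_scnprintf(buf, sizeof(buf), &mask);
        
        /* the parsing direction changes the same way
         * (user_buf/count as received from the write() handler) */
        int err = cpumask_parse_user(user_buf, count, &mask);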
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: paulus@samba.org
      Cc: mingo@redhat.com
      Cc: tony.luck@intel.com
      Cc: ralf@linux-mips.org
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: cl@linux-foundation.org
      Cc: srostedt@redhat.com
    • cpumask: centralize cpu_online_map and cpu_possible_map · 98a79d6a
      Committed by Rusty Russell
      Impact: cleanup
      
      Each SMP arch defines these themselves.  Move them to a central
      location.
      
      Twists:
      1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a
         CONFIG_INIT_ALL_POSSIBLE for this rather than break them.
      
      2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
         Those archs simply have phys_cpu_present_map replaced everywhere.
      
      3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
         so I just manipulate them both in sync.
      
      4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
         declarations.
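      
      A rough sketch of what the centralized definitions look like (twist 1), with
      exact attributes and exports simplified:
      
        /* kernel/cpu.c (sketch) */
        #ifdef CONFIG_INIT_ALL_POSSIBLE
        cpumask_t cpu_possible_map __read_mostly = CPU_MASK_ALL;  /* archs that want all-possible */
        #else
        cpumask_t cpu_possible_map __read_mostly;
        #endif
        EXPORT_SYMBOL(cpu_possible_map);
        
        cpumask_t cpu_online_map __read_mostly;
        EXPORT_SYMBOL(cpu_online_map);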
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Reviewed-by: Grant Grundler <grundler@parisc-linux.org>
      Tested-by: Tony Luck <tony.luck@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Mike Travis <travis@sgi.com>
      Cc: ink@jurassic.park.msu.ru
      Cc: rmk@arm.linux.org.uk
      Cc: starvik@axis.com
      Cc: tony.luck@intel.com
      Cc: takata@linux-m32r.org
      Cc: ralf@linux-mips.org
      Cc: grundler@parisc-linux.org
      Cc: paulus@samba.org
      Cc: schwidefsky@de.ibm.com
      Cc: lethal@linux-sh.org
      Cc: wli@holomorphy.com
      Cc: davem@davemloft.net
      Cc: jdike@addtoit.com
      Cc: mingo@redhat.com
  6. 12 Dec 2008 - 5 commits
  7. 11 Dec 2008 - 6 commits
  8. 10 Dec 2008 - 1 commit
    • netpoll: fix race on poll_list resulting in garbage entry · 7b363e44
      Committed by Neil Horman
      A few months back a race was discussed between the netpoll napi service
      path and the fast path through net_rx_action:
      http://kerneltrap.org/mailarchive/linux-netdev/2007/10/16/345470
      
      A patch was submitted for that bug, but I think we missed a case.
      
      Consider the following scenario:
      
      INITIAL STATE
      CPU0 has one napi_struct A on its poll_list
      CPU1 is calling netpoll_send_skb and needs to call poll_napi on the same
      napi_struct A that CPU0 has on its list
      
      
      
      CPU0						CPU1
      net_rx_action					poll_napi
      !list_empty (returns true)			locks poll_lock for A
      						 poll_one_napi
      						  napi->poll
      						   netif_rx_complete
      						    __napi_complete
      						    (removes A from poll_list)
      list_entry(list->next)
      
      
      In the above scenario, net_rx_action assumes that the per-cpu poll_list is
      exclusive to that cpu.  netpoll of course violates that, and because the netpoll
      path can dequeue from the poll list, it's possible for CPU0 to detect a non-empty
      list at the top of the while loop in net_rx_action, but have it become empty by
      the time it calls list_entry.  Since the poll_list isn't surrounded by any other
      structure, the returned data from that list_entry call in this situation is
      garbage, and any number of crashes can result based on what exactly that garbage
      is.
      
      Given that it's not feasible for performance reasons to place exclusive locks
      around each cpu's poll list to provide that mutual exclusion, I think the best
      solution is to modify the netpoll path in such a way that we continue to guarantee
      that the poll_list for a cpu is in fact exclusive to that cpu.  To do this I've
      implemented the patch below.  It adds an additional bit to the state field in
      the napi_struct.  When executing napi->poll from the netpoll path, this bit will
      be set.  When a driver calls netif_rx_complete, if that bit is set, it will not
      remove the napi_struct from the poll_list.  That work will be saved for the next
      iteration of net_rx_action.
      
      I've tested this and it seems to work well.  About the biggest drawback I can
      see to it is the fact that it might result in an extra loop through
      net_rx_action in the event that the device is actually contended for (i.e. the
      netpoll path actually performs all the needed work on the device, and the call
      to net_rx_action winds up doing nothing, except removing the napi_struct from
      the poll_list).  However I think this is probably a small price to pay, given
      that the alternative is a crash.
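      
      A hedged sketch of the mechanism described above; the state-bit name is the
      one this patch introduces, and the call sites are paraphrased, not quoted:
      
        /* netpoll path: mark the napi_struct while servicing it out of band */
        set_bit(NAPI_STATE_NPSVC, &napi->state);
        napi->poll(napi, budget);
        clear_bit(NAPI_STATE_NPSVC, &napi->state);
        
        /* netif_rx_complete()/__napi_complete(): if netpoll is servicing this
         * napi_struct, leave it on the list so the owning cpu's net_rx_action
         * still finds a consistent poll_list on its next iteration */
        if (!test_bit(NAPI_STATE_NPSVC, &napi->state))
                list_del(&napi->poll_list);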
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 09 Dec 2008 - 4 commits
  10. 08 Dec 2008 - 5 commits
    • tracing/function-graph-tracer: append the tracing_graph_flag · 380c4b14
      Committed by Frederic Weisbecker
      Impact: Provide a way to pause the function graph tracer
      
      As suggested by Steven Rostedt, the previous patch that prevented spinlock
      functions from being traced shouldn't have used a raw_spinlock to fix it.
      It's much better to keep lockdep coverage with a normal spinlock, so this
      patch adds a new per-task flag that lets the function graph tracer be
      paused. We can also issue an ftrace_printk without worrying about the
      irrelevant traced spinlocks during insertion.
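      
      A short usage sketch, assuming the pause/unpause helpers this patch adds
      behave as described (they bump a per-task tracing_graph_pause counter):
      
        pause_graph_tracing();          /* graph tracer ignores this task for now */
        ftrace_printk("inside otherwise-traced locking code\n");
        unpause_graph_tracing();        /* graph tracing resumes for this task */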
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing/function-graph-tracer: introduce __notrace_funcgraph to filter special functions · 8b96f011
      Committed by Frederic Weisbecker
      Impact: trace more functions
      
      When the function graph tracer is configured, three more files are excluded
      from tracing just to keep four particular functions from being traced. And
      this impacts the normal function tracer too.
      
      arch/x86/kernel/process_64.c and process_32.c:
      
      I had crashes when I let these files be traced. After some debugging, I saw
      that the "current" task pointer is changed inside __switch_to(), i.e.:
      "write_pda(pcurrent, next_p);" in process_64.c. Since the tracer stores the
      original return address of the function inside current, we got crashes. Only
      __switch_to() has to be excluded from tracing.
      
      kernel/module.c and kernel/extable.c:
      
      Because of a function used internally by the function graph tracer:
      __kernel_text_address()
      
      To let the other functions inside these files be traced, this patch
      introduces the __notrace_funcgraph function prefix, which expands to notrace
      if the function graph tracer is configured and to nothing if not.
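      
      Roughly, the annotation boils down to the following (a sketch of the
      definition and of the __switch_to() usage; details may differ slightly):
      
        #ifdef CONFIG_FUNCTION_GRAPH_TRACER
        #define __notrace_funcgraph     notrace
        #else
        #define __notrace_funcgraph
        #endif
        
        /* usage: excluded only when the graph tracer is configured */
        __notrace_funcgraph struct task_struct *
        __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
        {
                /* ... */
        }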
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, MSI: pass irq_cfg and irq_desc · 3145e941
      Committed by Yinghai Lu
      Impact: simplify code
      
      Pass irq_desc and cfg around, instead of raw IRQ numbers - this way
      we don't have to look it up again and again.
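      
      An illustrative before/after shape of the change (the function names here
      are hypothetical, not from the patch):
      
        /* before: every helper re-derives the descriptor and cfg from the irq number */
        static void program_vector_before(unsigned int irq)
        {
                struct irq_desc *desc = irq_to_desc(irq);
                struct irq_cfg  *cfg  = desc->chip_data;
                /* ... use desc and cfg ... */
        }
        
        /* after: callers that already hold desc/cfg simply pass them down */
        static void program_vector_after(struct irq_desc *desc, struct irq_cfg *cfg)
        {
                /* ... no repeated lookups ... */
        }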
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sparse irq_desc[] array: core kernel and x86 changes · 0b8f1efa
      Committed by Yinghai Lu
      Impact: new feature
      
      Problem on distro kernels: irq_desc[NR_IRQS] takes megabytes of RAM with
      NR_CPUS set to large values. The goal is to be able to scale up to much
      larger NR_IRQS value without impacting the (important) common case.
      
      To solve this, we generalize irq_desc[NR_IRQS] to an (optional) array of
      irq_desc pointers.
      
      When CONFIG_SPARSE_IRQ=y is used, we use kzalloc_node to get irq_desc,
      this also makes the IRQ descriptors NUMA-local (to the site that calls
      request_irq()).
      
      This gets rid of the irq_cfg[] static array on x86 as well: x86 now stores
      the irq_cfg in desc->chip_data.
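      
      A rough sketch of the two modes, with the interfaces simplified and names
      approximate:
      
        #ifdef CONFIG_SPARSE_IRQ
        /* descriptors are kzalloc_node()ed on demand, NUMA-local to the caller */
        struct irq_desc *irq_to_desc(unsigned int irq);   /* may return NULL */
        #else
        extern struct irq_desc irq_desc[NR_IRQS];         /* the old static array */
        #define irq_to_desc(irq)  (&irq_desc[irq])
        #endif
        
        /* x86: the per-irq configuration hangs off the descriptor instead of irq_cfg[] */
        struct irq_cfg *cfg = desc->chip_data;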
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring_buffer: fix comments · 361b73d5
      Committed by Lai Jiangshan
      Impact: comments cleanup
      
      fix incorrect comments for enum ring_buffer_type
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 06 Dec 2008 - 1 commit
  12. 05 Dec 2008 - 1 commit
  13. 04 Dec 2008 - 5 commits
    • [PATCH 2/2] document FMODE_ constants · fc9161e5
      Committed by Christoph Hellwig
      Make sure all FMODE_ constants are documented, and ensure a coherent
      style for the already existing comments.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [PATCH 1/2] kill FMODE_NDELAY_NOW · fd4ce1ac
      Committed by Christoph Hellwig
      Update FMODE_NDELAY before each ioctl call so that we can kill the
      magic FMODE_NDELAY_NOW.  It would be even better to do this directly
      in setfl(), but for that we'd need to have FMODE_NDELAY for all files,
      not just block special files.
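      
      A hedged, simplified sketch of the idea (not the exact call site): refresh
      FMODE_NDELAY from the file's current O_NDELAY setting right before each
      block-device ioctl, so the one-shot FMODE_NDELAY_NOW flag can go away:
      
        /* before dispatching the ioctl */
        if (file->f_flags & O_NDELAY)
                file->f_mode |= FMODE_NDELAY;
        else
                file->f_mode &= ~FMODE_NDELAY;
        /* ... then hand off to the block ioctl handler as before ... */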
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • pid: fix the do_each_pid_task() macro · 5ef64761
      Committed by Steven Rostedt
      Impact: macro side-effects fix
      
      This patch adds parenthesis around 'pid' in the do_each_pid_task
      macro to allow callers to pass in more complex parameters.
      
      e.g.  do_each_pid_task(*pid, type, task)
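      
      A simplified illustration of the hygiene problem being fixed (toy macros,
      assuming the caller holds a struct pid **pp and passes *pp as in the
      example above):
      
        /* unparenthesized: FIRST_LINK_BAD(*pp) expands to *pp->tasks[PIDTYPE_PID],
         * which parses as *(pp->tasks[PIDTYPE_PID]) - not what was meant */
        #define FIRST_LINK_BAD(pid)   pid->tasks[PIDTYPE_PID]
        
        /* parenthesized: FIRST_LINK_GOOD(*pp) expands to (*pp)->tasks[PIDTYPE_PID],
         * which dereferences pp first, as intended */
        #define FIRST_LINK_GOOD(pid)  (pid)->tasks[PIDTYPE_PID]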
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: graph of a single function · ea4e2bc4
      Committed by Steven Rostedt
      This patch adds the file:
      
         /debugfs/tracing/set_graph_function
      
      which can be used along with the function graph tracer.
      
      When this file is empty, the function graph tracer will act as
      usual. When the file has a function in it, the function graph
      tracer will only trace that function.
      
      For example:
      
       # echo blk_unplug > /debugfs/tracing/set_graph_function
       # cat /debugfs/tracing/trace
       [...]
       ------------------------------------------
       | 2)  make-19003  =>  kjournald-2219
       ------------------------------------------
      
       2)               |  blk_unplug() {
       2)               |    dm_unplug_all() {
       2)               |      dm_get_table() {
       2)      1.381 us |        _read_lock();
       2)      0.911 us |        dm_table_get();
       2)      1. 76 us |        _read_unlock();
       2) +   12.912 us |      }
       2)               |      dm_table_unplug_all() {
       2)               |        blk_unplug() {
       2)      0.778 us |          generic_unplug_device();
       2)      2.409 us |        }
       2)      5.992 us |      }
       2)      0.813 us |      dm_table_put();
       2) +   29. 90 us |    }
       2) +   34.532 us |  }
      
      You can add up to 32 functions into this file. Currently we limit it
      to 32, but this may change with later improvements.
      
      To add another function, use the append '>>':
      
        # echo sys_read >> /debugfs/tracing/set_graph_function
        # cat /debugfs/tracing/set_graph_function
        blk_unplug
        sys_read
      
      Using the '>' will clear out the function and write anew:
      
        # echo sys_write > /debug/tracing/set_graph_function
        # cat /debug/tracing/set_graph_function
        sys_write
      
      Note, if you have function graph running while doing this, the small
      time between clearing it and updating it will cause the graph to
      record all functions. This should not be an issue because after
      it sets the filter, only those functions will be recorded from then on.
      If you need to only record a particular function then set this
      file first before starting the function graph tracer. In the future
      this side effect may be corrected.
      
      The set_graph_function file is similar to the set_ftrace_filter but
      it does not take wild cards nor does it allow for more than one
      function to be set with a single write. There is no technical reason why
      this is the case; I just have not had the time yet to implement that.
      
      Note, dynamic ftrace must be enabled for this to appear because it
      uses the dynamic ftrace records to match the name to the mcount
      call sites.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • can: Fix CAN_(EFF|RTR)_FLAG handling in can_filter · d253eee2
      Committed by Oliver Hartkopp
      Due to a wrong safety check in af_can.c it was not possible to filter
      for SFF frames with a specific CAN identifier without getting the
      same selected CAN identifier from a received EFF frame also.
      
      This fix has a minimal (but user visible) impact on the CAN filter
      API and therefore the CAN version is set to a new date.
      
      Indeed the 'old' API still works as-is. But when you now set
      CAN_(EFF|RTR)_FLAG in can_filter.can_mask you might get less traffic
      than before - but still exactly the traffic you expected to get for your
      defined filter.
      
      Thanks to Kurt Van Dijck for pointing at this issue and for the review.
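      
      A small user-space usage example (socket setup omitted): with the fix, a
      filter like this really matches only the SFF data frame 0x123 and no longer
      lets the corresponding EFF identifier through:
      
        struct can_filter rfilter = {
                .can_id   = 0x123,                                      /* SFF identifier */
                .can_mask = CAN_SFF_MASK | CAN_EFF_FLAG | CAN_RTR_FLAG,
        };
        
        setsockopt(sock, SOL_CAN_RAW, CAN_RAW_FILTER, &rfilter, sizeof(rfilter));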
      Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
      Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>