1. 12 Jun 2009, 2 commits
  2. 10 Jun 2009, 4 commits
    • tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK · f1db457c
      Authored by Li Zefan
      Fix build failures when CONFIG_BLOCK == n.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2F1520.8020003@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • spinlock: Add missing __raw_spin_lock_flags() stub for UP · 04dce7d9
      Authored by Benjamin Herrenschmidt
      This was only defined with CONFIG_DEBUG_SPINLOCK set, but some
      obscure arch/powerpc code wants it always.
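
      The fix is essentially a one-line stub. A sketch of the UP variant,
      assuming the flags argument can simply be ignored when there is no
      real spinning:

        /* On UP there is nothing to spin on; the _flags variant can
         * discard the flags and defer to the plain lock operation. */
        #define __raw_spin_lock_flags(lock, flags)  __raw_spin_lock(lock)
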
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: add trace_seq_vprint interface · 725c624a
      Authored by Steven Rostedt
      The code that updates the print formats for events needs a
      vprintf-style interface for the trace_seq. This patch adds that
      interface.
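
      A minimal sketch of how a vprintf-style interface like this is
      consumed; the wrapper below is hypothetical, only trace_seq_vprintf()
      itself comes from this patch:

        #include <stdarg.h>

        /* hypothetical caller: forwards its varargs to the new interface */
        static int event_format_printf(struct trace_seq *s, const char *fmt, ...)
        {
                va_list ap;
                int ret;

                va_start(ap, fmt);
                ret = trace_seq_vprintf(s, fmt, ap);
                va_end(ap);

                return ret;
        }
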
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing/events: convert block trace points to TRACE_EVENT() · 55782138
      Authored by Li Zefan
      TRACE_EVENT is a more generic way to define tracepoints. Converting
      the block tracepoints to it adds these new capabilities:
      
        - zero-copy and per-cpu splice() tracing
        - binary tracing without printf overhead
        - structured logging records exposed under /debug/tracing/events
        - trace events embedded in function tracer output and other plugins
        - user-defined, per tracepoint filter expressions
        ...
      
      Cons:
      
        - no dev_t info for the output of plug, unplug_timer and unplug_io events.
          no dev_t info for getrq and sleeprq events if bio == NULL.
          no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.
      
          This is mainly because we can't get the device from a request queue.
          But this may change in the future.
      
        - A packet command is converted to a string in TP_fast_assign(), not
          TP_printk(), whereas blktrace does the conversion just before output.
      
          Since pc requests should be rather rare, this is not a big issue.
      
        - In blktrace, an event can have 2 different print formats, but a
          TRACE_EVENT has a single format, which means a trace entry carries
          some unused data.
      
          The overhead is minimized by using __dynamic_array() instead of __array().
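
      To illustrate the shape of the conversion, here is a trimmed,
      hypothetical TRACE_EVENT() in the style of this patch (the real
      definitions live in include/trace/events/block.h; variable-sized
      fields such as the packet command use __dynamic_array() as noted
      above):

        TRACE_EVENT(block_plug,

                TP_PROTO(struct request_queue *q),

                TP_ARGS(q),

                TP_STRUCT__entry(
                        __array(char, comm, TASK_COMM_LEN)
                ),

                TP_fast_assign(
                        memcpy(__entry->comm, current->comm, TASK_COMM_LEN);
                ),

                TP_printk("[%s]", __entry->comm)
        );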
      
      I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:
      
            dd                   dd + ioctl blktrace       dd + TRACE_EVENT (splice)
      1     7.36s, 42.7 MB/s     7.50s, 42.0 MB/s          7.41s, 42.5 MB/s
      2     7.43s, 42.3 MB/s     7.48s, 42.1 MB/s          7.43s, 42.4 MB/s
      3     7.38s, 42.6 MB/s     7.45s, 42.2 MB/s          7.41s, 42.5 MB/s
      
      So the overhead of tracing is very small, and there is no regression
      when using these trace events vs blktrace.
      
      And the binary output of TRACE_EVENT is much smaller than blktrace:
      
       # ls -l -h
       -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
       -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
       -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out
      
      The following are some comparisons between TRACE_EVENT and blktrace:
      
      plug:
        kjournald-480   [000]   303.084981: block_plug: [kjournald]
        kjournald-480   [000]   303.084981:   8,0    P   N [kjournald]
      
      unplug_io:
        kblockd/0-118   [000]   300.052973: block_unplug_io: [kblockd/0] 1
        kblockd/0-118   [000]   300.052974:   8,0    U   N [kblockd/0] 1
      
      remap:
        kjournald-480   [000]   303.085042: block_remap: 8,0 W 102736992 + 8 <- (8,8) 33384
        kjournald-480   [000]   303.085043:   8,0    A   W 102736992 + 8 <- (8,8) 33384
      
      bio_backmerge:
        kjournald-480   [000]   303.085086: block_bio_backmerge: 8,0 W 102737032 + 8 [kjournald]
        kjournald-480   [000]   303.085086:   8,0    M   W 102737032 + 8 [kjournald]
      
      getrq:
        kjournald-480   [000]   303.084974: block_getrq: 8,0 W 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084975:   8,0    G   W 102736984 + 8 [kjournald]
      
        bash-2066  [001]  1072.953770:   8,0    G   N [bash]
        bash-2066  [001]  1072.953773: block_getrq: 0,0 N 0 + 0 [bash]
      
      rq_complete:
        konsole-2065  [001]   300.053184: block_rq_complete: 8,0 W () 103669040 + 16 [0]
        konsole-2065  [001]   300.053191:   8,0    C   W 103669040 + 16 [0]
      
        ksoftirqd/1-7   [001]  1072.953811:   8,0    C   N (5a 00 08 00 00 00 00 00 24 00) [0]
        ksoftirqd/1-7   [001]  1072.953813: block_rq_complete: 0,0 N (5a 00 08 00 00 00 00 00 24 00) 0 + 0 [0]
      
      rq_insert:
        kjournald-480   [000]   303.084985: block_rq_insert: 8,0 W 0 () 102736984 + 8 [kjournald]
        kjournald-480   [000]   303.084986:   8,0    I   W 102736984 + 8 [kjournald]
      
      Changelog from v2 -> v3:
      
      - use the newly introduced __dynamic_array().
      
      Changelog from v1 -> v2:
      
      - use __string() instead of __array() to minimize the memory required
        to store the hex dump of rq->cmd.
      
      - support large pc requests.
      
      - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.
      
      - some cleanups.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4A2DF669.5070905@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 09 Jun 2009, 2 commits
    • cpumask: introduce zalloc_cpumask_var · 0281b5dc
      Authored by Yinghai Lu
      This lets callers get a cpumask_var_t that is already cleared, i.e.
      alloc_cpumask_var() plus cpumask_clear() in one call.
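
      A sketch of the intended usage (the caller below is hypothetical):
      zalloc_cpumask_var() behaves like alloc_cpumask_var() followed by
      cpumask_clear():

        static int example_alloc(void)  /* hypothetical caller */
        {
                cpumask_var_t mask;

                if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
                        return -ENOMEM;

                /* mask is guaranteed empty; no cpumask_clear() needed */

                free_cpumask_var(mask);
                return 0;
        }
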
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • ring-buffer: pass in lockdep class key for reader_lock · 1f8a6a10
      Authored by Peter Zijlstra
      On Sun, 7 Jun 2009, Ingo Molnar wrote:
      > Testing tracer sched_switch: <6>Starting ring buffer hammer
      > PASSED
      > Testing tracer sysprof: PASSED
      > Testing tracer function: PASSED
      > Testing tracer irqsoff:
      > =============================================
      > PASSED
      > Testing tracer preemptoff: PASSED
      > Testing tracer preemptirqsoff: [ INFO: possible recursive locking detected ]
      > PASSED
      > Testing tracer branch: 2.6.30-rc8-tip-01972-ge5b9078-dirty #5760
      > ---------------------------------------------
      > rb_consumer/431 is trying to acquire lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c109eef7>] ring_buffer_reset_cpu+0x37/0x70
      >
      > but task is already holding lock:
      >  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      >
      > other info that might help us debug this:
      > 1 lock held by rb_consumer/431:
      >  #0:  (&cpu_buffer->reader_lock){......}, at: [<c10a019e>] ring_buffer_consume+0x7e/0xc0
      
      The ring buffer is a generic structure, and can be used outside of
      ftrace. If ftrace traces within the use of the ring buffer, it can produce
      false positives with lockdep.
      
      This patch passes in a static lock key into the allocation of the ring
      buffer, so that different ring buffers will have their own lock class.
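
      A sketch of the approach: a wrapper macro mints one static lock class
      key per allocation site and hands it to the allocator (simplified from
      the patch):

        /* each call site gets its own static key, hence its own class */
        #define ring_buffer_alloc(size, flags)                  \
        ({                                                      \
                static struct lock_class_key __key;             \
                __ring_buffer_alloc((size), (flags), &__key);   \
        })
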
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1244477919.13761.9042.camel@twins>
      
      [ store key in ring buffer descriptor ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. 07 Jun 2009, 1 commit
  5. 05 Jun 2009, 1 commit
    • ptrace: tracehook_report_clone: fix false positives · 087eb437
      Authored by Oleg Nesterov
      The "trace || CLONE_PTRACE" check in tracehook_report_clone() is not right:
      
      - If the untraced task does clone(CLONE_PTRACE) the new child is not traced,
        we must not queue SIGSTOP.
      
      - If we forked the traced task, but the tracer exits and untraces both the
        forking task and the new child (after copy_process() drops tasklist_lock),
        we should not queue SIGSTOP either.
      
      Change the code to check task_ptrace() != 0 instead. This is still racy, but
      the race is harmless.
      
      We can race with another tracer attaching to this child, or the tracer can
      exit and detach in parallel. But given that we didn't do wake_up_new_task()
      yet, the child must have the pending SIGSTOP anyway.
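
      A sketch of the resulting check, simplified from the patch; the point
      is that task_ptrace() alone now decides whether SIGSTOP is queued:

        static inline void tracehook_report_clone(struct pt_regs *regs,
                                                  unsigned long clone_flags,
                                                  pid_t pid,
                                                  struct task_struct *child)
        {
                if (unlikely(task_ptrace(child))) {
                        /* whoever attached (or is attaching), the
                         * pending SIGSTOP is correct in any case */
                        sigaddset(&child->pending.signal, SIGSTOP);
                        set_tsk_thread_flag(child, TIF_SIGPENDING);
                }
        }
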
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Roland McGrath <roland@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 03 Jun 2009, 1 commit
  7. 02 Jun 2009, 1 commit
  8. 01 Jun 2009, 2 commits
  9. 29 May 2009, 3 commits
  10. 27 May 2009, 2 commits
    • tracing: add __print_symbolic to trace events · 0f4fc29d
      Authored by Steven Rostedt
      This patch adds __print_symbolic, which is similar to __print_flags
      but works for an enumeration type instead. That is, there is a
      one-to-one mapping between the values and the symbols. When a match is
      made, the symbol is printed; otherwise the hex value is output.
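
      A hypothetical TP_printk() fragment showing the usage; the state
      values and names below are made up:

        TP_printk("state=%s",
                __print_symbolic(__entry->state,
                        { 0, "IDLE"    },
                        { 1, "RUNNING" },
                        { 2, "BLOCKED" }))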
      
      [ Impact: add interface for showing symbol names in events ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
    • tracing: add __print_flags for events · be74b73a
      Authored by Steven Rostedt
      Developers have been asking for the ability in the ftrace event tracer
      to display names of bits in a flags variable.
      
      Instead of printing out c2, it would be easier to read FOO|BAR|GOO,
      assuming that FOO is bit 1, BAR is bit 6 and GOO is bit 7.
      
      Some examples where this would be useful are the state flags in a context
      switch, kmalloc flags, and even permission flags in accessing files.
      
      [
        v2 changes include:
      
        Frederic Weisbecker's idea of using a mask instead of bits,
        thus we can output GFP_KERNEL instead of GFP_WAIT|GFP_IO|GFP_FS.
      
        Li Zefan's idea of allowing the caller of __print_flags to add their
        own delimiter (or no delimiter), so that for file permissions we can
        get rwx instead of r|w|x.
      ]
      
      [
        v3 changes:
      
         Christoph Hellwig's idea of using an array instead of va_args.
      ]
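
      A hypothetical TP_printk() fragment showing the usage; the flag names
      are illustrative, and the second argument is the caller-chosen
      delimiter (which may be empty for rwx-style output):

        TP_printk("gfp_flags=%s",
                __print_flags(__entry->gfp_flags, "|",
                        { __GFP_WAIT, "GFP_WAIT" },
                        { __GFP_IO,   "GFP_IO"   },
                        { __GFP_FS,   "GFP_FS"   }))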
      
      [ Impact: better displaying of flags in trace output ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
  11. 25 May 2009, 1 commit
    • netfilter: nf_ct_tcp: fix accepting invalid RST segments · bfcaa502
      Authored by Jozsef Kadlecsik
      Robert L Mathews discovered that some clients send evil TCP RST segments,
      which are accepted by netfilter conntrack but discarded by the
      destination. Thus the conntrack entry is destroyed but the destination
      retransmits data until timeout.
      
      The same technique, i.e. sending properly crafted RST segments, can easily
      be used to bypass connlimit/connbytes based restrictions (the sample
      script written by Robert can be found in the netfilter mailing list
      archives).
      
      The patch below adds a new flag and new field to struct ip_ct_tcp_state so
      that checking RST segments can be made more strict and thus TCP conntrack
      can catch the invalid ones: the RST segment is accepted only if its
      sequence number is higher than or equal to the highest ack we have seen
      from the other direction. (The last_ack field cannot be reused because it is used
      to catch resent packets.)
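
      A simplified sketch of the stricter check; the flag and field names
      here are this sketch's assumption for the new flag and field mentioned
      above, and the surrounding conntrack logic is omitted:

        /* seen = tracked TCP state of the opposite direction; accept the
         * RST only if its seq is >= the highest ack seen from that side */
        if (index == TCP_RST_SET
            && (seen->flags & IP_CT_TCP_FLAG_MAXACK_SET)
            && before(ntohl(th->seq), seen->td_maxack))
                return -NF_ACCEPT;      /* invalid RST: keep the entry */
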
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  12. 24 May 2009, 1 commit
  13. 22 May 2009, 2 commits
  14. 21 May 2009, 1 commit
  15. 19 May 2009, 1 commit
  16. 18 May 2009, 3 commits
    • [ARM] Double check memmap is actually valid with a memmap has unexpected holes V2 · eb33575c
      Authored by Mel Gorman
      pfn_valid() is meant to be able to tell if a given PFN has valid memmap
      associated with it or not. In FLATMEM, it is expected that holes always
      have valid memmap as long as there is valid PFNs either side of the hole.
      In SPARSEMEM, it is assumed that a valid section has a memmap for the
      entire section.
      
      However, ARM and maybe other embedded architectures in the future free
      memmap backing holes to save memory on the assumption the memmap is never
      used. The page_zone linkages are then broken even though pfn_valid()
      returns true. A walker of the full memmap must then do this additional
      check to ensure the memmap they are looking at is sane by making sure the
      zone and PFN linkages are still valid. This is expensive, but walkers of
      the full memmap are extremely rare.
      
      This was caught before for FLATMEM and hacked around but it hits again for
      SPARSEMEM because the page_zone linkages can look ok where the PFN linkages
      are totally screwed. This looks like a hatchet job, but the reality is that
      any clean solution would end up consuming all the memory saved by punching
      these unexpected holes in the memmap. For example, we tried marking the
      memmap within the section invalid but the section size exceeds the size of
      the hole in most cases so pfn_valid() starts returning false where valid
      memmap exists. Shrinking the size of the section would increase memory
      consumption offsetting the gains.
      
      This patch identifies when an architecture is punching unexpected holes
      in the memmap that the memory model cannot automatically detect and sets
      ARCH_HAS_HOLES_MEMORYMODEL. At the moment, this is restricted to EP93xx
      which is the model sub-architecture this has been reported on but may expand
      later. When set, walkers of the full memmap must call memmap_valid_within()
      for each PFN and passing in what it expects the page and zone to be for
      that PFN. If it finds the linkages to be broken, it assumes the memmap is
      invalid for that PFN.
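
      A sketch of what a full-memmap walker looks like after this change
      (loop body abbreviated):

        for (pfn = zone->zone_start_pfn; pfn < end_pfn; pfn++) {
                struct page *page;

                if (!pfn_valid(pfn))
                        continue;

                page = pfn_to_page(pfn);

                /* pfn_valid() can be true even where the arch freed the
                 * memmap; double-check the page/zone linkages */
                if (!memmap_valid_within(pfn, page, zone))
                        continue;

                /* ... page is now safe to inspect ... */
        }
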
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • mm, x86: remove MEMORY_HOTPLUG_RESERVE related code · 888a589f
      Authored by Yinghai Lu
      after:
      
       | commit b263295d
       | Author: Christoph Lameter <clameter@sgi.com>
       | Date:   Wed Jan 30 13:30:47 2008 +0100
       |
       |    x86: 64-bit, make sparsemem vmemmap the only memory model
      
      we don't have MEMORY_HOTPLUG_RESERVE anymore.
      
      Historically, x86-64 had an architecture-specific method for memory hotplug
      whereby it scanned the SRAT for physical memory ranges that could be
      potentially used for memory hot-add later. By reserving those ranges
      without physical memory, the memmap would be allocated and left dormant
      until needed. This depended on the DISCONTIG memory model which has been
      removed so the code implementing HOTPLUG_RESERVE is now dead.
      
      This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE.
      
      (Changelog authored by Mel.)
      
      v2: updated changelog, and removed hotadd= from the documentation
      
      [ Impact: remove dead code ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Workflow-found-OK-by: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4A0C4910.7090508@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • reiserfs: fixup perms when xattrs are disabled · b83674c0
      Authored by Jeff Mahoney
      This adds CONFIG_REISERFS_FS_XATTR protection to reiserfs_permission.
      
      This is needed to avoid warnings during file deletions and chowns with
      xattrs disabled.
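
      A sketch of the guard pattern this describes; the xattr-era branch
      shown is illustrative, not the exact reiserfs body:

        int reiserfs_permission(struct inode *inode, int mask)
        {
        #ifdef CONFIG_REISERFS_FS_XATTR
                /* xattr-aware checks are compiled in only when enabled */
                if (IS_PRIVATE(inode))
                        return 0;
        #endif
                return generic_permission(inode, mask, NULL);
        }
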
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 16 May 2009, 1 commit
    • libata: Media rotation rate and form factor heuristics · 4bca3286
      Authored by Martin K. Petersen
      This patch provides new heuristics for parsing both the form factor and
      media rotation rate ATA IDENTIFY words.
      
      The reported ATA version must be 7 or greater and the device must return
      values defined as valid in the standard.  Only then are the
      characteristics reported to SCSI via the VPD B1 page.
      
      This seems like a reasonable compromise to me, considering that we have
      been shipping several kernel releases that key off the rotation rate bit
      without any version checking whatsoever, with no complaints so far.
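
      A sketch of the rotation-rate half of the heuristic; the helper name
      is hypothetical, and word 217 is the ATA8-ACS nominal media rotation
      rate:

        static bool rotation_rate_valid(const u16 *id)  /* hypothetical */
        {
                u16 rate = id[217];

                /* require a device reporting ATA major version >= 7 */
                if (ata_id_major_version(id) < 7)
                        return false;

                /* 0x0001 = non-rotating (SSD); 0x0401..0xfffe = RPM */
                return rate == 0x0001 ||
                       (rate >= 0x0401 && rate <= 0xfffe);
        }
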
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
  18. 15 May 2009, 3 commits
    • sched, timers: cleanup avenrun users · 2d02494f
      Authored by Thomas Gleixner
      avenrun is a rough estimate, so we don't have to worry about
      consistency of the three avenrun values. Remove the xtime_lock
      dependency and provide a function to scale the values. Clean up the
      users.
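
      A sketch of a reader using the new accessor, modeled on the
      /proc/loadavg style; LOAD_INT/LOAD_FRAC are the usual fixed-point
      helpers from that code:

        static void show_loadavg(void)  /* hypothetical caller */
        {
                unsigned long avnrun[3];

                /* snapshot without xtime_lock; the offset rounds the
                 * values for two-decimal output */
                get_avenrun(avnrun, FIXED_1/200, 0);

                printk("%lu.%02lu %lu.%02lu %lu.%02lu\n",
                       LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]),
                       LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),
                       LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]));
        }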
      
      [ Impact: cleanup ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
    • sched, timers: move calc_load() to scheduler · dce48a84
      Authored by Thomas Gleixner
      Dimitri Sivanich noticed that xtime_lock is held write locked across
      calc_load() which iterates over all online CPUs. That can cause long
      latencies for xtime_lock readers on large SMP systems. 
      
      The load average calculation is a rough estimate anyway, so there is
      no real need to protect the readers vs. the update. It's not a problem
      when the avenrun array is updated while a reader copies the values.
      
      Instead of iterating over all online CPUs let the scheduler_tick code
      update the number of active tasks shortly before the avenrun update
      happens. The avenrun update itself is handled by the CPU which calls
      do_timer().
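
      A simplified sketch of the scheme: each CPU folds its active-task
      delta into a global counter from the scheduler tick, and the
      do_timer() CPU later turns that counter into the avenrun update:

        static atomic_long_t calc_load_tasks;  /* global active count */

        static void calc_load_account_active(struct rq *this_rq)
        {
                long nr_active, delta;

                nr_active = this_rq->nr_running +
                            this_rq->nr_uninterruptible;
                delta = nr_active - this_rq->calc_load_active;
                this_rq->calc_load_active = nr_active;

                atomic_long_add(delta, &calc_load_tasks);
        }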
      
      [ Impact: reduce xtime_lock write locked section ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
    • Revert "mm: add /proc controls for pdflush threads" · cd17cbfd
      Authored by Jens Axboe
      This reverts commit fafd688e.
      
      Work is progressing to switch away from pdflush as the process backing
      for flushing out dirty data. So it seems pointless to add more knobs
      to control pdflush threads. The original author of the patch did not
      have any specific use cases for adding the knobs, so we can easily
      revert this before 2.6.30 to avoid having to maintain this API
      forever.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  19. 13 May 2009, 2 commits
  20. 09 May 2009, 6 commits