- 27 8月, 2012 1 次提交
-
-
由 Robert Richter 提交于
Under certain workloads we see the following warnings: WQ on CPU0, prefer CPU1 WQ on CPU0, prefer CPU2 WQ on CPU0, prefer CPU3 It warns the user that the wq to access a per-cpu buffers runs not on the same cpu. This happens if the wq is rescheduled on a different cpu than where the buffer is located. This was probably implemented to detect performance issues. Not sure if there actually is one as the buffers are copied to a single buffer anyway which should be the actual bottleneck. We wont change WQ implementation. Since a user can do nothing the warning is pointless. Removing it. Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 15 2月, 2011 1 次提交
-
-
由 Heinz Graalfs 提交于
This patch introduces a new oprofile sample add function (oprofile_add_ext_hw_sample) that can also take task_struct as an argument, which is used by the hwsampler kernel module when copying hardware samples to OProfile buffers. Applied with following changes: * removed #include <linux/module.h> * whitespace changes * removed conditional compilation (CONFIG_HAVE_HWSAMPLER) * modified order of functions * fix missing function definition in header file Signed-off-by: NMahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: NMaran Pakkirisamy <maranp@linux.vnet.ibm.com> Signed-off-by: NHeinz Graalfs <graalfs@linux.vnet.ibm.com> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 29 10月, 2010 1 次提交
-
-
由 Tejun Heo 提交于
flush_scheduled_work() is deprecated and scheduled to be removed. sync_stop() currently cancels cpu_buffer works inside buffer_mutex and flushes the system workqueue outside. Instead, split end_cpu_work() into two parts - stopping further work enqueues and flushing works - and do the former inside buffer_mutex and latter outside. For stable kernels v2.6.35.y and v2.6.36.y. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: stable@kernel.org Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 25 8月, 2010 1 次提交
-
-
由 Robert Richter 提交于
This patch fixes a crash during shutdown reported below. The crash is caused by accessing already freed task structs. The fix changes the order for registering and unregistering notifier callbacks. All notifiers must be initialized before buffers start working. To stop buffer synchronization we cancel all workqueues, unregister the notifier callback and then flush all buffers. After all of this we finally can free all tasks listed. This should avoid accessing freed tasks. On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote: > So the initial observation is a spinlock bad magic followed by a crash > in the spinlock debug code: > > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136 > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03 > > Backtrace looks like: > > spin_bug+0x74/0xd4 > ._raw_spin_lock+0x48/0x184 > ._spin_lock+0x10/0x24 > .get_task_mm+0x28/0x8c > .sync_buffer+0x1b4/0x598 > .wq_sync_buffer+0xa0/0xdc > .worker_thread+0x1d8/0x2a8 > .kthread+0xa8/0xb4 > .kernel_thread+0x54/0x70 > > So we are accessing a freed task struct in the work queue when > processing the samples. Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Cc: stable@kernel.org Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 04 5月, 2010 1 次提交
-
-
由 Phil Carmody 提交于
http://lkml.org/lkml/2010/4/27/285 Protect against dereferencing regs when it's NULL, and force a magic number into pc to prevent too deep processing. This approach permits the dropped samples to be tallied as invalid Instruction Pointer events. e.g. output from about 15mins at 10kHz sample rate: Nr. samples received: 2565380 Nr. samples lost invalid pc: 4 Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 23 4月, 2010 1 次提交
-
-
由 Andi Kleen 提交于
oprofile used a double buffer scheme for its cpu event buffer to avoid races on reading with the old locked ring buffer. But that is obsolete now with the new ring buffer, so simply use a single buffer. This greatly simplifies the code and avoids a lot of sample drops on large runs, especially with call graph. Based on suggestions from Steven Rostedt For stable kernels from v2.6.32, but not earlier. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable <stable@kernel.org> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 01 4月, 2010 1 次提交
-
-
由 Steven Rostedt 提交于
Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put in any place holder where the event was dropped. This is not a trivial thing to add because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In a produce/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add the placeholder for dropped events. When there's more room to write new data, then a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A place holder can not be inserted into the ring buffer since there never may be room. A reader could also come in at anytime and miss the placeholder. Luckily, the way the ring buffer works, the read side can find out if events were lost or not, and how many events. Everytime a write takes place, if it overwrites the header page (the next read) it updates a "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perfom the swap, and then check to see if the number changed, and take the diff if it has, which would be the number of events dropped. This can be stored by the reader and returned to callers of the reader. Since the reader page swap will fail if the writer moved the head page since the time the reader page set up the swap, this gives room to record the overruns without worrying about races. If the reader sets up the pages, records the overrun, than performs the swap, if the swap succeeds, then the overrun variable has not been updated since the setup before the swap. For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer. This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits needs to be used for the actual size (less actually). We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this will change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough. Note, the notification of dropped events is only used by consuming reads or peeking at the ring buffer. Iterating over the ring buffer does not keep this information because the necessary data is only available when a page swap is made, and the iterator does not swap out pages. Cc: Robert Richter <robert.richter@amd.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 29 10月, 2009 1 次提交
-
-
由 Tejun Heo 提交于
This patch updates percpu related symbols in oprofile such that percpu symbols are unique and don't clash with local symbols. This serves two purposes of decreasing the possibility of global percpu symbol collision and allowing dropping per_cpu__ prefix from percpu symbols. * drivers/oprofile/cpu_buffer.c: s/cpu_buffer/op_cpu_buffer/ Partly based on Rusty Russell's "alloc_percpu: rename percpu vars which cause name clashes" patch. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NRobert Richter <robert.richter@amd.com> Cc: Rusty Russell <rusty@rustcorp.com.au>
-
- 12 6月, 2009 1 次提交
-
-
由 Robert Richter 提交于
The IBS implemention writes 64 bit register values to the cpu buffer by writing two 32 values using oprofile_add_data(). This patch introduces oprofile_add_data64() to write a single 64 bit value to the buffer. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 11 6月, 2009 1 次提交
-
-
由 Robert Richter 提交于
This became obsolete with this commit: 6dad828b oprofile: port to the new ring_buffer Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 07 5月, 2009 1 次提交
-
-
由 Robert Richter 提交于
The unit of oprofile_cpu_buffer_size is in samples, but was allocated in bytes. This led to the allocation of too small cpu buffers. This patch recalculates the buffer size in bytes taking also the ring_buffer_event header size into account. Reported-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 06 2月, 2009 1 次提交
-
-
由 Ingo Molnar 提交于
Oprofile's ring-buffer use was not considered. Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 18 1月, 2009 1 次提交
-
-
由 Robert Richter 提交于
Impact: fix crash In case of losing samples struct op_entry could have been used uninitialized causing e.g. a wrong preemption count or NULL pointer access. This patch fixes this. Signed-off-by: NRobert Richter <robert.richter@amd.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 08 1月, 2009 11 次提交
-
-
由 Robert Richter 提交于
This patch creates the new functions oprofile_write_reserve() oprofile_add_data() oprofile_write_commit() and makes them part of the oprofile api. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
The ifdefs can be removed since the code is no longer ibs specific and can be used for other purposes as well. IBS specific code is only in op_model_amd.c. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
The new ring buffer implementation allows the storage of samples with different size. This patch implements the usage of the new sample format to store ibs samples in the cpu buffer. Until now, writing to the cpu buffer could lead to incomplete sampling sequences since IBS samples were transfered in multiple samples. Due to a full buffer, data could be lost at any time. This can't happen any more since the complete data is reserved in advance and then stored in a single sample. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This function can be used to attach data to a sample. It returns the remaining free buffer size that has been reserved with op_cpu_buffer_write_reserve(). Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
Special events such as task or context switches are marked with an escape code in the cpu buffer followed by an event code or a task identifier. There is one escape code per event. To make escape sequences also available for data samples the internal cpu buffer format must be changed. The current implementation does not allow the extension of event codes since this would lead to collisions with the task identifiers. To avoid this, this patch introduces an event mask that allows the storage of multiple events with one escape code. Now, task identifiers are stored in the data section of the sample. The implementation also allows the usage of custom data in a sample. As a side effect the new code is much more readable and easier to understand. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This implements the support of samples with attached data. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This function prepares the cpu buffer to write a sample. Struct op_entry is used during operations on the ring buffer while struct op_sample contains the data that is stored in the ring buffer. Struct entry can be uninitialized. The function reserves a data array that is specified by size. Use op_cpu_buffer_write_commit() after preparing the sample. In case of errors a null pointer is returned, otherwise the pointer to the sample. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
Rename the fucntion to op_add_sample() since there is a collision with another one with the same name in buffer_sync.c. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This code is broken since a TRACE_BEGIN_CODE is never sent to the daemon. The data becomes corrupt since the backtrace is interpreted as ibs sample. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 30 12月, 2008 4 次提交
-
-
由 Robert Richter 提交于
Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This patch removes the unused return parameter in oprofile_begin_trace(). Also, oprofile_begin_trace() and oprofile_end_trace() are inline now. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This patch adds the inline function __oprofile_add_ext_sample() to cpu_buffer.c and thus reduces overhead when calling oprofile_add_sample(). Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
Reordering code to keep alloc/free functions together. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 29 12月, 2008 2 次提交
-
-
由 Robert Richter 提交于
This patch moves ring buffer inline functions to cpu_buffer.c. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This patch renames cpu buffer functions to something more oprofile specific names. Functions will be moved to the global name space. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 17 12月, 2008 1 次提交
-
-
由 Robert Richter 提交于
This patch renames kernel-wide identifiers to something more oprofile specific names. Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 11 12月, 2008 2 次提交
-
-
由 Robert Richter 提交于
The number of lost samples could be greater than the number of received samples. This patches fixes this. The implementation introduces return values for add_sample() and add_code(). Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This function is no longer available after the port to the new ring buffer. Its removal can lead to incomplete sampling sequences since IBS samples and backtraces are transfered in multiple samples. Due to a full buffer, samples could be lost any time. The userspace daemon has to live with such incomplete sampling sequences as long as the data within one sample is consistent. This will be fixed by changing the internal buffer data there all data of one IBS sample or a backtrace is packed in a single ring buffer entry. This is possible since the new ring buffer supports variable data size. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 10 12月, 2008 6 次提交
-
-
由 Robert Richter 提交于
This patch replaces the current oprofile cpu buffer implementation with the ring buffer provided by the tracing framework. The motivation here is to leave the pain of implementing ring buffers to others. Oh, no, there are more advantages. Main reason is the support of different sample sizes that could be stored in the buffer. Use cases for this are IBS and Cell spu profiling. Using the new ring buffer ensures valid and complete samples and allows copying the cpu buffer stateless without knowing its content. Second it will use generic kernel API and also reduce code size. And hopefully, there are less bugs. Since the new tracing ring buffer implementation uses spin locks to protect the buffer during read/write access, it is difficult to use the buffer in an NMI handler. In this case, writing to the buffer by the NMI handler (x86) could occur also during critical sections when reading the buffer. To avoid this, there are 2 buffers for independent read and write access. Read access is in process context only, write access only in the NMI handler. If the read buffer runs empty, both buffers are swapped atomically. There is potentially a small window during swapping where the buffers are disabled and samples could be lost. Using 2 buffers is a little bit overhead, but the solution is clear and does not require changes in the ring buffer implementation. It can be changed to a single buffer solution when the ring buffer access is implemented as non-locking atomic code. The new buffer requires more size to store the same amount of samples because each sample includes an u32 header. Also, there is more code to execute for buffer access. Nonetheless, the buffer implementation is proven in the ftrace environment and worth to use also in oprofile. Patches that changes the internal IBS buffer usage will follow. Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This is in preparation for changes in the cpu buffer implementation. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This is in preparation for changes in the cpu buffer implementation. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This is in preparation for changes in the cpu buffer implementation. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
由 Robert Richter 提交于
This fixes the coding style of some comments. Signed-off-by: NRobert Richter <robert.richter@amd.com>
-
- 21 10月, 2008 1 次提交
-
-
由 Carl Love 提交于
The issue is the SPU code is not holding the kernel mutex lock while adding samples to the kernel buffer. This patch creates per SPU buffers to hold the data. Data is added to the buffers from in interrupt context. The data is periodically pushed to the kernel buffer via a new Oprofile function oprofile_put_buff(). The oprofile_put_buff() function is called via a work queue enabling the funtion to acquire the mutex lock. The existing user controls for adjusting the per CPU buffer size is used to control the size of the per SPU buffers. Similarly, overflows of the SPU buffers are reported by incrementing the per CPU buffer stats. This eliminates the need to have architecture specific controls for the per SPU buffers which is not acceptable to the OProfile user tool maintainer. The export of the oprofile add_event_entry() is removed as it is no longer needed given this patch. Note, this patch has not addressed the issue of indexing arrays by the spu number. This still needs to be fixed as the spu numbering is not guarenteed to be 0 to max_num_spus-1. Signed-off-by: NCarl Love <carll@us.ibm.com> Signed-off-by: NMaynard Johnson <maynardj@us.ibm.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NAcked-by: Robert Richter <robert.richter@amd.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-