- 20 4月, 2018 3 次提交
-
-
由 Martin KaFai Lau 提交于
This patch checks a few things of struct's members: 1) It has a valid size (e.g. a "const void" is invalid) 2) A member's size (+ its member's offset) does not exceed the containing struct's size. 3) The member's offset satisfies the alignment requirement The above can only be done after the needs_resolve member's type is resolved. Hence, the above is done together in btf_struct_resolve(). Each possible member's type (e.g. int, enum, modifier...) implements the check_member() ops which will be called from btf_struct_resolve(). Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
After collecting all btf_type in the first pass in an earlier patch, the second pass (in this patch) can validate the reference types (e.g. the referring type does exist and it does not refer to itself). While checking the reference type, it also gathers other information (e.g. the size of an array). This info will be useful in checking the struct's members in a later patch. They will also be useful in doing pretty print later. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
This patch introduces BPF type Format (BTF). BTF (BPF Type Format) is the meta data format which describes the data types of BPF program/map. Hence, it basically focus on the C programming language which the modern BPF is primary using. The first use case is to provide a generic pretty print capability for a BPF map. BTF has its root from CTF (Compact C-Type format). To simplify the handling of BTF data, BTF removes the differences between small and big type/struct-member. Hence, BTF consistently uses u32 instead of supporting both "one u16" and "two u32 (+padding)" in describing type and struct-member. It also raises the number of types (and functions) limit from 0x7fff to 0x7fffffff. Due to the above changes, the format is not compatible to CTF. Hence, BTF starts with a new BTF_MAGIC and version number. This patch does the first verification pass to the BTF. The first pass checks: 1. meta-data size (e.g. It does not go beyond the total btf's size) 2. name_offset is valid 3. Each BTF_KIND (e.g. int, enum, struct....) does its own check of its meta-data. Some other checks, like checking a struct's member is referring to a valid type, can only be done in the second pass. The second verification pass will be implemented in the next patch. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 17 4月, 2018 3 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Changing API xdp_return_frame() to take struct xdp_frame as argument, seems like a natural choice. But there are some subtle performance details here that needs extra care, which is a deliberate choice. When de-referencing xdp_frame on a remote CPU during DMA-TX completion, result in the cache-line is change to "Shared" state. Later when the page is reused for RX, then this xdp_frame cache-line is written, which change the state to "Modified". This situation already happens (naturally) for, virtio_net, tun and cpumap as the xdp_frame pointer is the queued object. In tun and cpumap, the ptr_ring is used for efficiently transferring cache-lines (with pointers) between CPUs. Thus, the only option is to de-referencing xdp_frame. It is only the ixgbe driver that had an optimization, in which it can avoid doing the de-reference of xdp_frame. The driver already have TX-ring queue, which (in case of remote DMA-TX completion) have to be transferred between CPUs anyhow. In this data area, we stored a struct xdp_mem_info and a data pointer, which allowed us to avoid de-referencing xdp_frame. To compensate for this, a prefetchw is used for telling the cache coherency protocol about our access pattern. My benchmarks show that this prefetchw is enough to compensate the ixgbe driver. V7: Adjust for commit d9314c47 ("i40e: add support for XDP_REDIRECT") V8: Adjust for commit bd658dda ("net/mlx5e: Separate dma base address and offset in dma_sync call") Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
The generic xdp_frame format, was inspired by the cpumap own internal xdp_pkt format. It is now time to convert it over to the generic xdp_frame format. The cpumap needs one extra field dev_rx. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jesper Dangaard Brouer 提交于
Introduce an xdp_return_frame API, and convert over cpumap as the first user, given it have queued XDP frame structure to leverage. V3: Cleanup and remove C99 style comments, pointed out by Alex Duyck. V6: Remove comment that id will be added later (Req by Alex Duyck) V8: Rename enum mem_type to xdp_mem_type (found by kbuild test robot) Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 4月, 2018 8 次提交
-
-
由 Matthew Wilcox 提交于
This results in no change in structure size on 64-bit machines as it fits in the padding between the gfp_t and the void *. 32-bit machines will grow the structure from 8 to 12 bytes. Almost all radix trees are protected with (at least) a spinlock, so as they are converted from radix trees to xarrays, the data structures will shrink again. Initialising the spinlock requires a name for the benefit of lockdep, so RADIX_TREE_INIT() now needs to know the name of the radix tree it's initialising, and so do IDR_INIT() and IDA_INIT(). Also add the xa_lock() and xa_unlock() family of wrappers to make it easier to use the lock. If we could rely on -fplan9-extensions in the compiler, we could avoid all of this syntactic sugar, but that wasn't added until gcc 4.6. Link: http://lkml.kernel.org/r/20180313132639.17387-8-willy@infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NJeff Layton <jlayton@kernel.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Waiman Long 提交于
Kdoc comments are added to the do_proc_dointvec_minmax_conv_param and do_proc_douintvec_minmax_conv_param structures thare are used internally for range checking. The error codes returned by proc_dointvec_minmax() and proc_douintvec_minmax() are also documented. Link: http://lkml.kernel.org/r/1519926220-7453-3-git-send-email-longman@redhat.comSigned-off-by: NWaiman Long <longman@redhat.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NLuis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chris Wilson 提交于
As using an unsafe module parameter is, by its very definition, an expected user action, emitting a warning is overkill. Nothing has yet gone wrong, and we add a taint flag for any future oops should something actually go wrong. So instead of having a user controllable pr_warn, downgrade it to a pr_notice for "a normal, but significant condition". We make use of unsafe kernel parameters in igt (https://cgit.freedesktop.org/drm/igt-gpu-tools/) (we have not yet succeeded in removing all such debugging options), which generates a warning and taints the kernel. The warning is unhelpful as we then need to filter it out again as we check that every test themselves do not provoke any kernel warnings. Link: http://lkml.kernel.org/r/20180226151919.9674-1-chris@chris-wilson.co.uk Fixes: 91f9d330 ("module: make it possible to have unsafe, tainting module params") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NJani Nikula <jani.nikula@intel.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jean Delvare <khali@linux-fr.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Petri Latvala <petri.latvala@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Randy Dunlap 提交于
Fix sizeof argument to be the same as the data variable name. Probably a copy/paste error. Mostly harmless since both variables are unsigned int. Fixes kernel bugzilla #197371: Possible access to unintended variable in "kernel/sysctl.c" line 1339 https://bugzilla.kernel.org/show_bug.cgi?id=197371 Link: http://lkml.kernel.org/r/e0d0531f-361e-ef5f-8499-32743ba907e1@infradead.orgSigned-off-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: NPetru Mihancea <petrum@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
So "struct uts_namespace" can enjoy fine-grained SLAB debugging and usercopy protection. I'd prefer shorter name "utsns" but there is "user_namespace" already. Link: http://lkml.kernel.org/r/20180228215158.GA23146@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
Since the randstruct plugin can intentionally produce extremely unusual kernel structure layouts (even performance pathological ones), some maintainers want to be able to trivially determine if an Oops is coming from a randstruct-built kernel, so as to keep their sanity when debugging. This adds the new flag and initializes taint_mask immediately when built with randstruct. Link: http://lkml.kernel.org/r/1519084390-43867-4-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
This consolidates the taint bit documentation into a single place with both numeric and letter values. Additionally adds the missing TAINT_AUX documentation. Link: http://lkml.kernel.org/r/1519084390-43867-3-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kees Cook 提交于
This converts to using indexed initializers instead of comments, adds a comment on why the taint flags can't be an enum, and make sure that no one forgets to update the taint_flags when adding new bits. Link: http://lkml.kernel.org/r/1519084390-43867-2-git-send-email-keescook@chromium.orgSigned-off-by: NKees Cook <keescook@chromium.org> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 4月, 2018 1 次提交
-
-
由 Arnd Bergmann 提交于
The use of bitfields seems to confuse gcc, leading to a false-positive warning in all compiler versions: kernel/time/tick-sched.c: In function 'tick_nohz_idle_exit': kernel/time/tick-sched.c:538:2: error: 'now' may be used uninitialized in this function [-Werror=maybe-uninitialized] This introduces a temporary variable to track the flags so gcc doesn't have to evaluate twice, eliminating the code path that leads to the warning. Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85301 Fixes: 1cae544d42d2 ("nohz: Gather tick_sched booleans under a common flag field") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 09 4月, 2018 6 次提交
-
-
由 Gustavo A. R. Silva 提交于
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: NPavel Machek <pavel@ucw.cz> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Rafael J. Wysocki 提交于
Use timerqueue_iterate_next() to get to the next timer in __hrtimer_next_event_base() without browsing the timerqueue details diredctly. No intentional changes in functionality. Suggested-by: NFrederic Weisbecker <frederic@kernel.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Rafael J. Wysocki 提交于
Move the code setting ts->got_idle_tick into tick_sched_do_timer() to avoid code duplication. No intentional changes in functionality. Suggested-by: NFrederic Weisbecker <frederic@kernel.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
-
由 Frederic Weisbecker 提交于
Optimize the space and leave plenty of room for further flags. Signed-off-by: NFrederic Weisbecker <frederic@kernel.org> [ rjw: Do not use __this_cpu_read() to access tick_stopped and add got_idle_tick to avoid overloading inidle ] Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Rafael J. Wysocki 提交于
If the tick isn't stopped, the target residency of the state selected by the menu governor may be greater than the actual time to the next tick and that means lost energy. To avoid that, make tick_nohz_get_sleep_length() return the current time to the next event (before stopping the tick) in addition to the estimated one via an extra pointer argument and make menu_select() use that value to refine the state selection when necessary. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-
由 Rafael J. Wysocki 提交于
In order to address the issue with short idle duration predictions by the idle governor after the scheduler tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_go_idle() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick. This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_stop_tick(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code. Namely, tick_nohz_get_sleep_length() can be made call tick_nohz_next_event(), introduced earlier, to get the time to the next non-highres timer event. If that happens, tick_nohz_next_event() need not be called by __tick_nohz_idle_stop_tick() again. If it turns out that the scheduler tick cannot be stopped going forward or the next timer event is too close for the tick to be stopped, tick_nohz_get_sleep_length() can simply return the time to the next event currently programmed into the corresponding clock event device. In addition to knowing the return value of tick_nohz_next_event(), however, tick_nohz_get_sleep_length() needs to know the time to the next highres timer event, but with the scheduler tick timer excluded, which can be computed with the help of hrtimer_get_next_event(). That minimum of that number and the tick_nohz_next_event() return value is the total time to the next timer event with the assumption that the tick will be stopped. It can be returned to the idle governor which can use it for predicting idle duration (under the assumption that the tick will be stopped) and deciding whether or not it makes sense to stop the tick before putting the CPU into the selected idle state. With the above, the sleep_length field in struct tick_sched is not necessary any more, so drop it. Link: https://bugzilla.kernel.org/show_bug.cgi?id=199227Reported-by: NDoug Smythies <dsmythies@telus.net> Reported-by: NThomas Ilsche <thomas.ilsche@tu-dresden.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
-
- 08 4月, 2018 2 次提交
-
-
由 Rafael J. Wysocki 提交于
The next set of changes will need to compute the time to the next hrtimer event over all hrtimers except for the scheduler tick one. To that end introduce a new helper function, hrtimer_next_event_without(), for computing the time until the next hrtimer event over all timers except for one and modify the underlying code in __hrtimer_next_event_base() to prepare it for being called by that new function. No intentional changes in functionality. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
-
由 Rafael J. Wysocki 提交于
In order to address the issue with short idle duration predictions by the idle governor after the scheduler tick has been stopped, split tick_nohz_stop_sched_tick() into two separate routines, one computing the time to the next timer event and the other simply stopping the tick when the time to the next timer event is known. Prepare these two routines to be called separately, as one of them will be called by the idle governor in the cpuidle_select() code path after subsequent changes. Update the former callers of tick_nohz_stop_sched_tick() to use the new routines, tick_nohz_next_event() and tick_nohz_stop_tick(), instead of it and move the updates of the sleep_length field in struct tick_sched into __tick_nohz_idle_stop_tick() as it doesn't need to be updated anywhere else. There should be no intentional visible changes in functionality resulting from this change. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
-
- 06 4月, 2018 17 次提交
-
-
由 Abderrahmane Benbachir 提交于
Trace events have been added around the initcall functions defined in init/main.c. But console and security have their own initcalls. This adds the trace events associated for those initcall functions. Link: http://lkml.kernel.org/r/1521765208.19745.2.camel@polymtl.ca Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NAbderrahmane Benbachir <abderrahmane.benbachir@polymtl.ca> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
A boot up test function update_pred_fn() dereferences filter->prog without the proper rcu annotation. To do this, we must also take the event_mutex first. Normally, this isn't needed because this test function can not race with other use cases that touch the event filters (it is disabled if any events are enabled). Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Fixes: 80765597 ("tracing: Rewrite filter logic to be simpler and faster") Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
ftrace_function_set_filter() referenences filter->prog without annotation and sparse complains about it. It needs a rcu_dereference_protected() wrapper. Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Fixes: 80765597 ("tracing: Rewrite filter logic to be simpler and faster") Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Chris Wilson 提交于
In commit 932066a15335 ("tracing: Default to using trace_global_clock if sched_clock is unstable"), the logic for deciding to override the default clock if unstable was reversed from the earlier posting. I was trying to reduce the width of the message by using an early return rather than a if-block, but reverted back to using the if-block and accidentally left the predicate inverted. Link: http://lkml.kernel.org/r/20180404212450.26646-1-chris@chris-wilson.co.uk Fixes: 932066a15335 ("tracing: Default to using trace_global_clock if sched_clock is unstable") Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
Function tracing can trace in NMIs and such. If the TSC is determined to be unstable, the tracing clock will switch to the global clock on boot up, unless "trace_clock" is specified on the kernel command line. The global clock disables interrupts to access sched_clock_cpu(), and in doing so can be done within lockdep internals (because of function tracing and NMIs). This can trigger false lockdep splats. The trace_clock_global() is special, best not to trace the irq logic within it. Link: http://lkml.kernel.org/r/20180404145015.77bde42d@gandalf.local.homeSigned-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
As si_mem_available() can say there is enough memory even though the memory available is not useable by the ring buffer, it is best to not kill innocent applications because the ring buffer is taking up all the memory while it is trying to allocate a great deal of memory. If the allocator is user space (because kernel threads can also increase the size of the kernel ring buffer on boot up), then after si_mem_available() says there is enough memory, set the OOM killer to kill the current task if an OOM triggers during the allocation. Link: http://lkml.kernel.org/r/20180404062340.GD6312@dhcp22.suse.czSuggested-by: NMichal Hocko <mhocko@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
The ring buffer is made up of a link list of pages. When making the ring buffer bigger, it will allocate all the pages it needs before adding to the ring buffer, and if it fails, it frees them and returns an error. This makes increasing the ring buffer size an all or nothing action. When this was first created, the pages were allocated with "NORETRY". This was to not cause any Out-Of-Memory (OOM) actions from allocating the ring buffer. But NORETRY was too strict, as the ring buffer would fail to expand even when there's memory available, but was taken up in the page cache. Commit 84861885 ("tracing/ring_buffer: Try harder to allocate") changed the allocating from NORETRY to RETRY_MAYFAIL. The RETRY_MAYFAIL would allocate from the page cache, but if there was no memory available, it would simple fail the allocation and not trigger an OOM. This worked fine, but had one problem. As the ring buffer would allocate one page at a time, it could take up all memory in the system before it failed to allocate and free that memory. If the allocation is happening and the ring buffer allocates all memory and then tries to take more than available, its allocation will not trigger an OOM, but if there's any allocation that happens someplace else, that could trigger an OOM, even though once the ring buffer's allocation fails, it would free up all the previous memory it tried to allocate, and allow other memory allocations to succeed. Commit d02bd27b ("mm/page_alloc.c: calculate 'available' memory in a separate function") separated out si_mem_availble() as a separate function that could be used to see how much memory is available in the system. Using this function to make sure that the ring buffer could be allocated before it tries to allocate pages we can avoid allocating all memory in the system and making it vulnerable to OOMs if other allocations are taking place. Link: http://lkml.kernel.org/r/1522320104-6573-1-git-send-email-zhaoyang.huang@spreadtrum.com CC: stable@vger.kernel.org Cc: linux-mm@kvack.org Fixes: 84861885 ("tracing/ring_buffer: Try harder to allocate") Requires: d02bd27b ("mm/page_alloc.c: calculate 'available' memory in a separate function") Reported-by: NZhaoyang Huang <huangzhaoyang@gmail.com> Tested-by: NJoel Fernandes <joelaf@google.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
Running a test on a x86_32 kernel I triggered a bug that an interrupt disable/enable isn't being catched by lockdep. At least knowing where the last one was found would be helpful, but the warnings that are produced do not show this information. Even without debugging lockdep, having the WARN() display the last place hard and soft irqs were enabled or disabled is valuable. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Dan Carpenter 提交于
Smatch complains that idx can be used uninitialized when we check if (idx < 0). It has to be the first iteration through the loop and the HIST_FIELD_FL_STACKTRACE bit has to be clear and the HIST_FIELD_FL_VAR bit has to be set to reach the bug. Link: http://lkml.kernel.org/r/20180328114815.GC29050@mwanda Fixes: 30350d65 ("tracing: Add variable support to hist triggers") Acked-by: NTom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Tom Zanussi 提交于
The strncpy() currently being used for variable string fields can result in a lack of termination if the string length is equal to the field size. Use the safer strscpy() instead, which will guarantee termination. Link: http://lkml.kernel.org/r/fb97c1e518fb358c12a4057d7445ba2c46956cd7.1522256721.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Tom Zanussi 提交于
Actions also need to be considered when checking for matching triggers - triggers differing only by action should be allowed, but currently aren't because the matching check ignores the action and erroneously returns -EEXIST. Add and call an actions_match() function to address that. Here's an example using onmatch() actions. The first -EEXIST shouldn't occur because the onmatch() is different in the second wakeup_latency() param. The second -EEXIST shouldn't occur because it's a different action (in this case, it doesn't have an action, so shouldn't be seen as being the same and therefore rejected). In the after case, both are correctly accepted (and trying to add one of them again returns -EEXIST as it should). before: # echo 'wakeup_latency u64 lat; pid_t pid' >> /sys/kernel/debug/tracing/synthetic_events # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger -su: echo: write error: File exists # echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger -su: echo: write error: File exists after: # echo 'wakeup_latency u64 lat; pid_t pid' >> /sys/kernel/debug/tracing/synthetic_events # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger # echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0 if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,next_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid:onmatch(sched.sched_wakeup).wakeup_latency(sched.sched_switch.$wakeup_lat,prev_pid) if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger # echo 'hist:keys=next_pid if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger Link: http://lkml.kernel.org/r/a7fd668b87ec10736c8f016ac4279c8480d50c2b.1522256721.git.tom.zanussi@linux.intel.comTested-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NTom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Tom Zanussi 提交于
Variable references should never have flags appended when displayed - prevent that from happening. Before: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0.usecs:... After: hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0:... Link: http://lkml.kernel.org/r/913318a5610ef6b24af2522575f671fa6ee19b6b.1522256721.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Tom Zanussi 提交于
When displaying hist triggers, variable references that have the timestamp field flag set are erroneously displayed as common_timestamp rather than the variable reference. Additionally, timestamp expressions are displayed in the same way. Fix this by forcing the timestamp flag handling to follow variable reference and expression handling. Before: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs:... After: # cat /sys/kernel/debug/tracing/events/sched/sched_switch/trigger hist:keys=next_pid:vals=hitcount:wakeup_lat=common_timestamp.usecs-$ts0.usecs:... Link: http://lkml.kernel.org/r/92746b06be67499c2a6217bd55395b350ad18fad.1522256721.git.tom.zanussi@linux.intel.comSigned-off-by: NTom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Salvatore Mesoraca 提交于
Avoid a VLA by using a real constant expression instead of a variable. The compiler should be able to optimize the original code and avoid using an actual VLA. Anyway this change is useful because it will avoid a false positive with -Wvla, it might also help the compiler generating better code. Link: http://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com Link: http://lkml.kernel.org/r/1522399988-8815-1-git-send-email-s.mesoraca16@gmail.comSigned-off-by: NSalvatore Mesoraca <s.mesoraca16@gmail.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Chris Wilson 提交于
Mention the alternative of adding trace_clock=global to the kernel command line when we detect that we've used an unstable clock across a suspend/resume cycle. Link: http://lkml.kernel.org/r/20180330150132.16903-2-chris@chris-wilson.co.ukSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Chris Wilson 提交于
Across suspend, we may see a very large drift in timestamps if the sched clock is unstable, prompting the global trace's ringbuffer code to warn and suggest switching to the global clock. Preempt this request by detecting when the sched clock is unstable (determined during late_initcall) and automatically switching the default clock over to trace_global_clock. This should prevent requiring user interaction to resolve warnings such as: Delta way too big! 18446743856563626466 ts=18446744054496180323 write stamp = 197932553857 If you just came from a suspend/resume, please switch to the trace global clock: echo global > /sys/kernel/debug/tracing/trace_clock Link: http://lkml.kernel.org/r/20180330150132.16903-1-chris@chris-wilson.co.ukSigned-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Rafael J. Wysocki 提交于
Add a new pointer argument to cpuidle_select() and to the ->select cpuidle governor callback to allow a boolean value indicating whether or not the tick should be stopped before entering the selected state to be returned from there. Make the ladder governor ignore that pointer (to preserve its current behavior) and make the menu governor return 'false" through it if: (1) the idle exit latency is constrained at 0, or (2) the selected state is a polling one, or (3) the expected idle period duration is within the tick period range. In addition to that, the correction factor computations in the menu governor need to take the possibility that the tick may not be stopped into account to avoid artificially small correction factor values. To that end, add a mechanism to record tick wakeups, as suggested by Peter Zijlstra, and use it to modify the menu_update() behavior when tick wakeup occurs. Namely, if the CPU is woken up by the tick and the return value of tick_nohz_get_sleep_length() is not within the tick boundary, the predicted idle duration is likely too short, so make menu_update() try to compensate for that by updating the governor statistics as though the CPU was idle for a long time. Since the value returned through the new argument pointer of cpuidle_select() is not used by its caller yet, this change by itself is not expected to alter the functionality of the code. Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
-