- 10 10月, 2014 2 次提交
-
-
由 Nishanth Aravamudan 提交于
kernel/kthread.c: partial revert of 81c98869 ("kthread: ensure locality of task_struct allocations") After discussions with Tejun, we don't want to spread the use of cpu_to_mem() (and thus knowledge of allocators/NUMA topology details) into callers, but would rather ensure the callees correctly handle memoryless nodes. With the previous patches ("topology: add support for node_to_mem_node() to determine the fallback node" and "slub: fallback to node_to_mem_node() node if allocating on memoryless node") adding and using node_to_mem_node(), we can safely undo part of the change to the kthread logic from 81c98869. Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: David Rientjes <rientjes@google.com> Cc: Han Pingtian <hanpt@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Lameter <cl@linux.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 chai wen 提交于
For now, soft lockup detector warns once for each case of process softlockup. But the thread 'watchdog/n' may not always get the cpu at the time slot between the task switch of two processes hogging that cpu to reset soft_watchdog_warn. An example would be two processes hogging the cpu. Process A causes the softlockup warning and is killed manually by a user. Process B immediately becomes the new process hogging the cpu preventing the softlockup code from resetting the soft_watchdog_warn variable. This case is a false negative of "warn only once for a process", as there may be a different process that is going to hog the cpu. Resolve this by saving/checking the task pointer of the hogging process and use that to reset soft_watchdog_warn too. [dzickus@redhat.com: update comment] Signed-off-by: Nchai wen <chaiw.fnst@cn.fujitsu.com> Signed-off-by: NDon Zickus <dzickus@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 10月, 2014 3 次提交
-
-
由 Peter Zijlstra 提交于
Oleg noticed that a cleanup by Sylvain actually uncovered a bug; by calling perf_event_free_task() when failing sched_fork() we will not yet have done the memset() on ->perf_event_ctxp[] and will therefore try and 'free' the inherited contexts, which are still in use by the parent process. This is bad.. Suggested-by: NOleg Nesterov <oleg@redhat.com> Reported-by: NOleg Nesterov <oleg@redhat.com> Reported-by: NSylvain 'ythier' Hitier <sylvain.hitier@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Steven Rostedt (Red Hat) 提交于
Commit 651e22f2 "ring-buffer: Always reset iterator to reader page" fixed one bug but in the process caused another one. The reset is to update the header page, but that fix also changed the way the cached reads were updated. The cache reads are used to test if an iterator needs to be updated or not. A ring buffer iterator, when created, disables writes to the ring buffer but does not stop other readers or consuming reads from happening. Although all readers are synchronized via a lock, they are only synchronized when in the ring buffer functions. Those functions may be called by any number of readers. The iterator continues down when its not interrupted by a consuming reader. If a consuming read occurs, the iterator starts from the beginning of the buffer. The way the iterator sees that a consuming read has happened since its last read is by checking the reader "cache". The cache holds the last counts of the read and the reader page itself. Commit 651e22f2 changed what was saved by the cache_read when the rb_iter_reset() occurred, making the iterator never match the cache. Then if the iterator calls rb_iter_reset(), it will go into an infinite loop by checking if the cache doesn't match, doing the reset and retrying, just to see that the cache still doesn't match! Which should never happen as the reset is suppose to set the cache to the current value and there's locks that keep a consuming reader from having access to the data. Fixes: 651e22f2 "ring-buffer: Always reset iterator to reader page" Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Kyle McMartin 提交于
Similar to ARM, AArch64 is generating $x and $d syms... which isn't terribly helpful when looking at %pF output and the like. Filter those out in kallsyms, modpost and when looking at module symbols. Seems simplest since none of these check EM_ARM anyway, to just add it to the strchr used, rather than trying to make things overly complicated. initcall_debug improves: dmesg_before.txt: initcall $x+0x0/0x154 [sg] returned 0 after 26331 usecs dmesg_after.txt: initcall init_sg+0x0/0x154 [sg] returned 0 after 15461 usecs Signed-off-by: NKyle McMartin <kyle@redhat.com> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 02 10月, 2014 1 次提交
-
-
由 Alexei Starovoitov 提交于
consider C program represented in eBPF: int filter(int arg) { int a, b, c, *ptr; if (arg == 1) ptr = &a; else if (arg == 2) ptr = &b; else ptr = &c; *ptr = 0; return 0; } eBPF verifier has to follow all possible paths through the program to recognize that '*ptr = 0' instruction would be safe to execute in all situations. It's doing it by picking a path towards the end and observes changes to registers and stack at every insn until it reaches bpf_exit. Then it comes back to one of the previous branches and goes towards the end again with potentially different values in registers. When program has a lot of branches, the number of possible combinations of branches is huge, so verifer has a hard limit of walking no more than 32k instructions. This limit can be reached and complex (but valid) programs could be rejected. Therefore it's important to recognize equivalent verifier states to prune this depth first search. Basic idea can be illustrated by the program (where .. are some eBPF insns): 1: .. 2: if (rX == rY) goto 4 3: .. 4: .. 5: .. 6: bpf_exit In the first pass towards bpf_exit the verifier will walk insns: 1, 2, 3, 4, 5, 6 Since insn#2 is a branch the verifier will remember its state in verifier stack to come back to it later. Since insn#4 is marked as 'branch target', the verifier will remember its state in explored_states[4] linked list. Once it reaches insn#6 successfully it will pop the state recorded at insn#2 and will continue. Without search pruning optimization verifier would have to walk 4, 5, 6 again, effectively simulating execution of insns 1, 2, 4, 5, 6 With search pruning it will check whether state at #4 after jumping from #2 is equivalent to one recorded in explored_states[4] during first pass. If there is an equivalent state, verifier can prune the search at #4 and declare this path to be safe as well. In other words two states at #4 are equivalent if execution of 1, 2, 3, 4 insns and 1, 2, 4 insns produces equivalent registers and stack. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 9月, 2014 1 次提交
-
-
由 Dan Williams 提交于
Per commit "77873803 net_dma: mark broken" net_dma is no longer used and there is no plan to fix it. This is the mechanical removal of bits in CONFIG_NET_DMA ifdef guards. Reverting the remainder of the net_dma induced changes is deferred to subsequent patches. Marked for stable due to Roman's report of a memory leak in dma_pin_iovec_pages(): https://lkml.org/lkml/2014/9/3/177 Cc: Dave Jiang <dave.jiang@intel.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: David Whipple <whipple@securedatainnovations.ch> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: <stable@vger.kernel.org> Reported-by: NRoman Gushchin <klamm@yandex-team.ru> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 27 9月, 2014 11 次提交
-
-
由 Alexei Starovoitov 提交于
1. the library includes a trivial set of BPF syscall wrappers: int bpf_create_map(int key_size, int value_size, int max_entries); int bpf_update_elem(int fd, void *key, void *value); int bpf_lookup_elem(int fd, void *key, void *value); int bpf_delete_elem(int fd, void *key); int bpf_get_next_key(int fd, void *key, void *next_key); int bpf_prog_load(enum bpf_prog_type prog_type, const struct sock_filter_int *insns, int insn_len, const char *license); bpf_prog_load() stores verifier log into global bpf_log_buf[] array and BPF_*() macros to build instructions 2. test stubs configure eBPF infra with 'unspec' map and program types. These are fake types used by user space testsuite only. 3. verifier tests valid and invalid programs and expects predefined error log messages from kernel. 40 tests so far. $ sudo ./test_verifier #0 add+sub+mul OK #1 unreachable OK #2 unreachable2 OK #3 out of range jump OK #4 out of range jump2 OK #5 test1 ld_imm64 OK ... Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
This patch adds verifier core which simulates execution of every insn and records the state of registers and program stack. Every branch instruction seen during simulation is pushed into state stack. When verifier reaches BPF_EXIT, it pops the state from the stack and continues until it reaches BPF_EXIT again. For program: 1: bpf_mov r1, xxx 2: if (r1 == 0) goto 5 3: bpf_mov r0, 1 4: goto 6 5: bpf_mov r0, 2 6: bpf_exit The verifier will walk insns: 1, 2, 3, 4, 6 then it will pop the state recorded at insn#2 and will continue: 5, 6 This way it walks all possible paths through the program and checks all possible values of registers. While doing so, it checks for: - invalid instructions - uninitialized register access - uninitialized stack access - misaligned stack access - out of range stack access - invalid calling convention - instruction encoding is not using reserved fields Kernel subsystem configures the verifier with two callbacks: - bool (*is_valid_access)(int off, int size, enum bpf_access_type type); that provides information to the verifer which fields of 'ctx' are accessible (remember 'ctx' is the first argument to eBPF program) - const struct bpf_func_proto *(*get_func_proto)(enum bpf_func_id func_id); returns argument constraints of kernel helper functions that eBPF program may call, so that verifier can checks that R1-R5 types match the prototype More details in Documentation/networking/filter.txt and in kernel/bpf/verifier.c Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
check that control flow graph of eBPF program is a directed acyclic graph check_cfg() does: - detect loops - detect unreachable instructions - check that program terminates with BPF_EXIT insn - check that all branches are within program boundary Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
eBPF programs passed from userspace are using pseudo BPF_LD_IMM64 instructions to refer to process-local map_fd. Scan the program for such instructions and if FDs are valid, convert them to 'struct bpf_map' pointers which will be used by verifier to check access to maps in bpf_map_lookup/update() calls. If program passes verifier, convert pseudo BPF_LD_IMM64 into generic by dropping BPF_PSEUDO_MAP_FD flag. Note that eBPF interpreter is generic and knows nothing about pseudo insns. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
add optional attributes for BPF_PROG_LOAD syscall: union bpf_attr { struct { ... __u32 log_level; /* verbosity level of eBPF verifier */ __u32 log_size; /* size of user buffer */ __aligned_u64 log_buf; /* user supplied 'char *buffer' */ }; }; when log_level > 0 the verifier will return its verification log in the user supplied buffer 'log_buf' which can be used by program author to analyze why verifier rejected given program. 'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt provides several examples of these messages, like the program: BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), BPF_LD_MAP_FD(BPF_REG_1, 0), BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem), BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), BPF_EXIT_INSN(), will be rejected with the following multi-line message in log_buf: 0: (7a) *(u64 *)(r10 -8) = 0 1: (bf) r2 = r10 2: (07) r2 += -8 3: (b7) r1 = 0 4: (85) call 1 5: (15) if r0 == 0x0 goto pc+1 R0=map_ptr R10=fp 6: (7a) *(u64 *)(r0 +4) = 0 misaligned access off 4 size 8 The format of the output can change at any time as verifier evolves. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
this patch adds all of eBPF verfier documentation and empty bpf_check() The end goal for the verifier is to statically check safety of the program. Verifier will catch: - loops - out of range jumps - unreachable instructions - invalid instructions - uninitialized register access - uninitialized stack access - misaligned stack access - out of range stack access - invalid calling convention More details in Documentation/networking/filter.txt Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
in native eBPF programs userspace is using pseudo BPF_CALL instructions which encode one of 'enum bpf_func_id' inside insn->imm field. Verifier checks that program using correct function arguments to given func_id. If all checks passed, kernel needs to fixup BPF_CALL->imm fields by replacing func_id with in-kernel function pointer. eBPF interpreter just calls the function. In-kernel eBPF users continue to use generic BPF_CALL. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
eBPF programs are similar to kernel modules. They are loaded by the user process and automatically unloaded when process exits. Each eBPF program is a safe run-to-completion set of instructions. eBPF verifier statically determines that the program terminates and is safe to execute. The following syscall wrapper can be used to load the program: int bpf_prog_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns, int insn_cnt, const char *license) { union bpf_attr attr = { .prog_type = prog_type, .insns = ptr_to_u64(insns), .insn_cnt = insn_cnt, .license = ptr_to_u64(license), }; return bpf(BPF_PROG_LOAD, &attr, sizeof(attr)); } where 'insns' is an array of eBPF instructions and 'license' is a string that must be GPL compatible to call helper functions marked gpl_only Upon succesful load the syscall returns prog_fd. Use close(prog_fd) to unload the program. User space tests and examples follow in the later patches Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
'maps' is a generic storage of different types for sharing data between kernel and userspace. The maps are accessed from user space via BPF syscall, which has commands: - create a map with given type and attributes fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size) returns fd or negative error - lookup key in a given map referenced by fd err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size) using attr->map_fd, attr->key, attr->value returns zero and stores found elem into value or negative error - create or update key/value pair in a given map err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size) using attr->map_fd, attr->key, attr->value returns zero or negative error - find and delete element by key in a given map err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size) using attr->map_fd, attr->key - iterate map elements (based on input key return next_key) err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size) using attr->map_fd, attr->key, attr->next_key - close(fd) deletes the map Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
done as separate commit to ease conflict resolution Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
BPF syscall is a multiplexor for a range of different operations on eBPF. This patch introduces syscall with single command to create a map. Next patch adds commands to access maps. 'maps' is a generic storage of different types for sharing data between kernel and userspace. Userspace example: /* this syscall wrapper creates a map with given type and attributes * and returns map_fd on success. * use close(map_fd) to delete the map */ int bpf_create_map(enum bpf_map_type map_type, int key_size, int value_size, int max_entries) { union bpf_attr attr = { .map_type = map_type, .key_size = key_size, .value_size = value_size, .max_entries = max_entries }; return bpf(BPF_MAP_CREATE, &attr, sizeof(attr)); } 'union bpf_attr' is backwards compatible with future extensions. More details in Documentation/networking/filter.txt and in manpage Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 9月, 2014 1 次提交
-
-
由 Vincent Sanders 提交于
Enable gcov support for ARM based on original patches by David Singleton and George G. Davis Riku - updated to patch to current mainline kernel. The patch has been submitted in 2010, 2012 - for symmetry, now in 2014 too. https://lwn.net/Articles/390419/ http://marc.info/?l=linux-arm-kernel&m=133823081813044 v2: remove arch/arm/kernel from gcov disabled files Cc: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NRiku Voipio <riku.voipio@linaro.org> Signed-off-by: NVincent Sanders <vincent.sanders@collabora.co.uk> Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
-
- 25 9月, 2014 4 次提交
-
-
由 Vincent Stehlé 提交于
Export handle_fasteoi_irq to be able to use it in e.g. the Zynq gpio driver since commit 6dd85950 ("gpio: zynq: Fix IRQ handlers"). This fixes the following link issue: ERROR: "handle_fasteoi_irq" [drivers/gpio/gpio-zynq.ko] undefined! Signed-off-by: NVincent Stehlé <vincent.stehle@laposte.net> Acked-by: NArnd Bergmann <arnd@arndb.de> Cc: linux-arm-kernel@lists.infradead.org Cc: Vincent Stehle <vincent.stehle@laposte.net> Cc: Lars-Peter Clausen <lars@metafoo.de> Cc: Linus Walleij <linus.walleij@linaro.org> Link: http://lkml.kernel.org/r/1408663880-29179-1-git-send-email-vincent.stehle@laposte.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 NeilBrown 提交于
In commit c1221321 sched: Allow wait_on_bit_action() functions to support a timeout I suggested that a "wait_on_bit_timeout()" interface would not meet my need. This isn't true - I was just over-engineering. Including a 'private' field in wait_bit_key instead of a focused "timeout" field was just premature generalization. If some other use is ever found, it can be generalized or added later. So this patch renames "private" to "timeout" with a meaning "stop waiting when "jiffies" reaches or passes "timeout", and adds two of the many possible wait..bit..timeout() interfaces: wait_on_page_bit_killable_timeout(), which is the one I want to use, and out_of_line_wait_on_bit_timeout() which is a reasonably general example. Others can be added as needed. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NNeilBrown <neilb@suse.de> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Zefan Li 提交于
When we change cpuset.memory_spread_{page,slab}, cpuset will flip PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset. This should be done using atomic bitops, but currently we don't, which is broken. Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened when one thread tried to clear PF_USED_MATH while at the same time another thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on the same task. Here's the full report: https://lkml.org/lkml/2014/9/19/230 To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags. v4: - updated mm/slab.c. (Fengguang Wu) - updated Documentation. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Miao Xie <miaox@cn.fujitsu.com> Cc: Kees Cook <keescook@chromium.org> Fixes: 950592f7 ("cpusets: update tasks' page/slab spread flags in time") Cc: <stable@vger.kernel.org> # 2.6.31+ Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NZefan Li <lizefan@huawei.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Rafael J. Wysocki 提交于
Revert commit 6efde38f (PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()) that introduced a NULL pointer dereference during system resume from hibernation: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 PGD b39c2067 PUD b39c1067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: <irrelevant list of modules> CPU: 1 PID: 4898 Comm: s2disk Tainted: G C 3.17-rc5-amd64 #1 Debian 3.17~rc5-1~exp1 Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011 task: ffff88023155ea40 ti: ffff8800b3b14000 task.ti: ffff8800b3b14000 RIP: 0010:[<ffffffff810a8cc1>] [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 RSP: 0018:ffff8800b3b17ea8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8800b39bab00 RCX: 0000000000000001 RDX: ffff8800b39bab10 RSI: ffff8800b39bab00 RDI: 0000000000000000 RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000000 R10: ffff8800b39bab10 R11: 0000000000000246 R12: ffffea0000000000 R13: ffff880232f485a0 R14: ffff88023ac27cd8 R15: ffff880232927590 FS: 00007f406d83b700(0000) GS:ffff88023bc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000b3a62000 CR4: 00000000000007e0 Stack: ffff8800b39bab00 0000000000000010 ffff880232927590 ffffffff810acb4a ffff8800b39bab00 ffffffff811a955a ffff8800b39bab10 0000000000000000 ffff88023155f098 ffffffff81a6b8c0 ffff88023155ea40 0000000000000007 Call Trace: [<ffffffff810acb4a>] ? snapshot_release+0x2a/0xb0 [<ffffffff811a955a>] ? __fput+0xca/0x1d0 [<ffffffff81080627>] ? task_work_run+0x97/0xd0 [<ffffffff81012d89>] ? do_notify_resume+0x69/0xa0 [<ffffffff8151452a>] ? int_signal+0x12/0x17 Code: 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 54 48 8b 05 ba 62 9c 00 49 bc 00 00 00 00 00 ea ff ff 48 8b 3d a1 62 9c 00 55 53 <48> 8b 10 48 89 50 18 48 8b 52 20 48 c7 40 28 00 00 00 00 c7 40 RIP [<ffffffff810a8cc1>] swsusp_free+0x21/0x190 RSP <ffff8800b3b17ea8> CR2: 0000000000000000 ---[ end trace f02be86a1ec0cccb ]--- due to forbidden_pages_map being NULL in swsusp_free(). Fixes: 6efde38f "PM / Hibernate: Iterate over set bits instead of PFNs in swsusp_free()" Reported-by: NBjørn Mork <bjorn@mork.no> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 19 9月, 2014 1 次提交
-
-
由 Zefan Li 提交于
cgroup_pidlist_start() holds cgrp->pidlist_mutex and then calls pidlist_array_load(), and cgroup_pidlist_stop() releases the mutex. It is wrong that we release the mutex in the failure path in pidlist_array_load(), because cgroup_pidlist_stop() will be called no matter if cgroup_pidlist_start() returns errno or not. Fixes: 4bac00d1 Cc: <stable@vger.kernel.org> # 3.14+ Signed-off-by: NZefan Li <lizefan@huawei.com> Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
-
- 14 9月, 2014 4 次提交
-
-
由 Frederic Weisbecker 提交于
The nohz full functionality depends on IRQ work to trigger its own interrupts. As it's used to restart the tick, we can't rely on the tick fallback for irq work callbacks, ie: we can't use the tick to restart the tick itself. Lets reject the full dynticks initialization if that arch support isn't available. As a side effect, this makes sure that nohz kick is never called from the tick. That otherwise would result in illegal hrtimer self-cancellation and lockup. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Frederic Weisbecker 提交于
The supports for CONFIG_NO_HZ_FULL_ALL=y and the nohz_full= kernel parameter both have their own way to do the same thing: allocate full dynticks cpumasks, fill them and initialize some state variables. Lets consolidate that all in the same place. While at it, convert some regular printk message to warnings when fundamental allocations fail. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Frederic Weisbecker 提交于
The nohz full kick, which restarts the tick when any resource depend on it, can't be executed anywhere given the operation it does on timers. If it is called from the scheduler or timers code, chances are that we run into a deadlock. This is why we run the nohz full kick from an irq work. That way we make sure that the kick runs on a virgin context. However if that's the case when irq work runs in its own dedicated self-ipi, things are different for the big bunch of archs that don't support the self triggered way. In order to support them, irq works are also handled by the timer interrupt as fallback. Now when irq works run on the timer interrupt, the context isn't blank. More precisely, they can run in the context of the hrtimer that runs the tick. But the nohz kick cancels and restarts this hrtimer and cancelling an hrtimer from itself isn't allowed. This is why we run in an endless loop: Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2 CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34 Workqueue: btrfs-endio-write normal_work_helper [btrfs] ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37 ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010 ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000 Call Trace: <NMI[<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a [<ffffffff8a7ef928>] panic+0xd4/0x207 [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120 [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350 [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0 [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150 [<ffffffff8a187934>] perf_event_overflow+0x14/0x20 [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410 [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50 [<ffffffff8a007b72>] nmi_handle+0xd2/0x390 [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0 [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0 [<ffffffff8a008268>] do_nmi+0xb8/0x100 [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0 <<EOE><IRQ[<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50 [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50 [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50 [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0 [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30 [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90 [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100 [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10 [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70 [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50 [<ffffffff8a0f700b>] update_process_times+0x5b/0x70 [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60 [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60 [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470 [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0 [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270 [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60 [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50 [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80 To fix this we force non-lazy irq works to run on irq work self-IPIs when available. That ability of the arch to trigger irq work self IPIs is available with arch_irq_work_has_interrupt(). Reported-by: NCatalin Iacob <iacobcatalin@gmail.com> Reported-by: NDave Jones <davej@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Frederic Weisbecker 提交于
This way we unbloat a bit main.c and more importantly we initialize nohz full after init_IRQ(). This dependency will be needed in further patches because nohz full needs irq work to raise its own IRQ. Information about the support for this ability on ARM64 is obtained on init_IRQ() which initialize the pointer to __smp_call_function. Since tick_init() is called right after init_IRQ(), this is a good place to call tick_nohz_init() and prepare for that dependency. Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
- 13 9月, 2014 5 次提交
-
-
由 Richard Larocque 提交于
Locks the k_itimer's it_lock member when handling the alarm timer's expiry callback. The regular posix timers defined in posix-timers.c have this lock held during timout processing because their callbacks are routed through posix_timer_fn(). The alarm timers follow a different path, so they ought to grab the lock somewhere else. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: NRichard Larocque <rlarocque@google.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Richard Larocque 提交于
Avoids sending a signal to alarm timers created with sigev_notify set to SIGEV_NONE by checking for that special case in the timeout callback. The regular posix timers avoid sending signals to SIGEV_NONE timers by not scheduling any callbacks for them in the first place. Although it would be possible to do something similar for alarm timers, it's simpler to handle this as a special case in the timeout. Prior to this patch, the alarm timer would ignore the sigev_notify value and try to deliver signals to the process anyway. Even worse, the sanity check for the value of sigev_signo is skipped when SIGEV_NONE was specified, so the signal number could be bogus. If sigev_signo was an unitialized value (as it often would be if SIGEV_NONE is used), then it's hard to predict which signal will be sent. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: NRichard Larocque <rlarocque@google.com> Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Richard Larocque 提交于
Returns the time remaining for an alarm timer, rather than the time at which it is scheduled to expire. If the timer has already expired or it is not currently scheduled, the it_value's members are set to zero. This new behavior matches that of the other posix-timers and the POSIX specifications. This is a change in user-visible behavior, and may break existing applications. Hopefully, few users rely on the old incorrect behavior. Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Sharvil Nanavati <sharvil@google.com> Signed-off-by: NRichard Larocque <rlarocque@google.com> [jstultz: minor style tweak] Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Andrew Hunter 提交于
timeval_to_jiffies tried to round a timeval up to an integral number of jiffies, but the logic for doing so was incorrect: intervals corresponding to exactly N jiffies would become N+1. This manifested itself particularly repeatedly stopping/starting an itimer: setitimer(ITIMER_PROF, &val, NULL); setitimer(ITIMER_PROF, NULL, &val); would add a full tick to val, _even if it was exactly representable in terms of jiffies_ (say, the result of a previous rounding.) Doing this repeatedly would cause unbounded growth in val. So fix the math. Here's what was wrong with the conversion: we essentially computed (eliding seconds) jiffies = usec * (NSEC_PER_USEC/TICK_NSEC) by using scaling arithmetic, which took the best approximation of NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC = x/(2^USEC_JIFFIE_SC), and computed: jiffies = (usec * x) >> USEC_JIFFIE_SC and rounded this calculation up in the intermediate form (since we can't necessarily exactly represent TICK_NSEC in usec.) But the scaling arithmetic is a (very slight) *over*approximation of the true value; that is, instead of dividing by (1 usec/ 1 jiffie), we effectively divided by (1 usec/1 jiffie)-epsilon (rounding down). This would normally be fine, but we want to round timeouts up, and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this would be fine if our division was exact, but dividing this by the slightly smaller factor was equivalent to adding just _over_ 1 to the final result (instead of just _under_ 1, as desired.) In particular, with HZ=1000, we consistently computed that 10000 usec was 11 jiffies; the same was true for any exact multiple of TICK_NSEC. We could possibly still round in the intermediate form, adding something less than 2^USEC_JIFFIE_SC - 1, but easier still is to convert usec->nsec, round in nanoseconds, and then convert using time*spec*_to_jiffies. This adds one constant multiplication, and is not observably slower in microbenchmarks on recent x86 hardware. Tested: the following program: int main() { struct itimerval zero = {{0, 0}, {0, 0}}; /* Initially set to 10 ms. */ struct itimerval initial = zero; initial.it_interval.tv_usec = 10000; setitimer(ITIMER_PROF, &initial, NULL); /* Save and restore several times. */ for (size_t i = 0; i < 10; ++i) { struct itimerval prev; setitimer(ITIMER_PROF, &zero, &prev); /* on old kernels, this goes up by TICK_USEC every iteration */ printf("previous value: %ld %ld %ld %ld\n", prev.it_interval.tv_sec, prev.it_interval.tv_usec, prev.it_value.tv_sec, prev.it_value.tv_usec); setitimer(ITIMER_PROF, &prev, NULL); } return 0; } Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Paul Turner <pjt@google.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Reviewed-by: NPaul Turner <pjt@google.com> Reported-by: NAaron Jacobs <jacobsa@google.com> Signed-off-by: NAndrew Hunter <ahh@google.com> [jstultz: Tweaked to apply to 3.17-rc] Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
futex_wait_requeue_pi() calls futex_wait_setup(). If futex_wait_setup() succeeds it returns with hb->lock held and preemption disabled. Now the sanity check after this does: if (match_futex(&q.key, &key2)) { ret = -EINVAL; goto out_put_keys; } which releases the keys but does not release hb->lock. So we happily return to user space with hb->lock held and therefor preemption disabled. Unlock hb->lock before taking the exit route. Reported-by: NDave "Trinity" Jones <davej@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NDarren Hart <dvhart@linux.intel.com> Reviewed-by: NDavidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 11 9月, 2014 3 次提交
-
-
由 Rasmus Villemoes 提交于
The C operator <= defines a perfectly fine total ordering on the set of values representable in a long. However, unlike its namesake in the integers, it is not translation invariant, meaning that we do not have "b <= c" iff "a+b <= a+c" for all a,b,c. This means that it is always wrong to try to boil down the relationship between two longs to a question about the sign of their difference, because the resulting relation [a LEQ b iff a-b <= 0] is neither anti-symmetric or transitive. The former is due to -LONG_MIN==LONG_MIN (take any two a,b with a-b = LONG_MIN; then a LEQ b and b LEQ a, but a != b). The latter can either be seen observing that x LEQ x+1 for all x, implying x LEQ x+1 LEQ x+2 ... LEQ x-1 LEQ x; or more directly with the simple example a=LONG_MIN, b=0, c=1, for which a-b < 0, b-c < 0, but a-c > 0. Note that it makes absolutely no difference that a transmogrying bijection has been applied before the comparison is done. In fact, had the obfuscation not been done, one could probably not observe the bug (assuming all values being compared always lie in one half of the address space, the mathematical value of a-b is always representable in a long). As it stands, one can easily obtain three file descriptors exhibiting the non-transitivity of kcmp(). Side note 1: I can't see that ensuring the MSB of the multiplier is set serves any purpose other than obfuscating the obfuscating code. Side note 2: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <fcntl.h> #include <unistd.h> #include <assert.h> #include <sys/syscall.h> enum kcmp_type { KCMP_FILE, KCMP_VM, KCMP_FILES, KCMP_FS, KCMP_SIGHAND, KCMP_IO, KCMP_SYSVSEM, KCMP_TYPES, }; pid_t pid; int kcmp(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2) { return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2); } int cmp_fd(int fd1, int fd2) { int c = kcmp(pid, pid, KCMP_FILE, fd1, fd2); if (c < 0) { perror("kcmp"); exit(1); } assert(0 <= c && c < 3); return c; } int cmp_fdp(const void *a, const void *b) { static const int normalize[] = {0, -1, 1}; return normalize[cmp_fd(*(int*)a, *(int*)b)]; } #define MAX 100 /* This is plenty; I've seen it trigger for MAX==3 */ int main(int argc, char *argv[]) { int r, s, count = 0; int REL[3] = {0,0,0}; int fd[MAX]; pid = getpid(); while (count < MAX) { r = open("/dev/null", O_RDONLY); if (r < 0) break; fd[count++] = r; } printf("opened %d file descriptors\n", count); for (r = 0; r < count; ++r) { for (s = r+1; s < count; ++s) { REL[cmp_fd(fd[r], fd[s])]++; } } printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]); qsort(fd, count, sizeof(fd[0]), cmp_fdp); memset(REL, 0, sizeof(REL)); for (r = 0; r < count; ++r) { for (s = r+1; s < count; ++s) { REL[cmp_fd(fd[r], fd[s])]++; } } printf("== %d\t< %d\t> %d\n", REL[0], REL[1], REL[2]); return (REL[0] + REL[2] != 0); } Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org> "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Patrick Palka 提交于
We shouldn't set text_len in the code path that detects printk recursion because text_len corresponds to the length of the string inside textbuf. A few lines down from the line text_len = strlen(recursion_msg); is the line text_len += vscnprintf(text + text_len, ...); So if printk detects recursion, it sets text_len to 29 (the length of recursion_msg) and logs an error. Then the message supplied by the caller of printk is stored inside textbuf but offset by 29 bytes. This means that the output of the recursive call to printk will contain 29 bytes of garbage in front of it. This defect is caused by commit 458df9fd ("printk: remove separate printk_sched buffers and use printk buf instead") which turned the line text_len = vscnprintf(text, ...); into text_len += vscnprintf(text + text_len, ...); To fix this, this patch avoids setting text_len when logging the printk recursion error. This patch also marks unlikely() the branch leading up to this code. Fixes: 458df9fd ("printk: remove separate printk_sched buffers and use printk buf instead") Signed-off-by: NPatrick Palka <patrick@parcs.ath.cx> Reviewed-by: NPetr Mladek <pmladek@suse.cz> Reviewed-by: NJan Kara <jack@suse.cz> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Borkmann 提交于
Since BPF JIT depends on the availability of module_alloc() and module_free() helpers (HAVE_BPF_JIT and MODULES), we better build that code only in case we have BPF_JIT in our config enabled, just like with other JIT code. Fixes builds for arm/marzen_defconfig and sh/rsk7269_defconfig. ==================== kernel/built-in.o: In function `bpf_jit_binary_alloc': /home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc' kernel/built-in.o: In function `bpf_jit_binary_free': /home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free' make: *** [vmlinux] Error 1 ==================== Reported-by: NFengguang Wu <fengguang.wu@intel.com> Fixes: 738cbe72 ("net: bpf: consolidate JIT binary allocator") Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 9月, 2014 2 次提交
-
-
由 Daniel Borkmann 提交于
Introduced in commit 314beb9b ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") and later on replicated in aa2d2c73 ("s390/bpf,jit: address randomize and write protect jit code") for s390 architecture, write protection for BPF JIT images got added and a random start address of the JIT code, so that it's not on a page boundary anymore. Since both use a very similar allocator for the BPF binary header, we can consolidate this code into the BPF core as it's mostly JIT independant anyway. This will also allow for future archs that support DEBUG_SET_MODULE_RONX to just reuse instead of reimplementing it. JIT tested on x86_64 and s390x with BPF test suite. Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
add BPF_LD_IMM64 instruction to load 64-bit immediate value into a register. All previous instructions were 8-byte. This is first 16-byte instruction. Two consecutive 'struct bpf_insn' blocks are interpreted as single instruction: insn[0].code = BPF_LD | BPF_DW | BPF_IMM insn[0].dst_reg = destination register insn[0].imm = lower 32-bit insn[1].code = 0 insn[1].imm = upper 32-bit All unused fields must be zero. Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads 32-bit immediate value into a register. x64 JITs it as single 'movabsq %rax, imm64' arm64 may JIT as sequence of four 'movk x0, #imm16, lsl #shift' insn Note that old eBPF programs are binary compatible with new interpreter. It helps eBPF programs load 64-bit constant into a register with one instruction instead of using two registers and 4 instructions: BPF_MOV32_IMM(R1, imm32) BPF_ALU64_IMM(BPF_LSH, R1, 32) BPF_MOV32_IMM(R2, imm32) BPF_ALU64_REG(BPF_OR, R1, R2) User space generated programs will use this instruction to load constants only. To tell kernel that user space needs a pointer the _pseudo_ variant of this instruction may be added later, which will use extra bits of encoding to indicate what type of pointer user space is asking kernel to provide. For example 'off' or 'src_reg' fields can be used for such purpose. src_reg = 1 could mean that user space is asking kernel to validate and load in-kernel map pointer. src_reg = 2 could mean that user space needs readonly data section pointer src_reg = 3 could mean that user space needs a pointer to per-cpu local data All such future pseudo instructions will not be carrying the actual pointer as part of the instruction, but rather will be treated as a request to kernel to provide one. The kernel will verify the request_for_a_pointer, then will drop _pseudo_ marking and will store actual internal pointer inside the instruction, so the end result is the interpreter and JITs never see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that loads 64-bit immediate into a register. User space never operates on direct pointers and verifier can easily recognize request_for_pointer vs other instructions. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 9月, 2014 2 次提交
-
-
由 Masanari Iida 提交于
This patch fix spelling typo found in DocBook/kernel-api.xml. It is because the file is generated from the source comments, I have to fix the comments in source codes. Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Acked-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
由 Cong Wang 提交于
We saw a kernel soft lockup in perf_remove_from_context(), it looks like the `perf` process, when exiting, could not go out of the retry loop. Meanwhile, the target process was forking a child. So either the target process should execute the smp function call to deactive the event (if it was running) or it should do a context switch which deactives the event. It seems we optimize out a context switch in perf_event_context_sched_out(), and what's more important, we still test an obsolete task pointer when retrying, so no one actually would deactive that event in this situation. Fix it directly by reloading the task pointer in perf_remove_from_context(). This should cure the above soft lockup. Signed-off-by: NCong Wang <cwang@twopensource.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1409696840-843-1-git-send-email-xiyou.wangcong@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-