- 08 2月, 2015 2 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
When taking a CPU down for suspend and resume, a tracepoint may be called when the CPU has been designated offline. As tracepoints require RCU for protection, they must not be called if the current CPU is offline. Unfortunately, trace_tlb_flush() is called in this scenario as was noted by LOCKDEP: ... Disabling non-boot CPUs ... intel_pstate CPU 1 exiting =============================== smpboot: CPU 1 didn't die... [ INFO: suspicious RCU usage. ] 3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted ------------------------------- include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 0 no locks held by swapper/1/0. stack backtrace: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1 Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013 0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011 ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600 0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78 Call Trace: [<ffffffff817e370d>] dump_stack+0x4c/0x65 [<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120 [<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0 [<ffffffff81054c4e>] play_dead_common+0xe/0x50 [<ffffffff81054ca5>] native_play_dead+0x15/0x140 [<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20 [<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580 [<ffffffff81053e20>] start_secondary+0x140/0x150 intel_pstate CPU 2 exiting ... By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected code when the CPU is offline. Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com Cc: stable@vger.kernel.org # 3.17+ Fixes: d17d8f9d "x86/mm: Add tracepoints for TLB flushes" Reported-by: NSedat Dilek <sedat.dilek@gmail.com> Tested-by: NSedat Dilek <sedat.dilek@gmail.com> Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: NDave Hansen <dave@sr71.net> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
The trace_tlb_flush() tracepoint can be called when a CPU is going offline. When a CPU is offline, RCU is no longer watching that CPU and since the tracepoint is protected by RCU, it must not be called. To prevent the tlb_flush tracepoint from being called when the CPU is offline, it was converted to a TRACE_EVENT_CONDITION where the condition checks if the CPU is online before calling the tracepoint. Unfortunately, this was not enough to stop lockdep from complaining about it. Even though the RCU protected code of the tracepoint will never be called, the condition is hidden within the tracepoint, and even though the condition prevents RCU code from being called, the lockdep checks are outside the tracepoint (this is to test tracepoints even when they are not enabled). Even though tracepoints should be checked to be RCU safe when they are not enabled, the condition should still be considered when checking RCU. Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com Fixes: 3a630178 "tracing: generate RCU warnings even when tracepoints are disabled" Cc: stable@vger.kernel.org # 3.18+ Acked-by: NDave Hansen <dave@sr71.net> Reported-by: NSedat Dilek <sedat.dilek@gmail.com> Tested-by: NSedat Dilek <sedat.dilek@gmail.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 06 2月, 2015 1 次提交
-
-
由 Yann Droneaud 提交于
While commit 7e36ef82 ("IB/core: Temporarily disable ex_query_device uverb") is correct as it makes the extended QUERY_DEVICE uverb (which came as part of commit 5a77abf9 ("IB/core: Add support for extended query device caps") and commit 860f10a7 ("IB/core: Add flags for on demand paging support")) not available to userspace, it doesn't address the initial issue regarding ib_copy_to_udata() [1][2]. Additionally, further discussions around this new uverb seems to conclude it would require a different data structure than the one currently described in <rdma/ib_user_verbs.h> [3]. Both of these issues require a revert of the changes, so this patch partially reverts commit 8cdd312c ("IB/mlx5: Implement the ODP capability query verb") and commit 860f10a7 ("IB/core: Add flags for on demand paging support") and fully reverts commit 5a77abf9 ("IB/core: Add support for extended query device caps"). [1] "Re: [PATCH v3 06/17] IB/core: Add support for extended query device caps" http://mid.gmane.org/1418733236.2779.26.camel@opteya.com [2] "Re: [PATCH] IB/core: Temporarily disable ex_query_device uverb" http://mid.gmane.org/1423067503.3030.83.camel@opteya.com [3] "RE: [PATCH v1 1/5] IB/uverbs: ex_query_device: answer must not depend on request's comp_mask" http://mid.gmane.org/2807E5FD2F6FDA4886F6618EAC48510E0CC12C30@CRSMSX101.amr.corp.intel.com Cc: Eli Cohen <eli@mellanox.com> Cc: Haggai Eran <haggaie@mellanox.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Sagi Grimberg <sagig@mellanox.com> Cc: Shachar Raindel <raindel@mellanox.com> Signed-off-by: NYann Droneaud <ydroneaud@opteya.com> Signed-off-by: NRoland Dreier <roland@purestorage.com>
-
- 05 2月, 2015 9 次提交
-
-
由 Lv Zheng 提交于
ACPICA commit 199cad16530a45aea2bec98e528866e20c5927e1 Since whether the GPE should be disabled/enabled/cleared should only be determined by the GPE driver's state machine: 1. GPE should be disabled if the driver wants to switch to the GPE polling mode when a GPE storm condition is indicated and should be enabled if the driver wants to switch back to the GPE interrupt mode when all of the storm conditions are cleared. The conditions should be protected by the driver's specific lock. 2. GPE should be enabled if the driver has accepted more than one request and should be disabled if the driver has completed all of the requests. The request count should be protected by the driver's specific lock. 3. GPE should be cleared either when the driver is about to handle an edge triggered GPE or when the driver has completed to handle a level triggered GPE. The handling code should be protected by the driver's specific lock. Thus the GPE enabling/disabling/clearing operations are likely to be performed with the driver's specific lock held while we currently cannot do this. This is because: 1. We have the acpi_gbl_gpe_lock held before invoking the GPE driver's handler. Driver's specific lock is likely to be held inside of the handler, thus we can see some dead lock issues due to the reversed locking order or recursive locking. In order to solve such dead lock issues, we need to unlock the acpi_gbl_gpe_lock before invoking the handler. BZ 1100. 2. Since GPE disabling/enabling/clearing should be determined by the GPE driver's state machine, we shouldn't perform such operations inside of ACPICA for a GPE handler to mess up the driver's state machine. BZ 1101. Originally this patch includes a logic to flush GPE handlers, it is dropped due to the following reasons: 1. This is a different issue; 2. Linux OSL has fixed this by flushing SCI in acpi_os_wait_events_complete(). We will pick up this topic when the Linux OSL fix turns out to be not sufficient. Note that currently the internal operations and the acpi_gbl_gpe_lock are also used by ACPI_GPE_DISPATCH_METHOD and ACPI_GPE_DISPATCH_NOTIFY. In order not to introduce regressions, we add one ACPI_GPE_DISPATCH_RAW_HANDLER type to be distiguished from ACPI_GPE_DISPATCH_HANDLER. For which the acpi_gbl_gpe_lock is unlocked before invoking the GPE handler and the internal enabling/disabling operations are bypassed to allow drivers to perform them at a proper position using the GPE APIs and ACPI_GPE_DISPATCH_RAW_HANDLER users should invoke acpi_set_gpe() instead of acpi_enable_gpe()/acpi_disable_gpe() to bypass the internal GPE clearing code in acpi_enable_gpe(). Lv Zheng. Known issues: 1. Edge-triggered GPE lost for frequent enablings On some buggy silicon platforms, GPE enable line may not be directly wired to the GPE trigger line. In that case, when GPE enabling is frequently performed for edge-triggered GPEs, GPE status may stay set without being triggered. This patch may maginify this problem as it allows GPE enabling to be parallel performed during the process the GPEs are handled. This is an existing issue, because: 1. For task context: Current ACPI_GPE_DISPATCH_METHOD practices have proven that this isn't a real issue - we can re-enable edge-triggered GPE in a work queue where the GPE status bit might already be set. 2. For IRQ context: This can even happen when the GPE enabling occurs before returning from the GPE handler and after unlocking the GPE lock. Thus currently no code is included to protect this. Link: https://github.com/acpica/acpica/commit/199cad16Signed-off-by: NLv Zheng <lv.zheng@intel.com> Signed-off-by: NBob Moore <robert.moore@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> -
由 David E. Box 提交于
ACPICA commit e06b1624b02dc8317d144e9a6fe9d684c5fa2f90 Version 20150204. Link: https://github.com/acpica/acpica/commit/e06b1624Signed-off-by: NDavid E. Box <david.e.box@linux.intel.com> Signed-off-by: NBob Moore <robert.moore@intel.com> Signed-off-by: NLv Zheng <lv.zheng@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 David E. Box 提交于
ACPICA commit 8990e73ab2aa15d6a0068b860ab54feff25bee36 Link: https://github.com/acpica/acpica/commit/8990e73aSigned-off-by: NDavid E. Box <david.e.box@linux.intel.com> Signed-off-by: NBob Moore <robert.moore@intel.com> Signed-off-by: NLv Zheng <lv.zheng@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Lv Zheng 提交于
ACPICA commit 7926d5ca9452c87f866938dcea8f12e1efb58f89 There is an issue in acpi_install_gpe_handler() and acpi_remove_gpe_handler(). The code to obtain the GPE dispatcher type from the Handler->original_flags is wrong: if (((Handler->original_flags & ACPI_GPE_DISPATCH_METHOD) || (Handler->original_flags & ACPI_GPE_DISPATCH_NOTIFY)) && ACPI_GPE_DISPATCH_NOTIFY is 0x03 and ACPI_GPE_DISPATCH_METHOD is 0x02, thus this statement is TRUE for the following dispatcher types: 0x01 (ACPI_GPE_DISPATCH_HANDLER): not expected 0x02 (ACPI_GPE_DISPATCH_METHOD): expected 0x03 (ACPI_GPE_DISPATCH_NOTIFY): expected There is no functional issue due to this because Handler->original_flags is only set in acpi_install_gpe_handler(), and an earlier checker has excluded the ACPI_GPE_DISPATCH_HANDLER: if ((gpe_event_info->Flags & ACPI_GPE_DISPATCH_MASK) == ACPI_GPE_DISPATCH_HANDLER) { Status = AE_ALREADY_EXISTS; goto free_and_exit; } ... Handler->original_flags = (u8) (gpe_event_info->Flags & (ACPI_GPE_XRUPT_TYPE_MASK | ACPI_GPE_DISPATCH_MASK)); We need to clean this up before modifying the GPE dispatcher type values. In order to prevent such issue from happening in the future, this patch introduces ACPI_GPE_DISPATCH_TYPE() macro to be used to obtain the GPE dispatcher types. Lv Zheng. Link: https://github.com/acpica/acpica/commit/7926d5caSigned-off-by: NLv Zheng <lv.zheng@intel.com> Signed-off-by: NDavid E. Box <david.e.box@linux.intel.com> Signed-off-by: NBob Moore <robert.moore@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> -
由 Yinghai Lu 提交于
We need to parse APIC ID for IOAPIC registration for IOAPIC hotplug. ACPI _MAT method and MADT table are used to figure out IOAPIC ID, just like parsing CPU APIC ID for CPU hotplug. [ tglx: Fixed docbook comment ] Signed-off-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Len Brown <lenb@kernel.org> Link: http://lkml.kernel.org/r/1414387308-27148-8-git-send-email-jiang.liu@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Jiang Liu 提交于
Use common resource list management data structure and interfaces instead of private implementation. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Acked-by: NWill Deacon <will.deacon@arm.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Jiang Liu 提交于
Currently ACPI, PCI and pnp all implement the same resource list management with different data structure. We need to transfer from one data structure into another when passing resources from one subsystem into another subsystem. So move struct resource_list_entry from ACPI into resource core and rename it as resource_entry, then it could be reused by different subystems and avoid the data structure conversion. Introduce dedicated header file resource_ext.h instead of embedding it into ioport.h to avoid header file inclusion order issues. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Acked-by: NVinod Koul <vinod.koul@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Eric Dumazet 提交于
include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types) include/net/ipv6.h:713:22: expected restricted __be32 [usertype] hash include/net/ipv6.h:713:22: got unsigned int include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer include/net/ipv6.h:719:22: warning: invalid assignment: ^= include/net/ipv6.h:719:22: left side has type restricted __be32 include/net/ipv6.h:719:22: right side has type unsigned int Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
(struct flow_keys)->n_proto is in network order, use proper type for this. Fixes following sparse errors : net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types) net/core/flow_dissector.c:139:39: expected unsigned short [unsigned] [usertype] n_proto net/core/flow_dissector.c:139:39: got restricted __be16 [assigned] [usertype] proto net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types) net/core/flow_dissector.c:237:23: expected unsigned short [unsigned] [usertype] n_proto net/core/flow_dissector.c:237:23: got restricted __be16 [assigned] [usertype] proto Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: e0f31d84 ("flow_keys: Record IP layer protocol in skb_flow_dissect()") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 2月, 2015 4 次提交
-
-
由 Vlad Yasevich 提交于
If the IPv6 fragment id has not been set and we perform fragmentation due to UFO, select a new fragment id. We now consider a fragment id of 0 as unset and if id selection process returns 0 (after all the pertrubations), we set it to 0x80000000, thus giving us ample space not to create collisions with the next packet we may have to fragment. When doing UFO integrity checking, we also select the fragment id if it has not be set yet. This is stored into the skb_shinfo() thus allowing UFO to function correclty. This patch also removes duplicate fragment id generation code and moves ipv6_select_ident() into the header as it may be used during GSO. Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiang Liu 提交于
Introduce helper function acpi_dev_filter_resource_type(), which may be used by acpi_dev_get_resources() to filer out resource based on resource type. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Jiang Liu 提交于
Add field offset to struct resource_list_entry to host address space translation offset so it could be used to represent bridge resources. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Jiang Liu 提交于
Change function acpi_dev_resource_address_space() and acpi_dev_resource_ext_address_space() to return address space translation offset. It's based on a patch from Yinghai Lu <yinghai@kernel.org>. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 03 2月, 2015 2 次提交
-
-
由 Mikulas Patocka 提交于
The patch e22b886a ("sched/wait: Add might_sleep() checks") introduced a bug in the raid5 subsystem. The function raid5_quiesce() (and resize_stripes()) uses the 'cmd' part to release and acquire a spinlock (so we call the sleep primitives in atomic context), and therefore we cannot do the might_sleep() check. Remove it. Fixes: e22b886a ("sched/wait: Add might_sleep() checks") Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1502020935580.13510@file01.intranet.prod.int.rdu2.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jack Morgenstein 提交于
Commit de966c59 (net/mlx4_core: Support more than 64 VFs) was meant to allow up to 126 VFs. However, due to leaving MLX4_MFUNC_MAX too low, using more than 80 VFs resulted in memory corruptions (and Oopses) when more than 80 VFs were requested. In addition, the number of slaves was left too high. This commit fixes these issues. Fixes: de966c59 ("net/mlx4_core: Support more than 64 VFs") Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: NAmir Vadai <amirv@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 2月, 2015 2 次提交
-
-
由 Eric Dumazet 提交于
In commit be9f4a44 ("ipv4: tcp: remove per net tcp_sock") I tried to address contention on a socket lock, but the solution I chose was horrible : commit 3a7c384f ("ipv4: tcp: unicast_sock should not land outside of TCP stack") addressed a selinux regression. commit 0980e56e ("ipv4: tcp: set unicast_sock uc_ttl to -1") took care of another regression. commit b5ec8eea ("ipv4: fix ip_send_skb()") fixed another regression. commit 811230cd ("tcp: ipv4: initialize unicast_sock sk_pacing_rate") was another shot in the dark. Really, just use a proper socket per cpu, and remove the skb_orphan() call, to re-enable flow control. This solves a serious problem with FQ packet scheduler when used in hostile environments, as we do not want to allocate a flow structure for every RST packet sent in response to a spoofed packet. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report on nested sleep conditions, which we generally want to avoid because the inner sleeping operation can re-set the thread state to TASK_RUNNING, but that will then cause the outer sleep loop not actually sleep when it calls schedule. However, that's actually valid traditional behavior, with the inner sleep being some fairly rare case (like taking a sleeping lock that normally doesn't actually need to sleep). And the debug code would actually change the state of the task to TASK_RUNNING internally, which makes that kind of traditional and working code not work at all, because now the nested sleep doesn't just sometimes cause the outer one to not block, but will cause it to happen every time. In particular, it will cause the cardbus kernel daemon (pccardd) to basically busy-loop doing scheduling, converting a laptop into a heater, as reported by Bruno Prémont. But there may be other legacy uses of that nested sleep model in other drivers that are also likely to never get converted to the new model. This fixes both cases: - don't set TASK_RUNNING when the nested condition happens (note: even if WARN_ONCE() only _warns_ once, the return value isn't whether the warning happened, but whether the condition for the warning was true. So despite the warning only happening once, the "if (WARN_ON(..))" would trigger for every nested sleep. - in the cases where we knowingly disable the warning by using "sched_annotate_sleep()", don't change the task state (that is used for all core scheduling decisions), instead use '->task_state_change' that is used for the debugging decision itself. (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the suggested change to use 'task_state_change' as part of the test) Reported-and-bisected-by: NBruno Prémont <bonbons@linux-vserver.org> Tested-by: NRafael J Wysocki <rjw@rjwysocki.net> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de>, Cc: Ilya Dryomov <ilya.dryomov@inktank.com>, Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Hurley <peter@hurleysoftware.com>, Cc: Davidlohr Bueso <dave@stgolabs.net>, Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 2月, 2015 1 次提交
-
-
由 Eric Dumazet 提交于
Doing the following commands on a non idle network device panics the box instantly, because cpu_bstats gets overwritten by stats. tc qdisc add dev eth0 root <your_favorite_qdisc> ... some traffic (one packet is enough) ... tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc> [ 325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c [ 325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90 [ 325.369158] PGD 1fa7067 PUD 0 [ 325.372254] Oops: 0000 [#1] SMP [ 325.375514] Modules linked in: ... [ 325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163 [ 325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000 [ 325.419518] RIP: 0010:[<ffffffff81541c9e>] [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90 [ 325.428506] RSP: 0018:ffff881ff2fa7928 EFLAGS: 00010286 [ 325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c [ 325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060 [ 325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000 [ 325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744 [ 325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c [ 325.469536] FS: 00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000 [ 325.477630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0 [ 325.490510] Stack: [ 325.492524] ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0 [ 325.499990] ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744 [ 325.507460] 00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0 [ 325.514956] Call Trace: [ 325.517412] [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230 [ 325.523250] [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60 [ 325.529349] [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590 [ 325.535117] [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240 [ 325.540963] [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20 [ 325.546532] [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0 [ 325.552145] [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40 [ 325.557558] [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220 [ 325.563317] [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0 Lets play safe and not use an union : percpu 'pointers' are mostly read anyway, and we have typically few qdiscs per host. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Fixes: 22e0f8b9 ("net: sched: make bstats per cpu and estimator RCU safe") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 1月, 2015 1 次提交
-
-
由 Toshiaki Makita 提交于
vlan_get_protocol() could not get network protocol if a skb has a 802.1ad vlan tag or multiple vlans, which caused incorrect checksum calculation in several drivers. Fix vlan_get_protocol() to retrieve network protocol instead of incorrect vlan protocol. As the logic is the same as skb_network_protocol(), create a common helper function __vlan_get_protocol() and call it from existing functions. Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 1月, 2015 1 次提交
-
-
由 Linus Torvalds 提交于
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a "you should SIGSEGV" error, because the SIGSEGV case was generally handled by the caller - usually the architecture fault handler. That results in lots of duplication - all the architecture fault handlers end up doing very similar "look up vma, check permissions, do retries etc" - but it generally works. However, there are cases where the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV. In particular, when accessing the stack guard page, libsigsegv expects a SIGSEGV. And it usually got one, because the stack growth is handled by that duplicated architecture fault handler. However, when the generic VM layer started propagating the error return from the stack expansion in commit fee7e49d ("mm: propagate error from stack expansion even for guard page"), that now exposed the existing VM_FAULT_SIGBUS result to user space. And user space really expected SIGSEGV, not SIGBUS. To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those duplicate architecture fault handlers about it. They all already have the code to handle SIGSEGV, so it's about just tying that new return value to the existing code, but it's all a bit annoying. This is the mindless minimal patch to do this. A more extensive patch would be to try to gather up the mostly shared fault handling logic into one generic helper routine, and long-term we really should do that cleanup. Just from this patch, you can generally see that most architectures just copied (directly or indirectly) the old x86 way of doing things, but in the meantime that original x86 model has been improved to hold the VM semaphore for shorter times etc and to handle VM_FAULT_RETRY and other "newer" things, so it would be a good idea to bring all those improvements to the generic case and teach other architectures about them too. Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de> Tested-by: NJan Engelhardt <jengelh@inai.de> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots" Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 1月, 2015 2 次提交
-
-
由 Neal Cardwell 提交于
LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that cover more than the RFC-specified maximum of 2 packets. These stretch ACKs can cause serious performance shortfalls in common congestion control algorithms that were designed and tuned years ago with receiver hosts that were not using LRO or GRO, and were instead politely ACKing every other packet. This patch series fixes Reno and CUBIC to handle stretch ACKs. This patch prepares for the upcoming stretch ACK bug fix patches. It adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and changes all congestion control algorithms to pass in 1 for the ACKed count. It also changes tcp_slow_start() to return the number of packet ACK "credits" that were not processed in slow start mode, and can be processed by the congestion control module in additive increase mode. In future patches we will fix tcp_cong_avoid_ai() to handle stretch ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start and additive increase mode. Reported-by: NEyal Perry <eyalpe@mellanox.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Takashi Iwai 提交于
When ak4114 work calls its callback and the callback invokes ak4114_reinit(), it stalls due to flush_delayed_work(). For avoiding this, control the reentrance by introducing a refcount. Also flush_delayed_work() is replaced with cancel_delayed_work_sync(). The exactly same bug is present in ak4113.c and fixed as well. Reported-by: NPavel Hofman <pavel.hofman@ivitera.com> Acked-by: NJaroslav Kysela <perex@perex.cz> Tested-by: NPavel Hofman <pavel.hofman@ivitera.com> Cc: <stable@vger.kernel.org> Signed-off-by: NTakashi Iwai <tiwai@suse.de>
-
- 28 1月, 2015 2 次提交
-
-
由 Peter Zijlstra 提交于
The fix from 9fc81d87 ("perf: Fix events installation during moving group") was incomplete in that it failed to recognise that creating a group with events for different CPUs is semantically broken -- they cannot be co-scheduled. Furthermore, it leads to real breakage where, when we create an event for CPU Y and then migrate it to form a group on CPU X, the code gets confused where the counter is programmed -- triggered in practice as well by me via the perf fuzzer. Fix this by tightening the rules for creating groups. Only allow grouping of counters that can be co-scheduled in the same context. This means for the same task and/or the same cpu. Fixes: 9fc81d87 ("perf: Fix events installation during moving group") Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jan Kara 提交于
Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which tracks space limits and usage in 512-byte blocks. However VFS quotas track usage in bytes (as some filesystems require that) and we need to somehow pass this information. Upto now it wasn't a problem because we didn't do any unit conversion (thus VFS quota routines happily stuck number of bytes into d_bcount field of struct fd_disk_quota). Only if you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone tried this but reportedly some Samba users hit the problem in practice. So when we want interfaces compatible we need to fix this. We bite the bullet and define another quota structure used for passing information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have to have more conversion routines in fs/quota/quota.c and another copying of quota structure slows down getting of quota information by about 2% but it seems cleaner than overloading e.g. units of d_bcount to bytes. CC: stable@vger.kernel.org Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJan Kara <jack@suse.cz>
-
- 27 1月, 2015 5 次提交
-
-
由 Hannes Frederic Sowa 提交于
Not caching dst_entries which cause redirects could be exploited by hosts on the same subnet, causing a severe DoS attack. This effect aggravated since commit f8864972 ("ipv4: fix dst race in sk_dst_get()"). Lookups causing redirects will be allocated with DST_NOCACHE set which will force dst_release to free them via RCU. Unfortunately waiting for RCU grace period just takes too long, we can end up with >1M dst_entries waiting to be released and the system will run OOM. rcuos threads cannot catch up under high softirq load. Attaching the flag to emit a redirect later on to the specific skb allows us to cache those dst_entries thus reducing the pressure on allocation and deallocation. This issue was discovered by Marcelo Leitner. Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: NMarcelo Leitner <mleitner@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NJulian Anastasov <ja@ssi.bg> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Pranith Kumar 提交于
There are missing dummy routines for log_buf_addr_get() and log_buf_len_get() for when CONFIG_PRINTK is not set causing build failures. This patch adds these dummy routines at the appropriate location. Signed-off-by: NPranith Kumar <bobby.prani@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: NPetr Mladek <pmladek@suse.cz> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
The OOM killing invocation does a lot of duplicative checks against the task's allocation context. Rework it to take advantage of the existing checks in the allocator slowpath. The OOM killer is invoked when the allocator is unable to reclaim any pages but the allocation has to keep looping. Instead of having a check for __GFP_NORETRY hidden in oom_gfp_allowed(), just move the OOM invocation to the true branch of should_alloc_retry(). The __GFP_FS check from oom_gfp_allowed() can then be moved into the OOM avoidance branch in __alloc_pages_may_oom(), along with the PF_DUMPCORE test. __alloc_pages_may_oom() can then signal to the caller whether the OOM killer was invoked, instead of requiring it to duplicate the order and high_zoneidx checks to guess this when deciding whether to continue. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jean Delvare 提交于
Make the slave support depend on CONFIG_I2C_SLAVE. Otherwise it gets included unconditionally, even when it is not needed. I2C bus drivers which implement slave support must select I2C_SLAVE. Signed-off-by: NJean Delvare <jdelvare@suse.de> Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
-
由 Lars-Peter Clausen 提交于
In some cases it is necessary to before additional operations after the device has been initialized and before the device is registered. This can for example be resetting the device. This patch introduces a new function snd_soc_alloc_ac97_codec() which is similar to snd_soc_new_ac97_codec() except that it does not register the device. Any users of snd_soc_alloc_ac97_codec() are responsible for calling device_add() manually. Fixes: 6794f709 ("ASoC: ac97: Drop delayed device registration") Reported-by: NManuel Lauss <manuel.lauss@gmail.com> Signed-off-by: NLars-Peter Clausen <lars@metafoo.de> Tested-by: NManuel Lauss <manuel.lauss@gmail.com> Signed-off-by: NMark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org
-
- 26 1月, 2015 2 次提交
-
-
由 Lv Zheng 提交于
struct acpi_resource_address and struct acpi_resource_extended_address64 share substracts just at different offsets. To unify the parsing functions, OSPMs like Linux need a new ACPI_ADDRESS64_ATTRIBUTE as their substructs, so they can extract the shared data. This patch also synchronizes the structure changes to the Linux kernel. The usages are searched by matching the following keywords: 1. acpi_resource_address 2. acpi_resource_extended_address 3. ACPI_RESOURCE_TYPE_ADDRESS 4. ACPI_RESOURCE_TYPE_EXTENDED_ADDRESS And we found and fixed the usages in the following files: arch/ia64/kernel/acpi-ext.c arch/ia64/pci/pci.c arch/x86/pci/acpi.c arch/x86/pci/mmconfig-shared.c drivers/xen/xen-acpi-memhotplug.c drivers/acpi/acpi_memhotplug.c drivers/acpi/pci_root.c drivers/acpi/resource.c drivers/char/hpet.c drivers/pnp/pnpacpi/rsparser.c drivers/hv/vmbus_drv.c Build tests are passed with defconfig/allnoconfig/allyesconfig and defconfig+CONFIG_ACPI=n. Original-by: NThomas Gleixner <tglx@linutronix.de> Original-by: NJiang Liu <jiang.liu@linux.intel.com> Signed-off-by: NLv Zheng <lv.zheng@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Lv Zheng 提交于
ACPICA has implemented acpi_unload_parent_table() which can exactly replace the acpi_get_id()/acpi_unload_table_id() implemented in Linux kernel. The acpi_unload_parent_table() has been unit tested in ACPICA simulation environment. This patch can also help to reduce the source code differences between Linux and ACPICA. Signed-off-by: NLv Zheng <lv.zheng@intel.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Tested-by: NOctavian Purdila <octavian.purdila@intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 22 1月, 2015 1 次提交
-
-
由 Rusty Russell 提交于
James Bottomley points out that it will be -1 during unload. It's only used for diagnostics, so let's not hide that as it could be a clue as to what's gone wrong. Cc: Jason Wessel <jason.wessel@windriver.com> Acked-and-documention-added-by: NJames Bottomley <James.Bottomley@HansenPartnership.com> Reviewed-by: NMasami Hiramatsu <maasami.hiramatsu.pt@hitachi.com> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 20 1月, 2015 2 次提交
-
-
由 Rusty Russell 提交于
Nothing needs the module pointer any more, and the next patch will call it from RCU, where the module itself might no longer exist. Removing the arg is the safest approach. This just codifies the use of the module_alloc/module_free pattern which ftrace and bpf use. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Acked-by: NAlexei Starovoitov <ast@kernel.org> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Ley Foon Tan <lftan@altera.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: x86@kernel.org Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: linux-cris-kernel@axis.com Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: nios2-dev@lists.rocketboards.org Cc: linuxppc-dev@lists.ozlabs.org Cc: sparclinux@vger.kernel.org Cc: netdev@vger.kernel.org
-
由 Rusty Russell 提交于
Archs have been abusing module_free() to clean up their arch-specific allocations. Since module_free() is also (ab)used by BPF and trace code, let's keep it to simple allocations, and provide a hook called before that. This means that avr32, ia64, parisc and s390 no longer need to implement their own module_free() at all. avr32 doesn't need module_finalize() either. Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-kernel@vger.kernel.org Cc: linux-ia64@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linux-s390@vger.kernel.org
-
- 19 1月, 2015 3 次提交
-
-
由 Dan Williams 提交于
Ronny reports: https://bugzilla.kernel.org/show_bug.cgi?id=87101 "Since commit 8a4aeec8 "libata/ahci: accommodate tag ordered controllers" the access to the harddisk on the first SATA-port is failing on its first access. The access to the harddisk on the second port is working normal. When reverting the above commit, access to both harddisks is working fine again." Maintain tag ordered submission as the default, but allow sata_sil24 to continue with the old behavior. Cc: <stable@vger.kernel.org> Cc: Tejun Heo <tj@kernel.org> Reported-by: NRonny Hegewald <Ronny.Hegewald@online.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Pablo Neira Ayuso 提交于
The user can crash the kernel if it uses any of the existing NAT expressions from the wrong hook, so add some code to validate this when loading the rule. This patch introduces nft_chain_validate_hooks() which is based on an existing function in the bridge version of the reject expression. Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org> -
由 Christian Borntraeger 提交于
sparse complains about include/trace/events/kvm.h:163:1: error: directive in argument list include/trace/events/kvm.h:167:1: error: directive in argument list include/trace/events/kvm.h:169:1: error: directive in argument list and sparse is right. Preprocessing directives in an argument of a macro are undefined behaviour as of C99 6.10.3p11. Lets use an indirection to fix this. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-