- 12 12月, 2015 3 次提交
-
-
由 stephen hemminger 提交于
The file ila.h used for lightweight tunnels is being used by iproute2 but is not exported yet. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
XFRM can deal with SYNACK messages, sent while listener socket is not locked. We add proper rcu protection to __xfrm_sk_clone_policy() and xfrm_sk_policy_lookup() This might serve as the first step to remove xfrm.xfrm_policy_lock use in fast path. Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer") Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We will soon switch sk->sk_policy[] to RCU protection, as SYNACK packets are sent while listener socket is not locked. This patch simply adds RCU grace period before struct xfrm_policy freeing, and the corresponding rcu_head in struct xfrm_policy. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 12月, 2015 3 次提交
-
-
由 Eric Dumazet 提交于
TCP SYNACK messages might now be attached to request sockets. XFRM needs to get back to a listener socket. Adds new helpers that might be used elsewhere : sk_to_full_sk() and sk_const_to_full_sk() Note: We also need to add RCU protection for xfrm lookups, now TCP/DCCP have lockless listener processing. This will be addressed in separate patches. Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener") Reported-by: NDave Jones <davej@codemonkey.org.uk> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tomer Tayar 提交于
Concurrent non-blocking slowpath ramrods can be completed out-of-order on the completion chain. Recycling completed elements, while previously sent elements are still completion pending, can lead to overriding of active elements on the chain. Furthermore, sending pending slowpath ramrods currently lacks the update of the chain element physical pointer. This patch: * Ensures that ramrods are sent to the FW with consecutive echo values. * Handles out-of-order completions by freeing only first successive completed entries. * Updates the chain element physical pointer when copying a pending element into a free element for sending. Signed-off-by: NTomer Tayar <Tomer.Tayar@qlogic.com> Signed-off-by: NManish Chopra <manish.chopra@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tomer Tayar 提交于
The amount of chain next pointer elements between the producer and the consumer indices depends on which pages they currently point to. The current calculation is based only on their difference, and it can lead to a number of free elements which is higher by 1 than the actual value. Signed-off-by: NTomer Tayar <Tomer.Tayar@qlogic.com> Signed-off-by: NManish Chopra <manish.chopra@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 12月, 2015 2 次提交
-
-
由 Felix Fietkau 提交于
This semicolon causes a build error if the function call is wrapped in parentheses. Fixes: aabc92bb ("net: add __netdev_alloc_pcpu_stats() to indicate gfp flags") Reported-by: NImre Kaloz <kaloz@openwrt.org> Signed-off-by: NFelix Fietkau <nbd@openwrt.org> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 lucien 提交于
when A sends a data to B, then A close() and enter into SHUTDOWN_PENDING state, if B neither claim his rwnd is 0 nor send SACK for this data, A will keep retransmitting this data until t5 timeout, Max.Retrans times can't work anymore, which is bad. if B's rwnd is not 0, it should send abort after Max.Retrans times, only when B's rwnd == 0 and A's retransmitting beyonds Max.Retrans times, A will start t5 timer, which is also commit f8d96052 ("sctp: Enforce retransmission limit during shutdown") means, but it lacks the condition peer rwnd == 0. so fix it by adding a bit (zero_window_announced) in peer to record if the last rwnd is 0. If it was, zero_window_announced will be set. and use this bit to decide if start t5 timer when local.state is SHUTDOWN_PENDING. Fixes: commit f8d96052 ("sctp: Enforce retransmission limit during shutdown") Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 12月, 2015 2 次提交
-
-
由 Marcelo Ricardo Leitner 提交于
Dmitry Vyukov reported that SCTP was triggering a WARN on socket destroy related to disabling sock timestamp. When SCTP accepts an association or peel one off, it copies sock flags but forgot to call net_enable_timestamp() if a packet timestamping flag was copied, leading to extra calls to net_disable_timestamp() whenever such clones were closed. The fix is to call net_enable_timestamp() whenever we copy a sock with that flag on, like tcp does. Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
Commit 3511494c ("vxlan: Group Policy extension") changed definition of VXLAN_HF_RCO from 0x00200000 to BIT(24). This is obviously incorrect. It's also in violation with the RFC draft. Fixes: 3511494c ("vxlan: Group Policy extension") Cc: Thomas Graf <tgraf@suug.ch> Cc: Tom Herbert <therbert@google.com> Signed-off-by: NJiri Benc <jbenc@redhat.com> Acked-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 12月, 2015 1 次提交
-
-
由 Herbert Xu 提交于
Thomas and Phil observed that under stress rhashtable insertion sometimes failed with EBUSY, even though this error should only ever been seen when we're under attack and our hash chain length has grown to an unacceptable level, even after a rehash. It turns out that the logic for detecting whether there is an existing rehash is faulty. In particular, when two threads both try to grow the same table at the same time, one of them may see the newly grown table and thus erroneously conclude that it had been rehashed. This is what leads to the EBUSY error. This patch fixes this by remembering the current last table we used during insertion so that rhashtable_insert_rehash can detect when another thread has also done a resize/rehash. When this is detected we will give up our resize/rehash and simply retry the insertion with the new table. Reported-by: NThomas Graf <tgraf@suug.ch> Reported-by: NPhil Sutter <phil@nwl.cc> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Tested-by: NPhil Sutter <phil@nwl.cc> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 12月, 2015 2 次提交
-
-
由 Eric Dumazet 提交于
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue devices. One problem is that it updates sch->q.qlen and sch->qstats.drops on the mq/mqprio root qdisc, while it should not : Daniele reported underflows errors : [ 681.774821] PAX: sch->q.qlen: 0 n: 1 [ 681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head; [ 681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G O 4.2.6.201511282239-1-grsec #1 [ 681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015 [ 681.774956] ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c [ 681.774959] ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b [ 681.774960] ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001 [ 681.774962] Call Trace: [ 681.774967] [<ffffffffa95d2810>] dump_stack+0x4c/0x7f [ 681.774970] [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50 [ 681.774972] [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160 [ 681.774976] [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel] [ 681.774978] [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel] [ 681.774980] [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0 [ 681.774983] [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160 [ 681.774985] [<ffffffffa90664c1>] __do_softirq+0xf1/0x200 [ 681.774987] [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30 [ 681.774989] [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260 [ 681.774991] [<ffffffffa9089560>] ? sort_range+0x40/0x40 [ 681.774992] [<ffffffffa9085fe4>] kthread+0xe4/0x100 [ 681.774994] [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170 [ 681.774995] [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70 mq/mqprio have their own ways to report qlen/drops by folding stats on all their queues, with appropriate locking. A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup() without proper locking : concurrent qdisc updates could corrupt the list that qdisc_match_from_root() parses to find a qdisc given its handle. Fix first problem adding a TCQ_F_NOPARENT qdisc flag that qdisc_tree_decrease_qlen() can use to abort its tree traversal, as soon as it meets a mq/mqprio qdisc children. Second problem can be fixed by RCU protection. Qdisc are already freed after RCU grace period, so qdisc_list_add() and qdisc_list_del() simply have to use appropriate rcu list variants. A future patch will add a per struct netdev_queue list anchor, so that qdisc_tree_decrease_qlen() can have more efficient lookups. Reported-by: NDaniele Fucini <dfucini@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Cong Wang <cwang@twopensource.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
While testing the np->opt RCU conversion, I found that UDP/IPv6 was using a mixture of xchg() and sk_dst_lock to protect concurrent changes to sk->sk_dst_cache, leading to possible corruptions and crashes. ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest way to fix the mess is to remove sk_dst_lock completely, as we did for IPv4. __ip6_dst_store() and ip6_dst_store() share same implementation. sk_setup_caps() being called with socket lock being held or not, we have to use sk_dst_set() instead of __sk_dst_set() Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);" in ip6_dst_store() before the sk_setup_caps(sk, dst) call. This is because ip6_dst_store() can be called from process context, without any lock held. As soon as the dst is installed in sk->sk_dst_cache, dst can be freed from another cpu doing a concurrent ip6_dst_store() Doing the dst dereference before doing the install is needed to make sure no use after free would trigger. Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 12月, 2015 2 次提交
-
-
由 Marcelo Ricardo Leitner 提交于
They don't need to be any bigger than that and with this we start a new bitfield for tracking association runtime stuff, like zero window situation. Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NVlad Yasevich <vyasevich@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This patch addresses multiple problems : UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions while socket is not locked : Other threads can change np->opt concurrently. Dmitry posted a syzkaller (http://github.com/google/syzkaller) program desmonstrating use-after-free. Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock() and dccp_v6_request_recv_sock() also need to use RCU protection to dereference np->opt once (before calling ipv6_dup_options()) This patch adds full RCU protection to np->opt Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 12月, 2015 3 次提交
-
-
由 Eric Dumazet 提交于
Dmitry provided a syzkaller (http://github.com/google/syzkaller) triggering a fault in sock_wake_async() when async IO is requested. Said program stressed af_unix sockets, but the issue is generic and should be addressed in core networking stack. The problem is that by the time sock_wake_async() is called, we should not access the @flags field of 'struct socket', as the inode containing this socket might be freed without further notice, and without RCU grace period. We already maintain an RCU protected structure, "struct socket_wq" so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it is the safe route. It also reduces number of cache lines needing dirtying, so might provide a performance improvement anyway. In followup patches, we might move remaining flags (SOCK_NOSPACE, SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket' being mostly read and let it being shared between cpus. Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock *sk) and sk_clear_bit(int net, struct sock *sk) are added so that following patch can change their implementation. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
This reverts commit ab450605. In IPv6, we cannot inherit the dst of the original dst. ndisc packets are IPv6 packets and may take another route than the original packet. This patch breaks the following scenario: a packet comes from eth0 and is forwarded through vxlan1. The encapsulated packet triggers an NS which cannot be sent because of the wrong route. CC: Jiri Benc <jbenc@redhat.com> CC: Thomas Graf <tgraf@suug.ch> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 11月, 2015 3 次提交
-
-
由 Martin Blumenstingl 提交于
Commit 9c707762 ("packet: make packet_snd fail on len smaller than l2 header") added validation for the packet size in packet_snd. This change enforces that every packet needs a header (with at least hard_header_len bytes) plus a payload with at least one byte. Before this change the payload was optional. This fixes PPPoE connections which do not have a "Service" or "Host-Uniq" configured (which is violating the spec, but is still widely used in real-world setups). Those are currently failing with the following message: "pppd: packet size is too short (24 <= 24)" Signed-off-by: NMartin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Reinecke 提交于
When a cloned request is retried on other queues it always needs to be checked against the queue limits of that queue. Otherwise the calculations for nr_phys_segments might be wrong, leading to a crash in scsi_init_sgtable(). To clarify this the patch renames blk_rq_check_limits() to blk_cloned_rq_check_limits() and removes the symbol export, as the new function should only be used for cloned requests and never exported. Cc: Mike Snitzer <snitzer@redhat.com> Cc: Ewan Milne <emilne@redhat.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: NHannes Reinecke <hare@suse.de> Fixes: e2a60da7 ("block: Clean up special command handling logic") Cc: stable@vger.kernel.org # 3.7+ Acked-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Matias Bjørling 提交于
The get_bb_tbl function takes ppa as a generic address, which is converted to the ppa device address within the device driver. When the update_bbtbl callback is called from get_bb_tbl, the device specific ppa is used, instead of the generic ppa. Make sure to pass the generic ppa. Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 29 11月, 2015 2 次提交
-
-
由 Bart Van Assche 提交于
The last user is gone. Hence remove this function. Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Joern Engel <joern@logfs.org> Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
由 Nicholas Bellinger 提交于
This patch addresses a race + use after free where the first stage of COMPARE_AND_WRITE in compare_and_write_callback() is rescheduled after the backend sends the secondary WRITE, resulting in second stage compare_and_write_post() callback completing in target_complete_ok_work() before the first can return. Because current code depends on checking se_cmd->se_cmd_flags after return from se_cmd->transport_complete_callback(), this results in first stage having SCF_COMPARE_AND_WRITE_POST set, which incorrectly falls through into second stage CAW processing code, eventually triggering a NULL pointer dereference due to use after free. To address this bug, pass in a new *post_ret parameter into se_cmd->transport_complete_callback(), and depend upon this value instead of ->se_cmd_flags to determine when to return or fall through into ->queue_status() code for CAW. Cc: Sagi Grimberg <sagig@mellanox.com> Cc: <stable@vger.kernel.org> # v3.12+ Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
-
- 26 11月, 2015 2 次提交
-
-
由 Gabriele Paoloni 提交于
Commit b3a72384 ("ARM/PCI: Replace pci_sys_data->align_resource with global function pointer") introduced an ARM-specific align_resource() function pointer. This is not portable to other arches and doesn't work for platforms with two different PCIe host bridge controllers. Move the function pointer to the pci_host_bridge structure so each host bridge driver can specify its own align_resource() function. Signed-off-by: NGabriele Paoloni <gabriele.paoloni@huawei.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Reviewed-by: NArnd Bergmann <arnd@arndb.de>
-
由 Daniel Borkmann 提交于
Currently, when having map file descriptors pointing to program arrays, there's still the issue that we unconditionally flush program array contents via bpf_fd_array_map_clear() in bpf_map_release(). This happens when such a file descriptor is released and is independent of the map's refcount. Having this flush independent of the refcount is for a reason: there can be arbitrary complex dependency chains among tail calls, also circular ones (direct or indirect, nesting limit determined during runtime), and we need to make sure that the map drops all references to eBPF programs it holds, so that the map's refcount can eventually drop to zero and initiate its freeing. Btw, a walk of the whole dependency graph would not be possible for various reasons, one being complexity and another one inconsistency, i.e. new programs can be added to parts of the graph at any time, so there's no guaranteed consistent state for the time of such a walk. Now, the program array pinning itself works, but the issue is that each derived file descriptor on close would nevertheless call unconditionally into bpf_fd_array_map_clear(). Instead, keep track of users and postpone this flush until the last reference to a user is dropped. As this only concerns a subset of references (f.e. a prog array could hold a program that itself has reference on the prog array holding it, etc), we need to track them separately. Short analysis on the refcounting: on map creation time usercnt will be one, so there's no change in behaviour for bpf_map_release(), if unpinned. If we already fail in map_create(), we are immediately freed, and no file descriptor has been made public yet. In bpf_obj_pin_user(), we need to probe for a possible map in bpf_fd_probe_obj() already with a usercnt reference, so before we drop the reference on the fd with fdput(). Therefore, if actual pinning fails, we need to drop that reference again in bpf_any_put(), otherwise we keep holding it. When last reference drops on the inode, the bpf_any_put() in bpf_evict_inode() will take care of dropping the usercnt again. In the bpf_obj_get_user() case, the bpf_any_get() will grab a reference on the usercnt, still at a time when we have the reference on the path. Should we later on fail to grab a new file descriptor, bpf_any_put() will drop it, otherwise we hold it until bpf_map_release() time. Joint work with Alexei. Fixes: b2197755 ("bpf: add support for persistent maps/progs") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 11月, 2015 3 次提交
-
-
由 Arnd Bergmann 提交于
As reported by Michal Simek, building an ARM64 kernel with CONFIG_UID16 disabled currently fails because the system call table still needs to reference the individual function entry points that are provided by kernel/sys_ni.c in this case, and the declarations are hidden inside of #ifdef CONFIG_UID16: arch/arm64/include/asm/unistd32.h:57:8: error: 'sys_lchown16' undeclared here (not in a function) __SYSCALL(__NR_lchown, sys_lchown16) I believe this problem only exists on ARM64, because older architectures tend to not need declarations when their system call table is built in assembly code, while newer architectures tend to not need UID16 support. ARM64 only uses these system calls for compatibility with 32-bit ARM binaries. This changes the CONFIG_UID16 check into CONFIG_HAVE_UID16, which is set unconditionally on ARM64 with CONFIG_COMPAT, so we see the declarations whenever we need them, but otherwise the behavior is unchanged. Fixes: af1839eb ("Kconfig: clean up the long arch list for the UID16 config option") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NWill Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Michal Kubeček 提交于
If a fragmented multicast packet is received on an ethernet device which has an active macvlan on top of it, each fragment is duplicated and received both on the underlying device and the macvlan. If some fragments for macvlan are processed before the whole packet for the underlying device is reassembled, the "overlapping fragments" test in ip6_frag_queue() discards the whole fragment queue. To resolve this, add device ifindex to the search key and require it to match reassembling multicast packets and packets to link-local addresses. Note: similar patch has been already submitted by Yoshifuji Hideaki in http://patchwork.ozlabs.org/patch/220979/ but got lost and forgotten for some reason. Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoffer Dall 提交于
We were incorrectly removing the active state from the physical distributor on the timer interrupt when the timer output level was deasserted. We shouldn't be doing this without considering the virtual interrupt's active state, because the architecture requires that when an LR has the HW bit set and the pending or active bits set, then the physical interrupt must also have the corresponding bits set. This addresses an issue where we have been observing an inconsistency between the LR state and the physical distributor state where the LR state was active and the physical distributor was not active, which shouldn't happen. Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 24 11月, 2015 4 次提交
-
-
由 Jeff Layton 提交于
When LAYOUTGET gets NFS4ERR_DELAY, we currently will wait 15s before retrying the call. That is a _very_ long time, so add a timeout value to struct nfs4_layoutget and pass nfs4_async_handle_error a pointer to it. This allows the RPC engine to use a sliding delay window, instead of a 15s delay. Signed-off-by: NJeff Layton <jeff.layton@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Christoph Hellwig 提交于
The NFS CLONE_RANGE defintion was wrong and thus never worked. Fix this by simply using the btrfs ioctl defintion. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Arnd Bergmann 提交于
When the prototype for thermal_zone_bind_cooling_device changed, the static inline wrapper function was left alone, which in theory can cause build warnings: I have seen this error in the past: drivers/thermal/db8500_thermal.c: In function 'db8500_cdev_bind': drivers/thermal/db8500_thermal.c:78:9: error: too many arguments to function 'thermal_zone_bind_cooling_device' ret = thermal_zone_bind_cooling_device(thermal, i, cdev, while this one no longer shows up, there is no doubt that the prototype is still wrong, so let's just fix it anyway. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Fixes: 6cd9e9f6 ("thermal: of: fix cooling device weights in device tree") Signed-off-by: NEduardo Valentin <edubezval@gmail.com>
-
由 Rainer Weikusat 提交于
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes: An AF_UNIX datagram socket being the client in an n:1 association with some server socket is only allowed to send messages to the server if the receive queue of this socket contains at most sk_max_ack_backlog datagrams. This implies that prospective writers might be forced to go to sleep despite none of the message presently enqueued on the server receive queue were sent by them. In order to ensure that these will be woken up once space becomes again available, the present unix_dgram_poll routine does a second sock_poll_wait call with the peer_wait wait queue of the server socket as queue argument (unix_dgram_recvmsg does a wake up on this queue after a datagram was received). This is inherently problematic because the server socket is only guaranteed to remain alive for as long as the client still holds a reference to it. In case the connection is dissolved via connect or by the dead peer detection logic in unix_dgram_sendmsg, the server socket may be freed despite "the polling mechanism" (in particular, epoll) still has a pointer to the corresponding peer_wait queue. There's no way to forcibly deregister a wait queue with epoll. Based on an idea by Jason Baron, the patch below changes the code such that a wait_queue_t belonging to the client socket is enqueued on the peer_wait queue of the server whenever the peer receive queue full condition is detected by either a sendmsg or a poll. A wake up on the peer queue is then relayed to the ordinary wait queue of the client socket via wake function. The connection to the peer wait queue is again dissolved if either a wake up is about to be relayed or the client socket reconnects or a dead peer is detected or the client socket is itself closed. This enables removing the second sock_poll_wait from unix_dgram_poll, thus avoiding the use-after-free, while still ensuring that no blocked writer sleeps forever. Signed-off-by: NRainer Weikusat <rweikusat@mobileactivedefense.com> Fixes: ec0d215f ("af_unix: fix 'poll for write'/connected DGRAM sockets") Reviewed-by: NJason Baron <jbaron@akamai.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 11月, 2015 1 次提交
-
-
由 Jesper Dangaard Brouer 提交于
Adjust kmem_cache_alloc_bulk API before we have any real users. Adjust API to return type 'int' instead of previously type 'bool'. This is done to allow future extension of the bulk alloc API. A future extension could be to allow SLUB to stop at a page boundary, when specified by a flag, and then return the number of objects. The advantage of this approach, would make it easier to make bulk alloc run without local IRQs disabled. With an approach of cmpxchg "stealing" the entire c->freelist or page->freelist. To avoid overshooting we would stop processing at a slab-page boundary. Else we always end up returning some objects at the cost of another cmpxchg. To keep compatible with future users of this API linking against an older kernel when using the new flag, we need to return the number of allocated objects with this API change. Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NChristoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 11月, 2015 5 次提交
-
-
由 Peter Hurley 提交于
The data to audit/record is in the 'from' buffer (ie., the input read buffer). Fixes: 72586c60 ("n_tty: Fix auditing support for cannonical mode") Cc: stable <stable@vger.kernel.org> # 4.1+ Cc: Miloslav Trmač <mitr@redhat.com> Signed-off-by: NPeter Hurley <peter@hurleysoftware.com> Acked-by: NLaura Abbott <labbott@fedoraproject.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jeff Layton 提交于
sparse says: include/linux/gfp.h:274:26: warning: incorrect type in return expression (different base types) include/linux/gfp.h:274:26: expected bool include/linux/gfp.h:274:26: got restricted gfp_t ...add a forced cast to silence the warning. Signed-off-by: NJeff Layton <jeff.layton@primarydata.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Richard Weinberger 提交于
sigsuspend() is nowhere used except in signal.c itself, so we can mark it static do not pollute the global namespace. But this patch is more than a boring cleanup patch, it fixes a real issue on UserModeLinux. UML has a special console driver to display ttys using xterm, or other terminal emulators, on the host side. Vegard reported that sometimes UML is unable to spawn a xterm and he's facing the following warning: WARNING: CPU: 0 PID: 908 at include/linux/thread_info.h:128 sigsuspend+0xab/0xc0() It turned out that this warning makes absolutely no sense as the UML xterm code calls sigsuspend() on the host side, at least it tries. But as the kernel itself offers a sigsuspend() symbol the linker choose this one instead of the glibc wrapper. Interestingly this code used to work since ever but always blocked signals on the wrong side. Some recent kernel change made the WARN_ON() trigger and uncovered the bug. It is a wonderful example of how much works by chance on computers. :-) Fixes: 68f3f16d ("new helper: sigsuspend()") Signed-off-by: NRichard Weinberger <richard@nod.at> Reported-by: NVegard Nossum <vegard.nossum@oracle.com> Tested-by: NVegard Nossum <vegard.nossum@oracle.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Baluta 提交于
This patchset introduces IIO software triggers, offers a way of configuring them via configfs and adds the IIO hrtimer based interrupt source to be used with software triggers. The architecture is now split in 3 parts, to remove all IIO trigger specific parts from IIO configfs core: (1) IIO configfs - creates the root of the IIO configfs subsys. (2) IIO software triggers - software trigger implementation, dynamically creating /config/iio/triggers group. (3) IIO hrtimer trigger - is the first interrupt source for software triggers (with syfs to follow). Each trigger type can implement its own set of attributes. Lockdep seems to be happy with the locking in configfs patch. This patch (of 5): We don't want to hardcode default groups at subsystem creation time. We export: * configfs_register_group * configfs_unregister_group to allow drivers to programatically create/destroy groups later, after module init time. This is needed for IIO configfs support. (akpm: the other 4 patches to be merged via the IIO tree) Signed-off-by: NDaniel Baluta <daniel.baluta@intel.com> Suggested-by: NLars-Peter Clausen <lars@metafoo.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Acked-by: NJoel Becker <jlbec@evilplan.org> Cc: Hartmut Knaack <knaack.h@gmx.de> Cc: Octavian Purdila <octavian.purdila@intel.com> Cc: Paul Bolle <pebolle@tiscali.nl> Cc: Adriana Reus <adriana.reus@intel.com> Cc: Cristina Opriceana <cristina.opriceana@gmail.com> Cc: Peter Meerwald <pmeerw@pmeerw.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rasmus Villemoes 提交于
The various allocators return aligned memory. Telling the compiler that allows it to generate better code in many cases, for example when the return value is immediately passed to memset(). Some code does become larger, but at least we win twice as much as we lose: $ scripts/bloat-o-meter /tmp/vmlinux vmlinux add/remove: 0/0 grow/shrink: 13/52 up/down: 995/-2140 (-1145) An example of the different (and smaller) code can be seen in mm_alloc(). Before: : 48 8d 78 08 lea 0x8(%rax),%rdi : 48 89 c1 mov %rax,%rcx : 48 89 c2 mov %rax,%rdx : 48 c7 00 00 00 00 00 movq $0x0,(%rax) : 48 c7 80 48 03 00 00 movq $0x0,0x348(%rax) : 00 00 00 00 : 31 c0 xor %eax,%eax : 48 83 e7 f8 and $0xfffffffffffffff8,%rdi : 48 29 f9 sub %rdi,%rcx : 81 c1 50 03 00 00 add $0x350,%ecx : c1 e9 03 shr $0x3,%ecx : f3 48 ab rep stos %rax,%es:(%rdi) After: : 48 89 c2 mov %rax,%rdx : b9 6a 00 00 00 mov $0x6a,%ecx : 31 c0 xor %eax,%eax : 48 89 d7 mov %rdx,%rdi : f3 48 ab rep stos %rax,%es:(%rdi) So gcc's strategy is to do two possibly (but not really, of course) unaligned stores to the first and last word, then do an aligned rep stos covering the middle part with a little overlap. Maybe arches which do not allow unaligned stores gain even more. I don't know if gcc can actually make use of alignments greater than 8 for anything, so one could probably drop the __assume_xyz_alignment macros and just use __assume_aligned(8). The increases in code size are mostly caused by gcc deciding to opencode strlen() using the check-four-bytes-at-a-time trick when it knows the buffer is sufficiently aligned (one function grew by 200 bytes). Now it turns out that many of these strlen() calls showing up were in fact redundant, and they're gone from -next. Applying the two patches to next-20151001 bloat-o-meter instead says add/remove: 0/0 grow/shrink: 6/52 up/down: 244/-2140 (-1896) Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: NChristoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 11月, 2015 2 次提交
-
-
由 Javier Gonzalez 提交于
Add free block, used block, and bad block information to the show debug interface. This information is used to debug how targets track blocks. Also, change debug function name to make it more generic. Signed-off-by: NJavier Gonzalez <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Javier Gonzalez 提交于
Maintain number of in use blocks, free blocks, and bad blocks in a per lun basis. This allows the upper layers to get information about the state of each lun. Also, account for blocks reserved to the device on the free block count. nr_free_blocks matches now the actual number of blocks on the free list when the device is booted. Signed-off-by: NJavier Gonzalez <javier@cnexlabs.com> Signed-off-by: NMatias Bjørling <m@bjorling.me> Signed-off-by: NJens Axboe <axboe@fb.com>
-