- 11 Oct 2015, 1 commit
-
-
By Alexei Starovoitov

eBPF socket filter programs may see junk in the 'u32 cb[5]' area, since it could have been used by protocol layers earlier. For socket filter programs used in af_packet we need to clear 20 bytes of the skb->cb area if it could be used by the program. For programs attached to TCP/UDP sockets we need to save/restore these 20 bytes, since they are used by protocol layers. Remove the SK_RUN_FILTER macro, since it's no longer used. Long term we may move this bpf cb area to per-cpu scratch, but that requires the addition of new 'per-cpu load/store' instructions, so it is not suitable as a short term fix.

Fixes: d691f9e8 ("bpf: allow programs to write to certain skb fields")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
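A sketch of the two run helpers the description implies (QDISC_CB_PRIV_LEN is the 20-byte private cb region; treat this as illustrative of the fix's shape, not the exact patch):

    static inline u32 bpf_prog_run_save_cb(const struct bpf_prog *prog,
                                           struct sk_buff *skb)
    {
            u8 *cb_data = qdisc_skb_cb(skb)->data;
            u8 saved_cb[QDISC_CB_PRIV_LEN];
            u32 res;

            /* TCP/UDP attach point: preserve the cb bytes the protocol
             * layers own across the program run */
            memcpy(saved_cb, cb_data, sizeof(saved_cb));
            res = BPF_PROG_RUN(prog, skb);
            memcpy(cb_data, saved_cb, sizeof(saved_cb));
            return res;
    }

    static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
                                            struct sk_buff *skb)
    {
            u8 *cb_data = qdisc_skb_cb(skb)->data;

            /* af_packet attach point: nothing needs the old cb contents,
             * so just make sure the program never sees junk */
            memset(cb_data, 0, QDISC_CB_PRIV_LEN);
            return BPF_PROG_RUN(prog, skb);
    }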
-
- 09 Oct 2015, 10 commits
-
-
By Yaowei Bai

This patch makes lockdep_rtnl_is_held return bool, since the function only ever returns one or zero. A separate patch makes lockdep_is_held return bool as well. No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Yaowei Bai

This patch makes bad_mask return bool, since the function only ever returns one or zero. No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Yaowei Bai

This patch makes inet_ifa_match return bool, since the function only ever returns one or zero. No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
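For illustration, the conversion is purely a return-type change; the body already evaluates to a boolean expression (a sketch of the post-patch helper):

    static inline bool inet_ifa_match(__be32 addr, struct in_ifaddr *ifa)
    {
            /* true when addr falls inside ifa's subnet */
            return !((addr ^ ifa->ifa_local) & ifa->ifa_mask);
    }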
-
By Yaowei Bai

This patch makes dccp_list_has_service return bool, since the function only ever returns one or zero. No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Yaowei Bai

This patch makes can_dropped_invalid_skb return bool, since the function only ever returns one or zero. No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Yaowei Bai

This patch makes lockdep_nfnl_is_held return bool to improve readability, since the function only ever returns one or zero. No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Yaowei Bai

This patch makes the ieee80211_is_* helpers return bool to improve readability, since these functions only ever return one or zero. No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Yaowei Bai

This patch makes lockdep_genl_is_held return bool to improve readability, since the function only ever returns one or zero. No functional change.

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Eli Cohen

Use a single-threaded work queue for each device in the system instead of one thread shared by all devices. This is required so we can concurrently process system error handling for all the devices that need it.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
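A minimal sketch of the per-device ordered queue (field and variable names here are hypothetical, not the actual mlx5 identifiers):

    /* one ordered workqueue per device, created at probe time */
    priv->health_wq = create_singlethread_workqueue(dev_name(&pdev->dev));
    if (!priv->health_wq)
            return -ENOMEM;

    /* error handling for this device no longer contends with others */
    queue_work(priv->health_wq, &priv->health_work);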
-
By Eli Cohen

In preparation for handling system errors at the mlx5_core level, change the interface of cmd_work_handler to accept a 64-bit argument for the vector. This allows encoding a flag that signifies when the handler is called as a result of driver logic that wishes to terminate commands the hardware may not be able to terminate. Such command completions are detected at the handler and a proper return status is encoded. To be able to terminate page handler commands, we make sure to set the corresponding bit in the bitmask.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
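The shape of the change, as a sketch (the flag name and the completion bookkeeping are illustrative assumptions, not the actual mlx5 identifiers):

    /* low 32 bits: bitmask of completed command slots;
     * bit 32 (assumed name): completion was forced by the driver */
    #define CMD_COMP_TRIGGERED      BIT_ULL(32)

    static void cmd_comp_handler(struct mlx5_core_dev *dev, u64 vec)
    {
            unsigned long vector = vec & 0xffffffffUL;
            bool forced = vec & CMD_COMP_TRIGGERED;
            int i;

            for_each_set_bit(i, &vector, 32)
                    /* hypothetical helper: encode -ENXIO when the driver,
                     * not the hardware, terminated the command */
                    complete_command(dev, i, forced ? -ENXIO : 0);
    }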
-
- 08 Oct 2015, 6 commits
-
-
By Daniel Borkmann

While recently arguing in a seccomp discussion that raw prandom_u32() access shouldn't be exposed to unprivileged user space, I forgot the fact that the SKF_AD_RANDOM extension actually already does it, for some time now, in cBPF via commit 4cd3675e ("filter: added BPF random opcode"). Since prandom_u32() is used in a lot of critical networking code, let's be more conservative and split their states. Furthermore, consolidate the eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF, bpf_get_prandom_u32() was only accessible to privileged users, but should that change one day, we also don't want to leak raw sequences through things like eBPF maps. One thought was also to have separate per-bpf_prog states, but for ABI reasons this is not easily possible, i.e. the program code currently cannot access bpf_prog itself, and copying the rnd_state to/from the stack scratch space whenever a program uses the prng does not seem worth the trouble and seems too hacky. If needed, taus113 could in such cases be implemented within eBPF, using a map entry to keep the state space, or get_random_bytes() could become a second helper in cases where performance is not critical. Both sides can trigger a one-time late init via prandom_init_once() on the shared state. Performance-wise, there should even be a tiny gain, as bpf_user_rnd_u32() saves one function call. The PRNG needs to live inside the BPF core, since kernels could have a NET-less config as well.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Chema Gonzalez <chema@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
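A sketch of what the split looks like, assuming it follows the description above (a BPF-private per-cpu rnd_state, seeded lazily, separate from the state behind prandom_u32()):

    static DEFINE_PER_CPU(struct rnd_state, bpf_user_rnd_state);

    void bpf_user_rnd_init_once(void)
    {
            prandom_init_once(&bpf_user_rnd_state);
    }

    u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
    {
            /* runs on the BPF-private state, so programs can never
             * observe the sequence used by the networking core */
            struct rnd_state *state = &get_cpu_var(bpf_user_rnd_state);
            u32 res = prandom_u32_state(state);

            put_cpu_var(bpf_user_rnd_state);
            return res;
    }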
-
By Daniel Borkmann

Add a prandom_init_once() facility that works on the rnd_state, so that users keeping their own state independent from prandom_u32() can initialize their taus113 per-cpu states. The motivation here is similar to net_get_random_once(): initialize the state as late as possible, in the hope that enough entropy has been collected for the seeding. prandom_init_once() makes use of the recently introduced prandom_seed_full_state() helper and is generic enough that it could also be used on fast paths due to the DO_ONCE().

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
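Given that description, the facility is essentially a DO_ONCE() wrapper around prandom_seed_full_state(); a sketch of the macro:

    /* seed the caller's whole per-cpu state array once, lazily */
    #define prandom_init_once(pcpu_state)                   \
            DO_ONCE(prandom_seed_full_state, (pcpu_state))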
-
By Hannes Frederic Sowa

Make the get_random_once() helper generic enough that arbitrary functions can be called exactly once, where one user of this is then net_get_random_once(). The only implementation-specific call is to get_random_bytes(); all the rest of this *_once() facility would otherwise be duplicated among different subsystems. The new DO_ONCE() helper will be used by prandom() later on, but might also be useful for other scenarios/subsystems where a one-time initialization in often-called, possibly fast path code could occur.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
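A usage sketch (the init function and hash variable are made up for illustration):

    #include <linux/once.h>
    #include <linux/jhash.h>

    static u32 hash_secret;

    static void init_hash_secret(u32 *secret)
    {
            get_random_bytes(secret, sizeof(*secret));
    }

    static u32 compute_hash(u32 val)
    {
            /* init_hash_secret() runs at most once, on the first call;
             * later calls only pay for a static-key branch */
            DO_ONCE(init_hash_secret, &hash_secret);
            return jhash_1word(val, hash_secret);
    }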
-
By Hannes Frederic Sowa

There's no good reason why users outside of networking should not be using this facility, e.g. for initializing their seeds. Therefore, make it accessible from there as get_random_once().

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
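A minimal usage sketch (the seed variable and caller are illustrative):

    static u32 ct_hash_seed;

    static u32 ct_hash(const void *data, u32 len)
    {
            /* seeded on first use, as late as possible */
            get_random_once(&ct_hash_seed, sizeof(ct_hash_seed));
            return jhash(data, len, ct_hash_seed);
    }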
-
By Arun Parameswaran

Add support for the Broadcom Cygnus SoCs' internal PHYs. The PHYs are 1000M/100M/10M capable, with support for EEE and APD (Auto Power Down). This driver supports the following Broadcom Cygnus SoCs:
- BCM583XX (BCM58300, BCM58302, BCM58303, BCM58305)
- BCM113XX (BCM11300, BCM11320, BCM11350, BCM11360)
The PHYs on these SoCs require some workarounds for stable operation, both during configuration and during suspend/resume. This driver handles the application of the workarounds.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Arun Parameswaran

This patch adds a Broadcom PHY library to consolidate common interfaces shared by Broadcom PHY drivers. Moved the common interfaces to 'bcm-phy-lib.c' and updated the Broadcom PHY drivers to use the new APIs.

Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Oct 2015, 1 commit
-
-
By David Ahern

IPv6 addrconf keys off of IFF_SLAVE, so that flag cannot be reused for L3 slaves. Add a new private flag and a netif_is_l3_slave function for checking it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
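Presumably the helper mirrors the existing netif_is_* tests; a sketch (the flag's bit position is deliberately omitted):

    static inline bool netif_is_l3_slave(const struct net_device *dev)
    {
            /* set on devices enslaved to an L3 master (e.g. a VRF) */
            return dev->priv_flags & IFF_L3MDEV_SLAVE;
    }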
-
- 06 Oct 2015, 2 commits
-
-
By Jon Ringle

This commit allows installing a custom reg_update_bits function for cases where the hardware provides a mechanism to set or clear register bits without a read/modify/write cycle. Such is the case with the Microchip ENCX24J600. If a custom reg_update_bits function is provided, it will only be used against volatile registers.

Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
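A sketch of wiring up such a hook (the callback body, the set/clear helpers, and the struct placement are assumptions based on the description; check the regmap headers for the exact field):

    static int encx24j600_reg_update_bits(void *context, unsigned int reg,
                                          unsigned int mask, unsigned int val)
    {
            /* use the controller's native bit-set/bit-clear commands
             * instead of a read/modify/write cycle */
            int ret = encx24j600_set_bits(context, reg, mask & val);      /* hypothetical */

            if (ret)
                    return ret;
            return encx24j600_clear_bits(context, reg, mask & ~val);      /* hypothetical */
    }

    static struct regmap_bus encx24j600_regmap_bus = {
            .reg_update_bits = encx24j600_reg_update_bits,
            /* ... read/write ops ... */
    };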
-
By David S. Miller
This reverts commit 7741c373.
-
- 05 Oct 2015, 4 commits
-
-
By Daniel Borkmann

Commit ea317b26 ("bpf: Add new bpf map type to store the pointer to struct perf_event") added perf_event.h to the main eBPF header, so it gets included for all users. perf_event.h is actually only needed on the array map side, so let's sanitize this a bit.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Daniel Borkmann

The current ongoing effort to dump existing cBPF seccomp filters back to user space requires holding the pre-transformed instructions like we do in the case of socket filters from the sk_attach_filter() side, so they can be reloaded in original form at a later point in time by utilities such as criu. To prepare for this, simply extend the bpf_prog_create_from_user() API with a flag that tells whether we should store the original or not. Also, fanout filters could make use of that in the future for things like diag. While fanout filters already use bpf_prog_destroy(), move seccomp over to it as well to handle original programs when present.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Tycho Andersen <tycho.andersen@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Tested-by: Tycho Andersen <tycho.andersen@canonical.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
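The post-patch API, sketched from the description above (the final parameter is the new flag):

    /* save_orig == true keeps the pre-transformed cBPF program around
     * so it can later be dumped back to user space (e.g. by criu) */
    int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
                                  bpf_aux_classic_check_t trans, bool save_orig);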
-
By Jon Ringle

This commit allows installing a custom reg_update_bits function for cases where the hardware provides a mechanism to set or clear register bits without a read/modify/write cycle. Such is the case with the Microchip ENCX24J600 (see the sketch after the regmap-side commit above).

Signed-off-by: Jon Ringle <jringle@gridpoint.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Eric Dumazet

SYN_RECV and TIMEWAIT sockets are not full blown; they do not have a pinet6 pointer.

Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
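For context, the guard the stack uses to tell full sockets from these minisockets looks like this sketch of sk_fullsock():

    static inline bool sk_fullsock(const struct sock *sk)
    {
            /* NEW_SYN_RECV and TIME_WAIT sockets are smaller objects
             * that lack most of struct sock's fields (e.g. pinet6) */
            return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
    }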
-
- 03 Oct 2015, 3 commits
-
-
By Simon Horman

Add a helper to allow ethernet drivers to limit the speed of a PHY they are attached to. This mainly involves factoring out the business end of of_set_phy_supported() and exporting a new symbol. This code seems to be open coded in several places, in several different variants. It is envisaged that this will be used in situations where setting the "max-speed" property in DT is not appropriate, e.g. because the maximum speed is not a property of the PHY hardware.

Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
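A usage sketch, assuming the exported helper takes a phy_set_max_speed()-style form:

    /* cap a gigabit-capable PHY at 100 Mbit/s because the MAC side,
     * not the PHY, is the limiting factor */
    err = phy_set_max_speed(phydev, SPEED_100);
    if (err)
            dev_err(&phydev->dev, "failed to limit PHY speed\n");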
-
By Daniel Borkmann

Using routing realms as part of the classifier is quite useful; a realm can be viewed as a tag for one or multiple routing entries (think of an analogy to the net_cls cgroup for processes), set by user space routing daemons or via iproute2 as an indicator for traffic classifiers, and later on processed in the eBPF program. Unlike actions, the classifier can inspect device flags and enable netif_keep_dst() if necessary. tc actions don't have that possibility, but in case people know what they are doing, it can be used from there as well (e.g. via devs that must keep dsts by design anyway). If a realm is set, the handler returns the non-zero realm. User space can set the full 32-bit realm for the dst.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Daniel Borkmann

As we need to add further flags to the bpf_prog structure, let's migrate both bools to a bitfield representation. The size of the base structure (excluding insns) remains unchanged at 40 bytes. Also add annotations for kmemcheck, so that it doesn't throw false positives. Even in case gcc would generate suboptimal code, these fields are not accessed in performance critical paths.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
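A sketch of the layout change described (field names assumed from context, not verified against the header):

    struct bpf_prog {
            u16                     pages;            /* number of allocated pages */
            kmemcheck_bitfield_begin(meta);
            u16                     jited:1,          /* was: bool jited */
                                    gpl_compatible:1; /* was: bool gpl_compatible */
            kmemcheck_bitfield_end(meta);
            /* ... */
    };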
-
- 02 Oct 2015, 2 commits
-
-
By Greg Thelen

Commit 733a572e ("memcg: make mem_cgroup_read_{stat|event}() iterate possible cpus instead of online") removed the last use of the per-memcg pcp_counter_lock but forgot to remove the variable. Kill the vestigial variable.

Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Greg Thelen

The problem starts with a file-backed dirty page which is charged to a memcg. Then page migration is used to move oldpage to newpage.

Migration:
- copies the oldpage's data to newpage
- clears oldpage.PG_dirty
- sets newpage.PG_dirty
- uncharges oldpage from memcg
- charges newpage to memcg

Clearing oldpage.PG_dirty decrements the charged memcg's dirty page count. However, because newpage is not yet charged, setting newpage.PG_dirty does not increment the memcg's dirty page count. After migration completes, newpage.PG_dirty is eventually cleared, often in account_page_cleaned(). At this time newpage is charged to a memcg, so the memcg's dirty page count is decremented, which causes underflow because the count was not previously incremented by migration. This underflow causes balance_dirty_pages() to see a very large unsigned number of dirty memcg pages, which leads to aggressive throttling of buffered writes by processes in non-root memcgs.

This issue:
- can harm performance of non-root memcg buffered writes.
- can report too small (even negative) values in memory.stat[(total_)dirty] counters of all memcgs, including the root.

To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce page_memcg() and set_page_memcg() helpers.

Test:
0) setup and enter limited memcg
   mkdir /sys/fs/cgroup/test
   echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
   echo $$ > /sys/fs/cgroup/test/cgroup.procs
1) buffered writes baseline
   dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
   sync
   grep ^dirty /sys/fs/cgroup/test/memory.stat
2) buffered writes with compaction antagonist to induce migration
   yes 1 > /proc/sys/vm/compact_memory &
   rm -rf /data/tmp/foo
   dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
   kill %
   sync
   grep ^dirty /sys/fs/cgroup/test/memory.stat
3) buffered writes without antagonist, should match baseline
   rm -rf /data/tmp/foo
   dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
   sync
   grep ^dirty /sys/fs/cgroup/test/memory.stat

   (speed, dirty residue)
      unpatched                          patched
   1) 841 MB/s, 0 dirty pages            886 MB/s, 0 dirty pages
   2) 611 MB/s, -33427456 dirty pages    793 MB/s, 0 dirty pages
   3) 114 MB/s, -33427456 dirty pages    891 MB/s, 0 dirty pages

Notice that unpatched baseline performance (1) fell after migration (3): 841 -> 114 MB/s. In the patched kernel, post-migration performance matches baseline.

Fixes: c4843a75 ("memcg: add per cgroup dirty page accounting")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
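A sketch of the helpers the commit introduces, assuming the page->mem_cgroup layout of that era; migrate.c can then move the memcg binding across without #ifdefs:

    #ifdef CONFIG_MEMCG
    static inline struct mem_cgroup *page_memcg(struct page *page)
    {
            return page->mem_cgroup;
    }

    static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
    {
            page->mem_cgroup = memcg;
    }
    #else
    static inline struct mem_cgroup *page_memcg(struct page *page)
    {
            return NULL;
    }

    static inline void set_page_memcg(struct page *page, struct mem_cgroup *memcg)
    {
    }
    #endif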
-
- 30 Sep 2015, 11 commits
-
-
By Eric W. Biederman

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
By David Ahern

Change the CONFIG dependency to CONFIG_NET_L3_MASTER_DEV as well.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By David Ahern

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By David Ahern

L3 master devices allow users of the abstraction to influence FIB lookups for enslaved devices. The current API provides a means for the master device to return a specific FIB table for an enslaved device, to return an rtable/custom dst, and to influence the OIF used for FIB lookups.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
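Given that description, the ops structure plausibly looks like this sketch (the member set is reconstructed from the text, not the exact header):

    struct l3mdev_ops {
            /* FIB table to use for lookups on enslaved devices */
            u32             (*l3mdev_fib_table)(const struct net_device *dev);
            /* custom rtable/dst for traffic leaving via the master */
            struct rtable * (*l3mdev_get_rtable)(const struct net_device *dev,
                                                 const struct flowi4 *fl4);
    };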
-
By David Ahern

Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the names of the netif_is_vrf and netif_index_is_vrf macros.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Eric Dumazet

While auditing the TCP stack for the upcoming 'lockless' listener changes, I found I had to change fastopen_init_queue() to properly init the object before publishing it. Otherwise another cpu could try to lock the spinlock before it gets properly initialized. Instead of adding appropriate barriers, just remove the dynamic memory allocations:
- The structure is 28 bytes on 64-bit arches. Using an additional 8 bytes to hold a pointer seems overkill.
- Two listeners can share the same cache line and performance would suffer.
If we really want to save a few bytes, we would instead dynamically allocate the whole struct request_sock_queue in the future.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Pravin B Shelar

Earlier patch 6ae459bd tried to detect a void checksum-partial skb by comparing the pull length to the checksum offset. But it does not work for all cases, since the checksum offset depends on updates to skb->data. The following patch fixes it by validating the checksum start offset after the skb->data pointer is updated. A negative checksum start offset means there is no need to checksum.

Fixes: 6ae459bd ("skbuff: Fix skb checksum flag on skb pull")
Reported-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
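The post-fix check plausibly lives in skb_postpull_rcsum(); a sketch of the idea (treat the exact placement as an assumption):

    static inline void skb_postpull_rcsum(struct sk_buff *skb,
                                          const void *start, unsigned int len)
    {
            if (skb->ip_summed == CHECKSUM_COMPLETE)
                    skb->csum = csum_sub(skb->csum,
                                         csum_partial(start, len, 0));
            else if (skb->ip_summed == CHECKSUM_PARTIAL &&
                     skb_checksum_start_offset(skb) < 0)
                    /* the pull consumed the headers the device was asked
                     * to checksum, so the offload request is now void */
                    skb->ip_summed = CHECKSUM_NONE;
    }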
-
By Alexander Duyck

This patch updates ip_check_mc_rcu so that the protocol is passed as a u8 instead of a u16. The motivation is just to avoid any unneeded type transitions, since some systems will require an instruction to zero-extend a u8 field to a u16. It also makes it a bit more readable: since the protocol is a u8, no byte-ordering changes are needed to pass it.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
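The before/after signature, sketched from the description:

    /* before */
    int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr,
                        __be32 src_addr, u16 proto);
    /* after: proto is a plain u8, so no byte-order handling is needed */
    int ip_check_mc_rcu(struct in_device *in_dev, __be32 mc_addr,
                        __be32 src_addr, u8 proto);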
-
By Eric W. Biederman

Don't make ip6_route_me_harder guess which network namespace it is routing in; pass the network namespace in.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
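The same pattern repeats for the IPv4 and reroute variants in the two commits below; as a sketch, the signature change looks like:

    /* before: the function derived the netns from the skb */
    int ip6_route_me_harder(struct sk_buff *skb);

    /* after: the caller states the namespace explicitly */
    int ip6_route_me_harder(struct net *net, struct sk_buff *skb);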
-
By Eric W. Biederman

Don't make ip_route_me_harder guess which network namespace it is routing in; pass the network namespace in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
By Eric W. Biederman

The network namespace is needed when routing a packet. Stop making nf_afinfo.reroute guess which network namespace is the proper namespace to route the packet in.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-