- 08 10月, 2015 40 次提交
-
-
由 Daniel Borkmann 提交于
While recently arguing on a seccomp discussion that raw prandom_u32() access shouldn't be exposed to unpriviledged user space, I forgot the fact that SKF_AD_RANDOM extension actually already does it for some time in cBPF via commit 4cd3675e ("filter: added BPF random opcode"). Since prandom_u32() is being used in a lot of critical networking code, lets be more conservative and split their states. Furthermore, consolidate eBPF and cBPF prandom handlers to use the new internal PRNG. For eBPF, bpf_get_prandom_u32() was only accessible for priviledged users, but should that change one day, we also don't want to leak raw sequences through things like eBPF maps. One thought was also to have own per bpf_prog states, but due to ABI reasons this is not easily possible, i.e. the program code currently cannot access bpf_prog itself, and copying the rnd_state to/from the stack scratch space whenever a program uses the prng seems not really worth the trouble and seems too hacky. If needed, taus113 could in such cases be implemented within eBPF using a map entry to keep the state space, or get_random_bytes() could become a second helper in cases where performance would not be critical. Both sides can trigger a one-time late init via prandom_init_once() on the shared state. Performance-wise, there should even be a tiny gain as bpf_user_rnd_u32() saves one function call. The PRNG needs to live inside the BPF core since kernels could have a NET-less config as well. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Cc: Chema Gonzalez <chema@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Add a prandom_init_once() facility that works on the rnd_state, so that users that are keeping their own state independent from prandom_u32() can initialize their taus113 per cpu states. The motivation here is similar to net_get_random_once(): initialize the state as late as possible in the hope that enough entropy has been collected for the seeding. prandom_init_once() makes use of the recently introduced prandom_seed_full_state() helper and is generic enough so that it could also be used on fast-paths due to the DO_ONCE(). Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Factor out the full reseed handling code that populates the state through get_random_bytes() and runs prandom_warmup(). The resulting prandom_seed_full_state() will be used later on in more than the current __prandom_reseed() user. Fix also two minor whitespace issues along the way. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Make the get_random_once() helper generic enough, so that functions in general would only be called once, where one user of this is then net_get_random_once(). The only implementation specific call is to get_random_bytes(), all the rest of this *_once() facility would be duplicated among different subsystems otherwise. The new DO_ONCE() helper will be used by prandom() later on, but might also be useful for other scenarios/subsystems as well where a one-time initialization in often-called, possibly fast path code could occur. Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
There's no good reason why users outside of networking should not be using this facility, f.e. for initializing their seeds. Therefore, make it accessible from there as get_random_once(). Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Commit deaa0a6a ("net: Lookup actual route when oif is VRF device") exposed a bug in __ip_route_output_key_hash for VRF devices: on FIB lookup failure if the oif is specified the current logic drops to make_route on the assumption that the route tables are wrong. For VRF/L3 master devices this leads to wrong dst entries and route lookups. For example: $ ip route ls table vrf-red unreachable default broadcast 10.2.1.0 dev eth1 proto kernel scope link src 10.2.1.2 10.2.1.0/24 dev eth1 proto kernel scope link src 10.2.1.2 local 10.2.1.2 dev eth1 proto kernel scope host src 10.2.1.2 broadcast 10.2.1.255 dev eth1 proto kernel scope link src 10.2.1.2 $ ip route get oif vrf-red 1.1.1.1 1.1.1.1 dev vrf-red src 10.0.0.2 cache With this patch: $ ip route get oif vrf-red 1.1.1.1 RTNETLINK answers: No route to host which is the correct response based on the default route Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Similar to commit c29390c6 ("xps: must clear sender_cpu before forwarding"), we also need to clear the skb->sender_cpu when moving from RX to TX via skb_do_redirect() due to the shared location of napi_id (used on RX) and sender_cpu (used on TX). Fixes: 27b29f63 ("bpf: add bpf_redirect() helper") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
The recently added hns driver causes a build warning in ARM allmodconfig builds: drivers/net/ethernet/hisilicon/hns/hnae.c: In function 'handles_show': drivers/net/ethernet/hisilicon/hns/hnae.c:452:13: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] j, (u64)h->qs[i]->io_base); ^ This removes the pointless cast and prints the pointer address using the "%p" format string in all three locations. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Ringle 提交于
This ethernet driver supports the Micorchip enc424j600/626j600 Ethernet controller over a SPI bus interface. This driver makes use of the regmap API to optimize access to registers by caching registers where possible. Datasheet: http://ww1.microchip.com/downloads/en/DeviceDoc/39935b.pdfSigned-off-by: NJon Ringle <jringle@gridpoint.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Arun Parameswaran says: ==================== Add support for Broadcom's iProc MDIO and Cygnus Ethernet PHY This patchset adds support for the iProc MDIO interface and the Broadcom Cygnus SoC's internal Ethernet PHY. The internal Ethernet PHY(s) in the Cygnus SoC's are accessed via the MDIO interface found in most of the iProc based chips. The patch also consolidates the common API's used by the Broadcom phys to a common library. Existing Broadcom phy drivers have been modified to use the common library API's. This patch series is based on Linux v4.3-rc1 and is avaliable in: https://github.com/Broadcom/cygnus-linux/tree/cygnus-net-phy-mdio-v3 The Ethernet driver for the iProc family will be submitted soon, as will the device tree configurations for the different iProc family SoCs. Changes from v2: - Modified drivers/net/phy/Kconfig to modify the BCM_CYGNUS_PHY driver to 'depends on MDIO_BCM_IPROC' instead of 'select'. - Added github branch to the cover letter Changes from v1: - Updated device tree documentation for the iProc MDIO driver based on Florian's feedback. - Moved the core register defines from the Cygnus PHY driver to 'include/linux/brcmphy.h' based on Florian's feedback. - Created a new patch/commit to modify the bcm7xxx phy driver to use the new core register defines. - Modified the Kconfig entry for the Broadcom PHY library to 'tristate' instead of 'bool' ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arun Parameswaran 提交于
Modified the bcm7xxx phy driver to remove local core register defines and use the common ones from "include/linux/brcmphy.h" Signed-off-by: NArun Parameswaran <arunp@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arun Parameswaran 提交于
Add support for the Broadcom Cygnus SoCs internal PHY's. The PHYs are 1000M/100M/10M capable with support for 'EEE' and 'APD' (Auto Power Down). This driver supports the following Broadcom Cygnus SoCs: - BCM583XX (BCM58300, BCM58302, BCM58303, BCM58305) - BCM113XX (BCM11300, BCM11320, BCM11350, BCM11360) The PHY's on these SoC's require some workarounds for stable operation, both during configuration time and during suspend/resume. This driver handles the application of the workarounds. Signed-off-by: NArun Parameswaran <arunp@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arun Parameswaran 提交于
This patch adds the Broadcom phy library to consolidate common interfaces shared by Broadcom phy's. Moved the common interfaces to the 'bcm-phy-lib.c' and updated the Broadcom PHY drivers to use the new APIs. Signed-off-by: NArun Parameswaran <arunp@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arun Parameswaran 提交于
This patch adds support for the Broadcom iProc MDIO bus interface. The MDIO interface can be found in the Broadcom iProc family Soc's. The MDIO bus is accessed using a combination of command and data registers. This MDIO driver provides access to the Etherent GPHY's connected to the MDIO bus. Signed-off-by: NArun Parameswaran <arunp@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arun Parameswaran 提交于
Add device tree binding documentation for the Broadcom iProc MDIO bus driver. Signed-off-by: NArun Parameswaran <arunp@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux由 David S. Miller 提交于
Santosh Shilimkar says: ==================== RDS: connection scalability and performance improvements [v4] Re-sending the same patches from v3 again since my repost of patch 05/14 from v3 was whitespace damaged. [v3] Updated patch "[PATCH v2 05/14] RDS: defer the over_batch work to send worker" as per David Miller's comment [4] to avoid the magic value usage. Patch now makes use of already available but unused send_batch_count module parameter. Rest of the patches are same as earlier version v2 [3] [v2]: Dropped "[PATCH 05/15] RDS: increase size of hash-table to 8K" from earlier version [1]. I plan to address the hash table scalability using re-sizable hash tables as suggested by David Laight and David Miller [2] This series addresses RDS connection bottlenecks on massive workloads and improve the RDMA performance almost by 3X. RDS TCP also gets a small gain of about 12%. RDS is being used in massive systems with high scalability where several hundred thousand end points and tens of thousands of local processes are operating in tens of thousand sockets. Being RC(reliable connection), socket bind and release happens very often and any inefficiencies in bind hash look ups hurts the overall system performance. RDS bin hash-table uses global spin-lock which is the biggest bottleneck. To make matter worst, it uses rcu inside global lock for hash buckets. This is being addressed by simply using per bucket rw lock which makes the locking simple and very efficient. The hash table size is still an issue and I plan to address it by using re-sizable hash tables as suggested on the list. For RDS RDMA improvement, the completion handling is revamped so that we can do batch completions. Both send and receive completion handlers are split logically to achieve the same. RDS 8K messages being one of the key usecase, mr pool is adapted to have the 8K mrs along with default 1M mrs. And while doing this, few fixes and couple of bottlenecks seen with rds_sendmsg() are addressed. Series applies against 4.3-rc1 as well net-next. Its tested on Oracle hardware with IB fabric for both bcopy as well as RDMA mode. RDS TCP is tested with iXGB NIC. Like last time, iWARP transport is untested with these changes. The patchset is also available at below git repo: git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git net/rds/4.3-v3 As a side note, the IB HCA driver I used for testing misses at least 3 important patches in upstream to see the full blown IB performance and am hoping to get that in mainline with help of them. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Eric W. Biederman says: ==================== net: Pass net through the output path v2 This is the next installment of my work to pass struct net through the output path so the code does not need to guess how to figure out which network namespace it is in, and ultimately routes can have output devices in another network namespace. The first patch in this series is a fix for a bug that came in when sk was passed through the functions in the output path, and as such is probably a candidate for net. At the same time my later patches depend on it so sending the fix separately would be confusing. The second patch in this series is another fix that for an issue that came in when sk was passed through the output path. I don't think it needs a backport as I don't think anyone uses the path where the code was incorrect. The rest of the patchset focuses on the path from xxx_local_out to dst_output and in the end succeeds in passing sock_net(sk) from the socket a packet locally originates on to the dst->output function. Given the size reduction in the code I think this counts as a cleanup as much as feature work. There remain a number of helper functions (like ip option processing) to take care of before the network stack can support destination devices in other network namespaces but with this set of changes the backbone of the work is done. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Compute net once in ipvlan_process_v4_outbound and ipvlan_process_v6_outbound and store it in a variable so that net does not need to be recomputed next time it is used. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Compute net and store it in a variable in pptp_xmit, so that the value can be reused the next time it is needed. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Compute net and store it in a variable in the functions ip_build_and_send_pkt and ip_queue_xmit so that it does not need to be recomputed next time it is needed. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Store net in a variable in ip_tunnel_xmit so it does not need to be recomputed when it is used again. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Only __ip6_local_out_sk has callers so rename __ip6_local_out_sk __ip6_local_out and remove the previous __ip6_local_out. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
It is confusing and silly hiding a parameter so modify all of the callers to pass in the appropriate socket or skb->sk if no socket is known. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Replace dst_output_okfn with dst_output Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
After a packet has been encapsulated by a tunnel we should use the tunnel sockets local multicast loopback flag to control if the encapsulated packet should be locally loopback back. Pass sk into ip_local_out_sk so that in the rare case we are dealing with a tunneled packet whose tunnel destination address is a multicast address the kernel properly decides to loopback this packet. In practice I don't think this matters as ip_queue_xmit is used by tcp, l2tp and sctp none of which I am aware of uses ip level multicasting as they are all point to point communications protocols. Let's fix this before someone uses ip_queue_xmit for a tunnel protocol that does use multicast. Fixes: aad88724 ("ipv4: add a sock pointer to dst->output() path.") Fixes: b0270e91 ("ipv4: add a sock pointer to ip_queue_xmit()") Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric W. Biederman 提交于
In the rare case where sk != skb->sk ip_local_out_sk arranges to call dst->output differently if the skb is queued or not. This is a bug. Fix this bug by passing the sk parameter of ip_local_out_sk through from ip_local_out_sk to __ip_local_out_sk (skipping __ip_local_out). Fixes: 7026b1dd ("netfilter: Pass socket pointer down through okfn().") Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue由 David S. Miller 提交于
Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-10-07 This series contains updates to i40e and i40evf only. Paul updates i40e to simply increase the amount of time we wait for a reset to complete since we have seen in some rare occasions the reset can take longer to complete. Shannon updates the driver to turn on Wake-on-LAN by default if it is enabled in the hardware config to begin with, rather than always disable it and wait for the user to expressly turn it on. Added new device id's and support for future devices. Fixed a possible type compare problem between a size and possible negative number. Also fixed a shift value that was wrong, which ended up with a bad bitmask. Did general house cleaning of the driver to cleanup several low lying fruit in the driver. Fixed an issue where new unicast address's would be added to the VSI list and then immediately removed and would never actually make it down to the hardware. Resolved the issue by removing the separation from unicast and multicast in the search for filters to be deleted. Mitch fixes an issue where the hardware would continue to access the memory formerly used by the rings for a VF which have been removed, causing memory corruption or DMAR errors. To relieve this condition, explicitly stop all rings associated with each VF before releasing its resources. Also fixed a panic if the driver is unable to enable MSI-X or its unable to acquire enough vectors, so propagate interrupt allocation failure information to the calling function. Cleaned up opcode that is not required. Carolyn extends the size of the test available for the interrupt names so that all the descriptive data available for the Flow Director interrupts is not truncated. Catherine fixes an issue where there was a possibility of speed getting set to 0 if advertised is set to 0 (which is the case when autoneg is disabled). Jesse fixes the checksum on big endian machines, so added code to swap it correctly. Also fixed a bug in the return from get_link_status() where only true or false was being returned, but false could mean multiple things. So allow the caller to get all the return values in the call chain bubbled back to the source so that the reason for the failure does not get lost. Anjali adds statistics to keep track of how many times we ask the stack to linearize the SKB because the hardware cannot handle SKBs with more than 8 frags per segment/single packet. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Merge tag 'regmap-offload-update-bits' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap regmap: Allow buses to provide a custom update_bits() operation Some buses provide a native _update_bits() operation which for uncached registers is faster than doing a read/modify/write cycle as it is a single bus transaction. Add support for implementing this to regmap.
-
由 Mitch Williams 提交于
This opcode is not required. VFs that program RSS through the firmware do it by interacting directly with the firmware, and do not need to use the virtual channel for this functionality. Change-ID: Iaf17d2600e28ff1b6be8653f2fe9df1facd23b0e Signed-off-by: NMitch Williams <mitch.a.williams@intel.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-
由 Mitch Williams 提交于
Lower level functions are properly reporting errors, and higher-level functions are correctly responding to errors, but the errors aren't actually getting through. Typically, the middle-manager function seems to want to shield its boss from any bad news. This change fixes a panic if the driver is unable to enable MSI-X or is unable to acquire enough vectors. Change-ID: Ifd5787ce92519a5d97e4b465902db930d97b71a1 Signed-off-by: NMitch Williams <mitch.a.williams@intel.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-
由 Neerav Parikh 提交于
The firmware has added additional status information to allow software to determine if the APP priority for FCoE/iSCSI/FIP is valid or not in CEE DCBX mode. This patch adds to support those additional checks and will only add applications to the software table that have oper and sync bits set without any error. Change-ID: I0a76c52427dadf97d4dba4538a3068d05e4eb56b Signed-off-by: NNeerav Parikh <neerav.parikh@intel.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-
由 Anjali Singhai Jain 提交于
Keep track of how many times we ask the stack to linearize the skb because the HW cannot handle skbs with more than 8 frags per segment/single packet. Change-ID: If455452060963a769bbe6112cba952e79e944b52 Signed-off-by: NAnjali Singhai Jain <anjali.singhai@intel.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-
由 Shannon Nelson 提交于
When using something like "ip maddr add ..." to add another unicast mac address to the netdev, the mac address comes into the set_rx_mode handler in the multicast list whether it is a unicast or multicast address. This was confusing the code when it was trying to search for addresses that needed to be deleted from the VSI, because it was looking for the VSI unicast address in the netdev unicast list. The result was that a new unicast address would get added to the VSI list and then immediately removed, and would never actually make it down into the hardware. This patch removes the separation from unicast and multicast in the search for filters to be deleted. It also simplifies the logic a little with a jump to the bottom of the loop when an address is found. Now it doesn't matter which netdev list the address is hiding in, we'll check them all. Change-ID: Ie3685a92427ae7d2212bf948919ce295bc7a874c Signed-off-by: NShannon Nelson <shannon.nelson@intel.com> Tested-by: NAndrew Bowers <andrewx.bowers@intel.com> Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
-