提交 · f96da09473b52c09125cc9bf7d7d4576ae8229e0 · openeuler / Kernel

03 7月, 2017 26 次提交

bpf: simplify narrower ctx access · f96da094

由 Daniel Borkmann 提交于 7月 02, 2017

This work tries to make the semantics and code around the
narrower ctx access a bit easier to follow. Right now
everything is done inside the .is_valid_access(). Offset
matching is done differently for read/write types, meaning
writes don't support narrower access and thus matching only
on offsetof(struct foo, bar) is enough whereas for read
case that supports narrower access we must check for
offsetof(struct foo, bar) + offsetof(struct foo, bar) +
sizeof(<bar>) - 1 for each of the cases. For read cases of
individual members that don't support narrower access (like
packet pointers or skb->cb[] case which has its own narrow
access logic), we check as usual only offsetof(struct foo,
bar) like in write case. Then, for the case where narrower
access is allowed, we also need to set the aux info for the
access. Meaning, ctx_field_size and converted_op_size have
to be set. First is the original field size e.g. sizeof(<bar>)
as in above example from the user facing ctx, and latter
one is the target size after actual rewrite happened, thus
for the kernel facing ctx. Also here we need the range match
and we need to keep track changing convert_ctx_access() and
converted_op_size from is_valid_access() as both are not at
the same location.

We can simplify the code a bit: check_ctx_access() becomes
simpler in that we only store ctx_field_size as a meta data
and later in convert_ctx_accesses() we fetch the target_size
right from the location where we do convert. Should the verifier
be misconfigured we do reject for BPF_WRITE cases or target_size
that are not provided. For the subsystems, we always work on
ranges in is_valid_access() and add small helpers for ranges
and narrow access, convert_ctx_accesses() sets target_size
for the relevant instruction.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Cc: Yonghong Song <yhs@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f96da094

bpf: add bpf_skb_adjust_room helper · 2be7e212

由 Daniel Borkmann 提交于 7月 02, 2017

This work adds a helper that can be used to adjust net room of an
skb. The helper is generic and can be further extended in future.
Main use case is for having a programmatic way to add/remove room to
v4/v6 header options along with cls_bpf on egress and ingress hook
of the data path. It reuses most of the infrastructure that we added
for the bpf_skb_change_type() helper which can be used in nat64
translations. Similarly, the helper only takes care of adjusting the
room so that related data is populated and csum adapted out of the
BPF program using it.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2be7e212

bpf, net: add skb_mac_header_len helper · 0daf4349

由 Daniel Borkmann 提交于 7月 02, 2017

Add a small skb_mac_header_len() helper similarly as the
skb_network_header_len() we have and replace open coded
places in BPF's bpf_skb_change_proto() helper. Will also
be used in upcoming work.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0daf4349

net: cdc_mbim: apply "NDP to end" quirk to HP lt4132 · a68491f8

由 Tore Anderson 提交于 7月 01, 2017

The HP lt4132 LTE/HSPA+ 4G Module (03f0:a31d) is a rebranded Huawei
ME906s-158 device. It, like the ME906s-158, requires the "NDP to end"
quirk for correct operation.
Signed-off-by: NTore Anderson <tore@fud.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a68491f8

Documentation: fix wrong example command · 75674c4c

由 Matteo Croce 提交于 6月 30, 2017

In the IPVLAN documentation there is an example command line where the
master and slave interface names are inverted.
Fix the command line and also add the optional `name' keyword to better
describe what the command is doing.

v2: added commit message
Signed-off-by: NMatteo Croce <mcroce@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

75674c4c

vxlan: correctly set vxlan->net when creating the device in a netns · 889ce937

由 Sabrina Dubroca 提交于 6月 30, 2017

Commit a985343b ("vxlan: refactor verification and application of
configuration") modified vxlan device creation, and replaced the
assignment of vxlan->net to src_net with dev_net(netdev) in ->setup().

But dev_net(netdev) is not the same as src_net. At the time ->setup()
is called, dev_net hasn't been set yet, so we end up creating the
socket for the vxlan device in init_net.

Fix this by bringing back the assignment of vxlan->net during device
creation.

Fixes: a985343b ("vxlan: refactor verification and application of configuration")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Reviewed-by: NMatthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

889ce937

Merge branch 'hns-phy-loopback' · c3b99db8

由 David S. Miller 提交于 7月 03, 2017

Lin Yun Sheng says:

====================
Add loopback support in phy_driver and hns ethtool fix

This Patch Set add set_loopback in phy_driver and use it to setup loopback
when doing ethtool phy self_test.

Patch V8:
	Respin the Patch based on net-next

Patch V7:
	1. Add comment why resume the phy in hns_nic_config_phy_loopback.
	2. Fix a typo error in patch description.

Patch V6:
	Fix Or'ing error code in __lb_setup.

Patch V5:
	Removing non loopback related code change.

Patch V4:
	1. Remove c45 checking
	2. Add -ENOTSUPP when function pointer is null,
	   take mutex in phy_loopback.

Patch V3:
	Calling phy_loopback enable and disable in pair in hns mac driver.

Patch V2:
	1. Add phy_loopback in phy_device.c.
	2. Do error checking and do the read and write once in
	   genphy_loopback.
	3. Remove gen10g_loopback in phy_device.c.

Patch V1:
	Initial Submit
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c3b99db8

net: hns: Use phy_driver to setup Phy loopback · 67cd9a99

由 Lin Yun Sheng 提交于 6月 30, 2017

Use function set_loopback in phy_driver to setup phy loopback
when doing ethtool self test.
Signed-off-by: NLin Yun Sheng <linyunsheng@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67cd9a99

net: phy: Add phy loopback support in net phy framework · f0f9b4ed

由 Lin Yun Sheng 提交于 6月 30, 2017

This patch add set_loopback in phy_driver, which is used by MAC
driver to enable or disable phy loopback. it also add a generic
genphy_loopback function, which use BMCR loopback bit to enable
or disable loopback.
Signed-off-by: NLin Yun Sheng <linyunsheng@huawei.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f0f9b4ed

net/mlx5: fix memcpy limit? · 6992c6c5

由 Stephen Rothwell 提交于 6月 30, 2017

Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6992c6c5

ipv6: dad: don't remove dynamic addresses if link is down · ec8add2a

由 Sabrina Dubroca 提交于 6月 29, 2017

Currently, when the link for $DEV is down, this command succeeds but the
address is removed immediately by DAD (1):

    ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800

In the same situation, this will succeed and not remove the address (2):

    ip addr add 1111::12/64 dev $DEV
    ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800

The comment in addrconf_dad_begin() when !IF_READY makes it look like
this is the intended behavior, but doesn't explain why:

     * If the device is not ready:
     * - keep it tentative if it is a permanent address.
     * - otherwise, kill it.

We clearly cannot prevent userspace from doing (2), but we can make (1)
work consistently with (2).

addrconf_dad_stop() is only called in two cases: if DAD failed, or to
skip DAD when the link is down. In that second case, the fix is to avoid
deleting the address, like we already do for permanent addresses.

Fixes: 3c21edbd ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.")
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec8add2a

net: cdc_ncm: Reduce memory use when kernel memory low · e1069bbf

由 Jim Baxter 提交于 6月 28, 2017

The CDC-NCM driver can require large amounts of memory to create
skb's and this can be a problem when the memory becomes fragmented.

This especially affects embedded systems that have constrained
resources but wish to maximise the throughput of CDC-NCM with 16KiB
NTB's.

The issue is after running for a while the kernel memory can become
fragmented and it needs compacting.
If the NTB allocation is needed before the memory has been compacted
the atomic allocation can fail which can cause increased latency,
large re-transmissions or disconnections depending upon the data
being transmitted at the time.
This situation occurs for less than a second until the kernel has
compacted the memory but the failed devices can take a lot longer to
recover from the failed TX packets.

To ease this temporary situation I modified the CDC-NCM TX path to
temporarily switch into a reduced memory mode which allocates an NTB
that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes)
sized memory block and only transmit NTB's with a single network frame
until the memory situation is resolved.
Each time this issue occurs we wait for an increasing number of
reduced size allocations before requesting a full size one to not
put additional pressure on a low memory system.

Once the memory is compacted the CDC-NCM data can resume transmitting
at the normal tx_max rate once again.
Signed-off-by: NJim Baxter <jim_baxter@mentor.com>
Reviewed-by: NBjørn Mork <bjorn@mork.no>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e1069bbf

Merge branch 'qed-Add-iWARP-support-for-QL4xxxx' · 2da95be9

由 David S. Miller 提交于 7月 03, 2017

Michal Kalderon says:

====================
qed: Add iWARP support for QL4xxxx

This patch series adds iWARP support to our QL4xxxx networking adapters.
The code changes span across qed and qedr drivers, but this series contains
changes to qed only. Once the series is accepted, the qedr series will
be submitted to the rdma tree.
There is one additional qed patch which enables the iWARP, this patch is
delayed until the qedr series will be accepted.

The patches were previously sent as an RFC, and these are the first 12
patches in the RFC series:
https://www.spinics.net/lists/linux-rdma/msg51416.html

This series was tested and built against net-next.

MAINTAINERS file is not updated in this PATCH as there is a pending patch
for qedr driver update https://patchwork.kernel.org/patch/9752761.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2da95be9

qed: Add iWARP support for physical queue allocation · 93c45984

由 Kalderon, Michal 提交于 7月 02, 2017

iWARP has different physical queue requirements than RoCE
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

93c45984

qed: Add iWARP protocol support in context allocation · 5d7dc962

由 Kalderon, Michal 提交于 7月 02, 2017

When computing how much memory is required for the different hw clients
iWARP protocol should be taken into account
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d7dc962

qed: iWARP CM add error handling · 9816b614

由 Kalderon, Michal 提交于 7月 02, 2017

This patch introduces error handling for errors that occurred during
connection establishment.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9816b614

qed: iWARP implement disconnect flows · fc4c6065

由 Kalderon, Michal 提交于 7月 02, 2017

This patch takes care of active/passive disconnect flows.
Disconnect flows can be initiated remotely, in which case a async event
will arrive from peer and indicated to qedr driver. These
are referred to as exceptions. When a QP is destroyed, it needs to check
that it's associated ep has been closed.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc4c6065

qed: iWARP CM add active side connect · 4b0fdd7c

由 Kalderon, Michal 提交于 7月 02, 2017

This patch implements the active side connect.
Offload a connection, process MPA reply and send RTR.
In some of the common passive/active functions, the active side
will work in blocking mode.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b0fdd7c

qed: iWARP CM add passive side connect · 456a5849

由 Kalderon, Michal 提交于 7月 02, 2017

This patch implements the passive side connect.
It addresses pre-allocating resources, creating a connection
element upon valid SYN packet received. Calling upper layer and
implementation of the accept/reject calls.

Error handling is not part of this patch.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

456a5849

qed: iWARP CM add listener functions and initial SYN processing · 65a91a6c

由 Kalderon, Michal 提交于 7月 02, 2017

This patch adds the ability to add and remove listeners and identify
whether the SYN packet received is intended for iWARP or not. If
a listener is not found the SYN packet is posted back to the chip.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

65a91a6c

qed: iWARP CM - setup a ll2 connection for handling SYN packets · b5c29ca7

由 Kalderon, Michal 提交于 7月 02, 2017

iWARP handles incoming SYN packets using the ll2 interface. This patch
implements ll2 setup and teardown. Additional ll2 connections will
be used in the future which are not part of this patch series.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b5c29ca7

qed: Add iWARP support in ll2 connections · cc4ad324

由 Kalderon, Michal 提交于 7月 02, 2017

Add a new connection type for iWARP ll2 connections for setting
correct ll2 filters and connection type to FW.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc4ad324

qed: Rename some ll2 related defines · 526d1d05

由 Kalderon, Michal 提交于 7月 02, 2017

Make some names more generic as they will be used by iWARP too.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

526d1d05

qed: Implement iWARP initialization, teardown and qp operations · 67b40dcc

由 Kalderon, Michal 提交于 7月 02, 2017

This patch adds iWARP support for flows that have common code
between RoCE and iWARP, such as initialization, teardown and
qp setup verbs: create, destroy, modify, query.
It introduces the iWARP specific files qed_iwarp.[ch] and
iwarp_common.h
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

67b40dcc

qed: Introduce iWARP personality · c851a9dc

由 Kalderon, Michal 提交于 7月 02, 2017

iWARP personality introduced the need for differentiating in several
places in the code whether we are RoCE, iWARP or either. This
leads to introducing new macros for querying the personality.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NAriel Elior <Ariel.Elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c851a9dc

bpf: fix to bpf_setsockops · a5192c52

由 Lawrence Brakmo 提交于 7月 02, 2017

Fixed build error due to misplaced "#ifdef CONFIG_INET" (moved 1
statement up).
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5192c52

02 7月, 2017 14 次提交

Merge branch 'bpf-Add-support-for-sock_ops' · bcdb239b

由 David S. Miller 提交于 7月 01, 2017

Lawrence Brakmo says:

====================
bpf: Add support for sock_ops

Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding
struct that allows BPF programs of this type to access some of the
socket's fields (such as IP addresses, ports, etc.) and setting
connection parameters such as buffer sizes, initial window, SYN/SYN-ACK
RTOs, etc.

Unlike current BPF program types that expect to be called at a particular
place in the network stack code, SOCK_OPS program can be called at
different places and use an "op" field to indicate the context. There
are currently two types of operations, those whose effect is through
their return value and those whose effect is through the new
bpf_setsocketop BPF helper function.

Example operands of the first type are:
  BPF_SOCK_OPS_TIMEOUT_INIT
  BPF_SOCK_OPS_RWND_INIT
  BPF_SOCK_OPS_NEEDS_ECN

Example operands of the secont type are:
  BPF_SOCK_OPS_TCP_CONNECT_CB
  BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB
  BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB

Current operands are only called during connection establishment so
there should not be any BPF overheads after connection establishment. The
main idea is to use connection information form both hosts, such as IP
addresses and ports to allow setting of per connection parameters to
optimize the connection's peformance.

Alghough there are already 3 mechanisms to set parameters (sysctls,
route metrics and setsockopts), this new mechanism provides some
disticnt advantages. Unlike sysctls, it can set parameters per
connection. In contrast to route metrics, it can also use port numbers
and information provided by a user level program. In addition, it could
set parameters probabilistically for evaluation purposes (i.e. do
something different on 10% of the flows and compare results with the
other 90% of the flows). Also, in cases where IPv6 addresses contain
geographic information, the rules to make changes based on the distance
(or RTT) between the hosts are much easier than route metric rules and
can be global. Finally, unlike setsockopt, it does not require
application changes and it can be updated easily at any time.

It uses the existing bpf cgroups infrastructure so the programs can be
attached per cgroup with full inheritance support. Although the bpf cgroup
framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK),
I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type
expects to be called only once during the connections's lifetime. In contrast,
the new program type will be called multiple times from different places in the
network stack code.  For example, before sending SYN and SYN-ACKs to set
an appropriate timeout, when the connection is established to set congestion
control, etc. As a result it has "op" field to specify the type of operation
requested.

This patch set also includes sample BPF programs to demostrate the differnet
features.

v2: Formatting changes, rebased to latest net-next

v3: Fixed build issues, changed socket_ops to sock_ops throught,
    fixed formatting issues, removed the syscall to load sock_ops
    program and added functionality to use existing bpf attach and
    bpf detach system calls, removed reader/writer locks in
    sock_bpfops.c (used when saving sock_ops global program)
    and fixed missing module refcount increment.

v4: Removed global sock_ops program and instead used existing cgroup bpf
    infrastructure to support a new BPF_CGROUP_ATTCH type.

v5: fixed kbuild warning happening in bpf-cgroup.h
    removed automatic converstion to host byte order from some sock_ops
      fields (ipv4 and ipv6 addresses, remote port)
    Added conversion to host byte order in some of the sample programs
    Added to sample BPF program comments about using load_sock_ops to load
    Removed is_req_sock field from bpf_sock_ops_kern and related places,
      using sk_fullsock() instead.

v6: fixes to BPF helper function setsockopt (possible NULL deferencing, etc.)
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bcdb239b

bpf: update tools/include/uapi/linux/bpf.h · 04df41e3

由 Lawrence Brakmo 提交于 6月 30, 2017

Update tools/include/uapi/linux/bpf.h to include changes related to new
bpf sock_ops program type.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

04df41e3

bpf: Sample bpf program to set sndcwnd clamp · 6c4a01b2

由 Lawrence Brakmo 提交于 6月 30, 2017

Sample BPF program, tcp_clamp_kern.c, to demostrate the use
of setting the sndcwnd clamp. This program assumes that if the
first 5.5 bytes of the host's IPv6 addresses are the same, then
the hosts are in the same datacenter and sets sndcwnd clamp to
100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer
sizes to 150KB.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c4a01b2

bpf: Adds support for setting sndcwnd clamp · 13bf9641

由 Lawrence Brakmo 提交于 6月 30, 2017

Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which
sets the initial congestion window. It is useful to limit the sndcwnd
when the host are close to each other (small RTT).
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13bf9641

bpf: Sample BPF program to set initial cwnd · 7bc62e28

由 Lawrence Brakmo 提交于 6月 30, 2017

Sample BPF program that assumes hosts are far away (i.e. large RTTs)
and sets initial cwnd and initial receive window to 40 packets,
send and receive buffers to 1.5MB.

In practice there would be a test to insure the hosts are actually
far enough away.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bc62e28

bpf: Adds support for setting initial cwnd · fc747810

由 Lawrence Brakmo 提交于 6月 30, 2017

Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the
initial congestion window. This can be used when the hosts are far
apart (large RTTs) and it is safe to start with a large inital cwnd.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc747810

bpf: Sample BPF program to set congestion control · bb56d444

由 Lawrence Brakmo 提交于 6月 30, 2017

Sample BPF program that sets congestion control to dctcp when both hosts
are within the same datacenter. In this example that is assumed to be
when they have the first 5.5 bytes of their IPv6 address are the same.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb56d444

bpf: Add support for changing congestion control · 91b5b21c

由 Lawrence Brakmo 提交于 6月 30, 2017

Added support for changing congestion control for SOCK_OPS bpf
programs through the setsockopt bpf helper function. It also adds
a new SOCK_OPS op, BPF_SOCK_OPS_NEEDS_ECN, that is needed for
congestion controls, like dctcp, that need to enable ECN in the
SYN packets.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91b5b21c

bpf: Sample BPF program to set buffer sizes · d9925368

由 Lawrence Brakmo 提交于 6月 30, 2017

This patch contains a BPF program to set initial receive window to
40 packets and send and receive buffers to 1.5MB. This would usually
be done after doing appropriate checks that indicate the hosts are
far enough away (i.e. large RTT).
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9925368

bpf: Add TCP connection BPF callbacks · 9872a4bd

由 Lawrence Brakmo 提交于 6月 30, 2017

Added callbacks to BPF SOCK_OPS type program before an active
connection is intialized and after a passive or active connection is
established.

The following patch demostrates how they can be used to set send and
receive buffer sizes.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9872a4bd

bpf: Add setsockopt helper function to bpf · 8c4b4c7e

由 Lawrence Brakmo 提交于 6月 30, 2017

Added support for calling a subset of socket setsockopts from
BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather
than making the changes to call the socket setsockopt function because
the changes required would have been larger.

The ops supported are:
  SO_RCVBUF
  SO_SNDBUF
  SO_MAX_PACING_RATE
  SO_PRIORITY
  SO_RCVLOWAT
  SO_MARK
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c4b4c7e

bpf: Sample bpf program to set initial window · c400296b

由 Lawrence Brakmo 提交于 6月 30, 2017

The sample bpf program, tcp_rwnd_kern.c, sets the initial
advertized window to 40 packets in an environment where
distinct IPv6 prefixes indicate that both hosts are not
in the same data center.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c400296b

bpf: Support for setting initial receive window · 13d3b1eb

由 Lawrence Brakmo 提交于 6月 30, 2017

This patch adds suppport for setting the initial advertized window from
within a BPF_SOCK_OPS program. This can be used to support larger
initial cwnd values in environments where it is known to be safe.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13d3b1eb

bpf: Sample bpf program to set SYN/SYN-ACK RTOs · 61bc4d8d

由 Lawrence Brakmo 提交于 6月 30, 2017

The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK
RTOs to 10ms when both hosts are within the same datacenter (i.e.
small RTTs) in an environment where common IPv6 prefixes indicate
both hosts are in the same data center.
Signed-off-by: NLawrence Brakmo <brakmo@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61bc4d8d

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功