Commits · 662c54721d3a1e8950029cb6b0ed264d59847711 · openeuler / Kernel

07 Jul, 2018 1 commit

nfp: bpf: rename umin/umax to umin_src/umax_src · 662c5472

Committed by Jiong Wang on Jul 06, 2018

The two fields are a copy of umin and umax info of bpf_insn->src_reg
generated by verifier.

Rename to make their meaning clear.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

27 Jun, 2018 1 commit

nfp: bpf: allow source ptr type be map ptr in memcpy optimization · cc0dff6d

Committed by Jiong Wang on Jun 26, 2018

Map reads are already supported on NFP; this patch enables the memcpy
optimization for copies from map to packet.

This patch also fixes a latent bug that would cause copying from an
unexpected address once memcpy for map pointers is enabled.  The fixed
code path was not exercised before.
Reported-by: Mary Pham <mary.pham@netronome.com>
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

19 May, 2018 3 commits

nfp: bpf: support arithmetic indirect right shift (BPF_ARSH | BPF_X) · c217abcc

Committed by Jiong Wang on May 18, 2018

Code logic is similar to arithmetic right shift by constant, and NFP
gets the indirect shift amount through the source A operand of PREV_ALU.

It is possible to fall back to a logical right shift if the MSB is known
to be zero from range info; however, there is no benefit in doing so, given
that a logical indirect right shift uses an instruction sequence of the
same length and cycle count.

Suppose the MSB of regX is the bit we want to replicate to fill in all the
vacant positions, and regY contains the shift amount, then we can use a
single instruction to set up both.

  [alu, --, regY, OR, regX]

  --
  NOTE: the PREV_ALU result doesn't need to write to any destination
        register.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: support arithmetic right shift by constant (BPF_ARSH | BPF_K) · f43d0f17

Committed by Jiong Wang on May 18, 2018

Code logic is similar to logical right shift, except that we also need to
set the PREV_ALU result properly: its MSB is the bit that will be
replicated to fill in all the vacant positions.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
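
As a rough illustration of the sign replication described in commit f43d0f17, the sketch below models a 64-bit arithmetic right shift by a constant on a pair of 32-bit registers in plain C. The `reg_pair` struct and both function names are hypothetical and are not taken from the driver; the real JIT emits NFP instructions and relies on the PREV_ALU result rather than on C expressions.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit value held in two 32-bit registers. */
struct reg_pair {
	uint32_t lo;
	uint32_t hi;
};

/*
 * Arithmetic right shift of one 32-bit word, valid for 0 <= n <= 31.
 * Shift logically, then fill the vacated high bits with copies of the MSB,
 * which avoids implementation-defined shifts of negative signed values.
 */
static uint32_t asr32(uint32_t x, unsigned int n)
{
	uint32_t fill = (x & 0x80000000u) ? 0xffffffffu : 0u;

	if (n == 0)
		return x;
	return (x >> n) | (fill << (32 - n));
}

/* 64-bit arithmetic right shift by a constant, valid for 0 <= n <= 63. */
static struct reg_pair arsh64_const(struct reg_pair v, unsigned int n)
{
	struct reg_pair r;

	if (n == 0) {
		r = v;
	} else if (n < 32) {
		/* Low word receives the bits shifted out of the high word. */
		r.lo = (v.lo >> n) | (v.hi << (32 - n));
		r.hi = asr32(v.hi, n);
	} else {
		/* Low word comes entirely from the high word. */
		r.lo = asr32(v.hi, n - 32);
		/* High word becomes all copies of the sign bit. */
		r.hi = asr32(v.hi, 31);
	}
	return r;
}
```

For example, shifting the pair `{ .lo = 0, .hi = 0x80000000 }` right by 36 gives `lo = 0xf8000000` and `hi = 0xffffffff` under this model.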

nfp: bpf: support logic indirect shifts (BPF_[L|R]SH | BPF_X) · 991f5b36

Committed by Jiong Wang on May 18, 2018

For indirect shifts, the shift amount is not specified as a constant. NFP
needs to get the shift amount through the low 5 bits of the source A operand
in PREV_ALU, therefore extra instructions are needed compared with shifts by
constants.

Because NFP is 32-bit, we use a register pair for 64-bit shifts and
therefore need different instruction sequences depending on whether the
shift amount is less than 32 or not.

An NFP branch-on-bit-test instruction emitter is added by this patch and is
used for an efficient runtime check on the shift amount. We treat the shift
amount as less than 32 if bit 5 is clear, and as greater than or equal to 32
otherwise. A shift amount greater than or equal to 64 results in
undefined behavior.

This patch also uses range info to avoid generating unnecessary runtime code
when we are certain whether the shift amount is less than 32 or not.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
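
The bit-5 test described in commit 991f5b36 can be modelled in C as follows. This is only a sketch of the idea under the stated limit (shift amounts below 64); `reg_pair`, `lsh64` and `rsh64` are hypothetical names, and the driver itself emits NFP instructions with a branch-on-bit-test rather than C branches.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit value held in two 32-bit registers. */
struct reg_pair {
	uint32_t lo;
	uint32_t hi;
};

/* 64-bit logical left shift by a run-time amount, valid for amt <= 63. */
static struct reg_pair lsh64(struct reg_pair v, uint32_t amt)
{
	struct reg_pair r;
	uint32_t n = amt & 31;	/* low 5 bits: shift within one 32-bit word */

	if (amt & 32) {		/* bit 5 set: amount is in 32..63 */
		r.hi = v.lo << n;
		r.lo = 0;
	} else if (n == 0) {	/* avoid a 32-bit shift by 32 below */
		r = v;
	} else {		/* bit 5 clear: amount is in 1..31 */
		r.hi = (v.hi << n) | (v.lo >> (32 - n));
		r.lo = v.lo << n;
	}
	return r;
}

/* 64-bit logical right shift by a run-time amount, valid for amt <= 63. */
static struct reg_pair rsh64(struct reg_pair v, uint32_t amt)
{
	struct reg_pair r;
	uint32_t n = amt & 31;

	if (amt & 32) {
		r.lo = v.hi >> n;
		r.hi = 0;
	} else if (n == 0) {
		r = v;
	} else {
		r.lo = (v.lo >> n) | (v.hi << (32 - n));
		r.hi = v.hi >> n;
	}
	return r;
}
```

When range info proves that bit 5 is always clear or always set, only one of the two branches would need to be generated, which is the saving the commit message refers to.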

10 May, 2018 1 commit

nfp: bpf: support setting the RX queue index · d985888f

Committed by Jakub Kicinski on May 08, 2018

BPF has access to all internal FW datapath structures, including
the structure containing RX queue selection. With little coordination
with the datapath we can let the offloaded BPF select the RX queue.
We just need a way to tell the datapath that queue selection has already
been done and it shouldn't overwrite it. Define a bit to tell the datapath
that BPF already selected a queue (QSEL_SET). If the selected queue is not
enabled (>= number of enabled queues), the datapath will perform normal RSS.

BPF queue selection on the NIC can be used to replace standard
datapath RSS with fully programmable BPF/XDP RSS.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
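
The fallback rule in commit d985888f can be pictured with a small C model. Everything here is hypothetical: the `pkt_meta` layout, the `qsel_set` flag standing in for the QSEL_SET bit, and the modulo used as a placeholder for "normal RSS" are illustration only, since the commit message does not show the firmware structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet metadata; not the real FW datapath structure. */
struct pkt_meta {
	bool qsel_set;		/* stands in for the QSEL_SET bit */
	uint32_t queue;		/* queue index written by the BPF program */
	uint32_t rss_hash;	/* hash the datapath would otherwise use */
};

/*
 * Pick the RX queue. n_enabled must be greater than zero.
 * A BPF-selected queue is honoured only if it is an enabled queue;
 * otherwise the datapath falls back to its normal RSS choice, which is
 * modelled here as a simple modulo of the hash.
 */
static uint32_t pick_rx_queue(const struct pkt_meta *m, uint32_t n_enabled)
{
	if (m->qsel_set && m->queue < n_enabled)
		return m->queue;
	return m->rss_hash % n_enabled;
}
```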

05 May, 2018 2 commits

nfp: bpf: rewrite map pointers with NFP TIDs · b4264c96

Committed by Jakub Kicinski on May 03, 2018

Kernel will now replace map fds with actual pointer before
calling the offload prepare.  We can identify those pointers
and replace them with NFP table IDs instead of loading the
table ID in code generated for CALL instruction.

This allows us to support having the same CALL being used with
different maps.

Since we don't want to change the FW ABI we still need to
move the TID from R1 to portion of R0 before the jump.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: perf event output helpers support · 9816dd35

Committed by Jakub Kicinski on May 03, 2018

Add support for the perf_event_output family of helpers.

The implementation on the NFP will not match the host code exactly.
The state of the host map and rings is unknown to the device, hence
device can't return errors when rings are not installed.  The device
simply packs the data into a firmware notification message and sends
it over to the host, returning success to the program.

There is no notion of a host CPU on the device when packets are being
processed.  Device will only offload programs which set BPF_F_CURRENT_CPU.
Still, if map index doesn't match CPU no error will be returned (see
above).

Dropped/lost firmware notification messages will not cause "lost
events" event on the perf ring, they are only visible via device
error counters.

Firmware notification messages may also get reordered in respect
to the packets which caused their generation.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

25 Apr, 2018 4 commits

nfp: bpf: optimize comparisons to negative constants · 7bdc97be

Committed by Jakub Kicinski on Apr 24, 2018

Comparison instruction requires a subtraction.  If the constant
is negative we are more likely to fit it into a NFP instruction
directly if we change the sign and use addition.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: tabularize generations of compare operations · 61dd8f00

Committed by Jakub Kicinski on Apr 24, 2018

There are quite a few compare instructions now, use a table
to translate BPF instruction code to NFP instruction parameters
instead of parameterizing helpers.  This saves LOC and makes
future extensions easier.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: optimize add/sub of a negative constant · 6c59500c

Committed by Jakub Kicinski on Apr 24, 2018

NFP instruction set can fit small immediates into the instruction.
Negative integers, however, will never fit because they will have
highest bit set.  If we swap the ALU op between ADD and SUB and
negate the constant we have a better chance of fitting small negative
integers into the instruction itself and saving one or two cycles.

immed[gprB_21, 0xfffffffc]
alu[gprA_4, gprA_4, +, gprB_21], gpr_wrboth
immed[gprB_21, 0xffffffff]
alu[gprA_5, gprA_5, +carry, gprB_21], gpr_wrboth

now becomes:

alu[gprA_4, gprA_4, -, 4], gpr_wrboth
alu[gprA_5, gprA_5, -carry, 0], gpr_wrboth
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
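
The operand rewrite from commit 6c59500c can be sketched in C as below. The names `alu_imm`, `flip_negative` and `fits_immediate` are hypothetical, and `ASSUMED_IMM_MAX` is an assumed limit chosen only for illustration; the commit message says small immediates fit into the instruction but does not state the exact field width.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: largest value that fits in the instruction, illustration only. */
#define ASSUMED_IMM_MAX 0xffu

enum alu_op { ALU_ADD, ALU_SUB };

struct alu_imm {
	enum alu_op op;
	uint32_t imm;
};

/* An immediate fits only if it is a small unsigned value. */
static bool fits_immediate(uint32_t imm)
{
	return imm <= ASSUMED_IMM_MAX;
}

/*
 * If the constant is negative, swap ADD and SUB and negate the constant,
 * so that "x + (-4)" becomes "x - 4" and the operand is small again.
 */
static struct alu_imm flip_negative(enum alu_op op, int32_t imm)
{
	struct alu_imm r = { op, (uint32_t)imm };

	if (imm < 0) {
		r.op = (op == ALU_ADD) ? ALU_SUB : ALU_ADD;
		/* Negate in unsigned arithmetic to avoid signed overflow. */
		r.imm = 0u - (uint32_t)imm;
	}
	return r;
}
```

With these definitions, the constant -4 (0xfffffffc as a 32-bit pattern) does not fit as an ADD operand, while the flipped form is a SUB of 4, which does.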

nfp: bpf: remove double space · 9c9e5323

Committed by Jakub Kicinski on Apr 24, 2018

Whitespace cleanup - remove double space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

29 Mar, 2018 11 commits

nfp: bpf: add support for bpf_get_prandom_u32() · df4a37d8

Committed by Jakub Kicinski on Mar 28, 2018

NFP has a prng register, which we can read to obtain a u32 worth
of pseudo random data.  Generate code for it.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: add support for atomic add of unknown values · 41aed09c

Committed by Jakub Kicinski on Mar 28, 2018

Allow atomic add to be used even when the value is not guaranteed
to fit into a 16 bit immediate.  This requires the value to be pulled
as data, and therefore use of a transfer register and a context swap.

Track the information about possible lengths of the value, if it's
guaranteed to be larger than 16bits don't generate the code for the
optimized case at all.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
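
One way to picture the range tracking in commit 41aed09c is a three-way decision on the known unsigned bounds of the added value. The enum and function below are hypothetical. The commit message states only that the optimized (immediate) case is skipped when the value is guaranteed to exceed 16 bits; skipping the data path when the value is guaranteed to fit is an assumption made here for symmetry.

```c
#include <stdint.h>

/* Hypothetical labels for which code paths would be generated. */
enum xadd_path {
	XADD_IMM_ONLY,	/* value always fits in a 16-bit immediate (assumed case) */
	XADD_DATA_ONLY,	/* value never fits: pull it as data via a transfer register */
	XADD_BOTH,	/* unknown at translation time: emit both, choose at run time */
};

/* umin and umax are the inclusive unsigned bounds known for the value. */
static enum xadd_path xadd_pick_path(uint64_t umin, uint64_t umax)
{
	if (umax <= UINT16_MAX)
		return XADD_IMM_ONLY;
	if (umin > UINT16_MAX)
		return XADD_DATA_ONLY;
	return XADD_BOTH;
}
```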

nfp: bpf: expose command delay slots · b556ddd9

Committed by Jakub Kicinski on Mar 28, 2018

Allow callers to control the delay slots of commands, instead of
giving them just a wait/nowait choice.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: add basic support for atomic adds · dcb0c27f

Committed by Jakub Kicinski on Mar 28, 2018

Implement atomic add operation for 32 and 64 bit values.  Depend
on the verifier to ensure alignment.  Values have to be kept in
big endian and swapped upon read/write.  For now only support
atomic add of a constant.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: add map deletes from the datapath · bfee64de

Committed by Jakub Kicinski on Mar 28, 2018

Support calling map_delete_elem() FW helper from the datapath
programs.  For the JIT, checks and code are basically equivalent
to map lookups.  As with other map helpers, the key must be on
the stack.  Different pointer types are left for future extension.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: add map updates from the datapath · 44d65a47

Committed by Jakub Kicinski on Mar 28, 2018

Support calling map_update_elem() from the datapath programs
by calling into FW-provided helper.  Value pointer is passed
in LM pointer #2.  Keeping track of old state for arg3 is not
necessary, since LM pointer #2 will always be loaded in this
case; the trivial optimization for value at the bottom of the
stack can't be done here.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: add helper for validating stack pointers · 2f46e0c1

Committed by Jakub Kicinski on Mar 28, 2018

Our implementation has restriction on stack pointers for function
calls.  Move the common checks into a helper for reuse.  The state
has to be encapsulated into a structure to support parameters
other than BPF_REG_2.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: rename map_lookup_stack() to map_call_stack_common() · fc448497

Committed by Jakub Kicinski on Mar 28, 2018

We will reuse most of map call code gen for other map calls.
Rename the lookup gen function and use meta->func_id instead
of hard-coding lookup.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: detect packet reads could be cached, enable the optimisation · 87b10ecd

Committed by Jiong Wang on Mar 28, 2018

This patch is the front end of this optimisation, it detects and marks
those packet reads that could be cached. Then the optimisation "backend"
will be activated automatically.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

nfp: bpf: support unaligned read offset · 91ff69e8

Committed by Jiong Wang on Mar 28, 2018

This patch adds support for unaligned read offsets, i.e. the read offset
from the start of the packet cache area is not aligned to REG_WIDTH. In this
case, the read area might span up to three transfer-in registers.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
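
The "up to three registers" bound in commit 91ff69e8 follows from simple arithmetic, sketched below. The helper name is hypothetical, and `REG_WIDTH` is assumed to be 4 bytes here, in line with the statement elsewhere in this log that NFP is 32-bit.

```c
/* ASSUMPTION: transfer registers are 4 bytes wide. */
#define REG_WIDTH 4u

/*
 * Number of transfer-in registers touched by a read of `size` bytes
 * starting `off` bytes into the packet cache area. size must be >= 1.
 */
static unsigned int xfer_regs_touched(unsigned int off, unsigned int size)
{
	unsigned int first = off / REG_WIDTH;
	unsigned int last = (off + size - 1) / REG_WIDTH;

	return last - first + 1;
}
```

An aligned 8-byte read touches two registers, while the same read at offset 1 touches three, so `xfer_regs_touched(1, 8)` evaluates to 3. Since the widest BPF load is 8 bytes, three is the maximum under this assumption.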

nfp: bpf: read from packet data cache for PTR_TO_PACKET · be759237

Committed by Jiong Wang on Mar 28, 2018

This patch assumes there is a packet data cache, and would try to read
packet data from the cache instead of from memory.

This patch only implements the optimisation "backend", it doesn't build
the packet data cache, so this optimisation is not enabled.

This patch has only enabled aligned packet data read, i.e. when the read
offset to the start of cache is REG_WIDTH aligned.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

25 Mar, 2018 1 commit

nfp: bpf: fix check of program max insn count · e8a4796e

Committed by Jakub Kicinski on Mar 23, 2018

NFP program allocation length is in bytes and NFP program length
is in instructions, fix the comparison of the two.

Fixes: 9314c442 ("nfp: bpf: move translation prepare to offload.c")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
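
The unit mismatch fixed by commit e8a4796e is easy to reproduce in a few lines of C. The function name is hypothetical and `ASSUMED_INSN_SIZE` (8 bytes per NFP instruction) is an assumption for illustration; the point is only that a length in instructions must be converted before it is compared with a length in bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: one NFP instruction occupies 8 bytes. */
#define ASSUMED_INSN_SIZE sizeof(uint64_t)

/*
 * Check that a program of prog_len_insns instructions fits into a buffer
 * of alloc_len_bytes bytes. Comparing the two numbers directly would mix
 * units and accept programs that are up to 8 times too large.
 */
static bool prog_fits(size_t prog_len_insns, size_t alloc_len_bytes)
{
	return prog_len_insns <= alloc_len_bytes / ASSUMED_INSN_SIZE;
}
```

For a 128-byte buffer, a 100-instruction program would pass a naive `100 <= 128` comparison, but `prog_fits(100, 128)` correctly reports that it does not fit.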

17 Jan, 2018 1 commit

nfp: bpf: reject program on instructions unknown to the JIT compiler · 74801e50

Committed by Quentin Monnet on Jan 16, 2018

If an eBPF instruction is unknown to the driver JIT compiler, we can
reject the program at verification time.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

15 Jan, 2018 2 commits

nfp: bpf: add support for reading map memory · 3dd43c33

Committed by Jakub Kicinski on Jan 11, 2018

Map memory needs to use 40 bit addressing.  Add handling of such
accesses.  Since 40 bit addresses are formed by using both 32 bit
operands we need to pre-calculate the actual address instead of
adding in the offset inside the instruction, like we did in 32 bit
mode.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: add verification and codegen for map lookups · 77a3d311

Committed by Jakub Kicinski on Jan 11, 2018

Verify our current constraints on the location of the key are
met and generate the code for calling map lookup on the datapath.

New relocation types have to be added - for helpers and return
addresses.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

10 Jan, 2018 6 commits

nfp: bpf: add signed jump insns · c087aa8b

Committed by Nic Viljoen on Jan 10, 2018

This patch adds signed jump instructions (jsgt, jsge, jslt, jsle)
to the nfp jit, as well as the additional raw assembler branch mask
required in nfp_asm.h.
Signed-off-by: Nic Viljoen <nick.viljoen@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: use a large constant in unresolved branches · e84797fe

Committed by Jakub Kicinski on Jan 10, 2018

To make absolute relocated branches (branches which will be completely
rewritten with br_set_offset()) distinguishable in user space dumps
from normal jumps add a large offset to them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: don't depend on high order allocations for program image · 44a12ecc

Committed by Jakub Kicinski on Jan 10, 2018

The translator pre-allocates a buffer of maximal program size.
Due to HW/FW limitations the program buffer can't currently be
longer than 128Kb, so we used to kmalloc() it, and then map for
DMA directly.

Now that the late branch resolution is copying the program image
anyway, we can just kvmalloc() the buffer.  While at it, after
translation reallocate the buffer to save space.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: relocate jump targets just before the load · 2314fe9e

Committed by Jakub Kicinski on Jan 10, 2018

Don't translate the program assuming it will be loaded at a given
address.  This will be required for sharing programs between ports
of the same NIC, tail calls and subprograms.  It will also make the
jump targets easier to understand when dumping the program to user
space.

Translate the program as if it was going to be loaded at address
zero.  When load happens add the load offset in and set addresses
of special branches.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
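
The late relocation idea from commit 2314fe9e (translate as if loaded at address zero, then add the load offset when the program is loaded) can be sketched with a toy instruction model. The `toy_insn` struct and `relocate` function are hypothetical and much simpler than the real NFP encoding; the "special branches" mentioned in the commit message are not modelled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified instruction: only what relocation needs. */
struct toy_insn {
	bool is_branch;		/* true if this instruction jumps somewhere */
	uint32_t target;	/* branch target, relative to program start */
};

/*
 * Apply the load offset to every branch target. Before this runs, targets
 * are expressed as if the program started at address zero, which also
 * keeps them easy to read when the image is dumped.
 */
static void relocate(struct toy_insn *prog, size_t n, uint32_t load_off)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (prog[i].is_branch)
			prog[i].target += load_off;
	}
}
```

Because the offset is applied only at load time, the same translated image could in principle be relocated to different addresses, which is what sharing a program between ports would require.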

nfp: bpf: add helpers for modifying branch addresses · 488feeaf

Committed by Jakub Kicinski on Jan 10, 2018

In preparation for better handling of relocations move existing
helper for setting branch offset to nfp_asm.c and add two more.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: move jump resolution to jit.c · 1549921d

Committed by Jakub Kicinski on Jan 10, 2018

Jump target resolution should be in jit.c not offload.c.
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

15 Dec, 2017 3 commits

nfp: bpf: optimize the adjust_head calls in trivial cases · 8231f844

Committed by Jakub Kicinski on Dec 14, 2017

If the program is simple and has only one adjust head call
with constant parameters, we can check that the call will
always succeed at translation time.  We need to track the
location of the call and make sure parameters are always
the same.  We also have to check the parameters against
datapath constraints and ETH_HLEN.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: add basic support for adjust head call · 0d49eaf4

Committed by Jakub Kicinski on Dec 14, 2017

Support bpf_xdp_adjust_head().  We need to check whether the
packet offset after adjustment is within datapath's limits.
We also check if the frame is at least ETH_HLEN long (similar
to the kernel implementation).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: prepare for call support · 2cb230bd

Committed by Jakub Kicinski on Dec 14, 2017

Add skeleton of verifier checks and translation handler
for call instructions.  Make sure jump target resolution
will not treat them as jumps.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

02 Dec, 2017 4 commits

nfp: bpf: detect load/store sequences lowered from memory copy · 6bc7103c

Committed by Jiong Wang on Nov 30, 2017

This patch adds the optimization frontend, by adding a new eBPF IR scan
pass "nfp_bpf_opt_ldst_gather".

The pass will traverse the IR to recognize the load/store pair sequences
that come from lowering of memory copy builtins.

The gathered memory copy information will be kept in the meta info
structure of the first load instruction in the sequence and will be
consumed by the optimization backend added in the previous patches.

NOTE: a sequence with cross memory access doesn't qualify for this
optimization, i.e. if one load in the sequence loads from a place that
has been written by a previous store. This is because when we turn the
sequence into a single CPP operation, we read all contents at once
into NFP transfer registers, then write them out as a whole. This is not
identical to what the original load/store sequence is doing.

Detecting cross memory access for two random pointers would be difficult;
fortunately, under XDP/eBPF's restricted runtime environment, copies
normally happen among map, packet data and stack, which do not overlap with
each other.

And for cases supported by NFP, cross memory access will only happen on
PTR_TO_PACKET. Fortunately for this, there is ID information with which we
can do an accurate memory alias check.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
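
The cross memory access condition in commit 6bc7103c comes down to an interval overlap test once both accesses are known to use the same base pointer (for PTR_TO_PACKET, the commit message says ID information establishes this). The sketch below shows only that overlap test; the function name and signature are hypothetical and are not taken from the pass itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the byte ranges [a_off, a_off + a_len) and
 * [b_off, b_off + b_len) overlap. Both offsets are assumed to be
 * relative to the same base pointer. 64-bit arithmetic is used for the
 * range ends so that the additions cannot overflow.
 */
static bool ranges_overlap(int32_t a_off, uint32_t a_len,
			   int32_t b_off, uint32_t b_len)
{
	int64_t a_end = (int64_t)a_off + a_len;
	int64_t b_end = (int64_t)b_off + b_len;

	return a_off < b_end && b_off < a_end;
}
```

If a later load range overlaps an earlier store range, folding the sequence into one read followed by one write would read stale data, so such a sequence would be left unoptimized.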

nfp: bpf: implement memory bulk copy for length bigger than 32-bytes · 8c900538

Committed by Jiong Wang on Nov 30, 2017

When the gathered copy length is bigger than 32-bytes and within 128-bytes
(the maximum length a single CPP Pull/Push request can finish), the
read/write strategy changes to:

  * Read.
      - use direct reference mode when length is within 32-bytes.
      - use indirect mode when length is bigger than 32-bytes.

  * Write.
      - length <= 8-bytes
        use write8 (direct_ref).
      - length <= 32-bytes and 4-bytes aligned
        use write32 (direct_ref).
      - length <= 32-bytes but not 4-bytes aligned
        use write8 (indirect_ref).
      - length > 32-bytes and 4-bytes aligned
        use write32 (indirect_ref).
      - length > 32-bytes and not 4-bytes aligned and <= 40-bytes
        use write32 (direct_ref) to finish the first 32-bytes.
        use write8 (direct_ref) to finish all remaining hanging part.
      - length > 32-bytes and not 4-bytes aligned
        use write32 (indirect_ref) to finish those 4-byte aligned parts.
        use write8 (direct_ref) to finish all remaining hanging part.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
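
The write strategy listed in commit 8c900538 reads naturally as a decision function. The sketch below encodes that list in C; the enum values and `pick_write` are hypothetical names, "4-bytes aligned" is interpreted as the length being a multiple of 4, and the input is assumed to be between 1 and 128 bytes.

```c
#include <stdbool.h>

/* Hypothetical labels for the write strategies named in the commit message. */
enum wr_strategy {
	WR8_DIRECT,			/* write8 (direct_ref) */
	WR32_DIRECT,			/* write32 (direct_ref) */
	WR8_INDIRECT,			/* write8 (indirect_ref) */
	WR32_INDIRECT,			/* write32 (indirect_ref) */
	WR32_DIRECT_THEN_WR8_DIRECT,	/* first 32 bytes, then the remainder */
	WR32_INDIRECT_THEN_WR8_DIRECT,	/* aligned part, then the remainder */
};

/* Choose a write strategy for a gathered copy of len bytes (1..128). */
static enum wr_strategy pick_write(unsigned int len)
{
	bool aligned = (len % 4) == 0;

	if (len <= 8)
		return WR8_DIRECT;
	if (len <= 32)
		return aligned ? WR32_DIRECT : WR8_INDIRECT;
	if (aligned)
		return WR32_INDIRECT;
	if (len <= 40)
		return WR32_DIRECT_THEN_WR8_DIRECT;
	return WR32_INDIRECT_THEN_WR8_DIRECT;
}
```

Under this sketch, `pick_write(37)` selects write32 in direct mode followed by write8 for the remaining bytes, while `pick_write(64)` selects write32 in indirect mode.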

nfp: bpf: implement memory bulk copy for length within 32-bytes · 9879a381

Committed by Jiong Wang on Nov 30, 2017

For NFP, we want to re-group a sequence of load/store pairs lowered from
memcpy/memmove into a single bulk memory operation, which can then be
accelerated using the NFP CPP bus.

This patch extends the existing load/store auxiliary information by adding
two new fields:

	struct bpf_insn *paired_st;
	s16 ldst_gather_len;

Both fields are supposed to be carried by the load instruction at the
head of the sequence. "paired_st" is the corresponding store instruction at
the head and "ldst_gather_len" is the gathered length.

If "ldst_gather_len" is negative, then the sequence is doing memory
load/store in descending order, otherwise it is in ascending order. We need
this information to detect overlapped memory access.

This patch then optimizes memory bulk copy when the copy length is within
32-bytes.

The strategy of read/write used is:

  * Read.
    Use read32 (direct_ref), always.

  * Write.
    - length <= 8-bytes
      write8 (direct_ref).
    - length <= 32-bytes and is 4-byte aligned
      write32 (direct_ref).
    - length <= 32-bytes but is not 4-byte aligned
      write8 (indirect_ref).

NOTE: the optimization should not change program semantics. The destination
register of the last load instruction should contain the same value before
and after this optimization.
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

nfp: bpf: encode indirect commands · 5468a8b9

Committed by Jakub Kicinski on Nov 30, 2017

Add support for emitting commands with field overwrites.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
