- 05 11月, 2017 1 次提交
-
-
由 Jakub Kicinski 提交于
Only support BPF_PROG_TYPE_SCHED_CLS programs in direct action mode. This simplifies preparing the offload since there will now be only one mode of operation for that type of program. We need to know the attachment mode type of cls_bpf programs, because exit codes are interpreted differently for legacy vs DA mode. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 11月, 2017 5 次提交
-
-
由 Jakub Kicinski 提交于
If kernel config does not include BPF just replace the BPF app handler with the handler for basic NIC. The BPF app will now be built only if BPF infrastructure is selected in kernel config. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Recent TC changes dropped the check protecting us from trying to offload a TC program if XDP programs are already loaded. Fixes: 90d97315 ("nfp: bpf: Convert ndo_setup_tc offloads to block callbacks") Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiong Wang 提交于
This patch supports BPF_NEG under both BPF_ALU64 and BPF_ALU. LLVM recently starts to generate it. NOTE: BPF_NEG takes single operand which is an register and serve as both input and output. Signed-off-by: NJiong Wang <jiong.wang@netronome.com> Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiong Wang 提交于
The current ALU_OP_NEG is Op encoding 0x4 for NPF ALU instruction. It is actually performing "~B" operation which is bitwise NOT. The using naming ALU_OP_NEG is misleading as NEG is -B which is not the same as ~B. Signed-off-by: NJiong Wang <jiong.wang@netronome.com> Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
This restores the original behaviour before the block callbacks were introduced. Allow the drivers to do binding of block always, no matter if the NETIF_F_HW_TC feature is on or off. Move the check to the block callback which is called for rule insertion. Reported-by: NAlexander Duyck <alexander.duyck@gmail.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 11月, 2017 1 次提交
-
-
由 Alexei Starovoitov 提交于
the verifier got progressively smarter over time and size of its internal state grew as well. Time to reduce the memory consumption. Before: sizeof(struct bpf_verifier_state) = 6520 After: sizeof(struct bpf_verifier_state) = 896 It's done by observing that majority of BPF programs use little to no stack whereas verifier kept all of 512 stack slots ready always. Instead dynamically reallocate struct verifier state when stack access is detected. Runtime difference before vs after is within a noise. The number of processed instructions stays the same. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 10月, 2017 1 次提交
-
-
由 Kees Cook 提交于
In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Jakub Kicinski <jakub.kicinski@netronome.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jiri Pirko <jiri@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Simon Horman <simon.horman@netronome.com> Cc: oss-drivers@netronome.com Cc: netdev@vger.kernel.org Signed-off-by: NKees Cook <keescook@chromium.org> Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 10月, 2017 9 次提交
-
-
由 Jakub Kicinski 提交于
Loading 64bit constants require up to 4 load immediates, since we can only load 16 bits at a time. If the 32bit halves of the 64bit constant are the same, however, we can save a cycle by doing a register move instead of two loads of 16 bits. Note that we don't optimize the normal ALU64 load because even though it's a 64 bit load the upper half of the register is a coming from sign extension so we can load it in one cycle anyway. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
If stack pointer has a different value on different paths but the alignment to words (4B) remains the same, we can set a new LMEM access pointer to the calculated value and access whichever word it's pointing to. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
To access beyond 64th byte of the stack we need to set a new stack pointer register (LMEM is accessed indirectly through those pointers). Add a function for encoding local CSR access instruction. Use stack pointer number 3. Note that stack pointer registers allow us to index into 32 bytes of LMEM (with shift operations i.e. when operands are restricted). This means if access is crossing 32 byte boundary we must not use offsetting, we have to set the pointer to the exact address and move it with post-increments. We depend on the datapath placing the stack base address in GPR A22 for our use. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
As long as the verifier tells us the stack offset exactly we can render the LMEM reads quite easily. Simply make sure that the offset is constant for a given instruction and add it to the instruction's offset. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
When we are performing unaligned stack accesses in the 32-64B window we have to do a read-modify-write cycle. E.g. for reading 8 bytes from address 17: 0: tmp = stack[16] 1: gprLo = tmp >> 8 2: tmp = stack[20] 3: gprLo |= tmp << 24 4: tmp = stack[20] 5: gprHi = tmp >> 8 6: tmp = stack[24] 7: gprHi |= tmp << 24 The load on line 4 is unnecessary, because tmp already contains data from stack[20]. For write we can optimize both loads and writebacks away. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Add simple stack read support, similar to write in every aspect, but data flowing the other way. Note that unlike write which can be done in smaller than word quantities, if registers are loaded with less-than-word of stack contents - the values have to be zero extended. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Stack is implemented by the LMEM register file. Unaligned accesses to LMEM are not allowed. Accesses also have to be 4B wide. To support stack we need to make sure offsets of pointers are known at translation time (for now) and perform correct load/mask/shift operations. Since we can access first 64B of LMEM without much effort support only stacks not bigger than 64B. Following commits will extend the possible sizes beyond that. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
nfp_bpf_check_ptr() mostly looks at the pointer register. Add a temporary variable to shorten the code. While at it make sure we print error messages if translation fails to help users identify the problem (to be carried in ext_ack in due course). Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
The need to emitting a few nops will become more common soon as we add stack and map support. Add a helper. This allows for code to be shorter but also may be handy for marking the nops with a "reason" to ease applying optimizations. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 10月, 2017 2 次提交
-
-
由 Jiri Pirko 提交于
All drivers are converted to use block callbacks for TC_SETUP_CLS*. So it is now safe to remove the calls to ndo_setup_tc from cls_* Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Benefit from the newly introduced block callback infrastructure and convert ndo_setup_tc calls for bpf offloads to block callbacks. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 10月, 2017 11 次提交
-
-
由 Jakub Kicinski 提交于
Add support for direct packet access in TC, note that because writing the packet will cause the verifier to generate a csum fixup prologue we won't be able to offload packet writes from TC, just yet, only the reads will work. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
This patch adds ability to write packet contents using pre-validated packet pointers (direct packet access). Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
In direct packet access bound checks are already done, we can simply dereference the packet pointer. Verifier/parser logic needs to record pointer type. Note that although verifier does protect us from CTX vs other pointer changes we will also want to differentiate between PACKET vs MAP_VALUE or STACK, so we can add the check already. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Move data load into a separate function and separate it from packet length checks of legacy I/O. This makes the code more readable and easier to reuse. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Sizes of fields in struct xdp_md/xdp_buff and some in sk_buff depend on target architecture. Take that into account and use struct xdp_buff, not struct xdp_md. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
eBPF is host-endian specific. Translating both BE and LE eBPF to the NFP is feasible, but would require quite a bit of indirection. The fact that I don't have access to any BE hosts that would fit a 25G/40G/100G NIC is also limiting my ability to test big endian. For now restrict the offload to little endian hosts only. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Implement byte swaps with rotations, shifts and byte loads. Remember to clear upper parts of the 64 bit registers. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Register move operation is encoded as alu no op. This means that one has to specify number of unused/none parameters to the emit_alu(). Add a helper. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Now that we have BPF assemebler support in LLVM 6 we can easily test all compare instructions (LLVM 4 didn't generate most of them from C). Fix the compare to immediates and refactor the order of compare to regs to make sure they both follow the same pattern. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
We optimize comparisons to immediate 0 as if (reg.lo | reg.hi). The early return statement was missing, however, which means we would generate two comparisons - optimized one followed by a normal 2x 32 bit compare. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
ld_field instruction has the following format in NFP assembler: ld_field[dst, 1000, src, <<24] reoder parameters to emit_ld_field_any() to make it closer to the familiar assembler order. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 10月, 2017 10 次提交
-
-
由 Jakub Kicinski 提交于
ld_field instruction is a bit special because the encoding uses two source registers and one of them becomes the output. We do need to pass the dst register to our encoding helpers though, otherwise the "write both banks" flag will not be observed. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Device expects the instructions in little endian. Make sure we byte swap on big endian hosts. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
We need to append up to 8 nops after last instruction to make sure the CPU will not fetch garbage instructions with invalid ECC if the code store was not initialized. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
In the initial PoC firmware I simply disabled ECC on the instruction store. Do the ECC calculation for generated instructions in the driver. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Datapath ABI version 2 stores the packet information in LMEM instead of NNRs. We also have strict restrictions on which GPRs we can use. Only GPRs 0-23 are reserved for BPF. Adjust the static register locations and "ABI" registers. Note that packet length is packed with other info so we have to extract it into one of the scratch registers, OTOH since LMEM can be used in restricted operands we don't have to extract packet pointer. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Most instructions have special fields which allow switching between base and extended Local Memory pointers. Introduce those to register encoding, we will use the extra LM pointers to access high addresses of the stack. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Temporarily drop support for skb->mark. We are primarily focusing on XDP offload, and implementing skb->mark on the new datapath has lower priority. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Remove the register renumbering optimization. To implement calling map and other helpers we need more strict register layout. We can't freely reassign register numbers. This will have the effect of running in 4 context/thread mode, which should be OK since we are moving towards integrating the BPF closer with FW app datapath anyway, and the target datapath itself runs in 4 context mode. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Add encodings of all 64bit shift operations. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Move the software reg helpers and some static data to nfp_asm.c. They are related to the previous patch, but move is done in a separate commit for ease of review. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NSimon Horman <simon.horman@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-