提交 · cbd357008604925355ae7b54a09137dabb81b580 · openeuler / raspberrypi-kernel

27 9月, 2014 7 次提交

bpf: verifier (add ability to receive verification log) · cbd35700

由 Alexei Starovoitov 提交于 9月 26, 2014

add optional attributes for BPF_PROG_LOAD syscall:
union bpf_attr {
    struct {
	...
	__u32         log_level; /* verbosity level of eBPF verifier */
	__u32         log_size;  /* size of user buffer */
	__aligned_u64 log_buf;   /* user supplied 'char *buffer' */
    };
};

when log_level > 0 the verifier will return its verification log in the user
supplied buffer 'log_buf' which can be used by program author to analyze why
verifier rejected given program.

'Understanding eBPF verifier messages' section of Documentation/networking/filter.txt
provides several examples of these messages, like the program:

  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_CALL_FUNC(BPF_FUNC_map_lookup_elem),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
  BPF_EXIT_INSN(),

will be rejected with the following multi-line message in log_buf:

  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 0
  4: (85) call 1
  5: (15) if r0 == 0x0 goto pc+1
   R0=map_ptr R10=fp
  6: (7a) *(u64 *)(r0 +4) = 0
  misaligned access off 4 size 8

The format of the output can change at any time as verifier evolves.
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cbd35700

bpf: verifier (add docs) · 51580e79

由 Alexei Starovoitov 提交于 9月 26, 2014

this patch adds all of eBPF verfier documentation and empty bpf_check()

The end goal for the verifier is to statically check safety of the program.

Verifier will catch:
- loops
- out of range jumps
- unreachable instructions
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention

More details in Documentation/networking/filter.txt
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

51580e79

bpf: handle pseudo BPF_CALL insn · 0a542a86

由 Alexei Starovoitov 提交于 9月 26, 2014

in native eBPF programs userspace is using pseudo BPF_CALL instructions
which encode one of 'enum bpf_func_id' inside insn->imm field.
Verifier checks that program using correct function arguments to given func_id.
If all checks passed, kernel needs to fixup BPF_CALL->imm fields by
replacing func_id with in-kernel function pointer.
eBPF interpreter just calls the function.

In-kernel eBPF users continue to use generic BPF_CALL.
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a542a86

bpf: expand BPF syscall with program load/unload · 09756af4

由 Alexei Starovoitov 提交于 9月 26, 2014

eBPF programs are similar to kernel modules. They are loaded by the user
process and automatically unloaded when process exits. Each eBPF program is
a safe run-to-completion set of instructions. eBPF verifier statically
determines that the program terminates and is safe to execute.

The following syscall wrapper can be used to load the program:
int bpf_prog_load(enum bpf_prog_type prog_type,
                  const struct bpf_insn *insns, int insn_cnt,
                  const char *license)
{
    union bpf_attr attr = {
        .prog_type = prog_type,
        .insns = ptr_to_u64(insns),
        .insn_cnt = insn_cnt,
        .license = ptr_to_u64(license),
    };

    return bpf(BPF_PROG_LOAD, &attr, sizeof(attr));
}
where 'insns' is an array of eBPF instructions and 'license' is a string
that must be GPL compatible to call helper functions marked gpl_only

Upon succesful load the syscall returns prog_fd.
Use close(prog_fd) to unload the program.

User space tests and examples follow in the later patches
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09756af4

bpf: add lookup/update/delete/iterate methods to BPF maps · db20fd2b

由 Alexei Starovoitov 提交于 9月 26, 2014

'maps' is a generic storage of different types for sharing data between kernel
and userspace.

The maps are accessed from user space via BPF syscall, which has commands:

- create a map with given type and attributes
  fd = bpf(BPF_MAP_CREATE, union bpf_attr *attr, u32 size)
  returns fd or negative error

- lookup key in a given map referenced by fd
  err = bpf(BPF_MAP_LOOKUP_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key, attr->value
  returns zero and stores found elem into value or negative error

- create or update key/value pair in a given map
  err = bpf(BPF_MAP_UPDATE_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key, attr->value
  returns zero or negative error

- find and delete element by key in a given map
  err = bpf(BPF_MAP_DELETE_ELEM, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key

- iterate map elements (based on input key return next_key)
  err = bpf(BPF_MAP_GET_NEXT_KEY, union bpf_attr *attr, u32 size)
  using attr->map_fd, attr->key, attr->next_key

- close(fd) deletes the map
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db20fd2b

bpf: enable bpf syscall on x64 and i386 · 749730ce

由 Alexei Starovoitov 提交于 9月 26, 2014

done as separate commit to ease conflict resolution
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

749730ce

bpf: introduce BPF syscall and maps · 99c55f7d

由 Alexei Starovoitov 提交于 9月 26, 2014

BPF syscall is a multiplexor for a range of different operations on eBPF.
This patch introduces syscall with single command to create a map.
Next patch adds commands to access maps.

'maps' is a generic storage of different types for sharing data between kernel
and userspace.

Userspace example:
/* this syscall wrapper creates a map with given type and attributes
 * and returns map_fd on success.
 * use close(map_fd) to delete the map
 */
int bpf_create_map(enum bpf_map_type map_type, int key_size,
                   int value_size, int max_entries)
{
    union bpf_attr attr = {
        .map_type = map_type,
        .key_size = key_size,
        .value_size = value_size,
        .max_entries = max_entries
    };

    return bpf(BPF_MAP_CREATE, &attr, sizeof(attr));
}

'union bpf_attr' is backwards compatible with future extensions.

More details in Documentation/networking/filter.txt and in manpage
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99c55f7d

26 9月, 2014 8 次提交

net: sched: use pinned timers · 4a8e320c

由 Eric Dumazet 提交于 9月 20, 2014

While using a MQ + NETEM setup, I had confirmation that the default
timer migration ( /proc/sys/kernel/timer_migration ) is killing us.

Installing this on a receiver side of a TCP_STREAM test, (NIC has 8 TX
queues) :

EST="est 1sec 4sec"
for ETH in eth1
do
 tc qd del dev $ETH root 2>/dev/null
 tc qd add dev $ETH root handle 1: mq
 tc qd add dev $ETH parent 1:1 $EST netem limit 70000 delay 6ms
 tc qd add dev $ETH parent 1:2 $EST netem limit 70000 delay 8ms
 tc qd add dev $ETH parent 1:3 $EST netem limit 70000 delay 10ms
 tc qd add dev $ETH parent 1:4 $EST netem limit 70000 delay 12ms
 tc qd add dev $ETH parent 1:5 $EST netem limit 70000 delay 14ms
 tc qd add dev $ETH parent 1:6 $EST netem limit 70000 delay 16ms
 tc qd add dev $ETH parent 1:7 $EST netem limit 80000 delay 18ms
 tc qd add dev $ETH parent 1:8 $EST netem limit 90000 delay 20ms
done

We can see that timers get migrated into a single cpu, presumably idle
at the time timers are set up.
Then all qdisc dequeues run from this cpu and huge lock contention
happens. This single cpu is stuck in softirq mode and cannot dequeue
fast enough.

    39.24%  [kernel]          [k] _raw_spin_lock
     2.65%  [kernel]          [k] netem_enqueue
     1.80%  [kernel]          [k] netem_dequeue
     1.63%  [kernel]          [k] copy_user_enhanced_fast_string
     1.45%  [kernel]          [k] _raw_spin_lock_bh

By pinning qdisc timers on the cpu running the qdisc, we respect proper
XPS setting and remove this lock contention.

     5.84%  [kernel]          [k] netem_enqueue
     4.83%  [kernel]          [k] _raw_spin_lock
     2.92%  [kernel]          [k] copy_user_enhanced_fast_string

Current Qdiscs that benefit from this change are :

	netem, cbq, fq, hfsc, tbf, htb.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a8e320c

Merge branch 'gso_send_check' · 9fb426a6

由 David S. Miller 提交于 9月 26, 2014

Tom Herbert says:

====================
net: Eliminate gso_send_check

gso_send_check presents a lot of complexity for what it is being used
for. It seems that there are only two cases where it might be effective:
TCP and UFO paths. In these cases, the gso_send_check function
initializes the TCP or UDP checksum respectively to the pseudo header
checksum so that the checksum computation is appropriately offloaded or
computed in the gso_segment functions. The gso_send_check functions
are only called from dev.c in skb_mac_gso_segment when ip_summed !=
CHECKSUM_PARTIAL (which seems very unlikely in TCP case). We can move
the logic of this into the respective gso_segment functions where the
checksum is initialized if ip_summed != CHECKSUM_PARTIAL.

With the above cases handled, gso_send_check is no longer needed, so
we can remove all uses of it and the fields in the offload callbacks.
With this change, ip_summed in the skb should be preserved though all
the layers of gso_segment calls.

In follow-on patches, we may be able to remove the check setup code in
tcp_gso_segment if we can guarantee that ip_summed will always be
CHECKSUM_PARTIAL (verify all paths and probably add an assert in
tcp_gro_segment).

Tested these patches by:
  - netperf TCP_STREAM test with GSO enabled
  - Forced ip_summed != CHECKSUM_PARTIAL with above
  - Ran UDP_RR with 10000 request size over GRE tunnel. This exercised
    UFO path.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9fb426a6

net: Remove gso_send_check as an offload callback · 53e50398

由 Tom Herbert 提交于 9月 20, 2014

The send_check logic was only interesting in cases of TCP offload and
UDP UFO where the checksum needed to be initialized to the pseudo
header checksum. Now we've moved that logic into the related
gso_segment functions so gso_send_check is no longer needed.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

53e50398

udp: move logic out of udp[46]_ufo_send_check · f71470b3

由 Tom Herbert 提交于 9月 20, 2014

In udp[46]_ufo_send_check the UDP checksum initialized to the pseudo
header checksum. We can move this logic into udp[46]_ufo_fragment.
After this change udp[64]_ufo_send_check is a no-op.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f71470b3

tcp: move logic out of tcp_v[64]_gso_send_check · d020f8f7

由 Tom Herbert 提交于 9月 20, 2014

In tcp_v[46]_gso_send_check the TCP checksum is initialized to the
pseudo header checksum using __tcp_v[46]_send_check. We can move this
logic into new tcp[46]_gso_segment functions to be done when
ip_summed != CHECKSUM_PARTIAL (ip_summed == CHECKSUM_PARTIAL should be
the common case, possibly always true when taking GSO path). After this
change tcp_v[46]_gso_send_check is no-op.
Signed-off-by: NTom Herbert <therbert@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d020f8f7

Merge branch 'stmmac' · 2fdbfea5

由 David S. Miller 提交于 9月 26, 2014

Beniamino Galvani says:

====================
net: stmmac glue layer for Amlogic Meson SoCs

the Ethernet controller available in Amlogic Meson6 and Meson8 SoCs is
a Synopsys DesignWare MAC IP core, already supported by the stmmac
driver.

These patches add a glue layer to the driver for the platform-specific
settings required by the Amlogic variant.

This has been tested on a Amlogic S802 device with the initial Meson
support submitted by Carlo Caione [1].

[1] http://lwn.net/Articles/612000/
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2fdbfea5

net: stmmac: meson: document device tree bindings · 318fd490

由 Beniamino Galvani 提交于 9月 20, 2014

Add the device tree bindings documentation for the Amlogic Meson
variant of the Synopsys DesignWare MAC.
Signed-off-by: NBeniamino Galvani <b.galvani@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

318fd490

net: stmmac: add Amlogic Meson glue layer · 0ad5adcd

由 Beniamino Galvani 提交于 9月 20, 2014

The Ethernet controller available in Meson6 and Meson8 SoCs is a
Synopsys DesignWare MAC IP core, already supported by the stmmac
driver.

This glue layer implements some platform-specific settings needed by
the Amlogic variant.
Signed-off-by: NBeniamino Galvani <b.galvani@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ad5adcd

25 9月, 2014 17 次提交

D

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4daaab4f
由 David S. Miller 提交于 9月 24, 2014

4daaab4f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · b94d525e

由 Linus Torvalds 提交于 9月 24, 2014

Pull networking fixes from David Miller:
 "Here is a quick pull request primarily meant to address the deconfig
  fallout from changing SCSI_NETLINK from being used via 'select' to
  being used via 'depends'.

  I applied a set of 5 patches written by Michal Marek, and then I
  carefully audited all of the remaining config files, basically:

   1) I scanned every arch config file, and if it mentioned CONFIG_INET
      or CONFIG_UNIX, I made sure it had CONFIG_NET=y

   2) After that, I scanned every arch config file, and if it did not
      have CONFIG_NET=y I made sure it did not reference any networking
      config options.

  Finally, we have some late breaking wireless fixes in here from John
  Linville and co"

[ And there's a sparc bpf fix snuck in too ]

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  sparc: bpf_jit: fix loads from negative offsets
  parisc: Update defconfigs which were missing CONFIG_NET.
  powerpc: Update defconfigs which were missing CONFIG_NET.
  s390: Update defconfigs which were missing CONFIG_NET.
  mips: Update some more defconfigs which were missing CONFIG_NET.
  sparc: Set CONFIG_NET=y in defconfigs
  sh: Set CONFIG_NET=y in defconfigs
  powerpc: Set CONFIG_NET=y in defconfigs
  parisc: Set CONFIG_NET=y in defconfigs
  mips: Set CONFIG_NET=y in defconfigs
  brcmfmac: Fix off by one bug in brcmf_count_20mhz_channels()
  ath9k: Fix NULL pointer dereference on early irq
  net: rfkill: gpio: Fix clock status
  NFC: st21nfca: Fix potential depmod dependency cycle
  NFC: st21nfcb: Fix depmod dependency cycle
  NFC: microread: Potential overflows in microread_target_discovered()

b94d525e

sparc: bpf_jit: fix loads from negative offsets · 35607b02

由 Alexei Starovoitov 提交于 9月 23, 2014

- fix BPF_LD|ABS|IND from negative offsets:
  make sure to sign extend lower 32 bits in 64-bit register
  before calling C helpers from JITed code, otherwise 'int k'
  argument of bpf_internal_load_pointer_neg_helper() function
  will be added as large unsigned integer, causing packet size
  check to trigger and abort the program.

  It's worth noting that JITed code for 'A = A op K' will affect
  upper 32 bits differently depending whether K is simm13 or not.
  Since small constants are sign extended, whereas large constants
  are stored in temp register and zero extended.
  That is ok and we don't have to pay a penalty of sign extension
  for every sethi, since all classic BPF instructions have 32-bit
  semantics and we only need to set correct upper bits when
  transitioning from JITed code into C.

- though instructions 'A &= 0' and 'A *= 0' are odd, JIT compiler
  should not optimize them out
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

35607b02

Merge tag 'master-2014-09-23' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 543a2dff

由 David S. Miller 提交于 9月 24, 2014

John W. Linville says:

====================
pull request: wireless 2014-09-23

Please consider pulling this one last batch of fixes intended for the 3.17 stream!

For the NFC bits, Samuel says:

"Hopefully not too late for a handful of NFC fixes:

- 2 potential build failures for ST21NFCA and ST21NFCB, triggered by a
  depmod dependenyc cycle.
- One potential buffer overflow in the microread driver."

On top of that...

Emil Goode provides a fix for a brcmfmac off-by-one regression which
was introduced in the 3.17 cycle.

Loic Poulain fixes a polarity mismatch for a variable assignment
inside of rfkill-gpio.

Wojciech Dubowik prevents a NULL pointer dereference in ath9k.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

543a2dff

parisc: Update defconfigs which were missing CONFIG_NET. · c899c3f3

由 David S. Miller 提交于 9月 24, 2014

Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.")  removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c899c3f3

powerpc: Update defconfigs which were missing CONFIG_NET. · 95d77997

由 David S. Miller 提交于 9月 24, 2014

Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.")  removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95d77997

s390: Update defconfigs which were missing CONFIG_NET. · ff408ba1

由 David S. Miller 提交于 9月 24, 2014

Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.")  removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ff408ba1

mips: Update some more defconfigs which were missing CONFIG_NET. · af4de1b5

由 David S. Miller 提交于 9月 24, 2014

Commit df568d8e ("scsi: Use 'depends' with LIBFC instead of
'select'.")  removed what happened to be the only instance of 'select
NET'. Defconfigs that were relying on the select now lack networking
support.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af4de1b5

sparc: Set CONFIG_NET=y in defconfigs · 1ab0b8b2