提交 · 44fafdaa757cf251aade6c071f772ddb4e8a9885 · openanolis / cloud-kernel

21 7月, 2016 8 次提交

net/mlx5: Use PTR_ERR_OR_ZERO() to simplify the code · 44fafdaa

由 Wei Yongjun 提交于 7月 19, 2016

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR.

Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: NLeon Romanovsky <leonro@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44fafdaa

net: ethernet: nb8800: fix error handling of nb8800_probe() · 9a7bae8a

由 Wei Yongjun 提交于 7月 19, 2016

In ops->reset() error handling case, clk_disable_unprepare() is missed
before return from this function.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: NMans Rullgard <mans@mansr.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a7bae8a

wan/fsl_ucc_hdlc: use module_platform_driver to simplify the code · 459421cc

由 Wei Yongjun 提交于 7月 19, 2016

module_platform_driver() makes the code simpler by eliminating
boilerplate code.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

459421cc

wan/fsl_ucc_hdlc: remove .owner field for driver · 9d5658e6

由 Wei Yongjun 提交于 7月 19, 2016

Remove .owner field if calls are used which set it automatically.

Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d5658e6

net: axienet: Fix return value check in axienet_probe() · 3ad7b147

由 Wei Yongjun 提交于 7月 19, 2016

In case of error, the function of_parse_phandle() returns NULL
pointer not ERR_PTR(). The IS_ERR() test in the return value
check should be replaced with NULL test.

Fixes: 46aa27df ('net: axienet: Use devm_* calls')
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ad7b147

Merge branch 'for-upstream' of... · 4599f772

由 David S. Miller 提交于 7月 20, 2016

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2016-07-19

Here's likely the last bluetooth-next pull request for the 4.8 kernel:

 - Fix for L2CAP setsockopt
 - Fix for is_suspending flag handling in btmrvl driver
 - Addition of Bluetooth HW & FW info fields to debugfs
 - Fix to use int instead of char for callback status.

The last one (from Geert Uytterhoeven) is actually not purely a
Bluetooth (or 802.15.4) patch, but it was agreed with other maintainers
that we take it through the bluetooth-next tree.

Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4599f772

bpf, elf: add official ELF machine define for eBPF · b02b94b3

由 Daniel Borkmann 提交于 7月 20, 2016

Add the official BPF ELF e_machine value that was assigned recently [1,2]
and will be propagated to glibc, et al. LLVM is switching to it in 3.9
release.

  [1] https://github.com/llvm-mirror/llvm/commit/36b9c09330bfb5e771914cfe307588f30d5510d2
  [2] http://lists.iovisor.org/pipermail/iovisor-dev/2016-June/000266.htmlSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b02b94b3

bpf: fix implicit declaration of bpf_prog_add · cc2e0b3f

由 Brenden Blanco 提交于 7月 20, 2016

For the ifndef case of CONFIG_BPF_SYSCALL, an inline version of
bpf_prog_add needs to exist otherwise the build breaks on some configs.

 drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2544:10: error: implicit declaration of function 'bpf_prog_add'
       prog = bpf_prog_add(prog, priv->rx_ring_num - 1);

The function is introduced in
59d3656d ("bpf: add bpf_prog_add api for bulk prog refcnt")
and first used in
47f1afdba2b87 ("net/mlx4_en: add support for fast rx drop bpf program").

Fixes: 47f1afdba2b87 ("net/mlx4_en: add support for fast rx drop bpf program")
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Reported-by: NTariq Toukan <ttoukan.linux@gmail.com>
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc2e0b3f

20 7月, 2016 32 次提交

Merge branch 'xdp' · 22b35488

由 David S. Miller 提交于 7月 19, 2016

Brenden Blanco says:

====================
Add driver bpf hook for early packet drop and forwarding

This patch set introduces new infrastructure for programmatically
processing packets in the earliest stages of rx, as part of an effort
others are calling eXpress Data Path (XDP) [1]. Start this effort by
introducing a new bpf program type for early packet filtering, before
even an skb has been allocated.

Extend on this with the ability to modify packet data and send back out
on the same port.

Patch 1 adds an API for bulk bpf prog refcnt incrememnt.
Patch 2 introduces the new prog type and helpers for validating the bpf
  program. A new userspace struct is defined containing only data and
  data_end as fields, with others to follow in the future.
In patch 3, create a new ndo to pass the fd to supported drivers.
In patch 4, expose a new rtnl option to userspace.
In patch 5, enable support in mlx4 driver.
In patch 6, create a sample drop and count program. With single core,
  achieved ~20 Mpps drop rate on a 40G ConnectX3-Pro. This includes
  packet data access, bpf array lookup, and increment.
In patch 7, add a page recycle facility to mlx4 rx, enabled when xdp is
  active.
In patch 8, add the XDP_TX type to bpf.h
In patch 9, add helper in tx patch for writing tx_desc
In patch 10, add support in mlx4 for packet data write and forwarding
In patch 11, turn on packet write support in the bpf verifier
In patch 12, add a sample program for packet write and forwarding. With
  single core, achieved ~10 Mpps rewrite and forwarding.

[1] https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf

v10:
 1/12: Add bulk refcnt api.
 5/12: Move prog from priv to ring. This attribute is still only set
   globally, but the path to finer granularity should be clear. No lock
   is taken, so some rings may operate on older programs for a time (one
   napi loop). Looked into options such as napi_synchronize, but they
   were deemed too slow (calls to msleep).
   Rename prog to xdp_prog. Add xdp_ring_num to help with accounting,
   used more heavily in later patches.
 7/12: Adjust to use per-ring xdp prog. Use priv->xdp_ring_num where
   before priv->prog was used to determine buffer allocations.
 9/12: Add cpu_to_be16 to vlan_tag in mxl4_en_xmit(). Remove unused variable
   from mlx4_en_xmit and unused params from build_inline_wqe.

v9:
 4/11: Add missing newline in en_err message.
 6/11: Move page_cache cleanup from mlx4_en_destroy_rx_ring to
   mlx4_en_deactivate_rx_ring. Move mlx4_en_moderation_update back to
   static. Remove calls to mlx4_en_alloc/free_resources in mlx4_xdp_set.
   Adopt instead the approach of mlx4_en_change_mtu to use a watchdog.
 9/11: Use a per-ring function pointer in tx to separate out the code
   for regular and recycle paths of tx completion handling. Add a helper
   function to init the recycle ring and callback, called just after
   activating tx. Remove extra tx ring resource requirement, and instead
   steal from the upper rings. This helps to avoid needing
   mlx4_en_alloc_resources. Add some hopefully meaningful error
   messages for the various error cases. Reverted some of the
   hard-to-follow logic that was accounting for the extra tx rings.

v8:
 1/11: Reduce WARN_ONCE to single line. Also, change act param of that
   function to u32 to match return type of bpf_prog_run_xdp.
 2/11: Clarify locking semantics in ndo comment.
 4/11: Add en_err warning in mlx4_xdp_set on num_frags/mtu violation.

v7:
 Addressing two of the major discussion points: return codes and ndo.
 The rest will be taken as todo items for separate patches.

 Add an XDP_ABORTED type, which explicitly falls through to DROP. The
 same result must be taken for the default case as well, as it is now
 well-defined API behavior.

 Merge ndo_xdp_* into a single ndo. The style is similar to
 ndo_setup_tc, but with less unidirectional naming convention. The IFLA
 parameter names are unchanged.

 TODOs:
 Add ethtool per-ring stats for aborted, default cases, maybe even drop
 and tx as well.
 Avoid duplicate dma sync operation in XDP_PASS case as mentioned by
 Saeed.

  1/12: Add XDP_ABORTED enum, reword API comment, and update commit
   message.
  2/12: Rewrite ndo_xdp_*() into single ndo_xdp() with type/union style
    calling convention.
  3/12: Switch to ndo_xdp callback.
  4/12: Add XDP_ABORTED case as a fall-through to XDP_DROP. Implement
    ndo_xdp.
 12/12: Dropped, this will need some more work.

v6:
  2/12: drop unnecessary netif_device_present check
  4/12, 6/12, 9/12: Reorder default case statement above drop case to
    remove some copy/paste.

v5:
  0/12: Rebase and remove previous 1/13 patch
  1/12: Fix nits from Daniel. Left the (void *) cast as-is, to be fixed
    in future. Add bpf_warn_invalid_xdp_action() helper, to be used when
    out of bounds action is returned by the program. Add a comment to
    bpf.h denoting the undefined nature of out of bounds returns.
  2/12: Switch to using bpf_prog_get_type(). Rename ndo_xdp_get() to
    ndo_xdp_attached().
  3/12: Add IFLA_XDP as a nested type, and add the associated nla_policy
    for the new subtypes IFLA_XDP_FD and IFLA_XDP_ATTACHED.
  4/12: Fixup the use of READ_ONCE in the ndos. Add a user of
    bpf_warn_invalid_xdp_action helper.
  5/12: Adjust to using the nested netlink options.
  6/12: kbuild was complaining about overflow of u16 on tile
    architecture...bump frag_stride to u32. The page_offset member that
    is computed from this was already u32.

v4:
  2/12: Add inline helper for calling xdp bpf prog under rcu
  3/12: Add detail to ndo comments
  5/12: Remove mlx4_call_xdp and use inline helper instead.
  6/12: Fix checkpatch complaints
  9/12: Introduce new patch 9/12 with common helper for tx_desc write
    Refactor to use common tx_desc write helper
 11/12: Fix checkpatch complaints

v3:
  Rewrite from v2 trying to incorporate feedback from multiple sources.
  Specifically, add ability to forward packets out the same port and
    allow packet modification.
  For packet forwarding, the driver reserves a dedicated set of tx rings
    for exclusive use by xdp. Upon completion, the pages on this ring are
    recycled directly back to a small per-rx-ring page cache without
    being dma unmapped.
  Use of the percpu skb is dropped in favor of a lightweight struct
    xdp_buff. The direct packet access feature is leveraged to remove
    dependence on the skb.
  The mlx4 driver implementation allocates a page-per-packet and maps it
    in PCI_DMA_BIDIRECTIONAL mode when the bpf program is activated.
  Naming is converted to use "xdp" instead of "phys_dev".

v2:
  1/5: Drop xdp from types, instead consistently use bpf_phys_dev_
    Introduce enum for return values from phys_dev hook
  2/5: Move prog->type check to just before invoking ndo
    Change ndo to take a bpf_prog * instead of fd
    Add ndo_bpf_get rather than keeping a bool in the netdev struct
  3/5: Use ndo_bpf_get to fetch bool
  4/5: Enforce that only 1 frag is ever given to bpf prog by disallowing
    mtu to increase beyond FRAG_SZ0 when bpf prog is running, or conversely
    to set a bpf prog when priv->num_frags > 1
    Rename pseudo_skb to bpf_phys_dev_md
    Implement ndo_bpf_get
    Add dma sync just before invoking prog
    Check for explicit bpf return code rather than nonzero
    Remove increment of rx_dropped
  5/5: Use explicit bpf return code in example
    Update commit log with higher pps numbers
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22b35488

bpf: add sample for xdp forwarding and rewrite · 764cbcce

由 Brenden Blanco 提交于 7月 19, 2016

Add a sample that rewrites and forwards packets out on the same
interface. Observed single core forwarding performance of ~10Mpps.

Since the mlx4 driver under test recycles every single packet page, the
perf output shows almost exclusively just the ring management and bpf
program work. Slowdowns are likely occurring due to cache misses.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

764cbcce

bpf: enable direct packet data write for xdp progs · 4acf6c0b

由 Brenden Blanco 提交于 7月 19, 2016

For forwarding to be effective, XDP programs should be allowed to
rewrite packet data.

This requires that the drivers supporting XDP must all map the packet
memory as TODEVICE or BIDIRECTIONAL before invoking the program.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4acf6c0b

net/mlx4_en: add xdp forwarding and data write support · 9ecc2d86

由 Brenden Blanco 提交于 7月 19, 2016

A user will now be able to loop packets back out of the same port using
a bpf program attached to xdp hook. Updates to the packet contents from
the bpf program is also supported.

For the packet write feature to work, the rx buffers are now mapped as
bidirectional when the page is allocated. This occurs only when the xdp
hook is active.

When the program returns a TX action, enqueue the packet directly to a
dedicated tx ring, so as to avoid completely any locking. This requires
the tx ring to be allocated 1:1 for each rx ring, as well as the tx
completion running in the same softirq.

Upon tx completion, this dedicated tx ring recycles pages without
unmapping directly back to the original rx ring. In steady state tx/drop
workload, effectively 0 page allocs/frees will occur.

In order to separate out the paths between free and recycle, a
free_tx_desc func pointer is introduced that is optionally updated
whenever recycle_ring is activated. By default the original free
function is always initialized.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ecc2d86

net/mlx4_en: break out tx_desc write into separate function · 224e92e0

由 Brenden Blanco 提交于 7月 19, 2016

In preparation for writing the tx descriptor from multiple functions,
create a helper for both normal and blueflame access.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

224e92e0

bpf: add XDP_TX xdp_action for direct forwarding · 6ce96ca3

由 Brenden Blanco 提交于 7月 19, 2016

XDP enabled drivers must transmit received packets back out on the same
port they were received on when a program returns this action.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6ce96ca3

net/mlx4_en: add page recycle to prepare rx ring for tx support · d576acf0

由 Brenden Blanco 提交于 7月 19, 2016

The mlx4 driver by default allocates order-3 pages for the ring to
consume in multiple fragments. When the device has an xdp program, this
behavior will prevent tx actions since the page must be re-mapped in
TODEVICE mode, which cannot be done if the page is still shared.

Start by making the allocator configurable based on whether xdp is
running, such that order-0 pages are always used and never shared.

Since this will stress the page allocator, add a simple page cache to
each rx ring. Pages in the cache are left dma-mapped, and in drop-only
stress tests the page allocator is eliminated from the perf report.

Note that setting an xdp program will now require the rings to be
reconfigured.

Before:
 26.91%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 17.88%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  6.00%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  3.21%  swapper      [kernel.vmlinux]  [k] intel_idle
  2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.57%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq

After:
 31.72%  swapper      [kernel.vmlinux]       [k] intel_idle
  8.79%  swapper      [mlx4_en]              [k] mlx4_en_process_rx_cq
  7.54%  swapper      [kernel.vmlinux]       [k] poll_idle
  6.36%  swapper      [mlx4_core]            [k] mlx4_eq_int
  4.21%  swapper      [kernel.vmlinux]       [k] tasklet_action
  4.03%  swapper      [kernel.vmlinux]       [k] cpuidle_enter_state
  3.43%  swapper      [mlx4_en]              [k] mlx4_en_prepare_rx_desc
  2.18%  swapper      [kernel.vmlinux]       [k] native_irq_return_iret
  1.37%  swapper      [kernel.vmlinux]       [k] menu_select
  1.09%  swapper      [kernel.vmlinux]       [k] bpf_map_lookup_elem
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d576acf0

Add sample for adding simple drop program to link · 86af8b41

由 Brenden Blanco 提交于 7月 19, 2016

Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
hook of a link. With the drop-only program, observed single core rate is
~20Mpps.

Other tests were run, for instance without the dropcnt increment or
without reading from the packet header, the packet rate was mostly
unchanged.

$ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex)
proto 17:   20403027 drops/s

./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
Running... ctrl^C to stop
Device: eth4@0
Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags)
  5056638pps 2427Mb/sec (2427186240bps) errors: 0
Device: eth4@1
Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags)
  5133311pps 2463Mb/sec (2463989280bps) errors: 0
Device: eth4@2
Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags)
  5077431pps 2437Mb/sec (2437166880bps) errors: 0
Device: eth4@3
Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags)
  5043067pps 2420Mb/sec (2420672160bps) errors: 0

perf report --no-children:
 26.05%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 17.84%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  5.52%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  4.90%  swapper      [kernel.vmlinux]  [k] poll_idle
  4.14%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  2.78%  ksoftirqd/0  [kernel.vmlinux]  [k] __free_pages_ok
  2.57%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.51%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq
  1.94%  ksoftirqd/0  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
  1.45%  swapper      [mlx4_en]         [k] mlx4_en_alloc_frags
  1.35%  ksoftirqd/0  [kernel.vmlinux]  [k] free_one_page
  1.33%  swapper      [kernel.vmlinux]  [k] intel_idle
  1.04%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5c5
  0.96%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c58d
  0.93%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6ee
  0.92%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6b9
  0.89%  ksoftirqd/0  [kernel.vmlinux]  [k] __alloc_pages_nodemask
  0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c686
  0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5d5
  0.78%  ksoftirqd/0  [mlx4_en]         [k] mlx4_alloc_pages.isra.23
  0.77%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5b4
  0.77%  ksoftirqd/0  [kernel.vmlinux]  [k] net_rx_action

machine specs:
 receiver - Intel E5-1630 v3 @ 3.70GHz
 sender - Intel E5645 @ 2.40GHz
 Mellanox ConnectX-3 @40G
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

86af8b41

net/mlx4_en: add support for fast rx drop bpf program · 47a38e15

由 Brenden Blanco 提交于 7月 19, 2016

Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.

In tc/socket bpf programs, helpers linearize skb fragments as needed
when the program touches the packet data. However, in the pursuit of
speed, XDP programs will not be allowed to use these slower functions,
especially if it involves allocating an skb.

Therefore, disallow MTU settings that would produce a multi-fragment
packet that XDP programs would fail to access. Future enhancements could
be done to increase the allowable MTU.

The xdp program is present as a per-ring data structure, but as of yet
it is not possible to set at that granularity through any ndo.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47a38e15

rtnl: add option for setting link xdp prog · d1fdd913

由 Brenden Blanco 提交于 7月 19, 2016

Sets the bpf program represented by fd as an early filter in the rx path
of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP.
Providing a negative value as fd clears the program. Getting the fd back
via rtnl is not possible, therefore reading of this value merely
provides a bool whether the program is valid on the link or not.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1fdd913

net: add ndo to setup/query xdp prog in adapter rx · a7862b45

由 Brenden Blanco 提交于 7月 19, 2016

Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP
filter. The single op is used for both setup/query of the xdp program,
modelled after ndo_setup_tc.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7862b45

bpf: add XDP prog type for early driver filter · 6a773a15

由 Brenden Blanco 提交于 7月 19, 2016

Add a new bpf prog type that is intended to run in early stages of the
packet rx path. Only minimal packet metadata will be available, hence a
new context type, struct xdp_md, is exposed to userspace. So far only
expose the packet start and end pointers, and only in read mode.

An XDP program must return one of the well known enum values, all other
return codes are reserved for future use. Unfortunately, this
restriction is hard to enforce at verification time, so take the
approach of warning at runtime when such programs are encountered. Out
of bounds return codes should alias to XDP_ABORTED.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6a773a15

bpf: add bpf_prog_add api for bulk prog refcnt · 59d3656d

由 Brenden Blanco 提交于 7月 19, 2016

A subsystem may need to store many copies of a bpf program, each
deserving its own reference. Rather than requiring the caller to loop
one by one (with possible mid-loop failure), add a bulk bpf_prog_add
api.
Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

59d3656d

Merge branch 'ncsi' · ddbcb794

由 David S. Miller 提交于 7月 19, 2016

Gavin Shan says:

====================
NCSI Support

This series rebases on David's linux-net git repo ("master" branch). It's
to support NCSI stack on drivers/net/ethernet/faraday/ftgmac100.c. The
implementation is based on NCSI spec (version: 1.1.0):
https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf

As the following figure shows and defined in NCSI spec:

 * The NC-SI (aka NCSI) is defined as the interface between a (Base)
   Management Controller (BMC) and one or multiple Network Interface
   Controlers (NIC) on host side. The interface is responsible for providing
   external network connectivity for BMC.
 * Each BMC can connect to multiple packages, up to 8. Each package can have
   multiple channels, up to 32. Every package and channel are identified by
   3-bits and 5-bits in NCSI packet.
 * NCSI packet, encapsulated in ethernet frame, has 0x88F8 in the protocol
   field. The destination MAC address should be 0xFF's while the source MAC
   address can be arbitrary one.
 * NCSI packets are classified to command, response, AEN (Asynchronous Event Notification).
   Commands are sent from BMC to host (NIC) for configuration and
   information retrival. Responses, corresponding to commands, are sent from
   host to BMC for confirmation and requested information. One command should
   have one and only one response. AEN is sent from host to BMC for notification
   (e.g. link down on active channel) so that BMC can take appropriate action.

   +------------------+        +----------------------------------------------+
   |                  |        |                     Host                     |
   |        BMC       |        |                                              |
   |                  |        | +-------------------+  +-------------------+ |
   |    +---------+   |        | |     Package-A     |  |     Package-B     | |
   |    |         |   |        | +---------+---------+  +-------------------+ |
   |    |ftgmac100|   |        | | Channel | Channel |  | Channel | Channel | |
   +----+----+----+---+        +-+---------+---------+--+---------+---------+-+
             |                             |                      |
             |                             |                      |
             +-----------------------------+----------------------+

The series of patches is highlighted as:

The design for the patchset is highlighted as below:

 * The network driver uses 3 interfaces exported from NCSI stack:
   ncsi_register_dev() - Register (create) a associated NCSI device.
   ncsi_start_dev() - Bring up the NCSI device.
   ncsi_unregister_dev() - Destroy the registered NCSI device.
 * There are several data structures introduced for different objects:
   struct ncsi_dev - NCSI device seen by network device driver.
   struct ncsi_dev_priv - NCSI device seen by NCSI stack.
   struct ncsi_package - NCSI package which can have multiple channels.
   struct ncsi_channel - NCSI channel.
 * The NCSI stack is driven by workqueue and state machine internally.
 * The all available NCSI packages and channels are enumerated (probed) on
   the first call to ncsi_start_dev(). The NCSI topology won't change until
   the NCSI device is destroyed.
 * All available channels will be brought up When the hardware arbitration
   is enabled. Otherwise, only one channel is selected as active one. The
   NCSI internal is driven by state machine with help of a workqueue. In
   the meanwhile, there are 3 states for each channel which can be put into
   a queue requesting for configuration or suspending. Channels in the queue
   with inactive state set will be configured (bringup) while channels in
   the queue with active state will be suspended (teardown). The request
   configuration or suspending is being applied on the channel if it's in
   invisible state.
 * Failover, another inactive channel is selected as active, can happen when
   the hardware arbitration is disabled. The failover can be caused by timeout
   on link monitor and AEN.
 * NCSI stack should be configurable through netlink or another mechanism, it's
   not implemented in this patchset. It's something TBD.
 * The first NIC driver that is aware of NCSI: drivers/net/ethernet/faraday/ftgmac100.c

Changelog
=========
v2 -> v3:
 * Include (one line) change in include/uapi/linux/if_ether.h to fix build
   error.
v1 -> v2:
 * Support NCSI spec v1.1.0 (3 more commands and 4 hardware arbitration
   modes added).
 * Enable AEN packets according to the supported list.
 * Introduce NCSI channel states and processing queue in order to support
   the hardware arbitration.
 * The hardware arbitration is supported (tested with emulated environment).
 * Introduce link monitor with GLS (Get Link Status) command/response as part
   of the error handling defined in NCSI spec.
 * Support IPv6 address discovery when CONFIG_IPV6 is enabled.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddbcb794

net/faraday: Mask PHY interrupt with NCSI mode · fc6061cf

由 Gavin Shan 提交于 7月 19, 2016

Bogus PHY interrupts are observed. This masks the PHY interrupt
when the interface works in NCSI mode as there is no attached
PHY under the circumstance.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc6061cf

net/faraday: Match driver according to compatible property · bb168e2e

由 Gavin Shan 提交于 7月 19, 2016

This matches the driver with devices compatible with "faraday,ftgmac100"
declared in the device tree. Originally, device's name from device
tree for it.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb168e2e

net/faraday: Support NCSI mode · bd466c3f

由 Gavin Shan 提交于 7月 19, 2016

This makes ftgmac100 driver support NCSI mode. The NCSI is enabled
on the interface if property "use-nc-si" or "use-ncsi" is found from
the device node in device tree.

   * No PHY device is used when NCSI mode is enabled.
   * The NCSI device (struct ncsi_dev) is created when probing the
     device while it's enabled/started when the interface is brought
     up.
   * Hardware IP checksum dosn't work when NCSI mode is enabled. It
     is disabled on enabled NCSI.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bd466c3f

net/faraday: Read MAC address from chip · 113ce107

由 Gavin Shan 提交于 7月 19, 2016

The device is assigned with random MAC address. It isn't reasonable.
An valid MAC address might have been provided by (uboot) firmware by
device-tree or in chip. It's reasonable to use it to maintain consistency.

This uses the MAC address from device-tree or that in the chip if it's
valid. Otherwise, a random MAC address is given as before.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

113ce107

net/faraday: Helper functions to create or destroy MDIO interface · eb418184

由 Gavin Shan 提交于 7月 19, 2016

This introduces two helper functions to create or destroy MDIO
interface. No logical changes introduced except the proper MDIO
names are given when having more than one MDIO bus.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb418184

net/ncsi: NCSI AEN packet handler · 7a82ecf4

由 Gavin Shan 提交于 7月 19, 2016

This introduces NCSI AEN packet handlers that result in (A) the
currently active channel is reconfigured; (B) Currently active
channel is deconfigured and disabled, another channel is chosen
as active one and configured. Case (B) won't happen if hardware
arbitration has been enabled, the channel that was in active
state is suspended simply.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7a82ecf4

net/ncsi: Package and channel management · e6f44ed6

由 Gavin Shan 提交于 7月 19, 2016

This manages NCSI packages and channels:

 * The available packages and channels are enumerated in the first
   time of calling ncsi_start_dev(). The channels' capabilities are
   probed in the meanwhile. The NCSI network topology won't change
   until the NCSI device is destroyed.
 * There in a queue in every NCSI device. The element in the queue,
   channel, is waiting for configuration (bringup) or suspending
   (teardown). The channel's state (inactive/active) indicates the
   futher action (configuration or suspending) will be applied on the
   channel. Another channel's state (invisible) means the requested
   action is being applied.
 * The hardware arbitration will be enabled if all available packages
   and channels support it. All available channels try to provide
   service when hardware arbitration is enabled. Otherwise, one channel
   is selected as the active one at once.
 * When channel is in active state, meaning it's providing service, a
   timer started to retrieve the channe's link status. If the channel's
   link status fails to be updated in the determined period, the channel
   is going to be reconfigured. It's the error handling implementation
   as defined in NCSI spec.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e6f44ed6

net/ncsi: NCSI response packet handler · 138635cc

由 Gavin Shan 提交于 7月 19, 2016

The NCSI response packets are sent to MC (Management Controller)
from the remote end. They are responses of NCSI command packets
for multiple purposes: completion status of NCSI command packets,
return NCSI channel's capability or configuration etc.

This defines struct to represent NCSI response packets and introduces
function ncsi_rcv_rsp() which will be used to receive NCSI response
packets and parse them.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

138635cc

net/ncsi: NCSI command packet handler · 6389eaa7

由 Gavin Shan 提交于 7月 19, 2016

The NCSI command packets are sent from MC (Management Controller)
to remote end. They are used for multiple purposes: probe existing
NCSI package/channel, retrieve NCSI channel's capability, configure
NCSI channel etc.

This defines struct to represent NCSI command packets and introduces
function ncsi_xmit_cmd(), which will be used to transmit NCSI command
packet according to the request. The request is represented by struct
ncsi_cmd_arg.
Signed-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6389eaa7

net/ncsi: Resource management · 2d283bdd

由 Gavin Shan 提交于 7月 19, 2016

NCSI spec (DSP0222) defines several objects: package, channel, mode,
filter, version and statistics etc. This introduces the data structs
to represent those objects and implement functions to manage them.
Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
stack.

   * The user (e.g. netdev driver) dereference NCSI device by
     "struct ncsi_dev", which is embedded to "struct ncsi_dev_priv".
     The later one is used by NCSI stack internally.
   * Every NCSI device can have multiple packages simultaneously, up
     to 8 packages. It's represented by "struct ncsi_package" and
     identified by 3-bits ID.
   * Every NCSI package can have multiple channels, up to 32. It's
     represented by "struct ncsi_channel" and identified by 5-bits ID.
   * Every NCSI channel has version, statistics, various modes and
     filters. They are represented by "struct ncsi_channel_version",
     "struct ncsi_channel_stats", "struct ncsi_channel_mode" and
     "struct ncsi_channel_filter" separately.
   * Apart from AEN (Asynchronous Event Notification), the NCSI stack
     works in terms of command and response. This introduces "struct
     ncsi_req" to represent a complete NCSI transaction made of NCSI
     request and response.

link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdfSigned-off-by: NGavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: NJoel Stanley <joel@jms.id.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d283bdd

Merge branch 'dsa-mv88e6xxx-g2-cleanup-stp' · 5e31c701

由 David S. Miller 提交于 7月 19, 2016

Vivien Didelot says:

====================
net: dsa: mv88e6xxx: Global2 cleanup and STP

The Marvell switches registers are organized in distinct internal SMI
devices, such as PHY, Port, Global 1 or Global 2 registers sets.

Since not all chips support every registers sets or have slightly
differences in them (such as old 88E6060 or new 88E6390 likely to be
supported soon), make the setup code clearer now by removing a few
family checks and adding flags to describe the Global 2 registers map.

This patchset enables basic STP support and bridging on most chips when
getting rid of a few inconsistencies in chip descriptions (patch 1) and
add bridge Ageing Time support to DSA and the mv88e6xxx driver.

Changes v2 -> v3:
  - rename mv88e6xxx_update_write to mv88e6xxx_update
  - set fastest ageing time in use in the chip for multiple bridges,
    tested with a few printk

Changes v1 -> v2:
  - add a write helper for pointer-data Update registers
  - add ageing time support
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e31c701

net: dsa: mv88e6xxx: add support for DSA ageing time · 2cfcd964

由 Vivien Didelot 提交于 7月 18, 2016

Implement the DSA driver function to configure the bridge ageing time.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2cfcd964

net: dsa: mv88e6xxx: add G1 helper for ageing time · acddbd21

由 Vivien Didelot 提交于 7月 18, 2016

All Marvell switch chips from (88E6060 to 88E6390) have a ATU Control
register containing bits 11:4 to configure an ATU Age Time quotient.

However the coefficient used to calculate the ATU Age Time vary with the
models. E.g. 88E6060, 88E6352 and 88E6390 use respectively 16, 15 and
3.75 seconds.

Add a age_time_coeff to the info structure to handle this and a Global 1
helper to set the default age time of 5 minutes in the setup code.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

acddbd21

net: dsa: support switchdev ageing time attr · 34a79f63

由 Vivien Didelot 提交于 7月 18, 2016

Add a new function for DSA drivers to handle the switchdev
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute.

The ageing time is passed as milliseconds.

Also because we can have multiple logical bridges on top of a physical
switch and ageing time are switch-wide, call the driver function with
the fastest ageing time in use on the chip instead of the requested one.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

34a79f63

net: dsa: mv88e6xxx: add cap for IRL · 8ec61c7f

由 Vivien Didelot 提交于 7月 18, 2016

Add capability flags to describe the presence of Ingress Rate Limit unit
registers and an helper function to clear it.

In the meantime, fix a few harmless issues:

  - 6185 and 6095 don't have such registers (reserved)
  - the previous code didn't wait for the IRL operation to complete
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ec61c7f

net: dsa: mv88e6xxx: add cap for Priority Override · 9bda889f

由 Vivien Didelot 提交于 7月 18, 2016

Add flags and helpers to describe the presence of Priority Override
Table (POT) related registers and simplify the setup of Global 2.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bda889f

net: dsa: mv88e6xxx: add cap for PVT · 63ed880d

由 Vivien Didelot 提交于 7月 18, 2016

Add flags to describe the presence of Cross-chip Port VLAN Table (PVT)
related registers and simplify the setup of Global 2.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63ed880d

net: dsa: mv88e6xxx: rework Switch MAC setter · 3b4caa1b

由 Vivien Didelot 提交于 7月 18, 2016

Switches such as 88E6185 as 3 Switch MAC registers in Global 1. Newer
chips such as 88E6352 have freed these registers in favor of an indirect
access in a Switch MAC/WoL/WoF register in Global 2.

Explicit this difference with G1 and G2 helpers and flags.

Also, note that this indirect access is a single-register which doesn't
require to wait for the operation to complete (like Switch MAC, Trunk
Mapping, etc.), in contrary to multi-registers indirect accesses with
several operations and a busy bit.
Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b4caa1b

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功