提交 · fbfc504a24f53f7ebe128ab55cb5dba634f4ece8 · openeuler / raspberrypi-kernel

04 5月, 2018 2 次提交

bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP · fbfc504a

由 Björn Töpel 提交于 5月 02, 2018

The xskmap is yet another BPF map, very much inspired by
dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application
adds AF_XDP sockets into the map, and by using the bpf_redirect_map
helper, an XDP program can redirect XDP frames to an AF_XDP socket.

Note that a socket that is bound to certain ifindex/queue index will
*only* accept XDP frames from that netdev/queue index. If an XDP
program tries to redirect from a netdev/queue index other than what
the socket is bound to, the frame will not be received on the socket.

A socket can reside in multiple maps.

v3: Fixed race and simplified code.
v2: Removed one indirection in map lookup.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

fbfc504a

net: initial AF_XDP skeleton · 68e8b849

由 Björn Töpel 提交于 5月 02, 2018

Buildable skeleton of AF_XDP without any functionality. Just what it
takes to register a new address family.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

68e8b849

30 4月, 2018 1 次提交

bpf: remove tracepoints from bpf core · 4d220ed0

由 Alexei Starovoitov 提交于 4月 28, 2018

tracepoints to bpf core were added as a way to provide introspection
to bpf programs and maps, but after some time it became clear that
this approach is inadequate, so prog_id, map_id and corresponding
get_next_id, get_fd_by_id, get_info_by_fd, prog_query APIs were
introduced and fully adopted by bpftool and other applications.
The tracepoints in bpf core started to rot and causing syzbot warnings:
WARNING: CPU: 0 PID: 3008 at kernel/trace/trace_event_perf.c:274
Kernel panic - not syncing: panic_on_warn set ...
perf_trace_bpf_map_keyval+0x260/0xbd0 include/trace/events/bpf.h:228
trace_bpf_map_update_elem include/trace/events/bpf.h:274 [inline]
map_update_elem kernel/bpf/syscall.c:597 [inline]
SYSC_bpf kernel/bpf/syscall.c:1478 [inline]
Hence this patch deletes tracepoints in bpf core.
Reported-by: NEric Biggers <ebiggers3@gmail.com>
Reported-by: Nsyzbot <bot+a9dbb3c3e64b62536a4bc5ee7bbd4ca627566188@syzkaller.appspotmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

4d220ed0

29 4月, 2018 2 次提交

bpf/verifier: improve register value range tracking with ARSH · 9cbe1f5a

由 Yonghong Song 提交于 4月 28, 2018

When helpers like bpf_get_stack returns an int value
and later on used for arithmetic computation, the LSH and ARSH
operations are often required to get proper sign extension into
64-bit. For example, without this patch:
    54: R0=inv(id=0,umax_value=800)
    54: (bf) r8 = r0
    55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
    55: (67) r8 <<= 32
    56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
    56: (c7) r8 s>>= 32
    57: R8=inv(id=0)
With this patch:
    54: R0=inv(id=0,umax_value=800)
    54: (bf) r8 = r0
    55: R0=inv(id=0,umax_value=800) R8_w=inv(id=0,umax_value=800)
    55: (67) r8 <<= 32
    56: R8_w=inv(id=0,umax_value=3435973836800,var_off=(0x0; 0x3ff00000000))
    56: (c7) r8 s>>= 32
    57: R8=inv(id=0, umax_value=800,var_off=(0x0; 0x3ff))
With better range of "R8", later on when "R8" is added to other register,
e.g., a map pointer or scalar-value register, the better register
range can be derived and verifier failure may be avoided.

In our later example,
    ......
    usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
    if (usize < 0)
        return 0;
    ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
    ......
Without improving ARSH value range tracking, the register representing
"max_len - usize" will have smin_value equal to S64_MIN and will be
rejected by verifier.
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

9cbe1f5a

bpf: add bpf_get_stack helper · c195651e

由 Yonghong Song 提交于 4月 28, 2018

Currently, stackmap and bpf_get_stackid helper are provided
for bpf program to get the stack trace. This approach has
a limitation though. If two stack traces have the same hash,
only one will get stored in the stackmap table,
so some stack traces are missing from user perspective.

This patch implements a new helper, bpf_get_stack, will
send stack traces directly to bpf program. The bpf program
is able to see all stack traces, and then can do in-kernel
processing or send stack traces to user space through
shared map or bpf_perf_event_output.
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

c195651e

27 4月, 2018 3 次提交

udp: add gso support to virtual devices · 83aa025f

由 Willem de Bruijn 提交于 4月 26, 2018

Virtual devices such as tunnels and bonding can handle large packets.
Only segment packets when reaching a physical or loopback device.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83aa025f

udp: generate gso with UDP_SEGMENT · bec1f6f6

由 Willem de Bruijn 提交于 4月 26, 2018

Support generic segmentation offload for udp datagrams. Callers can
concatenate and send at once the payload of multiple datagrams with
the same destination.

To set segment size, the caller sets socket option UDP_SEGMENT to the
length of each discrete payload. This value must be smaller than or
equal to the relevant MTU.

A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
per send call basis.

Total byte length may then exceed MTU. If not an exact multiple of
segment size, the last segment will be shorter.

The implementation adds a gso_size field to the udp socket, ip(v6)
cmsg cookie and inet_cork structure to be able to set the value at
setsockopt or cmsg time and to work with both lockless and corked
paths.

Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.

    tcp tso
     3197 MB/s 54232 msg/s 54232 calls/s
         6,457,754,262      cycles

    tcp gso
     1765 MB/s 29939 msg/s 29939 calls/s
        11,203,021,806      cycles

    tcp without tso/gso *
      739 MB/s 12548 msg/s 12548 calls/s
        11,205,483,630      cycles

    udp
      876 MB/s 14873 msg/s 624666 calls/s
        11,205,777,429      cycles

    udp gso
     2139 MB/s 36282 msg/s 36282 calls/s
        11,204,374,561      cycles

   [*] after reverting commit 0a6b2a1d
       ("tcp: switch to GSO being always on")

Measured total system cycles ('-a') for one core while pinning both
the network receive path and benchmark process to that core:

  perf stat -a -C 12 -e cycles \
    ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4

Note the reduction in calls/s with GSO. Bytes per syscall drops
increases from 1470 to 61818.
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bec1f6f6

udp: add udp gso · ee80d1eb

由 Willem de Bruijn 提交于 4月 26, 2018

Implement generic segmentation offload support for udp datagrams. A
follow-up patch adds support to the protocol stack to generate such
packets.

UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits
a large payload into a number of discrete UDP datagrams.

The implementation adds a GSO type SKB_UDP_GSO_L4 to differentiate it
from UFO (SKB_UDP_GSO).

IPPROTO_UDPLITE is excluded, as that protocol has no gso handler
registered.

[ Export __udp_gso_segment for ipv6. -DaveM ]
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee80d1eb

25 4月, 2018 7 次提交

macvlan: Provide function for interfaces to release HW offload · 53cd4d8e

由 Alexander Duyck 提交于 4月 03, 2018

This patch provides a basic function to allow a lower device to disable
macvlan offload if it was previously enabled on a given macvlan. The idea
here is to allow for recovery from failure should the lowerdev run out of
resources.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

53cd4d8e

macvlan: Add function to test for destination filtering support · 6cb1937d

由 Alexander Duyck 提交于 4月 03, 2018

This patch adds a function indicating if a given macvlan can fully supports
destination filtering, especially as it relates to unicast traffic. For
those macvlan interfaces that do not support destination filtering such
passthru or source mode filtering we should not be enabling offload
support.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

6cb1937d

macvlan: macvlan_count_rx shouldn't be static inline AND extern · a2223112

由 Alexander Duyck 提交于 4月 03, 2018

It doesn't make sense to define macvlan_count_rx as a static inline and
then add a forward declaration after that as an extern. I am dropping the
extern declaration since it seems like it is something that likely got
missed when the function was made an inline.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

a2223112

macvlan: Rename fwd_priv to accel_priv and add accessor function · 7d775f63

由 Alexander Duyck 提交于 4月 03, 2018

This change renames the fwd_priv member to accel_priv as this more
accurately reflects the actual purpose of this value. In addition I am
adding an accessor which will allow us to further abstract this in the
future if needed.
Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

7d775f63

net: ethtool: Add missing kernel doc for FEC parameters · d805c520

由 Florian Fainelli 提交于 4月 23, 2018

While adding support for ethtool::get_fecparam and set_fecparam, kernel
doc for these functions was missed, add those.

Fixes: 1a5f3da2 ("net: ethtool: add support for forward error correction modes")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d805c520

rhashtable: Revise incorrect comment on r{hl, hash}table_walk_enter() · 82266e98

由 NeilBrown 提交于 4月 24, 2018

Neither rhashtable_walk_enter() or rhltable_walk_enter() sleep, though
they do take a spinlock without irq protection.
So revise the comments to accurately state the contexts in which
these functions can be called.
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82266e98

rhashtable: remove outdated comments about grow_decision etc · 0c6f69a5

由 NeilBrown 提交于 4月 24, 2018

grow_decision and shink_decision no longer exist, so remove
the remaining references to them.
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c6f69a5

24 4月, 2018 6 次提交

Revert "net: init sk_cookie for inet socket" · a06ac0d6

由 Yafang Shao 提交于 4月 24, 2018

This reverts commit <c6849a3a> ("net: init sk_cookie for inet socket")

Per discussion with Eric, when update sock_net(sk)->cookie_gen, the
whole cache cache line will be invalidated, as this cache line is shared
with all cpus, that may cause great performace hit.

Bellow is the data form Eric.
"Performance is reduced from ~5 Mpps to ~3.8 Mpps with 16 RX queues on
my host" when running synflood test.

Have to revert it to prevent from cache line false sharing.
Signed-off-by: NYafang Shao <laoar.shao@gmail.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a06ac0d6

net/dim: Support adaptive TX moderation · 623ad755

由 Tal Gilboa 提交于 4月 24, 2018

Interrupt moderation for TX traffic requires different profiles than RX
interrupt moderation. The main goal here is to reduce interrupt rate and
allow better payload aggregation by keeping SKBs in the TX queue a bit
longer. Ping-pong behavior would get a profile with a short timer, so
latency wouldn't increase for these scenarios. There might be a slight
degradation in bandwidth for single stream with large message sizes, since
net.ipv4.tcp_limit_output_bytes is limiting the allowed TX traffic, but
with many streams performance is always improved.
Signed-off-by: NTal Gilboa <talgi@mellanox.com>
Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

623ad755

net/dim: Rename *_get_profile() functions to *_get_rx_moderation() · 026a807c

由 Tal Gilboa 提交于 4月 24, 2018

Preparation for introducing adaptive TX to net DIM.
Signed-off-by: NTal Gilboa <talgi@mellanox.com>
Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

026a807c

bpf: sockmap, map_release does not hold refcnt for pinned maps · ba6b8de4

由 John Fastabend 提交于 4月 23, 2018

Relying on map_release hook to decrement the reference counts when a
map is removed only works if the map is not being pinned. In the
pinned case the ref is decremented immediately and the BPF programs
released. After this BPF programs may not be in-use which is not
what the user would expect.

This patch moves the release logic into bpf_map_put_uref() and brings
sockmap in-line with how a similar case is handled in prog array maps.

Fixes: 3d9e9526 ("bpf: sockmap, fix leaking maps with attached but not detached progs")
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

ba6b8de4

bpf: disable and restore preemption in __BPF_PROG_RUN_ARRAY · 6899b32b

由 Roman Gushchin 提交于 4月 23, 2018

Running bpf programs requires disabled preemption,
however at least some* of the BPF_PROG_RUN_ARRAY users
do not follow this rule.

To fix this bug, and also to make it not happen in the future,
let's add explicit preemption disabling/re-enabling
to the __BPF_PROG_RUN_ARRAY code.

* for example:
 [   17.624472] RIP: 0010:__cgroup_bpf_run_filter_sk+0x1c4/0x1d0
 ...
 [   17.640890]  inet6_create+0x3eb/0x520
 [   17.641405]  __sock_create+0x242/0x340
 [   17.641939]  __sys_socket+0x57/0xe0
 [   17.642370]  ? trace_hardirqs_off_thunk+0x1a/0x1c
 [   17.642944]  SyS_socket+0xa/0x10
 [   17.643357]  do_syscall_64+0x79/0x220
 [   17.643879]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
Signed-off-by: NRoman Gushchin <guro@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

6899b32b

qed: Add configuration information to register dump and debug data · 1ac4329a

由 Denis Bolotin 提交于 4月 23, 2018

Configuration information is added to the debug data collection, in
addition to register dump.
Added qed_dbg_nvm_image() that receives an image type, allocates a
buffer and reads the image. The images are saved in the buffers and the
dump size is updated.
Signed-off-by: NDenis Bolotin <denis.bolotin@cavium.com>
Signed-off-by: NAriel Elior <ariel.elior@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ac4329a

23 4月, 2018 1 次提交

net: init sk_cookie for inet socket · c6849a3a

由 Yafang Shao 提交于 4月 22, 2018

With sk_cookie we can identify a socket, that is very helpful for
traceing and statistic, i.e. tcp tracepiont and ebpf.
So we'd better init it by default for inet socket.
When using it, we just need call atomic64_read(&sk->sk_cookie).
Signed-off-by: NYafang Shao <laoar.shao@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6849a3a

21 4月, 2018 3 次提交

kasan: add no_sanitize attribute for clang builds · 12c8f25a

由 Andrey Konovalov 提交于 4月 20, 2018

KASAN uses the __no_sanitize_address macro to disable instrumentation of
particular functions.  Right now it's defined only for GCC build, which
causes false positives when clang is used.

This patch adds a definition for clang.

Note, that clang's revision 329612 or higher is required.

[andreyknvl@google.com: remove redundant #ifdef CONFIG_KASAN check]
  Link: http://lkml.kernel.org/r/c79aa31a2a2790f6131ed607c58b0dd45dd62a6c.1523967959.git.andreyknvl@google.com
Link: http://lkml.kernel.org/r/4ad725cc903f8534f8c8a60f0daade5e3d674f8d.1523554166.git.andreyknvl@google.comSigned-off-by: NAndrey Konovalov <andreyknvl@google.com>
Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Paul Lawrence <paullawrence@google.com>
Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12c8f25a

writeback: safer lock nesting · 2e898e4c

由 Greg Thelen 提交于 4月 20, 2018

lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.

unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains.  Switches occur when
enough writes are issued from a new domain.

This existing pattern is thus suspicious:
    lock_page_memcg(page);
    unlocked_inode_to_wb_begin(inode, &locked);
    ...
    unlocked_inode_to_wb_end(inode, locked);
    unlock_page_memcg(page);

If both inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock.  This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().

    truncate
    __cancel_dirty_page
    lock_page_memcg
    unlocked_inode_to_wb_begin
    unlocked_inode_to_wb_end
    <interrupts mistakenly enabled>
                                    <interrupt>
                                    end_page_writeback
                                    test_clear_page_writeback
                                    lock_page_memcg
                                    <deadlock>
    unlock_page_memcg

Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).

If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:

  cd /mnt/cgroup/memory
  mkdir a b
  echo 1 > a/memory.move_charge_at_immigrate
  echo 1 > b/memory.move_charge_at_immigrate
  (
    echo $BASHPID > a/cgroup.procs
    while true; do
      dd if=/dev/zero of=/mnt/big bs=1M count=256
    done
  ) &
  while true; do
    sync
  done &
  sleep 1h &
  SLEEP=$!
  while true; do
    echo $SLEEP > a/cgroup.procs
    echo $SLEEP > b/cgroup.procs
  done

The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel.  I suggest we should to prevent future
surprises.  And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch.  For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146

Wang Long said "this deadlock occurs three times in our environment"

[gthelen@google.com: v4]
  Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: NGreg Thelen <gthelen@google.com>
Reported-by: NWang Long <wanglong19@meituan.com>
Acked-by: NWang Long <wanglong19@meituan.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org>	[v4.2+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2e898e4c

fork: unconditionally clear stack on fork · e01e8063

由 Kees Cook 提交于 4月 20, 2018

One of the classes of kernel stack content leaks[1] is exposing the
contents of prior heap or stack contents when a new process stack is
allocated.  Normally, those stacks are not zeroed, and the old contents
remain in place.  In the face of stack content exposure flaws, those
contents can leak to userspace.

Fixing this will make the kernel no longer vulnerable to these flaws, as
the stack will be wiped each time a stack is assigned to a new process.
There's not a meaningful change in runtime performance; it almost looks
like it provides a benefit.

Performing back-to-back kernel builds before:
	Run times: 157.86 157.09 158.90 160.94 160.80
	Mean: 159.12
	Std Dev: 1.54

and after:
	Run times: 159.31 157.34 156.71 158.15 160.81
	Mean: 158.46
	Std Dev: 1.46

Instead of making this a build or runtime config, Andy Lutomirski
recommended this just be enabled by default.

[1] A noisy search for many kinds of stack content leaks can be seen here:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=linux+kernel+stack+leak

I did some more with perf and cycle counts on running 100,000 execs of
/bin/true.

before:
Cycles: 218858861551 218853036130 214727610969 227656844122 224980542841
Mean:  221015379122.60
Std Dev: 4662486552.47

after:
Cycles: 213868945060 213119275204 211820169456 224426673259 225489986348
Mean:  217745009865.40
Std Dev: 5935559279.99

It continues to look like it's faster, though the deviation is rather
wide, but I'm not sure what I could do that would be less noisy.  I'm
open to ideas!

Link: http://lkml.kernel.org/r/20180221021659.GA37073@beastSigned-off-by: NKees Cook <keescook@chromium.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e01e8063

20 4月, 2018 15 次提交

lan78xx: Read LED states from Device Tree · 1827b067

由 Phil Elwell 提交于 4月 19, 2018

Add support for DT property "microchip,led-modes", a vector of zero
to four cells (u32s) in the range 0-15, each of which sets the mode
for one of the LEDs. Some possible values are:

    0=link/activity          1=link1000/activity
    2=link100/activity       3=link10/activity
    4=link100/1000/activity  5=link10/1000/activity
    6=link10/100/activity    14=off    15=on

These values are given symbolic constants in a dt-bindings header.

Also use the presence of the DT property to indicate that the
LEDs should be enabled - necessary in the event that no valid OTP
or EEPROM is available.
Signed-off-by: NPhil Elwell <phil@raspberrypi.org>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1827b067

qed* : Add new TLV to request PF to update MAC in bulletin board · 809c45a0

由 Shahed Shaikh 提交于 4月 19, 2018

There may be a need for VF driver to request PF to explicitly update its
bulletin with a MAC address.
e.g. When user assigns a MAC address to VF while VF is still down,
and PF's bulletin board contains different MAC address, in this case,
when VF's interface is brought up, it gets loaded with MAC address from
bulletin board which is not desirable.

To handle this corner case, we need a new TLV to request PF to update
its bulletin board with suggested MAC.

This request will be honored only for trusted VFs.
Signed-off-by: NShahed Shaikh <shahed.shaikh@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

809c45a0

net: phy: mdio-gpio: Remove redundant platform data header · 0207dd11

由 Andrew Lunn 提交于 4月 19, 2018

The platform data header file is now unused. Remove it, but add
an extra include which it brought in.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0207dd11

net: phy: mdio-gpio: Add #defines for the GPIO index's · fb78a95e

由 Andrew Lunn 提交于 4月 19, 2018

The GPIOs are described in device tree using a list, without names.
Add defines to indicate what each index in the list means. These
defines should also be used by platform devices passing GPIOs via a
GPIO lookup table.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fb78a95e

net: phy: mdio-gpio: Swap to using gpio descriptors · c82fc481

由 Andrew Lunn 提交于 4月 19, 2018

This simplifies the code, removing the need to handle active low
flags, etc.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c82fc481

net: phy: mdio-gpio: Remove support for IRQs in platform data · 68abb4f2

由 Andrew Lunn 提交于 4月 19, 2018

No current devices use IRQs in platform data, so remove support for
it. The MDIO core will also initialise the new bus such that all
addresses are polled, so remove the unneeded re-initialisation.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68abb4f2

net: phy: mdio-gpio: remove support for phy mask · 185a16b6

由 Andrew Lunn 提交于 4月 19, 2018

This is not needed any more by devices using platform data, so remove
it.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

185a16b6

net: phy: mdio-gpio: remove support for ignoring turn around · c1b3eb04

由 Andrew Lunn 提交于 4月 19, 2018

This is not needed any more by devices using platform data, so remove
it.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c1b3eb04

net: phy: mdio-bitbang: Remove reset support · a3283e25

由 Andrew Lunn 提交于 4月 19, 2018

The mdio-gpio driver was the only user of the interface reset option.
Since it no longer uses it, remove it from the bit banging code.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3283e25

net: phy: mdio-gpio: Remove reset function · 9e4d6093

由 Andrew Lunn 提交于 4月 19, 2018

The platform data can contain a function to call to reset
the bit banging interface. It is not used, so remove it.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e4d6093

bpf: btf: Add pretty print support to the basic arraymap · a26ca7c9

由 Martin KaFai Lau 提交于 4月 18, 2018

This patch adds pretty print support to the basic arraymap.
Support for other bpf maps can be added later.

This patch adds new attrs to the BPF_MAP_CREATE command to allow
specifying the btf_fd, btf_key_id and btf_value_id.  The
BPF_MAP_CREATE can then associate the btf to the map if
the creating map supports BTF.

A BTF supported map needs to implement two new map ops,
map_seq_show_elem() and map_check_btf().  This patch has
implemented these new map ops for the basic arraymap.

It also adds file_operations, bpffs_map_fops, to the pinned
map such that the pinned map can be opened and read.
After that, the user has an intuitive way to do
"cat bpffs/pathto/a-pinned-map" instead of getting
an error.

bpffs_map_fops should not be extended further to support
other operations.  Other operations (e.g. write/key-lookup...)
should be realized by the userspace tools (e.g. bpftool) through
the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc.
Follow up patches will allow the userspace to obtain
the BTF from a map-fd.

Here is a sample output when reading a pinned arraymap
with the following map's value:

struct map_value {
	int count_a;
	int count_b;
};

cat /sys/fs/bpf/pinned_array_map:

0: {1,2}
1: {3,4}
2: {5,6}
...
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

a26ca7c9

bpf: btf: Add BPF_OBJ_GET_INFO_BY_FD support to BTF fd · 60197cfb

由 Martin KaFai Lau 提交于 4月 18, 2018

This patch adds BPF_OBJ_GET_INFO_BY_FD support to BTF fd.
The original BTF data, which was used to create the BTF fd during
the earlier BPF_BTF_LOAD call, will be returned.

The userspace is expected to allocate buffer
to info.info and the buffer size is set to info.info_len before
calling BPF_OBJ_GET_INFO_BY_FD.

The original BTF data is copied to the userspace buffer (info.info).
Only upto the user's specified info.info_len will be copied.

The original BTF data size is set to info.info_len.  The userspace
needs to check if it is bigger than its allocated buffer size.
If it is, the userspace should realloc with the kernel-returned
info.info_len and call the BPF_OBJ_GET_INFO_BY_FD again.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

60197cfb

bpf: btf: Add BPF_BTF_LOAD command · f56a653c

由 Martin KaFai Lau 提交于 4月 18, 2018

This patch adds a BPF_BTF_LOAD command which
1) loads and verifies the BTF (implemented in earlier patches)
2) returns a BTF fd to userspace.  In the next patch, the
   BTF fd can be specified during BPF_MAP_CREATE.

It currently limits to CAP_SYS_ADMIN.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

f56a653c

bpf: btf: Add pretty print capability for data with BTF type info · b00b8dae

由 Martin KaFai Lau 提交于 4月 18, 2018

This patch adds pretty print capability for data with BTF type info.
The current usage is to allow pretty print for a BPF map.

The next few patches will allow a read() on a pinned map with BTF
type info for its key and value.

This patch uses the seq_printf() infra.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

b00b8dae

bpf: btf: Validate type reference · eb3f595d

由 Martin KaFai Lau 提交于 4月 18, 2018

After collecting all btf_type in the first pass in an earlier patch,
the second pass (in this patch) can validate the reference types
(e.g. the referring type does exist and it does not refer to itself).

While checking the reference type, it also gathers other information (e.g.
the size of an array).  This info will be useful in checking the
struct's members in a later patch.  They will also be useful in doing
pretty print later.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

eb3f595d