提交 · 6760bf2ddde8ad64f8205a651223a93de3a35494 · openeuler / raspberrypi-kernel

18 12月, 2016 34 次提交

bpf: fix mark_reg_unknown_value for spilled regs on map value marking · 6760bf2d

由 Daniel Borkmann 提交于 12月 18, 2016

Martin reported a verifier issue that hit the BUG_ON() for his
test case in the mark_reg_unknown_value() function:

  [  202.861380] kernel BUG at kernel/bpf/verifier.c:467!
  [...]
  [  203.291109] Call Trace:
  [  203.296501]  [<ffffffff811364d5>] mark_map_reg+0x45/0x50
  [  203.308225]  [<ffffffff81136558>] mark_map_regs+0x78/0x90
  [  203.320140]  [<ffffffff8113938d>] do_check+0x226d/0x2c90
  [  203.331865]  [<ffffffff8113a6ab>] bpf_check+0x48b/0x780
  [  203.343403]  [<ffffffff81134c8e>] bpf_prog_load+0x27e/0x440
  [  203.355705]  [<ffffffff8118a38f>] ? handle_mm_fault+0x11af/0x1230
  [  203.369158]  [<ffffffff812d8188>] ? security_capable+0x48/0x60
  [  203.382035]  [<ffffffff811351a4>] SyS_bpf+0x124/0x960
  [  203.393185]  [<ffffffff810515f6>] ? __do_page_fault+0x276/0x490
  [  203.406258]  [<ffffffff816db320>] entry_SYSCALL_64_fastpath+0x13/0x94

This issue got uncovered after the fix in a08dd0da ("bpf: fix
regression on verifier pruning wrt map lookups"). The reason why it
wasn't noticed before was, because as mentioned in a08dd0da,
mark_map_regs() was doing the id matching incorrectly based on the
uncached regs[regno].id. So, in the first loop, we walked all regs
and as soon as we found regno == i, then this reg's id was cleared
when calling mark_reg_unknown_value() thus that every subsequent
register was probed against id of 0 (which, in combination with the
PTR_TO_MAP_VALUE_OR_NULL type is an invalid condition that no other
register state can hold), and therefore wasn't type transitioned such
as in the spilled register case for the second loop.

Now since that got fixed, it turned out that 57a09bf0 ("bpf:
Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") used
mark_reg_unknown_value() incorrectly for the spilled regs, and thus
hitting the BUG_ON() in some cases due to regno >= MAX_BPF_REG.

Although spilled regs have the same type as the non-spilled regs
for the verifier state, that is, struct bpf_reg_state, they are
semantically different from the non-spilled regs. In other words,
there can be up to 64 (MAX_BPF_STACK / BPF_REG_SIZE) spilled regs
in the stack, for example, register R<x> could have been spilled by
the program to stack location X, Y, Z, and in mark_map_regs() we
need to scan these stack slots of type STACK_SPILL for potential
registers that we have to transition from PTR_TO_MAP_VALUE_OR_NULL.
Therefore, depending on the location, the spilled_regs regno can
be a lot higher than just MAX_BPF_REG's value since we operate on
stack instead. The reset in mark_reg_unknown_value() itself is
just fine, only that the BUG_ON() was inappropriate for this. Fix
it by making a __mark_reg_unknown_value() version that can be
called from mark_map_reg() generically; we know for the non-spilled
case that the regno is always < MAX_BPF_REG anyway.

Fixes: 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Reported-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6760bf2d

bpf: fix overflow in prog accounting · 5ccb071e

由 Daniel Borkmann 提交于 12月 18, 2016

Commit aaac3ba9 ("bpf: charge user for creation of BPF maps and
programs") made a wrong assumption of charging against prog->pages.
Unlike map->pages, prog->pages are still subject to change when we
need to expand the program through bpf_prog_realloc().

This can for example happen during verification stage when we need to
expand and rewrite parts of the program. Should the required space
cross a page boundary, then prog->pages is not the same anymore as
its original value that we used to bpf_prog_charge_memlock() on. Thus,
we'll hit a wrap-around during bpf_prog_uncharge_memlock() when prog
is freed eventually. I noticed this that despite having unlimited
memlock, programs suddenly refused to load with EPERM error due to
insufficient memlock.

There are two ways to fix this issue. One would be to add a cached
variable to struct bpf_prog that takes a snapshot of prog->pages at the
time of charging. The other approach is to also account for resizes. I
chose to go with the latter for a couple of reasons: i) We want accounting
rather to be more accurate instead of further fooling limits, ii) adding
yet another page counter on struct bpf_prog would also be a waste just
for this purpose. We also do want to charge as early as possible to
avoid going into the verifier just to find out later on that we crossed
limits. The only place that needs to be fixed is bpf_prog_realloc(),
since only here we expand the program, so we try to account for the
needed delta and should we fail, call-sites check for outcome anyway.
On cBPF to eBPF migrations, we don't grab a reference to the user as
they are charged differently. With that in place, my test case worked
fine.

Fixes: aaac3ba9 ("bpf: charge user for creation of BPF maps and programs")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5ccb071e

bpf: dynamically allocate digest scratch buffer · aafe6ae9

由 Daniel Borkmann 提交于 12月 18, 2016

Geert rightfully complained that 7bd509e3 ("bpf: add prog_digest
and expose it via fdinfo/netlink") added a too large allocation of
variable 'raw' from bss section, and should instead be done dynamically:

  # ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2
  add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291)
  function                                     old     new   delta
  raw                                            -   32832  +32832
  [...]

Since this is only relevant during program creation path, which can be
considered slow-path anyway, lets allocate that dynamically and be not
implicitly dependent on verifier mutex. Move bpf_prog_calc_digest() at
the beginning of replace_map_fd_with_map_ptr() and also error handling
stays straight forward.
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aafe6ae9

Merge branch 'gtp-fixes' · 40e972ab

由 David S. Miller 提交于 12月 17, 2016

Pablo Neira Ayuso says:

====================
GTP tunneling fixes for net

The following patchset contains two GTP tunneling fixes for your net
tree, they are:

1) Offset to IPv4 header in gtp_check_src_ms_ipv4() is incorrect, thus
   this function always succeeds and therefore this defeats this sanity
   check. This allows packets that have no PDP to go though, patch from
   Lionel Gauthier.

2) According to Note 0 of Figure 2 in Section 6 of 3GPP TS 29.060 v13.5.0
   Release 13, always set GTPv1 reserved bit to zero. This may cause
   interoperability problems, patch from Harald Welte.

Please, apply, thanks a lot!
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

40e972ab

gtp: Fix initialization of Flags octet in GTPv1 header · d928be81

由 Harald Welte 提交于 12月 15, 2016

When generating a GTPv1 header in gtp1_push_header(), initialize the
'reserved' bit to zero.  All 3GPP specifications for GTPv1 from Release
99 through Release 13 agree that a transmitter shall set this bit to
zero, see e.g. Note 0 of Figure 2 in Section 6 of 3GPP TS 29.060 v13.5.0
Release 13, available from
http://www.etsi.org/deliver/etsi_ts/129000_129099/129060/13.05.00_60/ts_129060v130500p.pdfSigned-off-by: NHarald Welte <laforge@gnumonks.org>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d928be81

gtp: gtp_check_src_ms_ipv4() always return success · 88edf103

由 Lionel Gauthier 提交于 12月 15, 2016

gtp_check_src_ms_ipv4() did not find the PDP context matching with the
UE IP address because the memory location is not right, but the result
is inverted by the Boolean "not" operator.  So whatever is the PDP
context, any call to this function is successful.
Signed-off-by: NLionel Gauthier <Lionel.Gauthier@eurecom.fr>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

88edf103

net/x25: use designated initializers · e999cb43

由 Kees Cook 提交于 12月 16, 2016

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e999cb43

isdn: use designated initializers · ebf12f13

由 Kees Cook 提交于 12月 16, 2016

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ebf12f13

bna: use designated initializers · 9751362a

由 Kees Cook 提交于 12月 16, 2016

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9751362a

WAN: use designated initializers · aabd7ad9

由 Kees Cook 提交于 12月 16, 2016

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aabd7ad9

net: use designated initializers · 9d1c0ca5

由 Kees Cook 提交于 12月 16, 2016

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d1c0ca5

ATM: use designated initializers · 99a5e178

由 Kees Cook 提交于 12月 16, 2016

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99a5e178

isdn/gigaset: use designated initializers · 47941950

由 Kees Cook 提交于 12月 16, 2016

Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47941950

Merge branch 'virtio_net-XDP' · cd333e37

由 David S. Miller 提交于 12月 17, 2016

John Fastabend says:

====================
XDP for virtio_net

This implements virtio_net for the mergeable buffers and big_packet
modes. I tested this with vhost_net running on qemu and did not see
any issues. For testing num_buf > 1 I added a hack to vhost driver
to only but 100 bytes per buffer.

There are some restrictions for XDP to be enabled and work well
(see patch 3) for more details.

  1. GUEST_TSO{4|6} must be off
  2. MTU must be less than PAGE_SIZE
  3. queues must be available to dedicate to XDP
  4. num_bufs received in mergeable buffers must be 1
  5. big_packet mode must have all data on single page

To test this I used pktgen in the hypervisor and ran the XDP sample
programs xdp1 and xdp2 from ./samples/bpf in the host. The default
mode that is used with these patches with Linux guest and QEMU/Linux
hypervisor is the mergeable buffers mode. I tested this mode for 2+
days running xdp2 without issues. Additionally I did a series of
driver unload/load tests to check the allocate/release paths.

To test the big_packets path I applied the following simple patch against
the virtio driver forcing big_packets mode,

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2242,7 +2242,7 @@ static int virtnet_probe(struct virtio_device *vdev)
                vi->big_packets = true;

        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
-               vi->mergeable_rx_bufs = true;
+               vi->mergeable_rx_bufs = false;

        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
            virtio_has_feature(vdev, VIRTIO_F_VERSION_1))

I then repeated the tests with xdp1 and xdp2. After letting them run
for a few hours I called it good enough.

Testing the unexpected case where virtio receives a packet across
multiple buffers required patching the hypervisor vhost driver to
convince it to send these unexpected packets. Then I used ping with
the -s option to trigger the case with multiple buffers. This mode
is not expected to be used but as MST pointed out per spec it is
not strictly speaking illegal to generate multi-buffer packets so we
need someway to handle these. The following patch can be used to
generate multiple buffers,

--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1777,7 +1777,8 @@ static int translate_desc(struct vhost_virtqueue
*vq, u64

                _iov = iov + ret;
                size = node->size - addr + node->start;
-               _iov->iov_len = min((u64)len - s, size);
+               printk("%s: build 100 length headers!\n", __func__);
+               _iov->iov_len = min((u64)len - s, (u64)100);//size);
                _iov->iov_base = (void __user *)(unsigned long)
                        (node->userspace_addr + addr - node->start);
                s += size;

The qemu command I most frequently used for testing (although I did test
various other combinations of devices) is the following,

 ./x86_64-softmmu/qemu-system-x86_64              \
    -hda /var/lib/libvirt/images/Fedora-test0.img \
    -m 4096  -enable-kvm -smp 2                   \
    -netdev tap,id=hn0,queues=4,vhost=on          \
    -device virtio-net-pci,netdev=hn0,mq=on,vectors=9,guest_tso4=off,guest_tso6=off \
    -serial stdio

The options 'guest_tso4=off,guest_tso6=off' are required because we
do not support LRO with XDP at the moment.

Please review any comments/feedback welcome as always.
====================
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd333e37

virtio_net: xdp, add slowpath case for non contiguous buffers · 72979a6c

由 John Fastabend 提交于 12月 15, 2016

virtio_net XDP support expects receive buffers to be contiguous.
If this is not the case we enable a slowpath to allow connectivity
to continue but at a significan performance overhead associated with
linearizing data. To make it painfully aware to users that XDP is
running in a degraded mode we throw an xdp buffer error.

To linearize packets we allocate a page and copy the segments of
the data, including the header, into it. After this the page can be
handled by XDP code flow as normal.

Then depending on the return code the page is either freed or sent
to the XDP xmit path. There is no attempt to optimize this path.

This case is being handled simple as a precaution in case some
unknown backend were to generate packets in this form. To test this
I had to hack qemu and force it to generate these packets. I do not
expect this case to be generated by "real" backends.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72979a6c

virtio_net: add XDP_TX support · 56434a01

由 John Fastabend 提交于 12月 15, 2016

This adds support for the XDP_TX action to virtio_net. When an XDP
program is run and returns the XDP_TX action the virtio_net XDP
implementation will transmit the packet on a TX queue that aligns
with the current CPU that the XDP packet was processed on.

Before sending the packet the header is zeroed.  Also XDP is expected
to handle checksum correctly so no checksum offload  support is
provided.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

56434a01

virtio_net: add dedicated XDP transmit queues · 672aafd5

由 John Fastabend 提交于 12月 15, 2016

XDP requires using isolated transmit queues to avoid interference
with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
adds a XDP queue per cpu when a XDP program is loaded and does not
expose the queues to the OS via the normal API call to
netif_set_real_num_tx_queues(). This way the stack will never push
an skb to these queues.

However virtio/vhost/qemu implementation only allows for creating
TX/RX queue pairs at this time so creating only TX queues was not
possible. And because the associated RX queues are being created I
went ahead and exposed these to the stack and let the backend use
them. This creates more RX queues visible to the network stack than
TX queues which is worth mentioning but does not cause any issues as
far as I can tell.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

672aafd5

virtio_net: Add XDP support · f600b690

由 John Fastabend 提交于 12月 15, 2016

This adds XDP support to virtio_net. Some requirements must be
met for XDP to be enabled depending on the mode. First it will
only be supported with LRO disabled so that data is not pushed
across multiple buffers. Second the MTU must be less than a page
size to avoid having to handle XDP across multiple pages.

If mergeable receive is enabled this patch only supports the case
where header and data are in the same buf which we can check when
a packet is received by looking at num_buf. If the num_buf is
greater than 1 and a XDP program is loaded the packet is dropped
and a warning is thrown. When any_header_sg is set this does not
happen and both header and data is put in a single buffer as expected
so we check this when XDP programs are loaded.  Subsequent patches
will process the packet in a degraded mode to ensure connectivity
and correctness is not lost even if backend pushes packets into
multiple buffers.

If big packets mode is enabled and MTU/LRO conditions above are
met then XDP is allowed.

This patch was tested with qemu with vhost=on and vhost=off where
mergeable and big_packet modes were forced via hard coding feature
negotiation. Multiple buffers per packet was forced via a small
test patch to vhost.c in the vhost=on qemu mode.
Suggested-by: NShrijeet Mukherjee <shrijeet@gmail.com>
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f600b690

net: xdp: add invalid buffer warning · f23bc46c

由 John Fastabend 提交于 12月 15, 2016

This adds a warning for drivers to use when encountering an invalid
buffer for XDP. For normal cases this should not happen but to catch
this in virtual/qemu setups that I may not have expected from the
emulation layer having a standard warning is useful.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f23bc46c

sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null · 08abb795

由 Xin Long 提交于 12月 15, 2016

Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
when it failed to find a transport by sctp_addrs_lookup_transport.

This patch is to fix it by moving up rcu_read_unlock right before checking
transport and also to remove the out path.

Fixes: 1cceda78 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

08abb795

sctp: sctp_epaddr_lookup_transport should be protected by rcu_read_lock · 5cb2cd68

由 Xin Long 提交于 12月 15, 2016

Since commit 7fda702f ("sctp: use new rhlist interface on sctp transport
rhashtable"), sctp has changed to use rhlist_lookup to look up transport, but
rhlist_lookup doesn't call rcu_read_lock inside, unlike rhashtable_lookup_fast.

It is called in sctp_epaddr_lookup_transport and sctp_addrs_lookup_transport.
sctp_addrs_lookup_transport is always in the protection of rcu_read_lock(),
as __sctp_lookup_association is called in rx path or sctp_lookup_association
which are in the protection of rcu_read_lock() already.

But sctp_epaddr_lookup_transport is called by sctp_endpoint_lookup_assoc, it
doesn't call rcu_read_lock, which may cause "suspicious rcu_dereference_check
usage' in __rhashtable_lookup.

This patch is to fix it by adding rcu_read_lock in sctp_endpoint_lookup_assoc
before calling sctp_epaddr_lookup_transport.

Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5cb2cd68

Merge branch 'dpaa_eth-fixes' · 10a3ecf4

由 David S. Miller 提交于 12月 17, 2016

Madalin Bucur says:

====================
dpaa_eth: a couple of fixes

This patch set introduces big endian accessors in the dpaa_eth driver
making sure accesses to the QBMan HW are correct on little endian
platforms. Removing a redundant Kconfig dependency on FSL_SOC.
Adding myself as maintainer of the dpaa_eth driver.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

10a3ecf4

MAINTAINERS: net: add entry for Freescale QorIQ DPAA Ethernet driver · 63f4b4b0

由 Madalin Bucur 提交于 12月 15, 2016

Add record for Freescale QORIQ DPAA Ethernet driver adding myself as
maintainer.
Signed-off-by: NMadalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63f4b4b0

dpaa_eth: remove redundant dependency on FSL_SOC · 708f0f4f

由 Madalin Bucur 提交于 12月 15, 2016

Signed-off-by: NMadalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

708f0f4f

dpaa_eth: use big endian accessors · 7d6f8dc0

由 Claudiu Manoil 提交于 12月 15, 2016

Ensure correct access to the big endian QMan HW through proper
accessors.
Signed-off-by: NClaudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: NMadalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d6f8dc0

irda: irnet: add member name to the miscdevice declaration · 616f6b40

由 LABBE Corentin 提交于 12月 15, 2016

Since the struct miscdevice have many members, it is dangerous to init
it without members name relying only on member order.

This patch add member name to the init declaration.
Signed-off-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

616f6b40

irda: irnet: Remove unused IRNET_MAJOR define · 33de4d1b

由 LABBE Corentin 提交于 12月 15, 2016

The IRNET_MAJOR define is not used, so this patch remove it.
Signed-off-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

33de4d1b

irnet: ppp: move IRNET_MINOR to include/linux/miscdevice.h · 24c946cc

由 LABBE Corentin 提交于 12月 15, 2016

This patch move the define for IRNET_MINOR to include/linux/miscdevice.h
It is better that all minor number definitions are in the same place.
Signed-off-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

24c946cc

irda: irnet: Move linux/miscdevice.h include · e2928235

由 LABBE Corentin 提交于 12月 15, 2016

The only use of miscdevice is irda_ppp so no need to include
linux/miscdevice.h for all irda files.
This patch move the linux/miscdevice.h include to irnet_ppp.h
Signed-off-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2928235

irda: irproc.c: Remove unneeded linux/miscdevice.h include · 078497a4

由 LABBE Corentin 提交于 12月 15, 2016

irproc.c does not use any miscdevice so this patch remove this
unnecessary inclusion.
Signed-off-by: NCorentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

078497a4

bpf: cgroup: annotate pointers in struct cgroup_bpf with __rcu · dcdc43d6

由 Daniel Mack 提交于 12月 15, 2016

The member 'effective' in 'struct cgroup_bpf' is protected by RCU.
Annotate it accordingly to squelch a sparse warning.
Signed-off-by: NDaniel Mack <daniel@zonque.org>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcdc43d6

Merge branch 'inet_csk_get_port-and-soreusport-fixes' · 28055c97

由 David S. Miller 提交于 12月 17, 2016

Tom Herbert says:

====================
inet: Fixes for inet_csk_get_port and soreusport

This patch set fixes a couple of issues I noticed while debugging our
softlockup issue in inet_csk_get_port.

- Don't allow jump into port scan in inet_csk_get_port if function
  was called with non-zero port number (looking up explicit port
  number).
- When inet_csk_get_port is called with zero port number (ie. perform
  scan) an reuseport is set on the socket, don't match sockets that
  also have reuseport set. The intent from the user should be
  to get a new port number and then explictly bind other
  sockets to that number using soreuseport.

Tested:

Ran first patch on production workload with no ill effect.

For second patch, ran a little listener application and first
demonstrated that unbound sockets with soreuseport can indeed
be bound to unrelated soreuseport sockets.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

28055c97

inet: Fix get port to handle zero port number with soreuseport set · 0643ee4f

由 Tom Herbert 提交于 12月 14, 2016

A user may call listen with binding an explicit port with the intent
that the kernel will assign an available port to the socket. In this
case inet_csk_get_port does a port scan. For such sockets, the user may
also set soreuseport with the intent a creating more sockets for the
port that is selected. The problem is that the initial socket being
opened could inadvertently choose an existing and unreleated port
number that was already created with soreuseport.

This patch adds a boolean parameter to inet_bind_conflict that indicates
rather soreuseport is allowed for the check (in addition to
sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port
the argument is set to true if an explicit port is being looked up (snum
argument is nonzero), and is false if port scan is done.
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0643ee4f

inet: Don't go into port scan when looking for specific bind port · 9af7e923

由 Tom Herbert 提交于 12月 14, 2016

inet_csk_get_port is called with port number (snum argument) that may be
zero or nonzero. If it is zero, then the intent is to find an available
ephemeral port number to bind to. If snum is non-zero then the caller
is asking to allocate a specific port number. In the latter case we
never want to perform the scan in ephemeral port range. It is
conceivable that this can happen if the "goto again" in "tb_found:"
is done. This patch adds a check that snum is zero before doing
the "goto again".
Signed-off-by: NTom Herbert <tom@herbertland.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9af7e923

17 12月, 2016 6 次提交

bpf, test_verifier: fix a test case error result on unprivileged · 0eb6984f

由 Daniel Borkmann 提交于 12月 15, 2016

Running ./test_verifier as unprivileged lets 1 out of 98 tests fail:

  [...]
  #71 unpriv: check that printk is disallowed FAIL
  Unexpected error message!
  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r1 = r10
  2: (07) r1 += -8
  3: (b7) r2 = 8
  4: (bf) r3 = r1
  5: (85) call bpf_trace_printk#6
  unknown func bpf_trace_printk#6
  [...]

The test case is correct, just that the error outcome changed with
ebb676da ("bpf: Print function name in addition to function id").
Same as with e00c7b21 ("bpf: fix multiple issues in selftest suite
and samples") issue 2), so just fix up the function name.

Fixes: ebb676da ("bpf: Print function name in addition to function id")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0eb6984f

bpf: fix regression on verifier pruning wrt map lookups · a08dd0da

由 Daniel Borkmann 提交于 12月 15, 2016

Commit 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
registers") introduced a regression where existing programs stopped
loading due to reaching the verifier's maximum complexity limit,
whereas prior to this commit they were loading just fine; the affected
program has roughly 2k instructions.

What was found is that state pruning couldn't be performed effectively
anymore due to mismatches of the verifier's register state, in particular
in the id tracking. It doesn't mean that 57a09bf0 is incorrect per
se, but rather that verifier needs to perform a lot more work for the
same program with regards to involved map lookups.

Since commit 57a09bf0 is only about tracking registers with type
PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
until they are promoted through pattern matching with a NULL check to
either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
id becomes irrelevant for the transitioned types.

For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
transferred further into other types that don't make use of it. Among
others, one example is where UNKNOWN_VALUE is set on function call
return with RET_INTEGER return type.

states_equal() will then fall through the memcmp() on register state;
note that the second memcmp() uses offsetofend(), so the id is part of
that since d2a4dd37 ("bpf: fix state equivalence"). But the bisect
pointed already to 57a09bf0, where we really reach beyond complexity
limit. What I found was that states_equal() often failed in this
case due to id mismatches in spilled regs with registers in type
PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform
a memcmp() on their reg state and don't have any other optimizations
in place, therefore also id was relevant in this case for making a
pruning decision.

We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
For the affected program, it resulted in a ~17 fold reduction of
complexity and let the program load fine again. Selftest suite also
runs fine. The only other place where env->id_gen is used currently is
through direct packet access, but for these cases id is long living, thus
a different scenario.

Also, the current logic in mark_map_regs() is not fully correct when
marking NULL branch with UNKNOWN_VALUE. We need to cache the destination
reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE,
it's id is reset and any subsequent registers that hold the original id
and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE
anymore, since mark_map_reg() reuses the uncached regs[regno].id that
was just overridden. Note, we don't need to cache it outside of
mark_map_regs(), since it's called once on this_branch and the other
time on other_branch, which are both two independent verifier states.
A test case for this is added here, too.

Fixes: 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NThomas Graf <tgraf@suug.ch>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a08dd0da

net: vrf: Drop conntrack data after pass through VRF device on Tx · eb63ecc1

由 David Ahern 提交于 12月 14, 2016

Locally originated traffic in a VRF fails in the presence of a POSTROUTING
rule. For example,

    $ iptables -t nat -A POSTROUTING -s 11.1.1.0/24  -j MASQUERADE
    $ ping -I red -c1 11.1.1.3
    ping: Warning: source address might be selected on device other than red.
    PING 11.1.1.3 (11.1.1.3) from 11.1.1.2 red: 56(84) bytes of data.
    ping: sendmsg: Operation not permitted

Worse, the above causes random corruption resulting in a panic in random
places (I have not seen a consistent backtrace).

Call nf_reset to drop the conntrack info following the pass through the
VRF device.  The nf_reset is needed on Tx but not Rx because of the order
in which NF_HOOK's are hit: on Rx the VRF device is after the real ingress
device and on Tx it is is before the real egress device. Connection
tracking should be tied to the real egress device and not the VRF device.

Fixes: 8f58336d ("net: Add ethernet header for pass through VRF device")
Fixes: 35402e31 ("net: Add IPv6 support to VRF device")
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

eb63ecc1

net: vrf: Fix NAT within a VRF · a0f37efa

由 David Ahern 提交于 12月 14, 2016

Connection tracking with VRF is broken because the pass through the VRF
device drops the connection tracking info. Removing the call to nf_reset
allows DNAT and MASQUERADE to work across interfaces within a VRF.

Fixes: 73e20b76 ("net: vrf: Add support for PREROUTING rules on vrf device")
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0f37efa

Merge branch 'cls_flower-mask' · 8a9f5fdf

由 David S. Miller 提交于 12月 17, 2016

Paul Blakey says:

====================
net/sched: cls_flower: Fix mask handling

The series fix how the mask is being handled.
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8a9f5fdf

net/sched: cls_flower: Use masked key when calling HW offloads · f93bd17b

由 Paul Blakey 提交于 12月 14, 2016

Zero bits on the mask signify a "don't care" on the corresponding bits
in key. Some HWs require those bits on the key to be zero. Since these
bits are masked anyway, it's okay to provide the masked key to all
drivers.

Fixes: 5b33f488 ('net/flower: Introduce hardware offload support')
Signed-off-by: NPaul Blakey <paulb@mellanox.com>
Reviewed-by: NRoi Dayan <roid@mellanox.com>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f93bd17b