Commits · f90ce56187e78f46560a6a31e39ee3209b1a9427 · xiphi1978 / linux

24 Dec, 2015 2 commits

cxgb4: get naming correct for iscsi queues · f90ce561

Authored by Hariprasad Shenai on Dec 23, 2015

All the upper level protocols like rdma, iscsi have their own offload rx
queues, so instead of using the generic naming convention be specific
while naming them. This improves code readability.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Warn if device doesn't have enough PCI bandwidth · 547fd272

Authored by Hariprasad Shenai on Dec 23, 2015

Check if the device gets enough bandwidth from the entire PCI chain to
satisfy its capabilities. This patch determines the PCIe device's
bandwidth capabilities by reading its PCIe Link Capabilities registers
and then calls the pcie_get_minimum_link function to ensure that the
adapter is hooked into a slot which is capable of providing the
necessary bandwidth.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Dec, 2015 10 commits

Merge branch 'bindtodevice_tw_rst' · 83a76006

Authored by David S. Miller on Dec 22, 2015

Florian Westphal says:

====================
tcp: honour SO_BINDTODEVICE for TW_RST case too

This is V2, this time as a small series since I followed Eric's advice
to split this into smaller chunks, I hope this makes it easier to
review.

First patch adds inet_sk_transparent helper.
Second patch contains an if/else swap that I split from the
original TW_RST v1 one.
Third patch is the actual change without the superfluous sock_net change.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: honour SO_BINDTODEVICE for TW_RST case too · 271c3b9b

Authored by Florian Westphal on Dec 21, 2015

Hannes points out that when we generate tcp reset for timewait sockets we
pretend we found no socket and pass NULL sk to tcp_vX_send_reset().

Make it cope with inet tw sockets and then provide tw sk.

This makes RSTs appear on correct interface when SO_BINDTODEVICE is used.

Packetdrill test case:
// want default route to be used, we rely on BINDTODEVICE
`ip route del 192.0.2.0/24 via 192.168.0.2 dev tun0`

0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
// test case still works due to BINDTODEVICE
0.001 setsockopt(3, SOL_SOCKET, SO_BINDTODEVICE, "tun0", 4) = 0
0.100...0.200 connect(3, ..., ...) = 0

0.100 > S 0:0(0) <mss 1460,sackOK,nop,nop>
0.200 < S. 0:0(0) ack 1 win 32792 <mss 1460,sackOK,nop,nop>
0.200 > . 1:1(0) ack 1

0.210 close(3) = 0

0.210 > F. 1:1(0) ack 1 win 29200
0.300 < . 1:1(0) ack 2 win 46

// more data while in FIN_WAIT2, expect RST
1.300 < P. 1:1001(1000) ack 1 win 46

// fails without this change -- default route is used
1.301 > R 1:1(0) win 0
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: send_reset: test for non-NULL sk first · e46787f0

Authored by Florian Westphal on Dec 21, 2015

tcp_md5_do_lookup requires a full socket, so once we extend
_send_reset() to also accept timewait socket we would have to change

if (!sk && hash_location)

to something like

if ((!sk || !sk_fullsock(sk)) && hash_location) {
  ...
} else {
  (sk && sk_fullsock(sk)) tcp_md5_do_lookup()
}

Switch the two branches: check if we have a socket first, then
fall back to a listener lookup if we saw a md5 option (hash_location).
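
The reordered check can be sketched in plain C. This is only an illustration of the branch order described above: `struct sock_sketch`, `is_fullsock()` and `pick_key_source()` are hypothetical stand-ins, not the kernel's types or functions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel socket; not the real struct sock. */
struct sock_sketch {
    bool full;      /* false for timewait or request sockets */
    bool has_key;   /* pretend an MD5 key is configured for the peer */
};

enum key_source { KEY_NONE, KEY_FROM_SOCKET, KEY_FROM_LISTENER };

/* Plays the role of sk_fullsock(): only full sockets allow the MD5 lookup. */
static bool is_fullsock(const struct sock_sketch *sk)
{
    return sk->full;
}

/*
 * Branch order after the swap: test for a usable socket first, and only
 * fall back to a listener lookup when the segment carried an MD5 option.
 */
static enum key_source pick_key_source(const struct sock_sketch *sk,
                                       const unsigned char *hash_location)
{
    if (sk && is_fullsock(sk))
        return sk->has_key ? KEY_FROM_SOCKET : KEY_NONE;
    if (hash_location)
        return KEY_FROM_LISTENER;
    return KEY_NONE;
}
```

With this ordering a timewait socket never reaches the socket-based lookup, which is the property the later TW_RST patch relies on.
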
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: add inet_sk_transparent() helper · b1f0a0e9

Authored by Florian Westphal on Dec 21, 2015

Avoids cluttering tcp_v4_send_reset when followup patch extends
it to deal with timewait sockets.
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Use devm_kzalloc to allocate mlxsw_hwmon structure · f4cee3af

Authored by Jiri Pirko on Dec 22, 2015

KASan reported a use-after-free for the hwmon structure. Fix this by
using devm_kzalloc and letting the core take care of freeing the memory
during device detach.
Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 89309da3 ("mlxsw: core: Implement temperature hwmon interface")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tcp: deal with listen sockets properly in tcp_abort. · 2010b93e

Authored by Lorenzo Colitti on Dec 22, 2015

When closing a listen socket, tcp_abort currently calls
tcp_done without clearing the request queue. If the socket has a
child socket that is established but not yet accepted, the child
socket is then left without a parent, causing a leak.

Fix this by setting the socket state to TCP_CLOSE and calling
inet_csk_listen_stop with the socket lock held, like tcp_close
does.

Tested using net_test. With this patch, calling SOCK_DESTROY on a
listen socket that has an established but not yet accepted child
socket results in the parent and the child being closed, such
that they no longer appear in sock_diag dumps.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: core: Allow to reset temperature history via hwmon interface · e7bc73cb

Authored by Jiri Pirko on Dec 21, 2015

Add another sysfs hwmon attribute that exposes the ability to reset
the temperature sensors' history.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: don't pretend to use cpu notifiers · f2830d09

Authored by Sebastian Andrzej Siewior on Dec 19, 2015

It looks like an attempt to use CPU notifier here which was never
completed. Nobody tried to wire it up completely since 2k9. So I unwind
this code and get rid of everything not required. Oh look! 19 lines were
removed while code still does the same thing.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-sysfs: use to_net_dev in net_namespace() · 5c29482d

Authored by Geliang Tang on Dec 22, 2015

Use to_net_dev() instead of open-coding it.
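
For readers unfamiliar with the idiom, the helper wraps the usual "member pointer back to enclosing structure" step. A minimal userspace sketch follows; the `_sketch` types and macro are hypothetical miniatures, and the real `struct device` and `struct net_device` are far larger.

```c
#include <stddef.h>

/* Hypothetical miniature types, for illustration only. */
struct device_sketch { int id; };
struct net_device_sketch {
    char name[16];
    struct device_sketch dev;   /* embedded member */
};

/* Userspace version of the container_of idea: step back from a member
 * pointer to the structure that embeds it. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Named helper, so call sites do not repeat the pointer arithmetic. */
static struct net_device_sketch *to_net_dev_sketch(struct device_sketch *d)
{
    return container_of_sketch(d, struct net_device_sketch, dev);
}
```

Open-coding the macro at every call site works equally well; the named helper mainly keeps the member name in one place.
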
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · d317aa58

Authored by David S. Miller on Dec 22, 2015

Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2015-12-22

This series contains updates to fm10k only.

Bruce cleans up the initialization of fm10k_workqueue at the global level,
which fixes a checkpatch.pl error.  Made several other cleanups of the
driver, like making structures that do not change constant, remove unused
code, cleanup code comments and use boolean states true/false instead of
an integer since a bool is all that is needed.

Jacob fixed the TLV format for little endian structures which are 4 byte
aligned copy, so add an additional __aligned(4) and __packed to ensure
that these structures are actually 4 byte aligned and packed correctly.
Updated the driver to use ether_addr_equal() instead of memcmp() to
compare MAC addresses.

Alex Duyck cleans up the exception handling so all of the paths result in
a similar state if we fail.  Specifically the driver will now unload the
mailbox interrupt, free the queue vectors and MSI-X, and then detach the
interface.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Dec, 2015 10 commits

fm10k: IS_ENABLED() is not appropriate for boolean kconfig option · 0d722ec8

Authored by Bruce Allan on Dec 08, 2015

Tri-states need 'if IS_ENABLED()', booleans should use 'ifdef'.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: cleanup mailbox code comments etc · f632fed3

Authored by Bruce Allan on Dec 08, 2015

Cleanup a number of issues with function header comments, lower-case
acronyms (i.e. FIFO, TLV), duplicate comments and a stubbed-out header
comment for fm10k_sm_mbx_init.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: use true/false for boolean get_host_state · f355bb51

Authored by Bruce Allan on Dec 08, 2015

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: remove unused struct element · e6f244d4

Authored by Bruce Allan on Dec 08, 2015

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: constify fm10k_mac_ops, fm10k_iov_ops and fm10k_info structures · f329ad73

Authored by Bruce Allan on Dec 08, 2015

These structures never change so declare them as const.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: address operator not needed when declaring function pointers · 4e458cfb

Authored by Bruce Allan on Dec 08, 2015

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: use ether_addr_equal instead of memcmp · 6186ddf0

Authored by Jacob Keller on Nov 16, 2015

When comparing MAC addresses, use ether_addr_equal instead of memcmp to
ETH_ALEN length. Found and replaced using the following sed:

 sed -e 's/memcmp\x28\(.*\), ETH_ALEN\x29/!ether_addr_equal\x28\1\x29/'
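
The negation in the sed replacement follows from the return conventions: memcmp returns 0 when the buffers match, while an equality helper returns true. A userspace sketch, where `mac_equal()` is a hypothetical stand-in for the kernel helper and `MAC_LEN` is assumed to have the same value as ETH_ALEN:

```c
#include <stdbool.h>
#include <string.h>

#define MAC_LEN 6  /* assumed to match the kernel's ETH_ALEN */

/* Hypothetical userspace stand-in for ether_addr_equal(): true when equal. */
static bool mac_equal(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, MAC_LEN) == 0;
}
```

So a condition written as `if (memcmp(a, b, MAC_LEN))`, meaning "the addresses differ", becomes `if (!mac_equal(a, b))`.
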
Reported-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: Cleanup exception handling for changing queues · 09f8a82b

Authored by Alexander Duyck on Nov 10, 2015

This patch is meant to cleanup the exception handling for the paths where
we reset the interrupts and then reconfigure them.  In all of these paths
we had very different levels of exception handling.  I have updated the
driver so that all of the paths should result in a similar state if we
fail.

Specifically the driver will now unload the mailbox interrupt, free the
queue vectors and MSI-X, and then detach the interface.

In addition for any of the PCIe related resets I have added a check with
the hw_ready function to just make sure the registers are in a readable
state prior to reopening the interface.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: correctly pack TLV structures and explain reasoning · 8c2a029c

Authored by Jacob Keller on Nov 09, 2015

The TLV format for little endian structures is actually 4 byte aligned
copy. To this end, we need to add an additional __aligned(4) marker
along with __packed to ensure that these structures are actually 4 byte
aligned and packed correctly. Use of just __packed will not work as this
will result in 1byte alignment which is incorrect. Add a comment
explaining the reasoning behind why these structures need the special
treatment.
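
The difference between the two markers can be checked with a small GCC/Clang sketch. The structure layout and field names below are hypothetical, not fm10k's; the attributes are spelled out in the compiler's own syntax rather than the kernel's `__packed` and `__aligned()` shorthands.

```c
#include <stdint.h>

/* Hypothetical layout; field names are illustrative only. */
struct tlv_packed_only {
    uint8_t  tag;
    uint32_t value;
    uint16_t id;
} __attribute__((packed));          /* no padding, alignment drops to 1 */

struct tlv_packed_aligned {
    uint8_t  tag;
    uint32_t value;
    uint16_t id;
} __attribute__((packed, aligned(4)));  /* no inner padding, 4-byte aligned */
```

With these definitions the packed-only structure has alignment 1 and size 7, while the packed and aligned one has alignment 4 and its size is rounded up to 8.
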
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

fm10k: don't initialize fm10k_workqueue at global level · 07146e2e

Authored by Bruce Allan on Nov 03, 2015

Cleans up checkpatch GLOBAL_INITIALIZERS error
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Krishneil Singh <Krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

21 Dec, 2015 1 commit

ibmveth: consolidate kmalloc of array, memset 0 to kcalloc · 076ef440

Authored by Nicholas Mc Guire on Dec 20, 2015

This is an API consolidation only. The use of kmalloc + memset to 0
is equivalent to kcalloc in this case as it is allocating an array
of elements.
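
A userspace analogy of the same consolidation, using malloc, memset and calloc in place of the kernel allocators. The function names are hypothetical and the sketch says nothing about ibmveth itself.

```c
#include <stdlib.h>
#include <string.h>

/* Two-step form: allocate the array, then zero it. */
static int *alloc_zeroed_two_step(size_t n)
{
    int *p = malloc(n * sizeof(*p));

    if (p)
        memset(p, 0, n * sizeof(*p));
    return p;
}

/* Single-call form: takes element count and element size, returns zeroed memory. */
static int *alloc_zeroed_one_call(size_t n)
{
    return calloc(n, sizeof(int));
}
```

Both functions hand back an array of n zeroed ints; the second simply states the "array of elements" intent in one call.
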
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Dec, 2015 16 commits

netcp: fix regression in receive processing · 958d104e

Authored by Arnd Bergmann on Dec 18, 2015

A cleanup patch I did was unfortunately wrong and introduced
multiple serious bugs in the netcp rx processing, as indicated
by these correct gcc warnings:

drivers/net/ethernet/ti/netcp_core.c:776:14: warning: 'buf_ptr' may be used uninitialized in this function [-Wuninitialized]
drivers/net/ethernet/ti/netcp_core.c:687:14: warning: 'ptr' may be used uninitialized in this function [-Wuninitialized]

I have checked the patch once more and found that a call to
get_pkt_info() accidentally got removed in netcp_free_rx_desc_chain,
and netcp_process_one_rx_packet no longer retrieved the correct
buffer length. This patch should fix all the known problems,
but I did not test on real hardware.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 89907779 ("netcp: try to reduce type confusion in descriptors")
Signed-off-by: David S. Miller <davem@davemloft.net>

asix: silence log message from oversize packet · b70183db

Authored by stephen hemminger on Dec 17, 2015

Since it is possible for an external system to send oversize packets
at anytime, it is best for driver not to print a message and spam
the log (potential external DoS).

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=109471
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: diag: add support for request sockets to tcp_abort() · 07f6f4a3

Authored by Eric Dumazet on Dec 17, 2015

Adding support for SYN_RECV request sockets to tcp_abort()
is quite easy after our tcp listener rewrite.

Note that we also need to better handle listeners, or we might
leak not yet accepted children, because of a missing
inet_csk_listen_stop() call.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-misc-updates' · d73e5f41

Authored by David S. Miller on Dec 18, 2015

Daniel Borkmann says:

====================
Misc BPF updates

This series contains a couple of misc updates to the BPF code, besides
others a new helper bpf_skb_load_bytes(), moving clearing of A/X to the
classic converter, etc. Please see individual patches for details.

Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, test: add couple of test cases · 9dd2af83

Authored by Daniel Borkmann on Dec 17, 2015

Add couple of test cases for interpreter but also JITs, f.e. to test that
when imm32 moves are being done, upper 32bits of the regs are being zero
extended.

Without JIT:

  [...]
  [ 1114.129301] test_bpf: #43 MOV REG64 jited:0 128 PASS
  [ 1114.130626] test_bpf: #44 MOV REG32 jited:0 139 PASS
  [ 1114.132055] test_bpf: #45 LD IMM64 jited:0 124 PASS
  [...]

With JIT (generated code can as usual be nicely verified with the help of
bpf_jit_disasm tool):

  [...]
  [ 1062.726782] test_bpf: #43 MOV REG64 jited:1 6 PASS
  [ 1062.726890] test_bpf: #44 MOV REG32 jited:1 6 PASS
  [ 1062.726993] test_bpf: #45 LD IMM64 jited:1 6 PASS
  [...]
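
The property these tests check, that a 32-bit immediate move leaves the upper half of the 64-bit register zeroed, can be modeled in a few lines. `mov32_imm()` is a hypothetical model of the instruction's effect, not code from test_bpf.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit immediate move into a 64-bit register. */
static uint64_t mov32_imm(uint64_t old_dst, uint32_t imm)
{
    (void)old_dst;          /* previous register contents must not survive */
    return (uint64_t)imm;   /* upper 32 bits become zero */
}
```

A JIT that only overwrote the low half would leave stale upper bits behind, which is the kind of bug such a test case is meant to catch.
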
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, x86: detect/optimize loading 0 immediates · 606c88a8

Authored by Daniel Borkmann on Dec 17, 2015

When structs or variables need to be initialized/'memset' to 0 in an
eBPF C program, the x86 BPF JIT converts this to use immediates. We can
however save a couple of bytes (f.e. even up to 7 bytes on a single emission
of BPF_LD | BPF_IMM | BPF_DW) in the image by detecting such a case and using
xor on the dst register instead.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix misleading comment in bpf_convert_filter · 23bf8807

Authored by Daniel Borkmann on Dec 17, 2015

Comment says "User BPF's register A is mapped to our BPF register 6",
which is actually wrong as the mapping is on register 0. This can
already be inferred from the code itself. So just remove it before
someone makes assumptions based on that. Only code tells truth. ;)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: move clearing of A/X into classic to eBPF migration prologue · 8b614aeb

Authored by Daniel Borkmann on Dec 17, 2015

Back in the days where eBPF (or back then "internal BPF" ;->) was not
exposed to user space, and only the classic BPF programs internally
translated into eBPF programs, we missed the fact that for classic BPF
A and X needed to be cleared. It was fixed back then via 83d5b7ef
("net: filter: initialize A and X registers"), and thus classic BPF
specifics were added to the eBPF interpreter core to work around it.

This added some confusion for JIT developers later on that take the
eBPF interpreter code as an example for deriving their JIT. F.e. in
f75298f5 ("s390/bpf: clear correct BPF accumulator register"), at
least X could leak stack memory. Furthermore, since this is only needed
for classic BPF translations and not for eBPF (verifier takes care
that read access to regs cannot be done uninitialized), more complexity
is added to JITs as they need to determine whether they deal with
migrations or native eBPF where they can just omit clearing A/X in
their prologue and thus reduce image size a bit, see f.e. cde66c2d
("s390/bpf: Only clear A and X for converted BPF programs"). In other
cases (x86, arm64), A and X are being cleared in the prologue also for
the eBPF case, which is unnecessary.

Let's move this into the BPF migration in bpf_convert_filter() where it
actually belongs as long as the number of eBPF JITs is still small. It
can thus be done generically; allowing us to remove the quirk from
__bpf_prog_run() and to slightly reduce JIT image size in case of eBPF,
while reducing code duplication on this matter in current(/future) eBPF
JITs.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Tested-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Zi Shen Lim <zlim.lnx@gmail.com>
Cc: Yang Shi <yang.shi@linaro.org>
Acked-by: Yang Shi <yang.shi@linaro.org>
Acked-by: Zi Shen Lim <zlim.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: add bpf_skb_load_bytes helper · 05c74e5e

Authored by Daniel Borkmann on Dec 17, 2015

When hacking tc programs with eBPF, one of the issues that comes up
from time to time is loading addresses from headers. In eBPF as in
classic BPF, we have BPF_LD | BPF_ABS | BPF_{B,H,W} instructions that
extract a byte, half-word or word out of the skb data through helpers
such as bpf_load_pointer() (interpreter case).

F.e. extracting a whole IPv6 address could possibly look like ...

  union v6addr {
    struct {
      __u32 p1;
      __u32 p2;
      __u32 p3;
      __u32 p4;
    };
    __u8 addr[16];
  };

  [...]

  a.p1 = htonl(load_word(skb, off));
  a.p2 = htonl(load_word(skb, off +  4));
  a.p3 = htonl(load_word(skb, off +  8));
  a.p4 = htonl(load_word(skb, off + 12));

  [...]

  /* access to a.addr[...] */

This work adds a complementary helper bpf_skb_load_bytes() (we also
have bpf_skb_store_bytes()) as an alternative where the same call
would look like from an eBPF program:

  ret = bpf_skb_load_bytes(skb, off, addr, sizeof(addr));

Same verifier restrictions apply as in ffeedafb ("bpf: introduce
current->pid, tgid, uid, gid, comm accessors") case, where stack memory
access needs to be statically verified and thus guaranteed to be
initialized in first use (otherwise verifier cannot tell whether a
subsequent access to it is valid or not as it's runtime dependent).
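
To make the shape of such a helper concrete, here is a userspace analogue that copies a byte range out of a flat packet buffer after a bounds check. `load_bytes_sketch()` and its return values are hypothetical; the real helper operates on an skb and its error codes are not modeled here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace analogue of a bounds-checked "load bytes" helper.
 * Returns 0 on success and -1 when the requested range falls outside the
 * packet. The destination is only written on success.
 */
static int load_bytes_sketch(const unsigned char *pkt, size_t pkt_len,
                             size_t off, void *to, size_t len)
{
    if (off > pkt_len || len > pkt_len - off)
        return -1;
    memcpy(to, pkt + off, len);
    return 0;
}
```

A caller would pass a 16-byte buffer to fetch a whole IPv6 address in one call instead of four word loads, and must check the return value before reading the buffer.
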
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 59ce9670

Authored by David S. Miller on Dec 18, 2015

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains the first batch of Netfilter updates for
the upcoming 4.5 kernel. This batch contains userspace netfilter header
compilation fixes, support for packet mangling in nf_tables, the new
tracing infrastructure for nf_tables and cgroup2 support for iptables.
More specifically, they are:

1) Two patches to include dependencies in our netfilter userspace
   headers to resolve compilation problems, from Mikko Rapeli.

2) Four cosmetic cleanup patches for the ebtables codebase, from Ian Morris.

3) Remove duplicate include in the netfilter reject infrastructure,
   from Stephen Hemminger.

4) Two patches to simplify the netfilter defragmentation code for IPv6,
   patch from Florian Westphal.

5) Fix root ownership of /proc/net netfilter for unprivileged net
   namespaces, from Philip Whineray.

6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal.

7) Add mangling support to our nf_tables payload expression, from
   Patrick McHardy.

8) Introduce a new netlink-based tracing infrastructure for nf_tables,
   from Florian Westphal.

9) Change setter functions in nfnetlink_log to be void, from
    Rami Rosen.

10) Add netns support to the cttimeout infrastructure.

11) Add cgroup2 support to iptables, from Tejun Heo.

12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian.

13) Add support for mangling pkttype in the nf_tables meta expression,
    also from Florian.

BTW, I need that you pull net into net-next, I have another batch that
requires changes that I don't yet see in net.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: call netif_carrier_off() during init · 4b402d71

Authored by Jakub Kicinski on Dec 17, 2015

Netdevs default to carrier on, we should call netif_carrier_off()
during initialization since we handle carrier state changes in the
driver.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Rolf Neugebauer <rolf.neugebauer@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'l3mdev-accept' · 6462de8c

Authored by David S. Miller on Dec 18, 2015

David Ahern says:

====================
net: Allow accepted sockets to be bound to l3mdev domain

Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. This version adds a sysctl
to control whether the setting is inherited, making the functionality
similar to sk_mark and its sysctl_tcp_fwmark_accept setting.

This effectively allows a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Allow accepted sockets to be bound to l3mdev domain · 6dd9a14e

Authored by David Ahern on Dec 16, 2015

Allow accepted sockets to derive their sk_bound_dev_if setting from the
l3mdev domain in which the packets originated. A sysctl setting is added
to control the behavior which is similar to sk_mark and
sysctl_tcp_fwmark_accept.

This effectively allows a process to have a "VRF-global" listen socket,
with child sockets bound to the VRF device in which the packet originated.
A similar behavior can be achieved using sk_mark, but a solution using marks
is incomplete as it does not handle duplicate addresses in different L3
domains/VRFs. Allowing sockets to inherit the sk_bound_dev_if from l3mdev
domain provides a complete solution.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l3mdev: Add master device lookup by index · 1a852479

Authored by David Ahern on Dec 16, 2015

Add helper to lookup l3mdev master index given a device index.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: addrconf: use stable address generator for ARPHRD_NONE · cc9da6cc

Authored by Bjørn Mork on Dec 16, 2015

Add a new address generator mode, using the stable address generator
with an automatically generated secret. This is intended as a default
address generator mode for device types with no EUI64 implementation.
The new generator is used for ARPHRD_NONE interfaces initially, adding
default IPv6 autoconf support to e.g. tun interfaces.

If the addrgenmode is set to 'random', either by default or manually,
and no stable secret is available, then a random secret is used as
input for the stable-privacy address generator. The secret can be
read and modified like manually configured secrets, using the proc
interface. Modifying the secret will change the addrgen mode to
'stable-privacy' to indicate that it operates on a known secret.

Existing behaviour of the 'stable-privacy' mode is kept unchanged. If
a known secret is available when the device is created, then the mode
will default to 'stable-privacy' as before. The mode can be manually
set to 'random' but it will behave exactly like 'stable-privacy' in
this case. The secret will not change.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: 吉藤英明 <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ila: add NETFILTER dependency · 8cb964da

Authored by Arnd Bergmann on Dec 18, 2015

The recently added generic ILA translation facility fails to
build when CONFIG_NETFILTER is disabled:

net/ipv6/ila/ila_xlat.c:229:20: warning: 'struct nf_hook_state' declared inside parameter list
net/ipv6/ila/ila_xlat.c:235:27: error: array type has incomplete element type 'struct nf_hook_ops'
static struct nf_hook_ops ila_nf_hook_ops[] __read_mostly = {

This adds an explicit Kconfig dependency to avoid that case.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 7f00feaf ("ila: Add generic ILA translation facility")
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Dec, 2015 1 commit

netfilter: meta: add support for setting skb->pkttype · b4aae759

Authored by Florian Westphal on Dec 10, 2015

This allows redirecting bridged packets to the local machine:

ether type ip ether daddr set aa:53:08:12:34:56 meta pkttype set unicast

Without 'set unicast', the IP stack discards PACKET_OTHERHOST skbs.

It is also useful to add support for a '-m cluster'-like nft rule
(where a switch floods packets to several nodes, and each cluster
node processes a subset of packets for load distribution).

Mangling is restricted to HOST/OTHER/BROAD/MULTICAST, i.e. you cannot set
skb->pkt_type to PACKET_KERNEL or change PACKET_LOOPBACK to PACKET_HOST.
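
The restriction can be sketched as a small predicate applied to both the current and the requested type. The `PKT_*` constants below repeat the values I believe linux/if_packet.h assigns to the PACKET_* names so that the sketch builds anywhere; the function names are hypothetical and this is not the nft_meta code.

```c
#include <stdbool.h>

/* Assumed to mirror the PACKET_* values from linux/if_packet.h. */
enum {
    PKT_HOST = 0, PKT_BROADCAST = 1, PKT_MULTICAST = 2, PKT_OTHERHOST = 3,
    PKT_OUTGOING = 4, PKT_LOOPBACK = 5, PKT_USER = 6, PKT_KERNEL = 7,
};

/* Only HOST/BROADCAST/MULTICAST/OTHERHOST may be mangled. */
static bool pkt_type_mangle_ok(unsigned int t)
{
    return t == PKT_HOST || t == PKT_BROADCAST ||
           t == PKT_MULTICAST || t == PKT_OTHERHOST;
}

/* Apply the new type only when both old and new types are in the allowed set. */
static unsigned int set_pkt_type(unsigned int current, unsigned int wanted)
{
    if (pkt_type_mangle_ok(current) && pkt_type_mangle_ok(wanted))
        return wanted;
    return current;
}
```

Under this sketch an OTHERHOST packet can be turned into a HOST packet, while a request to set the KERNEL type or to change a LOOPBACK packet leaves the type untouched.
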
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>