- 25 March 2015, 5 commits
-
-
Committed by Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Thomas Graf
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Thomas Graf
rhashtable_insert_rehash() requires RCU locks to be held in order to access ht->tbl and traverse to the last table. Fixes: ccd57b1b ("rhashtable: Add immediate rehash during insertion") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Michal Sekletar
If vlan offloading takes place, the vlan header is removed from the frame and its contents, both vlan_tci and vlan_proto, are available to user space via the TPACKET interface. However, only vlan_tci can be used in BPF filters. This commit introduces a new BPF extension that makes it possible to load the value of vlan_proto (the vlan TPID) into register A. Support for classic BPF and eBPF is added, analogous to skb->protocol. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jiri Pirko <jpirko@redhat.com> Signed-off-by: Michal Sekletar <msekleta@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
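A minimal user-space sketch of how the new extension could be used from a classic BPF filter; SKF_AD_VLAN_TPID is the ancillary offset this commit adds, while the socket setup, jump layout and the 802.1ad match value are illustrative assumptions:

```c
#include <sys/socket.h>
#include <linux/if_ether.h>
#include <linux/filter.h>

/* Accept only frames whose stripped VLAN TPID is 802.1ad (0x88a8).
 * BPF_LD|BPF_W|BPF_ABS at SKF_AD_OFF + SKF_AD_VLAN_TPID loads
 * skb->vlan_proto into register A. */
static struct sock_filter vlan_tpid_code[] = {
	BPF_STMT(BPF_LD  | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_VLAN_TPID),
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, ETH_P_8021AD, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, 0xffff),	/* accept */
	BPF_STMT(BPF_RET | BPF_K, 0),		/* drop */
};

int attach_vlan_tpid_filter(int fd)
{
	struct sock_fprog prog = {
		.len    = sizeof(vlan_tpid_code) / sizeof(vlan_tpid_code[0]),
		.filter = vlan_tpid_code,
	};

	return setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog));
}
```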
-
Committed by Hannes Frederic Sowa
Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 March 2015, 18 commits
-
-
Committed by Herbert Xu
Commit 963ecbd4 ("rhashtable: Fix use-after-free in rhashtable_walk_stop") fixed a real bug but created another one, because we may end up sleeping inside an RCU critical section. This patch fixes it properly by replacing the mutex with a spin lock that specifically protects the walker lists. Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hannes Frederic Sowa
This is specified by RFC 7217. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hannes Frederic Sowa
If a DAD conflict is detected, we want to retry stable privacy address generation up to idgen_retries (= 3) times with a delay of idgen_delay (= 1 second). Add the logic to addrconf_dad_failure. By design, we don't clean up DAD-failed permanent addresses. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
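A rough sketch of the retry decision described above; the field and helper names below are illustrative assumptions, not the actual addrconf code:

```c
#include <linux/if_link.h>	/* IN6_ADDR_GEN_MODE_STABLE_PRIVACY */
#include <net/if_inet6.h>

#define IDGEN_RETRIES	3		/* idgen_retries from the commit text */
#define IDGEN_DELAY	(1 * HZ)	/* idgen_delay: 1 second */

/* Called from the DAD failure path (sketch only). */
static void dad_failure_retry_sketch(struct inet6_ifaddr *ifp, u8 *dad_count)
{
	if (ifp->idev->addr_gen_mode == IN6_ADDR_GEN_MODE_STABLE_PRIVACY &&
	    (*dad_count)++ < IDGEN_RETRIES) {
		/* regenerate with an incremented DAD_Counter after a delay */
		regen_stable_privacy_addr(ifp, IDGEN_DELAY);	/* hypothetical helper */
		return;
	}
	/* otherwise the failure is final; failed permanent addresses
	 * are intentionally not cleaned up */
}
```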
-
Committed by Hannes Frederic Sowa
Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hannes Frederic Sowa
We need to mark appropriate addresses so we can do retries in case their DAD fails. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hannes Frederic Sowa
This patch implements stable privacy address generation for link-local and autoconf addresses as specified in RFC 7217. RID = F(Prefix, Net_Iface, Network_ID, DAD_Counter, secret_key) is the RID (random identifier). As the hash function F we chose one round of sha1. Prefix will be either the link-local prefix or the router-advertised one. DAD_Counter and secret_key are implemented as specified. We don't use Network_ID, as it couples the code too closely to other subsystems; it is specified as optional in the RFC. As Net_Iface we only use the MAC address of the device, because we simply have no other stable identifier in the kernel we could possibly use: since this code might run very early, we cannot depend on interface names, as they might be changed by user space early on during the boot process. A new address generation mode is introduced, IN6_ADDR_GEN_MODE_STABLE_PRIVACY. With iproute2 one can switch back to the none or eui64 address configuration mode even though the stable_secret is already set. We refuse writes to ipv6/conf/all/stable_secret and only allow ipv6/conf/default/stable_secret and the interface-specific file to be written to. The default stable_secret is used as the parameter for the namespace; the interface-specific one can override the secret, e.g. when switching a network configuration from one system to another while inheriting the secret. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
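A simplified kernel-style sketch of the RID computation the text describes (one SHA-1 round over prefix, MAC, DAD counter and secret); the buffer layout and helper name are assumptions, not the actual addrconf implementation:

```c
#include <linux/cryptohash.h>	/* sha_transform(), SHA_* constants */
#include <linux/if_ether.h>	/* ETH_ALEN */
#include <linux/in6.h>
#include <linux/string.h>
#include <linux/types.h>

/* RID = F(Prefix, Net_Iface, DAD_Counter, secret_key), F = one sha1 round.
 * Network_ID is intentionally left out, as in the commit. */
static void generate_rid_sketch(const struct in6_addr *prefix,
				const u8 *mac, u8 dad_count,
				const struct in6_addr *secret,
				u32 rid[SHA_DIGEST_WORDS])
{
	u32 digest[SHA_DIGEST_WORDS] = { 0 };
	u32 workspace[SHA_WORKSPACE_WORDS] = { 0 };
	char data[SHA_MESSAGE_BYTES] = { 0 };

	memcpy(data, prefix, 8);			/* 64-bit prefix */
	memcpy(data + 8, mac, ETH_ALEN);		/* Net_Iface: the device MAC */
	data[8 + ETH_ALEN] = dad_count;			/* DAD_Counter */
	memcpy(data + 16, secret, sizeof(*secret));	/* secret_key */

	sha_transform(digest, data, workspace);
	memcpy(rid, digest, sizeof(digest));
}
```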
-
Committed by Hannes Frederic Sowa
This patch implements the procfs logic for the stable_secret knob: the secret is formatted as an ipv6 address and is stored per interface and per namespace. We track an initialized flag and return EIO errors until the secret is set. The secret is not inherited by newly created namespaces. Cc: Erik Kline <ek@google.com> Cc: Fernando Gont <fgont@si6networks.com> Cc: Lorenzo Colitti <lorenzo@google.com> Cc: YOSHIFUJI Hideaki/吉藤英明 <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu
This patch reintroduces immediate rehash during insertion. If we find during insertion that the table is full or the chain length exceeds a set limit (currently 16, but this may be disabled with insecure_elasticity), then we will force an immediate rehash. The rehash will contain an expansion if the table utilisation exceeds 75%. If this rehash fails then the insertion will fail. Otherwise the insertion will be reattempted in the new hash table. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
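In pseudo-logic, the insert-time policy described above amounts to roughly the following (a sketch of the decision, not the actual rhashtable internals):

```c
#include <linux/types.h>

/* Trigger: force an immediate rehash when the table is full or the chain
 * exceeds the elasticity limit (16, unless insecure_elasticity disables it). */
static bool insert_needs_rehash_sketch(bool table_full, unsigned int chain_len,
				       bool insecure_elasticity)
{
	return table_full ||
	       (!insecure_elasticity && chain_len > 16);
}

/* The rehash expands (doubles) the table when utilisation exceeds 75%. */
static unsigned int rehash_target_size_sketch(unsigned int nelems,
					      unsigned int table_size)
{
	return nelems > table_size / 4 * 3 ? table_size * 2 : table_size;
}
```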
-
Committed by Herbert Xu
This patch adds the missing bits to allow multiple rehashes. The read side as well as remove already handle this correctly, so it's only the rehasher and insertion that need modification to handle this. Note that this patch doesn't actually enable it, so for now rehashing is still only performed by the worker thread. This patch also disables the explicit expand/shrink interface because the table is meant to expand and shrink automatically, and continuing to export these interfaces unnecessarily complicates the life of the rehasher since the rehash process is now composed of two parts. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu
Since every current rhashtable user uses jhash as their hash function, the fact that jhash is an inline function causes each user to generate a copy of its code. This patch solves the problem by allowing hashfn to be unset, in which case rhashtable will automatically set it to jhash. Furthermore, if the key length is a multiple of 4, we will switch over to jhash2. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
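With that in place a user no longer needs to name a hash function at all; a hedged example of such an init (the example object and its fields are assumptions):

```c
#include <linux/rhashtable.h>
#include <linux/stddef.h>

struct my_obj {
	u32			key;
	struct rhash_head	node;
};

/* No .hashfn supplied: rhashtable falls back to jhash automatically,
 * and since key_len (4) is a multiple of 4 it can use jhash2 internally. */
static const struct rhashtable_params my_obj_params = {
	.key_len	= sizeof(u32),
	.key_offset	= offsetof(struct my_obj, key),
	.head_offset	= offsetof(struct my_obj, node),
};

static int my_table_init(struct rhashtable *ht)
{
	return rhashtable_init(ht, &my_obj_params);
}
```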
-
Committed by Herbert Xu
When rht_key_hashfn is called from rhashtable itself and params is equal to ht->p, there is no point in checking params.key_len and falling back to ht->p.key_len. For some reason gcc couldn't figure out that params is the same as ht->p. So let's help it by only checking params.key_len when it's a constant. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
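The usual way to express "only when it's a constant" is __builtin_constant_p; roughly (a sketch of the idea, not the exact rhashtable code):

```c
#include <linux/rhashtable.h>

/* Trust params.key_len only when the compiler can prove it is a
 * compile-time constant (the inlined fast path with literal params);
 * otherwise fall back to the runtime copy in ht->p.key_len. */
static inline unsigned int key_len_sketch(const struct rhashtable *ht,
					  struct rhashtable_params params)
{
	return (__builtin_constant_p(params.key_len) && params.key_len) ?
	       params.key_len : ht->p.key_len;
}
```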
-
Committed by Alexander Drozdov
Introduce the TP_STATUS_CSUM_VALID tp_status flag to tell the af_packet user that at least the transport header checksum has already been validated. For now, the flag may be set for incoming packets only. Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
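From user space the new bit is tested like any other tp_status flag; a small hedged example for a TPACKET_V2 RX ring (ring setup omitted, helper name is an assumption):

```c
#include <stdbool.h>
#include <linux/if_packet.h>

/* True if the kernel reports that at least the transport header checksum
 * of this received frame was already validated. */
static bool frame_csum_valid(const struct tpacket2_hdr *hdr)
{
	return hdr->tp_status & TP_STATUS_CSUM_VALID;
}
```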
-
Committed by Eric Dumazet
dccp_v4_err() can restrict lookups to the ehash table, and not to listeners. Note this patch creates the infrastructure, which means that ICMP messages for request sockets are ignored until the conversion is complete. A new dccp_req_err() helper is exported so that we can use it in IPv6 in a following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
tcp_v4_err() can restrict lookups to the ehash table, and not to listeners. Note this patch creates the infrastructure, which means that ICMP messages for request sockets are ignored until the conversion is complete. A new tcp_req_err() helper is exported so that we can use it in IPv6 in a following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
This is low-hanging fruit, as we'll get rid of syn_wait_lock eventually. We hold syn_wait_lock for such small sections that it makes no sense to use a read/write lock; a spin lock is simply faster. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
It is not needed, and req->sk_listener points to the listener anyway. The request_sock argument can be const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Tadeusz Struk
Allow af_alg sgls to be linked. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by tadeusz.struk@intel.com
Add support for async operations. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 March 2015, 3 commits
-
-
Committed by Yegor Yefremov
Add an <ifname>-rxtx trigger that is activated for both tx and rx events. This trigger mimics the "activity" LED for Ethernet devices. Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Committed by Florian Westphal
It is identical to the can destructor. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-
Committed by Florian Westphal
The br_netfilter frag output function calls skb_cow_head(), so if it needs a larger headroom, e.g. to re-add a previously stripped PPPoE or VLAN header, things will still work (at the cost of a reallocation). We can then move nf_bridge_encap_header_len to br_netfilter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
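For reference, a minimal sketch of the skb_cow_head() pattern the frag output path relies on (the function and the exact headroom value are illustrative assumptions):

```c
#include <linux/errno.h>
#include <linux/skbuff.h>

/* Make sure there is enough writable headroom before re-adding a
 * previously stripped encapsulation header (e.g. PPPoE or VLAN);
 * skb_cow_head() reallocates the header portion if needed. */
static int push_encap_header_sketch(struct sk_buff *skb, unsigned int encap_len)
{
	if (skb_cow_head(skb, encap_len))
		return -ENOMEM;		/* reallocation failed; caller drops the skb */

	skb_push(skb, encap_len);
	/* ... rebuild the stripped header here ... */
	return 0;
}
```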
-
- 21 March 2015, 12 commits
-
-
Committed by YOSHIFUJI Hideaki/吉藤英明
net: neighbour: Add mcast_resolicit to configure the number of multicast resolicitations in PROBE state. We send unicast neighbor (ARP or NDP) solicitations ucast_probes times in PROBE state. Zhu Yanjun reported that some implementations do not reply to them and the entry will become FAILED, which is undesirable. We had dealt with such nodes by sending multicast probes mcast_solicit times after unicast probes in PROBE state. In 2003, I made a change not to send them to improve compatibility with IPv6 NDP. Let's introduce a per-protocol per-interface sysctl knob "mcast_resolicit" to configure the number of multicast (re)solicitations for reconfirmation in PROBE state. The default is 0, since we have been doing so for 10+ years. Reported-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> CC: Ulf Samuelsson <ulf.samuelsson@ericsson.com> Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Scott Feldman
Suggested-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
This reverts commit ca10b9e9, which is no longer needed after commit eb8895de ("tcp: tcp_make_synack() should use sock_wmalloc"). When under SYNFLOOD, we build a lot of SYNACKs and hit false sharing because of multiple modifications done on sk_listener->sk_wmem_alloc. Since tcp_make_synack() uses sock_wmalloc(), there is no need to call skb_set_owner_w() again, as this adds two atomic operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Daniel Borkmann
This work extends the "classic" BPF programmable tc action by extending its scope also to native eBPF code! Together with commit e2e9b654 ("cls_bpf: add initial eBPF support for programmable classifiers") this adds the facility to implement fully flexible classifiers and actions for tc that can be implemented in a C subset in user space, "safely" loaded into the kernel, and run at native speed when JITed. Also, since eBPF maps can be shared between eBPF programs, it offers the possibility that cls_bpf and act_bpf can share data 1) between themselves and 2) between user space applications. That means that, f.e., customized runtime statistics can be collected in user space, but also, more importantly, classifier and action behaviour could be altered based on map input from the user space application. For the remaining details on the workflow and integration, see the cls_bpf commit e2e9b654. A preliminary iproute2 part can be found under [1]. [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-act Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Daniel Borkmann
In order to prepare eBPF support for tc action, we need to add sched_act_type, so that the eBPF verifier is aware of which helper functions act_bpf may use, and that it can load skb data and read out currently available skb fields. This is basically analogous to 96be4325 ("ebpf: add sched_cls_type and map it to sk_filter's verifier ops"). BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT need to be separate since both will have a different set of functionality in the future (classifier vs action), thus we won't run into ABI troubles when the point in time comes to diverge functionality from the classifier. The future plan for act_bpf is that it will be able to write into skb->data and alter selected fields mirrored in struct __sk_buff. For initial support, it's sufficient to map it to sk_filter_ops. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Pirko <jiri@resnulli.us> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu
We need to include linux/errno.h in rhashtable.h since it doesn't always get included otherwise. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu
Now that all rhashtable users have been converted over to the inline interface, this patch removes the unused out-of-line interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Herbert Xu
This patch deals with the complaint that we make indirect function calls on the fast paths unnecessarily in rhashtable. We resolve it by moving the fast paths into inline functions that take struct rhashtable_params (which obviously must be the same set of parameters supplied to rhashtable_init) as an argument. The only remaining indirect call is to obj_hashfn (or key_hashfn if obj_hashfn is unset) on the rehash as well as the insert-during-rehash slow path. This patch also extends the support of variable-length keys to include those where the key is fixed but scattered in the object. For example, in netlink we want to key off the namespace and the portid, but they're not next to each other. This patch does this by directly using the object hash function as the indicator of whether the key is accessible or not. It also adds a new function obj_cmpfn to compare a key against an object. This means that the caller no longer needs to supply explicit compare functions. All this is done in a backwards-compatible manner so no existing users are affected until they convert to the new interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
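For such scattered-but-fixed keys the caller supplies obj_hashfn and obj_cmpfn; a hedged sketch of what that looks like (the struct fields, gathering the key into a zeroed struct, and the netlink-flavoured naming are illustrative, not the actual conversion):

```c
#include <linux/jhash.h>
#include <linux/rhashtable.h>
#include <linux/stddef.h>
#include <linux/string.h>

struct my_sock {
	struct net		*net;		/* half of the key          */
	u32			portid;		/* other half, not adjacent */
	struct rhash_head	node;
};

/* Lookup key: the scattered fields gathered into one struct.  It is
 * zeroed before filling so padding does not perturb the hash. */
struct my_key {
	struct net	*net;
	u32		portid;
};

static u32 my_obj_hashfn(const void *data, u32 len, u32 seed)
{
	const struct my_sock *sk = data;
	struct my_key key;

	memset(&key, 0, sizeof(key));
	key.net = sk->net;
	key.portid = sk->portid;
	return jhash(&key, sizeof(key), seed);	/* must match the lookup hash */
}

static int my_obj_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
{
	const struct my_key *key = arg->key;
	const struct my_sock *sk = obj;

	return sk->net != key->net || sk->portid != key->portid;	/* 0 == match */
}

static const struct rhashtable_params my_sock_params = {
	.head_offset	= offsetof(struct my_sock, node),
	.key_len	= sizeof(struct my_key),	/* lookups hash the gathered key */
	.obj_hashfn	= my_obj_hashfn,
	.obj_cmpfn	= my_obj_cmpfn,
};
```

A lookup would then zero-fill a struct my_key and call rhashtable_lookup_fast(ht, &key, my_sock_params).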
-
Committed by Herbert Xu
This patch marks the rhashtable_init params argument const, as there is no reason to modify it since we will always make a copy of it in the rhashtable. This patch also fixes a bug where we don't actually round up the value of min_size unless it is less than HASH_MIN_SIZE. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
sk_ack_backlog & sk_max_ack_backlog were 16-bit fields, meaning the listen() backlog was limited to 65535. It is time to increase the width to allow a much bigger backlog, if admins change the /proc/sys/net/core/somaxconn & /proc/sys/net/ipv4/tcp_max_syn_backlog default values. Tested: echo 5000000 >/proc/sys/net/core/somaxconn echo 5000000 >/proc/sys/net/ipv4/tcp_max_syn_backlog Ran a SYNFLOOD test against a listener using listen(fd, 5000000) myhost~# grep request_sock_TCP /proc/slabinfo request_sock_TCP 4185642 4411940 304 13 1 : tunables 54 27 8 : slabdata 339380 339380 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
One of the major issues for TCP is the SYNACK rtx handling, done by inet_csk_reqsk_queue_prune(), fired by the keepalive timer of a TCP_LISTEN socket. This function runs for awfully long times, with the socket lock held, meaning that other cpus needing this lock have to spin for hundreds of ms. SYNACKs are sent in huge bursts, likely to cause severe drops anyway. This model was OK 15 years ago when memory was very tight. We can now afford to have a timer per request sock. Timer invocations no longer need to lock the listener, and can be run from all cpus in parallel. With the following patch increasing the somaxconn width to 32 bits, I tested a listener with more than 4 million active request sockets, and a steady SYNFLOOD of ~200,000 SYN per second. The host was sending ~830,000 SYNACK per second. This is ~100 times more than what we could achieve before this patch. Later, we will get rid of the listener hash and use ehash instead. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
When request socks are put in the ehash table, the whole notion of having a previous request to update dl_next is pointless. Also, a following patch will get rid of the big purge timer, so we want to be able to delete a request sock without holding the listener lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 20 March 2015, 1 commit
-
-
Committed by Christophe Vu-Brugier
A check that rejects a CDB with the FUA bit set if no write cache is emulated was added by commit fde9f50f ("target: Add sanity checks for DPO/FUA bit usage"). The condition is as follows: if (!dev->dev_attrib.emulate_fua_write || !dev->dev_attrib.emulate_write_cache) However, this check is wrong if the backend device supports WCE but "emulate_write_cache" is disabled. This patch uses se_dev_check_wce() (previously named spc_check_dev_wce) to invoke transport->get_write_cache() if the device has a write cache, or to check the "emulate_write_cache" attribute otherwise. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christophe Vu-Brugier <cvubrugier@fastmail.fm> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
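The helper behaviour described above, roughly (a sketch based on the commit text, not the exact target-core source):

```c
#include <target/target_core_base.h>
#include <target/target_core_backend.h>

/* Prefer the backend's own answer about its write cache; only fall back
 * to the emulate_write_cache attribute when get_write_cache() is absent. */
static bool se_dev_check_wce_sketch(struct se_device *dev)
{
	if (dev->transport->get_write_cache)
		return dev->transport->get_write_cache(dev);

	return dev->dev_attrib.emulate_write_cache;
}

/* The FUA sanity check then reads (sketch):
 *
 *	if (!dev->dev_attrib.emulate_fua_write || !se_dev_check_wce_sketch(dev))
 *		reject the CDB;
 */
```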
-
- 19 March 2015, 1 commit
-
-
Committed by Pablo Neira Ayuso
Since fab4085f ("netfilter: log: nf_log_packet() as real unified interface"), the loginfo structure that is passed to nf_log_packet() is used to explicitly indicate the logger type you want to use. This is a problem for people tracing rules through nfnetlink_log, since packets are always routed to the NF_LOG_TYPE logger after the aforementioned patch. We can fix this by removing the trace loginfo structures, but that still changes the log level from 4 to 5 for tracing messages, and there may be someone relying on this out there. So let's just introduce a new nf_log_trace() function that restores the former behaviour. Reported-by: Markus Kötter <koetter@rrzn.uni-hannover.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
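A hedged sketch of what a tracing call site looks like with the new helper; the arguments mirror nf_log_packet(), while the exact loginfo contents here are assumptions modelled on how iptables-style tracing is usually set up:

```c
#include <linux/netfilter.h>
#include <net/netfilter/nf_log.h>

/* Trace messages keep their own loginfo (log level 4) and are emitted
 * through nf_log_trace(), which ignores the per-packet logger type
 * selection that nf_log_packet() now performs. */
static const struct nf_loginfo trace_loginfo = {
	.type = NF_LOG_TYPE_LOG,
	.u = {
		.log = {
			.level	  = 4,
			.logflags = NF_LOG_MASK,
		},
	},
};

static void trace_packet_sketch(struct net *net, const struct sk_buff *skb,
				unsigned int hook,
				const struct net_device *in,
				const struct net_device *out,
				const char *table, const char *chain,
				const char *comment, unsigned int rulenum)
{
	nf_log_trace(net, AF_INET, hook, skb, in, out, &trace_loginfo,
		     "TRACE: %s:%s:%s:%u ", table, chain, comment, rulenum);
}
```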
-