提交 · c68c7f5a88328fbcd992c68e99ebd6bf7d49e9d2 · OpenHarmony / kernel_linux

22 10月, 2013 1 次提交

net: fix build warnings because of net_get_random_once merge · c68c7f5a

由 Hannes Frederic Sowa 提交于 10月 20, 2013

This patch fixes the following warning:

   In file included from include/linux/skbuff.h:27:0,
                    from include/linux/netfilter.h:5,
                    from include/net/netns/netfilter.h:5,
                    from include/net/net_namespace.h:20,
                    from include/linux/init_task.h:14,
                    from init/init_task.c:1:
include/linux/net.h:243:14: warning: 'struct static_key' declared inside parameter list [enabled by default]
          struct static_key *done_key);

on x86_64 allnoconfig, um defconfig and ia64 allmodconfig and maybe others as well.
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c68c7f5a

20 10月, 2013 4 次提交

net: introduce new macro net_get_random_once · a48e4292

由 Hannes Frederic Sowa 提交于 10月 19, 2013

net_get_random_once is a new macro which handles the initialization
of secret keys. It is possible to call it in the fast path. Only the
initialization depends on the spinlock and is rather slow. Otherwise
it should get used just before the key is used to delay the entropy
extration as late as possible to get better randomness. It returns true
if the key got initialized.

The usage of static_keys for net_get_random_once is a bit uncommon so
it needs some further explanation why this actually works:

=== In the simple non-HAVE_JUMP_LABEL case we actually have ===
no constrains to use static_key_(true|false) on keys initialized with
STATIC_KEY_INIT_(FALSE|TRUE). So this path just expands in favor of
the likely case that the initialization is already done. The key is
initialized like this:

___done_key = { .enabled = ATOMIC_INIT(0) }

The check

                if (!static_key_true(&___done_key))                     \

expands into (pseudo code)

                if (!likely(___done_key > 0))

, so we take the fast path as soon as ___done_key is increased from the
helper function.

=== If HAVE_JUMP_LABELs are available this depends ===
on patching of jumps into the prepared NOPs, which is done in
jump_label_init at boot-up time (from start_kernel). It is forbidden
and dangerous to use net_get_random_once in functions which are called
before that!

At compilation time NOPs are generated at the call sites of
net_get_random_once. E.g. net/ipv6/inet6_hashtable.c:inet6_ehashfn (we
need to call net_get_random_once two times in inet6_ehashfn, so two NOPs):

      71:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
      76:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Both will be patched to the actual jumps to the end of the function to
call __net_get_random_once at boot time as explained above.

arch_static_branch is optimized and inlined for false as return value and
actually also returns false in case the NOP is placed in the instruction
stream. So in the fast case we get a "return false". But because we
initialize ___done_key with (enabled != (entries & 1)) this call-site
will get patched up at boot thus returning true. The final check looks
like this:

                if (!static_key_true(&___done_key))                     \
                        ___ret = __net_get_random_once(buf,             \

expands to

                if (!!static_key_false(&___done_key))                     \
                        ___ret = __net_get_random_once(buf,             \

So we get true at boot time and as soon as static_key_slow_inc is called
on the key it will invert the logic and return false for the fast path.
static_key_slow_inc will change the branch because it got initialized
with .enabled == 0. After static_key_slow_inc is called on the key the
branch is replaced with a nop again.

=== Misc: ===
The helper defers the increment into a workqueue so we don't
have problems calling this code from atomic sections. A seperate boolean
(___done) guards the case where we enter net_get_random_once again before
the increment happend.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a48e4292

static_key: WARN on usage before jump_label_init was called · c4b2c0c5

由 Hannes Frederic Sowa 提交于 10月 19, 2013

Usage of the static key primitives to toggle a branch must not be used
before jump_label_init() is called from init/main.c. jump_label_init
reorganizes and wires up the jump_entries so usage before that could
have unforeseen consequences.

Following primitives are now checked for correct use:
* static_key_slow_inc
* static_key_slow_dec
* static_key_slow_dec_deferred
* jump_label_rate_limit

The x86 architecture already checks this by testing if the default_nop
was already replaced with an optimal nop or with a branch instruction. It
will panic then. Other architectures don't check for this.

Because we need to relax this check for the x86 arch to allow code to
transition from default_nop to the enabled state and other architectures
did not check for this at all this patch introduces checking on the
static_key primitives in a non-arch dependent manner.

All checked functions are considered slow-path so the additional check
does no harm to performance.

The warnings are best observed with earlyprintk.

Based on a patch from Andi Kleen.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4b2c0c5

ipip: add GSO/TSO support · cb32f511

由 Eric Dumazet 提交于 10月 19, 2013

Now inet_gso_segment() is stackable, its relatively easy to
implement GSO/TSO support for IPIP

Performance results, when segmentation is done after tunnel
device (as no NIC is yet enabled for TSO IPIP support) :

Before patch :

lpq83:~# ./netperf -H 7.7.9.84 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.9.84 () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB

87380 16384 16384 10.00 3357.88 5.09 3.70 2.983 2.167

After patch :

87380 16384 16384 10.00 7710.19 4.52 6.62 1.152 1.687
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb32f511

ipv4: gso: make inet_gso_segment() stackable · 3347c960

由 Eric Dumazet 提交于 10月 19, 2013

In order to support GSO on IPIP, we need to make
inet_gso_segment() stackable.

It should not assume network header starts right after mac
header.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3347c960

18 10月, 2013 3 次提交

net: Delete trailing semi-colon from definition of netdev_WARN() · 7cc7c5e5

由 Ben Hutchings 提交于 10月 14, 2013

Macro definitions should not normally end with a semi-colon, as this
makes it dangerous to use them an if...else statement.  Happily this
has not happened yet.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7cc7c5e5

net: refactor sk_page_frag_refill() · 400dfd3a

由 Eric Dumazet 提交于 10月 17, 2013

While working on virtio_net new allocation strategy to increase
payload/truesize ratio, we found that refactoring sk_page_frag_refill()
was needed.

This patch splits sk_page_frag_refill() into two parts, adding
skb_page_frag_refill() which can be used without a socket.

While we are at it, add a minimum frag size of 32 for
sk_page_frag_refill()

Michael will either use netdev_alloc_frag() from softirq context,
or skb_page_frag_refill() from process context in refill_work()
 (GFP_KERNEL allocations)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Michael Dalton <mwdalton@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

400dfd3a

net/mlx4: Fix typo, move similar defs to same location · 5930e8d0

由 Or Gerlitz 提交于 10月 15, 2013

Small code cleanup:

1. change MLX4_DEV_CAP_FLAGS2_REASSIGN_MAC_EN to MLX4_DEV_CAP_FLAG2_REASSIGN_MAC_EN

2. put MLX4_SET_PORT_PRIO2TC and MLX4_SET_PORT_SCHEDULER in the same union with the
   other MLX4_SET_PORT_yyy
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NAmir Vadai <amirv@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5930e8d0

15 10月, 2013 1 次提交

netfilter: nfnetlink: add batch support and use it from nf_tables · 0628b123

由 Pablo Neira Ayuso 提交于 10月 14, 2013

This patch adds a batch support to nfnetlink. Basically, it adds
two new control messages:

* NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
  the nfgenmsg->res_id indicates the nfnetlink subsystem ID.

* NFNL_MSG_BATCH_END, that results in the invocation of the
  ss->commit callback function. If not specified or an error
  ocurred in the batch, the ss->abort function is invoked
  instead.

The end message represents the commit operation in nftables, the
lack of end message results in an abort. This patch also adds the
.call_batch function that is only called from the batch receival
path.

This patch adds atomic rule updates and dumps based on
bitmask generations. This allows to atomically commit a set of
rule-set updates incrementally without altering the internal
state of existing nf_tables expressions/matches/targets.

The idea consists of using a generation cursor of 1 bit and
a bitmask of 2 bits per rule. Assuming the gencursor is 0,
then the genmask (expressed as a bitmask) can be interpreted
as:

00 active in the present, will be active in the next generation.
01 inactive in the present, will be active in the next generation.
10 active in the present, will be deleted in the next generation.
 ^
 gencursor

Once you invoke the transition to the next generation, the global
gencursor is updated:

00 active in the present, will be active in the next generation.
01 active in the present, needs to zero its future, it becomes 00.
10 inactive in the present, delete now.
^
gencursor

If a dump is in progress and nf_tables enters a new generation,
the dump will stop and return -EBUSY to let userspace know that
it has to retry again. In order to invalidate dumps, a global
genctr counter is increased everytime nf_tables enters a new
generation.

This new operation can be used from the user-space utility
that controls the firewall, eg.

nft -f restore

The rule updates contained in `file' will be applied atomically.

cat file
-----
add filter INPUT ip saddr 1.1.1.1 counter accept #1
del filter INPUT ip daddr 2.2.2.2 counter drop   #2
-EOF-

Note that the rule 1 will be inactive until the transition to the
next generation, the rule 2 will be evicted in the next generation.

There is a penalty during the rule update due to the branch
misprediction in the packet matching framework. But that should be
quickly resolved once the iteration over the commit list that
contain rules that require updates is finished.

Event notification happens once the rule-set update has been
committed. So we skip notifications is case the rule-set update
is aborted, which can happen in case that the rule-set is tested
to apply correctly.

This patch squashed the following patches from Pablo:

* nf_tables: atomic rule updates and dumps
* nf_tables: get rid of per rule list_head for commits
* nf_tables: use per netns commit list
* nfnetlink: add batch support and use it from nf_tables
* nf_tables: all rule updates are transactional
* nf_tables: attach replacement rule after stale one
* nf_tables: do not allow deletion/replacement of stale rules
* nf_tables: remove unused NFTA_RULE_FLAGS
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

0628b123

14 10月, 2013 2 次提交

netfilter: add nftables · 96518518

由 Patrick McHardy 提交于 10月 14, 2013

This patch adds nftables which is the intended successor of iptables.
This packet filtering framework reuses the existing netfilter hooks,
the connection tracking system, the NAT subsystem, the transparent
proxying engine, the logging infrastructure and the userspace packet
queueing facilities.

In a nutshell, nftables provides a pseudo-state machine with 4 general
purpose registers of 128 bits and 1 specific purpose register to store
verdicts. This pseudo-machine comes with an extensible instruction set,
a.k.a. "expressions" in the nftables jargon. The expressions included
in this patch provide the basic functionality, they are:

* bitwise: to perform bitwise operations.
* byteorder: to change from host/network endianess.
* cmp: to compare data with the content of the registers.
* counter: to enable counters on rules.
* ct: to store conntrack keys into register.
* exthdr: to match IPv6 extension headers.
* immediate: to load data into registers.
* limit: to limit matching based on packet rate.
* log: to log packets.
* meta: to match metainformation that usually comes with the skbuff.
* nat: to perform Network Address Translation.
* payload: to fetch data from the packet payload and store it into
  registers.
* reject (IPv4 only): to explicitly close connection, eg. TCP RST.

Using this instruction-set, the userspace utility 'nft' can transform
the rules expressed in human-readable text representation (using a
new syntax, inspired by tcpdump) to nftables bytecode.

nftables also inherits the table, chain and rule objects from
iptables, but in a more configurable way, and it also includes the
original datatype-agnostic set infrastructure with mapping support.
This set infrastructure is enhanced in the follow up patch (netfilter:
nf_tables: add netlink set API).

This patch includes the following components:

* the netlink API: net/netfilter/nf_tables_api.c and
  include/uapi/netfilter/nf_tables.h
* the packet filter core: net/netfilter/nf_tables_core.c
* the expressions (described above): net/netfilter/nft_*.c
* the filter tables: arp, IPv4, IPv6 and bridge:
  net/ipv4/netfilter/nf_tables_ipv4.c
  net/ipv6/netfilter/nf_tables_ipv6.c
  net/ipv4/netfilter/nf_tables_arp.c
  net/bridge/netfilter/nf_tables_bridge.c
* the NAT table (IPv4 only):
  net/ipv4/netfilter/nf_table_nat_ipv4.c
* the route table (similar to mangle):
  net/ipv4/netfilter/nf_table_route_ipv4.c
  net/ipv6/netfilter/nf_table_route_ipv6.c
* internal definitions under:
  include/net/netfilter/nf_tables.h
  include/net/netfilter/nf_tables_core.h
* It also includes an skeleton expression:
  net/netfilter/nft_expr_template.c
  and the preliminary implementation of the meta target
  net/netfilter/nft_meta_target.c

It also includes a change in struct nf_hook_ops to add a new
pointer to store private data to the hook, that is used to store
the rule list per chain.

This patch is based on the patch from Patrick McHardy, plus merged
accumulated cleanups, fixes and small enhancements to the nftables
code that has been done since 2009, which are:

From Patrick McHardy:
* nf_tables: adjust netlink handler function signatures
* nf_tables: only retry table lookup after successful table module load
* nf_tables: fix event notification echo and avoid unnecessary messages
* nft_ct: add l3proto support
* nf_tables: pass expression context to nft_validate_data_load()
* nf_tables: remove redundant definition
* nft_ct: fix maxattr initialization
* nf_tables: fix invalid event type in nf_tables_getrule()
* nf_tables: simplify nft_data_init() usage
* nf_tables: build in more core modules
* nf_tables: fix double lookup expression unregistation
* nf_tables: move expression initialization to nf_tables_core.c
* nf_tables: build in payload module
* nf_tables: use NFPROTO constants
* nf_tables: rename pid variables to portid
* nf_tables: save 48 bits per rule
* nf_tables: introduce chain rename
* nf_tables: check for duplicate names on chain rename
* nf_tables: remove ability to specify handles for new rules
* nf_tables: return error for rule change request
* nf_tables: return error for NLM_F_REPLACE without rule handle
* nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
* nf_tables: fix NLM_F_MULTI usage in netlink notifications
* nf_tables: include NLM_F_APPEND in rule dumps

From Pablo Neira Ayuso:
* nf_tables: fix stack overflow in nf_tables_newrule
* nf_tables: nft_ct: fix compilation warning
* nf_tables: nft_ct: fix crash with invalid packets
* nft_log: group and qthreshold are 2^16
* nf_tables: nft_meta: fix socket uid,gid handling
* nft_counter: allow to restore counters
* nf_tables: fix module autoload
* nf_tables: allow to remove all rules placed in one chain
* nf_tables: use 64-bits rule handle instead of 16-bits
* nf_tables: fix chain after rule deletion
* nf_tables: improve deletion performance
* nf_tables: add missing code in route chain type
* nf_tables: rise maximum number of expressions from 12 to 128
* nf_tables: don't delete table if in use
* nf_tables: fix basechain release

From Tomasz Bursztyka:
* nf_tables: Add support for changing users chain's name
* nf_tables: Change chain's name to be fixed sized
* nf_tables: Add support for replacing a rule by another one
* nf_tables: Update uapi nftables netlink header documentation

From Florian Westphal:
* nft_log: group is u16, snaplen u32

From Phil Oester:
* nf_tables: operational limit match
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

96518518

netfilter: pass hook ops to hookfn · 795aa6ef

由 Patrick McHardy 提交于 10月 10, 2013

Pass the hook ops to the hookfn to allow for generic hook
functions. This change is required by nf_tables.
Signed-off-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

795aa6ef

10 10月, 2013 1 次提交

inet: includes a sock_common in request_sock · 634fb979

由 Eric Dumazet 提交于 10月 09, 2013

TCP listener refactoring, part 5 :

We want to be able to insert request sockets (SYN_RECV) into main
ehash table instead of the per listener hash table to allow RCU
lookups and remove listener lock contention.

This patch includes the needed struct sock_common in front
of struct request_sock

This means there is no more inet6_request_sock IPv6 specific
structure.

Following inet_request_sock fields were renamed as they became
macros to reference fields from struct sock_common.
Prefix ir_ was chosen to avoid name collisions.

loc_port   -> ir_loc_port
loc_addr   -> ir_loc_addr
rmt_addr   -> ir_rmt_addr
rmt_port   -> ir_rmt_port
iif        -> ir_iif
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

634fb979

09 10月, 2013 1 次提交

ipv6: make lookups simpler and faster · efe4208f

由 Eric Dumazet 提交于 10月 03, 2013

TCP listener refactoring, part 4 :

To speed up inet lookups, we moved IPv4 addresses from inet to struct
sock_common

Now is time to do the same for IPv6, because it permits us to have fast
lookups for all kind of sockets, including upcoming SYN_RECV.

Getting IPv6 addresses in TCP lookups currently requires two extra cache
lines, plus a dereference (and memory stall).

inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6

This patch is way bigger than its IPv4 counter part, because for IPv4,
we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
it's not doable easily.

inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr

And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
at the same offset.

We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
macro.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

efe4208f

08 10月, 2013 3 次提交

net: Separate the close_list and the unreg_list v2 · 5cde2829

由 Eric W. Biederman 提交于 10月 05, 2013

Separate the unreg_list and the close_list in dev_close_many preventing
dev_close_many from permuting the unreg_list.  The permutations of the
unreg_list have resulted in cases where the loopback device is accessed
it has been freed in code such as dst_ifdown.  Resulting in subtle memory
corruption.

This is the second bug from sharing the storage between the close_list
and the unreg_list.  The issues that crop up with sharing are
apparently too subtle to show up in normal testing or usage, so let's
forget about being clever and use two separate lists.

v2: Make all callers pass in a close_list to dev_close_many
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5cde2829

net: fix unsafe set_memory_rw from softirq · d45ed4a4

由 Alexei Starovoitov 提交于 10月 04, 2013

on x86 system with net.core.bpf_jit_enable = 1

sudo tcpdump -i eth1 'tcp port 22'

causes the warning:
[   56.766097]  Possible unsafe locking scenario:
[   56.766097]
[   56.780146]        CPU0
[   56.786807]        ----
[   56.793188]   lock(&(&vb->lock)->rlock);
[   56.799593]   <Interrupt>
[   56.805889]     lock(&(&vb->lock)->rlock);
[   56.812266]
[   56.812266]  *** DEADLOCK ***
[   56.812266]
[   56.830670] 1 lock held by ksoftirqd/1/13:
[   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
[   56.849757]
[   56.849757] stack backtrace:
[   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
[   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
[   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
[   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
[   56.923006] Call Trace:
[   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
[   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
[   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
[   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
[   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
[   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
[   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
[   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
[   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
[   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
[   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
[   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
[   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
[   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
[   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
[   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
[   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
[   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
[   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
[   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
[   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
[   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
[   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70

cannot reuse jited filter memory, since it's readonly,
so use original bpf insns memory to hold work_struct

defer kfree of sk_filter until jit completed freeing

tested on x86_64 and i386
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d45ed4a4

netif_set_xps_queue: make cpu mask const · 3573540c

由 Michael S. Tsirkin 提交于 10月 02, 2013

virtio wants to pass in cpumask_of(cpu), make parameter
const to avoid build warnings.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3573540c

04 10月, 2013 1 次提交

inet: consolidate INET_TW_MATCH · 50805466

由 Eric Dumazet 提交于 10月 02, 2013

TCP listener refactoring, part 2 :

We can use a generic lookup, sockets being in whatever state, if
we are sure all relevant fields are at the same place in all socket
types (ESTABLISH, TIME_WAIT, SYN_RECV)

This patch removes these macros :

 inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair

And adds :

 sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr

Then, INET_TW_MATCH() is really the same than INET_MATCH()
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

50805466

03 10月, 2013 3 次提交

include/linux/skbuff.h: move CONFIG_XFRM check inside the skb_sec_path() · 0b3d8e08

由 Denis Kirjanov 提交于 10月 02, 2013

And thus we have only one function definition
Signed-off-by: NDenis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b3d8e08

tc: export tc_defact.h to userspace · 5bc3db5c

由 stephen hemminger 提交于 9月 30, 2013

Jamal sent patch to add tc user simple actions to iproute2
but required header was not being exported.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5bc3db5c

ssb: provide phy address for Gigabit Ethernet driver · 4bcef89f

由 Hauke Mehrtens 提交于 9月 28, 2013

Add a function to provide the phy address which should be used to the
Gigabit Ethernet driver connected to ssb.
Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
Reviewed-by: NNithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4bcef89f

01 10月, 2013 16 次提交

netfilter: nf_ct_sip: consolidate NAT hook functions · 180cf72f

由 holger@eitzenberger.org 提交于 9月 30, 2013

There are currently seven different NAT hooks used in both
nf_conntrack_sip and nf_nat_sip, each of the hooks is exported in
nf_conntrack_sip, then set from the nf_nat_sip NAT helper.

And because each of them is exported there is quite some overhead
introduced due of this.

By introducing nf_nat_sip_hooks I am able to reduce both text/data
somewhat.  For nf_conntrack_sip e. g. I get

        text             data              bss              dec
old    15243             5256               32            20531
new    15010             5192               32            20234
Signed-off-by: NHolger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

180cf72f

ieee80211: fix vht cap definitions · 5eb7906b

由 Eliad Peller 提交于 8月 29, 2013

VHT_CAP_BEAMFORMER_ANTENNAS cap is actually defined in the draft as
VHT_CAP_BEAMFORMEE_STS_MAX, and its size is 3 bits long.

VHT_CAP_SOUNDING_DIMENSIONS is also 3 bits long.

Fix the definitions and change the cap masking accordingly.
Signed-off-by: NEliad Peller <eliad@wizery.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

5eb7906b

skbuff: size of hole is wrong in a comment · 45906723

由 Nicolas Dichtel 提交于 9月 30, 2013

Since commit c93bdd0e ("netvm: allow skb allocation to use PFMEMALLOC
reserves"), hole size is one bit less than what is written in the comment.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45906723

net: skb_is_gso_v6() requires skb_is_gso() · 36a8f39e

由 Eric Dumazet 提交于 9月 29, 2013

bnx2x makes a dangerous use of skb_is_gso_v6().

It should first make sure skb is a gso packet
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Acked-by: NDmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

36a8f39e

mm: avoid reinserting isolated balloon pages into LRU lists · 117aad1e

由 Rafael Aquini 提交于 9月 30, 2013

Isolated balloon pages can wrongly end up in LRU lists when
migrate_pages() finishes its round without draining all the isolated
page list.

The same issue can happen when reclaim_clean_pages_from_list() tries to
reclaim pages from an isolated page list, before migration, in the CMA
path.  Such balloon page leak opens a race window against LRU lists
shrinkers that leads us to the following kernel panic:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
  IP: [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
  PGD 3cda2067 PUD 3d713067 PMD 0
  Oops: 0000 [#1] SMP
  CPU: 0 PID: 340 Comm: kswapd0 Not tainted 3.12.0-rc1-22626-g4367597 #87
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  RIP: shrink_page_list+0x24e/0x897
  RSP: 0000:ffff88003da499b8  EFLAGS: 00010286
  RAX: 0000000000000000 RBX: ffff88003e82bd60 RCX: 00000000000657d5
  RDX: 0000000000000000 RSI: 000000000000031f RDI: ffff88003e82bd40
  RBP: ffff88003da49ab0 R08: 0000000000000001 R09: 0000000081121a45
  R10: ffffffff81121a45 R11: ffff88003c4a9a28 R12: ffff88003e82bd40
  R13: ffff88003da0e800 R14: 0000000000000001 R15: ffff88003da49d58
  FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000067d9000 CR3: 000000003ace5000 CR4: 00000000000407b0
  Call Trace:
    shrink_inactive_list+0x240/0x3de
    shrink_lruvec+0x3e0/0x566
    __shrink_zone+0x94/0x178
    shrink_zone+0x3a/0x82
    balance_pgdat+0x32a/0x4c2
    kswapd+0x2f0/0x372
    kthread+0xa2/0xaa
    ret_from_fork+0x7c/0xb0
  Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 <48> 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
  RIP  [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
   RSP <ffff88003da499b8>
  CR2: 0000000000000028
  ---[ end trace 703d2451af6ffbfd ]---
  Kernel panic - not syncing: Fatal exception

This patch fixes the issue, by assuring the proper tests are made at
putback_movable_pages() & reclaim_clean_pages_from_list() to avoid
isolated balloon pages being wrongly reinserted in LRU lists.

[akpm@linux-foundation.org: clarify awkward comment text]
Signed-off-by: NRafael Aquini <aquini@redhat.com>
Reported-by: NLuiz Capitulino <lcapitulino@redhat.com>
Tested-by: NLuiz Capitulino <lcapitulino@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

117aad1e

netfiler: ipset: Add net namespace for ipset · 1785e8f4

由 Vitaly Lavrov 提交于 9月 30, 2013

This patch adds netns support for ipset.

Major changes were made in ip_set_core.c and ip_set.h.
Global variables are moved to per net namespace.
Added initialization code and the destruction of the network namespace ipset subsystem.
In the prototypes of public functions ip_set_* added parameter "struct net*".

The remaining corrections related to the change prototypes of public functions ip_set_*.

The patch for git://git.netfilter.org/ipset.git commit 6a4ec96c0b8caac5c35474e40e319704d92ca347
Signed-off-by: NVitaly Lavrov <lve@guap.ru>
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>

1785e8f4

J
netfilter: ipset: Use a common function at listing the extensions · 3fd986b3
由 Jozsef Kadlecsik 提交于 9月 25, 2013
```
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
```
3fd986b3

netfilter: ipset: Support comments for ipset entries in the core. · 68b63f08

由 Oliver Smith 提交于 9月 22, 2013

This adds the core support for having comments on ipset entries.

The comments are stored as standard null-terminated strings in
dynamically allocated memory after being passed to the kernel. As a
result of this, code has been added to the generic destroy function to
iterate all extensions and call that extension's destroy task if the set
has that extension activated, and if such a task is defined.
Signed-off-by: NOliver Smith <oliver@8.c.9.b.0.7.4.0.1.0.0.2.ip6.arpa>
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>

68b63f08

J
netfilter: ipset: Support extensions which need a per data destroy function · 40cd63bf
由 Jozsef Kadlecsik 提交于 9月 09, 2013
```
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
```
40cd63bf

netfilter: ipset: Generalize extensions support · 03c8b234

由 Jozsef Kadlecsik 提交于 9月 07, 2013

Get rid of the structure based extensions and introduce a blob for
the extensions. Thus we can support more extension types easily.
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>

03c8b234

netfilter: ipset: Move extension data to set structure · ca134ce8

由 Jozsef Kadlecsik 提交于 9月 07, 2013

Default timeout and extension offsets are moved to struct set, because
all set types supports all extensions and it makes possible to generalize
extension support.
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>

ca134ce8

J
netfilter: ipset: Rename extension offset ids to extension ids · f925f705
由 Jozsef Kadlecsik 提交于 9月 06, 2013
```
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
```
f925f705

netfilter: ipset: Prepare ipset to support multiple networks for hash types · a04d8b6b

由 Jozsef Kadlecsik 提交于 9月 30, 2013

In order to support hash:net,net, hash:net,port,net etc. types,
arrays are introduced for the book-keeping of existing cidr sizes
and network numbers in a set.
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>

a04d8b6b

J
netfilter: ipset: Use fix sized type for timeout in the extension part · b8cd9786
由 Jozsef Kadlecsik 提交于 5月 02, 2013
```
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
```
b8cd9786
J
netfilter: ipset: Rename simple macro names to avoid namespace issues. · 35b8dcf8
由 Jozsef Kadlecsik 提交于 4月 30, 2013
```
Reported-by: NDavid Laight <David.Laight@ACULAB.COM>
Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
```
35b8dcf8

dev: update __dev_notify_flags() to send rtnl msg · a528c219

由 Nicolas Dichtel 提交于 9月 25, 2013

This patch only prepares the next one, there is no functional change.
Now, __dev_notify_flags() can also be used to notify flags changes via
rtnetlink.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a528c219

29 9月, 2013 1 次提交

USBNET: fix handling padding packet · 60e453a9

由 Ming Lei 提交于 9月 23, 2013

Commit 638c5115(USBNET: support DMA SG) introduces DMA SG
if the usb host controller is capable of building packet from
discontinuous buffers, but missed handling padding packet when
building DMA SG.

This patch attachs the pre-allocated padding packet at the
end of the sg list, so padding packet can be sent to device
if drivers require that.
Reported-by: NDavid Laight <David.Laight@aculab.com>
Acked-by: NOliver Neukum <oliver@neukum.org>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

60e453a9

28 9月, 2013 1 次提交

mutex: replace CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX with simple ifdef · 083986e8

由 Heiko Carstens 提交于 9月 28, 2013

Linus suggested to replace

 #ifndef CONFIG_HAVE_ARCH_MUTEX_CPU_RELAX
 #define arch_mutex_cpu_relax() cpu_relax()
 #endif

with just a simple

  #ifndef arch_mutex_cpu_relax
  # define arch_mutex_cpu_relax() cpu_relax()
  #endif

to get rid of CONFIG_HAVE_CPU_RELAX_SIMPLE. So architectures can
simply define arch_mutex_cpu_relax if they want an architecture
specific function instead of having to add a select statement in
their Kconfig in addition.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>

083986e8

27 9月, 2013 2 次提交

[networking]device.h: Remove extern from function prototypes · f629d208

由 Joe Perches 提交于 9月 26, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: NJoe Perches <joe@perches.com>

f629d208

net.h/skbuff.h: Remove extern from function prototypes · 7965bd4d

由 Joe Perches 提交于 9月 26, 2013

There are a mix of function prototypes with and without extern
in the kernel sources.  Standardize on not using extern for
function prototypes.

Function prototypes don't need to be written with extern.
extern is assumed by the compiler.  Its use is as unnecessary as
using auto to declare automatic/local variables in a block.
Signed-off-by: NJoe Perches <joe@perches.com>

7965bd4d

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多