- 01 Aug 2012, 1 commit
-
-
By Andrew Morton
Sanity:

    CONFIG_CGROUP_MEM_RES_CTLR              -> CONFIG_MEMCG
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP         -> CONFIG_MEMCG_SWAP
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
    CONFIG_CGROUP_MEM_RES_CTLR_KMEM         -> CONFIG_MEMCG_KMEM

[mhocko@suse.cz: fix missed bits]
Cc: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
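
A purely mechanical rename; as an illustration (hypothetical guard, not a hunk from the patch), conditional code was updated like so:

    /* illustrative only -- code guarded by the old symbol name ... */
    #ifdef CONFIG_CGROUP_MEM_RES_CTLR_KMEM
    /* kmem accounting bits */
    #endif

    /* ... now uses the new, shorter name */
    #ifdef CONFIG_MEMCG_KMEM
    /* kmem accounting bits */
    #endif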
-
- 23 Jul 2012, 1 commit
-
-
By John Fastabend
Instead of updating the sk_cgrp_prioidx struct field on every send, this only updates the field when a task is moved via the cgroup infrastructure. This allows sockets that may be used by a kernel worker thread to be managed.

For example, in the iscsi case today a user can put iscsid in a netprio cgroup and control traffic will be sent with the correct sk_cgrp_prioidx value set, but as soon as data is sent the kernel worker thread issues a send and sk_cgrp_prioidx is updated with the kernel worker thread's value, which is the default case.

It seems more correct to only update the field when the user explicitly sets it via the control group infrastructure. This allows users to manage sockets that may be used with other threads.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 12 Jul 2012, 1 commit
-
-
By Eric Dumazet
This introduces TSQ (TCP Small Queues).

TSQ's goal is to reduce the number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit, allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given time.

TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use. As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2: it can help to reduce latencies of high prio packets, having smaller TSO packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan() [2] is called for the already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender:

    < 1ms on Gbit (instead of 50ms with TSO)
    < 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and both sides' socket autotuning no longer use 4 MBytes.

As the skb destructor cannot restart xmit itself (as the qdisc lock might be taken at this point), we delegate the work to a tasklet. We use one tasklet per cpu for performance reasons. If the tasklet finds a socket owned by the user, it sets the TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments.

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
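
The core throttle can be sketched in a few lines. This is a simplified illustration, not the mainline code; TSQ_THROTTLED/tsq_flags here stand in for the patch's deferral machinery described above:

    /* Sketch only: stop queueing new segments once the bytes sitting in
     * qdisc/device queues (tracked by sk_wmem_alloc) reach the limit. */
    static bool tcp_tsq_throttled(struct sock *sk)
    {
            unsigned int limit = sysctl_tcp_limit_output_bytes; /* [1] above */

            if (atomic_read(&sk->sk_wmem_alloc) > limit) {
                    /* remember we were throttled; tcp_wfree() will
                     * reschedule transmission as queued skbs are orphaned */
                    set_bit(TSQ_THROTTLED, &tcp_sk(sk)->tsq_flags);
                    return true;
            }
            return false;
    }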
-
- 20 Jun 2012, 1 commit
-
-
By David S. Miller
Input packet processing for local sockets involves two major demuxes: one for the route and one for the socket. But we can optimize this down to one demux for certain kinds of local sockets.

Currently we only do this for established TCP sockets, but it could at least in theory be expanded to other kinds of connections.

If a TCP socket is established then its identity is fully specified. This means that whatever input route was used during the three-way handshake must work equally well for the rest of the connection, since the keys will not change.

Once we move to established state, we cache the receive packet's input route to use later. Like the existing cached route in sk->sk_dst_cache used for output packets, we have to check for route invalidations using dst->obsolete and dst->ops->check().

Early demux occurs outside of a socket locked section, so when a route invalidation occurs we defer the fixup of sk->sk_rx_dst until we are actually inside of established state packet processing and thus have the socket locked.

Signed-off-by: David S. Miller <davem@davemloft.net>
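
A minimal sketch of the validation described above, assuming the sk_rx_dst field from this patch (simplified; the deferred-fixup path is omitted):

    /* Sketch: use the cached input route only if it is still valid. */
    static struct dst_entry *sk_rx_dst_get_checked(struct sock *sk)
    {
            struct dst_entry *dst = sk->sk_rx_dst;

            if (dst && dst->obsolete && dst->ops->check(dst, 0) == NULL)
                    return NULL;    /* invalidated; redo the route demux */

            return dst;
    }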
-
- 01 Jun 2012, 1 commit
-
-
By Jason Wang
We need to validate the number of pages consumed by data_len, otherwise the frags array could be overflowed by userspace. So this patch validates data_len and returns -EMSGSIZE when data_len would occupy more frags than MAX_SKB_FRAGS.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
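
The check reduces to a page-count comparison; a sketch of the idea (a fragment, assuming it runs inside sock_alloc_send_pskb() where err and the failure label exist):

    /* Sketch: reject requests whose paged part needs more page
     * fragments than the skb frags array can hold. */
    int npages = (data_len + (PAGE_SIZE - 1)) >> PAGE_SHIFT;

    err = -EMSGSIZE;
    if (npages > MAX_SKB_FRAGS)
            goto failure;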
-
- 17 May 2012, 2 commits
-
-
By Joe Perches
Use the current logging style. This enables use of dynamic debugging as well.

Convert printk(KERN_<LEVEL> to pr_<level>. Add pr_fmt. Remove embedded prefixes, use %s, __func__ instead.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
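
An illustrative before/after of such a conversion (hypothetical message, not a specific hunk from the patch):

    /* before: embedded prefix and hardcoded function name */
    printk(KERN_WARNING "sock: sock_alloc: out of memory\n");

    /* after: pr_fmt() (defined once at the top of the file, before any
     * #include) supplies the prefix, and __func__ replaces the name;
     * pr_debug() calls additionally become dynamic-debug controllable */
    #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

    pr_warn("%s: out of memory\n", __func__);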
-
By Eric Dumazet
- sock_flag() accepts a const pointer
- sock_flag() returns a boolean

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
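
The resulting helper presumably looks like this (a sketch consistent with the two bullets above):

    static inline bool sock_flag(const struct sock *sk, enum sock_flags flag)
    {
            return test_bit(flag, &sk->sk_flags);
    }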
-
- 09 May 2012, 1 commit
-
-
By Hans Schillstrom
To build ip_vs as a module, sysctl_rmem_max and sysctl_wmem_max need to be exported. The dependency was added by the "ipvs: wakeup master thread" patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
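
The fix amounts to two export lines in net/core/sock.c (sketch):

    EXPORT_SYMBOL(sysctl_wmem_max);
    EXPORT_SYMBOL(sysctl_rmem_max);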
-
- 03 May 2012, 2 commits
-
-
By Eric W. Biederman
These functions are no longer needed; replace them with their more useful equivalents.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
-
By David S. Miller
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 May 2012, 1 commit
-
-
By Eric Dumazet
TCP or UDP stacks have big enough latencies that prefetching the next pointer is worth it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
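
This presumably targets the backlog walk; a sketch of the pattern (simplified from what __release_sock()-style processing looks like):

    /* Sketch: kick off the cache fetch of the next skb while the
     * current one is still being processed by the protocol handler. */
    do {
            struct sk_buff *next = skb->next;

            prefetch(next);
            skb->next = NULL;
            sk_backlog_rcv(sk, skb);
            skb = next;
    } while (skb != NULL);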
-
- 29 Apr 2012, 1 commit
-
-
By Jeffrin Jose
Fixed a coding style issue relating to spaces in net/core/sock.c.

Signed-off-by: Jeffrin Jose <ahiliation@yahoo.co.in>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 27 Apr 2012, 1 commit
-
-
By Eric Dumazet
Use min_t()/max_t() macros, reformat two comments, use !!test_bit() to match !!sock_flag().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
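
For illustration, the kind of clamp this style uses (the typical sock_setsockopt() pattern; treat the exact lines as a sketch):

    /* self-documenting, type-safe clamps instead of open-coded casts */
    val = min_t(u32, val, sysctl_wmem_max);
    sk->sk_sndbuf = max_t(u32, val * 2, SOCK_MIN_SNDBUF);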
-
- 24 Apr 2012, 1 commit
-
-
By Eric Dumazet
sk_add_backlog() & sk_rcvqueues_full() hard-coded sk_rcvbuf as the memory limit. We need to make this limit a parameter for TCP use.

No functional change expected in this patch; all callers still use the old sk_rcvbuf limit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
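
After the change, the helper presumably takes the limit explicitly; a sketch:

    /* Sketch: callers now pass the limit (still sk->sk_rcvbuf everywhere
     * in this patch), so TCP can later pass a larger value. */
    static inline __must_check int
    sk_add_backlog(struct sock *sk, struct sk_buff *skb, unsigned int limit)
    {
            if (sk_rcvqueues_full(sk, skb, limit))
                    return -ENOBUFS;

            __sk_add_backlog(sk, skb);
            sk->sk_backlog.len += skb->truesize;
            return 0;
    }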
-
- 22 Apr 2012, 1 commit
-
-
By Pavel Emelyanov
Name them in a "backward compatible" manner, i.e. reuse or not are still 1 and 0 respectively. The reuse value of 2 means that the socket with it will forcibly reuse everyone else's port.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
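
If memory serves, the named constants are these (values preserved for backward compatibility; treat the exact identifiers as an assumption):

    /* sk->sk_reuse values */
    enum {
            SK_NO_REUSE     = 0,    /* was 0: no reuse */
            SK_CAN_REUSE    = 1,    /* was 1: reuse allowed */
            SK_FORCE_REUSE  = 2,    /* new: forcibly reuse anyone's port */
    };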
-
- 16 Apr 2012, 1 commit
-
-
By Eric Dumazet
Use of "unsigned int" is preferred to bare "unsigned" in the net tree.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Apr 2012, 1 commit
-
-
By Glauber Costa
The only reason cgroup was used was to be consistent with the populate() interface. Now that we're getting rid of it, not only do we no longer need it, but we also *can't* call it this way.

Since we will no longer rely on populate(), this will be called from create(). During create, the association between struct mem_cgroup and struct cgroup does not yet exist, since cgroup internals haven't yet initialized the bookkeeping. This means we would not be able to draw the memcg pointer from the cgroup pointer in these functions, which is highly undesirable.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
CC: Li Zefan <lizefan@huawei.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Michal Hocko <mhocko@suse.cz>
-
- 29 Mar 2012, 1 commit
-
-
By David Howells
Remove all #inclusions of asm/system.h preparatory to splitting and killing it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

Signed-off-by: David Howells <dhowells@redhat.com>
-
- 25 Feb 2012, 1 commit
-
-
By David S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Feb 2012, 2 commits
-
-
By Ben Greear
This is useful for testing RX handling of frames with bad CRCs. Requires driver support to actually put the packet on the wire properly.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
By Ingo Molnar
static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]().

So here's a boot-tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels.

Typical usage scenarios:

    #include <linux/static_key.h>

    struct static_key key = STATIC_KEY_INIT_TRUE;

    if (static_key_false(&key))
            do unlikely code
    else
            do likely code

Or:

    if (static_key_true(&key))
            do likely code
    else
            do unlikely code

The static key is modified via:

    static_key_slow_inc(&key);
    ...
    static_key_slow_dec(&key);

The 'slow' prefix makes it abundantly clear that this is an expensive operation.

I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit.

On non-jump-label enabled architectures static keys default to likely()/unlikely() branches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 22 Feb 2012, 1 commit
-
-
By Pavel Emelyanov
This one specifies where to start MSG_PEEK-ing queue data from. When set to a negative value, MSG_PEEK works as usual -- it always peeks from the head of the queue.

When some bytes are peeked from the queue and the peeking offset is non-negative, it is moved forward so that the next peek will return the next portion of data. When a non-peeking recvmsg occurs and the peeking offset is non-negative, it is moved backward so that the next peek will still peek the proper data (i.e. the data that would have been picked if there were no non-peeking recv in between).

The offset is set using a per-proto operation, to let the protocol handle the locking issues and to check whether the peeking offset feature is supported by the protocol the socket belongs to.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
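
A minimal userspace sketch of the semantics, assuming the SO_PEEK_OFF socket option this series wires up (error handling omitted):

    int off = 0;
    char buf[100];

    /* enable the peek offset, starting at 0 */
    setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &off, sizeof(off));

    recv(fd, buf, 100, MSG_PEEK);   /* peeks bytes 0..99, offset -> 100    */
    recv(fd, buf, 100, MSG_PEEK);   /* peeks bytes 100..199, offset -> 200 */
    recv(fd, buf, 50, 0);           /* consumes bytes 0..49, offset -> 150 */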
-
- 11 Feb 2012, 1 commit
-
-
By Neil Horman
When the netprio_cgroup module is not loaded, net_prio_subsys_id is -1, and so sock_update_prioidx() accesses the cgroup_subsys array with a negative index, subsys[-1]. Make the code resemble the cls_cgroup code, which is bug-free.

Originally-authored-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
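
The shape of the fix, sketched (simplified; the helper and field names follow the netprio patches above, but treat the exact body as an assumption):

    /* Sketch: bail out while the netprio subsystem is unregistered,
     * instead of indexing subsys[] with -1. */
    void sock_update_netprioidx(struct sock *sk)
    {
            if (net_prio_subsys_id < 0)
                    return;         /* module not loaded: nothing to do */

            /* safe now: look up the task's prioidx and store it */
            sk->sk_cgrp_prioidx = task_netprioidx(current);
    }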
-
- 03 Feb 2012, 1 commit
-
-
By Li Zefan
The argument is not used at all, and it's not necessary, because a specific callback handler of course knows which subsys it belongs to. Now only ->populate() takes this argument, because the handlers of this callback always call cgroup_add_file()/cgroup_add_files().

So we reduce a few lines of code, though the shrinking of object size is minimal.

    16 files changed, 113 insertions(+), 162 deletions(-)

       text    data     bss      dec    hex filename
    5486240  656987 7039960 13183187 c928d3 vmlinux.o.orig
    5486170  656987 7039960 13183117 c9288d vmlinux.o

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 23 Jan 2012, 1 commit
-
-
By Glauber Costa
There is a case in __sk_mem_schedule() where an allocation is beyond the maximum, but yet we are allowed to proceed. It happens under the following condition:

    sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case, meaning that at some point later it'll try to do it. Since this is never communicated to the underlying res_counter code, there is an imbalance in the res_counter uncharge operation.

I see two ways of fixing this:

1) storing the information about those allocations somewhere in memcg, and then deducting from that first, before we start draining the res_counter,
2) providing a slightly different allocation function for the res_counter, that matches the original behavior of the network code more closely.

I decided to go for #2 here, believing it to be more elegant, since #1 would require us to do basically that, but in a more obscure way.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
CC: Tejun Heo <tj@kernel.org>
CC: Li Zefan <lizf@cn.fujitsu.com>
CC: Laurent Chavey <chavey@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
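
Option #2 suggests a charge variant that may overshoot; a sketch of what such an interface looks like (the _nofail name is an assumption here):

    /* Sketch: like res_counter_charge(), but never fails -- the counter
     * may temporarily exceed its limit and is repaired by the later
     * uncharge the network code is guaranteed to issue. */
    int res_counter_charge_nofail(struct res_counter *counter,
                                  unsigned long val,
                                  struct res_counter **limit_fail_at);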
-
- 10 Jan 2012, 1 commit
-
-
By David S. Miller
    > net/core/sock.c: In function 'sk_update_clone':
    > net/core/sock.c:1278:3: error: implicit declaration of function 'sock_update_memcg'

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Jan 2012, 1 commit
-
-
By Stephen Rothwell
so move it there. Fixes build errors when CONFIG_INET is not defined:

    In file included from include/linux/tcp.h:211:0,
                     from include/linux/ipv6.h:221,
                     from include/net/ipv6.h:16,
                     from include/linux/sunrpc/clnt.h:26,
                     from include/linux/nfs_fs.h:50,
                     from init/do_mounts.c:20:
    include/net/sock.h: In function 'sk_update_clone':
    include/net/sock.h:1109:3: error: implicit declaration of function 'sock_update_memcg' [-Werror=implicit-function-declaration]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 Jan 2012, 1 commit
-
-
By Glauber Costa
Sockets can also be created through sk_clone. Because it copies all data in the sock structure, it also copies the memcg-related pointer, and all should be fine. However, since we now use reference counts in socket creation, we are left with some sockets that have no reference counts. It matters when we destroy them, since it leads to a mismatch.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: David S. Miller <davem@davemloft.net>
CC: Greg Thelen <gthelen@google.com>
CC: Hiroyuki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: Laurent Chavey <chavey@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 23 Dec 2011, 1 commit
-
-
By Eric Dumazet
skb->truesize might be big even for a small packet. It's even bigger after commit 87fb4b7b ("net: more accurate skb truesize") and with a big MTU. We should allow queueing at least one packet per receiver, even with a low RCVBUF setting.

Reported-by: Michal Simek <monstr@monstr.eu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
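
A rough sketch of the relaxed admission test implied here (not the exact mainline diff): drop only when the queue already holds data, so the first packet always fits:

    /* Sketch: a lone skb is always accepted, however big its truesize;
     * the rcvbuf limit only applies once something is already queued. */
    if (atomic_read(&sk->sk_rmem_alloc) != 0 &&
        atomic_read(&sk->sk_rmem_alloc) + skb->truesize >
        (unsigned int)sk->sk_rcvbuf)
            return -ENOMEM;         /* caller drops the packet */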
-
- 17 Dec 2011, 1 commit
-
-
By Glauber Costa
We can't scan the proto_list to initialize sock cgroups, as it holds a rwlock, and we also want to keep the code generic enough to avoid calling the initialization functions of protocols directly.

Convert proto_list_lock into a mutex, so we can sleep and do the necessary allocations. This lock is seldom taken, so there shouldn't be any performance penalties associated with that.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Hiroyuki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Stephen Rothwell <sfr@canb.auug.org.au>
CC: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Dec 2011, 3 commits
-
-
By Glauber Costa
This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in the cg_proto struct.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Glauber Costa
The goal of this work is to move the memory pressure tcp controls to a cgroup, instead of just relying on global conditions.

To avoid excessive overhead in the network fast paths, the code that accounts allocated memory to a cgroup is hidden inside a static_branch(). This branch is patched out until the first non-root cgroup is created. So when nobody is using cgroups, even if it is mounted, no significant performance penalty should be seen.

This patch handles the generic part of the code, and has nothing tcp-specific.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Kirill A. Shutemov <kirill@shutemov.name>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
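
The fast-path guard presumably reduces to something like this (sketch; the key name follows this memcg socket series, but treat it as an assumption):

    /* Sketch: a no-op branch until the first non-root memcg exists. */
    if (static_branch(&memcg_socket_limit_enabled))
            sock_update_memcg(sk);  /* cgroup accounting slow path */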
-
By Glauber Costa
This patch replaces all uses of the struct sock fields memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem with accessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in.

Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: Hiroyuki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com>
CC: David S. Miller <davem@davemloft.net>
CC: Eric W. Biederman <ebiederm@xmission.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
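
One accessor, sketched to show the shape (names follow this series, but the body is a simplified assumption):

    /* Sketch: the same call site works with or without a memcg. */
    static inline long sk_memory_allocated(const struct sock *sk)
    {
            struct proto *prot = sk->sk_prot;

            if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
                    return memcg_memory_allocated_read(sk->sk_cgrp);

            return atomic_long_read(prot->memory_allocated);
    }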
-
- 29 Nov 2011, 1 commit
-
-
By Eric Dumazet
We can test/set multiple bits from sk_flags at once, to shorten the socket setup/dismantle phases a bit.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
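
An illustrative sketch of the pattern (the mask here is hypothetical; the point is that one load tests several flags):

    /* one AND against a mask instead of several sock_flag() tests */
    #define SK_FLAGS_TIMESTAMP \
            ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))

    if (sk->sk_flags & SK_FLAGS_TIMESTAMP)
            net_disable_timestamp();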
-
- 23 Nov 2011, 1 commit
-
-
By Neil Horman
This patch adds in the infrastructure code to create the network priority cgroup. The cgroup, in addition to the standard processes file, creates two control files:

1) prioidx - This is a read-only file that exports the index of this cgroup. This is a value that is both arbitrary and unique to a cgroup in this subsystem, and is used to index the per-device priority map.

2) priomap - This is a writeable file. On read it reports a table of 2-tuples <name:priority>, where name is the name of a network interface and priority indicates the priority assigned to frames egressing on the named interface and originating from a pid in this cgroup.

This cgroup allows for skb priority to be set prior to a root qdisc getting selected. This is beneficial for DCB-enabled systems, in that it allows any application to use DCB-configured priorities without application modification.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
CC: Robert Love <robert.w.love@intel.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Nov 2011, 1 commit
-
-
By Johannes Berg
The 802.1X EAPOL handshake hostapd does requires knowing whether the frame was ack'ed by the peer. Currently, we fudge this pretty badly by not even transmitting the frame as a normal data frame but injecting it with radiotap and getting the status out of radiotap monitor as well. This is rather complex, confuses users (mon.wlan0 presence) and doesn't work with all hardware.

To get rid of that hack, introduce a real wifi TX status option for data frame transmissions. This works similarly to the existing TX timestamping in that it reflects the SKB back to the socket's error queue with a SCM_WIFI_STATUS cmsg that has an int indicating ACK status (0/1).

Since it is possible that at some point we will want to have TX timestamping and wifi status in a single errqueue SKB (there's little point in not doing that), redefine SO_EE_ORIGIN_TIMESTAMPING to SO_EE_ORIGIN_TXSTATUS, which can collect more than just the timestamp; keep the old constant as an alias of course. Currently the internal APIs don't make that possible, but it wouldn't be hard to split them up in a way that makes it possible.

Thanks to Neil Horman for helping me figure out the functions that add the control messages.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
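
A minimal userspace sketch of consuming the status (standard cmsg iteration; assumes the SCM_WIFI_STATUS constant this patch adds):

    struct msghdr msg = { 0 };  /* iov, name and cmsg buffer set up as usual */
    struct cmsghdr *cmsg;

    if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
            return;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET &&
                cmsg->cmsg_type == SCM_WIFI_STATUS) {
                    int acked = *(int *)CMSG_DATA(cmsg); /* 0 = lost, 1 = ack'ed */
            }
    }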
-
- 09 Nov 2011, 1 commit
-
-
By Eric Dumazet
Make clear that sk_clone() and inet_csk_clone() return a locked socket. Add a _lock() suffix and kerneldoc.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 26 Oct 2011, 1 commit
-
-
By Thomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 14 Oct 2011, 1 commit
-
-
By Eric Dumazet
skb truesize currently accounts for the sk_buff struct and part of the skb head. kmalloc() roundings are also ignored. Considering that skb_shared_info is larger than sk_buff, it's time to take it into account for better memory accounting.

This patch introduces the SKB_TRUESIZE(X) macro to centralize various assumptions into a single place.

At skb alloc phase, we put the skb_shared_info struct at the exact end of the skb head, to allow a better use of memory (lowering the number of reallocations), since kmalloc() gives us power-of-two memory blocks. Unless SLUB/SLUB debug is active, both skb->head and skb_shared_info are aligned to cache lines, as before.

Note: This patch might trigger performance regressions because of misconfigured protocol stacks, hitting per-socket or global memory limits that were previously not reached. But it's a necessary step for a more accurate memory accounting.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andi Kleen <ak@linux.intel.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
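
The macro, sketched as it appears in include/linux/skbuff.h from around this change (reproduced from memory, so verify against the tree):

    /* truesize of an skb whose data buffer holds X bytes: the buffer
     * plus the aligned sk_buff and skb_shared_info overheads */
    #define SKB_TRUESIZE(X) ((X) +                                          \
                             SKB_DATA_ALIGN(sizeof(struct sk_buff)) +       \
                             SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))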
-
- 08 Oct 2011, 1 commit
-
-
By Johannes Berg
There's no point in open-coding sock_valbool_flag().

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
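
For reference, the helper being used instead (a sketch matching its include/net/sock.h definition):

    static inline void sock_valbool_flag(struct sock *sk, int bit, int valbool)
    {
            if (valbool)
                    sock_set_flag(sk, bit);
            else
                    sock_reset_flag(sk, bit);
    }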
-