Commits · 284b327be2f86cf751316ff344b6945e580e654f · openeuler / raspberrypi-kernel

11 Nov, 2007 2 commits

[UNIX]: The unix_nr_socks limit can be exceeded · 284b327b

Authored by Pavel Emelyanov on Nov 10, 2007

The unix_nr_socks value is limited to 2 * get_max_files(), as seen in
unix_create1(). However, the check and the actual increment are
separated by a GFP_KERNEL allocation, so this limit can be exceeded
under memory pressure: a task may go to sleep freeing pages while
another task is allowed to allocate a new sock, and so on.

So do the increment before the check (a similar thing is done in
sock_kmalloc) and go to kmalloc after this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

284b327b
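
The ordering fix described in 284b327b can be sketched in userspace C. This is only an illustration under stated assumptions: `nr_socks`, `max_socks`, `reserve_sock_slot()` and `release_sock_slot()` are hypothetical stand-ins for unix_nr_socks and the 2 * get_max_files() limit, and C11 atomics replace the kernel's atomic_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for unix_nr_socks and 2 * get_max_files(). */
static atomic_long nr_socks;
static long max_socks = 4;

/* Count the new socket first, then validate; undo the count on failure. */
static bool reserve_sock_slot(void)
{
    long now = atomic_fetch_add(&nr_socks, 1) + 1;

    if (now > max_socks) {
        atomic_fetch_sub(&nr_socks, 1);
        return false;
    }
    /* A sleeping allocation may follow here: the slot is already counted. */
    return true;
}

/* Give the slot back when the socket is destroyed. */
static void release_sock_slot(void)
{
    long old = atomic_fetch_sub(&nr_socks, 1);

    assert(old > 0);
}
```

Because the slot is counted before anything can sleep, a second task that runs while the first one waits for memory already sees the raised count.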

[AF_UNIX]: Make unix_tot_inflight counter non-atomic · 9305cfa4

Authored by Pavel Emelyanov on Nov 10, 2007

This counter is _always_ modified under the unix_gc_lock spinlock,
so its atomicity can be provided w/o additional efforts.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9305cfa4

01 Nov, 2007 1 commit

[NET]: Forget the zero_it argument of sk_alloc() · 6257ff21

Authored by Pavel Emelyanov on Nov 01, 2007

Finally, the zero_it argument can be completely removed from
the callers and from the function prototype.

Besides, fix the checkpatch.pl warnings about using the
assignments inside if-s.

This patch is rather big, and it is a part of the previous one.
I split it to make the patches more readable. Hope
this particular split helped.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

6257ff21

20 Oct, 2007 1 commit

pid namespaces: changes to show virtual ids to user · b488893a

Authored by Pavel Emelyanov on Oct 18, 2007

This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or read from the user operate on the virtual pids.

The idea is:
 - all in-kernel data structures must store either struct pid itself
   or the pid's global nr, obtained with pid_nr() call;
 - when seeking the task from kernel code with the stored id one
   should use find_task_by_pid() call that works with global pids;
 - when showing pid's numerical value to the user the virtual one
   should be used; however, when one shows task's pid outside this
   task's namespace the global one is to be used;
 - when getting the pid from userspace one needs to consider this as
   the virtual one and use appropriate task/pid-searching functions.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

b488893a
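
The global-versus-virtual distinction from b488893a can be pictured with a toy model. The sketch below is not the kernel's struct pid: `struct toy_pid`, `toy_pid_nr()` and `toy_pid_nr_ns()` are hypothetical names, and namespaces are reduced to nesting levels, whereas real namespaces form a tree.

```c
#include <assert.h>

#define MAX_NS_LEVEL 4

/* Toy model: one id per namespace level, level 0 being the global one. */
struct toy_pid {
    int level;                  /* deepest namespace this pid lives in */
    int numbers[MAX_NS_LEVEL];  /* numbers[0] is the global nr */
};

/* Global id: what in-kernel data structures would store. */
static int toy_pid_nr(const struct toy_pid *pid)
{
    return pid->numbers[0];
}

/* Id as seen from namespace level ns_level; 0 if the pid is not visible. */
static int toy_pid_nr_ns(const struct toy_pid *pid, int ns_level)
{
    assert(ns_level >= 0 && ns_level < MAX_NS_LEVEL);
    return ns_level <= pid->level ? pid->numbers[ns_level] : 0;
}
```

In this model, a task that is pid 1 inside a level-1 namespace still has an ordinary global number, and it is that global number which kernel-side lookups would use.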

15 Oct, 2007 1 commit

sched: affine sync wakeups · 71e20f18

Authored by Ingo Molnar on Oct 15, 2007

make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)
Signed-off-by: Ingo Molnar <mingo@elte.hu>

71e20f18

11 Oct, 2007 3 commits

[NET]: Make core networking code use seq_open_private · cf7732e4

Authored by Pavel Emelyanov on Oct 10, 2007

This concerns the ipv4 and ipv6 code mostly, but also the netlink
and unix sockets.

The netlink code is an example of how to use the __seq_open_private()
call - it saves the net namespace on this private.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf7732e4

[NET]: Make socket creation namespace safe. · 1b8d7ae4

Authored by Eric W. Biederman on Oct 08, 2007

This patch passes in the namespace a new socket should be created in
and has the socket code do the appropriate reference counting.  By
virtue of this all socket create methods are touched.  In addition
the socket create methods are modified so that they will fail if
you attempt to create a socket in a non-default network namespace.

Failing if we attempt to create a socket outside of the default
network namespace ensures that as we incrementally make the network stack
network namespace aware we will not export functionality that someone
has not audited and made certain is network namespace safe.
This allows us to partially enable network namespaces before all of the
exotic protocols are supported.

Any protocol layers I have missed will fail to compile because I now
pass an extra parameter into the socket creation code.

[ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1b8d7ae4
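
The "fail outside the default namespace" rule from 1b8d7ae4 amounts to an early check in each create method. A minimal userspace sketch, assuming a toy `struct toy_net` and a hypothetical `toy_proto_create()` in place of the real struct net and per-protocol create functions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for struct net; init_net models the default namespace. */
struct toy_net {
    int id;
};

static struct toy_net init_net = { 0 };

/* Hypothetical create method of a protocol not yet audited for namespaces. */
static int toy_proto_create(const struct toy_net *net)
{
    assert(net != NULL);
    if (net != &init_net)
        return -EAFNOSUPPORT;   /* refuse sockets outside the default namespace */
    return 0;                   /* the normal creation path would continue here */
}
```

Each protocol can drop its check once it has been audited, which is what lets namespace support be enabled incrementally.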

[NET]: Make /proc/net per network namespace · 457c4cbc

Authored by Eric W. Biederman on Sep 12, 2007

This patch makes /proc/net per network namespace. It modifies the global
variables proc_net and proc_net_stat to be per network namespace.
The proc_net file helpers are modified to take a network namespace argument,
and all of their callers are fixed to pass &init_net for that argument.
This ensures that all of the /proc/net files are only visible and
usable in the initial network namespace until the code behind them
has been updated to handle multiple network namespaces.

Making /proc/net per namespace is necessary as at least some files
in /proc/net depend upon the set of network devices which is per
network namespace, and even more files in /proc/net have contents
that are relevant to a single network namespace.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

457c4cbc

31 Jul, 2007 1 commit

[AF_UNIX]: Make code static. · 13111698

Authored by Adrian Bunk on Jul 30, 2007

The following code can now become static:
- struct unix_socket_table
- unix_table_lock
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

13111698

12 Jul, 2007 1 commit

[AF_UNIX]: Rewrite garbage collector, fixes race. · 1fd05ba5

Authored by Miklos Szeredi on Jul 11, 2007

Throw out the old mark & sweep garbage collector and put in a
refcounting cycle detecting one.

The old one had a race with recvmsg, that resulted in false positives
and hence data loss.  The old algorithm operated on all unix sockets
in the system, so any additional locking would have meant performance
problems for all users of these.

The new algorithm instead only operates on "in flight" sockets, which
are very rare, and the additional locking for these doesn't negatively
impact the vast majority of users.

In fact it's probable, that there weren't *any* heavy senders of
sockets over sockets, otherwise the above race would have been
discovered long ago.

The patch works OK with the app that exposed the race with the old
code.  The garbage collection has also been verified to work in a few
simple cases.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

1fd05ba5

11 Jul, 2007 1 commit

[NET]: Make all initialized struct seq_operations const. · 56b3d975

Authored by Philippe De Muyter on Jul 10, 2007

Make all initialized struct seq_operations in net/ const.
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

56b3d975

08 Jun, 2007 1 commit

[AF_UNIX]: Fix stream recvmsg() race. · 3c0d2f37

Authored by Miklos Szeredi on Jun 05, 2007

A recv() on an AF_UNIX, SOCK_STREAM socket can race with a
send()+close() on the peer, causing recv() to return zero, even though
the sent data should be received.

This happens if the send() and the close() are performed between
skb_dequeue() and checking sk->sk_shutdown in unix_stream_recvmsg():

process A  skb_dequeue() returns NULL, there's no data in the socket queue
process B  new data is inserted onto the queue by unix_stream_sendmsg()
process B  sk->sk_shutdown is set to SHUTDOWN_MASK by unix_release_sock()
process A  sk->sk_shutdown is checked, unix_stream_recvmsg() returns zero

I'm surprised nobody noticed this, it's not hard to trigger.  Maybe
it's just (un)luck with the timing.

It's possible to work around this bug in userspace, by retrying the
recv() once in case of a zero return value.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>

3c0d2f37
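
The userspace workaround mentioned in 3c0d2f37 (retry the recv() once after a zero return) could look roughly like this. `recv_retry_once()` is a hypothetical helper name; only recv() itself is a real API here.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * A zero return may be the race rather than a real end-of-stream,
 * so ask once more before reporting EOF to the caller.
 */
static ssize_t recv_retry_once(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;

    /* With len == 0 a zero return is legitimate and proves nothing. */
    assert(buf != NULL && len > 0);

    n = recv(fd, buf, len, flags);
    if (n == 0)
        n = recv(fd, buf, len, flags);
    return n;
}
```

On a kernel that has the fix the second call is simply never reached with pending data, so the wrapper costs one extra system call only at a genuine end-of-stream.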

04 Jun, 2007 2 commits

[AF_UNIX]: Fix datagram connect race causing an OOPS. · 278a3de5

Authored by David S. Miller on May 31, 2007

Based upon an excellent bug report and initial patch by
Frederik Deweerdt.

The UNIX datagram connect code blindly dereferences other->sk_socket
via the call down to the security_unix_may_send() function.

Without locking 'other' that pointer can go NULL via unix_release_sock()
which does sock_orphan() which also marks the socket SOCK_DEAD.

So we have to lock both 'sk' and 'other' yet avoid all kinds of
potential deadlocks (connect to self is OK for datagram sockets and it
is possible for two datagram sockets to perform a simultaneous connect
to each other).  So what we do is have a "double lock" function similar
to how we handle this situation in other areas of the kernel.  We take
the lock of the socket pointer with the smallest address first in
order to avoid ABBA style deadlocks.

Once we have them both locked, we check to see if SOCK_DEAD is set
for 'other' and if so, drop everything and retry the lookup.
Signed-off-by: David S. Miller <davem@davemloft.net>

278a3de5
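
The address-ordered "double lock" described in 278a3de5 can be sketched outside the kernel. This is an illustration, not the kernel function: `struct toy_sock` and the `toy_*` helpers are hypothetical, and a C11 atomic_flag spinlock stands in for the socket state lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Minimal spinlock built on a C11 atomic_flag; a stand-in for the kernel's. */
struct toy_sock {
    atomic_flag lock;
};

static void toy_lock(struct toy_sock *sk)
{
    while (atomic_flag_test_and_set_explicit(&sk->lock, memory_order_acquire))
        ;   /* spin until the holder releases the lock */
}

static void toy_unlock(struct toy_sock *sk)
{
    atomic_flag_clear_explicit(&sk->lock, memory_order_release);
}

/* Take both locks, lowest address first, so racing callers agree on order. */
static void toy_double_lock(struct toy_sock *a, struct toy_sock *b)
{
    assert(a && b);
    if (a == b) {               /* connect-to-self: only one lock exists */
        toy_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        struct toy_sock *tmp = a;

        a = b;
        b = tmp;
    }
    toy_lock(a);
    toy_lock(b);
}

static void toy_double_unlock(struct toy_sock *a, struct toy_sock *b)
{
    toy_unlock(a);
    if (a != b)
        toy_unlock(b);
}
```

Two sockets connecting to each other at the same time call the helper with swapped arguments, yet both end up taking the lower-addressed lock first, which is what rules out the ABBA deadlock.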

[AF_UNIX]: Make socket locking much less confusing. · 1c92b4e5

Authored by David S. Miller on May 31, 2007

The unix_state_*() locking macros imply that there is some
rwlock kind of thing going on, but the implementation is
actually a spinlock which makes the code more confusing than
it needs to be.

So use plain unix_state_lock and unix_state_unlock.
Signed-off-by: David S. Miller <davem@davemloft.net>

1c92b4e5

09 May, 2007 1 commit

header cleaning: don't include smp_lock.h when not used · e63340ae

Authored by Randy Dunlap on May 08, 2007

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e63340ae

26 Apr, 2007 1 commit

[SK_BUFF]: Introduce skb_reset_transport_header(skb) · badff6d0

Authored by Arnaldo Carvalho de Melo on Mar 13, 2007

For the common, open coded 'skb->h.raw = skb->data' operation, so that we can
later turn skb->h.raw into an offset, reducing the size of struct sk_buff in
64bit land while possibly keeping it as a pointer on 32bit.

This one touches just the most simple cases:

skb->h.raw = skb->data;
skb->h.raw = {skb_push|[__]skb_pull}()

The next ones will handle the slightly more "complex" cases.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

badff6d0
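
The motivation in badff6d0 (hide the open-coded assignment behind a helper so the field can later become an offset) can be illustrated with a toy buffer. `struct toy_skb` and both helpers are hypothetical; the sketch already uses the offset representation that the message names as the future goal.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer: head is the start of the allocation, data the current payload. */
struct toy_skb {
    unsigned char *head;
    unsigned char *data;
    size_t transport_off;   /* offset from head instead of a raw pointer */
};

/* Callers use the helper, so the stored representation can change freely. */
static void toy_reset_transport_header(struct toy_skb *skb)
{
    assert(skb->data >= skb->head);
    skb->transport_off = (size_t)(skb->data - skb->head);
}

/* Rebuild the pointer from the offset when the header is actually needed. */
static unsigned char *toy_transport_header(const struct toy_skb *skb)
{
    return skb->head + skb->transport_off;
}
```

Once every caller goes through such helpers, switching between a pointer and an offset is a change in two small functions rather than across the whole tree.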

07 Mar, 2007 1 commit

[NET]: Revert incorrect accept queue backlog changes. · 64a14651

Authored by David S. Miller on Mar 06, 2007

This reverts two changes:

8488df89
248f0672

A backlog value of N really does mean allow "N + 1" connections
to queue to a listening socket.  This allows one to specify
"0" as the backlog and still get 1 connection.

Noticed by Gerrit Renker and Rick Jones.
Signed-off-by: David S. Miller <davem@davemloft.net>

64a14651
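
The "backlog N admits N + 1 connections" rule restored by 64a14651 follows from a strict greater-than comparison. A toy model, with `struct toy_listener` and its helpers as hypothetical names rather than kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy listener: queued counts connections waiting in the accept queue. */
struct toy_listener {
    unsigned int queued;
    unsigned int max_backlog;   /* value passed to listen() */
};

/* "Full" only once queued exceeds the backlog, so backlog N admits N + 1. */
static bool toy_acceptq_is_full(const struct toy_listener *l)
{
    return l->queued > l->max_backlog;
}

/* Queue one incoming connection, or refuse it when the queue is full. */
static bool toy_enqueue(struct toy_listener *l)
{
    assert(l);
    if (toy_acceptq_is_full(l))
        return false;
    l->queued++;
    return true;
}
```

With `max_backlog` set to 0 the first connection is still queued and only the second is refused; a `>=` test would refuse every connection in that case.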

03 Mar, 2007 1 commit

[AF_UNIX]: Test against sk_max_ack_backlog properly. · 248f0672

Authored by David S. Miller on Mar 02, 2007

This brings things in line with the sk_acceptq_is_full() bug
fix.  The limit test should be x >= sk_max_ack_backlog.
Signed-off-by: David S. Miller <davem@davemloft.net>

248f0672

13 Feb, 2007 1 commit

[PATCH] mark struct file_operations const 8 · da7071d7

Authored by Arjan van de Ven on Feb 12, 2007

Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

da7071d7

11 Feb, 2007 1 commit

[NET] UNIX: Fix whitespace errors. · ac7bfa62

Authored by YOSHIFUJI Hideaki on Feb 09, 2007

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac7bfa62

03 Dec, 2006 1 commit

[NET]: Annotate csum_partial() callers in net/* · 44bb9363

Authored by Al Viro on Nov 14, 2006

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

44bb9363

23 Sep, 2006 2 commits

[AF_UNIX]: Change max_dgram_qlen sysctl to __read_mostly · 18adaf06

Authored by Brian Haley on Aug 31, 2006

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18adaf06

[NET]: Use BUILD_BUG_ON() for checking size of skb->cb. · ef047f5e

Authored by YOSHIFUJI Hideaki on Sep 01, 2006

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ef047f5e

03 Aug, 2006 1 commit

[AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patch · dc49c1f9

Authored by Catherine Zhang on Aug 02, 2006

From: Catherine Zhang <cxzhang@watson.ibm.com>

This patch implements a cleaner fix for the memory leak problem of the
original unix datagram getpeersec patch.  Instead of creating a
security context each time a unix datagram is sent, we only create the
security context when the receiver requests it.

This new design requires modification of the current
unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely,
secid_to_secctx and release_secctx.  The former retrieves the security
context and the latter releases it.  A hook is required for releasing
the security context because it is up to the security module to decide
how that's done.  In the case of SELinux, it's a simple kfree
operation.
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc49c1f9

22 Jul, 2006 1 commit

[NET]: Conversions from kmalloc+memset to k(z|c)alloc. · 0da974f4

Authored by Panagiotis Issaris on Jul 21, 2006

Signed-off-by: Panagiotis Issaris <takis@issaris.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

0da974f4

04 Jul, 2006 2 commits

[AF_UNIX]: datagram getpeersec fix · 882d02d6

Authored by Andrew Morton on Jul 03, 2006

The unix_get_peersec_dgram() stub should have been inlined so that it
disappears.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

882d02d6

[PATCH] lockdep: annotate af_unix locking · a09785a2

Authored by Ingo Molnar on Jul 03, 2006

Teach special (recursive) locking code to the lock validator.  Also splits
af_unix's sk_receive_queue.lock class from the other networking skb-queue
locks.  Has no effect on non-lockdep kernels.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

a09785a2

01 Jul, 2006 1 commit

Remove obsolete #include <linux/config.h> · 6ab3d562

Authored by Jörn Engel on Jun 30, 2006

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>

6ab3d562

30 Jun, 2006 1 commit

[AF_UNIX]: Datagram getpeersec · 877ce7c1

Authored by Catherine Zhang on Jun 29, 2006

This patch implements an API whereby an application can determine the
label of its peer's Unix datagram sockets via the auxiliary data mechanism of
recvmsg.

Patch purpose:

This patch enables a security-aware application to retrieve the
security context of the peer of a Unix datagram socket.  The application
can then use this security context to determine the security context for
processing on behalf of the peer who sent the packet.

Patch design and implementation:

The design and implementation is very similar to the UDP case for INET
sockets.  Basically we build upon the existing Unix domain socket API for
retrieving user credentials.  Linux offers the API for obtaining user
credentials via ancillary messages (i.e., out of band/control messages
that are bundled together with a normal message).  To retrieve the security
context, the application first indicates this to the kernel by
setting the SO_PASSSEC option via setsockopt.  Then the application
retrieves the security context using the auxiliary data mechanism.

An example server application for Unix datagram socket should look like this:

toggle = 1;
toggle_len = sizeof(toggle);

setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, toggle_len);
recvmsg(sockfd, &msg_hdr, 0);
if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) {
    cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr);
    if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) &&
        cmsg_hdr->cmsg_level == SOL_SOCKET &&
        cmsg_hdr->cmsg_type == SCM_SECURITY) {
        memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext));
    }
}

sock_setsockopt is enhanced with a new socket option SO_PASSSEC to allow
a server socket to receive the security context of the peer.

Testing:

We have tested the patch by setting up Unix datagram client and server
applications.  We verified that the server can retrieve the security context
using the auxiliary data mechanism of recvmsg.
Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

877ce7c1

26 Mar, 2006 1 commit

[PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications · f348d70a

Authored by Davide Libenzi on Mar 25, 2006

Implement the half-closed devices notification, by adding a new POLLRDHUP
(and its alias EPOLLRDHUP) bit to the existing poll/select sets.  Since
changing the existing POLLHUP handling, which does not correctly report
half-closed devices, was considered risky, this implementation leaves the
current POLLHUP reporting unchanged and simply adds a new bit that is set in
the few places where it makes sense.  The same thing was discussed and
conceptually agreed quite some time ago:

http://lkml.org/lkml/2003/7/12/116

Since this new event bit is added to the existing Linux poll infrastructure,
even the existing poll/select system calls will be able to use it.  As for
the existing POLLHUP handling, the patch leaves it as is.  The
pollrdhup-2.6.16.rc5-0.10.diff defines the POLLRDHUP for all the existing
archs and sets the bit in the six relevant files.  The other attached diff
is the simple change required to sys/epoll.h to add the EPOLLRDHUP
definition.

There is "a stupid program" to test POLLRDHUP delivery here:

 http://www.xmailserver.org/pollrdhup-test.c

It tests poll(2), but since the delivery is the same, epoll(2) will work equally.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

f348d70a
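
A caller can use the POLLRDHUP bit from f348d70a roughly as follows. This is a minimal sketch for Linux: `peer_half_closed()` and `half_close()` are hypothetical helper names, and the fallback value 0x2000 for POLLRDHUP is an assumption that holds on most, but not all, Linux architectures.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE         /* glibc exposes POLLRDHUP only with this defined */
#endif
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0x2000    /* assumption: value on most Linux architectures */
#endif

/* Sender side: stop writing but keep the receiving half open. */
static int half_close(int fd)
{
    return shutdown(fd, SHUT_WR);
}

/* Receiver side: true if the peer of fd has shut down its writing half. */
static bool peer_half_closed(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLRDHUP };
    int ready;

    assert(fd >= 0);
    ready = poll(&pfd, 1, timeout_ms);
    return ready > 0 && (pfd.revents & POLLRDHUP) != 0;
}
```

POLLHUP would not help here because, as the message notes, it does not report the half-closed state; POLLRDHUP is the bit that distinguishes "peer finished sending" from "no data yet".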

21 Mar, 2006 2 commits

[NET]: sem2mutex part 2 · 57b47a53

Authored by Ingo Molnar on Mar 20, 2006

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

57b47a53

[AF_UNIX]: use shift instead of integer division · e9df7d7f

Authored by Benjamin LaHaise on Mar 20, 2006

The patch below replaces a divide by 2 with a shift -- sk_sndbuf is an
integer, so gcc emits an idiv, which takes 10x longer than a shift by 1.
This improves af_unix bandwidth by ~6-10K/s. Also, tidy up the comment
to fit in 80 columns while we're at it.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e9df7d7f
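
The substitution in e9df7d7f relies on the value being non-negative: for a signed int the compiler cannot turn `/ 2` into a plain shift on its own, because the two round differently for negative inputs. A tiny sketch, with `half_by_shift()` as a hypothetical name:

```c
#include <assert.h>

/* Half of a non-negative signed size, written as a shift as in the patch. */
static int half_by_shift(int sndbuf)
{
    assert(sndbuf >= 0);    /* for negative values >> 1 and / 2 can differ */
    return sndbuf >> 1;
}
```

For every non-negative input the result equals `sndbuf / 2`, so the rewrite changes only the generated instructions, not the computed limit.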

09 Mar, 2006 1 commit

[PATCH] fix file counting · 529bf6be

Authored by Dipankar Sarma on Mar 07, 2006

I have benchmarked this on an x86_64 NUMA system and see no significant
performance difference on kernbench.  Tested on both x86_64 and powerpc.

The way we do file struct accounting is not very suitable for batched
freeing.  For scalability reasons, file accounting was
constructor/destructor based.  This meant that nr_files was decremented
only when the object was removed from the slab cache.  This is susceptible
to slab fragmentation.  With RCU based file structure, consequent batched
freeing and a test program like Serge's, we just speed this up and end up
with a very fragmented slab -

llm22:~ # cat /proc/sys/fs/file-nr
587730  0       758844

At the same time, I see only 2000+ objects in the filp cache.  The following
patch fixes this problem.

This patch changes the file counting by removing the filp_count_lock.
Instead we use a separate percpu counter, nr_files, for now and all
accesses to it are through get_nr_files() api.  In the sysctl handler for
nr_files, we populate files_stat.nr_files before returning to user.

Counting files as and when they are created and destroyed (as opposed to
inside slab) allows us to correctly count open files with RCU.
Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

529bf6be

10 Jan, 2006 1 commit

[PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem · 1b1dcc1b

Authored by Jes Sorensen on Jan 09, 2006

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.
Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

1b1dcc1b

04 Jan, 2006 5 commits

[NET]: Add a dev_ioctl() fallback to sock_ioctl() · b5e5fa5e

Authored by Christoph Hellwig on Jan 03, 2006

Currently all network protocols need to call dev_ioctl as the default
fallback in their ioctl implementations.  This patch adds a fallback
to dev_ioctl to sock_ioctl if the protocol returned -ENOIOCTLCMD.
This way all the protocol ioctl handlers can be simplified and we don't
need to export dev_ioctl.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5e5fa5e
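
The fallback from b5e5fa5e is a small dispatch pattern: ask the protocol first and fall through to the device layer only when the protocol reports that it does not know the command. In the sketch below all `toy_*` names and the command numbers are hypothetical; `TOY_ENOIOCTLCMD` mirrors the kernel-internal ENOIOCTLCMD code (515), which userspace headers do not define.

```c
#include <assert.h>
#include <errno.h>

#define TOY_ENOIOCTLCMD 515     /* kernel-internal code, redefined for userspace */

typedef int (*toy_ioctl_fn)(unsigned int cmd);

/* Hypothetical protocol handler: knows only command 1. */
static int toy_proto_ioctl(unsigned int cmd)
{
    return cmd == 1 ? 0 : -TOY_ENOIOCTLCMD;
}

/* Hypothetical device-level handler: knows only command 2. */
static int toy_dev_ioctl(unsigned int cmd)
{
    return cmd == 2 ? 0 : -EINVAL;
}

/* Generic entry point: ask the protocol first, then fall back to the device. */
static int toy_sock_ioctl(toy_ioctl_fn proto, unsigned int cmd)
{
    int err;

    assert(proto);
    err = proto(cmd);
    if (err == -TOY_ENOIOCTLCMD)
        err = toy_dev_ioctl(cmd);
    return err;
}
```

Keeping the fallback in the generic entry point means a protocol handler only returns the "not mine" code and never needs to call the device layer itself.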

[AF_UNIX]: Convert to use a spinlock instead of rwlock · fd19f329

Authored by Benjamin LaHaise on Jan 03, 2006

From: Benjamin LaHaise <bcrl@kvack.org>

In af_unix, a rwlock is used to protect internal state.  At least on my
P4 with HT it is faster to use a spinlock due to the simpler memory
barrier used to unlock.  This patch raises bw_unix to ~690K/s.
Signed-off-by: David S. Miller <davem@davemloft.net>

fd19f329

[NET]: move struct proto_ops to const · 90ddc4f0

Authored by Eric Dumazet on Dec 22, 2005

I noticed that some of 'struct proto_ops' used in the kernel may share
a cache line used by locks or other heavily modified data. (default
linker alignment is 32 bytes, and L1_CACHE_LINE is 64 or 128 at
least)

This patch makes sure a 'struct proto_ops' can be declared as const,
so that all cpus can share all parts of it without false sharing.

This is not mandatory : a driver can still use a read/write structure
if it needs to (and eventually a __read_mostly)

I made a global substitution to change all existing occurrences to make
them const.

This should reduce the possibility of false sharing on SMP, and
speedup some socket system calls.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

90ddc4f0

[AF_UNIX]: Use spinlock for unix_table_lock · fbe9cc4a

Authored by David S. Miller on Dec 13, 2005

This lock is actually taken mostly as a writer,
so using a rwlock actually just makes performance
worse especially on chips like the Intel P4.
Signed-off-by: David S. Miller <davem@davemloft.net>

fbe9cc4a

[AF_UNIX]: Remove superfluous reference counting in unix_stream_sendmsg · 830a1e5c

Authored by Benjamin LaHaise on Dec 13, 2005

AF_UNIX stream socket performance on P4 CPUs tends to suffer due to a
lot of pipeline flushes from atomic operations. The patch below
removes the sock_hold() and sock_put() in unix_stream_sendmsg(). This
should be safe as the socket still holds a reference to its peer which
is only released after the file descriptor's final user invokes
unix_release_sock(). The only consideration is that we must add a
memory barrier before setting the peer initially.
Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

830a1e5c

09 Nov, 2005 1 commit

[PATCH] add a vfs_permission helper · e4543edd

Authored by Christoph Hellwig on Nov 08, 2005

Most permission() calls have a struct nameidata * available.  This helper
takes that as an argument and thus makes sure we pass it down for lookup
intents and prepares for per-mount read-only support where we need a struct
vfsmount for checking whether a file is writeable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

e4543edd