Commits · adb37c4c67f807f16beb222028fb3ce9a354dc2b · openanolis / cloud-kernel

12 Jul, 2012 1 commit

ipv4: Add redirect support to all protocol icmp error handlers. · 55be7a9c

Authored by David S. Miller on Jul 11, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

15 Jun, 2012 1 commit

ipv4: Handle PMTU in all ICMP error handlers. · 36393395

Authored by David S. Miller on Jun 14, 2012

With ip_rt_frag_needed() removed, we have to explicitly update PMTU
information in every ICMP error handler.

Create two helper functions to facilitate this.

1) ipv4_sk_update_pmtu()

   This updates the PMTU when we have a socket context to
   work with.

2) ipv4_update_pmtu()

   Raw version, used when no socket context is available.  For this
   interface, we essentially just pass in explicit arguments for
   the flow identity information we would have extracted from the
   socket.

   And you'll notice that ipv4_sk_update_pmtu() is simply implemented
   in terms of ipv4_update_pmtu()

Note that __ip_route_output_key() is used, rather than something like
ip_route_output_flow() or ip_route_output_key().  This is because we
absolutely do not want to end up with a route that does IPSEC
encapsulation and the like.  Instead, we only want the route that
would get us to the node described by the outermost IP header.
Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
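The relationship between the two helpers described in this commit can be pictured with a small user-space model. This is only a sketch under stated assumptions: `struct flow_id`, `struct sock_model`, `struct route_model`, `update_pmtu()` and `sk_update_pmtu()` are hypothetical stand-ins and do not reproduce the kernel's types or signatures, and the rule of only ever lowering the recorded MTU is an assumption of the model.

```c
#include <stdint.h>

/* Hypothetical flow identity: the values a socket would otherwise supply. */
struct flow_id {
    int      oif;       /* bound output interface, 0 if none */
    uint32_t mark;      /* packet mark */
    uint8_t  protocol;  /* IP protocol number */
};

/* Hypothetical socket context carrying the same identity. */
struct sock_model {
    struct flow_id id;
};

/* Stand-in for the route found from the outermost IP header. */
struct route_model {
    uint32_t       pmtu;  /* 0 means "nothing recorded yet" */
    struct flow_id last;  /* identity used for the last update */
};

/* "Raw" variant: every piece of flow identity is an explicit argument. */
void update_pmtu(struct route_model *rt, uint32_t mtu,
                 int oif, uint32_t mark, uint8_t protocol)
{
    rt->last.oif = oif;
    rt->last.mark = mark;
    rt->last.protocol = protocol;

    /* Model assumption: the recorded path MTU is only ever lowered. */
    if (rt->pmtu == 0 || mtu < rt->pmtu)
        rt->pmtu = mtu;
}

/* Socket variant: pull the identity out of the socket, then delegate. */
void sk_update_pmtu(struct route_model *rt,
                    const struct sock_model *sk, uint32_t mtu)
{
    update_pmtu(rt, mtu, sk->id.oif, sk->id.mark, sk->id.protocol);
}
```

The point of the layering is that an ICMP error handler without a socket can still call the raw variant with whatever identity it parsed from the packet.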

16 Apr, 2012 1 commit

ipv4: fix checkpatch errors · 5e73ea1a

Authored by Daniel Baluta on Apr 15, 2012

Fix checkpatch errors of the following type:
	* ERROR: "foo * bar" should be "foo *bar"
	* ERROR: "(foo*)" should be "(foo *)"
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

12 Mar, 2012 1 commit

net: Convert printks to pr_<level> · 058bd4d2

Authored by Joe Perches on Mar 11, 2012

Use a more current kernel messaging style.

Convert a printk block to print_hex_dump.
Coalesce formats, align arguments.
Use %s, __func__ instead of embedding function names.

Some messages that were prefixed with <foo>_close are
now prefixed with <foo>_fini.  Some ah4 and esp messages
are now not prefixed with "ip ".

The intent of this patch is to later add something like
  #define pr_fmt(fmt) "IPv4: " fmt.
to standardize the output messages.

Text size is trivially reduced. (x86-32 allyesconfig)

$ size net/ipv4/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
 887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
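The `pr_fmt` idea mentioned in this commit can be tried out in user space. The following is a minimal sketch, not kernel code: `model_pr_info()` and `raw_model_fini()` are hypothetical names, the macro formats into a caller-supplied buffer instead of the kernel log, and `##__VA_ARGS__` is a GNU extension (accepted by GCC and Clang).

```c
#include <stdio.h>

/* The prefix macro quoted in the commit message. */
#define pr_fmt(fmt) "IPv4: " fmt

/*
 * Hypothetical user-space stand-in for a pr_<level>() call. String literal
 * concatenation glues the prefix onto the format at compile time.
 */
#define model_pr_info(buf, len, fmt, ...) \
    snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* Uses "%s" with __func__ rather than embedding the function name. */
int raw_model_fini(char *buf, size_t len, int pending)
{
    return model_pr_info(buf, len, "%s: %d sockets still pending\n",
                         __func__, pending);
}
```

Because the prefix lives in one macro, every message in a file picks it up without repeating it in each format string.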

09 Feb, 2012 1 commit

ipv4: Implement IP_UNICAST_IF socket option. · 76e21053

Authored by Erich E. Hoover on Feb 08, 2012

The IP_UNICAST_IF feature is needed by the Wine project.  This patch
implements the feature by setting the outgoing interface in a similar
fashion to that of IP_MULTICAST_IF.  A separate option is needed to
handle this feature since the existing options do not provide all of
the characteristics required by IP_UNICAST_IF; a summary is provided
below.

SO_BINDTODEVICE:
* SO_BINDTODEVICE requires administrative privileges, IP_UNICAST_IF
does not.  From reading some old mailing list articles my
understanding is that SO_BINDTODEVICE requires administrative
privileges because it can override the administrator's routing
settings.
* The SO_BINDTODEVICE option restricts both outbound and inbound
traffic, IP_UNICAST_IF only impacts outbound traffic.

IP_PKTINFO:
* Since IP_PKTINFO and IP_UNICAST_IF are independent options,
implementing IP_UNICAST_IF with IP_PKTINFO will likely break some
applications.
* Implementing IP_UNICAST_IF on top of IP_PKTINFO significantly
complicates the Wine codebase and reduces the socket performance
(doing this requires a lot of extra communication between the
"server" and "user" layers).

bind():
* bind() does not work on broadcast packets, IP_UNICAST_IF is
specifically intended to work with broadcast packets.
* Like SO_BINDTODEVICE, bind() restricts both outbound and inbound
traffic.
Signed-off-by: Erich E. Hoover <ehoover@mines.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
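From an application's point of view, using the option might look like the sketch below. It assumes a Linux system; `unicast_if_optval()` and `set_unicast_if()` are hypothetical helper names, the fallback value 50 for `IP_UNICAST_IF` is taken from the Linux UAPI headers, and passing the interface index in network byte order reflects the Linux implementation as I understand it, so check it against your kernel's documentation.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef IP_UNICAST_IF
#define IP_UNICAST_IF 50  /* assumed fallback if the libc headers lack it */
#endif

/* The option value is the interface index in network byte order. */
uint32_t unicast_if_optval(unsigned int ifindex)
{
    return htonl(ifindex);
}

/*
 * Ask the kernel to send unicast traffic from this socket via ifindex.
 * An index of 0 clears the setting. Returns 0 on success, -1 with errno
 * set on failure. No administrative privilege is requested here.
 */
int set_unicast_if(int fd, unsigned int ifindex)
{
    uint32_t val = unicast_if_optval(ifindex);

    return setsockopt(fd, IPPROTO_IP, IP_UNICAST_IF, &val, sizeof(val));
}
```

A caller would typically obtain the index with `if_nametoindex()` and apply it to a UDP socket before sending; inbound traffic is unaffected, which is the difference from SO_BINDTODEVICE and bind() noted above.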

19 Nov, 2011 1 commit

ipv4: Remove all uses of LL_ALLOCATED_SPACE · 66088243

Authored by Herbert Xu on Nov 18, 2011

The macro LL_ALLOCATED_SPACE was ill-conceived.  It applies the
alignment to the sum of needed_headroom and needed_tailroom.  As
the amount that is then reserved for head room is needed_headroom
with alignment, this means that the tail room left may be too small.

This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv4
with the macro LL_RESERVED_SPACE and direct reference to
needed_tailroom.

This also fixes the problem with needed_headroom changing between
allocating the skb and reserving the head room.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
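The sizing problem described in this commit is easy to reproduce with plain arithmetic. The sketch below models the two macros as functions over a hypothetical `struct dev_model`; the alignment unit of 16 and the exact rounding formula are assumptions chosen to match the description (alignment applied to a sum, then one alignment unit added).

```c
#define HH_DATA_MOD 16u  /* alignment unit; assumed value for this sketch */

/* Hypothetical subset of a device's space requirements, in bytes. */
struct dev_model {
    unsigned int hard_header_len;
    unsigned int needed_headroom;
    unsigned int needed_tailroom;
};

/* Head room reserved: header plus headroom, rounded with alignment. */
unsigned int ll_reserved_space(const struct dev_model *d)
{
    return ((d->hard_header_len + d->needed_headroom) & ~(HH_DATA_MOD - 1))
           + HH_DATA_MOD;
}

/* Old-style total: alignment applied to the sum that includes tail room. */
unsigned int ll_allocated_space(const struct dev_model *d)
{
    return ((d->hard_header_len + d->needed_headroom + d->needed_tailroom)
            & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD;
}

/* Tail room actually left once the aligned head room has been reserved. */
unsigned int tail_left_old(const struct dev_model *d)
{
    return ll_allocated_space(d) - ll_reserved_space(d);
}

/* Replacement sizing: aligned head room plus the tail room added directly. */
unsigned int link_space_new(const struct dev_model *d)
{
    return ll_reserved_space(d) + d->needed_tailroom;
}
```

With a 14-byte header, 4 bytes of needed headroom and 8 bytes of needed tailroom, both old-style values come out as 32, so no tail room is left at all; the replacement sizing yields 40 and keeps the full 8 bytes.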

10 Nov, 2011 1 commit

ipv4: PKTINFO doesnt need dst reference · d826eb14

Authored by Eric Dumazet on Nov 09, 2011

On Monday 07 November 2011 at 15:33 +0100, Eric Dumazet wrote:

> At least, in recent kernels we don't change dst->refcnt in the forwarding
> path (using NOREF skb->dst)
>
> One particular point is the atomic_inc(dst->refcnt) we have to perform
> when queuing a UDP packet if the socket asked for PKTINFO stuff (for example a
> typical DNS server has to set up this option)
>
> I have one patch somewhere that stores the information in skb->cb[] and
> avoids the atomic_{inc|dec}(dst->refcnt).
>

OK I found it, I did some extra tests and believe it's ready.

[PATCH net-next] ipv4: IP_PKTINFO doesnt need dst reference

When a socket uses IP_PKTINFO notifications, we currently force a dst
reference for each received skb. Reader has to access dst to get needed
information (rt_iif & rt_spec_dst) and must release dst reference.

We also forced a dst reference if skb was put in socket backlog, even
without IP_PKTINFO handling. This happens under stress/load.

We can instead store the needed information in skb->cb[], so that only
the softirq handler really accesses dst, improving cache hit ratios.

This removes two atomic operations per packet, and false sharing as
well.

On a benchmark using a single-threaded receiver (doing only recvmsg()
calls), I can reach 720,000 pps instead of 570,000 pps.

IP_PKTINFO is typically used by DNS servers, and any multihomed aware
UDP application.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
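For readers who have not used IP_PKTINFO, the receiving side that this commit optimizes looks roughly like the following from user space. This is a sketch assuming Linux with glibc; `enable_pktinfo()` and `find_pktinfo()` are hypothetical helper names, and error handling is left to the caller.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Enable IP_PKTINFO ancillary data on a socket; returns 0 on success. */
int enable_pktinfo(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/*
 * Scan the control messages of a msghdr filled in by recvmsg() and copy
 * out the first IP_PKTINFO record. Returns 1 if one was found, 0 if not.
 */
int find_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            /* Copy rather than cast: CMSG_DATA may be unaligned. */
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

A multihomed UDP server such as a DNS server would read `ipi_ifindex` and the destination address from the record to reply from the address the query arrived on.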

01 Nov, 2011 1 commit

net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modules · bc3b2d7f

Authored by Paul Gortmaker on Jul 15, 2011

These files are non modular, but need to export symbols using
the macros now living in export.h -- call out the include so
that things won't break when we remove the implicit presence
of module.h from everywhere.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

08 Aug, 2011 1 commit

ipv4: route non-local sources for raw socket · 47670b76

Authored by Julian Anastasov on Aug 07, 2011

Raw sockets can provide a source address for
routing, but their privileges are not considered. Since
they can provide a non-local source address, make sure the
FLOWI_FLAG_ANYSRC flag is set if the socket has privileges
for this, i.e. based on the hdrincl (IP_HDRINCL) and
transparent flags.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
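The rule stated in this commit reduces to a small predicate. The sketch below is a user-space model only: `struct raw_sock_model`, `raw_flow_flags()` and the numeric value of `MODEL_FLAG_ANYSRC` are hypothetical and stand in for the kernel's socket fields and FLOWI_FLAG_ANYSRC.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_FLAG_ANYSRC 0x01u  /* stand-in for FLOWI_FLAG_ANYSRC; value assumed */

/* Hypothetical subset of the per-socket state that grants the privilege. */
struct raw_sock_model {
    bool hdrincl;      /* IP_HDRINCL set: the caller builds the IP header */
    bool transparent;  /* transparent mode enabled */
};

/* Flow flags for the route lookup: allow a non-local source only if privileged. */
uint8_t raw_flow_flags(const struct raw_sock_model *sk)
{
    uint8_t flags = 0;

    if (sk->hdrincl || sk->transparent)
        flags |= MODEL_FLAG_ANYSRC;
    return flags;
}
```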

27 Jul, 2011 1 commit

atomic: use <linux/atomic.h> · 60063497

Authored by Arun Sharma on Jul 26, 2011

This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>
Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

02 Jul, 2011 1 commit

ipv4: Reduce switch/case indent · 4500ebf8

Authored by Joe Perches on Jul 01, 2011

Make the case labels the same indent as the switch.

git diff -w shows no difference.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 May, 2011 1 commit

net: convert %p usage to %pK · 71338aa7

Authored by Dan Rosenberg on May 23, 2011

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces.  Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers.  The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1 (the default) and the current user
(intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG
(currently in the LSM tree), kernel pointers using %pK are printed as 0's.
If kptr_restrict is set to 2, kernel pointers using %pK are printed as
0's regardless of privileges.  Replacing with 0's was chosen over the
default "(null)", which cannot be parsed by userland %p, which expects
"(nil)".

The supporting code for kptr_restrict and %pK is currently in the -mm
tree.  This patch converts users of %p in net/ to %pK.  Cases of printing
pointers to the syslog are not covered, since this would eliminate useful
information for postmortem debugging and the reading of the syslog is
already optionally protected by the dmesg_restrict sysctl.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Eric Paris <eparis@parisplace.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
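The three kptr_restrict settings described in this commit amount to a small decision table. The sketch below models it in user space; `pk_shows_pointer()` is a hypothetical name, and treating any value other than 0 or 1 like 2 is an assumption of the model.

```c
#include <stdbool.h>

/*
 * Decide whether a %pK pointer is shown as-is (true) or printed as
 * zeros (false), following the kptr_restrict settings described above.
 */
bool pk_shows_pointer(int kptr_restrict, bool has_cap_syslog)
{
    switch (kptr_restrict) {
    case 0:
        return true;            /* behaves like plain %p */
    case 1:
        return has_cap_syslog;  /* zeros unless the reader has CAP_SYSLOG */
    default:
        return false;           /* 2: zeros regardless of privileges */
    }
}
```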

09 May, 2011 2 commits

ipv4: Pass flow key down into ip_append_*(). · f5fca608

Authored by David S. Miller on May 08, 2011

This way rt->rt_dst accesses are unnecessary.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Pass flow keys down into datagram packet building engine. · 77968b78

Authored by David S. Miller on May 08, 2011

This way ip_output.c no longer needs rt->rt_{src,dst}.

We already have these keys sitting, ready and waiting, on the stack or
in a socket structure.
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Apr, 2011 1 commit

inet: add RCU protection to inet->opt · f6d8bd05

Authored by Eric Dumazet on Apr 21, 2011

We lack proper synchronization to manipulate inet->opt ip_options

Problem is ip_make_skb() calls ip_setup_cork() and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options),
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We can't insert an rcu_head in struct ip_options since it's included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Apr, 2011 1 commit

inet: constify ip headers and in6_addr · b71d1d42

Authored by Eric Dumazet on Apr 22, 2011

Add const qualifiers to structs iphdr, ipv6hdr and in6_addr pointers
where possible, to make code intention more obvious.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

31 Mar, 2011 2 commits

Fix common misspellings · 25985edc

Authored by Lucas De Marchi on Mar 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

ipv4: Use flowi4_init_output() in raw_sendmsg() · ef164ae3

Authored by David S. Miller on Mar 31, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

29 Mar, 2011 1 commit

ipv4: Don't ip_rt_put() an error pointer in RAW sockets. · 4910ac6c

Authored by David S. Miller on Mar 28, 2011

Reported-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Mar, 2011 4 commits

net: Put fl4_* macros to struct flowi4 and use them again. · 9cce96df

Authored by David S. Miller on Mar 12, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Use flowi4 in public route lookup interfaces. · 9d6ec938

Authored by David S. Miller on Mar 12, 2011

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Make flowi ports AF dependent. · 6281dcc9

Authored by David S. Miller on Mar 12, 2011

Create two sets of port member accessors, one set prefixed by fl4_*
and the other prefixed by fl6_*.

This will let us create AF optimal flow instances.

It will work because in every context in which we access the ports,
we have to be fully aware of which AF the flowi is anyway.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Put flowi_* prefix on AF independent members of struct flowi · 1d28f42c

Authored by David S. Miller on Mar 12, 2011

I intend to turn struct flowi into a union of AF specific flowi
structs.  There will be a common structure that each variant includes
first, much like struct sock_common.

This is the first step to move in that direction.
Signed-off-by: David S. Miller <davem@davemloft.net>
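The direction sketched in this commit (a union of address-family specific structs that all start with the same common part) can be illustrated in plain C. Everything below is a hypothetical model: the type names, the member names other than the `flowi_`, `fl4_` and `fl6_` prefixes, and the member selection are illustrative and do not reproduce the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Members that do not depend on the address family. */
struct flow_common {
    int      flowi_oif;
    uint32_t flowi_mark;
    uint8_t  flowi_proto;
};

/* IPv4 variant: the common part comes first, then AF-specific keys. */
struct flow4 {
    struct flow_common c;
    uint32_t daddr, saddr;
    uint16_t fl4_sport, fl4_dport;
};

/* IPv6 variant with the same leading layout. */
struct flow6 {
    struct flow_common c;
    uint8_t  daddr[16], saddr[16];
    uint16_t fl6_sport, fl6_dport;
};

/* Generic flow: a union of the common part and the AF-specific variants. */
union flow {
    struct flow_common c;
    struct flow4 ip4;
    struct flow6 ip6;
};

/* The common part must sit at offset 0 in every variant. */
_Static_assert(offsetof(struct flow4, c) == 0, "common part must come first");
_Static_assert(offsetof(struct flow6, c) == 0, "common part must come first");

/* AF-independent code can read the common members through any variant. */
int flow_oif(const union flow *f)
{
    return f->c.flowi_oif;
}
```

Code that knows the address family works on `ip4` or `ip6` directly and only pays for the keys that family needs, which is the "AF optimal" property mentioned in the preceding commit.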

03 Mar, 2011 1 commit

ipv4: Make output route lookup return rtable directly. · b23dd4fe

Authored by David S. Miller on Mar 02, 2011

Instead of on the stack.
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Mar, 2011 3 commits

ipv4: Kill can_sleep arg to ip_route_output_flow() · 273447b3

Authored by David S. Miller on Mar 01, 2011

This boolean state is now available in the flow flags.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add FLOWI_FLAG_CAN_SLEEP. · 5df65e55

Authored by David S. Miller on Mar 01, 2011

And set it in contexts where the route resolution can sleep.
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep" · 420d44da

Authored by David S. Miller on Mar 01, 2011

Since that is what the current vague "flags" argument means.
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Jan, 2011 1 commit

net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT · 709b46e8

Authored by Eric W. Biederman on Jan 29, 2011

SIOCGETSGCNT is not a unique ioctl value as it maps to SIOCPROTOPRIVATE + 1,
which unfortunately means the existing infrastructure for compat networking
ioctls is insufficient.  A trivial compat ioctl implementation would conflict
with:

SIOCAX25ADDUID
SIOCAIPXPRISLT
SIOCGETSGCNT_IN6
SIOCGETSGCNT
SIOCRSSCAUSE
SIOCX25SSUBSCRIP
SIOCX25SDTEFACILITIES

To make this work I have updated the compat_ioctl decode path to mirror
the normal ioctl decode path.  I have added an ipv4 inet_compat_ioctl function
so that I can have ipv4 specific compat ioctls.  I have added a compat_ioctl
function into struct proto so I can break out ioctls by which kind of ip socket
I am using.  I have added a compat_raw_ioctl function because SIOCGETSGCNT only
works on raw sockets.  I have added an ipmr_compat_ioctl that mirrors the normal
ipmr_ioctl.

This was necessary because unfortunately the struct layout for SIOCGETSGCNT
has unsigned longs in it, so it changes between 32bit and 64bit kernels.

This change was sufficient to run a 32bit ip multicast routing daemon on a
64bit kernel.
Reported-by: Bill Fenner <fenner@aristanetworks.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
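Why a struct containing unsigned longs needs a compat path can be shown with fixed-width types. The sketch below is an illustration, not the kernel's definitions: the struct and function names are hypothetical, the field names are assumed, and `uint32_t`/`uint64_t` stand in for what `unsigned long` means to a 32-bit process and to a 64-bit kernel.

```c
#include <stdint.h>

/* Layout as seen by a 32-bit process: counters are 32 bits wide. */
struct sg_req_compat {
    uint32_t src;       /* source address (lookup key) */
    uint32_t grp;       /* group address (lookup key) */
    uint32_t pktcnt;
    uint32_t bytecnt;
    uint32_t wrong_if;
};

/* Layout as seen by a 64-bit kernel: the same counters are 64 bits wide. */
struct sg_req_native64 {
    uint32_t src;
    uint32_t grp;
    uint64_t pktcnt;
    uint64_t bytecnt;
    uint64_t wrong_if;
};

/* Copy the lookup key from the 32-bit request into the native form. */
void sg_req_from_compat(struct sg_req_native64 *n,
                        const struct sg_req_compat *c)
{
    n->src = c->src;
    n->grp = c->grp;
    n->pktcnt = n->bytecnt = n->wrong_if = 0;
}

/* Copy results back, truncating each counter to the 32-bit layout. */
void sg_req_to_compat(struct sg_req_compat *c,
                      const struct sg_req_native64 *n)
{
    c->src = n->src;
    c->grp = n->grp;
    c->pktcnt = (uint32_t)n->pktcnt;
    c->bytecnt = (uint32_t)n->bytecnt;
    c->wrong_if = (uint32_t)n->wrong_if;
}
```

The two layouts differ in size (20 versus 32 bytes on common ABIs), so the kernel cannot simply copy the user buffer; a compat handler has to translate field by field in both directions.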

18 Nov, 2010 1 commit

net: use the macros defined for the members of flowi · 5811662b

Authored by Changli Gao on Nov 12, 2010

Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Aug, 2010 1 commit

net: simplify flags for tx timestamping · 2244d07b

Authored by Oliver Hartkopp on Aug 17, 2010

This patch removes the abstraction introduced by the union skb_shared_tx in
the shared skb data.

The access of the different union elements at several places led to some
confusion about accessing the shared tx_flags e.g. in skb_orphan_try().

http://marc.info/?l=linux-netdev&m=128084897415886&w=2
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

11 Jun, 2010 1 commit

net-next: remove useless union keyword · d8d1f30b

Authored by Changli Gao on Jun 10, 2010

Remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Jun, 2010 1 commit

raw: avoid two atomics in xmit · 1789a640

Authored by Eric Dumazet on Jun 03, 2010

Avoid two atomic ops per raw_send_hdrinc() call

Avoid two atomic ops per raw6_send_hdrinc() call
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Apr, 2010 1 commit

net: ip_queue_rcv_skb() helper · f84af32c

Authored by Eric Dumazet on Apr 28, 2010

When queueing a skb to a socket, we can immediately release its dst if
the target socket does not use IP_CMSG_PKTINFO.

tcp_data_queue() can drop dst too.

This is to benefit from a hot cache line and to avoid having the receiver,
possibly on another cpu, dirty this cache line itself.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

30 Mar, 2010 1 commit

include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6

Authored by Tejun Heo on Mar 24, 2010

include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability.  As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and tries to put the new include such that its order conforms
  to its surroundings.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   widely available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

25 Mar, 2010 1 commit

netfilter: ipv4: use NFPROTO values for NF_HOOK invocation · 9bbc768a

Authored by Jan Engelhardt on Mar 23, 2010

The semantic patch that was used:
// <smpl>
@@
@@
(NF_HOOK
|NF_HOOK_COND
|nf_hook
)(
-PF_INET,
+NFPROTO_IPV4,
 ...)
// </smpl>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>

29 Oct, 2009 1 commit

AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2) · 55888dfb

Authored by Neil Horman on Oct 28, 2009

Augment raw_send_hdrinc to correct for incorrect ip header length values

A series of oopses was reported to me recently. Apparently when using AF_RAW
sockets to send data to peers that were reachable via ipsec encapsulation,
people could panic or BUG halt their systems.

I've tracked the problem down to user space sending an invalid ip header over an
AF_RAW socket with IP_HDRINCL set to 1.

Basically what happens is that userspace sends down an ip frame that includes
only the header (no data), but sets the ip header ihl value to a large number,
one that is larger than the total amount of data passed to the sendmsg call. In
raw_send_hdrinc, we allocate an skb based on the size of the data in the msghdr
that was passed in, but assume the data is all valid. Later during ipsec
encapsulation, xfrm4_transport_output moves the entire frame back in the skbuff
to provide headroom for the ipsec headers. During this operation, the
skb->transport_header is repointed to a spot computed by
skb->network_header + the ip header length (ihl). Since so little data was
passed in relative to the value of ihl provided by the raw socket, we point
transport header to an unknown location, resulting in various crashes.

The fix for this is pretty straightforward: simply validate the value of
iph->ihl when sending over a raw socket. If (iph->ihl*4U) > user data buffer
size, drop the frame and return -EINVAL. I just confirmed this fixes the
reported crashes.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
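The validation described in this commit can be written as a standalone check. The following is a user-space sketch of the same comparison, not the kernel function: `check_ip_header_len()` is a hypothetical name, it works on a raw byte buffer rather than an skb, and the empty-buffer guard is an addition of the sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Validate the header length field of a caller-supplied IPv4 header.
 * buf points at the first byte of the header; len is the number of bytes
 * the caller actually passed. Returns 0 if acceptable, -EINVAL otherwise.
 */
int check_ip_header_len(const uint8_t *buf, size_t len)
{
    unsigned int ihl;

    if (len < 1)
        return -EINVAL;      /* nothing to read the field from */

    ihl = buf[0] & 0x0fu;    /* low nibble of byte 0, in 32-bit words */
    if (ihl * 4u > len)
        return -EINVAL;      /* header claims more bytes than supplied */

    return 0;
}
```

A 20-byte buffer whose first byte is 0x45 (ihl = 5) passes, while the same buffer with 0x4f (ihl = 15, i.e. 60 bytes) is rejected, which is the header-only frame with an oversized ihl described above.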

19 Oct, 2009 1 commit

inet: rename some inet_sock fields · c720c7e8

Authored by Eric Dumazet on Oct 15, 2009

In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

Goal is to transfer fields used at lookup time into the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path).

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Oct, 2009 1 commit

net: sk_drops consolidation · 766e9037

Authored by Eric Dumazet on Oct 14, 2009

sock_queue_rcv_skb() can update sk_drops itself, removing need for
callers to take care of it. This is more consistent since
sock_queue_rcv_skb() also reads sk_drops when queueing a skb.

This adds sk_drops management to many protocols that did not handle it yet.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Oct, 2009 1 commit

net: Generalize socket rx gap / receive queue overflow cmsg · 3b885787

Authored by Neil Horman on Oct 12, 2009

Create a new socket level option to report number of queue overflows

Recently I augmented the AF_PACKET protocol to report the number of frames lost
on the socket receive queue between any two enqueued frames. This value was
exported via a SOL_PACKET level cmsg. After I completed that work it was
requested that this feature be generalized so that any datagram oriented socket
could make use of this option. As such I've created this patch. It creates a
new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
overflowed between any two given frames. It also augments the AF_PACKET
protocol to take advantage of this new feature (as it previously did not touch
sk->sk_drops, which this patch uses to record the overflow count). Tested
successfully by me.

Notes:

1) Unlike my previous patch, this patch simply records the sk_drops value, which
is not a number of drops between packets, but rather a total number of drops.
Deltas must be computed in user space.

2) While this patch currently works with datagram oriented protocols, it will
also be accepted by non-datagram oriented protocols. I'm not sure if that's
agreeable to everyone, but my argument in favor of doing so is that, for those
protocols which aren't applicable to this option, sk_drops will always be zero,
and reporting no drops on a receive queue that isn't used for those
non-participating protocols seems reasonable to me. This also saves us having
to code in a per-protocol opt in mechanism.

3) This applies cleanly to net-next assuming that commit
97775007 (my af packet cmsg patch) is reverted.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
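Note 1 of this commit says the reported value is a running total and that deltas are left to user space. A minimal sketch of that bookkeeping follows, assuming Linux and a 32-bit unsigned counter in the ancillary message; `enable_rxq_ovfl()` and `drops_between()` are hypothetical helper names, and the fallback value 40 for `SO_RXQ_OVFL` applies to most but not all architectures.

```c
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_RXQ_OVFL
#define SO_RXQ_OVFL 40  /* assumed fallback; some architectures use another value */
#endif

/* Ask for the drop counter to be attached to received messages. */
int enable_rxq_ovfl(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &on, sizeof(on));
}

/*
 * Drops between two consecutive messages, given the cumulative totals
 * reported with each. Unsigned subtraction handles 32-bit wrap-around.
 */
uint32_t drops_between(uint32_t prev_total, uint32_t cur_total)
{
    return cur_total - prev_total;
}
```

A receiver would remember the total from the previous message and call `drops_between()` with the total from the current one.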

01 Oct, 2009 1 commit

net: Make setsockopt() optlen be unsigned. · b7058842

Authored by David S. Miller on Sep 30, 2009

This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the place, in
each and every implementation.

Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
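The hazard this commit removes can be demonstrated with two small checks. This is an illustrative sketch, not code from any setsockopt() implementation: both function names are hypothetical, and the upper bound of `sizeof(int)` is just an example limit.

```c
#include <errno.h>

/* Signed length: a negative optlen slips past a naive upper-bound check. */
int bound_check_signed(int optlen)
{
    if (optlen > (int)sizeof(int))
        return -EINVAL;
    return 0;  /* -1 is accepted here, which is the hazard */
}

/* Unsigned length: the same bit pattern is a huge value and is rejected. */
int bound_check_unsigned(unsigned int optlen)
{
    if (optlen > sizeof(int))
        return -EINVAL;
    return 0;
}
```

With a signed parameter every implementation needs an extra `optlen < 0` test; with an unsigned parameter the single upper-bound comparison is enough.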
