Commits · dacc62dbf56e872ad96edde0393b9deb56d80cd5 · openanolis / cloud-kernel

09 Sep, 2008 · 7 commits

This reverts "Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/dccp_exp" · 410e27a4

Committed by Gerrit Renker on Sep 09, 2008

as it accidentally contained the wrong set of patches. These will be
submitted separately.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

[Bluetooth] Reject L2CAP connections on an insecure ACL link · e7c29cb1

Committed by Marcel Holtmann on Sep 09, 2008

The Security Mode 4 of the Bluetooth 2.1 specification has strict
authentication and encryption requirements. It is the initiator's job
to create a secure ACL link. However, in the case of malicious devices, the
acceptor has to make sure that the ACL is encrypted before allowing
any kind of L2CAP connection. The only exception here is PSM 1 for
the service discovery protocol, because that is allowed to run on an
insecure ACL link.

Previously it was enough to reject an L2CAP connection during the
connection setup phase, but with Bluetooth 2.1 it is forbidden to
do any L2CAP protocol exchange on an insecure link (except SDP).

The new hci_conn_check_link_mode() function can be used to check the
integrity of an ACL link. This function also takes care of the cases
where Security Mode 4 is disabled or one of the devices is based on
an older specification.
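The acceptor-side rule can be sketched in C as follows. The struct acl_link type, its fields and the functions link_mode_ok()/accept_l2cap() are hypothetical simplifications, not the kernel's struct hci_conn or hci_conn_check_link_mode(). They illustrate only the rule stated above: SDP on PSM 1 is always allowed, and everything else needs an encrypted link when both sides run Security Mode 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PSM_SDP 0x0001 /* PSM 1: service discovery protocol */

/* Hypothetical, simplified view of an ACL link (not struct hci_conn). */
struct acl_link {
	bool local_ssp;  /* local side runs Simple Pairing / Security Mode 4 */
	bool remote_ssp; /* remote device is a 2.1 device using Simple Pairing */
	bool encrypted;  /* link-level encryption is active */
};

/* Integrity check: encryption is only mandatory when both ends use
 * Security Mode 4; disabled SSP or an older peer passes the check. */
static bool link_mode_ok(const struct acl_link *l)
{
	if (l->local_ssp && l->remote_ssp && !l->encrypted)
		return false;
	return true;
}

/* Acceptor side: may any L2CAP exchange take place for this PSM? */
static bool accept_l2cap(const struct acl_link *l, uint16_t psm)
{
	if (psm == PSM_SDP)
		return true; /* SDP may run on an insecure ACL link */
	return link_mode_ok(l);
}
```

The check runs before any L2CAP protocol exchange, not just at connection setup, which matches the stricter Bluetooth 2.1 requirement.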
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

[Bluetooth] Enforce correct authentication requirements · 09ab6f4c

Committed by Marcel Holtmann on Sep 09, 2008

With the introduction of Security Mode 4 and Simple Pairing from the
Bluetooth 2.1 specification it became mandatory that the initiator
requires authentication and encryption before any L2CAP channel can
be established. The only exception here is PSM 1 for the service
discovery protocol (SDP). It is meant to be used without any encryption
since it contains only public information. This is how Bluetooth 2.0
and before handle connections on PSM 1.

For Bluetooth 2.1 devices the pairing procedure differentiates between
no bonding, general bonding and dedicated bonding. The L2CAP layer
always wrongly uses general bonding when creating new connections, but it
should not do this for SDP connections. In this case the authentication
requirement should be no bonding and the just-works model should be used,
whereas non-SDP connections are required to use general bonding.
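A minimal sketch of choosing the authentication requirement once, up-front, from the PSM. It assumes the numeric values of the Bluetooth 2.1 Authentication_Requirements parameter used in the IO Capability exchange (0x00 to 0x05). The AT_* names only imitate the kernel's HCI_AT_* style, and l2cap_auth_req() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Authentication_Requirements values (Bluetooth 2.1 IO Capability exchange);
 * names are illustrative, modeled on the kernel's HCI_AT_* constants. */
enum auth_req {
	AT_NO_BONDING             = 0x00,
	AT_NO_BONDING_MITM        = 0x01,
	AT_DEDICATED_BONDING      = 0x02,
	AT_DEDICATED_BONDING_MITM = 0x03,
	AT_GENERAL_BONDING        = 0x04,
	AT_GENERAL_BONDING_MITM   = 0x05,
};

#define PSM_SDP 0x0001

/* Hypothetical helper: decide before pairing, so the link key is created
 * with the right strength in one pass instead of being upgraded later. */
static enum auth_req l2cap_auth_req(uint16_t psm, bool need_mitm)
{
	if (psm == PSM_SDP)
		return AT_NO_BONDING; /* just-works, SDP carries public data only */
	return need_mitm ? AT_GENERAL_BONDING_MITM : AT_GENERAL_BONDING;
}
```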

If the new connection requires man-in-the-middle (MITM) protection, it
also wrongly creates an unauthenticated link key first and then later
requests an upgrade to an authenticated link key to provide full MITM
protection. With Simple Pairing the link key generation is an expensive
operation (compared to Bluetooth 2.0 and before), and doing this twice
during connection setup causes a noticeable delay when establishing
a new connection. This should be avoided so as not to regress from the
expected Bluetooth 2.0 connection times. The authentication requirements
are known up-front, so enforce them.

To fulfill these requirements the hci_connect() function has been extended
with an authentication requirement parameter that will be stored inside
the connection information and can be retrieved by userspace at any
time. This allows the correct IO capabilities exchange and results in
the expected behavior.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

ipvs: Embed user stats structure into kernel stats structure · e9c0ce23

Committed by Sven Wegener on Sep 08, 2008

Instead of duplicating the fields, integrate a user stats structure into
the kernel stats structure. This is more robust when the members are
changed, because they are now automatically kept in sync.
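The embedding idea in miniature, as a sketch with an illustrative field set. The names stats_user, stats_kern and stats_export() are hypothetical and are not the IPVS structure names. Any counter added to the user-visible struct automatically becomes part of the kernel struct and of the export.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Counters shared with userspace (illustrative field set). */
struct stats_user {
	uint32_t conns;
	uint32_t inpkts;
	uint64_t inbytes;
};

/* Kernel-side stats embed the user struct instead of repeating its fields,
 * so a counter added to stats_user automatically appears in both. */
struct stats_kern {
	struct stats_user ustats; /* exported unchanged */
	int est_enabled;          /* kernel-only state (illustrative) */
};

/* Exporting becomes a single structure copy. */
static void stats_export(struct stats_user *dst, const struct stats_kern *src)
{
	memcpy(dst, &src->ustats, sizeof(*dst));
}
```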
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Reviewed-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

ipvs: Restrict connection table size via Kconfig · 2206a3f5

Committed by Sven Wegener on Sep 08, 2008

Instead of checking the value in include/net/ip_vs.h, we can just
restrict the range in our Kconfig file. This will prevent values outside
of the range early.
Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Reviewed-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

warn: Turn the netdev timeout WARN_ON() into a WARN() · 5337407c

Committed by Arjan van de Ven on Sep 08, 2008

This patch turns the netdev timeout WARN_ON_ONCE() into a WARN_ONCE(),
so that the device and driver names are inside the warning message.
This helps automated tools like kerneloops.org to collect the data
and do statistics, as well as making it more likely that humans
cut-n-paste the important message as part of a bug report.
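A userspace model of the difference, as a sketch. WARN_ONCE_FMT and tx_timeout() are hypothetical stand-ins, not the kernel macros. The message text is only modeled on the netdev watchdog's wording; check net/sched/sch_generic.c for the real format. A bare WARN_ON_ONCE() reports only the call site, while a formatted warning can name the device and driver. Like the kernel macro, it fires once per call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_emitted; /* counts messages actually printed */

/* Hypothetical once-per-call-site warning that carries a formatted message. */
#define WARN_ONCE_FMT(cond, ...)                              \
	do {                                                      \
		static bool warned_;                                  \
		if ((cond) && !warned_) {                             \
			warned_ = true;                                   \
			warnings_emitted++;                               \
			fprintf(stderr, "WARNING: " __VA_ARGS__);         \
		}                                                     \
	} while (0)

/* Hypothetical watchdog hook: device and driver appear in the text. */
static void tx_timeout(const char *dev, const char *driver)
{
	WARN_ONCE_FMT(1, "NETDEV WATCHDOG: %s (%s): transmit timed out\n",
		      dev, driver);
}
```

Because the flag is static per call site, a second timeout on another device is not reported again. That is the usual trade-off of *_ONCE warnings.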
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netns : fix kernel panic in timewait socket destruction · d315492b

Committed by Daniel Lezcano on Sep 08, 2008

How to reproduce ?
 - create a network namespace
 - use tcp protocol and get timewait socket
 - exit the network namespace
 - after a moment (when the timewait socket is destroyed), the kernel
   panics.

# BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
PGD 119985067 PUD 11c5c0067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd
edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks
sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3
RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246
RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30
RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00
RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200
R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000
FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff8800bff9e000, task ffff88011ff76690)
Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a 0000000000000008
0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7
ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108
Call Trace:
<IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e
[<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e
[<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193
[<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd
[<ffffffff8200d08c>] ? call_softirq+0x1c/0x28
[<ffffffff8200e611>] ? do_softirq+0x2c/0x68
[<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9
[<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70
<EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b
[<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d


Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7
65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0
48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00
RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8
RSP <ffff88011ff7fed0>
CR2: 0000000000000007

This patch provides a function to purge all timewait sockets related
to a network namespace. The life cycle of timewait sockets is not tied to
the network namespace, which means the timewait sockets stay alive while
the network namespace dies. Timewait sockets exist to avoid receiving a
delayed duplicate packet from the network; once the network namespace is
freed, its network stack is gone, so there is no chance of receiving any
packets from the outside world. Furthermore, having a pending destruction
timer on these sockets after the network namespace is freed is not safe:
it leads to an oops when the timer callback tries to access data belonging
to the namespace, as for example in:
	inet_twdr_do_twkill_work
		-> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED);

Purging the timewait sockets at the network namespace destruction will:
 1) speed up memory freeing for the namespace
 2) fix kernel panic on asynchronous timewait destruction
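The purge step can be sketched on a plain singly linked list. struct net, struct tw_sock, tw_add() and tw_purge_net() are hypothetical simplifications. The real fix walks the kernel's timewait hash and death-row structures under their locks. The key point is that every timewait socket pointing at the dying namespace is unlinked and freed before the namespace goes away, so no timer can dereference it later.

```c
#include <assert.h>
#include <stdlib.h>

struct net { int id; }; /* stand-in for a network namespace */

/* Hypothetical timewait entry: it only remembers its namespace. */
struct tw_sock {
	struct net *net;
	struct tw_sock *next;
};

static void tw_add(struct tw_sock **head, struct net *net)
{
	struct tw_sock *tw = malloc(sizeof(*tw));

	if (!tw)
		abort();
	tw->net = net;
	tw->next = *head;
	*head = tw;
}

/* Unlink and free every timewait socket owned by @net; returns the count. */
static int tw_purge_net(struct tw_sock **head, const struct net *net)
{
	struct tw_sock **pp = head;
	int purged = 0;

	while (*pp) {
		struct tw_sock *tw = *pp;

		if (tw->net == net) {
			*pp = tw->next; /* unlink before freeing */
			free(tw);
			purged++;
		} else {
			pp = &tw->next;
		}
	}
	return purged;
}
```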
Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Acked-by: Denis V. Lunev <den@openvz.org>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Sep, 2008 · 1 commit

sched: arch_reinit_sched_domains() must destroy domains to force rebuild · dfb512ec

Committed by Max Krasnyansky on Aug 29, 2008

What I realized recently is that calling rebuild_sched_domains() in
arch_reinit_sched_domains() by itself is not enough when cpusets are enabled.
partition_sched_domains() code is trying to avoid unnecessary domain rebuilds
and will not actually rebuild anything if new domain masks match the old ones.

What this means is that doing
     echo 1 > /sys/devices/system/cpu/sched_mc_power_savings
on a system with cpusets enabled will not take effect until something changes
in the cpuset setup (i.e. new sets created or deleted).

This patch restores the correct behaviour, where domains must be rebuilt in
order to enable MC powersaving flags.
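The problem and the fix can be modeled in a few lines. struct dom_cache, partition_domains() and reinit_domains() are hypothetical, not the scheduler's code. A rebuild is skipped whenever the new masks equal the cached ones, so a flag change alone is lost. Destroying the cached domains first forces the comparison to fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DOMS 4

/* Hypothetical cache of the currently built domains (CPU bitmasks). */
struct dom_cache {
	unsigned long masks[MAX_DOMS];
	int ndoms;          /* 0 means "nothing built" */
	int rebuilds;       /* number of real rebuilds */
	bool mc_power_flag; /* only applied during a rebuild */
};

/* Shortcut as described: identical masks mean "keep the old domains".
 * Callers must pass ndoms <= MAX_DOMS. */
static void partition_domains(struct dom_cache *c, const unsigned long *masks,
			      int ndoms, bool mc_power)
{
	if (ndoms == c->ndoms &&
	    memcmp(masks, c->masks, ndoms * sizeof(*masks)) == 0)
		return; /* the new flag is silently ignored */
	memcpy(c->masks, masks, ndoms * sizeof(*masks));
	c->ndoms = ndoms;
	c->mc_power_flag = mc_power;
	c->rebuilds++;
}

/* The fix in spirit: destroy first, so the rebuild really happens. */
static void reinit_domains(struct dom_cache *c, const unsigned long *masks,
			   int ndoms, bool mc_power)
{
	c->ndoms = 0;
	partition_domains(c, masks, ndoms, mc_power);
}
```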

Tested on a quad-core Core2 box with both CONFIG_CPUSETS and !CONFIG_CPUSETS.
Also tested on a dual-core Core2 laptop. Lockdep is happy and things are working
as expected.
Signed-off-by: Max Krasnyansky <maxk@qualcomm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

06 Sep, 2008 · 6 commits

x86: add NOPL as a synthetic CPU feature bit · b6734c35

Committed by H. Peter Anvin on Aug 18, 2008

The long noops ("NOPL") are supposed to be detected by family >= 6.
Unfortunately, several non-Intel x86 implementations, both hardware
and software, don't obey this dictum. Instead, probe for NOPL
directly by executing a NOPL instruction and seeing whether we get #UD.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

tracehook: comment pasto fixes · 22f30168

Committed by Roland McGrath on Sep 05, 2008

Fix some pastos in comments in the new linux/tracehook.h and
asm-generic/syscall.h files.
Reported-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

res_counter: fix off-by-one bug in setting limit · 11d55d2c

Committed by Li Zefan on Sep 05, 2008

I found we can no longer set limit to 0 with 2.6.27-rcX:
 # mount -t cgroup -omemory xxx /mnt
 # mkdir /mnt/0
 # echo 0 > /mnt/0/memory.limit_in_bytes
 bash: echo: write error: Device or resource busy

It turned out that 'limit' could not be set to a value equal to the current 'usage', which is wrong IMO.
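The off-by-one can be shown with a lock-free model of the counter, as a sketch. struct res_counter_model and both setter functions are hypothetical and are not the kernel's res_counter code. A strict "usage < limit" test rejects a limit equal to the current usage, which is exactly the failing `echo 0` on an empty group. Relaxing it to "usage <= limit" accepts that value.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, lock-free model of a resource counter. */
struct res_counter_model {
	unsigned long long usage;
	unsigned long long limit;
};

/* Buggy variant: the strict comparison rejects limit == usage. */
static int set_limit_buggy(struct res_counter_model *c, unsigned long long limit)
{
	if (c->usage < limit) {
		c->limit = limit;
		return 0;
	}
	return -EBUSY; /* "Device or resource busy" */
}

/* Fixed variant: a limit equal to the current usage is acceptable. */
static int set_limit_fixed(struct res_counter_model *c, unsigned long long limit)
{
	if (c->usage <= limit) {
		c->limit = limit;
		return 0;
	}
	return -EBUSY;
}
```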
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

[MIPS] Fix WARNING: at kernel/smp.c:290 · e0cee3ee

Committed by Thomas Bogendoerfer on Aug 04, 2008

trap_init issues flush_icache_range(), which uses IPI functions to
get icache flushing done on all CPUs. But this is done before interrupts
are enabled, which causes WARN_ON messages. This changeset introduces
a new local_flush_icache_range() and uses it before interrupts (and
additional CPUs) are enabled to avoid this problem.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

cfg80211: keep track of supported interface modes · f59ac048

Committed by Luis R. Rodriguez on Aug 29, 2008

It is obviously good for userspace to know up front which
interface modes a given piece of hardware might support (even
if adding such an interface might fail later because of
concurrency issues), so let's make cfg80211 aware of that.
For good measure, disallow adding interfaces in all other
modes so drivers don't forget to announce support for one mode
when they add it.
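A sketch of the "announce modes, reject the rest" rule with a bitmask. The enum values, struct phy_caps and add_interface() are hypothetical. cfg80211 keeps a similar per-wiphy mask, but its exact field and constant names should be checked in include/net/cfg80211.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Illustrative subset of interface types (values are hypothetical). */
enum iftype { IFTYPE_ADHOC = 1, IFTYPE_STATION, IFTYPE_AP, IFTYPE_MONITOR, IFTYPE_MESH };

/* Hypothetical wiphy-like descriptor: the driver announces a mode mask. */
struct phy_caps {
	uint16_t interface_modes;
};

/* Refuse any mode the driver did not announce; an announced mode may
 * still fail later because of concurrency limits. */
static int add_interface(const struct phy_caps *phy, enum iftype type)
{
	if (!(phy->interface_modes & BIT(type)))
		return -EOPNOTSUPP;
	return 0;
}
```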
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Stephen Blackheath <tramp.enshrine.stephen@blacksapphire.com>
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

sched: fix process time monotonicity · 49048622

Committed by Balbir Singh on Sep 05, 2008

Spencer reported a problem where utime and stime were going negative despite
the fixes in commit b27f03d4. The suspected reason for the problem is that
signal_struct maintains its own utime and stime (of exited tasks); these are
not updated using the new task_utime() routine, hence sig->utime can go
backwards and cause the same problem to occur (sig->utime adds tsk->utime,
not task_utime()). This patch fixes the problem.

TODO: using max(task->prev_utime, derived utime) works for now, but a more
generic solution is to implement cputime_max() and use the cputime_gt()
function for comparison.
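The max(prev_utime, derived utime) idea as a self-contained sketch. struct task_times and task_utime_model() are hypothetical. All quantities use one unit to avoid the kernel's clock_t/cputime conversions. Precise runtime is split by the tick-sampled user share, and the reported value is never allowed to drop below the last one.

```c
#include <assert.h>

typedef unsigned long long cputime_t; /* stand-in for the kernel type */

/* Hypothetical per-task accounting state (all values in one unit). */
struct task_times {
	cputime_t utime_ticks; /* tick-sampled user time */
	cputime_t stime_ticks; /* tick-sampled system time */
	cputime_t sum_exec;    /* precise total runtime */
	cputime_t prev_utime;  /* last value reported */
};

/* Scale the precise runtime by the sampled user share, then clamp so
 * the result is monotonic: max(prev_utime, derived utime). */
static cputime_t task_utime_model(struct task_times *t)
{
	cputime_t total = t->utime_ticks + t->stime_ticks;
	cputime_t derived = total ? t->sum_exec * t->utime_ticks / total
				  : t->sum_exec;

	if (derived > t->prev_utime)
		t->prev_utime = derived;
	return t->prev_utime;
}
```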

Reported-by: spencer@bluehost.com
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

05 Sep, 2008 · 14 commits

Fix conditional export of kvm.h and a.out.h to userspace. · afbc8d8e

Committed by Khem Raj on Sep 04, 2008

Some architectures have moved asm/ into arch/ and some have not.
This patch checks for a.out.h and kvm.h in both places before exporting
the corresponding file from linux/.

[dwmw2: simplified a little]
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

clockevents: prevent clockevent event_handler ending up handler_noop · 7c1e7689

Committed by Venkatesh Pallipadi on Sep 03, 2008

There is an ordering-related problem with the clockevents code, due to which
clockevents_register_device() called after tickless/highres switch
will not work. The new clockevent ends up with clockevents_handle_noop as
event handler, resulting in no timer activity.

The problematic path seems to be

* old device already has hrtimer_interrupt as the event_handler
* new clockevent device registers with a higher rating
* tick_check_new_device() is called
  * clockevents_exchange_device() gets called
    * old->event_handler is set to clockevents_handle_noop
  * tick_setup_device() is called for the new device
    * which sets new->event_handler using the old->event_handler which is noop.

Change the ordering so that new device inherits the proper handler.
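The ordering bug in miniature, as a sketch that mirrors the steps listed above. It is not the tick code's actual fix. All types and functions (exchange_device(), switch_buggy(), switch_fixed()) are hypothetical. If the handler is copied after the exchange, the new device inherits the noop handler. Capturing it before the exchange keeps the real one.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*event_handler_t)(void);

static int ticks; /* incremented by the real handler */

static void handle_noop(void) { }
static void hrtimer_interrupt_model(void) { ticks++; } /* stand-in */

struct clock_event_device_model {
	event_handler_t event_handler;
};

/* Exchange step: the old device is parked on the noop handler. */
static void exchange_device(struct clock_event_device_model *old)
{
	old->event_handler = handle_noop;
}

/* Buggy ordering: copy after the exchange, so the copy is noop. */
static void switch_buggy(struct clock_event_device_model *old,
			 struct clock_event_device_model *new_dev)
{
	exchange_device(old);
	new_dev->event_handler = old->event_handler;
}

/* Fixed ordering: remember the real handler before the exchange. */
static void switch_fixed(struct clock_event_device_model *old,
			 struct clock_event_device_model *new_dev)
{
	event_handler_t handler = old->event_handler;

	exchange_device(old);
	new_dev->event_handler = handler;
}
```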

This does not cause any issue in the normal case, as most likely all the clockevent
devices are set up before the highres switch. But it can potentially affect
some corner cases where HPET force detect happens after the highres switch.
This was a problem with the HPET in MSI mode code that we have been experimenting
with.
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

IPVS: Adjust various debug outputs to use new macros · cfc78c5a

Committed by Julius Volz on Sep 02, 2008

Adjust various debug outputs to use the new *_BUF macro variants for
correct output of v4/v6 addresses.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Convert real server lookup functions · 7937df15

Committed by Julius Volz on Sep 02, 2008

Convert functions for looking up destinations (real servers) to support
IPv6 services/dests.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add and bind IPv6 xmit functions · b3cdd2a7

Committed by Julius Volz on Sep 02, 2008

Add xmit functions for IPv6. Also add the already needed __ip_vs_get_out_rt_v6()
to ip_vs_core.c. Bind the new xmit functions to v6 connections.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Extend functions for getting/creating connections · 28364a59

Committed by Julius Volz on Sep 02, 2008

Extend functions for getting/creating connections and connection
templates for IPv6 support and fix the callers.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Extend protocol DNAT/SNAT and state handlers · 0bbdd42b

Committed by Julius Volz on Sep 02, 2008

Extend protocol DNAT/SNAT and state handlers to work with IPv6. Also
change/introduce new checksumming helper functions for this.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add 'af' args to protocol handler functions · 51ef348b

Committed by Julius Volz on Sep 02, 2008

Add 'af' arguments to conn_schedule(), conn_in_get(), conn_out_get() and
csum_check() function pointers in struct ip_vs_protocol. Extend the
respective functions for TCP, UDP, AH and ESP and adjust the callers.

The changes in the callers need to be somewhat extensive, since they now
need to pass a filled out struct ip_vs_iphdr * to the modified functions
instead of a struct iphdr *.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add IPv6 support flag to schedulers · b14198f6

Committed by Julius Volz on Sep 02, 2008

Add 'supports_ipv6' flag to struct ip_vs_scheduler to indicate whether a
scheduler supports IPv6. Set the flag to 1 in schedulers that work with
IPv6, 0 otherwise. This flag is checked in a later patch while trying to
add a service with a specific scheduler. Adjust debug in v6-supporting
schedulers to work with both address families.
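A sketch of how such a flag might be checked when a service is added. struct vs_scheduler and bind_scheduler() are hypothetical, and the choice of -EAFNOSUPPORT as the error code is an assumption of this sketch rather than something stated in the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h> /* AF_INET, AF_INET6 */

/* Hypothetical, trimmed-down scheduler descriptor. */
struct vs_scheduler {
	const char *name;
	bool supports_ipv6;
};

/* Refuse to attach an IPv6 service to a scheduler that has not opted in;
 * IPv4 services can use any scheduler. */
static int bind_scheduler(int af, const struct vs_scheduler *s)
{
	if (af == AF_INET6 && !s->supports_ipv6)
		return -EAFNOSUPPORT;
	return 0;
}
```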
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add v6 support to ip_vs_service_get() · 3c2e0505

Committed by Julius Volz on Sep 02, 2008

Add support for selecting services based on their address family to
ip_vs_service_get() and adjust the callers.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add internal versions of sockopt interface structs · c860c6b1

Committed by Julius Volz on Sep 02, 2008

Add extended internal versions of struct ip_vs_service_user and struct
ip_vs_dest_user (the originals can't be modified as they are part
of the old sockopt interface). Adjust ip_vs_ctl.c to work with the new
data structures and add some minor AF-awareness.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add debug macros for v4 and v6 address output · c842a3ad

Committed by Julius Volz on Sep 02, 2008

Add some debugging macros that allow conditional output of either v4 or v6
addresses, depending on an 'af' parameter. This is done by creating a
temporary string buffer in an outer debug macro and writing addresses'
string representations into it from another macro which can only be used
when inside the outer one.
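The outer/inner macro pattern can be sketched in userspace. DBG_BUF, DBG_ADDR, dbg_addr() and union inet_addr are hypothetical simplifications, not the IPVS IP_VS_DBG_BUF/IP_VS_DBG_ADDR definitions, and output goes to a caller buffer instead of printk. The outer macro declares a scratch buffer and offset. The inner macro is only valid inside it, and each use appends one address string to that buffer.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified address storage in the style of union nf_inet_addr. */
union inet_addr {
	struct in_addr in;
	struct in6_addr in6;
};

/* Writes @addr as text at buf[*idx], advances *idx, returns the string. */
static const char *dbg_addr(int af, char *buf, size_t len,
			    const union inet_addr *addr, size_t *idx)
{
	char *p = buf + *idx;
	const void *src = af == AF_INET6 ? (const void *)&addr->in6
					 : (const void *)&addr->in;

	if (!inet_ntop(af, src, p, len - *idx))
		return "?"; /* scratch buffer exhausted */
	*idx += strlen(p) + 1;
	return p;
}

/* Inner macro: only usable inside DBG_BUF, which declares the buffer. */
#define DBG_ADDR(af, addr) \
	dbg_addr((af), dbg_buf_, sizeof(dbg_buf_), (addr), &dbg_idx_)

/* Outer macro: provides the scratch buffer, then formats into @out. */
#define DBG_BUF(out, outlen, ...)                \
	do {                                         \
		char dbg_buf_[160];                      \
		size_t dbg_idx_ = 0;                     \
		snprintf((out), (outlen), __VA_ARGS__);  \
	} while (0)
```

Because every DBG_ADDR() call gets its own slice of the buffer, several addresses of either family can appear in one message, whatever order the arguments are evaluated in.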
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Add general v4/v6 helper functions / data structures · 64aae3cb

Committed by Julius Volz on Sep 02, 2008

Add a struct ip_vs_iphdr for easier handling of common v4 and v6 header
fields in the same code path. ip_vs_fill_iphdr() helps to fill this struct
from an IPv4 or IPv6 header. Add further helper functions for copying and
comparing addresses.
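A sketch of a family-neutral header view and an address comparison helper. struct vs_iphdr, fill_iphdr() and addr_equal() are hypothetical stand-ins for struct ip_vs_iphdr, ip_vs_fill_iphdr() and the address helpers, and IPv6 extension headers are ignored. Offsets follow the fixed IPv4 header (protocol at byte 9, addresses at 12/16) and the fixed 40-byte IPv6 header (Next Header at byte 6, addresses at 8/24).

```c
#include <arpa/inet.h> /* inet_pton() for callers */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

union inet_addr {
	uint32_t ip;     /* IPv4, network byte order */
	uint8_t ip6[16]; /* IPv6 */
};

/* Hypothetical common view of the fields needed from either header. */
struct vs_iphdr {
	int len; /* header length in bytes */
	uint8_t protocol;
	union inet_addr saddr, daddr;
};

/* Fill @h from a raw IPv4 or IPv6 header (no extension headers). */
static void fill_iphdr(int af, const uint8_t *nh, struct vs_iphdr *h)
{
	memset(h, 0, sizeof(*h));
	if (af == AF_INET6) {
		h->len = 40;         /* fixed IPv6 header */
		h->protocol = nh[6]; /* Next Header */
		memcpy(h->saddr.ip6, nh + 8, 16);
		memcpy(h->daddr.ip6, nh + 24, 16);
	} else {
		h->len = (nh[0] & 0x0f) * 4; /* IHL counts 32-bit words */
		h->protocol = nh[9];
		memcpy(&h->saddr.ip, nh + 12, 4);
		memcpy(&h->daddr.ip, nh + 16, 4);
	}
}

/* Compare only the bytes that are meaningful for the family. */
static int addr_equal(int af, const union inet_addr *a, const union inet_addr *b)
{
	return af == AF_INET6 ? memcmp(a->ip6, b->ip6, 16) == 0 : a->ip == b->ip;
}
```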
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

IPVS: Change IPVS data structures to support IPv6 addresses · e7ade46a

Committed by Julius Volz on Sep 02, 2008

Introduce new 'af' fields into IPVS data structures for specifying an
entry's address family. Convert IP addresses to be of type union
nf_inet_addr.
Signed-off-by: Julius Volz <juliusv@google.com>
Signed-off-by: Simon Horman <horms@verge.net.au>

04 Sep, 2008 · 12 commits

dccp: Policy-based packet dequeueing infrastructure · d6da3511

Committed by Tomasz Grobelny on Sep 04, 2008

This patch adds a generic infrastructure for policy-based dequeueing of 
TX packets and provides two policies:
 * a simple FIFO policy (which is the default) and
 * a priority based policy (set via socket options).
Both policies honour the tx_qlen sysctl for the maximum size of the write
queue (can be overridden via socket options). 

The priority policy uses skb->priority internally to assign a u32 priority
identifier, using the same ranking as SO_PRIORITY. The skb->priority field
is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
data using cmsg(3); the patch also provides the requisite parsing routines.
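The two policies can be sketched over a small array-backed queue. struct pkt, struct txq and all functions here are hypothetical and are not the DCCP qpolicy code. Both policies share the tx_qlen admission limit. FIFO always takes the oldest packet, while the priority policy takes the highest value (higher means more important, as with SO_PRIORITY) and breaks ties by age.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define QCAP 16

struct pkt { int id; uint32_t priority; };

/* Hypothetical write queue; tx_qlen bounds its length. */
struct txq { struct pkt q[QCAP]; int len; int tx_qlen; };

static bool q_push(struct txq *t, struct pkt p)
{
	if (t->len >= t->tx_qlen || t->len >= QCAP)
		return false; /* queue full for either policy */
	t->q[t->len++] = p;
	return true;
}

/* Remove entry @i (caller guarantees 0 <= i < len), keeping order. */
static struct pkt q_take(struct txq *t, int i)
{
	struct pkt p = t->q[i];

	for (int j = i; j < t->len - 1; j++)
		t->q[j] = t->q[j + 1];
	t->len--;
	return p;
}

/* FIFO policy: the oldest packet (queue must be non-empty). */
static struct pkt fifo_dequeue(struct txq *t) { return q_take(t, 0); }

/* Priority policy: highest priority, oldest first among equals. */
static struct pkt prio_dequeue(struct txq *t)
{
	int best = 0;

	for (int i = 1; i < t->len; i++)
		if (t->q[i].priority > t->q[best].priority)
			best = i;
	return q_take(t, best);
}
```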
Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

tcp/dccp: Consolidate common code for RFC 3390 conversion · 6224877b

Committed by Gerrit Renker on Sep 04, 2008

This patch consolidates the code common to TCP and CCID-2:
 * TCP uses RFC 3390 in a packet-oriented manner (tcp_input.c) and
 * CCID-2 uses RFC 3390 in packet-oriented manner (RFC 4341).
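RFC 3390 defines the initial window in bytes as min(4*MSS, max(2*MSS, 4380)). A packet-oriented user needs a packet count for the same three MSS ranges: up to 1095 bytes, up to 2190 bytes, and above. The sketch below shows both forms. The function names are hypothetical, and using 3 packets for the whole middle range is an approximation chosen here, since 4380/MSS varies between 2 and 4 in that range.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 3390 initial window in bytes: min(4*MSS, max(2*MSS, 4380)). */
static uint32_t rfc3390_iw_bytes(uint32_t mss)
{
	uint32_t hi = 4 * mss;
	uint32_t lo = 2 * mss > 4380 ? 2 * mss : 4380;

	return hi < lo ? hi : lo;
}

/* Packet-oriented form: 4 packets up to 1095 bytes, 2 packets above
 * 2190 bytes, and (approximately) 3 packets in between. */
static uint32_t rfc3390_iw_packets(uint32_t mss)
{
	return mss <= 1095 ? 4 : (mss > 2190 ? 2 : 3);
}
```

For the common Ethernet MSS of 1460 bytes, both forms agree: 4380 bytes, or exactly 3 packets.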
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Extend CCID packet dequeueing interface · e7937772

Committed by Gerrit Renker on Sep 04, 2008

This extends the packet dequeuing interface of dccp_write_xmit() to allow
 1. CCIDs to take care of timing when the next packet may be sent;
 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

The main purpose is to take CCID2 out of its polling mode (when it is network-
limited, it tries every millisecond to send, without interruption).
The interface can also be used to support other CCIDs.

The mode of operation for (2) is as follows:
 * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
 * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full), 
 * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
 * dccp_write_xmit() returns without further action;
 * after some time the wait-condition for CCID becomes true,
 * that CCID schedules the tasklet,
 * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
 * since the wait-condition is now true, ccid_hc_tx_send_packet() returns "send now",
 * packet is sent, and possibly more (since dccp_write_xmit() loops).
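The mode of operation above can be modeled as a small state machine. Only CCID_PACKET_WILL_DEQUEUE_LATER is named in the commit. The other verdict, struct sender, and all functions are hypothetical, and a window-limited CCID stands in for the generic wait-condition. The transmit loop returns as soon as the CCID defers, so there is no polling. When the window reopens, the "tasklet" simply calls the same loop again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical verdicts; only WILL_DEQUEUE_LATER comes from the text. */
enum ccid_verdict {
	CCID_PACKET_SEND_AT_ONCE,
	CCID_PACKET_WILL_DEQUEUE_LATER,
};

/* Hypothetical sender state for a window-limited CCID. */
struct sender {
	int queued, in_flight, cwnd, sent;
	bool tasklet_pending;
};

static enum ccid_verdict ccid_send_packet(const struct sender *s)
{
	return s->in_flight < s->cwnd ? CCID_PACKET_SEND_AT_ONCE
				      : CCID_PACKET_WILL_DEQUEUE_LATER;
}

/* In the spirit of dccp_write_xmit(): send while allowed, return when
 * the CCID says it will dequeue later. */
static void write_xmit(struct sender *s)
{
	while (s->queued > 0) {
		if (ccid_send_packet(s) == CCID_PACKET_WILL_DEQUEUE_LATER)
			return;
		s->queued--;
		s->in_flight++;
		s->sent++;
	}
}

/* Wait-condition becomes true (an ack opens the window): schedule. */
static void on_ack(struct sender *s, int acked)
{
	s->in_flight -= acked;
	s->tasklet_pending = true;
}

/* Tasklet body: just re-enter the common transmit loop. */
static void run_tasklet(struct sender *s)
{
	if (s->tasklet_pending) {
		s->tasklet_pending = false;
		write_xmit(s);
	}
}
```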

Code reuse: the tasklet function calls dccp_write_xmit(); the timer function
            reduces to a wrapper around the same code.

If the tasklet finds that the socket is locked, it re-schedules the tasklet
function (not the tasklet) after one jiffy.

Changed DCCP_BUG to dccp_pr_debug when transmit_skb returns an error (e.g. when a
local qdisc is used, NET_XMIT_DROP=1 can be returned for many packets).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp ccid-2: Schedule Sync as out-of-band mechanism · c2f42077

Committed by Gerrit Renker on Sep 04, 2008

The problem with Ack Vectors is that 

  i) their length is variable and can in principle grow quite large,
 ii) it is hard to predict exactly how large they will be.

Due to the second point it does not seem a good idea to reduce the MPS, in
particular when, on average, there is enough room for the Ack Vector and an
increase in length is only momentary, due to some burst loss, after which the
Ack Vector returns to its normal/average length.

The solution taken by this patch is to subtract a minimum-expected Ack Vector
length from the MPS (previous patch), and to defer any larger Ack Vectors onto
a separate Sync - but only if indeed there is no space left on the skb.
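The deferral rule in a few lines, as a sketch. struct opt_budget and place_ackvec() are hypothetical, and the real code budgets option space on the skb against the MPS. An Ack Vector that fits the remaining option space goes inline. A larger one is moved onto a Sync instead of permanently shrinking the MPS.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-packet option budget. */
struct opt_budget {
	int room;            /* option bytes still free on this packet */
	bool sync_scheduled; /* out-of-band Sync requested */
};

/* Returns true if the Ack Vector was placed inline; otherwise it is
 * left for a separately scheduled Sync packet. */
static bool place_ackvec(struct opt_budget *b, int ackvec_len)
{
	if (ackvec_len <= b->room) {
		b->room -= ackvec_len;
		return true;
	}
	b->sync_scheduled = true;
	return false;
}
```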

This patch provides the infrastructure to schedule Sync-packets for transporting
(urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since
it does not need to wait for new application data.

It can thus serve other parts of the DCCP code as well.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Replace magic CCID-specific numbers by symbolic constants · f10ecaee

Committed by Gerrit Renker on Sep 04, 2008

The constants DCCPO_{MIN,MAX}_CCID_SPECIFIC are not used anywhere in the code;
instead, raw numbers are used for the CCID-specific options.

This patch unifies the use of CCID-specific option numbers, by adding symbolic
names reflecting the definitions in RFC 4340, 10.3.
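A sketch of symbolic names for the two CCID-specific ranges and a classifier that routes an option to the half-connection that parses it. It assumes the RFC 4340, 10.3 split: 128 to 191 are sent by the HC-Sender and read by the RX CCID, and 192 to 255 are sent by the HC-Receiver and read by the TX CCID. The OPT_* names and classify_option() are illustrative, not the kernel's spelling.

```c
#include <assert.h>

/* CCID-specific option ranges (RFC 4340, 10.3); names are illustrative. */
enum {
	OPT_MIN_RX_CCID_SPECIFIC = 128, /* sent by HC-Sender, read by RX CCID */
	OPT_MAX_RX_CCID_SPECIFIC = 191,
	OPT_MIN_TX_CCID_SPECIFIC = 192, /* sent by HC-Receiver, read by TX CCID */
	OPT_MAX_TX_CCID_SPECIFIC = 255,
};

enum opt_owner { OPT_GENERIC, OPT_FOR_RX_CCID, OPT_FOR_TX_CCID };

/* Route an incoming option number to the CCID that should parse it. */
static enum opt_owner classify_option(unsigned int opt)
{
	if (opt >= OPT_MIN_RX_CCID_SPECIFIC && opt <= OPT_MAX_RX_CCID_SPECIFIC)
		return OPT_FOR_RX_CCID;
	if (opt >= OPT_MIN_TX_CCID_SPECIFIC && opt <= OPT_MAX_TX_CCID_SPECIFIC)
		return OPT_FOR_TX_CCID;
	return OPT_GENERIC;
}
```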
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

dccp: Initialisation and type-checking of feature sysctls · 0a482267

Committed by Gerrit Renker on Sep 04, 2008

This patch takes care of initialising and type-checking sysctls related to
feature negotiation. Type checking is important since some of the sysctls
now directly act on the feature-negotiation process.

The sysctls are initialised with the known default values for each feature.
For the type-checking the value constraints from RFC 4340 are used:

 * Sequence Window uses the specified Wmin=32, the maximum is ulong (4 bytes),
   tested and confirmed that it works up to 4294967295 - for Gbps speed;
 * Ack Ratio is between 0 .. 0xffff (2-byte unsigned integer);
 * CCIDs are between 0 .. 255;
 * request_retries, retries1, retries2 also between 0..255 for good measure;
 * tx_qlen is checked to be non-negative;
 * sync_ratelimit remains as before.
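The range constraints listed above can be sketched as a table-driven check. The table, its entry names and check_sysctl() are hypothetical, and the names only loosely follow the list. In the kernel such bounds are usually enforced through the sysctl table's min/max handlers rather than a lookup like this. tx_qlen and sync_ratelimit are left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical [min, max] bounds, following the list above. */
struct sysctl_range {
	const char *name;
	unsigned long min, max;
};

static const struct sysctl_range ranges[] = {
	{ "seq_window",      32, 4294967295UL }, /* Wmin = 32, 4-byte max */
	{ "ack_ratio",        0, 0xffff },       /* 2-byte unsigned */
	{ "rx_ccid",          0, 255 },
	{ "tx_ccid",          0, 255 },
	{ "request_retries",  0, 255 },
	{ "retries1",         0, 255 },
	{ "retries2",         0, 255 },
};

/* Returns 0 if @val is within range, -EINVAL if not, -ENOENT if unknown. */
static int check_sysctl(const char *name, unsigned long val)
{
	for (unsigned int i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
		if (strcmp(ranges[i].name, name) == 0)
			return (val < ranges[i].min || val > ranges[i].max)
				? -EINVAL : 0;
	}
	return -ENOENT;
}
```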

Further changes:
----------------
Performed s@sysctl_dccp_feat@sysctl_dccp@g since the sysctls are now in feat.c.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Implement both feature-local and feature-remote Sequence Window feature · 51c7d4fa

Committed by Gerrit Renker on Sep 04, 2008

This adds full support for the local/remote Sequence Window feature, from which the
  * sequence-number-validity (W) and
  * acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.
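The window bounds can be computed as a sketch of the formulas given in RFC 4340, 7.5.1. struct seq_windows and update_windows() are hypothetical. DCCP sequence numbers are 48-bit and real code must compare them modulo 2^48, but this sketch ignores wrap-around and only guards against unsigned underflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical window state (no 48-bit wrap-around handling). */
struct seq_windows {
	uint64_t swl, swh; /* valid sequence numbers: [SWL, SWH] */
	uint64_t awl, awh; /* valid acknowledgement numbers: [AWL, AWH] */
};

static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/* RFC 4340, 7.5.1:
 *   SWL = max(GSR + 1 - floor(W/4), ISR),  SWH = GSR + 1 + floor(3W/4)
 *   AWL = max(GSS + 1 - W', ISS),          AWH = GSS                 */
static void update_windows(struct seq_windows *w, uint64_t gsr, uint64_t isr,
			   uint64_t gss, uint64_t iss, uint64_t W, uint64_t Wp)
{
	uint64_t lo = gsr + 1 >= W / 4 ? gsr + 1 - W / 4 : 0;

	w->swl = max_u64(lo, isr);
	w->swh = gsr + 1 + (3 * W) / 4;
	lo = gss + 1 >= Wp ? gss + 1 - Wp : 0;
	w->awl = max_u64(lo, iss);
	w->awh = gss;
}
```

With W = W' = 100, GSR = 1000 and GSS = 2000, the sequence window is [976, 1076]. The acknowledgement window starts at ISS (1990 in the test below) because ISS is larger than GSS + 1 - W'.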

Specifically, the following changes are introduced:
  * integrated new socket fields into dccp_sk;
  * updated the update_gsr/gss routines with regard to these fields;
  * updated handler code: the Sequence Window feature is located at the TX side,
    so the local feature is meant if the handler-rx flag is false;
  * the initialisation of `rcv_wnd' in reqsk is removed, since
    - rcv_wnd is not used by the code anywhere;
    - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
    - dccp_check_req checks the Ack number validity more rigorously;
  * the `struct dccp_minisock' became empty and is now removed.

Until the handshake completes with activating negotiated values, the local/remote
Sequence-Window values are undefined and thus cannot reliably be estimated.
This issue is addressed in a separate patch.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Initialisation framework for feature negotiation · 5d3dac26

Committed by Gerrit Renker on Sep 04, 2008

This initialises feature negotiation from two tables, which are initialised
from sysctls. 

As a novel feature, specifics of the implementation (e.g. currently short
seqnos and ECN are not supported) are advertised for robustness.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp ccid-2: Phase out the use of boolean Ack Vector sysctl · b235dc4a

Committed by Gerrit Renker on Sep 04, 2008

This removes the use of the sysctl and the minisock variable for the Send Ack
Vector feature, which is now handled fully dynamically via feature negotiation;
i.e. when CCID2 is enabled, Ack Vectors are automatically enabled (as per
RFC 4341, 4.).

Using a sysctl in parallel to this implementation would open the door to
crashes, since much of the code relies on tests of the boolean minisock /
sysctl variable. Thus, this patch replaces all tests of type

	if (dccp_msk(sk)->dccpms_send_ack_vector)
		/* ... */
with
	if (dp->dccps_hc_rx_ackvec != NULL)
		/* ... */

The dccps_hc_rx_ackvec is allocated by the dccp_hdlr_ackvec() when feature
negotiation concluded that Ack Vectors are to be used on the half-connection.
Otherwise, it is NULL (due to dccp_init_sock/dccp_create_openreq_child),
so that the test is a valid one.

The activation handler for Ack Vectors is called as soon as the feature
negotiation has concluded at the
 * server when the Ack marking the transition RESPOND => OPEN arrives;
 * client after it has sent its ACK, marking the transition REQUEST => PARTOPEN.

Adding the sequence number of the Response packet to the Ack Vector has been 
removed, since
 (a) connection establishment implies that the Response has been received;
 (b) the CCIDs only look at packets received in the (PART)OPEN state, i.e.
     this entry will always be ignored;
 (c) it can not be used for anything useful - to detect loss for instance, only
     packets received after the loss can serve as pseudo-dupacks.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Remove manual influence on NDP Count feature · 68e074bf

Committed by Gerrit Renker on Sep 04, 2008

Updating the NDP count feature is handled automatically now:
 * for CCID-2 it is disabled, since the code does not use NDP counts;
 * for CCID-3 it is enabled, as NDP counts are used to determine loss lengths.

Allowing the user to change NDP values leads to unpredictable and failing
behaviour, since it is then possible to disable NDP counts even when they
are needed (e.g. in CCID-3).

This means that only those user settings are sensible that agree with the
values for Send NDP Count implied by the choice of CCID. But those settings
are already activated by the feature negotiation (CCID dependency tracking),
hence this form of support is redundant.

At startup the initialisation of the NDP count feature is with the default
value of 0, which is done implicitly by the zeroing-out of the socket when
it is allocated. If the choice of CCID or feature negotiation enables NDP
count, this will then be updated via the NDP activation handler.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Remove obsolete parts of the old CCID interface · 78673e24

Committed by Gerrit Renker on Sep 04, 2008

The TX/RX CCIDs of the minisock are now redundant: similar to the Ack Vector
case, their value initially equals that of the sysctl, but at the end of
feature negotiation it may be something different.

The old interface removed by this patch thus has been replaced by the newer
interface to dynamically query the currently loaded CCIDs earlier in this
patch set.

Also removed the constructors for the TX CCID and the RX CCID, since the
switch rx/non-rx is done by the handler in minisocks.c (and the handler is
the only place in the code where CCIDs are loaded).
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>

dccp: Set per-connection CCIDs via socket options · fade756f

Committed by Gerrit Renker on Sep 04, 2008

With this patch, TX/RX CCIDs can now be changed on a per-connection basis, which
overrides the defaults set by the global sysctl variables for TX/RX CCIDs.

To make full use of this facility, the remaining patches of this patch set are
needed, which track dependencies and activate negotiated feature values.

Note on the maximum number of CCIDs that can be registered:
-----------------------------------------------------------
The maximum number of CCIDs that can be registered on the socket is constrained
by the space in a Confirm/Change feature negotiation option. 

The space in these in turn depends on the size of header options as defined
in RFC 4340, 5.8. Since this is a recurring constant, it has been moved from
ackvec.h into linux/dccp.h, clarifying its purpose.

Relative to this size, the maximum number of CCID identifiers that can be 
present in a Confirm option (which always consumes 1 byte more than a Change
option, cf. 6.1) is 2 bytes less than the maximum TLV size: one for the
CCID-feature-type and one for the selected value.
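The size argument worked out in C, as a sketch under the stated reading of RFC 4340, 5.8. The Length byte counts the type, length and data bytes, so one option is at most 255 bytes and carries at most 253 data bytes. The constant and function names are hypothetical. A Confirm for the CCID feature then leaves 253 - 2 = 251 bytes for CCID identifiers. A Change option needs one byte less overhead.

```c
#include <assert.h>
#include <stdbool.h>

/* One option: at most 255 bytes including type and length bytes. */
enum {
	OPT_MAX_TOTAL     = 255,
	SINGLE_OPT_MAXLEN = OPT_MAX_TOTAL - 2, /* 253 data bytes */
};

/* Confirm spends one data byte on the feature number and one on the
 * selected value; Change only needs the feature number. */
static bool ccid_list_fits(unsigned int nccids, bool is_confirm)
{
	unsigned int overhead = is_confirm ? 2 : 1;

	return nccids + overhead <= SINGLE_OPT_MAXLEN;
}
```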
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
