Commits · 5640f7685831e088fe6c2e1f863a6805962f8e81 · _Walt / cloud-kernel

25 Sep, 2012 2 commits

net: use a per task frag allocator · 5640f768

Committed by Eric Dumazet on Sep 23, 2012

We currently use a per socket order-0 page cache for tcp_sendmsg()
operations.

This page is used to build fragments for skbs.

It's done to increase the probability of coalescing small write() calls
into single segments in skbs still in the write queue (not yet sent).

But it wastes a lot of memory for applications handling many mostly
idle sockets, since each socket holds one page in sk->sk_sndmsg_page.

It's also quite inefficient for building TSO 64KB packets, because we need
about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit the
page allocator more than wanted.

This patch adds a per task frag allocator and uses bigger pages,
if available. An automatic fallback is done in case of memory pressure.

(up to 32768 bytes per frag, that's order-3 pages on x86)

This increases TCP stream performance by 20% on loopback device,
but also benefits other network devices, since 8x fewer frags are
mapped on transmit and unmapped on tx completion. Alexander Duyck
mentioned a probable performance win on systems with IOMMU enabled.

It's possible some SG enabled hardware can't cope with bigger fragments,
but their ndo_start_xmit() should already handle this, splitting a
fragment into sub fragments, since some arches have PAGE_SIZE=65536.

Successfully tested on various ethernet devices.
(ixgbe, igb, bnx2x, tg3, mellanox mlx4)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Vijay Subramanian <subramanian.vijay@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
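
The fragment arithmetic in the message above (about 16 pages for a 64KB TSO packet with 4096-byte pages, 8x fewer with 32768-byte frags) can be checked with a small user-space sketch. The function name `frags_needed` is hypothetical and is not part of the patch; it only does the rounding-up division.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not kernel code): number of fragments needed to
 * carry `len` bytes when each fragment holds at most `frag_size` bytes.
 */
size_t frags_needed(size_t len, size_t frag_size)
{
    assert(frag_size > 0);
    /* Round up so a partial trailing fragment is counted. */
    return (len + frag_size - 1) / frag_size;
}
```

With these numbers, `frags_needed(65536, 4096)` is 16 and `frags_needed(65536, 32768)` is 2, which is the 8x reduction in mapped fragments that the message refers to.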

net: Remove unnecessary NULL check in scm_destroy(). · 2a6c8c79

Committed by David S. Miller on Sep 24, 2012

All callers provide a non-NULL scm argument.
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Sep, 2012 3 commits

netfilter: nfnetlink_queue: add NFQA_CAP_LEN attribute · 6ee584be

Committed by Pablo Neira Ayuso on Sep 24, 2012

This patch adds the NFQA_CAP_LEN attribute that allows us to know
what is the real packet size from user-space (even if we decided
to retrieve just a few bytes from the packet instead of all of it).

Security software that inspects packets should always check for
this new attribute to make sure that it is inspecting the entire
packet.

This also helps to provide a workaround for the problem described
in: http://marc.info/?l=netfilter-devel&m=134519473212536&w=2

Original idea from Florian Westphal.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
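
As a rough illustration of the check that inspecting software is asked to perform, the sketch below compares the real packet length reported by NFQA_CAP_LEN with the number of bytes that were actually copied. The struct and function names are hypothetical and this is not the libnetfilter_queue API; it assumes the attribute has already been parsed and converted to host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one queued packet in user space. */
struct queued_pkt {
    uint32_t payload_len;  /* bytes actually copied to user space */
    bool     has_cap_len;  /* was NFQA_CAP_LEN present in the message? */
    uint32_t cap_len;      /* parsed NFQA_CAP_LEN: real packet length */
};

/* True when only part of the packet is available for inspection. */
bool pkt_is_truncated(const struct queued_pkt *p)
{
    assert(p != 0);
    return p->has_cap_len && p->cap_len > p->payload_len;
}
```

A caller that sees `pkt_is_truncated()` return true would know that any verdict is based on a partial view of the packet.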

netfilter: nf_ct_ftp: add sequence tracking pickup facility for injected entries · 7be54ca4

Committed by Pablo Neira Ayuso on Sep 21, 2012

This patch allows the FTP helper to pick up the sequence tracking from
the first packet seen. This is useful to fix the breakage of the first
FTP command after a failover while using conntrackd to synchronize
states.

The seq_aft_nl_num field in struct nf_ct_ftp_info has been shrunk to
16 bits (enough for what it does), so we can use the remaining 16 bits
to store the flags while keeping the same size for the private FTP helper
data.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
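
The size-preserving trick described above (narrow a field to 16 bits and reuse the freed 16 bits for flags) can be sketched as follows. The struct layouts, the flag name, and the function are hypothetical simplifications, not the real `struct nf_ct_ftp_info`.

```c
#include <assert.h>
#include <stdint.h>

#define SEQ_PICKUP_FLAG 0x1u  /* hypothetical flag bit */

/* Hypothetical "before" layout: the whole 32-bit word is a counter. */
struct helper_info_before {
    uint32_t seq_num;
};

/* Hypothetical "after" layout: counter narrowed to 16 bits,
 * with the freed 16 bits reused for flags. */
struct helper_info_after {
    uint16_t seq_num;
    uint16_t flags;
};

/* The private helper data must not grow. */
_Static_assert(sizeof(struct helper_info_after) ==
               sizeof(struct helper_info_before),
               "helper data size changed");

/* Mark an entry so tracking is picked up from the first packet seen. */
void request_pickup(struct helper_info_after *info)
{
    assert(info != 0);
    info->flags = (uint16_t)(info->flags | SEQ_PICKUP_FLAG);
}
```

The compile-time assertion documents the constraint that matters here: the new layout occupies exactly as much space as the old one.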

netfilter: xt_time: add support to ignore day transition · 54eb3df3

Committed by Florian Westphal on Sep 17, 2012

Currently, if you want to do something like:
"match Monday, starting 23:00, for two hours"
You need two rules, one for Mon 23:00 to 0:00 and one for Tue 0:00-1:00.

The rule: --weekdays Mo --timestart 23:00  --timestop 01:00

looks correct, but it will first match on Monday from midnight to 1 a.m.
and then again for another hour from 23:00 onwards.

This permits userspace to explicitly ignore the day transition and
match for a single, continuous time period instead.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
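
The difference between the two behaviours can be sketched in user space. This is a simplified model, not the xt_time implementation: the struct, its `contiguous` field, and the weekday numbering (bit 0 = Sunday, so Monday is 1) are assumptions made for illustration, and times are minutes since midnight.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rule: weekday bitmask (bit 0 = Sunday ... bit 6 = Saturday),
 * window bounds in minutes since midnight. */
struct time_rule {
    unsigned weekdays;
    int start_min;
    int stop_min;
    bool contiguous;  /* treat a window crossing midnight as one period */
};

bool rule_matches(const struct time_rule *r, int weekday, int minute)
{
    bool wraps = r->start_min > r->stop_min;
    bool in_window;

    if (wraps)
        in_window = minute >= r->start_min || minute <= r->stop_min;
    else
        in_window = minute >= r->start_min && minute <= r->stop_min;
    if (!in_window)
        return false;

    /* In the after-midnight part of a wrapped window, the period
     * began on the previous day, so test that day's weekday bit. */
    if (wraps && r->contiguous && minute <= r->stop_min)
        weekday = (weekday + 6) % 7;

    return ((r->weekdays >> weekday) & 1u) != 0;
}
```

For a Monday rule running 23:00 to 01:00, the non-contiguous form matches Monday 00:30 and Monday 23:30, while the contiguous form matches Monday 23:30 and Tuesday 00:30.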

23 Sep, 2012 9 commits

netlink: Rearrange netlink_kernel_cfg to save space on 64-bit. · c9d2ea96

Committed by David S. Miller on Sep 23, 2012

Suggested by Jan Engelhardt.
Signed-off-by: David S. Miller <davem@davemloft.net>
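
Why reordering members can save space on 64-bit can be seen with a minimal sketch. The two structs below are hypothetical and much smaller than the real `struct netlink_kernel_cfg`; they only show how a pointer placed between two 32-bit fields forces padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a pointer sits between two 32-bit fields, so on
 * a typical LP64 ABI each 32-bit field is followed by 4 bytes of padding. */
struct cfg_loose {
    unsigned int groups;
    void (*input)(void *msg);
    unsigned int flags;
};

/* Same members with the 32-bit fields grouped so they share one
 * 8-byte slot on a typical LP64 ABI. */
struct cfg_tight {
    unsigned int groups;
    unsigned int flags;
    void (*input)(void *msg);
};

/* How many bytes the reordering saves on the current platform. */
size_t bytes_saved(void)
{
    return sizeof(struct cfg_loose) - sizeof(struct cfg_tight);
}
```

On a typical LP64 platform this is 24 bytes against 16; on 32-bit platforms the two layouts are usually the same size, which matches the "on 64-bit" wording of the subject line.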

netfilter: ipset: Support to match elements marked with "nomatch" · 3e0304a5

Committed by Jozsef Kadlecsik on Sep 21, 2012

Exceptions can now be matched and we can branch according to the
possible cases:

a. match in the set if the element is not flagged as "nomatch"
b. match in the set if the element is flagged with "nomatch"
c. no match

i.e.

iptables ... -m set --match-set ... -j ...
iptables ... -m set --match-set ... --nomatch-entries -j ...
...
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Coding style fixes · 3ace95c0

Committed by Jozsef Kadlecsik on Sep 21, 2012

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Include supported revisions in module description · 10111a6e

Committed by Jozsef Kadlecsik on Sep 21, 2012

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

netfilter: ipset: Rewrite cidr book keeping to handle /0 · 85f8c13e

Committed by Jozsef Kadlecsik on Sep 22, 2012

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>

tcp: TCP Fast Open Server - take SYNACK RTT after completing 3WHS · 016818d0

Committed by Neal Cardwell on Sep 22, 2012

When taking SYNACK RTT samples for servers using TCP Fast Open, fix
the code to ensure that we only call tcp_valid_rtt_meas() after we
receive the ACK that completes the 3-way handshake.

Previously we were always taking an RTT sample in
tcp_v4_syn_recv_sock(). However, for TCP Fast Open connections
tcp_v4_conn_req_fastopen() calls tcp_v4_syn_recv_sock() at the time we
receive the SYN. So for TFO we must wait until tcp_rcv_state_process()
to take the RTT sample.

To fix this, we wait until after TFO calls tcp_v4_syn_recv_sock()
before we set the snt_synack timestamp, since tcp_synack_rtt_meas()
already ensures that we only take a SYNACK RTT sample if snt_synack is
non-zero. To be careful, we only take a snt_synack timestamp when
a SYNACK transmit or retransmit succeeds.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: extract code to compute SYNACK RTT · 623df484

Committed by Neal Cardwell on Sep 22, 2012

In preparation for adding another spot where we compute the SYNACK
RTT, extract this code so that it can be shared.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: clarify the clock_name sysfs attribute · de465846

Committed by Richard Cochran on Sep 22, 2012

There has been some confusion among PHC driver authors about the
intended purpose of the clock_name attribute. This patch expands the
documentation in order to clarify how the clock_name field should be
understood.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: link the phc device to its parent device · 1ef76158

Committed by Richard Cochran on Sep 22, 2012

PTP Hardware Clock devices appear as class devices in sysfs. This patch
changes the registration API to use the parent device, clarifying the
clock's relationship to the underlying device.
Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Sep, 2012 1 commit

netlink: use <linux/export.h> instead of <linux/module.h> · abb17e6c

Committed by Pablo Neira Ayuso on Sep 21, 2012

Since (9f00d977 netlink: hide struct module parameter in netlink_kernel_create),
linux/netlink.h includes linux/module.h because of the use of THIS_MODULE.

Use linux/export.h instead, as suggested by Stephen Rothwell, which is
significantly smaller and defines THIS_MODULE.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Sep, 2012 2 commits

ipv4: Don't add TCP-code in inet_sock_destruct · bb68b647

Committed by Christoph Paasch on Sep 18, 2012

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: H.K. Jerry Chu <hkchu@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/ipoib: Add rtnl_link_ops support · 9baa0b03

Committed by Or Gerlitz on Sep 13, 2012

Add rtnl_link_ops to IPoIB, with the first usage being child device
create/delete through them. Child devices are now either legacy ones,
created/deleted through the ipoib sysfs entries, or RTNL ones.

Adding support for RTNL children involved refactoring ipoib_vlan_add,
which is now used by both the sysfs and the link_ops code.

Also, added ndo_uninit entry to support calling unregister_netdevice_queue
from the rtnl dellink entry. This required removal of calls to
ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
since the networking core will invoke ipoib_uninit which does exactly that.
Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Sep, 2012 6 commits

ipv6: unify fragment thresh handling code · 6b102865

Committed by Amerigo Wang on Sep 18, 2012

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: make ip6_frag_nqueues() and ip6_frag_mem() static inline · d4915c08

Committed by Amerigo Wang on Sep 18, 2012

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: unify conntrack reassembly expire code with standard one · b836c99f

Committed by Amerigo Wang on Sep 18, 2012

Two years ago, Shan Wei tried to fix this:
http://patchwork.ozlabs.org/patch/43905/

The problem is that RFC2460 says an ICMP Time
Exceeded -- Fragment Reassembly Time Exceeded message should be
sent to the source of that fragment if the defragmentation
times out.

"
   If insufficient fragments are received to complete reassembly of a
   packet within 60 seconds of the reception of the first-arriving
   fragment of that packet, reassembly of that packet must be
   abandoned and all the fragments that have been received for that
   packet must be discarded.  If the first fragment (i.e., the one
   with a Fragment Offset of zero) has been received, an ICMP Time
   Exceeded -- Fragment Reassembly Time Exceeded message should be
   sent to the source of that fragment.
"

As Herbert suggested, we could actually use the standard IPv6
reassembly code which follows RFC2460.

With this patch applied, I can see ICMP Time Exceeded sent
from the receiver when the sender sent out 3/4 fragmented
IPv6 UDP packet.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: add a new namespace for nf_conntrack_reasm · c038a767

Committed by Amerigo Wang on Sep 18, 2012

As pointed out by Michal, it is necessary to add a new
namespace for nf_conntrack_reasm code; this prepares
for the second patch.

Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Michal Kubeček <mkubecek@suse.cz>
Cc: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: netfilter-devel@vger.kernel.org
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netpoll: call ->ndo_select_queue() in tx path · 8c4c49df

Committed by Amerigo Wang on Sep 17, 2012

In the netpoll tx path, we miss the chance of calling ->ndo_select_queue(),
which could cause problems when bonding is involved.

This patch makes dev_pick_tx() extern (and renames it to netdev_pick_tx())
to let netpoll call it in netpoll_send_skb_on_dev().
Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netdev: make address const in device address management · 6b6e2725

Committed by stephen hemminger on Sep 17, 2012

The internal functions for adding/deleting addresses don't change
their argument.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Sep, 2012 1 commit

llc: Remove stray reference to sysctl_llc_station_ack_timeout. · b4516a28

Committed by David S. Miller on Sep 17, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

14 Sep, 2012 2 commits

scsi_netlink: Remove dead and buggy code · 8289bab1

Committed by Eric W. Biederman on Sep 07, 2012

The scsi netlink code confuses the netlink port id with a process id,
going so far as to read NETLINK_CREDS(skb)->pid instead of the correct
NETLINK_CB(skb).pid. Fortunately it does not matter because nothing
registers to respond to scsi netlink requests.

The only interesting use of the scsi_netlink interface is
fc_host_post_vendor_event which sends a netlink multicast message.

Since nothing registers to handle scsi netlink messages, kill all of the
registration logic while retaining the same error handling behavior.
This preserves the userspace visible behavior and removes all of the
confused code that thought a netlink port id was a process id.

This was tested with a kernel allyesconfig build which had no problems.

Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: James Smart <James.Smart@Emulex.Com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mISDN: Fix wrong usage of flush_work_sync while holding locks · 4b921eda

Committed by Karsten Keil on Sep 13, 2012

It is a bad idea to hold a spinlock and call flush_work_sync.
Move the workqueue cleanup outside the spinlock and use cancel_work_sync;
on closing the channel this seems to be the more correct function.
Remove the never used and constant return value of mISDN_freebchannel.
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Cc: <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Sep, 2012 1 commit

drm: Drop the NV12M and YUV420M formats · d9dd85dd

Committed by Ville Syrjälä on Apr 20, 2012

The NV12M/YUV420M formats are identical to the NV12/YUV420 formats.
So just remove these duplicated format names.

This might look like breaking the ABI, but the code has never actually
accepted these formats, so nothing can be using them.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>

12 Sep, 2012 1 commit

i2c: pnx: Fix read transactions of >= 2 bytes · c076ada4

Committed by Roland Stigge on Aug 08, 2012

On transactions with n>=2 bytes, the controller actually wrongly clocks in n+1
bytes. This is caused by the (wrong) assumption that RFE in the Status Register
is 1 iff there is no byte already ordered (via a dummy TX byte). This led to
the implementation of synchronized byte ordering, e.g.:

Dummy-TX - RX - Dummy-TX - RX - ...

But since RFE actually stays high after some Dummy-TX, it rather looks like:

Dummy-TX - Dummy-TX - RX - Dummy-TX - RX - (RX)

The last RX byte is clocked in by the bus controller, but ignored by the kernel
when filling the userspace buffer.

This patch fixes the issue by asking for RX via Dummy-TX asynchronously,
introducing a separate counter for TX bytes.
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>

11 Sep, 2012 3 commits

etherdevice: introduce help function eth_zero_addr() · 6d57e907

Committed by Duan Jiong on Sep 08, 2012

A lot of code has either the memset or an inefficient copy
from a static array that contains the all-zeros Ethernet address.
Introduce helper function eth_zero_addr() to fill an address with
all zeros, making the code clearer and allowing us to get rid of
some constant arrays.
Signed-off-by: Duan Jiong <djduanjiong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
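
A user-space sketch of such a helper is shown below. It is an approximation for illustration, not the kernel definition: the name `eth_zero_addr_sketch` is hypothetical, and `ETH_ALEN` is defined locally as the 6-octet length of an Ethernet address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6  /* octets in an Ethernet (MAC) address */

/* Hypothetical user-space stand-in: fill an address with zeros in
 * place, so no static all-zeros array is needed as a copy source. */
void eth_zero_addr_sketch(uint8_t *addr)
{
    assert(addr != 0);
    memset(addr, 0x00, ETH_ALEN);
}
```

Callers that previously kept a constant zero array only to copy from it could drop that array and call the helper on the destination buffer instead.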

filter: add MOD operation · b6069a95

Committed by Eric Dumazet on Sep 07, 2012

Add a new ALU opcode, to compute a modulus.

Commit ffe06c17 used an ancillary to implement XOR_X,
but here we reserve one of the available ALU opcodes to implement both
MOD_X and MOD_K.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: George Bakos <gbakos@alpinista.org>
Cc: Jay Schulist <jschlst@samba.org>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
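
A modulus opcode needs the same care as a division opcode: the divisor may be zero. The sketch below models one modulus step on a 32-bit accumulator. The function name and the choice to report failure on a zero divisor are assumptions for illustration; the commit message does not describe how the filter code handles that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of one modulus ALU step: A = A % divisor.
 * Returns false and leaves the accumulator untouched when the divisor
 * is zero, so the caller can abort the filter run.
 */
bool alu_mod(uint32_t *a, uint32_t divisor)
{
    assert(a != 0);
    if (divisor == 0)
        return false;
    *a %= divisor;
    return true;
}
```

The same function covers both forms named in the message: for MOD_K the divisor would be the instruction's constant, for MOD_X the current value of the index register.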

netlink: Rename pid to portid to avoid confusion · 15e47304

Committed by Eric W. Biederman on Sep 07, 2012

It is a frequent mistake to confuse the netlink port identifier with a
process identifier.  Try to reduce this confusion by renaming fields
that hold port identifiers portid instead of pid.

I have carefully avoided changing the structures exported to
userspace to avoid changing the userspace API.

I have successfully built an allyesconfig kernel with this change.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

09 Sep, 2012 2 commits

netlink: hide struct module parameter in netlink_kernel_create · 9f00d977

Committed by Pablo Neira Ayuso on Sep 08, 2012

This patch defines netlink_kernel_create as a wrapper function of
__netlink_kernel_create to hide the struct module *me parameter
(which seems to be THIS_MODULE in all existing netlink subsystems).

Suggested by David S. Miller.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: kill netlink_set_nonroot · 9785e10a

Committed by Pablo Neira Ayuso on Sep 08, 2012

Replace netlink_set_nonroot by one new field `flags' in
struct netlink_kernel_cfg that is passed to netlink_kernel_create.

This patch also renames NL_NONROOT_* to NL_CFG_F_NONROOT_* since
now the flags field in nl_table is generic (so we can add more
flags if needed in the future).

Also adjust all callers in the net-next tree to use these flags
instead of netlink_set_nonroot.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Sep, 2012 5 commits

pps/ptp: Allow PHC devices to adjust PPS events for known delay · 220a60a4

Committed by Ben Hutchings on Sep 03, 2012

Initial version by Stuart Hodgson <smhodgson@solarflare.com>

Some PHC device drivers may deliver PPS events with a significant
and variable delay, but still be able to measure precisely what
that delay is.

Add a pps_sub_ts() function for subtracting a delay from the
timestamp(s) in a PPS event, and a PTP event type (PTP_CLOCK_PPSUSR)
for which the caller provides a complete PPS event.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
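
Subtracting a measured delay from an event timestamp amounts to a timespec subtraction with a borrow from the seconds field. The sketch below shows that arithmetic in user space; `ts_sub` is a hypothetical name, not the kernel's `pps_sub_ts()`, and both inputs are assumed to be normalized (0 <= tv_nsec < 1000000000).

```c
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical helper: return ts - delay for normalized inputs. */
struct timespec ts_sub(struct timespec ts, struct timespec delay)
{
    struct timespec out;

    assert(ts.tv_nsec >= 0 && ts.tv_nsec < NSEC_PER_SEC);
    assert(delay.tv_nsec >= 0 && delay.tv_nsec < NSEC_PER_SEC);

    out.tv_sec = ts.tv_sec - delay.tv_sec;
    out.tv_nsec = ts.tv_nsec - delay.tv_nsec;
    if (out.tv_nsec < 0) {
        /* Borrow one second so tv_nsec stays in range. */
        out.tv_sec -= 1;
        out.tv_nsec += NSEC_PER_SEC;
    }
    return out;
}
```

A driver that knows an event was delivered 500 ns late would pass that delay here to recover the time at which the pulse actually occurred.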

ipv4/route: arg delay is useless in rt_cache_flush() · 4ccfe6d4

Committed by Nicolas Dichtel on Sep 07, 2012

Since route cache deletion (89aef892), delay is no
longer used. Remove it.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

scm: Don't use struct ucred in NETLINK_CB and struct scm_cookie. · dbe9a417

Committed by Eric W. Biederman on Sep 06, 2012

Passing uids and gids on NETLINK_CB from a process in one user
namespace to a process in another user namespace can result in the
wrong uid or gid being presented to userspace.  Avoid that problem by
passing kuids and kgids instead.

- define struct scm_creds for use in scm_cookie and netlink_skb_parms
  that holds uid and gid information in kuid_t and kgid_t.

- Modify scm_set_cred to fill out scm_creds by hand instead of using
  cred_to_ucred to fill out struct ucred.  This conversion ensures
  userspace does not get incorrect uid or gid values to look at.

- Modify scm_recv to convert from struct scm_creds to struct ucred
  before copying credential values to userspace.

- Modify __scm_send to populate struct scm_creds in the scm_cookie,
  instead of just copying struct ucred from userspace.

- Modify netlink_sendmsg to copy scm_creds instead of struct ucred
  into the NETLINK_CB.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Add security check / enforcement for flow steering rules set for VMs · 7fb40f87

Committed by Hadar Hen Zion on Sep 05, 2012

Since VFs may be mapped to VMs which aren't trusted entities, flow
steering rules attached through the wrapper on behalf of VFs must be
checked to make sure that their L2 specification relates to the MAC address
assigned to that VF, and an L2 specification is added if it's missing.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Put Firmware flow steering structures in common header files · a8edc3bf

Committed by Hadar Hen Zion on Sep 05, 2012

To allow use of the flow steering firmware structures in more places across the driver,
such as the resource tracker, move them from mcg.c to common header files.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

07 Sep, 2012 2 commits

SUNRPC: Fix a UDP transport regression · f39c1bfb

Committed by Trond Myklebust on Sep 07, 2012

Commit 43cedbf0 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP nor the RDMA transport mechanism uses dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.
Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]

kobject: fix oops with "input0: bad kobj_uevent_env content in show_uevent()" · 60e233a5

Committed by Bjørn Mork on Sep 02, 2012

Fengguang Wu <fengguang.wu@intel.com> writes:

> After the __devinit* removal series, I can still get kernel panic in
> show_uevent(). So there are more sources of bug..
>
> Debug patch:
>
> @@ -343,8 +343,11 @@ static ssize_t show_uevent(struct device
>                 goto out;
>
>         /* copy keys to file */
> -       for (i = 0; i < env->envp_idx; i++)
> +       dev_err(dev, "uevent %d env[%d]: %s/.../%s\n", env->buflen, env->envp_idx, top_kobj->name, dev->kobj.name);
> +       for (i = 0; i < env->envp_idx; i++) {
> +               printk(KERN_ERR "uevent %d env[%d]: %s\n", (int)count, i, env->envp[i]);
>                 count += sprintf(&buf[count], "%s\n", env->envp[i]);
> +       }
>
> Oops message, the env[] is again not properly initilized:
>
> [   44.068623] input input0: uevent 61 env[805306368]: input0/.../input0
> [   44.069552] uevent 0 env[0]: (null)

This is a completely different CONFIG_HOTPLUG problem, only
demonstrating another reason why CONFIG_HOTPLUG should go away.  I had a
hard time trying to disable it anyway ;-)

The problem this time is lots of code assuming that a call to
add_uevent_var() will guarantee that env->buflen > 0.  This is not true
if CONFIG_HOTPLUG is unset.  So things like this end up overwriting
env->envp_idx because the array index is -1:

	if (add_uevent_var(env, "MODALIAS="))
		return -ENOMEM;
	len = input_print_modalias(&env->buf[env->buflen - 1],
				   sizeof(env->buf) - env->buflen,
				   dev, 0);

Don't know what the best action is, given that there seem to be a *lot*
of this around the kernel.  This patch "fixes" the problem for me, but I
don't know if it can be considered an appropriate fix.

[ It is the correct fix for now, for 3.7 forcing CONFIG_HOTPLUG to
always be on is the longterm fix, but it's too late for 3.6 and older
kernels to resolve this that way - gregkh ]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
