提交 · f0940cee222790e6e995a23f25c4ffb23f939a24 · openanolis / cloud-kernel

13 1月, 2011 4 次提交

A
pass default dentry_operations to mount_pseudo() · c74a1cbb
由 Al Viro 提交于 1月 12, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
c74a1cbb

net/ceph: make ceph_msgr_wq non-reentrant · f363e45f

由 Tejun Heo 提交于 1月 03, 2011

ceph messenger code does a rather complex dancing around multithread
workqueue to make sure the same work item isn't executed concurrently
on different CPUs.  This restriction can be provided by workqueue with
WQ_NON_REENTRANT.

Make ceph_msgr_wq non-reentrant workqueue with the default concurrency
level and remove the QUEUED/BUSY logic.

* This removes backoff handling in con_work() but it couldn't reliably
  block execution of con_work() to begin with - queue_con() can be
  called after the work started but before BUSY is set.  It seems that
  it was an optimization for a rather cold path and can be safely
  removed.

* The number of concurrent work items is bound by the number of
  connections and connetions are independent from each other.  With
  the default concurrency level, different connections will be
  executed independently.
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Sage Weil <sage@newdream.net>
Cc: ceph-devel@vger.kernel.org
Signed-off-by: NSage Weil <sage@newdream.net>

f363e45f

ceph: Always free allocated memory in osdmap_decode() · b0aee351

由 Jesper Juhl 提交于 12月 24, 2010

Always free memory allocated to 'pi' in
net/ceph/osdmap.c::osdmap_decode().
Signed-off-by: NJesper Juhl <jj@chaosbits.net>
Signed-off-by: NSage Weil <sage@newdream.net>

b0aee351

ceph: add dir_layout to inode · 6c0f3af7

由 Sage Weil 提交于 11月 16, 2010

Add a ceph_dir_layout to the inode, and calculate dentry hash values based
on the parent directory's specified dir_hash function. This is needed
because the old default Linux dcache hash function is extremely week and
leads to a poor distribution of files among dir fragments.
Signed-off-by: NSage Weil <sage@newdream.net>

6c0f3af7

12 1月, 2011 4 次提交

netfilter: fix race in conntrack between dump_table and destroy · 13ee6ac5

由 Stephen Hemminger 提交于 1月 11, 2011

The netlink interface to dump the connection tracking table has a race
when entries are deleted at the same time. A customer reported a crash
and the backtrace showed thatctnetlink_dump_table was running while a
conntrack entry was being destroyed.
(see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).

According to RCU documentation, when using hlist_nulls the reader
must handle the case of seeing a deleted entry and not proceed
further down the linked list.  The old code would continue
which caused the scan to walk into the free list.

This patch uses locking (rather than RCU) for this operation which
is guaranteed safe, and no longer requires getting reference while
doing dump operation.
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

13ee6ac5

ah: reload pointers to skb data after calling skb_cow_data() · 4b0ef1f2

由 Dang Hongwu 提交于 1月 11, 2011

skb_cow_data() may allocate a new data buffer, so pointers on
skb should be set after this function.

Bug was introduced by commit dff3bb06 ("ah4: convert to ahash")
and 8631e9bd ("ah6: convert to ahash").
Signed-off-by: NWang Xuefu <xuefu.wang@6wind.com>
Acked-by: NKrzysztof Witek <krzysztof.witek@6wind.com>
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4b0ef1f2

xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNC · fa6dd8a2

由 Nicolas Dichtel 提交于 1月 11, 2011

Maximum trunc length is defined by MAX_AH_AUTH_LEN (in bytes)
and need to be checked when this value is set (in bits) by
the user. In ah4.c and ah6.c a BUG_ON() checks this condiftion.
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa6dd8a2

tcp: disallow bind() to reuse addr/port · c191a836

由 Eric Dumazet 提交于 1月 11, 2011

inet_csk_bind_conflict() logic currently disallows a bind() if
it finds a friend socket (a socket bound on same address/port)
satisfying a set of conditions :

1) Current (to be bound) socket doesnt have sk_reuse set
OR
2) other socket doesnt have sk_reuse set
OR
3) other socket is in LISTEN state

We should add the CLOSE state in the 3) condition, in order to avoid two
REUSEADDR sockets in CLOSE state with same local address/port, since
this can deny further operations.

Note : a prior patch tried to address the problem in a different (and
buggy) way. (commit fda48a0d tcp: bind() fix when many ports
are bound).
Reported-by: NGaspar Chilingarov <gasparch@gmail.com>
Reported-by: NDaniel Baluta <daniel.baluta@gmail.com>
Tested-by: NDaniel Baluta <daniel.baluta@gmail.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c191a836

11 1月, 2011 9 次提交

net/9p: Use proper data types · 219fd58b

由 M. Mohan Kumar 提交于 1月 10, 2011

Use proper data types for storing the count of the binary blob and
length of a string. Without this patch length calculation of string will
always result in -1 because of comparision between signed and unsigned
integer.
Signed-off-by: NM. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: NVenkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>

219fd58b

CAIF: Fix IPv6 support in receive path for GPRS/3G · d7b92aff

由 Kumar Sanghvi 提交于 1月 07, 2011

Checks version field of IP in the receive path for GPRS/3G data
and appropriately sets the value of skb->protocol.
Signed-off-by: NSjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d7b92aff

arp: allow to invalidate specific ARP entries · 545ecdc3

由 Maxim Levitsky 提交于 1月 08, 2011

IPv4 over firewire needs to be able to remove ARP entries
from the ARP cache that belong to nodes that are removed, because
IPv4 over firewire uses ARP packets for private information
about nodes.

This information becomes invalid as soon as node drops
off the bus and when it reconnects, its only possible
to start talking to it after it responded to an ARP packet.
But ARP cache prevents such packets from being sent.
Signed-off-by: NMaxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

545ecdc3

net_sched: factorize qdisc stats handling · bfe0d029

由 Eric Dumazet 提交于 1月 09, 2011

HTB takes into account skb is segmented in stats updates.
Generalize this to all schedulers.

They should use qdisc_bstats_update() helper instead of manipulating
bstats.bytes and bstats.packets

Add bstats_update() helper too for classes that use
gnet_stats_basic_packed fields.

Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no
stab is setup on qdisc.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bfe0d029

net: Add alloc_netdev_mqs function · 36909ea4

由 Tom Herbert 提交于 1月 09, 2011

Added alloc_netdev_mqs function which allows the number of transmit and
receive queues to be specified independenty.  alloc_netdev_mq was
changed to a macro to call the new function.  Also added
alloc_etherdev_mqs with same purpose.
Signed-off-by: NTom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

36909ea4

caif: don't set connection request param size before copying data · 91b5c98c

由 Dan Rosenberg 提交于 1月 10, 2011

The size field should not be set until after the data is successfully
copied in.
Signed-off-by: NDan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91b5c98c

phonet: some signedness bugs · facb4edc

由 Dan Carpenter 提交于 1月 10, 2011

Dan Rosenberg pointed out that there were some signed comparison bugs
in the phonet protocol.

http://marc.info/?l=full-disclosure&m=129424528425330&w=2

The problem is that we check for array overflows but "protocol" is
signed and we don't check for array underflows.  If you have already
have CAP_SYS_ADMIN then you could use the bugs to get root, or someone
could cause an oops by mistake.
Signed-off-by: NDan Carpenter <error27@gmail.com>
Acked-by: NRémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

facb4edc

NFS: Don't use vm_map_ram() in readdir · 6650239a

由 Trond Myklebust 提交于 1月 08, 2011

vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org  [2.6.37]

6650239a

netfilter: x_tables: dont block BH while reading counters · 83723d60

由 Eric Dumazet 提交于 1月 10, 2011

Using "iptables -L" with a lot of rules have a too big BH latency.
Jesper mentioned ~6 ms and worried of frame drops.

Switch to a per_cpu seqlock scheme, so that taking a snapshot of
counters doesnt need to block BH (for this cpu, but also other cpus).

This adds two increments on seqlock sequence per ipt_do_table() call,
its a reasonable cost for allowing "iptables -L" not block BH
processing.
Reported-by: NJesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Acked-by: NStephen Hemminger <shemminger@vyatta.com>
Acked-by: NJesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

83723d60

10 1月, 2011 8 次提交

net offloading: Convert checksums to use centrally computed features. · 03634668

由 Jesse Gross 提交于 1月 09, 2011

In order to compute the features for other offloads (primarily
scatter/gather), we need to first check the ability of the NIC to
offload the checksum for the packet.  Since we have already computed
this, we can directly use the result instead of figuring it out
again.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

03634668

net offloading: Convert skb_need_linearize() to use precomputed features. · 02932ce9

由 Jesse Gross 提交于 1月 09, 2011

This switches skb_need_linearize() to use the features that have
been centrally computed.  In doing so, this fixes a problem where
scatter/gather should not be used because the card does not support
checksum offloading on that type of packet.  On device registration
we only check that some form of checksum offloading is available if
scatter/gatther is enabled but we must also check at transmission
time.  Examples of this include IPv6 or vlan packets on a NIC that
only supports IPv4 offloading.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02932ce9

net offloading: Convert dev_gso_segment() to use precomputed features. · 91ecb63c

由 Jesse Gross 提交于 1月 09, 2011

This switches dev_gso_segment() to use the device features computed
by the centralized routine.  In doing so, it fixes a problem where
it would always use dev->features, instead of those appropriate
to the number of vlan tags if any are present.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91ecb63c

net offloading: Pass features into netif_needs_gso(). · fc741216

由 Jesse Gross 提交于 1月 09, 2011

Now that there is a single function that can compute the device
features relevant to a packet, we don't want to run it for each
offload.  This converts netif_needs_gso() to take the features
of the device, rather than computing them itself.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc741216

net offloading: Generalize netif_get_vlan_features(). · f01a5236

由 Jesse Gross 提交于 1月 09, 2011

netif_get_vlan_features() is currently only used by netif_needs_gso(),
so it only concerns itself with GSO features.  However, several other
places also should take into account the contents of the packet when
deciding whether to offload to hardware.  This generalizes the function
to return features about all of the various forms of offloading.  Since
offloads tend to be linked together, this avoids duplicating the logic
in each location (i.e. the scatter/gather code also needs the checksum
logic).
Suggested-by: NMichał Mirosław <mirqus@gmail.com>
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f01a5236

net offloading: Accept NETIF_F_HW_CSUM for all protocols. · 9497a051

由 Jesse Gross 提交于 1月 09, 2011

We currently only have software fallback for one type of checksum: the
TCP/UDP one's complement. This means that a protocol that uses hardware
offloading for a different type of checksum (FCoE, SCTP) must directly
check the device's features and do the right thing ahead of time. By
the time we get to dev_can_checksum(), we're only deciding whether to
apply the one algorithm in software or hardware. NETIF_F_HW_CSUM has the
same capabilities as the software version, so we should always use it if
present. The primary advantage of this is multiply tagged vlans can use
hardware checksumming.
Signed-off-by: NJesse Gross <jesse@nicira.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9497a051

net: fix kernel-doc warning in core/filter.c · 697d0e33

由 Randy Dunlap 提交于 1月 08, 2011

Fix new kernel-doc notation warning in net/core/filter.c:

Warning(net/core/filter.c:172): No description found for parameter 'fentry'
Warning(net/core/filter.c:172): Excess function parameter 'filter' description in 'sk_run_filter'
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

697d0e33

netlink: test for all flags of the NLM_F_DUMP composite · 0ab03c2b

由 Jan Engelhardt 提交于 1月 07, 2011

Due to NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
when doing "if (x & NLM_F_DUMP)", it tests for _either_ of the bits
being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
non-dump requests with NLM_F_EXCL set are mistaken as dump requests.

Substitute the condition to test for _all_ bits being set.
Signed-off-by: NJan Engelhardt <jengelh@medozas.de>
Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0ab03c2b

07 1月, 2011 15 次提交

dccp: make upper bound for seq_window consistent on 32/64 bit · bfbb2346

由 Gerrit Renker 提交于 1月 02, 2011

The 'seq_window' sysctl sets the initial value for the DCCP Sequence Window,
which may range from 32..2^46-1 (RFC 4340, 7.5.2). The patch sets the upper
bound consistently to 2^32-1 on both 32 and 64 bit systems, which should be
sufficient - with a RTT of 1sec and 1-byte packets, a seq_window of 2^32-1
corresponds to a link speed of 34 Gbps.
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

bfbb2346

dccp: fix bug in updating the GSR · 763dadd4

由 Samuel Jero 提交于 12月 30, 2010

Currently dccp_check_seqno allows any valid packet to update the Greatest
Sequence Number Received, even if that packet's sequence number is less than
the current GSR. This patch adds a check to make sure that the new packet's
sequence number is greater than GSR.
Signed-off-by: NSamuel Jero <sj323707@ohio.edu>
Signed-off-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

763dadd4

dccp: fix return value for sequence-invalid packets · 2cf5be93

由 Samuel Jero 提交于 12月 30, 2010

Currently dccp_check_seqno returns 0 (indicating a valid packet) if the
acknowledgment number is out of bounds and the sync that RFC 4340 mandates at
this point is currently being rate-limited. This function should return -1,
indicating an invalid packet.
Signed-off-by: NSamuel Jero <sj323707@ohio.edu>
Acked-by: NGerrit Renker <gerrit@erg.abdn.ac.uk>

2cf5be93

fs: scale mntget/mntput · b3e19d92

由 Nick Piggin 提交于 1月 07, 2011

The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only taking the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b3e19d92

fs: improve scalability of pseudo filesystems · 4b936885

由 Nick Piggin 提交于 1月 07, 2011

Regardless of how much we possibly try to scale dcache, there is likely
always going to be some fundamental contention when adding or removing children
under the same parent. Pseudo filesystems do not seem need to have connected
dentries because by definition they are disconnected.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

4b936885

fs: dcache reduce branches in lookup path · fb045adb

由 Nick Piggin 提交于 1月 07, 2011

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fb045adb

fs: avoid inode RCU freeing for pseudo fs · ff0c7d15

由 Nick Piggin 提交于 1月 07, 2011

Pseudo filesystems that don't put inode on RCU list or reachable by
rcu-walk dentries do not need to RCU free their inodes.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

ff0c7d15

fs: icache RCU free inodes · fa0d7e3d

由 Nick Piggin 提交于 1月 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fa0d7e3d

fs: change d_delete semantics · fe15ce44

由 Nick Piggin 提交于 1月 07, 2011

Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fe15ce44

NFS rename client back channel transport field · 4a19de0f