Commits · f966a13f92913ce8cbd35bc7f066553c9f3d41b0 · openanolis / cloud-kernel

16 Jan, 2011 3 commits

caif: checking the wrong variable · 01a85901

Authored by Dan Carpenter on Jan 15, 2011

In the original code we check if (servl == NULL) twice. The first time
should print the message that cfmuxl_remove_uplayer() failed and set
"ret" correctly, but instead it just returns success. The second check
should be checking the value of "ret" instead of "servl".
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

can: test size of struct sockaddr in sendmsg · 5e507328

Authored by Kurt Van Dijck on Jan 15, 2011

This patch makes the CAN socket code conform to the manpage of sendmsg.
Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman-adv: Use "__attribute__" shortcut macros · aa0adb1a

Authored by Sven Eckelmann on Jan 15, 2011

Linux 2.6.21 defines different macros for __attribute__ which are also
used inside batman-adv. The next version of checkpatch.pl warns about
the usage of __attribute__((packed)).

Linux 2.6.33 defines an extra macro __always_unused which is used to
assist source code analyzers and can be used to remove the last
existing __attribute__ inside the source code.
Signed-off-by: Sven Eckelmann <sven@narfation.org>

14 Jan, 2011 5 commits

net: remove dev_txq_stats_fold() · 1ac9ad13

Authored by Eric Dumazet on Jan 12, 2011

After recent changes (percpu stats on vlan/tunnels...), we no longer need
per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters.

The only remaining users are ixgbe, sch_teql, gianfar and macvlan:

1) ixgbe can be converted to use existing tx_ring counters.

2) macvlan incremented txq->tx_dropped, it can use the
dev->stats.tx_dropped counter.

3) sch_teql : almost revert ab35cd4b (Use net_device internal stats).
Now that we have ndo_get_stats64(), use it, even for "unsigned long"
fields (no need to bring back a struct net_device_stats)

4) gianfar adds a stats structure per tx queue to hold
tx_bytes/tx_packets

This removes a lockdep warning (and possible lockup) in rndis gadget,
calling dev_get_stats() from hard IRQ context.

Ref: http://www.spinics.net/lists/netdev/msg149202.html
Reported-by: Neil Jones <neiljay@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jarek Poplawski <jarkao2@gmail.com>
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Sandeep Gopalpet <sandeep.kumar@freescale.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

batman-adv: Even Batman should not dereference NULL pointers · ed7809d9

Authored by Jesper Juhl on Jan 13, 2011

There's a problem in net/batman-adv/unicast.c::frag_send_skb().
dev_alloc_skb() allocates memory and may fail, thus returning NULL. If
this happens we'll pass a NULL pointer on to skb_split() which in turn
hands it to skb_split_inside_header() from where it gets passed to
skb_put() that lets skb_tail_pointer() play with it and that function
dereferences it. And thus the bat dies.

While I was at it I also moved the call to dev_alloc_skb() above the
assignment to 'unicast_packet' since there's no reason to do that
assignment if the memory allocation fails.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>

mac80211: use maximum number of AMPDU frames as default in BA RX · 82694f76

Authored by Luciano Coelho on Jan 12, 2011

When the buffer size is set to zero in the block ack parameter set
field, we should use the maximum supported number of subframes.  The
existing code was bogus and was doing some unnecessary calculations
that led to wrong values.

Thanks Johannes for helping me figure this one out.

Cc: stable@kernel.org
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Luciano Coelho <coelho@ti.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
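
A minimal sketch of the defaulting rule described in this message. The mask, shift and maximum below are assumptions modelled on the 802.11n Block Ack Parameter Set layout (buffer size in bits 6-15, at most 64 subframes), and `ba_rx_buf_size` is a hypothetical helper rather than the mac80211 function itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the Block Ack Parameter Set field. */
#define ADDBA_PARAM_BUF_SIZE_MASK  0xFFC0u
#define ADDBA_PARAM_BUF_SIZE_SHIFT 6
/* Assumed maximum number of subframes in one A-MPDU. */
#define MAX_AMPDU_SUBFRAMES        64u

/* Extract the requested buffer size; zero means "use the maximum". */
static uint16_t ba_rx_buf_size(uint16_t param_set)
{
    uint16_t buf_size = (param_set & ADDBA_PARAM_BUF_SIZE_MASK)
                        >> ADDBA_PARAM_BUF_SIZE_SHIFT;

    return buf_size == 0 ? MAX_AMPDU_SUBFRAMES : buf_size;
}
```

A parameter set whose buffer-size bits are all zero yields 64 here, while a non-zero request such as 8 is passed through unchanged.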

mac80211: fix lockdep warning · 681c4d07

Authored by Johannes Berg on Jan 12, 2011

Since the introduction of the fixes for the
reorder timer, mac80211 will cause lockdep
warnings because lockdep confuses
local->skb_queue and local->rx_skb_queue
and treats their lock as the same.

However, their locks are different, and are
valid in different contexts (the former is
used in IRQ context, the latter in BH only)
and the only thing to be done is mark the
former as a different lock class so that
lockdep can tell the difference.
Reported-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Sujith <m.sujith@gmail.com>
Reported-by: Miles Lane <miles.lane@gmail.com>
Tested-by: Sujith <m.sujith@gmail.com>
Tested-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>

netfilter: ctnetlink: fix loop in ctnetlink_get_conntrack() · f31e8d49

Authored by Pablo Neira Ayuso on Jan 13, 2011

This patch fixes a loop in ctnetlink_get_conntrack() that can be
triggered if you use the same socket to receive events and to
perform a GET operation. Under heavy load, netlink_unicast()
may return -EAGAIN, but this error code is reserved in nfnetlink for
module load-on-demand. Instead, we return -ENOBUFS, which is
the appropriate error code that has to be propagated to
user-space.
Reported-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
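
In sketch form, the error translation described in this message amounts to the following. `map_unicast_error` is a hypothetical name used only for illustration; the real change lives inside ctnetlink_get_conntrack().

```c
#include <assert.h>
#include <errno.h>

/*
 * -EAGAIN has a special meaning for the caller (retry after loading a
 * module), so a transient delivery failure is reported as -ENOBUFS.
 * All other values pass through untouched.
 */
static int map_unicast_error(int err)
{
    return err == -EAGAIN ? -ENOBUFS : err;
}
```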

13 Jan, 2011 3 commits

eth: fix new kernel-doc warning · 3806b4f3

Authored by Randy Dunlap on Jan 12, 2011

Fix new kernel-doc warning (copy-paste typo):

Warning(net/ethernet/eth.c:366): No description found for parameter 'rxqs'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet6: prevent network storms caused by linux IPv6 routers · 72b43d08

Authored by Alexey Kuznetsov on Jan 12, 2011

Linux IPv6 forwards unicast packets that are link-layer multicasts...
The hole has been present since day one. I was 100% sure this check was there, but it is not.

The problem shows itself, e.g., when Microsoft Network Load Balancer runs on a network.
This software resolves IPv6 unicast addresses to multicast MAC addresses.
Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: fix compilation when conntrack is disabled but tproxy is enabled · 2fc72c7b

Authored by KOVACS Krisztian on Jan 12, 2011

The IPv6 tproxy patches split IPv6 defragmentation off of conntrack, but
failed to update the #ifdef stanzas guarding the defragmentation related
fields and code in skbuff and conntrack related code in nf_defrag_ipv6.c.

This patch adds the required #ifdefs so that IPv6 tproxy can truly be used
without connection tracking.

Original report:
http://marc.info/?l=linux-netdev&m=129010118516341&w=2
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

12 Jan, 2011 5 commits

net: ax25: fix information leak to userland harder · 5b919f83

Authored by Kees Cook on Jan 12, 2011

Commit fe10ae53 adds a memset() to clear
the structure being sent back to userspace, but accidentally used the
wrong size.
Reported-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: fix race in conntrack between dump_table and destroy · 13ee6ac5

Authored by Stephen Hemminger on Jan 11, 2011

The netlink interface to dump the connection tracking table has a race
when entries are deleted at the same time. A customer reported a crash
and the backtrace showed that ctnetlink_dump_table was running while a
conntrack entry was being destroyed.
(see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).

According to RCU documentation, when using hlist_nulls the reader
must handle the case of seeing a deleted entry and not proceed
further down the linked list.  The old code would continue
which caused the scan to walk into the free list.

This patch uses locking (rather than RCU) for this operation, which
is guaranteed safe and no longer requires taking a reference during
the dump operation.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

ah: reload pointers to skb data after calling skb_cow_data() · 4b0ef1f2

Authored by Dang Hongwu on Jan 11, 2011

skb_cow_data() may allocate a new data buffer, so pointers on
skb should be set after this function.

Bug was introduced by commit dff3bb06 ("ah4: convert to ahash")
and 8631e9bd ("ah6: convert to ahash").
Signed-off-by: Wang Xuefu <xuefu.wang@6wind.com>
Acked-by: Krzysztof Witek <krzysztof.witek@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: check trunc_len in XFRMA_ALG_AUTH_TRUNC · fa6dd8a2

Authored by Nicolas Dichtel on Jan 11, 2011

The maximum trunc length is defined by MAX_AH_AUTH_LEN (in bytes)
and needs to be checked when this value is set (in bits) by
the user. In ah4.c and ah6.c a BUG_ON() checks this condition.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
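
The unit mismatch described in this message (limit in bytes, user value in bits) can be sketched as below. `check_trunc_len` is a hypothetical helper, and the limit of 64 bytes is an assumption chosen for illustration rather than a value taken from this commit.

```c
#include <assert.h>
#include <errno.h>

/* Assumed upper bound on the truncated ICV length, in bytes. */
#define MAX_AUTH_LEN_BYTES 64u

/* The user supplies the truncation length in bits; validate it in bytes. */
static int check_trunc_len(unsigned int trunc_len_bits)
{
    if (trunc_len_bits / 8 > MAX_AUTH_LEN_BYTES)
        return -EINVAL;
    return 0;
}
```

Rejecting the value when it is set means the later BUG_ON() in the AH code can no longer be reached through an oversized user request.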

tcp: disallow bind() to reuse addr/port · c191a836

Authored by Eric Dumazet on Jan 11, 2011

inet_csk_bind_conflict() logic currently disallows a bind() if
it finds a friend socket (a socket bound on same address/port)
satisfying a set of conditions :

1) Current (to be bound) socket doesn't have sk_reuse set
OR
2) other socket doesn't have sk_reuse set
OR
3) other socket is in LISTEN state

We should add the CLOSE state in the 3) condition, in order to avoid two
REUSEADDR sockets in CLOSE state with same local address/port, since
this can deny further operations.

Note : a prior patch tried to address the problem in a different (and
buggy) way. (commit fda48a0d tcp: bind() fix when many ports
are bound).
Reported-by: Gaspar Chilingarov <gasparch@gmail.com>
Reported-by: Daniel Baluta <daniel.baluta@gmail.com>
Tested-by: Daniel Baluta <daniel.baluta@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
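
The three conditions listed in this message, plus the added CLOSE case, can be written as a single predicate. This is a simplified sketch: the enum, the function name and its parameters are hypothetical, and the two sockets are assumed to be already bound to the same address and port.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of socket states, for illustration only. */
enum sock_state { ST_ESTABLISHED, ST_LISTEN, ST_CLOSE };

/*
 * Returns true when bind() must be refused because of the other
 * socket sharing the same local address/port.
 */
static bool bind_conflict(bool sk_reuse, bool other_reuse,
                          enum sock_state other_state)
{
    return !sk_reuse ||
           !other_reuse ||
           other_state == ST_LISTEN ||
           other_state == ST_CLOSE;   /* the case added by this patch */
}
```

With this predicate, two REUSEADDR sockets can no longer both sit in CLOSE state on one address/port pair, while a REUSEADDR socket may still share the pair with an established connection.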

11 Jan, 2011 7 commits

CAIF: Fix IPv6 support in receive path for GPRS/3G · d7b92aff

Authored by Kumar Sanghvi on Jan 07, 2011

Checks version field of IP in the receive path for GPRS/3G data
and appropriately sets the value of skb->protocol.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
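
A rough sketch of the version check this message describes: the IP version is the upper nibble of the first header byte. `ip_proto_from_header` is a hypothetical helper; the EtherType constants carry their standard values, and a real driver would additionally store the result in network byte order in skb->protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP   0x0800  /* IPv4 EtherType */
#define ETH_P_IPV6 0x86DD  /* IPv6 EtherType */

/* Returns the EtherType for the payload, or -1 if it is not IPv4/IPv6. */
static int ip_proto_from_header(const uint8_t *data, size_t len)
{
    if (len < 1)
        return -1;

    switch (data[0] >> 4) {      /* version field */
    case 4:
        return ETH_P_IP;
    case 6:
        return ETH_P_IPV6;
    default:
        return -1;
    }
}
```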

arp: allow to invalidate specific ARP entries · 545ecdc3

Authored by Maxim Levitsky on Jan 08, 2011

IPv4 over firewire needs to be able to remove ARP entries
from the ARP cache that belong to nodes that are removed, because
IPv4 over firewire uses ARP packets for private information
about nodes.

This information becomes invalid as soon as a node drops
off the bus, and when it reconnects, it's only possible
to start talking to it after it has responded to an ARP packet.
But the ARP cache prevents such packets from being sent.
Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: factorize qdisc stats handling · bfe0d029

Authored by Eric Dumazet on Jan 09, 2011

HTB takes into account that an skb is segmented when updating stats.
Generalize this to all schedulers.

They should use qdisc_bstats_update() helper instead of manipulating
bstats.bytes and bstats.packets

Add bstats_update() helper too for classes that use
gnet_stats_basic_packed fields.

Note : Right now, the TCQ_F_CAN_BYPASS shortcut can be taken only if no
stab is set up on the qdisc.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Add alloc_netdev_mqs function · 36909ea4

Authored by Tom Herbert on Jan 09, 2011

Added the alloc_netdev_mqs function, which allows the number of transmit and
receive queues to be specified independently.  alloc_netdev_mq was
changed to a macro that calls the new function.  Also added
alloc_etherdev_mqs with the same purpose.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: don't set connection request param size before copying data · 91b5c98c

Authored by Dan Rosenberg on Jan 10, 2011

The size field should not be set until after the data is successfully
copied in.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phonet: some signedness bugs · facb4edc

Authored by Dan Carpenter on Jan 10, 2011

Dan Rosenberg pointed out that there were some signed comparison bugs
in the phonet protocol.

http://marc.info/?l=full-disclosure&m=129424528425330&w=2

The problem is that we check for array overflows but "protocol" is
signed and we don't check for array underflows.  If you already
have CAP_SYS_ADMIN then you could use the bugs to get root, or someone
could cause an oops by mistake.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
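
The class of bug described in this message can be illustrated with a small table lookup. Everything here is hypothetical (table, size, function names) and is not the phonet code; it only shows why an upper-bound check alone is insufficient for a signed index.

```c
#include <assert.h>
#include <stddef.h>

#define PROTO_MAX 4  /* hypothetical table size */

static const char *const proto_table[PROTO_MAX] = { "a", "b", "c", "d" };

/* Buggy check: a negative "protocol" passes the upper-bound test. */
static int index_ok_signed(int protocol)
{
    return protocol < PROTO_MAX;
}

/* Fixed check: as unsigned, a negative value wraps to a huge index. */
static int index_ok_unsigned(unsigned int protocol)
{
    return protocol < PROTO_MAX;
}

/* Lookup that relies on the unsigned comparison. */
static const char *proto_lookup(unsigned int protocol)
{
    if (protocol >= PROTO_MAX)
        return NULL;
    return proto_table[protocol];
}
```

Making the index unsigned lets the existing overflow check cover underflow as well, without adding a second comparison.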

netfilter: x_tables: dont block BH while reading counters · 83723d60

Authored by Eric Dumazet on Jan 10, 2011

Using "iptables -L" with a lot of rules causes excessive BH latency.
Jesper mentioned ~6 ms and worried about frame drops.

Switch to a per_cpu seqlock scheme, so that taking a snapshot of
counters doesn't need to block BH (for this cpu, but also other cpus).

This adds two increments on the seqlock sequence per ipt_do_table() call;
it's a reasonable cost for allowing "iptables -L" to not block BH
processing.
Reported-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

10 Jan, 2011 8 commits

net offloading: Convert checksums to use centrally computed features. · 03634668

Authored by Jesse Gross on Jan 09, 2011

In order to compute the features for other offloads (primarily
scatter/gather), we need to first check the ability of the NIC to
offload the checksum for the packet.  Since we have already computed
this, we can directly use the result instead of figuring it out
again.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net offloading: Convert skb_need_linearize() to use precomputed features. · 02932ce9

Authored by Jesse Gross on Jan 09, 2011

This switches skb_need_linearize() to use the features that have
been centrally computed.  In doing so, this fixes a problem where
scatter/gather should not be used because the card does not support
checksum offloading on that type of packet.  On device registration
we only check that some form of checksum offloading is available if
scatter/gather is enabled but we must also check at transmission
time.  Examples of this include IPv6 or vlan packets on a NIC that
only supports IPv4 offloading.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net offloading: Convert dev_gso_segment() to use precomputed features. · 91ecb63c

Authored by Jesse Gross on Jan 09, 2011

This switches dev_gso_segment() to use the device features computed
by the centralized routine.  In doing so, it fixes a problem where
it would always use dev->features, instead of those appropriate
to the number of vlan tags if any are present.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net offloading: Pass features into netif_needs_gso(). · fc741216

Authored by Jesse Gross on Jan 09, 2011

Now that there is a single function that can compute the device
features relevant to a packet, we don't want to run it for each
offload.  This converts netif_needs_gso() to take the features
of the device, rather than computing them itself.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net offloading: Generalize netif_get_vlan_features(). · f01a5236

Authored by Jesse Gross on Jan 09, 2011

netif_get_vlan_features() is currently only used by netif_needs_gso(),
so it only concerns itself with GSO features.  However, several other
places also should take into account the contents of the packet when
deciding whether to offload to hardware.  This generalizes the function
to return features about all of the various forms of offloading.  Since
offloads tend to be linked together, this avoids duplicating the logic
in each location (i.e. the scatter/gather code also needs the checksum
logic).
Suggested-by: Michał Mirosław <mirqus@gmail.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net offloading: Accept NETIF_F_HW_CSUM for all protocols. · 9497a051

Authored by Jesse Gross on Jan 09, 2011

We currently only have software fallback for one type of checksum: the
TCP/UDP one's complement. This means that a protocol that uses hardware
offloading for a different type of checksum (FCoE, SCTP) must directly
check the device's features and do the right thing ahead of time. By
the time we get to dev_can_checksum(), we're only deciding whether to
apply the one algorithm in software or hardware. NETIF_F_HW_CSUM has the
same capabilities as the software version, so we should always use it if
present. The primary advantage of this is multiply tagged vlans can use
hardware checksumming.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix kernel-doc warning in core/filter.c · 697d0e33

Authored by Randy Dunlap on Jan 08, 2011

Fix new kernel-doc notation warning in net/core/filter.c:

Warning(net/core/filter.c:172): No description found for parameter 'fentry'
Warning(net/core/filter.c:172): Excess function parameter 'filter' description in 'sk_run_filter'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: test for all flags of the NLM_F_DUMP composite · 0ab03c2b

Authored by Jan Engelhardt on Jan 07, 2011

Because NLM_F_DUMP is composed of two bits, NLM_F_ROOT | NLM_F_MATCH,
doing "if (x & NLM_F_DUMP)" tests for _either_ of the bits
being set. Because NLM_F_MATCH's value overlaps with NLM_F_EXCL,
non-dump requests with NLM_F_EXCL set are mistaken as dump requests.

Substitute the condition to test for _all_ bits being set.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
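
The difference between the two tests in this message can be shown with the flag values from <linux/netlink.h>, redefined here so the sketch stands alone. The two function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NLM_F_ROOT  0x100
#define NLM_F_MATCH 0x200
#define NLM_F_DUMP  (NLM_F_ROOT | NLM_F_MATCH)
#define NLM_F_EXCL  0x200  /* shares its value with NLM_F_MATCH */

/* Old test: true as soon as either bit of the composite is set. */
static bool is_dump_any_bit(unsigned int flags)
{
    return (flags & NLM_F_DUMP) != 0;
}

/* Fixed test: true only when both bits of the composite are set. */
static bool is_dump_all_bits(unsigned int flags)
{
    return (flags & NLM_F_DUMP) == NLM_F_DUMP;
}
```

A request carrying only NLM_F_EXCL satisfies the old test but not the fixed one, which is the misclassification the patch removes.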

07 Jan, 2011 9 commits

dccp: make upper bound for seq_window consistent on 32/64 bit · bfbb2346

Authored by Gerrit Renker on Jan 02, 2011

The 'seq_window' sysctl sets the initial value for the DCCP Sequence Window,
which may range from 32..2^46-1 (RFC 4340, 7.5.2). The patch sets the upper
bound consistently to 2^32-1 on both 32 and 64 bit systems, which should be
sufficient - with an RTT of 1 sec and 1-byte packets, a seq_window of 2^32-1
corresponds to a link speed of 34 Gbps.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
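
The 34 Gbps figure in this message can be reproduced with a few lines of arithmetic, assuming one full window of packets is sent per round trip. `window_to_gbps` is a hypothetical helper written for this check.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Link speed, in gigabits per second, needed to send seq_window packets
 * of pkt_bytes each within one round-trip time of rtt_s seconds.
 */
static double window_to_gbps(uint64_t seq_window, double rtt_s,
                             unsigned int pkt_bytes)
{
    return (double)seq_window * pkt_bytes * 8.0 / rtt_s / 1e9;
}
```

For a window of 2^32-1, an RTT of 1 second and 1-byte packets this gives (2^32-1) * 8 bits per second, roughly 34.4 Gbps; larger packets or shorter RTTs only raise the figure.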

dccp: fix bug in updating the GSR · 763dadd4

Authored by Samuel Jero on Dec 30, 2010

Currently dccp_check_seqno allows any valid packet to update the Greatest
Sequence Number Received, even if that packet's sequence number is less than
the current GSR. This patch adds a check to make sure that the new packet's
sequence number is greater than GSR.
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
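
A simplified sketch of the added check. DCCP sequence numbers are 48 bits wide and wrap around, so "greater than" is taken modulo 2^48 here; `seq48_after` and `update_gsr` are hypothetical helpers, not the kernel's own functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* True if a is strictly ahead of b in 48-bit circular arithmetic. */
static bool seq48_after(uint64_t a, uint64_t b)
{
    uint64_t delta = (a - b) & SEQ48_MASK;

    return delta != 0 && delta < (UINT64_C(1) << 47);
}

/* Only let a valid packet move GSR forwards, never backwards. */
static uint64_t update_gsr(uint64_t gsr, uint64_t seqno)
{
    return seq48_after(seqno, gsr) ? seqno : gsr;
}
```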

dccp: fix return value for sequence-invalid packets · 2cf5be93

Authored by Samuel Jero on Dec 30, 2010

Currently dccp_check_seqno returns 0 (indicating a valid packet) if the
acknowledgment number is out of bounds and the sync that RFC 4340 mandates at
this point is currently being rate-limited. This function should return -1,
indicating an invalid packet.
Signed-off-by: Samuel Jero <sj323707@ohio.edu>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

fs: scale mntget/mntput · b3e19d92

Authored by Nick Piggin on Jan 07, 2011

The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
and these lookups often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only take the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
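
The third scheme in this message (per-CPU differences plus a single "long" count) can be sketched in a single-threaded model. The struct, the CPU count and the function names are hypothetical, and the sketch deliberately omits the per-CPU primitives and the locking that a real implementation needs.

```c
#include <assert.h>

#define NR_CPUS 4  /* hypothetical number of CPUs */

struct mnt_ref {
    int longrefs;               /* slow, long-lasting references */
    int percpu_delta[NR_CPUS];  /* increments minus decrements, per CPU */
};

/* Fast path: a short reference only touches the local counter. */
static void ref_get(struct mnt_ref *m, int cpu)
{
    m->percpu_delta[cpu]++;
}

/* Individual deltas may be negative; only their sum is meaningful. */
static int ref_sum(const struct mnt_ref *m)
{
    int cpu, sum = 0;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        sum += m->percpu_delta[cpu];
    return sum;
}

/* Returns 1 when the last reference is gone and the object may be freed. */
static int ref_put(struct mnt_ref *m, int cpu)
{
    m->percpu_delta[cpu]--;
    if (m->longrefs != 0)
        return 0;               /* common case: no global sum needed */
    return ref_sum(m) == 0;
}
```

While a long reference exists, a put never walks all CPUs; the global sum is only taken once the long count has dropped to zero. A get and its matching put may also run on different CPUs, since only the total matters.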

fs: improve scalability of pseudo filesystems · 4b936885

Authored by Nick Piggin on Jan 07, 2011

Regardless of how much we possibly try to scale dcache, there is likely
always going to be some fundamental contention when adding or removing children
under the same parent. Pseudo filesystems do not seem to need connected
dentries because by definition they are disconnected.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: dcache reduce branches in lookup path · fb045adb

Authored by Nick Piggin on Jan 07, 2011

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: avoid inode RCU freeing for pseudo fs · ff0c7d15

Authored by Nick Piggin on Jan 07, 2011

Pseudo filesystems that don't put inodes on an RCU list or make them reachable
by rcu-walk dentries do not need to RCU free their inodes.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: icache RCU free inodes · fa0d7e3d

Authored by Nick Piggin on Jan 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (i.e. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fs: change d_delete semantics · fe15ce44

Authored by Nick Piggin on Jan 07, 2011

Change d_delete from a dentry deletion notification to a dentry caching
advice, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

openanolis / cloud-kernel: synced successfully almost 2 years ago
