Commits · 9bd1966344bf975b5ce65e80fd6bacc41b4325a8 · openeuler / Kernel

22 Mar, 2012 · 25 commits

libceph: rename "page_shift" variable to something sensible · 9bd19663

Committed by Alex Elder on Mar 07, 2012

In write_partial_msg_pages() there is a local variable used to
track the starting offset within a bio segment to use.  Its name,
"page_shift", defies the Linux convention of using that name for
log-base-2(page size).

Since it's only used in the bio case, rename it "bio_offset".  Use it
along with the page_pos field to compute the memory offset when
computing CRCs in that function.  This makes the bio case match the
others more closely.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

9bd19663

libceph: get rid of zero_page_address · 0cdf9e60

Committed by Alex Elder on Mar 07, 2012

There's not a lot of benefit to zero_page_address, which basically
holds a mapping of the zero page through the life of the messenger
module.  Even with our own mapping, the sendpage interface where
it's used may need to kmap() it again.  It's almost certain to
be in low memory anyway.

So stop treating the zero page specially in write_partial_msg_pages()
and just get rid of zero_page_address entirely.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

0cdf9e60

libceph: only call kernel_sendpage() via helper · e36b13cc

Committed by Alex Elder on Mar 07, 2012

Make ceph_tcp_sendpage() be the only place kernel_sendpage() is
used, by using this helper in write_partial_msg_pages().
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

e36b13cc

libceph: use kernel_sendpage() for sending zeroes · 31739139

Committed by Alex Elder on Mar 07, 2012

If a message queued for send gets revoked, zeroes are sent over the
wire instead of any unsent data.  This is done by constructing a
message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg().

Since we are already working with a page in this case we can use
the sendpage interface instead.  Create a new ceph_tcp_sendpage()
helper that sets up flags to match the way ceph_tcp_sendmsg()
does now.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

31739139

libceph: fix inverted crc option logic · 37675b0f

Committed by Alex Elder on Mar 07, 2012

CRCs are computed for all messages between ceph entities. The CRC
computation for the data portion of a message can optionally be
disabled using the "nocrc" (common) ceph option. The default is
for CRC computation for the data portion to be enabled.

Unfortunately, the code that implements this feature interprets the
feature flag incorrectly, meaning that by default the CRCs have *not*
been computed (or checked) for the data portion of messages unless
the "nocrc" option was supplied.

Fix this, in write_partial_msg_pages() and read_partial_message().
Also change the flag variable in write_partial_msg_pages() to be
"no_datacrc" to match the usage elsewhere in the file.

This fixes http://tracker.newdream.net/issues/2064
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>
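
The inverted test described in this commit can be illustrated with a small userspace sketch. The flag name `DEMO_OPT_NOCRC` and both helpers are hypothetical and are not taken from the libceph source; the only point is that the data CRC should be computed when the "nocrc" flag is clear.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bit standing in for the "nocrc" ceph option. */
#define DEMO_OPT_NOCRC (1u << 0)

/* Inverted sense: computes the data CRC only when "nocrc" was supplied. */
static bool demo_do_datacrc_inverted(unsigned int flags)
{
    return (flags & DEMO_OPT_NOCRC) != 0;
}

/* Intended sense: the data CRC is on by default and "nocrc" turns it off. */
static bool demo_do_datacrc_fixed(unsigned int flags)
{
    return (flags & DEMO_OPT_NOCRC) == 0;
}

/* Usage: with default options only the fixed helper asks for a CRC. */
static void demo_datacrc_example(void)
{
    assert(demo_do_datacrc_fixed(0));
    assert(!demo_do_datacrc_inverted(0));
}
```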

37675b0f

libceph: some simple changes · 84495f49

Committed by Alex Elder on Feb 15, 2012

Nothing too big here.
    - define the size of the buffer used for consuming ignored
      incoming data using a symbolic constant
    - simplify the condition determining whether to unmap the page
      in write_partial_msg_pages(): do it for crc but not if the
      page is the zero page
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

84495f49

libceph: small refactor in write_partial_kvec() · f42299e6

Committed by Alex Elder on Feb 15, 2012

Make a small change in the code that counts down kvecs consumed by
a ceph_tcp_sendmsg() call.  Same functionality, just blocked out
a little differently.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

f42299e6

libceph: do crc calculations outside loop · fe3ad593

Committed by Alex Elder on Feb 15, 2012

Move blocks of code out of loops in read_partial_message_section()
and read_partial_message().  They were only getting called at
the end of the last iteration of the loop anyway.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

fe3ad593

libceph: separate CRC calculation from byte swapping · a9a0c51a

Committed by Alex Elder on Feb 15, 2012

Calculate CRC in a separate step from rearranging the byte order
of the result, to improve clarity and readability.

Use offsetof() to determine the number of bytes to include in the
CRC calculation.

In read_partial_message(), switch which value gets byte-swapped,
since the just-computed CRC is already likely to be in a register.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
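
The two ideas in this commit (sizing the CRC input with offsetof(), and keeping the CRC computation separate from the byte-order conversion) can be sketched in userspace C. The header layout, the `demo_` names, and the bitwise CRC-32C routine are all assumptions made for illustration; little-endian wire order is assumed as well. This is not the libceph code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header: the CRC field comes last so it can be excluded. */
struct demo_hdr {
    uint64_t seq;
    uint16_t type;
    uint16_t flags;
    uint32_t crc;   /* covers every byte that precedes this field */
};

/* Plain bitwise CRC-32C (reflected Castagnoli polynomial 0x82F63B78). */
static uint32_t demo_crc32c(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Step 1: compute the CRC, in host order, over the bytes before "crc". */
static uint32_t demo_hdr_crc(const struct demo_hdr *hdr)
{
    /* The crc field must be the final member with no tail padding. */
    assert(offsetof(struct demo_hdr, crc) + sizeof(hdr->crc) == sizeof(*hdr));
    return demo_crc32c(0, hdr, offsetof(struct demo_hdr, crc));
}

/* Step 2: a separate conversion of the result to little-endian bytes. */
static void demo_put_le32(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}
```

Because the length comes from offsetof(), the stored crc field never feeds into its own computation, whatever value it currently holds.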

a9a0c51a

libceph: use "do" in CRC-related Boolean variables · bca064d2

Committed by Alex Elder on Feb 15, 2012

Change the name (and type) of a few CRC-related Boolean local
variables so they contain the word "do", to distinguish their purpose
from variables used for holding an actual CRC value.

Note that in the process of doing this I identified a fairly serious
logic error in write_partial_msg_pages(): the value of "do_crc"
assigned appears to be the opposite of what it should be. No
attempt to fix this is made here; this change preserves the
erroneous behavior. The problem I found is documented here:
http://tracker.newdream.net/issues/2064
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

bca064d2

ceph: ensure Boolean options support both senses · cffaba15

Committed by Alex Elder on Feb 15, 2012

Many ceph-related Boolean options offer the ability to both enable
and disable a feature.  For all those that don't offer this, add
a new option so that they do.

Note that ceph_show_options()--which reports mount options currently
in effect--only reports the option if it is different from the
default value.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

cffaba15

libceph: a few small changes · d3002b97

Committed by Alex Elder on Feb 14, 2012

This gathers a number of very minor changes:
    - use %hu when formatting a socket address's address family
    - null out the ceph_msgr_wq pointer after the queue has been
      destroyed
    - drop a needless cast in ceph_write_space()
    - add a WARN() call in ceph_state_change() in the event an
      unrecognized socket state is encountered
    - rearrange the logic in ceph_con_get() and ceph_con_put() so
      that:
        - the reference counts are only atomically read once
        - the values displayed via dout() calls are known to
          be meaningful at the time they are formatted
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

d3002b97

libceph: make ceph_tcp_connect() return int · 41617d0c

Committed by Alex Elder on Feb 14, 2012

There is no real need for ceph_tcp_connect() to return the socket
pointer it creates, since it already assigns it to con->sock, which
is visible to the caller.  Instead, have it return an error code,
which tidies things up a bit.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

41617d0c

libceph: encapsulate some messenger cleanup code · 6173d1f0

Committed by Alex Elder on Feb 14, 2012

Define a helper function to perform various cleanup operations.  Use
it both in the exit routine and in the init routine in the event of
an error.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
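
The shape of this refactoring, one cleanup helper shared by the init error path and the exit routine, might look like the sketch below. The two "resources" are plain flags and every name is hypothetical; the real messenger releases other things.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for resources the module sets up at init time. */
static bool demo_have_cache;
static bool demo_have_wq;

/* Single place that knows how to undo everything init may have done. */
static void demo_msgr_cleanup(void)
{
    demo_have_wq = false;
    demo_have_cache = false;
}

/* Init: on a mid-way failure, reuse the helper instead of ad hoc unwinding. */
static int demo_msgr_init(bool wq_fails)
{
    assert(!demo_have_cache && !demo_have_wq);  /* must not run twice */

    demo_have_cache = true;
    if (wq_fails) {
        demo_msgr_cleanup();
        return -1;
    }
    demo_have_wq = true;
    return 0;
}

/* Exit: the same helper tears down a fully initialized module. */
static void demo_msgr_exit(void)
{
    demo_msgr_cleanup();
}
```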

6173d1f0

libceph: make ceph_msgr_wq private · e0f43c94

Committed by Alex Elder on Feb 14, 2012

The messenger workqueue has no need to be public.  So give it static
scope.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

e0f43c94

libceph: encapsulate connection kvec operations · 859eb799

Committed by Alex Elder on Feb 14, 2012

Encapsulate the operation of adding a new chunk of data to the next
open slot in a ceph_connection's out_kvec array.  Also add a "reset"
operation to make subsequent add operations start at the beginning
of the array again.

Use these routines throughout, avoiding duplicate code and ensuring
all calls are handled consistently.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
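
A minimal userspace sketch of the "add" and "reset" operations described above follows. The struct layout, the array size, and the `demo_` names are assumptions for illustration and do not reproduce the ceph_connection definition.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_KVEC 8   /* hypothetical number of slots */

/* Userspace stand-in for the kernel's struct kvec. */
struct demo_kvec {
    void *iov_base;
    size_t iov_len;
};

struct demo_conn {
    struct demo_kvec out_kvec[DEMO_MAX_KVEC];
    int out_kvec_left;       /* slots filled and waiting to be sent */
    size_t out_kvec_bytes;   /* total bytes described by those slots */
};

/* Make subsequent add operations start at the beginning of the array. */
static void demo_out_kvec_reset(struct demo_conn *con)
{
    con->out_kvec_left = 0;
    con->out_kvec_bytes = 0;
}

/* Append one chunk of data to the next open slot. */
static void demo_out_kvec_add(struct demo_conn *con, size_t size, void *data)
{
    int index = con->out_kvec_left;

    assert(index < DEMO_MAX_KVEC);   /* caller must not overfill the array */

    con->out_kvec[index].iov_len = size;
    con->out_kvec[index].iov_base = data;
    con->out_kvec_left++;
    con->out_kvec_bytes += size;
}
```

Keeping the slot count and byte total inside the two helpers means no call site can update one without the other.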

859eb799

libceph: move prepare_write_banner() · 963be4d7

Committed by Alex Elder on Feb 14, 2012

One of the arguments to prepare_write_connect() indicates whether it
is being called immediately after a call to prepare_write_banner().
Move the prepare_write_banner() call inside prepare_write_connect(),
and reinterpret (and rename) the "after_banner" argument so it
indicates that prepare_write_connect() should *make* the call
rather than should know it has already been made.

This was split out from the next patch to highlight this change in
logic.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

963be4d7

rbd: make ceph_parse_options() return a pointer · ee57741c

Committed by Alex Elder on Jan 24, 2012

ceph_parse_options() takes the address of a pointer as an argument
and uses it to return the address of an allocated structure if
successful.  With this interface it is not evident at call sites that
the pointer is always initialized.  Change the interface to return
the address instead (or a pointer-coded error code) to make the
validity of the returned pointer obvious.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
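
The "pointer-coded error code" idea can be sketched in userspace as follows. The helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention but are written from scratch here; the `demo_` names, the 4095 limit, and the parsing rule are assumptions, and the integer-to-pointer casts rely on a flat address space as found on mainstream platforms.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095   /* errors live in the top 4095 address values */

/* Encode a negative errno value as a pointer. */
static void *demo_err_ptr(long err)
{
    assert(err < 0 && err >= -DEMO_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True when the pointer is really an encoded error. */
static int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_options {
    int flags;
};

/* Returns either a valid allocation or an encoded error, never garbage. */
static struct demo_options *demo_parse_options(const char *s)
{
    struct demo_options *opt;

    if (!s || !*s)
        return demo_err_ptr(-EINVAL);

    opt = calloc(1, sizeof(*opt));
    if (!opt)
        return demo_err_ptr(-ENOMEM);
    return opt;
}
```

A call site then has a single value to inspect: it tests the result with `demo_is_err()` and either uses the pointer or extracts the error.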

ee57741c

ceph: eliminate some abusive casts · 99f0f3b2

Committed by Alex Elder on Jan 23, 2012

This fixes some spots where a type cast to (void *) was used
as a universal type hiding mechanism.  Instead, properly cast the
type to the intended target type.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

99f0f3b2

ceph: eliminate some needless casts · bd406145

Committed by Alex Elder on Jan 23, 2012

This eliminates type casts in some places where they are not
required.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

bd406145

ceph: kill addr_str_lock spinlock; use atomic instead · f64a9317

Committed by Alex Elder on Jan 23, 2012

A spinlock is used to protect a value used for selecting an array
index for a string used for formatting a socket address for human
consumption.  The index is reset to 0 if it ever reaches the maximum
index value.

Instead, use an ever-increasing atomic variable as a sequence
number, and compute the array index by masking off all but the
sequence number's lowest bits.  Make the number of entries in the
array a power of two to allow the use of such a mask (to avoid jumps
in the index value when the sequence number wraps).

The length of these strings is somewhat arbitrarily set at 60 bytes.
The worst-case length of a string produced is 54 bytes, for an IPv6
address that can't be shortened, e.g.:
    [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767
Change it so we arbitrarily use 64 bytes instead; if nothing else
it will make the array of these line up better in hex dumps.

Rename a few things to reinforce the distinction between the number
of strings in the array and the length of individual strings.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
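
The lock-free index selection described in this commit can be sketched with C11 atomics. The macro names, the choice of 32 entries, and the `demo_next_addr_str()` helper are assumptions for illustration; only the 64-byte string length comes from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_STR_COUNT_LOG   5                          /* assumed: 32 strings */
#define DEMO_STR_COUNT       (1 << DEMO_STR_COUNT_LOG)
#define DEMO_STR_COUNT_MASK  (DEMO_STR_COUNT - 1)
#define DEMO_STR_LEN         64                         /* per the commit text */

static char demo_addr_str[DEMO_STR_COUNT][DEMO_STR_LEN];
static atomic_uint demo_addr_str_seq;   /* ever-increasing sequence number */

/* Pick the next string buffer without taking a lock. */
static char *demo_next_addr_str(void)
{
    unsigned int seq;

    /* The mask only works when the entry count is a power of two. */
    assert((DEMO_STR_COUNT & DEMO_STR_COUNT_MASK) == 0);

    seq = atomic_fetch_add(&demo_addr_str_seq, 1);
    return demo_addr_str[seq & DEMO_STR_COUNT_MASK];
}
```

Because the unsigned counter wraps at a multiple of the power-of-two entry count, the masked index continues its cycle across the wrap instead of jumping.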

f64a9317

ceph: make use of "else" where appropriate · a5bc3129

Committed by Alex Elder on Jan 23, 2012

Rearrange ceph_tcp_connect() a bit, making use of "else" rather than
re-testing a value with consecutive "if" statements.  Don't record a
connection's socket pointer unless the connect operation is
successful.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

a5bc3129

ceph: use a shared zero page rather than one per messenger · 57666519

Committed by Alex Elder on Jan 23, 2012

Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition.  Instead,
use the kernel ZERO_PAGE() for that purpose.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

57666519

libceph: fix overflow check in crush_decode() · 64486697

Committed by Xi Wang on Feb 16, 2012

The existing overflow check (n > ULONG_MAX / b) didn't work, because
n = ULONG_MAX / b would both bypass the check and still overflow the
allocation size a + n * b.

The correct check should be (n > (ULONG_MAX - a) / b).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
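
The difference between the two checks can be shown with a short sketch. The function names are hypothetical; `a`, `n`, and `b` follow the commit message, where the allocation size is a + n * b.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Original test: only guards n * b and ignores the added term a. */
static bool demo_overflows_old(unsigned long a, unsigned long n,
                               unsigned long b)
{
    (void)a;            /* the old check never looked at a */
    assert(b != 0);
    return n > ULONG_MAX / b;
}

/* Corrected test from the commit message: guards a + n * b as a whole. */
static bool demo_overflows_new(unsigned long a, unsigned long n,
                               unsigned long b)
{
    assert(b != 0);
    return n > (ULONG_MAX - a) / b;
}
```

For example, with b = 8, a = 16 and n = ULONG_MAX / 8, the product n * b still fits, so the old test passes, but adding a wraps around; the corrected test rejects that input.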

64486697

net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer · 182fac26

Committed by Jim Schutt on Feb 29, 2012

The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket buffer was full.

Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the
same way that net/core/stream.c:sk_stream_write_space() does, i.e.,
clearing it only when sufficient space is available in the socket buffer.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@dreamhost.com>
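
The behaviour described here, clearing the "no space" flag and queueing write work only once enough buffer space is free, can be modelled in userspace. Everything below is a simplified assumption: the struct, the `demo_` names, and in particular the threshold (free space at least half of the queued bytes) are illustrative and are not copied from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the socket state the write-space callback looks at. */
struct demo_sock {
    int sndbuf;        /* send buffer limit, in bytes */
    int wmem_queued;   /* bytes currently queued for sending */
    bool nospace;      /* stands in for the SOCK_NOSPACE bit */
    int work_queued;   /* how many times write work was queued */
};

/* Assumed threshold: free space must be at least half the queued bytes. */
static bool demo_enough_space(const struct demo_sock *sk)
{
    int free_space = sk->sndbuf - sk->wmem_queued;

    return free_space >= sk->wmem_queued / 2;
}

/* Write-space callback: act only when the buffer has really drained. */
static void demo_write_space(struct demo_sock *sk)
{
    assert(sk);

    if (!demo_enough_space(sk))
        return;   /* leave nospace set and wait for a later callback */

    sk->nospace = false;
    sk->work_queued++;
}
```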

182fac26

17 Mar, 2012 · 2 commits

netfilter: ctnetlink: fix race between delete and timeout expiration · a16a1647

Committed by Pablo Neira Ayuso on Mar 16, 2012

Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.

After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.
Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a16a1647

ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu. · c5779237

Committed by RongQing.Li on Mar 15, 2012

ip6_mc_find_dev_rcu() is called with rcu_read_lock() held, so there
is no need to call dev_hold().
Calling dev_hold() without a corresponding dev_put() leads to a
reference leak.

[ bug introduced in 96b52e61 (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5779237

16 Mar, 2012 · 2 commits

sch_sfq: revert dont put new flow at the end of flows · cc34eb67

Committed by Eric Dumazet on Mar 13, 2012

This reverts commit d47a0ac7 (sch_sfq: dont put new flow at the end of
flows)

As Jesper found out, the patch sounded great but has bad side effects.

In stress situations, pushing new flows in front of the queue can prevent
old flows from making any progress. Packets can stay in the SFQ queue for
an unlimited amount of time.

It's possible to add heuristics to limit this problem, but this would
add complexity outside of SFQ scope.

A more sensible answer to Dave Taht's concerns (he reported the issue I
tried to solve in the original commit) is probably to use a qdisc hierarchy
so that high prio packets don't enter a potentially crowded SFQ qdisc.
Reported-by: Jesper Dangaard Brouer <jdb@comx.dk>
Cc: Dave Taht <dave.taht@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc34eb67

ipv6: fix icmp6_dst_alloc() · 122bdf67

Committed by Eric Dumazet on Mar 14, 2012

commit 87a11578 (ipv6: Move xfrm_lookup() call down into
icmp6_dst_alloc().) forgot to convert one error path, leading
to crashes in mld_sendpack().

Many thanks to Dave Jones for providing a very complete bug report.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

122bdf67

12 Mar, 2012 · 1 commit

tcp: fix syncookie regression · dfd25fff

Committed by Eric Dumazet on Mar 10, 2012

commit ea4fc0d6 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
added a serious regression on synflood handling.

Simon Kirby discovered a successful connection was delayed by 20 seconds
before being responsive.

In my tests, I discovered that xmit frames were lost, and needed ~4
retransmits and a socket dst rebuild before being really sent.

In the case of a syncookie-initiated connection, we use a different path
to initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.

As ip_queue_xmit() now depends on the inet flow being set up, fix this by
copying the temp flowi4 we use in cookie_v4_check().
Reported-by: Simon Kirby <sim@netnation.com>
Bisected-by: Simon Kirby <sim@netnation.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfd25fff

08 Mar, 2012 · 5 commits

route: Remove redirect_genid · ac3f48de

Committed by Steffen Klassert on Mar 06, 2012

As we invalidate the inetpeer tree along with the routing cache now,
we don't need a genid to reset the redirect handling when the routing
cache is flushed.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac3f48de

inetpeer: Invalidate the inetpeer tree along with the routing cache · 5faa5df1

Committed by Steffen Klassert on Mar 06, 2012

We initialize the routing metrics with the values cached on the
inetpeer in rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics.

To fix this issue, we replace the old tree with a freshly initialized
inet_peer_base. The old tree is removed later with a delayed work queue.
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5faa5df1

bridge: fix state reporting when port is disabled · 5200959b

Committed by Paulius Zaleckas on Mar 06, 2012

Now we have:
eth0: link *down*
br0: port 1(eth0) entered *forwarding* state

br_log_state(p) should be called *after* p->state is set
to BR_STATE_DISABLED.
Reported-by: Zilvinas Valinskas <zilvinas@wilibox.com>
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5200959b

bridge: br_log_state() s/entering/entered/ · d9e179ec

Committed by Paulius Zaleckas on Mar 06, 2012

When br_log_state() is reporting state it should say "entered"
instead of "entering" since state at this point is already
changed.
Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9e179ec

openvswitch: Fix checksum update for actions on UDP packets. · 81e5d41d

Committed by Jesse Gross on Mar 06, 2012

When modifying IP addresses or ports on a UDP packet we don't
correctly follow the rules for unchecksummed packets.  This meant
that packets without a checksum can be given an incorrect new checksum
and packets with a checksum can become marked as being unchecksummed.
This fixes it to handle those requirements.
Signed-off-by: Jesse Gross <jesse@nicira.com>
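
The two UDP rules at stake are that a checksum field of zero means "no checksum", and that a computed checksum of zero must be transmitted as all ones. A sketch of a port rewrite that respects both follows; it uses the RFC 1624 incremental update on 16-bit words, the `demo_` names are hypothetical, and it is not the openvswitch implementation. Byte-order handling is left out: all values are assumed to be in one consistent order.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 1624 incremental update: HC' = ~(~HC + ~m + m'). */
static uint16_t demo_csum_replace16(uint16_t check, uint16_t old_word,
                                    uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    sum = (sum & 0xffff) + (sum >> 16);   /* fold carries back in */
    sum = (sum & 0xffff) + (sum >> 16);
    assert(sum <= 0xffff);
    return (uint16_t)~sum;
}

/* Update a UDP checksum after one 16-bit field (e.g. a port) changed. */
static uint16_t demo_udp_update_check(uint16_t check, uint16_t old_word,
                                      uint16_t new_word)
{
    if (check == 0)
        return 0;   /* sender used no checksum: do not invent one */

    check = demo_csum_replace16(check, old_word, new_word);

    /* A result of 0 would read as "no checksum", so send all ones. */
    return check ? check : 0xffff;
}
```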

81e5d41d

07 Mar, 2012 · 5 commits

openvswitch: Honor dp_ifindex, when specified, for vport lookup by name. · 651a68ea

Committed by Ben Pfaff on Mar 06, 2012

When OVS_VPORT_ATTR_NAME is specified and dp_ifindex is nonzero, the
logical behavior would be for the vport name lookup scope to be limited
to the specified datapath, but in fact the dp_ifindex value was ignored.
This commit causes the search scope to be honored.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>

651a68ea

IPv6: Fix not join all-router mcast group when forwarding set. · d6ddef9e

Committed by Li Wei on Mar 05, 2012

When forwarding is set and a new net device is registered,
we need to add this device to the all-router mcast group.
Signed-off-by: Li Wei <lw@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d6ddef9e

netfilter: nf_conntrack: fix early_drop with reliable event delivery · 74138511

Committed by Pablo Neira Ayuso on Mar 06, 2012

If reliable event delivery is enabled and ctnetlink fails to deliver
the destroy event in early_drop, the conntrack subsystem cannot
drop the candidate flow that was planned to be evicted.
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

74138511

bridge: netfilter: don't call iptables on vlan packets if sysctl is off · 739e4505

Committed by Florian Westphal on Mar 06, 2012

When net.bridge.bridge-nf-filter-vlan-tagged is 0 (default), vlan packets
arriving should not be sent to ip(6)tables by bridge netfilter.

However, it turns out that we currently always send VLAN packets to
netfilter if either:
a) CONFIG_VLAN_8021Q is enabled; or
b) CONFIG_VLAN_8021Q is not set but rx vlan offload is enabled
   on the bridge port.

This is because bridge netfilter treats skb with
skb->protocol == ETH_P_IP{V6} as "non-vlan packet".

With rx vlan offload on or CONFIG_VLAN_8021Q=y, the vlan header has
already been removed here, and we cannot rely on skb->protocol alone.

Fix this by only using skb->protocol if the skb has no vlan tag,
or if a vlan tag is present and filter-vlan-tagged bridge netfilter
sysctl is enabled.

We cannot remove the skb->protocol == htons(ETH_P_8021Q) test
because the vlan tag is still around in the CONFIG_VLAN_8021Q=n &&
"ethtool -K $itf rxvlan off" case.

reproducer:
iptables -t raw -I PREROUTING -i br0
iptables -t raw -I PREROUTING -i br0.1

Then send packets to an ip address configured on br0.1 interface.
Even with net.bridge.bridge-nf-filter-vlan-tagged=0, the 1st rule
will match instead of the 2nd one.

With this patch applied, the 2nd rule will match instead.
In the non-local address case, netfilter won't be consulted after
this patch unless the sysctl is switched on.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
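
The rule this commit settles on (trust skb->protocol only when the skb carries no vlan tag, or when a tag is present and the filter-vlan-tagged sysctl is enabled) reduces to a small predicate. The function and parameter names below are hypothetical and the skb is replaced by plain booleans; this is a sketch of the decision, not of the bridge netfilter code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * protocol_is_ip:     skb->protocol says IPv4 or IPv6
 * has_vlan_tag:       the skb still carries an (offloaded) vlan tag
 * filter_vlan_tagged: state of the bridge-nf-filter-vlan-tagged sysctl
 */
static bool demo_send_to_iptables(bool protocol_is_ip, bool has_vlan_tag,
                                  bool filter_vlan_tagged)
{
    if (!protocol_is_ip)
        return false;

    /* The protocol field alone is trustworthy only in these two cases. */
    return !has_vlan_tag || filter_vlan_tagged;
}

/* Usage: a tagged IP packet is filtered only when the sysctl is on. */
static void demo_send_to_iptables_example(void)
{
    assert(demo_send_to_iptables(true, false, false));
    assert(!demo_send_to_iptables(true, true, false));
    assert(demo_send_to_iptables(true, true, true));
}
```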

739e4505

netfilter: bridge: fix wrong pointer dereference · a157b9d5

Committed by Pablo Neira Ayuso on Mar 06, 2012

In adf7ff8, an invalid dereference was added in ebt_make_names.

CC [M] net/bridge/netfilter/ebtables.o
net/bridge/netfilter/ebtables.c: In function `ebt_make_names':
net/bridge/netfilter/ebtables.c:1371:20: warning: `t' may be used uninitialized in this function [-Wuninitialized]
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
a157b9d5

openeuler / Kernel: synced successfully nearly 2 years ago
