1. 19 10月, 2012 2 次提交
  2. 17 10月, 2012 3 次提交
  3. 13 10月, 2012 5 次提交
  4. 12 10月, 2012 2 次提交
  5. 11 10月, 2012 7 次提交
  6. 10 10月, 2012 1 次提交
    • J
      RDS: fix rds-ping spinlock recursion · 5175a5e7
      jeff.liu 提交于
      This is the revised patch for fixing rds-ping spinlock recursion
      according to Venkat's suggestions.
      
      RDS ping/pong over TCP feature has been broken for years(2.6.39 to
      3.6.0) since we have to set TCP cork and call kernel_sendmsg() between
      ping/pong which both need to lock "struct sock *sk". However, this
      lock has already been hold before rds_tcp_data_ready() callback is
      triggerred. As a result, we always facing spinlock resursion which
      would resulting in system panic.
      
      Given that RDS ping is only used to test the connectivity and not for
      serious performance measurements, we can queue the pong transmit to
      rds_wq as a delayed response.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      CC: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: James Morris <james.l.morris@oracle.com>
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5175a5e7
  7. 09 10月, 2012 15 次提交
    • M
      rbtree: empty nodes have no color · 4c199a93
      Michel Lespinasse 提交于
      Empty nodes have no color.  We can make use of this property to simplify
      the code emitted by the RB_EMPTY_NODE and RB_CLEAR_NODE macros.  Also,
      we can get rid of the rb_init_node function which had been introduced by
      commit 88d19cf3 ("timers: Add rb_init_node() to allow for stack
      allocated rb nodes") to avoid some issue with the empty node's color not
      being initialized.
      
      I'm not sure what the RB_EMPTY_NODE checks in rb_prev() / rb_next() are
      doing there, though.  axboe introduced them in commit 10fd48f2
      ("rbtree: fixed reversed RB_EMPTY_NODE and rb_next/prev").  The way I
      see it, the 'empty node' abstraction is only used by rbtree users to
      flag nodes that they haven't inserted in any rbtree, so asking the
      predecessor or successor of such nodes doesn't make any sense.
      
      One final rb_init_node() caller was recently added in sysctl code to
      implement faster sysctl name lookups.  This code doesn't make use of
      RB_EMPTY_NODE at all, and from what I could see it only called
      rb_init_node() under the mistaken assumption that such initialization was
      required before node insertion.
      
      [sfr@canb.auug.org.au: fix net/ceph/osd_client.c build]
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Santos <daniel.santos@pobox.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4c199a93
    • J
      ipvs: fix ARP resolving for direct routing mode · ad4d3ef8
      Julian Anastasov 提交于
      After the change "Make neigh lookups directly in output packet path"
      (commit a263b309) IPVS can not reach the real server for DR mode
      because we resolve the destination address from IP header, not from
      route neighbour. Use the new FLOWI_FLAG_KNOWN_NH flag to request
      output routes with known nexthop, so that it has preference
      on resolving.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad4d3ef8
    • J
      ipv4: Add FLOWI_FLAG_KNOWN_NH · c92b9655
      Julian Anastasov 提交于
      Add flag to request that output route should be
      returned with known rt_gateway, in case we want to use
      it as nexthop for neighbour resolving.
      
      	The returned route can be cached as follows:
      
      - in NH exception: because the cached routes are not shared
      	with other destinations
      - in FIB NH: when using gateway because all destinations for
      	NH share same gateway
      
      	As last option, to return rt_gateway!=0 we have to
      set DST_NOCACHE.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c92b9655
    • J
      ipv4: introduce rt_uses_gateway · 155e8336
      Julian Anastasov 提交于
      Add new flag to remember when route is via gateway.
      We will use it to allow rt_gateway to contain address of
      directly connected host for the cases when DST_NOCACHE is
      used or when the NH exception caches per-destination route
      without DST_NOCACHE flag, i.e. when routes are not used for
      other destinations. By this way we force the neighbour
      resolving to work with the routed destination but we
      can use different address in the packet, feature needed
      for IPVS-DR where original packet for virtual IP is routed
      via route to real IP.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      155e8336
    • J
      ipv4: make sure nh_pcpu_rth_output is always allocated · f8a17175
      Julian Anastasov 提交于
      Avoid checking nh_pcpu_rth_output in fast path,
      abort fib_info creation on alloc_percpu failure.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f8a17175
    • J
      ipv4: fix forwarding for strict source routes · e0adef0f
      Julian Anastasov 提交于
      After the change "Adjust semantics of rt->rt_gateway"
      (commit f8126f1d) rt_gateway can be 0 but ip_forward() compares
      it directly with nexthop. What we want here is to check if traffic
      is to directly connected nexthop and to fail if using gateway.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0adef0f
    • J
      ipv4: fix sending of redirects · e81da0e1
      Julian Anastasov 提交于
      After "Cache input routes in fib_info nexthops" (commit
      d2d68ba9) and "Elide fib_validate_source() completely when possible"
      (commit 7a9bc9b8) we can not send ICMP redirects. It seems we
      should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
      the same fib_info can be used for traffic that is not redirected,
      eg. from other input devices or from sources that are not in same subnet.
      
      	As result, we have to disable the caching of RTCF_DOREDIRECT
      flag and to force source validation for the case when forwarding
      traffic to the input device. If traffic comes from directly connected
      source we allow redirection as it was done before both changes.
      
      	Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
      is disabled, this can avoid source address validation and to
      help caching the routes.
      
      	After the change "Adjust semantics of rt->rt_gateway"
      (commit f8126f1d) we should make sure our ICMP_REDIR_HOST messages
      contain daddr instead of 0.0.0.0 when target is directly connected.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e81da0e1
    • E
      ipv6: gro: fix PV6_GRO_CB(skb)->proto problem · 86347245
      Eric Dumazet 提交于
      It seems IPV6_GRO_CB(skb)->proto can be destroyed in skb_gro_receive()
      if a new skb is allocated (to serve as an anchor for frag_list)
      
      We copy NAPI_GRO_CB() only (not the IPV6 specific part) in :
      
      *NAPI_GRO_CB(nskb) = *NAPI_GRO_CB(p);
      
      So we leave IPV6_GRO_CB(nskb)->proto to 0 (fresh skb allocation) instead
      of IPPROTO_TCP (6)
      
      ipv6_gro_complete() isnt able to call ops->gro_complete()
      [ tcp6_gro_complete() ]
      
      Fix this by moving proto in NAPI_GRO_CB() and getting rid of
      IPV6_GRO_CB
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86347245
    • F
      vlan: don't deliver frames for unknown vlans to protocols · 48cc32d3
      Florian Zumbiehl 提交于
      6a32e4f9 made the vlan code skip marking
      vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if
      there was an rx_handler, as the rx_handler could cause the frame to be received
      on a different (virtual) vlan-capable interface where that vlan might be
      configured.
      
      As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause
      frames for unknown vlans to be delivered to the protocol stack as if they had
      been received untagged.
      
      For example, if an ipv6 router advertisement that's tagged for a locally not
      configured vlan is received on an interface with macvlan interfaces attached,
      macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the
      macvlan interfaces, which caused it to be passed to the protocol stack, leading
      to ipv6 addresses for the announced prefix being configured even though those
      are completely unusable on the underlying interface.
      
      The fix moves marking as PACKET_OTHERHOST after the rx_handler so the
      rx_handler, if there is one, sees the frame unchanged, but afterwards,
      before the frame is delivered to the protocol stack, it gets marked whether
      there is an rx_handler or not.
      Signed-off-by: NFlorian Zumbiehl <florz@florz.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48cc32d3
    • F
      mac80211: use ieee80211_free_txskb to fix possible skb leaks · c3e7724b
      Felix Fietkau 提交于
      A few places free skbs using dev_kfree_skb even though they're called
      after ieee80211_subif_start_xmit might have cloned it for tracking tx
      status. Use ieee80211_free_txskb here to prevent skb leaks.
      Signed-off-by: NFelix Fietkau <nbd@openwrt.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      c3e7724b
    • T
      mac80211: call drv_get_tsf() in sleepable context · 55fabefe
      Thomas Pedersen 提交于
      The call to drv_get/set_tsf() was put on the workqueue to perform tsf
      adjustments since that function might sleep. However it ended up inside
      a spinlock, whose critical section must be atomic. Do tsf adjustment
      outside the spinlock instead, and get rid of a warning.
      Signed-off-by: NThomas Pedersen <thomas@cozybit.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      55fabefe
    • E
      net: gro: selective flush of packets · 2e71a6f8
      Eric Dumazet 提交于
      Current GRO can hold packets in gro_list for almost unlimited
      time, in case napi->poll() handler consumes its budget over and over.
      
      In this case, napi_complete()/napi_gro_flush() are not called.
      
      Another problem is that gro_list is flushed in non friendly way :
      We scan the list and complete packets in the reverse order.
      (youngest packets first, oldest packets last)
      This defeats priorities that sender could have cooked.
      
      Since GRO currently only store TCP packets, we dont really notice the
      bug because of retransmits, but this behavior can add unexpected
      latencies, particularly on mice flows clamped by elephant flows.
      
      This patch makes sure no packet can stay more than 1 ms in queue, and
      only in stress situations.
      
      It also complete packets in the right order to minimize latencies.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Jesse Gross <jesse@nicira.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e71a6f8
    • S
      ipv4: Don't report stale pmtu values to userspace · ee9a8f7a
      Steffen Klassert 提交于
      We report cached pmtu values even if they are already expired.
      Change this to not report these values after they are expired
      and fix a race in the expire time calculation, as suggested by
      Eric Dumazet.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee9a8f7a
    • S
      ipv4: Don't create nh exeption when the device mtu is smaller than the reported pmtu · 7f92d334
      Steffen Klassert 提交于
      When a local tool like tracepath tries to send packets bigger than
      the device mtu, we create a nh exeption and set the pmtu to device
      mtu. The device mtu does not expire, so check if the device mtu is
      smaller than the reported pmtu and don't crerate a nh exeption in
      that case.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f92d334
    • S
      ipv4: Always invalidate or update the route on pmtu events · d851c12b
      Steffen Klassert 提交于
      Some protocols, like IPsec still cache routes. So we need to invalidate
      the old route on pmtu events to avoid the reuse of stale routes.
      We also need to update the mtu and expire time of the route if we already
      use a nh exception route, otherwise we ignore newly learned pmtu values
      after the first expiration.
      
      With this patch we always invalidate or update the route on pmtu events.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d851c12b
  8. 08 10月, 2012 4 次提交
    • D
      KEYS: Add payload preparsing opportunity prior to key instantiate or update · cf7f601c
      David Howells 提交于
      Give the key type the opportunity to preparse the payload prior to the
      instantiation and update routines being called.  This is done with the
      provision of two new key type operations:
      
      	int (*preparse)(struct key_preparsed_payload *prep);
      	void (*free_preparse)(struct key_preparsed_payload *prep);
      
      If the first operation is present, then it is called before key creation (in
      the add/update case) or before the key semaphore is taken (in the update and
      instantiate cases).  The second operation is called to clean up if the first
      was called.
      
      preparse() is given the opportunity to fill in the following structure:
      
      	struct key_preparsed_payload {
      		char		*description;
      		void		*type_data[2];
      		void		*payload;
      		const void	*data;
      		size_t		datalen;
      		size_t		quotalen;
      	};
      
      Before the preparser is called, the first three fields will have been cleared,
      the payload pointer and size will be stored in data and datalen and the default
      quota size from the key_type struct will be stored into quotalen.
      
      The preparser may parse the payload in any way it likes and may store data in
      the type_data[] and payload fields for use by the instantiate() and update()
      ops.
      
      The preparser may also propose a description for the key by attaching it as a
      string to the description field.  This can be used by passing a NULL or ""
      description to the add_key() system call or the key_create_or_update()
      function.  This cannot work with request_key() as that required the description
      to tell the upcall about the key to be created.
      
      This, for example permits keys that store PGP public keys to generate their own
      name from the user ID and public key fingerprint in the key.
      
      The instantiate() and update() operations are then modified to look like this:
      
      	int (*instantiate)(struct key *key, struct key_preparsed_payload *prep);
      	int (*update)(struct key *key, struct key_preparsed_payload *prep);
      
      and the new payload data is passed in *prep, whether or not it was preparsed.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      cf7f601c
    • E
      net: gro: fix a potential crash in skb_gro_reset_offset · ca07e43e
      Eric Dumazet 提交于
      Before accessing skb first fragment, better make sure there
      is one.
      
      This is probably not needed for old kernels, since an ethernet frame
      cannot contain only an ethernet header, but the recent GRO addition
      to tunnels makes this patch needed.
      
      Also skb_gro_reset_offset() can be static, it actually allows
      compiler to inline it.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca07e43e
    • E
      ipv6: GRO should be ECN friendly · 51ec0403
      Eric Dumazet 提交于
      IPv4 side of the problem was addressed in commit a9e050f4
      (net: tcp: GRO should be ECN friendly)
      
      This patch does the same, but for IPv6 : A Traffic Class mismatch
      doesnt mean flows are different, but instead should force a flush
      of previous packets.
      
      This patch removes artificial packet reordering problem.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51ec0403
    • R
      net: Fix skb_under_panic oops in neigh_resolve_output · e1f16503
      ramesh.nagappa@gmail.com 提交于
      The retry loop in neigh_resolve_output() and neigh_connected_output()
      call dev_hard_header() with out reseting the skb to network_header.
      This causes the retry to fail with skb_under_panic. The fix is to
      reset the network_header within the retry loop.
      Signed-off-by: NRamesh Nagappa <ramesh.nagappa@ericsson.com>
      Reviewed-by: NShawn Lu <shawn.lu@ericsson.com>
      Reviewed-by: NRobert Coulson <robert.coulson@ericsson.com>
      Reviewed-by: NBillie Alsup <billie.alsup@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1f16503
  9. 07 10月, 2012 1 次提交
    • E
      net: remove skb recycling · acb600de
      Eric Dumazet 提交于
      Over time, skb recycling infrastructure got litle interest and
      many bugs. Generic rx path skb allocation is now using page
      fragments for efficient GRO / TCP coalescing, and recyling
      a tx skb for rx path is not worth the pain.
      
      Last identified bug is that fat skbs can be recycled
      and it can endup using high order pages after few iterations.
      
      With help from Maxime Bizon, who pointed out that commit
      87151b86 (net: allow pskb_expand_head() to get maximum tailroom)
      introduced this regression for recycled skbs.
      
      Instead of fixing this bug, lets remove skb recycling.
      
      Drivers wanting really hot skbs should use build_skb() anyway,
      to allocate/populate sk_buff right before netif_receive_skb()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Maxime Bizon <mbizon@freebox.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acb600de