1. 18 12月, 2012 2 次提交
    • A
      libceph: socket can close in any connection state · 7bb21d68
      Alex Elder 提交于
      A connection's socket can close for any reason, independent of the
      state of the connection (and without irrespective of the connection
      mutex).  As a result, the connectino can be in pretty much any state
      at the time its socket is closed.
      
      Handle those other cases at the top of con_work().  Pull this whole
      block of code into a separate function to reduce the clutter.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      7bb21d68
    • A
      rbd: remove linger unconditionally · 61c74035
      Alex Elder 提交于
      In __unregister_linger_request(), the request is being removed
      from the osd client's req_linger list only when the request
      has a non-null osd pointer.  It should be done whether or not
      the request currently has an osd.
      
      This is most likely a non-issue because I believe the request
      will always have an osd when this function is called.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      61c74035
  2. 17 12月, 2012 2 次提交
    • A
      libceph: avoid using freed osd in __kick_osd_requests() · 685a7555
      Alex Elder 提交于
      If an osd has no requests and no linger requests, __reset_osd()
      will just remove it with a call to __remove_osd().  That drops
      a reference to the osd, and therefore the osd may have been free
      by the time __reset_osd() returns.  That function offers no
      indication this may have occurred, and as a result the osd will
      continue to be used even when it's no longer valid.
      
      Change__reset_osd() so it returns an error (ENODEV) when it
      deletes the osd being reset.  And change __kick_osd_requests() so it
      returns immediately (before referencing osd again) if __reset_osd()
      returns *any* error.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      685a7555
    • A
      ceph: don't reference req after put · 7d5f2481
      Alex Elder 提交于
      In __unregister_request(), there is a call to list_del_init()
      referencing a request that was the subject of a call to
      ceph_osdc_put_request() on the previous line.  This is not
      safe, because the request structure could have been freed
      by the time we reach the list_del_init().
      
      Fix this by reversing the order of these lines.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-off-by: NSage Weil <sage@inktank.com>
      7d5f2481
  3. 13 12月, 2012 1 次提交
    • S
      libceph: remove 'osdtimeout' option · 83aff95e
      Sage Weil 提交于
      This would reset a connection with any OSD that had an outstanding
      request that was taking more than N seconds.  The idea was that if the
      OSD was buggy, the client could compensate by resending the request.
      
      In reality, this only served to hide server bugs, and we haven't
      actually seen such a bug in quite a while.  Moreover, the userspace
      client code never did this.
      
      More importantly, often the request is taking a long time because the
      OSD is trying to recover, or overloaded, and killing the connection
      and retrying would only make the situation worse by giving the OSD
      more work to do.
      Signed-off-by: NSage Weil <sage@inktank.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      83aff95e
  4. 01 11月, 2012 1 次提交
  5. 30 10月, 2012 1 次提交
  6. 27 10月, 2012 1 次提交
    • S
      libceph: avoid NULL kref_put from NULL alloc_msg return · 7246240c
      Sage Weil 提交于
      The ceph_on_in_msg_alloc() method calls the ->alloc_msg() helper which
      may return NULL.  It also drops con->mutex while it allocates a message,
      which means that the connection state may change (e.g., get closed).  If
      that happens, we clean up and bail out.  Avoid calling ceph_msg_put() on
      a NULL return value and triggering a crash.
      
      This was observed when an ->alloc_msg() call races with a timeout that
      resends a zillion messages and resets the connection, and ->alloc_msg()
      returns NULL (because the request was resent to another target).
      
      Fixes http://tracker.newdream.net/issues/3342Signed-off-by: NSage Weil <sage@inktank.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      7246240c
  7. 10 10月, 2012 3 次提交
    • A
      rbd: define common queue_con_delay() · 802c6d96
      Alex Elder 提交于
      This patch defines a single function, queue_con_delay() to call
      queue_delayed_work() for a connection.  It basically generalizes
      what was previously queue_con() by adding the delay argument.
      queue_con() is now a simple helper that passes 0 for its delay.
      queue_con_delay() returns 0 if it queued work or an errno if it
      did not for some reason.
      
      If con_work() finds the BACKOFF flag set for a connection, it now
      calls queue_con_delay() to handle arranging to start again after a
      delay.
      
      Note about connection reference counts:  con_work() only ever gets
      called as a work item function.  At the time that work is scheduled,
      a reference to the connection is acquired, and the corresponding
      con_work() call is then responsible for dropping that reference
      before it returns.
      
      Previously, the backoff handling inside con_work() silently handed
      off its reference to delayed work it scheduled.  Now that
      queue_con_delay() is used, a new reference is acquired for the
      newly-scheduled work, and the original reference is dropped by the
      con->ops->put() call at the end of the function.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      802c6d96
    • A
      rbd: let con_work() handle backoff · 8618e30b
      Alex Elder 提交于
      Both ceph_fault() and con_work() include handling for imposing a
      delay before doing further processing on a faulted connection.
      The latter is used only if ceph_fault() is unable to.
      
      Instead, just let con_work() always be responsible for implementing
      the delay.  After setting up the delay value, set the BACKOFF flag
      on the connection unconditionally and call queue_con() to ensure
      con_work() will get called to handle it.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      8618e30b
    • A
      rbd: reset BACKOFF if unable to re-queue · 588377d6
      Alex Elder 提交于
      If ceph_fault() is unable to queue work after a delay, it sets the
      BACKOFF connection flag so con_work() will attempt to do so.
      
      In con_work(), when BACKOFF is set, if queue_delayed_work() doesn't
      result in newly-queued work, it simply ignores this condition and
      proceeds as if no backoff delay were desired.  There are two
      problems with this--one of which is a bug.
      
      The first problem is simply that the intended behavior is to back
      off, and if we aren't able queue the work item to run after a delay
      we're not doing that.
      
      The only reason queue_delayed_work() won't queue work is if the
      provided work item is already queued.  In the messenger, this
      means that con_work() is already scheduled to be run again.  So
      if we simply set the BACKOFF flag again when this occurs, we know
      the next con_work() call will again attempt to hold off activity
      on the connection until after the delay.
      
      The second problem--the bug--is a leak of a reference count.  If
      queue_delayed_work() returns 0 in con_work(), con->ops->put() drops
      the connection reference held on entry to con_work().  However,
      processing is (was) allowed to continue, and at the end of the
      function a second con->ops->put() is called.
      
      This patch fixes both problems.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      588377d6
  8. 02 10月, 2012 5 次提交
  9. 28 9月, 2012 2 次提交
  10. 26 9月, 2012 3 次提交
  11. 25 9月, 2012 1 次提交
  12. 24 9月, 2012 2 次提交
    • L
      batman-adv: Fix symmetry check / route flapping in multi interface setups · 7caf69fb
      Linus Lüssing 提交于
      If receiving an OGM from a neighbor other than the currently selected
      and if it has the same TQ then we are supposed to switch if this
      neighbor provides a more symmetric link than the currently selected one.
      
      However this symmetry check currently is broken if the interface of the
      neighbor we received the OGM from and the one of the currently selected
      neighbor differ: We are currently trying to determine the symmetry of the
      link towards the selected router via the link we received the OGM from
      instead of just checking via the link towards the currently selected
      router.
      
      This leads to way more route switches than necessary and can lead to
      permanent route flapping in many common multi interface setups.
      
      This patch fixes this issue by using the right interface for this
      symmetry check.
      Signed-off-by: NLinus Lüssing <linus.luessing@web.de>
      7caf69fb
    • D
      batman-adv: Fix change mac address of soft iface. · 40a3eb33
      Def 提交于
      Into function interface_set_mac_addr, the function tt_local_add was
      invoked before updating dev->dev_addr. The new MAC address was not
      tagged as NoPurge.
      Signed-off-by: NDef <def@laposte.net>
      40a3eb33
  13. 23 9月, 2012 1 次提交
  14. 22 9月, 2012 3 次提交
  15. 21 9月, 2012 8 次提交
    • E
      net: do not disable sg for packets requiring no checksum · c0d680e5
      Ed Cashin 提交于
      A change in a series of VLAN-related changes appears to have
      inadvertently disabled the use of the scatter gather feature of
      network cards for transmission of non-IP ethernet protocols like ATA
      over Ethernet (AoE).  Below is a reference to the commit that
      introduces a "harmonize_features" function that turns off scatter
      gather when the NIC does not support hardware checksumming for the
      ethernet protocol of an sk buff.
      
        commit f01a5236
        Author: Jesse Gross <jesse@nicira.com>
        Date:   Sun Jan 9 06:23:31 2011 +0000
      
            net offloading: Generalize netif_get_vlan_features().
      
      The can_checksum_protocol function is not equipped to consider a
      protocol that does not require checksumming.  Calling it for a
      protocol that requires no checksum is inappropriate.
      
      The patch below has harmonize_features call can_checksum_protocol when
      the protocol needs a checksum, so that the network layer is not forced
      to perform unnecessary skb linearization on the transmission of AoE
      packets.  Unnecessary linearization results in decreased performance
      and increased memory pressure, as reported here:
      
        http://www.spinics.net/lists/linux-mm/msg15184.html
      
      The problem has probably not been widely experienced yet, because
      only recently has the kernel.org-distributed aoe driver acquired the
      ability to use payloads of over a page in size, with the patchset
      recently included in the mm tree:
      
        https://lkml.org/lkml/2012/8/28/140
      
      The coraid.com-distributed aoe driver already could use payloads of
      greater than a page in size, but its users generally do not use the
      newest kernels.
      Signed-off-by: NEd Cashin <ecashin@coraid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0d680e5
    • M
      xfrm_user: don't copy esn replay window twice for new states · e3ac104d
      Mathias Krause 提交于
      The ESN replay window was already fully initialized in
      xfrm_alloc_replay_state_esn(). No need to copy it again.
      
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3ac104d
    • M
      xfrm_user: ensure user supplied esn replay window is valid · ecd79187
      Mathias Krause 提交于
      The current code fails to ensure that the netlink message actually
      contains as many bytes as the header indicates. If a user creates a new
      state or updates an existing one but does not supply the bytes for the
      whole ESN replay window, the kernel copies random heap bytes into the
      replay bitmap, the ones happen to follow the XFRMA_REPLAY_ESN_VAL
      netlink attribute. This leads to following issues:
      
      1. The replay window has random bits set confusing the replay handling
         code later on.
      
      2. A malicious user could use this flaw to leak up to ~3.5kB of heap
         memory when she has access to the XFRM netlink interface (requires
         CAP_NET_ADMIN).
      
      Known users of the ESN replay window are strongSwan and Steffen's
      iproute2 patch (<http://patchwork.ozlabs.org/patch/85962/>). The latter
      uses the interface with a bitmap supplied while the former does not.
      strongSwan is therefore prone to run into issue 1.
      
      To fix both issues without breaking existing userland allow using the
      XFRMA_REPLAY_ESN_VAL netlink attribute with either an empty bitmap or a
      fully specified one. For the former case we initialize the in-kernel
      bitmap with zero, for the latter we copy the user supplied bitmap. For
      state updates the full bitmap must be supplied.
      
      To prevent overflows in the bitmap length calculation the maximum size
      of bmp_len is limited to 128 by this patch -- resulting in a maximum
      replay window of 4096 packets. This should be sufficient for all real
      life scenarios (RFC 4303 recommends a default replay window size of 64).
      
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Martin Willi <martin@revosec.ch>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ecd79187
    • M
      xfrm_user: fix info leak in copy_to_user_tmpl() · 1f86840f
      Mathias Krause 提交于
      The memory used for the template copy is a local stack variable. As
      struct xfrm_user_tmpl contains multiple holes added by the compiler for
      alignment, not initializing the memory will lead to leaking stack bytes
      to userland. Add an explicit memset(0) to avoid the info leak.
      
      Initial version of the patch by Brad Spengler.
      
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f86840f
    • M
      xfrm_user: fix info leak in copy_to_user_policy() · 7b789836
      Mathias Krause 提交于
      The memory reserved to dump the xfrm policy includes multiple padding
      bytes added by the compiler for alignment (padding bytes in struct
      xfrm_selector and struct xfrm_userpolicy_info). Add an explicit
      memset(0) before filling the buffer to avoid the heap info leak.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b789836
    • M
      xfrm_user: fix info leak in copy_to_user_state() · f778a636
      Mathias Krause 提交于
      The memory reserved to dump the xfrm state includes the padding bytes of
      struct xfrm_usersa_info added by the compiler for alignment (7 for
      amd64, 3 for i386). Add an explicit memset(0) before filling the buffer
      to avoid the info leak.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f778a636
    • M
      xfrm_user: fix info leak in copy_to_user_auth() · 4c87308b
      Mathias Krause 提交于
      copy_to_user_auth() fails to initialize the remainder of alg_name and
      therefore discloses up to 54 bytes of heap memory via netlink to
      userland.
      
      Use strncpy() instead of strcpy() to fill the trailing bytes of alg_name
      with null bytes.
      Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c87308b
    • A
      tcp: restore rcv_wscale in a repair mode (v2) · bc26ccd8
      Andrey Vagin 提交于
      rcv_wscale is a symetric parameter with snd_wscale.
      
      Both this parameters are set on a connection handshake.
      
      Without this value a remote window size can not be interpreted correctly,
      because a value from a packet should be shifted on rcv_wscale.
      
      And one more thing is that wscale_ok should be set too.
      
      This patch doesn't break a backward compatibility.
      If someone uses it in a old scheme, a rcv window
      will be restored with the same bug (rcv_wscale = 0).
      
      v2: Save backward compatibility on big-endian system. Before
          the first two bytes were snd_wscale and the second two bytes were
          rcv_wscale. Now snd_wscale is opt_val & 0xFFFF and rcv_wscale >> 16.
          This approach is independent on byte ordering.
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      CC: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NAndrew Vagin <avagin@openvz.org>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc26ccd8
  16. 20 9月, 2012 4 次提交