1. 24 9月, 2015 2 次提交
    • N
      netpoll: Close race condition between poll_one_napi and napi_disable · 2d8bff12
      Neil Horman 提交于
      Drivers might call napi_disable while not holding the napi instance poll_lock.
      In those instances, its possible for a race condition to exist between
      poll_one_napi and napi_disable.  That is to say, poll_one_napi only tests the
      NAPI_STATE_SCHED bit to see if there is work to do during a poll, and as such
      the following may happen:
      
      CPU0				CPU1
      ndo_tx_timeout			napi_poll_dev
       napi_disable			 poll_one_napi
        test_and_set_bit (ret 0)
      				  test_bit (ret 1)
         reset adapter		   napi_poll_routine
      
      If the adapter gets a tx timeout without a napi instance scheduled, its possible
      for the adapter to think it has exclusive access to the hardware  (as the napi
      instance is now scheduled via the napi_disable call), while the netpoll code
      thinks there is simply work to do.  The result is parallel hardware access
      leading to corrupt data structures in the driver, and a crash.
      
      Additionaly, there is another, more critical race between netpoll and
      napi_disable.  The disabled napi state is actually identical to the scheduled
      state for a given napi instance.  The implication being that, if a napi instance
      is disabled, a netconsole instance would see the napi state of the device as
      having been scheduled, and poll it, likely while the driver was dong something
      requiring exclusive access.  In the case above, its fairly clear that not having
      the rings in a state ready to be polled will cause any number of crashes.
      
      The fix should be pretty easy.  netpoll uses its own bit to indicate that that
      the napi instance is in a state of being serviced by netpoll (NAPI_STATE_NPSVC).
      We can just gate disabling on that bit as well as the sched bit.  That should
      prevent netpoll from conducting a napi poll if we convert its set bit to a
      test_and_set_bit operation to provide mutual exclusion
      
      Change notes:
      V2)
      	Remove a trailing whtiespace
      	Resubmit with proper subject prefix
      
      V3)
      	Clean up spacing nits
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: jmaxwell@redhat.com
      Tested-by: jmaxwell@redhat.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d8bff12
    • E
      tcp: add proper TS val into RST packets · 675ee231
      Eric Dumazet 提交于
      RST packets sent on behalf of TCP connections with TS option (RFC 7323
      TCP timestamps) have incorrect TS val (set to 0), but correct TS ecr.
      
      A > B: Flags [S], seq 0, win 65535, options [mss 1000,nop,nop,TS val 100
      ecr 0], length 0
      B > A: Flags [S.], seq 2444755794, ack 1, win 28960, options [mss
      1460,nop,nop,TS val 7264344 ecr 100], length 0
      A > B: Flags [.], ack 1, win 65535, options [nop,nop,TS val 110 ecr
      7264344], length 0
      
      B > A: Flags [R.], seq 1, ack 1, win 28960, options [nop,nop,TS val 0
      ecr 110], length 0
      
      We need to call skb_mstamp_get() to get proper TS val,
      derived from skb->skb_mstamp
      
      Note that RFC 1323 was advocating to not send TS option in RST segment,
      but RFC 7323 recommends the opposite :
      
        Once TSopt has been successfully negotiated, that is both <SYN> and
        <SYN,ACK> contain TSopt, the TSopt MUST be sent in every non-<RST>
        segment for the duration of the connection, and SHOULD be sent in an
        <RST> segment (see Section 5.2 for details)
      
      Note this RFC recommends to send TS val = 0, but we believe it is
      premature : We do not know if all TCP stacks are properly
      handling the receive side :
      
         When an <RST> segment is
         received, it MUST NOT be subjected to the PAWS check by verifying an
         acceptable value in SEG.TSval, and information from the Timestamps
         option MUST NOT be used to update connection state information.
         SEG.TSecr MAY be used to provide stricter <RST> acceptance checks.
      
      In 5 years, if/when all TCP stack are RFC 7323 ready, we might consider
      to decide to send TS val = 0, if it buys something.
      
      Fixes: 7faee5c0 ("tcp: remove TCP_SKB_CB(skb)->when")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      675ee231
  2. 23 9月, 2015 6 次提交
    • N
      net: dsa: Fix Marvell Egress Trailer check · fbd03513
      Neil Armstrong 提交于
      The Marvell Egress rx trailer check must be fixed to
      correctly detect bad bits in the third byte of the
      Eggress trailer as described in the Table 28 of the
      88E6060 datasheet.
      The current code incorrectly omits to check the third
      byte and checks the fourth byte twice.
      Signed-off-by: NNeil Armstrong <narmstrong@baylibre.com>
      Acked-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fbd03513
    • D
      lib: fix data race in rhashtable_rehash_one · 7def0f95
      Dmitriy Vyukov 提交于
      rhashtable_rehash_one() uses complex logic to update entry->next field,
      after INIT_RHT_NULLS_HEAD and NULLS_MARKER expansion:
      
      entry->next = 1 | ((base + off) << 1)
      
      This can be compiled along the lines of:
      
      entry->next = base + off
      entry->next <<= 1
      entry->next |= 1
      
      Which will break concurrent readers.
      
      NULLS value recomputation is not needed here, so just remove
      the complex logic.
      
      The data race was found with KernelThreadSanitizer (KTSAN).
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7def0f95
    • T
      ch9200: Convert to use module_usb_driver · 23eedbc2
      Tobias Klauser 提交于
      Converts the ch9200 driver to use the module_usb_driver() macro which
      makes the code smaller and a bit simpler.
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Acked-by: NMatthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23eedbc2
    • J
      openvswitch: Zero flows on allocation. · ae5f2fb1
      Jesse Gross 提交于
      When support for megaflows was introduced, OVS needed to start
      installing flows with a mask applied to them. Since masking is an
      expensive operation, OVS also had an optimization that would only
      take the parts of the flow keys that were covered by a non-zero
      mask. The values stored in the remaining pieces should not matter
      because they are masked out.
      
      While this works fine for the purposes of matching (which must always
      look at the mask), serialization to netlink can be problematic. Since
      the flow and the mask are serialized separately, the uninitialized
      portions of the flow can be encoded with whatever values happen to be
      present.
      
      In terms of functionality, this has little effect since these fields
      will be masked out by definition. However, it leaks kernel memory to
      userspace, which is a potential security vulnerability. It is also
      possible that other code paths could look at the masked key and get
      uninitialized data, although this does not currently appear to be an
      issue in practice.
      
      This removes the mask optimization for flows that are being installed.
      This was always intended to be the case as the mask optimizations were
      really targetting per-packet flow operations.
      
      Fixes: 03f0d916 ("openvswitch: Mega flow implementation")
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae5f2fb1
    • R
      net: dsa: actually force the speed on the CPU port · 53adc9e8
      Russell King 提交于
      Commit 54d792f2 ("net: dsa: Centralise global and port setup
      code into mv88e6xxx.") merged in the 4.2 merge window broke the link
      speed forcing for the CPU port of Marvell DSA switches.  The original
      code was:
      
              /* MAC Forcing register: don't force link, speed, duplex
               * or flow control state to any particular values on physical
               * ports, but force the CPU port and all DSA ports to 1000 Mb/s
               * full duplex.
               */
              if (dsa_is_cpu_port(ds, p) || ds->dsa_port_mask & (1 << p))
                      REG_WRITE(addr, 0x01, 0x003e);
              else
                      REG_WRITE(addr, 0x01, 0x0003);
      
      but the new code does a read-modify-write:
      
                      reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
                      if (dsa_is_cpu_port(ds, port) ||
                          ds->dsa_port_mask & (1 << port)) {
                              reg |= PORT_PCS_CTRL_FORCE_LINK |
                                      PORT_PCS_CTRL_LINK_UP |
                                      PORT_PCS_CTRL_DUPLEX_FULL |
                                      PORT_PCS_CTRL_FORCE_DUPLEX;
                              if (mv88e6xxx_6065_family(ds))
                                      reg |= PORT_PCS_CTRL_100;
                              else
                                      reg |= PORT_PCS_CTRL_1000;
      
      The link speed in the PCS control register is a two bit field.  Forcing
      the link speed in this way doesn't ensure that the bit field is set to
      the correct value - on the hardware I have here, the speed bitfield
      remains set to 0x03, resulting in the speed not being forced to gigabit.
      
      We must clear both bits before forcing the link speed.
      
      Fixes: 54d792f2 ("net: dsa: Centralise global and port setup code into mv88e6xxx.")
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53adc9e8
    • J
      geneve: ensure ECN info is handled properly in all tx/rx paths · 08399efc
      John W. Linville 提交于
      Partially due to a pre-exising "thinko", the new metadata-based tx/rx
      paths were handling ECN propagation differently than the traditional
      tx/rx paths.  This patch removes the "thinko" (involving multiple
      ip_hdr assignments) on the rx path and corrects the ECN handling on
      both the rx and tx paths.
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      Reviewed-by: NJesse Gross <jesse@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08399efc
  3. 22 9月, 2015 14 次提交
  4. 21 9月, 2015 11 次提交
  5. 18 9月, 2015 7 次提交