1. 20 6月, 2014 3 次提交
    • P
      tg3: Clear NETIF_F_TSO6 flag before doing software GSO · 40c1deaf
      Prashant Sreedharan 提交于
      Commit d3f6f3a1 ("tg3: Prevent page
      allocation failure during TSO workaround") modified driver logic
      to use tg3_tso_bug() for any TSO fragment that hits hardware bug
      conditions thus the patch increased the scope of work for tg3_tso_bug()
      to cover devices that support NETIF_F_TSO6 as well. Prior to the
      patch, tg3_tso_bug() would only be used on devices supporting
      NETIF_F_TSO.
      
      A regression was introduced for IPv6 packets requiring the workaround.
      To properly perform GSO on SKBs with TCPV6 gso_type, we need to call
      skb_gso_segment() with NETIF_F_TSO6 feature flag cleared, or the
      function will return NULL and cause a kernel oops as tg3 is not handling
      a NULL return value. This patch fixes the problem.
      Signed-off-by: NPrashant Sreedharan <prashant@broadcom.com>
      Signed-off-by: NMichael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40c1deaf
    • N
      tcp: fix tcp_match_skb_to_sack() for unaligned SACK at end of an skb · 2cd0d743
      Neal Cardwell 提交于
      If there is an MSS change (or misbehaving receiver) that causes a SACK
      to arrive that covers the end of an skb but is less than one MSS, then
      tcp_match_skb_to_sack() was rounding up pkt_len to the full length of
      the skb ("Round if necessary..."), then chopping all bytes off the skb
      and creating a zero-byte skb in the write queue.
      
      This was visible now because the recently simplified TLP logic in
      bef1909e ("tcp: fixing TLP's FIN recovery") could find that 0-byte
      skb at the end of the write queue, and now that we do not check that
      skb's length we could send it as a TLP probe.
      
      Consider the following example scenario:
      
       mss: 1000
       skb: seq: 0 end_seq: 4000  len: 4000
       SACK: start_seq: 3999 end_seq: 4000
      
      The tcp_match_skb_to_sack() code will compute:
      
       in_sack = false
       pkt_len = start_seq - TCP_SKB_CB(skb)->seq = 3999 - 0 = 3999
       new_len = (pkt_len / mss) * mss = (3999/1000)*1000 = 3000
       new_len += mss = 4000
      
      Previously we would find the new_len > skb->len check failing, so we
      would fall through and set pkt_len = new_len = 4000 and chop off
      pkt_len of 4000 from the 4000-byte skb, leaving a 0-byte segment
      afterward in the write queue.
      
      With this new commit, we notice that the new new_len >= skb->len check
      succeeds, so that we return without trying to fragment.
      
      Fixes: adb92db8 ("tcp: Make SACK code to split only at mss boundaries")
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Ilpo Jarvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cd0d743
    • D
      Revert "net: return actual error on register_queue_kobjects" · 8e4946cc
      David S. Miller 提交于
      This reverts commit d36a4f4b.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e4946cc
  2. 19 6月, 2014 6 次提交
    • K
      net: filter: fix upper BPF instruction limit · 6f9a093b
      Kees Cook 提交于
      The original checks (via sk_chk_filter) for instruction count uses ">",
      not ">=", so changing this in sk_convert_filter has the potential to break
      existing seccomp filters that used exactly BPF_MAXINSNS many instructions.
      
      Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org # v3.15+
      Acked-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6f9a093b
    • D
      net: sctp: propagate sysctl errors from proc_do* properly · ff5e92c1
      Daniel Borkmann 提交于
      sysctl handler proc_sctp_do_hmac_alg(), proc_sctp_do_rto_min() and
      proc_sctp_do_rto_max() do not properly reflect some error cases
      when writing values via sysctl from internal proc functions such
      as proc_dointvec() and proc_dostring().
      
      In all these cases we pass the test for write != 0 and partially
      do additional work just to notice that additional sanity checks
      fail and we return with hard-coded -EINVAL while proc_do*
      functions might also return different errors. So fix this up by
      simply testing a successful return of proc_do* right after
      calling it.
      
      This also allows to propagate its return value onwards to the user.
      While touching this, also fix up some minor style issues.
      
      Fixes: 4f3fdf3b ("sctp: add check rto_min and rto_max in sysctl")
      Fixes: 3c68198e ("sctp: Make hmac algorithm selection for cookie generation dynamic")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff5e92c1
    • J
      net: return actual error on register_queue_kobjects · d36a4f4b
      Jie Liu 提交于
      Return the actual error code if call kset_create_and_add() failed
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d36a4f4b
    • O
      bonding: Advertize vxlan offload features when supported · 5a7baa78
      Or Gerlitz 提交于
      When the underlying device supports TCP offloads for VXLAN/UDP
      encapulated traffic, we need to reflect that through the hw_enc_features
      field of the bonding net-device. This will cause the xmit path
      in the core networking stack to provide bonding with encapsulated
      GSO frames to offload into the HW etc.
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a7baa78
    • M
      skge: Added FS A8NE-FM to the list of 32bit DMA boards · ee14eb7b
      Mirko Lindner 提交于
      Added FUJITSU SIEMENS A8NE-FM to the list of 32bit DMA boards
      
      >From Tomi O.:
      After I added an entry to this MB into the skge.c
      driver in order to enable the mentioned 64bit dma disable quirk,
      the network data corruptions ended and everything is fine again.
      Signed-off-by: NMirko Lindner <mlindner@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee14eb7b
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 3a3ec1b2
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      netfilter fixes for net
      
      The following patchset contains netfilter updates for your net tree,
      they are:
      
      1) Fix refcount leak when dumping the dying/unconfirmed conntrack lists,
         from Florian Westphal.
      
      2) Fix crash in NAT when removing a netnamespace, also from Florian.
      
      3) Fix a crash in IPVS when trying to remove an estimator out of the
         sysctl scope, from Julian Anastasov.
      
      4) Add zone attribute to the routing to calculate the message size in
         ctnetlink events, from Ken-ichirou MATSUZAWA.
      
      5) Another fix for the dying/unconfirmed list which was preventing to
         dump more than one memory page of entries (~17 entries in x86_64).
      
      6) Fix missing RCU-safe list insertion in the rule replacement code
         in nf_tables.
      
      7) Since the new transaction infrastructure is in place, we have to
         upgrade the chain use counter from u16 to u32 to avoid overflow
         after more than 2^16 rules are added.
      
      8) Fix refcount leak when replacing rule in nf_tables. This problem
         was also introduced in new transaction.
      
      9) Call the ->destroy() callback when releasing nft-xt rules to fix
         module refcount leaks.
      
      10) Set the family in the netlink messages that contain set elements
          in nf_tables to make it consistent with other object types.
      
      11) Don't dump NAT port information if it is unset in nft_nat.
      
      12) Update the MAINTAINERS file, I have merged the ebtables entry
          into netfilter. While at it, also removed the netfilter users
          mailing list, the development list should be enough.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a3ec1b2
  3. 18 6月, 2014 3 次提交
  4. 17 6月, 2014 9 次提交
    • D
      hyperv: fix apparent cut-n-paste error in send path teardown · 2f18423d
      Dave Jones 提交于
      c25aaf81: "hyperv: Enable sendbuf mechanism on the send path" added
      some teardown code that looks like it was copied from the recieve path
      above, but missed a variable name replacement.
      Signed-off-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f18423d
    • D
      tcp: remove unnecessary tcp_sk assignment. · 17846376
      Dave Jones 提交于
      This variable is overwritten by the child socket assignment before
      it ever gets used.
      Signed-off-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17846376
    • C
      net: tile: fix unused variable warning · 9ebe2435
      Chris Metcalf 提交于
      'i' is unused in tile_net_dev_init() after commit d581ebf5
      ("net: tile: Use helpers from linux/etherdevice.h to check/set MAC").
      Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ebe2435
    • C
      ptp: In the testptp utility, use clock_adjtime from glibc when available · 42e1358e
      Christian Riesch 提交于
      clock_adjtime was included in glibc 2.14. _GNU_SOURCE must be defined
      to make it available.
      Signed-off-by: NChristian Riesch <christian.riesch@omicron.at>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42e1358e
    • J
      isdn: hisax: Drop duplicate Kconfig entry · ddc6fbd8
      Jean Delvare 提交于
      There are 2 HISAX_AVM_A1_PCMCIA Kconfig entries. The kbuild system
      ignores the second one, and apparently nobody noticed the problem so
      far, so let's remove that second entry.
      Signed-off-by: NJean Delvare <jdelvare@suse.de>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddc6fbd8
    • J
      isdn: hisax: Merge Kconfig ifs · a1c33346
      Jean Delvare 提交于
      The first half of the HiSax config options is presented if
      ISDN_DRV_HISAX!=n, while the second half of the options is presented
      if ISDN_DRV_HISAX. That's the same, so merge both conditionals.
      Signed-off-by: NJean Delvare <jdelvare@suse.de>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1c33346
    • T
      slcan: Port write_wakeup deadlock fix from slip · a8e83b17
      Tyler Hall 提交于
      The commit "slip: Fix deadlock in write_wakeup" fixes a deadlock caused
      by a change made in both slcan and slip. This is a direct port of that
      fix.
      Signed-off-by: NTyler Hall <tylerwhall@gmail.com>
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: Andre Naujoks <nautsch2@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8e83b17
    • T
      slip: Fix deadlock in write_wakeup · 661f7fda
      Tyler Hall 提交于
      Use schedule_work() to avoid potentially taking the spinlock in
      interrupt context.
      
      Commit cc9fa74e ("slip/slcan: added locking in wakeup function") added
      necessary locking to the wakeup function and 367525c8/ddcde142 ("can:
      slcan: Fix spinlock variant") converted it to spin_lock_bh() because the lock
      is also taken in timers.
      
      Disabling softirqs is not sufficient, however, as tty drivers may call
      write_wakeup from interrupt context. This driver calls tty->ops->write() with
      its spinlock held, which may immediately cause an interrupt on the same CPU and
      subsequent spin_bug().
      
      Simply converting to spin_lock_irq/irqsave() prevents this deadlock, but
      causes lockdep to point out a possible circular locking dependency
      between these locks:
      
      (&(&sl->lock)->rlock){-.....}, at: slip_write_wakeup
      (&port_lock_key){-.....}, at: serial8250_handle_irq.part.13
      
      The slip transmit is holding the slip spinlock when calling the tty write.
      This grabs the port lock. On an interrupt, the handler grabs the port
      lock and calls write_wakeup which grabs the slip lock. This could be a
      problem if a serial interrupt occurs on another CPU during the slip
      transmit.
      
      To deal with these issues, don't grab the lock in the wakeup function by
      deferring the writeout to a workqueue. Also hold the lock during close
      when de-assigning the tty pointer to safely disarm the worker and
      timers.
      
      This bug is easily reproducible on the first transmit when slip is
      used with the standard 8250 serial driver.
      
      [<c0410b7c>] (spin_bug+0x0/0x38) from [<c006109c>] (do_raw_spin_lock+0x60/0x1d0)
       r5:eab27000 r4:ec02754c
      [<c006103c>] (do_raw_spin_lock+0x0/0x1d0) from [<c04185c0>] (_raw_spin_lock+0x28/0x2c)
       r10:0000001f r9:eabb814c r8:eabb8140 r7:40070193 r6:ec02754c r5:eab27000
       r4:ec02754c r3:00000000
      [<c0418598>] (_raw_spin_lock+0x0/0x2c) from [<bf3a0220>] (slip_write_wakeup+0x50/0xe0 [slip])
       r4:ec027540 r3:00000003
      [<bf3a01d0>] (slip_write_wakeup+0x0/0xe0 [slip]) from [<c026e420>] (tty_wakeup+0x48/0x68)
       r6:00000000 r5:ea80c480 r4:eab27000 r3:bf3a01d0
      [<c026e3d8>] (tty_wakeup+0x0/0x68) from [<c028a8ec>] (uart_write_wakeup+0x2c/0x30)
       r5:ed68ea90 r4:c06790d8
      [<c028a8c0>] (uart_write_wakeup+0x0/0x30) from [<c028dc44>] (serial8250_tx_chars+0x114/0x170)
      [<c028db30>] (serial8250_tx_chars+0x0/0x170) from [<c028dffc>] (serial8250_handle_irq+0xa0/0xbc)
       r6:000000c2 r5:00000060 r4:c06790d8 r3:00000000
      [<c028df5c>] (serial8250_handle_irq+0x0/0xbc) from [<c02933a4>] (dw8250_handle_irq+0x38/0x64)
       r7:00000000 r6:edd2f390 r5:000000c2 r4:c06790d8
      [<c029336c>] (dw8250_handle_irq+0x0/0x64) from [<c028d2f4>] (serial8250_interrupt+0x44/0xc4)
       r6:00000000 r5:00000000 r4:c06791c4 r3:c029336c
      [<c028d2b0>] (serial8250_interrupt+0x0/0xc4) from [<c0067fe4>] (handle_irq_event_percpu+0xb4/0x2b0)
       r10:c06790d8 r9:eab27000 r8:00000000 r7:00000000 r6:0000001f r5:edd52980
       r4:ec53b6c0 r3:c028d2b0
      [<c0067f30>] (handle_irq_event_percpu+0x0/0x2b0) from [<c006822c>] (handle_irq_event+0x4c/0x6c)
       r10:c06790d8 r9:eab27000 r8:c0673ae0 r7:c05c2020 r6:ec53b6c0 r5:edd529d4
       r4:edd52980
      [<c00681e0>] (handle_irq_event+0x0/0x6c) from [<c006b140>] (handle_level_irq+0xe8/0x100)
       r6:00000000 r5:edd529d4 r4:edd52980 r3:00022000
      [<c006b058>] (handle_level_irq+0x0/0x100) from [<c00676f8>] (generic_handle_irq+0x30/0x40)
       r5:0000001f r4:0000001f
      [<c00676c8>] (generic_handle_irq+0x0/0x40) from [<c000f57c>] (handle_IRQ+0xd0/0x13c)
       r4:ea997b18 r3:000000e0
      [<c000f4ac>] (handle_IRQ+0x0/0x13c) from [<c00086c4>] (armada_370_xp_handle_irq+0x4c/0x118)
       r8:000003ff r7:ea997b18 r6:ffffffff r5:60070013 r4:c0674dc0
      [<c0008678>] (armada_370_xp_handle_irq+0x0/0x118) from [<c0013840>] (__irq_svc+0x40/0x70)
      Exception stack(0xea997b18 to 0xea997b60)
      7b00:                                                       00000001 20070013
      7b20: 00000000 0000000b 20070013 eab27000 20070013 00000000 ed10103e eab27000
      7b40: c06790d8 ea997b74 ea997b60 ea997b60 c04186c0 c04186c8 60070013 ffffffff
       r9:eab27000 r8:ed10103e r7:ea997b4c r6:ffffffff r5:60070013 r4:c04186c8
      [<c04186a4>] (_raw_spin_unlock_irqrestore+0x0/0x54) from [<c0288fc0>] (uart_start+0x40/0x44)
       r4:c06790d8 r3:c028ddd8
      [<c0288f80>] (uart_start+0x0/0x44) from [<c028982c>] (uart_write+0xe4/0xf4)
       r6:0000003e r5:00000000 r4:ed68ea90 r3:0000003e
      [<c0289748>] (uart_write+0x0/0xf4) from [<bf3a0d20>] (sl_xmit+0x1c4/0x228 [slip])
       r10:ed388e60 r9:0000003c r8:ffffffdd r7:0000003e r6:ec02754c r5:ea717eb8
       r4:ec027000
      [<bf3a0b5c>] (sl_xmit+0x0/0x228 [slip]) from [<c0368d74>] (dev_hard_start_xmit+0x39c/0x6d0)
       r8:eaf163c0 r7:ec027000 r6:ea717eb8 r5:00000000 r4:00000000
      Signed-off-by: NTyler Hall <tylerwhall@gmail.com>
      Cc: Oliver Hartkopp <socketcan@hartkopp.net>
      Cc: Andre Naujoks <nautsch2@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      661f7fda
    • N
      vmxnet3: adjust ring sizes when interface is down · f00e2b0a
      Neil Horman 提交于
      If ethtool is used to update ring sizes on a vmxnet3 interface that isn't
      running, the change isn't stored, meaning the ring update is effectively is
      ignored and lost without any indication to the user.
      
      Other network drivers store the ring size update so that ring allocation uses
      the new sizes next time the interface is brought up.  This patch modifies
      vmxnet3 to behave this way as well
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Shreyas Bhatewara <sbhatewara@vmware.com>
      CC: "VMware, Inc." <pv-drivers@vmware.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f00e2b0a
  5. 16 6月, 2014 16 次提交
  6. 15 6月, 2014 3 次提交
    • D
      net: sctp: fix permissions for rto_alpha and rto_beta knobs · b58537a1
      Daniel Borkmann 提交于
      Commit 3fd091e7 ("[SCTP]: Remove multiple levels of msecs
      to jiffies conversions.") has silently changed permissions for
      rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of
      this was to discourage users from tweaking rto_alpha and
      rto_beta knobs in production environments since they are key
      to correctly compute rtt/srtt.
      
      RFC4960 under section 6.3.1. RTO Calculation says regarding
      rto_alpha and rto_beta under rule C3 and C4:
      
        [...]
        C3)  When a new RTT measurement R' is made, set
      
             RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'|
      
             and
      
             SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R'
      
             Note: The value of SRTT used in the update to RTTVAR
             is its value before updating SRTT itself using the
             second assignment. After the computation, update
             RTO <- SRTT + 4 * RTTVAR.
      
        C4)  When data is in flight and when allowed by rule C5
             below, a new RTT measurement MUST be made each round
             trip. Furthermore, new RTT measurements SHOULD be
             made no more than once per round trip for a given
             destination transport address. There are two reasons
             for this recommendation: First, it appears that
             measuring more frequently often does not in practice
             yield any significant benefit [ALLMAN99]; second,
             if measurements are made more often, then the values
             of RTO.Alpha and RTO.Beta in rule C3 above should be
             adjusted so that SRTT and RTTVAR still adjust to
             changes at roughly the same rate (in terms of how many
             round trips it takes them to reflect new values) as
             they would if making only one measurement per
             round-trip and using RTO.Alpha and RTO.Beta as given
             in rule C3. However, the exact nature of these
             adjustments remains a research issue.
        [...]
      
      While it is discouraged to adjust rto_alpha and rto_beta
      and not further specified how to adjust them, the RFC also
      doesn't explicitly forbid it, but rather gives a RECOMMENDED
      default value (rto_alpha=3, rto_beta=2). We have a couple
      of users relying on the old permissions before they got
      changed. That said, if someone really has the urge to adjust
      them, we could allow it with a warning in the log.
      
      Fixes: 3fd091e7 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b58537a1
    • D
      Merge branch 'csum_fixes' · e4f7ae93
      David S. Miller 提交于
      Tom Herbert says:
      
      ====================
      Fixes related to some recent checksum modifications.
      
      - Fix GSO constants to match NETIF flags
      - Fix logic in saving checksum complete in __skb_checksum_complete
      - Call __skb_checksum_complete from UDP if we are checksumming over
        whole packet in order to save checksum.
      - Fixes to VXLAN to work correctly with checksum complete
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4f7ae93
    • T
      vxlan: Checksum fixes · f79b064c
      Tom Herbert 提交于
      Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet
      header to work properly with checksum complete.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f79b064c