1. 13 Aug, 2015 5 commits
    • gianfar: remove faulty filer optimizer · 1f2b7293
      Jakub Kicinski committed
      Current filer rule optimization is broken in several ways:
       (1) It can perform reads/writes beyond the end of allocated tables
           (gianfar_ethtool.c:1326).
      
       (2) It breaks badly for rules with more than 2 specifiers
           (e.g. matching ip, port, tos).
      
      Example:
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1 tos 1 action 1
      Added rule with ID 254
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2 tos 2 action 9
      Added rule with ID 253
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3 tos 3 action 17
      Added rule with ID 252
      # ./filer_decode /sys/kernel/debug/gfar1/filer_raw
      00: MASK == 00000210 AND         Q:00           ctrl:00000080 prop:00000210
      01: FPR  == 00000210 AND CLE     Q:00           ctrl:00000281 prop:00000210
      02: MASK == ffffffff AND         Q:00           ctrl:00000080 prop:ffffffff
      03: DPT  == 00000003 AND         Q:00           ctrl:0000008e prop:00000003
      04: TOS  == 00000003 AND         Q:00           ctrl:0000008a prop:00000003
      05: DIA  == 0a000003 AND         Q:11           ctrl:0000448c prop:0a000003
      06: DPT  == 00000002 AND         Q:00           ctrl:0000008e prop:00000002
      07: TOS  == 00000002 AND         Q:00           ctrl:0000008a prop:00000002
      08: DIA  == 0a000002 AND         Q:09           ctrl:0000248c prop:0a000002
      09: DIA  == 0a000001 AND         Q:00           ctrl:0000008c prop:0a000001
      0a: DPT  == 00000001 AND         Q:00           ctrl:0000008e prop:00000001
      0b: TOS  == 00000001     CLE     Q:01           ctrl:0000060a prop:00000001
      ff: MASK >= 00000000             Q:00           ctrl:00000020 prop:00000000
      
      (Entire cluster gets AND-ed together).
      
       (3) We observed that the masking rules it generates do not
           play well with clustering on P2020.  Only the first rule
           of the cluster would ever fire.  Given that the optimizer
           relies heavily on masking, this is very hard to fix.
      
      Example:
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.1 dst-port 1  action 1
      Added rule with ID 254
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.2 dst-port 2  action 9
      Added rule with ID 253
      # ethtool -N eth2 flow-type udp4 dst-ip 10.0.0.3 dst-port 3  action 17
      Added rule with ID 252
      # ./filer_decode /sys/kernel/debug/gfar1/filer_raw
      00: MASK == 00000210 AND         Q:00           ctrl:00000080 prop:00000210
      01: FPR  == 00000210 AND CLE     Q:00           ctrl:00000281 prop:00000210
      02: MASK == ffffffff AND         Q:00           ctrl:00000080 prop:ffffffff
      03: DPT  == 00000003 AND         Q:00           ctrl:0000008e prop:00000003
      04: DIA  == 0a000003             Q:11           ctrl:0000440c prop:0a000003
      05: DPT  == 00000002 AND         Q:00           ctrl:0000008e prop:00000002
      06: DIA  == 0a000002             Q:09           ctrl:0000240c prop:0a000002
      07: DIA  == 0a000001 AND         Q:00           ctrl:0000008c prop:0a000001
      08: DPT  == 00000001     CLE     Q:01           ctrl:0000060e prop:00000001
      ff: MASK >= 00000000             Q:00           ctrl:00000020 prop:00000000
      
      This looks correct according to the spec, but only the first rule
      in the table (eth id 252, the last one added, for 10.0.0.3) will
      ever trigger.  It is as if the filer did not treat the AND CLE as
      a cluster start but kept AND-ing the rules.  We found no errata
      covering this.
      
      The fact that nobody noticed (2) or (3) makes me think
      that this feature is not very widely used and we should just
      remove it.
      Reported-by: Aleksander Dutkowski <adutkowski@gmail.com>
      Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
      Acked-by: Claudiu Manoil <claudiu.manoil@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: correct list membership accounting · b5c8c890
      Jakub Kicinski committed
      At a cost of one line let's make sure .count is correct
      when calling gfar_process_filer_changes().
      Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gianfar: correct filer table writing · a898fe04
      Jakub Kicinski committed
      MAX_FILER_IDX is the last usable index.  Using less-than
      will already guarantee that one entry for catch-all rule
      will be left, no need to subtract 1 here.
      Signed-off-by: Jakub Kicinski <kubakici@wp.pl>
      Signed-off-by: David S. Miller <davem@davemloft.net>
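The bound described in this commit message can be sketched in plain C. This is only an illustration under stated assumptions: `struct filer_table` and `write_rules()` are hypothetical stand-ins, not the driver's code, and `MAX_FILER_IDX` is assumed to be 0xFF because the catch-all rule appears at index `ff` in the dumps above.

```c
#include <stddef.h>

/* Assumed value: index of the last usable filer entry. */
#define MAX_FILER_IDX 0xFFu

/* Hypothetical stand-in for the hardware filer table. */
struct filer_table {
    unsigned int ctrl[MAX_FILER_IDX + 1];
    unsigned int prop[MAX_FILER_IDX + 1];
};

/*
 * Copy up to `count` rules into the table. Because the loop uses a
 * strict less-than against MAX_FILER_IDX, the highest index written is
 * MAX_FILER_IDX - 1, so the last slot stays free for the catch-all
 * rule without subtracting 1 from the bound.
 * Returns the number of entries written.
 */
static size_t write_rules(struct filer_table *hw,
                          const unsigned int *ctrl,
                          const unsigned int *prop,
                          size_t count)
{
    size_t i;

    for (i = 0; i < count && i < MAX_FILER_IDX; i++) {
        hw->ctrl[i] = ctrl[i];
        hw->prop[i] = prop[i];
    }
    return i;
}
```

With 300 input rules the sketch writes indices 0 through 0xFE and leaves index 0xFF untouched.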
    • bonding: Gratuitous ARP gets dropped when first slave added · b02e3e94
      Venkat Venkatsubra committed
      When the first slave is added (such as during bootup) the first
      gratuitous ARP gets dropped. We don't see this drop during a failover.
      The packet gets dropped in qdisc (noop_enqueue).
      
      The fix is to delay the sending of gratuitous ARPs till the bond dev's
      carrier is present.
      
      It can also be worked around by setting num_grat_arp to more than 1.
      Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Do not override PHY interface if already configured · 211c504a
      Florian Fainelli committed
      In case we need to divert reads/writes using the slave MII bus, we may have
      already fetched a valid PHY interface property from Device Tree, and that
      mode is used by the PHY driver to make configuration decisions.
      
      If we could not fetch the "phy-mode" property, we assign
      PHY_INTERFACE_MODE_NA to p->phy_interface, so that we can check for that
      value when deciding whether or not to override the interface.
      
      Fixes: 19334920 ("net: dsa: Set valid phy interface type")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
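A minimal sketch of the sentinel check this message describes, assuming a local stand-in enum rather than the kernel's real `phy_interface_t`. The names `port_sketch`, `record_dt_mode()` and `maybe_override()` are hypothetical and exist only for illustration.

```c
/* Local stand-in for the kernel's phy_interface_t; only a few values. */
typedef enum {
    PHY_INTERFACE_MODE_NA,
    PHY_INTERFACE_MODE_MII,
    PHY_INTERFACE_MODE_RGMII
} phy_interface_t;

/* Hypothetical per-port state holding the configured interface mode. */
struct port_sketch {
    phy_interface_t phy_interface;
};

/*
 * Record the mode parsed from the "phy-mode" property. A negative
 * dt_mode models "property could not be fetched" and maps to the
 * PHY_INTERFACE_MODE_NA sentinel.
 */
static void record_dt_mode(struct port_sketch *p, int dt_mode)
{
    p->phy_interface = dt_mode < 0 ? PHY_INTERFACE_MODE_NA
                                   : (phy_interface_t)dt_mode;
}

/* Override only when nothing valid was configured beforehand. */
static void maybe_override(struct port_sketch *p, phy_interface_t detected)
{
    if (p->phy_interface == PHY_INTERFACE_MODE_NA)
        p->phy_interface = detected;
}
```

A port whose mode came from Device Tree keeps it; a port left at the sentinel takes the detected value.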
  2. 12 Aug, 2015 3 commits
  3. 11 Aug, 2015 16 commits
    • inet: fix possible request socket leak · 3257d8b1
      Eric Dumazet committed
      In commit b357a364 ("inet: fix possible panic in
      reqsk_queue_unlink()"), I missed the fact that tcp_check_req()
      can return the listener socket in one case, and that we must
      release the request socket refcount or we leak it.
      
      Tested:
      
       The following packetdrill test template shows the issue:
      
      0     socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0    bind(3, ..., ...) = 0
      +0    listen(3, 1) = 0
      
      +0    < S 0:0(0) win 2920 <mss 1460,sackOK,nop,nop>
      +0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
      +.002 < . 1:1(0) ack 21 win 2920
      +0    > R 21:21(0)
      
      Fixes: b357a364 ("inet: fix possible panic in reqsk_queue_unlink()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: fix races with reqsk timers · 2235f2ac
      Eric Dumazet committed
      reqsk_queue_destroy() and reqsk_queue_unlink() should use
      del_timer_sync() instead of del_timer() before calling reqsk_put(),
      otherwise we could free a req still used by another cpu.
      
      But before doing so, reqsk_queue_destroy() must release the
      syn_wait_lock spinlock or risk a deadlock, as reqsk_timer_handler()
      might need to take this same spinlock from reqsk_queue_unlink()
      (called from inet_csk_reqsk_queue_drop()).
      
      Fixes: fa76ce73 ("inet: get rid of central tcp/dccp listener timer")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mkiss: Fix error handling in mkiss_open() · 9d332d92
      Fabio Estevam committed
      If register_netdev() fails we are not propagating the error and
      we return success because ax_open() succeeded previously.
      
      Fix this by checking the return values of ax_open() and
      register_netdev() and propagating the error in case of failure.
      Reported-by: RUC_Soft_Sec <zy900702@163.com>
      Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
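The error-propagation pattern described here can be sketched as follows. This is not the mkiss code: `ax_open_stub()`, `register_netdev_stub()` and `ax_close_stub()` are hypothetical stubs whose results are driven by test variables, and the cleanup call on the failure path is an assumption made for the sketch.

```c
#include <errno.h>

/* Test knobs standing in for the real calls' outcomes (hypothetical). */
static int open_result;
static int register_result;
static int close_calls;

static int ax_open_stub(void) { return open_result; }
static int register_netdev_stub(void) { return register_result; }
static void ax_close_stub(void) { close_calls++; }

/*
 * Check every step and return the first error instead of reporting
 * success just because an earlier step succeeded.
 */
static int open_sketch(void)
{
    int err;

    err = ax_open_stub();
    if (err)
        return err;

    err = register_netdev_stub();
    if (err) {
        ax_close_stub();    /* assumed cleanup of the earlier step */
        return err;
    }

    return 0;
}
```

The buggy shape would ignore the second return value, so a failing registration step would still yield 0 to the caller.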
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 18255457
      David S. Miller committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contains five Netfilter fixes for your net tree,
      they are:
      
      1) Silence a warning on falling back to vmalloc(). Since 88eab472, we can
         easily hit this warning message, which confuses users. So let's get rid
         of it.
      
      2) Recently, when porting the template object allocation on top of kmalloc
         to fix the netns dependencies between x_tables and conntrack, the error
         checks were left unchanged. Remove IS_ERR() and check for NULL instead.
         Patch from Dan Carpenter.
      
      3) Don't ignore gfp_flags in the new nf_ct_tmpl_alloc() function, from
         Joe Stringer.
      
      4) Fix a crash due to NULL pointer dereference in ip6t_SYNPROXY, patch from
         Phil Sutter.
      
      5) The sequence number of the Syn+ack that is sent from SYNPROXY to clients is
         not adjusted through our NAT infrastructure, as a result the client may
         ignore this TCP packet and TCP flow hangs until the client probes us.  Also
         from Phil Sutter.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bnx2x-fixes' · 875a74b6
      David S. Miller committed
      Yuval Mintz says:
      
      ====================
      bnx2x: small fixes
      
      This adds 2 small fixes, one to error flows during memory release
      and the other to flash writes via ethtool API.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bnx2x: Free NVRAM lock at end of each page · 0ea853df
      Yuval Mintz committed
      Writing each 4Kb page into flash might take up to ~100 milliseconds,
      during which time management firmware cannot access the nvram for its
      own uses.
      
      Firmware upgrade utilities use the ethtool API to burn new flash images
      for the device, writing several pages' worth of data on each command.
      Such action might create problems for the management firmware, as the
      nvram might not be accessible for a long time.
      
      This patch changes the write implementation, releasing the nvram lock on
      the completion of each page, allowing the management firmware time to
      claim it and perform its own required actions.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
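A rough sketch of the per-page locking idea, under assumptions: the lock, the page writer and the 4096-byte page size (read from the "4Kb page" wording above) are hypothetical stand-ins, not the bnx2x implementation.

```c
#include <stddef.h>

#define PAGE_BYTES 4096u    /* assumed page size for this sketch */

/* Hypothetical lock state; lock_cycles counts acquire/release pairs. */
static int lock_held;
static int lock_cycles;

static void nvram_lock(void) { lock_held = 1; lock_cycles++; }
static void nvram_unlock(void) { lock_held = 0; }

/* Stand-in for the actual flash write of one chunk within a page. */
static void write_page(size_t offset, size_t len)
{
    (void)offset;
    (void)len;
}

/*
 * Write `len` bytes starting at `offset`, taking the lock once per
 * page-sized chunk and releasing it at each page boundary, so another
 * user of the NVRAM gets a chance to claim it between pages.
 */
static void nvram_write_sketch(size_t offset, size_t len)
{
    while (len > 0) {
        size_t room = PAGE_BYTES - (offset % PAGE_BYTES);
        size_t chunk = len < room ? len : room;

        nvram_lock();
        write_page(offset, chunk);
        nvram_unlock();

        offset += chunk;
        len -= chunk;
    }
}
```

Holding the lock across the whole loop would be the behaviour the commit moves away from; here a three-page write goes through three lock cycles.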
    • bnx2x: Prevent null pointer dereference on SKB release · e1615903
      Yuval Mintz committed
      On error flows it's possible to free an SKB even if it was not allocated.
      Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • cxgb4: missing curly braces in t4_setup_debugfs() · 21a44763
      Dan Carpenter committed
      There were missing curly braces so it means we call add_debugfs_mem()
      unintentionally.
      
      Fixes: 3ccc6cf7 ('cxgb4: Adds support for T6 adapter')
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
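The class of bug named in this commit can be illustrated with a small sketch. The function and variable names below are hypothetical and do not reproduce t4_setup_debugfs(); the buggy shape is shown only in a comment.

```c
/* Counts how many memory entries were registered (hypothetical). */
static int mem_entries;

static void add_mem_entry(const char *name)
{
    (void)name;
    mem_entries++;
}

/*
 * Buggy shape (shown as a comment, not compiled):
 *
 *     if (has_region)
 *         size = 1;
 *         add_mem_entry("region");    <- runs even when has_region == 0
 *
 * Without braces only the first statement is governed by the `if`;
 * the indentation of the second one is misleading.
 */
static void setup_sketch(int has_region)
{
    if (has_region) {
        /* With braces, both statements depend on the condition. */
        unsigned int size = 1;

        (void)size;
        add_mem_entry("region");
    }
}
```

Compilers such as GCC can flag the buggy shape with `-Wmisleading-indentation`.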
    • net-timestamp: Update skb_complete_tx_timestamp comment · 7a76a021
      Benjamin Poirier committed
      After "62bccb8c net-timestamp: Make the clone operation stand-alone from phy
      timestamping" the hwtstamps parameter of skb_complete_tx_timestamp() may no
      longer be NULL.
      Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
      Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: don't reject link-local nexthop on other interface · 330567b7
      Florian Westphal committed
      48ed7b26 ("ipv6: reject locally assigned nexthop addresses") is too
      strict; it rejects the following corner case:
      
      ip -6 route add default via fe80::1:2:3 dev eth1
      
      [ where fe80::1:2:3 is assigned to a local interface, but not eth1 ]
      
      Fix this by restricting the search to the given device if nh is link-local.
      
      Joint work with Hannes Frederic Sowa.
      
      Fixes: 48ed7b26 ("ipv6: reject locally assigned nexthop addresses")
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: make sure -EBUSY won't escape from netlink_insert · 4e7c1330
      Daniel Borkmann committed
      Linus reports the following deadlock on rtnl_mutex; triggered only
      once so far (extract):
      
      [12236.694209] NetworkManager  D 0000000000013b80     0  1047      1 0x00000000
      [12236.694218]  ffff88003f902640 0000000000000000 ffffffff815d15a9 0000000000000018
      [12236.694224]  ffff880119538000 ffff88003f902640 ffffffff81a8ff84 00000000ffffffff
      [12236.694230]  ffffffff81a8ff88 ffff880119c47f00 ffffffff815d133a ffffffff81a8ff80
      [12236.694235] Call Trace:
      [12236.694250]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
      [12236.694257]  [<ffffffff815d133a>] ? schedule+0x2a/0x70
      [12236.694263]  [<ffffffff815d15a9>] ? schedule_preempt_disabled+0x9/0x10
      [12236.694271]  [<ffffffff815d2c3f>] ? __mutex_lock_slowpath+0x7f/0xf0
      [12236.694280]  [<ffffffff815d2cc6>] ? mutex_lock+0x16/0x30
      [12236.694291]  [<ffffffff814f1f90>] ? rtnetlink_rcv+0x10/0x30
      [12236.694299]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
      [12236.694309]  [<ffffffff814f5ad3>] ? rtnl_getlink+0x113/0x190
      [12236.694319]  [<ffffffff814f202a>] ? rtnetlink_rcv_msg+0x7a/0x210
      [12236.694331]  [<ffffffff8124565c>] ? sock_has_perm+0x5c/0x70
      [12236.694339]  [<ffffffff814f1fb0>] ? rtnetlink_rcv+0x30/0x30
      [12236.694346]  [<ffffffff8150d62c>] ? netlink_rcv_skb+0x9c/0xc0
      [12236.694354]  [<ffffffff814f1f9f>] ? rtnetlink_rcv+0x1f/0x30
      [12236.694360]  [<ffffffff8150ce3b>] ? netlink_unicast+0xfb/0x180
      [12236.694367]  [<ffffffff8150d344>] ? netlink_sendmsg+0x484/0x5d0
      [12236.694376]  [<ffffffff810a236f>] ? __wake_up+0x2f/0x50
      [12236.694387]  [<ffffffff814cad23>] ? sock_sendmsg+0x33/0x40
      [12236.694396]  [<ffffffff814cb05e>] ? ___sys_sendmsg+0x22e/0x240
      [12236.694405]  [<ffffffff814cab75>] ? ___sys_recvmsg+0x135/0x1a0
      [12236.694415]  [<ffffffff811a9d12>] ? eventfd_write+0x82/0x210
      [12236.694423]  [<ffffffff811a0f9e>] ? fsnotify+0x32e/0x4c0
      [12236.694429]  [<ffffffff8108cb70>] ? wake_up_q+0x60/0x60
      [12236.694434]  [<ffffffff814cba09>] ? __sys_sendmsg+0x39/0x70
      [12236.694440]  [<ffffffff815d4797>] ? entry_SYSCALL_64_fastpath+0x12/0x6a
      
      So far, the recursive call into rtnetlink_rcv() looks like the most
      plausible suspect. One way this could trigger is if the sender's
      NETLINK_CB(skb).portid was wrongly 0 (which is the rtnetlink socket), so
      that the answer to the rtnl_getlink() request would be sent to the kernel
      instead of the actual user process, thus grabbing rtnl_mutex twice.
      
      One theory is that netlink_autobind(), triggered via netlink_sendmsg(),
      internally overwrites the -EBUSY error to 0, while that error wrongly
      originated from __netlink_insert(). That would reset the
      socket's portid to 0, which is then filled into NETLINK_CB(skb).portid
      later on. As commit d470e3b4 ("[NETLINK]: Fix two socket hashing bugs.")
      also puts it, -EBUSY should not be propagated from netlink_insert().
      
      It looks like it's very unlikely to reproduce. We need to trigger the
      rhashtable_insert_rehash() handler under a situation where rehashing
      currently occurs (one /rare/ way would be to hit ht->elasticity limits
      while not filled enough to expand the hashtable, but that would rather
      require a specifically crafted bind() sequence with knowledge about
      destination slots, seems unlikely). It probably makes sense to guard
      __netlink_insert() in any case and remap that error. It was suggested
      that EOVERFLOW might be better than an already overloaded ENOMEM.
      
      Reference: http://thread.gmane.org/gmane.linux.network/372676
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
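The guard this message argues for can be sketched as a simple error remap. This is an illustration, not the netlink code: `table_insert_stub()` is a hypothetical stand-in for the underlying insert, and the choice of EOVERFLOW follows the suggestion quoted in the message.

```c
#include <errno.h>

/* Test knob standing in for the underlying insert's result. */
static int insert_result;

static int table_insert_stub(void) { return insert_result; }

/*
 * Call the underlying insert and make sure -EBUSY never escapes,
 * because a caller further up gives -EBUSY its own meaning and
 * turns it into success. Other errors pass through unchanged.
 */
static int guarded_insert(void)
{
    int err = table_insert_stub();

    if (err == -EBUSY)
        err = -EOVERFLOW;

    return err;
}
```

The remap keeps the two meanings of -EBUSY apart, so a transient failure in the hash table can no longer be mistaken for the case the caller deliberately treats as success.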
    • bna: fix interrupts storm caused by erroneous packets · ade4dc3e
      Ivan Vecera committed
      The commit "e29aa339 bna: Enable Multi Buffer RX" moved the packet counter
      increment from the beginning of the NAPI processing loop to after the
      check for erroneous packets, so such packets are never accounted. This
      counter is used to inform the firmware about the number of processed
      completions (packets). As these packets are never acked, the firmware
      fires IRQs for them again and again.
      
      Fixes: e29aa339 ("bna: Enable Multi Buffer RX")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Acked-by: Rasesh Mody <rasesh.mody@qlogic.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
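The ordering issue described above can be sketched with a simplified polling loop. The `struct completion` type and `poll_sketch()` are hypothetical and far simpler than the bna RX path; they only show where the counter increment sits relative to the error check.

```c
/* Hypothetical completion entry: non-zero `error` marks a bad packet. */
struct completion {
    int error;
};

/*
 * Process up to `budget` completions. The counter is incremented
 * before the error check, so erroneous packets are still included in
 * the number later acknowledged to the firmware. `delivered` receives
 * the number of good packets handed up the stack.
 */
static int poll_sketch(const struct completion *ring, int n, int budget,
                       int *delivered)
{
    int packets = 0;
    int i;

    *delivered = 0;
    for (i = 0; i < n && packets < budget; i++) {
        packets++;              /* account first ... */

        if (ring[i].error)      /* ... then skip bad packets */
            continue;

        (*delivered)++;
    }

    return packets;             /* value to acknowledge */
}
```

If the increment sat after the `continue`, bad completions would never be acknowledged, which matches the repeated-IRQ symptom in the message.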
    • Merge branch 'mvpp2-fixes' · ea708584
      David S. Miller committed
      Marcin Wojtas says:
      
      ====================
      Fixes for the network driver of Marvell Armada 375 SoC
      
      This is a set of three patches that fix long-standing problems introduced
      with the initial support for the Armada 375 network controller.
      
      Due to an inappropriate approach to per-CPU processing of sent packets on
      the TX path, the driver had numerous problems, such as RCU stalls. Those
      have been fixed; details can be found in the commit logs. The patches were
      intensively tested on top of v4.2-rc5.
      
      I'm looking forward to any comments or remarks.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: replace TX coalescing interrupts with hrtimer · edc660fa
      Marcin Wojtas committed
      The PP2 controller is capable of per-CPU TX processing, which means there are
      per-CPU banked register sets and queues. The current version of the driver
      supports TX packet coalescing: once the number of packets sent on a given CPU
      reaches a threshold value, an IRQ occurs. However, there is a single interrupt
      line responsible for CPU0/1 TX and RX events (the latter is not per-CPU, as
      the hardware does not support RSS).
      
      When the top half executes, the interrupt cause is not known. This is why, in
      the NAPI poll function, along with RX processing, the IRQ cause register on
      both CPUs is accessed in order to determine on which of them the TX coalescing
      threshold might have been reached. Thus the egress processing and buffer
      release can take place on the corresponding CPU. This approach led to an
      illegal use of the on_each_cpu function in softirq context.
      
      The problem is solved by abandoning TX coalescing interrupts and separating
      egress finalization from NAPI processing. For that purpose an hrtimer-based
      method is introduced. In the main transmit function (mvpp2_tx), buffers are
      released once a software coalescing threshold is reached. If not all the data
      is processed, a timer is set on this CPU; in its interrupt context a tasklet
      is scheduled in which all queues are processed. Only one timer per CPU can be
      running at a time, which is controlled by a dedicated flag.
      
      This commit removes TX processing from NAPI polling function, disables hardware
      coalescing and enables hrtimer with tasklet, using new per-CPU port structure
      (mvpp2_port_pcpu).
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: enable proper per-CPU TX buffers unmapping · 71ce391d
      Marcin Wojtas committed
      The mvpp2 driver allows per-CPU TX processing. Once the packets are
      prepared independently on each CPU, the hardware enqueues the descriptors in
      a common TX queue. After they are sent, the buffers and associated sk_buffs
      should be released on the corresponding CPU.
      
      This is why a special index is maintained to point to the right data to be
      released after transmission takes place. Each per-CPU TX queue comprises an
      array of sent sk_buffs, freed in the mvpp2_txq_bufs_free function. However,
      the index was also used there for obtaining a descriptor (and therefore a
      buffer to be DMA-unmapped) from the common TX queue, which was wrong, because
      it was not referring to the current CPU.
      
      This commit enables proper unmapping of sent data buffers by indexing them in
      per-CPU queues using a dedicated array for keeping their physical addresses.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mvpp2: remove excessive spinlocks from driver initialization · d53793c5
      Marcin Wojtas committed
      Using spinlock protection during one-time driver initialization is not
      necessary. Moreover, it resulted in an invalid GFP_KERNEL allocation under
      the lock.
      
      This commit removes redundant spinlocks from buffer manager part of mvpp2
      initialization.
      Signed-off-by: Marcin Wojtas <mw@semihalf.com>
      Reported-by: Alexandre Fournier <alexandre.fournier@wisp-e.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 10 Aug, 2015 2 commits
  5. 08 Aug, 2015 6 commits
  6. 07 Aug, 2015 8 commits