1. 03 3月, 2015 4 次提交
  2. 02 3月, 2015 15 次提交
  3. 01 3月, 2015 10 次提交
    • E
      net: do not use rcu in rtnl_dump_ifinfo() · cac5e65e
      Eric Dumazet 提交于
      We did a failed attempt in the past to only use rcu in rtnl dump
      operations (commit e67f88dd "net: dont hold rtnl mutex during
      netlink dump callbacks")
      
      Now that dumps are holding RTNL anyway, there is no need to also
      use rcu locking, as it forbids any scheduling ability, like
      GFP_KERNEL allocations that controlling path should use instead
      of GFP_ATOMIC whenever possible.
      
      This should fix following splat Cong Wang reported :
      
       [ INFO: suspicious RCU usage. ]
       3.19.0+ #805 Tainted: G        W
      
       include/linux/rcupdate.h:538 Illegal context switch in RCU read-side critical section!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 1, debug_locks = 0
       2 locks held by ip/771:
        #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff8182b8f4>] netlink_dump+0x21/0x26c
        #1:  (rcu_read_lock){......}, at: [<ffffffff817d785b>] rcu_read_lock+0x0/0x6e
      
       stack backtrace:
       CPU: 3 PID: 771 Comm: ip Tainted: G        W       3.19.0+ #805
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000001 ffff8800d51e7718 ffffffff81a27457 0000000029e729e6
        ffff8800d6108000 ffff8800d51e7748 ffffffff810b539b ffffffff820013dd
        00000000000001c8 0000000000000000 ffff8800d7448088 ffff8800d51e7758
       Call Trace:
        [<ffffffff81a27457>] dump_stack+0x4c/0x65
        [<ffffffff810b539b>] lockdep_rcu_suspicious+0x107/0x110
        [<ffffffff8109796f>] rcu_preempt_sleep_check+0x45/0x47
        [<ffffffff8109e457>] ___might_sleep+0x1d/0x1cb
        [<ffffffff8109e67d>] __might_sleep+0x78/0x80
        [<ffffffff814b9b1f>] idr_alloc+0x45/0xd1
        [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff814b9f9d>] ? idr_for_each+0x53/0x101
        [<ffffffff817c1383>] alloc_netid+0x61/0x69
        [<ffffffff817c14c3>] __peernet2id+0x79/0x8d
        [<ffffffff817c1ab7>] peernet2id+0x13/0x1f
        [<ffffffff817d8673>] rtnl_fill_ifinfo+0xa8d/0xc20
        [<ffffffff810b17d9>] ? __lock_is_held+0x39/0x52
        [<ffffffff817d894f>] rtnl_dump_ifinfo+0x149/0x213
        [<ffffffff8182b9c2>] netlink_dump+0xef/0x26c
        [<ffffffff8182bcba>] netlink_recvmsg+0x17b/0x2c5
        [<ffffffff817b0adc>] __sock_recvmsg+0x4e/0x59
        [<ffffffff817b1b40>] sock_recvmsg+0x3f/0x51
        [<ffffffff817b1f9a>] ___sys_recvmsg+0xf6/0x1d9
        [<ffffffff8115dc67>] ? handle_pte_fault+0x6e1/0xd3d
        [<ffffffff8100a3a0>] ? native_sched_clock+0x35/0x37
        [<ffffffff8109f45b>] ? sched_clock_local+0x12/0x72
        [<ffffffff8109f6ac>] ? sched_clock_cpu+0x9e/0xb7
        [<ffffffff810cb7ab>] ? rcu_read_lock_held+0x3b/0x3d
        [<ffffffff811abde8>] ? __fcheck_files+0x4c/0x58
        [<ffffffff811ac556>] ? __fget_light+0x2d/0x52
        [<ffffffff817b376f>] __sys_recvmsg+0x42/0x60
        [<ffffffff817b379f>] SyS_recvmsg+0x12/0x1c
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: 0c7aecd4 ("netns: add rtnl cmd to add and get peer netns ids")
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Reported-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cac5e65e
    • G
      sh_eth: Fix lost MAC address on kexec · a14c7d15
      Geert Uytterhoeven 提交于
      Commit 740c7f31 ("sh_eth: Ensure DMA engines are stopped before
      freeing buffers") added a call to sh_eth_reset() to the
      sh_eth_set_ringparam() and sh_eth_close() paths.
      
      However, setting the software reset bit(s) in the EDMR register resets
      the MAC Address Registers to zero. Hence after kexec, the new kernel
      doesn't detect a valid MAC address and assigns a random MAC address,
      breaking DHCP.
      
      Set the MAC address again after the reset in sh_eth_dev_exit() to fix
      this.
      
      Tested on r8a7740/armadillo (GETHER) and r8a7791/koelsch (FAST_RCAR).
      
      Fixes: 740c7f31 ("sh_eth: Ensure DMA engines are stopped before freeing buffers")
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a14c7d15
    • J
      net: bcmgenet: fix throughtput regression · 4092e6ac
      Jaedon Shin 提交于
      This patch adds bcmgenet_tx_poll for the tx_rings. This can reduce the
      interrupt load and send xmit in network stack on time. This also
      separated for the completion of tx_ring16 from bcmgenet_poll.
      
      The bcmgenet_tx_reclaim of tx_ring[{0,1,2,3}] operative by an interrupt
      is to be not more than a certain number TxBDs. It is caused by too
      slowly reclaiming the transmitted skb. Therefore, performance
      degradation of xmit after 605ad7f1 ("tcp: refine TSO autosizing").
      Signed-off-by: NJaedon Shin <jaedon.shin@gmail.com>
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4092e6ac
    • E
      macvtap: make sure neighbour code can push ethernet header · 2f1d8b9e
      Eric Dumazet 提交于
      Brian reported crashes using IPv6 traffic with macvtap/veth combo.
      
      I tracked the crashes in neigh_hh_output()
      
      -> memcpy(skb->data - HH_DATA_MOD, hh->hh_data, HH_DATA_MOD);
      
      Neighbour code assumes headroom to push Ethernet header is
      at least 16 bytes.
      
      It appears macvtap has only 14 bytes available on arches
      where NET_IP_ALIGN is 0 (like x86)
      
      Effect is a corruption of 2 bytes right before skb->head,
      and possible crashes if accessing non existing memory.
      
      This fix should also increase IPv4 performance, as paranoid code
      in ip_finish_output2() wont have to call skb_realloc_headroom()
      Reported-by: NBrian Rak <brak@vultr.com>
      Tested-by: NBrian Rak <brak@vultr.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f1d8b9e
    • D
      Merge tag 'mac80211-for-davem-2015-02-27' of... · 32034e05
      David S. Miller 提交于
      Merge tag 'mac80211-for-davem-2015-02-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      A few patches have accumulated, among them the fix for Linus's
      four-way-handshake problem. The others are various small fixes
      for problems all over, nothing really stands out.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32034e05
    • E
      net: Verify permission to link_net in newlink · 06615bed
      Eric W. Biederman 提交于
      When applicable verify that the caller has permisson to the underlying
      network namespace for a newly created network device.
      
      Similary checks exist for the network namespace a network device will
      be created in.
      
      Fixes: 317f4810 ("rtnl: allow to create device with IFLA_LINK_NETNSID set")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06615bed
    • E
      net: Verify permission to dest_net in newlink · 505ce415
      Eric W. Biederman 提交于
      When applicable verify that the caller has permision to create a
      network device in another network namespace.  This check is already
      present when moving a network device between network namespaces in
      setlink so all that is needed is to duplicate that check in newlink.
      
      This change almost backports cleanly, but there are context conflicts
      as the code that follows was added in v4.0-rc1
      
      Fixes: b51642f6 net: Enable a userns root rtnl calls that are safe for unprivilged users
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      505ce415
    • G
      drivers: net: cpsw: Set SECURE for dual_emac ucast · 56887149
      George McCollister 提交于
      Prior to this patch, sending a packet with the source MAC address of one
      of the CPSW interfaces to one of the CPSW slave ports while it's configured in
      dual_emac mode would update the port_num field of the VLAN/Unicast Address
      Table Entry. This would cause it to discard all incoming traffic addressed to
      that MAC address, essentially rendering the port useless until the ALE table is
      cleared (by starting and stopping the interface or rebooting.)
      
      For example, if eth0 has a MAC address of 90:59:af:8f:43:e9 it will have
      an ALE table entry:
      
      00 00 00 00 59 90 02 30 e9 43 8f af
      (VLAN Addr vlan_id=2 unicast type=0 port_num=0 addr=90:59:af:8f:43:e9)
      
      If you configure another device with the same MAC address and connect it
      to the first CPSW slave port and send some traffic the ALE table entry
      becomes:
      
      04 00 00 00 59 90 02 30 e9 43 8f af
      (VLAN Addr vlan_id=2 unicast type=0 port_num=1 addr=90:59:af:8f:43:e9)
      
      >From this point forward all incoming traffic addressed to
      90:59:af:8f:43:e9 will be dropped.
      
      Setting the SECURE bit for the VLAN/Unicast address table entry for each
      interface's MAC address corrects the problem.
      Signed-off-by: NGeorge McCollister <george.mccollister@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56887149
    • D
      niu: fix error handling in niu_class_to_ethflow() · f55ea3d9
      Dan Carpenter 提交于
      There is a discrepancy here because the niu_class_to_ethflow() returns
      zero on failure and one on success but the caller expected zero on
      success and negative on failure.
      
      The problem means that we allow the user to pass classes and flow_types
      which we don't want.  I've looked at it a bit and I don't see it as a
      very serious bug.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f55ea3d9
    • A
      net: smc91x: use run-time configuration on all ARM machines · b70661c7
      Arnd Bergmann 提交于
      The smc91x driver traditionally gets configured at compile-time
      for whichever hardware it runs on. This no longer works on
      ARM as we continue to move to building all-in-one kernels.
      
      Most ARM configurations with this driver already use run-time
      configuration through DT or through platform_data, but a
      few have not been converted yet.
      
      I've checked all ARM boards that use this driver in their
      legacy board files, and converted the ones that were using
      compile-time configuration in smc91x.h to behave like the
      other ones and provide the interrupt polarity along with
      the MMIO configuration (width, stride) at platform device
      creation time.
      
      In particular, these combinations were previously selectable
      in Kconfig but in fact broken:
      
      - sa1100 assabet plus pleb
      - msm combined with any other armv6/v7 platform
      - pxa-idp combined with any non-DMA pxa variant
      - LogicPD PXA270 combined with any other pxa
      - nomadik combined with any other armv4/v5 platform,
        e.g. versatile.
      
      None of these seem critical enough to warrant a backport
      to stable, but it would be nice to clean this up for good.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      ----
      I would like the patch to get merged through netdev, after
      Robert and/or Linus have verified it on at least some hardware.
      
      There are a few other non-ARM platforms using this driver,
      I could do the same patch for those if we want to take
      it further.
      
       arch/arm/mach-msm/board-halibut.c    |   8 ++++-
       arch/arm/mach-msm/board-qsd8x50.c    |   8 ++++-
       arch/arm/mach-pxa/idp.c              |   5 +++
       arch/arm/mach-pxa/lpd270.c           |   8 ++++-
       arch/arm/mach-realview/core.c        |   7 ++++
       arch/arm/mach-realview/realview_eb.c |   2 +-
       arch/arm/mach-sa1100/neponset.c      |   6 ++++
       arch/arm/mach-sa1100/pleb.c          |   7 ++++
       drivers/net/ethernet/smsc/smc91x.c   |   9 +++--
       drivers/net/ethernet/smsc/smc91x.h   | 114 ++----------------------------------------------------------
       10 files changed, 57 insertions(+), 117 deletions(-)
      Tested-by: NRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b70661c7
  4. 28 2月, 2015 11 次提交
    • E
      rhashtable: use cond_resched() · 5beb5c90
      Eric Dumazet 提交于
      If a hash table has 128 slots and 16384 elems, expand to 256 slots
      takes more than one second. For larger sets, a soft lockup is detected.
      
      Holding cpu for that long, even in a work queue is a show stopper
      for non preemptable kernels.
      
      cond_resched() at strategic points to allow process scheduler
      to reschedule us.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5beb5c90
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net · 061c1a6e
      David S. Miller 提交于
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2015-02-26
      
      This series contains fixes for i40e and i40evf only.
      
      Alexey Khoroshilov found a possible leak of 'cmd_buf' when copy_from_user()
      failed in i40e_dbg_command_write(), so resolved by calling kfree().
      
      Shannon provides a fix to ensure the shift and bitwise precedences do not
      work backwards for us by adding parans.  Fixed the driver by preventing
      the driver from allowing stray interrupts or causing system logs from
      un-handled interrupts by combining the ICR0 shutdown with the standard
      interrupt shutdown and add the interrupt clearing to the PCI shutdown
      path.  Fixed an issue where a NVM write times out before a transaction
      can complete, so Shannon added logic to make another attempt by
      reacquiring the semaphore, then retry the write, if the one retry fails,
      we will then give up.  Adds checks to pointers before their use to ensure
      we do not try to dereference NULL pointers when returning values from the
      AdminQ calls.
      
      Akeem adds a check to bail out if the device is already down when checking
      for Tx hang subtask.
      
      Anjali fixes TSO with more than 8 frags per segment issue.  The hardware
      has some limitations which the driver needs to adhere to:
        1) no more than 8 descriptors per packet on the wire
        2) no header can span more than 3 descriptors
      If one of these events happens, the hardware will generate an internal
      error and freeze the Tx queue, so Anjali fixes this by linearizes the skb
      to avoid these situations.  Fixed an issue where the per Traffic Class
      queue count was higher than queues enabled, which will fix a warning
      with multiple function mode where systems regularly have more cores than
      vectors.  Fixed TCP/IPv6 over VXLAN Tx checksum offload, where we were
      checking the outer protocol flags and deciding the flow for the inner
      header.
      
      Jesse fixes a race condition in the transmit hang detection.  Before we
      were having issues of false Tx hang detection, no the driver makes more
      direct with the checks for progress forward by directly checking the head
      write back address and tail register when determining progress.  This
      avoids Tx hangs where the software gets behind, because we are directly
      checking hardware state when determining a hang state.
      
      Neerav fixes the transmit ring Qset handle when DCB reconfigures. The issue
      was when DCB is reconfigured to a single traffic class (TC) and the driver
      did not reset the Tx ring Qset handle to correct the mapping, which caused
      the Tx queue to disable timeouts.  Also as part of DCB reconfiguration flow
      if the Tx queue disable times out, then issue a PF reset to do some level
      of recovery.
      
      Mitch stops flow director on shutdown because, in some cases, the hardware
      would continue to try to access the FDIR ring after entering D3Hot state,
      which would cause either PCIe errors or NMIs, depending upon the system
      configuration.
      
      * NOTE * I have verified that this series of patches for net will not cause
      any merge issues when you sync up your net tree with your net-next tree.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      061c1a6e
    • L
      amd-xgbe: Request IRQs only after driver is fully setup · c30e76a7
      Lendacky, Thomas 提交于
      It is possible that the hardware may not have been properly shutdown
      before this driver gets control, through use by firmware, for example.
      Until the driver is loaded, interrupts associated with the hardware
      could go pending. When the IRQs are requested napi support has not
      been initialized yet, but the ISR will get control and schedule napi
      processing resulting in a kernel panic because the poll routine has not
      been set.
      
      Adjust the code so that the driver is fully ready to handle and process
      interrupts as soon as the IRQs are requested. This involves requesting
      and freeing IRQs during start and stop processing and ordering the napi
      add and delete calls appropriately.
      
      Also adjust the powerup and powerdown routines to match the start and
      stop routines in regards to the ordering of tasks, including napi
      related calls.
      Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c30e76a7
    • L
      net: asix: add support for the Sitecom LN-028 USB adapter · 7488c3e3
      Luca Ceresoli 提交于
      Just another AX88178-based 10/100/1000 USB-to-Ethernet dongle. This one
      shows up in lsusb as: "Sitecom Europe B.V. LN-028 Network USB 2.0 Adapter".
      Signed-off-by: NLuca Ceresoli <luca@lucaceresoli.net>
      Cc: Francois Romieu <romieu@fr.zoreil.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-usb@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7488c3e3
    • D
      Merge branch 'rhashtable' · c0eebfa3
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      rhashtable updates
      
      As discussed, I'm sending out rhashtable fixups for -net.
      
      I have a couple of more patches I was working on last week pending,
      i.e. to get rid of ht->nelems and ht->shift atomic operations which
      speed-up pure insertions/deletions, e.g. on my laptop I have 2 threads,
      inserting 7M entries each, that will reduce insertion time from ~1,450 ms
      to 865 ms (performance should even be better after removing the
      grow/shrink indirections). I guess that however is rather something
      for net-next.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0eebfa3
    • D
      rhashtable: remove indirection for grow/shrink decision functions · 4c4b52d9
      Daniel Borkmann 提交于
      Currently, all real users of rhashtable default their grow and shrink
      decision functions to rht_grow_above_75() and rht_shrink_below_30(),
      so that there's currently no need to have this explicitly selectable.
      
      It can/should be generic and private inside rhashtable until a real
      use case pops up. Since we can make this private, we'll save us this
      additional indirection layer and can improve insertion/deletion time
      as well.
      
      Reference: http://patchwork.ozlabs.org/patch/443040/Suggested-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c4b52d9
    • D
      rhashtable: unconditionally grow when max_shift is not specified · 8331de75
      Daniel Borkmann 提交于
      While commit c0c09bfd ("rhashtable: avoid unnecessary wakeup for
      worker queue") rightfully moved part of the decision making of
      whether we should expand or shrink from the expand/shrink functions
      themselves into insert/delete functions in order to avoid unnecessary
      worker wake-ups, it however introduced a regression by doing so.
      
      Before that change, if no max_shift was specified (= 0) on rhashtable
      initialization, rhashtable_expand() would just grow unconditionally
      and lets the available memory be the limiting factor. After that
      change, if no max_shift was specified, there would be _no_ expansion
      step at all.
      
      Given that netlink and tipc have a max_shift specified, it was not
      visible there, but Josh Hunt reported that if nft that starts out
      with a default element hint of 3 if not otherwise provided, would
      slow i.e. inserts down trememdously as it cannot grow larger to
      relax table occupancy.
      
      Given that the test case verifies shrinks/expands manually, we also
      must remove pointer to the helper functions to explicitly avoid
      parallel resizing on insertions/deletions. test_bucket_stats() and
      test_rht_lookup() could also be wrapped around rhashtable mutex to
      explicitly synchronize a walk from resizing, but I think that defeats
      the actual test case which intended to have explicit test steps,
      i.e. 1) inserts, 2) expands, 3) shrinks, 4) deletions, with object
      verification after each stage.
      Reported-by: NJosh Hunt <johunt@akamai.com>
      Fixes: c0c09bfd ("rhashtable: avoid unnecessary wakeup for worker queue")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Ying Xue <ying.xue@windriver.com>
      Cc: Josh Hunt <johunt@akamai.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8331de75
    • M
      vhost: drop hard-coded num_buffers size · 0d79a493
      Michael S. Tsirkin 提交于
      The 2 that we use for copy_to_iter comes from sizeof(u16),
      it used to be that way before the iov iter update.
      Fix it up, making it obvious the size of stack access
      is right.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d79a493
    • M
      vhost: cleanup iterator update logic · 4c5a8442
      Michael S. Tsirkin 提交于
      Recent iterator-related changes in vhost made it
      harder to follow the logic fixing up the header.
      In fact, the fixup always happens at the same
      offset: sizeof(virtio_net_hdr): sometimes the
      fixup iterator is updated by copy_to_iter,
      sometimes-by iov_iter_advance.
      
      Rearrange code to make this obvious.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c5a8442
    • D
      rocker: silence shift wrapping warning · 5f2ebfbe
      Dan Carpenter 提交于
      "val" is declared as a u64 so static checkers complain that this shift
      can wrap.  I don't have the hardware but probably it's doesn't have over
      31 ports.  Still we may as well silence the warning even if it's not a
      real bug.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Acked-by: NScott Feldman <sfeldma@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f2ebfbe
    • D
      rocker: add a check for NULL in rocker_probe_ports() · e65ad3be
      Dan Carpenter 提交于
      Make sure kmalloc() succeeds.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NScott Feldman <sfeldma@gmail.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e65ad3be