1. 01 2月, 2019 2 次提交
  2. 31 1月, 2019 5 次提交
    • W
      fs/dcache: Track & report number of negative dentries · af0c9af1
      Waiman Long 提交于
      The current dentry number tracking code doesn't distinguish between
      positive & negative dentries.  It just reports the total number of
      dentries in the LRU lists.
      
      As excessive number of negative dentries can have an impact on system
      performance, it will be wise to track the number of positive and
      negative dentries separately.
      
      This patch adds tracking for the total number of negative dentries in
      the system LRU lists and reports it in the 5th field in the
      /proc/sys/fs/dentry-state file.  The number, however, does not include
      negative dentries that are in flight but not in the LRU yet as well as
      those in the shrinker lists which are on the way out anyway.
      
      The number of positive dentries in the LRU lists can be roughly found by
      subtracting the number of negative dentries from the unused count.
      
      Matthew Wilcox had confirmed that since the introduction of the
      dentry_stat structure in 2.1.60, the dummy array was there, probably for
      future extension.  They were not replacements of pre-existing fields.
      So no sane applications that read the value of /proc/sys/fs/dentry-state
      will do dummy thing if the last 2 fields of the sysctl parameter are not
      zero.  IOW, it will be safe to use one of the dummy array entry for
      negative dentry count.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af0c9af1
    • W
      fs: Don't need to put list_lru into its own cacheline · 7d10f70f
      Waiman Long 提交于
      The list_lru structure is essentially just a pointer to a table of
      per-node LRU lists.  Even if CONFIG_MEMCG_KMEM is defined, the list
      field is just used for LRU list registration and shrinker_id is set at
      initialization.  Those fields won't need to be touched that often.
      
      So there is no point to make the list_lru structures to sit in their own
      cachelines.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d10f70f
    • W
      fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb() · 1dbd449c
      Waiman Long 提交于
      The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
      lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
      
      The shrink_dcache_sb() function moves dentries from the LRU list to a
      shrink list and subtracts the dentry count from nr_dentry_unused.  This
      is incorrect as the nr_dentry_unused count will also be decremented in
      shrink_dentry_list() via d_shrink_del().
      
      To fix this double decrement, the decrement in the shrink_dcache_sb()
      function is taken out.
      
      Fixes: 4e717f5c ("list_lru: remove special case function list_lru_dispose_all."
      Cc: stable@kernel.org
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1dbd449c
    • L
      Merge tag 'iommu-fixes-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 1c0490ce
      Linus Torvalds 提交于
      Pull IOMMU fixes from Joerg Roedel:
       "A few more fixes this time:
      
         - Two patches to fix the error path of the map_sg implementation of
           the AMD IOMMU driver.
      
         - Also a missing IOTLB flush is fixed in the AMD IOMMU driver.
      
         - Memory leak fix for the Intel IOMMU driver.
      
         - Fix a regression in the Mediatek IOMMU driver which caused device
           initialization to fail (seen as broken HDMI output)"
      
      * tag 'iommu-fixes-v5.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/amd: Fix IOMMU page flush when detach device from a domain
        iommu/mediatek: Use correct fwspec in mtk_iommu_add_device()
        iommu/vt-d: Fix memory leak in intel_iommu_put_resv_regions()
        iommu/amd: Unmap all mapped pages in error path of map_sg
        iommu/amd: Call free_iova_fast with pfn in map_sg
      1c0490ce
    • L
      Merge tag 'gpio-v5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 877ef51d
      Linus Torvalds 提交于
      Pull GPIO fixes from Linus Walleij:
       "Here is a bunch of GPIO fixes for the v5.0 series. I was helped out by
        Bartosz in collecting these fixes, for which I am very grateful, the
        biggest achievement in GPIO right now is work distribution.
      
        There is one serious core fix (timestamping) and a bunch of driver
        fixes:
      
         - Fix timestamps on nested IRQs
      
         - Handle IRQs properly in multiple instances of PCF857x
      
         - Use the right data register and IRQ type setting in the Spreadtrum
           GPIO driver
      
         - Let the value argument work properly when setting direction in the
           Altera GPIO driver
      
         - Mask interrupts properly in the vf610 driver"
      
      * tag 'gpio-v5.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        gpio: vf610: Mask all GPIO interrupts
        gpio: altera-a10sr: Set proper output level for direction_output
        gpio: sprd: Fix incorrect irq type setting for the async EIC
        gpio: sprd: Fix the incorrect data register
        gpiolib: fix line event timestamps for nested irqs
        gpio: pcf857x: Fix interrupts on multiple instances
      877ef51d
  3. 30 1月, 2019 7 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 62967898
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Need to save away the IV across tls async operations, from Dave
          Watson.
      
       2) Upon successful packet processing, we should liberate the SKB with
          dev_consume_skb{_irq}(). From Yang Wei.
      
       3) Only apply RX hang workaround on effected macb chips, from Harini
          Katakam.
      
       4) Dummy netdev need a proper namespace assigned to them, from Josh
          Elsasser.
      
       5) Some paths of nft_compat run lockless now, and thus we need to use a
          proper refcnt_t. From Florian Westphal.
      
       6) Avoid deadlock in mlx5 by doing IRQ locking, from Moni Shoua.
      
       7) netrom does not refcount sockets properly wrt. timers, fix that by
          using the sock timer API. From Cong Wang.
      
       8) Fix locking of inexact inserts of xfrm policies, from Florian
          Westphal.
      
       9) Missing xfrm hash generation bump, also from Florian.
      
      10) Missing of_node_put() in hns driver, from Yonglong Liu.
      
      11) Fix DN_IFREQ_SIZE, from Johannes Berg.
      
      12) ip6mr notifier is invoked during traversal of wrong table, from Nir
          Dotan.
      
      13) TX promisc settings not performed correctly in qed, from Manish
          Chopra.
      
      14) Fix OOB access in vhost, from Jason Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
        MAINTAINERS: Add entry for XDP (eXpress Data Path)
        net: set default network namespace in init_dummy_netdev()
        net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
        net: caif: call dev_consume_skb_any when skb xmit done
        net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
        net: macb: Apply RXUBR workaround only to versions with errata
        net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
        net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
        net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
        net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
        net: tls: Fix deadlock in free_resources tx
        net: tls: Save iv in tls_rec for async crypto requests
        vhost: fix OOB in get_rx_bufs()
        qed: Fix stack out of bounds bug
        qed: Fix system crash in ll2 xmit
        qed: Fix VF probe failure while FLR
        qed: Fix LACP pdu drops for VFs
        qed: Fix bug in tx promiscuous mode settings
        net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
        netfilter: ipt_CLUSTERIP: fix warning unused variable cn
        ...
      62967898
    • J
      MAINTAINERS: Add entry for XDP (eXpress Data Path) · d07e1e0f
      Jesper Dangaard Brouer 提交于
      Add multiple people as maintainers for XDP, sorted alphabetically.
      
      XDP is also tied to driver level support and code, but we cannot add all
      drivers to the list. Instead K: and N: match on 'xdp' in hope to catch some
      of those changes in drivers.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d07e1e0f
    • J
      net: set default network namespace in init_dummy_netdev() · 35edfdc7
      Josh Elsasser 提交于
      Assign a default net namespace to netdevs created by init_dummy_netdev().
      Fixes a NULL pointer dereference caused by busy-polling a socket bound to
      an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
      if napi_poll() received packets:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
        IP: napi_busy_loop+0xd6/0x200
        Call Trace:
          sock_poll+0x5e/0x80
          do_sys_poll+0x324/0x5a0
          SyS_poll+0x6c/0xf0
          do_syscall_64+0x6b/0x1f0
          entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 7db6b048 ("net: Commonize busy polling code to focus on napi_id instead of socket")
      Signed-off-by: NJosh Elsasser <jelsasser@appneta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35edfdc7
    • Y
      net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles · 0f0ed828
      Yang Wei 提交于
      The skb should be freed by dev_consume_skb_any() in b44_start_xmit()
      when bounce_skb is used. The skb is be replaced by bounce_skb, so the
      original skb should be consumed(not drop).
      
      dev_consume_skb_irq() should be called in b44_tx() when skb xmit
      done. It makes drop profiles(dropwatch, perf) more friendly.
      Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0f0ed828
    • Y
      net: caif: call dev_consume_skb_any when skb xmit done · e339f863
      Yang Wei 提交于
      The skb shouled be consumed when xmit done, it makes drop profiles
      (dropwatch, perf) more friendly.
      dev_kfree_skb_irq()/kfree_skb() shouled be replaced by
      dev_consume_skb_any(), it makes code cleaner.
      Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e339f863
    • Y
      net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · 896cebc0
      Yang Wei 提交于
      dev_consume_skb_irq() should be called in cp_tx() when skb xmit
      done. It makes drop profiles(dropwatch, perf) more friendly.
      Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      896cebc0
    • H
      net: macb: Apply RXUBR workaround only to versions with errata · e501070e
      Harini Katakam 提交于
      The interrupt handler contains a workaround for RX hang applicable
      to Zynq and AT91RM9200 only. Subsequent versions do not need this
      workaround. This workaround unnecessarily resets RX whenever RX used
      bit read is observed, which can be often under heavy traffic. There
      is no other action performed on RX UBR interrupt. Hence introduce a
      CAPS mask; enable this interrupt and workaround only on affected
      versions.
      Signed-off-by: NHarini Katakam <harini.katakam@xilinx.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e501070e
  4. 29 1月, 2019 16 次提交
    • Y
      net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · b3379a42
      Yang Wei 提交于
      dev_consume_skb_irq() should be called in cpmac_end_xmit() when
      xmit done. It makes drop profiles more friendly.
      Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b3379a42
    • Y
      net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · 10009115
      Yang Wei 提交于
      dev_consume_skb_irq() should be called in bmac_txdma_intr() when
      xmit done. It makes drop profiles more friendly.
      Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10009115
    • AlbinYang's avatar
      net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq · 3afa73dd
      AlbinYang 提交于
      dev_consume_skb_irq() should be called in amd8111e_tx() when xmit
      done. It makes drop profiles more friendly.
      Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3afa73dd
    • AlbinYang's avatar
      net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq · f48af114
      AlbinYang 提交于
      dev_consume_skb_irq() should be called in ace_tx_int() when xmit
      done. It makes drop profiles more friendly.
      Signed-off-by: NYang Wei <yang.wei9@zte.com.cn>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f48af114
    • D
      net: tls: Fix deadlock in free_resources tx · 10231213
      Dave Watson 提交于
      If there are outstanding async tx requests (when crypto returns EINPROGRESS),
      there is a potential deadlock: the tx work acquires the lock, while we
      cancel_delayed_work_sync() while holding the lock.  Drop the lock while waiting
      for the work to complete.
      
      Fixes: a42055e8 ("Add support for async encryption of records...")
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10231213
    • D
      net: tls: Save iv in tls_rec for async crypto requests · 32eb67b9
      Dave Watson 提交于
      aead_request_set_crypt takes an iv pointer, and we change the iv
      soon after setting it.  Some async crypto algorithms don't save the iv,
      so we need to save it in the tls_rec for async requests.
      
      Found by hardcoding x64 aesni to use async crypto manager (to test the async
      codepath), however I don't think this combination can happen in the wild.
      Presumably other hardware offloads will need this fix, but there have been
      no user reports.
      
      Fixes: a42055e8 ("Add support for async encryption of records...")
      Signed-off-by: NDave Watson <davejwatson@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32eb67b9
    • J
      vhost: fix OOB in get_rx_bufs() · b46a0bf7
      Jason Wang 提交于
      After batched used ring updating was introduced in commit e2b3b35e
      ("vhost_net: batch used ring update in rx"). We tend to batch heads in
      vq->heads for more than one packet. But the quota passed to
      get_rx_bufs() was not correctly limited, which can result a OOB write
      in vq->heads.
      
              headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                          vhost_len, &in, vq_log, &log,
                          likely(mergeable) ? UIO_MAXIOV : 1);
      
      UIO_MAXIOV was still used which is wrong since we could have batched
      used in vq->heads, this will cause OOB if the next buffer needs more
      than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
      batched 64 (VHOST_NET_BATCH) heads:
      Acked-by: NStefan Hajnoczi <stefanha@redhat.com>
      
      =============================================================================
      BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
      -----------------------------------------------------------------------------
      
      INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
      INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
          kmem_cache_alloc_trace+0xbb/0x140
          alloc_pd+0x22/0x60
          gen8_ppgtt_create+0x11d/0x5f0
          i915_ppgtt_create+0x16/0x80
          i915_gem_create_context+0x248/0x390
          i915_gem_context_create_ioctl+0x4b/0xe0
          drm_ioctl_kernel+0xa5/0xf0
          drm_ioctl+0x2ed/0x3a0
          do_vfs_ioctl+0x9f/0x620
          ksys_ioctl+0x6b/0x80
          __x64_sys_ioctl+0x11/0x20
          do_syscall_64+0x43/0xf0
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
      INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b
      
      Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
      vhost-net. This is done through set the limitation through
      vhost_dev_init(), then set_owner can allocate the number of iov in a
      per device manner.
      
      This fixes CVE-2018-16880.
      
      Fixes: e2b3b35e ("vhost_net: batch used ring update in rx")
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b46a0bf7
    • D
      Merge branch 'qed-Bug-fixes' · bfe2599d
      David S. Miller 提交于
      Manish Chopra says:
      
      ====================
      qed: Bug fixes
      
      This series have SR-IOV and some general fixes.
      Please consider applying it to "net"
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfe2599d
    • M
      qed: Fix stack out of bounds bug · ffb057f9
      Manish Chopra 提交于
      KASAN reported following bug in qed_init_qm_get_idx_from_flags
      due to inappropriate casting of "pq_flags". Fix the type of "pq_flags".
      
      [  196.624707] BUG: KASAN: stack-out-of-bounds in qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
      [  196.624712] Read of size 8 at addr ffff809b00bc7360 by task kworker/0:9/1712
      [  196.624714]
      [  196.624720] CPU: 0 PID: 1712 Comm: kworker/0:9 Not tainted 4.18.0-60.el8.aarch64+debug #1
      [  196.624723] Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL024 09/26/2018
      [  196.624733] Workqueue: events work_for_cpu_fn
      [  196.624738] Call trace:
      [  196.624742]  dump_backtrace+0x0/0x2f8
      [  196.624745]  show_stack+0x24/0x30
      [  196.624749]  dump_stack+0xe0/0x11c
      [  196.624755]  print_address_description+0x68/0x260
      [  196.624759]  kasan_report+0x178/0x340
      [  196.624762]  __asan_report_load_n_noabort+0x38/0x48
      [  196.624786]  qed_init_qm_get_idx_from_flags+0x1a4/0x1b8 [qed]
      [  196.624808]  qed_init_qm_info+0xec0/0x2200 [qed]
      [  196.624830]  qed_resc_alloc+0x284/0x7e8 [qed]
      [  196.624853]  qed_slowpath_start+0x6cc/0x1ae8 [qed]
      [  196.624864]  __qede_probe.isra.10+0x1cc/0x12c0 [qede]
      [  196.624874]  qede_probe+0x78/0xf0 [qede]
      [  196.624879]  local_pci_probe+0xc4/0x180
      [  196.624882]  work_for_cpu_fn+0x54/0x98
      [  196.624885]  process_one_work+0x758/0x1900
      [  196.624888]  worker_thread+0x4e0/0xd18
      [  196.624892]  kthread+0x2c8/0x350
      [  196.624897]  ret_from_fork+0x10/0x18
      [  196.624899]
      [  196.624902] Allocated by task 2:
      [  196.624906]  kasan_kmalloc.part.1+0x40/0x108
      [  196.624909]  kasan_kmalloc+0xb4/0xc8
      [  196.624913]  kasan_slab_alloc+0x14/0x20
      [  196.624916]  kmem_cache_alloc_node+0x1dc/0x480
      [  196.624921]  copy_process.isra.1.part.2+0x1d8/0x4a98
      [  196.624924]  _do_fork+0x150/0xfa0
      [  196.624926]  kernel_thread+0x48/0x58
      [  196.624930]  kthreadd+0x3a4/0x5a0
      [  196.624932]  ret_from_fork+0x10/0x18
      [  196.624934]
      [  196.624937] Freed by task 0:
      [  196.624938] (stack is not available)
      [  196.624940]
      [  196.624943] The buggy address belongs to the object at ffff809b00bc0000
      [  196.624943]  which belongs to the cache thread_stack of size 32768
      [  196.624946] The buggy address is located 29536 bytes inside of
      [  196.624946]  32768-byte region [ffff809b00bc0000, ffff809b00bc8000)
      [  196.624948] The buggy address belongs to the page:
      [  196.624952] page:ffff7fe026c02e00 count:1 mapcount:0 mapping:ffff809b4001c000 index:0x0 compound_mapcount: 0
      [  196.624960] flags: 0xfffff8000008100(slab|head)
      [  196.624967] raw: 0fffff8000008100 dead000000000100 dead000000000200 ffff809b4001c000
      [  196.624970] raw: 0000000000000000 0000000000080008 00000001ffffffff 0000000000000000
      [  196.624973] page dumped because: kasan: bad access detected
      [  196.624974]
      [  196.624976] Memory state around the buggy address:
      [  196.624980]  ffff809b00bc7200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  196.624983]  ffff809b00bc7280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  196.624985] >ffff809b00bc7300: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 04 f2 f2 f2
      [  196.624988]                                                        ^
      [  196.624990]  ffff809b00bc7380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  196.624993]  ffff809b00bc7400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  196.624995] ==================================================================
      Signed-off-by: NManish Chopra <manishc@marvell.com>
      Signed-off-by: NAriel Elior <aelior@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ffb057f9
    • M
      qed: Fix system crash in ll2 xmit · 7c81626a
      Manish Chopra 提交于
      Cache number of fragments in the skb locally as in case
      of linear skb (with zero fragments), tx completion
      (or freeing of skb) may happen before driver tries
      to get number of frgaments from the skb which could
      lead to stale access to an already freed skb.
      Signed-off-by: NManish Chopra <manishc@marvell.com>
      Signed-off-by: NAriel Elior <aelior@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c81626a
    • M
      qed: Fix VF probe failure while FLR · 327852ec
      Manish Chopra 提交于
      VFs may hit VF-PF channel timeout while probing, as in some
      cases it was observed that VF FLR and VF "acquire" message
      transaction (i.e first message from VF to PF in VF's probe flow)
      could occur simultaneously which could lead VF to fail sending
      "acquire" message to PF as VF is marked disabled from HW perspective
      due to FLR, which will result into channel timeout and VF probe failure.
      
      In such cases, try retrying VF "acquire" message so that in later
      attempts it could be successful to pass message to PF after the VF
      FLR is completed and can be probed successfully.
      Signed-off-by: NManish Chopra <manishc@marvell.com>
      Signed-off-by: NAriel Elior <aelior@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      327852ec
    • M
      qed: Fix LACP pdu drops for VFs · ff929696
      Manish Chopra 提交于
      VF is always configured to drop control frames
      (with reserved mac addresses) but to work LACP
      on the VFs, it would require LACP control frames
      to be forwarded or transmitted successfully.
      
      This patch fixes this in such a way that trusted VFs
      (marked through ndo_set_vf_trust) would be allowed to
      pass the control frames such as LACP pdus.
      Signed-off-by: NManish Chopra <manishc@marvell.com>
      Signed-off-by: NAriel Elior <aelior@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff929696
    • M
      qed: Fix bug in tx promiscuous mode settings · 9e71a15d
      Manish Chopra 提交于
      When running tx switched traffic between VNICs
      created via a bridge(to which VFs are added),
      adapter drops the unicast packets in tx flow due to
      VNIC's ucast mac being unknown to it. But VF interfaces
      being in promiscuous mode should have caused adapter
      to accept all the unknown ucast packets. Later, it
      was found that driver doesn't really configure tx
      promiscuous mode settings to accept all unknown unicast macs.
      
      This patch fixes tx promiscuous mode settings to accept all
      unknown/unmatched unicast macs and works out the scenario.
      Signed-off-by: NManish Chopra <manishc@marvell.com>
      Signed-off-by: NAriel Elior <aelior@marvell.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e71a15d
    • AlbinYang's avatar
      net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles · ca899324
      AlbinYang 提交于
      dev_consume_skb_irq() should be called in i596_interrupt() when skb
      xmit done. It makes drop profiles(dropwatch, perf) more friendly.
      Signed-off-by: AlbinYang's avatarYang Wei <albin_yang@163.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca899324
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · ff44a837
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS fixes for net
      
      The following patchset contains Netfilter/IPVS fixes for your net tree:
      
      1) The nftnl mutex is now per-netns, therefore use reference counter
         for matches and targets to deal with concurrent updates from netns.
         Moreover, place extensions in a pernet list. Patches from Florian Westphal.
      
      2) Bail out with EINVAL in case of negative timeouts via setsockopt()
         through ip_vs_set_timeout(), from ZhangXiaoxu.
      
      3) Spurious EINVAL on ebtables 32bit binary with 64bit kernel, also
         from Florian.
      
      4) Reset TCP option header parser in case of fingerprint mismatch,
         otherwise follow up overlapping fingerprint definitions including
         TCP options do not work, from Fernando Fernandez Mancera.
      
      5) Compilation warning in ipt_CLUSTER with CONFIG_PROC_FS unset.
         From Anders Roxell.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff44a837
    • M
      Revert "mm, memory_hotplug: initialize struct pages for the full memory section" · 4aa9fc2a
      Michal Hocko 提交于
      This reverts commit 2830bf6f.
      
      The underlying assumption that one sparse section belongs into a single
      numa node doesn't hold really. Robert Shteynfeld has reported a boot
      failure. The boot log was not captured but his memory layout is as
      follows:
      
        Early memory node ranges
          node   1: [mem 0x0000000000001000-0x0000000000090fff]
          node   1: [mem 0x0000000000100000-0x00000000dbdf8fff]
          node   1: [mem 0x0000000100000000-0x0000001423ffffff]
          node   0: [mem 0x0000001424000000-0x0000002023ffffff]
      
      This means that node0 starts in the middle of a memory section which is
      also in node1.  memmap_init_zone tries to initialize padding of a
      section even when it is outside of the given pfn range because there are
      code paths (e.g.  memory hotplug) which assume that the full worth of
      memory section is always initialized.
      
      In this particular case, though, such a range is already intialized and
      most likely already managed by the page allocator.  Scribbling over
      those pages corrupts the internal state and likely blows up when any of
      those pages gets used.
      Reported-by: NRobert Shteynfeld <robert.shteynfeld@gmail.com>
      Fixes: 2830bf6f ("mm, memory_hotplug: initialize struct pages for the full memory section")
      Cc: stable@kernel.org
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4aa9fc2a
  5. 28 1月, 2019 10 次提交
    • A
      gpio: vf610: Mask all GPIO interrupts · 7ae710f9
      Andrew Lunn 提交于
      On SoC reset all GPIO interrupts are disable. However, if kexec is
      used to boot into a new kernel, the SoC does not experience a
      reset. Hence GPIO interrupts can be left enabled from the previous
      kernel. It is then possible for the interrupt to fire before an
      interrupt handler is registered, resulting in the kernel complaining
      of an "unexpected IRQ trap", the interrupt is never cleared, and so
      fires again, resulting in an interrupt storm.
      
      Disable all GPIO interrupts before registering the GPIO IRQ chip.
      
      Fixes: 7f2691a1 ("gpio: vf610: add gpiolib/IRQ chip driver for Vybrid")
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Acked-by: NStefan Agner <stefan@agner.ch>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      7ae710f9
    • A
      netfilter: ipt_CLUSTERIP: fix warning unused variable cn · 206b8cc5
      Anders Roxell 提交于
      When CONFIG_PROC_FS isn't set the variable cn isn't used.
      
      net/ipv4/netfilter/ipt_CLUSTERIP.c: In function ‘clusterip_net_exit’:
      net/ipv4/netfilter/ipt_CLUSTERIP.c:849:24: warning: unused variable ‘cn’ [-Wunused-variable]
        struct clusterip_net *cn = clusterip_pernet(net);
                              ^~
      
      Rework so the variable 'cn' is declared inside "#ifdef CONFIG_PROC_FS".
      
      Fixes: b12f7bad ("netfilter: ipt_CLUSTERIP: remove wrong WARN_ON_ONCE in netns exit routine")
      Signed-off-by: NAnders Roxell <anders.roxell@linaro.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      206b8cc5
    • F
      netfilter: nfnetlink_osf: add missing fmatch check · 1a6a0951
      Fernando Fernandez Mancera 提交于
      When we check the tcp options of a packet and it doesn't match the current
      fingerprint, the tcp packet option pointer must be restored to its initial
      value in order to do the proper tcp options check for the next fingerprint.
      
      Here we can see an example.
      Assumming the following fingerprint base with two lines:
      
      S10:64:1:60:M*,S,T,N,W6:      Linux:3.0::Linux 3.0
      S20:64:1:60:M*,S,T,N,W7:      Linux:4.19:arch:Linux 4.1
      
      Where TCP options are the last field in the OS signature, all of them overlap
      except by the last one, ie. 'W6' versus 'W7'.
      
      In case a packet for Linux 4.19 kicks in, the osf finds no matching because the
      TCP options pointer is updated after checking for the TCP options in the first
      line.
      
      Therefore, reset pointer back to where it should be.
      
      Fixes: 11eeef41 ("netfilter: passive OS fingerprint xtables match")
      Signed-off-by: NFernando Fernandez Mancera <ffmancera@riseup.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      1a6a0951
    • F
      netfilter: ebtables: compat: un-break 32bit setsockopt when no rules are present · 2035f3ff
      Florian Westphal 提交于
      Unlike ip(6)tables ebtables only counts user-defined chains.
      
      The effect is that a 32bit ebtables binary on a 64bit kernel can do
      'ebtables -N FOO' only after adding at least one rule, else the request
      fails with -EINVAL.
      
      This is a similar fix as done in
      3f1e53ab ("netfilter: ebtables: don't attempt to allocate 0-sized compat array").
      
      Fixes: 7d7d7e02 ("netfilter: compat: reject huge allocation requests")
      Reported-by: NFrancesco Ruggeri <fruggeri@arista.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2035f3ff
    • A
      net: dsa: mv88e6xxx: Fix serdes irq setup going recursive · 6fb6e637
      Andrew Lunn 提交于
      Duec to a typo, mv88e6390_serdes_irq_setup() calls itself, rather than
      mv88e6390x_serdes_irq_setup(). It then blows the stack, and shortly
      after the machine blows up.
      
      Fixes: 2defda1f ("net: dsa: mv88e6xxx: Add support for SERDES on ports 2-8 for 6390X")
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fb6e637
    • N
      ip6mr: Fix notifiers call on mroute_clean_tables() · 146820cc
      Nir Dotan 提交于
      When the MC route socket is closed, mroute_clean_tables() is called to
      cleanup existing routes. Mistakenly notifiers call was put on the cleanup
      of the unresolved MC route entries cache.
      In a case where the MC socket closes before an unresolved route expires,
      the notifier call leads to a crash, caused by the driver trying to
      increment a non initialized refcount_t object [1] and then when handling
      is done, to decrement it [2]. This was detected by a test recently added in
      commit 6d4efada ("selftests: forwarding: Add multicast routing test").
      
      Fix that by putting notifiers call on the resolved entries traversal,
      instead of on the unresolved entries traversal.
      
      [1]
      
      [  245.748967] refcount_t: increment on 0; use-after-free.
      [  245.754829] WARNING: CPU: 3 PID: 3223 at lib/refcount.c:153 refcount_inc_checked+0x2b/0x30
      ...
      [  245.802357] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      [  245.811873] RIP: 0010:refcount_inc_checked+0x2b/0x30
      ...
      [  245.907487] Call Trace:
      [  245.910231]  mlxsw_sp_router_fib_event.cold.181+0x42/0x47 [mlxsw_spectrum]
      [  245.917913]  notifier_call_chain+0x45/0x7
      [  245.922484]  atomic_notifier_call_chain+0x15/0x20
      [  245.927729]  call_fib_notifiers+0x15/0x30
      [  245.932205]  mroute_clean_tables+0x372/0x3f
      [  245.936971]  ip6mr_sk_done+0xb1/0xc0
      [  245.940960]  ip6_mroute_setsockopt+0x1da/0x5f0
      ...
      
      [2]
      
      [  246.128487] refcount_t: underflow; use-after-free.
      [  246.133859] WARNING: CPU: 0 PID: 7 at lib/refcount.c:187 refcount_sub_and_test_checked+0x4c/0x60
      [  246.183521] Hardware name: Mellanox Technologies Ltd. MSN2740/SA001237, BIOS 5.6.5 06/07/2016
      ...
      [  246.193062] Workqueue: mlxsw_core_ordered mlxsw_sp_router_fibmr_event_work [mlxsw_spectrum]
      [  246.202394] RIP: 0010:refcount_sub_and_test_checked+0x4c/0x60
      ...
      [  246.298889] Call Trace:
      [  246.301617]  refcount_dec_and_test_checked+0x11/0x20
      [  246.307170]  mlxsw_sp_router_fibmr_event_work.cold.196+0x47/0x78 [mlxsw_spectrum]
      [  246.315531]  process_one_work+0x1fa/0x3f0
      [  246.320005]  worker_thread+0x2f/0x3e0
      [  246.324083]  kthread+0x118/0x130
      [  246.327683]  ? wq_update_unbound_numa+0x1b0/0x1b0
      [  246.332926]  ? kthread_park+0x80/0x80
      [  246.337013]  ret_from_fork+0x1f/0x30
      
      Fixes: 088aa3ee ("ip6mr: Support fib notifications")
      Signed-off-by: NNir Dotan <nird@mellanox.com>
      Reviewed-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      146820cc
    • J
      decnet: fix DN_IFREQ_SIZE · 50c29366
      Johannes Berg 提交于
      Digging through the ioctls with Al because of the previous
      patches, we found that on 64-bit decnet's dn_dev_ioctl()
      is wrong, because struct ifreq::ifr_ifru is actually 24
      bytes (not 16 as expected from struct sockaddr) due to the
      ifru_map and ifru_settings members.
      
      Clearly, decnet expects the ioctl to be called with a struct
      like
        struct ifreq_dn {
          char ifr_name[IFNAMSIZ];
          struct sockaddr_dn ifr_addr;
        };
      
      since it does
        struct ifreq *ifr = ...;
        struct sockaddr_dn *sdn = (struct sockaddr_dn *)&ifr->ifr_addr;
      
      This means that DN_IFREQ_SIZE is too big for what it wants on
      64-bit, as it is
        sizeof(struct ifreq) - sizeof(struct sockaddr) +
        sizeof(struct sockaddr_dn)
      
      This assumes that sizeof(struct sockaddr) is the size of ifr_ifru
      but that isn't true.
      
      Fix this to use offsetof(struct ifreq, ifr_ifru).
      
      This indeed doesn't really matter much - the result is that we
      copy in/out 8 bytes more than we should on 64-bit platforms. In
      case the "struct ifreq_dn" lands just on the end of a page though
      it might lead to faults.
      
      As far as I can tell, it has been like this forever, so it seems
      very likely that nobody cares.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50c29366
    • A
      net: stmmac: dwmac-rk: fix error handling in rk_gmac_powerup() · c69c29a1
      Alexey Khoroshilov 提交于
      If phy_power_on() fails in rk_gmac_powerup(), clocks are left enabled.
      
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c69c29a1
    • D
      Merge branch 'hns-fixes' · 417c8045
      David S. Miller 提交于
      Peng Li says:
      
      ====================
      net: hns: code optimizations & bugfixes for HNS driver
      
      This patchset includes bugfixes and code optimizations for the HNS
      ethernet controller driver
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      417c8045
    • Y
      net: hns: Fix wrong read accesses via Clause 45 MDIO protocol · cec8abba
      Yonglong Liu 提交于
      When reading phy registers via Clause 45 MDIO protocol, after write
      address operation, the driver use another write address operation, so
      can not read the right value of any phy registers. This patch fixes it.
      Signed-off-by: NYonglong Liu <liuyonglong@huawei.com>
      Signed-off-by: NPeng Li <lipeng321@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cec8abba