1. 25 5月, 2019 3 次提交
  2. 22 5月, 2019 1 次提交
  3. 18 5月, 2019 1 次提交
  4. 16 5月, 2019 3 次提交
    • D
      rxrpc: Allow the kernel to mark a call as being non-interruptible · b960a34b
      David Howells 提交于
      Allow kernel services using AF_RXRPC to indicate that a call should be
      non-interruptible.  This allows kafs to make things like lock-extension and
      writeback data storage calls non-interruptible.
      
      If this is set, signals will be ignored for operations on that call where
      possible - such as waiting to get a call channel on an rxrpc connection.
      
      It doesn't prevent UDP sendmsg from being interrupted, but that will be
      handled by packet retransmission.
      
      rxrpc_kernel_recv_data() isn't affected by this since that never waits,
      preferring instead to return -EAGAIN and leave the waiting to the caller.
      
      Userspace initiated calls can't be set to be uninterruptible at this time.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b960a34b
    • D
      rxrpc: Provide kernel interface to set max lifespan on a call · bbd172e3
      David Howells 提交于
      Provide an interface to set max lifespan on a call from inside of the
      kernel without having to call kernel_sendmsg().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      bbd172e3
    • D
      dns_resolver: Allow used keys to be invalidated · d0660f0b
      David Howells 提交于
      Allow used DNS resolver keys to be invalidated after use if the caller is
      doing its own caching of the results.  This reduces the amount of resources
      required.
      
      Fix AFS to invalidate DNS results to kill off permanent failure records
      that get lodged in the resolver keyring and prevent future lookups from
      happening.
      
      Fixes: 0a5143f2 ("afs: Implement VL server rotation")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      d0660f0b
  5. 15 5月, 2019 3 次提交
    • S
      net: replace CONFIG_DEBUG_KERNEL with CONFIG_DEBUG_MISC · 432d8220
      Sinan Kaya 提交于
      CONFIG_DEBUG_KERNEL should not impact code generation.  Use the newly
      defined CONFIG_DEBUG_MISC instead to keep the current code.
      
      Link: http://lkml.kernel.org/r/20190413224438.10802-6-okaya@kernel.orgSigned-off-by: NSinan Kaya <okaya@kernel.org>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Anders Roxell <anders.roxell@linaro.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc:  Chris Zankel <chris@zankel.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Thomas Bogendoerfer <tbogendoerfer@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      432d8220
    • I
      mm/gup: change GUP fast to use flags rather than a write 'bool' · 73b0140b
      Ira Weiny 提交于
      To facilitate additional options to get_user_pages_fast() change the
      singular write parameter to be gup_flags.
      
      This patch does not change any functionality.  New functionality will
      follow in subsequent patches.
      
      Some of the get_user_pages_fast() call sites were unchanged because they
      already passed FOLL_WRITE or 0 for the write parameter.
      
      NOTE: It was suggested to change the ordering of the get_user_pages_fast()
      arguments to ensure that callers were converted.  This breaks the current
      GUP call site convention of having the returned pages be the final
      parameter.  So the suggestion was rejected.
      
      Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com>
      Reviewed-by: NMike Marshall <hubcap@omnibond.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73b0140b
    • I
      mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM · 932f4a63
      Ira Weiny 提交于
      Pach series "Add FOLL_LONGTERM to GUP fast and use it".
      
      HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
      advantages.  These pages can be held for a significant time.  But
      get_user_pages_fast() does not protect against mapping FS DAX pages.
      
      Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
      retains the performance while also adding the FS DAX checks.  XDP has also
      shown interest in using this functionality.[1]
      
      In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
      and remove the specialized get_user_pages_longterm call.
      
      [1] https://lkml.org/lkml/2019/3/19/939
      
      "longterm" is a relative thing and at this point is probably a misnomer.
      This is really flagging a pin which is going to be given to hardware and
      can't move.  I've thought of a couple of alternative names but I think we
      have to settle on if we are going to use FL_LAYOUT or something else to
      solve the "longterm" problem.  Then I think we can change the flag to a
      better name.
      
      Secondly, it depends on how often you are registering memory.  I have
      spoken with some RDMA users who consider MR in the performance path...
      For the overall application performance.  I don't have the numbers as the
      tests for HFI1 were done a long time ago.  But there was a significant
      advantage.  Some of which is probably due to the fact that you don't have
      to hold mmap_sem.
      
      Finally, architecturally I think it would be good for everyone to use
      *_fast.  There are patches submitted to the RDMA list which would allow
      the use of *_fast (they reworking the use of mmap_sem) and as soon as they
      are accepted I'll submit a patch to convert the RDMA core as well.  Also
      to this point others are looking to use *_fast.
      
      As an aside, Jasons pointed out in my previous submission that *_fast and
      *_unlocked look very much the same.  I agree and I think further cleanup
      will be coming.  But I'm focused on getting the final solution for DAX at
      the moment.
      
      This patch (of 7):
      
      This patch starts a series which aims to support FOLL_LONGTERM in
      get_user_pages_fast().  Some callers who would like to do a longterm (user
      controlled pin) of pages with the fast variant of GUP for performance
      purposes.
      
      Rather than have a separate get_user_pages_longterm() call, introduce
      FOLL_LONGTERM and change the longterm callers to use it.
      
      This patch does not change any functionality.  In the short term
      "longterm" or user controlled pins are unsafe for Filesystems and FS DAX
      in particular has been blocked.  However, callers of get_user_pages_fast()
      were not "protected".
      
      FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
      requires vmas to determine if DAX is in use.
      
      NOTE: In merging with the CMA changes we opt to change the
      get_user_pages() call in check_and_migrate_cma_pages() to a call of
      __get_user_pages_locked() on the newly migrated pages.  This makes the
      code read better in that we are calling __get_user_pages_locked() on the
      pages before and after a potential migration.
      
      As a side affect some of the interfaces are cleaned up but this is not the
      primary purpose of the series.
      
      In review[1] it was asked:
      
      <quote>
      > This I don't get - if you do lock down long term mappings performance
      > of the actual get_user_pages call shouldn't matter to start with.
      >
      > What do I miss?
      
      A couple of points.
      
      First "longterm" is a relative thing and at this point is probably a
      misnomer.  This is really flagging a pin which is going to be given to
      hardware and can't move.  I've thought of a couple of alternative names
      but I think we have to settle on if we are going to use FL_LAYOUT or
      something else to solve the "longterm" problem.  Then I think we can
      change the flag to a better name.
      
      Second, It depends on how often you are registering memory.  I have spoken
      with some RDMA users who consider MR in the performance path...  For the
      overall application performance.  I don't have the numbers as the tests
      for HFI1 were done a long time ago.  But there was a significant
      advantage.  Some of which is probably due to the fact that you don't have
      to hold mmap_sem.
      
      Finally, architecturally I think it would be good for everyone to use
      *_fast.  There are patches submitted to the RDMA list which would allow
      the use of *_fast (they reworking the use of mmap_sem) and as soon as they
      are accepted I'll submit a patch to convert the RDMA core as well.  Also
      to this point others are looking to use *_fast.
      
      As an asside, Jasons pointed out in my previous submission that *_fast and
      *_unlocked look very much the same.  I agree and I think further cleanup
      will be coming.  But I'm focused on getting the final solution for DAX at
      the moment.
      
      </quote>
      
      [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965
      
      [ira.weiny@intel.com: v3]
        Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
      Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.comSigned-off-by: NIra Weiny <ira.weiny@intel.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mike Marshall <hubcap@omnibond.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      932f4a63
  6. 14 5月, 2019 1 次提交
    • E
      flow_dissector: disable preemption around BPF calls · b1c17a9a
      Eric Dumazet 提交于
      Various things in eBPF really require us to disable preemption
      before running an eBPF program.
      
      syzbot reported :
      
      BUG: assuming atomic context at net/core/flow_dissector.c:737
      in_atomic(): 0, irqs_disabled(): 0, pid: 24710, name: syz-executor.3
      2 locks held by syz-executor.3/24710:
       #0: 00000000e81a4bf1 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x168e/0x3ff0 drivers/net/tun.c:1850
       #1: 00000000254afebd (rcu_read_lock){....}, at: __skb_flow_dissect+0x1e1/0x4bb0 net/core/flow_dissector.c:822
      CPU: 1 PID: 24710 Comm: syz-executor.3 Not tainted 5.1.0+ #6
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       __cant_sleep kernel/sched/core.c:6165 [inline]
       __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6142
       bpf_flow_dissect+0xfe/0x390 net/core/flow_dissector.c:737
       __skb_flow_dissect+0x362/0x4bb0 net/core/flow_dissector.c:853
       skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1322 [inline]
       skb_probe_transport_header include/linux/skbuff.h:2500 [inline]
       skb_probe_transport_header include/linux/skbuff.h:2493 [inline]
       tun_get_user+0x2cfe/0x3ff0 drivers/net/tun.c:1940
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
       call_write_iter include/linux/fs.h:1872 [inline]
       do_iter_readv_writev+0x5fd/0x900 fs/read_write.c:693
       do_iter_write fs/read_write.c:970 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:951
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
       do_writev+0x15b/0x330 fs/read_write.c:1058
       __do_sys_writev fs/read_write.c:1131 [inline]
       __se_sys_writev fs/read_write.c:1128 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
       do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Petar Penkov <ppenkov@google.com>
      Cc: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1c17a9a
  7. 13 5月, 2019 2 次提交
  8. 12 5月, 2019 1 次提交
  9. 11 5月, 2019 2 次提交
    • Y
      dsa: tag_brcm: Fix build error without CONFIG_NET_DSA_TAG_BRCM_PREPEND · b96a5415
      YueHaibing 提交于
      Fix gcc build error:
      
      net/dsa/tag_brcm.c:211:16: error: brcm_prepend_netdev_ops undeclared here (not in a function); did you mean brcm_netdev_ops?
       DSA_TAG_DRIVER(brcm_prepend_netdev_ops);
                      ^
      ./include/net/dsa.h:708:10: note: in definition of macro DSA_TAG_DRIVER
        .ops = &__ops,       \
                ^~~~~
      ./include/net/dsa.h:701:36: warning: dsa_tag_driver_brcm_prepend_netdev_ops defined but not used [-Wunused-variable]
       #define DSA_TAG_DRIVER_NAME(__ops) dsa_tag_driver ## _ ## __ops
                                          ^
      ./include/net/dsa.h:707:30: note: in expansion of macro DSA_TAG_DRIVER_NAME
       static struct dsa_tag_driver DSA_TAG_DRIVER_NAME(__ops) = {  \
                                    ^~~~~~~~~~~~~~~~~~~
      net/dsa/tag_brcm.c:211:1: note: in expansion of macro DSA_TAG_DRIVER
       DSA_TAG_DRIVER(brcm_prepend_netdev_ops);
      
      Like the CONFIG_NET_DSA_TAG_BRCM case,
      brcm_prepend_netdev_ops and DSA_TAG_PROTO_BRCM_PREPEND
      should be wrappeed by CONFIG_NET_DSA_TAG_BRCM_PREPEND.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Fixes: b74b70c4 ("net: dsa: Support prepended Broadcom tag")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b96a5415
    • T
      bridge: Fix error path for kobject_init_and_add() · bdfad5ae
      Tobin C. Harding 提交于
      Currently error return from kobject_init_and_add() is not followed by a
      call to kobject_put().  This means there is a memory leak.  We currently
      set p to NULL so that kfree() may be called on it as a noop, the code is
      arguably clearer if we move the kfree() up closer to where it is
      called (instead of after goto jump).
      
      Remove a goto label 'err1' and jump to call to kobject_put() in error
      return from kobject_init_and_add() fixing the memory leak.  Re-name goto
      label 'put_back' to 'err1' now that we don't use err1, following current
      nomenclature (err1, err2 ...).  Move call to kfree out of the error
      code at bottom of function up to closer to where memory was allocated.
      Add comment to clarify call to kfree().
      Signed-off-by: NTobin C. Harding <tobin@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdfad5ae
  10. 10 5月, 2019 9 次提交
  11. 09 5月, 2019 5 次提交
  12. 08 5月, 2019 9 次提交
    • P
      net/sched: remove block pointer from common offload structure · d6787147
      Pieter Jansen van Vuuren 提交于
      Based on feedback from Jiri avoid carrying a pointer to the tcf_block
      structure in the tc_cls_common_offload structure. Instead store
      a flag in driver private data which indicates if offloads apply
      to a shared block at block binding time.
      Suggested-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NPieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
      Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6787147
    • P
      net: ethernet: support of_get_mac_address new ERR_PTR error · a51645f7
      Petr Štetiar 提交于
      There was NVMEM support added to of_get_mac_address, so it could now
      return ERR_PTR encoded error values, so we need to adjust all current
      users of of_get_mac_address to this new fact.
      
      While at it, remove superfluous is_valid_ether_addr as the MAC address
      returned from of_get_mac_address is always valid and checked by
      is_valid_ether_addr anyway.
      
      Fixes: d01f449c ("of_net: add NVMEM support to of_get_mac_address")
      Signed-off-by: NPetr Štetiar <ynezz@true.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a51645f7
    • P
      net: dsa: support of_get_mac_address new ERR_PTR error · 4974f9b7
      Petr Štetiar 提交于
      There was NVMEM support added to of_get_mac_address, so it could now
      return ERR_PTR encoded error values, so we need to adjust all current
      users of of_get_mac_address to this new fact.
      
      While at it, remove superfluous is_valid_ether_addr as the MAC address
      returned from of_get_mac_address is always valid and checked by
      is_valid_ether_addr anyway.
      
      Fixes: d01f449c ("of_net: add NVMEM support to of_get_mac_address")
      Signed-off-by: NPetr Štetiar <ynezz@true.cz>
      Tested-by: NVladimir Oltean <olteanv@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4974f9b7
    • S
      vrf: sit mtu should not be updated when vrf netdev is the link · ff6ab32b
      Stephen Suryaputra 提交于
      VRF netdev mtu isn't typically set and have an mtu of 65536. When the
      link of a tunnel is set, the tunnel mtu is changed from 1480 to the link
      mtu minus tunnel header. In the case of VRF netdev is the link, then the
      tunnel mtu becomes 65516. So, fix it by not setting the tunnel mtu in
      this case.
      Signed-off-by: NStephen Suryaputra <ssuryaextr@gmail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff6ab32b
    • Y
      net: dsa: Fix error cleanup path in dsa_init_module · 68be9302
      YueHaibing 提交于
      BUG: unable to handle kernel paging request at ffffffffa01c5430
      PGD 3270067 P4D 3270067 PUD 3271063 PMD 230bc5067 PTE 0
      Oops: 0000 [#1
      CPU: 0 PID: 6159 Comm: modprobe Not tainted 5.1.0+ #33
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
      RIP: 0010:raw_notifier_chain_register+0x16/0x40
      Code: 63 f8 66 90 e9 5d ff ff ff 90 90 90 90 90 90 90 90 90 90 90 55 48 8b 07 48 89 e5 48 85 c0 74 1c 8b 56 10 3b 50 10 7e 07 eb 12 <39> 50 10 7c 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46 08
      RSP: 0018:ffffc90001c33c08 EFLAGS: 00010282
      RAX: ffffffffa01c5420 RBX: ffffffffa01db420 RCX: 4fcef45928070a8b
      RDX: 0000000000000000 RSI: ffffffffa01db420 RDI: ffffffffa01b0068
      RBP: ffffc90001c33c08 R08: 000000003e0a33d0 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000094443661 R12: ffff88822c320700
      R13: ffff88823109be80 R14: 0000000000000000 R15: ffffc90001c33e78
      FS:  00007fab8bd08540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffa01c5430 CR3: 00000002297ea000 CR4: 00000000000006f0
      Call Trace:
       register_netdevice_notifier+0x43/0x250
       ? 0xffffffffa01e0000
       dsa_slave_register_notifier+0x13/0x70 [dsa_core
       ? 0xffffffffa01e0000
       dsa_init_module+0x2e/0x1000 [dsa_core
       do_one_initcall+0x6c/0x3cc
       ? do_init_module+0x22/0x1f1
       ? rcu_read_lock_sched_held+0x97/0xb0
       ? kmem_cache_alloc_trace+0x325/0x3b0
       do_init_module+0x5b/0x1f1
       load_module+0x1db1/0x2690
       ? m_show+0x1d0/0x1d0
       __do_sys_finit_module+0xc5/0xd0
       __x64_sys_finit_module+0x15/0x20
       do_syscall_64+0x6b/0x1d0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Cleanup allocated resourses if there are errors,
      otherwise it will trgger memleak.
      
      Fixes: c9eb3e0f ("net: dsa: Add support for learning FDB through notification")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reviewed-by: NVivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68be9302
    • Y
      l2tp: Fix possible NULL pointer dereference · 638a3a1e
      YueHaibing 提交于
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
      PGD 0 P4D 0
      Oops: 0000 [#1
      CPU: 0 PID: 5697 Comm: modprobe Tainted: G        W         5.1.0-rc7+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
      RIP: 0010:__lock_acquire+0x53/0x10b0
      Code: 8b 1c 25 40 5e 01 00 4c 8b 6d 10 45 85 e4 0f 84 bd 06 00 00 44 8b 1d 7c d2 09 02 49 89 fe 41 89 d2 45 85 db 0f 84 47 02 00 00 <48> 81 3f a0 05 70 83 b8 00 00 00 00 44 0f 44 c0 83 fe 01 0f 86 3a
      RSP: 0018:ffffc90001c07a28 EFLAGS: 00010002
      RAX: 0000000000000000 RBX: ffff88822f038440 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000128
      RBP: ffffc90001c07a88 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
      R13: 0000000000000000 R14: 0000000000000128 R15: 0000000000000000
      FS:  00007fead0811540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000128 CR3: 00000002310da000 CR4: 00000000000006f0
      Call Trace:
       ? __lock_acquire+0x24e/0x10b0
       lock_acquire+0xdf/0x230
       ? flush_workqueue+0x71/0x530
       flush_workqueue+0x97/0x530
       ? flush_workqueue+0x71/0x530
       l2tp_exit_net+0x170/0x2b0 [l2tp_core
       ? l2tp_exit_net+0x93/0x2b0 [l2tp_core
       ops_exit_list.isra.6+0x36/0x60
       unregister_pernet_operations+0xb8/0x110
       unregister_pernet_device+0x25/0x40
       l2tp_init+0x55/0x1000 [l2tp_core
       ? 0xffffffffa018d000
       do_one_initcall+0x6c/0x3cc
       ? do_init_module+0x22/0x1f1
       ? rcu_read_lock_sched_held+0x97/0xb0
       ? kmem_cache_alloc_trace+0x325/0x3b0
       do_init_module+0x5b/0x1f1
       load_module+0x1db1/0x2690
       ? m_show+0x1d0/0x1d0
       __do_sys_finit_module+0xc5/0xd0
       __x64_sys_finit_module+0x15/0x20
       do_syscall_64+0x6b/0x1d0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x7fead031a839
      Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffe8d9acca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 0000560078398b80 RCX: 00007fead031a839
      RDX: 0000000000000000 RSI: 000056007659dc2e RDI: 0000000000000003
      RBP: 000056007659dc2e R08: 0000000000000000 R09: 0000560078398b80
      R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
      R13: 00005600783a04a0 R14: 0000000000040000 R15: 0000560078398b80
      Modules linked in: l2tp_core(+) e1000 ip_tables ipv6 [last unloaded: l2tp_core
      CR2: 0000000000000128
      ---[ end trace 8322b2b8bf83f8e1
      
      If alloc_workqueue fails in l2tp_init, l2tp_net_ops
      is unregistered on failure path. Then l2tp_exit_net
      is called which will flush NULL workqueue, this patch
      add a NULL check to fix it.
      
      Fixes: 67e04c29 ("l2tp: unregister l2tp_net_ops on failure path")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Acked-by: NGuillaume Nault <gnault@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      638a3a1e
    • C
      taprio: add null check on sched_nest to avoid potential null pointer dereference · e4acf427
      Colin Ian King 提交于
      The call to nla_nest_start_noflag can return a null pointer and currently
      this is not being checked and this can lead to a null pointer dereference
      when the null pointer sched_nest is passed to function nla_nest_end. Fix
      this by adding in a null pointer check.
      
      Addresses-Coverity: ("Dereference null return value")
      Fixes: a3d43c0d ("taprio: Add support adding an admin schedule")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4acf427
    • E
      net_sched: sch_fq: handle non connected flows · 37c0aead
      Eric Dumazet 提交于
      FQ packet scheduler assumed that packets could be classified
      based on their owning socket.
      
      This means that if a UDP server uses one UDP socket to send
      packets to different destinations, packets all land
      in one FQ flow.
      
      This is unfair, since each TCP flow has a unique bucket, meaning
      that in case of pressure (fully utilised uplink), TCP flows
      have more share of the bandwidth.
      
      If we instead detect unconnected sockets, we can use a stochastic
      hash based on the 4-tuple hash.
      
      This also means a QUIC server using one UDP socket will properly
      spread the outgoing packets to different buckets, and in-kernel
      pacing based on EDT model will no longer risk having big rb-tree on
      one flow.
      
      Note that UDP application might provide the skb->hash in an
      ancillary message at sendmsg() time to avoid the cost of a dissection
      in fq packet scheduler.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37c0aead
    • E
      net_sched: sch_fq: do not assume EDT packets are ordered · eeb84aa0
      Eric Dumazet 提交于
      TCP stack makes sure packets for a given flow are monotically
      increasing, but we want to allow UDP packets to use EDT as
      well, so that QUIC servers can use in-kernel pacing.
      
      This patch adds a per-flow rb-tree on which packets might
      be stored. We still try to use the linear list for the
      typical cases where packets are queued with monotically
      increasing skb->tstamp, since queue/dequeue packets on
      a standard list is O(1).
      
      Note that the ability to store packets in arbitrary EDT
      order will allow us to implement later a per TCP socket
      mechanism adding delays (with jitter eventually) and reorders,
      to implement convenient network emulators.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeb84aa0