1. 29 12月, 2017 3 次提交
  2. 28 12月, 2017 7 次提交
  3. 27 12月, 2017 12 次提交
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · 65bbbf6c
      David S. Miller 提交于
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2017-12-22
      
      1) Check for valid id proto in validate_tmpl(), otherwise
         we may trigger a warning in xfrm_state_fini().
         From Cong Wang.
      
      2) Fix a typo on XFRMA_OUTPUT_MARK policy attribute.
         From Michal Kubecek.
      
      3) Verify the state is valid when encap_type < 0,
         otherwise we may crash on IPsec GRO .
         From Aviv Heller.
      
      4) Fix stack-out-of-bounds read on socket policy lookup.
         We access the flowi of the wrong address family in the
         IPv4 mapped IPv6 case, fix this by catching address
         family missmatches before we do the lookup.
      
      5) fix xfrm_do_migrate() with AEAD to copy the geniv
         field too. Otherwise the state is not fully initialized
         and migration fails. From Antony Antony.
      
      6) Fix stack-out-of-bounds with misconfigured transport
         mode policies. Our policy template validation is not
         strict enough. It is possible to configure policies
         with transport mode template where the address family
         of the template does not match the selectors address
         family. Fix this by refusing such a configuration,
         address family can not change on transport mode.
      
      7) Fix a policy reference leak when reusing pcpu xdst
         entry. From Florian Westphal.
      
      8) Reinject transport-mode packets through tasklet,
         otherwise it is possible to reate a recursion
         loop. From Herbert Xu.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65bbbf6c
    • F
      net: fec: unmap the xmit buffer that are not transferred by DMA · 178e5f57
      Fugang Duan 提交于
      The enet IP only support 32 bit, it will use swiotlb buffer to do dma
      mapping when xmit buffer DMA memory address is bigger than 4G in i.MX
      platform. After stress suspend/resume test, it will print out:
      
      log:
      [12826.352864] fec 5b040000.ethernet: swiotlb buffer is full (sz: 191 bytes)
      [12826.359676] DMA: Out of SW-IOMMU space for 191 bytes at device 5b040000.ethernet
      [12826.367110] fec 5b040000.ethernet eth0: Tx DMA memory map failed
      
      The issue is that the ready xmit buffers that are dma mapped but DMA still
      don't copy them into fifo, once MAC restart, these DMA buffers are not unmapped.
      So it should check the dma mapping buffer and unmap them.
      Signed-off-by: NFugang Duan <fugang.duan@nxp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      178e5f57
    • T
      tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path · 642a8439
      Tommi Rantala 提交于
      Calling tipc_mon_delete() before the monitor has been created will oops.
      This can happen in tipc_enable_bearer() error path if tipc_disc_create()
      fails.
      
      [   48.589074] BUG: unable to handle kernel paging request at 0000000000001008
      [   48.590266] IP: tipc_mon_delete+0xea/0x270 [tipc]
      [   48.591223] PGD 1e60c5067 P4D 1e60c5067 PUD 1eb0cf067 PMD 0
      [   48.592230] Oops: 0000 [#1] SMP KASAN
      [   48.595610] CPU: 5 PID: 1199 Comm: tipc Tainted: G    B            4.15.0-rc4-pc64-dirty #5
      [   48.597176] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      [   48.598489] RIP: 0010:tipc_mon_delete+0xea/0x270 [tipc]
      [   48.599347] RSP: 0018:ffff8801d827f668 EFLAGS: 00010282
      [   48.600705] RAX: ffff8801ee813f00 RBX: 0000000000000204 RCX: 0000000000000000
      [   48.602183] RDX: 1ffffffff1de6a75 RSI: 0000000000000297 RDI: 0000000000000297
      [   48.604373] RBP: 0000000000000000 R08: 0000000000000000 R09: fffffbfff1dd1533
      [   48.605607] R10: ffffffff8eafbb05 R11: fffffbfff1dd1534 R12: 0000000000000050
      [   48.607082] R13: dead000000000200 R14: ffffffff8e73f310 R15: 0000000000001020
      [   48.608228] FS:  00007fc686484800(0000) GS:ffff8801f5540000(0000) knlGS:0000000000000000
      [   48.610189] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   48.611459] CR2: 0000000000001008 CR3: 00000001dda70002 CR4: 00000000003606e0
      [   48.612759] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   48.613831] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   48.615038] Call Trace:
      [   48.615635]  tipc_enable_bearer+0x415/0x5e0 [tipc]
      [   48.620623]  tipc_nl_bearer_enable+0x1ab/0x200 [tipc]
      [   48.625118]  genl_family_rcv_msg+0x36b/0x570
      [   48.631233]  genl_rcv_msg+0x5a/0xa0
      [   48.631867]  netlink_rcv_skb+0x1cc/0x220
      [   48.636373]  genl_rcv+0x24/0x40
      [   48.637306]  netlink_unicast+0x29c/0x350
      [   48.639664]  netlink_sendmsg+0x439/0x590
      [   48.642014]  SYSC_sendto+0x199/0x250
      [   48.649912]  do_syscall_64+0xfd/0x2c0
      [   48.650651]  entry_SYSCALL64_slow_path+0x25/0x25
      [   48.651843] RIP: 0033:0x7fc6859848e3
      [   48.652539] RSP: 002b:00007ffd25dff938 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   48.654003] RAX: ffffffffffffffda RBX: 00007ffd25dff990 RCX: 00007fc6859848e3
      [   48.655303] RDX: 0000000000000054 RSI: 00007ffd25dff990 RDI: 0000000000000003
      [   48.656512] RBP: 00007ffd25dff980 R08: 00007fc685c35fc0 R09: 000000000000000c
      [   48.657697] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000d13010
      [   48.658840] R13: 00007ffd25e009c0 R14: 0000000000000000 R15: 0000000000000000
      [   48.662972] RIP: tipc_mon_delete+0xea/0x270 [tipc] RSP: ffff8801d827f668
      [   48.664073] CR2: 0000000000001008
      [   48.664576] ---[ end trace e811818d54d5ce88 ]---
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTommi Rantala <tommi.t.rantala@nokia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      642a8439
    • T
      tipc: error path leak fixes in tipc_enable_bearer() · 19142551
      Tommi Rantala 提交于
      Fix memory leak in tipc_enable_bearer() if enable_media() fails, and
      cleanup with bearer_disable() if tipc_mon_create() fails.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Acked-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NTommi Rantala <tommi.t.rantala@nokia.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19142551
    • A
      RDS: Check cmsg_len before dereferencing CMSG_DATA · 14e138a8
      Avinash Repaka 提交于
      RDS currently doesn't check if the length of the control message is
      large enough to hold the required data, before dereferencing the control
      message data. This results in following crash:
      
      BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013
      [inline]
      BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90
      net/rds/send.c:1066
      Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157
      
      CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
       rds_rdma_bytes net/rds/send.c:1013 [inline]
       rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066
       sock_sendmsg_nosec net/socket.c:628 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:638
       ___sys_sendmsg+0x320/0x8b0 net/socket.c:2018
       __sys_sendmmsg+0x1ee/0x620 net/socket.c:2108
       SYSC_sendmmsg net/socket.c:2139 [inline]
       SyS_sendmmsg+0x35/0x60 net/socket.c:2134
       entry_SYSCALL_64_fastpath+0x1f/0x96
      RIP: 0033:0x43fe49
      RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49
      RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0
      R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000
      
      To fix this, we verify that the cmsg_len is large enough to hold the
      data to be read, before proceeding further.
      Reported-by: Nsyzbot <syzkaller-bugs@googlegroups.com>
      Signed-off-by: NAvinash Repaka <avinash.repaka@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Reviewed-by: NYuval Shaia <yuval.shaia@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14e138a8
    • M
      tcp: Avoid preprocessor directives in tracepoint macro args · 6a6b0b99
      Mat Martineau 提交于
      Using a preprocessor directive to check for CONFIG_IPV6 in the middle of
      a DECLARE_EVENT_CLASS macro's arg list causes sparse to report a series
      of errors:
      
      ./include/trace/events/tcp.h:68:1: error: directive in argument list
      ./include/trace/events/tcp.h:75:1: error: directive in argument list
      ./include/trace/events/tcp.h:144:1: error: directive in argument list
      ./include/trace/events/tcp.h:151:1: error: directive in argument list
      ./include/trace/events/tcp.h:216:1: error: directive in argument list
      ./include/trace/events/tcp.h:223:1: error: directive in argument list
      ./include/trace/events/tcp.h:274:1: error: directive in argument list
      ./include/trace/events/tcp.h:281:1: error: directive in argument list
      
      Once sparse finds an error, it stops printing warnings for the file it
      is checking. This masks any sparse warnings that would normally be
      reported for the core TCP code.
      
      Instead, handle the preprocessor conditionals in a couple of auxiliary
      macros. This also has the benefit of reducing duplicate code.
      
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a6b0b99
    • J
      tipc: fix memory leak of group member when peer node is lost · 3a33a19b
      Jon Maloy 提交于
      When a group member receives a member WITHDRAW event, this might have
      two reasons: either the peer member is leaving the group, or the link
      to the member's node has been lost.
      
      In the latter case we need to issue a DOWN event to the user right away,
      and let function tipc_group_filter_msg() perform delete of the member
      item. However, in this case we miss to change the state of the member
      item to MBR_LEAVING, so the member item is not deleted, and we have a
      memory leak.
      
      We now separate better between the four sub-cases of a WITHRAW event
      and make sure that each case is handled correctly.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a33a19b
    • J
      net: sched: fix possible null pointer deref in tcf_block_put · 4853f128
      Jiri Pirko 提交于
      We need to check block for being null in both tcf_block_put and
      tcf_block_put_ext.
      
      Fixes: 343723dd ("net: sched: fix clsact init error path")
      Reported-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4853f128
    • J
      tipc: base group replicast ack counter on number of actual receivers · 0a3d805c
      Jon Maloy 提交于
      In commit 2f487712 ("tipc: guarantee that group broadcast doesn't
      bypass group unicast") we introduced a mechanism that requires the first
      (replicated) broadcast sent after a unicast to be acknowledged by all
      receivers before permitting sending of the next (true) broadcast.
      
      The counter for keeping track of the number of acknowledges to expect
      is based on the tipc_group::member_cnt variable. But this misses that
      some of the known members may not be ready for reception, and will never
      acknowledge the message, either because they haven't fully joined the
      group or because they are leaving the group. Such members are identified
      by not fulfilling the condition tested for in the function
      tipc_group_is_enabled().
      
      We now set the counter for the actual number of acks to receive at the
      moment the message is sent, by just counting the number of recipients
      satisfying the tipc_group_is_enabled() test.
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a3d805c
    • C
      net_sched: fix a missing rcu barrier in mini_qdisc_pair_swap() · b2fb01f4
      Cong Wang 提交于
      The rcu_barrier_bh() in mini_qdisc_pair_swap() is to wait for
      flying RCU callback installed by a previous mini_qdisc_pair_swap(),
      however we miss it on the tp_head==NULL path, which leads to that
      the RCU callback still uses miniq_old->rcu after it is freed together
      with qdisc in qdisc_graft(). So just add it on that path too.
      
      Fixes: 46209401 ("net: core: introduce mini_Qdisc and eliminate usage of tp->q for clsact fastpath ")
      Reported-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Tested-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b2fb01f4
    • G
      net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround · c1a8d0a3
      Grygorii Strashko 提交于
      Under some circumstances driver will perform PHY reset in
      ksz9031_read_status() to fix autoneg failure case (idle error count =
      0xFF). When this happens ksz9031 will not detect link status change any
      more when connecting to Netgear 1G switch (link can be recovered sometimes by
      restarting netdevice "ifconfig down up"). Reproduced with TI am572x board
      equipped with ksz9031 PHY while connecting to Netgear 1G switch.
      
      Fix the issue by reconfiguring autonegotiation after PHY reset in
      ksz9031_read_status().
      
      Fixes: d2fd719b ("net/phy: micrel: Add workaround for bad autoneg")
      Signed-off-by: NGrygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c1a8d0a3
    • A
      ip6_gre: fix device features for ioctl setup · e5a9336a
      Alexey Kodanev 提交于
      When ip6gre is created using ioctl, its features, such as
      scatter-gather, GSO and tx-checksumming will be turned off:
      
        # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1
        # ethtool -k gre6 (truncated output)
          tx-checksumming: off
          scatter-gather: off
          tcp-segmentation-offload: off
          generic-segmentation-offload: off [requested on]
      
      But when netlink is used, they will be enabled:
        # ip link add gre6 type ip6gre remote fd00::1
        # ethtool -k gre6 (truncated output)
          tx-checksumming: on
          scatter-gather: on
          tcp-segmentation-offload: on
          generic-segmentation-offload: on
      
      This results in a loss of performance when gre6 is created via ioctl.
      The issue was found with LTP/gre tests.
      
      Fix it by moving the setup of device features to a separate function
      and invoke it with ndo_init callback because both netlink and ioctl
      will eventually call it via register_netdevice():
      
         register_netdevice()
             - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init()
                 - ip6gre_tunnel_init_common()
                      - ip6gre_tnl_init_features()
      
      The moved code also contains two minor style fixes:
        * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line.
        * fixed the issue reported by checkpatch: "Unnecessary parentheses around
          'nt->encap.type == TUNNEL_ENCAP_NONE'"
      
      Fixes: ac4eb009 ("ip6gre: Add support for basic offloads offloads excluding GSO")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5a9336a
  4. 26 12月, 2017 2 次提交
  5. 23 12月, 2017 3 次提交
  6. 22 12月, 2017 11 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ead68f21
      Linus Torvalds 提交于
      Pull networking fixes from David Miller"
       "What's a holiday weekend without some networking bug fixes? [1]
      
         1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function
            calls, from Daniel Borkmann.
      
         2) Fix regression from errata limiting change to marvell PHY driver,
            from Zhao Qiang.
      
         3) Fix u16 overflow in SCTP, from Xin Long.
      
         4) Fix potential memory leak during bridge newlink, from Nikolay
            Aleksandrov.
      
         5) Fix BPF selftest build on s390, from Hendrik Brueckner.
      
         6) Don't append to cfg80211 automatically generated certs file,
            always write new ones from scratch. From Thierry Reding.
      
         7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai.
      
         8) Fix hang on tg3 MTU change with certain chips, from Brian King.
      
         9) Add stall detection to arc emac driver and reset chip when this
            happens, from Alexander Kochetkov.
      
        10) Fix MTU limitng in GRE tunnel drivers, from Xin Long.
      
        11) Fix stmmac timestamping bug due to mis-shifting of field. From
            Fredrik Hallenberg.
      
        12) Fix metrics match when deleting an ipv4 route. The kernel sets
            some internal metrics bits which the user isn't going to set when
            it makes the delete request. From Phil Sutter.
      
        13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix
            from Yelena Krivosheev.
      
        14) Fix double free and memory corruption in get_net_ns_by_id, from
            Eric W. Biederman.
      
        15) Flush ipv4 FIB tables in the reverse order. Some tables can share
            their actual backing data, in particular this happens for the MAIN
            and LOCAL tables. We have to kill the LOCAL table first, because
            it uses MAIN's backing memory. Fix from Ido Schimmel.
      
        16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann
            Horn, and Alexei Starovoitov.
      
        17) Make changes to ipv6 autoflowlabel sysctl really propagate to
            sockets, unless the socket has set the per-socket value
            explicitly. From Shaohua Li.
      
        18) Fix leaks and double callback invocations of zerocopy SKBs, from
            Willem de Bruijn"
      
      [1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
        skbuff: skb_copy_ubufs must release uarg even without user frags
        skbuff: orphan frags before zerocopy clone
        net: reevalulate autoflowlabel setting after sysctl setting
        openvswitch: Fix pop_vlan action for double tagged frames
        ipv6: Honor specified parameters in fibmatch lookup
        bpf: do not allow root to mangle valid pointers
        selftests/bpf: add tests for recent bugfixes
        bpf: fix integer overflows
        bpf: don't prune branches when a scalar is replaced with a pointer
        bpf: force strict alignment checks for stack pointers
        bpf: fix missing error return in check_stack_boundary()
        bpf: fix 32-bit ALU op verification
        bpf: fix incorrect tracking of register size truncation
        bpf: fix incorrect sign extension in check_alu_op()
        bpf/verifier: fix bounds calculation on BPF_RSH
        ipv4: Fix use-after-free when flushing FIB tables
        s390/qeth: fix error handling in checksum cmd callback
        tipc: remove joining group member from congested list
        selftests: net: Adding config fragment CONFIG_NUMA=y
        nfp: bpf: keep track of the offloaded program
        ...
      ead68f21
    • Q
      selftests/bpf: fix Makefile for passing LLC to the command line · cd95a892
      Quentin Monnet 提交于
      Makefile has a LLC variable that is initialised to "llc", but can
      theoretically be overridden from the command line ("make LLC=llc-6.0").
      However, this fails because for LLVM probe check, "llc" is called
      directly. Use the $(LLC) variable instead to fix this.
      
      Fixes: 22c88526 ("bpf: improve selftests and add tests for meta pointer")
      Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      cd95a892
    • D
      Merge branch 'net-zerocopy-fixes' · c50b7c47
      David S. Miller 提交于
      Saeed Mahameed says:
      
      ===================
      Mellanox, mlx5 fixes 2017-12-19
      
      The follwoing series includes some fixes for mlx5 core and etherent
      driver.
      
      Please pull and let me know if there is any problem.
      
      This series doesn't introduce any conflict with the ongoing mlx5 for-next
      submission.
      
      For -stable:
      
      kernels >= v4.7.y
          ("net/mlx5e: Fix possible deadlock of VXLAN lock")
          ("net/mlx5e: Add refcount to VXLAN structure")
          ("net/mlx5e: Prevent possible races in VXLAN control flow")
          ("net/mlx5e: Fix features check of IPv6 traffic")
      
      kernels >= v4.9.y
          ("net/mlx5: Fix error flow in CREATE_QP command")
          ("net/mlx5: Fix rate limit packet pacing naming and struct")
      
      kernels >= v4.13.y
          ("net/mlx5: FPGA, return -EINVAL if size is zero")
      
      kernels >= v4.14.y
          ("Revert "mlx5: move affinity hints assignments to generic code")
      
      All above patches apply and compile with no issues on corresponding -stable.
      ===================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c50b7c47
    • W
      skbuff: skb_copy_ubufs must release uarg even without user frags · b90ddd56
      Willem de Bruijn 提交于
      skb_copy_ubufs creates a private copy of frags[] to release its hold
      on user frags, then calls uarg->callback to notify the owner.
      
      Call uarg->callback even when no frags exist. This edge case can
      happen when zerocopy_sg_from_iter finds enough room in skb_headlen
      to copy all the data.
      
      Fixes: 3ece7826 ("sock: skb_copy_ubufs support for compound pages")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b90ddd56
    • W
      skbuff: orphan frags before zerocopy clone · 268b7906
      Willem de Bruijn 提交于
      Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate
      calls to skb_uarg(skb)->callback for the same data.
      
      skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb
      with each segment. This is only safe for uargs that do refcounting,
      which is those that pass skb_orphan_frags without dropping their
      shared frags. For others, skb_orphan_frags drops the user frags and
      sets the uarg to NULL, after which sock_zerocopy_clone has no effect.
      
      Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback
      calls for the same data causing the vhost_net_ubuf_ref_>refcount to
      drop below zero.
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com>
      Fixes: 1f8b977a ("sock: enable MSG_ZEROCOPY")
      Reported-by: NAndreas Hartmann <andihartmann@01019freenet.de>
      Reported-by: NDavid Hill <dhill@redhat.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      268b7906
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 9035a896
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
       "It's been a few weeks, so here's a small collection of fixes that
        should go into the current series.
      
        This contains:
      
         - NVMe pull request from Christoph, with a few important fixes.
      
         - kyber hang fix from Omar.
      
         - A blk-throttl fix from Shaohua, fixing a case where we double
           charge a bio.
      
         - Two call_single_data alignment fixes from me, fixing up some
           unfortunate changes that went into 4.14 without being properly
           reviewed on the block side (since nobody was CC'ed on the
           patch...).
      
         - A bounce buffer fix in two parts, one from me and one from Ming.
      
         - Revert bdi debug error handling patch. It's causing boot issues for
           some folks, and a week down the line, we're still no closer to a
           fix. Revert this patch for now until it's figured out, then we can
           retry for 4.16"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        Revert "bdi: add error handle for bdi_debug_register"
        null_blk: unalign call_single_data
        block: unalign call_single_data in struct request
        block-throttle: avoid double charge
        block: fix blk_rq_append_bio
        block: don't let passthrough IO go into .make_request_fn()
        nvme: setup streams after initializing namespace head
        nvme: check hw sectors before setting chunk sectors
        nvme: call blk_integrity_unregister after queue is cleaned up
        nvme-fc: remove double put reference if admin connect fails
        nvme: set discard_alignment to zero
        kyber: fix another domain token wait queue hang
      9035a896
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 409232a4
      Linus Torvalds 提交于
      Pull KVM fixes from Paolo Bonzini:
       "ARM fixes:
         - A bug in handling of SPE state for non-vhe systems
         - A fix for a crash on system shutdown
         - Three timer fixes, introduced by the timer optimizations for v4.15
      
        x86 fixes:
         - fix for a WARN that was introduced in 4.15
         - fix for SMM when guest uses PCID
         - fixes for several bugs found by syzkaller
      
        ... and a dozen papercut fixes for the kvm_stat tool"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
        tools/kvm_stat: sort '-f help' output
        kvm: x86: fix RSM when PCID is non-zero
        KVM: Fix stack-out-of-bounds read in write_mmio
        KVM: arm/arm64: Fix timer enable flow
        KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state
        KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC
        KVM: arm/arm64: Fix HYP unmapping going off limits
        arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu
        KVM/x86: Check input paging mode when cs.l is set
        tools/kvm_stat: add line for totals
        tools/kvm_stat: stop ignoring unhandled arguments
        tools/kvm_stat: suppress usage information on command line errors
        tools/kvm_stat: handle invalid regular expressions
        tools/kvm_stat: add hint on '-f help' to man page
        tools/kvm_stat: fix child trace events accounting
        tools/kvm_stat: fix extra handling of 'help' with fields filter
        tools/kvm_stat: fix missing field update after filter change
        tools/kvm_stat: fix drilldown in events-by-guests mode
        tools/kvm_stat: fix command line option '-g'
        kvm: x86: fix WARN due to uninitialized guest FPU state
        ...
      409232a4
    • S
      net: reevalulate autoflowlabel setting after sysctl setting · 513674b5
      Shaohua Li 提交于
      sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
      If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
      supposed to not include flowlabel. This is true for normal packet, but
      not for reset packet.
      
      The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
      we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
      changed, so the sock will keep the old behavior in terms of auto
      flowlabel. Reset packet is suffering from this problem, because reset
      packet is sent from a special control socket, which is created at boot
      time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
      socket will always have its ipv6_pinfo.autoflowlabel set, even after
      user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
      have flowlabel. Normal sock created before sysctl setting suffers from
      the same issue. We can't even turn off autoflowlabel unless we kill all
      socks in the hosts.
      
      To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
      autoflowlabel setting from user, otherwise we always call
      ip6_default_np_autolabel() which has the new settings of sysctl.
      
      Note, this changes behavior a little bit. Before commit 42240901
      (ipv6: Implement different admin modes for automatic flow labels), the
      autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
      existing connection will change autoflowlabel behavior. After that
      commit, autoflowlabel behavior is sticky in the whole life of the sock.
      With this patch, the behavior isn't sticky again.
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      513674b5
    • E
      openvswitch: Fix pop_vlan action for double tagged frames · c48e7473
      Eric Garver 提交于
      skb_vlan_pop() expects skb->protocol to be a valid TPID for double
      tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop()
      shift the true ethertype into position for us.
      
      Fixes: 5108bbad ("openvswitch: add processing of L3 packets")
      Signed-off-by: NEric Garver <e@erig.me>
      Reviewed-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c48e7473
    • J
      Revert "bdi: add error handle for bdi_debug_register" · 6d0e4827
      Jens Axboe 提交于
      This reverts commit a0747a85.
      
      It breaks some booting for some users, and more than a week
      into this, there's still no good fix. Revert this commit
      for now until a solution has been found.
      Reported-by: NLaura Abbott <labbott@redhat.com>
      Reported-by: NBruno Wolff III <bruno@wolff.to>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6d0e4827
    • I
      ipv6: Honor specified parameters in fibmatch lookup · 58acfd71
      Ido Schimmel 提交于
      Currently, parameters such as oif and source address are not taken into
      account during fibmatch lookup. Example (IPv4 for reference) before
      patch:
      
      $ ip -4 route show
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1
      
      $ ip -6 route show
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium
      fe80::/64 dev dummy0 proto kernel metric 256 pref medium
      fe80::/64 dev dummy1 proto kernel metric 256 pref medium
      
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy0
      192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1
      $ ip -4 route get fibmatch 192.0.2.2 oif dummy1
      RTNETLINK answers: No route to host
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      
      After:
      
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0
      2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium
      $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1
      RTNETLINK answers: Network is unreachable
      
      The problem stems from the fact that the necessary route lookup flags
      are not set based on these parameters.
      
      Instead of duplicating the same logic for fibmatch, we can simply
      resolve the original route from its copy and dump it instead.
      
      Fixes: 18c3a61c ("net: ipv6: RTM_GETROUTE: return matched fib result when requested")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58acfd71
  7. 21 12月, 2017 2 次提交