1. 31 10月, 2018 1 次提交
  2. 30 10月, 2018 10 次提交
  3. 29 10月, 2018 4 次提交
    • L
      net: diag: document swapped src/dst in udp_dump_one. · 747569b0
      Lorenzo Colitti 提交于
      Since its inception, udp_dump_one has had a bug where userspace
      needs to swap src and dst addresses and ports in order to find
      the socket it wants. This is because it passes the socket source
      address to __udp[46]_lib_lookup's saddr argument, but those
      functions are intended to find local sockets matching received
      packets, so saddr is the remote address, not the local address.
      
      This can no longer be fixed for backwards compatibility reasons,
      so add a brief comment explaining that this is the case. This
      will avoid confusion and help ensure SOCK_DIAG implementations
      of new protocols don't have the same problem.
      
      Fixes: a925aa00 ("udp_diag: Implement the get_exact dumping functionality")
      Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      747569b0
    • J
      net: sched: gred: pass the right attribute to gred_change_table_def() · 38b4f18d
      Jakub Kicinski 提交于
      gred_change_table_def() takes a pointer to TCA_GRED_DPS attribute,
      and expects it will be able to interpret its contents as
      struct tc_gred_sopt.  Pass the correct gred attribute, instead of
      TCA_OPTIONS.
      
      This bug meant the table definition could never be changed after
      Qdisc was initialized (unless whatever TCA_OPTIONS contained both
      passed netlink validation and was a valid struct tc_gred_sopt...).
      
      Old behaviour:
      $ ip link add type dummy
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      RTNETLINK answers: Invalid argument
      
      Now:
      $ ip link add type dummy
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      $ tc qdisc replace dev dummy0 parent root handle 7: \
           gred setup vqs 4 default 0
      
      Fixes: f62d6b93 ("[PKT_SCHED]: GRED: Use central VQ change procedure")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38b4f18d
    • N
      net: bridge: remove ipv6 zero address check in mcast queries · 0fe5119e
      Nikolay Aleksandrov 提交于
      Recently a check was added which prevents marking of routers with zero
      source address, but for IPv6 that cannot happen as the relevant RFCs
      actually forbid such packets:
      RFC 2710 (MLDv1):
      "To be valid, the Query message MUST
       come from a link-local IPv6 Source Address, be at least 24 octets
       long, and have a correct MLD checksum."
      
      Same goes for RFC 3810.
      
      And also it can be seen as a requirement in ipv6_mc_check_mld_query()
      which is used by the bridge to validate the message before processing
      it. Thus any queries with :: source address won't be processed anyway.
      So just remove the check for zero IPv6 source address from the query
      processing function.
      
      Fixes: 5a2de63f ("bridge: do not add port to router list when receives query with source 0.0.0.0")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fe5119e
    • D
      net: Properly unlink GRO packets on overflow. · ece23711
      David S. Miller 提交于
      Just like with normal GRO processing, we have to initialize
      skb->next to NULL when we unlink overflow packets from the
      GRO hash lists.
      
      Fixes: d4546c25 ("net: Convert GRO SKB handling to list_head.")
      Reported-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Tested-by: NOleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ece23711
  4. 27 10月, 2018 6 次提交
    • E
      net/neigh: fix NULL deref in pneigh_dump_table() · aab456df
      Eric Dumazet 提交于
      pneigh can have NULL device pointer, so we need to make
      neigh_master_filtered() and neigh_ifindex_filtered() more robust.
      
      syzbot report :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 15867 Comm: syz-executor2 Not tainted 4.19.0+ #276
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__read_once_size include/linux/compiler.h:179 [inline]
      RIP: 0010:list_empty include/linux/list.h:203 [inline]
      RIP: 0010:netdev_master_upper_dev_get+0xa1/0x250 net/core/dev.c:6467
      RSP: 0018:ffff8801bfaf7220 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000000000001 RCX: ffffc90005e92000
      RDX: 0000000000000016 RSI: ffffffff860b44d9 RDI: 0000000000000005
      RBP: ffff8801bfaf72b0 R08: ffff8801c4c84080 R09: fffffbfff139a580
      R10: fffffbfff139a580 R11: ffffffff89cd2c07 R12: 1ffff10037f5ee45
      R13: 0000000000000000 R14: ffff8801bfaf7288 R15: 00000000000000b0
      FS:  00007f65cc68d700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b33a21000 CR3: 00000001c6116000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       neigh_master_filtered net/core/neighbour.c:2367 [inline]
       pneigh_dump_table net/core/neighbour.c:2456 [inline]
       neigh_dump_info+0x7a9/0x1ce0 net/core/neighbour.c:2577
       netlink_dump+0x606/0x1080 net/netlink/af_netlink.c:2244
       __netlink_dump_start+0x59a/0x7c0 net/netlink/af_netlink.c:2352
       netlink_dump_start include/linux/netlink.h:216 [inline]
       rtnetlink_rcv_msg+0x809/0xc20 net/core/rtnetlink.c:4898
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2477
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4953
       netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
       netlink_unicast+0x5a5/0x760 net/netlink/af_netlink.c:1336
       netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1917
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       sock_write_iter+0x35e/0x5c0 net/socket.c:900
       call_write_iter include/linux/fs.h:1808 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6b8/0x9f0 fs/read_write.c:487
       vfs_write+0x1fc/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457569
      
      Fixes: 6f52f80e ("net/neigh: Extend dump filter to proxy neighbor dumps")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aab456df
    • D
      bpf: fix wrong helper enablement in cgroup local storage · d8fd9e10
      Daniel Borkmann 提交于
      Commit cd339431 ("bpf: introduce the bpf_get_local_storage()
      helper function") enabled the bpf_get_local_storage() helper also
      for BPF program types where it does not make sense to use them.
      
      They have been added both in sk_skb_func_proto() and sk_msg_func_proto()
      even though both program types are not invoked in combination with
      cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the
      bpf_cgroup_storage_set() is set shortly before BPF program invocation.
      
      Later, the helper bpf_get_local_storage() retrieves this prior set
      up per-cpu pointer and hands the buffer to the BPF program. The map
      argument in there solely retrieves the enum bpf_cgroup_storage_type
      from a local storage map associated with the program and based on the
      type returns either the global or per-cpu storage. However, there
      is no specific association between the program's map and the actual
      content in bpf_cgroup_storage[].
      
      Meaning, any BPF program that would have been properly run from the
      cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set()
      was performed, and that is later unloaded such that prog / maps are
      teared down will cause a use after free if that pointer is retrieved
      from programs that are not run through BPF_PROG_RUN_ARRAY() but have
      the cgroup local storage helper enabled in their func proto.
      
      Lets just remove it from the two sock_map program types to fix it.
      Auditing through the types where this helper is enabled, it appears
      that these are the only ones where it was mistakenly allowed.
      
      Fixes: cd339431 ("bpf: introduce the bpf_get_local_storage() helper function")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Roman Gushchin <guro@fb.com>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      d8fd9e10
    • M
      net: allow traceroute with a specified interface in a vrf · f64bf6b8
      Mike Manning 提交于
      Traceroute executed in a vrf succeeds if no device is given or if the
      vrf is given as the device, but fails if the interface is given as the
      device. This is for default UDP probes, it succeeds for TCP SYN or ICMP
      ECHO probes. As the skb bound dev is the interface and the sk dev is
      the vrf, sk lookup fails for ICMP_DEST_UNREACH and ICMP_TIME_EXCEEDED
      messages. The solution is for the secondary dev to be passed so that
      the interface is available for the device match to succeed, in the same
      way as is already done for non-error cases.
      Signed-off-by: NMike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f64bf6b8
    • H
      bridge: do not add port to router list when receives query with source 0.0.0.0 · 5a2de63f
      Hangbin Liu 提交于
      Based on RFC 4541, 2.1.1.  IGMP Forwarding Rules
      
        The switch supporting IGMP snooping must maintain a list of
        multicast routers and the ports on which they are attached.  This
        list can be constructed in any combination of the following ways:
      
        a) This list should be built by the snooping switch sending
           Multicast Router Solicitation messages as described in IGMP
           Multicast Router Discovery [MRDISC].  It may also snoop
           Multicast Router Advertisement messages sent by and to other
           nodes.
      
        b) The arrival port for IGMP Queries (sent by multicast routers)
           where the source address is not 0.0.0.0.
      
      We should not add the port to router list when receives query with source
      0.0.0.0.
      Reported-by: NYing Xu <yinxu@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a2de63f
    • K
      net/smc: fix smc_buf_unuse to use the lgr pointer · fb692ec4
      Karsten Graul 提交于
      The pointer to the link group is unset in the smc connection structure
      right before the call to smc_buf_unuse. Provide the lgr pointer to
      smc_buf_unuse explicitly.
      And move the call to smc_lgr_schedule_free_work to the end of
      smc_conn_free.
      
      Fixes: a6920d1d ("net/smc: handle unregistered buffers")
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NUrsula Braun <ubraun@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fb692ec4
    • S
      ipv6/ndisc: Preserve IPv6 control buffer if protocol error handlers are called · ee1abcf6
      Stefano Brivio 提交于
      Commit a61bbcf2 ("[NET]: Store skb->timestamp as offset to a base
      timestamp") introduces a neighbour control buffer and zeroes it out in
      ndisc_rcv(), as ndisc_recv_ns() uses it.
      
      Commit f2776ff0 ("[IPV6]: Fix address/interface handling in UDP and
      DCCP, according to the scoping architecture.") introduces the usage of the
      IPv6 control buffer in protocol error handlers (e.g. inet6_iif() in
      present-day __udp6_lib_err()).
      
      Now, with commit b94f1c09 ("ipv6: Use icmpv6_notify() to propagate
      redirect, instead of rt6_redirect()."), we call protocol error handlers
      from ndisc_redirect_rcv(), after the control buffer is already stolen and
      some parts are already zeroed out. This implies that inet6_iif() on this
      path will always return zero.
      
      This gives unexpected results on UDP socket lookup in __udp6_lib_err(), as
      we might actually need to match sockets for a given interface.
      
      Instead of always claiming the control buffer in ndisc_rcv(), do that only
      when needed.
      
      Fixes: b94f1c09 ("ipv6: Use icmpv6_notify() to propagate redirect, instead of rt6_redirect().")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee1abcf6
  5. 26 10月, 2018 4 次提交
  6. 25 10月, 2018 7 次提交
    • D
      net/ipv6: Allow onlink routes to have a device mismatch if it is the default route · 4ed591c8
      David Ahern 提交于
      The intent of ip6_route_check_nh_onlink is to make sure the gateway
      given for an onlink route is not actually on a connected route for
      a different interface (e.g., 2001:db8:1::/64 is on dev eth1 and then
      an onlink route has a via 2001:db8:1::1 dev eth2). If the gateway
      lookup hits the default route then it most likely will be a different
      interface than the onlink route which is ok.
      
      Update ip6_route_check_nh_onlink to disregard the device mismatch
      if the gateway lookup hits the default route. Turns out the existing
      onlink tests are passing because there is no default route or it is
      an unreachable default, so update the onlink tests to have a default
      route other than unreachable.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ed591c8
    • D
      net: sched: Remove TCA_OPTIONS from policy · e72bde6b
      David Ahern 提交于
      Marco reported an error with hfsc:
      root@Calimero:~# tc qdisc add dev eth0 root handle 1:0 hfsc default 1
      Error: Attribute failed policy validation.
      
      Apparently a few implementations pass TCA_OPTIONS as a binary instead
      of nested attribute, so drop TCA_OPTIONS from the policy.
      
      Fixes: 8b4c3cdd ("net: sched: Add policy validation for tc attributes")
      Reported-by: NMarco Berizzi <pupilla@libero.it>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e72bde6b
    • S
      net: udp: fix handling of CHECKSUM_COMPLETE packets · db4f1be3
      Sean Tranchetti 提交于
      Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
      incorrect for any packet that has an incorrect checksum value.
      
      udp4/6_csum_init() will both make a call to
      __skb_checksum_validate_complete() to initialize/validate the csum
      field when receiving a CHECKSUM_COMPLETE packet. When this packet
      fails validation, skb->csum will be overwritten with the pseudoheader
      checksum so the packet can be fully validated by software, but the
      skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
      the stack can later warn the user about their hardware spewing bad
      checksums. Unfortunately, leaving the SKB in this state can cause
      problems later on in the checksum calculation.
      
      Since the the packet is still marked as CHECKSUM_COMPLETE,
      udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
      from skb->csum instead of adding it, leaving us with a garbage value
      in that field. Once we try to copy the packet to userspace in the
      udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
      to checksum the packet data and add it in the garbage skb->csum value
      to perform our final validation check.
      
      Since the value we're validating is not the proper checksum, it's possible
      that the folded value could come out to 0, causing us not to drop the
      packet. Instead, we believe that the packet was checksummed incorrectly
      by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
      to warn the user with netdev_rx_csum_fault(skb->dev);
      
      Unfortunately, since this is the UDP path, skb->dev has been overwritten
      by skb->dev_scratch and is no longer a valid pointer, so we end up
      reading invalid memory.
      
      This patch addresses this problem in two ways:
      	1) Do not use the dev pointer when calling netdev_rx_csum_fault()
      	   from skb_copy_and_csum_datagram_msg(). Since this gets called
      	   from the UDP path where skb->dev has been overwritten, we have
      	   no way of knowing if the pointer is still valid. Also for the
      	   sake of consistency with the other uses of
      	   netdev_rx_csum_fault(), don't attempt to call it if the
      	   packet was checksummed by software.
      
      	2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
      	   If we receive a packet that's CHECKSUM_COMPLETE that fails
      	   verification (i.e. skb->csum_valid == 0), check who performed
      	   the calculation. It's possible that the checksum was done in
      	   software by the network stack earlier (such as Netfilter's
      	   CONNTRACK module), and if that says the checksum is bad,
      	   we can drop the packet immediately instead of waiting until
      	   we try and copy it to userspace. Otherwise, we need to
      	   mark the SKB as CHECKSUM_NONE, since the skb->csum field
      	   no longer contains the full packet checksum after the
      	   call to __skb_checksum_validate_complete().
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Fixes: c84d9490 ("udp: copy skb->truesize in the first cache line")
      Cc: Sam Kumar <samanthakumar@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NSean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db4f1be3
    • D
      net: rtnl_dump_all needs to propagate error from dumpit function · c63586dc
      David Ahern 提交于
      If an address, route or netconf dump request is sent for AF_UNSPEC, then
      rtnl_dump_all is used to do the dump across all address families. If one
      of the dumpit functions fails (e.g., invalid attributes in the dump
      request) then rtnl_dump_all needs to propagate that error so the user
      gets an appropriate response instead of just getting no data.
      
      Fixes: effe6792 ("net: Enable kernel side filtering of route dumps")
      Fixes: 5fcd266a ("net/ipv4: Add support for dumping addresses for a specific device")
      Fixes: 6371a71f ("net/ipv6: Add support for dumping addresses for a specific device")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c63586dc
    • D
      net: Don't return invalid table id error when dumping all families · ae677bbb
      David Ahern 提交于
      When doing a route dump across all address families, do not error out
      if the table does not exist. This allows a route dump for AF_UNSPEC
      with a table id that may only exist for some of the families.
      
      Do return the table does not exist error if dumping routes for a
      specific family and the table does not exist.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae677bbb
    • D
      net/ipv6: Put target net when address dump fails due to bad attributes · 242afaa6
      David Ahern 提交于
      If tgt_net is set based on IFA_TARGET_NETNSID attribute in the dump
      request, make sure all error paths call put_net.
      
      Fixes: 6371a71f ("net/ipv6: Add support for dumping addresses for a specific device")
      Fixes: ed6eff11 ("net/ipv6: Update inet6_dump_addr for strict data checking")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      242afaa6
    • D
      net/ipv4: Put target net when address dump fails due to bad attributes · d7e38611
      David Ahern 提交于
      If tgt_net is set based on IFA_TARGET_NETNSID attribute in the dump
      request, make sure all error paths call put_net.
      
      Fixes: 5fcd266a ("net/ipv4: Add support for dumping addresses for a specific device")
      Fixes: c33078e3 ("net/ipv4: Update inet_dump_ifaddr for strict data checking")
      Reported-by: NLi RongQing <lirongqing@baidu.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7e38611
  7. 24 10月, 2018 7 次提交
    • E
      tcp: add tcp_reset_xmit_timer() helper · 3f80e08f
      Eric Dumazet 提交于
      With EDT model, SRTT no longer is inflated by pacing delays.
      
      This means that RTO and some other xmit timers might be setup
      incorrectly. This is particularly visible with either :
      
      - Very small enforced pacing rates (SO_MAX_PACING_RATE)
      - Reduced rto (from the default 200 ms)
      
      This can lead to TCP flows aborts in the worst case,
      or spurious retransmits in other cases.
      
      For example, this session gets far more throughput
      than the requested 80kbit :
      
      $ netperf -H 127.0.0.2 -l 100 -- -q 10000
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.2 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
      540000 262144 262144    104.00      2.66
      
      With the fix :
      
      $ netperf -H 127.0.0.2 -l 100 -- -q 10000
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 127.0.0.2 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
      540000 262144 262144    104.00      0.12
      
      EDT allows for better control of rtx timers, since TCP has
      a better idea of the earliest departure time of each skb
      in the rtx queue. We only have to eventually add to the
      timer the difference of the EDT time with current time.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f80e08f
    • M
      cgroup, netclassid: add a preemption point to write_classid · a90e90b7
      Michal Hocko 提交于
      We have seen a customer complaining about soft lockups on !PREEMPT
      kernel config with 4.4 based kernel
      
      [1072141.435366] NMI watchdog: BUG: soft lockup - CPU#21 stuck for 22s! [systemd:1]
      [1072141.444090] Modules linked in: mpt3sas raid_class binfmt_misc af_packet 8021q garp mrp stp llc xfs libcrc32c bonding iscsi_ibft iscsi_boot_sysfs msr ext4 crc16 jbd2 mbcache cdc_ether usbnet mii joydev hid_generic usbhid intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ipmi_ssif mgag200 i2c_algo_bit ttm ipmi_devintf drbg ixgbe drm_kms_helper vxlan ansi_cprng ip6_udp_tunnel drm aesni_intel udp_tunnel aes_x86_64 iTCO_wdt syscopyarea ptp xhci_pci lrw iTCO_vendor_support pps_core gf128mul ehci_pci glue_helper sysfillrect mdio pcspkr sb_edac ablk_helper cryptd ehci_hcd sysimgblt xhci_hcd fb_sys_fops edac_core mei_me lpc_ich ses usbcore enclosure dca mfd_core ipmi_si mei i2c_i801 scsi_transport_sas usb_common ipmi_msghandler shpchp fjes wmi processor button acpi_pad btrfs xor raid6_pq sd_mod crc32c_intel megaraid_sas sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod md_mod autofs4
      [1072141.444146] Supported: Yes
      [1072141.444149] CPU: 21 PID: 1 Comm: systemd Not tainted 4.4.121-92.80-default #1
      [1072141.444150] Hardware name: LENOVO Lenovo System x3650 M5 -[5462P4U]- -[5462P4U]-/01GR451, BIOS -[TCE136H-2.70]- 06/13/2018
      [1072141.444151] task: ffff880191bd0040 ti: ffff880191bd4000 task.ti: ffff880191bd4000
      [1072141.444153] RIP: 0010:[<ffffffff815229f9>]  [<ffffffff815229f9>] update_classid_sock+0x29/0x40
      [1072141.444157] RSP: 0018:ffff880191bd7d58  EFLAGS: 00000286
      [1072141.444158] RAX: ffff883b177cb7c0 RBX: 0000000000000000 RCX: 0000000000000000
      [1072141.444159] RDX: 00000000000009c7 RSI: ffff880191bd7d5c RDI: ffff8822e29bb200
      [1072141.444160] RBP: ffff883a72230980 R08: 0000000000000101 R09: 0000000000000000
      [1072141.444161] R10: 0000000000000008 R11: f000000000000000 R12: ffffffff815229d0
      [1072141.444162] R13: 0000000000000000 R14: ffff881fd0a47ac0 R15: ffff880191bd7f28
      [1072141.444163] FS:  00007f3e2f1eb8c0(0000) GS:ffff882000340000(0000) knlGS:0000000000000000
      [1072141.444164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [1072141.444165] CR2: 00007f3e2f200000 CR3: 0000001ffea4e000 CR4: 00000000001606f0
      [1072141.444166] Stack:
      [1072141.444166]  ffffffa800000246 00000000000009c7 ffffffff8121d583 ffff8818312a05c0
      [1072141.444168]  ffff8818312a1100 ffff880197c3b280 ffff881861422858 ffffffffffffffea
      [1072141.444170]  ffffffff81522b1c ffffffff81d0ca20 ffff8817fa17b950 ffff883fdd8121e0
      [1072141.444171] Call Trace:
      [1072141.444179]  [<ffffffff8121d583>] iterate_fd+0x53/0x80
      [1072141.444182]  [<ffffffff81522b1c>] write_classid+0x4c/0x80
      [1072141.444187]  [<ffffffff8111328b>] cgroup_file_write+0x9b/0x100
      [1072141.444193]  [<ffffffff81278bcb>] kernfs_fop_write+0x11b/0x150
      [1072141.444198]  [<ffffffff81201566>] __vfs_write+0x26/0x100
      [1072141.444201]  [<ffffffff81201bed>] vfs_write+0x9d/0x190
      [1072141.444203]  [<ffffffff812028c2>] SyS_write+0x42/0xa0
      [1072141.444207]  [<ffffffff815f58c3>] entry_SYSCALL_64_fastpath+0x1e/0xca
      [1072141.445490] DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xca
      
      If a cgroup has many tasks with many open file descriptors then we would
      end up in a large loop without any rescheduling point throught the
      operation. Add cond_resched once per task.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      a90e90b7
    • K
      Revert "net: simplify sock_poll_wait" · 89ab066d
      Karsten Graul 提交于
      This reverts commit dd979b4d.
      
      This broke tcp_poll for SMC fallback: An AF_SMC socket establishes an
      internal TCP socket for the initial handshake with the remote peer.
      Whenever the SMC connection can not be established this TCP socket is
      used as a fallback. All socket operations on the SMC socket are then
      forwarded to the TCP socket. In case of poll, the file->private_data
      pointer references the SMC socket because the TCP socket has no file
      assigned. This causes tcp_poll to wait on the wrong socket.
      Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89ab066d
    • T
    • T
      79b18181
    • T
      SUNRPC: Simplify lookup code · 07d02a67
      Trond Myklebust 提交于
      We no longer need to worry about whether or not the entry is hashed in
      order to figure out if the contents are valid. We only care whether or
      not the refcount is non-zero.
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      07d02a67
    • T
      95cd6232
  8. 23 10月, 2018 1 次提交
    • E
      llc: do not use sk_eat_skb() · 604d415e
      Eric Dumazet 提交于
      syzkaller triggered a use-after-free [1], caused by a combination of
      skb_get() in llc_conn_state_process() and usage of sk_eat_skb()
      
      sk_eat_skb() is assuming the skb about to be freed is only used by
      the current thread. TCP/DCCP stacks enforce this because current
      thread holds the socket lock.
      
      llc_conn_state_process() wants to make sure skb does not disappear,
      and holds a reference on the skb it manipulates. But as soon as this
      skb is added to socket receive queue, another thread can consume it.
      
      This means that llc must use regular skb_unlink() and kfree_skb()
      so that both producer and consumer can safely work on the same skb.
      
      [1]
      BUG: KASAN: use-after-free in atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
      BUG: KASAN: use-after-free in refcount_read include/linux/refcount.h:43 [inline]
      BUG: KASAN: use-after-free in skb_unref include/linux/skbuff.h:967 [inline]
      BUG: KASAN: use-after-free in kfree_skb+0xb7/0x580 net/core/skbuff.c:655
      Read of size 4 at addr ffff8801d1f6fba4 by task ksoftirqd/1/18
      
      CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.19.0-rc8+ #295
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b6 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       check_memory_region_inline mm/kasan/kasan.c:260 [inline]
       check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267
       kasan_check_read+0x11/0x20 mm/kasan/kasan.c:272
       atomic_read include/asm-generic/atomic-instrumented.h:21 [inline]
       refcount_read include/linux/refcount.h:43 [inline]
       skb_unref include/linux/skbuff.h:967 [inline]
       kfree_skb+0xb7/0x580 net/core/skbuff.c:655
       llc_sap_state_process+0x9b/0x550 net/llc/llc_sap.c:224
       llc_sap_rcv+0x156/0x1f0 net/llc/llc_sap.c:297
       llc_sap_handler+0x65e/0xf80 net/llc/llc_sap.c:438
       llc_rcv+0x79e/0xe20 net/llc/llc_input.c:208
       __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
       process_backlog+0x218/0x6f0 net/core/dev.c:5829
       napi_poll net/core/dev.c:6249 [inline]
       net_rx_action+0x7c5/0x1950 net/core/dev.c:6315
       __do_softirq+0x30c/0xb03 kernel/softirq.c:292
       run_ksoftirqd+0x94/0x100 kernel/softirq.c:653
       smpboot_thread_fn+0x68b/0xa00 kernel/smpboot.c:164
       kthread+0x35a/0x420 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:413
      
      Allocated by task 18:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc_node+0x144/0x730 mm/slab.c:3644
       __alloc_skb+0x119/0x770 net/core/skbuff.c:193
       alloc_skb include/linux/skbuff.h:995 [inline]
       llc_alloc_frame+0xbc/0x370 net/llc/llc_sap.c:54
       llc_station_ac_send_xid_r net/llc/llc_station.c:52 [inline]
       llc_station_rcv+0x1dc/0x1420 net/llc/llc_station.c:111
       llc_rcv+0xc32/0xe20 net/llc/llc_input.c:220
       __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4913
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5023
       process_backlog+0x218/0x6f0 net/core/dev.c:5829
       napi_poll net/core/dev.c:6249 [inline]
       net_rx_action+0x7c5/0x1950 net/core/dev.c:6315
       __do_softirq+0x30c/0xb03 kernel/softirq.c:292
      
      Freed by task 16383:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x83/0x290 mm/slab.c:3756
       kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
       __kfree_skb+0x1d/0x20 net/core/skbuff.c:642
       sk_eat_skb include/net/sock.h:2366 [inline]
       llc_ui_recvmsg+0xec2/0x1610 net/llc/af_llc.c:882
       sock_recvmsg_nosec net/socket.c:794 [inline]
       sock_recvmsg+0xd0/0x110 net/socket.c:801
       ___sys_recvmsg+0x2b6/0x680 net/socket.c:2278
       __sys_recvmmsg+0x303/0xb90 net/socket.c:2390
       do_sys_recvmmsg+0x181/0x1a0 net/socket.c:2466
       __do_sys_recvmmsg net/socket.c:2484 [inline]
       __se_sys_recvmmsg net/socket.c:2480 [inline]
       __x64_sys_recvmmsg+0xbe/0x150 net/socket.c:2480
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801d1f6fac0
       which belongs to the cache skbuff_head_cache of size 232
      The buggy address is located 228 bytes inside of
       232-byte region [ffff8801d1f6fac0, ffff8801d1f6fba8)
      The buggy address belongs to the page:
      page:ffffea000747dbc0 count:1 mapcount:0 mapping:ffff8801d9be7680 index:0xffff8801d1f6fe80
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0007346e88 ffffea000705b108 ffff8801d9be7680
      raw: ffff8801d1f6fe80 ffff8801d1f6f0c0 000000010000000b 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801d1f6fa80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8801d1f6fb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8801d1f6fb80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
                                     ^
       ffff8801d1f6fc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801d1f6fc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      604d415e