1. 30 Aug 2018: 16 commits
    • ipv6: fix cleanup ordering for ip6_mr failure · afe49de4
      Committed by Sabrina Dubroca
      Commit 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
      moved the cleanup label for ipmr_fail, but should have changed the
      contents of the cleanup labels as well. Now we can end up cleaning up
      icmpv6 even though it hasn't been initialized (jump to icmp_fail or
      ipmr_fail).
      
      Simply undo things in the reverse order of their initialization.
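A minimal userspace sketch of the reverse-order unwinding this fix restores: each failure label undoes only the steps that already succeeded, newest first, so nothing is cleaned up before it was initialized. The step names, `fake_fail_step`, `cleanup_log` and `bad_cleanup` are hypothetical stand-ins for illustration, not code from inet6_init().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for three init steps (e.g. icmpv6, ip6_mr, ...). */
enum { STEP_A, STEP_B, STEP_C, NSTEPS };

static bool initialized[NSTEPS];
static int fake_fail_step = -1;   /* step whose init should fail, -1 = none */
static int cleanup_log[NSTEPS];   /* order in which cleanups ran */
static size_t ncleanups;
static bool bad_cleanup;          /* set if we undo a step that never ran */

static int step_init(int step)
{
	if (step == fake_fail_step)
		return -1;
	initialized[step] = true;
	return 0;
}

static void step_cleanup(int step)
{
	if (!initialized[step])
		bad_cleanup = true;       /* the bug class the commit fixes */
	initialized[step] = false;
	cleanup_log[ncleanups++] = step;
}

/* Each label undoes exactly what succeeded before the jump, newest first. */
int stack_init(void)
{
	int err;

	err = step_init(STEP_A);
	if (err)
		goto out;
	err = step_init(STEP_B);
	if (err)
		goto fail_a;
	err = step_init(STEP_C);
	if (err)
		goto fail_b;
	return 0;

fail_b:
	step_cleanup(STEP_B);
fail_a:
	step_cleanup(STEP_A);
out:
	return err;
}
```

Faking a failure of the last step runs the cleanups for STEP_B and then STEP_A, and never touches STEP_C.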
      
      Example of panic (triggered by faking a failure of icmpv6_init):
      
          kasan: GPF could be caused by NULL-ptr deref or user memory access
          general protection fault: 0000 [#1] PREEMPT SMP KASAN PTI
          [...]
          RIP: 0010:__list_del_entry_valid+0x79/0x160
          [...]
          Call Trace:
           ? lock_release+0x8a0/0x8a0
           unregister_pernet_operations+0xd4/0x560
           ? ops_free_list+0x480/0x480
           ? down_write+0x91/0x130
           ? unregister_pernet_subsys+0x15/0x30
           ? down_read+0x1b0/0x1b0
           ? up_read+0x110/0x110
           ? kmem_cache_create_usercopy+0x1b4/0x240
           unregister_pernet_subsys+0x1d/0x30
           icmpv6_cleanup+0x1d/0x30
           inet6_init+0x1b5/0x23f
      
      Fixes: 15e66807 ("ipv6: reorder icmpv6_init() and ip6_mr_init()")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: act_pedit: fix dump of extended layered op · 85eb9af1
      Committed by Davide Caratti
      In the (rare) case of failure in nla_nest_start(), missing NULL checks in
      tcf_pedit_key_ex_dump() can make the following command
      
       # tc action add action pedit ex munge ip ttl set 64
      
      dereference a NULL pointer:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
       PGD 800000007d1cd067 P4D 800000007d1cd067 PUD 7acd3067 PMD 0
       Oops: 0002 [#1] SMP PTI
       CPU: 0 PID: 3336 Comm: tc Tainted: G            E     4.18.0.pedit+ #425
       Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
       RIP: 0010:tcf_pedit_dump+0x19d/0x358 [act_pedit]
       Code: be 02 00 00 00 48 89 df 66 89 44 24 20 e8 9b b1 fd e0 85 c0 75 46 8b 83 c8 00 00 00 49 83 c5 08 48 03 83 d0 00 00 00 4d 39 f5 <66> 89 04 25 00 00 00 00 0f 84 81 01 00 00 41 8b 45 00 48 8d 4c 24
       RSP: 0018:ffffb5d4004478a8 EFLAGS: 00010246
       RAX: ffff8880fcda2070 RBX: ffff8880fadd2900 RCX: 0000000000000000
       RDX: 0000000000000002 RSI: ffffb5d4004478ca RDI: ffff8880fcda206e
       RBP: ffff8880fb9cb900 R08: 0000000000000008 R09: ffff8880fcda206e
       R10: ffff8880fadd2900 R11: 0000000000000000 R12: ffff8880fd26cf40
       R13: ffff8880fc957430 R14: ffff8880fc957430 R15: ffff8880fb9cb988
       FS:  00007f75a537a740(0000) GS:ffff8880fda00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000000000000 CR3: 000000007a2fa005 CR4: 00000000001606f0
       Call Trace:
        ? __nla_reserve+0x38/0x50
        tcf_action_dump_1+0xd2/0x130
        tcf_action_dump+0x6a/0xf0
        tca_get_fill.constprop.31+0xa3/0x120
        tcf_action_add+0xd1/0x170
        tc_ctl_action+0x137/0x150
        rtnetlink_rcv_msg+0x263/0x2d0
        ? _cond_resched+0x15/0x40
        ? rtnl_calcit.isra.30+0x110/0x110
        netlink_rcv_skb+0x4d/0x130
        netlink_unicast+0x1a3/0x250
        netlink_sendmsg+0x2ae/0x3a0
        sock_sendmsg+0x36/0x40
        ___sys_sendmsg+0x26f/0x2d0
        ? do_wp_page+0x8e/0x5f0
        ? handle_pte_fault+0x6c3/0xf50
        ? __handle_mm_fault+0x38e/0x520
        ? __sys_sendmsg+0x5e/0xa0
        __sys_sendmsg+0x5e/0xa0
        do_syscall_64+0x5b/0x180
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7f75a4583ba0
       Code: c3 48 8b 05 f2 62 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d fd c3 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
       RSP: 002b:00007fff60ee7418 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
       RAX: ffffffffffffffda RBX: 00007fff60ee7540 RCX: 00007f75a4583ba0
       RDX: 0000000000000000 RSI: 00007fff60ee7490 RDI: 0000000000000003
       RBP: 000000005b842d3e R08: 0000000000000002 R09: 0000000000000000
       R10: 00007fff60ee6ea0 R11: 0000000000000246 R12: 0000000000000000
       R13: 00007fff60ee7554 R14: 0000000000000001 R15: 000000000066c100
       Modules linked in: act_pedit(E) ip6table_filter ip6_tables iptable_filter binfmt_misc crct10dif_pclmul ext4 crc32_pclmul mbcache ghash_clmulni_intel jbd2 pcbc snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm aesni_intel crypto_simd snd_timer cryptd glue_helper snd joydev pcspkr soundcore virtio_balloon i2c_piix4 nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net net_failover virtio_blk virtio_console failover qxl crc32c_intel drm_kms_helper syscopyarea serio_raw sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix virtio_pci libata virtio_ring i2c_core virtio floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: act_pedit]
       CR2: 0000000000000000
      
      Like it's done for other TC actions, give up dumping pedit rules and return
      an error if nla_nest_start() returns NULL.
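A userspace sketch of the pattern described above, assuming a hypothetical fixed-size message buffer in place of an skb: `nest_start()` returns NULL when there is no room (as nla_nest_start() can), and the dump bails out with an error instead of dereferencing it. The helper names, the 4-byte header size and the choice of -EMSGSIZE are assumptions of the sketch; the real fix lives in tcf_pedit_key_ex_dump() and uses the kernel's netlink helpers.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical fixed-size message buffer standing in for an skb. */
struct msg {
	size_t len;
	size_t cap;
};

/* Hypothetical nest marker: remembers where the nest header was written. */
struct nest {
	size_t start;
};

#define HDR_LEN 4   /* assumed size of a nest header / u32 attribute */

/* Returns NULL when the buffer is full, like nla_nest_start() can. */
static struct nest *nest_start(struct msg *m, struct nest *slot)
{
	if (m->len + HDR_LEN > m->cap)
		return NULL;
	slot->start = m->len;
	m->len += HDR_LEN;
	return slot;
}

/* Trim everything written since the nest began. */
static void nest_cancel(struct msg *m, const struct nest *n)
{
	m->len = n->start;
}

static int put_u32(struct msg *m)
{
	if (m->len + HDR_LEN > m->cap)
		return -EMSGSIZE;
	m->len += HDR_LEN;
	return 0;
}

/* Dump n keys, each in its own nest, all inside one outer nest. */
int dump_keys(struct msg *m, int n)
{
	struct nest outer_slot, inner_slot;
	struct nest *outer = nest_start(m, &outer_slot);

	if (!outer)                       /* the previously missing NULL check */
		return -EMSGSIZE;
	for (; n > 0; n--) {
		struct nest *inner = nest_start(m, &inner_slot);

		if (!inner || put_u32(m))     /* give up instead of writing via NULL */
			goto fail;
	}
	return 0;
fail:
	nest_cancel(m, outer);
	return -EMSGSIZE;
}
```

Cancelling the outer nest on failure leaves the buffer as it was before the dump started, so a caller sees either a complete set of keys or an error.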
      
      Fixes: 71d0ed70 ("net/act_pedit: Support using offset relative to the conventional network headers")
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: Add R7S9210 support · 6e0bb04d
      Committed by Chris Brandt
      Add support for the R7S9210 which is part of the RZ/A2 series.
      Signed-off-by: Chris Brandt <chris.brandt@renesas.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'hns-fixes' · def70b61
      Committed by David S. Miller
      Peng Li says:
      
      ====================
      net: hns: fix some bugs about speed and duplex change
      
      If there are packets in hardware when changing the speed
      or duplex, it may cause hardware hang up.
      
      This patchset adds code that waits for the chip to flush all
      packets (TX & RX) when the driver runs its "adjust link"
      function.

      This patchset cleans the packets as follows:
      1) close rx of chip, close tx of protocol stack.
      2) wait rcb, ppe, mac to clean.
      3) adjust link
      4) open rx of chip, open tx of protocol stack.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: add netif_carrier_off before change speed and duplex · 455c4401
      Committed by Peng Li
      If there are packets in hardware when changing the speed
      or duplex, it may cause hardware hang up.
      
      This patch calls netif_carrier_off() before changing speed and
      duplex in ethtool_ops.set_link_ksettings, and calls
      netif_carrier_on() after the change completes.
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns: add the code for cleaning pkt in chip · 31fabbee
      Committed by Peng Li
      If there are packets in hardware when changing the speed
      or duplex, it may cause hardware hang up.
      
      This patch adds code that waits for the chip to flush all
      packets (TX & RX) when the driver runs its "adjust link"
      function.

      This patch cleans the packets as follows:
      1) close rx of chip, close tx of protocol stack.
      2) wait rcb, ppe, mac to clean.
      3) adjust link
      4) open rx of chip, open tx of protocol stack.
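The four-step sequence above, sketched against a hypothetical device model in which each of the RCB, PPE and MAC blocks drains one queued packet per poll. `struct hw`, `hw_tick()` and the `max_ticks` timeout are assumptions for illustration; in particular, skipping the link change when draining times out is a choice made by this sketch, not something stated in the patch.

```c
#include <stdbool.h>

/* Hypothetical device model: packets still held by each hardware block. */
struct hw {
	int rcb_pkts, ppe_pkts, mac_pkts;
	bool rx_enabled;         /* chip RX */
	bool tx_queue_running;   /* protocol-stack TX */
	int speed;
};

/* One poll interval: each block drains at most one packet. */
static void hw_tick(struct hw *hw)
{
	if (hw->rcb_pkts)
		hw->rcb_pkts--;
	if (hw->ppe_pkts)
		hw->ppe_pkts--;
	if (hw->mac_pkts)
		hw->mac_pkts--;
}

static bool hw_idle(const struct hw *hw)
{
	return !hw->rcb_pkts && !hw->ppe_pkts && !hw->mac_pkts;
}

/* Returns 0 on success, -1 if the blocks did not drain in max_ticks polls. */
int adjust_link(struct hw *hw, int new_speed, int max_ticks)
{
	int ticks;
	bool drained;

	/* 1) close RX of the chip and TX of the protocol stack */
	hw->rx_enabled = false;
	hw->tx_queue_running = false;

	/* 2) wait for RCB, PPE and MAC to drain, bounded by max_ticks */
	for (ticks = 0; !hw_idle(hw) && ticks < max_ticks; ticks++)
		hw_tick(hw);
	drained = hw_idle(hw);

	/* 3) adjust the link (this sketch only does so once drained) */
	if (drained)
		hw->speed = new_speed;

	/* 4) reopen RX of the chip and TX of the protocol stack */
	hw->rx_enabled = true;
	hw->tx_queue_running = true;
	return drained ? 0 : -1;
}
```

Bounding the wait keeps a stuck block from hanging the caller forever, while RX and TX are reopened on both paths.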
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • r8169: set RxConfig after tx/rx is enabled for RTL8169sb/8110sb devices · 05212ba8
      Committed by Azat Khuzhin
      I have two Ethernet adapters:
        r8169 0000:03:01.0 eth0: RTL8169sb/8110sb, 00:14:d1:14:2d:49, XID 10000000, IRQ 18
        r8169 0000:01:00.0 eth0: RTL8168e/8111e, 64:66:b3:11:14:5d, XID 2c200000, IRQ 30
      After upgrading from Linux 4.15 [1] to Linux 4.18+ [2], the RTL8169sb fails to
      receive any packets; tcpdump shows a lot of checksum mismatches.
      
        [1]: a0f79386
        [2]: 05193597 (4.19 merge window opened)
      
      I started bisecting and found that [3] breaks it. According to [4]:
        "For 8110S, 8110SB, and 8110SC series, the initial value of RxConfig
        needs to be set after the tx/rx is enabled."
      So I moved rtl_init_rxcfg() after enabling tx/rx and now my adapter works
      (RTL8168e works too).
      
        [3]: 3559d81e
        [4]: e542a226 ("r8169: adjust the RxConfig settings.")
      
      Also drop "rx" from rtl_set_rx_tx_config_registers(), since it does nothing
      with it already.
      
      Fixes: 3559d81e ("r8169: simplify rtl_hw_start_8169")
      
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
      Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: switch to rhashtable iterator · 9a07efa9
      Committed by Cong Wang
      syzbot reported a use-after-free in tipc_group_fill_sock_diag(),
      where tipc_group_fill_sock_diag() still reads tsk->group while
      tipc_group_delete() deletes it in tipc_release().

      tipc_nl_sk_walk() aims to lock each sock while walking the hash
      table, to close race conditions with sock changes like this one, by
      acquiring the tsk->sk.sk_lock.slock spinlock; unfortunately, this
      doesn't work at all. All non-BH call paths should take lock_sock()
      instead to make it work.

      tipc_nl_sk_walk() iterates with a raw rht_for_each_entry_rcu(),
      which requires the RCU read lock; this is why lock_sock() can't be
      taken on this path. This can be resolved by switching to the
      rhashtable iterator APIs, where taking a sleepable lock is possible.
      The iterator APIs are also friendly to restartable calls like diag
      dump: the last position is remembered behind the scenes, so all we
      need to do here is save the iterator into cb->args[].
      
      I tested this with parallel tipc diag dump and thousands of tipc
      socket creation and release, no crash or memory leak.
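The "remember the last position in cb->args[]" idea behind restartable dumps can be sketched in plain C with a hypothetical `struct dump_cb` standing in for the netlink callback. For simplicity the sketch stores an array index; the patch instead saves a pointer to an rhashtable iterator, which is also what lets it drop RCU and take a sleepable lock between steps, something an index-based sketch does not model.

```c
#include <stddef.h>

/* Hypothetical stand-in for a netlink dump callback: args[] persists
 * across the repeated dump calls of one dump session. */
struct dump_cb {
	long args[6];
};

/* Emit at most `room` entries per call, resuming where the previous call
 * stopped. Returns the number emitted; 0 means the dump is complete. */
size_t dump_table(const int *table, size_t n, struct dump_cb *cb,
		  int *out, size_t room)
{
	size_t pos = (size_t)cb->args[0];
	size_t emitted = 0;

	while (pos < n && emitted < room)
		out[emitted++] = table[pos++];

	cb->args[0] = (long)pos;   /* remember the position for the next call */
	return emitted;
}
```

Each call fills one "message" worth of entries; the caller keeps calling until it gets 0 back.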
      
      Reported-by: syzbot+b9c8f3ab2994b7cd1625@syzkaller.appspotmail.com
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit" · e5133f2f
      Committed by Jerome Brunet
      This reverts commit 4ae0169f.
      
      This change in the handling of the coalesce timer is causing a regression
      on (at least) Amlogic platforms.

      The network breaks down very quickly (within a few seconds) after
      starting a download. This can easily be reproduced with iperf3, for
      example.

      The problem has been reported on the S805, S905, S912 and A113 SoCs
      (Realtek and Micrel PHYs), and it is likely affecting all Amlogic
      platforms using Gbit Ethernet.

      No problem was seen on the platform using 10/100-only PHYs (GXL internal).

      Reverting the change brings things back to normal and makes the network
      usable again until we better understand the problem with the coalesce timer.
      
      Cc: Jose Abreu <joabreu@synopsys.com>
      Cc: Joao Pinto <jpinto@synopsys.com>
      Cc: Vitor Soares <soares@synopsys.com>
      Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>
      Cc: Alexandre Torgue <alexandre.torgue@st.com>
      Cc: Corentin Labbe <clabbe@baylibre.com>
      Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix a missing rhashtable_walk_exit() · bd583fe3
      Committed by Cong Wang
      rhashtable_walk_exit() must be paired with rhashtable_walk_enter().
      
      Fixes: 40f9f439 ("tipc: Fix tipc_sk_reinit race conditions")
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vti6: remove !skb->ignore_df check from vti6_xmit() · 9f289546
      Committed by Alexey Kodanev
      Before the commit d6990976 ("vti6: fix PMTU caching and reporting
      on xmit") '!skb->ignore_df' check was always true because the function
      skb_scrub_packet() was called before it, resetting ignore_df to zero.
      
      In that commit, skb_scrub_packet() was moved below the check, so now
      the check can be false for a packet, e.g. when it is sent in two
      fragments, which prevents successful PMTU updates in that case.
      Subsequent attempts to send the packet lead to the same tx error.
      Moreover, the initial vti6 MTU value relies on PMTU adjustments.
      
      This issue can be reproduced with the following LTP test script:
          udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000
      
      Fixes: ccd740cb ("vti6: Add pmtu handling to vti6_xmit.")
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 6a5d39aa
      Committed by David S. Miller
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-08-29
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Fix a build error in sk_reuseport_convert_ctx_access() when
         compiling with clang which cannot resolve hweight_long() at
         build time inside the BUILD_BUG_ON() assertion, from Stefan.
      
      2) Several fixes for BPF sockmap, four of them in getting the
         bpf_msg_pull_data() helper to work, one use after free case
         in bpf_tcp_close() and one refcount leak in bpf_tcp_recvmsg(),
         from Daniel.
      
      3) Another fix for BPF sockmap where we misaccount sk_mem_uncharge()
         in the socket redirect error case from unwinding scatterlist
         twice, from John.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bpf_msg_pull_data-fixes' · d65e6c80
      Committed by Alexei Starovoitov
      Daniel Borkmann says:
      
      ====================
      This set contains three more fixes for the bpf_msg_pull_data()
      mainly for correcting scatterlist ring wrap-arounds as well as
      fixing up data pointers. For details please see individual patches.
      Thanks!
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix sg shift repair start offset in bpf_msg_pull_data · a8cf76a9
      Committed by Daniel Borkmann
      When we perform the sg shift repair for the scatterlist ring, we
      currently start out at i = first_sg + 1. However, this is not
      correct since the first_sg could point to the sge sitting at slot
      MAX_SKB_FRAGS - 1, and a subsequent i = MAX_SKB_FRAGS will access
      the scatterlist ring (sg) out of bounds. Add the sk_msg_iter_var()
      helper for iterating through the ring, and apply the same rule
      for advancing to the next ring element as we do elsewhere. Later
      work will use this helper also in other places.
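A sketch of a ring-advancing helper of the kind described above, next to the off-by-one it replaces. `MAX_SKB_FRAGS` is fixed at 17 here as an assumption (its value depends on the page size), and `next_slot()`/`buggy_next_slot()` are hypothetical names for illustration.

```c
/* Assumption: 17 slots, the usual MAX_SKB_FRAGS value with 4 KiB pages. */
#define MAX_SKB_FRAGS 17

/* Advance a ring index by one slot, wrapping from the last slot to 0. */
#define sk_msg_iter_var(var)			\
	do {					\
		(var)++;			\
		if ((var) == MAX_SKB_FRAGS)	\
			(var) = 0;		\
	} while (0)

/* Old start of the repair loop: overruns the ring when first_sg is the last slot. */
int buggy_next_slot(int first_sg)
{
	return first_sg + 1;
}

/* Fixed start of the repair loop: wraps like every other ring walk. */
int next_slot(int first_sg)
{
	int i = first_sg;

	sk_msg_iter_var(i);
	return i;
}
```

With first_sg in the last slot, the old computation yields MAX_SKB_FRAGS, one past the end of the array, while the helper wraps to slot 0.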
      
      Fixes: 015632bb ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix shift upon scatterlist ring wrap-around in bpf_msg_pull_data · 2e43f95d
      Committed by Daniel Borkmann
      If first_sg and last_sg wrap around in the scatterlist ring, then we
      need to account for that in the shift as well. Crafting msgs where
      this is the case leads to a hang, as shift becomes negative. For
      example, consider the following scenario:
      
        first_sg := 14     |=>    shift := -12     msg->sg_start := 10
        last_sg  :=  3     |                       msg->sg_end   :=  5
      
      round  1:  i := 15, move_from :=   3, sg[15] := sg[  3]
      round  2:  i :=  0, move_from := -12, sg[ 0] := sg[-12]
      round  3:  i :=  1, move_from := -11, sg[ 1] := sg[-11]
      round  4:  i :=  2, move_from := -10, sg[ 2] := sg[-10]
      [...]
      round 13:  i := 11, move_from :=  -1, sg[11] := sg[ -1]
      round 14:  i := 12, move_from :=   0, sg[12] := sg[  0]
      round 15:  i := 13, move_from :=   1, sg[13] := sg[  1]
      round 16:  i := 14, move_from :=   2, sg[14] := sg[  2]
      round 17:  i := 15, move_from :=   3, sg[15] := sg[  3]
      [...]
      
      This means we will loop forever and never hit the msg->sg_end condition
      to break out of the loop. When we see that the ring wraps around, then
      the shift should be MAX_SKB_FRAGS - first_sg + last_sg - 1. Meaning,
      the remainder slots from the tail of the ring and the head until last_sg
      combined.
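The corrected shift computation as a small C function. `sg_shift()` is a hypothetical name, and `MAX_SKB_FRAGS` is assumed to be 17 (its value depends on the page size). With first_sg = 14 and last_sg = 3, the wrap-around branch counts the slots strictly between them (15, 16, 0, 1 and 2), i.e. 5, where the non-wrapping formula alone gives -12.

```c
/* Assumption: 17 ring slots, the usual MAX_SKB_FRAGS value with 4 KiB pages. */
#define MAX_SKB_FRAGS 17

/* Number of ring slots strictly between first_sg and last_sg. */
int sg_shift(int first_sg, int last_sg)
{
	if (last_sg > first_sg)
		return last_sg - first_sg - 1;             /* no wrap-around */
	return MAX_SKB_FRAGS - first_sg + last_sg - 1;     /* ring wrapped */
}
```

The result is never negative, so the repair loop always terminates once it reaches msg->sg_end.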
      
      Fixes: 015632bb ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix msg->data/data_end after sg shift repair in bpf_msg_pull_data · 0e06b227
      Committed by Daniel Borkmann
      In the current code, msg->data is set as sg_virt(&sg[i]) + start - offset
      and msg->data_end relative to it as msg->data + bytes. Using iterator i
      to point to the updated starting scatterlist element holds true for some
      cases, however not for all where we'd end up pointing out of bounds. It
      is /correct/ for these ones:
      
      1) When first finding the starting scatterlist element (sge) where we
         find that the page is already privately owned by the msg and where
         the requested bytes and headroom fit into the sge's length.
      
      However, it's /incorrect/ for the following ones:
      
      2) After we made the requested area private and updated the newly allocated
         page into first_sg slot of the scatterlist ring; when we find that no
         shift repair of the ring is needed where we bail out updating msg->data
         and msg->data_end. At that point i will point to last_sg, which in this
         case is the next elem of first_sg in the ring. The sge at that point
         might as well be invalid (e.g. i == msg->sg_end), which we use for
         setting the range of sg_virt(&sg[i]). The correct one would have been
         first_sg.
      
      3) Similar to 2), but when we find that a shift repair of the ring is
         needed. In this case we fix up all sges and stop once we've reached the
         end. In this case i will point to the new msg->sg_end,
         and the sge at that point will be invalid. Again here the requested
         range sits in first_sg.
      
      Fixes: 015632bb ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  2. 29 Aug 2018: 1 commit
    • bpf: fix several offset tests in bpf_msg_pull_data · 5b24109b
      Committed by Daniel Borkmann
      While recently going over bpf_msg_pull_data(), I noticed three
      issues which are fixed in here:
      
      1) When we attempt to find the first scatterlist element (sge)
         for the start offset, we add len to the offset before we check
         for start < offset + len, whereas it should come after when
         we iterate to the next sge to accumulate the offsets. For
         example, a start offset of 12 with an sge length of 8 for the
         first sge in the list would lead us to pick this sge as the
         first sge, thinking it covers the first 16 bytes where start is
         located, whereas start sits in a subsequent sge, so we would
         end up pulling in the wrong data.
      
      2) After figuring out the starting sge, we have a short-cut test
         in !msg->sg_copy[i] && bytes <= len. This checks whether it's
         not needed to make the page at the sge private where we can
         just exit by updating msg->data and msg->data_end. However,
         the length test is not fully correct. bytes <= len checks
         whether the requested bytes (end - start offsets) fit into the
         sge's length. What is missing is that start is not necessarily
         aligned to the beginning of the sge. Meaning, the start offset into
         the sge needs to be accounted for as well on top of the requested
         bytes, as otherwise we can access the sge out of bounds. For example,
         the sge could have a length of 8 and the requested bytes a
         length of 8, but at a start offset of 4, so we would also need
         to pull in 4 bytes of the next sge. When we jump to the out
         label, we set msg->data to sg_virt(&sg[i]) + start - offset
         and msg->data_end to msg->data + bytes, which would be oob.
      
      3) The subsequent bytes < copy test for finding the last sge has
         the same issue as in point 2) but also it tests for less than
         rather than less or equal to. Meaning if the sge length is of
         8 and requested bytes of 8 while having the start aligned with
         the sge, we would unnecessarily go and pull in the next sge as
         well to make it private.
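A simplified sketch of fixes 1) to 3), over a plain array of element lengths rather than the real scatterlist ring; `find_first_sge()` and `fits_in_sge()` are hypothetical helpers. The offset is accumulated only after the containment test, and the fit check adds the start position inside the element to the requested bytes and compares with <= rather than <.

```c
#include <stdbool.h>

/* Find the element holding byte `start` of a message split into elements
 * of lengths len[0..n-1]. Returns its index and stores in *offset the
 * message offset where that element begins, or -1 if start is past the end. */
int find_first_sge(const unsigned int *len, int n, unsigned int start,
		   unsigned int *offset)
{
	unsigned int off = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (start < off + len[i])   /* test first ... */
			break;
		off += len[i];              /* ... then accumulate (fix 1) */
	}
	if (i == n)
		return -1;
	*offset = off;
	return i;
}

/* Does [start, start + bytes) fit in the element beginning at `offset`
 * with length `len`? Counts start's position inside it (fix 2) and
 * accepts an exact fit (fix 3). */
bool fits_in_sge(unsigned int start, unsigned int bytes,
		 unsigned int offset, unsigned int len)
{
	return start - offset + bytes <= len;
}
```

With three 8-byte elements, start = 12 now maps to the second element (beginning at offset 8), and 8 bytes at start offset 4 are correctly reported as not fitting in the first element.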
      
      Fixes: 015632bb ("bpf: sk_msg program helper bpf_sk_msg_pull_data")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  3. 28 Aug 2018: 14 commits
    • bpf: sockmap, decrement copied count correctly in redirect error case · 501ca817
      Committed by John Fastabend
      Currently, when a redirect occurs in sockmap and an error occurs in
      the redirect call we unwind the scatterlist once in the error path
      of bpf_tcp_sendmsg_do_redirect() and then again in sendmsg(). Then
      in the error path of sendmsg we decrement the copied count by the
      send size.
      
      However, it's possible we partially sent data before the error was
      generated. This can happen if do_tcp_sendpages() partially sends the
      scatterlist before encountering a memory pressure error. If this
      happens we need to decrement the copied value (the value tracking
      how many bytes were actually sent to TCP stack) by the number of
      remaining bytes _not_ the entire send size. Otherwise we risk
      confusing userspace.
      
      Also, we don't need two calls to free the scatterlist; one is
      good enough. So remove the one in bpf_tcp_sendmsg_do_redirect() and
      then properly reduce copied by the number of remaining bytes which
      may in fact be the entire send size if no bytes were sent.
      
      To do this, use a bool to indicate whether free_start_sg() should do mem
      accounting or not.
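The accounting rule above reduces to simple arithmetic. In this hypothetical helper, `copied` already includes all `send_size` bytes of the failed send, and `sent` is how many of them reached TCP (via do_tcp_sendpages()) before the error; only the unsent remainder is subtracted.

```c
#include <stddef.h>

/* Hypothetical helper: `copied` already counts all `send_size` bytes of the
 * failed send; `sent` of them reached TCP before the error (sent <= send_size). */
size_t copied_after_error(size_t copied, size_t send_size, size_t sent)
{
	size_t remaining = send_size - sent;   /* bytes that never reached TCP */

	return copied - remaining;
}
```

For example, with copied = 1000, send_size = 400 and sent = 100, the result is 700, whereas subtracting the whole send size would report 600 and understate what reached TCP. If nothing was sent, the whole send size is subtracted, as before.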
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: fix build error with clang · 3f6e138d
      Committed by Stefan Agner
      Building the newly introduced BPF_PROG_TYPE_SK_REUSEPORT leads to
      a compile time error when building with clang:
      net/core/filter.o: In function `sk_reuseport_convert_ctx_access':
        ../net/core/filter.c:7284: undefined reference to `__compiletime_assert_7284'
      
      It seems that clang has issues resolving hweight_long at compile
      time. Since SK_FL_PROTO_MASK is a constant, we can use the interface
      for known constant arguments which works fine with clang.
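A sketch of why a constant-argument popcount helps: a macro built only from integer constant operations is itself an integer constant expression, so it can appear in a compile-time assertion without relying on the compiler to fold a function call. The macros follow the idea behind the kernel's HWEIGHT32()-style constant helpers but are written from scratch here; `EXAMPLE_PROTO_MASK` is a hypothetical one-byte mask, not the real SK_FL_PROTO_MASK value, and C11 `_Static_assert` stands in for BUILD_BUG_ON().

```c
/* Popcount of the low 8 bits, using only constant-expression operators. */
#define CONST_HWEIGHT8(w)                                   \
	(!!((w) & 0x01) + !!((w) & 0x02) + !!((w) & 0x04) + \
	 !!((w) & 0x08) + !!((w) & 0x10) + !!((w) & 0x20) + \
	 !!((w) & 0x40) + !!((w) & 0x80))
#define CONST_HWEIGHT16(w) (CONST_HWEIGHT8(w) + CONST_HWEIGHT8((w) >> 8))
#define CONST_HWEIGHT32(w) (CONST_HWEIGHT16(w) + CONST_HWEIGHT16((w) >> 16))

/* Hypothetical mask: eight protocol bits starting at bit 8. */
#define EXAMPLE_PROTO_MASK 0x0000ff00u

/* Evaluated entirely at compile time, with no call to fold. */
_Static_assert(CONST_HWEIGHT32(EXAMPLE_PROTO_MASK) == 8,
	       "protocol field must be exactly one byte wide");
```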
      
      Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT")
      Signed-off-by: Stefan Agner <stefan@agner.ch>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, sockmap: fix psock refcount leak in bpf_tcp_recvmsg · 15c480ef
      Committed by Daniel Borkmann
      In bpf_tcp_recvmsg() we first take a reference on the psock; however,
      once we find that there are skbs in the normal socket's receive queue,
      we return through tcp_recvmsg() to process them. The problem is that
      we leak the taken reference on the psock in that path. Given we don't
      really do anything with the psock at this point, move the skb_queue_empty()
      test before we fetch the psock to fix this case.
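The reordering can be sketched with hypothetical `struct psock_model`/`struct sk_model` types: the receive-queue check runs before the reference is taken, so the early-return path has no reference to drop. Counters replace the real tcp_recvmsg() hand-off and psock ingress handling, and RCU protection of the psock pointer is not modeled.

```c
/* Hypothetical models of the psock and the socket it is attached to. */
struct psock_model {
	int refcnt;
};

struct sk_model {
	struct psock_model *psock;
	int queued_skbs;              /* skbs waiting in the normal receive queue */
};

static int tcp_path_calls;    /* stands in for handing off to tcp_recvmsg() */
static int psock_path_calls;  /* stands in for reading the psock ingress list */

static struct psock_model *psock_get(struct sk_model *sk)
{
	sk->psock->refcnt++;
	return sk->psock;
}

static void psock_put(struct psock_model *psock)
{
	psock->refcnt--;
}

/* Fixed ordering: test the normal queue before taking a psock reference,
 * so the early-return path has no reference to leak. */
int recvmsg_sketch(struct sk_model *sk)
{
	struct psock_model *psock;

	if (sk->queued_skbs) {
		tcp_path_calls++;
		return 0;
	}

	psock = psock_get(sk);
	psock_path_calls++;
	psock_put(psock);
	return 0;
}
```

Either path leaves the refcount where it started, which is the invariant the original ordering broke.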
      
      Fixes: 8934ce2f ("bpf: sockmap redirect ingress support")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, sockmap: fix potential use after free in bpf_tcp_close · e06fa9c1
      Committed by Daniel Borkmann
      In bpf_tcp_close() we pop the psock linkage to a map via psock_map_pop().
      A parallel update on the sock hash map can happen between psock_map_pop()
      and lookup_elem_raw(), where we override the element under link->hash /
      link->key. In bpf_tcp_close()'s lookup_elem_raw() we subsequently only
      test whether an element is present, but we do not test whether the
      element is in fact the element we were looking for.

      We lock the sock in bpf_tcp_close() during that time, and we also hold
      the lock in sock_hash_update_elem(). However, the latter locks the
      sock which is newly updated, not the one we're purging from the hash
      table. This means that while one CPU is doing the lookup from bpf_tcp_close(),
      another CPU can do the map update in parallel, drop our sock from
      the hlist and release the psock.

      Subsequently the first CPU will find the new sock and attempt to drop
      and release the old sock yet another time. The fix is to check
      the elements for a match after lookup, similar to what we do in the sock map.
      Note that the hash tab elems are freed via RCU, so access to their
      link->hash / link->key is fine since we're under RCU read side there.
      
      Fixes: e9db4ef6 ("bpf: sockhash fix omitted bucket lock in sock_close")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • net/rds: Use rdma_read_gids to get connection SGID/DGID in IPv6 · 53ae914d
      Committed by Zhu Yanjun
      In IPv4, the newly introduced rdma_read_gids is used to read the SGID/DGID
      for the connection, which returns the GID correctly for RoCE transport as well.

      This patch uses rdma_read_gids in IPv6 as well. rdma_read_gids was
      introduced for the following reasons:

      rdma_addr_get_dgid() for RoCE returns a MAC address instead of the DGID
      for client-side connections.
      rdma_addr_get_sgid() for RoCE doesn't return the correct SGID for IPv6, or
      when more than one IP address is assigned to the netdevice.

      So the transport-agnostic rdma_read_gids() API is provided by the rdma_cm
      module.
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: Drop GPIO includes · ad861986
      Committed by Linus Walleij
      Commit 52638f71 ("dsa: Move gpio reset into switch driver")
      moved the GPIO handling into the switch drivers but forgot
      to remove the GPIO header includes.
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix the big/little endian issue in tipc_dest · 30935198
      Committed by Haiqing Bai
      In tipc_dest_push(), the 32-bit variables 'node' and 'port'
      are stored in the upper and lower halves of the 64-bit 'value'.
      Then this value is assigned to dst->value which is a union like:
      union {
        struct {
          u32 port;
          u32 node;
        };
        u64 value;
      };
      This works on little-endian machines like x86 but fails on big-endian
      machines.
      
      The fix removes the 'value' stack parameter and even the 'value'
      member of the union in tipc_dest, assign the 'node' and 'port' member
      directly with the input parameter to avoid the endian issue.
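A sketch contrasting the two approaches on the union quoted above, using C11 anonymous members and `uint32_t`/`uint64_t` in place of the kernel's u32/u64; `struct dest` is a hypothetical model of only the union part of tipc_dest. Packing node and port into `value` puts `port` in the right place only when the low half of the u64 is stored first, i.e. on little-endian hosts, while assigning the members directly works on any byte order.

```c
#include <stdint.h>

/* Same layout as the union in tipc_dest (list head omitted). */
struct dest {
	union {
		struct {
			uint32_t port;
			uint32_t node;
		};
		uint64_t value;
	};
};

/* Old approach: node in the upper 32 bits, port in the lower 32 bits.
 * Only a little-endian host stores the lower half where `port` lives. */
void dest_set_packed(struct dest *d, uint32_t node, uint32_t port)
{
	d->value = ((uint64_t)node << 32) | port;
}

/* Fixed approach: assign the members directly; no byte-order dependence. */
void dest_set_direct(struct dest *d, uint32_t node, uint32_t port)
{
	d->node = node;
	d->port = port;
}

/* Returns 1 when the host stores the least significant byte first. */
int host_is_little_endian(void)
{
	const uint16_t probe = 1;

	return *(const unsigned char *)&probe == 1;
}
```

On a big-endian host the packed version swaps the two fields, which is the failure the commit describes.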
      
      Fixes: a80ae530 ("tipc: improve destination linked list")
      Signed-off-by: Zhenbo Gao <zhenbo.gao@windriver.com>
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Haiqing Bai <Haiqing.Bai@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-sched-fixes' · ca2b1d2d
      Committed by David S. Miller
      Jiri Pirko says:
      
      ====================
      net: sched: couple of small fixes
      
      Jiri Pirko (2):
        net: sched: fix extack error message when chain is failed to be
          created
        net: sched: return -ENOENT when trying to remove filter from
          non-existent chain
      ====================
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: return -ENOENT when trying to remove filter from non-existent chain · b7b4247d
      Committed by Jiri Pirko
      When chain 0 was implicitly created, removal of a non-existent filter from
      chain 0 gave -ENOENT. Once chain 0 became non-implicit, the same call is
      giving -EINVAL. Fix this by returning -ENOENT in that case.
      Reported-by: Roman Mashak <mrv@mojatatu.com>
      Fixes: f71e0ca4 ("net: sched: Avoid implicit chain 0 creation")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: fix extack error message when chain is failed to be created · d5ed72a5
      Committed by Jiri Pirko
      Say "Cannot create" instead of "Cannot find".
      
      Fixes: c35a4acc ("net: sched: cls_api: handle generic cls errors")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • erspan: set erspan_ver to 1 by default when adding an erspan dev · 84581bda
      Committed by Xin Long
      After erspan_ver was introduced, if erspan_ver is not set in iproute, its
      value is left at 0 by default. Since commit 02f99df1 ("erspan: fix
      invalid erspan version."), this has broken traffic due to the version
      check in erspan_xmit if users are not aware of the 'erspan_ver' param,
      e.g. when using an old version of iproute.
      
      To fix this compatibility problem, it sets erspan_ver to 1 by default
      when adding an erspan dev in erspan_setup. Note that we can't do it in
      ipgre_netlink_parms, as this function is also used by ipgre_changelink.
      
      Fixes: 02f99df1 ("erspan: fix invalid erspan version.")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      84581bda
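A minimal userspace sketch of why the default belongs in the setup path rather than in the shared parameter parser. The names `setup_sketch()`, `parse_parms_sketch()`, `add_device()` and `change_device()` are hypothetical stand-ins for erspan_setup, ipgre_netlink_parms and the add/change paths. The sketch assumes, as the message implies, that setup runs only on device creation while the parser runs on both add and change.

```c
#include <stdbool.h>

/* Hypothetical, simplified tunnel config; not the kernel's ip_tunnel. */
struct erspan_cfg {
    int erspan_ver;
};

/* Stand-in for erspan_setup(): runs only when a new device is created. */
static void setup_sketch(struct erspan_cfg *cfg)
{
    cfg->erspan_ver = 1;    /* default for users who never pass erspan_ver */
}

/* Stand-in for the shared parser used by both "add" and "change":
 * it overrides the version only when the attribute is present. */
static void parse_parms_sketch(struct erspan_cfg *cfg, bool has_ver, int ver)
{
    if (has_ver)
        cfg->erspan_ver = ver;
}

/* Add path: setup first, then any user-supplied parameters. */
static int add_device(bool has_ver, int ver)
{
    struct erspan_cfg cfg = { 0 };

    setup_sketch(&cfg);
    parse_parms_sketch(&cfg, has_ver, ver);
    return cfg.erspan_ver;
}

/* Change path: no setup, so an existing version is left untouched. */
static int change_device(int current_ver, bool has_ver, int ver)
{
    struct erspan_cfg cfg = { .erspan_ver = current_ver };

    parse_parms_sketch(&cfg, has_ver, ver);
    return cfg.erspan_ver;
}
```

Suppose the default were written inside `parse_parms_sketch()` instead. Then `change_device(2, false, 0)` would silently reset a version-2 device to 1, which is the problem the message gives for not using ipgre_netlink_parms.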
    • sctp: remove useless start_fail from sctp_ht_iter in proc · 834539e6
      Xin Long committed
      After rhashtable_walk_start was changed to return void, start_fail can
      never be set to any value other than 0, so checking it is pointless.
      Remove it.
      
      Fixes: 97a6ec4a ("rhashtable: Change rhashtable_walk_start to return void")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      834539e6
    • sctp: hold transport before accessing its asoc in sctp_transport_get_next · bab1be79
      Xin Long committed
      As Marcelo noticed, sctp_transport_get_next iterates over transports but
      also accesses the association directly, without taking any reference
      first. This can cause a use-after-free read.

      Fix it by holding the transport before accessing the association. With
      that, the later sctp_transport_hold calls can be removed.
      
      Fixes: 626d16f5 ("sctp: export some apis or variables for sctp_diag and reuse some for proc")
      Reported-by: syzbot+fe62a0c9aa6a85c6de16@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bab1be79
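A minimal single-threaded sketch of the "hold first, then dereference" ordering described above. `struct transport`, `struct assoc`, `transport_hold()` and `transport_get_next()` are hypothetical stand-ins, not the kernel's SCTP structures. The kernel uses atomic refcounts and RCU, which this sketch does not model. `transport_hold()` behaves as a "get unless zero", so a transport that is already being freed is skipped before its `asoc` pointer is ever read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical objects; not the kernel's sctp_transport/sctp_association. */
struct assoc {
    int id;
};

struct transport {
    int refcnt;            /* 0: object is being freed, must not be used */
    struct assoc *asoc;
};

/* "Get unless zero": take a reference only while the object is live. */
static bool transport_hold(struct transport *t)
{
    if (t->refcnt == 0)
        return false;
    t->refcnt++;
    return true;
}

static void transport_put(struct transport *t)
{
    t->refcnt--;
}

/* Return the next transport from *pos whose association matches, with a
 * reference already held, or NULL. t->asoc is read only after the hold. */
static struct transport *transport_get_next(struct transport *tbl, size_t n,
                                            size_t *pos, int want_assoc)
{
    while (*pos < n) {
        struct transport *t = &tbl[(*pos)++];

        if (!transport_hold(t))
            continue;              /* dying: never touch t->asoc */
        if (t->asoc->id == want_assoc)
            return t;              /* caller now owns one reference */
        transport_put(t);          /* not wanted: drop the reference */
    }
    return NULL;
}
```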
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 050cdc6c
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) ICE, E1000, IGB, IXGBE, and I40E bug fixes from the Intel folks.
      
       2) Better fix for AB-BA deadlock in packet scheduler code, from Cong
          Wang.
      
       3) bpf sockmap fixes (zero sized key handling, etc.) from Daniel
          Borkmann.
      
       4) Send zero IPID in TCP resets and SYN-RECV state ACKs, to prevent
          attackers using it as a side-channel. From Eric Dumazet.
      
       5) Memory leak in mediatek bluetooth driver, from Gustavo A. R. Silva.
      
       6) Hook up rt->dst.input of ipv6 anycast routes properly, from Hangbin
          Liu.
      
       7) hns and hns3 bug fixes from Huazhong Tan.
      
       8) Fix RIF leak in mlxsw driver, from Ido Schimmel.
      
       9) iova range check fix in vhost, from Jason Wang.
      
      10) Fix hang in do_tcp_sendpages() with tls, from John Fastabend.
      
      11) More r8152 chips need to disable RX aggregation, from Kai-Heng Feng.
      
      12) Memory exposure in TCA_U32_SEL handling, from Kees Cook.
      
      13) TCP BBR congestion control fixes from Kevin Yang.
      
      14) hv_netvsc, ignore non-PCI devices, from Stephen Hemminger.
      
      15) qed driver fixes from Tomer Tayar.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
        net: sched: Fix memory exposure from short TCA_U32_SEL
        qed: fix spelling mistake "comparsion" -> "comparison"
        vhost: correctly check the iova range when waking virtqueue
        qlge: Fix netdev features configuration.
        net: macb: do not disable MDIO bus at open/close time
        Revert "net: stmmac: fix build failure due to missing COMMON_CLK dependency"
        net: macb: Fix regression breaking non-MDIO fixed-link PHYs
        mlxsw: spectrum_switchdev: Do not leak RIFs when removing bridge
        i40e: fix condition of WARN_ONCE for stat strings
        i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled
        ixgbe: fix driver behaviour after issuing VFLR
        ixgbe: Prevent unsupported configurations with XDP
        ixgbe: Replace GFP_ATOMIC with GFP_KERNEL
        igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
        igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
        igb: Use an advanced ctx descriptor for launchtime
        e1000: ensure to free old tx/rx rings in set_ringparam()
        e1000: check on netif_running() before calling e1000_up()
        ixgb: use dma_zalloc_coherent instead of allocator/memset
        ice: Trivial formatting fixes
        ...
      050cdc6c
  4. 27 Aug 2018, 9 commits
    • Fix up libata MAINTAINERS entry · 908946c4
      Jens Axboe committed
      The email was botched in one entry, and I also forgot to update the
      location of the git tree. It'll be under the linux-block umbrella, just
      with different branches.

      Reported-by: Baruch Siach <baruch@tkos.co.il>
      Fixes: 7634ccd2 ("libata: maintainership update")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      908946c4
    • net: sched: Fix memory exposure from short TCA_U32_SEL · 98c8f125
      Kees Cook committed
      Via u32_change(), TCA_U32_SEL has an unspecified type in the netlink
      policy, so max length isn't enforced, only minimum. This means nkeys
      (from userspace) was being trusted without checking the actual size of
      nla_len(), which could lead to a memory over-read, and ultimately an
      exposure via a call to u32_dump(). Reachability is CAP_NET_ADMIN within
      a namespace.

      Reported-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      98c8f125
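A minimal userspace sketch of the missing check described above. The attribute is safe to use only if its actual length covers the fixed header plus `nkeys` trailing keys. `struct sel`, `struct key` and `sel_len_ok()` are hypothetical and only loosely modelled on a selector with a flexible key array; they are not the kernel's `struct tc_u32_sel`. The first comparison mirrors the minimum length the netlink policy already enforces.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout with a trailing key array; not struct tc_u32_sel. */
struct key {
    unsigned int mask, val;
    int off, offmask;
};

struct sel {
    unsigned char flags;
    unsigned char nkeys;     /* count supplied by userspace */
    struct key keys[];       /* flexible array member */
};

/* Trusting nkeys alone lets later readers (e.g. a dump) run past the end
 * of the buffer; compare the size nkeys implies with the real length. */
static bool sel_len_ok(const struct sel *s, size_t attr_len)
{
    if (attr_len < sizeof(*s))
        return false;                 /* cannot even read nkeys safely */
    return attr_len >= sizeof(*s) + (size_t)s->nkeys * sizeof(struct key);
}
```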
    • Linux 4.19-rc1 · 5b394b2d
      Linus Torvalds committed
      5b394b2d
    • Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · b933d6eb
      Linus Torvalds committed
      Pull timer update from Thomas Gleixner:
       "New defines for the compat time* types so they can be shared between
        32bit and 64bit builds. Not used yet, but merging them now allows the
        actual conversions to be merged through different maintainer trees
        without dependencies
      
        We still have compat interfaces for 32bit on 64bit even with the new
        2038 safe timespec/val variants because pointer size is different. And
        for the old style timespec/val interfaces we need yet another 'compat'
        interface for both 32bit native and 32bit on 64bit"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        y2038: Provide aliases for compat helpers
      b933d6eb
    • Merge branch 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax · aba16dc5
      Linus Torvalds committed
      Pull IDA updates from Matthew Wilcox:
       "A better IDA API:
      
            id = ida_alloc(ida, GFP_xxx);
            ida_free(ida, id);
      
        rather than the cumbersome ida_simple_get(), ida_simple_remove().
      
        The new IDA API is similar to ida_simple_get() but better named.  The
        internal restructuring of the IDA code removes the bitmap
        preallocation nonsense.
      
        I hope the net -200 lines of code is convincing"
      
      * 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax: (29 commits)
        ida: Change ida_get_new_above to return the id
        ida: Remove old API
        test_ida: check_ida_destroy and check_ida_alloc
        test_ida: Convert check_ida_conv to new API
        test_ida: Move ida_check_max
        test_ida: Move ida_check_leaf
        idr-test: Convert ida_check_nomem to new API
        ida: Start new test_ida module
        target/iscsi: Allocate session IDs from an IDA
        iscsi target: fix session creation failure handling
        drm/vmwgfx: Convert to new IDA API
        dmaengine: Convert to new IDA API
        ppc: Convert vas ID allocation to new IDA API
        media: Convert entity ID allocation to new IDA API
        ppc: Convert mmu context allocation to new IDA API
        Convert net_namespace to new IDA API
        cb710: Convert to new IDA API
        rsxx: Convert to new IDA API
        osd: Convert to new IDA API
        sd: Convert to new IDA API
        ...
      aba16dc5
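A toy userspace sketch of the alloc/free call shape quoted above (`id = ida_alloc(...)`, `ida_free(ida, id)`). `struct ida_sketch`, `ida_sketch_alloc()` and `ida_sketch_free()` are hypothetical. The sketch assumes a fixed 64-ID bitmap and hands out the lowest free ID. The real IDA is not implemented this way: it grows on demand, takes GFP flags, and returns a negative errno (-ENOMEM or -ENOSPC) on failure, where this sketch returns -1.

```c
#include <stdint.h>

/* Hypothetical fixed-size ID allocator illustrating the alloc/free pairing. */
#define SKETCH_MAX_IDS 64

struct ida_sketch {
    uint64_t used;   /* bit i set => id i is allocated */
};

/* Return the lowest free id, or -1 if all SKETCH_MAX_IDS ids are in use. */
static int ida_sketch_alloc(struct ida_sketch *ida)
{
    for (int id = 0; id < SKETCH_MAX_IDS; id++) {
        if (!(ida->used & (UINT64_C(1) << id))) {
            ida->used |= UINT64_C(1) << id;
            return id;
        }
    }
    return -1;
}

/* Release an id so a later alloc can hand it out again. */
static void ida_sketch_free(struct ida_sketch *ida, int id)
{
    if (id >= 0 && id < SKETCH_MAX_IDS)
        ida->used &= ~(UINT64_C(1) << id);
}
```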
    • Merge tag 'gcc-plugins-v4.19-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · c4726e77
      Linus Torvalds committed
      Pull gcc plugin fix from Kees Cook:
       "Lift gcc test into Kconfig. This is for better behavior when the
        kernel is built with Clang, reported by Stefan Agner"
      
      * tag 'gcc-plugins-v4.19-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        gcc-plugins: Disable when building under Clang
      c4726e77
    • Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d207ea8e
      Linus Torvalds committed
      Pull perf updates from Thomas Gleixner:
       "Kernel:
         - Improve kallsyms coverage
         - Add x86 entry trampolines to kcore
         - Fix ARM SPE handling
         - Correct PPC event post processing
      
        Tools:
         - Make the build system more robust
         - Small fixes and enhancements all over the place
         - Update kernel ABI header copies
         - Preparatory work for converting libtraceevent to a shared library
         - License cleanups"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (100 commits)
        tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
        tools arch x86: Update tools's copy of cpufeatures.h
        perf python: Fix pyrf_evlist__read_on_cpu() interface
        perf mmap: Store real cpu number in 'struct perf_mmap'
        perf tools: Remove ext from struct kmod_path
        perf tools: Add gzip_is_compressed function
        perf tools: Add lzma_is_compressed function
        perf tools: Add is_compressed callback to compressions array
        perf tools: Move the temp file processing into decompress_kmodule
        perf tools: Use compression id in decompress_kmodule()
        perf tools: Store compression id into struct dso
        perf tools: Add compression id into 'struct kmod_path'
        perf tools: Make is_supported_compression() static
        perf tools: Make decompress_to_file() function static
        perf tools: Get rid of dso__needs_decompress() call in __open_dso()
        perf tools: Get rid of dso__needs_decompress() call in symbol__disassemble()
        perf tools: Get rid of dso__needs_decompress() call in read_object_code()
        tools lib traceevent: Change to SPDX License format
        perf llvm: Allow passing options to llc in addition to clang
        perf parser: Improve error message for PMU address filters
        ...
      d207ea8e
    • Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2a8a2b7c
      Linus Torvalds committed
      Pull x86 fixes from Thomas Gleixner:
      
       - Correct the L1TF fallout on 32bit and the off by one in the 'too much
         RAM for protection' calculation.
      
       - Add a helpful kernel message for the 'too much RAM' case
      
       - Unbreak the VDSO in case the compiler decides to use indirect
         jumps/calls and emits retpolines, which cannot be resolved because the
         kernel uses its own thunks and those do not work for the VDSO. Make it
         use the builtin thunks.
      
       - Re-export start_thread() which was unexported when the 32/64bit
         implementation was unified. start_thread() is required by modular
         binfmt handlers.
      
       - Trivial cleanups
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/speculation/l1tf: Suggest what to do on systems with too much RAM
        x86/speculation/l1tf: Fix off-by-one error when warning that system has too much RAM
        x86/kvm/vmx: Remove duplicate l1d flush definitions
        x86/speculation/l1tf: Fix overflow in l1tf_pfn_limit() on 32bit
        x86/process: Re-export start_thread()
        x86/mce: Add notifier_block forward declaration
        x86/vdso: Fix vDSO build if a retpoline is emitted
      2a8a2b7c
    • Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · de375035
      Linus Torvalds committed
      Pull irq update from Thomas Gleixner:
       "A small set of updates/fixes for the irq subsystem:
      
         - Allow GICv3 interrupts to be configured as wake-up sources to
           enable wakeup from suspend
      
         - Make the error handling of the STM32 irqchip init function work
      
         - A set of small cleanups and improvements"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/gic-v3: Allow interrupt to be configured as wake-up sources
        irqchip/tango: Set irq handler and data in one go
        dt-bindings: irqchip: renesas-irqc: Document r8a774a1 support
        irqchip/s3c24xx: Remove unneeded comparison of unsigned long to 0
        irqchip/stm32: Fix init error handling
        irqchip/bcm7038-l1: Hide cpu offline callback when building for !SMP
      de375035