1. 27 12月, 2019 40 次提交
    • E
      crypto: hash - set CRYPTO_TFM_NEED_KEY if ->setkey() fails · 7d0c9769
      Eric Biggers 提交于
      mainline inclusion
      from mainline-ba7d7433 master
      commit ba7d7433
      category: bugfix
      bugzilla: 11388
      CVE: NA
      
      ---------------------------
      
      Some algorithms have a ->setkey() method that is not atomic, in the
      sense that setting a key can fail after changes were already made to the
      tfm context.  In this case, if a key was already set the tfm can end up
      in a state that corresponds to neither the old key nor the new key.
      
      It's not feasible to make all ->setkey() methods atomic, especially ones
      that have to key multiple sub-tfms.  Therefore, make the crypto API set
      CRYPTO_TFM_NEED_KEY if ->setkey() fails and the algorithm requires a
      key, to prevent the tfm from being used until a new key is set.
      
      Note: we can't set CRYPTO_TFM_NEED_KEY for OPTIONAL_KEY algorithms, so
      ->setkey() for those must nevertheless be atomic.  That's fine for now
      since only the crc32 and crc32c algorithms set OPTIONAL_KEY, and it's
      not intended that OPTIONAL_KEY be used much.
      
      [Cc stable mainly because when introducing the NEED_KEY flag I changed
       AF_ALG to rely on it; and unlike in-kernel crypto API users, AF_ALG
       previously didn't have this problem.  So these "incompletely keyed"
       states became theoretically accessible via AF_ALG -- though, the
       opportunities for causing real mischief seem pretty limited.]
      
      Fixes: 9fa68f62 ("crypto: hash - prevent using keyed hashes without setting key")
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      7d0c9769
    • E
      crypto: testmgr - skip crc32c context test for ahash algorithms · 99f8efac
      Eric Biggers 提交于
      mainline inclusion
      from mainline-eb5e6730 master
      commit eb5e6730
      category: bugfix
      bugzilla: 11390
      CVE: NA
      
      ---------------------------
      
      Instantiating "cryptd(crc32c)" causes a crypto self-test failure because
      the crypto_alloc_shash() in alg_test_crc32c() fails.  This is because
      cryptd(crc32c) is an ahash algorithm, not a shash algorithm; so it can
      only be accessed through the ahash API, unlike shash algorithms which
      can be accessed through both the ahash and shash APIs.
      
      As the test is testing the shash descriptor format which is only
      applicable to shash algorithms, skip it for ahash algorithms.
      
      (Note that it's still important to fix crypto self-test failures even
       for weird algorithm instantiations like cryptd(crc32c) that no one
       would really use; in fips_enabled mode unprivileged users can use them
       to panic the kernel, and also they prevent treating a crypto self-test
       failure as a bug when fuzzing the kernel.)
      
      Fixes: 8e3ee85e ("crypto: crc32c - Test descriptor context format")
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      99f8efac
    • E
      crypto: cfb - add missing 'chunksize' property · c191638a
      Eric Biggers 提交于
      mainline inclusion
      from mainline-394a9e04 master
      commit 394a9e04
      category: bugfix
      bugzilla: 11393
      CVE: NA
      
      ---------------------------
      
      Like some other block cipher mode implementations, the CFB
      implementation assumes that while walking through the scatterlist, a
      partial block does not occur until the end.  But the walk is incorrectly
      being done with a blocksize of 1, as 'cra_blocksize' is set to 1 (since
      CFB is a stream cipher) but no 'chunksize' is set.  This bug causes
      incorrect encryption/decryption for some scatterlist layouts.
      
      Fix it by setting the 'chunksize'.  Also extend the CFB test vectors to
      cover this bug as well as cases where the message length is not a
      multiple of the block size.
      
      Fixes: a7d85e06 ("crypto: cfb - add support for Cipher FeedBack mode")
      Cc: <stable@vger.kernel.org> # v4.17+
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c191638a
    • E
      crypto: cfb - remove bogus memcpy() with src == dest · 9b515b60
      Eric Biggers 提交于
      mainline inclusion
      from mainline-6c2e322b master
      commit 6c2e322b
      category: bugfix
      bugzilla: 11394
      CVE: NA
      
      ---------------------------
      
      The memcpy() in crypto_cfb_decrypt_inplace() uses walk->iv as both the
      source and destination, which has undefined behavior.  It is unneeded
      because walk->iv is already used to hold the previous ciphertext block;
      thus, walk->iv is already updated to its final value.  So, remove it.
      
      Also, note that in-place decryption is the only case where the previous
      ciphertext block is not directly available.  Therefore, as a related
      cleanup I also updated crypto_cfb_encrypt_segment() to directly use the
      previous ciphertext block rather than save it into walk->iv.  This makes
      it consistent with in-place encryption and out-of-place decryption; now
      only in-place decryption is different, because it has to be.
      
      Fixes: a7d85e06 ("crypto: cfb - add support for Cipher FeedBack mode")
      Cc: <stable@vger.kernel.org> # v4.17+
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      9b515b60
    • E
      crypto: skcipher - set CRYPTO_TFM_NEED_KEY if ->setkey() fails · 30093a33
      Eric Biggers 提交于
      mainline inclusion
      from mainline-b1f6b4bf master
      commit b1f6b4bf
      category: bugfix
      bugzilla: 11400
      CVE: NA
      
      ---------------------------
      
      Some algorithms have a ->setkey() method that is not atomic, in the
      sense that setting a key can fail after changes were already made to the
      tfm context.  In this case, if a key was already set the tfm can end up
      in a state that corresponds to neither the old key nor the new key.
      
      For example, in lrw.c, if gf128mul_init_64k_bbe() fails due to lack of
      memory, then priv::table will be left NULL.  After that, encryption with
      that tfm will cause a NULL pointer dereference.
      
      It's not feasible to make all ->setkey() methods atomic, especially ones
      that have to key multiple sub-tfms.  Therefore, make the crypto API set
      CRYPTO_TFM_NEED_KEY if ->setkey() fails and the algorithm requires a
      key, to prevent the tfm from being used until a new key is set.
      
      [Cc stable mainly because when introducing the NEED_KEY flag I changed
       AF_ALG to rely on it; and unlike in-kernel crypto API users, AF_ALG
       previously didn't have this problem.  So these "incompletely keyed"
       states became theoretically accessible via AF_ALG -- though, the
       opportunities for causing real mischief seem pretty limited.]
      
      Fixes: f8d33fac ("crypto: skcipher - prevent using skciphers without setting key")
      Cc: <stable@vger.kernel.org> # v4.16+
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      30093a33
    • E
      crypto: aegis - fix handling chunked inputs · c83be43d
      Eric Biggers 提交于
      mainline inclusion
      from mainline-0f533e67 master
      commit 0f533e67
      category: bugfix
      bugzilla: 11404
      CVE: NA
      
      ---------------------------
      
      The generic AEGIS implementations all fail the improved AEAD tests
      because they produce the wrong result with some data layouts.  The issue
      is that they assume that if the skcipher_walk API gives 'nbytes' not
      aligned to the walksize (a.k.a. walk.stride), then it is the end of the
      data.  In fact, this can happen before the end.  Fix them.
      
      Fixes: f606a88e ("crypto: aegis - Add generic AEGIS AEAD implementations")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c83be43d
    • E
      crypto: pcbc - remove bogus memcpy()s with src == dest · 4acb9216
      Eric Biggers 提交于
      mainline inclusion
      from mainline-251b7aea master
      commit 251b7aea
      category: bugfix
      bugzilla: 11406
      CVE: NA
      
      ---------------------------
      
      The memcpy()s in the PCBC implementation use walk->iv as both the source
      and destination, which has undefined behavior.  These memcpy()'s are
      actually unneeded, because walk->iv is already used to hold the previous
      plaintext block XOR'd with the previous ciphertext block.  Thus,
      walk->iv is already updated to its final value.
      
      So remove the broken and unnecessary memcpy()s.
      
      Fixes: 91652be5 ("[CRYPTO] pcbc: Add Propagated CBC template")
      Cc: <stable@vger.kernel.org> # v2.6.21+
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      4acb9216
    • E
      crypto: morus - fix handling chunked inputs · 243c0fda
      Eric Biggers 提交于
      mainline inclusion
      from mainline-d644f1c8 master
      commit d644f1c8
      category: bugfixx
      bugzilla: 11407
      CVE: NA
      
      ---------------------------
      
      The generic MORUS implementations all fail the improved AEAD tests
      because they produce the wrong result with some data layouts.  The issue
      is that they assume that if the skcipher_walk API gives 'nbytes' not
      aligned to the walksize (a.k.a. walk.stride), then it is the end of the
      data.  In fact, this can happen before the end.  Fix them.
      
      Fixes: 396be41f ("crypto: morus - Add generic MORUS AEAD implementations")
      Cc: <stable@vger.kernel.org> # v4.18+
      Cc: Ondrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NOndrej Mosnacek <omosnace@redhat.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      243c0fda
    • E
      crypto: aead - set CRYPTO_TFM_NEED_KEY if ->setkey() fails · cb86f84f
      Eric Biggers 提交于
      mainline inclusion
      from mainline-6ebc9700 master
      commit 6ebc9700
      category: bugfixx
      bugzilla: 11409
      CVE: NA
      
      ---------------------------
      
      Some algorithms have a ->setkey() method that is not atomic, in the
      sense that setting a key can fail after changes were already made to the
      tfm context.  In this case, if a key was already set the tfm can end up
      in a state that corresponds to neither the old key nor the new key.
      
      For example, in gcm.c, if the kzalloc() fails due to lack of memory,
      then the CTR part of GCM will have the new key but GHASH will not.
      
      It's not feasible to make all ->setkey() methods atomic, especially ones
      that have to key multiple sub-tfms.  Therefore, make the crypto API set
      CRYPTO_TFM_NEED_KEY if ->setkey() fails, to prevent the tfm from being
      used until a new key is set.
      
      [Cc stable mainly because when introducing the NEED_KEY flag I changed
       AF_ALG to rely on it; and unlike in-kernel crypto API users, AF_ALG
       previously didn't have this problem.  So these "incompletely keyed"
       states became theoretically accessible via AF_ALG -- though, the
       opportunities for causing real mischief seem pretty limited.]
      
      Fixes: dc26c17f ("crypto: aead - prevent using AEADs without setting key")
      Cc: <stable@vger.kernel.org> # v4.16+
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      cb86f84f
    • E
      crypto: tgr192 - fix unaligned memory access · 0057e2de
      Eric Biggers 提交于
      mainline inclusion
      from mainline-f990f7fb master
      commit f990f7fb
      category: bugfix
      bugzilla: 11411
      CVE: NA
      
      ---------------------------
      
      Fix an unaligned memory access in tgr192_transform() by using the
      unaligned access helpers.
      
      Fixes: 06ace7a9 ("[CRYPTO] Use standard byte order macros wherever possible")
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      0057e2de
    • E
      crypto: ahash - fix another early termination in hash walk · b1d8753f
      Eric Biggers 提交于
      mainline inclusion
      from mainline-77568e53 master
      commit 77568e53
      category: bugfix
      bugzilla: 11412
      CVE: NA
      
      ---------------------------
      
      Hash algorithms with an alignmask set, e.g. "xcbc(aes-aesni)" and
      "michael_mic", fail the improved hash tests because they sometimes
      produce the wrong digest.  The bug is that in the case where a
      scatterlist element crosses pages, not all the data is actually hashed
      because the scatterlist walk terminates too early.  This happens because
      the 'nbytes' variable in crypto_hash_walk_done() is assigned the number
      of bytes remaining in the page, then later interpreted as the number of
      bytes remaining in the scatterlist element.  Fix it.
      
      Fixes: 900a081f ("crypto: ahash - Fix early termination in hash walk")
      Cc: stable@vger.kernel.org
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NJason Yan <yanaijie@huawei.com>
      Reviewed-by: NZhangXiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      b1d8753f
    • G
      Linux 4.19.30 · 91afce4d
      Greg Kroah-Hartman 提交于
      Merge 42 patches from 4.19.30 stable
      branch (53 total) beside 11 already merged patches:
      
      996ee1a net: hsr: fix memory leak in hsr_dev_finalize()
      7cfb97b net: sit: fix UBSAN Undefined behaviour in check_6rd
      96a3b14 mdio_bus: Fix use-after-free on device_register fails
      b9d0cb7 ipv6: route: purge exception on removal
      e5c31b5 team: use operstate consistently for linkup
      96dd4ef ipv6: route: enforce RCU protection in rt6_update_exception_stamp_rt()
      2e4b2ae ipv6: route: enforce RCU protection in ip6_route_check_nh_onlink()
      795cb33 bonding: fix PACKET_ORIGDEV regression
      f56b3c2 net/smc: fix smc_poll in SMC_INIT state
      96ce54b drm: Block fb changes for async plane updates
      090ce34 i40e: report correct statistics when XDP is enabled
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      91afce4d
    • Z
      vhost/vsock: fix vhost vsock cid hashing inconsistent · dbb70733
      Zha Bin 提交于
      commit 7fbe078c upstream.
      
      The vsock core only supports 32bit CID, but the Virtio-vsock spec define
      CID (dst_cid and src_cid) as u64 and the upper 32bits is reserved as
      zero. This inconsistency causes one bug in vhost vsock driver. The
      scenarios is:
      
        0. A hash table (vhost_vsock_hash) is used to map an CID to a vsock
        object. And hash_min() is used to compute the hash key. hash_min() is
        defined as:
        (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
        That means the hash algorithm has dependency on the size of macro
        argument 'val'.
        0. In function vhost_vsock_set_cid(), a 64bit CID is passed to
        hash_min() to compute the hash key when inserting a vsock object into
        the hash table.
        0. In function vhost_vsock_get(), a 32bit CID is passed to hash_min()
        to compute the hash key when looking up a vsock for an CID.
      
      Because the different size of the CID, hash_min() returns different hash
      key, thus fails to look up the vsock object for an CID.
      
      To fix this bug, we keep CID as u64 in the IOCTLs and virtio message
      headers, but explicitly convert u64 to u32 when deal with the hash table
      and vsock core.
      
      Fixes: 834e772c ("vhost/vsock: fix use-after-free in network stack callers")
      Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.texSigned-off-by: NZha Bin <zhabin@linux.alibaba.com>
      Reviewed-by: NLiu Jiang <gerry@linux.alibaba.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NShengjing Zhu <i@zhsj.me>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      dbb70733
    • G
      staging: erofs: fix race when the managed cache is enabled · fb4c3a27
      Gao Xiang 提交于
      commit 51232df5 upstream.
      
      When the managed cache is enabled, the last reference count
      of a workgroup must be used for its workstation.
      
      Otherwise, it could lead to incorrect (un)freezes in
      the reclaim path, and it would be harmful.
      
      A typical race as follows:
      
      Thread 1 (In the reclaim path)  Thread 2
      workgroup_freeze(grp, 1)                                refcnt = 1
      ...
      workgroup_unfreeze(grp, 1)                              refcnt = 1
                                      workgroup_get(grp)      refcnt = 2 (x)
      workgroup_put(grp)                                      refcnt = 1 (x)
                                      ...unexpected behaviors
      
      * grp is detached but still used, which violates cache-managed
        freeze constraint.
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NGao Xiang <gaoxiang25@huawei.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      fb4c3a27
    • X
      It's wrong to add len to sector_nr in raid10 reshape twice · 66cf9622
      Xiao Ni 提交于
      commit b761dcf1 upstream.
      
      In reshape_request it already adds len to sector_nr already. It's wrong to add len to
      sector_nr again after adding pages to bio. If there is bad block it can't copy one chunk
      at a time, it needs to goto read_more. Now the sector_nr is wrong. It can cause data
      corruption.
      
      Cc: stable@vger.kernel.org # v3.16+
      Signed-off-by: NXiao Ni <xni@redhat.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      66cf9622
    • K
      perf/x86/intel: Make dev_attr_allow_tsx_force_abort static · a7745399
      kbuild test robot 提交于
      commit c634dc6b upstream.
      
      Fixes: 400816f6 ("perf/x86/intel: Implement support for TSX Force Abort")
      Signed-off-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: kbuild-all@01.org
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190313184243.GA10820@lkp-sb-ep06Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      a7745399
    • P
      perf/x86/intel: Fix memory corruption · 4ecbd494
      Peter Zijlstra 提交于
      commit ede271b0 upstream.
      
      Through:
      
        validate_event()
          x86_pmu.get_event_constraints(.idx=-1)
            tfa_get_event_constraints()
              dyn_constraint()
      
      cpuc->constraint_list[-1] is used, which is an obvious out-of-bound access.
      
      In this case, simply skip the TFA constraint code, there is no event
      constraint with just PMC3, therefore the code will never result in the
      empty set.
      
      Fixes: 400816f6 ("perf/x86/intel: Implement support for TSX Force Abort")
      Reported-by: NTony Jones <tonyj@suse.com>
      Reported-by: N"DSouza, Nelson" <nelson.dsouza@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NTony Jones <tonyj@suse.com>
      Tested-by: N"DSouza, Nelson" <nelson.dsouza@intel.com>
      Cc: eranian@google.com
      Cc: jolsa@redhat.com
      Cc: stable@kernel.org
      Link: https://lkml.kernel.org/r/20190314130705.441549378@infradead.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      4ecbd494
    • J
      ALSA: hda/realtek: Enable headset MIC of Acer TravelMate X514-51T with ALC255 · 8b35cec7
      Jian-Hong Pan 提交于
      commit cbc05fd6 upstream.
      
      The Acer TravelMate X514-51T with ALC255 cannot detect the headset MIC
      until ALC255_FIXUP_ACER_HEADSET_MIC quirk applied.  Although, the
      internal DMIC uses another module - snd_soc_skl as the driver.  We still
      need the NID 0x1a in the quirk to enable the headset MIC.
      Signed-off-by: NJian-Hong Pan <jian-hong@endlessm.com>
      Signed-off-by: NKailang Yang <kailang@realtek.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      8b35cec7
    • T
      ALSA: hda/realtek - Reduce click noise on Dell Precision 5820 headphone · e6a0f604
      Takashi Iwai 提交于
      commit c0ca5ece upstream.
      
      Dell Precision 5820 with ALC3234 codec (which is equivalent with
      ALC255) shows click noises at (runtime) PM resume on the headphone.
      The biggest source of the noise comes from the cleared headphone pin
      control at resume, which is done via the standard shutup procedure.
      
      Although we have an override of the standard shutup callback to
      replace with NOP, this would skip other needed stuff (e.g. the pull
      down of headset power).  So, instead, this "fixes" the behavior of
      alc_fixup_no_shutup() by introducing spec->no_shutup_pins flag.
      When this flag is set, Realtek codec won't call the standard
      snd_hda_shutup_pins() & co.  Now alc_fixup_no_shutup() just sets this
      flag instead of overriding spec->shutup callback itself.  This allows
      us to apply the similar fix for other entries easily if needed in
      future.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e6a0f604
    • J
      ALSA: hda/realtek: Enable audio jacks of ASUS UX362FA with ALC294 · c86ee6a9
      Jian-Hong Pan 提交于
      commit 8bb37a2a upstream.
      
      The ASUS UX362FA with ALC294 cannot detect the headset MIC and outputs
      through the internal speaker and the headphone.  This issue can be fixed
      by the quirk in the commit 4e051106 ALSA: hda/realtek: Enable audio
      jacks of ASUS UX533FD with ALC294.
      
      Besides, ASUS UX362FA and UX533FD have the same audio initial pin config
      values.  So, this patch replaces SND_PCI_QUIRK of UX533FD with a new
      SND_HDA_PIN_QUIRK which benefits both UX362FA and UX533FD.
      
      Fixes: 4e051106 ("ALSA: hda/realtek: Enable audio jacks of ASUS UX533FD with ALC294")
      Signed-off-by: NJian-Hong Pan <jian-hong@endlessm.com>
      Signed-off-by: NMing Shuo Chiu <chiu@endlessm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c86ee6a9
    • J
      ALSA: hda - add more quirks for HP Z2 G4 and HP Z240 · a440199f
      Jaroslav Kysela 提交于
      commit 167897f4 upstream.
      
      Apply the HP_MIC_NO_PRESENCE fixups for the more HP Z2 G4 and
      HP Z240 models.
      Reported-by: NJeff Burrell <jeff.burrell@hp.com>
      Signed-off-by: NJaroslav Kysela <perex@perex.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      a440199f
    • T
      ALSA: hda: Extend i915 component bind timeout · 2df05733
      Takashi Iwai 提交于
      commit cfc35f9c upstream.
      
      I set 10 seconds for the timeout of the i915 audio component binding
      with a hope that recent machines are fast enough to handle all probe
      tasks in that period, but I was too optimistic.  The binding may take
      longer than that, and this caused a problem on the machine with both
      audio and graphics driver modules loaded in parallel, as Paul Menzel
      experienced.  This problem haven't hit so often just because the KMS
      driver is loaded in initrd on most machines.
      
      As a simple workaround, extend the timeout to 60 seconds.
      
      Fixes: f9b54e19 ("ALSA: hda/i915: Allow delayed i915 audio component binding")
      Reported-by: NPaul Menzel <pmenzel+alsa-devel@molgen.mpg.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2df05733
    • T
      ALSA: firewire-motu: fix construction of PCM frame for capture direction · b8f0cae8
      Takashi Sakamoto 提交于
      commit f97a0944 upstream.
      
      In data blocks of common isochronous packet for MOTU devices, PCM
      frames are multiplexed in a shape of '24 bit * 4 Audio Pack', described
      in IEC 61883-6. The frames are not aligned to quadlet.
      
      For capture PCM substream, ALSA firewire-motu driver constructs PCM
      frames by reading data blocks byte-by-byte. However this operation
      includes bug for lower byte of the PCM sample. This brings invalid
      content of the PCM samples.
      
      This commit fixes the bug.
      Reported-by: NPeter Sjöberg <autopeter@gmail.com>
      Cc: <stable@vger.kernel.org> # v4.12+
      Fixes: 4641c939 ("ALSA: firewire-motu: add MOTU specific protocol layer")
      Signed-off-by: NTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      b8f0cae8
    • T
      ALSA: bebob: use more identical mod_alias for Saffire Pro 10 I/O against Liquid Saffire 56 · 3ca7ef95
      Takashi Sakamoto 提交于
      commit 7dc661bd upstream.
      
      ALSA bebob driver has an entry for Focusrite Saffire Pro 10 I/O. The
      entry matches vendor_id in root directory and model_id in unit
      directory of configuration ROM for IEEE 1394 bus.
      
      On the other hand, configuration ROM of Focusrite Liquid Saffire 56
      has the same vendor_id and model_id. This device is an application of
      TCAT Dice (TCD2220 a.k.a Dice Jr.) however ALSA bebob driver can be
      bound to it randomly instead of ALSA dice driver. At present, drivers
      in ALSA firewire stack can not handle this situation appropriately.
      
      This commit uses more identical mod_alias for Focusrite Saffire Pro 10
      I/O in ALSA bebob driver.
      
      $ python2 crpp < /sys/bus/firewire/devices/fw1/config_rom
                     ROM header and bus information block
                     -----------------------------------------------------------------
      400  042a829d  bus_info_length 4, crc_length 42, crc 33437
      404  31333934  bus_name "1394"
      408  f0649222  irmc 1, cmc 1, isc 1, bmc 1, pmc 0, cyc_clk_acc 100,
                     max_rec 9 (1024), max_rom 2, gen 2, spd 2 (S400)
      40c  00130e01  company_id 00130e     |
      410  000606e0  device_id 01000606e0  | EUI-64 00130e01000606e0
      
                     root directory
                     -----------------------------------------------------------------
      414  0009d31c  directory_length 9, crc 54044
      418  04000014  hardware version
      41c  0c0083c0  node capabilities per IEEE 1394
      420  0300130e  vendor
      424  81000012  --> descriptor leaf at 46c
      428  17000006  model
      42c  81000016  --> descriptor leaf at 484
      430  130120c2  version
      434  d1000002  --> unit directory at 43c
      438  d4000006  --> dependent info directory at 450
      
                     unit directory at 43c
                     -----------------------------------------------------------------
      43c  0004707c  directory_length 4, crc 28796
      440  1200a02d  specifier id: 1394 TA
      444  13010001  version: AV/C
      448  17000006  model
      44c  81000013  --> descriptor leaf at 498
      
                     dependent info directory at 450
                     -----------------------------------------------------------------
      450  000637c7  directory_length 6, crc 14279
      454  120007f5  specifier id
      458  13000001  version
      45c  3affffc7  (immediate value)
      460  3b100000  (immediate value)
      464  3cffffc7  (immediate value)
      468  3d600000  (immediate value)
      
                     descriptor leaf at 46c
                     -----------------------------------------------------------------
      46c  00056f3b  leaf_length 5, crc 28475
      470  00000000  textual descriptor
      474  00000000  minimal ASCII
      478  466f6375  "Focu"
      47c  73726974  "srit"
      480  65000000  "e"
      
                     descriptor leaf at 484
                     -----------------------------------------------------------------
      484  0004a165  leaf_length 4, crc 41317
      488  00000000  textual descriptor
      48c  00000000  minimal ASCII
      490  50726f31  "Pro1"
      494  30494f00  "0IO"
      
                     descriptor leaf at 498
                     -----------------------------------------------------------------
      498  0004a165  leaf_length 4, crc 41317
      49c  00000000  textual descriptor
      4a0  00000000  minimal ASCII
      4a4  50726f31  "Pro1"
      4a8  30494f00  "0IO"
      
      $ python2 crpp < /sys/bus/firewire/devices/fw1/config_rom
                     ROM header and bus information block
                     -----------------------------------------------------------------
      400  040442e4  bus_info_length 4, crc_length 4, crc 17124
      404  31333934  bus_name "1394"
      408  e0ff8112  irmc 1, cmc 1, isc 1, bmc 0, pmc 0, cyc_clk_acc 255,
                     max_rec 8 (512), max_rom 1, gen 1, spd 2 (S400)
      40c  00130e04  company_id 00130e     |
      410  018001e9  device_id 04018001e9  | EUI-64 00130e04018001e9
      
                     root directory
                     -----------------------------------------------------------------
      414  00065612  directory_length 6, crc 22034
      418  0300130e  vendor
      41c  8100000a  --> descriptor leaf at 444
      420  17000006  model
      424  8100000e  --> descriptor leaf at 45c
      428  0c0087c0  node capabilities per IEEE 1394
      42c  d1000001  --> unit directory at 430
      
                     unit directory at 430
                     -----------------------------------------------------------------
      430  000418a0  directory_length 4, crc 6304
      434  1200130e  specifier id
      438  13000001  version
      43c  17000006  model
      440  8100000f  --> descriptor leaf at 47c
      
                     descriptor leaf at 444
                     -----------------------------------------------------------------
      444  00056f3b  leaf_length 5, crc 28475
      448  00000000  textual descriptor
      44c  00000000  minimal ASCII
      450  466f6375  "Focu"
      454  73726974  "srit"
      458  65000000  "e"
      
                     descriptor leaf at 45c
                     -----------------------------------------------------------------
      45c  000762c6  leaf_length 7, crc 25286
      460  00000000  textual descriptor
      464  00000000  minimal ASCII
      468  4c495155  "LIQU"
      46c  49445f53  "ID_S"
      470  41464649  "AFFI"
      474  52455f35  "RE_5"
      478  36000000  "6"
      
                     descriptor leaf at 47c
                     -----------------------------------------------------------------
      47c  000762c6  leaf_length 7, crc 25286
      480  00000000  textual descriptor
      484  00000000  minimal ASCII
      488  4c495155  "LIQU"
      48c  49445f53  "ID_S"
      490  41464649  "AFFI"
      494  52455f35  "RE_5"
      498  36000000  "6"
      
      Cc: <stable@vger.kernel.org> # v3.16+
      Fixes: 25784ec2 ("ALSA: bebob: Add support for Focusrite Saffire/SaffirePro series")
      Signed-off-by: NTakashi Sakamoto <o-takashi@sakamocchi.jp>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      3ca7ef95
    • P
      perf/x86: Fixup typo in stub functions · d40114f2
      Peter Zijlstra 提交于
      commit f764c58b upstream.
      
      Guenter reported a build warning for CONFIG_CPU_SUP_INTEL=n:
      
        > With allmodconfig-CONFIG_CPU_SUP_INTEL, this patch results in:
        >
        > In file included from arch/x86/events/amd/core.c:8:0:
        > arch/x86/events/amd/../perf_event.h:1036:45: warning: ‘struct cpu_hw_event’ declared inside parameter list will not be visible outside of this definition or declaration
        >  static inline int intel_cpuc_prepare(struct cpu_hw_event *cpuc, int cpu)
      
      While harmless (an unsed pointer is an unused pointer, no matter the type)
      it needs fixing.
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: d01b1f96 ("perf/x86/intel: Make cpuc allocations consistent")
      Link: http://lkml.kernel.org/r/20190315081410.GR5996@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      d40114f2
    • J
      f2fs: wait on atomic writes to count F2FS_CP_WB_DATA · ead2adf5
      Jaegeuk Kim 提交于
      commit 31867b23 upstream.
      
      Otherwise, we can get wrong counts incurring checkpoint hang.
      
      IO_W (CP:  -24, Data:   24, Flush: (   0    0    1), Discard: (   0    0))
      
      Thread A                        Thread B
      - f2fs_write_data_pages
       -  __write_data_page
        - f2fs_submit_page_write
         - inc_page_count(F2FS_WB_DATA)
           type is F2FS_WB_DATA due to file is non-atomic one
      - f2fs_ioc_start_atomic_write
       - set_inode_flag(FI_ATOMIC_FILE)
                                      - f2fs_write_end_io
                                       - dec_page_count(F2FS_WB_CP_DATA)
                                         type is F2FS_WB_DATA due to file becomes
                                         atomic one
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ead2adf5
    • V
      net: sched: flower: insert new filter to idr after setting its mask · 0183442b
      Vlad Buslov 提交于
      [ Upstream commit ecb3dea400d3beaf611ce76ac7a51d4230492cf2 ]
      
      When adding new filter to flower classifier, fl_change() inserts it to
      handle_idr before initializing filter extensions and assigning it a mask.
      Normally this ordering doesn't matter because all flower classifier ops
      callbacks assume rtnl lock protection. However, when filter has an action
      that doesn't have its kernel module loaded, rtnl lock is released before
      call to request_module(). During this time the filter can be accessed bu
      concurrent task before its initialization is completed, which can lead to a
      crash.
      
      Example case of NULL pointer dereference in concurrent dump:
      
      Task 1                           Task 2
      
      tc_new_tfilter()
       fl_change()
        idr_alloc_u32(fnew)
        fl_set_parms()
         tcf_exts_validate()
          tcf_action_init()
           tcf_action_init_1()
            rtnl_unlock()
            request_module()
            ...                        rtnl_lock()
            				 tc_dump_tfilter()
            				  tcf_chain_dump()
      				   fl_walk()
      				    idr_get_next_ul()
      				    tcf_node_dump()
      				     tcf_fill_node()
      				      fl_dump()
      				       mask = &f->mask->key; <- NULL ptr
            rtnl_lock()
      
      Extension initialization and mask assignment don't depend on fnew->handle
      that is allocated by idr_alloc_u32(). Move idr allocation code after action
      creation and mask assignment in fl_change() to prevent concurrent access
      to not fully initialized filter when rtnl lock is released to load action
      module.
      
      Fixes: 01683a14 ("net: sched: refactor flower walk to iterate over idr")
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Reviewed-by: NRoi Dayan <roid@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      0183442b
    • A
      missing barriers in some of unix_sock ->addr and ->path accesses · c96d668e
      Al Viro 提交于
      [ Upstream commit ae3b5641 ]
      
      Several u->addr and u->path users are not holding any locks in
      common with unix_bind().  unix_state_lock() is useless for those
      purposes.
      
      u->addr is assign-once and *(u->addr) is fully set up by the time
      we set u->addr (all under unix_table_lock).  u->path is also
      set in the same critical area, also before setting u->addr, and
      any unix_sock with ->path filled will have non-NULL ->addr.
      
      So setting ->addr with smp_store_release() is all we need for those
      "lockless" users - just have them fetch ->addr with smp_load_acquire()
      and don't even bother looking at ->path if they see NULL ->addr.
      
      Users of ->addr and ->path fall into several classes now:
          1) ones that do smp_load_acquire(u->addr) and access *(u->addr)
      and u->path only if smp_load_acquire() has returned non-NULL.
          2) places holding unix_table_lock.  These are guaranteed that
      *(u->addr) is seen fully initialized.  If unix_sock is in one of the
      "bound" chains, so's ->path.
          3) unix_sock_destructor() using ->addr is safe.  All places
      that set u->addr are guaranteed to have seen all stores *(u->addr)
      while holding a reference to u and unix_sock_destructor() is called
      when (atomic) refcount hits zero.
          4) unix_release_sock() using ->path is safe.  unix_bind()
      is serialized wrt unix_release() (normally - by struct file
      refcount), and for the instances that had ->path set by unix_bind()
      unix_release_sock() comes from unix_release(), so they are fine.
      Instances that had it set in unix_stream_connect() either end up
      attached to a socket (in unix_accept()), in which case the call
      chain to unix_release_sock() and serialization are the same as in
      the previous case, or they never get accept'ed and unix_release_sock()
      is called when the listener is shut down and its queue gets purged.
      In that case the listener's queue lock provides the barriers needed -
      unix_stream_connect() shoves our unix_sock into listener's queue
      under that lock right after having set ->path and eventual
      unix_release_sock() caller picks them from that queue under the
      same lock right before calling unix_release_sock().
          5) unix_find_other() use of ->path is pointless, but safe -
      it happens with successful lookup by (abstract) name, so ->path.dentry
      is guaranteed to be NULL there.
      earlier-variant-reviewed-by: N"Paul E. McKenney" <paulmck@linux.ibm.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      c96d668e
    • D
      ipvlan: disallow userns cap_net_admin to change global mode/flags · 394e8359
      Daniel Borkmann 提交于
      [ Upstream commit 7cc9f700 ]
      
      When running Docker with userns isolation e.g. --userns-remap="default"
      and spawning up some containers with CAP_NET_ADMIN under this realm, I
      noticed that link changes on ipvlan slave device inside that container
      can affect all devices from this ipvlan group which are in other net
      namespaces where the container should have no permission to make changes
      to, such as the init netns, for example.
      
      This effectively allows to undo ipvlan private mode and switch globally to
      bridge mode where slaves can communicate directly without going through
      hostns, or it allows to switch between global operation mode (l2/l3/l3s)
      for everyone bound to the given ipvlan master device. libnetwork plugin
      here is creating an ipvlan master and ipvlan slave in hostns and a slave
      each that is moved into the container's netns upon creation event.
      
      * In hostns:
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
           link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
           ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
           inet 10.41.0.1/32 scope link cilium_host
             valid_lft forever preferred_lft forever
        [...]
      
      * Spawn container & change ipvlan mode setting inside of it:
      
        # docker run -dt --cap-add=NET_ADMIN --network cilium-net --name client -l app=test cilium/netperf
        9fff485d69dcb5ce37c9e33ca20a11ccafc236d690105aadbfb77e4f4170879c
      
        # docker exec -ti client ip -d a
        [...]
        10: cilium0@if4: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l3 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
        # docker exec -ti client ip link change link cilium0 name cilium0 type ipvlan mode l2
      
        # docker exec -ti client ip -d a
        [...]
        10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
      * In hostns (mode switched to l2):
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.0.1/32 scope link cilium_host
               valid_lft forever preferred_lft forever
        [...]
      
      Same l3 -> l2 switch would also happen by creating another slave inside
      the container's network namespace when specifying the existing cilium0
      link to derive the actual (bond0) master:
      
        # docker exec -ti client ip link add link cilium0 name cilium1 type ipvlan mode l2
      
        # docker exec -ti client ip -d a
        [...]
        2: cilium1@if4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
        10: cilium0@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.197.43/32 brd 10.41.197.43 scope global cilium0
               valid_lft forever preferred_lft forever
      
      * In hostns:
      
        # ip -d a
        [...]
        8: cilium_host@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
            link/ether 0c:c4:7a:e1:3d:cc brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 65535
            ipvlan  mode l2 bridge numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
            inet 10.41.0.1/32 scope link cilium_host
               valid_lft forever preferred_lft forever
        [...]
      
      One way to mitigate it is to check CAP_NET_ADMIN permissions of
      the ipvlan master device's ns, and only then allow to change
      mode or flags for all devices bound to it. Above two cases are
      then disallowed after the patch.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      394e8359
    • K
      net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 · 09ff3f80
      Kalash Nainwal 提交于
      [ Upstream commit 97f0082a ]
      
      Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255 to
      keep legacy software happy. This is similar to what was done for
      ipv4 in commit 709772e6 ("net: Fix routing tables with
      id > 255 for legacy software").
      Signed-off-by: NKalash Nainwal <kalash@arista.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      09ff3f80
    • E
      net/x25: fix a race in x25_bind() · a282c337
      Eric Dumazet 提交于
      [ Upstream commit 797a22bd ]
      
      syzbot was able to trigger another soft lockup [1]
      
      I first thought it was the O(N^2) issue I mentioned in my
      prior fix (f657d22ee1f "net/x25: do not hold the cpu
      too long in x25_new_lci()"), but I eventually found
      that x25_bind() was not checking SOCK_ZAPPED state under
      socket lock protection.
      
      This means that multiple threads can end up calling
      x25_insert_socket() for the same socket, and corrupt x25_list
      
      [1]
      watchdog: BUG: soft lockup - CPU#0 stuck for 123s! [syz-executor.2:10492]
      Modules linked in:
      irq event stamp: 27515
      hardirqs last  enabled at (27514): [<ffffffff81006673>] trace_hardirqs_on_thunk+0x1a/0x1c
      hardirqs last disabled at (27515): [<ffffffff8100668f>] trace_hardirqs_off_thunk+0x1a/0x1c
      softirqs last  enabled at (32): [<ffffffff8632ee73>] x25_get_neigh+0xa3/0xd0 net/x25/x25_link.c:336
      softirqs last disabled at (34): [<ffffffff86324bc3>] x25_find_socket+0x23/0x140 net/x25/af_x25.c:341
      CPU: 0 PID: 10492 Comm: syz-executor.2 Not tainted 5.0.0-rc7+ #88
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__sanitizer_cov_trace_pc+0x4/0x50 kernel/kcov.c:97
      Code: f4 ff ff ff e8 11 9f ea ff 48 c7 05 12 fb e5 08 00 00 00 00 e9 c8 e9 ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 <48> 8b 75 08 65 48 8b 04 25 40 ee 01 00 65 8b 15 38 0c 92 7e 81 e2
      RSP: 0018:ffff88806e94fc48 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
      RAX: 1ffff1100d84dac5 RBX: 0000000000000001 RCX: ffffc90006197000
      RDX: 0000000000040000 RSI: ffffffff86324bf3 RDI: ffff88806c26d628
      RBP: ffff88806e94fc48 R08: ffff88806c1c6500 R09: fffffbfff1282561
      R10: fffffbfff1282560 R11: ffffffff89412b03 R12: ffff88806c26d628
      R13: ffff888090455200 R14: dffffc0000000000 R15: 0000000000000000
      FS:  00007f3a107e4700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3a107e3db8 CR3: 00000000a5544000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __x25_find_socket net/x25/af_x25.c:327 [inline]
       x25_find_socket+0x7d/0x140 net/x25/af_x25.c:342
       x25_new_lci net/x25/af_x25.c:355 [inline]
       x25_connect+0x380/0xde0 net/x25/af_x25.c:784
       __sys_connect+0x266/0x330 net/socket.c:1662
       __do_sys_connect net/socket.c:1673 [inline]
       __se_sys_connect net/socket.c:1670 [inline]
       __x64_sys_connect+0x73/0xb0 net/socket.c:1670
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457e29
      Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f3a107e3c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457e29
      RDX: 0000000000000012 RSI: 0000000020000200 RDI: 0000000000000005
      RBP: 000000000073c040 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a107e46d4
      R13: 00000000004be362 R14: 00000000004ceb98 R15: 00000000ffffffff
      Sending NMI from CPU 0 to CPUs 1:
      NMI backtrace for cpu 1
      CPU: 1 PID: 10493 Comm: syz-executor.3 Not tainted 5.0.0-rc7+ #88
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__read_once_size include/linux/compiler.h:193 [inline]
      RIP: 0010:queued_write_lock_slowpath+0x143/0x290 kernel/locking/qrwlock.c:86
      Code: 4c 8d 2c 01 41 83 c7 03 41 0f b6 45 00 41 38 c7 7c 08 84 c0 0f 85 0c 01 00 00 8b 03 3d 00 01 00 00 74 1a f3 90 41 0f b6 55 00 <41> 38 d7 7c eb 84 d2 74 e7 48 89 df e8 cc aa 4e 00 eb dd be 04 00
      RSP: 0018:ffff888085c47bd8 EFLAGS: 00000206
      RAX: 0000000000000300 RBX: ffffffff89412b00 RCX: 1ffffffff1282560
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff89412b00
      RBP: ffff888085c47c70 R08: 1ffffffff1282560 R09: fffffbfff1282561
      R10: fffffbfff1282560 R11: ffffffff89412b03 R12: 00000000000000ff
      R13: fffffbfff1282560 R14: 1ffff11010b88f7d R15: 0000000000000003
      FS:  00007fdd04086700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fdd04064db8 CR3: 0000000090be0000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       queued_write_lock include/asm-generic/qrwlock.h:104 [inline]
       do_raw_write_lock+0x1d6/0x290 kernel/locking/spinlock_debug.c:203
       __raw_write_lock_bh include/linux/rwlock_api_smp.h:204 [inline]
       _raw_write_lock_bh+0x3b/0x50 kernel/locking/spinlock.c:312
       x25_insert_socket+0x21/0xe0 net/x25/af_x25.c:267
       x25_bind+0x273/0x340 net/x25/af_x25.c:703
       __sys_bind+0x23f/0x290 net/socket.c:1481
       __do_sys_bind net/socket.c:1492 [inline]
       __se_sys_bind net/socket.c:1490 [inline]
       __x64_sys_bind+0x73/0xb0 net/socket.c:1490
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457e29
      
      Fixes: 90c27297 ("X.25 remove bkl in bind")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: andrew hendry <andrew.hendry@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      a282c337
    • J
      net/mlx4_core: Fix qp mtt size calculation · da8639ad
      Jack Morgenstein 提交于
      [ Upstream commit 8511a653 ]
      
      Calculation of qp mtt size (in function mlx4_RST2INIT_wrapper)
      ultimately depends on function roundup_pow_of_two.
      
      If the amount of memory required by the QP is less than one page,
      roundup_pow_of_two is called with argument zero.  In this case, the
      roundup_pow_of_two result is undefined.
      
      Calling roundup_pow_of_two with a zero argument resulted in the
      following stack trace:
      
      UBSAN: Undefined behaviour in ./include/linux/log2.h:61:13
      shift exponent 64 is too large for 64-bit type 'long unsigned int'
      CPU: 4 PID: 26939 Comm: rping Tainted: G OE 4.19.0-rc1
      Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
      Call Trace:
      dump_stack+0x9a/0xeb
      ubsan_epilogue+0x9/0x7c
      __ubsan_handle_shift_out_of_bounds+0x254/0x29d
      ? __ubsan_handle_load_invalid_value+0x180/0x180
      ? debug_show_all_locks+0x310/0x310
      ? sched_clock+0x5/0x10
      ? sched_clock+0x5/0x10
      ? sched_clock_cpu+0x18/0x260
      ? find_held_lock+0x35/0x1e0
      ? mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core]
      mlx4_RST2INIT_QP_wrapper+0xfb1/0x1440 [mlx4_core]
      
      Fix this by explicitly testing for zero, and returning one if the
      argument is zero (assuming that the next higher power of 2 in this case
      should be one).
      
      Fixes: c82e9aa0 ("mlx4_core: resource tracking for HCA resources used by guests")
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      da8639ad
    • J
      net/mlx4_core: Fix locking in SRIOV mode when switching between events and polling · 8c14dffa
      Jack Morgenstein 提交于
      [ Upstream commit c07d2792 ]
      
      In procedures mlx4_cmd_use_events() and mlx4_cmd_use_polling(), we need to
      guarantee that there are no FW commands in progress on the comm channel
      (for VFs) or wrapped FW commands (on the PF) when SRIOV is active.
      
      We do this by also taking the slave_cmd_mutex when SRIOV is active.
      
      This is especially important when switching from event to polling, since we
      free the command-context array during the switch.  If there are FW commands
      in progress (e.g., waiting for a completion event), the completion event
      handler will access freed memory.
      
      Since the decision to use comm_wait or comm_poll is taken before grabbing
      the event_sem/poll_sem in mlx4_comm_cmd_wait/poll, we must take the
      slave_cmd_mutex as well (to guarantee that the decision to use events or
      polling and the call to the appropriate cmd function are atomic).
      
      Fixes: a7e1f049 ("net/mlx4_core: Fix deadlock when switching between polling and event fw commands")
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      8c14dffa
    • J
      net/mlx4_core: Fix reset flow when in command polling mode · bd279712
      Jack Morgenstein 提交于
      [ Upstream commit e15ce4b8 ]
      
      As part of unloading a device, the driver switches from
      FW command event mode to FW command polling mode.
      
      Part of switching over to polling mode is freeing the command context array
      memory (unfortunately, currently, without NULLing the command context array
      pointer).
      
      The reset flow calls "complete" to complete all outstanding fw commands
      (if we are in event mode). The check for event vs. polling mode here
      is to test if the command context array pointer is NULL.
      
      If the reset flow is activated after the switch to polling mode, it will
      attempt (incorrectly) to complete all the commands in the context array --
      because the pointer was not NULLed when the driver switched over to polling
      mode.
      
      As a result, we have a use-after-free situation, which results in a
      kernel crash.
      
      For example:
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90
      PGD 0
      Oops: 0000 [#1] SMP
      Modules linked in: netconsole nfsv3 nfs_acl nfs lockd grace ...
      CPU: 2 PID: 940 Comm: kworker/2:3 Kdump: loaded Not tainted 3.10.0-862.el7.x86_64 #1
      Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  04/28/2016
      Workqueue: events hv_eject_device_work [pci_hyperv]
      task: ffff8d1734ca0fd0 ti: ffff8d17354bc000 task.ti: ffff8d17354bc000
      RIP: 0010:[<ffffffff876c4a8e>]  [<ffffffff876c4a8e>] __wake_up_common+0x2e/0x90
      RSP: 0018:ffff8d17354bfa38  EFLAGS: 00010082
      RAX: 0000000000000000 RBX: ffff8d17362d42c8 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8d17362d42c8
      RBP: ffff8d17354bfa70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000298 R11: ffff8d173610e000 R12: ffff8d17362d42d0
      R13: 0000000000000246 R14: 0000000000000000 R15: 0000000000000003
      FS:  0000000000000000(0000) GS:ffff8d1802680000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 00000000f16d8000 CR4: 00000000001406e0
      Call Trace:
       [<ffffffff876c7adc>] complete+0x3c/0x50
       [<ffffffffc04242f0>] mlx4_cmd_wake_completions+0x70/0x90 [mlx4_core]
       [<ffffffffc041e7b1>] mlx4_enter_error_state+0xe1/0x380 [mlx4_core]
       [<ffffffffc041fa4b>] mlx4_comm_cmd+0x29b/0x360 [mlx4_core]
       [<ffffffffc041ff51>] __mlx4_cmd+0x441/0x920 [mlx4_core]
       [<ffffffff877f62b1>] ? __slab_free+0x81/0x2f0
       [<ffffffff87951384>] ? __radix_tree_lookup+0x84/0xf0
       [<ffffffffc043a8eb>] mlx4_free_mtt_range+0x5b/0xb0 [mlx4_core]
       [<ffffffffc043a957>] mlx4_mtt_cleanup+0x17/0x20 [mlx4_core]
       [<ffffffffc04272c7>] mlx4_free_eq+0xa7/0x1c0 [mlx4_core]
       [<ffffffffc042803e>] mlx4_cleanup_eq_table+0xde/0x130 [mlx4_core]
       [<ffffffffc0433e08>] mlx4_unload_one+0x118/0x300 [mlx4_core]
       [<ffffffffc0434191>] mlx4_remove_one+0x91/0x1f0 [mlx4_core]
      
      The fix is to set the command context array pointer to NULL after freeing
      the array.
      
      Fixes: f5aef5aa ("net/mlx4_core: Activate reset flow upon fatal command cases")
      Signed-off-by: NJack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      bd279712
    • E
      vxlan: test dev->flags & IFF_UP before calling gro_cells_receive() · 61b6a4d7
      Eric Dumazet 提交于
      [ Upstream commit 59cbf56f ]
      
      Same reasons than the ones explained in commit 4179cb5a
      ("vxlan: test dev->flags & IFF_UP before calling netif_rx()")
      
      netif_rx() or gro_cells_receive() must be called under a strict contract.
      
      At device dismantle phase, core networking clears IFF_UP
      and flush_all_backlogs() is called after rcu grace period
      to make sure no incoming packet might be in a cpu backlog
      and still referencing the device.
      
      A similar protocol is used for gro_cells infrastructure, as
      gro_cells_destroy() will be called only after a full rcu
      grace period is observed after IFF_UP has been cleared.
      
      Most drivers call netif_rx() from their interrupt handler,
      and since the interrupts are disabled at device dismantle,
      netif_rx() does not have to check dev->flags & IFF_UP
      
      Virtual drivers do not have this guarantee, and must
      therefore make the check themselves.
      
      Otherwise we risk use-after-free and/or crashes.
      
      Fixes: d342894c ("vxlan: virtual extensible lan")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      61b6a4d7
    • S
      vxlan: Fix GRO cells race condition between receive and link delete · 5afd3840
      Stefano Brivio 提交于
      [ Upstream commit ad6c9986 ]
      
      If we receive a packet while deleting a VXLAN device, there's a chance
      vxlan_rcv() is called at the same time as vxlan_dellink(). This is fine,
      except that vxlan_dellink() should never ever touch stuff that's still in
      use, such as the GRO cells list.
      
      Otherwise, vxlan_rcv() crashes while queueing packets via
      gro_cells_receive().
      
      Move the gro_cells_destroy() to vxlan_uninit(), which runs after the RCU
      grace period is elapsed and nothing needs the gro_cells anymore.
      
      This is now done in the same way as commit 8e816df8 ("geneve: Use GRO
      cells infrastructure.") originally implemented for GENEVE.
      Reported-by: NJianlin Shi <jishi@redhat.com>
      Fixes: 58ce31cc ("vxlan: GRO support at tunnel layer")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      5afd3840
    • G
      tcp: handle inet_csk_reqsk_queue_add() failures · 4788d11e
      Guillaume Nault 提交于
      [  Upstream commit 9d3e1368 ]
      
      Commit 7716682c ("tcp/dccp: fix another race at listener
      dismantle") let inet_csk_reqsk_queue_add() fail, and adjusted
      {tcp,dccp}_check_req() accordingly. However, TFO and syncookies
      weren't modified, thus leaking allocated resources on error.
      
      Contrary to tcp_check_req(), in both syncookies and TFO cases,
      we need to drop the request socket. Also, since the child socket is
      created with inet_csk_clone_lock(), we have to unlock it and drop an
      extra reference (->sk_refcount is initially set to 2 and
      inet_csk_reqsk_queue_add() drops only one ref).
      
      For TFO, we also need to revert the work done by tcp_try_fastopen()
      (with reqsk_fastopen_remove()).
      
      Fixes: 7716682c ("tcp/dccp: fix another race at listener dismantle")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      4788d11e
    • C
      tcp: Don't access TCP_SKB_CB before initializing it · 5b73346b
      Christoph Paasch 提交于
      [ Upstream commit f2feaefd ]
      
      Since commit eeea10b8 ("tcp: add
      tcp_v4_fill_cb()/tcp_v4_restore_cb()"), tcp_vX_fill_cb is only called
      after tcp_filter(). That means, TCP_SKB_CB(skb)->end_seq still points to
      the IP-part of the cb.
      
      We thus should not mock with it, as this can trigger bugs (thanks
      syzkaller):
      [   12.349396] ==================================================================
      [   12.350188] BUG: KASAN: slab-out-of-bounds in ip6_datagram_recv_specific_ctl+0x19b3/0x1a20
      [   12.351035] Read of size 1 at addr ffff88006adbc208 by task test_ip6_datagr/1799
      
      Setting end_seq is actually no more necessary in tcp_filter as it gets
      initialized later on in tcp_vX_fill_cb.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: eeea10b8 ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()")
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      5b73346b
    • S
      tcp: do not report TCP_CM_INQ of 0 for closed connections · 4000bf8e
      Soheil Hassas Yeganeh 提交于
      [ Upstream commit 6466e715 ]
      
      Returning 0 as inq to userspace indicates there is no more data to
      read, and the application needs to wait for EPOLLIN. For a connection
      that has received FIN from the remote peer, however, the application
      must continue reading until getting EOF (return value of 0
      from tcp_recvmsg) or an error, if edge-triggered epoll (EPOLLET) is
      being used. Otherwise, the application will never receive a new
      EPOLLIN, since there is no epoll edge after the FIN.
      
      Return 1 when there is no data left on the queue but the
      connection has received FIN, so that the applications continue
      reading.
      
      Fixes: b75eba76 (tcp: send in-queue bytes in cmsg upon read)
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      4000bf8e
    • X
      sctp: remove sched init from sctp_stream_init · 286bb162
      Xin Long 提交于
      [ Upstream commit 2e990dfd ]
      
      syzbot reported a NULL-ptr deref caused by that sched->init() in
      sctp_stream_init() set stream->rr_next = NULL.
      
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        RIP: 0010:sctp_sched_rr_dequeue+0xd3/0x170 net/sctp/stream_sched_rr.c:141
        Call Trace:
          sctp_outq_dequeue_data net/sctp/outqueue.c:90 [inline]
          sctp_outq_flush_data net/sctp/outqueue.c:1079 [inline]
          sctp_outq_flush+0xba2/0x2790 net/sctp/outqueue.c:1205
      
      All sched info is saved in sout->ext now, in sctp_stream_init()
      sctp_stream_alloc_out() will not change it, there's no need to
      call sched->init() again, since sctp_outq_init() has already
      done it.
      
      Fixes: 5bbbbe32 ("sctp: introduce stream scheduler foundations")
      Reported-by: syzbot+4c9934f20522c0efd657@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      286bb162