1. 13 12月, 2016 12 次提交
    • I
      libceph: no need for GFP_NOFS in ceph_monc_init() · 5418d0a2
      Ilya Dryomov 提交于
      It's called during inital setup, when everything should be allocated
      with GFP_KERNEL.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      5418d0a2
    • I
      libceph: stop allocating a new cipher on every crypto request · 7af3ea18
      Ilya Dryomov 提交于
      This is useless and more importantly not allowed on the writeback path,
      because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
      can recurse back into the filesystem:
      
          kworker/9:3     D ffff92303f318180     0 20732      2 0x00000080
          Workqueue: ceph-msgr ceph_con_workfn [libceph]
           ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
           ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
           00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
          Call Trace:
           [<ffffffff951eb4e1>] ? schedule+0x31/0x80
           [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10
           [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130
           [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30
           [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
           [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270
           [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
           [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120
           [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
           [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190
           [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0
           [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320
           [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350
           [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0
           [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60
           [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560
           [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0
           [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
           [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580
           [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph]
           [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
           [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60
           [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
           [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130
           [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180
           [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0
           [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150
           [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120
           [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
           [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
           [<ffffffff950d40a0>] ? release_sock+0x40/0x90
           [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0
           [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph]
           [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph]
           [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph]
           [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph]
           [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
           [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
           [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph]
           [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph]
           [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph]
           [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
           [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410
           [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480
           [<ffffffff94c95170>] ? process_one_work+0x410/0x410
           [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0
           [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40
           [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190
      
      Allocating the cipher along with the key fixes the issue - as long the
      key doesn't change, a single cipher context can be used concurrently in
      multiple requests.
      
      We still can't take that GFP_KERNEL allocation though.  Both
      ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
      GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.
      Reported-by: NLucas Stach <l.stach@pengutronix.de>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      7af3ea18
    • I
      libceph: uninline ceph_crypto_key_destroy() · 6db2304a
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      6db2304a
    • I
      2b1e1a7c
    • I
      libceph: switch ceph_x_decrypt() to ceph_crypt() · e15fd0a1
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      e15fd0a1
    • I
      libceph: switch ceph_x_encrypt() to ceph_crypt() · d03857c6
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      d03857c6
    • I
      libceph: tweak calcu_signature() a little · 4eb4517c
      Ilya Dryomov 提交于
      - replace an ad-hoc array with a struct
      - rename to calc_signature() for consistency
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      4eb4517c
    • I
      libceph: rename and align ceph_x_authorizer::reply_buf · 7882a26d
      Ilya Dryomov 提交于
      It's going to be used as a temporary buffer for in-place en/decryption
      with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
      Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      7882a26d
    • I
      libceph: introduce ceph_crypt() for in-place en/decryption · a45f795c
      Ilya Dryomov 提交于
      Starting with 4.9, kernel stacks may be vmalloced and therefore not
      guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
      option is enabled by default on x86.  This makes it invalid to use
      on-stack buffers with the crypto scatterlist API, as sg_set_buf()
      expects a logical address and won't work with vmalloced addresses.
      
      There isn't a different (e.g. kvec-based) crypto API we could switch
      net/ceph/crypto.c to and the current scatterlist.h API isn't getting
      updated to accommodate this use case.  Allocating a new header and
      padding for each operation is a non-starter, so do the en/decryption
      in-place on a single pre-assembled (header + data + padding) heap
      buffer.  This is explicitly supported by the crypto API:
      
          "... the caller may provide the same scatter/gather list for the
           plaintext and cipher text. After the completion of the cipher
           operation, the plaintext data is replaced with the ciphertext data
           in case of an encryption and vice versa for a decryption."
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      a45f795c
    • I
      libceph: introduce ceph_x_encrypt_offset() · 55d9cc83
      Ilya Dryomov 提交于
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      55d9cc83
    • I
      libceph: old_key in process_one_ticket() is redundant · 462e6504
      Ilya Dryomov 提交于
      Since commit 0a990e70 ("ceph: clean up service ticket decoding"),
      th->session_key isn't assigned until everything is decoded.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      462e6504
    • I
      libceph: ceph_x_encrypt_buflen() takes in_len · 36721ece
      Ilya Dryomov 提交于
      Pass what's going to be encrypted - that's msg_b, not ticket_blob.
      ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
      the maxlen calculation, but makes it a bit clearer.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      36721ece
  2. 07 12月, 2016 2 次提交
    • M
      can: raw: raw_setsockopt: limit number of can_filter that can be set · 332b05ca
      Marc Kleine-Budde 提交于
      This patch adds a check to limit the number of can_filters that can be
      set via setsockopt on CAN_RAW sockets. Otherwise allocations > MAX_ORDER
      are not prevented resulting in a warning.
      
      Reference: https://lkml.org/lkml/2016/12/2/230Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Cc: linux-stable <stable@vger.kernel.org>
      Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
      332b05ca
    • M
      tcp: warn on bogus MSS and try to amend it · dcb17d22
      Marcelo Ricardo Leitner 提交于
      There have been some reports lately about TCP connection stalls caused
      by NIC drivers that aren't setting gso_size on aggregated packets on rx
      path. This causes TCP to assume that the MSS is actually the size of the
      aggregated packet, which is invalid.
      
      Although the proper fix is to be done at each driver, it's often hard
      and cumbersome for one to debug, come to such root cause and report/fix
      it.
      
      This patch amends this situation in two ways. First, it adds a warning
      on when this situation occurs, so it gives a hint to those trying to
      debug this. It also limit the maximum probed MSS to the adverised MSS,
      as it should never be any higher than that.
      
      The result is that the connection may not have the best performance ever
      but it shouldn't stall, and the admin will have a hint on what to look
      for.
      
      Tested with virtio by forcing gso_size to 0.
      
      v2: updated msg per David's suggestion
      v3: use skb_iif to find the interface and also log its name, per Eric
          Dumazet's suggestion. As the skb may be backlogged and the interface
          gone by then, we need to check if the number still has a meaning.
      v4: use helper tcp_gro_dev_warn() and avoid pr_warn_once inside __once, per
          David's suggestion
      
      Cc: Jonathan Maxwell <jmaxwell37@gmail.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dcb17d22
  3. 06 12月, 2016 7 次提交
  4. 04 12月, 2016 1 次提交
  5. 03 12月, 2016 8 次提交
    • E
      net: avoid signed overflows for SO_{SND|RCV}BUFFORCE · b98b0bc8
      Eric Dumazet 提交于
      CAP_NET_ADMIN users should not be allowed to set negative
      sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
      corruptions, crashes, OOM...
      
      Note that before commit 82981930 ("net: cleanups in
      sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
      and SO_RCVBUF were vulnerable.
      
      This needs to be backported to all known linux kernels.
      
      Again, many thanks to syzkaller team for discovering this gem.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b98b0bc8
    • M
      tipc: check minimum bearer MTU · 3de81b75
      Michal Kubeček 提交于
      Qian Zhang (张谦) reported a potential socket buffer overflow in
      tipc_msg_build() which is also known as CVE-2016-8632: due to
      insufficient checks, a buffer overflow can occur if MTU is too short for
      even tipc headers. As anyone can set device MTU in a user/net namespace,
      this issue can be abused by a regular user.
      
      As agreed in the discussion on Ben Hutchings' original patch, we should
      check the MTU at the moment a bearer is attached rather than for each
      processed packet. We also need to repeat the check when bearer MTU is
      adjusted to new device MTU. UDP case also needs a check to avoid
      overflow when calculating bearer MTU.
      
      Fixes: b97bf3fd ("[TIPC] Initial merge")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Reported-by: NQian Zhang (张谦) <zhangqian-c@360.cn>
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3de81b75
    • A
      ip6_offload: check segs for NULL in ipv6_gso_segment. · 6b6ebb6b
      Artem Savkov 提交于
      segs needs to be checked for being NULL in ipv6_gso_segment() before calling
      skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
      
      [   97.811262] BUG: unable to handle kernel NULL pointer dereference at 00000000000000cc
      [   97.819112] IP: [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
      [   97.825214] PGD 0 [   97.827047]
      [   97.828540] Oops: 0000 [#1] SMP
      [   97.831678] Modules linked in: vhost_net vhost macvtap macvlan nfsv3 rpcsec_gss_krb5
      nfsv4 dns_resolver nfs fscache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4
      iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
      ipt_REJECT nf_reject_ipv4 tun ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter
      bridge stp llc snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel
      snd_hda_codec edac_mce_amd snd_hda_core edac_core snd_hwdep kvm_amd snd_seq kvm snd_seq_device
      snd_pcm irqbypass snd_timer ppdev parport_serial snd parport_pc k10temp pcspkr soundcore parport
      sp5100_tco shpchp sg wmi i2c_piix4 acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc
      ip_tables xfs libcrc32c sr_mod cdrom sd_mod ata_generic pata_acpi amdkfd amd_iommu_v2 radeon
      broadcom bcm_phy_lib i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
      ttm ahci serio_raw tg3 firewire_ohci libahci pata_atiixp drm ptp libata firewire_core pps_core
      i2c_core crc_itu_t fjes dm_mirror dm_region_hash dm_log dm_mod
      [   97.927721] CPU: 1 PID: 3504 Comm: vhost-3495 Not tainted 4.9.0-7.el7.test.x86_64 #1
      [   97.935457] Hardware name: AMD Snook/Snook, BIOS ESK0726A 07/26/2010
      [   97.941806] task: ffff880129a1c080 task.stack: ffffc90001bcc000
      [   97.947720] RIP: 0010:[<ffffffff816e52f9>]  [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
      [   97.956251] RSP: 0018:ffff88012fc43a10  EFLAGS: 00010207
      [   97.961557] RAX: 0000000000000000 RBX: ffff8801292c8700 RCX: 0000000000000594
      [   97.968687] RDX: 0000000000000593 RSI: ffff880129a846c0 RDI: 0000000000240000
      [   97.975814] RBP: ffff88012fc43a68 R08: ffff880129a8404e R09: 0000000000000000
      [   97.982942] R10: 0000000000000000 R11: ffff880129a84076 R12: 00000020002949b3
      [   97.990070] R13: ffff88012a580000 R14: 0000000000000000 R15: ffff88012a580000
      [   97.997198] FS:  0000000000000000(0000) GS:ffff88012fc40000(0000) knlGS:0000000000000000
      [   98.005280] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   98.011021] CR2: 00000000000000cc CR3: 0000000126c5d000 CR4: 00000000000006e0
      [   98.018149] Stack:
      [   98.020157]  00000000ffffffff ffff88012fc43ac8 ffffffffa017ad0a 000000000000000e
      [   98.027584]  0000001300000000 0000000077d59998 ffff8801292c8700 00000020002949b3
      [   98.035010]  ffff88012a580000 0000000000000000 ffff88012a580000 ffff88012fc43a98
      [   98.042437] Call Trace:
      [   98.044879]  <IRQ> [   98.046803]  [<ffffffffa017ad0a>] ? tg3_start_xmit+0x84a/0xd60 [tg3]
      [   98.053156]  [<ffffffff815eeee0>] skb_mac_gso_segment+0xb0/0x130
      [   98.059158]  [<ffffffff815eefd3>] __skb_gso_segment+0x73/0x110
      [   98.064985]  [<ffffffff815ef40d>] validate_xmit_skb+0x12d/0x2b0
      [   98.070899]  [<ffffffff815ef5d2>] validate_xmit_skb_list+0x42/0x70
      [   98.077073]  [<ffffffff81618560>] sch_direct_xmit+0xd0/0x1b0
      [   98.082726]  [<ffffffff815efd86>] __dev_queue_xmit+0x486/0x690
      [   98.088554]  [<ffffffff8135c135>] ? cpumask_next_and+0x35/0x50
      [   98.094380]  [<ffffffff815effa0>] dev_queue_xmit+0x10/0x20
      [   98.099863]  [<ffffffffa09ce057>] br_dev_queue_push_xmit+0xa7/0x170 [bridge]
      [   98.106907]  [<ffffffffa09ce161>] br_forward_finish+0x41/0xc0 [bridge]
      [   98.113430]  [<ffffffff81627cf2>] ? nf_iterate+0x52/0x60
      [   98.118735]  [<ffffffff81627d6b>] ? nf_hook_slow+0x6b/0xc0
      [   98.124216]  [<ffffffffa09ce32c>] __br_forward+0x14c/0x1e0 [bridge]
      [   98.130480]  [<ffffffffa09ce120>] ? br_dev_queue_push_xmit+0x170/0x170 [bridge]
      [   98.137785]  [<ffffffffa09ce4bd>] br_forward+0x9d/0xb0 [bridge]
      [   98.143701]  [<ffffffffa09cfbb7>] br_handle_frame_finish+0x267/0x560 [bridge]
      [   98.150834]  [<ffffffffa09d0064>] br_handle_frame+0x174/0x2f0 [bridge]
      [   98.157355]  [<ffffffff8102fb89>] ? sched_clock+0x9/0x10
      [   98.162662]  [<ffffffff810b63b2>] ? sched_clock_cpu+0x72/0xa0
      [   98.168403]  [<ffffffff815eccf5>] __netif_receive_skb_core+0x1e5/0xa20
      [   98.174926]  [<ffffffff813659f9>] ? timerqueue_add+0x59/0xb0
      [   98.180580]  [<ffffffff815ed548>] __netif_receive_skb+0x18/0x60
      [   98.186494]  [<ffffffff815ee625>] process_backlog+0x95/0x140
      [   98.192145]  [<ffffffff815edccd>] net_rx_action+0x16d/0x380
      [   98.197713]  [<ffffffff8170cff1>] __do_softirq+0xd1/0x283
      [   98.203106]  [<ffffffff8170b2bc>] do_softirq_own_stack+0x1c/0x30
      [   98.209107]  <EOI> [   98.211029]  [<ffffffff8108a5c0>] do_softirq+0x50/0x60
      [   98.216166]  [<ffffffff815ec853>] netif_rx_ni+0x33/0x80
      [   98.221386]  [<ffffffffa09eeff7>] tun_get_user+0x487/0x7f0 [tun]
      [   98.227388]  [<ffffffffa09ef3ab>] tun_sendmsg+0x4b/0x60 [tun]
      [   98.233129]  [<ffffffffa0b68932>] handle_tx+0x282/0x540 [vhost_net]
      [   98.239392]  [<ffffffffa0b68c25>] handle_tx_kick+0x15/0x20 [vhost_net]
      [   98.245916]  [<ffffffffa0abacfe>] vhost_worker+0x9e/0xf0 [vhost]
      [   98.251919]  [<ffffffffa0abac60>] ? vhost_umem_alloc+0x40/0x40 [vhost]
      [   98.258440]  [<ffffffff81003a47>] ? do_syscall_64+0x67/0x180
      [   98.264094]  [<ffffffff810a44d9>] kthread+0xd9/0xf0
      [   98.268965]  [<ffffffff810a4400>] ? kthread_park+0x60/0x60
      [   98.274444]  [<ffffffff8170a4d5>] ret_from_fork+0x25/0x30
      [   98.279836] Code: 8b 93 d8 00 00 00 48 2b 93 d0 00 00 00 4c 89 e6 48 89 df 66 89 93 c2 00 00 00 ff 10 48 3d 00 f0 ff ff 49 89 c2 0f 87 52 01 00 00 <41> 8b 92 cc 00 00 00 48 8b 80 d0 00 00 00 44 0f b7 74 10 06 66
      [   98.299425] RIP  [<ffffffff816e52f9>] ipv6_gso_segment+0x119/0x2f0
      [   98.305612]  RSP <ffff88012fc43a10>
      [   98.309094] CR2: 00000000000000cc
      [   98.312406] ---[ end trace 726a2c7a2d2d78d0 ]---
      Signed-off-by: NArtem Savkov <asavkov@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b6ebb6b
    • S
      RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net · 721c7443
      Sowmini Varadhan 提交于
      If some error is encountered in rds_tcp_init_net, make sure to
      unregister_netdevice_notifier(), else we could trigger a panic
      later on, when the modprobe from a netns fails.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      721c7443
    • E
      Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()" · 80d1106a
      Eli Cooper 提交于
      This reverts commit ae148b08
      ("ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()").
      
      skb->protocol is now set in __ip_local_out() and __ip6_local_out() before
      dst_output() is called. It is no longer necessary to do it for each tunnel.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NEli Cooper <elicooper@gmx.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80d1106a
    • E
      ipv6: Set skb->protocol properly for local output · b4e479a9
      Eli Cooper 提交于
      When xfrm is applied to TSO/GSO packets, it follows this path:
      
          xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
      
      where skb_gso_segment() relies on skb->protocol to function properly.
      
      This patch sets skb->protocol to ETH_P_IPV6 before dst_output() is called,
      fixing a bug where GSO packets sent through an ipip6 tunnel are dropped
      when xfrm is involved.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NEli Cooper <elicooper@gmx.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4e479a9
    • E
      ipv4: Set skb->protocol properly for local output · f4180439
      Eli Cooper 提交于
      When xfrm is applied to TSO/GSO packets, it follows this path:
      
          xfrm_output() -> xfrm_output_gso() -> skb_gso_segment()
      
      where skb_gso_segment() relies on skb->protocol to function properly.
      
      This patch sets skb->protocol to ETH_P_IP before dst_output() is called,
      fixing a bug where GSO packets sent through a sit tunnel are dropped
      when xfrm is involved.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NEli Cooper <elicooper@gmx.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4180439
    • P
      packet: fix race condition in packet_set_ring · 84ac7260
      Philip Pettersson 提交于
      When packet_set_ring creates a ring buffer it will initialize a
      struct timer_list if the packet version is TPACKET_V3. This value
      can then be raced by a different thread calling setsockopt to
      set the version to TPACKET_V1 before packet_set_ring has finished.
      
      This leads to a use-after-free on a function pointer in the
      struct timer_list when the socket is closed as the previously
      initialized timer will not be deleted.
      
      The bug is fixed by taking lock_sock(sk) in packet_setsockopt when
      changing the packet version while also taking the lock at the start
      of packet_set_ring.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: NPhilip Pettersson <philip.pettersson@gmail.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84ac7260
  6. 02 12月, 2016 2 次提交
  7. 01 12月, 2016 7 次提交
    • H
      netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel · 17a49cd5
      Hongxu Jia 提交于
      Since 09d96860 ("netfilter: x_tables: do compat validation via
      translate_table"), it used compatr structure to assign newinfo
      structure.  In translate_compat_table of ip_tables.c and ip6_tables.c,
      it used compatr->hook_entry to replace info->hook_entry and
      compatr->underflow to replace info->underflow, but not do the same
      replacement in arp_tables.c.
      
      It caused invoking 32-bit "arptbale -P INPUT ACCEPT" failed in 64bit
      kernel.
      --------------------------------------
      root@qemux86-64:~# arptables -P INPUT ACCEPT
      root@qemux86-64:~# arptables -P INPUT ACCEPT
      ERROR: Policy for `INPUT' offset 448 != underflow 0
      arptables: Incompatible with this kernel
      --------------------------------------
      
      Fixes: 09d96860 ("netfilter: x_tables: do compat validation via translate_table")
      Signed-off-by: NHongxu Jia <hongxu.jia@windriver.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      17a49cd5
    • G
      l2tp: fix address test in __l2tp_ip6_bind_lookup() · 31e2f21f
      Guillaume Nault 提交于
      The '!(addr && ipv6_addr_equal(addr, laddr))' part of the conditional
      matches if addr is NULL or if addr != laddr.
      But the intend of __l2tp_ip6_bind_lookup() is to find a sockets with
      the same address, so the ipv6_addr_equal() condition needs to be
      inverted.
      
      For better clarity and consistency with the rest of the expression, the
      (!X || X == Y) notation is used instead of !(X && X != Y).
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31e2f21f
    • G
      l2tp: fix lookup for sockets not bound to a device in l2tp_ip · df90e688
      Guillaume Nault 提交于
      When looking up an l2tp socket, we must consider a null netdevice id as
      wild card. There are currently two problems caused by
      __l2tp_ip_bind_lookup() not considering 'dif' as wild card when set to 0:
      
        * A socket bound to a device (i.e. with sk->sk_bound_dev_if != 0)
          never receives any packet. Since __l2tp_ip_bind_lookup() is called
          with dif == 0 in l2tp_ip_recv(), sk->sk_bound_dev_if is always
          different from 'dif' so the socket doesn't match.
      
        * Two sockets, one bound to a device but not the other, can be bound
          to the same address. If the first socket binding to the address is
          the one that is also bound to a device, the second socket can bind
          to the same address without __l2tp_ip_bind_lookup() noticing the
          overlap.
      
      To fix this issue, we need to consider that any null device index, be
      it 'sk->sk_bound_dev_if' or 'dif', matches with any other value.
      We also need to pass the input device index to __l2tp_ip_bind_lookup()
      on reception so that sockets bound to a device never receive packets
      from other devices.
      
      This patch fixes l2tp_ip6 in the same way.
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df90e688
    • G
      l2tp: fix racy socket lookup in l2tp_ip and l2tp_ip6 bind() · d5e3a190
      Guillaume Nault 提交于
      It's not enough to check for sockets bound to same address at the
      beginning of l2tp_ip{,6}_bind(): even if no socket is found at that
      time, a socket with the same address could be bound before we take
      the l2tp lock again.
      
      This patch moves the lookup right before inserting the new socket, so
      that no change can ever happen to the list between address lookup and
      socket insertion.
      
      Care is taken to avoid side effects on the socket in case of failure.
      That is, modifications of the socket are done after the lookup, when
      binding is guaranteed to succeed, and before releasing the l2tp lock,
      so that concurrent lookups will always see fully initialised sockets.
      
      For l2tp_ip, 'ret' is set to -EINVAL before checking the SOCK_ZAPPED
      bit. Error code was mistakenly set to -EADDRINUSE on error by commit
      32c23116 ("l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()").
      Using -EINVAL restores original behaviour.
      
      For l2tp_ip6, the lookup is now always done with the correct bound
      device. Before this patch, when binding to a link-local address, the
      lookup was done with the original sk->sk_bound_dev_if, which was later
      overwritten with addr->l2tp_scope_id. Lookup is now performed with the
      final sk->sk_bound_dev_if value.
      
      Finally, the (addr_len >= sizeof(struct sockaddr_in6)) check has been
      dropped: addr is a sockaddr_l2tpip6 not sockaddr_in6 and addr_len has
      already been checked at this point (this part of the code seems to have
      been copy-pasted from net/ipv6/raw.c).
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5e3a190
    • G
      l2tp: hold socket before dropping lock in l2tp_ip{, 6}_recv() · a3c18422
      Guillaume Nault 提交于
      Socket must be held while under the protection of the l2tp lock; there
      is no guarantee that sk remains valid after the read_unlock_bh() call.
      
      Same issue for l2tp_ip and l2tp_ip6.
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3c18422
    • G
      l2tp: lock socket before checking flags in connect() · 0382a25a
      Guillaume Nault 提交于
      Socket flags aren't updated atomically, so the socket must be locked
      while reading the SOCK_ZAPPED flag.
      
      This issue exists for both l2tp_ip and l2tp_ip6. For IPv6, this patch
      also brings error handling for __ip6_datagram_connect() failures.
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0382a25a
    • D
      openvswitch: Fix skb leak in IPv6 reassembly. · f92a80a9
      Daniele Di Proietto 提交于
      If nf_ct_frag6_gather() returns an error other than -EINPROGRESS, it
      means that we still have a reference to the skb.  We should free it
      before returning from handle_fragments, as stated in the comment above.
      
      Fixes: daaa7d64 ("netfilter: ipv6: avoid nf_iterate recursion")
      CC: Florian Westphal <fw@strlen.de>
      CC: Pravin B Shelar <pshelar@ovn.org>
      CC: Joe Stringer <joe@ovn.org>
      Signed-off-by: NDaniele Di Proietto <diproiettod@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f92a80a9
  8. 30 11月, 2016 1 次提交