1. 22 Nov 2014, 4 commits
  2. 20 Nov 2014, 6 commits
    • fold verify_iovec() into copy_msghdr_from_user() · 08adb7da
      Committed by Al Viro
      ... and do the same on the compat side of things.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • {compat_,}verify_iovec(): switch to generic copying of iovecs · 08449320
      Committed by Al Viro
      use {compat_,}rw_copy_check_uvector().  As a result, we are
      guaranteed that all iovecs seen in ->msg_iov by ->sendmsg()
      and ->recvmsg() will pass access_ok().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
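
      A minimal sketch of the pattern this conversion lands on, assuming a
      hypothetical caller holding a userland msghdr pointer umsg;
      rw_copy_check_uvector() is the real helper, everything around it is
      illustrative:

          /* Copy and validate the userland iovec array; every segment that
           * survives this call has already passed access_ok().
           */
          struct iovec iovstack[UIO_FASTIOV], *iov = iovstack;
          ssize_t len;

          len = rw_copy_check_uvector(WRITE, umsg->msg_iov, umsg->msg_iovlen,
                                      UIO_FASTIOV, iovstack, &iov);
          if (len < 0)
                  return len;          /* bad pointer or bad segment length */
          /* ... perform the I/O using iov ... */
          if (iov != iovstack)
                  kfree(iov);          /* large arrays are heap-allocated */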
    • separate kernel- and userland-side msghdr · 666547ff
      Committed by Al Viro
      Kernel-side struct msghdr (currently) uses the same layout as the
      userland one, but it's not a one-to-one copy - even without considering
      32bit compat issues, we have msg_iov, msg_name and msg_control copied
      to kernel[1].  It's fairly localized, so we get away with a few functions
      where that knowledge is needed (and we could shrink that set even
      more).  Pretty much everything deals with the kernel-side variant and
      the few places that want userland one just use a bunch of force-casts
      to paper over the differences.
      
      The thing is, kernel-side definition of struct msghdr is *not* exposed
      in include/uapi - libc doesn't see it, etc.  So we can add struct user_msghdr,
      with proper annotations and let the few places that ever deal with those
      beasts use it for userland pointers.  Saner typechecking aside, that will
      allow changing the layout of kernel-side msghdr - e.g. replacing
      msg_iov/msg_iovlen there with struct iov_iter, getting rid of the need
      to modify the iovec as we copy data to/from it, etc.
      
      We could introduce kernel_msghdr instead, but that would create much more
      noise - the absolute majority of the instances would need to have the
      type switched to kernel_msghdr and definition of struct msghdr in
      include/linux/socket.h is not going to be seen by userland anyway.
      
      This commit just introduces user_msghdr and switches the few places that
      are dealing with userland-side msghdr to it.
      
      [1] actually, it's even trickier than that - we copy msg_control for
      sendmsg, but keep the userland address on recvmsg.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
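
      For reference, the userland-facing struct this commit introduces looks
      roughly like the following (a sketch of the definition added to
      include/linux/socket.h; the __user annotations are the point of the
      exercise):

          struct user_msghdr {
                  void            __user *msg_name;    /* socket name */
                  int             msg_namelen;         /* length of name */
                  struct iovec    __user *msg_iov;     /* scatter/gather array */
                  __kernel_size_t msg_iovlen;          /* # elements in msg_iov */
                  void            __user *msg_control; /* ancillary data */
                  __kernel_size_t msg_controllen;      /* ancillary data length */
                  unsigned int    msg_flags;           /* flags on received msg */
          };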
    • netlink: Deletion of an unnecessary check before the function call "__module_get" · fcd4d35e
      Committed by Markus Elfring
      The __module_get() function tests whether its argument is NULL and then
      returns immediately. Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: pktgen: Deletion of an unnecessary check before the function call "proc_remove" · ef87c5d6
      Committed by Markus Elfring
      The proc_remove() function tests whether its argument is NULL and then
      returns immediately. Thus the test around the call is not needed.
      
      This issue was detected by using the Coccinelle software.
      Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
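
      Both Elfring commits above apply the same Coccinelle-driven
      simplification: when the callee already tolerates NULL, the caller-side
      guard is dead weight. A minimal before/after sketch (the surrounding
      variables are illustrative):

          /* before: redundant guards around NULL-safe callees */
          if (pde)
                  proc_remove(pde);
          if (mod)
                  __module_get(mod);

          /* after: both callees check for NULL themselves */
          proc_remove(pde);
          __module_get(mod);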
    • tcp: make connect() mem charging friendly · 355a901e
      Committed by Eric Dumazet
      While working on sk_forward_alloc problems reported by Denys
      Fedoryshchenko, we found that tcp connect() (and fastopen) do not call
      sk_wmem_schedule() for SYN packet (and/or SYN/DATA packet), so
      sk_forward_alloc is negative while connect is in progress.
      
      We can fix this by calling regular sk_stream_alloc_skb() both for the
      SYN packet (in tcp_connect()) and the syn_data packet in
      tcp_send_syn_data().

      Then, tcp_send_syn_data() can avoid copying syn_data, as we can simply
      manipulate syn_data->cb[] to remove the SYN flag (and increment the
      sequence number).

      Instead of open coding memcpy_fromiovecend(), simply use this helper.

      This leaves clean fast-clone skbs in the socket write queue.
      
      This was tested against our fastopen packetdrill tests.
      Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
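
      A hedged sketch of the cb[] manipulation described above: after the
      fast-clone, the skb kept in the write queue is turned into pure data
      by editing its control block, so no payload copy is needed (the exact
      upstream diff may differ):

          /* syn_data was built as SYN+DATA; strip the SYN from the copy
           * kept in the write queue instead of copying the payload.
           */
          TCP_SKB_CB(syn_data)->seq++;                    /* SYN used one seq */
          TCP_SKB_CB(syn_data)->tcp_flags &= ~TCPHDR_SYN; /* clear SYN bit */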
  3. 19 Nov 2014, 8 commits
  4. 17 Nov 2014, 5 commits
  5. 14 Nov 2014, 12 commits
    • openvswitch: Fix build failure. · 8cd4313a
      Committed by Pravin B Shelar
      Add a dependency on INET to fix the following build error. I have also
      fixed the MPLS dependency.
      
      ERROR: "ip_route_output_flow" [net/openvswitch/openvswitch.ko]
      undefined!
      make[1]: *** [__modpost] Error 1
      Reported-by: Jim Davis <jim.epost@gmail.com>
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: limit GSO packets to half cwnd · d649a7a8
      Committed by Eric Dumazet
      In DC world, GSO packets initially cooked by tcp_sendmsg() are usually
      big, as sk_pacing_rate is high.
      
      When network is congested, cwnd can be smaller than the GSO packets
      found in socket write queue. tcp_write_xmit() splits GSO packets
      using the available cwnd, and we end up sending a single GSO packet,
      consuming all available cwnd.
      
      With GRO aggregation on the receiver, we might handle a single GRO
      packet, sending back a single ACK.
      
      1) This single ACK might be lost, forcing TLP or RTO to attempt a
         retransmit.
      2) This ACK releases a full cwnd, and the sender sends another big GSO
         packet, in a ping-pong mode.
      
      This behavior does not fill the pipes in the best way, because of
      scheduling artifacts.
      
      Make sure we always have at least two GSO packets in flight.
      
      This allows us to safely increase GRO efficiency without risking
      spurious retransmits.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
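
      The invariant can be stated as a cap on any single GSO packet: at most
      half the congestion window, so at least two packets (and thus two ACK
      opportunities) are always in flight. An illustrative helper, not the
      upstream diff:

          /* Cap a GSO packet at half of cwnd, but never below 2 segments
           * so tiny windows can still make progress.
           */
          static u32 tso_segs_cap(u32 cwnd, u32 gso_max_segs)
          {
                  u32 half_cwnd = max(2U, cwnd >> 1);

                  return min(half_cwnd, gso_max_segs);
          }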
    • rhashtable: Drop gfp_flags arg in insert/remove functions · 6eba8224
      Committed by Thomas Graf
      Reallocation is only required for shrinking and expanding; both rely
      on a mutex for synchronization, and callers of rhashtable_init() are in
      non-atomic context. Therefore, there is no reason to continue passing
      allocation hints through the API.
      
      Instead, use GFP_KERNEL and add __GFP_NOWARN | __GFP_NORETRY to allow
      a silent fallback to vzalloc() without the OOM killer jumping in, as
      pointed out by Eric Dumazet and Eric W. Biederman.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
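
      The allocation strategy described above is the classic try-kmalloc,
      fall-back-to-vmalloc idiom; a minimal sketch (bucket_table stands in
      for the real table structure):

          static struct bucket_table *alloc_bucket_table(size_t size)
          {
                  struct bucket_table *tbl;

                  /* Fail fast and quietly under memory pressure ... */
                  tbl = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
                  if (!tbl)
                          tbl = vzalloc(size); /* ... then take the vmalloc path */
                  return tbl;
          }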
    • rhashtable: Add parent argument to mutex_is_held · 7b4ce235
      Committed by Herbert Xu
      Currently mutex_is_held can only test locks that are global, since it
      takes no arguments.  This prevents rhashtable from being used in places
      where locks are local, e.g., per-namespace locks.
      
      This patch adds a parent field to mutex_is_held and rhashtable_params
      so that local locks can be used (and tested).
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
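
      With the context pointer threaded through, a per-namespace user can
      test its own lock; a hedged sketch of the callback shape (field names
      approximate that era's rhashtable_params, and the per-namespace state
      type and lock are hypothetical):

          struct rhashtable_params {
                  /* ... */
                  int     (*mutex_is_held)(void *parent); /* now takes context */
                  void    *parent;                        /* passed back verbatim */
          };

          static int netns_lock_is_held(void *parent)
          {
                  struct my_netns_state *ns = parent;  /* hypothetical type */

                  return lockdep_is_held(&ns->hash_lock);
          }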
    • netfilter: Move mutex_is_held under PROVE_LOCKING · 1f501d62
      Committed by Herbert Xu
      The rhashtable function mutex_is_held is only used when PROVE_LOCKING
      is enabled.  This patch modifies netfilter so that rhashtable.h itself
      can later make mutex_is_held optional depending on PROVE_LOCKING.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: Move mutex_is_held under PROVE_LOCKING · 97127566
      Committed by Herbert Xu
      The rhashtable function mutex_is_held is only used when PROVE_LOCKING
      is enabled.  This patch modifies netlink so that rhashtable.h itself
      can later make mutex_is_held optional depending on PROVE_LOCKING.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
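
      The end state both PROVE_LOCKING commits prepare for: the callback
      only exists when lockdep can consume it. A sketch of the conditional
      pattern (names illustrative):

          #ifdef CONFIG_PROVE_LOCKING
          static int table_lock_is_held(void *parent)
          {
                  return lockdep_is_held(&table_lock); /* illustrative lock */
          }
          #endif

          static struct rhashtable_params params = {
                  /* ... */
          #ifdef CONFIG_PROVE_LOCKING
                  .mutex_is_held = table_lock_is_held,
          #endif
          };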
    • net: generic dev_disable_lro() stacked device handling · fbe168ba
      Committed by Michal Kubeček
      Large receive offloading is known to cause problems if received packets
      are passed on to another host. Therefore the kernel disables it by calling
      dev_disable_lro() whenever a network device is enslaved in a bridge or
      forwarding is enabled for it (or globally). For virtual devices we need
      to disable LRO on the underlying physical device (which is actually
      receiving the packets).

      Current dev_disable_lro() code handles this propagation for a vlan
      (including 802.1ad nested vlan), macvlan or a vlan on top of a macvlan.
      It doesn't handle other stacked devices and their combinations, in
      particular propagation from a bond to its slaves, which often causes
      problems in virtualization setups.
      
      As we now have generic data structures describing the upper-lower device
      relationship, dev_disable_lro() can be generalized to disable LRO also
      for all lower devices (if any) once it is disabled for the device
      itself.
      
      For bonding and teaming devices, it is necessary to disable LRO not only
      on current slaves at the moment when dev_disable_lro() is called but
      also on any slave (port) added later.
      
      v2: use lower device links for all devices (including vlan and macvlan)
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
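
      With the generic adjacency lists, the propagation reduces to a
      recursive walk over lower devices; a hedged sketch close to what the
      commit describes (netdev_for_each_lower_dev() is the real iterator):

          static void __dev_disable_lro(struct net_device *dev)
          {
                  struct net_device *lower_dev;
                  struct list_head *iter;

                  dev->wanted_features &= ~NETIF_F_LRO;
                  netdev_update_features(dev);

                  /* recurse into bond slaves, vlan reals, macvlan lowers... */
                  netdev_for_each_lower_dev(dev, lower_dev, iter)
                          __dev_disable_lro(lower_dev);
          }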
    • FOU: Fix no return statement warning for !CONFIG_NET_FOU_IP_TUNNELS · 882288c0
      Committed by Thomas Graf
      net/ipv4/fou.c: In function ‘ip_tunnel_encap_del_fou_ops’:
      net/ipv4/fou.c:861:1: warning: no return statement in function returning non-void [-Wreturn-type]
      
      Fixes: a8c5f90f ("ip_tunnel: Ops registration for secondary encap (fou, gue)")
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
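
      The warning comes from a config stub falling off the end of a
      non-void function; returning a value silences it. A sketch of the
      shape only (the exact upstream stub may differ):

          #else /* !CONFIG_NET_FOU_IP_TUNNELS */
          static int ip_tunnel_encap_del_fou_ops(void)
          {
                  return 0; /* the missing statement behind -Wreturn-type */
          }
          #endif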
    • libceph: change from BUG to WARN for __remove_osd() asserts · cc9f1f51
      Committed by Ilya Dryomov
      No reason to use BUG_ON for osd request list assertions.
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
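
      The change trades a hard stop for a diagnostic; a sketch on one of
      the asserts (list names taken from the neighbouring commits):

          /* before: an inconsistency panics the kernel */
          BUG_ON(!list_empty(&osd->o_requests));

          /* after: log a stack trace and keep running */
          WARN_ON(!list_empty(&osd->o_requests));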
    • libceph: clear r_req_lru_item in __unregister_linger_request() · ba9d114e
      Committed by Ilya Dryomov
      kick_requests() can put linger requests on the notarget list.  This
      means we need to clear the much-overloaded req->r_req_lru_item in
      __unregister_linger_request() as well, or we get an assertion failure
      in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).
      
      AFAICT the assumption was that registered linger requests cannot be on
      any of req->r_req_lru_item lists, but that's clearly not the case.
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
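
      A hedged sketch of the fix pattern: list_del_init() leaves the node
      self-linked, so the list_empty() assertion in
      ceph_osdc_release_request() holds later (placement within the real
      function is approximate):

          static void __unregister_linger_request(struct ceph_osd_client *osdc,
                                                  struct ceph_osd_request *req)
          {
                  /* ... existing unlinking of linger state ... */

                  /* kick_requests() may have parked us on the notarget
                   * list; unlink and re-initialize the node.
                   */
                  list_del_init(&req->r_req_lru_item);
          }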
    • libceph: unlink from o_linger_requests when clearing r_osd · a390de02
      Committed by Ilya Dryomov
      Requests have to be unlinked from both osd->o_requests (normal
      requests) and osd->o_linger_requests (linger requests) lists when
      clearing req->r_osd.  Otherwise __unregister_linger_request() gets
      confused and we trip over a !list_empty(&osd->o_linger_requests)
      assert in __remove_osd().
      
      MON=1 OSD=1:
      
          # cat remove-osd.sh
          #!/bin/bash
          rbd create --size 1 test
          DEV=$(rbd map test)
          ceph osd out 0
          sleep 3
          rbd map dne/dne # obtain a new osdmap as a side effect
          rbd unmap $DEV & # will block
          sleep 3
          ceph osd in 0
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
    • libceph: do not crash on large auth tickets · aaef3170
      Committed by Ilya Dryomov
      Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
      tickets will have their buffers vmalloc'ed, which leads to the
      following crash in crypto:
      
      [   28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
      [   28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
      [   28.686032] PGD 0
      [   28.688088] Oops: 0000 [#1] PREEMPT SMP
      [   28.688088] Modules linked in:
      [   28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
      [   28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   28.688088] Workqueue: ceph-msgr con_work
      [   28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
      [   28.688088] RIP: 0010:[<ffffffff81392b42>]  [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
      [   28.688088] RSP: 0018:ffff8800d903f688  EFLAGS: 00010286
      [   28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
      [   28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
      [   28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
      [   28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
      [   28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
      [   28.688088] FS:  00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
      [   28.688088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
      [   28.688088] Stack:
      [   28.688088]  ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
      [   28.688088]  ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
      [   28.688088]  ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
      [   28.688088] Call Trace:
      [   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
      [   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
      [   28.688088]  [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220
      [   28.688088]  [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180
      [   28.688088]  [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30
      [   28.688088]  [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0
      [   28.688088]  [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0
      [   28.688088]  [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60
      [   28.688088]  [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0
      [   28.688088]  [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360
      [   28.688088]  [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0
      [   28.688088]  [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80
      [   28.688088]  [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0
      [   28.688088]  [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80
      [   28.688088]  [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0
      [   28.688088]  [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0
      [   28.688088]  [<ffffffff81559289>] try_read+0x1e59/0x1f10
      
      This is because we set up crypto scatterlists as if all buffers were
      kmalloc'ed.  Fix it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
      Reviewed-by: Sage Weil <sage@redhat.com>
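
      The crash follows from building scatterlists as if virt_to_page()
      worked on every buffer, which vmalloc'ed memory breaks. A hedged
      sketch of a page-aware setup (one page at a time; a real version must
      chunk buffers that cross page boundaries):

          static void sg_set_kernel_buf(struct scatterlist *sg,
                                        const void *buf, unsigned int len)
          {
                  struct page *page = is_vmalloc_addr(buf) ?
                          vmalloc_to_page(buf) : virt_to_page(buf);

                  /* assumes buf + len stays within one page */
                  sg_set_page(sg, page, len, offset_in_page(buf));
          }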
  6. 13 Nov 2014, 5 commits