1. 05 Nov, 2014 1 commit
    • F
      syncookies: split cookie_check_timestamp() into two functions · f1673381
      Florian Westphal committed
      The function cookie_check_timestamp(), called from both IPv4 and IPv6
      context, is used to decode the echoed timestamp from the SYN/ACK into TCP
      options used for follow-up communication with the peer.
      
      We can remove ECN handling from that function, split it into a separate
      one, and simply rename the original function to cookie_decode_options().
      cookie_decode_options() just fills in the tcp_option struct based on the
      echoed timestamp received from the peer. Anything that fails in this
      function will actually discard the request socket.
      
      While this is the natural place for decoding options such as ECN, which
      commit 172d69e6 ("syncookies: add support for ECN") added, we argue
      that ECN handling in particular can be checked at a later point in
      time: a failed ECN check does not require dropping the request sock,
      it only requires turning ECN support off.
      
      Therefore, we split this functionality into cookie_ecn_ok(), which tells
      us if the timestamp indicates ECN support AND the tcp_ecn sysctl is enabled.
      
      This prepares for per-route ECN support: just looking at the tcp_ecn sysctl
      won't be enough anymore at that point; if the timestamp indicates ECN
      and sysctl tcp_ecn == 0, we will also need to check the ECN dst metric.
      
      This would mean adding a route lookup to cookie_check_timestamp(), which
      we definitely want to avoid. As we already do a route lookup at a later
      point in cookie_{v4,v6}_check(), we can simply make use of that as well
      for the new cookie_ecn_ok() function without any additional cost.
      
      Joint work with Daniel Borkmann.
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
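The split described in the commit above can be illustrated with a small userspace sketch. It is not the kernel code: the `sketch_` names, the trimmed-down structs, and the bit layout of the echoed timestamp are assumptions made for illustration. The point it shows is that option decoding may reject the request, whereas the ECN check only reports a yes/no answer that the caller can act on later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the option bits echoed back in the timestamp. */
#define SKETCH_TS_OPT_WSCALE_MASK 0xfu
#define SKETCH_TS_OPT_SACK        (1u << 4)
#define SKETCH_TS_OPT_ECN         (1u << 5)

/* Hypothetical stand-ins for the relevant sysctls. */
struct sketch_sysctls {
    bool tcp_sack;
    bool tcp_window_scaling;
    bool tcp_ecn;
};

/* Hypothetical, trimmed-down view of the received TCP options. */
struct sketch_tcp_options {
    bool     saw_tstamp;
    uint32_t rcv_tsecr;   /* echoed timestamp carrying the option bits */
    bool     sack_ok;
    bool     wscale_ok;
    uint8_t  snd_wscale;
};

/* Fill in the options; a false return means the request is discarded. */
bool sketch_cookie_decode_options(struct sketch_tcp_options *opt,
                                  const struct sketch_sysctls *sc)
{
    assert(opt != NULL && sc != NULL);

    opt->sack_ok = false;
    opt->wscale_ok = false;
    opt->snd_wscale = 0;

    if (!opt->saw_tstamp)
        return true;            /* nothing was encoded, keep defaults */

    opt->sack_ok = (opt->rcv_tsecr & SKETCH_TS_OPT_SACK) != 0;
    if (opt->sack_ok && !sc->tcp_sack)
        return false;           /* peer echoes an option we would not offer */

    uint32_t wscale = opt->rcv_tsecr & SKETCH_TS_OPT_WSCALE_MASK;
    if (wscale == SKETCH_TS_OPT_WSCALE_MASK)
        return true;            /* all-ones is taken to mean "no scaling" */

    opt->wscale_ok = true;
    opt->snd_wscale = (uint8_t)wscale;
    return sc->tcp_window_scaling;
}

/* Never rejects the request; only reports whether ECN may be used. */
bool sketch_cookie_ecn_ok(const struct sketch_tcp_options *opt,
                          const struct sketch_sysctls *sc)
{
    assert(opt != NULL && sc != NULL);
    return opt->saw_tstamp &&
           (opt->rcv_tsecr & SKETCH_TS_OPT_ECN) != 0 &&
           sc->tcp_ecn;
}
```

A caller would run the decode step first and drop the request on failure, then consult the ECN helper after its route lookup and simply leave ECN off when it returns false.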
  2. 04 Nov, 2014 3 commits
  3. 31 Oct, 2014 7 commits
  4. 30 Oct, 2014 8 commits
    • J
      mm: memcontrol: fix missed end-writeback page accounting · d7365e78
      Johannes Weiner committed
      Commit 0a31bc97 ("mm: memcontrol: rewrite uncharge API") changed
      page migration to uncharge the old page right away.  The page is locked,
      unmapped, truncated, and off the LRU, but it could race with writeback
      ending, which then doesn't unaccount the page properly:
      
      test_clear_page_writeback()              migration
                                                 wait_on_page_writeback()
        TestClearPageWriteback()
                                                 mem_cgroup_migrate()
                                                   clear PCG_USED
        mem_cgroup_update_page_stat()
          if (PageCgroupUsed(pc))
            decrease memcg pages under writeback
      
        release pc->mem_cgroup->move_lock
      
      The per-page statistics interface is heavily optimized to avoid a
      function call and a lookup_page_cgroup() in the file unmap fast path,
      which means it doesn't verify whether a page is still charged before
      clearing PageWriteback() and it has to do it in the stat update later.
      
      Rework it so that it looks up the page's memcg once at the beginning of
      the transaction and then uses it throughout.  The charge will be
      verified before clearing PageWriteback() and migration can't uncharge
      the page as long as that is still set.  The RCU lock will protect the
      memcg past uncharge.
      
      As far as losing the optimization goes, the following test results are
      from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
      three times in a nested fashion, so that there are two negative passes
      that don't account but still go through the new transaction overhead.
      There is no actual difference:
      
       old:     33.195102545 seconds time elapsed       ( +-  0.01% )
       new:     33.199231369 seconds time elapsed       ( +-  0.03% )
      
      The time spent in page_remove_rmap()'s callees still adds up to the
      same, but the time spent in the function itself seems reduced:
      
           # Children      Self  Command        Shared Object       Symbol
       old:     0.12%     0.11%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap
       new:     0.12%     0.08%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: <stable@vger.kernel.org>	[3.17.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      mm: page-writeback: inline account_page_dirtied() into single caller · 3a3c02ec
      Johannes Weiner committed
      A follow-up patch would have changed the call signature.  To save the
      trouble, just fold it instead.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: <stable@vger.kernel.org>	[3.17.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      mm, thp: fix collapsing of hugepages on madvise · 6d50e60c
      David Rientjes committed
      If an anonymous mapping is not allowed to fault thp memory and then
      madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
      collapse this memory into thp memory.
      
      This occurs because the madvise(2) handler for thp, hugepage_madvise(),
      clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
      until the final action of madvise_behavior().  This causes
      khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
      the vma had previously had VM_NOHUGEPAGE set.
      
      Fix this by passing the correct vma flags to the khugepaged mm slot
      handler.  There's no chance khugepaged can run on this vma until after
      madvise_behavior() returns since we hold mm->mmap_sem.
      
      It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
      in hugepage_madvise(), but I didn't want to introduce special case
      behavior into madvise_behavior().  I think it's best to just let it
      always set vma->vm_flags itself.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Suleiman Souhlal <suleiman@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      drivers: of: add return value to of_reserved_mem_device_init() · 47f29df7
      Marek Szyprowski committed
      A driver calling of_reserved_mem_device_init() might want to know whether
      the initialization succeeded, so add support for returning an
      error code.
      
      This fixes a build warning caused by commit 7bfa5ab6 ("drivers:
      dma-coherent: add initialization from device tree"), which has been
      merged without this change and without fixing the function return value.
      
      Fixes: 7bfa5ab6 ("drivers: dma-coherent: add initialization from device tree")
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Josh Cartwright <joshc@codeaurora.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • N
      neigh: optimize neigh_parms_release() · 75fbfd33
      Nicolas Dichtel committed
      In neigh_parms_release() we loop over all entries to find the entry given
      as argument so that we can remove it from the list. By using a doubly
      linked list, we can avoid this loop.
      
      Here are some numbers with 30 000 dummy interfaces configured:
      
      Before the patch:
      $ time rmmod dummy
      real	2m0.118s
      user	0m0.000s
      sys	1m50.048s
      
      After the patch:
      $ time rmmod dummy
      real	1m9.970s
      user	0m0.000s
      sys	0m47.976s
      Suggested-by: Thierry Herbelot <thierry.herbelot@6wind.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
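The idea behind the commit above, removing an entry without first searching for it, can be sketched with a minimal circular doubly linked list. This is a generic userspace illustration with hypothetical `sketch_` names, not the kernel's list implementation or the actual neighbour code.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (hypothetical names). */
struct sketch_list {
    struct sketch_list *prev;
    struct sketch_list *next;
};

/* Hypothetical stand-in for a per-device parameter block. */
struct sketch_parms {
    int ifindex;
    struct sketch_list list;
};

/* An empty list is a head that points at itself. */
void sketch_list_init(struct sketch_list *head)
{
    head->prev = head;
    head->next = head;
}

/* Link node right after head. */
void sketch_list_add(struct sketch_list *node, struct sketch_list *head)
{
    node->next = head->next;
    node->prev = head;
    head->next->prev = node;
    head->next = node;
}

/* Unlink node in O(1): the node knows both neighbours, so no walk is needed. */
void sketch_list_del(struct sketch_list *node)
{
    assert(node->prev != NULL && node->next != NULL);
    node->prev->next = node->next;
    node->next->prev = node->prev;
    node->prev = node;
    node->next = node;
}

/* Count entries; only used to observe the list, not for removal. */
size_t sketch_list_len(const struct sketch_list *head)
{
    size_t n = 0;
    for (const struct sketch_list *p = head->next; p != head; p = p->next)
        n++;
    return n;
}
```

With a singly linked list, removing one entry means walking from the head until the predecessor is found, so releasing N entries one at a time costs on the order of N squared steps. The back pointer removes that walk.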
    • E
      net: introduce napi_schedule_irqoff() · bc9ad166
      Eric Dumazet committed
      napi_schedule() can be called from any context and has to mask hard
      irqs.
      
      Add a variant that can only be called from hard interrupt handlers
      or when irqs are already masked.
      
      Many NIC drivers can use it from their hard IRQ handler instead of
      the generic variant.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net: ipv6: Add a sysctl to make optimistic addresses useful candidates · 7fd2561e
      Erik Kline committed
      Add a sysctl that causes an interface's optimistic addresses
      to be considered equivalent to other non-deprecated addresses
      for source address selection purposes.  Preferred addresses
      will still take precedence over optimistic addresses, subject
      to other ranking in the source address selection algorithm.
      
      This is useful where different interfaces are connected to
      different networks from different ISPs (e.g., a cell network
      and a home wifi network).
      
      The current behaviour complies with RFC 3484/6724, and it
      makes sense if the host has only one interface, or has
      multiple interfaces on the same network (same or cooperating
      administrative domain(s)), but not in the multiple distinct
      networks case.
      
      For example, if a mobile device has an IPv6 address on an LTE
      network and then connects to IPv6-enabled wifi, while the wifi
      IPv6 address is undergoing DAD, IPv6 connections will try to use
      the wifi default route with the LTE IPv6 address, and will get
      stuck until they time out.
      
      Also, because optimistic nodes can receive frames, issue
      an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMISTIC
      flag appropriately set).  A second RTM_NEWADDR is sent if DAD
      completes (the address flags have changed), otherwise an
      RTM_DELADDR is sent.
      
      Also: add an entry in ip-sysctl.txt for optimistic_dad.
      Signed-off-by: Erik Kline <ek@google.com>
      Acked-by: Lorenzo Colitti <lorenzo@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      tcp: allow for bigger reordering level · dca145ff
      Eric Dumazet committed
      While testing Yaogong's upcoming patch (converting the out of order queue
      into an RB tree), I hit the max reordering level of the Linux TCP stack.
      
      Reordering level was limited to 127 for no good reason, and some
      network setups [1] can easily reach this limit and get limited
      throughput.
      
      Allow a new max limit of 300, and add a sysctl so that admins can set
      an even bigger (or lower) value if needed.
      
      [1] Aggregation of links, per packet load balancing, fabrics not doing
       deep packet inspections, alternative TCP congestion modules...
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yaogong Wang <wygivan@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
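A configurable ceiling of the kind described in the commit above might look roughly like the following userspace sketch. The variable and function names are hypothetical and the update rule is deliberately simplified; only the default of 300 is taken from the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for the new sysctl; 300 is the default named above. */
unsigned int sketch_tcp_max_reordering = 300;

/*
 * Raise a connection's reordering estimate to the newly observed value,
 * bounded by the configurable limit. The estimate is never lowered here.
 */
unsigned int sketch_update_reordering(unsigned int current,
                                      unsigned int observed)
{
    assert(sketch_tcp_max_reordering > 0);

    if (observed <= current)
        return current;

    return observed < sketch_tcp_max_reordering ? observed
                                                : sketch_tcp_max_reordering;
}
```

Under the old fixed limit of 127, an observed reordering distance of 200 would have been capped. In this sketch it passes through unchanged, and anything above the configured ceiling is clamped to it.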
  5. 29 Oct, 2014 11 commits
  6. 28 Oct, 2014 5 commits
  7. 25 Oct, 2014 3 commits
  8. 24 Oct, 2014 2 commits
    • W
      kvm: vfio: fix unregister kvm_device_ops of vfio · 571ee1b6
      Wanpeng Li committed
      After commit 80ce1639 (KVM: VFIO: register kvm_device_ops dynamically),
      kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd
      (kvm-vfio: do not use module_init) moved the dynamic registration into
      kvm_init in order to fix broken unloading of the kvm module. However,
      kvm_device_ops of vfio is not unregistered when the kvm-intel module is
      removed, which leads to a device type collision detection warning when
      the kvm-intel module is inserted again.
      
          WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
          Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
          CPU: 1 PID: 10358 Comm: insmod Tainted: G        W  O   3.17.0-rc1 #2
          Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
           0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
           0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
           ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
          Call Trace:
           [<ffffffff814a61d9>] dump_stack+0x49/0x60
           [<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96
           [<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm]
           [<ffffffff810417e6>] warn_slowpath_null+0x15/0x17
           [<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm]
           [<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel]
           [<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
           [<ffffffff810002ab>] do_one_initcall+0xe3/0x170
           [<ffffffff811168a9>] ? __vunmap+0xad/0xb8
           [<ffffffff8109c58f>] do_init_module+0x2b/0x174
           [<ffffffff8109d414>] load_module+0x43e/0x569
           [<ffffffff8109c6d8>] ? do_init_module+0x174/0x174
           [<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82
           [<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20
           [<ffffffff8109d65f>] SyS_init_module+0x54/0x81
           [<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b
          ---[ end trace 0626f4a3ddea56f3 ]---
      
      The bug can be reproduced by:
      
          rmmod kvm_intel.ko
          insmod kvm_intel.ko
      
      without an rmmod/insmod of kvm.ko.
      
      This patch fixes the bug by unregistering kvm_device_ops of vfio when the
      kvm-intel module is removed.
      Reported-by: Liu Rongrong <rongrongx.liu@intel.com>
      Fixes: 3c3c29fd
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
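The collision described in the commit above can be reproduced in miniature with a table of registered ops indexed by device type. This is a userspace sketch with hypothetical `sketch_` names and an assumed table size, not the KVM code: registering a type that is still occupied fails, so a reload only works if the previous unload cleared the slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_DEV_TYPE_MAX 8   /* assumed table size for this sketch */

/* Hypothetical stand-in for a device ops structure. */
struct sketch_device_ops {
    const char *name;
};

static const struct sketch_device_ops *sketch_ops_table[SKETCH_DEV_TYPE_MAX];

/* Claim the slot for a device type; fails if the slot is still occupied. */
int sketch_register_device_ops(const struct sketch_device_ops *ops,
                               unsigned int type)
{
    assert(ops != NULL);

    if (type >= SKETCH_DEV_TYPE_MAX)
        return -ENOSPC;
    if (sketch_ops_table[type] != NULL)
        return -EEXIST;         /* the collision a stale entry would cause */

    sketch_ops_table[type] = ops;
    return 0;
}

/* Release the slot so that a later registration of the same type succeeds. */
void sketch_unregister_device_ops(unsigned int type)
{
    if (type < SKETCH_DEV_TYPE_MAX)
        sketch_ops_table[type] = NULL;
}
```

Calling the register function twice for the same type without an unregister in between returns `-EEXIST`, which corresponds to the situation that triggers the warning on module reload.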
    • M
      Revert "[media] v4l2-dv-timings: fix a sparse warning" · 607ec6a5
      Mauro Carvalho Chehab committed
      Sparse got a fix for that. Also, it is suspected that the patch being
      reverted might cause compilation breakages in userspace. So,
      revert it.
      
      This reverts commit 5c2cacc1.
      Requested-by: Hans Verkuil <hverkuil@xs4all.nl>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>