1. 15 6月, 2017 1 次提交
    • W
      cgroup: Keep accurate count of tasks in each css_set · 73a7242a
      Waiman Long 提交于
      The reference count in the css_set data structure was used as a
      proxy of the number of tasks attached to that css_set. However, that
      count is actually not an accurate measure especially with thread mode
      support. So a new variable nr_tasks is added to the css_set to keep
      track of the actual task count. This new variable is protected by
      the css_set_lock. Functions that require the actual task count are
      updated to use the new variable.
      
      tj: s/task_count/nr_tasks/ for consistency with cgroup_root->nr_cgrps.
          Refreshed on top of cgroup/for-v4.13 which dropped on
          css_set_populated() -> nr_tasks conversion.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      73a7242a
  2. 12 6月, 2017 1 次提交
  3. 09 6月, 2017 1 次提交
    • E
      KEYS: sanitize key structs before freeing · 0620fddb
      Eric Biggers 提交于
      While a 'struct key' itself normally does not contain sensitive
      information, Documentation/security/keys.txt actually encourages this:
      
           "Having a payload is not required; and the payload can, in fact,
           just be a value stored in the struct key itself."
      
      In case someone has taken this advice, or will take this advice in the
      future, zero the key structure before freeing it.  We might as well, and
      as a bonus this could make it a bit more difficult for an adversary to
      determine which keys have recently been in use.
      
      This is safe because the key_jar cache does not use a constructor.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJames Morris <james.l.morris@oracle.com>
      0620fddb
  4. 08 6月, 2017 1 次提交
    • P
      srcu: Allow use of Classic SRCU from both process and interrupt context · 1123a604
      Paolo Bonzini 提交于
      Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
      down a guest running iperf on a VFIO assigned device.  This happens
      because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
      context, while a worker thread does the same inside kvm_set_irq().  If the
      interrupt happens while the worker thread is executing __srcu_read_lock(),
      updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
      ->srcu_lock_count[] field can be lost.
      
      The docs say you are not supposed to call srcu_read_lock() and
      srcu_read_unlock() from irq context, but KVM interrupt injection happens
      from (host) interrupt context and it would be nice if SRCU supported the
      use case.  KVM is using SRCU here not really for the "sleepable" part,
      but rather due to its IPI-free fast detection of grace periods.  It is
      therefore not desirable to switch back to RCU, which would effectively
      revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
      2014-01-16).
      
      However, the docs are overly conservative.  You can have an SRCU instance
      only has users in irq context, and you can mix process and irq context
      as long as process context users disable interrupts.  In addition,
      __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
      Classic SRCU.  For those two implementations, only srcu_read_lock()
      is unsafe.
      
      When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
      in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
      this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
      Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
      the caller.  Tree SRCU however only does one increment, so on most
      architectures it is more efficient for __srcu_read_lock() to use
      this_cpu_inc(), and any performance differences appear to be down in
      the noise.
      
      Cc: stable@vger.kernel.org
      Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
      Reported-by: NLinu Cherian <linuc.decode@gmail.com>
      Suggested-by: NLinu Cherian <linuc.decode@gmail.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1123a604
  5. 07 6月, 2017 3 次提交
    • R
      Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle" · f3b7eaae
      Rafael J. Wysocki 提交于
      Revert commit eed4d47e (ACPI / sleep: Ignore spurious SCI wakeups
      from suspend-to-idle) as it turned out to be premature and triggered
      a number of different issues on various systems.
      
      That includes, but is not limited to, premature suspend-to-RAM aborts
      on Dell XPS 13 (9343) reported by Dominik.
      
      The issue the commit in question attempted to address is real and
      will need to be taken care of going forward, but evidently more work
      is needed for this purpose.
      Reported-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f3b7eaae
    • D
      compiler, clang: suppress warning for unused static inline functions · abb2ea7d
      David Rientjes 提交于
      GCC explicitly does not warn for unused static inline functions for
      -Wunused-function.  The manual states:
      
      	Warn whenever a static function is declared but not defined or
      	a non-inline static function is unused.
      
      Clang does warn for static inline functions that are unused.
      
      It turns out that suppressing the warnings avoids potentially complex
      #ifdef directives, which also reduces LOC.
      
      Suppress the warning for clang.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      abb2ea7d
    • E
      elevator: fix truncation of icq_cache_name · 9bd2bbc0
      Eric Biggers 提交于
      gcc 7.1 reports the following warning:
      
          block/elevator.c: In function ‘elv_register’:
          block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
               "%s_io_cq", e->elevator_name);
               ^~~~~~~~~~
          block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21
             snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
               "%s_io_cq", e->elevator_name);
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      
      The bug is that the name of the icq_cache is 6 characters longer than
      the elevator name, but only ELV_NAME_MAX + 5 characters were reserved
      for it --- so in the case of a maximum-length elevator name, the 'q'
      character in "_io_cq" would be truncated by snprintf().  Fix it by
      reserving ELV_NAME_MAX + 6 characters instead.
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9bd2bbc0
  6. 05 6月, 2017 1 次提交
  7. 03 6月, 2017 4 次提交
    • M
      mm: consider memblock reservations for deferred memory initialization sizing · 864b9a39
      Michal Hocko 提交于
      We have seen an early OOM killer invocation on ppc64 systems with
      crashkernel=4096M:
      
      	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
      	kthreadd cpuset=/ mems_allowed=7
      	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
      	Call Trace:
      	  dump_stack+0xb0/0xf0 (unreliable)
      	  dump_header+0xb0/0x258
      	  out_of_memory+0x5f0/0x640
      	  __alloc_pages_nodemask+0xa8c/0xc80
      	  kmem_getpages+0x84/0x1a0
      	  fallback_alloc+0x2a4/0x320
      	  kmem_cache_alloc_node+0xc0/0x2e0
      	  copy_process.isra.25+0x260/0x1b30
      	  _do_fork+0x94/0x470
      	  kernel_thread+0x48/0x60
      	  kthreadd+0x264/0x330
      	  ret_from_kernel_thread+0x5c/0xa4
      
      	Mem-Info:
      	active_anon:0 inactive_anon:0 isolated_anon:0
      	 active_file:0 inactive_file:0 isolated_file:0
      	 unevictable:0 dirty:0 writeback:0 unstable:0
      	 slab_reclaimable:5 slab_unreclaimable:73
      	 mapped:0 shmem:0 pagetables:0 bounce:0
      	 free:0 free_pcp:0 free_cma:0
      	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      	lowmem_reserve[]: 0 0 0 0
      	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
      	0 total pagecache pages
      	0 pages in swap cache
      	Swap cache stats: add 0, delete 0, find 0/0
      	Free swap  = 0kB
      	Total swap = 0kB
      	819200 pages RAM
      	0 pages HighMem/MovableOnly
      	817481 pages reserved
      	0 pages cma reserved
      	0 pages hwpoisoned
      
      the reason is that the managed memory is too low (only 110MB) while the
      rest of the the 50GB is still waiting for the deferred intialization to
      be done.  update_defer_init estimates the initial memoty to initialize
      to 2GB at least but it doesn't consider any memory allocated in that
      range.  In this particular case we've had
      
      	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
      
      so the low 2GB is mostly depleted.
      
      Fix this by considering memblock allocations in the initial static
      initialization estimation.  Move the max_initialise to
      reset_deferred_meminit and implement a simple memblock_reserved_memory
      helper which iterates all reserved blocks and sums the size of all that
      start below the given address.  The cumulative size is than added on top
      of the initial estimation.  This is still not ideal because
      reset_deferred_meminit doesn't consider holes and so reservation might
      be above the initial estimation whihch we ignore but let's make the
      logic simpler until we really need to handle more complicated cases.
      
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      864b9a39
    • J
      mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified · 9a291a7c
      James Morse 提交于
      KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
      FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
      finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
      special case.  (check_user_page_hwpoison())
      
      When huge pages are involved, this doesn't work so well.
      get_user_pages() calls follow_hugetlb_page(), which stops early if it
      receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
      -EFAULT to the caller.  The step to map this to -EHWPOISON based on the
      FOLL_ flags is missing.  The hwpoison special case is skipped, and
      -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.
      
      Instead, move this VM_FAULT_ to errno mapping code into a header file
      and use it from faultin_page() and follow_hugetlb_page().
      
      With this, KVM works as expected.
      
      This isn't a problem for arm64 today as we haven't enabled
      MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
      too, so I think this should be a fix.  This doesn't apply earlier than
      stable's v4.11.1 due to all sorts of cleanup.
      
      [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
      suggested.
        Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.comSigned-off-by: NJames Morse <james.morse@arm.com>
      Acked-by: NPunit Agrawal <punit.agrawal@arm.com>
      Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.11.1+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a291a7c
    • M
      frv: declare jiffies to be located in the .data section · 60b0a8c3
      Matthias Kaehlcke 提交于
      Commit 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with
      ____cacheline_aligned_in_smp") removed a section specification from the
      jiffies declaration that caused conflicts on some platforms.
      
      Unfortunately this change broke the build for frv:
      
        kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
            symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
        ...
      
      Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
      include the section specification.  For all other platforms
      __jiffy_arch_data (currently) has no effect.
      
      Fixes: 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
      Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.orgSigned-off-by: NMatthias Kaehlcke <mka@chromium.org>
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: NDavid Howells <dhowells@redhat.com>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60b0a8c3
    • M
      include/linux/gfp.h: fix ___GFP_NOLOCKDEP value · 1bde33e0
      Michal Hocko 提交于
      Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit.  At
      the time commit 7e784422 ("lockdep: allow to disable reclaim lockup
      detection") was written we still had __GFP_OTHER_NODE but I have removed
      it in commit 41b6167e ("mm: get rid of __GFP_OTHER_NODE") and forgot
      to lower the bit value.
      
      The current value is outside of __GFP_BITS_SHIFT so it cannot be used
      actually.
      
      Fixes: 7e784422 ("lockdep: allow to disable reclaim lockup detection")
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NIgor Stoppa <igor.stoppa@nokia.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1bde33e0
  8. 02 6月, 2017 1 次提交
  9. 30 5月, 2017 1 次提交
    • A
      iommu/dma: Fix function declaration · 4b1c8898
      Arnd Bergmann 提交于
      Newly added code in the ipmmu-vmsa driver showed a small mistake
      in a header file that can't be included by itself without CONFIG_IOMMU_DMA
      enabled:
      
      In file included from drivers/iommu/ipmmu-vmsa.c:13:0:
      include/linux/dma-iommu.h:105:94: error: 'struct device' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
      
      This adds a forward declaration for 'struct device', similar to how
      we treat the other struct types in this case.
      
      Fixes: 3ae47292 ("iommu/ipmmu-vmsa: Add new IOMMU_DOMAIN_DMA ops")
      Fixes: 273df963 ("iommu/dma: Make PCI window reservation generic")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NRobin Murphy <robin.murphy@arm.com>
      Signed-off-by: NJoerg Roedel <jroedel@suse.de>
      4b1c8898
  10. 26 5月, 2017 2 次提交
  11. 25 5月, 2017 3 次提交
    • T
      ext4: fix quota charging for shared xattr blocks · b8cb5a54
      Tahsin Erdogan 提交于
      ext4_xattr_block_set() calls dquot_alloc_block() to charge for an xattr
      block when new references are made. However if dquot_initialize() hasn't
      been called on an inode, request for charging is effectively ignored
      because ext4_inode_info->i_dquot is not initialized yet.
      
      Add dquot_initialize() to call paths that lead to ext4_xattr_block_set().
      Signed-off-by: NTahsin Erdogan <tahsin@google.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      b8cb5a54
    • V
      vlan: Fix tcp checksum offloads in Q-in-Q vlans · 35d2f80b
      Vlad Yasevich 提交于
      It appears that TCP checksum offloading has been broken for
      Q-in-Q vlans.  The behavior was execerbated by the
      series
          commit afb0bc97 ("Merge branch 'stacked_vlan_tso'")
      that that enabled accleleration features on stacked vlans.
      
      However, event without that series, it is possible to trigger
      this issue.  It just requires a lot more specialized configuration.
      
      The root cause is the interaction between how
      netdev_intersect_features() works, the features actually set on
      the vlan devices and HW having the ability to run checksum with
      longer headers.
      
      The issue starts when netdev_interesect_features() replaces
      NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
      if the HW advertises IP|IPV6 specific checksums.  This happens
      for tagged and multi-tagged packets.   However, HW that enables
      IP|IPV6 checksum offloading doesn't gurantee that packets with
      arbitrarily long headers can be checksummed.
      
      This patch disables IP|IPV6 checksums on the packet for multi-tagged
      packets.
      
      CC: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35d2f80b
    • T
      cpuset: consider dying css as offline · 41c25707
      Tejun Heo 提交于
      In most cases, a cgroup controller don't care about the liftimes of
      cgroups.  For the controller, a css becomes online when ->css_online()
      is called on it and offline when ->css_offline() is called.
      
      However, cpuset is special in that the user interface it exposes cares
      whether certain cgroups exist or not.  Combined with the RCU delay
      between cgroup removal and css offlining, this can lead to user
      visible behavior oddities where operations which should succeed after
      cgroup removals fail for some time period.  The effects of cgroup
      removals are delayed when seen from userland.
      
      This patch adds css_is_dying() which tests whether offline is pending
      and updates is_cpuset_online() so that the function returns false also
      while offline is pending.  This gets rid of the userland visible
      delays.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
      Link: http://lkml.kernel.org/r/327ca1f5-7957-fbb9-9e5f-9ba149d40ba2@oracle.com
      Cc: stable@vger.kernel.org
      Signed-off-by: NTejun Heo <tj@kernel.org>
      41c25707
  12. 24 5月, 2017 3 次提交
  13. 23 5月, 2017 7 次提交
  14. 22 5月, 2017 2 次提交
  15. 21 5月, 2017 2 次提交
  16. 18 5月, 2017 5 次提交
  17. 17 5月, 2017 1 次提交
  18. 16 5月, 2017 1 次提交
    • G
      ebtables: arpreply: Add the standard target sanity check · c953d635
      Gao Feng 提交于
      The info->target comes from userspace and it would be used directly.
      So we need to add the sanity check to make sure it is a valid standard
      target, although the ebtables tool has already checked it. Kernel needs
      to validate anything coming from userspace.
      
      If the target is set as an evil value, it would break the ebtables
      and cause a panic. Because the non-standard target is treated as one
      offset.
      
      Now add one helper function ebt_invalid_target, and we would replace
      the macro INVALID_TARGET later.
      Signed-off-by: NGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c953d635