1. 31 8月, 2015 1 次提交
  2. 29 8月, 2015 2 次提交
  3. 22 8月, 2015 1 次提交
    • M
      mm: make page pfmemalloc check more robust · 2f064f34
      Michal Hocko 提交于
      Commit c48a11c7 ("netvm: propagate page->pfmemalloc to skb") added
      checks for page->pfmemalloc to __skb_fill_page_desc():
      
              if (page->pfmemalloc && !page->mapping)
                      skb->pfmemalloc = true;
      
      It assumes page->mapping == NULL implies that page->pfmemalloc can be
      trusted.  However, __delete_from_page_cache() can set set page->mapping
      to NULL and leave page->index value alone.  Due to being in union, a
      non-zero page->index will be interpreted as true page->pfmemalloc.
      
      So the assumption is invalid if the networking code can see such a page.
      And it seems it can.  We have encountered this with a NFS over loopback
      setup when such a page is attached to a new skbuf.  There is no copying
      going on in this case so the page confuses __skb_fill_page_desc which
      interprets the index as pfmemalloc flag and the network stack drops
      packets that have been allocated using the reserves unless they are to
      be queued on sockets handling the swapping which is the case here and
      that leads to hangs when the nfs client waits for a response from the
      server which has been dropped and thus never arrive.
      
      The struct page is already heavily packed so rather than finding another
      hole to put it in, let's do a trick instead.  We can reuse the index
      again but define it to an impossible value (-1UL).  This is the page
      index so it should never see the value that large.  Replace all direct
      users of page->pfmemalloc by page_is_pfmemalloc which will hide this
      nastiness from unspoiled eyes.
      
      The information will get lost if somebody wants to use page->index
      obviously but that was the case before and the original code expected
      that the information should be persisted somewhere else if that is
      really needed (e.g.  what SLAB and SLUB do).
      
      [akpm@linux-foundation.org: fix blooper in slub]
      Fixes: c48a11c7 ("netvm: propagate page->pfmemalloc to skb")
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Debugged-by: NVlastimil Babka <vbabka@suse.com>
      Debugged-by: NJiri Bohac <jbohac@suse.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: <stable@vger.kernel.org>	[3.6+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f064f34
  4. 20 8月, 2015 1 次提交
  5. 11 8月, 2015 1 次提交
  6. 07 8月, 2015 2 次提交
    • N
      mm: check __PG_HWPOISON separately from PAGE_FLAGS_CHECK_AT_* · f4c18e6f
      Naoya Horiguchi 提交于
      The race condition addressed in commit add05cec ("mm: soft-offline:
      don't free target page in successful page migration") was not closed
      completely, because that can happen not only for soft-offline, but also
      for hard-offline.  Consider that a slab page is about to be freed into
      buddy pool, and then an uncorrected memory error hits the page just
      after entering __free_one_page(), then VM_BUG_ON_PAGE(page->flags &
      PAGE_FLAGS_CHECK_AT_PREP) is triggered, despite the fact that it's not
      necessary because the data on the affected page is not consumed.
      
      To solve it, this patch drops __PG_HWPOISON from page flag checks at
      allocation/free time.  I think it's justified because __PG_HWPOISON
      flags is defined to prevent the page from being reused, and setting it
      outside the page's alloc-free cycle is a designed behavior (not a bug.)
      
      For recent months, I was annoyed about BUG_ON when soft-offlined page
      remains on lru cache list for a while, which is avoided by calling
      put_page() instead of putback_lru_page() in page migration's success
      path.  This means that this patch reverts a major change from commit
      add05cec about the new refcounting rule of soft-offlined pages, so
      "reuse window" revives.  This will be closed by a subsequent patch.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dean Nelson <dnelson@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f4c18e6f
    • M
      fs, file table: reinit files_stat.max_files after deferred memory initialisation · 4248b0da
      Mel Gorman 提交于
      Dave Hansen reported the following;
      
      	My laptop has been behaving strangely with 4.2-rc2.  Once I log
      	in to my X session, I start getting all kinds of strange errors
      	from applications and see this in my dmesg:
      
              	VFS: file-max limit 8192 reached
      
      The problem is that the file-max is calculated before memory is fully
      initialised and miscalculates how much memory the kernel is using.  This
      patch recalculates file-max after deferred memory initialisation.  Note
      that using memory hotplug infrastructure would not have avoided this
      problem as the value is not recalculated after memory hot-add.
      
      4.1:             files_stat.max_files = 6582781
      4.2-rc2:         files_stat.max_files = 8192
      4.2-rc2 patched: files_stat.max_files = 6562467
      
      Small differences with the patch applied and 4.1 but not enough to matter.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Alex Ng <alexng@microsoft.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4248b0da
  7. 04 8月, 2015 2 次提交
    • T
      Revert "libata: Implement NCQ autosense" · 74a80d67
      Tejun Heo 提交于
      This reverts commit 42b966fb.
      
      As implemented, ACS-4 sense reporting for ATA devices bypasses error
      diagnosis and handling in libata degrading EH behavior significantly.
      Revert the related changes for now.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org #v4.1+
      74a80d67
    • T
      Revert "libata: Implement support for sense data reporting" · 84ded2f8
      Tejun Heo 提交于
      This reverts commit fe7173c2.
      
      As implemented, ACS-4 sense reporting for ATA devices bypasses error
      diagnosis and handling in libata degrading EH behavior significantly.
      Revert the related changes for now.
      
      ATA_ID_COMMAND_SET_3/4 constants are not reverted as they're used by
      later changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org #v4.1+
      84ded2f8
  8. 28 7月, 2015 1 次提交
    • R
      cpufreq: Avoid attempts to create duplicate symbolic links · 559ed407
      Rafael J. Wysocki 提交于
      After commit 87549141 (cpufreq: Stop migrating sysfs files on
      hotplug) there is a problem with CPUs that share cpufreq policy
      objects with other CPUs and are initially offline.
      
      Say CPU1 shares a policy with CPU0 which is online and is registered
      first.  As part of the registration process, cpufreq_add_dev() is
      called for it.  It creates the policy object and a symbolic link
      to it from the CPU1's sysfs directory.  If CPU1 is registered
      subsequently and it is offline at that time, cpufreq_add_dev() will
      attempt to create a symbolic link to the policy object for it, but
      that link is present already, so a warning about that will be
      triggered.
      
      To avoid that warning, make cpufreq use an additional CPU mask
      containing related CPUs that are actually present for each policy
      object.  That mask is initialized when the policy object is populated
      after its creation (for the first online CPU using it) and it includes
      CPUs from the "policy CPUs" mask returned by the cpufreq driver's
      ->init() callback that are physically present at that time.  Symbolic
      links to the policy are created only for the CPUs in that mask.
      
      If cpufreq_add_dev() is invoked for an offline CPU, it checks the
      new mask and only creates the symlink if the CPU was not in it (the
      CPU is added to the mask at the same time).
      
      In turn, cpufreq_remove_dev() drops the given CPU from the new mask,
      removes its symlink to the policy object and returns, unless it is
      the CPU owning the policy object.  In that case, the policy object
      is moved to a new CPU's sysfs directory or deleted if the CPU being
      removed was the last user of the policy.
      
      While at it, notice that cpufreq_remove_dev() can't fail, because
      its return value is ignored, so make it ignore return values from
      __cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish()
      and prevent these functions from aborting on errors returned by
      __cpufreq_governor().  Also drop the now unused sif argument from
      them.
      
      Fixes: 87549141 (cpufreq: Stop migrating sysfs files on hotplug)
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reported-and-tested-by: NRussell King <linux@arm.linux.org.uk>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      559ed407
  9. 27 7月, 2015 2 次提交
  10. 25 7月, 2015 1 次提交
  11. 24 7月, 2015 1 次提交
  12. 23 7月, 2015 2 次提交
  13. 19 7月, 2015 1 次提交
    • S
      hid-sensor: Fix suspend/resume delay · 88cc7b4e
      Srinivas Pandruvada 提交于
      By default all the sensors are runtime suspended state (lowest power
      state). During Linux suspend process, all the run time suspended
      devices are resumed and then suspended. This caused all sensors to
      power up and introduced delay in suspend time, when we introduced
      runtime PM for HID sensors. The opposite process happens during resume
      process.
      
      To fix this, we do powerup process of the sensors only when the request
      is issued from user (raw or tiggerred). In this way when runtime,
      resume calls for powerup it will simply return as this will not match
      user requested state.
      
      Note this is a regression fix as the increase in suspend / resume
      times can be substantial (report of 8 seconds on Len's laptop!)
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Tested-by: NLen Brown <len.brown@intel.com>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: NJonathan Cameron <jic23@kernel.org>
      88cc7b4e
  14. 18 7月, 2015 6 次提交
  15. 16 7月, 2015 1 次提交
  16. 15 7月, 2015 3 次提交
    • D
      libata: add ATA_HORKAGE_MAX_SEC_1024 to revert back to previous max_sectors limit · af34d637
      David Milburn 提交于
      Since no longer limiting max_sectors to BLK_DEF_MAX_SECTORS (commit 34b48db6),
      data corruption may occur on ST380013AS drive configured on 82801JI (ICH10 Family)
      SATA controller. This patch will allow the driver to limit max_sectors as before
      
       # cat /sys/block/sdb/queue/max_sectors_kb
       512
      
      I was able to double the max_sectors_kb value up to 16384 on linux-4.2.0-rc2
      before seeing corruption, but seems safer to use previous limit. Without this
      patch max_sectors_kb will be 32767.
      
      tj: Minor comment update.
      Reported-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NDavid Milburn <dmilburn@redhat.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org # v3.19 and later
      Fixes: 34b48db6 ("block: remove artifical max_hw_sectors cap")
      af34d637
    • A
      libata: add ATA_HORKAGE_NOTRIM · 71d126fd
      Arne Fitzenreiter 提交于
      Some devices lose data on TRIM whether queued or not.  This patch adds
      a horkage to disable TRIM.
      
      tj: Collapsed unnecessary if() nesting.
      Signed-off-by: NArne Fitzenreiter <arne_f@ipfire.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      71d126fd
    • L
      efi: Handle memory error structures produced based on old versions of standard · 4c62360d
      Luck, Tony 提交于
      The memory error record structure includes as its first field a
      bitmask of which subsequent fields are valid. The allows new fields
      to be added to the structure while keeping compatibility with older
      software that parses these records. This mechanism was used between
      versions 2.2 and 2.3 to add four new fields, growing the size of the
      structure from 73 bytes to 80. But Linux just added all the new
      fields so this test:
      	if (gdata->error_data_length >= sizeof(*mem_err))
      		cper_print_mem(newpfx, mem_err);
      	else
      		goto err_section_too_small;
      now make Linux complain about old format records being too short.
      
      Add a definition for the old format of the structure and use that
      for the minimum size check. Pass the actual size to cper_print_mem()
      so it can sanity check the validation_bits field to ensure that if
      a BIOS using the old format sets bits as if it were new, we won't
      access fields beyond the end of the structure.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      4c62360d
  17. 13 7月, 2015 3 次提交
  18. 10 7月, 2015 5 次提交
    • P
      KVM: count number of assigned devices · 5544eb9b
      Paolo Bonzini 提交于
      If there are no assigned devices, the guest PAT are not providing
      any useful information and can be overridden to writeback; VMX
      always does this because it has the "IPAT" bit in its extended
      page table entries, but SVM does not have anything similar.
      Hook into VFIO and legacy device assignment so that they
      provide this information to KVM.
      Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
      Tested-by: NJoerg Roedel <jroedel@suse.de>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5544eb9b
    • E
      cdc_ncm: Add support for moving NDP to end of NCM frame · 4a0e3e98
      Enrico Mioso 提交于
      NCM specs are not actually mandating a specific position in the frame for
      the NDP (Network Datagram Pointer). However, some Huawei devices will
      ignore our aggregates if it is not placed after the datagrams it points
      to. Add support for doing just this, in a per-device configurable way.
      While at it, update NCM subdrivers, disabling this functionality in all of
      them, except in huawei_cdc_ncm where it is enabled instead.
      We aren't making any distinction between different Huawei NCM devices,
      based on what the vendor driver does. Standard NCM devices are left
      unaffected: if they are compliant, they should be always usable, still
      stay on the safe side.
      
      This change has been tested and working with a Huawei E3131 device (which
      works regardless of NDP position), a Huawei E3531 (also working both
      ways) and an E3372 (which mandates NDP to be after indexed datagrams).
      
      V1->V2:
      - corrected wrong NDP acronym definition
      - fixed possible NULL pointer dereference
      - patch cleanup
      V2->V3:
      - Properly account for the NDP size when writing new packets to SKB
      Signed-off-by: NEnrico Mioso <mrkiko.rs@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a0e3e98
    • T
      blkcg: fix blkcg_policy_data allocation bug · 06b285bd
      Tejun Heo 提交于
      e48453c3 ("block, cgroup: implement policy-specific per-blkcg
      data") updated per-blkcg policy data to be dynamically allocated.
      When a policy is registered, its policy data aren't created.  Instead,
      when the policy is activated on a queue, the policy data are allocated
      if there are blkg's (blkcg_gq's) which are attached to a given blkcg.
      This is buggy.  Consider the following scenario.
      
      1. A blkcg is created.  No blkg's attached yet.
      
      2. The policy is registered.  No policy data is allocated.
      
      3. The policy is activated on a queue.  As the above blkcg doesn't
         have any blkg's, it won't allocate the matching blkcg_policy_data.
      
      4. An IO is issued from the blkcg and blkg is created and the blkcg
         still doesn't have the matching policy data allocated.
      
      With cfq-iosched, this leads to an oops.
      
      It also doesn't free policy data on policy unregistration assuming
      that freeing of all policy data on blkcg destruction should take care
      of it; however, this also is incorrect.
      
      1. A blkcg has policy data.
      
      2. The policy gets unregistered but the policy data remains.
      
      3. Another policy gets registered on the same slot.
      
      4. Later, the new policy tries to allocate policy data on the previous
         blkcg but the slot is already occupied and gets skipped.  The
         policy ends up operating on the policy data of the previous policy.
      
      There's no reason to manage blkcg_policy_data lazily.  The reason we
      do lazy allocation of blkg's is that the number of all possible blkg's
      is the product of cgroups and block devices which can reach a
      surprising level.  blkcg_policy_data is contrained by the number of
      cgroups and shouldn't be a problem.
      
      This patch makes blkcg_policy_data to be allocated for all existing
      blkcg's on policy registration and freed on unregistration and removes
      blkcg_policy_data handling from policy [de]activation paths.  This
      makes that blkcg_policy_data are created and removed with the policy
      they belong to and fixes the above described problems.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Fixes: e48453c3 ("block, cgroup: implement policy-specific per-blkcg data")
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      06b285bd
    • T
      blkcg: implement all_blkcgs list · 7876f930
      Tejun Heo 提交于
      Add all_blkcgs list goes through blkcg->all_blkcgs_node and is
      protected by blkcg_pol_mutex.  This will be used to fix
      blkcg_policy_data allocation bug.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Arianna Avanzini <avanzini.arianna@gmail.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      7876f930
    • I
      libceph: enable ceph in a non-default network namespace · 757856d2
      Ilya Dryomov 提交于
      Grab a reference on a network namespace of the 'rbd map' (in case of
      rbd) or 'mount' (in case of ceph) process and use that to open sockets
      instead of always using init_net and bailing if network namespace is
      anything but init_net.  Be careful to not share struct ceph_client
      instances between different namespaces and don't add any code in the
      !CONFIG_NET_NS case.
      
      This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      757856d2
  19. 09 7月, 2015 1 次提交
  20. 08 7月, 2015 3 次提交
    • T
      hotplug: Prevent alloc/free of irq descriptors during cpu up/down · a8994181
      Thomas Gleixner 提交于
      When a cpu goes up some architectures (e.g. x86) have to walk the irq
      space to set up the vector space for the cpu. While this needs extra
      protection at the architecture level we can avoid a few race
      conditions by preventing the concurrent allocation/free of irq
      descriptors and the associated data.
      
      When a cpu goes down it moves the interrupts which are targeted to
      this cpu away by reassigning the affinities. While this happens
      interrupts can be allocated and freed, which opens a can of race
      conditions in the code which reassignes the affinities because
      interrupt descriptors might be freed underneath.
      
      Example:
      
      CPU1				CPU2
      cpu_up/down
       irq_desc = irq_to_desc(irq);
      				remove_from_radix_tree(desc);
       raw_spin_lock(&desc->lock);
      				free(desc);
      
      We could protect the irq descriptors with RCU, but that would require
      a full tree change of all accesses to interrupt descriptors. But
      fortunately these kind of race conditions are rather limited to a few
      things like cpu hotplug. The normal setup/teardown is very well
      serialized. So the simpler and obvious solution is:
      
      Prevent allocation and freeing of interrupt descriptors accross cpu
      hotplug.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: xiao jin <jin.xiao@intel.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
      a8994181
    • S
      mtd: nand: Fix NAND_USE_BOUNCE_BUFFER flag conflict · 5f867db6
      Scott Wood 提交于
      Commit 66507c7b ("mtd: nand: Add support to use nand_base
      poi databuf as bounce buffer") added a flag NAND_USE_BOUNCE_BUFFER
      using the same bit value as the existing NAND_BUSWIDTH_AUTO.
      
      Cc: Kamal Dasu <kdasu.kdev@gmail.com>
      Fixes: 66507c7b ("mtd: nand: Add support to use nand_base
      	poi databuf as bounce buffer")
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Signed-off-by: NBrian Norris <computersforpeace@gmail.com>
      5f867db6
    • T
      tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build · 37b64a42
      Thomas Gleixner 提交于
      Making tick_broadcast_oneshot_control() independent from
      CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
      CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
      there.
      
      Provide a proper stub inline.
      
      Fixes: f32dd117 'tick/broadcast: Make idle check independent from mode and config'
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      37b64a42