1. 17 10月, 2013 1 次提交
    • J
      mm: memcg: handle non-error OOM situations more gracefully · 49426420
      Johannes Weiner 提交于
      Commit 3812c8c8 ("mm: memcg: do not trap chargers with full
      callstack on OOM") assumed that only a few places that can trigger a
      memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
      readahead.  But there are many more and it's impractical to annotate
      them all.
      
      First of all, we don't want to invoke the OOM killer when the failed
      allocation is gracefully handled, so defer the actual kill to the end of
      the fault handling as well.  This simplifies the code quite a bit for
      added bonus.
      
      Second, since a failed allocation might not be the abrupt end of the
      fault, the memcg OOM handler needs to be re-entrant until the fault
      finishes for subsequent allocation attempts.  If an allocation is
      attempted after the task already OOMed, allow it to bypass the limit so
      that it can quickly finish the fault and invoke the OOM killer.
      Reported-by: NazurIt <azurit@pobox.sk>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49426420
  2. 15 10月, 2013 1 次提交
  3. 11 10月, 2013 6 次提交
  4. 04 10月, 2013 1 次提交
  5. 01 10月, 2013 2 次提交
    • N
      skbuff: size of hole is wrong in a comment · 45906723
      Nicolas Dichtel 提交于
      Since commit c93bdd0e ("netvm: allow skb allocation to use PFMEMALLOC
      reserves"), hole size is one bit less than what is written in the comment.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45906723
    • R
      mm: avoid reinserting isolated balloon pages into LRU lists · 117aad1e
      Rafael Aquini 提交于
      Isolated balloon pages can wrongly end up in LRU lists when
      migrate_pages() finishes its round without draining all the isolated
      page list.
      
      The same issue can happen when reclaim_clean_pages_from_list() tries to
      reclaim pages from an isolated page list, before migration, in the CMA
      path.  Such balloon page leak opens a race window against LRU lists
      shrinkers that leads us to the following kernel panic:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
        IP: [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
        PGD 3cda2067 PUD 3d713067 PMD 0
        Oops: 0000 [#1] SMP
        CPU: 0 PID: 340 Comm: kswapd0 Not tainted 3.12.0-rc1-22626-g4367597 #87
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        RIP: shrink_page_list+0x24e/0x897
        RSP: 0000:ffff88003da499b8  EFLAGS: 00010286
        RAX: 0000000000000000 RBX: ffff88003e82bd60 RCX: 00000000000657d5
        RDX: 0000000000000000 RSI: 000000000000031f RDI: ffff88003e82bd40
        RBP: ffff88003da49ab0 R08: 0000000000000001 R09: 0000000081121a45
        R10: ffffffff81121a45 R11: ffff88003c4a9a28 R12: ffff88003e82bd40
        R13: ffff88003da0e800 R14: 0000000000000001 R15: ffff88003da49d58
        FS:  0000000000000000(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000067d9000 CR3: 000000003ace5000 CR4: 00000000000407b0
        Call Trace:
          shrink_inactive_list+0x240/0x3de
          shrink_lruvec+0x3e0/0x566
          __shrink_zone+0x94/0x178
          shrink_zone+0x3a/0x82
          balance_pgdat+0x32a/0x4c2
          kswapd+0x2f0/0x372
          kthread+0xa2/0xaa
          ret_from_fork+0x7c/0xb0
        Code: 80 7d 8f 01 48 83 95 68 ff ff ff 00 4c 89 e7 e8 5a 7b 00 00 48 85 c0 49 89 c5 75 08 80 7d 8f 00 74 3e eb 31 48 8b 80 18 01 00 00 <48> 8b 74 0d 48 8b 78 30 be 02 00 00 00 ff d2 eb
        RIP  [<ffffffff810c2625>] shrink_page_list+0x24e/0x897
         RSP <ffff88003da499b8>
        CR2: 0000000000000028
        ---[ end trace 703d2451af6ffbfd ]---
        Kernel panic - not syncing: Fatal exception
      
      This patch fixes the issue, by assuring the proper tests are made at
      putback_movable_pages() & reclaim_clean_pages_from_list() to avoid
      isolated balloon pages being wrongly reinserted in LRU lists.
      
      [akpm@linux-foundation.org: clarify awkward comment text]
      Signed-off-by: NRafael Aquini <aquini@redhat.com>
      Reported-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Tested-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      117aad1e
  6. 29 9月, 2013 1 次提交
  7. 28 9月, 2013 1 次提交
  8. 27 9月, 2013 2 次提交
    • K
      Drivers: hv: util: Correctly support ws2008R2 and earlier · 3a491605
      K. Y. Srinivasan 提交于
      The current code does not correctly negotiate the version numbers for the util
      driver when hosted on earlier hosts. The version numbers presented by this
      driver were not compatible with the version numbers supported by Windows Server
      2008. Fix this problem.
      
      I would like to thank Olaf Hering (ohering@suse.com) for identifying the problem.
      Reported-by: NOlaf Hering <ohering@suse.com>
      Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a491605
    • A
      bcma: make bcma_core_pci_{up,down}() callable from atomic context · 2bedea8f
      Arend van Spriel 提交于
      This patch removes the bcma_core_pci_power_save() call from
      the bcma_core_pci_{up,down}() functions as it tries to schedule
      thus requiring to call them from non-atomic context. The function
      bcma_core_pci_power_save() is now exported so the calling module
      can explicitly use it in non-atomic context. This fixes the
      'scheduling while atomic' issue reported by Tod Jackson and
      Joe Perches.
      
      [   13.210710] BUG: scheduling while atomic: dhcpcd/1800/0x00000202
      [   13.210718] Modules linked in: brcmsmac nouveau coretemp kvm_intel kvm cordic brcmutil bcma dell_wmi atl1c ttm mxm_wmi wmi
      [   13.210756] CPU: 2 PID: 1800 Comm: dhcpcd Not tainted 3.11.0-wl #1
      [   13.210762] Hardware name: Alienware M11x R2/M11x R2, BIOS A04 11/23/2010
      [   13.210767]  ffff880177c92c40 ffff880170fd1948 ffffffff8169af5b 0000000000000007
      [   13.210777]  ffff880170fd1ab0 ffff880170fd1958 ffffffff81697ee2 ffff880170fd19d8
      [   13.210785]  ffffffff816a19f5 00000000000f4240 000000000000d080 ffff880170fd1fd8
      [   13.210794] Call Trace:
      [   13.210813]  [<ffffffff8169af5b>] dump_stack+0x4f/0x84
      [   13.210826]  [<ffffffff81697ee2>] __schedule_bug+0x43/0x51
      [   13.210837]  [<ffffffff816a19f5>] __schedule+0x6e5/0x810
      [   13.210845]  [<ffffffff816a1c34>] schedule+0x24/0x70
      [   13.210855]  [<ffffffff816a04fc>] schedule_hrtimeout_range_clock+0x10c/0x150
      [   13.210867]  [<ffffffff810684e0>] ? update_rmtp+0x60/0x60
      [   13.210877]  [<ffffffff8106915f>] ? hrtimer_start_range_ns+0xf/0x20
      [   13.210887]  [<ffffffff816a054e>] schedule_hrtimeout_range+0xe/0x10
      [   13.210897]  [<ffffffff8104f6fb>] usleep_range+0x3b/0x40
      [   13.210910]  [<ffffffffa00371af>] bcma_pcie_mdio_set_phy.isra.3+0x4f/0x80 [bcma]
      [   13.210921]  [<ffffffffa003729f>] bcma_pcie_mdio_write.isra.4+0xbf/0xd0 [bcma]
      [   13.210932]  [<ffffffffa0037498>] bcma_pcie_mdio_writeread.isra.6.constprop.13+0x18/0x30 [bcma]
      [   13.210942]  [<ffffffffa00374ee>] bcma_core_pci_power_save+0x3e/0x80 [bcma]
      [   13.210953]  [<ffffffffa003765d>] bcma_core_pci_up+0x2d/0x60 [bcma]
      [   13.210975]  [<ffffffffa03dc17c>] brcms_c_up+0xfc/0x430 [brcmsmac]
      [   13.210989]  [<ffffffffa03d1a7d>] brcms_up+0x1d/0x20 [brcmsmac]
      [   13.211003]  [<ffffffffa03d2498>] brcms_ops_start+0x298/0x340 [brcmsmac]
      [   13.211020]  [<ffffffff81600a12>] ? cfg80211_netdev_notifier_call+0xd2/0x5f0
      [   13.211030]  [<ffffffff815fa53d>] ? packet_notifier+0xad/0x1d0
      [   13.211064]  [<ffffffff81656e75>] ieee80211_do_open+0x325/0xf80
      [   13.211076]  [<ffffffff8106ac09>] ? __raw_notifier_call_chain+0x9/0x10
      [   13.211086]  [<ffffffff81657b41>] ieee80211_open+0x71/0x80
      [   13.211101]  [<ffffffff81526267>] __dev_open+0x87/0xe0
      [   13.211109]  [<ffffffff8152650c>] __dev_change_flags+0x9c/0x180
      [   13.211117]  [<ffffffff815266a3>] dev_change_flags+0x23/0x70
      [   13.211127]  [<ffffffff8158cd68>] devinet_ioctl+0x5b8/0x6a0
      [   13.211136]  [<ffffffff8158d5c5>] inet_ioctl+0x75/0x90
      [   13.211147]  [<ffffffff8150b38b>] sock_do_ioctl+0x2b/0x70
      [   13.211155]  [<ffffffff8150b681>] sock_ioctl+0x71/0x2a0
      [   13.211169]  [<ffffffff8114ed47>] do_vfs_ioctl+0x87/0x520
      [   13.211180]  [<ffffffff8113f159>] ? ____fput+0x9/0x10
      [   13.211198]  [<ffffffff8106228c>] ? task_work_run+0x9c/0xd0
      [   13.211202]  [<ffffffff8114f271>] SyS_ioctl+0x91/0xb0
      [   13.211208]  [<ffffffff816aa252>] system_call_fastpath+0x16/0x1b
      [   13.211217] NOHZ: local_softirq_pending 202
      
      The issue was introduced in v3.11 kernel by following commit:
      
      commit aa51e598
      Author: Hauke Mehrtens <hauke@hauke-m.de>
      Date:   Sat Aug 24 00:32:31 2013 +0200
      
          brcmsmac: use bcma PCIe up and down functions
      
          replace the calls to bcma_core_pci_extend_L1timer() by calls to the
          newly introduced bcma_core_pci_ip() and bcma_core_pci_down()
      Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
          Cc: Arend van Spriel <arend@broadcom.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      
      This fix has been discussed with Hauke Mehrtens [1] selection
      option 3) and is intended for v3.12.
      
      Ref:
      [1] http://mid.gmane.org/5239B12D.3040206@hauke-m.de
      
      Cc: <stable@vger.kernel.org> # 3.11.x
      Cc: Tod Jackson <tod.jackson@gmail.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: Hauke Mehrtens <hauke@hauke-m.de>
      Reviewed-by: NHante Meuleman <meuleman@broadcom.com>
      Signed-off-by: NArend van Spriel <arend@broadcom.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      2bedea8f
  9. 26 9月, 2013 2 次提交
    • T
      NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method · 5bc2afc2
      Trond Myklebust 提交于
      Determine if we've created a new file by examining the directory change
      attribute and/or the O_EXCL flag.
      
      This fixes a regression when doing a non-exclusive create of a new file.
      If the FILE_CREATED flag is not set, the atomic_open() command will
      perform full file access permissions checks instead of just checking
      for MAY_OPEN.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      5bc2afc2
    • D
      HID: uhid: allocate static minor · 19872d20
      David Herrmann 提交于
      udev has this nice feature of creating "dead" /dev/<node> device-nodes if
      it finds a devnode:<node> modalias. Once the node is accessed, the kernel
      automatically loads the module that provides the node. However, this
      requires udev to know the major:minor code to use for the node. This
      feature was introduced by:
      
        commit 578454ff
        Author: Kay Sievers <kay.sievers@vrfy.org>
        Date:   Thu May 20 18:07:20 2010 +0200
      
            driver core: add devname module aliases to allow module on-demand auto-loading
      
      However, uhid uses dynamic minor numbers so this doesn't actually work. We
      need to load uhid to know which minor it's going to use.
      
      Hence, allocate a static minor (just like uinput does) and we're good
      to go.
      Reported-by: NTom Gundersen <teg@jklm.no>
      Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      19872d20
  10. 25 9月, 2013 5 次提交
    • R
      of: clean-up ifdefs in of_irq.h · b0b8c960
      Rob Herring 提交于
      Much of of_irq.h is needlessly ifdef'ed. Clean this up and minimize the
      amount ifdef'ed code. This fixes some  build warnings when CONFIG_OF
      is not enabled (seen on i386 and x86_64):
      
      include/linux/of_irq.h:82:7: warning: 'struct device_node' declared inside parameter list [enabled by default]
      include/linux/of_irq.h:82:7: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
      include/linux/of_irq.h:87:47: warning: 'struct device_node' declared inside parameter list [enabled by default]
      
      Compile tested on i386, sparc and arm.
      Reported-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Signed-off-by: NRob Herring <rob.herring@calxeda.com>
      b0b8c960
    • A
      revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code" · 0608f43d
      Andrew Morton 提交于
      Revert commit 3b38722e ("memcg, vmscan: integrate soft reclaim
      tighter with zone shrinking code")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0608f43d
    • A
      revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim" · b1aff7fc
      Andrew Morton 提交于
      Revert commit a5b7c87f ("vmscan, memcg: do softlimit reclaim also
      for targeted reclaim")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1aff7fc
    • A
      revert "memcg: enhance memcg iterator to support predicates" · 694fbc0f
      Andrew Morton 提交于
      Revert commit de57780d ("memcg: enhance memcg iterator to support
      predicates")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      694fbc0f
    • M
      watchdog: update watchdog_thresh properly · 9809b18f
      Michal Hocko 提交于
      watchdog_tresh controls how often nmi perf event counter checks per-cpu
      hrtimer_interrupts counter and blows up if the counter hasn't changed
      since the last check.  The counter is updated by per-cpu
      watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
      period which guarantees that hrtimer is scheduled 2 times per the main
      period.  Both hrtimer and perf event are started together when the
      watchdog is enabled.
      
      So far so good.  But...
      
      But what happens when watchdog_thresh is updated from sysctl handler?
      
      proc_dowatchdog will set a new sampling period and hrtimer callback
      (watchdog_timer_fn) will use the new value in the next round.  The
      problem, however, is that nobody tells the perf event that the sampling
      period has changed so it is ticking with the period configured when it
      has been set up.
      
      This might result in an ear ripping dissonance between perf and hrtimer
      parts if the watchdog_thresh is increased.  And even worse it might lead
      to KABOOM if the watchdog is configured to panic on such a spurious
      lockup.
      
      This patch fixes the issue by updating both nmi perf even counter and
      hrtimers if the threshold value has changed.
      
      The nmi one is disabled and then reinitialized from scratch.  This has
      an unpleasant side effect that the allocation of the new event might
      fail theoretically so the hard lockup detector would be disabled for
      such cpus.  On the other hand such a memory allocation failure is very
      unlikely because the original event is deallocated right before.
      
      It would be much nicer if we just changed perf event period but there
      doesn't seem to be any API to do that right now.  It is also unfortunate
      that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
      cannot use on_each_cpu() and do the same thing from the per-cpu context.
      The update from the current CPU should be safe because
      perf_event_disable removes the event atomically before it clears the
      per-cpu watchdog_ev so it cannot change anything under running handler
      feet.
      
      The hrtimer is simply restarted (thanks to Don Zickus who has pointed
      this out) if it is queued because we cannot rely it will fire&adopt to
      the new sampling period before a new nmi event triggers (when the
      treshold is decreased).
      
      [akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place]
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Fabio Estevam <festevam@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9809b18f
  11. 24 9月, 2013 1 次提交
  12. 23 9月, 2013 1 次提交
  13. 22 9月, 2013 1 次提交
  14. 21 9月, 2013 1 次提交
  15. 20 9月, 2013 1 次提交
    • M
      dm mpath: disable WRITE SAME if it fails · f84cb8a4
      Mike Snitzer 提交于
      Workaround the SCSI layer's problematic WRITE SAME heuristics by
      disabling WRITE SAME in the DM multipath device's queue_limits if an
      underlying device disabled it.
      
      The WRITE SAME heuristics, with both the original commit 5db44863
      ("[SCSI] sd: Implement support for WRITE SAME") and the updated commit
      66c28f97 ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling
      WRITE SAME(10) even without successfully determining it is supported.
      After the first failed WRITE SAME the SCSI layer will disable WRITE SAME
      for the device (by setting sdkp->device->no_write_same which results in
      'max_write_same_sectors' in device's queue_limits to be set to 0).
      
      When a device is stacked ontop of such a SCSI device any changes to that
      SCSI device's queue_limits do not automatically propagate up the stack.
      As such, a DM multipath device will not have its WRITE SAME support
      disabled.  This causes the block layer to continue to issue WRITE SAME
      requests to the mpath device which causes paths to fail and (if mpath IO
      isn't configured to queue when no paths are available) it will result in
      actual IO errors to the upper layers.
      
      This fix doesn't help configurations that have additional devices
      stacked ontop of the mpath device (e.g. LVM created linear DM devices
      ontop).  A proper fix that restacks all the queue_limits from the bottom
      of the device stack up will need to be explored if SCSI will continue to
      use this model of optimistically allowing op codes and then disabling
      them after they fail for the first time.
      
      Before this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      device-mapper: multipath: Failing path 8:112.
      end_request: I/O error, dev dm-6, sector 4616
      dm-6: WRITE SAME failed. Manually zeroing.
      end_request: I/O error, dev dm-6, sector 4616
      end_request: I/O error, dev dm-6, sector 5640
      end_request: I/O error, dev dm-6, sector 6664
      end_request: I/O error, dev dm-6, sector 7688
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      end_request: I/O error, dev dm-6, sector 524296
      Aborting journal on device dm-6-8.
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      33553920
      
      After this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      0
      
      It should be noted that WRITE SAME support wasn't enabled in DM
      multipath until v3.10.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.10+
      f84cb8a4
  16. 17 9月, 2013 3 次提交
  17. 16 9月, 2013 1 次提交
  18. 13 9月, 2013 9 次提交