1. 12 7月, 2012 3 次提交
    • Y
      memblock: free allocated memblock_reserved_regions later · 29f67386
      Yinghai Lu 提交于
      memblock_free_reserved_regions() calls memblock_free(), but
      memblock_free() would double reserved.regions too, so we could free the
      old range for reserved.regions.
      
      Also tj said there is another bug which could be related to this.
      
      | I don't think we're saving any noticeable
      | amount by doing this "free - give it to page allocator - reserve
      | again" dancing.  We should just allocate regions aligned to page
      | boundaries and free them later when memblock is no longer in use.
      
      in that case, when DEBUG_PAGEALLOC, will get panic:
      
           memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
        BUG: unable to handle kernel paging request at ffff88102febd948
        IP: [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
        PGD 4826063 PUD cf67a067 PMD cf7fa067 PTE 800000102febd160
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        CPU 0
        Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-20120614-sasha #447
        RIP: 0010:[<ffffffff836a5774>]  [<ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
      
      See the discussion at https://lkml.org/lkml/2012/6/13/469
      
      So try to allocate with PAGE_SIZE alignment and free it later.
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29f67386
    • Y
      mm: sparse: fix usemap allocation above node descriptor section · 99ab7b19
      Yinghai Lu 提交于
      After commit f5bf18fa ("bootmem/sparsemem: remove limit constraint
      in alloc_bootmem_section"), usemap allocations may easily be placed
      outside the optimal section that holds the node descriptor, even if
      there is space available in that section.  This results in unnecessary
      hotplug dependencies that need to have the node unplugged before the
      section holding the usemap.
      
      The reason is that the bootmem allocator doesn't guarantee a linear
      search starting from the passed allocation goal but may start out at a
      much higher address absent an upper limit.
      
      Fix this by trying the allocation with the limit at the section end,
      then retry without if that fails.  This keeps the fix from f5bf18fa
      of not panicking if the allocation does not fit in the section, but
      still makes sure to try to stay within the section at first.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>	[3.3.x, 3.4.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99ab7b19
    • J
      memory hotplug: fix invalid memory access caused by stale kswapd pointer · d8adde17
      Jiang Liu 提交于
      kswapd_stop() is called to destroy the kswapd work thread when all memory
      of a NUMA node has been offlined.  But kswapd_stop() only terminates the
      work thread without resetting NODE_DATA(nid)->kswapd to NULL.  The stale
      pointer will prevent kswapd_run() from creating a new work thread when
      adding memory to the memory-less NUMA node again.  Eventually the stale
      pointer may cause invalid memory access.
      
      An example stack dump as below. It's reproduced with 2.6.32, but latest
      kernel has the same issue.
      
        BUG: unable to handle kernel NULL pointer dereference at (null)
        IP: [<ffffffff81051a94>] exit_creds+0x12/0x78
        PGD 0
        Oops: 0000 [#1] SMP
        last sysfs file: /sys/devices/system/memory/memory391/state
        CPU 11
        Modules linked in: cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tpm_tis rtc_cmos i2c_i801 rtc_core tpm serio_raw pcspkr sg tpm_bios igb i2c_core iTCO_wdt rtc_lib mptctl iTCO_vendor_support button dca bnx2 usbhid hid uhci_hcd ehci_hcd usbcore sd_mod crc_t10dif edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix libata thermal processor thermal_sys hwmon mptsas mptscsih mptbase scsi_transport_sas scsi_mod
        Pid: 7949, comm: sh Not tainted 2.6.32.12-qiuxishi-5-default #92 Tecal RH2285
        RIP: 0010:exit_creds+0x12/0x78
        RSP: 0018:ffff8806044f1d78  EFLAGS: 00010202
        RAX: 0000000000000000 RBX: ffff880604f22140 RCX: 0000000000019502
        RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000000
        RBP: ffff880604f22150 R08: 0000000000000000 R09: ffffffff81a4dc10
        R10: 00000000000032a0 R11: ffff880006202500 R12: 0000000000000000
        R13: 0000000000c40000 R14: 0000000000008000 R15: 0000000000000001
        FS:  00007fbc03d066f0(0000) GS:ffff8800282e0000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000000 CR3: 000000060f029000 CR4: 00000000000006e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process sh (pid: 7949, threadinfo ffff8806044f0000, task ffff880603d7c600)
        Stack:
         ffff880604f22140 ffffffff8103aac5 ffff880604f22140 ffffffff8104d21e
         ffff880006202500 0000000000008000 0000000000c38000 ffffffff810bd5b1
         0000000000000000 ffff880603d7c600 00000000ffffdd29 0000000000000003
        Call Trace:
          __put_task_struct+0x5d/0x97
          kthread_stop+0x50/0x58
          offline_pages+0x324/0x3da
          memory_block_change_state+0x179/0x1db
          store_mem_state+0x9e/0xbb
          sysfs_write_file+0xd0/0x107
          vfs_write+0xad/0x169
          sys_write+0x45/0x6e
          system_call_fastpath+0x16/0x1b
        Code: ff 4d 00 0f 94 c0 84 c0 74 08 48 89 ef e8 1f fd ff ff 5b 5d 31 c0 41 5c c3 53 48 8b 87 20 06 00 00 48 89 fb 48 8b bf 18 06 00 00 <8b> 00 48 c7 83 18 06 00 00 00 00 00 00 f0 ff 0f 0f 94 c0 84 c0
        RIP  exit_creds+0x12/0x78
         RSP <ffff8806044f1d78>
        CR2: 0000000000000000
      
      [akpm@linux-foundation.org: add pglist_data.kswapd locking comments]
      Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d8adde17
  2. 11 7月, 2012 1 次提交
    • A
      PCI: EHCI: fix crash during suspend on ASUS computers · dbf0e4c7
      Alan Stern 提交于
      Quite a few ASUS computers experience a nasty problem, related to the
      EHCI controllers, when going into system suspend.  It was observed
      that the problem didn't occur if the controllers were not put into the
      D3 power state before starting the suspend, and commit
      151b6128 (USB: EHCI: fix crash during
      suspend on ASUS computers) was created to do this.
      
      It turned out this approach messed up other computers that didn't have
      the problem -- it prevented USB wakeup from working.  Consequently
      commit c2fb8a3f (USB: add
      NO_D3_DURING_SLEEP flag and revert 151b6128) was merged; it
      reverted the earlier commit and added a whitelist of known good board
      names.
      
      Now we know the actual cause of the problem.  Thanks to AceLan Kao for
      tracking it down.
      
      According to him, an engineer at ASUS explained that some of their
      BIOSes contain a bug that was added in an attempt to work around a
      problem in early versions of Windows.  When the computer goes into S3
      suspend, the BIOS tries to verify that the EHCI controllers were first
      quiesced by the OS.  Nothing's wrong with this, but the BIOS does it
      by checking that the PCI COMMAND registers contain 0 without checking
      the controllers' power state.  If the register isn't 0, the BIOS
      assumes the controller needs to be quiesced and tries to do so.  This
      involves making various MMIO accesses to the controller, which don't
      work very well if the controller is already in D3.  The end result is
      a system hang or memory corruption.
      
      Since the value in the PCI COMMAND register doesn't matter once the
      controller has been suspended, and since the value will be restored
      anyway when the controller is resumed, we can work around the BIOS bug
      simply by setting the register to 0 during system suspend.  This patch
      (as1590) does so and also reverts the second commit mentioned above,
      which is now unnecessary.
      
      In theory we could do this for every PCI device.  However to avoid
      introducing new problems, the patch restricts itself to EHCI host
      controllers.
      
      Finally the affected systems can suspend with USB wakeup working
      properly.
      
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=37632
      Reference: https://bugzilla.kernel.org/show_bug.cgi?id=42728Based-on-patch-by: NAceLan Kao <acelan.kao@canonical.com>
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Tested-by: NDâniel Fraga <fragabr@gmail.com>
      Tested-by: NJavier Marcet <jmarcet@gmail.com>
      Tested-by: NAndrey Rahmatullin <wrar@wrar.name>
      Tested-by: NOleksij Rempel <bug-track@fisher-privat.net>
      Tested-by: NPavel Pisa <pisa@cmp.felk.cvut.cz>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dbf0e4c7
  3. 07 7月, 2012 1 次提交
  4. 05 7月, 2012 2 次提交
  5. 04 7月, 2012 2 次提交
    • O
      rpmsg: make sure inflight messages don't invoke just-removed callbacks · 15fd943a
      Ohad Ben-Cohen 提交于
      When inbound messages arrive, rpmsg core looks up their associated
      endpoint (by destination address) and then invokes their callback.
      
      We've made sure that endpoints will never be de-allocated after they
      were found by rpmsg core, but we also need to protect against the
      (rare) scenario where the rpmsg driver was just removed, and its
      callback function isn't available anymore.
      
      This is achieved by introducing a callback mutex, which must be taken
      before the callback is invoked, and, obviously, before it is removed.
      
      Cc: stable <stable@vger.kernel.org>
      Reported-by: NFernando Guzman Lugo <fernando.lugo@ti.com>
      Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>
      15fd943a
    • O
      rpmsg: avoid premature deallocation of endpoints · 5a081caa
      Ohad Ben-Cohen 提交于
      When an inbound message arrives, the rpmsg core looks up its
      associated endpoint and invokes the registered callback.
      
      If a message arrives while its endpoint is being removed (because
      the rpmsg driver was removed, or a recovery of a remote processor
      has kicked in) we must ensure atomicity, i.e.:
      
      - Either the ept is removed before it is found
      
      or
      
      - The ept is found but will not be freed until the callback returns
      
      This is achieved by maintaining a per-ept reference count, which,
      when drops to zero, will trigger deallocation of the ept.
      
      With this in hand, it is now forbidden to directly deallocate
      epts once they have been added to the endpoints idr.
      
      Cc: stable <stable@vger.kernel.org>
      Reported-by: NFernando Guzman Lugo <fernando.lugo@ti.com>
      Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>
      5a081caa
  6. 03 7月, 2012 1 次提交
  7. 01 7月, 2012 1 次提交
  8. 21 6月, 2012 3 次提交
  9. 19 6月, 2012 1 次提交
  10. 18 6月, 2012 2 次提交
    • S
      ftrace: Make all inline tags also include notrace · 93b3cca1
      Steven Rostedt 提交于
      Commit 5963e317 ("ftrace/x86: Do not change stacks in DEBUG when
      calling lockdep") prevented lockdep calls from the int3 breakpoint handler
      from reseting the stack if a function that was called was in the process
      of being converted for tracing and had a breakpoint on it. The idea is,
      before calling the lockdep code, do a load_idt() to the special IDT that
      kept the breakpoint stack from reseting. This worked well as a quick fix
      for this kernel release, until a certain config caused a lockup in the
      function tracer start up tests.
      
      Investigating it, I found that the load_idt that was used to prevent
      the int3 from changing stacks was itself being traced!
      
      Even though the config had CONFIG_OPTIMIZE_INLINING disabled, and
      all 'inline' tags were set to always inline, there were still cases that
      it did not inline! This was caused by CONFIG_PARAVIRT_GUEST, where it
      would add a pointer to the native_load_idt() which made that function
      to be traced.
      
      Commit 45959ee7 ("ftrace: Do not function trace inlined functions")
      only touched the 'inline' tags when CONFIG_OPMITIZE_INLINING was enabled.
      PARAVIRT_GUEST shows that this was not enough and we need to also
      mark always_inline with notrace as well.
      Reported-by: NFengguang Wu <wfg@linux.intel.com>
      Tested-by: NFengguang Wu <wfg@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      93b3cca1
    • T
      NFSv4.1: Fix umount when filelayout DS is also the MDS · 2a4c8994
      Trond Myklebust 提交于
      Currently there is a 'chicken and egg' issue when the DS is also the mounted
      MDS. The nfs_match_client() reference from nfs4_set_ds_client bumps the
      cl_count, the nfs_client is not freed at umount, and nfs4_deviceid_purge_client
      is not called to dereference the MDS usage of a deviceid which holds a
      reference to the DS nfs_client.  The result is the umount program returns,
      but the nfs_client is not freed, and the cl_session hearbeat continues.
      
      The MDS (and all other nfs mounts) lose their last nfs_client reference in
      nfs_free_server when the last nfs_server (fsid) is umounted.
      The file layout DS lose their last nfs_client reference in destroy_ds
      when the last deviceid referencing the data server is put and destroy_ds is
      called. This is triggered by a call to nfs4_deviceid_purge_client which
      removes references to a pNFS deviceid used by an MDS mount.
      
      The fix is to track how many pnfs enabled filesystems are mounted from
      this server, and then to purge the device id cache once that count reaches
      zero.
      Reported-by: NJorge Mora <Jorge.Mora@netapp.com>
      Reported-by: NAndy Adamson <andros@netapp.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      2a4c8994
  11. 16 6月, 2012 4 次提交
    • R
      vga_switcheroo.h: fix pci_dev warning · f8fee8f5
      Randy Dunlap 提交于
      Fix warnings on some architectures/configs (not on x86):
      
      include/linux/vga_switcheroo.h:28:30: warning: 'struct pci_dev' declared inside parameter list [enabled by default]
      include/linux/vga_switcheroo.h:28:30: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Cc:	Takashi Iwai <tiwai@suse.de>
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      f8fee8f5
    • H
      swap: fix shmem swapping when more than 8 areas · 9b15b817
      Hugh Dickins 提交于
      Minchan Kim reports that when a system has many swap areas, and tmpfs
      swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
      back the page cannot locate it, and the read fails with -ENOMEM.
      
      Whoops.  Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
      swp_entry_to_pte()) technique for determining maximum usable swap
      offset, without stopping to realize that that actually depends upon the
      pte swap encoding shifting swap offset to the higher bits and truncating
      it there.  Whereas our radix_tree swap encoding leaves offset in the
      lower bits: it's swap "type" (that is, index of swap area) that was
      truncated.
      
      Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
      broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().
      
      This does not reduce the usable size of a swap area any further, it
      leaves it as claimed when making the original commit: no change from 3.0
      on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
      per swapfile on i386 with PAE.  It's not a change I would have risked
      five years ago, but with x86_64 supported for ten years, I believe it's
      appropriate now.
      
      Hmm, and what if some architecture implements its swap pte with offset
      encoded below type? That would equally break the maximum usable swap
      offset check.  Happily, they all follow the same tradition of encoding
      offset above type, but I'll prepare a check on that for next.
      Reported-and-Reviewed-and-Tested-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b15b817
    • E
      net: remove skb_orphan_try() · 62b1a8ab
      Eric Dumazet 提交于
      Orphaning skb in dev_hard_start_xmit() makes bonding behavior
      unfriendly for applications sending big UDP bursts : Once packets
      pass the bonding device and come to real device, they might hit a full
      qdisc and be dropped. Without orphaning, the sender is automatically
      throttled because sk->sk_wmemalloc reaches sk->sk_sndbuf (assuming
      sk_sndbuf is not too big)
      
      We could try to defer the orphaning adding another test in
      dev_hard_start_xmit(), but all this seems of little gain,
      now that BQL tends to make packets more likely to be parked
      in Qdisc queues instead of NIC TX ring, in cases where performance
      matters.
      
      Reverts commits :
      fc6055a5 net: Introduce skb_orphan_try()
      87fd308c net: skb_tx_hash() fix relative to skb_orphan_try()
      and removes SKBTX_DRV_NEEDS_SK_REF flag
      Reported-and-bisected-by: NJean-Michel Hautbois <jhautbois@gmail.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Acked-by: NOliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62b1a8ab
    • K
      kmsg - kmsg_dump() use iterator to receive log buffer content · e2ae715d
      Kay Sievers 提交于
      Provide an iterator to receive the log buffer content, and convert all
      kmsg_dump() users to it.
      
      The structured data in the kmsg buffer now contains binary data, which
      should no longer be copied verbatim to the kmsg_dump() users.
      
      The iterator should provide reliable access to the buffer data, and also
      supports proper log line-aware chunking of data while iterating.
      Signed-off-by: NKay Sievers <kay@vrfy.org>
      Tested-by: NTony Luck <tony.luck@intel.com>
      Reported-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
      Tested-by: NAnton Vorontsov <anton.vorontsov@linaro.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e2ae715d
  12. 15 6月, 2012 1 次提交
    • A
      block: Drop dead function blk_abort_queue() · 76aaa510
      Asias He 提交于
      This function was only used by btrfs code in btrfs_abort_devices()
      (seems in a wrong way).
      
      It was removed in commit d07eb911,
      So, Let's remove the dead code to avoid any confusion.
      
      Changes in v2: update commit log, btrfs_abort_devices() was removed
      already.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-kernel@vger.kernel.org
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: linux-btrfs@vger.kernel.org
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: NAsias He <asias@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      76aaa510
  13. 14 6月, 2012 4 次提交
  14. 12 6月, 2012 2 次提交
  15. 11 6月, 2012 2 次提交
  16. 10 6月, 2012 1 次提交
    • P
      net: Make linux/tcp.h C++ friendly (trivial) · 8876d6b5
      Paul Pluzhnikov 提交于
      I originally sent this patch to <trivial@kernel.org>, but Jiri Kosina did
      not feel that this is fully appropriate for the trivial tree.
      
      Using linux/tcp.h from C++ results in:
      
      cat t.cc
      #include <linux/tcp.h>
      int main() { }
      
      g++ -c t.cc
      
      In file included from t.cc:1:
      /usr/include/linux/tcp.h:72: error: '__u32 __fswab32(__u32)' cannot appear in a constant-expression
      /usr/include/linux/tcp.h:72: error: a function call cannot appear in a constant-expression
      ...
      
      Attached trivial patch fixes this problem.
      
      Tested:
      - the t.cc above compiles with g++ and
      - the following program generates the same output before/after
        the patch:
      
      #include <linux/tcp.h>
      #include <stdio.h>
      
      int main ()
      {
      #define P(a) printf("%s: %08x\n", #a, (int)a)
       P(TCP_FLAG_CWR);
       P(TCP_FLAG_ECE);
       P(TCP_FLAG_URG);
       P(TCP_FLAG_ACK);
       P(TCP_FLAG_PSH);
       P(TCP_FLAG_RST);
       P(TCP_FLAG_SYN);
       P(TCP_FLAG_FIN);
       P(TCP_RESERVED_BITS);
       P(TCP_DATA_OFFSET);
      #undef P
       return 0;
      }
      Signed-off-by: NPaul Pluzhnikov <ppluzhnikov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8876d6b5
  17. 08 6月, 2012 5 次提交
  18. 07 6月, 2012 2 次提交
  19. 06 6月, 2012 2 次提交