1. 23 7月, 2016 2 次提交
    • A
      radix-tree: fix radix_tree_iter_retry() for tagged iterators. · 3cb9185c
      Andrey Ryabinin 提交于
      radix_tree_iter_retry() resets slot to NULL, but it doesn't reset tags.
      Then NULL slot and non-zero iter.tags passed to radix_tree_next_slot()
      leading to crash:
      
        RIP: radix_tree_next_slot include/linux/radix-tree.h:473
          find_get_pages_tag+0x334/0x930 mm/filemap.c:1452
        ....
        Call Trace:
          pagevec_lookup_tag+0x3a/0x80 mm/swap.c:960
          mpage_prepare_extent_to_map+0x321/0xa90 fs/ext4/inode.c:2516
          ext4_writepages+0x10be/0x2b20 fs/ext4/inode.c:2736
          do_writepages+0x97/0x100 mm/page-writeback.c:2364
          __filemap_fdatawrite_range+0x248/0x2e0 mm/filemap.c:300
          filemap_write_and_wait_range+0x121/0x1b0 mm/filemap.c:490
          ext4_sync_file+0x34d/0xdb0 fs/ext4/fsync.c:115
          vfs_fsync_range+0x10a/0x250 fs/sync.c:195
          vfs_fsync fs/sync.c:209
          do_fsync+0x42/0x70 fs/sync.c:219
          SYSC_fdatasync fs/sync.c:232
          SyS_fdatasync+0x19/0x20 fs/sync.c:230
          entry_SYSCALL_64_fastpath+0x23/0xc1 arch/x86/entry/entry_64.S:207
      
      We must reset iterator's tags to bail out from radix_tree_next_slot()
      and go to the slow-path in radix_tree_next_chunk().
      
      Fixes: 46437f9a ("radix-tree: fix race in gang lookup")
      Link: http://lkml.kernel.org/r/1468495196-10604-1-git-send-email-aryabinin@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NKonstantin Khlebnikov <koct9i@gmail.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3cb9185c
    • J
      mm: memcontrol: fix cgroup creation failure after many small jobs · 73f576c0
      Johannes Weiner 提交于
      The memory controller has quite a bit of state that usually outlives the
      cgroup and pins its CSS until said state disappears.  At the same time
      it imposes a 16-bit limit on the CSS ID space to economically store IDs
      in the wild.  Consequently, when we use cgroups to contain frequent but
      small and short-lived jobs that leave behind some page cache, we quickly
      run into the 64k limitations of outstanding CSSs.  Creating a new cgroup
      fails with -ENOSPC while there are only a few, or even no user-visible
      cgroups in existence.
      
      Although pinning CSSs past cgroup removal is common, there are only two
      instances that actually need an ID after a cgroup is deleted: cache
      shadow entries and swapout records.
      
      Cache shadow entries reference the ID weakly and can deal with the CSS
      having disappeared when it's looked up later.  They pose no hurdle.
      
      Swap-out records do need to pin the css to hierarchically attribute
      swapins after the cgroup has been deleted; though the only pages that
      remain swapped out after offlining are tmpfs/shmem pages.  And those
      references are under the user's control, so they are manageable.
      
      This patch introduces a private 16-bit memcg ID and switches swap and
      cache shadow entries over to using that.  This ID can then be recycled
      after offlining when the CSS remains pinned only by objects that don't
      specifically need it.
      
      This script demonstrates the problem by faulting one cache page in a new
      cgroup and deleting it again:
      
        set -e
        mkdir -p pages
        for x in `seq 128000`; do
          [ $((x % 1000)) -eq 0 ] && echo $x
          mkdir /cgroup/foo
          echo $$ >/cgroup/foo/cgroup.procs
          echo trex >pages/$x
          echo $$ >/cgroup/cgroup.procs
          rmdir /cgroup/foo
        done
      
      When run on an unpatched kernel, we eventually run out of possible IDs
      even though there are no visible cgroups:
      
        [root@ham ~]# ./cssidstress.sh
        [...]
        65000
        mkdir: cannot create directory '/cgroup/foo': No space left on device
      
      After this patch, the IDs get released upon cgroup destruction and the
      cache and css objects get released once memory reclaim kicks in.
      
      [hannes@cmpxchg.org: init the IDR]
        Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org
      Fixes: b2052564 ("mm: memcontrol: continue cache reclaim from offlined groups")
      Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NJohn Garcia <john.garcia@mesosphere.io>
      Reviewed-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Cc: Nikolay Borisov <kernel@kyup.com>
      Cc: <stable@vger.kernel.org>	[3.19+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73f576c0
  2. 17 7月, 2016 1 次提交
    • P
      vlan: use a valid default mtu value for vlan over macsec · 18d3df3e
      Paolo Abeni 提交于
      macsec can't cope with mtu frames which need vlan tag insertion, and
      vlan device set the default mtu equal to the underlying dev's one.
      By default vlan over macsec devices use invalid mtu, dropping
      all the large packets.
      This patch adds a netif helper to check if an upper vlan device
      needs mtu reduction. The helper is used during vlan devices
      initialization to set a valid default and during mtu updating to
      forbid invalid, too bit, mtu values.
      The helper currently only check if the lower dev is a macsec device,
      if we get more users, we need to update only the helper (possibly
      reserving an additional IFF bit).
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18d3df3e
  3. 15 7月, 2016 2 次提交
    • H
      mm: thp: refix false positive BUG in page_move_anon_rmap() · 5a49973d
      Hugh Dickins 提交于
      The VM_BUG_ON_PAGE in page_move_anon_rmap() is more trouble than it's
      worth: the syzkaller fuzzer hit it again.  It's still wrong for some THP
      cases, because linear_page_index() was never intended to apply to
      addresses before the start of a vma.
      
      That's easily fixed with a signed long cast inside linear_page_index();
      and Dmitry has tested such a patch, to verify the false positive.  But
      why extend linear_page_index() just for this case? when the avoidance in
      page_move_anon_rmap() has already grown ugly, and there's no reason for
      the check at all (nothing else there is using address or index).
      
      Remove address arg from page_move_anon_rmap(), remove VM_BUG_ON_PAGE,
      remove CONFIG_DEBUG_VM PageTransHuge adjustment.
      
      And one more thing: should the compound_head(page) be done inside or
      outside page_move_anon_rmap()? It's usually pushed down to the lowest
      level nowadays (and mm/memory.c shows no other explicit use of it), so I
      think it's better done in page_move_anon_rmap() than by caller.
      
      Fixes: 0798d3c0 ("mm: thp: avoid false positive VM_BUG_ON_PAGE in page_move_anon_rmap()")
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1607120444540.12528@eggly.anvilsSigned-off-by: NHugh Dickins <hughd@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.5+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a49973d
    • N
      mm: thp: move pmd check inside ptl for freeze_page() · 33f4751e
      Naoya Horiguchi 提交于
      I found a race condition triggering VM_BUG_ON() in freeze_page(), when
      running a testcase with 3 processes:
        - process 1: keep writing thp,
        - process 2: keep clearing soft-dirty bits from virtual address of process 1
        - process 3: call migratepages for process 1,
      
      The kernel message is like this:
      
        kernel BUG at /src/linux-dev/mm/huge_memory.c:3096!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: cfg80211 rfkill crc32c_intel ppdev serio_raw pcspkr virtio_balloon virtio_console parport_pc parport pvpanic acpi_cpufreq tpm_tis tpm i2c_piix4 virtio_blk virtio_net ata_generic pata_acpi floppy virtio_pci virtio_ring virtio
        CPU: 0 PID: 28863 Comm: migratepages Not tainted 4.6.0-v4.6-160602-0827-+ #2
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        task: ffff880037320000 ti: ffff88007cdd0000 task.ti: ffff88007cdd0000
        RIP: 0010:[<ffffffff811f8e06>]  [<ffffffff811f8e06>] split_huge_page_to_list+0x496/0x590
        RSP: 0018:ffff88007cdd3b70  EFLAGS: 00010202
        RAX: 0000000000000001 RBX: ffff88007c7b88c0 RCX: 0000000000000000
        RDX: 0000000000000000 RSI: 0000000700000200 RDI: ffffea0003188000
        RBP: ffff88007cdd3bb8 R08: 0000000000000001 R09: 00003ffffffff000
        R10: ffff880000000000 R11: ffffc000001fffff R12: ffffea0003188000
        R13: ffffea0003188000 R14: 0000000000000000 R15: 0400000000000080
        FS:  00007f8ec241d740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000             CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007f8ec1f3ed20 CR3: 000000003707b000 CR4: 00000000000006f0
        Call Trace:
          ? list_del+0xd/0x30
          queue_pages_pte_range+0x4d1/0x590
          __walk_page_range+0x204/0x4e0
          walk_page_range+0x71/0xf0
          queue_pages_range+0x75/0x90
          ? queue_pages_hugetlb+0x190/0x190
          ? new_node_page+0xc0/0xc0
          ? change_prot_numa+0x40/0x40
          migrate_to_node+0x71/0xd0
          do_migrate_pages+0x1c3/0x210
          SyS_migrate_pages+0x261/0x290
          entry_SYSCALL_64_fastpath+0x1a/0xa4
        Code: e8 b0 87 fb ff 0f 0b 48 c7 c6 30 32 9f 81 e8 a2 87 fb ff 0f 0b 48 c7 c6 b8 46 9f 81 e8 94 87 fb ff 0f 0b 85 c0 0f 84 3e fd ff ff <0f> 0b 85 c0 0f 85 a6 00 00 00 48 8b 75 c0 4c 89 f7 41 be f0 ff
        RIP   split_huge_page_to_list+0x496/0x590
      
      I'm not sure of the full scenario of the reproduction, but my debug
      showed that split_huge_pmd_address(freeze=true) returned without running
      main code of pmd splitting because pmd_present(*pmd) in precheck somehow
      returned 0.  If this happens, the subsequent try_to_unmap() fails and
      returns non-zero (because page_mapcount() still > 0), and finally
      VM_BUG_ON() fires.  This patch tries to fix it by prechecking pmd state
      inside ptl.
      
      Link: http://lkml.kernel.org/r/1466990929-7452-1-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33f4751e
  4. 14 7月, 2016 4 次提交
  5. 12 7月, 2016 1 次提交
  6. 11 7月, 2016 2 次提交
    • I
      Revert "perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86" · 44530d58
      Ingo Molnar 提交于
      This reverts commit 2c95afc1.
      
      Stephane reported the following regression:
      
       > Since Andi added:
       >
       > commit 2c95afc1
       > Author: Andi Kleen <ak@linux.intel.com>
       > Date:   Thu Jun 9 06:14:38 2016 -0700
       >
       >    perf/x86/intel, watchdog: Switch NMI watchdog to ref cycles on x86
       >
       > $ perf stat -e ref-cycles ls
       >   <not counted> ....
       >
       > fails systematically because the ref-cycles is now used by the
       > watchdog and given this is a system-wide pinned event, it monopolizes
       > the fixed counter 2 which is the only counter able to measure this event.
      
      Since the next merge window is near, fix the regression for now
      by reverting the commit.
      Reported-by: NStephane Eranian <eranian@google.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      44530d58
    • L
      x86/quirks: Add early quirk to reset Apple AirPort card · abb2bafd
      Lukas Wunner 提交于
      The EFI firmware on Macs contains a full-fledged network stack for
      downloading OS X images from osrecovery.apple.com. Unfortunately
      on Macs introduced 2011 and 2012, EFI brings up the Broadcom 4331
      wireless card on every boot and leaves it enabled even after
      ExitBootServices has been called. The card continues to assert its IRQ
      line, causing spurious interrupts if the IRQ is shared. It also corrupts
      memory by DMAing received packets, allowing for remote code execution
      over the air. This only stops when a driver is loaded for the wireless
      card, which may be never if the driver is not installed or blacklisted.
      
      The issue seems to be constrained to the Broadcom 4331. Chris Milsted
      has verified that the newer Broadcom 4360 built into the MacBookPro11,3
      (2013/2014) does not exhibit this behaviour. The chances that Apple will
      ever supply a firmware fix for the older machines appear to be zero.
      
      The solution is to reset the card on boot by writing to a reset bit in
      its mmio space. This must be done as an early quirk and not as a plain
      vanilla PCI quirk to successfully combat memory corruption by DMAed
      packets: Matthew Garrett found out in 2012 that the packets are written
      to EfiBootServicesData memory (http://mjg59.dreamwidth.org/11235.html).
      This type of memory is made available to the page allocator by
      efi_free_boot_services(). Plain vanilla PCI quirks run much later, in
      subsys initcall level. In-between a time window would be open for memory
      corruption. Random crashes occurring in this time window and attributed
      to DMAed packets have indeed been observed in the wild by Chris
      Bainbridge.
      
      When Matthew Garrett analyzed the memory corruption issue in 2012, he
      sought to fix it with a grub quirk which transitions the card to D3hot:
      http://git.savannah.gnu.org/cgit/grub.git/commit/?id=9d34bb85da56
      
      This approach does not help users with other bootloaders and while it
      may prevent DMAed packets, it does not cure the spurious interrupts
      emanating from the card. Unfortunately the card's mmio space is
      inaccessible in D3hot, so to reset it, we have to undo the effect of
      Matthew's grub patch and transition the card back to D0.
      
      Note that the quirk takes a few shortcuts to reduce the amount of code:
      The size of BAR 0 and the location of the PM capability is identical
      on all affected machines and therefore hardcoded. Only the address of
      BAR 0 differs between models. Also, it is assumed that the BCMA core
      currently mapped is the 802.11 core. The EFI driver seems to always take
      care of this.
      
      Michael Büsch, Bjorn Helgaas and Matt Fleming contributed feedback
      towards finding the best solution to this problem.
      
      The following should be a comprehensive list of affected models:
          iMac13,1        2012  21.5"       [Root Port 00:1c.3 = 8086:1e16]
          iMac13,2        2012  27"         [Root Port 00:1c.3 = 8086:1e16]
          Macmini5,1      2011  i5 2.3 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini5,2      2011  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini5,3      2011  i7 2.0 GHz  [Root Port 00:1c.1 = 8086:1c12]
          Macmini6,1      2012  i5 2.5 GHz  [Root Port 00:1c.1 = 8086:1e12]
          Macmini6,2      2012  i7 2.3 GHz  [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro8,1   2011  13"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro8,2   2011  15"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro8,3   2011  17"         [Root Port 00:1c.1 = 8086:1c12]
          MacBookPro9,1   2012  15"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro9,2   2012  13"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro10,1  2012  15"         [Root Port 00:1c.1 = 8086:1e12]
          MacBookPro10,2  2012  13"         [Root Port 00:1c.1 = 8086:1e12]
      
      For posterity, spurious interrupts caused by the Broadcom 4331 wireless
      card resulted in splats like this (stacktrace omitted):
      
          irq 17: nobody cared (try booting with the "irqpoll" option)
          handlers:
          [<ffffffff81374370>] pcie_isr
          [<ffffffffc0704550>] sdhci_irq [sdhci] threaded [<ffffffffc07013c0>] sdhci_thread_irq [sdhci]
          [<ffffffffc0a0b960>] azx_interrupt [snd_hda_codec]
          Disabling IRQ #17
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79301
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111781
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=728916
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=895951#c16
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1009819
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1098621
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1149632#c5
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1279130
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1332732
      Tested-by: Konstantin Simanov <k.simanov@stlk.ru>        # [MacBookPro8,1]
      Tested-by: Lukas Wunner <lukas@wunner.de>                # [MacBookPro9,1]
      Tested-by: Bryan Paradis <bryan.paradis@gmail.com>       # [MacBookPro9,2]
      Tested-by: Andrew Worsley <amworsley@gmail.com>          # [MacBookPro10,1]
      Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com> # [MacBookPro10,2]
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Acked-by: NRafał Miłecki <zajec5@gmail.com>
      Acked-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris Milsted <cmilsted@redhat.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Michael Buesch <m@bues.ch>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: b43-dev@lists.infradead.org
      Cc: linux-pci@vger.kernel.org
      Cc: linux-wireless@vger.kernel.org
      Cc: stable@vger.kernel.org
      Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Apply nvidia_bugs quirk only on root bus
      Cc: stable@vger.kernel.org # 123456789abc: x86/quirks: Reintroduce scanning of secondary buses
      Link: http://lkml.kernel.org/r/48d0972ac82a53d460e5fce77a07b2560db95203.1465690253.git.lukas@wunner.de
      [ Did minor readability edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      abb2bafd
  7. 10 7月, 2016 1 次提交
    • P
      x86/entry: Avoid interrupt flag save and restore · 2e9d1e15
      Paolo Bonzini 提交于
      Thanks to all the work that was done by Andy Lutomirski and others,
      enter_from_user_mode() and prepare_exit_to_usermode() are now called only with
      interrupts disabled.  Let's provide them a version of user_enter()/user_exit()
      that skips saving and restoring the interrupt flag.
      
      On an AMD-based machine I tested this patch on, with force-enabled
      context tracking, the speed-up in system calls was 90 clock cycles or 6%,
      measured with the following simple benchmark:
      
          #include <sys/signal.h>
          #include <time.h>
          #include <unistd.h>
          #include <stdio.h>
      
          unsigned long rdtsc()
          {
              unsigned long result;
              asm volatile("rdtsc; shl $32, %%rdx; mov %%eax, %%eax\n"
                           "or %%rdx, %%rax" : "=a" (result) : : "rdx");
              return result;
          }
      
          int main()
          {
              unsigned long tsc1, tsc2;
              int pid = getpid();
              int i;
      
              tsc1 = rdtsc();
              for (i = 0; i < 100000000; i++)
                  kill(pid, SIGWINCH);
              tsc2 = rdtsc();
      
              printf("%ld\n", tsc2 - tsc1);
          }
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NAndy Lutomirski <luto@kernel.org>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466434712-31440-2-git-send-email-pbonzini@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2e9d1e15
  8. 08 7月, 2016 3 次提交
    • I
      efi: Reorganize the GUID table to make it easier to read · 7fb2b43c
      Ingo Molnar 提交于
      Re-organize the GUID table so that every GUID takes a single line.
      
      This makes each line super long, but if you have a large enough terminal
      (or zoom out of a small terminal) then you can see the structure at
      a glance - which is more readable than it was the case with the
      multi-line layout.
      Acked-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160627104920.GA9099@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7fb2b43c
    • D
      x86/vdso: Add mremap hook to vm_special_mapping · b059a453
      Dmitry Safonov 提交于
      Add possibility for 32-bit user-space applications to move
      the vDSO mapping.
      
      Previously, when a user-space app called mremap() for the vDSO
      address, in the syscall return path it would land on the previous
      address of the vDSOpage, resulting in segmentation violation.
      
      Now it lands fine and returns to userspace with a remapped vDSO.
      
      This will also fix the context.vdso pointer for 64-bit, which does
      not affect the user of vDSO after mremap() currently, but this
      may change in the future.
      
      As suggested by Andy, return -EINVAL for mremap() that would
      split the vDSO image: that operation cannot possibly result in
      a working system so reject it.
      
      Renamed and moved the text_mapping structure declaration inside
      map_vdso(), as it used only there and now it complements the
      vvar_mapping variable.
      
      There is still a problem for remapping the vDSO in glibc
      applications: the linker relocates addresses for syscalls
      on the vDSO page, so you need to relink with the new
      addresses.
      
      Without that the next syscall through glibc may fail:
      
        Program received signal SIGSEGV, Segmentation fault.
        #0  0xf7fd9b80 in __kernel_vsyscall ()
        #1  0xf7ec8238 in _exit () from /usr/lib32/libc.so.6
      Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: 0x7f454c46@gmail.com
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b059a453
    • B
      printk: Make the printk*once() variants return a value · 069f0cd0
      Borislav Petkov 提交于
      Have printk*once() return a bool which denotes whether the string was
      printed or not so that calling code can react accordingly.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1467671487-10344-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      069f0cd0
  9. 07 7月, 2016 1 次提交
    • D
      locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API · f0662863
      Davidlohr Bueso 提交于
      With the inclusion of atomic FETCH-OP variants, many places in the
      kernel can make use of atomic_fetch_$op() to avoid the callers that
      need to compute the value/state _before_ the operation.
      
      Peter Zijlstra laid out the machinery but we are still missing the
      simpler dec,inc() calls (which future patches will make use of).
      
      This patch only deals with the generic code, as at least right now
      no arch actually implement them -- which is similar to what the
      OP-RETURN primitives currently do.
      Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: James.Bottomley@HansenPartnership.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: awalls@md.metrocast.net
      Cc: bp@alien8.de
      Cc: cw00.choi@samsung.com
      Cc: davem@davemloft.net
      Cc: dledford@redhat.com
      Cc: dougthompson@xmission.com
      Cc: gregkh@linuxfoundation.org
      Cc: hans.verkuil@cisco.com
      Cc: heiko.carstens@de.ibm.com
      Cc: jikos@kernel.org
      Cc: kys@microsoft.com
      Cc: mchehab@osg.samsung.com
      Cc: pfg@sgi.com
      Cc: schwidefsky@de.ibm.com
      Cc: sean.hefty@intel.com
      Cc: sumit.semwal@linaro.org
      Link: http://lkml.kernel.org/r/20160628215651.GA20048@linux-80c1.suseSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f0662863
  10. 04 7月, 2016 1 次提交
  11. 03 7月, 2016 1 次提交
    • L
      iio: st_sensors: harden interrupt handling · 90efe055
      Linus Walleij 提交于
      Leonard Crestez observed the following phenomenon: when using
      hard interrupt triggers (the DRDY line coming out of an ST
      sensor) sometimes a new value would arrive while reading the
      previous value, due to latencies in the system.
      
      We discovered that the ST hardware as far as can be observed
      is designed for level interrupts: the DRDY line will be held
      asserted as long as there are new values coming. The interrupt
      handler should be re-entered until we're out of values to
      handle from the sensor.
      
      If interrupts were handled as occurring on the edges (usually
      low-to-high) new values could appear and the line be held
      asserted after that, and these values would be missed, the
      interrupt handler would also lock up as new data was
      available, but as no new edges occurs on the DRDY signal,
      nothing happens: the edge detector only detects edges.
      
      To counter this, do the following:
      
      - Accept interrupt lines to be flagged as level interrupts
        using IRQF_TRIGGER_HIGH and IRQF_TRIGGER_LOW. If the line
        is marked like this (in the device tree node or ACPI
        table or similar) it will be utilized as a level IRQ.
        We mark the line with IRQF_ONESHOT and mask the IRQ
        while processing a sample, then the top half will be
        entered again if new values are available.
      
      - If we are flagged as using edge interrupts with
        IRQF_TRIGGER_RISING or IRQF_TRIGGER_FALLING: remove
        IRQF_ONESHOT so that the interrupt line is not
        masked while running the thread part of the interrupt.
        This way we will never miss an interrupt, then introduce
        a loop that polls the data ready registers repeatedly
        until no new samples are available, then exit the
        interrupt handler. This way we know no new values are
        available when the interrupt handler exits and
        new (edge) interrupts will be triggered when data arrives.
        Take some extra care to update the timestamp in the poll
        loop if this happens. The timestamp will not be 100%
        perfect, but it will at least be closer to the actual
        events. Usually the extra poll loop will handle the new
        samples, but once in a blue moon, we get a new IRQ
        while exiting the loop, before returning from the
        thread IRQ bottom half with IRQ_HANDLED. On these rare
        occasions, the removal of IRQF_ONESHOT means the
        interrupt will immediately fire again.
      
      - If no interrupt type is indicated from the DT/ACPI,
        choose IRQF_TRIGGER_RISING as default, as this is necessary
        for legacy boards.
      
      Tested successfully on the LIS331DL and L3G4200D by setting
      sampling frequency to 400Hz/800Hz and stressing the system:
      extra reads in the threaded interrupt handler occurs.
      
      Cc: Giuseppe Barba <giuseppe.barba@st.com>
      Cc: Denis Ciocca <denis.ciocca@st.com>
      Tested-by: NCrestez Dan Leonard <cdleonard@gmail.com>
      Reported-by: NCrestez Dan Leonard <cdleonard@gmail.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NJonathan Cameron <jic23@kernel.org>
      90efe055
  12. 02 7月, 2016 3 次提交
  13. 01 7月, 2016 2 次提交
  14. 30 6月, 2016 6 次提交
  15. 29 6月, 2016 3 次提交
  16. 28 6月, 2016 2 次提交
  17. 27 6月, 2016 5 次提交
    • B
      fs: export __block_write_full_page · b4bba389
      Benjamin Marzinski 提交于
      gfs2 needs to be able to skip the check to see if a page is outside of
      the file size when writing it out. gfs2 can get into a situation where
      it needs to flush its in-memory log to disk while a truncate is in
      progress. If the file being trucated has data journaling enabled, it is
      possible that there are data blocks in the log that are past the end of
      the file. gfs can't finish the log flush without either writing these
      blocks out or revoking them. Otherwise, if the node crashed, it could
      overwrite subsequent changes made by other nodes in the cluster when
      it's journal was replayed.
      
      Unfortunately, there is no way to add log entries to the log during a
      flush. So gfs2 simply writes out the page instead. This situation can
      only occur when the truncate code still has the file locked exclusively,
      and hasn't marked this block as free in the metadata (which happens
      later in truc_dealloc).  After gfs2 writes this page out, the truncation
      code will shortly invalidate it and write out any revokes if necessary.
      
      In order to make this work, gfs2 needs to be able to skip the check for
      writes outside the file size. Since the check exists in
      block_write_full_page, this patch exports __block_write_full_page, which
      doesn't have the check.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      b4bba389
    • C
      extcon: Add resource-managed functions to register extcon notifier · 58f38656
      Chanwoo Choi 提交于
      This patch adds the resource-managed functions for register/unregister
      the extcon notifier with the id of each external connector. This function
      will make it easy to handle the extcon notifier.
      
      - int devm_extcon_register_notifier(struct device *dev,
      				struct extcon_dev *edev, unsigned int id,
      				struct notifier_block *nb);
      - void devm_extcon_unregister_notifier(struct device *dev,
      				struct extcon_dev *edev, unsigned int id,
      				struct notifier_block *nb);
      Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com>
      58f38656
    • C
      extcon: Move struct extcon_cable from header file to core · 20f7b53d
      Chanwoo Choi 提交于
      This patch moves the struct extcon_cable because that should
      be only handled by extcon core. There are no reason to publish
      the internal structure.
      Signed-off-by: NChanwoo Choi <cw00.choi@samsung.com>
      20f7b53d
    • A
      x86/efi: Remove the unused efi_get_time() function · b684e9bc
      Arnd Bergmann 提交于
      Nothing calls the efi_get_time() function on x86, but it does suffer
      from the 32-bit time_t overflow in 2038.
      
      This removes the function, we can always put it back in case we need
      it later.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466839230-12781-8-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b684e9bc
    • A
      efi: Convert efi_call_virt() to efi_call_virt_pointer() · 80e75596
      Alex Thorlton 提交于
      This commit makes a few slight modifications to the efi_call_virt() macro
      to get it to work with function pointers that are stored in locations
      other than efi.systab->runtime, and renames the macro to
      efi_call_virt_pointer().  The majority of the changes here are to pull
      these macros up into header files so that they can be accessed from
      outside of drivers/firmware/efi/runtime-wrappers.c.
      
      The most significant change not directly related to the code move is to
      add an extra "p" argument into the appropriate efi_call macros, and use
      that new argument in place of the, formerly hard-coded,
      efi.systab->runtime pointer.
      
      The last piece of the puzzle was to add an efi_call_virt() macro back into
      drivers/firmware/efi/runtime-wrappers.c to wrap around the new
      efi_call_virt_pointer() macro - this was mainly to keep the code from
      looking too cluttered by adding a bunch of extra references to
      efi.systab->runtime everywhere.
      
      Note that I also broke up the code in the efi_call_virt_pointer() macro a
      bit in the process of moving it.
      Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roy Franz <roy.franz@linaro.org>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1466839230-12781-5-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      80e75596