1. 26 10月, 2017 1 次提交
    • B
      PCI: Document reset method return values · 832c418a
      Bjorn Helgaas 提交于
      The pci_reset_function() path may try several different reset methods:
      device-specific resets, PCIe Function Level Resets, PCI Advanced Features
      Function Level Reset, etc.
      
      Add a comment about what the return values from these methods mean.  If one
      of the methods fails, in some cases we want to continue and try the next
      one in the list, but sometimes we want to stop trying.
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      832c418a
  2. 25 10月, 2017 1 次提交
    • A
      PCI: Detach driver before procfs & sysfs teardown on device remove · 16b6c8bb
      Alex Williamson 提交于
      When removing a device, for example a VF being removed due to SR-IOV
      teardown, a "soft" hot-unplug via 'echo 1 > remove' in sysfs, or an actual
      hot-unplug, we first remove the procfs and sysfs attributes for the device
      before attempting to release the device from any driver bound to it.
      Unbinding the driver from the device can take time.  The device might need
      to write out data or it might be actively in use.  If it's in use by
      userspace through a vfio driver, the unbind might block until the user
      releases the device.  This leads to a potentially non-trivial amount of
      time where the device exists, but we've torn down the interfaces that
      userspace uses to examine devices, for instance lspci might generate this
      sort of error:
      
        pcilib: Cannot open /sys/bus/pci/devices/0000:01:0a.3/config
        lspci: Unable to read the standard configuration space header of device 0000:01:0a.3
      
      We don't seem to have any dependence on this teardown ordering in the
      kernel, so let's unbind the driver first, which is also more symmetric with
      the instantiation of the device in pci_bus_add_device().
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      16b6c8bb
  3. 20 10月, 2017 1 次提交
  4. 19 10月, 2017 1 次提交
  5. 11 10月, 2017 3 次提交
    • T
      PCI: Restore ARI Capable Hierarchy before setting numVFs · ff26449e
      Tony Nguyen 提交于
      In the restore path, we previously read PCI_SRIOV_VF_OFFSET and
      PCI_SRIOV_VF_STRIDE before restoring PCI_SRIOV_CTRL_ARI:
      
        pci_restore_state
          pci_restore_iov_state
            sriov_restore_state
              pci_iov_set_numvfs
                pci_read_config_word(... PCI_SRIOV_VF_OFFSET, &iov->offset)
                pci_read_config_word(... PCI_SRIOV_VF_STRIDE, &iov->stride)
              pci_write_config_word(... PCI_SRIOV_CTRL, iov->ctrl)
      
      But per SR-IOV r1.1, sec 3.3.3.5, the device can use PCI_SRIOV_CTRL_ARI to
      determine PCI_SRIOV_VF_OFFSET and PCI_SRIOV_VF_STRIDE.  Therefore, this
      path, which is used for suspend/resume and AER recovery, can corrupt
      iov->offset and iov->stride.
      
      Since the iov state is associated with the device, not the driver, if we
      reload the driver, it will use the the corrupted data, which may cause
      crashes like this:
      
        kernel BUG at drivers/pci/iov.c:157!
        RIP: 0010:pci_iov_add_virtfn+0x2eb/0x350
        Call Trace:
         pci_enable_sriov+0x353/0x440
         ixgbe_pci_sriov_configure+0xd5/0x1f0 [ixgbe]
         sriov_numvfs_store+0xf7/0x170
         dev_attr_store+0x18/0x30
         sysfs_kf_write+0x37/0x40
         kernfs_fop_write+0x120/0x1b0
         vfs_write+0xb5/0x1a0
         SyS_write+0x55/0xc0
      
      Restore PCI_SRIOV_CTRL_ARI before calling pci_iov_set_numvfs(), then
      restore the rest of PCI_SRIOV_CTRL (which may set PCI_SRIOV_CTRL_VFE)
      afterwards.
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      [bhelgaas: changelog, add comment, also clear ARI if necessary]
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      CC: Emil Tantilov <emil.s.tantilov@intel.com>
      ff26449e
    • S
      PCI: Create SR-IOV virtfn/physfn links before attaching driver · 27d61629
      Stuart Hayes 提交于
      When creating virtual functions, create the "virtfn%u" and "physfn" links
      in sysfs *before* attaching the driver instead of after.  When we attach
      the driver to the new virtual network interface first, there is a race when
      the driver attaches to the new sends out an "add" udev event, and the
      network interface naming software (biosdevname or systemd, for example)
      tries to look at these links.
      Signed-off-by: NStuart Hayes <stuart.w.hayes@gmail.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      27d61629
    • F
      PCI: Expose SR-IOV offset, stride, and VF device ID via sysfs · 7dfca152
      Filippo Sironi 提交于
      Expose the SR-IOV device offset, stride, and VF device ID via sysfs to make
      it easier for userspace applications to consume them.
      Signed-off-by: NFilippo Sironi <sironi@amazon.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      7dfca152
  6. 06 10月, 2017 4 次提交
  7. 02 10月, 2017 9 次提交
    • L
      Linux 4.14-rc3 · 9e66317d
      Linus Torvalds 提交于
      9e66317d
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 368f8998
      Linus Torvalds 提交于
      Pull x86 fixes from Thomas Gleixner:
       "This contains the following fixes and improvements:
      
         - Avoid dereferencing an unprotected VMA pointer in the fault signal
           generation code
      
         - Fix inline asm call constraints for GCC 4.4
      
         - Use existing register variable to retrieve the stack pointer
           instead of forcing the compiler to create another indirect access
           which results in excessive extra 'mov %rsp, %<dst>' instructions
      
         - Disable branch profiling for the memory encryption code to prevent
           an early boot crash
      
         - Fix a sparse warning caused by casting the __user annotation in
           __get_user_asm_u64() away
      
         - Fix an off by one error in the loop termination of the error patch
           in the x86 sysfs init code
      
         - Add missing CPU IDs to various Intel specific drivers to enable the
           functionality on recent hardware
      
         - More (init) constification in the numachip code"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm: Use register variable to get stack pointer value
        x86/mm: Disable branch profiling in mem_encrypt.c
        x86/asm: Fix inline asm call constraints for GCC 4.4
        perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
        perf/x86/intel/rapl: Add missing CPU IDs
        perf/x86/msr: Add missing CPU IDs
        perf/x86/intel/cstate: Add missing CPU IDs
        x86: Don't cast away the __user in __get_user_asm_u64()
        x86/sysfs: Fix off-by-one error in loop termination
        x86/mm: Fix fault error path using unsafe vma pointer
        x86/numachip: Add const and __initconst to numachip2_clockevent
      368f8998
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c42ed9f9
      Linus Torvalds 提交于
      Pull timer fixes from Thomas Gleixner:
       "This adds a new timer wheel function which is required for the
        conversion of the timer callback function from the 'unsigned long
        data' argument to 'struct timer_list *timer'. This conversion has two
        benefits:
      
         1) It makes struct timer_list smaller
      
         2) Many callers hand in a pointer to the timer or to the structure
            containing the timer, which happens via type casting both at setup
            and in the callback. This change gets rid of the typecasts.
      
        Once the conversion is complete, which is planned for 4.15, the old
        setup function and the intermediate typecast in the new setup function
        go away along with the data field in struct timer_list.
      
        Merging this now into mainline allows a smooth queueing of the actual
        conversion in the affected maintainer trees without creating
        dependencies"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        um/time: Fixup namespace collision
        timer: Prepare to change timer callback argument type
      c42ed9f9
    • L
      Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 82513545
      Linus Torvalds 提交于
      Pull smp/hotplug fixes from Thomas Gleixner:
       "This addresses the fallout of the new lockdep mechanism which covers
        completions in the CPU hotplug code.
      
        The lockdep splats are false positives, but there is no way to
        annotate that reliably. The solution is to split the completions for
        CPU up and down, which requires some reshuffling of the failure
        rollback handling as well"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        smp/hotplug: Hotplug state fail injection
        smp/hotplug: Differentiate the AP completion between up and down
        smp/hotplug: Differentiate the AP-work lockdep class between up and down
        smp/hotplug: Callback vs state-machine consistency
        smp/hotplug: Rewrite AP state machine core
        smp/hotplug: Allow external multi-instance rollback
        smp/hotplug: Add state diagram
      82513545
    • L
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7e103ace
      Linus Torvalds 提交于
      Pull scheduler fixes from Thomas Gleixner:
       "The scheduler pull request comes with the following updates:
      
         - Prevent a divide by zero issue by validating the input value of
           sysctl_sched_time_avg
      
         - Make task state printing consistent all over the place and have
           explicit state characters for IDLE and PARKED so they wont be
           displayed as 'D' state which confuses tools"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/sysctl: Check user input value of sysctl_sched_time_avg
        sched/debug: Add explicit TASK_PARKED printing
        sched/debug: Ignore TASK_IDLE for SysRq-W
        sched/debug: Add explicit TASK_IDLE printing
        sched/tracing: Use common task-state helpers
        sched/tracing: Fix trace_sched_switch task-state printing
        sched/debug: Remove unused variable
        sched/debug: Convert TASK_state to hex
        sched/debug: Implement consistent task-state printing
      7e103ace
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1c6f705b
      Linus Torvalds 提交于
      Pull perf fixes from Thomas Gleixner:
      
       - Prevent a division by zero in the perf aux buffer handling
      
       - Sync kernel headers with perf tool headers
      
       - Fix a build failure in the syscalltbl code
      
       - Make the debug messages of perf report --call-graph work correctly
      
       - Make sure that all required perf files are in the MANIFEST for
         container builds
      
       - Fix the atrr.exclude kernel handling so it respects the
         perf_event_paranoid and the user permissions
      
       - Make perf test on s390x work correctly
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/aux: Only update ->aux_wakeup in non-overwrite mode
        perf test: Fix vmlinux failure on s390x part 2
        perf test: Fix vmlinux failure on s390x
        perf tools: Fix syscalltbl build failure
        perf report: Fix debug messages with --call-graph option
        perf evsel: Fix attr.exclude_kernel setting for default cycles:p
        tools include: Sync kernel ABI headers with tooling headers
        perf tools: Get all of tools/{arch,include}/ in the MANIFEST
      1c6f705b
    • L
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1de47f3c
      Linus Torvalds 提交于
      Pull  locking fixes from Thomas Gleixner:
       "Two fixes for locking:
      
         - Plug a hole the pi_stat->owner serialization which was changed
           recently and failed to fixup two usage sites.
      
         - Prevent reordering of the rwsem_has_spinner() check vs the
           decrement of rwsem count in up_write() which causes a missed
           wakeup"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rwsem-xadd: Fix missed wakeup due to reordering of load
        futex: Fix pi_state->owner serialization
      1de47f3c
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3d9d62b9
      Linus Torvalds 提交于
      Pull irq fixes from Thomas Gleixner:
      
       - Add a missing NULL pointer check in free_irq()
      
       - Fix a memory leak/memory corruption in the generic irq chip
      
       - Add missing rcu annotations for radix tree access
      
       - Use ffs instead of fls when extracting data from a chip register in
         the MIPS GIC irq driver
      
       - Fix the unmasking of IPI interrupts in the MIPS GIC driver so they
         end up at the target CPU and not at CPU0
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irq/generic-chip: Don't replace domain's name
        irqdomain: Add __rcu annotations to radix tree accessors
        irqchip/mips-gic: Use effective affinity to unmask
        irqchip/mips-gic: Fix shifts to extract register fields
        genirq: Check __free_irq() return value for NULL
      3d9d62b9
    • L
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 156069f8
      Linus Torvalds 提交于
      Pull objtool fixes from Thomas Gleixner:
       "Two small fixes for objtool:
      
         - Support frame pointer setup via 'lea (%rsp), %rbp' which was not
           yet supported and caused build warnings
      
         - Disable unreacahble warnings for GCC4.4 and older to avoid false
           positives caused by the compiler itself"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Support unoptimized frame pointer setup
        objtool: Skip unreachable warnings for GCC 4.4 and older
      156069f8
  8. 01 10月, 2017 2 次提交
  9. 30 9月, 2017 16 次提交
  10. 29 9月, 2017 2 次提交
    • W
      arm64: fault: Route pte translation faults via do_translation_fault · 760bfb47
      Will Deacon 提交于
      We currently route pte translation faults via do_page_fault, which elides
      the address check against TASK_SIZE before invoking the mm fault handling
      code. However, this can cause issues with the path walking code in
      conjunction with our word-at-a-time implementation because
      load_unaligned_zeropad can end up faulting in kernel space if it reads
      across a page boundary and runs into a page fault (e.g. by attempting to
      read from a guard region).
      
      In the case of such a fault, load_unaligned_zeropad has registered a
      fixup to shift the valid data and pad with zeroes, however the abort is
      reported as a level 3 translation fault and we dispatch it straight to
      do_page_fault, despite it being a kernel address. This results in calling
      a sleeping function from atomic context:
      
        BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
        in_atomic(): 0, irqs_disabled(): 0, pid: 10290
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        [...]
        [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
        [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c
        [<ffffff8e016977f0>] do_page_fault+0x140/0x330
        [<ffffff8e01681328>] do_mem_abort+0x54/0xb0
        Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
        [...]
        [<ffffff8e016844fc>] el1_da+0x18/0x78
        [<ffffff8e017f399c>] path_parentat+0x44/0x88
        [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
        [<ffffff8e017f5044>] filename_create+0x4c/0x128
        [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
        [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28
        Code: 36380080 d5384100 f9400800 9402566d (d4210000)
        ---[ end trace 2d01889f2bca9b9f ]---
      
      Fix this by dispatching all translation faults to do_translation_faults,
      which avoids invoking the page fault logic for faults on kernel addresses.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NAnkit Jain <ankijain@codeaurora.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      760bfb47
    • W
      arm64: mm: Use READ_ONCE when dereferencing pointer to pte table · f069faba
      Will Deacon 提交于
      On kernels built with support for transparent huge pages, different CPUs
      can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
      and they must take care to use READ_ONCE to avoid value tearing or caching
      of stale values by the compiler. Unfortunately, these functions call into
      our pgtable macros, which don't use READ_ONCE, and compiler caching has
      been observed to cause the following crash during ext4 writeback:
      
      PC is at check_pte+0x20/0x170
      LR is at page_vma_mapped_walk+0x2e0/0x540
      [...]
      Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
      Call trace:
      [<ffff000008233328>] check_pte+0x20/0x170
      [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
      [<ffff000008234adc>] page_mkclean_one+0xac/0x278
      [<ffff000008234d98>] rmap_walk_file+0xf0/0x238
      [<ffff000008236e74>] rmap_walk+0x64/0xa0
      [<ffff0000082370c8>] page_mkclean+0x90/0xa8
      [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
      [<ffff00000832f984>] mpage_submit_page+0x34/0x98
      [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
      [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
      [<ffff00000833530c>] ext4_writepages+0x484/0xe30
      [<ffff0000081f6ab4>] do_writepages+0x44/0xe8
      [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
      [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
      [<ffff000008324310>] ext4_sync_file+0x80/0x4b8
      [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
      [<ffff0000082332b4>] SyS_msync+0x194/0x1e8
      
      This is because page_vma_mapped_walk loads the PMD twice before calling
      pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
      due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
      (where it sees a valid table pointer due to a concurrent pmd_populate).
      However, the compiler inlines everything and caches the first value in
      a register, which is subsequently used in pte_offset_phys which returns
      a junk pointer that is later dereferenced when attempting to access the
      relevant pte.
      
      This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
      that a stale value is not used. Whilst this is a point fix for a known
      failure (and simple to backport), a full fix moving all of our page table
      accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
      page_vma_mapped_walk is in the works for a future kernel release.
      
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Timur Tabi <timur@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Fixes: f27176cf ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
      Tested-by: NRichard Ruigrok <rruigrok@codeaurora.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      f069faba