1. 28 9月, 2016 1 次提交
    • D
      vfs: Add current_time() api · 3cd88666
      Deepa Dinamani 提交于
      current_fs_time() is used for inode timestamps.
      
      Change the signature of the function to take inode pointer
      instead of superblock as per Linus's suggestion.
      
      Also, move the api under vfs as per the discussion on the
      thread: https://lkml.org/lkml/2016/6/9/36 . As per Arnd's
      suggestion on the thread, changing the function name.
      
      current_fs_time() will be deleted after all the references
      to it are replaced by current_time().
      
      There was a bug reported by kbuild test bot with the change
      as some of the calls to current_time() were made before the
      super_block was initialized. Catch these accidental assignments
      as timespec_trunc() does for wrong granularities. This allows
      for the function to work right even in these circumstances.
      But, adds a warning to make the user aware of the bug.
      
      A coccinelle script was used to identify all the current
      .alloc_inode super_block callbacks that updated inode timestamps.
      proc filesystem was the only one that was modifying inode times
      as part of this callback. The series includes a patch to fix that.
      
      Note that timespec_trunc() will also be moved to fs/inode.c
      in a separate patch when this will need to be revamped for
      bounds checking purposes.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3cd88666
  2. 10 9月, 2016 1 次提交
  3. 08 9月, 2016 1 次提交
  4. 07 9月, 2016 1 次提交
    • K
      usercopy: fold builtin_const check into inline function · 81409e9e
      Kees Cook 提交于
      Instead of having each caller of check_object_size() need to remember to
      check for a const size parameter, move the check into check_object_size()
      itself. This actually matches the original implementation in PaX, though
      this commit cleans up the now-redundant builtin_const() calls in the
      various architectures.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      81409e9e
  5. 03 9月, 2016 1 次提交
    • L
      ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro · 3feab13c
      Lorenzo Pieralisi 提交于
      When the ACPI_DECLARE_PROBE_ENTRY macro was added in
      commit e647b532 ("ACPI: Add early device probing infrastructure"),
      a stub macro adding an unused entry was added for the !CONFIG_ACPI
      Kconfig option case to make sure kernel code making use of the
      macro did not require to be guarded within CONFIG_ACPI in order to
      be compiled.
      
      The stub macro was never used since all kernel code that defines
      ACPI_DECLARE_PROBE_ENTRY entries is currently guarded within
      CONFIG_ACPI; it contains a typo that should be nonetheless fixed.
      
      Fix the typo in the stub (ie !CONFIG_ACPI) ACPI_DECLARE_PROBE_ENTRY()
      macro so that it can actually be used if needed.
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Fixes: e647b532 (ACPI: Add early device probing infrastructure)
      Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      3feab13c
  6. 02 9月, 2016 3 次提交
  7. 01 9月, 2016 2 次提交
    • M
      ovl: don't cache acl on overlay layer · 2a3a2a3f
      Miklos Szeredi 提交于
      Some operations (setxattr/chmod) can make the cached acl stale.  We either
      need to clear overlay's acl cache for the affected inode or prevent acl
      caching on the overlay altogether.  Preventing caching has the following
      advantages:
      
       - no double caching, less memory used
      
       - overlay cache doesn't go stale when fs clears it's own cache
      
      Possible disadvantage is performance loss.  If that becomes a problem
      get_acl() can be optimized for overlayfs.
      
      This patch disables caching by pre setting i_*acl to a value that
      
        - has bit 0 set, so is_uncached_acl() will return true
      
        - is not equal to ACL_NOT_CACHED, so get_acl() will not overwrite it
      
      The constant -3 was chosen for this purpose.
      
      Fixes: 39a25b2b ("ovl: define ->get_acl() for overlay inodes")
      Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
      2a3a2a3f
    • M
      mm: introduce get_task_exe_file · cd81a917
      Mateusz Guzik 提交于
      For more convenient access if one has a pointer to the task.
      
      As a minor nit take advantage of the fact that only task lock + rcu are
      needed to safely grab ->exe_file. This saves mm refcount dance.
      
      Use the helper in proc_exe_link.
      Signed-off-by: NMateusz Guzik <mguzik@redhat.com>
      Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NRichard Guy Briggs <rgb@redhat.com>
      Cc: <stable@vger.kernel.org> # 4.3.x
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      cd81a917
  8. 31 8月, 2016 2 次提交
    • A
      Revert "tty/serial/8250: use mctrl_gpio helpers" · 5db4f7f8
      Andy Shevchenko 提交于
      Serial console is broken in v4.8-rcX. Mika and I independently bisected down to
      commit 4ef03d32 ("tty/serial/8250: use mctrl_gpio helpers").
      
      Since neither author nor anyone else didn't propose a solution we better revert
      it for now.
      
      This reverts commit 4ef03d32.
      
      Link: https://lkml.kernel.org/r/20160809130229.GN1729@lahna.fi.intel.comSigned-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
      Tested-by: NHeikki Krogerus <heikki.krogerus@linux.intel.com>
      Tested-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5db4f7f8
    • J
      mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 0d025d27
      Josh Poimboeuf 提交于
      There are three usercopy warnings which are currently being silenced for
      gcc 4.6 and newer:
      
      1) "copy_from_user() buffer size is too small" compile warning/error
      
         This is a static warning which happens when object size and copy size
         are both const, and copy size > object size.  I didn't see any false
         positives for this one.  So the function warning attribute seems to
         be working fine here.
      
         Note this scenario is always a bug and so I think it should be
         changed to *always* be an error, regardless of
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
      
      2) "copy_from_user() buffer size is not provably correct" compile warning
      
         This is another static warning which happens when I enable
         __compiletime_object_size() for new compilers (and
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
         is const, but copy size is *not*.  In this case there's no way to
         compare the two at build time, so it gives the warning.  (Note the
         warning is a byproduct of the fact that gcc has no way of knowing
         whether the overflow function will be called, so the call isn't dead
         code and the warning attribute is activated.)
      
         So this warning seems to only indicate "this is an unusual pattern,
         maybe you should check it out" rather than "this is a bug".
      
         I get 102(!) of these warnings with allyesconfig and the
         __compiletime_object_size() gcc check removed.  I don't know if there
         are any real bugs hiding in there, but from looking at a small
         sample, I didn't see any.  According to Kees, it does sometimes find
         real bugs.  But the false positive rate seems high.
      
      3) "Buffer overflow detected" runtime warning
      
         This is a runtime warning where object size is const, and copy size >
         object size.
      
      All three warnings (both static and runtime) were completely disabled
      for gcc 4.6 with the following commit:
      
        2fb0815c ("gcc4: disable __compiletime_object_size for GCC 4.6+")
      
      That commit mistakenly assumed that the false positives were caused by a
      gcc bug in __compiletime_object_size().  But in fact,
      __compiletime_object_size() seems to be working fine.  The false
      positives were instead triggered by #2 above.  (Though I don't have an
      explanation for why the warnings supposedly only started showing up in
      gcc 4.6.)
      
      So remove warning #2 to get rid of all the false positives, and re-enable
      warnings #1 and #3 by reverting the above commit.
      
      Furthermore, since #1 is a real bug which is detected at compile time,
      upgrade it to always be an error.
      
      Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
      needed.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0d025d27
  9. 29 8月, 2016 2 次提交
    • R
      net: smc91x: fix SMC accesses · 2fb04fdf
      Russell King 提交于
      Commit b70661c7 ("net: smc91x: use run-time configuration on all ARM
      machines") broke some ARM platforms through several mistakes.  Firstly,
      the access size must correspond to the following rule:
      
      (a) at least one of 16-bit or 8-bit access size must be supported
      (b) 32-bit accesses are optional, and may be enabled in addition to
          the above.
      
      Secondly, it provides no emulation of 16-bit accesses, instead blindly
      making 16-bit accesses even when the platform specifies that only 8-bit
      is supported.
      
      Reorganise smc91x.h so we can make use of the existing 16-bit access
      emulation already provided - if 16-bit accesses are supported, use
      16-bit accesses directly, otherwise if 8-bit accesses are supported,
      use the provided 16-bit access emulation.  If neither, BUG().  This
      exactly reflects the driver behaviour prior to the commit being fixed.
      
      Since the conversion incorrectly cut down the available access sizes on
      several platforms, we also need to go through every platform and fix up
      the overly-restrictive access size: Arnd assumed that if a platform can
      perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
      size needed to be specified - not so, all available access sizes must
      be specified.
      
      This likely fixes some performance regressions in doing this: if a
      platform does not support 8-bit accesses, 8-bit accesses have been
      emulated by performing a 16-bit read-modify-write access.
      
      Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
      accesses, which was broken by the original commit.
      
      Fixes: b70661c7 ("net: smc91x: use run-time configuration on all ARM machines")
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Tested-by: NRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2fb04fdf
    • C
      iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystems · 17de0a9f
      Christoph Hellwig 提交于
      Filesystems like XFS that use extents should not set the
      FIEMAP_EXTENT_MERGED flag in the fiemap extent structures.  To allow
      for both behaviors for the upcoming gfs2 usage split the iomap
      type field into type and flags, and only set FIEMAP_EXTENT_MERGED if
      the IOMAP_F_MERGED flag is set.  The flags field will also come in
      handy for future features such as shared extents on reflink-enabled
      file systems.
      Reported-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      17de0a9f
  10. 27 8月, 2016 2 次提交
  11. 24 8月, 2016 1 次提交
    • T
      drm/tegra: dsi: Enhance runtime power management · 87904c3e
      Thierry Reding 提交于
      The MIPI DSI output on Tegra SoCs requires some external logic to
      calibrate the MIPI pads before a video signal can be transmitted. This
      MIPI calibration logic requires to be powered on while the MIPI pads are
      being used, which is currently done as part of the DSI driver's probe
      implementation.
      
      This is suboptimal because it will leave the MIPI calibration logic
      powered up even if the DSI output is never used.
      
      On Tegra114 and earlier this behaviour also causes the driver to hang
      while trying to power up the MIPI calibration logic because the power
      partition that contains the MIPI calibration logic will be powered on
      by the display controller at output pipeline configuration time. Thus
      the power up sequence for the MIPI calibration logic happens before
      it's power partition is guaranteed to be enabled.
      
      Fix this by splitting up the API into a request/free pair of functions
      that manage the runtime dependency between the DSI and the calibration
      modules (no registers are accessed) and a set of enable, calibrate and
      disable functions that program the MIPI calibration logic at points in
      time where the power partition is really enabled.
      
      While at it, make sure that the runtime power management also works in
      ganged mode, which is currently also broken.
      Reported-by: NJonathan Hunter <jonathanh@nvidia.com>
      Tested-by: NJonathan Hunter <jonathanh@nvidia.com>
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      87904c3e
  12. 22 8月, 2016 1 次提交
  13. 21 8月, 2016 1 次提交
  14. 19 8月, 2016 1 次提交
  15. 18 8月, 2016 2 次提交
  16. 17 8月, 2016 1 次提交
    • C
      PCI: Use positive flags in pci_alloc_irq_vectors() · 4fe0d154
      Christoph Hellwig 提交于
      Instead of passing negative flags like PCI_IRQ_NOMSI to prevent use of
      certain interrupt types, pass positive flags like PCI_IRQ_LEGACY,
      PCI_IRQ_MSI, etc., to specify the acceptable interrupt types.
      
      This is based on a number of pending driver conversions that just happend
      to be a whole more obvious to read this way, and given that we have no
      users in the tree yet it can still easily be done.
      
      I've also added a PCI_IRQ_ALL_TYPES catchall to keep the case of accepting
      all interrupt types very simple.
      
      [bhelgaas: changelog, fix PCI_IRQ_AFFINITY doc typo, remove mention of
      PCI_IRQ_NOLEGACY]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NAlexander Gordeev <agordeev@redhat.com>
      4fe0d154
  17. 16 8月, 2016 1 次提交
  18. 15 8月, 2016 1 次提交
  19. 14 8月, 2016 1 次提交
    • S
      net: remove type_check from dev_get_nest_level() · 952fcfd0
      Sabrina Dubroca 提交于
      The idea for type_check in dev_get_nest_level() was to count the number
      of nested devices of the same type (currently, only macvlan or vlan
      devices).
      This prevented the false positive lockdep warning on configurations such
      as:
      
      eth0 <--- macvlan0 <--- vlan0 <--- macvlan1
      
      However, this doesn't prevent a warning on a configuration such as:
      
      eth0 <--- macvlan0 <--- vlan0
      eth1 <--- vlan1 <--- macvlan1
      
      In this case, all the locks end up with a nesting subclass of 1, so
      lockdep thinks that there is still a deadlock:
      
      - in the first case we have (macvlan_netdev_addr_lock_key, 1) and then
        take (vlan_netdev_xmit_lock_key, 1)
      - in the second case, we have (vlan_netdev_xmit_lock_key, 1) and then
        take (macvlan_netdev_addr_lock_key, 1)
      
      By removing the linktype check in dev_get_nest_level() and always
      incrementing the nesting depth, lockdep considers this configuration
      valid.
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      952fcfd0
  20. 12 8月, 2016 3 次提交
  21. 11 8月, 2016 1 次提交
  22. 10 8月, 2016 2 次提交
    • D
      perf/core: Set cgroup in CPU contexts for new cgroup events · db4a8356
      David Carrillo-Cisneros 提交于
      There's a perf stat bug easy to observer on a machine with only one cgroup:
      
        $ perf stat -e cycles -I 1000 -C 0 -G /
        #          time             counts unit events
            1.000161699      <not counted>      cycles                    /
            2.000355591      <not counted>      cycles                    /
            3.000565154      <not counted>      cycles                    /
            4.000951350      <not counted>      cycles                    /
      
      We'd expect some output there.
      
      The underlying problem is that there is an optimization in
      perf_cgroup_sched_{in,out}() that skips the switch of cgroup events
      if the old and new cgroups in a task switch are the same.
      
      This optimization interacts with the current code in two ways
      that cause a CPU context's cgroup (cpuctx->cgrp) to be NULL even if a
      cgroup event matches the current task. These are:
      
        1. On creation of the first cgroup event in a CPU: In current code,
        cpuctx->cpu is only set in perf_cgroup_sched_in, but due to the
        aforesaid optimization, perf_cgroup_sched_in will run until the next
        cgroup switches in that CPU. This may happen late or never happen,
        depending on system's number of cgroups, CPU load, etc.
      
        2. On deletion of the last cgroup event in a cpuctx: In list_del_event,
        cpuctx->cgrp is set NULL. Any new cgroup event will not be sched in
        because cpuctx->cgrp == NULL until a cgroup switch occurs and
        perf_cgroup_sched_in is executed (updating cpuctx->cgrp).
      
      This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
      mirroring what list_del_event does when removing a cgroup event from CPU
      context, as introduced in:
      
        commit 68cacd29 ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")
      
      With this patch, cpuctx->cgrp is always set/clear when installing/removing
      the first/last cgroup event in/from the CPU context. With cpuctx->cgrp
      correctly set, event_filter_match works as intended when events are
      sched in/out.
      
      After the fix, the output is as expected:
      
        $ perf stat -e cycles -I 1000 -a -G /
        #         time             counts unit events
           1.004699159          627342882      cycles                    /
           2.007397156          615272690      cycles                    /
           3.010019057          616726074      cycles                    /
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470124092-113192-1-git-send-email-davidcc@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      db4a8356
    • L
      Revert "printk: create pr_<level> functions" · a0cba217
      Linus Torvalds 提交于
      This reverts commit 874f9c7d.
      
      Geert Uytterhoeven reports:
       "This change seems to have an (unintendent?) side-effect.
      
        Before, pr_*() calls without a trailing newline characters would be
        printed with a newline character appended, both on the console and in
        the output of the dmesg command.
      
        After this commit, no new line character is appended, and the output
        of the next pr_*() call of the same type may be appended, like in:
      
          - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000
          - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)
          + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)"
      
      Joe Perches says:
       "No, that is not intentional.
      
        The newline handling code inside vprintk_emit is a bit involved and
        for now I suggest a revert until this has all the same behavior as
        earlier"
      Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Requested-by: NJoe Perches <joe@perches.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0cba217
  23. 09 8月, 2016 6 次提交
    • A
      KVM: arm64: ITS: return 1 on successful MSI injection · fd837b08
      Andre Przywara 提交于
      According to the KVM API documentation a successful MSI injection
      should return a value > 0 on success.
      Return possible errors in vgic_its_trigger_msi() and report a
      successful injection back to userland, while also reporting the
      case where the MSI could not be delivered due to the guest not
      having the LPI mapped, for instance.
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      fd837b08
    • M
      genirq/msi: Make sure PCI MSIs are activated early · f3b0946d
      Marc Zyngier 提交于
      Bharat Kumar Gogada reported issues with the generic MSI code, where the
      end-point ended up with garbage in its MSI configuration (both for the vector
      and the message).
      
      It turns out that the two MSI paths in the kernel are doing slightly different
      things:
      
      generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP
      PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI
      
      And it turns out that end-points are allowed to latch the content of the MSI
      configuration registers as soon as MSIs are enabled.  In Bharat's case, the
      end-point ends up using whatever was there already, which is not what you
      want.
      
      In order to make things converge, we introduce a new MSI domain flag
      (MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set,
      this flag forces the programming of the end-point as soon as the MSIs are
      allocated.
      
      A consequence of this is that we have an extra activate in irq_startup, but
      that should be without much consequence.
      
      tglx: 
      
       - Several people reported a VMWare regression with PCI/MSI-X passthrough. It
         turns out that the patch also cures that issue.
      
       - We need to have a look at the MSI disable interrupt path, where we write
         the msg to all zeros without disabling MSI in the PCI device. Is that
         correct?
      
      Fixes: 52f518a3 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts"
      Reported-and-tested-by: NBharat Kumar Gogada <bharat.kumar.gogada@xilinx.com>
      Reported-and-tested-by: NFoster Snowhill <forst@forstwoof.ru>
      Reported-by: NMatthias Prager <linux@matthiasprager.de>
      Reported-by: NJason Taylor <jason.taylor@simplivity.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: linux-pci@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1468426713-31431-1-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      f3b0946d
    • S
      qed: Add dcbx app support for IEEE Selection Field. · 59bcb797
      Sudarsana Reddy Kalluru 提交于
      MFW now supports the Selection field for IEEE mode. Add driver changes to
      use the newer MFW masks to read/write the port-id value.
      Signed-off-by: NSudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
      Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59bcb797
    • D
      bpf: fix checksum fixups on bpf_skb_store_bytes · 479ffccc
      Daniel Borkmann 提交于
      bpf_skb_store_bytes() invocations above L2 header need BPF_F_RECOMPUTE_CSUM
      flag for updates, so that CHECKSUM_COMPLETE will be fixed up along the way.
      Where we ran into an issue with bpf_skb_store_bytes() is when we did a
      single-byte update on the IPv6 hoplimit despite using BPF_F_RECOMPUTE_CSUM
      flag; simple ping via ICMPv6 triggered a hw csum failure as a result. The
      underlying issue has been tracked down to a buffer alignment issue.
      
      Meaning, that csum_partial() computations via skb_postpull_rcsum() and
      skb_postpush_rcsum() pair invoked had a wrong result since they operated on
      an odd address for the hoplimit, while other computations were done on an
      even address. This mix doesn't work as-is with skb_postpull_rcsum(),
      skb_postpush_rcsum() pair as it always expects at least half-word alignment
      of input buffers, which is normally the case. Thus, instead of these helpers
      using csum_sub() and (implicitly) csum_add(), we need to use csum_block_sub(),
      csum_block_add(), respectively. For unaligned offsets, they rotate the sum
      to align it to a half-word boundary again, otherwise they work the same as
      csum_sub() and csum_add().
      
      Adding __skb_postpull_rcsum(), __skb_postpush_rcsum() variants that take the
      offset as an input and adapting bpf_skb_store_bytes() to them fixes the hw
      csum failures again. The skb_postpull_rcsum(), skb_postpush_rcsum() helpers
      use a 0 constant for offset so that the compiler optimizes the offset & 1
      test away and generates the same code as with csum_sub()/_add().
      
      Fixes: 608cd71a ("tc: bpf: generalize pedit action")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      479ffccc
    • L
      unsafe_[get|put]_user: change interface to use a error target label · 1bd4403d
      Linus Torvalds 提交于
      When I initially added the unsafe_[get|put]_user() helpers in commit
      5b24a7a2 ("Add 'unsafe' user access functions for batched
      accesses"), I made the mistake of modeling the interface on our
      traditional __[get|put]_user() functions, which return zero on success,
      or -EFAULT on failure.
      
      That interface is fairly easy to use, but it's actually fairly nasty for
      good code generation, since it essentially forces the caller to check
      the error value for each access.
      
      In particular, since the error handling is already internally
      implemented with an exception handler, and we already use "asm goto" for
      various other things, we could fairly easily make the error cases just
      jump directly to an error label instead, and avoid the need for explicit
      checking after each operation.
      
      So switch the interface to pass in an error label, rather than checking
      the error value in the caller.  Best do it now before we start growing
      more users (the signal handling code in particular would be a good place
      to use the new interface).
      
      So rather than
      
      	if (unsafe_get_user(x, ptr))
      		... handle error ..
      
      the interface is now
      
      	unsafe_get_user(x, ptr, label);
      
      where an error during the user mode fetch will now just cause a jump to
      'label' in the caller.
      
      Right now the actual _implementation_ of this all still ends up being a
      "if (err) goto label", and does not take advantage of any exception
      label tricks, but for "unsafe_put_user()" in particular it should be
      fairly straightforward to convert to using the exception table model.
      
      Note that "unsafe_get_user()" is much harder to convert to a clever
      exception table model, because current versions of gcc do not allow the
      use of "asm goto" (for the exception) with output values (for the actual
      value to be fetched).  But that is hopefully not a limitation in the
      long term.
      
      [ Also note that it might be a good idea to switch unsafe_get_user() to
        actually _return_ the value it fetches from user space, but this
        commit only changes the error handling semantics ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1bd4403d
    • P
      sctp: Export struct sctp_info to userspace · dca3f53c
      Phil Sutter 提交于
      This is required to correctly interpret INET_DIAG_INFO messages exported
      by sctp_diag module.
      Signed-off-by: NPhil Sutter <phil@nwl.cc>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dca3f53c
  24. 08 8月, 2016 2 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
    • J
      block/mm: make bdev_ops->rw_page() take a bool for read/write · c11f0c0b
      Jens Axboe 提交于
      Commit abf54548 changed it from an 'rw' flags type to the
      newer ops based interface, but now we're effectively leaking
      some bdev internals to the rest of the kernel. Since we only
      care about whether it's a read or a write at that level, just
      pass in a bool 'is_write' parameter instead.
      
      Then we can also move op_is_write() and friends back under
      CONFIG_BLOCK protection.
      Reviewed-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      c11f0c0b