1. 21 7月, 2014 2 次提交
  2. 16 7月, 2014 3 次提交
    • W
      net-timestamp: document deprecated syststamp · 26c4fdb0
      Willem de Bruijn 提交于
      The SO_TIMESTAMPING API defines option SOF_TIMESTAMPING_SYS_HW.
      This feature is deprecated. It should not be implemented by new
      device drivers. Existing drivers do not implement it, either --
      with one exception.
      
      Driver developers are encouraged to expose the NIC hw clock as a
      PTP HW clock source, instead, and synchronize system time to the
      HW source.
      
      The control flag cannot be removed due to being part of the ABI, nor
      can the structure scm_timestamping that is returned. Due to the one
      legacy driver, the internal datapath and structure are not removed.
      
      This patch only clearly marks the interface as deprecated. Device
      drivers should always return a syststamp value of zero.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      
      ----
      
      We can consider adding a WARN_ON_ONCE in__sock_recv_timestamp
      if non-zero syststamp is encountered
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26c4fdb0
    • T
      net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen 提交于
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      Signed-off-by: NTom Gundersen <teg@jklm.no>
      Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c835a677
    • T
      net: add name_assign_type netdev attribute · 685343fc
      Tom Gundersen 提交于
      Based on a patch by David Herrmann.
      
      The name_assign_type attribute gives hints where the interface name of a
      given net-device comes from. These values are currently defined:
        NET_NAME_ENUM:
          The ifname is provided by the kernel with an enumerated
          suffix, typically based on order of discovery. Names may
          be reused and unpredictable.
        NET_NAME_PREDICTABLE:
          The ifname has been assigned by the kernel in a predictable way
          that is guaranteed to avoid reuse and always be the same for a
          given device. Examples include statically created devices like
          the loopback device and names deduced from hardware properties
          (including being given explicitly by the firmware). Names
          depending on the order of discovery, or in any other way on the
          existence of other devices, must not be marked as PREDICTABLE.
        NET_NAME_USER:
          The ifname was provided by user-space during net-device setup.
        NET_NAME_RENAMED:
          The net-device has been renamed from userspace. Once this type is set,
          it cannot change again.
        NET_NAME_UNKNOWN:
          This is an internal placeholder to indicate that we yet haven't yet
          categorized the name. It will not be exposed to userspace, rather
          -EINVAL is returned.
      
      The aim of these patches is to improve user-space renaming of interfaces. As
      a general rule, userspace must rename interfaces to guarantee that names stay
      the same every time a given piece of hardware appears (at boot, or when
      attaching it). However, there are several situations where userspace should
      not perform the renaming, and that depends on both the policy of the local
      admin, but crucially also on the nature of the current interface name.
      
      If an interface was created in repsonse to a userspace request, and userspace
      already provided a name, we most probably want to leave that name alone. The
      main instance of this is wifi-P2P devices created over nl80211, which currently
      have a long-standing bug where they are getting renamed by udev. We label such
      names NET_NAME_USER.
      
      If an interface, unbeknown to us, has already been renamed from userspace, we
      most probably want to leave also that alone. This will typically happen when
      third-party plugins (for instance to udev, but the interface is generic so could
      be from anywhere) renames the interface without informing udev about it. A
      typical situation is when you switch root from an installer or an initrd to the
      real system and the new instance of udev does not know what happened before
      the switch. These types of problems have caused repeated issues in the past. To
      solve this, once an interface has been renamed, its name is labelled
      NET_NAME_RENAMED.
      
      In many cases, the kernel is actually able to name interfaces in such a
      way that there is no need for userspace to rename them. This is the case when
      the enumeration order of devices, or in fact any other (non-parent) device on
      the system, can not influence the name of the interface. Examples include
      statically created devices, or any naming schemes based on hardware properties
      of the interface. In this case the admin may prefer to use the kernel-provided
      names, and to make that possible we label such names NET_NAME_PREDICTABLE.
      We want the kernel to have tho possibilty of performing predictable interface
      naming itself (and exposing to userspace that it has), as the information
      necessary for a proper naming scheme for a certain class of devices may not
      be exposed to userspace.
      
      The case where renaming is almost certainly desired, is when the kernel has
      given the interface a name using global device enumeration based on order of
      discovery (ethX, wlanY, etc). These naming schemes are labelled NET_NAME_ENUM.
      
      Lastly, a fallback is left as NET_NAME_UNKNOWN, to indicate that a driver has
      not yet been ported. This is mostly useful as a transitionary measure, allowing
      us to label the various naming schemes bit by bit.
      
      v8: minor documentation fixes
      v9: move comment to the right commit
      Signed-off-by: NTom Gundersen <teg@jklm.no>
      Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Reviewed-by: NKay Sievers <kay@vrfy.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      685343fc
  3. 14 7月, 2014 1 次提交
  4. 11 7月, 2014 1 次提交
  5. 09 7月, 2014 4 次提交
  6. 08 7月, 2014 6 次提交
    • R
      net: arcnet: Remove "#define bool int" · db55b62c
      Rasmus Villemoes 提交于
      The header file include/linux/arcdevice.h #defines bool to int, if
      bool is not already #defined. However, the files which use that header
      file seem to rely on that #define (unconditionally) being in effect:
      the prototypes for the functions arcrimi_reset, com20020_reset,
      com90io_reset, com90xx_reset (whose addresses are assigned to the
      hw.reset member of struct arcnet_local) use int explicitly.
      
      Moreover, that #define is an accident waiting to happen (scenario:
      inclusion of arcdevice.h followed by inclusion of some header which
      declares function prototypes using bool). Also, #include
      <linux/types.h> must appear before #include <linux/arcdevice.h> (the
      compiler wouldn't like "typedef _Bool int").
      
      Since none of the files using arcdevice.h declare variables of type
      "bool", the patch is actually quite simple, unlike the commit message.
      Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db55b62c
    • T
      net: Only do flow_dissector hash computation once per packet · a3b18ddb
      Tom Herbert 提交于
      Add sw_hash flag to skbuff to indicate that skb->hash was computed
      from flow_dissector. This flag is checked in skb_get_hash to avoid
      repeatedly trying to compute the hash (ie. in the case that no L4 hash
      can be computed).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3b18ddb
    • T
      ipv6: Implement automatic flow label generation on transmit · cb1ce2ef
      Tom Herbert 提交于
      Automatically generate flow labels for IPv6 packets on transmit.
      The flow label is computed based on skb_get_hash. The flow label will
      only automatically be set when it is zero otherwise (i.e. flow label
      manager hasn't set one). This supports the transmit side functionality
      of RFC 6438.
      
      Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
      system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
      functionality per socket.
      
      By default, auto flowlabels are disabled to avoid possible conflicts
      with flow label manager, however if this feature proves useful we
      may want to enable it by default.
      
      It should also be noted that FreeBSD has already implemented automatic
      flow labels (including the sysctl and socket option). In FreeBSD,
      automatic flow labels default to enabled.
      
      Performance impact:
      
      Running super_netperf with 200 flows for TCP_RR and UDP_RR for
      IPv6. Note that in UDP case, __skb_get_hash will be called for
      every packet with explains slight regression. In the TCP case
      the hash is saved in the socket so there is no regression.
      
      Automatic flow labels disabled:
      
        TCP_RR:
          86.53% CPU utilization
          127/195/322 90/95/99% latencies
          1.40498e+06 tps
      
        UDP_RR:
          90.70% CPU utilization
          118/168/243 90/95/99% latencies
          1.50309e+06 tps
      
      Automatic flow labels enabled:
      
        TCP_RR:
          85.90% CPU utilization
          128/199/337 90/95/99% latencies
          1.40051e+06
      
        UDP_RR
          92.61% CPU utilization
          115/164/236 90/95/99% latencies
          1.4687e+06
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb1ce2ef
    • T
      net: Call skb_get_hash in get_xps_queue and __skb_tx_hash · 0e001614
      Tom Herbert 提交于
      Call standard function to get a packet hash instead of taking this from
      skb->sk->sk_hash or only using skb->protocol.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0e001614
    • S
      ptp: Classify ptp over ip over vlan packets · ae5c6c6d
      Stefan Sørensen 提交于
      This extends the ptp bpf to also match ptp over ip over vlan packets. The ptp
      classes are changed to orthogonal bitfields representing version, transport
      and vlan values to simplify matching.
      Signed-off-by: NStefan Sørensen <stefan.sorensen@spectralink.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae5c6c6d
    • R
      bcma: add driver for PCIe Gen 2 core · f473832f
      Rafał Miłecki 提交于
      New Broadcom PCIe devices (802.11ac ones?) use Gen2 and have to be
      initialized differently.
      Signed-off-by: NRafał Miłecki <zajec5@gmail.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      f473832f
  7. 04 7月, 2014 1 次提交
    • T
      ptrace,x86: force IRET path after a ptrace_stop() · b9cd18de
      Tejun Heo 提交于
      The 'sysret' fastpath does not correctly restore even all regular
      registers, much less any segment registers or reflags values.  That is
      very much part of why it's faster than 'iret'.
      
      Normally that isn't a problem, because the normal ptrace() interface
      catches the process using the signal handler infrastructure, which
      always returns with an iret.
      
      However, some paths can get caught using ptrace_event() instead of the
      signal path, and for those we need to make sure that we aren't going to
      return to user space using 'sysret'.  Otherwise the modifications that
      may have been done to the register set by the tracer wouldn't
      necessarily take effect.
      
      Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
      arch_ptrace_stop_needed() which is invoked from ptrace_stop().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9cd18de
  8. 03 7月, 2014 2 次提交
    • A
      net/mlx4_en: Don't use irq_affinity_notifier to track changes in IRQ affinity map · 35f6f453
      Amir Vadai 提交于
      IRQ affinity notifier can only have a single notifier - cpu_rmap
      notifier. Can't use it to track changes in IRQ affinity map.
      Detect IRQ affinity changes by comparing CPU to current IRQ affinity map
      during NAPI poll thread.
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Ben Hutchings <ben@decadent.org.uk>
      Fixes: 2eacc23c ("net/mlx4_core: Enforce irq affinity changes immediatly")
      Signed-off-by: NAmir Vadai <amirv@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35f6f453
    • T
      kernfs: kernfs_notify() must be useable from non-sleepable contexts · ecca47ce
      Tejun Heo 提交于
      d911d987 ("kernfs: make kernfs_notify() trigger inotify events
      too") added fsnotify triggering to kernfs_notify() which requires a
      sleepable context.  There are already existing users of
      kernfs_notify() which invoke it from an atomic context and in general
      it's silly to require a sleepable context for triggering a
      notification.
      
      The following is an invalid context bug triggerd by md invoking
      sysfs_notify() from IO completion path.
      
       BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       2 locks held by swapper/1/0:
        #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
        #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
       irq event stamp: 33518
       hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
       hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
       softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
       softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
        0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
        0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
       Call Trace:
        <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
        [<ffffffff810d4f14>] __might_sleep+0x184/0x240
        [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
        [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
        [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
        [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
        [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
        [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
        [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
        [<ffffffff813b46c4>] blk_update_request+0x94/0x420
        [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
        [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
        [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
        [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
        [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
        [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
        [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
        [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
        [<ffffffff81119436>] handle_edge_irq+0x66/0x130
        [<ffffffff8101c3e4>] handle_irq+0x84/0x150
        [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
        [<ffffffff818122f2>] common_interrupt+0x72/0x72
        <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
        [<ffffffff81025454>] default_idle+0x24/0x230
        [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
        [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
        [<ffffffff8104df1b>] start_secondary+0x25b/0x300
      
      This patch fixes it by punting the notification delivery through a
      work item.  This ends up adding an extra pointer to kernfs_elem_attr
      enlarging kernfs_node by a pointer, which is not ideal but not a very
      big deal either.  If this turns out to be an actual issue, we can move
      kernfs_elem_attr->size to kernfs_node->iattr later.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJosh Boyer <jwboyer@fedoraproject.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecca47ce
  9. 02 7月, 2014 5 次提交
    • D
      net: fix circular dependency in of_mdio code · d9daa247
      Daniel Mack 提交于
      Commit 86f6cf41 (net: of_mdio: add of_mdiobus_link_phydev()) introduced a
      circular dependency between libphy and of_mdio.
      
      depmod: ERROR: <modroot>/kernel/drivers/net/phy/libphy.ko in
      dependency cycle!
      depmod: ERROR: <modroot>/kernel/drivers/of/of_mdio.ko in dependency cycle!
      
      The problem is that of_mdio.c references &mdio_bus_type and libphy now
      references of_mdiobus_link_phydev.
      
      Fix this by not exporting of_mdiobus_link_phydev() from of_mdio.ko.
      Make it a static function in mdio_bus.c instead.
      Signed-off-by: NDaniel Mack <zonque@gmail.com>
      Reported-by: NJeff Mahoney <jeffm@suse.com>
      Fixes: 86f6cf41 (net: of_mdio: add of_mdiobus_link_phydev())
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9daa247
    • E
      inet: move ipv6only in sock_common · 9fe516ba
      Eric Dumazet 提交于
      When an UDP application switches from AF_INET to AF_INET6 sockets, we
      have a small performance degradation for IPv4 communications because of
      extra cache line misses to access ipv6only information.
      
      This can also be noticed for TCP listeners, as ipv6_only_sock() is also
      used from __inet_lookup_listener()->compute_score()
      
      This is magnified when SO_REUSEPORT is used.
      
      Move ipv6only into struct sock_common so that it is available at
      no extra cost in lookups.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fe516ba
    • G
      sched: Fix compiler warnings · b6220ad6
      Guenter Roeck 提交于
      Commit 143e1e28 (sched: Rework sched_domain topology definition)
      introduced a number of functions with a return value of 'const int'.
      gcc doesn't know what to do with that and, if the kernel is compiled
      with W=1, complains with the following warnings whenever sched.h
      is included.
      
        include/linux/sched.h:875:25: warning: type qualifiers ignored on function return type
        include/linux/sched.h:882:25: warning: type qualifiers ignored on function return type
        include/linux/sched.h:889:25: warning: type qualifiers ignored on function return type
        include/linux/sched.h:1002:21: warning: type qualifiers ignored on function return type
      
      Commits fb2aa855 (sched, ARM: Create a dedicated scheduler topology table)
      and 607b45e9 (sched, powerpc: Create a dedicated topology table) introduce
      the same warning in the arm and powerpc code.
      
      Drop 'const' from the function declarations to fix the problem.
      
      The fix for all three patches has to be applied together to avoid
      compilation failures for the affected architectures.
      Acked-by: NVincent Guittot <vincent.guittot@linaro.org>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1403658329-13196-1-git-send-email-linux@roeck-us.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b6220ad6
    • Z
      core: fix typo in percpu read_mostly section · 330d2822
      Zhengyu He 提交于
      This fixes a typo that named the read_mostly section of percpu as
      readmostly. It works fine with SMP because the linker script specifies
      .data..percpu..readmostly. However, UP kernel builds don't have percpu
      sections defined and the non-percpu version of the section is called
      data..read_mostly, so .data..readmostly will float around and may break
      things unexpectedly.
      
      Looking at the original change that introduced data..percpu..readmostly
      (commit c957ef2c), it looks like this
      was the original intention.
      
      Tested: Built UP kernel and confirmed the sections got merged.
      
      - Before the patch:
      $ objdump -h vmlinux.o  | grep '\.data\.\.read.*mostly'
      38 .data..read_mostly 00004418  0000000000000000  0000000000000000  00431ac0  2**6
      50 .data..readmostly 00000014  0000000000000000  0000000000000000  00444000  2**3
      
      - After the patch:
      $ objdump -h vmlinux.o  | grep '\.data\.\.read.*mostly'
      38 .data..read_mostly 00004438  0000000000000000  0000000000000000  00431ac0  2**6
      Signed-off-by: NZhengyu He <hzy@google.com>
      Signed-off-by: NFilipe Brandenburger <filbranden@google.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      330d2822
    • B
      ipv6: Allow accepting RA from local IP addresses. · d9333196
      Ben Greear 提交于
      This can be used in virtual networking applications, and
      may have other uses as well.  The option is disabled by
      default.
      
      A specific use case is setting up virtual routers, bridges, and
      hosts on a single OS without the use of network namespaces or
      virtual machines.  With proper use of ip rules, routing tables,
      veth interface pairs and/or other virtual interfaces,
      and applications that can bind to interfaces and/or IP addresses,
      it is possibly to create one or more virtual routers with multiple
      hosts attached.  The host interfaces can act as IPv6 systems,
      with radvd running on the ports in the virtual routers.  With the
      option provided in this patch enabled, those hosts can now properly
      obtain IPv6 addresses from the radvd.
      Signed-off-by: NBen Greear <greearb@candelatech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9333196
  10. 01 7月, 2014 1 次提交
  11. 30 6月, 2014 1 次提交
  12. 28 6月, 2014 3 次提交
  13. 27 6月, 2014 1 次提交
    • A
      Fix 32-bit regression in block device read(2) · 0b86dbf6
      Al Viro 提交于
      blkdev_read_iter() wants to cap the iov_iter by the amount of data
      remaining to the end of device.  That's what iov_iter_truncate() is for
      (trim iter->count if it's above the given limit).  So far, so good, but
      the argument of iov_iter_truncate() is size_t, so on 32bit boxen (in
      case of a large device) we end up with that upper limit truncated down
      to 32 bits *before* comparing it with iter->count.
      
      Easily fixed by making iov_iter_truncate() take 64bit argument - it does
      the right thing after such change (we only reach the assignment in there
      when the current value of iter->count is greater than the limit, i.e.
      for anything that would get truncated we don't reach the assignment at
      all) and that argument is not the new value of iter->count - it's an
      upper limit for such.
      
      The overhead of passing u64 is not an issue - the thing is inlined, so
      callers passing size_t won't pay any penalty.
      Reported-and-tested-by: NTheodore Tso <tytso@mit.edu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Tested-by: NAlan Cox <gnomes@lxorguk.ukuu.org.uk>
      Tested-by: NBruno Wolff III <bruno@wolff.to>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0b86dbf6
  14. 26 6月, 2014 3 次提交
  15. 25 6月, 2014 2 次提交
  16. 24 6月, 2014 3 次提交
    • A
      kernel/watchdog.c: print traces for all cpus on lockup detection · ed235875
      Aaron Tomlin 提交于
      A 'softlockup' is defined as a bug that causes the kernel to loop in
      kernel mode for more than a predefined period to time, without giving
      other tasks a chance to run.
      
      Currently, upon detection of this condition by the per-cpu watchdog
      task, debug information (including a stack trace) is sent to the system
      log.
      
      On some occasions, we have observed that the "victim" rather than the
      actual "culprit" (i.e.  the owner/holder of the contended resource) is
      reported to the user.  Often this information has proven to be
      insufficient to assist debugging efforts.
      
      To avoid loss of useful debug information, for architectures which
      support NMI, this patch makes it possible to improve soft lockup
      reporting.  This is accomplished by issuing an NMI to each cpu to obtain
      a stack trace.
      
      If NMI is not supported we just revert back to the old method.  A sysctl
      and boot-time parameter is available to toggle this feature.
      
      [dzickus@redhat.com: add CONFIG_SMP in certain areas]
      [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
      [mq@suse.cz: fix warning]
      Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NJan Moskyto Matejka <mq@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed235875
    • A
      nmi: provide the option to issue an NMI back trace to every cpu but current · f3aca3d0
      Aaron Tomlin 提交于
      Sometimes it is preferred not to use the trigger_all_cpu_backtrace()
      routine when one wants to avoid capturing a back trace for current.  For
      instance if one was previously captured recently.
      
      This patch provides a new routine namely
      trigger_allbutself_cpu_backtrace() which offers the flexibility to issue
      an NMI to every cpu but current and capture a back trace accordingly.
      
      Patch x86 and sparc to support new routine.
      
      [dzickus@redhat.com: add stub in #else clause]
      [dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion]
      [sfr@canb.auug.org.au: undo C99ism]
      Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f3aca3d0
    • P
      kexec: save PG_head_mask in VMCOREINFO · b3acc56b
      Petr Tesarik 提交于
      To allow filtering of huge pages, makedumpfile must be able to identify
      them in the dump.  This can be done by checking the appropriate page
      flag, so communicate its value to makedumpfile through the VMCOREINFO
      interface.
      
      There's only one small catch.  Depending on how many page flags are
      available on a given architecture, this bit can be called PG_head or
      PG_compound.
      
      I sent a similar patch back in 2012, but Eric Biederman did not like
      using an #ifdef.  So, this time I'm adding a common symbol
      (PG_head_mask) instead.
      
      See https://lkml.org/lkml/2012/11/28/91 for the previous version.
      Signed-off-by: NPetr Tesarik <ptesarik@suse.cz>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3acc56b
  17. 23 6月, 2014 1 次提交