1. 04 10月, 2013 1 次提交
    • E
      inet: consolidate INET_TW_MATCH · 50805466
      Eric Dumazet 提交于
      TCP listener refactoring, part 2 :
      
      We can use a generic lookup, sockets being in whatever state, if
      we are sure all relevant fields are at the same place in all socket
      types (ESTABLISH, TIME_WAIT, SYN_RECV)
      
      This patch removes these macros :
      
       inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair
      
      And adds :
      
       sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr
      
      Then, INET_TW_MATCH() is really the same than INET_MATCH()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50805466
  2. 03 10月, 2013 2 次提交
  3. 01 10月, 2013 8 次提交
  4. 30 9月, 2013 1 次提交
  5. 29 9月, 2013 7 次提交
    • E
      net: introduce SO_MAX_PACING_RATE · 62748f32
      Eric Dumazet 提交于
      As mentioned in commit afe4fd06 ("pkt_sched: fq: Fair Queue packet
      scheduler"), this patch adds a new socket option.
      
      SO_MAX_PACING_RATE offers the application the ability to cap the
      rate computed by transport layer. Value is in bytes per second.
      
      u32 val = 1000000;
      setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
      
      To be effectively paced, a flow must use FQ packet scheduler.
      
      Note that a packet scheduler takes into account the headers for its
      computations. The effective payload rate depends on MSS and retransmits
      if any.
      
      I chose to make this pacing rate a SOL_SOCKET option instead of a
      TCP one because this can be used by other protocols.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Steinar H. Gunderson <sesse@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62748f32
    • F
      ipv4: processing ancillary IP_TOS or IP_TTL · aa661581
      Francesco Fusco 提交于
      If IP_TOS or IP_TTL are specified as ancillary data, then sendmsg() sends out
      packets with the specified TTL or TOS overriding the socket values specified
      with the traditional setsockopt().
      
      The struct inet_cork stores the values of TOS, TTL and priority that are
      passed through the struct ipcm_cookie. If there are user-specified TOS
      (tos != -1) or TTL (ttl != 0) in the struct ipcm_cookie, these values are
      used to override the per-socket values. In case of TOS also the priority
      is changed accordingly.
      
      Two helper functions get_rttos and get_rtconn_flags are defined to take
      into account the presence of a user specified TOS value when computing
      RT_TOS and RT_CONN_FLAGS.
      Signed-off-by: NFrancesco Fusco <ffusco@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa661581
    • F
      ipv4: IP_TOS and IP_TTL can be specified as ancillary data · f02db315
      Francesco Fusco 提交于
      This patch enables the IP_TTL and IP_TOS values passed from userspace to
      be stored in the ipcm_cookie struct. Three fields are added to the struct:
      
      - the TTL, expressed as __u8.
        The allowed values are in the [1-255].
        A value of 0 means that the TTL is not specified.
      
      - the TOS, expressed as __s16.
        The allowed values are in the range [0,255].
        A value of -1 means that the TOS is not specified.
      
      - the priority, expressed as a char and computed when
        handling the ancillary data.
      Signed-off-by: NFrancesco Fusco <ffusco@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f02db315
    • E
      net: net_secret should not depend on TCP · 9a3bab6b
      Eric Dumazet 提交于
      A host might need net_secret[] and never open a single socket.
      
      Problem added in commit aebda156
      ("net: defer net_secret[] initialization")
      
      Based on prior patch from Hannes Frederic Sowa.
      Reported-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@strressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a3bab6b
    • E
      net: Delay default_device_exit_batch until no devices are unregistering v2 · 50624c93
      Eric W. Biederman 提交于
      There is currently serialization network namespaces exiting and
      network devices exiting as the final part of netdev_run_todo does not
      happen under the rtnl_lock.  This is compounded by the fact that the
      only list of devices unregistering in netdev_run_todo is local to the
      netdev_run_todo.
      
      This lack of serialization in extreme cases results in network devices
      unregistering in netdev_run_todo after the loopback device of their
      network namespace has been freed (making dst_ifdown unsafe), and after
      the their network namespace has exited (making the NETDEV_UNREGISTER,
      and NETDEV_UNREGISTER_FINAL callbacks unsafe).
      
      Add the missing serialization by a per network namespace count of how
      many network devices are unregistering and having a wait queue that is
      woken up whenever the count is decreased.  The count and wait queue
      allow default_device_exit_batch to wait until all of the unregistration
      activity for a network namespace has finished before proceeding to
      unregister the loopback device and then allowing the network namespace
      to exit.
      
      Only a single global wait queue is used because there is a single global
      lock, and there is a single waiter, per network namespace wait queues
      would be a waste of resources.
      
      The per network namespace count of unregistering devices gives a
      progress guarantee because the number of network devices unregistering
      in an exiting network namespace must ultimately drop to zero (assuming
      network device unregistration completes).
      
      The basic logic remains the same as in v1.  This patch is now half
      comment and half rtnl_lock_unregistering an expanded version of
      wait_event performs no extra work in the common case where no network
      devices are unregistering when we get to default_device_exit_batch.
      Reported-by: NFrancesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50624c93
    • C
      IPv6 NAT: Do not drop DNATed 6to4/6rd packets · 7df37ff3
      Catalin\(ux\) M. BOIE 提交于
      When a router is doing DNAT for 6to4/6rd packets the latest
      anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
      6to4 and 6rd") will drop them because the IPv6 address embedded does
      not match the IPv4 destination. This patch will allow them to pass by
      testing if we have an address that matches on 6to4/6rd interface.  I
      have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
      Also, log the dropped packets (with rate limit).
      Signed-off-by: NCatalin(ux) M. BOIE <catab@embedromix.ro>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7df37ff3
    • M
      USBNET: fix handling padding packet · 60e453a9
      Ming Lei 提交于
      Commit 638c5115(USBNET: support DMA SG) introduces DMA SG
      if the usb host controller is capable of building packet from
      discontinuous buffers, but missed handling padding packet when
      building DMA SG.
      
      This patch attachs the pre-allocated padding packet at the
      end of the sg list, so padding packet can be sent to device
      if drivers require that.
      Reported-by: NDavid Laight <David.Laight@aculab.com>
      Acked-by: NOliver Neukum <oliver@neukum.org>
      Signed-off-by: NMing Lei <ming.lei@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60e453a9
  6. 28 9月, 2013 2 次提交
  7. 27 9月, 2013 9 次提交
    • J
      [networking]device.h: Remove extern from function prototypes · f629d208
      Joe Perches 提交于
      There are a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
      Signed-off-by: NJoe Perches <joe@perches.com>
      f629d208
    • J
      net.h/skbuff.h: Remove extern from function prototypes · 7965bd4d
      Joe Perches 提交于
      There are a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
      Signed-off-by: NJoe Perches <joe@perches.com>
      7965bd4d
    • J
      netfilter: Remove extern from function prototypes · a0f4ecf3
      Joe Perches 提交于
      There are a mix of function prototypes with and without extern
      in the kernel sources.  Standardize on not using extern for
      function prototypes.
      
      Function prototypes don't need to be written with extern.
      extern is assumed by the compiler.  Its use is as unnecessary as
      using auto to declare automatic/local variables in a block.
      Signed-off-by: NJoe Perches <joe@perches.com>
      a0f4ecf3
    • K
      Drivers: hv: util: Correctly support ws2008R2 and earlier · 3a491605
      K. Y. Srinivasan 提交于
      The current code does not correctly negotiate the version numbers for the util
      driver when hosted on earlier hosts. The version numbers presented by this
      driver were not compatible with the version numbers supported by Windows Server
      2008. Fix this problem.
      
      I would like to thank Olaf Hering (ohering@suse.com) for identifying the problem.
      Reported-by: NOlaf Hering <ohering@suse.com>
      Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a491605
    • V
      net: add a possibility to get private from netdev_adjacent->list · b6ccba4c
      Veaceslav Falico 提交于
      It will be useful to get first/last element.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6ccba4c
    • V
      net: add for_each iterators through neighbour lower link's private · 31088a11
      Veaceslav Falico 提交于
      Add a possibility to iterate through netdev_adjacent's private, currently
      only for lower neighbours.
      
      Add both RCU and RTNL/other locking variants of iterators, and make the
      non-rcu variant to be safe from removal.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31088a11
    • V
      net: add netdev_adjacent->private and allow to use it · 402dae96
      Veaceslav Falico 提交于
      Currently, even though we can access any linked device, we can't attach
      anything to it, which is vital to properly manage them.
      
      To fix this, add a new void *private to netdev_adjacent and functions
      setting/getting it (per link), so that we can save, per example, bonding's
      slave structures there, per slave device.
      
      netdev_master_upper_dev_link_private(dev, upper_dev, private) links dev to
      upper dev and populates the neighbour link only with private.
      
      netdev_lower_dev_get_private{,_rcu}() returns the private, if found.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      402dae96
    • V
      net: add adj_list to save only neighbours · 2f268f12
      Veaceslav Falico 提交于
      Currently, we distinguish neighbours (first-level linked devices) from
      non-neighbours by the neighbour bool in the netdev_adjacent. This could be
      quite time-consuming in case we would like to traverse *only* through
      neighbours - cause we'd have to traverse through all devices and check for
      this flag, and in a (quite common) scenario where we have lots of vlans on
      top of bridge, which is on top of a bond - the bonding would have to go
      through all those vlans to get its upper neighbour linked devices.
      
      This situation is really unpleasant, cause there are already a lot of cases
      when a device with slaves needs to go through them in hot path.
      
      To fix this, introduce a new upper/lower device lists structure -
      adj_list, which contains only the neighbours. It works always in
      pair with the all_adj_list structure (renamed from upper/lower_dev_list),
      i.e. both of them contain the same links, only that all_adj_list contains
      also non-neighbour device links. It's really a small change visible,
      currently, only for __netdev_adjacent_dev_insert/remove(), and doesn't
      change the main linked logic at all.
      
      Also, add some comments a fix a name collision in
      netdev_for_each_upper_dev_rcu() and rework the naming by the following
      rules:
      
      netdev_(all_)(upper|lower)_*
      
      If "all_" is present, then we work with the whole list of upper/lower
      devices, otherwise - only with direct neighbours. Uninline functions - to
      get better stack traces.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Cong Wang <amwang@redhat.com>
      Signed-off-by: NVeaceslav Falico <vfalico@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2f268f12
    • A
      bcma: make bcma_core_pci_{up,down}() callable from atomic context · 2bedea8f
      Arend van Spriel 提交于
      This patch removes the bcma_core_pci_power_save() call from
      the bcma_core_pci_{up,down}() functions as it tries to schedule
      thus requiring to call them from non-atomic context. The function
      bcma_core_pci_power_save() is now exported so the calling module
      can explicitly use it in non-atomic context. This fixes the
      'scheduling while atomic' issue reported by Tod Jackson and
      Joe Perches.
      
      [   13.210710] BUG: scheduling while atomic: dhcpcd/1800/0x00000202
      [   13.210718] Modules linked in: brcmsmac nouveau coretemp kvm_intel kvm cordic brcmutil bcma dell_wmi atl1c ttm mxm_wmi wmi
      [   13.210756] CPU: 2 PID: 1800 Comm: dhcpcd Not tainted 3.11.0-wl #1
      [   13.210762] Hardware name: Alienware M11x R2/M11x R2, BIOS A04 11/23/2010
      [   13.210767]  ffff880177c92c40 ffff880170fd1948 ffffffff8169af5b 0000000000000007
      [   13.210777]  ffff880170fd1ab0 ffff880170fd1958 ffffffff81697ee2 ffff880170fd19d8
      [   13.210785]  ffffffff816a19f5 00000000000f4240 000000000000d080 ffff880170fd1fd8
      [   13.210794] Call Trace:
      [   13.210813]  [<ffffffff8169af5b>] dump_stack+0x4f/0x84
      [   13.210826]  [<ffffffff81697ee2>] __schedule_bug+0x43/0x51
      [   13.210837]  [<ffffffff816a19f5>] __schedule+0x6e5/0x810
      [   13.210845]  [<ffffffff816a1c34>] schedule+0x24/0x70
      [   13.210855]  [<ffffffff816a04fc>] schedule_hrtimeout_range_clock+0x10c/0x150
      [   13.210867]  [<ffffffff810684e0>] ? update_rmtp+0x60/0x60
      [   13.210877]  [<ffffffff8106915f>] ? hrtimer_start_range_ns+0xf/0x20
      [   13.210887]  [<ffffffff816a054e>] schedule_hrtimeout_range+0xe/0x10
      [   13.210897]  [<ffffffff8104f6fb>] usleep_range+0x3b/0x40
      [   13.210910]  [<ffffffffa00371af>] bcma_pcie_mdio_set_phy.isra.3+0x4f/0x80 [bcma]
      [   13.210921]  [<ffffffffa003729f>] bcma_pcie_mdio_write.isra.4+0xbf/0xd0 [bcma]
      [   13.210932]  [<ffffffffa0037498>] bcma_pcie_mdio_writeread.isra.6.constprop.13+0x18/0x30 [bcma]
      [   13.210942]  [<ffffffffa00374ee>] bcma_core_pci_power_save+0x3e/0x80 [bcma]
      [   13.210953]  [<ffffffffa003765d>] bcma_core_pci_up+0x2d/0x60 [bcma]
      [   13.210975]  [<ffffffffa03dc17c>] brcms_c_up+0xfc/0x430 [brcmsmac]
      [   13.210989]  [<ffffffffa03d1a7d>] brcms_up+0x1d/0x20 [brcmsmac]
      [   13.211003]  [<ffffffffa03d2498>] brcms_ops_start+0x298/0x340 [brcmsmac]
      [   13.211020]  [<ffffffff81600a12>] ? cfg80211_netdev_notifier_call+0xd2/0x5f0
      [   13.211030]  [<ffffffff815fa53d>] ? packet_notifier+0xad/0x1d0
      [   13.211064]  [<ffffffff81656e75>] ieee80211_do_open+0x325/0xf80
      [   13.211076]  [<ffffffff8106ac09>] ? __raw_notifier_call_chain+0x9/0x10
      [   13.211086]  [<ffffffff81657b41>] ieee80211_open+0x71/0x80
      [   13.211101]  [<ffffffff81526267>] __dev_open+0x87/0xe0
      [   13.211109]  [<ffffffff8152650c>] __dev_change_flags+0x9c/0x180
      [   13.211117]  [<ffffffff815266a3>] dev_change_flags+0x23/0x70
      [   13.211127]  [<ffffffff8158cd68>] devinet_ioctl+0x5b8/0x6a0
      [   13.211136]  [<ffffffff8158d5c5>] inet_ioctl+0x75/0x90
      [   13.211147]  [<ffffffff8150b38b>] sock_do_ioctl+0x2b/0x70
      [   13.211155]  [<ffffffff8150b681>] sock_ioctl+0x71/0x2a0
      [   13.211169]  [<ffffffff8114ed47>] do_vfs_ioctl+0x87/0x520
      [   13.211180]  [<ffffffff8113f159>] ? ____fput+0x9/0x10
      [   13.211198]  [<ffffffff8106228c>] ? task_work_run+0x9c/0xd0
      [   13.211202]  [<ffffffff8114f271>] SyS_ioctl+0x91/0xb0
      [   13.211208]  [<ffffffff816aa252>] system_call_fastpath+0x16/0x1b
      [   13.211217] NOHZ: local_softirq_pending 202
      
      The issue was introduced in v3.11 kernel by following commit:
      
      commit aa51e598
      Author: Hauke Mehrtens <hauke@hauke-m.de>
      Date:   Sat Aug 24 00:32:31 2013 +0200
      
          brcmsmac: use bcma PCIe up and down functions
      
          replace the calls to bcma_core_pci_extend_L1timer() by calls to the
          newly introduced bcma_core_pci_ip() and bcma_core_pci_down()
      Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
          Cc: Arend van Spriel <arend@broadcom.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      
      This fix has been discussed with Hauke Mehrtens [1] selection
      option 3) and is intended for v3.12.
      
      Ref:
      [1] http://mid.gmane.org/5239B12D.3040206@hauke-m.de
      
      Cc: <stable@vger.kernel.org> # 3.11.x
      Cc: Tod Jackson <tod.jackson@gmail.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Rafal Milecki <zajec5@gmail.com>
      Cc: Hauke Mehrtens <hauke@hauke-m.de>
      Reviewed-by: NHante Meuleman <meuleman@broadcom.com>
      Signed-off-by: NArend van Spriel <arend@broadcom.com>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      2bedea8f
  8. 26 9月, 2013 1 次提交
  9. 25 9月, 2013 5 次提交
    • R
      of: clean-up ifdefs in of_irq.h · b0b8c960
      Rob Herring 提交于
      Much of of_irq.h is needlessly ifdef'ed. Clean this up and minimize the
      amount ifdef'ed code. This fixes some  build warnings when CONFIG_OF
      is not enabled (seen on i386 and x86_64):
      
      include/linux/of_irq.h:82:7: warning: 'struct device_node' declared inside parameter list [enabled by default]
      include/linux/of_irq.h:82:7: warning: its scope is only this definition or declaration, which is probably not what you want [enabled by default]
      include/linux/of_irq.h:87:47: warning: 'struct device_node' declared inside parameter list [enabled by default]
      
      Compile tested on i386, sparc and arm.
      Reported-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Signed-off-by: NRob Herring <rob.herring@calxeda.com>
      b0b8c960
    • A
      revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code" · 0608f43d
      Andrew Morton 提交于
      Revert commit 3b38722e ("memcg, vmscan: integrate soft reclaim
      tighter with zone shrinking code")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0608f43d
    • A
      revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim" · b1aff7fc
      Andrew Morton 提交于
      Revert commit a5b7c87f ("vmscan, memcg: do softlimit reclaim also
      for targeted reclaim")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1aff7fc
    • A
      revert "memcg: enhance memcg iterator to support predicates" · 694fbc0f
      Andrew Morton 提交于
      Revert commit de57780d ("memcg: enhance memcg iterator to support
      predicates")
      
      I merged this prematurely - Michal and Johannes still disagree about the
      overall design direction and the future remains unclear.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      694fbc0f
    • M
      watchdog: update watchdog_thresh properly · 9809b18f
      Michal Hocko 提交于
      watchdog_tresh controls how often nmi perf event counter checks per-cpu
      hrtimer_interrupts counter and blows up if the counter hasn't changed
      since the last check.  The counter is updated by per-cpu
      watchdog_hrtimer hrtimer which is scheduled with 2/5 watchdog_thresh
      period which guarantees that hrtimer is scheduled 2 times per the main
      period.  Both hrtimer and perf event are started together when the
      watchdog is enabled.
      
      So far so good.  But...
      
      But what happens when watchdog_thresh is updated from sysctl handler?
      
      proc_dowatchdog will set a new sampling period and hrtimer callback
      (watchdog_timer_fn) will use the new value in the next round.  The
      problem, however, is that nobody tells the perf event that the sampling
      period has changed so it is ticking with the period configured when it
      has been set up.
      
      This might result in an ear ripping dissonance between perf and hrtimer
      parts if the watchdog_thresh is increased.  And even worse it might lead
      to KABOOM if the watchdog is configured to panic on such a spurious
      lockup.
      
      This patch fixes the issue by updating both nmi perf even counter and
      hrtimers if the threshold value has changed.
      
      The nmi one is disabled and then reinitialized from scratch.  This has
      an unpleasant side effect that the allocation of the new event might
      fail theoretically so the hard lockup detector would be disabled for
      such cpus.  On the other hand such a memory allocation failure is very
      unlikely because the original event is deallocated right before.
      
      It would be much nicer if we just changed perf event period but there
      doesn't seem to be any API to do that right now.  It is also unfortunate
      that perf_event_alloc uses GFP_KERNEL allocation unconditionally so we
      cannot use on_each_cpu() and do the same thing from the per-cpu context.
      The update from the current CPU should be safe because
      perf_event_disable removes the event atomically before it clears the
      per-cpu watchdog_ev so it cannot change anything under running handler
      feet.
      
      The hrtimer is simply restarted (thanks to Don Zickus who has pointed
      this out) if it is queued because we cannot rely it will fire&adopt to
      the new sampling period before a new nmi event triggers (when the
      treshold is decreased).
      
      [akpm@linux-foundation.org: the UP version of __smp_call_function_single ended up in the wrong place]
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NDon Zickus <dzickus@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Fabio Estevam <festevam@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9809b18f
  10. 24 9月, 2013 4 次提交