1. 08 2月, 2015 2 次提交
    • S
      x86/tlb/trace: Do not trace on CPU that is offline · 6c8465a8
      Steven Rostedt (Red Hat) 提交于
      When taking a CPU down for suspend and resume, a tracepoint may be called
      when the CPU has been designated offline. As tracepoints require RCU for
      protection, they must not be called if the current CPU is offline.
      
      Unfortunately, trace_tlb_flush() is called in this scenario as was noted
      by LOCKDEP:
      
      ...
      
       Disabling non-boot CPUs ...
       intel_pstate CPU 1 exiting
      
       ===============================
       smpboot: CPU 1 didn't die...
       [ INFO: suspicious RCU usage. ]
       3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
       -------------------------------
       include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       RCU used illegally from offline CPU!
       rcu_scheduler_active = 1, debug_locks = 0
       no locks held by swapper/1/0.
      
       stack backtrace:
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
       Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
        0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
        ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
        0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
       Call Trace:
        [<ffffffff817e370d>] dump_stack+0x4c/0x65
        [<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
        [<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
        [<ffffffff81054c4e>] play_dead_common+0xe/0x50
        [<ffffffff81054ca5>] native_play_dead+0x15/0x140
        [<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
        [<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
        [<ffffffff81053e20>] start_secondary+0x140/0x150
       intel_pstate CPU 2 exiting
      
      ...
      
      By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
      condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
      code when the CPU is offline.
      
      Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
      
      Cc: stable@vger.kernel.org # 3.17+
      Fixes: d17d8f9d "x86/mm: Add tracepoints for TLB flushes"
      Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Suggested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NDave Hansen <dave@sr71.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      6c8465a8
    • S
      tracing: Add condition check to RCU lockdep checks · a05d59a5
      Steven Rostedt (Red Hat) 提交于
      The trace_tlb_flush() tracepoint can be called when a CPU is going offline.
      When a CPU is offline, RCU is no longer watching that CPU and since the
      tracepoint is protected by RCU, it must not be called. To prevent the
      tlb_flush tracepoint from being called when the CPU is offline, it was
      converted to a TRACE_EVENT_CONDITION where the condition checks if the
      CPU is online before calling the tracepoint.
      
      Unfortunately, this was not enough to stop lockdep from complaining about
      it. Even though the RCU protected code of the tracepoint will never be
      called, the condition is hidden within the tracepoint, and even though the
      condition prevents RCU code from being called, the lockdep checks are
      outside the tracepoint (this is to test tracepoints even when they are not
      enabled).
      
      Even though tracepoints should be checked to be RCU safe when they are not
      enabled, the condition should still be considered when checking RCU.
      
      Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com
      
      Fixes: 3a630178 "tracing: generate RCU warnings even when tracepoints are disabled"
      Cc: stable@vger.kernel.org # 3.18+
      Acked-by: NDave Hansen <dave@sr71.net>
      Reported-by: NSedat Dilek <sedat.dilek@gmail.com>
      Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a05d59a5
  2. 06 2月, 2015 1 次提交
  3. 05 2月, 2015 9 次提交
    • L
      ACPICA: Events: Introduce ACPI_GPE_DISPATCH_RAW_HANDLER to fix 2 issues for the current GPE APIs · 0d0988af
      Lv Zheng 提交于
      ACPICA commit 199cad16530a45aea2bec98e528866e20c5927e1
      
      Since whether the GPE should be disabled/enabled/cleared should only be
      determined by the GPE driver's state machine:
      1. GPE should be disabled if the driver wants to switch to the GPE polling
         mode when a GPE storm condition is indicated and should be enabled if
         the driver wants to switch back to the GPE interrupt mode when all of
         the storm conditions are cleared. The conditions should be protected by
         the driver's specific lock.
      2. GPE should be enabled if the driver has accepted more than one request
         and should be disabled if the driver has completed all of the requests.
         The request count should be protected by the driver's specific lock.
      3. GPE should be cleared either when the driver is about to handle an edge
         triggered GPE or when the driver has completed to handle a level
         triggered GPE. The handling code should be protected by the driver's
         specific lock.
      Thus the GPE enabling/disabling/clearing operations are likely to be
      performed with the driver's specific lock held while we currently cannot do
      this. This is because:
      1. We have the acpi_gbl_gpe_lock held before invoking the GPE driver's
         handler. Driver's specific lock is likely to be held inside of the
         handler, thus we can see some dead lock issues due to the reversed
         locking order or recursive locking. In order to solve such dead lock
         issues, we need to unlock the acpi_gbl_gpe_lock before invoking the
         handler. BZ 1100.
      2. Since GPE disabling/enabling/clearing should be determined by the GPE
         driver's state machine, we shouldn't perform such operations inside of
         ACPICA for a GPE handler to mess up the driver's state machine. BZ 1101.
      
      Originally this patch includes a logic to flush GPE handlers, it is dropped
      due to the following reasons:
      1. This is a different issue;
      2. Linux OSL has fixed this by flushing SCI in acpi_os_wait_events_complete().
      We will pick up this topic when the Linux OSL fix turns out to be not
      sufficient.
      
      Note that currently the internal operations and the acpi_gbl_gpe_lock are
      also used by ACPI_GPE_DISPATCH_METHOD and ACPI_GPE_DISPATCH_NOTIFY. In
      order not to introduce regressions, we add one
      ACPI_GPE_DISPATCH_RAW_HANDLER type to be distiguished from
      ACPI_GPE_DISPATCH_HANDLER. For which the acpi_gbl_gpe_lock is unlocked before
      invoking the GPE handler and the internal enabling/disabling operations are
      bypassed to allow drivers to perform them at a proper position using the
      GPE APIs and ACPI_GPE_DISPATCH_RAW_HANDLER users should invoke acpi_set_gpe()
      instead of acpi_enable_gpe()/acpi_disable_gpe() to bypass the internal GPE
      clearing code in acpi_enable_gpe(). Lv Zheng.
      
      Known issues:
      1. Edge-triggered GPE lost for frequent enablings
         On some buggy silicon platforms, GPE enable line may not be directly
         wired to the GPE trigger line. In that case, when GPE enabling is
         frequently performed for edge-triggered GPEs, GPE status may stay set
         without being triggered.
         This patch may maginify this problem as it allows GPE enabling to be
         parallel performed during the process the GPEs are handled.
         This is an existing issue, because:
         1. For task context:
            Current ACPI_GPE_DISPATCH_METHOD practices have proven that this
            isn't a real issue - we can re-enable edge-triggered GPE in a work
            queue where the GPE status bit might already be set.
         2. For IRQ context:
            This can even happen when the GPE enabling occurs before returning
            from the GPE handler and after unlocking the GPE lock.
         Thus currently no code is included to protect this.
      
      Link: https://github.com/acpica/acpica/commit/199cad16Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Signed-off-by: NBob Moore <robert.moore@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      0d0988af
    • D
      ACPICA: Update version to 20150204 · 121b7d91
      David E. Box 提交于
      ACPICA commit e06b1624b02dc8317d144e9a6fe9d684c5fa2f90
      
      Version 20150204.
      
      Link: https://github.com/acpica/acpica/commit/e06b1624Signed-off-by: NDavid E. Box <david.e.box@linux.intel.com>
      Signed-off-by: NBob Moore <robert.moore@intel.com>
      Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      121b7d91
    • D
      ACPICA: Update Copyright headers to 2015 · 82a80941
      David E. Box 提交于
      ACPICA commit 8990e73ab2aa15d6a0068b860ab54feff25bee36
      
      Link: https://github.com/acpica/acpica/commit/8990e73aSigned-off-by: NDavid E. Box <david.e.box@linux.intel.com>
      Signed-off-by: NBob Moore <robert.moore@intel.com>
      Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      82a80941
    • L
      ACPICA: Events: Cleanup GPE dispatcher type obtaining code · 7c43312a
      Lv Zheng 提交于
      ACPICA commit 7926d5ca9452c87f866938dcea8f12e1efb58f89
      
      There is an issue in acpi_install_gpe_handler() and acpi_remove_gpe_handler().
      The code to obtain the GPE dispatcher type from the Handler->original_flags
      is wrong:
          if (((Handler->original_flags & ACPI_GPE_DISPATCH_METHOD) ||
               (Handler->original_flags & ACPI_GPE_DISPATCH_NOTIFY)) &&
      ACPI_GPE_DISPATCH_NOTIFY is 0x03 and ACPI_GPE_DISPATCH_METHOD is 0x02, thus
      this statement is TRUE for the following dispatcher types:
          0x01 (ACPI_GPE_DISPATCH_HANDLER): not expected
          0x02 (ACPI_GPE_DISPATCH_METHOD): expected
          0x03 (ACPI_GPE_DISPATCH_NOTIFY): expected
      
      There is no functional issue due to this because Handler->original_flags is
      only set in acpi_install_gpe_handler(), and an earlier checker has excluded
      the ACPI_GPE_DISPATCH_HANDLER:
          if ((gpe_event_info->Flags & ACPI_GPE_DISPATCH_MASK) ==
                  ACPI_GPE_DISPATCH_HANDLER)
          {
              Status = AE_ALREADY_EXISTS;
              goto free_and_exit;
          }
          ...
          Handler->original_flags = (u8) (gpe_event_info->Flags &
              (ACPI_GPE_XRUPT_TYPE_MASK | ACPI_GPE_DISPATCH_MASK));
      
      We need to clean this up before modifying the GPE dispatcher type values.
      
      In order to prevent such issue from happening in the future, this patch
      introduces ACPI_GPE_DISPATCH_TYPE() macro to be used to obtain the GPE
      dispatcher types. Lv Zheng.
      
      Link: https://github.com/acpica/acpica/commit/7926d5caSigned-off-by: NLv Zheng <lv.zheng@intel.com>
      Signed-off-by: NDavid E. Box <david.e.box@linux.intel.com>
      Signed-off-by: NBob Moore <robert.moore@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      7c43312a
    • Y
      ACPI: Add interfaces to parse IOAPIC ID for IOAPIC hotplug · ecf5636d
      Yinghai Lu 提交于
      We need to parse APIC ID for IOAPIC registration for IOAPIC hotplug.
      ACPI _MAT method and MADT table are used to figure out IOAPIC ID, just
      like parsing CPU APIC ID for CPU hotplug.
      
      [ tglx: Fixed docbook comment ]
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Len Brown <lenb@kernel.org>
      Link: http://lkml.kernel.org/r/1414387308-27148-8-git-send-email-jiang.liu@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      ecf5636d
    • J
      PCI: Use common resource list management code instead of private implementation · 14d76b68
      Jiang Liu 提交于
      Use common resource list management data structure and interfaces
      instead of private implementation.
      Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      14d76b68
    • J
      resources: Move struct resource_list_entry from ACPI into resource core · 90e97820
      Jiang Liu 提交于
      Currently ACPI, PCI and pnp all implement the same resource list
      management with different data structure. We need to transfer from
      one data structure into another when passing resources from one
      subsystem into another subsystem. So move struct resource_list_entry
      from ACPI into resource core and rename it as resource_entry,
      then it could be reused by different subystems and avoid the data
      structure conversion.
      
      Introduce dedicated header file resource_ext.h instead of embedding
      it into ioport.h to avoid header file inclusion order issues.
      Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
      Acked-by: NVinod Koul <vinod.koul@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      90e97820
    • E
      ipv6: fix sparse errors in ip6_make_flowlabel() · 67765146
      Eric Dumazet 提交于
      include/net/ipv6.h:713:22: warning: incorrect type in assignment (different base types)
      include/net/ipv6.h:713:22:    expected restricted __be32 [usertype] hash
      include/net/ipv6.h:713:22:    got unsigned int
      include/net/ipv6.h:719:25: warning: restricted __be32 degrades to integer
      include/net/ipv6.h:719:22: warning: invalid assignment: ^=
      include/net/ipv6.h:719:22:    left side has type restricted __be32
      include/net/ipv6.h:719:22:    right side has type unsigned int
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67765146
    • E
      flow_keys: n_proto type should be __be16 · f4575d35
      Eric Dumazet 提交于
      (struct flow_keys)->n_proto is in network order, use
      proper type for this.
      
      Fixes following sparse errors :
      
      net/core/flow_dissector.c:139:39: warning: incorrect type in assignment (different base types)
      net/core/flow_dissector.c:139:39:    expected unsigned short [unsigned] [usertype] n_proto
      net/core/flow_dissector.c:139:39:    got restricted __be16 [assigned] [usertype] proto
      net/core/flow_dissector.c:237:23: warning: incorrect type in assignment (different base types)
      net/core/flow_dissector.c:237:23:    expected unsigned short [unsigned] [usertype] n_proto
      net/core/flow_dissector.c:237:23:    got restricted __be16 [assigned] [usertype] proto
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: e0f31d84 ("flow_keys: Record IP layer protocol in skb_flow_dissect()")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4575d35
  4. 04 2月, 2015 4 次提交
  5. 03 2月, 2015 2 次提交
  6. 02 2月, 2015 2 次提交
    • E
      ipv4: tcp: get rid of ugly unicast_sock · bdbbb852
      Eric Dumazet 提交于
      In commit be9f4a44 ("ipv4: tcp: remove per net tcp_sock")
      I tried to address contention on a socket lock, but the solution
      I chose was horrible :
      
      commit 3a7c384f ("ipv4: tcp: unicast_sock should not land outside
      of TCP stack") addressed a selinux regression.
      
      commit 0980e56e ("ipv4: tcp: set unicast_sock uc_ttl to -1")
      took care of another regression.
      
      commit b5ec8eea ("ipv4: fix ip_send_skb()") fixed another regression.
      
      commit 811230cd ("tcp: ipv4: initialize unicast_sock sk_pacing_rate")
      was another shot in the dark.
      
      Really, just use a proper socket per cpu, and remove the skb_orphan()
      call, to re-enable flow control.
      
      This solves a serious problem with FQ packet scheduler when used in
      hostile environments, as we do not want to allocate a flow structure
      for every RST packet sent in response to a spoofed packet.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdbbb852
    • L
      sched: don't cause task state changes in nested sleep debugging · 00845eb9
      Linus Torvalds 提交于
      Commit 8eb23b9f ("sched: Debug nested sleeps") added code to report
      on nested sleep conditions, which we generally want to avoid because the
      inner sleeping operation can re-set the thread state to TASK_RUNNING,
      but that will then cause the outer sleep loop not actually sleep when it
      calls schedule.
      
      However, that's actually valid traditional behavior, with the inner
      sleep being some fairly rare case (like taking a sleeping lock that
      normally doesn't actually need to sleep).
      
      And the debug code would actually change the state of the task to
      TASK_RUNNING internally, which makes that kind of traditional and
      working code not work at all, because now the nested sleep doesn't just
      sometimes cause the outer one to not block, but will cause it to happen
      every time.
      
      In particular, it will cause the cardbus kernel daemon (pccardd) to
      basically busy-loop doing scheduling, converting a laptop into a heater,
      as reported by Bruno Prémont.  But there may be other legacy uses of
      that nested sleep model in other drivers that are also likely to never
      get converted to the new model.
      
      This fixes both cases:
      
       - don't set TASK_RUNNING when the nested condition happens (note: even
         if WARN_ONCE() only _warns_ once, the return value isn't whether the
         warning happened, but whether the condition for the warning was true.
         So despite the warning only happening once, the "if (WARN_ON(..))"
         would trigger for every nested sleep.
      
       - in the cases where we knowingly disable the warning by using
         "sched_annotate_sleep()", don't change the task state (that is used
         for all core scheduling decisions), instead use '->task_state_change'
         that is used for the debugging decision itself.
      
      (Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
      avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
      suggested change to use 'task_state_change' as part of the test)
      Reported-and-bisected-by: NBruno Prémont <bonbons@linux-vserver.org>
      Tested-by: NRafael J Wysocki <rjw@rjwysocki.net>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>,
      Cc: Ilya Dryomov <ilya.dryomov@inktank.com>,
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Hurley <peter@hurleysoftware.com>,
      Cc: Davidlohr Bueso <dave@stgolabs.net>,
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00845eb9
  7. 01 2月, 2015 1 次提交
    • E
      net: sched: fix panic in rate estimators · 0d32ef8c
      Eric Dumazet 提交于
      Doing the following commands on a non idle network device
      panics the box instantly, because cpu_bstats gets overwritten
      by stats.
      
      tc qdisc add dev eth0 root <your_favorite_qdisc>
      ... some traffic (one packet is enough) ...
      tc qdisc replace dev eth0 root est 1sec 4sec <your_favorite_qdisc>
      
      [  325.355596] BUG: unable to handle kernel paging request at ffff8841dc5a074c
      [  325.362609] IP: [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
      [  325.369158] PGD 1fa7067 PUD 0
      [  325.372254] Oops: 0000 [#1] SMP
      [  325.375514] Modules linked in: ...
      [  325.398346] CPU: 13 PID: 14313 Comm: tc Not tainted 3.19.0-smp-DEV #1163
      [  325.412042] task: ffff8800793ab5d0 ti: ffff881ff2fa4000 task.ti: ffff881ff2fa4000
      [  325.419518] RIP: 0010:[<ffffffff81541c9e>]  [<ffffffff81541c9e>] __gnet_stats_copy_basic+0x3e/0x90
      [  325.428506] RSP: 0018:ffff881ff2fa7928  EFLAGS: 00010286
      [  325.433824] RAX: 000000000000000c RBX: ffff881ff2fa796c RCX: 000000000000000c
      [  325.440988] RDX: ffff8841dc5a0744 RSI: 0000000000000060 RDI: 0000000000000060
      [  325.448120] RBP: ffff881ff2fa7948 R08: ffffffff81cd4f80 R09: 0000000000000000
      [  325.455268] R10: ffff883ff223e400 R11: 0000000000000000 R12: 000000015cba0744
      [  325.462405] R13: ffffffff81cd4f80 R14: ffff883ff223e460 R15: ffff883feea0722c
      [  325.469536] FS:  00007f2ee30fa700(0000) GS:ffff88407fa20000(0000) knlGS:0000000000000000
      [  325.477630] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  325.483380] CR2: ffff8841dc5a074c CR3: 0000003feeae9000 CR4: 00000000001407e0
      [  325.490510] Stack:
      [  325.492524]  ffff883feea0722c ffff883fef719dc0 ffff883feea0722c ffff883ff223e4a0
      [  325.499990]  ffff881ff2fa79a8 ffffffff815424ee ffff883ff223e49c 000000015cba0744
      [  325.507460]  00000000f2fa7978 0000000000000000 ffff881ff2fa79a8 ffff883ff223e4a0
      [  325.514956] Call Trace:
      [  325.517412]  [<ffffffff815424ee>] gen_new_estimator+0x8e/0x230
      [  325.523250]  [<ffffffff815427aa>] gen_replace_estimator+0x4a/0x60
      [  325.529349]  [<ffffffff815718ab>] tc_modify_qdisc+0x52b/0x590
      [  325.535117]  [<ffffffff8155edd0>] rtnetlink_rcv_msg+0xa0/0x240
      [  325.540963]  [<ffffffff8155ed30>] ? __rtnl_unlock+0x20/0x20
      [  325.546532]  [<ffffffff8157f811>] netlink_rcv_skb+0xb1/0xc0
      [  325.552145]  [<ffffffff8155b355>] rtnetlink_rcv+0x25/0x40
      [  325.557558]  [<ffffffff8157f0d8>] netlink_unicast+0x168/0x220
      [  325.563317]  [<ffffffff8157f47c>] netlink_sendmsg+0x2ec/0x3e0
      
      Lets play safe and not use an union : percpu 'pointers' are mostly read
      anyway, and we have typically few qdiscs per host.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Fixes: 22e0f8b9 ("net: sched: make bstats per cpu and estimator RCU safe")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d32ef8c
  8. 31 1月, 2015 1 次提交
  9. 30 1月, 2015 1 次提交
    • L
      vm: add VM_FAULT_SIGSEGV handling support · 33692f27
      Linus Torvalds 提交于
      The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
      "you should SIGSEGV" error, because the SIGSEGV case was generally
      handled by the caller - usually the architecture fault handler.
      
      That results in lots of duplication - all the architecture fault
      handlers end up doing very similar "look up vma, check permissions, do
      retries etc" - but it generally works.  However, there are cases where
      the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
      
      In particular, when accessing the stack guard page, libsigsegv expects a
      SIGSEGV.  And it usually got one, because the stack growth is handled by
      that duplicated architecture fault handler.
      
      However, when the generic VM layer started propagating the error return
      from the stack expansion in commit fee7e49d ("mm: propagate error
      from stack expansion even for guard page"), that now exposed the
      existing VM_FAULT_SIGBUS result to user space.  And user space really
      expected SIGSEGV, not SIGBUS.
      
      To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
      duplicate architecture fault handlers about it.  They all already have
      the code to handle SIGSEGV, so it's about just tying that new return
      value to the existing code, but it's all a bit annoying.
      
      This is the mindless minimal patch to do this.  A more extensive patch
      would be to try to gather up the mostly shared fault handling logic into
      one generic helper routine, and long-term we really should do that
      cleanup.
      
      Just from this patch, you can generally see that most architectures just
      copied (directly or indirectly) the old x86 way of doing things, but in
      the meantime that original x86 model has been improved to hold the VM
      semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
      "newer" things, so it would be a good idea to bring all those
      improvements to the generic case and teach other architectures about
      them too.
      Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de>
      Tested-by: NJan Engelhardt <jengelh@inai.de>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33692f27
  10. 29 1月, 2015 2 次提交
    • N
      tcp: stretch ACK fixes prep · e73ebb08
      Neal Cardwell 提交于
      LRO, GRO, delayed ACKs, and middleboxes can cause "stretch ACKs" that
      cover more than the RFC-specified maximum of 2 packets. These stretch
      ACKs can cause serious performance shortfalls in common congestion
      control algorithms that were designed and tuned years ago with
      receiver hosts that were not using LRO or GRO, and were instead
      politely ACKing every other packet.
      
      This patch series fixes Reno and CUBIC to handle stretch ACKs.
      
      This patch prepares for the upcoming stretch ACK bug fix patches. It
      adds an "acked" parameter to tcp_cong_avoid_ai() to allow for future
      fixes to tcp_cong_avoid_ai() to correctly handle stretch ACKs, and
      changes all congestion control algorithms to pass in 1 for the ACKed
      count. It also changes tcp_slow_start() to return the number of packet
      ACK "credits" that were not processed in slow start mode, and can be
      processed by the congestion control module in additive increase mode.
      
      In future patches we will fix tcp_cong_avoid_ai() to handle stretch
      ACKs, and fix Reno and CUBIC handling of stretch ACKs in slow start
      and additive increase mode.
      Reported-by: NEyal Perry <eyalpe@mellanox.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e73ebb08
    • T
      ALSA: ak411x: Fix stall in work callback · 4161b450
      Takashi Iwai 提交于
      When ak4114 work calls its callback and the callback invokes
      ak4114_reinit(), it stalls due to flush_delayed_work().  For avoiding
      this, control the reentrance by introducing a refcount.  Also
      flush_delayed_work() is replaced with cancel_delayed_work_sync().
      
      The exactly same bug is present in ak4113.c and fixed as well.
      Reported-by: NPavel Hofman <pavel.hofman@ivitera.com>
      Acked-by: NJaroslav Kysela <perex@perex.cz>
      Tested-by: NPavel Hofman <pavel.hofman@ivitera.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      4161b450
  11. 28 1月, 2015 2 次提交
    • P
      perf: Tighten (and fix) the grouping condition · c3c87e77
      Peter Zijlstra 提交于
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage where, when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused where the counter is programmed -- triggered in practice
      as well by me via the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means for the same task and/or the same cpu.
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c3c87e77
    • J
      quota: Switch ->get_dqblk() and ->set_dqblk() to use bytes as space units · 14bf61ff
      Jan Kara 提交于
      Currently ->get_dqblk() and ->set_dqblk() use struct fs_disk_quota which
      tracks space limits and usage in 512-byte blocks. However VFS quotas
      track usage in bytes (as some filesystems require that) and we need to
      somehow pass this information. Upto now it wasn't a problem because we
      didn't do any unit conversion (thus VFS quota routines happily stuck
      number of bytes into d_bcount field of struct fd_disk_quota). Only if
      you tried to use Q_XGETQUOTA or Q_XSETQLIM for VFS quotas (or Q_GETQUOTA
      / Q_SETQUOTA for XFS quotas), you got bogus results. Hardly anyone
      tried this but reportedly some Samba users hit the problem in practice.
      So when we want interfaces compatible we need to fix this.
      
      We bite the bullet and define another quota structure used for passing
      information from/to ->get_dqblk()/->set_dqblk. It's somewhat sad we have
      to have more conversion routines in fs/quota/quota.c and another copying
      of quota structure slows down getting of quota information by about 2%
      but it seems cleaner than overloading e.g. units of d_bcount to bytes.
      
      CC: stable@vger.kernel.org
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      14bf61ff
  12. 27 1月, 2015 5 次提交
  13. 26 1月, 2015 2 次提交
  14. 22 1月, 2015 1 次提交
  15. 20 1月, 2015 2 次提交
    • R
      module: remove mod arg from module_free, rename module_memfree(). · be1f221c
      Rusty Russell 提交于
      Nothing needs the module pointer any more, and the next patch will
      call it from RCU, where the module itself might no longer exist.
      Removing the arg is the safest approach.
      
      This just codifies the use of the module_alloc/module_free pattern
      which ftrace and bpf use.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ley Foon Tan <lftan@altera.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: x86@kernel.org
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: linux-cris-kernel@axis.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: nios2-dev@lists.rocketboards.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: sparclinux@vger.kernel.org
      Cc: netdev@vger.kernel.org
      be1f221c
    • R
      module_arch_freeing_init(): new hook for archs before module->module_init freed. · d453cded
      Rusty Russell 提交于
      Archs have been abusing module_free() to clean up their arch-specific
      allocations.  Since module_free() is also (ab)used by BPF and trace code,
      let's keep it to simple allocations, and provide a hook called before
      that.
      
      This means that avr32, ia64, parisc and s390 no longer need to implement
      their own module_free() at all.  avr32 doesn't need module_finalize()
      either.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      d453cded
  16. 19 1月, 2015 3 次提交