1. 06 10月, 2014 3 次提交
  2. 05 10月, 2014 1 次提交
    • D
      sparc64: Fix reversed start/end in flush_tlb_kernel_range() · 473ad7f4
      David S. Miller 提交于
      When we have to split up a flush request into multiple pieces
      (in order to avoid the firmware range) we don't specify the
      arguments in the right order for the second piece.
      
      Fix the order, or else we get hangs as the code tries to
      flush "a lot" of entries and we get lockups like this:
      
      [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
      [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
      [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
      [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
      [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000    Not tainted
      [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
      [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
      [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
      [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
      [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
      [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
      [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
      [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
      [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
      [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
      [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
      [ 4423.234193] Call Trace:
      [ 4423.239051]  [0000000000530c24] free_vmap_area_noflush+0x64/0x80
      [ 4423.251029]  [0000000000531a7c] remove_vm_area+0x5c/0x80
      [ 4423.261628]  [0000000000531b80] __vunmap+0x20/0x120
      [ 4423.271352]  [000000000071cf18] n_tty_close+0x18/0x40
      [ 4423.281423]  [00000000007222b0] tty_ldisc_close+0x30/0x60
      [ 4423.292183]  [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
      [ 4423.303120]  [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
      [ 4423.314232]  [0000000000719aa0] __tty_hangup+0x280/0x3c0
      [ 4423.324835]  [0000000000724cb4] pty_close+0x134/0x1a0
      [ 4423.334905]  [000000000071aa24] tty_release+0x104/0x500
      [ 4423.345316]  [00000000005511d0] __fput+0x90/0x1e0
      [ 4423.354701]  [000000000047fa54] task_work_run+0x94/0xe0
      [ 4423.365126]  [0000000000404b44] __handle_signal+0xc/0x2c
      
      Fixes: 4ca9a237 ("sparc64: Guard against flushing openfirmware mappings.")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      473ad7f4
  3. 01 10月, 2014 4 次提交
    • S
      sparc64: Add vio_set_intr() to enable/disable Rx interrupts · ca605b7d
      Sowmini Varadhan 提交于
      The vio_set_intr() API should be used by VIO consumers to enable/disable
      Rx interrupts to facilitate deferred processing in softirq/bottom-half
      context.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca605b7d
    • D
      vio: fix reuse of vio_dring slot · d0aedcd4
      Dwight Engen 提交于
      vio_dring_avail() will allow use of every dring entry, but when the last
      entry is allocated then dr->prod == dr->cons which is indistinguishable from
      the ring empty condition. This causes the next allocation to reuse an entry.
      When this happens in sunvdc, the server side vds driver begins nack'ing the
      messages and ends up resetting the ldc channel. This problem does not effect
      sunvnet since it checks for < 2.
      
      The fix here is to just never allocate the very last dring slot so that full
      and empty are not the same condition. The request start path was changed to
      check for the ring being full a bit earlier, and to stop the blk_queue if
      there is no space left. The blk_queue will be restarted once the ring is
      only half full again. The number of ring entries was increased to 512 which
      matches the sunvnet and Solaris vdc drivers, and greatly reduces the
      frequency of hitting the ring full condition and the associated blk_queue
      stop/starting. The checks in sunvent were adjusted to account for
      vio_dring_avail() returning 1 less.
      
      Orabug: 19441666
      OraBZ: 14983
      Signed-off-by: NDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0aedcd4
    • A
      sunvdc: add cdrom and v1.1 protocol support · 9bce2182
      Allen Pais 提交于
      Interpret the media type from v1.1 protocol to support CDROM/DVD.
      
      For v1.0 protocol, a disk's size continues to be calculated from the
      geometry returned by the vdisk server. The geometry returned by the server
      can be less than the actual number of sectors available in the backing
      image/device due to the rounding in the division used to compute the
      geometry in the vdisk server.
      
      In v1.1 protocol a disk's actual size in sectors is returned during the
      handshake. Use this size when v1.1 protocol is negotiated. Since this size
      will always be larger than the former geometry computed size, disks created
      under v1.0 will be forwards compatible to v1.1, but not vice versa.
      Signed-off-by: NDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9bce2182
    • D
      sparc: VIO protocol version 1.6 · 163a4e74
      David L Stevens 提交于
      Add VIO protocol version 1.6 interfaces.
      Signed-off-by: NDavid L Stevens <david.stevens@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      163a4e74
  4. 27 9月, 2014 1 次提交
  5. 17 9月, 2014 5 次提交
    • S
      sparc64: Move request_irq() from ldc_bind() to ldc_alloc() · c21c4ab0
      Sowmini Varadhan 提交于
      The request_irq() needs to be done from ldc_alloc()
      to avoid the following (caught by lockdep)
      
       [00000000004a0738] __might_sleep+0xf8/0x120
       [000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0
       [00000000004faf80] request_threaded_irq+0x80/0x160
       [000000000044f71c] ldc_bind+0x7c/0x220
       [0000000000452454] vio_port_up+0x54/0xe0
       [00000000101f6778] probe_disk+0x38/0x220 [sunvdc]
       [00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc]
       [0000000000451a88] vio_device_probe+0x48/0x60
       [000000000074c56c] really_probe+0x6c/0x300
       [000000000074c83c] driver_probe_device+0x3c/0xa0
       [000000000074c92c] __driver_attach+0x8c/0xa0
       [000000000074a6ec] bus_for_each_dev+0x6c/0xa0
       [000000000074c1dc] driver_attach+0x1c/0x40
       [000000000074b0fc] bus_add_driver+0xbc/0x280
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c21c4ab0
    • B
      sparc64: T5 PMU · 05aa1651
      bob picco 提交于
      The T5 (niagara5) has different PCR related HV fast trap values and a new
      HV API Group. This patch utilizes these and shares when possible with niagara4.
      
      We use the same sparc_pmu niagara4_pmu. Should there be new effort to
      obtain the MCU perf statistics then this would have to be changed.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NBob Picco <bob.picco@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05aa1651
    • B
      sparc64: mem boot option correction · 7c21d533
      bob picco 提交于
      The "mem" boot option can result in many unexpected consequences. This patch
      attempts to prevent boot hangs which have been experienced on T4-4 and T5-8.
      Basically the boot loader allocates vmlinuz and initrd higher in available
      OBP physical memory. For example, on a 2Tb T5-8 it isn't possible to boot
      with mem=20G.
      
      The patch utilizes memblock to avoid reserved regions and trim memory which
      is only free. Other improvements are possible for a multi-node machine.
      
      This is a snippet of the boot log with mem=20G on T5-8 with the patch applied:
      MEMBLOCK configuration:	<- before memory reduction
       memory size = 0x1ffad6ce000 reserved size = 0xa1adf44
       memory.cnt  = 0xb
       memory[0x0]    [0x00000030400000-0x00003fdde47fff], 0x3fada48000 bytes
       memory[0x1]    [0x00003fdde4e000-0x00003fdde4ffff], 0x2000 bytes
       memory[0x2]    [0x00080000000000-0x00083fffffffff], 0x4000000000 bytes
       memory[0x3]    [0x00100000000000-0x00103fffffffff], 0x4000000000 bytes
       memory[0x4]    [0x00180000000000-0x00183fffffffff], 0x4000000000 bytes
       memory[0x5]    [0x00200000000000-0x00203fffffffff], 0x4000000000 bytes
       memory[0x6]    [0x00280000000000-0x00283fffffffff], 0x4000000000 bytes
       memory[0x7]    [0x00300000000000-0x00303fffffffff], 0x4000000000 bytes
       memory[0x8]    [0x00380000000000-0x00383fffc71fff], 0x3fffc72000 bytes
       memory[0x9]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
       memory[0xa]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
      ...
      MEMBLOCK configuration:	<- after reduction of memory
       memory size = 0x50a1adf44 reserved size = 0xa1adf44
       memory.cnt  = 0x4
       memory[0x0]    [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       memory[0x1]    [0x00380004000000-0x0038050d01d74a], 0x50901d74b bytes
       memory[0x2]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
       memory[0x3]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
      ...
      Early memory node ranges
        node   7: [mem 0x380000000000-0x38000117dfff]
        node   7: [mem 0x380004000000-0x380f0d01bfff]
        node   7: [mem 0x383fffc92000-0x383fffca1fff]
        node   7: [mem 0x383fffcb4000-0x383fffcb5fff]
      Could not find start_pfn for node 0
      Could not find start_pfn for node 1
      Could not find start_pfn for node 2
      Could not find start_pfn for node 3
      Could not find start_pfn for node 4
      Could not find start_pfn for node 5
      Could not find start_pfn for node 6
      .
      
      The patch was tested on T4-1, T5-8 and Jalap?no.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NBob Picco <bob.picco@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c21d533
    • B
      sparc64: find_node adjustment · 3dee9df5
      bob picco 提交于
      We have seen an issue with guest boot into LDOM that causes early boot failures
      because of no matching rules for node identitity of the memory. I analyzed this
      on my T4 and concluded there might not be a solution. I saw the issue in
      mainline too when booting into the control/primary domain - with guests
      configured.  Note, this could be a firmware bug on some older machines.
      
      I'll provide a full explanation of the issues below. Should we not find a
      matching BEST latency group for a real address (RA) then we will assume node 0.
      On the T4-2 here with the information provided I can't see an alternative.
      
      Technically the LDOM shown below should match the MBLOCK to the
      favorable latency group. However other factors must be considered too. Were
      the memory controllers configured "fine" grained interleave or "coarse"
      grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
      node?
      
      There has to be at least one Machine Description (MD) "group" and hence one
      NUMA node. The group can have one or more latency groups (lg) - more than one
      memory controller. The current code chooses the smallest latency as the most
      favorable per group. The latency and lg information is in MLGROUP below.
      MBLOCK is the base and size of the RAs for the machine as fetched from OBP
      /memory "available" property. My machine has one MBLOCK but more would be
      possible - with holes?
      
      For a T4-2 the following information has been gathered:
      with LDOM guest
      MEMBLOCK configuration:
       memory size = 0x27f870000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
       memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
       memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
      MBLOCK[0]: base[20000000] size[280000000] offset[0]
      (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
      (note: (RA + offset) & mask = val is the formula to detect a match for the
      memory controller. should there be no match for find_node node, a return
      value of -1 resulted for the node - BAD)
      
      There is one group. It has these forward links
      MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
      MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
      NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
      (note: "val" is the best lg's (smallest latency) "match")
      
      no LDOM guest - bare metal
      MEMBLOCK configuration:
       memory size = 0xfdf2d0000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
       memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
       memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
      MBLOCK[0]: base[20000000] size[fe0000000] offset[0]
      
      there are two groups
      group node[16d5]
      MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
      MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
      NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
      group node[171d]
      MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
      MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
      NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
      (note: for this two "group" bare metal machine, 1/2 memory is in group one's
      lg and 1/2 memory is in group two's lg).
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NBob Picco <bob.picco@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dee9df5
    • B
      sparc64: sun4v TLB error power off events · 4ccb9272
      bob picco 提交于
      We've witnessed a few TLB events causing the machine to power off because
      of prom_halt. In one case it was some nfs related area during rmmod. Another
      was an mmapper of /dev/mem. A more recent one is an ITLB issue with
      a bad pagesize which could be a hardware bug. Bugs happen but we should
      attempt to not power off the machine and/or hang it when possible.
      
      This is a DTLB error from an mmapper of /dev/mem:
      [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
      SUN4V-DTLB: TPC<0xfffff80100903e6c>
      SUN4V-DTLB: O7[fffff801081979d0]
      SUN4V-DTLB: O7<0xfffff801081979d0>
      SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
      .
      
      This is recent mainline for ITLB:
      [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
      [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
      [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
      [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
      .
      
      Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
      prom_halt() and drop us to OF command prompt "ok". This isn't the case for
      LDOMs and the machine powers off.
      
      For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
      a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
      of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
      one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().
      
      The logic of this patch was partially inspired by David Miller's feedback.
      
      Power off of large sparc64 machines is painful. Plus die_if_kernel provides
      more context. A reset sequence isn't a brief period on large sparc64 but
      better than power-off/power-on sequence.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NBob Picco <bob.picco@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ccb9272
  6. 11 9月, 2014 1 次提交
  7. 10 9月, 2014 6 次提交
  8. 14 8月, 2014 4 次提交
  9. 12 8月, 2014 2 次提交
    • D
      sparc64: Fix pcr_ops initialization and usage bugs. · 8bccf5b3
      David S. Miller 提交于
      Christopher reports that perf_event_print_debug() can crash in uniprocessor
      builds.  The crash is due to pcr_ops being NULL.
      
      This happens because pcr_arch_init() is only invoked by smp_cpus_done() which
      only executes in SMP builds.
      
      init_hw_perf_events() is closely intertwined with pcr_ops being setup properly,
      therefore:
      
      1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of
         from smp_cpus_done().
      
      2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init().
      
      3) Move init_hw_perf_events to a later initcall so that it we will be
         sure to invoke pcr_arch_init() after all cpus are brought up.
      
      Finally, guard the one naked sequence of pcr_ops dereferences in
      __global_pmu_self() with an appropriate NULL check.
      Reported-by: NChristopher Alexander Tobias Schulze <cat.schulze@alice-dsl.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8bccf5b3
    • D
      sparc64: Do not disable interrupts in nmi_cpu_busy() · 58556104
      David S. Miller 提交于
      nmi_cpu_busy() is a SMP function call that just makes sure that all of the
      cpus are spinning using cpu cycles while the NMI test runs.
      
      It does not need to disable IRQs because we just care about NMIs executing
      which will even with 'normal' IRQs disabled.
      
      It is not legal to enable hard IRQs in a SMP cross call, in fact this bug
      triggers the BUG check in irq_work_run_list():
      
      	BUG_ON(!irqs_disabled());
      
      Because now irq_work_run() is invoked from the tail of
      generic_smp_call_function_single_interrupt().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58556104
  10. 09 8月, 2014 1 次提交
  11. 07 8月, 2014 2 次提交
  12. 06 8月, 2014 1 次提交
  13. 05 8月, 2014 5 次提交
    • D
      sparc: Add "install" target · c78f77e2
      David L Stevens 提交于
      This patches adds an "install" target to install kernel builds for SPARC,
      modeled after the i386 script.
      Signed-off-by: NDavid L Stevens <david.stevens@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c78f77e2
    • A
      arch/sparc/math-emu/math_32.c: drop stray break operator · 093758e3
      Andrey Utkin 提交于
      This commit is a guesswork, but it seems to make sense to drop this
      break, as otherwise the following line is never executed and becomes
      dead code. And that following line actually saves the result of
      local calculation by the pointer given in function argument. So the
      proposed change makes sense if this code in the whole makes sense (but I
      am unable to analyze it in the whole).
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81641Reported-by: NDavid Binderman <dcb314@hotmail.com>
      Signed-off-by: NAndrey Utkin <andrey.krieger.utkin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      093758e3
    • S
      sparc64: ldc_connect() should not return EINVAL when handshake is in progress. · 4ec1b010
      Sowmini Varadhan 提交于
      The LDC handshake could have been asynchronously triggered
      after ldc_bind() enables the ldc_rx() receive interrupt-handler
      (and thus intercepts incoming control packets)
      and before vio_port_up() calls ldc_connect(). If that is the case,
      ldc_connect() should return 0 and let the state-machine
      progress.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NKarl Volz <karl.volz@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ec1b010
    • D
      sparc64: Guard against flushing openfirmware mappings. · 4ca9a237
      David S. Miller 提交于
      Based almost entirely upon a patch by Christopher Alexander Tobias
      Schulze.
      
      In commit db64fe02 ("mm: rewrite vmap
      layer") lazy VMAP tlb flushing was added to the vmalloc layer.  This
      causes problems on sparc64.
      
      Sparc64 has two VMAP mapped regions and they are not contiguous with
      eachother.  First we have the malloc mapping area, then another
      unrelated region, then the vmalloc region.
      
      This "another unrelated region" is where the firmware is mapped.
      
      If the lazy TLB flushing logic in the vmalloc code triggers after
      we've had both a module unload and a vfree or similar, it will pass an
      address range that goes from somewhere inside the malloc region to
      somewhere inside the vmalloc region, and thus covering the
      openfirmware area entirely.
      
      The sparc64 kernel learns about openfirmware's dynamic mappings in
      this region early in the boot, and then services TLB misses in this
      area.  But openfirmware has some locked TLB entries which are not
      mentioned in those dynamic mappings and we should thus not disturb
      them.
      
      These huge lazy TLB flush ranges causes those openfirmware locked TLB
      entries to be removed, resulting in all kinds of problems including
      hard hangs and crashes during reboot/reset.
      
      Besides causing problems like this, such huge TLB flush ranges are
      also incredibly inefficient.  A plea has been made with the author of
      the VMAP lazy TLB flushing code, but for now we'll put a safety guard
      into our flush_tlb_kernel_range() implementation.
      
      Since the implementation has become non-trivial, stop defining it as a
      macro and instead make it a function in a C source file.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ca9a237
    • D
      sparc64: Do not insert non-valid PTEs into the TSB hash table. · 18f38132
      David S. Miller 提交于
      The assumption was that update_mmu_cache() (and the equivalent for PMDs) would
      only be called when the PTE being installed will be accessible by the user.
      
      This is not true for code paths originating from remove_migration_pte().
      
      There are dire consequences for placing a non-valid PTE into the TSB.  The TLB
      miss frramework assumes thatwhen a TSB entry matches we can just load it into
      the TLB and return from the TLB miss trap.
      
      So if a non-valid PTE is in there, we will deadlock taking the TLB miss over
      and over, never satisfying the miss.
      
      Just exit early from update_mmu_cache() and friends in this situation.
      
      Based upon a report and patch from Christopher Alexander Tobias Schulze.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      18f38132
  14. 03 8月, 2014 1 次提交
    • A
      net: filter: split 'struct sk_filter' into socket and bpf parts · 7ae457c1
      Alexei Starovoitov 提交于
      clean up names related to socket filtering and bpf in the following way:
      - everything that deals with sockets keeps 'sk_*' prefix
      - everything that is pure BPF is changed to 'bpf_*' prefix
      
      split 'struct sk_filter' into
      struct sk_filter {
      	atomic_t        refcnt;
      	struct rcu_head rcu;
      	struct bpf_prog *prog;
      };
      and
      struct bpf_prog {
              u32                     jited:1,
                                      len:31;
              struct sock_fprog_kern  *orig_prog;
              unsigned int            (*bpf_func)(const struct sk_buff *skb,
                                                  const struct bpf_insn *filter);
              union {
                      struct sock_filter      insns[0];
                      struct bpf_insn         insnsi[0];
                      struct work_struct      work;
              };
      };
      so that 'struct bpf_prog' can be used independent of sockets and cleans up
      'unattached' bpf use cases
      
      split SK_RUN_FILTER macro into:
          SK_RUN_FILTER to be used with 'struct sk_filter *' and
          BPF_PROG_RUN to be used with 'struct bpf_prog *'
      
      __sk_filter_release(struct sk_filter *) gains
      __bpf_prog_release(struct bpf_prog *) helper function
      
      also perform related renames for the functions that work
      with 'struct bpf_prog *', since they're on the same lines:
      
      sk_filter_size -> bpf_prog_size
      sk_filter_select_runtime -> bpf_prog_select_runtime
      sk_filter_free -> bpf_prog_free
      sk_unattached_filter_create -> bpf_prog_create
      sk_unattached_filter_destroy -> bpf_prog_destroy
      sk_store_orig_filter -> bpf_prog_store_orig_filter
      sk_release_orig_filter -> bpf_release_orig_filter
      __sk_migrate_filter -> bpf_migrate_filter
      __sk_prepare_filter -> bpf_prepare_filter
      
      API for attaching classic BPF to a socket stays the same:
      sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
      and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
      which is used by sockets, tun, af_packet
      
      API for 'unattached' BPF programs becomes:
      bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
      and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
      which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ae457c1
  15. 22 7月, 2014 3 次提交