From e6974bcb88972d5c48b8d941e1b60b28d34c6259 Mon Sep 17 00:00:00 2001 From: unknown Date: Mon, 29 Jul 2019 15:36:58 +0800 Subject: [PATCH] complete 16 issue --- ...ed Caffe in comparison with BVLC Caffe.md" | 314 +++++++----------- ...ROVE DEEP LEARNING FRAMEWORKS FOR CPUS.md" | 64 ++-- ...what this means for the future of data.md" | 140 ++++---- 3 files changed, 225 insertions(+), 293 deletions(-) diff --git "a/20171018 \347\254\25416\346\234\237/Benefits of Intel\302\256 Optimized Caffe in comparison with BVLC Caffe.md" "b/20171018 \347\254\25416\346\234\237/Benefits of Intel\302\256 Optimized Caffe in comparison with BVLC Caffe.md" index d2cb9f5..ec0602c 100644 --- "a/20171018 \347\254\25416\346\234\237/Benefits of Intel\302\256 Optimized Caffe in comparison with BVLC Caffe.md" +++ "b/20171018 \347\254\25416\346\234\237/Benefits of Intel\302\256 Optimized Caffe in comparison with BVLC Caffe.md" @@ -1,272 +1,215 @@ -# Benefits of Intel® Optimized Caffe* in comparison with BVLC Caffe* +# 相比BVLC Caffe,英特尔®优化Caffe的优势 原文链接:[Benefits of Intel® Optimized Caffe* in comparison with BVLC Caffe*](https://software.intel.com/en-us/articles/comparison-between-intel-optimized-caffe-and-vanilla-caffe-by-intel-vtune-amplifier?from=hackcv&hmsr=hackcv.com&utm_medium=hackcv.com&utm_source=hackcv.com) -### Overview +### 总览 - This article introduces Berkeley Vision and Learning Center (BVLC) Caffe* and a custom version of Caffe*, Intel® Optimized Caffe*. We explain why and how Intel® Optimized Caffe* performs efficiently on Intel® architecture via Intel® VTune™ Amplifier and the time profiling option of Caffe* itself. 
+本文介绍了Berkeley Vision and Learning Center(BVLC)Caffe* 和Caffe* 的一个定制版本,即英特尔®优化Caffe*。我们将通过英特尔®VTune™放大器以及Caffe*自带的时间剖析选项,解释英特尔®优化Caffe*为何以及如何在英特尔®架构上高效运行。

###

-### Introduction to BVLC Caffe* and Intel® Optimized Caffe*
+### BVLC Caffe* 和英特尔®优化版Caffe *简介

-[Caffe](http://caffe.berkeleyvision.org/)* is a well-known and widely used machine vision based Deep Learning framework developed by the Berkeley Vision and Learning Center ([BVLC](http://bvlc.eecs.berkeley.edu/)). It is an open-source framework and is evolving currently. It allows users to control a variety options such as libraries for BLAS, CPU or GPU focused computation, CUDA, OpenCV*, MATLAB and Python* before you build Caffe* through 'Makefile.config'. You can easily change the options in the configuration file and BVLC provides intuitive instructions on their project web page for developers.
+[Caffe](http://caffe.berkeleyvision.org/)*是由Berkeley Vision and Learning Center([BVLC](http://bvlc.eecs.berkeley.edu/))开发的一种应用广泛的基于机器视觉的深度学习框架。它是一个开源框架,目前仍在不断演进。它允许用户在通过'Makefile.config'构建Caffe*之前控制多种选项,例如BLAS库、以CPU还是GPU为主的计算、CUDA、OpenCV*、MATLAB和Python*。您可以轻松更改配置文件中的选项,BVLC也在其项目网页上为开发人员提供了直观的说明。

-Intel® Optimized Caffe* is an Intel-distributed customized Caffe* version for Intel architecture. Intel® Optimized Caffe* offers all the goodness of main Caffe* with the addition of Intel architecture-optimized functionality and multi-node distributor training and scoring. Intel® Optimized Caffe* makes it possible to more efficiently utilize CPU resources.
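作为示意,下面给出一个仅使用 CPU 构建时 'Makefile.config' 中的典型配置片段(假设性示例,选项名与默认值以 Caffe* 源码自带的 Makefile.config.example 模板为准):

```bash
# Makefile.config 片段:仅 CPU 构建,并使用 MKL 作为 BLAS 库
CPU_ONLY := 1
BLAS := mkl
```

如需 GPU 构建,则注释掉 CPU_ONLY 并按模板配置 CUDA 相关路径。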
+英特尔®优化版Caffe *是针对英特尔架构的英特尔分布式定制Caffe *版本。英特尔®优化Caffe *通过增加英特尔架构优化功能和多节点分布训练和打分,具有主要Caffe *的所有优点。英特尔®优化版Caffe *可以更有效地利用CPU资源。 -To see in detail how Intel® Optimized Caffe* has changed in order to optimize itself to Intel Architectures, please refer this page : +要详细了解英特尔®优化Caffe *如何更改以优化自身的英特尔体系结构,请参阅此页:https://software.intel.com/en-us/articles/caffe-optimized-for-intel-architecture-applying-modern-code-techniques -In this article, we will first profile the performance of BVLC Caffe* with Cifar 10 example and then will profile the performance of Intel® Optimized Caffe* with the same example. Performance profile will be conducted through two different methods. +在本文中,我们将首先使用Cifar 10示例分析BVLC Caffe *的性能,然后使用相同的示例来分析英特尔®优化Caffe *的性能。性能评估将通过两种不同的方法进行。 -Tested platform : Xeon Phi™ 7210 ( 1.3Ghz, 64 Cores ) with 96GB RAM, CentOS 7.2 +测试的平台:Xeon Phi™7210(1.3Ghz,64核心),96GB RAM,CentOS 7.2 -\1. Caffe* provides its own timing option for example : +1. Caffe *提供了自己的计时选项,例如: -| `1` | `./build/tools/caffe ``time` `\` | -| ---- | -------------------------------- | -| | | +```bash +./build/tools/caffe time \ + --model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \ + -iterations 1000 +``` -| `2` | ` ``--model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \` | -| ---- | ------------------------------------------------------------ | -| | | - -| `3` | ` ``-iterations 1000` | -| ---- | ------------------------ | -| | | - -\2. Intel® VTune™ Amplifier : Intel® VTune™ Amplifier is a powerful profiling tool that provides advanced CPU profiling features with a modern analysis interface. +2. 英特尔®VTune™放大器:英特尔®VTune™放大器是一款功能强大的分析工具,可提供先进的CPU分析功能和现代分析界面。 -### How to Install BVLC Caffe* - -Please refer the BVLC Caffe project web page for installation : - -If you have Intel® MKL installed on your system, it is better using MKL as BLAS library. - -In your Makefile.config , choose BLAS := mkl and specify MKL address. 
( The default set is BLAS := atlas )
+### 如何安装BVLC Caffe*

-In our test, we kept all configurations as they are specified as default except the CPU only option.
+请参考BVLC Caffe项目网页进行安装:

- 
- 
-### Test example
+如果您的系统上安装了英特尔®MKL,那么最好使用MKL作为BLAS库。

-In this article, we will use 'Cifar 10' example included in Caffe* package as default.
+在Makefile.config中,选择BLAS := mkl并指定MKL的路径。(默认设置为BLAS := atlas)

-You can refer BVLC Caffe project page for detail information about this exmaple :
+在我们的测试中,除启用了仅CPU(CPU only)选项外,所有配置均保持默认值。

-You can simply run the training example of Cifar 10 as the following :

-| `1` | `cd $CAFFE_ROOT` |
-| ---- | ---------------- |
-| | |

-| `2` | `./data/cifar10/get_cifar10.sh` |
-| ---- | ------------------------------- |
-| | |

+### 测试样例

-| `3` | `./examples/cifar10/create_cifar10.sh` |
-| ---- | -------------------------------------- |
-| | |

+在本文中,我们将使用Caffe*包中自带的“Cifar 10”示例。

-| `4` | `./examples/cifar10/train_full_sigmoid_bn.sh` |
-| ---- | --------------------------------------------- |
-| | |

+您可以参考BVLC Caffe项目页面以获取有关此示例的详细信息:

-First, we will try the Caffe's own benchmark method to obtain its performance results as the following:
+您可以轻松地运行Cifar 10的训练示例,如下所示:

-| `1` | `./build/tools/caffe ``time` `\` |
-| ---- | -------------------------------- |
-| | |

+```bash
+cd $CAFFE_ROOT
+./data/cifar10/get_cifar10.sh
+./examples/cifar10/create_cifar10.sh
+./examples/cifar10/train_full_sigmoid_bn.sh
+```

-| `2` | ` ``--model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \` |
-| ---- | ------------------------------------------------------------ |
-| | |

+首先,我们将尝试使用Caffe自己的基准测试方法来获得其性能结果,如下所示:

-| `3` | ` ``-iterations 1000` |
-| ---- | ------------------------ |
-| | |

+```bash
+./build/tools/caffe time \
+ --model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \
+ -iterations 1000
+```

-as results, we got the layer-by-layer forward and backward propagation time.
The command above measure the time each forward and backward pass over a batch f images. At the end it shows the average execution time per iteration for 1,000 iterations per layer and for the entire calculation. +结果,我们得到了逐层前向和后向传播时间。 上面的命令测得每个前向和后向传播批量f图像的时间。 最后,它显示了每层1000次迭代和整个计算的每次迭代的平均执行时间。 ![img](https://software.intel.com/sites/default/files/managed/c1/9d/Picture1.png) -This test was run on Xeon Phi™ 7210 ( 1.3Ghz, 64 Cores ) with 96GB RAM of DDR4 installed with CentOS 7.2. +该测试在Xeon Phi™7210(1.3Ghz,64核)上运行,其中96GB RAM的DDR4与CentOS 7.2一起安装。 -The numbers in the above results will be compared later with the results of Intel® Optimized Caffe*. +上述结果中的数字将在稍后与英特尔®优化Caffe *的结果进行比较。 -Before that, let's take a look at the VTune™ results also to observe the behave of Caffe* in detail. +在此之前,让我们来看看VTune™结果,以便详细观察Caffe *的表现 - -### VTune Profiling -Intel® VTune™ Amplifier is a modern processor performance profiler that is capable of analyzing top hotspots quickly and helping tuning your target application. You can find the details of Intel® VTune™ Amplifier from the following link : +### VTune 剖析 + +Intel® VTune™ Amplifier是一款现代处理器性能分析器,能够快速分析”hotspots“并帮助调整目标应用。 您可以从以下链接中找到英特尔®VTune™放大器的详细信息: Intel® VTune™ Amplifier : -We used Intel® VTune™ Amplifier in this article to find the function with the highest total CPU utilization time. Also, how OpenMP threads are working. +我们在本文中使用英特尔®VTune™放大器来查找具有最高总CPU利用时间的功能。 此外,OpenMP线程如何工作。 -### VTune result analysis +### VTune结果分析 ![img](https://software.intel.com/sites/default/files/managed/5c/97/Capture1.PNG) -What we can see here is some functions listed on the left side of the screen which are taking the most of the CPU time. They are called 'hotspots' and can be the target functions for performance optimization. +我们在这里看到的是屏幕左侧列出的一些功能,这些功能占用了大部分CPU时间。 它们被称为“hotspots”,可以作为性能优化的目标函数。 -In this case, we will focus on 'caffe::im2col_cpu' function as a optimization candidate. 
+在这种情况下,我们将关注'caffe::im2col_cpu'函数作为优化候选。

-'im2col_cpu' is one of the steps in performing direct convolution as a GEMM operation for using highly optimized BLAS libraries. This function took the largest CPU resource in our test of training Cifar 10 model using BVLC Caffe*.
+'im2col_cpu'是将直接卷积转换为GEMM操作、以便使用高度优化的BLAS库的步骤之一。在我们使用BVLC Caffe*训练Cifar 10模型的测试中,该函数占用的CPU资源最多。

-Let's take a look at the threads behaviors of this function. In VTune™, you can choose a function and filter other workloads out to observe only the workloads of the specified function.
+我们来看看这个函数的线程行为。在VTune™中,您可以选择一个函数并过滤掉其他工作负载,以便只观察指定函数的工作负载。

![img](https://software.intel.com/sites/default/files/managed/36/a2/Capture2.PNG)

-On the above result, we can see the CPI ( Cycles Per Instruction ) of the function is 0.907 and the function utilizes only one single thread for the entire calculation.
+在上面的结果中,我们可以看到该函数的CPI(每指令周期数)是0.907,并且该函数在整个计算过程中仅使用单个线程。

-One more intuitive data provided by Intel VTune Amplifier is here.
+英特尔VTune放大器还提供了一项更直观的数据。

![img](https://software.intel.com/sites/default/files/managed/45/a8/Capture3.PNG)

-This 'CPU Usage Histogram' provides the data of the numbers of CPUs that were running simultaneously. The number of CPUs the training process utilized appears to be about 25. The platform has 64 physical core with Intel® Hyper-Threading Technology so it has 256 CPUs. The CPU usage histogram here might imply that the process is not efficiently threaded.
+此“CPU使用率直方图”给出了同时运行的CPU数量。训练过程使用的CPU数量似乎约为25。该平台有64个物理内核并启用了英特尔®超线程技术,因此共有256个逻辑CPU。这里的CPU使用率直方图可能意味着该进程的线程化效率不高。

-However, we cannot just determine that these results are 'bad' because we did not set any performance standard or desired performance to classify. We will compare these results with the results of Intel® Optimized Caffe* later.
+但是,我们不能就此断定这些结果是“坏的”,因为我们并没有设定用来评判的性能标准或期望性能。我们将在稍后把这些结果与英特尔®优化Caffe*的结果进行比较。

- 
- 
-Let's move on to Intel® Optimized Caffe* now.
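作为补充(假设在被测的 Linux 平台上执行),可以用下面的命令确认平台的逻辑 CPU 总数,以便与直方图中约 25 的并发度对照:

```shell
# 逻辑 CPU 数 = 物理核数 × 每核硬件线程数(文中平台为 64 核 × 4 线程 = 256)
nproc
```

输出值因平台而异;与直方图中实际并发的 CPU 数相比,即可粗略判断线程利用率。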
+现在让我们转向英特尔®优化Caffe *。

-### How to Install Intel® Optimized Caffe*
-
- The basic procedure of installation of Intel® Optimized Caffe* is the same as BVLC Caffe*.
+### 如何安装 Intel® Optimized Caffe*

-When clone Intel® Optimized Caffe* from Git, you can use this alternative :
+安装英特尔®优化Caffe *的基本步骤与BVLC Caffe *相同。

-| `1` | `git clone https:``//github.com/intel/caffe` |
-| ---- | -------------------------------------------- |
-| | |
+当从Git克隆英特尔®优化Caffe *时,您可以使用以下方式:

- 
+```bash
+git clone https://github.com/intel/caffe
+```

-Additionally, it is required to install Intel® MKL to bring out the best performance of Intel® Optimized Caffe*.
+此外,还需要安装英特尔®MKL才能发挥英特尔®优化Caffe *的最佳性能。

-Please download and install Intel® MKL. Intel offers MKL for free without technical support or for a license fee to get one-on-one private support. The default BLAS library of Intel® Optimized Caffe* is set to MKL.
+请下载并安装英特尔®MKL。英特尔免费提供MKL(不含技术支持),也可以支付许可费获得一对一的专属支持。英特尔®优化Caffe *的默认BLAS库设置为MKL。

Intel® MKL : 

-After downloading Intel® Optimized Caffe* and installing MKL, in your Makefile.config, make sure you choose MKL as your BLAS library and point MKL include and lib folder for BLAS_INCLUDE and BLAS_LIB
+下载英特尔®优化Caffe *并安装MKL后,在Makefile.config中,确保选择MKL作为BLAS库,并将BLAS_INCLUDE和BLAS_LIB指向MKL的include和lib文件夹:

-| `1` | `BLAS :=mkl` |
-| ---- | ------------ |
-| | |
+```bash
+BLAS := mkl
+
+BLAS_INCLUDE := /opt/intel/mkl/include
+BLAS_LIB := /opt/intel/mkl/lib/intel64
+```

-| `2` | |
-| ---- | ---- |
-| | |

-| `3` | `BLAS_INCLUDE := /opt/intel/mkl/include` |
-| ---- | ---------------------------------------- |
-| | |
+如果在编译英特尔®优化Caffe *期间遇到“libstdc++”相关错误,请安装“libstdc++-static”。例如 :

-| `4` | `BLAS_LIB := /opt/intel/mkl/lib/intel64` |
-| ---- | ---------------------------------------- |
-| | |
+```bash
+sudo yum install libstdc++-static
+```

- 
+### 优化因素和调优

-If you encounter 'libstdc++' related error during the compilation of Intel® Optimized Caffe*, please install 'libstdc++-static'.
For example :

-| `1` | `sudo yum install libstdc++-``static` |
-| ---- | ------------------------------------- |
-| | |

+在我们运行和测试示例的性能之前,我们需要更改或调整一些选项以优化性能。

- 
- 
- 
- 

+- 使用'mkl'作为BLAS库:在Makefile.config中指定'BLAS := mkl',并配置MKL的include和lib路径。

-### Optimization factors and tunes

+- 设置CPU利用率限制:

-Before we run and test the performance of examples, there are some options we need to change or adjust to optimize performance.

+```bash
+echo "100" | sudo tee /sys/devices/system/cpu/intel_pstate/min_perf_pct
+echo "0" | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo
+```

-- Use 'mkl' as BLAS library : Specify 'BLAS := mkl' in Makefile.config and configure the location of your MKL's include and lib location also.

 - 将'engine:"MKL2017"'放在train_val.prototxt或solver.prototxt文件的顶部,或者在使用caffe工具时加上此选项:-engine "MKL2017"

-- Set CPU utilization limit :

 - 当前实现使用OpenMP线程。默认情况下,OpenMP线程的数量设置为CPU核心数。每个线程都绑定到一个内核以获得最佳性能结果。但是,也可以通过OpenMP环境变量(如KMP_AFFINITY,OMP_NUM_THREADS或GOMP_CPU_AFFINITY)提供合适的取值来使用自己的配置。对于下面的示例运行,使用了'OMP_NUM_THREADS = 64'。

 - 英特尔®优化Caffe *修改了原始BVLC Caffe *代码的许多部分,以借助OpenMP *实现更好的代码并行化。取决于后台运行的其他进程,调整OpenMP *使用的线程数通常很有用。对于Intel Xeon Phi™产品系列单节点,我们建议使用OMP_NUM_THREADS = number_of_cores-2。

 - 请同时参考:[Intel Recommendation to Achieve the best performance ](https://github.com/intel/caffe/wiki/Recommendations-to-achieve-best-performance)

| `1` | `echo ``"100"` `| sudo tee /sys/devices/``system``/cpu/intel_pstate/min_perf_pct` |
| ---- | ------------------------------------------------------------ |
| | |

+如果由于OS过于频繁地迁移线程而观察到过多的开销,则可以尝试调整OpenMP*亲和性(affinity)环境变量:

| `2` | `echo ``"0"` `| sudo tee /sys/devices/``system``/cpu/intel_pstate/no_turbo` |
| ---- | ------------------------------------------------------------ |
| | |

-- Put 'engine:"MKL2017" ' at the top of your train_val.prototxt or solver.prototxt file or use this option with caffe tool : -engine "MKL2017"
-- Current implementation uses OpenMP threads.
By default the number of OpenMP threads is set to the number of CPU cores. Each one thread is bound to a single core to achieve best performance results. It is however possible to use own configuration by providing right one through OpenMP environmental variables like KMP_AFFINITY, OMP_NUM_THREADS or GOMP_CPU_AFFINITY. For the example run below , 'OMP_NUM_THREADS = 64' has been used. - -- Intel® Optimized Caffe* has edited many parts of original BVLC Caffe* code to achieve better code parallelization with OpenMP*. Depending on other processes running on the background, it is often useful to adjust the number of threads getting utilized by OpenMP*. For Intel Xeon Phi™ product family single-node we recommend to use OMP_NUM_THREADS = numer_of_cores-2. - -- Please also refer here : [Intel Recommendation to Achieve the best performance ](https://github.com/intel/caffe/wiki/Recommendations-to-achieve-best-performance) - -If you observe too much overhead because of too frequent movement of thread by OS, you can try to adjust OpenMP* affinity environment variable : - -| `1` | `KMP_AFFINITY=compact,granularity=fine` | -| ---- | --------------------------------------- | -| | | +```bash +KMP_AFFINITY=compact,granularity=fine +``` -### Test example - - For Intel® Optimized Caffe* we run the same example to compare the results with the previous results. 
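在运行对比示例之前,可以按上文的建议先设置 OpenMP 环境变量。下面是一个示意(62 对应上文“核数-2”的建议值,本文实际测试使用的是 OMP_NUM_THREADS=64,请按平台调整):

```shell
# 设置 OpenMP 线程数与线程亲和性(示例值,按需调整)
export OMP_NUM_THREADS=62
export KMP_AFFINITY=compact,granularity=fine
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"   # 输出 OMP_NUM_THREADS=62
```

设置好后再执行 caffe time 命令,环境变量即对该进程生效。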
- -| `1` | `cd $CAFFE_ROOT` | -| ---- | ---------------- | -| | | - -| `2` | `./data/cifar10/get_cifar10.sh` | -| ---- | ------------------------------- | -| | | - -| `3` | `./examples/cifar10/create_cifar10.sh` | -| ---- | -------------------------------------- | -| | | - -| `1` | `./build/tools/caffe ``time` `\` | -| ---- | -------------------------------- | -| | | +### 测试样例 -| `2` | ` ``--model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \` | -| ---- | ------------------------------------------------------------ | -| | | +对于英特尔®优化Caffe *,我们运行相同的示例,将结果与之前的结果进行比较。 -| `3` | ` ``-iterations 1000` | -| ---- | ------------------------ | -| | | +```bash +cd $CAFFE_ROOT +./data/cifar10/get_cifar10.sh +./examples/cifar10/create_cifar10.sh +``` +```bash +./build/tools/caffe time \ + --model=examples/cifar10/cifar10_full_sigmoid_train_test_bn.prototxt \ + -iterations 1000 +``` - -### Comparison +### 对比 - The results with the above example is the following : +以上示例的结果如下: -Again , the platform used for the test is : Xeon Phi™ 7210 ( 1.3Ghz, 64 Cores ) with 96GB RAM, CentOS 7.2 +同样,用于测试的平台是 : Xeon Phi™ 7210 ( 1.3Ghz, 64 Cores ) with 96GB RAM, CentOS 7.2 -first, let's look at the BVLC Caffe*'s and Intel® Optimized Caffe* together, +首先,我们一起来看看BVLC Caffe *和英特尔®优化Caffe *, ![img](https://software.intel.com/sites/default/files/managed/a1/80/Picture1.png) --> ![img](https://software.intel.com/sites/default/files/managed/ed/6f/Picture2.png) -to make it easy to compare, please see the table below. The duration each layer took in milliseconds has been listed, and on the 5th column we stated how many times Intel® Optimized Caffe* is faster than BVLC Caffe* at each layer. You can observe significant performance improvements except for bn layers relatively. Bn stands for "Batch Normalization" which requires fairly simple calculations with small optimization potential. Bn forward layers show better results and Bn backward layers show 2~3% slower results than the original. 
Worse performance can occur here in result of threading overhead. Overall in total, Intel® Optimized Caffe* achieved about 28 times faster performance in this case. +为了便于比较,请参阅下表。 列出了每层采用的持续时间(以毫秒为单位),在第5列中,我们说明了每层英特尔®优化Caffe *比BVLC Caffe *快多少倍。 除了相对的bn层,您可以观察到显着的性能改进。 Bn代表“批量标准化”,它需要相当简单的计算,具有小的优化潜力。 Bn前向层显示更好的结果,Bn后向层显示比原始结果慢2~3%的结果。 由于线程开销,这里可能会出现更糟糕的性能。 总体而言,在这种情况下,英特尔®优化Caffe *的性能提升了约28倍。 + | | Direction | BVLC (ms) | Intel (ms) | Performance Benefit (x) | | -------- | ---------------- | --------- | ---------- | ----------------------- | @@ -303,35 +246,32 @@ to make it easy to compare, please see the table below. The duration each layer | Ave. | Forward-Backward | 1223.86 | 43.636 | 28.047 | | Total | | 1223860 | 43636 | 28.047 | - -Some of many reasons this optimization was possible are : +这种优化可能的原因有很多: -- Code vectorization for SIMD -- Finding hotspot functions and reducing function complexity and the amount of calculations -- CPU / system specific optimizations -- Reducing thread movements -- Efficient OpenMP* utilization - - + - SIMD的代码矢量化 + - 查找hotspot功能,降低功能复杂性和计算量 + - CPU /系统特定的优化 + - 减少线程移动 + - 高效的OpenMP *利用率 -Additionally, let's compare the VTune results of this example between BVLC Caffe and Intel® Optimized Caffe*. +另外,让我们比较一下BVLC Caffe和英特尔®优化Caffe *之间的VTune结果。 -Simply we will looking at how efficiently im2col_cpu function has been utilized. +我们将简单地研究如何有效地利用im2col_cpu函数。 ![img](https://software.intel.com/sites/default/files/managed/fd/1e/Capture2.PNG) -BVLC Caffe*'s im2col_cpu function had CPI at 0.907 and was single threaded. +BVLC Caffe *的im2col_cpu函数的CPI为0.907,并且是单线程的。 ![img](https://software.intel.com/sites/default/files/managed/4f/19/Capture4.PNG) -In case of Intel® Optimized Caffe* , im2col_cpu has its CPI at 2.747 and is multi threaded by OMP Workers. 
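上表的总加速比可以直接用 Forward-Backward 平均耗时核算(1223.86 ms 对 43.636 ms):

```shell
# 用上表中的平均耗时计算加速比:1223.86 / 43.636
awk 'BEGIN { printf "%.3f\n", 1223.86 / 43.636 }'   # 输出 28.047,与表中数值一致
```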
+对于英特尔®优化Caffe *,im2col_cpu的CPI为2.747,由OMP Workers提供多线程。 -The reason why CPI rate increased here is vectorization which brings higher CPI rate because of longer latency for each instruction and multi-threading which can introduce spinning while waitning for other threads to finish their jobs. However, in this example, benefits from vectorization and multi-threading exceed the latency and overhead and bring performance improvements after all. +这里CPI率增加的原因是矢量化带来了更高的CPI率,因为每条指令的延迟更长,多线程可以在等待其他线程完成工作时spinning。但是,在此示例中,矢量化和多线程的优势超过了延迟和开销,并且毕竟带来了性能改进。 -VTune suggests that CPI rate close to 2.0 is theoretically ideal and for our case, we achieved about the right CPI for the function. The training workload for the Cifar 10 example is to handle 32 x 32 pixel images for each iteration so when those workloads split down to many threads, each of them can be a very small task which may cause transition overhead for multi-threading. With larger images we would see lower spining time and smaller CPI rate. +VTune建议CPI率接近2.0在理论上是理想的,对于我们的情况,我们实现了该函数的正确CPI。 Cifar 10示例的训练工作量是为每次迭代处理32 x 32像素图像,因此当这些工作负载分解为多个线程时,它们中的每一个都可能是一个非常小的任务,这可能导致多线程的转换开销。对于较大的图像,我们会看到较短的spinning时间和较小的CPI率。 -CPU Usage Histogram for the whole process also shows better threading results in this case. +在这种情况下,整个过程的CPU使用率直方图也显示出更好的线程结果。 ![img](https://software.intel.com/sites/default/files/managed/bf/89/Capture3.PNG) @@ -341,9 +281,9 @@ CPU Usage Histogram for the whole process also shows better threading results in -### -### Useful links + +### 推荐链接 BVLC Caffe* Project : [http://caffe.berkeleyvision.org/ ](http://caffe.berkeleyvision.org/) @@ -361,16 +301,14 @@ Intel® Optimized Caffe* Modern Code Techniques : . 
Some performance tests showing that adding optimizations for CPUs yields as much as 82X – see the blog [*Benefits of Intel Optimized Caffe in comparison with BVLC Caffe*.](https://software.intel.com/en-us/articles/comparison-between-intel-optimized-caffe-and-vanilla-caffe-by-intel-vtune-amplifier) -- Torch is a popular framework for deep learning. There is no reason to use the standard Torch on a CPU without applying CPU optimizations. Use the *Intel Software Optimization for Torch* which is dedicated to improving Torch performance when running on CPU, in particular Intel Xeon Scalable processors. It is available from . I’ve been personally using this on Intel processors (I use: install.sh icc off mkl noskip) and on Intel Xeon Phi processors (I use: install.sh icc avx512 mkl noskip). The team is very open to feedback, and has proven responsive to questions and feedback I have offered. -- Theano is an open source Python library, popular with machine learning programmers, to help define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. CPU optimizations are available that improves performance on CPU devices, in particular Intel Xeon Scalable processors and Intel Xeon Phi, and is available at . -- Neon is a Python-based deep learning framework designed for ease of use and extensibility on modern deep neural networks and is committed to best performance on all hardware. Neon was created by Nervana, which was acquired by Intel. Learn more about it, including optimizations on all hardware, at . + - TensorFlow是由Google创建的领先的深度学习和机器学习框架。在处理器方面,Tensorflow进行了优化以适用于Linux作为[可通过pip安装的wheel](https://software.intel.com/en-us/articles/intel-optimized-tensorflow-wheel-now-available)。英特尔性能测试表明,与没有这些性能优化的基本版TensorFlow相比,CPU的性能提升高达72倍。有关实现此功能的优化工作以及性能数据的更多信息,请参阅[*博客文章标题为现代英特尔架构上的TensorFlow优化*](https://software.intel.com/en-us/articles/tensorflow-optimizations-on-modern-intel-architecture). 
+ - Caffe是最受欢迎的图像识别社区应用程序之一。英特尔为优化的分支做出了贡献,该分支致力于在CPU上运行时提高Caffe性能。它可以从获得。一些性能测试表明,为CPU添加优化产生的效果高达82倍 - 请参阅博客[*与BVLC Caffe *相比,英特尔优化Caffe的优势](https://software.intel.com/en-us/articles/comparison-between-intel-optimized-caffe-and-vanilla-caffe-by-intel-vtune-amplifier)。 + - Torch是深度学习的流行框架。没有应用CPU优化,没有理由在CPU上使用标准Torch。使用*Intel Software for Torch*,专门用于在CPU上运行时提高Torch性能,特别是Intel Xeon Scalable处理器。它可以从下载。我自己在英特尔处理器上使用它(我使用:install.sh icc off mkl noskip)和英特尔至强Phi处理器(我使用:install.sh icc avx512 mkl noskip)。团队对反馈非常开放,并且已经证明对我提供的问题和反馈有所回应。 + - Theano是一个开源的Python库,深受机器学习程序员的欢迎,可以帮助定义,优化和评估涉及多维数组的数学表达式。 CPU优化可用于提高CPU设备(尤其是Intel Xeon Scalable处理器和Intel Xeon Phi)的性能,可通过获得。 + - Neon是一个基于Python的深度学习框架,旨在实现现代深度神经网络的易用性和可扩展性,并致力于在所有硬件上实现最佳性能。 Neon由Nervana创建,被英特尔收购。在上了解有关它的更多信息,包括对所有硬件的优化。 -### DEEP LEARNING MATH LIBRARIES +### 深度学习数学库 -- Python, and its libraries, is perhaps *the* most popular basis for machine learning applications. The accelerated version of Python has gained widespread adoption in the last few year – and is available for download directly, or via Conda, or via yum or apt-get, or Docker images. There is no excuse to be running vanilla un-accelerated Python. Every machine that I develop on has these accelerations for Python installed. Look at for all the information you need to know to use it. There is a nice piece titled *Overcome Python Performance Barriers for Machine Learning* in [Parallel Universe Magazine](https://software.intel.com/intel-parallel-universe-magazine), Issue 26, starting on page 33. -- BigDL is a distributed deep learning library for Apache Spark. With BigDL, users can write their deep learning applications as standard Apache Spark programs, which can directly run on top of existing Apache Spark or Hadoop clusters. 
Modeled after Torch, BigDL provides comprehensive support for deep learning, including numeric computing (via Tensor) and high level neural networks; in addition, users can load pre-trained Caffe or Torch models into Spark programs using BigDL. Intel has been reported to claim that processing in BigDL is “orders of magnitude faster than out-of-box open source Caffe, Torch, or TensorFlow on a single-node Xeon processor (i.e., comparable with mainstream GPU).” It is available from . There is also a nice article on BigDL in [Parallel Universe Magazine](https://software.intel.com/intel-parallel-universe-magazine), Issue 28, starting on page 57. -- MXNet is an open-source, deep learning framework available from . -- Intel MKL-DNN is an open source, performance-enhancing library for accelerating deep learning frameworks on CPUs with information with the [Intel MKL-DNN Overview blog](https://software.intel.com/articles/intel-mkl-dnn-part-1-library-overview-and-installation.). + - Python及其库可能是机器学习应用程序最常用的基础。 Python的加速版本在过去几年中得到了广泛采用 - 可直接下载,或通过Conda,或通过yum或apt-get或Docker镜像下载。没有理由不提高Python的性能。我开发的每台机器都安装了Python的这些加速功能。查看,了解使用它需要了解的所有信息。在[Parallel Universe Magazine](https://software.intel.com/intel-parallel-universe-magazine),第26期,从第33页开始,有一篇名为*克服机器学习的Python性能障碍的好文章*。 + - BigDL是Apache Spark的分布式深度学习库。使用BigDL,用户可以将他们的深度学习应用程序编写为标准的Apache Spark程序,它可以直接在现有的Apache Spark或Hadoop集群上运行。以Torch为模型,BigDL为深度学习提供全面支持,包括数值计算(通过Tensor)和高级神经网络;此外,用户可以使用BigDL将预先训练的Caffe或Torch模型加载到Spark程序中。据报道,英特尔声称BigDL处理在单节点Xeon处理器上比开箱即用的开源Caffe,Torch或TensorFlow快几个数量级(即与主流GPU相当)。“它可用来自。在[Parallel Universe Magazine](https://software.intel.com/intel-parallel-universe-magazine),第28期,从第57页开始,还有一篇关于BigDL的好文章。 + - MXNet是一个开源的深度学习框架,可从获得。 + - 英特尔MKL-DNN是一个开源的,性能增强的库,用于加速CPU上的深度学习框架,其中包含[英特尔MKL-DNN概述博客](https://software.intel.com/articles/intel-mkl-dnn-part-1-library-overview-and-installation.)。 -In addition to the frameworks and libraries noted above, the Intel Data Analytics Acceleration Library (DAAL) is an open 
source library of optimized algorithmic building blocks for data analysis stages most commonly associated with solving Big Data problems. The library is designed for use popular data platforms including Hadoop, Spark, R, and Matlab. It is available from . There is also a good article in *Parallel Universe Magazine*, Issue 28, starting on page 26, titled *Solving Real-World Machine Learning Problems with Intel Data Analytics Acceleration Library*.
+除了上面提到的框架和库之外,英特尔数据分析加速库(DAAL)是一个开源的优化算法构建模块库,面向解决大数据问题时最常见的数据分析阶段。该库可用于流行的数据平台,包括Hadoop、Spark、R和Matlab。它可以从获得。在*Parallel Universe Magazine*第28期(从第26页开始)还有一篇很好的文章,标题为*使用英特尔数据分析加速库解决现实世界的机器学习问题*。

-### WHAT IF WE ONLY DO MACHINE LEARNING?
+### 如果我们只做机器学习怎么办?

-While Intel Xeon Scalable processors may be the best solution when we justify a server supporting a variety of workloads, what if we want to take a leap and buy a “machine learning only” server or supercomputer?
+虽然在需要一台支持多种工作负载的服务器时,英特尔至强可扩展处理器可能是最佳解决方案,但如果我们想更进一步,购买一台“仅用于机器学习”的服务器或超级计算机呢?

-My best advice “be sure you really know what you need” and be aware that things are really changing in the field. I do not mean to dissuade any one, but it is difficult to guess all the options we will have even a year from now. I have no doubt that the reality is that accelerators for machine learning will shift from GPUs to FPGAs, ASICs, and products with ‘neural’ in their descriptions. The CPU of choice in all these solutions where you have to support a variety of workloads will remain Intel Xeon processors.
+我最好的建议是“确保你真的知道自己需要什么”,并且要意识到这个领域正在发生真正的变化。我无意劝阻任何人,但即使只是一年之后我们会有哪些选择,现在也很难全部预料。我毫不怀疑,机器学习加速器将从GPU转向FPGA、ASIC以及各种描述中带有“neural”字样的产品。而在所有这些需要支持多种工作负载的解决方案中,首选的CPU仍将是Intel Xeon处理器。

-Choices for accelerators are getting more diverse. High-core count CPUs (the Intel Xeon Phi processors – in particular the upcoming “Knights Mill” version), and FPGAs (Intel Xeon processors coupled with Intel/Altera FPGAs), offer highly flexible options excellent price/performance and power efficiencies.
An Intel Xeon Phi processor-based system can train, or learn an AlexNet image classification system, up to 2.3 times faster than a similarly configured system using Nvidia GPUs. (see *Inside Intel: The Race for Faster Machine Learning*). Intel has shown that the Intel Xeon Phi Processor delivers up to nine times more performance per dollar versus a hosted GPU solution, and up to eight times more performance per watt. Coming soon are more products that are purpose built for AI from Intel Nervana.
+加速器的选择越来越多样化。高核数CPU(Intel Xeon Phi处理器,特别是即将推出的“Knights Mill”版本)和FPGA(Intel Xeon处理器与Intel/Altera FPGA结合)提供了高度灵活的选项,具有出色的性价比和能效。基于英特尔至强融核(Xeon Phi)处理器的系统训练(学习)AlexNet图像分类系统,速度最高可比使用Nvidia GPU的类似配置系统快2.3倍。(参见*英特尔内部:更快的机器学习竞赛*)。英特尔的数据表明,与托管GPU解决方案相比,英特尔至强融核处理器的每美元性能最高可达9倍,每瓦性能最高可达8倍。英特尔Nervana还将很快推出更多专为AI打造的产品。

-It’s an exciting time to be a computer geek, and machine learning is nothing if it is not fun. It is great to see all the options available to build super-fast machines for machine learning.
+对计算机极客而言,这是一个激动人心的时代;机器学习如果不有趣,就什么都不是。很高兴看到有这么多方案可供选择,用来构建机器学习专用的超快速机器。

-### FOUNDATION FOR MACHINE LEARNING
+### 机器学习基础

-The Xeon SP processors, particularly the Platinum processors, offer outstanding performance for machine learning, while giving us more versatility than any other solution. If and when we are ready to add acceleration, Intel Xeon Scalable processors still serve as the core of a versatile system with accelerators – and the choice of what those accelerators can be is growing quickly. Either way, relying on Skylake processors and their excellent support for machine learning gives us the best combination of performance and versatility in one package.
+Xeon SP处理器,特别是Platinum处理器,为机器学习提供了出色的性能,同时比任何其他解决方案都更具通用性。一旦我们准备好添加加速器,英特尔至强可扩展处理器仍可作为带加速器的多功能系统的核心,而且可选加速器的种类正在快速增长。无论哪种方式,依靠Skylake处理器及其对机器学习的出色支持,都能在一套方案中同时获得性能和通用性的最佳组合。

-Learn more:
+了解更多:

- [*Inside Intel: The Race for Faster Machine Learning*](https://www.intel.com/content/www/us/en/analytics/machine-learning/the-race-for-faster-machine-learning.html)
- Intel’s official site for information on deep learning frameworks and optimization available to ensure top CPU performance:

@@ -76,4 +76,4 @@ Learn more:

- MXNet is an open-source, deep learning framework available from .
- Neon – all platform optimized – 

-*James Reinders is an independent consultant in high performance computing and parallel programming. Reinders was most recently the parallel programming model architect for Intel’s HPC business, and was a key contributor to the design and implementation of the ASCI Red and Tianhe-2A massively parallel supercomputers.* \ No newline at end of file
+*James Reinders是高性能计算和并行编程领域的独立顾问。Reinders此前是英特尔HPC业务的并行编程模型架构师,并且是ASCI Red和Tianhe-2A大规模并行超级计算机设计与实现的关键贡献者。* \ No newline at end of file

diff --git "a/20171018 \347\254\25416\346\234\237/Why SQL is beating NoSQL, and what this means for the future of data.md" "b/20171018 \347\254\25416\346\234\237/Why SQL is beating NoSQL, and what this means for the future of data.md"
index d9a0bae..e8b7e22 100644
--- "a/20171018 \347\254\25416\346\234\237/Why SQL is beating NoSQL, and what this means for the future of data.md"
+++ "b/20171018 \347\254\25416\346\234\237/Why SQL is beating NoSQL, and what this means for the future of data.md"
@@ -1,8 +1,8 @@
-# Why SQL is beating NoSQL, and what this means for the future of data
+# 为什么SQL打败NoSQL,这对未来的数据意味着什么

 原文链接:[Why SQL is beating NoSQL, and what this means for the future of data](Why SQL is beating NoSQL, and what this means for the future of data)

-*After years of being left for dead, SQL today is making a comeback. How come?
And what effect will this have on the data community?*
+*在被宣判死亡多年之后,SQL如今正卷土重来。这是怎么回事?这会对数据社区产生什么影响?*

*(Update: #1 on Hacker News!* [*Read the discussion here.*](https://news.ycombinator.com/item?id=15335717)*)*

@@ -12,41 +12,37 @@

![img](https://cdn-images-1.medium.com/max/2000/1*HMEoq1e2RNxSwiQo_RL6tw.gif)

-**SQL awakens to fight the dark forces of NoSQL**
+**SQL觉醒,迎战NoSQL的黑暗势力**

-Since the dawn of computing, we have been collecting exponentially growing amounts of data, constantly asking more from our data storage, processing, and analysis technology. In the past decade, this caused software developers to cast aside SQL as a relic that couldn’t scale with these growing data volumes, leading to the rise of NoSQL: MapReduce and Bigtable, Cassandra, MongoDB, and more.
+自计算诞生以来,我们收集的数据呈指数级增长,对数据存储、处理和分析技术的要求也不断提高。在过去十年中,这使得软件开发人员把SQL当作无法随不断增长的数据量扩展的遗物弃之一旁,从而催生了NoSQL的兴起:MapReduce和Bigtable、Cassandra、MongoDB等等。

-Yet today SQL is resurging. All of the major cloud providers now offer popular managed relational database services: e.g., [Amazon RDS](https://aws.amazon.com/rds/), [Google Cloud SQL](https://cloud.google.com/sql/docs/), [Azure Database for PostgreSQL](https://azure.microsoft.com/en-us/services/postgresql/) (Azure launched just this year). In Amazon’s own words, its PostgreSQL- and MySQL-compatible database Aurora database product has been the “[fastest growing service in the history of AWS](http://www.businesswire.com/news/home/20161130006131/en/AWS-Extends-Amazon-Aurora-PostgreSQL-Compatibility)”. SQL interfaces on top of Hadoop and Spark continue to thrive. And just last month, [Kafka launched SQL support](https://www.confluent.io/blog/ksql-open-source-streaming-sql-for-apache-kafka/). Your humble authors themselves are developers of a new [time-series database](https://github.com/timescale/timescaledb) that fully embraces SQL. 
+然而今天SQL正在复苏。所有主要的云提供商现在都提供流行的托管关系数据库服务:例如 [Amazon RDS](https://aws.amazon.com/rds/)、[Google Cloud SQL](https://cloud.google.com/sql/docs/)、[Azure Database for PostgreSQL](https://azure.microsoft.com/en-us/services/postgresql/)(Azure的这项服务今年刚刚推出)。用亚马逊自己的话来说,其兼容PostgreSQL和MySQL的Aurora数据库产品一直是“[AWS历史上发展最快的服务](http://www.businesswire.com/news/home/20161130006131/en/AWS-Extends-Amazon-Aurora-PostgreSQL-Compatibility)”。Hadoop和Spark之上的SQL接口继续蓬勃发展。就在上个月,[Kafka推出了SQL支持](https://www.confluent.io/blog/ksql-open-source-streaming-sql-for-apache-kafka/)。而本文笔者正是一个完全拥抱SQL的新型[时间序列数据库](https://github.com/timescale/timescaledb)的开发者。

-In this post we examine why the pendulum today is swinging back to SQL, and what this means for the future of the data engineering and analysis community.
-
------
-
-### Part 1: A New Hope
-
-To understand why SQL is making a comeback, let’s start with why it was designed in the first place.
+在这篇文章中,我们将探讨为什么今天的钟摆又摆回了SQL,以及这对数据工程与分析社区的未来意味着什么。
+### 第一部分: 新希望
+为了理解SQL为什么能卷土重来,让我们先回顾一下当初设计它的原因。

![img](https://cdn-images-1.medium.com/max/1600/0*fAiBMwVRHoAPwLL7.)

-**Like all good stories, ours starts in the 1970s**
+**像所有好故事一样,我们的故事始于20世纪70年代**

-Our story starts at IBM Research in the early 1970s, where the relational database was born. At that time, query languages relied on complex mathematical logic and notation. Two newly minted PhDs, Donald Chamberlin and Raymond Boyce, were impressed by the relational data model but saw that the query language would be a major bottleneck to adoption. 
They set out to design a new query language that would be (in their own words): “[more accessible to users without formal training in mathematics or computer programming](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6359709).”
+我们的故事始于20世纪70年代早期的IBM Research,关系数据库诞生于此。 那时,查询语言依赖于复杂的数学逻辑和符号。 两位刚刚获得博士学位的年轻人Donald Chamberlin和Raymond Boyce对关系数据模型印象深刻,但发现查询语言将成为其获得采用的主要瓶颈。 于是他们着手设计一种新的查询语言,(用他们自己的话说)使其“[让没有受过正规数学或计算机编程训练的用户更容易使用](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6359709)。”

![img](https://cdn-images-1.medium.com/max/1600/0*Y5w_pCl0K9Fo9AF8.)

-**Query languages before SQL ( a, b ) vs SQL ( c ) (**[**source**](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6359709)**)**
+**SQL之前的查询语言(a、b)与SQL(c)的对比 (**[**source**](http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6359709)**)**

-Think about this. Way before the Internet, before the Personal Computer, when the programming language C was first being introduced to the world, two young computer scientists realized that, “[much of the success of the computer industry depends on developing a class of users other than trained computer specialists.](http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf)” They wanted a query language that was as easy to read as English, and that would also encompass database administration and manipulation.
+想想看。早在互联网出现之前,在个人计算机出现之前,在C语言刚刚问世之时,两位年轻的计算机科学家就意识到,“[计算机行业的成功很大程度上取决于培养出受过训练的计算机专家之外的另一类用户。](http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf)” 他们想要一种像英语一样容易阅读的查询语言,同时还能涵盖数据库的管理和操作。

-The result was SQL, first introduced to the world in 1974. Over the next few decades, SQL would prove to be immensely popular. As relational databases like System R, Ingres, DB2, Oracle, SQL Server, PostgreSQL, MySQL (and more) took over the software industry, SQL became established as the preeminent language for interacting with a database, and became the *lingua franca* for an increasingly crowded and competitive ecosystem. 
+
+其成果就是SQL,它于1974年首次问世。在接下来的几十年中,SQL被证明广受欢迎。 随着System R、Ingres、DB2、Oracle、SQL Server、PostgreSQL、MySQL等关系数据库接管软件行业,SQL确立了自己作为与数据库交互的首选语言的地位,并成为日益拥挤、竞争激烈的生态系统中的*通用语言*。

-(Sadly, Raymond Boyce never had a chance to witness SQL’s success. [He died of a brain aneurysm](https://en.wikipedia.org/wiki/Raymond_F._Boyce) 1 month after giving one of the earliest SQL presentations, just 26 years of age, leaving behind a wife and young daughter.)
+(可悲的是,Raymond Boyce没能亲眼见证SQL的成功。在做了最早的SQL演讲之一后一个月,[他死于脑动脉瘤](https://en.wikipedia.org/wiki/Raymond_F._Boyce),年仅26岁,留下了妻子和年幼的女儿。)

-For a while, it seemed like SQL had successfully fulfilled its mission. But then the Internet happened.
+有一段时间,SQL似乎已成功完成了它的使命。 但随后,互联网出现了。

@@ -54,163 +50,161 @@ For a while, it seemed like SQL had successfully fulfilled its mission. But then

------

-### Part 2: NoSQL Strikes Back
-
-While Chamberlin and Boyce were developing SQL, what they didn’t realize is that a second group of engineers in California were working on another budding project that would later widely proliferate and threaten SQL’s existence. That project was [ARPANET](https://en.wikipedia.org/wiki/ARPANET), and on October 29, 1969, [it was born](http://all-that-is-interesting.com/internet-history).
-
+### 第二部分: NoSQL反击
+就在Chamberlin和Boyce开发SQL的同时,他们没有意识到,加利福尼亚的另一组工程师正在研究另一个新兴项目,该项目后来广泛扩散,并威胁到SQL的存在。 这个项目就是[ARPANET](https://en.wikipedia.org/wiki/ARPANET),它于1969年10月29日[诞生](http://all-that-is-interesting.com/internet-history)。

![img](https://cdn-images-1.medium.com/max/1600/0*L-W7e8jSXtgdWSXu.)

-**Some of the creators of ARPANET, which eventually evolved into today’s Internet (**[**source**](http://all-that-is-interesting.com/internet-history)**)**
+**ARPANET的部分创造者;ARPANET最终演变成了今天的互联网 (**[**source**](http://all-that-is-interesting.com/internet-history)**)**

-But SQL was actually fine until another engineer showed up and invented the [World Wide Web](https://en.wikipedia.org/wiki/World_Wide_Web), in 1989. 
+
+但是,在1989年另一位工程师出现并发明[万维网](https://en.wikipedia.org/wiki/World_Wide_Web)之前,SQL实际上一直运转良好。

![img](https://cdn-images-1.medium.com/max/1600/0*6kZJR84blb_BkDxc.)

-**The physicist who invented the Web (**[**source**](https://webfoundation.org/about/vision/history-of-the-web/)**)**
+**发明万维网的物理学家 (**[**source**](https://webfoundation.org/about/vision/history-of-the-web/)**)**

-Like a weed, the Internet and Web flourished, massively disrupting our world in countless ways, but for the data community it created one particular headache: new sources generating data at much higher volumes and velocities than before.
+就像杂草一样,互联网和万维网蓬勃生长,以无数种方式大规模地颠覆了我们的世界,但对数据社区来说,它带来了一个特别令人头疼的问题:新的数据源正以远超以往的数量和速度产生数据。

-As the Internet continued to grow and grow, the software community found that the relational databases of that time couldn’t handle this new load. *There was a disturbance in the force, as if a million databases cried out and were suddenly overloaded.*
+随着互联网的不断发展和壮大,软件界发现当时的关系数据库无法处理这种新的负载。 *原力中出现了一阵扰动,仿佛数百万个数据库齐声呐喊,又突然不堪重负。*

-Then two new Internet giants made breakthroughs, and developed their own distributed non-relational systems to help with this new onslaught of data: **MapReduce** ([published 2004](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf)) and **Bigtable** ([published 2006](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf)) by Google, and **Dynamo** ([published 2007](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)) by Amazon. These seminal papers led to even more non-relational databases, including **Hadoop** (based on the MapReduce paper, [2006](https://en.wikipedia.org/wiki/Apache_Hadoop)), **Cassandra** (heavily inspired by both the Bigtable and Dynamo papers, [2008](https://en.wikipedia.org/wiki/Apache_Cassandra)) and **MongoDB** ([2009](https://en.wikipedia.org/wiki/MongoDB)). 
Because these were new systems largely written from scratch, they also eschewed SQL, leading to the rise of the NoSQL movement.
+随后,两家新兴互联网巨头取得了突破,开发出各自的分布式非关系型系统来应对这股新的数据洪流:谷歌的 **MapReduce**([published 2004](https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf))与 **Bigtable**([published 2006](https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf)),以及Amazon的 **Dynamo**([published 2007](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf))。这些开创性的论文催生了更多非关系型数据库,包括 **Hadoop**(基于MapReduce论文,[2006](https://en.wikipedia.org/wiki/Apache_Hadoop))、**Cassandra**(深受Bigtable和Dynamo两篇论文的启发,[2008](https://en.wikipedia.org/wiki/Apache_Cassandra))和 **MongoDB**([2009](https://en.wikipedia.org/wiki/MongoDB))。由于这些都是基本上从零编写的新系统,它们也避开了SQL,从而引发了NoSQL运动的兴起。

-And boy did the software developer community eat up NoSQL, embracing it arguably much more broadly than the original Google/Amazon authors intended. It’s easy to understand why: NoSQL was new and shiny; it promised scale and power; it seemed like the fast path to engineering success. But then the problems started appearing.
+软件开发者社区也确实对NoSQL趋之若鹜,其接受范围可以说远超谷歌/亚马逊原作者们的预期。 原因很容易理解:NoSQL新颖而耀眼;它承诺了可扩展性与强大能力;它看上去是通往工程成功的快车道。 但随后问题开始显现。

![img](https://cdn-images-1.medium.com/max/1600/0*G6Hx2C1l9abkVkxq.)

-**Classic software developer tempted by NoSQL. Don’t be this guy.**
+**被NoSQL诱惑的典型软件开发者。不要成为这样的人。**

-Developers soon found that not having SQL was actually quite limiting. Each NoSQL database offered its own unique query language, which meant: more languages to learn (and to teach to your coworkers); increased difficulty in connecting these databases to applications, leading to tons of brittle glue code; a lack of a third party ecosystem, requiring companies to develop their own operational and visualization tools. 
+
+开发人员很快发现,没有SQL实际上带来很大的局限。每个NoSQL数据库都提供了自己独特的查询语言,这意味着:要学习(并向同事传授)更多语言;将这些数据库连接到应用程序的难度增加,产生大量脆弱的胶水代码;缺乏第三方生态系统,迫使公司自行开发运维和可视化工具。

-These NoSQL languages, being new, were also not fully developed. For example, there had been years of work in relational databases to add necessary features to SQL (e.g., JOINs); the immaturity of NoSQL languages meant more complexity was needed at the application level. The lack of JOINs also led to denormalization, which led to data bloat and rigidity.
+这些NoSQL语言由于诞生不久,也尚未发展成熟。例如,关系数据库界曾花费多年时间为SQL添加必要的功能(例如JOIN);NoSQL语言的不成熟意味着应用层需要承担更多复杂性。缺少JOIN还导致了反规范化,进而造成数据膨胀和僵化。

-Some NoSQL databases added their own “SQL-like” query languages, like Cassandra’s CQL. But this often made the problem worse. Using an interface that is *almost* identical to something more common actually created more mental friction: engineers didn’t know what was supported and what wasn’t.
+一些NoSQL数据库添加了自己的“类SQL”查询语言,如Cassandra的CQL。但这往往使问题变得更糟。使用一个与更常见事物*几乎*一模一样的接口,实际上制造了更多的认知摩擦:工程师们搞不清哪些功能受支持、哪些不受支持。

![img](https://cdn-images-1.medium.com/max/1600/0*NxNoLnTnFQ7LkqBj.)

-**SQL-like query languages are like the** [**Star Wars Holiday Special**](https://www.youtube.com/watch?v=ZX0x-I06Fpc)**. Accept no imitations.** [*(And always avoid the Star Wars Holiday Special.)*](https://xkcd.com/653/)
+**类SQL查询语言就像** [**Star Wars Holiday Special**](https://www.youtube.com/watch?v=ZX0x-I06Fpc)**。不要接受任何仿制品。** [*(也永远不要看Star Wars Holiday Special。)*](https://xkcd.com/653/)

-Some in the community saw the problems with NoSQL early on (e.g., [DeWitt and Stonebraker in 2008](https://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html)). 
随着时间的推移,凭借亲身经历换来的惨痛教训,越来越多的软件开发人员加入了他们的行列。

[**Time-series data: Why (and how) to use a relational database instead of NoSQL** *Contrary to the belief of most developers, we show that relational databases can be made to scale for time-series data.*blog.timescale.com](https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c)

------

-### Part 3: Return of the SQL
+### 第三部分:回归SQL

![img](https://cdn-images-1.medium.com/max/1600/1*QsZLtPL0t9bspQ16fpmeLA.gif)

-Initially seduced by the dark side, the software community began to see the light and come back to SQL.
+在最初被黑暗面诱惑之后,软件界开始看到光明,回归SQL。

-First came the SQL interfaces on top of Hadoop (and later, Spark), leading the industry to “back-cronym” NoSQL to “Not Only SQL” (yeah, nice try).
+首先出现的是Hadoop(以及后来的Spark)之上的SQL接口,这促使业界给NoSQL“反向造词”,重新解释为“Not Only SQL”(是啊,想得美)。

-Then came the rise of NewSQL: new scalable databases that fully embraced SQL. **H-Store** [(published 2008](http://hstore.cs.brown.edu/papers/hstore-demo.pdf)) from MIT and Brown researchers was one of the first scale-out OLTP databases. Google again led the way for a geo-replicated SQL-interfaced database with their first **Spanner** paper [(published 2012](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf)) (whose authors include the original MapReduce authors), followed by other pioneers like **CockroachDB** ([2014](https://en.wikipedia.org/wiki/Cockroach_Labs)).
+然后是NewSQL的兴起:完全拥抱SQL的新型可扩展数据库。 出自麻省理工学院和布朗大学研究人员的**H-Store** [(published 2008](http://hstore.cs.brown.edu/papers/hstore-demo.pdf))是最早的横向扩展OLTP数据库之一。 Google则凭借第一篇**Spanner** 论文 [(published 2012](https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf))(其作者包括最初的MapReduce作者)再次引领了地理复制、SQL接口数据库的潮流,随后是**CockroachDB** ([2014](https://en.wikipedia.org/wiki/Cockroach_Labs))等其他先驱。
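前文提到,缺少 JOIN 曾迫使 NoSQL 应用走向反规范化;而这些重新拥抱 SQL 的系统,其核心价值正是标准化的关系查询能力。下面是一个极简的 Python + SQLite 草图(表名与数据均为虚构的假设示例),演示规范化数据加上 JOIN 如何避免把作者信息冗余复制进每条记录:

```python
import sqlite3

# 虚构示例:规范化的两张表,作者信息只存一份
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts   (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Lin');
    INSERT INTO posts   VALUES (10, 1, 'Hello SQL'), (11, 1, 'JOINs 101'), (12, 2, 'NoSQL notes');
""")

# JOIN 在查询时组合数据;若没有 JOIN,只能把 name 冗余复制进每一行 posts,
# 一旦作者改名就要更新所有相关行——这正是文中所说的数据膨胀与僵化
rows = conn.execute("""
    SELECT posts.title, authors.name
    FROM posts JOIN authors ON posts.author_id = authors.id
    ORDER BY posts.id
""").fetchall()
print(rows)  # [('Hello SQL', 'Ada'), ('JOINs 101', 'Ada'), ('NoSQL notes', 'Lin')]
```

同样的 `SELECT … JOIN` 写法在 PostgreSQL、MySQL 等关系数据库中基本可以原样运行,这正是标准化接口的价值所在。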
-At the same time, the **PostgreSQL** community began to revive, adding critical improvements like a JSON datatype (2012), and a potpourri of new features in [PostgreSQL 10](https://wiki.postgresql.org/wiki/New_in_postgres_10): better native support for partitioning and replication, full text search support for JSON, and more (release slated for later this year). Other companies like **CitusDB** ([2016](https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes-open-source/)) and yours truly ([**TimescaleDB**](https://github.com/timescale/timescaledb), [released this year](https://blog.timescale.com/when-boring-is-awesome-building-a-scalable-time-series-database-on-postgresql-2900ea453ee2)) found new ways to scale PostgreSQL for specialized data workloads.
+与此同时,**PostgreSQL** 社区开始复兴,加入了JSON数据类型(2012)等关键改进,以及[PostgreSQL 10](https://wiki.postgresql.org/wiki/New_in_postgres_10)中的一系列新功能:更好的分区和复制原生支持、对JSON的全文搜索支持等(该版本计划于今年晚些时候发布)。 其他公司,如**CitusDB** ([2016](https://www.citusdata.com/blog/2016/03/24/citus-unforks-goes-open-source/))以及笔者所在的([**TimescaleDB**](https://github.com/timescale/timescaledb), [released this year](https://blog.timescale.com/when-boring-is-awesome-building-a-scalable-time-series-database-on-postgresql-2900ea453ee2)),则找到了针对专门数据工作负载扩展PostgreSQL的新方法。

![img](https://cdn-images-1.medium.com/max/1600/1*iGyZFQzaXJwP6gPAjqdgwQ.png)

-In fact, our journey developing [**TimescaleDB**](https://github.com/timescale/timescaledb) closely mirrors the path the industry has taken. Early internal versions of [TimescaleDB](http://www.timescale.com/) featured our own SQL-like query language called “ioQL.” Yes, we too were tempted by the dark side: building our own query language felt powerful. But while it seemed like the easy path, we soon realized that we’d have to do a lot more work: e.g., deciding syntax, building various connectors, educating users, etc. 
We also found ourselves constantly looking up the proper syntax to queries that we could already express in SQL, for a query language we had written ourselves!
+事实上,我们开发[**TimescaleDB**](https://github.com/timescale/timescaledb)的历程恰恰映照了整个行业走过的路。 [TimescaleDB](http://www.timescale.com/)的早期内部版本使用了我们自己的类SQL查询语言“ioQL”。是的,我们也曾受到黑暗面的诱惑:构建自己的查询语言让人感觉很强大。但尽管这看似一条轻松的路,我们很快意识到还有大量额外工作要做:例如决定语法、构建各种连接器、教育用户等。我们甚至发现,对于自己亲手编写的查询语言,我们还得不断查阅那些本可以直接用SQL表达的查询的正确语法!

-One day we realized that building our own query language made no sense. That the key was to embrace SQL. And that was one of the best design decisions we have made. Immediately a whole new world opened up. Today, even though we are just a 5 month old database, our users can use us in production and get all kinds of wonderful things out of the box: visualization tools (Tableau), connectors to common ORMs, a variety of tooling and backup options, an abundance of tutorials and syntax explanations online, etc.
+有一天,我们意识到构建自己的查询语言毫无意义,关键是要拥抱SQL。这是我们做出的最佳设计决策之一,一个全新的世界随即打开。今天,即使我们只是一个诞生5个月的数据库,我们的用户也可以在生产环境中使用它,并获得各种开箱即用的精彩内容:可视化工具(Tableau)、常见ORM的连接器、各种工具和备份选项,以及网上大量的教程和语法讲解等。

[**Eye or the Tiger: Benchmarking Cassandra vs. TimescaleDB for time-series data** *How a 5 node TimescaleDB cluster outperforms 30 Cassandra nodes, with higher inserts, up to 5800x faster queries, 10%…*blog.timescale.com](https://blog.timescale.com/time-series-data-cassandra-vs-timescaledb-postgresql-7c2cc50a89ce)

------

-### But don’t take our word for it. Take Google’s.
+### 但不要只听我们的一面之词,看看Google的做法。

![img](https://cdn-images-1.medium.com/max/1600/1*CiKNT6_V8VH5hRVoWNcIHA.png)

-Google has clearly been on the leading edge of data engineering and infrastructure for over a decade now. It behooves us to pay close attention to what they are doing. 
+
+十多年来,谷歌显然一直处于数据工程和基础设施的前沿。 我们应该密切关注他们正在做的事情。

-Take a look at Google’s second major **Spanner** paper, released just four months ago ([Spanner: Becoming a SQL System](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46103.pdf), May 2017), and you’ll find that it bolsters our independent findings.
+看一下谷歌四个月前刚刚发布的第二篇重要**Spanner**论文([Spanner: Becoming a SQL System](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46103.pdf),2017年5月),你会发现它印证了我们的独立发现。

-For example, Google began building on top of Bigtable, but then found that the lack of SQL created problems (emphasis in all quotes below ours):
+例如,Google最初在Bigtable之上构建系统,但后来发现缺少SQL会产生问题(以下所有引文中的着重标记均为笔者所加):

-> “While these systems provided some of the benefits of a database system, they lacked many traditional database features that application developers often rely on. **A key example is a robust query language**, meaning that developers had to write complex code to process and aggregate the data in their applications. **As a result, we decided to turn Spanner into a full featured SQL system**, with query execution tightly integrated with the other architectural features of Spanner (such as strong consistency and global replication).”
+> “虽然这些系统提供了数据库系统的一些优点,但它们缺少应用程序开发人员经常依赖的许多传统数据库功能。 **一个关键的例子是强大的查询语言**,这意味着开发人员必须编写复杂的代码来处理和聚合应用程序中的数据。 **因此,我们决定将Spanner变成一个功能齐全的SQL系统**,查询执行与Spanner的其他架构特性(例如强一致性和全局复制)紧密集成。”

-Later in the paper they further capture the rationale for their transition from NoSQL to SQL:
+在论文的后面,他们进一步阐述了从NoSQL转向SQL的理由:

-> The original API of Spanner provided NoSQL methods for point lookups and range scans of individual and interleaved tables. While NoSQL methods provided a simple path to launching Spanner, and continue to be useful in simple retrieval scenarios, **SQL has provided significant additional value in expressing more complex data access patterns and pushing computation to the data**. 
+
+> Spanner的原始API为单个表和交错表的点查找和范围扫描提供了NoSQL方法。 虽然NoSQL方法为Spanner的推出提供了一条简单路径,并且在简单的检索场景中仍然有用,但**SQL在表达更复杂的数据访问模式和将计算下推到数据方面提供了显著的附加价值**。

-The paper also describes how the adoption of SQL doesn’t stop at Spanner, but actually extends across the rest of Google, where multiple systems today share a common SQL dialect:
+论文还描述了SQL的采用如何并未止步于Spanner,而是实际上扩展到了Google的其他部分,如今那里的多个系统共享同一种SQL方言:

-> **Spanner’s SQL engine shares a common SQL dialect, called “Standard SQL”,** with several other systems at Google including internal systems such as F1 and Dremel (among others), and external systems such as BigQuery…
+> **Spanner的SQL引擎与谷歌的其他几个系统共享一种通用的SQL方言,称为“标准SQL(Standard SQL)”,**其中包括F1和Dremel等内部系统,以及BigQuery等外部系统……

-> **For users within Google, this lowers the barrier of working across the systems.** A developer or data analyst who writes SQL against a Spanner database can transfer their understanding of the language to Dremel without concern over subtle differences in syntax, NULL handling, etc.
+> **对于Google内部的用户而言,这降低了跨系统工作的门槛。**针对Spanner数据库编写SQL的开发人员或数据分析师,可以把他们对这门语言的理解直接迁移到Dremel,而无需担心语法、NULL处理等方面的细微差别。

-The success of this approach speaks for itself. Spanner is already the *“source of truth”* for major Google systems, including AdWords and Google Play, while *“Potential Cloud customers are overwhelmingly interested in using SQL.”*
+这种方法的成功不言自明。 Spanner已经是包括AdWords和Google Play在内的Google主要系统的*“事实来源(source of truth)”*,同时*“潜在的云客户对使用SQL表现出压倒性的兴趣。”*

-Considering that Google helped initiate the NoSQL movement in the first place, it is quite remarkable that it is embracing SQL today. (Leading some to recently wonder: “[Did Google Send the Big Data Industry on a 10 Year Head Fake?](https://medium.com/@garyorenstein/did-google-send-the-big-data-industry-on-a-10-year-head-fake-9c94d553925a)”.)
+考虑到当初正是谷歌帮助发起了NoSQL运动,它如今拥抱SQL就显得格外引人注目。 (这让一些人最近不禁发问: “[Did Google Send the Big Data Industry on a 10 Year Head Fake?](https://medium.com/@garyorenstein/did-google-send-the-big-data-industry-on-a-10-year-head-fake-9c94d553925a)”。)
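为了具体体会“共享同一种SQL方言、降低跨系统门槛”的含义,下面给出一个极简的 Python 草图(纯属示意,表名与数据均为虚构):同一段不依赖特定引擎扩展语法的 SQL 文本,可以交给任何兼容 DB-API 的数据库连接执行,这里以内存中的 SQLite 为例:

```python
import sqlite3

# 同一段标准SQL文本,刻意避免使用特定引擎的方言扩展
QUERY = "SELECT dept, COUNT(*) FROM employees GROUP BY dept ORDER BY dept"

def run_report(conn):
    """对任意兼容 DB-API 的连接执行同一段 SQL(示意)。"""
    cur = conn.cursor()
    cur.execute(QUERY)
    return cur.fetchall()

# 以内存 SQLite 作演示;换成其他驱动的连接,QUERY 文本无需改动
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Ada", "ENG"), ("Grace", "ENG"), ("Edgar", "OPS")])
print(run_report(conn))  # [('ENG', 2), ('OPS', 1)]
```

当然,正如文中所言,各系统的SQL方言之间仍可能存在细微差异;这里只是说明“一次学习、到处迁移”这一接口价值。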
------

-### What this means for the future of data: SQL as the universal interface
+### 这对数据的未来意味着什么:SQL作为通用接口

-In computer networking, there is a concept called the “[narrow waist](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.4614&rep=rep1&type=pdf),” describing a universal interface.
+在计算机网络中,有一个称为“[narrow waist](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1.4614&rep=rep1&type=pdf)”(细腰)的概念,用来描述通用接口。

-This idea emerged to solve a key problem: On any given networked device, imagine a stack, with layers of hardware at the bottom and layers of software on top. There can exist a variety of networking hardware; similarly there can exist a variety of software and applications. One needs a way to ensure that no matter the hardware, the software can still connect to the network; and no matter the software, that the networking hardware knows how to handle the network requests.
+这个想法的出现是为了解决一个关键问题:在任何给定的联网设备上,想象一个堆栈,底层是各种硬件,顶层是各种软件。 网络硬件可以多种多样;同样,软件和应用程序也可以多种多样。 人们需要一种方法来确保:无论硬件如何,软件仍然可以连接到网络;无论软件如何,网络硬件都知道如何处理网络请求。

![img](https://cdn-images-1.medium.com/max/1600/0*qm2HH4Ob3YnH3C3f.)

-**IP as the Networking Universal Interface (**[**source**](http://slideplayer.com/slide/7597601/)**)**
+**IP作为网络通用接口 (**[**source**](http://slideplayer.com/slide/7597601/)**)**

-In networking, the role of the universal interface is played by [Internet Protocol (IP)](https://en.wikipedia.org/wiki/Internet_Protocol), acting as a connecting layer between lower-level networking protocols designed for local-area network, and higher-level application and transport protocols. ([Here’s one nice explanation](https://www.youtube.com/watch?v=uXumm52oBMo).) And (in a broad oversimplification), this universal interface became the *lingua franca* for computers, enabling networks to interconnect, devices to communicate, and this “network of networks” to grow into today’s rich and varied Internet. 
+
+在网络中,通用接口的角色由[因特网协议(IP)](https://en.wikipedia.org/wiki/Internet_Protocol)扮演:它充当面向局域网设计的低层网络协议与更高层的应用及传输协议之间的连接层。([这是一个很好的解释](https://www.youtube.com/watch?v=uXumm52oBMo)。)并且(粗略简化地说),这个通用接口成为了计算机世界的*通用语言*,使网络得以互连、设备得以通信,让这个“网络的网络”成长为今天丰富多彩的互联网。

-**We believe that SQL has become the universal interface for data analysis.**
+**我们相信SQL已成为数据分析的通用接口。**

-We live in an era where data is becoming “the world’s most valuable resource” ([The Economist, May 2017](https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource)). As a result, we have seen a Cambrian explosion of specialized databases (OLAP, time-series, document, graph, etc.), data processing tools (Hadoop, Spark, Flink), data buses (Kafka, RabbitMQ), etc. We also have more applications that need to rely on this data infrastructure, whether third-party data visualization tools (Tableau, Grafana, PowerBI, Superset), web frameworks (Rails, Django) or custom-built data-driven applications.
+我们生活在一个数据正在成为“世界上最宝贵的资源”的时代([The Economist,2017年5月](https://www.economist.com/news/leaders/21721656-data-economy-demands-new-approach-antitrust-rules-worlds-most-valuable-resource))。于是,我们见证了专业数据库(OLAP、时间序列、文档、图等)、数据处理工具(Hadoop、Spark、Flink)、数据总线(Kafka、RabbitMQ)等的寒武纪大爆发。我们也有了更多需要依赖这套数据基础设施的应用程序,无论是第三方数据可视化工具(Tableau、Grafana、PowerBI、Superset)、Web框架(Rails、Django),还是定制的数据驱动应用程序。

![img](https://cdn-images-1.medium.com/max/1600/1*iC7lwedryNOSSYiQc3M7-Q.png)

-Like networking we have a complex stack, with infrastructure on the bottom and applications on top. Typically, we end up writing a lot of glue code to make this stack work. But glue code can be brittle: it needs to be maintained and tended to.
+与网络一样,我们有一个复杂的堆栈,底层是基础设施,顶层是应用程序。通常,我们最终要编写大量的胶水代码来让这个堆栈运转。但胶水代码可能很脆弱:它需要持续的维护和照料。

-What we need is an interface that allows pieces of this stack to communicate with one another. Ideally something already standardized in the industry. 
Something that would allow us to swap in/out various layers with minimal friction.
+我们需要的是一个能让这个堆栈的各个部分相互通信的接口。理想情况下,它是业界已经标准化的东西,能让我们以最小的摩擦换入、换出各个层。

-That is the power of SQL. Like IP, SQL is a universal interface.
+这正是SQL的力量所在。与IP一样,SQL是一种通用接口。

-But SQL is in fact much more than IP. Because data also gets analyzed by humans. And true to the purpose that SQL’s creators initially assigned to it, SQL is readable.
+但SQL实际上远不止于IP。因为数据还需要由人来分析。而且正如SQL的创造者最初赋予它的使命那样,SQL是可读的。

-Is SQL perfect? No, but it is the language that most of us in the community know. And while there are already engineers out there working on a more natural language oriented interface, what will those systems then connect to? SQL.
+SQL完美吗?不,但它是我们社区大多数人都懂的语言。而且,虽然已经有工程师在开发更偏向自然语言的接口,但那些系统最终又会连接到什么?SQL。

-So there is another layer at the very top of the stack. And that layer is us.
+所以在堆栈的最顶层还有另外一层。那一层就是我们。

------

-### SQL is Back
+### SQL回来了

-SQL is back. Not just because writing glue code to kludge together NoSQL tools is annoying. Not just because retraining workforces to learn a myriad of new languages is hard. Not just because standards can be a good thing.
+SQL回来了。 不只是因为编写胶水代码来拼凑NoSQL工具令人烦恼; 不只是因为重新培训员工学习无数新语言很困难; 也不只是因为标准本身是件好事。

-But also because the world is filled with data. It surrounds us, binds us. At first, we relied on our human senses and sensory nervous systems to process it. Now our software and hardware systems are also getting smart enough to help us. And as we collect more and more data to make better sense of our world, the complexity of our systems to store, process, analyze, and visualize that data will only continue to grow as well.
+更是因为这个世界充满了数据。 它包围着我们,联结着我们。 起初,我们依靠人类的感官和感觉神经系统来处理它。 现在,我们的软件和硬件系统也变得足够聪明,可以帮助我们。 随着我们收集越来越多的数据来更好地理解这个世界,我们用于存储、处理、分析和可视化数据的系统的复杂性也将持续增长。

![img](https://cdn-images-1.medium.com/max/1600/0*0NbRxZrtmccWwYJ_.)

-**Master Data Scientist Yoda**
+**数据科学大师尤达**

-Either we can live in a world of brittle systems and a million interfaces. 
Or we can continue to embrace SQL. And restore balance to the force.
+我们可以继续生活在一个充斥着脆弱系统和上百万个接口的世界里; 也可以继续拥抱SQL, 并恢复原力的平衡。

------

-- 
GitLab