diff --git a/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md b/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md
index 790ce9f45d6b34341933f6f33885e1f9f649c961..b89b2b3ea2f5c65dd850b37b75433e294b12bf61 100644
--- a/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md
+++ b/docs/UserGuide/Ecosystem Integration/Spark IoTDB.md
@@ -19,6 +19,7 @@
 -->
 
 ## Spark-IoTDB
+
 ### version
 
 The versions required for Spark and Java are as follow:
@@ -28,12 +29,13 @@ The versions required for Spark and Java are as follow:
 | `2.4.5` | `2.12` | `1.8` | `0.13.0`|
 
 
-> Currently we only support spark version 2.4.3 and there are some known issue on 2.4.7, do no use it
+> Currently we only support Spark version 2.4.5; there are known issues with 2.4.7, so do not use it.
 
 ### Install
 
+```shell
 mvn clean scala:compile compile install
-
+```
 
 
 #### Maven Dependency
@@ -45,11 +47,15 @@ mvn clean scala:compile compile install
 ```
 
 
-
 #### spark-shell user guide
 
+Notice: There is a thrift version conflict between IoTDB and Spark.
+Therefore, if you want to debug in spark-shell, you need to execute `rm -f $SPARK_HOME/jars/libthrift*` and `cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/` to resolve it.
+Otherwise, you can only debug the code in an IDE. If you want to run your task with `spark-submit`, you must package it together with its dependencies.
+
+
 ```
-spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
+spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar
 
 import org.apache.iotdb.spark.db._
 
@@ -63,7 +69,7 @@ df.show()
 To partition rdd:
 
 ```
-spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
+spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar
 
 import org.apache.iotdb.spark.db._
 
@@ -118,7 +124,7 @@ You can also use narrow table form which as follows: (You can see part 4 about h
 
 * from wide to narrow
 
-```
+```scala
 import org.apache.iotdb.spark.db._
 
 val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root where time < 1100 and time > 1000").load
@@ -127,7 +133,7 @@ val narrow_df = Transformer.toNarrowForm(spark, wide_df)
 
 * from narrow to wide
 
-```
+```scala
 import org.apache.iotdb.spark.db._
 
 val wide_df = Transformer.toWideForm(spark, narrow_df)
@@ -135,11 +141,11 @@ val wide_df = Transformer.toWideForm(spark, narrow_df)
 
 #### Java user guide
 
-```
+```java
 import org.apache.spark.sql.Dataset;
 import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SparkSession;
-import org.apache.iotdb.spark.db.*
+import org.apache.iotdb.spark.db.*;
 
 public class Example {
 
@@ -158,15 +164,15 @@ public class Example {
 
     df.show();
 
-    Dataset narrowTable = Transformer.toNarrowForm(spark, df)
-    narrowTable.show()
+    Dataset narrowTable = Transformer.toNarrowForm(spark, df);
+    narrowTable.show();
   }
 }
 ```
 
-## Write Data to IoTDB
+### Write Data to IoTDB
 
-### User Guide
+#### User Guide
 ``` scala
 // import narrow table
 val df = spark.createDataFrame(List(
@@ -205,6 +211,6 @@ dfWithColumn.write.format("org.apache.iotdb.spark.db")
     .save
 ```
 
-### Notes
+#### Notes
 1. You can directly write data to IoTDB whatever the dataframe contains a wide table or a narrow table.
-2. The parameter `numPartition` is used to set the number of partitions. The dataframe that you want to save will be repartition based on this parameter before writing data. Each partition will open a session to write data to increase the number of concurrent requests.
\ No newline at end of file
+2. The parameter `numPartition` is used to set the number of partitions. The dataframe you want to save will be repartitioned based on this parameter before the data is written. Each partition opens its own session to write data, which increases the number of concurrent requests.
diff --git a/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md b/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md
index 3b6320ed21a8e908d8c850b5a3aa13ee7e2e51d0..04f5bf0f8fdcd5ad10392026348cec143ff8324f 100644
--- a/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md
+++ b/docs/zh/UserGuide/Ecosystem Integration/Spark IoTDB.md
@@ -45,7 +45,10 @@ mvn clean scala:compile compile install
 
 #### Spark-shell用户指南
 
-```
+注意:因为IoTDB与Spark的thrift版本有冲突,所以需要通过执行`rm -f $SPARK_HOME/jars/libthrift*`和`cp $IOTDB_HOME/lib/libthrift* $SPARK_HOME/jars/`这两个命令来解决。
+否则的话,就只能在IDE里面进行代码调试。而且如果你需要通过`spark-submit`命令提交任务的话,你打包时必须要带上依赖。
+
+```shell
 spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
 
 import org.apache.iotdb.spark.db._
@@ -59,7 +62,7 @@ df.show()
 
 如果要对rdd进行分区,可以执行以下操作
 
-```
+```shell
 spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar
 
 import org.apache.iotdb.spark.db._
@@ -123,7 +126,7 @@ TsFile中的现有数据如下:
 
 * 从宽到窄
 
-```
+```scala
 import org.apache.iotdb.spark.db._
 
 val wide_df = spark.read.format("org.apache.iotdb.spark.db").option("url", "jdbc:iotdb://127.0.0.1:6667/").option("sql", "select * from root where time < 1100 and time > 1000").load
@@ -132,7 +135,7 @@ val narrow_df = Transformer.toNarrowForm(spark, wide_df)
 
 * 从窄到宽
 
-```
+```scala
 import org.apache.iotdb.spark.db._
 
 val wide_df = Transformer.toWideForm(spark, narrow_df)
@@ -140,11 +143,11 @@ val wide_df = Transformer.toWideForm(spark, narrow_df)
 
 #### Java用户指南
 
-```
+```java
 import org.apache.spark.sql.Dataset;
 import org.apache.spark.sql.Row;
 import org.apache.spark.sql.SparkSession;
-import org.apache.iotdb.spark.db.*
+import org.apache.iotdb.spark.db.*;
 
 public class Example {
 
@@ -163,14 +166,14 @@ public class Example {
 
     df.show();
 
-    Dataset narrowTable = Transformer.toNarrowForm(spark, df)
-    narrowTable.show()
+    Dataset narrowTable = Transformer.toNarrowForm(spark, df);
+    narrowTable.show();
   }
 }
 ```
 
-## 写数据到IoTDB
-### 用户指南
+### 写数据到IoTDB
+#### 用户指南
 ``` scala
 // import narrow table
 val df = spark.createDataFrame(List(
@@ -209,6 +212,6 @@ dfWithColumn.write.format("org.apache.iotdb.spark.db")
     .save
 ```
 
-### 注意
+#### 注意
 1. 无论dataframe中存放的是窄表还是宽表,都可以直接将数据写到IoTDB中。
-2. numPartition参数是用来设置分区数,会在写入数据之前给dataframe进行重分区。每一个分区都会开启一个session进行数据的写入,来提高并发数。
\ No newline at end of file
+2. numPartition参数是用来设置分区数,会在写入数据之前给dataframe进行重分区。每一个分区都会开启一个session进行数据的写入,来提高并发数。
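As a companion to the spark-shell notice added in both files above, the thrift workaround and the jar list can be combined into one short shell session. This is only a sketch and is not part of the patch itself; it assumes `$SPARK_HOME` and `$IOTDB_HOME` are set and that the three 0.13.0 jars are in the current directory:

```shell
# Work around the thrift version conflict between IoTDB and Spark
# by replacing Spark's bundled libthrift with the one shipped in IoTDB.
rm -f "$SPARK_HOME"/jars/libthrift*
cp "$IOTDB_HOME"/lib/libthrift* "$SPARK_HOME"/jars/

# Start spark-shell with the connector, JDBC, and session jars on the classpath.
spark-shell --jars spark-iotdb-connector-0.13.0.jar,iotdb-jdbc-0.13.0-jar-with-dependencies.jar,iotdb-session-0.13.0-jar-with-dependencies.jar
```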
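The `numPartition` note added at the end of both files can also be illustrated with a small write example. The Scala sketch below is not taken from the patched docs: the series paths, sample data, URL, credentials, and partition count are placeholders, and only the `org.apache.iotdb.spark.db` format name and the `url`/`username`/`password`/`numPartition` options come from the user guide itself:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wide-table write: a Time column plus one column per series path.
val spark = SparkSession.builder()
  .appName("iotdb-write-sketch")
  .master("local[*]")
  .getOrCreate()

val df = spark.createDataFrame(List(
  (1L, 22.0d, true),
  (2L, 23.5d, false)
)).toDF("Time", "root.vehicle.d0.s0", "root.vehicle.d0.s1")

df.write
  .format("org.apache.iotdb.spark.db")
  .option("url", "jdbc:iotdb://127.0.0.1:6667/")
  .option("username", "root")
  .option("password", "root")
  // The dataframe is repartitioned into 5 parts before writing;
  // each partition opens its own IoTDB session, so this setting
  // also bounds the number of concurrent write sessions.
  .option("numPartition", "5")
  .save()
```

In spark-shell the `SparkSession` is already available as `spark`, so the builder lines can be dropped there.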