Commit bd30ea80, written by XuYi, committed by Xiangdong Huang

[IOTDB-17] add english doc (#18)

* add english doc
Parent commit: b576720e
IoTDB's features are as follows:
7. Tight integration with the open-source ecosystem. IoTDB supports analysis ecosystems such as Hadoop and Spark, as well as the Grafana visualization tool.
For the latest information about IoTDB, please visit our [IoTDB official website](http://iotdb.apache.org/#/).
# Prerequisites
To use IoTDB, you need to have:
If you want to use Hadoop or Spark to analyze IoTDB data files (called TsFile), you need to compile the hadoop and spark modules.
# Quick Start
This short guide will walk you through the basic process of using IoTDB. For a more-complete guide, please visit our website's [Document Part](http://iotdb.apache.org/#/Documents/Quick%20Start).
## Build
### Installation from source code

Use git to get the IoTDB source code:

```
Shell > git clone https://github.com/apache/incubator-iotdb.git
```

Or:

```
Shell > git clone git@github.com:apache/incubator-iotdb.git
```

Now suppose your directory is like this:

```
> pwd
/User/workspace/incubator-iotdb

> ls -l
incubator-iotdb/     <-- root path
|
+- iotdb/
|
+- jdbc/
|
+- tsfile/
|
...
|
+- pom.xml
```

Let $IOTDB_HOME = /User/workspace/incubator-iotdb/iotdb/iotdb/

If this is not the first time you build IoTDB, remember to delete the following folders:

```
> rm -rf $IOTDB_HOME/data/
> rm -rf $IOTDB_HOME/lib/
```

Then, under the root path of incubator-iotdb, you can build IoTDB using Maven:

```
> pwd
/User/workspace/incubator-iotdb

> mvn clean package -pl iotdb -am -Dmaven.test.skip=true
```

(__NOTICE:__ Remember that you have to use `-Dmaven.test.skip=true` the first time you build, because some ITs (integration tests) require jars in iotdb-cli/cli/lib/, and that folder is empty before `mvn package` has run. After that, you can run `mvn test` as long as you do not run `mvn clean`. For more details, see: [How to test IoTDB](https://github.com/thulab/iotdb/wiki/How-to-test-IoTDB))
If successful, you will see the following text in the terminal:
```
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] IoTDB Root ......................................... SUCCESS [ 7.020 s]
[INFO] TsFile ............................................. SUCCESS [ 10.486 s]
[INFO] Service-rpc ........................................ SUCCESS [ 3.717 s]
[INFO] IoTDB Jdbc ......................................... SUCCESS [ 3.076 s]
[INFO] IoTDB .............................................. SUCCESS [ 8.258 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
```
Otherwise, check the error messages and fix the problems.
After the build, the IoTDB project will be in the folder "iotdb/iotdb", which includes the following contents:
```
iotdb/iotdb/ <-- root path
|
...
|
+- conf/ <-- configuration files
|
+- lib/ <-- project dependencies
|
+- LICENSE <-- LICENSE
```
<!-- > NOTE: We also provide already built JARs and project at [http://tsfile.org/download](http://tsfile.org/download) instead of build the jar package yourself. -->
## Configure
Before starting to use IoTDB, you need to edit the configuration files first.
In total, we provide three kinds of configuration modules: the environment configuration module (iotdb-env.bat, iotdb-env.sh), the system configuration module (tsfile-format.properties, iotdb-engine.properties) and the log configuration module (logback.xml). All of these configuration files are placed in the iotdb/config folder.
For more, check the [document page](http://iotdb.apache.org/#/Documents/Quick%20Start) on our website. The fourth chapter of the User Guide Document gives the details.
## Start
After that, we start the server by running the startup script:
```
# Unix/OS X
> $IOTDB_HOME/bin/start-server.sh
# Windows
> $IOTDB_HOME\bin\start-server.bat
```
### Stop Server
The server can be stopped with ctrl-C or with the following script:
```
# Unix/ OS X
> $IOTDB_HOME/bin/stop-server.sh
# Windows
> $IOTDB_HOME\bin\stop-server.bat
```
### Start Client
Now let's try to read and write some data from IoTDB using our client. To start the client:

```
cd cli/cli
# Unix/OS X
> $IOTDB_HOME/bin/start-client.sh -h <IP> -p <PORT> -u <USER_NAME>
# Windows
> $IOTDB_HOME\bin\start-client.bat -h <IP> -p <PORT> -u <USER_NAME>
```
> NOTE: The system sets a default user in IoTDB named 'root', whose default password is 'root'. You can use this default user if you are trying IoTDB for the first time or have not created users yourself.
The command line client is interactive, so if everything is ready you should see the welcome logo:
| | .--.|_/ | | \_| | | `. \ | |_) |
| | / .'`\ \ | | | | | | | __'.
_| |_| \__. | _| |_ _| |_.' /_| |__) |
|_____|'.__.' |_____| |______.'|_______/ version x.x.x
IoTDB> login successfully
execute successfully.
If your session looks similar to what's above, congrats, your IoTDB is operational!
For more on the commands supported by IoTDB SQL, see the [document page](http://iotdb.apache.org/#/Documents/Quick%20Start) on our website. The eighth chapter of the User Guide Document will help you.
# Usage of import-csv.sh
CREATE TIMESERIES root.fit.p.s1 WITH DATATYPE=INT32,ENCODING=RLE;
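As an illustration (hypothetical data; the exact format expected by import-csv.sh is documented on the website), a CSV whose header names the full timeseries path created above and whose first column holds timestamps might look like:

```
Time,root.fit.p.s1
1,300
2,400
3,500
```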
### Run import shell
```
cd cli/cli
# Unix/OS X
> ./bin/import-csv.sh -h <ip> -p <port> -u <username> -pw <password> -f <xxx.csv>
```
### Run export shell
```
cd cli/cli
# Unix/OS X
> ./bin/export-csv.sh -h <ip> -p <port> -u <username> -pw <password> -td <xxx.csv> [-tf <time-format>]
```
# History & Vision
Coming soon.
<!-- TOC -->
- [Powered By](#powered-by)
- [Project and Product names using "IoTDB"](#project-and-product-names-using-iotdb)
- [Companies and Organizations](#companies-and-organizations)
<!-- /TOC -->
## Powered By
### Project and Product names using "IoTDB"
Organizations creating products and projects using Apache IoTDB, along with associated marketing materials, should take care to respect the trademark in “Apache IoTDB” and its logo. Please refer to [ASF Trademarks Guidance](https://www.apache.org/foundation/marks/) and associated [FAQ](https://www.apache.org/foundation/marks/faq/) for comprehensive and authoritative guidance on proper usage of ASF trademarks.
It is recommended not to include “IoTDB” in any name, to prevent potential trademark issues with the IoTDB project.
As an example, names like “IoTDB BigDataProduct” should not be used, as the name includes “IoTDB”. The above links, however, describe some exceptions, such as names like “BigDataProduct, powered by Apache IoTDB” or “BigDataProduct for Apache IoTDB”. In summary, any name containing "Apache IoTDB" as a whole is acceptable.
A common practice you can take is to create software identifiers (Maven coordinates, module names, etc.) like “iotdb-tool”. These are permitted. Nominative use of trademarks in descriptions is also allowed, as in “BigDataProduct is a subproduct for Apache IoTDB”.
### Companies and Organizations
To add yourself to the list, please email dev@iotdb.apache.org with your organization name, URL, a list of IoTDB components you are using, and a short description of your use case.
- School of Software (Tsinghua University), and National Engineering Laboratory for Big Data Software, which initially launched IoTDB
- We have both graduate students and a team of professional software engineers working on the stack
# Project Committers
| Name | Organization |
| :----------- | -------------: |
| Jianmin Wang | Tsinghua University |
| Xiangdong Huang | Tsinghua University |
| Jun Yuan | Tsinghua University |
| Chen Wang | Tsinghua University |
| Jialin Qiao | Tsinghua University |
| Jinrui Zhang | Tsinghua University |
| Rong Kang | Tsinghua University |
| Tian Jiang | Tsinghua University |
| Shuo Zhang | K2Data Company |
| Lei Rui | Tsinghua University |
| Rui Liu | Tsinghua University |
| Gaofei Cao | Tsinghua University |
| Xinyi Zhao | Tsinghua University |
| Yi Xu | Tsinghua University |
| Dongfang Mao | Tsinghua University |
| Tianan Li | Tsinghua University |
| Yue Su | Tsinghua University |
| Hui Da | Lenovo |
<!-- TOC -->
- [Have Questions](#have-questions)
- [Mailing Lists](#mailing-lists)
- [JIRA issues](#jira-issues)
- [How to contribute](#how-to-contribute)
- [Becoming a committer](#becoming-a-committer)
- [Contributing by Helping Other Users](#contributing-by-helping-other-users)
- [Contributing by Testing Releases](#contributing-by-testing-releases)
- [Contributing by Reviewing Changes](#contributing-by-reviewing-changes)
- [Contributing by Documentation Changes](#contributing-by-documentation-changes)
- [Contributing Bug Reports](#contributing-bug-reports)
- [Contributing Code Changes](#contributing-code-changes)
- [Cloning source code](#cloning-source-code)
- [JIRA](#jira)
- [Pull Request](#pull-request)
- [The Review Process](#the-review-process)
- [Closing Your Pull Request / JIRA](#closing-your-pull-request--jira)
- [Code Style](#code-style)
<!-- /TOC -->
# Have Questions
## Mailing Lists
It is recommended to use our mailing lists to ask for help, report issues or contribute to the project.
* dev@iotdb.apache.org is for anyone who wants to contribute code to IoTDB or has usage questions about IoTDB.
Some quick tips when using email:
* For error logs or long code examples, please use a GitHub gist and include only a few lines of the pertinent code / log within the email.
* No jobs, sales, or solicitation is permitted on the Apache IoTDB mailing lists.
PS. To subscribe to our mailing list, send an email to dev-subscribe@iotdb.incubator.apache.org; you will receive a "confirm subscribe to dev@iotdb.apache.org" email. Follow the steps in it to confirm your subscription.
## JIRA issues
The project tracks issues and new features on [JIRA issues](https://issues.apache.org/jira/projects/IOTDB/issues). You can create a new issue to report a bug, request a new feature, or raise your own question.
# How to contribute
## Becoming a committer
To become a committer, you should first be active in our community so that most of our existing committers recognize you. Pushing code and creating pull requests is just one of a committer's rights. Moreover, it is a committer's duty to help new users on the mailing list, test new releases and improve documentation.
### Contributing by Helping Other Users
Since Apache IoTDB always attracts new users, it would be great if you can help them by answering questions on the dev@iotdb.apache.org mailing list. We regard this as a valuable contribution. Also, the more questions you answer, the more people know you. Popularity is one of the necessary conditions for becoming a committer.
Contributors should subscribe to our mailing list to catch up with the latest progress.
### Contributing by Testing Releases
Each new IoTDB release is visible to everyone: members of the community can vote to accept these releases on the dev@iotdb.apache.org mailing list. Users of IoTDB are invited to try a release out on their workloads and provide feedback on any performance or correctness issues found in it.
### Contributing by Reviewing Changes
Changes to IoTDB source code are made through GitHub pull requests. Anyone can review and comment on these changes. Reviewing others' pull requests can help you understand how a bug is fixed or a new feature is added. Besides, learning directly from the source code will give you a deeper understanding of how the IoTDB system works and where its bottlenecks lie. You can help by reviewing changes, asking questions and pointing out issues.
### Contributing by Documentation Changes
To propose a change to release documentation (that is, docs that appear under <https://iotdb.apache.org/#/Documents>), edit the Markdown source files in IoTDB's docs/ directory (`documentation-EN` branch). The process to propose a doc change is otherwise the same as the process for proposing code changes below.
### Contributing Bug Reports
If you encounter a problem, search the mailing list and JIRA to check whether other people have faced the same situation. If it has not been reported before, please report the issue.
Once you are sure it is a bug, it may be reported by creating a JIRA without creating a pull request. In the bug report, you should provide enough information to understand, isolate and ideally reproduce the bug. Unreproducible bugs, or simple error reports, may be closed.
It’s very helpful if the bug report has a description about how the bug was introduced, by which commit, so that reviewers can easily understand the bug. It also helps committers to decide how far the bug fix should be backported, when the pull request is merged. The pull request to fix the bug should narrow down the problem to the root cause.
Performance regression is also one kind of bug. The pull request to fix a performance regression must provide a benchmark to prove the problem is indeed fixed.
Note that data correctness/loss bugs are our first priority to solve. Please make sure the corresponding bug-reporting JIRA ticket is labeled as correctness or data-loss. If the bug report doesn't gain enough attention, please escalate it by sending an email to dev@iotdb.apache.org.
### Contributing Code Changes
> When you contribute code, you affirm that the contribution is your original work and that you license the work to the project under the project’s open source license. Whether or not you state this explicitly, by submitting any copyrighted material via pull request, email, or other means you agree to license the material under the project’s open source license and warrant that you have the legal authority to do so. Any new files contributed should be under Apache 2.0 License with a header on top of it.
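For reference, the standard ASF license header to place at the top of each new source file (shown here in Java comment syntax; adapt the comment markers to the file's language) reads:

```
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
```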
#### Cloning source code
```
$ git clone git@github.com:apache/incubator-iotdb.git
```
Follow README.md to test, run or build IoTDB.
#### JIRA
Generally, IoTDB uses JIRA to track logical issues, including bugs and improvements, and uses GitHub pull requests to manage the review and merging of specific code changes. That is, JIRA issues describe what should be fixed or changed and propose high-level approaches, while pull requests describe how to implement that change in the project's source code. For example, major design decisions are discussed in JIRA.
1. Find the existing IoTDB JIRA that the change pertains to.
1. Do not create a new JIRA if you send a PR to address an existing issue labeled in JIRA; add it to the existing discussion instead.
2. Look for existing pull requests that are linked from the JIRA, to understand whether someone is already working on it.
2. If the change is new, then it usually needs a new JIRA. However, trivial, self-explanatory changes do not require a JIRA. Example: fixing a spelling error in JavaDoc.
3. If required, create a new JIRA:
1. Provide a descriptive Title. “Problem in XXXManager” is not sufficient. “IoTDB failed to start on jdk11 because jdk11 does not support -XX:+PrintGCDetail” is good.
2. Write a detailed Description. For bug reports, this should ideally include a short reproduction of the problem. For new features, it may include a design document.
3. Set required fields:
1. Issue Type. Generally, Bug, Improvement and New Feature are the only types used in IoTDB.
2. Priority. Set to Major or below; higher priorities are generally reserved for committers to set. The main exception is correctness or data-loss issues, which can be flagged as Blockers. JIRA tends to unfortunately conflate “size” and “importance” in its Priority field values. Their meaning is roughly:
1. Blocker: pointless to release without this change as the release would be unusable to a large minority of users. Correctness and data loss issues should be considered Blockers.
2. Critical: a large minority of users are missing important functionality without this, and/or a workaround is difficult
3. Major: a small minority of users are missing important functionality without this, and there is a workaround
4. Minor: a niche use case is missing some support, but it does not affect usage or is easily worked around
5. Trivial: a nice-to-have change but unlikely to be any problem in practice otherwise
3. Affected Version. For Bugs, assign at least one version that is known to reproduce the issue or needs to be changed.
4. Label. Not widely used, except for the following:
* correctness: a correctness issue
* data-loss: a data loss issue
* release-notes: the change’s effects need mention in release notes. The JIRA or pull request should include detail suitable for inclusion in release notes – see “Docs Text” below.
* starter: small, simple change suitable for new contributors
5. Docs Text: For issues that require an entry in the release notes, this should contain the information that the release manager should include. Issues should include a short summary of what behavior is impacted, and detail on what behavior changed. It can be provisionally filled out when the JIRA is opened, but will likely need to be updated with final details when the issue is resolved.
4. Do not set the following fields:
1. Fix Version. This is assigned by committers only when resolved.
2. Target Version. This is assigned by committers to indicate a PR has been accepted for possible fix by the target version.
5. Do not include a patch file; pull requests are used to propose the actual change.
4. If the change is a large one, consider raising a discussion about it on dev@iotdb.apache.org before proceeding to implement it. Currently, we use https://cwiki.apache.org/confluence to store design proposals and the release process; design proposals can be posted there as well.
#### Pull Request
1. Fork the GitHub repository at https://github.com/apache/incubator-iotdb if you haven't already done so.
2. Clone your fork, create a new branch, push commits to the branch.
3. Please add documentation and tests to explain/cover your changes.
Run all tests with [How to test](https://github.com/thulab/iotdb/wiki/How-to-test-IoTDB) to verify your change.
4. Open a pull request against the master branch of IoTDB. (Only in special cases would the PR be opened against other branches.)
1. The PR title should be in the form of "IoTDB-xxxx", where xxxx is the relevant JIRA number.
2. If the pull request is still work in progress but needs to be pushed to GitHub for review, please add "WIP" after the PR title.
3. Consider identifying committers or other contributors who have worked on the code being changed. Find the file(s) in Github and click “Blame” to see a line-by-line annotation of who changed the code last. You can add @username in the PR description to ping them immediately.
4. Please state that the contribution is your original work and that you license the work to the project under the project’s open source license.
5. The related JIRA, if any, will be marked as “In Progress” and your pull request will automatically be linked to it. There is no need to be the Assignee of the JIRA to work on it, though you are welcome to comment that you have begun work.
6. The Jenkins automatic pull request builder will test your changes
1. If it is your first contribution, Jenkins will wait for confirmation before building your code and post “Can one of the admins verify this patch?”
2. A committer can authorize testing with a comment like “ok to test”
3. A committer can automatically allow future pull requests from a contributor to be tested with a comment like “Jenkins, add to whitelist”
7. Watch for the results, and investigate and fix failures promptly
1. Fixes can simply be pushed to the same branch from which you opened your pull request
2. Jenkins will automatically re-test when new commits are pushed
3. If the tests failed for reasons unrelated to the change (e.g. Jenkins outage), then a committer can request a re-test with “Jenkins, retest this please”. Ask if you need a test restarted. If you were added by “Jenkins, add to whitelist” from a committer before, you can also request the re-test.
#### The Review Process
* Other reviewers, including committers, may comment on the changes and suggest modifications. Changes can be added by simply pushing more commits to the same branch.
* Lively, polite, rapid technical debate is encouraged from everyone in the community. The outcome may be a rejection of the entire change.
* Keep in mind that changes to more critical parts of IoTDB, like its read/write data from/to disk, will be subjected to more review, and may require more testing and proof of its correctness than other changes.
* Reviewers can indicate that a change looks suitable for merging with a comment such as: “I think this patch looks good” or "LGTM". If you comment LGTM you will be expected to help with bugs or follow-up issues on the patch. Consistent, judicious use of LGTMs is a great way to gain credibility as a reviewer with the broader community.
* Sometimes, other changes will be merged which conflict with your pull request’s changes. The PR can’t be merged until the conflict is resolved. This can be resolved by, for example, adding a remote to keep up with upstream changes by
```shell
git remote add upstream git@github.com:apache/incubator-iotdb.git
git fetch upstream
git rebase upstream/master
# or you can use `git pull --rebase upstream master` to replace the above two commands
# resolve your conflicts
# push codes to your branch
```
* Try to be responsive to the discussion rather than let days pass between replies
#### Closing Your Pull Request / JIRA
* If a change is accepted, it will be merged and the pull request will automatically be closed, along with the associated JIRA if any
* Note that in the rare case you are asked to open a pull request against a branch beside master, you actually have to close the pull request manually
* The JIRA will be Assigned to the primary contributor to the change as a way of giving credit. If the JIRA isn’t closed and/or Assigned promptly, comment on the JIRA.
* If your pull request is ultimately rejected, please close it promptly
* … because committers can’t close PRs directly
* Pull requests will be automatically closed by an automated process at Apache after about a week if a committer has made a comment like “mind closing this PR?” This means that the committer is specifically requesting that it be closed.
* If a pull request has gotten little or no attention, consider improving the description or the change itself and ping likely reviewers again after a few days. Consider proposing a change that’s easier to include, like a smaller and/or less invasive change.
* If it has been reviewed but not taken up after weeks, after soliciting review from the most relevant reviewers, or, has met with neutral reactions, the outcome may be considered a “soft no”. It is helpful to withdraw and close the PR in this case.
* If a pull request is closed because it is deemed not the right approach to resolve a JIRA, then leave the JIRA open. However, if the review makes it clear that the issue identified in the JIRA will not be resolved by any pull request (not a problem, won't fix), then also resolve the JIRA.
#### Code Style
For Java code, Apache IoTDB follows Google’s Java Style Guide.
<!-- TOC -->
- [Frequently Asked Questions](#frequently-asked-questions)
- [How can I identify my version of IoTDB?](#how-can-i-identify-my-version-of-iotdb)
- [Where can I find IoTDB logs?](#where-can-i-find-iotdb-logs)
- [Where can I find IoTDB data files?](#where-can-i-find-iotdb-data-files)
- [How do I know how many time series are stored in IoTDB?](#how-do-i-know-how-many-time-series-are-stored-in-iotdb)
- [Can I use Hadoop and Spark to read TsFile in IoTDB?](#can-i-use-hadoop-and-spark-to-read-tsfile-in-iotdb)
- [How does IoTDB handle duplicate points?](#how-does-iotdb-handle-duplicate-points)
- [How can I tell what type of the specific timeseries?](#how-can-i-tell-what-type-of-the-specific-timeseries)
- [How can I change IoTDB's CLI time display format?](#how-can-i-change-iotdbs-cli-time-display-format)
<!-- /TOC -->
# Frequently Asked Questions
## How can I identify my version of IoTDB?
There are several ways to identify the version of IoTDB that you are using:
* Launch IoTDB's Command Line Interface:
```
> ./start-client.sh -p 6667 -pw root -u root -h localhost
_____ _________ ______ ______
|_ _| | _ _ ||_ _ `.|_ _ \
| | .--.|_/ | | \_| | | `. \ | |_) |
| | / .'`\ \ | | | | | | | __'.
_| |_| \__. | _| |_ _| |_.' /_| |__) |
|_____|'.__.' |_____| |______.'|_______/ version 0.7.0
```
* Check pom.xml file:
```
<version>0.7.0</version>
```
* Use JDBC API:
```
String iotdbVersion = tsfileDatabaseMetadata.getDatabaseProductVersion();
```
## Where can I find IoTDB logs?
With the default settings, logs are stored under ```IOTDB_HOME/iotdb/logs```. You can change the log level and storage path by configuring ```logback.xml``` under ```IOTDB_HOME/iotdb/conf```. ```IOTDB_HOME``` is the root path of the IoTDB project.
## Where can I find IoTDB data files?
With the default settings, data files (including TsFile, metadata, and WAL files) are stored under ```IOTDB_HOME/iotdb/data```.
## How do I know how many time series are stored in IoTDB?
Use IoTDB's Command Line Interface:
```
IoTDB> show timeseries root.*
```
The result will include a statement showing `Total timeseries number`; this is the number of timeseries in IoTDB.
If you are using Linux, you can use the following shell command:
```
> grep "0,root" IOTDB_HOME/iotdb/data/system/schema/mlog.txt | wc -l
> 6
```
## Can I use Hadoop and Spark to read TsFile in IoTDB?
Yes. IoTDB integrates tightly with the open-source ecosystem: it supports [Hadoop](https://github.com/apache/incubator-iotdb/tree/master/hadoop), [Spark](https://github.com/apache/incubator-iotdb/tree/master/spark) and the [Grafana](https://github.com/apache/incubator-iotdb/tree/master/grafana) visualization tool.
## How does IoTDB handle duplicate points?
A data point is uniquely identified by its full timeseries path (e.g. ```root.vehicle.d0.s0```) and timestamp. If you submit a new point with the same path and timestamp as an existing point, the new value overwrites the old one.
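Conceptually this is a last-write-wins rule keyed by (path, timestamp). The sketch below is a plain-Java illustration of that rule, not IoTDB's actual storage code (the class and method names are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class DuplicatePointSketch {
    // Key: "path@timestamp"; value: the point's value. A (path, timestamp)
    // pair identifies a point uniquely, so a second write overwrites the first.
    static final Map<String, Double> store = new HashMap<>();

    static void insert(String path, long timestamp, double value) {
        store.put(path + "@" + timestamp, value); // overwrites any existing point
    }

    static Double query(String path, long timestamp) {
        return store.get(path + "@" + timestamp);
    }

    public static void main(String[] args) {
        insert("root.vehicle.d0.s0", 100L, 1.0);
        insert("root.vehicle.d0.s0", 100L, 2.0); // same path and timestamp: overwrites
        System.out.println(query("root.vehicle.d0.s0", 100L)); // prints 2.0
    }
}
```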
## How can I tell the type of a specific timeseries?
Use the ```SHOW TIMESERIES <timeseries path>``` SQL statement in IoTDB's Command Line Interface.
For example, if you want to know the type of all timeseries, the \<timeseries path> should be `root`. The statement will be:
```
IoTDB> show timeseries root
```
If you want to query a specific sensor, replace \<timeseries path> with the sensor name. For example:
```
IoTDB> show timeseries root.fit.d1.s1
```
You can also use wildcards in the timeseries path:
```
IoTDB> show timeseries root.fit.d1.*
```
## How can I change IoTDB's CLI time display format?
The IoTDB CLI's default time display format is human-readable (e.g. ```1970-01-01T08:00:00.001```). If you want to display time as a raw timestamp or in another format, add the parameter ```-disableISO8601``` to the start command:
```
> sh IOTDB_HOME/iotdb/bin/start-client.sh -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601
```
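To illustrate what this flag toggles, the following plain-Java sketch (not IoTDB code; `TimeFormats` is a made-up name) converts a raw epoch-millisecond timestamp into the human-readable form shown earlier, assuming a UTC+8 zone as in the `1970-01-01T08:00:00.001` example:

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

public class TimeFormats {
    // Convert a raw epoch-millisecond timestamp to an ISO 8601 string at UTC+8.
    static String toIso(long epochMillis) {
        return OffsetDateTime
                .ofInstant(Instant.ofEpochMilli(epochMillis), ZoneOffset.ofHours(8))
                .toString();
    }

    public static void main(String[] args) {
        // 1 ms after the epoch, shown at UTC+8
        System.out.println(toIso(1L)); // prints 1970-01-01T08:00:00.001+08:00
    }
}
```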
# IoTDB Examples
These examples give a quick overview of the IoTDB JDBC API. IoTDB offers a standard JDBC interface for users to interact with it; versions for other languages are coming soon.
To use IoTDB, you first need to set a storage group for your timeseries (for the detailed concept of storage groups, please see our documentation). Then you need to create specific timeseries according to their data type, name, etc. After that, you can insert and query data. In this page, we will show a basic example using the IoTDB JDBC.
## IoTDB Hello World
``` JAVA
/**
 * This class shows how to write and read data from IoTDB through JDBC.
 */
package com.tsinghua.iotdb.demo;

import java.sql.*;
import java.text.SimpleDateFormat;
import java.util.Date;

public class IotdbHelloWorld {

    public static void main(String[] args) throws SQLException, ClassNotFoundException {
        Connection connection = null;
        Statement statement = null;
        try {
            // 1. Load the JDBC driver of IoTDB
            Class.forName("org.apache.iotdb.jdbc.IoTDBDriver");
            // 2. DriverManager connects to IoTDB
            connection = DriverManager.getConnection("jdbc:iotdb://localhost:6667/", "root", "root");
            // 3. Create a statement
            statement = connection.createStatement();
            // 4. Set a storage group
            statement.execute("set storage group to root.vehicle.sensor");
            // 5. Create a timeseries
            statement.execute("CREATE TIMESERIES root.vehicle.sensor.sensor0 WITH DATATYPE=DOUBLE, ENCODING=PLAIN");
            // 6. Insert data into IoTDB
            statement.execute("INSERT INTO root.vehicle.sensor(timestamp, sensor0) VALUES (2018/10/24 19:33:00, 142)");
            // 7. Query data
            String sql = "select * from root.vehicle.sensor";
            String path = "root.vehicle.sensor.sensor0";
            boolean hasResultSet = statement.execute(sql);
            SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
            if (hasResultSet) {
                ResultSet res = statement.getResultSet();
                System.out.println(" Time" + "|" + path);
                while (res.next()) {
                    long time = Long.parseLong(res.getString("Time"));
                    String dateTime = dateFormat.format(new Date(time));
                    System.out.println(dateTime + " | " + res.getString(path));
                }
                res.close();
            }
        } finally {
            // 8. Close resources
            if (statement != null) statement.close();
            if (connection != null) connection.close();
        }
    }
}
```
# Research Papers
Apache IoTDB started at Tsinghua University, School of Software. IoTDB is a database for managing large amount of time series data with columnar storage, data encoding, pre-computation, and index techniques. It has SQL-like interface to write millions of data points per second per node and is optimized to get query results in few seconds over trillions of data points. It can also be easily integrated with Apache Hadoop MapReduce and Apache Spark for analytics.
The related research papers are listed below:
* [PISA: An Index for Aggregating Big Time Series Data](https://dl.acm.org/citation.cfm?id=2983775&dl=ACM&coll=DL), Xiangdong Huang and Jianmin Wang and Raymond K. Wong and Jinrui Zhang and Chen Wang. CIKM 2016.
* [Matching Consecutive Subpatterns over Streaming Time Series](https://link.springer.com/chapter/10.1007/978-3-319-96893-3_8), Rong Kang and Chen Wang and Peng Wang and Yuting Ding and Jianmin Wang. APWeb/WAIM 2018.
* [KV-match: A Subsequence Matching Approach Supporting Normalization and Time Warping](https://www.semanticscholar.org/paper/KV-match%3A-A-Subsequence-Matching-Approach-and-Time-Wu-Wang/9ed84cb15b7e5052028fc5b4d667248713ac8592), Jiaye Wu and Peng Wang and Chen Wang and Wei Wang and Jianmin Wang. ICDE 2019.
<!-- TOC -->
- [v0.7.0 Release Notes](#v070-release-notes)
- [Features](#features)
- [IoTDB](#iotdb)
- [IoTDB-Transfer-Tool](#iotdb-transfer-tool)
- [Bugfixes](#bugfixes)
- [IoTDB](#iotdb-1)
- [System Organization](#system-organization)
<!-- /TOC -->
### v0.7.0 Release Notes
Add postback tools, multi-path data storage mechanism, ```SHOW TIMESERIES / STORAGE GROUP``` SQL extended expressions, and other new features. Fix several issues in version 0.6.0. Improve system stability.
#### Features
##### IoTDB
* Add ```Show Storage Group``` SQL statement, support for displaying storage groups.
* Enhance ```Show Timeseries``` SQL, support for displaying time series information under different prefix paths.
* Add multi-path data storage mechanism for distributed storage, allowing different data files (write ahead logs, metadata files, etc.) to be stored in different paths.
* Add data directory configuration which allows data files to be stored in different paths, facilitating the use of multiple disks to store data files.
##### IoTDB-Transfer-Tool
* Add the IoTDB data postback module, which provides users with a scheduled end-to-cloud data file postback function.
#### Bugfixes
##### IoTDB
* Fix the problem that the IoTDB shutdown script does not work.
* Fix the problem that the IoTDB installation path does not support spaces in the Windows environment.
* Fix the problem that the system cannot be restarted after merging Overflow data.
* Fix the problem that the permission module is missing the ALL keyword.
* Fix the problem that quotation marks in strings were not supported when querying TEXT type data.
#### System Organization
* Further improve system stability
<!-- TOC -->
- [Material: Sample Data](#material-sample-data)
- [Scenario Description](#scenario-description)
- [Sample Data](#sample-data)
<!-- /TOC -->
# Material: Sample Data
### Scenario Description
A power department needs to monitor the operation of the various power plants under its jurisdiction. By collecting real-time monitoring data sent by the various types of sensors deployed at these power plants, the power department can monitor their real-time operation and understand trends in the data. IoTDB offers high write throughput and rich query functions, which can effectively support the needs of the power department.
The real-time data needed to be monitored involves multiple attribute layers:
* **Power Generation Group**: The data belongs to nearly ten power generation groups, and the name codes are ln, sgcc, etc.
* **Power Plant**: Each power generation group runs more than 10 kinds of power plants, such as wind farms, hydropower plants and photovoltaic power plants, numbered wf01, wf02, wf03 and so on.
* **Device**: Each power plant has about 5,000 kinds of power generation devices such as wind turbines and photovoltaic panels, numbered as wt01, wt02 and so on.
* **Sensor**: For different devices, there are 10 to 1000 sensors monitoring different states of the devices, such as the power supply status sensor (named status), the temperature sensor (named temperature), the hardware version sensor (named hardware), etc.
It is worth noting that before the power department uses IoTDB, some historical monitoring data from the various power plants needs to be imported into the IoTDB system (we will introduce the import method in [Import Historical Data](Chapter3ImportHistoricalData)). Simultaneously, real-time monitoring data continuously flows into the IoTDB system (we will introduce the import method in Section 3.3.2 of this chapter).
### Sample Data
Based on the description of the sample scenario above, we provide a simplified sample dataset. The data download address is http://tsfile.org/download.
The basic information of the data is shown in Table below.
<center>**Table: The basic information of the data**
|Name |Data Type| Coding | Meaning |
|:---|:---|:---|:---|
|root.ln.wf01.wt01.status| Boolean|PLAIN| the power supply status of ln group wf01 plant wt01 device |
|root.ln.wf01.wt01.temperature |Float|RLE| the temperature of ln group wf01 plant wt01 device|
|root.ln.wf02.wt02.hardware |Text|PLAIN| the hardware version of ln group wf02 plant wt02 device|
|root.ln.wf02.wt02.status |Boolean|PLAIN| the power supply status of ln group wf02 plant wt02 device|
|root.sgcc.wf03.wt01.status|Boolean|PLAIN| the power supply status of sgcc group wf03 plant wt01 device|
|root.sgcc.wf03.wt01.temperature |Float|RLE| the temperature of sgcc group wf03 plant wt01 device|
</center>
The time span of this data is from 10:00 on November 1, 2017 to 12:00 on November 2, 2017, with one data point generated every two minutes.
In [Data Model Selection and Creation](Chapter3DataModelSelectionandCreation), we will show how to apply IoTDB's data model rules to construct the data model shown above. In [Import Historical Data](Chapter3ImportHistoricalData), we will introduce you to the method of importing historical data, and in [Import Real-time Data](Chapter3ImportReal-timeData), we will introduce you to the method of accessing real-time data. In [Data Query](Chapter3DataQuery), we will introduce you to three typical data query patterns using IoTDB. In [Data Maintenance](Chapter3DataMaintenance), we will show you how to update and delete data using IoTDB.
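As a sketch of how the schema in the table above maps onto IoTDB SQL, the snippet below generates one `CREATE TIMESERIES` statement per series, using the DDL syntax shown in the Quick Start. The `SampleSchema` class and its helper are illustrative names, not part of IoTDB:

```java
// Illustrative only: builds the IoTDB DDL for the sample schema above.
class SampleSchema {
    // (path, data type, encoding) triples taken from the table above
    static final String[][] SERIES = {
        {"root.ln.wf01.wt01.status", "BOOLEAN", "PLAIN"},
        {"root.ln.wf01.wt01.temperature", "FLOAT", "RLE"},
        {"root.ln.wf02.wt02.hardware", "TEXT", "PLAIN"},
        {"root.ln.wf02.wt02.status", "BOOLEAN", "PLAIN"},
        {"root.sgcc.wf03.wt01.status", "BOOLEAN", "PLAIN"},
        {"root.sgcc.wf03.wt01.temperature", "FLOAT", "RLE"},
    };

    // Render one CREATE TIMESERIES statement, following the Quick Start syntax
    static String ddl(String[] series) {
        return "CREATE TIMESERIES " + series[0]
            + " WITH DATATYPE=" + series[1] + ", ENCODING=" + series[2];
    }

    public static void main(String[] args) {
        for (String[] s : SERIES) {
            System.out.println(ddl(s));
        }
    }
}
```

Running these statements against an IoTDB server (for example through the JDBC interface shown earlier) would create the six sample timeseries.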
<!-- TOC -->
- [Quick Start](#quick-start)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Installation from source code](#installation-from-source-code)
- [Configure](#configure)
- [Start](#start)
- [Start Server](#start-server)
- [Start Client](#start-client)
- [Have a try](#have-a-try)
- [Stop Server](#stop-server)
<!-- /TOC -->
# Quick Start
This short guide will walk you through the basic process of using IoTDB. For a more-complete guide, please visit our website’s documents.
## Prerequisites
To use IoTDB, you need to have:
1. Java >= 1.8 (Please make sure the environment path has been set)
2. Maven >= 3.0 (If you want to compile and install IoTDB from source code)
3. TsFile >= 0.7.0 (TsFile Github page: [https://github.com/apache/incubator-iotdb/tree/master/tsfile](https://github.com/apache/incubator-iotdb/tree/master/tsfile))
4. IoTDB-JDBC >= 0.7.0 (IoTDB-JDBC Github page: [https://github.com/apache/incubator-iotdb/tree/master/jdbc](https://github.com/apache/incubator-iotdb/tree/master/jdbc))
TODO: The TsFile and IoTDB-JDBC dependencies will be removed after the project reconstruction.
## Installation
IoTDB provides two installation methods; you can choose one of the following:
* Installation from source code. Use this method if you need to modify the code yourself.
* Installation from binary files. Download the binary files from the official website. This is the recommended method, with which you will get an out-of-the-box binary release package. (Coming soon...)
Here in the Quick Start, we give a brief introduction to installing IoTDB from source code. For further information, please refer to Chapter 5 of this document.
### Installation from source code
Use git to get IoTDB source code:
```
Shell > git clone https://github.com/apache/incubator-iotdb.git
```
Or:
```
Shell > git clone git@github.com:apache/incubator-iotdb.git
```
Now suppose your directory is like this:
```
> pwd
/User/workspace/incubator-iotdb
> ls -l
incubator-iotdb/ <-- root path
|
+- iotdb/
|
+- jdbc/
|
+- tsfile/
|
...
|
+- pom.xml
```
Let $IOTDB_HOME = /User/workspace/incubator-iotdb/iotdb/iotdb/
If this is not your first time building IoTDB, remember to delete the following files:
```
> rm -rf $IOTDB_HOME/data/
> rm -rf $IOTDB_HOME/lib/
```
Then under the root path of incubator-iotdb, you can build IoTDB using Maven:
```
> pwd
/User/workspace/incubator-iotdb
> mvn clean package -pl iotdb -am -Dmaven.test.skip=true
```
If successful, you will see the following text in the terminal:
```
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] IoTDB Root ......................................... SUCCESS [ 7.020 s]
[INFO] TsFile ............................................. SUCCESS [ 10.486 s]
[INFO] Service-rpc ........................................ SUCCESS [ 3.717 s]
[INFO] IoTDB Jdbc ......................................... SUCCESS [ 3.076 s]
[INFO] IoTDB .............................................. SUCCESS [ 8.258 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
```
Otherwise, you may need to check the error statements and fix the problems.
After building, the IoTDB project will be located in the subfolder named iotdb. The folder will include the following contents:
```
$IOTDB_HOME/
|
+- bin/ <-- script files
|
+- conf/ <-- configuration files
|
+- lib/ <-- project dependencies
```
<!-- > NOTE: We also provide already built JARs and project at [http://tsfile.org/download](http://tsfile.org/download) instead of build the jar package yourself. -->
## Configure
Before starting to use IoTDB, you need to set up the configuration files. For your convenience, we have already provided default configurations in the files.
In total, we provide three configuration modules:
* environment config module (iotdb-env.`sh`(Linux or OSX), iotdb-env.`bat`(Windows))
* system config module (tsfile-format.properties, iotdb-engine.properties)
* log config module (logback.xml)
The configuration files of all three modules are located in the $IOTDB_HOME/conf folder of the IoTDB installation directory. For details, please refer to Chapter 5.
## Start
### Start Server
Now we can start the server by running the startup script:
```
# Unix/OS X
> $IOTDB_HOME/bin/start-server.sh
# Windows
> $IOTDB_HOME\bin\start-server.bat
```
### Start Client
Now let's try reading and writing some data from IoTDB using the client. To start the client, you need to specify the server's IP and PORT as well as the USER_NAME and PASSWORD.
```
# Unix/OS X
> $IOTDB_HOME/bin/start-client.sh -h <IP> -p <PORT> -u <USER_NAME>
# Windows
> $IOTDB_HOME\bin\start-client.bat -h <IP> -p <PORT> -u <USER_NAME>
```
> NOTE: In the system, we set a default user in IoTDB named 'root'. The default password for 'root' is 'root'. You can use this default user if you are making the first try or you didn't create users by yourself.
The command line client is interactive so if everything is ready you should see the welcome logo and statements:
```
_____ _________ ______ ______
|_ _| | _ _ ||_ _ `.|_ _ \
| | .--.|_/ | | \_| | | `. \ | |_) |
| | / .'`\ \ | | | | | | | __'.
_| |_| \__. | _| |_ _| |_.' /_| |__) |
|_____|'.__.' |_____| |______.'|_______/ version x.x.x
IoTDB> login successfully
IoTDB>
```
### Have a try
Now you can use IoTDB SQL to operate IoTDB, and when you've had enough fun, you can input the 'quit' or 'exit' command to leave the client.
But let's try something slightly more interesting:
```
IoTDB> SET STORAGE GROUP TO root.ln
execute successfully.
IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.status WITH DATATYPE=BOOLEAN, ENCODING=PLAIN
execute successfully.
```
So far, we have created a storage group called root.ln and added a timeseries called root.ln.wf01.wt01.status to it. Let's take a look at what we have done with the 'SHOW TIMESERIES' command.
```
IoTDB> SHOW TIMESERIES
=== Timeseries Tree ===
root:{
ln:{
wf01:{
wt01:{
status:{
DataType: BOOLEAN,
Encoding: PLAIN,
args: {},
StorageGroup: root.ln
}
}
}
}
}
```
For a further try, create another timeseries and use SHOW TIMESERIES to check the result.
```
IoTDB> CREATE TIMESERIES root.ln.wf01.wt01.temperature WITH DATATYPE=FLOAT, ENCODING=RLE
IoTDB> SHOW TIMESERIES
=== Timeseries Tree ===
root:{
ln:{
wf01:{
wt01:{
status:{
DataType: BOOLEAN,
Encoding: PLAIN,
args: {},
StorageGroup: root.ln
},
temperature:{
DataType: FLOAT,
Encoding: RLE,
args: {},
StorageGroup: root.ln
}
}
}
}
}
```
For your convenience, the SHOW TIMESERIES clause also supports an extended syntax; the pattern is as follows (for further details, see Chapter x):
```
SHOW TIMESERIES <PATH>
```
Here is the example:
```
IoTDB> SHOW TIMESERIES root.ln.wf01.wt01
+------------------------------+--------------+--------+--------+
| Timeseries| Storage Group|DataType|Encoding|
+------------------------------+--------------+--------+--------+
| root.ln.wf01.wt01.status| root.ln| BOOLEAN| PLAIN|
| root.ln.wf01.wt01.temperature| root.ln| FLOAT| RLE|
+------------------------------+--------------+--------+--------+
Total timeseries number = 2
Execute successfully.
```
We can also use SHOW STORAGE GROUP to check the created storage groups:
```
IoTDB> show storage group
+-----------------------------------+
| Storage Group|
+-----------------------------------+
| root.ln|
+-----------------------------------+
Total storage group number = 1
Execute successfully.
It costs 0.006s
```
Inserting timeseries data is a basic operation of IoTDB; you can use the 'INSERT' command to do this:
```
IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status) values(100,true);
execute successfully.
IoTDB> INSERT INTO root.ln.wf01.wt01(timestamp,status,temperature) values(200,false,20.71)
execute successfully.
```
The data we've just inserted displays like this:
```
IoTDB> SELECT status FROM root.ln.wf01.wt01
+-----------------------+------------------------+
| Time|root.ln.wf01.wt01.status|
+-----------------------+------------------------+
|1970-01-01T08:00:00.100| true|
|1970-01-01T08:00:00.200| false|
+-----------------------+------------------------+
record number = 2
execute successfully.
```
We can also query several timeseries data at once like this:
```
IoTDB> SELECT * FROM root.ln.wf01.wt01
+-----------------------+--------------------------+-----------------------------+
| Time| root.ln.wf01.wt01.status|root.ln.wf01.wt01.temperature|
+-----------------------+--------------------------+-----------------------------+
|1970-01-01T08:00:00.100| true| null|
|1970-01-01T08:00:00.200| false| 20.71|
+-----------------------+--------------------------+-----------------------------+
```
If your session looks similar to what’s above, congrats, your IoTDB is operational!
For more on the commands supported by IoTDB SQL, please refer to Chapter xx.
### Stop Server
The server can be stopped with ctrl-C or the following script:
```
# Unix/ OS X
> $IOTDB_HOME/bin/stop-server.sh
# Windows
> $IOTDB_HOME\bin\stop-server.bat
```
<!-- TOC -->
- [Chapter 1: Overview](#chapter-1-overview)
- [What is IoTDB](#what-is-iotdb)
- [Architecture](#architecture)
- [Scenario](#scenario)
- [Scenario 1](#scenario-1)
- [Scenario 2](#scenario-2)
- [Scenario 3](#scenario-3)
- [Scenario 4](#scenario-4)
- [Features](#features)
<!-- /TOC -->
# Chapter 1: Overview
## What is IoTDB
IoTDB (Internet of Things Database) is an integrated data management engine designed for timeseries data. It provides users with services for data collection, storage and analysis. Thanks to its lightweight architecture, high performance and rich feature set, together with its deep integration with the Hadoop and Spark ecosystems, IoTDB meets the requirements of massive data storage, high-speed data ingestion and complex data analysis in the industrial IoT field.
## Architecture
Besides IoTDB engine, we also developed several components to provide better IoT service. All components are referred to below as the IoTDB suite, and IoTDB refers specifically to the IoTDB engine.
IoTDB suite can provide a series of functions in the real situation such as data collection, data writing, data storage, data query, data visualization and data analysis. Figure 1.1 shows the overall application architecture brought by all the components of the IoTDB suite.
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51578977-4d5b5800-1efa-11e9-9d6e-6bfe7e890f30.jpg">
As shown in Figure 1.1, users can use JDBC to import timeseries data collected by sensors on devices into local/remote IoTDB. These timeseries data may be system state data (such as server load, CPU and memory usage, etc.), message queue data, timeseries data from applications, or other timeseries data in a database. Users can also write the data directly to TsFile (local or on HDFS).
For the data written to IoTDB and local TsFile, users can use the TsFileSync tool to synchronize the TsFile to HDFS, thereby implementing data processing tasks such as anomaly detection and machine learning on the Hadoop or Spark data processing platform. The results of the analysis can be written back to TsFile in the same way.
Also, IoTDB and TsFile provide client tools to meet the various needs of users in writing and viewing data in SQL form, script form and graphical form.
## Scenario
### Scenario 1
A company uses surface mount technology (SMT) to produce chips: it is necessary to first print solder paste on the joints of the chip, then place the components on the solder paste, and then melt the solder paste by heating and cool it. Finally, the components are soldered to the chip.
The above process uses an automated production line. In order to ensure the quality of the product, after printing the solder paste, the quality of the solder paste printing needs to be evaluated by optical equipment. The volume (v), height (h), area (a), horizontal offset (px), and vertical offset (py) of the solder paste on each joint are measured by a three-dimensional solder paste printing (SPI) device.
In order to improve the quality of the printing, it is necessary for the company to store the metrics of the solder joints on each chip for subsequent analysis based on these data.
At this point, the data can be stored using the TsFile component, the TsFileSync tool, and the Hadoop/Spark integration component in the IoTDB suite. That is, each time a new chip is printed, a data record is written on the SPI device using the SDK, ultimately forming a TsFile. Through the TsFileSync tool, the generated TsFile will be synchronized to the data center according to certain rules (such as daily) and analyzed by data analysis tools.
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51579014-695ef980-1efa-11e9-8cbc-e9e7ee4fa0d8.png">
In this scenario, only TsFile and TsFileSync are required to be deployed on a PC, and a Hadoop/Spark cluster is required. The schematic diagram is shown in Figure 1.2. Figure 1.3 shows the architecture at this time.
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51579026-77ad1580-1efa-11e9-8345-564b22d70286.jpg">
### Scenario 2
A company has several wind turbines, with hundreds of sensors installed on each generator to collect information such as the working status of the generator and the wind speed in the working environment.
In order to ensure the normal operation of the turbines and timely monitoring and analysis of the turbines, the company needs to collect these sensor data, perform partial calculation and analysis in the turbines working environment, and upload the original data collected to the data center.
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51579033-7ed42380-1efa-11e9-889f-fb4180291a9e.png">
In this situation, IoTDB, TsFileSync tools, and Hadoop/Spark integration components in the IoTDB suite can be used. A PC needs to be deployed with IoTDB and TsFileSync tools installed to support reading and writing data, local computing and analysis, and uploading data to the data center. In addition, Hadoop/Spark clusters need to be deployed for data storage and analysis on the data center side. As shown in Figure 1.4. Figure 1.5 shows the architecture at this time.
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51579064-8f849980-1efa-11e9-8cd6-a7339cd0540f.jpg">
### Scenario 3
A factory has a variety of robotic equipment within the plant area. This robotic equipment has limited hardware and can hardly run complex applications.
A variety of sensors are installed on each robotic device to monitor the robot's operating status, temperature, and other information. Due to the network environment of the factory, the robots inside the factory are all within the LAN of the factory and cannot connect to the external network. But there will be several servers in the factory that can connect directly to the external public network.
In order to ensure that the data of the robot can be monitored and analyzed in time, the company needs to collect the information of these robot sensors, send them to the server that can connect to the external network, and then upload the original data information to the data center for complex calculation and analysis.
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51579080-96aba780-1efa-11e9-87ac-940c45b19dd7.jpg">
At this point, IoTDB, IoTDB-CLI tools, TsFileSync tools, and Hadoop/Spark integration components in the IoTDB suite can be used. IoTDB-CLI tool is installed on the robot and each of them is connected to the LAN of the factory. When sensors generate real-time data, the data will be uploaded to the server in the factory. The IoTDB server and TsFileSync is installed on the server connected to the external network. Once triggered, the data on the server will be upload to the data center. In addition, Hadoop/Spark clusters need to be deployed for data storage and analysis on the data center side. As shown in Figure 1.6. Figure 1.7 shows the architecture at this time.
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51579085-9dd2b580-1efa-11e9-97b9-f56bc8d342b0.jpg">
### Scenario 4
A car company installed sensors on its cars to collect monitoring information such as the driving status of the vehicle. These automotive devices have limited hardware configurations and can hardly run complex applications. Cars with sensors can be connected to each other or send data via narrow-band IoT.
In order to receive the IoT data collected by the car sensor in real time, the company needs to send the sensor data to the data center in real time through the narrowband IoT while the vehicle is running. Thus, they can perform complex calculations and analysis on the server in the data center.
At this point, IoTDB, IoTDB-CLI, and the Hadoop/Spark integration components in the IoTDB suite can be used. The IoTDB-CLI tool is installed on each car and uses IoTDB-JDBC to send data directly back to the server in the data center.
In addition, Hadoop/Spark clusters need to be deployed for data storage and analysis on the data center side. As shown in Figure 1.8.
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51579095-a4f9c380-1efa-11e9-9f95-17165ec55568.jpg">
## Features
* Flexible deployment. IoTDB provides users a one-click installation tool on the cloud, a decompress-and-use terminal tool, and a bridge tool between the cloud platform and terminal tools (Data Synchronization Tool).
* Low cost on hardware. IoTDB can reach a high compression ratio of disk storage (for one billion data points, hard drive storage costs less than $0.23).
* Efficient directory structure. IoTDB supports efficient organization of complex timeseries data structures from intelligent networking devices, organization of timeseries data from devices of the same type, and fuzzy searching strategies for massive and complex directories of timeseries data.
* High-throughput read and write. IoTDB supports strongly-connected data access from millions of low-power devices, and high-speed data read and write for intelligent networking devices and the mixed devices mentioned above.
* Rich query semantics. IoTDB supports time alignment of timeseries data across devices and sensors, computation in the timeseries field (frequency domain transformation) and rich aggregation functions in the time dimension.
* Easy to get started. IoTDB supports an SQL-like language, the JDBC standard API and import/export tools which are easy to use.
* Intense integration with Open Source Ecosystem. IoTDB supports Hadoop, Spark, etc. analysis ecosystems and Grafana visualization tool.
<!-- TOC -->
- [Chapter 2: Concept](#chapter-2-concept)
- [Key Concepts and Terminology](#key-concepts-and-terminology)
- [Data Type](#data-type)
- [Encoding](#encoding)
- [Compression](#compression)
<!-- /TOC -->
# Chapter 2: Concept
## Key Concepts and Terminology
The following basic concepts are involved in IoTDB:
* Device
A device is an installation equipped with sensors in a real scenario. In IoTDB, all sensors should have their corresponding devices.
* Sensor
A sensor is a piece of detection equipment in an actual scene, which can sense the information to be measured, transform the sensed information into an electrical signal or another desired form of information output, and send it to IoTDB. In IoTDB, all stored data and paths are organized in units of sensors.
* Storage Group
Storage groups are used to let users define how to organize and isolate different time series data on disk. Time series belonging to the same storage group will be continuously written to the same file in the corresponding folder. The file may be closed due to user commands or system policies, and hence the data coming next from these sensors will be stored in a new file in the same folder. Time series belonging to different storage groups are stored in different folders.
Users can set any prefix path as a storage group. Provided that there are four time series `root.vehicle.d1.s1`, `root.vehicle.d1.s2`, `root.vehicle.d2.s1`, `root.vehicle.d2.s2`, two devices `d1` and `d2` under the path `root.vehicle` may belong to the same owner or the same manufacturer, so d1 and d2 are closely related. At this point, the prefix path root.vehicle can be designated as a storage group, which will enable IoTDB to store all devices under it in the same folder. Newly added devices under `root.vehicle` will also belong to this storage group.
> Note: A full path (`root.vehicle.d1.s1` as in the above example) is not allowed to be set as a storage group.
Setting a reasonable number of storage groups can lead to performance gains: too many storage files (or folders) cause frequent IO switching, which slows down the system (and also takes up a lot of memory, resulting in frequent memory-file swapping), while too few storage files (or folders) cause write commands to block and reduce concurrency.
Users should balance the storage group settings of storage files according to their own data size and usage scenarios to achieve better system performance. (There will be officially provided storage group scale and performance test reports in the future).
> Note: The prefix of a time series must belong to a storage group. Before creating a time series, the user must set which storage group the series belongs to. Only the time series whose storage group is set can be persisted to disk.
Once a prefix path is set as a storage group, the storage group settings cannot be changed.
After a storage group is set, all parent and child layers of the corresponding prefix path are not allowed to be set up again (for example, after `root.ln` is set as the storage group, the root layer and `root.ln.wf01` are not allowed to be set as storage groups).
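The restriction above — once a prefix path is a storage group, neither its ancestors nor its descendants may become one — can be sketched as a prefix check on layer boundaries. This is a minimal illustration with made-up names (`StorageGroupRule`), not IoTDB's actual metadata code:

```java
// Illustrative only: the ancestor/descendant restriction on storage groups.
class StorageGroupRule {
    // true if `a` is `b` itself or a path-prefix of `b` on a layer boundary
    static boolean covers(String a, String b) {
        return b.equals(a) || b.startsWith(a + ".");
    }

    // a candidate conflicts with an existing group if either covers the other
    static boolean conflicts(String candidate, String existing) {
        return covers(candidate, existing) || covers(existing, candidate);
    }

    public static void main(String[] args) {
        String existing = "root.ln";
        System.out.println(conflicts("root", existing));         // ancestor: conflict
        System.out.println(conflicts("root.ln.wf01", existing)); // descendant: conflict
        System.out.println(conflicts("root.sgcc", existing));    // unrelated: allowed
    }
}
```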
* Path
In IoTDB, a path is an expression that conforms to the following constraints:
```
path: LayerName (DOT LayerName)+
LayerName: Identifier | STAR
```
Among them, STAR is "*" and DOT is ".".
We call each part of a path separated by "." a layer; thus `root.A.B.C` is a path with four layers.
It is worth noting that in the path, root is a reserved character, which is only allowed to appear at the beginning of the time series mentioned below. If root appears in other layers, it cannot be parsed and an error is reported.
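A minimal sketch of the layer rules above — counting layers by splitting on "." and rejecting `root` anywhere but the first layer. The `PathCheck` class is an illustrative name, not part of IoTDB:

```java
// Illustrative only: layer counting and the "root only at the start" rule.
class PathCheck {
    // number of layers = number of "."-separated parts
    static int layerCount(String path) {
        return path.split("\\.").length;
    }

    // a timeseries path must begin with "root", and "root" may not
    // appear in any later layer (it could not be parsed otherwise)
    static boolean rootOnlyAtStart(String path) {
        String[] layers = path.split("\\.");
        for (int i = 1; i < layers.length; i++) {
            if (layers[i].equals("root")) {
                return false;
            }
        }
        return layers.length > 0 && layers[0].equals("root");
    }

    public static void main(String[] args) {
        System.out.println(layerCount("root.A.B.C"));          // 4
        System.out.println(rootOnlyAtStart("root.A.B.C"));     // true
        System.out.println(rootOnlyAtStart("root.A.root.C"));  // false
    }
}
```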
* Timeseries Path
The timeseries path is the core concept in IoTDB. A timeseries path can be thought of as the complete path of a sensor that produces the time series data. All timeseries paths in IoTDB must start with root and end with the sensor. A timeseries path can also be called a full path.
For example, if device1 of the vehicle type has a sensor named sensor1, its timeseries path can be expressed as: `root.vehicle.device1.sensor1`.
> Note: The layer of timeseries paths supported by the current IoTDB must be greater than or equal to four (it will be changed to two in the future).
* Prefix Path
The prefix path refers to the path where the prefix of a timeseries path is located. A prefix path contains all timeseries paths prefixed by the path. For example, suppose that we have three sensors: `root.vehicle.device1.sensor1`, `root.vehicle.device1.sensor2`, `root.vehicle.device2.sensor1`, the prefix path `root.vehicle.device1` contains two timeseries paths `root.vehicle.device1.sensor1` and `root.vehicle.device1.sensor2` while `root.vehicle.device2.sensor1` is excluded.
* Path With Star
In order to make it easier and faster to express multiple timeseries paths or prefix paths, IoTDB provides users with the path with star. `*` can appear in any layer of the path. According to the position where `*` appears, paths with star can be divided into two types:
`*` appears at the end of the path;
`*` appears in the middle of the path;
When `*` appears at the end of the path, it represents (`*`)+, which is one or more layers of `*`. For example, `root.vehicle.device1.*` represents all paths prefixed by `root.vehicle.device1` with layers greater than or equal to 4, like `root.vehicle.device1.*`, `root.vehicle.device1.*.*`, `root.vehicle.device1.*.*.*`, etc.
When `*` appears in the middle of the path, it represents `*` itself, i.e., a layer. For example, `root.vehicle.*.sensor1` represents a 4-layer path which is prefixed with `root.vehicle` and suffixed with `sensor1`.
> Note1: `*` cannot be placed at the beginning of the path.
> Note2: A path with `*` at the end has the same meaning as a prefix path, e.g., `root.vehicle.*` and `root.vehicle` are the same.
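The two `*` rules above can be sketched as a small matcher: a trailing `*` matches one or more remaining layers, while a `*` in the middle matches exactly one layer. `StarPath` is an illustrative name, not IoTDB's query engine:

```java
// Illustrative only: matching a concrete path against a path with star.
class StarPath {
    static boolean matches(String pattern, String path) {
        String[] p = pattern.split("\\.");
        String[] t = path.split("\\.");
        for (int i = 0; i < p.length; i++) {
            boolean last = (i == p.length - 1);
            if (p[i].equals("*")) {
                if (last) {
                    return t.length > i;  // trailing *: one or more more layers
                }
                if (i >= t.length) return false;  // middle *: exactly one layer
            } else {
                if (i >= t.length || !t[i].equals(p[i])) return false;
            }
        }
        return t.length == p.length;  // no trailing star: lengths must agree
    }

    public static void main(String[] args) {
        System.out.println(matches("root.vehicle.device1.*", "root.vehicle.device1.sensor1"));
        System.out.println(matches("root.vehicle.*.sensor1", "root.vehicle.device2.sensor1"));
        System.out.println(matches("root.vehicle.*.sensor1", "root.vehicle.device2.other"));
    }
}
```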
* Timestamp
The timestamp is the time point at which a data point arrives. IoTDB timestamps are divided into two types: LONG and DATETIME (including DATETIME-INPUT and DATETIME-DISPLAY). When entering a timestamp, you can use a LONG type timestamp or a DATETIME-INPUT type timestamp; the supported formats of the DATETIME-INPUT type timestamp are shown in Table 2-1.
<center>**Table 2-1 Support format of DATETIME-INPUT type timestamp**
|format|
|:---:|
|yyyy-MM-dd HH:mm:ss|
|yyyy/MM/dd HH:mm:ss|
|yyyy.MM.dd HH:mm:ss|
|yyyy-MM-dd'T'HH:mm:ss|
|yyyy/MM/dd'T'HH:mm:ss|
|yyyy.MM.dd'T'HH:mm:ss|
|yyyy-MM-dd HH:mm:ssZZ|
|yyyy/MM/dd HH:mm:ssZZ|
|yyyy.MM.dd HH:mm:ssZZ|
|yyyy-MM-dd'T'HH:mm:ssZZ|
|yyyy/MM/dd'T'HH:mm:ssZZ|
|yyyy.MM.dd'T'HH:mm:ssZZ|
|yyyy/MM/dd HH:mm:ss.SSS|
|yyyy-MM-dd HH:mm:ss.SSS|
|yyyy.MM.dd HH:mm:ss.SSS|
|yyyy/MM/dd'T'HH:mm:ss.SSS|
|yyyy-MM-dd'T'HH:mm:ss.SSS|
|yyyy.MM.dd'T'HH:mm:ss.SSS|
|yyyy-MM-dd HH:mm:ss.SSSZZ|
|yyyy/MM/dd HH:mm:ss.SSSZZ|
|yyyy.MM.dd HH:mm:ss.SSSZZ|
|yyyy-MM-dd'T'HH:mm:ss.SSSZZ|
|yyyy/MM/dd'T'HH:mm:ss.SSSZZ|
|yyyy.MM.dd'T'HH:mm:ss.SSSZZ|
|ISO8601 standard time format|
</center>
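The DATETIME-INPUT patterns above can be exercised with `java.time` (a minimal sketch; IoTDB itself may use a different parser, and the class and method names here are hypothetical):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class TimestampParseDemo {
    // Parse a DATETIME-INPUT string with one of the patterns in Table 2-1.
    static LocalDateTime parse(String text, String pattern) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern));
    }

    public static void main(String[] args) {
        // yyyy-MM-dd HH:mm:ss
        System.out.println(parse("2019-01-01 08:30:00", "yyyy-MM-dd HH:mm:ss"));
        // yyyy/MM/dd'T'HH:mm:ss.SSS
        System.out.println(parse("2019/01/01T08:30:00.123", "yyyy/MM/dd'T'HH:mm:ss.SSS"));
    }
}
```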
When displaying timestamps, IoTDB supports the LONG type and the DATETIME-DISPLAY type. The DATETIME-DISPLAY type supports user-defined time formats; the syntax of the custom time format is shown in Table 2-2.
<center>**Table 2-2 The syntax of the custom time format**
|Symbol|Meaning|Presentation|Examples|
|:---:|:---:|:---:|:---:|
|G|era|era|era|
|C|century of era (>=0)| number| 20|
| Y |year of era (>=0)| year| 1996|
|||||
| x |weekyear| year| 1996|
| w |week of weekyear| number |27|
| e |day of week |number| 2|
| E |day of week |text |Tuesday; Tue|
|||||
| y| year| year| 1996|
| D |day of year |number| 189|
| M |month of year |month| July; Jul; 07|
| d |day of month |number| 10|
|||||
| a |halfday of day |text |PM|
| K |hour of halfday (0~11) |number| 0|
| h |clockhour of halfday (1~12) |number| 12|
|||||
| H |hour of day (0~23)| number| 0|
| k |clockhour of day (1~24) |number| 24|
| m |minute of hour| number| 30|
| s |second of minute| number| 55|
| S |fraction of second |millis| 978|
|||||
| z |time zone |text |Pacific Standard Time; PST|
| Z |time zone offset/id| zone| -0800; -08:00; America/Los_Angeles|
|||||
| '| escape for text |delimiter|  |
| ''| single quote| literal |'|
</center>
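As an illustration of DATETIME-DISPLAY, the common symbols in Table 2-2 behave like `java.time` pattern letters (a sketch under that assumption; the table uses Joda-Time syntax, which overlaps but is not identical to `java.time`, and the class name here is hypothetical):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class TimestampDisplayDemo {
    // Format a timestamp with a user-defined pattern built from Table 2-2 symbols.
    static String fmt(LocalDateTime t, String pattern) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH).format(t);
    }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(1996, 7, 10, 12, 30, 55, 978_000_000);
        // yyyy.MM.dd HH:mm:ss.SSS -> 1996.07.10 12:30:55.978
        System.out.println(fmt(t, "yyyy.MM.dd HH:mm:ss.SSS"));
        // Day-of-week text, month text, clockhour of halfday, halfday marker.
        System.out.println(fmt(t, "EEE, d MMM yyyy h:mm a"));
    }
}
```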
* Value
The value of a time series is the value sent by a sensor to IoTDB. IoTDB stores this value according to its data type, and users can select the compression mode and the corresponding encoding mode according to the data type. See [Data Type](需要连接到具体的网页链接Chapter2) and [Encoding](需要连接到具体的网页链接Chapter2) of this document for details on data types and their corresponding encodings.
* Point
A data point is a timestamp-value pair (timestamp, value).
* Column
A column of data contains all values belonging to a time series and the timestamps corresponding to these values. When there are multiple columns of data, IoTDB merges the timestamps and produces rows of the form (timestamp, value, value, ...), where each value comes from one column.
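The merging described above can be sketched as follows (hypothetical data and names; a series without a value at some timestamp contributes null, as in multi-column query results):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class ColumnMergeDemo {
    // Align two columns on their timestamps; a missing value becomes null.
    static TreeMap<Long, Object[]> merge(Map<Long, ?> first, Map<Long, ?> second) {
        TreeMap<Long, Object[]> rows = new TreeMap<>();
        first.forEach((ts, v) -> rows.computeIfAbsent(ts, k -> new Object[2])[0] = v);
        second.forEach((ts, v) -> rows.computeIfAbsent(ts, k -> new Object[2])[1] = v);
        return rows;
    }

    public static void main(String[] args) {
        Map<Long, Double> temperature = new LinkedHashMap<>();
        temperature.put(1L, 2.2);
        temperature.put(3L, 2.1);
        Map<Long, Boolean> status = new LinkedHashMap<>();
        status.put(1L, true);
        status.put(2L, false);
        // Rows over the union of timestamps: (1, 2.2, true), (2, null, false), (3, 2.1, null)
        merge(temperature, status).forEach((ts, vals) ->
            System.out.println(ts + " -> (" + vals[0] + ", " + vals[1] + ")"));
    }
}
```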
## Data Type
IoTDB supports six data types in total: BOOLEAN (Boolean), INT32 (Integer), INT64 (Long Integer), FLOAT (Single Precision Floating Point), DOUBLE (Double Precision Floating Point), TEXT (String).
A time series of FLOAT or DOUBLE type can specify MAX\_POINT\_NUMBER, the number of digits after the decimal point of the floating-point number, if the encoding method is [RLE](需要连接到具体的网页链接Chapter2RLE) or [TS\_2DIFF](需要连接到具体的网页链接Chapter2TS2DIFF); see [Create Timeseries Statement](需要连接到具体的网页链接Chapter5CreateTimeseriesStatement) for how to specify it. If MAX\_POINT\_NUMBER is not specified, the system uses [float\_precision](需要连接到具体的网页链接chapter4float\_precision) in the configuration file "tsfile-format.properties".
When the data type of the data input by a user does not match the data type of the time series, the system reports a type error. As shown below, second-order differential encoding does not support the BOOLEAN type:
```
IoTDB> create timeseries root.ln.wf02.wt02.status WITH DATATYPE=BOOLEAN, ENCODING=TS_2DIFF
error: encoding TS_2DIFF does not support BOOLEAN
```
## Encoding
In order to improve the efficiency of data storage, it is necessary to encode data during data writing, thereby reducing the amount of disk space used. In the process of writing and reading data, the amount of data involved in the I/O operations can be reduced to improve performance. IoTDB supports four encoding methods for different types of data:
* PLAIN
PLAIN encoding, the default encoding mode, i.e., no encoding, supports multiple data types. It has high compression and decompression efficiency but low storage space efficiency.
* TS_2DIFF
Second-order differential encoding is more suitable for encoding monotonically increasing or decreasing sequence data, and is not recommended for sequence data with large fluctuations.
Second-order differential encoding can also be used to encode floating-point numbers, but the number of reserved decimal digits (MAX\_POINT\_NUMBER, see [this page](需要连接到具体的网页链接Chapter5CreateTimeseriesStatement) for how to specify it) must be given when creating the time series. It is more suitable for sequence data whose floating-point values appear continuously and increase or decrease monotonically, and is not suitable for data that requires high precision after the decimal point or fluctuates widely.
* RLE
Run-length encoding is more suitable for storing sequences of consecutive repeated integer values, and is not recommended for sequence data whose values differ most of the time.
Run-length encoding can also be used to encode floating-point numbers, but the number of reserved decimal digits (MAX\_POINT\_NUMBER, see [this page](需要连接到具体的网页链接Chapter5CreateTimeseriesStatement) for how to specify it) must be given when creating the time series. It is more suitable for sequence data whose floating-point values appear continuously and increase or decrease monotonically, and is not suitable for data that requires high precision after the decimal point or fluctuates widely.
* GORILLA
GORILLA encoding is more suitable for floating-point sequence with similar values and is not recommended for sequence data with large fluctuations.
* Correspondence between data type and encoding
The four encodings described in the previous sections are applicable to different data types. If the correspondence is wrong, the time series cannot be created correctly. The correspondence between the data type and its supported encodings is summarized in Table 2-3.
<center> **Table 2-3 The correspondence between the data type and its supported encodings**
|Data Type |Supported Encoding|
|:---:|:---:|
|BOOLEAN| PLAIN, RLE|
|INT32 |PLAIN, RLE, TS_2DIFF|
|INT64 |PLAIN, RLE, TS_2DIFF|
|FLOAT |PLAIN, RLE, TS_2DIFF, GORILLA|
|DOUBLE |PLAIN, RLE, TS_2DIFF, GORILLA|
|TEXT |PLAIN|
</center>
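The intuition behind TS_2DIFF can be sketched as follows: a monotone sequence collapses into small residuals that need few bits, which is why it suits monotonically increasing or decreasing data. This is a conceptual illustration only, not IoTDB's actual bit-level codec, and all names are hypothetical:

```java
import java.util.Arrays;

public class DeltaEncodeDemo {
    // Second-order differential residuals: store the first value as-is, then
    // the change of each first-order delta relative to the previous delta.
    static int[] secondOrderDiff(int[] v) {
        int[] d = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            if (i == 0) {
                d[i] = v[i];                                  // first value kept as-is
            } else {
                int delta = v[i] - v[i - 1];                  // first-order delta
                int prevDelta = i == 1 ? 0 : v[i - 1] - v[i - 2];
                d[i] = delta - prevDelta;                     // second-order residual
            }
        }
        return d;
    }

    public static void main(String[] args) {
        int[] monotone = {100, 110, 120, 130, 140};
        // Residuals after the first two entries are all zero: [100, 10, 0, 0, 0]
        System.out.println(Arrays.toString(secondOrderDiff(monotone)));
    }
}
```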
## Compression
When the time series is written and encoded as binary data according to the specified type, IoTDB compresses the data using compression technology to further improve storage space efficiency. Although both encoding and compression are designed to improve storage efficiency, encoding techniques are usually only applicable to specific data types (e.g., second-order differential encoding is only suitable for the INT32 or INT64 data types, and storing floating-point numbers requires multiplying them by 10^m to convert them to integers), after which the data is converted to a binary stream. The compression method (SNAPPY) compresses the binary stream, so its use is no longer limited by the data type.
IoTDB allows you to specify the compression method of the column when creating a time series. IoTDB now supports two kinds of compression: UNCOMPRESSOR (no compression) and SNAPPY compression. The specified syntax for compression is detailed in [Create Timeseries Statement](需要连接到具体的网页链接Chapter5).
<!-- TOC -->
- [Chapter 6: JDBC API](#chapter-6-jdbc-api)
<!-- /TOC -->
# Chapter 6: JDBC API
<!-- TOC -->
- [Cli/shell tool](#clishell-tool)
- [Running Cli/Shell](#running-clishell)
- [Cli/Shell Parameters](#clishell-parameters)
<!-- /TOC -->
# Cli/shell tool
IoTDB provides Cli/shell tools for users to interact with the IoTDB server on the command line. This document shows how the Cli/shell tool works and what its parameters mean.
> Note: In this document, \$IOTDB\_HOME represents the path of the IoTDB installation directory.
## Running Cli/Shell
After installation, IoTDB has a default user `root` with default password `root`. Users can use this account to try the IoTDB Cli/Shell tool. The client startup script is the `start-client` file under the \$IOTDB\_HOME/bin folder. When starting the script, you need to specify the IP and PORT. (Make sure the IoTDB server is running properly before you use the Cli/Shell tool to connect to it.)
In the following example, the server is started locally and runs on the default port 6667. If you need to connect to a remote server or the server runs on a different port, set the specific IP and PORT with -h and -p.
The Linux and MacOS system startup commands are as follows:
```
Shell > ./bin/start-client.sh -h 127.0.0.1 -p 6667 -u root -pw root
```
The Windows system startup commands are as follows:
```
Shell > \bin\start-client.bat -h 127.0.0.1 -p 6667 -u root -pw root
```
After running the command, the client starts; on success, the output is as follows:
```
_____ _________ ______ ______
|_ _| | _ _ ||_ _ `.|_ _ \
| | .--.|_/ | | \_| | | `. \ | |_) |
| | / .'`\ \ | | | | | | | __'.
_| |_| \__. | _| |_ _| |_.' /_| |__) |
|_____|'.__.' |_____| |______.'|_______/ version <version>
IoTDB> login successfully
IoTDB>
```
Enter `quit` or `exit` to exit the client. The client will then show `quit normally`.
## Cli/Shell Parameters
|Parameter name|Parameter type|Required| Description| Example |
|:---|:---|:---|:---|:---|
|-disableISO8601 |No parameters | No |If this parameter is set, IoTDB will print timestamps in numeric form|-disableISO8601|
|-h <`host`> |string, no quotation marks|Yes|The IP address of the IoTDB server|-h 10.129.187.21|
|-help|No parameters|No|Print help information for IoTDB|-help|
|-p <`port`>|int|Yes|The port number of the IoTDB server. IoTDB runs on port 6667 by default|-p 6667|
|-pw <`password`>|string, no quotation marks|No|The password used for IoTDB to connect to the server. If no password is entered, IoTDB will ask for password in Cli command|-pw root|
|-u <`username`>|string, no quotation marks|Yes|User name used for IoTDB to connect the server|-u root|
|-maxPRC <`maxPrintRowCount`>|int|No|Set the maximum number of rows that IoTDB returns|-maxPRC 10|
The following command connects to the host with IP 10.129.187.21, port 6667, username "root", and password "root", and prints timestamps in numeric form. The maximum number of rows displayed on the IoTDB command line is 10.
The Linux and MacOS system startup commands are as follows:
```
Shell > ./bin/start-client.sh -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10
```
The Windows system startup commands are as follows:
```
Shell > \bin\start-client.bat -h 10.129.187.21 -p 6667 -u root -pw root -disableISO8601 -maxPRC 10
```
<!-- TOC -->
- [IoTDB-Grafana](#iotdb-grafana)
- [Grafana installation](#grafana-installation)
- [Install Grafana](#install-grafana)
- [Install data source plugin](#install-data-source-plugin)
- [Start Grafana](#start-grafana)
- [IoTDB installation](#iotdb-installation)
- [IoTDB-Grafana installation](#iotdb-grafana-installation)
- [Start IoTDB-Grafana](#start-iotdb-grafana)
- [Explore in Grafana](#explore-in-grafana)
- [Add data source](#add-data-source)
- [Design in dashboard](#design-in-dashboard)
<!-- /TOC -->
# IoTDB-Grafana
This project provides a connector that reads data from IoTDB and sends it to [Grafana](https://grafana.com/). Before using this tool, make sure Grafana and IoTDB are correctly installed and started.
## Grafana installation
### Install Grafana
* Download url: https://grafana.com/grafana/download
* version >= 4.4.1
### Install data source plugin
* plugin name: simple-json-datasource
* Download url: https://github.com/grafana/simple-json-datasource
After downloading this plugin, you can use the grafana-cli tool to install SimpleJson from the command line:
```
grafana-cli plugins install grafana-simple-json-datasource
```
Alternatively, you can manually download the .zip file and unpack it into your grafana plugins directory.
* `{grafana-install-directory}/data/plugin/` (Windows)
* `/var/lib/grafana/plugins` (Linux)
* `/usr/local/var/lib/grafana/plugins`(Mac)
### Start Grafana
On Unix systems, Grafana starts automatically after installation; alternatively, you can run the `sudo service grafana-server start` command. See more information [here](http://docs.grafana.org/installation/debian/).
If you use Mac and `homebrew` to install Grafana, you can use `homebrew` to start Grafana.
First make sure homebrew/services is installed by running `brew tap homebrew/services`, then start Grafana using: `brew services start grafana`.
See more information [here](http://docs.grafana.org/installation/mac/).
If you use Windows, start Grafana by executing grafana-server.exe, located in the bin directory, preferably from the command line. See more information [here](http://docs.grafana.org/installation/windows/).
## IoTDB installation
See https://github.com/apache/incubator-iotdb
## IoTDB-Grafana installation
```shell
git clone https://github.com/apache/incubator-iotdb.git
mvn clean package -pl grafana -am -Dmaven.test.skip=true
cd grafana
```
Copy `application.properties` from `conf/` directory to `target` directory. (Or just make sure that `application.properties` and `iotdb-grafana-{version}-SNAPSHOT.war` are in the same directory.)
Edit `application.properties`
```
# ip and port of IoTDB
spring.datasource.url = jdbc:iotdb://127.0.0.1:6667/
spring.datasource.username = root
spring.datasource.password = root
spring.datasource.driver-class-name=org.apache.iotdb.jdbc.IoTDBDriver
server.port = 8888
```
### Start IoTDB-Grafana
```shell
cd grafana/target/
java -jar iotdb-grafana-{version}-SNAPSHOT.war
```
If you see the following output, the iotdb-grafana connector has been successfully activated.
```shell
$ java -jar iotdb-grafana-{version}-SNAPSHOT.war
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v1.5.4.RELEASE)
...
```
## Explore in Grafana
The default port of Grafana is 3000, see http://localhost:3000
Username and password are both "admin" by default.
### Add data source
Select `Data Sources` and then `Add data source`, select `SimpleJson` in `Type`, and set `URL` to http://localhost:8888
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51664777-2766ae00-1ff5-11e9-9d2f-7489f8ccbfc2.png">
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51664842-554bf280-1ff5-11e9-97d2-54eebe0b2ca1.png">
### Design in dashboard
Add diagrams in dashboard and customize your query. See http://docs.grafana.org/guides/getting_started/
<img style="width:100%; max-width:800px; max-height:600px; margin-left:auto; margin-right:auto; display:block;" src="https://user-images.githubusercontent.com/13203019/51664878-6e54a380-1ff5-11e9-9718-4d0e24627fa8.png">
<!-- TOC -->
- [TsFile-Hadoop-Connector User Guide](#tsfile-hadoop-connector-user-guide)
<!-- /TOC -->
# TsFile-Hadoop-Connector User Guide
Coming soon.
<!-- TOC -->
- [TsFile-Spark-Connector User Guide](#tsfile-spark-connector-user-guide)
- [Dependencies & Version](#dependencies--version)
- [Quick Start](#quick-start)
- [Step1: Build TsFile-Spark-Connector](#step1-build-tsfile-spark-connector)
- [Step2: Import Connector Lib into Spark](#step2-import-connector-lib-into-spark)
- [Step3: Use Connector in Spark](#step3-use-connector-in-spark)
- [5.2.1.1 Local Mode](#5211-local-mode)
- [5.2.1.2 Distributed Mode](#5212-distributed-mode)
- [Detail: Conversion between TsFile and Spark](#detail-conversion-between-tsfile-and-spark)
- [TsFile Type <-> SparkSQL type](#tsfile-type---sparksql-type)
- [TsFile Schema <-> SparkSQL Table Structure](#tsfile-schema---sparksql-table-structure)
- [the default way](#the-default-way)
- [unfolding delta_object column](#unfolding-delta_object-column)
- [Example](#example)
- [5.1 Scala API](#51-scala-api)
<!-- /TOC -->
# TsFile-Spark-Connector User Guide
TsFile-Spark-Connector implements Spark support for external data sources of the TsFile type, enabling users to read, write, and query TsFiles with Spark.
## Dependencies & Version
The versions required for Spark and Java are as follows:
| Spark Version | Scala Version | Java Version | TsFile |
| ------------- | ------------- | ------------ |------- |
| `2.0+` | `2.11` | `1.8` | `0.7.0`|
> Note: For more information about how to download and use TsFile, please see the following link: https://github.com/apache/incubator-iotdb/tree/master/tsfile.
## Quick Start
### Step1: Build TsFile-Spark-Connector
To build TsFile-Spark-Connector, you can use the following command:
```
mvn clean scala:compile compile package
```
In addition, you can also download the prebuilt lib package directly from our website; the download link will be available soon.
### Step2: Import Connector Lib into Spark
* import tsfile-spark-connector.jar into spark lib
* replace libthrift-0.9.2.jar and libfb303-0.9.2.jar with libthrift-0.9.1.jar and libfb303-0.9.1.jar respectively.
### Step3: Use Connector in Spark
#### 5.2.1.1 Local Mode
Start Spark with TsFile-Spark-Connector in local mode:
```
./<spark-shell-path> --jars tsfile-<tsfile-version>.jar,tsfile-spark-connector-<connector-version>.jar
```
Note:
* \<spark-shell-path> is the real path of your spark-shell.
* \<tsfile-version> is the tsfile version.
* \<connector-version> is the TsFile-Spark-Connector version. Note that the versions of TsFile-Spark-Connector and TsFile should correspond.
* Multiple jar packages are separated by commas without any spaces.
* See https://github.com/apache/incubator-iotdb/tree/master/tsfile for how to get TsFile.
#### 5.2.1.2 Distributed Mode
Start Spark with TsFile-Spark-Connector in distributed mode (that is, connect spark-shell to a Spark cluster):
```
./<spark-shell-path> --jars tsfile-<tsfile-version>.jar,tsfile-spark-connector-<connector-version>.jar --master spark://ip:7077
```
Note:
* \<spark-shell-path> is the real path of your spark-shell.
* \<tsfile-version> is the tsfile version.
* \<connector-version> is the TsFile-Spark-Connector version. Note that the versions of TsFile-Spark-Connector and TsFile should correspond.
* Multiple jar packages are separated by commas without any spaces.
* See https://github.com/apache/incubator-iotdb/tree/master/tsfile for how to get TsFile.
## Detail: Conversion between TsFile and Spark
### TsFile Type <-> SparkSQL type
This library uses the following mapping from TsFile data types to SparkSQL types:
| TsFile | SparkSQL|
| --------------| -------------- |
| BOOLEAN | BooleanType |
| INT32 | IntegerType |
| INT64 | LongType |
| FLOAT | FloatType |
| DOUBLE | DoubleType |
| ENUMS | StringType |
| TEXT | StringType |
### TsFile Schema <-> SparkSQL Table Structure
The way a TsFile is displayed depends on its schema. Take the following TsFile structure as an example: there are three Measurements in the schema of the TsFile: status, temperature, and hardware. Basic information about these three Measurements is as follows:
<center>
<table style="text-align:center">
<tr><th colspan="2">Name</th><th colspan="2">Type</th><th colspan="2">Encode</th></tr>
<tr><td colspan="2">status</td><td colspan="2">Boolean</td><td colspan="2">PLAIN</td></tr>
<tr><td colspan="2">temperature</td><td colspan="2">Float</td><td colspan="2">RLE</td></tr>
<tr><td colspan="2">hardware</td><td colspan="2">Text</td><td colspan="2">PLAIN</td></tr>
</table>
<span>Basic info of Measurements</span>
</center>
The existing data in the file is as follows:
<center>
<table style="text-align:center">
<tr><th colspan="4">delta\_object:root.ln.wf01.wt01</th><th colspan="4">delta\_object:root.ln.wf02.wt02</th><th colspan="4">delta\_object:root.sgcc.wf03.wt01</th></tr>
<tr><th colspan="2">status</th><th colspan="2">temperature</th><th colspan="2">hardware</th><th colspan="2">status</th><th colspan="2">status</th><th colspan="2">temperature</th></tr>
<tr><th>time</th><th>value</td><th>time</th><th>value</td><th>time</th><th>value</th><th>time</th><th>value</td><th>time</th><th>value</td><th>time</th><th>value</th></tr>
<tr><td>1</td><td>True</td><td>1</td><td>2.2</td><td>2</td><td>"aaa"</td><td>1</td><td>True</td><td>2</td><td>True</td><td>3</td><td>3.3</td></tr>
<tr><td>3</td><td>True</td><td>2</td><td>2.2</td><td>4</td><td>"bbb"</td><td>2</td><td>False</td><td>3</td><td>True</td><td>6</td><td>6.6</td></tr>
<tr><td>5</td><td> False </td><td>3</td><td>2.1</td><td>6</td><td>"ccc"</td><td>4</td><td>True</td><td>4</td><td>True</td><td>8</td><td>8.8</td></tr>
<tr><td>7</td><td> True </td><td>4</td><td>2.0</td><td>8</td><td>"ddd"</td><td>5</td><td>False</td><td>6</td><td>True</td><td>9</td><td>9.9</td></tr>
</table>
<span>A set of time-series data</span>
</center>
There are two ways to display it:
#### the default way
Two fixed columns are created first to hold the timestamp and the full path of the device:
- `time` : timestamp, LongType
- `delta_object` : the full device path, StringType
Next, a column is created for each Measurement to store the specific data. The SparkSQL table structure is as follows:
<center>
<table style="text-align:center">
<tr><th>time(LongType)</th><th> delta\_object(StringType)</th><th>status(BooleanType)</th><th>temperature(FloatType)</th><th>hardware(StringType)</th></tr>
<tr><td>1</td><td> root.ln.wf01.wt01 </td><td>True</td><td>2.2</td><td>null</td></tr>
<tr><td>1</td><td> root.ln.wf02.wt02 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>2</td><td> root.ln.wf01.wt01 </td><td>null</td><td>2.2</td><td>null</td></tr>
<tr><td>2</td><td> root.ln.wf02.wt02 </td><td>False</td><td>null</td><td>"aaa"</td></tr>
<tr><td>2</td><td> root.sgcc.wf03.wt01 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>3</td><td> root.ln.wf01.wt01 </td><td>True</td><td>2.1</td><td>null</td></tr>
<tr><td>3</td><td> root.sgcc.wf03.wt01 </td><td>True</td><td>3.3</td><td>null</td></tr>
<tr><td>4</td><td> root.ln.wf01.wt01 </td><td>null</td><td>2.0</td><td>null</td></tr>
<tr><td>4</td><td> root.ln.wf02.wt02 </td><td>True</td><td>null</td><td>"bbb"</td></tr>
<tr><td>4</td><td> root.sgcc.wf03.wt01 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>5</td><td> root.ln.wf01.wt01 </td><td>False</td><td>null</td><td>null</td></tr>
<tr><td>5</td><td> root.ln.wf02.wt02 </td><td>False</td><td>null</td><td>null</td></tr>
<tr><td>5</td><td> root.sgcc.wf03.wt01 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>6</td><td> root.ln.wf02.wt02 </td><td>null</td><td>null</td><td>"ccc"</td></tr>
<tr><td>6</td><td> root.sgcc.wf03.wt01 </td><td>null</td><td>6.6</td><td>null</td></tr>
<tr><td>7</td><td> root.ln.wf01.wt01 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>8</td><td> root.ln.wf02.wt02 </td><td>null</td><td>null</td><td>"ddd"</td></tr>
<tr><td>8</td><td> root.sgcc.wf03.wt01 </td><td>null</td><td>8.8</td><td>null</td></tr>
<tr><td>9</td><td> root.sgcc.wf03.wt01 </td><td>null</td><td>9.9</td><td>null</td></tr>
</table>
</center>
#### unfolding delta_object column
Expand the device column by `.` into multiple columns, ignoring the root layer `root`; this is convenient for richer aggregation operations. To use this display way, set the parameter `delta_object_name` in the table creation statement (refer to Examples 4 and 6 in Section 5.1 of this manual); in this example, `delta_object_name` is set to `root.group.field.device`. The number of layers in the parameter must match the number of layers in the device path one-to-one. One column is then created for each layer of the device path except the `root` layer; the column name is the corresponding name in the parameter, and the value is the name of the corresponding layer of the device path. Finally, one column is created for each Measurement to store the data.
The SparkSQL table structure is then as follows:
<center>
<table style="text-align:center">
<tr><th>time(LongType)</th><th> group(StringType)</th><th> field(StringType)</th><th> device(StringType)</th><th>status(BooleanType)</th><th>temperature(FloatType)</th><th>hardware(StringType)</th></tr>
<tr><td>1</td><td> ln </td><td> wf01 </td><td> wt01 </td><td>True</td><td>2.2</td><td>null</td></tr>
<tr><td>1</td><td> ln </td><td> wf02 </td><td> wt02 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>2</td><td> ln </td><td> wf01 </td><td> wt01 </td><td>null</td><td>2.2</td><td>null</td></tr>
<tr><td>2</td><td> ln </td><td> wf02 </td><td> wt02 </td><td>False</td><td>null</td><td>"aaa"</td></tr>
<tr><td>2</td><td> sgcc </td><td> wf03 </td><td> wt01 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>3</td><td> ln </td><td> wf01 </td><td> wt01 </td><td>True</td><td>2.1</td><td>null</td></tr>
<tr><td>3</td><td> sgcc </td><td> wf03 </td><td> wt01 </td><td>True</td><td>3.3</td><td>null</td></tr>
<tr><td>4</td><td> ln </td><td> wf01 </td><td> wt01 </td><td>null</td><td>2.0</td><td>null</td></tr>
<tr><td>4</td><td> ln </td><td> wf02 </td><td> wt02 </td><td>True</td><td>null</td><td>"bbb"</td></tr>
<tr><td>4</td><td> sgcc </td><td> wf03 </td><td> wt01 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>5</td><td> ln </td><td> wf01 </td><td> wt01 </td><td>False</td><td>null</td><td>null</td></tr>
<tr><td>5</td><td> ln </td><td> wf02 </td><td> wt02 </td><td>False</td><td>null</td><td>null</td></tr>
<tr><td>5</td><td> sgcc </td><td> wf03 </td><td> wt01 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>6</td><td> ln </td><td> wf02 </td><td> wt02 </td><td>null</td><td>null</td><td>"ccc"</td></tr>
<tr><td>6</td><td> sgcc </td><td> wf03 </td><td> wt01 </td><td>null</td><td>6.6</td><td>null</td></tr>
<tr><td>7</td><td> ln </td><td> wf01 </td><td> wt01 </td><td>True</td><td>null</td><td>null</td></tr>
<tr><td>8</td><td> ln </td><td> wf02 </td><td> wt02 </td><td>null</td><td>null</td><td>"ddd"</td></tr>
<tr><td>8</td><td> sgcc </td><td> wf03 </td><td> wt01 </td><td>null</td><td>8.8</td><td>null</td></tr>
<tr><td>9</td><td> sgcc </td><td> wf03 </td><td> wt01 </td><td>null</td><td>9.9</td><td>null</td></tr>
</table>
</center>
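The unfolding rule above can be sketched as follows (a minimal illustration with hypothetical class and method names, using the `delta_object_name` value from Examples 4 and 6):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UnfoldDeltaObjectDemo {
    // Split the device path on ".", skip the leading "root" layer, and pair
    // each remaining layer with the corresponding name from delta_object_name.
    static Map<String, String> unfold(String deltaObjectName, String deltaObject) {
        String[] names = deltaObjectName.split("\\.");
        String[] layers = deltaObject.split("\\.");
        Map<String, String> columns = new LinkedHashMap<>();
        for (int i = 1; i < names.length; i++) {  // index 0 is "root", ignored
            columns.put(names[i], layers[i]);
        }
        return columns;
    }

    public static void main(String[] args) {
        // {group=ln, field=wf01, device=wt01}
        System.out.println(unfold("root.group.field.device", "root.ln.wf01.wt01"));
    }
}
```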
TsFile-Spark-Connector can display one or more TsFiles as a SparkSQL table. It also allows users to specify a single directory or use wildcards to match multiple directories. If there are multiple TsFiles, the union of the measurements across all TsFiles is retained in the table, and measurements with the same name are assumed to have the same data type. Note that if measurements share a name but differ in data type, TsFile-Spark-Connector does not guarantee the correctness of the results.
The writing process writes a DataFrame as one or more TsFiles. By default, two columns need to be included: time and delta_object; the rest of the columns are used as Measurements. If the user wants to write the second table structure back to a TsFile, the "delta\_object\_name" parameter can be set (refer to Example 6 in Section 5.1 of this manual).
## Example
The path of `test.tsfile` used in the following examples is "data/test.tsfile". Please upload `test.tsfile` to HDFS in advance so that its path is "/test.tsfile".
### 5.1 Scala API
* **Example 1**
```scala
import cn.edu.tsinghua.tsfile._
//read data in TsFile and create a table
val df = spark.read.tsfile("/test.tsfile")
df.createOrReplaceTempView("tsfile_table")
//query with filter
val newDf = spark.sql("select * from tsfile_table where temperature > 1.2").cache()
newDf.show()
```
* **Example 2**
```scala
val df = spark.read
.format("cn.edu.tsinghua.tsfile")
  .load("/test.tsfile")
df.filter("time < 10").show()
```
* **Example 3**
```scala
//create a table in SparkSQL and build relation with a TsFile
spark.sql("create temporary view tsfile_table using cn.edu.tsinghua.tsfile options(path = \"test.ts\")")
spark.sql("select * from tsfile_table where temperature > 1.2").show()
```
* **Example 4(using options to read)**
```scala
import cn.edu.tsinghua.tsfile._
val df = spark.read.option("delta_object_name", "root.group.field.device").tsfile("/test.tsfile")
//create a table in SparkSQL and build relation with a TsFile
df.createOrReplaceTempView("tsfile_table")
spark.sql("select * from tsfile_table where device = 'wt01' and field = 'wf01' and group = 'ln' and time < 10").show()
```
* **Example 5(write)**
```scala
import cn.edu.tsinghua.tsfile._
spark.read.tsfile("/test.tsfile").write.tsfile("/out")
```
* **Example 6(using options to write)**
```scala
import cn.edu.tsinghua.tsfile._
val df = spark.read.option("delta_object_name", "root.group.field.device").tsfile("/test.tsfile")
df.write.option("delta_object_name", "root.group.field.device").tsfile("/out")
```